As the scale of input data goes up linearly, the scale of commonly observed input patterns goes up logarithmically. Even if we bumped the scale up an order of magnitude in terms of common input tokens, we could still annotate the important part of a 150TB text corpus for $125B worth of human annotation. Given that this could break the budget of even large corporations, realistically we'd probably train a model to predict the scores of interest using a fraction of that much human annotation, which would be inferior but still a massive improvement. It is also likely that corporations would team up with indirect competitors to share the cost of annotation and gain an advantage against direct competitors.
How do you figure? Let's say a commonly observed input pattern comprises 1% of the training data. For a data set of size N, 0.01 * N examples will contain the pattern. If we increase the size to 2N, 0.01 * 2 * N examples will contain the pattern, which is twice as many. That is linear growth, so why would it be logarithmic?
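To make that arithmetic concrete, here is a minimal sketch in Python. It assumes a single pattern that shows up at a fixed 1% rate regardless of data set size, which is the assumption in the question and not necessarily what the parent comment had in mind. The function name and the data set sizes are made up for illustration.

```python
def expected_matches(n_examples: int, pattern_rate: float = 0.01) -> float:
    """Expected number of examples containing a pattern that occurs at a fixed rate.

    Assumption: the rate does not change as the data set grows.
    """
    return pattern_rate * n_examples


# Hypothetical data set sizes, doubling each time.
sizes = [1_000, 2_000, 4_000, 8_000]
counts = [expected_matches(n) for n in sizes]

# Ratio between consecutive counts; a constant ratio equal to the size
# ratio means the count scales linearly with the data set size.
ratios = [counts[i + 1] / counts[i] for i in range(len(counts) - 1)]

for n, c in zip(sizes, counts):
    print(f"N = {n:>5}: about {c:.0f} examples contain the pattern")
print("growth ratios:", ratios)
```

Under that fixed-rate assumption, every doubling of N doubles the number of matching examples. If the logarithmic claim is about something else, such as the number of distinct common patterns rather than how often one pattern occurs, this sketch does not cover it.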