Ah, I see what you mean: the number of unique examples increases logarithmically with data size, which makes sense. Language, in this case, follows a power law.
I think your argument is that smaller datasets are okay because they contain "most" of what the larger datasets contain. But I think this power law implies the opposite. ML models can often reach 80-90% accuracy on a task. Unfortunately, these models often aren't that useful, because the missing 10% of accuracy matters a lot to users. So what the power law actually implies is that, to get that last 10% of gains, you need something like 10x the amount of data.
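To make that concrete, here's a quick sketch. Assume (hypothetically) that model error follows a power law in dataset size, err(N) = c * N^(-alpha); the constants `c` and `alpha` below are illustrative, not measured values. Inverting the formula shows how fast data requirements blow up as you chase the tail:

```python
def data_needed(target_err, c=1.0, alpha=0.5):
    """Invert err = c * N**(-alpha) to get N = (c / err)**(1 / alpha).

    Illustrative only: c and alpha are made-up constants, not fit to
    any real benchmark.
    """
    return (c / target_err) ** (1 / alpha)

n_90 = data_needed(0.10)  # data to reach 90% accuracy (10% error)
n_99 = data_needed(0.01)  # data to reach 99% accuracy (1% error)

# Cutting error by 10x costs 10**(1/alpha) times more data;
# for alpha = 0.5 that's a 100x multiplier.
print(n_99 / n_90)  # → 100.0
```

The exact multiplier depends entirely on the exponent, but the shape of the curve is the point: each additional slice of accuracy costs multiplicatively more data, not additively more.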