We are nowhere near out of data. We're just out of hyper-relevant modern data. There's probably something like 100-200TB of old books, newspapers, journal articles, magazines, and so on. For reference, GPT-3's training data was filtered down from roughly 45TB of raw text.
Going by order of magnitude alone, that seems pretty close to out of data, actually, if everything beyond it is the firehose of low-density, bulk-generated modern data.
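The "same order of magnitude" point can be checked with quick arithmetic. The sketch below takes the figures from the first comment at face value and assumes both are in the same unit (TB). It computes how many times larger the old-text estimate is than GPT-3's raw corpus, plus the base-10 log of that ratio (a log below 1 means less than one order of magnitude apart).

```python
import math

# Figures from the first comment; both assumed to be in TB.
OLD_TEXT_ESTIMATES = (100, 200)  # low and high estimate of old printed text
GPT3_RAW_CORPUS = 45             # raw text GPT-3's training data was filtered from


def ratio_and_magnitude(available: float, used: float) -> tuple[float, float]:
    """Return how many times larger `available` is than `used`, and its log10."""
    ratio = available / used
    return ratio, math.log10(ratio)


for available in OLD_TEXT_ESTIMATES:
    ratio, magnitude = ratio_and_magnitude(available, GPT3_RAW_CORPUS)
    print(f"{available}TB / {GPT3_RAW_CORPUS}TB = {ratio:.1f}x (log10 = {magnitude:.2f})")
```

Both ends of the estimate come out between 2x and 5x, so their log10 is below 1: the old-text pool is within the same order of magnitude as what GPT-3 started from.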