> it's that the dataset really is the 'secret sauce'

alwayshasbeen.jpg
There have been articles about how "data is the new oil" for a couple of decades now; the earliest reference I could find is from British mathematician Clive Humby in 2006 [0]. That it rings even more true in the age of LLMs is just another transformation of the fundamental data underneath.
[0] https://en.wikipedia.org/wiki/Clive_Humby#cite_ref-10