Snake eating its tail: how can synthetic data possibly work for training AI?

(tomdekan.com)

3 points · tomdekan · 10mo ago · 2 comments


Animal analogies! They're fun, aren't they?

I think synthetic data generation can bring leapfrog improvements to AI and finally get us to a point where the technology pays off.

However, leapfrogging has its risks: https://en.wikipedia.org/wiki/The_Scorpion_and_the_Frog

So, the idea is to gather a very large number of such scorpions (people who are only in it for the hype and will definitely sting you), and hold a tremendous scorpion barbecue on the other side of the river (when we finally discover some good use for AI).

fcpguru · 10mo ago

I thought synthetic training data was a different concept:

Use a current LLM to generate a bunch of fake Reddit posts, other websites, code, etc.: billions of artifacts, but with certain properties of our real world changed. You could make the sky red, and dogs able to talk, and dolphins as well; you get the idea. Anything goes. Generate not just a crazy world but all the artifacts from this world. Then train a new LLM on this crazy data. Now ask questions. And rather than silly changes like the sky being red, you could make really fundamental changes to the laws of physics and, from that, learn something new about the real rules.
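A minimal sketch of the first step described above: turning a "counterfactual world" into generation prompts for many kinds of artifacts. It assumes the generator is an LLM driven by plain-text prompts. `WorldSpec`, `ARTIFACT_TYPES`, and `build_prompts` are hypothetical names for illustration only, and both the actual model call and the training of the new LLM are left out.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class WorldSpec:
    """Hypothetical description of a counterfactual world: facts that differ from reality."""
    overrides: dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        # Render the altered facts as a bullet list the generator model can follow
        return "\n".join(f"- {k}: {v}" for k, v in sorted(self.overrides.items()))


# Hypothetical artifact kinds; a real pipeline would use many more
ARTIFACT_TYPES = ["reddit post", "news article", "Python script"]


def build_prompts(world: WorldSpec, topics: list[str]) -> list[str]:
    """Build one generation prompt per (artifact type, topic) pair."""
    prompts = []
    for artifact, topic in product(ARTIFACT_TYPES, topics):
        prompts.append(
            f"You live in a world where the following is true:\n{world.describe()}\n"
            f"Write a realistic {artifact} about {topic}. "
            "Treat these facts as ordinary; do not mention that the world is unusual."
        )
    return prompts


# Usage: the red-sky, talking-animals world from the comment above
world = WorldSpec({"sky colour": "red", "talking animals": "dogs and dolphins can talk"})
prompts = build_prompts(world, ["the weekend weather", "pet training"])
for p in prompts[:1]:
    print(p)
```

Each prompt would then be sent to the generator model, and the outputs collected as the synthetic training corpus. Keeping the world description in one place means every artifact is generated against the same altered facts, which is what makes the fake world internally consistent.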
