I’m not involved in the space, but it seems to me that exposing a model, particularly a massive one, to a corpus of text like a book during training would have very minimal impact. I’m aware that people have been able to pull data ‘out of the shadows’ of the training data, but to my mind a model being mildly influenced by the weights between different words in this text hardly constitutes hard recall; if anything, it now ‘knows’ a little of the linguistic style of the author.
How far off am I?