> How much do language models memorize?
— https://arxiv.org/abs/2505.24832
— https://news.ycombinator.com/item?id=44171363
It shows that models are limited in how much they can memorise (~3.6 bits per parameter), and once that capacity is filled, the model starts to generalise rather than memorise.
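For a sense of scale, here's a minimal back-of-envelope sketch in Python (the ~3.6 bits/parameter figure is from the paper; the model sizes are arbitrary examples picked purely for illustration):

    # Rough upper bound on raw memorised content implied by ~3.6 bits/parameter.
    BITS_PER_PARAM = 3.6  # capacity estimate reported in the paper

    def capacity_gigabytes(n_params: float) -> float:
        """Convert a parameter count into an approximate memorisation budget in GB."""
        return n_params * BITS_PER_PARAM / 8 / 1e9

    # Example model sizes, just to show the order of magnitude.
    for n_params in (1e9, 8e9, 70e9):
        print(f"{n_params / 1e9:.0f}B params -> ~{capacity_gigabytes(n_params):.1f} GB of memorised data")

So even a fairly large model has, at most, a few tens of gigabytes of raw memorisation budget, far less than its training set, which is consistent with the paper's point that past that limit the model has to generalise.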
I mean humans don't forget copyrighted information. We just typically adjust it enough (some of the time) to avoid getting a copyright strike while still modifying it in some useful way.
We don't forget 'private' information either. We might not tell other people that information, but it still influences our thoughts.
The idea of a world where we make AI minds forget vast amounts of information that humans deal with every day is concerning and dystopian to me.
New works in familiar styles are something I can't wait for. The idea that the best Beethoven symphony hasn't been composed yet, or that the best Basquiat hasn't been painted yet, or that if the tech ever gets far enough, Game of Thrones might actually be done properly with the same actors, is a pretty mouthwatering prospect. Also styles we haven't discovered, that AI can anticipate. How's it to do that without a full understanding of culture? Hobbling the delight it could bring generally for the sake of protected classes will just make the tech less human and a lot less exciting.
IMO the only reason there's even a question about whether LLMs can legally be trained on copyrighted works without permission is that the training is being done by (agents working on behalf of) rich people. If you or I scraped up every copyrighted work we could get our hands on without ever asking permission, trained an LLM on it, and then tried to sell access to the result? Just ask Aaron Swartz how that sort of thing goes, and what he did was orders of magnitude less serious.
Humans don't forget copyrighted material but we also don't normally memorize it. It takes substantial time and effort to be able to reproduce copyrighted material with just your brain.
> As far as copyrighted and artistic works go, I've never fully understood what the objection is …

> But if that's accepted, then for fairness it would have to be extended to every other profession which stands to be wiped out by AI, which would be daft. …

> Hobbling the delight it could bring generally for the sake of protected classes will just make the tech less human and a lot less exciting.
So let me get this straight: you want to ruin everyone's livelihoods so you can have a fancier toy to play with?
When your life is ruined and you can't make a living, you'll have the answers you desire and understand the objections to why you can't have fancier toys.
But here's the thing: with the way the world is going at the moment, not being able to make a living is going to be the least of the worries for you, and for everyone else who feels the way you do, if y'all get your way.
People don’t like having their livelihoods taken away, and when you threaten the livelihoods of their children… people tend towards violence.
I really wish there were a more polite way to put this. Alas, what you're proposing is all-out war, and for what? A better Game of Thrones?
While, yes, you can argue the slippery slope, it may be advantageous to flag certain training material as exempt. We as humans often make decisions without perfect knowledge, and "knowing more" isn't a guarantee of better outcomes, given the types of information consumed.
BTW, I don't really understand what "social pressure" and "shame" have to do with your story. In my book, the person with a good memory isn't to blame. They're just demonstrating a security issue, which is a good thing.