If I can hire an employee who draws on knowledge they learned from copyrighted textbooks, why can't I hire an AI which draws on knowledge it learned from copyrighted textbooks? What makes that argument "wacky" in your eyes?
The other is a person learning from a copyrighted textbook in the legally protected manner, as part of the audience the textbook was written for.
"Can you elaborate on how it's not comparable?"
The process of individual people interacting with their culture is vastly different from the process used to train large language models. In what ways do you think these processes have anything in common?
"It seems obvious to me that it is -- they both learn and then create -- so what's the difference?"
This doesn't seem obvious to me (obviously)! Maybe you can argue that an LLM "learns" during training, but that ceases once training is complete. For sure, there are work-arounds that meet certain goals (RAG, fine-tuning); maybe your already vague definition of "learning" could be stretched to include these? Still, comparing this to how people learn is pretty far-fetched. AFAICT, there's no literature supporting the view that there's any commonality here; if you have some I would be very interested to read it. :-)
Do they both create? I suspect not; an LLM is parroting back data from its training set. We've seen many studies showing that tested LLMs perform poorly on novel problem sets. This article was posted just this week:
https://news.ycombinator.com/item?id=42565606
The jury is still out on the copyright issue; from the perspective of US law we'll have to wait on this one. Still, it's clear that an LLM can't "create" in any meaningful way.
And so on and so forth. How is hiring an employee at all similar to subscribing to an OpenAI ChatGPT plan? Wacky indeed!
But if they're learning from the same kinds of materials, and producing the same kind of output, then obviously the comparison can be made. And your idea that LLMs don't create seems obviously false.
So I have to conclude the two seem comparable, and someone would have to show why different legal principles around copyright ought to apply, when it's a simple question of input/output. Why should it matter if it's a human or algorithm doing the processing, from a copyright perspective? Nothing "wacky" about the question at all.
Strangely like the situation itself.
The question really comes down to: how can we guarantee a model is influenced by an input rather than memorising it?
And then is a human who is influenced simply relying on a faulty or less than perfect memory?
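One crude way to make that question concrete: measure how much of a model's output reproduces its source verbatim. This is only a toy sketch (the function names and the n-gram approach are my own illustration, not how any lab actually audits models), but it shows the idea that "memorisation" can be operationalised as verbatim overlap at some granularity:

```python
def ngrams(tokens, n):
    """All contiguous n-token sequences in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def memorisation_score(output_text, training_text, n=5):
    """Fraction of the output's n-grams that appear verbatim in the
    training text: 1.0 means wholesale copying at this granularity,
    0.0 means no verbatim n-gram overlap at all."""
    out = output_text.split()
    train_grams = ngrams(training_text.split(), n)
    out_grams = [tuple(out[i:i + n]) for i in range(len(out) - n + 1)]
    if not out_grams:
        return 0.0
    hits = sum(1 for g in out_grams if g in train_grams)
    return hits / len(out_grams)
```

Of course, this immediately runs into your second question: a human essayist paraphrasing a half-remembered textbook would also score somewhere between 0 and 1 on a measure like this, so overlap alone can't distinguish "influence" from "faulty memory."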