dkjaudyeqooe 1y ago (0 points)

> simply training a model on illegally distributed text should not be copyright infringement

You can train a model on copyrighted text; you just can't distribute the output in any way without violating copyright (edit: depending on the other fair use factors).

One of the big problems is that training is a mechanical process, so there is a direct line between the copyrighted works and the model's output, regardless of the form of the output. Just on those terms it is very likely to be a copyright violation. Even if they don't reproduce substantive portions, what they do reproduce is a derived work.

saulpw 1y ago

If that mechanical process is not reversible, then it's not a copyright violation. For instance, I can compute the SHA256 hashes for every book in existence and distribute the resulting table of (ISBN, SHA256) and that is not a copyright violation.
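A minimal sketch of such a table, assuming each book is available as plain text. The function names, ISBNs, and contents are made-up placeholders, not real data:

```python
import hashlib


def sha256_of_text(text):
    """Return the hex SHA-256 digest of a text, encoded as UTF-8."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_hash_table(books):
    """Map {isbn: full_text} to a sorted list of (isbn, digest) pairs.

    Only the fixed-length digests are kept; the texts themselves are
    not stored and cannot be recovered from the digests.
    """
    return sorted((isbn, sha256_of_text(text)) for isbn, text in books.items())


if __name__ == "__main__":
    # Hypothetical ISBNs and contents, for illustration only.
    books = {
        "000-0-00-000000-1": "Full text of the first book...",
        "000-0-00-000000-2": "Full text of the second book...",
    }
    for isbn, digest in build_hash_table(books):
        print(isbn, digest)
```

Every digest is 64 hex characters regardless of the book's length, so the table carries none of the original text.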

dkjaudyeqooe (OP) 1y ago

That's actually within the other fair use factors. So your hash table is fair use because it's transformative and doesn't substitute for the original work.

I edited my post to make it a bit clearer.

anticensor 1y ago

It's actually even less than fair use; it's non-copyright use: one-way hashes are intentionally designed to eliminate the creative element and output random-looking data.

gruez 1y ago

> One of the big problems is that training is a mechanical process, so there is a direct line between the copyrighted works and the model's output, regardless of the form of the output. Just on those terms it is very likely to be a copyright violation. Even if they don't reproduce substantive portions, what they do reproduce is a derived work.

Google's thumbnail generation and book scanning are both arguably "mechanical", and both have been ruled fair use.

aoanevdus 1y ago

What’s a “mechanical process”? If I read The Lord of the Rings and it teaches me to write Star Wars, is that a mechanical process? My brain is governed by the laws of physics, right?

What if I’m a simulated brain running on a chip? What if I’m just a super-smart human and instead of reading and writing in the conventional way, I work out the LLM math in my head to generate the output?

dkjaudyeqooe (OP) 1y ago

Anything a machine does. You can simulate whatever you like, but under the law it's not human, so it's mechanical.
