I think they would need to have some explicit contract every time they want to s...

zelphirkalt · on Feb 7, 2025

If you are able to buy that book, it has already passed through the publisher's hands, so I would think the publisher was OK with those terms, and limiting the usage of the text may in fact be effective. If it was self-published, then even more so.

echoangle · on Feb 7, 2025

But the license restriction would have to apply both to the publisher and the customer.

If I go to the bookstore, buy the book, make a scan, and train an LLM with it, how would you enforce your license as an author? The customer never knew that he wasn’t allowed to train LLMs on it.

Edit: I think I misunderstood the original comment; I thought the idea was to sell books and restrict their use for LLM training. If we’re only talking about stuff that’s publicly released, the restriction should be possible.

zelphirkalt · on Feb 7, 2025

Whether you make a scan of it or not, the license applies to the IP, I guess (IANAL).

Whether the shop makes a scan should not affect you as the buyer of the actual book. What does the scan have to do with you?

Whether or not the author learns about that scan, or about some LLM being trained on it, does not change its legality.

echoangle · on Feb 7, 2025

But the license doesn’t apply to me as a customer if I can’t be expected to even notice it. If I buy a book in a bookstore, no one would assume that training LLMs on it would be explicitly forbidden. And adding a note to the book would probably not be binding because no one is expected to read the legal notice in a book.

zelphirkalt · on Feb 8, 2025

Ah, I assumed that the clauses regarding use in LLM training are printed somewhere inside the book.

EMIRELADERO · on Feb 8, 2025

It would still be unenforceable because there's no consideration.

There is nothing of value that the license gives me that I wouldn't already have if the contract didn't exist. I can already read the book, merely by having it in front of me.

zelphirkalt · on Feb 8, 2025

How does that give you the right to train an LLM on it?

Or are we talking about training an LLM on it and never releasing that LLM to anyone ever? Then I guess it wouldn't matter. But if that LLM is released to anyone, shouldn't the author of the book have a say on it?

EMIRELADERO · on Feb 8, 2025

> How does that give you the right to train an LLM on it?

Fair use gives me that right, not a contract or license.

zelphirkalt · on Feb 8, 2025

Whether that falls under fair use is highly debatable.

EMIRELADERO · on Feb 9, 2025

It's going through the courts right now. We'll probably have an answer in a year or two.

wyldfire · on Feb 10, 2025

I felt for a long time that it should be fair use. If an LLM can abstract what it learns from the copyrighted work, then that seems "fair" because that's what humans do.

But ... as I've thought about it more, it doesn't really feel just to me. The kind of value reaped from the works seems to suggest that the creator is due some portion of that value. Also, in practice, there's just an absolutely enormous amount of knowledge that can be consumed from the public domain. Even if Meta, OpenAI and friends decided to license only a small handful of the long-term archives of some globally-read newspapers, they could get very broad and deep knowledge about the events, trends, terms of the last century to fill in a lot of gaps.