Hacker News

I fail to see how the argument is ridiculous; and I'll bet that a jury would find the idea that "there is a copy inside" at least reasonable, especially if you start with the premise that "the machine is not a human being."

What you're left with is a machine that produces "things that strongly resemble the original, that would not have been produced, had you not fed the original into the machine."

The fact that there's no "exact copy inside" the machine seems a lot like splitting hairs; like saying "Well, there's no paper inside the hard drive so the essence of what is copyable in a book can't be in it"



If I made a bot that read Amazon reviews and then output a meta-review for me, would that be a violation of Amazon's copyright? (I'm sure somewhere in the Amazon ToS they claim ownership rights over reviews.)

If it output those reviews verbatim, sure, I can see the issue: the model is overfitting. But if I tweak the model or filter the output to avoid verbatim excerpts, does an Amazon lawyer have solid footing for a copyright-infringement lawsuit?
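A minimal sketch of what "filter the output to avoid verbatim excerpts" could mean in practice: reject any output that shares a long word n-gram with the source reviews. The function names, tokenization, and the 8-word threshold are illustrative assumptions, not a description of any real system.

```python
def ngrams(text, n=8):
    """Return the set of lowercase word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def has_verbatim_overlap(output, sources, n=8):
    """True if the output shares any n-word run with any source text."""
    out_grams = ngrams(output, n)
    return any(out_grams & ngrams(src, n) for src in sources)

# An output that copies an 8-word run from a source is flagged;
# a short paraphrase with no long shared run is not.
src = "this product exceeded all of my expectations and I would happily buy it again"
print(has_verbatim_overlap("it really exceeded all of my expectations and I would recommend", [src]))
print(has_verbatim_overlap("reviewers seem to like it overall", [src]))
```

Of course, a filter like this only catches literal copying; whether near-verbatim paraphrase still infringes is exactly the legal question being debated here.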


As far as I understand, according to current copyright practice: if you sing a song that someone else has written, or pieces of it, you are in violation. This is also the case if you switch out the instrumentation completely, say, play trumpet instead of guitar, or have a male choir sing a female line. And making a medley of many such parts does not automatically stop it from being a violation either. So we do have examples of things very far from a verbatim copy being considered violations.


Generally yes. You're talking about "derivative works."


Having exact copies of the samples inside the model weights would be an extremely inefficient use of space, and the model would not generalize. Unless it generated a copy so close to the original that it would violate copyright law if used, I wouldn't find it very reasonable to think that there is a memorized copy inside the model weights somewhere.


An MP3 file is a lossy copy, but is still copyright infringement.

Copyright infringement doesn't require exact copies.


I didn't say it takes an exact copy for copyright infringement.


A program that can produce copies is the same as a copy. How that copy comes into being (whether generated by an algorithm or read from storage) is related, but not relevant.


>A program that can produce copies is the same as a copy.

A program that always produces copies is the same as a copy. A program that merely can produce copies categorically is not.

The Library of Babel[1] can produce copyrighted works, and for that matter so can any random number generator, but in almost every normal circumstance it will not. The same is true for LLMs and diffusion models. While there are some circumstances in which you can produce copies of a work, in natural use that happens only for things that appear thousands of times in the training set -- by and large, famous works in the public domain, or cultural touchstones so iconic that they're essentially genericized (one notable copyrighted example is officially released promo material for movies).

[1] https://libraryofbabel.info/


A human illustrator can also copy existing works, yet they are not penalized for making other, non-copying works. The output of an AI needs to be considered independently of its input. Further, the folly of copyright ought to be considered as well, since no work -- whether solely human in origin (such as speech, unaccompanied song, dance, etc.) or built with technological prosthesis/instrumentality -- is ever made in a cultural vacuum. All art is interdependent. Copyright exists to allow art (in the general sense) to be compensated for. But copyright has grown into a voracious entitlement for deeply monied interests, and has long intruded upon the commons and fair use.


human illustrator != machine owned by a Moloch powered mega tech corp


Yeah, that's right. I doubt that a model would generate an image or text so close to a real one that it violates copyright law just by pure chance; the image/text space is incredibly large.


But you're a tech person. I'm trying to think of this from the point of view of e.g. a likely potential jury.

Again: Imagine two AI machines, different in one way: One of them has been fed "Article X" and the other hasn't.

You press buttons on the machine(s) in the same way.

The machine that was fed "Article X" spits out something that looks like Article X, and the one that wasn't, doesn't.

The magic inside, I don't think will much matter.


>But you're a tech person. I'm trying to think of this from the point of view of e.g. a likely potential jury.

Courts can call experts to testify on matters requiring specialized knowledge or expertise.


Absolutely; but again, if I'm the lawyer on the other side, I'm pretty confident that I'm beating any "expert" on this with the simple logic of:

- You put thing into the machine

- You press buttons, it makes obvious derivative work

- You don't put thing into the machine, and it can't do that anymore.

There is "something" in there GENERATING COPIES and we see exactly where it came from, even if we can't identify it in the code or whatever.


But you already have the same situation with people -- 'what you're left with is an artist that produces things that strongly resemble the original, that would not have been produced, had the artist not studied the original work'. Yes it is ridiculous.


Right. But broadly, the law also strongly tends to distinguish humans from machines in this space. (Which, imho, is a very good idea)


I am curious about models like EnCodec or SoundStream. They are essentially codecs informed by the music they are meant to compress, in order to achieve insane compression ratios. The decompression process is indeed generative, since part of the information that is meant to be decoded lives in the decoder weights. Does that pass the smell test from copyright law's perspective? I believe such a decoder model powers GPT-4o's audio decoding.
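A toy illustration of that point, not the real EnCodec/SoundStream API: in a vector-quantized codec, the bitstream stores only codebook indices, and the rest of the information needed to reconstruct the signal lives in the learned codebook (the decoder weights). Real codecs add convolutional encoders/decoders and residual VQ stages on top; everything here (codebook size, frame length) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these 16 vectors were learned from a music corpus: they ship
# with the codec (as model weights), not inside the compressed file.
codebook = rng.normal(size=(16, 8))  # 16 codewords, 8 samples per frame

def encode(signal):
    """Split the signal into frames; keep only the nearest codeword index."""
    frames = signal.reshape(-1, 8)
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)  # 4 bits per 8-sample frame

def decode(indices):
    """Reconstruction is a pure table lookup into the learned codebook."""
    return codebook[indices].reshape(-1)

signal = rng.normal(size=64)
codes = encode(signal)       # 8 indices in [0, 16)
recon = decode(codes)        # 64 reconstructed samples
print(codes.shape, recon.shape)
```

The copyright question in the comment above then becomes: if the codebook was fit to specific recordings, how much of those recordings is "in" the weights?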



