Stable Audio Open

Uehreka · on June 5, 2024

> The new model was trained on audio data from FreeSound and the Free Music Archive. This allowed us to create an open audio model while respecting creator rights.

This feels like the “Ethereum merge moment” for AI art. Now that there exists a prominent example with the big ethical obstacle (Proof of Work in the case of Ethereum, nonconsensual data-gathering in the case of generative AI) removed, we can actually have interesting conversations about the ethics of these things.

In the past I’ve pushed back on people who made the argument that “generative AI intrinsically requires theft from artists”, but the terrible quality of models trained on public domain data made it difficult to make that argument in earnest, even if I knew I was right in the abstract.

Workaccount2 · on June 5, 2024

The idea that AI trained on artist created content is theft is kind of ridiculous anyway. Transformers aren't large archives of data with needles and thread to sew together pieces. The whole argument is meant to stifle an existential threat, not to halt some illegal transgression. If they cared about the latter a simple copyright filter on the output of the models would be all that's needed.

jrm4 · on June 5, 2024

I fail to see how the argument is ridiculous; and I'll bet that a jury would find the idea that "there is a copy inside" at least reasonable, especially if you start with the premise that "the machine is not a human being."

What you're left with is a machine that produces "things that strongly resemble the original, that would not have been produced, had you not fed the original into the machine."

The fact that there's no "exact copy inside" the machine seems a lot like splitting hairs; like saying "Well, there's no paper inside the hard drive so the essence of what is copyable in a book can't be in it"

Workaccount2 · on June 5, 2024

If a made a bot that read amazon reviews and then output a meta-review for me, would that be a violation of amazon's copyright? (I'm sure somewhere in the amazon ToS they claim all ownership rights of reviews).

If it output those reviews verbatim, sure I can see the issue, the model is over fitting. But if I tweak the model or filter the output to avoid verbatim excerpts, does an amazon lawyer have a solid footing for a "violation of copyright" lawsuit?

jononor · on June 5, 2024

As far as I understand, according to current copyright practices: If you sing a song that someone else has written, or pieces thereof, you are in violation. This is also the case of you switch out the instrumentation completely, say play trumpet instead of guitar, or a male choir sings a female line. If on would make a medley many such parts, it is not automatically not violation anymore either. So we do have examples of things being very far from verbatim copy, being considered violations.

jrm4 · on June 6, 2024

Generally yes. You're talking about "derivative works."

GaggiX · on June 5, 2024

Having exact copies of the samples inside the model weights would be an extremely inefficient use of space, and also it would not generalize, unless it generated a copy so close to the original that it would violate copyright law if used, I wouldn't find it very reasonable to think that there is a memorized copy inside the model weights somewhere.

Arainach · on June 5, 2024

An MP3 file is a lossy copy, but is still copyright infringement.

Copyright infringement doesn't require exact copies.

GaggiX · on June 5, 2024

I didn't say it takes an exact copy for copyright infringement.

ziofill · on June 5, 2024

A program that can produce copies is the same as a copy. How that copy comes into being (whether out of an algorithm or read from a support) is related, but not relevant.

LordDragonfang · on June 5, 2024

>A program that can produce copies is the same as a copy.

A program that always produces copies is the same as a copy. A program that merely can produce copies categorically is not.

The Library of Babel[1] can produce copyrighted works, and for that matter so can any random number generator, but in almost every normal circumstance will not. The same is true for LLMs and diffusion models. While there are some circumstance that you can produce copies of a work, in natural use that's only for things that will come up thousands of times in its training set -- by and large, famous works in the public domain, or cultural touch-stones so iconic that they're essentially genericized (one main copyrighted example are the officially released promo materials for movies).

[1] https://libraryofbabel.info/

waffletower · on June 6, 2024

A human illustrator can also copy existing works. As a result they are not criminalized for making other non-copies. The output of an AI needs to be considered independently of its input. Further, the folly of copyright ought to be also considered, since no work -- whether solely human in origin (such as speech, unaccompanied song, dance etc.) or built with technological prosthesis/instrumentality -- is ever made in a cultural vacuum. All art is interdependent. Copyright exists to allow art (in the general sense) to be compensated for. But copyright has grown into a voracious entitlement for deeply monied interests, and has long intruded upon the commons and fair use.

FractalHQ · on June 7, 2024

human illustrator != machine owned by a Moloch powered mega tech corp

GaggiX · on June 5, 2024

Yeah that's right, I doubt that a model would generate an image or text so close to a real one to violate copyright law just by pure chance, the image/text space is incredibly large.

jrm4 · on June 6, 2024

But you're a tech person. I'm trying to think of this from the point of view of e.g. a likely potential jury.

Again: Imagine two AI machines, different in one way: One of them has been fed "Article X" and the other hasn't.

You press buttons on the machine(s) in the same way.

The machine that was fed "Article X" spits out something that looks like Article X, and the one that wasn't, doesn't.

The magic inside, I don't think will much matter.

GaggiX · on June 6, 2024

>But you're a tech person. I'm trying to think of this from the point of view of e.g. a likely potential jury.

Courts can call experts to testify on matters requiring specialized knowledge or expertise.

jrm4 · on June 7, 2024

Absolutely; but again, if I'm the lawyer on the other side, I'm pretty confident that I'm beating any "expert" on this with the simple logic of:

- You put thing into the machine

- You press buttons, it makes obvious derivative work

- You don't put thing into the machine, and it can't do that anymore.

There is "something" in there GENERATING COPIES and we see exactly where it came from, even if we can't identify it in the code or whatever.

waffletower · on June 6, 2024

But you already have the same situation with people -- 'what you're left with is an artist that produces things that strongly resemble the original, that would not have been produced, had the artist not studied the original work'. Yes it is ridiculous.

jrm4 · on June 6, 2024

Right. But broadly, the law also strongly tends to distinguish humans from machines in this space. (Which, imho, is a very good idea)

lisperforlife · on June 5, 2024

I am curious about models like encodec or soundstream. They are essentially meant to be codecs informed by the music they are meant to compress to achieve insane compression ratios. The decompression process is indeed generative since a part of the information that is meant to be decoded is in the decoder weights. Does that pass the smell test from a copyright law's perspective? I believe such a decoder model is powering gpt-4o's audio decoding.

mey · on June 5, 2024

Our copyright model isn't sufficient yet. Is putting a work through a training/model sufficient to clear the transformative use bar? That doesn't make you safe from Trademarks. If the model can produce outputs on the other side that aren't sufficiently transformative then that single instance is a copyright violation.

Honestly, instead of trying to cleanup the output, it's much safer to create a licensed input corpus. People haven't because it's expensive and time consuming. Every time I engage with an AI vendor, my first question is do you indemnify from copyright violations of your output. I was shocked that Google Gemini/Bard only added that this year.

jncfhnb · on June 5, 2024

Nothing will ever protect you from trademark violations because trademarks can be violated completely by accident without knowledge of the real work. Copying is not the issue.

ffsm8 · on June 5, 2024

I'm honestly surprised AI-washing hasn't become way more widespread then it is at this point.

I mean recording a good song is hard. Generating a good song almost impossible. But my gut feeling would've been that recreating a popular song for plausible deniability would be a lot easier.

Same with republishing bestselling books and related media. (I.e. take Lord of the rings and feed it paragraph for paragraph into an LLM that you've prompted to rephrase each to a currently bestselling author.)

kimixa · on June 5, 2024

I think the definition between "Lossy Compression" and "Trained AI" is... vague according to the current legal definitions. Or even "lossless" in some cases - as shown by people being able to get written articles output verbatim.

While the extremes are obvious, there's a big stretch of gray in the middle. A similar issue occurs in non-AI art, the difference between inspiration and tracing/copying isn't well defined either, but the current method of dealing with that (being on a case-by-case basis and a human judging the difference) clearly cannot scale to the level that many people intend to use these tools.

cthalupa · on June 5, 2024

Has anyone been able to actually get a verbatim copy of a written article? The NYT got a ~100 word fragment made up of multiple snippets of a ~15k word article, with the different snippets not even being in order. (The Times had to re-arrange the snippets to match the article after the fact)

I am simply not aware of anyone successfully doing this.

kimixa · on June 5, 2024

The amount of content required to call it a "Copy" is also a gray area.

Same with the idea of "prompting" and the amount required to generate that copywritten output - again there's the extremes of "The prompt includes copywritten information" to "Vague description".

Arguably some of the same issues exist outside AI, just it's accessibility, scale, and lack of a "Legal Individual" on one side complicates things. For example, if I describe Micky Mouse sufficiently accurately to an artist they reproduce it to the degree it's considered copyright infringement, is it me or the artist that did the infringement? Then what if the artist /had/ seen the previously copywritten artwork, but still produced the same output from that same detailed prompt?

immibis · on June 5, 2024

What's good for the goose is good for the gander. It may or may not be like theft, but either way, if one of us trained an AI on Hollywood movies, you best believe we'd get sued for eleventy billion dollars and lose. It's only fair that we hold corporations to the same standard.

JohnKemeny · on June 5, 2024

I think you should read the case material for NY Times v OpenAI and Microsoft.

It literally says that within ChatGPT is stored, verbatim, large archives of NY Times articles and that they were able to retrieve them through their API.

Workaccount2 · on June 5, 2024

..which makes no sense. It is either an argument of ignorance or of purposeful deceit. There is no coherent data corpus (compressed or not) in ChatGPT. What is stored are weights that create a string of tokens that can recreate excerpts data that it was trained on, with some imperfect level of accuracy.

Which I agree is problematic, and OpenAI doesn't have the right to disseminate that.

But that doesn't mean OpenAI doesn't have the right to train on it.

Content creators are doing a purposeful slight of hand to confabulate "outputting copyrighted data" with "training on copyrighted data".

It's illegal for me to read an NYT article and recite it from memory onto my blog.

It's not illegal for me to read an NYT article and write my own summary of the article's contents on my blog. This has been true forever and has forever been a staple in new content creation.

SahAssar · on June 5, 2024

> Content creators are doing a purposeful slight of hand to confabulate "outputting copyrighted data" with "training on copyrighted data".

I don't think so, I think it's usually argued as two different things.

The "training on copyrighted data" argument is usually that we never licensed this work for this sort of use and it is different enough from previously licensed uses that it should be treated differently.

The "outputting copyrighted data" argument is somewhat like your output is so similar as to constitute a (at least) partial copy.

Another argument is that licensed data is whitewashed by being run through a model. So you could have GPL licensed code that is open source run through a model and then output exactly the same but because it has been outputted by the model it is considered "cleaned" from the GPL restrictions. Clearly this output should still be GPL:ed.

> It's not illegal for me to read an NYT article and write my own summary of the article's contents on my blog. This has been true forever and has forever been a staple in new content creation.

What if I compress the NYT article with gzip? What if I build a LLM model that always replies with the full article within 99% accuracy? Where is the line?

This is not a technical issue, we need to decide on this just like we did with copyright, trademarks, etc. Regardless of what you think this is not a non-issue and we cant use the same rules as we did up until now unless we treat all ML systems as either duplication or humans and neither seems to solve the issues.

freedomben · on June 5, 2024

> Another argument is that licensed data is whitewashed by being run through a model. So you could have GPL licensed code that is open source run through a model and then output exactly the same but because it has been outputted by the model it is considered "cleaned" from the GPL restrictions. Clearly this output should still be GPL:ed.

I don't think anybody is making that argument. The NY Times claims to have gotten ChatGPT to spit out NY Times articles verbatim but there is considerable doubt about that. Regardless, everyone agrees that a verbatim (or close to) copy is copyright violation, even OpenAI. Every serious model has taken steps to prevent that sort of thing.

SahAssar · on June 5, 2024

Both chatGPT and copilot will happily spit out blocks of code without any form of license. When you say "Every serious model has taken steps to prevent that sort of thing" do you mean they are hiding it or really changing the training data?

megaman821 · on June 6, 2024

How much code is needed for copyright to take effect? A whole program, a file, a block, a line, a variable name? Since there isn't a legally accepted answer yet, I am not sure what can be done other than litigating cases where it goes too far.

SahAssar · on June 6, 2024

Would this example be over that line for you? https://x.com/mitsuhiko/status/1410886329924194309

I think most people would agree that function is copyrightable if recreated verbatim.

Philip-J-Fry · on June 5, 2024

When you describe ChatGPT as just a model with weights that can create a string of tokens, is it any different from any lossless compression algorithm?

I'm sure if I had a JPEG of some copyrighted raw image it could still be argued that it is the same image. JPEG is imperfect, the result you get is the same every time you open it but it's not the same as the original input data.

ChatGPT would give you the same output every time, and it does if you turn off the "temperature" setting. Introduce a bit of randomness into a JPEG decoder and functionally what's the difference? A slightly different string of tokens for ChatGPT versus a slightly different collection of pixels for a JPEG.

CyberDildonics · on June 5, 2024

Did you mean lossy compression algorithm? That would make sense.

bckr · on June 5, 2024

> There is no coherent data corpus (compressed or not) in ChatGPT.

I disagree.

If you can get the model to output an article verbatim, then that article is stored in that model.

Just because it’s not stored in the same format is meaningless. It’s the same content regardless of whether it’s stored as plaintext, compressed text, PDF, png, or weights in a model.

Just because you need an algorithm such as a specialized prompt to retrieve this memorized data, is also irrelevant. Text files need to be interpreted in order to display them meaningfully, as well.

cthalupa · on June 5, 2024

> If you can get the model to output an article verbatim, then that article is stored in that model.

You can't get it to do that, though.[1]

The NYT vs OpenAI case, if anything, shows that even with significant effort trying to get a model to regurgitate specific work, it cannot do it. They found articles it had overfit on due to snippets being reposted elsewhere across the internet, and they could only get it to output those snippets, and not in correct order. The NYT, knowing the correct order, re-arranged them to fit the ordering in the article.

Even doing this, they were only able to get a hundred or so words out of the 15k+ word articles.

No one who knows anything about these models disagrees that overfitting can cause this sort of behavior, but the overwhelming majority of the data in these models is not overfit and they take a lot of care to resolve the issue - overfitting isn't desirable for general purpose model performance even if you don't give a shit about copyright laws at all.

People liken it to compression, like the GP mentioned, and in some ways, it really is. But in the most real sense, even with the incredibly efficient "compression" the models do, there's simply no way for them to actually store all this training data people seem to think is hidden in there, if you just prompt it the right way. The reality is only the tiniest fraction of overfit data can be recovered this way. That doesn't mean that the overfit parts can't be copyright infringing, but that's a very separate argument than the general idea that these are constantly putting out a deluge of copyrighted material.

(None of this goes for toy models with tiny datasets, people intentionally training models to overfit on data, etc. but instead the "big" models like GPT, Claude, Llama, etc.)

1. https://fingfx.thomsonreuters.com/gfx/legaldocs/byvrkxbmgpe/...

bckr · on June 5, 2024

> The NYT, knowing the correct order, re-arranged them to fit the ordering in the article.

> Even doing this, they were only able to get a hundred or so words out of the 15k+ word articles.

OK, that’s less material than I believed, which shows the details matter. But we agree that the overfit material, while limited, is stored in the model.

Of course, this can be (and surely is) mitigated by filtering the output, as long as the product is the output and not the model itself.

semi · on June 5, 2024

>Just because you need an algorithm such as a specialized prompt to retrieve this memorized data, is also irrelevant.

I disagree. Granted I'm a layman and not a lawyer so I have no clue how the court feels. But I can certainly make very specialized algorithms to produce whatever output I want from whatever input I want, and that shouldn't let me declare any input as infringing on any rights.

For the reducto ad absurdum example: I demand everyone stops using spaces, using the algorithm 'remove a space and add my copyrighted text' it produces an identical copy of my copyrighted text.

For the less absurd example.. if I took any clean model without your copyrighted text, and brute forced prompts and settings until I produced your text, is your model violating the copyright or is my inputs?

bckr · on June 6, 2024

Well, I think this is why it’s not settled yet. However, the law depends on many reasonability tests.

neuralRiot · on June 5, 2024

> It's not illegal for me to read an NYT article and write my own summary of the article's contents on my blog. This has been true forever and has forever been a staple in new content creation.

It’s not that clear-cut. It falls into the “Fair use doctrine”The cose 107 of the US copyright law states that the resolutiodepends on>

> (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.

Another thing we need to consider is that the law was redacted with the human mind limitations as a unconcious factor, (i.e not many people would be able to recite War and peace verbatim from memory). This just brings up the fact that copyright law needs a complete re-think.

__loam · on June 6, 2024

Using the product of someone's labor to build for profit systems that are trying to put them out of work remains fucked up. And no, a simple copyright filter isn't the problem. The problem is that the work was used without permission.

QuantumGood · on June 5, 2024

NY Times v OpenAI and Microsoft says the opposite, that verbatim, large archives of NY Times articles were retrieved via API. This may or may not matter to how LLMs work, but "large archive" seems accurate, other than semantic arguments (e.g. "Compressed archive" may be semantically more accurate).

cthalupa · on June 5, 2024

> NY Times v OpenAI and Microsoft says the opposite, that verbatim, large archives of NY Times articles were retrieved via API.

This does not match my understanding of the information available in the complaint. They might claim they were able to do this, but the complaint itself provides some specific examples that OpenAI and Microsoft discuss in a motion to dismiss... and I think the motion does a very strong job of dismantling that argument based on said examples.

https://fingfx.thomsonreuters.com/gfx/legaldocs/byvrkxbmgpe/...

tomcam · on June 5, 2024

Yet before “safeguards” were added a prompt could say “in the style of Studio Ghibli” and you could get exactly that.

Would it be possible if Studio Ghibli images had not been used in the training?

semi · on June 5, 2024

if it was trained on a sufficient amount of fan art made in the studio Ghibli style and tagged as such, yes.

otherwise those would just be unknown words, same as asking an artist to do that without any examples.

though I am curious how performance would differ between training on only actual studio Ghibli art, only fan art, or a mix. Maybe the fan art could convey what we expect 'studio Ghibli style' to be even more, whereas actual art from them could have other similarities that that tag conveys.

Unai · on June 5, 2024

I don't understand. If I make a painting (hell, or a whole animated movie) in the style of Studio Ghibli, am I infringing their copyright? I don't think so. A style is just an idea, if you want to protect an idea to the point of no one even getting inspired by it just don't let it out of your brain.

If the produced work is not a copy, why does it matter if it was generated by a biological brain or by a mechanical one?

__loam · on June 6, 2024

When will programmers get it through their thick skulls that an artist taking inspiration from a style and a well funded tech corporation downloading 400 million images to train on are two different things that shouldn't be compared. GPT is not a brain and humans correctly have different rights than computing systems.

Unai · on June 6, 2024

I don't know how long you have been in HN, but I would recommend you familiarise with / refresh yourself on the community guidelines (https://news.ycombinator.com/newsguidelines.html). While HN looks and works similar to Reddit (and other such online communities), the tone and culture is not the same.

__loam · on June 6, 2024

Yeah you're right we should be justifying the non-consentual exploitation of millions of workers instead of saying mildly mean words.

hapticmonkey · on June 5, 2024

If you're worried about Proof of Work leading to giant server farms using huge amounts of energy, then I've got something to tell you about AI...

hecanjog · on June 5, 2024

I also highly doubt anyone who signed agreements to have their music included in the Free Music Archive would have been OK with this. The particular type of license was important to contributors and there's a difference between allowing for rebroadcast without paying royalties and allowing for derivative works... I don't really care to argue the point, but it's why there were so many different types of licenses for the original FMA. This just glosses over all that.

blargey · on June 5, 2024

If you look at the repo where the model is actually hosted they specify

> All audio files are licensed under CC0, CC BY, or CC Sampling+.

These explicitly permit derivative works and commercial use.

> Attribution for all audio recordings used to train Stable Audio Open 1.0 can be found in this repository.

So it’s not being glossed over, and licenses are being abided by in good faith imo.

I wish they’d just added a sentence to their press release specifying this, though, since I agree it looks suspect if all you have to go by is that one line.

(Link: https://huggingface.co/stabilityai/stable-audio-open-1.0#dat... )

hecanjog · on June 6, 2024

That's an important clarification, thanks! The full FMA database has lots of music that isn't licensed that way.

ancientworldnow · on June 5, 2024

This has been Adobe Firefly's value proposition for months now. It works fine and is already being utilized in professional workflows with the blessing of lawyers.

SequoiaHope · on June 5, 2024

I’m so happy to see this! I’ve been saying for a while, if they focused on sample efficiency and building large public datasets, including encouraging Twitter and other social media sites to add image license options and also encouraging people to add alt text (which would also help the vision impaired!), they really could build the models they want while also respecting creatives, thus avoiding pissing a bunch of people off. It’s nice to see Stability step up and actually train on open data!

rfoo · on June 5, 2024

Why is Proof of Work less ethical than Proof of Rich a.k.a. rich being gradually more rich without doing anything?

Not saying PoW is safer (it's not), but less ethical is pretty a bold claim.

Uehreka · on June 5, 2024

The environmental impact. And yes, I know, 0.5%, but my issue was always that if PoW currencies went from being a niche subculture to a point where it was used for everyday exchange (many people were arguing that this would and should happen) that 0.5% would surely go up by a great deal. To a point where crypto had to clear a super high bar of usefulness to counterbalance the harm it would do.

To be fair, AI training also has a big carbon footprint, but I feel like the utility provided by AI makes it easier to argue that its usefulness counterbalances its ecological harm.

avarun · on June 5, 2024

There is no "environmental impact". Environmental impact comes from energy production, not energy usage. It's incoherent to argue others should tamper down their energy usage because most folks producing energy aren't doing it in an ethical way.

lolinder · on June 5, 2024

> It's incoherent to argue others should tamper down their energy usage because most folks producing energy aren't doing it in an ethical way.

There's a general consensus that paying someone else to do your dirty work doesn't free you of the moral (or, usually, legal) culpability for the damage done. If you knowingly direct your money towards unethical providers, you are directly increasing the demand for unethical behavior.

(That's assuming that the producers themselves are responsible for the ethics. If a producer is doing its best to convert to clean energy as fast as possible, they may be entirely in the clear but POW would still be unethical. In that scenario POW is placing strain on the limited clean energy supplies, forcing the producer to use more fossil fuels than they'd otherwise need to.)

ben_w · on June 5, 2024

Ultimately any proof-of-work system has to burn joules rather than clock cycles (because any race on cycles-per-joule is rapidly caught up), and that makes it clearer where the waste is: to be economically stable, in the face of adversarial actions by other nation states who sometimes have a vested interest in undermining your currency so actively seek the chaos and loss of trust in a double-spend event, your currency has to be backed by more electricity than any hostile power can spend on breaking it.

skybrian · on June 5, 2024

It seems you've come up with a proof that there's no such thing as wasting electricity. When you prove an extraordinary claim like that, it's time to go back and figure out how you got it wrong.

jncfhnb · on June 5, 2024

Officer I merely stabbed the man. What he died from was blood loss.

_940h · on June 5, 2024

This is greenwashing. You're still positively valuing the past harms from proof of work.

Solving the past harms means starting from scratch, not giving value to people who previously wasted electricity.

rpicard · on June 5, 2024

Environmental impact of the proof of work algorithms is my understanding.

toenail · on June 5, 2024

Proof of work mining is probably the only industry on the planet that has the potential to be carbon negative AND profitable.

Uehreka · on June 5, 2024

(sigh)

Is this the methane flaring argument, or Peter Thiel’s “windmills in Vermont”?

toenail · on June 5, 2024

(sigh)

Do you expect a reply when you start like this?

Uehreka · on June 5, 2024

I don’t really care either way. I’m tired of having to debunk the same sloppy arguments year after year.

toenail · on June 6, 2024

https://news.ycombinator.com/newsguidelines.html

> Be kind. Don't be snarky. Converse curiously; don't cross-examine.

If you can't follow the guidelines it would probably better to remain quiet.

foota · on June 5, 2024

Renewable energy?

toenail · on June 5, 2024

Waste energy like unused methane, flare gas and the like.

pa7x1 · on June 5, 2024

How do the rich become gradually richer under PoS? I'm flabbergasted by the level of math education.

Assume we have 2 validators in the network; the first one owns 90% of the network, the second one owns 10%. Lets call them Whale and Shrimpy, respectively.

To make the numbers round let's assume total circulating supply of ETH is 100 initially and that the yield resulting from being a validator is 10% per year. After the first year, 10 new ETH will have been minted. Whale would have gotten 9 ETH, and Shrimpy would have gotten 1 ETH. OP is assuming that as 9 is bigger than 1, Whale is getting richer faster than Shrimpy. But, let's look at the final situation globally.

At year 0:

Total ETH circulating supply: 100 ETH

Whale has 90 ETH. Owns 90% of the network.

Shrimpy has 10 ETH. Owns 10% of the network.

At year 1:

Total ETH circulating supply: 110 ETH

Whale has 99 ETH. Owns 90% of the network.

Shrimpy has 11 ETH. Owns 10% of the network.

Whale has exactly the same network ownership after validating for 1 whole year, the network is not centralizing at all! The rich are not getting richer any faster than the poor.

TL;DR: Friends don't let friends skip elementary math classes.

rfoo · on June 5, 2024

Sure, friends also won't let friends skip the fact that circulating supply of ETH is now decreasing instead of increasing.

Also, only ~30% tokens are staked. The 30% who chose to stake essentially tax the other 70% in use. Each of the validator do the same amount of work (ok, strictly speaking you get to do more when you have more ETH staked, but being a validator is cheap and does not cost significantly more energy even if you are being selected more frequently because running one proposal is too cheap, that's the whole environmental point, right?) except what they receive is proportioned to how much they stake.

I hate being mean, but sorry, remembering to check one's assumption is a habit I gained after elementary school, so maybe that's too hard for you.

pa7x1 · on June 5, 2024

> Sure, friends also won't let friends skip the fact that circulating supply of ETH is now decreasing instead of increasing.

This changes absolutely nothing of the calculation. Furthermore, the change in circulating supply last year was of 0.07%.

> Also, only ~30% tokens are staked.

Correct.

> The 30% who chose to stake essentially tax the other 70% in use.

There is something called opportunity cost. With the existence of liquid staking derivatives the choice to stake or not is one of opportunity cost. Plenty of people may consider the return observed by staking insufficient given the opportunity cost and additional risks. Participating in staking is fully permissionless, stakers are not taxing non-stakers. They are being remunerated for their work.

> Each of the validator do exactly same amount of work (that's the point, right) except what they receive is proportioned to how much they stake.

Incorrect. A staker does proportionate amount of work to its stake. That's why it gets paid more. A staker gets paid for fulfilling its duties as defined in the protocol (attesting, proposing blocks, participating in sync committees). For each of those things there are some rewards and some punishments in case you fail to fulfill them. If a staker has more validators running you simply fulfill more of those duties more often, hence your reward scales linearly with number of validators.

rfoo · on June 5, 2024

> Participating in staking is fully permissionless, stakers are not taxing non-stakers. They are being remunerated for their work.

That's just a more polite way to say tax. Being permissionless is cool, but it's still tax in my dict.

> There is something called opportunity cost.

And, who is going to be able to have a larger percentage of their funds staked, a poor or a whale? You need a (mostly) fixed amount of liquidity to use the thing.

> Incorrect. A staker does proportionate amount of work to its stake.

Apologies, I edited my original reply which should answer this.

In short, I don't see anything preventing me to run 10000 validators with 32 ETH each with very similar cost to running just one. It's certainly not linear.

pa7x1 · on June 5, 2024

> That's just a more polite way to say tax. Being permissionless is cool, but it's still tax in my dict.

It most certainly is not. They are doing a work for the network and getting remunerated for it. That's not a tax. That's what is commonly referred to as a job. A kid that delivers newspapers over the weekend is not taxing the kid that decides not to. Both make a free decision on what to do with their time and effort given how much it's worth to them. Running a validator takes skill, time, opportunity cost, and you assume certain risks of capital loss. You are getting remunerated for it.

> And, who is going to be able to have a larger percentage of their funds staked, a poor or a whale? You need a (mostly) fixed amount of liquidity to use the thing.

Indeed, the protocol cannot solve wealth inequality. That's an out of protocol issue. It cannot cure cancer either.

> In short, I don't see anything preventing me to run 10000 validators with 32 ETH each with very similar cost to running just one. It's certainly not linear.

There are some fixed costs, indeed. But they are rather negligible. You need a consumer-grade PC (1000 USD) and consumer-grade broadband to solo stake. Or you can use a Liquid Staking Derivative which will have no fixed costs but will have a 10% cut. The curve of APY as a function of stake is very flat. Almost anything else around us has greater barriers of entry or economies of scale.

everfree · on June 5, 2024

> And, who is going to be able to have a larger percentage of their funds staked, a poor or a whale?

This is a truth that's fundamental to all types of investing. Advantaged people can set aside millions and not touch it for a year or five or twenty. Disadvantaged people can't invest $20 because there's a good chance they'll need it to buy dinner.

Stocks, bonds, CDs, real estate, it all works like this. You've touched on a fundamental property of wealth.

rfoo · on June 6, 2024

Indeed, I guess you can say I hate wealth.

Wel, but at least in PoW you burn actual money (and in the end, actual resource) proportioned to your profit to keep the network running. In PoS you burn nothing.

hanniabu · on June 5, 2024

> Also, only ~30% tokens are staked. The 30% who chose to stake essentially tax the other 70% in use.

And in PoW miners tax 100% of holders.

> what they receive is proportioned to how much they stake

Wealthy miners with state of the art ASICS benefit more than some kid mining at home with an old GPU. Maintenance/cost of mining equipment benefits from economies of scale too.

I hate being mean, but sorry, remembering to check one's assumption is a habit I gained after elementary school, so maybe that's too hard for you.

rfoo · on June 6, 2024

Yeah, PoW is bad too. But I'm happy to pay a tax to those who burnt energy to keep the networking running and converted USD to the native token, proportioned to their effort.

I'm less happy to pay someone a tax just because they are rich and they did barely anything.

Tricky_Troll · on June 6, 2024

>I'm less happy to pay someone a tax just because they are rich and they did barely anything.

As the operator of a single validator node you can get out of here with that take. I'm using up very significant bandwidth, having to keep a computer running 24/7, updating node and OS software, troubleshoot it after a power or internet outage and at some point I will have to replace the SSD since it is constantly reading and writing and will need replacing after a few years.

Is it a full time job? Absolutely not, but is it free from responsibility? Definitely not. If anything, I could be making more than 3%pa elsewhere if I weren't also in it for ideological reasons.

domothy · on June 6, 2024

>But I'm happy to pay a tax to those who burnt energy to keep the networking running and converted USD to the native token, proportioned to their effort.

Why? That's doubly bad for non-mining holders: not only does your share of supply get diluted with newly printed coins, but it also get devalued relative to USD when these coins inevitably get sold to pay expenses

In Ethereum's post-merge world, non-staking holders can keep their share of supply the same (or even have it passively increase) when total supply shrinks. And if the supply does increase by ~0.5-1% and you as a holder aren't okay with that amount of dilution, the barrier of entry to stake profitably and protect your share of supply is much, much lower than the barrier of entry of profitable bitcoin mining.

And the total newly issued coins (which are nominally much lower than pre-merge) have a much lower need to be sold off. If you view issuance as a tax on holders, Ethereum's model wins on all counts

>just because they are rich and they did barely anything

But stakers also "keep the networking running", just like miners under PoW. In both cases, it's gonna be the amount of capital involved that decides how the rewards are proportioned, there's no way around it - these permissionless systems ultimately use the inherent scarcity of economic capital as the anti-sybil mechanism with economic incentives to keep everyone honest. PoS just bypasses the need for burning a huge amount of energy and the embarrassing quantity of single-purpose e-waste to indirectly calculate who has how much at stake. It goes straight to the point: the capital at stake is simply measured in the value of the coin itself instead of external energy/hardware.

On the outside it does kinda look like stakers get rewarded passivly for doing nothing, but there are definitely costs involved, they're just mostly economic instead of physical - think of all the usual risk involved in crypto's volatility, now compound that with slashing risks, illiquidity, opportunity costs – staking yield is like 4-5% atm (and has been down only for quite some time), if you're a billionaire whale you definitely have other investment opportunities available that yield way more than that. I mean just the fact that the net supply growth can go negative shows that even internally in the blockchain itself there can be better things to do with your ETH than stake it; these people aren't burning their ETH on transactions fees for fun, they're actively using their ETH to do stuff that gets them some economic utility.

rfoo · on June 6, 2024

> but it also get devalued relative to USD when these coins inevitably get sold to pay expenses

Good point. It invalidates the "good" part but does not make it doubly bad IMO.

And for ETH, well, I don't think it's about protecting value, it's more about:

> In both cases, it's gonna be the amount of capital involved that decides how the rewards are proportioned, there's no way around it

Yes. The difference is, PoW requires you to BURN resource proportioned to your rewards, while PoS just requires you to HAVE (but not burn) it. This makes a huge difference IMO.

For example, I would consider it more "ethical" (whatever that means) to add a light PoW part (with constant or slowly increasing difficulty, that is chosen to reduce environmental impact) to the ETH PoS protocol as-is: the random-chosen validators have to solve a PoW in addition to make their efforts proportioned to how much they stake, instead of being mostly constant.

hanniabu · on June 6, 2024

Okay so startup capital aside, you like PoW because there's an ongoing cost to participating in consensus and don't like how PoS is more or less free.

I'm not sure how you arrived at this when your initial complaint was the rich get richer with PoS. PoW has much higher costs to participate and after a few years you have more costs when you need to upgrade your mining rigs because they're either burnt out or outcompeted by newer hardware.

_940h · on June 5, 2024

The "Etherium merge moment" is entirely different and it irks me to see it compared favorably with this project. It didn't 'solve' proof of work. It assigned positive value to past environmental harms. It prevents more proof of work, but doesn't solve past harms.

The only 'solution' (more a mitigation) to Etherium proof of work's environmental harms is to devalue it.

Unlike your example, this project actually seems to be a net positive for society that wasn't built on top of unnecessary harms.

mg · on June 5, 2024

When they released Stable Audio 2.0, I tried to create "unusual" songs with prompts like "roaring dragon tumbling rocks stormy morning". The results are quite interesting:

https://www.youtube.com/@MarekGibney/videos

I find it fascinating that you can put all information needed to recreate a whole complex song into a string like

     rough stormy morning car rocks hammering
     drum solo roaring dragon downtempo
     audiosparx-v2-0 seed 5

This means a whole album of these songs could easily fit into a single TCP/IP packet.

If a music genre evolves in which each song is completely defined by its title, maybe it will be called "promptmusic".

I will try the new model with the same prompts and upload the results.

TeMPOraL · on June 5, 2024

That's a great example of the fact that information about something, say a song, isn't entirely encoded only in the medium you use to transfer it - it's partially there, and partially in the device you're using to read it! An MP3 file is just gibberish without a program that can decode it.

In this case, the whole album could indeed fit into a single TCP/IP packet - because the bulk of information that make up those songs is contained in the model, which weights however many gigabytes it does. The packet carrying your album is meaningless until the recipient also procures the model.

(Tangent: this observation was my first mind. blown. experience when reading GEB over a decade ago.)

jononor · on June 5, 2024

Also note that it is the same for language. The meaning of these words are not in this text. The words are merely codes which point to things in the readers databank. And hopefully the word have similar enough associations to mine, such that the decoded message is close to what I attempted to encode...

minimaxir · on June 5, 2024

Note that this has the typical noncommercial "you have to pay for a membership to use commercially" Stability license.

simonw · on June 5, 2024

Sigh:

    Stable Audio Open is an open source text-to-audio model [...]

License: https://huggingface.co/stabilityai/stable-audio-open-1.0/blo...

    STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE AGREEMENT

Stability are one of the worse offenders for abusing the term "open source" at the moment.

Zuiii · on June 6, 2024

Unless you've explicitly agreed to this, or operate in a litigation-happy country and don't have a budget for lawyers and the small risk of loosing, feel free to ignore this license and others like it. Weights are not currently copyrightable. We have a photo of a smiling monkey to prove it.

littlestymaar · on June 5, 2024

What a shame for something that could have been completely free from copyright issues given the training source… (And they're still no legal ground on such a license claim since training hardly qualify as creative process on their side)

jononor · on June 5, 2024

The training data is licensed under various Creative Commons licenses. These are not in anyway "completely free of copyright issues". Most of them have specific conditions on their use.

elpocko · on June 5, 2024

Every time I call out the absurd interpretation of "Open Source" in this space in general, I get showered with downvotes and hateful attacks. One time someone posted their "AI mashup" project on reddit, that egregiously violated the terms of not only one, but several GPL-licensed projects. Calling this out earned me a lot of downvotes and replies with absolutely insane justifications from people with no clue.

No one cares. Not in this space.

jeroenhd · on June 5, 2024

AI fans don't seem to care much for copyright, unless it's their work being stolen (remember the people that got mad at "prompt stealing"?).

Companies are more risk-averse, though, and hobbyists on Reddit don't have the money to do anything serious with this software.

elpocko · on June 5, 2024

Nope. Most of the relevant software is OSS made by hobbyists. ComfyUI, llama.cpp, etc. For example, Nvidia is building stuff based on ComfyUI, a GPL-licensed application.

My complaint is about people ignoring the license of open source software made by hobbyists. I disagree with your ignorant "AI fans" generalization.

immibis · on June 5, 2024

to be fair, open-source is far too corporate-friendly at the moment. it should be more non-commercial. To what extent, is an open question.

jeroenhd · on June 5, 2024

Open Source, as used by the most influential open source projects, is corporate-friendly by definition.

There are good reasons to use something more aggressive. I'm a big fan of the strict copyleft licenses for this, even if that means companies like Google don't want to that software anymore.

bufferoverflow · on June 5, 2024

It produces decent audio, but something unpleasant about its high frequencies. And no voices, it doesn't seem to talk or sing.

Udio, so far, is undefeated.

And ElevenLabs' music demos were very very impressive, but it's still not released.

genericacct · on June 5, 2024

Have you tried suno? It is quite good at least for some genres

bufferoverflow · on June 5, 2024

Suno is good at generating music, but its voices sound metallic with a dash of high-frequency noise. Which ruins it for me. It's almost there though, I think they will fix it in the next version.

jononor · on June 5, 2024

I suspect we will get ML based "upsamplers" / enchancers / artifact removers, similar as we already have for images. That can be used to automatically post process. If they do not already exist?

bufferoverflow · on June 5, 2024

I think the models will just get better. I mean, these already sound like professional recordings:

https://twitter.com/elevenlabsio/status/1788628175766859891

https://twitter.com/flavioschneide/status/178865450379062893...

https://twitter.com/elevenlabsio/status/1788628178786787822

https://twitter.com/elevenlabsio/status/1788628173367685281

https://twitter.com/elevenlabsio/status/1788628171044053386

https://twitter.com/flavioschneide/status/178867866121578134...

hierophantical · on June 5, 2024

None of these are impressive in the least. Anything I have heard from Udio is basically trash. It is the AI art equivalent of synthetic cats and pretty face shots. Who cares.

What is ultimately going to be undefeated is training your own model.

bufferoverflow · on June 5, 2024

I've heard many good things from Udio and the demo tracks from ElevenLabs are very high quality.

https://www.udio.com/songs/ai2uAaBffRGdWdTNNqAbDx

https://www.udio.com/songs/19xQAMG6E1UXG7wNvP7nDW

https://www.udio.com/songs/mPAFYyFgo7Nqjb8ypeFfh9

https://www.udio.com/songs/7sKM9jMwZrXwTTzmMYN9qv

https://www.udio.com/songs/coixNX1gnJ1oWT8z2LQddk

acureau · on June 7, 2024

I just don't see how any of these examples from Udio or ElevenLabs could be considered good. To me they sound just bland enough that I may be fooled by them if I weren't paying close attention. If I do pay attention it sounds like a song accidentally came about. It sounds like music might in a dream. Eerie.

bufferoverflow · on June 10, 2024

I think if I gave you 20 pieces of music, half of them made by AI, you wouldn't be able to tell which ones are made by AI much better than guessing randomly.

acureau · on June 12, 2024

I am certain I could, in fact I'd bet a lot on it. I'd have a hard time distinguishing AI generated images at this point, but earlier on there were major indicators. This tech is still primitive. The lyrics are dead giveaways, the tracks all sound blended together. A 5 second snippet of a song seems coherent, but the songs never go anywhere. If you listen to a lot of music, and have a decent pair of headphones, it's immediately obvious. Someone who makes music would be able to identify the specific flaws better than me.

MrNeon · on June 5, 2024

> What is ultimately going to be undefeated is training your own model.

From scratch?

wavemode · on June 6, 2024

As far as I understand, this isn't competing with Udio? It seems to be designed for creating sound effects and loops, not entire songs.

treesciencebot · on June 5, 2024

This looks like the one that got leaked a couple weeks ago, so i guess they decided its better to open source at this point after the leak [0].

[0]: https://x.com/cto_junior/status/1794632281593893326

washadjeffmad · on June 5, 2024

It is. The model.ckpt from petra-hi-small matches the official HF repo.

SHA256: 6049ae92ec8362804cb4cb8a2845be93071439da2daff9997c285f8119d7ea40

tmabraham · on June 5, 2024

it was already planned for open-sourcing, the leak did not affect the plans in any way

ben_w · on June 5, 2024

> Warm arpeggios on an analog synthesizer with a gradually rising filter cutoff and a reverb tail

I appreciate that the underlying tech is completely different and much more powerful, but it is a pretty strange feeling to find a major AI lab's example sounding so similar to an actual Markov chain MIDI generator I made 14-15 years ago: https://youtu.be/depj8C21YHg?si=74a4DHP14EFCeYrB

(Not that similar, just enough for me to go "huh, what a coincidence").

hehdhdjehehegwv · on June 5, 2024

Highly commendable:

“The new model was trained on audio data from FreeSound and the Free Music Archive. This allowed us to create an open audio model while respecting creator rights.”

This should be standard: commons go in, commons go out.

mastermedo · on June 5, 2024

Except here the out is not commons if I understand correctly.

EDIT: might be cc, non commercial

drivebyhooting · on June 5, 2024

From announcement I couldn’t figure out if it can do audio to audio.

Text to audio is too limiting. I’d rather input a melody or a drum beat and have the AI compose around it.

duranduran · on June 5, 2024

This kind of exists, but I doubt there are any commercial solutions based on it yet. https://crfm.stanford.edu/2023/06/16/anticipatory-music-tran...

Their paper says that they trained it on the Lakh MIDI dataset, and they have a section on potential copyright issues as a result.

Assuming you don't care for legal issues, theoretically you could do: raw signal -> something like Spotify Basic Pitch (outputs MIDI) -> Anticipatory (outputs composition) -> Logic Pro/Ableton/etc + Native Instruments plugin suite for full song

samfriedman · on June 5, 2024

Free idea because I’m never going to get around to building it: An “AI 8 track” app; click record and hum a melody, then add a prompt and click generate. The model converts your input to an instrument matching your prompt, keeping the same notes/rhythm you hummed in the original. Record up to 8 tracks and do some simple mixing.

Would be a truly amazing thing for sketching songs! All you need is decent humming/singing/whistling pitch. Hum and generate a bass line, guitar lead, strings, etc. And then sing over it - would make solo musicians able to sketch out a song far easier than transcribing melody to a piano roll.

xnx · on June 5, 2024

Google MusicLM (and probably lots of other tools) do this: "MusicLM .. can transform whistled and hummed melodies according to the style described in a text caption."

https://google-research.github.io/seanet/musiclm/examples/

samspenc · on June 5, 2024

Sounds like Suno AI will soon have this feature as well https://x.com/suno_ai_/status/1794408506407428215

ortusdux · on June 5, 2024

App name - Beat-it

https://www.youtube.com/watch?v=eZeYw1bm53Y

https://www.nme.com/blogs/nme-blogs/the-incredible-way-micha...

eman2d · on June 5, 2024

A killer Ad would be converting each vocal track back into the original song

michaelbrave · on June 5, 2024

the tech is technically there using 2-5 different AI solutions, it mostly lacks an interface that automatically takes one step to the next

beoberha · on June 5, 2024

Facebook’s MusicGen can pretty much do this

_bkyr · on June 5, 2024

"a drummer could fine-tune on samples of their own drum recordings to generate new beats"

Yes, this is the reason someone becomes a drummer.

112233 · on June 6, 2024

Can this be refitted for noise reduction / audio restoration purposes? On hf they mention this is a "latent diffusion" model, I guess it implies there is a component inside that tries to recover noisy signal.

[edit] If not, what free and open ML tools that can be used for restoration / deconvolution / denoising are there?

nickthegreek · on June 5, 2024

I keep hearing about the pending death of Stability, but here we are with another release. I am rootin for them.

sunlin · on June 9, 2024

that's cool, there is a landing page i found it , can try it demo online. https://stable-audio-open.com/