Salesforce releases language model bigger than GPT-2 large (github.com/salesforce)
145 points by strin on Sept 15, 2019 | 40 comments


I am working on a guide (should be released tomorrow) to easily get it up and running for personal use. Here's my Twitter thread of current experiments with the model: https://twitter.com/minimaxir/status/1173081315177975810

I recommend reading the linked paper in the repo as it gives decent examples/instructions on how to use the model. Although the size and architecture are comparable to GPT-2, the emphasis on conditional generation differentiates it.
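
To give a rough idea of what conditional generation means here (toy sketch only, not the repo's actual API; the vocabulary and the random "model" are made up), the control code is just prepended to the prompt and the model continues from there:

    # Toy sketch of conditional generation with a control code. This is NOT the
    # repo's actual API; a real model would predict the next token from the
    # context instead of picking at random.
    import random

    VOCAB = ["the", "a", "badger", "is", "mammal", "nocturnal", "."]  # stand-in vocabulary

    def generate(control_code, prompt, max_tokens=10):
        # Conditioning is just: [control code] + [prompt tokens] + [generated tokens...]
        tokens = [control_code] + prompt.split()
        for _ in range(max_tokens):
            tokens.append(random.choice(VOCAB))  # real model: sample from p(next | tokens)
        return " ".join(tokens)

    print(generate("Wikipedia", "The badger is"))

Swap the control code and the same prompt continues in a different style; that's the whole trick.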


Awesome work, following. I'm a noob when it comes to this type of stuff but have always found it highly interesting.



> running for personal use

How can one use it for personal use? In my understanding it will not fit into the GPU memory available to the average person. Would someone need to distill the model first?


It currently fits into a P100, but barely.


Maybe by using a cloud provider?


Following the Twitter thread, keep up the great work Max!


Wow, that's some license addendum:

> This software should not be used to promote or profit from:

> violence, hate, and division,

> environmental destruction,

> abuse of human rights, or

> the destruction of people's physical and mental health.


There is no license addendum. It is explicitly stated that they are asking you to respect these rules.

> The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:

The license is in LICENSE.txt and that statement seems to unambiguously confirm that LICENSE.txt is the beginning and end of actual legal obligations.

It is not uncommon for FOSS projects to make requests, without any legal contract, about how users use the software. In this case it may simply be Salesforce trying to preemptively distance itself from malicious actors, knowing that the license would be useless if it attempted to give these rules teeth.


Wouldn't it be better to simply require that all texts produced with this software be marked as AI-generated? That should rule out many nefarious uses.


Forget for a moment that they don't define what they mean by any of this; what they also do is cloud the license. You called it a license addendum... but from my reading I don't believe that it is... having said that: you can't clearly tell whether you are right or I am.

There's an old saying (that may well be impolitic these days): "you can't be half pregnant". It seems that's what the maintainers are shooting for... I'd urge them to get off the fence one way or the other.


This is the sort of license that effectively says "If you don't really give a damn about licenses, then you're allowed to use this. If you get pedantic about licenses, then you're not."

Similar in principle to AGPLv3 or even WTFPL.


Yes if you read the license itself (which I did).

The issue is there's a license section in the README. There they reiterate that they use BSD 3 clause... and then have something that looks like "but...."

Again, as I read it: they're BSD 3 clause licensed and the README is an expression of their own (ill-expressed) hopes rather than a licensing requirement.... but I could see how some could be confused and indeed... maybe they are, too: and that's what would worry me about this statement.


What are the chances this could actually be upheld as enforceable by a court?


The license is BSD-3... so completely unenforceable. They're just asking nicely and you can kindly tell them to f off.


So Salesforce could never use it?


Not sure which of those you think they would be violating? And they own it, so the license doesn't apply to them, it applies to everyone else. (Apologies if you're making a joke and I'm ruining it)


You can very easily argue that Salesforce, by dealing with companies that do those things, does all of the things forbidden by the license.


Pretty much. My question is who would enforce such bizarre terms anyway?


How easy is it for companies to dodge terms like this by claiming not to know what their clients do with their products?

I’d say most social media platforms check most or all of those boxes in some form, but I can also see them claiming not to know how their platforms are being used.


Anyone have a real-world use case for something like this? I must admit I'm having trouble thinking of any that aren't essentially deceptive. Because in my little biased world, I have no need of "text" per se, and what value any text has to me is closely linked to the fact that it came from a human.


Machine learning researchers aren't working on language modeling because they want to enable fake news.

They are working on it because it improves all downstream NLP tasks. See: http://ruder.io/nlp-imagenet/. BERT, ELMo, and XLNet all fall under this use case.

For example if you're trying to recognize speech or translate some text, it helps a lot if you can start off producing something that is statistically grammatical even if the content is nonsense.
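
Roughly, the way that helps in practice is rescoring: the recognizer/translator proposes candidates and the language model votes for the fluent one. A hand-wavy sketch (all the scores below are made up; they'd normally come from the trained models):

    # Hand-wavy sketch of using a language model to rescore candidate
    # transcriptions. The numbers are invented stand-ins for real model scores.
    def lm_log_prob(sentence):
        # Stand-in for the language model: fluent text gets a higher score.
        scores = {"recognize speech": -2.0, "wreck a nice beach": -8.0}
        return scores.get(sentence, -20.0)

    def rescore(candidates, acoustic_scores, lm_weight=0.5):
        # Combine the acoustic model's score with the LM's fluency score
        # and keep the best-scoring candidate.
        return max(candidates, key=lambda c: acoustic_scores[c] + lm_weight * lm_log_prob(c))

    candidates = ["recognize speech", "wreck a nice beach"]
    acoustic = {"recognize speech": -1.1, "wreck a nice beach": -1.0}
    print(rescore(candidates, acoustic))  # the LM breaks the near-tie toward the fluent option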


One use case I've seen is autocomplete -- people have used GPT-2 to build things like TabNine[1] and Write With Transformer[2].

[1]: https://tabnine.com/

[2]: https://transformer.huggingface.co/
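
To make the autocomplete idea concrete, the core loop is just "ask the model for its next-token distribution and surface the top few" (toy sketch; the probabilities below are made up):

    # Toy sketch of LM-based autocomplete: the distribution is a made-up
    # stand-in for what a real code/text model would return for the prefix.
    def next_token_distribution(prefix):
        return {"return": 0.4, "raise": 0.25, "pass": 0.2, "yield": 0.15}

    def suggest(prefix, k=3):
        # Rank candidate tokens by probability and keep the top k.
        dist = next_token_distribution(prefix)
        return sorted(dist, key=dist.get, reverse=True)[:k]

    print(suggest("def f(x):\n    "))  # ['return', 'raise', 'pass']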


I imagine summarization being a use case.


From the blog post: "Beyond the technical work to develop this model, we’ve also taken several steps to anticipate and mitigate malicious use cases where possible."

From the preprint, this seems to amount to doing some review before release and having a code of conduct in the GitHub repo.


The unicorn prompt is the new text generator lorem ipsum


It was trained on 140GB of text on 256 TPUs for 2 weeks; the model is made of 48 transformer layers. I'm wondering when we will see a model trained on 1TB or 10TB of text.


I doubt training a scaled up transformer on 10TB of text will lead to significant improvements (btw, 10TB is about the size of all books in English in the Library of Congress). Image classifiers don't get a lot better when trained on a lot more data than ImageNet. 140GB is probably enough to train a general model, which could be finetuned on extra data for specific tasks.

Text generators need a world model and situational awareness, something like a map and a GPS signal. So we are probably two major breakthroughs away from a machine that actually understands something (or at least which seems to understand something, if you're philosophically opposed to the idea that a machine can understand something).


Could someone provide a high level summary of what this is for a technical person not conversant with the field?


Salesforce has created a computer program where you put in a small prompt, like "Wikipedia page about badgers" or "News article starting with the line, 'Donald Trump was impeached today'", or "French translation of 'I like pears'", and it tries to predict what the text will be. You can also run the program in reverse, where you put in a snippet of text and it predicts whether it came from Wikipedia or a mystery novel or the fitness subreddit.
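
The "run it in reverse" part is basically Bayes' rule: score the same snippet under each possible source and pick the source that makes it most likely. Toy sketch (the scores are made up, not from the actual model):

    # Toy sketch of source attribution. score_text_given_source is a made-up
    # stand-in for the model's log-likelihood of the text under each source.
    def score_text_given_source(text, source):
        fake_scores = {
            ("The badger is a burrowing mammal.", "Wikipedia"): -5.0,
            ("The badger is a burrowing mammal.", "Reviews"): -12.0,
        }
        return fake_scores.get((text, source), -15.0)

    def attribute_source(text, sources):
        # With a uniform prior over sources, the posterior is proportional to the
        # likelihood, so the argmax of the per-source scores is the best guess.
        return max(sources, key=lambda s: score_text_given_source(text, s))

    print(attribute_source("The badger is a burrowing mammal.", ["Wikipedia", "Reviews"]))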

Salesforce created the program by first writing some relatively simple linear algebra, then fiddling with the constants until the output happened to look right. Their program contains 1.6 billion constants, which is more than any other program of its kind.

This program is also special because Salesforce has released it publicly; other organizations, like OpenAI, have previously claimed that text-generation software is too dangerous to release to the general public.


> writing some relatively simple linear algebra

Except, that it wouldn’t work if it was purely linear.


Right, yeah, it's linear algebra combined with a few non-linear functions. The point is that Salesforce didn't come up with an algorithm that generated English text by writing a grammar or thinking really hard about what sentences look like—all the functionality comes from the "training" process that set the constants.
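
To be concrete about "linear algebra plus a few non-linear functions", each layer is essentially the below, just much bigger and stacked 48 times (toy sizes, random constants standing in for the trained ones):

    import numpy as np

    # One toy "layer": a matrix multiply (the linear algebra) followed by a
    # simple non-linearity (ReLU). Training is the process that sets W and b;
    # the real model has ~1.6 billion such constants.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 4))
    b = rng.normal(size=4)

    def layer(x):
        return np.maximum(0.0, W @ x + b)

    x = rng.normal(size=4)       # stand-in for a token's vector representation
    print(layer(layer(x)))       # stacking layers = repeating this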


> Advertisement

Yep, this one is indistinguishable from reality.


Are there any hardware reqs to work with this?


In theory, no. But for any decent performance, you need a big CUDA-capable GPU, as far as I know.

But you can try it on a CPU of course. (Maybe with some modifications; see this: https://news.ycombinator.com/item?id=20977776 ; also, if someone can get it working in Google Colab, you get a GPU-enabled instance for free.)


Open AI did the right thing by not releasing their model; it's disappointing that researchers are so callous about the potential effects of their research in the name of progress.


I've never really gotten why AI types are so concerned about text-generation models.

Like, sure, I can kind of see why you wouldn't want to make the Deepfakes program public; it currently takes a lot of time, effort, and expertise to swap faces realistically in a video, and maybe we don't want to give every average Joe the ability to do that.

But pretty much everyone in the world can already pretty trivially write text. (I'm doing it right now!) And the "typical" generation output from these programs usually isn't very good—OpenAI had to try like thirty times for each of the prompts in their PR materials—so it usually ends up being less work to just write the fake news yourself instead of using the software.

My personal conspiracy theory is that all this talk of "the model is too dangerous to release" really boils down to "if we let people test out the model, they'll find it doesn't work as well as our PR team wants them to think it does".


I dunno, this time the text looks really good. I got 5 or 6 phrases in before it said anything silly. I would have been fooled if I read it in real life.

My guess is that they will perfect the transformer and its training process, curate the dataset, and make this method really easy to use. Maybe it can do translation, math, even auto-complete code, and that's only by iterating more on the current formulation of the Transformer.

But it is also possible that it will be surpassed by something even better. A newer language model could replace the inductive bias specific to the Transformer (the ability to "attend" to any part of the input text) with something more efficient, because Transformers are quite hard and expensive to train right now. Maybe the Transformer inductive bias is too general (like a fully connected network) and needs too much data; with a slightly different idea it could be made much more efficient and probably more convincing.
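
For reference, the "attend to any part of the input" inductive bias being discussed boils down to something like this (minimal scaled dot-product attention with toy sizes, not the full multi-head version):

    import numpy as np

    # Minimal scaled dot-product attention: every position takes a weighted
    # average over all positions, with weights from pairwise dot products.
    # That flexibility is also why it is quadratic in sequence length.
    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
        return weights @ V

    rng = np.random.default_rng(0)
    seq_len, dim = 5, 8
    Q, K, V = (rng.normal(size=(seq_len, dim)) for _ in range(3))
    print(attention(Q, K, V).shape)  # (5, 8)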


I sort of agree with the principle -- for instance, releasing easy-to-use pretrained SOTA models for "deepfakes" type tasks can cause real harm. There are existing repositories out there which are already morally dubious in my view -- if researchers from a large organization made a big advance and released the code, that would seem very irresponsible.

But I just don't buy that there's significant danger in the public having access to a generative language model, at their current level of quality.

It doesn't seem like this team was callous -- they seem to have honestly thought about potential problems before deciding to release it.


The reason OpenAI didn't release their model was just marketing and riding the AI hype train. None of these text-generating models is ever going to fool anyone any more than a Markov chain generator, as they have no grasp of the actual meaning of the text they're generating.

The OpenAI model produced sentences along the lines of "before the first human ever walked on earth, humans did such and such". Hiring workers in a developing country to write your propaganda is cheaper than training that model.



