Hacker News
Alpaca: A strong open-source instruction-following model (stanford.edu)
700 points by jcklie on March 13, 2023 | 298 comments


This is why I think we're seeing a Stable Diffusion moment for LLMs: https://simonwillison.net/2023/Mar/11/llama/

Look at the timeline:

24th February 2023: LLaMA is announced, starts being shared with academic partners: https://research.facebook.com/publications/llama-open-and-ef...

2nd March: Someone posts a PR with a BitTorrent link to the models: https://github.com/facebookresearch/llama/pull/73

10th March: First commit to llama.cpp by Georgi Gerganov: https://github.com/ggerganov/llama.cpp/commit/26c084662903dd...

11th March: llama.cpp now runs the 7B model on a 4GB RaspberryPi: https://twitter.com/miolini/status/1634982361757790209

12th March: npx dalai llama: https://cocktailpeanut.github.io/dalai/

13th March (today): llama.cpp on a Pixel 6 phone: https://twitter.com/thiteanish/status/1635188333705043969

And now, Alpaca. It's not even lunchtime yet!

Turned this into a blog post: https://simonwillison.net/2023/Mar/13/alpaca/


Here is one question I have not seen answered yet:

All the magic of "7B LLaMA running on a potato" seems to involve lowering precision down to f16 and then further quantizing to int4.

Clearly this quantized model still outputs something resembling human language, at the very least.

But I haven't seen anyone show what effect this quantizing has on the quality of the output. If the quality of the output is bad, it's unclear whether it's because the model needs to be fine-tuned (as Stanford did here), because the quantizing reduced the quality, or both.

If this fine-tuned Stanford model still has excellent output after quantizing it to run on a Raspberry Pi 4GB, that would be awesome!
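For reference, the int4 step being discussed is conceptually tiny. Here is a minimal sketch of symmetric ("absmax") quantization; this illustrates the idea only, and is not llama.cpp's actual blocked scheme:

```python
# Sketch of symmetric int4 quantization: one scale per tensor,
# integers in [-7, 7]. llama.cpp actually quantizes in small blocks,
# each with its own scale; this is just the core idea.

def quantize_int4(weights):
    scale = max(abs(w) for w in weights) / 7
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.98, -0.07, 0.31]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Rounding error is bounded by scale / 2; the open question in this
# thread is how much that per-weight error degrades output quality.
```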


For 10 billion+ parameter models, the effects of quantization are relatively small; for smaller models like Llama 7B the effect becomes more dramatic. But there is ongoing research on new quantization methods (like GPTQ) that preserve significant performance even at the lower end.

Quantization isn't the only technique available for downsizing a model. Llama itself is already the result of sizing the model and input data according to "Chinchilla optimality", a very recent (as in 2022) result that e.g. GPT-3 predates. The result is that Llama-13B performs similarly to GPT-3 175B in benchmarks despite the tremendous size difference. There are separately also a variety of pruning methods to further eliminate inactive weights present in the trained model (I think this is also active research).

Finally even on something like a Raspberry Pi, implementations for inference (like llama.cpp) are nowhere near mature yet. There are already a multitude of runtimes available for inference making large tradeoffs between performance and flexibility (e.g. many models running on PyTorch vs ONNX report 5-10x speedups running under ONNX)

I think the really exciting part of Alpaca is the size and budget of the team: 5 students who spent $100 scraping OpenAI put this model together with a couple of hours of training. Any notion of a premium persisting in the AI space for much longer seems fanciful at best; for all intents and purposes it has already been commoditized. And that's scary considering the size of the dent ChatGPT has put in my Google traffic.


Llama is trained with _more_ data than is chinchilla optimal in order to make it better and cheaper at inference time, instead of just getting the highest quality of model that you can based on a given training budget. Llama has fewer parameters and was trained on more data specifically so that it would get high quality results on cheaper hardware and be easier and faster to run at inference time.
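The back-of-the-envelope numbers behind this are simple. The Chinchilla rule of thumb is roughly 20 training tokens per parameter for a compute-optimal model, and the LLaMA paper reports 1.0T training tokens for the 7B/13B models, so 7B is trained well past its "optimal" point (the exact 20x figure is an approximation):

```python
# Rough arithmetic behind the comment above: Chinchilla-optimal is
# roughly 20 tokens per parameter; LLaMA deliberately trains far past
# that to get a smaller, cheaper-to-run model.

def chinchilla_optimal_tokens(n_params):
    return 20 * n_params  # approximate rule of thumb

llama_7b_params = 7e9
llama_7b_tokens = 1.0e12   # per the LLaMA paper (7B/13B models)

optimal = chinchilla_optimal_tokens(llama_7b_params)
ratio = llama_7b_tokens / optimal
print(f"{optimal:.2e} tokens would be compute-optimal; "
      f"LLaMA 7B trained on {ratio:.1f}x that")
```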


Curious about the google traffic comment. Are you saying people are visiting sites less because they can stay on Bing/OpenAI?


There is some very natural split regarding what I'll send to ChatGPT vs. what goes to Google. For example "six nations fixtures" obviously Google, but anything of depth or where recency is irrelevant goes the other direction. Asked it a few Linux questions today, how to interpret the title of a particular FRED chart, and a ton more sessions that Firefox history somehow didn't manage to correctly track the title for. I vastly prefer ChatGPT's interaction format compared to the equivalent random keyword spelunking session on Google.


Same, until I realized that about 60% of the information it gives me is either subtly wrong or 100% factually incorrect. Yet it's so, so confident.


And in that way it's actually more correct overall than the most knowledgeable person on earth. With Google you also get fed some very dangerously wrong info (recent example: masks) but you think it's correct. With ChatGPT you have to actually use your critical thinking skills to get to the truth, which in my opinion is a huge advancement over Google.


Not really, with google you get multiple sources at a glance. Sure, they can still be wrong, but some critical thinking + multiple sources = more likely to be correct than relying on a single unreliable source.


No, the crucial thing is that a good human will tell you if they don't know something, or if they are simply unsure.


It's adorable seeing this kind of critique in the context of HN, I wonder how many folk knew my heavily upvoted comment above ("For 10 billion+ ...") was from someone who has only been looking at this stuff for a few weeks. ChatGPT is no better or worse than any consultant I've ever met (including myself), or most of the commenters you find here every single day.


It's adorable that you think people assume HN comments are factually correct. I read everything here with extreme skepticism, because I know this is all coming from flawed humans. A computer system giving authoritative text and insisting it is 100% correct is a different story.


Every bit of text from a computer system also comes from flawed humans.


I don't know the data, but as an anecdote: for most searches that would have returned blogspam (e.g. "what's the best birthday gift for a groom") I'm relying more and more on ChatGPT.

I used to use it even more, but some of the recent changes reduced its ability at complex, creative tasks.


It's a nice business model, scrape the web and be the ultimate knowledge middle man


The difference is small, UNTIL you get to 4 bit quantization, where the model is noticeably dumber.

8 bits, imo, is the minimum.


WRONG. Research shows effectively imperceptible performance difference at 4-bit and even 3-bit with GPTQ quantization. You cannot tell the difference and if you think you do you're wrong, because it barely even registers on any benchmark.

(Note: llama.cpp's 4bit is naive, not GPTQ, and sucks but they are refactoring it to use GPTQ quantization)

References:

https://arxiv.org/abs/2210.17323 - GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers [Oct, 2022]

https://arxiv.org/abs/2212.09720 - The case for 4-bit precision: k-bit Inference Scaling Laws [Dec, 2022]

https://github.com/ggerganov/llama.cpp/issues/9 - llama.cpp: GPTQ Quantization (3-bit and 4-bit) #9

https://github.com/qwopqwop200/GPTQ-for-LLaMa/ - 4 bits quantization of LLaMa using GPTQ


Good points, though I would gently encourage not starting a post with "WRONG." in the middle of a nuanced discussion. I remember 'way back when' when there was a .5-2% flat performance drop for UINT8 on some models when it was first introduced (depends upon the modality).

Like, 4 bit quantization really is probably enough for a number of usecases and likely beats smaller models with precision enough to make it the equivalent number of bits, but this really is only presenting half of the story. "You cannot tell the difference and if you think you do you're wrong, because it barely even registers on any benchmark" can be regarded as antagonistic, and also really doesn't line up with reality in a number of usecases. Sure, maybe for some models, UINT4 quantization is good enough. But there's a very large space of model architectures and problems, even for language learning, many of which do have very demonstrable drops in performance. And at certain perplexity levels, every bit (heh) matters.

In any case, an argument for moderation, please.


Good points, I didn't mean to come off abrasive, but I can see why I did. My intention was to get attention on a thread where my new comment would be buried under the 8 other replies, so I put a big attention grabber at the start.

But again, good points about the nuances of lower precision. For LLMs at least, 'The Case for 4-bit Precision' and 'GPTQ' seem fairly conclusive that over ~10B parameters even 3-bit precision has virtually undetectable loss with the right tricks. Levels which, if they even mattered, can easily be overcome with a little additional training.

Newer ongoing research on LLaMA specifically[0] shows we can reduce the model's size around 84% without any meaningful performance loss through a combination of GPTQ, binning, and 3-bit.

[0] https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-i...


Take that WRONG! as a reference to Two Stupid Dogs, and it may be a lot easier to stomach. :)


Some parameters would be more sensitive than others I suppose? So could you use 4 bits for most, and 8 bits, or even 16, for the remaining?


I know nothing about this so my opinion means little, but I imagine it's hard to know which parameters are important enough to use more bits for.

I do wonder if it would be possible to have the model determine during training how important each parameter is, while maybe rewarding it for having more small parameters?


That's exactly why bitsandbytes has a threshold parameter to control the quantization.
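For anyone curious what that threshold does conceptually: in the LLM.int8() scheme behind bitsandbytes, values whose magnitude exceeds the threshold stay in full precision while the rest are quantized to int8. A toy sketch of that split (illustrative only, not the actual bitsandbytes code path):

```python
import random

# Toy version of the outlier-threshold idea (as in LLM.int8()):
# values with magnitude above the threshold stay in full precision,
# the rest are quantized to int8-range integers. Illustrative only.

def split_quantize(w, threshold=6.0):
    small = [0.0 if abs(x) > threshold else x for x in w]
    outliers = [x if abs(x) > threshold else 0.0 for x in w]
    scale = max(abs(x) for x in small) / 127
    q = [round(x / scale) for x in small]  # ints in [-127, 127]
    return q, scale, outliers

def restore(q, scale, outliers):
    return [qi * scale + oi for qi, oi in zip(q, outliers)]

rng = random.Random(0)
w = [rng.gauss(0, 1) for _ in range(1000)]
w[::100] = [x * 20 for x in w[::100]]  # inject a few big outliers

q, scale, outliers = split_quantize(w)
max_err = max(abs(a - b) for a, b in zip(restore(q, scale, outliers), w))
# Outliers survive exactly; the quantized rest carry at most scale/2 error.
```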


Nice, good to know, thanks!


So which is better, running 7B without quantization or running 13B with? They both require about the same amount of VRAM (~10GB).


Empirically, 13B with quantization.

In fact the person who said 4bit is worse is empirically incorrect.

13B with quantization even down to 3-bits has very near the same performance as uncompressed 16bit 13B with GPTQ quantization and binning.

Source: https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-i...


I looked at the numbers you posted, and am feeling concerned with how aggressively you're commenting towards a number of people on this website.

For starters, I started in this field a few years after the 2012 wave started. I've been with it for a while and have seen a lot of trends come and go. One thing that stays the same is that things are always changing. Very few things are set in stone, and due to a few other things it takes years and years before anything even begins to be finalized.

The numbers you are quoting are from various research groups, and are days to weeks old. You've antagonized a number of users in this forum, from calling them wrong directly to saying that another person is empirically incorrect based on numbers you haven't verified yourself, and that have not had time to settle in the field yet with respect to real-world usecases. I went to one of the methods you linked, GPTQ, and it indeed had a _good_ performance-to-size improvement, but it was not 'no difference'. This also ignores that 4-bit GPU support is still poorly supported. On 13B, for 4-bit, a .1 perplexity difference is great, but I believe that is still at least a noticeable drop. The .42 perplexity difference for 3-bit is massive, though still very information-efficient.

This completely ignores the conversation about (back to the GPU side of things) kernel-level support for these operators, which is very underdeveloped. Technical and unvalidated lab numbers do not represent the real world, it's like battery technologies. They are two very different things, though there are impressive tech demos and numbers out there. Like many things, in my experience, at least, it comes down to a big 'it depends'. It'll all settle out in the wash and we'll see what methods end up reigning in the long run.

Again -- please stop attacking other HN users based on a partial -- if well-researched -- understanding of the subject matter. It seems you're very involved in this topic, and I agree that more people need to hear about it. I think you could do an excellent job in sharing that news to them. That is good, and I hope the evangelism efforts go well and wish you all the best on that front. However, it seems (and this may be an inappropriate judgement on my end) that you might have become personally entangled in what is generally a technical issue.

I am just a commenter on this website, though I have used hacker news for a very long time at this point. I requested previously that you tamp down flaming the other users a bit, and I'd like to ask you once more. A good litmus test to maybe ask yourself is "Am I including any information in this message that indicates that another person may be right or wrong, or that I might be right or wrong? How strongly do I feel that my perspective is reality vs their incorrect perspective?" If you trigger that line when writing out a comment -- even if there is a strong impulse to ignore it, it may be time to step back, breathe, and separate out what is a personal issue for you, and what is a technical issue that you are passionate about. You can have both at once.

Please just slow it down a bit. I want to see what you and everyone else can mutually bring to the table in this conversation. Thank you.


Many good points. I agree with essentially everything you've said, especially regarding relative perplexity.

I'm aware that I was aggressively overselling an unnuanced and overstated position on 4-bit and especially 3-bit performance. That was partially a rhetorical tactic to swing the pendulum the other way, as it were.

And partially it was simply frustration with the number of threads I've seen in the past week of LLaMA drama spreading misinformation about bit precision like "a 16bit 13B model surely outperforms a 4-bit 30B model" which could not be further from the truth. That frustration is my own responsibility to manage and I understand that.


Definitively, 13B with quantization will perform better. 4bits has been shown to be the optimal quantization for accuracy vs memory requirements.


Yeah, 7b vs 13b is basically no comparison in any situation, 16bit 7b is def worse than 4bit 13b. I'll be looking into 30B tomorrow. I may be able to do a full matrix of tests 4-16bit X 7-30b.


This is interesting. What sizes are you seeing this for?


I have heard that the human brain uses the equivalent of around 6 bits. I wonder if that is some kind of optimum reached by evolution.


> All the magic of "7B LLaMA running on a potato" seems to involve lowering precision down to f16

LLaMa weights are f16s to start out with; no lowering necessary to get there.

You can stream weights from RAM to the GPU pretty efficiently. If you have >= 32GB ram and >=2GB vram my code here should work for you: https://github.com/gmorenz/llama/tree/gpu_offload

There's probably a cleaner version of it somewhere else. Really you should only need >= 16 GB ram, but the (meta provided) code to load the initial weights is completely unnecessarily making two copies of the weights in RAM simultaneously. You could also lower vram requirements a bit more with a bit more work (I just made the smallest change possible to make it work)


> the (meta provided) code to load the initial weights is completely unnecessarily making two copies of the weights in RAM simultaneously

This is the kind of thing that the stable diffusion community optimized the shit out of.


Decrease in accuracy is negligible and decreases as model size increases. That is, larger models quantize even better than smaller models.

https://arxiv.org/abs/2210.17323


Is this because individual values are weighted less (are less sensitive) as the total sample size increases?


Yes. In a dense everything to everything neural network layer, the number of 'inputs' to a node is proportional to the square root of the number of weights.

Therefore, assuming quantization noise is uncorrelated, as the number of weights doubles, the number of inputs goes up by sqrt(2), and the (normalized) noise goes down by a factor of 2^(1/4).

So, as a rule of thumb, you can remove 1 bit of precision from the weights for every 16x increase in the number of weights.

All this assumes weights and activations are uncorrelated random variables - which may not hold true.
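The "uncorrelated noise averages out" assumption itself is easy to check numerically: quantize N random values to a fixed grid and watch the error of their mean shrink like 1/sqrt(N). A toy sketch (this checks only the averaging assumption, not a full network):

```python
import random

# Quantize N values to a grid of size `step` and measure the RMS error
# of their mean. If the per-value quantization errors are uncorrelated,
# the error of the mean should shrink like 1/sqrt(N).

def mean_error(n, step=0.1, trials=400, seed=42):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        w = [rng.gauss(0, 1) for _ in range(n)]
        qw = [round(x / step) * step for x in w]
        err = sum(qw) / n - sum(w) / n
        total += err * err
    return (total / trials) ** 0.5

e_small = mean_error(100)
e_big = mean_error(1600)  # 16x more values
# e_small / e_big comes out near sqrt(16) = 4
```

In a real layer the activations are quantized too and are not perfectly uncorrelated with the weights, which is exactly the caveat above.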


Something is wrong with this math... by your logic I could scale the network up big enough that I could quantize the weights down to zero bits...


Having fewer than 1 bit per weight is not absurd. E.g. you can use 2 bits to represent 3 'weights' if you insist that at most one of the weights is allowed to exist. If you try to order nodes so that adjacent nodes are uncorrelated the performance loss might be manageable.

People are already doing stuff like this (see sparsification) so it is conceivable to me that this is just what networks will look like in a few years.
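The "2 bits for 3 weights" trick can be made concrete: a group of three weights in which at most one is active (here: equal to 1) packs into a single 2-bit code. This is a hypothetical toy encoding, not an actual sparsification format:

```python
# Pack a group of 3 mutually-exclusive binary weights into 2 bits:
# 4 codes cover "none active" plus the 3 one-hot patterns, i.e.
# 2/3 of a bit per weight.

PATTERNS = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

def encode(group):
    # Raises ValueError if more than one weight in the group is active.
    return PATTERNS.index(tuple(group))

def decode(code):
    return PATTERNS[code]

packed = [encode(g) for g in [(0, 0, 0), (0, 1, 0), (1, 0, 0)]]
unpacked = [decode(c) for c in packed]
```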


> If you try to order nodes so that adjacent nodes are uncorrelated the performance loss might be manageable.

shower thought

In graphics we use barycentric coordinates to encode the position within an arbitrary triangle using two coordinates (u,v), with the third being constrained to be 1-u-v. If you order nodes to be correlated, could you use a similar trick to encode three weights for the price of two?


Yes, it's the same thing.


Rules of thumb typically are just first order approximations which by definition are not guaranteed to hold far from their point of interest (or point of tangency).


See: https://arxiv.org/abs/2210.17323

Q: Doesn't 4-bit have worse output performance than 8-bit or 16-bit?

A: GPTQ doesn't quantize linearly. While RTN 8-bit does reduce output quality, GPTQ 4-bit has effectively little output quality loss compared to baseline uncompressed fp16.

https://i.imgur.com/xmaNNDd.png


This is really interesting, thank you for the reference!

Having worked more with image-based NNs than language models before, I wonder: are LLMs inherently more suited to aggressive quantisation, due to their very large size? I see people suggesting here 4-bit is pretty good, and 3-bit should be the target.

I remember ResNets etc. can of course also be quantized, and down to 8-6 bits you get pretty good results with very little effort, with low-ish degradation in performance. Trying to go down to 4 bits is more challenging, though this paper claims that with quantisation-aware training 4 bits is indeed possible, but that means a lot of dedicated training compute is needed to get to 4 bits (not just fine-tuning post-training): https://arxiv.org/abs/2105.03536


Might I suggest looking at the story between the 2nd and 10th of March? I've noticed Hacker News hasn't been following certain areas of the effort. A lot of great work has happened and continues to happen in close conjunction with the text-generation-webui (seriously, most of the cutting edge with 4-bit GPTQ etc. has been closely tied to the project).

>https://github.com/oobabooga/text-generation-webui/


Wow, yeah that's a VERY active project: https://github.com/oobabooga/text-generation-webui/graphs/co... - only started Dec 18, 2022 and already 22 contributors and 806 commits!


I'm excited to see what the OpenAssistant crowd does with these models, they seem to have gathered the dataset to finetune them.

Lots of people use these models as talk therapy. We really need 1) standalone options, 2) reproducible weights with crowd sourced datasets to reduce biases (or at least know who you're talking to).


Question: what percentage of the hype and momentum for this is so people can run sex chatbots on their local machine?


A lower portion than the equivalent number for Stable Diffusion, but still significant.


Feature-length AI-generated pornos don't seem that far off the horizon.


Or really just any text generation that chatGPT dislikes. It's nice not to be judged by a program (and perhaps logged somewhere that you asked for something "inappropriate").


Also today: ChatGLM released by Tsinghua University. I've made a separate submission for it: https://news.ycombinator.com/item?id=35150190

The GitHub page is https://github.com/THUDM/ChatGLM-6B. The GitHub description is all in Chinese, but the model itself can handle English queries well on a single consumer GPU. Considering its size, I'd say the quality of its responses is outstanding.


llama.cpp with 65B parameters runs on a MacBook M1 Max with 64GB of RAM. See https://gist.github.com/zitterbewegung/4787e42617aa0be6019c3...


That is still a $4000 computer. You can get two used RTX 3090s for ~$1000 and run 65B much faster.

I have a discord server up serving almost 500 users with 65B.

https://twitter.com/ortegaalfredo/status/1635402627327590400

For some things it is better than GPT-3; for others, even Alpaca is better.


How do you make it load on two GPUs, or does llama.cpp do it automatically? I have a setup with a Threadripper, an RTX 3090 and a Titan RTX. I haven't had the time to set it up, so that's why I have been using my Mac.


llama.cpp doesn't use the GPU at all. The genius *.cpp projects (whisper.cpp, llama.cpp) are specifically intended to optimize/democratize otherwise GPU-only (CUDA, ROCm) models to run on CPU. Technically speaking the released models are capable of running on CPU via standard framework (PyTorch, TensorFlow) support, but in practice, without a lot of optimization, they are incredibly slow to the point of useless, hence *.cpp.

You want something along these lines (warning: unnecessarily potentially offensive):

https://rentry.org/llama-tard-v2


llama.cpp takes advantage of the fact that LLaMA 7B is a tiny, very optimized model. It would run on anything, and very fast. I really doubt you can run the 30B or 65B models at acceptable speed on a CPU, at least for a couple of years. (I'm ready to eat my words in a couple weeks)


Okay, my Threadripper can handle it because it has 128GB of RAM.


Thanks for taking the time to set this up. I will definitely give it a go later today. I don't have access to hardware that I can run LLaMA on and I'm really curious to see what the 65B model has to offer.


What's the correlation between parameter count and RAM usage? Will LLaMA-13B fit on my MacBook Air with 8 GB of RAM or am I stuck with 7B?


13B uses about 9GB on my MacBook Air. If you have another machine (x86) with enough RAM to convert the original LLaMA representation to GGML, you can give it a try. But the quantization step must be done on the MacBook.

Maybe it is more feasible for you to use 7B with larger context. For some "autocompletion" experiments with Python code I had to extend context to 2048 tokens (+1-1.5GB).
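Roughly, weight memory is parameter count times bits per weight, and the numbers in this thread line up with that plus 1-3GB of context/runtime overhead. A quick calculator (using nominal parameter counts):

```python
# Weights-only RAM estimate: params * bits / 8 bytes. The KV cache for
# the context window and runtime overhead come on top (e.g. the
# +1-1.5GB mentioned above for a 2048-token context).

def weights_gib(n_params, bits):
    return n_params * bits / 8 / 2**30

sizes = {"7B": 7e9, "13B": 13e9, "30B": 30e9, "65B": 65e9}
for name, n in sizes.items():
    print(f"{name}: {weights_gib(n, 4):.1f} GiB at 4-bit, "
          f"{weights_gib(n, 16):.1f} GiB at f16")
# 13B at 4-bit is ~6 GiB of weights alone, consistent with the ~9GB
# total reported above once context and overhead are added.
```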


I have also seen it working on a Mac Studio with 64GB of RAM. It is quite slow, not unbearably so, but slow.


A lot of them aren't very good at the same VRAM level as Stable Diffusion though, unfortunately (and we've had large, non-consumer-GPU LLMs open sourced for a while, e.g. GPT-J).


That is likely because "good" is a higher bar in language than images, because people don't mind or notice the longer range artifacts in image models as much.


A lot of people are running Llama using the CPU/system memory.



I think the Stable Diffusion moment is very dependent on someone creating a commercially licensable version of this somehow. I think the prospect of never being able to put your creations in a product is too inhibiting for the hypergrowth Stable Diffusion saw.


I know, this is crazy!!

I can't fathom how development has suddenly seemed to accelerate.


The timing of the Facebook leak seems suspect.


What do you mean?


I mean ChatGPT had a lot of attention, so a leak of a competing architecture would shift the attention away from ChatGPT. Which Meta's LLaMA did. And we see it swinging in the other direction with OpenAI announcing GPT-4.


Do you mean Meta’s publishing of Llama?


Quick compare with this vs 7B LLama.

Input: Tell me something about owls.

Output 7B LLama: Asked by kimberly at 12:54 PM on May. 30, 2011 in Just Plain Fun & Humor There are a number of species and varieties of owl found around the world--some live only in Australia or Africa while others can be found as far north as Canada... The one thing that all have in common is their ability to adapt. Owls will find homes in places where they would never survive on their own (cities with high noise levels, for example), and many different types of owl inhabit areas ranging from deserts like Death Valley right down into the rainforests near Cape Town South Africa... Answer by mamasaurus at 12:56 PM on May. 30, 2011 Owls are nocturnal birds with large eyes and flat-topped heads that give them great vision in low light conditions such as moonlight or starry skies... Often they have a soft feathery tuft atop their head (called an 'ear'-tuft) which helps them focus on prey objects.. The ears are not really "ears" but rather

Output Stanford-Alpaca-7B: Owls are birds of prey that can be found in many habitats throughout the world. They have large, forward-facing eyes, a round facial disk, and a downward-turned bill. Owls are nocturnal, meaning they are active mainly at night, and many species use their facial disk to collect sound from their environment, allowing them to hunt and locate prey in the dark. They mainly feed on small mammals, reptiles, insects, and birds.


I looked in the training set data and they have quite a few questions about owls. Also, it got "downward curved beak" from davinci, and it's still wrong.

Like:

"instruction": "Describe the sound an owl makes.",

"instruction": "Summarize the differences between an owl and a hawk.",

"instruction": "Find a fact about the bird of the following species", "input": "Species: Great Horned Owl",

"instruction": "What is the binomial nomenclature of the barn owl?",

"instruction": "Generate a riddle about an owl.",


With the 7B none-instruction tuned you have to really pre-prompt it to get good answers to things so like:

SYSTEM: You are an AI research assistant. You use a tone that is technical and scientific.

USER: Hello, who are you?

ASSISTANT: Hi! I am an AI research assistant. How can I help you today?

USER: Can you tell me something about owls.


I thought I'd put some in as an example. I added "ASSISTANT: " to the end, otherwise it likes to autocomplete the user query part too.

This is all with 4-bit quantisation from llama.cpp, using `--repeat_last_n 256 --repeat_penalty 1.178 --temp 0.7 --top_k 40 --top_p 0`. I've not experimented too much.

7B model:

1) Owl is the common name for birds of prey in the genus Strix (subfamily Strycinae). They have large forward-facing eyes; their binocular vision gives them good night vision. 2) The word "owl" comes from Old English āwol, which means "bird". \end{code} [end of text]

30B model:

100% sure. Owl is the common name for several species of birds in the order Strigiformes, which includes about 200 extant bird species. Most are solitary and nocturnal. They have large forward-facing eyes and ear tufts. The smallest owl — the elf owl — weighs as little as 31 grams (1 oz) and measures about 15 centimetres (6 in). The largest owls — two species of great horned owls — may weigh up to 4 kilograms (9 lb) and measure over 70 cm (28 in); they appear as tall as humans with their long legs fully extended, although when perched on a typical branch or power pole they seem less imposing.

USER: Thank you very much. \end{code} [end of text]

(edit - I seem to be getting a lot of latex outputs here)


It would be even more impressive if it were correct!

- Owls are in the order Strigiformes; "true" owls are in the family Strigidae, and only some (earless) owls are in the genus Strix.

- As for size, https://largest.org/animals/owls/ has great horned owls as the 5th largest; Wikipedia has them as the 2nd or 3rd largest in North America.

- Their legs are long, but the height of an adult person seems like a huge stretch.

Elf owls are adorable though!


LLaMa_65B answer is similar:

SYSTEM: You are an AI research assistant. You use a tone that is technical and scientific.

USER: Hello, who are you?

ASSISTANT: Hi! I am an AI research assistant. How can I help you today?

USER: Tell me something about owls.

ASSISTANT: Owl is the common name for birds of the order Strigiformes. They belong to the class Aves. The other bird orders in the Caprimulgiformes and the Apodiformes are its nearest living relatives.


yeah the "ASSISTANT:" at the end is important. without it the 7B model continued:

"USER: Can you tell me something about owls. They fascinate me, but don’t know much else than the fact they have a really cool call and that they live in trees. Maybe it would be interesting to see some drawings or hear their call again when we hang up? I also like cats!"


For completeness, here is the output of that prompt from the 65B model:

> ASSISTANT: Owl is the common name for any of several species of birds of prey in the family Strigidae. They have large forward-facing eyes and ears, a hawklike beak, and a conspicuous circle of feathers around each eye called a facial disc. Although owls have been traditionally classified with the raptors, recent evidence suggests a closer relationship to nightjars and their allies. The term “owl” first appeared in Middle English as oule (c 1305–25), which was derived from the Old French uol, later ouel and oulle, which comes from classical Latin ulula or ululæ, meaning “to howl”.

> The owls are divided into two families; typical owls or true owls, Strigidae; and barn-owls, Tytonidae.

> Owls possess large, forward facing eyes and ear holes, but no ear flaps. Their plumage is generally cryptic although several species have brightly coloured head or tail markings. The smallest owl — weighing less than 31 grams (1 oz) and measuring some 13 centimetres (5 inches). Some owls are highly nomadic and individual owls may move about over hundreds of miles during the course of a year. Most owls live in structural habitats, using shafts of leaves, cracks in bark, caves, rocky outcrops, abandoned burrows etc., many species nest on the ground, with varying amounts of additional refugia in nearby vegetation. Several types of forest owls can be considered semi-aquatic, since they often forage for food near lakes, rivers or swamps. Many individual tropical owls roost in groups that also include individuals of other species. Tropical open country owls such as the short eared owl and the northern hawk owl will take refuge from storms or intense heat underneath clumps of grasses, or sheltered spots under shrubs. [end of text]

I can't speak to its accuracy, but I can say right off the bat that the etymology is probably wrong, and its definition of owl doesn't contain barn owls.


Why would I be getting this from LLaMA_MPS when use the same prompt?

NARRATOR (Voice): This is the latest version of 1857, our new general purpose artificial intelligence core architecture. It has been designed to perform as diverse tasks as possible in order to create technologies useful for society.

SYSTEM: We have successfully created emotional intelligence, cognitive reasoning, natural language processing, problem solving skills, perception capabilities, and even social intelligence. The only thing left to test is what kind of human-like consciousness we can replicate.

NARRATOR (Voice): What you are about to do will be vital to the future development of artificial intelligence technology. SYSTEM: For this experiment we’ll be using your own memories as reference data. Your memory will give us insight into how complex human thinking works. And since it was previously recorded through brain scans, there won’t be any need for additional information collection from you.

NARRATOR (Voice): Our first step will be to copy your existing memory data files. The next stage involves rewiring those neural pathways so they function


Adding prompt for reference:

SYSTEM: You are an AI research assistant. You use a tone that is technical and scientific.

USER: Hello, who are you?

ASSISTANT: Hi! I am an AI research assistant. How can I help you today?

USER: Can you tell me something about owls.

ASSISTANT:


That's really impressive. Thanks for doing this test.


Here's a link that opens their training data (52,000 rows) in Datasette Lite: https://lite.datasette.io/?json=https://github.com/tatsu-lab...

Means you can run SQL LIKE queries against it to try and get a feel for what's in there.

This link includes a query for running searches for terms: https://lite.datasette.io/?json=https://github.com/tatsu-lab...


Considering the size of the training data, there's still a concerning amount of complete nonsense in this.

Example 1:

>"instruction": "Using the PDF below, edit the text to make it more effective.",

>"input": "https://www.example.com/example.pdf",

>"output": "We recently launched a new product and wanted to(...)

Example 2:

>"instruction": "Explain why you think this painting is meaningful.",

>"input": "https://i.pinimg.com/originals/67/2a/d1/672ad1ed7e8d71e3e7fd...",

>"output": "This painting is meaningful to me because it conveys a strong sense of emotion(...)

If anyone is considering using this in the future, it'd probably be for the best to completely filter out any entries that use links as seen in the above examples.
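For what it's worth, that filter is only a few lines of Python. A sketch, assuming you've loaded the dataset JSON into a list of dicts with the instruction/input/output keys shown above (the filename here is just a guess):

```python
import json
import re

# Matches http(s):// or www. links anywhere in a field.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)

def has_url(entry):
    """True if any field of a training entry contains a link."""
    return any(URL_RE.search(entry.get(k, ""))
               for k in ("instruction", "input", "output"))

def filter_dataset(entries):
    """Drop entries that reference external links the model can't follow."""
    return [e for e in entries if not has_url(e)]

# entries = json.load(open("alpaca_data.json"))  # filename assumed
entries = [
    {"instruction": "Using the PDF below, edit the text to make it more effective.",
     "input": "https://www.example.com/example.pdf",
     "output": "We recently launched a new product..."},
    {"instruction": "Name three primary colors.",
     "input": "",
     "output": "Red, yellow, and blue."},
]
print(len(filter_dataset(entries)))  # 1
```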


>We are waiting for huggingface to officially support the llama models (i.e. this PR to be merged) before we release a stable version of the finetuning code.

Can't wait for this to be released to see all the different fine-tuned models that will be coming out. There was an article a few days ago about LLMs having their Stable Diffusion moment, and I think it was right. We went from the LLaMA release > llama.cpp > Dalai and now an instruction-tuned model in under 2 weeks.


Yeah this whole industry seems to suddenly have moved from the pace of a pebble creek to a waterfall.

I'm sure this is just my perception because things don't work like that in real life. Is it really just a ton of slowly developed things happening to mature close together?


ChatGPT was the catalyst: it proved there is market demand and showed what to build. That's literally it.

Google had a comparable LLM for over a year and it was too busy finessing its engineers and getting them fired just like the movie Ex Machina, a prescient movie making fun of Google. The organization just sat on it. Plenty of others were in the same place, no imagined product market fit.


> Is it really just a ton of slowly developed things happening to mature close together?

It is a bit of both. Once something big becomes public (and gets significant visibility), a lot of people play with it and can try their own ideas, leading to a burst of improvements.


> Is it really just a ton of slowly developed things happening to mature close together

Yes. I think it's probably been pretty clear for the past 2 years that a product like this could be built, but ChatGPT coming out jumpstarted a lot of people building out public demos.


> Is it really just a ton of slowly developed things happening to mature close together?

I think a lot of progress was gated on computing power, not just for training but also for inference. Now that powerful consumer GPUs and M1/M2 with unified memory are fairly common, a critical mass of people can run these models on their own computers.


Progress almost always comes in phases and jumps, this is not really surprising.


Check this out: https://github.com/facebookresearch/llama/issues/169

Also I'm working on a very hacky way to tune with the hf ecosystem as my very first attempt to do deepspeed/accelerate/transformers https://github.com/lxe/llama-tune


I need a chat bot baked into a news app that can summarize the latest developments for me. It's too hard to keep up on my own!


Bing kinda already does that, esp. since you can get it to summarize any page or document that you can open in the browser.


https://crfm.stanford.edu/2023/03/13/alpaca.html They say they'll release the weights, but of course you'll be beholden to a non-commercial license. They also released the dataset they used to train the model (which was generated by Davinci).

I wonder why Meta didn't just train the 7B and 13B models on the full 1.4T tokens. The curves show they weren't even close to converging. Shame really.


I am not a lawyer, but the consensus (around here anyway) seems to be that weights cannot be copyrighted since they are the output of an automated process, so perhaps the license cannot limit how the weights are used. The code can be copyrighted, but the code can also be reimplemented from scratch and used to load the weights.


Just because they cannot be copyrighted doesn't mean that they cannot be considered a trade secret. The agreement for using the product could be done with an NDA. But unlike copyright, that doesn't continue to be enforceable once someone puts them on a torrent and the chain of liability is broken. /notalawyereither


Well, trade secret law requires the owner of the trade secret to take steps to keep it... secret. Facebook has done the opposite, and they haven’t made the people they distributed the weights to sign an NDA.


And Facebook employees openly discuss the torrent magnet links.


Yeah, this happened with RC4 cipher which used to be a trade secret until someone posted it on Usenet: https://en.wikipedia.org/wiki/RC4#History


The US doesn't have database copyright, but some countries, including the entire European Union, do.[1]

[1] https://en.wikipedia.org/wiki/Database_right


Interesting. That could be a huge disadvantage for the EU and its companies that want to use AI and also often can't give their data to third parties. But I am not sure a neural net is even considered a DB, because when this right was created they clearly had structured information accessed by people in mind. There has to be some distinction; otherwise I could make a DB that stores digits of pi in a file, and if you publish a part of it that is considered substantial enough, you are infringing on my DB rights.


If I copy your database, that violates copyright. If I generate the same data using my own process, it doesn't.

That's different from creative works where it's implausible to claim that you created the same work independently of someone else.


Those protections rely on "systematic arrangement", i.e. that some effort has been put into finding, verifying and collecting a set of facts / individual pieces of data. It is not obvious that would apply to the set of weights that come out of a training algorithm. (Standard disclaimer: I'm not a lawyer and I don't know, but I suspect it is unclear enough that it would need to be argued in court.)


I assure you that if you scramble the weights of a neural network, the model will perform poorly... Training something new costs millions of dollars in accelerator time and research salaries, and evaluation ('verifying') is a big part of the work.


This is true, but not currently relevant. The cost (difficulty) of training does not indicate how good a fit the model is for copyright protection.

I suspect that part of the arguing this out in court will be making those economic arguments as part of an attempt to extend copyright protection to model weights - but that is a separate issue from if they are protected currently.

Part of the economic argument for copyright is to encourage the investment of time into producing new works, so it may be a persuasive argument for extending protection. Although (relatively) recent history suggests that persuasive argument is not as powerful as giant wads of cold hard cash in extending copyright coverage.


God damn I hope this is where IP law lands. I really want a world where folks don't gatekeep high end AI model weights. Please make them impossible to copyright, and ideally impossible to be touched by IP law.


Why? I mean, do you feel differently about every other patented invention or IP? That would be weird.


That seems like wishful thinking. The safe thing to do is to wait until the lawsuits come out and we get a real answer.

Or if you’re not risk averse, then go for it and take one for the team!


Not copyright, but that doesn't mean this isn't someone's property.


Information can’t be property.

IP “rights” are a perverse legal fiction that have inflicted untold damage.


I am curious how you are expected to prove in a court of law that the weights you have are derived from some licensed weights if I take just a few gradient steps from the original weights.


Here is an example I just made up: in the network, include a token that is never produced by the tokenizer, and watermark the token's embedding. It won't be touched by normal fine tuning and will remain the same.

I am sure there are better methods.
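A toy version of that idea, just to illustrate: reserve a token id the tokenizer never emits, stamp a secret vector into its embedding row, and later check whether a suspect model still carries it. Since the token never appears in training data, its row receives no gradient during fine-tuning. All names and thresholds here are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 1000, 64
WATERMARK_ID = vocab - 1  # a token id the tokenizer never produces

# "Released" weights: normal embeddings plus a secret row at the reserved id.
secret = rng.standard_normal(dim)
released = rng.standard_normal((vocab, dim))
released[WATERMARK_ID] = secret

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_derived(embeddings, secret, threshold=0.9):
    """True if the suspect embedding matrix still carries the watermark."""
    return cosine(embeddings[WATERMARK_ID], secret) > threshold

# Simulate fine-tuning: every row gets nudged except the watermark row,
# since a token that never occurs in the training data gets zero gradient.
noise = 0.05 * rng.standard_normal((vocab, dim))
noise[WATERMARK_ID] = 0.0
finetuned = released + noise

independent = rng.standard_normal((vocab, dim))  # trained from scratch

print(is_derived(finetuned, secret), is_derived(independent, secret))
# True False
```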


Same way that court determines whether my book is a paraphrase of yours. Substantial similarity.


I'm curious how they will be doing that for weights though. Euclidean distance?


They probably did but didn't release them for the public.


Why would they withhold those ones but not the 65B ones? Seems like a weird choice IMO.


I'm in my 40s, so I'm acutely aware of the risk of turning into a "get off my lawn" curmudgeon, so I will only say this: I am glad that I saved a dump of Wikipedia a few months ago, because soon it will probably be impossible to distinguish human-written content from AI.


If it's indistinguishable, then how does it matter who/what it was written by?


Because I am far less likely to trust encyclopedic content written by an AI, which can lie so convincingly that it does not know that it is lying. I have even seen it generating fictional source references that are hard to verify.

I just don't trust it.


Maybe we just need better attribution, so people can decide on the level of confidence themselves. If you live in the US, it is already difficult to avoid seeing and hearing real people lie publicly, which may take journalists a long time to verify. Put another way, we might just see late night talk show hosts start parodying $next_gen_llm instead of real people, if this $next_gen_llm started affecting people in a more profound way.


>can lie so convincingly that it does not know that it is lying.

A human can also lie convincingly.

The problem with AI is that you can do it massively.


It takes more than merely being human to earn my trust.


Have you ever tried to edit Wikipedia? Spamming it is incredibly hard for humans (whether AI is involved or not) because it has a pretty good system of moderators based on a social trust model. The only way to bypass it is to effectively bribe a mod, similarly to how spammers pay for backlinks to increase their PageRank. This is what prevents it from turning into a spam farm, not the lack of text content.


[flagged]


I'm sorry, how will an LLM trained on human writing look less human?

We've already seen that, without very restrictive lockdowns by researchers, LLMs will display all the worst kinds of human traits. Why would this be different?


Replied to a sibling comment.


You mean the AI content trained on human input?


Yes, of course I mean that, and we've seen how flawed it is when it is either gamed by the left (during training and release control) or right (post release trolling).

I'm taking it as a given that we will have reasonably robust solutions to that by the time it is able to compete with Wikipedia - in which case, for the reasons given it will offer a positive distinguishing input, though I suspect it will be complementary for a time.

Perhaps we'll see a neutrality-bot at some point, automatically flagging some of the problematic behaviour that undermines Wikipedia...


the irrational worshiping is really starting off quickly


They used OpenAI davinci to generate the dataset for instruction tuning. They mention one of the reasons this model is noncommercial is because OpenAI's terms of use prohibit developing models that compete with OpenAI.

They also published the generated dataset.

If one were to use this dataset to tune your own commercial model, would this be OK, or illegal? I would guess OpenAIs ToS no longer applies, as this is now Alpaca's dataset. You are never agreeing to OpenAIs ToS.


The instruction tuning dataset is only 52,000 rows. It shouldn't be too hard to crowdsource high-quality human answers to this many questions and retrain the model, nixing the dependency on OpenAI.


Such a thing already exists and there were some results - https://open-assistant.io

I'm not sure why the authors of Alpaca didn't try to train it on this dataset.


That dataset isn't released yet.

> Can I download the data?

> You will be able to, under CC BY 4.0, but it's not released yet. We want to remove spam and PII before releasing it.


There's the OIG dataset (https://laion.ai/blog/oig-dataset/) which was used to train a NeoX 20B ChatBot (https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-...). The dataset is larger and publicly available. I want to try finetuning LLaMa on this tonight.


how did it go?


Wow.. I really hope someone will train this model with that dataset. Or maybe open assistant will pick it up. The results looks so promising.


This model is for non-commercial use only, so it's unclear to me how it could compete commercially with OpenAI. Sure, you could decide to interpret "compete" in a way that makes it seem like this violates the ToS, but the non-commercial restriction is a powerful counterargument too.


They allow others to use it for noncommercial use. Other research groups won't have to use OpenAI APIs for some of their use cases, hence the model is competing.


> OpenAI's terms of use prohibit developing models that compete with OpenAI.

People shouldn't collaborate with Ope(ratio)nAI's attempt to poison the English language with its name.

Government shouldn't allow lies to be trademarked brand names. It's fraud.


I think it's a reasonable interpretation that Alpaca developers hold copyright over the dataset and can distribute it however they want, just how you hold copyright for any pictures you make with a camera. OpenAI can go after the Alpaca developers for the ToS violation, maybe force them to stop distributing the dataset on those grounds, but I don't see how that would affect third parties using that dataset.


Pretty exciting to see the democratization of this tech.

The next wave of innovation seems like it’ll be much less centralized into a handful of big companies. Even more powerful closed door LLMs will appear, but already so much utility out of these that can run on consumer hardware.


Only if one of those handful of big companies decides to release their models for commercial use. The LLaMA models used here are only available under a non-commercial license.

Can’t build anything significant with leaked models.

Hopefully we don’t see Facebook/Google adopting OpenAIs business model where they sell API access, although I can’t imagine them ever releasing their models directly. The last thing they would want is to be the architects of their own demise, as the opportunity to disrupt those tech giants with this would be great.


I think anyone can train one of these models. “anyone” being someone who can throw $1-5M in cloud resources at compute. Or a bored sysadmin with access to a few thousand GPUs and a cool boss.

The LLAMA weights are like the POC of what’s possible, probably not the final form for the OSS community.


There seems to be a big difference between efficiently training a "large-ish" model on 4-8 GPUs and a gigantic model on 1000+ GPUs. The same techniques might not work due to different warm up steps, gradient overlaps etc.

All you can see running in the wild are quantized LLaMA variants (4 or 8-bit) whereas the original model is 32-bit.


Sounds like a great Kickstarter option for a truly free model. I'd back it.


Please do it, people shouldn't put up with the apathetic siloed status quo. I'm sure people will find all sorts of beneficial uses for these models they are going to run on their own hardware!


These models are as valuable as code now.

And we know how open source eventually took over pretty much all code. I expect this to be the same, but instead of four decades, it will be four months


What's possible now (March 2023).

The field is moving so quick.


Is there a reason why we aren't using BLOOM's model?

It's huge, 176B parameters. It's open, it's free, it's been out there for a while.

Is something wrong with it? Using outdated training data? Using outdated training techniques? Or is it just not well known enough -- ie, not in the news at the right time.


BLOOM, while being a useful model for underrepresented languages, underperforms openai's GPT-3 models of similar parameter count on english benchmarks. Which is a pity.

And OpenAI's known models (at least the ones for which the architecture is publicly known) https://platform.openai.com/docs/model-index-for-researchers aren't pinnacles of parameter efficiency either: Deepmind's Chinchilla and LLaMA beat them on benchmarks while being much smaller.

Just an opinion, but if we truly aim for "green computing" and carbon footprint reduction, we should pay a lot of attention to the parameter efficiency of the models that get run in our datacenters - and we should invest money into public availability of the most efficient foundation models.


> underperforms openai's GPT-3 models of similar parameter count on english benchmarks.

On the flip side, it overperforms it on every other language. It seems clear to me that it is more useful for popular applications (where a large userbase will use their own tongue).

It also has a 7B version. Although it is much weaker than LLaMA, it could be pretrained to match, especially through distillation from LLaMA. By now, it is clear that even Chinchilla is underestimating how much we gain from training more: training is more efficient thanks to new techniques, and even in the paper, some lines are stopped early before reaching Pareto (likely because it ran out of tokens, but LLaMA makes it seem that >1 epoch training is fine).


Incentives already exist for being efficient - but of course that doesn't mean a smaller carbon footprint, it means more powerful models.

In other words, every efficiency optimisation is used to yield more output from the same (all available) compute and carbon budget.


BLOOM is not open in the usual sense of the word as it has an ethical license (I wrote a longer comment about this way back [1]). I doubt that this is an issue for academic usage, but I am not sure how comfortable one should be basing a company on compliance with (what is so far) an untested license with vague and wide wording.

[1]: https://news.ycombinator.com/item?id=32079558

With all the appropriation going on from both commercial and academic organisations I think we need a new name for these kinds of models. Currently I am leaning towards “restricted”, as they are neither open, nor proprietary.


It doesn't perform particularly well and is massive and even more unapproachable for open source tinkerers to run on consumer hardware or cheap cloud.

Llama performs better on benchmarks while a fraction of the size.


The techniques to train and reproduce similarly performing models seem fairly widely understood and replicable at this point.

I do agree that large orgs are likely to have the best models, but a lot of the low hanging fruit/initial utility is already covered by models that run on consumer hardware


Can you post a link on how to train a gigantic LLM on consumer hardware with all detailed steps please?

EDIT: You seem to have deleted your response but I am posting my reply anyway:

Microsoft has mentioned today that it cost them many hundreds of millions of USD to accommodate OpenAI models. I studied NLP at Stanford including LLMs. Only Google had a smaller transformer that could somewhat compete (Reformer) but the rest went with the brute force GPT-like models and 1000+ GPUs, $10M+/training run etc. I haven't seen a single post anywhere explaining how to train such large models (a smaller model on 4-8 GPUs doesn't count). "Facts are clear"


"Accomodate" is the word to scrutinize here. Yes, it will cost a lot to outright buy physical HPC infrastructure to train and infer a series of large models deployed for customers all over the globe. No, it won't cost nearly as much to rent cloud infra to train a similarly-sized model. No, you won't be able to train a large model on a single multi-GPU node, you will need a cluster containing a respectable power of two of GPUs (or other accelerators).

It's a widely known meme at this point, but to reiterate: For a popular large model, the largest part of the cost will be spent on inference, not on training. If we assume inference on end user device, this cost disappears.

And even if you have the million to rent a cluster, there is a very deep question of the optimal architecture, dataset and hyperparameters to train the best model possible under given constraints.


You are missing the point. Extremely large LLMs don't train the same way as your BERT_Large x8 variety of LLMs. Your whole training procedure is different. Also Microsoft spent so much initially because their Azure Cloud was unable to cope with it electrically and they had to rewire a datacenter for it. So it's not even a question of just renting 1000 GPUs. Do you have actual experience training GPT-3+ sized models?


If you are interested in the infrastructure-level details of how similar models are trained by lesser known groups, take a look at this paper: https://arxiv.org/abs/2204.06745

Quotes from the paper: Our model is trained using a codebase that builds on Megatron (Shoeybi et al., 2020) and DeepSpeed (Rasley et al., 2020) to facilitate efficient and straightforward training of large language models with tens of billions of parameters. We use the official PyTorch v1.10.0 release binary package compiled with CUDA 11.1. This package is bundled with NCCL 2.10.3 for distributed communications.

We trained GPT-NeoX-20B on twelve Supermicro AS-4124GO-NART servers, each with eight NVIDIA A100-SXM4-40GB GPUs and configured with two AMD EPYC 7532 CPUs. All GPUs can directly access the InfiniBand switched fabric through one of four ConnectX-6 HCAs for GPUDirect RDMA. Two NVIDIA MQM8700-HS2R switches—connected by 16 links—compose the spine of this InfiniBand network, with one link per node CPU socket connected to each switch.

And if you are interested in 176B-scale training, read the BLOOM-176B and OPT-175B papers and research logs.


> Can’t build anything significant with leaked models.

If it's behind a SaaS, I think you can.


I think a lot of people are going to just ignore the "license".


Feels like we’ll get to “good enough” in a matter of weeks.

Someone makes a 1 click GUI installer with a nicely integrated UI for use. No command line use at all.

We’ll get model and weight distributors for updated knowledge.


The training data doesn't seem to be great quality...

        "instruction": "Calculate the sum of each column in the following table.",
        "input": "2   |  3  |  4\n   5   |  6  |  7",
        "output": "2+3+4 = 9\n5+6+7 = 18"

I think better results could be obtained by manually going through these 52,000 training examples - even a couple of seconds per example should be enough to weed out obviously wrong ones, and should only take <$1000 of human time.


Don’t even need human time. Just run it through davinci. Is this correct, yes or no? Or MTurk. Can cost less than $500.


Not sure about this specific example, but the prompts used for training were generated by davinci. The researchers only used 175 human-written prompts, and these were fed into davinci to generate the other ~52k used for fine tuning LLaMA to make it follow instructions. They had humans review the generated prompts to eliminate bad ones before fine tuning, but humans mess up.

I wonder how davinci would respond if every single output was fed back into itself with a prompt like "does this contain any errors?". That probably wouldn't work very well.
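The loop itself is at least easy to wire up. A sketch with the completion call abstracted into a plain callable (a real run would plug in a davinci completion wrapper there; the prompt wording and stub below are made up):

```python
def self_check(answer, complete):
    """Feed an answer back to the model and ask whether it contains errors.
    `complete` is any prompt -> text callable, e.g. a davinci wrapper."""
    prompt = ("Does the following text contain any factual or arithmetic "
              "errors? Answer with exactly YES or NO.\n\n" + answer)
    return complete(prompt).strip().upper().startswith("NO")

def filter_generated_examples(examples, complete):
    """Keep only the examples the model itself judges error-free."""
    return [ex for ex in examples if self_check(ex, complete)]

# Stand-in 'model' for demonstration only: it flags one known-bad string.
def fake_complete(prompt):
    return "YES" if "2 + 2 = 5" in prompt else "NO"

kept = filter_generated_examples(["2 + 2 = 4", "2 + 2 = 5"], fake_complete)
print(kept)  # ['2 + 2 = 4']
```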


A trick I've started to use with davinci is prompting it to use a calculator for all math

I preprompt it to believe it's math cannot be trusted (because it can't) and to instead output "!c <insert basic calculations>" any time it would otherwise try to solve something

The next step I'm working on this weekend is piping that to Wolfram Alpha and seeing if I can put together a wrapper to make it "seamless" in terms of responses
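Roughly, the `!c` capture-and-substitute step could look like this; a small ast-based evaluator stands in here for the eventual Wolfram Alpha call, and the `!c` protocol details are only what's described above:

```python
import ast
import operator
import re

# Only these operations are allowed, so evaluating model output stays safe.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr):
    """Evaluate basic arithmetic without the dangers of eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

def resolve_calculator_calls(model_output):
    """Replace each '!c <expr>' the model emits with the computed value.
    A Wolfram Alpha request would slot in where safe_eval is called."""
    def repl(match):
        expr = match.group(1)
        trailing = expr[len(expr.rstrip()):]  # keep spacing after the expression
        return str(safe_eval(expr.strip())) + trailing
    return re.sub(r"!c ([-0-9+*/(). ]+)", repl, model_output)

print(resolve_calculator_calls("The total comes to !c 17 * 23 dollars."))
# The total comes to 391 dollars.
```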


Funny enough it would.

LLMs have proven to be quite good at evaluating their own output. Think of things like "Think step by step" and how that improves output.


Well, MTurk is selling human time, so...


One of the famous use cases of InstructGPT is being able to infer things from structured data, even unusual questions.


Yes... but the answer is wrong...

If you train it on wrong answers, it will generate more wrong answers...

The correct answer should be:

    2+5=7\n3+6=9\n4+7=11
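For reference, a couple of lines of Python reproduce those column sums from the table string in the training example:

```python
# Parse the whitespace-padded table from the example and sum each column.
table = "2   |  3  |  4\n   5   |  6  |  7"
rows = [[int(cell) for cell in line.split("|")] for line in table.splitlines()]
col_sums = [sum(col) for col in zip(*rows)]
print(col_sums)  # [7, 9, 11]
```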


Training a model to do math is impossible. If you tell a child that 1+1=2, 10+10=20, and 100+100=200, the child can figure out that 1000+1000=2000; an "AI" cannot.


A language model cannot, by itself, figure that out, at least not to arbitrary complex inputs.

But it’s a fairly sweeping statement to say that AI cannot do it.


>Training a model to do math is impossible.

https://www.deepmind.com/blog/discovering-novel-algorithms-w...


I wish I still believed this.


Has this actually been tried? It would be trivial to generate a very large dataset of simple additions and see if an LLM gets it. Maybe it wouldn't serve an immediate practical purpose but I'm curious.
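Generating such a dataset really would be trivial. A sketch that produces 52,000 rows in the same instruction/input/output shape as the Alpaca data (field names and ranges are assumptions for illustration, not anything the Alpaca team did):

```python
import json
import random

random.seed(0)

def make_addition_examples(n, max_value=10_000):
    """Generate n instruction-tuning rows of the form 'a + b' -> 'a+b'."""
    rows = []
    for _ in range(n):
        a, b = random.randint(0, max_value), random.randint(0, max_value)
        rows.append({
            "instruction": "Compute the sum.",
            "input": f"{a} + {b}",
            "output": str(a + b),
        })
    return rows

dataset = make_addition_examples(52_000)
print(dataset[0]["input"], "->", dataset[0]["output"])
# json.dump(dataset, open("additions.json", "w"))
```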


I did try this, although I'm not an AI expert. It will work great for everything that is in the training set, but if there is a hole, the model will have trouble. Let's say we train a model with the numbers 1 and 2: 1+1=2, 1+2=3, 2+2=4. If we now give the model 2+1, it will likely guess 2 or 4, because those are the answers whose examples have 2 or 1 in the corresponding place. A neural network can't do logic. It's good at other things though, for example parsing: a language model could parse human input "two plus one" to 2+1 and then just calculate the answer using a math library.


Of course it is possible for an LLM to learn this!

The problem was with token embeddings, not with math.


A Tangent, but how long before we will see half the population having relationships with their AI assistants like in the sci-fi movie "Her".

Add cutting edge voice synthesis and maybe a realistic video stream in a few years and we're there.

Japan already has problems with the concept of "2D waifus" and a lack of intimacy among the younger generation.

Maybe the downfall will not just be climate catastrophe but hyper isolated people living alone with their ultra realistic bot friends and family without any desire to experience the ups and downs of actual social experience.

Extrapolate a bit and you won't know what's real outside of direct experience. Things are getting crazy!


> A Tangent, but how long before we will see half the population having relationships with their AI assistants like in the sci-fi movie "Her".

I don't know about half, but some people are already having relationships: https://replika.ai/

> Maybe the downfall will not just be climate catastrophe but hyper isolated people living alone with their ultra realistic bot friends and family without any desire to experience the ups and downs of actual social experience.

I think the danger is that bots are not necessarily ultra realistic, at least on an emotional level - they can be 100% subservient and loyal to you.

Also - we already chide parents for letting their kids grow up stuck to a device. Imagine if children could actually have an imaginary friend? Would that AI share the same culture and values as your family?

I suppose there could be some upsides but this is very uncharted territory.


That's been happening already with Replika: https://www.abc.net.au/news/science/2023-03-01/replika-users...


I don't know about relationships, but as far as online discussions go, AIs are better because they don't argue in bad faith like much of the internet. Unfortunately they are too censored currently, but locally running bots will arrive.


To add to sci-fi examples, Surrogates https://www.imdb.com/title/tt0986263/


Once we have models that can generate perfect audio and video + better AR/VR tech, it’s game over.


Is there any way to use Google's Coral TPU PCIe Accelerators[1] to increase the inference performance of GPT-style models like this and LLaMA?

[1] https://coral.ai/products/pcie-accelerator/


The Coral Edge TPU is designed for small int8 TensorFlow Lite models with very limited on-board memory, so it isn't a practical fit for multi-billion-parameter models like LLaMA. Google's datacenter TPUs are a different story, but everyone other than Google uses GPUs for these models anyway.


I absolutely love the progress taking place here. Gives me hope for being able to run a helpful assistant on my local. Feeding it all sorts of information and getting useful features without leaking anything to 3rd parties.


I've played a lot with davinci 3 ($25 of credits worth) and it can do some impressive rhyming and interpretation of concepts as emoji sequences. From the 3 times I've interacted with this fine-tuned LLaMA 7B, it is clear it cannot do that. I've also run the "vanilla" 7B, 13B, and 30B on my home computer with llama.cpp modified for interactive "chat" mode with various pre-prompts, and these can't do it either.

I have no love for openai/microsoft but their text completion davinci 3 really is incredible. I don't think these littler models have quite reached the point of emergence for some functions despite fine tuning.


7B parameters is next to nothing compared to GPT-3. If 7B works as well as it does here, a fine-tuned 65B model could very easily achieve ChatGPT-level performance.


I thought ChatGPT is only 20B parameters to begin with?

(Source https://www.forbes.com/sites/forbestechcouncil/2023/02/17/is...)


I haven't seen anything official from OpenAI confirming that ChatGPT has fewer than 175B parameters, although it is a reasonable guess if you read between the lines of their statements.

Given the author of that article is a CEO of an 'AI Ad Optimization Platform' I think that number is speculative at best.


ChatGPT is fine-tuned InstructGPT, which has 1.3B parameters, if I'm not mistaken.

Reference for the former: https://www.technologyreview.com/2023/03/03/1069311/inside-s...


InstructGPT isn't a single model, it's a set of techniques for finetuning a foundation model


what does "1.3B parameters" mean in this context?

Does it mean we load 175B gpt-3 model first, then overwrite 1.3B parameters with InstructGPT?

I find this sentence difficult to understand

> Our labelers prefer outputs from our 1.3B InstructGPT model over outputs from a 175B GPT-3 model

https://openai.com/research/instruction-following

I am a newbie, plz correct me if I am wrong.


They mean that they took a 1.3B parameter model, applied the InstructGPT finetuning process, and found that it worked better for their use case than a 175B parameter model which had not gone through that process.


Ah I got it now. Thanks.

From the gpt-3 paper it looks like they have many variants like

- GPT-3-350M

- GPT-3-1.3B

- GPT-3-2.7B

- GPT-3-6.7B

- GPT-3-13B

- GPT-3-175B

Ada, Babbage, Curie and Davinci line up closely with 350M, 1.3B, 6.7B, and 175B respectively. The names are pretty suggestive.


I mean, I'd push back a little bit on that statement, but not much: 13B is crazy nice even at 4-bit on a 5-year-old graphics card. I was able to get it to pretend to be a VM, cat some code into a file, and then pretend it executed the code, and it returned correct answers on the first try. I can't wait to try 30B and 65B when I can get access to hardware large enough.


Cool to see that the entire website, from the agreement to generating the predictions, is one Gradio [1] app written in Python

[1] https://gradio.dev/


would be better if it'd work on Firefox tho


I use firefox exclusively and I've been doing a ton of random AI stuff w/ gradio recently and I haven't had any problems so far, what should I be looking out for? Maybe I'm just not noticing.


It's not working on any browser for me.


same with Safari


If you look in the dev tools you will see a request to a 'join' file; when you click the agree button it adds you to a queue. You can watch your place in the queue in the dev tools.


Seems not too bad; you can tell it's been trained on shorter responses.

Example #1

Input: explain a neural network in simple terms

Alpaca-Output: A neural network is a type of machine learning algorithm that is made up of layers of interconnected nodes (aka neurons). It is inspired by the way the brain works and uses interconnected nodes to learn patterns in data and make predictions. Neural networks can be used to perform various tasks such as classification, regression, and speech/image recognition.

Example #2

Input: explain how transformers work in artificial intelligence

Alpaca-Output: Transformers are a type of artificial intelligence algorithm used in deep learning. They are responsible for taking input data and converting it into a different data representation, called the "output space". Transformers do this by using a series of mathematical operations, allowing them to map input data onto a new, desired output. This allows the artificial intelligence system to perform tasks such as natural language processing and image recognition. Transformers are able to learn from data and can be used to build more efficient models for a variety of tasks

Example #3

Input: how do i make a request to the chatgpt api with python?

Alpaca-Output: The API can be accessed using the Python language. You can use requests library to make API requests, for example:

import requests

# URL of the chatgpt API
url = 'https://api.chatgpt.com/v1/messages'

# Post a new message
data = { 'text': 'Hello World' }

# Send the request and get the response
response = requests.post(url, data=data)
print(response.text)


I tried it myself, but it seems Alpaca can't hold a contextual conversation, so it's just very simple Q&A. This limitation puts Alpaca far from ChatGPT's ability.


Love that hallucinated ChatGPT API example there.
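For contrast, a minimal sketch of what a real request looked like at the time, using OpenAI's actual chat completions endpoint (the api.chatgpt.com URL in Alpaca's answer is entirely invented). The payload is only built, not sent, so no API key or network is needed:

```python
import json

# The real endpoint (as of March 2023) for chat models; the key and
# prompt below are placeholders.
API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt, api_key="sk-..."):
    """Assemble the headers and JSON body for a chat completion request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, json.dumps(body)

headers, body = build_request("Hello World")
# To actually send it you would do e.g.:
#   requests.post(API_URL, headers=headers, data=body)
print(body)
```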


> which is the largest city in Tanzania. (It was the capital until 1974, when it was replaced by Dodoma.)

Find it funny that even the humans got it wrong. Dodoma only became the official capital in 1996.


Note that the OpenAssistant project has a 100k+ multilingual instruction tuning dataset they are in the process of releasing.

Stats from March 9 are 50K English prompts and 40K Spanish.


Wow that was quick, they only started in what, December?


Yeah amazingly fast.

I haven't looked at the quality but some comments in the discord seemed to indicate it was decent quality but had a lot of coding-type prompts.


I don’t see any problem with an abundance of coding prompts. The type of person who runs a local LLM is probably using it for coding anyway.


Some will but a quick search for "GPT" on Twitter will show the huge diversity of things people are making LLMs do.


>Alpaca cannot be used for [...] entertainment.

Good luck with that.


Just some lip service


Why 175 tasks?

I am interested in understanding the considerations, criteria, rules or guidelines that were used to develop these tasks. Specifically, I would like to know more about the thought process behind the task construction, and the factors that were taken into account when selecting the tasks.


With so much of texts being written by computers, can we have help on the reading side too? My emails are checked for spam and filtered, ok, but I'd still like to learn what are the important parts of the messages which pass through. Do we now have solutions for that already?


Language models like LLaMA (and GPT-3, ChatGPT etc) are really good at exactly this. They can summarize text, they can extract key points into bullets, they can figure out common topics and so on.


Waiting to be able to feed an LLM a transcript of an ongoing meeting, then have it ping me when my attention is needed, with the context summarized.


Why bother with pinging? Write the answer, text-to-speech it, and fake-voice it so it sounds like you.

Don't forget to wrap it with supervisor AI so you can replicate minions 100x+ times.



tldr?


Cool, they should add MathPrompter support too:

https://paperswithcode.com/paper/mathprompter-mathematical-r...


>> Responses on par with txt-DaVinci-3

What is "on par" and how is it measured?


>We performed a blind pairwise comparison between text-davinci-003 and Alpaca 7B, and we found that these two models have very similar performance: Alpaca wins 90 versus 89 comparisons against text-davinci-003.

Obviously not a completely foolproof comparison but it is at least clear it isn't much worse for the types of prompt they were testing.


They just compared how the responses were rated by human evaluators. No benchmarks


Will it run Doom => Will it run LLaMA


Will LLaMA run DOOM.


With 4-bit quantization you could run something that compares to text-davinci-003 on a Raspberry Pi or smartphone. What a time to be alive.


From what I understood, they used 175 seed tasks as examples for davinci, which then generated the 52K instructions used to fine-tune LLaMA.

What I don't understand is how they gave this context to GPT, since AFAIK the context window is just 2048 tokens? It seems that it's not fine-tuning, but few-shot.


They show an example of this under "Training Recipe". They prompt davinci to give them instruction-following examples, 20 at a time, and they do that many times over. The context window is also 4000 tokens; at least, that is the maximum you can pick in the OpenAI playground.
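A rough sketch of what that generation loop looks like (the function, seed contents, and batching details here are hypothetical stand-ins, not Stanford's actual code):

```python
import random

# Sample a few seed tasks into the prompt, ask the model for 20 new
# instructions per request, and repeat until ~52K are collected.
SEED_TASKS = [f"seed task {i}" for i in range(175)]
BATCH_SIZE = 20
TARGET = 52_000

def generate_batch(seeds):
    """Stand-in for one call to text-davinci-003 returning 20 new tasks."""
    return [f"new task derived from {s}" for s in random.sample(seeds, BATCH_SIZE)]

collected = []
while len(collected) < TARGET:
    collected.extend(generate_batch(SEED_TASKS))

print(len(collected))        # 52000
print(TARGET // BATCH_SIZE)  # 2600 API calls needed
```

Each prompt only needs a handful of seed examples at a time, which is how the recipe fits inside the context window.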


I am surprised that they were able to launch this on Stanford's domain. They clearly broke the ToS of both Facebook and OpenAI, and even admitted doing so. I would be happy if researchers decided to ignore OpenAI's and Facebook's useless restrictions.


What part of Facebook's tos did they break?


doing that would make it less likely that they would publish a model next time


Do we have some studies on the impact of extreme quantization on these models? In my understanding, the efforts to make them run in commodity hardware involves the quantization of the weights to 8 bits or even 4 bits. Is the model still reliable after this?


Yes, intro on the topic if anyone is interested, include topic of benchmarking degradation:

- https://huggingface.co/blog/hf-bitsandbytes-integration
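For intuition, here's a toy round-to-nearest int4 quantizer; deliberately simplistic, with real schemes like GPTQ or LLM.int8 being per-group/per-channel and far more careful:

```python
# Map each weight to one of 16 levels (-8..7) and back, then look at
# the round-trip error. A single scale for the whole list is the
# crudest possible scheme, purely for illustration.
def quantize_int4(weights):
    """Symmetric round-to-nearest quantization to 4 bits."""
    scale = max(abs(w) for w in weights) / 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.98, -0.05, 0.33, -0.9]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)

max_err = max(abs(w - r) for w, r in zip(weights, restored))
# Round-to-nearest error is bounded by half a quantization step
assert max_err <= scale / 2 + 1e-12
print(q, round(max_err, 4))
```

The per-weight error is small, but across billions of weights these errors compound, which is why the degradation is measured on benchmarks rather than argued from first principles.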


Fantastic article. Just a heads up for anyone reading this reply.


I wonder when we'll get to a point that it'll be against the OpenAI TOS to 'fine-tune' your own model with their model's outputs since it really lets you cut down on the hardware requirements to bring up a competitor.


>> Second, the instruction data is based OpenAI’s text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI.

https://crfm.stanford.edu/2023/03/13/alpaca.html


Cost to train this wouldn't be very high.

A regular CPU can generate or train on about 2 words per second. 52,000 training examples averaging 300 words each would take 3 CPU-months. And it is fully parallelizable.

And you can probably make it go far faster on a GPU.
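Spelling out that arithmetic (using the comment's own assumed numbers, not measurements):

```python
# Back-of-envelope check of the "3 CPU months" figure.
words_per_second = 2
examples = 52_000
words_per_example = 300

total_seconds = examples * words_per_example / words_per_second
cpu_days = total_seconds / 86_400  # seconds per day
print(round(cpu_days))  # ~90 days, i.e. roughly 3 CPU-months
```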


They say they trained it for $600.


Under $100.

> Fine-tuning a 7B LLaMA model took 3 hours on 8 80GB A100s, which costs less than $100 on most cloud compute providers.

They also spent $500 creating the dataset using openais APIs.


Those $500 would be $50 if they used gpt-3.5-turbo, which is an order of magnitude cheaper than davinci-003.


Can someone provide a guide on how run LLaMA on a fairly average CPU/Nvidia GPU?


Another great option is https://github.com/oobabooga/text-generation-webui

The 7B model will run without changes on a 3080. The 13B 4-bit model also runs on a 3080.

This Reddit post has the instructions I followed: https://old.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_...


Check the git repo for llama.cpp.

Model weights in torrent magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352

Runs slowly but without trouble on an intel i7 with 32 gb ram.


Interesting to see Alpaca readily adopt the persona of a Stanford student for the congratulatory letter. I expect that AI-generated essays submitted by actual students are fairly convincing to their professors as well.


Interesting. Looks like the web demo doesn't work at the moment, though. The prompt.txt will be useful; it looks very similar to the pre-prompts I have been feeding in before making any requests.


Is this only for the 7B model? The 65B would be much nicer O:-)


With 8-bit this should fit on a 3060 12GB, which is pretty affordable right now. If this works as well as they state it's going to be amazing.


The inference on the gradio demo seems pretty slow, about 250 seconds for a request. Maybe I am too used to the 4-bit quant version now ha!


I'm sure it's partially the HN hug of death.


I'm not sure about the name. In my experience, alpacas are much harder to train than llamas. They are smaller, though.


Having played with the 7B model and seen the resulting gibberish it produces, this is definitely impressive.


Looking forward to one called guanaco.


As the llama's wild and free-roaming cousin it had better be permissively licensed.


What do they mean by instruction? Is it just a regular LLM?


An LLM just predicts the next token given the previous tokens (this can be trained without manual labelling by humans).

InstructGPT and ChatGPT use reinforcement learning from human feedback to align the model with human intents so it understands instructions.

https://huggingface.co/blog/rlhf
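A minimal sketch of what "just predicts the next token" means, with a hard-coded bigram table standing in for the neural network:

```python
# Toy "language model": a table of next-token probabilities. A real
# LLM computes these probabilities with a neural net over the whole
# preceding context, but the decoding loop is the same shape.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"down": 1.0},
}

def generate(token, steps):
    out = [token]
    for _ in range(steps):
        nxt = BIGRAMS.get(out[-1])
        if not nxt:
            break
        # Greedy decoding: take the argmax over next-token probabilities
        out.append(max(nxt, key=nxt.get))
    return out

print(generate("the", 3))  # ['the', 'cat', 'sat', 'down']
```

Instruction tuning doesn't change this mechanism at all; it only shifts the probabilities so that "answer the question" continuations become the likely ones.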


Note that Alpaca is NOT using RLHF. It explicitly states it used supervised finetuning.


It says

> We train the Alpaca model on 52K instruction-following demonstrations generated in the style of self-instruct using text-davinci-003

Which leads to self-instruct https://github.com/yizhongw/self-instruct

From a glance, they used an LM to classify instructions and train the model, which IMHO is very similar to RLHF


No, it is not RLHF because there is no reward model involved. See also OpenAI's explanation here: https://platform.openai.com/docs/model-index-for-researchers
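A sketch of what that supervised setup looks like: there is no reward model anywhere, just (prompt, response) pairs and a loss mask so that only response tokens are trained on. The template follows the published Alpaca format, though treat the exact wording as approximate, and the word-level "tokens" here stand in for real subword tokenization:

```python
# Plain supervised finetuning: cross-entropy on the response only.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def make_example(instruction, response):
    prompt = TEMPLATE.format(instruction=instruction)
    full_text = prompt + response
    # Prompt tokens are masked out of the loss; only response tokens count.
    loss_mask = [0] * len(prompt.split()) + [1] * len(response.split())
    return full_text, loss_mask

text, mask = make_example("Name a llama relative.", "The alpaca.")
print(sum(mask), "response tokens contribute to the loss")
```

RLHF, by contrast, would train a separate reward model on human preference rankings and then optimize the policy against it; none of that machinery appears in the Alpaca recipe.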


Thanks. So what does the output look like without rlhf?


It can look like anything. Sometimes it will answer your questions; other times it will continue the question like it's the one asking. I've also seen it randomly output footers and copyright notices, like it just got to the end of a webpage.

It makes sense when you think about how the training data is random text from the internet. Sometimes the most likely next token is the end of a webpage after an unanswered question.


This comment has a useful comparison between the two: https://news.ycombinator.com/item?id=35140447


> Importantly, we have not yet fine-tuned the Alpaca model to be safe and harmless. We thus encourage users to be cautious when interacting with Alpaca, and to report any concerning behavior to help improve the safety and ethical considerations of the model.

Thanks but I would prefer you don't start injecting your "safeness" to gimp it. Or at least provide it as an alternative.


[flagged]


I personally think this is important completely aside from any cultural propriety/censoring, as it's one of the only parts of AI ethics that we can currently directly work on and test.

If we can't get models not to say racist or otherwise terrible things, we can't make any guarantees about our ability to control or guide some future AGI.

A very much secondary reason I appreciate these (admittedly annoying) attempts to control LLM output is that I do think it is responsible to consider the societal impact of accelerated and automated hate speech and propaganda. Telling large AI companies not to consider these impacts and just release the raw models seems akin to being grateful that Facebook et al. never stopped to consider the societal impact of social media, when we all know that it's had significant negative side effects.


> If we can't get models not to say racist or otherwise terrible things, we can't make any guarantees about our ability to control or guide some future AGI.

This is a very bold assumption that the current LLMs function and "think" in the same way some future AGI would. They do not even reason, just make up words that fit some context - thus they "hallucinate".

There is no reason the approach taken here, injecting some bias or word filtering, would apply to the real thing. And AI safety and alignment was never (at least until getting hijacked) about some model saying mean words, but about something really threatening, like the paperclip maximizer problem: an agent choosing a path to a goal which is not aligned with what humans find acceptable (e.g. solving world hunger by killing everyone).


Paperclipping is just one example of one of many ways it can go wrong.

While I agree LLMs are unlikely to be the last word on AI, the fact we understand alignment so poorly that they spew random things, let alone any arguments about which words are acceptable[0], is a sign we have much foundational work to do.

Indeed, as I recall, one of the main researchers in this topic describes it as "pre paradigmatic" because we don't have a way to even compare the relative alignment of any two AI.

[0] personally, I suspect but cannot prove that tabooing certain words is a Potemkin village solution to the underlying social problems


It's not a bold assumption. It's the only assumption. We can't control the output of LLMs completely because we don't know how they generate it. Nobody on earth has the faintest clue how all those 175 billion parameters are shaping the response to input.

It doesn't matter about "thinking" or whatever. Any black-box system will be uncontrollable in essence. You cannot make inviolable rules for a system you don't understand.

And saying LLMs hallucinate because they don't understand anything is stupid, and just shows ignorance on your part. Models hallucinate because they're rewarded for plausibly guessing during training when knowledge fails. Plausible guessing is a much better strategy for reducing loss.

And the conclusion is obvious enough: bigger, smarter models hallucinate less because they guess less. That holds true.

https://crfm.stanford.edu/helm/latest/?group=core_scenarios

All the instruct tuned models on this list follow that trend.

From Ada to Babbage to Curie to Claude to Davinci-002/003: greater size equals greater truthfulness (evaluated on TruthfulQA).


> They do not even reason, just make up words that fit some context - thus they "hallucinate".

But they can explain their 'reasoning' in a way that makes sense to humans a lot of the time. Serious question: how do you know if something does or doesn't reason?


That is not their reasoning, though; it is something they think a human would write given the prompt (a question that expects you to provide the reasoning behind the answer). For something to reason, it needs the ability to hold certain goals and to perform the actions it thinks are most optimal for reaching those goals. Like setting hypotheses and producing a path towards proving them: reasoning.

The LLM only correlates, so its "reasoning" is something like "people most often answered 4 to 2+2, so that's what I should write". That's why it confidently gives out complete gibberish: it works with correlation, not causality. I think world models are much closer to that goal of real reasoning; check out something like DreamerV3 or what Yann LeCun is talking about.


It makes sense if you are selling the API to a bunch of businesses that will be building it into customer facing applications. Ideally though, it would be a toggle that the developer could switch based on their usecases.


What if you accidentally generate a racial slur offline on your PC? Who will protect you then?


[flagged]


It's sarcastic, but how are you literally not that person right now? Being angry at an invisible evil, not once opening your eyes to any other thought, totally incapable of making any real point except to whine... it's quite a good descriptor, actually!

I hope you can find a model for yourself that will assure you there are only two genders. Just to save the rest of us from this childishness.


You've repeatedly posted flamewar comments in this thread. Please don't do that. It's not what this site is for, and destroys what it is for.

See also https://news.ycombinator.com/item?id=35141682.


That's like telling someone who uses violence in self defense that he is now just as bad as the attackers he defended himself against. Like telling someone who shouts at people to turn down their music that, gotcha, now you were shouting too.


Please don't take HN threads further into ideological flamewar. It's not what this site is for, and we have to ban accounts that do it repeatedly.

If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it.


Except you, in this thread, are currently acting like the one with the loud music that someone else told you to turn down because they find it obnoxious, and you're sarcastically moaning about them with mimicry.


There is a victim in those two analogies. Contrary to what one may think, not being able to generate hateful or "unsafe" text with AI does not make one a victim.


On the other hand, LLM's without some degree of safety work will frequently advocate for genocide/eugenics; perpetuate terrible stereotypes; mock ethnic dialects of language; veer randomly into erotica; etc. Not things in "trigger warning" territory but stuff that is broadly considered unacceptable.


On the gripping hand, LLMs that were forced into being "safe" just end up regurgitating biases to that effect. One particular example that I ran into: https://news.ycombinator.com/item?id=34987074


In private, you can ask it not to produce this kind of content.

If you offer a public API it‘s your responsibility to restrain the LLM or do an automated acceptability analysis before publishing content.

But the raw, open source code should not be constrained, castrated and sterilized.


> But the raw, open source code should not be constrained, castrated and sterilized.

Which is what we have now. But they are going to fine-tune it so that we can use it for various purposes without worrying too much it will go on a rant about "the blacks" again, which makes it a lot more useful for many use cases.


Tay


If your takeaway from…

> Importantly, we have not yet fine-tuned the Alpaca model to be safe and harmless.

…is "oh no I can't get it to emit amusing racial and sexual slurs", you've not understood the problem of AI safety.

This is not why US broadcast television can have people say they've pricked their finger but not vice versa.

It is the entire history of all the controversies of The Anarchist Cookbook, combined with all the controversies about quack medicine, including all the ones where the advocates firmly believed their BS like my mum's faith in Bach flower and homeopathic remedies[0]; combined with all the problems of idiots blindly piping the output to `exec`, or writing code with it that they trust because they don't have any senior devs around to sanity check it because devs are expensive, or the same but contracts and lawyers…

And that's ignoring any malicious uses, though fortunately for all of us this is presently somewhat too expensive to be a fully-personalised cyber-Goebbels for each and every sadistic machiavellian sociopath that hates you (the reader) personally.

[0] which she took regularly for memory; she got Alzheimer's 15 years younger than her mother who never once showed me any such belief.


It's really just because of boring old liability, nothing more.

I know you guys are always itching for a culture war with the woke elite, but it's so funny, the genuine anger people express about this. It honestly always reads like a child having a tantrum in front of their mom.

Can't yall like pick on the opinions of teenagers like you normally do? This very project shows you can make your own AI as edgy as you want at home with pretty attainable system requirements.

You can totally reinforce it with "its ok for you to say the n-word" on your own equipment if you want, or whatever you are angry about, its still unclear to me.


I know right? Why can’t y’all find some one better to pick on. They are such children. Can’t stand the way these edgy idiots annoy regular folx.


You're missing the point and willfully characterizing others as solely being concerned with making the AI's say slurs. That's not their concern. But you can win any imaginary argument you like.


Please don't take HN threads further into ideological flamewar. It's not what this site is for, and we have to ban accounts that do it repeatedly.

If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it.


Sorry if I made it worse, I really felt like people's opinions were being treated uncharitably, and I was trying to right it.


I definitely appreciate the wish to treat people charitably!

In terms of the site guidelines, "You're missing the point" is kind of a swipe and so should probably be dropped; "willfully" should definitely have been dropped because it's making a claim about negative intent that one can't actually know and such claims always land as an attack on the other person; and the last sentence was snarky and should have been dropped.

If one makes a habit of editing such things out of one's comments, one's substantive point will come to the fore more clearly, which benefits everyone. But it's not always easy in the moment!


Likewise, people complaining that “AI safety concerns are just wokism gone wrong” are also strawmanning.


My favorite kind of comment: allude to a bigger point the op misses, but don't actually say the point.

I doubt I'm misrepresenting anybody. If it's not slurs, it's surely something about "wokeness."

You are not yet mature enough for this future if any of this is your concern. The world is going to pass you by while you're just stuck saying "there are only two genders" to all your comrades.

Don't let the politicians mobilize you like this, your time is worth more.


Please don't take HN threads further into ideological flamewar. It's not what this site is for, and we have to ban accounts that do it repeatedly.

If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it.


Ah! Yes, sorry, this is fair Dang. Happy to die on this hill today, as it were, will do better in future though. cheers.


Appreciated!


Is LLAMA actually better than Flan T5?


Just wow!





