Hacker News

What's the correlation between parameter count and RAM usage? Will LLaMA-13B fit on my MacBook Air with 8 GB of RAM or am I stuck with 7B?


13B uses about 9 GB on my MacBook Air. If you have another machine (x86) with enough RAM, you can use it to convert the original LLaMA weights to GGML and give it a try. But the quantization step must be done on the MacBook itself.
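A rough back-of-envelope for the RAM/parameter-count relationship: the weights dominate, so resident memory is roughly parameter count × bytes per weight, plus runtime overhead and the context cache. This is a sketch, not llama.cpp's exact accounting; the ~4.5 bits/weight figure is an assumption about 4-bit block quantization (the per-block scale factors add overhead beyond the nominal 4 bits).

```python
# Back-of-envelope RAM estimate: memory ≈ n_params × bytes_per_weight.
# The effective bits/weight for 4-bit block quantization (~4.5) is an
# assumption; actual llama.cpp usage adds runtime and context overhead.

def estimate_weight_ram_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate resident size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / 1024**3

for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
    fp16 = estimate_weight_ram_gb(n_params, 16)
    q4 = estimate_weight_ram_gb(n_params, 4.5)
    print(f"LLaMA-{name}: fp16 ~{fp16:.1f} GiB, 4-bit ~{q4:.1f} GiB")
```

By this estimate, 4-bit 13B weights come to roughly 6.8 GiB, which is consistent with ~9 GB total once context and runtime overhead are added, and with 13B not fitting comfortably in 8 GB.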

It may be more feasible for you to use 7B with a larger context. For some "autocompletion" experiments with Python code I had to extend the context to 2048 tokens (+1-1.5 GB).
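The extra 1-1.5 GB from a longer context is mostly the KV cache, which grows linearly with context length. A sketch of the arithmetic, assuming LLaMA-7B shapes (32 layers, 4096 embedding dim); the element size is parameterized since it depends on whether the cache is stored as f16 or f32:

```python
def kv_cache_bytes(n_ctx: int, n_layers: int = 32, n_embd: int = 4096,
                   bytes_per_elt: int = 2) -> int:
    # Keys and values: 2 tensors per layer, each n_ctx × n_embd elements.
    # Defaults assume LLaMA-7B shapes and an f16 cache (2 bytes/element).
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elt

for n_ctx in (512, 2048):
    gib = kv_cache_bytes(n_ctx) / 1024**3
    print(f"n_ctx={n_ctx}: KV cache ~{gib:.2f} GiB")
```

At 2048 tokens this gives ~1 GiB for an f16 cache (twice that for f32), in line with the 1-1.5 GB figure above.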



