
What are the current minimum system requirements for running these models?


You need at minimum a stock operating system install of:

- Linux 2.6.18+ (arm64 or amd64) i.e. any distro RHEL5 or newer

- MacOS 15.6+ (arm64 or amd64, gpu only supported on arm64)

- Windows 8+ (amd64)

- FreeBSD 13+ (amd64, gpu should work in theory)

- NetBSD 9.2+ (amd64, gpu should work in theory)

- OpenBSD 7+ (amd64, no gpu support)

- AMD64 microprocessors must have SSSE3. Otherwise llamafile will print an error and refuse to run. This means that if you have an Intel CPU, it needs to be Intel Core or newer (circa 2006+), and if you have an AMD CPU, it needs to be Bulldozer or newer (circa 2011+). If you have a newer CPU with AVX, or better yet AVX2, then llamafile will use those chipset features to go faster. No support for AVX512+ runtime dispatching yet. (There's a quick way to check your CPU flags after this list.)

- ARM64 microprocessors must have ARMv8a+. This means everything from Apple Silicon to 64-bit Raspberry Pis will work, provided your weights fit into memory.
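
If you're not sure which of those features your machine has, something like this should tell you (first line assumes Linux, second an Intel Mac):

    grep -oEw 'ssse3|avx|avx2|avx512f' /proc/cpuinfo | sort -u    # Linux
    sysctl -n machdep.cpu.features machdep.cpu.leaf7_features     # Intel Mac

If ssse3 isn't in the output, llamafile will refuse to run on that box.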

I've also tested that GPU works on Google Cloud Platform and on Nvidia Jetson, which has a somewhat different environment. Apple Metal is obviously supported too, and is basically a sure thing so long as Xcode is installed.


Time to go amd, poor old me, Intel MB Air 2018 (zsh: exec format error, Darwin Kernel Version 22.2.0, MacOS Ventura 13.1).


You need to upgrade to zsh 5.9+ or run `sh -c ./llamafile`. See the Gotchas section of the README.
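
Concretely (the last line is just one way to get a newer zsh, assuming you use Homebrew):

    zsh --version        # anything older than 5.9 hits the bug
    sh -c ./llamafile    # workaround that works on any zsh
    brew install zsh     # or upgrade zsh itself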


Many thanks! Incredibly versatile implementation.


Apple Security will be excited to reach out to you to find out where you got a copy of macOS 15.6 :)

I'm guessing this should be 13.6?


15.6 is a Darwin kernel version from 2018. It's the number `uname -a` reports. We should probably just switch to using XNU version numbers, which are in the 10000s now, so there's no confusion. I'm reasonably certain it works that far back, but I currently lack the ability to spin up old MacOS VMs for testing. Caveat emptor for anyone not running a recent version of MacOS.
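
To make the mapping concrete, the Intel MacBook Air report upthread lines up like this (illustrative shell session using that poster's numbers):

    $ uname -r                   # Darwin kernel version
    22.2.0
    $ sw_vers -productVersion    # marketing version
    13.1

So the 15.6 in the requirements list is a kernel number, i.e. an OS several years older than MacOS 13.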


This is jart we are talking about. Perhaps, having made code Actually Portable in space, now she is doing time.


In my experience, if you're on a Mac you want RAM of roughly the model file size * 150% to get it working well. I had a user report running my llama.cpp app on a 2017 iMac with 8GB at ~5 tokens/second. Not sure about other platforms.
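
As a rough worked example of that rule of thumb (file sizes are ballpark figures for Q4-ish quants):

    3B   ~2 GB file  -> ~3 GB RAM
    7B   ~4 GB file  -> ~6 GB RAM
    13B  ~8 GB file  -> ~12 GB RAM

which is roughly why 7B quants are about the comfortable ceiling on an 8 GB machine.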


I am currently tinkering with all this; you can download a 3B parameter model and run it on your phone. Of course it isn't that great, but I had a 3B param model[1] on my potato computer (a mid-range Ryzen CPU with onboard graphics) that does surprisingly well on benchmarks, and my experience with it has been pretty good.

Of course, more interesting things happen when you get to the 32B and 70B param models, which require high-end GPUs like 3090s.

[1] https://huggingface.co/TheBloke/rocket-3B-GGUF
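
If anyone else wants to try it with plain llama.cpp, something like the following should work (the exact quant filename is a guess on my part; check the repo's file list):

    wget https://huggingface.co/TheBloke/rocket-3B-GGUF/resolve/main/rocket-3b.Q4_K_M.gguf
    ./main -m rocket-3b.Q4_K_M.gguf -p "Tell me about rockets." -n 128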


That's a nice model that fits comfortably on a Raspberry Pi. It's also only a few days old! I've just finished cherry-picking the StableLM support from the upstream llama.cpp project that you'll need in order to run these weights using llamafile. Enjoy! https://github.com/Mozilla-Ocho/llamafile/commit/865462fc465...
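
Once you're on a llamafile build that includes that commit, something along these lines ought to work (flags pass through to llama.cpp; adjust the binary name to whichever release or local build you have):

    ./llamafile -m rocket-3b.Q4_K_M.gguf -p "Tell me about rockets." -n 128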


Thank you for this :)


Basically enough to fit the download in RAM + a bit more.

In practice, you kinda need a GPU, even a small one. Otherwise prompt processing is really slow.


It's really decent without any GPU. Image analysis takes a while, but text prompts are fine. My Ryzen laptop does 2.5 to 4 tokens per second; my Mac Pro more like 8.



