Hacker News

Thank you for creating this demo. This was the point I was trying to make when the Gemini launch happened. All that hoopla for no reason.

Yes, GPT-4V is a beast. I'd even encourage anyone who cares about vision or multi-modality to give LLaVA a serious shot (https://github.com/haotian-liu/LLaVA). I have been playing with the 7B q5_k variant for the last couple of days and I am seriously impressed with it, impressed enough to build a demo app/proof of concept for my employer (I'll have to check the license first, or I might only use it for the internal demo to drive the point home).



I’ve been using llava via https://github.com/Mozilla-Ocho/llamafile which runs on any modern system.


It's so great. I've been using this vision model to rename all the files in my Pictures folder. For example, the one-liner:

    llamafile --temp 0 \
        --image ~/Pictures/lemurs.jpg \
        -m llava-v1.5-7b-Q4_K.gguf \
        --mmproj llava-v1.5-7b-mmproj-Q4_0.gguf \
        --grammar 'root ::= [a-z]+ (" " [a-z]+)+' \
        -p $'### User: What do you see?\n### Assistant: ' \
        --silent-prompt 2>/dev/null |
      sed -e 's/ /_/g' -e 's/$/.jpg/'
Prints to standard output:

    a_baby_monkey_on_the_back_of_a_mother.jpg
This feature is coming in the next llamafile release. For now, you have to build from source to use --grammar and --silent-prompt with a vision model.

Weights here: https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main

Sauce here: https://github.com/mozilla-Ocho/llamafile
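To run the one-liner over a whole folder instead of a single image, a loop like the one below should work. This is a minimal sketch, not part of llamafile: it assumes bash, llamafile on your PATH, and the two .gguf files from the weights link above in the current directory. The function names (caption_to_filename, describe_image, rename_all) are hypothetical helpers; the llamafile flags are copied verbatim from the one-liner, so per the note above they currently need a source build.

```shell
#!/usr/bin/env bash
# Sketch: rename every .jpg in a folder using the LLaVA one-liner.
# Assumptions (hypothetical setup): llamafile is on PATH and both .gguf
# files are in the current directory; function names are illustrative.
set -euo pipefail

# Turn a caption like "a baby monkey" into "a_baby_monkey.jpg".
# The g flag replaces every space, not just the first one.
caption_to_filename() {
  printf '%s\n' "$1" | sed -e 's/ /_/g' -e 's/$/.jpg/'
}

# Ask the model for a lowercase, space-separated caption of one image.
describe_image() {
  llamafile --temp 0 \
    --image "$1" \
    -m llava-v1.5-7b-Q4_K.gguf \
    --mmproj llava-v1.5-7b-mmproj-Q4_0.gguf \
    --grammar 'root ::= [a-z]+ (" " [a-z]+)+' \
    -p $'### User: What do you see?\n### Assistant: ' \
    --silent-prompt 2>/dev/null
}

# Rename each .jpg in the given directory after its caption.
rename_all() {
  local dir=$1 img caption new
  for img in "$dir"/*.jpg; do
    [ -e "$img" ] || continue          # glob matched no files
    caption=$(describe_image "$img")
    new="$dir/$(caption_to_filename "$caption")"
    if [ -e "$new" ]; then             # never overwrite an existing file
      echo "skip: $new already exists" >&2
      continue
    fi
    mv -- "$img" "$new"
  done
}
```

Usage: source the file, then run `rename_all ~/Pictures`. The loop skips rather than overwrites when a name is already taken, since two similar photos can easily get the same caption at `--temp 0`.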


Truly grateful for your work on cosmopolitan, cosmo libc, redbean, nudging POSIX towards realizing the unachieved dream and also for contributing to llama.cpp. It’s like wherever I look, you’ve already left your mark there!

To me, you exemplify and embody the spirit of OSS, and to top that - you seem to be just an amazing human. You are an inspiration for me and many others. And even though I know I’ll never ever get close, you make me want to try. Thank you. :)


Thanks!


That's cool! I've been a fan of your projects since redbean was released. If I understood C, I'd be even more excited about the underlying tech behind all these tools, but I'm more of an algorithm designer and back-end data-processing programmer (I use Python). Watching your technology progress is very impressive, even though I barely understand how it works :)


Update: for anyone else wondering about commercial use of LLaVA, it is licensed under Apache 2.0 and can be used commercially with attribution: https://github.com/haotian-liu/LLaVA/blob/main/LICENSE


The code is licensed under Apache 2.0, but the weights are CC BY-NC 4.0 according to the README, so no commercial use unfortunately.



