Thanks for your encouragement! We are working on quantization as well. We recently submitted a paper, Atom [1], which uses 4-bit quantization and delivers 7.73x the throughput of FP16 and 2.53x that of INT8. Atom maintains perplexity (i.e., model accuracy) close to FP16, outperforming existing quantization approaches.
We are polishing the 4-bit code, and it will be added to the Punica code base soon. Please stay tuned :)
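For anyone curious what 4-bit weight quantization boils down to, here's a toy sketch of symmetric per-tensor INT4 quantize/dequantize in numpy. Atom's actual scheme and kernels are far more sophisticated than this; treat it as an illustration of the idea only:

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-tensor 4-bit quantization: map floats to integers in [-7, 7]."""
    scale = np.abs(w).max() / 7.0          # one FP scale for the whole tensor
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)

# 4-bit integers, so round-trip error is bounded by half a quantization step
assert q.min() >= -7 and q.max() <= 7
assert np.abs(w - w_hat).max() <= s / 2 + 1e-6
```

The memory win is what drives the throughput numbers: weights shrink 4x versus FP16, so memory-bound decode steps move far less data per token.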
Added to my reading list! The world of quantization is moving so fast that even TheBloke might not be able to keep up!
So Atom base models would be compatible with Punica?
I also wonder: many people already train LoRAs with the base model in 8-bit or even 4-bit. Would it make sense to use the same quantization algorithm during training and inference?
Certainly! We'd like our designs to be picked up by frameworks and serve all users. Currently, Punica is built on top of the PyTorch and HuggingFace Transformers ecosystems, so vLLM and LMDeploy, which are also in the PyTorch ecosystem, should have a smooth adoption path. As for Nvidia Triton and TensorRT-LLM, since our kernels are written in CUDA, I believe they will also work seamlessly.
We call on the open source community to help us integrate Punica with all these frameworks, so that everyone can benefit from the efficiency improvements!
Thank you! We are also very excited about combining fast fine-tuning with efficient serving. In fact, what you just said is closely related to one of our very first motivations. In my previous blog post [1], I call this scheme "Just-in-time Fine-tuning". In our earlier measurements, finetuning a LoRA on a medium-sized webpage (~10K tokens) takes around 30 seconds to 2 minutes. Another benefit of this JIT fine-tuning scheme is that it can turn any model into a long-context model.
We'll keep doing more research on finetuning, and hopefully we'll have results to share soon.
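A quick back-of-the-envelope count hints at why a LoRA pass can be fast enough for JIT finetuning: with rank r, an adapted d_in x d_out weight matrix only trains r * (d_in + d_out) extra parameters. The shapes below are illustrative (a Llama-7B-like hidden size), not Punica's actual configuration:

```python
# Illustrative parameter count for one square attention projection (hypothetical shapes)
d_model, rank = 4096, 16

full = d_model * d_model            # dense delta-W: ~16.8M params per matrix
lora = rank * (d_model + d_model)   # low-rank A (r x d) + B (d x r): ~131K params

print(f"LoRA trains {lora / full:.2%} of a full weight update")  # prints "LoRA trains 0.78% of a full weight update"
```

Training well under 1% of the weights (plus frozen base activations) is what makes a 30-second-to-2-minute finetune on ~10K tokens plausible.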
I'm using 27-inch 4K monitors with Gnome. I found it more practical to use no scaling but 1.25x text size (a setting in gnome-tweaks). My problem with 2x scaling or fractional scaling is that it scales the UI (icons, margins) as well. As someone who values functionality (i.e., displaying text) over overdesigned white spacing, I find that scaling the whole UI just wastes workspace, whereas scaling text only strikes a good balance between good-looking text and workspace size.
But anyway, it's good to see that the Linux desktop is gaining fractional scaling support!
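If you'd rather script it than click through gnome-tweaks, the same text-only scaling lives in the `org.gnome.desktop.interface` GSettings schema. A minimal sketch, assuming a GNOME session with `gsettings` on the PATH:

```python
import subprocess

# Set GNOME's text-only scaling factor to 1.25x -- the same knob
# gnome-tweaks exposes as "Fonts > Scaling Factor". UI chrome stays at 1x.
subprocess.run(
    ["gsettings", "set", "org.gnome.desktop.interface",
     "text-scaling-factor", "1.25"],
    check=True,
)
```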
...if you're OK with blur and wasted performance. I like my text crisp, rendered at exactly the desired size, with subpixel antialiasing, and I like knowing my hardware isn't wasting cycles rendering at a higher-than-needed resolution and then throwing some of that work away with raster resampling.
What does overdesigned mean? I suppose you mean relying on scientifically proven design practices, such as improving readability through the use of whitespace and font sizes.
Can you receive SMS from apps that aren't the default SMS handler on modern versions of Android? I seem to recall Google putting a bunch of silly restrictions on it to the point where there wasn't even a permission you could grant if you wanted to.
> If you find a really good tutorial by programmers who are both excellent teachers and experienced in that particular field, it beats JIT learning on a personal project in one important respect. You get exposure to the One True Way of doing things.
That was exactly my feeling when watching Jon Gjengset's Rust tutorials. I like his genuine reactions to unexpected problems. I really learned a lot from these kinds of lengthy but realistic videos. https://www.youtube.com/c/JonGjengset/featured
I personally use Rambox (https://rambox.pro/) instead of Franz these days, or just click the "install as app" button in my browser.
Honestly, I see little value over browsers with vertically stacked tabs. There are already free solutions out there, so what's the added value? Workspaces are easy to replicate using Firefox's containerized tabs, and the vertical tab structure can be replicated with something like Tree Style Tabs. The remaining minor improvements are nice to have, but certainly not worth $20 in my opinion.