
Are end-to-end speech models like OpenAI Realtime or Gemini Live better in terms of latency than an open-source one like Qwen 3 Omni?


There is always a tradeoff between latency and reasoning. The bigger the model, the more we can get it to do through better instruction following, but that comes at the cost of increased latency. Open-source, colocated smaller models do much better on latency, but their instruction following is not as strong, so you often have to tune prompts much more heavily than you would for bigger models. A rough way to compare is to measure time-to-first-token for each setup yourself, as in the sketch below.
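A minimal sketch of that comparison, assuming both the hosted model and the colocated one expose an OpenAI-compatible streaming chat endpoint (e.g. a local server such as vLLM or llama.cpp). The URLs, model names, and API key below are placeholders, not real endpoints:

    import time
    import requests

    def time_to_first_token(base_url: str, model: str, prompt: str, api_key: str = "") -> float:
        """Measure seconds until the first streamed chunk arrives (a rough latency proxy)."""
        headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
        payload = {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,  # streaming lets us time the first chunk, not the full response
        }
        start = time.perf_counter()
        with requests.post(f"{base_url}/chat/completions", json=payload,
                           headers=headers, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            for line in resp.iter_lines():
                if line:  # first non-empty SSE line ~= first token
                    return time.perf_counter() - start
        return float("inf")

    # Placeholder endpoints and model names for illustration only.
    local = time_to_first_token("http://localhost:8000/v1", "small-local-model", "Say hello.")
    hosted = time_to_first_token("https://api.example.com/v1", "big-hosted-model", "Say hello.",
                                 api_key="YOUR_KEY")
    print(f"local TTFT: {local:.3f}s, hosted TTFT: {hosted:.3f}s")

Time-to-first-token only captures part of a realtime speech pipeline's latency, but it makes the size-versus-latency tradeoff concrete for your own hardware and prompts.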



