Hacker News
whimsicalism
5 days ago
on:
Qwen3-Omni-Flash-2025-12-01:a next-generation nati...
Makes sense. I think streaming audio->audio inference is a relatively big lift.
red2awn
5 days ago
Correct, it breaks the single-prompt, single-completion assumption baked into the frameworks. Conceptually it's still prompt/completion, but for a low-latency response you have to do streaming KV cache prefill with a websocket server.
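A minimal sketch of what streaming prefill buys you, assuming a toy model: `toy_kv`, `StreamingSession`, and `batch_prefill` are hypothetical names, not any framework's API, and the websocket transport is left out so the example stays self-contained. The idea is that the KV cache is extended chunk by chunk while audio is still arriving, so only the final chunk sits on the critical path once the turn ends.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def toy_kv(token: int, position: int) -> Tuple[int, int]:
    """Hypothetical stand-in for a layer's key/value projection of one token."""
    return (token * 31 + position, token * 17 - position)


@dataclass
class StreamingSession:
    """Holds the KV cache for one conversation and extends it as chunks arrive."""
    kv_cache: List[Tuple[int, int]] = field(default_factory=list)

    def prefill_chunk(self, tokens: List[int]) -> int:
        """Append KV entries for a newly received chunk; return tokens processed."""
        start = len(self.kv_cache)
        for offset, token in enumerate(tokens):
            self.kv_cache.append(toy_kv(token, start + offset))
        return len(tokens)


def batch_prefill(tokens: List[int]) -> List[Tuple[int, int]]:
    """Single prompt, single completion: prefill everything after the turn ends."""
    return [toy_kv(token, pos) for pos, token in enumerate(tokens)]


# Audio tokens arriving in three chunks while the user is still speaking.
chunks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
all_tokens = [t for c in chunks for t in c]

session = StreamingSession()
for chunk in chunks[:-1]:
    session.prefill_chunk(chunk)  # done while audio is still arriving

# Tokens left to prefill once the final chunk lands, in each mode.
streaming_tail = session.prefill_chunk(chunks[-1])
batch_tail = len(batch_prefill(all_tokens))

print(streaming_tail, batch_tail)
```

Both paths end with the same cache contents; the difference is only how much prefill work remains after the last chunk arrives. In a real deployment each `prefill_chunk` call would be triggered by a message on the websocket connection.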
whimsicalism
4 days ago
I imagine you have to start decoding many speculative completions in parallel to have true low latency.
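One way to read this, as a rough sketch rather than a description of how any particular system works: start a draft completion at every point where the speaker might have finished, and keep only the draft whose prefix turns out to be the whole turn. `toy_decode` and `respond` are hypothetical names, the pause flags stand in for a voice-activity detector, and the drafts run one after another here, whereas a real server would presumably decode them concurrently.

```python
from typing import Dict, List, Optional


def toy_decode(prefix: List[int], n_tokens: int = 3) -> List[int]:
    """Hypothetical stand-in for autoregressive decoding from a given prefix."""
    out: List[int] = []
    state = sum(prefix)
    for _ in range(n_tokens):
        state = (state * 7 + 1) % 100
        out.append(state)
    return out


def respond(chunks: List[List[int]], pauses: List[bool]) -> Optional[List[int]]:
    """Speculate at every pause; keep only the draft that covers the full turn."""
    heard: List[int] = []
    drafts: Dict[int, List[int]] = {}  # prefix length -> speculative completion
    for chunk, paused in zip(chunks, pauses):
        heard.extend(chunk)
        if paused:  # possible end of turn: start decoding right away
            drafts[len(heard)] = toy_decode(heard)
    # The turn actually ended at len(heard); drafts for shorter prefixes are discarded.
    return drafts.get(len(heard))


chunks = [[1, 2], [3], [4, 5]]
reply = respond(chunks, pauses=[True, False, True])
```

If the final chunk was followed by a detected pause, the reply is already decoded by the time the end of turn is confirmed; if not, `respond` returns `None` and decoding has to start from scratch. The cost is the wasted compute on drafts that get thrown away.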