
I think training on pause tokens or something similar would probably be the key to something like this. Or maybe it's not even necessary: maybe if you just tell GPT-4 to output something like "...." every time it thinks it should keep waiting for a response (then you wouldn't need to wait for the user to finish), things would be a lot smoother.
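
Something along these lines, just a sketch with the OpenAI Python client; the system prompt, the "...." sentinel, and the way fragments are fed in are all made up for illustration:

    # Sketch of the sentinel idea: ask the model to reply with "...." when it
    # thinks the speaker isn't finished yet, and only surface real replies.
    from openai import OpenAI

    client = OpenAI()
    PAUSE = "...."

    SYSTEM = (
        "You are listening to someone speak, delivered in fragments. "
        f"If they do not seem finished, reply with exactly '{PAUSE}'. "
        "Otherwise, reply normally."
    )

    def maybe_respond(transcript_so_far: str) -> str | None:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": transcript_so_far},
            ],
        )
        text = resp.choices[0].message.content.strip()
        return None if text == PAUSE else text  # None means "keep listening"
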


Yes, you could probably fine-tune (or even zero-shot) an LLM to handle the "knowing when to jump in" use case.

The real problem is that it's simply too computationally expensive to continually feed audio and video into one of these massive LLMs just in case it might decide to jump in.

I was wondering if you could train a lightweight monitoring model that continually watches the audio/video input and only tries to work out when the full-sized LLM might want to jump in and generate a response.
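
Roughly this kind of two-tier setup; the gate_score heuristic and respond_with_big_llm are just stand-ins for a small trained classifier and the expensive model call:

    # A cheap "should I jump in?" gate runs on every transcribed chunk, and
    # only when it fires do we pay for the big LLM on the accumulated context.
    from collections import deque

    RECENT = deque(maxlen=50)   # rolling window of recent transcript chunks
    THRESHOLD = 0.8             # arbitrary; would be tuned on real data

    def gate_score(chunk: str) -> float:
        """Placeholder for a lightweight model that estimates how likely it
        is that the assistant should respond right now."""
        return 1.0 if chunk.rstrip().endswith("?") else 0.1  # toy heuristic

    def respond_with_big_llm(context: str):
        print(f"[big LLM invoked on {len(context)} chars of context]")

    def on_new_chunk(chunk: str):
        RECENT.append(chunk)
        if gate_score(chunk) >= THRESHOLD:
            respond_with_big_llm(" ".join(RECENT))  # the rare, expensive call
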


As the human brain is a clump of regions, all interconnected and interacting (for example, one may focus their attention elsewhere until their name is called), having a light model wait for an important cue makes sense for more than just fiscal reasons.

One time I was so distracted, I missed an entire paragraph someone said to me, walked to my car, drove away, and 5 minutes later processed it.


Yeah, one thing I've noticed myself do is that when I'm focused on something else and someone suddenly gets my attention, I'll replay the last few seconds of the conversation in my head to get context on what was being talked about before I respond. That seems pretty trivial to do with an LLM; it doesn't need to be using 100% of its "brainpower" at all times.
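
The "replay" part is basically a small timestamped buffer; a toy version (the window size and the trigger that calls replay_context are arbitrary):

    # Keep only the last few seconds of what was heard, and hand it to the
    # model only when something (a wake word, a gate model, etc.) triggers.
    import time
    from collections import deque

    BUFFER = deque()          # (timestamp, text) pairs
    WINDOW_SECONDS = 15.0     # how much recent context to "replay"

    def hear(text: str):
        now = time.monotonic()
        BUFFER.append((now, text))
        while BUFFER and now - BUFFER[0][0] > WINDOW_SECONDS:
            BUFFER.popleft()  # forget anything older than the window

    def replay_context() -> str:
        """Called only at the moment the assistant decides to respond."""
        return " ".join(text for _, text in BUFFER)
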



