
I also don't like the chat interface. What I meant by my comment above was actually talking and having natural conversations with the Operator agent while driving, or just going for a walk, or whenever and wherever something comes to mind that requires me to go to a browser and fill out forms, etc. That would get us closer to using ChatGPT as a universal AI agent to get those things done. (This is what Siri was supposed to be one day when Steve Jobs introduced it on that stage but unfortunately that day never arrived.)


> This is what Siri was supposed to be one day when Steve Jobs introduced it on that stage but unfortunately that day never arrived.

The irony is, the reason neither Siri nor Alexa nor Google Assistant/Now/${whatever they call it these days} nor Cortana achieved this isn't the voice side of the equation. That part sucks too, when you realize that 20 years ago the Microsoft Speech API could do better, fully locally, on cheap consumer hardware, but the real problem is the integration approach. Doing interop via agreements between vendors only ever led to commercial entities exposing minimal, trivial functionality of their services, activated by voice commands of the form "{Brand wake word}, {verb} {Brand 1} to {verb} {Brand 2}", etc.

This is not an ergonomic user interface; it's merely making people constantly read ads out loud themselves. "Okay Google, play some Taylor Swift on Spotify" is literally three brand ads in the eight words you just spoke.

No, all the magical voice experience you describe is enabled[0] by having multimodal LLMs that can be sicced on any website and beat it into submission, whether the website vendor likes it or not. Hopefully they won't screw it up (again[1]) by trying to commercialize it through offering third parties control over what the LLMs can do. If, in this new reality, I still have to utter the word "Spotify" to get my phone to start playing music, that will be a double regression relative to the MS Speech API of the mid-2000s.

--

[0] - Actually, it was possible ever since OpenAI added function calling, well over a year ago - provided you exposed the stuff you care about as functions yourself. As it stands, the smartphone voice assistant closest to the Star Trek experience is actually free and easy to set up - it's Home Assistant with its mobile app (for the phone assistant side) and server-side integrations (mostly, but not limited to, IoT hardware).
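
For the curious, here's a minimal sketch of what "exposing stuff as functions" looks like with the OpenAI Python SDK. The play_music tool and the dispatch at the end are made up for illustration; you'd wire them to whatever player or API you actually use:

    import json
    from openai import OpenAI

    client = OpenAI()

    # Describe a capability you control as a callable "tool".
    tools = [{
        "type": "function",
        "function": {
            "name": "play_music",  # hypothetical: dispatches to your own player
            "description": "Start playback of music by a given artist.",
            "parameters": {
                "type": "object",
                "properties": {"artist": {"type": "string"}},
                "required": ["artist"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any tool-calling-capable model
        messages=[{"role": "user", "content": "Play some Taylor Swift"}],
        tools=tools,
    )

    # The model returns a structured call instead of prose; you execute it yourself.
    for call in resp.choices[0].message.tool_calls or []:
        if call.function.name == "play_music":
            args = json.loads(call.function.arguments)
            print("would start playback for:", args["artist"])

No brand names required in the utterance - the model maps intent to whatever functions you chose to expose.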

[1] - Like OpenAI did with "GPTs". They tried to package a system prompt and a function-call configuration into a digital product and build a marketplace around it. This delayed the release of the functionality in the official ChatGPT app/website by about half a year, leading to the absurd situation where, for those 6+ months, anyone with API access could use a much better implementation of "GPTs" via third-party frontends like TypingMind.
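
(For context on why third-party frontends could get there first: at the API level a "GPT" is basically just a pinned system prompt, optionally plus a tool configuration like the one above. Something along these lines - names and instructions here are purely illustrative:

    from openai import OpenAI

    client = OpenAI()

    # A home-grown "GPT": custom instructions bundled with every request.
    CUSTOM_INSTRUCTIONS = "You are a terse assistant that fills out forms for me."

    def ask(user_message: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",  # illustrative; any chat model works
            messages=[
                {"role": "system", "content": CUSTOM_INSTRUCTIONS},
                {"role": "user", "content": user_message},
            ],
        )
        return resp.choices[0].message.content

Nothing a marketplace needed to gatekeep.)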



