I don't. Chat interface sucks; for most of these things, a more direct interface could be much more ergonomic, and easier to operate and integrate. The only reason we don't have those interfaces is because neither restaurants, nor airlines, nor online stores, nor any other businesses actually want us to have them. To a business, the user interface isn't there to help the user achieve their goals - it's a platform for milking the users as much as possible. To a lesser or greater extent, almost every site actively defeats attempts at interoperability.
Denying interoperability is so culturally ingrained at this point that it's pretty much baked into the entire web stack. The only force currently countering this is accessibility - screen readers are pretty much an interoperability backdoor with legal backing in some situations, so not every company gets to ignore it.
No, we'll have to settle for "chat agents" powered by multimodal LLMs working as general-purpose web scrapers, because those models are the ultimate form of adversarial interoperability, and chat agents are the cheapest, least-effort way to let users operate them.
I think the chat interface is bad, but for certain things it could honestly streamline a lot of mundane tasks, as the poster you're replying to stated.
For example, McDonald's has heavily shifted away from cashiers taking orders and instead is using the kiosks to have customers order. The downside of this is 1) it's incredibly unsanitary and 2) customers are so goddamn slow at tapping on that god awful screen. An AI agent could actually take orders with surprisingly good accuracy.
Now, whether we want that in the world is a whole different debate.
McDonald's is a good example. In the beginning, the Kiosks were a real time-saver, and you could order with a few clicks.
Today, you need to bypass "do you have the app", "do you want fries with that", "do you want to donate", "are you sure you don't want fries?" and a couple more.
All this is exactly what your parent comment was saying: "To a business, the user interface isn't there to help the user achieve their goals - it's a platform for milking the users as much as possible."
Regarding sanitation, not sure if they are any worse than, say, door handles.
Come to think of it, chat may make things even worse.
What I wrote earlier, about business seeing the interface as a platform for milking users, applies just as much to the human interface. After all, "do you want fries with that?" didn't originate with the Kiosks, but with human cashiers. Human staff, too, are being programmed by corporate to upsell you shit. They have explicit instructions for it, and regular compliance checks by "mystery shoppers".
Now, the upsell capabilities of the human cashier interface are limited by training, compliance and controls, all of which are both expensive and unreliable processes; additionally, customers are able to skip some of the upsells by refusing the offer quickly and angrily enough - trying to force cashiers to upsell anyway breaks too many social and cultural expectations to be feasible. Meanwhile, programming a Kiosk is free on the margin - you get zero-time training (and retraining) and 100% compliance, and the customer has no control. You can say "stop asking me about fries" to a Kiosk all day, and it won't stop.
It's highly likely a voice chat interface will combine the worst of the characteristics above. It's still software like the Kiosk, just programmed by prompts, so still free on the margin, compliant, and retrainable on the spot. At the same time, the voice/conversational aspect makes us perceive the system more like a person, making us more susceptible to upsells, while denying us agency and control, because it's still a computer and can easily be made to keep asking you about fries, with no way for you to make it shut up.
> Regarding sanitation, not sure if they are any worse than, say, door handles.
It will depend on the material of the door handles. In my experience, many of the handles are some kind of metal, and bacteria have a harder time surviving on metal surfaces. Compare that to a screen that requires some pretty hard presses to register input, and I think you'd find considerably more bacteria sitting there.
Additionally, I try to use my sleeve to open door handles whenever possible.
McDonald's already tried having AI take orders and stopped when the AI did things like randomly add $250 of McNuggets or mistake ketchup for butter.
Note - because this is something that needs to be pointed out in any discussion of AI now - even though human beings also make mistakes, this is still markedly less accurate than the average human employee.
Indeed. I think a GPT-4o class model, properly prompted, would work just fine today. The trick is, unlike a human, the computer is free to just say "no" without consequences. The model could be aggressively prompted to detect and refuse weird orders. Having to escalate to a human supervisor (who conveniently is always busy doing other things and will come to you in a minute or three) should be sufficient to discourage pranksters and fraudsters, while not being annoying enough to deter normal customers.
(I say model, but for this problem I'd consider a pipeline where the powerful model is just parsing orders and formulating replies, while being sanity-checked by a cheaper model and some old-school logic to detect excessive amounts or unusual combinations. I'd also consider using an "open source" model in place of GPT-4o, as open models allow doing "alignment" shenanigans in the latent space, instead of just in the prompts.)
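To make that parenthetical concrete, here's a minimal sketch of the old-school-logic half of such a pipeline. The menu, limits, and parsed-order format are all made up for illustration; the point is that the expensive model only turns speech into structured line items, and a dumb deterministic layer decides whether to ring the order up or escalate:

```python
# Hypothetical sanity check sitting behind the order-parsing model.
# Menu, limits, and the parsed-order format are invented for this sketch.
MENU = {"big mac": 7.00, "fries": 3.50, "diet coke": 2.00, "mcnuggets 10pc": 5.50}
MAX_QTY_PER_LINE = 5
MAX_ORDER_TOTAL = 60.00

def sanity_check(parsed_order: list[dict]) -> tuple[bool, str]:
    """parsed_order looks like [{"item": "big mac", "qty": 1}, ...]."""
    total = 0.0
    for line in parsed_order:
        item, qty = line["item"].lower(), line["qty"]
        if item not in MENU:
            return False, f"unknown item: {item!r}"
        if not 0 < qty <= MAX_QTY_PER_LINE:
            return False, f"suspicious quantity for {item}: {qty}"
        total += MENU[item] * qty
    if total > MAX_ORDER_TOTAL:
        return False, f"order total ${total:.2f} exceeds limit"
    return True, "ok"

# The "$250 of McNuggets" scenario gets escalated instead of rung up.
ok, reason = sanity_check([{"item": "mcnuggets 10pc", "qty": 45}])
if not ok:
    print("Calling a (conveniently busy) human supervisor:", reason)
```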
I've never used a McDonald's kiosk for the reason you gave. Actually, I think no matter how much you streamlined it with cutting edge AI assistants, it would still be faster and more natural to just say "A big mac and a diet coke please" to the cashier. I don't see any end-user benefit to these assistants; the only ones who benefit are the bean counters and executives who will use them to do more layoffs and keep the money that saves for themselves.
With a true GPT ordering experience, you would just say “a Big Mac and a diet coke please” to a speaker, just like you would in a drive-thru, and it would ring you up. It would replace the cashier.
This is how it is in Australia at some Macca's with a kiosk: no cashiers at all. You can still ask for one, but there aren't people just waiting to take your order.
The guy taking orders does other things besides just taking orders. Wake me up when chatgimp can prepare my fries and bring the bag of ready food to my car.
That will depend on the materials used for the door handles. If the handles are made of metal, then bacteria generally has a harder time surviving. Additionally, I use my sleeve when opening the door.
The point may have flown over your head. The kiosks are cleaner than most other items you will have touched up until that point. It is not incredibly unsanitary, but it can be aggravating for those who think a lot about germs.
I quite like the kiosk system for ordering McDonald's. You can see the entire available menu, along with all possible options for adding or removing ingredients, sides, sizes, combo deals, etc. You can always see the current state of your order. A chat-based interface wouldn't be a major improvement on this UX imho.
Yes. Chat is absolutely bad, because it is opaque. It perfectly reproduces what used to be called "hunt the verb" in gaming, for the same reason. The simple truth is you're interacting with a piece of software, with features and subroutines. GUIs are great at surfacing features, affordances, changing with context. A chat interface invites you to guess.
LLMs, if used at all, aren't aware enough to even know what the software can do, and many actual chat UIs are worse than that!
My "favourite" design pattern for chat UIs is to invite you type, freely, whatever you like, then immediately enter a wizard "flow" like it's 1991 and entirely discard everything you typed. Pure hostility.
I never thought about this. Does McD's PR team have anything to say about it? I assume that a bunch of people have challenged them about it on Twitter or TikTok. Would you feel better if there was a kind of automatic/robotic window washer that sanitised the screen after each use?
The key to me about the kiosks is: (1) initially, replace cashier labour costs with new expensive machines, and (2) medium-to-long term, upgrade the software with more and more "upsell" logic. This could be incredibly effective as a sales tactic. (Notwithstanding that possibility, I fully agree with your final sentence!)
Can you imagine if a celebrity, like Kim Kardashian or David Beckham, lent their likeness for a fee to McD's to create an assistant that would talk with you during your order? (Surely, AI/ML can generate video/anime that looks/moves/sounds just like them.) I can foresee it, and it would be the near-perfect economic exploitation of parasocial relationships in a retail setting.
> I never thought about this. Does McD's PR team have anything to say about it? I assume that a bunch of people have challenged them about it on Twitter or TikTok.
They probably ignore them, as they should - the same problem exists everywhere, from ATMs to door keypads to stores to self-checkout to tapping your card on stuff, etc.
> initially, replace cashier labour costs with new expensive machines,
Labor, like energy, is conserved in the system. It might be easier to counter the proliferation of those systems if the narrative focused less on companies replacing labor on their side, and more on the fact that this labor gets transferred to the customers, who are now laboring for free for the company, doing the same things that used to be done better and faster by a dedicated employee.
I'm honestly pretty aware of it. When I open the door, I try to use my sleeve. If I'm unable to do that (say I'm wearing a short sleeve shirt), then I'll consider washing my hands if I'm eating in.
McDonald’s makes a lot more money with the kiosks. Slowness is an issue but the upselling is major, and putting a lot of images of tasty looking things in front of a hungry person is very effective. Chat could never do this!
I also do not like the chat interface. What I meant by the above comment was actually talking and having natural conversations with the Operator agent while driving a car, or just going for a walk, or whenever and wherever something comes to mind that requires me to go to a browser and fill out forms, etc. That would get us closer to using ChatGPT as a universal AI agent to get those things done. (This is what Siri was supposed to be one day when Steve Jobs introduced it on that stage but unfortunately that day never arrived.)
> This is what Siri was supposed to be one day when Steve Jobs introduced it on that stage but unfortunately that day never arrived.
The irony is, the reason neither Siri nor Alexa nor Google Assistant/Now/${whatever they call it these days} nor Cortana achieved this isn't the voice side of the equation. That one sucks too (especially when you realize that, 20 years ago, the Microsoft Speech API could do better, fully locally, on cheap consumer hardware), but the real problem is the integration approach. Doing interop by agreements between vendors only ever led to commercial entities exposing minimal, trivial functionality of their services, which was activated by voice commands in the form of "{Brand Wake word}, {verb} {Brand 1} to {verb} {Brand 2}" etc.
This is not an ergonomic user interface, it's merely making people constantly read ads themselves. "Okay Google, play some Taylor Swift on Spotify" is literally three brand ads in eight words you just spoke out loud.
No, all the magical voice experience you describe is enabled[0] by having multimodal LLMs that can be sicced on any website and beat it into submission, whether the website vendor likes it or not. Hopefully they won't screw it up (again[1]) trying to commercialize it by offering third parties control over what LLMs can do. If, in this new reality, I have to utter the word "Spotify" to have my phone start playing music, this is going to be a double regression relative to MS Speech API in the mid 2000s.
--
[0] - Actually, it was possible ever since OpenAI added function calling, which was well over a year ago - if you exposed stuff you care about as functions on your own (rough sketch below). As it is, currently the smartphone voice assistant that's closest to the Star Trek experience is actually free and easy to set up - it's Home Assistant with its mobile app (for the phone assistant side) and server-side integrations (mostly, but not limited to, IoT hardware).
[1] - Like OpenAI did with "GPTs". They've tried to package a system prompt and function call configuration into a digital product and build a marketplace around it. This delayed their release of the functionality to the official ChatGPT app/website for about half a year, leading to an absurd situation where, for those 6+ months, anyone with API access could use a much better implementation of "GPTs" via third-party frontends like TypingMind.
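To illustrate footnote [0]'s point about exposing stuff as functions: a minimal sketch using the OpenAI Python SDK's function-calling interface. The play_music function and its schema are assumptions made up for this example; wiring its output to an actual player is your own plumbing, but note the user never has to name a brand.

```python
# Minimal function-calling sketch; play_music and its schema are hypothetical.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "play_music",
        "description": "Start playback of an artist, album, or track on whatever player the user has configured.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to play, e.g. an artist name."},
            },
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "play some Taylor Swift"}],
    tools=tools,
)

# If the model decides a function call is warranted, the arguments come back
# as structured JSON - no brand incantations required of the user.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```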
Voice chat with LLMs is a complete interface, and it's one that already works and can be slotted right into the product. You can prototype a voice chat-based ordering app via no-code tools today, without much effort going into it.
Dynamically generated interactive UIs are something people are barely beginning to experiment with; we don't know whether current models can do them reliably for realistic problems, or how much effort has to go into setting them up for any particular product. At this point, they're an expensive, conceptual solution that doesn't scale.