ChatGPT can remember but can it listen?

No! Well sorta! But not well!

Feb 16, 2024

You may have heard that ChatGPT will now have memory. That’s cool. Memory seems like an essential ingredient for a useful interlocutor. It got me thinking: what else can ChatGPT improve on? Listening, I think.

I’m not sure if you’ve used the voice mode for ChatGPT, the mode where you talk out loud to it and it responds in kind, but it’s a bit weird. It’s not an output problem. The responses are powered by the same intellect as the text-based mode, and the actual voice is remarkably human. It’s a problem in the input state.

ChatGPT’s input state. It says it’s listening and demonstrates it with…uh…a circle.

In the input state, ChatGPT says it’s listening, and technically is, but it has few of the trappings of listenership. It does the computer equivalent of giving you a blank stare until you trail off.

I think it’s mostly understood that actual listening involves a live feedback loop between the listener and the speaker. As you talk, the listener reacts to what you’re saying with their face, body, and voice. It’s not just that they hear you; it’s that your words change their disposition, and that disposition changes your words.

I think this is the kind of thing ChatGPT needs in order to be a better listener: the live feedback loop, the shifting disposition while listening. It’s not totally clear to me how human-like these specific mechanics need to be, but let’s see what we find.

What follows are some rough thoughts on making ChatGPT a better listener. These aren’t pitches; they’re just a way of thinking through possibilities—sketches and starting points.

Concept 1: Clarity Graph

ChatGPT gives you live feedback on how well it’s understanding you. Perhaps if you pause when it’s more confused (towards the question mark) it will ask a clarifying question.
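To make the pause behavior concrete, here is a minimal sketch of the decision it implies. Everything in it is an assumption for illustration: the clarity score, the two thresholds, and the action names are hypothetical, and nothing here reflects how ChatGPT actually works. It also assumes that a pause while the listener is following along simply leads to a normal response.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
CONFUSION_THRESHOLD = 0.4  # clarity below this counts as "confused"
PAUSE_SECONDS = 1.5        # silence longer than this counts as a pause


@dataclass
class ListenerState:
    clarity: float          # 0.0 = lost (question-mark end), 1.0 = fully following
    silence_seconds: float  # how long the speaker has been quiet


def next_action(state: ListenerState) -> str:
    """Decide what the listener does at this instant."""
    if state.silence_seconds < PAUSE_SECONDS:
        # The speaker is still talking (or barely paused): keep updating the graph.
        return "keep_listening"
    if state.clarity < CONFUSION_THRESHOLD:
        # A pause while the graph sits near the question mark.
        return "ask_clarifying_question"
    # A pause while the listener is following along (assumed behavior).
    return "respond"
```

The point of the sketch is that the graph is not just decoration: the same clarity value drives both what the user sees and what the listener does next.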

Concept 2: Live Questions

ChatGPT listens and displays questions visually as it forms them. Questions further from the center are more esoteric or more likely to change the subject, while closer ones are more about clarifying what you’re saying. The user can click on one, and ChatGPT will ask it.
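As a rough sketch of the data behind this concept, each live question could carry a single distance value that controls where it is drawn. The `LiveQuestion` type, the 0-to-1 distance scale, and the 0.5 cutoff are all hypothetical, picked only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff between "clarifying" and "tangent" questions.
TANGENT_CUTOFF = 0.5


@dataclass
class LiveQuestion:
    text: str
    distance: float  # 0.0 = center (clarifying), 1.0 = edge (likely to change the subject)


def layout(questions: List[LiveQuestion]) -> List[LiveQuestion]:
    """Order questions from the center outward."""
    return sorted(questions, key=lambda q: q.distance)


def kind(question: LiveQuestion) -> str:
    """Label a question by how far it sits from the center."""
    return "clarifying" if question.distance < TANGENT_CUTOFF else "tangent"


def on_click(question: LiveQuestion) -> str:
    """Clicking a question hands its text to the assistant to ask out loud."""
    return question.text
```

A single number is probably too simple for a real layout, but it captures the design choice: the further out a question sits, the more it steers the conversation somewhere new.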

Concept 3: Symbolic Faces

We give ChatGPT a set of symbolic faces: a way to express itself in a human-like way while avoiding something too uncanny. The faces flash across the screen as you speak. If you pause while one is being shown, perhaps ChatGPT will respond in terms of that face, whether supportive, confused, or something else.
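One way to picture the mechanic is as a timeline lookup: find whichever face was on screen at the moment the speaker paused, then let it set the tone of the reply. This is a sketch under assumptions; the face names, the timeline format, and the response styles are hypothetical placeholders.

```python
from typing import List, Optional, Tuple

# (start_seconds, end_seconds, face_name): when each face was on screen.
FaceSpan = Tuple[float, float, str]

# Hypothetical mapping from the face shown to the tone of the reply.
RESPONSE_STYLE = {
    "supportive": "encourage the speaker to keep going",
    "confused": "ask what the speaker meant",
}
DEFAULT_STYLE = "respond neutrally"


def face_at(timeline: List[FaceSpan], t: float) -> Optional[str]:
    """Return the face displayed at time t, or None if no face was showing."""
    for start, end, face in timeline:
        if start <= t < end:
            return face
    return None


def style_for_pause(timeline: List[FaceSpan], pause_time: float) -> str:
    """Pick a response style based on the face on screen when the speaker paused."""
    face = face_at(timeline, pause_time)
    return RESPONSE_STYLE.get(face, DEFAULT_STYLE)
```

For example, with a timeline of `[(0.0, 2.0, "supportive"), (2.0, 3.5, "confused")]`, a pause at 2.5 seconds lands on the confused face, so the reply would ask what the speaker meant.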

Concept 4: Embrace the Uncanny

I think this one simultaneously looks the coolest and is my least favorite. Here, we attempt to create a relatively human-like face that reacts live.

The major risk here is that it ends up being creepy, which I see as both a UX risk and a brand perception risk. My guess is that it’s best to start more abstractly (as with the previous examples) and possibly work in this direction.

Closing Thoughts

Presumably, we’d need some technical advancements to make this type of feature work. It’s not simply adding more UI; it’s changing the pattern by which ChatGPT sends and receives information.

That said, it feels like a technical hurdle worth jumping. The magic of ChatGPT is how human it can seem when typing with it. In voice mode, that illusion falls flat. More generally, it’s hard to overstate the value of listening. Someone who can listen well is more useful in more circumstances than someone who can’t.

Matt Bloom-Carlin's Newsletter
