Notes on AI

Will Voice Become the Primary User Interface?

The bottlenecks that have held voice back for years are quickly being removed, one by one.

Neo Zhang

May 17, 2024 — 11 min read

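In the GPT-4o demo, OpenAI put voice interaction front and center. Many people doubt that voice, a low-fidelity medium, can carry the weight of complex interaction. Indeed, voice conveys information serially, and human and machine must take turns; even if we speak and listen at 2x speed, the rate of information exchange cannot match what we take in by looking and reading. Add to that the frustration of the past decade with first-generation voice assistants such as Siri and Alexa, which is why, even today, voice interaction still brings to mind basic cases like asking about the weather, setting a timer, or converting currencies.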

Ben Thompson said on the SharpTech podcast:

A lot of stuff that we think about as productivity has, for 40 years, been optimized for the WIMP interface - Windows, Icons, Mouse, Pointer. And so, they're not going to work better on touch. They're not going to work, maybe, not even on the Vision Pro, unless we have sort of a keyboard and a mouse to use it. And that's because there is a sort of determinism that happens, a path dependency, that's actually downstream from the user interface. Because we have a mouse and a keyboard, we build applications that leverage the mouse and keyboard. And we think of those applications as productivity.


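This passage was a real eye-opener for me: the productivity software we are used to was largely built around a Windows-style UI, which is tightly coupled to the mouse and keyboard as input devices. Apple spent more than a decade building touchscreen-based interaction on the iPhone, yet even that could not be carried over directly to the iPad, and expectations for the iPad as a productivity tool have never quite been met.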
