Sina Bahram, a brilliant PhD candidate in Human-Computer Interaction, wrote a great response to my last article, arguing that audio interfaces might not be the best solution. He makes some great points, and I wanted to share and respond to them here:
"I believe that we need to stop trying to limit and restrict interfaces to a single modality of input or output and instead accept that no interface is appropriate for all people and that just because speech might seem sexy now, part of that is novelty and part of it is our response to the needs that speech solves so well that existing interfaces do not."
He argues that a single input or output modality, like speech, is not a solution for everyone in all cases, and that limiting ourselves to a single modality unnecessarily increases error rates.
The point about error rates is a good one, though I do not think adding another modality is the only answer. First, there are situations where multiple modalities aren't possible: driving a car or walking down the street, for example.
And second, there are other ways to reduce error rates. On a noisy bus or in a noisy cafe, one can use a noise-canceling microphone. For those who need their hearing unobstructed, there is bone conduction, like in Aftershokz headphones or Google's Glass. (Another promising answer is connected hearing aids: Apple will be partnering with major manufacturers to release "Made for iPhone" devices later this year.)
Another possible argument against an audio-only interface is the social acceptability of talking to your phone in public without holding it to your ear. I'm not sure there is a good answer to this yet, though I'm excited about the possibilities of subvocalization, a pseudo-mind-reading technology that is maybe 5-10 years out.
The reason I argue that audio-based interfaces are the next wave in HCI is not to diminish the power of multimodal interfaces, which, as Eric noted in the comments, are powerful tools in education and elsewhere, but rather a response to where computing is headed.
Computing is moving beyond mobile touch devices to wearable electronics. Google Glass and the Pebble smartwatch are just the first to make it to market. In this ever-smaller and more mobile paradigm, what is the most efficient modality? I believe an audio interface is the best bet given existing and soon-to-exist technology, and that the weaknesses mentioned above are just hurdles to be overcome.