The Surprising Repercussions of Making AI Assistants Sound Human
There’s much effort afoot to make the bots sound less… robotic. Amazon recently enhanced its Speech Synthesis Markup Language to give Alexa a more human range of expression. SSML now lets Alexa whisper, pause, bleep expletives, and vary the speed, volume, emphasis, and pitch of its speech.
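To make that concrete, here’s a minimal sketch of what those capabilities look like in Alexa’s SSML. The tag names follow Amazon’s published Alexa SSML reference; the response text itself is invented for illustration:

    <speak>
        Your package has shipped.
        <!-- pause before the aside -->
        <break time="500ms"/>
        <!-- amazon:effect is Alexa's extension for whispering -->
        <amazon:effect name="whispered">It might arrive early.</amazon:effect>
        <!-- prosody varies rate, pitch, and volume -->
        <prosody rate="slow" pitch="high" volume="loud">
            <emphasis level="strong">Someone</emphasis> should be home to sign.
        </prosody>
        <!-- interpret-as="expletive" bleeps the enclosed words -->
        The courier was <say-as interpret-as="expletive">darn slow</say-as>.
    </speak>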
This all comes on the heels of Amazon’s February release of so-called speechcons (like emoticons, get it?) meant to add some color to Alexa’s speech. These are phrases like “zoinks,” “yowza,” “read ’em and weep,” “oh brother,” and even “neener neener,” all pre-rendered with maximum inflection. (Still waiting on “whaboom” here.)
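Speechcons aren’t separate audio clips a developer stitches in; they ride on the same markup, triggered by the interjection value of say-as. A hedged sketch, again with made-up surrounding text:

    <speak>
        <!-- supported speechcon phrases are pre-recorded with extra inflection -->
        <say-as interpret-as="interjection">zoinks!</say-as>
        <break time="300ms"/>
        You just beat your high score.
        <say-as interpret-as="interjection">read 'em and weep.</say-as>
    </speak>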
The effort is intended to make Alexa feel less transactional and, well, more human. Writing for Wired, however, Elizabeth Stinson considers whether human personality is really what we want from our bots, or whether it’s just unhelpful misdirection.
“If Alexa starts saying things like hmm and well, you’re going to say things like that back to her,” says Alan Black, a computer scientist at Carnegie Mellon who helped pioneer the use of speech synthesis markup tags in the 1990s. Humans tend to mimic conversational styles; make a digital assistant too casual, and people will reciprocate. “The cost of that is the assistant might not recognize what the user’s saying,” Black says.

A voice assistant’s personality improving at the expense of its function is a tradeoff that user interface designers increasingly will wrestle with. “Do we want a personality to talk to or do we want a utility to give us information? I think in a lot of cases we want a utility to give us information,” says John Jones, who designs chatbots at the global design consultancy Fjord. Just because Alexa can drop colloquialisms and pop culture references doesn’t mean it should. Sometimes you simply want efficiency. A digital assistant should meet a direct command with a short reply, or perhaps silence, not booyah! (Another speechcon Amazon added.)

Personality and utility aren’t mutually exclusive, though. You’ve probably heard the design maxim form should follow function. Alexa has no physical form to speak of, but its purpose should inform its persona. But the comprehension skills of digital assistants remain too rudimentary to bridge these two ideals. “If the speech is very humanlike, it might lead users to think that all of the other aspects of the technology are very good as well,” says Michael McTear, coauthor of The Conversational Interface. The wider the gap between how an assistant sounds and what it can do, the greater the distance between its abilities and what users expect from it.
When designing within the constraints of any system, the goal is to channel user expectations and behavior toward what the system can actually do. The risk of adding too much personality is that it widens the mismatch between what users expect and what the assistant can deliver. Zoinks!
Wired | The Surprising Repercussions of Making AI Assistants Sound Human