When Spoken To

Series: Tech Space | Story 21

Our voice is how the majority of us communicate. It’s personal, it’s identifiable by our loved ones, and it’s the product of our linguistic experience over the course of our lifetimes. This innate dependence on vocals also shows up when we interact with non-humans. Smart speakers, phones, our vehicles; even something as basic as calling to pay a bill could result in having a conversation with a voice that was never attached to a person. Why, then, is there such a broad range in the quality of these voices? Why do some make us so wildly uncomfortable while others are borderline indistinguishable from a real, fleshy customer service representative?

Traditional text-to-speech has been around for decades at this point. The very first fully functioning system was developed in Japan back in the late 1960s. From then until relatively recently, these deeply impersonal and robotic-sounding voices were most commonly used for telephone systems and for adding voice prompts to GPS navigation. Siri, which Apple acquired in 2010, became the company’s go-to voice assistant and was one of the first to handle back-and-forth conversations, making it one of the first true examples of an AI-driven voice. Since then, Amazon’s Alexa, Google’s Assistant and Microsoft’s Cortana have entered the busy marketplace.

One of the biggest innovations here is something called Natural Language Processing, or NLP. NLP is an AI-based process, using a neural network, that allows computers to better “understand” our natural flow of language. Think of how many ways there are to simply ask how the weather looks today; it’s NLP that lets a device recognize all of those phrasings as the same question and respond meaningfully, the way we nowadays expect. Modern devices even have neural processors built in, so these AI-intensive tasks can be handled on the device itself rather than sent out over the internet to the cloud. The voices generated this way sound much more realistic, mimicking the tone, inflection and wording you might expect from a real person. You may not even know, in the case of advertising or other short interactions, that the voice you’re hearing is that of an AI at all.
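For the curious, here is a rough sketch of that idea in Python. It uses the open-source Hugging Face transformers library and a general-purpose language model to sort a handful of differently worded weather questions into the same “intent.” To be clear, this is not how Siri or Alexa actually work under the hood; the intent labels and phrasings are invented purely for illustration.

```python
# Toy illustration of the NLP idea: many different phrasings of the same
# question get mapped to one "intent" before a device decides how to answer.
# The labels and phrasings below are made up for illustration only.
from transformers import pipeline

# Zero-shot classification scores each candidate label against the text,
# so no task-specific training data is needed for this sketch.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

intents = ["weather", "music", "timer", "shopping list"]

phrasings = [
    "What's it like outside today?",
    "Do I need a jacket this morning?",
    "Is it going to rain later?",
]

for question in phrasings:
    result = classifier(question, candidate_labels=intents)
    best = result["labels"][0]  # labels come back sorted, best match first
    print(f"{question!r} -> intent: {best}")
```

Run it and each question, despite the different wording, should land on the same “weather” intent, which is the kind of flexibility that makes modern voice assistants feel so much less rigid than their predecessors.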

All of this has brought us to a recent announcement by Apple regarding its Books service. With audiobooks proving to be a growing market, you can now listen to narrations performed digitally by an AI. Previous machine-read audiobooks were, frankly, quite difficult to listen to due to that deeply inhuman robot tone, but with NLP and other advances in artificial intelligence, these new voices are strangely personable and pleasant. Don’t get me wrong, anything read by Samuel L. Jackson is still getting my vote, but this is an alternative, especially for authors who can’t front the cost of having someone high-profile read their book aloud.

It’s early days, and Apple is limiting these new digital, human-esque readers to a few genres for now. One can easily see this taking off, but given the irrefutable fact that whoever narrates a piece of writing brings their own value to it, I don’t foresee this as the end of needing people’s actual mouth words. For anything non-fiction, for example, not needing a voice actor to read aloud and record an entire reference manual saves both money and time. Meanwhile, something deeply story-driven, with rich, intricate characters and complex worlds, is probably better suited to the (for now) unmistakable warmth of your favorite narrator.

 
