The History of Speech Recognition and a Glimpse into its Future

8 May 2017

With the release of Apple's Siri and comparable voice assistants from Microsoft and Google, you might have wondered why it took so long for speech recognition to reach this stage. You may also wonder what the future holds for natural-language machine intelligence and its impact on our everyday lives.

Tracing the history of voice recognition technology is somewhat like watching a toddler grow up: it advances from baby talk, to a vocabulary of thousands of words, to answering queries with quick, witty repartee, as the digital assistant Siri does. Here is a closer look at the speech recognition innovations of past decades and at what the future has in store for this technology.

Baby Talk: 1950s and 1960s

The "Audrey" system, designed by Bell Laboratories in 1952, was the earliest speech recognition device. It could recognize only digits, and only when they were spoken by a single voice. Ten years later, in 1962, IBM exhibited its "Shoebox" system at the World's Fair; it could understand 16 spoken English words.

These may not seem like much progress, but such early efforts were a remarkable start given the primitive computers of the time.

Voice Recognition Picks Up: 1970s

The U.S. Department of Defense played a significant part in the major speech recognition breakthroughs of this decade through its funding and interest. Its DARPA Speech Understanding Research (SUR) program, which began in the early 1970s, was one of the most significant in the history of speech recognition. The program was responsible for the "Harpy" speech-understanding system developed at Carnegie Mellon. Harpy could understand 1,011 words, roughly the vocabulary of a typical three-year-old.

Predictive Speech Recognition: 1980s

Thanks to new approaches to understanding what people say, speech recognition vocabulary leaped over the next decade to several thousand words, with the potential to recognize an unlimited number of words. One significant reason for the advance was the hidden Markov model (HMM), a statistical model developed by L.E. Baum and his coworkers. Rather than matching sounds against fixed templates, an HMM estimated the probability that unknown sounds were words.
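To make the HMM idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the two word models ("yes" and "no"), their probabilities, and the three-symbol sound alphabet are invented toy values, and the names forward_likelihood and WORD_MODELS are hypothetical. It does not describe how any particular historical system was built. The function uses the standard forward algorithm to score how likely a sequence of sound symbols is under each word model, and the recognizer picks the word with the highest score.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM via the forward algorithm.

    obs   : list of observed symbol indices
    start : start[s] is the probability of beginning in state s
    trans : trans[p][s] is the probability of moving from state p to state s
    emit  : emit[s][k] is the probability that state s emits symbol k
    """
    n_states = len(start)
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


# Toy left-to-right word models (all numbers are made up for illustration).
# Sound symbols: 0, 1, 2 stand for three hypothetical acoustic classes.
WORD_MODELS = {
    "yes": {
        "start": [1.0, 0.0],
        "trans": [[0.6, 0.4], [0.0, 1.0]],
        "emit": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
    },
    "no": {
        "start": [1.0, 0.0],
        "trans": [[0.6, 0.4], [0.0, 1.0]],
        "emit": [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]],
    },
}


def recognize(obs):
    """Return the word whose model gives the observations the highest likelihood."""
    return max(WORD_MODELS, key=lambda w: forward_likelihood(obs, **WORD_MODELS[w]))


if __name__ == "__main__":
    heard = [0, 0, 1, 1]  # an unknown sound sequence
    for word, model in WORD_MODELS.items():
        print(word, forward_likelihood(heard, **model))
    print("best guess:", recognize(heard))
```

The point of the sketch is the comparison step: the recognizer never needs an exact template match, only a probability for each candidate word, which is what allowed vocabularies to grow.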

During this decade, businesses and specialized industries, such as the medical field, started to use speech recognition. It even reached the home in Worlds of Wonder's "Julie" doll, which children could train to respond to their voices.

Intelligent Speech Recognition Reaches the Masses: 1990s and 2000s

In the '90s, PCs with faster processors finally arrived, and speech recognition programs became practical for the public. "Dragon Dictate," the first consumer voice recognition product, was introduced by Dragon Systems in 1990 for a whopping $9,000. Its successor, "Dragon NaturallySpeaking," followed in 1997. It was far cheaper than the original but still expensive at about $700. The program could recognize continuous speech at about 100 words per minute, but you had to train it for about 50 minutes first.

Personalized Recognition from Google: 2010s

Google added a personalized recognition feature to Voice Search on a number of Android phones. In mid-2011, the company also added Voice Search to its Chrome web browser.

A Closer Look at the Future: Accurate, All-Pervasive Speech

Nowadays, voice recognition apps are everywhere, and they hint at what the future may bring. These programs will not only let you control your computer by voice or turn your speech into text; they will probably also support multiple languages, offer a choice of speaker voices, and integrate with every part of your mobile phone. The quality is likely to improve, too. For example, TrulyHandsfree Voice Control by Sensory may be able to hear and recognize you even in noisy surroundings.

It is not difficult, then, to imagine a near future in which we instruct our coffee machines, converse with our printers, and command the lights to turn off - oh wait, can you say Alexa?