Will Speech Technologies Deliver in 2010?
On Monday in my Jamison-Consulting.com blog I wrote about how speech technologies fared in 2009. Here I contrast that with what I think will happen in 2010 for speech within unified communications and other market segments.
Hosting is going to stay hot. With companies such as Voxeo, Microsoft Tellme, Contact Solutions, Angel.com, and myriad others providing hosting as an adjunct or alternative to premises-based speech deployments, customers have a lot of safe choices for not having to do everything themselves. Hosting is no longer an anomaly, and I believe we will see it brought up in conversation in the majority of deals in 2010, even if it's only talked about.
I still think outbound will be big in 2010, but in the form of what I will call intelligent outbound. That is, with regulation to halt "robocalls" and the public tiring of them, vendors will still use outbound because it's too attractive to pass up when done well, but they will be more cognizant of making sure there is value for the recipient. Proactive outbound notifications that a customer might opt in to, such as being informed when a flight has been cancelled or a prescription is ready, will be a key driver of this increase. And if you can make that interactive outbound, as Voxify would say, all the better, as it increases the value to both parties.
I think we are going to see a much bigger increase in acquisitions over 2009, because the money will start to flow and companies will continue to want to fill in niches or focus more on speech in general. Granted, we don't have a lot of big players to snatch up, but I think that a lot of little start-ups that lack mindshare but have some good ideas and, more importantly, much-needed talent will get gobbled up in the coming year. Speech scientists, linguists, and the like aren't a dime a dozen, so I think we will see consolidation as a way to fill unique niches in products and as a way to gain talent. Look to companies in voice search, language translation, speech analytics, or business analytics combined with speech to be targets.
Mobility and Voice Search
Mobility is still hot and so is voice search. I think we are only seeing the beginning of what we can do with speech technologies on mobile devices of all sorts. As an example, check out Google's just-announced Nexus One phone. It has ASR and TTS and STT. Anywhere on the phone that you can input text with a keypad, you can use your voice instead. That is pretty cool. As I will probably write about in my April Speech Technology Magazine column, speech technologies are the enabler of all sorts of interesting possibilities for what you can do with a phone without having to deal with a tiny keypad, and in this case enabler is not a bad word. Mobility also helps power a critical endpoint of unified communications. Think beyond voice-activated dialing and command and control, to voice search, translation, dictation, and speech-to-text for anything you would otherwise type. Mobility is a speech geek's dream.
I'm looking forward to the third annual Voice Search conference, being held in San Francisco in April 2010. To fit with the trends that are happening, this event is being re-branded as the Mobile Voice conference. It is well worth attending for anyone in the contact center, mobility, or UC space.
You can't talk about mobility and voice search without talking about text-to-speech. 2010 will be another big year here too. E-book reading will continue to make news. Besides the upcoming Apple product, there is a new e-book platform called Blio that will be launched at the Consumer Electronics Show this month through a partnership with Baker & Taylor (B&T), which is the world's largest distributor of books. Blio was developed by Ray Kurzweil, a longtime speech technology pioneer, and it will contain TTS. What is exciting about Blio is that the software will work with any operating system, on devices including computers and iPhones, and has multimedia functionality including graphics and video. Additionally, in China we have Insdream's SX601, which employs Mandarin TTS for reading.
One would hope that all of these additions of speech technologies would bolster products in the assistive technology market as well. We have seen better handheld devices for sight-impaired users starting to be offered, but we are also seeing a more concentrated effort to make the web more accessible, and this is where TTS really plays a part. For example, Google just unveiled a web search site that helps the sight-impaired to search. This site prioritizes a user's search results based on how simple the web page layouts are. This is critical for people who depend upon TTS to read a page to them, because the more complicated the page, with graphics and information that is not core to what the person is searching for, the more extraneous material is read aloud that the person does not need to hear. A sighted person can ignore the superfluous stuff and skip to the text, but a sight-impaired person cannot.
Speech continues to be loaded into all mobility aspects of unified communications, so those UC vendors who aren't paying attention yet will have to start. This will appear in new capabilities on devices, in many cases multi-modal applications that allow a user a choice between voice and keypad, and in new applications that use speech technologies. In addition, there will be an increase in the adoption of speech analytics, among other business analytics, to facilitate better business processes in both the enterprise and the contact center, both of which UC plays a role in. Particularly in the contact center, with more of the focus on the "voice of the customer" trend, speech analytics is integral to uncovering what is really happening with customer conversations and will continue to gain traction in 2010.
Dictation and speech-to-text
Use of dictation will continue to grow, from vertical market applications such as healthcare and legal to "end user dictation", in which the end user dictates using speech-to-text in mobility applications, such as sending texts or using products and services that convert voicemails to text. Nuance is already heavily invested in dictation technology and products, and its purchase of SpinVox at year end was an additional indicator of how this market is growing.
The industry made further inroads into translation systems in 2009, and this will continue in 2010, particularly as the footprint required to run them shrinks. A smaller footprint is especially useful in mobility applications on phones. Along those lines, in the last week of 2009 Toshiba delivered a trilingual translation system that uses TTS and ASR and can run embedded on a cell phone, rather than requiring a user to incur network charges to access an application on the network. This application translates between Japanese, Chinese, and English. It uses ASR to determine which language is being spoken and what is said, and then TTS for output.
I don't know if I'm missing anything here, but these are the trends I see happening in speech in 2010 right now.