Baidu’s Deep Voice vs. Google’s WaveNet
Baidu recently revealed that researchers at its Silicon Valley lab have developed a TTS (text-to-speech) system that it says performs better than Google’s WaveNet.
Baidu’s text-to-speech system is called Deep Voice, and the company has claimed that it solves a major problem hounding WaveNet: the inability to quickly synthesize natural and realistic speech with little human input.
The company said that it takes only a few hours to train Deep Voice to produce natural speech. In contrast, training WaveNet is computationally demanding, which makes the process slow and the real-world application of WaveNet difficult.
Baidu said that it solved WaveNet’s problems by using deep-learning technology to give Deep Voice the ability to convert text into phonemes, the smallest units of sound in speech. The company has also given Deep Voice the ability to convey different emotions, a function that is absent in Google’s WaveNet.
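To make the text-to-phoneme step concrete, here is a minimal sketch in Python. It is not Baidu's method: Deep Voice is described as learning this conversion with deep learning, whereas the sketch swaps in a tiny hand-written dictionary. The names `TOY_LEXICON` and `text_to_phonemes` are hypothetical, and the ARPAbet-style phoneme symbols are an assumption chosen for illustration.

```python
# Toy lexicon (hypothetical): maps words to ARPAbet-style phoneme symbols.
# A real TTS system would learn this mapping rather than hard-code it.
TOY_LEXICON = {
    "deep": ["D", "IY", "P"],
    "voice": ["V", "OY", "S"],
}


def text_to_phonemes(text):
    """Return a flat list of phonemes for the words in `text`.

    Words missing from the toy lexicon are marked with "<unk>".
    """
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?")  # drop simple surrounding punctuation
        phonemes.extend(TOY_LEXICON.get(word, ["<unk>"]))
    return phonemes


print(text_to_phonemes("Deep Voice"))
# prints ['D', 'IY', 'P', 'V', 'OY', 'S']
```

A lookup table like this fails on any word it has not seen, which is one reason a learned model is attractive for this step.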
A $12 billion revenue opportunity
With this breakthrough, Baidu may have just left Google behind in the pursuit of nearly $12 billion in voice and speech technology revenue, according to MarketsandMarkets estimates.
TTS systems can be used to enhance the functionality of digital assistants such as Alexa by Amazon (AMZN), Siri by Apple (AAPL), and Cortana by Microsoft (MSFT). They can also be leveraged to simplify interactions with IoT (Internet of Things) devices such as connected vehicles.
Both Baidu and Google are developing autonomous driving technologies as they pursue revenues outside their online advertising staples.