Cloud-based speech technologies – ASR and TTS
What can cloud telephony enable you to do that previously wasn’t economically viable, for enterprises and SMBs alike?
This post touches on a particular area into which cloud telephony is set to breathe new life. It will focus on the impact a cloud telephony approach can have on the uptake of premium tools/resources, such as speech recognition and synthetic speech, to the benefit of businesses, both large and small.
There is a universal need for SMBs and enterprises to automate certain types of calls, where possible. Doing so means that staff can focus on the more critical, technical, personal and revenue-generating calls, whilst reducing the time it takes to deliver the information a customer/caller desires.
Despite the proliferation of Web-based help and customer self-service options, it remains necessary to offer customers a voice channel for all manner of queries. Such calls can be automated in a variety of ways, using techniques such as DTMF detection or playback of pre-recorded standard messages. In truth, however, those methods are suitable only when short, simple pieces of information need to be conveyed. It also helps if the information is fairly static, i.e., it isn’t likely to change often or on a caller-by-caller basis.
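To make the limitation concrete, here is a minimal sketch of a DTMF-driven menu that plays pre-recorded messages. The digit assignments and file names are hypothetical, and the function only selects a recording; actually playing it would be left to whatever telephony platform is in use.

```python
# Hypothetical static menu: each DTMF digit maps to one pre-recorded file.
MENU = {
    "1": "opening_hours.wav",
    "2": "office_address.wav",
    "0": "transfer_to_operator.wav",
}
FALLBACK = "invalid_option.wav"


def prompt_for_digit(digit: str) -> str:
    """Return the recording to play for the key the caller pressed."""
    return MENU.get(digit, FALLBACK)


if __name__ == "__main__":
    for key in ("1", "7"):
        print(key, "->", prompt_for_digit(key))
```

Every answer has to exist as a recording before the call arrives, which is why this style suits static information and little else.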
[Figure: Typical IVR menu for automated receptionist using TTS]
Two types of speech technologies lend themselves to improving this approach, namely automatic speech recognition (ASR) and text-to-speech (TTS). Both offer a more natural way for callers to interact with and obtain information from an automated system. However, the flexibility and ‘interaction enhancing’ qualities of these technologies come at a price – high licence fees and (relatively) huge computing resource consumption.
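As a rough illustration of what TTS adds, the sketch below builds a message that differs for every caller, something a fixed set of recordings cannot do. The function names are made up for this example, and `say` is a stand-in for whatever TTS call your telephony platform actually provides; here it is simply `print`.

```python
from typing import Callable


def balance_prompt(name: str, balance_cents: int) -> str:
    """Build per-caller text that a TTS engine could speak."""
    dollars, cents = divmod(balance_cents, 100)
    return f"Hello {name}. Your current balance is {dollars} dollars and {cents} cents."


def play_balance(say: Callable[[str], None], name: str, balance_cents: int) -> None:
    # `say` is an assumed placeholder for the platform's real TTS function.
    say(balance_prompt(name, balance_cents))


if __name__ == "__main__":
    play_balance(print, "Alex", 12345)
```

The text is assembled at call time, so the same application can serve any caller and any account value without a new recording.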
Enter cloud computing. In many respects, it is something of a match made in heaven. On the one hand, you have the ‘virtually’ limitless resources of the cloud; on the other, you have the resource-hungry requirements of speech technology. A cloud-based telephony platform brings the two together in a way that can be delivered very cost-effectively on a pay-for-what-you-use basis. Rather than purchasing redundant servers and provisioning for peak call volumes, which means a very expensive investment sitting underutilised for much of the time, users can relax and let all those concerns float by on the cloud.
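The arithmetic behind that argument can be sketched in a few lines. Every figure below is invented purely for illustration (it is not Aculab pricing or anyone else’s), and amounts are held in whole cents to keep the sums exact.

```python
def owned_cost(peak_channels: int, cents_per_channel: int) -> int:
    """Monthly cost when capacity is provisioned for the busiest moment."""
    return peak_channels * cents_per_channel


def usage_cost(minutes_used: int, cents_per_minute: int) -> int:
    """Monthly cost when only the minutes actually used are billed."""
    return minutes_used * cents_per_minute


def utilisation(minutes_used: int, peak_channels: int, days: int = 30) -> float:
    """Fraction of the provisioned channel-minutes that carried calls."""
    return minutes_used / (peak_channels * days * 24 * 60)


if __name__ == "__main__":
    # Hypothetical month: 60 channels sized for peak, 20,000 minutes used.
    print("owned:", owned_cost(60, 5000), "cents")
    print("usage:", usage_cost(20_000, 5), "cents")
    print("utilisation:", round(utilisation(20_000, 60) * 100, 2), "%")
```

With these made-up inputs the provisioned capacity sits idle for more than 99% of the month, which is the underutilisation described above. Real numbers will vary, so treat this as a template for your own comparison rather than a result.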
OK, it’s not that simple: your IT group swaps managing technology for managing a technology provider, for one thing. However, when you think about all those redundant, over-provisioned ASR and TTS servers and licences burning dollars, you can see how the cloud becomes a very attractive proposition. Telephony applications can be written to draw on speech technologies from a pool of cloud-based resources as and when they are needed, and you pay for them only then. Now, tell me that’s a bad thing.
Aculab Cloud supports a wide range of TTS languages (Cepstral and Amazon Polly), with multiple male and female voices for most.
To view more on our TTS capabilities, check out the TTS guide in the documentation area.
Update - April 2020
To complement the extensive TTS coverage, we have just finished integrating Google Speech-to-Text, a powerful natural-language speech recognition engine, into Aculab Cloud. You can use it to implement natural, conversational IVR and to transcribe speech to text in real time during a call. Powered by machine learning technology and neural network models, it currently supports 120 languages and language variants.
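To give a flavour of what a conversational IVR does with a transcript, here is a deliberately crude sketch that routes a call based on the words the caller spoke. It assumes the recognised text has already been handed to the application as a string; the routing table, queue names and keyword matching are all hypothetical and are not how Google Speech-to-Text or Aculab Cloud works internally.

```python
import re

# Hypothetical routing table: destination queue -> trigger keywords.
ROUTES = {
    "billing": {"bill", "invoice", "payment"},
    "support": {"broken", "fault", "help"},
}


def route(transcript: str, default: str = "operator") -> str:
    """Pick a destination from the words in a recognised utterance."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    for destination, keywords in ROUTES.items():
        if words & keywords:  # any keyword present in the utterance
            return destination
    return default


if __name__ == "__main__":
    print(route("I have a question about my invoice"))
    print(route("Good morning"))
```

The caller speaks freely instead of working through a keypad menu; a production system would likely use something more robust than keyword matching to interpret the transcript.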
By choosing best-of-breed TTS and speech recognition (SR) capabilities from Amazon and Google, we can offer developers a comprehensive feature set for developing communications systems.
To view more on our speech recognition capabilities, check out the SR guide in the documentation area.
Product Manager, Aculab Cloud