Cloud-based speech technologies – ASR and TTS
What can cloud telephony enable you to do that previously wasn’t economically viable, for enterprises and SMBs alike?
This post touches on a particular area into which cloud telephony is set to breathe new life. It will focus on the impact a cloud telephony approach can have on the uptake of premium tools/resources, such as speech recognition and synthetic speech, to the benefit of businesses, both large and small.
There is a universal need for SMBs and enterprises to automate certain types of calls, where possible. Doing so means that human resources can focus on the more critical, technical, personal and revenue-generating calls, whilst reducing the time it takes to deliver the information a customer/caller desires.
Despite the proliferation of Web-based help and customer self-service options, it remains absolutely necessary to offer customers a voice channel, for all manner of queries. Such calls can be automated in a variety of ways, using techniques such as DTMF detection or pre-recording standard messages for playback. In truth, however, those methods are suitable only for occasions when short, simple pieces of information need to be conveyed. In addition, it’s best if the information is fairly static, i.e., it isn’t likely to change often or on a caller-by-caller basis.
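To make the limitation concrete, here is a minimal sketch of a DTMF-driven menu that plays pre-recorded messages. It is not Aculab Cloud code: the key assignments, the file names and the functions `handle_digit` and `run_menu` are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical static DTMF menu: each key maps to a fixed, pre-recorded prompt.
# The keys and file names are illustrative assumptions, not a real configuration.
MENU = {
    "1": "opening_hours.wav",
    "2": "office_address.wav",
    "0": "transfer_to_operator",
}


def handle_digit(digit: str) -> str:
    """Return the action for a pressed key; unknown keys replay the menu."""
    return MENU.get(digit, "replay_menu.wav")


def run_menu(digits):
    """Process key presses in order and stop once the caller is transferred."""
    actions = []
    for digit in digits:
        action = handle_digit(digit)
        actions.append(action)
        if action == "transfer_to_operator":
            break
    return actions
```

Every response here is fixed in advance, so anything that varies from caller to caller (an account balance, say) cannot be expressed without recording a new message.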
- Typical IVR menu for automated receptionist using TTS
Two types of speech technologies lend themselves to improving this approach, namely automatic speech recognition (ASR) and text-to-speech (TTS). Both offer a more natural way for callers to interact with and obtain information from an automated system. However, the flexibility and ‘interaction enhancing’ qualities of these technologies come at a price – high licence fees and (relatively) huge computing resource consumption.
Enter cloud computing. In many respects it is something of a match made in heaven. On the one hand, you have the ‘virtually’ limitless resource of the cloud and on the other, you have the resource-hungry requirements of the speech technology. A cloud-based telephony platform brings the two together in a way that can be delivered very cost-effectively on a pay-for-what-you-use basis. Rather than having to purchase redundant servers and provision for peak call volumes, which means a very expensive investment sitting underutilised for much of the time, users can relax and let all those concerns float by on the cloud.
OK, it’s not that simple: your IT group swaps managing technology for managing a technology provider, and so on. However, when you think about all those redundant, over-provisioned ASR and TTS servers/licences burning dollars, you can see how the cloud becomes a very attractive proposition. Telephony applications can be written to access and use speech technologies from within a pool of cloud-based resources, as and when needed, and you pay for them only when they are used. Now, tell me that’s a bad thing.
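The arithmetic behind that argument can be sketched in a few lines. The function names, the 30-day month and every figure used with them are assumptions made up for illustration; none of them are real prices or Aculab tariffs.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month


def monthly_costs(peak_channels, cost_per_channel_month, utilisation, price_per_minute):
    """Compare owning capacity sized for peak with paying per minute used.

    utilisation is the fraction (0..1) of owned channel-minutes actually used.
    Returns (owned_cost, usage_based_cost) for one month.
    """
    used_minutes = peak_channels * MINUTES_PER_MONTH * utilisation
    owned_cost = peak_channels * cost_per_channel_month  # paid whether used or not
    usage_based_cost = used_minutes * price_per_minute   # paid only for minutes used
    return owned_cost, usage_based_cost


def break_even_utilisation(cost_per_channel_month, price_per_minute):
    """Utilisation above which owning the capacity becomes the cheaper option."""
    return cost_per_channel_month / (MINUTES_PER_MONTH * price_per_minute)
```

With made-up inputs of 10 channels at 100 per channel per month, 5% utilisation and 0.02 per minute, the owned deployment costs 1000 a month against roughly 432 on a usage basis. The lower the utilisation of capacity provisioned for peak, the wider that gap becomes.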
Aculab Cloud supports a wide range of TTS languages, with multiple male and female voices for most.
To view more on our TTS capabilities, check out the TTS guide in the documentation area.
When it comes to speech capabilities, Aculab has a lengthy history of development going back to the days when telecom resources were delivered on boards crammed with DSPs or using embedded software. Our latest initiative is to bring speech recognition (ASR) to our cloud telecom platform, to benefit customers who want to enhance their call centre or IVR system with a cost-effective speech recognition capability. If you would like to learn more about our ASR capabilities, have a look at this post, Cloud-based-speech-recognition.
Product Manager, Aculab Cloud