Talk To Us!

Multi-lingual speech recognition now supported on Aculab Cloud

We’re listening – what would you like to do today?

A generation of people has grown up trying to avoid ringing a contact centre – not because they didn’t like talking to the cheery people who work in such places, but because they first had to get past the IVR system put in place to direct the call. Press 1 for support, 2 for sales, 3 if you know the extension of the person you wish to speak to… and so on, and so on. Callers quickly realised that many of these systems would let them bypass the IVR menu and get to a real person by pressing ‘0’.

So, the IVR system was often bypassed and everyone became disillusioned with the experience, including the IT manager who had just managed to persuade the company to invest in the system.


Scroll forward 20 or more years to the present day, and the experience you get when you ring a contact centre is more like this:

“Hello, how can we help you today? Please explain briefly why you are calling”

And more often than not, the caller is more than happy to talk to this automated interface.


As you are likely aware, the market for text-to-speech and speech recognition has exploded in recent years. The engineering efforts of the giants of the tech world (for example Apple with Siri, Google with Google Home, and Amazon with Alexa) have rapidly increased the sophistication, accuracy, and ease of use of such systems. Many of us now have our own personal voice recognition system at home or in the car, and consumer acceptance of such systems has increased substantially. This in turn is driving greater use of voice-driven systems in enterprise application areas such as the contact centre. For example, both Alexa and Amazon Connect (the AWS contact centre offering) have dialogs driven by Amazon Lex, with Transcribe and Polly under the hood to convert between speech and text.

Our TTS and speech recognition partners

When we decided to offer our media processing capabilities as a cloud-based service (CPaaS), we scoured the market for partners to help us offer the best possible service to developers wishing to build their own communications applications. We did not limit our choices to a single favoured supplier – each one had to be a best-of-breed partner. We chose to host our service on AWS infrastructure, with separate clouds for the US and Europe so that customers can keep their data where they wish, and we chose voice carrier and SMS messaging partners to give us the highest quality worldwide connectivity options for calls and messaging.

As the market progressed, we evolved the Aculab Cloud platform to keep up with these developments – the first step in that evolution was the integration of HIPAA-compliant text-to-speech (TTS) voices from Amazon Polly.

TTS support is a key feature for systems that send outbound voice messages such as appointment reminders. Rather than recording and storing each voice message before sending it to customers, you can use TTS to deliver clear, natural-sounding, bespoke voice messages in multiple languages.

To complement the multi-lingual TTS support, we needed a speech recognition system – and for that we again sought a best-of-breed partner, choosing Google Speech Recognition, one of the Google Cloud AI building blocks.

Speech recognition in 120 languages

If you want to localise your communications system for a new region, then it's likely we can support you with that requirement.

The Aculab Cloud speech recognition feature enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. More than 120 languages and variants can be recognised.

Since our platform is predominantly for conversational communications over the phone, we have focused on that use case: speech recognition is integrated cleanly into our REST API v2, making it easy to implement speech-driven conversational dialogs. We've also provided real-time transcription of calls, which lets you augment the agent's screen based on what is said on the call, or perform sentiment analysis to show the contact centre manager where the hot spots are. In addition, our API allows for voice-command interrupts. These could be used, for example, in a voicemail system where a voice command such as ‘repeat’ spoken during message playback prompts the system to replay the message from the beginning, while other words are ignored. And of course you can feed our call recordings to a speech recogniser for offline transcription, so that calls can be searched later.
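To make the voice-command interrupt idea concrete, here is a minimal sketch of the matching logic in Python. It is only an illustration, not the Aculab Cloud REST API: the command table, the action name and the function `first_command` are hypothetical, and the recognised words are assumed to arrive as plain strings from whatever speech recogniser you use.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical command table: spoken word -> action name.
# Neither the word list nor the action name comes from the Aculab API.
COMMANDS = {"repeat": "replay_from_start"}


def first_command(
    recognised_words: Iterable[str],
    commands: Mapping[str, str] = COMMANDS,
) -> Optional[str]:
    """Return the action for the first recognised command word, if any.

    Words that are not in the command table are ignored, so ordinary
    speech during playback does not interrupt the message.
    """
    for word in recognised_words:
        action = commands.get(word.strip().lower())
        if action is not None:
            return action
    return None


# A caller says "um, please repeat" while the voicemail is playing.
action = first_command(["um", "please", "Repeat"])
if action == "replay_from_start":
    print("Restarting message playback")
```

In a real application, the returned action would decide whether playback is interrupted and restarted or simply left to continue.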

Conversational dialogs on Aculab Cloud

Armed with our high-quality TTS and natural language speech recognition, you can now use conversational dialog services such as Amazon Lex and Google Dialogflow to drive your call flows on Aculab Cloud. As well as providing a high-quality customer experience, this means you can use these same services across other channels such as chat and messaging, providing a consistent user experience.

Further information about the feature can be found in our documentation area.