Text-To-Speech (TTS)

Aculab Cloud supports Amazon Polly and Cepstral Text To Speech (TTS) engines.

Selecting a voice in the REST API

In the REST API Play action, the text_to_say property supports Speech Synthesis Markup Language (SSML), allowing you to change the way your text is spoken. However, SSML cannot be used to select the voice that TTS uses to say your text. By default, the voice configured in your service is used; to choose a different voice, set tts_voice to a Selector from the voice tables below. For example, to use English US Female Polly Kimberly, use the following setting for tts_voice:

"tts_voice" : "English US Female Polly Kimberly"

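As an illustration, the Python sketch below builds a dictionary holding the two Play properties named in this section, text_to_say and tts_voice, and prints it as JSON. Only those two property names come from this page. The helper make_play_properties and the KNOWN_SELECTORS check are hypothetical, and the rest of the Play action is omitted because its structure is not described here.

```python
import json

# A few Selector values copied from the voice tables on this page.
# Hypothetical subset: extend it with the selectors your application uses.
KNOWN_SELECTORS = {
    "English US Female Polly Kimberly",
    "English US Female Polly Kimberly Neural",
    "English UK Female Polly Amy",
    "English US Female Cepstral Callie",
}


def make_play_properties(text_to_say, tts_voice=None):
    """Return the text_to_say / tts_voice properties for a REST API Play action.

    Hypothetical helper: the enclosing action format is not shown here.
    If tts_voice is omitted, the voice configured for your service is used.
    """
    props = {"text_to_say": text_to_say}
    if tts_voice is not None:
        if tts_voice not in KNOWN_SELECTORS:
            raise ValueError(f"unknown TTS voice selector: {tts_voice!r}")
        props["tts_voice"] = tts_voice
    return props


print(json.dumps(make_play_properties("I have something to say.",
                                      "English US Female Polly Kimberly")))
# prints: {"text_to_say": "I have something to say.", "tts_voice": "English US Female Polly Kimberly"}
```

Validating the selector locally catches typos before a request is sent; the full Selector string must be passed unchanged.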
Selecting a voice in the UAS API

In the UAS API, the Say methods support Speech Synthesis Markup Language (SSML), allowing you to change the way your text is spoken, for example by choosing a voice with the voice tag. You can also choose the TTS engine with the optional acu-engine tag which, if provided, must be the outermost tag in the string. If you don't provide these tags, your account's default TTS voice is used. For example, to use English US Female Polly Kimberly, use the following SSML:

channel.FilePlayer.Say("<acu-engine name='Polly'><voice name='Kimberly'>I have something to say.</voice></acu-engine>");
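If you build SSML strings in code, a small helper can keep the acu-engine tag outermost as required. A minimal Python sketch; build_say_ssml is a hypothetical name. It escapes only the attribute values and inserts body unchanged, so body may contain other SSML tags, but any reserved characters in plain text must already have been replaced.

```python
from xml.sax.saxutils import escape


def _attr(value):
    # Escape a value for use inside a single-quoted attribute.
    return escape(value, {"'": "&apos;"})


def build_say_ssml(body, voice=None, engine=None):
    """Wrap SSML `body` in optional <voice> and <acu-engine> tags.

    acu-engine, if given, is always placed outermost.
    """
    ssml = body
    if voice is not None:
        ssml = f"<voice name='{_attr(voice)}'>{ssml}</voice>"
    if engine is not None:
        ssml = f"<acu-engine name='{_attr(engine)}'>{ssml}</acu-engine>"
    return ssml


ssml = build_say_ssml("I have something to say.", voice="Kimberly", engine="Polly")
# ssml now matches the example string above and could be passed to
# channel.FilePlayer.Say(ssml) in a UAS application.
```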

Using a Polly voice

Note: The preset default for your account will usually be a Standard Polly voice.

We support both Standard and Neural Polly voices. Standard voices synthesize lifelike natural speech that is suitable for many applications. Neural voices are enhanced through the use of deep learning technologies to deliver even more natural sounding speech. Pricing information for Standard and Neural voices is available on the Pricing page of your Cloud Console.

Polly's website has a demo which allows you to select a voice and immediately hear how different text will sound - see Polly demos.

Polly TTS supports a subset of SSML, which can optionally be embedded within the text you supply to the say function. For a summary of the SSML tags which may be used, see Common SSML tags below. For more detailed information, go to the W3C SSML 1.1 recommendation.

We support the following Polly voices:

  • Standard voices (Name, then Selector):
    Zeina Arabic Arabic Female Polly Zeina
    Naja Danish Denmark Female Polly Naja
    Mads Danish Denmark Male Polly Mads
    Lotte Dutch Netherlands Female Polly Lotte
    Ruben Dutch Netherlands Male Polly Ruben
    Nicole English Australia Female Polly Nicole
    Russell English Australia Male Polly Russell
    Aditi English India Female Polly Aditi
    Raveena English India Female Polly Raveena
    Amy English UK Female Polly Amy
    Emma English UK Female Polly Emma
    Brian English UK Male Polly Brian
    Ivy English US Female Polly Ivy
    Joanna English US Female Polly Joanna
    Kendra English US Female Polly Kendra
    Kimberly English US Female Polly Kimberly
    Salli English US Female Polly Salli
    Joey English US Male Polly Joey
    Justin English US Male Polly Justin
    Matthew English US Male Polly Matthew
    Geraint English Wales Male Polly Geraint
    Chantal French Canada Female Polly Chantal
    Celine French France Female Polly Celine
    Lea French France Female Polly Lea
    Mathieu French France Male Polly Mathieu
    Marlene German Germany Female Polly Marlene
    Vicki German Germany Female Polly Vicki
    Hans German Germany Male Polly Hans
    Aditi Hindi India Female Polly Aditi
    Dora Icelandic Iceland Female Polly Dora
    Karl Icelandic Iceland Male Polly Karl
    Bianca Italian Italy Female Polly Bianca
    Carla Italian Italy Female Polly Carla
    Giorgio Italian Italy Male Polly Giorgio
    Mizuki Japanese Japan Female Polly Mizuki
    Takumi Japanese Japan Male Polly Takumi
    Seoyeon Korean Korea Female Polly Seoyeon
    Zhiyu Mandarin China Female Polly Zhiyu
    Liv Norwegian Norway Female Polly Liv
    Ewa Polish Poland Female Polly Ewa
    Maja Polish Poland Female Polly Maja
    Jacek Polish Poland Male Polly Jacek
    Jan Polish Poland Male Polly Jan
    Camila Portuguese Brazil Female Polly Camila
    Vitoria Portuguese Brazil Female Polly Vitoria
    Ricardo Portuguese Brazil Male Polly Ricardo
    Ines Portuguese Portugal Female Polly Ines
    Cristiano Portuguese Portugal Male Polly Cristiano
    Carmen Romanian Romania Female Polly Carmen
    Tatyana Russian Russia Female Polly Tatyana
    Maxim Russian Russia Male Polly Maxim
    Conchita Spanish Castile Female Polly Conchita
    Lucia Spanish Castile Female Polly Lucia
    Enrique Spanish Castile Male Polly Enrique
    Mia Spanish Mexico Female Polly Mia
    Lupe Spanish US Female Polly Lupe
    Penelope Spanish US Female Polly Penelope
    Miguel Spanish US Male Polly Miguel
    Astrid Swedish Sweden Female Polly Astrid
    Filiz Turkish Turkey Female Polly Filiz
    Gwyneth Welsh UK Female Polly Gwyneth
  • Neural voices (Name, then Selector):
    Hala Arabic United Arab Emirates Female Polly Hala Neural
    Zayd Arabic United Arab Emirates Male Polly Zayd Neural
    Hiujin Cantonese China Female Polly Hiujin Neural
    Arlet Catalan Castile Female Polly Arlet Neural
    Sofie Danish Denmark Female Polly Sofie Neural
    Lisa Dutch Belgium Female Polly Lisa Neural
    Laura Dutch Netherlands Female Polly Laura Neural
    Olivia English Australia Female Polly Olivia Neural
    Kajal English India Female Polly Kajal Neural
    Niamh English Ireland Female Polly Niamh Neural
    Aria English New Zealand Female Polly Aria Neural
    Ayanda English South Africa Female Polly Ayanda Neural
    Amy English UK Female Polly Amy Neural
    Emma English UK Female Polly Emma Neural
    Arthur English UK Male Polly Arthur Neural
    Brian English UK Male Polly Brian Neural
    Danielle English US Female Polly Danielle Neural
    Ivy English US Female Polly Ivy Neural
    Joanna English US Female Polly Joanna Neural
    Kendra English US Female Polly Kendra Neural
    Kimberly English US Female Polly Kimberly Neural
    Ruth English US Female Polly Ruth Neural
    Salli English US Female Polly Salli Neural
    Gregory English US Male Polly Gregory Neural
    Joey English US Male Polly Joey Neural
    Justin English US Male Polly Justin Neural
    Kevin English US Male Polly Kevin Neural
    Matthew English US Male Polly Matthew Neural
    Stephen English US Male Polly Stephen Neural
    Suvi Finnish Finland Female Polly Suvi Neural
    Isabelle French Belgium Female Polly Isabelle Neural
    Gabrielle French Canada Female Polly Gabrielle Neural
    Liam French Canada Male Polly Liam Neural
    Lea French France Female Polly Lea Neural
    Remi French France Male Polly Remi Neural
    Hannah German Austria Female Polly Hannah Neural
    Vicki German Germany Female Polly Vicki Neural
    Daniel German Germany Male Polly Daniel Neural
    Kajal Hindi India Female Polly Kajal Neural
    Bianca Italian Italy Female Polly Bianca Neural
    Adriano Italian Italy Male Polly Adriano Neural
    Kazuha Japanese Japan Female Polly Kazuha Neural
    Tomoko Japanese Japan Female Polly Tomoko Neural
    Takumi Japanese Japan Male Polly Takumi Neural
    Seoyeon Korean Korea Female Polly Seoyeon Neural
    Zhiyu Mandarin China Female Polly Zhiyu Neural
    Ida Norwegian Norway Female Polly Ida Neural
    Ola Polish Poland Female Polly Ola Neural
    Camila Portuguese Brazil Female Polly Camila Neural
    Vitoria Portuguese Brazil Female Polly Vitoria Neural
    Thiago Portuguese Brazil Male Polly Thiago Neural
    Ines Portuguese Portugal Female Polly Ines Neural
    Lucia Spanish Castile Female Polly Lucia Neural
    Sergio Spanish Castile Male Polly Sergio Neural
    Mia Spanish Mexico Female Polly Mia Neural
    Andres Spanish Mexico Male Polly Andres Neural
    Lupe Spanish US Female Polly Lupe Neural
    Pedro Spanish US Male Polly Pedro Neural
    Elin Swedish Sweden Female Polly Elin Neural
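The Selector strings in the tables above appear to follow a fixed word order: language, region, gender, engine, voice name and, for neural voices, a trailing Neural. Assuming that pattern holds, a small parser such as the hypothetical parse_selector below can help you filter the tables, for example to list all UK English voices. It is a convenience sketch, not part of the Aculab APIs; always pass the full Selector string unchanged.

```python
from typing import NamedTuple


class VoiceSelector(NamedTuple):
    language: str
    region: str
    gender: str
    engine: str
    name: str
    neural: bool


def parse_selector(selector):
    """Split a Selector string into its parts.

    Assumes the pattern observed in the tables on this page:
    <Language> <Region...> <Female|Male> <Engine> <Name> [Neural]
    """
    words = selector.split()
    neural = words[-1] == "Neural"
    if neural:
        words = words[:-1]
    if len(words) < 5 or words[-3] not in ("Female", "Male"):
        raise ValueError(f"unrecognised selector: {selector!r}")
    return VoiceSelector(
        language=words[0],
        region=" ".join(words[1:-3]),  # region may be several words
        gender=words[-3],
        engine=words[-2],
        name=words[-1],
        neural=neural,
    )
```

For example, `[s for s in selectors if parse_selector(s).region == "UK"]` would keep only the UK voices from a list of selector strings.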

Using a Cepstral voice

Cepstral's website has a demo which allows you to select a voice and immediately hear how different text will sound - see Cepstral demos.

Cepstral TTS supports a subset of the Speech Synthesis Markup Language (SSML), which can optionally be embedded within the text you supply to the say function. For a summary of the SSML tags which may be used, see Common SSML tags below. For more detailed information, go to the Cepstral SSML FAQ and scroll down to the 'Common Usage Examples' section. When using that page, please bear in mind the limitations listed below.

We support the following Cepstral voices:

Name                   Selector
Callie-8kHz (default)  English US Female Cepstral Callie
Marta-8kHz             Spanish US Female Cepstral Marta
Vittoria               Italian Italy Female Cepstral Vittoria

We don't support:

  • Inserting recorded audio files (our APIs' play functions already allow file replay)
  • Applying Cepstral special effects
  • Inserting bookmarks

Reserved characters

Some characters are reserved. If the text you need to say contains any of them, replace them as shown:

Reserved character   Replace with
<                    &lt;
>                    &gt;
&                    &amp;
|
^

For example, "Bill & Ben played in the garden" would become "Bill &amp; Ben played in the garden".
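Python's standard library can perform the three replacements listed in the table. A minimal sketch; escape_tts_text is a hypothetical name. Because no replacement for | or ^ is given in this section, the sketch rejects text containing them rather than guessing.

```python
from xml.sax.saxutils import escape

UNHANDLED_RESERVED = "|^"  # reserved, but no replacement is given on this page


def escape_tts_text(text):
    """Replace the reserved characters &, < and > in plain text to be spoken.

    escape() replaces & before < and >, so the entities it inserts are
    not escaped a second time.
    """
    bad = sorted(set(text) & set(UNHANDLED_RESERVED))
    if bad:
        raise ValueError(f"reserved characters with no known replacement: {bad}")
    return escape(text)


print(escape_tts_text("Bill & Ben played in the garden"))
# prints: Bill &amp; Ben played in the garden
```

Apply this only to plain text; running it over SSML would escape the tags themselves.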

Text length

The maximum length of the text to be converted is 1500 characters. The longer the text, the longer the audio takes to generate; unless the text is a repeated phrase (which may therefore be cached), this means a longer delay before the audio is played.
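One way to stay within the 1500-character limit is to split longer text before conversion and say each part in turn. A minimal sketch, assuming plain text without SSML markup (splitting on whitespace could cut a tag in half) and assuming the limit counts the characters of the string as supplied; split_for_tts is a hypothetical name.

```python
MAX_TTS_CHARS = 1500  # maximum text length stated on this page


def split_for_tts(text, limit=MAX_TTS_CHARS):
    """Split plain text into chunks of at most `limit` characters.

    Breaks only at whitespace (runs of whitespace become single spaces),
    so it must not be used on text containing SSML tags.
    """
    chunks = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word is longer than the limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Shorter chunks also start playing sooner, which helps with the generation delay described above.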

Common SSML tags

Polly and Cepstral both support a subset of SSML. Common tags are described below. Because support differs between engines, we highly recommend testing your application before deploying it with a different TTS engine.

Tag / Description
break

Inserts a break or pause in the speech.

Optional arguments are time and strength.

time sets an absolute value for the pause. For example, <break time="3s"/> and <break time="3ms"/> set the break time to three seconds and three milliseconds respectively. A break may be up to 10 seconds long.

strength sets the relative length of the pause. The options are none, x-weak, weak, medium, strong and x-strong.

Examples:

This is a <break /> sentence break.
This is a <break time="2s"/> two second break.
This is a dramatic <break strength="x-strong"/> break.
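The break options above can be wrapped in a small helper that rejects values outside the documented ranges. A minimal sketch; break_tag is a hypothetical name, times are given as whole milliseconds, and, as a simplification, only one of time or strength may be set per call.

```python
STRENGTHS = ("none", "x-weak", "weak", "medium", "strong", "x-strong")
MAX_BREAK_MS = 10_000  # a break may be up to 10 seconds long


def break_tag(time_ms=None, strength=None):
    """Build an SSML <break/> tag from a time in ms or a strength."""
    if time_ms is not None and strength is not None:
        raise ValueError("give time_ms or strength, not both")
    if time_ms is not None:
        if not 0 <= time_ms <= MAX_BREAK_MS:
            raise ValueError("break time must be between 0 and 10 seconds")
        # Use whole seconds where possible, e.g. 2000 -> "2s", else milliseconds.
        value = f"{time_ms // 1000}s" if time_ms % 1000 == 0 else f"{time_ms}ms"
        return f'<break time="{value}"/>'
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {strength!r}")
        return f'<break strength="{strength}"/>'
    return "<break/>"


print(f"This is a {break_tag(time_ms=2000)} two second break.")
```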
voice

Changes the voice used. The name parameter is required and specifies the voice to use. The supported voices for each TTS engine are listed above.

Note: This SSML tag is supported in the UAS API only. For the REST API, use the tts_voice setting instead.
Note: Polly does not support using more than one voice in a request. The first voice tag sets the voice used for all the text.

Examples:

<acu-engine name='Polly'><voice name='Amy'>I'm using Amy instead of the default voice.</voice></acu-engine>
                
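Because Polly honours only the first voice tag in a request, it can be worth checking SSML before sending it. A minimal sketch using a regular expression; it assumes name is the first attribute of each voice tag, and polly_effective_voice is a hypothetical name.

```python
import re
import warnings

# Captures the name attribute of <voice name='...'> or <voice name="...">.
_VOICE_TAG = re.compile(r"""<voice\s+name=(['"])(.*?)\1""")


def polly_effective_voice(ssml):
    """Return the voice Polly would use for `ssml`, or None if there is no voice tag.

    Later voice tags are reported with a warning, since Polly ignores them.
    """
    names = [m.group(2) for m in _VOICE_TAG.finditer(ssml)]
    if len(names) > 1:
        warnings.warn(f"Polly uses only the first voice tag; ignoring {names[1:]}")
    return names[0] if names else None
```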
prosody

Allows the user to change the pitch, speed and volume of a segment of speech.

Common optional parameters are: pitch, rate and volume.

pitch sets the pitch of speech. Options are: x-low, low, medium, high, x-high, a relative change (measured in Hz), e.g. +50Hz, or a percentage change, e.g. +50%.

rate sets the rate of speech. Options are: x-slow, slow, medium, fast, x-fast, or a percentage change, e.g. +50%.

volume sets the volume of speech. Options are: silent, x-soft, soft, medium, loud, x-loud, a relative change (measured in decibels), e.g. +6dB, or a percentage change, e.g. +50%.

Examples:

<prosody rate="x-fast">I'm using a very fast rate.</prosody>
This is normal volume. <prosody volume="soft">This is a soft volume.</prosody>
I can talk very <prosody rate="slow" pitch="low">deeply and slowly.</prosody>
Today's date is the <prosody rate="-50%">15th April, 2012.</prosody>
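A minimal sketch of a helper that assembles a prosody tag from keyword arguments. Keyword values are checked against the option lists in this section; relative values such as -50% are passed through unchecked, on the assumption that the TTS engine validates them. The helper name prosody is hypothetical, and body is inserted unescaped so that it may contain other SSML.

```python
from xml.sax.saxutils import quoteattr

KEYWORDS = {
    "pitch": {"x-low", "low", "medium", "high", "x-high"},
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast"},
    "volume": {"silent", "x-soft", "soft", "medium", "loud", "x-loud"},
}


def prosody(body, **attrs):
    """Wrap `body` in a <prosody> tag with the given pitch/rate/volume values."""
    if not attrs:
        raise ValueError("give at least one of pitch, rate or volume")
    parts = []
    for name, value in attrs.items():
        if name not in KEYWORDS:
            raise ValueError(f"unsupported prosody attribute: {name}")
        is_relative = value[:1] in ("+", "-") or value[:1].isdigit()
        if not is_relative and value not in KEYWORDS[name]:
            raise ValueError(f"unknown {name} value: {value!r}")
        parts.append(f"{name}={quoteattr(value)}")
    return f"<prosody {' '.join(parts)}>{body}</prosody>"


print("I can talk very " + prosody("deeply and slowly.", rate="slow", pitch="low"))
```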
emphasis

Reads the enclosed text with emphasis.

Required parameter: level. Options are: reduced, moderate and strong.

Examples:

This is a <emphasis level="strong">level of emphasis</emphasis>, which can be used to highlight important information.

Charging

Our TTS is charged per conversion, at a per-minute rate with 15-second granularity. For example:

  • A play action that plays for 12 seconds will be charged for 15 seconds.
  • A get input action that plays a 5-second prompt, then "I'm sorry I didn't catch what you said" (6 seconds), then the 5-second prompt again, will be charged for 30 seconds (5 + 6 + 5 = 16 seconds, rounded up to two 15-second periods).
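The rounding in these examples can be expressed as a short function. A sketch, assuming, as the worked examples do, that the durations played within a single action are added together before rounding up to whole 15-second periods; charged_seconds is a hypothetical name, and the result is chargeable time, not a price.

```python
import math

GRANULARITY_S = 15  # charging granularity stated on this page


def charged_seconds(durations_s):
    """Return the chargeable TTS time, in seconds, for one action."""
    total = sum(durations_s)
    return math.ceil(total / GRANULARITY_S) * GRANULARITY_S


print(charged_seconds([12]))       # play action: 12 s -> 15
print(charged_seconds([5, 6, 5]))  # get input action: 16 s -> 30
```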

You can obtain detailed charge information for a specific call using the Application Status web service. You can obtain detailed charge information for calls over a period of time using the Managing Reports web services.