Text-To-Speech (TTS)

Aculab Cloud supports Amazon Polly and Cepstral Text To Speech (TTS) engines.

Selecting a voice in the REST API
Selecting a voice in the UAS API
Using a Polly voice
Using a Cepstral voice
Reserved characters
Text length
Common SSML tags
Charging

Selecting a voice in the REST API

In the REST API Play action, the text_to_say property supports Speech Synthesis Markup Language (SSML), allowing you to change the way your text is spoken. However, SSML cannot be used to select the voice that TTS uses to say your text. The voice defaults to the one configured in your service. You can choose a different voice by setting tts_voice to a Selector from the voice tables below. For example, to select English US Female Polly Kimberly, use the following setting for tts_voice:

"tts_voice" : "English US Female Polly Kimberly"

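As a rough illustration, the two properties named above can be assembled into a JSON object in Python. This is only a sketch: the helper name play_item is hypothetical, and the surrounding layout of the Play action is not shown here, so only text_to_say and tts_voice are taken from this page.

```python
import json


def play_item(text, tts_voice=None):
    """Build one text-to-speech item (hypothetical helper, not part of the API)."""
    item = {"text_to_say": text}
    if tts_voice is not None:
        # Leaving tts_voice out keeps the voice configured in the service.
        item["tts_voice"] = tts_voice
    return item


item = play_item("I have something to say.", "English US Female Polly Kimberly")
print(json.dumps(item, indent=2))
```

Omitting the second argument produces an item without tts_voice, which leaves the service's configured voice in effect.
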
Selecting a voice in the UAS API

In the UAS API, the Say methods support Speech Synthesis Markup Language (SSML), allowing you to change the way your text is spoken, for example by selecting a voice with the voice tag. You can also choose the TTS engine via the optional acu-engine tag which, if provided, must be the outermost tag in the string. If you don't provide these tags, your account's default TTS voice will be used. For example, to select English US Female Polly Kimberly, use the following SSML:

channel.FilePlayer.Say("<acu-engine name='Polly'><voice name='Kimberly'>I have something to say.</voice></acu-engine>");
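
If the SSML string is assembled programmatically, a small helper can keep acu-engine outermost and voice inside it. The sketch below is plain Python string handling and does not call the UAS API; the function name build_say_ssml is hypothetical, and it assumes the text is plain text (any markup in it is escaped rather than interpreted).

```python
from xml.sax.saxutils import escape, quoteattr


def build_say_ssml(text, voice=None, engine=None):
    """Wrap plain text in optional voice and acu-engine tags (hypothetical helper)."""
    ssml = escape(text)  # escapes &, < and > in the plain text
    if voice is not None:
        ssml = f"<voice name={quoteattr(voice)}>{ssml}</voice>"
    if engine is not None:
        # acu-engine, when present, has to be the outermost tag.
        ssml = f"<acu-engine name={quoteattr(engine)}>{ssml}</acu-engine>"
    return ssml


print(build_say_ssml("I have something to say.", voice="Kimberly", engine="Polly"))
```

The resulting string uses double-quoted attribute values, which XML treats the same as the single-quoted form shown above, and can then be passed to a Say method.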

Using a Polly voice

The preset default for your account will usually be a Standard Polly voice.

We support both Standard and Neural Polly voices. Standard voices synthesize lifelike natural speech that is suitable for many applications. Neural voices are enhanced through the use of deep learning technologies to deliver even more natural sounding speech. Pricing information for Standard and Neural voices is available on the Pricing page of your Cloud Console.

Polly's website has a demo which allows you to select a voice and immediately hear how different text will sound - see Polly demos.

Polly TTS supports a subset of SSML, which can optionally be embedded within the text you supply to the say function. For a summary of the SSML tags which may be used, see Common SSML tags below. For more detailed information, go to the W3C SSML 1.1 recommendation.

We support the following Polly voices:

Standard

Name	Selector
Zeina	Arabic Arabic Female Polly Zeina
Naja	Danish Denmark Female Polly Naja
Mads	Danish Denmark Male Polly Mads
Lotte	Dutch Netherlands Female Polly Lotte
Ruben	Dutch Netherlands Male Polly Ruben
Nicole	English Australia Female Polly Nicole
Russell	English Australia Male Polly Russell
Aditi	English India Female Polly Aditi
Raveena	English India Female Polly Raveena
Amy	English UK Female Polly Amy
Emma	English UK Female Polly Emma
Brian	English UK Male Polly Brian
Ivy	English US Female Polly Ivy
Joanna	English US Female Polly Joanna
Kendra	English US Female Polly Kendra
Kimberly	English US Female Polly Kimberly
Salli	English US Female Polly Salli
Joey	English US Male Polly Joey
Justin	English US Male Polly Justin
Matthew	English US Male Polly Matthew
Geraint	English Wales Male Polly Geraint
Chantal	French Canada Female Polly Chantal
Celine	French France Female Polly Celine
Lea	French France Female Polly Lea
Mathieu	French France Male Polly Mathieu
Marlene	German Germany Female Polly Marlene
Vicki	German Germany Female Polly Vicki
Hans	German Germany Male Polly Hans
Aditi	Hindi India Female Polly Aditi
Dora	Icelandic Iceland Female Polly Dora
Karl	Icelandic Iceland Male Polly Karl
Bianca	Italian Italy Female Polly Bianca
Carla	Italian Italy Female Polly Carla
Giorgio	Italian Italy Male Polly Giorgio
Mizuki	Japanese Japan Female Polly Mizuki
Takumi	Japanese Japan Male Polly Takumi
Seoyeon	Korean Korea Female Polly Seoyeon
Zhiyu	Mandarin China Female Polly Zhiyu
Liv	Norwegian Norway Female Polly Liv
Ewa	Polish Poland Female Polly Ewa
Maja	Polish Poland Female Polly Maja
Jacek	Polish Poland Male Polly Jacek
Jan	Polish Poland Male Polly Jan
Camila	Portuguese Brazil Female Polly Camila
Vitoria	Portuguese Brazil Female Polly Vitoria
Ricardo	Portuguese Brazil Male Polly Ricardo
Ines	Portuguese Portugal Female Polly Ines
Cristiano	Portuguese Portugal Male Polly Cristiano
Carmen	Romanian Romania Female Polly Carmen
Tatyana	Russian Russia Female Polly Tatyana
Maxim	Russian Russia Male Polly Maxim
Conchita	Spanish Castile Female Polly Conchita
Lucia	Spanish Castile Female Polly Lucia
Enrique	Spanish Castile Male Polly Enrique
Mia	Spanish Mexico Female Polly Mia
Lupe	Spanish US Female Polly Lupe
Penelope	Spanish US Female Polly Penelope
Miguel	Spanish US Male Polly Miguel
Astrid	Swedish Sweden Female Polly Astrid
Filiz	Turkish Turkey Female Polly Filiz
Gwyneth	Welsh UK Female Polly Gwyneth

Neural

Name	Selector
Hala	Arabic Arabic Female Polly Hala Neural
Zayd	Arabic Arabic Male Polly Zayd Neural
Hala	Arabic United Arab Emirates Female Polly Hala Neural
Zayd	Arabic United Arab Emirates Male Polly Zayd Neural
Hiujin	Cantonese China Female Polly Hiujin Neural
Arlet	Catalan Castile Female Polly Arlet Neural
Sofie	Danish Denmark Female Polly Sofie Neural
Lisa	Dutch Belgium Female Polly Lisa Neural
Laura	Dutch Netherlands Female Polly Laura Neural
Olivia	English Australia Female Polly Olivia Neural
Kajal	English India Female Polly Kajal Neural
Niamh	English Ireland Female Polly Niamh Neural
Aria	English New Zealand Female Polly Aria Neural
Ayanda	English South Africa Female Polly Ayanda Neural
Amy	English UK Female Polly Amy Neural
Emma	English UK Female Polly Emma Neural
Arthur	English UK Male Polly Arthur Neural
Brian	English UK Male Polly Brian Neural
Danielle	English US Female Polly Danielle Neural
Ivy	English US Female Polly Ivy Neural
Joanna	English US Female Polly Joanna Neural
Kendra	English US Female Polly Kendra Neural
Kimberly	English US Female Polly Kimberly Neural
Ruth	English US Female Polly Ruth Neural
Salli	English US Female Polly Salli Neural
Gregory	English US Male Polly Gregory Neural
Joey	English US Male Polly Joey Neural
Justin	English US Male Polly Justin Neural
Kevin	English US Male Polly Kevin Neural
Matthew	English US Male Polly Matthew Neural
Stephen	English US Male Polly Stephen Neural
Suvi	Finnish Finland Female Polly Suvi Neural
Isabelle	French Belgium Female Polly Isabelle Neural
Gabrielle	French Canada Female Polly Gabrielle Neural
Liam	French Canada Male Polly Liam Neural
Lea	French France Female Polly Lea Neural
Remi	French France Male Polly Remi Neural
Hannah	German Austria Female Polly Hannah Neural
Vicki	German Germany Female Polly Vicki Neural
Daniel	German Germany Male Polly Daniel Neural
Kajal	Hindi India Female Polly Kajal Neural
Bianca	Italian Italy Female Polly Bianca Neural
Adriano	Italian Italy Male Polly Adriano Neural
Kazuha	Japanese Japan Female Polly Kazuha Neural
Tomoko	Japanese Japan Female Polly Tomoko Neural
Takumi	Japanese Japan Male Polly Takumi Neural
Seoyeon	Korean Korea Female Polly Seoyeon Neural
Zhiyu	Mandarin China Female Polly Zhiyu Neural
Ida	Norwegian Norway Female Polly Ida Neural
Ola	Polish Poland Female Polly Ola Neural
Camila	Portuguese Brazil Female Polly Camila Neural
Vitoria	Portuguese Brazil Female Polly Vitoria Neural
Thiago	Portuguese Brazil Male Polly Thiago Neural
Ines	Portuguese Portugal Female Polly Ines Neural
Lucia	Spanish Castile Female Polly Lucia Neural
Sergio	Spanish Castile Male Polly Sergio Neural
Mia	Spanish Mexico Female Polly Mia Neural
Andres	Spanish Mexico Male Polly Andres Neural
Lupe	Spanish US Female Polly Lupe Neural
Pedro	Spanish US Male Polly Pedro Neural
Elin	Swedish Sweden Female Polly Elin Neural
Burcu	Turkish Turkey Female Polly Burcu Neural

Using a Cepstral voice

Cepstral's website has a demo which allows you to select a voice and immediately hear how different text will sound - see Cepstral demos.

Cepstral TTS supports a subset of the Speech Synthesis Markup Language (SSML), which can optionally be embedded within the text you supply to the say function. For a summary of the SSML tags which may be used, see Common SSML tags below. For more detailed information, go to Cepstral SSML FAQ and scroll down to the 'Common Usage Examples'. With reference to that page, please bear in mind the following:

We support the following Cepstral voices:

Name	Selector
Callie-8kHz (default)	English US Female Cepstral Callie
Marta-8kHz	Spanish US Female Cepstral Marta
Vittoria	Italian Italy Female Cepstral Vittoria

We don't support:

Inserting recorded audio files (our APIs' play functions already allow file replay)
Applying Cepstral special effects
Inserting bookmarks

Reserved characters

Some characters are reserved so, if the text you need to say contains any of these, replace them as shown:

Reserved Character	Replace With
<	&lt;
>	&gt;
&	&amp;
\|
^

For example, "Bill & Ben played in the garden" would become "Bill &amp; Ben played in the garden".
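
A minimal sketch of this replacement in Python follows. It assumes the replacements for <, > and & are the standard XML entities (&lt;, &gt; and &amp;); the function name escape_reserved is hypothetical, and the other reserved characters in the table are deliberately not handled because their replacements are not shown here.

```python
def escape_reserved(text):
    """Replace <, > and & with XML entities (hypothetical helper)."""
    # Replace & first so that the ampersands introduced by the other
    # replacements are not escaped a second time.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


print(escape_reserved("Bill & Ben played in the garden"))
```

This should only be applied to the plain text portions of a string, since running it over SSML tags would escape the tags themselves.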

Text length

The maximum length of the text to be converted is 1500 characters. As the length of the text increases, the generation time for the associated audio also increases. Unless the text is a repeated phrase (which may therefore be cached), there will be a longer delay before the audio is played.
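
One way to stay within the limit is to split long plain text into several shorter pieces and say them one after another. The following is a sketch under stated assumptions: the name split_for_tts is hypothetical, the limit is assumed to apply to the number of characters in the supplied string, and the input is assumed to be plain text (splitting SSML this way could separate an opening tag from its closing tag).

```python
MAX_TTS_CHARS = 1500  # maximum text length stated above


def split_for_tts(text, limit=MAX_TTS_CHARS):
    """Split plain text into chunks of at most `limit` characters, breaking at spaces."""
    chunks = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word is longer than the limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current chunk is full; start a new one with this word.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


chunks = split_for_tts("word " * 1000)
print(len(chunks), max(len(chunk) for chunk in chunks))
```

Runs of whitespace are collapsed to single spaces by this approach, which is usually harmless for spoken text.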

Common SSML tags

Polly and Cepstral both support a subset of SSML. Details of common tags can be found below. It is highly recommended that you test your application before deploying with a different TTS engine.

Tag	Description
break	Inserts a break or pause in the speech. Optional arguments are time and strength. time sets an absolute value for the pause. For example, <break time="3s"/> and <break time="3ms"/> set the break time to three seconds and three milliseconds respectively. A break may last up to 10 seconds. strength sets the relative value of the pause. Options are: none, x-weak, weak, medium, strong and x-strong. Examples: This is a <break /> sentence break. This is a <break time="2s"/> two second break. This is a dramatic <break strength="x-strong"/> break.
voice	Allows the user to change the voice used. Parameter name is required, specifying the voice to use. The supported voices for each TTS are listed above. This SSML tag is supported in the UAS API only. For the REST API please use the tts_voice setting. Polly does not support using more than one voice in a request. The first voice tag will set the voice used for all the text. Examples: <acu-engine name='Polly'><voice name='Amy'>I'm using Amy instead of the default voice.</voice></acu-engine>
prosody	Allows the user to change the pitch, speed and volume of a segment of speech. Common optional parameters are: pitch, rate and volume. pitch sets the pitch of speech. Options are: x-low, low, medium, high, x-high, a relative change (measured in Hz) e.g. +50Hz, or a percentage change e.g. +50%. rate sets the rate of speech. Options are: x-slow, slow, medium, fast and x-fast, or a percentage change e.g. +50%. volume sets the volume of speech. Options are: silent, x-soft, soft, medium, loud and x-loud, a relative change (measured in dB) e.g. +6dB, or a percentage change e.g. +50%. Examples: <prosody rate="x-fast">I'm using a very fast rate.</prosody> This is normal volume. <prosody volume="soft">This is a soft volume.</prosody> I can talk very <prosody rate="slow" pitch="low">deeply and slowly.</prosody> Today's date is the <prosody rate="-50%">15th April, 2012.</prosody>
emphasis	Can be used to read with emphasis. Required parameter: level. Options are: reduced, moderate and strong. Examples: This is a <emphasis level="strong">level of emphasis</emphasis>, which can be used to highlight important information.
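
Because SSML is XML, a quick local check that every tag is closed and every reserved character is escaped can catch mistakes before the text is sent for conversion. The sketch below uses Python's standard XML parser; the function name is_well_formed is hypothetical, and the check only covers XML well-formedness, not whether a given engine supports a particular tag or attribute value.

```python
import xml.etree.ElementTree as ET


def is_well_formed(ssml):
    """Return True if the SSML fragment parses as XML (hypothetical helper)."""
    try:
        # Wrap the fragment in a dummy root element so that mixed text
        # and tags can be parsed as a single document.
        ET.fromstring(f"<root>{ssml}</root>")
        return True
    except ET.ParseError:
        return False


print(is_well_formed('This is a <break time="2s"/> two second break.'))
print(is_well_formed('<prosody rate="slow">This tag is never closed.'))
```

An unescaped & in the text, or a break tag written without its closing slash, also fails this check.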

Charging

Our TTS is charged per conversion, per minute with 15 second granularity. So, for example:

A play action that plays for 12 seconds will be charged for 15 seconds.
A get input action that plays a 5 second prompt, then plays "I'm sorry I didn't catch what you said" (which lasts 6 seconds), and then plays the 5 second prompt again will be charged for 30 seconds (5+6+5=16 seconds, rounded up to 2 periods of 15 seconds).
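
The rounding in these examples can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not a statement of how billing is implemented: the function name charged_seconds is hypothetical, and it assumes the durations played within one action are added together before rounding up, as in the second example.

```python
import math

GRANULARITY_SECONDS = 15  # charging granularity stated above


def charged_seconds(played_seconds):
    """Round a played duration up to the next multiple of 15 seconds."""
    periods = math.ceil(played_seconds / GRANULARITY_SECONDS)
    return periods * GRANULARITY_SECONDS


print(charged_seconds(12))         # play action example
print(charged_seconds(5 + 6 + 5))  # get input example
```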

You can obtain detailed charge information for a specific call using the Application Status web service. You can obtain detailed charge information for calls over a period of time using the Managing Reports web services.
