Aculab Cloud supports Cepstral and Polly Text To Speech (TTS).
Choosing the voice to use
In the REST and UAS APIs, the Play action and the Say functions, respectively, support the Speech Synthesis Markup Language (SSML), which lets you change the way your text is spoken, for example by choosing which voice says it. SSML also lets you choose which TTS engine to use, through the optional acu-engine tag which, if provided, must be the outermost tag in the string. If you don't provide this tag, your account default will be used, based on the default voice.
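As an illustration of the nesting rule above, here is a minimal Python sketch that builds such a string. The helper name wrap_with_engine is hypothetical and not part of any Aculab API. It only assembles text that you would then pass to the Play action or a Say function.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_with_engine(text, engine=None, voice=None):
    """Build an SSML string (hypothetical helper, not an Aculab API call).

    The voice tag, if requested, goes inside; the acu-engine tag, if
    requested, is always added last so that it ends up outermost.
    """
    ssml = escape(text)  # replace &, < and > in the spoken text
    if voice is not None:
        ssml = "<voice name={}>{}</voice>".format(quoteattr(voice), ssml)
    if engine is not None:
        ssml = "<acu-engine name={}>{}</acu-engine>".format(quoteattr(engine), ssml)
    return ssml


print(wrap_with_engine("Hello from the cloud.", engine="Polly", voice="Amy"))
```

Calling the helper with neither engine nor voice returns the escaped text unchanged, which corresponds to relying on the account default.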
Polly's website has a demo which allows you to select a voice and immediately hear how different text will sound - see Polly demos.
Polly TTS supports a subset of SSML, which can optionally be embedded within the text you supply to the say function. For a summary of the SSML tags which may be used, see Common SSML tags below. For more detailed information, go to the W3C SSML 1.1 recommendation.
We support the following voices:
Cepstral's website has a demo which allows you to select a voice and immediately hear how different text will sound - see Cepstral demos.
Cepstral TTS supports a subset of the Speech Synthesis Markup Language (SSML), which can optionally be embedded within the text you supply to the say function. For a summary of the SSML tags which may be used, see Common SSML tags below. For more detailed information, go to Cepstral SSML FAQ and scroll down to the 'Common Usage Examples'. With reference to that page, please bear in mind the following:
We support the following voices:
|Callie-8kHz||US English female, default|
|Marta-8kHz||American Spanish female|
We don't support:
- Inserting recorded audio files (our APIs' play functions already allow file replay)
- Applying Cepstral special effects
- Inserting bookmarks
Some characters are reserved for use in SSML so, if the text you need to say contains any of these, replace them as shown:
|Reserved Character||Replace With|
For example, "Bill & Ben played in the garden" would become "Bill &amp; Ben played in the garden".
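If you build the text in code, the replacement can be automated. The sketch below uses Python's standard xml.sax.saxutils.escape. It assumes the reserved characters are the five standard XML ones (&, <, >, " and '), so check it against the table above before relying on it. The function name escape_for_ssml is hypothetical.

```python
from xml.sax.saxutils import escape


def escape_for_ssml(text):
    """Replace characters that are reserved in SSML (assumed: & < > " ')."""
    # escape() handles &, < and > itself; the mapping adds both quote characters.
    return escape(text, {'"': "&quot;", "'": "&apos;"})


print(escape_for_ssml("Bill & Ben played in the garden"))
```

Escape only the text to be spoken, never the SSML tags themselves, otherwise the tags would be read out as text.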
Common SSML tags
Cepstral and Polly both support a subset of SSML. Details of common tags can be found below. It is highly recommended that you test your application before deploying with a different TTS engine.
The break tag inserts a break or pause in the speech.
Optional arguments are time and strength.
time sets an absolute value for the pause. For example, <break time="3s"/> and <break time="3ms"/> set the break time to three seconds and three milliseconds respectively. A break may be up to 10 seconds long.
strength sets the relative value of the pause. The options are none, x-weak, weak, medium, strong and x-strong.
This is a <break /> sentence break. This is a <break time="2s"/> two second break. This is a dramatic <break strength="x-strong"/> break.
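A minimal Python sketch of a helper that generates break tags and enforces the limits described above. The name break_tag and the choice to express the time in milliseconds are assumptions made for illustration.

```python
STRENGTHS = ("none", "x-weak", "weak", "medium", "strong", "x-strong")
MAX_BREAK_MS = 10000  # a break may last up to 10 seconds


def break_tag(time_ms=None, strength=None):
    """Return a self-closing break tag (hypothetical helper)."""
    attrs = ""
    if time_ms is not None:
        if not 0 <= time_ms <= MAX_BREAK_MS:
            raise ValueError("break time must be between 0 and 10 seconds")
        attrs += ' time="{}ms"'.format(int(time_ms))
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError("unknown strength: {}".format(strength))
        attrs += ' strength="{}"'.format(strength)
    return "<break{}/>".format(attrs)


print("This is a " + break_tag(time_ms=2000) + " two second break.")
```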
The voice tag allows you to change the voice used. The name parameter is required and specifies the voice to use. The supported voices for each TTS engine are listed above.
<acu-engine name='Polly'><voice name='Amy'>I'm using Amy instead of the default voice.</voice></acu-engine>
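Because the acu-engine tag must be outermost, it can be worth checking a string before sending it. The following Python sketch does that with the standard library XML parser. The function name engine_is_outermost is hypothetical, and the check is deliberately simple: it treats any string that is not a single well-formed element as a failure when acu-engine is present.

```python
import xml.etree.ElementTree as ET


def engine_is_outermost(ssml):
    """Return True if acu-engine is absent, or is the single outermost tag."""
    if "<acu-engine" not in ssml:
        return True  # no engine selected; the account default applies
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        # Text or tags outside a single root element, or malformed markup.
        return False
    if root.tag != "acu-engine":
        return False
    # Reject a second acu-engine nested anywhere inside the first.
    return not any(el.tag == "acu-engine" for el in root.iter() if el is not root)


sample = ("<acu-engine name='Polly'><voice name='Amy'>"
          "I'm using Amy instead of the default voice.</voice></acu-engine>")
print(engine_is_outermost(sample))
```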
The prosody tag allows you to change the pitch, speed and volume of a segment of speech.
Common optional parameters are: pitch, rate and volume.
pitch sets the pitch of speech. Options are: x-low, low, medium, high, x-high, a relative change (measured in Hz) e.g. +50Hz, or a percentage change e.g. +50%.
rate sets the rate of speech. Options are: x-slow, slow, medium, fast and x-fast, a relative change, or a percentage change e.g. +50%.
volume sets the volume for speech. Options are: silent, x-soft, soft, medium, loud and x-loud, a relative change, or a percentage change e.g. +50%.
<prosody rate="x-fast">I'm using a very fast rate.</prosody> This is normal volume. <prosody volume="soft">This is a soft volume.</prosody> I can talk very <prosody rate="slow" pitch="low">deeply and slowly.</prosody> Today's date is the <prosody rate="-50%">15th April, 2012.</prosody>
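The examples above can also be generated programmatically. This Python sketch builds a prosody tag from whichever of the three common parameters are supplied. The helper name prosody is hypothetical, and it does not validate the values, so unsupported settings are passed through to the TTS engine as given.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, pitch=None, rate=None, volume=None):
    """Wrap text in a prosody tag carrying only the supplied attributes."""
    attrs = "".join(
        " {}={}".format(name, quoteattr(value))
        for name, value in (("pitch", pitch), ("rate", rate), ("volume", volume))
        if value is not None
    )
    if not attrs:
        return escape(text)  # nothing to change, so no tag is needed
    return "<prosody{}>{}</prosody>".format(attrs, escape(text))


print("I can talk very " + prosody("deeply and slowly.", pitch="low", rate="slow"))
```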
The emphasis tag can be used to read text with emphasis.
Required parameter: level. Options are: reduced, moderate and strong.
This is a <emphasis level="strong">level of emphasis</emphasis>, which can be used to highlight important information.
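Since level is required and has only three options, a small wrapper can catch mistakes early. This is a Python sketch with a hypothetical helper name, emphasis, and not part of the Aculab APIs.

```python
from xml.sax.saxutils import escape

LEVELS = ("reduced", "moderate", "strong")


def emphasis(text, level):
    """Wrap text in an emphasis tag, rejecting levels not listed above."""
    if level not in LEVELS:
        raise ValueError("level must be one of: " + ", ".join(LEVELS))
    return '<emphasis level="{}">{}</emphasis>'.format(level, escape(text))


print("This is a " + emphasis("level of emphasis", "strong") + ".")
```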