Many factors can affect acceptability, recognition accuracy, and users' perceptions of how useful an ASR system is to them. The design of the prompts given to each caller is critical with respect to all of these. They must:
It is also important to make sure that each caller can get to the information they need quickly and without too much prompting, and that they are passed to a human operator quickly if they are having difficulty. Such difficulties can be detected from the proportion of recognition results that are rejected or deemed uncertain by the ASR algorithm. Problems can also be detected from the duration of the intervals between a prompt and the corresponding recognition result: if these are too long, something is probably wrong, and it would be best to transfer the caller to an operator if the problem persists. Such human factors issues can be crucial to acceptance of the system by the general public.
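The two warning signs described above (a high proportion of rejected or uncertain results, and over-long gaps between prompt and result) can be tracked per call with a few counters. The following is a minimal sketch, not part of the Prosody API: the type call_monitor, both functions, and all threshold values are hypothetical and would need tuning for a real application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-call statistics; none of these names come from the Prosody API. */
typedef struct {
    unsigned results_total;    /* recognition attempts so far */
    unsigned results_doubtful; /* attempts rejected or flagged as uncertain */
    unsigned slow_in_a_row;    /* consecutive over-long prompt-to-result gaps */
} call_monitor;

/* Illustrative thresholds only (assumptions, not recommended values). */
enum {
    MIN_RESULTS_BEFORE_JUDGING = 3,    /* ignore the ratio until enough attempts */
    MAX_DOUBTFUL_PERCENT       = 50,   /* transfer at or above this proportion */
    MAX_GAP_MS                 = 8000, /* a gap longer than this counts as slow */
    MAX_SLOW_IN_A_ROW          = 2     /* "if the problem persists" */
};

/* Record the outcome of one prompt/recognition exchange. */
void call_monitor_update(call_monitor *m, bool doubtful, unsigned gap_ms)
{
    assert(m != NULL);
    m->results_total++;
    if (doubtful)
        m->results_doubtful++;
    if (gap_ms > MAX_GAP_MS)
        m->slow_in_a_row++;
    else
        m->slow_in_a_row = 0; /* a prompt answered in time clears the streak */
}

/* Return true when the caller should be handed to a human operator. */
bool call_monitor_should_transfer(const call_monitor *m)
{
    assert(m != NULL);
    if (m->slow_in_a_row >= MAX_SLOW_IN_A_ROW)
        return true;
    if (m->results_total < MIN_RESULTS_BEFORE_JUDGING)
        return false;
    return m->results_doubtful * 100u >=
           m->results_total * (unsigned)MAX_DOUBTFUL_PERCENT;
}
```

An application would zero-initialise one call_monitor per call, update it after every recognition attempt, and check call_monitor_should_transfer() before playing the next prompt.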
Note also that well-designed prompts can play an important role in synchronising the speaker and the recogniser: if sm_asr_listen_for() is invoked while a caller is in the process of speaking, the first part of their utterance will be lost. If the asr_mode is anything other than kSMASRModeDisabled, the remainder of the utterance will then be treated as part of the following speech, which may lead to an error in the immediately following ASR result. Playing the caller a voice prompt immediately before activating ASR on a channel is a good way to synchronise the caller's speech with the ASR system, because it encourages them to stop talking and start again.
The biggest single factor affecting recognition performance in real applications (both in terms of accuracy and latency) is the choice of vocabulary. In particular, the pronunciation of short and potentially emotive words such as "no" is highly variable. By comparison, digits are rarely used to convey emotion, and are spoken much more consistently. It is therefore wise to minimise the number of active vocabulary words and their confusability. For example, the word "no" can easily be confused with "oh" or even "nine" (especially if "no" is actually pronounced "nah" in vernacular speech). Recognition performance will therefore improve dramatically if a digit vocabulary (zero/nought/oh/one/two/.../nine) is kept separate from a confirmation (yes/no) vocabulary.
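One way to keep the digit and confirmation vocabularies separate is to select the active word list from the current dialogue state, so that "no" is never active while a digit is expected. The sketch below assumes the English filenames listed in the vocabulary tables; the type dialogue_state and the functions active_vocab() and vocab_contains() are hypothetical helpers, not part of the Prosody API, and loading the files into the recogniser is left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dialogue states: what kind of answer the prompt asked for. */
typedef enum { EXPECT_DIGIT, EXPECT_CONFIRMATION } dialogue_state;

/* Digit vocabulary (zero/nought/oh/one/.../nine). */
static const char *const digit_vocab[] = {
    "zero.sas", "nought.sas", "oh.sas",   "one.sas",
    "two.sas",  "three.sas",  "four.sas", "five.sas",
    "six.sas",  "seven.sas",  "eight.sas", "nine.sas"
};

/* Confirmation vocabulary, kept apart from the digits. */
static const char *const confirm_vocab[] = { "yes.sas", "no.sas" };

/* Return the word list to activate for a state; *count receives its length. */
const char *const *active_vocab(dialogue_state state, size_t *count)
{
    assert(count != NULL);
    switch (state) {
    case EXPECT_DIGIT:
        *count = sizeof digit_vocab / sizeof digit_vocab[0];
        return digit_vocab;
    case EXPECT_CONFIRMATION:
        *count = sizeof confirm_vocab / sizeof confirm_vocab[0];
        return confirm_vocab;
    }
    *count = 0; /* unknown state: activate nothing */
    return NULL;
}

/* Check whether a vocabulary file is part of a word list. */
bool vocab_contains(const char *const *vocab, size_t count, const char *file)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(vocab[i], file) == 0)
            return true;
    }
    return false;
}
```

Because the two lists never overlap, a confusable pair such as "no" and "oh" cannot be active together.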
When speech recognition is active on a channel, any incoming signal may give rise to a recognition result. The more speech-like the signal, the greater the chance that it produces a result. The more clearly it is spoken, the greater the chance that the result is correct.
If ASR is active while a spoken prompt is being replayed to the caller, the echo from the caller's telephone is often sufficiently loud and clear to cause spurious recognition results. The simplest way to avoid this problem is to disable recognition until the replay of the prompt has completed. However, this can slow down the caller's navigation of a menu-based application: once callers become familiar with the structure of a series of ASR prompts, they will often find it more efficient to pre-empt the prompts, and say the next word before the prompt has finished, a process known as "barge-in".
Prosody's ASR algorithm can selectively ignore the echo component, and thus allow barge-in. The recogniser can be activated as soon as a replay starts, provided the channel performing the replay is specified as the sidetone channel when invoking sm_asr_listen_for().
When deciding whether to allow barge-in in an application, the developer should keep in mind the restrictions associated with the use of the sidetone channel: both the input (ASR) and the output (prompt) channels must reside on the same module, and that module must therefore have sufficient processing resources to perform both replay and ASR.
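The barge-in restrictions can be expressed as a simple check made before each prompt. This is only a sketch under stated assumptions: the channel_desc structure, the abstract "resource units", and choose_listen_strategy() are hypothetical and do not correspond to Prosody API types; how module membership and spare capacity are actually determined depends on the application.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel description: only the hosting module matters here. */
typedef struct {
    int module_id;
} channel_desc;

/* When to activate the recogniser relative to the prompt replay. */
typedef enum {
    LISTEN_AFTER_PROMPT, /* no barge-in: recognise once the replay has ended */
    LISTEN_DURING_PROMPT /* barge-in: recognise while the prompt is playing */
} listen_strategy;

/*
 * Allow barge-in only if both channels are on the same module and that
 * module has enough spare capacity for replay and ASR together.
 * The capacity figures are abstract units chosen by the caller (assumption).
 */
listen_strategy choose_listen_strategy(const channel_desc *asr_in,
                                       const channel_desc *prompt_out,
                                       unsigned module_free_units,
                                       unsigned units_for_replay_and_asr)
{
    assert(asr_in != NULL && prompt_out != NULL);
    if (asr_in->module_id != prompt_out->module_id)
        return LISTEN_AFTER_PROMPT;
    if (module_free_units < units_for_replay_and_asr)
        return LISTEN_AFTER_PROMPT;
    return LISTEN_DURING_PROMPT;
}
```

Falling back to LISTEN_AFTER_PROMPT corresponds to the simpler approach of disabling recognition until the replay has completed.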
The available vocabularies for Prosody ASR are distributed in the files
listed in the following tables, but note that the complexity of each
model is different, so the files have different sizes. Note also that
some filenames ("zero.sas", "stop.sas", etc.) are
used in more than one language's vocabulary. Despite having the same
names, these files are not interchangeable, and it is important
that the correct version of each is used when recognising the respective
language. The files for a particular language are distributed in a
directory called $(TiNG)/iwr/gen/$lang where $lang is a language
code consistent with ISO 639-1, ISO 3166, and
IETF RFC 3066.

| Word | File |
| --- | --- |
| One | one.sas |
| Two | two.sas |
| Three | three.sas |
| Four | four.sas |
| Five | five.sas |
| Six | six.sas |
| Seven | seven.sas |
| Eight | eight.sas |
| Nine | nine.sas |
| Zero | zero.sas |
| Nought | nought.sas |
| Oh | oh.sas |
| Yes | yes.sas |
| No | no.sas |
| Help | help.sas |
| Start | start.sas |
| Restart | restart.sas |
| Stop | stop.sas |
| Erase | erase.sas |
| Delete | delete.sas |
| Cancel | cancel.sas |
| Double | double.sas |
| Triple | triple.sas |
| Treble | treble.sas |
| Phone | phone.sas |
| Call | call.sas |
| Get me | get-me.sas |
| Save | save.sas |
| Store | store.sas |
| Remember | remember.sas |
| New | new.sas |
| Name | name.sas |
| Number | number.sas |
| Dial | dial.sas |
| Record | record.sas |
| End | end.sas |
| Operator | operator.sas |
| Emergency | emergncy.sas |
| Directory | directry.sas |

| Word | File |
| --- | --- |
| One | one.sas |
| Two | two.sas |
| Three | three.sas |
| Four | four.sas |
| Five | five.sas |
| Six | six.sas |
| Seven | seven.sas |
| Eight | eight.sas |
| Nine | nine.sas |
| Zero | zero.sas |
| Double | double.sas |
| Oh | oh.sas |
| Yes | yes.sas |
| No | no.sas |
| Help | help.sas |
| Start | start.sas |
| Restart | restart.sas |
| Stop | stop.sas |
| Erase | erase.sas |
| Delete | delete.sas |
| Cancel | cancel.sas |
| Directory | directry.sas |
| Phone | phone.sas |
| Call | call.sas |
| New | new.sas |
| Name | name.sas |
| Number | number.sas |
| Dial | dial.sas |
| Save | save.sas |
| End | end.sas |
| Operator | operator.sas |
| Emergency | emergncy.sas |

| Word | File |
| --- | --- |
| Eins | eins.sas |
| Zwei | zwei.sas |
| Zwo | zwo.sas |
| Drei | drei.sas |
| Vier | vier.sas |
| Fünf | funf.sas |
| Sechs | sechs.sas |
| Sieben | sieben.sas |
| Acht | acht.sas |
| Neun | neun.sas |
| Null | null.sas |
| Ja | ja.sas |
| Nein | nein.sas |
| Ne | ne.sas |
| Na | na.sas |
| Andere wahl | andrwahl.sas |
| Korrigieren | korrigrn.sas |
| Bestätigung | bestgung.sas |
| Befragen | befragen.sas |
| Telephonistin | telfnstn.sas |
| Stornierung | stornrng.sas |
| Zuhören | zuhoren.sas |
| Wiederholung | wdrholng.sas |
| Inhalt | inhalt.sas |
| Vohrer | vohrer.sas |
| Hilfe | hilfe.sas |
| Information | informtn.sas |
| Zurück | zuruck.sas |
| Stop | stop.sas |
| Beenden | beenden.sas |
| Anfang | anfang.sas |
| Weiter | weiter.sas |
| Aufnehmen | aufnehmn.sas |
| Nachricht | nachrcht.sas |
| Bestellen | bestelln.sas |

| Word | File |
| --- | --- |
| Un | un.sas |
| Deux | deux.sas |
| Trois | trois.sas |
| Quatre | quatre.sas |
| Cinq | cinq.sas |
| Six | six.sas |
| Sept | sept.sas |
| Huit | huit.sas |
| Neuf | neuf.sas |
| Zéro | zero.sas |
| Oui | oui.sas |
| Non | non.sas |
| Autre choix | autrechx.sas |
| Correction | correctn.sas |
| Validation | validatn.sas |
| Consultation | conslttn.sas |
| Opérateur | operteur.sas |
| Quitter | quitter.sas |
| Écouter | ecouter.sas |
| Répéter | repeter.sas |
| Sommaire | sommaire.sas |
| Mode d'emploi | mddmploi.sas |
| Guide | guide.sas |
| Information | infrmatn.sas |
| Précédent | precednt.sas |
| Suivant | suivant.sas |
| Retour | retour.sas |
| Stop | stop.sas |
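
Because identically named files such as zero.sas and stop.sas exist for more than one language, it helps to build every vocabulary path from an explicit language code rather than from the filename alone. The following sketch follows the $(TiNG)/iwr/gen/$lang layout described above; the function vocab_path() is hypothetical, and the root directory and language code passed to it are assumptions that must match the actual installation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<root>/iwr/gen/<lang>/<file>" into out.
 * root stands for the $(TiNG) directory and lang for the language code.
 * Returns 0 on success, or -1 if the path does not fit in out_size bytes.
 */
int vocab_path(char *out, size_t out_size,
               const char *root, const char *lang, const char *file)
{
    assert(out != NULL && root != NULL && lang != NULL && file != NULL);
    if (out_size == 0)
        return -1;
    int n = snprintf(out, out_size, "%s/iwr/gen/%s/%s", root, lang, file);
    if (n < 0 || (size_t)n >= out_size)
        return -1; /* encoding error or truncated path */
    return 0;
}
```

Calling vocab_path() with the same filename but different language codes yields distinct paths, which guards against loading the wrong language's version of a shared filename.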