The VoiceXML interpreter conducts the call interaction with a caller, based on instructions from a VoiceXML script supplied by an application server. The interpreter processes documents (voice and call control applications) written in the VoiceXML and CCXML languages and drives the underlying telephony platform, which is based on the Prosody family of products. The interpreter natively understands touch-tone input and can play pre-recorded audio prompts or files. It can also call upon voice technologies such as text-to-speech (TTS) and automatic speech recognition (ASR) for enhanced functionality.

Efficient applications

Aculab’s highly efficient VoiceXML/CCXML interpreter uses performance-optimised script execution algorithms to guarantee uncompromised system operation for high-density applications requiring thousands of simultaneous calls, in either hosted or on-premises deployments. As high-level applications become denser and more complex, and their resource dependencies substantially increase CPU and memory usage, most interpreters on the market today experience performance bottlenecks. In contrast, Aculab’s interpreter uses a unique architecture that virtually eliminates application load and memory penalties, allowing applications to run smoothly.

Architecture

The logical architecture of a VoiceXML/CCXML-based solution is shown in figure 1. The VoiceXML/CCXML interpreter communicates via web protocols (HTTP) with either a local or a remote application server, which can query an enterprise’s middleware and database systems to dynamically drive content to the end user. The application server delivers pre-compiled VoiceXML/CCXML text pages and binary media files, which can be cached locally to improve runtime efficiency. Interactivity with the end user is achieved by returning the spoken, touch-tone and recorded input received by the interpreter to the application server.
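To illustrate how an application server can dynamically drive content, the following is a minimal sketch of server-side code that renders a VoiceXML 2.0 touch-tone (DTMF) menu from data, such as the result of a database query. The function name `build_menu_vxml` and the menu contents are hypothetical; how the page is served over HTTP, and any Aculab-specific VoiceXML extensions, are outside the scope of the sketch.

```python
from xml.sax.saxutils import escape, quoteattr

VXML_NS = "http://www.w3.org/2001/vxml"


def build_menu_vxml(greeting: str, options: list[tuple[str, str]]) -> str:
    """Render a VoiceXML 2.0 menu page (hypothetical helper).

    `options` is a list of (label, next_url) pairs, e.g. built from a
    database query. With dtmf="true", the interpreter assigns the keys
    1, 2, 3, ... to the choices in document order.
    """
    choices = "\n".join(
        # quoteattr() escapes & and quotes inside the URL attribute value
        f"    <choice next={quoteattr(url)}>{escape(label)}</choice>"
        for label, url in options
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<vxml version="2.0" xmlns="{VXML_NS}">\n'
        '  <menu dtmf="true">\n'
        # <enumerate/> makes the interpreter read out the available choices
        f"    <prompt>{escape(greeting)} <enumerate/></prompt>\n"
        f"{choices}\n"
        "  </menu>\n"
        "</vxml>\n"
    )
```

A web handler would return this string with the `application/voicexml+xml` media type (registered in RFC 4267), and the interpreter would fetch it over HTTP and play the prompt. Escaping every dynamic value matters because database content such as "Acme & Co." would otherwise produce an ill-formed document.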
Although supplied in one software package, the VoiceXML and CCXML parts of Aculab’s interpreter operate separately. The VoiceXML part handles media interaction between high-level applications and end users. It enables interactive dialogues with callers based on standard media processing features, including prompt recording and playback, music on hold and DTMF event handling. Integration with third-party speech engines is also supported, allowing solutions to interact with end users through synthesised speech and ASR. The CCXML part is responsible for IP or TDM call control, allowing the initiation, management and termination of sessions with end-user devices.
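The division of labour between the two parts can be illustrated with a small CCXML 1.0 page: the call control side answers an incoming call and then hands the connection to a VoiceXML dialog for media interaction. This is a sketch based on the W3C CCXML 1.0 elements (`transition`, `accept`, `dialogstart`, `disconnect`, `exit`); the helper name `build_inbound_ccxml` is hypothetical, and the exact events and elements accepted by Aculab’s implementation should be checked against the VoiceXML/CCXML Interpreter Software User’s Manual.

```python
import json
from xml.sax.saxutils import quoteattr

CCXML_NS = "http://www.w3.org/2002/09/ccxml"


def build_inbound_ccxml(dialog_url: str) -> str:
    """Render a CCXML page that answers a call and starts a VoiceXML dialog.

    CCXML attributes such as `src` are ECMAScript expressions, so the URL is
    turned into a quoted script string literal with json.dumps().
    """
    src_expr = quoteattr(json.dumps(dialog_url))
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="{CCXML_NS}">
  <eventprocessor>
    <!-- Call control (CCXML): answer the incoming call -->
    <transition event="connection.alerting">
      <accept connectionid="event$.connectionid"/>
    </transition>
    <!-- Media interaction (VoiceXML): start the dialog once connected -->
    <transition event="connection.connected">
      <dialogstart src={src_expr} connectionid="event$.connectionid"/>
    </transition>
    <!-- Dialog finished: release the call -->
    <transition event="dialog.exit">
      <disconnect connectionid="event$.connectionid"/>
    </transition>
    <!-- Call released: end the CCXML session -->
    <transition event="connection.disconnected">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
"""
```

Calling `build_inbound_ccxml("menu.vxml")` yields a page whose only media-related action is `dialogstart`; prompts, DTMF handling and recording all live in the VoiceXML document it references.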
Figure 1: Aculab VoiceXML/CCXML solution architecture

VoiceXML applications can be housed in hosted datacentres or served in real time directly from an enterprise application server. Prosody media processing platforms, which provide low-level connectivity to IP and PSTN networks, use a highly bit-efficient, low-latency ASSP protocol to communicate with the high-level control layer, which consists of the VoiceXML/CCXML interpreter and the TiNG, call control and switch APIs. Intelligent caching of VoiceXML and audio files by the interpreter further minimises the impact on media latency. When a VoiceXML application server is located in a hosted datacentre, communication with back-end enterprise databases typically occurs via real-time XML connections over a secure IP link. Integration with third-party voice engines (ASR, TTS, etc.) is enabled via Aculab’s MRCP client.

Active call redundancy

The basic requirement for building resilient solutions is to eliminate single points of failure by introducing redundant functional components. Aculab’s VoiceXML/CCXML interpreter can be distributed across several physical hosts, and load balancing software may be used to minimise the impact of a failover event. Automatic failover mechanisms are also supported by the Prosody family of media processing platforms, making the solution fault tolerant from the low-level telephony hardware up to the application layer. An example of a high availability solution is shown in figure 2.
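The idea of distributing the interpreter across several hosts with load balancing can be sketched as a round-robin dispatcher that skips hosts marked as failed. This is a generic illustration, not Aculab’s load balancing software: the classes `InterpreterHost` and `FailoverBalancer` are hypothetical, they only decide where new calls go, and how a failure is detected (for example a heartbeat or SNMP polling) is left out.

```python
from dataclasses import dataclass


@dataclass
class InterpreterHost:
    """One machine running an interpreter instance (hypothetical model)."""
    name: str
    healthy: bool = True


class FailoverBalancer:
    """Round-robin assignment of new calls that skips hosts marked as failed."""

    def __init__(self, hosts: list[InterpreterHost]) -> None:
        if not hosts:
            raise ValueError("at least one interpreter host is required")
        self._hosts = list(hosts)
        self._next = 0  # index of the next host to try

    def _set_health(self, name: str, healthy: bool) -> None:
        for host in self._hosts:
            if host.name == name:
                host.healthy = healthy
                return
        raise KeyError(name)

    def mark_down(self, name: str) -> None:
        self._set_health(name, False)

    def mark_up(self, name: str) -> None:
        self._set_health(name, True)

    def pick(self) -> InterpreterHost:
        """Return the next healthy host, trying each host at most once."""
        for _ in range(len(self._hosts)):
            host = self._hosts[self._next]
            self._next = (self._next + 1) % len(self._hosts)
            if host.healthy:
                return host
        raise RuntimeError("no healthy interpreter hosts available")
```

With hosts A, B and C, successive calls go to A, B, C, A; after `mark_down("B")` the rotation continues over A and C only, so a single failed host removes capacity without becoming a single point of failure.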
Figure 2: Example architecture for high availability VoiceXML/CCXML solutions

Technical summary

Extensions to standard – Aculab has added custom objects within the VoiceXML syntax. Details can be found in the VoiceXML/CCXML Interpreter Software User’s Manual.

Functionality includes:
- VoIP and PSTN call control – supports SIP and the full range of Aculab’s PSTN protocols and signalling systems (for more details please visit protocols and approvals).
- Audio files – support for a wide set of audio file formats (please refer to the Prosody TiNG API), including the standard wave file (RIFF header format, filename suffix ’.wav’).
- ASR/TTS voice engines – supported via Aculab’s MRCP software with any third-party voice engine that supports MRCP v2 (as specified in IETF draft draft-ietf-speechsc-mrcpv2-12).
- Scalability – Aculab’s VoiceXML/CCXML interpreter features linear scalability, with no limit on the number of channels supported in a distributed environment.
- Tested performance – developed for solutions that handle hundreds to thousands of simultaneous calls; standard testing of the interpreter begins at 1000 channels per host with a dual-core Intel Xeon 5160 processor running at 3 GHz.
- Operating system – Linux. For information on supported Linux distributions, please refer to Aculab’s software support notes.
- Telephony level integration – the interpreter supports all variants and form factors of Prosody X (PCI, cPCI and PCIe) and Prosody S at version 3.0 or later.
- Administration and management tools – supported using the SNMP protocol, which offers a standards-based capability for remote control and in-field service. MIBs compatible with both Net-SNMP and Microsoft’s SNMP service are offered to provide choice.
- Event logging and diagnostics – supports basic and verbose levels of event logging to file.
- Licensing – the interpreter is available with a flexible software licence tied to a host machine. System channel counts may start from a single channel, with no fixed upper limit. Free development and evaluation licences are available (4 channels, 45 days).
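As an example of the standard wave file mentioned under audio files, the following sketch uses Python’s `wave` module, which writes the RIFF header, to produce a short prompt file. The choice of 8 kHz, 16-bit, mono linear PCM is an assumption (a common telephony format); the formats Prosody actually accepts are listed in the Prosody TiNG API documentation. The function name `make_tone_wav` is hypothetical.

```python
import io
import math
import struct
import wave


def make_tone_wav(freq_hz: float = 440.0, duration_s: float = 1.0,
                  sample_rate: int = 8000, amplitude: float = 0.5) -> bytes:
    """Return a mono, 16-bit linear PCM RIFF/WAVE file containing a sine tone.

    The 8 kHz default is an assumed telephony-style rate, not a documented
    Prosody requirement.
    """
    n_frames = int(duration_s * sample_rate)
    peak = int(amplitude * 32767)
    # Little-endian signed 16-bit samples, as stored in PCM WAV files
    frames = b"".join(
        struct.pack("<h", int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:  # the wave module writes the RIFF header
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return buf.getvalue()
```

The returned bytes begin with the `RIFF` and `WAVE` markers and can be saved with a `.wav` suffix (for example by writing them to `prompt.wav` opened in binary mode) before being served alongside the VoiceXML pages.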