Media servers have had their day

Media servers have played an important role in enabling many of the real-time – and non-real-time – telecommunications applications with which we are all familiar. Many of those interactive applications are things we take for granted: network announcements (e.g., the ‘speaking clock’), voicemail, IVR, unified messaging (which has morphed into unified communications), and outbound diallers (think campaigns and collections).

Media servers provide a range of low-level functionality used to underpin many applications. Their functions include echo cancellation, DTMF digit detection and generation, loudest speaker detection, call recording, and the ‘front-ending’ of text-to-speech (TTS) and speech recognition servers, with such essential features as ‘grunt detection’ and ‘barge-in’ (nothing to do with Marines on exercise).

These days, with a plethora of smartphone apps generating increasing media traffic in telecommunications networks, you could be forgiven for thinking that media servers can only grow in importance. The trouble with that is a bit like poor man’s origami; it’s twofold.

For one thing, lots of functionality is taken care of by the smartphone. For example, in a speech-enabled navigation app, you don’t need a TTS server, or a media server to play back its synthesised speech. That’s because the app relies on a synthetic voice on the device, not on a TTS engine and media server in the network. Nor is a media server needed for the GPS signal used by the app.

Most of the traffic involved is data associated with downloads and regular updates, either to the mapping application or to the TTS voice engine on the smartphone, and none of that requires a media server.

The other thing is that the technology behind media servers is outdated. Traditionally, they have been based on boards full of digital signal processors (DSPs). More recently, they have been implemented on servers with licensed software, which offers functionality akin to that provided by DSPs. Along the way, they have been controlled via complex, low-level C APIs or intermediate markup languages, such as MSCML, MSML, and VoiceXML.

None of those technologies are sustainable in a world where many real-time telco applications are no longer revenue generating, merely revenue protecting. I fear subscribers don’t associate service providers with cool applications, and what reduces subscriber churn – or increases their numbers – is free cinema vouchers and early access to tickets for concerts and sports events.

Nowadays, increasing numbers of businesses are relying on cloud-based applications. In the United Kingdom, over 80 per cent are using cloud in one way or another [1]. That is significant adoption and means the concept has been well and truly accepted. What price then an old-fashioned media server that doesn’t lend itself to a cloud environment? Even when you consider software-based alternatives, scalability and licensing still make capacity planning and ‘just in case’ over-provisioning a problem, in much the same way as they are with hardware-based media servers.

When you have a cloud-based application involving aspects of telecommunications, the only logical place to lodge the functions equivalent to those offered by a media server is in the cloud. It stands to reason. When questions are asked about scalability, the cloud is the only reasonable answer.

DSP boards are scalable at significant iterative cost. Software alternatives benefit from ever-increasing processor power, and are scalable in servers and on virtual machines – provided you’ve purchased sufficient licences and can deploy/redeploy them quickly enough for purpose. In each case, however, you must make a purchase dependent on your estimation of how much capacity (or how many licences) you will need.

With cloud, there is no such dilemma. The resources you need are available on demand, when you need them, for as long or short a time as you need them. The scalability issue simply disappears. In reality, it’s someone else’s problem. You can rely on that. It’s the beauty of cloud-based resources.
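To put rough numbers on that dilemma, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the hourly demand figures, the 25 per cent ‘just in case’ headroom, and the function names are all hypothetical and come from no real deployment or product. It simply compares the capacity you would have to buy up front to cover the peak with the capacity actually used over a day.

```python
import math

def licences_for_peak(hourly_sessions, headroom=0.25):
    """Licences to buy up front: the peak concurrent sessions plus headroom."""
    return math.ceil(max(hourly_sessions) * (1 + headroom))

def utilisation(hourly_sessions, provisioned):
    """Fraction of the provisioned session-hours that were actually used."""
    return sum(hourly_sessions) / (provisioned * len(hourly_sessions))

# Hypothetical concurrent-session counts for one day, one value per hour:
# a quiet night, a busy morning, a short campaign spike, then tailing off.
demand = [20] * 8 + [100] * 4 + [400] * 2 + [100] * 6 + [20] * 4

provisioned = licences_for_peak(demand)
used = utilisation(demand, provisioned)
print(f"provisioned for peak: {provisioned} sessions")
print(f"average utilisation:  {used:.0%}")
```

With these made-up figures, sizing for the two-hour spike means buying 500 sessions’ worth of capacity and using about 17 per cent of it across the day. Paying on demand would track the demand curve itself rather than the peak.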

With a cloud-based approach, you needn’t worry about installing duplicated banks of media servers or redundantly paired racks of servers running pre-licensed software in a data centre. Nor do you need to concern yourself with things like brokering media resources, something the unnecessarily complicated IMS has defined along with media resource functions and media resource controllers. Those things are not needed when you’re using the cloud, because of its inherent resilience and scalability. It’s all very well being able to control upwards of 30,000 sessions at a time, but if you haven’t got the underlying media resource functions – scalable and at your fingertips – brokering is a moot point.

A cloud telephony resources platform manages its own, multiple resources and offers redundancy, resilience, and persistence across different locations, different countries and different continents. The user – enterprise or telco – does not need to concern itself with how many media sessions are needed, nor with their availability. Concerns over where to route media service requests and whether they are being handled efficiently simply do not figure. The cloud platform takes care of that.

A further benefit of cloud-based resources is that they make time to market far less of an issue. Instead of presenting interfaces in markup languages such as MSML and VoiceXML, which surely no one wants to learn, cloud platforms offer RESTful APIs and APIs in popular, general-purpose programming languages, such as Python and Java. Those are the languages that today’s web developers know and want to use. When they are coding an application using APIs from several sources (a typical mashup scenario), a familiar programming language is a basic expectation.
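As a rough illustration of what that looks like to a web developer, the sketch below uses only the Python standard library to build a RESTful request asking a platform to speak some text on a call. The base URL, the `/calls/{id}/actions` path, the JSON field names, and the bearer token are all hypothetical placeholders, not the API of any real platform. The request is built but deliberately not sent.

```python
import json
import urllib.request

def build_say_request(base_url, call_id, text, token):
    """Build (but do not send) a POST asking a hypothetical platform to speak text."""
    # Hypothetical JSON payload; field names are illustrative only.
    body = json.dumps({"action": "say", "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/calls/{call_id}/actions",  # hypothetical resource path
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

req = build_say_request(
    "https://api.example.invalid/v1",  # placeholder host, not a real service
    "abc123",
    "Your balance is ten pounds.",
    "TOKEN",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

There is no markup dialect to learn here: the whole exchange is an HTTP verb, a URL, and a JSON body, which is the same shape as the other web APIs a mashup is likely to call.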

Traditional media servers have had their day. The days of cloud-based telephony media resources are here – today and tomorrow – and that’s the future.

