Lies, damn lies and statistics

I’ve got a great product, but do I need to manufacture hype in order to sell it? There’s nothing wrong with using a little ‘artistic license’ to get an otherwise complex concept across, but you should draw the line at misleading. Using big numbers to beguile the audience is as bad as using big words to disguise the true meaning of something.

The need to describe their products has marketing professionals producing everything from datasheets to application notes, via white papers and case studies. That’s notwithstanding more technically focussed documents, such as API and user guides, which are the province of their engineering colleagues. Of course, everyone is searching for the best way to illustrate their product’s unique selling points (USPs).

Recently, I’ve noticed a trend in the realm of voice biometrics that I’ve found rather irritating. It centres around what, on the face of it, seems like a legitimate differentiator. It involves claims such as “my product is better, because it measures (or analyses) more characteristics than any other product.”

The significance of this becomes apparent when you examine some of the core principles of voice biometrics. Essentially, the technology works by analysing samples of audio from a speaker and producing a reference model for that person (a process known as enrolment). Thereafter, when seeking to verify the identity of that person, a fresh audio sample is captured, analysed and compared to the reference model. The resulting statistical output is a measure of confidence in the speaker being who they claim to be.

That being the case, you could be forgiven for thinking that the more points of comparison available for analysis, the better the result. Well, not really. You see, more is not necessarily better. If you can achieve the same result (statistical likelihood) from fewer points of reference, surely that’s a better outcome for everyone?

Let’s look at another biometric methodology as a comparison.
Did you know that the FBI standard for fingerprint identification requires just 10 out of a possible 36 minutiae to establish a match? That raises the question: if there were more than 36 points of comparison, would the accuracy be any greater?

Here are some examples of the claims I mean: “We measure more than 100 unique characteristics to match someone’s voice.” Or, “Our next generation system analyses 10 times the number of audio features.” And my personal favourite: “Our product measures nearly 4000 voice characteristics per second.” At first glance these seem like impressive numbers, but in truth they are meaningless statistics.

The “nearly 4000” figure quoted above sounds remarkably like a basic ‘textbook’ telephone bandwidth voice biometric system. Such systems commonly use 13 Mel Frequency Cepstral Coefficients (MFCCs), each one supplemented with two dynamic coefficients (39 values in total). Normally, those are calculated every 10 milliseconds (100 times per second) which, coincidentally, gives a total of 3,900 values per second. However, it is misleading to refer to those as 3,900 ‘characteristics’, because none of them on its own is in any way characteristic of a particular speaker.

If it’s all about the numbers, why not use the baseline 2016 National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) system figure? That used 23 MFCCs, plus dynamic coefficients, giving 6,000 ‘characteristics’ per second. That’s half as good again, right?

Commercial voice biometric systems use multiple MFCCs, proprietary coefficient values and prosodic features, all calculated at varying rates. The result is many thousands of characteristics per second of active speech. However, this number has no significance to an end user. An MFCC is no more a characteristic of a speech signal than the original time domain samples of the audio waveform.
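
The arithmetic behind the “nearly 4000” figure is easy to reproduce. The short Python sketch below is illustrative only: the function name `values_per_second` and its parameters are hypothetical, and the 8 kHz sampling rate used for the comparison is an assumption based on ordinary narrowband telephony, not a figure taken from any vendor.

```python
def values_per_second(static_coeffs: int, dynamic_per_static: int,
                      frame_shift_ms: float) -> int:
    """Count the feature values a simple MFCC front end emits each second.

    static_coeffs      -- number of MFCCs per frame (13 in the textbook case)
    dynamic_per_static -- dynamic coefficients added per MFCC (2 in the text)
    frame_shift_ms     -- time between successive frames (10 ms in the text)
    """
    values_per_frame = static_coeffs * (1 + dynamic_per_static)
    frames_per_second = 1000 / frame_shift_ms
    return round(values_per_frame * frames_per_second)


# Textbook system described above: 13 MFCCs, two dynamic coefficients each,
# one frame every 10 ms.
textbook = values_per_second(13, 2, 10)
print(textbook)  # 3900

# Assumption for comparison: telephone audio sampled at 8 kHz already
# supplies 8,000 raw time-domain samples per second.
raw_samples_per_second = 8000
print(raw_samples_per_second > textbook)  # True
```

Under those assumptions the raw waveform yields more values per second than the “nearly 4000” headline figure, which is one way to see why counting values says nothing about how well a system distinguishes speakers.
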
This idea that bigger is better could be stretched to extreme lengths, with vendors choosing to quote individual bits as characteristics, arriving at ridiculous figures such as 256,000 per second. What matters to anyone wishing to delve into the detail is the definition of the characteristic features, the algorithms used to calculate them, and how they are used in the rest of the system.

It’s a bit like digital photography. Just because you have more pixels, it doesn’t mean you’ll get a better-quality image. Quality output is more about the lens, the sensor and the engineering than it is about the sheer number of pixels.

Be wary of vendors who claim there are thousands of unique characteristics being measured, and that those include both physical and behavioural voice characteristics. Physical characteristics, like the size and shape of the larynx or nasal cavity, play an essential role in our voice identity. Exactly how is a system measuring these?

When it comes to assessing the performance of voice biometrics solutions, don’t get distracted by the numbers. As with many things in life, look for quality over quantity.

If you are thinking of introducing voice biometrics for authentication and verification in your business, and would like to discuss your requirements, contact one of our consultants today.
