Lies, damn lies and statistics

I’ve got a great product, but do I need to manufacture hype in order to sell it? There’s nothing wrong with using a little ‘artistic licence’ to get an otherwise complex concept across, but you should draw the line at misleading. Using big numbers to beguile the audience is as bad as using big words to disguise the true meaning of something.

The need to describe their products has marketing professionals producing everything from datasheets to application notes, via white papers and case studies. That’s without counting more technically focussed documents, such as API and user guides, which are the province of their engineering colleagues. Of course, everyone is searching for the best way to illustrate their product’s unique selling points (USPs).

Recently, I’ve noticed a trend in the realm of voice biometrics that I’ve found rather irritating. It centres on what, on the face of it, seems like a legitimate differentiator: claims such as “my product is better, because it measures (or analyses) more characteristics than any other product.”

The significance of this becomes apparent when you examine some of the core principles of voice biometrics. Essentially, the technology works by analysing samples of audio from a speaker and producing a reference model for that person (a process known as enrolment). Thereafter, when seeking to verify the identity of that person, a fresh audio sample is captured, analysed and compared to the reference model. The resulting statistical output is a measure of confidence that the speaker is who they claim to be.

That being the case, you could be forgiven for thinking that the more points of comparison available for analysis, the better the result. Well, not really. You see, more is not necessarily better. If you can achieve the same result (statistical likelihood) from fewer points of reference, surely that’s a better outcome for everyone? Let’s look at another biometric methodology for comparison.
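Before that comparison, the enrolment and verification steps just described can be made concrete with a deliberately simplified Python sketch, assuming numpy. It is a toy, not any vendor’s method: production systems use far richer statistical models and calibrated scores. Here, enrolment averages feature vectors into a reference model, and verification uses the cosine similarity between that model and a fresh sample as the confidence score. The function names, the simulated 20-dimensional ‘voices’ and the 0.8 threshold are all hypothetical.

```python
import numpy as np


def embed(features: np.ndarray) -> np.ndarray:
    """Collapse a (num_coeffs, num_frames) feature matrix into one unit-length vector."""
    v = features.mean(axis=1)
    return v / np.linalg.norm(v)


def enrol(samples: list) -> np.ndarray:
    """Enrolment: average several samples' embeddings into a reference model."""
    model = np.mean([embed(s) for s in samples], axis=0)
    return model / np.linalg.norm(model)


def verify(model: np.ndarray, sample: np.ndarray, threshold: float = 0.8):
    """Verification: score a fresh sample against the reference model.

    Cosine similarity stands in for a confidence score; the 0.8 threshold
    is an arbitrary, hypothetical operating point."""
    score = float(np.dot(model, embed(sample)))
    return score, score >= threshold


# Simulated speakers: each hypothetical "voice" is a fixed offset in a
# 20-dimensional feature space, and each recording adds frame-level noise.
rng = np.random.default_rng(0)
voice_a = rng.standard_normal((20, 1))
voice_b = rng.standard_normal((20, 1))


def record(voice: np.ndarray, frames: int = 200) -> np.ndarray:
    """Simulate a 2-second recording (200 frames at 100 frames per second)."""
    return voice + 0.5 * rng.standard_normal((voice.shape[0], frames))


model_a = enrol([record(voice_a) for _ in range(3)])
print("genuine:", verify(model_a, record(voice_a)))
print("impostor:", verify(model_a, record(voice_b)))
```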
Did you know that the FBI standard for fingerprint identification requires just 10 out of a possible 36 minutiae to establish a match? It raises the question: if there were more than 36 points of comparison, would the accuracy be any greater?

Returning to voice biometrics, here are some examples of the kind of claim I mean: “We measure more than 100 unique characteristics to match someone’s voice.” Or, “Our next generation system analyses 10 times the number of audio features.” And my personal favourite: “Our product measures nearly 4,000 voice characteristics per second.” At first glance these seem like impressive numbers, but in truth they are meaningless statistics.

The “nearly 4,000” figure quoted above sounds remarkably like a basic ‘textbook’ telephone-bandwidth voice biometric system. Such systems commonly use 13 Mel Frequency Cepstral Coefficients (MFCCs), each one supplemented with two dynamic coefficients (39 values in total). Normally, those are calculated every 10 milliseconds (100 times per second) which, coincidentally, gives a total of 3,900 values per second. However, it is misleading to refer to those as 3,900 ‘characteristics’, because none of them on its own is in any way characteristic of a particular speaker.

If it’s all about the numbers, why not use the figure from the baseline system for the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE)? That used 23 MFCCs, plus dynamic coefficients, giving 6,900 ‘characteristics’ per second. That’s nearly twice as good, right?

Commercial voice biometric systems use multiple MFCCs, proprietary coefficient values and prosodic features, all calculated at varying rates. The result is many thousands of characteristics per second of active speech. However, this number has no significance to an end user. An MFCC is no more a characteristic of a speech signal than the original time-domain samples of the audio waveform.
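The arithmetic behind these MFCC figures is easy to check. Below is a minimal Python sketch, assuming numpy; the helpers `deltas` and `values_per_second` are hypothetical names, the MFCCs are random placeholders rather than values computed from audio, and the two dynamic coefficients use the common regression-style delta formula over two frames either side (an assumption, since implementations vary). Because the count is simply coefficients per frame times frames per second, it says nothing about the speaker: 13 MFCCs give 39 values per frame and 3,900 per second, while 23 MFCCs with the same two dynamic coefficients give 69 values per frame, or 6,900 per second.

```python
import numpy as np


def deltas(feats: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression-based dynamic (delta) coefficients over +/- n frames.

    feats has shape (num_coeffs, num_frames); the output has the same shape."""
    num_frames = feats.shape[1]
    # Repeat the first and last frames so every frame has n neighbours each side
    padded = np.pad(feats, ((0, 0), (n, n)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros(feats.shape, dtype=float)
    for k in range(1, n + 1):
        ahead = padded[:, n + k : n + k + num_frames]
        behind = padded[:, n - k : n - k + num_frames]
        out += k * (ahead - behind)
    return out / denom


def values_per_second(num_mfcc: int, frame_rate_hz: int = 100,
                      seconds: float = 1.0, seed: int = 0) -> int:
    """Count the values a static + delta + delta-delta front end emits per second."""
    num_frames = int(round(frame_rate_hz * seconds))
    rng = np.random.default_rng(seed)
    # Placeholder MFCCs: random numbers stand in for coefficients computed from audio
    mfcc = rng.standard_normal((num_mfcc, num_frames))
    d1 = deltas(mfcc)  # first dynamic coefficient per MFCC
    d2 = deltas(d1)    # second dynamic coefficient per MFCC
    features = np.vstack([mfcc, d1, d2])  # shape (3 * num_mfcc, num_frames)
    return int(round(features.size / seconds))


print(values_per_second(13))  # → 3900 (textbook telephone-band front end)
print(values_per_second(23))  # → 6900 (23 MFCCs, same two dynamic coefficients)
```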
This idea that bigger is better could be stretched to extreme lengths, with vendors choosing to quote individual bits as characteristics, arriving at ridiculous figures such as 256,000 per second. What matters to anyone wishing to delve into the detail is the definition of the characteristic features, the algorithms used to calculate them, and how they are used in the rest of the system.

It’s a bit like digital photography. Just because you have more pixels, it doesn’t mean you’ll get a better-quality image. Quality output is more about the lens, the sensor and the engineering than about the sheer number of pixels.

Be wary of vendors who claim there are thousands of unique characteristics being measured, and that those include both physical and behavioural voice characteristics. Physical characteristics, like the size and shape of the larynx or nasal cavity, play an essential role in our voice identity. But a voice biometric system only ever receives the audio signal, so exactly how is it measuring these?

When it comes to assessing the performance of voice biometrics solutions, don’t get distracted by the numbers. As with many things in life, look for quality over quantity.

If you are thinking of introducing voice biometrics for authentication and verification in your business, and would like to discuss your requirements, contact one of our consultants today.

Written on 17 September 2019.

The Aculab blog

News, views and industry insights from Aculab
