Lies, damn lies and statistics

I’ve got a great product, but do I need to manufacture hype in order to sell it? There’s nothing wrong with using a little ‘artistic license’ to get a complex concept across, but you should draw the line at misleading your audience. Using big numbers to beguile people is as bad as using big words to disguise the true meaning of something.

The need to describe their products has marketing professionals producing everything from datasheets to application notes, via white papers and case studies. That’s before you count the more technically focussed documents, such as API and user guides, which are the province of their engineering colleagues. Of course, everyone is searching for the best way to illustrate their product’s unique selling points (USPs).

Recently, I’ve noticed a trend in the realm of voice biometrics that I find rather irritating. It centres on what, on the face of it, seems like a legitimate differentiator: claims such as “my product is better, because it measures (or analyses) more characteristics than any other product.”

The significance of this becomes apparent when you examine some of the core principles of voice biometrics. Essentially, the technology works by analysing samples of audio from a speaker and producing a reference model for that person (a process known as enrolment). Thereafter, when seeking to verify the identity of that person, a fresh audio sample is captured, analysed and compared to the reference model. The resulting statistical output is a measure of confidence in the speaker being who they claim to be.

That being the case, you could be forgiven for thinking that the more points of comparison available for analysis, the better the result. Well, not really. More is not necessarily better. If you can achieve the same result (statistical likelihood) from fewer points of reference, surely that’s a better outcome for everyone? Let’s look at another biometric methodology as a comparison.

Did you know that the FBI standard for fingerprint identification requires just 10 out of a possible 36 minutiae to establish a match? It raises the question: if there were more than 36 points of comparison, would the accuracy be any greater?

Here are some examples of the claims I mean: “We measure more than 100 unique characteristics to match someone’s voice.” Or, “Our next generation system analyses 10 times the number of audio features.” And my personal favourite: “Our product measures nearly 4000 voice characteristics per second.” At first glance these seem like impressive numbers, but in truth they are meaningless statistics.

The “nearly 4000” figure quoted above sounds remarkably like a basic ‘textbook’ telephone bandwidth voice biometric system. Such systems commonly use 13 Mel Frequency Cepstral Coefficients (MFCCs), each one supplemented with two dynamic coefficients (39 values in total). Normally, those are calculated every 10 milliseconds (100 times per second) which, coincidentally, gives a total of 3,900 values per second. However, it is misleading to refer to those as 3,900 ‘characteristics’, because none of them on its own is in any way characteristic of a particular speaker.

If it’s all about the numbers, why not use the baseline 2016 National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) system figure? That used 23 MFCCs, plus dynamic coefficients, giving 6,000 ‘characteristics’ per second. That’s half as good again, right?

Commercial voice biometric systems use multiple MFCCs, proprietary coefficient values and prosodic features, all calculated at varying rates. The result is many thousands of characteristics per second of active speech. However, this number has no significance to an end user. An MFCC is no more a characteristic of a speech signal than the original time domain samples of the audio waveform.
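
To see how easily such a headline number is produced, the ‘textbook’ arithmetic can be written out in a few lines of Python. This is only a sketch: the function name `values_per_second` and its parameters are made up for illustration, and the 5 millisecond frame step in the second call is an assumed setting chosen to make the point, not a figure from any real product.

```python
def values_per_second(static_coeffs, dynamic_per_static=2, frame_step_ms=10):
    """Count the feature values a front end emits per second of speech.

    static_coeffs      -- number of MFCCs per frame (13 in the textbook case)
    dynamic_per_static -- dynamic coefficients added per MFCC (two, as above)
    frame_step_ms      -- how often a frame is calculated, in milliseconds
    """
    values_per_frame = static_coeffs * (1 + dynamic_per_static)
    frames_per_second = 1000 // frame_step_ms
    return values_per_frame * frames_per_second


# Textbook telephone-bandwidth system: 13 MFCCs, two dynamic
# coefficients each, one frame every 10 ms.
print(values_per_second(13))                   # 3900

# Hypothetical tweak: the same 13 MFCCs, calculated every 5 ms instead.
# The headline figure doubles, yet nothing new is being measured.
print(values_per_second(13, frame_step_ms=5))  # 7800
```

The count moves with a configuration setting, which is why it says so little about how well a system tells one speaker from another.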

This idea that bigger is better could be stretched to extreme lengths, with vendors choosing to quote individual bits as characteristics, arriving at ridiculous figures such as 256,000 per second. What is important to anyone wishing to delve into the detail is the definition of the characteristic features, the algorithms used to calculate them, and how they are used in the rest of the system.

It’s a bit like digital photography. Just because you have more pixels, it doesn’t mean you’ll get a better-quality image. Quality output is more about the lens, the sensor and the engineering than it is the sheer number of pixels.

Be wary of vendors who claim there are thousands of unique characteristics being measured, and that those include both physical and behavioural voice characteristics. Physical characteristics, like the size and shape of the larynx or nasal cavity, play an essential role in our voice identity. Exactly how is a system measuring these?

When it comes to assessing the performance of voice biometrics solutions, don’t get distracted by the numbers. As with many things in life, look for quality over quantity.

If you are thinking of introducing voice biometrics for authentication and verification in your business, and would like to discuss your requirements, contact one of our consultants today.
