Lies, damn lies and statistics

I’ve got a great product, but do I need to manufacture hype in order to sell it? There’s nothing wrong with using a little ‘artistic licence’ to get an otherwise complex concept across, but you should draw the line at misleading. Using big numbers to beguile the audience is as bad as using big words to disguise the true meaning of something.

The need to describe their products has marketing professionals producing everything from datasheets to application notes, via white papers and case studies. That’s not to mention more technically focussed documents, such as API and user guides, which are the province of their engineering colleagues. Of course, everyone is searching for the best way to illustrate their product’s unique selling points (USPs).

Recently, I’ve noticed a trend in the realm of voice biometrics that I’ve found rather irritating. It centres on what, on the face of it, seems like a legitimate differentiator: claims such as “my product is better, because it measures (or analyses) more characteristics than any other product.”

The significance of this becomes apparent when you examine some of the core principles of voice biometrics. Essentially, the technology works by analysing samples of audio from a speaker and producing a reference model for that person (a process known as enrolment). Thereafter, when seeking to verify the identity of that person, a fresh audio sample is captured, analysed and compared to the reference model. The resulting statistical output is a measure of confidence that the speaker is who they claim to be.

That being the case, you could be forgiven for thinking that the more points of comparison available for analysis, the better the result. Well, not really. You see, more is not necessarily better. If you can achieve the same result (statistical likelihood) from fewer points of reference, surely that’s a better outcome for everyone? Let’s look at another biometric methodology as a comparison.
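Before turning to fingerprints, the enrolment and verification flow described above can be sketched in a few lines. This is a toy, not how any commercial engine works: it assumes each audio sample has already been converted into a list of feature frames, builds the reference model as a simple average of those frames, and uses cosine similarity as the score. The names `enrol` and `verify` and the 0.8 threshold are hypothetical. Real systems typically use statistical models (for example GMMs, i-vectors or neural embeddings) and report a calibrated, likelihood-style score.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def enrol(frames: Sequence[Frame]) -> List[float]:
    """Toy enrolment: the reference model is the mean of the enrolment frames.

    A hypothetical stand-in for a real speaker model (GMM, i-vector, embedding).
    """
    if not frames:
        raise ValueError("at least one feature frame is needed to enrol")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine_similarity(a: Frame, b: Frame) -> float:
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(model: Frame, frames: Sequence[Frame],
           threshold: float = 0.8) -> Tuple[float, bool]:
    """Toy verification: summarise the fresh sample the same way as enrolment,
    score it against the reference model and compare with a hypothetical threshold."""
    probe = enrol(frames)
    score = cosine_similarity(model, probe)
    return score, score >= threshold


# Usage with made-up two-dimensional "features":
model = enrol([[1.0, 0.0], [0.9, 0.1]])
print(verify(model, [[1.0, 0.05]]))  # high score, accepted
print(verify(model, [[0.0, 1.0]]))   # low score, rejected
```

Note that the reference model has the same length as a single frame, however many frames went into enrolment; the decision rests on how well the score separates the genuine speaker from an impostor, not on how many values were processed along the way.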
Did you know that the FBI standard for fingerprint identification requires just 10 out of a possible 36 minutiae to establish a match? It raises the question: if there were more than 36 points of comparison, would the accuracy be any greater?

Here are some examples of the claims I mean: “We measure more than 100 unique characteristics to match someone’s voice.” Or, “Our next generation system analyses 10 times the number of audio features.” And my personal favourite: “Our product measures nearly 4000 voice characteristics per second.” At first glance these seem like impressive numbers, but in truth they are meaningless statistics.

The “nearly 4000” figure quoted above sounds remarkably like a basic ‘textbook’ telephone-bandwidth voice biometric system. Such systems commonly use 13 Mel Frequency Cepstral Coefficients (MFCCs), each one supplemented with two dynamic coefficients (39 values in total). Normally, those are calculated every 10 milliseconds (100 times per second) which, coincidentally, gives a total of 3,900 values per second. However, it is misleading to refer to those as 3,900 ‘characteristics’, because none of them on its own is in any way characteristic of a particular speaker.

If it’s all about the numbers, why not use the baseline 2016 National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) system figure? That used 20 MFCCs, each with two dynamic coefficients (60 values per frame), giving 6,000 ‘characteristics’ per second. That’s half as good again, right?

Commercial voice biometric systems use multiple MFCCs, proprietary coefficient values and prosodic features, all calculated at varying rates. The result is many thousands of characteristics per second of active speech. However, this number has no significance to an end user. An MFCC is no more a characteristic of a speech signal than the original time-domain samples of the audio waveform.
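The arithmetic behind these figures is easy to reproduce. The sketch below assumes the textbook set-up described above: a fixed frame shift (10 ms by default) and two dynamic coefficients (delta and delta-delta) for every static coefficient, so 13 static coefficients give 3,900 values per second, and 60 values per frame (for example 20 static coefficients plus their two dynamic sets) gives 6,000. The function names are hypothetical. It also computes delta coefficients with the common regression formula over ±N neighbouring frames (N = 2 here is an assumption; implementations vary), which shows that the dynamic values are derived from the static ones rather than measured as new properties of the voice.

```python
from typing import List, Sequence


def values_per_frame(num_ceps: int, dynamic_orders: int = 2) -> int:
    """Static coefficients plus one extra set per dynamic order (delta, delta-delta)."""
    return num_ceps * (1 + dynamic_orders)


def values_per_second(num_ceps: int, frame_shift_ms: float = 10.0,
                      dynamic_orders: int = 2) -> float:
    """Feature values produced per second of audio at a fixed frame shift."""
    frames_per_second = 1000.0 / frame_shift_ms
    return values_per_frame(num_ceps, dynamic_orders) * frames_per_second


def deltas(frames: Sequence[Sequence[float]], n_max: int = 2) -> List[List[float]]:
    """Regression-based delta coefficients.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames repeated so every frame has neighbours.
    """
    if not frames:
        return []
    last = len(frames) - 1
    denom = 2 * sum(n * n for n in range(1, n_max + 1))
    result = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, n_max + 1):
                ahead = frames[min(t + n, last)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        result.append(row)
    return result


print(values_per_frame(13))   # 39
print(values_per_second(13))  # 3900.0
print(values_per_second(20))  # 6000.0
# A steadily rising coefficient has a constant slope of 1.0 per frame:
print(deltas([[0.0], [1.0], [2.0], [3.0], [4.0]])[2])  # [1.0]
```

Applying `deltas` again to its own output gives the delta-delta set, which is how 13 static coefficients become 39 values per frame.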
This idea that bigger is better could be stretched to extreme lengths, with vendors choosing to quote individual bits as characteristics, arriving at ridiculous figures such as 256,000 per second. What matters to anyone wishing to delve into the detail is the definition of the characteristic features, the algorithms used to calculate them, and how they are used in the rest of the system.

It’s a bit like digital photography. Just because you have more pixels, it doesn’t mean you’ll get a better-quality image. Quality output is more about the lens, the sensor and the engineering than the sheer number of pixels.

Be wary of vendors who claim that thousands of unique characteristics are being measured, and that those include both physical and behavioural voice characteristics. Physical characteristics, like the size and shape of the larynx or nasal cavity, play an essential role in our voice identity. But exactly how is a system measuring these from an audio recording?

When it comes to assessing the performance of voice biometrics solutions, don’t get distracted by the numbers. As with many things in life, look for quality over quantity.

If you are thinking of introducing voice biometrics for authentication and verification in your business, and would like to discuss your requirements, contact one of our consultants today.

The Aculab blog

News, views and industry insights from Aculab
