Choosing your passphrase

If you have a speaker verification system (or plan to implement one) and haven’t decided on a passphrase yet, the following may be of interest. You’ve probably seen the advert where the little kid says “my voice is my password” to access his phone. For secure applications, this isn’t necessarily the best idea. Commonly spoken phrases are, by their very nature, easy to predict and therefore susceptible to spoofing attacks. Similarly, using the same passphrase for all your applications (contact centre solutions, IVR systems, mobile apps, etc.) is akin to having the same password for everything. Not a good idea.

Granted, it’s not quite as bad as having "Password" as your password, because the biometric component of speaker verification adds a much-needed degree of security. Even so, it’s unlikely that your corporate security policy would permit the same passphrase to be used by every user across multiple systems. Ideally, your speaker verification system should be flexible enough to allow users to choose their own passphrase.

What makes a good passphrase?

In a text-dependent system (i.e. one that relies on a specific phrase or sequence of words), you should be looking for a unique arrangement of words that takes about two to three seconds to utter. We recommend a phrase with a minimum of four syllables; something that is both easy to say and easy to remember. This is enough to create a viable voiceprint when enrolment involves the analysis of several repeats of the same passphrase.

From a security point of view, try to avoid a passphrase that might be easily associated with you as an individual. Social engineering is a popular identity theft tool, so avoid something predictable like your home address.

In a text-dependent system, very few phonemes (the distinct, audible sounds that make up a language) are needed, provided you have enough samples. NB: although the English alphabet has 26 letters, the language has around 44 phonemes. If you are enrolling via repetition, repeating the same sounds within the passphrase is not a bad thing, because we do not always pronounce them in the same way.
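As a rough illustration of the guidance above, the checks can be sketched in Python. This is a minimal sketch under stated assumptions, not part of any speaker verification product: the function names, the small list of common phrases and the vowel-group syllable estimate are all hypothetical, and the two to three second duration can only be confirmed from a recording, not from text.

```python
import re
from typing import Iterable, List

# Hypothetical examples of commonly spoken phrases; a real deployment
# would maintain its own, much longer list.
COMMON_PHRASES = {"my voice is my password", "open sesame"}

MIN_SYLLABLES = 4  # the minimum recommended in the text above


def estimate_syllables(phrase: str) -> int:
    """Rough estimate: one syllable per run of vowels in each word."""
    words = re.findall(r"[a-z]+", phrase.lower())
    return sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words)


def screen_passphrase(phrase: str, personal_terms: Iterable[str]) -> List[str]:
    """Return the reasons a phrase looks weak (an empty list if none)."""
    problems = []
    # Lower-case and strip punctuation so comparisons are forgiving.
    normalised = " ".join(re.findall(r"[a-z0-9]+", phrase.lower()))
    if estimate_syllables(phrase) < MIN_SYLLABLES:
        problems.append("fewer than four syllables")
    if normalised in COMMON_PHRASES:
        problems.append("commonly spoken phrase")
    if any(term.lower() in normalised for term in personal_terms):
        problems.append("contains personal information")
    return problems
```

A caller might pass the user's street name or surname as `personal_terms`, for example `screen_passphrase("Twelve Baker Street London", ["Baker Street"])`. The syllable estimate is deliberately crude, so it is best treated as a prompt to reconsider a phrase, not as a hard rule.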

Enrolment vs. verification

If you are using a text-dependent passphrase for verification, it makes sense to use the same phrase for enrolment.

However, if you are implementing a text-independent system, the ideal enrolment would capture a greater range of phonemic content (sounds), repeated in many different contexts. Enrolment data will need to be detailed enough to cover all the sounds expected to be encountered during verification. Recordings with lots of syllables will generally produce more precise models and better verification accuracy. It is rare for one party in a telephone conversation to speak continuously for more than 10 to 20 seconds, so enrolment recordings should comprise multiple shorter passages, captured throughout the duration of a conversation. Verification is then achieved during extended dialogue between the caller and an agent, or through spoken responses to an IVR system, rather than by repeating a specific passphrase.

If you are implementing a text-prompted system, where the response will also be recognised using ASR, several examples of each possible prompt will be required for enrolment. If your prompt is to be a random four-digit sequence drawn from the digits 1 to 9, enrolment should consist of repeating each digit several times. Counting from one to nine and back again in separate recordings will suffice.

Your choice of active or passive enrolment is likely to be determined by what is most practical at the time of enrolment. A passive enrolment, where enough audio is captured during a conversation, may be a more natural experience but a less practical one. Repeating a four-second active passphrase or number sequence several times is a more artificial, but more efficient, process.

To find out more about speaker verification and authentication, check out our Look who's talking white paper.
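The text-prompted digit scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the transcripts are assumed to arrive as digit strings from the ASR step, and the threshold of two repeats per digit is a guess at what "several times" might mean in practice.

```python
import secrets
from typing import List

DIGITS = "123456789"  # prompts use the digits 1 to 9, as in the text above


def make_prompt(length: int = 4) -> str:
    """Return a random digit sequence to prompt the caller with."""
    return "".join(secrets.choice(DIGITS) for _ in range(length))


def enrolment_scripts() -> List[str]:
    """Scripts for counting from one to nine and back, as separate recordings."""
    up = " ".join(DIGITS)
    down = " ".join(reversed(DIGITS))
    return [up, down]


def covers_all_digits(transcripts: List[str], min_repeats: int = 2) -> bool:
    """Check that every digit occurs at least min_repeats times in total.

    min_repeats is an assumed threshold, not a figure from the text.
    """
    counts = {d: 0 for d in DIGITS}
    for text in transcripts:
        for ch in text:
            if ch in counts:
                counts[ch] += 1
    return all(n >= min_repeats for n in counts.values())
```

Using `secrets` instead of `random` is a design choice here: the prompt is meant to be unpredictable to an attacker holding recordings, so a cryptographically strong source seems the safer default. Calling `covers_all_digits(enrolment_scripts())` confirms that counting up and back down yields two examples of every digit.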

