Are facemasks a problem for Voice Biometrics?

Wearing a mask is now the primary way to limit the spread of coronavirus, and has been found to reduce the daily growth rate of reported infections in large-scale populations by around 45%. But this raises a potential problem for voice biometric security.

The effect of facemasks on Voice Biometrics: an experiment

 

Wearing a mask has become the primary, everyday way to limit the spread of coronavirus, and has been reported to reduce the daily growth rate of reported infections by around 45% in large-scale populations.

Face coverings have been mandatory on public transport in the UK since June 15th, and became mandatory in public in Germany in early April. In both countries, they are among the most familiar signs of ‘the new normal’.

This raises a potential problem for voice biometric security.

People’s behaviours adapt to change, and these new behaviours then become normalised. Because of this, there is now greater demand for contactless forms of verification and identification, especially as, alongside mask wearing, contactless identification and verification become associated with the benefits of public health. Gloves defeat fingerprint scanners; masks hinder facial recognition. This blog sets out to find out whether mask wearing affects the ability of voice biometric identification and verification to function properly.

 

A (very) brief overview of Voice Biometrics

 

Voice biometrics is a leading-edge technology that combines high levels of security with ease of access, allowing us to use our voices to navigate our everyday interactions with the digital world around us. A voiceprint, the key, is created either actively, through the user repeating a passphrase, or passively, through conversation with customer agents. When a user wishes to access a service protected by a voice biometric system, all they need to do is speak: the voice biometric AI algorithm decides whether or not the sound of their voice in that moment matches the voiceprint on file, like a key in a lock.

Voice biometrics is currently used, for example, in multi-layer authentication for logging into accounts that hold valuable information, such as online banking. It is used in healthcare to support better diagnosis of patients, and is perhaps most widely deployed in customer-facing applications such as call centres, IVR systems, and mobile applications.
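
To make the ‘key and lock’ analogy concrete, the matching step can be pictured, in heavily simplified form, as comparing a stored voiceprint vector against features extracted from a fresh utterance. The sketch below is purely illustrative: real engines such as VoiSentry use far more sophisticated statistical models, and `make_voiceprint` and `verify` are hypothetical names.

```python
import numpy as np

def make_voiceprint(feature_frames):
    # Collapse per-utterance acoustic feature frames into one vector.
    # (Real engines build far richer statistical models; this is illustrative.)
    return np.mean(feature_frames, axis=0)

def verify(voiceprint, sample_frames, threshold=0.8):
    # Cosine similarity between the stored voiceprint and the new sample:
    # accept the speaker if the score clears the decision threshold.
    sample = np.mean(sample_frames, axis=0)
    score = float(np.dot(voiceprint, sample) /
                  (np.linalg.norm(voiceprint) * np.linalg.norm(sample)))
    return score >= threshold, score

# Toy data: three repetitions of a passphrase, as random stand-ins for
# acoustic feature vectors, offset so the frames correlate.
rng = np.random.default_rng(0)
enrolment = rng.normal(size=(3, 16)) + 5.0
voiceprint = make_voiceprint(enrolment)

# A genuine attempt: the same frames with slight natural variation.
ok, score = verify(voiceprint, enrolment + rng.normal(scale=0.1, size=(3, 16)))
```

The question the rest of this post tests is, in these terms, whether a mask shifts the extracted features far enough to push a genuine speaker’s score below the threshold.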

 

The problem masks could cause for Voice Biometrics

 

Face coverings come in all sizes and thicknesses and can impact a wearer’s speech, by distorting the sound of their voice or by greatly attenuating it. The most effective masks, which create a tighter seal around the mouth and nose, would be expected to have a greater acoustic effect on speech sounds, and could therefore affect the verification process - as the natural sound of the user’s voice will have changed.

Another possible problem is that the way a person speaks through a mask is also likely to change. Users wearing masks may tend to speak more clearly than they would without one, self-conscious that they might not be heard properly, thus changing the sound being analysed on a more fundamental level.

Overall, it's important that the convenience of Voice Biometrics should not be hampered by the user wearing a mask. The simple question therefore is: are Voice Biometric algorithms affected by the user wearing a mask, and if so, how can the algorithm adapt to take into account mask-wearing users?

To understand this problem, we’ve put Aculab’s VoiSentry algorithm to the test. A voiceprint was created without a mask. Four different styles of face covering were then worn, one after the other, and access was attempted.

 

The face coverings used for the test

 



 

#1: Filtering Facepiece (FFP3)
A heavy-duty mask, very similar to the N95 mask; FFP (filtering facepiece) is the European PPE classification.

#2: Fabric Mask
This mask is made of a light, stretchy fabric. Some light fabric masks, however, are tighter, and wrap around the mouth and nose fully.

#3: Disposable Surgical Mask
One of the most commonly available masks, this is made out of a light fabric and creates as much of a seal around the mouth and nose as it can with a small strip of pliable plastic that bends to follow the contours of your face.

#4: Woollen Scarf
Not officially a mask, but a face covering. It also serves as a stand-in for other face coverings and veils.

 

Initial Spectrogram Tests

 

These spectrograms were taken by recording a voice repeating the passphrase “my voice will let me in”, which is used to create a voiceprint with VoiSentry. The audio was recorded in a small, quiet room using a phone, to emulate the most common microphone type. The audio files were then fed into an audio processing program and through a spectrogram plugin. The passphrase was repeated whilst wearing the different masks, keeping the intonation, speed and diction as close as possible to the control test, with the same distance to the microphone and the same position in the room, so that, as far as possible, the only variable was the face covering being swapped in and out.
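
For readers who want to reproduce this kind of plot, a spectrogram can be computed in a few lines. The sketch below uses scipy, and a synthesized stand-in signal (a tone plus a noise burst) rather than the actual phone recordings, which are not reproduced here.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthesize a stand-in signal; a real test would instead load the phone
# recording, e.g. with scipy.io.wavfile.read. One second at 16 kHz:
# a 300 Hz "voiced" tone plus a burst of high-frequency noise standing
# in for a sibilant.
fs = 16000
t = np.arange(fs) / fs
signal = 0.5 * np.sin(2 * np.pi * 300 * t)
signal[8000:9600] += 0.3 * np.random.default_rng(1).normal(size=1600)

# Frequency on the y axis, time on the x axis, power as the heatmap
# value, matching the view described for the passphrase recordings.
freqs, times, power = spectrogram(signal, fs=fs, nperseg=512, noverlap=256)
```

Plotting `power` on a decibel scale (for example with matplotlib’s `pcolormesh`) gives the familiar heatmap view.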

 

  • No Mask (Control Test)

    This is the basis on which the subsequent tests are verified. A voiceprint is created whilst wearing no mask, using a mobile phone’s microphone.

    This spectrogram shows frequency on the y axis and time on the x axis. The greater the amplitude at a given frequency, the stronger the response in the heatmap. Sibilant sounds (voi-CE) register higher on the frequency spectrum; plosive sounds (le-T) inherently carry more amplitude.

    Interestingly, the small feature of T-glottalisation (the way people pronounce their T’s) is a good way to differentiate the sound properties of speech across individuals, as it depends not only on the physicality of the mouth but also on cultural, demographic and situational factors. Here you can begin to see how difficult it would be for someone to imitate another person’s voice at the level on which the algorithm analyses the recordings: everyone’s voice is uniquely different, and the algorithm uses highly developed mathematical frames beyond human judgement.

  • Filtering Face Piece FFP3 ( #1)

    The mask creates a tight seal around the mouth and nose, with a plastic ring around its edges. In comparison to the control test with no mask, the spectrogram shows that some of the higher frequencies and harmonics of the voice are lost. This can be seen most clearly in the sibilance of voi-CE, as well as in the voiceless plosive phoneme T, and is to be expected given the muffling effect of the fabric.

  • Fabric Mask ( #2)

    This mask has an effect on the mid-low frequencies, on the high frequencies and sibilances, and especially on the plosives. The definition and overall amplitude of the heatmap (the amount of red) are much lower.

    The tight, dense fabric not only muffles the sound but also dampens its volume.

  • Disposable Surgical Mask (#3)

    The thin material of the surgical mask lets more of the high frequencies through than the other masks, and sibilances are better defined. Its main effect is on the mid frequencies, which are mildly affected, such as on the longer vowel phonemes “wi-ll”, “m-e” and “i-n”.

  • Scarf (#4)

    The scarf, wrapped tightly in a couple of layers around the mouth and nose, has a large effect on both the mid-low and the high frequencies. Rather than only muffling the sound, pushing the weighting of the spectrum towards the lower end, the scarf reduces amplitude across all frequencies, spreading them more evenly and blurring them together. Overall amplitude is lost, but the frequency spectrum as a whole is more evenly balanced.
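
These visual differences can also be made quantitative, by comparing how much spectral energy survives above a chosen cut-off frequency. The sketch below is illustrative only: it uses synthetic stand-in signals, with a simple first-order low-pass filter as a crude model of fabric attenuation, rather than the actual recordings.

```python
import numpy as np
from scipy.signal import spectrogram

def band_energy_ratio(signal, fs, split_hz=2000):
    # Fraction of total spectral energy above split_hz: a crude proxy
    # for how much high-frequency (sibilant) content gets through.
    freqs, _, power = spectrogram(signal, fs=fs, nperseg=512)
    return power[freqs >= split_hz].sum() / power.sum()

# Stand-in "unmasked" signal: a voiced tone plus broadband noise.
rng = np.random.default_rng(2)
fs = 16000
t = np.arange(fs) / fs
unmasked = np.sin(2 * np.pi * 300 * t) + 0.5 * rng.normal(size=fs)

# Crude model of fabric attenuation: a first-order low-pass filter.
alpha = 0.3
masked = np.empty_like(unmasked)
masked[0] = unmasked[0]
for i in range(1, len(unmasked)):
    masked[i] = alpha * unmasked[i] + (1 - alpha) * masked[i - 1]
```

Comparing the two ratios confirms numerically what the spectrograms show visually: the ‘masked’ signal retains a smaller share of its high-frequency energy.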

 

VoiSentry Tests

 

VoiSentry enrolment and initial verification of the voiceprint were carried out using a mobile phone, with no mask. Three verification attempts were then made against this enrolment for each face covering. Distance to the microphone, diction and other variables were kept the same, with some organic variability allowed.

 

Three-step enrolment, verification, and identification process.

 

RESULTS

#1: Filtering Facepiece (FFP3)
  • Attempt 1: Verification Successful
  • Attempt 2: Verification Successful
  • Attempt 3: Verification Successful
#2: Fabric Mask
  • Attempt 1: Verification Successful
  • Attempt 2: Verification Successful
  • Attempt 3: Verification Successful
#3: Disposable Surgical Mask
  • Attempt 1: Verification Successful
  • Attempt 2: Verification Successful
  • Attempt 3: Verification Successful
#4: Woollen Scarf
  • Attempt 1: Verification Successful
  • Attempt 2: Verification Successful
  • Attempt 3: Verification Successful

 

Conclusions

 

These tests suggest that VoiSentry is not affected by the presence of masks.

Tests were also done with deliberate attempts to modulate the voice, and the way the passphrase was spoken, to exercise the presentation attack (impostor) detection. This detection worked well with all masks, showing that VoiSentry can still confirm a speaker's identity even when they are wearing a protective mask.

What’s more, even if a genuine speaker is rejected, a simple update of their model is all that is needed to restore correct operation. Where re-enrolment is needed (which did not happen in these tests), VoiSentry provides a simple mechanism to update its voiceprints to take mask wearing into account.

The efficiency and efficacy of VoiSentry lie in the fact that the algorithm has been designed to detect and verify users in noisy environments, and to take into account differences in acoustic properties that may come from room reverberation, background noise, and distance from the microphone.

Acoustic properties may also be affected by the hardware that is picking up the signal, which may have built-in enhancements, such as gain control, dynamic equalisation, noise suppression, and other automatic signal processing. This is quite common on mobile phones.

Therefore, wearing a mask has little to no impact on the verification and authentication process, meaning that contactless forms of verification and identification can continue to be deployed to great effect. This gives companies, products and services a competitive edge in customer service situations where voice channels are available and the identity of the caller is critical to the business model.

Public perception of the effect of face coverings on a person's speech is somewhat exaggerated, because human perception of speech is strongly linked to visual perception: being able to see a speaker’s mouth move, discern body language, and pick up subtle changes in facial movement all go towards facilitating easy communication.

Fortunately for Voice Biometrics, algorithms and their performance - that is to say, matching a voiceprint to a user’s voice - operate on a highly attuned mathematical level rather than a psychological or psychoacoustic one, and are designed to emulate human perception, and perhaps even improve upon it.

What is interesting is that although we can use a spectrogram to infer differences in sound properties visually, the VoiSentry algorithm uses thousands of mathematical frames that operate over and above this dimension of our intuition. Indeed, this is the primary benefit, and some would say the magic, of using an AI algorithm: it just works.

 

 
