Aculab Cloud voice and speech analysis system parameters and algorithms


Many diseases and medical conditions affect a subject’s voice and the patterns of their speech. The assessment and diagnosis of these conditions generally involves attendance at specialist clinics where speech and language therapists, or other voice specialists, analyse a number of characteristics of the speech. Their analysis is largely subjective and requires significant levels of training and expertise on the part of the clinician. A detailed history of objective measurements would allow clinicians to make a more informed and accurate assessment, but compiling such a history can be time-consuming and expensive, since it would require regular clinical visits.

Making audio recordings over the telephone can provide a large cost saving, and simultaneously minimise the disruption to both the subject and the clinician. Telephone-quality speech contains a wealth of information, not only allowing for a caller’s speech to be understood, but also for individual speaker characteristics to be identified, and for many abnormalities to be detected and quantified.

Aculab’s voice and speech analysis system, VoiScan, provides a tool that may be of interest to clinicians, speech therapists, medical researchers and other voice professionals, enabling objective measurements of a subject’s voice and speech characteristics to be taken over the phone, via a fully automated dialogue. With suitable customisation, the system could potentially be used for monitoring and screening, providing a highly cost-effective means of managing the respective disorders and diseases, as well as allowing more effective use of resources, including clinicians’ time, equipment and facilities.

Calls can be scheduled as frequently as necessary, and the system can include additional communication between clinician and patient, such as confirmation of the patient’s availability for clinical appointments or prompting the patient to perform any self-medication or other actions that may be required.


A simple demonstration of VoiScan has been set up. In this demonstration, a subject’s responses to a short series of spoken prompts are recorded and analysed, and a report is returned to them via email. In a practical system, the email would be sent to a clinician, therapist or other speech or voice specialist, who would then interpret the results and compare them with the subject’s history before deciding whether a clinical visit was warranted.


Demo parameters and algorithms

The fundamental parameters calculated by the VoiScan system have been designed to be extracted from unconstrained telephone recordings of subjects’ voices.

Aculab has drawn on its extensive experience in real-world telephone systems to ensure that the specifications of the parameters, and the algorithms used to calculate them, provide the fullest and most accurate information regarding each subject’s voice and speech.

The individual parameters measured in this demo are defined below; alternatively, these details can be downloaded as a PDF file.


Voice source parameters

Vocal tract parameters

Articulatory dynamics parameters

Apart from the above parameters, which describe overall characteristics of the subject’s voice and speech, a more detailed analysis can be made of individual speech sounds (phonemes) and the transitions between them. VoiScan can report information regarding the articulation of these sounds in the form of likelihood scores and acoustic segment durations obtained from an automatic phonetic alignment system. This can aid a speech specialist in the identification of specific articulatory problems.
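VoiScan's alignment output format is not described here, so the following Python is only a minimal sketch, assuming a hypothetical phonetic alignment that lists each phoneme with its start and end frame (at an assumed 10 ms frame shift) and a total acoustic log-likelihood. It shows how segment durations and length-normalised likelihood scores of the kind mentioned above might be summarised per phoneme, and how unusually low-scoring segments might be flagged for a specialist to review. The `Segment` type, the helper functions, the threshold and the example values are all illustrative and are not part of VoiScan.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed analysis frame shift of 10 ms (hypothetical; not specified by VoiScan).
FRAME_SHIFT_S = 0.01


@dataclass
class Segment:
    """One phoneme from a hypothetical forced-alignment output."""
    phoneme: str
    start_frame: int        # first frame of the segment (inclusive)
    end_frame: int          # frame after the segment ends (exclusive)
    log_likelihood: float   # total acoustic log-likelihood over the segment

    @property
    def n_frames(self) -> int:
        n = self.end_frame - self.start_frame
        if n <= 0:
            raise ValueError(f"empty segment for {self.phoneme!r}")
        return n

    @property
    def duration_s(self) -> float:
        return self.n_frames * FRAME_SHIFT_S

    @property
    def per_frame_score(self) -> float:
        # Normalise by length so that long and short segments are comparable.
        return self.log_likelihood / self.n_frames


def summarise(segments):
    """Map each phoneme to (mean duration in s, mean per-frame log-likelihood)."""
    by_phone = {}
    for seg in segments:
        by_phone.setdefault(seg.phoneme, []).append(seg)
    return {
        ph: (mean(s.duration_s for s in segs), mean(s.per_frame_score for s in segs))
        for ph, segs in by_phone.items()
    }


def flag_low_scores(segments, threshold):
    """Return segments whose per-frame log-likelihood is below the threshold."""
    return [s for s in segments if s.per_frame_score < threshold]


# Hypothetical alignment of the word "six" (/s ih k s/); values are made up.
example = [
    Segment("s", 0, 12, -60.0),
    Segment("ih", 12, 20, -24.0),
    Segment("k", 20, 26, -54.0),
    Segment("s", 26, 34, -32.0),
]

if __name__ == "__main__":
    for ph, (dur, score) in summarise(example).items():
        print(f"{ph}: mean duration {dur:.2f} s, mean score {score:.1f}")
    print([s.phoneme for s in flag_low_scores(example, threshold=-6.0)])
```

With these made-up values and a threshold of -6.0 per frame, only the /k/ segment is flagged. Normalising the log-likelihood by segment length is one simple choice; a real system might instead compare scores against per-phoneme norms from healthy speakers.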


Aculab’s advanced speech processing group has a number of academic publications in this field, including a monograph, “Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders” (Springer, 2013). This book surveys the use of automatic voice and speech analysis for diagnosis and monitoring.

Aculab’s system emulates many of the parameters traditionally used by speech and language therapists, as described in the book “Parkinson’s Disease and Movement Disorders – Diagnosis and Treatment Guidelines for the Practicing Physician” (Charles H. Adler and J. Eric Ahlskog, Eds., Humana Press, 2000), and in particular in the comprehensive and authoritative chapter by Joseph R. Duffy, “Motor Speech Disorders: Clues to Neurologic Diagnosis”, which describes current “Best Clinical Practice” for the diagnosis and treatment of motor speech disorders.

A number of other research papers have used lexical or phonetic alignment based on Automatic Speech Recognition (ASR) technology to quantify speech parameters akin to many of those listed by Duffy, but which were beyond the capabilities of existing clinical systems. This technology is also a key component of Aculab’s voice and speech analysis system, and can be used to improve the detection of conditions such as Parkinson’s Disease. The system can be used to evaluate over 170 voice and speech characteristics, but the demonstration evaluates only a subset of these, including those that correspond to one or more of the features identified in Duffy’s book chapter.
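As an illustration of how a lexical alignment can be turned into timing measures, the Python sketch below derives speaking rate, articulation rate and pause statistics from hypothetical word-level timings. The `Word` structure, the syllable counts, the 0.2 s pause threshold and the example timings are assumptions chosen purely for illustration; they are not taken from VoiScan or from Duffy's chapter.

```python
from dataclasses import dataclass

# Assumed minimum gap (in seconds) between words that counts as a pause.
MIN_PAUSE_S = 0.2


@dataclass
class Word:
    """One word from a hypothetical lexical (word-level) alignment."""
    text: str
    start_s: float
    end_s: float
    n_syllables: int  # assumed to come from a pronunciation lexicon


def timing_features(words):
    """Derive simple rate and pause measures from time-ordered word timings."""
    if not words:
        raise ValueError("need at least one aligned word")
    total_s = words[-1].end_s - words[0].start_s
    # Silent gaps between consecutive words; only gaps >= MIN_PAUSE_S count as pauses.
    gaps = [b.start_s - a.end_s for a, b in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= MIN_PAUSE_S]
    speaking_s = total_s - sum(pauses)
    syllables = sum(w.n_syllables for w in words)
    return {
        # Speaking rate includes pauses; articulation rate excludes them.
        "speaking_rate": syllables / total_s,
        "articulation_rate": syllables / speaking_s,
        "pause_count": len(pauses),
        "mean_pause_s": sum(pauses) / len(pauses) if pauses else 0.0,
    }


# Hypothetical alignment of a short read phrase; timings are made up.
example = [
    Word("the", 0.00, 0.20, 1),
    Word("rainbow", 0.25, 0.75, 2),
    Word("is", 1.25, 1.40, 1),
    Word("a", 1.45, 1.50, 1),
    Word("division", 1.50, 2.00, 3),
]

if __name__ == "__main__":
    print(timing_features(example))
```

With these example timings, the 0.5 s gap before "is" is the only pause, giving a speaking rate of 4 syllables/s and an articulation rate of about 5.3 syllables/s. Reporting both rates helps separate slow articulation from frequent or prolonged pausing, which a single overall rate would conflate.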