Doctoral theses of the School of Electrical Engineering are available in the open access repository maintained by Aalto, Aaltodoc.
Title of the thesis: Speech-based classification and regression studies on vocal intensity and severity level of Parkinson’s disease
Thesis defender: Manila Kodali
Opponent: Prof. John Hansen, University of Texas at Dallas, US
Custos: Prof. Paavo Alku, Aalto University School of Electrical Engineering
Speech is a fundamental form of human communication conveying linguistic, paralinguistic, and extralinguistic cues. Advances in machine learning (ML) have enabled automatic analysis of such information, offering a scalable and low-cost alternative to traditional clinical voice assessment. This thesis focuses on two interlinked topics, vocal intensity and Parkinson’s disease (PD). The thesis studies the multi-class classification of vocal intensity categories, the prediction of sound pressure level (SPL) from amplitude-normalized speech, and the multi-class classification of PD severity using spectral features, pre-trained embeddings, and wavelet scattering representations with ML models.
For the vocal intensity category classification problem, the best system raised accuracy from the chance level of 25 % to almost 86 %. For the SPL prediction problem, the best regression system estimated the ground truth SPL values from speech signals produced by pathological and healthy speakers with a mean absolute error of approximately 2.0 dB. In the severity classification of PD, monologue and reading tasks outperformed vowel and sentence production tasks, as well as a diadochokinetic task. Articulation and fusion feature sets performed significantly better than phonation and prosody features, while the choice of classifier had only a minor impact on accuracy.
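The two figures of merit quoted above, classification accuracy relative to chance level and mean absolute error (MAE) in dB, can be illustrated with a minimal sketch. The function names and all numeric values below are hypothetical and chosen only for illustration; they are not data or code from the thesis. The sketch also assumes four balanced classes, which is one way a chance level of 25 % can arise.

```python
def chance_level(num_classes: int) -> float:
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1.0 / num_classes


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the reference labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical class labels (0..3) for eight utterances; not thesis data.
true_classes = [0, 1, 2, 3, 0, 1, 2, 3]
pred_classes = [0, 1, 2, 3, 0, 1, 2, 0]

# Hypothetical SPL values in dB; not thesis data.
true_spl_db = [70.0, 75.0, 80.0, 85.0]
pred_spl_db = [72.0, 73.0, 81.0, 82.0]

print(f"chance level: {chance_level(4):.2%}")
print(f"accuracy:     {accuracy(true_classes, pred_classes):.2%}")
print(f"MAE:          {mean_absolute_error(true_spl_db, pred_spl_db):.1f} dB")
```

An MAE of about 2 dB means that the predicted SPL deviates from the reference SPL by roughly 2 dB on average, regardless of the sign of the error.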
In conclusion, spectral features and pre-trained model embeddings, combined with ML models, showed improved performance compared to non-fine-tuned embeddings for both of the studied vocal intensity problems. In particular, the prediction of SPL from amplitude-normalized speech signals stands out as an attractive new method with strong clinical potential because it enables improved interpretability of speech-based biomarking technology when speech is recorded in non-calibrated real-world conditions. Furthermore, the thesis shows that a monologue speaking task combined with articulation and fusion features yielded the highest accuracy in the classification of the severity level of PD.
Keywords: vocal intensity, Parkinson’s disease, sound pressure level, machine learning, classification, regression
The thesis is available for public display 7 days prior to the defence on Aalto University's public display page.
Contact:
Phone: +358505713343
www.linkedin.com/in/manila-kodali-7bb346150