Public defence in Acoustics and Speech Technology, MSc (Tech) Saska Tirronen

Data Efficiency and Domain Robustness in Speech-Based Biomarking of Health.
Public defence from the Aalto University School of Electrical Engineering, Department of Information and Communications Engineering.

Title of the thesis: Data Efficiency and Domain Robustness in Speech-Based Biomarking of Health.

Thesis defender: Saska Tirronen
Opponent: Prof. Juan Ignacio Godino-Llorente, Universidad Politécnica de Madrid (UPM), Spain
Custos: Prof. Paavo Alku, Aalto University School of Electrical Engineering 

Speech carries information not only through words, but also through the way the voice is produced. Changes in voice and speech can reflect changes in health, making speech a promising tool for detecting and monitoring health conditions in a non-invasive and cost-efficient way. Such methods could support healthcare through simple recordings made with a smartphone.

This dissertation studied how machine learning systems can identify health-related information from speech more reliably when medical speech datasets are small and recordings vary in ways unrelated to health. The focus was on classification tasks, such as distinguishing healthy speakers from speakers with voice or speech disorders, and separating disorder types or severity levels. With limited training data, models may learn dataset-specific patterns instead of signs that truly reflect health conditions, especially when training and testing recordings differ because of devices, acoustic environments, or telephone channels. The dissertation examined methods that help systems use limited data more effectively and remain accurate under such changing conditions.

The results show that speech-based health classification can be improved in three complementary ways. First, performance improved when the systems used speech representations that reduced the influence of variation not directly related to health. Representations learned from large speech datasets were especially useful, improving accuracy and robustness compared with conventional speech features. Second, complex classification tasks benefited from being divided into simpler, clinically meaningful sub-tasks: a hierarchical classifier first separated healthy and disordered speech and then distinguished between disorder types, outperforming common multiclass methods. Third, the proposed methods improved robustness under real-world mismatch. In cross-database experiments, they reduced the performance loss caused by training and testing on different datasets, and in telephone-channel experiments, a preprocessing method restored performance close to the level achieved with high-quality recordings.
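The hierarchical idea described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the nearest-centroid classifier is a stand-in chosen only to keep the example dependency-free, and the class names, the label names (healthy, disorder_a, disorder_b), and the two-dimensional feature values are all hypothetical.

```python
from statistics import fmean


def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Stand-in classifier (an assumption): predicts the label of the closest class mean."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids_ = {label: centroid(rows) for label, rows in groups.items()}
        return self

    def predict_one(self, row):
        return min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))


class HierarchicalClassifier:
    """Stage 1: healthy vs. disordered. Stage 2: disorder type, for disordered samples only."""

    def __init__(self, healthy_label="healthy"):
        self.healthy_label = healthy_label

    def fit(self, X, y):
        # Stage 1 sees every sample, with all disorder types merged into one class.
        stage1_y = [label if label == self.healthy_label else "disordered" for label in y]
        self.stage1_ = NearestCentroid().fit(X, stage1_y)
        # Stage 2 is trained on the disordered samples only, with their original labels.
        disordered = [(row, label) for row, label in zip(X, y) if label != self.healthy_label]
        self.stage2_ = NearestCentroid().fit(
            [row for row, _ in disordered], [label for _, label in disordered]
        )
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            if self.stage1_.predict_one(row) == self.healthy_label:
                predictions.append(self.healthy_label)
            else:
                predictions.append(self.stage2_.predict_one(row))
        return predictions


# Invented toy features; real systems would use speech representations instead.
X_train = [
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],   # healthy
    [5.0, 0.0], [5.2, 0.2], [4.8, 0.1],   # disorder_a
    [5.0, 5.0], [5.1, 4.8], [4.9, 5.2],   # disorder_b
]
y_train = ["healthy"] * 3 + ["disorder_a"] * 3 + ["disorder_b"] * 3

model = HierarchicalClassifier().fit(X_train, y_train)
preds = model.predict([[0.1, 0.1], [4.8, 0.2], [5.1, 4.9]])
print(preds)
```

The second stage never sees healthy samples, so it only has to solve the narrower problem of telling disorder types apart. Any binary and multiclass classifier could take the place of the nearest-centroid stand-in at either stage.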

The main conclusion is that speech-based health technology can become more accurate and robust by avoiding unnecessary speaker-, device-, and environment-specific details. The findings support future speech-based tools for healthcare, while showing that further research is needed before such systems can be reliably used in clinical settings.

Key words: speech biomarkers, data efficiency, domain robustness, environmental variability

The thesis is available for public display 7 days prior to the defence on Aalto University's public display page.

Contact: saska.tirronen@aalto.fi 

Doctoral theses of the School of Electrical Engineering

Doctoral theses of the School of Electrical Engineering are available in the open access repository maintained by Aalto, Aaltodoc.
