Public defence in Speech and Language Technology, M.Sc. Ragheb Al-Ghezi

The thesis developed AI systems for assessing speaking skills in Finnish and Swedish, using self-supervised learning with Wav2vec2 to improve ASR accuracy.
- Public defence from the Aalto University School of Electrical Engineering, Department of Information and Communications Engineering

When

14.6.2024 12:00 – 15:00

Where

Computer Science building & Online

Lecture hall T2

Event language(s)

English

The title of the thesis: Use of Self-Supervised Learning in Automated Speaking Scoring for Low-Resource Languages

Doctoral student: Ragheb Al-Ghezi
Opponent: Prof. Helmer Strik, Radboud University, The Netherlands
Custos: Prof. Mikko Kurimo, Aalto University School of Electrical Engineering, Department of Information and Communications Engineering

This PhD thesis focused on developing automatic systems to assess speaking skills for less commonly learned languages like Finnish and Swedish. The purpose was to create tools that help people learn these languages independently and support language tests and teacher training programs despite the limited availability of training data.

The research is relevant to other work in language learning technology because it addresses the challenge of building effective learning tools for languages with limited data. The study tested an AI method called self-supervised learning, using a model known as Wav2vec2, to build automatic speech recognition (ASR) and scoring systems in Swedish and Finnish for young learners and children with speech disorders.

The results showed that fine-tuning the Wav2vec2 model for Swedish significantly reduced errors in recognizing spoken words, achieving a 7% improvement with only a few hours of training data. The model also adapted successfully to tasks that evaluate overall speaking ability and could accurately predict proficiency levels. In addition, the AI assessments of pronunciation and fluency proved as reliable as human evaluations.

The study's main result was that fine-tuned ASR models can serve as an effective basis for automatic assessment of read-aloud and spontaneous speech in low-resource languages like Finnish and Swedish. The research provides new insight into how self-supervised learning can be used to develop language learning tools even with limited data.

The findings can be applied to create better language learning apps, tools for teachers, and resources for speech therapy, especially for languages with fewer learners. The study concludes that advanced AI techniques such as Wav2vec2 can overcome data limitations and significantly improve the accuracy and reliability of automatic speaking assessment systems for less commonly learned languages.

The thesis is available for public display 10 days prior to the defence at: https://aaltodoc.aalto.fi/doc_public/eonly/riiputus/

Contact information

[email protected]

Doctoral theses in the School of Electrical Engineering: https://aaltodoc.aalto.fi/handle/123456789/53

Published: 3.5.2024
Updated: 11.6.2024