Public defence in Acoustics and Speech Technology, M.Sc. Dejan Porjazovski

Public defence from the Aalto University School of Electrical Engineering, Department of Information and Communications Engineering

The title of the thesis: Spoken Language Understanding: Deep Neural Network Approaches for Low-Resource Languages

Thesis defender: Dejan Porjazovski
Opponent: Prof. Yannick Estève, Avignon University, France
Custos: Prof. Mikko Kurimo, Aalto University School of Electrical Engineering

Spoken language understanding encompasses a variety of tools that enable computers to understand human speech. While these tools work reasonably well for high-resource languages such as English, their performance can drop considerably for languages with more limited data, such as Finnish.

This thesis first asks how best to represent the audio signal so that a model can more easily extract semantic information from it. Because there are many ways to convert an audio signal into a meaningful vector representation, called an embedding, the thesis investigates which approach works best in different scenarios. The findings reveal that some embedding methods have better multilingual capabilities than others. Moreover, smaller models with significantly fewer parameters can match or outperform their larger counterparts.

The thesis also investigates whether the end-to-end architectural design can surpass the well-established, traditional cascading approach, which decomposes the system into multiple separately trained modules. To this end, it directly compares cascading and end-to-end systems on various spoken language understanding tasks, such as named entity recognition and topic identification. The findings revealed that end-to-end systems, which jointly optimise all components, are a promising direction for future work.

The final topic of the thesis is the generalisation of end-to-end models. The findings revealed that these models do not satisfy the generalisation criteria defined in the thesis. The thesis also identifies reasons for this limited generalisation, which should be taken into account in the future development of such models.

Spoken language understanding systems are important in hands-free interaction devices such as personal assistants. As these technologies become increasingly embedded in our daily lives, it is crucial to develop reliable models that support low-resource languages. Doing so helps preserve linguistic diversity and prevents English from overtaking other languages in the field of technology.

Keywords: spoken language understanding, low-resource, end-to-end

The thesis is available for public display at Aaltodoc 10 days prior to the defence.

Contact: dejan.porjazovski@aalto.fi

Doctoral theses of the School of Electrical Engineering

Doctoral theses of the School of Electrical Engineering are available in the open access repository maintained by Aalto, Aaltodoc.
