Defence of dissertation in the field of Language Technology, M.Sc. (Tech.) Peter Smit

The title of the thesis is “Modern subword-based models for automatic speech recognition”

This thesis implements subwords in a theoretically sound way for modern finite-state-transducer-based speech recognition systems. The implementation is powerful enough that even single characters can be used as units, which allows truly unlimited-vocabulary decoding and makes it possible to use new character-based language models that could not be used before.

Whereas older-style n-gram language models can handle large word vocabularies, newer neural network language models have practical limits on vocabulary size. It is shown that subword models reduce the need for workarounds and heuristics, and reduce the number of parameters needed to train models of similar power.

Through reasoning and experimentation, it is shown that these new subword-based models outperform word-based models in almost all scenarios. In particular, the subword-based models are strong for languages with very large vocabularies and for languages with limited text resources.
In practice, these inventions and results will make speech recognition more accurate and easier to build, even for languages that have only limited resources available.

Opponent: Professor Thomas Hain, University of Sheffield, UK

Supervisor: Professor Mikko Kurimo, Aalto University School of Electrical Engineering, Department of Signal Processing and Acoustics

Thesis webpage

Contact information: Peter Smit, Department of Signal Processing and Acoustics, [email protected], +31622782851
