
Public defence in Acoustics and Speech Technology, M.Sc. Mohammad Hassan Vali

Public defence from the Aalto University School of Electrical Engineering, Department of Information and Communications Engineering

The title of the thesis: Vector Quantization in Deep Neural Networks for Speech and Image Processing

Thesis defender: Mohammad Hassan Vali
Opponent: Dr. Jean-Marc Valin, Google Inc.
Custos: Prof. Tom Bäckström, Aalto University School of Electrical Engineering, Department of Information and Communications Engineering 

Vector quantization (VQ) is a classic signal processing technique that models the probability density function of a distribution using a set of representative vectors called a codebook (or dictionary). Deep neural networks (DNNs) are a branch of machine learning that has gained popularity in recent decades. Since VQ provides an abstract, high-level discrete representation of a distribution, it has been widely used in DNN-based applications such as speech recognition, image generation, and speech and video coding. Hence, even a small improvement in VQ can significantly boost the performance of many applications dealing with different data types, such as speech, images, video, and text.
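
As a rough illustration of the basic operation, the following Python sketch quantizes a batch of vectors to their nearest codebook entries; the codebook size, dimensionality, and variable names are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Minimal vector quantization sketch: each input vector is replaced by its
# nearest codebook vector (Euclidean distance). Sizes are illustrative only.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))   # 16 codevectors of dimension 8
x = rng.normal(size=(100, 8))         # 100 input vectors to quantize

# Pairwise squared distances between inputs and codevectors
dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
indices = dists.argmin(axis=1)        # index of the nearest codevector
quantized = codebook[indices]         # discrete (quantized) representation
```

The discrete indices are what a coding or recognition system transmits or processes further; the quality of the codebook determines how much information is lost in this step.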

This thesis mainly focuses on improving various VQ methods within deep learning frameworks, including: 

  1. Improvement in Performance: The perceptual quality of the compressed speech signal is improved by replacing scalar quantization with VQ for modeling the spectral envelopes of speech in a machine learning-based speech coding model.
  2. Improvement in Training: VQ is non-differentiable, and thus gradients cannot be backpropagated through it. Noise Substitution in Vector Quantization (NSVQ) is proposed as a new solution to this issue that trains the VQ codebook better than two state-of-the-art solutions, the Straight-Through Estimator and Exponential Moving Average (EMA); see the sketch after this list.
  3. Improvement in Interpretability: By combining VQ with the concept of space-filling curves, a new quantization technique called Space-Filling Vector Quantization (SFVQ) is proposed. This technique helps to interpret the latent spaces of DNNs.
  4. Improvement in Speaker Privacy: SFVQ enables a new clustering of speaker embeddings that enhances speaker privacy in DNN-based speech processing tools that employ VQ.
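
As a loose sketch of the noise-substitution idea mentioned in item 2, the snippet below replaces the non-differentiable codebook assignment with the input plus random noise scaled to the magnitude of the true quantization error, so gradients can reach the encoder and the selected codevectors during training. The function name, framework choice (PyTorch), and implementation details are assumptions for illustration, not the code from the thesis.

```python
import torch

def noise_substitution_quantize(x, codebook):
    """Sketch of a noise-substitution-style quantization step for training.

    x:        (batch, dim) input vectors
    codebook: (num_codes, dim) learnable codevectors
    Returns a differentiable surrogate of the quantized output and the indices.
    """
    # Hard nearest-neighbour assignment (the non-differentiable step)
    dists = torch.cdist(x, codebook)                 # (batch, num_codes)
    indices = dists.argmin(dim=1)
    quantized = codebook[indices]                    # (batch, dim)

    # Replace the quantization error with random noise of the same magnitude
    error_norm = (x - quantized).norm(dim=1, keepdim=True)
    noise = torch.randn_like(x)
    noise = noise / noise.norm(dim=1, keepdim=True)

    # Gradients flow to x and, through error_norm, to the selected codevectors
    x_hat = x + error_norm * noise
    return x_hat, indices
```

At inference time the hard nearest-neighbour assignment would be used directly; the noise substitution is only a training-time surrogate for the non-differentiable step.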

Keywords: Vector Quantization, Deep Neural Networks, Space-Filling Curves, Space-Filling Vector Quantization, Gradient Collapse, Interpretability, Speaker Anonymization

Thesis available for public display 10 days prior to the defence at: https://aaltodoc.aalto.fi/doc_public/eonly/riiputus/

Contact:

Email: mohammad.vali@aalto.fi


Doctoral theses in the School of Electrical Engineering: https://aaltodoc.aalto.fi/handle/123456789/53
