Doctoral theses of the School of Electrical Engineering at Aaltodoc
Doctoral theses of the School of Electrical Engineering are available in the open access repository maintained by Aalto, Aaltodoc.
The title of the thesis: Unsupervised Audio Enhancement with Diffusion-Based Generative Models
Thesis defender: Eloi Moliner Juanpere
Opponent: Prof. Bozena Kostek, Gdansk University of Technology, Poland
Custos: Prof. Vesa Välimäki, Aalto University School of Electrical Engineering
Audio recordings are often degraded by noise, reverberation, or other distortions that reduce their clarity and quality. Examples include historical music recordings damaged by the ageing of analogue media and speech recordings in which reverberation makes voices difficult to understand. This doctoral thesis presents new approaches to restoring such recordings through unsupervised audio enhancement with diffusion-based generative models, powerful AI systems that can produce clean, natural sound without being trained for specific restoration tasks.
The thesis brings together a series of studies showing how a generative model trained only on undistorted audio can be adapted to a wide range of restoration problems during inference. The first part demonstrates this approach in music bandwidth extension, inpainting, and declipping. The second part addresses blind restoration, where the type of degradation is unknown, with methods that estimate and correct spectral changes in historical gramophone recordings and regenerate missing content. The final part focuses on single-channel blind speech dereverberation, combining a diffusion model with a parametric room acoustics model to recover clean speech while also estimating the acoustic properties of the recording space.
These studies show that diffusion-based generative models can match or surpass specialised supervised systems, particularly in conditions that differ from their training data. The results open new possibilities for restoring cultural heritage recordings, improving speech clarity in challenging environments, and creating adaptable tools for audio processing in media production, broadcasting, and forensics.
The thesis is available for public display at Aaltodoc 10 days prior to the defence.