Public defence in Acoustics and Audio Signal Processing, M.Sc. Leonardo Fierro

Public defence from the Aalto University School of Electrical Engineering, Department of Information and Communications Engineering
Abstract visualization of the Sines-Transients-Noise decomposition

The title of the thesis: Audio Decomposition for Time Stretching

Doctoral student: Leonardo Fierro
Opponent: Prof. Simon Dixon, Queen Mary University of London, UK 
Custos: Prof. Vesa Välimäki, Aalto University School of Electrical Engineering, Department of Information and Communications Engineering 

Time stretching is an audio signal processing task that involves slowing down a sound without altering its frequency content. The research presented in this thesis explores transient and noise sounds in the context of audio processing and investigates the use of sound decomposition to improve the quality of time stretching for both normal and extreme stretching factors.

Traditional time-stretching methods often introduce artifacts, such as phasiness and transient smearing, especially when the stretching factor is large. To address this issue, this thesis introduces an improved method for decomposing sounds into their constituent sine, transient, and noise components, so that a different processing technique can be applied to each class separately. This allows for better preservation of transient features, even at extreme stretching factors, and improves the perceived quality of time-stretched audio signals compared to traditional methods.

This thesis also presents an alternative audio-visual evaluation method for audio decomposition based on an interactive audio player application, in which a graphical user interface gives access to the individual sinusoidal, transient, and noise classes. The application aims to compensate for the shortcomings of misused objective metrics. It encourages experimentation with the sound decomposition process: users can observe how varying each spectral component affects the original sound and compare different methods against each other, evaluating the separation quality both audibly and visually.

This thesis also discusses the motivation for using the sines-transients-noise decomposition in time stretching by analyzing the performance drop that incorrect transient and noise handling causes in a well-known time-stretching method. The work shows that adopting the proposed three-way decomposition within that method's framework improves its performance.

The noise component is typically overlooked by conventional time-stretching methods. This thesis introduces a novel hybrid design that uses a deep learning model to generate a high-quality stretched noise component even for extreme stretching factors, that is, when the sound is slowed down by a factor of more than four, as happens in slow-motion sports videos or in the synthesis of ambient music. A simple and effective solution named noise morphing is proposed, producing state-of-the-art results across a wide range of inputs and stretching factors.

Thesis available for public display 10 days prior to the defence at:


Email  [email protected]
Mobile  +393486461249
