Spatial Audio Techniques

Our research in spatial audio technologies concentrates mainly on non-linear methods of capturing, transmitting, synthesizing, and reproducing spatial sound scenes.

This research has led to a decade of development of the Directional Audio Coding (DirAC) technique. DirAC is a general-purpose method for encoding recorded spatial sound in the time-frequency domain, based on psychoacoustic assumptions. Its output is a single B-format or mono audio stream and a metadata stream. The metadata stream contains frequency-dependent information about the direction of arrival and the diffuseness of the spatial sound scene.
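The direction and diffuseness metadata can be illustrated with a minimal sketch of an energetic analysis for one frequency band. This is not the group's implementation. The function name `dirac_analysis` and the signal scaling are assumptions made for illustration: the omnidirectional signal `w` and the dipole signals `x`, `y`, `z` are taken to be scaled so that a plane wave from unit direction d gives (x, y, z) = w * d. Real B-format conventions often scale W differently, which would change the constants.

```python
import math


def dirac_analysis(frames):
    """Estimate (azimuth, elevation, diffuseness) for one frequency band.

    frames: iterable of (w, x, y, z) complex STFT coefficients taken from
    consecutive time frames of the same band. Assumed scaling (hypothetical):
    a plane wave from unit direction d gives (x, y, z) = w * d.
    """
    ix = iy = iz = 0.0  # time-summed intensity-like vector
    energy = 0.0        # time-summed energy
    for w, x, y, z in frames:
        wc = w.conjugate()
        ix += (wc * x).real
        iy += (wc * y).real
        iz += (wc * z).real
        energy += 0.5 * (abs(w) ** 2 + abs(x) ** 2 + abs(y) ** 2 + abs(z) ** 2)

    if energy == 0.0:
        # Silence carries no direction; report it as fully diffuse.
        return 0.0, 0.0, 1.0

    # With the assumed scaling the summed vector points toward the source.
    azimuth = math.atan2(iy, ix)
    elevation = math.atan2(iz, math.hypot(ix, iy))

    # Diffuseness: 0 for a single plane wave, 1 when the net intensity cancels.
    norm = math.sqrt(ix * ix + iy * iy + iz * iz)
    diffuseness = min(1.0, max(0.0, 1.0 - norm / energy))
    return azimuth, elevation, diffuseness
```

The sums over frames stand in for the short-time averaging that a practical analysis would apply. Longer averaging tends to give a steadier diffuseness estimate but reacts more slowly to changes in the scene.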

The group members participating in this research concentrate on different areas and applications of the DirAC technology. One constant aim is to perfect DirAC for all kinds of sound material and for different recording and reproduction systems. The main applications of DirAC currently under study are teleconferencing, high-quality spatial sound reproduction, virtual worlds, and technical audiology. Additional work is done on other topics related to spatial sound, namely non-linear beamforming and general optimal frameworks for spatial audio.

In teleconferencing, DirAC can be used to transmit the spatial sound scene between the participants efficiently and scalably, with the best quality that the available bandwidth allows. The advantage of this technology is the natural separation between multiple participants, which enables more natural conversation.

In high-quality reproduction, DirAC can be used as a straightforward tool to store spatial sound and to convert between different recording formats and reproduction systems. The main research topics in this area are problematic signals, e.g., applause, and the development of simple tools for end users, e.g., studio technicians. A natural extension of this work is the research on general optimal frameworks for spatial audio.

In virtual-world applications, the general principles of DirAC are used to create intuitive and low-cost tools for synthesizing high-quality spatial audio. Currently, the main work is on the synthesis of spatial extent and on the development of a general architecture using the created tools. Additionally, work has been done on spatial audio effects that can be applied both in this context and in high-quality reproduction.
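In DirAC-style synthesis, a mono signal is typically split according to its diffuseness into a non-diffuse stream, which is panned to the given direction, and a diffuse stream, which is spread over all loudspeakers and decorrelated. The sketch below shows only that split for one time-frequency bin and a single loudspeaker pair. The function names, the default ±30° stereo layout, and the pairwise amplitude panning are assumptions made for illustration. Decorrelation is omitted, and the direction is assumed to lie between the two loudspeakers.

```python
import math


def pan_pair_gains(azimuth, left_az, right_az):
    """Energy-normalized amplitude-panning gains for one loudspeaker pair.

    Angles are in radians, positive to the left. Assumes the target
    azimuth lies between the two loudspeakers.
    """
    det = math.sin(left_az - right_az)
    g_left = math.sin(azimuth - right_az) / det
    g_right = math.sin(left_az - azimuth) / det
    norm = math.hypot(g_left, g_right)
    return g_left / norm, g_right / norm


def synthesize_bin(s, azimuth, diffuseness,
                   left_az=math.radians(30.0), right_az=math.radians(-30.0)):
    """Split one complex bin `s` into direct and diffuse loudspeaker feeds.

    Returns ((direct_left, direct_right), (diffuse_left, diffuse_right)).
    The diffuse feeds would still need decorrelation before playback,
    which this sketch leaves out.
    """
    g_left, g_right = pan_pair_gains(azimuth, left_az, right_az)
    direct = math.sqrt(1.0 - diffuseness) * s  # non-diffuse stream
    diffuse = math.sqrt(diffuseness) * s       # diffuse stream
    spread = 1.0 / math.sqrt(2.0)              # equal energy per loudspeaker
    return (g_left * direct, g_right * direct), (spread * diffuse, spread * diffuse)
```

The square-root weights keep the combined energy of the two streams equal to the energy of the input bin, so changing the diffuseness moves energy between the streams without changing the overall level.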

A novel area for DirAC is its application in technical audiology. DirAC can be used to reproduce real recorded sound scenes and the corresponding room acoustics for soundfield speech audiometry, which can enable hearing abilities representative of real-world conditions to be measured in clinical environments. In addition, the soundfield-analysis principles of DirAC can be applied to binaural hearing aids. In this case, the analyzed parameters of the sound field are used to amplify desired sound and attenuate undesired and reverberant sound, enabling the user, for example, to hear speech better and with less effort in background noise.

Another technology we have developed belongs to the class of parametric spatial audio processing techniques and is called the cross pattern coherence algorithm (CroPaC). It is a novel, coherence-based spatial-filtering method that uses microphones with higher-order directional characteristics to compute a gain/attenuation parameter, which is then applied to a reference signal. The advantage of this technology is its performance in suppressing noise, especially in the low-frequency region, using small-sized microphone arrays. Potential uses of CroPaC include noise reduction and the extraction of sound corrupted by interfering sounds. Application candidates for CroPaC are teleconferencing systems and directional microphone design.
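The idea of a coherence-based gain can be illustrated with a simplified sketch. It is not the published CroPaC implementation. It assumes two coincident directional signals with different patterns that share the same look direction, and it forms a normalized, half-wave rectified cross-spectrum for one frequency band. The names `cropac_gain`, `apply_gain`, and the `floor` parameter are hypothetical.

```python
def cropac_gain(s1_frames, s2_frames, floor=0.0):
    """Coherence-based gain for one frequency band.

    s1_frames, s2_frames: equal-length sequences of complex STFT
    coefficients from two directional signals with the same look
    direction (an assumption of this sketch). The raw gain lies in
    [-1, 1]; values below `floor` are raised to `floor`.
    """
    # Real part of the time-summed cross-spectrum.
    cross = sum((a * b.conjugate()).real for a, b in zip(s1_frames, s2_frames))
    # Sum of the two auto-spectra, used for normalization.
    power = sum(abs(a) ** 2 + abs(b) ** 2 for a, b in zip(s1_frames, s2_frames))
    if power == 0.0:
        return floor
    gain = 2.0 * cross / power
    # Half-wave rectification: out-of-phase or incoherent content is attenuated.
    return max(floor, gain)


def apply_gain(reference, gain):
    """Scale a reference bin (for example, from one of the microphones)."""
    return gain * reference
```

Sound arriving from the look direction is captured in phase and with similar level by both patterns, so the gain approaches 1. Noise and sound from other directions are less coherent between the patterns, so the gain falls toward the floor. A small non-zero floor is one way to limit audible artifacts from very deep attenuation.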

Selected publications

Delikaris-Manias S. and Pulkki V., "Cross Pattern Coherence Algorithm for Spatial Filtering Applications Utilizing Microphone Arrays", in IEEE Transactions on Audio, Speech and Language Processing, 21(11), 2013.

Koski T., Sivonen V., Pulkki V., "Measuring speech intelligibility in noisy environments reproduced with parametric spatial audio", in AES 135th Convention, 2013.

Pihlajamäki T., Laitinen M.-V., Pulkki V., "Modular Architecture for Virtual-World Parametric Spatial Audio Synthesis", in AES 49th International Conference, London, United Kingdom, 2013.

Politis A., Pihlajamäki T., Pulkki V., "Parametric Spatial Audio Effects", in 15th International Conference on Digital Audio Effects (DAFx-12), York, United Kingdom, 2012.

Laitinen M.-V., Pihlajamäki T., Erkut C., Pulkki V., "Parametric time-frequency representation of spatial sound in virtual worlds", ACM Transactions on Applied Perception, 9(2), 2012.

Del Galdo G., Taseska M., Thiergart O., Ahonen J., Pulkki V., "The diffuse sound field in energetic analysis", The Journal of the Acoustical Society of America, 131, 2012.

Ahonen J., Del Galdo G., Kuech F., Pulkki V., "Directional Analysis with Microphone Array Mounted on Rigid Cylinder for Directional Audio Coding", The Journal of the Audio Engineering Society, 60(5), 2012.

Laitinen M.-V., Kuech F., Disch S., Pulkki V., "Reproducing Applause-Type Signals with Directional Audio Coding", The Journal of the Audio Engineering Society, 59(1/2), 2011.

Vilkamo J., Lokki T., Pulkki V., "Directional audio coding: Virtual microphone-based synthesis and subjective evaluation", The Journal of the Audio Engineering Society, 57(9), 2009.

Pulkki V., "Directional audio coding in spatial sound reproduction and stereo upmixing", in AES 28th International Conference, Pitea, Sweden, June 2006.

Pulkki V. and Faller C., "Directional audio coding: Filterbank and STFT-based design", in AES 120th Convention, Paris, France, May 2006.

Published: 10.9.2018
Updated: 1.7.2021