Public defence in Computer Science, M.Sc. (Tech) Lassi Meronen
Title of the doctoral thesis: Uncertainty Quantification in Deep Learning
Doctoral student: Lassi Meronen
Opponent: Associate Prof. Carl Henrik Ek, University of Cambridge, England
Custos: Assistant Prof. Arno Solin, Aalto University School of Science, Department of Computer Science
Thesis available for public display 10 days prior to the defence at: https://aaltodoc.aalto.fi/doc_public/eonly/riiputus/
Uncertainty Quantification in Deep Learning
Artificial intelligence has been a hot topic in the news recently. The term usually refers to deep learning models, such as the recent ChatGPT. These models can be powerful, but they are often overconfident: they estimate the uncertainty in their predictions poorly and are mostly unable to say “I do not know” when they encounter an unfamiliar situation. This overconfidence becomes an issue in safety-critical applications, such as self-driving cars, where poor decision-making can lead to severe consequences. For example, a deep learning model could be helpful in automatic diagnosis from medical imaging data if it can reliably diagnose 90% of the cases, leaving only the 10% that are uncertain to be checked by a doctor. However, if the model cannot tell which cases are uncertain, the doctor would need to go through all the cases after all, making the deep learning model useless.
The research presented in this doctoral thesis focuses on improving the ability of deep learning models to estimate the uncertainty in their predictions, which would allow these models to be used more widely in safety-critical applications. The main results stem from building mathematical connections to principled probabilistic models that are known to give high-quality uncertainty estimates for their predictions. These connections make it possible to bring the same beneficial properties into deep learning models. Better uncertainty estimates also allow a deep learning model to separate complex inputs from easy ones. This ability can be used to save computational resources, because heavy computation needs to be spent only on the complex inputs. Such smart allocation of computation reduces the power consumption of deep learning models, which is highly valuable as the models keep growing in size and their use requires more and more energy.
Doctoral theses in the School of Science: https://aaltodoc.aalto.fi/handle/123456789/52