Defence of doctoral dissertation in the field of computational science, MSc (Tech) Tuomas Sivula
Applied statistical modelling often involves situations in which the analysed data set is split into disjoint sets that are handled separately. One such situation arises when the data set is too large to be processed as a whole or when it has a natural partitioning. Another common use of partitioned analysis occurs when assessing the predictive performance of a model or comparing several models. Various methods have been developed for dealing with such situations. Although these methods have been studied extensively in the literature, their behaviour and applicability under different situations are not completely known.
The main contribution of this dissertation is in analysing several techniques in Bayesian data analysis that involve data partitioning. The analysed methods consist of expectation propagation (EP) for approximate inference and leave-one-out cross-validation (LOO-CV) for model evaluation and comparison. The applicability and behaviour of the methods are studied in different situations. In particular, this dissertation addresses EP as a framework for distributed inference and the uncertainty of the LOO-CV method.
In previous literature, the EP method has usually been considered from the point of view of approximate inference. In this dissertation, the applicability of the method is studied and demonstrated in the distributed inference setting. Based on the experimental results, the EP method serves in this setting as a convenient, generalisable tool whose efficiency and accuracy are competitive with some alternative approaches. In addition, the method can be used to reduce the complexity of some problems by separating the inferences for observations that affect different features.
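As a rough illustration of EP used for distributed inference, the following minimal sketch runs EP site updates over data partitions for a toy model: a normal mean with known observation variance and a normal prior. The function name, the model, and the parameters `sigma` and `tau` are assumptions made for this example and are not taken from the dissertation. In this conjugate toy case the moment matching is exact, so the result coincides with the full-data posterior; in realistic non-conjugate models the tilted-distribution moments would have to be approximated separately for each partition.

```python
def ep_partitioned_normal_mean(partitions, sigma=1.0, tau=10.0, n_sweeps=2):
    """Toy EP over data partitions for y ~ N(mu, sigma^2), mu ~ N(0, tau^2).

    Distributions over mu are stored in natural parameters:
    (precision, precision * mean). Returns (posterior mean, posterior variance).
    """
    prior = (1.0 / tau**2, 0.0)
    # One Gaussian site approximation per partition, initialised as flat.
    sites = [(0.0, 0.0) for _ in partitions]

    for _ in range(n_sweeps):
        for k, y_k in enumerate(partitions):
            # Current global approximation: prior times all sites.
            glob_q = prior[0] + sum(s[0] for s in sites)
            glob_r = prior[1] + sum(s[1] for s in sites)
            # Cavity: global approximation with site k removed.
            cav_q = glob_q - sites[k][0]
            cav_r = glob_r - sites[k][1]
            # Tilted distribution: cavity times the exact likelihood of
            # partition k. It is Gaussian here, so its moments are exact.
            tilt_q = cav_q + len(y_k) / sigma**2
            tilt_r = cav_r + sum(y_k) / sigma**2
            # New site: tilted distribution divided by the cavity.
            sites[k] = (tilt_q - cav_q, tilt_r - cav_r)

    q = prior[0] + sum(s[0] for s in sites)
    r = prior[1] + sum(s[1] for s in sites)
    return r / q, 1.0 / q


if __name__ == "__main__":
    mean, var = ep_partitioned_normal_mean([[1.0, 2.0], [3.0]])
    print(mean, var)
```

Each partition only needs its own data plus the current cavity parameters, which is what makes the scheme suitable for processing the partitions on separate workers.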
In the context of LOO-CV model comparison, the studies in this dissertation show that the uncertainty of the method plays a major role in certain common situations. Estimating this uncertainty is a complex task, and the currently popular approaches often produce inaccurate results. However, the dissertation also demonstrates that it is possible to improve upon the current approaches by developing problem-specific estimators. Based on the analysis, the dissertation presents various general considerations that should be taken into account when applying the LOO-CV method for model assessment and selection.
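To make the quantity under discussion concrete, the sketch below computes a LOO-CV difference in expected log predictive density between two toy models, together with the widely used normal-approximation standard error based on the pointwise differences. The models (a normal mean with a normal prior versus a fixed zero mean, both with known variance), the function names, and the parameters are assumptions chosen for illustration; this is not one of the problem-specific estimators developed in the dissertation, and the standard error shown is the kind of estimate that can be inaccurate, for example with small data sets or very similar models.

```python
import math
import random


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def loo_lpd_unknown_mean(y, sigma=1.0, tau=10.0):
    """Exact LOO log predictive densities for y_i ~ N(mu, sigma^2), mu ~ N(0, tau^2)."""
    n = len(y)
    total = sum(y)
    out = []
    for yi in y:
        # Posterior of mu given all observations except y_i.
        prec = 1.0 / tau**2 + (n - 1) / sigma**2
        post_mean = ((total - yi) / sigma**2) / prec
        # The posterior predictive variance adds the observation noise.
        out.append(normal_logpdf(yi, post_mean, sigma**2 + 1.0 / prec))
    return out


def loo_lpd_zero_mean(y, sigma=1.0):
    """Model with no free parameters: y_i ~ N(0, sigma^2)."""
    return [normal_logpdf(yi, 0.0, sigma**2) for yi in y]


def elpd_difference(lpd_a, lpd_b):
    """Sum of pointwise differences and its normal-approximation standard error."""
    n = len(lpd_a)
    diffs = [a - b for a, b in zip(lpd_a, lpd_b)]
    total = sum(diffs)
    mean = total / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return total, math.sqrt(n * var)


if __name__ == "__main__":
    random.seed(0)
    y = [random.gauss(0.3, 1.0) for _ in range(50)]  # synthetic data
    diff, se = elpd_difference(loo_lpd_unknown_mean(y), loo_lpd_zero_mean(y))
    print(f"elpd difference: {diff:.2f}, standard error: {se:.2f}")
```

The standard error treats the pointwise differences as independent draws, which they are not, since every LOO term is computed from nearly the same data.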
Opponent: Dr. Daniel Hernández-Lobato, Universidad Autónoma de Madrid, Spain
Custos: Professor Aki Vehtari, Aalto University School of Science, Department of Computer Science
Contact information of the doctoral candidate: Tuomas Sivula, [email protected]
The defence will be organised via remote technology (Zoom).
The dissertation is publicly displayed 10 days before the defence in the publication archive Aaltodoc of Aalto University.