Department of Computer Science: MSc Thesis Presentation
Learning interpretable predictive biomarkers from multi-omics data
Author: Ellimari Paunio
Advisors: Riikka Huusari, Taneli Pusa
Supervisor: Juho Rousu
Abstract: Advancements in technologies for generating large-scale omics data and the extensive development of machine learning methods are transforming medicine, creating opportunities for improved diagnosis, monitoring, and treatment of diseases through multivariate biomarkers. These advancements also open the way to precision medicine, in which treatments are tailored to the needs of individual patients. Multivariate biomarker discovery, which aims to predict clinical outcomes reproducibly from a small set of biomarkers, has emerged as a promising approach. From a machine learning perspective, however, integrating multi-omics data to discover multi-omics biomarkers remains challenging. In addition, model transparency and explainability are key issues in translating such models into clinical practice.
Recently proposed sparse pre-image kernel machines offer embedded feature selection and improved interpretability compared with traditional kernel methods. A further benefit for discovering multi-omics biomarkers is that sparse pre-image kernel machines can be extended to multi-view learning. This thesis explores the application of sparse pre-image kernel machines to multivariate biomarker discovery on a multi-omics coronavirus disease 2019 (COVID-19) data set. To study whether the stability of feature selection can be improved, the thesis couples sparse pre-image kernel machines with a method known as stability selection. The stability of feature selection, and model performance with the selected features, are compared with those of two baseline methods: random forest and logistic regression.
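Stability selection, in its general form, repeatedly fits a sparse feature selector on random half-size subsamples of the data across a grid of regularization levels. For each feature it records the selection frequency and keeps those whose maximum frequency over the grid reaches a threshold. The following is a minimal pure-Python sketch of that general procedure under stated assumptions. It does not implement sparse pre-image kernel machines. `correlation_selector` is a hypothetical stand-in sparse selector, and the parameter names and defaults (`lambdas`, `n_subsamples`, `threshold=0.6`) are illustrative, not the settings used in the thesis.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence


def abs_corr(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def correlation_selector(X: list[list[float]], y: Sequence[float], lam: float) -> set[int]:
    """HYPOTHETICAL stand-in for a sparse learner: keeps features whose
    |correlation| with y exceeds lam (a larger lam gives a sparser selection)."""
    return {j for j in range(len(X[0]))
            if abs_corr([row[j] for row in X], y) > lam}


def stability_selection(
    X: list[list[float]],
    y: Sequence[float],
    selector: Callable[[list[list[float]], Sequence[float], float], set[int]],
    lambdas: Sequence[float],
    n_subsamples: int = 100,
    threshold: float = 0.6,
    seed: int = 0,
) -> tuple[set[int], list[float]]:
    """Return (stable feature indices, max selection frequency per feature)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = {lam: [0] * p for lam in lambdas}
    for _ in range(n_subsamples):
        # Draw a random subsample of half the samples, without replacement.
        idx = rng.sample(range(n), n // 2)
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        # Run the selector at every regularization level on this subsample.
        for lam in lambdas:
            for j in selector(Xs, ys, lam):
                counts[lam][j] += 1
    # Selection frequency per feature, maximized over the regularization grid.
    max_freq = [max(counts[lam][j] for lam in lambdas) / n_subsamples
                for j in range(p)]
    stable = {j for j, f in enumerate(max_freq) if f >= threshold}
    return stable, max_freq


if __name__ == "__main__":
    # Toy data: y depends on feature 0 only; features 1-4 are pure noise.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
    y = [2 * row[0] + 0.1 * rng.gauss(0, 1) for row in X]
    stable, freq = stability_selection(X, y, correlation_selector, lambdas=[0.3, 0.5])
    print(sorted(stable), [round(f, 2) for f in freq])
```

The selector is passed in as a function, so any sparse learner that maps a subsample and a regularization level to a set of feature indices could be substituted for the toy stand-in.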
The thesis considers two feature selection pipelines for sparse pre-image kernel machines. The first uses a general grid search to select the level of regularization and thereby the features. The second combines sparse pre-image kernel machines with stability selection. Results show that stability selection significantly improves the stability of the learned features. In addition, the proposed multi-view approach learns a more balanced set of features, drawing on both views, than the other methods. The findings of this thesis provide insights into the potential of sparse pre-image kernel machines for discovering multi-omics biomarkers in complex diseases.
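The abstract does not state which measure is used to quantify feature selection stability. One simple option, assumed here purely for illustration, is the mean pairwise Jaccard similarity between the feature sets selected on different resamples. A value of 1.0 means every run selected exactly the same features. The function names below are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(selections: list) -> float:
    """Average Jaccard similarity over all pairs of selected feature sets."""
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical runs that mostly agree on features 0 and 1.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
print(round(mean_pairwise_jaccard(runs), 3))  # 0.667
```

Under this sketch, the same score computed for the grid search pipeline and for the stability selection pipeline would allow a direct comparison of how consistently each one selects features.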