
Department of Computer Science: MSc Thesis Presentation

Lectures and seminars

Ellimari Paunio will present her MSc thesis "Learning interpretable predictive biomarkers from multi-omics data" on Monday 10 July at 15:00 in A346, CS building.

When

10.7.2023 15:00 – 15:30 (UTC +3)

Where

Computer Science building

meeting room A346

Event language(s)

English

Learning interpretable predictive biomarkers from multi-omics data

Author: Ellimari Paunio

Advisors: Riikka Huusari, Taneli Pusa

Supervisor: Juho Rousu

Abstract: Advancements in technologies for generating large-scale omics data and the extensive development of machine learning methods are transforming medicine by enabling improved diagnosis, monitoring, and treatment of diseases through multivariate biomarkers. These advancements also open opportunities for precision medicine, in which treatments are tailored to the needs of individual patients. Multivariate biomarker discovery, which aims to predict clinical outcomes reproducibly from a small set of biomarkers, has emerged as a promising approach. From a machine learning perspective, however, integrating multi-omics data to discover multi-omics biomarkers remains challenging. In addition, model transparency and explainability are key issues in translating such models into clinical practice.

Recently proposed sparse pre-image kernel machines offer embedded feature selection and improved interpretability compared with traditional kernel methods. A further benefit for discovering multi-omics biomarkers is that sparse pre-image kernel machines can be extended to multi-view learning. This thesis explores the application of sparse pre-image kernel machines to multivariate biomarker discovery on a multi-omics coronavirus disease 2019 (COVID-19) data set. To study whether the stability of feature selection can be improved, the thesis couples a method known as stability selection with sparse pre-image kernel machines. The stability of feature selection, and model performance with the selected features, are compared with those of two baseline methods: random forest and logistic regression.

The thesis considers two feature selection pipelines for sparse pre-image kernel machines. The first uses a general grid search to select the level of regularization, and thereby the features. The second combines sparse pre-image kernel machines with stability selection. Results show that stability selection significantly improves the stability of the learned features. In addition, compared with the other methods, the proposed multi-view approach learns a more balanced set of features, drawing features from both views. The findings of this thesis provide insights into the potential of sparse pre-image kernel machines for discovering multi-omics biomarkers in complex diseases.

Updated: 4.7.2023
Published: 4.7.2023