Department of Computer Science: MSc Thesis Presentations

Kelly Piho will present her MSc thesis on Tuesday 4 June at 12:00 in room A346, CS building.

Biomarker Discovery from Multi-view Health Data

Author: Kelly Piho
Supervisor: Juho Rousu
Thesis advisor: Heli Julkunen

Abstract: The increasing prevalence of type 2 diabetes (T2D) is a growing public health concern. People with T2D often develop diabetes-related complications, prompting the search for biomarkers that signal an increased risk of complications. This thesis explores the connections between molecular risk factors and health outcomes in people with T2D using several variants of canonical correlation analysis (CCA), which seeks projections of two data views that are maximally correlated. The first view represents comprehensive health data (metabolomics, biochemical markers, blood count, and baseline characteristics), and the second represents health outcomes, specifically complications of T2D (nephropathy, myocardial infarction, stroke, neuropathy, and retinopathy). The aim is to uncover both linear and non-linear associations between the two views and to identify features that could serve as biomarkers of an increased risk of complications.
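As a minimal illustration of the quantity that linear CCA optimises: given weight vectors u and v for the two views, a pair of projections is scored by the Pearson correlation between Xu and Yv, and CCA searches for the u and v that maximise it. The Python sketch below only evaluates this objective for fixed, hand-picked weights on made-up toy data. The function names and data are hypothetical, and this is not the code used in the thesis; the methods it studies (gradKCCA, SCCA-HSIC) build on this idea with kernels, sparsity, or HSIC as the dependence measure.

```python
import math

def project(rows, weights):
    """Project each sample (a list of features) onto a weight vector."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def cca_objective(X, Y, u, v):
    """Correlation between the projections Xu and Yv (the quantity linear CCA maximises)."""
    return pearson(project(X, u), project(Y, v))

# Hypothetical toy views: view 1 has two "health" features, view 2 one "outcome" feature.
X = [[1.0, 0.5], [2.0, 0.1], [3.0, 0.9], [4.0, 0.3]]
Y = [[2.1], [3.9], [6.2], [8.0]]

# Weighting only the first feature of X gives a much higher correlation
# than weighting only the second one.
print(cca_objective(X, Y, [1.0, 0.0], [1.0]))
print(cca_objective(X, Y, [0.0, 1.0], [1.0]))
```

Correlation is unchanged when u or v is rescaled, which is why CCA formulations usually fix the scale of the weights with a unit-variance constraint.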

SCCA-HSIC and three variants of gradKCCA were compared using data from the UK Biobank. For gradKCCA, a linear kernel and polynomial kernels of degree 2 and 3 were considered. The findings indicate that both linear gradKCCA and SCCA-HSIC discovered reliable and relevant associations, achieving the highest correlation and HSIC values. Notably, SCCA-HSIC uncovered weak but relevant relationships that linear gradKCCA did not detect, whereas the polynomial gradKCCA variants overfitted and failed to identify generalisable associations. Every method highlighted glycated haemoglobin (HbA1c), an established diabetes biomarker. Furthermore, both linear gradKCCA and SCCA-HSIC identified several established biomarkers of specific complications (e.g., creatinine for kidney function) as well as general diabetes-related biomarkers (e.g., glucose, albumin). SCCA-HSIC additionally recognised the association between insulin resistance and branched-chain amino acids (BCAAs).
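To show why kernel choice matters for detecting non-linear associations, here is a minimal sketch of the standard biased empirical HSIC estimator, trace(K H L H) / (n - 1)^2 with centring matrix H = I - (1/n) 11^T, evaluated with a linear kernel and with polynomial kernels of degree 2 and 3. The kernel offset c = 1.0, the helper names, and the toy data are assumptions for illustration only; this is not the SCCA-HSIC or gradKCCA implementation used in the thesis.

```python
from functools import partial

def linear_kernel(a, b):
    """Linear kernel k(a, b) = a * b."""
    return a * b

def polynomial_kernel(a, b, degree=2, c=1.0):
    """Polynomial kernel k(a, b) = (a * b + c) ** degree (offset c is an assumption)."""
    return (a * b + c) ** degree

def gram(values, kernel):
    """Kernel (Gram) matrix for a list of scalar observations."""
    return [[kernel(a, b) for b in values] for a in values]

def double_center(M):
    """Return H M H with H = I - (1/n) 11^T, computed via row and column means."""
    n = len(M)
    row_means = [sum(row) / n for row in M]
    col_means = [sum(M[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[M[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def hsic(x, y, kernel):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    Kc = double_center(gram(x, kernel))
    L = gram(y, kernel)
    # trace(K H L H) = trace((H K H) L) by the cyclic property of the trace.
    trace = sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2

# Hypothetical toy data: y = x^2 depends on x, but their linear correlation is zero.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v ** 2 for v in x]

print(hsic(x, y, linear_kernel))      # ~0: the linear kernel misses the dependence
print(hsic(x, y, polynomial_kernel))  # > 0: the degree-2 kernel detects it
print(hsic(x, y, partial(polynomial_kernel, degree=3)))
```

A richer kernel can capture dependence that a linear one misses, but with many features and limited samples the extra flexibility can also fit noise.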

This study underscores the efficacy of SCCA-HSIC and the importance of continued research into these biomarkers for preventing diabetes complications and improving patient outcomes. Future research should address the limitations of the current deflation strategy, which is derived from linear CCA, in order to improve the non-linear methods. This research has been conducted using the UK Biobank Resource under application number 147811.

Department of Computer Science

Read more
Machine Learning researchers working at the Department of Computer Science, Aalto University