Public defence in Information and Computer Science, M.Sc. (Tech) Viivi Halla-aho

Probabilistic methods for improving results from DNA methylation analysis

Title of the doctoral thesis: Probabilistic modeling of DNA methylation sequencing data

Opponent: Professor Jan Komorowski, Uppsala University, Sweden
Custos: Professor Harri Lähdesmäki, Aalto University School of Science, Department of Computer Science

The public defence will be organised on campus (Maarintie 8, lecture hall AS2).

Thesis available for public display at:
Electronic thesis can be found at:

Public defence announcement:

DNA methylation is a gene-regulating epigenetic modification in which methyl groups attach to the DNA. The connection between aberrant DNA methylation and various diseases has been widely studied. DNA methylation can be measured with sequencing methods, and this thesis models data from two such methods: bisulfite sequencing and cfMeDIP-seq. Bisulfite sequencing is a widely used method that reveals DNA methylation states at base-pair resolution. The more recent method, cfMeDIP-seq, can be used to measure the DNA methylation states of cell-free DNA. Cell-free DNA consists of DNA fragments released into the bloodstream by the tissues of the body. Detecting changes in cell-free DNA is considered a potential non-invasive cancer screening method. The aim of this thesis was to develop and improve methods for differential DNA methylation analysis based on bisulfite sequencing data and for cancer classification based on cfMeDIP-seq data.

The thesis proposes two DNA methylation analysis tools for bisulfite sequencing data, which exploit the correlation between the methylation states of nearby cytosines through a new correlation structure. In addition, the thesis proposes a workflow for preprocessing bisulfite sequencing data with a method for handling inflated p-values from differential DNA methylation analysis. The accuracy of cfMeDIP-seq-based cancer classification was improved with probabilistic methods and different feature selection methods. All the models proposed in this thesis share a probabilistic approach in which data and model parameters are described with probability distributions to quantify the uncertainty about the underlying process.

The results presented in the thesis show that probabilistic modeling and Bayesian methods work well and can improve the analysis of DNA methylation sequencing data. The proposed methods enable analysis of DNA methylation with improved accuracy, and their applications include uncovering the causes of different diseases and cancer screening. The methods are available as open-source code for researchers and bioinformaticians to use.

Contact details of the doctoral candidate: [email protected], +358 504918991
