Department of Computer Science: MSc Thesis Presentation
Date: Wednesday, 18 May 2022
Title: Cancer Detector on Histological Slides Using Semi-supervised Learning
Author: Atte Föhr
Advisor: Joona Pohjonen, Helsinki Urological Cancer Research Group, University of Helsinki
Supervisor: Juho Rousu
There is growing interest in computer-aided diagnosis in the field of pathology. Diagnosing vast amounts of histological samples takes up physicians' time. Machine learning can ease this process by helping doctors diagnose faster, more cost-effectively and more accurately.
Computer vision has taken huge steps in the last decade. It has outperformed humans in many tasks such as classification. This has been due to growing datasets, increased processing power and research on the topic. While the availability of data has grown, so has the need to label it. Labelling can become expensive, especially in the medical field. One solution to this problem is semi-supervised learning, which uses both labelled and unlabelled data during training in the hope that the additional data improves the model's performance.
In this work I train and validate semi-supervised deep learning models using histological images of renal cell carcinoma. Models are trained and validated for two tasks: one to predict cancer and another to predict cancer relapse. The initial model is trained on labelled data in a supervised manner. The trained model is then used to pseudo-label unlabelled images, which are in turn used together with the original labelled data in semi-supervised training.
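The pseudo-labelling step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the `predict_proba` callable and the confidence threshold of 0.9 are all assumptions made for illustration, and the abstract does not state whether any confidence filtering was used.

```python
from typing import Callable, List, Sequence, Tuple


def pseudo_label(
    probabilities: Sequence[float], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (index, label) pairs for confidently predicted samples.

    probabilities[i] is the predicted probability of the positive class
    for unlabelled sample i. The threshold is an illustrative assumption.
    """
    selected = []
    for i, p in enumerate(probabilities):
        if p >= threshold:
            selected.append((i, 1))  # confident positive
        elif p <= 1.0 - threshold:
            selected.append((i, 0))  # confident negative
        # samples in between are left unlabelled
    return selected


def build_semi_supervised_set(
    labelled: Sequence[Tuple[object, int]],
    unlabelled: Sequence[object],
    predict_proba: Callable[[object], float],
    threshold: float = 0.9,
) -> List[Tuple[object, int]]:
    """Combine the original labelled data with pseudo-labelled samples."""
    probs = [predict_proba(x) for x in unlabelled]
    pseudo = [(unlabelled[i], y) for i, y in pseudo_label(probs, threshold)]
    return list(labelled) + pseudo
```

In this sketch, `predict_proba` stands in for the supervised model trained in the first stage; the combined set returned by `build_semi_supervised_set` would then be used for the semi-supervised training round.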
The addition of pseudo-labelled data did not improve the models' performance. In cancer prediction, the supervised model achieved an average balanced accuracy of 97.5% and an AUROC of 0.991. The semi-supervised models did not reach quite as high accuracies, but they performed similarly, and the differences were not statistically significant. For relapse prediction the models performed worse. The supervised model achieved a balanced accuracy of 72.2% and an AUROC of 0.773. Again, almost all of the semi-supervised models produced results similar to those of the original model, with no statistically significant differences. The only model that significantly underperformed relative to the rest was the one trained with all available data.
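For readers unfamiliar with the two reported metrics, the following sketch shows how balanced accuracy and AUROC can be computed for a binary task. It uses the standard definitions (mean of sensitivity and specificity, and the pairwise ranking formulation of AUROC) and is not taken from the thesis; the function names and the toy inputs are assumptions.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties count as half. This pairwise form is quadratic in the number of
    samples, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example with made-up labels and predictions.
print(balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))  # 0.625
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Balanced accuracy weights both classes equally, which makes it more informative than plain accuracy when one class is much rarer than the other.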