Department of Computer Science: MSc Thesis Presentations

Akseli Anttonen will present his MSc thesis on Thursday 4 July at 13:00 in A346, CS building

Identifying mislabelled data in extreme multi-label text classification – applying confident learning to a medical coding dataset

Author: Akseli Anttonen
Supervisor: Juho Rousu

Abstract: Annotations in datasets used for machine learning are often produced by human annotators or other noisy processes. Biases in data generation or processing can also introduce systematic label errors. As a result, the given labels in most datasets contain some errors. Mislabels can reduce predictive performance and undermine a machine learning model's ability to generalize. This thesis investigates mislabel detection in the context of an extreme multi-label text classification task, a setting where each text document is annotated with several labels chosen from a set of thousands of options. Experiments are carried out to test one mislabel detection method on a dataset used for automatic medical coding, which is the task of predicting medical diagnosis or procedure codes from medical records. The employed method, confident learning, uses the predicted probabilities of a trained model: cases where the model confidently disagrees with a given label are flagged as potential label errors. The mislabel detection is evaluated against a keyword-search-based ground truth on a subset of labels. Furthermore, the effect of cleaning the training set is investigated by re-training the model after correcting label errors. The results suggest that confident learning can spot, with high precision, cases where an erroneous extra label is present. However, the method is not reliable enough to clean the dataset fully automatically. The re-training results show that a model trained on cleaned data is more conservative, with a lower false positive rate, but performs worse overall.
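As a rough illustration of the idea in the abstract, the following is a minimal sketch of how confident disagreement could be used to flag possibly erroneous extra labels. It is not the thesis's implementation: it assumes a simplified one-vs-rest treatment in which each label is handled as an independent binary problem, and the function name `flag_extra_labels` and the toy data are hypothetical.

```python
def flag_extra_labels(probs, labels):
    """Flag given labels that the model confidently believes are absent.

    probs[i][j]  : predicted probability that label j applies to document i
    labels[i][j] : 1 if label j is given for document i, else 0
    Returns a same-shaped grid of booleans (True = possible extra label).
    Simplifying assumption: each label is treated as its own binary task.
    """
    n_docs = len(labels)
    n_labels = len(labels[0]) if n_docs else 0
    flags = [[False] * n_labels for _ in range(n_docs)]

    for j in range(n_labels):
        # Model confidence in "label absent" among documents annotated as absent.
        absent_conf = [1.0 - probs[i][j] for i in range(n_docs) if labels[i][j] == 0]
        if not absent_conf:
            continue  # no negatives for this label, so no threshold can be set
        # Per-label threshold: average self-confidence of the "absent" class.
        threshold = sum(absent_conf) / len(absent_conf)

        for i in range(n_docs):
            # A given label is suspicious when the model is at least as sure of
            # its absence as it typically is for documents annotated as absent.
            if labels[i][j] == 1 and 1.0 - probs[i][j] >= threshold:
                flags[i][j] = True
    return flags


# Toy data: five documents, one label. Document 2 carries the label,
# but the model assigns it a very low probability.
toy_probs = [[0.9], [0.8], [0.05], [0.1], [0.2]]
toy_labels = [[1], [1], [1], [0], [0]]
toy_flags = flag_extra_labels(toy_probs, toy_labels)
```

In this sketch the threshold is set per label, which matters when thousands of labels have very different frequencies and typical confidence levels. Flagged cases are candidates for review rather than confirmed errors, in line with the abstract's conclusion that fully automatic cleaning is not reliable.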
