Machine learning can predict students’ graduation with 74% accuracy
In his thesis, Lauri Viitanen, a student of Computer Science and Engineering, applied survival models to the data in the student register at Metropolia University of Applied Sciences to find the variables that best distinguish students who will graduate from those who will drop out. He also aimed to find the variables that most accurately predict the remaining study time.
‘In my thesis, I built a model that predicts the graduation date of students at Metropolia University of Applied Sciences based on their study performance in their first year and other explanatory factors. The explanatory factors in the model included, for example, age, gender, the field of a prior study right, whether or not the studies began during the spring term, credit points accumulated during the first year and the weighted average of grades.’
In the thesis, students were classified using naive Bayes classifiers, generalised linear models, support vector machine classifiers and Gaussian processes. Gaussian processes have not been applied to similar data before. Although survival models are very well suited to this kind of longitudinal study, Gaussian processes can be used to increase their flexibility. It would be useful to examine exactly how much the added flexibility would improve accuracy.
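One reason survival models suit register data is that they can use students who are still enrolled when the data are extracted (censored observations) instead of discarding them. The following minimal sketch illustrates this with a Kaplan-Meier estimator written in plain Python. It is not the model built in the thesis; the function name, the convention of marking graduation as the observed event, and all numbers are assumptions made up for illustration.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(term, S)] where S estimates the probability that a student
    has not yet graduated by the end of that term.

    durations: terms enrolled until graduation or until data extraction
    observed:  True if the student graduated, False if still enrolled
               (censored); a hypothetical convention for this sketch
    """
    graduations = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)      # graduations plus censorings per term
    at_risk = len(durations)        # students still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = graduations.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]         # everyone leaving at t drops out of the risk set
    return curve

# Made-up records for eight students (not Metropolia data)
durations = [7, 8, 8, 9, 10, 10, 12, 12]
observed = [True, True, True, False, True, False, True, False]

for term, s in kaplan_meier(durations, observed):
    print(f"term {term}: P(not yet graduated) = {s:.3f}")
```

The censored students still count in the risk set until the term in which they leave observation, which is what a plain average of graduation times would miss.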
‘In my thesis, I also compared how accurately some well-known machine learning methods could classify students, right after their first year of studies, into those who will eventually graduate and those who will drop out.’
An accuracy of 74 per cent was achieved with the best machine learning method, the support vector machine. In other words, for three out of four students, the way their study right would end (graduation or dropout) could be predicted correctly as early as after the first year of studies. Completing extra credits during the first year increased the graduation probability more than raising the grade average by one grade did. How the total number of accumulated credits affects graduation relative to other factors has not been studied before.
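To make the accuracy figure concrete, the sketch below trains a small classifier on first-year features and measures the share of held-out students whose outcome it predicts correctly. It uses a Gaussian naive Bayes classifier, one of the method families named above, only because it fits in a few lines of plain Python; it is not the thesis's implementation and not the support vector machine that performed best. The two features (first-year credits, weighted grade average), the labels (1 = graduated, 0 = dropped out) and every number are made up for illustration.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature mean/variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids division by zero
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for one student."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

def accuracy(y_true, y_pred):
    """Share of students whose outcome was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical training data: (first-year credits, weighted grade average)
X_train = [(60, 3.5), (55, 3.0), (62, 4.0), (48, 3.2),
           (20, 2.0), (35, 2.5), (15, 3.0), (30, 1.8)]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical held-out students; the last one graduated despite few credits
X_test = [(58, 3.4), (25, 2.2), (50, 2.9), (33, 2.6)]
y_test = [1, 0, 1, 1]

model = fit_gaussian_nb(X_train, y_train)
y_pred = [predict(model, x) for x in X_test]
print("accuracy:", accuracy(y_test, y_pred))
```

Measuring accuracy on students who were not used for training, as here, is what makes a figure such as 74 per cent meaningful as an estimate for future cohorts.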
‘Metropolia University of Applied Sciences intends to utilise the results of the thesis in planning its budget, as inaccurate estimation of graduation dates makes it more difficult to predict future funding. By using the student-specific model, Metropolia intends to reduce this error. The results are also likely to be utilised in the students’ workspaces so that it will be easy for guidance counsellors and group leaders to detect the students whose progress they should pay most attention to,’ says Viitanen, assessing the possibilities the study opens up.
The results of the study are in line with earlier studies: the grade average and the student’s gender are significant variables when estimating graduation probability. Women are more likely to graduate, and they graduate faster than men, regardless of the subject. A higher age lowered the probability of graduating both in this study and in almost all earlier studies.