The purpose of this project is to predict whether distance-learning students will pass or fail a course, early enough for interventions to change the outcome. How early can outcomes be predicted while the predictions remain reliable? Put differently, how soon can we detect that a student is "slipping"? The project produces several models that effectively predict student outcomes halfway through the semester.
The data include student demographics (location, age, disability, education level, gender), student performance data, and student interactions (clicks) with the Virtual Learning Environment. The data are provided by the Open University, the British university with the highest number of undergraduate students in the UK. As its name suggests, the Open University mainly serves off-campus students.
This project evaluates several machine learning algorithms: k-nearest neighbors, naive Bayes, logistic regression, decision trees, and feedforward neural networks. Relatively simple algorithms are tested first to help identify patterns in the data. Model interpretability is also an important factor in this study, so the easier-to-interpret algorithms are tested before neural networks.
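As a rough illustration of this kind of comparison, the sketch below cross-validates the simpler classifiers side by side with scikit-learn. It is not the project's actual code: the data is a synthetic stand-in generated with `make_classification`, and the helper `compare_models`, the hyperparameters, and the choice of F1 as the scoring metric are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student features (NOT the Open University data).
# Label 1 plays the role of "pass", label 0 the role of "fail".
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, random_state=0
)

# Hypothetical candidate models; hyperparameters are illustrative only.
candidates = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "naive_bayes": GaussianNB(),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated F1 score for each model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
        for name, model in models.items()
    }


scores = compare_models(candidates, X, y)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} F1 = {score:.3f}")
```

Starting with these models keeps the comparison cheap, and the logistic regression coefficients and decision tree splits can be inspected directly, which fits the interpretability goal described above.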
All the algorithms performed relatively well, although feedforward neural networks performed slightly better. An ensemble of models achieves an F1 score of 0.819, predicting pass outcomes with 87% accuracy and fail outcomes with 77% accuracy.
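One way such an ensemble and its metrics could be computed is sketched below. This is an assumption-laden example, not the project's method: it uses soft voting through scikit-learn's `VotingClassifier`, substitutes `MLPClassifier` for the Keras feedforward network, runs on synthetic data, and interprets the per-class "accuracy" figures as per-class recall.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: label 1 = "pass", label 0 = "fail" (assumed encoding).
X, y = make_classification(
    n_samples=800, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Soft voting averages the predicted class probabilities of the members.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        (
            "mlp",  # scikit-learn stand-in for a Keras feedforward network
            make_pipeline(
                StandardScaler(),
                MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
            ),
        ),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

# F1 for the "pass" class, plus recall for each class separately.
f1 = f1_score(y_test, y_pred)
fail_recall, pass_recall = recall_score(y_test, y_pred, average=None, labels=[0, 1])

print(f"F1 score:    {f1:.3f}")
print(f"Pass recall: {pass_recall:.3f}")
print(f"Fail recall: {fail_recall:.3f}")
```

Reporting the two classes separately matters here because a missed failing student is the costly error for an early-intervention system, and a single aggregate score can hide it.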
- Tools: Python (pandas, numpy, matplotlib, scikit-learn, keras).
- Related documents and code: HERE