An ML algorithm for in-depth analysis of students' responses to STACK questions, built for the Ethical AI Hackathon promoted by Sage Foundation.
STACK is the world-leading open-source online assessment system for mathematics and STEM. It is available for Moodle, ILIAS and as an integration through LTI.
The algorithms used were Correspondence Analysis (CA), to discover lexical similarity between students' incorrect answers, and K-means, to cluster them.
- Team Introduction & Understanding the Problem
- Data Cleaning
- Python Scripts & Visualisation
- ML algorithms
- Presentation
- Future Improvements
- Team & Researchers
Our challenge in this hackathon is to develop a machine learning algorithm to analyze students' responses to STACK questions. The aim is to classify correct vs. incorrect responses, further delve into the types of incorrect responses, group similar incorrect responses, and identify any outlier responses.
To devise an algorithm that effectively provides an in-depth analysis of students' answers to STACK questions.
- Classification of Correct vs. Incorrect Responses
- Multilevel Classification of Incorrect Responses (Predicted vs. unpredicted responses using PRT paths)
- Cluster Analysis - Grouping Similar incorrect responses
- Anomaly Detection Based on Question Text
For the purposes of our analysis, only the finished attempts are considered.
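The filtering step can be sketched as follows. This is a minimal illustration, not the project's actual cleaning script: the `state` column name and the `"finished"` label are assumptions about how the exported attempt data is laid out, and `finished_attempts` is a hypothetical helper.

```python
def finished_attempts(rows, state_key="state", finished_value="finished"):
    """Keep only rows whose attempt state marks a finished attempt.

    `rows` is a list of dicts (e.g. from csv.DictReader). The key name and
    the label are assumptions; adjust them to match the real export.
    """
    kept = []
    for row in rows:
        # Normalise case and whitespace so "Finished " still matches.
        state = str(row.get(state_key, "")).strip().lower()
        if state == finished_value:
            kept.append(row)
    return kept


# Hypothetical sample rows, for illustration only.
sample = [
    {"id": 1, "state": "Finished"},
    {"id": 2, "state": "In progress"},
    {"id": 3, "state": "finished "},
]
print([row["id"] for row in finished_attempts(sample)])
```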
Each objective was approached with a dedicated Python script, followed by visualization to represent the analysis results.
- Script for Objective 1-2: Link to the Code
- Script for Objective 3-4: Link to the Code
- Contingency tables: for each type of question, a contingency table of students' answers was built, using the characters present in each response as the vocabulary.
- Correspondence Analysis (CA): for each type of question, 2D CA was performed on predicted and unpredicted incorrect student answers to analyze the lexical (dis)similarities between them.
- K-means: used for clustering to understand common errors for each type of question, using as input the results from each CA.
- Data Saving and Retrieval: save analyzed data for future use or further analysis.
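The first three steps above can be sketched end to end. The project itself used Prince for CA and scikit-learn for K-means; to stay self-contained, this sketch swaps in a plain NumPy version of CA (SVD of the standardized residuals) and a basic Lloyd-style K-means. All function names and the sample answers are hypothetical, and the sketch assumes every answer contains at least one non-space character.

```python
from collections import Counter

import numpy as np


def contingency_table(answers):
    """Rows are answers, columns are characters, cells are character counts."""
    vocab = sorted({ch for a in answers for ch in a if not ch.isspace()})
    counts = [Counter(a) for a in answers]
    table = np.array([[c[ch] for ch in vocab] for c in counts], dtype=float)
    return table, vocab


def correspondence_analysis(table, n_components=2):
    """Return row principal coordinates of a contingency table."""
    P = table / table.sum()          # correspondence matrix
    r = P.sum(axis=1)                # row masses (must all be > 0)
    c = P.sum(axis=0)                # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :n_components] * sing[:n_components]) / np.sqrt(r)[:, None]


def kmeans(points, k, n_iter=100, seed=0):
    """Basic Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical incorrect answers to one question, for illustration only.
answers = ["x^2+1", "x^2+1", "x^2-1", "2*x", "2*x", "2x"]
table, vocab = contingency_table(answers)
coords = correspondence_analysis(table, n_components=2)
labels, _ = kmeans(coords, k=2)
for answer, label in zip(answers, labels):
    print(f"{answer!r} -> cluster {label}")
```

Because the vocabulary is character-level, answers that differ only in the order of their characters get identical coordinates; this is a known limitation of the approach as described.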
The findings, algorithm, and insights were compiled and documented for presentation to the Hackathon judges.
- After individual testing, all code blocks should be integrated into a single program.
- Increase the number of dimensions for Correspondence Analysis (3D).
- Add mathematical functions and symbols to the vocabulary used to build the contingency table.
- Choose an effective number of clusters based on the better view of the data provided by 3D CA.
- Create an API to fetch this clustering data and feed it as input to the STACK system.
Below are the contributors to this project:
- Umang Murawat (team leader): LinkedIn
- Gaurav Khetwal: LinkedIn
- Navjot Singh: LinkedIn
- Davinder Singh
- Jivan Goyal
- Google Colab: https://colab.research.google.com
- Prince for Correspondence Analysis: https://pypi.org/project/prince/
- Plotly for Interactive Plots: https://plotly.com/python/
- KMeans Clustering: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html