This is by far my biggest and favorite project. Take a look at the project details; you might find them interesting.
This dataset contains information on the performance of students in the GCE Advanced Level (AL) exam in Sri Lanka in 2020. It was collected by Sasika Amarasinghe and is available on Kaggle.
I removed some columns from the original dataset for ethical reasons. However, here is a sample of the data returned when a search query is entered.
When a school candidate's name is provided, the system retrieves comprehensive details, including their birthdate, which is not originally disclosed on the exam result sheet. (Applicable to candidates from the 2020 AL batch 😄)
- The dataset consists of over 300,000 records of student performance in the GCE AL exam in Sri Lanka.
- The data includes information on student identification, school, district, medium of instruction, stream, and their scores in different subjects.
- The data also includes the overall Z-score of each student, which is a standard score that indicates the number of standard deviations by which the student's exam results are above or below the mean.
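As a rough illustration of the standard-score definition above, a Z-score can be computed as (mark - mean) / standard deviation. The sketch below uses made-up marks and a hypothetical helper name, `z_scores`. It does not attempt to reproduce the official procedure used to calculate the overall AL Z-score.

```python
from statistics import mean, pstdev


def z_scores(marks):
    """Return the standard score of each mark relative to the whole group.

    Hypothetical helper for illustration only; it is not the official
    AL Z-score calculation.
    """
    mu = mean(marks)
    sigma = pstdev(marks)  # population standard deviation
    if sigma == 0:
        raise ValueError("all marks are identical; Z-scores are undefined")
    return [(m - mu) / sigma for m in marks]


# Made-up marks, not taken from the dataset.
example = z_scores([40, 50, 60])
print([round(z, 3) for z in example])
```

A positive value means the mark is above the group mean, a negative value means it is below, and the magnitude is measured in standard deviations.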
Index
: A unique identifier for each student

School ID
: Identification number of the school

District
: District where the school is located

Stream
: Science, Arts, or Commerce stream of the student

Medium
: Sinhala or English medium of instruction

Subjects
: The scores of the student in each of the subjects - Mathematics, Science, English, Buddhism, and History

Z-Score
: The overall Z-score of the student

Bday
: Birthday of applicant
- This dataset can be used to study the performance of students in different subjects and across streams, mediums of instruction, and districts.
- The data can also be used to study the relationship between student performance and demographic factors such as medium of instruction and district.
- This dataset can be used to identify the factors that contribute to the performance of students in the GCE AL exam and to make recommendations for improving student performance in the future.
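As one possible starting point for the kinds of comparison listed above, the sketch below averages the Z-score per group (for example per district or per stream). The field names `district`, `stream`, and `z_score` and the three sample rows are assumptions made up for illustration; the actual column names in the published file may differ.

```python
from collections import defaultdict
from statistics import mean


def mean_z_by(rows, key, z_field="z_score"):
    """Average the Z-score for each distinct value of `key`.

    `rows` is an iterable of dicts, such as the rows yielded by
    csv.DictReader. The field names are hypothetical.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[z_field]))
    return {group: mean(values) for group, values in groups.items()}


# Made-up rows, not real records from the dataset.
sample_rows = [
    {"district": "Colombo", "stream": "Science", "z_score": "1.0"},
    {"district": "Colombo", "stream": "Arts", "z_score": "0.0"},
    {"district": "Kandy", "stream": "Science", "z_score": "-0.5"},
]

print(mean_z_by(sample_rows, "district"))
print(mean_z_by(sample_rows, "stream"))
```

The same function works with rows read from a CSV file through `csv.DictReader`, provided the key names are adjusted to match the real header.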
- 9.41 / 10
- Data was collected from https://www.doenets.lk/examresults, which is the exam results site in Sri Lanka.
- The data was scraped using a Python script written by the author, using the index number as the primary key.
- Subsequently, the national identity card numbers were decoded to extract applicants' birthdays and genders.
- Due to privacy concerns, "Full name," "National Identity Card number," and "Index number" were removed, but the birthdays and genders were added to the dataset.
- AWS EC2 instances were employed to collect data concurrently, reducing both collection time and data usage.
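The identity-card decoding step mentioned above is not shown in this write-up, so the following is only a sketch based on the commonly described NIC layout, not the author's actual script. It assumes that the old format is 9 digits plus `V` or `X` with a two-digit birth year taken to be in the 1900s, that the new format is 12 digits with a four-digit year, that the three digits after the year give the day of the year with 500 added for females, and that the day count treats February as having 29 days in every year. The function name `decode_nic` and the sample numbers are made up.

```python
from datetime import date, timedelta


def decode_nic(nic):
    """Return (birthday, gender) decoded from a NIC number string.

    Sketch under the assumptions stated in the lead-in; hypothetical helper.
    """
    nic = nic.strip().upper()
    if len(nic) == 10 and nic[:9].isdigit() and nic[9] in "VX":
        year = 1900 + int(nic[:2])   # assumption: old-format years are 19YY
        day_code = int(nic[2:5])
    elif len(nic) == 12 and nic.isdigit():
        year = int(nic[:4])
        day_code = int(nic[4:7])
    else:
        raise ValueError("unrecognised NIC format")

    # Assumption: values above 500 mark a female holder.
    gender = "female" if day_code > 500 else "male"
    day_of_year = day_code - 500 if day_code > 500 else day_code
    if not 1 <= day_of_year <= 366:
        raise ValueError("day-of-year code out of range")

    # Assumption: day numbers follow a leap-year calendar, so map the day
    # through a leap reference year (2000) to get the month and day.
    ref = date(2000, 1, 1) + timedelta(days=day_of_year - 1)
    # Raises ValueError if the code points at 29 February in a non-leap year.
    return date(year, ref.month, ref.day), gender


# Made-up numbers for illustration only.
print(decode_nic("200203500012"))
print(decode_nic("856231234V"))
```

Mapping the day code through a fixed leap year keeps the month lookup in the standard library instead of a hand-written table of month lengths.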
I was awarded a bronze 🥉 medal for this dataset, receiving 38 upvotes in the Kaggle Community, along with very positive feedback from the community members.
⭐
This can be actually used to look after the academic likelihoods and whereabouts of Sri Lankan students' academics! Great job!
-- VISHESH THAKUR - Datasets Expert
⭐
This data could be used for EDA, visualization and even model development! Good work and great dataset!
-- RAVI RAMAKRISHNAN - Notebooks Grandmaster