Analysis and Prediction of Students' Performance
In this post, I am going to analyze students' exam performance using a Kaggle dataset of student exam scores.
The aim of this project is to predict how well a student will perform, given their attributes.
This is the data we have. Every column has 1000 non-null values, so there are no missing values in the data:
Data columns (total 8 columns):
- gender 1000 non-null object
- race_ethnicity 1000 non-null object
- parental_level_of_education 1000 non-null object
- lunch 1000 non-null object
- test_preparation_course 1000 non-null object
- math_score 1000 non-null int64
- reading_score 1000 non-null int64
- writing_score 1000 non-null int64
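The post doesn't show how the missing-value check was done. With pandas, `df.isnull().sum()` gives the per-column count. As a dependency-free sketch, the snippet below does the same count over CSV text with Python's `csv` module. The function name `count_missing` and the three sample rows are hypothetical, chosen only to match the column names listed above.

```python
import csv
import io


def count_missing(csv_text):
    """Count empty cells per column in CSV text (a stdlib stand-in for df.isnull().sum())."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in reader.fieldnames:
            value = row.get(name)
            # Short rows are padded with None by DictReader; empty strings are blanks
            if value is None or value.strip() == "":
                missing[name] += 1
    return n_rows, missing


# Hypothetical three-row sample using the column names shown above
sample = """gender,race_ethnicity,parental_level_of_education,lunch,test_preparation_course,math_score,reading_score,writing_score
female,group B,bachelor's degree,standard,none,72,72,74
female,group C,some college,standard,completed,69,90,88
male,group A,associate's degree,free/reduced,none,47,57,44
"""

rows, missing = count_missing(sample)
print(rows, sum(missing.values()))  # 3 0
```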
I am going to check the distributions of math score, reading score, and writing score to see whether there are any outliers.
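The post inspects the distributions visually. A common numeric complement (swapped in here, not necessarily the author's method) is Tukey's rule: flag any score below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. This is a minimal sketch, assuming the scores of one column are available as a plain list; `iqr_outliers` is a hypothetical helper name.

```python
import statistics


def iqr_outliers(scores, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles() sorts the data itself; n=4 returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in scores if s < low or s > high]


# Toy scores: one value far below the rest
print(iqr_outliers([60, 62, 65, 66, 68, 70, 71, 73, 75, 5]))  # [5]
```

Running it once per score column (math, reading, writing) gives a quick cross-check of what the plots suggest.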
From the distributions, I don't see any obvious outliers, and there are no missing values, so I can proceed with basic EDA. Before that, let's do some feature engineering.
Feature engineering :
I am going to create 4 new features from the given data:
- Total marks - the sum of math score, reading score, and writing score
- Result - whether the student passed or failed the exam
- Percentage - the student's overall percentage in the exam
- Performance - how well the student performed, based on a grading system
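The post doesn't state the pass mark or the grading bands, so the sketch below assumes both: a student passes only if every subject score is at least 40, and the letter-grade thresholds in `GRADE_BANDS` are made up for illustration. It also assumes each subject is scored out of 100, so `percent` is `total_marks / 3`. The helper `add_features` is hypothetical.

```python
# Hypothetical pass rule: the student must score at least 40 in every subject
PASS_MARK = 40

# Hypothetical grading bands as (lower bound of percent, grade), checked top-down
GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (40, "E"), (0, "F")]


def add_features(row):
    """Return a copy of `row` with total_marks, result, percent and performance added."""
    scores = [row["math_score"], row["reading_score"], row["writing_score"]]
    total = sum(scores)
    # Assumes each subject is out of 100, so the maximum total is 300
    percent = round(total / 3, 2)
    result = "pass" if all(s >= PASS_MARK for s in scores) else "fail"
    # First band whose lower bound the percent reaches; (0, "F") always matches
    performance = next(grade for lower, grade in GRADE_BANDS if percent >= lower)
    return {**row, "total_marks": total, "result": result,
            "percent": percent, "performance": performance}


features = add_features({"math_score": 72, "reading_score": 72, "writing_score": 74})
print(features["total_marks"], features["result"], features["percent"], features["performance"])  # 218 pass 72.67 C
```

In pandas, the banding step is usually done with `pd.cut`, which is likely why `performance` shows up with the `category` dtype below.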
After creating those features, the data looks like this:
Data columns (total 12 columns):
- gender 1000 non-null object
- race_ethnicity 1000 non-null object
- parental_level_of_education 1000 non-null object
- lunch 1000 non-null object
- test_preparation_course 1000 non-null object
- math_score 1000 non-null int64
- reading_score 1000 non-null int64
- writing_score 1000 non-null int64
- total_marks 1000 non-null int64
- result 1000 non-null object
- percent 1000 non-null float64
- performance 1000 non-null category
Let's compare the pass percentage of male and female students.
The pass percentage of female students is 4% higher than that of male students.
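A minimal sketch of how a pass percentage per group can be computed, assuming each row is a dict whose `result` column holds `"pass"` or `"fail"` as created above. `pass_rate_by` is a hypothetical helper, and the same call works for the other categorical columns examined below. In pandas, a rough equivalent is `df.groupby("gender")["result"].apply(lambda s: (s == "pass").mean() * 100)`.

```python
from collections import defaultdict


def pass_rate_by(rows, column):
    """Percentage of rows with result == 'pass' within each value of `column`."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = row[column]
        total[key] += 1
        if row["result"] == "pass":
            passed[key] += 1
    return {key: round(100 * passed[key] / total[key], 1) for key in total}


# Toy rows, not taken from the dataset
toy = [
    {"gender": "female", "result": "pass"},
    {"gender": "female", "result": "pass"},
    {"gender": "male", "result": "pass"},
    {"gender": "male", "result": "fail"},
]
print(pass_rate_by(toy, "gender"))  # {'female': 100.0, 'male': 50.0}
```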
More than 50% of students didn't complete the test preparation course, but let's see whether there is any difference in pass percentage:
- male students - 48.2%
- female students - 51.8%
Now I am going to examine the other features that might affect performance.
race_ethnicity :
Students from group C and group D perform better than students from the other groups.
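The post doesn't say how performance was compared across groups. One way, sketched here as an assumption, is to look at the share of each `performance` category within each `race_ethnicity` group, which is what `pd.crosstab(df["race_ethnicity"], df["performance"], normalize="index")` does in pandas. The stdlib version below uses a hypothetical helper `performance_share` and toy rows.

```python
from collections import Counter, defaultdict


def performance_share(rows, group_col="race_ethnicity", target_col="performance"):
    """For each group, the share (%) of students in each performance category."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_col]][row[target_col]] += 1
    shares = {}
    for group, counter in counts.items():
        n = sum(counter.values())
        shares[group] = {cat: round(100 * c / n, 1) for cat, c in counter.items()}
    return shares


# Toy rows, not taken from the dataset
toy = [
    {"race_ethnicity": "group C", "performance": "A"},
    {"race_ethnicity": "group C", "performance": "B"},
    {"race_ethnicity": "group A", "performance": "F"},
]
print(performance_share(toy))
```

Normalizing within each group matters here because the groups have different sizes, so raw counts alone would favor the larger groups.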
parental_level_of_education :
The feature parental_level_of_education has 6 unique values:
- high school
- some high school
- associate's degree
- some college
- bachelor's degree
- master's degree
To simplify the analysis, I am going to group these values into a new feature with three categories:
- basic education - high school, some high school
- good education - associate's degree, some college
- high-level education - bachelor's degree, master's degree
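The grouping above is a simple lookup table. This is a sketch, assuming the raw values are spelled exactly as listed; the dict `EDUCATION_GROUPS` and the helper names are hypothetical. In pandas, the equivalent step would be `df["parental_level_of_education"].map(EDUCATION_GROUPS)`.

```python
from collections import Counter

# Hypothetical mapping from the 6 raw values to the 3 coarser categories
EDUCATION_GROUPS = {
    "high school": "basic education",
    "some high school": "basic education",
    "associate's degree": "good education",
    "some college": "good education",
    "bachelor's degree": "high-level education",
    "master's degree": "high-level education",
}


def group_parental_education(level):
    """Map a raw parental_level_of_education value to its coarse category.

    Raises KeyError for unexpected spellings so they are not silently mislabeled.
    """
    return EDUCATION_GROUPS[level]


def count_by_group(levels):
    """Count how many students fall into each coarse category."""
    return Counter(group_parental_education(level) for level in levels)


print(count_by_group(["high school", "some college", "master's degree", "some high school"]))
```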
Now let's analyze this feature.
More students have parents with basic or good education than parents with high-level education.
Next, I examine the pass percentage of each category.
Students whose parents have basic education have a lower pass percentage than students whose parents have good or high-level education:
- Students whose parents have basic education - 94.7% pass
- Students whose parents have good education - 98.2% pass
- Students whose parents have high-level education - 98.9% pass
Lunch :
Next, I analyze whether lunch has any impact on student performance.
Students with free or reduced lunch appear to perform worse in exams than students with standard lunch.
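To back a statement like this with a number, one option (an assumption, since the post doesn't show its calculation) is to compare the mean `percent` for each lunch type. `mean_percent_by` is a hypothetical helper and the rows are toy data; in pandas this is `df.groupby("lunch")["percent"].mean()`.

```python
from collections import defaultdict
from statistics import mean


def mean_percent_by(rows, column, value_col="percent"):
    """Mean of `value_col` for each value of `column`, rounded to 2 decimals."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[value_col])
    return {key: round(mean(values), 2) for key, values in groups.items()}


# Toy rows, not taken from the dataset
toy = [
    {"lunch": "standard", "percent": 70.0},
    {"lunch": "standard", "percent": 80.0},
    {"lunch": "free/reduced", "percent": 60.0},
]
print(mean_percent_by(toy, "lunch"))  # {'standard': 75.0, 'free/reduced': 60.0}
```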
test_preparation_course :
Students who completed the course have a higher pass percentage than students who didn't, so the test preparation course does have an impact on student performance.
We could go further and compare performance in math, reading, and writing separately, but I am going to stop the analysis here.
Conclusion :
Students with the following attributes tend to perform better than other students:
- race_ethnicity - group C or group D
- parental education - good or high-level education
- lunch - standard
- test_preparation_course - completed
For the prediction and the code for this project, please see the following links.