Analysis and Prediction of Student Performance


In this post, I am going to analyze student performance in exams using a Kaggle dataset



The aim of this project is to predict how well a student is going to perform, given their attributes
  • The dataset for this project - Data
  • Kaggle Kernel - 
  • Github repo - Github
This is the data we have :

Data columns (total 8 columns): 

  1. gender 1000 non-null object 
  2. race_ethnicity 1000 non-null object 
  3. parental_level_of_education 1000 non-null object
  4. lunch 1000 non-null object 
  5. test_preparation_course 1000 non-null object 
  6. math_score 1000 non-null int64 
  7. reading_score 1000 non-null int64 
  8. writing_score 1000 non-null int64 
dtypes: int64(3), object(5)


We don't have any missing values in the data
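A check like this can be sketched with pandas. The rows below are made-up stand-ins that only reuse the column names listed above; the real file has 1000 rows and would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical toy rows (NOT taken from the Kaggle file); only the
# column names come from the listing above.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "race_ethnicity": ["group B", "group C", "group D", "group A"],
    "parental_level_of_education": [
        "bachelor's degree", "some college", "high school", "master's degree",
    ],
    "lunch": ["standard", "free/reduced", "standard", "standard"],
    "test_preparation_course": ["none", "completed", "none", "completed"],
    "math_score": [72, 47, 90, 65],
    "reading_score": [72, 57, 95, 60],
    "writing_score": [74, 44, 93, 58],
})

# Count missing values per column; all zeros means nothing to impute.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Single boolean summary for the whole frame.
has_missing = bool(df.isnull().values.any())
print("Any missing values:", has_missing)
```

`df.info()` gives the same information in the "non-null" column of its output.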

I am going to check the distributions of the math, reading, and writing scores to see if I can find any outliers
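Besides plotting the distributions, one possible numeric check is the common 1.5 × IQR rule. This is a sketch of that rule, not necessarily the check used for this post, and the scores below are made up for illustration.

```python
import pandas as pd

# Made-up scores for illustration only (not taken from the dataset).
scores = pd.DataFrame({
    "math_score": [72, 47, 90, 65, 58, 81, 69, 77],
    "reading_score": [72, 57, 95, 60, 62, 85, 70, 79],
    "writing_score": [74, 44, 93, 58, 60, 83, 68, 80],
})


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Summary statistics give a quick feel for each distribution.
print(scores.describe())

# Number of flagged values per score column.
outlier_counts = {col: len(iqr_outliers(scores[col])) for col in scores.columns}
print(outlier_counts)
```

The helper `iqr_outliers` is a hypothetical name; a box plot of each score column shows the same fences visually.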



Well, I don't think we have any outliers or missing values in this data, so I am going to proceed
with basic EDA. Before that, let's do some feature engineering

Feature engineering :

I am going to create 4 new features from the given data
  1. Total marks - Sum of the math, reading, and writing scores
  2. Result - Whether the student passed or failed the exam
  3. Percentage - The student's total marks as a percentage
  4. Performance - How well a student performed based on the grading system
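The four features above could be built roughly as follows. The post does not state the pass mark or the grade bands, so `PASS_MARK`, the bins, and the labels are assumptions, as is the maximum of 300 marks (three scores out of 100 each). The rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "math_score": [72, 35, 90, 65],
    "reading_score": [72, 57, 95, 60],
    "writing_score": [74, 44, 93, 58],
})

score_cols = ["math_score", "reading_score", "writing_score"]
PASS_MARK = 40   # assumed per-subject pass mark, not stated in the post
MAX_TOTAL = 300  # assumes each of the three scores is out of 100

# 1. Total marks: sum of the three subject scores.
df["total_marks"] = df[score_cols].sum(axis=1)

# 2. Result: pass only if every subject reaches the assumed pass mark.
passed_all = (df[score_cols] >= PASS_MARK).all(axis=1)
df["result"] = passed_all.map({True: "pass", False: "fail"})

# 3. Percentage: total marks relative to the assumed maximum.
df["percent"] = (df["total_marks"] / MAX_TOTAL * 100).round(2)

# 4. Performance: assumed grade bands, stored as a categorical column.
df["performance"] = pd.cut(
    df["percent"],
    bins=[0, 40, 60, 80, 100],
    labels=["poor", "average", "good", "excellent"],
    include_lowest=True,
)

print(df)
```

`pd.cut` returns a `category` column, which matches the dtype shown for `performance` in the listing that follows.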
After creating those features, this is how the data looks :

Data columns (total 12 columns): 
  1. gender 1000 non-null object
  2. race_ethnicity 1000 non-null object 
  3. parental_level_of_education 1000 non-null object 
  4. lunch 1000 non-null object 
  5. test_preparation_course 1000 non-null object
  6. math_score 1000 non-null int64 
  7. reading_score 1000 non-null int64 
  8. writing_score 1000 non-null int64 
  9. total_marks 1000 non-null int64
  10. result 1000 non-null object
  11. percent 1000 non-null float64 
  12. performance 1000 non-null category

First few rows of data :



EDA : 

Let's compare the pass percentage of male and female students




The female students' pass percentage is about 4 percentage points higher than the male students' pass percentage
  • male students - 48.2%
  • female students - 51.8%
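A per-group pass percentage like the one above can be computed with a small groupby helper. This is a minimal sketch on made-up rows; `pass_percentage_by` is a hypothetical helper name, and it assumes the `result` column holds the strings "pass" and "fail".

```python
import pandas as pd

# Made-up rows for illustration only; "result" is the pass/fail feature.
df = pd.DataFrame({
    "gender": ["female"] * 4 + ["male"] * 4,
    "result": ["pass", "pass", "pass", "fail", "pass", "pass", "fail", "fail"],
})


def pass_percentage_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of rows with result == 'pass' within each group of `column`."""
    is_pass = frame["result"].eq("pass")
    # The mean of a boolean Series is the fraction of True values.
    return (is_pass.groupby(frame[column]).mean() * 100).round(1)


pass_by_gender = pass_percentage_by(df, "gender")
print(pass_by_gender)
```

The same helper works for any other categorical column, for example `pass_percentage_by(df, "lunch")` on the full data.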
Now I am going to examine the other features that might affect the performance

race_ethnicity :

So the students from group C and group D perform better than the students from the other groups

parental_level_of_education :

The feature parental_level_of_education has 6 unique values
  1. high school
  2. some high school
  3. associate's degree
  4. some college
  5. bachelor's degree
  6. master's degree
I am going to create a new feature that groups the above values for better analysis
  • basic education - high school, some high school
  • good education - associate's degree, some college
  • high-level education - bachelor's degree, master's degree
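The grouping above maps naturally onto a dictionary lookup. In this sketch the new column name `parent_education` is a hypothetical choice, and the rows are made up so that each of the six values appears once.

```python
import pandas as pd

# Made-up rows covering the six values listed above.
df = pd.DataFrame({
    "parental_level_of_education": [
        "high school", "some high school", "associate's degree",
        "some college", "bachelor's degree", "master's degree",
    ]
})

# Grouping taken from the list above.
education_groups = {
    "high school": "basic education",
    "some high school": "basic education",
    "associate's degree": "good education",
    "some college": "good education",
    "bachelor's degree": "high-level education",
    "master's degree": "high-level education",
}

# Hypothetical column name for the grouped feature.
df["parent_education"] = df["parental_level_of_education"].map(education_groups)

# Any value missing from the mapping would become NaN, so check for that.
unmapped = int(df["parent_education"].isna().sum())
print("Unmapped rows:", unmapped)
print(df["parent_education"].value_counts())
```

Checking for unmapped rows guards against spelling variants in the raw column silently turning into missing values.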
Now I am gonna analyze this feature

More students have parents with a basic or good education than parents with a high-level education

I am going to examine the pass percentage of each category

So the students whose parents have a basic education have a lower pass percentage than the students whose parents have a good or high-level education
  • Students of parents with basic education - 94.7% pass
  • Students of parents with good education - 98.2% pass
  • Students of parents with high-level education - 98.9% pass


Lunch :

I am gonna analyze if lunch has any impact on student performance


So students with free or reduced lunch might not perform as well in exams as students with standard lunch


test_preparation_course :


More than 50% of students didn't complete the test preparation course, but let's see if there is any difference in pass percentage.
The students who completed the course have a higher pass percentage than the students who didn't, so the test preparation course does have an impact on student performance
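Both numbers discussed above, the share of students per course status and the pass percentage within each status, can be obtained with `value_counts` and a row-normalised `pd.crosstab`. This is a sketch on made-up rows, assuming `result` holds "pass" and "fail".

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "test_preparation_course": [
        "none", "none", "none", "none", "completed", "completed",
    ],
    "result": ["pass", "pass", "fail", "fail", "pass", "pass"],
})

# Share of students in each course status, in percent.
course_share = df["test_preparation_course"].value_counts(normalize=True) * 100

# Row-normalised crosstab: each row sums to 100, so the "pass" column
# is the pass percentage within each course status.
result_by_course = pd.crosstab(
    df["test_preparation_course"], df["result"], normalize="index"
) * 100

print(course_share.round(1))
print(result_by_course.round(1))
```

Normalising by row rather than over the whole table keeps the comparison fair when the two groups differ in size.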

We could analyze further and compare the individual performances in math, reading, and writing, but I am going to stop this analysis here

Conclusion :

The students with the following attributes are likely to perform better than other students

  • race_ethnicity - group C or group D
  • parental_level_of_education - good education or high-level education
  • lunch - standard
  • test_preparation_course - completed

For the prediction and the code for this project, please go to the following links
