Iris Species Classification
For the complete notebook, visit: GitHub repo
In this project, I explore the famous Iris dataset.
The dataset contains the following features:
- SepalLengthCm: sepal length in cm
- SepalWidthCm: sepal width in cm
- PetalLengthCm: petal length in cm
- PetalWidthCm: petal width in cm
- Species: the species of the flower
Each row in the dataset represents one flower. The goal of this notebook is to predict the species of a flower from the four measurements above.
There are three flower species in this dataset:
- Iris-setosa
- Iris-versicolor
- Iris-virginica
[Images: Iris-setosa | Iris-versicolor | Iris-virginica, each with its image source]
Here are the first five rows of the data:
Index | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
Summary statistics (count, mean, standard deviation, and percentiles) for each feature:
Statistic | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
---|---|---|---|---|
count | 150.000000 | 150.000000 | 150.000000 | 150.000000 |
mean | 5.843333 | 3.054000 | 3.758667 | 1.198667 |
std | 0.828066 | 0.433594 | 1.764420 | 0.763161 |
min | 4.300000 | 2.000000 | 1.000000 | 0.100000 |
25% | 5.100000 | 2.800000 | 1.600000 | 0.300000 |
50% | 5.800000 | 3.000000 | 4.350000 | 1.300000 |
75% | 6.400000 | 3.300000 | 5.100000 | 1.800000 |
max | 7.900000 | 4.400000 | 6.900000 | 2.500000 |
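The two tables above can be reproduced with a few lines of pandas. This is a minimal sketch, not the notebook's actual code: it assumes the copy of the Iris data bundled with scikit-learn rather than whatever file the notebook reads, and it renames the columns to match the ones used in this post. Because that copy differs from other distributions of the dataset in a few rows, the statistics may not match the table above to the last digit.

```python
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled Iris data instead of the notebook's own file.
iris = load_iris(as_frame=True)
df = iris.data.copy()

# Rename the columns to the names used in this post.
df.columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

# Map the integer targets (0, 1, 2) to labels such as "Iris-setosa".
df["Species"] = ["Iris-" + iris.target_names[i] for i in iris.target]

print(df.head())      # first five rows
print(df.describe())  # count, mean, std, min, quartiles, max
```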
EDA
Sepal Length vs Sepal Width
Petal Length vs Petal Width
Plot 1:
Plot 2:
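As an illustration of the two feature comparisons named in this section, here is a hedged sketch that draws them with matplotlib. The choice of scatter plots colored by species is my assumption about the plot type, and the data again comes from scikit-learn's bundled copy rather than the notebook's own file.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data, with columns renamed as in this post.
iris = load_iris(as_frame=True)
df = iris.data.copy()
df.columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
df["Species"] = ["Iris-" + iris.target_names[i] for i in iris.target]

# One panel per feature pair discussed in this section.
pairs = [("SepalLengthCm", "SepalWidthCm"), ("PetalLengthCm", "PetalWidthCm")]
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

for ax, (x_col, y_col) in zip(axes, pairs):
    # Draw each species as its own scatter series so it gets its own color.
    for species, group in df.groupby("Species"):
        ax.scatter(group[x_col], group[y_col], label=species)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.set_title(f"{x_col} vs {y_col}")

axes[0].legend()
fig.tight_layout()
# In a notebook the figure renders on its own; in a script, call plt.show().
```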
Classification
Before proceeding with classification, I check the correlation between the features.
Sepal width is only weakly correlated with the other features, so I drop it and proceed with the remaining three.
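The correlation check and the feature drop could look roughly like the sketch below. It assumes the Pearson correlation (the pandas default) and scikit-learn's bundled copy of the data; the names `X` and `y` are my own, not taken from the notebook.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data, with columns renamed as in this post.
iris = load_iris(as_frame=True)
df = iris.data.copy()
df.columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
df["Species"] = ["Iris-" + iris.target_names[i] for i in iris.target]

# Pairwise Pearson correlation between the four numeric features.
corr = df.drop(columns="Species").corr()
print(corr.round(2))

# Drop sepal width and keep the other three features as model inputs.
X = df.drop(columns=["SepalWidthCm", "Species"])
y = df["Species"]
print(X.columns.tolist())
```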
Logistic Regression:
Test accuracy: 0.83
Train accuracy: 0.95
Support Vector Classifier:
Test accuracy: 0.96
Train accuracy: 0.98
K-Nearest Neighbors:
Test accuracy: 1.0
Train accuracy: 0.95
Random Forest:
Test accuracy: 0.96
Train accuracy: 1.0
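For readers who want to try a comparison like the one above, here is a minimal sketch that fits the same four model families with scikit-learn. The train/test split (80/20, stratified, `random_state=0`) and every hyperparameter are assumptions on my part, since the post does not state them, so the accuracies it prints will not necessarily match the numbers above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled Iris data, with sepal width dropped as described.
iris = load_iris(as_frame=True)
X = iris.data.copy()
X.columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
X = X.drop(columns="SepalWidthCm")
y = iris.target

# Assumption: 80/20 stratified split; the post does not give its split settings.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Assumption: mostly default hyperparameters for every model.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Classifier": SVC(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy for classifiers.
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    results[name] = (train_acc, test_acc)
    print(f"{name}: train accuracy = {train_acc:.2f}, test accuracy = {test_acc:.2f}")
```

With only 150 rows, a single split leaves about 30 test samples, so one misclassified flower moves test accuracy by roughly 0.03. Differences of that size between models should be read with caution.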