Iris Species Classification


For the complete notebook, visit: GitHub repo

In this project, I am going to explore the famous Iris dataset.

This dataset contains the following columns:
  • SepalLengthCm: Sepal length in cm
  • SepalWidthCm: Sepal width in cm
  • PetalLengthCm: Petal length in cm
  • PetalWidthCm: Petal width in cm
  • Species: The species of the flower

Each row in this dataset represents one flower. The goal of this notebook is to predict a flower's species from the four measurements above.

There are three unique flower species in this dataset:

  1. Iris-setosa
  2. Iris-versicolor
  3. Iris-virginica

Iris-setosa | Source
Iris-versicolor | Source
Iris-virginica | Source

First, I am going to analyze the data.

Here are the first five rows of the data:

   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0            5.1           3.5            1.4           0.2  Iris-setosa
1            4.9           3.0            1.4           0.2  Iris-setosa
2            4.7           3.2            1.3           0.2  Iris-setosa
3            4.6           3.1            1.5           0.2  Iris-setosa
4            5.0           3.6            1.4           0.2  Iris-setosa
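A minimal sketch of getting records like these into Python with only the standard library. The CSV text is just the five rows shown above; the notebook itself presumably reads the full 150-row file (for example with pandas' `read_csv` followed by `head()`), which the post does not show, so the inline `SAMPLE_CSV` string and the `load_rows` helper are hypothetical stand-ins.

```python
import csv
import io

# The five rows shown above, as CSV text (a hypothetical stand-in for the real data file).
SAMPLE_CSV = """SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
"""

FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]


def load_rows(text):
    """Parse CSV text into dicts, converting the four measurement columns to float."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {name: float(record[name]) for name in FEATURES}
        row["Species"] = record["Species"]  # the label stays a string
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)
for i, row in enumerate(rows[:5]):  # roughly what DataFrame.head() displays
    print(i, [row[name] for name in FEATURES], row["Species"])
```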

Next, I check the summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) of each numeric column:

       SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count     150.000000    150.000000     150.000000    150.000000
mean        5.843333      3.054000       3.758667      1.198667
std         0.828066      0.433594       1.764420      0.763161
min         4.300000      2.000000       1.000000      0.100000
25%         5.100000      2.800000       1.600000      0.300000
50%         5.800000      3.000000       4.350000      1.300000
75%         6.400000      3.300000       5.100000      1.800000
max         7.900000      4.400000       6.900000      2.500000
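Each column of the summary above can be reproduced with Python's `statistics` module. This is a sketch, not the notebook's code (which most likely called pandas' `describe()`); it runs on the sepal lengths of the five sample rows only, so its numbers differ from the 150-row table. It uses the sample standard deviation (n - 1 denominator) and linear interpolation for the quartiles, which are the conventions pandas uses by default.

```python
import statistics

# Sepal length of the five sample rows shown earlier (the real table uses all 150 rows).
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0]


def describe(values):
    """Rough stdlib equivalent of one column of pandas' describe()."""
    # method="inclusive" interpolates linearly between order statistics,
    # treating the minimum as the 0th and the maximum as the 100th percentile.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


summary = describe(sepal_length)
for key, value in summary.items():
    print(f"{key:>5} {value:.6f}")
```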

EDA

Sepal Length vs Sepal Width

Petal Length vs Petal Width

Comparing Distributions

Plot 1:

Plot 2:

Classification :

Before proceeding with classification, I am going to check the correlation between the features.

Sepal Width is only weakly correlated with the other features, so I am going to drop it and proceed with the remaining three.
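The correlation check and the decision to drop Sepal Width can be sketched with a hand-written Pearson correlation. This is only an illustration: the notebook most likely used pandas' `DataFrame.corr()` (Pearson by default) plus a heatmap, and the five sample rows used here are far too few to reproduce the full-dataset correlations. In these five rows Petal Width is constant (0.2), so its correlations come out as NaN, a limitation of the tiny sample rather than of the dataset. The drop itself is hard-coded to match the author's choice, not derived from a threshold.

```python
import math

# The five sample rows shown earlier, stored column by column.
columns = {
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
    "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
}


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either column is constant."""
    if min(x) == max(x) or min(y) == max(y):
        return math.nan  # undefined when a column has zero variance
    mx = math.fsum(x) / len(x)
    my = math.fsum(y) / len(y)
    cov = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(math.fsum((a - mx) ** 2 for a in x))
    sy = math.sqrt(math.fsum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


names = list(columns)
corr = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
for a in names:
    print(f"{a:>14}", " ".join(f"{corr[a][b]:6.2f}" for b in names))

# Drop SepalWidthCm (the author's choice) and keep the other three features.
kept = [name for name in names if name != "SepalWidthCm"]
reduced = {name: columns[name] for name in kept}
print("Kept:", kept)
```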

Logistic Regression :

Test accuracy: 0.83
Train accuracy: 0.95
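The train and test accuracies reported for each model are the fraction of correctly predicted species on the training and held-out rows. The post does not describe how the data was split, so the sketch below assumes a shuffled hold-out split with a hypothetical 20% test fraction and a fixed seed; `split_train_test` and `accuracy` are illustrative helpers standing in for something like scikit-learn's `train_test_split` and `accuracy_score`.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of the items and hold out the last test_fraction of them."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train, test = split_train_test(range(150))  # 150 row indices, as in the Iris data
print(len(train), len(test))  # 120 30
print(accuracy(["a", "b", "c", "c"], ["a", "b", "c", "a"]))  # 0.75
```

A large gap between the two numbers (as with Logistic Regression here, 0.95 vs 0.83) is what to watch for; it means the model does noticeably worse on rows it did not see during training.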

Support Vector Classifier :

Test accuracy: 0.96
Train accuracy: 0.98

K-Nearest Neighbors :

Test accuracy: 1.0
Train accuracy: 0.95
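K-nearest neighbors is simple enough to sketch directly: to classify a flower, find the k training points closest to it and take a majority vote over their labels. This sketch assumes Euclidean distance (scikit-learn's default metric) and a hypothetical k = 3 (scikit-learn's `KNeighborsClassifier` defaults to 5); the post does not state the settings it used. The training points are made-up 2-D toy values, not iris measurements.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to x and keep the k closest.
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),
    )[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, most_common keeps the label met first, i.e. the one nearest to x.
    return votes.most_common(1)[0][0]


# Made-up 2-D toy points with two classes, not iris measurements.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (1.1, 0.9)))  # A
print(knn_predict(train_X, train_y, (4.8, 5.2)))  # B
```

Because each training point is classified by a vote of its neighbors rather than by itself alone, k-NN with k > 1 need not score 1.0 even on its own training data, which is consistent with the 0.95 train accuracy above.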

Random Forest :

Test accuracy: 0.96
Train accuracy: 1.0
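A random forest trains many decision trees, each on a bootstrap sample of the training rows with randomized feature selection (at every split, in the standard algorithm), and predicts by majority vote. The sketch below is a heavily simplified version of that idea, not scikit-learn's `RandomForestClassifier`, which the notebook most likely used: each "tree" is a one-split stump on a single randomly chosen feature, and the number of trees, the seed and the toy points are all hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Fit a one-split tree: the threshold on `feature` that classifies the most rows correctly."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        left = [label for row, label in zip(X, y) if row[feature] <= threshold]
        right = [label for row, label in zip(X, y) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        correct = left.count(left_label) + right.count(right_label)
        if best is None or correct > best[0]:
            best = (correct, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) row indices with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feature = rng.randrange(len(X[0]))
        sample_X = [X[i] for i in idx]
        sample_y = [y[i] for i in idx]
        forest.append(fit_stump(sample_X, sample_y, feature))
    return forest


def predict_forest(forest, x):
    """Majority vote over the stumps' predictions."""
    votes = Counter(
        left if x[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Made-up 2-D toy points with two classes, not iris measurements.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["A", "A", "A", "B", "B", "B"]

forest = fit_forest(train_X, train_y)
print(predict_forest(forest, (0.5, 0.5)))
print(predict_forest(forest, (6.0, 6.0)))
```

Real random forests grow much deeper trees, which can fit the training rows almost perfectly; that is one plausible reason for the train accuracy of 1.0 above, though the post does not report the settings used.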
