IRIS Species classification

October 13, 2018

For the complete notebook, visit the GitHub repo.

In this project, I am going to explore the famous Iris dataset.

This dataset contains the following columns:

SepalLengthCm : Sepal length in cm
SepalWidthCm : Sepal width in cm
PetalLengthCm : Petal length in cm
PetalWidthCm : Petal width in cm
Species : The species of the flower (the target to predict)

Each row represents a flower in this dataset. The goal of this notebook is to predict the species of a flower based on the four measurements above.

There are three unique flower species in this dataset:

Iris-setosa
Iris-versicolor
Iris-virginica

Iris-Setosa | Source

Iris-Versicolor | Source

Iris-Virginica | Source

First, I am going to analyze the data.

Here are the first five rows of the data:

	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
0	5.1	3.5	1.4	0.2	Iris-setosa
1	4.9	3.0	1.4	0.2	Iris-setosa
2	4.7	3.2	1.3	0.2	Iris-setosa
3	4.6	3.1	1.5	0.2	Iris-setosa
4	5.0	3.6	1.4	0.2	Iris-setosa
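A minimal sketch of how a table like the one above could be produced. The notebook presumably reads a CSV file, but its path is not given here, so this sketch instead assumes the copy of the dataset bundled with scikit-learn and renames the columns to match this post. The helper name `load_iris_frame` is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Build a DataFrame with the column names used in this post.

    Assumption: scikit-learn's bundled Iris data stands in for the CSV
    used in the original notebook.
    """
    raw = load_iris()
    frame = pd.DataFrame(
        raw.data,
        columns=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
    )
    # Map the integer targets (0, 1, 2) to the species labels shown above.
    names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
    frame["Species"] = [names[i] for i in raw.target]
    return frame


df = load_iris_frame()
print(df.head())  # first five rows
```

The scikit-learn copy differs from some CSV versions of the dataset in a couple of rows, so summary statistics computed from it may not match the numbers in this post exactly.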

Checking the mean and percentiles of the data:

	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm
count	150.000000	150.000000	150.000000	150.000000
mean	5.843333	3.054000	3.758667	1.198667
std	0.828066	0.433594	1.764420	0.763161
min	4.300000	2.000000	1.000000	0.100000
25%	5.100000	2.800000	1.600000	0.300000
50%	5.800000	3.000000	4.350000	1.300000
75%	6.400000	3.300000	5.100000	1.800000
max	7.900000	4.400000	6.900000	2.500000
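A summary like this is what pandas' `describe()` returns for numeric columns. The sketch below runs it on just the five rows shown earlier, so it stays self-contained; the numbers it prints therefore differ from the full 150-row summary above.

```python
import pandas as pd

# The first five rows from the table earlier in this post.
sample = pd.DataFrame(
    {
        "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
        "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
        "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
        "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
    }
)

# describe() reports count, mean, std, min, quartiles and max per column.
summary = sample.describe()
print(summary)
```

On the full dataset the same call would be `df.describe()`, where `df` holds all 150 rows.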

EDA

Sepal Length VS Sepal Width

Petal Length VS Petal Width
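The two scatter plots could be drawn along these lines. This is only a sketch: it assumes matplotlib and scikit-learn's bundled copy of the data, and the colours, figure size and layout are guesses, not the notebook's actual settings.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

raw = load_iris()
df = pd.DataFrame(
    raw.data,
    columns=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
)
df["Species"] = ["Iris-" + str(raw.target_names[i]) for i in raw.target]

# One panel per feature pair, one colour per species.
pairs = [("SepalLengthCm", "SepalWidthCm"), ("PetalLengthCm", "PetalWidthCm")]
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (x_col, y_col) in zip(axes, pairs):
    for species, group in df.groupby("Species"):
        ax.scatter(group[x_col], group[y_col], label=species)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend()
fig.tight_layout()
```

In a notebook, the figure shows up inline; in a script, `fig.savefig(...)` or `plt.show()` would be needed to see it.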

Comparing Distributions

Plot 1 :

Plot 2 :

Classification :

Before proceeding with classification, I am going to check the correlation between the features.

Sepal width has the weakest correlation with the other features, so I am going to drop it and proceed with the remaining three features.
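One way to make this check concrete is to compute the Pearson correlation matrix and rank each feature by its average absolute correlation with the others. The ranking rule here is my own assumption for illustration (the notebook may simply have read the result off a heatmap), and the data again comes from scikit-learn's bundled copy.

```python
import pandas as pd
from sklearn.datasets import load_iris

raw = load_iris()
df = pd.DataFrame(
    raw.data,
    columns=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
)

# Pearson correlation between every pair of features.
corr = df.corr()
print(corr.round(2))

# Average absolute correlation of each feature with the other three.
# Subtracting 1 removes the feature's correlation with itself.
strength = (corr.abs().sum() - 1) / (len(corr) - 1)
weakest = strength.idxmin()
print("Weakest feature:", weakest)

# Drop the weakest feature before modelling.
features = df.drop(columns=[weakest])
```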

Logistic Regression :

Test accuracy : 0.83

Train accuracy : 0.95

Support Vector Classifier :

Test accuracy : 0.96

Train accuracy : 0.98

K-Nearest Neighbors :

Test accuracy: 1.0

Train accuracy: 0.95

Random Forest :

Test accuracy: 0.96

Train accuracy: 1.0
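The four models above can be compared with a loop like the following. This is a sketch under assumptions, not the notebook's exact code: the 70/30 split, `random_state=0`, and all hyperparameters are placeholders I picked, so the accuracies it prints will not necessarily match the numbers reported above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Keep sepal length, petal length and petal width; drop sepal width (column 1).
X = X[:, [0, 2, 3]]

# Assumed split: 30% held out for testing, stratified by species.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Classifier": SVC(),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy for classifiers.
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    train_acc, test_acc = scores[name]
    print(f"{name}: train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

With only 150 rows, the test set is small, so a single misclassified flower shifts test accuracy by a couple of percentage points. Cross-validation (for example `cross_val_score`) would give a steadier comparison.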
