DECISION TREE CLASSIFICATION MODEL

A decision tree classifier is a supervised machine learning technique. Its structure has four main parts:

1. Root node:

The topmost node of the tree, where the decision process starts.

2. Branches:

Branches connect a decision node to its sub-trees. Each branch corresponds to one outcome of the node's test.

3. Leaf nodes:

Leaf nodes are the final output nodes, which cannot be divided any further.

4. Decision nodes:

Decision nodes are nodes that can be split further into sub-trees. Each decision node is divided into other decision nodes or leaf nodes.
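To make the four parts concrete, here is a minimal Python sketch of a tree built from nested nodes. The class names Leaf and DecisionNode, the predict helper, and the Light/CO2 thresholds are hypothetical and were chosen only for illustration. They are not scikit-learn's internal representation and were not learned from the occupancy data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf node: a final output that is not split any further."""
    label: int


@dataclass
class DecisionNode:
    """Decision node: tests one feature against a threshold and branches."""
    feature: str
    threshold: float
    left: "Node"   # branch followed when value <= threshold
    right: "Node"  # branch followed when value > threshold


Node = Union[Leaf, DecisionNode]


def predict(node: Node, sample: dict) -> int:
    """Start at the root and follow branches until a leaf is reached."""
    while isinstance(node, DecisionNode):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hypothetical hand-built tree: the thresholds are illustrative, not learned.
root = DecisionNode(
    feature="Light",
    threshold=365.0,
    left=Leaf(label=0),
    right=DecisionNode(
        feature="CO2",
        threshold=600.0,
        left=Leaf(label=0),
        right=Leaf(label=1),
    ),
)

print(predict(root, {"Light": 50.0, "CO2": 450.0}))   # 0
print(predict(root, {"Light": 450.0, "CO2": 800.0}))  # 1
```

The outer DecisionNode is the root. Each if/else choice in predict follows one branch, and the loop stops as soon as it reaches a leaf, whose label is the output.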


For modeling I used the UCI Occupancy Detection dataset from: https://data.world/uci/occupancy-detection


I trained the model on the training file (datatraining.txt) and tested it on the test file (datatest.txt).


I used Temperature, Humidity, Light, CO2, and Humidity Ratio as input variables, and Occupancy as the output variable. After fitting the model, I computed its accuracy from the confusion matrix.
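Accuracy from a confusion matrix is the number of correct predictions (the diagonal) divided by the total number of samples. The NumPy-only sketch below shows this for binary labels, using the same layout as scikit-learn's confusion_matrix (rows are true classes, columns are predicted classes). The helper names and the five labels are hypothetical and do not come from the dataset.

```python
import numpy as np


def binary_confusion_matrix(y_true, y_pred):
    """2x2 matrix: rows are true classes (0, 1), columns are predicted classes (0, 1)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def accuracy_from_confusion(cm):
    """Correct predictions sit on the diagonal, so accuracy = trace / total."""
    return np.trace(cm) / cm.sum()


# Hypothetical labels (0 = not occupied, 1 = occupied), not taken from the dataset.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

cm = binary_confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1]
#  [1 2]]
print(accuracy_from_confusion(cm))  # 0.6
```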


Here is the code for Decision Tree Classifier:


#loading the libraries and reading the dataset

import numpy as np

import pandas as pd


#read the data files

train = pd.read_csv('datatraining.txt')

test = pd.read_csv('datatest.txt')


train.head()


test.head()

train.shape


test.shape

#check whether the data has any NaN values

train.isnull().sum()

test.isnull().sum()


#as there are no NaN values, no data cleaning is required


#columns 1-5 are the inputs (Temperature to HumidityRatio); column 6 is Occupancy

x_train = train.iloc[:, 1:6].values

y_train = train.iloc[:,6].values

x_test = test.iloc[:, 1:6].values

y_test = test.iloc[:,6].values


#scaling the input variables (optional for decision trees, which split each feature on a threshold)

from sklearn.preprocessing import StandardScaler

s_x= StandardScaler()

x_train= s_x.fit_transform(x_train)

x_test= s_x.transform(x_test)


from sklearn.tree import DecisionTreeClassifier

classifier= DecisionTreeClassifier(criterion='entropy', random_state=0)

classifier.fit(x_train, y_train)


y_pred= classifier.predict(x_test)

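from sklearn.metrics import confusion_matrix, accuracy_score

c_matrix = confusion_matrix(y_test, y_pred)


print(c_matrix)

#accuracy = correct predictions (diagonal of c_matrix) / all test samples

print(accuracy_score(y_test, y_pred))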
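The classifier above is built with criterion='entropy', meaning each decision node picks the split that most reduces Shannon entropy (the information gain). The sketch below shows that calculation. It is a simplified illustration under stated assumptions, not scikit-learn's internal implementation. The function names and the label arrays are hypothetical. Base-2 logarithms are used here; changing the base only rescales the values and does not change which split wins.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy, in bits, of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    # p * log2(1/p) equals -p * log2(p) but gives 0.0 (not -0.0) for pure nodes
    return float(np.sum(p * np.log2(1.0 / p)))


def information_gain(parent, left, right):
    """Drop in entropy when `parent` is split into `left` and `right`."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children


# Hypothetical occupancy labels reaching one decision node (0 = empty, 1 = occupied).
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Two hypothetical candidate splits of those labels.
mixed_left, mixed_right = np.array([0, 0, 0, 1]), np.array([0, 1, 1, 1])
pure_left, pure_right = np.array([0, 0, 0, 0]), np.array([1, 1, 1, 1])

print(entropy(parent))                                    # 1.0
print(information_gain(parent, mixed_left, mixed_right))  # about 0.189
print(information_gain(parent, pure_left, pure_right))    # 1.0
```

The pure split separates the classes completely, so it has the larger gain and would be preferred at this node.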



Conclusion: The accuracy on the test set is 88.14%, so the decision tree predicts room occupancy reasonably well on unseen data.


