DECISION TREE CLASSIFICATION MODEL
A decision tree classifier is a supervised machine learning technique. Its structure has four main parts:
1. Root node:
The topmost node, where the decision tree starts.
2. Branches:
Branches connect a decision node to its sub-trees, one branch for each outcome of the test at that node.
3. Leaf nodes:
Leaf nodes are the final output nodes, which cannot be divided any further.
4. Decision nodes:
Decision nodes are nodes that can be split further, either into other decision nodes or into leaf nodes.
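To make the four parts concrete, here is a minimal sketch of a binary tree in plain Python. The Node class, the predict function, the thresholds and the choice of features are all made up for illustration; this is not how scikit-learn stores its trees.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical illustration)."""
    feature: Optional[int] = None      # index of the feature tested (decision nodes only)
    threshold: Optional[float] = None  # split value (decision nodes only)
    left: Optional["Node"] = None      # branch taken when value <= threshold
    right: Optional["Node"] = None     # branch taken when value > threshold
    label: Optional[int] = None        # predicted class (leaf nodes only)

    def is_leaf(self) -> bool:
        return self.label is not None


def predict(node: Node, sample: Sequence[float]) -> int:
    """Walk from the root along the branches until a leaf is reached."""
    while not node.is_leaf():
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.label


# Made-up tree: feature 0 stands for Light, feature 1 for CO2.
root = Node(                      # root node (also a decision node)
    feature=0, threshold=300.0,
    left=Node(label=0),           # leaf node
    right=Node(                   # decision node
        feature=1, threshold=600.0,
        left=Node(label=0),       # leaf node
        right=Node(label=1),      # leaf node
    ),
)

print(predict(root, [450.0, 800.0]))  # 1
print(predict(root, [100.0, 800.0]))  # 0
```

The root is simply the first decision node, and every prediction is one path from the root down to a leaf.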
For modeling I used the Occupancy Detection dataset from: https://data.world/uci/occupancy-detection
I trained the model on the training dataset and tested it on the test dataset.
I used Temperature, Humidity, Light, CO2 and Humidity Ratio as input variables, and Occupancy as the output variable. Once the model is fitted, its accuracy is obtained from the confusion matrix.
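As a rough sketch of how an accuracy figure comes out of a confusion matrix: the correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the total count. The helper name accuracy_from_confusion and the counts below are invented for illustration and are not the results of this post.

```python
def accuracy_from_confusion(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Hypothetical 2x2 confusion matrix: rows are true classes, columns are predicted classes.
example_matrix = [[90, 10],
                  [5, 95]]

accuracy = accuracy_from_confusion(example_matrix)
print(f"Accuracy: {accuracy:.2%}")  # Accuracy: 92.50%
```

The same function should also work on the array returned by scikit-learn's confusion_matrix, since it only relies on indexing and row sums.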
Here is the code for Decision Tree Classifier:
# load the libraries
import numpy as np
import pandas as pd
# read the data files
train = pd.read_csv('datatraining.txt')
test = pd.read_csv('datatest.txt')
train.head()
test.head()
train.shape
test.shape
# check whether the data has any NaN values
train.isnull().sum()
test.isnull().sum()
# there are no NaN values, so no data cleaning is required
# input variables (columns 1 to 5) and output variable Occupancy (column 6)
x_train = train.iloc[:, 1:6].values
y_train = train.iloc[:, 6].values
x_test = test.iloc[:, 1:6].values
y_test = test.iloc[:, 6].values
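# scale the input variables (optional for a decision tree, which only compares values against thresholds)
from sklearn.preprocessing import StandardScaler
s_x = StandardScaler()
x_train = s_x.fit_transform(x_train)  # fit the scaler on the training data only
x_test = s_x.transform(x_test)        # reuse the training statistics on the test data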
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
from sklearn.metrics import confusion_matrix
c_matrix = confusion_matrix(y_test, y_pred)
c_matrix
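The criterion='entropy' setting used above tells the tree to pick splits by information gain, that is, by how much a split reduces the Shannon entropy of the class labels. Here is a small sketch of that calculation; the function names and the labels are made up for illustration and are not taken from the dataset or from scikit-learn's internals.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting `parent` into `left` and `right`."""
    weight_left = len(left) / len(parent)
    weight_right = len(right) / len(parent)
    return entropy(parent) - (weight_left * entropy(left) + weight_right * entropy(right))


# Made-up occupancy labels (1 = occupied, 0 = empty), split perfectly in two.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 0], [1, 1, 1, 1]

print(entropy(parent))                        # 1.0
print(information_gain(parent, left, right))  # 1.0
```

A split that separates the classes perfectly recovers the full 1 bit of entropy, while a split that leaves both sides as mixed as the parent gains nothing.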
Conclusion: The accuracy on the test dataset, computed from the confusion matrix, is 88.14%, so the model performs well on this data.



