RANDOM FOREST CLASSIFIER

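A random forest is an ensemble learning method. It trains many decision trees, each on a random bootstrap sample of the training data (and, at each split, on a random subset of the features), and then combines their outputs. For classification the trees vote on the class (scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities); for regression their predictions are averaged. Adding more trees usually makes the model more accurate and more stable, but the gains level off once there are enough trees.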
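To make the combining step concrete, here is a minimal pure-Python sketch of two ingredients described above: drawing a bootstrap sample of row indices for one tree, and combining the trees' outputs either by a simple majority vote or by averaging class probabilities. The helper names and the per-tree predictions are hypothetical illustrations. A real forest (such as scikit-learn's) also trains the trees and randomizes the features considered at each split.

```python
import random
from collections import Counter


def bootstrap_indices(n_rows, rng):
    """One bootstrap sample: n_rows row indices drawn with replacement,
    so some rows repeat and others are left out of this tree's data."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def majority_vote(tree_labels):
    """Combine the class labels predicted by each tree for one sample."""
    return Counter(tree_labels).most_common(1)[0][0]


def average_probabilities(tree_probas):
    """Average each tree's class-probability vector, then pick the class
    with the highest mean probability (soft voting)."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    mean = [sum(p[c] for p in tree_probas) / n_trees for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c]), mean


rng = random.Random(0)
print(bootstrap_indices(10, rng))            # row indices used by one tree
print(majority_vote([1, 0, 1, 1, 0]))        # 1
label, mean = average_probabilities([[0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
print(label, [round(m, 3) for m in mean])    # 1 [0.367, 0.633]
```

Majority voting counts only each tree's final label, while averaging also takes into account how confident each tree is.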

I used the dataset from: https://data.world/uci/occupancy-detection

I trained the model on the training file (datatraining.txt) and evaluated it on the test file (datatest.txt).

I used Temperature, Humidity, Light, CO2 and Humidity Ratio as the input variables and Occupancy as the output variable. After fitting the model, I computed its accuracy from the confusion matrix.
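A confusion matrix counts, for each actual class, how many test rows were predicted as each class, and accuracy is the fraction of rows on its diagonal. Below is a minimal pure-Python sketch for binary 0/1 labels. It uses the same [[TN, FP], [FN, TP]] layout as scikit-learn's confusion_matrix. The function names and the five-row label lists are hypothetical and not part of the original code.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Count outcomes for binary 0/1 labels as [[TN, FP], [FN, TP]]:
    row = actual class, column = predicted class."""
    cm = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1
    return cm


def accuracy_from_matrix(cm):
    """Correct predictions (the diagonal, TN + TP) divided by all predictions."""
    total = sum(sum(row) for row in cm)
    return (cm[0][0] + cm[1][1]) / total


# Hypothetical labels for five test rows.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
cm = confusion_matrix_2x2(y_true, y_pred)
print(cm)                        # [[1, 1], [1, 2]]
print(accuracy_from_matrix(cm))  # 0.6
```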

Here is the code for the Random Forest Classifier:


import pandas as pd

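# read the training and test data files
train = pd.read_csv('datatraining.txt')
test = pd.read_csv('datatest.txt')

# preview the first few rows (shown as output when run in a notebook)
train.head()
test.head()

# number of (rows, columns) in each set
train.shape
test.shape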

# check whether any column contains NaN (missing) values
train.isnull().sum()
test.isnull().sum()

# there are no NaN values, so no data cleaning is required

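# column 0 is the date; columns 1-5 are the inputs
# (Temperature, Humidity, Light, CO2, HumidityRatio); column 6 is Occupancy (0 or 1)
x_train = train.iloc[:, 1:6].values
y_train = train.iloc[:, 6].values
x_test = test.iloc[:, 1:6].values
y_test = test.iloc[:, 6].values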

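# scale the input variables to zero mean and unit variance;
# the scaler is fit on the training data only, then applied to the test data
# (tree-based models do not need scaling, so this step is optional here)
from sklearn.preprocessing import StandardScaler
s_x = StandardScaler()
x_train = s_x.fit_transform(x_train)
x_test = s_x.transform(x_test)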

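# 10 decision trees, with splits chosen by information gain (entropy);
# no random_state is set, so the results vary slightly between runs
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy")
classifier.fit(x_train, y_train)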

# predict occupancy for the test rows
y_pred = classifier.predict(x_test)

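# rows = actual class, columns = predicted class: [[TN, FP], [FN, TP]]
from sklearn.metrics import accuracy_score, confusion_matrix
c_matrix = confusion_matrix(y_test, y_pred)
print(c_matrix)
print(accuracy_score(y_test, y_pred))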
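The StandardScaler step in the code is not required for a random forest. Each tree splits on a threshold of a single feature, and standardizing is a strictly increasing transformation, so it preserves the order of the values and therefore the possible splits. The small pure-Python check below uses a hypothetical single-split "stump" scored by Gini impurity (scikit-learn's default criterion; the code above uses entropy, and the argument is the same for either). It runs on made-up light readings, and it is an illustration, not scikit-learn's implementation.

```python
import statistics


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(xs, ys):
    """Try a threshold midway between each pair of neighboring distinct
    values of one feature; return (threshold, indices sent left) for the
    split with the lowest weighted Gini impurity."""
    values = sorted(set(xs))
    best = None
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t, frozenset(i for i, x in enumerate(xs) if x <= t))
    return best[1], best[2]


def standardize(xs):
    """Zero mean, unit variance, using the population standard deviation
    (the same convention as scikit-learn's StandardScaler)."""
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]


# Made-up light readings and occupancy labels for eight rows.
light = [0, 5, 10, 400, 450, 500, 20, 430]
occupied = [0, 0, 0, 1, 1, 1, 0, 1]

t_raw, left_raw = best_split(light, occupied)
t_std, left_std = best_split(standardize(light), occupied)
print(t_raw, sorted(left_raw))   # 210.0 [0, 1, 2, 6]
print(left_raw == left_std)      # True: the same rows go left either way
```

Only the threshold value changes after scaling. The rows sent to each side stay the same, so the tree makes the same predictions.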


Conclusion:
Accuracy = (TN + TP) / Total = (1638 + 882) / 2665 = 2520 / 2665 ≈ 0.9456, i.e. about 94.56%
Increasing n_estimators (the number of decision trees) in RandomForestClassifier may improve the accuracy further, although the improvement usually levels off once there are enough trees.
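An idealized calculation shows why more trees help and why the help shrinks. Suppose each of n trees independently classified a sample correctly with probability p. A strict majority vote would then be correct with probability equal to the sum of C(n, k) p^k (1 - p)^(n - k) over all k > n/2. The sketch below assumes independent trees and p = 0.7, and both are simplifying assumptions. Real trees in a forest are correlated because they are trained on overlapping data, so real gains are smaller and level off sooner.

```python
from math import comb


def majority_correct_prob(n_trees, p):
    """Probability that a strict majority of n_trees independent voters,
    each correct with probability p, picks the right class.
    Assumes an odd n_trees so that ties cannot occur."""
    if n_trees % 2 == 0:
        raise ValueError("n_trees must be odd")
    need = n_trees // 2 + 1  # smallest strict majority
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(need, n_trees + 1)
    )


for n in (1, 3, 5, 7, 11, 51, 101):
    print(n, round(majority_correct_prob(n, 0.7), 4))
```

With 1, 3 and 5 trees the idealized accuracy goes 0.7, 0.784, 0.837: each extra pair of trees adds less than the previous pair did.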
