RANDOM FOREST CLASSIFIER

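A random forest trains many decision trees, each on a different random subset of the training data, and combines their outputs to improve accuracy: for classification the trees vote on the class, and for regression their outputs are averaged. Adding more trees generally makes the predictions more stable, although the gain in accuracy levels off after a point. It is an ensemble learning method.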
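To make the voting idea concrete, here is a minimal sketch of how the predictions of several trees could be combined by majority vote. The helper name majority_vote and the three lists of predictions are made up for illustration. Note also that scikit-learn's RandomForestClassifier averages the class probabilities of the trees rather than counting hard votes, so this shows the concept and not the library's internals.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return, for each sample, the class predicted by most trees.

    tree_predictions: one list of class labels per tree, all of equal length.
    """
    # zip(*...) regroups the labels so that each tuple holds all votes for one sample
    votes_per_sample = zip(*tree_predictions)
    return [Counter(votes).most_common(1)[0][0] for votes in votes_per_sample]


# Hypothetical predictions from three trees for four samples (1 = occupied)
tree_predictions = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]

forest_prediction = majority_vote(tree_predictions)
print(forest_prediction)  # [1, 0, 1, 0]
```

With an odd number of trees and two classes there can be no tie, which is why three trees are used here.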

I used the dataset from: https://data.world/uci/occupancy-detection

I trained the model on the training dataset and evaluated it on the test dataset.

I used Temperature, Humidity, Light, CO2 and Humidity Ratio as the input variables and Occupancy as the output variable. Once the model is fitted, the accuracy is computed from the confusion matrix.

Here is the code for the Random Forest Classifier:

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

#read the data files

train = pd.read_csv('datatraining.txt')

test = pd.read_csv('datatest.txt')

train.head()

test.head()

train.shape

test.shape

# check whether the data has any NaN values

train.isnull().sum()

test.isnull().sum()

# there are no NaN values, so no data cleaning is required

# columns 1-5 hold the five input variables; column 6 holds Occupancy

x_train = train.iloc[:, 1:6].values

y_train = train.iloc[:,6].values

x_test = test.iloc[:, 1:6].values

y_test = test.iloc[:,6].values

# scale the input variables: fit the scaler on the training data only, then apply it to the test data

from sklearn.preprocessing import StandardScaler

s_x= StandardScaler()

x_train= s_x.fit_transform(x_train)

x_test= s_x.transform(x_test)

from sklearn.ensemble import RandomForestClassifier

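# build a forest of 10 trees that split on entropy; results vary between runs unless random_state is set

classifier = RandomForestClassifier(n_estimators=10, criterion="entropy")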

classifier.fit(x_train, y_train)

# predict occupancy for the test set

y_pred = classifier.predict(x_test)

from sklearn.metrics import confusion_matrix

# rows are the actual classes, columns are the predicted classes

c_matrix = confusion_matrix(y_test, y_pred)

c_matrix

Conclusion:

Accuracy = (TP+TN)/TOTAL = (1638+882)/2665 = 0.94559, i.e. 94.56%
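The same arithmetic can be checked in a few lines of Python. This is only a sketch: the helper accuracy_from_confusion and the small example matrix are hypothetical, and only the figures 1638, 882 and 2665 come from the run reported here.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Figures reported in this post: 1638 + 882 correct out of 2665 test samples
reported_accuracy = (1638 + 882) / 2665
print(round(reported_accuracy, 5))  # 0.94559

# Hypothetical 2x2 matrix, used only to show the helper on a full matrix
example_matrix = [[50, 5],
                  [10, 35]]
print(accuracy_from_confusion(example_matrix))  # 0.85
```

If only the final number is needed, accuracy_score(y_test, y_pred) from sklearn.metrics should give the same result without going through the matrix.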

The accuracy may improve if n_estimators (the number of decision trees) in RandomForestClassifier is increased, although the gain usually levels off once the forest is large enough.
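One way to see the effect of n_estimators is to fit forests of different sizes and compare their test accuracy. The sketch below is an illustration under assumptions: it uses synthetic data from make_classification as a stand-in for the occupancy data, the tree counts are arbitrary, and random_state is fixed only to make the run repeatable. To try it on the real data, replace the synthetic arrays with x_train, y_train, x_test and y_test.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with five features (an assumption, not the occupancy dataset)
X, y = make_classification(n_samples=600, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit one forest per size and record its accuracy on the held-out split
scores = {}
for n_trees in (1, 10, 50, 100):
    model = RandomForestClassifier(n_estimators=n_trees, criterion="entropy",
                                   random_state=0)
    model.fit(X_tr, y_tr)
    scores[n_trees] = accuracy_score(y_te, model.predict(X_te))

for n_trees, score in scores.items():
    print(n_trees, round(score, 4))
```

The exact scores depend on the data and the random seed, so the numbers matter less than the overall trend across forest sizes.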
