A random forest trains many decision trees on different random subsets of the data and combines their outputs (a majority vote for classification, an average for regression) to improve accuracy. Adding more trees generally makes the predictions more stable, although the gain levels off after a point. It is an example of ensemble learning.
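To make the idea of combining trees concrete, here is a minimal sketch of a majority vote. The function name majority_vote and the five votes are made up for illustration and are not part of the dataset or of scikit-learn. As far as I know, scikit-learn's RandomForestClassifier averages the class probabilities of its trees rather than counting hard votes, so treat this only as a picture of the general idea.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among the individual tree predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical votes from five trees for one sample (1 = occupied, 0 = empty)
tree_votes = [1, 0, 1, 1, 0]
print(majority_vote(tree_votes))  # 1
```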
I used the dataset from: https://data.world/uci/occupancy-detection
I trained the model on the training dataset and evaluated it on the test dataset.
The input variables are Temperature, Humidity, Light, CO2 and Humidity Ratio, and the output variable is Occupancy. Once the model is fitted, the accuracy is computed from the confusion matrix.
Here is the code for Random Forest Classifier:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#read the data files
train = pd.read_csv('datatraining.txt')
test = pd.read_csv('datatest.txt')
train.head()
test.head()
train.shape
test.shape
#looking if the data has any nan values
train.isnull().sum()
test.isnull().sum()
#as there are no NaN values, cleaning of data is not required
#columns 1 to 5 hold the five input variables, column 6 is Occupancy
x_train = train.iloc[:, 1:6].values
y_train = train.iloc[:, 6].values
x_test = test.iloc[:, 1:6].values
y_test = test.iloc[:, 6].values
#scaling the input variables (fit on the training data only, then apply to the test data)
from sklearn.preprocessing import StandardScaler
s_x = StandardScaler()
x_train = s_x.fit_transform(x_train)
x_test = s_x.transform(x_test)
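#train a random forest of 10 trees, using entropy as the split criterion
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy")
classifier.fit(x_train, y_train)
#predict occupancy for the test set
y_pred = classifier.predict(x_test)
#compare the predictions with the true labels
from sklearn.metrics import confusion_matrix
c_matrix = confusion_matrix(y_test, y_pred)
c_matrix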
Conclusion:
Accuracy = (TP + TN) / Total = (1638 + 882) / 2665 = 0.94559, i.e. 94.56%
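The same arithmetic can be written as a small helper that divides the diagonal of a confusion matrix by the sum of all its cells. This is only a sketch: the name accuracy_from_confusion and the toy matrix are made up for illustration, because the post reports only the two diagonal counts and the total, not the full matrix.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy 2x2 matrix with made-up counts, laid out as [[TN, FP], [FN, TP]]
toy_matrix = [[8, 2], [1, 9]]
print(accuracy_from_confusion(toy_matrix))  # 0.85

# The numbers reported above: two diagonal counts and the test-set size
post_accuracy = (1638 + 882) / 2665
print(round(post_accuracy, 5))  # 0.94559
```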
Increasing n_estimators (the number of decision trees) in RandomForestClassifier may improve the accuracy further, although the gain typically levels off as more trees are added.
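A rough way to see why extra trees help less and less is an idealized calculation: if every tree were right independently with the same probability, the chance that the majority is right follows from the binomial distribution. The sketch below assumes a per-tree accuracy of 0.7 and full independence. Both are assumptions for illustration only; real trees in a forest are correlated, so this is not a prediction for the occupancy data.

```python
from math import comb

def majority_correct_prob(n_trees, p_single):
    """Probability that more than half of n_trees independent voters are right."""
    needed = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p_single**k * (1 - p_single) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )

# Assumed per-tree accuracy of 0.7; odd tree counts avoid tied votes
for n in (1, 11, 51, 101):
    print(n, round(majority_correct_prob(n, 0.7), 4))
```

The printed probabilities rise quickly for the first few trees and then flatten out, which is why it is worth checking the test accuracy at a few values of n_estimators instead of assuming that more is always better.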