Posts

LINEAR DISCRIMINANT ANALYSIS IN PYTHON

Linear discriminant analysis (LDA) is a supervised dimensionality reduction algorithm. When a dataset has a large number of features, computation becomes expensive, so we turn to dimensionality reduction methods. The dataset I used is the seeds dataset from: https://archive.ics.uci.edu/ml/datasets/seeds

Here is the code:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # loading the dataset
    df = pd.read_csv('seeds_dataset.csv')
    df.head()
    X = df.iloc[:, 1:8].values
    y = df.iloc[:, 8].values

    # splitting the data into training and test sets
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # standardizing the values
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    # performing LDA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
    lda = LDA(n_components=2)
    X_train = lda.fit_trans...
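To make the idea of supervised dimensionality reduction concrete, here is a minimal sketch of the two-class special case (Fisher's linear discriminant) in plain Python. It is not the scikit-learn code from the post: the data points and all function names are made up for illustration, and it only handles two classes in two dimensions. It finds the direction w = Sw^-1 (mean_b - mean_a) and projects the 2-D points onto that single axis.

```python
# Illustrative sketch only: hypothetical toy data, not the seeds dataset.

def mean_vector(points):
    """Component-wise mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter_matrix(points, mean):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around `mean`."""
    sxx = sxy = syy = 0.0
    for x, y in points:
        dx, dy = x - mean[0], y - mean[1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    return sxx, sxy, syy

def fisher_direction(class_a, class_b):
    """Direction w = Sw^-1 (mean_b - mean_a) for two classes in 2-D."""
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    axx, axy, ayy = scatter_matrix(class_a, mean_a)
    bxx, bxy, byy = scatter_matrix(class_b, mean_b)
    # within-class scatter Sw is the sum of the per-class scatter matrices
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    dx, dy = mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]
    # multiply the closed-form inverse of the symmetric 2x2 matrix by (dx, dy)
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    return wx, wy

def project(points, w):
    """Project each 2-D point onto the direction w (2-D -> 1-D)."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]

# hypothetical, well-separated toy classes
class_a = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 1.5)]
class_b = [(4.0, 4.5), (4.5, 4.0), (5.0, 5.2), (4.2, 5.0), (4.8, 4.4)]

w = fisher_direction(class_a, class_b)
proj_a = project(class_a, w)
proj_b = project(class_b, w)
print("direction:", w)
print("class A projections:", proj_a)
print("class B projections:", proj_b)
```

On this toy data the two classes stay fully separated after being reduced from two features to one, which is the property LDA uses the class labels to preserve.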

K-MEANS CLUSTERING

K-Means clustering is an unsupervised, centroid-based algorithm. It tries to minimize the distance between the points in a cluster and the cluster centroid. The dataset I used is the seeds dataset from: https://archive.ics.uci.edu/ml/datasets/seeds

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # loading dataset
    df = pd.read_csv('seeds_dataset.csv')
    df.head()

    # taking compactness and perimeter columns
    z = df.iloc[:, [2, 3]].values

    # applying the elbow method to find the optimal number of clusters
    from sklearn.cluster import KMeans
    elbow_list = []
    for i in range(1, 11):
        kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
        kmeans.fit(z)
        elbow_list.append(kmeans.inertia_)
    plt.plot(range(1, 11), elbow_list)
    plt.title('The Elbow Method Graph')
    plt.xlabel('Number of clusters(k)')
    plt.ylabel('elbow_list')...
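The quantity plotted in the elbow method is the inertia: the sum of squared distances from each point to its nearest centroid. The following is a rough sketch of what happens behind `KMeans.fit`, written in plain Python. It assumes plain Lloyd iterations with random initial centroids (not the k-means++ initialization used in the post), and the toy blobs and function names are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random start, not k-means++
    for _ in range(n_iter):
        # assignment step: attach every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its members
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, inertia

# hypothetical toy data: three Gaussian blobs
data_rng = random.Random(42)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(cx + data_rng.gauss(0, 0.3), cy + data_rng.gauss(0, 0.3))
          for cx, cy in blob_centers for _ in range(30)]

# inertia for k = 1..6, the values an elbow plot would show
inertias = {k: kmeans(points, k)[1] for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

Plotting these values against k gives the elbow curve; the bend is read as the point where adding more clusters stops reducing the inertia substantially.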

SIMULATION OF AUTOREGRESSIVE PROCESS AR(2) IN R

Here is the code in R for an AR(2) process:

    set.seed(2017)
    X.ts <- arima.sim(list(ar = c(.7, .2)), n=1000)
    par(mfrow=c(2,1))
    plot(X.ts, main="AR(2) Time Series, phi1=.7, phi2=.2")
    X.acf = acf(X.ts, main="Autocorrelation of AR(2) Time Series")

SIMULATION OF AUTOREGRESSIVE PROCESS AR(1) IN R

Here is the code in R for an AR(1) process:

    set.seed(20190)
    n = 10000
    phi = .6
    Z = rnorm(n, 0, 1)   # white noise
    X = NULL
    X[1] = Z[1]
    for (t in 2:n) {
      X[t] = Z[t] + phi*X[t-1]
    }
    X.ts = ts(X)
    par(mfrow=c(2,1))
    plot(X.ts, main="AR(1) Time Series on White Noise, phi=.6")
    X.acf = acf(X.ts, main="Autocorrelation of AR(1) Time Series, phi=.6")

SIMULATION OF MOVING AVERAGE PROCESS IN R

Here is the code in R for simulating a moving average process:

    # simulating MA(3) process
    noise = rnorm(10000)
    ma3 = NULL
    for (i in 4:10000) {
      ma3[i] = noise[i] + 0.8*noise[i-1] + 0.5*noise[i-2] + 0.3*noise[i-3]
    }
    moving_average = ma3[4:10000]

    # changing the series into a time series
    moving_average = ts(moving_average)
    par(mfrow=c(2,1))
    plot(moving_average, col='blue')
    acf(moving_average)

Conclusion: In the autocorrelation graph, the autocorrelation cuts off after lag 3, which is consistent with an MA(3) process.
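The cutoff after lag 3 can also be checked without simulation. For an MA(q) process with unit-variance white noise and theta_0 = 1, the autocovariance at lag k is the sum of theta_j * theta_(j+k) for j from 0 to q-k, and it is exactly zero for k > q. A small Python sketch of that formula, using the same coefficients 0.8, 0.5, 0.3 as the R code (the function name is made up for illustration):

```python
def ma_theoretical_acf(coeffs, max_lag):
    """Theoretical autocorrelation of an MA(q) process for lags 0..max_lag."""
    theta = [1.0] + list(coeffs)  # theta_0 = 1 multiplies the current noise term
    q = len(coeffs)
    gamma0 = sum(t * t for t in theta)  # variance, up to the noise variance factor
    acf = []
    for k in range(max_lag + 1):
        if k > q:
            acf.append(0.0)  # no shared noise terms beyond lag q
        else:
            gamma_k = sum(theta[j] * theta[j + k] for j in range(q - k + 1))
            acf.append(gamma_k / gamma0)
    return acf

acf_values = ma_theoretical_acf([0.8, 0.5, 0.3], max_lag=6)
for lag, rho in enumerate(acf_values):
    print(f"lag {lag}: {rho:.3f}")
```

The theoretical values are roughly 0.682, 0.374 and 0.152 at lags 1 to 3 and exactly 0 from lag 4 onward, so the sample ACF from the simulation should show three bars outside the confidence band and small values after that.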

SIMULATION OF A RANDOM WALK IN R

Here is the code in R for simulating a random walk:

    x = NULL
    x[1] = 0
    for (i in 2:10000) {
      x[i] = x[i-1] + rnorm(1)
    }
    print(x)

    # converting it into a time series
    random_walk = ts(x)
    plot(random_walk, main='visualization of a random walk', xlab='days', ylab=' ')
    acf(random_walk)

The correlogram shows high autocorrelation that decays very slowly, which indicates that the random walk is a non-stationary process.

    # making the series stationary by differencing the values
    z <- diff(random_walk)
    plot(z)  # we get white noise
    acf(z)

Conclusion: The ACF of the differenced series shows no significant autocorrelation at nonzero lags. Thus we obtained a stationary series by differencing the time series.
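The same comparison can be made numerically instead of by reading the plots. Below is a minimal Python sketch, assuming a shorter walk of 2000 steps and a hand-written sample autocorrelation function (both are illustrative choices, not part of the R code above). It prints the first few autocorrelations of the walk and of its first differences.

```python
import random

def sample_acf(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    c0 = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

rng = random.Random(1)

# random walk: each value is the previous value plus standard normal noise
walk = [0.0]
for _ in range(1999):
    walk.append(walk[-1] + rng.gauss(0, 1))

# first differences recover the underlying noise increments
diffs = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

walk_acf = sample_acf(walk, 5)
diff_acf = sample_acf(diffs, 5)
print("random walk ACF:", [round(r, 3) for r in walk_acf])
print("differenced ACF:", [round(r, 3) for r in diff_acf])
```

The walk's autocorrelations stay close to 1 across the first lags, while those of the differenced series are close to 0 at every nonzero lag, matching what the two correlograms show.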

ESTIMATION OF PI USING THE MONTE CARLO METHOD IN PYTHON

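The value of pi is estimated with the Monte Carlo method by taking a square with a side of 1 unit and inscribing a circle of radius 0.5 units in it. The ratio of the area of the circle to the area of the square is pi/4, so multiplying the fraction of random points that land inside the circle by 4 gives an estimate of pi. The code below samples the unit square and counts the points inside the quarter circle of radius 1 centered at the origin, which covers the same fraction pi/4 of the square.

Python code:

    import random

    n = 1000000
    c_points = 0  # points inside circle
    s_points = 0  # points inside square
    for i in range(n):
        x = random.uniform(0, 1)
        y = random.uniform(0, 1)
        d = x**2 + y**2
        if d <= 1:
            c_points += 1
        s_points += 1

    # compute the estimate once, after all points are drawn
    pi = 4 * (c_points / s_points)
    print("pi value is:", pi)

Conclusion: The higher the value of n, the more accurate the estimate of pi tends to be; the error shrinks roughly in proportion to 1/sqrt(n).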
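For comparison, here is a small sketch that follows the inscribed-circle description literally: points are drawn in the unit square and tested against a circle of radius 0.5 centered at (0.5, 0.5). The function name, the seed, and the sample sizes are illustrative assumptions. It prints the estimate and its absolute error for a few values of n.

```python
import math
import random

def estimate_pi(n, seed=0):
    """Estimate pi from n uniform points in the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        # inside the inscribed circle: center (0.5, 0.5), radius 0.5
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            inside += 1
    # area ratio circle/square is pi/4, so scale the hit fraction by 4
    return 4 * inside / n

for n in (1_000, 10_000, 100_000):
    estimate = estimate_pi(n)
    print(n, estimate, abs(estimate - math.pi))
```

Because the draws are random, the error does not fall on every single increase of n, but it tends to get smaller as n grows.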