Principal component analysis (PCA) is a statistical technique that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It has a wide variety of applications in machine learning: it can be used to find structure in features and as a pre-processing step before fitting a model.
Overall, PCA is an ideal candidate for visualizing data while reducing the number of dimensions.
Data preparation
# Keep only the four measurement columns
# (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
data <- iris[, c(1, 2, 3, 4)]
class(data)
## [1] "data.frame"
1. Scale data
data.scaled <- scale(data, center = TRUE, scale = TRUE)
head(data.scaled, 5)
##      Sepal.Length Sepal.Width Petal.Length Petal.Width
## [1,]   -0.8976739  1.01560199    -1.335752   -1.311052
## [2,]   -1.1392005 -0.13153881    -1.335752   -1.311052
## [3,]   -1.3807271  0.32731751    -1.392399   -1.311052
## [4,]   -1.5014904  0.09788935    -1.279104   -1.311052
## [5,]   -1.0184372  1.24503015    -1.335752   -1.311052
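To see what scale() is doing, here is a minimal sketch (reusing the data and data.scaled objects from above) that reproduces the standardization by hand: each column has its mean subtracted and is then divided by its standard deviation.

# Standardize manually: (x - mean(x)) / sd(x), column by column
manual.scaled <- sweep(data, 2, colMeans(data), "-")
manual.scaled <- sweep(manual.scaled, 2, apply(data, 2, sd), "/")
# Should print TRUE (ignoring the attributes scale() attaches)
all.equal(as.matrix(manual.scaled), data.scaled, check.attributes = FALSE)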
2. The correlation matrix
res.cor <- cor(data.scaled)
res.cor
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
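Because the data have already been standardized, the correlation matrix is identical to the covariance matrix of data.scaled. A one-line check (a sketch using the objects above):

# For standardized data, covariance and correlation coincide
all.equal(cov(data.scaled), res.cor)   # TRUE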
3. The eigenvectors of the correlation matrix
res.eig <- eigen(res.cor)
res.eig
## eigen() decomposition
## $values
## [1] 2.91849782 0.91403047 0.14675688 0.02071484
##
## $vectors
##            [,1]        [,2]       [,3]       [,4]
## [1,]  0.5210659 -0.37741762  0.7195664  0.2612863
## [2,] -0.2693474 -0.92329566 -0.2443818 -0.1235096
## [3,]  0.5804131 -0.02449161 -0.1421264 -0.8014492
## [4,]  0.5648565 -0.06694199 -0.6342727  0.5235971
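The eigenvalues measure how much variance each component captures. For a correlation matrix they sum to the number of variables (4 here), so dividing each by the total gives the proportion of variance explained; a short sketch using res.eig from above:

# Proportion of variance explained by each component
res.eig$values / sum(res.eig$values)
# Eigenvalues of a correlation matrix sum to the number of variables
sum(res.eig$values)   # 4

The first component alone accounts for roughly 73% of the total variance (2.9185 / 4).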
plot(res.eig$values, col = c("red", "orange", "green", "blue"), type = "h",
     main = "Eigen values")

Because the first eigenvalue (2.91849782) is the largest, its eigenvector defines the first principal component: the direction along which the data vary the most.
4. Compute the components by multiplying the transposed eigenvector matrix by the transposed scaled data matrix, then transposing the result back.
# Transpose the eigenvector matrix
eigenvectors.t <- t(res.eig$vectors)
# Transpose the scaled data
data.scaled.t <- t(data.scaled)
# Project the data onto the eigenvectors
data.new <- eigenvectors.t %*% data.scaled.t
# Transpose the new data and rename the columns
data.new <- t(data.new)
colnames(data.new) <- c("PC1", "PC2", "PC3", "PC4")
head(data.new)
##            PC1        PC2         PC3          PC4
## [1,] -2.257141 -0.4784238  0.12727962  0.024087508
## [2,] -2.074013  0.6718827  0.23382552  0.102662845
## [3,] -2.356335  0.3407664 -0.04405390  0.028282305
## [4,] -2.291707  0.5953999 -0.09098530 -0.065735340
## [5,] -2.381863 -0.6446757 -0.01568565 -0.035802870
## [6,] -2.068701 -1.4842053 -0.02687825  0.006586116
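Two sanity checks (a sketch, reusing data.new and res.eig) confirm the defining properties of principal components: the new variables are uncorrelated, and the variance of each one equals its eigenvalue.

# Off-diagonal correlations are zero: the components are uncorrelated
round(cor(data.new), 10)
# The variance of each component equals the corresponding eigenvalue
apply(data.new, 2, var)
res.eig$values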
barplot(data.new, col = c("red", "orange", "green", "blue"))

plot(data.new, col = "blue", main = "PC1 vs PC2")

PCA using the prcomp function
pca <- prcomp(iris[, -5])
summary(pca)
## Importance of components:
##                           PC1     PC2    PC3     PC4
## Standard deviation     2.0563 0.49262 0.2797 0.15439
## Proportion of Variance 0.9246 0.05307 0.0171 0.00521
## Cumulative Proportion  0.9246 0.97769 0.9948 1.00000
biplot(pca, col = c("blue", "red"), main = "PCA using prcomp")

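Note that prcomp() centers the data but does not scale it by default, so the summary above is based on the covariance matrix of the raw measurements rather than the correlation matrix used in the manual calculation (which is why PC1 explains 92% of the variance here versus roughly 73% before). A sketch of how to reproduce the manual, correlation-based results with prcomp():

# Set scale. = TRUE so prcomp() works on the correlation matrix
pca.scaled <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)
# The squared standard deviations are the eigenvalues from eigen()
all.equal(pca.scaled$sdev^2, res.eig$values)   # TRUE
# The scores match data.new up to the sign of each column
head(pca.scaled$x)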
https://www.rdocumentation.org/packages/stats/versions/3.5.3/topics/prcomp