Programming languages for Machine learning : R

Photo Credit: Unsplash.com

From C++ to C--, there were (as of 2018) more than 250 programming languages, and many more will emerge, but which one is best suited for machine learning? Plenty of articles and discussions attempt to answer this question. In our last review (Programming languages for Machine learning : Julia), we walked through a machine learning example, k-means clustering, to familiarize ourselves with Julia's syntax in a machine learning context. That review also covered the most important criteria for choosing a language for machine learning: the availability of library packages, ease of coding, and ease of visualization. These criteria may explain why Python is the most popular language for machine learning: it is easy to get started with, and many popular packages are available, for example TensorFlow, PyTorch, and Theano. Some Python-based packages and their installation on Ubuntu have also been reviewed previously (Eight Deep learning Software Libraries & Their Installation on Ubuntu). Still, it can be difficult to switch languages for a specific purpose, and we often look for alternatives in the language we are already comfortable with.

Several other programming languages are also popular among machine learning developers:

  • Julia
  • R
  • C/C++
  • JavaScript
  • Scala
  • Ruby
  • Octave
  • MATLAB
  • SAS
R language

R is a language and environment developed primarily for statistical computing and graphics. It is an open-source project similar to the S language, which was developed at AT&T (now Lucent Technologies) by John Chambers and colleagues.

It provides a wide spectrum of statistical and graphical techniques and is highly extensible. Among its most important strengths is the ease of producing well-designed, publication-quality plots, including mathematical symbols and formulae. This may be why it is so popular among computational biologists and bioinformaticians. R compiles and runs on a wide variety of operating systems, including Windows, macOS, Linux, and Android (a Linux-based system).

At present the latest version of R is R-3.5.1 ("Feather Spray") and can be downloaded via CRAN.

Example: k-means clustering

Here is a snippet of code for k-means clustering using R. In this example, the Iris flower data set is used.

library(datasets)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1          5.1         3.5          1.4         0.2   setosa
2          4.9         3.0          1.4         0.2   setosa
3          4.7         3.2          1.3         0.2   setosa
4          4.6         3.1          1.5         0.2   setosa
5          5.0         3.6          1.4         0.2   setosa
6          5.4         3.9          1.7         0.4   setosa

 

library(ggplot2)
ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) + geom_point()

Figure 1. Scatter plot of Petal.Length vs. Petal.Width from the iris data, colored by Species.
set.seed(20)
irisCluster <- kmeans(iris[, 3:4], 3, nstart = 20)
irisCluster
K-means clustering with 3 clusters of sizes 50, 52, 48

Cluster means:
  Petal.Length Petal.Width
1     1.462000    0.246000
2     4.269231    1.342308
3     5.595833    2.037500

Clustering vector:
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [75] 2 2 2 3 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 2 3 3 3 3
[112] 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3
[149] 3 3

Within cluster sum of squares by cluster:
[1]  2.02200 13.05769 16.29167
 (between_SS / total_SS =  94.3 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"
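Under the hood, kmeans() iterates two steps until the assignments stop changing: attach each point to its nearest centroid, then move each centroid to the mean of its points (nstart = 20 reruns this from 20 random initializations and keeps the best result, which is why set.seed() makes the output reproducible). As a rough illustration of that loop, here is a minimal pure-Python sketch on a few hand-picked two-dimensional points; the data and function names are made up for illustration, not taken from any library:

```python
import random

def kmeans(points, k, n_iter=100, seed=20):
    # Pick k distinct data points as the initial centroids.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2
                                        + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments stable: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Two obvious groups, mimicking the Petal.Length / Petal.Width columns.
pts = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.2),
       (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
centroids, clusters = kmeans(pts, k=2)
```

R's kmeans() does the same thing far more efficiently in compiled code, over however many columns you pass it (here, columns 3:4 of iris).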

 

print(irisCluster$cluster)
print(irisCluster$centers)
print(irisCluster$size)
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [75] 2 2 2 3 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 2 3 3 3 3
[112] 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3
[149] 3 3
  Petal.Length Petal.Width
1     1.462000    0.246000
2     4.269231    1.342308
3     5.595833    2.037500
[1] 50 52 48
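The "between_SS / total_SS = 94.3 %" line reported earlier is a quick quality measure: the total sum of squares splits into a within-cluster and a between-cluster part, and a high between-cluster share means tight, well-separated clusters. The arithmetic can be checked from the within-cluster sums printed above; note that the total_SS value used below is an assumption reconstructed from the published variances of the iris petal columns (3.116278 and 0.581006), not printed in the output:

```python
# Within-cluster sums of squares as printed by kmeans() above.
withinss = [2.02200, 13.05769, 16.29167]
tot_withinss = sum(withinss)

# Total sum of squares for Petal.Length and Petal.Width:
# sum of each column's variance times (n - 1), with n = 150.
total_ss = (3.116278 + 0.581006) * 149

between_ss = total_ss - tot_withinss
print(round(100 * between_ss / total_ss, 1))  # 94.3
```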

 

print(table(irisCluster$cluster, iris$Species))
    setosa versicolor virginica
  1     50          0         0
  2      0         48         4
  3      0          2        46
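The table above shows how well the clusters line up with the true species: matching each cluster to its dominant species, 50 + 48 + 46 = 144 of the 150 flowers are grouped consistently, i.e. 96% agreement from just the two petal measurements. A quick check of that arithmetic (the counts are copied from the table above):

```python
# Confusion counts from table(irisCluster$cluster, iris$Species):
# rows = clusters 1..3, columns = setosa, versicolor, virginica.
confusion = [
    [50, 0, 0],
    [0, 48, 4],
    [0, 2, 46],
]
total = sum(sum(row) for row in confusion)
# Match each cluster to its dominant species (the row maximum).
correct = sum(max(row) for row in confusion)
accuracy = correct / total
print(accuracy)  # 0.96
```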
irisCluster$cluster <- as.factor(irisCluster$cluster)
ggplot(iris, aes(Petal.Length, Petal.Width, color = irisCluster$cluster)) + geom_point()

Figure 2. The same scatter plot of Petal.Length vs. Petal.Width, colored by k-means cluster assignment.

Note: This is a guest post, and the opinions in this article are those of the guest writer. If you have any issues with any of the articles posted at www.marktechpost.com, please contact asif@marktechpost.com

 

Nilesh Kumar
I am Nilesh Kumar, a graduate student at the Department of Biology, UAB, under the mentorship of Dr. Shahid Mukhtar. I joined UAB in Spring 2018 and am working on Network Biology. My research interests are network modeling, mathematical modeling, game theory, artificial intelligence, and their applications in systems biology. I graduated with a master's degree, "Master of Technology, Information Technology (Specialization in Bioinformatics)," in 2015 from the Indian Institute of Information Technology Allahabad, India, with a GATE scholarship. My Master's thesis was entitled "Mirtron Prediction through machine learning approach". I worked as a research fellow at the International Centre for Genetic Engineering and Biotechnology, New Delhi, for two years.
