Programming languages for machine learning: Julia

From C++ to C--, there are more than 250 programming languages, but which one is best suited for machine learning? There are plenty of articles and discussions attempting to answer this question. The most critical criteria for choosing a language for machine learning are usually the availability of library packages and ease of coding. That may be why Python is the most popular language for machine learning: it is easy to get started with, and many popular packages are available, for example TensorFlow, PyTorch, and Theano. Some Python-based packages and their installation on Ubuntu have also been reviewed previously (Eight Deep Learning Software Libraries & Their Installation on Ubuntu).

Other programming languages for machine learning

There are several other programming languages that are also popular among machine-learning developers:

  • Julia
  • R
  • C/C++
  • JavaScript
  • Scala
  • Ruby
  • Octave
  • MATLAB
  • SAS

Julia

Julia is designed for high-performance computing and is growing very fast. Using LLVM, it compiles programs to efficient native code for multiple platforms. The micro-benchmarks on the official web page, which compare Julia with major flagship languages, are quite impressive.

(Image source: https://julialang.org/images/benchmarks.svg )
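To see that LLVM-backed compilation in action, here is a minimal Base-only sketch (the function name `sumsq` is ours, not from any package): the first call to a function triggers compilation to native code, and subsequent calls reuse the compiled method.

```julia
#Sum of squares over a vector; compiled to native code on first call
function sumsq(v)
    s = zero(eltype(v))
    for x in v
        s += x * x
    end
    return s
end

v = collect(1.0:1000.0)
sumsq(v)        #first call: includes JIT compilation time
@time sumsq(v)  #second call: runs the already-compiled native code
```

Running this, the second timing is typically orders of magnitude faster than the first call, because compilation happens only once per method signature.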

Example: k-means clustering

Here is a code snippet for k-means clustering in Julia. In this example, the Iris flower data set is used.

1. Importing packages

#For data (Iris data)
using RDatasets

#For visualization
using Gadfly

#For machine learning (clustering)
using Clustering
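If these packages are not yet installed, they can be added first through Julia's built-in package manager (a one-time setup step, assuming a Julia 1.x `Pkg` API):

```julia
#One-time setup: install the packages used in this example
using Pkg
Pkg.add(["RDatasets", "Gadfly", "Clustering"])
```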

2. Loading data
iris = dataset("datasets", "iris")
first(iris, 6) #`head(iris)` in older versions of DataFrames
6×5 DataFrames.DataFrame
│ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
├─────┼─────────────┼────────────┼─────────────┼────────────┼─────────┤
│ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ setosa  │
│ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ setosa  │
│ 3   │ 4.7         │ 3.2        │ 1.3         │ 0.2        │ setosa  │
│ 4   │ 4.6         │ 3.1        │ 1.5         │ 0.2        │ setosa  │
│ 5   │ 5.0         │ 3.6        │ 1.4         │ 0.2        │ setosa  │
│ 6   │ 5.4         │ 3.9        │ 1.7         │ 0.4        │ setosa  │
3. Clustering

#kmeans expects a d×n matrix: one column per observation
features = permutedims(Matrix(iris[:, 1:4]))
#Three clusters
result = kmeans(features, 3)
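Under the hood, `kmeans` alternates between assigning each point to its nearest center and recomputing each center as the mean of its members (Lloyd's algorithm). A Base-only sketch of that loop for intuition (the name `naive_kmeans` is ours; the real Clustering.jl implementation adds smarter seeding, weights, and convergence checks):

```julia
using Random

#Bare-bones Lloyd's algorithm on a d×n matrix (one column per point)
function naive_kmeans(X::Matrix{Float64}, k::Int; iters::Int = 100)
    d, n = size(X)
    centers = X[:, randperm(n)[1:k]]   #k distinct points as initial centers
    assignments = zeros(Int, n)
    for _ in 1:iters
        #Assignment step: nearest center by squared Euclidean distance
        for j in 1:n
            dists = [sum(abs2, X[:, j] .- centers[:, c]) for c in 1:k]
            assignments[j] = argmin(dists)
        end
        #Update step: each center becomes the mean of its members
        for c in 1:k
            members = findall(==(c), assignments)
            isempty(members) && continue
            centers[:, c] = vec(sum(X[:, members], dims = 2)) ./ length(members)
        end
    end
    return centers, assignments
end

#Two well-separated blobs; the split should be points 1-3 vs 4-6
X = [0.0 0.1 0.2 5.0 5.1 5.2;
     0.0 0.1 0.2 5.0 5.1 5.2]
centers, asg = naive_kmeans(X, 2)
```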

4. Printing results
println("Centers: ", result.centers)
println("Assignments: ", result.assignments)
println("Costs: ", result.costs)
println("Counts: ", result.counts)
println("Weights: ", result.cweights)
println("Total cost: ", result.totalcost)
println("Iterations: ", result.iterations)
println("Converged: ", result.converged)


Centers: [6.85385 5.006 5.88361; 3.07692 3.428 2.74098; 5.71538 1.462 4.38852; 2.05385 0.246 1.43443]
Assignments: [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 3, 1, 3, 1, 3, 1, 1, 3, 3, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 3]
Costs: [0.01998, 0.20038, 0.17398, 0.27598, 0.03558, 0.45838, 0.17238, 0.00438, 0.65198, 0.14158, 0.23278, 0.06438, 0.25078, 0.83398, 1.02838, 1.45158, 0.42798, 0.02078, 0.67958, 0.15158, 0.21478, 0.10798, 0.40998, 0.14638, 0.23718, 0.20438, 0.04358, 0.04638, 0.04438, 0.16678, 0.17118, 0.18118, 0.51198, 0.84598, 0.12238, 0.12278, 0.27758, 0.06598, 0.57878, 0.01318, 0.03438, 1.55758, 0.44758, 0.14958, 0.36278, 0.23238, 0.16838, 0.22278, 0.16398, 0.02238, 1.49503, 0.494085, 0.97426, 0.510642, 0.432446, 0.0676915, 0.610314, 2.45736, 0.601298, 0.708839, 2.30802, 0.1072, 0.645888, 0.170314, 0.74556, 0.796708, 0.164249, 0.280642, 0.409495, 0.489003, 0.513921, 0.219331, 0.497364, 0.202282, 0.317855, 0.581626, 1.01376, 0.666568, 0.0555603, 1.03179, 0.721626, 0.926216, 0.302282, 0.548839, 0.317691, 0.487691, 0.896708, 0.386052, 0.248839, 0.374249, 0.222118, 0.159823, 0.232773, 2.34884, 0.135724, 0.191298, 0.11261, 0.151298, 2.71195, 0.138511, 0.636568, 0.731626, 0.102722, 0.415799, 0.159645, 1.34734, 1.10851, 0.636568, 0.428107, 0.740414, 0.521953, 0.54426, 0.0573373, 0.78556, 1.449, 0.45426, 0.241953, 2.22964, 2.40734, 0.684413, 0.0781065, 0.665396, 1.76503, 0.570314, 0.0757988, 0.280414, 0.406544, 0.509167, 0.298107, 0.338876, 0.546568, 2.08888, 0.317337, 0.687364, 1.23657, 0.930414, 0.54426, 0.317337, 0.383593, 0.10426, 0.157337, 0.441953, 0.731626, 0.112722, 0.272722, 0.355799, 0.822118, 0.399645, 0.691953, 0.7072]
Counts: [39, 50, 61]
Weights: [39.0, 50.0, 61.0]
Total cost: 78.85566582597737
Iterations: 9
Converged: true
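Two relationships worth knowing about these fields: `counts` is just the tally of each label in `assignments`, and `totalcost` is the sum of the per-point `costs`. A Base-only illustration with made-up toy values (not the Iris output above):

```julia
#Toy values, not the Iris results
assignments = [1, 1, 2, 3, 3, 3]
counts = [count(==(c), assignments) for c in 1:3]  #[2, 1, 3]

costs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
totalcost = sum(costs)                             #approximately 2.1
```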
5. Plotting clusters
#String labels make Gadfly treat the clusters as discrete colors
p = plot(iris, x = "PetalLength", y = "PetalWidth", color = string.(result.assignments), Geom.point)
img = SVG("iris_plot.svg", 14cm, 8cm)
draw(img, p)

I am Nilesh Kumar, a graduate student in the Department of Biology at UAB under the mentorship of Dr. Shahid Mukhtar. I joined UAB in Spring 2018 and am working on network biology. My research interests are network modeling, mathematical modeling, game theory, artificial intelligence, and their applications in systems biology.

I graduated with a master's degree, "Master of Technology, Information Technology (Specialization in Bioinformatics)," in 2015 from the Indian Institute of Information Technology, Allahabad, India, with a GATE scholarship. My master's thesis was entitled "Mirtron Prediction through Machine Learning Approach." I worked as a research fellow at the International Centre for Genetic Engineering and Biotechnology, New Delhi, for two years.