For some it is magic, some hate its notation (maybe you!), it has some weird rules, and maybe it is just a programming language like any other (that is also my opinion). Like other programming languages, R has good and bad properties, but I would say it is the best candidate for the toolbox of a statistician or a researcher who works on data analysis.
In this blog post, I collect 8 (numbered 0 to 7) nice properties of R. As a lecturer and researcher, I have found that many students understand some statistical concepts better when I demonstrate them, and get them working, with Monte Carlo simulations. In R, we can write compact code to demonstrate these concepts, code that would be harder to write in many other programming languages. R is not a simple toy: we can keep improving our knowledge and programming skills, and write better code, by bringing in external code written in "real" programming languages (a nod to the old joke that real men use C).
So, why is R awesome?
0. Syntax of the Algol Family
R has a weird assignment operator (<-), but the rest of its syntax is similar to that of Algol-family languages such as C, C++, Java and C#. R also offers something close to operator overloading (yes, it is not exactly operator overloading): a function whose name is a sequence of characters enclosed in percent signs can be used as a binary operator, like this:
> '%_%' <- function(a,b){
+ return(exp(a+b))
+ }
> 5 %_% 2
[1] 1096.633
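A user-defined infix operator is just an ordinary function whose name starts and ends with a percent sign, so it can also be called with prefix syntax by quoting its name in backticks. The minimal sketch below uses a hypothetical %+% string-pasting operator (it is not part of base R) next to the %_% operator from above.

```r
# An infix operator is a function whose name is wrapped in percent signs.
# %+% is a hypothetical string-pasting operator defined only for this example.
`%+%` <- function(a, b) paste0(a, b)

greeting <- "Hello, " %+% "world"       # infix call
same     <- `%+%`("Hello, ", "world")   # the same function, prefix call via backticks
print(greeting)                         # [1] "Hello, world"

# The %_% operator from the text can be called either way as well
`%_%` <- function(a, b) exp(a + b)
print(c(5 %_% 2, `%_%`(5, 2)))          # both values equal exp(7)
```

Because the operator is only a function, it follows the usual R scoping rules and can be passed around like any other function.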
1. Vectors are primitive data types
Vectors also exist in the other members of the Algol family, written with opening and closing brackets: in C/C++ they are arrays of primitives, and in Java arrays are objects. In R, by contrast, binary operators apply directly to vectors and matrices. For example, estimating the least-squares coefficients is a single-line expression in R:
> assign("x",cbind(1,1:30))
> assign("y",3+3*x[,2]+rnorm(30))
> solve(t(x) %*% x) %*% t(x) %*% y
[,1]
[1,] 2.858916
[2,] 3.003787
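Forming the explicit inverse is fine for a teaching example, but solving the normal equations directly, or letting lm() do the fit, is usually preferred for numerical reasons. The sketch below checks that the three approaches agree; the seed is arbitrary and only makes the run reproducible, and crossprod(x) is base R's shortcut for t(x) %*% x.

```r
set.seed(1)  # arbitrary seed so the comparison is reproducible
x <- cbind(1, 1:30)
y <- 3 + 3 * x[, 2] + rnorm(30)

# Normal-equations form from the text: (X'X)^{-1} X'y
beta_normal <- solve(t(x) %*% x) %*% t(x) %*% y

# Solve X'X b = X'y without forming the inverse explicitly
beta_solve <- solve(crossprod(x), crossprod(x, y))

# Built-in linear model fit; lm() adds the intercept column itself
beta_lm <- coef(lm(y ~ x[, 2]))

print(cbind(beta_normal, beta_solve, beta_lm))
```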
The following example shows the difference between a scalar and a vector:
> assign("x", c(1,2,3))
> assign("a", 5)
> typeof(x)
[1] "double"
> typeof(a)
[1] "double"
> class(x)
[1] "numeric"
> class(a)
[1] "numeric"
No difference! In R, a "scalar" is simply a vector of length one.
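A quick way to see that a single number is already a vector is to ask for its length and use it in vector arithmetic. This small sketch relies only on base R.

```r
a <- 5
x <- c(1, 2, 3)

length(a)            # 1: a "scalar" is a numeric vector of length one
is.vector(a)         # TRUE
identical(a, c(5))   # TRUE: 5 and c(5) are the same value
x * a                # a is recycled over every element of x: 5 10 15
```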
2. Theorems come alive in minutes
Suppose that X is a random variable that follows an Exponential Distribution with rate = 5.
The sum (or mean) of a random sample of size n from this distribution approximately follows a normal distribution when n is large. This is the Central Limit Theorem, illustrated with an example. Theorems are theorems, but you can get a quick demonstration (and perhaps an informal "proof", for educational purposes only) by writing a small application. Writing code like this takes minutes in R.
> assign("nsamp", 5000)
> assign("n", 100)
> assign("theta", 5.0)
> assign("sums", rep(0,nsamp))
>
> for (i in 1:nsamp){
+ sums[i] <- sum(rexp(n = n, rate = theta))
+ }
> hist(sums)
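To check the histogram against the theory, one can overlay the normal density that the Central Limit Theorem predicts. A sum of n independent Exp(theta) variables has mean n/theta and standard deviation sqrt(n)/theta (20 and 2 here); its exact distribution is Gamma with shape n and rate theta, which the normal approximates for large n. The sketch below is an assumption-labelled variant of the code above: it uses replicate() instead of the explicit loop and an arbitrary seed.

```r
set.seed(2024)  # arbitrary seed for reproducibility
nsamp <- 5000
n <- 100
theta <- 5.0

# Same simulation as above, written with replicate() instead of a for loop
sums <- replicate(nsamp, sum(rexp(n = n, rate = theta)))

# CLT prediction for the sum of n i.i.d. Exp(theta) variables
mu <- n / theta
sigma <- sqrt(n) / theta

# Density-scaled histogram with the predicted normal curve on top
hist(sums, freq = FALSE, main = "Sums of exponential samples")
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, lwd = 2)

print(c(simulated_mean = mean(sums), simulated_sd = sd(sums), mu = mu, sigma = sigma))
```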
3. There is always a second plan for faster code
Now suppose that we draw 50,000 random samples using the code above. How long does the computation take?
> assign("nsamp", 50000)
> assign("n", 100)
> assign("theta", 5.0)
> assign("sums", rep(0,nsamp))
>
> s <- system.time(
+ for (i in 1:nsamp){
+ sums[i] <- sum(rexp(n = n, rate = theta))
+ }
+ )
>
> print(s)
user system elapsed
0.582 0.000 0.572
Drawing 50,000 samples of size 100 takes 0.582 seconds of user time. Is that fast enough? Let's try writing it in C++!
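#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector CalculateRandomSums(int m, int n) {
  // m: number of samples, n: sample size; the rate is fixed at 5.0
  NumericVector result(m);
  for (int i = 0; i < m; i++) {
    // Rcpp sugar: rexp(n, rate) draws n exponential variates using R's RNG
    result[i] = sum(rexp(n, 5.0));
  }
  return result;
}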
After compiling the code with Rcpp (for example, with Rcpp::sourceCpp()), we can call the function CalculateRandomSums() from R.
> s <- system.time(
+ vect <- CalculateRandomSums(50000, 100)
+ )
> print(s)
user system elapsed
0.185 0.000 0.184
So our R code is about 3.1 times slower than the code written in C++ (0.582 / 0.185 ≈ 3.15 in user time).
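C++ is not the only second plan: vectorizing the R code often narrows the gap as well. The sketch below makes a single rexp() call for all 50,000 × 100 draws, arranges them as a matrix with one sample per column, and sums the columns with colSums(). It trades memory (about 40 MB of doubles here) for speed; actual timings depend on the machine, so it is worth measuring on yours rather than assuming a particular speed-up.

```r
nsamp <- 50000
n <- 100
theta <- 5.0

s <- system.time({
  # One call to rexp() for all draws, arranged as an n x nsamp matrix;
  # colSums() then returns the sum of each sample (column)
  draws <- matrix(rexp(n * nsamp, rate = theta), nrow = n, ncol = nsamp)
  sums_vec <- colSums(draws)
})
print(s)
```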
4. Interaction with C/C++/Fortran is enjoyable
Since a large part of R itself is written in C, porting old C libraries is easy: you write wrapper functions that take and return SEXP values. Rcpp hides these low-level routines behind a clever C++ interface. Fortran code can also be linked. Interaction with other languages lets us reuse old libraries from R and makes it possible to write new, faster ones. It is also possible to embed R in C and C++ applications.
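To make the SEXP wrapper idea concrete, here is a minimal sketch of a C function called through .Call(). It assumes a C compiler configured for building R extensions; the file name add_vectors.c and the function add_vectors are hypothetical, and the C code assumes both inputs are double vectors of equal length (the R side coerces with as.double()).

```r
# C source for a hypothetical add_vectors() using R's SEXP API
c_src <- '
#include <R.h>
#include <Rinternals.h>

/* Element-wise sum of two double vectors of equal length */
SEXP add_vectors(SEXP a, SEXP b) {
  R_xlen_t n = XLENGTH(a);
  SEXP out = PROTECT(allocVector(REALSXP, n));
  double *pa = REAL(a), *pb = REAL(b), *po = REAL(out);
  for (R_xlen_t i = 0; i < n; i++) po[i] = pa[i] + pb[i];
  UNPROTECT(1);
  return out;
}
'
src_file <- file.path(tempdir(), "add_vectors.c")
so_file  <- file.path(tempdir(), paste0("add_vectors", .Platform$dynlib.ext))
writeLines(c_src, src_file)

# Build the shared library with R's own tooling (R CMD SHLIB)
status <- system2(file.path(R.home("bin"), "R"),
                  c("CMD", "SHLIB", "-o", shQuote(so_file), shQuote(src_file)))
stopifnot(status == 0)

dyn.load(so_file)
res <- .Call("add_vectors", as.double(c(1, 2, 3)), as.double(c(10, 20, 30)))
print(res)
```

PROTECT/UNPROTECT keep the freshly allocated result safe from R's garbage collector while the C code is still filling it.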
For an enjoyable example, have a look at section 3, "There is always a second plan for faster code".
The R package eive includes a small portion of C++ code and is a compact example of calling C++ functions from within R. Accessing C++ objects from R is also possible thanks to Rcpp.
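For quick experiments outside a package, Rcpp's cppFunction() compiles a C++ snippet and binds it in the R session in one step, adding the Rcpp header and namespace declaration automatically. The sketch below reuses the CalculateRandomSums() code from section 3; it assumes Rcpp and a working C++ toolchain are installed, and the seed is arbitrary.

```r
library(Rcpp)  # assumes Rcpp and a C++ compiler are available

# Compile the C++ function from section 3 and make it callable from R
cppFunction('
NumericVector CalculateRandomSums(int m, int n) {
  NumericVector result(m);
  for (int i = 0; i < m; i++) {
    result[i] = sum(rexp(n, 5.0));  // sum of n Exp(5) draws
  }
  return result;
}
')

set.seed(42)  # Rcpp sugar draws from R's RNG, so set.seed() applies
vect <- CalculateRandomSums(50000, 100)
print(summary(vect))
```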
5. Interaction with Java
Calling Java from R (rJava) and calling R from Java (JRI, RCaller) are both possible. Renjin takes a different approach: it is an R interpreter written in Java (yet another way of calling R from Java, huh?). A detailed comparison of these methods is given in this documentation and this one.
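As a small taste of the rJava direction, the sketch below starts a JVM, creates a java.lang.String, and calls instance and static methods. It assumes rJava is installed and a Java runtime is available; the return type of each call is given as a JNI signature ("I" for int, "Ljava/lang/String;" for String).

```r
library(rJava)  # assumes rJava is installed and a JVM is available
.jinit()        # start the Java virtual machine

# Create a java.lang.String and call its instance methods
s <- .jnew("java/lang/String", "Hello from Java")
len   <- .jcall(s, "I", "length")                     # int length()
upper <- .jcall(s, "Ljava/lang/String;", "toUpperCase")  # String toUpperCase()

# Call a static method: Math.max(int, int)
m <- .jcall("java/lang/Math", "I", "max", 3L, 7L)

print(c(len, m))
print(upper)
```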
6. Sophisticated variable scoping
In R, functions have their own variable scopes, and variables defined at the top level (the global environment) can be accessed from inside functions. In addition, scopes are represented by environments, list-like objects that map names to values, and user-defined environments can be created anywhere in the code. For detailed information visit Environment in R.
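The sketch below shows two everyday uses of environments: a closure that keeps private state in the environment where it was created, and a user-created environment used as a mutable key-value store. The names make_counter, counter and e are hypothetical, chosen for this example.

```r
# A closure: make_counter() returns a function that keeps its own private state
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # <<- updates `count` in the enclosing environment
    count
  }
}
counter <- make_counter()
counter()  # 1
counter()  # 2

# A user-created environment used as a mutable key-value store
e <- new.env()
assign("alpha", 0.05, envir = e)
e$beta <- 0.2
get("alpha", envir = e)  # 0.05
ls(e)                    # "alpha" "beta"
```

Because environments are not copied when passed to a function, changes made through e inside a function are visible to the caller, which is also the mechanism behind the reference classes in the next section.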
7. Optional Object Oriented Programming (O-OOP)
Arguments of R functions behave as values rather than addresses: if a function modifies an argument, it modifies its own copy, and the caller's variable stays unchanged. R avoids unnecessary work with copy-on-modify, so a large vector passed to a function is not duplicated as long as the function only reads it; the copy is made as soon as the function modifies it, and it becomes garbage once the function returns. As C/C++ programmers know, passing objects by address rather than by value saves memory and computation time. Reference classes in R are passed to functions by reference, much like C++ references and Java objects are passed to functions and methods:
Person <- setRefClass(
  Class = "Person",
  fields = c("name", "surname", "email"),
  methods = list(
    initialize = function(name, surname, email) {
      .self$name <- name
      .self$surname <- surname
      .self$email <- email
    },
    setName = function(name) {
      .self$name <- name
    },
    setSurname = function(surname) {
      .self$surname <- surname
    },
    setEMail = function(email) {
      .self$email <- email
    },
    toString = function() {
      # paste() already separates its arguments with a single space
      return(paste(name, surname, email))
    }
  ) # End of methods
) # End of class
p <- Person$new("John", "Brown", "brown@server.org")
print(p$toString())
The output is
[1] "John Brown brown@server.org"
Java and C++ programmers probably like this notation!
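To see the reference semantics in action, compare a reference-class object with a plain list when both are passed to the same function. The sketch defines a minimal Person class without the custom initialize method (so the default constructor and copy() can be used), and rename_email() is a hypothetical helper written for this example.

```r
library(methods)  # setRefClass() lives in the methods package

Person <- setRefClass(
  Class = "Person",
  fields = c("name", "surname", "email")
)

# Hypothetical helper: changes the e-mail of whatever object it receives
rename_email <- function(obj, email) {
  obj$email <- email
  invisible(NULL)
}

# Reference class: the function works on the same object, so the caller sees the change
p <- Person$new(name = "John", surname = "Brown", email = "brown@server.org")
rename_email(p, "john.brown@server.org")
print(p$email)  # "john.brown@server.org"

# Plain list: the function modifies a local copy, the caller's list is unchanged
l <- list(name = "John", surname = "Brown", email = "brown@server.org")
rename_email(l, "john.brown@server.org")
print(l$email)  # "brown@server.org"

# copy() gives an independent duplicate when value semantics are wanted
q <- p$copy()
q$email <- "other@server.org"
print(p$email)  # still "john.brown@server.org"
```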
Have a nice read!