Saturday, September 17, 2011

RCaller: Support for sequential commands with a single process

I think, this revision will be the foundation of the version  2.1. RCaller is supposed to be slow but the easiest way of calling R from Java.

Finally I have implemented the method runAndReturnResultOnline() for running sequential commands in a single process. What does this stand for? Let me give an example to explain this:

Suppose that you want to perform a simulation study to measure the success of your new procedure. For this, you decide to draw random numbers from a distribution and calculate something and handle the results in Java. RCaller creates  Rscript processes for each single iteration. This cause to too many operating system calls.

Latest release of RCaller includes the method for this. Lets have a look at the Test file:


@Test
  public void onlineCalculationTest() {
    RCaller rcaller = new RCaller();
    rcaller.setRExecutable("/usr/bin/R");
    rcaller.cleanRCode();
    rcaller.addRCode("a<-1:10");
    rcaller.runAndReturnResultOnline("a");
    assertEquals(rcaller.getParser().getAsIntArray("a")[0], 1);

    rcaller.cleanRCode();
    rcaller.addRCode("b<-1:10");
    rcaller.addRCode("m<-mean(b)");
    rcaller.runAndReturnResultOnline("m");
    assertEquals(rcaller.getParser().getAsDoubleArray("m")[0], 5.5, 0.000001);

    rcaller.cleanRCode();
    rcaller.addRCode("a<-1:99");
    rcaller.addRCode("k<-median(a)");
    rcaller.runAndReturnResultOnline("k");
    assertEquals(rcaller.getParser().getAsDoubleArray("k")[0], 50.0, 0.000001);
  }
  }
 

In first stage,we are creating an integer vector and getting the first element. In the second one, we are creating the same integer vector with a different name and calculating the arithmetic mean. In the last one, we are recreating the vector a and getting the median, which is equal to 50.

This example uses the same RCaller object. In first stage, the R executable file (it is /usr/bin/R in my Ubuntu Linux) is created once. In second stage the same R file is used and no longer processes are created again. In this stage, the vector a is accessible and still remains alive. At the last stage, b is alive again and a is recreated. So this example does not cause the R to open and close three times but only once.

This modification speeds up the RCaller, but it can be still considered as slow.
However, it is still easy to implement and much more faster than the previous implementation.

Have Fun!


Thursday, September 15, 2011

Handling R lists with RCaller 2.0

Since RCaller creates an Rscript process for each single run, it is said to be in-efficient for most cases. But there are useful non-hack methods to improve the method. Suppose that your aim is to calculate medians of two double vector like this:












@Test
  public void singleResultTest() {
    double delta = 0.0000001;
    RCaller rcaller = new RCaller();
    rcaller.setRscriptExecutable("/usr/bin/Rscript");
    rcaller.cleanRCode();
    rcaller.addRCode("x <- c(6 ,8, 3.4, 1, 2)");
    rcaller.addRCode("med <- median(x)");

    rcaller.runAndReturnResult("med");

    double[] result = rcaller.getParser().getAsDoubleArray("med");

    assertEquals(result[0], 3.4, delta);
  }

However, this example considers only computing the median of x, effort for computing medians of three variables needs three process which is very slow. Lists are "vector of vector" objects but they are different from matrices. A list object in R can handle several types of vector with their names. For example


alist <- list (
s = c("string1", "string2", "string3") , 
i = c(5,4,7,6),
d = c(5.5, 6.7, 8.9)
)
 

the list object alist is formed by three different kind of vectors: string vector s, integer vector i and double vector d. Also their names are s, i and d, respectively. Accessing elements of this list is straightforward. There are two ways to access to elements. First one is conventional way using indices. When the example above runs, strvec is set to String vector s.



alist <- list (
strvec <- alist[1]
While a list object can handle R objects with their names, we can handle more than more result in a single RCaller run. Back to our example, we wanted to calculate medians of three double vectors in a single run.
@Test
  public void TestLists2()throws Exception {
    double delta = 0.0000001;
    RCaller rcaller = new RCaller();
    rcaller.setRscriptExecutable("/usr/bin/Rscript");
    rcaller.cleanRCode();
    rcaller.addRCode("x <- c(6 ,8, 3.4, 1, 2)");
    rcaller.addRCode("med1 <- median(x)");

    rcaller.addRCode("y <- c(16 ,18, 13.4, 11,12)");
    rcaller.addRCode("med2 <- median(y)");

    rcaller.addRCode("z <- c(116 ,118, 113.4,111,112)");
    rcaller.addRCode("med3 <- median(z)");

    rcaller.addRCode("results <- list(m1 = med1, m2 = med2, m3 = med3)");

    rcaller.runAndReturnResult("results");

    double[] result = rcaller.getParser().getAsDoubleArray("m1");
    assertEquals(result[0], 3.4, delta);

    result = rcaller.getParser().getAsDoubleArray("m2");
    assertEquals(result[0], 13.4, delta);

    result = rcaller.getParser().getAsDoubleArray("m3");
    assertEquals(result[0], 113.4, delta);
  }
This code passes the tests. By the result at hand, we have three medians of three different vectors with one pass calculation. With this way, an huge number of vectors can be accepted as a result from R and this method may be considered efficient... these test files were integrated to source structure of project in http://code.google.com/p/rcaller/

hope works!

Wednesday, September 14, 2011

about the current Internet Connection in Turkey


30 minutes ago... Something happened. And now: Turkey has a very very slow internet access for the sites out of Turkey. Probably there is something wrong. I wanna share with all quickly.

I checked some news on some web sites and telecommunication companies but I can not find anything about it. Everything is possible. But somebody want to know the reason. :)

Wednesday, September 7, 2011

Embedding R in Java Applications using Renjin

Effort of embedding R in other languages is not a short history for programmers. Rserve, Rjava, RCaller and Renjin are prominent efforts for doing this. Their approaches are completely different. RServe opens server sockets and listens for connections whatever the client is. It uses its own protocol to communicate with clients and it passes commands to R which were sent by clients. This is the neatest idea for me.

RJava uses the JNI (Java Native Library) way to interoperate R and Java. This is the most common and intuitive way for me.

RCaller sends commands to R interpreter by creating a process for each single call. Then it handles the results as XML and parses it. It is the easiest and the most in-efficient way of calling R from Java. But it works.

And finally, Renjin, is a re-implementation of R for the Java Virtual Machine. I think, this will be the most rational way of calling R from Java because it is something like

Renjin,
is not for calling R from Java,
is for calling itself and maybe it can be said that: it is for calling java from java :),
for Java programmers who aimed to use R in their projects


So that is why I participated this project. External function calls are always make pain whatever the way you use.

Renjin is an R implementation in Java.

I think all these paragraphs tell the whole story.

How can we embed Renjin to our Java projects? Lets do something... But we have some requirements:

  1. renjin-core-0.1.2-SNAPSHOT.jar (Download from http://code.google.com/p/renjin/wiki/Downloads?tm=2)
  2. commons-vfs-1.0.jar (Part of apache commons)
  3. commons-logging-1.1.1.jar (Part of apache commons)
  4. guava-r07.jar (http://code.google.com/p/guava-libraries/downloads/list)
  5. commons-math-2.1.jar (Part of apache commons)

Ok. These are the renjin and required Jar files. Lets evaluate the R expression "x<-1:10" which creates a vector of integers from one to ten. Tracking the code is straightforward.
package renjincall;



import java.io.StringReader;

import r.lang.Context;

import r.lang.SEXP;

import r.parser.ParseOptions;

import r.parser.ParseState;

import r.parser.RLexer;

import r.parser.RParser;

import r.lang.EvalResult;



public class RenjinCall {



  public RenjinCall() {

    Context topLevelContext = Context.newTopLevelContext();

    try {

      topLevelContext.init();

    } catch (Exception e) {

    }

    StringReader reader = new StringReader("x<-1:10\n");
    ParseOptions options = ParseOptions.defaults();
    ParseState state = new ParseState();
    RLexer lexer = new RLexer(options, state, reader);
    RParser parser = new RParser(options, state, lexer);
    try {
      parser.parse();
    } catch (Exception e) {
      System.out.println("Cannot parse: " + e.toString());
    }
    SEXP result = parser.getResult();
    System.out.println(result);
  }

  public static void main(String[] args) {
    new RenjinCall();
  }
}



We are initializing the library, creating the lexer and the parser and hadling the result as a SEXP. Finally we are printing the SEXP object (not itself, its String representation)


<-(x, :(1.0, 10.0))
This is the parsed version of our "x<-1:10", it contains the same amount of information but it is a little bit different in form. Since we only parsed the content but it has not been evaluated. Track the code:
EvalResult eva = result.evaluate(topLevelContext, topLevelContext.getEnvironment());
System.out.println(eva.getExpression().toString());


Now, the output is

c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

and this is the well known representation of R integer vectors. Of course printing the result in String format is not all the work. We would handle the elements of this array rather than print it. Lets do some work on it:

IntVector vector = (IntVector) eva.getExpression();
    for (int i = 0; i < vector.length(); i++) {
      System.out.println(
i + ". element of this vector is: " + vector.getElementAsInt(i)
);
    }

IntVector is defined in renjin core library and is for handling integer vectors. We simple used the .length() and .getElementAsInt() methods like using Java's ArrayList class. Finally the result is

0. element of this vector is: 1
1. element of this vector is: 2
2. element of this vector is: 3
3. element of this vector is: 4
4. element of this vector is: 5
5. element of this vector is: 6
6. element of this vector is: 7
7. element of this vector is: 8
8. element of this vector is: 9
9. element of this vector is: 10

It is nice, hah?

Monday, September 5, 2011

Online R Interpreter - Under development

This is the online R interpreter, Renjin, the Java implementation of the popular statistical programme. Note that it is under development and it includes unimplemented functionality and bugs. But it may be nice to try it online and you can report some bugs or join this project. Link is http://renjindemo.appspot.com/