Thursday, September 15, 2011

Handling R lists with RCaller 2.0

Since RCaller creates an Rscript process for every single run, it is said to be inefficient in most cases. But there are useful, non-hacky ways to improve on this. Suppose that your aim is to calculate the median of a double vector like this:

@Test
  public void singleResultTest() {
    double delta = 0.0000001;
    RCaller rcaller = new RCaller();
    rcaller.setRscriptExecutable("/usr/bin/Rscript");
    rcaller.cleanRCode();
    rcaller.addRCode("x <- c(6 ,8, 3.4, 1, 2)");
    rcaller.addRCode("med <- median(x)");

    // Start Rscript, run the code and fetch the variable "med"
    rcaller.runAndReturnResult("med");

    double[] result = rcaller.getParser().getAsDoubleArray("med");

    // JUnit takes the expected value first, then the actual one
    assertEquals(3.4, result[0], delta);
  }

However, this example only computes the median of x. Computing the medians of three variables this way needs three processes, which is very slow. Lists are "vector of vectors" objects, but they are different from matrices. A list object in R can hold several types of vectors together with their names. For example:


alist <- list (
s = c("string1", "string2", "string3") , 
i = c(5L, 4L, 7L, 6L),  # the L suffix makes these integers rather than doubles
d = c(5.5, 6.7, 8.9)
)
 

The list object alist is formed by three different kinds of vectors: string vector s, integer vector i and double vector d. Their names are s, i and d, respectively. Accessing elements of this list is straightforward, and there are two ways to do it. The first is the conventional way, using indices. When the example below runs, strvec is set to the string vector s. Note the double brackets: alist[1] would return a one-element list rather than the vector itself.

strvec <- alist[[1]]

The second way is to use the name of the element:

strvec <- alist$s

Since a list object can hold R objects together with their names, we can handle more than one result in a single RCaller run. Back to our example: we want to calculate the medians of three double vectors in a single run.

@Test
  public void testLists2() throws Exception {
    double delta = 0.0000001;
    RCaller rcaller = new RCaller();
    rcaller.setRscriptExecutable("/usr/bin/Rscript");
    rcaller.cleanRCode();
    rcaller.addRCode("x <- c(6 ,8, 3.4, 1, 2)");
    rcaller.addRCode("med1 <- median(x)");

    rcaller.addRCode("y <- c(16 ,18, 13.4, 11,12)");
    rcaller.addRCode("med2 <- median(y)");

    rcaller.addRCode("z <- c(116 ,118, 113.4,111,112)");
    rcaller.addRCode("med3 <- median(z)");

    // Pack the three medians into one named list so a single run returns all of them
    rcaller.addRCode("results <- list(m1 = med1, m2 = med2, m3 = med3)");

    rcaller.runAndReturnResult("results");

    // Each list element is fetched by its name; expected value comes first in assertEquals
    double[] result = rcaller.getParser().getAsDoubleArray("m1");
    assertEquals(3.4, result[0], delta);

    result = rcaller.getParser().getAsDoubleArray("m2");
    assertEquals(13.4, result[0], delta);

    result = rcaller.getParser().getAsDoubleArray("m3");
    assertEquals(113.4, result[0], delta);
  }

This code passes the tests. With the result at hand, we have the three medians of three different vectors from a one-pass calculation. In this way, a huge number of vectors can be returned from R as a single result, and this method may be considered efficient. These test files were integrated into the source tree of the project at http://code.google.com/p/rcaller/
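
As a quick cross-check of the expected values used in the assertions above, the same medians can be computed in plain Java without R. This is only a sketch: the MedianCheck class and its median helper are made up for illustration and are not part of RCaller.

```java
import java.util.Arrays;

class MedianCheck {

  // Returns the median of the values; the input array is left untouched.
  static double median(double[] values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("median of an empty array is undefined");
    }
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    if (sorted.length % 2 == 1) {
      // Odd length: the middle element is the median.
      return sorted[mid];
    }
    // Even length: average the two middle elements, as R's median() does.
    return (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  public static void main(String[] args) {
    // Same vectors as x, y and z in the test above.
    System.out.println(median(new double[] {6, 8, 3.4, 1, 2}));           // 3.4
    System.out.println(median(new double[] {16, 18, 13.4, 11, 12}));      // 13.4
    System.out.println(median(new double[] {116, 118, 113.4, 111, 112})); // 113.4
  }
}
```

Each vector has five elements, so the median is simply the third value after sorting.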

Hope it works!

Wednesday, September 14, 2011

about the current Internet Connection in Turkey


30 minutes ago... something happened. And now Turkey has very, very slow internet access to sites outside of Turkey. Probably something is wrong. I wanted to share this with everyone quickly.

I checked the news on some web sites and telecommunication companies, but I could not find anything about it. Anything is possible. But somebody wants to know the reason. :)

Wednesday, September 7, 2011

Embedding R in Java Applications using Renjin

Efforts to embed R in other languages have a long history among programmers. Rserve, rJava, RCaller and Renjin are prominent examples. Their approaches are completely different. Rserve opens server sockets and listens for connections, whatever the client is. It uses its own protocol to communicate with clients, and it passes the commands sent by clients on to R. This is the neatest idea for me.

rJava uses JNI (Java Native Interface) to make R and Java interoperate. This is the most common and intuitive way for me.

RCaller sends commands to the R interpreter by creating a process for each single call. It then receives the results as XML and parses them. It is the easiest and the most inefficient way of calling R from Java. But it works.
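
To make the per-call process cost concrete, here is a rough sketch of launching Rscript for a single expression with plain Java. It is not RCaller's actual implementation: the RscriptCommand class and its build and run methods are hypothetical names. Only java.lang.ProcessBuilder and the -e option of Rscript are taken as given, and the Rscript path is assumed to match the one used in the RCaller examples.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

class RscriptCommand {

  // Builds the command line that evaluates one R expression with "Rscript -e".
  static List<String> build(String rscriptPath, String expression) {
    return Arrays.asList(rscriptPath, "-e", expression);
  }

  // Starts a fresh Rscript process, waits for it and returns everything it printed.
  // Every call pays the full R start-up cost again.
  static String run(String rscriptPath, String expression)
      throws IOException, InterruptedException {
    ProcessBuilder builder = new ProcessBuilder(build(rscriptPath, expression));
    builder.redirectErrorStream(true); // merge stderr into stdout
    Process process = builder.start();
    StringBuilder output = new StringBuilder();
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(process.getInputStream()))) {
      String line;
      while ((line = reader.readLine()) != null) {
        output.append(line).append('\n');
      }
    }
    process.waitFor();
    return output.toString();
  }

  public static void main(String[] args) {
    // Only prints the command; calling run(...) with the same arguments
    // would actually launch it and requires R to be installed.
    List<String> command = build("/usr/bin/Rscript", "median(c(6, 8, 3.4, 1, 2))");
    System.out.println(String.join(" ", command));
  }
}
```

Calling run three times for three medians would start three separate R processes, which is the overhead described above.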

And finally, Renjin is a re-implementation of R for the Java Virtual Machine. I think this will be the most rational way of calling R from Java, because it is something like this:

Renjin
is not for calling R from Java,
is for calling itself, and maybe it can be said that it is for calling Java from Java :),
for Java programmers who aim to use R in their projects.


So that is why I joined this project. External function calls are always painful, whichever way you use.

Renjin is an R implementation in Java.

I think all these paragraphs tell the whole story.

How can we embed Renjin in our Java projects? Let's do something... But we have some requirements:

  1. renjin-core-0.1.2-SNAPSHOT.jar (Download from http://code.google.com/p/renjin/wiki/Downloads?tm=2)
  2. commons-vfs-1.0.jar (part of Apache Commons)
  3. commons-logging-1.1.1.jar (part of Apache Commons)
  4. guava-r07.jar (http://code.google.com/p/guava-libraries/downloads/list)
  5. commons-math-2.1.jar (part of Apache Commons)

OK. These are Renjin and the Jar files it requires. Let's evaluate the R expression "x<-1:10", which creates a vector of integers from one to ten. Following the code is straightforward.

package renjincall;

import java.io.StringReader;
import r.lang.Context;
import r.lang.EvalResult;
import r.lang.SEXP;
import r.parser.ParseOptions;
import r.parser.ParseState;
import r.parser.RLexer;
import r.parser.RParser;

public class RenjinCall {

  public RenjinCall() {
    // Initialize the library
    Context topLevelContext = Context.newTopLevelContext();
    try {
      topLevelContext.init();
    } catch (Exception e) {
      // Report the failure instead of silently swallowing it
      System.out.println("Cannot initialize: " + e.toString());
    }

    StringReader reader = new StringReader("x<-1:10\n");
    ParseOptions options = ParseOptions.defaults();
    ParseState state = new ParseState();
    RLexer lexer = new RLexer(options, state, reader);
    RParser parser = new RParser(options, state, lexer);
    try {
      parser.parse();
    } catch (Exception e) {
      System.out.println("Cannot parse: " + e.toString());
    }
    SEXP result = parser.getResult();
    System.out.println(result);
  }

  public static void main(String[] args) {
    new RenjinCall();
  }
}



We initialize the library, create the lexer and the parser, and handle the result as a SEXP. Finally, we print the SEXP object (not the object itself, but its String representation):

<-(x, :(1.0, 10.0))

This is the parsed version of our "x<-1:10". It contains the same information, but in a slightly different form, because we have only parsed the content and have not evaluated it yet. Follow the code:

EvalResult eva = result.evaluate(topLevelContext, topLevelContext.getEnvironment());
System.out.println(eva.getExpression().toString());


Now, the output is

c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

and this is the well-known representation of R integer vectors. Of course, printing the result as a String is not the whole job. We would rather handle the elements of this vector than just print it. Let's do some work on it:

IntVector vector = (IntVector) eva.getExpression();
for (int i = 0; i < vector.length(); i++) {
  System.out.println(i + ". element of this vector is: " + vector.getElementAsInt(i));
}

IntVector is defined in the Renjin core library and is used for handling integer vectors. We simply used the .length() and .getElementAsInt() methods, much like using Java's ArrayList class. Finally, the result is

0. element of this vector is: 1
1. element of this vector is: 2
2. element of this vector is: 3
3. element of this vector is: 4
4. element of this vector is: 5
5. element of this vector is: 6
6. element of this vector is: 7
7. element of this vector is: 8
8. element of this vector is: 9
9. element of this vector is: 10

It is nice, hah?
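
The difference between the parsed form and the evaluated form shown earlier in this post can be imitated with a toy example in plain Java. This is only an illustration under simplifying assumptions: the Range and Assign classes below are made up, they are not Renjin's classes, and they only cover an ascending range such as 1:10.

```java
import java.util.Arrays;

class ParseVersusEvaluate {

  // A parsed range expression such as 1:10; nothing is computed yet.
  static final class Range {
    final double from;
    final double to;

    Range(double from, double to) {
      if (to < from) {
        throw new IllegalArgumentException("this toy only supports ascending ranges");
      }
      this.from = from;
      this.to = to;
    }

    // Evaluation expands the range into concrete integers.
    int[] evaluate() {
      int n = (int) (to - from) + 1;
      int[] values = new int[n];
      for (int i = 0; i < n; i++) {
        values[i] = (int) from + i;
      }
      return values;
    }

    @Override
    public String toString() {
      return ":(" + from + ", " + to + ")";
    }
  }

  // A parsed assignment such as x <- 1:10.
  static final class Assign {
    final String name;
    final Range value;

    Assign(String name, Range value) {
      this.name = name;
      this.value = value;
    }

    @Override
    public String toString() {
      return "<-(" + name + ", " + value + ")";
    }
  }

  public static void main(String[] args) {
    Assign parsed = new Assign("x", new Range(1, 10));
    // The parsed form only describes the expression.
    System.out.println(parsed); // <-(x, :(1.0, 10.0))
    // Evaluating the right-hand side produces the actual values.
    System.out.println(Arrays.toString(parsed.value.evaluate()));
  }
}
```

The first line printed has the same shape as the parsed output above, while the second line lists the ten integers that only exist after evaluation.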

Monday, September 5, 2011

Online R Interpreter - Under development

This is the online R interpreter of Renjin, the Java implementation of the popular statistical programme. Note that it is under development, so it includes unimplemented functionality and bugs. But it may be nice to try it online, and you can report bugs or join the project. The link is http://renjindemo.appspot.com/

Friday, August 26, 2011

renjin - JVM-based Interpreter for R Language for Statistical Computing

Today I joined the Renjin project with my first patch. I believe that porting R from C to Java makes R available on different kinds of computers, not only PCs. At first glance, it may make R available on Android systems, for example (except for native libraries).