Wednesday, September 7, 2011

Embedding R in Java Applications using Renjin

Embedding R in other languages has a long history among programmers. Rserve, rJava, RCaller and Renjin are the prominent efforts, and their approaches are completely different. Rserve opens a server socket and listens for connections from any kind of client. It communicates with clients through its own protocol and passes the commands they send to R. For me, this is the neatest idea.
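
The client/server idea behind Rserve can be sketched with plain Java sockets. This is only an illustration of the architecture, not Rserve itself: the class name CommandServerSketch, the one-line text protocol and the "evaluated: " reply are assumptions made for this example, and Rserve's real protocol is different.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: one command per connection, one reply line back.
class CommandServerSketch {

  // Server side: accept a single client, read its command, send a reply.
  static void serveOnce(ServerSocket server) throws IOException {
    try (Socket client = server.accept();
         BufferedReader in = new BufferedReader(
             new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
         PrintWriter out = new PrintWriter(
             new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
      String command = in.readLine();
      // Placeholder for "pass the command to R and collect the result".
      out.println("evaluated: " + command);
    }
  }

  // Client side: connect, send one command, read one reply line.
  static String send(int port, String command) throws IOException {
    try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
         PrintWriter out = new PrintWriter(
             new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
         BufferedReader in = new BufferedReader(
             new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
      out.println(command);
      return in.readLine();
    }
  }

  // Starts the server on a free local port, sends one command and returns the reply.
  static String roundTrip(String command) throws IOException, InterruptedException {
    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
      Thread serverThread = new Thread(() -> {
        try {
          serveOnce(server);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
      serverThread.start();
      String reply = send(server.getLocalPort(), command);
      serverThread.join();
      return reply;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(roundTrip("x<-1:10"));
  }
}
```

The point of the design is that the server process stays alive between calls, so the client pays only for the socket round trip.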

rJava uses JNI (Java Native Interface) to make R and Java interoperate. For me, this is the most common and intuitive way.

RCaller sends commands to the R interpreter by creating a new process for every single call. It then receives the results as XML and parses them. It is the easiest and the most inefficient way of calling R from Java, but it works.
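
The cost of this approach comes from starting a new operating system process for every call. A minimal sketch of the pattern with ProcessBuilder follows. It is not RCaller's actual code: the class name ProcessPerCallSketch is made up for this example, and the Rscript command line in main assumes that R is installed and that Rscript is on the PATH.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "one process per call" pattern.
class ProcessPerCallSketch {

  // Starts a fresh process, waits for it and returns everything it printed.
  static String run(List<String> command) throws IOException, InterruptedException {
    ProcessBuilder builder = new ProcessBuilder(command);
    builder.redirectErrorStream(true); // merge stderr into stdout
    Process process = builder.start();

    ByteArrayOutputStream collected = new ByteArrayOutputStream();
    try (InputStream in = process.getInputStream()) {
      byte[] buffer = new byte[4096];
      int read;
      while ((read = in.read(buffer)) != -1) {
        collected.write(buffer, 0, read);
      }
    }

    String output = new String(collected.toByteArray(), StandardCharsets.UTF_8);
    int exitCode = process.waitFor();
    if (exitCode != 0) {
      throw new IOException("Command failed with exit code " + exitCode + ": " + output);
    }
    return output;
  }

  public static void main(String[] args) throws Exception {
    // Assumption: Rscript is available on the PATH.
    System.out.println(run(Arrays.asList("Rscript", "-e", "x <- 1:10; print(x)")));
  }
}
```

Every call to run pays the full startup time of the external program, which is why this pattern is slow inside loops and simulations.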

And finally, Renjin is a re-implementation of R for the Java Virtual Machine. I think this will be the most rational way of calling R from Java, because it works like this:

Renjin
is not for calling R from Java,
is for calling itself, and maybe it can be said that it is for calling Java from Java :),
is for Java programmers who aim to use R in their projects.


That is why I joined this project. External function calls are always painful, whichever way you use.

Renjin is an R implementation in Java.

I think all these paragraphs tell the whole story.

How can we embed Renjin in our Java projects? Let's do something... But first we have some requirements:

  1. renjin-core-0.1.2-SNAPSHOT.jar (Download from http://code.google.com/p/renjin/wiki/Downloads?tm=2)
  2. commons-vfs-1.0.jar (Part of apache commons)
  3. commons-logging-1.1.1.jar (Part of apache commons)
  4. guava-r07.jar (http://code.google.com/p/guava-libraries/downloads/list)
  5. commons-math-2.1.jar (Part of apache commons)

OK. These are Renjin and the Jar files it requires. Let's work with the R expression "x<-1:10", which creates a vector of integers from one to ten. We will parse it first and evaluate it afterwards. The code is straightforward to follow.

package renjincall;

import java.io.StringReader;

import r.lang.Context;
import r.lang.EvalResult; // used in the evaluation step later on
import r.lang.SEXP;
import r.parser.ParseOptions;
import r.parser.ParseState;
import r.parser.RLexer;
import r.parser.RParser;

public class RenjinCall {

  public RenjinCall() {
    // Create and initialize the top-level context in which R code runs
    Context topLevelContext = Context.newTopLevelContext();
    try {
      topLevelContext.init();
    } catch (Exception e) {
      System.out.println("Cannot initialize: " + e.toString());
    }

    // Set up the lexer and the parser on the R source text
    StringReader reader = new StringReader("x<-1:10\n");
    ParseOptions options = ParseOptions.defaults();
    ParseState state = new ParseState();
    RLexer lexer = new RLexer(options, state, reader);
    RParser parser = new RParser(options, state, lexer);
    try {
      parser.parse();
    } catch (Exception e) {
      System.out.println("Cannot parse: " + e.toString());
    }

    // The parse result is an expression that has not been evaluated yet
    SEXP result = parser.getResult();
    System.out.println(result);
  }

  public static void main(String[] args) {
    new RenjinCall();
  }
}



We initialize the library, create the lexer and the parser, and handle the result as a SEXP. Finally we print the SEXP object (not the object itself, but its String representation):


<-(x, :(1.0, 10.0))
This is the parsed version of our "x<-1:10". It contains the same information in a slightly different form, because we have only parsed the expression and have not evaluated it yet. To evaluate it, add the following lines:
EvalResult eva = result.evaluate(topLevelContext, topLevelContext.getEnvironment());
System.out.println(eva.getExpression().toString());


Now, the output is

c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

This is the well-known representation of R integer vectors. Of course, printing the result as a String is not the whole job. We would rather handle the elements of this vector than print it. Let's do some work on it:

IntVector vector = (IntVector) eva.getExpression();
for (int i = 0; i < vector.length(); i++) {
  System.out.println(i + ". element of this vector is: " + vector.getElementAsInt(i));
}

IntVector is defined in the Renjin core library and handles integer vectors. We simply used its length() and getElementAsInt() methods, much like using Java's ArrayList class. Finally, the result is

0. element of this vector is: 1
1. element of this vector is: 2
2. element of this vector is: 3
3. element of this vector is: 4
4. element of this vector is: 5
5. element of this vector is: 6
6. element of this vector is: 7
7. element of this vector is: 8
8. element of this vector is: 9
9. element of this vector is: 10

It is nice, huh?
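
The difference between the parsed and the evaluated form can also be seen in a toy model that does not need Renjin at all. The sketch below is only an illustration under stated assumptions: the ParseVsEvalSketch and RangeCall classes are hypothetical stand-ins, not Renjin's SEXP or IntVector, and they handle nothing but an ascending range such as 1:10.

```java
// Toy model of "parse first, evaluate later".
// These classes are hypothetical and are NOT part of Renjin.
class ParseVsEvalSketch {

  // An unevaluated call to the ":" operator, like :(1.0, 10.0) in the post.
  static final class RangeCall {
    final double from;
    final double to;

    RangeCall(double from, double to) {
      this.from = from;
      this.to = to;
    }

    // Printing the call shows only its structure; nothing is computed.
    @Override
    public String toString() {
      return ":(" + from + ", " + to + ")";
    }

    // Evaluating the call produces the actual integers (assumes from <= to).
    int[] evaluate() {
      int start = (int) from;
      int length = (int) to - start + 1;
      int[] values = new int[length];
      for (int i = 0; i < length; i++) {
        values[i] = start + i;
      }
      return values;
    }
  }

  public static void main(String[] args) {
    RangeCall parsed = new RangeCall(1, 10);
    System.out.println(parsed); // the parsed form

    int[] vector = parsed.evaluate(); // the evaluated form
    for (int i = 0; i < vector.length; i++) {
      System.out.println(i + ". element of this vector is: " + vector[i]);
    }
  }
}
```

The parsed object only describes what should be computed, while the values exist only after evaluate() has run. This is the same split as between parser.getResult() and result.evaluate(...) in the listing above.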

13 comments:

  1. Nice post!

    If you use Maven, you can also add Renjin and all of its dependencies to your project by adding the following to your pom.xml:

    <dependency>
      <groupId>com.bedatadriven.renjin</groupId>
      <artifactId>renjin-core</artifactId>
      <version>0.1.2-SNAPSHOT</version>
    </dependency>

    You will need to add a reference to bedatadriven's public repo:

    <repository>
      <id>bedatadriven</id>
      <name>Bedatadriven Public REpo</name>
      <url>http://nexus.bedatadriven.com/content/groups/public/</url>
    </repository>

  2. Rserve, Rjava, RCaller, Renjin, etc...

    What's the fastest in terms of runtime execution time and why???

    Thanks!

  3. I think it depends on what you want to do.

    There is an entry at "http://stackoverflow.com/questions/7435619/calling-r-from-java-faster-alternative-to-rcaller" about the inefficiency of RCaller 2.0 in simulations and iterated calculations. That is true.

    RCaller 2.0 is suitable for cases with a lot of work per call and a list of results.

    When the aim is to perform simulation studies and iterations, it is convenient to use Rserve or rJava, because the former uses TCP sockets and the latter uses JNI interfaces, which are almost always faster than starting an external process. I think the most convenient method is Rserve, because the initial calculations are done once. Sockets are fast compared with disk access and external method invocation.

    There is always more than one solution at hand, and selecting the best is the whole art.

    I developed RCaller as an alternative way, and I have never seen it as something it is not.

    Renjin is the other way. It is my favourite one, however it is under development. Alex and I are implementing the primitives of R's base package in order to run a single external package at first. When Renjin gets going, there will be no reason to use sockets, JNI or RCaller, because Renjin is a JVM-based R interpreter, which means that you will call Java from Java :) or R from R only :)) It will be nice...

    In which project do you plan to use R with Java? Is it a scientific project? Please let us know. I will be pleased if I can help.

    Hope you find the best fit.

  4. Finally, the upcoming version of RCaller supports running more than one command in a single process.

    This means we have strapped a rocket to the back of our turtle.

    With version 2.1, RCaller will have some nice new capabilities.

    Coming soon.

  5. Good job ;) Please, how can we embed an R console in a Java application? Thanks

  6. To embed an R console into a java application, simply follow the working example located here:

    http://renjin.googlecode.com/svn/trunk/shell/src/main/java/org/renjin/cli/JlineRepl.java

    - Pat

  7. How can I download an external package?

  8. Hi,

    Thanks for your ideas. I have a problem with the code. I cannot import r.lang.EvalResult, so Eclipse marks an error in the line EvalResult eva = result.evaluate(topLevelContext, topLevelContext.getEnvironment());

    Could you help me?

    Best regards

    Replies
    1. Aaron, I have the same problem. Tell me, how did you solve it?

  9. I was wondering if it is possible to embed Renjin into an existing web app. The web app is a hospital management system running on a Tomcat server with a MySQL database. I want to have a link that the administrator can click to load a Renjin CLI window, allowing statistical evaluation of datasets provided by the web app. I have already incorporated Renjin into the web app and added the dependencies to the pom.xml. To initialize it, I call it from a JSP file using this command

    ->JlineRepl.main(new String[0]);<-

    but I get this error:

    ->org.renjin.parser.RLexException: IOException while reading<-

    Please advise how to proceed. Assume I am a newbie.

  10. The current Renjin is 0.7 and this article does not seem to apply.

    There are a lot more dependencies and I cannot see package r inside renjin-core-0.7.0-SNAPSHOT.jar.

    Please update the article or delete it.

  11. And how can I do this with a file, instead of putting the R code in the StringReader???

  12. Hi all,
    Thanks for this introductory demo of Renjin.
    I am looking for the best way to use the cgdsr package (http://packages.renjin.org/packages/cgdsr.html) through a Java GUI. I would like to build a Maven project (Eclipse) which uses R code to explore and handle the cgds database. I think that Renjin needs R.oo (http://packages.renjin.org/packages/cgdsr.html) to access cgds, but it seems that the dependency is not working yet.
    1- Is there a way to build a Maven project (Eclipse) with Renjin for the cgdsr package, or do I need to use JGR?
    The first Maven project must be autonomous: able to run R (after setting JAVA_HOME and R_HOME), install and load related packages, accept an input file (txt), return results (sub-window or file) and plot statistical graphs.

    In the second step I would like to integrate the first Maven project as a plugin for Cytoscape. Cytoscape is an open source project which has its own API (https://github.com/cytoscape/cytoscape-api).

    Any idea is welcome!
    Thanks
    Karim

