Sunday, August 18, 2013

Sorting Multi-Column Datasets in R

Sorting Multi-Column Datasets in R


August 18, 2013

In this entry, we present the most straightforward way to sort multi-column datasets

Suppose that we have a vector in three dimensional space. This vector can be defined as

<- c(3,1,2)

in R. The built-in function sort sorts a given vector in ascending order by default. An ordered version of vector a is calculated as shown below:

sorted_<- sort(a)

The result is

[1] 1 2 3

For reverse ordering, the value of decreasing parameter must be set to TRUE.

> sort(a,decreasing=TRUE) 
[1] 3 2 1

Another related function order returns indices of sorted elements.

> a <- c(3,1,2) 
> order (a) 
[1] 2 3 1

In the example above, it is shown that if the vector a is ordered in ascending order, second element of a will be placed at index 1. It is easy to show that, result of the function order can be used to sort data.

> a <- c(3,1,2) 
> o <- order (a) 
> a[o] 
[1] 1 2 3

Suppose that we need to sort a matrix by a desired row. The solution is easy. Get the indices of sorted desired column and use the same method as in example given above.

> x <- round(runif(5, 0, 100)) 
> y <- round(runif(5, 0, 100)) 
> z <- round(runif(5, 0, 100)) 
> data <- cbind(x,y,z) 
> data 
      x  y  z 
[1,] 48 35 75 
[2,] 40 21 43 
[3,] 58 69  1 
[4,] 49 38  2 
[5,] 43 66 46

Now, get the indices of sorted z:

> o <- order(z) 
> data[o,] 
      x  y  z 
[1,] 58 69  1 
[2,] 49 38  2 
[3,] 40 21 43 
[4,] 43 66 46 
[5,] 48 35 75

Finally, we sorted the dataset data by the column vector z.

Saturday, August 17, 2013

A User Document For RCaller

A new research paper as a RCaller documentation is freely available at http://www.sciencedomain.org/abstract.php?iid=550&id=6&aid=4838#.U5YSoPmSy1Y




RCaller: A library for calling R from Java

by M.Hakan Satman

August 17, 2013

Contents


Abstract

RCaller is an open-source, compact, and easy-to-use library for calling R from Java. It offers not only an elegant solution for the task but its simplicity is key for non-programmers or programmers who are not familier with the internal structure of R. Since R is not only a statistical software but an enormous collection of statistical functions, accessing its functions and packages is of tremendous value. In this short paper, we give a brief introduction on the most widely-used methods to call R from Java and highlight some properties of RCaller with short examples. User feedback has shown that RCaller is an important tool in many cases where performance is not a central concern.

1 Introduction


R [R Development Core Team(2011)] is an open source and freely distributed statistics software package for which hundreds of external packages are available. The core functionality of R is written mostly in C and wrapped by R functions which simplify parameter passing. Since R manages the exhaustive dynamic library loading tasks in a clever way, calling an external compiled function is easy as calling an R function in R. However, integration with JVM (Java Virtual Machine) languages is painful.
The R package rJava [Urbanek(2011a)] provides a useful mechanism for instantiating Java objects, accessing class elements and passing R objects to Java methods in R. This library is convenient for the R packages that rely on external functionality written in Java rather than C, C++ or Fortran.
The library JRI, which is now a part of the package rJava, uses JNI (Java Native Interface) to call R from Java [Urbanek(2009)]. Although JNI is the most common way of accessing native libraries in Java, JRI requires that several system and environment variables are correctly set before any run, which can be difficult for inexperienced users, especially those who are not computer scientists.
The package Rserve [Urbanek(2011b)] uses TCP sockets and acts as a TCP server. A client establishes a connection to Rserve, sends R commands, and receives the results. This way of calling R from the other platforms is more general because the handshaking and the protocol initializing is fully platform independent.
Renjin (http://code.google.com/p/renjin) is an other interesting project that addresses the problem. It solves the problem of calling R from Java by re-implementing the R interpreter in Java! With this definition, the project includes the tasks of writing the interpreter and implementing the internals. Renjin is intended to be 100% compatible with the original. However, it is under development and needs help. After all, an online demo is available which is updated simultaneously when the source code is updated.
Finally, RCaller [RCaller Development Team(2011)] is an LGPL’d library which is very easy to use. It does not do much but wraps the operations well. It requires no configuration beyond installing an R package (Runiversal) and locating the Rscript binary distributed with R. Altough it is known to be relatively inefficient compared to other options, its latest release features significant performance improvements.

2 Calling R Functions


Calling R code from other languages is not trivial. R includes a huge collection of math and statistics libraries with nearly 700 internal functions and hundreds of external packages. No comparable library exists in Java. Although libraries such as the Apache Commons Math [Commons Math Developers(2010)] do provide many classes for those calculations, its scope is quite limited compared to R. For example, it is not easy to find such a library that calculates quantiles and probabilities of non-central distributions. [Harner et al.(2009)Harner, Luo, and Tan] affirms that using R’s functionality from Java prevents the user from writing duplicative codes in statistics softwares.
RCaller is an other open source library for performing R operations from within Java applications in a wrapped way. RCaller prepares R code using the user input. The user input is generally a Java array, a plain Java object or the R code itself. It then creates an external R process by running the Rscript executable. It passes the generated R code and receives the output as XML documents. While the process is alive, the output of the standard input and the standard error streams are handled by an event-driven mechanism. The returned XML document is then parsed and the returned R objects are extracted to Java arrays.
The short example given below creates two double vectors, passes them to R, and returns the residuals calculated from a linear regression estimation.
RCaller caller = new RCaller();
RCode code = new RCode();
double[] xvector = new double[]{1,3,5,3,2,4};
double[] yvector = new double[]{6,7,5,6,5,6};

caller.setRscriptExecutable("/usr/bin/Rscript");

code.addDoubleArray("X", xvector);
code.addDoubleArray("Y", yvector);
code.addRCode("ols <- lm ( Y ~ X )");

caller.setRCode(code);

caller.runAndReturnResult("ols");

double[] residuals =
   caller.getParser().
     getAsDoubleArray("residuals");  

The lm function returns an R list with a class of lm whose elements are accessible with the $ operator. The method runAndReturnResult() takes the name of an R list which contains the desired results. Finally, the method getAsDoubleArray() returns a double vector with values filled from the vector residuals of the list ols.
RCaller uses the R package Runiversal [Satman(2010)] to convert R lists to XML documents within the R process. This package includes the method makexml() which takes an R list as input and returns a string of XML document. Although some R functions return the results in other types and classes of data, those results can be returned to the JVM indirectly. Suppose that obj is an S4 object with members member1 and member2. These members are accessible with the @ operator like obj@member1 and obj@member2. These elements can be returned to Java by constructing a new list like result\A1-list(m1=obj@member1, m2=obj@member2).

3 Handling Plots


Although the graphics drivers and the internals are implemented in C, most of the graphics functions and packages are written in the R language and this makes the R unique with its graphics library. RCaller handles a plot with the function startPlot() and receives a java.io.File reference to the generated plot. The function getPlot() returns an instance of the javax.swing.ImageIcon class which contains the generated image in a fully isolated way. A Java example is shown below:
RCaller caller = new RCaller();
RCode code = new RCode();
File plotFile = null;
ImageIcon plotImage = null;

caller.
setRscriptExecutable("/usr/bin/Rscript");

code.R_require("lattice");

try{
 plotFile = code.startPlot();
 code.addRCode("
      xyplot(rnorm(100)~1:100, type=’l’)
      ");
}catch (IOException err){
 System.out.println("Can not create plot");
}

caller.setRCode(code);
caller.runOnly();

plotImage = code.getPlot(plotFile);
code.showPlot(plotFile);

The method runOnly() is quite different from the method RunAndReturnResult(). Because the user only wants a plot to be generated, there is nothing returned by R in the example above. Note that more than one plots can be generated in a single run.
Handling R plots with a java.io.File reference is also convenient in web projects. Generated content can be easly sent to clients using output streams opened from the file reference. However, RCaller uses the temp directory and does not delete the generated files automatically. This may be a cause of a too many files OS level error which can not be caught by a Java program. However, cleaning the generated output using a scheduled task solves this problem.

4 Live Connection


Each time the method runAndReturnResult() is called, an Rscript instance is created to perform the operations. This is the main source of the inefficiency of RCaller. A better approach in the cases that R commands are repeatedly called is to use the method runAndReturnResultOnline(). This method creates an R instance and keeps it running in the background. This approach avoids the time required to create an external process, initialize the interpreter, and load packages in subsequent calls.
The example given below returns the determinants of a given matrix and its inverse in sequence, that is, it uses a single external instance to perform more than one operation.
double[][] matrix =
    new double[][]{{5,4,5},{6,1,0},{9,-1,2}};
caller.setRExecutable("/usr/bin/R");
caller.setRCode(code);

code.clear();
code.addDoubleMatrix("x", matrix);
code.addRCode("result<-list(d=det(x))");
caller.runAndReturnResultOnline("result");

System.out.println(
"Determinant is " +
  caller.getParser().
   getAsDoubleArray("d")[0]
   );

code.addRCode("result<-list(t=det(solve(x)))");
caller.runAndReturnResultOnline("result");

System.out.println(
"Determinant of inverse is " +
  caller.getParser().
   getAsDoubleArray("t")[0]
   );

This use of RCaller is fast and convenient for repeated commands. Since R is not thread-safe, its functions can not be called by more than one threads. Therefore, each single thread must create its own R process to perform calculations simultaneously in Java.

5 Monitoring the Output


RCaller receives the desired content as XML documents. The content is a list of the variables of interest which are manually created by the user or returned automatically by a function. Apart from the generated content, R produces some output to the standard output (stdout) and the standard error (stderr) devices. RCaller offers two options to handle these outputs. The first one is to save them in a text file. The other is to redirect all of the content to the standard output device. The example given below shows a conditional redirection of the outputs generated by R.
if(console){
 caller.redirectROutputToConsole();
}else{
 caller.redirectROutputToFile(
     "output.txt" /* filename */,
     true  /* append? */);
}

6 Conclusion


In addition to being a statistical software, R is an extendable library with its internal functions and external packages. Since the R interpreter was written mostly in C, linking to custom C/C++ programs is relatively simple. Unfortunately, calling R functions from Java is not straightforward. The prominent methods use JNI and TCP sockets to solve this problem. In addition, renjin offers a different perspective to this issue. It is a re-implementation of R in Java which is intended to be 100% compatible with the original. However, it is under development and needs help. Finally, RCaller is an alternative way of calling R from Java. It is packaged in a single jar and it does not require setup beyond the one-time installation of the R package Runiversal. It supports loading external packages, calling functions, handling plots and debugging the output generated by R. It is not the most efficient method compared to the alternatives, but users report that performance improvements in the latest revision and its simplicity of use make it an important tool in many applications.

References


[Commons Math Developers(2010)]   Commons Math Developers. Apache Commons Math, Release 2.1. Available from http://commons.apache.org/math/download_math.cgi, Apr. 2010. URL http://commons.apache.org/math.
[Harner et al.(2009)Harner, Luo, and Tan]   E. Harner, D. Luo, and J. Tan. JavaStat: A Java/R-based statistical computing environment. Computational Statistics, 24(2):295–302, May 2009.
[R Development Core Team(2011)]   R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2011. URL http://www.R-project.org/. ISBN 3-900051-07-0.
[RCaller Development Team(2011)]   RCaller Development Team. RCaller: A library for calling R from Java, 2011. URL http://code.google.com/p/rcaller.
[Satman(2010)]   M. H. Satman. Runiversal: A Package for converting R objects to Java variables and XML., 2010. URL http://CRAN.R-project.org/package=Runiversal. R package version 1.0.1.
[Urbanek(2009)]   S. Urbanek. How to talk to strangers: ways to leverage connectivity between R, Java and Objective C. Computational Statistics, 24(2):303–311, May 2009.
[Urbanek(2011a)]   S. Urbanek. rJava: Low-level R to Java interface, 2011a. URL http://CRAN.R-project.org/package=rJava. R package version 0.9-2.
[Urbanek(2011b)]   S. Urbanek. Rserve: Binary R server, 2011b. URL http://CRAN.R-project.org/package=Rserve. R package version 0.6-5.





A new research paper as a RCaller documentation is freely available at http://www.sciencedomain.org/abstract.php?iid=550&id=6&aid=4838#.U5YSoPmSy1Y






























Sunday, July 7, 2013

How To Connect MySQL Database Using Java

Hello

In this article, I'll show you how to connect MySQL database using java. For this, you should configure the settings about MySQL connector in the JAVA IDE first. So all things what we need are;


  • IDE (NetBeans or EClipse is so good!)
  • MySQL Connector

  • MySQL Connector is a jar file which is so small size. You can download MySQL Connector here and check this line up:
    JDBC Driver for MySQL (Connector/J)
    After downloading, you have to introduce IDE and Connector. I use NetBeans. Because of I will show you how to introduce on the NetBeans already. Actually It doesn't matter which program has used. For that reason This process is the same for EClipse.

    Screen views given above show us the process about introducing. You click OK and it's finish. Your MySQL driver and JAVA IDE know each other from on now. Let's create a db class on there.
    void vt() {
            Connection conn = null;
            String hostName = "jdbc:mysql://localhost:3306/";
            String dbName = "DatabaseName";
            String driver = "com.mysql.jdbc.Driver";
            String userName = "root";
            String password = "*****";
            try {
                conn = DriverManager.getConnection(
                    hostName+dbName,userName,password);
                System.out.println("Connected to the database");
                Statement st = conn.createStatement();
                ResultSet result = st.executeQuery("SELECT * FROM  test");
                while (result.next()) {
                   int id = result.getInt("id");
                   String name = result.getString("name");
                   System.out.println(id + "\t" + name);
            }
                conn.close();
            } catch (Exception e) {
                System.out.println("No Connection!");
            }
    }
    

    In the try..catch debugging part, if there is no connection, You are going to see "No Connection!" alert on the screen. If not, You'll get names of the test table from your database. This code given above is not so good, but useful. If you want to code better, you can use OOP for example. Create a class of database connection, and use it each time connection.
    See you next article!

    Friday, July 5, 2013

    CURL Requests With PHP

    Hello,
    Most developers prefer to use HTTP Request / Response service in their projects. In this situation, you have to send data as POST method to the opponent.

    Imagine that you are a web master of your own e-commerce web site. Members of the site use coupon during check out. In these conditions, you may connect to other systems, for example the store which is in another sector, and have to validate if the coupon is correct or something like that. Here is the magnificent specimen of pure one of the best example in the world for this article :)

    If you want, let's code CURL for now!

    For this, I set a service up for posting data: service.php
    if(isset($_POST['field'])) {
        print "Field is: ".$_POST['field'];
    }
    else {
        print "Field is blank!";    
    }
    
    Like you've just seen above, the service is waiting for field post variable on it. If you send a field data, the screen is going to be like,

    Field is your field value
    
    But else,

    Field is blank!
    
    I suppose to send a field value to the service: myfile.php

    //display error
    ini_set('display_errors', 1);
     
    //curl
    $ch = curl_init("http://localhost/CURL/service.php");
     
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_POSTFIELDS, 'field=phpservisi.com');
     
    curl_exec($ch);
    curl_close($ch);
    
    If I run the myfile.php page, just going to see on on the screen like,

    Field is phpservisi.com
    
    For this example, I used a page which was http protocol. But sometimes I need use to https. In these conditions,have to add this line on it,

    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    
    There are so many settings and options in CURL. If you want to check it out, you can visit PHP official page, here

    See you next articles!

    Thursday, July 4, 2013

    How To Generate EXCEL Files With PHP

    Hello,
    Today, I will share a very useful application with you guys, generating excel files with using PHP. As you know, we need reports as excel, pdf, word etc. in our web sites. For that reason, generating all of'em is important thing for us. Some of us use basic HTML files with using PHP & a data base. But this is not enough. Thus, I use generating excel.
    So, Let's code :)
    $filename = "myExcelFile_".date('Y-m-d-h-i').".xls"; 
    
    header("Content-Disposition: attachment; 
            filename=\"$filename\""); 
    
    header("Content-Type: application/vnd.ms-excel"); 
    
    header('Content-Type: application/x-msexcel; 
            charset=UTF-8; format=attachment;');
    
    echo "\tMy EXCEL DATA\r\n"; 
    
    exit;
    

    I set my file name up as myExcelFile_ plus current time and some header configuration. Then finally I print. That's it.
    If you run, you'll see this view:

    This article was short but so useful. You can develop better, of course.
    See you next article.