Monday, March 16, 2015

Compact Genetic Algorithms with R


The Compact Genetic Algorithm (CGA) belongs to both Genetic Algorithms (GAs) and Estimation of Distribution Algorithms (EDAs). It is called compact because it represents the population with a single probability vector rather than storing a population of chromosomes.

For detailed information, see [1] for a complete description and [2] for a brief one.

In this blog post, we give an example of using compact genetic algorithms on the ONEMAX function. The ONEMAX function takes n bits as parameters and returns the number of ones as an integer. It is called ONEMAX because it has a single optimum, reached when all of the bits are equal to 1.
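To make the idea concrete, here is a minimal sketch of a compact GA in plain C++, following the update rule described in [1]. It is not the source code of the eive package: the names compact_ga, sample, Bits and EvalFunc are hypothetical, and the fitness is minimized so that it matches the negated ONEMAX used in this post. The probability vector is stored as integer counts out of popsize, so every update is an exact step of 1/popsize.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Bits = std::vector<int>;
using EvalFunc = double (*)(const Bits&);  // hypothetical type for the fitness

// ONEMAX written for minimization: minus the number of ones.
double onemax(const Bits& bits) {
    double ones = 0.0;
    for (int b : bits) ones += b;
    return -ones;
}

// Draw one candidate: bit i is 1 with probability counts[i] / popsize.
Bits sample(const std::vector<int>& counts, int popsize, std::mt19937& rng) {
    std::uniform_int_distribution<int> draw(0, popsize - 1);
    Bits bits(counts.size());
    for (std::size_t i = 0; i < counts.size(); ++i) {
        bits[i] = draw(rng) < counts[i] ? 1 : 0;
    }
    return bits;
}

// Compact GA sketch: only the probability vector is kept in memory.
Bits compact_ga(std::size_t chsize, int popsize, EvalFunc eval, unsigned seed) {
    assert(chsize > 0 && popsize > 1);
    std::mt19937 rng(seed);
    // counts[i] / popsize is the probability that bit i is 1; start near 0.5.
    std::vector<int> counts(chsize, popsize / 2);
    bool converged = false;
    while (!converged) {
        // Two candidates compete; the one with the lower fitness wins.
        Bits a = sample(counts, popsize, rng);
        Bits b = sample(counts, popsize, rng);
        const bool a_wins = eval(a) <= eval(b);
        const Bits& winner = a_wins ? a : b;
        const Bits& loser = a_wins ? b : a;
        converged = true;
        for (std::size_t i = 0; i < chsize; ++i) {
            // Where the two differ, move 1/popsize toward the winner's bit.
            if (winner[i] != loser[i]) counts[i] += winner[i] == 1 ? 1 : -1;
            if (counts[i] > 0 && counts[i] < popsize) converged = false;
        }
    }
    // Every probability is now 0 or 1, so the vector itself is the solution.
    Bits result(chsize);
    for (std::size_t i = 0; i < chsize; ++i) {
        result[i] = counts[i] == popsize ? 1 : 0;
    }
    return result;
}
```

A call such as compact_ga(10, 100, onemax, 42u) returns a vector of 10 bits. In this sketch a larger popsize means smaller steps, so the search needs more iterations but is less likely to fix a bit at the wrong value early.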

First of all, we load the R package eive which includes the wrapped C++ function cga.

> require("eive")

The next step is to define the ONEMAX function. It returns the negative of the number of ones because cga minimizes the evaluation function.

> ONEMAX <- function (x){
+     return(-sum(x))
+ }

Now we write the main part, optimization with cga:

> result <- cga(chsize = 10 , popsize = 100 , evalFunc = ONEMAX)
> result
 [1] 1 1 1 1 1 1 1 1 1 1

The result is a vector in which the bits are all equal to 1.

The main strength of this implementation is speed, because the algorithm is implemented in C++ and wrapped using Rcpp to be called from within R.

Here is an example with 1000 bits and the time consumed by the cga function call:

> system.time(
+     result <- cga(chsize = 1000,popsize = 100,evalFunc = ONEMAX)
+ )
   user  system elapsed 
  0.443   0.000   0.433 
> ONEMAX(result)
[1] -994

This is considerably fast: the function sets 994 of the 1000 bits to 1 in 0.433 seconds. Let's increase the population size from 100 to 200:

> system.time(
+     result <- cga(chsize = 1000,popsize = 200,evalFunc = ONEMAX)
+ )
   user  system elapsed 
  0.891   0.000   0.866 
> print (ONEMAX(result))
[1] -1000

After increasing the population size from 100 to 200, the time consumed roughly doubles to 0.866 seconds. This time, however, all 1000 bits are 1 and the optimal solution is reached.

Have a nice read!



[1] Harik, Georges R., Fernando G. Lobo, and David E. Goldberg. "The compact genetic algorithm." IEEE Transactions on Evolutionary Computation 3.4 (1999): 287-297.

[2] Satman, M. Hakan, and Erkin Diyarbakirlioglu. "Reducing errors-in-variables bias in linear regression using compact genetic algorithms." Journal of Statistical Computation and Simulation ahead-of-print (2014): 1-20.


Accessing C++ objects from R using Rcpp

The Rcpp (Seamless R and C++ Integration) package for R provides an easy way of combining C++ and R code. Since R is interpreted, R code often runs considerably slower than its counterpart written in C++. Speed is often the main concern; however, another common reason for using C++ is to access an existing native library from R.

In this blog post, we give an example of accessing a C++ class from within R using Rcpp. The class is named MyClass and has two private variables of type double. It also has getter and setter methods for its private fields.

MyClass is declared as shown below:


#include <Rcpp.h>

using namespace Rcpp;
using namespace std;


class MyClass {
  private:
    double a,b;
  
  public:
    MyClass(double a, double b);
    ~MyClass();
    void setA(double a);
    void setB(double b);
    double getA();
    double getB();
};



MyClass has two private double variables a and b, a constructor, a destructor, and getter and setter methods for a and b. The implementation of MyClass is given below:




MyClass::MyClass(double a, double b){
    this->a = a;
    this->b = b;
}

MyClass::~MyClass(){
    cout << "Destructor called" << std::endl;
}

void MyClass::setA (double a){
    this->a = a;
}

void MyClass::setB (double b){
    this->b = b;
}

double MyClass::getA(){
    return(this->a);
}

double MyClass::getB(){
    return(this->b);
}


MyClass is nearly minimal. Since it is a C++ class, it is not directly accessible from R. In this example, we write some wrapper functions that create instances of MyClass and return their addresses in memory, so that later function calls can use them. In other words, the R side holds the addresses of C++ objects in order to access them.


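// Forward declaration: class_print is defined below but called in class_create.
void class_print(long addr);

// [[Rcpp::export]]
long class_create(double a, double b){
    // Allocate the object on the heap and return its address as an integer.
    MyClass *m =  new MyClass(a,b);
    class_print((long) m);
    return((long)m);
}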

The function class_create is a C++ function with a special comment that Rcpp processes before compiling. After compilation, an R wrapper function named class_create is created to call its C++ counterpart. This function creates an instance of MyClass with the given double values and returns the address of the created object as a long integer. Here are the other wrapper functions:


// [[Rcpp::export]]
void class_print(long addr){
    MyClass *m = (MyClass*)addr;
    cout << "a = " << m->getA() << " b = " << m->getB() << "\n";   
}


// [[Rcpp::export]]
void class_destroy(long addr){
    MyClass *m = (MyClass*) addr;
    delete m;
}

// [[Rcpp::export]]
void class_set_a(long addr, double a){
    MyClass *m = (MyClass*) addr;
    m->setA(a);
}

// [[Rcpp::export]]
void class_set_b(long addr, double b){
    MyClass *m = (MyClass*) addr;
    m->setB(b);
}

// [[Rcpp::export]]
double class_get_a(long addr){
    MyClass *m = (MyClass*) addr;
    return(m->getA());
}

// [[Rcpp::export]]
double class_get_b(long addr){
    MyClass *m = (MyClass*) addr;
    return(m->getB());
}

Suppose the whole code is written in a file named classcall.cpp. On the R side, this code can be compiled and tested as shown below:

> require("Rcpp")
Loading required package: Rcpp
> Rcpp::sourceCpp('rprojects/classcall.cpp')
> myobj <- class_create(3.14, 7.8)
a = 3.14 b = 7.8
> myobj
[1] 104078752
> class_set_a(myobj,100)
> class_set_b(myobj,500)
> class_print(myobj)
a = 100 b = 500

> class_get_a(myobj)
[1] 100
> class_get_b(myobj)
[1] 500
> class_destroy(myobj)
Destructor called
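
The address round trip behind these wrappers can also be sketched in plain C++, without Rcpp. The sketch below is an illustration under stated assumptions, not part of the original classcall.cpp: Handle, handle_create, handle_deref and handle_destroy are hypothetical names, and MyClass is reduced to a stand-in. It stores the address in std::uintptr_t instead of long, because long is only 32 bits wide on some 64-bit platforms (64-bit Windows, for example) and could truncate a pointer there.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the MyClass of this post, reduced to what the sketch needs.
class MyClass {
  private:
    double a, b;

  public:
    MyClass(double a, double b) : a(a), b(b) {}
    void setA(double value) { a = value; }
    double getA() const { return a; }
    double getB() const { return b; }
};

// Hypothetical handle type: an unsigned integer wide enough for a pointer.
using Handle = std::uintptr_t;

// Create an object on the heap and hand out its address as an integer.
Handle handle_create(double a, double b) {
    return reinterpret_cast<Handle>(new MyClass(a, b));
}

// Turn the integer back into a pointer to the same object.
MyClass* handle_deref(Handle h) {
    assert(h != 0);  // a zero handle can never come from handle_create
    return reinterpret_cast<MyClass*>(h);
}

// Free the object; the handle must not be used after this call.
void handle_destroy(Handle h) {
    delete handle_deref(h);
}
```

As with the wrappers in this post, nothing stops a caller from using a handle after it has been destroyed, so the caller is responsible for discarding it. Rcpp also offers Rcpp::XPtr, an external pointer wrapper, which may be a safer choice than passing raw addresses.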


Have a nice read!


Saturday, March 14, 2015

SQLite with R - The sqldf package


R's data sorting functions sort and order, the data filtering function which, the vector accessing operator [], the vector and matrix manipulation functions cbind and rbind, and other functions and keywords make data analysis easy in many situations. SQL (Structured Query Language) is used for storing, adding, removing, sorting and filtering data that is saved permanently on a disk or held in memory.

The R package sqldf builds a SQLite database from an R data.frame object. A data.frame is a table-like structure in R that resembles a matrix but has richer properties. In this blog post, we present a basic introduction to the sqldf package and its use in R.

First of all, the package can be installed by typing:

> install.packages("sqldf")

After installing the package, it can be loaded by typing:

> require("sqldf")
Loading required package: sqldf
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
Loading required package: DBI


Now let's create two vectors of length 100 and combine them into a data frame:

> assign("x", rnorm(100))
> assign("y", rnorm(100))
> assign("mydata", as.data.frame(cbind(x,y)))

We can see the first 6 rows:

> head(mydata)
           x         y
1 -1.9357660 0.2784369
2 -0.6976428 1.4646022
3  0.1913628 0.1578977
4  0.3049607 0.6055087
5  2.3773249 1.1800434
6  0.4641791 1.7143130

Let's run some SQL statements on this data frame using sqldf.

Averages of x and y

> sqldf("select avg(x), avg(y) from mydata")
     avg(x)   avg(y)
1 0.0790934 0.220756


Number of cases

> sqldf("select count(x), count(y) from mydata")
  count(x) count(y)
1      100      100


First Three Cases

> sqldf("select x,y from mydata limit 3")
           x         y
1 -1.9357660 0.2784369
2 -0.6976428 1.4646022
3  0.1913628 0.1578977


Minimum and Maximum Values

> sqldf("select min(x),max(x),min(y),max(y) from mydata")
     min(x)   max(x)   min(y)   max(y)
1 -2.155768 2.377325 -1.75477 2.531869


First 3 Cases of Ordered Data

> sqldf("select x,y from mydata order by x limit 3")
          x         y
1 -2.155768 0.6614813
2 -1.935766 0.2784369
3 -1.837502 0.1073177
> sqldf("select x,y from mydata order by y limit 3")
          x         y
1 0.7665811 -1.754770
2 0.3373319 -1.736727
3 0.6199159 -1.335649


Insert into 

sqldf does not alter the data frame. After inserting a new case, a new data.frame is created and returned. In the example below, sqldf takes a vector of two SQL statements as its parameter, and the result is accessible under the name main.mydata rather than mydata:

> tail (sqldf( 
+ c(
+ "insert into mydata values (6,7)"
+ ,
+ "select * from main.mydata"
+ )
+ )
+ )
              x          y
96   1.58024523  1.3937920
97  -1.79352203  0.2105787
98   0.02632872 -1.0567890
99  -0.60934162 -0.1359667
100  1.43393159 -0.9396326
101  6.00000000  7.0000000


Delete

> sqldf( 
+ c(
+ "delete from mydata where x < 0 or y < 0"
+ ,
+ "select * from main.mydata"
+ )
+ )
            x          y
1  0.19136277 0.15789771
2  0.30496074 0.60550873
3  2.37732485 1.18004342
4  0.46417906 1.71431305
5  1.16290585 1.17154756
6  0.49335335 0.19904607
7  1.45769371 0.08291387
8  0.78473338 1.07769098
9  0.69043300 1.35040512
10 1.47893118 1.01057351
.....


Have a nice read!


Handling all variables in a workspace in R with RCaller

R assigns a value to a variable name with the assignment operator <-, which corresponds to the assign function.

RCaller handles results as list objects. Since R environments can easily be converted to R lists with as.list, all variables in an environment can be returned at once (see the previous blog post on R lists).

Here is an example of using RCaller to get all of the variables created at run time on the R side.





package rcallerenvironments;

import rcaller.RCaller;
import rcaller.RCode;

public class RCallerEnvironments {

    public static void main(String[] args) {
        RCaller rcaller = new RCaller();
        RCode code = new RCode();
        rcaller.setRscriptExecutable("/usr/bin/Rscript");

        code.addRCode("a <- 3");
        code.addRCode("b <- 10.45");
        code.addRCode("d <- TRUE");
        code.addRCode("avector <- c(9,6,5,6)");
        code.addRCode("allvars <- as.list(globalenv())");

        rcaller.setRCode(code);

        rcaller.runAndReturnResult("allvars");

        System.out.println(rcaller.getParser().getNames());
        try {
            System.out.println(rcaller.getParser().getXMLFileAsString());
        } catch (Exception e) {
            System.out.println("Error in accessing XML");
        }
    }

}

The output is 



As seen in the output, the created variables avector, a, b and d are returned to the Java side in a single call, without any manual translation.

Have a nice read!


Friday, March 13, 2015

RCaller 2.5 is available for downloading

We are happy to announce that RCaller 2.5, our 'easy to use' Java library for calling R from Java, is now available for download. Developers can get the compiled jar file at


 https://github.com/jbytecode/rcaller/releases/tag/2.5


This release does not extend the main functionality of the library, but it adds some handy functions for performing calculations and for the later development of the library.



What is new:

* A BibTeX entry for the official document is added for citing RCaller in projects or papers

* The RealMatrix class is implemented. Matrix operations are performed in a more 'Java-ish' style

* RService is implemented for developing wrapper functions


Where to start?

* Read the web page on RCaller at http://mhsatman.com/tag/rcaller/
* Read the blog entries at http://stdioe.blogspot.com.tr/search/label/rcaller
* Have a look at the source tree at https://github.com/jbytecode/rcaller
* Download the library from https://github.com/jbytecode/rcaller/releases/tag/2.5

Have a nice try!