Saturday, March 7, 2015

K-means clustering with RCaller - A library for calling R from Java

Here is an example of RCaller, a library for calling R from Java.

In the code below, we create two variables, x and y. The k-means clustering function kmeans is applied to the data matrix formed from x and y, and the result is then reported in Java.






package kmeansrcaller;

import rcaller.RCaller;
import rcaller.RCode;

public class KMeansRCaller {

    public static void main(String[] args) {
        RCaller caller = new RCaller();
        RCode code = new RCode();

        double[] x = new double[]{1, 2, 3, 4, 5, 10, 20, 30, 40, 50};
        double[] y = new double[]{2, 4, 6, 8, 10, 20, 40, 60, 80, 100};

        code.addDoubleArray("x", x);
        code.addDoubleArray("y", y);

        code.addRCode("result <- kmeans(cbind(x,y), 2)");

        caller.setRCode(code);

        caller.setRscriptExecutable("/usr/bin/Rscript");

        caller.runAndReturnResult("result");
        System.out.println(caller.getParser().getNames());

        int[] clusters = caller.getParser().getAsIntArray("cluster");
        double[][] centers = caller.getParser().getAsDoubleMatrix("centers");
        double[] totalSumOfSquares = caller.getParser().getAsDoubleArray("totss");
        // RCaller automatically replaces dots with underlines in variable names
        // So the parameter tot.withinss is accessible as tot_withinss
        double[] totalWithinSumOfSquares = caller.getParser().getAsDoubleArray("tot_withinss");
        double[] totalBetweenSumOfSquares = caller.getParser().getAsDoubleArray("betweenss");

        for (int i = 0; i < clusters.length; i++) {
            System.out.println("Observation " + i + " is in cluster " + clusters[i]);
        }

        System.out.println("Cluster Centers:");
        for (int i = 0; i < centers.length; i++) {
            for (int j = 0; j < centers[0].length; j++) {
                System.out.print(centers[i][j] + " ");
            }
            System.out.println();
        }

        System.out.println("Total Within Sum of Squares: " + totalWithinSumOfSquares[0]);
        System.out.println("Total Between Sum of Squares: " + totalBetweenSumOfSquares[0]);
        System.out.println("Total Sum of Squares: " + totalSumOfSquares[0]);
    }

}



The output is



[cluster, centers, totss, withinss, tot_withinss, betweenss, size, iter, ifault]
Observation 0 is in cluster 2
Observation 1 is in cluster 2
Observation 2 is in cluster 2
Observation 3 is in cluster 2
Observation 4 is in cluster 2
Observation 5 is in cluster 2
Observation 6 is in cluster 2
Observation 7 is in cluster 1
Observation 8 is in cluster 1
Observation 9 is in cluster 1
Cluster Centers:
40.0 6.42857142857143 
80.0 12.8571428571429 
Total Within Sum of Squares: 2328.57142857143
Total Between Sum of Squares: 11833.9285714286
Total Sum of Squares: 14162.5
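
For comparison, the same clustering can be run directly in R, which makes it easier to see what each field of the result holds. This is a minimal sketch using the same x and y; set.seed and nstart are additions that are not part of the original example. They are there because kmeans starts from random centers, so the cluster labels (1 or 2) may be swapped between runs.

```r
# Same data as in the Java example
x <- c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
y <- c(2, 4, 6, 8, 10, 20, 40, 60, 80, 100)

set.seed(1)                       # assumption: fixed seed for repeatability
result <- kmeans(cbind(x, y), centers = 2, nstart = 10)

print(names(result))              # note the dot in "tot.withinss" on the R side
print(result$cluster)             # cluster label of each observation
print(result$centers)             # one row per cluster, one column per variable

# The total sum of squares splits into a within and a between part
print(c(totss = result$totss,
        tot.withinss = result$tot.withinss,
        betweenss = result$betweenss))
```

In R, result$centers has one row per cluster, so it is worth checking the orientation of the matrix that arrives on the Java side before indexing it.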



Have a nice read!






Saturday, February 28, 2015

Levenshtein Distance in R

Edit distance and Levenshtein distance are nonparametric distance measures that differ in some respects from well-known distance measures such as the Euclidean or Mahalanobis distances: they compare strings rather than numeric vectors.

The Levenshtein distance between two strings is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other.
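
As a small worked illustration of this definition, the sketch below uses the adist function from base R on a few word pairs. The words are chosen here for illustration only.

```r
# kitten -> sitten (substitute k/s) -> sittin (substitute e/i) -> sitting (insert g)
d1 <- adist("kitten", "sitting")
print(d1[1, 1])   # 3

# One extra "o" has to be deleted
d2 <- adist("marooon", "maroon")
print(d2[1, 1])   # 1

# adist is vectorised: this returns the full matrix of pairwise distances
words <- c("maroon", "marooon", "turquoise", "turtoise")
print(adist(words))
```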

In the example below, the user is asked to enter a string in the console. The input string is then compared to the colour names defined in R, and the closest colour names are reported:



user.string <- readline("Enter a word: ")
wordlist <- colours()
dists <- adist(user.string, wordlist)
mindist <- min(dists)
best.ones <- which(dists == mindist)

for (index in best.ones){
    cat("Did you mean: ", wordlist[index],"\n")
}



Here are the results:

Enter a word: turtoise
Did you mean:  turquoise 


Enter a word: turtle
Did you mean:  purple 


Enter a word: night blue
Did you mean:  lightblue 


Enter a word: parliament
Did you mean:  darkmagenta 


Enter a word: marooon
Did you mean:  maroon



Have a nice read





Friday, February 27, 2015

Frequency table of characters in a string in R

R's string manipulation functions include one for splitting a string. Splitting a string into its characters and tabulating their frequencies could be used, for example, in a language detection system.

Here is an example on a text taken from the Oracle - History of Java site. The code below defines a large string. The string is then split into its characters. After calculating the frequency of each single character (including digits, commas and dots), a histogram is saved in a file.






# Defining string 
s <- "Since 1995, Java has changed our world and our expectations. Today, with technology such a part of our daily lives, we take it for granted that we can be connected and access applications and content anywhere, anytime. Because of Java, we expect digital devices to be smarter, more functional, and way more entertaining. In the early 90s, extending the power of network computing to the activities of everyday life was a radical vision. In 1991, a small group of Sun engineers called the \"Green Team\" believed that the next wave in computing was the union of digital consumer devices and computers. Led by James Gosling, the team worked around the clock and created the programming language that would revolutionize our world – Java. The Green Team demonstrated their new language with an interactive, handheld home-entertainment controller that was originally targeted at the digital cable television industry. Unfortunately, the concept was much too advanced for the them at the time. But it was just right for the Internet, which was just starting to take off. In 1995, the team announced that the Netscape Navigator Internet browser would incorporate Java technology.Today, Java not only permeates the Internet, but also is the invisible force behind many of the applications and devices that power our day-to-day lives. From mobile phones to handheld devices, games and navigation systems to e-business solutions, Java is everywhere!"

# First converting to lower case
# then splitting by characters.
# strsplit returns a list, so we unlist it into a vector.
chars <- unlist(strsplit(tolower(s), ""))

# Generating frequency table
freqs <- table(chars)

# Generating plot into a file
png("Graph.png")
hist(freqs,include.lowest=TRUE, breaks=46,freq=TRUE,labels=rownames(freqs))
dev.off()
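
Note that hist() bins the counts themselves, so the plot shows how many characters share a given frequency. If a chart with one bar per character is wanted instead, barplot on the frequency table is one option. The following is a minimal sketch on a short string that is made up for illustration; the plot title and axis labels are also assumptions.

```r
# Short hypothetical text, so that the table is easy to check by eye
s <- "Java is everywhere, and Java is fast."

chars <- unlist(strsplit(tolower(s), ""))
freqs <- table(chars)

# Relative frequencies, most common characters first
rel <- sort(freqs / sum(freqs), decreasing = TRUE)
print(head(rel, 5))

# One bar per character; wrap this call in png(...) and dev.off()
# as in the code above to save the plot to a file
barplot(freqs, main = "Character frequencies",
        xlab = "character", ylab = "count")
```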



The generated output is


We translated the text used in our example into Spanish using Google Translate. The code is shown below:

# Defining string 
s <- "Desde 1995, Java ha cambiado nuestro mundo y nuestras expectativas. Hoy en día, con la tecnología de una parte de nuestra vida cotidia    na tal, damos por sentado que se puede conectar y acceder a las aplicaciones y contenido en cualquier lugar ya cualquier hora. Debido a Java    , esperamos que los dispositivos digitales para ser más inteligente, más funcional, y de manera más entretenida. A principios de los años 90    , que se extiende el poder de la computación en red para las actividades de la vida cotidiana era una visión radical. En 1991, un pequeño gr    upo de ingenieros de Sun llamado \"Green Team\" cree que la próxima ola de la informática fue la unión de los dispositivos digitales de cons    umo y ordenadores. Dirigido por James Gosling, el equipo trabajó durante todo el día y creó el lenguaje de programación que revolucionaría e    l mundo - Java. El Equipo Verde demostró su nuevo idioma con una mano controlador interactivo, el entretenimiento en casa que fue dirigido o    riginalmente a la industria de la televisión digital por cable. Por desgracia, el concepto fue demasiado avanzado para el ellos en el moment    o. Pero fue justo para Internet, que estaba empezando a despegar. En 1995, el equipo anunció que el navegador de Internet Netscape Navigator     incorporaría Java technology.Today, Java no sólo impregna el Internet, pero también es la fuerza invisible detrás de muchas de las aplicaci    ones y dispositivos que alimentan nuestra vida del día a día. Desde teléfonos móviles para dispositivos de mano, juegos y sistemas de navega    ción para e-business soluciones, Java está en todas partes!"

# First converting to lower case
# then splitting by characters.
# strsplit returns a list, so we unlist it into a vector.
chars <- unlist(strsplit(tolower(s), ""))

# Generating frequency table
freqs <- table(chars)

# Generating plot into a file
png("Graph.png")
hist(freqs,include.lowest=TRUE, breaks=46,freq=TRUE,labels=rownames(freqs))
dev.off()


The generated plot for the Spanish translation is:


Note that some characters such as spaces, dots and commas can be removed from the string s using code similar to this:

news <- gsub(pattern=" ", replacement="", x=s)

The code above removes all spaces from the string s, and a new string variable news holds the modified string. The original variable remains unchanged.
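
To drop dots, commas and digits as well, a regular expression with POSIX character classes can be passed to gsub. A minimal sketch on a short string that is made up for illustration:

```r
s <- "In 1995, the team announced Java."

# Remove spaces only (fixed = TRUE treats the pattern as a literal string)
no.spaces <- gsub(" ", "", s, fixed = TRUE)

# Remove whitespace, punctuation and digits in one pass
letters.only <- gsub("[[:space:][:punct:][:digit:]]", "", s)

print(no.spaces)      # "In1995,theteamannouncedJava."
print(letters.only)   # "IntheteamannouncedJava"
print(s)              # the original string is unchanged
```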


Have a nice read!



Environments in R

Environment is a specific term in R, but the concept is used in the interpreters of many programming languages. Variable scope is directly related to environments. An environment in R encapsulates a set of variables or objects, and is itself enclosed by a parent environment. The top-level workspace is a special environment called the global environment.

When a variable is assigned outside of any function, it is stored in the global environment by default.

Suppose we set t to 10 by

t <- 10

A top-level assignment like this can also be written with the assign function. Here we use it to give t the new value 12:


assign(x="t", value=12, envir=.GlobalEnv)


The value of t is now 12:


> t <- 10
> assign(x="t", value=12, envir=.GlobalEnv)
> t
[1] 12




Instead of using the global environment, we can create new environments and attach them to parent environments. Suppose we create a new environment as below:


> my.env <- new.env()
> assign(x="t", value="20", envir=my.env)
> t
[1] 12


The value of t is still 12 because we created another variable named t, which is encapsulated by the environment my.env.


Variables in environments are accessible using the assign and the get functions.


> get(x="t", envir=my.env)
[1] "20"
> get(x="t", envir=.GlobalEnv)
[1] 12


As we can see, values of the variables with same name are different, because they are encapsulated by separate environments.

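The exists() function returns TRUE if an object can be found in an environment and FALSE otherwise. By default the enclosing environments are searched as well; pass inherits=FALSE to check only the given environment. Checking whether an object exists is crucial in some cases.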

> exists(x="t", envir=.GlobalEnv)
[1] TRUE
> exists(x="t", envir=my.env)
[1] TRUE
> exists(x="a", envir=my.env)
[1] FALSE
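
The effect of the inherits argument of exists() can be seen in a minimal sketch like the one below. The variable name threshold is hypothetical and chosen only to avoid clashing with any existing R object.

```r
threshold <- 12          # defined in the environment we are calling from
my.env <- new.env()      # by default its parent is the calling environment

# Not stored in my.env itself, but reachable through its parent
in.parent    <- exists("threshold", envir = my.env)
local.before <- exists("threshold", envir = my.env, inherits = FALSE)
print(c(in.parent, local.before))

# After assigning inside my.env, the local lookup succeeds as well
assign("threshold", "20", envir = my.env)
local.after <- exists("threshold", envir = my.env, inherits = FALSE)
print(local.after)

print(get("threshold", envir = my.env))   # the copy held by my.env
print(threshold)                          # the outer variable is untouched
```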


The is.environment() function returns TRUE if an object is an environment and FALSE otherwise.

> is.environment(.GlobalEnv)
[1] TRUE
> is.environment(my.env)
[1] TRUE
> is.environment(t)
[1] FALSE



Finally, environments behave much like named lists, and a list can easily be converted to an environment.

> my.list <- list (a=3, b=7)
> my.env <- as.environment(my.list)
> get("a", envir=my.env)
[1] 3
> get("b", envir=my.env)
[1] 7
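
One practical difference between the two is worth keeping in mind: a list is copied when it is modified inside a function, whereas an environment is passed by reference. A minimal sketch, in which the helper functions modify.list and modify.env are hypothetical names introduced for illustration:

```r
# Hypothetical helpers: each one tries to overwrite element "a"
modify.list <- function(l) { l$a <- 100; invisible(NULL) }
modify.env  <- function(e) { e$a <- 100; invisible(NULL) }

my.list <- list(a = 3, b = 7)
my.env  <- as.environment(my.list)

modify.list(my.list)   # works on a copy, the caller's list is unchanged
modify.env(my.env)     # works on the environment itself

print(my.list$a)                  # 3
print(get("a", envir = my.env))   # 100
```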



The inverse process is converting an environment to a list: 


> as.list(.GlobalEnv)
$t
[1] 12

$my.env

$my.list
$my.list$a
[1] 3

$my.list$b
[1] 7



Happy R days :)











Wednesday, February 18, 2015

Fast and robust estimation of regression coefficients with R

Outliers are aberrant observations that do not fit the rest of the data well. In regression analysis, an observation is said to be an outlier if it is distant from the unknown regression object (a line in two-dimensional space, a plane in three-dimensional space, a hyperplane in higher dimensions, etc.). If an observation is distant in its dependent variable only, it is said to be a regression outlier. If it is distant in its independent variables, it is called a leverage point: a bad leverage point when it also lies far from the regression object, and a good leverage point when it lies close to it, in which case it generally reduces the standard errors of the estimates. Bad leverage points may cause large changes in the estimated coefficients and are regarded as more dangerous in the statistics literature.

Since an outlier may change the partial coefficients of a regression, examining the residuals of a non-robust estimator leads to wrong conclusions. An outlier may change one or more regression coefficients and hide itself behind a relatively small residual. This effect is called masking. The same change in coefficients can push a clean observation away from the regression object, giving it a large residual. This effect is called swamping. A successful robust estimator should minimize these two effects in order to estimate the regression coefficients more precisely.
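
The effect of bad leverage points on a non-robust estimator can be tried out with base R alone. The sketch below is not the example from the galts package; the sample size, the seed and the way the observations are contaminated are all assumptions made for illustration. It fits ordinary least squares with lm to data whose true coefficients are all 5, once on clean data and once after moving x1 far away for ten observations.

```r
set.seed(123)                        # assumption: fixed seed for repeatability
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 5 + 5 * x1 + 5 * x2 + rnorm(n) # all true coefficients equal 5

clean.fit <- lm(y ~ x1 + x2)

# Contaminate the first 10 observations: shift x1 far from the bulk of the
# data and leave y unchanged, which turns them into bad leverage points
x1.bad <- x1
x1.bad[1:10] <- runif(10, min = 20, max = 30)
bad.fit <- lm(y ~ x1.bad + x2)

print(coef(clean.fit))               # expected to be close to 5
print(coef(bad.fit))                 # the x1.bad coefficient is pulled away from 5

# Masking: compare the absolute OLS residuals of the two groups
res <- abs(residuals(bad.fit))
print(c(contaminated = median(res[1:10]), clean = median(res[11:n])))
```

If the residuals of the contaminated group do not stand out from those of the clean group, outlier detection based on the least squares residuals would miss them, which is the situation a robust estimator is meant to handle.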

The medmad function in the R package galts can be used for robust estimation of regression coefficients. The package is hosted on CRAN and can be installed from the R console by typing


install.packages("galts")


Once the package is installed, it can be loaded by typing


require("galts")


after which its functions and help files are ready to use. Here is a complete example of generating regression data, contaminating some observations and estimating the robust regression coefficients:




The output is

(Intercept)          x1          x2 
   4.979828    4.993914    4.985901 

The estimated parameters are close to 5, the value used when generating the data. The medmad function returns in 0.25 seconds on an Intel i5 computer with 8 GB of RAM. The details of this algorithm can be found in the paper

Satman, Mehmet Hakan. "A New Algorithm for Detecting Outliers in Linear Regression." International Journal of Statistics and Probability 2.3 (2013): p101.

which is available at

http://www.ccsenet.org/journal/index.php/ijsp/article/view/28207

and

http://www.ccsenet.org/journal/index.php/ijsp/article/download/28207/17282


Have a nice detect!






Friday, November 7, 2014

403 Forbidden Error

Hello there!

In this article, I'll talk about the 403 Forbidden error on web sites.



Actually, solving this error is really easy. Imagine that you have a server and an FTP account, and you upload the files you want to publish. You know which files your visitors are requesting on your site, but somehow they cannot access them. This may have the following causes:

  • You don't have an index file in the root directory, for example index.html, index.php or default.php.

  • The directory permissions are wrong. If adding an index file does not solve the problem, run the following command in your Linux console:
  • $ chmod +x /[path]/
    

In short, either you forgot to upload your index file via FTP, or you need to change the permissions to executable.

See you later!

Friday, August 22, 2014

Javascript and Fuzuli Integration

JFuzuli, the Java implementation of the Fuzuli Programming Language, now supports limited Javascript integration.

JFuzuli currently supports passing Fuzuli variables to the Javascript environment, passing Javascript variables to the Fuzuli environment, and embedding Javascript code in any part of Fuzuli source code.

Full support, which is planned, will add the ability to call Fuzuli functions directly from within Javascript.

Here are some examples. This is the simplest one, demonstrating the basic usage of the Javascript support:



In the example above, the variable a is set to 10 in the Fuzuli part, incremented by 1 in the Javascript part and printed in the Fuzuli part again. In the end, the value of a is 11.





In the example above, the variable message is null in the Fuzuli section at the top and is first defined in the Javascript section, using the var keyword. Finally, back in the Fuzuli section, message is printed with the value that was set in the Javascript section.


The example above is more interesting, as it has a function which is written in the Fuzuli language but whose body is written in Javascript! In this example, the square function has a single parameter x. x is passed to the Javascript body, where the result is calculated. The value of result is then returned in Fuzuli. In the end, the Fuzuli function call (square 5) simply returns 25, which is calculated by Javascript.


Passing Arrays 

Because the list object in Fuzuli is simply a java.util.ArrayList, all public fields and methods of ArrayList are directly accessible in the Javascript section. Look at the example below. In this example, a list object is created with the values 1, 2 and 3, respectively. In the Javascript section, the list is cleared first and then 10 and 20 are added to it. Finally, in the Fuzuli section, the object is printed with only the values 10 and 20.


List objects can also be created directly in the Javascript section. Look at the example below. Since the JFuzuli interpreter uses the javax.script framework, a Java object can be created with the new keyword. The variable a is again a list object in the Fuzuli section, and the printed output includes the two values 10 and 20.



You can try similar examples using our online interpreter at

http://fuzuliproject.org/index.php?node=tryonline

Hope you have fun with Fuzuli...







Monday, July 28, 2014

Passing Fuzuli Expressions to Functions

Fuzuli Programming Language has many features borrowed from many popular languages such as C and Java as well as Lisp and Scheme.

In C and C++, a function pointer can be passed to a function, whereas in Java we must declare the structure of a function using interfaces to do the same job.

In Fuzuli, a piece of Fuzuli source code can be passed directly to a function. This feature allows us to create generic functions easily. Let's show it using an example.

The code below creates four expressions that add, subtract, multiply and divide two numbers, respectively.


(let expr1 (expression (+ a b)))
(let expr2 (expression (- a b)))
(let expr3 (expression (* a b)))
(let expr4 (expression (/ a b)))


The expression directive defines a piece of code that can be run later with the eval directive, as we will see. Let's define a generic function whose behaviour depends on an expression parameter:

(function generic_function (params e x y)
   (let a x)
   (let b y)
   (return (eval e))
)


The function generic_function takes three parameters. The first one defines the actual operation. x and y are the values that will be passed to the expression. Let's call this generic function using the previously defined expressions:

(let enter "\n")
(let x1 15)(let x2 5)
(print "x1=" x1 ", x2=" x2 enter)
(print "+ : " (generic_function expr1 x1 x2) enter)
(print "- : " (generic_function expr2 x1 x2) enter)
(print "* : " (generic_function expr3 x1 x2) enter)
(print "/ : " (generic_function expr4 x1 x2) enter)

In the first line, we define the enter variable for printing output with a line feed. In the second line, we set x1 to 15 and x2 to 5. In the third line, we report the values of these variables.

The whole story lies in the last four lines. In line four, we call generic_function with the predefined summation expression. In the next line, the same function is called with a different expression, which calculates x1 - x2. Likewise, the last two lines call the same generic function with two other expressions to get the product and the quotient of the two numbers, respectively.

The output is:

x1=15.0, x2=5.0
+ : 20.0       
- : 10.0       
* : 75.0       
/ : 3.0        



Happy readings...


Notes:

You can try this code using the online interpreter: http://fuzuliproject.org/index.php?node=tryonline
or you can download the JFuzuli Editor: http://mhsatman.com/fuzuli-programming-language-facebook-face/


Sunday, July 27, 2014

Fuzuli Programming Language and Editor

Our programming language, Fuzuli, now has a new interpreter written in Java which is officially called JFuzuli.

You can try it online at site http://fuzuliproject.org/index.php?node=tryonline

We also have our first JFuzuli Editor ready for downloading at https://drive.google.com/file/d/0B-sn_YiTiFLGRHdVSUQ2cFZyT0U/edit?usp=sharing

Please do not hesitate to share your thoughts about the language and the interpreter.

You can also visit the Facebook page, which is aimed at Turkish users, at https://www.facebook.com/FuzuliProgramlamaDiliVeYorumlayici?ref=hl







Monday, June 16, 2014

RCaller 2.4 has just been released

The key properties of this release:
  • Added the deleteTempFiles() method to class RCaller for deleting, at any time, the temporary files created by RCaller.
  • runiversal.r is now more compact.
  • The stopRCallerOnline() method in class RCaller now stops the R instances in memory that were created by runAndReturnResultOnline().
The next release, 2.5, is planned for 15th July 2014.


Stay informed via the official blog: http://stdioe.blogspot.com.tr/search/label/rcaller

 Download page: https://drive.google.com/?authuser=0#folders/0B-sn_YiTiFLGZUt6d3gteVdjTGM

 Source code: https://code.google.com/p/rcaller/

 Home page: http://mhsatman.com/tag/rcaller/

 Journal Documentation: http://www.sciencedomain.org/abstract.php?iid=550&id=6&aid=4838#.U59D8_mSy1Y

Friday, June 13, 2014

R GUI written in Java using RCaller


This video demonstrates how the Java version of R GUI based on RCaller is now faster after the speed improvements. This simple gui is available in the source tree. Typed commands are passed to R using the online call mechanism of RCaller and there is a single active R process at the background. 

Please follow the rcaller label on this blog to get the latest RCaller news, updates, examples and other materials.

Enjoy watching!



Scholarly papers, projects and theses that cite RCaller

RCaller is now in its 4th year, at version 2.3, and is considerably mature. It is used in many commercial projects as well as in scholarly papers and theses. Here is the list of scholarly papers, projects and theses that I stumbled upon in Google Scholar.




 
  • Niya Wang, Fan Meng, Li Chen, Subha Madhavan, Robert Clarke, Eric P. Hoffman, Jianhua Xuan, and Yue Wang. 2013. The CAM software for nonnegative blind source separation in R-Java. J. Mach. Learn. Res. 14, 1 (January 2013), 2899-2903. http://dl.acm.org/citation.cfm?id=2567753
 
  • Meng, Fan. Design and Implementation of Convex Analysis of Mixtures Software Suite, Master's Thesis, 2012. Abstract: Various convex analysis of mixtures (CAM) based algorithms have been developed to address real world blind source separation (BSS) problems and proven to have good performances in previous papers. This thesis reported the implementation of a comprehensive software CAM-Java, which contains three different CAM based algorithms, CAM compartment modeling (CAM-CM), CAM non-negative independent component analysis (CAM-nICA), and CAM non-negative well-grounded component analysis (CAM-nWCA). The implementation works include: translation of MATLAB coded algorithms to open-sourced R alternatives. As well as building a user friendly graphic user interface (GUI) to integrate three algorithms together, which is accomplished by adopting Java Swing API. In order to combine R and Java coded modules, an open-sourced project RCaller is used to handle the establishment of low level connection between R and Java environment. In addition, specific R scripts and Java classes are also implemented to accomplish the tasks of passing parameters and input data from Java to R, run R scripts in Java environment, read R results back to Java, display R generated figures, and so on. Furthermore, system stream redirection and multi-threads techniques are used to build a simple R messages displaying window in Java built GUI. The final version of the software runs smoothly and stable, and the CAM-CM results on both simulated and real DCE-MRI data are quite close to the original MATLAB version algorithms. The whole GUI based open-sourced software is easy to use, and can be freely distributed among the communities. Technical details in both R and Java modules implementation are also discussed, which presents some good examples of how to develop software with both complicate and up to date algorithms, as well as decent and user friendly GUI in the scientific or engineering research fields. http://scholar.lib.vt.edu/theses/available/etd-08202012-162249/
 
  • Emanuel Gonçalves, Julio Saez-Rodriguez. Cyrface: An interface from Cytoscape to R that provides a user interface to R packages, F1000Research 2013, 2:192 Last updated: 20 JAN 2014, http://f1000research.com/articles/2-192/v1/pdf
 
 
 
 
   

Monday, June 9, 2014

WhatsApp update 2.11.238

WhatsApp, the mobile application that is widely used all around the world, keeps growing its user base with new features, especially since Facebook bought it.

After the latest update, 2.11.238, WhatsApp has these new features:



  • Set alerts to show/hide/silent in group messages.
  • Slovene and Azerbaijani language support.
  • Removed bugs on voice messages.
  • Option for deleting additional files when deleting messages.
  • An icon with the number of unread messages on it added to main screen for Samsung devices.
Source: http://www.phpservisi.com


New Documentation for RCaller

As new documentation and a brief introduction, the research paper "RCaller: A Software Library for Calling R from Java" has just been published in the scholarly journal "British Journal of Mathematics and Computer Science".

The aim of this paper is to give a brief introduction to RCaller and to show how to use it in relatively small projects: calling R scripts and commands from Java, generating plots and images, running commands online, and converting and sending plain Java objects to R.

Two other important projects, rJava and Rserve, are compared to RCaller in terms of time efficiency. The comparison shows that rJava and Rserve outperform RCaller in running time, but RCaller seems to be easier to learn and requires less setting-up effort.

The paper is freely available for downloading at

 http://www.sciencedomain.org/abstract.php?iid=550&id=6&aid=4838#.U5VvkPl_t2M

and the author's page is

http://mhsatman.com/research-paper-rcaller-a-software-library-for-calling-r-from-java/ .

Have a nice read!






Thursday, May 15, 2014

New Release: RCaller 2.3.0

A new version of RCaller has just been uploaded to the Google Drive repository.

The new version includes basic bug fixes, new test files and speed enhancements.

The XML file structure is now smaller in size, which makes RCaller a little bit faster than the older versions.

The most important addition in this release is the method

public int[] getDimensions(String name)

which reports the dimensions of a given object with 'name'. Here is an example:

int n = 21;
int m = 23;
double[][] data = new double[n][m];
for (int i = 0; i < data.length; i++) {
    for (int j = 0; j < data[0].length; j++) {
        data[i][j] = Math.random();
    }
}
RCaller caller = new RCaller();
Globals.detect_current_rscript();
caller.setRscriptExecutable(Globals.Rscript_current);

RCode code = new RCode();
code.addDoubleMatrix("x", data);
caller.setRCode(code);

caller.runAndReturnResult("x");

int[] mydim = caller.getParser().getDimensions("x");

Assert.assertEquals(n, mydim[0]);
Assert.assertEquals(m, mydim[1]);

In the code above, a matrix with 21 rows and 23 columns is passed to R and read back into Java. The variable mydim holds the number of rows and columns, which are 21 and 23 as expected.

Please use the download link

https://drive.google.com/?tab=mo&authuser=0#folders/0B-sn_YiTiFLGZUt6d3gteVdjTGM

to access compiled jar files of RCaller.

Good luck!