Friday, February 27, 2015

Frequency table of characters in a string in R

R's string manipulation functions include splitting a string. Splitting a string into a vector of its characters and tabulating their frequencies could be useful, for example, in a language detection system.

Here is an example using a text taken from the Oracle - History of Java site. The code below defines a long string, which is then split into its characters. After the frequency of each single character (including digits, commas and dots) is calculated, a histogram is saved to a file.






# Defining string 
s <- "Since 1995, Java has changed our world and our expectations. Today, with technology such a part of our daily lives, we take it for granted that we can be connected and access applications and content anywhere, anytime. Because of Java, we expect digital devices to be smarter, more functional, and way more entertaining. In the early 90s, extending the power of network computing to the activities of everyday life was a radical vision. In 1991, a small group of Sun engineers called the \"Green Team\" believed that the next wave in computing was the union of digital consumer devices and computers. Led by James Gosling, the team worked around the clock and created the programming language that would revolutionize our world – Java. The Green Team demonstrated their new language with an interactive, handheld home-entertainment controller that was originally targeted at the digital cable television industry. Unfortunately, the concept was much too advanced for the them at the time. But it was just right for the Internet, which was just starting to take off. In 1995, the team announced that the Netscape Navigator Internet browser would incorporate Java technology.Today, Java not only permeates the Internet, but also is the invisible force behind many of the applications and devices that power our day-to-day lives. From mobile phones to handheld devices, games and navigation systems to e-business solutions, Java is everywhere!"

# First converting to lower case
# then splitting by characters.
# strsplit returns a list, so we unlist it to get a vector.
chars <- unlist(strsplit(tolower(s), ""))

# Generating frequency table
freqs <- table(chars)

# Generating plot into a file
png("Graph.png")
hist(freqs,include.lowest=TRUE, breaks=46,freq=TRUE,labels=rownames(freqs))
dev.off()



The generated output is


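Note that hist() applied to freqs bins the counts themselves, so the plot shows how many characters share each count rather than one bar per character. If a per-character chart is what is wanted, barplot() may be a closer fit. The following is a minimal sketch under that assumption; the short sample string and the file name Barplot.png are made up for illustration.

```r
# Short sample text (hypothetical, for illustration only)
s <- "Java is everywhere!"

# Lower-case the text, split it into characters and count them
chars <- unlist(strsplit(tolower(s), ""))
freqs <- table(chars)

# Sort so that the most frequent characters come first
freqs_sorted <- sort(freqs, decreasing = TRUE)

# One bar per character, written to a file
png("Barplot.png")
barplot(freqs_sorted, xlab = "Character", ylab = "Count")
dev.off()
```

The same call works on the full English or Spanish string; only the value of s changes.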
We translated the text used in our example into Spanish using Google Translate. The code is shown below:

# Defining string 
s <- "Desde 1995, Java ha cambiado nuestro mundo y nuestras expectativas. Hoy en día, con la tecnología de una parte de nuestra vida cotidiana tal, damos por sentado que se puede conectar y acceder a las aplicaciones y contenido en cualquier lugar ya cualquier hora. Debido a Java, esperamos que los dispositivos digitales para ser más inteligente, más funcional, y de manera más entretenida. A principios de los años 90, que se extiende el poder de la computación en red para las actividades de la vida cotidiana era una visión radical. En 1991, un pequeño grupo de ingenieros de Sun llamado \"Green Team\" cree que la próxima ola de la informática fue la unión de los dispositivos digitales de consumo y ordenadores. Dirigido por James Gosling, el equipo trabajó durante todo el día y creó el lenguaje de programación que revolucionaría el mundo - Java. El Equipo Verde demostró su nuevo idioma con una mano controlador interactivo, el entretenimiento en casa que fue dirigido originalmente a la industria de la televisión digital por cable. Por desgracia, el concepto fue demasiado avanzado para el ellos en el momento. Pero fue justo para Internet, que estaba empezando a despegar. En 1995, el equipo anunció que el navegador de Internet Netscape Navigator incorporaría Java technology.Today, Java no sólo impregna el Internet, pero también es la fuerza invisible detrás de muchas de las aplicaciones y dispositivos que alimentan nuestra vida del día a día. Desde teléfonos móviles para dispositivos de mano, juegos y sistemas de navegación para e-business soluciones, Java está en todas partes!"

# First converting to lower case
# then splitting by characters.
# strsplit returns a list, so we unlist it to get a vector.
chars <- unlist(strsplit(tolower(s), ""))

# Generating frequency table
freqs <- table(chars)

# Generating plot into a file
png("Graph.png")
hist(freqs,include.lowest=TRUE, breaks=46,freq=TRUE,labels=rownames(freqs))
dev.off()


The generated plot for the Spanish translation is:


Note that some characters, such as spaces, dots and commas, can be removed from the string s using code similar to this:

news <- gsub(pattern=" ", replacement="", x=s)

The code above removes all spaces from the string s, and a new string variable news holds the modified string. The original variable remains unchanged.
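
To drop several kinds of characters in one call, the pattern can be a regular-expression character class. This is a small sketch; the sample sentence and the variable names no_punct and letters_only are only for illustration.

```r
# Sample text (a fragment of the English example)
s <- "Since 1995, Java has changed our world."

# Character class: remove every space, dot or comma
no_punct <- gsub(pattern = "[ .,]", replacement = "", x = s)

# POSIX class: remove everything that is not a letter (digits included)
letters_only <- gsub(pattern = "[^[:alpha:]]", replacement = "", x = s)

print(no_punct)
print(letters_only)
```

Either result can then be passed to strsplit() and table() exactly as before, so that the frequency table only counts the characters of interest.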


Have a nice read!



Environments in R

Environment is a specific term in R, but the underlying concept is used by the interpreters of many programming languages. Variable scope is directly related to environments. An environment in R encapsulates a set of variables or objects, and an environment created at the top level is itself enclosed by a special environment called the global environment.

When a variable is assigned outside of any function or class, it is held by the global environment by default.

Suppose we set t to 10 by

t <- 10

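At the top level, this has the same effect as calling assign() on the global environment. For example, the following call overwrites t with 12: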


assign(x="t", value=12, envir=.GlobalEnv)


and the value of t is now 12:


> t <- 10
> assign(x="t", value=12, envir=.GlobalEnv)
> t
[1] 12




Instead of using the global environment, we can create new environments and attach them to parent environments. Suppose we create a new environment as below:


> my.env <- new.env()
> assign(x="t", value="20", envir=my.env)
> t
[1] 12


The value of t is still 12 because we created another variable named t, which is encapsulated by the environment my.env.


Variables in environments are accessible through the assign and get functions.


> get(x="t", envir=my.env)
[1] "20"
> get(x="t", envir=.GlobalEnv)
[1] 12


As we can see, the variables with the same name have different values because they are encapsulated by separate environments.

The exists() function returns TRUE if an object exists in an environment and FALSE otherwise. Checking whether an object exists is crucial in some cases.

> exists(x="t", envir=.GlobalEnv)
[1] TRUE
> exists(x="t", envir=my.env)
[1] TRUE
> exists(x="a", envir=my.env)
[1] FALSE
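
One detail worth keeping in mind: by default exists() and get() also search the enclosing (parent) environments, so a TRUE result does not necessarily mean the object is defined in that environment itself. The sketch below illustrates this with the inherits argument; the names x_top and child are hypothetical.

```r
# Defined in the calling environment (the global one when run at top level)
x_top <- 1

# new.env() uses the calling environment as its parent by default
child <- new.env()
assign("y", 2, envir = child)

# Found through the parent environment
found_via_parent <- exists("x_top", envir = child)

# Not defined in child itself
found_locally <- exists("x_top", envir = child, inherits = FALSE)

# y is defined directly in child
y_found_locally <- exists("y", envir = child, inherits = FALSE)

print(c(found_via_parent, found_locally, y_found_locally))
```

Passing inherits = FALSE is the safer choice when the question is whether a particular environment holds the object.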


The is.environment() function returns TRUE if an object is an environment and FALSE otherwise.

> is.environment(.GlobalEnv)
[1] TRUE
> is.environment(my.env)
[1] TRUE
> is.environment(t)
[1] FALSE



Finally, environments behave much like named lists, and a list can easily be converted to an environment.

> my.list <- list (a=3, b=7)
> my.env <- as.environment(my.list)
> get("a", envir=my.env)
[1] 3
> get("b", envir=my.env)
[1] 7



The inverse process is converting an environment to a list: 


> as.list(.GlobalEnv)
$t
[1] 12

$my.env

$my.list
$my.list$a
[1] 3

$my.list$b
[1] 7
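
Converting the whole global environment returns every object defined so far, and the order of the elements is not guaranteed. A smaller sketch, using a hypothetical environment e, shows the conversion for a single environment and one way to get a predictable order.

```r
# A small environment with two variables (names are illustrative)
e <- new.env()
assign("a", 3, envir = e)
assign("b", 7, envir = e)

# ls() returns the variable names in sorted order
var_names <- ls(e)

# as.list() gives a named list; reorder it by name for a stable result
lst <- as.list(e)
lst <- lst[order(names(lst))]

print(var_names)
print(lst)
```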



Happy R days :)











Wednesday, February 18, 2015

Fast and robust estimation of regression coefficients with R

Outliers are aberrant observations that do not fit the rest of the data well. In regression analysis, an observation is said to be an outlier if it is distant from the unknown regression object (a line in two-dimensional space, a plane in three-dimensional space, a hyperplane in higher dimensions, etc.). An observation that is distant from the rest of the data in its independent variables is called a leverage point. If a leverage point also lies far from the regression object, it is called a bad leverage point. If an observation is distant only in its dependent variable, it is said to be a regression outlier. A leverage point that lies close to the regression object is a good leverage point, which generally reduces the standard errors of the estimates. Bad leverage points may cause large differences in the estimated coefficients, and they are regarded as more dangerous in the statistics literature.

Since an outlier may change the partial regression coefficients, examining the residuals of a non-robust estimator can lead to wrong conclusions. An outlier may change one or more regression coefficients and hide itself behind a relatively small residual. This effect is called masking. The same change in coefficients can move a clean observation away from the regression object and give it a larger residual. This effect is called swamping. A successful robust estimator should minimize these two effects in order to estimate the regression coefficients more precisely.
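
To make the effect on a non-robust estimator concrete, here is a small sketch using ordinary least squares via lm() from base R only. It is not the robust method discussed in this post; the sample size, the number of contaminated points and the shifted value 100 are arbitrary assumptions chosen for illustration.

```r
set.seed(1)
n <- 100

# Clean data: all true coefficients are 5
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 5 + 5 * x1 + 5 * x2 + rnorm(n)
clean_fit <- lm(y ~ x1 + x2)

# Contaminate: move the first 20 values of x1 far away, leave y unchanged,
# which turns these observations into bad leverage points
x1_bad <- x1
x1_bad[1:20] <- 100
bad_fit <- lm(y ~ x1_bad + x2)

# Compare the estimated coefficients
print(coef(clean_fit))
print(coef(bad_fit))

# Absolute residuals of contaminated vs. clean observations
print(summary(abs(residuals(bad_fit))[1:20]))
print(summary(abs(residuals(bad_fit))[21:n]))
```

In a setup like this, the least squares fit is pulled toward the shifted points, so the coefficient of the contaminated variable moves far from 5, while the residuals of the contaminated points need not stand out from the others.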

The medmad function in the R package galts can be used for robust estimation of regression coefficients. The package is hosted on CRAN and can be installed from the R console by typing


install.packages("galts")


Once the package is installed, its content can be used by typing


require("galts")


and the functions and help files are ready to use after pressing the Enter key. Here is a complete example of generating regression data, contaminating some observations and estimating the robust regression coefficients:




The output is

(Intercept)          x1          x2 
4.979828          4.993914    4.985901 

The estimated parameters are all close to 5, the value used when generating the data. The medmad function returns in 0.25 seconds on an Intel i5 computer with 8 GB of RAM installed.


The details of this algorithm can be found in the paper

Satman, Mehmet Hakan. "A New Algorithm for Detecting Outliers in Linear Regression." International Journal of Statistics and Probability 2.3 (2013): p101.

which is available at

http://www.ccsenet.org/journal/index.php/ijsp/article/view/28207

and

http://www.ccsenet.org/journal/index.php/ijsp/article/download/28207/17282


Have a nice detect!






Friday, November 7, 2014

403 Forbidden Error

Hello there!

In this article, I'll talk about the 403 Forbidden error on web sites.



Actually, solving this error is really easy. Suppose you have a server and an FTP account, so you can upload the files you want to publish. You may know which files your visitors request on your site, but somehow they cannot access them. The likely reasons are the following:

  • You don't have any index file in the root directory, for example index.html, index.php or default.php. Upload one to fix this.

  • The directory is not executable. If adding an index file does not solve the problem, run the following command in your Linux console:
    $ chmod +x /[path]/
    

In short, either you have forgotten to upload your index file to the server, or you need to change the directory permissions to executable.

See you later!

Friday, August 22, 2014

Javascript and Fuzuli Integration

JFuzuli, the Java implementation of the Fuzuli programming language, now supports limited Javascript integration.

JFuzuli currently supports passing Fuzuli variables to the Javascript environment, passing Javascript variables to the Fuzuli environment, and embedding Javascript code in any part of Fuzuli source code.

Full support is planned to include the ability to call Fuzuli functions directly from within Javascript.

Here are some examples. The first is the simplest one and demonstrates the basic usage of the Javascript support:



In the example above, the variable a is set to 10 in the Fuzuli part, incremented by 1 in the Javascript part, and printed in the Fuzuli part again. In the end, the value of a is 11.





In the example above, the variable message is null in the Fuzuli section at the top and is first defined in the Javascript section, using the var keyword. Finally, back in the Fuzuli section, message is printed with the value that was set in the Javascript section.


The example above is more interesting, as it has a function which is written in the Fuzuli language but whose body is written in Javascript! In this example, the square function has a single parameter x. x is passed to the Javascript body, where the result is calculated. The value of result is then returned in Fuzuli. At the end, the Fuzuli function call (square 5) simply returns 25, which is calculated by Javascript.


Passing Arrays 

Because the list object in Fuzuli is simply a java.util.ArrayList, all public fields and methods of ArrayList are directly accessible in the Javascript section. Look at the example below. In this example, a list object is created with the values 1, 2 and 3, respectively. In the Javascript section, the list is cleared first and then 10 and 20 are added to it. Finally, in the Fuzuli section, the object is printed with only the values 10 and 20.


List objects can also be created directly in the Javascript section. Look at the example below. Since the JFuzuli interpreter uses the javax.script framework, a Java object can be created with the new keyword. The variable a is again a list object in the Fuzuli section, and the printed output includes the two values 10 and 20.



You can try similar examples using our online interpreter at

http://fuzuliproject.org/index.php?node=tryonline

Hope you have fun with Fuzuli...