Suppose you have a pdf file with many pages and you want to delete a single page (or a list of pages) from this file in Ubuntu. You can first install the pdftk package using Ubuntu's package manager or by simply typing
sudo apt-get install pdftk
in the command line.
Now suppose you want the 4th page excluded from the PDF. You can type
pdftk old.pdf cat 1-3 5-end output new.pdf
where old.pdf is the original PDF and new.pdf is the new file with the 4th page excluded.
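To drop a list of pages instead of a single one, list every range you want to keep. For example (the page numbers here are just for illustration, not from the original post), to drop pages 4 and 7:

```shell
# keep pages 1-3, 5-6, and 8 through the end; pages 4 and 7 are dropped
pdftk old.pdf cat 1-3 5-6 8-end output new.pdf
```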
Hope this helps.
Sunday, September 6, 2015
Wednesday, September 2, 2015
Parallel Numeric Integration using Python Multiprocessing Library
Parallel algorithms divide a problem across several cores. Some problems, such as matrix calculations and numerical function integration, divide easily into smaller ones because the divided parts are independent of each other: summing the numerical integrals of a function over a < x < b and b < x < c equals integrating the same function over a < x < c.
In Python, the Thread class can be extended by a user-defined class for multi-threaded applications. However, because of the Global Interpreter Lock, a multi-threaded solution can take even longer, no matter how many cores the CPU has. The solution is the multiprocessing library.
Suppose that we have the standard normal distribution function coded in Python:
import math

def normal(x):
    return (1 / math.sqrt(2 * math.pi)) * math.exp(-0.5 * math.pow(x, 2))
Now we define an integrator function that takes the function, the bounds of the domain and a Queue object for storing the result as parameters:
def Integrator(func, a, b, lstresults):
    epsilon = 0.00001
    mysum = 0.0
    i = float(a)
    while i < b:
        mysum += func(i) * epsilon
        i += epsilon
    lstresults.put(mysum)
The function Integrator simply integrates the given function from a to b. It uses a single core even if the computer has more. However, the operation can be divided into several smaller parts. Now we define a MultipleIntegrator class to distribute the operation into n parts, where n is a user-defined integer, typically the number of worker processes.
class MultipleIntegrator:
    def do(self, func, a, b, cores):
        self.cores = cores
        integrationrange = (float(b) - float(a)) / float(cores)
        self.integrators = [None] * cores
        mya = float(a)
        allresults = Queue()
        result = 0.0
        for i in range(cores):
            self.integrators[i] = Process(target=Integrator,
                                          args=(func, mya, mya + integrationrange, allresults))
            self.integrators[i].start()
            mya += integrationrange
        for myIntegrator in self.integrators:
            myIntegrator.join()
        while not allresults.empty():
            result += allresults.get()
        return result
When we instantiate this class and call the do function on a user-defined function, the integration of func is shared across several cores.
Let's integrate the normal function from -10 to 10 using 2 processes:
m = MultipleIntegrator()
print(m.do(normal, -10, 10,2))
The result and the computation time are:
1.0000039893377686
real 0m1.762s
user 0m3.384s
sys 0m0.024s
If we increase the number of processes to 4:
m = MultipleIntegrator()
print(m.do(normal, -10, 10,4))
The computation time is reported as:
1.0000039894435513
real 0m1.364s
user 0m5.056s
sys 0m0.028s
which shows that the elapsed time is reduced by multi-processing. The whole example is given below:
from multiprocessing import Process, Queue
import math

def normal(x):
    return (1 / math.sqrt(2 * math.pi)) * math.exp(-0.5 * math.pow(x, 2))

def Integrator(func, a, b, lstresults):
    epsilon = 0.00001
    mysum = 0.0
    i = float(a)
    while i < b:
        mysum += func(i) * epsilon
        i += epsilon
    lstresults.put(mysum)

class MultipleIntegrator:
    def do(self, func, a, b, cores):
        self.cores = cores
        integrationrange = (float(b) - float(a)) / float(cores)
        self.integrators = [None] * cores
        mya = float(a)
        allresults = Queue()
        result = 0.0
        for i in range(cores):
            self.integrators[i] = Process(target=Integrator,
                                          args=(func, mya, mya + integrationrange, allresults))
            self.integrators[i].start()
            mya += integrationrange
        for myIntegrator in self.integrators:
            myIntegrator.join()
        while not allresults.empty():
            result += allresults.get()
        return result

# The guard keeps child processes from re-running the top-level code
# on platforms that spawn rather than fork.
if __name__ == "__main__":
    m = MultipleIntegrator()
    print(m.do(normal, -10, 10, 4))
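As an aside, the same fan-out/fan-in pattern can be written more compactly with multiprocessing.Pool. The sketch below is not from the original post; it uses a coarser step size (epsilon = 0.001) so it runs quickly, and the function and parameter names are mine:

```python
import math
from multiprocessing import Pool

def normal(x):
    return (1 / math.sqrt(2 * math.pi)) * math.exp(-0.5 * x * x)

def integrate_chunk(bounds, epsilon=0.001):
    # Left Riemann sum over [a, b), the same rule as Integrator above.
    a, b = bounds
    total, i = 0.0, a
    while i < b:
        total += normal(i) * epsilon
        i += epsilon
    return total

def integrate(a, b, workers=4):
    step = (b - a) / workers
    chunks = [(a + k * step, a + (k + 1) * step) for k in range(workers)]
    with Pool(workers) as pool:          # fan out one chunk per worker
        return sum(pool.map(integrate_chunk, chunks))

if __name__ == "__main__":
    print(integrate(-10.0, 10.0))  # approximately 1.0
```

Pool handles process creation, result collection, and joining, so no explicit Queue is needed.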
Note that the standard normal function is a probability density function, so its integral cannot exceed 1. The results above also contain floating-point rounding error; corrections could be added to Integrator to reduce rounding issues.
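One possible refinement (a sketch of mine, not from the original post): evaluating the function at the midpoint of each slice instead of the left endpoint reduces the bias of the Riemann sum for the same number of evaluations:

```python
import math

def normal(x):
    return (1 / math.sqrt(2 * math.pi)) * math.exp(-0.5 * x * x)

def midpoint_integrate(func, a, b, n=20000):
    # Midpoint rule: evaluate func at the center of each of n equal slices.
    h = (float(b) - float(a)) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

print(midpoint_integrate(normal, -10.0, 10.0))  # close to 1.0
```

The midpoint rule's error shrinks quadratically with the step size, so the same accuracy can be reached with far fewer slices than the left-endpoint sum uses.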
Hope you have fun!
Sunday, August 30, 2015
Overloading Constructor in Python Classes
In Python, the constructor __init__ cannot be overloaded. If you are a Java or C++ programmer, you have probably used this facility before, since even the standard libraries of these languages use function overloading.
Although the language does not support constructor overloading, we can follow the factory design pattern to get multiple constructors in Python. In the factory design pattern, we define static class members that create new instances with the given parameters. In Python, if a function is labeled with @classmethod, it belongs to the class rather than to an object, similar to static methods in Java.
Now suppose we want to write a Python class with three constructors. The first constructor sets param1 to a specific value and the others to zero; the second sets param2 and zeroes the others; and so on.
class Clazz:
    def __init__(self):
        "please use create1, create2 or create3"

    @classmethod
    def create1(cls, param1):
        c = cls()
        c.param1 = param1
        c.param2 = 0
        c.param3 = 0
        return c

    @classmethod
    def create2(cls, param2):
        c = cls()
        c.param1 = 0
        c.param2 = param2
        c.param3 = 0
        return c

    @classmethod
    def create3(cls, param3):
        c = cls()
        c.param1 = 0
        c.param2 = 0
        c.param3 = param3
        return c

    def dump(self):
        print("Param1 = %d, Param2 = %d, Param3 = %d" %
              (self.param1, self.param2, self.param3))
Of course, we don't need to set the other parameters to zero in every static method. Let's make the code more compact:
class Clazz:
    param1 = 0
    param2 = 0
    param3 = 0

    def __init__(self):
        "please use create1, create2 or create3"

    @classmethod
    def create1(cls, param1):
        c = cls()
        c.param1 = param1
        return c

    @classmethod
    def create2(cls, param2):
        c = cls()
        c.param2 = param2
        return c

    @classmethod
    def create3(cls, param3):
        c = cls()
        c.param3 = param3
        return c

    def dump(self):
        print("Param1 = %d, Param2 = %d, Param3 = %d" %
              (self.param1, self.param2, self.param3))
And now, we can instantiate some objects from this class using different factory methods:
myc1 = Clazz.create1(5)
myc1.dump()
myc2 = Clazz.create2(10)
myc2.dump()
myc3 = Clazz.create3(50)
myc3.dump()
myc4 = Clazz()
myc4.dump()
myc1 calls the first static factory method. As expected, only the value of the first parameter is changed. The following calls set only the other parameters. The output is
Param1 = 5, Param2 = 0, Param3 = 0
Param1 = 0, Param2 = 10, Param3 = 0
Param1 = 0, Param2 = 0, Param3 = 50
Param1 = 0, Param2 = 0, Param3 = 0
As seen in the last line of the output, the object myc4 is created using the __init__ constructor, and all the parameters have the value zero.
Hope you have fun!
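For completeness, the most common alternative in Python (a sketch of mine, not from the original post) is a single __init__ with keyword defaults, which covers all three cases without any factory methods:

```python
class Clazz:
    # Keyword defaults let one __init__ play the role of several constructors.
    def __init__(self, param1=0, param2=0, param3=0):
        self.param1 = param1
        self.param2 = param2
        self.param3 = param3

    def dump(self):
        print("Param1 = %d, Param2 = %d, Param3 = %d" %
              (self.param1, self.param2, self.param3))

myc2 = Clazz(param2=10)
myc2.dump()  # Param1 = 0, Param2 = 10, Param3 = 0
```

The factory-method form still has its place when the constructors differ in behavior, not just in which field they set.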
Tuesday, August 25, 2015
Lisp for the C++ programmer: Numerical Integration
In the Lisp for the C++ programmer series, we present Common Lisp examples alongside equivalent examples written in C++ or other languages from the same family tree as C++.
Here is an example of a Riemann sum in Common Lisp. We first define a function f that takes a single parameter x; for simplicity, y = f(x) = x. We will change this function later. The integral of this function from 0 to 1 is 0.5.
The Common Lisp code for numerical integration (an approximate result) is:
(defun f (x) x)

(defun integrate (f start stop)
  (let ((epsilon 0.0001)
        (sum 0.0))
    (loop for i from start to stop by epsilon do
      (setq sum (+ sum (* (funcall f i) epsilon))))
    sum))

(print (integrate 'f 0 1))
The result is 0.4999532. Now we can use a more complex function, for example a normal distribution function with zero mean and unit variance. This function can be defined in Common Lisp as
(defun normal (x)
  (* (/ 1 (sqrt (* 2 3.141592))) (exp (* -0.5 (* x x)))))

(defun integrate (f start stop)
  (let ((epsilon 0.0001)
        (sum 0.0))
    (loop for i from start to stop by epsilon do
      (setq sum (+ sum (* (funcall f i) epsilon))))
    sum))

(print (integrate 'normal -1 1))
(print (integrate 'normal -1 1))
In the code above, we integrate the standard normal distribution from -1 to 1, and the result is 0.6826645.
Lisp for the C++ programmer: Changing an element of a list
Here is an example of changing an element of a Common Lisp list; the equivalent C++ code is included as comments.
; #include <cstdlib>
; #include <cstdio>
;
; int main(){
; double *d = (double*) malloc (sizeof(double) * 3);
; d[0] = 1;
; d[1] = 2;
; d[2] = 4;
;
; puts("Current List:");
; printf("%f %f %f\n", d[0], d[1], d[2]);
;
; puts("New List:");
; d[0] = 100;
; printf("%f %f %f\n", d[0], d[1], d[2]);
; return(0);
;}
(setq aList (list 1 2 3))
(print "Current List:")
(print aList)
(setf (elt aList 0) 100)
(print "New List:")
(print aList)
Lisp for the C++ programmer: for loop
Here is an example of the for loop. The equivalent C++ code appears above the Common Lisp code as comments.
; for (int i = 0; i <= 10; i++){
; printf("%f\n", i);
; printf("%f\n", i * 2);
; }
(loop for i from 0 to 10
do
(progn
(print i)
(print (* i 2))
)
)
Lisp for the C++ programmer: Function definitions and function calls
Here is a simple Common Lisp example in which sum and ArithmeticMean functions are defined. Both the C++ and the Common Lisp code calculate the arithmetic mean of 1, 2, 3, 4, 5 and 6, which is 3.5. The C++ code appears as comments in the Common Lisp file.
; double sum (double *d, int len){
; double mysum = 0.0;
; for (int i = 0; i < len; i++){
; mysum += d[i];
; }
; return(mysum);
; }
;
; double ArithmeticMean (double *d, int len){
;     return ( sum(d, len) / len );
; }
;
; int main(){
;     double *mylist = (double*) malloc(sizeof(double) * 6);
;     for (int i = 0; i < 6; i++){
;         mylist[i] = (double)(i + 1);
;     }
;     printf("%f\n", ArithmeticMean(mylist, 6));
;     return(0);
; }
(defun sum (aList)
  (let ((mysum 0.0))
    (dotimes (i (length aList))
      (setq mysum (+ mysum (nth i aList))))
    mysum))
(defun ArithmeticMean (aList)
(/ (sum aList) (length aList))
)
(print (ArithmeticMean '(1 2 3 4 5 6)))
Lisp for the C++ programmer: cond expression
Here is the example for the cond expression of Common Lisp and its C++ equivalent.
; int x = 10;
; if (x < 10) {
; puts("x is smaller than 10");
; } else if (x > 10){
; puts("x is bigger than 10");
; } else if (x == 10){
; puts("x equals to 10");
; }
(setq x 10)
(cond
((< x 10) (print "x is smaller than 10"))
((> x 10) (print "x is bigger than 10"))
((= x 10) (print "x equals to 10"))
)
Wednesday, March 25, 2015
Singleton Design Pattern in PHP Example
Singleton Pattern:
//singleton design pattern
class DB {
    private static $status = NULL;
    private static $info = 'mysql:host=localhost;dbname=dbname';
    private static $username = 'username';
    private static $password = 'password';

    private function __construct() { }

    public static function connect() {
        if (!self::$status) {
            try {
                self::$status = new PDO(self::$info, self::$username, self::$password);
                self::$status->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
                self::$status->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
            } catch (PDOException $e) {
                throw new Exception($e->GetMessage());
            }
        }
        return self::$status;
    }

    public function __destruct() {
        self::$status = NULL;
    }
}
First, we define $status as a static variable, because we want to track whether a connection exists through a single variable. After that, we define the database information as private static variables. Finally, we create our static connection function, connect().
It works as follows: if $status is NULL, the code tries to open a PDO connection and stores it; otherwise a connection already exists, and the existing PDO object, not NULL, is returned.
Finally, in destruct function, connection has been closed by null value. When we want to connect database, we call the singleton like:
DB::connect();
If the class has any other function, it can be called like:
DB::connect()->functionName($param);
See you!
Labels
design pattern,
php,
singleton
Sunday, March 22, 2015
Introduction to Fuzuli : JFuzuli REPL
Let's try JFuzuli:
1. Download the Jar
The current compiled jar of JFuzuli interpreter is release candidate 1.0. You can download it using the link https://github.com/jbytecode/fuzuli/releases/tag/v1.0_release_candidate. You can always find the newest releases in site JFuzuli Releases.
2. Open the Command Prompt
After downloading the jar file, open your operating system's command prompt and change to the jar file's directory using the cd (change directory) command.
3. Start trying it!
In command prompt, type
java -jar JFuzuli.jar
to start. You will see the options:
Usage:
java -jar JFuzuli.jar fzlfile
java -jar JFuzuli.jar --repl
java -jar JFuzuli.jar --editor
You can specify a Fuzuli source file to run. The option --repl opens a command shell, and the last option, --editor, opens the GUI. Let's try the command shell.
java -jar JFuzuli.jar --repl
F:
F: (+ 2 7)
9.0
F: (- 7 10)
-3.0
F: (require "lang.nfl")
0.0
F: (let mylist '(1 2 3))
[1.0, 2.0, 3.0]
F: (first mylist)
1.0
F: (last mylist)
3.0
F: (length mylist)
3
F: (nth mylist 0)
1.0
F: (nth mylist 1)
2.0
Well, we introduced some basic operators, data types and commands here, but not all of them. We always put an operator or command after an opening parenthesis, arguments follow the operator, and a closing parenthesis ends the expression. This is the well-known syntax of Lisp and Scheme. So what are the language's properties, what are the commands, and how can you try more Fuzuli code in JFuzuli?
Blog posts on JFuzuli : http://stdioe.blogspot.com.tr/search/label/jfuzuli
Blog posts on Fuzuli: http://stdioe.blogspot.com.tr/search/label/fuzuli
Our online interpreter is : http://fuzuliproject.org/index.php?node=tryonline
Fuzuli Language home page: http://fuzuliproject.org/
Thursday, March 19, 2015
Why is R awesome?
For some it is magic, some hate its notation (maybe you!), it has some weird rules, and maybe it is just a programming language like any other (that is my opinion too). Like other programming languages, R has its good and bad properties, but I can say it is the best candidate for the toolbox of a statistician or of researchers working on data analysis.
In this blog post, I collect 8 (numbered 0 to 7) nice properties of R. As a lecturer and researcher, I have found that many students understand statistical concepts more easily when I demonstrate them with Monte Carlo simulations. In R, we can write compact code to demonstrate concepts that would be difficult to implement in another programming language. R is not a simple toy, either: we can always deepen our knowledge and programming skills by plugging in external code written in "real" programming languages (recalling the old joke that real programmers use C).
So, why is R awesome?
0. Syntax of Algol Family
R has an unusual assignment operator, but the rest of its syntax is similar to Algol-family languages such as C, C++, Java and C#. R also has a facility resembling operator overloading (it is not exactly operator overloading): a single symbol or a compound of symbols can be bound to a function name like this:
> '%_%' <- function(a, b){
+   return(exp(a + b))
+ }
> 5 %_% 2
[1] 1096.633
1. Vectors are primitive data types
Yes, vectors are primitive data types in R, unlike in other members of the Algol family: in C/C++ they are arrays of primitives, and in Java they are objects. By contrast, binary operators apply directly to vectors and matrices in R. For example, the least squares coefficient estimator is a single-line expression in R:
> assign("x",cbind(1,1:30))
> assign("y",3+3*x[,2]+rnorm(30))
> solve(t(x) %*% x) %*% t(x) %*% y
[,1]
[1,] 2.858916
[2,] 3.003787
This example shows the difference between a scalar and a vector:
> assign("x", c(1,2,3))
> assign("a", 5)
> typeof(x)
[1] "double"
> typeof(a)
[1] "double"
> class(x)
[1] "numeric"
> class(a)
[1] "numeric"
No difference!
2. Theorems get alive in minutes
Suppose that X is a random variable that follows an exponential distribution with rate = 5.
The sum (or mean) of randomly selected samples of size N approximately follows a normal distribution; this illustrates the Central Limit Theorem with an example. Theorems are theorems, but you may want a quick demonstration (a proof for educational purposes only, perhaps) as a rapid application. Writing such code takes minutes in R.
> assign("nsamp", 5000)
> assign("n", 100)
> assign("theta", 5.0)
> assign("sums", rep(0, nsamp))
>
> for (i in 1:nsamp){
+   sums[i] <- sum(rexp(n = n, rate = theta))
+ }
> hist(sums)
3. There is always a second plan for faster code
Now suppose that we are drawing 50,000 samples randomly using the code above. What would be the computation time?
> assign("nsamp", 50000)
> assign("n", 100)
> assign("theta", 5.0)
> assign("sums", rep(0, nsamp))
>
> s <- system.time(
+   for (i in 1:nsamp){
+     sums[i] <- sum(rexp(n = n, rate = theta))
+   }
+ )
>
> print(s)
   user  system elapsed
  0.582   0.000   0.572
Drawing 50,000 samples of size 100 takes 0.582 seconds. Is that fast enough? Let's write it in C++!
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector CalculateRandomSums(int m, int n) {
    NumericVector result(m);
    int i;
    for (i = 0; i < m; i++){
        result[i] = sum(rexp(n, 5.0));
    }
    return(result);
}
After compiling the code with Rcpp, we can call the function CalculateRandomSums() from R:
> s <- system.time(
+   vect <- CalculateRandomSums(50000, 100)
+ )
> print(s)
   user  system elapsed
  0.185   0.000   0.184
Now our R code is about 3.15 times slower than the code written in C++.
4. Interaction with C/C++/Fortran is enjoyable
Since a huge amount of R itself is written in C, migrating old C libraries is easy: one writes wrapper methods using SEXP data types, and Rcpp masks these routines in a clever way. Fortran code is also linkable. Interacting with other languages makes old libraries usable from R and enables writing faster new libraries. It is also possible to create instances of R inside C and C++ applications.
For an enjoyable example, have a look at the section 3. There is always a second plan for faster code.
The R package eive includes a small portion of C++ code and is a compact example of calling C++ functions from within R. Accessing C++ objects from R is also possible thanks to Rcpp. Click here to see the explanation and an example.
5. Interaction with Java
Calling Java from R (rJava) and calling R from Java (JRI, RCaller) are both possible. Renjin takes a different approach: it is an R interpreter written in Java (another way of calling R from Java, huh?). A detailed comparison of these methods is given in the linked documentation.
6. Sophisticated variable scoping
In R, functions have their own variable scopes, and accessing variables at the top level is possible. In addition, variable scoping is handled by standard R lists (specifically called environments), and user-defined environments can be created anywhere in the code. For detailed information visit Environment in R.
7. Optional Object Oriented Programming (O-OOP)
R functions take the values of variables as parameters rather than their addresses. If a vector of size 10,000 is passed to a function, R first copies the vector and then passes the copy. After the function body runs, the copied parameter is marked as free for later garbage collection. As C/C++ programmers know, passing objects by address rather than by value saves memory and computation time. Reference classes in R are passed to functions by address, similarly to C++ references and Java objects:
Person <- setRefClass(
  Class = "Person",
  fields = c("name", "surname", "email"),
  methods = list(
    initialize = function(name, surname, email){
      .self$name <- name
      .self$surname <- surname
      .self$email <- email
    },
    setName = function(name){
      .self$name <- name
    },
    setSurname = function(surname){
      .self$surname <- surname
    },
    setEMail = function(email){
      .self$email <- email
    },
    toString = function(){
      return(paste(name, " ", surname, " ", email))
    }
  ) # End of methods
) # End of class

p <- Person$new("John", "Brown", "brown@server.org")
print(p$toString())
The output is
[1] "John Brown brown@server.org"
Java and C++ programmers probably like this notation!
Have a nice read!
Monday, March 16, 2015
Compact Genetic Algorithms with R
Compact Genetic Algorithm (CGA) is a member of the Genetic Algorithms (GAs) family and also of the Estimation of Distribution Algorithms (EDAs). It is called compact because it evolves a single probability vector rather than a population of chromosomes.
For detailed information, research papers [1] and [2] provide a complete and a brief treatment, respectively.
In this blog post, we give an example of using compact genetic algorithms on the ONEMAX function. ONEMAX takes n bits as its parameter and returns the number of ones as an integer. Since its single optimum is reached when all of the bits equal 1, it is called ONEMAX.
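Before the R walkthrough, here is a minimal sketch of the CGA idea in Python (mine, not from the original post; it maximizes the bit count directly, whereas the eive package's cga below minimizes -sum; the parameter values are illustrative):

```python
import random

def onemax(bits):
    # ONEMAX counts the ones; the optimum is the all-ones string.
    return sum(bits)

def cga(chsize, popsize, iters, seed=1):
    rng = random.Random(seed)
    p = [0.5] * chsize          # a probability vector replaces a full population
    for _ in range(iters):
        # Sample two candidate solutions from the probability vector.
        a = [1 if rng.random() < pi else 0 for pi in p]
        b = [1 if rng.random() < pi else 0 for pi in p]
        winner, loser = (a, b) if onemax(a) >= onemax(b) else (b, a)
        for i in range(chsize):
            if winner[i] != loser[i]:
                # Shift p[i] toward the winner's bit by 1/popsize.
                delta = 1.0 / popsize
                p[i] += delta if winner[i] == 1 else -delta
                p[i] = min(1.0, max(0.0, p[i]))
    return [1 if pi > 0.5 else 0 for pi in p]

print(cga(chsize=10, popsize=50, iters=3000))  # typically all ones
```

The 1/popsize update step is what simulates a population of that size while storing only one probability per bit.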
First of all, we load the R package eive which includes the wrapped C++ function cga.
> require("eive")
The other step is to define the ONEMAX function.
> ONEMAX <- function (x){
+   return(-sum(x))
+ }
Now we write the main part, optimization with cga:
> result <- cga(chsize = 10 , popsize = 100 , evalFunc = ONEMAX)
> result
 [1] 1 1 1 1 1 1 1 1 1 1
The result is a vector in which the bits are all equal to 1.
The most important issue in this example is speed, because the algorithm is implemented in C++ and wrapped using Rcpp to be called within R.
Here is an example with 1000 bits and the time consumed by the cga function call:
> system.time(
+   result <- cga(chsize = 1000, popsize = 100, evalFunc = ONEMAX)
+ )
   user  system elapsed
  0.443   0.000   0.433
> ONEMAX(result)
[1] -994
This is considerably fast: 994 of the 1000 bits are found to be 1 by the function in 0.433 seconds. Let's increase the population size from 100 to 200:
> system.time(
+   result <- cga(chsize = 1000, popsize = 200, evalFunc = ONEMAX)
+ )
   user  system elapsed
  0.891   0.000   0.866
> print(ONEMAX(result))
[1] -1000
After setting the population size from 100 to 200, the time consumed doubles to 0.866 seconds, but this time all 1000 bits are 1 and the optimal solution is reached.
Have a nice read !
[1] Harik, Georges R., Fernando G. Lobo, and David E. Goldberg. "The compact genetic algorithm." Evolutionary Computation, IEEE Transactions on 3.4 (1999): 287-297.
[2] Satman, M. Hakan, and Erkin Diyarbakirlioglu. "Reducing errors-in-variables bias in linear regression using compact genetic algorithms." Journal of Statistical Computation and Simulation ahead-of-print (2014): 1-20.
Accessing C++ objects from R using Rcpp
Rcpp (Seamless R and C++ Integration) is a package for R that provides an easy way of combining C++ and R code. Since R is interpreted, a bulk of R code can easily run several times slower than its counterpart written in C++. Speed is often the main concern; however, another common reason for using C++ is making an old native library available to R.
In this blog post, we give an example of accessing a C++ class from within R using Rcpp. The class is named MyClass and has two private double-typed variables, along with getter and setter methods for its private fields.
MyClass is defined as the code shown below:
MyClass is defined as shown below:
#include <Rcpp.h>

using namespace Rcpp;
using namespace std;
class MyClass {
private:
    double a, b;
public:
    MyClass(double a, double b);
    ~MyClass();
    void setA(double a);
    void setB(double b);
    double getA();
    double getB();
};
MyClass has two private double variables a and b, a constructor, a destructor, and getter and setter methods for a and b, respectively. The implementation of MyClass is given below:
MyClass::MyClass(double a, double b) {
    this->a = a;
    this->b = b;
}

MyClass::~MyClass() {
    cout << "Destructor called" << std::endl;
}

void MyClass::setA(double a) {
    this->a = a;
}

void MyClass::setB(double b) {
    this->b = b;
}

double MyClass::getA() {
    return this->a;
}

double MyClass::getB() {
    return this->b;
}
MyClass is deliberately minimal. Since it is a C++ class, it is not directly accessible from R. In this example, we write wrapper functions that create instances of MyClass and return their memory addresses, which are then passed back in later function calls. In other words, on the R side we keep the address of each C++ object in order to access it.
void class_print(long addr); // forward declaration; defined below

// [[Rcpp::export]]
long class_create(double a, double b){
    MyClass *m = new MyClass(a, b);
    class_print((long) m);
    return((long) m);
}
class_create is a C++ function carrying a special comment that Rcpp reads before compiling. After compilation, a wrapper R function named class_create is generated to call its C++ counterpart. The function creates an instance of MyClass with the given double values and returns the address of the created object as a long integer. Note the forward declaration of class_print, which class_create calls before class_print is defined. Here are the other wrapper functions:
// [[Rcpp::export]]
void class_print(long addr){
MyClass *m = (MyClass*)addr;
cout << "a = " << m->getA() << " b = " << m->getB() << "\n";
}
// [[Rcpp::export]]
void class_destroy(long addr){
MyClass *m = (MyClass*) addr;
delete m;
}
// [[Rcpp::export]]
void class_set_a(long addr, double a){
MyClass *m = (MyClass*) addr;
m->setA(a);
}
// [[Rcpp::export]]
void class_set_b(long addr, double b){
MyClass *m = (MyClass*) addr;
m->setB(b);
}
// [[Rcpp::export]]
double class_get_a(long addr){
MyClass *m = (MyClass*) addr;
return(m->getA());
}
// [[Rcpp::export]]
double class_get_b(long addr){
MyClass *m = (MyClass*) addr;
return(m->getB());
}
Suppose the whole code is written in a file classcall.cpp. On the R side, this code can be compiled and tested as shown below:
> require("Rcpp")
Loading required package: Rcpp
> Rcpp::sourceCpp('rprojects/classcall.cpp')
> myobj <- class_create(3.14, 7.8)
a = 3.14 b = 7.8
> myobj
[1] 104078752
> class_set_a(myobj,100)
> class_set_b(myobj,500)
> class_print(myobj)
a = 100 b = 500
> class_get_a(myobj)
[1] 100
> class_get_b(myobj)
[1] 500
> class_destroy(myobj)
Destructor called
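Handling raw addresses in user code is error-prone, so a natural refinement is to hide them behind an R closure. The sketch below is illustrative only; it assumes the wrapper functions exported above have already been compiled with sourceCpp:

```r
# Illustrative sketch: wrap the address-based wrappers in an R closure so
# callers never touch raw addresses. Assumes class_create, class_get_a,
# class_set_a, class_print and class_destroy are already compiled and loaded.
new_myclass <- function(a, b) {
  addr <- class_create(a, b)
  list(
    get_a   = function() class_get_a(addr),
    set_a   = function(v) class_set_a(addr, v),
    print   = function() class_print(addr),
    destroy = function() class_destroy(addr)
  )
}
```

With this helper, the session above becomes obj <- new_myclass(3.14, 7.8); obj$set_a(100); obj$destroy(), and the address never leaks into user code.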
Have a nice read!
Saturday, March 14, 2015
SQLite with R - The sqldf package
R's data sorting functions sort and order, the filtering function which, the vector indexing operator [], the vector and matrix manipulation functions cbind and rbind, and other functions and keywords make data analysis easy in many situations. SQL (Structured Query Language) is used for storing, adding, removing, sorting and filtering data that is kept permanently on disk or held in memory.
The R package sqldf builds a SQLite database from an R data.frame object. A data.frame is a matrix-like structure with richer properties in R. In this blog post, we present a basic introduction to the sqldf package and its use in R.
First of all, the package can be installed by typing:
> install.packages("sqldf")
After installing the package, it can be got ready to use by typing:
> require("sqldf")
Loading required package: sqldf
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
Loading required package: DBI
Now let's create two vectors of length 100:
> assign("x", rnorm(100))
> assign("y", rnorm(100))
> assign("mydata", as.data.frame(cbind(x,y)))
We can see the first 6 rows:
> head(mydata)
x y
1 -1.9357660 0.2784369
2 -0.6976428 1.4646022
3 0.1913628 0.1578977
4 0.3049607 0.6055087
5 2.3773249 1.1800434
6 0.4641791 1.7143130
Let's perform some SQL statements on this data frame using sqldf:
Averages of x and y
> sqldf("select avg(x), avg(y) from mydata")
avg(x) avg(y)
1 0.0790934 0.220756
Number of cases
> sqldf("select count(x), count(y) from mydata")
count(x) count(y)
1 100 100
First Three Cases
> sqldf("select x,y from mydata limit 3")
x y
1 -1.9357660 0.2784369
2 -0.6976428 1.4646022
3 0.1913628 0.1578977
Minimum and Maximum Values
> sqldf("select min(x),max(x),min(y),max(y) from mydata")
min(x) max(x) min(y) max(y)
1 -2.155768 2.377325 -1.75477 2.531869
First 3 Cases of Ordered Data
> sqldf("select x,y from mydata order by x limit 3")
x y
1 -2.155768 0.6614813
2 -1.935766 0.2784369
3 -1.837502 0.1073177
> sqldf("select x,y from mydata order by y limit 3")
x y
1 0.7665811 -1.754770
2 0.3373319 -1.736727
3 0.6199159 -1.335649
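For comparison, the same "first three cases ordered by a column" result can be obtained with the base-R functions mentioned at the start of this post:

```r
# Base-R equivalent of: select x, y from mydata order by x limit 3
# order() returns the row permutation that sorts mydata$x ascending;
# head(..., 3) then keeps the first three rows.
head(mydata[order(mydata$x), ], 3)
```

Which style is preferable is largely a matter of taste; sqldf shines when the filtering and ordering logic is already expressed in SQL.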
Insert into
sqldf does not alter the original data frame. After inserting a new case, a new data.frame is created and returned. In the example below, sqldf takes a vector of two SQL statements as its parameter, and the result is accessible under the name main.mydata rather than mydata.
> tail (sqldf(
+ c(
+ "insert into mydata values (6,7)"
+ ,
+ "select * from main.mydata"
+ )
+ )
+ )
x y
96 1.58024523 1.3937920
97 -1.79352203 0.2105787
98 0.02632872 -1.0567890
99 -0.60934162 -0.1359667
100 1.43393159 -0.9396326
101 6.00000000 7.0000000
Delete
> sqldf(
+ c(
+ "delete from mydata where x < 0 or y < 0"
+ ,
+ "select * from main.mydata"
+ )
+ )
x y
1 0.19136277 0.15789771
2 0.30496074 0.60550873
3 2.37732485 1.18004342
4 0.46417906 1.71431305
5 1.16290585 1.17154756
6 0.49335335 0.19904607
7 1.45769371 0.08291387
8 0.78473338 1.07769098
9 0.69043300 1.35040512
10 1.47893118 1.01057351
.....
Have a nice read!