Sunday, August 18, 2013

Sorting Multi-Column Datasets in R


In this entry, we present a straightforward way to sort multi-column datasets in R.

Suppose that we have a vector in three-dimensional space. This vector can be defined as

a <- c(3,1,2)

in R. The built-in function sort sorts a given vector in ascending order by default. An ordered version of vector a is calculated as shown below:

sorted_a <- sort(a)

The result is

[1] 1 2 3

For descending order, the decreasing parameter must be set to TRUE.

> sort(a,decreasing=TRUE) 
[1] 3 2 1

A related function, order, returns the indices that arrange the elements in sorted order.

> a <- c(3,1,2) 
> order (a) 
[1] 2 3 1

The example above shows that when the vector a is sorted in ascending order, the second element of a comes first, followed by the third and then the first. The result of order can therefore be used as an index to sort the data.

> a <- c(3,1,2) 
> o <- order (a) 
> a[o] 
[1] 1 2 3
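
The relationship between sort and order can be checked directly. The following is a minimal sketch using the same vector a; the names asc and desc are introduced here only for illustration. It also shows that order accepts the decreasing parameter, just as sort does.

```r
a <- c(3, 1, 2)

# Indexing a with order(a) gives the same result as sort(a)
asc <- a[order(a)]
identical(asc, sort(a))

# order() also accepts decreasing = TRUE, mirroring sort()
desc <- a[order(a, decreasing = TRUE)]
identical(desc, sort(a, decreasing = TRUE))
```

Both calls to identical should return TRUE, since indexing by the result of order is simply another way of producing the sorted vector.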

Suppose that we need to sort a matrix by a desired column. The solution is easy: get the indices that sort the desired column and use them to index the rows, as in the example above.

> x <- round(runif(5, 0, 100)) 
> y <- round(runif(5, 0, 100)) 
> z <- round(runif(5, 0, 100)) 
> data <- cbind(x,y,z) 
> data 
      x  y  z 
[1,] 48 35 75 
[2,] 40 21 43 
[3,] 58 69  1 
[4,] 49 38  2 
[5,] 43 66 46

Now, get the indices that sort z and use them to reorder the rows:

> o <- order(z) 
> data[o,] 
      x  y  z 
[1,] 58 69  1 
[2,] 49 38  2 
[3,] 40 21 43 
[4,] 43 66 46 
[5,] 48 35 75

The dataset data is now sorted by the column z.
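
The same idea extends to descending order and to more than one sort key. The sketch below rebuilds the matrix with the values printed above, so that it does not depend on random numbers. The vector group is a hypothetical extra key that is not part of the original example; it is there only to show how order breaks ties with a second argument.

```r
# Rebuild the example matrix with the values printed above
x <- c(48, 40, 58, 49, 43)
y <- c(35, 21, 69, 38, 66)
z <- c(75, 43, 1, 2, 46)
data <- cbind(x, y, z)

# Sort the rows by column z, largest value first
by_z_desc <- data[order(data[, "z"], decreasing = TRUE), ]

# Hypothetical grouping key (an assumption, not in the original data):
# sort by group first, then by z within each group
group <- c(2, 1, 2, 1, 1)
by_group_then_z <- data[order(group, data[, "z"]), ]
```

When order is given several vectors, it sorts by the first and uses the later ones only to resolve ties, which is what makes sorting by multiple columns possible with a single call.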
