Supercomputing Institute for Advanced Computational Research

© 2011 Regents of the University of Minnesota. All rights reserved.

Introduction to R

Haoyu Yu (haoyu@msi.umn.edu, 612-625-1709)

Scientific Consulting Group (SCG) Supercomputing Institute, University of Minnesota help-line: [email protected], 612-626-0802

Outline

•  Starting R
•  R library packages
•  Online resources
•  Using R – basic steps and language essentials including
   •  reading data
   •  data types
   •  control structures
   •  using/writing functions
•  basic steps
•  graphics

What is R?

1.  A language and a computing environment for data analysis.
2.  What is the difference between R and S-PLUS?
    •  S-PLUS: a commercial system from the Insightful Corporation
    •  R: a free system: http://www.r-project.org/
3.  There are some differences between the two, but in everyday use they are very similar.
4.  R runs on Windows, Mac, and a range of UNIX/Linux operating systems.
5.  But how to start?

Useful Links to R Resources

There are many useful links such as:

•  R Web site: http://www.r-project.org/
•  CRAN on R web site: http://cran.r-project.org/
•  R Documentation:
   http://cran.r-project.org/doc/manuals/R-intro.pdf
   http://cran.r-project.org/doc/manuals/R-lang.pdf
•  To start R on MSI machines: take a quick look at the MSI Web site
   https://www.msi.umn.edu/sw/r

Starting R

•  R command line window

R Packages

•  Using R Packages
   •  Installing packages in R: install.packages("pkg_name")
   •  Installed packages: library() (or installed.packages())
   •  Loading packages: library("pkg_name")
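The package commands above can be combined into one small helper. This is only a sketch: `load_or_install` is a made-up name, not an R function, and the install step assumes network access and a configured CRAN mirror. The demo call uses "stats", which ships with R, so nothing is downloaded.

```r
# Hypothetical helper (not part of R): load a package, installing it first if needed.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # needs network access and a CRAN mirror
  }
  library(pkg, character.only = TRUE)   # character.only: pkg is a string variable
  invisible(pkg %in% loadedNamespaces())
}

ok <- load_or_install("stats")          # "stats" is bundled with R
print(ok)

# Names of a few installed packages.
head(rownames(installed.packages()))
```

On a shared system such as a cluster, install.packages() may need a personal library path; that depends on the local setup.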

Objects and Functions in R

§  Everything in R is an object: a named storage space
   •  Examples of Objects
      -  Vectors
      -  Matrices
      -  Arrays
      -  Lists
      -  Data Frames
      -  Factors
§  Functions are a special type of object
   •  Take arguments and carry out some operations

Examples of R Objects *

> x <- c(1, 2, 3, 4, 5, 6, -2, -3, -4)
> x
[1]  1  2  3  4  5  6 -2 -3 -4
> length(x)
[1] 9
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -4.000  -2.000   2.000   1.333   4.000   6.000
> dim(x) <- c(3,3)
> x
     [,1] [,2] [,3]
[1,]    1    4   -2
[2,]    2    5   -3
[3,]    3    6   -4
> cov(x)
     [,1] [,2] [,3]
[1,]    1    1   -1
[2,]    1    1   -1
[3,]   -1   -1    1

Reading Data

•  R can read in data in many different formats: tab-delimited, Excel, CSV, SAS, Stata, SPSS, etc.

Some common functions to be used:
•  readLines
•  scan
•  data.frame
•  read.table
•  read.csv
•  read.delim
•  read.xls
•  read.xport (for SAS xport files)
•  read.dta (read Stata binary file)

NOTE: some of these functions belong to add-on R packages (e.g. the "foreign" package)
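A minimal sketch of three of the reading functions listed above. The file and its contents are made up for illustration; the example writes a small CSV to a temporary file so that it runs anywhere.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,group,weight",
             "1,a,1.5",
             "2,b,2.0",
             "3,a,0.5"), csv_file)

# read.csv() assumes a header line and "," as the separator.
dat <- read.csv(csv_file, stringsAsFactors = FALSE)
str(dat)

# read.table() is the general form; header and separator must be stated.
dat2 <- read.table(csv_file, header = TRUE, sep = ",", stringsAsFactors = FALSE)
all.equal(dat, dat2)

# readLines() returns the raw text, one element per line.
raw <- readLines(csv_file)
length(raw)
```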

R and Databases

•  Interaction with databases directly in R
•  Definition of R database interface and some links:
   http://stat.bell-labs.com/RS-DBI/doc/html/index.html
   http://stat.bell-labs.com/RS-DBI/index.html
•  RODBC
   •  An ODBC database interface
•  RMySQL
   •  Database interface and MySQL driver for R

Basic Types of Data Objects in R

•  Vectors
   •  The most basic data object
•  Matrices and Arrays
   •  array() stores data in multiple dimensions
   •  matrix() creates a two dimensional array
•  Lists
   •  Ordered collection of other objects (same or different types)
•  Data frames
   •  A special class of lists: structure to store tables
•  Factors
   •  A data type to handle categorical (nominal) data

Parallel Computing in R

•  Vectorization
   •  Many functions in R operate on each element of a vector directly to produce a vector of the same length
•  Parallel computing in R
   •  Perform computation in R on multiple cores or multiple nodes through explicit or implicit parallel computing modes
   •  There are a number of packages that are useful in high-performance computing in R
   •  http://cran.r-project.org/web/views/HighPerformanceComputing.html
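A short sketch of both ideas. The vectorized call and the loop give the same result. The parallel part uses mclapply() from the "parallel" package that is bundled with R; mc.cores = 1 is chosen here only so the example runs on every platform, and it is an assumption that a real job would raise it on a multi-core Linux or Mac machine.

```r
x <- c(1, 4, 9, 16)

# Vectorized: sqrt() is applied to every element in one call.
roots_vec <- sqrt(x)

# Equivalent explicit loop, shown only for comparison.
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}
identical(roots_vec, roots_loop)

# Explicit parallelism: apply a function to each input, possibly on several cores.
# mc.cores = 1 runs serially; larger values fork worker processes on Linux/Mac.
library(parallel)
squares <- mclapply(1:4, function(i) i^2, mc.cores = 1)
unlist(squares)
```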

More Data Details

•  vector: an ordered collection of elements of the same type
   •  Numerical: integer, double, complex
   •  Character
   •  Logical
•  R has six basic vector types: integer, double, complex, character, logical, and raw (which holds raw bytes)
•  Indexing plays a key role
   •  Sub-setting can also be done
•  functions to create vectors
   •  c()
   •  seq()
   •  rep()
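A brief sketch of the three creation functions and of indexing and sub-setting; the values are arbitrary.

```r
v1 <- c(10, 20, 30, 40, 50)        # combine values
v2 <- seq(0, 1, by = 0.25)         # 0.00 0.25 0.50 0.75 1.00
v3 <- rep(c("a", "b"), times = 3)  # "a" "b" "a" "b" "a" "b"

typeof(v1)      # "double"
typeof(1:5)     # "integer"
typeof(v3)      # "character"

v1[2]           # second element (indexing starts at 1): 20
v1[c(1, 5)]     # first and fifth: 10 50
v1[-1]          # everything except the first: 20 30 40 50
v1[v1 > 25]     # logical sub-setting: 30 40 50
```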

High Dimensional Arrays *

•  using the matrix function: 2-d arrays
> ma <- matrix( 1:15, 3, 5 )
> ma
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    7   10   13
[2,]    2    5    8   11   14
[3,]    3    6    9   12   15

•  add more rows or more columns to a matrix using rbind() or cbind()
> cbind ( ma, rbind ( A = 1, B = 1:4, C = 11:14 ) )
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
A    1    4    7   10   13    1    1    1    1
B    2    5    8   11   14    1    2    3    4
C    3    6    9   12   15   11   12   13   14

•  array() creates arrays with more dimensions
   •  array(1:24, dim=c(2,4,3))
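A small sketch of how the three-dimensional array from the last bullet is laid out and indexed.

```r
a <- array(1:24, dim = c(2, 4, 3))  # 2 rows, 4 columns, 3 layers; filled column by column

dim(a)          # 2 4 3
a[2, 3, 1]      # row 2, column 3, layer 1: 6
a[, , 2]        # the whole second layer, a 2 x 4 matrix holding 9..16
dim(a[, , 2])   # 2 4
```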

Lists *

•  list: combines a collection of objects (that can be of different modes) into one object

Example: Create one structure from one character string with the name "name", one number with the name "year", one character string with the name "class", and one numeric vector with the name "scores" that has length 4

> my.student <- list( name = c("Peter"), year = 1, class = c("Math101"), scores = c(80, 96, 88, 91) )
> my.student
$name
[1] "Peter"

$year
[1] 1

$class
[1] "Math101"

$scores
[1] 80 96 88 91
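A sketch of the usual ways to reach into the list built above. The added "gpa" component and its value are invented for illustration.

```r
my.student <- list(name = "Peter", year = 1, class = "Math101",
                   scores = c(80, 96, 88, 91))

my.student$name             # by name with $: "Peter"
my.student[["scores"]]      # by name with [[ ]]: the numeric vector
my.student[[4]][2]          # second score, by position: 96
mean(my.student$scores)     # 88.75

str(my.student["scores"])   # single [ ] keeps a list (of length 1)

my.student$gpa <- 3.7       # add a component (made-up value)
names(my.student)
```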

Data Frame *

data frame: a list of variables of the same length with unique row names. (It is like a matrix with columns that can be of different modes. It is displayed in matrix form, rows by columns.)

> my.class <- data.frame( stud.ids = c("123", "16", "289", "1234", "78", "512"), final = c(90, 85, 99, 83, 92, 79) )
> my.class
  stud.ids final
1      123    90
2       16    85
3      289    99
4     1234    83
5       78    92
6      512    79
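A sketch of common operations on the data frame above. stringsAsFactors = FALSE is added here so the ids stay character in every R version; the "passed" column and its threshold of 80 are invented for illustration.

```r
my.class <- data.frame(stud.ids = c("123", "16", "289", "1234", "78", "512"),
                       final = c(90, 85, 99, 83, 92, 79),
                       stringsAsFactors = FALSE)

dim(my.class)                        # 6 rows, 2 columns
my.class$final                       # one column as a vector
my.class[my.class$final >= 90, ]     # rows with a final score of at least 90
my.class[order(-my.class$final), ]   # sorted by score, highest first
mean(my.class$final)                 # 88

my.class$passed <- my.class$final >= 80   # add a derived column
```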

Factor

•  a vector of categorical variables that can be ordered or unordered
   examples: tumor stage, social class, etc.
•  to create:
> gender <- c(1,1,0,1,0,0)
> fgender <- factor( gender, levels=0:1 )
> levels( fgender ) <- c( "male", "female" )
> fgender
[1] female female male   female male   male
Levels: male female
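The slide mentions ordered factors such as tumor stage. A sketch with made-up stage values shows the difference between an unordered and an ordered factor.

```r
stage <- c("II", "I", "III", "II", "I", "II")   # made-up data

# Unordered factor: levels are just labels.
f <- factor(stage, levels = c("I", "II", "III"))
table(f)            # counts per level: I = 2, II = 3, III = 1
as.integer(f)       # underlying integer codes: 2 1 3 2 1 2

# Ordered factor: comparisons between levels become meaningful.
fo <- factor(stage, levels = c("I", "II", "III"), ordered = TRUE)
fo >= "II"          # TRUE FALSE TRUE TRUE FALSE TRUE
max(fo)             # III
```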

Data Attributes and Classes *

•  mode
> mode( 2 )     # gives the mode
[1] "numeric"
> typeof( 2 )   # gives the R internal type
[1] "double"
•  length:
•  names:
> x <- list( a = 1, b = "A", c = 2:5 )
> names( x )
[1] "a" "b" "c"
> x
$a
[1] 1

$b
[1] "A"

$c
[1] 2 3 4 5

Basic Operators

•  comparison operators:
   •  equal: ==
   •  not equal: !=
   •  greater/less than: > or <
   •  greater/less than or equal: >= or <=
•  logical operators:
   •  & (AND): returns TRUE if both comparisons return TRUE
   •  | (OR): returns TRUE if at least one comparison returns TRUE
   •  ! (NOT): returns the negation (opposite) of a logical vector
•  other operators:
   •  assignment operator: <- or = (a recent addition)
•  precedence of operators: ?Syntax
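A short sketch of these operators applied to a vector; the values are arbitrary. The && line is an addition that is not on the slide: it is the single-value form of & that is normally used inside if().

```r
x <- c(-2, 0, 3, 7)

x == 3              # FALSE FALSE  TRUE FALSE
x != 0              #  TRUE FALSE  TRUE  TRUE
x > 0 & x < 5       # FALSE FALSE  TRUE FALSE  (element-wise AND)
x < 0 | x > 5       #  TRUE FALSE FALSE  TRUE  (element-wise OR)
!(x > 0)            #  TRUE  TRUE FALSE FALSE

# && works on single TRUE/FALSE values; it is meant for if() conditions.
length(x) > 0 && x[1] < 0   # TRUE

y <- 5              # "<-" and "=" both assign at the top level
z = 5
identical(y, z)     # TRUE
```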

Control Flow Structures

•  for loops: examples
   •  for ( i in 1:10 ) plot( mydata[ , i] )
   •  samples <- paste("sample",1:10,sep="")
      for( i in samples ) print(i)
•  while ( condition ) expression
•  repeat

> y <- 1000
> x <- y/2
> while( abs( x*x-y ) >= 1e-10 ) (x <- (x + y/x)/2)
> repeat {
+   x <- (x + y/x)/2
+   if (abs(x*x-y) < 1e-10) break
+ }

If and Else *

> x <- rnorm( 20, mean = 1, sd = 1)
> x
 [1]  0.56477127  0.99415526  0.56525882  0.06839325  1.71730871
 [6]  0.52559999 -0.22859231  0.60854887 -0.38387933  1.43971644
[11]  1.78149272  2.28914317  1.03011630  0.69887935  0.11625638
[16]  0.82733328  0.36110657  1.53226873 -0.60203855  1.21676773

> ifelse ( x > 1, 1, 0)   # "ifelse" is vectorized
 [1] 0 0 0 0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1

> if ( x[1] > 1 ) 1 else 0
[1] 0

> # note: the length of the argument has to be 1
> if ( x > 1 ) 1 else 0   # "if … else" is not vectorized
[1] 0
Warning message:
In if (x > 1) 1 else 0 :
  the condition has length > 1 and only the first element will be used

(In R 4.2.0 and later, a condition of length greater than one is an error rather than a warning.)
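Because if() needs a single TRUE or FALSE, a vector condition is usually reduced first with any() or all(). A minimal sketch; the seed and the messages are arbitrary choices made for this example.

```r
set.seed(1)                        # arbitrary seed so the draw is reproducible
x <- rnorm(20, mean = 1, sd = 1)

flags <- ifelse(x > 1, 1, 0)       # element-wise: one result per element of x
length(flags)                      # 20

# Reduce the logical vector to one value before handing it to if().
if (any(x > 1)) {
  msg <- "at least one value exceeds 1"
} else {
  msg <- "no value exceeds 1"
}
msg

if (all(x > -5)) "all values are above -5" else "some value is at or below -5"

sum(flags) == sum(x > 1)           # TRUE: both count the elements above 1
```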

Switch

•  "switch" can evaluate a character string
> f <- function( x, type ) {
+   switch( type,
+           mean   = mean( x ),
+           range  = range( x ),
+           accsum = Reduce( "+", x, accumulate = TRUE ) )
+ }

> f( x, "mean" )
[1] 0.81785
> f( x, "range" )
[1] -0.6463109  2.7719993
> f( x, "accsum" )
 [1]  0.2346313 -0.4116796  0.1852275  1.2166363  2.6556395  5.1379385
 [7]  4.5755973  6.9113659  6.5363048  8.3764793  8.9208708 10.1792544
[13] 10.2792301 10.7349598 12.8859846 15.6579839 16.4046637 16.4132600
[19] 16.2106088 16.3570006
> x
 [1]  0.23463133 -0.64631092  0.59690713  1.03140873  1.43900320
 [6]  2.48229901 -0.56234115  2.33576855 -0.37506108  1.84017456
[11]  0.54439148  1.25838362  0.09997561  0.45572975  2.15102479
[16]  2.77199931  0.74667984  0.00859621 -0.20265113  0.14639180
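Two further features of switch() that often help: an empty branch falls through to the next one, and an unnamed last argument acts as the default. `summarise_by` is a hypothetical helper written only for this sketch.

```r
# Hypothetical helper for illustration: summarise a numeric vector by name.
summarise_by <- function(x, type) {
  switch(type,
         mean   = mean(x),
         median = median(x),
         spread = ,                       # empty branch: falls through to the next one
         range  = diff(range(x)),
         stop("unknown type: ", type))    # unnamed last argument is the default
}

v <- c(2, 4, 4, 10)
summarise_by(v, "mean")     # 5
summarise_by(v, "median")   # 4
summarise_by(v, "spread")   # 8, same branch as "range"

# An unmatched string reaches the default branch, which raises an error here.
res <- tryCatch(summarise_by(v, "mode"), error = function(e) conditionMessage(e))
res                         # "unknown type: mode"
```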

Matrix Computation *

> x1 <- c(1,2,3,4)
> y1 <- c(1,2,3,4)
> x1 * y1        # element wise multiplication
[1]  1  4  9 16
> x1 %*% y1
     [,1]
[1,]   30
> dim (x1) <- c(2,2)
> x1
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> x1 %*% x1      # matrix multiplication (inner product)
     [,1] [,2]
[1,]    7   15
[2,]   10   22
> x1[,2] %*% x1
     [,1] [,2]
[1,]   11   25
> x1[1,] %*% x1
     [,1] [,2]
[1,]    7   15

Matrix Computation: More Functions *

•  %o% -- outer product
> x1[,1] %o% x1
, , 1          ## x1[,1] %*% t(x1[,1])

     [,1] [,2]
[1,]    1    2
[2,]    2    4

, , 2          ## x1[,1] %*% t(x1[,2])

     [,1] [,2]
[1,]    3    4
[2,]    6    8

> dim(x1[,1] %o% x1)
[1] 2 2 2
> y1
[1] 1 2 3 4
> y1 %o% y1      ## y1 %*% t(y1)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    2    4    6    8
[3,]    3    6    9   12
[4,]    4    8   12   16

Matrix Computation: More Functions * … …

•  %x% -- kronecker product of two arrays
> y1 %x% y1      ## m-by-n matrix "kronecker" p-by-q matrix gives an mp-by-nq matrix
 [1]  1  2  3  4  2  4  6  8  3  6  9 12  4  8 12 16
> kronecker(y1, y1)
 [1]  1  2  3  4  2  4  6  8  3  6  9 12  4  8 12 16
> kronecker( diag(1, 2), x1 )
     [,1] [,2] [,3] [,4]
[1,]    1    3    0    0
[2,]    2    4    0    0
[3,]    0    0    1    3
[4,]    0    0    2    4
•  crossprod() – cross product (may be slightly faster)
> crossprod( x1, x1 )   ## A^T A
     [,1] [,2]
[1,]    5   11
[2,]   11   25
> t( x1 ) %*% x1
     [,1] [,2]
[1,]    5   11
[2,]   11   25

Matrix Computation: More Functions …

•  more functions on matrix computation
   •  eigen() – compute eigenvalues and eigenvectors of numerical matrices
   •  norm(), rcond(), kappa() – matrix norm and condition numbers
   •  svd() -- singular-value decomposition of a rectangular matrix
   •  qr() -- QR decomposition of a matrix
   •  solve() – solves the equations A %*% X = B for X
   •  det() -- calculates the determinant of a matrix
   •  t() -- transpose of a matrix or a data.frame
   •  Conj(t()) -- conjugate transpose of a complex matrix
   •  aperm() – transpose an array by permuting its dimensions
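A sketch that exercises a few of the functions listed above on a small matrix. The matrix A and the right-hand side B are made up for the example.

```r
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)   # small symmetric example matrix
B <- c(3, 5)

X <- solve(A, B)          # solves A %*% X = B
A %*% X                   # reproduces B (as a 2 x 1 matrix)

det(A)                    # 2*3 - 1*1 = 5
solve(A) %*% A            # inverse times A: the identity, up to rounding

e <- eigen(A)
e$values                  # eigenvalues, largest first
sum(e$values)             # equals the trace, 5
prod(e$values)            # equals the determinant, 5

s <- svd(A)
s$u %*% diag(s$d) %*% t(s$v)   # rebuilds A from its singular-value decomposition
```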

Homogeneous and Heterogeneous Data Types

•  For these data types, all elements have to have the same storage mode, for example "logical", "integer", "double", "complex", or "character":
   •  vector
   •  matrix
   •  array
   •  factor
•  These data types allow multiple types of elements with different storage modes:
   •  list
   •  data frame
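A small sketch of what the distinction means in practice: mixing types in an atomic vector forces a common type, while a list or data frame keeps each type.

```r
# Mixing types in an atomic vector coerces everything to one common type.
typeof(c(TRUE, 2L))          # "integer"
typeof(c(1L, 2.5))           # "double"
typeof(c(1, "a", TRUE))      # "character"
c(1, "a", TRUE)              # "1" "a" "TRUE"

# A list keeps each element's own type.
mixed <- list(1, "a", TRUE)
sapply(mixed, typeof)        # "double" "character" "logical"

# A data frame keeps one type per column.
df <- data.frame(n = 1:2, s = c("x", "y"), stringsAsFactors = FALSE)
sapply(df, typeof)           # n: "integer", s: "character"
```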

R functions: syntax, arguments, etc.

> hist.w.density <- function(x, xlab = deparse(substitute(x)), ...) {
+   h <- hist( x, plot = FALSE )
+   s <- sd( x )
+   m <- mean( x )
+   ylim <- range( 0, h$density )
+   h1 <- hist( x, freq = FALSE, ylim = ylim, col = 'lightblue', xlab = xlab, ... )
+   lines( density(x), col = 'purple', lwd = 2 )
+   list( mean = m, sd = s )
+ }
>
> mydata <- rchisq( 200, 10 )
> hist.w.density( mydata )

Note: deparse(substitute(x)) returns a character string version of the actual argument passed to the function as "x" (for the call above, the string "mydata"), which becomes the default x-axis label.

Default Arguments

•  The last function uses the default return (the last expression)
•  a "return" can also be used explicitly to return a value

> x1
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> ( y1 <- seq_len( 5 ) )
[1] 1 2 3 4 5
> xy <- function( x, y = x ) { return(t(x) %*% y) }
> xy ( x = x1 , y = y1 )   # y is supplied, so its default is not used
     [,1]
[1,]   55
[2,]  130
> xy ( x = x1 )            # use the default argument for the function "xy"
     [,1] [,2]
[1,]   55  130
[2,]  130  330
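A self-contained sketch of the same function showing positional matching, named matching in any order, and the default. The matrix x1 is rebuilt here with matrix(1:10, nrow = 5), which gives the values printed on the slide.

```r
xy <- function(x, y = x) {
  return(t(x) %*% y)       # explicit return; y defaults to x when it is not supplied
}

x1 <- matrix(1:10, nrow = 5)   # 5 x 2 matrix: columns 1..5 and 6..10
y1 <- seq_len(5)

xy(x1, y1)            # positional matching: first argument is x, second is y
xy(y = y1, x = x1)    # named arguments may come in any order; same result
xy(x1)                # y is missing, so the default y = x gives t(x1) %*% x1
```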

Least Squares Fitting

•  Y(i,j) = a(j) + b(j)*STATUS(i)
•  the indices i and j here can be:
   •  i goes through experiments (say from 1 to "n")
   •  j goes through probes (possibly through probesets, say from 1 to "m")
•  in other words, for each j, find the a(j) and b(j) that minimize:

$$\sum_{i=1}^{n} \left( Y_{i,j} - (a_j + b_j\,\mathrm{STATUS}_i) \right)^2$$

•  take the derivative w.r.t. the coefficients a and b and set them to zero:

$$\sum_{i=1}^{n} Y_{i,j} = n\,a_j + b_j \sum_{i=1}^{n} \mathrm{STATUS}_i$$

$$\sum_{i=1}^{n} \mathrm{STATUS}_i\,Y_{i,j} = a_j \sum_{i=1}^{n} \mathrm{STATUS}_i + b_j \sum_{i=1}^{n} (\mathrm{STATUS}_i)^2$$

Least Squares Fitting …

•  in a matrix form:

$$\begin{pmatrix} n & \sum_{i=1}^{n} \mathrm{STATUS}_i \\ \sum_{i=1}^{n} \mathrm{STATUS}_i & \sum_{i=1}^{n} (\mathrm{STATUS}_i)^2 \end{pmatrix}$$

•  the (2,2)th element of its inverse matrix is:

$$\frac{1}{\sum_{i=1}^{n} (\mathrm{STATUS}_i)^2 - \dfrac{\left( \sum_{i=1}^{n} \mathrm{STATUS}_i \right)^2}{n}}$$

Least Squares Fitting …

lr <- function( y, x ) {
  nobs <- if (is.null(dim(x))) rep(1,length(x)) else rep(1,nrow(x))
  x <- cbind( nobs, x )
  solve( crossprod( x, x ) ) %*% t( x ) %*% y
}
> cases
 [1] 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0
> dim( Y )
[1]  30 400
> coefficients <- lr( Y, cases )
> dim( coefficients )
[1]   2 400
•  crossprod( X, X ) = t( X ) %*% X
•  solve( crossprod( X, X ) ) – the inverse of the matrix
•  coefficients[2,] -- the "slope" of the linear models
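The slide's Y and cases are not available, so the following sketch runs the same lr() on simulated data and cross-checks it against lm(). The sizes, the seed, the 0/1 pattern in cases, and the true slopes are all invented for the example.

```r
lr <- function(y, x) {
  nobs <- if (is.null(dim(x))) rep(1, length(x)) else rep(1, nrow(x))
  x <- cbind(nobs, x)                          # design matrix: intercept column + predictor(s)
  solve(crossprod(x, x)) %*% t(x) %*% y        # (X'X)^{-1} X'Y, one column per response
}

set.seed(42)                                   # arbitrary seed for reproducibility
n <- 30; m <- 4                                # 30 experiments, 4 probes (made-up sizes)
cases <- rep(c(0, 1), length.out = n)          # made-up 0/1 status vector
Y <- sapply(1:m, function(j) 1 + j * cases + rnorm(n, sd = 0.1))  # true slope of probe j is j

coefficients <- lr(Y, cases)
dim(coefficients)                              # 2 4
round(coefficients[2, ], 2)                    # fitted slopes, close to 1 2 3 4

# Cross-check against R's built-in fitter; lm() accepts a matrix response.
fit <- lm(Y ~ cases)
all.equal(unname(coefficients), unname(coef(fit)))
```

Fitting all columns of Y with one matrix product is what makes this approach attractive when there are many probes.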

Least Squares Fitting …

•  the standard error for the "slope" is

$$\sqrt{ \frac{\hat{\sigma}_j^2}{\sum_{i=1}^{n} (\mathrm{STATUS}_i)^2 - \dfrac{\left( \sum_{i=1}^{n} \mathrm{STATUS}_i \right)^2}{n}} }$$

where an estimator for the variance could be:

$$\hat{\sigma}_j^2 = \sum_{i=1}^{n} \left( Y_{i,j} - \hat{Y}_{i,j} \right)^2 / (n-1)$$

lr <- function(y, x) {
  nobs <- if (is.null(dim(x))) rep(1,length(x)) else rep(1,nrow(x))
  x1 <- cbind(nobs, x)
  invA <- solve( crossprod(x1, x1) )
  coefficients <- solve(crossprod(x1, x1)) %*% t(x1) %*% y
  sde_b <- sqrt(colSums( (y - (x1 %*% coefficients))^2 ) / ( dim(x1)[1]-1) * invA[2,2] )
  return ( list( coefficients, sde_b ) )
}

Least Squares Fitting …

> results <- lr ( y = Y, x = cases)
> length( results )
[1] 2
> dim( results[[1]] )
[1]   2 400
> length( results[[2]] )
[1] 400

•  results[[1]] – estimates of the linear model coefficients
•  results[[2]] – standard error of the "slope" coefficients
•  There are functions with an "empty" argument list
•  when the argument of a function is "…", it matches all the arguments that are not matched by another formal argument
•  a "function" can be passed as an argument of other functions
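The last three bullets can be sketched as follows. All three helper functions (`greet`, `mean_and_n`, `apply_twice`) are hypothetical names made up for this example.

```r
# A function with an empty argument list.
greet <- function() "hello"
greet()                                  # "hello"

# "..." collects every argument that is not matched by a named formal argument
# and can be forwarded to another function.
mean_and_n <- function(x, ...) {
  c(mean = mean(x, ...), n = length(x))
}
v <- c(1, 2, 3, 100, NA)
mean_and_n(v, na.rm = TRUE)              # na.rm is passed on to mean(): mean = 26.5
mean_and_n(v, na.rm = TRUE, trim = 0.25) # trim is also passed on: mean = 2.5

# A function passed as an argument of another function.
apply_twice <- function(f, x) f(f(x))
apply_twice(sqrt, 16)                    # sqrt(sqrt(16)) = 2
```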

Functions as Arguments

•  Functions can be used as arguments:
> cases <- sapply(
+     runif( 6 ),
+     function( r ) { if (r > .5) rep(1, times=4) else rep(0, times=4) } )
> cases
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    0    1    0    0
[2,]    1    1    0    1    0    0
[3,]    1    1    0    1    0    0
[4,]    1    1    0    1    0    0

How to Access Functions in a list

•  Treat functions as objects and store them as elements of a list
> means <- list( f1 = function( x ) { mean(x, trim = 0.05) } ,
+                f2 = function( x ) { mean(x, trim = 0.10) } ,
+                f3 = function( x ) { mean(x, trim = 0.15) } )
> a <- rnorm( 1000 )
> means[[1]] ( a )
[1] -0.04375495
> means[[2]] ( a )
[1] -0.04692238
> means[[3]] ( a )
[1] -0.04734544

Functions with arguments that are functions

•  Functions: apply, lapply, sapply, and vapply:
   lapply ( data, function )
   sapply ( data, function ); vapply ( data, function, tmp_value )
   •  "data" can be a vector or a list
   •  these functions return a new list (or a simplified structure such as a vector) by applying the "function" to each of the input components
   •  vapply is very similar to sapply, but it also checks that each result matches the type and length of "tmp_value"
> x <- list( a = c(1:4), b = c(10:12), c = c(0.1:0.5) )
> x
$a
[1] 1 2 3 4

$b
[1] 10 11 12

$c
[1] 0.1

(0.1:0.5 counts up from 0.1 in steps of 1, so it contains only 0.1.)

Applying Functions

functions lapply, sapply, and vapply continued …

> lapply( x, mean )
$a
[1] 2.5

$b
[1] 11

$c
[1] 0.1

> sapply( x, mean )
   a    b    c
 2.5 11.0  0.1

What happens if one uses "vapply"?
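One way to answer the question, as a sketch: vapply() takes a template for a single result as its third argument. With a matching template it returns what sapply() returns; with a mismatched template it stops with an error. The message stored in res is written for this example and is not R's own wording.

```r
x <- list(a = 1:4, b = 10:12, c = 0.1)

# numeric(1) is the template: every result must be one number.
vapply(x, mean, numeric(1))       # a = 2.5, b = 11.0, c = 0.1, same as sapply here

# range() returns two numbers per component, so the template must say so.
vapply(x, range, numeric(2))      # a 2 x 3 matrix, one column per component

# A template that does not match the results is an error, not a silent change of shape.
res <- tryCatch(vapply(x, range, numeric(1)),
                error = function(e) "vapply stopped: result does not match template")
res
```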

Applying Functions …

function apply: apply( mat, margin, function )
•  "mat": a matrix or an array
•  "margin": the subscript (dimension) over which the "function" is applied, e.g. 1 for rows and 2 for columns

> x <- matrix( rnorm( 100 ), nrow = 20, ncol = 5 )
> x
             [,1]        [,2]       [,3]        [,4]        [,5]
 [1,]  0.12854081  0.83905886 -2.4188850 -0.40231317  0.45699871
 [2,]  0.48828276  0.49086884 -1.6613223 -0.30584325 -0.95571975
 [3,] -0.28471730 -0.30888101 -1.3912699  0.29120769  1.42745212
 [4,] -1.68301946  0.93181591  1.4815138  0.59722151  0.80390538
 [5,]  0.91531660  0.90153731  1.4904788  0.36291027 -0.90925853
...
[17,] -1.71004865  0.14872353  0.9783127  0.59204323 -0.03233043
[18,] -0.35315308  1.38312362 -0.3097932  0.03894128 -0.32933839
[19,] -0.40997310  2.00611129 -1.2574482 -0.17318760  0.92358798
[20,]  0.32112173 -0.29736373 -0.3486134  0.49357963 -0.25097612

> apply( x, 2, var )
[1] 1.6209041 0.9772905 1.5097936 0.4183222 0.8069382
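A deterministic sketch of the margin argument on a small matrix, so the results can be checked by hand.

```r
m <- matrix(1:6, nrow = 2)     # 2 x 3 matrix: rows (1,3,5) and (2,4,6)

apply(m, 1, sum)               # margin 1 = rows:    9 12
apply(m, 2, sum)               # margin 2 = columns: 3 7 11
apply(m, 2, function(col) max(col) - min(col))   # anonymous function per column: 1 1 1

# For sums and means the dedicated functions give the same answers.
rowSums(m)
colMeans(m)
```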

Applying Functions …

> dim( mat1 )
[1] 100 100
> mode( mat1 )
[1] "numeric"
# for each row, find the proportion of the data that fall in the tails
# ( 2.5% on each side )
> ind <- apply( mat1, 1, function( row ) {
+   sum(row > qnorm(.975) | row < qnorm(.025)) / length(row)
+ } )
> length( ind )
[1] 100
> ind[ 1:10 ]
 [1] 0.02 0.01 0.08 0.06 0.05 0.03 0.07 0.05 0.04 0.04

Applying a Function to Multiple Elements

tapply( x, ind, fun ): apply the "fun" to each cell of a ragged array
> ( groups <- factor(c(1,2,0,1,0,0,1,0,0,1, 2, 1, 0, 0, 1), levels = 0:2) )
 [1] 1 2 0 1 0 0 1 0 0 1 2 1 0 0 1
Levels: 0 1 2
> weight <- c(1.4474550, 1.5670582, 1.520615, 1.5407738, 2.042491, 1.864607, 0.5865089, 1.388697,
+             2.128079, 1.6252041, 1.7806790, 2.9572416, 1.8650490, 0.9782662, 0.2721176)
> table( groups )
groups
0 1 2
7 6 2
> tapply( weight, groups, mean )
       0        1        2
1.683972 1.404883 1.673869
> tapply( weight, groups, sd )
        0         1         2
0.4088762 0.9414384 0.1510527

Contingency Table

build a contingency table with tapply( … ):
> (test <- data.frame( weight, groups ))
      weight groups
1  1.4474550      1
2  1.5670582      2
3  1.5206150      0
4  1.5407738      1
5  2.0424910      0
6  1.8646070      0
7  0.5865089      1
8  1.3886970      0
9  2.1280790      0
10 1.6252041      1
11 1.7806790      2
12 2.9572416      1
13 1.8650490      0
14 0.9782662      0
15 0.2721176      1

> tapply( test$weight, test$groups, sum )
        0         1         2
11.787804  8.429301  3.347737

•  print ( pi, digits=10)
   [1] 3.141592654

•  aa <- seq(0,1,length.out=30)
   is.na ( aa ) <- aa > 0.75
   print ( aa, na.print = ".", digits = 3)
    [1] 0.0000 0.0345 0.0690 0.1034 0.1379 0.1724 0.2069 0.2414 0.2759 0.3103 0.3448
   [12] 0.3793 0.4138 0.4483 0.4828 0.5172 0.5517 0.5862 0.6207 0.6552 0.6897 0.7241
   [23] . . . . . . . .

•  options ( "digits" = 3 )
   cat(aa, fill = 30, labels = paste("[", 1:10, "]:", sep=""))
   [1]: 0 0.0345 0.069 0.103
   [2]: 0.138 0.172 0.207 0.241
   [3]: 0.276 0.310 0.345 0.379
   [4]: 0.414 0.448 0.483 0.517
   [5]: 0.552 0.586 0.621 0.655
   [6]: 0.69 0.724 NA NA NA NA
   [7]: NA NA NA NA

Basic R Print and Concatenate Functions
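
The digits argument changes only how a value is displayed, not the stored value. A minimal sketch of that distinction, together with the replacement form of is.na() used above (the variable name shown is illustrative):

```r
aa <- seq(0, 1, length.out = 30)
is.na(aa) <- aa > 0.75          # replacement form: sets the selected elements to NA
sum(is.na(aa))                  # 8 values exceed 0.75

# print() leaves the data untouched; digits only affects the display
shown <- capture.output(print(pi, digits = 10))
shown                           # "[1] 3.141592654"

# format() returns the formatted string instead of printing it
format(pi, digits = 3)          # "3.14"
```

Use format() or sprintf() when the formatted text is needed as a value, for example inside paste() or a plot label.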


•  pi   # with the default number of digits to print
   [1] 3.1415927

•  cat (letters, fill = 20, labels = paste("[", 1:10, "]:", sep=""))
   [1]: a b c d e f g
   [2]: h i j k l m n
   [3]: o p q r s t u
   [4]: v w x y z

•  cat (LETTERS, fill = 20, labels = paste("[", 1:10, "]:", sep=""))
   [1]: A B C D E F G
   [2]: H I J K L M N
   [3]: O P Q R S T U
   [4]: V W X Y Z

•  month.name
   [1] "January"   "February"  "March"     "April"     "May"       "June"
   [7] "July"      "August"    "September" "October"   "November"  "December"

•  month.abb
   [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

Basic R Print and Concatenate Functions …
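
The labels passed to cat() are built with paste(), which is vectorized over its arguments. A short sketch of how the pieces fit together (the names labs and out are illustrative):

```r
# paste() recycles "[" and "]:" against 1:4, giving one label per output line
labs <- paste("[", 1:4, "]:", sep = "")
labs                                   # "[1]:" "[2]:" "[3]:" "[4]:"

# capture what cat() writes so the lines can be inspected as a character vector
out <- capture.output(cat(letters, fill = 20, labels = labs))
out[1]                                 # first output line, starting with "[1]:"

# the built-in constants are ordinary character vectors
paste(month.abb[1:3], collapse = "-")  # "Jan-Feb-Mar"
```

The fill argument sets the line width at which cat() starts a new line, and each new line begins with the next label.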


•  There are many existing packages that facilitate creating and developing graphics in R:

•  ggplot2: http://had.co.nz/ggplot2/
•  lattice
•  plotrix
•  rgl (a 3D real-time rendering device driver system for R): http://rgl.neoscientists.org/gallery.shtml (a quick demo of “rgl” will be shown)

•  Many others: see http://cran.r-project.org/web/views/Graphics.html

R Graphics Packages


3D R Graphics from RGL


data(volcano)           ## from persp in {graphics}
z <- 2 * volcano        # Exaggerate the relief
x <- 10 * (1:nrow(z))   # 10 meter spacing (S to N)
y <- 10 * (1:ncol(z))   # 10 meter spacing (E to W)
## Don't draw the grid lines : border = NA
par(bg = "white")
## draws perspective plots of a surface over the x–y plane
persp(x, y, z, theta = 135, phi = 30, col = "green3", scale = FALSE,
      ltheta = -120, shade = 0.75, border = NA, box = FALSE)

###########################################################################
library(rgl)
z <- 2 * volcano        # Exaggerate the relief
x <- 10 * (1:nrow(z))   # 10 meter spacing (S to N)
y <- 10 * (1:ncol(z))   # 10 meter spacing (E to W)
zlim <- range(z)                     # range of the heights
zlen <- zlim[2] - zlim[1] + 1
colorlut <- terrain.colors(zlen)     # height color lookup table
col <- colorlut[ z-zlim[1]+1 ]       # assign colors to heights for each point
open3d()
surface3d(x, y, z, color=col, back="lines")

open3d()
x <- sort(rnorm(1000))
y <- rnorm(1000)
z <- rnorm(1000) + atan2(x,y)
plot3d(x, y, z, col=rainbow(1000))
## to produce rotation of the plot

R Graphics
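
The color assignment for the surface can be checked without opening a graphics device. This is a minimal sketch, assuming the lookup table is meant to span the range of the heights z (one color per integer height step); it uses only base R:

```r
z <- 2 * volcano                     # volcano ships with R (datasets package)
zlim <- range(z)                     # lowest and highest (scaled) elevation
zlen <- zlim[2] - zlim[1] + 1        # one color per integer height step
colorlut <- terrain.colors(zlen)     # lookup table with zlen colors
col <- colorlut[z - zlim[1] + 1]     # index 1 = lowest point, zlen = highest point

length(col) == length(z)             # TRUE: one color per grid point
anyNA(col)                           # FALSE: every height maps to a color
```

Because the volcano heights are whole numbers, z - zlim[1] + 1 is always a valid integer index into the lookup table.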




•  User-defined variables
•  Workspace: a region of memory
•  Useful functions:

> objects()
[1] "my.mean"  "my.range" "x"        "y"
> remove(x,y)
> objects()
[1] "my.mean"  "my.range"
> search()

The search() function lists the attached packages and other attached R objects (the search path).

•  It is recommended to clean up after yourself from time to time if you intend to save the workspace.

Cleaning Up Memory
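
The same listing and removal can be tried safely in a scratch environment. In this sketch ws is a hypothetical environment standing in for the workspace, so nothing in the real workspace is touched; ls() and rm() are the same functions as objects() and remove():

```r
ws <- new.env()                        # hypothetical scratch "workspace"
assign("x", 1:10, envir = ws)
assign("y", rnorm(10), envir = ws)
assign("my.mean", function(v) sum(v) / length(v), envir = ws)

ls(ws)                                 # "my.mean" "x" "y"
rm(list = c("x", "y"), envir = ws)     # same effect as remove(x, y) in the workspace
ls(ws)                                 # "my.mean"

# rm(list = ls()) would clear the entire global workspace; use with care
```

Passing names through list = is convenient when the objects to remove are selected programmatically, for example with ls(pattern = ...).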


traceback(): shows the sequence of function calls culminating in the error.

To stop a running command: press the Esc (escape) key in S-PLUS, or in R click the Stop button in the toolbar.

Tips When Things go Wrong
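
traceback() is meant for interactive use, right after an error at the prompt. The sketch below sets up a hypothetical three-level call chain (top_fn, mid_fn and low_fn are made-up names) and, since a script has no prompt, records the call stack with withCallingHandlers() and sys.calls() instead; this is one possible approach, not the only one:

```r
# Hypothetical three-level call chain, used only for illustration.
low_fn <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  sqrt(x)
}
mid_fn <- function(x) low_fn(x)
top_fn <- function(x) mid_fn(x)

# Interactively: run top_fn("a"), then call traceback() to list the calls
# that led to the error.

# In a script: record the call stack at the moment the error is signalled.
calls <- NULL
result <- tryCatch(
  withCallingHandlers(
    top_fn("a"),
    error = function(e) calls <<- sys.calls()
  ),
  error = function(e) conditionMessage(e)
)
stack <- vapply(as.list(calls), function(cl) deparse(cl)[1], character(1))

result                          # "x must be numeric"
any(grepl("^low_fn", stack))    # TRUE: low_fn appears in the recorded stack
```

The recorded stack also contains the tryCatch() and handler frames, so it is longer than what traceback() would print interactively.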