R Basics
Xudong Zou
Prof. Yundong Wu
Dr. Zhiqiang Ye
18th Dec. 2013
R Basics
History of R language
How to use R
Data type and Data Structure
Data input
R programming
Summary
Case study
What is R?
• R is a programming language and also an environment for statistical analysis and graphics
Why use R
• R is open source and free. It currently contains 5088 packages, which make R a powerful tool for financial analysis, bioinformatics, social network analysis, natural language processing and so on.
• More and more people in science tend to learn and use R
# Bioconductor: bioinformatics analysis (microarray)
# survival: survival analysis
Data type and Data structure
Data type in R:
numeric: integer, double (R stores all floating-point numbers in double precision; there is no separate single-precision type)
character
complex
logical
Data structure in R:
Objects      Class                                         Mixed-class permitted?
Vector       numeric, char, complex, logical               no
Factor       numeric, char                                 no
Array        numeric, char, complex, logical               no
Matrix       numeric, char, complex, logical               no
Data frame   numeric, char, complex, logical               yes
List         numeric, char, complex, logical, func, exp…   yes
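The data types and the "mixed-class permitted?" column can be checked directly in an R session. The following is a minimal sketch; the variable names mixed and kept are made up for illustration.

```r
# typeof() reports the storage type, class() the class
typeof(1L)      # "integer"
typeof(1.5)     # "double"
class(1.5)      # "numeric"
class("a")      # "character"
class(1 + 2i)   # "complex"
class(TRUE)     # "logical"

# A vector cannot mix classes: all values are coerced to one common type
mixed <- c(1, "a", TRUE)
class(mixed)    # "character"

# A list keeps the class of each component
kept <- list(1, "a", TRUE)
sapply(kept, class)   # "numeric" "character" "logical"
```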
Vector and vector operation
A vector is the simplest data structure in R: a single entity containing a collection of numbers, characters, complex numbers or logical values.
# Create two vectors:
# Check the attributes:
# basic operation on vector:
Note the left-pointing arrow (<-), the assignment operator.
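The code for these three steps is not part of the transcript. A minimal sketch of what they might look like follows; the values of vec1 and vec2 are hypothetical, not the ones from the original slide.

```r
# Create two vectors (hypothetical values) with the assignment arrow <-
vec1 <- c(2, 4, 6, 8, 10, 12)
vec2 <- c("geneA", "geneB", "geneC")

# Check the attributes
class(vec1)    # "numeric"
length(vec1)   # 6
class(vec2)    # "character"

# Basic operations on a vector are applied element by element
vec1 + 1       # 3 5 7 9 11 13
vec1 * 2       # 4 8 12 16 20 24
```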
Vector and vector operation
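# basic operation on vector:
> max(vec1)
> min(vec1)
> mean(vec1)
> median(vec1)
> sum(vec1)
> summary(vec1)
> vec1                    # print the whole vector
> vec1[1]                 # first element (indices start at 1)
> x <- vec1[-1]; x[1]     # a negative index drops that element
> vec1[7] <- 15; vec1     # assign to position 7, then print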
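With a concrete vector the results of these calls are easy to verify by hand. The sketch below assumes a hypothetical six-element vec1; the slide's own values are not known.

```r
vec1 <- c(2, 4, 6, 8, 10, 12)   # hypothetical values

max(vec1)      # 12
min(vec1)      # 2
mean(vec1)     # 7
median(vec1)   # 7
sum(vec1)      # 42

vec1[1]        # 2: indexing starts at 1
x <- vec1[-1]  # drop the first element
x[1]           # 4

# Assigning past the end extends the vector
vec1[7] <- 15
length(vec1)   # 7
```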
array and matrix
> x <- 1:24
> dim(x) <- c(4,6)     # create a 2D array with 4 rows and 6 columns
> dim(x) <- c(2,3,4)   # create a 3D array
An array can be considered as a multiply subscripted collection of data entries.
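Setting dim() does not change the data, only how the subscripts map onto it. R fills arrays column by column (the first subscript varies fastest), which this small sketch makes visible:

```r
x <- 1:24

# 2D: 4 rows, 6 columns, filled column by column
dim(x) <- c(4, 6)
x[2, 3]      # 10, i.e. element number 2 + (3 - 1) * 4

# 3D: the same 24 values viewed as 2 x 3 x 4
dim(x) <- c(2, 3, 4)
x[1, 1, 2]   # 7, the first element of the second 2 x 3 slice
x[2, 3, 4]   # 24, the last element
```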
array and matrix
array()
> x <- 1:24
> array(data=x, dim=c(4,6))
> array(x, dim=c(2,3,4))
array indexing
> x <- 1:24
> y <- array(data=x, dim=c(2,3,4))
> y[1,1,1]
> y[,,2]
> y[,,1:2]
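Leaving a subscript empty selects everything along that dimension. A short sketch of what the indexing expressions return for the 2 x 3 x 4 array; the names slice2 and first_two are made up for illustration.

```r
y <- array(data = 1:24, dim = c(2, 3, 4))

y[1, 1, 1]               # 1: a single element

slice2 <- y[, , 2]       # the second 2 x 3 slice, values 7..12
dim(slice2)              # 2 3

first_two <- y[, , 1:2]  # the first two slices, still a 3D array
dim(first_two)           # 2 3 2

# Dimensions of extent 1 are dropped unless drop = FALSE is given
dim(y[, , 2, drop = FALSE])   # 2 3 1
```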
array and matrix
A matrix is a special case of an array: one with exactly 2 dimensions.
> class(potentials)     # "matrix" (R >= 4.0.0 returns "matrix" "array")
> dim(potentials)       # 20 20
> rownames(potentials)  # GLY ALA SER …
> colnames(potentials)  # GLY ALA SER …
> min(potentials)       # -4.4
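The potentials object itself is not defined in the transcript. As a stand-in, the sketch below builds a hypothetical 3 x 3 matrix with residue names as row and column names; the numbers are invented and are not taken from the real potential table.

```r
# Hypothetical 3 x 3 stand-in for the 20 x 20 `potentials` matrix
residues <- c("GLY", "ALA", "SER")
m <- matrix(c(-1.0,  0.2,  0.5,
               0.2, -2.5,  0.1,
               0.5,  0.1, -0.3),
            nrow = 3, dimnames = list(residues, residues))

is.matrix(m)      # TRUE
dim(m)            # 3 3
rownames(m)       # "GLY" "ALA" "SER"
min(m)            # -2.5
m["ALA", "ALA"]   # -2.5: index by row and column name

# Row/column position of the minimum
which(m == min(m), arr.ind = TRUE)
```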
list
A list is an object that contains other objects as its components, which can be a numeric vector, a logical value, a character string, another list, and so on. The components of a list do not need to be of one type; they can be of mixed types.
> Lst <- list(drugName="warfarin", no.target=3, price=500,
+             symb.target=c("geneA","geneB","geneC"))
> length(Lst)          # 4
> attributes(Lst)
> names(Lst)
> Lst[[1]]
> Lst[["drugName"]]
> Lst$drugName
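The three access forms return the same component, while single brackets return a sub-list instead. A runnable version of the slide's example, with the closing parenthesis of list() in place:

```r
Lst <- list(drugName = "warfarin", no.target = 3, price = 500,
            symb.target = c("geneA", "geneB", "geneC"))

length(Lst)           # 4
names(Lst)            # "drugName" "no.target" "price" "symb.target"

# Three equivalent ways to extract one component
Lst[[1]]              # "warfarin"
Lst[["drugName"]]     # "warfarin"
Lst$drugName          # "warfarin"

# Single brackets keep the list wrapper
class(Lst[1])         # "list"

# Components can be indexed further
Lst$symb.target[2]    # "geneB"
```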
Data Frame
A data frame is a list with some restrictions:
① The components must be vectors, factors, numeric matrices, lists or other data frames.
② Numeric vectors, logicals and factors are included as is. In R versions before 4.0.0, character vectors are coerced to factors by default, with levels equal to the unique values appearing in the vector; since R 4.0.0 they stay character unless stringsAsFactors=TRUE is set.
③ Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same number of rows.
Names of components
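These restrictions can be seen on a small data frame. The sketch below uses invented data (the names df, gene, expr and hit are hypothetical) and sets stringsAsFactors explicitly, so the result does not depend on the R version.

```r
# Hypothetical data; stringsAsFactors = TRUE requests the factor coercion
df <- data.frame(gene = c("geneA", "geneB", "geneC"),
                 expr = c(1.2, 3.4, 5.6),
                 hit  = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = TRUE)

names(df)          # "gene" "expr" "hit": names of the components
class(df$gene)     # "factor"
levels(df$gene)    # "geneA" "geneB" "geneC"
class(df$expr)     # "numeric": included as is

# Components of incompatible length are rejected
res <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(res, "try-error")   # TRUE
```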
Data Frame
> names(cars)
[1] "speed" "dist"
> length(cars)          # 2
> cars[[1]]
> cars$speed            # recommended
> attach(cars)          # put the data frame on the search path so columns can be used by name
> detach(cars)          # remove it from the search path again
> summary(cars$speed)   # do what we can do for a vector
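The built-in cars data set (columns speed and dist) can be used to try these calls. The sketch also shows with(), which evaluates one expression inside the data frame; it is offered here as a possible alternative to attach()/detach(), not as something the slides prescribe.

```r
names(cars)     # "speed" "dist"
length(cars)    # 2: a data frame is a list of its columns
nrow(cars)      # 50

# A column is an ordinary vector
first_col <- cars[[1]]
identical(first_col, cars$speed)   # TRUE
summary(cars$speed)

# with() gives access to columns by name for one expression only
avg_speed <- with(cars, mean(speed))
```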
Data Input
scan(file, what=double(), sep="", …)
# scan returns a vector whose data type is the one given in what.
read.table(file, header=FALSE, sep="", row.names, col.names, …)
# read.table returns a data.frame object
# my_data.frame <- read.table("MULTIPOT_lu.txt", row.names=1, header=TRUE)
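Both functions can be tried without any external file by writing small temporary files first. This is a sketch with invented contents; the file MULTIPOT_lu.txt from the slide is not reproduced here, and the column name residue is hypothetical.

```r
# scan(): read whitespace-separated numbers into a vector
tmp_num <- tempfile(fileext = ".txt")
writeLines(c("1.5 2.5 3.5", "4.5 5.5 6.5"), tmp_num)
nums <- scan(tmp_num, what = double(), quiet = TRUE)
nums          # 1.5 2.5 3.5 4.5 5.5 6.5

# read.table(): read a table with a header line; column 1 becomes the row names
tmp_tab <- tempfile(fileext = ".txt")
writeLines(c("residue GLY ALA SER",
             "GLY -1.0 0.2 0.5",
             "ALA 0.2 -2.5 0.1",
             "SER 0.5 0.1 -0.3"), tmp_tab)
tab <- read.table(tmp_tab, header = TRUE, row.names = 1)
class(tab)           # "data.frame"
tab["ALA", "ALA"]    # -2.5

# read.table gives a data frame; as.matrix() converts it to a numeric matrix
tab_matrix <- as.matrix(tab)
```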
From other software
# from SPSS and SAS
library(Hmisc)      # load package
mydata <- spss.get("test.file", use.value.labels=TRUE)
mydata <- sasxport.get("test.file")
# from Stata and Systat
library(foreign)    # load package
mydata <- read.dta("test.file")
mydata <- read.systat("test.file")
# from Excel
library(RODBC)      # load package
channel <- odbcConnectExcel("D:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet")
odbcClose(channel)
R Programming
Function
Definition:
Example:
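matrix.axes <- function(data) {
  # Tick positions evenly spaced on [0, 1], one per row, labelled with the row names
  x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1);
  axis(side=1, at=x, labels=rownames(data), las=2);
  # The same for the columns on the y axis
  x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1);
  axis(side=2, at=x, labels=colnames(data), las=2);
}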
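The definition itself is missing from the transcript. In R a function is created with the function keyword and assigned to a name like any other object, and the value of the last expression is returned. The sketch below uses two hypothetical helpers: tick_positions repeats the tick computation from matrix.axes without plotting, and rescale shows arguments with default values.

```r
# General form: name <- function(arguments) { body }
tick_positions <- function(n) {
  # n evenly spaced positions from 0 to 1; the last value is returned
  (seq_len(n) - 1) / (n - 1)
}
tick_positions(5)    # 0.00 0.25 0.50 0.75 1.00

# Arguments can have default values
rescale <- function(x, lower = 0, upper = 1) {
  lower + (x - min(x)) / (max(x) - min(x)) * (upper - lower)
}
rescale(c(2, 4, 6))               # 0.0 0.5 1.0
rescale(c(2, 4, 6), upper = 10)   # 0 5 10
```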
Summary
Data type and Data Structure
numeric, character, complex, logical
vector, array/matrix, list, data frame
Data Input
scan, read.table
load from other software: SPSS, SAS, Excel
Operators: <-
R Programming:
Case study
Residue-based protein-protein interaction potential analysis:
Lu et al. (2003) Development of Unified Statistical Potentials Describing Protein-Protein Interactions, Biophysical Journal 84(3), pp. 1895-1901
Reference
CRAN-Manual: http://cran.r-project.org/
Quick-R: http://www.statmethods.net/index.html
R tutorial: http://www.r-tutor.com/
MOAC: http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/matrix_contour/