R Basics

Xudong Zou, Prof. Yundong Wu, Dr. Zhiqiang Ye
18th Dec. 2013


2

R Basics

History of R language

How to use R

Data type and Data Structure

Data input

R programming

Summary

Case study

3

History of R language

Robert Gentleman and Ross Ihaka (the original authors of R)

2013-09-25: Version R-3.0.2

What is R?
• R is a programming language and also an environment for statistical analysis and graphics.

Why use R?
• R is open source and free. It currently has 5088 packages, which make R a powerful tool for financial analysis, bioinformatics, social network analysis, natural language processing, and so on.
• More and more people in science are learning and using R.

# Bioconductor: bioinformatics analysis (e.g. microarray)
# survival: survival analysis

How to use R

Console: enter commands here
Use ? to get help
Create a new R script or open an existing one
Click here to install R packages
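The help and package steps can also be done from the console. The snippet below is a minimal sketch; the choice of `sd` and of the `survival` package (named earlier in the slides) is only for illustration.

```r
# Open the help page for a function (both forms are equivalent);
# run these interactively
# ?mean
# help("mean")

# Inspect a function's arguments without leaving the console
args(sd)

# Load an installed package; 'stats' ships with every R installation
library(stats)
attached <- search()   # packages and environments currently attached

# Installing from CRAN needs network access, so this line is left commented out
# install.packages("survival")
```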

Data type and Data structure

Data type in R:
numeric (integer, single float, double float), character, complex, logical

Data structure in R:

Objects      Class                                            Mixed-class permitted?
Vector       numeric, char, complex, logical                  no
Factor       numeric, char                                    no
Array        numeric, char, complex, logical                  no
Matrix       numeric, char, complex, logical                  no
Data frame   numeric, char, complex, logical                  yes
List         numeric, char, complex, logical, func, exp…      yes
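The "mixed-class permitted?" column can be checked directly in the console. This is a small sketch with made-up values; the variable names are hypothetical.

```r
# The four basic data types
class(3.14)        # "numeric"
class("warfarin")  # "character"
class(1+2i)        # "complex"
class(TRUE)        # "logical"

# A vector holds a single type: mixed values are coerced to a common type
mixed_vec <- c(1, "a", TRUE)
class(mixed_vec)   # "character"

# A list keeps each component's own type
mixed_list <- list(1, "a", TRUE)
list_classes <- sapply(mixed_list, class)
list_classes       # "numeric" "character" "logical"

# A data frame allows a different type in each column
df <- data.frame(id = 1:2, score = c(0.5, 1.5))
sapply(df, class)
```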

28

Vector and vector operation

A vector is the simplest data structure in R: a single entity containing a collection of numbers, characters, complex numbers, or logical values.

# Create two vectors:

# Check the attributes:

# Basic operations on a vector:

Note the left-pointing arrow (<-): it is the assignment operator.
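The vectors shown on the original slide are not preserved in this transcript. The sketch below uses hypothetical values for `vec1` and `vec2` to illustrate creating vectors with `<-`, checking their attributes, and element-wise arithmetic.

```r
# Hypothetical example values; not the ones from the original slide
vec1 <- c(2, 4, 6, 8, 10, 12)
vec2 <- c("geneA", "geneB", "geneC")

# Check the attributes
class(vec1)    # "numeric"
length(vec1)   # 6
class(vec2)    # "character"

# Arithmetic is applied element by element
doubled <- vec1 * 2
doubled        # 4 8 12 16 20 24
vec1 + vec1    # 4 8 12 16 20 24
```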

29

Vector and vector operation

# basic operation on vector:

> max(vec1)
> min(vec1)
> mean(vec1)
> median(vec1)
> sum(vec1)
> summary(vec1)

> vec1
> vec1[1]
> x <- vec1[-1]; x[1]
> vec1[7] <- 15; vec1

30

array and matrix

An array can be considered as a multiply subscripted collection of data entries.

> x <- 1:24
> dim(x) <- c(4, 6)     # create a 2D array with 4 rows and 6 columns
> dim(x) <- c(2, 3, 4)  # create a 3D array

31

array and matrix

array()

> x <- 1:24
> array(data = x, dim = c(4, 6))
> array(x, dim = c(2, 3, 4))

array indexing

> x <- 1:24
> y <- array(data = x, dim = c(2, 3, 4))
> y[1, 1, 1]
> y[, , 2]
> y[, , 1:2]

32

array and matrix

A matrix is a special case of an array: one with exactly two dimensions.

> class(potentials)     # "matrix" (R >= 4.0 returns "matrix" "array")
> dim(potentials)       # 20 20
> rownames(potentials)  # GLY ALA SER …
> colnames(potentials)  # GLY ALA SER …
> min(potentials)       # -4.4
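The `potentials` matrix itself is not included in the transcript. As a stand-in, this sketch builds a hypothetical 3 x 3 matrix with made-up values and residue names as row and column labels, then runs the same kind of queries.

```r
# Hypothetical 3 x 3 matrix; the numbers are invented for illustration
residues <- c("GLY", "ALA", "SER")
potentials <- matrix(c(-0.5,  0.2,  0.1,
                        0.2, -1.3,  0.4,
                        0.1,  0.4, -0.8),
                     nrow = 3, dimnames = list(residues, residues))

is.matrix(potentials)      # TRUE
dim(potentials)            # 3 3
rownames(potentials)       # "GLY" "ALA" "SER"
potentials["GLY", "ALA"]   # 0.2
min(potentials)            # -1.3
```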

33

list

A list is an object that contains other objects as its components. A component can be a numeric vector, a logical value, a character string, another list, and so on. The components of a list do not need to be of one type; they can be mixed.

> Lst <- list(drugName = "warfarin", no.target = 3, price = 500,
+             symb.target = c("geneA", "geneB", "geneC"))

> length(Lst)          # 4
> attributes(Lst)
> names(Lst)
> Lst[[1]]
> Lst[["drugName"]]
> Lst$drugName
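One point worth illustrating is the difference between single and double brackets on a list. The sketch below reuses the list from the slide; the `approved` component is a hypothetical addition.

```r
Lst <- list(drugName = "warfarin", no.target = 3, price = 500,
            symb.target = c("geneA", "geneB", "geneC"))

# Single brackets return a sub-list; double brackets return the component itself
class(Lst["drugName"])     # "list"
class(Lst[["drugName"]])   # "character"

# Components can be indexed further once extracted
Lst$symb.target[2]         # "geneB"

# Add a new component (hypothetical field, for illustration only)
Lst$approved <- TRUE
length(Lst)                # 5
```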

34

Data Frame

A data frame is a list with some restrictions:
① The components must be vectors, factors, numeric matrices, lists, or other data frames.
② Numeric vectors, logicals, and factors are included as is. In R 3.x, character vectors are coerced to factors by default, with levels given by the unique values appearing in the vector; since R 4.0.0 the default is stringsAsFactors = FALSE, so character vectors stay as character.
③ Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same number of rows.

Names of components

35

Data Frame

> names(cars)
[1] "speed" "dist"
> length(cars)         # 2 (the number of columns)
> cars[[1]]
> cars$speed           # recommended

> attach(cars)         # puts the columns on the search path, so speed can be used directly
> detach(cars)

> summary(cars$speed)  # anything we can do with a vector
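To make the restrictions concrete, here is a minimal sketch that builds a data frame by hand. The `drugs` data and its values are hypothetical; `stringsAsFactors = TRUE` is set explicitly so the character column becomes a factor on any R version.

```r
# Hypothetical data frame; all columns must have the same length
drugs <- data.frame(
  name  = c("warfarin", "aspirin", "ibuprofen"),
  price = c(500, 20, 35),
  stringsAsFactors = TRUE   # coerce the character column to a factor
)

class(drugs$name)     # "factor"
levels(drugs$name)    # "aspirin" "ibuprofen" "warfarin"
length(drugs)         # 2 (columns)
nrow(drugs)           # 3 (rows)
summary(drugs$price)

# with() evaluates an expression inside the data frame without attach()
avg_price <- with(drugs, mean(price))
avg_price             # 185
```

Using `with()` avoids leaving the columns on the search path, which is a common reason to prefer it over `attach()` in scripts.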

36

Data Input

scan(file, what = double(), sep = "", ...)
# scan returns a vector whose data type matches the one given by what.

read.table(file, header = FALSE, sep = "", row.names, col.names, ...)
# read.table returns a data.frame object
# my_data.frame <- read.table("MULTIPOT_lu.txt", row.names = 1, header = TRUE)
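Both functions can be tried without any external data by writing a small temporary file first. This is a sketch under that assumption; the file contents are made up and are not the `MULTIPOT_lu.txt` data.

```r
# Write a small whitespace-separated table to a temporary file
tab_file <- tempfile(fileext = ".txt")
writeLines(c("residue GLY ALA",
             "GLY -0.5 0.2",
             "ALA 0.2 -1.3"), tab_file)

# read.table returns a data frame; the first column supplies the row names
tab <- read.table(tab_file, header = TRUE, row.names = 1)
class(tab)          # "data.frame"
tab["GLY", "ALA"]   # 0.2

# scan returns a plain vector of the type given by 'what'
num_file <- tempfile()
writeLines("1 2 3 4 5", num_file)
nums <- scan(num_file, what = double(), quiet = TRUE)
nums                # 1 2 3 4 5

# Clean up the temporary files
unlink(c(tab_file, num_file))
```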

From other software

# from SPSS and SAS
library(Hmisc)     # load package
mydata <- spss.get("test.file", use.value.labels = TRUE)
mydata <- sasxport.get("test.file")

# from Stata and Systat
library(foreign)
mydata <- read.dta("test.file")
mydata <- read.systat("test.file")

# from Excel
library(RODBC)
channel <- odbcConnectExcel("D:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet")
odbcClose(channel)

37

Operators
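The operator table from the original slide is not preserved in this transcript. As a rough stand-in, the sketch below runs through common arithmetic, comparison, and logical operators in base R, using arbitrary values for `x` and `y`.

```r
x <- 7
y <- 2

# Arithmetic
x + y      # 9
x - y      # 5
x * y      # 14
x / y      # 3.5
x ^ y      # 49
x %% y     # 1  (remainder)
x %/% y    # 3  (integer division)

# Comparison
x > y      # TRUE
x == y     # FALSE
x != y     # TRUE

# Logical
(x > 5) & (y > 5)   # FALSE
(x > 5) | (y > 5)   # TRUE
!(x > 5)            # FALSE

# Membership test, applied element by element
found <- c(1, 2, 3) %in% c(2, 3)
found      # FALSE TRUE TRUE
```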

38

Control Statements

R Programming

# switch( statement, list)

# repeat {…}
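The control-statement examples from the slide are mostly missing here. The following is a minimal sketch of the usual forms in base R (if/else, for, while, switch, repeat); all variable names and values are hypothetical.

```r
# if / else
x <- 5
if (x > 3) {
  size <- "large"
} else {
  size <- "small"
}

# for loop: sum the integers 1 to 10
total <- 0
for (i in 1:10) {
  total <- total + i
}
total    # 55

# while loop: count up to 3
n <- 0
while (n < 3) {
  n <- n + 1
}

# switch: pick a value by name, with an unnamed default at the end
kind <- "b"
label <- switch(kind, a = "first", b = "second", "other")
label    # "second"

# repeat: loops until break is reached
count <- 0
repeat {
  count <- count + 1
  if (count >= 4) break
}
count    # 4
```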

39

Function

R Programming

Definition:

Example:

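matrix.axes <- function(data) {
    # x axis: one tick per row, evenly spaced on [0, 1], labelled with the row names
    x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1);
    axis(side=1, at=x, labels=rownames(data), las=2);

    # y axis: one tick per column, evenly spaced on [0, 1], labelled with the column names
    x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1);
    axis(side=2, at=x, labels=colnames(data), las=2);
}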
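The definition part of the slide is not preserved. In general a function is created with `name <- function(arguments) { body }`, and the value of the last expression is returned. The sketch below uses a hypothetical helper, `rescale01`, and also shows what the tick-position expression in the example above evaluates to for five rows.

```r
# General form: name <- function(arguments) { body }
# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])   # last expression is the return value
}

rescale01(c(10, 15, 20))   # 0.0 0.5 1.0

# The tick positions used in matrix.axes, for a matrix with 5 rows:
# ':' binds tighter than '-', so 1:5 - 1 is 0 1 2 3 4
ticks <- (1:5 - 1) / (5 - 1)
ticks                      # 0.00 0.25 0.50 0.75 1.00
```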

40

Summary

Data type and Data Structure

numeric, character, complex, logical

vector, array/matrix, list, data frame

Data Input

scan, read.table

Load from other software: SPSS, SAS, Excel

Operators : <-

R Programming:

41

Case study

Residue based Protein-Protein Interaction potential analysis:

Lu et al. (2003) Development of Unified Statistical Potentials Describing Protein-Protein Interactions, Biophysical Journal 84(3), p1895-1901

42

Reference

CRAN-Manual: http://cran.r-project.org/
Quick-R: http://www.statmethods.net/index.html
R tutorial: http://www.r-tutor.com/
MOAC: http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/r/matrix_contour/

43

Thanks for your attention