Data structures and manipulation in R (2013-02-22)

Manipulating data in R, 2013-02-22 @HSPH, Kazuki Yoshida, M.D., MPH-CLE student


Manipulating data in R

- What are objects?
- What is the class attribute?
- Various data objects you will see in R

Objects

- Just about everything named in R is an object.
- An object is a container that
  - knows its class (a label for what’s inside), and
  - has contents (e.g., the actual numbers).

Examples of objects

- Datasets, which you use for analysis (various classes)
- Functions, which perform analysis (function class)
- Results, which come out of analysis (various classes)
- In effect, you always get a new dataset filled with results when you analyze data.
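
The three kinds of objects above can be inspected with class(). This is a minimal sketch, assuming the built-in cars dataset as a stand-in for "a dataset you analyze" and lm() as a stand-in for "an analysis"; neither appears in the original slides.

```r
# Assumption: the built-in 'cars' dataset and lm() are illustrative choices only.
class(cars)   # a dataset: "data.frame"
class(mean)   # a function: "function"

# Running an analysis returns a new object that holds the results
fit <- lm(dist ~ speed, data = cars)
class(fit)    # the result has its own class: "lm"
names(fit)    # the named pieces stored inside the result object
```

The fitted model is itself an object with a class and contents, which is what "a new dataset filled with results" refers to.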

Classes of data values inside data objects

- Numeric: continuous variables
- Factor: categorical variables
- Logical: TRUE/FALSE binary variables
- etc.
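
A short sketch of one value of each class listed above. The variable names (age, sex, treated) are hypothetical and chosen only for illustration.

```r
# Hypothetical example variables, one per class
age     <- c(34.5, 51.2, 47.0)        # numeric: continuous
sex     <- factor(c("F", "M", "F"))   # factor: categorical
treated <- c(TRUE, FALSE, TRUE)       # logical: TRUE/FALSE

class(age)      # "numeric"
class(sex)      # "factor"
class(treated)  # "logical"
levels(sex)     # the categories of the factor: "F" "M"
```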

Class?

- An object’s class tells R how the object should be handled.
- For example, summarizing data should work differently for numbers and categories!
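
To illustrate the point about summaries, the sketch below calls summary() on a numeric vector and on a factor. The values are made up for the example.

```r
# Made-up example data
x_num <- c(1, 2, 2, 3, 10)
x_fac <- factor(c("a", "b", "b", "c", "b"))

summary(x_num)   # numeric: minimum, quartiles, mean, maximum
summary(x_fac)   # factor: a count for each category

# The same values handled as categories give counts instead of quartiles
summary(factor(x_num))
```

The function name is the same in all three calls; R picks the behavior from the class of the object it is given.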

[Figure: a box labeled "Object" carrying a "Class attribute" label, with the caption "Categorical variables inside!" Image: http://en.wikipedia.org/wiki/File:3_D-Box.jpg]

Data objects

- Vector (contains a single class of data values)
  - Array, including matrix
- List (contains multiple classes of data values)
  - Data frame

Vector

- Smallest building block of data objects
- Single dimension
- Combination of values of the same class
- vec1 <- c(2013, 2, 15, -10) # combine
- vec2 <- 1:16 # integers 1 to 16
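
Because a vector holds values of one class only, combining values of different classes converts them to a common class. A minimal sketch; the vector named mixed is a hypothetical addition, not from the slides.

```r
vec1 <- c(2013, 2, 15, -10)
vec2 <- 1:16

class(vec1)    # "numeric"
class(vec2)    # "integer"
length(vec1)   # 4

# Hypothetical example: mixing classes forces everything to one class
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"
mixed          # "1" "a" "TRUE"
```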

Vector

1-dimensional

Array/Matrix

- A vector folded into a multidimensional structure
- A 2-dimensional array is a matrix
- vec3 <- 1:16
- dim(vec3) <- c(4, 4) # 4 x 4 structure
- dim(vec3) <- c(2, 2, 4) # 2 x 2 x 4 structure
- arr1 <- array(1:60, dim = c(3, 4, 5)) # 3 x 4 x 5 structure
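
The folding can be checked directly. This sketch shows that values fill the structure column by column, and that matrix() builds the same 4 x 4 layout in one step; mat1 is a hypothetical name added for the comparison.

```r
vec3 <- 1:16
dim(vec3) <- c(4, 4)    # fold into 4 rows x 4 columns
is.matrix(vec3)         # TRUE: a 2-dimensional array is a matrix
vec3[2, 3]              # 10, because columns are filled first

# Hypothetical comparison object: the same layout built with matrix()
mat1 <- matrix(1:16, nrow = 4, ncol = 4)
all(vec3 == mat1)       # TRUE

arr1 <- array(1:60, dim = c(3, 4, 5))
dim(arr1)               # 3 4 5
```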

Matrix

Folded vector with dimension

List

- Combination of any values or objects
- Can contain objects of multiple classes
- e.g., a list of two vectors, a matrix, and three arrays
- Access a named element with the $ operator: List_name$Variable_name
- list1 <- list(first = 1:17, second = matrix(letters, 13, 2))
- list2 <- list(alpha = c(1, 4, 5, 7), beta = c("h", "s", "p", "h"))
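
A small sketch of the $ operator on list2, showing that each element keeps its own class. The list called nested is a hypothetical addition to show a list inside a list.

```r
list2 <- list(alpha = c(1, 4, 5, 7), beta = c("h", "s", "p", "h"))

names(list2)         # "alpha" "beta"
length(list2)        # 2 elements
class(list2$alpha)   # "numeric"
class(list2$beta)    # "character"

# Hypothetical example: a list can itself contain a list
nested <- list(numbers = 1:3, inner = list2)
nested$inner$beta[2]   # "s"
```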

List

Multi-part object

Can contain vectors, arrays, or lists!

Data frame

- Special case of a list
- A list of same-length vectors, vertically aligned
- df1 <- data.frame(list2)
- list3 <- list(small = letters, large = LETTERS, number = 1:26)
- df2 <- data.frame(list3)
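
The sketch below checks that a data frame really is a list underneath, and that vectors whose lengths do not fit together are rejected. The object res and the columns a and b are hypothetical names used only for this check.

```r
list3 <- list(small = letters, large = LETTERS, number = 1:26)
df2 <- data.frame(list3)

is.list(df2)   # TRUE: a data frame is a special kind of list
dim(df2)       # 26 rows, 3 columns
names(df2)     # "small" "large" "number"
str(df2)       # one line per column, showing its class

# Hypothetical check: lengths 3 and 2 cannot be aligned, so this is an error
res <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(res, "try-error")   # TRUE
```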

Data Frame

Multiple vectors of same length tied together!

Access by indexes

- letters[3] # 1-dimensional object
- arr1[1, 2, 3] # 3-dimensional object
- arr1[1, , 3] # implies 1,(all),3
- df2[ , 3] # implies (all),3
- list1[[1]] # list needs [[ ]]
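
A worked version of the index examples, rebuilding the objects so it runs on its own. A column index must exist: df1 (built from list2) has two columns, while df2 (built from list3) has three, so the third column is taken from df2 here.

```r
arr1  <- array(1:60, dim = c(3, 4, 5))
list1 <- list(first = 1:17, second = matrix(letters, 13, 2))
df2   <- data.frame(small = letters, large = LETTERS, number = 1:26)

letters[3]       # "c"
arr1[1, 2, 3]    # 28: row 1, column 2, layer 3
arr1[1, , 3]     # 25 28 31 34: row 1, all columns, layer 3
df2[, 3]         # all rows of column 3: the integers 1 to 26

# Single brackets keep the list wrapper; double brackets return the contents
class(list1[1])     # "list"
class(list1[[1]])   # "integer"
```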

Access named elements

- list3
- list3$small
- list3[["small"]]
- df2$large
- df2[, "large"]
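
The named-access forms are interchangeable, which the following sketch checks. The column named large lives in the data frame built from list3 (df2). The variable col and the name no_such_name are hypothetical, added to show two common follow-up cases.

```r
list3 <- list(small = letters, large = LETTERS, number = 1:26)
df2   <- data.frame(list3)

identical(list3$small, list3[["small"]])   # TRUE
identical(df2$large, df2[, "large"])       # TRUE
identical(df2$large, df2[["large"]])       # TRUE

# Hypothetical: [[ ]] also accepts a name stored in a variable; $ does not
col <- "number"
df2[[col]]

# Hypothetical: asking a list for a name it does not have returns NULL
is.null(list3$no_such_name)                # TRUE
```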