3 R Tutorial Data Structure
R Programming
Sakthi Dasan Sekar
http://shakthydoss.com 1
Data structures
a) Vector
b) Matrix
c) Array
d) Data frame
e) List
Vectors are one-dimensional arrays
a <- c(1, 2, 5, 3, 6, -2, 4)
b <- c("one", "two", "three")
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
a is a numeric vector,
b is a character vector, and
c is a logical vector
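As a quick check of the three modes named above, the sketch below rebuilds the slide's vectors and asks R for their class. It only uses base R; the expected values in the comments follow from the definitions shown.

```r
# Same three vectors as in the slide
a <- c(1, 2, 5, 3, 6, -2, 4)
b <- c("one", "two", "three")
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)

class(a)    # "numeric"
class(b)    # "character"
class(c)    # "logical"
length(a)   # 7 elements
```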
Scalars are one-element vectors.
f <- 3
g <- "US"
h <- TRUE
They’re used to hold constants.
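One way to see that a scalar really is a one-element vector is to treat it like one. This is a small illustrative sketch using only base R functions.

```r
f <- 3
g <- "US"
h <- TRUE

length(f)      # 1: a scalar is a vector of length one
is.vector(g)   # TRUE
f[1]           # 3: indexing works as on any other vector
```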
The colon operator :
a <- c(1:5)
is equivalent to
a <- c(1, 2, 3, 4, 5)
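The two forms hold the same values, with one detail worth knowing: the colon operator produces integers, while typed-out numbers are stored as doubles. The sketch below uses the hypothetical names x and y to compare them.

```r
x <- 1:5                  # colon operator
y <- c(1, 2, 3, 4, 5)     # values typed out

all(x == y)       # TRUE: the values are the same
typeof(x)         # "integer"
typeof(y)         # "double"
identical(x, y)   # FALSE, because the storage types differ
10:7              # the operator also counts down: 10 9 8 7
```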
Vector
You can refer to elements of a vector using a numeric vector of positions within brackets.
Example
vec <- c("a", "b", "c", "d", "e", "f")
vec[1] # returns the first element in the vector
vec[c(2,4)] # returns the 2nd and 4th elements in the vector
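Beyond single positions, a few other common indexing forms can be sketched with the same vector. These go slightly past what the slide shows and are offered only as illustration.

```r
vec <- c("a", "b", "c", "d", "e", "f")

vec[2:4]                     # "b" "c" "d": a range of positions
vec[-1]                      # everything except the first element
vec[vec %in% c("a", "f")]    # "a" "f": selection by a logical condition
vec[7]                       # NA: positions past the end give NA, not an error
```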
Matrices
A matrix is a two-dimensional data structure in R. All elements of a matrix must have the same mode (numeric, character, or logical). Matrices are created with the matrix() function.
vector <- c(1,2,3,4)
foo <- matrix(vector, nrow=2, ncol=2)
Matrices
byrow (optional parameter)
byrow=TRUE: the matrix is filled row by row.
byrow=FALSE (the default): the matrix is filled column by column.
foo <- matrix(vector, nrow=2, ncol=2, byrow = TRUE)
foo <- matrix(vector, nrow=2, ncol=2, byrow = FALSE)
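To make the difference visible, the sketch below builds both matrices from the same four values. The names values, by_row and by_col are hypothetical and chosen for this example.

```r
values <- c(1, 2, 3, 4)

by_row <- matrix(values, nrow = 2, ncol = 2, byrow = TRUE)
by_col <- matrix(values, nrow = 2, ncol = 2, byrow = FALSE)

by_row
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4

by_col
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
```

The two results are transposes of each other, so t(by_col) gives the same matrix as by_row.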
Matrix elements can be accessed with subscripts inside brackets.
Example
mat <- matrix(c(1:4), nrow=2, ncol=2)
mat[1,] # returns the first row of the matrix
mat[2,] # returns the second row of the matrix
mat[,1] # returns the first column of the matrix
mat[,2] # returns the second column of the matrix
mat[1,2] # returns the element in the first row, second column
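Working through the same matrix with concrete values may help. The sketch below notes what each subscript returns, and adds drop = FALSE, a standard base R argument not mentioned in the slide, for keeping the result as a matrix.

```r
mat <- matrix(c(1:4), nrow = 2, ncol = 2)   # filled by column: 1 3 / 2 4

dim(mat)                  # 2 2
mat[1, ]                  # 1 3: first row, returned as a plain vector
mat[, 2]                  # 3 4: second column
mat[1, 2]                 # 3
mat[1, , drop = FALSE]    # first row kept as a 1 x 2 matrix
```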
Array
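Arrays are similar to matrices but can have more than two dimensions.
Arrays are created with the array() function.
array(vector, dimensions, dimnames)
Here vector holds the data, dimensions is a numeric vector giving the size of each dimension, and dimnames is an optional list of names for each dimension.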
a <- matrix(c(1,1,1,1) , 2, 2)
b <- matrix(c(2,2,2,2) , 2, 2)
foo <- array(c(a,b), c(2,2,2))
Array elements can be accessed in the same way as matrix elements, with one subscript per dimension.
foo[1,,] # returns the elements whose first index is 1 (row 1 of every layer)
foo[2,,] # returns the elements whose first index is 2 (row 2 of every layer)
foo[2,1,] # returns the elements at row 2, column 1 of every layer
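The array from the previous slide can be rebuilt with dimension names to make the three subscripts easier to follow. The labels r1, c1, a and b below are hypothetical and exist only for this sketch.

```r
a <- matrix(c(1, 1, 1, 1), 2, 2)
b <- matrix(c(2, 2, 2, 2), 2, 2)

# Third argument: one character vector of names per dimension
foo <- array(c(a, b), dim = c(2, 2, 2),
             dimnames = list(c("r1", "r2"), c("c1", "c2"), c("a", "b")))

dim(foo)         # 2 2 2
foo[, , 1]       # first layer: all 1s
foo[, , "b"]     # second layer selected by name: all 2s
foo[2, 1, ]      # row 2, column 1 of each layer: 1 and 2
```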
Data frame
Data frames are the most commonly used data structure in R.
A data frame is like a matrix, but its columns can contain different modes of data (numeric, character, etc.).
A data frame is created with the data.frame() function.
data.frame(col1, col2, col3, ...)
name <- c("joe", "jhon", "Nancy")
sex <- c("M", "M", "F")
age <- c(27,26,26)
foo <- data.frame(name,sex,age)
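To inspect what data.frame() produced, str() and the dimension helpers are a reasonable first stop. This sketch rebuilds the slide's data frame; note that whether character columns are stored as character or factor depends on the R version (the default changed in R 4.0.0), so no column type is asserted for name or sex here.

```r
name <- c("joe", "jhon", "Nancy")
sex <- c("M", "M", "F")
age <- c(27, 26, 26)
foo <- data.frame(name, sex, age)

str(foo)       # one line per column, with its type and first values
nrow(foo)      # 3 rows
ncol(foo)      # 3 columns
names(foo)     # "name" "sex" "age"
```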
Accessing data frame elements is straightforward. Columns can be accessed by name.
Example
foo$name # returns the name vector in the data frame
foo$age # returns the age vector in the data frame
foo$age[2] # returns the second element of the age vector in the data frame
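Besides the $ operator, data frames also accept double brackets and matrix-style [row, column] subscripts. The following sketch recreates the slide's data frame so it runs on its own; these forms go a little beyond what the slide covers.

```r
foo <- data.frame(name = c("joe", "jhon", "Nancy"),
                  sex = c("M", "M", "F"),
                  age = c(27, 26, 26))

foo[["age"]]           # same vector as foo$age
foo[2, ]               # the whole second row
foo[2, "name"]         # one cell: row 2 of the name column
foo[foo$age > 26, ]    # only the rows where age is above 26
```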
Factors
Categorical variables in R are called factors.
Status (poor, improved, excellent) and Gender (Male, Female) are good examples of categorical variables.
Factors are created using the factor() function.
gender <- c("Male", "Female", "Female", "Male")
status <- c("Poor", "Improved", "Excellent", "Poor", "Excellent")
factor_gender <- factor(gender) # factor_gender has two levels called Male and Female
factor_status <- factor(status) # factor_status has three levels called Poor, Improved and Excellent
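By default factor() sorts the levels alphabetically, which is not the natural order for a status scale. A minimal sketch of how the order could be set explicitly, using the levels and ordered arguments of factor(); the name status_ordered is hypothetical.

```r
status <- c("Poor", "Improved", "Excellent", "Poor", "Excellent")

factor_status <- factor(status)
levels(factor_status)    # "Excellent" "Improved" "Poor" (alphabetical)
table(factor_status)     # counts per level

# Explicit level order for an ordinal scale
status_ordered <- factor(status,
                         levels = c("Poor", "Improved", "Excellent"),
                         ordered = TRUE)
status_ordered[1] < status_ordered[3]   # TRUE: Poor ranks below Excellent
```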
List
Lists are the most complex data structure in R.
A list may contain a combination of vectors, matrices, data frames, and even other lists.
You create a list using the list() function.
vec <- c(1,2,3,4)
mat <- matrix(vec,2,2)
foo <- list(vec, mat)
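Retrieving components from a list is the natural next step. The sketch below gives the components the hypothetical names numbers and grid so that both positional and named access can be shown.

```r
vec <- c(1, 2, 3, 4)
mat <- matrix(vec, 2, 2)
foo <- list(numbers = vec, grid = mat)

length(foo)        # 2 components
foo[[1]]           # the vector itself
foo$grid[2, 1]     # 2: row 2, column 1 of the stored matrix
foo["numbers"]     # single brackets return a list of length one
```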
Data Import/Export
Import Excel File
Quite frequently, the sample data is in Excel format, and needs to be imported into R prior to use.
library(gdata) # load gdata package
help(read.xls) # documentation
mydata = read.xls("mydata.xls") # read from first sheet
Alternate package XLConnect
library(XLConnect)
wk = loadWorkbook("mydata.xls")
df = readWorksheet(wk, sheet="Sheet1")
Import Minitab File
If the data file is in Minitab Portable Worksheet format, it can be opened with the function read.mtp from the foreign package. It returns a list of components in the Minitab worksheet.
library(foreign) # load the foreign package
help(read.mtp) # documentation
mydata = read.mtp("mydata.mtp") # read from .mtp file
Import Table File
A data table can reside in a text file. The cells inside the table are separated by blank characters. Here is an example of a table with 4 rows and 3 columns.
100 a1 b1
200 a2 b2
300 a3 b3
400 a4 b4
help(read.table) # documentation
mydata = read.table("mydata.txt") # read from the text file
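To try read.table() without creating a file first, the same four rows can be passed through its text argument. This is only a sketch; the variable raw is a hypothetical stand-in for the contents of mydata.txt.

```r
# The table from the slide, held in a string instead of a file
raw <- "100 a1 b1
200 a2 b2
300 a3 b3
400 a4 b4"

mydata <- read.table(text = raw)

dim(mydata)      # 4 3
names(mydata)    # "V1" "V2" "V3": default names when there is no header
mydata$V1        # 100 200 300 400
```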
Import CSV File
The sample data can also be in comma separated values (CSV) format. Each cell inside such data file is separated by a special character, which usually is a comma.
help(read.csv) #documentation
mydata = read.csv("mydata.csv", sep=",")
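For experimenting without a file on disk, read.csv() can parse a string through the text argument it shares with read.table(). The sample rows below are made up for illustration. Since sep already defaults to a comma in read.csv(), it is left out here.

```r
# Hypothetical CSV content with a header line
csv_text <- "name,age
joe,27
Nancy,26"

mydata <- read.csv(text = csv_text)

names(mydata)    # "name" "age": taken from the header line
nrow(mydata)     # 2
mydata$age       # 27 26
```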
Export Table file
help(write.table) #documentation
write.table(mydata, "c:/mydata.txt", sep="\t")
Export Excel file
library(xlsx)
help(write.xlsx) #documentation
write.xlsx(mydata, "c:/mydata.xlsx")
Export CSV file
help(write.csv)
write.csv(mydata, file = "mydata.csv")
Avoid writing the row names
write.csv(mydata, file = "mydata.csv", row.names=FALSE)
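The effect of row.names can be checked by writing to a temporary file and reading the first line back. This sketch uses a small made-up data frame and tempfile() so that nothing is left in the working directory.

```r
mydata <- data.frame(name = c("joe", "Nancy"), age = c(27, 26))
path <- tempfile(fileext = ".csv")

# Default: a leading column of row names is written
write.csv(mydata, file = path)
with_rownames <- readLines(path)[1]      # "","name","age"

# row.names = FALSE drops that leading column
write.csv(mydata, file = path, row.names = FALSE)
without_rownames <- readLines(path)[1]   # "name","age"

back <- read.csv(path)                   # round trip
names(back)                              # "name" "age"
```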
Knowledge Check
Every individual data value has a data type that tells us what sort of value it is.
A. TRUE
B. FALSE
Answer A
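What happens when you execute this code?
vec <- c(1,"hello",TRUE)
A. vec is assigned with multiple values.
B. Nothing happens.
C. ERROR
D. vec has only one value and that is TRUE.
Answer A (c() does not raise an error; it coerces every element to character, so vec holds "1", "hello" and "TRUE")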
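Running the quiz code is the simplest way to see what R does with mixed inputs. In base R, c() coerces its arguments to a common type rather than stopping, as this short sketch shows.

```r
vec <- c(1, "hello", TRUE)

vec            # "1" "hello" "TRUE"
class(vec)     # "character": every element was coerced to a string
length(vec)    # 3
```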
Which statement is TRUE?
A. A matrix is a three-dimensional collection of values that all have the same type.
B. A factor can be used to represent a categorical variable.
C. A vector is a two-dimensional collection of values that can have multiple modes (numeric, character, boolean).
D. At maximum a single data frame can hold only 20GB of data.
Answer B
What is the most appropriate data structure for the dataset below?
Name   Age  Gender
Jhon   24   M
Joe    24   M
Nancy  25   F
A. Matrix
B. Data frame
C. Array
D. List
Answer B
Which function is used to create an array?
A. a(vector, dimensions, dimnames)
B. create(vector, dimensions, dimnames)
C. array(vector, dimensions, dimnames)
D. a(vector,dimensions)
Answer C