r lecture oga
![Page 1: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/1.jpg)
Handling quantitative data using statistical software R
Osamu Ogasawara
2015.01.19
![Page 2: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/2.jpg)
Contents
1. What is R?
2. An Introductory Example
3. Types and Data Structures (in C and R)
4. Functional Programming (apply() function)
5. R Graphics
6. Bioinformatics (RNA-seq)
![Page 3: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/3.jpg)
What is the R language?
![Page 4: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/4.jpg)
Computer Language Popularity
The TIOBE index is the weighted mean of the following form: (hits(PL,SE1)/hits(SE1) + ... + hits(PL,SEn)/hits(SEn))/n, where SE1, ..., SEn are the search engines and PL is a search query of the following pattern: +"<language> programming"
![Page 5: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/5.jpg)
Computer Language Popularity
C language and its derivatives
(General purpose) Script languages
Domain specific language
![Page 6: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/6.jpg)
Computer Language Popularity
Domain Specific Languages
Script language
The others
![Page 7: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/7.jpg)
Classification of Computer Languages
by abstraction levels
Assembly Languages
High Level Languages: C, C++, Java, …
Very High Level Languages (VHLL)
Scripting languages: Perl, Python, Ruby, …
Domain Specific Languages: R (statistics), Matlab, …
A higher-level language is closer to natural language.
![Page 8: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/8.jpg)
Introductory Examples
![Page 9: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/9.jpg)
Simple Example (1) histogram
> x<-rnorm(100000000)
> head(x)
[1] 0.4667083 0.8907642 0.8147121 0.4839252 0.5811472 0.4941122
> hist(x)
> system.time(x<-rnorm(100000000))
   user  system elapsed
  8.771   0.249   9.020
![Page 10: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/10.jpg)
Simple Example (2) t-test
> group1 <- c(0.7,-1.6,-0.2,-1.2,-0.1,3.4,3.7,0.8,0.0,2.0)
> group2 <- c(1.9, 0.8, 1.1, 0.1,-0.1,4.4,5.5,1.6,4.6,3.4)
> group1
 [1]  0.7 -1.6 -0.2 -1.2 -0.1  3.4  3.7  0.8  0.0  2.0
> group2
 [1]  1.9  0.8  1.1  0.1 -0.1  4.4  5.5  1.6  4.6  3.4
> boxplot(group1, group2)
> t.test(group1, group2, var.equal=T)

        Two Sample t-test

data:  group1 and group2
t = -1.8608, df = 18, p-value = 0.07919
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.363874  0.203874
sample estimates:
mean of x mean of y
     0.75      2.33
http://cse.naro.affrc.go.jp/takezawa/r-tips/r/65.html
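The statistic printed by t.test() can be reproduced by hand. The following is a minimal sketch, assuming the pooled-variance form of the two-sample t-test (the form selected by var.equal=T); the names n1, n2, sp2, se, dof, t_manual and p_manual are illustrative and not part of the lecture.

```r
group1 <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)
group2 <- c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4)

n1 <- length(group1)
n2 <- length(group2)

# Pooled variance: the two sample variances weighted by their degrees of freedom
sp2 <- ((n1 - 1) * var(group1) + (n2 - 1) * var(group2)) / (n1 + n2 - 2)

# Standard error of the difference between the two means
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

t_manual <- (mean(group1) - mean(group2)) / se
dof <- n1 + n2 - 2
p_manual <- 2 * pt(-abs(t_manual), dof)   # two-sided p-value

# Compare with the built-in test
res <- t.test(group1, group2, var.equal = TRUE)
all.equal(t_manual, unname(res$statistic))
all.equal(p_manual, res$p.value)
```

Both comparisons should return TRUE, which suggests that the printed t, df and p-value come from exactly this calculation.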
![Page 11: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/11.jpg)
Getting Help in R
Display the contents of the R manual (if you know the name of the function):
?rnorm
help("rnorm")
Search functions by keywords:
??"normal distribution"
help.search("normal distribution")
Search functions by (partial) matching of function names:
find("rnorm")
apropos("rnorm")
![Page 12: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/12.jpg)
The R Graphical manual
![Page 13: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/13.jpg)
R manual
![Page 14: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/14.jpg)
Probability Distributions
dnorm() : Density function
pnorm() : (cumulative) probability distribution function
qnorm() : Quantile
rnorm() : Random number generation
“Quick-R” site
http://www.statmethods.net/advgraphs/probability.html
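The four prefixes d, p, q and r are related to one another. A small sketch for the standard normal distribution follows; the seed and the test value z are arbitrary choices for illustration.

```r
# d: density, p: cumulative probability, q: quantile, r: random numbers
dnorm(0)        # height of the standard normal density at 0
pnorm(1.96)     # P(X <= 1.96), about 0.975
qnorm(0.975)    # the inverse of pnorm(), about 1.96

# qnorm() undoes pnorm()
z <- 1.5
all.equal(qnorm(pnorm(z)), z)

# The density at 0 equals 1 / sqrt(2 * pi)
all.equal(dnorm(0), 1 / sqrt(2 * pi))

set.seed(1)     # arbitrary seed, only for reproducibility
x <- rnorm(5)   # five random draws from N(0, 1)
length(x)
```

The same naming scheme applies to other distributions in base R, for example dunif(), punif(), qunif() and runif().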
![Page 15: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/15.jpg)
Plotting the density function (1/2)
> x<-seq(-4,4,length=100)
> x
 [1] -4.00000000 -3.91919192 -3.83838384 -3.75757576 -3.67676768 -3.59595960
 [7] -3.51515152 -3.43434343 -3.35353535 -3.27272727 -3.19191919 -3.11111111
[13] -3.03030303 -2.94949495 -2.86868687 -2.78787879 -2.70707071 -2.62626263
… omitted
> dx<-dnorm(x)
![Page 16: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/16.jpg)
Plotting the density function (2/2)
> plot(x,dx,type="l",xlab="x",ylab="y",main="The normal distribution")
![Page 17: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/17.jpg)
Plotting the probability distribution function
> x<-seq(-4,4,length=100)
> px<-pnorm(x)
> plot(x,px,type="l",xlab="x",ylab="y",main="The normal distribution")
![Page 18: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/18.jpg)
Quantile (1/5)
plot(x,dnorm(x), type="n", ylim=c(0,1))
http://cse.niaes.affrc.go.jp/minaka/R/R-normal.html
Copyright (c) 2004 by MINAKA Nobuhiro. All rights reserved.
![Page 19: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/19.jpg)
Quantile (2/5)
plot(x,dnorm(x), type="n", ylim=c(0,1))
curve(dnorm(x), type="l", add=T)
http://cse.niaes.affrc.go.jp/minaka/R/R-normal.html
Copyright (c) 2004 by MINAKA Nobuhiro. All rights reserved.
![Page 20: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/20.jpg)
Quantile (3/5)
plot(x,dnorm(x), type="n", ylim=c(0,1))
curve(dnorm(x), type="l", add=T)
curve(pnorm(x), type="l", lty=3, add=T)
http://cse.niaes.affrc.go.jp/minaka/R/R-normal.html
Copyright (c) 2004 by MINAKA Nobuhiro. All rights reserved.
![Page 21: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/21.jpg)
Quantile (4/5)
plot(x,dnorm(x), type="n", ylim=c(0,1))
curve(dnorm(x), type="l", add=T)
curve(pnorm(x), type="l", lty=3, add=T)
abline(h=0.05)
abline(h=0.95)
http://cse.niaes.affrc.go.jp/minaka/R/R-normal.html
Copyright (c) 2004 by MINAKA Nobuhiro. All rights reserved.
![Page 22: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/22.jpg)
Quantile (5/5)
x<-seq(-4,4,length=100)
plot(x,dnorm(x), type="n", ylim=c(0,1))
curve(dnorm(x), type="l", add=T)
curve(pnorm(x), type="l", lty=3, add=T)
abline(h=0.05)
abline(h=0.95)
lower.alpha5<-qnorm(0.05)
upper.alpha5<-qnorm(0.95)
abline(v=lower.alpha5)
abline(v=upper.alpha5)
points(lower.alpha5, 0.05, cex=3.0, pch="*")
points(upper.alpha5, 0.95, cex=3.0, pch="*")
http://cse.niaes.affrc.go.jp/minaka/R/R-normal.html
Copyright (c) 2004 by MINAKA Nobuhiro. All rights reserved.
![Page 23: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/23.jpg)
Calculation of the p-value of a numeric vector x.
http://d.hatena.ne.jp/hoxo_m/20130213/p1
norm.dist.p <- function(x) {
  n <- length(x)
  m <- mean(x)             # sample mean
  se <- sd(x) / sqrt(n)    # standard error of the mean
  # two-sided p-value under a normal distribution with mean 0
  p <- pnorm(-abs(m), mean=0, sd=se) * 2
  p
}
x <- rnorm(10, mean=0)
p <- norm.dist.p(x)
cat("p =", p, "\n")
![Page 24: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/24.jpg)
Bias in small samples
alpha = 0.05
ps <- sapply(1:10000, function(i) {
  x <- rnorm(10)
  p <- norm.dist.p(x)
  p
})
fp <- sum(ps < alpha) / length(ps)
cat("alpha error rate =", fp, "\n")
alpha error rate = 0.0812
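The error rate above exceeds 0.05 because the standard deviation is estimated from only 10 values, while the p-value is taken from the normal distribution. The sketch below compares this with Student's t distribution with n - 1 degrees of freedom, which is the usual correction for small samples. It is an illustration rather than part of the lecture: the seed, the number of repetitions and all variable names are assumptions.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
alpha <- 0.05
n <- 10
reps <- 5000

# For each simulated sample, compute both p-values from the same statistic
pvals <- sapply(seq_len(reps), function(i) {
  x <- rnorm(n)
  stat <- mean(x) / (sd(x) / sqrt(n))
  c(normal = 2 * pnorm(-abs(stat)),           # same value as norm.dist.p(x)
    t      = 2 * pt(-abs(stat), df = n - 1))  # same value as t.test(x)$p.value
})

# pvals is a 2 x reps matrix with rows "normal" and "t"
fp_normal <- mean(pvals["normal", ] < alpha)
fp_t      <- mean(pvals["t", ] < alpha)
cat("normal-based error rate =", fp_normal, "\n")
cat("t-based error rate      =", fp_t, "\n")
```

The t-based rate should stay close to the nominal 0.05, while the normal-based rate should be noticeably higher, in line with the 0.0812 reported above.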
![Page 25: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/25.jpg)
Types and Data Structures
![Page 26: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/26.jpg)
Types in C (partial)
Integer Types
Floating-Point Types
![Page 27: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/27.jpg)
Memory Layout of C Programs
1. Text segment (Code segment)
2. Initialized data segment (initialized global variables and static variables)
3. Uninitialized data segment
4. Stack (automatic variables)
5. Heap (for dynamic memory allocation by malloc(), free(), …)
http://www.geeksforgeeks.org/memory-layout-of-c-program/
![Page 28: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/28.jpg)
Stack frame and function call
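/* b() and c() are defined elsewhere (not shown on the slide) */
int a(void);
int b(void);
int c(void);

int main() {
    int x = 0;
    a();
    return 0;
}

int a() {
    int x = 1;
    b();
    c();
    return 0;
}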
http://www.tenouk.com/ModuleZ.html
![Page 29: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/29.jpg)
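Recursion in C
#include <stdio.h>

/* Returns f! recursively; each call pushes a new stack frame. */
int Fact(int f) {
    if (f <= 1)
        return 1;              /* base case (also covers f == 0) */
    return f * Fact(f - 1);    /* one recursive call per invocation */
}

int main() {
    int fact;
    fact = Fact(5);
    printf("Factorial is %d\n", fact);
    return 0;
}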
http://www.programmingspark.com/2013/03/Working-of-Recursion-in-detail-using-Stack.html
![Page 30: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/30.jpg)
Recursion in C
http://www.programmingspark.com/2013/03/Working-of-Recursion-in-detail-using-Stack.html
![Page 31: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/31.jpg)
C pointers
int b = 17;
int* a = &b;   /* a holds the address of b */
int x = *a;    /* x = 17 */
![Page 32: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/32.jpg)
Arrays and Linked Lists
![Page 33: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/33.jpg)
Adding an element to the containers
Linked List
C Array (R vector)
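Because an R vector is laid out like a C array, appending an element with c() returns a new, longer vector. The sketch below contrasts growing a vector one element at a time with allocating it once up front. It is only an illustration of the idea on this slide; the names grown and filled and the size n are arbitrary.

```r
n <- 1000

# Growing: each c() call returns a new, longer vector
grown <- numeric(0)
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Preallocating: reserve all n slots once, then assign by index
filled <- numeric(n)
for (i in seq_len(n)) {
  filled[i] <- i^2
}

identical(grown, filled)   # both approaches give the same values
```

For large n the preallocated version is usually faster, since it avoids copying the existing elements again and again.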
![Page 34: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/34.jpg)
Types in R
Logical: TRUE, T, FALSE, F
Numerical (double): 1, 1.0, 1.4e+3
Complex: 3.5+4i
Character: "abc"
> typeof(TRUE)
[1] "logical"
> typeof(1)
[1] "double"
> typeof(1.0)
[1] "double"
> typeof(3.5+4i)
[1] "complex"
> typeof("abc")
[1] "character"
> is.vector(TRUE)
[1] TRUE
> is.vector(1)
[1] TRUE
> is.vector(3.5+4i)
[1] TRUE
> is.vector("abc")
[1] TRUE
![Page 35: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/35.jpg)
Creation of R vectors
> c(1,2,3,4,5)
[1] 1 2 3 4 5
> 1:5
[1] 1 2 3 4 5
> 5.1:-1.2
[1]  5.1  4.1  3.1  2.1  1.1  0.1 -0.9
> seq(1,3,0.5)
[1] 1.0 1.5 2.0 2.5 3.0
> rep(
> numeric(10)
 [1] 0 0 0 0 0 0 0 0 0 0
> logical(10)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> character(10)
 [1] "" "" "" "" "" "" "" "" "" ""
> complex(10)
 [1] 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i
![Page 36: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/36.jpg)
Operation on vectors
> 1:10*2
 [1]  2  4  6  8 10 12 14 16 18 20
> 2*(3^(0:4))
[1]   2   6  18  54 162
> v1<-1:10
> v2<-10:1
> v1+v2
 [1] 11 11 11 11 11 11 11 11 11 11
![Page 37: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/37.jpg)
> v1<-c(1,2,3)
> v1
[1] 1 2 3
> v1[1]
[1] 1
> v1[4]
[1] NA
> v1[5]<-10
> v1
[1]  1  2  3 NA 10
> v1[6]<-"a"
> v1
[1] "1"  "2"  "3"  NA   "10" "a"
> v2<-runif(10, 1,10)
> v2
 [1] 4.851027 7.618278 5.371393 3.940181 1.002870 9.511409 2.364836 5.246343
 [9] 3.361870 9.435904
> v2<5
 [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE
> v2[v2<5]
[1] 4.851027 3.940181 1.002870 2.364836 3.361870
> v2[1:3]
[1] 4.851027 7.618278 5.371393
> v2[1:3*2]
[1] 7.618278 3.940181 9.511409
![Page 38: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/38.jpg)
R Lists
![Page 39: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/39.jpg)
Creation of R Lists
> w1<-list("a", 10, TRUE)
> w1
[[1]]
[1] "a"

[[2]]
[1] 10

[[3]]
[1] TRUE

> w2 <- as.list(c(1,2,3))
> w2
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3
![Page 40: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/40.jpg)
Data structure of R objects
Type information pointers data (vector)
![Page 41: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/41.jpg)
R List
> w1<-list(1:3,"ab",TRUE)
> w1
[[1]]
[1] 1 2 3

[[2]]
[1] "ab"

[[3]]
[1] TRUE

[Figure: the list w1 and its three elements: 1 2 3, "a" "b", TRUE]
![Page 42: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/42.jpg)
w1[1] returns a sublist.
w1[[1]] returns the content of a list element.
[Figure: the list w1 and its three elements: 1 2 3, "a" "b", TRUE]
> typeof(w1)
[1] "list"
> typeof(w1[1])
[1] "list"
> typeof(w1[[1]])
[1] "integer"
> w1[1]
[[1]]
[1] 1 2 3

> w1[[1]]
[1] 1 2 3
> w1[[1]][1]
[1] 1
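The difference between single and double brackets matters as soon as the result is used in a computation. A short sketch, reusing the list w1 from the slide; the names sub and elem are illustrative.

```r
w1 <- list(1:3, "ab", TRUE)

sub  <- w1[1]     # single brackets: a list of length 1
elem <- w1[[1]]   # double brackets: the element itself

typeof(sub)       # "list"
typeof(elem)      # "integer"
length(sub)       # 1
length(elem)      # 3

# Arithmetic works on the extracted vector ...
elem * 2L

# ... but not on the sublist, which is still a list
res <- try(sub * 2L, silent = TRUE)
inherits(res, "try-error")
```

A rule of thumb: use [ ] to select several elements and keep a list, and [[ ]] to pull out one element for further work.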
![Page 43: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/43.jpg)
w2<-w1[c(1,2)]
[Figure: w1 and w2 referring to the list elements 1 2 3, "a" "b", TRUE]
> remove(w1)
> w1
Error: object 'w1' not found
> w2
[[1]]
[1] 1 2

[[2]]
[1] 3 4
![Page 44: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/44.jpg)
R List and “names”
> w3<-list(a=1:3, b="abc", NA)
> w3
$a
[1] 1 2 3

$b
[1] "abc"

[[3]]
[1] NA

> w3[[1]]
[1] 1 2 3
> w3$a
[1] 1 2 3
> w3[1]
$a
[1] 1 2 3
![Page 45: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/45.jpg)
Attributes of an R object
[Figure: the list w3 (elements 1 2 3, "a" "b", TRUE) with its attribute $names: "a" "b" ""]
> w3<-list(a=1:3,b="ab",TRUE)
> attributes(w3)
$names
[1] "a" "b" ""

> attr(w3,"names")<-NULL
> w3
[[1]]
[1] 1 2 3

[[2]]
[1] "ab"

[[3]]
[1] TRUE
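Names are one attribute among others, and names() is a shortcut for reading and writing it. The sketch below shows this on the list from the slide; the attribute "note" and the copy w4 are made up for illustration.

```r
w3 <- list(a = 1:3, b = "ab", TRUE)

# names() reads the same attribute as attr(x, "names")
names(w3)
identical(names(w3), attr(w3, "names"))

# Any attribute can be attached; "note" is a made-up attribute name
attr(w3, "note") <- "demo"
names(attributes(w3))

# Removing the names leaves the contents unchanged
w4 <- w3
names(w4) <- NULL
identical(w4[[1]], w3$a)
```

After names(w4) <- NULL the elements of w4 can only be reached by position, as in the output above.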
![Page 46: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/46.jpg)
data.frame : List of vectors
> phenotype<-read.table("bodymap_phenodata.txt", header=T, row.names=1, sep=" ", quote="")
> phenotype
          num.tech.reps tissue.type gender age             race
ERS025098             2     adipose      F  73        caucasian
ERS025092             2     adrenal      M  60        caucasian
ERS025085             2       brain      F  77        caucasian
ERS025088             2      breast      F  29        caucasian
ERS025089             2       colon      F  68        caucasian
ERS025082             2       heart      M  77        caucasian
ERS025081             2      kidney      F  60        caucasian
ERS025096             2       liver      M  37        caucasian
ERS025099             2        lung      M  65        caucasian
ERS025086             2   lymphnode      F  86        caucasian
ERS025084             6     mixture   <NA>  NA        caucasian
ERS025087             5     mixture   <NA>  NA        caucasian
ERS025093             5     mixture   <NA>  NA        caucasian
ERS025083             2       ovary      F  47 african_american
ERS025095             2    prostate      M  73        caucasian
… omitted
![Page 47: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/47.jpg)
RNA-seq
http://www.bgisequence.com/jp/services/sequencing-services/rna-sequencing/rna-seq/
![Page 48: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/48.jpg)
http://bowtie-bio.sourceforge.net/recount/
![Page 49: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/49.jpg)
bodymap_count_table.txt
Tab-delimited format.
The first line lists the sample identifiers (19 samples of human organs).
The first column lists the gene identifiers (Ensembl genes).
![Page 50: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/50.jpg)
bodymap_phenodata.txt
![Page 51: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/51.jpg)
Read a data table to a data frame
> phenotype<-read.table("bodymap_phenodata.txt", header=T, row.names=1, sep=" ", quote="")
> phenotype
          num.tech.reps tissue.type gender age             race
ERS025098             2     adipose      F  73        caucasian
ERS025092             2     adrenal      M  60        caucasian
ERS025085             2       brain      F  77        caucasian
ERS025088             2      breast      F  29        caucasian
ERS025089             2       colon      F  68        caucasian
ERS025082             2       heart      M  77        caucasian
ERS025081             2      kidney      F  60        caucasian
ERS025096             2       liver      M  37        caucasian
ERS025099             2        lung      M  65        caucasian
ERS025086             2   lymphnode      F  86        caucasian
ERS025084             6     mixture   <NA>  NA        caucasian
ERS025087             5     mixture   <NA>  NA        caucasian
ERS025093             5     mixture   <NA>  NA        caucasian
ERS025083             2       ovary      F  47 african_american
ERS025095             2    prostate      M  73        caucasian
… omitted
![Page 52: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/52.jpg)
Inspect the type and attribute of the data frame
> typeof(phenotype)
[1] "list"
> attributes(phenotype)
$names
[1] "num.tech.reps" "tissue.type"   "gender"        "age"
[5] "race"

$class
[1] "data.frame"

$row.names
 [1] "ERS025098" "ERS025092" "ERS025085" "ERS025088" "ERS025089" "ERS025082"
 [7] "ERS025081" "ERS025096" "ERS025099" "ERS025086" "ERS025084" "ERS025087"
[13] "ERS025093" "ERS025083" "ERS025095" "ERS025097" "ERS025094" "ERS025090"
[19] "ERS025091"
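The output above shows that a data frame is a list with three attributes: names, class and row.names. The same can be checked without the BodyMap file. The sketch below uses a small stand-in table; the row names S1 to S3 and the column selection are made up for illustration.

```r
# Toy stand-in for the phenotype table (row names are made up)
df <- data.frame(
  tissue = c("adipose", "brain", "heart"),
  age    = c(73, 77, 77),
  row.names = c("S1", "S2", "S3"),
  stringsAsFactors = FALSE
)

typeof(df)              # "list"
class(df)               # "data.frame"
names(attributes(df))   # names, class and row.names

# Each column is one list element holding a vector
df[["age"]]
length(df)              # number of columns, not rows
sapply(df, typeof)      # the type of each column
```

Since every column is a vector of a single type, a data frame can mix types across columns but not within one column.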
![Page 53: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/53.jpg)
Read the count table
> data <- read.table("bodymap_count_table.txt", header=T, row.names=1, sep="\t", quote="")
> head(data)
                ERS025098 ERS025092 ERS025085 ERS025088 ERS025089 ERS025082
ENSG00000000003      1354       216       215       924       725       125
ENSG00000000005       712       134         4      1495       119        20
ENSG00000000419       450       547       516       529       808       680
ENSG00000000457       188       368       196       386       156       259
ENSG00000000460        66        29         1        26        11         9
ENSG00000000938       104        79         7        29         0         3
… omitted
![Page 54: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/54.jpg)
Replace the column names: from the IDs to the tissue type descriptions
> colnames(data)
 [1] "ERS025098" "ERS025092" "ERS025085" "ERS025088" "ERS025089" "ERS025082"
 [7] "ERS025081" "ERS025096" "ERS025099" "ERS025086" "ERS025084" "ERS025087"
[13] "ERS025093" "ERS025083" "ERS025095" "ERS025097" "ERS025094" "ERS025090"
[19] "ERS025091"
> colnames(data)<-phenotype$tissue.type
> colnames(data)
 [1] "adipose"          "adrenal"          "brain"            "breast"
 [5] "colon"            "heart"            "kidney"           "liver"
 [9] "lung"             "lymphnode"        "mixture"          "mixture"
[13] "mixture"          "ovary"            "prostate"         "skeletal_muscle"
[17] "testes"           "thyroid"          "white_blood_cell"
> head(data)
                adipose adrenal brain breast colon heart kidney liver lung
ENSG00000000003    1354     216   215    924   725   125    796  1954  815
ENSG00000000005     712     134     4   1495   119    20      7     0    0
ENSG00000000419     450     547   516    529   808   680    744   369  636
ENSG00000000457     188     368   196    386   156   259    436   288  187
ENSG00000000460      66      29     1     26    11     9     25    42   12
ENSG00000000938     104      79     7     29     0     3      1    20  243
![Page 55: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/55.jpg)
Looking into the data frame
> head(data$adipose, 100)
 [1] 1354 712 450 188 66 104 0 1323 0 858 0 0
[13] 13 6346 0 0 0 0 0 3 0 485 0 0
[25] 36 0 0 0 0 1002 1360 0 4179 12 424 0
[37] 97 0 0 0 0 0 0 0 2577 0 0 0
[49] 0 0 5 2241 0 0 115 3678 0 14104 18 1662
[61] 0 0 0 0 6 0 0 7839 0 2 1313 1997
[73] 40 5390 0 0 0 208 180 1277 1460 0 0 1002
[85] 30 177 84 441 0 2986 1598 0 13925 94 5565 0
[97] 0 0 0 0
> length(data$adipose)
[1] 52580
> length(data$adipose[data$adipose>0])
[1] 9992
![Page 56: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/56.jpg)
Distribution of the data
> hist(data$adipose)
> hist(log10(data$adipose))
> summary(log10(data$adipose))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   -Inf    -Inf    -Inf    -Inf    -Inf       6
> summary(log10(data$adipose[data$adipose>0]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   1.462   2.382   2.287   3.109   6.200
![Page 57: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/57.jpg)
attach() and detach(): adding the column names to, or removing them from, the “environment”
> attach(data)
> head(adipose, 100)
 [1] 1354 712 450 188 66 104 0 1323 0 858 0 0
[13] 13 6346 0 0 0 0 0 3 0 485 0 0
[25] 36 0 0 0 0 1002 1360 0 4179 12 424 0
[37] 97 0 0 0 0 0 0 0 2577 0 0 0
[49] 0 0 5 2241 0 0 115 3678 0 14104 18 1662
[61] 0 0 0 0 6 0 0 7839 0 2 1313 1997
[73] 40 5390 0 0 0 208 180 1277 1460 0 0 1002
[85] 30 177 84 441 0 2986 1598 0 13925 94 5565 0
[97] 0 0 0 0
> length(adipose)
[1] 52580
> detach(data)
> length(adipose)
Error: object 'adipose' not found
> length(data$adipose)
[1] 52580
![Page 58: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/58.jpg)
Environment (1/2)
Environment basics: http://adv-r.had.co.nz/Environments.html
The job of an environment is to associate, or bind, a set of names to a set of values. You can think of an environment as a bag of names:
• If an object has no names pointing to it, it gets automatically deleted by the garbage collector.
• Every object in an environment has a unique name.
• The objects in an environment are not ordered (i.e., it doesn’t make sense to ask what the first object in an environment is).
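The properties listed above can be tried out on an environment created by hand. A minimal sketch using base R functions only; the environment e and the names x, y and z are arbitrary.

```r
e <- new.env()

# Bind two names to values
assign("x", 1:3, envir = e)
e$y <- "abc"

# Names are unique: binding "x" again replaces the old binding
e$x <- 10

sort(ls(e))                                # the names bound in e
get("x", envir = e)                        # the value bound to "x"
exists("z", envir = e, inherits = FALSE)   # "z" was never bound

# Remove a name; its value becomes unreachable if nothing else refers to it
rm("y", envir = e)
ls(e)
```

ls() returns the names in sorted order, which is a convenience of ls() and not an ordering of the environment itself.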
![Page 59: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/59.jpg)
Environment (2/2)
Most environments are created as a consequence of using functions.
An environment has a parent environment.
http://adv-r.had.co.nz/Environments.html
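Both statements can be observed with a function that returns another function. The sketch below is a common illustration rather than code from the lecture; make_counter, counter and count are made-up names.

```r
make_counter <- function() {
  count <- 0               # lives in the environment created by this call
  function() {
    count <<- count + 1    # updates count in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()
counter()

# The environment created by the call to make_counter() is still alive,
# because the returned function refers to it
env <- environment(counter)
ls(env)

# Its parent is the environment in which make_counter was defined
identical(parent.env(env), environment(make_counter))
```

Each call to make_counter() creates a fresh environment, so two counters made this way do not share their count.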
![Page 60: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/60.jpg)
the apply() function
> apply(data, 2, sum)
         adipose          adrenal            brain           breast
        23957600         18987359         20995462         23426900
           colon            heart           kidney            liver
        23397325         26762377         22630393         29314904
            lung        lymphnode          mixture          mixture
        23426381         19489508         31135063         57697453
         mixture            ovary         prostate  skeletal_muscle
        52460922         22857384         25215879         28400943
          testes          thyroid white_blood_cell
        27261469         24465463         27871222
> s <- apply(data, 2, sum)
> png(filename="bar001.png")
> par(mai=c(1,2,1,1))
> barplot(s,horiz=T,las=1)
> dev.off()
![Page 61: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/61.jpg)
Customizing (Traditional) Graphics
> s=apply(data, 2, sum)
> s
         adipose          adrenal            brain           breast
        23957600         18987359         20995462         23426900
           colon            heart           kidney            liver
        23397325         26762377         22630393         29314904
            lung        lymphnode          mixture          mixture
        23426381         19489508         31135063         57697453
         mixture            ovary         prostate  skeletal_muscle
        52460922         22857384         25215879         28400943
          testes          thyroid white_blood_cell
        27261469         24465463         27871222
> barplot(s)
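The second argument of apply() selects the direction: 2 applies the function to each column, 1 to each row. The sketch below shows this on a small made-up count table (not the BodyMap data) and compares the result with the base R shortcut colSums().

```r
# Small made-up count table: rows are genes, columns are samples
m <- data.frame(
  adipose = c(10L, 0L, 5L),
  brain   = c(3L, 7L, 0L)
)

s_apply   <- apply(m, 2, sum)   # sum() over each column (MARGIN = 2)
s_colsums <- colSums(m)         # built-in shortcut for the same totals
s_rows    <- apply(m, 1, sum)   # MARGIN = 1 sums over each row instead

s_apply
all.equal(s_apply, s_colsums)
```

apply() accepts any function, so the same pattern works for mean, max or a function written by the user.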
![Page 62: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/62.jpg)
Customizing (Traditional) Graphics
barplot(s, horiz=TRUE)
![Page 63: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/63.jpg)
Customizing (Traditional) Graphics
> par(mai=c(1,2,1,1))
> barplot(s,horiz=T,las=1)
![Page 64: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/64.jpg)
Customizing Traditional Graphics with the par() function
Paul Murrell, R Graphics, 2nd ed. (2011)
![Page 65: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/65.jpg)
Customizing Traditional Graphics with the par() function
Paul Murrell, R Graphics, 2nd ed. (2011)
![Page 66: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/66.jpg)
Paul Murrell, R Graphics, 2nd ed. (2011)
![Page 67: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/67.jpg)
![Page 68: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/68.jpg)
How many plot types are there?
![Page 69: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/69.jpg)
Winston Chang, R Graphics Cookbook, O'Reilly (2013)
ggplot2 and traditional graphics
![Page 70: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/70.jpg)
Functional programming with the apply() function
> apply(log10(data), 2, mean)
         adipose          adrenal            brain           breast
            -Inf             -Inf             -Inf             -Inf
           colon            heart           kidney            liver
            -Inf             -Inf             -Inf             -Inf
            lung        lymphnode          mixture          mixture
            -Inf             -Inf             -Inf             -Inf
         mixture            ovary         prostate  skeletal_muscle
            -Inf             -Inf             -Inf             -Inf
          testes          thyroid white_blood_cell
            -Inf             -Inf             -Inf
> mean2<-function(x) { mean(x[x>0]) }
> apply(log10(data), 2, mean2)
         adipose          adrenal            brain           breast
        2.335220         2.344531         2.278299         2.346041
           colon            heart           kidney            liver
        2.380096         2.226729         2.415721         2.236490
            lung        lymphnode          mixture          mixture
        2.484701         2.502548         2.531860         2.776740
         mixture            ovary         prostate  skeletal_muscle
        2.670258         2.402131         2.503051         2.464915
          testes          thyroid white_blood_cell
        2.486507         2.439520         2.597849
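One detail of mean2() is worth noting: it is applied after log10(), so the condition x > 0 drops the -Inf values from zero counts and also the value log10(1) = 0 from counts of exactly 1. The sketch below shows the effect on made-up counts and a hypothetical alternative, mean_finite(), that keeps every finite value. Which behaviour is wanted depends on the analysis.

```r
counts <- c(0, 1, 10, 100, 0, 1000)   # made-up counts for illustration
lx <- log10(counts)                   # -Inf 0 1 2 -Inf 3

mean2 <- function(x) mean(x[x > 0])               # as on the slide
mean_finite <- function(x) mean(x[is.finite(x)])  # hypothetical alternative

mean2(lx)         # averages 1, 2, 3
mean_finite(lx)   # averages 0, 1, 2, 3
```

Either function can be passed to apply() in the same way, for example apply(log10(data), 2, mean_finite).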
![Page 71: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/71.jpg)
Quick-R
http://www.statmethods.net/management/userfunctions.html
![Page 72: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/72.jpg)
Quick-R
http://www.statmethods.net/management/controlstructures.html
![Page 73: R lecture oga](https://reader030.vdocuments.mx/reader030/viewer/2022021506/586fba101a28abe57d8b8607/html5/thumbnails/73.jpg)