![Page 1: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/1.jpg)
ggplotHadley Wickham
![Page 2: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/2.jpg)
Outline
• Introduction to the data
• Introduction to ggplot
• Supplemental statistical summaries
• Iterating between graphics and models
• Graphical margins
![Page 3: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/3.jpg)
Intro to data
• Response of trees to gypsy moth attack
• 5 genotypes of tree: Dan-2, Sau-2, Sau-3, Wau-1, Wau-2
• 2 treatments: NGM / GM
• 2 nutrient levels: low / high
• 5 reps
![Page 4: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/4.jpg)
Intro to data
• 50 bugs (2 x 5 x 5)
• Weight of living bugs
• 100 leaves (2 x 2 x 5 x 5)
• Nitrogen (~ protein)
• Salicylates
• Tannins
![Page 5: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/5.jpg)
Intro to ggplot
• High-level package for creating statistical graphics - has a rich and comprehensive set of components, and a user friendly wrapper
• An implementation of "The Grammar of Graphics", Wilkinson 2005
• Find out more at http://had.co.nz/ggplot2
![Page 6: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/6.jpg)
10
20
30
40
50
60
70
Dan−2 Sau−2 Sau−3 Wau−1 Wau−2
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
weight
genotype
qplot(genotype, weight, data=b)
![Page 7: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/7.jpg)
10
20
30
40
50
60
70
Dan−2 Sau−2 Sau−3 Wau−1 Wau−2
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
nutrLowHighwe
ight
genotype
qplot(genotype, weight, data=b, colour=nutr)
![Page 8: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/8.jpg)
10
20
30
40
50
60
70
Sau−3 Dan−2 Sau−2 Wau−2 Wau−1
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
nutrLowHighwe
ight
genotype
![Page 9: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/9.jpg)
Comparing means
• Actually interested in comparing the means of the groups
• But hard to do visually - eyes naturally compare ranges
• What can we do? - Visual ANOVA
![Page 10: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/10.jpg)
Supplements
• smry <- stat_summary(fun=stat_mean_cl_boot, geom="crossbar", conf.int=0.68, width=0.3)
• Add another layer with summary statistics - mean and boot strap estimate of sem
• (from Frank Harrell's Hmisc package)
![Page 11: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/11.jpg)
10
20
30
40
50
60
70
Sau−3 Dan−2 Sau−2 Wau−2 Wau−1
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
nutrLowHighwe
ight
genotype
qplot(genotype, weight, data=b, colour=nutr)
![Page 12: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/12.jpg)
10
20
30
40
50
60
70
Sau−3 Dan−2 Sau−2 Wau−2 Wau−1
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
nutrLowHighwe
ight
genotype
qplot(genotype, weight, data=b, colour=nutr) + smry
![Page 13: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/13.jpg)
10
20
30
40
50
60
70
High.Sau−3Low.Sau−3High.Dan−2Low.Dan−2High.Sau−2Low.Sau−2High.Wau−2Low.Wau−2High.Wau−1Low.Wau−1
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
nutrLowHighwe
ight
interaction(nutr, genotype)
![Page 14: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/14.jpg)
Iterating graphics and modelling
• Strong genotype effect
• Is there a nutr effect? Is there a nutr-genotype interaction?
• Hard to see from this plot - what if we remove the nutr main effect?
• (Old idea of Tukey's)
![Page 15: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/15.jpg)
b$weight2 <- resid(lm(weight ~ genotype, data=b))
qplot(genotype, weight2, data=b, colour=nutr) + smry
b$weight3 <- resid(lm(weight ~ genotype + nutr, data=b))
qplot(genotype, weight3, data=b, colour=nutr) + smry
![Page 16: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/16.jpg)
−20
−10
0
10
20
Sau−3 Dan−2 Sau−2 Wau−2 Wau−1
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
nutrLowHighwe
ight
2
genotype
![Page 17: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/17.jpg)
−20
−10
0
10
Sau−3 Dan−2 Sau−2 Wau−2 Wau−1
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
nutrLowHighwe
ight
3
genotype
![Page 18: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/18.jpg)
Df Sum Sq Mean Sq F value Pr(>F) genotype 4 13331 3333 36.22 8.4e-13 ***nutr 1 1053 1053 11.44 0.0016 ** genotype:nutr 4 144 36 0.39 0.8141 Residuals 40 3681 92
anova(lm(weight ~ genotype * nutr, data=b))
![Page 19: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/19.jpg)
p <- qplot(genotype, weight, data=b, colour=nutr) + smry
p
p + aes(y = weight2)
p + aes(y = weight3)
![Page 20: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/20.jpg)
Graphical margins
• Often interested in marginal, as well as conditional, relationships
• Or comparing one subset to the whole, rather than to other subsets
• Like in contingency table, we often want to see margins as well
![Page 21: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/21.jpg)
2.0
2.5
3.0
3.5
4.0
4.5
2.0
2.5
3.0
3.5
4.0
4.5
GM NGM GM NGM GM NGM GM NGM GM NGM
Dan−2 Sau−2 Sau−3 Wau−1 Wau−2High
Low
●●●
●
●
●
●
●
●
●
●●
●
●●●
●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●●
●
●
●●
●
●
●●
●●
●
●●●●
●●●
●●
●
●●
●
●
●●
●●
●
●
●
●●
●●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●●
●
●●
●
●●
trtNGMGM
n
trt
![Page 22: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/22.jpg)
2.02.53.03.54.04.5
2.02.53.03.54.04.5
2.02.53.03.54.04.5
GM NGM GM NGM GM NGM GM NGM GM NGM GM NGM
Dan−2 Sau−2 Sau−3 Wau−1 Wau−2 (all)High
Low(all)
●●●
●
●
●
●
●●
●
●●
●
●●●
●●●
●
●●
●
●●●
●●●
●
●●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●●●
●●
●
●●
●
●
●●●
●●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●●●
●
●●
●●
●
●●●●●●
●
●●
●
●●●●●●
●
●●
●
●
●●●
●
●●
●●
●
●●
●
●
●●
●●
●
●
●
●●
●●●
●
●
●
●
●
●●
●●●
●
●
●
●
●●
●
●
●●
●●
●
●
●●
●●
●
●
●
●●
●
●●●●
●●
●
●●
●
●●●●
●●
●
●●
●
●●
●●
●
●
●
●●
●●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●●●
●
●●
●●
●
●●
●
●
●●
●●
●●
●●
●●
●
●
●
●●
●●
●
●●●
●●●
●
●
●
●●●
●●
●
●●●
●●●●●●
●
●●
●
●
●●
●●●
●
●
●
●
●●●●
●●
●
●●
●●
●
●●●
●●●
●
●●●
●
●
●
●
●●
●
●
●
●●●
●●
●
●●
●●
●
●
●
●
●
●●
●
●
●●●●●●
●
●●
●
●
●●●
●
●●
●●
●
●
●●
●●●
●
●
●
●
●●
●
●
●●
●●
●
●
●●●●
●●
●
●●
●
●●
●●
●
●
●
●●
trtNGMGM
n
trt
![Page 23: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/23.jpg)
Arranging plots
• Facilitate comparisons of interest
• Small differences need to be closer together (big difference can be far apart)
• Connections to model?
![Page 24: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/24.jpg)
10
20
30
40
50
60
70
Dan−2 Sau−2 Sau−3 Wau−1 Wau−2 Dan−2 Sau−2 Sau−3 Wau−1 Wau−2
High Low
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
nutrLowHighwe
ight
genotype
10
20
30
40
50
60
70
10
20
30
40
50
60
70
1 1 1 1 1
Dan−2 Sau−2 Sau−3 Wau−1 Wau−2
HighLow
●●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
weight
1
10
20
30
40
50
60
70
High Low
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
genotypeWau−2Wau−1Sau−3Sau−2Dan−2
weight
nutr
10
20
30
40
50
60
70
Dan−2 Sau−2 Sau−3 Wau−1 Wau−2
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
nutrLowHighwe
ight
genotype
![Page 25: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/25.jpg)
Conclusions
• Three useful graphical techniques:
• Supplement with statistical summaries
• Iterate graphics and modelling
• Graphical margins
• Graphics packages should get out of your way, and let you focus on creating the graphics you need
![Page 26: ggplot - R: The R Project for Statistical Computing](https://reader033.vdocuments.mx/reader033/viewer/2022052105/6286f4448b13c347fa1c2468/html5/thumbnails/26.jpg)
had.co.nz/ggplot2