Reproducible Research in R and R Studio
![Page 1: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/1.jpg)
Introduction to Reproducible Research in R and R Studio.
Susan Johnston
April 1, 2016
![Page 2: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/2.jpg)
What is Reproducible Research?
Reproducibility is the ability of an entire experiment or study to be reproduced, either by the researcher or by someone else working independently, [and] is one of the main principles of the scientific method.
Wikipedia
![Page 3: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/3.jpg)
In the lab:
![Page 4: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/4.jpg)
Many of us are clicking, copying and pasting...
- Can you repeat all of this again...
- ...and would you get the same results every time?
![Page 5: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/5.jpg)
Worst Case Scenario
![Page 6: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/6.jpg)
Scenarios that benefit from reproducibility
- New raw data becomes available.
- You return to the project after a period of time.
- The project gets handed to a new PhD student or postdoc.
- You are working collaboratively.
- A reviewer wants you to change a model parameter.
- You find an error but are not sure where you went wrong.
![Page 7: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/7.jpg)
Four rules for reproducibility.
1. Create a portable project.
2. Avoid manual data manipulation steps - use code!
3. Connect results to text.
4. Version control all custom scripts and documents.
![Page 8: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/8.jpg)
Disclaimer
Many solutions to the same problem!
![Page 9: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/9.jpg)
The Environment: http://www.rstudio.com
![Page 10: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/10.jpg)
Reproducible Research in R Studio
1. Creating a Portable Project (.Rproj)
2. Automate analyses - stop clicking and start typing.
3. Dynamic report writing with R Markdown and knitr
4. Version control using git
![Page 12: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/12.jpg)
Structuring an R Project.
http://nicercode.github.io/blog/2013-05-17-organising-my-project/
All data, scripts and output should be kept within the same project directory.
![Page 14: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/14.jpg)
Structuring an R Project.
http://nicercode.github.io/blog/2013-04-05-projects/
- R/ Contains functions relevant to the analysis.
- data/ Contains raw data, treated as read-only.
- doc/ Contains the paper.
- figs/ Contains the figures.
- output/ Contains analysis output (processed data, logs, etc.). Treat as disposable.
- .R Code for the analysis.
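The layout above can be created from R itself. This is a minimal sketch, not the method from the linked post: the helper name `create_project_skeleton`, the project name and the top-level script name `analysis.R` are assumptions made for illustration.

```r
# Hypothetical helper: create the directory skeleton described above.
create_project_skeleton <- function(root) {
  dirs <- c("R", "data", "doc", "figs", "output")
  for (d in dirs) {
    dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  }
  # Empty top-level script for the analysis code (the file name is an assumption).
  file.create(file.path(root, "analysis.R"))
  invisible(file.path(root, dirs))
}

# A temporary directory keeps the sketch from touching real files.
project_root <- file.path(tempdir(), "SalmoAnalysis")
create_project_skeleton(project_root)
list.files(project_root)
```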
![Page 15: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/15.jpg)
Structuring an R Project.
http://robjhyndman.com/hyndsight/workflow-in-r/
- load.R - read in data from files
- clean.R - pre-processing and cleaning
- functions.R - define what you need for the analysis
- do.R - do the analysis!
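One way to tie the four scripts together is a short master script that sources them in order. The sketch below is illustrative only: the script contents and the names `raw`, `clean`, `mean_len` and `result` are invented, and the scripts are written to a temporary directory so the example runs anywhere. In a real project, load.R would read files from data/.

```r
# Build a throwaway project in a temporary directory.
proj <- file.path(tempdir(), "workflow_demo")
dir.create(proj, showWarnings = FALSE)

# Toy versions of the four scripts (contents are assumptions).
writeLines("raw <- data.frame(id = 1:4, len = c(10, NA, 14, 12))",
           file.path(proj, "load.R"))
writeLines("clean <- raw[!is.na(raw$len), ]", file.path(proj, "clean.R"))
writeLines("mean_len <- function(d) mean(d$len)", file.path(proj, "functions.R"))
writeLines("result <- mean_len(clean)", file.path(proj, "do.R"))

# Run the steps in order inside a dedicated environment,
# so nothing depends on objects left over in the workspace.
env <- new.env()
for (script in c("load.R", "clean.R", "functions.R", "do.R")) {
  source(file.path(proj, script), local = env)
}
env$result
```

Rerunning the loop after the raw data changes repeats every step without any manual intervention.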
![Page 16: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/16.jpg)
Bad habits can hinder portability.
https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects
setwd("C:/Users/susjoh/Desktop/SalmoAnalysis")
setwd("C:/Users/Susan Johnston/Desktop/SalmoAnalysis")
setwd("C:/Users/sjohns10/Drive/SalmoAnalysis")
source("../../OvisAnalysis/GWASplotfunctions.R")
An analysis should be contained within a directory, and it should be easy to move it or pass it on to someone new.
![Page 17: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/17.jpg)
Solution: using Projects.
https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects
- Establishes a directory with an associated .Rproj file.
- Automatically sets the working directory.
- Can save and source .Rprofile, .Rhistory and .RData files.
- Allows version control within R Studio.
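Once the working directory is the project directory, every file can be addressed with a relative path. The sketch below assumes a hypothetical data/fish.csv. The `setwd()` call is there only to imitate what opening the .Rproj file does automatically; it would not appear in a real analysis script.

```r
# Stand-in for a project directory (the name is an assumption).
project_root <- file.path(tempdir(), "SalmoAnalysisDemo")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, weight = c(2.1, 3.4, 2.9)),
          file.path(project_root, "data", "fish.csv"), row.names = FALSE)

# Imitate opening the project: RStudio would set this working directory itself.
old_wd <- setwd(project_root)

# Relative path built with file.path(): no user name, drive letter or "../..".
fish <- read.csv(file.path("data", "fish.csv"))

# Restore the previous working directory.
setwd(old_wd)
nrow(fish)
```

Because no absolute path is involved, the directory can be moved or handed to someone else without editing the script.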
![Page 18: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/18.jpg)
Creating a Portable Project (.Rproj)
![Page 19: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/19.jpg)
Creating a Portable Project (.Rproj)
![Page 20: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/20.jpg)
Creating a Portable Project (.Rproj)
![Page 21: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/21.jpg)
Creating a Portable Project (.Rproj)
![Page 23: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/23.jpg)
This is R. There is no “if”. Only “how”.
- CRAN, Bioconductor, GitHub
- Reading in data and functions
  read.table(), read.csv(), read.xlsx(), source()
- Reorganising data
  reshape, plyr, dplyr
- Generate figures
  plot(), library(ggplot2)
- Running external programmes with system()
  Unix/Mac: system("plink --file OvGen --freq")
  Windows: system("cmd", input = "plink --file OvGen --freq")
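The steps listed above can be chained in a single script. This is a sketch in base R only: the toy data, the column names `pop` and `len`, and the output file name are assumptions. The external call uses `system2()` rather than `system()`, checks that plink is installed first, and passes PLINK's documented `--file` flag.

```r
# Read data (inline text stands in for a file such as data/lengths.csv).
lengths <- read.csv(text = "pop,len\nA,10\nA,12\nB,15\nB,17")

# Reorganise: mean length per population.
pop_means <- aggregate(len ~ pop, data = lengths, FUN = mean)

# Generate a figure and write it to a file instead of exporting it by hand.
fig_file <- file.path(tempdir(), "pop_means.pdf")
pdf(fig_file)
barplot(pop_means$len, names.arg = pop_means$pop, ylab = "Mean length")
invisible(dev.off())

# Run an external program only if it is available on this machine.
plink <- Sys.which("plink")
if (nzchar(plink)) {
  system2(plink, args = c("--file", "OvGen", "--freq"))
} else {
  message("plink not found on the PATH; skipping the external step")
}
```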
![Page 25: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/25.jpg)
The knitr package allows R code and document templates to be compiled into a single report containing text, results and figures.
![Page 26: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/26.jpg)
Output script as Notebook
![Page 27: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/27.jpg)
![Page 28: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/28.jpg)
Write reports directly in R
![Page 29: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/29.jpg)
Write reports directly in R
![Page 30: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/30.jpg)
Creating an R Markdown Script (.Rmd).
![Page 31: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/31.jpg)
Creating an R Markdown Script (.Rmd).
![Page 32: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/32.jpg)
A Quick Start Guide
http://nicercode.github.io/guides/reports/
1. Type report text into the .Rmd file
Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
2. Enclose code to be evaluated in chunks
```{r}
model1 <- lm(speed ~ dist, data = cars)
```
3. Evaluate code inline
Source: The slope of the model is `r coefficients(model1)[2]`
Output: The slope of the model is 0.16557
4. Compile the report as .html, .pdf or Word (.docx)
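The four steps can also be tried from the console without creating a file. The sketch below assumes the knitr package is installed and passes the document to `knitr::knit()` through its `text` argument, which produces Markdown. In RStudio the Knit button, or `rmarkdown::render()`, is the usual route to .html, .pdf or Word output. The fence string is built with `strrep()` only so that this example nests cleanly inside a code block.

```r
fence <- strrep("`", 3)  # three backticks, built programmatically

# A tiny R Markdown document: text, one chunk, one inline expression.
rmd <- c(
  "Report text goes here.",
  "",
  paste0(fence, "{r}"),
  "model1 <- lm(speed ~ dist, data = cars)",
  fence,
  "",
  "The slope of the model is `r round(coefficients(model1)[2], 5)`."
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # Evaluate the chunk and the inline code, returning Markdown as text.
  md <- knitr::knit(text = rmd, quiet = TRUE)
  cat(md)
} else {
  message("knitr is not installed; install.packages('knitr') to run this sketch")
}
```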
![Page 33: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/33.jpg)
A Quick Start Guide
http://nicercode.github.io/guides/reports/
NB. PDF and Word docs require additional software.
http://rmarkdown.rstudio.com/?version=0.98.1103&mode=desktop
![Page 34: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/34.jpg)
A Quick Start Guide
http://nicercode.github.io/guides/reports/
http://rmarkdown.rstudio.com/?version=0.98.1103&mode=desktop
![Page 35: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/35.jpg)
Advanced Tips
- Control how chunks are reported and evaluated
```{r echo = FALSE, warning = FALSE, fig.width = 3}
model1 <- lm(speed ~ dist, data = cars)
plot(model1)
```
- spin(): compile .R files using #', #+ and #-
  http://deanattali.com/2015/03/24/knitrs-best-hidden-gem-spin/
- LaTeX documents, presentations, Shiny, etc.
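To see what spin() does with the special comments, the sketch below converts a small script to R Markdown without knitting it. It assumes knitr is installed; the script content and the chunk label `fit-model` are made up for illustration.

```r
# An ordinary R script: #' lines are report text, #+ lines set chunk options.
script <- c(
  "#' # Model report",
  "#' Lines starting with #' become report text.",
  "#+ fit-model, echo = FALSE",
  "model1 <- lm(speed ~ dist, data = cars)",
  "#' Text after the chunk continues the report."
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # knit = FALSE stops after the conversion, returning the .Rmd source as text.
  rmd_source <- knitr::spin(text = script, knit = FALSE)
  cat(rmd_source)
} else {
  message("knitr is not installed; install.packages('knitr') to run this sketch")
}
```

The chunk header in the printed result carries the label and the `echo = FALSE` option from the `#+` line.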
![Page 37: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/37.jpg)
Version Control can revert a document to a previous version.
![Page 41: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/41.jpg)
Version Control Using git.
https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN
Git can be installed on all platforms, and can be used to implement version control within an R Studio Project.
http://git-scm.com/downloads
![Page 42: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/42.jpg)
Version Control in R Studio
Tools > Project Options allows setup of git version control.
![Page 43: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/43.jpg)
Version Control in R Studio
Select git as a version control system
![Page 44: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/44.jpg)
Version Control in R Studio
Select git as a version control system
![Page 45: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/45.jpg)
Version Control in R Studio
git information will appear in the top-right frame.
![Page 46: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/46.jpg)
Version Control in R Studio
git information will appear in the top-right frame.
![Page 47: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/47.jpg)
Version Control in R Studio
Select files to version control, write a meaningful commit message, then click Commit.
![Page 48: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/48.jpg)
Version Control in R Studio
Select files to version control, write a meaningful commit message, then click Commit.
![Page 49: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/49.jpg)
Version Control in R Studio
After modifying the file, repeat the process.
![Page 50: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/50.jpg)
Version Control in R Studio
After modifying the file, repeat the process.
![Page 51: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/51.jpg)
Version Control in R Studio
Previous versions can be viewed and restored from the History tab.
![Page 52: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/52.jpg)
Version Control in R Studio
Previous versions can be viewed and restored from the History tab.
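The point-and-click steps above correspond roughly to a handful of git commands. The sketch below runs them from R with `system2()` in a temporary directory, assuming git is installed. The helper `run_git`, the file content and the demo user name and e-mail are placeholders, not part of the talk.

```r
git <- Sys.which("git")
repo <- file.path(tempdir(), "git_demo")
dir.create(repo, showWarnings = FALSE)
script_file <- file.path(repo, "analysis.R")

# Hypothetical helper: run a git command inside the demo repository.
run_git <- function(...) {
  system2(git, c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
}
# Identity passed per command so the sketch works without global git config.
id <- c("-c", "user.name=Demo", "-c", "user.email=demo@example.com")

if (nzchar(git)) {
  run_git("init")                                   # set up version control
  writeLines("x <- 1", script_file)
  run_git("add", "analysis.R")                      # select the file
  run_git(id, "commit", "-m", shQuote("Add analysis script"))
  first <- run_git("rev-parse", "HEAD")             # remember this version

  writeLines("x <- 2", script_file)                 # modify and repeat
  run_git("add", "analysis.R")
  run_git(id, "commit", "-m", shQuote("Change x"))

  print(run_git("log", "--oneline"))                # the history
  run_git("checkout", first, "--", "analysis.R")    # restore the first version
  print(readLines(script_file))
} else {
  message("git not found on the PATH; skipping the demonstration")
}
```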
![Page 53: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/53.jpg)
Advanced Steps: GitHub
- Forking projects
- All scripts are backed up online
- Facilitates collaboration and working on different computers
![Page 54: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/54.jpg)
Take home messages
- Manage projects reproducibly: the first researcher who will need to reproduce the results is likely to be YOU.
- Time invested in learning to code pays off - do it.
- Supervisors should be patient and encourage students to code.
![Page 55: Reproducible Research in R and R Studio](https://reader036.vdocuments.mx/reader036/viewer/2022062503/58f147831a28ab82588b45f1/html5/thumbnails/55.jpg)
Online Resources
- RStudio: Idiot-proof guides and cheat sheets
  http://www.rstudio.com/
- Nice R Code: How-tos and advice on good coding practice
  http://nicercode.github.io/guide.html
- Ten Simple Rules for Reproducible Computational Research
  http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285
- Yihui Xie’s blog (knitr)
  http://yihui.name/en/categories/
- R Bloggers
  http://www.r-bloggers.com/
- StackOverflow questions on R and knitr
  http://stackoverflow.com/questions/tagged/r+knitr