Introduction to R and Bioconductor
Martin Morgan (mtmorgan@fhcrc.org)
Fred Hutchinson Cancer Research Center
Seattle, WA, USA
June 23, 2014
R
R is a language and environment for statistical computing and graphics
- Full-featured programming language
- Interactive and interpreted: convenient and forgiving of user errors
- Coherent, extensively documented
- Statistical concepts throughout the language, e.g., factor and NA
- Built-in statistical functionality
- Highly extensible via user-contributed packages
- Explore data
- Communicate results
R vectors, classes, and functions
- Vectors
  - logical, integer, numeric, complex, character, raw (byte)
  - factor: discrete levels
  - Missing-ness, NA
- data.frame, matrix, and other objects
- Functions
  - Operating on vectors, e.g., log, lm (fit a linear model)
  - 'Higher order' functions: apply a function to several different vectors, e.g., lapply(df, log)
- Packages

None of this making sense? R introduction / refresher tutorial this afternoon
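
The items on this slide can be tried at the R prompt. The following is a minimal sketch using only base R; the variable names and the small data values are made up for illustration and do not come from the slides.

```r
# Atomic vectors: every element of a vector has the same type
lgl <- c(TRUE, FALSE, NA)            # logical, including a missing value
int <- 1:3                           # integer
num <- c(1.5, 10, 100)               # numeric (double)
chr <- c("a", "b", "c")              # character

# factor: a vector restricted to a fixed set of discrete levels
trt <- factor(c("control", "treated", "control"),
              levels = c("control", "treated"))

# Missing-ness propagates unless it is handled explicitly
mean(c(1, NA, 3))                    # NA
mean(c(1, NA, 3), na.rm = TRUE)      # 2

# Functions operate on whole vectors at once
log(num)

# data.frame: a list of equal-length vectors, one per column
df <- data.frame(x = c(1, 2, 4, 8), y = c(2.1, 3.9, 8.2, 15.8))

# 'Higher order' function: apply log to each column of df
logged <- lapply(df, log)

# lm: fit a linear model of y on x
fit <- lm(y ~ x, data = df)
coef(fit)
```

`lapply()` returns a list with one element per column, which is why it works unchanged on a data.frame.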
Using R
Documentation
- help()
- vignettes

Work flows
- Scripts...
  - Reproducible
  - Literate
- ...mature to packages
  - Coordinate data, analysis, and documentation
  - Share with others
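
As a small illustration of the documentation tools named on this slide, the sketch below looks up a help page and lists a package's vignettes. The choice of `lapply` and of the `grid` package is an assumption made for the example; any function or installed package would do.

```r
# Look up the help page for a function. At the interactive prompt,
# typing help("lapply") or ?lapply displays the page directly.
h <- help("lapply")

# List the vignettes shipped with an installed package. The result
# holds a matrix with one row per vignette (possibly zero rows).
vigs <- vignette(package = "grid")
vigs$results[, c("Item", "Title"), drop = FALSE]
```

Calling `vignette()` with the name of one listed item opens that vignette.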
Bioconductor project goal
Analysis and comprehension of high-throughput genomic data
Statistical analysis
- Reduce large data to manageable knowledge
- Cope with technological artifacts
- Rigorous exploration
- Designed experiments, e.g., treatment vs. control
- Leading-edge methods for leading-edge questions
- Understandable
- Reproducible
- Effective visualization
- Biological context, e.g., annotation
- Training
- Sequencing: RNA-seq, ChIP-seq, variants, copy number, ...
- Microarrays: expression, SNP, ...
- Flow cytometry
- Proteomics
- Images
- ...
What is Bioconductor?
Collection of packages in the R statistical programming language
- Developed by the Bioconductor core and international contributors
- Stable 'release' branch, and leading edge 'devel' branch
- Open source / open development

Used by...
- Individuals
- Academic labs & research groups
- Government agencies
- Pharma and other companies
How to learn & use Bioconductor
1. Install R (& RStudio?)
2. Identify and install packages
3. Write R scripts
   - Input & 'massage' data
   - Quality assessment
   - Statistical analysis
   - Visualization
   - Annotation
   - Reports & summaries
4. Share with colleagues, collaborators, and the community
http://bioconductor.org
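
Step 2 might look like the sketch below. It assumes the `BiocManager` package, which is the installer in current Bioconductor releases and is not mentioned in these slides; `DESeq2` is used only as an example package, and the variable names are made up. The install runs only in an interactive session, so sourcing the script never starts a download by accident.

```r
# Packages we would like to use (illustrative choice)
wanted <- c("DESeq2")

# Which of them are not installed yet?
installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
to_install <- wanted[!installed]

# Install the missing ones, but only when run interactively
if (interactive() && length(to_install) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install(to_install)
}
```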
- Established work flows, e.g., RNA-seq differential expression with DESeq2
- Flexible bioinformatic analysis, e.g., ...
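
To give a feel for the first item, here is a minimal sketch of a DESeq2 differential expression run. It uses simulated counts from `makeExampleDESeqDataSet()` rather than real RNA-seq data, and the data set size and seed are arbitrary assumptions; a real analysis would start from a count matrix and a sample table. The code is skipped if DESeq2 is not installed.

```r
if (requireNamespace("DESeq2", quietly = TRUE)) {
    library(DESeq2)
    set.seed(1)

    # Simulated counts: 1000 genes, 6 samples in two conditions
    dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

    # Estimate size factors and dispersions, then fit the model
    dds <- DESeq(dds)

    # Per-gene results: log2 fold change, p-value, adjusted p-value
    res <- results(dds)
    print(head(res[order(res$padj), ]))
    ran <- TRUE
} else {
    message("DESeq2 is not installed; skipping the example")
    ran <- FALSE
}
```

The DESeq2 vignette walks through the same steps with real data and explains each column of the results table.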
Project strengths
- Extensive
- Respected
- Well-used
- Accessible

- 824 software packages, 867 annotation packages, 202 experiment data packages
- Sequencing, microarrays, flow cytometry, proteomics, image analysis, ...
- All packages with vignettes and help pages
- Tutorials, training material, national and international conferences
“Community repositories that carry out testing are ideal... the genetics community is fortunately familiar with the Comprehensive R Archive Network and the principles of stewardship of modular software embodied in the Bioconductor suite... The journal has sufficient experience with these resources to endorse their use by authors.”
Source: Nature Genetics 46, 1 (2014)
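PubMedCentral full-text citations

| Area | Package | Task | Citations |
|---|---|---|---|
| | Bioconductor | | 9070 |
| RNA-seq | edgeR | Diff. expression | 647 |
| RNA-seq | DESeq | Diff. expression | 648 |
| Microarray | affy | Pre-processing | 2318 |
| Microarray | limma | Diff. expression | 4503 |
| Microarray | GOstats | GSEA | 436 |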
- 225,000 unique IP addresses downloaded 9.3M packages
- 397,000 site visitors / year (27% increase) viewed 2.8M pages
- ~600 mailing list posts from ~210 authors per month
http://bioconductor.org
- Package vignettes & help pages
- Work flows
- Mailing list & 'guest posting' facility
- Courses and other training
- Annual Conference, Boston, July 30 to Aug 1
Acknowledgements
- Bioconductor core: Vince Carey, Sean Davis, Kasper Hansen, Wolfgang Huber, Robert Gentleman, Rafael Irizarry, Michael Lawrence, Levi Waldron
- Bioconductor team: Sonali Arora (introductory material, copy number), Marc Carlson (annotation), Nate Hayden (pileup, C++), Valerie Obenchain (variants, ranges), Herve Pages (ranges, strings), Paul Shannon (systems biology), Dan Tenenbaum (web, build)
- The international Bioconductor community!
- Funding: US NHGRI / NIH U41HG004059; NSF 1247813.
More: http://bioconductor.org