![Page 1: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/1.jpg)
Getting Data into R & Bioconductor
Aedín Culhane
http://www.hsph.harvard.edu/research/aedin-culhane/
![Page 2: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/2.jpg)
Simple Excel SpreadSheet data
• Already described
  – read.table()
  – read.csv()
  – scan()
• There are other formats, e.g. netCDF
• However, most other formats are specialized to a data type.
  – Look at Technologies on BiocViews.
  – http://www.bioconductor.org/packages/release/BiocViews.html
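As a reminder of the three base R readers listed above, here is a minimal sketch. The gene names and values are made up for illustration, and the data are read from in-memory strings (via the `text` argument) rather than from a real exported spreadsheet file.

```r
# Tab-delimited text standing in for a spreadsheet saved as .txt (made-up values)
tsv_text <- "gene\tsample1\tsample2\nTP53\t5.2\t6.1\nBRCA1\t7.4\t7.0\n"
expr_tab <- read.table(text = tsv_text, header = TRUE, sep = "\t", row.names = 1)

# The same table saved as comma-separated values
csv_text <- "gene,sample1,sample2\nTP53,5.2,6.1\nBRCA1,7.4,7.0\n"
expr_csv <- read.csv(text = csv_text, row.names = 1)

# scan() reads a flat vector of values rather than a table
values <- scan(text = "5.2 6.1 7.4 7.0", what = numeric(), quiet = TRUE)

str(expr_tab)
```

With a real file, replace `text = ...` with the file path, e.g. `read.csv("mydata.csv", row.names = 1)`, where `mydata.csv` is a placeholder name.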
![Page 3: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/3.jpg)
Some common data types
• Microarray
• SNP
• Increasingly NGS
May 2011
![Page 4: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/4.jpg)
A Microarray Overview
![Page 5: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/5.jpg)
Reading Affymetrix Data
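library(affy)  # load the affy package
require(affy)  # Alternative: returns FALSE with a warning instead of stopping if the package is missing
affybatch <- ReadAffy(celfile.path="[Location of your data]")  # read all CEL files in that directory into an AffyBatch
eSet <- justRMA(celfile.path="[Location of your data]")  # or read and RMA-normalize in one step; returns an ExpressionSet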
![Page 6: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/6.jpg)
Sample R code
![Page 7: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/7.jpg)
ExpressionSet Class in R
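To make the ExpressionSet class concrete, here is a minimal sketch that builds one by hand with the Biobase package. The probe names, sample names, and values are invented for illustration; real objects usually come from functions such as justRMA() or getGEO().

```r
library(Biobase)

# Toy expression matrix: 3 features (rows) x 2 samples (columns); values are made up
expr_mat <- matrix(c(5.2, 7.4, 6.3, 6.1, 7.0, 6.8), nrow = 3,
                   dimnames = list(c("probe1", "probe2", "probe3"),
                                   c("sampleA", "sampleB")))

# Sample annotation: row names must match the column names of the matrix
pheno <- data.frame(group = c("control", "disease"),
                    row.names = colnames(expr_mat))

eset <- ExpressionSet(assayData = expr_mat,
                      phenoData = AnnotatedDataFrame(pheno))

exprs(eset)         # the expression matrix
pData(eset)         # the sample annotation (phenoData) as a data.frame
featureNames(eset)  # row (probe) identifiers
sampleNames(eset)   # column (sample) identifiers
```

Keeping the expression values and the sample annotation in one object means that subsetting, e.g. `eset[, 1]`, keeps the two in sync.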
![Page 8: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/8.jpg)
Assessing Data Quality
![Page 9: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/9.jpg)
Public Microarray Data
ArrayExpress • 21,997 studies (622,617 profiles)
GEO • 22,735 studies (558,074 profiles)
Statistics as of May 2011
![Page 10: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/10.jpg)
>500,000 arrays x $500 per array = >$250,000,000
Cancer studies account for >14% of all studies in these databases…
![Page 11: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/11.jpg)
R Code
![Page 12: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/12.jpg)
More on GEOquery
require(GEOquery)
Let's try to load the GDS810 dataset, which contains data on Alzheimer's disease at various stages of severity.
GDS810 <- getGEO("GDS810")
The getGEO function returns an object of class GEOData. You can get a description of this class like this:
help("GEOData-class")
Meta(GDS810)
Columns(GDS810)
head(Table(GDS810))
![Page 13: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/13.jpg)
Affy SNP Arrays
![Page 14: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/14.jpg)
Process – Affy SNP Arrays (oligo package)
![Page 15: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/15.jpg)
Other Arrays
• Illumina
  – lumi package
• Two-color spotted arrays
  – limma package
• Other arrays
  – http://www.bioconductor.org/help/workflows/oligo-arrays/
![Page 16: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/16.jpg)
Next Generation Sequencing Data
![Page 17: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/17.jpg)
R Code
![Page 18: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/18.jpg)
Exercise
• Retrieve a series (GSE) record from GEO
• Download the dataset GSE1297 using getGEO
• The data will be downloaded as an eSet (for a GSE, getGEO returns a list of ExpressionSet objects), so to see the expression data and phenoData, use exprs and pData
• Use arrayQualityMetrics to assess the quality of these data
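One possible way to approach the exercise is sketched below. The helper names `fetch_gse_eset` and `qc_report` and the output directory name are hypothetical, chosen for illustration; they are not part of GEOquery or arrayQualityMetrics. The sketch assumes both packages are installed and that an internet connection is available, so the calls that download data are left commented out.

```r
# Hypothetical helper: download a GEO series and return its first ExpressionSet.
# For a GSE accession, getGEO() returns a list with one ExpressionSet per platform.
fetch_gse_eset <- function(accession) {
  gse_list <- GEOquery::getGEO(accession, GSEMatrix = TRUE)
  gse_list[[1]]
}

# Hypothetical helper: write an arrayQualityMetrics HTML report into `outdir`.
# force = TRUE overwrites an existing report directory.
qc_report <- function(eset, outdir) {
  arrayQualityMetrics::arrayQualityMetrics(expressionset = eset,
                                           outdir = outdir,
                                           force = TRUE)
}

# Usage (commented out because it downloads data from GEO):
# eset <- fetch_gse_eset("GSE1297")
# dim(Biobase::exprs(eset))     # features x samples
# head(Biobase::pData(eset))    # sample annotation
# qc_report(eset, outdir = "GSE1297_QC")
```

Whether the values need a log transform before quality assessment (the `do.logtransform` argument of arrayQualityMetrics) depends on how the submitter processed the data, so inspect the range of `exprs(eset)` first.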
![Page 19: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/19.jpg)
• With thanks to www.bioconductor.org/help/course.../Bioconductor-Introduction-lab.pdf
![Page 20: Getting Data into R & Bioconductor](https://reader035.vdocuments.mx/reader035/viewer/2022062409/56814c5d550346895db97bd0/html5/thumbnails/20.jpg)
Quick Aside: Interpreting hierarchical clustering trees
Hierarchical clustering results are viewed using a dendrogram (tree)
• Distance between nodes is read from the height at which their branches join (scale)
• Ordering of nodes is not important (like a baby's mobile)
[Figure: two dendrograms, A and B]
Trees A and B are equivalent
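The equivalence of two trees that differ only in leaf order can be checked in base R. The sketch below uses random made-up data and average linkage, both arbitrary choices for illustration. It flips every branch of a dendrogram with rev() and confirms that the cophenetic distances (the heights at which pairs of samples join) are unchanged.

```r
set.seed(1)  # arbitrary seed so the toy data are reproducible

# Toy data: 6 samples (rows) measured on 2 variables
x <- matrix(rnorm(12), nrow = 6, dimnames = list(paste0("s", 1:6), NULL))

hc <- hclust(dist(x), method = "average")
tree_a <- as.dendrogram(hc)
tree_b <- rev(tree_a)  # flip the branches at every node

# Leaf order differs between the two drawings ...
different_order <- !identical(labels(tree_a), labels(tree_b))

# ... but the joining heights between every pair of samples are the same
labs <- rownames(x)
coph_a <- as.matrix(cophenetic(tree_a))[labs, labs]
coph_b <- as.matrix(cophenetic(tree_b))[labs, labs]
same_tree <- isTRUE(all.equal(coph_a, coph_b))

# To view the two trees side by side:
# par(mfrow = c(1, 2)); plot(tree_a, main = "A"); plot(tree_b, main = "B")
```

Only the heights carry information about similarity; two samples drawn next to each other are not necessarily similar unless they also join at a low height.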