Microarray Quality Assessment
Issues in High-Throughput Data Analysis
BIOS 691-803 Spring 2010
Dr Mark Reimers
Quality Assessment
• Are there any factors that would lead you to doubt or distrust a particular datum (array)?
• Quality of inputs – e.g. RNA quality
• Statistical QA – evidence of systematic variation different from others
BioAnalyzer
Ideal: Two sharp peaks for 18S & 28S RNA
Spot QA for cDNA Spotted Arrays
• Spot Measures
– Signal/Noise: foreground / background, or foreground / SD
– Uniformity
– Spot Area
• Global Measures
– Qualitative assessments
– Averages of spot measures
• Inspect images for artifacts
– Streaks of dye, scratches etc.
• Are there biases in regions?
With commercial arrays we assume these issues are under control
Statistical Approaches
• Question: Are any samples different from others on technical grounds?
• Exploratory Data Analysis (EDA)
• Boxplots, clustering, PCA
– Are there any outliers?
– Are there associations with technical factors?
• Technician; date of sample prep; etc.
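One way to turn the outlier question into a number is sketched below: summarize each chip by its quartiles (the numbers a boxplot draws) and flag chips whose median sits far from the other chips' medians. This is only a sketch. The function names and the cutoff `k` are hypothetical and do not come from the lecture.

```python
import statistics

def chip_summary(values):
    """Return (q1, median, q3) of one chip's log intensities."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return q1, med, q3

def flag_outlier_chips(chips, k=3.0):
    """Flag chips whose median is far from the median of all chip medians.

    chips: dict mapping chip name -> list of log intensities.
    k: hypothetical cutoff in units of the median absolute deviation (MAD).
    """
    medians = {name: chip_summary(v)[1] for name, v in chips.items()}
    centre = statistics.median(medians.values())
    mad = statistics.median(abs(m - centre) for m in medians.values())
    if mad == 0:
        return []  # all chip medians agree; nothing to flag
    return sorted(n for n, m in medians.items() if abs(m - centre) > k * mad)

# Made-up example: chip D is shifted upward by about 3 log units.
chips = {
    "A": [6.0, 7.0, 8.0, 9.0, 10.0],
    "B": [6.1, 7.1, 8.1, 9.1, 10.1],
    "C": [5.9, 6.9, 7.9, 8.9, 9.9],
    "D": [9.0, 10.0, 11.0, 12.0, 13.0],
}
print(flag_outlier_chips(chips))
```

A flagged chip is only a candidate for closer inspection; the next step would be to check it against technician, prep date and similar factors.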
EDA - Boxplots
• Boxplot of 16 chips from Cheung et al Nature 2005
Another Portrait - Densities
[Figure: Density Plots: before and after. x-axis: log(Signal); y-axis: Density. Legend (Chips): GSM25524.CEL to GSM25531.CEL, GSM25540.CEL to GSM25543.CEL, GSM25548.CEL to GSM25551.CEL]
Probe Intensities in 23 Replicates
Some Causes of Technical Variation
• Temperature of hybridization differs
• Amount of RNA differs
• RNA degraded in some samples
• Yield of conversion to cDNA or cRNA differs
• Strength of ionic buffers differs
• Stringency of wash differs
• Scratches on some chips
• Ozone (affects Cy5) at some times
Borrow an Idea from Model Testing
• Question: Is the model adequate? Or do hidden factors cause systematic errors?
• Examine residuals after fitting model
– Should be IID Normal
– Is there structure in residuals?
– Plot against known technical covariates, such as order of sample
• How to adapt residual examination for high-throughput assays?
Statistical QA for Arrays
• Model for signal of probe i on chip j: $y_{ij} \sim \mu_i + \varepsilon_{ij}$
– Each gene has same mean in all arrays (mostly true)
– Look at residuals after fitting model
• New twist for high-throughput assays:
– Examine residuals within each chip (fix j; vary i)
– Plot against known technical factors of probes
– Is there any factor that seems to be predicting systematic errors?
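A minimal sketch of the residual step, assuming each probe's mean is estimated by its plain average across chips (a median would be a more robust choice; the lecture does not say which is used). The function names are hypothetical.

```python
import statistics

def probe_residuals(y):
    """y[i][j] is the log signal of probe i on chip j.

    Returns r with r[i][j] = y[i][j] - (mean of probe i across chips).
    """
    residuals = []
    for row in y:
        probe_mean = statistics.fmean(row)
        residuals.append([value - probe_mean for value in row])
    return residuals

def chip_residuals(residuals, j):
    """Residuals of every probe on one chip (fix j; vary i)."""
    return [row[j] for row in residuals]

# Made-up data: 2 probes on 3 chips; chip 1 reads high for both probes.
y = [[8.0, 8.2, 7.8],
     [10.0, 10.4, 9.6]]
r = probe_residuals(y)
print(chip_residuals(r, 1))
```

The list returned for one chip is what gets plotted against a technical factor of the probes.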
Statistical QA of Arrays
• Significant artifacts may not be obvious from visual inspection or bulk statistics
• General approach: plot deviations from average or residuals from fit against any technical variable:
– Average Intensity across chips
– CG content or Tm
– Probe position relative to 3’ end of gene (for poly-T primed RNA)
– Physical location on chip
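Without a plotting library, the same check can be approximated by binning: sort one chip's probes by the technical variable, cut them into equal-count bins, and look at the median deviation in each bin. A trend across bins suggests that variable is predicting systematic error. The sketch below is an assumption about how one might do this; `binned_medians` and `n_bins` are hypothetical names.

```python
import statistics

def binned_medians(covariate, deviations, n_bins=4):
    """Median deviation within equal-count bins of a technical covariate.

    covariate: one value per probe (e.g. Tm, or average intensity).
    deviations: one chip's deviation from the average for the same probes.
    Returns a list of (median covariate, median deviation), one per bin.
    """
    pairs = sorted(zip(covariate, deviations))
    n = len(pairs)
    trend = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            trend.append((statistics.median(c for c, _ in chunk),
                          statistics.median(d for _, d in chunk)))
    return trend

# Made-up example: deviations rise steadily with probe Tm.
tm = [60, 62, 64, 66, 68, 70, 72, 74]
dev = [-0.4, -0.3, -0.1, 0.0, 0.1, 0.1, 0.3, 0.5]
for centre, median_dev in binned_medians(tm, dev):
    print(centre, round(median_dev, 2))
```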
Ratio vs Intensity Plots: Saturation & Quenching
• Saturation
– Decreasing rate of binding of RNA at higher occupancies on probe
• Quenching:
– Light emitted by one dye molecule may be re-absorbed by a nearby dye molecule
– Then lost as heat
– Effect proportional to square of density
Plot of log ratio against average log intensity across chips
GSM25377 from the CEPH expression data GSE2552
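The coordinates of such a plot can be computed as sketched here, assuming the reference for each probe is its mean log intensity across chips. The second function is a crude, hypothetical summary (not from the lecture): a chip whose log ratios drop at high intensity is behaving as saturation or quenching would predict.

```python
import statistics

def ratio_intensity(y, j):
    """Points (average log intensity, log ratio to average) for chip j.

    y[i][j] is the log intensity of probe i on chip j.
    """
    points = []
    for row in y:
        average = statistics.fmean(row)
        points.append((average, row[j] - average))
    return points

def high_minus_low(points):
    """Median log ratio in the brighter half minus that in the dimmer half."""
    ordered = sorted(points)
    half = len(ordered) // 2
    low = statistics.median(m for _, m in ordered[:half])
    high = statistics.median(m for _, m in ordered[half:])
    return high - low

# Made-up data: chip 0 is compressed at the two brightest probes.
y = [[6.0, 6.0, 6.0],
     [8.0, 8.0, 8.0],
     [12.0, 13.0, 13.0],
     [13.0, 14.5, 14.5]]
print(high_minus_low(ratio_intensity(y, 0)))  # negative for chip 0
```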
How Much Variability on R-I?
• Ratio-Intensity plots for six arrays at random from Cheung et al Nature (2005)
Covariation with Probe Tm
• MAQC project
• Agilent 44K
– Array 1C3
– Performed by Agilent
• Plot of log ratios to average against Tm
• Bimodal distribution because two samples are very different
Covariation with Probe Position
• RNA degrades from 5’ end
• Intensity should decrease from 3’ end uniformly across chips
• AffyRNAdeg and plotAffyRNAdeg in the affy package
Plot of average intensity for each probe position across all genes against probe position
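The idea behind this plot can be sketched in a few lines. This is an illustration only and not the affy package's implementation: it assumes every probe set has the same number of probes, listed in 5' to 3' order, and summarizes each chip by a least-squares slope. Chips whose slope differs markedly from the others are the ones to question.

```python
import statistics

def position_profile(probesets):
    """Mean log intensity at each probe position across all probe sets.

    probesets: list of equal-length lists, each ordered 5' to 3' (assumed).
    """
    return [statistics.fmean(column) for column in zip(*probesets)]

def degradation_slope(profile):
    """Least-squares slope of mean intensity against position index."""
    n = len(profile)
    x_bar = (n - 1) / 2
    y_bar = statistics.fmean(profile)
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in enumerate(profile))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx

# Made-up chip: two probe sets, intensity rising toward the 3' end.
chip = [[7.0, 7.5, 8.0, 8.5],
        [9.0, 9.5, 10.0, 10.5]]
profile = position_profile(chip)
print(profile, degradation_slope(profile))
```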
Effect of Runs of Guanines
• A run of 4 G’s allows quadruplex structure
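Flagging the affected probes is a simple pattern match on the probe sequence. A small sketch, with a hypothetical function name; the threshold of four consecutive G's is taken from the slide.

```python
import re

# Four or more consecutive G's anywhere in the sequence.
G_RUN = re.compile(r"G{4,}")

def has_g_run(sequence):
    """True if the probe sequence contains a run of at least four G's."""
    return G_RUN.search(sequence.upper()) is not None

# Made-up sequences for illustration.
for seq in ["ACGGGGTCA", "ACGGGTCGGA", "ttggggga"]:
    print(seq, has_g_run(seq))
```

Deviations from average can then be compared between probes with and without such a run.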
Spatial Variation Across Chips
Red/Green ratios show variation, probably concentrated
Ratios of ratios on slide to ratios on standard show consistent biases
In House Spotted Arrays
Ratio of ratios shows much clearer concentration of red spots on some slides
Note non-random but highly irregular concentration of red
Bioconductor arrayQuality Package
Background Subtraction (1)
• We think that local background contributes to bias
• Does subtracting background remove bias?
Local off-spot background may not be the best estimate of spot background (non-specific hyb)
[Figure panels: Spots; BG subtracted]
Background Subtraction (2)
Raw spot ratios show a mild bias relative to average
After subtracting a high green bg in the center a red bias results
[Figure panels: Raw Ratios; Background; BG-subtracted]
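A small worked example shows how subtracting a high green background can create a red bias. The numbers are made up, and the `floor` parameter is a hypothetical guard against taking the log of a non-positive value, not part of the lecture.

```python
import math

def log_ratio(red, green, red_bg=0.0, green_bg=0.0, floor=1.0):
    """log2 of background-subtracted red over background-subtracted green.

    floor: hypothetical lower bound applied after subtraction.
    """
    r = max(red - red_bg, floor)
    g = max(green - green_bg, floor)
    return math.log2(r / g)

# Equal raw signals give no bias.
print(log_ratio(1000, 1000))
# Same spot, with a high local green background subtracted: ratio turns red.
print(log_ratio(1000, 1000, red_bg=100, green_bg=500))
```

If the off-spot green background overstates the non-specific signal under the spot, the second number is an artifact of the correction and not a property of the sample.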
Other Bias Patterns
This spotted oligo array shows strong biases at the beginning and end of each print-tip group
The background shows a milder version of this effect
Subtracting background compensates for about half this effect
[Figure panels: Processed; Raw Spot; Background]
Local Bias on Affymetrix Chips
Image of raw data on a log2 scale shows striations but no obvious artifacts
Image of ratios of probes to standard shows a smudge
Non-coding probes
Images show high values as red, low values as yellow
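A numeric counterpart to looking at such an image is to tile the chip into blocks and take the median log ratio to the standard in each block; a smudge shows up as a block well away from zero. The sketch below assumes the ratios are already arranged as a rows-by-columns grid matching physical position. The function name and block size are hypothetical.

```python
import statistics

def block_medians(image, block=2):
    """Median of each block x block tile of a 2-D grid of log ratios."""
    n_rows, n_cols = len(image), len(image[0])
    grid = []
    for r0 in range(0, n_rows, block):
        grid_row = []
        for c0 in range(0, n_cols, block):
            tile = [image[r][c]
                    for r in range(r0, min(r0 + block, n_rows))
                    for c in range(c0, min(c0 + block, n_cols))]
            grid_row.append(statistics.median(tile))
        grid.append(grid_row)
    return grid

# Made-up 4 x 4 chip with a smudge in the upper-left corner.
image = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
print(block_medians(image))
```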
Spatial Artifacts on Affy Chips
Bubbles (yellow) in hybridization chamber
Touching cover slip and wiping incompletely
Scratches on cover slip
QC in Bioconductor
• Robust Multi-chip Analysis (RMA)
– fits a linear model to each probe set
– High residuals show regional patterns
High residuals in green
Available in affyQCReport package at www.bioconductor.org
See http://plmimagegallery.bmbolstad.com/
Affy QC Metrics in Bioconductor
• affyPLM package fits probe level model to Affymetrix raw data
• NUSE - Normalized Unscaled Standard Errors – normalized relative to each gene
• How many big errors?
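The normalization can be illustrated as follows, assuming a table of standard errors with one row per gene and one column per array is already available from a probe level model fit. This is a sketch of the arithmetic only, not the affyPLM code, and the function names are hypothetical: each gene's standard errors are divided by that gene's median across arrays, so a typical array has values near 1.

```python
import statistics

def nuse(se):
    """Scale each gene's standard errors by its median across arrays.

    se[g][j] is the standard error for gene g on array j.
    """
    scaled = []
    for row in se:
        gene_median = statistics.median(row)
        scaled.append([value / gene_median for value in row])
    return scaled

def nuse_array_medians(se):
    """Median scaled standard error per array; well above 1 is suspect."""
    return [statistics.median(column) for column in zip(*nuse(se))]

# Made-up standard errors: the third array is noisier for every gene.
se = [[0.10, 0.10, 0.20],
      [0.20, 0.20, 0.50],
      [0.05, 0.05, 0.08]]
print(nuse_array_medians(se))
```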
Spatial Artifacts in Agilent
• Usually not so strong as on other array types
• More diffuse artifacts – probably reflecting washing irregularities
Spatial Artifacts in Nimblegen
• More common than Agilent
• Usually more diffuse, probably reflecting washing
• Some sharp artifacts of unclear origin
Spatial Artifacts in Illumina Arrays
• Often bigger artifacts than Affy
• Less consequential because more beads, and all have same sequence