Microarray Quality Assessment Issues in High-Throughput Data Analysis BIOS 691-803 Spring 2010 Dr Mark Reimers

Page 1

Microarray Quality Assessment

Issues in High-Throughput Data Analysis

BIOS 691-803 Spring 2010

Dr Mark Reimers

Page 2

Quality Assessment

• Are there any factors that would lead you to doubt or distrust a particular datum (array)?

• Quality of inputs – e.g. RNA quality

• Statistical QA – evidence of systematic variation that differs from the other arrays

Page 3

BioAnalyzer

Ideal: two sharp peaks for 18S and 28S rRNA

Page 4

Spot QA for cDNA Spotted Arrays

• Spot Measures
  – Signal/Noise
    • Foreground / background, or
    • foreground / SD
  – Uniformity
  – Spot Area

• Global Measures
  – Qualitative assessments
  – Averages of spot measures

• Inspect images for artifacts
  – Streaks of dye, scratches etc.

• Are there biases in regions?

With commercial arrays we assume these issues are under control
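A minimal sketch of the two signal/noise spot measures listed above. The function name and the pixel values are hypothetical, and "SD" is assumed here to mean the standard deviation of the background pixels, which the slide does not specify.

```python
from statistics import mean, stdev

def spot_quality(foreground_pixels, background_pixels):
    """Return two illustrative signal/noise measures for a single spot."""
    fg = mean(foreground_pixels)
    bg = mean(background_pixels)
    bg_sd = stdev(background_pixels)  # assumption: SD taken over background pixels
    return {
        "fg_over_bg": fg / bg,     # foreground / background
        "fg_over_sd": fg / bg_sd,  # foreground / SD
    }

# Made-up pixel intensities for one spot and its local background.
spot = spot_quality([1200, 1100, 1300, 1250], [100, 120, 90, 110])
print(spot)
```

Averaging such per-spot values over a slide would give one of the global measures mentioned on the same slide.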

Page 5

Statistical Approaches

• Question: Are any samples different from others on technical grounds?

• Exploratory Data Analysis (EDA)

• Boxplots, clustering, PCA
  – Are there any outliers?
  – Are there associations with technical factors?
    • Technician; date of sample prep; etc.

Page 6

EDA - Boxplots

• Boxplot of 16 chips from Cheung et al Nature 2005
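The numbers behind a boxplot can also be screened directly. The sketch below computes per-chip quartiles and flags chips whose median sits far from the median of all chip medians. The chip names, values, and the 1.0 log2-unit tolerance are illustrative assumptions, not values from the Cheung et al. data.

```python
from statistics import median, quantiles

def chip_summaries(chips):
    """chips: dict mapping chip name -> list of log2 intensities."""
    out = {}
    for name, values in chips.items():
        q1, q2, q3 = quantiles(values, n=4)  # quartiles, as drawn in a boxplot
        out[name] = {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1}
    return out

def flag_shifted_chips(chips, tolerance=1.0):
    """Flag chips whose median differs from the typical chip median by more than `tolerance`."""
    summaries = chip_summaries(chips)
    centre = median([s["median"] for s in summaries.values()])
    return [name for name, s in summaries.items()
            if abs(s["median"] - centre) > tolerance]

chips = {
    "chip_A": [6.0, 7.0, 8.0, 9.0, 10.0],
    "chip_B": [6.1, 7.2, 8.1, 9.0, 10.2],
    "chip_C": [4.0, 5.0, 5.5, 6.0, 7.0],  # noticeably dimmer chip
}
flagged = flag_shifted_chips(chips)
print(flagged)
```

A flagged chip is only a candidate for closer inspection; whether it is a technical outlier still depends on the covariates discussed on the previous slide.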


Page 7

Another Portrait - Densities

[Figure: "Density Plots: before and after". x-axis: log(Signal); y-axis: Density; one curve per chip. Chips: GSM25524.CEL to GSM25531.CEL, GSM25540.CEL to GSM25543.CEL, GSM25548.CEL to GSM25551.CEL]

Page 8

Probe Intensities in 23 Replicates

Page 9

Some Causes of Technical Variation

• Temperature of hybridization differs
• Amount of RNA differs
• RNA degraded in some samples
• Yield of conversion to cDNA or cRNA differs
• Strength of ionic buffers differs
• Stringency of wash differs
• Scratches on some chips
• Ozone (affects Cy5) at some times

Page 10

Borrow an Idea from Model Testing

• Question: Is the model adequate? Or do hidden factors cause systematic errors?

• Examine residuals after fitting model
  – Should be IID Normal
  – Is there structure in residuals?
  – Plot against known technical covariates, such as order of sample

• How to adapt residual examination for high-throughput assays?

Page 11

Statistical QA for Arrays

• Model for signal of probe i on chip j: $y_{ij} \sim \mu_i + \varepsilon_{ij}$
  – Each gene has same mean in all arrays (mostly true)
  – Look at residuals after fitting model

• New twist for high-throughput assays:
  – Examine residuals within each chip (fix j; vary i)
  – Plot against known technical factors of probes
  – Is there any factor that seems to be predicting systematic errors?
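A minimal sketch of this residual check, under stated assumptions: the probe mean is estimated by a plain average across chips, the technical factor is a made-up GC fraction per probe, and a least-squares slope stands in for looking at the plot. All names and numbers are hypothetical.

```python
from statistics import mean

def residuals_by_chip(y):
    """y[i][j] = log signal of probe i on chip j.
    Returns r[j][i] = y[i][j] minus the mean of probe i across chips."""
    probe_means = [mean(row) for row in y]
    n_chips = len(y[0])
    return [[y[i][j] - probe_means[i] for i in range(len(y))]
            for j in range(n_chips)]

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

gc = [0.30, 0.40, 0.50, 0.60]  # hypothetical technical factor, one value per probe
y = [
    [8.0, 8.0, 7.7],
    [9.0, 9.0, 8.9],
    [7.0, 7.0, 7.1],
    [10.0, 10.0, 10.3],        # third chip drifts upward with GC
]
slopes = [slope(gc, r) for r in residuals_by_chip(y)]
worst = max(range(len(slopes)), key=lambda j: abs(slopes[j]))
print(slopes, worst)
```

With so few chips, the biased chip also pulls the probe means, so the other chips show a smaller slope of opposite sign; with many chips, or a median in place of the mean, that leakage shrinks.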

Page 12

Statistical QA of Arrays

• Significant artifacts may not be obvious from visual inspection or bulk statistics

• General approach: plot deviations from average or residuals from fit against any technical variable:
  – Average Intensity across chips
  – CG content or Tm
  – Probe position relative to 3’ end of gene (for poly-T primed RNA)
  – Physical location on chip

Page 13

Ratio vs Intensity Plots: Saturation & Quenching

• Saturation
  – Decreasing rate of binding of RNA at higher occupancies on probe

• Quenching:
  – Light emitted by one dye molecule may be re-absorbed by a nearby dye molecule
  – Then lost as heat
  – Effect proportional to square of density

Plot of log ratio against average log intensity across chips

GSM25377 from the CEPH expression data GSE2552
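The coordinates of such a ratio-intensity plot can be sketched as follows: for each probe, the x value is its average log intensity across chips and the y value is the chip's log intensity minus that average. The data are invented so that the third chip falls short at high intensity, as saturation or quenching would cause; the half-split summary is an illustrative stand-in for looking at the plot.

```python
from statistics import mean

def ratio_intensity(y, chip):
    """Return (A, M) pairs for one chip.
    A = average log2 intensity of the probe across chips,
    M = this chip's log2 intensity minus A (a log ratio to the average)."""
    pairs = []
    for row in y:
        a = mean(row)
        pairs.append((a, row[chip] - a))
    return pairs

def trend_by_intensity(pairs):
    """Mean M in the lower and upper halves of A."""
    ordered = sorted(pairs)
    half = len(ordered) // 2
    low = mean([m for _, m in ordered[:half]])
    high = mean([m for _, m in ordered[half:]])
    return low, high

# Hypothetical log2 intensities: rows are probes, columns are chips.
y = [
    [6.0, 6.0, 6.0],
    [8.0, 8.0, 8.0],
    [10.0, 10.0, 10.0],
    [12.0, 12.0, 11.7],
    [14.0, 14.0, 13.4],
    [15.0, 15.0, 14.1],
]
low, high = trend_by_intensity(ratio_intensity(y, chip=2))
print(low, high)
```

A clearly negative mean log ratio in the high-intensity half, with none in the low half, is the pattern the slide associates with saturation and quenching.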

Page 14

How Much Variability on R-I?

• Ratio-Intensity plots for six arrays at random from Cheung et al Nature (2005)

Page 15

Covariation with Probe Tm

• MAQC project

• Agilent 44K
  – Array 1C3
  – Performed by Agilent

• Plot of log ratios to average against Tm
• Bimodal distribution because two samples are very different

Page 16

Covariation with Probe Position

• RNA degrades from 5’ end

• Intensity should decrease from 3’ end uniformly across chips

• affyRNAdeg plots in affy package

Plot of average intensity for each probe position across all genes against probe position
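A rough sketch of the quantity plotted here, written in plain Python rather than with the affy package: average the log intensity at each probe position over all probe sets, then summarise the profile by a least-squares slope. It assumes every probe set has the same number of probes and that probes are ordered from the 5' end to the 3' end; the data are invented.

```python
from statistics import mean

def position_profile(probesets):
    """probesets: list of probe sets, each a list of log intensities
    ordered from the 5' end to the 3' end (assumed ordering).
    Returns the mean log intensity at each position."""
    n_pos = len(probesets[0])
    return [mean([ps[k] for ps in probesets]) for k in range(n_pos)]

def ls_slope(values):
    """Least-squares slope of values against position index 0, 1, 2, ..."""
    xs = list(range(len(values)))
    mx, my = mean(xs), mean(values)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, values))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Two hypothetical chips, two probe sets each, four probes per set.
chip_typical = [[8.0, 8.1, 8.2, 8.3], [9.0, 9.1, 9.2, 9.3]]
chip_degraded = [[7.0, 7.5, 8.0, 8.5], [8.0, 8.5, 9.0, 9.5]]

slope_typical = ls_slope(position_profile(chip_typical))
slope_degraded = ls_slope(position_profile(chip_degraded))
print(slope_typical, slope_degraded)
```

A chip whose slope stands apart from the rest is the one to question, since the slide expects the decrease away from the 3' end to be uniform across chips.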

Page 17

Effect of Runs of Guanines

• A run of 4 G’s allows a quadruplex structure
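Probes containing such runs can be picked out from their sequences before plotting residuals by group. A minimal sketch, with hypothetical function names and made-up sequences:

```python
import re

G_RUN = re.compile(r"G{4,}")  # four or more consecutive guanines

def has_g_run(seq):
    """True if the probe sequence contains a run of at least four G's."""
    return G_RUN.search(seq.upper()) is not None

def longest_g_run(seq):
    """Length of the longest run of G's in the sequence (0 if none)."""
    runs = re.findall(r"G+", seq.upper())
    return max((len(r) for r in runs), default=0)

probes = {
    "probe_1": "ACGGGGTCATTGCA",
    "probe_2": "ACGGGTGGATTGCA",
}
with_runs = [name for name, seq in probes.items() if has_g_run(seq)]
print(with_runs)
```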

Page 18

Spatial Variation Across Chips

Red/Green ratios show variation – probably concentrated

Ratios of ratios on slide to ratios on standard show consistent biases
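One simple way to look for regional bias without an image is to average the log ratio to the standard over tiles of the chip surface. The sketch below assumes the values are already arranged by physical row and column; the grid and the tile size are invented.

```python
from statistics import mean

def block_means(values, block):
    """values: 2D list (rows x columns) of log ratios laid out by
    physical position on the slide. Returns the mean of each
    block x block tile, as a 2D list of tiles."""
    n_rows, n_cols = len(values), len(values[0])
    tiles = []
    for r0 in range(0, n_rows, block):
        row_tiles = []
        for c0 in range(0, n_cols, block):
            cells = [values[r][c]
                     for r in range(r0, min(r0 + block, n_rows))
                     for c in range(c0, min(c0 + block, n_cols))]
            row_tiles.append(mean(cells))
        tiles.append(row_tiles)
    return tiles

# Hypothetical log ratios to a standard; the top-left corner is elevated.
grid = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
tiles = block_means(grid, 2)
print(tiles)
```

Tiles whose mean departs from zero point to a region where ratios are consistently biased relative to the standard.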

Page 19

In House Spotted Arrays

Ratio of ratios shows much clearer concentration of red spots on some slides

Note non-random but highly irregular concentration of red


Page 20

Bioconductor arrayQuality Package

Page 21

Background Subtraction (1)

• We think that local background contributes to bias

• Does subtracting background remove bias?

Local off-spot background may not be the best estimate of spot background (non-specific hybridization)

[Figure panels: Spots; BG subtracted]

Page 22

Background Subtraction (2)

Raw spot ratios show a mild bias relative to average

After subtracting a high green bg in the center a red bias results

[Figure panels: Raw Ratios; Background; BG-subtracted]
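A small worked example of how this can happen, with invented numbers: a spot with equal red and green signal has a log ratio of zero, but subtracting a larger green background than red background pushes the log ratio toward red.

```python
from math import log2

def log_ratio(red, green, red_bg=0.0, green_bg=0.0):
    """log2 of red/green after optional background subtraction."""
    r, g = red - red_bg, green - green_bg
    if r <= 0 or g <= 0:
        raise ValueError("background exceeds foreground; log ratio undefined")
    return log2(r / g)

# Hypothetical spot near the slide center: equal raw signals,
# but a high local green background.
raw = log_ratio(1000, 1000)
subtracted = log_ratio(1000, 1000, red_bg=100, green_bg=300)
print(raw, subtracted)  # subtracted is log2(900 / 700), a positive (red) shift
```

If the off-spot green background overstates what is actually under the spot, this shift is an artifact of the subtraction rather than a property of the sample.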

Page 23

Other Bias Patterns

This spotted oligo array shows strong biases at the beginning and end of each print-tip group

The background shows a milder version of this effect

Subtracting background compensates for about half this effect

[Figure panels: Processed; Raw Spot; Background]

Page 24

Local Bias on Affymetrix Chips

Image of raw data on a log2 scale shows striations but no obvious artifacts

Image of ratios of probes to standard shows a smudge

Non-coding probes

Images show high values as red, low values as yellow

Page 25

Spatial Artifacts on Affy Chips

Bubbles (yellow) in hybridization chamber

Touching cover slip and wiping incompletely

Scratches on cover slip

Page 26

QC in Bioconductor

• Robust Multi-array Average (RMA) – fits a linear model to each probe set
  – High residuals show regional patterns

High residuals in green

Available in affyQCReport package at www.bioconductor.org

See http://plmimagegallery.bmbolstad.com/

Page 27

Affy QC Metrics in Bioconductor

• affyPLM package fits probe level model to Affymetrix raw data

• NUSE - Normalized Unscaled Standard Errors – normalized relative to each gene

• How many big errors?
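The normalization can be sketched in plain Python, independent of affyPLM: each probe set's standard error on an array is divided by the median standard error of that probe set across arrays, so a typical array has values near 1. The standard errors and the 1.05 cutoff below are illustrative assumptions, not package defaults.

```python
from statistics import median

def nuse(se):
    """se[i][j] = standard error of the expression estimate for
    probe set i on array j. Each row is divided by its median,
    so values are relative to what is typical for that gene."""
    out = []
    for row in se:
        m = median(row)
        out.append([v / m for v in row])
    return out

def array_medians(nuse_values):
    """Median NUSE per array (column)."""
    n_arrays = len(nuse_values[0])
    return [median([row[j] for row in nuse_values]) for j in range(n_arrays)]

# Hypothetical standard errors: rows are probe sets, columns are arrays.
se = [
    [0.10, 0.10, 0.15],
    [0.20, 0.22, 0.30],
    [0.05, 0.05, 0.08],
]
meds = array_medians(nuse(se))
flagged = [j for j, m in enumerate(meds) if m > 1.05]  # illustrative cutoff
print(meds, flagged)
```

Counting how many probe sets on an array exceed a cutoff, rather than taking the median, is another way to answer the question of how many big errors there are.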

Page 28

Spatial Artifacts in Agilent

• Usually not so strong as on other array types

• More diffuse artifacts – probably reflecting washing irregularities

Page 29

Spatial Artifacts in Nimblegen

• More common than Agilent

• Usually more diffuse, probably reflecting washing

• Some sharp artifacts of unclear origin

Page 30

Spatial Artifacts in Illumina Arrays

• Often bigger artifacts than Affy

• Less consequential because more beads, and all have same sequence