An Analysis of MicroArray Quality Control Data
James J. Chen, Ph.D.
Division of Biometry and Risk Assessment
National Center for Toxicological Research
U.S. Food and Drug Administration
2006 FDA and Industry Workshop
September 29, 2006
The views expressed in this presentation do not represent those of the U.S. Food and Drug Administration
Outline
Background: MAQC experimental design and data
Microarray platform comparisons
  Inter-platform analysis
  Intra-platform analysis and platform performance
    concordance, site effects, consistency, discriminability
    sensitivity, specificity, and accuracy in gene selection
    self-consistency of titration mixture
TaqMan and microarray platform comparability
Conclusion
MicroArray Quality Control Project
Objective: To compare expression data generated at multiple test sites (labs) using several microarray-based and alternative technology platforms
Microarray platforms:
  Applied Biosystems ABI (1)
  Affymetrix AFX (1)
  Agilent AGI (1, 2)
  Eppendorf EPP (1)
  GE Healthcare GEH (1)
  Illumina ILM (1)
  NCI_Operon NCI (2)
Alternative platforms:
  Applied Biosystems (TAQ)
  Panomics (QGN)
  Gene Express (GEX)
Nature Biotechnology 24(9), September 2006
MAQC Experimental Design
Four RNA samples:
  Sample A: Universal human reference RNA (Stratagene)
  Sample B: Human brain reference RNA (Ambion)
  Sample C: 75% A + 25% B
  Sample D: 25% A + 75% B
Three sites for each microarray platform (NCI: 2 sites)
One site each for TAQ, QGN, and GEX
Five technical replicates for each microarray platform
Four replicates for TAQ; three replicates for QGN and GEX
EPP: 294 target genes; QGN: 245; GEX: 205
MAQC Data Used for Comparisons
Platform    ABI      AFX      AGI      GEH      ILM      TAQ
Probe       32,878   54,675   43,931   54,359   47,293   1,004
Site        3        3        3        3        3        1
Array (2)   58       60       56       60       59       N/A
Rep (1)     5        5        5        5        5        4
Sample      4        4        4        4        4        4

12,091 common genes among microarray platforms
906 TAQ genes are among the 12,091 genes
(1) technical replicates; (2) a total of 293 arrays
Hierarchical clustering of 293 arrays on 12,091 genes from all pairwise correlations between two arrays.
[Figure: clustering of arrays, labelled by platform (AFX, ABI, AGL, GEH, ILM), sample (A, B, C, D), and site (Site 1, Site 2, Site 3)]
Concordance: all pairwise inter-platform sample correlation coefficients between two arrays from different platforms.
Up to 2250 (10 x 15 x 15) correlations computed for each sample.
[Figure: box plots of correlation coefficients (y-axis, 0.5 to 0.8) for samples A, B, C, D. Values annotated on the plot: .74, .70, .71, .68, .82, .45]
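The count of 2250 follows from the design: 5 microarray platforms give 10 platform pairs, and each platform has up to 15 arrays per sample (3 sites x 5 replicates). A minimal Python sketch of that counting and of one array-to-array correlation is given here. The intensities are invented for illustration, and the use of Pearson correlation on log2 intensities is an assumption, since the slides do not state the coefficient or the scale.

```python
from itertools import combinations
from math import sqrt

PLATFORMS = ["ABI", "AFX", "AGI", "GEH", "ILM"]
SITES = 3
REPLICATES = 5


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# 5 platforms give C(5, 2) = 10 unordered platform pairs.
platform_pairs = list(combinations(PLATFORMS, 2))

# Each platform has up to 3 sites x 5 replicates = 15 arrays per sample.
arrays_per_platform = SITES * REPLICATES
max_correlations = len(platform_pairs) * arrays_per_platform ** 2

# Hypothetical log2 intensities of the same five genes on two arrays
# from different platforms (values invented for illustration).
array_on_platform_1 = [7.1, 9.4, 5.2, 11.0, 8.3]
array_on_platform_2 = [6.8, 9.9, 5.9, 10.1, 8.0]
r = pearson(array_on_platform_1, array_on_platform_2)

print(len(platform_pairs), max_correlations, round(r, 3))
```

The actual number is "up to" 2250 because some platforms contributed fewer than 60 arrays in total.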
Concordance: all pairwise inter-platform fold-change correlation coefficients between two arrays from different platforms.
90 (10 x 3 x 3) correlations for each fold-change
[Figure: box plots of correlation coefficients (y-axis, 0.6 to 0.9) for fold-changes B/A, C/A, D/A, C/B, D/B, C/D. Values annotated on the plot: .85, .75, .82, .78, .84, .78, .92, .53]
Cross Platform Consistency
Proportion of genes showing a significant platform*sample interaction in the gene-by-gene ANOVA:
y = m + P + Sample + P*Sample + e
Significant interaction: the expression patterns of the four samples are inconsistent across the platforms.
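As an illustration of the interaction test for a single gene, the following Python sketch computes the F statistic for the P*Sample term in a balanced two-way layout. It is a simplified sketch, not the analysis used in the talk: it assumes equal replicate counts in every cell, ignores site, and uses invented values. The names `interaction_f`, `consistent`, and `inconsistent` are hypothetical.

```python
def interaction_f(cells):
    """F statistic for the platform*sample interaction of one gene.

    cells[p][s] holds the replicate values for platform p and sample s;
    every cell must contain the same number of replicates (balanced design).
    Returns (F, df_interaction, df_error).
    """
    n_plat, n_samp, n_rep = len(cells), len(cells[0]), len(cells[0][0])

    def mean(values):
        return sum(values) / len(values)

    cell = [[mean(cells[p][s]) for s in range(n_samp)] for p in range(n_plat)]
    plat = [mean(cell[p]) for p in range(n_plat)]
    samp = [mean([cell[p][s] for p in range(n_plat)]) for s in range(n_samp)]
    grand = mean(plat)

    # Interaction: what is left of each cell mean after removing the
    # additive platform and sample effects.
    ss_int = n_rep * sum(
        (cell[p][s] - plat[p] - samp[s] + grand) ** 2
        for p in range(n_plat) for s in range(n_samp)
    )
    # Error: replicate scatter around each cell mean.
    ss_err = sum(
        (y - cell[p][s]) ** 2
        for p in range(n_plat) for s in range(n_samp) for y in cells[p][s]
    )
    df_int = (n_plat - 1) * (n_samp - 1)
    df_err = n_plat * n_samp * (n_rep - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err


# Hypothetical gene: 3 platforms x 4 samples x 3 replicates.
noise = [-0.1, 0.0, 0.1]
platform_effect = [0.0, 0.5, -0.3]
sample_effect = [0.0, 1.0, 0.25, 0.75]

# Purely additive effects: same sample pattern on every platform.
consistent = [[[8 + pe + se + e for e in noise] for se in sample_effect]
              for pe in platform_effect]
# Same gene, but the third platform reverses the sample pattern.
inconsistent = [[[8 + pe + (se if i < 2 else -se) + e for e in noise]
                 for se in sample_effect]
                for i, pe in enumerate(platform_effect)]

f_consistent, df1, df2 = interaction_f(consistent)
f_inconsistent, _, _ = interaction_f(inconsistent)
print(f_consistent, f_inconsistent, df1, df2)
```

A p-value would come from the upper tail of the F distribution with (df1, df2) degrees of freedom, for example `scipy.stats.f.sf(F, df1, df2)` if SciPy is available; the proportion of genes with p below the chosen alpha is the quantity reported on the slide.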
Plot of the p-values versus ranking proportions
[Figure: proportion of significances (y-axis, 0.2 to 1.0) against log10 p (x-axis, -10 to 0), with alpha = 10^power]
The proportion of significances is 30% at α = 0.01.
Inconsistency (p < 0.01) and consistency (p > 0.01)
[Figure: four example genes, each panel showing expression (y-axis, -1 to 3) of samples A, B, C, D across platforms AFX, ABI, AG1, ILM, GEH]
  Inconsistent: Gene 5, p-value < 10^-11; Gene 21, p-value = 0.001
  Consistent: Gene 312, p-value = 0.11; Gene 15, p-value = 0.991
Intra-Platform Analysis
Concordance: all pairwise correlations between two arrays from different sites for samples A, B, C, and D (3 x 5 x 5 correlations).
Site effects: ANOVA: y = m + sample + site + sample*site + e
Site effect: the variance ratio, F = MSEsite/MSEe
Consistency: proportion of genes shown to have a significant sample*site interaction (
Discriminability: ANOVA: y = m + sample + e
Variability: residual mean square (total variation other than sample differences).
Discriminability: the proportion of the genes shown to have significant sample effects ( .
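The discriminability model y = m + sample + e is a one-way ANOVA per gene. The Python sketch below shows how the residual mean square (variability) and the F statistic for the sample effect could be computed for one gene. The function name `one_way_anova` and the data values are hypothetical, and the sketch treats all arrays of a platform as one pool, which is an assumption.

```python
def one_way_anova(groups):
    """Return (F, MSE) for the model y = m + sample + e for one gene.

    groups is a list with one list of replicate values per sample.
    """
    all_values = [y for g in groups for y in g]
    grand = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    ss_sample = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    df_sample = len(groups) - 1
    df_error = len(all_values) - len(groups)

    mse = ss_error / df_error                 # variability
    f_stat = (ss_sample / df_sample) / mse    # evidence of a sample effect
    return f_stat, mse


# Hypothetical log2 intensities for samples A, B, C, D (three replicates each).
gene = [
    [10.1, 10.0, 9.9],
    [12.2, 12.0, 11.8],
    [10.6, 10.5, 10.4],
    [11.6, 11.5, 11.4],
]
f_stat, mse = one_way_anova(gene)
print(round(f_stat, 1), round(mse, 4))
```

Discriminability for a platform is then the fraction of genes whose F statistic is significant at the chosen level.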
Individual Platform’s Performance
           Reproducibility and Consistency                    Performance
           Median Correlation
Platform   A      B      C      D      Site Fm (1)  Cons'y    MSE    Discr'ty (2)
AFX        .988   .988   .991   .992   24.          .012      .066   .618
ABI        .968   .964   .972   .969   15.          .008      .107   .620
AG1        .978   .982   .982   .981   28.          .063      .090   .633
ILM        .980   .979   .980   .981   242.         .020      .266   .441
GEH        .925   .904   .872   .862   64.          .097      .267   .453
Gold Standard Set
A gene is differentially expressed if it was shown to be significant in at least 2 of the 5 platforms at the 10^-5 level.
H0: A - B = 0 versus H1: A - B ≠ 0 (8265 genes were selected)
A gene is non-differentially expressed if its fold change was shown to be between 0.90 and 1/0.90 in at least 2 of the 5 platforms at the 10^-3 level. Let δ = -log2(0.90).
Equivalence test: H0: |A - B| > δ versus H1: |A - B| < δ
(498 genes were selected)
Gold standard: 8607 genes (78 overlapping genes deleted)
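The selection rule (called in at least 2 of the 5 platforms, then overlaps removed) can be sketched with Python sets. The gene IDs and per-platform calls below are invented, and the function names are hypothetical. The last line checks the slide's counts: 8265 + 498 - 2 x 78 = 8607, which suggests, as an inference from the arithmetic rather than a statement in the slides, that the 78 overlapping genes were removed from both lists.

```python
from collections import Counter
from math import log2


def called_in_at_least(calls_by_platform, k=2):
    """Genes that appear in the call sets of at least k platforms."""
    counts = Counter(g for calls in calls_by_platform.values() for g in calls)
    return {g for g, c in counts.items() if c >= k}


def gold_standard(de_genes, non_de_genes):
    """Drop genes found in both lists; return (DE, non-DE) sets."""
    overlap = de_genes & non_de_genes
    return de_genes - overlap, non_de_genes - overlap


# Equivalence margin on the log2 scale for fold changes within [0.90, 1/0.90].
delta = -log2(0.90)

# Hypothetical per-platform calls for a handful of gene IDs.
de_calls = {
    "AFX": {"g1", "g2", "g3"}, "ABI": {"g1", "g3"}, "AG1": {"g3", "g4"},
    "ILM": {"g5"}, "GEH": set(),
}
non_de_calls = {
    "AFX": {"g3", "g6"}, "ABI": {"g6", "g7"}, "AG1": {"g3", "g7"},
    "ILM": set(), "GEH": {"g8"},
}

de, non_de = gold_standard(called_in_at_least(de_calls),
                           called_in_at_least(non_de_calls))

# Slide counts: 8265 DE + 498 non-DE, with 78 genes removed from both lists.
slide_total = 8265 + 498 - 2 * 78

print(sorted(de), sorted(non_de), round(delta, 3), slide_total)
```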
Accuracy (AC), sensitivity (SN), specificity (SP), and FDR, using FWE = 0.05* and FDR = 0.05 as thresholds.

           FWE = 0.05*                 FDR = 0.05
Platform   AC    SN    SP    FDR       AC    SN    SP    FDR
AFX        .77   .76   .95   .004      .92   .94   .55   .024
ABI        .74   .73   .95   .004      .89   .91   .59   .023
AG1        .81   .80   .80   .003      .92   .94   .55   .024
ILM        .55   .53   1.0   .000      .88   .88   .95   .023
GEH        .54   .52   .95   .005      .82   .82   .69   .019

* α = 0.05/8607 = 5.8 x 10^-6
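For reference, the four measures can be computed from a platform's selected gene list and the gold standard as in the Python sketch below. The definitions are the standard confusion-matrix ones, which the slides do not spell out, so treat them as an assumption; the gene IDs are invented and `selection_metrics` is a hypothetical name. The Bonferroni-style threshold reproduces the footnote value.

```python
def selection_metrics(selected, truly_de, truly_non_de):
    """Accuracy, sensitivity, specificity and FDR of a selected gene list."""
    tp = len(selected & truly_de)        # DE genes that were selected
    fp = len(selected & truly_non_de)    # non-DE genes that were selected
    fn = len(truly_de - selected)        # DE genes that were missed
    tn = len(truly_non_de - selected)    # non-DE genes correctly left out
    return {
        "AC": (tp + tn) / (tp + fp + fn + tn),
        "SN": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "FDR": fp / (tp + fp) if (tp + fp) else 0.0,
    }


# Per-test level that controls FWE at 0.05 over the 8607 gold-standard genes.
alpha_fwe = 0.05 / 8607

# Hypothetical gold standard (8 DE genes, 2 non-DE genes) and one selection.
truly_de = {f"g{i}" for i in range(1, 9)}
truly_non_de = {"g9", "g10"}
selected = {"g1", "g2", "g3", "g4", "g5", "g6", "g9"}

metrics = selection_metrics(selected, truly_de, truly_non_de)
print(f"{alpha_fwe:.1e}", metrics)
```

Because the gold standard contains far more differentially expressed genes than non-differentially expressed ones, accuracy tracks sensitivity closely, which matches the pattern in the table.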
Comment on MAQC: Gene Selection
The MAQC project used technical replicates (small variance) with two distinct biological samples (large difference).
The number of differentially expressed genes is much larger than in typical microarray experiments.
Generating a gene list is not the problem; the problem is determining the number of genes in the list.
General principle: to identify a list of differentially expressed genes as accurately as possible.
Reproducibility of lists of differentially expressed genes: Percentage of Overlapping Genes (POG)
For AFX, 6319 genes have p < 10^-5; 4370 genes have FC > 2.
For ABI, 6127 genes have p < 10^-5; 4835 genes have FC > 2.
More than 4,000 genes can be selected with an FDR estimate of less than 2/4,000.
From MAQC supplementary Fig. S2.
Assessment of Titration Trend
Titration correlations:
  0.75A + 0.25B and C
  0.25A + 0.75B and D
Titration model (a two-step test):
  Titration relationship, M1t: y = m + Conc + Site + e
  Full ANOVA model, M1: y = m + Sample + Site + e
  S1: Test for sample difference under M1. H0t1: A = B = C = D
  S2: Test for goodness of fit. H0t2: M1t = M1
  Proportion of genes that reject H0t1 and accept H0t2
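Step S2 compares a straight line in concentration (M1t) with a separate mean per sample (M1). The Python sketch below computes the corresponding lack-of-fit F statistic for one gene. It is a simplification under stated assumptions: the Site term is omitted, concentration is taken as the fraction of sample A in each sample (B = 0, D = 0.25, C = 0.75, A = 1), and the data are invented. `titration_lack_of_fit` is a hypothetical name.

```python
# Fraction of sample A in each sample (assumed coding of Conc).
CONC_A = {"B": 0.0, "D": 0.25, "C": 0.75, "A": 1.0}


def titration_lack_of_fit(obs):
    """Return (slope, F) comparing a line in Conc with per-sample means.

    obs maps a sample name to the list of replicate values for one gene.
    """
    xs, ys = [], []
    for sample, values in obs.items():
        xs += [CONC_A[sample]] * len(values)
        ys += values
    n = len(ys)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n

    # Least-squares line y = intercept + slope * Conc (model M1t).
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    sse_line = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))

    # One mean per sample (model M1).
    sse_full = sum((y - sum(v) / len(v)) ** 2 for v in obs.values() for y in v)

    df_lack = len(obs) - 2      # 4 sample means minus 2 line parameters
    df_error = n - len(obs)
    f_stat = ((sse_line - sse_full) / df_lack) / (sse_full / df_error)
    return slope, f_stat


noise = [-0.1, 0.0, 0.1]
# Hypothetical gene that follows the line 4 + 6 * Conc.
linear_gene = {s: [4 + 6 * c + e for e in noise] for s, c in CONC_A.items()}
# Hypothetical gene whose sample means do not lie on a line.
curved_gene = {"B": [3.9, 4.0, 4.1], "D": [8.9, 9.0, 9.1],
               "C": [9.4, 9.5, 9.6], "A": [9.9, 10.0, 10.1]}

slope_linear, f_linear = titration_lack_of_fit(linear_gene)
slope_curved, f_curved = titration_lack_of_fit(curved_gene)
print(slope_linear, f_linear, slope_curved, f_curved)
```

A gene counts as following the titration relationship when S1 rejects equal sample means and this lack-of-fit test does not reject.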
Linear Titration Model
[Figure: three example genes, each panel showing expression (y-axis, 2 to 12) for B_0, D_0.25, C_0.75, A_1 (x-axis)]
  H0t1 accepted: p1 = 0.316
  H0t1 rejected, H0t2 accepted: p1 < 0.0001, p2 = 0.108
  H0t1 rejected, H0t2 rejected: p1 < 0.0001, p2 < 0.0001
Titration correlation for samples C and D, and the proportions of the genes that follow the titration relationship.

           Correlation             Titration model
Platform   Sample C   Sample D     (5%, 5%)   (1%, 1%)
AFX        .909       .911         .963       .976
ABI        .916       .928         .954       .967
AG1        .930       .939         .923       .944
ILM        .930       .936         .937       .954
GEH        .923       .934         .988       .988
TaqMan and microarray platform concordance: box plots of all pairwise sample correlation coefficients (TAQ vs. microarrays).
60 (4 x 15) correlations computed in each sample
[Figure: box plots of correlation coefficients (y-axis, 0.50 to 0.80) by platform (AFX, ABI, AG1, ILM, GEH) for samples A and B. Values annotated on the plot: .78, .62, .77, .75, .74, .76, .66, .71, .74, .71, .52, .80]
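TaqMan and microarray platform concordance: box plots of fold-change (B/A) correlation coefficients (TAQ vs. microarrays).
[Figure: box plots of B/A fold-change correlation coefficients (y-axis, 0.82 to 0.90) by platform (AFX, ABI, AG1, ILM, GEH). Values annotated on the plot: .86, .88, .89, .86, .89, .82, .90]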
Consistency of TaqMan and Microarray platforms
Proportions of significances, TaqMan and microarray platforms: 0.72, 0.57, 0.49, 0.65, 0.39
Proportion of significances, microarray platforms: 0.30
[Figure: two panels, "microarray platforms" and "TaqMan and microarray", each showing expression (y-axis, 0.0 to 2.0) of samples A, B, C, D across platforms AFX, ABI, AG1, ILM, GEH. p-values annotated on the plot: 0.74; 10^-10, 10^-8, 10^-4, 10^-9, 10^-7]
Conclusion (1)
Inter-platform (microarray and TaqMan):
Concordance
  Sample correlations: 0.45 (D) to 0.82 (A)
  FC correlations: higher for B/A; lower for C/A
Inconsistency
  Microarray platforms: Thirty percent (30%) of genes show inconsistent expression patterns at α = 0.01.
  TaqMan and microarray platforms: The proportions are between 0.34 and 0.74 for the five platforms.
Comparability
  Intensities measured by different microarray platforms, and measured between microarray and TaqMan platforms, are different.
Conclusion (2)
Titration trend
  Titration correlation: The correlations between observed intensity and expected intensity are more than 90%.
  Titration trend: All five platforms follow the linear titration relationship well.
Intra-microarray platform performance
  Concordance: Intra-platform correlations are high.
  Site effect: All platforms show site effects.
  Consistency: The patterns of expression are consistent across three sites.