Design and Analysis of Microarray Experiments at CSIRO Livestock Industries
Toni Reverter
Bioinformatics Group, CSIRO Livestock Industries
Queensland Bioscience Precinct, 306 Carmody Rd., St. Lucia, QLD 4067, Australia
SSAI – QLD Branch – 6 Apr. 2004
CONTENTS
                                Slides   Minutes
1. Introduction                    4        6
2. Technical Concerns              2        7
3. Designs                        21       15
4. Analysis                       14       16
5. Coverage and Sensitivity        5        7
6. Summary                         2        4
1. Introduction
1.a – The Material
This is a Cow
This is a Sheep
This is a Pig (female)
This is a Chicken
1.b – The Method
1. Tissue Samples: Treat A, Treat B
2. mRNA Extraction & Amplification
3. cDNA “A” with Cy5, cDNA “B” with Cy3
4. Hybridization
5. Optical Scanner: Laser 1 + Laser 2
6. Image Capture
7. Analysis
1.c – The Challenge
Time Dependent (Chronology, Logical):
1800s – DATA
30-60s – METHODS
50-70s – SOFTWARE
1980s – COMPUTER
cDNA
Human Dependent (Skill Integration):
Quantitative: Computer Sci., Statisticians, Mathematicians, …
Non-Q: Biochemists, Physiologists, Pathologists, …
BANANA, EGG: “banana omelette”
Data Dependent: Paradigm, Distribution, Source, Size
Historical, Excitement, Balance, Interdisciplinary
1.c – Human-Dependent Challenge
JOKE
The Biologist and the Statistician are being executed. They are both granted one last request.
The Statistician asks that he/she be allowed to give one final lecture on his/her Grand Theory of Statistics. The Biologist asks that he/she be executed first.
CLAIM
“The majority of microarray papers are analysed with substandard methods”
C Tilstone (citing D Allison), Nature 2003, 424:610
REASONS                          P Value
1. Biologists don’t care            10
2. Statisticians are bad            20
3. Unrealistic expectations         70
2. Technical Concerns
1. Biochemist Level:
a. Preparation (Printing) of the Chip
b. RNA Extraction, Amplification and Hybridisation
c. Optical Scanner (Reading)
2. Quantitative Level:
a. Design
b. Image (data) Quality
c. Data Analysis
d. Data Storage
Replication:
1. Animal
2. Sample
3. Array
4. Spot
Note: Randomisation intentionally neglected.
2.a – Data Quality: GP3xCLI
2.b – Storage: GEXEX
3. Experimental Designs
Key Issues:
a. Identify/Prioritise Questions
b. N of Available Samples
c. N of Available Arrays
d. Consider Dye Bias
Put more arrays on key questions
Pooling?
• Dye-Swap
• Dye-Balancing
• Self-Self
Evaluation of Designs:
[Figure: Reference, Loop and All-Pairs designs, each drawn on the four samples O, A, B and AB]
Variance of Estimated Effects (Relative to the All-Pairs)
                      Reference   Loop   All-Pairs
Main effect of A          1        4/3       1
Main effect of B          1         1        1
Interaction AB            3        8/3       2
Contrast A-B              2         1        1
Glonek & Solomon: Factorial and Time Course Designs for cDNA Microarray Experiments
• Definition
A design with a total of n slides and design matrix X is said to be admissible if there exists no other design with n slides and design matrix X* such that
$c_i^* \le c_i$ for all i, with strict inequality for at least one i,
where $c_i^*$ and $c_i$ are respectively the diagonal elements of $(X^{*\prime}X^*)^{-1}$ and $(X'X)^{-1}$.
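The admissibility definition can be tried out numerically. The following is a minimal sketch, assuming a 2x2 factorial with parameters (main effect A, main effect B, interaction AB) and one log-ratio per slide; the sample coding, the slide lists and the helper names are illustrative assumptions, not taken from the talk, and the resulting numbers need not match any table on the slides, whose scaling is not stated.

```python
import numpy as np

# Assumed parameter order: (main effect A, main effect B, interaction AB).
# Expected log-intensity of each sample, up to a common mean that cancels
# in every two-colour log-ratio.
SAMPLES = {
    "O": np.array([0, 0, 0]),
    "A": np.array([1, 0, 0]),
    "B": np.array([0, 1, 0]),
    "AB": np.array([1, 1, 1]),
}


def design_matrix(slides):
    """One row per slide: first sample minus second sample."""
    return np.array([SAMPLES[s1] - SAMPLES[s2] for s1, s2 in slides], dtype=float)


def effect_variances(slides):
    """Diagonal of (X'X)^-1, in units of the error variance."""
    X = design_matrix(slides)
    return np.diag(np.linalg.inv(X.T @ X))


def dominates(c_star, c, tol=1e-12):
    """True if c_star <= c for every effect and c_star < c for at least one."""
    c_star, c = np.asarray(c_star), np.asarray(c)
    return bool(np.all(c_star <= c + tol) and np.any(c_star < c - tol))


# Hypothetical slide lists for two designs on the samples O, A, B, AB.
reference = [("A", "O"), ("B", "O"), ("AB", "O")]
all_pairs = reference + [("B", "A"), ("AB", "A"), ("AB", "B")]

c_ref = effect_variances(reference)          # 3 slides
c_ref_x2 = effect_variances(reference * 2)   # same design replicated: 6 slides
c_all = effect_variances(all_pairs)          # 6 slides

print("reference (3 slides):", c_ref)
print("reference (6 slides):", c_ref_x2)
print("all-pairs (6 slides):", c_all)
print("all-pairs dominates replicated reference:", dominates(c_all, c_ref_x2))
```

With the same number of slides (six), the comparison in the last line is the one the definition calls for: a design that is dominated in this sense is not admissible.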
• Samples vs Slides vs Configurations
Samples (S)          3     4    12
Arrays: (S-1)        2     3    11
Arrays: S(S-1)       6    12   132
N of Configurations?
N of Configurations? $S^{A-1}$
[Figure labels: Pie-Bald black, Non-Pie-Bald black, Normal, White, Recessive]
N of Configurations? $S^{A-1} = 5^3 = 125$
[Figure: 25 panels, each marked x5]
[Figure: 0 hr vs 24 hr]
N of Configurations? $S^{A-1} = 10^9$ = 1 Billion!
Transitivity (Townsend, 2003) & Extendability (Kerr, 2003)
Opt 1: 10 Slides
Opt 2: 10 Slides
Opt 3: 11 Slides
Opt 4: 9 Slides
Opt 5: 9 Slides
[Figure: 0 hr vs 24 hr]
N of Configurations? $S^{A-1} = 12^{10}$ = 62 Billion!
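The counts quoted on these slides are easy to check. This is a small sketch, assuming S is the number of samples and A the number of arrays (the letters the earlier "Samples vs Slides" table appears to use); the function names are illustrative only.

```python
def n_arrays_range(s):
    """Smallest and largest number of arrays shown for s samples: (S-1) and S(S-1)."""
    return s - 1, s * (s - 1)


def n_configurations(s, a):
    """Number of configurations as given on the slides: S^(A-1)."""
    return s ** (a - 1)


for s in (3, 4, 12):
    print(s, "samples ->", n_arrays_range(s), "arrays")

print(n_configurations(5, 4))     # the 5^3 case
print(n_configurations(10, 10))   # the 10^9 case
print(n_configurations(12, 11))   # the 12^10 case, about 62 billion
```

The count grows so quickly that enumerating every configuration is out of the question beyond a handful of samples, which is why the talk turns to comparing a few hand-picked options.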
[Figure: 0 hr vs 24 hr design, with twelve channels marked R and twelve marked G]
N of Configurations?
N of Arrays?
Handling Constraints (Samples & Arrays):
Pavlidis et al. (2003) The effect of replication on gene expression microarray experiments. Bioinformatics 19:1620
>= 5 Replicates; 10-15 Replicates
Peng et al. (2003) Statistical implications of pooling RNA samples for microarray experiments. BMC Bioinformatics 4:26
Power: n9c9 95%, n3c3 50%, n9c3 90%, n25c5, n20c20
[Figure: design diagram with samples labelled F HS, M TM, F HS, M HS, F TM, M HS, F HS, M HS, and each channel marked R or G]
24: 23 to 552
14: 13 to 182
pooling
            RES     SUS      0       3      24       M        F        HS       TM
RES          8      -8       1       0      -1    -1.766    1.766   -3.866    3.866
SUS                  8      -1       0       1     1.766   -1.766    3.866   -3.866
0                            8      -4      -4    -1.335    1.335    0.666   -0.666
3                                   10      -6    -1.033    1.033   -0.468    0.468
24                                          10     2.368   -2.368   -0.198    0.198
M                                                  6.247   -6.247    0.493   -0.493
F                                                           6.247   -0.493    0.493
HS                                                                   3.798   -3.798
TM                                                                            3.798
Sum(ABS)    29.3    29.3    22.0    23.0    27.1    21.7     21.7     17.6     17.6
Sum(ABS)    26.8    26.8    39.1    23.1    17.3     7.1      7.1     14.3     14.3   (Reference Design)
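The first Sum(ABS) row can be reproduced from the matrix itself. The sketch below assumes the table is the upper triangle of a symmetric 9 x 9 matrix and that Sum(ABS) is the column sum of absolute values; the slide does not say what the matrix is, so the code only checks the arithmetic.

```python
import numpy as np

labels = ["RES", "SUS", "0", "3", "24", "M", "F", "HS", "TM"]

# Upper triangle as printed on the slide, one list per row (diagonal first).
upper = [
    [8, -8, 1, 0, -1, -1.766, 1.766, -3.866, 3.866],
    [8, -1, 0, 1, 1.766, -1.766, 3.866, -3.866],
    [8, -4, -4, -1.335, 1.335, 0.666, -0.666],
    [10, -6, -1.033, 1.033, -0.468, 0.468],
    [10, 2.368, -2.368, -0.198, 0.198],
    [6.247, -6.247, 0.493, -0.493],
    [6.247, -0.493, 0.493],
    [3.798, -3.798],
    [3.798],
]

n = len(labels)
M = np.zeros((n, n))
for i, row in enumerate(upper):
    M[i, i:] = row                 # fill row i from the diagonal rightwards
M = M + np.triu(M, k=1).T          # mirror the strict upper triangle

sum_abs = np.abs(M).sum(axis=0)    # column sums of absolute values
for name, value in zip(labels, sum_abs):
    print(f"{name:>3}: {value:.1f}")
```

Under those assumptions the printed values agree with the first Sum(ABS) row to one decimal place.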
Another (NEW?) Constraint:
A   M avium   slope     18 days     3    3-3-3
B   M avium   broth     18 days    10    1-2-2-1-2-1-2-1-2-1
C   M para    broth     10 weeks    5    1-2-2-1-1
D   M para    broth     12 weeks    6    1-1-4-5-2-1
E   M para    in-vivo               3    1-1-1
Another (NEW?) Constraint:
[Figure: candidate comparisons among samples A, B, C, D and E]
Importance due to Transitivity of AB with BC and BD
Procedure: Five configurations will be proposed and the statistical optimality of each evaluated.
[Figures: the sample layout (A: 3-3-3; B: 1-2-2-1-2-1-2-1-2-1; C: 1-2-2-1-1; D: 1-1-4-5-2-1; E: 1-1-1), shown first on its own and then as Configuration 1, Configuration 2, Configuration 3, Configuration 4 and Configuration 5]
[Figure: comparisons among samples A, B, C, D and E]
            Weight (Configuration)      Squared Error (Configuration)
Imp         1    2    3    4    5        1     2     3     4     5
 4          6    5    6    6    5        4     1     4     4     1
 2          0    2    1    0    0        4     0     1     4     4
 2          3    2    2    3    4        1     0     0     1     4
 1          0    0    0    0    0        1     1     1     1     1
 3          5    5    4    4    5        4     4     1     1     4
 4          4    5    5    5    5        0     1     1     1     1
 1          0    0    0    0    0        1     1     1     1     1
 2          2    0    2    3    2        0     4     0     1     0
 1          0    0    0    0    0        1     1     1     1     1
 4          3    3    3    3    3        1     1     1     1     1
                                   SSE  17    14    11    16    18
 0 (Noise, D-D)  1  2  1  0  0     MSE  .74   .64   .48   .66   .75
Conclusion: Configuration 3
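The SSE row follows directly from the importance scores and the weights each configuration gives to the ten comparisons. In the sketch below the squared error is taken as (weight - importance)^2, which matches every entry on the slide; dividing SSE by the total weight of the ten comparisons is only an inference that happens to reproduce the MSE row to within rounding, so treat that step as an assumption.

```python
# Importance of the ten comparisons (the noise row, importance 0, is excluded).
importance = [4, 2, 2, 1, 3, 4, 1, 2, 1, 4]

# Weight given to each comparison by Configurations 1..5 (one row per comparison).
weights = [
    [6, 5, 6, 6, 5],
    [0, 2, 1, 0, 0],
    [3, 2, 2, 3, 4],
    [0, 0, 0, 0, 0],
    [5, 5, 4, 4, 5],
    [4, 5, 5, 5, 5],
    [0, 0, 0, 0, 0],
    [2, 0, 2, 3, 2],
    [0, 0, 0, 0, 0],
    [3, 3, 3, 3, 3],
]

n_config = len(weights[0])

# Sum of squared differences between weight and importance, per configuration.
sse = [
    sum((row[c] - imp) ** 2 for row, imp in zip(weights, importance))
    for c in range(n_config)
]

# Assumed denominator: total weight over the ten comparisons.
total_weight = [sum(row[c] for row in weights) for c in range(n_config)]
mse = [s / w for s, w in zip(sse, total_weight)]

best = min(range(n_config), key=lambda c: sse[c]) + 1

print("SSE:", sse)
print("MSE:", [round(m, 3) for m in mse])
print("Best configuration:", best)
```

Either criterion picks the same configuration as the slide's conclusion.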
4. Data Analysis
My (EDUCATED?) View:
1. Relaxed data acquisition criteria
a. Signal to Noise > 1.00 (more relaxed criteria exist)
b. Mean to Median > 0.85 (Tran et al. 2002)
2. Moving away from
a. Ratios
b. “heavy-duty” normalisation techniques
3. Mixed-Model Equations
a. Check residuals
b. Check REML estimates of Variance Components
c. Proportion of Total V due to Gene x Variety
4. Process results Gene x Treatment
a. Mixtures of Distributions
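The two acquisition criteria amount to a simple record filter. The sketch below uses a hypothetical `Spot` record; the field names and the example values are made up for illustration, and how the signal-to-noise and mean-to-median statistics are computed from the scanner output is not specified in the talk.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """Hypothetical per-spot record; field names are illustrative only."""
    gene: str
    signal_to_noise: float
    mean_to_median: float
    log2_intensity: float


def is_valid(spot, min_s2n=1.00, min_m2m=0.85):
    """Keep a record only if it passes both thresholds from the slide."""
    return spot.signal_to_noise > min_s2n and spot.mean_to_median > min_m2m


# Made-up records, only to exercise the filter.
spots = [
    Spot("gene_a", 3.2, 0.97, 11.4),
    Spot("gene_b", 0.8, 0.99, 6.1),    # fails Signal to Noise
    Spot("gene_c", 2.5, 0.60, 9.8),    # fails Mean to Median
    Spot("gene_d", 1.1, 0.86, 7.3),
]

valid = [s for s in spots if is_valid(s)]
print([s.gene for s in valid])
```

Only the log2 intensities of the records that pass would then go into the mixed model.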
Mixed-Model Equations
$$Y \sim MVN\left(X\beta,\; G\,\Sigma_g G^{T} + A\,\Sigma_a A^{T} + V\,\Sigma_v V^{T} + \Sigma_e\right)$$
Y: Log2 Intensities
Xβ: Comparison Group, Array|Block|Dye (FIXED)
G: Main Gene Effect (RANDOM)
A: Gene x Array|Block (RANDOM)
V: Gene x Variety (RANDOM); DE Genes
e: Residual (RANDOM)
Gene x Dye (RANDOM). Note: missing but (generally) unimportant.
Mixed-Model Equations
Log2(Int.) = CG + Gene + GDye + GArray + GVariety + Error
CLAIM: The proportion of the Total Variation accounted for by the G x Variety Interaction anticipates the proportion of DE Genes
Control of FDR
        Observations                               Comparison Groups
                                                   Levels    Observations
        N         Mean    SD     Min    Max                  Mean    Min   Max
Y11   197,802     9.33   1.99   5.17   15.99         768    257.5   139   343
Y12    74,030    10.82   1.91   4.95   15.99         576    128.5    22   243
Y21   110,308     9.99   2.07   4.25   15.99         576    191.5    27   319
Y22   116,409     9.89   2.09   5.17   15.99         576    202.1    19   318
Y23   117,687    10.38   2.04   4.91   15.99         576    204.3    36   320
Y31   106,591    10.11   1.77   6.60   15.99         672    158.6    37   278
Y32   236,671     9.44   2.11   5.36   15.99       1,440    164.3    57   269
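Two quick consistency checks on this table are sketched below: the per-variable record counts should add up to the total number of valid intensity records quoted in the talk (959,498), and the mean number of observations per comparison group should equal N divided by the number of levels. Reading the columns as N, number of comparison-group levels and mean observations per level is an assumption about the flattened header.

```python
# (variable, N observations, comparison-group levels, mean observations per level)
rows = [
    ("Y11", 197802, 768, 257.5),
    ("Y12", 74030, 576, 128.5),
    ("Y21", 110308, 576, 191.5),
    ("Y22", 116409, 576, 202.1),
    ("Y23", 117687, 576, 204.3),
    ("Y31", 106591, 672, 158.6),
    ("Y32", 236671, 1440, 164.3),
]

total_records = sum(n for _, n, _, _ in rows)
print("Total records:", total_records)

for name, n, levels, slide_mean in rows:
    computed = n / levels
    print(f"{name}: N/levels = {computed:.2f} (slide: {slide_mean})")
```

The computed means agree with the slide to within one unit in the last decimal, which suggests the slide values were truncated rather than rounded.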
54 Array Slides
959,498 Valid Intensity Records (S2N>1, M2M>0.85)
7,638 Elements (genes)
752,476 Equations
56 (Co)Variance Components (REML)
BAYESMIX (Bayesian Mixtures of distributions)
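56 (Co)Variance Components
$$\Sigma_g = \begin{pmatrix}
\sigma^2_{g_1} & \sigma_{g_{2,1}} & \sigma_{g_{3,1}} & \sigma_{g_{4,1}} & \sigma_{g_{5,1}} & \sigma_{g_{6,1}} & \sigma_{g_{7,1}} \\
\sigma_{g_{2,1}} & \sigma^2_{g_2} & \sigma_{g_{3,2}} & \sigma_{g_{4,2}} & \sigma_{g_{5,2}} & \sigma_{g_{6,2}} & \sigma_{g_{7,2}} \\
\sigma_{g_{3,1}} & \sigma_{g_{3,2}} & \sigma^2_{g_3} & \sigma_{g_{4,3}} & \sigma_{g_{5,3}} & \sigma_{g_{6,3}} & \sigma_{g_{7,3}} \\
\sigma_{g_{4,1}} & \sigma_{g_{4,2}} & \sigma_{g_{4,3}} & \sigma^2_{g_4} & \sigma_{g_{5,4}} & \sigma_{g_{6,4}} & \sigma_{g_{7,4}} \\
\sigma_{g_{5,1}} & \sigma_{g_{5,2}} & \sigma_{g_{5,3}} & \sigma_{g_{5,4}} & \sigma^2_{g_5} & \sigma_{g_{6,5}} & \sigma_{g_{7,5}} \\
\sigma_{g_{6,1}} & \sigma_{g_{6,2}} & \sigma_{g_{6,3}} & \sigma_{g_{6,4}} & \sigma_{g_{6,5}} & \sigma^2_{g_6} & \sigma_{g_{7,6}} \\
\sigma_{g_{7,1}} & \sigma_{g_{7,2}} & \sigma_{g_{7,3}} & \sigma_{g_{7,4}} & \sigma_{g_{7,5}} & \sigma_{g_{7,6}} & \sigma^2_{g_7}
\end{pmatrix}$$
$$\Sigma_a = \begin{pmatrix}
\sigma^2_{a_1} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \sigma^2_{a_2} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma^2_{a_3} & \sigma_{a_{4,3}} & \sigma_{a_{5,3}} & 0 & 0 \\
0 & 0 & \sigma_{a_{4,3}} & \sigma^2_{a_4} & \sigma_{a_{5,4}} & 0 & 0 \\
0 & 0 & \sigma_{a_{5,3}} & \sigma_{a_{5,4}} & \sigma^2_{a_5} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma^2_{a_6} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \sigma^2_{a_7}
\end{pmatrix}$$
$$\Sigma_v = \begin{pmatrix}
\sigma^2_{v_1} & \sigma_{v_{2,1}} & 0 & 0 & 0 & 0 & 0 \\
\sigma_{v_{2,1}} & \sigma^2_{v_2} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma^2_{v_3} & \sigma_{v_{4,3}} & \sigma_{v_{5,3}} & 0 & 0 \\
0 & 0 & \sigma_{v_{4,3}} & \sigma^2_{v_4} & \sigma_{v_{5,4}} & 0 & 0 \\
0 & 0 & \sigma_{v_{5,3}} & \sigma_{v_{5,4}} & \sigma^2_{v_5} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma^2_{v_6} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \sigma^2_{v_7}
\end{pmatrix}$$
$$\Sigma_e = \mathrm{diag}\left\{\sigma^2_{e_1}, \sigma^2_{e_2}, \sigma^2_{e_3}, \sigma^2_{e_4}, \sigma^2_{e_5}, \sigma^2_{e_6}, \sigma^2_{e_7}\right\}$$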
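The figure of 56 (co)variance components can be checked by counting the free parameters of each covariance matrix. The block sizes below are an assumption read off the matrices on this slide: a full 7 x 7 matrix for the gene effect, a 3 x 3 block for variables 3 to 5 in the gene x array matrix, a 2 x 2 block for variables 1 and 2 plus a 3 x 3 block for variables 3 to 5 in the gene x variety matrix, and a diagonal residual matrix.

```python
def n_free_parameters(block_sizes):
    """Free (co)variances in a block-diagonal symmetric matrix: sum of p(p+1)/2."""
    return sum(p * (p + 1) // 2 for p in block_sizes)


# Assumed block structure over the 7 variables.
blocks = {
    "gene": [7],
    "gene x array": [1, 1, 3, 1, 1],
    "gene x variety": [2, 3, 1, 1],
    "residual": [1] * 7,
}

counts = {name: n_free_parameters(sizes) for name, sizes in blocks.items()}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name}: {count}")
print("total:", total)
```

Under that reading the four matrices contribute 28, 10, 11 and 7 components respectively.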
% Total Variance Due to:     EXP1           EXP2           EXP3
Error                      3.0 – 3.6      5.1 – 6.7      3.0 – 3.7
Gene                      83.6 – 90.4    78.3 – 81.9    47.5 – 83.9
Gene x Array               3.5 – 9.8     10.4 – 12.6    10.6 – 43.5
Gene x Variety             2.4 – 3.7      2.1 – 2.6      2.5 – 5.4
Genetic Correlations: Moderate (EXP3) to Strong
Gene Variety Corr: Strong (EXP1) to Moderate (EXP2)
Measures of (Possible) Differential Expression
$$d_{i1} = \frac{1}{2}\sum_{j=1}^{2}\left[\hat{v}_{ij}(\mathrm{HIGH}) - \hat{v}_{ij}(\mathrm{LOW})\right]$$
$$d_{i2} = \frac{1}{3}\sum_{j=3}^{5}\left[\hat{v}_{ij}(\mathrm{BREED1}) - \hat{v}_{ij}(\mathrm{BREED2})\right]$$
$$d_{i3} = \frac{1}{12}\sum_{j=6}^{7}\sum_{t=0}^{5}\left[\hat{v}_{ijt}(\mathrm{TREATMENT}) - \hat{v}_{ijt}(\mathrm{CONTROL})\right]$$
i = 1, …, 7,638 genes
j = 1, …, 7 variables
t = 0, …, 5 time points (EXP3 only)
Other measure definitions could also be valid
Mixtures of Distributions
(Any minus signs in the estimates below were lost in extraction.)
$$f\begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} =
0.67\, N\left(\begin{pmatrix} 0.21 \\ 0.03 \\ 0.01 \end{pmatrix},
\begin{pmatrix} 0.10 & 0.01 & 0.01 \\ 0.01 & 0.04 & 0.01 \\ 0.01 & 0.01 & 0.02 \end{pmatrix}\right)
+ 0.23\, N\left(\begin{pmatrix} 0.40 \\ 0.19 \\ 0.01 \end{pmatrix},
\begin{pmatrix} 0.67 & 0.09 & 0.04 \\ 0.09 & 0.11 & 0.01 \\ 0.04 & 0.01 & 0.08 \end{pmatrix}\right)
+ 0.10\, N\left(\begin{pmatrix} 0.08 \\ 0.08 \\ 0.23 \end{pmatrix},
\begin{pmatrix} 0.08 & 0.02 & 0.05 \\ 0.02 & 0.16 & 0.06 \\ 0.05 & 0.06 & 0.38 \end{pmatrix}\right)$$
[Figure: Mixtures of Distributions]
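Once a mixture has been fitted, each gene can be assigned to a component by its posterior probability. The sketch below illustrates the idea with a univariate normal mixture; the weights, means and standard deviations are hypothetical values chosen for illustration, not the estimates reported in the talk, and this is not the BAYESMIX procedure itself.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(x, weights, means, sds):
    """Posterior probability that observation x belongs to each mixture component."""
    dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    return [d / total for d in dens]


# Hypothetical three-component mixture: a tight "null" component around zero
# and two wider components for up- and down-regulated genes.
weights = [0.70, 0.20, 0.10]
means = [0.0, 0.5, -0.5]
sds = [0.1, 0.3, 0.3]

for d in (0.0, 0.4, 1.0, -0.8):
    post = posterior(d, weights, means, sds)
    label = max(range(len(post)), key=lambda k: post[k])
    print(f"d = {d:+.1f}: posterior = {[round(p, 3) for p in post]}, component {label}")
```

A gene would be called differentially expressed when its posterior probability of belonging to a non-null component exceeds a chosen threshold; the threshold itself is a separate decision.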
Differentially Expressed Genes
                     Exp1            Exp2            Exp3
                   Up   Down       Up   Down       Up   Down
High-Low  Up      409     0        26    13        36    11
          Down           41         3     0         5     0
HOL-JBL   Up                       68     0         0     8
          Down                          319        10     6
TSS-UTS   Up                                      252     0
          Down                                          109
10 DE Elements across the 3 Exp (2 UP/DOWN/UP; 8 UP/UP/DOWN)
Residuals Plots
[Figure: residuals plots]
Allocation of 238 DE Genes
[Figure: allocation of 238 DE genes, Bovine and Ovine, Up-Regulated and Down-Regulated; sets labelled 178 @ Day 82, 139 @ Day 120, 114 @ Day 105 and 171 @ Inguinal]
Homologs, Orthologs, Paralogs
The “Real” Target: Molecular Interaction Maps
Adapted from Aladjem et al. 2004, Science’s STKE
5. Coverage and Sensitivity
                     MPSS Paper              MPSS Test Data            cDNA Noise Paper
                     PNAS 03, 100:4702       No Tags = 25,503          PNAS 02, 99:14031
tpm (log10)          N Tags        %         S 1 (%)     S 2 (%)       %
> 1 (0.0)            27,965     100.00       100.00      100.00        100.00
5 (0.7)              15,145      54.16        57.14       49.87         56.19
10 (1.0)             10,519      37.61        36.11       33.66         36.79
50 (1.7)              3,261      11.66        10.89       10.74         11.76
100 (2.0)             1,719       6.15         5.73        5.67          6.95
500 (2.7)               298       1.07         1.21        1.13          1.94
1,000 (3.0)             154       0.55         0.57        0.55          1.11
5,000 (3.7)              26       0.09         0.15        0.11          0.29
10,000 (4.0)              7       0.02         0.05        0.05          0.16
$$f(x) = \exp\left(-\frac{2x^2}{1+x}\right)$$
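The coverage table can be checked against the curve on this slide. The sketch below assumes the garbled formula reads f(x) = exp(-2x^2 / (1 + x)) with x = log10(tpm); that reading is a reconstruction, adopted because it reproduces the "cDNA Noise Paper" column to within rounding. The tag percentages are recomputed from the N Tags column.

```python
import math


def f(x):
    """Assumed coverage curve: fraction of tags at or above 10**x tpm."""
    return math.exp(-2.0 * x * x / (1.0 + x))


log10_tpm = [0.0, 0.7, 1.0, 1.7, 2.0, 2.7, 3.0, 3.7, 4.0]
n_tags = [27965, 15145, 10519, 3261, 1719, 298, 154, 26, 7]

# Values as printed on the slide, in percent.
slide_pct_tags = [100.00, 54.16, 37.61, 11.66, 6.15, 1.07, 0.55, 0.09, 0.02]
slide_pct_curve = [100.00, 56.19, 36.79, 11.76, 6.95, 1.94, 1.11, 0.29, 0.16]

pct_tags = [100.0 * n / n_tags[0] for n in n_tags]
pct_curve = [100.0 * f(x) for x in log10_tpm]

print(" x    tags %   curve %")
for x, pt, pc in zip(log10_tpm, pct_tags, pct_curve):
    print(f"{x:3.1f}  {pt:7.2f}  {pc:7.2f}")
```

The curve tracks the observed MPSS percentages closely at low abundance and sits somewhat above them for tags beyond about 100 tpm.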
Let $N_T$ = N of “Total” Genes and $N_D$ = N of “Differentially Expressed” Genes ($N_D \le N_T$).
$$f_t(x_i) = \frac{n_t(x_i)}{N_T} = e^{-\frac{2x_i^2}{1+x_i}}, \qquad f_d(x_i) = \frac{n_d(x_i)}{N_D}$$
$$\frac{n_d(x_i)}{n_t(x_i)} = \frac{N_D\, f_d(x_i)}{N_T\, f_t(x_i)}$$
If $f_d(x) = f_t(x)$, the ratio reduces to $N_D / N_T$: a flat line (except at the upper bound).
1. The relevance of $f(x_i)$ is limited to the Concentration to Signal mapping.
2. At equilibrium, the probability of an error either way is equal.
[Figure: Coverage and Sensitivity, with regions marked <, = and >]
Not many DE genes: High Confidence, Few False +ve
Lots of DE genes: High Power, Few False -ve
6. Summary
General (i.e. not only CSIRO LI):
1. Still in its infancy (…possibly even embryonic stage)
2. Many decisions have a heuristic rather than a theoretical foundation
3. Prone to misconceptions:
a. Amount of Expression = Amount of Response
b. Same cut-off point to judge all genes
c. Over-emphasis on normalization (hence, despise “Boutique Arrays”)
d. Over-emphasis on variance stabilization
e. Over-emphasis on controlling false positives
f. Over-emphasis on biological replicates (DANGER)
4. No hope for a “One size fits all” software (even method)
5. Safer to aim towards “Tailor to individual’s needs”
6. Integration of interdisciplinary skills is a must
Livestock Species:
1. Tailing humans (…at the moment)
a. Andersson & Georges (2004) Domestic-animal genomics: deciphering the genetics of complex traits. Nature Reviews Genetics, March 2004, Vol 5:202-212
2. Several key advantages
a. More relaxed ethical issues (…relative to R&D in humans)
b. Very strong similarities at the genome level with humans
c. The genome is (being) sequenced for several species
3. Strong background knowledge of genetics accumulated
a. Quantitative genetics
b. Mixed-Model equations
c. Computing expertise
4. Journals will soon be inundated
5. We have the opportunity to participate