Hybridization Design for 2-Channel Microarray Experiments
Naomi S. Altman, Pennsylvania State University, [email protected]
NSF_RCN Meetings 04
Expt Design and Microarrays
Microarrays are
  Expensive
  Noisy
A perfect situation for optimal design
Outline
Designing a Microarray Study
Reference Design
Loop Designs
Replication
Optimal Design/Analysis
Incorporating Multiple Factors and Blocks
Designing a Microarray Experiment
Define objectives
Determine factors and treatments
Determine appropriate analysis method
Determine sample design (biological and technical replication)
Determine platform
Design spots for custom arrays
Determine hybridization pairs
Perform experiment
Designing a Microarray Experiment
Define objectives
Determine factors and treatments
Determine appropriate analysis method
Determine sample design (biological and technical replication)
Determine platform
Design spots for custom arrays
Determine hybridization pairs ←
Perform experiment
Arrow Notation
Introduced by Kerr and Churchill (2001)
Each array is represented by an arrow.
[Figure: an arrow with its ends labelled Red and Green]
Reference Design
[Figure: treatments A, B, C and D, each joined by an arrow to Reference]
4 arrays
1 sample/treatment
4 reference samples
Loop Design (Kerr and Churchill 2001)
[Figure: treatments A, B, C and D joined by arrows in a loop]
4 arrays
2 samples/treatment
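The arrow notation maps naturally onto a list of (red, green) pairs, one pair per array. The sketch below is a minimal illustration: the direction convention (tail = red, head = green), the loop order A, B, C, D and the helper name `summarize` are all assumptions made for the example, not details taken from the slides.

```python
from collections import Counter

# One tuple per array: (sample labelled red, sample labelled green).
# The dye direction and the loop order are assumed for illustration.
reference_design = [("A", "Ref"), ("B", "Ref"), ("C", "Ref"), ("D", "Ref")]
loop_design = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]


def summarize(design):
    """Return (number of arrays, samples per node, red-minus-green count per node)."""
    samples = Counter()
    dye_balance = Counter()
    for red, green in design:
        samples[red] += 1
        samples[green] += 1
        dye_balance[red] += 1
        dye_balance[green] -= 1
    return len(design), dict(samples), dict(dye_balance)


if __name__ == "__main__":
    for name, design in [("reference", reference_design), ("loop", loop_design)]:
        n_arrays, samples, balance = summarize(design)
        print(name, n_arrays, samples, balance)
```

A dye balance of zero at every node means each treatment appears equally often in red and in green, which is the property the loop has and the reference design without dye-swaps lacks.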
Replication
Often there is confusion among:
Biological replicates
Technical replicates
  repeated samples
  split sample and relabel
  spot replication
In this presentation: We consider only
  one spot/gene/array
  any technical replicates are averaged
  each sample is an independent biological replicate
Linear Mixed Model for Microarray Data
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijk}$$
$Y_{ijk}$ is the response of the gene in one channel
$\mu$ is the mean response of the gene over all treatments, channels, arrays
$\alpha_i$ is the effect of treatment i
$\beta_j$ is the effect of dye j
$\gamma_k$ is the effect of array k (or spot on the array)
$\epsilon_{ijk}$ is the random deviation from the other effects and includes biological variation, technical variation and random error
Linear Mixed Model for Microarray Data
The 2 channels on a single spot are correlated
→ array should be treated as a random effect
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijk}$$
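Treating the array as a random effect is what produces the correlation between the two channels of a spot. The sketch below assumes the array effect has variance `var_array` and the error term has variance `var_error` (names and values chosen for illustration), builds the implied 2x2 covariance of the red and green channels, and checks the implied correlation var_array / (var_array + var_error) against a small simulation.

```python
import numpy as np


def channel_covariance(var_array, var_error):
    """Covariance of the (red, green) channels that share one array effect."""
    return np.array([[var_array + var_error, var_array],
                     [var_array, var_array + var_error]])


def channel_correlation(var_array, var_error):
    """Correlation between the two channels on the same array."""
    return var_array / (var_array + var_error)


# Illustrative variance components, not estimates from real data.
var_array, var_error = 3.0, 1.0
rng = np.random.default_rng(0)
n = 200_000
gamma = rng.normal(0.0, np.sqrt(var_array), n)        # shared array effect
red = gamma + rng.normal(0.0, np.sqrt(var_error), n)  # red channel deviation
green = gamma + rng.normal(0.0, np.sqrt(var_error), n)  # green channel deviation

empirical_corr = np.corrcoef(red, green)[0, 1]
empirical_diff_var = np.var(red - green)

if __name__ == "__main__":
    print(channel_covariance(var_array, var_error))
    print(channel_correlation(var_array, var_error), empirical_corr)
    print(2 * var_error, empirical_diff_var)
```

The last line illustrates why differencing removes the array effect: the variance of red minus green depends only on the error variance.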
Differencing Channels on an Array
Often the difference between samples on a single array is the unit of analysis:
$$\Delta_{(i_t - r).(k)} = Y_{iRk} - Y_{rGk}$$
(R = red channel, G = green channel, r = reference sample)
Normalization is almost always done on this quantity.
In a reference design, the difference between treatments A and B can be estimated from 2 arrays by
$$\hat{\alpha}_A - \hat{\alpha}_B = \Delta_{(A_t - r).(k)} - \Delta_{(B_u - r).(l)}$$
But there can be a large loss of information.
Var($\Delta_{(A_t - r).(k)}$) = 0.126, Var(M) = 0.453
Drosophila arrays courtesy of Bryce MacIver, PSU
Reference Design
The reference sample is the same biological material on every array
T treatments, k replicates, kT arrays
If there are technical dye-swaps, these are averaged to form 1 replicate.
If all comparisons are between treatments, there is no need to dye-swap. If there are dye-swaps, these should be balanced by treatment.
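Reference Design – Usual Analysis
Usually the analysis is done on the within-array difference $\Delta$ (treatment channel minus reference channel). E.g.
$$\hat{\alpha}_A - \hat{\alpha}_B = \Delta_{(A-r).(\cdot)} - \Delta_{(B-r).(\cdot)}$$
which has variance $4\sigma_\epsilon^2$, where $\sigma_\epsilon^2$ is the error variance,
and with k replicates, the variance of the estimated difference is $4\sigma_\epsilon^2/k$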
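Reference Design – Optimal Weights
Consider using
$$\Delta^{w}_{(i_t - r).(k)} = Y_{iRk} - w\,Y_{rGk}$$
Then
$$\hat{\alpha}_A - \hat{\alpha}_B = \Delta^{w}_{(A_t - r).(k)} - \Delta^{w}_{(B_u - r).(l)}$$
Using the linear mixed model, with array variance $\sigma_\gamma^2$ and error variance $\sigma_\epsilon^2$, we see that the variance of one pair is
$$2\left[(1-w)^2\sigma_\gamma^2 + (1+w^2)\sigma_\epsilon^2\right]$$
The optimal w is
$$w_{opt} = \sigma_\gamma^2/(\sigma_\gamma^2 + \sigma_\epsilon^2)$$
The resulting variance for a single replicate is
$$\mathrm{Var}_{min} = 4\sigma_\epsilon^2 - 2\sigma_\epsilon^4/(\sigma_\gamma^2 + \sigma_\epsilon^2) = 2\sigma_\epsilon^2 + 2\sigma_\gamma^2\sigma_\epsilon^2/(\sigma_\gamma^2 + \sigma_\epsilon^2)$$
a reduction of $2\sigma_\epsilon^4/(\sigma_\gamma^2 + \sigma_\epsilon^2)$ from the unweighted $4\sigma_\epsilon^2$,
and with k replicates, the variance of the estimated difference is
$$4\sigma_\epsilon^2/k - 2\sigma_\epsilon^4/\left(k(\sigma_\gamma^2 + \sigma_\epsilon^2)\right)$$
a reduction of $2\sigma_\epsilon^4/\left(k(\sigma_\gamma^2 + \sigma_\epsilon^2)\right)$.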
Reference Design – Optimal Weights
We do not know the optimal weights, but if we use mixed model ANOVA, such as the routines available in SAS, Splus or R, the weights are estimated from the data, leading to more efficient estimates.
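The effect of the weight can be checked numerically. The sketch below assumes the weighted difference is the treatment channel minus w times the reference channel on the same array, and that a comparison of A with B uses two such differences from different arrays; under the mixed model its variance is 2[(1-w)^2 var_array + (1+w^2) var_error]. The function names and the variance values are illustrative assumptions.

```python
import numpy as np


def pair_variance(w, var_array, var_error):
    """Variance of (Y_A - w*Y_ref on one array) - (Y_B - w*Y_ref on another)."""
    return 2.0 * ((1.0 - w) ** 2 * var_array + (1.0 + w ** 2) * var_error)


def optimal_weight(var_array, var_error):
    """Weight that minimises pair_variance (from setting the derivative to zero)."""
    return var_array / (var_array + var_error)


# Illustrative variance components.
var_array, var_error = 3.0, 1.0

# Brute-force search over w as an independent check of the closed form.
grid = np.linspace(0.0, 1.0, 10001)
w_grid = grid[np.argmin(pair_variance(grid, var_array, var_error))]

w_opt = optimal_weight(var_array, var_error)
v_min = pair_variance(w_opt, var_array, var_error)
v_unweighted = pair_variance(1.0, var_array, var_error)  # plain differencing, w = 1
v_closed = 4 * var_error - 2 * var_error ** 2 / (var_array + var_error)

if __name__ == "__main__":
    print("w_opt:", w_opt, "grid search:", w_grid)
    print("unweighted:", v_unweighted, "optimal:", v_min, "closed form:", v_closed)
```

With these values plain differencing (w = 1) gives 4 while the optimal weight gives 3.5, which is the kind of gain a mixed model analysis recovers by estimating the variance components from the data.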
Loop Designs
[Figure: treatments A, B, C and D joined by arrows in a loop]
A loop is balanced for dye effects and has two replicates at each node.
T treatments, 2k replicates, Tk arrays
Recall: for a reference design we get only k replicates on Tk arrays
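Loop Designs T=4, 4 arrays
[Figure: loop on treatments A, B, C and D, with A adjacent to B and D and opposite C]
Using optimal weighting, with array variance $\sigma_\gamma^2$ and error variance $\sigma_\epsilon^2$:
$$\mathrm{Var}(A-B) = \mathrm{Var}(A-D) = \sigma_\epsilon^2 + \sigma_\gamma^2\sigma_\epsilon^2/\left(2(\sigma_\gamma^2 + \sigma_\epsilon^2)\right)$$
$$\mathrm{Var}(A-C) = \sigma_\epsilon^2 + \sigma_\gamma^2\sigma_\epsilon^2/(\sigma_\gamma^2 + \sigma_\epsilon^2)$$
Both are smaller than the variance of the reference design with 4 arrays:
$$2\sigma_\epsilon^2 + 2\sigma_\gamma^2\sigma_\epsilon^2/(\sigma_\gamma^2 + \sigma_\epsilon^2)$$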
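The loop and reference variances can be checked by generalized least squares with known variance components. The sketch below is one way to do that under stated assumptions: the loop runs A, B, C, D and back to A, each array is written as (red, green), the fixed effects are one mean per treatment plus a dye effect, and the two channels of an array share a random array effect. The function name and the variance values are hypothetical.

```python
import numpy as np


def gls_contrast_variance(design, treatments, contrast, var_array, var_error):
    """Variance of the GLS estimate of a treatment contrast.

    design: list of (red, green) labels, one pair per array.
    contrast: dict mapping treatment label to coefficient (coefficients sum to 0).
    """
    index = {t: i for i, t in enumerate(treatments)}
    n_arrays = len(design)
    # Columns: one indicator per treatment, last column is the red-dye indicator.
    X = np.zeros((2 * n_arrays, len(treatments) + 1))
    for k, (red, green) in enumerate(design):
        X[2 * k, index[red]] = 1.0
        X[2 * k, -1] = 1.0
        X[2 * k + 1, index[green]] = 1.0
    # The two channels on an array share the array effect.
    block = np.array([[var_array + var_error, var_array],
                      [var_array, var_array + var_error]])
    V_inv = np.kron(np.eye(n_arrays), np.linalg.inv(block))
    info = X.T @ V_inv @ X
    c = np.zeros(X.shape[1])
    for label, coef in contrast.items():
        c[index[label]] = coef
    # The pseudo-inverse handles designs where dye is confounded with a treatment.
    return float(c @ np.linalg.pinv(info, rcond=1e-10) @ c)


var_array, var_error = 3.0, 1.0  # illustrative values
loop = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
reference = [("A", "Ref"), ("B", "Ref"), ("C", "Ref"), ("D", "Ref")]

var_ab = gls_contrast_variance(loop, "ABCD", {"A": 1, "B": -1}, var_array, var_error)
var_ac = gls_contrast_variance(loop, "ABCD", {"A": 1, "C": -1}, var_array, var_error)
var_ref = gls_contrast_variance(reference, ["A", "B", "C", "D", "Ref"],
                                {"A": 1, "B": -1}, var_array, var_error)

if __name__ == "__main__":
    print("loop Var(A-B):", var_ab)
    print("loop Var(A-C):", var_ac)
    print("reference Var(A-B):", var_ref)
```

With these values the closed forms give 1.375 for adjacent treatments, 1.75 for opposite treatments and 3.5 for the reference design, so the numerical result can be compared directly.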
Loop Designs T=4
[Figure: three loops on treatments A, B, C and D, labelled Design L4C, Design L4B and Design L4D]
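T=4, 12 arrays
Loop Design – 3 loops = 6 replicates/treatment
3 × L4C
$$\mathrm{Var}(A-B) = \sigma_\epsilon^2/3 + \sigma_\gamma^2\sigma_\epsilon^2/\left(6(\sigma_\gamma^2 + \sigma_\epsilon^2)\right)$$
$$\mathrm{Var}(A-C) = \sigma_\epsilon^2/3 + \sigma_\gamma^2\sigma_\epsilon^2/\left(3(\sigma_\gamma^2 + \sigma_\epsilon^2)\right)$$
L4B+L4C+L4D
$$\mathrm{Var}(\text{difference}) = \sigma_\epsilon^2/3 + 2\sigma_\gamma^2\sigma_\epsilon^2/\left(3(3\sigma_\epsilon^2 + 4\sigma_\gamma^2)\right)$$
Reference Design – 3 replicates/treatment
$$\mathrm{Var}(\text{difference}) = 2\sigma_\epsilon^2/3 + 2\sigma_\gamma^2\sigma_\epsilon^2/\left(3(\sigma_\gamma^2 + \sigma_\epsilon^2)\right)$$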
T=4, 12 arrays
Assuming $\sigma_\gamma^2/\sigma_\epsilon^2 = 3$
Loop Design – 3 loops = 6 replicates/treatment
3 × L4C
Var(A-B) = 0.46 $\sigma_\epsilon^2$
Var(A-C) = 0.58 $\sigma_\epsilon^2$
L4B+L4C+L4D
Var(difference) = 0.47 $\sigma_\epsilon^2$
Reference Design – 3 replicates/treatment
Var(difference) = 0.83 $\sigma_\epsilon^2$
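The figures quoted for 12 arrays can be checked by plugging a variance ratio into closed-form expressions derived from the mixed model. The sketch below assumes an array-to-error variance ratio of 3 with the error variance set to 1; that ratio is an assumption, chosen because it reproduces the three loop figures, and the function names are hypothetical. Under the same assumption the reference-design expression evaluates to about 1.17 rather than the quoted 0.83, so the quoted reference figure may rest on a different assumption.

```python
def loop_adjacent(var_array, var_error, n_loops):
    """Var(A-B) for treatments adjacent in a 4-node loop repeated n_loops times."""
    single = var_error + var_array * var_error / (2 * (var_array + var_error))
    return single / n_loops


def loop_opposite(var_array, var_error, n_loops):
    """Var(A-C) for treatments opposite in a 4-node loop repeated n_loops times."""
    single = var_error + var_array * var_error / (var_array + var_error)
    return single / n_loops


def three_orderings(var_array, var_error):
    """Var of any pairwise difference when the three distinct loop orderings are combined."""
    return (var_error / 3
            + 2 * var_array * var_error / (3 * (3 * var_error + 4 * var_array)))


def reference_design(var_array, var_error, k):
    """Var of a pairwise difference in a reference design with k replicates."""
    single = 2 * var_error + 2 * var_array * var_error / (var_array + var_error)
    return single / k


var_array, var_error = 3.0, 1.0  # assumed ratio of 3, in units of the error variance

results = {
    "3 x L4C, Var(A-B)": loop_adjacent(var_array, var_error, 3),
    "3 x L4C, Var(A-C)": loop_opposite(var_array, var_error, 3),
    "L4B+L4C+L4D": three_orderings(var_array, var_error),
    "reference, k=3": reference_design(var_array, var_error, 3),
}

if __name__ == "__main__":
    for name, value in results.items():
        print(f"{name}: {value:.2f}")
```

Whichever reference figure is used, every loop arrangement here has a smaller variance than the reference design on the same 12 arrays.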
An 8 Treatment Example
[Figure: design graph on treatments A, B, C, D, E, F, G and H]

An 8 Treatment Example
[Figure: the same design graph on treatments A to H]
2 Complete Blocks

An 8 Treatment Example
[Figure: the same design graph on treatments A to H]
Replication:
Yellow loop?
Red “loop”?
Incorporating 2x2 Factorial in a Loop
[Figure: two loop arrangements of the treatments GT, gt, gT and Gt, each annotated with variance expressions that are not recoverable]
Which Arrangement is Better?
Incorporating 2x2 Factorial in a Loop
The contrasts of interest can be written (in terms of the means, not the observations)
½(A+B) - ½(C+D)
½(A+D) - ½(B+C)
½(A+C) - ½(B+D)
[Figure: loop on treatments A, B, C and D]
Incorporating 2x2 Factorial in a Loop
The optimal variances are given for each contrast:
½(A+B) - ½(C+D)
½(A+D) - ½(B+C)
½(A+C) - ½(B+D)
[Figure: loop on treatments A, B, C and D; the variance expressions are not recoverable]
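One way to see where optimal variances for these three contrasts come from is to split the loop's information into within-array differences and array sums. The derivation below is a sketch under stated assumptions, not a transcription of the slide: it assumes the loop runs A, B, C, D and back to A, so the four arrays carry the pairs (A,B), (B,C), (C,D), (D,A) as (red, green), and it assumes the mixed model with array variance $\sigma_\gamma^2$ and error variance $\sigma_\epsilon^2$. The symbols $d_k$ and $s_k$ are introduced here for illustration.

```latex
% d_k = red minus green on array k,  s_k = red plus green on array k.
% Var(d_k) = 2\sigma_\epsilon^2,  Var(s_k) = 4\sigma_\gamma^2 + 2\sigma_\epsilon^2,
% and d_k, s_k are uncorrelated. Dye effects cancel in every combination below.
\begin{align*}
\tfrac12(A+B)-\tfrac12(C+D):\quad
  & \tfrac12(d_2-d_4) \text{ has variance } \sigma_\epsilon^2, \qquad
    \tfrac12(s_1-s_3) \text{ has variance } 2\sigma_\gamma^2+\sigma_\epsilon^2, \\
  & \text{combined: }
    \left(\frac{1}{\sigma_\epsilon^2}+\frac{1}{2\sigma_\gamma^2+\sigma_\epsilon^2}\right)^{-1}
    = \frac{\sigma_\epsilon^2}{2}
      + \frac{\sigma_\gamma^2\sigma_\epsilon^2}{2(\sigma_\gamma^2+\sigma_\epsilon^2)} \\[4pt]
\tfrac12(A+D)-\tfrac12(B+C):\quad
  & \tfrac12(d_1-d_3) \text{ and } \tfrac12(s_4-s_2), \text{ giving the same variance }
    \frac{\sigma_\epsilon^2}{2}
      + \frac{\sigma_\gamma^2\sigma_\epsilon^2}{2(\sigma_\gamma^2+\sigma_\epsilon^2)} \\[4pt]
\tfrac12(A+C)-\tfrac12(B+D):\quad
  & \tfrac14(d_1-d_2+d_3-d_4) \text{ only, with variance }
    \frac{1}{16}\cdot 4\cdot 2\sigma_\epsilon^2 = \frac{\sigma_\epsilon^2}{2}
\end{align*}
```

Under these assumptions the contrast between opposite corners of the loop, ½(A+C) - ½(B+D), is free of the array variance and is estimated most precisely, which suggests placing the effect of greatest interest across opposite corners.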
Incorporating 2x2 Factorial in a Loop
[Figure: two loop arrangements of the treatments GT, gt, gT and Gt, each annotated with variance expressions that are not recoverable]
Best arrangement for estimating interaction
Best arrangement for estimating time main effect
And now for the rest of the story
Missing arrays – not fatal but reduce efficiency
Added treatments
[Figure: loop on treatments A, B, C and D]
[Figure: loop on treatments A, B, C and D with an added treatment E]
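The claim about missing arrays can be illustrated with a small generalized least squares calculation. The sketch below assumes a loop A, B, C, D written as (red, green) pairs, known variance components (array variance 3, error variance 1, chosen for illustration) and a hypothetical helper `pair_variance_gls`; it compares each pairwise variance in the full loop with the same quantity after the last array is lost.

```python
import numpy as np


def pair_variance_gls(design, treatments, plus, minus, var_array=3.0, var_error=1.0):
    """GLS variance of (mean of `plus`) - (mean of `minus`) with a fixed dye effect."""
    index = {t: i for i, t in enumerate(treatments)}
    X = np.zeros((2 * len(design), len(treatments) + 1))
    for k, (red, green) in enumerate(design):
        X[2 * k, index[red]] = 1.0
        X[2 * k, -1] = 1.0  # red-dye indicator
        X[2 * k + 1, index[green]] = 1.0
    block = np.array([[var_array + var_error, var_array],
                      [var_array, var_array + var_error]])
    V_inv = np.kron(np.eye(len(design)), np.linalg.inv(block))
    c = np.zeros(X.shape[1])
    c[index[plus]], c[index[minus]] = 1.0, -1.0
    return float(c @ np.linalg.pinv(X.T @ V_inv @ X, rcond=1e-10) @ c)


full_loop = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
one_missing = full_loop[:3]  # the D-A array is lost

comparison = {}
for other in "BCD":
    comparison[other] = (
        pair_variance_gls(full_loop, "ABCD", "A", other),
        pair_variance_gls(one_missing, "ABCD", "A", other),
    )

if __name__ == "__main__":
    for other, (full, missing) in comparison.items():
        print(f"Var(A-{other}): full loop {full:.3f}, one array missing {missing:.3f}")
```

All three differences remain estimable after the loss, but only because the random array effect lets between-array information be used; each variance is larger than in the complete loop.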
Optimal Design?
The loop design has not been shown to be optimal
There are lots of other BIBDs for 2 samples/block
General BIBDs can be adapted as more channels become available
Loop designs are particularly appealing due to the dye balance and graphical representation
The Moral of the Story
Loop designs are very efficient
  Can incorporate factorial arrangements
  Can incorporate blocks
  Can be replicated in various ways to improve efficiency
Optimal design ideas can help determine which BIBD to use
ANOVA-type analyses on the individual channels (not differencing) should be used for analysis.
References
Kerr and Churchill (2001), Experimental design for gene expression microarrays, Biostatistics, 2:183-201.
Kerr (2003), Design considerations for efficient and effective microarray studies, Biometrics, 59:822-828.
Yang and Speed (2002), Design issues for cDNA microarray experiments, Nature Reviews Genetics, 3:579-588.
[Figure: design graph with nodes A1, A2, B1, B2, C1 and C2]