Some thoughts on the design of cDNA microarray experiments
Terry Speed & Yee Hwa Yang
Department of Statistics
UC Berkeley
MGED IV Boston, February 14, 2002
Some aspects of design
Layout of the array
– Which cDNA sequences to print?
• Library
• Controls
– Spatial position
Allocation of samples to the slides
– Different design layouts
• A vs B: Treatment vs control
• Multiple treatments
• Factorial
• Time series
– Other considerations
• Replication
• Physical limitations: the number of slides and the amount of material
• Extensibility: linking
Some issues to consider before designing cDNA microarray experiments
Scientific
Aims of the experiment
Specific questions and priorities between them. How will the experiments answer the questions posed?
Practical (Logistic)
Types of mRNA samples: reference, control, treatment, mutant, etc.
Amount of material. Count the amount of mRNA involved in one channel of hybridization as one unit.
The number of slides available for the experiment.
Other Information
The experimental process prior to hybridization: sample isolation, mRNA extraction, amplification, labelling,…
Controls planned: positive, negative, ratio, etc.
Verification method: Northern, RT-PCR, in situ hybridization, etc.
Natural design choice
Case 1: Meaningful biological control (C)
Samples: Liver tissue from four mice treated with cholesterol-modifying drugs.
Question 1: Genes that respond differently in the treatment (T) and the control (C).
Question 2: Genes that respond similarly across two or more treatments relative to control.
Case 2: Use of universal reference.
Samples: Different tumor samples.
Question: To discover tumor subtypes.
[Diagram, Case 1: T1, T2, T3 and T4 are each compared with C. Case 2: T1, T2, ..., Tn-1, Tn are each compared with Ref.]
Treatment vs Control
Two samples
e.g. KO vs. WT or mutant vs. WT
              Direct                 Indirect
Design        T vs C                 T vs Ref, C vs Ref
Estimate      average(log(T/C))      log(T/Ref) - log(C/Ref)
Variance      σ²/2                   2σ²
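The two variances can be checked with a small simulation. This is only a sketch under stated assumptions: the true log ratio is set to zero, every log ratio carries independent Gaussian noise with the same variance sigma², and the function name `simulate_variances` is made up for illustration.

```python
import random
import statistics

def simulate_variances(sigma=1.0, n_genes=20000, seed=0):
    """Empirical variances of the direct and indirect estimates of log(T/C)."""
    rng = random.Random(seed)
    direct, indirect = [], []
    for _ in range(n_genes):
        # Direct design: two slides of T vs C, averaged.
        m1 = rng.gauss(0.0, sigma)
        m2 = rng.gauss(0.0, sigma)
        direct.append((m1 + m2) / 2)
        # Indirect design: one slide T vs Ref and one slide C vs Ref, differenced.
        t_ref = rng.gauss(0.0, sigma)
        c_ref = rng.gauss(0.0, sigma)
        indirect.append(t_ref - c_ref)
    return statistics.variance(direct), statistics.variance(indirect)

var_direct, var_indirect = simulate_variances()
print(f"direct   ~ {var_direct:.2f} (theory 0.50)")
print(f"indirect ~ {var_indirect:.2f} (theory 2.00)")
```

Both designs use two slides here, so the comparison is at equal slide cost.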
Caveat
The advantage of direct over indirect comparisons was first pointed out by Churchill & Kerr, and in general we agree with the conclusion. However, the last M vs A plot shows that the difference is not a factor of 2, as theory predicts. Why?
A likely explanation is that the assumption that log(T/Ref) and log(C/Ref) are uncorrelated is not valid, so the gains are less than predicted. The reason for the correlation is less obvious, but there are a number of possibilities.
One is that we used mRNA from the same extraction; another is that we didn't dye-swap the two indirect comparisons, but did when we replicated the direct comparison. The answer is not yet clear.
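To see how a correlation would shrink the gain, one can write the indirect variance as a function of a correlation rho between log(T/Ref) and log(C/Ref). The sketch below assumes both log ratios have the same variance sigma² and that the two direct slides stay uncorrelated; `rho` and the function names are illustrative, not quantities estimated in the talk.

```python
def indirect_variance(sigma2, rho):
    """Var[log(T/Ref) - log(C/Ref)] when the two log ratios have correlation rho."""
    return 2.0 * sigma2 * (1.0 - rho)

def variance_ratio(rho, sigma2=1.0):
    """Indirect variance divided by the direct (two-slide average) variance."""
    direct = sigma2 / 2.0
    return indirect_variance(sigma2, rho) / direct

# A positive correlation reduces the advantage of the direct design.
for rho in (0.0, 0.25, 0.5):
    print(rho, variance_ratio(rho))
```

With rho = 0 the ratio is the theoretical 4 on the variance scale (2 on the standard deviation scale); any positive correlation brings it down.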
Labeling
• 3 sets of self-self hybridization (cerebellum vs cerebellum)
• Data 1 and Data 2 were labeled together and hybridized on two slides separately.
• Data 3 was labeled separately.
[Figure: plots of Data 2 against Data 1 and of Data 3 against Data 1]
Extraction
• Olfactory bulb experiment:
• 3 sets of Anterior vs Dorsal performed on different days
• #10 and #12 were from the same RNA isolation and amplification
• #12 and #18 were from different dissections and amplifications
• All 3 data sets were labeled separately before hybridization
One-way layout: one factor, k levels

                     I) Common reference   II) Common reference   III) Direct comparison
Number of slides     N = 3                 N = 6                  N = 3
Ave. variance        2                                            0.67
Units of material    A = B = C = 1         A = B = C = 2          A = B = C = 2
Ave. variance                              1                      0.67

[Diagrams: in designs I) and II), A, B and C are each hybridized against ref; in design III), A, B and C are hybridized directly against one another.]
For k = 3, efficiency ratio (Design I / Design III) = 3. In general, efficiency ratio = 2k / (k-1). However, remember the assumption that the log ratios are uncorrelated!
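The variances behind these ratios can be reproduced by ordinary least squares on the slide layout. The sketch below is an illustration under assumptions, not the authors' code: each slide is a (red, green) pair whose log ratio has unit variance and is uncorrelated with the others; for general k, Design III is taken to be all pairwise direct comparisons and Design I is replicated to the same number of slides (which works out for odd k). The names `contrast_variance` and `efficiency_ratio` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def contrast_variance(slides, a, b):
    """Variance (in units of sigma^2) of the least-squares estimate of log(a/b)."""
    samples = sorted({s for pair in slides for s in pair})
    cols = samples[1:]                      # first sample is the baseline
    p = len(cols)
    # Design matrix: +1 for the red sample, -1 for the green sample.
    X = [[Fraction((s == red) - (s == green)) for s in cols] for red, green in slides]
    # X'X augmented with the identity, inverted by Gauss-Jordan elimination.
    M = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [Fraction(int(i == j)) for j in range(p)]
         for i in range(p)]
    for i in range(p):
        pivot = next(r for r in range(i, p) if M[r][i] != 0)
        M[i], M[pivot] = M[pivot], M[i]
        scale = M[i][i]
        M[i] = [v / scale for v in M[i]]
        for r in range(p):
            if r != i:
                factor = M[r][i]
                M[r] = [vr - factor * vi for vr, vi in zip(M[r], M[i])]
    inverse = [row[p:] for row in M]
    c = [Fraction((s == a) - (s == b)) for s in cols]
    return sum(c[i] * inverse[i][j] * c[j] for i in range(p) for j in range(p))

def efficiency_ratio(k):
    """Design I variance over Design III variance at equal slide count (odd k)."""
    levels = [f"L{i}" for i in range(k)]
    direct = list(combinations(levels, 2))                        # k(k-1)/2 slides
    reference = [(lev, "ref") for lev in levels] * ((k - 1) // 2)
    return (contrast_variance(reference, levels[0], levels[1])
            / contrast_variance(direct, levels[0], levels[1]))

design_I = [("A", "ref"), ("B", "ref"), ("C", "ref")]
design_III = [("A", "B"), ("B", "C"), ("C", "A")]
print(contrast_variance(design_I, "A", "B"))    # 2
print(contrast_variance(design_III, "A", "B"))  # 2/3
print(efficiency_ratio(3), efficiency_ratio(5))
```

Exact fractions are used so that 0.67 shows up as 2/3 rather than a rounded float.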
Illustration from one experiment
[Figure: box plots of log ratios for Design I (A, B and C each vs Ref) and Design III (A, B and C compared directly)]
Box plots of log ratios: we are still ahead!
Factorial experiments
• Treated cell lines: CTL, OSM, EGF, OSM & EGF
• Possible experiments
Here we are interested not in genes for which there is an O or an E effect, but in those for which there is an OE interaction, i.e. genes for which log(O&E/O) - log(E/C) is large or small.
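As a small illustration of the interaction contrast, the sketch below computes log(O&E/O) - log(E/C) for one gene from four intensities. Base-2 logs are an assumption (the slide only says log), and the intensities are made-up numbers, not data from the experiment.

```python
import math

def interaction(c, o, e, oe):
    """Interaction log ratio log2(O&E/O) - log2(E/C) for one gene's intensities."""
    return math.log2(oe / o) - math.log2(e / c)

# Hypothetical gene 1: O doubles and E quadruples expression, and the effects
# simply add on the log scale, so there is no interaction.
additive = interaction(c=100, o=200, e=400, oe=800)

# Hypothetical gene 2: the combined treatment goes well beyond the additive prediction.
synergistic = interaction(c=100, o=200, e=400, oe=3200)

print(additive, synergistic)
```

A gene with large O and E effects can still have an interaction of zero, which is why the main-effect log ratios alone do not identify the genes of interest.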
Other examples of factorial experiments
Suppose we have tumor T and standard cells S from the same tissue, and are interested in the impact of radiation R on gene expression. In general, genes for which log(RT/T) and log(RS/S) are large or small will be less interesting to us than those for which log(RT/T) - log(RS/S) is large or small, i.e. those with large interactions.
Next, suppose that our interest is in comparing gene expression in two mutants, say M and M’, at two developmental stages, say E and P. Then we are probably more interested in those genes for which the temporal patterns in the two mutants differ than in the patterns themselves, i.e. interest focusses on genes for which log(ME/MP) - log(M’E/M’P) is large or small, again the ones with large interactions.
2 x 2 factorial: some design options

                    Indirect    A balance of direct and indirect
                    I)          II)       III)      IV)
# Slides            N = 6
Main effect A       0.5         0.67      0.5       NA
Main effect B       0.5         0.43      0.5       0.3
Interaction A.B     1.5         0.67      1         0.67

[Diagrams: designs I) to IV), each connecting the samples C, A, B and A.B]
Table entry: variance (assuming all log ratios uncorrelated)
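Column I) of the table can be reproduced by simple variance propagation. The layout used below is an assumption, since the design diagrams are not recoverable here: each of A, B and A.B is hybridized twice against C (6 slides), and effects are defined relative to C, so that log(A/C) = a, log(B/C) = b and log(A.B/C) = a + b + ab.

```python
from fractions import Fraction

SIGMA2 = Fraction(1)   # variance of a single log ratio, taken as the unit
REPLICATES = 2         # assumed: two slides per sample against C

# Variance of the averaged log ratio of each sample against C.
var_vs_c = {s: SIGMA2 / REPLICATES for s in ("A", "B", "A.B")}

# Main effects are estimated directly by the averaged log ratios.
main_a = var_vs_c["A"]
main_b = var_vs_c["B"]

# Interaction = log(A.B/C) - log(A/C) - log(B/C); the three terms come from
# different slides, so under the uncorrelated assumption their variances add.
interaction = var_vs_c["A.B"] + var_vs_c["A"] + var_vs_c["B"]

print(float(main_a), float(main_b), float(interaction))
```

Under this assumed layout the interaction, the quantity of most interest, is the least precisely estimated, which is the motivation for designs that mix direct and indirect comparisons.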
Design choices in time series. Entry: variance

                                          t vs t+1             t vs t+2       t vs t+3
                                          T1T2   T2T3   T3T4   T1T3   T2T4   T1T4     Ave
N=3  A) T1 as common reference            1      2      2      1      2      1        1.5
     B) Direct hybridization              1      1      1      2      2      3        1.67
N=4  C) Common reference                  2      2      2      2      2      2        2
     D) T1 as common ref + more           .67    .67    1.67   .67    1.67   1        1.06
     E) Direct hybridization choice 1     .75    .75    .75    1      1      .75      .83
     F) Direct hybridization choice 2     1      .75    1      .75    .75    .75      .83

[Diagrams: designs A) to F), each connecting T1, T2, T3 and T4; design C) also includes Ref]
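For designs without loops, where exactly one chain of slides links any two samples, the variance of a comparison equals the number of slides on that chain (in units of the single log-ratio variance, assuming uncorrelated log ratios). The sketch below uses this to recompute rows A), B) and C). The slide layouts are inferred from the row labels and are assumptions, as the diagrams are not recoverable here; designs D) to F) contain loops and need a least-squares calculation instead.

```python
from collections import deque
from itertools import combinations

def path_length(slides, a, b):
    """Number of slides on the shortest chain linking samples a and b."""
    neighbours = {}
    for x, y in slides:
        neighbours.setdefault(x, set()).add(y)
        neighbours.setdefault(y, set()).add(x)
    dist = {a: 0}
    queue = deque([a])
    while queue:                      # breadth-first search from a
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist[b]

TIMES = ["T1", "T2", "T3", "T4"]
designs = {
    "A": [("T1", "T2"), ("T1", "T3"), ("T1", "T4")],   # T1 as common reference
    "B": [("T1", "T2"), ("T2", "T3"), ("T3", "T4")],   # consecutive time points
    "C": [(t, "Ref") for t in TIMES],                  # external common reference
}

def average_variance(slides):
    """Mean variance over all six pairwise comparisons of time points."""
    pairs = list(combinations(TIMES, 2))
    return sum(path_length(slides, a, b) for a, b in pairs) / len(pairs)

for name, slides in designs.items():
    print(name, round(average_variance(slides), 2))
```

Design B) gives the most precise estimates of consecutive changes, at the cost of a variance of 3 for T1 vs T4.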
A recently designed factorial experiment
Mutant 1 (M1): samples M1.WT.P1, M1.WT.P11, M1.WT.P21, M1.MT.P1, M1.MT.P11, M1.MT.P21
Mutant 2 (M2): samples M2.WT.P1, M2.WT.P11, M2.WT.P21, M2.MT.P1, M2.MT.P11, M2.MT.P21
Question: Seek genes that are changing over time and are different in MT vs WT.
Analysis: Look at the interaction effect between time and type.
Summary
The balance of direct and indirect comparisons in a given context should be determined by optimizing the precision of the estimates among comparisons of interest, subject to the scientific and physical constraints of the experiment.
Acknowledgments
Jean Yee Hwa Yang
Sandrine Dudoit
Gary Glonek (Adelaide)
Ingrid Lönnstedt (Uppsala)
John Ngai’s Lab (Berkeley)
Jonathan Scolnick
Cynthia Duggan
Vivian Peng
Moriah Szpara
Percy Luu
Elva Diaz
Dave Lin (Cornell)
Some web sites:
Technical reports, talks, software etc.
http://www.stat.berkeley.edu/users/terry/zarray/Html/
Statistical software R (“GNU’s S”)
http://www.R-project.org/
Packages within R environment:
– SMA (statistics for microarray analysis): http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html
– Spot: http://www.cmis.csiro.au/iap/spot.htm