Affymetrix case study
Jesper Jørgensen, NsGene A/S
Overview
• Affymetrix GeneChip technology
• Data processing
  – Expression level
  – Normalisation
  – Fold change
  – Statistics
• Parkinson disease
• Ventral versus dorsal midbrain (case study)
• Verification of array data
  – Q-PCR
  – In situ hybridization
  – Immunohistochemistry
Expression profiling
• Expression profiling
  – Investigate mRNA expression profile.
  – Compare gene expression between two or more situations.
  – Case versus control.
• Profiling methods
  – Differential display.
  – SAGE (Serial Analysis of Gene Expression).
  – Microarray (custom spotted arrays / Affymetrix GeneChip).
Affymetrix GeneChip technology
Probe synthesis on the array
[Figure labels: gene (5' to 3'); multiple oligo probes; PM (perfect match) and MM (mismatch). Figure adapted from: David Givol, Weizmann Institute of Science, http://www.weizmann.ac.il/home/ligivol/research_interests.html]
Affymetrix GeneChip technology
Probe set design
A probe set = 11-20 PM/MM pairs (probe design is not optimized)
Preparation of samples for GeneChip
Amplification (T7 RNA polymerase)
Chips: U133A, U133B
Figure modified from: Knudsen (2002), "A Biologist's Guide to Analysis of DNA Microarray Data", Wiley.
The hardware
Expression level (probe signal)
Li-Wong model
n: scaling factor obtained by fitting
Several other models exist. Irizarry et al. (2002) use log-transformed PM values after carrying out a global background adjustment and across-array normalisation.
Irizarry et al. (2002) Biostatistics
qspline normalisation (M/A plot)
Workman et al. (2002) Genome Biology, vol. 3, no. 9.
Assumption: Most genes are unchanged.
M/A plot: Raw chip data are used to plot, for each probe, the logarithm of the ratio between two chips versus the logarithm of the mean expression for the two chips.
[Figure: M/A plots before and after normalisation]
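The M/A coordinates described above can be sketched in a few lines of Python. This is a minimal illustration, not the qspline method itself: it only computes the plotted values, and it assumes the common convention in which A is the mean of the two log2 intensities (the log of the geometric mean). The function name and the intensities are made up for the example.

```python
import math

def ma_values(chip1, chip2):
    """Return (M, A) lists for two equally long lists of positive probe intensities."""
    m_values, a_values = [], []
    for x, y in zip(chip1, chip2):
        # M: log2 of the ratio between the two chips for this probe
        m_values.append(math.log2(x / y))
        # A: mean of the two log2 intensities (assumed convention)
        a_values.append(0.5 * (math.log2(x) + math.log2(y)))
    return m_values, a_values

# Made-up raw intensities for three probes on two chips
chip1 = [100.0, 400.0, 50.0]
chip2 = [100.0, 200.0, 100.0]
m, a = ma_values(chip1, chip2)
for m_i, a_i in zip(m, a):
    print(round(m_i, 3), round(a_i, 3))
```

If most genes are unchanged, the M values should scatter around 0 across the whole range of A; normalisation aims to remove any intensity-dependent drift away from that line.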
Variation
Two different amplifications of the same RNA applied to GeneChips
A/A B/B
Fold change (Log fold)
• Fold change = sample/control
• Log transformation makes scale symmetric around 0
• All data log2 transformed
[Figure: log2 fold (y axis, -4 to 4) plotted against fold change (x axis, 0 to 12)]
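A short sketch of why the log2 scale is symmetric: a doubling and a halving give log folds of equal size and opposite sign. The function name and expression values are illustrative only.

```python
import math

def log_fold(sample, control):
    """Log2 fold change; both inputs must be positive expression levels."""
    return math.log2(sample / control)

# Doubling, no change, halving (made-up expression levels)
for sample, control in [(200.0, 100.0), (100.0, 100.0), (50.0, 100.0)]:
    print(sample / control, log_fold(sample, control))
```

On the raw scale, down-regulation is squeezed into the interval between 0 and 1 while up-regulation is unbounded; after the log2 transformation both directions span the same range.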
Statistical testing
Is the regulation significant?
• Student's and Welch's t-test
• ANOVA
• SAM
• Wilcoxon
• Kruskal-Wallis
• Westfall-Young
• ...
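As one example from the list, Welch's t-test can be sketched with the Python standard library. The sketch below only computes the test statistic and the Welch-Satterthwaite degrees of freedom; turning these into a P-value requires the t distribution (for instance via `scipy.stats.ttest_ind` with `equal_var=False`). The expression values are made up, and nothing here is taken from the case study data.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    na, nb = len(group_a), len(group_b)
    # Squared standard errors of the two group means
    va = variance(group_a) / na
    vb = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Made-up log2 expression values for one probe, three chips per group
ventral = [8.1, 8.4, 7.9]
dorsal = [6.0, 6.3, 6.2]
t, df = welch_t(ventral, dorsal)
print(round(t, 2), round(df, 2))
```

With only three chips per group the degrees of freedom are small, so even a large t value corresponds to a modest P-value.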
Increased likelihood of getting a significant result by chance alone
At a P-value of 0.05 you expect:
• 5 false positives if you look at 100 genes
• 1200 false positives if you look at 24,000 genes
Bonferroni correction
If you want at most a 25% chance of having one or more false positives in the list of regulated genes, you should only consider P-values more significant than the Bonferroni-corrected cutoff:
• 2.5x10^-3 (0.25/100) if you look at 100 genes
• 1.0x10^-5 (0.25/24,000) if you look at 24,000 genes
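The arithmetic behind both slides is simple enough to check directly. The sketch below reproduces the expected number of false positives at P = 0.05 and the Bonferroni cutoffs for a 25% family-wise error rate; the function names are illustrative.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of false positives when every gene is truly unchanged."""
    return alpha * n_tests

def bonferroni_cutoff(family_error_rate, n_tests):
    """Per-gene P-value cutoff that bounds the chance of any false positive."""
    return family_error_rate / n_tests

for n_genes in (100, 24000):
    print(n_genes,
          expected_false_positives(0.05, n_genes),
          bonferroni_cutoff(0.25, n_genes))
```

The Bonferroni cutoff is conservative: it controls the chance of even a single false positive, which is why it shrinks in direct proportion to the number of genes tested.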
Parkinson’s Disease (PD)
• A fairly common neurodegenerative disorder (approx. 2 million in USA/Europe)
• Due to loss of the dopamine-producing neurons in the substantia nigra
• Cardinal motor symptoms: tremor, rigidity and bradykinesia
• Conventional treatment does not halt the progression of nerve cell loss
Fetal Transplantation for PD
• Cells from the developing midbrain (A)
  – are collected and dissociated (B)
  – and transplanted into the striatum (C)
• The cells will integrate with the host brain and produce dopamine.
Stem cells in Parkinson disease
Langston JW., J Clin Invest. 2005 Jan;115(1):23-5.
Aim
• In the human fetus, DA neurons can be found in the ventral part of the tegmentum (VT) from approximately 6 weeks.
• In contrast, no DA neurons can be found in the neighboring dorsal part (DT).
• We aim to find genes associated with DA differentiation by using GeneChips to compare the expression profiles of VT and DT.
[Figure: TH IHC (*)]
High quality RNA from 8w GA human ventral midbrain
[Figure panels: 8wVT (A), 8wVT (B), 8wDT (A), 8wDT (B)]
Experimental setup
• Compare VT against DT (3x3)
• Affymetrix Human Genome U133 Chip Set
  – HG-U133A: Well substantiated genes
  – HG-U133B: Mostly ESTs
  – Total: 45,000 probes (genome)
Samples: A ventral, B ventral, C ventral; A dorsal, B dorsal, C dorsal
U133A data permutations and filter
• Red: VM versus DM: VM (A1 VENTRAL, A2 VENTRAL, B VENTRAL), DM (A1 DORSAL, A2 DORSAL, B DORSAL)
• Other colors: Permutations
• Low-stringency filter as dotted line:
  – Average expression > 50
  – P-value < 0.04
  – SLR (signal log ratio) > 0.5 (about 41% up-regulation in VM)
  – Arrange with descending fold change.
[Figure: SLR for the true grouping (red) and for the permuted groupings (other colors)]
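The slide does not state how the permutations were generated. One plausible reading, sketched below as an assumption, is that the six chips are relabelled into every possible pair of three-chip groups, so that the true ventral/dorsal grouping can be compared against all the other groupings. The function name is hypothetical; the sample names follow the slide.

```python
from itertools import combinations

samples = ["A1 ventral", "A2 ventral", "B ventral",
           "A1 dorsal", "A2 dorsal", "B dorsal"]

def three_vs_three_splits(names):
    """Yield each distinct split of six samples into two unordered groups of three."""
    first = names[0]
    # Fixing the first sample in group 1 avoids counting each split twice
    for rest in combinations(names[1:], 2):
        group1 = (first,) + rest
        group2 = tuple(n for n in names if n not in group1)
        yield group1, group2

splits = list(three_vs_three_splits(samples))
print(len(splits))   # number of distinct groupings, including the true one
print(splits[0])     # the true ventral versus dorsal grouping
```

If the true grouping yields clearly more probes above the filter than any permuted grouping, the differences are unlikely to be explained by chip-to-chip noise alone.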
Genes up-regulated in VM on U133A
Low-stringency filter: Average expression > 50, P-value<0.04, SLR>0.5 arranged with descending fold change. Total list 107 probes. Only SLR>1 displayed.
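The low-stringency filter can be written as a small function. This is a sketch under assumptions: the record layout (`avg_expr`, `p_value`, `slr`), the function name and the probe records are hypothetical, and only the three thresholds and the sort order come from the slide. Sorting by SLR gives the same order as sorting by fold change, since fold change increases with SLR.

```python
def low_stringency_filter(probes, min_expr=50.0, max_p=0.04, min_slr=0.5):
    """Keep probes passing all three thresholds, sorted by descending SLR."""
    kept = [p for p in probes
            if p["avg_expr"] > min_expr
            and p["p_value"] < max_p
            and p["slr"] > min_slr]
    return sorted(kept, key=lambda p: p["slr"], reverse=True)

# Hypothetical probe records, not data from the study
probes = [
    {"id": "probe_1", "avg_expr": 320.0, "p_value": 0.001, "slr": 2.1},
    {"id": "probe_2", "avg_expr": 40.0, "p_value": 0.001, "slr": 3.0},   # expression too low
    {"id": "probe_3", "avg_expr": 150.0, "p_value": 0.030, "slr": 0.8},
    {"id": "probe_4", "avg_expr": 500.0, "p_value": 0.200, "slr": 1.5},  # not significant
    {"id": "probe_5", "avg_expr": 90.0, "p_value": 0.010, "slr": 0.3},   # change too small
]

hits = low_stringency_filter(probes)
displayed = [p for p in hits if p["slr"] > 1]  # the slide shows only SLR > 1
print([p["id"] for p in hits], [p["id"] for p in displayed])
```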
Literature verification
• ALDH1A
• DAT1
• VMAT2
• TH
• Calbindin, 28kDa
• HNF3a
• 3x Nurr1
• 2x IGF
• 4x SNCA
• 4x DRD2
• KCNJ6 (Girk2)
• Ret
• PITX3
• BDNF
• DLK1 (FA1)
• SLC17A6 (VGLUT2)
• EPHA5
• ERBB4
Verification of array data
Array data (100 candidate genes)
• Validation on array material (confirmation)
• Validation on new samples (universality)
Desk work: Statistics, Literature, Bioinformatics
RNA: Q-PCR, ISH, Northerns
Protein: IHC, ELISA, Westerns
ALDH1A1 RT-PCR
[Figure: ALDH1A1 RT-PCR product (299 bp) at 30x and 35x cycles. Lanes: cDNA #257 (DM), cDNA #256 (VM), cDNA #245 (DM), cDNA #244 (VM), cDNA #254 (DM), cDNA #253 (VM)]
[Figure: fluorescence (0 to 0.14) versus cycle (1 to 43)]
Q-PCR verification of genes regulated on U133A
TH Q-PCR on a developmental series of subdissected human embryonic and fetal brain material
OD260/280 was measured at 1.88 +/- 0.05 for all RNA samples
Q-PCR analysis and clustering
Fold change in a mixed population
• 1.5 fold up-regulation from no expression
• 1.5 fold up-regulation from some expression
Organization of ISH procedure
GeneChip verification with ISH
ISH from: Vernay et al., J Neurosci. 2005 May 11;25(19):4856-67.
GeneChip verification with IHC
Courtesy of Josephine Jensen
Conclusions
• Using arrays one will get a snapshot of the expression profile under the conditions investigated.
  – Careful experimental design
  – RNA quantity and quality are important
• Since a single array experiment generates thousands of data points, the primary challenge of the technique is to make sense of the data.
  – Calculations/Statistics (back and forth)
  – Literature mining
• Independent methods are needed for verification
  – Q-PCR
  – In situ hybridization (ISH)
  – Immunohistochemistry (IHC)
Acknowledgements
NsGene, Ballerup, Denmark (http://www.nsgene.com/)
• Lars Wahlberg
• Bengt Juliusson
• Teit Johansen
Neurotech, Huddinge University Hospital, Sweden
• Åke Seiger
Department of Medical Genetics, IMBG, Panum Institute, Denmark
• Claus Hansen
• Karen Friis
Wallenberg Neuroscience Center, Sweden
• Anders Björklund
• Josephine Jensen
• Elin Andersson
CBS, DTU, Denmark
• Søren Brunak
• Steen Knudsen
• Nikolaj Blom
• Thomas Nordahl Petersen