Post on 18-Dec-2015
Microarray technology and analysis of gene expression data
Hillevi Lindroos
Introduction to microarray technology
• Technique for studying gene expression for thousands of genes simultaneously.
• Study gene regulation, effects of treatments, differences between healthy and diseased cells...
• Comparative Genomic Hybridization (CGH):
- gene content in related strains/species
- gene dosage in cancer cells
• Microarray: glass slide with spots, each containing DNA from one gene
Two-colour spotted microarrays
Spot = PCR-product (~500 bp) from one gene or long oligonucleotide (~50 bp)
Differential expression (two samples compared)
Experimental procedure:
1. Isolate RNA from 2 samples (experiment and control).
2. Reverse transcribe to cDNA with fluorescently labelled nucleotides, e.g. Cy3-dCTP (control) or Cy5-dCTP (experiment).
3. Mix and hybridize to microarray.
4. Laser scan: measure fluorescent intensities
In principle...
Red spot: up-regulated gene, ratio >1
Green spot: down-regulated gene, ratio <1
Yellow spot: no differential expression, ratio =1
Red and green images superimposed:
[Figure: RNA from the sample (e.g. heat shock) is reverse transcribed (RT) with red dye, RNA from the control with green dye. Equal amounts of cDNA are mixed and competitively hybridized to the microarray. Gene A, up-regulated in the sample, gives a red dot in the image.]
Why differential expression?
Fluorescent intensities do not directly correspond to mRNA concentrations, due to:
• different shapes and densities of spots
• different hybridization properties between genes
• different amounts of dye incorporation between genes
Compare intensities (expression) from two samples.
Data processing and analysis
1. Image analysis
Locate spots in image
Quantify fluorescence intensity (spot + background)
Mean / median of pixel intensities
2. Background correction
– local background for each spot, or global for whole array
– assuming additive background:
Spot intensity = True intensity + Background
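The additive background model above can be sketched in a few lines. This is only an illustration: the function name `spot_signal`, the use of the median as the pixel summary, and the pixel values are assumptions, not taken from any particular image-analysis package.

```python
from statistics import median

def spot_signal(spot_pixels, background_pixels):
    """Return the background-corrected intensity of one spot in one channel."""
    # Summarize foreground and local background pixels by their medians.
    foreground = median(spot_pixels)
    background = median(background_pixels)
    # Additive model: spot intensity = true intensity + background.
    return foreground - background

# Hypothetical pixel values for a single spot (not real data).
spot = [1200, 1350, 1280, 1310, 1295]
local_bg = [210, 190, 205, 200, 198]
corrected = spot_signal(spot, local_bg)
print(corrected)
```

A global correction would use the same subtraction, with one background value shared by all spots on the array.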
Output
Cy5 (R) and Cy3 (G) intensities
Ratio = R/G
~ [mRNA_experiment] / [mRNA_control]
Up-regulated genes: ratio >1
Down-regulated genes: ratio between 0 and 1
Asymmetry!
Use logarithm!
M = log2(ratio) is symmetrically distributed around 0
Upregulated 2 times: ratio= 2, M= 1
Downregulated 2 times: ratio= 0.5, M= -1
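A small sketch of why the logarithm helps: two-fold changes in either direction give M values of equal size and opposite sign. The function name `log_ratio` and the intensity values are hypothetical.

```python
import math

def log_ratio(r, g):
    """M = log2(R/G) for background-corrected Cy5 (R) and Cy3 (G) intensities."""
    return math.log2(r / g)

# Hypothetical intensities (R, G) for three genes.
examples = {
    "up 2x": (2000.0, 1000.0),
    "down 2x": (1000.0, 2000.0),
    "unchanged": (1500.0, 1500.0),
}
for label, (r, g) in examples.items():
    # Print the raw ratio next to its log2 value.
    print(label, r / g, log_ratio(r, g))
```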
3. Normalization: correction of systematic errors (dye bias)
• different amounts of control and experiment samples
• different fluorescent intensities of Cy3 and Cy5
• different labelling and detection efficiencies
Dye bias: Most genes seem to be upregulated (higher Cy5 than Cy3 intensity).
Plot of Cy5 intensity (R) vs Cy3 intensity (G):
Corrected for by scaling Cy5 values with total_Cy3/total_Cy5.
Assumes most genes unaffected by treatment.
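The global scaling described above might look like the following sketch. The function name `global_normalize` and the intensities are assumptions made for illustration only.

```python
def global_normalize(red, green):
    """Scale Cy5 (red) values so that total Cy5 equals total Cy3."""
    # Scaling factor = total_Cy3 / total_Cy5.
    factor = sum(green) / sum(red)
    scaled_red = [r * factor for r in red]
    return scaled_red, factor

# Hypothetical background-corrected intensities for four genes.
red = [420.0, 810.0, 1650.0, 3120.0]
green = [200.0, 400.0, 800.0, 600.0]
scaled_red, factor = global_normalize(red, green)
print(factor)
print(scaled_red)
```

After scaling, the two channels have the same total intensity, so a gene counts as changed only if it deviates from the overall Cy5/Cy3 balance. This is reasonable only under the stated assumption that most genes are unaffected.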
Dye bias may depend on average log spot intensity A = ½(log2 R + log2 G), position on array, print-tip…
Intensity dependent dye bias
Correction:
M_normalized = M - M_trend(A)
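A minimal sketch of the correction, assuming the trend is a straight line in A fitted by least squares. In practice a smooth local regression curve is commonly used for the trend; the straight line here is a simplified stand-in. All function names and intensities are hypothetical.

```python
import math

def ma_values(red, green):
    """Return M = log2(R/G) and A = 0.5*(log2 R + log2 G) for each spot."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a

def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def normalize_intensity_dependent(red, green):
    """Subtract the fitted trend: M_normalized = M - M_trend(A)."""
    m, a = ma_values(red, green)
    intercept, slope = fit_line(a, m)
    return [mi - (intercept + slope * ai) for mi, ai in zip(m, a)]

# Hypothetical intensities where the red excess grows with spot intensity.
red = [150.0, 320.0, 700.0, 1500.0, 3400.0]
green = [100.0, 200.0, 400.0, 800.0, 1600.0]
m_norm = normalize_intensity_dependent(red, green)
print(m_norm)
```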
Identify differentially expressed genes
• Simple: cutoff (e.g. |M| > 1)
• Better: statistical test, e.g. t-test (replicate spots or repeated experiments) => Significance
– Unstable mRNAs may have high ratios – and high variation!
– Weak spots: small difference in signal may be big relative difference (high ratio).
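To illustrate why a test is better than a cutoff, the sketch below computes a one-sample t-statistic for replicate M values of two genes with the same mean M but different variation. The function name `one_sample_t` and the replicate values are hypothetical. The critical value 2.776 is the two-sided 5% threshold of the t-distribution with 4 degrees of freedom (5 replicates).

```python
import math
from statistics import mean, stdev

def one_sample_t(m_values):
    """t-statistic for the null hypothesis that the mean log ratio is 0."""
    n = len(m_values)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = stdev(m_values) / math.sqrt(n)
    return mean(m_values) / standard_error

# Hypothetical replicate M values; both genes have a mean M of 1.0.
stable_gene = [1.1, 0.9, 1.0, 1.2, 0.8]
noisy_gene = [3.0, -1.0, 2.5, -0.5, 1.0]

T_CRITICAL = 2.776  # two-sided 5% level, 4 degrees of freedom
for name, values in [("stable", stable_gene), ("noisy", noisy_gene)]:
    t = one_sample_t(values)
    print(name, round(t, 2), "significant" if abs(t) > T_CRITICAL else "not significant")
```

Only the gene with consistent replicates passes the test, although both show the same average two-fold change.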
Affymetrix GeneChips
Spots = 25 bp oligonucleotides
Probe pairs for each gene: a perfectly matching probe + a probe with 1 mismatch
One sample per array
Fluorescent labelling (biotin-labelled sample, stained with a fluorescent streptavidin conjugate)
Expression level computed from difference in intensity between matching and mis-matching probe
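As a rough sketch of the idea, the expression level of one gene can be taken as the average difference between matching (PM) and mismatching (MM) probe intensities over its probe pairs. The plain average, the function name `average_difference` and the intensities are assumptions for illustration; production software typically uses more robust summaries.

```python
def average_difference(pm, mm):
    """Mean of (perfect match - mismatch) over the probe pairs of one gene."""
    diffs = [p - m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)

# Hypothetical intensities for four probe pairs of a single gene.
pm = [1200, 950, 1430, 1100]
mm = [300, 250, 380, 310]
expression = average_difference(pm, mm)
print(expression)
```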
Expression profiles
Plot expression over a series of experiments (e.g. time series)
[Figure: expression profiles of Gene_A and Gene_B. x-axis: Time (0 to 6); y-axis: M = log2(R/G) (-4 to 3).]
Clustering expression profiles
Analyze multiple experiments to identify common patterns of gene expression
Similar function – similar expression (co-regulation)
Goals:
• Identify regulatory motifs
• Infer function of unknown genes
• Distinguish cell types, e.g. tumors (cluster arrays)
Hierarchical clustering
Expression profile -> vector
Compute similarity between expression profiles (e.g. correlation coefficient)
Successively join the most similar genes to clusters, and clusters to superclusters
Serum stimulation of human fibroblasts, time series.
A: cholesterol biosynthesis
B: cell cycle
C: immediate-early response
D: signaling and angiogenesis
E: wound healing
From Eisen et al. (1998). PNAS 95(25):14863-14868
Distance: correlation coefficient
Agglomeration: average linkage
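The procedure can be sketched with the same two choices: distance = 1 - correlation coefficient, and average linkage. This is a small illustrative implementation, not the software used for the figure; the function names and the four expression profiles are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they happen."""
    names = list(profiles)
    # Pairwise gene distances: 1 - correlation (0 = identical shape, 2 = opposite).
    dist = {frozenset(pair): 1.0 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}

    def linkage(c1, c2):
        # Average distance over all gene pairs between the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Join the two most similar clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical M values over five time points.
profiles = {
    "gene1": [0.1, 0.8, 1.5, 2.0, 2.2],
    "gene2": [0.0, 0.7, 1.6, 2.1, 2.0],
    "gene3": [0.2, -0.5, -1.2, -1.8, -2.0],
    "gene4": [0.0, -0.6, -1.0, -1.9, -2.2],
}
merges = average_linkage(profiles)
for left, right, distance in merges:
    print(left, "+", right, round(distance, 3))
```

The two induced genes and the two repressed genes are joined first. The final merge joins the two groups at a large distance, which corresponds to the top split of the dendrogram.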
Clustering of arrays: classification of cancer cells.
From Chen et al. (2002). Mol Biol Cell 13(6):1929-39
Exercise:
Normalization (Excel):
R-G plot
M-A plot
most up- and downregulated genes