Microarray analysis
Quantitation of Gene Expression
Expression Data to Networks
BIO520 Bioinformatics Jim Lund
Reading: Ch 16
Microarray data
• Image quantitation
• Normalization
• Find genes with significant expression differences
• Annotation
• Clustering, pattern analysis, network analysis
Sources of Non-Biological Variation
• Dye bias: differences in heat and light sensitivity, efficiency of dye incorporation
• Differences in the amount of labeled cDNA hybridized to each channel in a microarray experiment (Channel is used to refer to a combination of a dye and a slide.)
• Variation across replicate slides
• Variation across hybridization conditions
• Variation in scanning conditions
• Variation among technicians doing the lab work.
Factors that affect the signal level
• Amount of mRNA
• Labeling efficiencies
• Quality of the RNA
• Laser/dye combination
• Detection efficiency of photomultiplier or CCD
[Figure: HeLa vs. HepG2]
[Figure: HeLa vs. HepG2]
M vs. A Plot
M = Log Red - Log Green
A = (Log Green + Log Red) / 2
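
The M and A definitions on this slide can be sketched in a few lines. This is only an illustration: the slide does not state the base of the logarithm, so base 2 is an assumption here, and the intensity values are made up.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired red/green channel intensities."""
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)  # base 2 is an assumption
        m_vals.append(log_r - log_g)               # M: log ratio of the channels
        a_vals.append((log_r + log_g) / 2)         # A: average log intensity
    return m_vals, a_vals

# Hypothetical spot intensities for three genes.
red = [1024.0, 512.0, 64.0]
green = [256.0, 512.0, 256.0]
m_vals, a_vals = ma_values(red, green)
```

Plotting `m_vals` against `a_vals` gives the M vs. A plot; with no dye bias the cloud of points should be centered on M = 0 at every A.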
M v A plots of chip pairs: before normalization
M v A plots of chip pairs: after quantile normalization
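
Quantile normalization forces every chip to share the same intensity distribution. The sketch below shows one common way to do it, assuming equal-length chips and ignoring ties; the function name and the data are hypothetical, and it is not necessarily the procedure used to make these plots.

```python
def quantile_normalize(chips):
    """chips: list of equal-length lists of intensities, one list per chip."""
    n_chips = len(chips)
    n_probes = len(chips[0])
    # Probe indices of each chip, ordered from lowest to highest intensity.
    orders = [sorted(range(n_probes), key=lambda i: chip[i]) for chip in chips]
    # Reference distribution: mean across chips at each rank.
    reference = [
        sum(chips[c][orders[c][rank]] for c in range(n_chips)) / n_chips
        for rank in range(n_probes)
    ]
    # Each probe receives the reference value for its rank on its own chip.
    normalized = [[0.0] * n_probes for _ in range(n_chips)]
    for c in range(n_chips):
        for rank, probe in enumerate(orders[c]):
            normalized[c][probe] = reference[rank]
    return normalized

# Two hypothetical chips with three probes each.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
```

After normalization both chips contain the same set of values; only the assignment of values to probes differs.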
Types of normalization
• To total signal (linear normalization)
• LOESS (LOcally WEighted polynomial regreSSion)
• To “housekeeping genes”
• To genomic DNA spots (Research Genetics) or mixed cDNAs
• To internal spikes
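
The simplest option in this list, normalization to total signal, amounts to one scaling factor per channel. A minimal sketch with hypothetical names and made-up intensities:

```python
def scale_to_total_signal(channel, target_total):
    """Rescale a channel linearly so that its intensities sum to target_total."""
    factor = target_total / sum(channel)
    return [x * factor for x in channel]

# Hypothetical intensities: the green channel is uniformly half as bright.
red = [100.0, 300.0, 600.0]
green = [50.0, 150.0, 300.0]
green_scaled = scale_to_total_signal(green, sum(red))
```

A single factor cannot remove intensity-dependent bias, which is the case LOESS is meant to handle.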
Microarray analysis
• Data exploration: expression of gene X?
• Statistical analysis: which genes show large, reproducible changes?
• Clustering: grouping genes by expression pattern.
• Knowledge-based analysis: Are amine synthesis genes involved in this experiment?
Fold change: the crudest method of finding differentially expressed genes
[Figure: HeLa vs. HepG2, with the two regions of >2-fold expression change marked]
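
A fold-change filter like the one marked on this slide can be sketched as follows. The threshold of 2 comes from the slide; the function name and the signal values are hypothetical.

```python
import math

def fold_change_calls(sample_a, sample_b, threshold=2.0):
    """Return (up, down): indices of genes changed more than threshold-fold in b vs. a."""
    cutoff = math.log2(threshold)
    up, down = [], []
    for i, (a, b) in enumerate(zip(sample_a, sample_b)):
        log_ratio = math.log2(b / a)
        if log_ratio > cutoff:
            up.append(i)
        elif log_ratio < -cutoff:
            down.append(i)
    return up, down

# Hypothetical normalized signals for four genes.
hela = [100.0, 200.0, 400.0, 50.0]
hepg2 = [450.0, 210.0, 100.0, 90.0]
up, down = fold_change_calls(hela, hepg2)
```

The filter uses no replicate information, which is why it is called crude: a noisy gene can pass it by chance.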
What do we mean by differentially expressed?
• Statistically, our gene is different from the other genes.
[Figure: histogram of number of genes vs. log ratio, showing the distribution of average ratios for all genes, and the distribution of measurements for the gene of interest (probability of a given value of the ratio)]
Finding differentially expressed genes
What affects our certainty that a gene is up or down-regulated?
• Number of sample points
• Difference in means
• Standard deviations of the samples
[Figure: probe signal for Sample A and Sample B]
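
All three factors appear directly in a t statistic. The sketch below uses Welch's unequal-variance form, which is a choice made here for illustration rather than something the slide specifies; the probe signals are made up.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # The standard error shrinks with more sample points and smaller variances.
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_b - mean_a) / standard_error

# Same difference in means (4.0) in every case.
tight = welch_t([10.0, 11.0, 9.0], [14.0, 15.0, 13.0])
noisy = welch_t([10.0, 14.0, 6.0], [14.0, 18.0, 10.0])
more_points = welch_t([10.0, 11.0, 9.0] * 2, [14.0, 15.0, 13.0] * 2)
```

With the difference in means held fixed, larger standard deviations lower the statistic and additional sample points raise it.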
Practical views on statistics
• With appropriate biological replicates, it is possible to select statistically meaningful genes/patterns.
• Sensitivity and selectivity are inversely related, e.g. increased selection of true positives WILL result in more false positives and fewer false negatives.
• False negatives are lost opportunities; false positives cost money and waste time.
• A typical set of experiments, even when treated with conservative statistics, yields more genes/pathways/patterns than one can sensibly follow, so use conservative statistics to protect against false positives when designing follow-on experiments.
Statistical Tests
• Student’s t-test
– Correct for multiple testing! (Holm-Bonferroni)
• False discovery rate.
• Significance Analysis of Microarrays (SAM)
– http://www-stat.stanford.edu/~tibs/SAM/
• ANOVA
• Principal components analysis
• Special methods for periodic patterns in data.
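
The two multiple-testing approaches named above can be compared on a small list of p-values. This is a sketch: the Holm-Bonferroni step-down procedure is implemented as usually described, the false discovery rate control uses the Benjamini-Hochberg procedure (an assumption, since the slide does not name a method), and the p-values are made up.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first failure; larger p-values are not rejected
    return reject

def benjamini_hochberg(p_values, fdr=0.05):
    """Return booleans: True where the gene is called at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    last = -1
    for rank, idx in enumerate(order):
        # Find the largest k with p_(k) <= (k / m) * fdr.
        if p_values[idx] <= (rank + 1) / m * fdr:
            last = rank
    reject = [False] * m
    for rank in range(last + 1):
        reject[order[rank]] = True
    return reject

# Hypothetical p-values from five per-gene t-tests.
p_values = [0.001, 0.045, 0.012, 0.025, 0.5]
holm = holm_bonferroni(p_values)
bh = benjamini_hochberg(p_values)
```

On this input Holm-Bonferroni keeps two genes and the FDR procedure keeps three, which illustrates the trade-off between false positives and false negatives.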
Volcano plot: log(fold change) vs. p-value
[Figure: volcano plot; x-axis: Log(fold change), y-axis: p-value]
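
The coordinates of a volcano plot can be computed from per-gene fold changes and p-values. In this sketch the base-2 log for fold change, the -log10 transform of the p-value, and both cutoffs are conventional choices assumed here, not values taken from the slide; the data are made up.

```python
import math

def volcano_points(fold_changes, p_values):
    """Return (x, y) = (log2 fold change, -log10 p-value) for each gene."""
    return [(math.log2(fc), -math.log10(p)) for fc, p in zip(fold_changes, p_values)]

def select_hits(points, min_abs_log_fc=1.0, max_p=0.01):
    """Indices of genes with both a large change and a small p-value."""
    min_y = -math.log10(max_p)
    return [i for i, (x, y) in enumerate(points)
            if abs(x) >= min_abs_log_fc and y >= min_y]

# Hypothetical fold changes and p-values for four genes.
fold_changes = [4.0, 1.1, 0.25, 3.0]
p_values = [0.001, 0.0005, 0.2, 0.0001]
points = volcano_points(fold_changes, p_values)
hits = select_hits(points)
```

Gene 1 is significant but barely changed, and gene 2 is strongly changed but not significant; only genes in the upper left and upper right of the plot pass both cutoffs.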
Scatter plot showing genes with significant p-values
Pattern finding
• In many cases, the patterns of differential expression are the target (as opposed to specific genes).
– Clustering or other approaches for pattern identification: find genes that behave similarly across all experiments, or experiments that behave similarly across all genes.
– Classification: identify genes that best distinguish two or more classes.
• The statistical reliability of the pattern or classifier is still an issue, and similar considerations apply: e.g. cluster analysis of random noise will still produce clusters, but they will be meaningless.
What is clustering?
• Group similar objects together.
– Genes with similar expression patterns.
• Objects in the same cluster (group) are more similar to each other than objects in different clusters.
Clustering
• What is clustering?
• Similarity/distance metrics
• Hierarchical clustering algorithms
– Made popular by Stanford, i.e. [Eisen et al. 1998]
• K-means
– Made popular by many groups, e.g. [Tavazoie et al. 1999]
• Self-organizing map (SOM)
– Made popular by Whitehead, i.e. [Tamayo et al. 1999]
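
Of the algorithms listed, K-means is the shortest to write down. The sketch below is a bare-bones version assuming Euclidean distance and a naive initialization from the first k profiles; it is not the implementation used in the cited papers, and the expression profiles are made up.

```python
import math

def kmeans(profiles, k, n_iter=20):
    """Cluster expression profiles (lists of floats) into k groups."""
    centers = [list(p) for p in profiles[:k]]  # naive init: first k profiles
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in profiles
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical profiles: two rising genes and two falling genes.
profiles = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [1.1, 2.1, 3.2],
    [2.9, 2.2, 0.8],
]
labels, centers = kmeans(profiles, k=2)
```

The number of clusters k has to be chosen in advance, and the result can depend on the initial centers.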
Typical Tools
• SAM (Significance Analysis of Microarrays), Stanford
• GeneSpring
• Affymetrix GeneChip Operating System (GCOS)
• Cluster/Treeview
• R statistics package microarray analysis libraries.
How to define similarity?
• Similarity metric:
– A measure of pairwise similarity or dissimilarity
– Examples:
• Correlation coefficient
• Euclidean distance
[Figure: the raw matrix (genes 1..n by experiments 1..p) is converted into a similarity matrix (genes 1..n by genes 1..n); X and Y mark two genes]
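
The step from raw matrix to pairwise matrix can be sketched as below, here with Euclidean distance as the metric. The function name and the 3-gene, 3-experiment matrix are hypothetical.

```python
import math

def distance_matrix(raw):
    """raw: n genes x p experiments. Returns the n x n matrix of pairwise distances."""
    return [[math.dist(gene_x, gene_y) for gene_y in raw] for gene_x in raw]

# Hypothetical raw matrix: 3 genes (rows) measured in 3 experiments (columns).
raw = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 5.0],
    [4.0, 6.0, 3.0],
]
dist = distance_matrix(raw)
```

The result is symmetric with zeros on the diagonal, so only n(n-1)/2 entries are distinct.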
Similarity metrics
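• Euclidean distance

$$\sqrt{\sum_{j=1}^{p} \left(X[j] - Y[j]\right)^2}$$

• Correlation coefficient

$$\frac{\sum_{j=1}^{p} (X[j] - \bar{X})(Y[j] - \bar{Y})}{\sqrt{\sum_{j=1}^{p} (X[j] - \bar{X})^2 \, \sum_{j=1}^{p} (Y[j] - \bar{Y})^2}}, \quad \text{where } \bar{X} = \frac{1}{p} \sum_{j=1}^{p} X[j]$$

Euclidean clustering = magnitude & direction
Correlation clustering = direction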
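
The difference between the two metrics shows up on two genes with the same shape but different magnitude. A small sketch with made-up profiles:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    return num / den

# Hypothetical genes: y has the same pattern as x at ten times the level.
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
r = pearson(x, y)    # close to 1: same direction
d = math.dist(x, y)  # large: very different magnitude
```

Correlation would put these two genes in the same cluster, while Euclidean distance would separate them.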
Sporulation-example
Sporulation-example
Self-organizing maps (SOM) [Kohonen 1995]
• Basic idea:
– Map high-dimensional data onto a 2D grid of nodes
– Neighboring nodes are more similar to each other than nodes far apart
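
The basic idea can be sketched as a toy SOM: every grid node holds a weight vector, and each training step pulls the best-matching node and its grid neighbors toward one data point. The linear decay schedules, the Gaussian neighborhood, and all parameter values below are assumptions chosen for brevity, not Kohonen's reference settings.

```python
import math
import random

def train_som(data, rows, cols, n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialized from random data points.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    for t in range(n_iter):
        frac = 1.0 - t / n_iter          # decays linearly from 1 toward 0
        lr = lr0 * frac                  # learning rate
        sigma = sigma0 * frac + 0.01     # neighborhood width, kept above zero
        x = rng.choice(data)
        # Best-matching unit: the node whose weights are closest to x.
        bmu = min(weights, key=lambda node: math.dist(weights[node], x))
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood
            for j in range(dim):
                w[j] += lr * h * (x[j] - w[j])         # pull node toward x
    return weights

def best_node(weights, x):
    """Grid position of the node closest to profile x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

# Hypothetical 2-dimensional profiles forming two loose groups.
data = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]]
weights = train_som(data, rows=2, cols=2)
node_for_first = best_node(weights, data[0])
```

Genes are then grouped by the node they map to, and nodes that are adjacent on the grid hold similar profiles.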
Self-organizing maps (SOM)
SOM Clusters
Things learned from microarray gene expression experiments
• Pathways not known to be involved
–Ontology?
• Novel genes involved in a known pathway
• “like” and “unlike” tissues
Transcription Factors
Regulatory Networks
• Identify co-regulated genes
• Search for common motifs (transcription factor binding sites)
–Evaluate known motifs/factors
–Search for new ones.
• Programs: MEME, etc.
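
A very crude stand-in for motif search is to count which short words occur in the upstream regions of many co-regulated genes. This is not MEME's algorithm, only an illustration of the idea; the function name and the sequences are made up.

```python
from collections import Counter

def kmer_sequence_counts(sequences, k):
    """Count, for each k-mer, the number of sequences that contain it."""
    counts = Counter()
    for seq in sequences:
        # Use a set so that a k-mer is counted once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return counts

# Hypothetical upstream regions of four co-regulated genes.
upstream = ["TTACGCGTAA", "GGACGCGTCC", "ACGCGTTTTT", "CCCCCCCCCC"]
counts = kmer_sequence_counts(upstream, 6)
top_kmer, top_count = counts.most_common(1)[0]
```

A real analysis would compare these counts against a background set of promoters and allow degenerate positions in the motif.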
mRNA-protein Correlation
• YPD: should have relevant data
– will yeast be typical?
• Electrophoresis 18:533
– 23 proteins on 2D gels
– r = 0.48 for mRNA vs. protein levels
• Post-transcriptional and post-translational regulation are important!
Other microarray formats
• Single nucleotide polymorphism (SNP) chips
– Oligos with each of 4 nt at each SNP.
• Chromatin immunoprecipitation chips (ChIP:chip)
– Determine transcription factor binding sites
– Promoter DNA on the chip.
• Alternative splicing chips
– Long oligos, covering alternatively spliced exons, or all exons.
• Genome tiling chips
ChIP:chip--Identification of Transcription Factor Binding Sites
• Cross-link transcription factors to DNA with formaldehyde.
• Pull out the transcription factor of interest via immunoprecipitation with an antibody, or by tagging the factor of interest with an isolatable epitope (e.g. a GST fusion).
• Fractionate the DNA associated with the transcription factor, reverse the cross-links, label, and hybridize to an array of promoter DNA.
• Brown et al. (2001) Nature 409:533-8
ChIP:chip
Analysis of TF Binding Sites
On to Proteomics
DNA → RNA → Protein