Multivariate Analysis of Pathways
Multivariate Approaches to Gene Set Selection
Key Multivariate Ideas
• PCA (Principal Components Analysis)
• SVD (Singular Value Decomposition)
• MDS (Multi-dimensional Scaling)
• Hotelling's T²
PCA
Three correlated variables: PCA1 lies along the direction of maximal variation; PCA2 lies at right angles to it, along the direction of next-highest variation.
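To make the PCA picture concrete, here is a minimal sketch that finds the first principal component of three correlated variables by power iteration on the sample covariance matrix. The function names and the toy data are illustrative assumptions, not part of the original slides; a real analysis would normally use a linear algebra library (SVD) instead.

```python
import math

def covariance_matrix(rows):
    """Sample covariance of a samples-by-variables table (list of lists)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[j] * c[k] for c in centered) / (n - 1) for k in range(p)]
            for j in range(p)]

def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix, by power iteration."""
    cov = covariance_matrix(rows)
    p = len(cov)
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(cov[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data along the direction v (v' C v).
    variance = sum(v[j] * cov[j][k] * v[k] for j in range(p) for k in range(p))
    return v, variance

# Toy data (invented): three variables that all track one shared factor.
data = [
    [1.0, 2.1, 2.9],
    [2.0, 3.9, 6.2],
    [3.0, 6.1, 8.8],
    [4.0, 8.0, 12.1],
    [5.0, 9.9, 15.0],
]
pc1, var1 = first_principal_component(data)
total_var = sum(covariance_matrix(data)[j][j] for j in range(3))
print("PC1 direction:", pc1)
print("share of total variance on PC1:", var1 / total_var)
```

Because the three toy variables are nearly proportional, almost all of the variance falls on PC1; with real expression data the share is usually much lower.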
Multivariate Representation of Pathways
• BAD pathway
• Sample groups: Normal, IBC, Other BC
• Clear separation between groups
• Variation differences
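Hotelling's T²
• Compute the distance between sample means using a (common) metric of covariation: $T^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{x}_1 - \bar{x}_2)^\top S^{-1} (\bar{x}_1 - \bar{x}_2)$
• Where $\bar{x}_1, \bar{x}_2$ are the group mean vectors, $n_1, n_2$ the group sizes, and $S$ the pooled covariance matrix
• Multidimensional analog of the t (actually F) statistic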
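A minimal sketch of the two-sample Hotelling T² computation, assuming the standard pooled-covariance form of the statistic and its usual conversion to an F statistic with (p, n1 + n2 - p - 1) degrees of freedom. The helper names and the toy expression values are illustrative assumptions, not taken from the slides.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter_matrix(rows):
    """Sum of outer products of the centered samples."""
    m = mean_vector(rows)
    p = len(m)
    return [[sum((r[j] - m[j]) * (r[k] - m[k]) for r in rows) for k in range(p)]
            for j in range(p)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular (too few samples?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def hotelling_t2(group1, group2):
    """Return (T2, F) for two samples-by-genes tables with equal gene order."""
    n1, n2, p = len(group1), len(group2), len(group1[0])
    s1, s2 = scatter_matrix(group1), scatter_matrix(group2)
    pooled = [[(s1[j][k] + s2[j][k]) / (n1 + n2 - 2) for k in range(p)]
              for j in range(p)]
    diff = [a - b for a, b in zip(mean_vector(group1), mean_vector(group2))]
    z = solve(pooled, diff)  # z = S^{-1} (mean1 - mean2)
    t2 = (n1 * n2) / (n1 + n2) * sum(d * zi for d, zi in zip(diff, z))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat

# Toy two-gene pathway (invented values): rows are samples.
normal = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
tumor = [[3.0, 2.5], [4.5, 1.0], [5.0, 6.0], [6.5, 3.5]]
t2, f_stat = hotelling_t2(normal, tumor)
print("T2 =", t2, "F =", f_stat)
```

The singular-matrix error in `solve` is exactly the N < p problem raised on the critique slide: with fewer samples than genes the pooled covariance cannot be inverted without regularization.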
Principles of Kong et al Method
• Normal covariation generally acts to preserve homeostasis
• The transcription of genes that participate in many processes will be changed
• The joint changes in genes will be most distinctive for those genes active in pathways that are working differently
Critiques of Hotelling’s T²
• Small samples: unreliable covariance estimates
– N < p
• Estimates of $\bar{x}$ and $S$ are not robust to outliers
• Assumes the same covariance in each sample
– $\Sigma_1 = \Sigma_2$? Usually not in disease
– Kong et al propose an analog of the Welch t-test
– Permutation of samples for significance
Making it Stable
1. Insufficient information to capture all relationships – too much correlation!
– Power of Hotelling’s method comes from identifying directions of rare variation
– Many (spurious) directions of 0 variation
2. Random variation in data leads to random variation in PCA
• Regularization strategy: force covariance to be more like IID
Making it Robust
• Microarray data has many outliers
• Multivariate methods are very much distorted by outliers
• Robust estimates of covariance could give robust PCA
• Simple approach: trim outliers
Handling Changes of Covariance
• Power of Hotelling’s method comes from identifying directions of rare variation
• If one group shows little covariation in one direction but the other does – how to test for changes?
• If one group is control then its rare covariance changes should be taken as standard
– Robust measure of means in both groups
Detecting changes of covariance
Meaning of Covariance Change
• Meaning of covariance across individuals
– Homeostasis in the face of individual variation
– e.g. BAD pathway: largest loadings of PC1 on PRKARB & ADCY1
– PRKARB represses CREB1; ADCY activates CREB1
• Gene sets whose covariance diminishes may
– be responding to different inputs
– have escaped their usual regulatory control
• Characteristic of cancers
Testing Covariance Changes
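• Idea: directions of small variation in one group should match directions of small variation in the other
• Mathematical approach
– Find the solutions $\lambda$ of $\det(S_1 - \lambda S_2) = 0$
– Solutions should all be near 1 if there is no change
– Test statistic: easily computed from the $\lambda_i$, $i = 1, \ldots, p$ (the formula itself did not survive in this transcript)
• Computational approach
– Ratio of largest to smallest: $\lambda_{\max} / \lambda_{\min}$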
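As an illustration of the covariance-change idea, the sketch below handles only the two-gene special case, where the condition det(S1 - λ S2) = 0 reduces to a quadratic in λ that can be solved directly. The function names and the toy matrices are assumptions for illustration; a pathway with p genes would need a general symmetric eigenvalue solver.

```python
import math

def generalized_eigenvalues_2x2(s1, s2):
    """Roots of det(S1 - lam * S2) = 0 for symmetric 2x2 matrices.

    Expanding the determinant gives a*lam^2 - b*lam + c = 0 with
    a = det(S2), c = det(S1) and the cross term b below.
    """
    a = s2[0][0] * s2[1][1] - s2[0][1] ** 2
    c = s1[0][0] * s1[1][1] - s1[0][1] ** 2
    b = s1[0][0] * s2[1][1] + s1[1][1] * s2[0][0] - 2 * s1[0][1] * s2[0][1]
    if a <= 0:
        raise ValueError("S2 must be positive definite")
    root = math.sqrt(max(b * b - 4 * a * c, 0.0))
    return (b - root) / (2 * a), (b + root) / (2 * a)

def max_min_ratio(s1, s2):
    """Ratio of the largest to the smallest solution; 1.0 means no change."""
    smallest, largest = generalized_eigenvalues_2x2(s1, s2)
    if smallest <= 0:
        raise ValueError("S1 must be positive definite")
    return largest / smallest

# Toy covariances (invented): controls are tightly coupled, tumors are not.
control = [[1.0, 0.9], [0.9, 1.0]]
tumor = [[1.0, 0.0], [0.0, 1.0]]
print("no change:", max_min_ratio(control, control))
print("lost covariation:", max_min_ratio(tumor, control))
```

When the tumor group loses the coupling seen in controls, variation appears along the direction that was rare in controls, so one solution becomes large and the ratio moves far from 1.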
Network Connectivity Methods
Network Topology
• Connections represent interactions:
– Regulatory (one-way)
– Protein interaction (two-way)
• Hubs are genes with many connections
• Bottlenecks are single genes that connect two parts of a functional network
Devising Tests Based on Topology
• Issues: how to weight more heavily the genes that are hubs
• How to assess directionality of change
• How to measure co-operativity (activation or repression changes in appropriate ways)
Draghici et al Approach
• Overall measure
• Effective contribution (perturbation factor)
Analysis of Outliers
Outliers: Clues to Disease Process?
• Outliers usually reflect idiosyncratic events
• Recurrent outliers reflect rare events that are selected
• If a particular pathway is disrupted in disease, but by many different mechanisms, then the expression profiles should
– Lose healthy covariance
– Show recurrent outliers
• How to test for ‘consistent’ outliers?
• COPA: a method for flagging recurrent outliers in expression data
– Finds consistent fusion gene
A Test Statistic for Consistent Outliers
• Ratio of quantile differences to normal variation: $(q_{0.90} - q_{0.10})_{\text{tumor}} \,/\, \max\big((q_{0.90} - q_{0.10})_{\text{normal}},\ 0.4\big)$
• Compare to null distribution by permutation
• Many genes show much higher ratios
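The quantile-ratio statistic and its permutation null can be sketched for a single gene as follows. Two things here are assumptions not fixed by the slides: the quantile rule (linear interpolation, the default in R's `quantile`) and the null, which shuffles tumor/normal labels. The function names and toy values are illustrative.

```python
import random

def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (pos - lower) * (xs[upper] - xs[lower])

def range_ratio(tumor, normal, floor=0.4):
    """(q.90 - q.10) in tumors over max((q.90 - q.10) in normals, floor)."""
    spread_tumor = quantile(tumor, 0.90) - quantile(tumor, 0.10)
    spread_normal = quantile(normal, 0.90) - quantile(normal, 0.10)
    return spread_tumor / max(spread_normal, floor)

def permutation_p_value(tumor, normal, n_perm=1000, seed=0):
    """Share of label shuffles whose ratio is at least the observed one."""
    rng = random.Random(seed)
    observed = range_ratio(tumor, normal)
    pooled = list(tumor) + list(normal)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if range_ratio(pooled[:len(tumor)], pooled[len(tumor):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Toy log-expression values for one gene (invented).
normal_expr = [5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.1, 5.0, 4.9]
tumor_expr = [5.0, 9.5, 4.9, 5.1, 1.2, 5.0, 8.8, 5.2, 0.9, 5.1, 9.9, 4.8]
print("ratio:", range_ratio(tumor_expr, normal_expr))
print("p:", permutation_p_value(tumor_expr, normal_expr))
```

The floor of 0.4 in the denominator keeps genes with almost no variation in normals from producing enormous ratios.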
Statistical Significance
• Find confidence limits on the number of false positives by permutation
• Several hundred genes appear significant at 10-20% FDR
– Actual scores: 267 scores are greater than 5, whereas 90% of permutations have fewer than 34 scores over 5
A Test for Functional Groups
• For each group G of genes
• sG <- sum(scores[G])/sqrt(length(G))
• Scores: t-scores or range ratios
• PAGE (BMC Bioinformatics, 2005)
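The group score above is a one-line R expression; the sketch below is a Python rendering of the same formula, with a hypothetical gene-to-score dictionary standing in for the R vector `scores`. The gene names and score values are invented for illustration.

```python
import math

def group_score(scores, group):
    """Sum of the member scores divided by the square root of the group size."""
    return sum(scores[g] for g in group) / math.sqrt(len(group))

# Hypothetical per-gene scores (t-scores or range ratios).
scores = {"geneA": 1.0, "geneB": 2.0, "geneC": 3.0, "geneD": -2.0, "geneE": 0.5}
group = ["geneA", "geneB", "geneC", "geneD"]
s_g = group_score(scores, group)
print("sG =", s_g)
```

Dividing by the square root of the group size keeps scores comparable across groups of different sizes: if the per-gene scores were independent with unit variance, every group score would also have unit variance.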
Do Genes Make Sense?
• Quantile Ratio
• [1] "DNA replication"
• [2] "response to pathogenic fungi"
• [6] "cleavage of lamin"
• [7] "spindle organization and biogenesis"
• [15] "response to osmotic stress"
• [16] "nutrient import"
• [22] "response to mercury ion"
• T-test
• [2] "sodium ion homeostasis"
• [3] "leukocyte adhesive activation"
• [4] "positive regulation of calcium-independent cell-cell adhesion"
• [5] "oxytocin receptor activity"
• [6] "ADP biosynthesis"
• [7] "dADP biosynthesis"
• [10] "regulation of muscle contraction"
• [11] "caveolar membrane"
• [12] "response to cold"
• [16] "stress fiber formation"
• [18] "positive regulation of complement activation"
• [19] "astrocyte activation"
• [22] "regulation of long-term neuronal synaptic plasticity"
• [24] "positive regulation of endocytosis"
• [25] "embryonic hemopoiesis"
Cancer Functional Groups
• Do very probable cancer genes show high-discrepancy in few samples?
• Program: identify genes that might contribute to cancer processes: growth signaling, loss of cell-matrix adhesion, apoptosis
1. Do most samples from these categories show at least one gross mis-regulation?
2. Are they the same genes in most samples?
Example: Cell Growth
• Select genes in GO:0001558 ‘regulation of cell growth’
• Expect most samples to have at least one seriously mis-regulated gene from this category.
• Compute maximum aberration score across category
Aberrations
• Aberration score indicated by color: vanilla: 0; red: 4
• Nine normals at left
• No gene misregulated in even 50% of samples
• BUT: Only a few genes commonly misregulated
Simplest Summary
• Maximum aberration score for samples
Testing the Pathway for Outliers
• Many genes show aberrations in tumor group
• Null distribution: medians of maxima from randomly selected gene groups of size 37
• P < .01
NB. The results for cell-matrix interaction are very similar; angiogenesis not so strong
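A minimal sketch of the pathway outlier test described above: take the maximum aberration score over the pathway's genes in each sample, summarize by the median across samples, and compare with the same summary for randomly drawn gene groups of equal size. The data layout (a dictionary from gene to per-sample scores), the function names, and the toy data are assumptions for illustration.

```python
import random
import statistics

def median_of_maxima(aberration, genes):
    """Median over samples of the per-sample maximum aberration score."""
    n_samples = len(aberration[genes[0]])
    maxima = [max(aberration[g][s] for g in genes) for s in range(n_samples)]
    return statistics.median(maxima)

def pathway_outlier_p_value(aberration, pathway_genes, n_draws=1000, seed=0):
    """Compare the pathway with random gene groups of the same size."""
    rng = random.Random(seed)
    all_genes = sorted(aberration)
    observed = median_of_maxima(aberration, pathway_genes)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(all_genes, len(pathway_genes))
        if median_of_maxima(aberration, draw) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_draws + 1)

# Toy data (invented): 50 genes, 6 tumor samples. Three pathway genes are
# each aberrant in a different pair of samples; all other genes are quiet.
aberration = {f"bg{i}": [0.0] * 6 for i in range(47)}
aberration["p0"] = [4.0, 4.0, 0.0, 0.0, 0.0, 0.0]
aberration["p1"] = [0.0, 0.0, 4.0, 4.0, 0.0, 0.0]
aberration["p2"] = [0.0, 0.0, 0.0, 0.0, 4.0, 4.0]
observed, p_value = pathway_outlier_p_value(aberration, ["p0", "p1", "p2"])
print("median of maxima:", observed, "p:", p_value)
```

The toy case mirrors the pattern on the aberration slide: no single gene is misregulated in even half of the samples, yet every sample carries at least one strong aberration somewhere in the pathway.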