limma linear models for microarray data. difficulties with microarray data variability of the...

LIMMA

Linear Models for Microarray Data

Difficulties with microarray data

• Variability of the expression values differs between genes

• Non-identical and dependent distribution between genes

• Multiple testing of tens of thousands of genes

Correct for multiple comparisons

• Multiple testing corrections: family-wise error rate, false discovery rate, etc.

• Parallel nature of the inference allows for compensating possibilities

• Borrowing information from the ensemble of genes to assist in inference from individual genes
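The two corrections named above can be sketched in a few lines. This is a minimal Python illustration, assuming the Bonferroni adjustment for family-wise error rate control and the Benjamini-Hochberg adjustment for false discovery rate control; the slides do not say which procedures are meant, and the p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Family-wise error rate control: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """False discovery rate control: BH adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
fwer = bonferroni(raw)
fdr = benjamini_hochberg(raw)

for p, a, b in zip(raw, fwer, fdr):
    print(f"raw={p:.3f}  bonferroni={a:.4f}  bh={b:.4f}")
```

The BH-adjusted value is never larger than the Bonferroni-adjusted one, which is why FDR control is usually preferred when tens of thousands of genes are tested.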

Empirical Bayes

• In frequentist methods, a hypothesis is typically rejected or not rejected without directly assigning it a probability.

• Bayesian methods specify a prior probability, which is then updated in the light of new data.

• In Bayesian techniques, the prior distribution is assigned independently of the data and fixed before any data are observed.

Empirical Bayes

• Superficially similar to Bayesian methods in that a prior distribution is assigned.

• However, the prior distribution is estimated from the data

• Therefore Empirical Bayes can be regarded as a frequentist technique

LIMMA

• Empirical Bayes techniques have previously been applied to microarray data

• Those analyses were specific to the experiment and very difficult to implement

• LIMMA - Simple model with simple expression of posterior odds

• Allows linear modelling to be applied to microarray data
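The slides do not write out the model. As a hedged sketch, the hierarchical formulation usually associated with limma's moderated statistics is given below. The symbols are chosen here and are not from the slides: for gene g, \hat\beta_g is the estimated contrast, s_g^2 the residual variance on d_g degrees of freedom, v_g the unscaled variance of the estimate, and d_0 and s_0^2 the prior degrees of freedom and prior variance estimated from all genes.

```latex
\begin{align*}
\hat\beta_g \mid \beta_g, \sigma_g^2 &\sim N\!\left(\beta_g,\; v_g \sigma_g^2\right), \\
s_g^2 \mid \sigma_g^2 &\sim \frac{\sigma_g^2}{d_g}\,\chi^2_{d_g}, \\
\frac{1}{\sigma_g^2} &\sim \frac{1}{d_0 s_0^2}\,\chi^2_{d_0}, \\
\tilde s_g^2 &= \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g}, \\
\tilde t_g &= \frac{\hat\beta_g}{\tilde s_g \sqrt{v_g}} .
\end{align*}
```

Under the null hypothesis \beta_g = 0, the moderated statistic \tilde t_g follows a t-distribution with d_0 + d_g degrees of freedom. The extra d_0 degrees of freedom are the information borrowed from the ensemble of genes.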

Estrogen Data

• 2x2 factorial experiment on MCF7 breast cancer cells using Affymetrix HGU95av2 arrays

• Factors: estrogen (present/absent) and length of exposure (10 hr/48 hr)

• The idea of the study is to identify genes that respond to estrogen treatment

Read in the Data

• Load in the estrogen data

• Normalise the data

• Define the targets (factors) for the linear model

Design Matrix

• Eight arrays

• Four pairs of replicates

• Four parameters in the linear model

array  file           estrogen  time (hr)
1      low10-1.cel    absent    10
3      high10-1.cel   present   10
Contrast Matrix

• Estrogen effect at 10 hours

• Estrogen effect at 48 hours

• Time effect without estrogen
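The four-parameter model and the three contrasts can be sketched as plain matrices. This is a minimal Python illustration, not limma's R interface. It assumes one parameter per estrogen/time group (a group-means parameterisation) and an array order of two replicates per group, neither of which is stated in the slides. The group and contrast names (absent10, E10 and so on) are hypothetical labels.

```python
# Group labels for the eight arrays: two replicates per estrogen/time group.
# ASSUMPTION: this array order is illustrative only.
levels = ["absent10", "present10", "absent48", "present48"]
groups = [level for level in levels for _ in range(2)]

# Design matrix: one row per array, one indicator column per group mean.
design = [[1 if group == level else 0 for level in levels] for group in groups]

# Contrasts as weights on the four group means (hypothetical names).
contrasts = {
    "E10": {"present10": 1, "absent10": -1},   # estrogen effect at 10 hours
    "E48": {"present48": 1, "absent48": -1},   # estrogen effect at 48 hours
    "Time": {"absent48": 1, "absent10": -1},   # time effect without estrogen
}

# Contrast matrix: one row per parameter, one column per contrast.
contrast_matrix = [
    [weights.get(level, 0) for weights in contrasts.values()]
    for level in levels
]

for level, row in zip(levels, contrast_matrix):
    print(f"{level:>10}", row)
```

Each contrast column sums to zero, so every contrast is a difference between two group means.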

Differential Expression

• Extract linear model fit for contrasts

• Obtain list of differentially expressed genes for contrasts

• Look for overlap among differentially expressed genes
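Looking for overlap among the gene lists amounts to set intersection. This is a minimal Python sketch, assuming each contrast has already produced a set of significant probes; the contrast names and probe IDs are hypothetical placeholders, not results from the estrogen data.

```python
from itertools import combinations

# Hypothetical probe IDs called differentially expressed for each contrast.
de_genes = {
    "E10": {"probe_a", "probe_b", "probe_c"},
    "E48": {"probe_b", "probe_c", "probe_d"},
    "Time": {"probe_c", "probe_e"},
}

# Probes shared by each pair of contrasts.
pairwise = {
    (first, second): de_genes[first] & de_genes[second]
    for first, second in combinations(de_genes, 2)
}

# Probes significant in every contrast.
shared_by_all = set.intersection(*de_genes.values())

for pair, shared in pairwise.items():
    print(pair, sorted(shared))
print("all contrasts:", sorted(shared_by_all))
```

These counts are the numbers that would fill the regions of a Venn diagram.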

Linear Model Fit

• logFC - Estimate of the log2-fold-change corresponding to the effect or contrast

• AveExpr - Average log2-expression for the probe over all arrays/channels

• t - Moderated t-statistic

• P.Value - Raw p-value

• adj.P.Val - Adjusted p-value

• B - Log odds that the gene is differentially expressed
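To make the first columns concrete, the sketch below computes logFC, AveExpr and both an ordinary and a moderated t-statistic for a single probe. It is a simplified Python illustration under stated assumptions: a two-group comparison rather than the full four-group fit, and prior values d0 and s0_sq supplied by hand (limma estimates them from all genes). The function name, the expression values and the prior values are hypothetical. The B statistic is not computed here.

```python
import math


def probe_summary(y_a, y_b, d0, s0_sq):
    """Summarise one probe for the contrast 'group b minus group a' (log2 scale)."""
    n_a, n_b = len(y_a), len(y_b)
    mean_a = sum(y_a) / n_a
    mean_b = sum(y_b) / n_b

    log_fc = mean_b - mean_a                         # logFC
    ave_expr = (sum(y_a) + sum(y_b)) / (n_a + n_b)   # AveExpr

    # Pooled residual variance and its degrees of freedom.
    d = n_a + n_b - 2
    rss = sum((y - mean_a) ** 2 for y in y_a) + sum((y - mean_b) ** 2 for y in y_b)
    s_sq = rss / d

    # Unscaled variance of the difference of two means.
    v = 1.0 / n_a + 1.0 / n_b

    # Shrink the gene-wise variance towards the prior variance.
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)

    return {
        "logFC": log_fc,
        "AveExpr": ave_expr,
        "t_ordinary": log_fc / math.sqrt(s_sq * v),
        "t_moderated": log_fc / math.sqrt(s_tilde_sq * v),
        "df_moderated": d0 + d,
    }


# Hypothetical log2 expression values and hypothetical prior values.
result = probe_summary(y_a=[7.0, 7.2], y_b=[9.0, 9.4], d0=4, s0_sq=0.1)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With only two replicates per group, the gene-wise variance is very unstable; shrinking it towards the prior is what keeps a probe with an accidentally tiny variance from dominating the ranking.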

Annotating Data

• Probe arrays can be annotated with external data

• Multiple sources of gene annotations

Gene Set Enrichment

• All biochemical pathways are determined by sets of genes

• Gene sets are determined by prior biological knowledge relating to co-expression, function, location or known biochemical pathways.

• If a pathway is in any way related to a biological trait then the co-functioning genes should display a higher degree of enrichment compared to the rest of the transcriptome.

• Gene Set Enrichment (GSE) is a computational technique which determines whether an a priori defined set of genes shows statistically significant overlap with the differentially expressed genes
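One common way to test for significant overlap is a hypergeometric (over-representation) test. The sketch below is a minimal Python illustration of that idea, offered as an assumption: the slides do not say which enrichment test was used. The function name and the counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_set, n_de, n_overlap):
    """Upper-tail probability of seeing at least n_overlap gene-set members
    among n_de differentially expressed genes drawn from n_universe genes,
    of which n_set belong to the gene set."""
    upper = min(n_set, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes measured, 5 in the gene set,
# 4 called differentially expressed, 3 of those in the gene set.
p_value = hypergeom_enrichment_p(n_universe=20, n_set=5, n_de=4, n_overlap=3)
print(f"enrichment p-value: {p_value:.4f}")
```

A small p-value indicates that the gene set contributes more differentially expressed genes than would be expected by chance.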

Estrogen receptor (ER) gene set

• If estrogen is present, ER genes will bind the estrogen and become activated

• Once activated, they gain the ability to regulate gene expression, resulting in differential expression between the cells with and without estrogen

• This should lead to up-regulation of ER genes
