LIMMA Linear Models for Microarray Data

Upload: mary-bishop

Post on 28-Mar-2015

Page 1: LIMMA Linear Models for Microarray Data. Difficulties with microarray data Variability of the expression values differs between genes Non-identical and

LIMMA

Linear Models for Microarray Data

Difficulties with microarray data

• Variability of the expression values differs between genes

• Non-identical and dependent distribution between genes

• Multiple testing of tens of thousands of genes

Correct for multiple comparisons

• Multiple testing: control the family-wise error rate, the false discovery rate, etc.

• Parallel nature of the inference allows for compensating possibilities

• Borrowing information from the ensemble of genes to assist in inference from individual genes
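As an illustration of false discovery rate control, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. The slides do not say which adjustment was used, so the choice of method, the function name and the p-values are assumptions made for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.60]  # toy p-values, not real data
adj = benjamini_hochberg(raw)
print(adj)
```

With tens of thousands of genes the same function applies unchanged; only the list of raw p-values grows.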

Empirical Bayes

• In frequentist methods, a hypothesis is typically rejected or not rejected without directly assigning it a probability.

• Bayesian methods specify a prior probability, which is then updated in the light of new data.

• In Bayesian techniques, the prior distribution is assigned independently of the data and fixed before any data are observed.

Empirical Bayes

• Superficially similar to Bayesian methods in that a prior distribution is assigned.

• However, the prior distribution is estimated from the data

• Therefore Empirical Bayes is often regarded as a frequentist technique

LIMMA

• Empirical Bayes techniques have previously been applied to microarray data

• Those analyses were specific to a single experiment and very difficult to implement

• LIMMA: a simple model with a simple expression for the posterior odds

• Allows linear modelling to be applied to microarray data
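The way information is borrowed across genes can be sketched with the moderated t-statistic: each gene's sample variance is shrunk toward a prior variance before the t-statistic is formed. In limma the prior variance and prior degrees of freedom are estimated from all genes; in this sketch they are simply passed in as made-up numbers, and the function and argument names are illustrative, not limma's API.

```python
import math


def moderated_t(beta_hat, s2, df, unscaled_var, s0_sq, d0):
    """Shrink one gene's variance toward a prior and form a moderated t.

    beta_hat     : estimated coefficient or contrast (log2 fold change)
    s2, df       : the gene's residual variance and residual degrees of freedom
    unscaled_var : variance multiplier of beta_hat from the design
    s0_sq, d0    : prior variance and prior degrees of freedom (assumed given)
    """
    # Posterior variance: weighted average of prior and sample variance.
    s2_post = (d0 * s0_sq + df * s2) / (d0 + df)
    t_mod = beta_hat / math.sqrt(s2_post * unscaled_var)
    # The moderated t has d0 + df degrees of freedom.
    return t_mod, s2_post, d0 + df


# Toy numbers: a gene with an unusually small sample variance.
t_mod, s2_post, df_total = moderated_t(
    beta_hat=1.0, s2=0.01, df=4, unscaled_var=1.0, s0_sq=0.05, d0=4
)
t_ordinary = 1.0 / math.sqrt(0.01 * 1.0)
print(t_ordinary, t_mod, df_total)
```

The ordinary t-statistic for this toy gene is inflated by its tiny variance estimate; the moderated version is smaller because the variance has been pulled toward the prior.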

Estrogen Data

• 2x2 factorial experiment on MCF7 breast cancer cells using Affymetrix HGU95av2 arrays

• Factors: estrogen (present/absent) and length of exposure (10 hr/48 hr)

• The idea of the study is to identify genes that respond to estrogen treatment

Read in the Data

• Load in the estrogen data

• Normalise the data

• Define the targets (factors) for the linear model

Design Matrix

• Eight arrays

• Four pairs of replicates

• Four parameters in the linear model

Array  File           Estrogen  Time (hr)
1      low10-1.cel    absent    10
2      low10-2.cel    absent    10
3      high10-1.cel   present   10
4      high10-2.cel   present   10
5      low48-1.cel    absent    48
6      low48-2.cel    absent    48
7      high48-1.cel   present   48
8      high48-2.cel   present   48
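One way to get four parameters from these eight arrays is a group-means design matrix with one column per estrogen/time combination. The slides do not show which parametrization was used, so the sketch below is an assumption, and the group labels are invented for the example.

```python
# Targets as listed on the slide: (file, estrogen, hours of exposure).
targets = [
    ("low10-1.cel", "absent", 10),
    ("low10-2.cel", "absent", 10),
    ("high10-1.cel", "present", 10),
    ("high10-2.cel", "present", 10),
    ("low48-1.cel", "absent", 48),
    ("low48-2.cel", "absent", 48),
    ("high48-1.cel", "present", 48),
    ("high48-2.cel", "present", 48),
]

# Hypothetical group labels, one per treatment combination.
groups = ["absent.10", "present.10", "absent.48", "present.48"]


def design_matrix(targets, groups):
    """Build a 0/1 indicator matrix with one row per array, one column per group."""
    rows = []
    for _file, estrogen, hours in targets:
        label = f"{estrogen}.{hours}"
        rows.append([1 if label == g else 0 for g in groups])
    return rows


design = design_matrix(targets, groups)
for (file_name, _, _), row in zip(targets, design):
    print(file_name, row)
```

Each column then estimates the mean log-expression of one replicate pair, and effects of interest are formed as differences between columns.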

Contrast Matrix

Array  File           Estrogen  Time (hr)
1      low10-1.cel    absent    10
2      low10-2.cel    absent    10
3      high10-1.cel   present   10
4      high10-2.cel   present   10
5      low48-1.cel    absent    48
6      low48-2.cel    absent    48
7      high48-1.cel   present   48
8      high48-2.cel   present   48

• Estrogen effect at 10 hours

• Estrogen effect at 48 hours

• Time effect without estrogen
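The three effects named here can be written as contrasts between the four group means. The sketch below assumes the groups are ordered absent/10 hr, present/10 hr, absent/48 hr, present/48 hr, and uses invented log2 group means for a single probe; neither the ordering nor the numbers come from the slides.

```python
# Contrast weights over the group means, in the assumed order
# [absent.10, present.10, absent.48, present.48].
contrasts = {
    "estrogen_10h": [-1, 1, 0, 0],       # present.10 - absent.10
    "estrogen_48h": [0, 0, -1, 1],       # present.48 - absent.48
    "time_without_estrogen": [-1, 0, 1, 0],  # absent.48 - absent.10
}


def apply_contrast(weights, group_means):
    """Return the weighted sum of group means defined by one contrast."""
    return sum(w * m for w, m in zip(weights, group_means))


# Toy log2 group means for one probe (not real data).
group_means = [7.0, 9.5, 7.25, 10.0]

effects = {name: apply_contrast(w, group_means) for name, w in contrasts.items()}
print(effects)
```

Because the values are on the log2 scale, each effect is a log2 fold change: 2.5 here corresponds to roughly a 5.7-fold increase.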

Differential Expression

• Extract linear model fit for contrasts

• Obtain list of differentially expressed genes for contrasts

• Look for overlap among differentially expressed genes
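Looking for overlap amounts to set operations on the per-contrast gene lists. A minimal sketch, with made-up probe identifiers standing in for the real lists:

```python
# Hypothetical differentially expressed probes per contrast (placeholders).
de_genes = {
    "estrogen_10h": {"probe_A", "probe_B", "probe_C"},
    "estrogen_48h": {"probe_B", "probe_C", "probe_D"},
    "time_without_estrogen": {"probe_C", "probe_E"},
}

# Probes called in both estrogen contrasts.
both_estrogen = de_genes["estrogen_10h"] & de_genes["estrogen_48h"]

# Probes called in every contrast.
shared_all = set.intersection(*de_genes.values())

# Probes called only for the 10-hour estrogen effect.
only_10h = (
    de_genes["estrogen_10h"]
    - de_genes["estrogen_48h"]
    - de_genes["time_without_estrogen"]
)

print(sorted(both_estrogen), sorted(shared_all), sorted(only_10h))
```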

Linear Model Fit

• logFC - Estimate of the log2-fold-change corresponding to the effect or contrast

• AveExpr - Average log2-expression for the probe over all arrays/channels

• t - moderated t-statistic

• P.Value - Raw p-value

• adj.P.Val - Adjusted p-value

• B - log odds that the gene is differentially expressed

Annotating Data

• Probe arrays can be annotated with external data

• Multiple sources of gene annotations

Gene Set Enrichment

• All biochemical pathways are determined by sets of genes

• Gene sets are determined by prior biological knowledge relating to co-expression, function, location or known biochemical pathways.

• If a pathway is in any way related to a biological trait then the co-functioning genes should display a higher degree of enrichment compared to the rest of the transcriptome.

• Gene Set Enrichment (GSE) is a computational technique that determines whether an a priori defined set of genes shows statistically significant overlap with the genes of interest (for example, the differentially expressed genes)
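One simple way to test whether an overlap is statistically significant is a one-sided hypergeometric (over-representation) test. The slides do not say which enrichment method was used, so this is a stand-in for illustration, not the ranked-list GSEA procedure; the function name and counts are invented.

```python
from math import comb


def enrichment_p_value(n_universe, n_set, n_de, n_overlap):
    """P(overlap >= n_overlap) when n_de genes are drawn at random.

    n_universe : number of genes measured
    n_set      : number of genes in the predefined gene set
    n_de       : number of differentially expressed genes
    n_overlap  : observed number of gene-set members among the DE genes
    """
    total = comb(n_universe, n_de)
    upper = min(n_set, n_de)
    # Sum hypergeometric probabilities over overlaps at least as large.
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts: 20 genes measured, 5 in the set, 4 DE, 3 of them in the set.
p = enrichment_p_value(n_universe=20, n_set=5, n_de=4, n_overlap=3)
print(p)
```

A small p-value says the gene set is over-represented among the differentially expressed genes relative to random draws from the measured genes.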

Estrogen receptor (ER) gene set

• If estrogen is present, ER genes will bind the estrogen and become activated

• Once activated, they gain the ability to regulate gene expression, resulting in differential expression between cells with and without estrogen

• This should lead to upregulation of ER genes