Differential Expressions Bayesian Techniques Lecture Topic 8


Page 1: Differential Expressions Bayesian Techniques Lecture Topic 8

Differential Expressions
Bayesian Techniques

Lecture Topic 8

Page 2: Differential Expressions Bayesian Techniques Lecture Topic 8

Why Bayes?

A friend of mine who is Bayesian said the following when asked this question:

• Some problems very hard to solve by classical techniques
  – e.g. the Behrens-Fisher problem
• Every new problem requires a new solution
• Bayes provides a coherent path

Page 3: Differential Expressions Bayesian Techniques Lecture Topic 8

The Frequentist Paradigm

• Probability refers to a limiting relative frequency. Probabilities are OBJECTIVE properties of the real world.

• Parameters are fixed, unknown constants; NO probability statement is possible about a parameter.

• Statistical procedures should be designed to have well-defined LONG-RUN frequency properties. For example, a 95% confidence interval should trap the true value of the parameter with a limiting frequency of 95%.

Page 4: Differential Expressions Bayesian Techniques Lecture Topic 8

Bayesian Philosophy

• Probability describes a DEGREE OF BELIEF, not a relative frequency. As such, you can make probability statements about anything, not just data.

• We CAN make probability statements about parameters, even if they are fixed constants.

• We make inferences about a parameter by producing its probability distribution. Inferences such as point or interval estimates may be extracted from the probability distribution of the parameter.

Page 5: Differential Expressions Bayesian Techniques Lecture Topic 8

The Contrasts

• According to Larry Wasserman: “Bayesian inference is a controversial approach as it embraces a subjective notion of probability”.

• In general, Bayesian methods have NO guarantees of long-run performance.

Page 6: Differential Expressions Bayesian Techniques Lecture Topic 8

Advantages of Bayesian Methods

• Provide the ability to formally incorporate prior information
• Inference is conditional on the actual data (not on what might have been)
• More easily interpretable by non-specialists (e.g. compared with confidence intervals)
• All analyses follow directly from the posterior distribution
• The stopping rule does not affect inference
• Any question can be directly answered, e.g. bioequivalence:

  – H0: θ0 ≠ θ1
  – H1: θ0 = θ1

  ■ Reverses the role of the null and the alternative
  ■ Hard to handle with traditional testing methods; easy in Bayes

Page 7: Differential Expressions Bayesian Techniques Lecture Topic 8

Disadvantages

• Initial Bayesians were subjectivist
• Results not “objective,” could be manipulated to yield any desired result
• How to set the prior in general?
• Computationally difficult
• Need to evaluate complex integrals even for simple problems
• Need inexpensive high-speed computing

Page 8: Differential Expressions Bayesian Techniques Lecture Topic 8

How Bayesian Method Works

• Choose a probability density f(θ) – called the PRIOR distribution – that expresses our beliefs about a parameter BEFORE we see any data.

• We choose a statistical model f(x | θ) that reflects our beliefs about x given θ. Here we write it as f(x | θ), NOT f(x; θ) as in the frequentist world.

• After OBSERVING the data X1, …, Xn, we update our beliefs about the parameter and calculate the posterior distribution f(θ | x).

• The update essentially uses Bayes’ theorem to calculate the posterior distribution.

Page 9: Differential Expressions Bayesian Techniques Lecture Topic 8

Bayes Theorem: Discrete Version
A Simple Probability Result

• Let B1, B2, …, Bn be disjoint sets with P(Bk) > 0 for all k,

• and P(B1 ∪ B2 ∪ … ∪ Bn) = 1 (mutually exclusive and exhaustive).

• For any event A:
  P(Bj | A) = P(Bj) P(A | Bj) / Σk P(Bk) P(A | Bk)

Page 10: Differential Expressions Bayesian Techniques Lecture Topic 8

EXAMPLE:
• Disease incidence in population: P(D) = 0.001

• Diagnostic test:
  – false positive rate 0.05, P(+ | not D) = 0.05
  – false negative rate 0.01, P(− | D) = 0.01

• If a person drawn at random tests positive, what is the probability that he has the disease D?

$$P(D \mid +) = \frac{P(D)\,P(+ \mid D)}{P(D)\,P(+ \mid D) + P(D^{C})\,P(+ \mid D^{C})} = \frac{(.001)(.99)}{(.001)(.99) + (.999)(.05)} \approx .0194$$
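A minimal sketch of this calculation in R (the rates are those given above):

p_D <- 0.001                    # P(D): disease incidence
p_pos_given_D    <- 0.99        # sensitivity = 1 - false negative rate
p_pos_given_notD <- 0.05        # false positive rate

# Bayes' theorem: P(D | +) = P(D)P(+|D) / [P(D)P(+|D) + P(not D)P(+|not D)]
p_D_given_pos <- (p_D * p_pos_given_D) /
  (p_D * p_pos_given_D + (1 - p_D) * p_pos_given_notD)
p_D_given_pos                   # approximately 0.0194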

Page 11: Differential Expressions Bayesian Techniques Lecture Topic 8

Comment

• Hence, the probability that you HAVE the disease given that you have TESTED positive is still pretty LOW, even with very small FALSE POSITIVE and FALSE NEGATIVE rates.

• This rule is very useful in numerous other situations.

Page 12: Differential Expressions Bayesian Techniques Lecture Topic 8

Bayes Theorem: The Continuous Version

• Let f(θ) be our prior distribution (density) for our parameter θ.

• Suppose we have the data X1, …, Xn, with density f(X1, …, Xn | θ), also written as Ln(X, θ).

$$f(\theta \mid x_1,\ldots,x_n) = \frac{f(x_1,\ldots,x_n \mid \theta)\, f(\theta)}{\int f(x_1,\ldots,x_n \mid \theta)\, f(\theta)\, d\theta} = \frac{L_n(x, \theta)\, f(\theta)}{\int L_n(x, \theta)\, f(\theta)\, d\theta}$$

Page 13: Differential Expressions Bayesian Techniques Lecture Topic 8

Some Simplifications

• The denominator is sometimes very hard to deal with, since the integration over the parameters is not trivial.

• We call it the normalizing constant, and in most cases we do not evaluate it explicitly. Instead we use the fact that:

$$f(\theta \mid x_1,\ldots,x_n) \propto L_n(x_1,\ldots,x_n \mid \theta)\, f(\theta)$$
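As a concrete illustration of working with the unnormalized posterior, here is a minimal R sketch that evaluates likelihood × prior on a grid and normalizes numerically; the normal data, known σ = 1, and Cauchy prior are illustrative assumptions, not part of the lecture:

set.seed(1)
x     <- rnorm(20, mean = 2, sd = 1)          # made-up data, sigma = 1 assumed known
theta <- seq(-5, 10, length.out = 2000)       # grid of parameter values

prior <- dcauchy(theta, location = 0, scale = 1)  # a non-conjugate prior, for illustration
lik   <- sapply(theta, function(t) prod(dnorm(x, mean = t, sd = 1)))

post_unnorm <- lik * prior                          # L_n(x, theta) * f(theta)
spacing     <- diff(theta)[1]
post <- post_unnorm / sum(post_unnorm * spacing)    # numerical normalizing constant

post_mean <- sum(theta * post * spacing)            # posterior summary from the grid
post_mean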

Page 14: Differential Expressions Bayesian Techniques Lecture Topic 8

Bayes’ Idea

• Think of a model for data y1, . . . , yn

f(y1, . . . , yn|θ) e.g. Normal, Binomial, etc.

• θ random with prior density g(.)

• Bayes Rule says that:

p(θ | y1, . . . , yn) ∝ g(θ) f(y1, . . . , yn | θ)

• Hence, the posterior is proportional to the prior multiplied by the probability of the data given the parameter.

Page 15: Differential Expressions Bayesian Techniques Lecture Topic 8

Hypothesis Testing: Classical vs. Bayesian

Classical: Set up null and alternative hypotheses, perform a test, calculate a p-value, reject or fail to reject the null.

Bayesian: Inference is based on the posterior distribution, p(θ | y1, . . . , yn).

• Consider evidence in favor of certain parameter values

• Data as well as prior beliefs influence inference

Page 16: Differential Expressions Bayesian Techniques Lecture Topic 8

Major Challenge 1: Setting Priors

Approaches
• Subjective - based on beliefs of an individual, expert, etc. Issues:
  – how to do in practice?
  – people are inconsistent
  – elicitation can help

• Non-informative - based on “prior ignorance” about the parameter. Issues:
  – often hard to define
  – may lead to improper posteriors
  – sensitive to parameterization

Page 17: Differential Expressions Bayesian Techniques Lecture Topic 8

Setting Priors: Conjugate Priors

• Conjugate priors are priors such that, combined with the model, the posterior will have a KNOWN distribution.
• Issues:
  – a choice of convenience
  – avoids computational problems
  – exists only for limited families
• Example: y ~ Bin(n, θ), θ ~ Beta(α, β); then p(θ | y) ~ Beta(α + y, β + n − y). A small sketch of this update follows below.
• The Normal conjugate is Normal for the location parameter.
• The Poisson conjugate is the Gamma.
• The Inverse Gamma is often used as a prior for the Normal σ².
• Generally, all members of the exponential families have conjugate priors.
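A minimal R sketch of the Beta–Binomial update above; the counts y = 7 out of n = 20 and the Beta(2, 2) prior are made-up numbers for illustration:

alpha <- 2; beta <- 2          # prior Beta(alpha, beta), an illustrative choice
n <- 20; y <- 7                # observed: y successes in n trials (made-up data)

# Posterior is Beta(alpha + y, beta + n - y) -- no integration needed
post_alpha <- alpha + y
post_beta  <- beta + n - y

post_mean <- post_alpha / (post_alpha + post_beta)
cred_int  <- qbeta(c(0.025, 0.975), post_alpha, post_beta)   # 95% credible interval
post_mean; cred_int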

Page 18: Differential Expressions Bayesian Techniques Lecture Topic 8

Setting Priors: Non-informative

• Assuming we have no REAL information about the parameter, we can model it with a “non-informative” prior.

• For example, if θ is discrete with n possible values, we can take
  – P(θi) = 1/n for i = 1, …, n

• If we know an interval (a, b) in which θ lies, we can define
  – the prior as P(θ) = 1/(b − a) for a < θ < b.

• We can also define
  – P(θ) = c, c > 0 (an improper prior, since it is not a pdf).

Page 19: Differential Expressions Bayesian Techniques Lecture Topic 8

Setting Priors: Jeffreys’ Prior

• Uniform non-informative priors are criticized because they are not invariant under transformation of the parameter.

• Jeffreys’ prior, which IS invariant under transformation, is often used:

• P(θ) ∝ [I(θ)]^{1/2}, where I is the Fisher information:

$$I(\theta) = -\,E_{X \mid \theta}\!\left[\frac{\partial^{2}}{\partial \theta^{2}} \log f(X \mid \theta)\right]$$
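As a standard worked example (not from the slides): for a single Bernoulli observation X with success probability θ,

$$I(\theta) = -\,E_{X \mid \theta}\!\left[\frac{\partial^{2}}{\partial \theta^{2}}\,\log\!\left(\theta^{X}(1-\theta)^{1-X}\right)\right] = \frac{1}{\theta(1-\theta)}, \qquad P(\theta) \propto [I(\theta)]^{1/2} = \theta^{-1/2}(1-\theta)^{-1/2},$$

which is the Beta(1/2, 1/2) density: a proper prior that is invariant under reparameterization.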

Page 20: Differential Expressions Bayesian Techniques Lecture Topic 8

Major Challenge II: Computation

• Need to evaluate complicated high-dimensional integrals
• Lots of technology developed in the last 20-25 years

Approaches
• Earliest solutions: approximations and numerical integration
• Noniterative Monte Carlo: direct sampling, indirect sampling (importance, rejection)
• Markov Chain Monte Carlo (MCMC): Gibbs sampling, Metropolis-Hastings algorithm, hybrid methods . . .

• MCMC is the most popular and can be implemented in high-dimensional situations. A minimal Metropolis sketch follows below.
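A minimal random-walk Metropolis sketch in R, targeting an unnormalized posterior; the normal likelihood, Cauchy prior, and tuning constants are illustrative assumptions, not part of the lecture:

set.seed(1)
x <- rnorm(30, mean = 1.5, sd = 1)                 # made-up data, sigma = 1 known

log_post <- function(theta)                        # log of likelihood * prior
  sum(dnorm(x, theta, 1, log = TRUE)) + dcauchy(theta, 0, 1, log = TRUE)

n_iter <- 5000
draws  <- numeric(n_iter)
theta  <- 0                                        # starting value
for (i in seq_len(n_iter)) {
  prop <- theta + rnorm(1, 0, 0.5)                 # symmetric random-walk proposal
  if (log(runif(1)) < log_post(prop) - log_post(theta))
    theta <- prop                                  # accept; otherwise keep current value
  draws[i] <- theta
}
keep <- draws[-(1:1000)]                           # discard burn-in
mean(keep); quantile(keep, c(0.025, 0.975))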

Page 21: Differential Expressions Bayesian Techniques Lecture Topic 8

Simple Example

Page 22: Differential Expressions Bayesian Techniques Lecture Topic 8

Simple Example contd…

• Posterior mean is a weighted average of the prior mean and the data mean
  ■ Sample average is shrunk toward the prior mean
  ■ Weight depends on relative variability of prior and data

• Posterior precision is the sum of the prior precision and the data precision

• Samples from the posterior are easy to get given data, σ², μ, τ² (a sketch follows below)
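The example here is the usual Normal–Normal model: data y_i ~ N(θ, σ²) with σ² known and prior θ ~ N(μ, τ²). A minimal R sketch of the precision-weighted posterior, with illustrative numbers:

set.seed(1)
sigma <- 2; mu <- 0; tau <- 1                  # assumed known quantities
y <- rnorm(10, mean = 3, sd = sigma)           # made-up data
n <- length(y)

prior_prec <- 1 / tau^2
data_prec  <- n / sigma^2
post_prec  <- prior_prec + data_prec           # posterior precision = sum of precisions
post_mean  <- (prior_prec * mu + data_prec * mean(y)) / post_prec   # weighted average
post_sd    <- sqrt(1 / post_prec)

post_mean; post_sd
theta_draws <- rnorm(1000, post_mean, post_sd) # samples from the posterior are easy to get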

Page 23: Differential Expressions Bayesian Techniques Lecture Topic 8

Lessons from Example

General principle: posterior is compromise between prior and data

• When μ and τ² are not known:

■ Empirical Bayes: estimate μ and τ²

■ Hierarchical Bayes: put prior on μ and τ² as well

Page 24: Differential Expressions Bayesian Techniques Lecture Topic 8

Bayesian Hypothesis Testing

• The idea is due to Jeffreys (1961).

• Idea: Based on the data that each hypothesis is supposed to predict, one applies Bayes’ Theorem and computes the posterior probability that the first hypothesis is correct.

• UNLIKE classical methods, the hypotheses DO NOT have to be nested within each other.

Page 25: Differential Expressions Bayesian Techniques Lecture Topic 8

Mechanics of Bayesian Hypothesis Testing

• Let us consider two hypotheses H0 and H1 (Bayesians prefer the word “models” to “hypotheses”, but we will keep “hypotheses” to be consistent with the classical ideas).

• Let H0 and H1 be two hypotheses concerning the data Y, and let θ0 and θ1 be the associated parameters.

• We define πi(θi) as the corresponding priors.
• Let fi(y | θi) be the corresponding marginal distributions.
• We can use Bayes’ Theorem to calculate the posteriors P(θi | y).
• Bayes’ hypothesis testing consists of computing the following and using pre-specified cut-offs for decisions (a small numerical sketch follows below):
  – B = [P(θ0 | y)/P(θ1 | y)] / [P(θ0)/P(θ1)] (Bayes’ factor)
  – P(θ0 | Y = y), P(θ0 | Y ≥ y) (Bayesian p-values)
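A minimal numerical sketch of a Bayes factor in R for a simple pair of hypotheses, H0: θ = 0 versus H1: θ ~ N(0, 1), with normal data and known σ = 1; the data and the prior on θ under H1 are illustrative assumptions:

set.seed(1)
y <- rnorm(15, mean = 0.4, sd = 1)               # made-up data, sigma = 1 known

# Marginal likelihood under H0 (theta fixed at 0)
m0 <- prod(dnorm(y, 0, 1))

# Marginal likelihood under H1: integrate the likelihood against the prior N(0, 1)
m1 <- integrate(function(theta)
        sapply(theta, function(t) prod(dnorm(y, t, 1))) * dnorm(theta, 0, 1),
        lower = -Inf, upper = Inf)$value

B01 <- m0 / m1                                   # Bayes factor in favour of H0
B01
m0 / (m0 + m1)                                   # P(H0 | y) with equal prior weights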

Page 26: Differential Expressions Bayesian Techniques Lecture Topic 8

Bayesian Hypothesis Tests in Microarrays

• Let

Hg1: gene is differentially expressed

Hg0: gene is not differentially expressed

• Traditional Bayesians would write this as

$$v_g = \begin{cases} 1 & \text{if the gene is differentially expressed} \\ 0 & \text{otherwise} \end{cases}$$

Page 27: Differential Expressions Bayesian Techniques Lecture Topic 8

Method 1

• Differential Expression Score: use a t-statistic or Wilcoxon rank-sum statistic, z_g

• Then calculate P(H0 | z_g = z) or P(H0 | z_g ≥ z), i.e.

• P(v_g = 0 | z_g = z) or P(v_g = 0 | z_g ≥ z)

• McClure and Wit (2004) show that the second term is identical to using the FDR method for controlling error.

Page 28: Differential Expressions Bayesian Techniques Lecture Topic 8

Fully Bayesian Analysis

• In general we are interested in the term given below, where p0 is the fraction of inactive genes on the array, F0 is the distribution of the test statistic under the null hypothesis (v = 0), and F is the marginal distribution of the test statistic:

$$\hat{P}(v_g = 0 \mid z_g \geq z) = \frac{\hat{p}_0\left(1 - F_0(z)\right)}{1 - \hat{F}(z)}$$
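A minimal R sketch of this estimate; it assumes a N(0, 1) null distribution for F0, plugs in the empirical CDF for F̂, and uses a made-up value p̂0 = 0.9 and simulated statistics purely for illustration:

set.seed(1)
# Made-up test statistics: 90% null N(0,1) genes, 10% differentially expressed
z <- c(rnorm(900, 0, 1), rnorm(100, 0, 3))

p0_hat <- 0.9                      # assumed/estimated fraction of inactive genes
F_hat  <- ecdf(z)                  # empirical distribution of all statistics
F0     <- function(z0) pnorm(z0)   # null distribution of the statistic (assumed N(0,1))

# Estimated posterior probability that a gene with z_g >= z0 is NOT differentially expressed
post_null <- function(z0) p0_hat * (1 - F0(z0)) / (1 - F_hat(z0))

post_null(2); post_null(3)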

Page 29: Differential Expressions Bayesian Techniques Lecture Topic 8

Bayesian t test

• The t statistic is given by

$$t_g = \frac{\bar{x}_{g1} - \bar{x}_{g2}}{se}$$

• Assume: z_g | {v_g = 0} ~ N(0, σ0²)

• z_g | {v_g = 1} ~ N(0, σ1²)

• Hence, z_g ~ (1 − p1) N(0, σ0²) + p1 N(0, σ1²)

Page 30: Differential Expressions Bayesian Techniques Lecture Topic 8

Bayesian t test: Priors

• p1 ~ Uniform(0, 1)
• v_g ~ Bernoulli(p1)
• 1/σ0² ~ Gamma(a, b), 1/σ1² ~ Gamma(γ, δ)
• b ~ Gamma(λ1, τ1), δ ~ Gamma(λ2, τ2)
• θ = (v, p1, σ0², a, b, σ1², γ, δ, λ1, τ1, λ2, τ2)

These are all conjugate priors to make the calculations easier.

One uses the Gibbs sampler to simulate from P(θ | z) to estimate p1, σ0², σ1² and to calculate the required probability.

Page 31: Differential Expressions Bayesian Techniques Lecture Topic 8

Gibbs Sampler

• It is used to calculate the posterior mean.
• It does not calculate P(θ | y) explicitly; it simulates draws from this distribution. Using sample summaries we get a good idea of the joint posterior as well as the marginal distribution of interest, P(v | y).

• It repeatedly samples from the full conditional distributions P(θi | θ−i, y) until the chain converges to its stationary distribution; the initial iterations before convergence are called the “burn-in” and are discarded.

• After burn-in, each draw of θ is a draw from the posterior distribution.
• Bayes’ Theorem states that the conditional distribution P(θi | θ−i, y) is proportional to the likelihood times the prior, P(y | θ)P(θ), viewed as a function of θi.

• If the full conditional distributions are of known form (generally achieved using conjugate priors), this procedure can be applied easily. A minimal sketch follows below.
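A minimal Gibbs sampler sketch in R for a toy Normal model with unknown mean and precision, using conjugate full conditionals; the model, priors, and data are illustrative assumptions rather than the microarray model of the previous slides:

set.seed(1)
y <- rnorm(50, mean = 2, sd = 1.5)               # made-up data
n <- length(y)

# Priors: mu ~ N(mu0, tau0^2), precision lambda = 1/sigma^2 ~ Gamma(a, b)
mu0 <- 0; tau0 <- 10; a <- 0.1; b <- 0.1

n_iter <- 5000
mu <- 0; lambda <- 1
out <- matrix(NA, n_iter, 2, dimnames = list(NULL, c("mu", "lambda")))
for (i in seq_len(n_iter)) {
  # Full conditional of mu given lambda and data: Normal
  prec <- 1 / tau0^2 + n * lambda
  m    <- (mu0 / tau0^2 + lambda * sum(y)) / prec
  mu   <- rnorm(1, m, sqrt(1 / prec))
  # Full conditional of lambda given mu and data: Gamma
  lambda <- rgamma(1, shape = a + n / 2, rate = b + sum((y - mu)^2) / 2)
  out[i, ] <- c(mu, lambda)
}
post <- out[-(1:1000), ]                         # discard burn-in
colMeans(post)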

Page 32: Differential Expressions Bayesian Techniques Lecture Topic 8

Empirical Bayes Idea

• The prior distributions depend upon unknown parameters, which in turn may need a second or higher stage prior in some hierarchical setting.
• But at some point we HAVE to specify all remaining parameters of the hyper-prior.
• In other words, we HAVE to use our knowledge to specify our prior.
• The Empirical Bayes method instead uses the sample data to estimate the parameters of the final-stage prior.
• The idea: if we are interested in θ | y, let θ ~ P(η1), η1 ~ P(η2), …, η_{L−1} ~ P(η_L).
• In the Empirical Bayes approach we use the data to estimate the parameter η_L as the value that maximizes the marginal likelihood P(Y | η_L).
• We plug the estimate of η_L into the priors, and the posterior distribution is now P(θ | y, η̂_L). A minimal sketch follows below.
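A minimal empirical Bayes sketch in R: gene-level means with a Normal prior whose hyper-parameters are estimated from the marginal distribution of the data (simple moment estimates standing in for the marginal maximum likelihood); all numbers are illustrative assumptions:

set.seed(1)
G <- 1000; sigma <- 1                       # G genes, known sampling sd of each gene mean
mu_true <- rnorm(G, mean = 0, sd = 2)       # true gene effects, prior sd tau = 2
ybar    <- rnorm(G, mean = mu_true, sd = sigma)   # observed gene-level means

# Marginally ybar_g ~ N(mu, tau^2 + sigma^2): estimate hyper-parameters from the data
mu_hat   <- mean(ybar)
tau2_hat <- max(var(ybar) - sigma^2, 0)

# Plug the estimates back in: the EB posterior mean shrinks each ybar_g toward mu_hat
shrink  <- tau2_hat / (tau2_hat + sigma^2)
eb_mean <- mu_hat + shrink * (ybar - mu_hat)
head(eb_mean)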

Page 33: Differential Expressions Bayesian Techniques Lecture Topic 8

Empirical Bayes’ Idea in Differential Expression

• Average log fold change.
  – Problem: non-DE genes with large variances have too much chance of being selected.
• t-statistics.
  – Problem: apparently DE genes with very small sample variances are suspect.
• Moderated t-statistics: a happy compromise between the two above, an empirical Bayes estimate, using the data to estimate a new standard error s̃_g.

Generally, s̃_g = s_g + c for some constant c (the exact form used here is given on the next slide).

Page 34: Differential Expressions Bayesian Techniques Lecture Topic 8

The moderated t statistic

• Smoothed standard deviations: shrink the gene-wise variances towards a common prior value s0²:

• Eliminates large t-statistics due merely to very small s values, and reduces the impact of very large s values.

$$\tilde{s}_g^{2} = \frac{d_0 s_0^{2} + d_g s_g^{2}}{d_0 + d_g}$$
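A minimal R sketch of the shrinkage formula above applied to made-up gene-wise variances; the prior values d0 and s0² would in practice be the estimated hyper-parameters, here they are simply assumed:

set.seed(1)
G  <- 1000
dg <- 4                                  # residual df per gene (e.g. two groups of 3)
s2 <- rchisq(G, df = dg) / dg            # made-up gene-wise sample variances (true variance 1)

d0 <- 4; s0_sq <- 1                      # assumed prior df and prior variance
s2_tilde <- (d0 * s0_sq + dg * s2) / (d0 + dg)   # shrink each variance toward s0^2

# Moderated t: same numerator as the ordinary t, but with the shrunken standard error
diff_means  <- rnorm(G, 0, sqrt(2 / 3))  # made-up group-mean differences, n = 3 per group
t_ordinary  <- diff_means / sqrt(2 * s2 / 3)
t_moderated <- diff_means / sqrt(2 * s2_tilde / 3)

# The very large |t| values caused by tiny variances are damped
max(abs(t_ordinary)); max(abs(t_moderated))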

Page 35: Differential Expressions Bayesian Techniques Lecture Topic 8

EB Idea

• Posterior odds (for DE).

• The posterior probability of differential expression for any gene is a monotonic function of t̃_g² for constant d.

Page 36: Differential Expressions Bayesian Techniques Lecture Topic 8

Estimating hyper-parameters

Closed-form estimators with good properties are available:

for s0 and d0, in terms of the first two moments of log s_g²;

for c0, in terms of quantiles of the |t̃_g|.

Nowadays the EB estimate is used most often for differential expression, and the genes are ranked by the EB estimates.

Instead of doing strict error control, the top g genes are examined using the EB estimates for ranking purposes. Sometimes |t̃_g| > 4 is used as an empirical cut-off.

Limma in R uses empirical Bayes estimates for deciding which genes are differentially expressed; a minimal usage sketch is given below.
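A minimal limma sketch (R/Bioconductor); the expression matrix and the two-group design are made-up placeholders, not data from the lecture:

library(limma)

# Made-up log-expression matrix: 600 genes x 6 samples (placeholder for real data)
exprs_mat <- matrix(rnorm(600 * 6), nrow = 600)
group  <- factor(c("control", "control", "control", "treated", "treated", "treated"))
design <- model.matrix(~ group)

fit <- lmFit(exprs_mat, design)        # gene-wise linear models
fit <- eBayes(fit)                     # empirical Bayes moderated t-statistics
topTable(fit, coef = 2, number = 10)   # top genes ranked by evidence of differential expression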