Lecture 1: Basic Statistical Tools

Bruce Walsh lecture notes, Uppsala EQG 2012 course
version 28 Jan 2012
Discrete Random Variables

A random variable (RV) is an outcome (realization) that is not a set value, but rather is drawn from some probability distribution.

A discrete RV x takes on the values X1, X2, …, Xk

Probability distribution: Pi = Pr(x = Xi), where the probabilities are non-negative and sum to one:

$$P_i > 0, \qquad \sum_i P_i = 1$$

Example: Binomial random variable. Let p = the probability of a success. Then

$$\Pr(k \text{ successes in } n \text{ trials}) = \frac{n!}{(n-k)!\,k!}\; p^k (1-p)^{n-k}$$

Example: Poisson random variable. Let λ = the expected number of successes. Then

$$\Pr(k \text{ successes}) = \frac{\lambda^k e^{-\lambda}}{k!}$$
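As a quick numerical check (a sketch, not from the notes; the values of n, p, λ, and k are arbitrary illustration choices), both pmfs can be evaluated directly from the formulas and via scipy.stats:

```python
# Sketch: evaluate the binomial and Poisson pmfs from the formulas above.
from math import comb, exp, factorial

from scipy.stats import binom, poisson

n, p, lam, k = 10, 0.3, 2.5, 4   # arbitrary illustrative values

# Binomial: n!/[(n-k)! k!] * p^k * (1-p)^(n-k)
pr_binom = comb(n, k) * p**k * (1 - p) ** (n - k)
# Poisson: lam^k * exp(-lam) / k!
pr_pois = lam**k * exp(-lam) / factorial(k)

print(pr_binom, binom.pmf(k, n, p))   # the two binomial values agree
print(pr_pois, poisson.pmf(k, lam))   # the two Poisson values agree
```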
Continuous Random Variables

A continuous RV x can take on any possible value in some interval (or set of intervals). The probability distribution is defined by the probability density function, p(x), where

$$p(x) \geq 0 \quad \text{and} \quad \int_{-\infty}^{\infty} p(x)\,dx = 1$$

Probability is the area under the curve:

$$\Pr(x_1 \leq x \leq x_2) = \int_{x_1}^{x_2} p(x)\,dx$$

Finally, the cdf, or cumulative probability function, is defined as cdf(z) = Pr(x < z):

$$\mathrm{cdf}(z) = \int_{-\infty}^{z} p(x)\,dx$$
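A minimal numerical illustration, assuming a standard normal for p(x) (the choice of density is ours, not the slides'): the interval probability is the area under the density, which equals a difference of cdf values:

```python
# Sketch: Pr(x1 <= x <= x2) as area under p(x), checked two ways.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

x1, x2 = -1.0, 2.0                       # arbitrary interval

# The total probability integrates to one
total, _ = quad(norm.pdf, -np.inf, np.inf)

# Interval probability: area under the curve = cdf(x2) - cdf(x1)
area, _ = quad(norm.pdf, x1, x2)
via_cdf = norm.cdf(x2) - norm.cdf(x1)

print(total)           # ≈ 1.0
print(area, via_cdf)   # both ≈ 0.8186
```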
Example: The normal (or Gaussian) distribution

Mean µ, variance σ²

Unit normal (mean 0, variance 1)
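The slide's plot is not reproduced here; for reference, the standard form of the density being described (a textbook formula, not taken from the slide) is

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right],$$

with the unit normal given by µ = 0, σ² = 1.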
Mean (µ) = peak of the distribution

The variance is a measure of spread about the mean. The smaller σ², the narrower the distribution about the mean.
Joint and Conditional Probabilities

The probability for a pair (x, y) of random variables is specified by the joint probability density function, p(x, y):

$$\Pr(y_1 \leq y \leq y_2,\; x_1 \leq x \leq x_2) = \int_{y_1}^{y_2}\!\int_{x_1}^{x_2} p(x, y)\,dx\,dy$$

The marginal density of x, p(x):

$$p(x) = \int_{-\infty}^{\infty} p(x, y)\,dy$$
Joint and Conditional Probabilities

p(y|x), the conditional density of y given x:

$$\Pr(y_1 \leq y \leq y_2 \mid x) = \int_{y_1}^{y_2} p(y \mid x)\,dy$$

Relationships among p(x), p(x,y), and p(y|x):

$$p(x, y) = p(y \mid x)\,p(x), \quad \text{hence} \quad p(y \mid x) = \frac{p(x, y)}{p(x)}$$

x and y are said to be independent if p(x,y) = p(x)p(y)

Note that p(y|x) = p(y) if x and y are independent
Bayes’ Theorem

Suppose an unobservable RV takes on values b1, …, bn.

Suppose that we observe the outcome A of an RV correlated with b. What can we say about b given A?

Bayes’ theorem:

$$\Pr(b_j \mid A) = \frac{\Pr(b_j)\,\Pr(A \mid b_j)}{\Pr(A)} = \frac{\Pr(b_j)\,\Pr(A \mid b_j)}{\sum_{i=1}^{n} \Pr(b_i)\,\Pr(A \mid b_i)}$$

A typical application in genetics is that A is some phenotype and b indexes some underlying (but unknown) genotype.
Example: BRCA1/2 & Breast cancer

• NCI statistics:
  – 12% is the lifetime risk of breast cancer in females
  – 60% is the lifetime risk if carrying a BRCA1 or BRCA2 mutation
  – One estimate of the BRCA1 or BRCA2 allele frequency is around 2.3%
  – Question: Given a patient has breast cancer, what is the chance that she has a BRCA1 or BRCA2 mutation?
• Here
  – Event B = has a BRCA mutation
  – Event A = has breast cancer
• Bayes: Pr(B|A) = Pr(A|B) * Pr(B) / Pr(A)
  – Pr(A) = 0.12
  – Pr(B) = 0.023
  – Pr(A|B) = 0.60
  – Hence, Pr(BRCA | breast cancer) = (0.60 * 0.023)/0.12 = 0.115
• Hence, for the assumed BRCA frequency (2.3%), 11.5% of all patients with breast cancer have a BRCA mutation
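A one-line check of this arithmetic (a sketch using the slide's numbers):

```python
# Bayes' theorem with the slide's NCI figures.
pr_A = 0.12            # Pr(breast cancer)
pr_B = 0.023           # Pr(BRCA mutation)
pr_A_given_B = 0.60    # Pr(breast cancer | BRCA mutation)

pr_B_given_A = pr_A_given_B * pr_B / pr_A
print(pr_B_given_A)    # 0.115
```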
Second example: Suppose height > 70. What is the probability the individual is QQ, Qq, or qq?

| Genotype | QQ | Qq | qq |
| --- | --- | --- | --- |
| Freq(genotype) | 0.5 | 0.3 | 0.2 |
| Pr(height > 70 \| genotype) | 0.3 | 0.6 | 0.9 |

Pr(height > 70) = 0.3*0.5 + 0.6*0.3 + 0.9*0.2 = 0.51

Pr(QQ | height > 70) = Pr(QQ) * Pr(height > 70 | QQ) / Pr(height > 70)
= 0.5*0.3 / 0.51 = 0.294
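The same calculation extends to all three genotypes; a small sketch (not in the original notes):

```python
# Posterior Pr(genotype | height > 70) via Bayes' theorem.
freq = {"QQ": 0.5, "Qq": 0.3, "qq": 0.2}        # prior Pr(genotype)
pr_tall = {"QQ": 0.3, "Qq": 0.6, "qq": 0.9}     # Pr(height > 70 | genotype)

# Marginal: Pr(height > 70), summing over genotypes
pr_h = sum(freq[g] * pr_tall[g] for g in freq)   # 0.51

for g in freq:
    post = freq[g] * pr_tall[g] / pr_h
    print(g, round(post, 3))   # QQ 0.294, Qq 0.353, qq 0.353
```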
Expectations of Random Variables

The expected value, E[f(x)], of some function f(x) of the random variable x is just the average value of that function:

$$E[f(x)] = \sum_i \Pr(x = X_i)\,f(X_i) \qquad x \text{ discrete}$$

$$E[f(x)] = \int_{-\infty}^{+\infty} f(x)\,p(x)\,dx \qquad x \text{ continuous}$$

E[x] = the (arithmetic) mean, µ, of a random variable x:

$$E(x) = \mu = \int_{-\infty}^{+\infty} x\,p(x)\,dx$$
Expectations of Random Variables

E[(x − µ)²] = σ², the variance of x:

$$E\left[(x - \mu)^2\right] = \sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2\,p(x)\,dx$$

More generally, the rth moment about the mean is given by E[(x − µ)^r]:
r = 2: variance; r = 3: skew; r = 4: (scaled) kurtosis

Useful properties of expectations:

E[g(x) + f(y)] = E[g(x)] + E[f(y)]
E(cx) = cE(x)
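These definitions are easy to check numerically; a sketch assuming a normal with µ = 2 and σ² = 9 (arbitrary choices, not from the slides):

```python
# Verify E[x] = mu and E[(x - mu)^2] = sigma^2 by direct integration.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 2.0, 3.0
p = lambda x: norm.pdf(x, loc=mu, scale=sigma)   # density p(x)

mean, _ = quad(lambda x: x * p(x), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mu) ** 2 * p(x), -np.inf, np.inf)

print(mean, var)   # ≈ 2.0 and 9.0
```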
The truncated normal

Only consider values of T or above in a normal. Let π_T = Pr(z > T).

Density function:

$$p(z \mid z > T) = \frac{p(z)}{\Pr(z > T)} = \frac{p(z)}{\int_T^{\infty} p(z)\,dz}$$

Mean of the truncated distribution:

$$E[\,z \mid z > T\,] = \int_T^{\infty} z\,\frac{p(z)}{\pi_T}\,dz = \mu + \frac{\sigma \cdot p_T}{\pi_T}$$

Here p_T is the height of the normal at the truncation point,

$$p_T = (2\pi)^{-1/2} \exp\!\left[-\frac{(T - \mu)^2}{2\sigma^2}\right]$$
The truncated normal

Variance:

$$\sigma^2\left[\,1 + \frac{p_T \cdot (T - \mu)/\sigma}{\pi_T} - \left(\frac{p_T}{\pi_T}\right)^{\!2}\,\right]$$

p_T = height of the normal at T
π_T = area of the curve to the right of T
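Both the mean and variance formulas can be checked against scipy's truncated normal (a sketch; the values of µ, σ, and T are arbitrary):

```python
# Check the truncated-normal mean and variance formulas above.
import numpy as np
from scipy.stats import norm, truncnorm

mu, sigma, T = 10.0, 2.0, 11.0      # truncate a N(10, 4) at T = 11
alpha = (T - mu) / sigma            # standardized truncation point

p_T = norm.pdf(alpha)               # height of the unit normal at alpha
pi_T = 1 - norm.cdf(alpha)          # Pr(z > T)

mean_formula = mu + sigma * p_T / pi_T
var_formula = sigma**2 * (1 + p_T * alpha / pi_T - (p_T / pi_T) ** 2)

# scipy parameterizes truncnorm by the standardized bounds (a, b)
tn = truncnorm(a=alpha, b=np.inf, loc=mu, scale=sigma)
print(mean_formula, tn.mean())      # agree
print(var_formula, tn.var())        # agree
```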
Covariances

• Cov(x,y) = E[(x − µx)(y − µy)] = E[x*y] − E[x]*E[y]

[Figure: scatter plot of Y against X showing cov(X,Y) > 0]

Cov(x,y) > 0: positive (linear) association between x & y
Cov(x,y) < 0: negative (linear) association between x & y

[Figure: scatter plot of Y against X showing cov(X,Y) < 0]

Cov(x,y) = 0: no linear association between x & y

[Figure: scatter plot of Y against X showing cov(X,Y) = 0]
Cov(x,y) = 0 DOES NOT imply no association

[Figure: scatter plot of Y against X showing a clear (nonlinear) association, yet cov(X,Y) = 0]
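A classic illustration of this point (our example, not the slide's figure): with x symmetric about zero and y = x², y is completely determined by x, yet the covariance is essentially zero:

```python
# Cov(x, y) ~ 0 even though y is a deterministic function of x.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)   # symmetric about 0
y = x**2                       # perfect, but nonlinear, association

print(np.cov(x, y)[0, 1])      # ≈ 0: no *linear* association
```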
Correlation

Cov(x,y) = 10 tells us nothing about the strength of an association, since the covariance depends on the scales of x and y.

What is needed is an absolute (scale-free) measure of association.

This is provided by the correlation, r(x,y):

$$r(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}}$$

r = 1 implies a perfect (positive) linear association
r = −1 implies a perfect (negative) linear association
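A quick sketch contrasting the two measures (illustrative simulated data): rescaling x changes the covariance but leaves the correlation unchanged:

```python
# Rescaling x changes Cov(x, y) but not r(x, y).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10_000)
y = x + rng.normal(size=10_000)     # positively associated with x

for scale in (1, 10, 100):
    xs = scale * x
    cov = np.cov(xs, y)[0, 1]
    r = np.corrcoef(xs, y)[0, 1]
    print(scale, round(cov, 2), round(r, 3))   # cov grows; r stays ≈ 0.71
```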
Useful Properties of Variances and Covariances

• Symmetry: Cov(x,y) = Cov(y,x)
• The covariance of a variable with itself is the variance: Cov(x,x) = Var(x)
• If a is a constant, then Cov(ax,y) = a Cov(x,y)
• Var(ax) = a² Var(x), since Var(ax) = Cov(ax,ax) = a² Cov(x,x) = a² Var(x)
• Cov(x+y,z) = Cov(x,z) + Cov(y,z)
More generally,

$$\mathrm{Cov}\!\left(\sum_{i=1}^{n} x_i,\; \sum_{j=1}^{m} y_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{m} \mathrm{Cov}(x_i, y_j)$$

$$\mathrm{Var}(x + y) = \mathrm{Var}(x) + \mathrm{Var}(y) + 2\,\mathrm{Cov}(x, y)$$

Hence, the variance of a sum equals the sum of the variances ONLY when the elements are uncorrelated.

Question: What is Var(x − y)?
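One way to answer the slide's question, using the symmetry and bilinearity properties listed above:

$$\mathrm{Var}(x - y) = \mathrm{Cov}(x - y,\, x - y) = \mathrm{Var}(x) + \mathrm{Var}(y) - 2\,\mathrm{Cov}(x, y)$$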
Regressions

Consider the best (linear) predictor of y given we know x:

$$\hat{y} = \bar{y} + b_{y|x}\,(x - \bar{x})$$

The slope of this linear regression is a function of Cov:

$$b_{y|x} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$$

The fraction of the variation in y accounted for by knowing x, i.e., Var(ŷ)/Var(y), is r².
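A small simulated-data sketch (the true slope of 0.8 and the noise level are arbitrary choices) recovering the slope from the covariance formula:

```python
# Best linear predictor: yhat = ybar + b_{y|x} (x - xbar), b = Cov(x,y)/Var(x).
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=5_000)
y = 2.0 + 0.8 * x + rng.normal(scale=0.5, size=5_000)   # true slope 0.8

b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # Cov(x,y)/Var(x)
yhat = y.mean() + b * (x - x.mean())

print(b)                        # ≈ 0.8
print(np.polyfit(x, y, 1)[0])   # least-squares slope agrees
```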
Relationship between the correlation and the regression slope:

$$r(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} = b_{y|x}\sqrt{\frac{\mathrm{Var}(x)}{\mathrm{Var}(y)}}$$

If Var(x) = Var(y), then b_{y|x} = b_{x|y} = r(x,y).

In this case, the fraction of variation accounted for by the regression is b².
[Figure: two regression scatter plots, one with r² = 0.6 and one with r² = 0.9]
Properties of Least-squares Regressions

The slope and intercept obtained by least-squares minimize the sum of squared residuals:

$$\sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - b x_i)^2$$

• The average value of the residual is zero
• The LS solution maximizes the amount of variation in y that can be explained by a linear regression on x
• The residual errors around the least-squares regression are uncorrelated with the predictor variable x
• The fraction of variance in y accounted for by the regression is r²
• Residual variances can be homoscedastic (constant over x) or heteroscedastic (changing with x)
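The first, third, and fourth properties are easy to verify numerically (a sketch in the same simulated-data style as above):

```python
# Check: residuals average to zero and are uncorrelated with x.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=5_000)
y = 1.0 + 0.5 * x + rng.normal(size=5_000)

b, a = np.polyfit(x, y, 1)          # least-squares slope and intercept
e = y - (a + b * x)                 # residuals

print(e.mean())                     # ≈ 0
print(np.corrcoef(x, e)[0, 1])      # ≈ 0
print(np.corrcoef(x, y)[0, 1]**2)   # r^2 = fraction of Var(y) explained
```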
Different methods of analysis

• Parameters of these various models can be estimated in a number of frameworks
• Method of moments
  – Very few assumptions about the underlying distribution. Typically, some statistic has an expected value equal to the parameter
  – Example: the estimate of the mean µ is given by the sample mean, x̄, as E(x̄) = µ
  – While estimation does not require distributional assumptions, confidence intervals and hypothesis testing do
• Distribution-based estimation
  – The explicit form of the distribution is used
Distribution-based estimation

• Maximum likelihood estimation
  – MLE
  – REML
  – More in Lynch & Walsh (book) Appendix 3
• Bayesian
  – Marginal posteriors
  – Conjugating priors
  – MCMC/Gibbs sampling
  – More in Walsh & Lynch (online chapters = Vol 2) Appendices 2, 3
Maximum Likelihood

p(x1, …, xn | θ) = density of the observed data (x1, …, xn) given the (unknown) distribution parameter(s) θ

Fisher suggested the method of maximum likelihood: given the data (x1, …, xn), find the value(s) of θ that maximize p(x1, …, xn | θ)

We usually express p(x1, …, xn | θ) as a likelihood function ℓ(θ | x1, …, xn) to remind us that it is dependent on the observed data

The Maximum Likelihood Estimator (MLE) of θ is the value(s) that maximize the likelihood function ℓ given the observed data x1, …, xn.
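A minimal sketch of the idea, assuming normal data with unknown mean and standard deviation (this example is ours, not the notes'): maximizing ℓ is equivalent to minimizing the negative log-likelihood:

```python
# MLE for (mu, sigma) of normal data by minimizing -log likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.normal(loc=3.0, scale=2.0, size=1_000)   # data: true mu=3, sd=2

def neg_log_lik(theta):
    mu, sigma = theta
    if sigma <= 0:                               # keep sigma in bounds
        return np.inf
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

fit = minimize(neg_log_lik, x0=[0.0, 1.0], method="Nelder-Mead")
print(fit.x)      # ≈ [3.0, 2.0]; the MLE of sd divides by n, not n-1
print(x.mean())   # the MLE of mu is the sample mean
```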
[Figure: likelihood curve ℓ(θ | x) as a function of θ, peaking at the MLE of θ]

The curvature of the likelihood surface in the neighborhood of the MLE informs us as to the precision of the estimator. A narrow peak = high precision; a broad peak = lower precision.

This is formalized by looking at the log-likelihood surface, L = ln[ℓ(θ | x)]. Since ln is a monotonic function, the value of θ that maximizes ℓ also maximizes L.

The larger the curvature, the smaller the variance:

$$\mathrm{Var}(\mathrm{MLE}) = \frac{-1}{\;\partial^2 L(\mu \mid z)/\partial \mu^2\;}$$
Likelihood Ratio tests

Hypothesis testing in the ML framework occurs through likelihood-ratio (LR) tests:

$$LR = -2\ln\!\left(\frac{\ell(\hat{\Theta}_r \mid z)}{\ell(\hat{\Theta} \mid z)}\right) = -2\left[\,L(\hat{\Theta}_r \mid z) - L(\hat{\Theta} \mid z)\,\right]$$

For large sample sizes the LR (generally) approaches a chi-square distribution with r df (r = number of parameters assigned fixed values under the null)

Θ̂r is the MLE under the restricted conditions (some parameters specified, e.g., var = 1)

Θ̂ is the MLE under the unrestricted conditions (no parameters specified)
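A sketch of an LR test under these definitions: testing the null µ = 0 for normal data with known σ = 1, so one parameter is fixed under the null (r = 1):

```python
# Likelihood-ratio test of H0: mu = 0 vs unrestricted mu (sigma known = 1).
import numpy as np
from scipy.stats import chi2, norm

rng = np.random.default_rng(6)
z = rng.normal(loc=0.1, scale=1.0, size=200)   # data; true mu slightly off 0

def log_lik(mu):
    return np.sum(norm.logpdf(z, loc=mu, scale=1.0))

L_full = log_lik(z.mean())    # unrestricted MLE of mu = sample mean
L_restr = log_lik(0.0)        # mu fixed at its null value

LR = -2 * (L_restr - L_full)
p_value = chi2.sf(LR, df=1)   # r = 1 restricted parameter
print(LR, p_value)
```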
Bayesian Statistics

An extension of likelihood is Bayesian statistics:

p(θ | x) = C * ℓ(x | θ) p(θ)

Instead of simply producing a point estimate (e.g., the MLE), the goal is to estimate the entire distribution for the unknown parameter θ given the data x.

p(θ | x) is the posterior distribution for θ given the data x
ℓ(x | θ) is just the likelihood function
p(θ) is the prior distribution on θ (C is a normalizing constant)
Bayesian Statistics

Why Bayesian?

• Exact for any sample size
• Marginal posteriors
• Efficient use of any prior information
• MCMC (such as Gibbs sampling) methods

Priors quantify the strength of any prior information. Often these are taken to be diffuse (with a high variance), so that the prior weight on θ is spread over a wide range of possible values.
Marginal posteriors

• Oftentimes we are interested in a particular set of parameters (say some subset of the fixed effects). However, we also have to estimate all of the other parameters.
• How do uncertainties in these nuisance parameters factor into the uncertainty in the parameters of interest?
• A Bayesian marginal posterior takes this into account by integrating the full posterior over the nuisance parameters
• While this sounds complicated, it is easy to do with MCMC (Markov Chain Monte Carlo)
Conjugating priors

For any particular likelihood, we can often find a conjugating prior, such that the product of the likelihood and the prior returns a known distribution.

Example: For the mean µ in a normal, taking the prior on the mean to also be normal returns a posterior for µ that is normal.

Example: For the variance σ² in a normal, taking the prior on the variance to be an inverse chi-square distribution returns a posterior for σ² that is also an inverse chi-square (details in WL Appendix 2).
A normal prior on the mean, with mean µ0 and variance σ0² (the larger σ0², the more diffuse the prior).

If the likelihood for the mean is a normal distribution, the resulting posterior is also normal.

Note that if σ0² is large, the mean of the posterior is very close to the sample mean.
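A sketch of this conjugate update, using the standard normal-normal formulas for a known data variance σ² (these expressions are textbook results, not taken from the slide):

```python
# Normal prior N(mu0, tau0sq) on the mean; normal likelihood, known sigma^2.
import numpy as np

rng = np.random.default_rng(7)
sigma = 1.0
x = rng.normal(loc=2.0, scale=sigma, size=50)   # data
n, xbar = len(x), x.mean()

mu0, tau0sq = 0.0, 100.0   # diffuse prior (large prior variance)

# Posterior precision = prior precision + data precision
post_var = 1.0 / (1.0 / tau0sq + n / sigma**2)
post_mean = post_var * (mu0 / tau0sq + n * xbar / sigma**2)

print(post_mean, xbar)     # diffuse prior => posterior mean ≈ sample mean
print(post_var)            # ≈ sigma^2 / n
```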
If x follows a chi-square distribution, then 1/x follows an inverse chi-square distribution.

The scaled inverse chi-square has two parameters, allowing more control over the mean and variance of the prior.
MCMC

Analytic expressions for posteriors can be complicated, but the method of MCMC (Markov Chain Monte Carlo) is a general approach to simulating draws from just about any distribution (details in WL Appendix 3).

Generating several thousand such draws from the posterior returns an empirical distribution that we can use.

For example, we can compute a 95% credible interval: the region of the distribution containing 95% of the probability.
Gibbs Sampling

• A very powerful version of MCMC is the Gibbs sampler
• Assume we are sampling from a vector of parameters, and that the distribution of each parameter, conditional on the current values of all the others, is known
• For example, given current values for all the fixed effects but one (say β1) and the variances, conditioning on these values the distribution of β1 is a normal, whose parameters are now functions of the current values of the other parameters. A random draw is then generated from this distribution.
• Likewise, conditioning on all the fixed effects and all variances but one, the distribution of this variance is an inverse chi-square
This generates one cycle of the sampler. Using these new values, a second cycle is generated.
Full details in WL Appendix 3.
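A compact sketch of one such sampler for the simplest case: normal data with unknown mean and variance, taking a flat prior on µ and p(σ²) ∝ 1/σ² for brevity (the prior choices are ours). The µ draw is normal and the σ² draw is a scaled inverse chi-square, generated here via its inverse-gamma form; the final lines also illustrate the 95% credible interval mentioned above:

```python
# Gibbs sampler for (mu, sigma^2) of normal data; flat prior on mu,
# p(sigma^2) ∝ 1/sigma^2 (a diffuse choice; conjugate priors work similarly).
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(loc=5.0, scale=2.0, size=100)   # data: true mu=5, sigma=2
n, xbar = len(x), x.mean()

n_iter, burn_in = 5_000, 1_000
mu, sigma2 = 0.0, 1.0                          # arbitrary starting values
draws = np.empty((n_iter, 2))

for t in range(n_iter):
    # 1. mu | sigma^2, x  ~  Normal(xbar, sigma^2 / n)
    mu = rng.normal(xbar, np.sqrt(sigma2 / n))
    # 2. sigma^2 | mu, x  ~  scaled inverse chi-square,
    #    drawn as an inverse gamma with shape n/2 and rate ss/2
    ss = np.sum((x - mu) ** 2)
    sigma2 = 1.0 / rng.gamma(shape=n / 2, scale=2.0 / ss)
    draws[t] = mu, sigma2

post = draws[burn_in:]                         # discard burn-in cycles
print(post.mean(axis=0))                       # posterior means ≈ [5, 4]
print(np.percentile(post[:, 0], [2.5, 97.5]))  # 95% credible interval for mu
```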