Lecture 4: Statistics Review II
Date: 9/5/02
Hypothesis tests: power
Estimation: likelihood, moment estimation, least squares
Statistical properties of estimators
Types of Errors
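False positive (Type I error): the probability ($\alpha$) that H0 is rejected when it is true.
False negative (Type II error): the probability ($\beta$) that H0 is accepted when it is false.

| | Accept H0 | Reject H0 |
|---|---|---|
| H0 true | $1-\alpha$ | Type I error $= \alpha$ |
| H0 false | Type II error $= \beta$ | power $= 1-\beta$ |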
Example: Error
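H0: central chi-square
HA: non-central chi-square with non-centrality parameter E(G)
[Figure: densities of the test statistic under H0 and HA, with the areas $1-\alpha$ and $1-\beta$ labeled.]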
Power of a Test
Definition: The statistical power of a test is defined as $1-\beta$.
The power is only defined when HA is specified, the experimental conditions (e.g. sample size) are known, and the significance level $\alpha$ has been selected.
Example: calculate sample size needed to obtain particular linkage detection power.
Estimation
Hypothesis testing allows us to draw qualitative conclusions about whether a statement (H0) is tenable.
Often we want to make quantitative inference, e.g. an actual estimate of the recombination fraction, not just evidence that genes are linked.
Estimation
Definition: Point estimation is the process of estimating a specific value of the parameter $\theta$ based on observed data X1, X2,…,Xn.
Definition: Interval estimation is the process of estimating upper and lower limits within which the unknown parameter lies with a stated probability.
Maximum Likelihood Estimation
Definition: The maximum likelihood estimator (MLE) is the value $\hat{\theta}$ which maximizes the likelihood function.
The MLE is obtained by one of:
• analytically solving the score equation, $S(\theta) = 0$
• grid search
• Newton-Raphson iteration
• Expectation and Maximization (EM) algorithm
MLE: Grid Search
Plot likelihood L or log likelihood l vs. parameter throughout the parameter space.
Obtain MLE by visual inspection or search algorithm.
MLE: Grid Search Algorithm
1. Choose an initial estimate $\theta_0$. Pick a step size $\Delta$.
2. At step n, evaluate L (or l) on both sides of $\theta_n$.
3. Choose $\theta_{n+1} = \theta_n + \Delta$ if L is increasing to the right, else choose $\theta_{n+1} = \theta_n - \Delta$.
4. Repeat steps 2 and 3 until the estimate no longer advances. Choose a smaller $\Delta$, and repeat steps 2, 3, and 4 until the desired accuracy is met.
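The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the binomial log likelihood (30 recombinants among 100 gametes), the function names, and the bounds on the parameter are all assumptions chosen so the answer is easy to check against k/n.

```python
import math

def log_lik(theta, k=30, n=100):
    """Binomial log likelihood (up to a constant); k and n are hypothetical data."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def grid_search(loglik, theta0, step, tol=1e-6, lower=1e-9, upper=1.0 - 1e-9):
    """Step-halving search following steps 1-4 of the slide."""
    theta = theta0
    while step > tol:                      # step 4: shrink the step until accurate enough
        moved = True
        while moved:                       # steps 2-3: walk uphill with the current step
            moved = False
            here = loglik(theta)
            right = min(theta + step, upper)
            left = max(theta - step, lower)
            if loglik(right) > here:
                theta, moved = right, True
            elif loglik(left) > here:
                theta, moved = left, True
        step /= 2.0
    return theta

theta_hat = grid_search(log_lik, theta0=0.1, step=0.1)
print(theta_hat)   # close to k/n = 0.3
```

Because the search only ever moves uphill, it stops at whichever peak is nearest the starting value, which is why several starting values are worth trying on a multimodal surface.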
MLE: Grid Search Problems
Multiple peaks can cause the search to stop at a local maximum and miss the global maximum of the likelihood.
Solving for several parameters simultaneously becomes computationally intensive, and the likelihood surface is difficult to interpret visually when there are more than two parameters.
Example: 2-Locus with Non-Penetrant Allele
Let $\theta$ be the recombination fraction between marker A and gene of interest B.
Let $\gamma$ be the probability that the allele f of the gene of interest fails to be detected at the phenotype level (i.e. $1-\gamma$ is the penetrance).
Cross +F/+F × –f/–f. Score gametes of an F1 +F/–f individual for +/– phenotype and P/p phenotype, where P means F or non-penetrant f and p means penetrant f.
[Diagram: marker locus A (+ or –) and gene locus B (F or f).]
Experimental Data
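$$
\begin{aligned}
P(\text{+F gamete}) &= P(\text{–f gamete}) = 0.5(1-\theta) \\
P(\text{–F gamete}) &= P(\text{+f gamete}) = 0.5\,\theta \\
P(\text{+P gamete}) &= 0.5(1-\theta) + 0.5\,\theta\gamma \\
P(\text{–P gamete}) &= 0.5\,\theta + 0.5(1-\theta)\gamma \\
P(\text{+p gamete}) &= 0.5\,\theta(1-\gamma) \\
P(\text{–p gamete}) &= 0.5(1-\theta)(1-\gamma)
\end{aligned}
$$
Observe the counts $n_{+P}$, $n_{-P}$, $n_{+p}$, $n_{-p}$.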
Experimental Data: Log Likelihood
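$$
L(\theta,\gamma) \propto [1-\theta+\theta\gamma]^{n_{+P}}\,[\theta+(1-\theta)\gamma]^{n_{-P}}\,[\theta(1-\gamma)]^{n_{+p}}\,[(1-\theta)(1-\gamma)]^{n_{-p}}
$$
$$
\begin{aligned}
l(\theta,\gamma) ={}& n_{+P}\log[1-\theta+\theta\gamma] + n_{-P}\log[\theta+(1-\theta)\gamma] \\
&+ n_{+p}\log[\theta(1-\gamma)] + n_{-p}\log[(1-\theta)(1-\gamma)]
\end{aligned}
$$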
Experimental Data: Grid Search
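The likelihood surface for this example can be explored numerically. The sketch below is an illustration under stated assumptions: `theta` is the recombination fraction and `gamma` the non-penetrance probability (the symbol names are assumed), the log likelihood is the one implied by the gamete probabilities of the two-locus example, the counts are borrowed from the Bootstrap Example slide later in the lecture, and the search is restricted to recombination fractions of at most 0.5.

```python
import math

# Counts borrowed from the "Bootstrap Example" slide (ASCII "-" used for the minus allele).
COUNTS = {"+P": 168, "+p": 3, "-P": 52, "-p": 163}

def log_lik(theta, gamma, n=COUNTS):
    """Two-locus log likelihood up to an additive constant."""
    return (n["+P"] * math.log(1 - theta + theta * gamma)
            + n["-P"] * math.log(theta + (1 - theta) * gamma)
            + n["+p"] * math.log(theta * (1 - gamma))
            + n["-p"] * math.log((1 - theta) * (1 - gamma)))

def grid_search_2d(step=0.005):
    """Evaluate the log likelihood on a regular grid and keep the best point."""
    best = (-math.inf, None, None)
    n_theta = round(0.5 / step)
    n_gamma = round(1.0 / step)
    for i in range(1, n_theta + 1):        # theta in (0, 0.5]
        theta = i * step
        for j in range(1, n_gamma):        # gamma in (0, 1)
            gamma = j * step
            value = log_lik(theta, gamma)
            if value > best[0]:
                best = (value, theta, gamma)
    return best

best_l, theta_hat, gamma_hat = grid_search_2d()
print(theta_hat, gamma_hat)
```

An exhaustive grid like this costs one likelihood evaluation per grid point, so the cost grows quickly with the number of parameters.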
Newton-Raphson: One Parameter
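Let $S(\theta)$ be the score. Then the MLE is obtained through the equation
$$S(\hat\theta) = 0$$
Taylor expansion of $S(\theta)$ for $\theta_m$ near the MLE gives
$$S(\hat\theta) \approx S(\theta_m) + (\hat\theta - \theta_m)\,\frac{dS(\theta_m)}{d\theta} = 0$$
$$\hat\theta \approx \theta_m - \frac{S(\theta_m)}{dS(\theta_m)/d\theta}$$
which suggests the iteration
$$\theta_{m+1} = \theta_m - \frac{S(\theta_m)}{dS(\theta_m)/d\theta}$$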
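The iteration can be tried on a case where the answer is known in closed form. This is a minimal sketch, assuming a binomial likelihood with k recombinants among n gametes (hypothetical data, not from the lecture), for which the MLE is k/n. The clamp on each new estimate is an added safeguard.

```python
def score(theta, k, n):
    """Derivative of the binomial log likelihood with respect to theta."""
    return k / theta - (n - k) / (1.0 - theta)

def score_derivative(theta, k, n):
    """Derivative of the score; always negative inside (0, 1)."""
    return -k / theta ** 2 - (n - k) / (1.0 - theta) ** 2

def newton_raphson(k, n, theta0=0.1, tol=1e-10, max_iter=100, eps=1e-6):
    theta = theta0
    for _ in range(max_iter):
        new_theta = theta - score(theta, k, n) / score_derivative(theta, k, n)
        new_theta = min(max(new_theta, eps), 1.0 - eps)   # keep the estimate inside (0, 1)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

print(newton_raphson(30, 100))   # converges to k/n = 0.3
```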
Newton-Raphson: Analysis
NR fits a parabola to the log likelihood function at the current parameter estimate. The new parameter estimate is the maximum of the parabola.
NR may fail when there are multiple peaks.
NR may fail when the information is zero (when the estimate is at the extremes).
Recommendations: Use multiple starting values. Bound new estimates.
Newton-Raphson: Multiple Parameters
$$\theta_{m+1} = \theta_m + \frac{1}{N}\,I^{-1}(\theta_m)\,S(\theta_m)$$
N is the total sample size.
$S(\theta_m)$ is the score vector evaluated at $\theta_m$.
$I^{-1}(\theta_m)$ is the inverse information matrix evaluated at $\theta_m$.
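The update can be sketched for the two-parameter example of this lecture. The code below makes several assumptions that the slide does not state: the information matrix is taken to be the expected information per observation of a multinomial model (which is what makes the 1/N factor appropriate), the parameters are named `theta` (recombination fraction) and `gamma` (non-penetrance), the counts come from the Bootstrap Example slide, and new estimates are clamped to the open unit interval.

```python
COUNTS = [168, 52, 3, 163]   # order: +P, -P, +p, -p (from the Bootstrap Example slide)

def cell_probs(theta, gamma):
    return [0.5 * (1 - theta) + 0.5 * theta * gamma,
            0.5 * theta + 0.5 * (1 - theta) * gamma,
            0.5 * theta * (1 - gamma),
            0.5 * (1 - theta) * (1 - gamma)]

def cell_gradients(theta, gamma):
    """(dp/dtheta, dp/dgamma) for each cell, in the same order as cell_probs."""
    return [(-0.5 * (1 - gamma), 0.5 * theta),
            (0.5 * (1 - gamma), 0.5 * (1 - theta)),
            (0.5 * (1 - gamma), -0.5 * theta),
            (-0.5 * (1 - gamma), -0.5 * (1 - theta))]

def scoring_step(theta, gamma, counts):
    """One update: theta_new = theta + (1/N) * I^{-1} * S."""
    N = sum(counts)
    p = cell_probs(theta, gamma)
    g = cell_gradients(theta, gamma)
    s = [sum(n / pi * gi[j] for n, pi, gi in zip(counts, p, g)) for j in range(2)]
    info = [[sum(gi[j] * gi[k] / pi for pi, gi in zip(p, g)) for k in range(2)]
            for j in range(2)]
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    d_theta = (info[1][1] * s[0] - info[0][1] * s[1]) / det / N
    d_gamma = (info[0][0] * s[1] - info[1][0] * s[0]) / det / N
    return theta + d_theta, gamma + d_gamma

def fisher_scoring(counts, theta=0.1, gamma=0.1, tol=1e-10, max_iter=100, eps=1e-6):
    for _ in range(max_iter):
        new_theta, new_gamma = scoring_step(theta, gamma, counts)
        new_theta = min(max(new_theta, eps), 1 - eps)   # bound new estimates
        new_gamma = min(max(new_gamma, eps), 1 - eps)
        done = abs(new_theta - theta) < tol and abs(new_gamma - gamma) < tol
        theta, gamma = new_theta, new_gamma
        if done:
            break
    return theta, gamma

theta_hat, gamma_hat = fisher_scoring(COUNTS)
print(theta_hat, gamma_hat)
```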
EM Algorithm: Incomplete Data
The notion of incomplete data:
| | AB | ab | Ab | aB |
|---|---|---|---|---|
| AB | AB/AB | AB/ab | AB/Ab | AB/aB |
| ab | ab/AB | ab/ab | ab/Ab | ab/aB |
| Ab | Ab/AB | Ab/ab | Ab/Ab | Ab/aB |
| aB | aB/AB | aB/ab | aB/Ab | aB/aB |
EM Algorithm: Example of Incomplete Data
+P gamete may result from nonrecombinant +F or from recombinant, non-penetrant +f.
+p gamete can only result from penetrant, recombinant +f.
–P gamete can result from recombinant –F or from nonrecombinant, non-penetrant –f.
–p gamete can only result from penetrant, nonrecombinant –f.
EM Algorithm
Make an initial guess $\theta_0$.
Expectation step: Pretend that $\theta_n$ for iteration n is true. Estimate the complete data. This usually requires the distribution of the complete data conditional on the observed data. For example: P(recombinant | observed phenotype).
Maximization step: Compute the maximum likelihood estimate $\theta_{n+1}$ from the estimated complete data.
Repeat E & M steps until the likelihood converges.
Example: E Step
E Step:
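With i indexing the four observed gamete classes (+P, –P, +p, –p):
$$E(\text{recombinants}) = \sum_{i=1}^{4} n_i\,P(\text{recombinant} \mid i)$$
$$E(\text{non-penetrants}) = \sum_{i=1}^{4} n_i\,P(\text{non-penetrant} \mid i)$$
For example,
$$P(\text{non-penetrant} \mid \text{+P}) = \frac{P(\text{non-penetrant AND +P})}{P(\text{+P})} = \frac{0.5\,\theta\gamma}{0.5(1-\theta) + 0.5\,\theta\gamma}$$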
Example: M Step
M Step:
$$\theta_{n+1} = \frac{E(\text{recombinants})}{N}$$
$$\gamma_{n+1} = \frac{E(\text{non-penetrants})}{N_{\text{recessive}}}$$
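The E and M steps for the two-locus example can be put together as follows. This is a sketch under assumptions: `theta` is the recombination fraction and `gamma` the non-penetrance probability (symbol names assumed), the conditional probabilities come from the gamete probabilities given earlier, and the denominator for the non-penetrance update is read as the expected number of gametes carrying f, that is the two p counts plus the expected number of non-penetrants. The counts used in the call are those of the Bootstrap Example slide.

```python
def em_two_locus(n_plus_P, n_minus_P, n_plus_p, n_minus_p,
                 theta=0.1, gamma=0.1, tol=1e-10, max_iter=10000):
    N = n_plus_P + n_minus_P + n_plus_p + n_minus_p
    for _ in range(max_iter):
        # E step: expected complete-data counts given the current estimates.
        a = (1 - theta) + theta * gamma        # proportional to P(+P)
        b = theta + (1 - theta) * gamma        # proportional to P(-P)
        rec_in_plus_P = n_plus_P * theta * gamma / a      # recombinant, non-penetrant +f
        rec_in_minus_P = n_minus_P * theta / b            # recombinant -F
        nonpen_in_minus_P = n_minus_P * (1 - theta) * gamma / b
        e_recombinants = n_plus_p + rec_in_plus_P + rec_in_minus_P
        e_nonpenetrants = rec_in_plus_P + nonpen_in_minus_P
        # M step: complete-data maximum likelihood estimates.
        new_theta = e_recombinants / N
        n_recessive = n_plus_p + n_minus_p + e_nonpenetrants   # assumed reading of N_recessive
        new_gamma = e_nonpenetrants / n_recessive
        converged = abs(new_theta - theta) < tol and abs(new_gamma - gamma) < tol
        theta, gamma = new_theta, new_gamma
        if converged:
            break
    return theta, gamma

theta_hat, gamma_hat = em_two_locus(168, 52, 3, 163)
print(theta_hat, gamma_hat)
```

Each iteration keeps both estimates inside (0, 1) automatically, at the price of slower convergence than Newton-Raphson.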
Moment Estimation
Obtain equations for the population moments in terms of the parameters to estimate.
Substitute the sample moments and solve for the parameters.
For example: binomial distribution
$$m_1 = np$$
$$\hat{p} = \frac{m_1}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Example: Moment Estimation
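$$\text{mgf}(t_{+P}, t_{-P}, t_{+p}) = \left(p_{+P}e^{t_{+P}} + p_{-P}e^{t_{-P}} + p_{+p}e^{t_{+p}} + p_{-p}\right)^n$$
$$m_{+P} = np_{+P}, \quad m_{-P} = np_{-P}, \quad m_{+p} = np_{+p}, \quad m_{-p} = np_{-p}$$
$$\hat{p}_{+P} = \frac{f_{+P}}{n}, \quad \hat{p}_{-P} = \frac{f_{-P}}{n}, \quad \hat{p}_{+p} = \frac{f_{+p}}{n}, \quad \hat{p}_{-p} = \frac{f_{-p}}{n}$$
$$\hat{p}_{+p} = 0.5\,\hat\theta(1-\hat\gamma), \qquad \hat{p}_{-p} = 0.5(1-\hat\theta)(1-\hat\gamma)$$
$$\hat\gamma = 1 - \frac{\hat{p}_{+p} + \hat{p}_{-p}}{0.5}, \qquad \hat\theta = \frac{\hat{p}_{+p}}{0.5(1-\hat\gamma)}$$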
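A short numerical sketch of moment estimation for the two-locus example follows. It assumes the estimators are obtained from the two p classes only, using P(+p) = 0.5θ(1-γ) and P(–p) = 0.5(1-θ)(1-γ); other pairs of moment equations would give different estimates. The function name and the counts (from the Bootstrap Example slide) are illustrative.

```python
def moment_estimates(n_plus_P, n_minus_P, n_plus_p, n_minus_p):
    """Moment estimates of theta and gamma from the +p and -p sample proportions."""
    n = n_plus_P + n_minus_P + n_plus_p + n_minus_p
    p_plus_p = n_plus_p / n        # sample moment for class +p
    p_minus_p = n_minus_p / n      # sample moment for class -p
    gamma = 1.0 - (p_plus_p + p_minus_p) / 0.5
    theta = p_plus_p / (0.5 * (1.0 - gamma))
    return theta, gamma

theta_mom, gamma_mom = moment_estimates(168, 52, 3, 163)
print(theta_mom, gamma_mom)
```

There are three independent class proportions but only two parameters, so a different choice of equations leads to a different answer.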
Moment Estimation: Problems
Large sample properties of ML estimators are usually better than those for the corresponding moment estimators.
Sometimes the solution of the moment equations is not unique.
Least Squares Estimation
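$$Y = X\beta + \varepsilon$$
$$\varepsilon'\varepsilon = Y'Y - 2\beta'X'Y + \beta'X'X\beta$$
$$\hat\beta = (X'X)^{-1}X'Y$$
$$SSE_{\text{full}} = Y'Y - \hat\beta'X'Y$$
$$SSE_{\text{reduced}} = Y'Y - \hat\beta_R'X_R'Y$$
$$F = \frac{(SSE_{\text{reduced}} - SSE_{\text{full}})\,/\,[(N-k+1)-(N-k)]}{SSE_{\text{full}}\,/\,(N-k)} \sim F_{1,\,N-k}$$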
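The normal equations and the F statistic can be checked on a small example. This sketch assumes a full model with k = 2 parameters (intercept and slope) and a reduced model with the intercept only. The data are made up for illustration, and the two-column solver is written out by hand to stay within the standard library.

```python
def ols_two_column(X, Y):
    """Solve (X'X) beta = X'Y for a two-column design matrix; return beta and SSE."""
    a11 = sum(r[0] * r[0] for r in X)
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X)
    b1 = sum(r[0] * y for r, y in zip(X, Y))     # first element of X'Y
    b2 = sum(r[1] * y for r, y in zip(X, Y))     # second element of X'Y
    det = a11 * a22 - a12 * a12
    beta = ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
    sse = sum(y * y for y in Y) - (beta[0] * b1 + beta[1] * b2)   # Y'Y - beta'X'Y
    return beta, sse

# Hypothetical data, roughly y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1.1, 2.9, 5.2, 6.8, 9.0]
X = [[1.0, xi] for xi in x]

beta, sse_full = ols_two_column(X, Y)

# Reduced model: intercept only, so beta_R is the sample mean.
N, k = len(Y), 2
mean_y = sum(Y) / N
sse_reduced = sum(y * y for y in Y) - mean_y * sum(Y)

F = (sse_reduced - sse_full) / (sse_full / (N - k))   # compare with F(1, N - k)
print(beta, sse_full, sse_reduced, F)
```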
Variance of an Estimator
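•Suppose k independent estimates $\hat\theta_1, \ldots, \hat\theta_k$ are available for $\theta$, with mean $\bar\theta$:
$$\hat\sigma^2_{\hat\theta} = \frac{1}{k-1}\sum_{i=1}^{k}\left(\hat\theta_i - \bar\theta\right)^2$$
•Suppose you have a large sample, then the variance of the MLE is approximately the Cramér-Rao lower bound for the variance:
$$\sigma^2_{\hat\theta} \approx \frac{1}{nI(\hat\theta)}$$
•Empirical estimates using resampling techniques.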
Variance: Linear Estimator
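$$\text{Var}\left(c_1\hat\theta_1 + \cdots + c_k\hat\theta_k\right) = \sum_{i=1}^{k} c_i^2\,\text{Var}(\hat\theta_i) + 2\sum_{i=1}^{k}\sum_{j>i}^{k} c_ic_j\,\text{Cov}(\hat\theta_i, \hat\theta_j)$$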
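Variance: General Function f(θ1, θ2, …, θk)
$$\text{Var}\,f(\hat\theta_1, \ldots, \hat\theta_k) \approx \sum_{i=1}^{k}\left(\frac{df}{d\theta_i}\right)^2\text{Var}(\hat\theta_i) + 2\sum_{i=1}^{k}\sum_{j>i}^{k}\frac{df}{d\theta_i}\frac{df}{d\theta_j}\,\text{Cov}(\hat\theta_i, \hat\theta_j)$$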
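The approximation can be evaluated numerically when the derivatives are inconvenient to write down. The sketch below is an illustration under assumptions: derivatives are taken by central differences, the function is the +p gamete probability 0.5θ(1-γ) from the two-locus example, and the estimates and covariance matrix are invented numbers rather than values from the lecture.

```python
def delta_method_variance(f, estimates, cov, h=1e-6):
    """Approximate Var f(theta_1..theta_k) from a covariance matrix of the estimates."""
    k = len(estimates)
    grad = []
    for i in range(k):
        up = list(estimates)
        down = list(estimates)
        up[i] += h
        down[i] -= h
        grad.append((f(*up) - f(*down)) / (2 * h))   # central-difference df/dtheta_i
    var = sum(grad[i] ** 2 * cov[i][i] for i in range(k))
    var += 2 * sum(grad[i] * grad[j] * cov[i][j]
                   for i in range(k) for j in range(i + 1, k))
    return var

def prob_plus_p(theta, gamma):
    """P(+p gamete) in the two-locus example."""
    return 0.5 * theta * (1 - gamma)

# Hypothetical estimates and covariance matrix for (theta, gamma).
estimates = [0.02, 0.22]
cov = [[1.0e-4, -2.0e-5],
       [-2.0e-5, 9.0e-4]]

var_f = delta_method_variance(prob_plus_p, estimates, cov)
print(var_f)
```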
Bias
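The mean squared error (MSE) of an estimator is defined as
$$\text{MSE} = E\left[(\hat\theta - \theta)^2\right]$$
It separates into the variance and the squared bias, where the bias is $E(\hat\theta) - \theta$:
$$\text{MSE} = E\left[\left(\hat\theta - E(\hat\theta)\right)^2\right] + \left[E(\hat\theta) - \theta\right]^2 = \sigma^2_{\hat\theta} + \text{bias}^2$$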
If an estimator is unbiased, the MSE and variance are the same.
Estimating Bias
Bootstrap:
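$$\text{Bias}_B = \frac{1}{b}\sum_{i=1}^{b}\left(\hat\theta_i^{*} - \hat\theta\right)$$
where $\hat\theta_i^{*}$ is the bootstrap estimator for bootstrap trial i and $\hat\theta$ is the original estimate.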
Confidence Interval
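Because of sampling error, the estimate $\hat\theta$ is not exactly the true parameter value $\theta$. A confidence interval $(\hat\theta_L, \hat\theta_U)$ satisfies
$$P(\hat\theta_L \le \theta \le \hat\theta_U) = 1-\alpha$$
A confidence interval is symmetric if
$$P(\theta < \hat\theta_L) = P(\theta > \hat\theta_U) = \alpha/2$$
A confidence interval is non-symmetric if
$$P(\theta < \hat\theta_L) \ne P(\theta > \hat\theta_U)$$
A confidence interval is one-sided if
$$P(\theta < \hat\theta_L) = 0 \;\text{ OR }\; P(\theta > \hat\theta_U) = 0$$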
Confidence Interval: Normal Approximation I
Need a pivotal quantity, i.e. a quantity that depends on the data and the parameters but whose distribution does not depend on the parameters.
If the estimate $\hat\theta$ is unbiased and normally distributed with variance $\sigma^2_{\hat\theta}$, then the pivotal quantity is
$$\frac{\hat\theta - \theta}{\sigma_{\hat\theta}}$$
Confidence Interval: Normal Approximation II
The MLE is asymptotically normally distributed.
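$$P\left(-z_{1-0.5\alpha} \le \frac{\hat\theta - \theta}{\sigma_{\hat\theta}} \le z_{1-0.5\alpha}\right) = 1-\alpha$$
$$P\left(\hat\theta - z_{1-0.5\alpha}\,\sigma_{\hat\theta} \le \theta \le \hat\theta + z_{1-0.5\alpha}\,\sigma_{\hat\theta}\right) = 1-\alpha$$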
zz
Confidence Interval: Nonparametric Approximation
$$CDF(x) = P(\hat\theta_b \le x)$$
Percentile method:
$$\left(CDF^{-1}(0.5\alpha),\; CDF^{-1}(1-0.5\alpha)\right)$$
Bootstrap Example
Draw a multinomial sample with the given proportions to generate a bootstrap dataset, and estimate the parameters $\hat\theta_b$ and $\hat\gamma_b$ from it. Repeat b times.

| | +P | +p | –P | –p |
|---|---|---|---|---|
| Count | 168 | 3 | 52 | 163 |
| Proportion | 0.44 | 0.01 | 0.13 | 0.42 |
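The resampling scheme, the bootstrap bias estimate, and the percentile interval can be sketched together. Several choices here are assumptions made only to keep the example short: a closed-form estimator based on the two p classes stands in for whichever estimator is actually used, b = 2000 replicates, a fixed random seed, and simple index-based percentiles.

```python
import random

COUNTS = {"+P": 168, "+p": 3, "-P": 52, "-p": 163}   # table above, ASCII "-" for the minus allele

def estimates(counts):
    """Closed-form stand-in estimator using only the +p and -p classes."""
    n = sum(counts.values())
    gamma = 1.0 - 2.0 * (counts["+p"] + counts["-p"]) / n
    theta = counts["+p"] / (counts["+p"] + counts["-p"])
    return theta, gamma

def bootstrap_theta(counts, b=2000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    classes = list(counts)
    weights = [counts[c] for c in classes]
    n = sum(weights)
    theta_hat, _ = estimates(counts)
    thetas = []
    for _ in range(b):
        sample = rng.choices(classes, weights=weights, k=n)   # one multinomial draw of size n
        boot_counts = {c: sample.count(c) for c in classes}
        thetas.append(estimates(boot_counts)[0])
    bias = sum(thetas) / b - theta_hat                         # bootstrap bias estimate
    thetas.sort()
    lower = thetas[int(0.5 * alpha * b)]                       # percentile method
    upper = thetas[int((1 - 0.5 * alpha) * b) - 1]
    return theta_hat, bias, (lower, upper)

theta_hat, bias, interval = bootstrap_theta(COUNTS)
print(theta_hat, bias, interval)
```

With only 3 gametes in the +p class the bootstrap distribution of the recombination fraction is coarse and skewed, so the percentile interval is not symmetric about the original estimate.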
Confidence Interval: Likelihood Approach
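Let $L_{max}$ be the maximum likelihood for a given model. Find the parameter values $\theta_L$ and $\theta_U$ such that
$$\log L_{max} - \log L(\theta) = 2$$
Then $(\theta_L, \theta_U)$ serves as a confidence interval. Taking log as the natural logarithm, a drop of 2 corresponds to a likelihood ratio statistic of 4, close to the 3.84 critical value of a chi-square with 1 degree of freedom at the 0.05 level.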
LOD Score Support
The LOD score support for a confidence interval is
$$\log_{10} L_{max} - \log_{10} L$$
where L is the likelihood at the limit values of the parameter.
In practice, you plot the LOD score support for various values of the parameter and choose the upper and lower bounds such that the LOD score support is 1.
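Instead of reading the bounds off a plot, they can be located by bisection on each side of the MLE. The sketch below assumes a binomial likelihood with k recombinants among n gametes (hypothetical data, with 0 < k < n). The function names are illustrative.

```python
import math

def log10_lik(theta, k, n):
    """Binomial log10 likelihood up to an additive constant."""
    return k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)

def lod_support_interval(k, n, drop=1.0, tol=1e-8):
    """Parameter values on each side of the MLE where the LOD score support equals `drop`."""
    theta_hat = k / n
    top = log10_lik(theta_hat, k, n)

    def support(theta):
        return top - log10_lik(theta, k, n)

    def bisect(inside, outside):
        # Invariant: support(inside) < drop <= support(outside).
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if support(mid) < drop:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    lower = bisect(theta_hat, 1e-12)
    upper = bisect(theta_hat, 1.0 - 1e-12)
    return lower, upper

lower, upper = lod_support_interval(30, 100)
print(lower, upper)
```

The resulting interval is not symmetric about the MLE, because the likelihood falls off at different rates on the two sides.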
Choosing Good Confidence Intervals
The actual coverage probability should be close to the confidence coefficient.
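The interval should be biologically relevant. For example, the following interval for a recombination fraction is not, because a recombination fraction cannot exceed 0.5:
(0.1, 0.6)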
Good Estimator: Consistent
An estimator is mean squared error consistent if the MSE approaches zero as the sample size approaches infinity.
An estimator is simple consistent if, for every $\varepsilon > 0$,
$$\lim_{n\to\infty} P\left(|\hat\theta - \theta| < \varepsilon\right) = 1$$
Good Estimator: Unbiased
An unbiased estimator is usually better than a biased one, but this may not always be true. If the variance is larger, what have we gained?
There are bootstrap techniques for obtaining a bias-corrected estimate. These are computationally more intensive than the ordinary bootstrap, but sometimes worth it.
Good Estimator: Asymptotically Normal
If the pivotal quantity
$$\frac{\hat\theta - \theta}{\sigma_{\hat\theta}}$$
is normal with mean 0 and variance 1 as the sample size goes to infinity, it can be a very convenient property of the estimator.
Good Estimator: Confidence Interval
A good estimator should have a good way to obtain a confidence interval. MLEs are good in this way if the sample size is large enough.
Sample Size for Power
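[Figure: densities of the test statistic under H0 and HA, with the areas $1-\alpha$ and $1-\beta$ labeled.]
Need
$$E(G) > \chi^2_{1-\beta,\,\alpha,\,df}$$
where $\chi^2_{1-\beta,\,\alpha,\,df}$ denotes the value of the non-centrality parameter at which a test with df degrees of freedom and significance level $\alpha$ has power $1-\beta$.
$$E(G) = n\,E(G_{unit})$$
$$n > \frac{\chi^2_{1-\beta,\,\alpha,\,df}}{E(G_{unit})}$$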
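For a test with 1 degree of freedom the calculation can be done with the standard library alone, because a non-central chi-square with 1 df is the square of a normal variable with shifted mean. The sketch below rests on assumptions: it treats E(G) as the non-centrality parameter (as on the earlier error example slide), it handles df = 1 only, and the per-observation value `e_g_unit` is a made-up input.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_df1(ncp, alpha=0.05):
    """Power of a 1-df chi-square test when the statistic has non-centrality ncp."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)    # square root of the chi-square critical value
    root = math.sqrt(ncp)
    return (1 - STD_NORMAL.cdf(z - root)) + STD_NORMAL.cdf(-z - root)

def required_ncp(power=0.8, alpha=0.05):
    """Smallest non-centrality parameter giving the requested power, by bisection."""
    lo, hi = 0.0, 1.0
    while power_df1(hi, alpha) < power:
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if power_df1(mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi

def sample_size(e_g_unit, power=0.8, alpha=0.05):
    """n such that n * E(G_unit) reaches the required non-centrality."""
    return math.ceil(required_ncp(power, alpha) / e_g_unit)

print(required_ncp())        # about 7.85 for alpha = 0.05 and power 0.8
print(sample_size(0.05))     # hypothetical E(G_unit) = 0.05 per observation
```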
Sample Size for Target Confidence Interval
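Confidence interval by normal approximation is
$$\hat\theta \pm z_{1-\alpha/2}\,\sigma_{\hat\theta}$$
The bigger the range $2z_{1-\alpha/2}\,\sigma_{\hat\theta}$, the less precise the confidence interval.
Suppose we wish to have
$$2z_{1-\alpha/2}\,\sigma_{\hat\theta} \le d$$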
Sample Size for Target Confidence Interval II
Then,
$$2z_{1-\alpha/2}\sqrt{\frac{1}{nI(\theta)}} \le d$$
$$n \ge \left(\frac{2z_{1-\alpha/2}}{d}\right)^2\frac{1}{I(\theta)}$$
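A small numerical example of this bound follows. It assumes a binomial parameter, for which the information per observation is 1/(θ(1-θ)). The planning value θ = 0.2 and the target width d = 0.1 are made up for illustration.

```python
import math
from statistics import NormalDist

def sample_size_for_width(d, information, alpha=0.05):
    """Smallest n with 2 * z * sqrt(1 / (n * I)) <= d."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((2 * z / d) ** 2 / information)

theta = 0.2                              # hypothetical planning value
info = 1.0 / (theta * (1 - theta))       # binomial information per observation
n = sample_size_for_width(d=0.1, information=info)
print(n)   # 246
```

Because the information depends on the unknown parameter, the planning value has to come from prior knowledge or a pilot study.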
Summary
• Distributions
• Likelihood and Maximum Likelihood Estimation
• Hypothesis Tests
• Confidence Intervals
• Comparison of estimators
• Sample size calculations