Bayesian Inference
Concept

• Statistical Inference: a guide to the unknown, i.e., to uncertainty.
• Probability
  - classical: long-run relative frequency
  - Laplace: equally likely outcomes
  - Bayesian: updated by data or events
    · Thomas Bayes (Reverend): special case
    · Pierre-Simon Laplace: celestial mechanics, medical statistics, reliability
    · P(Hypothesis | Data) ∝ P(Data | Hypothesis) P(Hypothesis)
• Random sample data: (X1, X2, …, Xn) ~ iid f(x; θ), drawn from a population x ~ f(x; θ)
  - statistic → information → inference (estimation, testing)
  - sampling distribution, MVUE
• Probability is a SET function
  - sample space S, subset C, element w; probability space (S, Borel field, P)
  - the measure P(C) defines the probability function
  - X(w) = x; P(X = x) or f(x)
• Frequentist: the parameter θ is fixed (constant) but unknown.
• Bayesian: θ is a random variable.
Prof. Sehyug Kwon, Dept. of Statistics, HANNAM University, Lecture of 2009 Fall
Scientific Method

• Case: O.J. Simpson Trial
  - Prob.(H beats W to death) is very low.
  - But Prob.(H is the criminal | the beaten W is killed) goes up.
• Premises (William of Ockham)
  - A scientific hypothesis can never be shown to be absolutely true.
  - However, it must potentially be disprovable.
  - It is a useful model until it is proved not to be true.
  - Always go for the simplest hypothesis, unless it can be shown to be false.
• Monty Hall Show ("Let's Make a Deal")
  - 3 curtains, 2 donkeys, and 1 sports car
  - Do you dare to change?
• Procedure
  - Set the problem in terms of the current scientific hypothesis. => statistical model, population
  - Gather all the relevant information that is currently available. => about parameters
  - Design an experiment that addresses the model. => randomness reduces the uncertainty and outside factors
    · Casino: blackjack game (5 card decks)
  - Gather data from the experiment. => sample, statistics
  - Draw conclusions from the experimental results. => sampling dist. of statistics
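The Monty Hall question above ("Do you dare to change?") can be settled by a short simulation. This is a minimal sketch; the function name, trial count, and seed are illustrative choices, not part of the lecture.

```python
import random

def monty_hall(trials, switch, seed=0):
    """Simulate the Monty Hall game: 3 curtains, 2 donkeys, 1 sports car."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)            # curtain hiding the sports car
        pick = rng.randrange(3)           # contestant's first choice
        # The host opens a curtain that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

# Staying wins about 1/3 of the time; switching wins about 2/3.
stay = monty_hall(100_000, switch=False)
change = monty_hall(100_000, switch=True)
```

Switching wins exactly when the first pick misses the car, which happens with probability 2/3, so the contestant should always change.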
Main Approaches to Statistics

• Frequentist
  - The parameter (the characteristic value of the population) is fixed but unknown: X ~ f(x; θ).
  - Prob.: long-run relative frequency
  - In the sampling distribution the parameter is fixed: x ~ f(x; θ), θ̂ = φ(x1, x2, …, xn).
  - Random sample → sampling dist. of θ̂ → confidence statement on θ
  - Before the actual data, frequentists calculate the performance of a procedure by averaging over the sample space, i.e., among all possible data sets.
  - Monte Carlo study
• Bayesian
  - Because of the uncertainty about θ, θ is assumed to be random: π(θ).
  - The prior dist. must be subjective (a degree of belief).
  - Revise the belief with the gathered data: π(θ | x).
  - The posterior of θ is conditional on the actual data: x ~ f(x; θ)π(θ).
  - Random sample (x1, x2, …, xn) ~ iid f(x; θ) → Data → update the belief by the data, π(θ | x).
  - Pre-posterior analysis: we might still want to know the performance of a Bayesian procedure before the data; then θ is treated as fixed but unknown (prior) and as a random variable (updated by data) at the same time.
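The frequentist idea of judging a procedure before any data arrive, by averaging over the sample space, can be sketched with a small Monte Carlo study. The Uniform(0, 1) population, sample size, and repetition count are assumptions chosen for illustration.

```python
import random
import statistics

def sampling_distribution(n, reps, seed=0):
    """Pre-data performance of the sample mean for X ~ Uniform(0, 1):
    simulate many possible data sets and average over the sample space."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]
    return statistics.fmean(means), statistics.stdev(means)

# Theory: E(x-bar) = 1/2 and SD(x-bar) = sqrt((1/12)/n).
m, s = sampling_distribution(n=25, reps=20_000)
```

The simulated mean and spread of the estimator match the theoretical sampling distribution, which is exactly the "performance by sample-space averaging" computed before seeing actual data.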
Data

• Gathering DATA
• Sampling
  - sampling error and non-sampling error
  - Probabilistic sampling: SRS, systematic, cluster, stratified
  - Non-probabilistic sampling: quota, judgment, convenience, snowball
  - Randomized response methods
    · a sensitive question and a dummy question
    · a question with known demographics
• Observational vs. Experimental
  - Can other factors be controlled? (e.g., lung cancer and smoking)
  - Experiment: randomization, blocking, replication
    · control vs. treatment
    · placebo (e.g., gas injection for a stomach ache), blind test
• Univariate Graphical Summary
  - histogram and frequency table
    · # of classes: 8~10
    · shape of the PDF; polygon
  - dot plot: shape of the PDF
  - stem-and-leaf plot
    · double stems, five stems
    · shape of the PDF; more informative than the histogram
Data (2)

• Numerical Summary
• measure of location
  - arithmetic mean
  - median, p-th percentile
  - trimmed / winsorized mean
  - geometric mean
• measure of spread
  - variance, standard deviation
  - range, IQR
• mean and std
  - same unit
  - empirical rule
  - Chebyshev's inequality
• box-whisker plot
  - 5 elementary statistics
  - shape of the PDF; detecting outliers
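The location and spread measures above can be computed with Python's standard library. The data values are made up for illustration; `statistics.quantiles` with `n=4` gives the quartiles behind the box-whisker plot.

```python
import statistics

def five_number_summary(xs):
    """Min, Q1, median, Q3, max: the 5 elementary statistics of a box-whisker plot."""
    q1, q2, q3 = statistics.quantiles(xs, n=4)   # quartiles (default exclusive method)
    return min(xs), q1, q2, q3, max(xs)

def trimmed_mean(xs, k):
    """Arithmetic mean after dropping the k smallest and k largest observations."""
    ys = sorted(xs)[k:len(xs) - k]
    return statistics.fmean(ys)

data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 40]           # one large outlier
```

The trimmed mean (5.125 here) resists the outlier that pulls the arithmetic mean up to 8.3, which is why trimmed and winsorized means are listed as robust location measures.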
Data (cont.)

• Univariate → bivariate
• scatter plot
  - visual representation of a functional relation
• correlation coefficient
  - measure of the linear relationship
• Test on correlation
  - Ho: ρ = 0 ~ t-test: t = r√(n-2) / √(1-r²) ~ t(n-2)
  - Ho: ρ = ρ0, Ho: ρ1 = ρ2 ~ z-test (approximate) via the Fisher transformation:
    0.5 ln((1+r)/(1-r)) ~ N(0.5 ln((1+ρ)/(1-ρ)), 1/(n-3))
• Coefficient of determination: R² = SSR/SST
  - partial coefficient of determination, e.g., R²_{Y2.13}
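The two correlation tests above can be written out directly. This is a sketch; the small data set is invented for illustration, and only the test statistics (not p-values) are computed.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_stat(r, n):
    """t = r*sqrt(n-2)/sqrt(1-r^2), compared with t(n-2) under Ho: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def fisher_z(r, rho0, n):
    """z = (atanh(r) - atanh(rho0)) * sqrt(n-3), approx. N(0,1) under Ho: rho = rho0.
    atanh(r) is exactly 0.5*ln((1+r)/(1-r)), the Fisher transformation."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)

xs, ys = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)            # 0.8 for this data
```

With r = 0.8 and n = 5 the t statistic is about 2.31 on 3 degrees of freedom; the Fisher z statistic tests a nonzero ρ0 the t-test cannot handle.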
Probability

• Uncertainty
  - deductive logic → model
  - plausible reasoning → probability
• Probability as a measure of plausibility
  - a nonnegative real number
  - the more plausible, the larger the number
  - different propositions may have the same plausibility
  - all possible outcomes must be taken into account
  - the same knowledge gives the same plausibility
• Probability Axioms
  - P(A) ≥ 0 for any event A
  - P(S) = 1 for the sample space S
  - P(∪ Ai) = Σ P(Ai) for disjoint Ai's
• Rules
  - P(φ) = 0 for the null set
  - P(Ac) = 1 - P(A)
  - addition rule: P(A∪B) = P(A) + P(B) - P(A∩B)
• Joint Prob. and Independence
  - joint P(A∩B)
  - If P(A∩B) = P(A)P(B), A and B are independent.
  - disjoint vs. independent
  - marginal: P(A) = P(AB) + P(ABc)
• A Bayesian uses the rules of probability to revise the initial belief by data.
• Glossary
  - random experiment, outcome
  - sample space S, element w, event E
  - union, intersection
  - mutually exclusive (disjoint), mutually exhaustive
Probability (2)

• Conditional probability
  - S reduces to B: P(A|B) = P(A∩B) / P(B)
  - for independent events, P(A|B) = P(A)
  - multiplicative rule: P(A∩B) = P(B)P(A|B) = P(A)P(B|A)
• Total Probability
  - P(A) = Σj P(A∩Bj) = Σj P(A|Bj)P(Bj)
• Bayes Theorem
  - the Bi's are unobservable events; A is observed
  - P(Bi): prior distribution
  - P(A|Bi): likelihood of Bi
  - P(Bi|A): posterior distribution
  - P(Bi|A) = P(A|Bi)P(Bi) / Σj P(A|Bj)P(Bj)
• Bayes universe (diagram): the partition B1, B2, …, Bk (unobservable) crossed with the observed event A; the joint is prior × likelihood, P(A∩Bi) = P(Bi)P(A|Bi), and the marginal of A is the denominator above.
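Bayes' theorem over a partition is a three-line computation. The three-machine setting below is a hypothetical example (not from the slides), with exact fractions so the total-probability denominator is visible.

```python
from fractions import Fraction as F

def bayes(prior, likelihood):
    """Posterior over a partition B1..Bk given observed A:
    P(Bi|A) = P(A|Bi)P(Bi) / sum_j P(A|Bj)P(Bj)."""
    joint = [p * l for p, l in zip(prior, likelihood)]   # prior * likelihood
    marginal = sum(joint)                                # P(A), total probability
    return [j / marginal for j in joint]

# Hypothetical: three machines produce 50%, 30%, 20% of items
# with defect rates 1%, 2%, 3%. Which machine made a defective item?
prior = [F(5, 10), F(3, 10), F(2, 10)]
like = [F(1, 100), F(2, 100), F(3, 100)]
post = bayes(prior, like)        # P(machine | defective)
```

The defective item shifts belief toward the smaller but sloppier machines: the posterior is (5/17, 6/17, 6/17).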
Probability (3)

• Left side of the Bayes Theorem: the posterior
  - unobservable space = parameter space; random => a value is assigned by belief (a guess).
• Right side of the Bayes Theorem
  - numerator: prior × likelihood
  - denominator: the constant that makes the posterior probabilities sum to 1
  - Posterior ∝ Prior × Likelihood
• Tree diagram
  - starts with the observed event, A
• Assigning probability
  - long-run relative frequency
  - degree of belief, subjective
• A Bayesian uses
  - the long-run relative frequency for the prob. of outcomes of the random experiment, given the value of the unobserved variable (parameter)
  - repeating the experiment generates the sample space
• Odds ratio and the Bayes factor
  - Odds(C) = P(C) / (1 - P(C)); the event C can be prior or posterior
  - B denotes the Bayes factor: B × Prior Odds(C) = Posterior Odds(C), i.e., B = Posterior Odds(C) / Prior Odds(C)
  - If B > 1, the data make us believe that the event C is more probable than we thought.
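The odds and Bayes-factor definitions above translate directly into code. The prior and posterior probabilities below are assumed values for illustration.

```python
from fractions import Fraction as F

def odds(p):
    """Odds(C) = P(C) / (1 - P(C))."""
    return p / (1 - p)

def bayes_factor(prior_p, posterior_p):
    """B = posterior odds / prior odds; B > 1 means the data favor C."""
    return odds(posterior_p) / odds(prior_p)

# Assumed: prior P(C) = 1/2; after seeing data, posterior P(C) = 3/4.
B = bayes_factor(F(1, 2), F(3, 4))
```

Here B = 3 > 1: the data tripled the odds of C, so C is more probable than we thought; if the posterior equals the prior, B = 1 and the data were uninformative about C.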
Exercise

#1  #4
#2  #5
#3  #6

• sensitivity: P(PT | P) = 1 - false negative rate
• specificity: P(NT | PC) = 1 - false positive rate
• positive predictive value = posterior P(P | PT)
• negative predictive value = posterior P(PC | NT)
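The predictive values above are Bayes' theorem applied to a diagnostic test. This is a sketch; the prevalence, sensitivity, and specificity numbers are invented for illustration.

```python
from fractions import Fraction as F

def predictive_values(prevalence, sensitivity, specificity):
    """PPV = P(disease | positive test) and NPV = P(no disease | negative test),
    by Bayes' theorem over the partition {disease, no disease}."""
    p, se, sp = prevalence, sensitivity, specificity
    ppv = se * p / (se * p + (1 - sp) * (1 - p))
    npv = sp * (1 - p) / (sp * (1 - p) + (1 - se) * p)
    return ppv, npv

# Assumed numbers: 1% prevalence, 95% sensitivity, 90% specificity.
ppv, npv = predictive_values(F(1, 100), F(95, 100), F(90, 100))
```

With a rare disease the PPV is only 19/217 (about 8.8%) despite the accurate test: the prior (prevalence) dominates, which is the whole point of computing the posterior rather than quoting sensitivity alone.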
Exercise (2)

#7
Discrete Random Variable

• Random Variable
  - X(w) = x; a real-valued function on the sample space
  - w is an element of the sample space S = {w1, w2, w3, …}; x is a real number
  - a function that assigns a probability to each value of the random variable
  - induced probability: P_X(x) = P({w ∈ S : X(w) = x})
• Discrete or Continuous
  - discrete: the outcomes are distinct numbers; F(x) is a step function
  - continuous: an outcome can occur in any small interval
• Discrete Prob. Mass Function: f(x) = P(x) = P(X = x)
  - Prob. ⇔ relative frequency p(x)
• (Mathematical) Expected value: the long-run average
  - E(X) = Σx x·P(x)
• Variance and STD
  - V(X) = Σx (x - E(X))²·P(x) = E(X²) - [E(X)]²
• Mean and Var. of a linear function Y = a + bX
  - E(Y) = a + b·E(X), V(Y) = b²·V(X)
• Extension: Y = Σi bi·Xi
  - E(Y) = Σi bi·E(Xi)
  - V(Y) = Σi bi²·V(Xi) + 2·Σ_{i<j} bi·bj·COV(Xi, Xj)
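The expectation, variance, and linear-function rules above can be checked exactly with rational arithmetic. The fair die is a standard illustrative example, not from the slides.

```python
from fractions import Fraction as F

def mean_var(pmf):
    """E(X) = sum x*p(x) and V(X) = E(X^2) - E(X)^2 for a discrete pmf {x: p(x)}."""
    ex = sum(x * p for x, p in pmf.items())
    ex2 = sum(x * x * p for x, p in pmf.items())
    return ex, ex2 - ex * ex

# A fair six-sided die: E(X) = 7/2, V(X) = 35/12.
die = {x: F(1, 6) for x in range(1, 7)}
ex, vx = mean_var(die)

# Linear function Y = a + bX: E(Y) = a + b*E(X), V(Y) = b^2 * V(X).
a, b = 2, 3
ey, vy = a + b * ex, b * b * vx
```

Using `Fraction` keeps every value exact, so the shortcut formula E(X²) - E(X)² can be verified without floating-point fuzz.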
Famous Discrete Random Variables

• Bernoulli Trial
  - dichotomous outcome (success, fail)
  - the prob. of success remains constant
  - independent trials
• Bernoulli(p)
  - X = the result of one Bernoulli trial, 0 or 1
  - parameter p = prob. of success
  - PMF: p(x) = p^x (1-p)^(1-x), x = 0, 1
  - Mean = p, Variance = pq
• Binomial(n, p)
  - X = # of successes in n Bernoulli trials with success prob. p
  - PMF: p(x) = C(n, x) p^x (1-p)^(n-x), x = 0, 1, 2, …, n
  - Mean = np, Variance = npq
• Hypergeometric(N, K, n)
  - X = # of subjects from the interesting group out of n, drawn without replacement
  - PMF: p(x) = C(K, x) C(N-K, n-x) / C(N, n), x = 0, 1, 2, …, n
  - Mean = nK/N
  - When N is very large, HG(N, K, n) → B(n, K/N).
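The two PMFs and the limit HG(N, K, n) → B(n, K/N) can be checked numerically. The specific N, K, n, x values are assumptions chosen for the demonstration.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial(n, p): C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def hyper_pmf(x, N, K, n):
    """Hypergeometric(N, K, n): C(K, x) C(N-K, n-x) / C(N, n)."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# For large N, sampling without replacement barely differs from with replacement:
# HG(N, K, n) is close to B(n, K/N).
approx = binom_pmf(2, 5, 0.3)
exact = hyper_pmf(2, N=10_000, K=3_000, n=5)
```

With N = 10,000 the finite-population correction is negligible, so the hypergeometric probability essentially equals the binomial one with p = K/N = 0.3.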
Famous Discrete Random Variables (2)

• Geometric(p)
  - X = # of Bernoulli trials (p) until the first success
  - unit time, discrete life-time
  - PMF: p(x) = p(1-p)^(x-1), x = 1, 2, …
  - mean = 1/p, var. = (1-p)/p²
• Negative Binomial(r, p)
  - X = # of Bernoulli trials (p) until the r-th success
  - PMF: p(x) = C(x-1, r-1) p^r (1-p)^(x-r), x = r, r+1, …
  - mean = r/p, var. = r(1-p)/p²
• Poisson Dist. (λ)
  - The number of "successes" in a certain time / area.
  - Split the time into subintervals, each of which contains at most one happening:
    · independent trials
    · P(exactly one happens) = p remains constant
    · P(more than one occurs) = 0
  - In the Binomial, n (# of intervals) → ∞ and p is very small; λ = np.
  - PMF: p(x) = λ^x e^(-λ) / x!, x = 0, 1, 2, …
  - mean = λ, var. = λ
  - Poisson approximation to the Binomial: B(n, p) → P(λ = np)
• Normal approximation to the Binomial (continuity correction)
  - P(X ≥ a | X ~ B(n, p)) ≈ P(X ≥ a - 0.5 | X ~ N(np, npq))
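Both approximations above can be checked against exact binomial tail probabilities. The parameter choices (a rare-event case for Poisson, a moderate-p case for the normal) are assumptions made for the demonstration.

```python
from math import comb, exp, factorial, sqrt, erf

def binom_tail(a, n, p):
    """Exact P(X >= a) for X ~ B(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(a, n + 1))

def poisson_tail(a, lam):
    """P(X >= a) for X ~ Poisson(lam)."""
    return 1 - sum(exp(-lam) * lam ** x / factorial(x) for x in range(a))

def normal_tail(a, n, p):
    """Normal approximation with continuity correction:
    P(X >= a) ~ P(Z >= (a - 0.5 - np) / sqrt(npq))."""
    z = (a - 0.5 - n * p) / sqrt(n * p * (1 - p))
    return 0.5 * (1 - erf(z / sqrt(2)))

# Rare events: B(1000, 0.003) is close to Poisson(lambda = np = 3).
exact1, pois = binom_tail(5, 1000, 0.003), poisson_tail(5, 3.0)
# Moderate p: B(100, 0.4), P(X >= 45) via N(np, npq) with continuity correction.
exact2, norm = binom_tail(45, 100, 0.4), normal_tail(45, 100, 0.4)
```

The Poisson approximation works because n is large and p small with np moderate; the normal approximation works near the center of a binomial with npq large, and the 0.5 shift corrects for approximating a discrete mass by a continuous density.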
Exercise & HW#1-1, due 2009.10.09

• #1: Suppose that approximately 35% of all applications for jobs falsify the information on their application forms. Consider a company with 2300 employees. Let X be the number of applications in the company that have been falsified.
  (a) Find the mean and standard deviation of X.
  (b) Calculate the interval.
  (c) The company verifies the forms of the 2300 employees, and 249 forms contain falsified information. Does it support the overall 35%?
• #2: Of the volunteers donating blood in a clinic, 80% have the Rhesus factor present in their blood.
  (1) If 5 volunteers are randomly selected, what is the probability that at least one does not have the Rhesus factor?
  (2) What is the probability that at most four have the Rhesus factor?
  (3) What is the smallest number of volunteers who should be selected if we want to be at least 90% certain that we obtain at least five donors with the Rhesus factor?
• Relationships among the distributions (diagram): Bernoulli(p) → (Σ Xi) → Binomial(n, p) → (n → ∞, λ = np) → Poisson(λ); HG(N, K, n) → (N → ∞, p = K/N) → Binomial(n, p), with μ = np and σ² = npq.
Exercise (2) & HW#1-1, due 2009.10.09

• #3: Suppose that the prob. of engine malfunction during any 1-hour operation is 0.02. Find the probability that a given engine will survive 2 hours.
• #4: The probability of a customer arrival at a bank service desk in any 1-sec. interval is equal to 0.1. Assume the customers visit the bank randomly and independently.
  (1) Find the prob. that the first arrival will occur during the third 1-sec. interval.
  (2) Find the prob. that the first arrival will not occur until at least the third 1-sec. interval.
  (3) How long do you expect to wait for the first customer at the bank?
• #5: Show that P(X > a+b | X > a) = P(X > b) when X ~ G(p). This is the memory-less property of the geometric distribution.
• #6: A geological study indicates that an exploratory oil well should strike oil with prob. 0.2.
  (1) Find the prob. that the first strike comes on the third well drilled.
  (2) Find the prob. that the third strike comes on the seventh well drilled.
  (3) What are the mean and variance of the number of wells that must be drilled if the company wants to set up three producing wells?
• #7: Ten percent of the engines manufactured are defective. Engines are randomly selected one at a time and tested. What is the prob. that
  (1) the first non-defective engine will be found on the second trial?
  (2) the third non-defective engine will be found on the fifth trial?
  (3) the third non-defective engine will be found on or before the fifth trial?
Exercise (3) & HW#1-1, due 2009.10.09

• #8: A product is shipped in lots of twenty. We test each lot with a sample of size 5. If there is no defective, the lot will be shipped to the customer. If a lot contains four defectives,
  (a) what is the prob. that it will be rejected?
  (b) what is the expected number of defectives in the sample of size 5?
• #9: We want to estimate the number of animals (N) in a certain area. M animals are captured and tagged. Some time later, n animals are captured, and X, the number of tagged animals among the n animals, is noted. Suppose M = 4 and n = 3.
  (1) Find P(X = 1) as a function of N.
  (2) What value of N will maximize P(X = 1)?
• #10: Animals live in a prairie with a mean population density of approximately 5 per acre. If we check 10 acres randomly, what is the prob. that we can see none of the animals?
• #11: The number of accidents in a certain street is observed to average 3 per month. During the last month, 6 accidents occurred. Does it indicate an increase in the mean?
• #12: Suppose the number of visitors to a museum per hour is distributed as a Poisson with λ = 2.
  (a) Find the prob. that there are exactly 3 visitors between 9:00 and 9:30.
  (b) Find the prob. that there are no visitors between 9:00 and 10:30.
  (c) How many visitors do you expect in a day when the museum opens for 8 hours?
Joint Random Variables

• Joint PMF of two discrete r.v.'s (X, Y): p(x, y) = P(X = x, Y = y)
• Marginal PMF of Y: pY(y) = Σx P(X = x, Y = y)
• Conditional PMF of X given Y = y: p(x|y) = p(x, y) / pY(y)
• Expected value of h(X, Y): E(h(X, Y)) = Σx Σy h(x, y) p(x, y)
  - show that E(X + Y) = E(X) + E(Y)
• Covariance of (X, Y): COV(X, Y) = E((X - E(X))(Y - E(Y)))
• Independence
  - X and Y are independent ⇔ p(x, y) = pX(x) pY(y) for all (x, y)
  - When X and Y are independent, p(x|y) = p(x) and COV(X, Y) = 0.
• Var. of a sum: V(X ± Y) = V(X) + V(Y) ± 2 COV(X, Y)
• Joint table with marginals:

         y1        y2        …  yn        pX(x)
  x1     f(x1,y1)  f(x1,y2)  …  f(x1,yn)  f(x1)
  x2     f(x2,y1)  f(x2,y2)  …  f(x2,yn)  f(x2)
  …      …         …         …  …         …
  xm     f(xm,y1)  f(xm,y2)  …  f(xm,yn)  f(xm)
  pY(y)  f(y1)     f(y2)     …  f(yn)     (marginal)
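The joint-table machinery above, including the "show that E(X+Y) = E(X) + E(Y)" exercise, can be verified on a small table. The 2x2 joint pmf below is an assumed example with deliberately dependent X and Y.

```python
from fractions import Fraction as F

# Assumed joint pmf p(x, y) on {0,1} x {0,1}; X and Y are NOT independent here.
joint = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(3, 8), (1, 1): F(1, 8)}

def marginal_x(joint):
    """pX(x) = sum over y of p(x, y)."""
    m = {}
    for (x, _), p in joint.items():
        m[x] = m.get(x, 0) + p
    return m

def expect(joint, h):
    """E(h(X, Y)) = sum_x sum_y h(x, y) p(x, y)."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

ex = expect(joint, lambda x, y: x)
ey = expect(joint, lambda x, y: y)
cov = expect(joint, lambda x, y: (x - ex) * (y - ey))   # COV(X, Y) = -1/8 here
```

E(X+Y) = E(X) + E(Y) comes out exactly even though COV(X, Y) ≠ 0: additivity of expectation needs no independence, unlike V(X+Y).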
HW#1-2, due 2009.10.09

#1  #3
#2  #4

• no cover sheet; write a 4-digit pass code on the top of the first page
• fold the homework in half and write your name on the outside
Concept

• Notation
  - θ: the random variable for the parameter, never observed; π(θ) is its prior prob. function.
  - Y: an observed random variable (random sample), f(y; θ)
  - revise our belief (inference) on the parameter θ given the observed values of Y
• Bayesian Paradigm
  - Frequentist: estimate θ with a model for the data.
  - Bayesian: a prob. model for both θ and the data.
• 3 steps for a Bayesian
  - Set up π(θ) and f(y; θ).
  - Calculate the posterior π(θ|y).
  - Model diagnosis with π(θ|y).
• Bayesian universe
  - (θi, yj) ⇔ (unobservable, observed); a value of θ is assigned by guess
  - prior: π(θ), the guess from past information before collecting data
  - joint of (θ, y): f(θi, yj) = π(θi) f(yj | θi)
  - likelihood function: f(yj | θi), the universe reduced to the observed yj
  - posterior PDF: update the belief by the observed value,
    π(θi | yj) = f(θi, yj) / f(yj) = π(θi) f(yj | θi) / Σi π(θi) f(yj | θi) ∝ π(θi) f(yj | θi)
Example

• Example
  - There are 5 balls in an urn, and the number of red balls, θ = 0, 1, 2, …, 5, is of our interest.
  - One ball is randomly drawn, which is an observation: y = red(1) or not(0).
  - What is the posterior?
    Posterior = (Prior × Likelihood) / Σi (Prior × Likelihood), i.e., π(θi | y) ∝ π(θi) f(y | θi)

• Calculation table: a sample of size one, y = 1

  θi   prior  likelihood  prior × likelihood  posterior
  0    1/6    0           0                   0
  1    1/6    1/5         1/30                1/15
  2    1/6    2/5         1/15                2/15
  3    1/6    3/5         1/10                1/5
  4    1/6    4/5         2/15                4/15
  5    1/6    1           1/6                 1/3
  sum                     1/2                 1

• More than one observation: y1, y2, y3, …
  - keep drawing one ball w/o replacement
  - all together at one time:
    π(θi | y1, y2) = f(θi, y1, y2) / f(y1, y2) = π(θi) f(y1, y2 | θi) / Σi π(θi) f(y1, y2 | θi)
  - or sequentially, one by one:
    π(θi | y1, y2) ∝ π(1)(θi | y1) × f(y2 | θi, y1)

• Sequential table: a sample of size two, y1 = 1, y2 = 0. The posterior for y1 = 1 becomes the prior; with one red ball removed, the likelihood of y2 = 0 given θ red balls is (5-θ)/4:

  θi   prior  likelihood (y2=0)  prior × likelihood  posterior
  0    0      ?                  0                   0
  1    1/15   1                  1/15                1/5
  2    2/15   3/4                1/10                3/10
  3    1/5    1/2                1/10                3/10
  4    4/15   1/4                1/15                1/5
  5    1/3    0                  0                   0
  sum                            1/3                 1
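The urn tables above can be reproduced exactly with rational arithmetic, and the sequential update can be checked against the all-at-once update (a sketch; function names are illustrative).

```python
from fractions import Fraction as F

def update(prior, likelihood):
    """Posterior proportional to prior x likelihood, normalized to sum to 1."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]

# Urn with 5 balls; theta = number of red balls = 0..5, uniform prior 1/6 each.
prior = [F(1, 6)] * 6

# First draw is red (y1 = 1): likelihood theta/5.
post1 = update(prior, [F(t, 5) for t in range(6)])

# Second draw, without replacement, is not red (y2 = 0): likelihood (5-theta)/4
# once a red ball has been removed (theta = 0 is already ruled out by y1 = 1).
post2 = update(post1, [F(5 - t, 4) if t >= 1 else F(0) for t in range(6)])

# All at once: f(y1=1, y2=0 | theta) = (theta/5) * ((5-theta)/4).
post_joint = update(prior, [F(t * (5 - t), 20) for t in range(6)])
```

The sequential and all-at-once posteriors agree, which is the point of the sequential formula: Bayesian updating can be done observation by observation.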
More and Famous PDFs

• Sequential calculation table, done all at once: y1 = 1, y2 = 0, with f(y1, y2 | θ) = (θ/5)·((5-θ)/4):

  θi   prior  likelihood  prior × likelihood  posterior
  0    1/6    0           0                   0
  1    1/6    1/5         1/30                1/5
  2    1/6    3/10        1/20                3/10
  3    1/6    3/10        1/20                3/10
  4    1/6    1/5         1/30                1/5
  5    1/6    0           0                   0
  sum                     1/6                 1

• Binomial
  - In n Bernoulli trials (p), suppose Y is the number of successes. Then Y | p ~ B(n, p).
  - Likelihood: f(Y | pj)
• Selection of a prior for p
  - discrete prior p1, p2, …, pI
  - prior(pi) × conditional(y | pi) = joint(y, pi); π(pi | y) ∝ π(pi) f(y | pi)
  - example: Y = 3 observed when n = 4; the normalizing constant is 1/4

  p    prior  likelihood Y=3|p  joint (prior × L)  posterior
  0.4  1/3    0.154             2/39               0.205
  0.5  1/3    0.250             1/12               0.333
  0.6  1/3    0.346             3/26               0.462
  sum                           1/4                1

• Benefits of the Bayesian approach
  - a better understanding of interval bounds for θ: confidence interval vs. credible interval
  - based on the likelihood principle
    · 3 heads are observed in 12 coin tosses to test θ = 0.5:
      P(Y = 3; Y ~ Binomial(12, θ)) vs. P(Y = 3; Y ~ NB(3, θ))
    · For a Frequentist, the two models give different results.
    · For a Bayesian, the likelihood function is the same.
  - easily updated with extra data
  - large-sample theory is not needed.
• Conjugate prior: a prior whose posterior belongs to the same family of PDFs
  - Beta(α, β) prior for p
  - Show that the posterior is Beta(y + α, n - y + β).
• Special case: no information.
Famous PDFs (2)

• Bayes Theorem property: the Bayesian result does not change by
  - multiplying the prior by a constant, or
  - multiplying the likelihood by a constant: π(θi | yj) ∝ π(θi) f(yj | θi)
• Poisson Distribution
  - Discrete prior, with Y = 2 observed:

  λ    prior  likelihood Y=2|λ  joint (prior × L)  posterior
  1    1/6    0.184             0.0307             0.124
  1.5  1/3    0.251             0.0837             0.338
  2    1/3    0.271             0.0902             0.365
  2.5  1/6    0.257             0.0428             0.173
  sum                           0.2473             1

  - Conjugate prior: π(λ) ~ Gamma(α, β); then the posterior is Gamma(α + y, β/(β + 1)).
• Non-informative prior
  - used when there is no information about θ
  - well-known non-informative priors:
    · for 0 ≤ θ ≤ 1: Uniform(0, 1)
    · for 0 ≤ θ: Gamma(α, β)
    · for -∞ < θ < ∞: Normal(μ, σ → ∞)
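The Gamma-Poisson conjugacy stated above (with Gamma in the scale parameterization, so the scale shrinks from β to β/(β+1)) can be checked on a grid; the α, β, y values and grid are assumptions for the demonstration.

```python
from math import exp

def gamma_kernel(lam, shape, scale):
    """Unnormalized Gamma(shape, scale) density: lam^(shape-1) * exp(-lam/scale)."""
    return lam ** (shape - 1) * exp(-lam / scale)

def poisson_likelihood(y, lam):
    """Poisson likelihood up to a constant: the 1/y! factor is dropped,
    since multiplying the likelihood by a constant changes nothing."""
    return exp(-lam) * lam ** y

alpha, beta, y = 2.0, 1.5, 3                 # assumed prior and single observation
grid = [0.01 * i for i in range(1, 1500)]
post = [gamma_kernel(l, alpha, beta) * poisson_likelihood(y, l) for l in grid]
conj = [gamma_kernel(l, alpha + y, beta / (beta + 1)) for l in grid]
s1, s2 = sum(post), sum(conj)
```

Multiplying out the exponents shows why: λ^(α-1) e^(-λ/β) · e^(-λ) λ^y = λ^(α+y-1) e^(-λ(β+1)/β), the kernel of Gamma(α + y, β/(β + 1)).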
Exercise

• #1: when Y = 7
• #2: slides 22 & 23
Continuous Random Variable

• Continuous Random Variable
  - X(w) = x; a real-valued function
  - The domain of w is infinite.
  - P(X = x) = 0; probability = area
  - It is not easy to derive the PDF ⇔ histogram + polygon
• CDF: F(x) = P(X ≤ x) = ∫_{-∞}^{x} f(t) dt
• Expected value: E(u(X)) = ∫ u(x) f(x) dx
• Convergence to X in probability
  - WLLN: lim_{n→∞} P(|Xn - X| < ε) = 1
• Convergence to F(x) in distribution
  - CLT: lim_{n→∞} F_{Xn}(x) = F(x)
• Exponential Family: f(x; θ) = A(x) B(θ) exp(C(θ) T(x))
  - The sampling dist. of the sum of T(x) is also in the exponential family.
• Relationships
  - Standard Normal and Chi-Square: Z² ~ χ²(df = 1)
  - T-Dist.: T(df = k) = Z / √(χ²(k)/k)
  - F-Dist.: F(a, b) = (χ²(a)/a) / (χ²(b)/b); T(b)² ~ F(1, b)
  - Cauchy Dist.: Z1/Z2 ~ Cauchy
  - Log-normal: exp(Z) ~ Log-Normal
Continuous Random Variable (cont.)

• Gamma Dist. and Poisson Dist.
  - For integer α: F_{G(α, β)}(w) = 1 - P(X ≤ α - 1 | X ~ Poisson(λ = w/β))
• Gamma, Chi-square, and Beta
  - G(α = r/2, β = 2) ~ χ²(r)
  - G(α, λ) / (G(α, λ) + G(β, λ)) ~ Beta(α, β)
• Exponential (memory-less) and Weibull
  - G(α = 1, β) ~ Exp(β)
  - [Exp(β)]^(1/γ) ~ Weibull(γ, β)
• Sampling Dist. of the Sum of a Random Sample
  - additivity: Poisson, Chi-square, Normal
  - Exponential, Geometric: their sums give the Gamma and Negative Binomial
  - multiplication: Log-normal
S i l ThiS i l ThiSpecial ThingsSpecial Things
�Moment Generating Function�Uniqueness of MGF
�Joint continuous and discrete�Hierarchical and Mixture model)()( tx
X eEtM =Uniqueness of MGF
�Empirical Rule
Hierarchical and Mixture model
�Mean and Variance of Y
)()(X
)()0()( kkX XEtM ==
)()()()( xFyFtMtM XYYX =⇔=),(~
),(~|βαBetap
pnBinomialpY
hnu
.ac.
kr Contin
p• 68%, 95%, 99%, 6-sigma, 2ppm
�Mean and Variance of Y
))|(())|(()())|(()(
XYEVXYVEYVXYEEYE
+==
tp://
wol
fpa
ck.h u
ous R
and
om V
�Independency and Correlation� (X, Y) are independent Ù f(x,y)=f(x)g(y)
htt V
aria
ble
�Chebychev’s Inequality
�상관계수 ρ=±1 Ù There is a≠0 s.t. P(Y=aX+b)=1
�When (X,Y) are bivariate normal, (X,Y) are independent Ù COV(X,Y)=0
211)|(|k
kXP −≥≤− σμ
(28)Prof. Sehyug Kwon, Dept. of Statistics, HANNAM University Lecture of 2009 Fall
Priors

• Unobservable Parameter
  - population proportion p; prior: π(p)
• Observed Random Sample
  - In n Bernoulli trials, let Y be the number of successes.
  - Likelihood: f(y1, y2, …, yn | p)
• Calculate the Posterior
  - ∝ Prior × Likelihood
• Choosing a Prior
  - Uniform prior: π(p) ~ Uniform(0, 1), i.e., no information
    · posterior: π(p|y) ∝ C(n, y) p^y (1-p)^(n-y) · 1_(0,1)(p), so p | y ~ Beta(y+1, n-y+1)
  - Conjugate prior: π(p) ~ Beta(a, b)
    · same PDF family (exponential family only)
    · posterior: π(p|y) ~ Beta(a+y, b+n-y)
    · (a, b) are both shape parameters (figure: Beta densities, e.g., a = b = 5)
Priors (2)

• Choosing a Prior: before looking at the data
  - A conjugate prior is recommended even when the information is vague; refer to the shape of the Beta PDF.
  - Match the location and scale by solving the following two equations:
    mean: p0 = a / (a + b)
    std:  σ0² = ab / ((a + b)²(a + b + 1))
• Jeffreys Prior
  - The uniform prior is not invariant under reparameterization.
  - Invariance property for a prior: for ψ = h(θ), π~(ψ) = π(h⁻¹(ψ)) |dθ/dψ|
  - Invariant prior for the parameter θ: π(θ) ∝ √I(θ)
  - Fisher Information I(θ)
    · score statistic: U(θ) = d log L(y; θ)/dθ
    · E(U) = 0, V(U) = Fisher information
    · I(θ) = E[(d log L(y; θ)/dθ)²] = -E[d² log L(y; θ)/dθ²]
  - In the binomial proportion case (show that): π(θ) ~ Beta(1/2, 1/2)
• Preparation
  - Graph your guess for Beta(a, b).
  - Calculate the equivalent sample size of the prior: set the variance of a sample proportion equal to the prior variance,
    V(p̂) = V(y/n_eq) = p0(1 - p0)/n_eq = ab / ((a + b)²(a + b + 1))  ⇒  n_eq = a + b + 1
Example: Priors

• Example
  - We want to know the proportion p of residents who are in favor of a new building.
  - observed: n = 100, y = 26 (in favor)
  - 3 different priors
• Bart: conjugate prior, Beta(a, b)
  - mean = 0.2, std = 0.08 => a = 4.8, b = 19.2
  - equivalent sample size n_eq = 25
• Anna: no information, Uniform prior = Beta(1, 1)
  - equivalent sample size n_eq = 3
• Chris: discrete weights, not a Beta prior; the sum of the weights need not be 1.

  p     weight
  0     0
  0.05  1
  0.1   2
  0.3   2
  0.4   1
  0.5   0

  Interpolated: π(p) = 20p for 0 ≤ p ≤ 0.1; π(p) = 2 for 0.1 ≤ p ≤ 0.3; π(p) = 5 - 10p for 0.3 ≤ p ≤ 0.5.
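Bart's prior can be derived by solving the two matching equations from the previous slide; the function names below are illustrative.

```python
def beta_from_mean_std(m, s):
    """Solve a Beta(a, b) prior from a guessed mean m and std s:
    m = a/(a+b) and s^2 = ab/((a+b)^2 (a+b+1))."""
    n0 = m * (1 - m) / s ** 2 - 1    # a + b
    return m * n0, (1 - m) * n0      # (a, b)

def equivalent_sample_size(a, b):
    """n_eq = a + b + 1: the sample size whose proportion has the prior's variance."""
    return a + b + 1

# Bart's guess: mean 0.2, std 0.08.
a, b = beta_from_mean_std(0.2, 0.08)
```

This reproduces the slide's Beta(4.8, 19.2) and an equivalent sample size of 25, i.e., Bart's prior carries as much information as 25 observations, small relative to n = 100.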
Posteriors

• Posterior (n = 100, y = 26)
  - Bart ~ Beta(4.8 + 26, 19.2 + 74) = Beta(30.8, 93.2)
  - Anna ~ Beta(1 + 26, 1 + 74) = Beta(27, 75)
  - Chris ~ his discrete-weight prior times the likelihood, ∝ π(p) p^y (1-p)^(n-y), evaluated numerically on the interval [0, 0.5]
Wrapping Up

• Effect of the Prior
  - more data, less effect of our priors (in the previous example, n = 100) ⇔ The data is said to swamp the prior.
• Summarizing the Posterior, Beta(α, β)
  - not confidence intervals but credible intervals; why? the posterior is an exact PDF
  - measure of location: posterior mode, posterior median, posterior mean
  - measure of spread: variance, std, percentiles
    · the 100k-th percentile k_π satisfies ∫_{-∞}^{k_π} π(p|y) dp = k
• Estimating the proportion
  - MSE (mean squared error) of an estimator: MSE(p̂) = V(p̂) + B(p̂)²
• Bayesian Estimator?
  - squared error loss function: L(θ, θ̂) = (θ - θ̂)²
  - absolute error loss function: L(θ, θ̂) = |θ - θ̂|
  - Bayes risk: the expectation of the loss function,
    Bayes Risk(θ̂) = ∫ R(θ, θ̂) π(θ) dθ = ∫∫ L(θ, θ̂) f(y|θ) π(θ) dy dθ
  - The posterior mean minimizes the Bayes risk for the squared error loss function.
  - The posterior median minimizes it for the absolute error loss fn. (Theorem 10.3.2, G. Casella and R. Berger, Statistical Inference, 1st edition)
Bayesian Inference
Wrapping up (2)
�Bayesian Credible Interval
• using percentiles of the posterior π(p|y)
• not confidence intervals
�Exercise 1
• do the same things with n=87 and y=6.
�Exercise 2
• do the same things with a non-informative prior.
*) how to do the numerical calculation

Type   Prior            Posterior         Mean   Median  STD    IQR    Credible Lower  Upper
Anna   No prior         Beta(27, 75)      0.265  0.263   0.043  0.059  0.188           0.354
Bart   Beta(4.8, 19.2)  Beta(30.8, 93.2)  0.248  0.247   0.039  0.053  0.177           0.328
Chris  Discrete Weight  Numerical*        0.261  0.255   0.041  0.057  0.181           0.340
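The table's entries for Anna can be reproduced numerically. A minimal sketch (not from the lecture; standard library only): the mean and std of a Beta(a, b) posterior in closed form, plus grid-based quantiles for the median and the 95% credible interval.

```python
import math

def beta_pdf(x, a, b):
    # Beta(a, b) density via log-gamma for numerical stability
    logc = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(logc + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_summary(a, b, n_grid=100_000):
    # closed-form mean and std
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    # accumulate the CDF on a grid and read off the target quantiles
    h = 1.0 / n_grid
    targets = [0.025, 0.5, 0.975]
    quantiles = {}
    acc, ti = 0.0, 0
    for i in range(1, n_grid):
        acc += beta_pdf(i * h, a, b) * h
        while ti < len(targets) and acc >= targets[ti]:
            quantiles[targets[ti]] = i * h
            ti += 1
    return mean, std, quantiles[0.5], quantiles[0.025], quantiles[0.975]

mean, std, med, lo, hi = beta_summary(27, 75)   # Anna's posterior
```

The same routine applied to Beta(30.8, 93.2) gives Bart's row of the table.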
Bayesian Inference
Comparing Bayesian and Frequentist
�Comparing Estimators
�Frequentist: sampling dist. of the estimator p̂_f = y/n
�Bayesian: posterior of p
�Likelihood function
� for updating the prior
� for calculating the more likely value of the parameter
�Frequentist (pre-data analysis)
�Find the MVUE (point estimator)
• RCRB gives the lower bound for the variance of an estimator.
• Find the MLE (⇔ CSS) by the Factorization Theorem or the exponential family: the more likely parameter value over all possible random samples
• R-B Theorem: V(E(UE|SS)) ≤ V(UE)
�Calculate the CI with the sampling distribution of the MVUE
�Bayesian (post-data: what if this actual data is obtained)
• conjugate prior: Beta(a, b)
• observed y out of n
• posterior: Beta(a+y, b+n−y)
�Precision of Estimator: MSE
MSE(θ̂) = E(θ̂ − θ)² = V(θ̂) + B(θ̂)²
�Frequentist: p̂_f = y/n
MSE(p̂_f) = V(p̂_f) + B(p̂_f)² = p(1−p)/n + 0
�Bayesian: p̂_b = (a+y)/(a+b+n)
MSE(p̂_b) = V(p̂_b) + B(p̂_b)² = [np(1−p) + (a(1−p) − bp)²] / (a+b+n)²
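The two MSE expressions above can be compared directly. A small sketch (illustrative, not from the lecture):

```python
def mse_freq(p, n):
    # y/n is unbiased, so its MSE is just the variance p(1-p)/n
    return p * (1 - p) / n

def mse_bayes(p, n, a, b):
    # Bayes estimator (a+y)/(a+b+n): MSE = variance + squared bias
    var = n * p * (1 - p) / (a + b + n) ** 2
    bias = (a * (1 - p) - b * p) / (a + b + n)
    return var + bias ** 2
```

With a uniform Beta(1, 1) prior and n=100, the Bayesian MSE is smaller than the frequentist MSE near p = 0.5 but larger near the extremes, which is the trade-off the next slide's plot shows.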
Bayesian Inference
Comparing Bayesian and Frequentist (2)
�Example
�Frequentist: solid black line
�no-information, Uniform = Beta(1,1) prior: blue line
�conjugate prior Beta(2,3): dotted black line
• the Beta(2,3) prior is skewed to the right, which is why the Bayesian MSE is the smallest when the parameter θ is small
[Figure: MSE curves of the three estimators as functions of θ]
Bayesian Inference
Comparing Bayesian and Frequentist (3)
�Interval estimation
�Frequentist: confidence interval
• pre-data concept
• using the sampling dist. of the best estimator: θ̂ ± z_{α/2} s(θ̂)
• or large sample theory: (θ̂ − E(θ̂))/s(θ̂) ~ N(0, 1), so p̂ ± z_{α/2} √(p̂(1−p̂)/n)
• previous example: n=100, y=26, 95% confidence interval = (0.174, 0.346)
�Bayesian: credible bound
• post-data
• using the posterior dist. of θ, π(θ|y)
• previous example: see the slide of page 34
�Hypothesis Testing
�statistical hypothesis: null, alternative
� the Best Rejection Region by the Neyman-Pearson Lemma (LR): UMPT
�LR Test: function of SS, −2ln(LR) ~ approximately Chi-Square
�power function: significance level, power
�Frequentist
• One-to-one relationship between CI and Testing
• did the statistic move "by chance" or significantly?
• proof by contradiction: does the null hypothesis remain valid?
• (1) Set up the hypotheses. (2) Calculate the test statistic. (3) Calculate the p-value. (4) Draw the conclusion.
�Bayesian
• one-sided: using the posterior, like the Frequentist
• two-sided: using the Credible Interval
Bayesian Inference
Comparing Bayesian and Frequentist (4)
�Example in One-sided Hypothesis Testing
�Better treatment effect? old p=0.6. Is the new treatment effective? 8 patients are cured out of 10 random patients. (sig. level = 10%)
�Frequentist
• Hypotheses: null p=0.6, alternative p>0.6
• Test statistic: y = # of successes ~ sampling distribution Binomial(n=10, p=0.6)
• p-value = P(y≥8 | y~B(10, 0.6)) = 0.1672
• can't reject H0
�Bayesian
• uniform prior
• Posterior is Beta(9, 3).
• p-value = ∫_0^{p0} π(p|y) dp = 0.1189
�Example in Two-sided Hypothesis Testing
�Is a coin fair? n=15, y = # of heads (10)
�Frequentist
• Hypotheses: null p=0.5, alternative p≠0.5
• Test statistic: y = # of successes ~ sampling distribution Binomial(n=15, p=0.5)
• p-value = P(y≥10 | y~B(15, 0.5)) + P(y≤5 | y~B(15, 0.5)) = 0.302
• can't reject H0
�Bayesian
• Using the Credible Interval. why? Pr(p = p0) = 0 for a continuous posterior
• Posterior using a Uniform prior: Beta(11, 6)
• using the normal approximation: (0.426, 0.868)
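All three worked p-values can be checked with a few lines (a sketch using only `math.comb`; the Beta tail is computed via the identity P(Beta(a,b) ≤ x) = P(Binomial(a+b−1, x) ≥ a), valid for integer a, b):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_tail_ge(k, n, p):
    # P(Y >= k) for Y ~ Binomial(n, p)
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

# one-sided frequentist test: P(Y >= 8 | Binomial(10, 0.6))
p_one = binom_tail_ge(8, 10, 0.6)
# two-sided frequentist test: P(Y >= 10) + P(Y <= 5) under Binomial(15, 0.5)
p_two = binom_tail_ge(10, 15, 0.5) + sum(binom_pmf(j, 15, 0.5) for j in range(6))
# Bayesian posterior probability P(p <= 0.6) under Beta(9, 3),
# via the Beta-Binomial identity with integer parameters
p_bayes = binom_tail_ge(9, 11, 0.6)
```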
Bayesian Inference
Priors for Poisson(μ)
�Parameter: mean = μ
�Likelihood
�a single observation: f(y|μ) = μ^y e^{−μ} / y!
�a random sample: f(y1, y2, …, yn | μ) ∝ μ^{Σyi} e^{−nμ}
�Posterior
�∝ Prior × Likelihood: π(μ|y) ∝ π(μ) f(y1, y2, …, yn | μ)
�Conjugate prior
• same PDF family as the likelihood function
• the likelihood has a Gamma shape in μ, so take Prior ~ Gamma: π(μ) ~ Gamma(α, β)
• α = shape, β = rate (shape-scale form: k = α, θ = 1/β)
• mean = α/β, variance = α/β²
�Choosing the Prior
�Uniform prior: π(μ) ~ Uniform(0, ∞)
• no information, improper
• Posterior: π(μ|y) ∝ μ^{Σyi} e^{−nμ} ~ Gamma(Σyi + 1, n)
� Jeffreys' prior (show that): π(μ) ∝ √I(μ) = 1/√μ
• improper
• Posterior ~ Gamma(Σyi + 0.5, n)
�Gamma(α, β) prior
• Posterior: π(μ|y) ∝ μ^{Σyi} e^{−nμ} · μ^{α−1} e^{−βμ} ~ Gamma(α + Σyi, β + n)
Bayesian Inference
Priors
�Choosing the Prior
�A conjugate prior is recommended. why? the Posterior has the same pdf family as the prior
�With prior information on the mean and std
• m0 = α/β, σ0² = α/β²  =>  β = m0/σ0², α = m0²/σ0²
�Warning
• Graph your prior.
• Check the equivalent sample size of your prior:
  prior variance = variance of the sample mean: α/β² = (α/β)/n_eq  =>  n_eq = β
• Is the equivalent sample size too large? Then you put in too much prior information relative to the amount you will get from the data.
�Example
�number of traffic accidents per week = μ
�observed data
• count the number of accidents for 8 weeks
�4 priors
• Aretha: no prior ~ Uniform(0, ∞), π(μ) = 1, μ > 0
• Byron ~ Jeffreys prior, π(μ) = 1/√μ, μ > 0
• Chase ~ his belief, mean=2.5, std=1 ~ Gamma(α=6.25, β=2.5), equivalent sample size = 2.5 <- acceptable
• Diana ~ subjective weights
Bayesian Inference
Posteriors
�observed data
• 3, 2, 0, 8, 2, 4, 6, 1
• Σy = 26, n = 8
�Posteriors: π(μ|y) ~ Gamma(α*, β*)
�Aretha ~ Gamma(27, 8)
�Byron ~ Gamma(26.5, 8)
�Chase ~ Gamma(32.25, 10.5)
�Diana ~ Numerically
• very similar; the posterior doesn't depend on the prior much.
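These posterior parameters follow from the conjugate update on the previous slide. A minimal sketch (the uniform and Jeffreys priors enter as the improper limits Gamma(1, 0) and Gamma(0.5, 0)):

```python
def gamma_poisson_update(alpha, beta, data):
    # Gamma(alpha, beta) prior in the rate parameterization,
    # Poisson likelihood: posterior is Gamma(alpha + sum(y), beta + n)
    return alpha + sum(data), beta + len(data)

data = [3, 2, 0, 8, 2, 4, 6, 1]                  # accident counts, 8 weeks
aretha = gamma_poisson_update(1.0, 0.0, data)    # uniform prior  -> (27.0, 8.0)
byron  = gamma_poisson_update(0.5, 0.0, data)    # Jeffreys prior -> (26.5, 8.0)
chase  = gamma_poisson_update(6.25, 2.5, data)   # Gamma(6.25, 2.5) belief prior
```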
Bayesian Inference
Summarizing the Posterior & Inference
�Bayesian estimator
�posterior π(μ|y) ~ Gamma(α*, β*)
�posterior mean or median
• posterior mean: μ̂_b = α*/β* = (Σy + α)/(n + β)
• posterior mode: (α* − 1)/β*
�Inference on the parameter μ
�point estimators and their PMS
• Bayes: B(μ̂_b) = (α − βμ)/(n + β), V(μ̂_b) = nμ/(n + β)²
  PMS(μ̂_b|y) = V(μ̂_b|y) + B(μ̂_b)²
• Frequentist: μ̂_f = Σy/n = ȳ, MSE(μ̂_f) = μ/n
�Credible Interval
�using π(μ|y) ~ Gamma(α*, β*)
�non-symmetric, but with (α/2)·100% of the posterior in each tail
[Figure: PMS/MSE of the Bayes and frequentist estimators as functions of μ, n=6]
Bayesian Inference
Summarizing the Posterior & Inference (2)
�One-sided testing
�null: μ = μ0
�alternative: μ > μ0 or μ < μ0
�p-value: ∫ π(μ|y) dμ over the parameter space under the null hypothesis, with π(μ|y) ~ Gamma(α*, β*)
�e.g. when μ0 = 3, observed Σy = 26, n = 8
�Two-sided testing
�using the credible interval
�Midterm Exam on 2009.10.30
� It will be a take-home exam.
�Problems will be handed out in class.
�The exam should be taken by yourself.
�On the top of your answer sheet, your honesty signature should appear beside your name.
�Your answer sheet should be turned in to the Department office of Statistics by noon on 2009.11.2 (Monday). Points for a late turn-in will be forfeited.
Bayesian Inference
Discrete Prior for the Normal Mean(μ)
�Widely used? the "average out" effect
�Discrete Prior
�population std σ is known
�equally likely or subjective weights
�Procedure
�Set a subjective prior and a guessed value of σ.
�Calculate the likelihood value
• using a standard normal table: Φ((y − μ)/σ)
• or using the normal PDF: exp(−(y − μ)²/(2σ²))
�Get the Posterior: π(μ|y) ∝ π(μ) l(μ|y) = Prior × likelihood
�Example
� single observation: y = 3.2; see the subjective prior and the result on the slide.
�More than one observation: y1=3.2, y2=2.2, y3=3.6, y4=4.1
� sequentially, one observation at a time
�all together: likelihood ∝ exp(−Σ(yi − μ)²/(2σ²))
�using the MVUE (sample mean): likelihood ∝ exp(−(ȳ − μ)²/(2σ²/n))
• slightly different? round-off error
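The prior-times-likelihood table can be computed directly. A sketch (the support points and subjective weights below are hypothetical, and σ is a guessed value, as the slide suggests):

```python
import math

def discrete_posterior(mus, weights, y, sigma):
    # posterior(mu) proportional to prior(mu) * exp(-(y - mu)^2 / (2 sigma^2)),
    # then renormalized over the discrete support
    like = [math.exp(-((y - m) ** 2) / (2.0 * sigma ** 2)) for m in mus]
    unnorm = [w * l for w, l in zip(weights, like)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

mus = [2.0, 2.5, 3.0, 3.5, 4.0]        # hypothetical support points
prior = [0.1, 0.2, 0.4, 0.2, 0.1]      # hypothetical subjective weights
sigma = 1.0                            # guessed population std

# single observation y = 3.2
post1 = discrete_posterior(mus, prior, 3.2, sigma)

# more than one observation: updating sequentially, one at a time,
# gives the same posterior as using all observations together
post = prior
for yi in [3.2, 2.2, 3.6, 4.1]:
    post = discrete_posterior(mus, post, yi, sigma)
```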
Bayesian Inference
Continuous Priors
�Continuous Prior
�choose a prior which is similar in form to the likelihood
�why? Posterior ∝ Prior × likelihood: π(μ|y) ∝ π(μ) × Normal likelihood of y
�conjugate prior: π(μ) ~ Normal(m, s²)
�Posterior ~ Normal(m', s'²):
π(μ|y) ∝ Normal(m, s²) × Normal(ȳ, σ²/n)
1/s'² = 1/s² + n/σ²
m' = (m/s²)/(1/s² + n/σ²) + (ȳ·n/σ²)/(1/s² + n/σ²)
�Flat prior
�no information ~ Uniform(−∞, ∞)
�Posterior ~ Normal(ȳ, σ²/n)
�Subjective prior
� interpolate and use numerical integration
� refer: how to calculate an integral in R
�How to choose a Normal prior
�guess a mean (m) first
� then choose a std (s) with the empirical rule
�use the "equivalent sample size": s² = σ²/n_eq
• large n_eq: the prior influences too much.
�Example
�mean length of one-year-old rainbow trout
�previous study ~ Normal with σ = 2
• Annie ~ Normal(30, 4²) prior
• Barb ~ flat prior
• Chuck ~ trapezoidal prior: 0 at 18, rising over 18 to 24, weight 1 on 24 to 40, falling to 0 at 46
�observed data: n = 12, ȳ = 32
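The trout example's posterior on the next slide can be verified with the precision formulas above (a sketch; the numbers are from the example):

```python
def normal_update(m, s2, sigma2, n, ybar):
    # conjugate update for a Normal mean with known variance sigma2:
    # posterior precision = prior precision + data precision
    prec = 1.0 / s2 + n / sigma2
    post_var = 1.0 / prec
    post_mean = ((m / s2) + (n * ybar / sigma2)) / prec
    return post_mean, post_var

# Annie: Normal(30, 4^2) prior, sigma = 2, n = 12, ybar = 32
m_post, v_post = normal_update(30.0, 16.0, 4.0, 12, 32.0)
```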
Bayesian Inference
Posterior
�Priors
�Annie: Normal(30, 4²)
�Barb: flat, π(μ) = 1
�Chuck: trapezoidal, π(μ) ∝ (μ−18)/6 for 18<μ<24; 1 for 24<μ<40; (46−μ)/6 for 40<μ<46
�Posteriors
�Annie: Normal(31.96, 0.3265) (variance 0.3265, std 0.5714)
�Barb: Normal(32, σ²/n) (std 2/√12 = 0.5774)
�Chuck: numerically
Bayesian Inference
Credible Interval
�Credible interval for μ
�using the posterior
�known variance
• precision is the reciprocal of the variance
• posterior precision: 1/s'² = 1/s² + n/σ²
• posterior mean: m' = (m/s²)/(1/s² + n/σ²) + (ȳ·n/σ²)/(1/s² + n/σ²)
• interval: m' ± z_{α/2} s'
�unknown variance
• estimator: the sample variance σ̂²
• the t-dist. is used: m' ± t_{α/2} s'
�Hypothesis testing
�one-sided: H0: μ = μ0 vs. Ha: μ < μ0
• p-value = ∫ π(μ|y) dμ over the parameter space under the null hypothesis
� two-sided
• using the credible interval
�Non-normal prior
• the posterior is not normal.
• numerically calculate the interval with equal posterior area in each tail.
Bayesian Inference
Predictive Density
�For the next observation f(y_{n+1}|y)
�marginalization process
� find the joint pdf of the next obs. and the parameter, given the random sample.
� the parameter is treated as a nuisance parameter.
� integrate over μ to get the conditional pdf:
f(y_{n+1} | y1, …, yn) = ∫ f(y_{n+1}|μ) g(μ | y1, …, yn) dμ
∝ ∫ N(μ, σ²) N(m', s'²) dμ
= N(m'_{n+1} = m', s'²_{n+1} = s'² + σ²)
�one of the advantages of the Bayesian approach
�Procedure
� random sample: y1, y2, …, yn, y_{n+1} ~ iid Normal(μ, σ² known)
�given the parameter, the joint pdf of the random sample: f(y1, y2, …, yn, y_{n+1} | μ)
� let g(μ) be the prior of μ
� f(y_{n+1}, μ | y1, …, yn) = f(y_{n+1} | μ, y1, …, yn) × g(μ | y1, …, yn)
= f(y_{n+1} | μ) × g(μ | y1, …, yn)   {by the r.s. (iid) assumption}
�Exercise #1
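In the conjugate Normal case the marginalization reduces to simple parameter arithmetic. A sketch, reusing the trout example's numbers for illustration:

```python
def predictive_params(m, s2, sigma2, n, ybar):
    # posterior for mu: Normal(m', s'^2); predictive for y_{n+1}:
    # Normal(m', s'^2 + sigma^2), i.e. parameter uncertainty plus sampling noise
    prec = 1.0 / s2 + n / sigma2
    m_post = ((m / s2) + (n * ybar / sigma2)) / prec
    s2_post = 1.0 / prec
    return m_post, s2_post + sigma2

pred_mean, pred_var = predictive_params(30.0, 16.0, 4.0, 12, 32.0)
```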
Bayesian Inference
Exercise
�#2
�#3
Bayesian Inference
Comparing with Frequentist
�Frequentist MVUE: μ̂_f = ȳ ~ N(μ, σ²/n)
�Bayesian estimator
μ_b = (1/s²)/(1/s² + n/σ²) · m + (n/σ²)/(1/s² + n/σ²) · ȳ
�E(μ_b)
B(μ_b) = E(μ_b) − μ = σ²/(ns² + σ²) · (m − μ)
�V(μ_b); clearly smaller than V(μ̂_f)
V(μ_b) = (ns²/(ns² + σ²))² · σ²/n
�Comparing MSE = B² + V
�example
• estimate the milk powder weight μ (= 1 on the label)
• the machine produces μ = 1015, σ = 5, n = 10
• Arnold prior ~ N(1000, 10²)
• Beth prior ~ N(1015, 7.5²)
• Carol prior ~ flat ⇔ Frequentist
• feasible area: 1015 ± 3·5 => (1000, 1030)
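The bias and variance formulas above can be evaluated at the machine's true setting μ = 1015 (a sketch; Carol's flat prior is the s² → ∞ limit, which reproduces the frequentist MSE σ²/n):

```python
def bayes_mse(mu, m, s2, sigma2, n):
    # MSE of the posterior-mean estimator under true mean mu
    w = sigma2 / (n * s2 + sigma2)                     # weight on the prior mean
    bias = w * (m - mu)
    var = (n * s2 / (n * s2 + sigma2)) ** 2 * sigma2 / n
    return var + bias ** 2

mu, sigma2, n = 1015.0, 25.0, 10
freq = sigma2 / n                                      # Carol / frequentist: 2.5
arnold = bayes_mse(mu, 1000.0, 100.0, sigma2, n)       # N(1000, 10^2) prior
beth = bayes_mse(mu, 1015.0, 56.25, sigma2, n)         # N(1015, 7.5^2) prior
```

Beth's well-centered prior beats the frequentist MSE; Arnold's off-center prior pays a bias penalty at this μ.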
Bayesian Inference
Comparing with Frequentist (2)
�Confidence and Credible Interval
�Frequentist: ȳ ± z_{α/2} σ/√n
• unknown σ²: using s² and the t-distribution
�Bayesian: m' ± z_{α/2} s'
1/s'² = 1/s² + n/σ²
m' = (m/s²)/(1/s² + n/σ²) + (ȳ·n/σ²)/(1/s² + n/σ²)
• Bayesian with a flat prior = Frequentist
�Testing Hypothesis
�One-sided: H0: μ = μ0, Ha: μ > μ0
• Sampling dist.: ȳ ~ N(μ0, σ²/n); posterior: μ_b ~ N(m', s'²)
• Calculate the p-value:
p_f = P(z ≥ (ȳ − μ0)/(σ/√n))
p_b = ∫_{−∞}^{μ0} π(μ|y) dμ
• conclude
�Two-sided: using the credible interval
Bayesian Inference
Two-population Mean Difference (μ1−μ2)
�Independent samples
�Population 1: x ~ f(x;θ1) ~ Normal(μ1, σ1); Population 2: y ~ g(y;θ2) ~ Normal(μ2, σ2)
�Random samples, independently and identically distributed:
(x1, x2, …, x_{n1}) and (y1, y2, …, y_{n2})
�compute the statistic (estimate) and determine its distribution
• x̄ − ȳ ~ Normal; p̂1 − p̂2 ~ Normal
�parameters: mean difference μ1−μ2, proportion difference p1−p2
�equal variance: a population assumption
�unknown sigma: t, or large sample theory
�Paired sample
�one population of pairs: (xi, yi) ⇒ Di = (xi − yi)
�Normal(μd = μ1 − μ2, σd² = σ1² + σ2²)
�parameter: mean μd = (μ1 − μ2)
�D̄ ~ Normal, t
Bayesian Inference
Independent R.S. from two normal populations (known equal variance)
�Equal variance (σ1 = σ2 = σ) known
�with normal or flat priors, for each population i:
m_i' = (m_i/s_i²)/(1/s_i² + n_i/σ²) + (ȳ_i·n_i/σ²)/(1/s_i² + n_i/σ²)
(s_i')² = σ² s_i² / (n_i s_i² + σ²)
� since μ_i ~ N(m_i', (s_i')²), the posterior of μd = μ1 − μ2 is
μd ~ N(m_d' = m_1' − m_2', (s_d')² = (s_1')² + (s_2')²)
�Credible interval
m_d' ± z_{α/2} √((s_d')²)
�Confidence Interval for the Frequentist
(ȳ1 − ȳ2) ± z_{α/2} σ √(1/n1 + 1/n2)
• same as the flat-prior credible interval
�One-sided test: H0: μd = 0, Ha: μd > 0
p = P(z ≤ (0 − m_d')/s_d')
�Two-sided test
�using the credible interval
Bayesian Inference
Independent R.S. from two normal populations (unknown equal variance and more)
�Equal variance (σ1 = σ2) unknown
�with flat priors: m_i' = ȳ_i, (s_i')² = σ̂²/n_i
� since σ is unknown, it is estimated by the pooled variance
s_p² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)
�By the Frequentist approach, the credible interval for the mean difference μd:
(ȳ1 − ȳ2) ± t_{α/2} s_p √(1/n1 + 1/n2)
�One-sided test
• using the t-dist.: t for μd with df = n1 + n2 − 2
�Unequal Variance (σ1 ≠ σ2)
�Known σ1, σ2
• with normal or flat priors: μd ~ N(m_d' = m_1' − m_2', (s_1')² + (s_2')²)
�Unknown σ1, σ2
• The sample variances of the two samples estimate the population variances respectively.
• with flat priors, Satterthwaite's adjustment for the df
Bayesian Inference
Independent R.S. from two normal populations (unequal variance)
�Exercise HW-1, due 12.11
Bayesian Inference
Paired Random Sample
�Two Normal populations
�Normal(μ1, σ1) and Normal(μ2, σ2)
�Now it is the same as the one-population mean (μd) case
Di ~ Normal(μd = μ1 − μ2, σd² = σ1² + σ2²)
• Credible Interval and Hypothesis testing (refer to the 13th-week lecture note)
�Exercise HW-2, due 12.11
Bayesian Inference
Two-population Proportion
�Unknown parameters: π1, π2; πd = π1 − π2
�data: y11, y12, …, y1n1 and y21, y22, …, y2n2
• the y's are 0 or 1
�Y1 ~ Binomial(n1, π1), Y2 ~ Binomial(n2, π2)
�Beta priors for π1 and π2: πi ~ Beta(ai, bi)
�posterior ~ Beta(ai', bi'), where ai' = ai + yi, bi' = bi + ni − yi
�Posterior for πd = π1 − π2
�approximately normal (m_d', (s_d')²):
m_d' = a1'/(a1' + b1') − a2'/(a2' + b2')
(s_d')² = a1'b1'/((a1' + b1')²(a1' + b1' + 1)) + a2'b2'/((a2' + b2')²(a2' + b2' + 1))
�Credible Interval: m_d' ± z_{α/2} s_d'
�Two-sided hypothesis testing: using the credible interval
�One-sided hypothesis testing (Ha: πd > 0)
p = P(H0: πd ≤ 0) = P(πd ≤ 0) = P(z ≤ (0 − m_d')/s_d')
�Exercise HW-3, due 12.11
Bayesian Inference
Simple Regression: Model and Scatter Plot
�Linear model: y_i = a + b·x_i + e_i
�Linearity
�X is deterministic; therefore e and Y are r.v.
�Assumptions
�Error term: e_i ~ iid N(0, σ²)
• Normality: CLT, log transformation
• Independence: DW, lag variable
• Equal variance (homoscedasticity): WLS, division by the problem-making explanatory variable (see the graph)
�Mean assumption: μ_{y|x0} = a + b·x0
�Scatter Plot
�Diagnosis of the relationship
• linearization, e.g. ln(y)
�Pre-recognize outliers and/or influential obs.
• observations (1), (2), (3) in the graph
Bayesian Inference
Simple Regression: estimation and significance
�Model: y_i = a + b·x_i + e_i
�Estimation
�OLS: ordinary least squares, distribution-free estimation; then why is the normal assumption needed?
�MLE
�Gauss-Markov Theorem: OLS is BLUE
� fitted line: ŷ = â + b̂·x
b̂ = Sxy/Sxx = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
â = ȳ − b̂·x̄
�Alternative model: y_i = A + B(x_i − x̄) + e_i
Â = ȳ, B̂ = Sxy/Sxx
�MSE = estimate of σ²: MSE = Σ(y_i − ŷ_i)² / (n − 2)
�Significance
�coefficient t-test
�Goodness of Fit: ANOVA
• coefficient of determination
• correlation coefficient
� (Pop Quiz) SST = Σ(y_i − ȳ)² = (1) Σ(ŷ_i − ȳ)² + (2) Σ(y_i − ŷ_i)²
• which one is SSR and which is SSE? SST = SSR + SSE
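The OLS formulas above fit in a few lines. A sketch on made-up data that lies exactly on the line y = 1 + 2x (so the slope, intercept, and MSE are known in advance):

```python
def ols_fit(xs, ys):
    # b = Sxy/Sxx, a = ybar - b*xbar; MSE = SSE/(n-2) estimates sigma^2
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse / (n - 2)

a, b, mse = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
```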
Bayesian Inference
Bayesian estimators for α and β
�Bayes Theorem: posterior ∝ prior × likelihood
�Reparameterized model: y_i = α_x̄ + β(x_i − x̄) + e_i, so that α_x̄ and β separate
�Joint likelihood for α_x̄ and β
l(α_x̄, β) ∝ exp(−(1/2σ²){SSy − 2β·SSxy + β²·SSx + n(α_x̄ − ȳ)²})
�product of two separate likelihoods:
l(α_x̄, β) ∝ exp(−(β − B)²/(2σ²/SSx)) × exp(−(α_x̄ − A)²/(2σ²/n))
� like normal shapes with means B = SSxy/SSx and A = ȳ, and variances σ²/SSx and σ²/n respectively
�σ̂² = MSE
�Joint prior for α_x̄ and β: π(α_x̄, β) = π(α_x̄)·π(β)
�we can use normal or flat priors
�mean and std for α_x̄
• guess the mean of y at x̄ and use it for m_α
• (upper − lower)/6 = s_α
�mean and std for β
• may choose m_β = 0
• (upper − lower)/6 = s_β
�Joint Posterior
π(α_x̄, β | y) ∝ π(α_x̄)·π(β) × l(α_x̄, β | y)
�dividing into two parts, each normal:
1/(s'_α)² = 1/s_α² + n/σ²,   m'_α = (m_α/s_α² + n·A/σ²) / (1/s_α² + n/σ²)
1/(s'_β)² = 1/s_β² + SSx/σ²,  m'_β = (m_β/s_β² + SSx·B/σ²) / (1/s_β² + SSx/σ²)
Bayesian Inference
Credible Interval and Hypothesis Testing
�Credible interval for the slope β
�known variance: m'_β ± z_{α/2} s'_β
�unknown variance: m'_β ± t_{(α/2; n−2)} s'_β, with σ² estimated by MSE
• same as the Frequentist interval when the prior is flat
�Hypothesis testing
� two-sided: using the credible interval
�One-sided: p = P(β under H0 | data) = ∫ π(β | data) dβ over the H0 region
�Predictive Dist. for a future obs.
�Bayesian: μ_{n+1} = α_x̄ + β(x_{n+1} − x̄)
• predictive distribution (result): f(y_{n+1} | data) ~ Normal(m'_y, (s'_y)²)
m'_y = m'_μ (posterior mean of μ_{n+1}), (s'_y)² = (s'_μ)² + σ²
�Credible interval for the future obs.
• using the z-table, or the t-table with MSE: m'_y ± z_{α/2} s'_y or m'_y ± t_{(α/2; n−2)} s'_y
�Frequentist: E(y|x0) = α + β·x0, and y = α + β·x0 + e ~ Normal
• confidence interval for the mean response
• prediction interval for a future obs.
Bayesian Inference
Example
�Problem
�Solution
Bayesian Inference
Exercise
Bayesian Inference
Resampling
�Definition
� sampling or scrambling the original (sample) data numerous times
�nonparametric method
�Methods
� jackknifing
• computing the statistics of interest (medians, variances, percentiles) for all combinations of the original data or for subsets of the available data
�bootstrapping
• drawing randomly with replacement from a set of data points to estimate the sampling distribution
�permutation tests
• exchanging labels on data points (or scrambling the order of the original data) when performing significance tests
�Randomization (permutation) test
�procedure
• take the original (sample) data
• reorder the data (reshuffle the subscripts)
• calculate the test statistic Λ for testing the null hypothesis
• do the same thing numerous times to get the sampling dist. of Λ and test the null with a p-value
� (example)
• W = height / (weight)^{1/3}
• Test the difference of W by gender.
• Data: W of size n_m (male) and W of size n_f (female)
• The dist. of W is not normal, but very skewed.
• compute the mean difference of W from the original data; for example, 10 units.
• Take n_m from all W of size (n_m + n_f) and assign "male" to the sampled observations.
• Assign "female" to the remaining and compute the means of W separately.
• do the same thing 5,000 (= N) times.
• k = the number of times the mean difference (male − female) went over 10 units. Then k/N is the p-value for the one-sided test.
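The label-shuffling procedure can be sketched directly (the data and the number of permutations below are illustrative, not the W example's real measurements):

```python
import random

def permutation_pvalue(group1, group2, n_perm=5000, seed=1):
    # one-sided permutation test: how often does a label-shuffled
    # mean difference reach the observed mean difference?
    rng = random.Random(seed)
    observed = sum(group1) / len(group1) - sum(group2) / len(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    k = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / (len(pooled) - n1)
        if diff >= observed:
            k += 1
    return k / n_perm

p_same = permutation_pvalue([5, 6, 7, 8], [5, 6, 7, 8])        # no difference
p_diff = permutation_pvalue([10, 11, 12, 13], [1, 2, 3, 4])    # clear difference
```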
Bayesian Inference
Jackknife
�Tukey (1958)
� suggested the Jackknife idea, based on removing data and then recalculating the estimator
�Idea
� sample data of size n: (x1, x2, …, xn)
� leave-one-out mean: x̄_{(−j)} = (1/(n−1)) Σ_{i≠j} x_i
�we know both the sample mean and x̄_{(−j)}, so the jth obs. is
x_j = n·x̄ − (n−1)·x̄_{(−j)}
� similarly, for θ̂ = φ(x1, x2, …, xn), define the pseudo-values
θ̂*_j = n·θ̂ − (n−1)·θ̂_{(−j)}
� the jackknife estimator of θ is θ̂* = (1/n) Σ θ̂*_i
�an approximate sampling error of θ̂*:
V(θ̂*) = V(Σ θ̂*_i / n) = Σ(θ̂*_i − θ̂*)² / (n(n−1))
� the 100(1−α)% conf. interval for θ: θ̂* ± t_{α/2} √V(θ̂*)
�Motivation of the Jackknife
� reduce the bias
�Example
�estimate the species diversity based on n samples?
� total number S = 120 of species in the sample
• it should be an under-estimate
•Burnham and Overton (1978) suggest a Jackknife estimator to reduce the bias:
Ŝ* = S + ((n−1)/n)·f1 = 120 + … = 147.5
Bayesian Inference
Bootstrap
�Idea
� re-sampling of the obtained single sample for generating the sampling dist.
• robustness depends on the sample size; in general, more than 20
�with-replacement sampling
• randomized sampling = without replacement
�Procedure for estimating the bias
�Let θ̂ = φ(x1, x2, …, xn) be the estimator for θ.
�Let θ̂_0 denote the estimate using the original data and θ̂_i the estimate using the ith bootstrap sample.
�The mean of all the P bootstrap estimates, θ̂_B = (1/P) Σ θ̂_i, is used for estimating the bias:
Bias = B(θ̂) = E(θ̂) − θ, and θ̂_B is the bootstrap estimator of E(θ̂).
�Then the bootstrap estimator of the bias is B̂ = θ̂_B − θ̂_0.
�Standard bootstrap confidence intervals
S² = (1/(P−1)) Σ(θ̂_i − θ̂_B)²
θ̂ ± z_{α/2} S
�Efron's percentile confidence intervals
�using the bootstrap sampling dist.: (θ̂_{L,α/2}, θ̂_{H,α/2})
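The bias estimate and the standard-interval ingredients can be sketched with a plain resampling loop (the toy data and resample count are illustrative):

```python
import random

def bootstrap(data, estimator, n_boot=2000, seed=1):
    # returns (bootstrap bias estimate, bootstrap standard error)
    rng = random.Random(seed)
    theta0 = estimator(data)
    thetas = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]   # draw with replacement
        thetas.append(estimator(sample))
    theta_b = sum(thetas) / n_boot                  # mean of bootstrap estimates
    se2 = sum((t - theta_b) ** 2 for t in thetas) / (n_boot - 1)
    return theta_b - theta0, se2 ** 0.5

mean = lambda xs: sum(xs) / len(xs)
bias, se = bootstrap([2.0, 4.0, 6.0, 8.0, 10.0, 12.0], mean)
```

For the sample mean the bias estimate should hover near zero, since the mean is unbiased.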
Bayesian Inference
Wrapping up
�Bootstrap in R
�boot {boot}
•Generates R bootstrap replicates of a statistic applied to data. Both parametric and nonparametric resampling are possible. For the nonparametric bootstrap, possible resampling methods are the ordinary bootstrap, the balanced bootstrap, antithetic resampling, and permutation. For nonparametric multi-sample problems stratified resampling is used; this is specified by including a vector of strata in the call to boot. Importance resampling weights may be specified.
�boot(data, statistic, R, sim="ordinary", stype="i", strata=rep(1,n), L=NULL, m=0, weights=NULL, ran.gen=function(d, p) d, mle=NULL, simple=FALSE, ...)
�Jackknife in R
� jackknife {bootstrap}
• See Efron and Tibshirani (1993) for details on this function.
� jackknife(x, theta, ...)
�Bayesian and Resampling
�"Bayes vs. Resampling: A Rematch", Campbell, Garvey, et al. (2006)
• summary of 3 pages.