Bayesian Inference
Concept

• Statistical Inference: a guide to the unknown, i.e., to uncertainty.
• Probability
  - classical: long-run relative frequency
  - Laplace: equally likely outcomes
  - Bayesian: updated by data or events
    · Thomas Bayes (Reverend): special case
    · Pierre-Simon Laplace: celestial mechanics, medical statistics, reliability
    · P(Hypothesis | Data) ∝ P(Data | Hypothesis) P(Hypothesis)
• Random sample data: (X1, X2, …, Xn) ~ iid f(x; θ), drawn from a population x ~ f(x; θ)
  - statistic → information → inference (estimation, testing)
  - sampling distribution, MVUE
• Probability is a SET function
  - sample space S, subset C, element w; probability space (S, Borel field, P)
  - the measure P(C) defines the probability function
  - X(w) = x; P(X = x) or f(x)
• Frequentist: the parameter θ is fixed (constant) but unknown.
• Bayesian: θ is a random variable.
Prof. Sehyug Kwon, Dept. of Statistics, HANNAM University, Lecture of 2009 Fall
Scientific Method

• Case: O.J. Simpson Trial
  - Prob.(H beats W to death) is very low.
  - But Prob.(H is the criminal | the beaten W is killed) goes up.
• Premises (William of Ockham)
  - A scientific hypothesis can never be shown to be absolutely true.
  - However, it must potentially be disprovable.
  - It is a useful model until it is proved not to be true.
  - Always go for the simplest hypothesis, unless it can be shown to be false.
• Monty Hall Show ("Let's Make a Deal")
  - 3 curtains, 2 donkeys, and 1 sports car
  - Do you dare to change?
• Procedure
  - Set the problem in terms of the current scientific hypothesis. => statistical model, population
  - Gather all the relevant information that is currently available. => about parameters
  - Design an experiment that addresses the model. => randomness reduces the uncertainty and outside factors
    · Casino: blackjack game (5 card decks)
  - Gather data from the experiment. => sample, statistics
  - Draw conclusions from the experimental results. => sampling dist. of statistics
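The Monty Hall question above ("Do you dare to change?") can be settled by a short simulation. This is a minimal sketch; the function name, trial count, and seed are illustrative choices, not part of the lecture.

```python
import random

def monty_hall(trials, switch, seed=0):
    """Simulate the Monty Hall game: 3 curtains, 2 donkeys, 1 sports car."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)            # curtain hiding the sports car
        pick = rng.randrange(3)           # contestant's first choice
        # The host opens a curtain that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

# Staying wins about 1/3 of the time; switching wins about 2/3.
stay = monty_hall(100_000, switch=False)
change = monty_hall(100_000, switch=True)
```

Switching wins exactly when the first pick misses the car, which happens with probability 2/3, so the contestant should always change.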
Main Approaches to Statistics

• Frequentist
  - The parameter (the characteristic value of the population) is fixed but unknown: X ~ f(x; θ).
  - Prob.: long-run relative frequency
  - In the sampling distribution the parameter is fixed: x ~ f(x; θ), θ̂ = φ(x1, x2, …, xn).
  - Random sample → sampling dist. of θ̂ → confidence statement on θ
  - Before the actual data, frequentists calculate the performance of a procedure by averaging over the sample space, i.e., among all possible data sets.
  - Monte Carlo study
• Bayesian
  - Because of the uncertainty about θ, θ is assumed to be random: π(θ).
  - The prior dist. must be subjective (a degree of belief).
  - Revise the belief with the gathered data: π(θ | x).
  - The posterior of θ is conditional on the actual data: x ~ f(x; θ)π(θ).
  - Random sample (x1, x2, …, xn) ~ iid f(x; θ) → Data → update the belief by the data, π(θ | x).
  - Pre-posterior analysis: we might still want to know the performance of a Bayesian procedure before the data; then θ is treated as fixed but unknown (prior) and as a random variable (updated by data) at the same time.
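The frequentist idea of judging a procedure before any data arrive, by averaging over the sample space, can be sketched with a small Monte Carlo study. The Uniform(0, 1) population, sample size, and repetition count are assumptions chosen for illustration.

```python
import random
import statistics

def sampling_distribution(n, reps, seed=0):
    """Pre-data performance of the sample mean for X ~ Uniform(0, 1):
    simulate many possible data sets and average over the sample space."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]
    return statistics.fmean(means), statistics.stdev(means)

# Theory: E(x-bar) = 1/2 and SD(x-bar) = sqrt((1/12)/n).
m, s = sampling_distribution(n=25, reps=20_000)
```

The simulated mean and spread of the estimator match the theoretical sampling distribution, which is exactly the "performance by sample-space averaging" computed before seeing actual data.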
Data

• Gathering DATA
• Sampling
  - sampling error and non-sampling error
  - Probabilistic sampling: SRS, systematic, cluster, stratified
  - Non-probabilistic sampling: quota, judgment, convenience, snowball
  - Randomized response methods
    · a sensitive question and a dummy question
    · a question with known demographics
• Observational vs. Experimental
  - Can other factors be controlled? (e.g., lung cancer and smoking)
  - Experiment: randomization, blocking, replication
    · control vs. treatment
    · placebo (e.g., gas injection for a stomach ache), blind test
• Univariate Graphical Summary
  - histogram and frequency table
    · # of classes: 8~10
    · shape of the PDF; polygon
  - dot plot: shape of the PDF
  - stem-and-leaf plot
    · double stems, five stems
    · shape of the PDF; more informative than the histogram
Data (2)

• Numerical Summary
• measure of location
  - arithmetic mean
  - median, p-th percentile
  - trimmed / winsorized mean
  - geometric mean
• measure of spread
  - variance, standard deviation
  - range, IQR
• mean and std
  - same unit
  - empirical rule
  - Chebyshev's inequality
• box-whisker plot
  - 5 elementary statistics
  - shape of the PDF; detecting outliers
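The location and spread measures above can be computed with Python's standard library. The data values are made up for illustration; `statistics.quantiles` with `n=4` gives the quartiles behind the box-whisker plot.

```python
import statistics

def five_number_summary(xs):
    """Min, Q1, median, Q3, max: the 5 elementary statistics of a box-whisker plot."""
    q1, q2, q3 = statistics.quantiles(xs, n=4)   # quartiles (default exclusive method)
    return min(xs), q1, q2, q3, max(xs)

def trimmed_mean(xs, k):
    """Arithmetic mean after dropping the k smallest and k largest observations."""
    ys = sorted(xs)[k:len(xs) - k]
    return statistics.fmean(ys)

data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 40]           # one large outlier
```

The trimmed mean (5.125 here) resists the outlier that pulls the arithmetic mean up to 8.3, which is why trimmed and winsorized means are listed as robust location measures.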
Data (cont.)

• Univariate → bivariate
• scatter plot
  - visual representation of a functional relation
• correlation coefficient
  - measure of the linear relationship
• Test on correlation
  - Ho: ρ = 0 ~ t-test: t = r√(n-2) / √(1-r²) ~ t(n-2)
  - Ho: ρ = ρ0, Ho: ρ1 = ρ2 ~ z-test (approximate) via the Fisher transformation:
    0.5 ln((1+r)/(1-r)) ~ N(0.5 ln((1+ρ)/(1-ρ)), 1/(n-3))
• Coefficient of determination: R² = SSR/SST
  - partial coefficient of determination, e.g., R²_{Y2.13}
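The two correlation tests above can be written out directly. This is a sketch; the small data set is invented for illustration, and only the test statistics (not p-values) are computed.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_stat(r, n):
    """t = r*sqrt(n-2)/sqrt(1-r^2), compared with t(n-2) under Ho: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def fisher_z(r, rho0, n):
    """z = (atanh(r) - atanh(rho0)) * sqrt(n-3), approx. N(0,1) under Ho: rho = rho0.
    atanh(r) is exactly 0.5*ln((1+r)/(1-r)), the Fisher transformation."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)

xs, ys = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)            # 0.8 for this data
```

With r = 0.8 and n = 5 the t statistic is about 2.31 on 3 degrees of freedom; the Fisher z statistic tests a nonzero ρ0 the t-test cannot handle.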
Probability

• Uncertainty
  - deductive logic → model
  - plausible reasoning → probability
• Probability as a measure of plausibility
  - a nonnegative real number
  - the more plausible, the larger the number
  - different propositions may have the same plausibility
  - all possible outcomes must be taken into account
  - the same knowledge gives the same plausibility
• Probability Axioms
  - P(A) ≥ 0 for any event A
  - P(S) = 1 for the sample space S
  - P(∪ Ai) = Σ P(Ai) for disjoint Ai's
• Rules
  - P(φ) = 0 for the null set
  - P(Ac) = 1 - P(A)
  - addition rule: P(A∪B) = P(A) + P(B) - P(A∩B)
• Joint Prob. and Independence
  - joint P(A∩B)
  - If P(A∩B) = P(A)P(B), A and B are independent.
  - disjoint vs. independent
  - marginal: P(A) = P(AB) + P(ABc)
• A Bayesian uses the rules of probability to revise the initial belief by data.
• Glossary
  - random experiment, outcome
  - sample space S, element w, event E
  - union, intersection
  - mutually exclusive (disjoint), mutually exhaustive
Probability (2)

• Conditional probability
  - S reduces to B: P(A|B) = P(A∩B) / P(B)
  - for independent events, P(A|B) = P(A)
  - multiplicative rule: P(A∩B) = P(B)P(A|B) = P(A)P(B|A)
• Total Probability
  - P(A) = Σj P(A∩Bj) = Σj P(A|Bj)P(Bj)
• Bayes Theorem
  - the Bi's are unobservable events; A is observed
  - P(Bi): prior distribution
  - P(A|Bi): likelihood of Bi
  - P(Bi|A): posterior distribution
  - P(Bi|A) = P(A|Bi)P(Bi) / Σj P(A|Bj)P(Bj)
• Bayes universe (diagram): the partition B1, B2, …, Bk (unobservable) crossed with the observed event A; the joint is prior × likelihood, P(A∩Bi) = P(Bi)P(A|Bi), and the marginal of A is the denominator above.
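Bayes' theorem over a partition is a three-line computation. The three-machine setting below is a hypothetical example (not from the slides), with exact fractions so the total-probability denominator is visible.

```python
from fractions import Fraction as F

def bayes(prior, likelihood):
    """Posterior over a partition B1..Bk given observed A:
    P(Bi|A) = P(A|Bi)P(Bi) / sum_j P(A|Bj)P(Bj)."""
    joint = [p * l for p, l in zip(prior, likelihood)]   # prior * likelihood
    marginal = sum(joint)                                # P(A), total probability
    return [j / marginal for j in joint]

# Hypothetical: three machines produce 50%, 30%, 20% of items
# with defect rates 1%, 2%, 3%. Which machine made a defective item?
prior = [F(5, 10), F(3, 10), F(2, 10)]
like = [F(1, 100), F(2, 100), F(3, 100)]
post = bayes(prior, like)        # P(machine | defective)
```

The defective item shifts belief toward the smaller but sloppier machines: the posterior is (5/17, 6/17, 6/17).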
Probability (3)

• Left side of the Bayes Theorem: the posterior
  - unobservable space = parameter space; random => a value is assigned by belief (a guess).
• Right side of the Bayes Theorem
  - numerator: prior × likelihood
  - denominator: the constant that makes the posterior probabilities sum to 1
  - Posterior ∝ Prior × Likelihood
• Tree diagram
  - starts with the observed event, A
• Assigning probability
  - long-run relative frequency
  - degree of belief, subjective
• A Bayesian uses
  - the long-run relative frequency for the prob. of outcomes of the random experiment, given the value of the unobserved variable (parameter)
  - repeating the experiment generates the sample space
• Odds ratio and the Bayes factor
  - Odds(C) = P(C) / (1 - P(C)); the event C can be prior or posterior
  - B denotes the Bayes factor: B × Prior Odds(C) = Posterior Odds(C), i.e., B = Posterior Odds(C) / Prior Odds(C)
  - If B > 1, the data make us believe that the event C is more probable than we thought.
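The odds and Bayes-factor definitions above translate directly into code. The prior and posterior probabilities below are assumed values for illustration.

```python
from fractions import Fraction as F

def odds(p):
    """Odds(C) = P(C) / (1 - P(C))."""
    return p / (1 - p)

def bayes_factor(prior_p, posterior_p):
    """B = posterior odds / prior odds; B > 1 means the data favor C."""
    return odds(posterior_p) / odds(prior_p)

# Assumed: prior P(C) = 1/2; after seeing data, posterior P(C) = 3/4.
B = bayes_factor(F(1, 2), F(3, 4))
```

Here B = 3 > 1: the data tripled the odds of C, so C is more probable than we thought; if the posterior equals the prior, B = 1 and the data were uninformative about C.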
Exercise

#1  #4
#2  #5
#3  #6

• sensitivity: P(PT | P) = 1 - false negative rate
• specificity: P(NT | PC) = 1 - false positive rate
• positive predictive value = posterior P(P | PT)
• negative predictive value = posterior P(PC | NT)
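The predictive values above are Bayes' theorem applied to a diagnostic test. This is a sketch; the prevalence, sensitivity, and specificity numbers are invented for illustration.

```python
from fractions import Fraction as F

def predictive_values(prevalence, sensitivity, specificity):
    """PPV = P(disease | positive test) and NPV = P(no disease | negative test),
    by Bayes' theorem over the partition {disease, no disease}."""
    p, se, sp = prevalence, sensitivity, specificity
    ppv = se * p / (se * p + (1 - sp) * (1 - p))
    npv = sp * (1 - p) / (sp * (1 - p) + (1 - se) * p)
    return ppv, npv

# Assumed numbers: 1% prevalence, 95% sensitivity, 90% specificity.
ppv, npv = predictive_values(F(1, 100), F(95, 100), F(90, 100))
```

With a rare disease the PPV is only 19/217 (about 8.8%) despite the accurate test: the prior (prevalence) dominates, which is the whole point of computing the posterior rather than quoting sensitivity alone.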
Exercise (2)

#7
Discrete Random Variable

• Random Variable
  - X(w) = x; a real-valued function on the sample space
  - w is an element of the sample space S = {w1, w2, w3, …}; x is a real number
  - a function that assigns a probability to each value of the random variable
  - induced probability: P_X(x) = P({w ∈ S : X(w) = x})
• Discrete or Continuous
  - discrete: the outcomes are distinct numbers; F(x) is a step function
  - continuous: an outcome can occur in any small interval
• Discrete Prob. Mass Function: f(x) = P(x) = P(X = x)
  - Prob. ⇔ relative frequency p(x)
• (Mathematical) Expected value: the long-run average
  - E(X) = Σx x·P(x)
• Variance and STD
  - V(X) = Σx (x - E(X))²·P(x) = E(X²) - [E(X)]²
• Mean and Var. of a linear function Y = a + bX
  - E(Y) = a + b·E(X), V(Y) = b²·V(X)
• Extension: Y = Σi bi·Xi
  - E(Y) = Σi bi·E(Xi)
  - V(Y) = Σi bi²·V(Xi) + 2·Σ_{i<j} bi·bj·COV(Xi, Xj)
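The expectation, variance, and linear-function rules above can be checked exactly with rational arithmetic. The fair die is a standard illustrative example, not from the slides.

```python
from fractions import Fraction as F

def mean_var(pmf):
    """E(X) = sum x*p(x) and V(X) = E(X^2) - E(X)^2 for a discrete pmf {x: p(x)}."""
    ex = sum(x * p for x, p in pmf.items())
    ex2 = sum(x * x * p for x, p in pmf.items())
    return ex, ex2 - ex * ex

# A fair six-sided die: E(X) = 7/2, V(X) = 35/12.
die = {x: F(1, 6) for x in range(1, 7)}
ex, vx = mean_var(die)

# Linear function Y = a + bX: E(Y) = a + b*E(X), V(Y) = b^2 * V(X).
a, b = 2, 3
ey, vy = a + b * ex, b * b * vx
```

Using `Fraction` keeps every value exact, so the shortcut formula E(X²) - E(X)² can be verified without floating-point fuzz.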
Famous Discrete Random Variables

• Bernoulli Trial
  - dichotomous outcome (success, fail)
  - the prob. of success remains constant
  - independent trials
• Bernoulli(p)
  - X = the result of one Bernoulli trial, 0 or 1
  - parameter p = prob. of success
  - PMF: p(x) = p^x (1-p)^(1-x), x = 0, 1
  - Mean = p, Variance = pq
• Binomial(n, p)
  - X = # of successes in n Bernoulli trials with success prob. p
  - PMF: p(x) = C(n, x) p^x (1-p)^(n-x), x = 0, 1, 2, …, n
  - Mean = np, Variance = npq
• Hypergeometric(N, K, n)
  - X = # of subjects from the interesting group out of n, drawn without replacement
  - PMF: p(x) = C(K, x) C(N-K, n-x) / C(N, n), x = 0, 1, 2, …, n
  - Mean = nK/N
  - When N is very large, HG(N, K, n) → B(n, K/N).
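The two PMFs and the limit HG(N, K, n) → B(n, K/N) can be checked numerically. The specific N, K, n, x values are assumptions chosen for the demonstration.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial(n, p): C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def hyper_pmf(x, N, K, n):
    """Hypergeometric(N, K, n): C(K, x) C(N-K, n-x) / C(N, n)."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# For large N, sampling without replacement barely differs from with replacement:
# HG(N, K, n) is close to B(n, K/N).
approx = binom_pmf(2, 5, 0.3)
exact = hyper_pmf(2, N=10_000, K=3_000, n=5)
```

With N = 10,000 the finite-population correction is negligible, so the hypergeometric probability essentially equals the binomial one with p = K/N = 0.3.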
Famous Discrete Random Variables (2)

• Geometric(p)
  - X = # of Bernoulli trials (p) until the first success
  - unit time, discrete life-time
  - PMF: p(x) = p(1-p)^(x-1), x = 1, 2, …
  - mean = 1/p, var. = (1-p)/p²
• Negative Binomial(r, p)
  - X = # of Bernoulli trials (p) until the r-th success
  - PMF: p(x) = C(x-1, r-1) p^r (1-p)^(x-r), x = r, r+1, …
  - mean = r/p, var. = r(1-p)/p²
• Poisson Dist. (λ)
  - The number of "successes" in a certain time / area.
  - Split the time into subintervals, each of which contains at most one happening:
    · independent trials
    · P(exactly one happens) = p remains constant
    · P(more than one occurs) = 0
  - In the Binomial, n (# of intervals) → ∞ and p is very small; λ = np.
  - PMF: p(x) = λ^x e^(-λ) / x!, x = 0, 1, 2, …
  - mean = λ, var. = λ
  - Poisson approximation to the Binomial: B(n, p) → P(λ = np)
• Normal approximation to the Binomial (continuity correction)
  - P(X ≥ a | X ~ B(n, p)) ≈ P(X ≥ a - 0.5 | X ~ N(np, npq))
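Both approximations above can be checked against exact binomial tail probabilities. The parameter choices (a rare-event case for Poisson, a moderate-p case for the normal) are assumptions made for the demonstration.

```python
from math import comb, exp, factorial, sqrt, erf

def binom_tail(a, n, p):
    """Exact P(X >= a) for X ~ B(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(a, n + 1))

def poisson_tail(a, lam):
    """P(X >= a) for X ~ Poisson(lam)."""
    return 1 - sum(exp(-lam) * lam ** x / factorial(x) for x in range(a))

def normal_tail(a, n, p):
    """Normal approximation with continuity correction:
    P(X >= a) ~ P(Z >= (a - 0.5 - np) / sqrt(npq))."""
    z = (a - 0.5 - n * p) / sqrt(n * p * (1 - p))
    return 0.5 * (1 - erf(z / sqrt(2)))

# Rare events: B(1000, 0.003) is close to Poisson(lambda = np = 3).
exact1, pois = binom_tail(5, 1000, 0.003), poisson_tail(5, 3.0)
# Moderate p: B(100, 0.4), P(X >= 45) via N(np, npq) with continuity correction.
exact2, norm = binom_tail(45, 100, 0.4), normal_tail(45, 100, 0.4)
```

The Poisson approximation works because n is large and p small with np moderate; the normal approximation works near the center of a binomial with npq large, and the 0.5 shift corrects for approximating a discrete mass by a continuous density.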
Exercise & HW#1-1, due 2009.10.09

• #1: Suppose that approximately 35% of all applications for jobs falsify the information on their application forms. Consider a company with 2300 employees. Let X be the number of applications in the company that have been falsified.
  (a) Find the mean and standard deviation of X.
  (b) Calculate the interval.
  (c) The company verifies the forms of the 2300 employees, and 249 forms contain falsified information. Does it support the overall 35%?
• #2: Of the volunteers donating blood in a clinic, 80% have the Rhesus factor present in their blood.
  (1) If 5 volunteers are randomly selected, what is the probability that at least one does not have the Rhesus factor?
  (2) What is the probability that at most four have the Rhesus factor?
  (3) What is the smallest number of volunteers who should be selected if we want to be at least 90% certain that we obtain at least five donors with the Rhesus factor?
• Relationships among the distributions (diagram): Bernoulli(p) → (Σ Xi) → Binomial(n, p) → (n → ∞, λ = np) → Poisson(λ); HG(N, K, n) → (N → ∞, p = K/N) → Binomial(n, p), with μ = np and σ² = npq.
Exercise (2) & HW#1-1, due 2009.10.09

• #3: Suppose that the prob. of engine malfunction during any 1-hour operation is 0.02. Find the probability that a given engine will survive 2 hours.
• #4: The probability of a customer arrival at a bank service desk in any 1-sec. interval is equal to 0.1. Assume the customers visit the bank randomly and independently.
  (1) Find the prob. that the first arrival will occur during the third 1-sec. interval.
  (2) Find the prob. that the first arrival will not occur until at least the third 1-sec. interval.
  (3) How long do you expect to wait for the first customer at the bank?
• #5: Show that P(X > a+b | X > a) = P(X > b) when X ~ G(p). This is the memory-less property of the geometric distribution.
• #6: A geological study indicates that an exploratory oil well should strike oil with prob. 0.2.
  (1) Find the prob. that the first strike comes on the third well drilled.
  (2) Find the prob. that the third strike comes on the seventh well drilled.
  (3) What are the mean and variance of the number of wells that must be drilled if the company wants to set up three producing wells?
• #7: Ten percent of the engines manufactured are defective. Engines are randomly selected one at a time and tested. What is the prob. that
  (1) the first non-defective engine will be found on the second trial?
  (2) the third non-defective engine will be found on the fifth trial?
  (3) the third non-defective engine will be found on or before the fifth trial?
Exercise (3) & HW#1-1, due 2009.10.09

• #8: A product is shipped in lots of twenty. We test each lot with a sample of size 5. If there is no defective, the lot will be shipped to the customer. If a lot contains four defectives,
  (a) what is the prob. that it will be rejected?
  (b) what is the expected number of defectives in the sample of size 5?
• #9: We want to estimate the number of animals (N) in a certain area. M animals are captured and tagged. Some time later, n animals are captured, and X, the number of tagged animals among the n animals, is noted. Suppose M = 4 and n = 3.
  (1) Find P(X = 1) as a function of N.
  (2) What value of N will maximize P(X = 1)?
• #10: Animals live in a prairie with a mean population density of approximately 5 per acre. If we check 10 acres randomly, what is the prob. that we can see none of the animals?
• #11: The number of accidents in a certain street is observed to average 3 per month. During the last month, 6 accidents occurred. Does it indicate an increase in the mean?
• #12: Suppose the number of visitors to a museum per hour is distributed as a Poisson with λ = 2.
  (a) Find the prob. that there are exactly 3 visitors between 9:00 and 9:30.
  (b) Find the prob. that there are no visitors between 9:00 and 10:30.
  (c) How many visitors do you expect in a day when the museum opens for 8 hours?
Joint Random Variables

• Joint PMF of two discrete r.v.'s (X, Y): p(x, y) = P(X = x, Y = y)
• Marginal PMF of Y: pY(y) = Σx P(X = x, Y = y)
• Conditional PMF of X given Y = y: p(x|y) = p(x, y) / pY(y)
• Expected value of h(X, Y): E(h(X, Y)) = Σx Σy h(x, y) p(x, y)
  - show that E(X + Y) = E(X) + E(Y)
• Covariance of (X, Y): COV(X, Y) = E((X - E(X))(Y - E(Y)))
• Independence
  - X and Y are independent ⇔ p(x, y) = pX(x) pY(y) for all (x, y)
  - When X and Y are independent, p(x|y) = p(x) and COV(X, Y) = 0.
• Var. of a sum: V(X ± Y) = V(X) + V(Y) ± 2 COV(X, Y)
• Joint table with marginals:

         y1        y2        …  yn        pX(x)
  x1     f(x1,y1)  f(x1,y2)  …  f(x1,yn)  f(x1)
  x2     f(x2,y1)  f(x2,y2)  …  f(x2,yn)  f(x2)
  …      …         …         …  …         …
  xm     f(xm,y1)  f(xm,y2)  …  f(xm,yn)  f(xm)
  pY(y)  f(y1)     f(y2)     …  f(yn)     (marginal)
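The joint-table machinery above, including the "show that E(X+Y) = E(X) + E(Y)" exercise, can be verified on a small table. The 2x2 joint pmf below is an assumed example with deliberately dependent X and Y.

```python
from fractions import Fraction as F

# Assumed joint pmf p(x, y) on {0,1} x {0,1}; X and Y are NOT independent here.
joint = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(3, 8), (1, 1): F(1, 8)}

def marginal_x(joint):
    """pX(x) = sum over y of p(x, y)."""
    m = {}
    for (x, _), p in joint.items():
        m[x] = m.get(x, 0) + p
    return m

def expect(joint, h):
    """E(h(X, Y)) = sum_x sum_y h(x, y) p(x, y)."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

ex = expect(joint, lambda x, y: x)
ey = expect(joint, lambda x, y: y)
cov = expect(joint, lambda x, y: (x - ex) * (y - ey))   # COV(X, Y) = -1/8 here
```

E(X+Y) = E(X) + E(Y) comes out exactly even though COV(X, Y) ≠ 0: additivity of expectation needs no independence, unlike V(X+Y).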
HW#1-2, due 2009.10.09

#1  #3
#2  #4

• no cover sheet; write a 4-digit pass code on the top of the first page
• fold the homework in half and write your name on the outside
Concept

• Notation
  - θ: the random variable for the parameter, never observed; π(θ) is its prior prob. function.
  - Y: an observed random variable (random sample), f(y; θ)
  - revise our belief (inference) on the parameter θ given the observed values of Y
• Bayesian Paradigm
  - Frequentist: estimate θ with a model for the data.
  - Bayesian: a prob. model for both θ and the data.
• 3 steps for a Bayesian
  - Set up π(θ) and f(y; θ).
  - Calculate the posterior π(θ|y).
  - Model diagnosis with π(θ|y).
• Bayesian universe
  - (θi, yj) ⇔ (unobservable, observed); a value of θ is assigned by guess
  - prior: π(θ), the guess from past information before collecting data
  - joint of (θ, y): f(θi, yj) = π(θi) f(yj | θi)
  - likelihood function: f(yj | θi), the universe reduced to the observed yj
  - posterior PDF: update the belief by the observed value,
    π(θi | yj) = f(θi, yj) / f(yj) = π(θi) f(yj | θi) / Σi π(θi) f(yj | θi) ∝ π(θi) f(yj | θi)
Example

• Example
  - There are 5 balls in an urn, and the number of red balls, θ = 0, 1, 2, …, 5, is of our interest.
  - One ball is randomly drawn, which is an observation: y = red(1) or not(0).
  - What is the posterior?
    Posterior = (Prior × Likelihood) / Σi (Prior × Likelihood), i.e., π(θi | y) ∝ π(θi) f(y | θi)

• Calculation table: a sample of size one, y = 1

  θi   prior  likelihood  prior × likelihood  posterior
  0    1/6    0           0                   0
  1    1/6    1/5         1/30                1/15
  2    1/6    2/5         1/15                2/15
  3    1/6    3/5         1/10                1/5
  4    1/6    4/5         2/15                4/15
  5    1/6    1           1/6                 1/3
  sum                     1/2                 1

• More than one observation: y1, y2, y3, …
  - keep drawing one ball w/o replacement
  - all together at one time:
    π(θi | y1, y2) = f(θi, y1, y2) / f(y1, y2) = π(θi) f(y1, y2 | θi) / Σi π(θi) f(y1, y2 | θi)
  - or sequentially, one by one:
    π(θi | y1, y2) ∝ π(1)(θi | y1) × f(y2 | θi, y1)

• Sequential table: a sample of size two, y1 = 1, y2 = 0. The posterior for y1 = 1 becomes the prior; with one red ball removed, the likelihood of y2 = 0 given θ red balls is (5-θ)/4:

  θi   prior  likelihood (y2=0)  prior × likelihood  posterior
  0    0      ?                  0                   0
  1    1/15   1                  1/15                1/5
  2    2/15   3/4                1/10                3/10
  3    1/5    1/2                1/10                3/10
  4    4/15   1/4                1/15                1/5
  5    1/3    0                  0                   0
  sum                            1/3                 1
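The urn tables above can be reproduced exactly with rational arithmetic, and the sequential update can be checked against the all-at-once update (a sketch; function names are illustrative).

```python
from fractions import Fraction as F

def update(prior, likelihood):
    """Posterior proportional to prior x likelihood, normalized to sum to 1."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]

# Urn with 5 balls; theta = number of red balls = 0..5, uniform prior 1/6 each.
prior = [F(1, 6)] * 6

# First draw is red (y1 = 1): likelihood theta/5.
post1 = update(prior, [F(t, 5) for t in range(6)])

# Second draw, without replacement, is not red (y2 = 0): likelihood (5-theta)/4
# once a red ball has been removed (theta = 0 is already ruled out by y1 = 1).
post2 = update(post1, [F(5 - t, 4) if t >= 1 else F(0) for t in range(6)])

# All at once: f(y1=1, y2=0 | theta) = (theta/5) * ((5-theta)/4).
post_joint = update(prior, [F(t * (5 - t), 20) for t in range(6)])
```

The sequential and all-at-once posteriors agree, which is the point of the sequential formula: Bayesian updating can be done observation by observation.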
More and Famous PDFs

• Sequential calculation table, done all at once: y1 = 1, y2 = 0, with f(y1, y2 | θ) = (θ/5)·((5-θ)/4):

  θi   prior  likelihood  prior × likelihood  posterior
  0    1/6    0           0                   0
  1    1/6    1/5         1/30                1/5
  2    1/6    3/10        1/20                3/10
  3    1/6    3/10        1/20                3/10
  4    1/6    1/5         1/30                1/5
  5    1/6    0           0                   0
  sum                     1/6                 1

• Binomial
  - In n Bernoulli trials (p), suppose Y is the number of successes. Then Y | p ~ B(n, p).
  - Likelihood: f(Y | pj)
• Selection of a prior for p
  - discrete prior p1, p2, …, pI
  - prior(pi) × conditional(y | pi) = joint(y, pi); π(pi | y) ∝ π(pi) f(y | pi)
  - example: Y = 3 observed when n = 4; the normalizing constant is 1/4

  p    prior  likelihood Y=3|p  joint (prior × L)  posterior
  0.4  1/3    0.154             2/39               0.205
  0.5  1/3    0.250             1/12               0.333
  0.6  1/3    0.346             3/26               0.462
  sum                           1/4                1

• Benefits of the Bayesian approach
  - a better understanding of interval bounds for θ: confidence interval vs. credible interval
  - based on the likelihood principle
    · 3 heads are observed in 12 coin tosses to test θ = 0.5:
      P(Y = 3; Y ~ Binomial(12, θ)) vs. P(Y = 3; Y ~ NB(3, θ))
    · For a Frequentist, the two models give different results.
    · For a Bayesian, the likelihood function is the same.
  - easily updated with extra data
  - large-sample theory is not needed.
• Conjugate prior: a prior whose posterior belongs to the same family of PDFs
  - Beta(α, β) prior for p
  - Show that the posterior is Beta(y + α, n - y + β).
• Special case: no information.
Famous PDFs (2)

• Bayes Theorem property: the Bayesian result does not change by
  - multiplying the prior by a constant, or
  - multiplying the likelihood by a constant: π(θi | yj) ∝ π(θi) f(yj | θi)
• Poisson Distribution
  - Discrete prior, with Y = 2 observed:

  λ    prior  likelihood Y=2|λ  joint (prior × L)  posterior
  1    1/6    0.184             0.0307             0.124
  1.5  1/3    0.251             0.0837             0.338
  2    1/3    0.271             0.0902             0.365
  2.5  1/6    0.257             0.0428             0.173
  sum                           0.2473             1

  - Conjugate prior: π(λ) ~ Gamma(α, β); then the posterior is Gamma(α + y, β/(β + 1)).
• Non-informative prior
  - used when there is no information about θ
  - well-known non-informative priors:
    · for 0 ≤ θ ≤ 1: Uniform(0, 1)
    · for 0 ≤ θ: Gamma(α, β)
    · for -∞ < θ < ∞: Normal(μ, σ → ∞)
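The Gamma-Poisson conjugacy stated above (with Gamma in the scale parameterization, so the scale shrinks from β to β/(β+1)) can be checked on a grid; the α, β, y values and grid are assumptions for the demonstration.

```python
from math import exp

def gamma_kernel(lam, shape, scale):
    """Unnormalized Gamma(shape, scale) density: lam^(shape-1) * exp(-lam/scale)."""
    return lam ** (shape - 1) * exp(-lam / scale)

def poisson_likelihood(y, lam):
    """Poisson likelihood up to a constant: the 1/y! factor is dropped,
    since multiplying the likelihood by a constant changes nothing."""
    return exp(-lam) * lam ** y

alpha, beta, y = 2.0, 1.5, 3                 # assumed prior and single observation
grid = [0.01 * i for i in range(1, 1500)]
post = [gamma_kernel(l, alpha, beta) * poisson_likelihood(y, l) for l in grid]
conj = [gamma_kernel(l, alpha + y, beta / (beta + 1)) for l in grid]
s1, s2 = sum(post), sum(conj)
```

Multiplying out the exponents shows why: λ^(α-1) e^(-λ/β) · e^(-λ) λ^y = λ^(α+y-1) e^(-λ(β+1)/β), the kernel of Gamma(α + y, β/(β + 1)).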
Exercise

• #1: when Y = 7
• #2: slides 22 & 23
Continuous Random Variable

• Continuous Random Variable
  - X(w) = x; a real-valued function
  - The domain of w is infinite.
  - P(X = x) = 0; probability = area
  - It is not easy to derive the PDF ⇔ histogram + polygon
• CDF: F(x) = P(X ≤ x) = ∫_{-∞}^{x} f(t) dt
• Expected value: E(u(X)) = ∫ u(x) f(x) dx
• Convergence to X in probability
  - WLLN: lim_{n→∞} P(|Xn - X| < ε) = 1
• Convergence to F(x) in distribution
  - CLT: lim_{n→∞} F_{Xn}(x) = F(x)
• Exponential Family: f(x; θ) = A(x) B(θ) exp(C(θ) T(x))
  - The sampling dist. of the sum of T(x) is also in the exponential family.
• Relationships
  - Standard Normal and Chi-Square: Z² ~ χ²(df = 1)
  - T-Dist.: T(df = k) = Z / √(χ²(k)/k)
  - F-Dist.: F(a, b) = (χ²(a)/a) / (χ²(b)/b); T(b)² ~ F(1, b)
  - Cauchy Dist.: Z1/Z2 ~ Cauchy
  - Log-normal: exp(Z) ~ Log-Normal
Continuous Random Variable (cont.)

• Gamma Dist. and Poisson Dist.
  - For integer α: F_{G(α, β)}(w) = 1 - P(X ≤ α - 1 | X ~ Poisson(λ = w/β))
• Gamma, Chi-square, and Beta
  - G(α = r/2, β = 2) ~ χ²(r)
  - G(α, λ) / (G(α, λ) + G(β, λ)) ~ Beta(α, β)
• Exponential (memory-less) and Weibull
  - G(α = 1, β) ~ Exp(β)
  - [Exp(β)]^(1/γ) ~ Weibull(γ, β)
• Sampling Dist. of the Sum of a Random Sample
  - additivity: Poisson, Chi-square, Normal
  - Exponential, Geometric: their sums give the Gamma and Negative Binomial
  - multiplication: Log-normal
S i l ThiS i l ThiSpecial ThingsSpecial Things
�Moment Generating Function�Uniqueness of MGF
�Joint continuous and discrete�Hierarchical and Mixture model)()( tx
X eEtM =Uniqueness of MGF
�Empirical Rule
Hierarchical and Mixture model
�Mean and Variance of Y
)()(X
)()0()( kkX XEtM ==
)()()()( xFyFtMtM XYYX =⇔=),(~
),(~|βαBetap
pnBinomialpY
hnu
.ac.
kr Contin
p• 68%, 95%, 99%, 6-sigma, 2ppm
�Mean and Variance of Y
))|(())|(()())|(()(
XYEVXYVEYVXYEEYE
+==
tp://
wol
fpa
ck.h u
ous R
and
om V
�Independency and Correlation� (X, Y) are independent Ù f(x,y)=f(x)g(y)
htt V
aria
ble
�Chebychev’s Inequality
�상관계수 ρ=±1 Ù There is a≠0 s.t. P(Y=aX+b)=1
�When (X,Y) are bivariate normal, (X,Y) are independent Ù COV(X,Y)=0
211)|(|k
kXP −≥≤− σμ
(28)Prof. Sehyug Kwon, Dept. of Statistics, HANNAM University Lecture of 2009 Fall
Priors

• Unobservable Parameter
  - population proportion p; prior: π(p)
• Observed Random Sample
  - In n Bernoulli trials, let Y be the number of successes.
  - Likelihood: f(y1, y2, …, yn | p)
• Calculate the Posterior
  - ∝ Prior × Likelihood
• Choosing a Prior
  - Uniform prior: π(p) ~ Uniform(0, 1), i.e., no information
    · posterior: π(p|y) ∝ C(n, y) p^y (1-p)^(n-y) · 1_(0,1)(p), so p | y ~ Beta(y+1, n-y+1)
  - Conjugate prior: π(p) ~ Beta(a, b)
    · same PDF family (exponential family only)
    · posterior: π(p|y) ~ Beta(a+y, b+n-y)
    · (a, b) are both shape parameters (figure: Beta densities, e.g., a = b = 5)
Priors (2)

• Choosing a Prior: before looking at the data
  - A conjugate prior is recommended even when the information is vague; refer to the shape of the Beta PDF.
  - Match the location and scale by solving the following two equations:
    mean: p0 = a / (a + b)
    std:  σ0² = ab / ((a + b)²(a + b + 1))
• Jeffreys Prior
  - The uniform prior is not invariant under reparameterization.
  - Invariance property for a prior: for ψ = h(θ), π~(ψ) = π(h⁻¹(ψ)) |dθ/dψ|
  - Invariant prior for the parameter θ: π(θ) ∝ √I(θ)
  - Fisher Information I(θ)
    · score statistic: U(θ) = d log L(y; θ)/dθ
    · E(U) = 0, V(U) = Fisher information
    · I(θ) = E[(d log L(y; θ)/dθ)²] = -E[d² log L(y; θ)/dθ²]
  - In the binomial proportion case (show that): π(θ) ~ Beta(1/2, 1/2)
• Preparation
  - Graph your guess for Beta(a, b).
  - Calculate the equivalent sample size of the prior: set the variance of a sample proportion equal to the prior variance,
    V(p̂) = V(y/n_eq) = p0(1 - p0)/n_eq = ab / ((a + b)²(a + b + 1))  ⇒  n_eq = a + b + 1
Example: Priors

• Example
  - We want to know the proportion p of residents who are in favor of a new building.
  - observed: n = 100, y = 26 (in favor)
  - 3 different priors
• Bart: conjugate prior, Beta(a, b)
  - mean = 0.2, std = 0.08 => a = 4.8, b = 19.2
  - equivalent sample size n_eq = 25
• Anna: no information, Uniform prior = Beta(1, 1)
  - equivalent sample size n_eq = 3
• Chris: discrete weights, not a Beta prior; the sum of the weights need not be 1.

  p     weight
  0     0
  0.05  1
  0.1   2
  0.3   2
  0.4   1
  0.5   0

  Interpolated: π(p) = 20p for 0 ≤ p ≤ 0.1; π(p) = 2 for 0.1 ≤ p ≤ 0.3; π(p) = 5 - 10p for 0.3 ≤ p ≤ 0.5.
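Bart's prior can be derived by solving the two matching equations from the previous slide; the function names below are illustrative.

```python
def beta_from_mean_std(m, s):
    """Solve a Beta(a, b) prior from a guessed mean m and std s:
    m = a/(a+b) and s^2 = ab/((a+b)^2 (a+b+1))."""
    n0 = m * (1 - m) / s ** 2 - 1    # a + b
    return m * n0, (1 - m) * n0      # (a, b)

def equivalent_sample_size(a, b):
    """n_eq = a + b + 1: the sample size whose proportion has the prior's variance."""
    return a + b + 1

# Bart's guess: mean 0.2, std 0.08.
a, b = beta_from_mean_std(0.2, 0.08)
```

This reproduces the slide's Beta(4.8, 19.2) and an equivalent sample size of 25, i.e., Bart's prior carries as much information as 25 observations, small relative to n = 100.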
Posteriors

• Posterior (n = 100, y = 26)
  - Bart ~ Beta(4.8 + 26, 19.2 + 74) = Beta(30.8, 93.2)
  - Anna ~ Beta(1 + 26, 1 + 74) = Beta(27, 75)
  - Chris ~ his discrete-weight prior times the likelihood, ∝ π(p) p^y (1-p)^(n-y), evaluated numerically on the interval [0, 0.5]
Wrapping Up

• Effect of the Prior
  - more data, less effect of our priors (in the previous example, n = 100) ⇔ The data is said to swamp the prior.
• Summarizing the Posterior, Beta(α, β)
  - not confidence intervals but credible intervals; why? the posterior is an exact PDF
  - measure of location: posterior mode, posterior median, posterior mean
  - measure of spread: variance, std, percentiles
    · the 100k-th percentile k_π satisfies ∫_{-∞}^{k_π} π(p|y) dp = k
• Estimating the proportion
  - MSE (mean squared error) of an estimator: MSE(p̂) = V(p̂) + B(p̂)²
• Bayesian Estimator?
  - squared error loss function: L(θ, θ̂) = (θ - θ̂)²
  - absolute error loss function: L(θ, θ̂) = |θ - θ̂|
  - Bayes risk: the expectation of the loss function,
    Bayes Risk(θ̂) = ∫ R(θ, θ̂) π(θ) dθ = ∫∫ L(θ, θ̂) f(y|θ) π(θ) dy dθ
  - The posterior mean minimizes the Bayes risk for the squared error loss function.
  - The posterior median minimizes it for the absolute error loss fn. (Theorem 10.3.2, G. Casella and R. Berger, Statistical Inference, 1st edition)
Bayesian Inference
Wrapping up (2)
�Bayesian Credible Interval
• using percentiles of the posterior π(p|y)
• not confidence intervals
�Exercise 1
• do the same things with n=87 and y=6.
�Exercise 2
• do the same things with a non-informative prior.
*) how to do the numerical calculation

Type   Prior            Posterior         Mean   Median  STD    IQR    Credible Lower  Upper
Anna   No prior         Beta(27, 75)      0.265  0.263   0.043  0.059  0.188           0.354
Bart   Beta(4.8, 19.2)  Beta(30.8, 93.2)  0.248  0.247   0.039  0.053  0.177           0.328
Chris  Discrete Weight  Numerical*        0.261  0.255   0.041  0.057  0.181           0.340
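The table's entries for Anna can be reproduced numerically. A minimal sketch (not from the lecture; standard library only): the mean and std of a Beta(a, b) posterior in closed form, plus grid-based quantiles for the median and the 95% credible interval.

```python
import math

def beta_pdf(x, a, b):
    # Beta(a, b) density via log-gamma for numerical stability
    logc = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(logc + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_summary(a, b, n_grid=100_000):
    # closed-form mean and std
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    # accumulate the CDF on a grid and read off the target quantiles
    h = 1.0 / n_grid
    targets = [0.025, 0.5, 0.975]
    quantiles = {}
    acc, ti = 0.0, 0
    for i in range(1, n_grid):
        acc += beta_pdf(i * h, a, b) * h
        while ti < len(targets) and acc >= targets[ti]:
            quantiles[targets[ti]] = i * h
            ti += 1
    return mean, std, quantiles[0.5], quantiles[0.025], quantiles[0.975]

mean, std, med, lo, hi = beta_summary(27, 75)   # Anna's posterior
```

The same routine applied to Beta(30.8, 93.2) gives Bart's row of the table.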
Bayesian Inference
Comparing Bayesian and Frequentist
�Comparing Estimators
�Frequentist: sampling dist. of the estimator p̂_f = y/n
�Bayesian: posterior of p
�Likelihood function
� for updating the prior
� for calculating the more likely value of the parameter
�Frequentist (pre-data analysis)
�Find the MVUE (point estimator)
• RCRB gives the lower bound for the variance of an estimator.
• Find the MLE (⇔ CSS) by the Factorization Theorem or the exponential family: the more likely parameter value over all possible random samples
• R-B Theorem: V(E(UE|SS)) ≤ V(UE)
�Calculate the CI with the sampling distribution of the MVUE
�Bayesian (post-data: what if this actual data is obtained)
• conjugate prior: Beta(a, b)
• observed y out of n
• posterior: Beta(a+y, b+n−y)
�Precision of Estimator: MSE
MSE(θ̂) = E(θ̂ − θ)² = V(θ̂) + B(θ̂)²
�Frequentist: p̂_f = y/n
MSE(p̂_f) = V(p̂_f) + B(p̂_f)² = p(1−p)/n + 0
�Bayesian: p̂_b = (a+y)/(a+b+n)
MSE(p̂_b) = V(p̂_b) + B(p̂_b)² = [np(1−p) + (a(1−p) − bp)²] / (a+b+n)²
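The two MSE expressions above can be compared directly. A small sketch (illustrative, not from the lecture):

```python
def mse_freq(p, n):
    # y/n is unbiased, so its MSE is just the variance p(1-p)/n
    return p * (1 - p) / n

def mse_bayes(p, n, a, b):
    # Bayes estimator (a+y)/(a+b+n): MSE = variance + squared bias
    var = n * p * (1 - p) / (a + b + n) ** 2
    bias = (a * (1 - p) - b * p) / (a + b + n)
    return var + bias ** 2
```

With a uniform Beta(1, 1) prior and n=100, the Bayesian MSE is smaller than the frequentist MSE near p = 0.5 but larger near the extremes, which is the trade-off the next slide's plot shows.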
Bayesian Inference
Comparing Bayesian and Frequentist (2)
�Example
�Frequentist: solid black line
�no-information, Uniform = Beta(1,1) prior: blue line
�conjugate prior Beta(2,3): dotted black line
• the Beta(2,3) prior is skewed to the right, which is why the Bayesian MSE is the smallest when the parameter θ is small
[Figure: MSE curves of the three estimators as functions of θ]
Bayesian Inference
Comparing Bayesian and Frequentist (3)
�Interval estimation
�Frequentist: confidence interval
• pre-data concept
• using the sampling dist. of the best estimator: θ̂ ± z_{α/2} s(θ̂)
• or large sample theory: (θ̂ − E(θ̂))/s(θ̂) ~ N(0, 1), so p̂ ± z_{α/2} √(p̂(1−p̂)/n)
• previous example: n=100, y=26, 95% confidence interval = (0.174, 0.346)
�Bayesian: credible bound
• post-data
• using the posterior dist. of θ, π(θ|y)
• previous example: see the slide of page 34
�Hypothesis Testing
�statistical hypothesis: null, alternative
� the Best Rejection Region by the Neyman-Pearson Lemma (LR): UMPT
�LR Test: function of SS, −2ln(LR) ~ approximately Chi-Square
�power function: significance level, power
�Frequentist
• One-to-one relationship between CI and Testing
• did the statistic move "by chance" or significantly?
• proof by contradiction: does the null hypothesis remain valid?
• (1) Set up the hypotheses. (2) Calculate the test statistic. (3) Calculate the p-value. (4) Draw the conclusion.
�Bayesian
• one-sided: using the posterior, like the Frequentist
• two-sided: using the Credible Interval
Bayesian Inference
Comparing Bayesian and Frequentist (4)
�Example in One-sided Hypothesis Testing
�Better treatment effect? old p=0.6. Is the new treatment effective? 8 patients are cured out of 10 random patients. (sig. level = 10%)
�Frequentist
• Hypotheses: null p=0.6, alternative p>0.6
• Test statistic: y = # of successes ~ sampling distribution Binomial(n=10, p=0.6)
• p-value = P(y≥8 | y~B(10, 0.6)) = 0.1672
• can't reject H0
�Bayesian
• uniform prior
• Posterior is Beta(9, 3).
• p-value = ∫_0^{p0} π(p|y) dp = 0.1189
�Example in Two-sided Hypothesis Testing
�Is a coin fair? n=15, y = # of heads (10)
�Frequentist
• Hypotheses: null p=0.5, alternative p≠0.5
• Test statistic: y = # of successes ~ sampling distribution Binomial(n=15, p=0.5)
• p-value = P(y≥10 | y~B(15, 0.5)) + P(y≤5 | y~B(15, 0.5)) = 0.302
• can't reject H0
�Bayesian
• Using the Credible Interval. why? Pr(p = p0) = 0 for a continuous posterior
• Posterior using a Uniform prior: Beta(11, 6)
• using the normal approximation: (0.426, 0.868)
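All three worked p-values can be checked with a few lines (a sketch using only `math.comb`; the Beta tail is computed via the identity P(Beta(a,b) ≤ x) = P(Binomial(a+b−1, x) ≥ a), valid for integer a, b):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_tail_ge(k, n, p):
    # P(Y >= k) for Y ~ Binomial(n, p)
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

# one-sided frequentist test: P(Y >= 8 | Binomial(10, 0.6))
p_one = binom_tail_ge(8, 10, 0.6)
# two-sided frequentist test: P(Y >= 10) + P(Y <= 5) under Binomial(15, 0.5)
p_two = binom_tail_ge(10, 15, 0.5) + sum(binom_pmf(j, 15, 0.5) for j in range(6))
# Bayesian posterior probability P(p <= 0.6) under Beta(9, 3),
# via the Beta-Binomial identity with integer parameters
p_bayes = binom_tail_ge(9, 11, 0.6)
```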
Bayesian Inference
Priors for Poisson(μ)
�Parameter: mean = μ
�Likelihood
�a single observation: f(y|μ) = μ^y e^{−μ} / y!
�a random sample: f(y1, y2, …, yn | μ) ∝ μ^{Σyi} e^{−nμ}
�Posterior
�∝ Prior × Likelihood: π(μ|y) ∝ π(μ) f(y1, y2, …, yn | μ)
�Conjugate prior
• same PDF family as the likelihood function
• the likelihood has a Gamma shape in μ, so take Prior ~ Gamma: π(μ) ~ Gamma(α, β)
• α = shape, β = rate (shape-scale form: k = α, θ = 1/β)
• mean = α/β, variance = α/β²
�Choosing the Prior
�Uniform prior: π(μ) ~ Uniform(0, ∞)
• no information, improper
• Posterior: π(μ|y) ∝ μ^{Σyi} e^{−nμ} ~ Gamma(Σyi + 1, n)
� Jeffreys' prior (show that): π(μ) ∝ √I(μ) = 1/√μ
• improper
• Posterior ~ Gamma(Σyi + 0.5, n)
�Gamma(α, β) prior
• Posterior: π(μ|y) ∝ μ^{Σyi} e^{−nμ} · μ^{α−1} e^{−βμ} ~ Gamma(α + Σyi, β + n)
Bayesian Inference
Priors
�Choosing the Prior
�A conjugate prior is recommended. why? the Posterior has the same pdf family as the prior
�With prior information on the mean and std
• m0 = α/β, σ0² = α/β²  =>  β = m0/σ0², α = m0²/σ0²
�Warning
• Graph your prior.
• Check the equivalent sample size of your prior:
  prior variance = variance of the sample mean: α/β² = (α/β)/n_eq  =>  n_eq = β
• Is the equivalent sample size too large? Then you put in too much prior information relative to the amount you will get from the data.
�Example
�number of traffic accidents per week = μ
�observed data
• count the number of accidents for 8 weeks
�4 priors
• Aretha: no prior ~ Uniform(0, ∞), π(μ) = 1, μ > 0
• Byron ~ Jeffreys prior, π(μ) = 1/√μ, μ > 0
• Chase ~ his belief, mean=2.5, std=1 ~ Gamma(α=6.25, β=2.5), equivalent sample size = 2.5 <- acceptable
• Diana ~ subjective weights
Bayesian Inference
Posteriors
�observed data
• 3, 2, 0, 8, 2, 4, 6, 1
• Σy = 26, n = 8
�Posteriors: π(μ|y) ~ Gamma(α*, β*)
�Aretha ~ Gamma(27, 8)
�Byron ~ Gamma(26.5, 8)
�Chase ~ Gamma(32.25, 10.5)
�Diana ~ Numerically
• very similar; the posterior doesn't depend on the prior much.
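These posterior parameters follow from the conjugate update on the previous slide. A minimal sketch (the uniform and Jeffreys priors enter as the improper limits Gamma(1, 0) and Gamma(0.5, 0)):

```python
def gamma_poisson_update(alpha, beta, data):
    # Gamma(alpha, beta) prior in the rate parameterization,
    # Poisson likelihood: posterior is Gamma(alpha + sum(y), beta + n)
    return alpha + sum(data), beta + len(data)

data = [3, 2, 0, 8, 2, 4, 6, 1]                  # accident counts, 8 weeks
aretha = gamma_poisson_update(1.0, 0.0, data)    # uniform prior  -> (27.0, 8.0)
byron  = gamma_poisson_update(0.5, 0.0, data)    # Jeffreys prior -> (26.5, 8.0)
chase  = gamma_poisson_update(6.25, 2.5, data)   # Gamma(6.25, 2.5) belief prior
```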
Bayesian Inference
Summarizing the Posterior & Inference
�Bayesian estimator
�posterior π(μ|y) ~ Gamma(α*, β*)
�posterior mean or median
• posterior mean: μ̂_b = α*/β* = (Σy + α)/(n + β)
• posterior mode: (α* − 1)/β*
�Inference on the parameter μ
�point estimators and their PMS
• Bayes: B(μ̂_b) = (α − βμ)/(n + β), V(μ̂_b) = nμ/(n + β)²
  PMS(μ̂_b|y) = V(μ̂_b|y) + B(μ̂_b)²
• Frequentist: μ̂_f = Σy/n = ȳ, MSE(μ̂_f) = μ/n
�Credible Interval
�using π(μ|y) ~ Gamma(α*, β*)
�non-symmetric, but with (α/2)·100% of the posterior in each tail
[Figure: PMS/MSE of the Bayes and frequentist estimators as functions of μ, n=6]
Bayesian Inference
Summarizing the Posterior & Inference (2)
�One-sided testing
�null: μ = μ0
�alternative: μ > μ0 or μ < μ0
�p-value: ∫ π(μ|y) dμ over the parameter space under the null hypothesis, with π(μ|y) ~ Gamma(α*, β*)
�e.g. when μ0 = 3, observed Σy = 26, n = 8
�Two-sided testing
�using the credible interval
�Midterm Exam on 2009.10.30
� It will be a take-home exam.
�Problems will be handed out in class.
�The exam should be taken by yourself.
�On the top of your answer sheet, your honesty signature should appear beside your name.
�Your answer sheet should be turned in to the Department office of Statistics by noon on 2009.11.2 (Monday). Points for a late turn-in will be forfeited.
Bayesian Inference
Discrete Prior for the Normal Mean(μ)
�Widely used? the "average out" effect
�Discrete Prior
�population std σ is known
�equally likely or subjective weights
�Procedure
�Set a subjective prior and a guessed value of σ.
�Calculate the likelihood value
• using a standard normal table: Φ((y − μ)/σ)
• or using the normal PDF: exp(−(y − μ)²/(2σ²))
�Get the Posterior: π(μ|y) ∝ π(μ) l(μ|y) = Prior × likelihood
�Example
� single observation: y = 3.2; see the subjective prior and the result on the slide.
�More than one observation: y1=3.2, y2=2.2, y3=3.6, y4=4.1
� sequentially, one observation at a time
�all together: likelihood ∝ exp(−Σ(yi − μ)²/(2σ²))
�using the MVUE (sample mean): likelihood ∝ exp(−(ȳ − μ)²/(2σ²/n))
• slightly different? round-off error
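The prior-times-likelihood table can be computed directly. A sketch (the support points and subjective weights below are hypothetical, and σ is a guessed value, as the slide suggests):

```python
import math

def discrete_posterior(mus, weights, y, sigma):
    # posterior(mu) proportional to prior(mu) * exp(-(y - mu)^2 / (2 sigma^2)),
    # then renormalized over the discrete support
    like = [math.exp(-((y - m) ** 2) / (2.0 * sigma ** 2)) for m in mus]
    unnorm = [w * l for w, l in zip(weights, like)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

mus = [2.0, 2.5, 3.0, 3.5, 4.0]        # hypothetical support points
prior = [0.1, 0.2, 0.4, 0.2, 0.1]      # hypothetical subjective weights
sigma = 1.0                            # guessed population std

# single observation y = 3.2
post1 = discrete_posterior(mus, prior, 3.2, sigma)

# more than one observation: updating sequentially, one at a time,
# gives the same posterior as using all observations together
post = prior
for yi in [3.2, 2.2, 3.6, 4.1]:
    post = discrete_posterior(mus, post, yi, sigma)
```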
Bayesian Inference
Continuous Priors
�Continuous Prior
�choose a prior which is similar in form to the likelihood
�why? Posterior ∝ Prior × likelihood: π(μ|y) ∝ π(μ) × Normal likelihood of y
�conjugate prior: π(μ) ~ Normal(m, s²)
�Posterior ~ Normal(m', s'²):
π(μ|y) ∝ Normal(m, s²) × Normal(ȳ, σ²/n)
1/s'² = 1/s² + n/σ²
m' = (m/s²)/(1/s² + n/σ²) + (ȳ·n/σ²)/(1/s² + n/σ²)
�Flat prior
�no information ~ Uniform(−∞, ∞)
�Posterior ~ Normal(ȳ, σ²/n)
�Subjective prior
� interpolate and use numerical integration
� refer: how to calculate an integral in R
�How to choose a Normal prior
�guess a mean (m) first
� then choose a std (s) with the empirical rule
�use the "equivalent sample size": s² = σ²/n_eq
• large n_eq: the prior influences too much.
�Example
�mean length of one-year-old rainbow trout
�previous study ~ Normal with σ = 2
• Annie ~ Normal(30, 4²) prior
• Barb ~ flat prior
• Chuck ~ trapezoidal prior: 0 at 18, rising over 18 to 24, weight 1 on 24 to 40, falling to 0 at 46
�observed data: n = 12, ȳ = 32
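The trout example's posterior on the next slide can be verified with the precision formulas above (a sketch; the numbers are from the example):

```python
def normal_update(m, s2, sigma2, n, ybar):
    # conjugate update for a Normal mean with known variance sigma2:
    # posterior precision = prior precision + data precision
    prec = 1.0 / s2 + n / sigma2
    post_var = 1.0 / prec
    post_mean = ((m / s2) + (n * ybar / sigma2)) / prec
    return post_mean, post_var

# Annie: Normal(30, 4^2) prior, sigma = 2, n = 12, ybar = 32
m_post, v_post = normal_update(30.0, 16.0, 4.0, 12, 32.0)
```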
Bayesian Inference
Posterior
�Priors
�Annie: Normal(30, 4²)
�Barb: flat, π(μ) = 1
�Chuck: trapezoidal, π(μ) ∝ (μ−18)/6 for 18<μ<24; 1 for 24<μ<40; (46−μ)/6 for 40<μ<46
�Posteriors
�Annie: Normal(31.96, 0.3265) (variance 0.3265, std 0.5714)
�Barb: Normal(32, σ²/n) (std 2/√12 = 0.5774)
�Chuck: numerically
Bayesian Inference
Credible Interval
�Credible interval for μ
�using the posterior
�known variance
• precision is the reciprocal of the variance
• posterior precision: 1/s'² = 1/s² + n/σ²
• posterior mean: m' = (m/s²)/(1/s² + n/σ²) + (ȳ·n/σ²)/(1/s² + n/σ²)
• interval: m' ± z_{α/2} s'
�unknown variance
• estimator: the sample variance σ̂²
• the t-dist. is used: m' ± t_{α/2} s'
�Hypothesis testing
�one-sided: H0: μ = μ0 vs. Ha: μ < μ0
• p-value = ∫ π(μ|y) dμ over the parameter space under the null hypothesis
� two-sided
• using the credible interval
�Non-normal prior
• the posterior is not normal.
• numerically calculate the interval with equal posterior area in each tail.
Bayesian Inference
Predictive Density
�For the next observation f(y_{n+1}|y)
�marginalization process
� find the joint pdf of the next obs. and the parameter, given the random sample.
� the parameter is treated as a nuisance parameter.
� integrate over μ to get the conditional pdf:
f(y_{n+1} | y1, …, yn) = ∫ f(y_{n+1}|μ) g(μ | y1, …, yn) dμ
∝ ∫ N(μ, σ²) N(m', s'²) dμ
= N(m'_{n+1} = m', s'²_{n+1} = s'² + σ²)
�one of the advantages of the Bayesian approach
�Procedure
� random sample: y1, y2, …, yn, y_{n+1} ~ iid Normal(μ, σ² known)
�given the parameter, the joint pdf of the random sample: f(y1, y2, …, yn, y_{n+1} | μ)
� let g(μ) be the prior of μ
� f(y_{n+1}, μ | y1, …, yn) = f(y_{n+1} | μ, y1, …, yn) × g(μ | y1, …, yn)
= f(y_{n+1} | μ) × g(μ | y1, …, yn)   {by the r.s. (iid) assumption}
�Exercise #1
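In the conjugate Normal case the marginalization reduces to simple parameter arithmetic. A sketch, reusing the trout example's numbers for illustration:

```python
def predictive_params(m, s2, sigma2, n, ybar):
    # posterior for mu: Normal(m', s'^2); predictive for y_{n+1}:
    # Normal(m', s'^2 + sigma^2), i.e. parameter uncertainty plus sampling noise
    prec = 1.0 / s2 + n / sigma2
    m_post = ((m / s2) + (n * ybar / sigma2)) / prec
    s2_post = 1.0 / prec
    return m_post, s2_post + sigma2

pred_mean, pred_var = predictive_params(30.0, 16.0, 4.0, 12, 32.0)
```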
Bayesian Inference
Exercise
�#2
�#3
Bayesian Inference
Comparing with Frequentist
�Frequentist MVUE: μ̂_f = ȳ ~ N(μ, σ²/n)
�Bayesian estimator
μ_b = (1/s²)/(1/s² + n/σ²) · m + (n/σ²)/(1/s² + n/σ²) · ȳ
�E(μ_b)
B(μ_b) = E(μ_b) − μ = σ²/(ns² + σ²) · (m − μ)
�V(μ_b); clearly smaller than V(μ̂_f)
V(μ_b) = (ns²/(ns² + σ²))² · σ²/n
�Comparing MSE = B² + V
�example
• estimate the milk powder weight μ (= 1 on the label)
• the machine produces μ = 1015, σ = 5, n = 10
• Arnold prior ~ N(1000, 10²)
• Beth prior ~ N(1015, 7.5²)
• Carol prior ~ flat ⇔ Frequentist
• feasible area: 1015 ± 3·5 => (1000, 1030)
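The bias and variance formulas above can be evaluated at the machine's true setting μ = 1015 (a sketch; Carol's flat prior is the s² → ∞ limit, which reproduces the frequentist MSE σ²/n):

```python
def bayes_mse(mu, m, s2, sigma2, n):
    # MSE of the posterior-mean estimator under true mean mu
    w = sigma2 / (n * s2 + sigma2)                     # weight on the prior mean
    bias = w * (m - mu)
    var = (n * s2 / (n * s2 + sigma2)) ** 2 * sigma2 / n
    return var + bias ** 2

mu, sigma2, n = 1015.0, 25.0, 10
freq = sigma2 / n                                      # Carol / frequentist: 2.5
arnold = bayes_mse(mu, 1000.0, 100.0, sigma2, n)       # N(1000, 10^2) prior
beth = bayes_mse(mu, 1015.0, 56.25, sigma2, n)         # N(1015, 7.5^2) prior
```

Beth's well-centered prior beats the frequentist MSE; Arnold's off-center prior pays a bias penalty at this μ.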
Bayesian Inference
Comparing with Frequentist (2)
�Confidence and Credible Interval
�Frequentist: ȳ ± z_{α/2} σ/√n
• unknown σ²: using s² and the t-distribution
�Bayesian: m' ± z_{α/2} s'
1/s'² = 1/s² + n/σ²
m' = (m/s²)/(1/s² + n/σ²) + (ȳ·n/σ²)/(1/s² + n/σ²)
• Bayesian with a flat prior = Frequentist
�Testing Hypothesis
�One-sided: H0: μ = μ0, Ha: μ > μ0
• Sampling dist.: ȳ ~ N(μ0, σ²/n); posterior: μ_b ~ N(m', s'²)
• Calculate the p-value:
p_f = P(z ≥ (ȳ − μ0)/(σ/√n))
p_b = ∫_{−∞}^{μ0} π(μ|y) dμ
• conclude
�Two-sided: using the credible interval
Bayesian Inference
Two-population Mean Difference (μ1−μ2)
�Independent samples
�Population 1: x ~ f(x;θ1) ~ Normal(μ1, σ1); Population 2: y ~ g(y;θ2) ~ Normal(μ2, σ2)
�Random samples, independently and identically distributed:
(x1, x2, …, x_{n1}) and (y1, y2, …, y_{n2})
�compute the statistic (estimate) and determine its distribution
• x̄ − ȳ ~ Normal; p̂1 − p̂2 ~ Normal
�parameters: mean difference μ1−μ2, proportion difference p1−p2
�equal variance: a population assumption
�unknown sigma: t, or large sample theory
�Paired sample
�one population of pairs: (xi, yi) ⇒ Di = (xi − yi)
�Normal(μd = μ1 − μ2, σd² = σ1² + σ2²)
�parameter: mean μd = (μ1 − μ2)
�D̄ ~ Normal, t
Bayesian Inference
Independent R.S. from two normal populations (known equal variance)
�Equal variance (σ1 = σ2 = σ) known
�with normal or flat priors, for each population i:
m_i' = (m_i/s_i²)/(1/s_i² + n_i/σ²) + (ȳ_i·n_i/σ²)/(1/s_i² + n_i/σ²)
(s_i')² = σ² s_i² / (n_i s_i² + σ²)
� since μ_i ~ N(m_i', (s_i')²), the posterior of μd = μ1 − μ2 is
μd ~ N(m_d' = m_1' − m_2', (s_d')² = (s_1')² + (s_2')²)
�Credible interval
m_d' ± z_{α/2} √((s_d')²)
�Confidence Interval for the Frequentist
(ȳ1 − ȳ2) ± z_{α/2} σ √(1/n1 + 1/n2)
• same as the flat-prior credible interval
�One-sided test: H0: μd = 0, Ha: μd > 0
p = P(z ≤ (0 − m_d')/s_d')
�Two-sided test
�using the credible interval
Bayesian Inference
Independent R.S. from two normal populations (unknown equal variance and more)
�Equal variance (σ1 = σ2) unknown
�with flat priors: m_i' = ȳ_i, (s_i')² = σ̂²/n_i
� since σ is unknown, it is estimated by the pooled variance
s_p² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)
�By the Frequentist approach, the credible interval for the mean difference μd:
(ȳ1 − ȳ2) ± t_{α/2} s_p √(1/n1 + 1/n2)
�One-sided test
• using the t-dist.: t for μd with df = n1 + n2 − 2
�Unequal Variance (σ1 ≠ σ2)
�Known σ1, σ2
• with normal or flat priors: μd ~ N(m_d' = m_1' − m_2', (s_1')² + (s_2')²)
�Unknown σ1, σ2
• The sample variances of the two samples estimate the population variances respectively.
• with flat priors, Satterthwaite's adjustment for the df
Bayesian Inference
Independent R.S. from two normal populations (unequal variance)
�Exercise HW-1, due 12.11
Bayesian Inference
Paired Random Sample
�Two Normal populations
�Normal(μ1, σ1) and Normal(μ2, σ2)
�Now it is the same as the one-population mean (μd) case
Di ~ Normal(μd = μ1 − μ2, σd² = σ1² + σ2²)
• Credible Interval and Hypothesis testing (refer to the 13th-week lecture note)
�Exercise HW-2, due 12.11
Bayesian Inference
Two-population Proportion
�Unknown parameters: π1, π2; πd = π1 − π2
�data: y11, y12, …, y1n1 and y21, y22, …, y2n2
• the y's are 0 or 1
�Y1 ~ Binomial(n1, π1), Y2 ~ Binomial(n2, π2)
�Beta priors for π1 and π2: πi ~ Beta(ai, bi)
�posterior ~ Beta(ai', bi'), where ai' = ai + yi, bi' = bi + ni − yi
�Posterior for πd = π1 − π2
�approximately normal (m_d', (s_d')²):
m_d' = a1'/(a1' + b1') − a2'/(a2' + b2')
(s_d')² = a1'b1'/((a1' + b1')²(a1' + b1' + 1)) + a2'b2'/((a2' + b2')²(a2' + b2' + 1))
�Credible Interval: m_d' ± z_{α/2} s_d'
�Two-sided hypothesis testing: using the credible interval
�One-sided hypothesis testing (Ha: πd > 0)
p = P(H0: πd ≤ 0) = P(πd ≤ 0) = P(z ≤ (0 − m_d')/s_d')
�Exercise HW-3, due 12.11
Bayesian Inference
Simple Regression: Model and Scatter Plot
�Linear model: y_i = a + b·x_i + e_i
�Linearity
�X is deterministic; therefore e and Y are r.v.
�Assumptions
�Error term: e_i ~ iid N(0, σ²)
• Normality: CLT, log transformation
• Independence: DW, lag variable
• Equal variance (homoscedasticity): WLS, division by the problem-making explanatory variable (see the graph)
�Mean assumption: μ_{y|x0} = a + b·x0
�Scatter Plot
�Diagnosis of the relationship
• linearization, e.g. ln(y)
�Pre-recognize outliers and/or influential obs.
• observations (1), (2), (3) in the graph
Bayesian Inference
Simple Regression: estimation and significance
�Model: y_i = a + b·x_i + e_i
�Estimation
�OLS: ordinary least squares, distribution-free estimation; then why is the normal assumption needed?
�MLE
�Gauss-Markov Theorem: OLS is BLUE
� fitted line: ŷ = â + b̂·x
b̂ = Sxy/Sxx = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
â = ȳ − b̂·x̄
�Alternative model: y_i = A + B(x_i − x̄) + e_i
Â = ȳ, B̂ = Sxy/Sxx
�MSE = estimate of σ²: MSE = Σ(y_i − ŷ_i)² / (n − 2)
�Significance
�coefficient t-test
�Goodness of Fit: ANOVA
• coefficient of determination
• correlation coefficient
� (Pop Quiz) SST = Σ(y_i − ȳ)² = (1) Σ(ŷ_i − ȳ)² + (2) Σ(y_i − ŷ_i)²
• which one is SSR and which is SSE? SST = SSR + SSE
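The OLS formulas above fit in a few lines. A sketch on made-up data that lies exactly on the line y = 1 + 2x (so the slope, intercept, and MSE are known in advance):

```python
def ols_fit(xs, ys):
    # b = Sxy/Sxx, a = ybar - b*xbar; MSE = SSE/(n-2) estimates sigma^2
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse / (n - 2)

a, b, mse = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
```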
Bayesian Inference
Bayesian estimators for α and β
�Bayes Theorem: posterior ∝ prior × likelihood
�Reparameterized model: y_i = α_x̄ + β(x_i − x̄) + e_i, so that α_x̄ and β separate
�Joint likelihood for α_x̄ and β
l(α_x̄, β) ∝ exp(−(1/2σ²){SSy − 2β·SSxy + β²·SSx + n(α_x̄ − ȳ)²})
�product of two separate likelihoods:
l(α_x̄, β) ∝ exp(−(β − B)²/(2σ²/SSx)) × exp(−(α_x̄ − A)²/(2σ²/n))
� like normal shapes with means B = SSxy/SSx and A = ȳ, and variances σ²/SSx and σ²/n respectively
�σ̂² = MSE
�Joint prior for α_x̄ and β: π(α_x̄, β) = π(α_x̄)·π(β)
�we can use normal or flat priors
�mean and std for α_x̄
• guess the mean of y at x̄ and use it for m_α
• (upper − lower)/6 = s_α
�mean and std for β
• may choose m_β = 0
• (upper − lower)/6 = s_β
�Joint Posterior
π(α_x̄, β | y) ∝ π(α_x̄)·π(β) × l(α_x̄, β | y)
�dividing into two parts, each normal:
1/(s'_α)² = 1/s_α² + n/σ²,   m'_α = (m_α/s_α² + n·A/σ²) / (1/s_α² + n/σ²)
1/(s'_β)² = 1/s_β² + SSx/σ²,  m'_β = (m_β/s_β² + SSx·B/σ²) / (1/s_β² + SSx/σ²)
Bayesian Inference
Credible Interval and Hypothesis Testing
�Credible interval for the slope β
�known variance: m'_β ± z_{α/2} s'_β
�unknown variance: m'_β ± t_{(α/2; n−2)} s'_β, with σ² estimated by MSE
• same as the Frequentist interval when the prior is flat
�Hypothesis testing
� two-sided: using the credible interval
�One-sided: p = P(β under H0 | data) = ∫ π(β | data) dβ over the H0 region
�Predictive Dist. for a future obs.
�Bayesian: μ_{n+1} = α_x̄ + β(x_{n+1} − x̄)
• predictive distribution (result): f(y_{n+1} | data) ~ Normal(m'_y, (s'_y)²)
m'_y = m'_μ (posterior mean of μ_{n+1}), (s'_y)² = (s'_μ)² + σ²
�Credible interval for the future obs.
• using the z-table, or the t-table with MSE: m'_y ± z_{α/2} s'_y or m'_y ± t_{(α/2; n−2)} s'_y
�Frequentist: E(y|x0) = α + β·x0, and y = α + β·x0 + e ~ Normal
• confidence interval for the mean response
• prediction interval for a future obs.
Bayesian Inference
Example
�Problem
�Solution
Bayesian Inference
Exercise
Bayesian Inference
Resampling
�Definition
� sampling or scrambling the original (sample) data numerous times
�nonparametric method
�Methods
� jackknifing
• computing the statistics of interest (medians, variances, percentiles) for all combinations of the original data or for subsets of the available data
�bootstrapping
• drawing randomly with replacement from a set of data points to estimate the sampling distribution
�permutation tests
• exchanging labels on data points (or scrambling the order of the original data) when performing significance tests
�Randomization (permutation) test
�procedure
• take the original (sample) data
• reorder the data (reshuffle the subscripts)
• calculate the test statistic Λ for testing the null hypothesis
• do the same thing numerous times to get the sampling dist. of Λ and test the null with a p-value
� (example)
• W = height / (weight)^{1/3}
• Test the difference of W by gender.
• Data: W of size n_m (male) and W of size n_f (female)
• The dist. of W is not normal, but very skewed.
• compute the mean difference of W from the original data; for example, 10 units.
• Take n_m from all W of size (n_m + n_f) and assign "male" to the sampled observations.
• Assign "female" to the remaining and compute the means of W separately.
• do the same thing 5,000 (= N) times.
• k = the number of times the mean difference (male − female) went over 10 units. Then k/N is the p-value for the one-sided test.
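The label-shuffling procedure can be sketched directly (the data and the number of permutations below are illustrative, not the W example's real measurements):

```python
import random

def permutation_pvalue(group1, group2, n_perm=5000, seed=1):
    # one-sided permutation test: how often does a label-shuffled
    # mean difference reach the observed mean difference?
    rng = random.Random(seed)
    observed = sum(group1) / len(group1) - sum(group2) / len(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    k = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / (len(pooled) - n1)
        if diff >= observed:
            k += 1
    return k / n_perm

p_same = permutation_pvalue([5, 6, 7, 8], [5, 6, 7, 8])        # no difference
p_diff = permutation_pvalue([10, 11, 12, 13], [1, 2, 3, 4])    # clear difference
```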
Bayesian Inference
Jackknife
�Tukey (1958)
� suggested the Jackknife idea, based on removing data and then recalculating the estimator
�Idea
� sample data of size n: (x1, x2, …, xn)
� leave-one-out mean: x̄_{(−j)} = (1/(n−1)) Σ_{i≠j} x_i
�we know both the sample mean and x̄_{(−j)}, so the jth obs. is
x_j = n·x̄ − (n−1)·x̄_{(−j)}
� similarly, for θ̂ = φ(x1, x2, …, xn), define the pseudo-values
θ̂*_j = n·θ̂ − (n−1)·θ̂_{(−j)}
� the jackknife estimator of θ is θ̂* = (1/n) Σ θ̂*_i
�an approximate sampling error of θ̂*:
V(θ̂*) = V(Σ θ̂*_i / n) = Σ(θ̂*_i − θ̂*)² / (n(n−1))
� the 100(1−α)% conf. interval for θ: θ̂* ± t_{α/2} √V(θ̂*)
�Motivation of the Jackknife
� reduce the bias
�Example
�estimate the species diversity based on n samples?
� total number S = 120 of species in the sample
• it should be an under-estimate
•Burnham and Overton (1978) suggest a Jackknife estimator to reduce the bias:
Ŝ* = S + ((n−1)/n)·f1 = 120 + … = 147.5
Bayesian Inference
Bootstrap
�Idea
� re-sampling of the obtained single sample for generating the sampling dist.
• robustness depends on the sample size; in general, more than 20
�with-replacement sampling
• randomized sampling = without replacement
�Procedure for estimating the bias
�Let θ̂ = φ(x1, x2, …, xn) be the estimator for θ.
�Let θ̂_0 denote the estimate using the original data and θ̂_i the estimate using the ith bootstrap sample.
�The mean of all the P bootstrap estimates, θ̂_B = (1/P) Σ θ̂_i, is used for estimating the bias:
Bias = B(θ̂) = E(θ̂) − θ, and θ̂_B is the bootstrap estimator of E(θ̂).
�Then the bootstrap estimator of the bias is B̂ = θ̂_B − θ̂_0.
�Standard bootstrap confidence intervals
S² = (1/(P−1)) Σ(θ̂_i − θ̂_B)²
θ̂ ± z_{α/2} S
�Efron's percentile confidence intervals
�using the bootstrap sampling dist.: (θ̂_{L,α/2}, θ̂_{H,α/2})
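The bias estimate and the standard-interval ingredients can be sketched with a plain resampling loop (the toy data and resample count are illustrative):

```python
import random

def bootstrap(data, estimator, n_boot=2000, seed=1):
    # returns (bootstrap bias estimate, bootstrap standard error)
    rng = random.Random(seed)
    theta0 = estimator(data)
    thetas = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]   # draw with replacement
        thetas.append(estimator(sample))
    theta_b = sum(thetas) / n_boot                  # mean of bootstrap estimates
    se2 = sum((t - theta_b) ** 2 for t in thetas) / (n_boot - 1)
    return theta_b - theta0, se2 ** 0.5

mean = lambda xs: sum(xs) / len(xs)
bias, se = bootstrap([2.0, 4.0, 6.0, 8.0, 10.0, 12.0], mean)
```

For the sample mean the bias estimate should hover near zero, since the mean is unbiased.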
Bayesian Inference
Wrapping up
�Bootstrap in R
�boot {boot}
•Generates R bootstrap replicates of a statistic applied to data. Both parametric and nonparametric resampling are possible. For the nonparametric bootstrap, possible resampling methods are the ordinary bootstrap, the balanced bootstrap, antithetic resampling, and permutation. For nonparametric multi-sample problems stratified resampling is used; this is specified by including a vector of strata in the call to boot. Importance resampling weights may be specified.
�boot(data, statistic, R, sim="ordinary", stype="i", strata=rep(1,n), L=NULL, m=0, weights=NULL, ran.gen=function(d, p) d, mle=NULL, simple=FALSE, ...)
�Jackknife in R
� jackknife {bootstrap}
• See Efron and Tibshirani (1993) for details on this function.
� jackknife(x, theta, ...)
�Bayesian and Resampling
�"Bayes vs. Resampling: A Rematch", Campbell, Garvey, et al. (2006)
• summary of 3 pages.