STAT 231 MIDTERM 2

Introduction

• Niall MacGillivray
• 2B Actuarial Science

Agenda

• 6:05 – 6:25 Likelihood Functions and MLEs
• 6:25 – 6:35 Regression Model
• 6:35 – 6:50 Gaussian, Chi-Square, t RVs
• 6:50 – 7:00 Sampling and Estimators
• 7:00 – 7:30 Confidence Intervals
• 7:30 – 8:00 Hypothesis Testing

Probability Models

• Random variables
– Represent what we are going to measure in our experiment
• Realizations
– Represent the actual data we have collected from our experiment

Binomial Model

• Problem: what is π, the proportion of the target population that possesses a certain characteristic?
• We will use our data to estimate π
• Let X be a random variable that represents the number of people in your sample (of size n) who possess the characteristic
– X ~ Bin(n, π)
• A realization x of X gives the number of people in our sample who possess the characteristic

Response Model

• Problem: what is μ, the average value of the response variate in the target population?
• We will use our collected data to estimate μ
• Let Y be a random variable that represents the measured response variate
• Y = μ + R, where R ~ G(0, σ)
– Y ~ G(μ, σ)
• A realization y of Y is the measured attribute of one unit in the sample

Maximum Likelihood Estimation

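• Binomial: $\hat{\pi} = \frac{x}{n}$, where x = number of successes
• Response: $\hat{\mu} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, where $y_i$ is the ith realization
• Maximum likelihood estimation: a procedure for determining a parameter estimate under any assumed model, by choosing the parameter value that makes the observed data most probable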

Maximum Likelihood Estimation

• First, we assume the data we collect will follow a known distribution with unknown parameter(s)
• Before we collect the sample: random variables {Y1, Y2, …, Yn}
• After we collect the sample: realizations {y1, y2, …, yn}
• We know the distribution of each Yi (up to its unknown parameters), hence we know its PDF/PMF

Likelihood Function

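• The likelihood function (with Y1, …, Yn independent):
– Discrete (single observation, no sample): $L(\theta) = P(Y = y; \theta), \quad \theta \in \Omega$
– Discrete (with sample): $L(\theta) = \prod_{i=1}^{n} P(Y_i = y_i; \theta), \quad \theta \in \Omega$
– Continuous: $L(\theta) = \prod_{i=1}^{n} f(y_i; \theta), \quad \theta \in \Omega$
• Likelihood: the probability of observing the dataset you have, viewed as a function of θ
– We want to choose the estimate of the parameter θ that gives the largest such probability
– Ω is the parameter space, the set of possible values for θ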
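To see the likelihood idea numerically, here is a minimal Python sketch (not from the course notes) that evaluates the binomial log-likelihood on a grid of π values and keeps the best one. The function names and the data (45 successes in 50 trials) are hypothetical; a grid search only approximates the maximum, but here the grid happens to contain x/n exactly.

```python
import math

def binomial_log_likelihood(pi: float, x: int, n: int) -> float:
    """l(pi) = ln L(pi) for one observation x ~ Bin(n, pi), dropping the constant ln C(n, x)."""
    return x * math.log(pi) + (n - x) * math.log(1 - pi)

def grid_mle(x: int, n: int, steps: int = 10_000) -> float:
    """Return the grid value of pi in (0, 1) with the largest log-likelihood."""
    grid = [k / steps for k in range(1, steps)]  # excludes 0 and 1, where log(0) is undefined
    return max(grid, key=lambda p: binomial_log_likelihood(p, x, n))

if __name__ == "__main__":
    x, n = 45, 50  # hypothetical data: 45 successes in 50 trials
    print(grid_mle(x, n))  # agrees with x / n = 0.9
```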

MLE Process

• Step One: Define the likelihood function L(θ)
• Step Two: Define the log-likelihood function l(θ) = ln[L(θ)]
• Step Three: Take the derivative with respect to θ
– If there are multiple parameters to estimate, take the partial derivative with respect to each parameter
• Step Four: Set the derivative(s) equal to zero and solve to arrive at the maximum likelihood estimate (check that the solution is a maximum, e.g. a negative second derivative)
• Step Five: Plug in data values (if given) to arrive at a numerical maximum likelihood estimate (or numerical values of the other MLEs if there are multiple parameters)
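As a worked instance of the five steps, here is the binomial MLE for one observation x from Bin(n, π). The constant $\binom{n}{x}$ does not depend on π, so it drops out of the derivative.

```latex
\begin{aligned}
L(\pi) &= \binom{n}{x}\pi^{x}(1-\pi)^{n-x}, \qquad 0 < \pi < 1 \\
l(\pi) &= \ln\binom{n}{x} + x\ln\pi + (n-x)\ln(1-\pi) \\
l'(\pi) &= \frac{x}{\pi} - \frac{n-x}{1-\pi} = 0
  \;\Longrightarrow\; x(1-\pi) = (n-x)\pi
  \;\Longrightarrow\; \hat{\pi} = \frac{x}{n} \\
l''(\pi) &= -\frac{x}{\pi^{2}} - \frac{n-x}{(1-\pi)^{2}} < 0 \quad (\text{so } \hat{\pi} \text{ is a maximum})
\end{aligned}
```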

Example 1

Derive the MLEs of the Gaussian distribution with parameters μ and σ, for a sample of data of size n.

$f(y_i; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_i - \mu)^2}{2\sigma^2}}$
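A sketch of the Example 1 derivation following the five steps (standard algebra, not copied from the course notes). Note that the MLE of σ divides by n, unlike the sample standard deviation $\hat{\sigma}_{n-1}$ used later for confidence intervals.

```latex
\begin{aligned}
l(\mu,\sigma) &= \sum_{i=1}^{n}\ln f(y_i;\mu,\sigma)
  = -n\ln\sigma - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_i-\mu)^{2} \\
\frac{\partial l}{\partial \mu} &= \frac{1}{\sigma^{2}}\sum_{i=1}^{n}(y_i-\mu) = 0
  \;\Longrightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y} \\
\frac{\partial l}{\partial \sigma} &= -\frac{n}{\sigma} + \frac{1}{\sigma^{3}}\sum_{i=1}^{n}(y_i-\mu)^{2} = 0
  \;\Longrightarrow\; \hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^{2}}
\end{aligned}
```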

Regression Model
• Let Y|{X=x} be a random variable that represents the measured response variate, given a certain value x of the explanatory variate
• We can define a distribution Y|{X=x} ~ G(μ(x), σ)
– Simple linear regression: Y|{X=x} = α + βx + R, where R ~ G(0, σ)
– Y|{X=x} ~ G(α + βx, σ)
• Response model: μ is the average response variate of the target population
• Regression model: α + βx is the average response variate of the subgroup of the target population specified by the value x of the explanatory variate
– Allows us to look at subgroups within the target population

Regression Model
• Problem: what is α, the average response variate of the subgroup in the target population where the explanatory variate, x, is equal to 0?
• Problem: what is β, the change in the average value of the response variate for a one-unit change in x, our explanatory variate?
• We will use our collected data to estimate α and β using the MLE method

Example 2

Derive the MLEs of the simple linear regression model with parameters α, β, and σ, given a sample of size n.
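Setting the three partial derivatives to zero gives the familiar formulas $\hat{\beta} = S_{xy}/S_{xx}$, $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta}x_i)^2$. The Python sketch below simply evaluates them; `slr_mle` and the five data points are hypothetical, chosen only for illustration.

```python
import math

def slr_mle(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Closed-form MLEs (alpha_hat, beta_hat, sigma_hat) for Y|{X=x} ~ G(alpha + beta*x, sigma)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    # The MLE of sigma divides the residual sum of squares by n (not n - 2)
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / n)
    return alpha_hat, beta_hat, sigma_hat

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
alpha_hat, beta_hat, sigma_hat = slr_mle(x, y)
print(alpha_hat, beta_hat, sigma_hat)  # about 0.05, 1.99 and 0.146
```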

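Gaussian Distribution
• $f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ (σ is the standard deviation)
• If Y ~ G(μ, σ), then $Z = \frac{Y - \mu}{\sigma} \sim G(0, 1)$
• If Y1, …, Yn are independent G(μ1, σ1), …, G(μn, σn), then for constants b1, …, bn:
– $\sum_{i=1}^{n} b_i Y_i \sim G\left(\sum_{i=1}^{n} b_i \mu_i, \sqrt{\sum_{i=1}^{n} b_i^2 \sigma_i^2}\right)$
• If Y1, …, Yn are iid G(μ, σ):
– $\sum_{i=1}^{n} Y_i \sim G(n\mu, \sigma\sqrt{n})$
– $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \sim G(\mu, \sigma/\sqrt{n})$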
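A quick simulation can make the last property concrete. This is a minimal sketch, assuming Python's standard `random` and `statistics` modules: it draws many samples of size n from G(μ, σ) and checks that the sample means have mean about μ and standard deviation about σ/√n. The helper name, parameter values and seed are arbitrary.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=231):
    """Draw `reps` samples of size n from G(mu, sigma) and return their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

means = simulate_sample_means(mu=10.0, sigma=2.0, n=25, reps=20_000)
print(statistics.fmean(means))  # close to mu = 10
print(statistics.stdev(means))  # close to sigma / sqrt(n) = 0.4
```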

Chi-Squared

• Y, a random variable following a $\chi^2_n$ distribution, is defined as
$Y = X_1^2 + X_2^2 + \dots + X_n^2, \quad Y > 0$, where the $X_i \sim G(0, 1)$ are independent
– We say $Y \sim \chi^2(n)$, where n = degrees of freedom
– E(Y) = n
– If $Y_1 \sim \chi^2(n_1)$ and $Y_2 \sim \chi^2(n_2)$ are independent, then $Y_1 + Y_2 \sim \chi^2(n_1 + n_2)$
– Can use tables to get quantiles based on degrees of freedom

Chi-Squared Table

Example 3

• Prove that, if $X \sim \chi^2(m)$ and $Y \sim \chi^2(p)$ are independent, a) E(X) = m  b) $X + Y \sim \chi^2(m + p)$
• If m = 10, estimate P(X > 20)
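For the last part of Example 3, tables only bracket the answer: with 10 degrees of freedom the 0.95 and 0.975 quantiles are 18.307 and 20.483, so 0.025 < P(X > 20) < 0.05. For an even number of degrees of freedom the χ² tail has a closed form (it is an Erlang/Gamma tail), which this hypothetical helper evaluates as a check:

```python
import math

def chi2_survival_even_df(x: float, df: int) -> float:
    """P(X > x) for X ~ chi^2(df) with even df, via the closed form
    e^{-x/2} * sum_{k=0}^{df/2 - 1} (x/2)^k / k!  (valid only for even df)."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

print(round(chi2_survival_even_df(20, 10), 4))  # 0.0293
```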

t Distribution

• T, a random variable following a $t_m$ distribution, is defined as
$T = \frac{Z}{\sqrt{S/m}}$, where $Z \sim G(0, 1)$, $S \sim \chi^2(m)$, and Z and S are independent
• We say $T \sim t_m$, where m is the degrees of freedom
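The definition can be simulated directly. This sketch (hypothetical function name, arbitrary seed) builds T from independent Z and S, then compares its tail with the G(0, 1) value P(|Z| > 1.96) = 0.05, illustrating why the t table gives larger values of c than the Gaussian table when m is small.

```python
import random

def simulate_t(m: int, reps: int, seed: int = 231) -> list[float]:
    """Build T = Z / sqrt(S/m) from the definition, with Z ~ G(0,1) and S ~ chi^2(m) independent."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        s = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))  # chi^2(m) as a sum of m squared G(0,1)
        draws.append(z / (s / m) ** 0.5)
    return draws

t5 = simulate_t(m=5, reps=100_000)
tail = sum(abs(t) > 1.96 for t in t5) / len(t5)
print(tail)  # noticeably above 0.05, the G(0,1) value: t has heavier tails
```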

T Table

Estimators

• In the data stage, we assume each $X_i$ follows a certain distribution
– Population distribution
• From an MLE calculation (as in Example 1), we obtain an estimate, for example $\hat{\theta} = \frac{n}{\sum_{i=1}^{n} \ln(1 + x_i)}$
– This is a number
• The corresponding estimator is $\tilde{\theta} = \frac{n}{\sum_{i=1}^{n} \ln(1 + X_i)}$
– This is a random variable
– Replace all instances of realizations ($x_i$) with random variables ($X_i$)
– If $\hat{\theta} = g(x_1, x_2, \dots, x_n)$, then $\tilde{\theta} = g(X_1, X_2, \dots, X_n)$
• We can then look at the distribution of this estimator, aka the sampling distribution of $\tilde{\theta}$
• We can make probability statements about the accuracy of our parameter estimates

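Response Model Estimators
• In the response model Y = μ + R ~ G(μ, σ):
– For a sample y1, y2, …, yn, $\hat{\mu} = \sum_{i=1}^{n} y_i / n$
• The corresponding estimator is $\tilde{\mu} = \sum_{i=1}^{n} Y_i / n$
• The distribution of the estimator is $\tilde{\mu} \sim G(\mu, \sigma/\sqrt{n})$
• The sample error, $\tilde{\mu} - \mu$, follows $G(0, \sigma/\sqrt{n})$
– For a sample y1, y2, …, yn, $\hat{\sigma}_{n-1} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \hat{\mu})^2}$
• The corresponding estimator is $\tilde{\sigma}_{n-1} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \tilde{\mu})^2}$
• In this case, we call $\tilde{\sigma}_{n-1}^2$ the sample variance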
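A small sketch of the two estimates for a hypothetical sample. Python's `statistics.stdev` uses the same n − 1 divisor as $\hat{\sigma}_{n-1}$, while `statistics.pstdev` divides by n (the Gaussian MLE of σ from Example 1).

```python
import math
import statistics

def response_estimates(y: list[float]) -> tuple[float, float]:
    """Return (mu_hat, sigma_hat_{n-1}) for a sample from Y ~ G(mu, sigma)."""
    n = len(y)
    mu_hat = sum(y) / n
    sigma_hat_n1 = math.sqrt(sum((yi - mu_hat) ** 2 for yi in y) / (n - 1))
    return mu_hat, sigma_hat_n1

y = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
mu_hat, s = response_estimates(y)
print(mu_hat, s)             # 5.0 and sqrt(32/7), about 2.138
print(statistics.stdev(y))   # stdlib sample SD uses the same n - 1 divisor
print(statistics.pstdev(y))  # divisor n, i.e. the Gaussian MLE of sigma: 2.0
```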

Confidence Intervals
• In estimation problems, we use collected data to determine estimates for model parameters
• Confidence intervals are statements about the true model parameter lying between two values, $a \le \mu \le b$:
– We can make a statement about our "confidence" that μ is located somewhere between a and b
– The confidence is measured by probability statements
• We will use sampling distributions to make probability statements as a starting point in determining the endpoints of the confidence interval

Confidence Intervals

• A confidence interval helps answer the question: "What is the probability that $L(D) \le \theta \le U(D)$?"
– $C(\theta) = P[L(D) \le \theta \le U(D)]$ is the coverage probability
– Confidence interval = [l(D), u(D)], computed from the observed data
– Interpretation: if the experiment were repeated many times, the true value of the parameter θ would fall in the interval [l(D), u(D)] in proportion C(θ) of all cases

Confidence Intervals for the Response Model

• Sampling distribution for the response model: $\tilde{\mu} \sim G(\mu, \sigma/\sqrt{n})$
• But we want a distribution we can work with (we want to use our probability tables), so standardizing gives $\frac{\tilde{\mu} - \mu}{\sigma/\sqrt{n}} \sim G(0, 1)$
• For now, we will assume the true value of σ (the population standard deviation) is known
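• Our goal: find (a, b) such that $P(a \le \mu \le b) = 0.95$
• Method: construct a 95% interval estimator (coverage interval) such that
$P\left(-c \le \frac{\tilde{\mu} - \mu}{\sigma/\sqrt{n}} \le c\right) = 0.95$
or equivalently
$P\left(\tilde{\mu} - c\frac{\sigma}{\sqrt{n}} \le \mu \le \tilde{\mu} + c\frac{\sigma}{\sqrt{n}}\right) = 0.95$
What are a and b? Replace the estimator $\tilde{\mu}$ with the estimate $\hat{\mu}$ to get a confidence interval (for 95%, c = 1.96 from the G(0, 1) table).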

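Example 4

• Coverage interval: $\left(\tilde{\mu} - c\frac{\sigma}{\sqrt{n}},\; \tilde{\mu} + c\frac{\sigma}{\sqrt{n}}\right)$
• Confidence interval: $\left(\hat{\mu} - c\frac{\sigma}{\sqrt{n}},\; \hat{\mu} + c\frac{\sigma}{\sqrt{n}}\right)$
• $\frac{\sigma}{\sqrt{n}}$ is called the standard error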
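A minimal sketch of the σ-known interval. `z_interval` is a hypothetical helper and the numbers (ȳ = 20, σ = 4, n = 64) are made up; `statistics.NormalDist().inv_cdf` stands in for the G(0, 1) table lookup.

```python
import math
from statistics import NormalDist

def z_interval(y_bar: float, sigma: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """CI for mu when sigma is known: y_bar +/- c * sigma / sqrt(n), with c from G(0, 1)."""
    c = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = sigma / math.sqrt(n)
    return y_bar - c * se, y_bar + c * se

print(z_interval(y_bar=20.0, sigma=4.0, n=64))  # about (19.02, 20.98)
```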


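• Often, we don’t know the value of σ
• So we need to use the sample standard deviation as an estimator for σ:
$\frac{\tilde{\mu} - \mu}{\sigma/\sqrt{n}}$ becomes $\frac{\tilde{\mu} - \mu}{\tilde{\sigma}_{n-1}/\sqrt{n}}$
where $\tilde{\sigma}_{n-1} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \tilde{\mu})^2}$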
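• $\frac{\tilde{\mu} - \mu}{\tilde{\sigma}_{n-1}/\sqrt{n}}$ no longer follows a G(0, 1) distribution!
– $\frac{\tilde{\mu} - \mu}{\tilde{\sigma}_{n-1}/\sqrt{n}} \sim t_{n-1}$
– $P\left(-c \le \frac{\tilde{\mu} - \mu}{\tilde{\sigma}_{n-1}/\sqrt{n}} \le c\right) = 0.95$, with c from the t table with n − 1 degrees of freedom
– New 95% CI for μ: $\left(\hat{\mu} - c\frac{\hat{\sigma}_{n-1}}{\sqrt{n}},\; \hat{\mu} + c\frac{\hat{\sigma}_{n-1}}{\sqrt{n}}\right)$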

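Example 5

• 95% CI: $\left(\hat{\mu} - c\frac{\hat{\sigma}_{n-1}}{\sqrt{n}},\; \hat{\mu} + c\frac{\hat{\sigma}_{n-1}}{\sqrt{n}}\right)$, with c from $t_{n-1}$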
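The Python standard library has no t quantile function, so this sketch takes c as an argument read from the t table (c = 2.262 for 9 degrees of freedom and 95% confidence). The helper name and the ten observations are hypothetical.

```python
import math
import statistics

def t_interval(y: list[float], c: float) -> tuple[float, float]:
    """CI for mu when sigma is unknown: mu_hat +/- c * sigma_hat_{n-1} / sqrt(n).
    c must be read from the t table with n - 1 degrees of freedom."""
    n = len(y)
    mu_hat = statistics.fmean(y)
    se = statistics.stdev(y) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return mu_hat - c * se, mu_hat + c * se

y = [17.2, 18.1, 17.8, 18.4, 17.5, 17.9, 18.0, 17.6, 17.7, 18.3]  # hypothetical, n = 10
print(t_interval(y, c=2.262))  # c = t_9 table value for a 95% interval
```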

T Table

Confidence Intervals for the Binomial Model

• Population distribution: Y ~ Bin(n, π)
• The parameter we want to estimate is π
• Using MLE, we get the estimate $\hat{\pi} = \frac{y}{n}$
– This is a number
• The corresponding estimator is $\tilde{\pi} = \frac{Y}{n}$
– This is a random variable
– What is its sampling distribution?
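• To derive the sampling distribution of $\tilde{\pi}$, consider the expectation and variance of Y:
– E(Y) = nπ
– Var(Y) = nπ(1 − π)
• The CLT tells us that, for large n, $Y \sim Bin(n, \pi)$ is well approximated by a Gaussian: $Y \approx G\left(n\pi, \sqrt{n\pi(1-\pi)}\right)$
• Then $\tilde{\pi} = \frac{Y}{n}$ is also approximately Gaussian: $\tilde{\pi} \approx G\left(\pi, \sqrt{\frac{\pi(1-\pi)}{n}}\right)$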
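Standardizing gives $\frac{\tilde{\pi} - \pi}{\sqrt{\pi(1-\pi)/n}} \approx G(0, 1)$

We will use the approximation $\sqrt{\frac{\tilde{\pi}(1-\tilde{\pi})}{n}}$ instead of $\sqrt{\frac{\pi(1-\pi)}{n}}$

Confidence interval: $\left(\hat{\pi} - c\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}},\; \hat{\pi} + c\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}\right)$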

Example 6

• Confidence interval: $\hat{\pi} \pm c\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$
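A sketch of the approximate binomial interval, using the counts from Example 8 (45 successes in 50 trials) purely as illustrative input; `binomial_interval` is a hypothetical name, and the approximation relies on n being large enough for the CLT.

```python
import math
from statistics import NormalDist

def binomial_interval(y: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate CI for pi: pi_hat +/- c * sqrt(pi_hat * (1 - pi_hat) / n), c from G(0, 1)."""
    pi_hat = y / n
    c = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(pi_hat * (1 - pi_hat) / n)
    return pi_hat - c * se, pi_hat + c * se

print(binomial_interval(y=45, n=50))  # about (0.817, 0.983)
```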

Confidence Intervals for the Regression Model

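• Population distribution: $Y|\{X = x\} \sim G(\alpha + \beta x, \sigma)$
• Using MLE, we obtain: $\tilde{\beta} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})Y_i}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$
• Your course notes simplify by writing $\tilde{\beta} = \sum_{i=1}^{n} c_i Y_i$, where $c_i = \frac{x_i - \bar{x}}{\sum_{j=1}^{n}(x_j - \bar{x})^2}$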
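• What is the sampling distribution of $\tilde{\beta} = \sum_{i=1}^{n} c_i Y_i$?
– $\tilde{\beta}$ is a linear combination of independent Gaussians, and thus is Gaussian itself
– $\tilde{\beta} \sim G\left(\beta, \sigma\sqrt{\sum_{i=1}^{n} c_i^2}\right)$
– Standardizing gives $\frac{\tilde{\beta} - \beta}{\sigma\sqrt{\sum_{i=1}^{n} c_i^2}} \sim G(0, 1)$
– If σ is unknown, then $\frac{\tilde{\beta} - \beta}{\tilde{\sigma}_{n-2}\sqrt{\sum_{i=1}^{n} c_i^2}} \sim t_{n-2}$, where $\tilde{\sigma}_{n-2}^2 = \frac{1}{n-2}\sum_{i=1}^{n}(Y_i - \tilde{\alpha} - \tilde{\beta}x_i)^2$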
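• Confidence interval: $\left(\hat{\beta} - c\,\hat{\sigma}_{n-2}\sqrt{\sum_{i=1}^{n} c_i^2},\; \hat{\beta} + c\,\hat{\sigma}_{n-2}\sqrt{\sum_{i=1}^{n} c_i^2}\right)$
– Assuming σ is unknown, we get c from the t table with (n − 2) degrees of freedom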
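A sketch of the regression interval for β, built from the c_i weights of the course notes' simplification. The data are hypothetical; n = 5 gives n − 2 = 3 degrees of freedom, and c = 3.182 is the t table value for 95% confidence with 3 degrees of freedom.

```python
import math

def beta_interval(x: list[float], y: list[float], c: float) -> tuple[float, float]:
    """CI for beta: beta_hat +/- c * sigma_hat_{n-2} * sqrt(sum c_i^2).
    c must come from the t table with n - 2 degrees of freedom."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    c_i = [(xi - x_bar) / s_xx for xi in x]  # beta_tilde = sum of c_i * Y_i
    beta_hat = sum(ci * yi for ci, yi in zip(c_i, y))
    alpha_hat = sum(y) / n - beta_hat * x_bar
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat_n2 = math.sqrt(rss / (n - 2))  # n - 2 divisor, not the MLE's n
    se = sigma_hat_n2 * math.sqrt(sum(ci ** 2 for ci in c_i))
    return beta_hat - c * se, beta_hat + c * se

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
print(beta_interval(x, y, c=3.182))  # about (1.80, 2.18)
```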

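Terminology
• The random variables that we’ve used to construct confidence intervals are called pivotal quantities
– Their distribution does not depend on the values of the unknown parameters
• Confidence intervals are often written in the form Point Estimate ± c × Standard Error (SE)
– Point estimate: the MLE for the parameter
– c: found using probability tables, depending on the distribution of the pivotal quantity
• Pivotal quantities:
– Response (σ known): $\frac{\tilde{\mu} - \mu}{\sigma/\sqrt{n}} \sim G(0, 1)$
– Response (σ unknown): $\frac{\tilde{\mu} - \mu}{\tilde{\sigma}_{n-1}/\sqrt{n}} \sim t_{n-1}$
– Binomial: $\frac{\tilde{\pi} - \pi}{\sqrt{\tilde{\pi}(1-\tilde{\pi})/n}} \approx G(0, 1)$
– Regression: $\frac{\tilde{\beta} - \beta}{\tilde{\sigma}_{n-2}\sqrt{\sum_{i=1}^{n} c_i^2}} \sim t_{n-2}$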

Terminology

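Standard Error (SE): square root of the variance of our sampling distribution (replace all unknown parameters, e.g. σ, with estimates)

• Response (σ known): $\frac{\sigma}{\sqrt{n}}$
• Response (σ unknown): $\frac{\hat{\sigma}_{n-1}}{\sqrt{n}}$
• Binomial: $\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$
• Regression: $\hat{\sigma}_{n-2}\sqrt{\sum_{i=1}^{n} c_i^2}$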

Confidence Interval Recap

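Response Model (σ known): $\hat{\mu} \pm c\frac{\sigma}{\sqrt{n}}$

Response Model (σ unknown): $\hat{\mu} \pm c\frac{\hat{\sigma}_{n-1}}{\sqrt{n}}$

Binomial Model: $\hat{\pi} \pm c\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$

Regression Model: $\hat{\beta} \pm c\,\hat{\sigma}_{n-2}\sqrt{\sum_{i=1}^{n} c_i^2}$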

Interpretation of the Confidence Interval

• Does NOT mean there is a 95% chance that the true parameter lies between a and b (a computed interval either contains it or does not)

• 95% confidence interval: if we repeatedly collected data and calculated many confidence intervals, about 95% of them would contain the actual parameter
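The repeated-sampling interpretation can be checked by simulation. This sketch (hypothetical function, arbitrary parameters and seed) builds many 95% intervals with σ known and reports the fraction that contain the true μ.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu: float, sigma: float, n: int, reps: int, seed: int = 231) -> float:
    """Fraction of simulated 95% z-intervals (sigma known) that contain the true mu."""
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(0.975)
    half_width = c * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        y_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        hits += (y_bar - half_width <= mu <= y_bar + half_width)
    return hits / reps

print(coverage(mu=50.0, sigma=10.0, n=20, reps=10_000))  # close to 0.95
```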

Hypothesis Testing

1) Define the null hypothesis, define the alternate hypothesis

2) Define the test statistic, identify its distribution, and calculate the observed value

3) Calculate the p-value

4) Make a conclusion about your hypothesis

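Hypothesis Testing

1) Define the null hypothesis and the alternate hypothesis:

Null hypothesis always contains an "=" sign!

• $H_0: \theta = \theta_0$ vs $H_a: \theta \ne \theta_0$
• $H_0: \theta = \theta_0$ vs $H_a: \theta > \theta_0$
• $H_0: \theta = \theta_0$ vs $H_a: \theta < \theta_0$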

Hypothesis Testing

2) Define the test statistic, identify its distribution, and calculate the observed value

Assume that H0 will be tested using some random data.

Test statistic: a random variable, denoted D
Distribution: the distribution of the test statistic, i.e. the standardized sampling distribution of the model, assuming H0 is true
Observed value: a realization of the test statistic from our data, denoted $d_{obs}$

Hypothesis Testing

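Test statistics:

• Response (σ known): $D = \frac{\tilde{\mu} - \mu_0}{\sigma/\sqrt{n}} \sim G(0, 1)$
• Response (σ unknown): $D = \frac{\tilde{\mu} - \mu_0}{\tilde{\sigma}_{n-1}/\sqrt{n}} \sim t_{n-1}$
• Binomial: $D = \frac{\tilde{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} \approx G(0, 1)$

These distributions only hold because of the null hypothesis: θ = θ0.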

Hypothesis Testing

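Calculate the observed value:

• Response (σ known): $d_{obs} = \frac{\hat{\mu} - \mu_0}{\sigma/\sqrt{n}}$
• Response (σ unknown): $d_{obs} = \frac{\hat{\mu} - \mu_0}{\hat{\sigma}_{n-1}/\sqrt{n}}$
• Binomial: $d_{obs} = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$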

Hypothesis Testing

3) Calculate the p-value

• p-value = $P(|D| \ge |d_{obs}|)$ for $H_a: \theta \ne \theta_0$
• p-value = $P(D \ge d_{obs})$ for $H_a: \theta > \theta_0$
• p-value = $P(D \le d_{obs})$ for $H_a: \theta < \theta_0$
– The p-value (aka observed significance level) is the tail probability of observing a dataset at least as extreme as our sample data, assuming H0 is true
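For test statistics that are G(0, 1) under H0, the three p-value rules can be sketched with `statistics.NormalDist`. The `alternative` labels are hypothetical names; t-based statistics need the t table (or a t CDF) instead.

```python
from statistics import NormalDist

def z_p_value(d_obs: float, alternative: str) -> float:
    """p-value for a test statistic D ~ G(0, 1) under H0.
    alternative: "two-sided" (theta != theta_0), "greater" (theta > theta_0) or "less" (theta < theta_0)."""
    Z = NormalDist()
    if alternative == "two-sided":
        return 2 * (1 - Z.cdf(abs(d_obs)))
    if alternative == "greater":
        return 1 - Z.cdf(d_obs)
    if alternative == "less":
        return Z.cdf(d_obs)
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("two-sided", "greater", "less"):
    print(alt, round(z_p_value(1.96, alt), 3))
```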

Hypothesis Testing

4) Make a conclusion about your hypothesis

General rule of thumb:
• If the p-value > 0.05, do not reject the null hypothesis
• If the p-value < 0.05, reject the null hypothesis

Example 7

What if we want to test if the average weight is less than 18 ounces?

T Table

Example 8

Yao Ming is assumed to shoot free throws at an 80% success rate. In a sample of 50 free throws, Yao Ming makes 45. Test the hypothesis that Yao is an 80% free throw shooter.
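A sketch of Example 8, assuming a two-sided alternative $H_a: \pi \ne 0.8$ and the binomial test statistic with π0 in the standard error. Under these assumptions the p-value comes out near 0.077, above 0.05, so the rule of thumb says do not reject H0.

```python
import math
from statistics import NormalDist

# H0: pi = 0.8 vs H_a: pi != 0.8 (two-sided), using the binomial test statistic
n, y, pi_0 = 50, 45, 0.8
pi_hat = y / n
d_obs = (pi_hat - pi_0) / math.sqrt(pi_0 * (1 - pi_0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(d_obs)))
print(round(d_obs, 3), round(p_value, 3))  # about 1.768 and 0.077
```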

T Table

Example 9

Professor Banerjee models the relationship between Stat 230 marks (X) and Stat 231 marks (Y) using a simple linear regression model and a sample of size 102. He obtains the following results:

• MLE for alpha: 99%; standard error for alpha: 10%
• MLE for beta: −0.4; standard error for beta: 0.20
• MLE for sigma: 0.21; standard error for sigma: 0.04

Test the hypothesis that Beta = 0.
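A sketch of Example 9, assuming a two-sided alternative and the regression pivotal quantity with n − 2 = 100 degrees of freedom, so $d_{obs} = (-0.4 - 0)/0.20 = -2$. Since the Python standard library has no t CDF, the hypothetical helpers below integrate the t density numerically with Simpson's rule. The result, about 0.048, is just below 0.05 (reject H0 by the rule of thumb); the G(0, 1) approximation 2(1 − Φ(2)) ≈ 0.0455 is close because 100 degrees of freedom is large.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_const) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_sided_p(d_obs: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |d_obs|) for T ~ t_df, using Simpson's rule on [0, |d_obs|] and symmetry."""
    b = abs(d_obs)
    h = b / steps  # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(b, df)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area *= h / 3  # approximates P(0 <= T <= |d_obs|)
    return 2 * (0.5 - area)

# H0: beta = 0 vs H_a: beta != 0, with n - 2 = 100 degrees of freedom
d_obs = (-0.4 - 0) / 0.20
print(d_obs, round(t_two_sided_p(d_obs, df=100), 3))  # -2.0 and about 0.048
```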

Questions

Questions???
