STA 517 – Introduction: Distribution and Inference

Upload: august-edwards

Post on 18-Dec-2015


1.5 STATISTICAL INFERENCE FOR MULTINOMIAL PARAMETERS

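Recall the multinomial distribution, multi$(n, \pi)$ with $\pi = (\pi_1, \pi_2, \ldots, \pi_c)$.

Suppose that each of n independent, identical trials can have outcome in any of c categories. Let

$$y_{ij} = \begin{cases} 1 & \text{if trial } i \text{ has outcome in category } j, \\ 0 & \text{otherwise.} \end{cases}$$

Then $y_i = (y_{i1}, \ldots, y_{ic})$ represents a multinomial trial, with $\sum_j y_{ij} = 1$.

Let $n_j = \sum_i y_{ij}$ denote the number of trials having outcome in category j.

The counts $(n_1, n_2, \ldots, n_c)$ have the multinomial distribution.

Note: the $n_j$ are random variables.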
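The indicator construction can be made concrete with a small simulation. This is only a sketch: the function name `simulate_multinomial`, the probabilities (0.5, 0.3, 0.2) and n = 1000 are illustrative choices, not values from the slides.

```python
import random

def simulate_multinomial(n, probs, seed=0):
    """Run n independent trials and return the category counts (n_1, ..., n_c)."""
    rng = random.Random(seed)
    c = len(probs)
    counts = [0] * c
    for _ in range(n):
        # Draw the category j of this trial.
        j = rng.choices(range(c), weights=probs, k=1)[0]
        # Indicator vector y_i: 1 in position j, 0 elsewhere.
        y_i = [1 if k == j else 0 for k in range(c)]
        assert sum(y_i) == 1  # exactly one category per trial
        # n_j is the sum of the indicators y_ij over trials.
        for k in range(c):
            counts[k] += y_i[k]
    return counts

counts = simulate_multinomial(n=1000, probs=[0.5, 0.3, 0.2])
print(counts, sum(counts))
```

Changing the seed changes the counts, which is the sense in which the $n_j$ are random variables; their sum is always n.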


Example: Mendel’s theory

To test his theories of natural inheritance, Mendel crossed pea plants of pure yellow strain with plants of pure green strain.

He predicted that second-generation hybrid seeds would be 75% yellow and 25% green, yellow being the dominant strain.

One experiment produced n = 8023 seeds, of which n1 = 6022 were yellow and n2 = 2001 were green.

We want to test whether these counts follow the predicted 3:1 ratio.


1.5.1 Estimation of Multinomial Parameters

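To obtain the MLE, note that the multinomial probability mass function is proportional to the kernel

$$\prod_{j} \pi_j^{n_j}, \quad \text{where all } \pi_j \ge 0 \text{ and } \sum_j \pi_j = 1. \tag{1.14}$$

The MLE are the $\{\pi_j\}$ that maximize (1.14). Since $\pi_c = 1 - \sum_{j=1}^{c-1} \pi_j$, the log likelihood is

$$L(\pi) = \sum_{j=1}^{c} n_j \log \pi_j = \sum_{j=1}^{c-1} n_j \log \pi_j + n_c \log\Bigl(1 - \sum_{j=1}^{c-1} \pi_j\Bigr).$$

Differentiating L with respect to $\pi_j$ gives the likelihood equation

$$\frac{\partial L(\pi)}{\partial \pi_j} = \frac{n_j}{\pi_j} - \frac{n_c}{\pi_c} = 0, \quad j = 1, \ldots, c-1.$$

The ML solution satisfies $\hat\pi_j / \hat\pi_c = n_j / n_c$.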


MLE

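Now, substituting $\hat\pi_j = \hat\pi_c\, n_j / n_c$,

$$\sum_j \hat\pi_j = 1 = \frac{\hat\pi_c \sum_j n_j}{n_c} = \frac{\hat\pi_c\, n}{n_c},$$

so $\hat\pi_c = n_c / n$. Thus the MLE is

$$\hat\pi_j = \frac{n_j}{n}, \quad j = 1, \ldots, c.$$

The MLE are the sample proportions.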
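As a quick numerical sanity check, the sketch below evaluates the log of the multinomial kernel at the sample proportions for Mendel's counts and at a few perturbed probability vectors. The helper names `log_kernel` and `mle` and the perturbation sizes are illustrative assumptions.

```python
import math

def log_kernel(counts, probs):
    """Multinomial log likelihood up to an additive constant: sum_j n_j * log(pi_j)."""
    return sum(n_j * math.log(p_j) for n_j, p_j in zip(counts, probs) if n_j > 0)

def mle(counts):
    """Sample proportions n_j / n."""
    n = sum(counts)
    return [n_j / n for n_j in counts]

counts = [6022, 2001]          # Mendel's yellow and green seed counts
pi_hat = mle(counts)
best = log_kernel(counts, pi_hat)

# Moving probability mass between the two categories can only lower the log likelihood.
for delta in (-0.05, -0.01, 0.01, 0.05):
    shifted = [pi_hat[0] + delta, pi_hat[1] - delta]
    assert log_kernel(counts, shifted) < best

print([round(p, 4) for p in pi_hat])   # [0.7506, 0.2494]
```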


1.5.2 Pearson Statistic for Testing a Specified Multinomial

In 1900 the eminent British statistician Karl Pearson introduced a hypothesis test that was one of the first inferential methods.

It had a revolutionary impact on categorical data analysis, which had focused on describing associations.

Pearson’s test evaluates whether multinomial parameters equal certain specified values.


Pearson Statistic

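Consider

$$H_0: \pi_j = \pi_{j0}, \quad j = 1, \ldots, c, \quad \text{where } \sum_j \pi_{j0} = 1.$$

When H0 is true, the expected values of $\{n_j\}$, called expected frequencies, are

$$\mu_j = n \pi_{j0}, \quad j = 1, \ldots, c.$$

Pearson proposed the test statistic

$$X^2 = \sum_j \frac{(n_j - \mu_j)^2}{\mu_j}.$$

Greater differences $\{n_j - \mu_j\}$ produce greater $X^2$ values, for fixed n.

Let $X_o^2$ denote the observed value of $X^2$. The P-value is the null value of $P(X^2 \ge X_o^2)$. For large n, $X^2$ has approximately a chi-squared null distribution with df = c - 1.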
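For small n, the P-value $P(X^2 \ge X_o^2)$ can be computed exactly by summing the null multinomial probabilities of every count array whose Pearson statistic is at least as large as the observed one. The sketch below does this by brute-force enumeration; the sample (5, 5) with n = 10 is a made-up illustration, not data from the slides, and the function names are hypothetical.

```python
import math
from itertools import product

def pearson_x2(counts, null_probs):
    """Pearson statistic: sum_j (n_j - mu_j)^2 / mu_j with mu_j = n * pi_j0."""
    n = sum(counts)
    return sum((n_j - n * p) ** 2 / (n * p) for n_j, p in zip(counts, null_probs))

def multinomial_pmf(counts, probs):
    """Multinomial probability of one count array."""
    coef = math.factorial(sum(counts))
    for n_j in counts:
        coef //= math.factorial(n_j)
    return coef * math.prod(p ** n_j for p, n_j in zip(probs, counts))

def exact_p_value(observed, null_probs):
    """Sum null probabilities of all count arrays with X^2 >= the observed X^2."""
    n, c = sum(observed), len(observed)
    x2_obs = pearson_x2(observed, null_probs)
    total = 0.0
    for head in product(range(n + 1), repeat=c - 1):
        last = n - sum(head)
        if last < 0:
            continue  # counts must sum to n
        table = list(head) + [last]
        # Small tolerance guards against floating-point ties.
        if pearson_x2(table, null_probs) >= x2_obs - 1e-12:
            total += multinomial_pmf(table, null_probs)
    return x2_obs, total

x2_obs, p_exact = exact_p_value([5, 5], [0.75, 0.25])
print(x2_obs, p_exact)
```

Enumeration grows quickly with n and c, which is why the large-sample chi-squared approximation is used in practice.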


1.5.3 Example: Testing Mendel’s Theories

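Observed counts: n1 = 6022 yellow, n2 = 2001 green. The MLE are

$$\hat\pi_1 = \frac{6022}{8023} = 0.7506, \qquad \hat\pi_2 = \frac{2001}{8023} = 0.2494.$$

Test whether the counts follow the 3:1 ratio, i.e.

$$H_0: \pi_{10} = 0.75, \quad \pi_{20} = 0.25.$$

Expected frequencies are

$$\mu_1 = 8023(0.75) = 6017.25, \qquad \mu_2 = 8023(0.25) = 2005.75.$$

The Pearson statistic is

$$X^2 = \frac{(6022 - 6017.25)^2}{6017.25} + \frac{(2001 - 2005.75)^2}{2005.75} = 0.015,$$

with df = 1 and P-value 0.90.

This does not contradict Mendel's hypothesis.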
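The arithmetic for Mendel's data can be reproduced with a few lines of Python. This is a sketch for the two-category case only: it uses the identity $P(\chi^2_1 \ge x) = \operatorname{erfc}(\sqrt{x/2})$, which holds for df = 1, and the function name is a hypothetical choice.

```python
import math

def pearson_test_two_categories(counts, null_probs):
    """Expected frequencies, Pearson X^2, and the df = 1 chi-squared P-value."""
    n = sum(counts)
    expected = [n * p for p in null_probs]
    x2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # For df = 1 only: P(chi2_1 >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(x2 / 2))
    return expected, x2, p_value

expected, x2, p_value = pearson_test_two_categories([6022, 2001], [0.75, 0.25])
print(expected)            # [6017.25, 2005.75]
print(round(x2, 3))        # 0.015
print(round(p_value, 2))   # 0.9
```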


SAS code

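data D;
  input outcome $ w;   /* category label and its observed count */
  cards;
yellow 6022
green 2001
;

proc freq data=D;
  weight w;            /* w holds the cell counts */
  /* Levels are ordered alphabetically (green, yellow), so TESTP lists green first */
  table outcome / chisq testp=(0.25 0.75);
run;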


Pearson statistic

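When c = 2, it can be proved that the Pearson chi-squared statistic equals the squared score statistic. How about c > 2?

Proof for c = 2, using the symbolic toolbox in MATLAB (Maple kernel):

syms y n pi0
% Pearson statistic with expected counts n*pi0 and n*(1-pi0)
f = (y-n*pi0)^2/(n*pi0) + ((n-y)-n*(1-pi0))^2/(n*(1-pi0));
f1 = simplify(f)   % older releases used simple(f)
% result is equivalent to (y-n*pi0)^2/(n*pi0*(1-pi0)),
% i.e. the squared score statistic (y/n-pi0)^2/(pi0*(1-pi0)/n)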
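The c = 2 identity can also be checked numerically with exact rational arithmetic. The sketch below compares the two statistics for a handful of (y, n) pairs; apart from Mendel's (6022, 8023), the pairs are arbitrary illustrations, and a finite check like this complements the symbolic proof rather than replacing it.

```python
from fractions import Fraction

def pearson_x2(y, n, pi0):
    """Pearson statistic for two categories with counts y and n - y."""
    return ((y - n * pi0) ** 2 / (n * pi0)
            + ((n - y) - n * (1 - pi0)) ** 2 / (n * (1 - pi0)))

def squared_score(y, n, pi0):
    """Squared score statistic: (pi_hat - pi0)^2 / (pi0 * (1 - pi0) / n)."""
    pi_hat = Fraction(y, n)
    return (pi_hat - pi0) ** 2 / (pi0 * (1 - pi0) / n)

pi0 = Fraction(3, 4)
pairs = [(6022, 8023), (5, 10), (0, 7), (7, 7)]
for y, n in pairs:
    # Fractions make the comparison exact, with no rounding error.
    assert pearson_x2(y, n, pi0) == squared_score(y, n, pi0)

print(float(pearson_x2(6022, 8023, pi0)))
```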


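1.5.5 Likelihood-Ratio Chi-Squared

An alternative test for multinomial parameters uses the likelihood-ratio test.

The kernel of the multinomial likelihood is $\prod_j \pi_j^{n_j}$. Under H0 the likelihood is maximized when $\hat\pi_j = \pi_{j0}$. In the general case, it is maximized when $\hat\pi_j = n_j / n$. The ratio of the likelihoods equals

$$\Lambda = \frac{\prod_j (\pi_{j0})^{n_j}}{\prod_j (n_j / n)^{n_j}}.$$

Thus, the likelihood-ratio statistic is

$$G^2 = -2 \log \Lambda = 2 \sum_j n_j \log \frac{n_j}{n \pi_{j0}}.$$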
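To see how the likelihood-ratio statistic $G^2 = 2 \sum_j n_j \log(n_j / (n \pi_{j0}))$ compares with the Pearson statistic in practice, the sketch below evaluates both for Mendel's counts. The function name `likelihood_ratio_g2` is a hypothetical choice.

```python
import math

def likelihood_ratio_g2(counts, null_probs):
    """G^2 = 2 * sum_j n_j * log(n_j / (n * pi_j0))."""
    n = sum(counts)
    # Cells with n_j = 0 contribute 0 (the limit of x * log(x) as x -> 0).
    return 2 * sum(n_j * math.log(n_j / (n * p))
                   for n_j, p in zip(counts, null_probs) if n_j > 0)

counts, null_probs = [6022, 2001], [0.75, 0.25]
n = sum(counts)
g2 = likelihood_ratio_g2(counts, null_probs)
x2 = sum((n_j - n * p) ** 2 / (n * p) for n_j, p in zip(counts, null_probs))
print(round(g2, 3), round(x2, 3))
```

With a sample this large and a null hypothesis that fits well, the two statistics agree to about three decimal places.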


LR

In the general case, the parameter space consists of $\{\pi_j\}$ subject to $\sum_j \pi_j = 1$, so the dimensionality is c - 1. Under H0, the $\{\pi_j\}$ are specified completely, so the dimension is 0. The difference in these dimensions equals c - 1.

For large n, $G^2$ has a chi-squared null distribution with df = c - 1.


When H0 holds, $X^2$ and $G^2$ both have asymptotic chi-squared distributions with df = c - 1, and the two statistics are asymptotically equivalent.


Wu, Ma, George (2007)


1.5.6 Testing with Estimated Expected Frequencies

Pearson's chi-squared statistic was proposed for testing $H_0: \pi_j = \pi_{j0}$, where the $\pi_{j0}$ are fixed.

In some applications, $\pi_{j0} = \pi_{j0}(\theta)$ are functions of a small set of unknown parameters $\theta$.

ML estimates of $\theta$ determine ML estimates of $\{\pi_{j0} = \pi_{j0}(\theta)\}$ and hence ML estimates of the expected frequencies in $X^2$.

Replacing $\{\pi_{j0}\}$ by estimates affects the distribution of $X^2$.

The true df = (c - 1) - dim($\theta$).


Example

A sample of 156 dairy calves born in Okeechobee County, Florida, were classified according to whether they caught pneumonia within 60 days of birth.

Calves that got a pneumonia infection were also classified according to whether they got a secondary infection within 2 weeks after the first infection cleared up.

Hypothesis: the primary infection had an immunizing effect that reduced the likelihood of a secondary infection.

How to test it?


Data structure

Calves that did not get a primary infection could not get a secondary infection, so no observations can fall in the category for ‘‘no’’ primary infection and ‘‘yes’’ secondary infection.

That combination is called a structural zero.


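Test: whether the probability of primary infection was the same as the conditional probability of secondary infection, given that the calf got the primary infection.

If $\pi_{ab}$ denotes the probability that a calf is classified in row a and column b of this table, the null hypothesis is

$$H_0: \pi_{11} + \pi_{12} = \frac{\pi_{11}}{\pi_{11} + \pi_{12}}.$$

Let $\pi = \pi_{11} + \pi_{12}$ denote the probability of primary infection. Then under the hypothesis the cell probabilities are

$$\pi_{11} = \pi^2, \qquad \pi_{12} = \pi(1 - \pi), \qquad \pi_{22} = 1 - \pi.$$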


MLE and chi-squared test

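Likelihood: with null cell probabilities $\pi^2$, $\pi(1-\pi)$ and $1-\pi$, the kernel of the multinomial likelihood is

$$(\pi^2)^{n_{11}} (\pi - \pi^2)^{n_{12}} (1 - \pi)^{n_{22}}.$$

Log likelihood:

$$L(\pi) = n_{11} \log \pi^2 + n_{12} \log(\pi - \pi^2) + n_{22} \log(1 - \pi).$$

Differentiation with respect to $\pi$ gives the likelihood equation

$$\frac{2 n_{11}}{\pi} + \frac{n_{12}}{\pi} - \frac{n_{12}}{1 - \pi} - \frac{n_{22}}{1 - \pi} = 0.$$

Solution:

$$\hat\pi = \frac{2 n_{11} + n_{12}}{2 n_{11} + 2 n_{12} + n_{22}}.$$

For the example, with observed counts $n_{11} = 30$, $n_{12} = 63$, $n_{22} = 63$ (n = 156), $\hat\pi = 0.494$.

Expected counts for each cell: $n\hat\pi^2 = 38.1$, $n\hat\pi(1 - \hat\pi) = 39.0$, $n(1 - \hat\pi) = 78.9$. The Pearson statistic is $X^2 = 19.7$ with df = (3 - 1) - 1 = 1, which is strong evidence against H0 (P ≈ 0.00001). Many more calves got a primary infection but not a secondary infection than H0 predicts.

Conclusion: the primary infection had an immunizing effect that reduced the likelihood of a secondary infection.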
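The estimation and test for the calf example can be traced step by step in code. The cell counts used below (30, 63, 63) are an assumption in the sense that the data table did not survive in this transcript; they are the values from the textbook version of this example and they sum to the stated 156 calves. The P-value uses the df = 1 identity $P(\chi^2_1 \ge x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

# Assumed counts (table not present in the transcript):
# n11 = primary and secondary infection, n12 = primary only, n22 = no infection.
n11, n12, n22 = 30, 63, 63
n = n11 + n12 + n22

# ML estimate of pi, the probability of primary infection, under H0.
pi_hat = (2 * n11 + n12) / (2 * n11 + 2 * n12 + n22)

# Estimated expected frequencies: n*pi^2, n*pi*(1 - pi), n*(1 - pi).
observed = [n11, n12, n22]
expected = [n * pi_hat ** 2, n * pi_hat * (1 - pi_hat), n * (1 - pi_hat)]

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = (len(observed) - 1) - 1              # one parameter (pi) was estimated
p_value = math.erfc(math.sqrt(x2 / 2))    # valid because df == 1

print(round(pi_hat, 3), [round(e, 1) for e in expected], round(x2, 1), df)
# 0.494 [38.1, 39.0, 78.9] 19.7 1
```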


Standard Error

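Since

$$\frac{\partial^2 L(\pi)}{\partial \pi^2} = -\frac{2 n_{11} + n_{12}}{\pi^2} - \frac{n_{12} + n_{22}}{(1 - \pi)^2},$$

the information is the negative of its expected value, which is

$$n \left[ \frac{2\pi^2 + \pi(1 - \pi)}{\pi^2} + \frac{\pi(1 - \pi) + (1 - \pi)}{(1 - \pi)^2} \right],$$

which simplifies to

$$\frac{n (1 + \pi)}{\pi (1 - \pi)}.$$

The asymptotic standard error is the square root of the inverse information, or

$$\sqrt{\frac{\hat\pi (1 - \hat\pi)}{n (1 + \hat\pi)}}.$$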
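The simplification of the information and the resulting standard error can be checked numerically. In this sketch the function names are hypothetical, and the counts (30, 63, 63) are again assumed from the textbook version of the calf example, since the table is not in this transcript.

```python
import math

def information_unsimplified(pi, n):
    """Expected information before simplification, using E(n11) = n*pi^2,
    E(n12) = n*pi*(1 - pi), E(n22) = n*(1 - pi)."""
    return n * ((2 * pi ** 2 + pi * (1 - pi)) / pi ** 2
                + (pi * (1 - pi) + (1 - pi)) / (1 - pi) ** 2)

def information(pi, n):
    """Simplified form: n * (1 + pi) / (pi * (1 - pi))."""
    return n * (1 + pi) / (pi * (1 - pi))

# The two forms agree for any pi strictly between 0 and 1.
for pi in (0.1, 0.3, 0.5, 0.9):
    assert math.isclose(information(pi, 156), information_unsimplified(pi, 156))

n11, n12, n22 = 30, 63, 63   # assumed counts
n = n11 + n12 + n22
pi_hat = (2 * n11 + n12) / (2 * n11 + 2 * n12 + n22)

# Asymptotic standard error: square root of the inverse information at pi_hat.
se = 1 / math.sqrt(information(pi_hat, n))
print(pi_hat, se)
```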


How about confidence limits?


SAS code - MLE, test for binomial

proc iml;
  y = 842; n = 1824; pi0 = 0.5;                 /* data */
  pihat = y/n;                                   /* MLE */
  SE = sqrt(pihat*(1-pihat)/n);                  /* estimated standard error of pihat */
  WaldStat = (pihat-pi0)**2 / SE**2;             /* Wald statistic */
  pWald = 1 - CDF('CHISQUARE', WaldStat, 1);
  LR = 2*(y*log(pihat/pi0) + (n-y)*log((1-pihat)/(1-pi0)));   /* likelihood-ratio statistic */
  pLR = 1 - CDF('CHISQUARE', LR, 1);
  ScoreStat = (pihat-pi0)**2 / (pi0*(1-pi0)/n);  /* score statistic */
  pScore = 1 - CDF('CHISQUARE', ScoreStat, 1);
  print WaldStat pWald; print LR pLR; print ScoreStat pScore;
quit;


SAS code - MLE, test for binomial

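data D;
  input outcome $ w;   /* response label and its observed count */
  cards;
Yes 842
No 982
;

proc freq data=D;
  weight w;            /* w holds the cell counts */
  /* Test H0: P(outcome = "Yes") = 0.5 and request confidence limits */
  table outcome / all CL BINOMIAL(P=0.5 LEVEL="Yes");
  exact binomial;      /* exact binomial test */
run;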


SAS code – multinomial

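data D;
  input outcome $ w;   /* category label and its observed count */
  cards;
yellow 6022
green 2001
;

proc freq data=D;
  weight w;            /* w holds the cell counts */
  /* Levels are ordered alphabetically (green, yellow), so TESTP lists green first */
  table outcome / chisq testp=(0.25 0.75);
run;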