Source: web.stanford.edu/class/polisci203/hw1key.pdf


PS 203 Spring 2002 Homework One - Answer Key

1. If you have a home or office computer, download and install WinBUGS. If you don’t have your own computer, try running WinBUGS in the Department lab.

2. The data set aes96media (on the PS203 website) contains survey data on voting in the 1996 Australian Federal Election. The variables are:

• y, a dummy for whether survey respondent i voted for the Labor Party (1) or not (0)

• PID, an indicator of partisanship (1 for Strong Labor, 2 for Weak Labor, 3 for Lean Labor, 4 for Independents, 5 for Lean Conservative, 6 for Weak Conservative, 7 for Strong Conservative).

• media, a scale measure of media consumption through the election campaign (0 through 1 corresponding to low through high)

• quiz, a scale measure of the respondent’s level of political information, ascertained through a series of objective ‘‘true/false’’ items administered at the end of the survey (0 through 1 corresponding to low through high)

Use the available predictors to model the voting outcomes y, via logit. Use a series of dummy variables for each level of party identification (collapse the weak and strong conservative categories, since no strong conservatives voted Labor). Include an interaction between media and quiz.

(a) Briefly interpret the coefficients and report on the fit of the model to the data.

The party identification dummies perform as expected, with a steady monotonic decreasing pattern from Strong Labor through to Weak/Strong Conservative. The media and quiz coefficients tap the effect of a unit change in one of these variables when the other is set to its lowest level. Thus, the -2.28 coefficient on the media exposure variable suggests that when political information is at its lowest level (zero), increased media exposure decreases the probability of voting Labor. Alternatively, the -1.23 coefficient on ‘‘quiz’’ suggests that when media exposure is at its lowest level (zero), as political information increases, the probability of a Labor vote also decreases. The positive interaction term indicates that as both variables increase, these negative effects eventually turn into positive effects (see the next question and Figure 1).

(b) Use the estimated coefficients to solve for the level of political information z such that conditional on z, media consumption has no impact on the probability of voting for the ALP.

The logit model can be written as

$$p_i = F(\mu_i)$$

$$\mu_i = \alpha + x_{i1}\beta_1 + x_{i2}\beta_2 + x_{i1}x_{i2}\beta_3$$

and we seek $z = x_{i2}$ such that $\partial p_i/\partial x_{i1} = 0$. Note that

$$\frac{\partial p_i}{\partial x_{i1}} = \frac{\partial F(\mu_i)}{\partial \mu_i}\,\frac{\partial \mu_i}{\partial x_{i1}} = f(\mu_i)\,(\beta_1 + x_{i2}\beta_3)$$

Note that $f(\mu_i) \neq 0,\ \forall\, \mu_i$, so

$$\frac{\partial p_i}{\partial x_{i1}} = 0 \iff \beta_1 + x_{i2}\beta_3 = 0 \iff z = \frac{-\beta_1}{\beta_3},$$

where $\beta_1$ is the coefficient on media and $\beta_3$ is the coefficient on the interaction between media and quiz. This ratio is 2.2797/3.4503 = 0.66. That is, for respondents with a quiz score of .66, there is no relationship between media exposure and the probability of voting Labor. Note that this is a very high level of quiz, corresponding to the 77th to 87th percentiles of this variable. Note also that for values of quiz greater than .66, the effect of media exposure is actually positive. See the contour plot in Figure 1.
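As a quick numerical check of the ratio above, the following sketch uses only the two coefficients quoted in the text (-2.2797 on media and 3.4503 on the interaction). The names `B_MEDIA`, `B_INTERACTION` and `media_slope` are illustrative choices, not part of the original analysis.

```python
# Coefficients quoted in the answer key (logit scale).
B_MEDIA = -2.2797        # coefficient on media
B_INTERACTION = 3.4503   # coefficient on media x quiz


def media_slope(quiz: float) -> float:
    """Effect of media on the linear predictor at a given quiz level."""
    return B_MEDIA + quiz * B_INTERACTION


# Quiz level at which media has no effect on the probability of a Labor vote.
z = -B_MEDIA / B_INTERACTION

if __name__ == "__main__":
    print(f"z = {z:.2f}")
    for quiz in (0.0, 0.5, z, 1.0):
        print(f"quiz = {quiz:.2f}: slope on media = {media_slope(quiz):+.3f}")
```

Because the logistic density is strictly positive, the sign of this slope is also the sign of the effect of media on the probability itself.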

(c) Use simulation methods to obtain a 95% confidence bound for z.

To do this, I simply sampled from the multivariate Normal distribution implied for $\beta$ by the MLEs. That is, from a Bayesian perspective, if we have flat priors over $\beta$, then the posterior for $\beta$ is proportional to the likelihood, and so

$$p(\beta \mid \text{data}) \equiv N\left(\hat{\beta}_{MLE},\ V(\hat{\beta}_{MLE})\right).$$

Then we can induce a posterior on $g(\beta)$, denoted $p(g(\beta) \mid \text{data})$, by repeating the following steps many times ($t = 1, \ldots, T$):

i. sample $\beta^{(t)}$ from $p(\beta \mid \text{data})$

ii. form $g^{(t)} = g(\beta^{(t)})$.

In this case, $g(\beta) = -\beta_1/\beta_3$. Since this is a ratio of two (correlated) random variables, $p(g(\beta) \mid \text{data})$ has Cauchy-like properties with extremely heavy tails. In fact, the more Monte Carlo simulations we draw, the further we probe into the heavy tails. With 500,000 draws, the median of $p(g(\beta) \mid \text{data})$ is .66, equal to the value implied by the MLEs, and the 95 percent confidence interval extends outside the unit interval on which ‘‘quiz’’ is measured. A fifty percent bound (the inter-quartile range) is [.56, .87].
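The two simulation steps above can be sketched as follows. The point estimates are the ones quoted in the text, but the answer key does not report the covariance matrix of the MLEs, so `V_hat` below is a purely hypothetical stand-in (standard errors of 1.0 and 1.2, correlation -0.8); the resulting quantiles will therefore not reproduce the bounds reported above.

```python
import numpy as np

# MLEs quoted in the text: coefficient on media (b1) and on media x quiz (b3).
b_hat = np.array([-2.2797, 3.4503])

# HYPOTHETICAL covariance matrix: the answer key does not report V(b_hat).
V_hat = np.array([[1.00, -0.96],
                  [-0.96, 1.44]])


def simulate_z(mean, cov, n_draws=200_000, seed=0):
    """Draw b from N(mean, cov) and return g(b) = -b1/b3 for each draw."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    return -draws[:, 0] / draws[:, 1]


z_draws = simulate_z(b_hat, V_hat)
z_median = float(np.median(z_draws))
z_q25, z_q75 = np.quantile(z_draws, [0.25, 0.75])
z_lo, z_hi = np.quantile(z_draws, [0.025, 0.975])

if __name__ == "__main__":
    print(f"median of z: {z_median:.3f}")
    print(f"inter-quartile range: [{z_q25:.3f}, {z_q75:.3f}]")
    print(f"95% bound: [{z_lo:.3f}, {z_hi:.3f}]")
```

Quantiles are used rather than a mean and standard deviation because moments of a heavy-tailed ratio are unstable across simulation runs.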

3. Consider the lung cancer data presented in class (from Johnson and Albert’s Ordinal Data Modeling, p35). Eighty-six lung cancer patients and a matched sample of 86 controls were questioned about their smoking habits. The two groups were chosen to represent random samples from a subpopulation of lung-cancer patients and an otherwise similar population of cancer-free individuals. The following table summarises the data:

              Cancer   Control
Smokers         83       72
Nonsmokers       3       14

Figure 1: Predicted Probabilities of Labor Vote, as a function of Quiz (political information) and Media (media exposure). [Contour plot; horizontal axis: Media, vertical axis: Quiz, both running from 0.0 to 1.0; contour levels from 0.05 to 0.35.]

Let $0 < p_L < 1$ and $0 < p_C < 1$ denote the population proportions of lung-cancer patients and controls who smoke, respectively. Assume a binomial model for the data and independence (both within and across groups).

(a) With uninformative (uniform) priors on $p_L$ and $p_C$, report the posterior means of these parameters, along with 95% credible intervals.

The uninformative (uniform) priors for $p_L$ and $p_C$ are equivalent to Beta(1,1) distributions, yielding the posteriors

$$p(p_L \mid \text{data}) \equiv \text{Beta}(83 + 1,\ 3 + 1)$$

$$p(p_C \mid \text{data}) \equiv \text{Beta}(72 + 1,\ 14 + 1)$$

with the following characteristics:

Parameter   Mean           Mode (MLE)     2.5%   97.5%
p_L         84/88 ≈ .95    83/86 ≈ .97    .90    .99
p_C         73/88 ≈ .83    72/86 ≈ .84    .74    .90

(b) Consider the quantity $d = p_L - p_C$. With the same uninformative priors on $p_L$ and $p_C$, summarize the posterior density implied by the model for d.

d has a posterior mean of .125, with a 95 percent confidence interval extending from .04 to .22.
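One way to obtain summaries like these is to simulate directly from the two independent Beta posteriors and difference the draws. The sketch below is a minimal illustration of that idea; the number of draws, the seed and the helper name `summarize` are arbitrary choices, and the answer key does not say which software produced its figures.

```python
import numpy as np

rng = np.random.default_rng(0)
N_DRAWS = 200_000  # illustrative choice

# Posteriors under uniform Beta(1, 1) priors, as given in the text.
p_L = rng.beta(83 + 1, 3 + 1, size=N_DRAWS)    # lung-cancer patients
p_C = rng.beta(72 + 1, 14 + 1, size=N_DRAWS)   # controls
d = p_L - p_C                                  # independence across groups


def summarize(draws):
    """Return (mean, 2.5% quantile, 97.5% quantile) of the draws."""
    lo, hi = np.quantile(draws, [0.025, 0.975])
    return float(draws.mean()), float(lo), float(hi)


mean_L, lo_L, hi_L = summarize(p_L)
mean_C, lo_C, hi_C = summarize(p_C)
mean_d, lo_d, hi_d = summarize(d)
prob_d_positive = float((d > 0).mean())

if __name__ == "__main__":
    print(f"p_L: mean {mean_L:.3f}, 95% interval [{lo_L:.2f}, {hi_L:.2f}]")
    print(f"p_C: mean {mean_C:.3f}, 95% interval [{lo_C:.2f}, {hi_C:.2f}]")
    print(f"d:   mean {mean_d:.3f}, 95% interval [{lo_d:.2f}, {hi_d:.2f}]")
    print(f"Pr(p_L > p_C | data) = {prob_d_positive:.3f}")
```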

(c) Compare your Bayesian inferences about $p_L$, $p_C$ and d with those from a classical, likelihood analysis.

For $p_L$ and $p_C$, see the table above. Via independence across the two groups, the MLE of d is simply the difference of the two within-group point estimates, or (83-72)/86 = 11/86, or about .128. Note that this point estimate corresponds to the posterior mode obtained with flat priors. To obtain a confidence interval for d, I rely on asymptotically-valid normal approximations as summaries of uncertainty in the group-specific MLEs. This is convenient, since the normal is completely characterized by its mean and variance, and so a 95% confidence interval for the MLE of d can then be obtained by simply adding/subtracting 1.96 standard errors to the MLE of d. By independence across groups, the variance of the MLE of d is

$$V(\hat{d}_{MLE}) = V(\hat{p}_{L,MLE} - \hat{p}_{C,MLE}) = V(\hat{p}_{L,MLE}) + V(\hat{p}_{C,MLE}) = \frac{p_L(1 - p_L)}{n} + \frac{p_C(1 - p_C)}{n} \approx .00039 + .00158 \approx .00198$$

1.96 times the square root of this variance is the half-width of a 95% bound for d. The resulting interval is [.041, .215], which is not dissimilar from that obtained via the Bayesian simulation procedure. That is, the asymptotic normal approximation is not bad with this sample size (n.b., n=86).
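The arithmetic behind the classical interval can be reproduced in a few lines. This is only a sketch of the normal-approximation calculation described above, with the MLEs plugged in for the unknown proportions; the variable names are illustrative.

```python
import math

n = 86                # size of each group
p_L_hat = 83 / n      # MLE for lung-cancer patients
p_C_hat = 72 / n      # MLE for controls

d_hat = p_L_hat - p_C_hat

# Variances add because the two groups are independent.
var_d = p_L_hat * (1 - p_L_hat) / n + p_C_hat * (1 - p_C_hat) / n
half_width = 1.96 * math.sqrt(var_d)
ci = (d_hat - half_width, d_hat + half_width)

if __name__ == "__main__":
    print(f"d_hat = {d_hat:.3f}, variance = {var_d:.5f}")
    print(f"95% interval: [{ci[0]:.3f}, {ci[1]:.3f}]")
```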

(d) Prior Sensitivity Analysis: Imagine that after seeing the data in the table (above), a skeptic maintains that he is still not convinced that $p_L > p_C$. Assume this skeptic has an uninformative prior on $p_C$. Find a prior on $p_L$ that rationalizes the skeptic’s posterior beliefs.

There is an infinite set of Beta priors for $p_L$ that will rationalize the skeptic’s beliefs. To provide a sense of the mapping from prior to posterior, I use a ‘‘data-equivalent’’ representation of the family of priors, summarizing a Beta($\alpha$, $\beta$) density with its mean, and an equivalent sample size.

I summarize the mapping from the skeptic’s priors (a two-space, since the Beta density takes two parameters), into the posterior probability that $p_L > p_C$.
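The mapping from a prior (mean, equivalent sample size) to the posterior probability that $p_L > p_C$ can be explored by simulation. The sketch below assumes the convention that a Beta($\alpha$, $\beta$) prior has mean $\alpha/(\alpha+\beta)$ and equivalent sample size $\alpha+\beta$; this convention, the function name and the example priors are assumptions made for illustration, and the answer key may define the equivalent sample size differently.

```python
import numpy as np


def prob_pL_exceeds_pC(prior_mean, prior_n, n_draws=200_000, seed=0):
    """Posterior Pr(p_L > p_C) for a Beta prior on p_L given by mean and equivalent n."""
    rng = np.random.default_rng(seed)
    a0 = prior_mean * prior_n            # prior "smokers" (assumed convention)
    b0 = (1.0 - prior_mean) * prior_n    # prior "nonsmokers" (assumed convention)
    p_L = rng.beta(a0 + 83, b0 + 3, size=n_draws)
    p_C = rng.beta(1 + 72, 1 + 14, size=n_draws)   # uniform prior on p_C
    return float((p_L > p_C).mean())


# Hypothetical priors: the uniform Beta(1, 1), then increasingly stubborn
# priors centred on a smoking rate of one half.
example_priors = [(0.5, 2), (0.5, 50), (0.5, 250)]

if __name__ == "__main__":
    for mean, n0 in example_priors:
        prob = prob_pL_exceeds_pC(mean, n0)
        print(f"prior mean {mean}, equivalent n {n0}: Pr(p_L > p_C | data) = {prob:.3f}")
```

Scanning a grid of prior means and equivalent sample sizes with this function gives a surface that can be contoured in the prior space.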

4. In generating state-level forecasts of the 2000 presidential vote, Jackman and Rivers used historical election results as ‘‘priors.’’ For instance, the average of Democratic presidential vote share in 1988, 1992 and 1996 was used to generate a prior for forecasting the 2000 outcomes. For California, this averaging of historical results yields a prior mean for Democratic vote share of 48.4%. Jackman and Rivers complete the specification of their prior by assuming that after controlling for period-specific national-level shocks, vote shares vary randomly around a stable long-term average level specific to each state. They estimated this ‘‘within-state’’ random component to have a standard deviation of 3.1 percentage points.

This prior information is to be combined with poll numbers from the 2000 election season to generate state-level forecasts. For instance, a Zogby poll of 436 Californian likely voters fielded on August 23, 2000 found 42% support for Gore.

Use Bayesian methods to combine the historical ‘‘prior’’ information with the poll information to come up with a posterior density over Gore support in California. Report the posterior mean and a 95% confidence interval.

Hints: you will have to first convert the prior information into a form suitable for ‘‘pooling’’ with the poll data (or vice-versa). For instance, if you assume a binomial model for the poll data, then you will have to convert the historical prior information into a conjugate Beta density. On the other hand, you might assume that a normal model and prior is a suitable characterization of the information in the poll and the historical data, in which case you will need to convert the poll information into a form captured by the parameters of a normal distribution.

First, try converting the poll information into a form suitable for pooling (via Bayes’ Rule) with the historical information. The historical information is expressed as a mean and a standard deviation, which, for convenience, we can interpret as the sufficient statistics of a normal distribution. The poll information can also be expressed in terms of the sufficient statistics of a normal distribution, i.e., mean = .42 and variance

$$\text{var}(p) = \frac{p(1 - p)}{n} = \frac{.42 \cdot .58}{436} = .0005587$$

while the historical analysis yields a variance of $.031^2 = .000961$. Pooling this information via Bayes’ Rule yields a variance of

$$v = (.0005587^{-1} + .000961^{-1})^{-1} = 1/2830.4 = .0003533$$

or a standard deviation of 1.88 percentage points. The pooled (or posterior) mean is

$$x = \frac{.42 \times .0005587^{-1} + .484 \times .000961^{-1}}{.0005587^{-1} + .000961^{-1}} = 1255.37/2830.40 \approx .444.$$

Figure 2: Mapping from Prior over $p_L$ to Posterior Mean of d. The contour lines connect points in the ‘‘prior space’’ for $p_L$ (defined as a prior mean and an equivalent prior n) that give rise to the same posterior mean for d. For instance, an uninformative prior (prior mean = .5 and prior sample size of zero) yields a posterior mean for d of just over .1. The observed data for lung cancer patients (solid square) and the control group (open circle) are also represented in this ‘‘prior space’’ for comparison. [Contour plot; horizontal axis: Prior Pr(Smoker | Lung Cancer), 0.2 to 0.8; vertical axis: Prior Precision as Equivalent N, 0 to 250.]

Another approach is to turn the historical information into a form suitable for pooling with the poll information. This can be done by treating the historical information as the equivalent of a Beta prior for the binomial poll data. The historical information has mean .484 and variance .000961, which we can use to solve for the parameters of a Beta($\alpha$, $\beta$) distribution, noting that

$$\frac{\alpha}{\alpha + \beta} = .484$$

$$\frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} = .000961$$

Solving for $\alpha$ and $\beta$ yields $\alpha \approx 125.3$ and $\beta \approx 133.6$. The binomial data from the poll can be represented as y = .42 × 436 successes from n = 436 trials, and so the posterior is a Beta density with parameters 125.3 + .42 × 436 = 308.42 and 133.6 + 436 - .42 × 436 = 386.48. This Beta distribution has mean .444, and variance .0003547, or standard deviation about 1.88 percentage points.

Note that the two approaches to this problem yield identical answers.
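Both pooling calculations can be checked side by side. The sketch below uses only the numbers given in the problem (prior mean .484, prior standard deviation .031, poll share .42 from 436 respondents); the variable names and the helper `normal_interval` are illustrative, and the 95% interval it prints relies on a normal approximation to the posterior, which is an assumption rather than something the answer key reports.

```python
import math

prior_mean, prior_sd = 0.484, 0.031
poll_mean, poll_n = 0.42, 436

# Approach 1: normal prior combined with a normal approximation to the poll.
prior_var = prior_sd ** 2
poll_var = poll_mean * (1 - poll_mean) / poll_n
precision = 1 / poll_var + 1 / prior_var
post_var_normal = 1 / precision
post_mean_normal = (poll_mean / poll_var + prior_mean / prior_var) / precision

# Approach 2: Beta prior matched to the prior mean and variance, binomial poll.
total = prior_mean * (1 - prior_mean) / prior_var - 1   # alpha + beta
alpha0 = prior_mean * total
beta0 = (1 - prior_mean) * total
successes = poll_mean * poll_n
alpha1 = alpha0 + successes
beta1 = beta0 + poll_n - successes
post_mean_beta = alpha1 / (alpha1 + beta1)
post_var_beta = alpha1 * beta1 / ((alpha1 + beta1) ** 2 * (alpha1 + beta1 + 1))


def normal_interval(mean, var, z=1.96):
    """Approximate 95% interval from a mean and variance (normal approximation)."""
    half = z * math.sqrt(var)
    return mean - half, mean + half


if __name__ == "__main__":
    print(f"normal: mean {post_mean_normal:.3f}, sd {math.sqrt(post_var_normal):.4f}")
    print(f"beta:   mean {post_mean_beta:.3f}, sd {math.sqrt(post_var_beta):.4f}")
    print("approx. 95% interval:", normal_interval(post_mean_normal, post_var_normal))
```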

5. Given data $y = (y_1, y_2, \ldots, y_n)'$, consider the model $y_i \overset{iid}{\sim} N(\theta_1 + \theta_2, 1)$, $i = 1, \ldots, n$. Prove that

(a) $\theta_1$ and $\theta_2$ are unidentified.

The likelihood for these iid normal data is

$$\begin{aligned}
p(y; \theta_1, \theta_2, \sigma^2 = 1) &= \prod_{i=1}^{n} p(y_i; \theta_1, \theta_2, \sigma^2 = 1) \\
&= \prod_{i=1}^{n} \phi(y_i; \theta_1, \theta_2, \sigma^2 = 1) \\
&= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \exp\left[\frac{-(y_i - \theta_1 - \theta_2)^2}{2}\right] \\
&= (2\pi)^{-n/2} \exp\left[\frac{-\sum_{i=1}^{n}(y_i - \theta_1 - \theta_2)^2}{2}\right]
\end{aligned}$$

and so

$$\begin{aligned}
\ln p(y) &\propto -\frac{1}{2}\sum_{i=1}^{n}(y_i - \theta_1 - \theta_2)^2 \\
&= -\frac{1}{2}\left(\sum y_i^2 + n\theta_1^2 + n\theta_2^2 - 2\theta_1\sum y_i - 2\theta_2\sum y_i + 2n\theta_1\theta_2\right)
\end{aligned}$$

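Now we have the following derivatives:

$$\frac{\partial \ln p(y)}{\partial \theta_1} = -n\theta_1 + \sum y_i - n\theta_2$$

$$\frac{\partial \ln p(y)}{\partial \theta_2} = -n\theta_1 + \sum y_i - n\theta_2$$

$$\frac{\partial^2 \ln p(y)}{\partial \theta_1^2} = -n$$

$$\frac{\partial^2 \ln p(y)}{\partial \theta_2^2} = -n$$

$$\frac{\partial^2 \ln p(y)}{\partial \theta_1 \partial \theta_2} = \frac{\partial^2 \ln p(y)}{\partial \theta_2 \partial \theta_1} = -n$$

and so the Hessian (the matrix of second derivatives) of the log-likelihood is

$$H = \begin{bmatrix} -n & -n \\ -n & -n \end{bmatrix} = -n\,\mathbf{1}\mathbf{1}'$$

(with $\mathbf{1}$ a $2 \times 1$ vector of ones), which is clearly not of full column rank (column one is a linear combination of column two, and vice-versa), and hence singular. This implies that the likelihood function does not have a unique maximum with respect to $\theta = (\theta_1, \theta_2)'$ and so the parameters are not identified.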
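The singularity can also be seen numerically. The sketch below builds the Hessian for an arbitrary sample size and confirms that two parameter pairs with the same sum give the same log-likelihood; the data are simulated purely for illustration, and the sample size, seed and function name are assumptions.

```python
import numpy as np


def log_lik(theta1, theta2, y):
    """Log-likelihood (up to a constant) for y_i ~ N(theta1 + theta2, 1)."""
    return -0.5 * np.sum((y - theta1 - theta2) ** 2)


n = 20                                       # illustrative sample size
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=n)   # simulated data, illustration only

# Hessian of the log-likelihood derived above.
H = np.array([[-n, -n],
              [-n, -n]], dtype=float)
rank_H = np.linalg.matrix_rank(H)
eigvals_H = np.linalg.eigvalsh(H)            # ascending order

# Two parameter pairs with the same sum give the same log-likelihood.
ll_a = log_lik(0.3, 0.7, y)
ll_b = log_lik(-4.0, 5.0, y)

if __name__ == "__main__":
    print("rank of H:", rank_H)
    print("eigenvalues of H:", eigvals_H)
    print("log-likelihoods:", ll_a, ll_b)
```

The zero eigenvalue corresponds to the direction (1, -1), along which the sum of the two parameters, and hence the likelihood, is constant.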

(b) $\theta_1 + \theta_2$ is identified.

This is rather trivial and I will not elaborate here. The model for the mean is now re-parameterized as $\mu = \theta_1 + \theta_2$. Twice differentiate the log-likelihood function with respect to $\mu$; the 2nd derivative is $-n$, implying that the likelihood over $\mu$ has a unique maximum.

(c) Normal priors with finite variances on $\theta_1$ and $\theta_2$ are sufficient to identify $\theta_1$ and $\theta_2$.

Harder problem. The strategy of proof is to note that since a posterior is proportional to a prior times a likelihood, the log-posterior equals the log-prior plus the log-likelihood up to an additive constant, and further, the Hessian of the log-posterior equals the Hessian of the log-prior plus the Hessian of the log-likelihood. We have shown that the Hessian of the log-likelihood is not of full rank. It remains to be shown that with proper priors, this is no longer the case and, further, that the Hessian of the log-posterior is now negative definite, implying a unique posterior mode for $\theta = (\theta_1, \theta_2)'$.
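The remaining step can be illustrated numerically. Assuming independent priors $\theta_j \sim N(m_j, \tau_j^2)$ (the symbols $m_j$ and $\tau_j^2$ are introduced here for illustration and do not appear in the answer key), each log-prior contributes $-1/\tau_j^2$ to the corresponding diagonal element of the Hessian. The sketch below is a numerical check for one choice of values, not a proof.

```python
import numpy as np


def posterior_hessian(n, tau1_sq, tau2_sq):
    """Hessian of the log-posterior: log-likelihood Hessian plus log-prior Hessian."""
    H_lik = -n * np.ones((2, 2))                          # singular, rank one
    H_prior = -np.diag([1.0 / tau1_sq, 1.0 / tau2_sq])    # from the normal priors
    return H_lik + H_prior


# Hypothetical values: n = 20 observations, vague priors with variance 100.
H_post = posterior_hessian(n=20, tau1_sq=100.0, tau2_sq=100.0)
eigvals_post = np.linalg.eigvalsh(H_post)                 # ascending order
rank_post = np.linalg.matrix_rank(H_post)

if __name__ == "__main__":
    print("posterior Hessian:\n", H_post)
    print("eigenvalues:", eigvals_post)
    print("rank:", rank_post)
```

Both eigenvalues are strictly negative for any finite prior variances, since the prior adds a negative definite diagonal matrix to a negative semi-definite one; as the prior variances grow, the smaller eigenvalue in absolute value approaches zero and the posterior mode becomes only weakly determined.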