
Note: Code is only provided in R for this homework. Solutions for Q1 and Q2 have been provided by me, Q3 and Q4 by Giles.

• Solution for Question 1 is on page 2.

• Grading rubric for Question 1 is on page 8.

• Solution for Question 2 is on page 9.

• Grading rubric for Question 2 is on page 13.

• Comments for Question 2 are on page 13.

• Solution for Question 3 is on page 14.

• Grading rubric for Question 3 is on page 24.

• Partial solutions for Question 4 are on page 25.

• Grading rubric for Question 4 is on page 29.

Note that the pages are hyperlinked - you can click on them.


1. Optimized Rejection Sampling for the Gamma

We have the Gamma density given by:

f(x; m, λ) = λ^m x^{m−1} e^{−λx} / Γ(m)

and an Exponential density:

g(x) = µ e^{−µx}

(a) Note we assume m ≠ 1 in this question; otherwise the Gamma distribution is just the Exponential distribution.

We want to find k such that f(x) < k g(x) for all x, and it suffices to find the maximum value of f(x)/g(x). Setting y(x) = f(x)/g(x), we have:

y(x) = λ^m x^{m−1} e^{−λx} / (Γ(m) µ e^{−µx}) = K x^{m−1} e^{(µ−λ)x}

where K = λ^m / (Γ(m) µ). Taking the derivative of y(x), we have:

y′(x) = K exp{(µ−λ)x} x^{m−2} (x(µ−λ) + (m−1))

Setting y′(x) = 0, we get x = 0 or x = (m−1)/(λ−µ).

Now, we want to know whether x = (m−1)/(λ−µ) is a point of inflexion or a local/global maximum (or minimum), and looking at the second derivative of y(x) is a pain to do. Therefore:

An analytical method

However, let's consider y(x). We know y(x) is continuous, and if y(x) → ±∞ as x → ∞, we cannot get a rejection sampler. This only happens when µ > λ (and this implies x = (m−1)/(λ−µ) is not a global (or local) maximum).

Of course, if λ > µ, then it follows from the shape of y(x) (since as x → ∞, y(x) → 0) that x = (m−1)/(λ−µ) must be a global maximum.

Now we look at the conditions on m. If m > 1, we are fine. But what if 0 < m < 1? Then y(x) → ∞ as x → 0+, so f(x)/g(x) cannot be bounded by any k, and we reject¹ the case m < 1.

A not so analytical method - useful for checking

Note that you can verify your results² by plotting graphs in R. For example:

m = 0.5 # test with arbitrary values of m and lambda - mu

k = -0.1 # set k to be lambda - mu

xvec = seq(0, 9, by = 0.00001)

yvec = exp(k*xvec) * xvec^(m-1)

plot(xvec,yvec, type = "n")

lines(xvec,yvec)

1pun!2Don’t do this as a proof, but good to make sure you’ve considered all possible cases.


Anyway, we substitute x = (m−1)/(λ−µ) back into y(x), and have:

y(x) = K ((m−1)/(λ−µ))^{m−1} exp{(µ−λ)(m−1)/(λ−µ)} = K ((m−1)/(λ−µ))^{m−1} exp{1−m}

Therefore, to summarize, we have

k = y(x) = (λ^m / (Γ(m) µ)) ((m−1)/(λ−µ))^{m−1} exp{1−m}

with λ > µ > 0 and m > 1.
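As a quick numerical sanity check (not part of the original solution; the values of m, λ and µ below are arbitrary), we can verify on a grid that f(x) ≤ k g(x) for this k:

m = 2.5; lambda = 1; mu = 0.4   # arbitrary values with lambda > mu > 0 and m > 1
k = lambda^m / (gamma(m) * mu) * ((m - 1)/(lambda - mu))^(m - 1) * exp(1 - m)
x = seq(1e-6, 50, len = 10000)
# Should be at most 1, attained near x = (m-1)/(lambda-mu)
max(dgamma(x, shape = m, rate = lambda) / (k * dexp(x, rate = mu)))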

(b) To find the value of µ leading to the smallest k, we have:

dk/dµ = (λ^m exp{1−m} (m−1)^{m−1} / Γ(m)) · (λ−µ)^{−m} (mµ−λ) / µ²

Setting this to zero is equivalent to setting (mµ−λ) = 0, which implies µ = λ/m. This is the only stationary point, and it is a minimum, since both µ → 0 and µ → ∞ imply k → ∞, so µ = λ/m can't be a maximum (or a point of inflexion).

Simplifying a bit, we have:

k = (λ^m / (Γ(m) µ)) ((m−1)/(λ−µ))^{m−1} exp{1−m}
  = (λ^m / (Γ(m) λ/m)) ((m−1)/(λ−λ/m))^{m−1} exp{1−m}
  = (m λ^{m−1} / Γ(m)) (m(m−1)/(λ(m−1)))^{m−1} exp{1−m}
  = (m^m / Γ(m)) exp{1−m}

Thus µ = λ/m leads to the smallest k, and we can write k = (m^m / Γ(m)) exp{1−m}.
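Since f and g are both densities and f ≤ k g, the long-run acceptance rate of the resulting rejection sampler is 1/k. A small illustration (not part of the original solution) for m = 2.5:

m = 2.5
k = m^m / gamma(m) * exp(1 - m)
1/k   # roughly 0.60, so about 60% of proposals are accepted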

(c) We first note that, with µ = λ/m:

f(x) / (k g(x)) = (λ^m x^{m−1} e^{−λx} / (Γ(m) µ e^{−µx})) · (Γ(m) / (m^m e^{1−m})) = (λ/m)^{m−1} x^{m−1} exp{(λ/m − λ)x + m − 1}

We also recall that if we want to generate x from an exponential variable with parameter µ, we can set x = −(1/µ) log U, where U ∼ Uniform(0, 1).
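As an aside, here is a one-line check of this inverse-transform fact (the value of µ is arbitrary and not part of the original solution):

mu = 0.4
x = -log(runif(100000)) / mu
c(mean(x), 1/mu)   # the sample mean should be close to 1/mu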


Thus, we use the following algorithm:

i. Set k = 0.

ii. Generate U1 ∼ Uniform(0, 1), and set Y = −(m/λ) log U1.

iii. Generate U2 ∼ Uniform(0, 1).

iv. If U2 < (λ/m)^{m−1} Y^{m−1} exp{(λ/m − λ)Y + m − 1}, set X = Y. Else return to step ii.

v. Set k = k + 1. If k < n, go to step ii. Else stop.

Here is an implementation in R.

genGamma <- function(n, lambda, m)
{
  vec = vector("numeric")
  k = 1
  while (k <= n)   # keep going until n samples have been accepted
  {
    u1 = runif(1)
    u2 = runif(1)
    y = -(m/lambda) * log(u1)   # proposal from the Exponential(lambda/m) density
    # accept y with the acceptance probability derived in part (c)
    if (u2 < ((lambda/m)^(m-1) * y^(m-1) * exp((lambda/m - lambda)*y+m-1)))
    {
      vec[k] = y
      k = k + 1
    }
  }
  return(vec)
}

m = 2.5

lambda = 1

# plot histogram + lines

xgamma1 = genGamma(100000,lambda,m)

hist(xgamma1, freq = FALSE, main = "Histogram of Gamma with shape 2.5 and rate 1", xlab = "x")

xvec1 = seq(0, 15, by = 0.00001)

yvec1 = dgamma(xvec1,shape = m, rate = lambda)

lines(xvec1,yvec1)


Here is a plot. [Figure: histogram of the sampled values with the Gamma(shape = 2.5, rate = 1) density overlaid.]

(d) We want to find how many samples we need to reduce the variance of our estimate of E[|X − 3|^{2/3}] below 0.01, where X ∼ Gamma(m = 2.5, λ = 1).

Using an initial sample of size 100, we form the sample mean Ȳ = (1/100) Σ_{i=1}^{100} |x_i − 3|^{2/3} and the sample variance s² of the values |x_i − 3|^{2/3}. The variance of such a mean based on n samples is estimated by s²/n, so we look for n such that s²/n < 0.01.

# Generate 100 random samples

xgamma2 = genGamma(100,1,2.5)

# Compute Y

Y = abs(xgamma2-3)^(2/3)

# Find n

n = ceiling(var(Y) / 0.01)

# get n = 29

[1] 29

We have n = 29.

Now we construct this estimate by running the rejection sampler 50 times.

m = 2.5

lambda = 1

samplevec = vector("numeric")


for (i in 1:50)

{

samplevec[i] = mean(abs(genGamma(29,lambda,m) - 3)^(2/3))

}

var(samplevec)

[1] 0.01038911

(e) We first create an antithetic rejection sampler.

genGamma <- function(n, lambda, m)
{
  vec1 = vector("numeric")
  k = 1
  while (k < (n+1))
  {
    u1 = runif(1)          # single uniform used for both acceptance decisions
    u2 = runif(1)
    u3 = 1 - u2            # antithetic partner of u2
    y1 = -(m/lambda) * log(u2)
    y2 = -(m/lambda) * log(u3)
    # Use the same u1 for both decisions, aiming to accept or reject the pair together
    if (u1 < ((lambda/m)^(m-1) * y1^(m-1) * exp((lambda/m - lambda)*y1+m-1)))
    {
      vec1[k] = y1
      k = k + 1
    }
    if (u1 < ((lambda/m)^(m-1) * y2^(m-1) * exp((lambda/m - lambda)*y2+m-1)))
    {
      vec1[k] = y2
      k = k + 1
    }
  }
  # If both members of the final pair were accepted we may have one sample too many
  if (length(vec1) == (n+1))
  {
    vec1 = vec1[-(n+1)]
  }
  vec1
}

samplevec = vector("numeric")

for (i in 1:50)


{

samplevec[i] = mean(abs(genGamma(29,lambda,m) - 3)^(2/3))

}

var(samplevec)

[1] 0.008875397

The variance reduces slightly (at least in this case). However:

i. Generally speaking, the variance reduction comes from pairs which are correlated (see the small illustration after this list).

ii. So for a pair (Y1, Y−1), if we rejected one of them but kept the other, there is no variance reduction.

iii. But if we kept both (or rejected both), then the accepted draws aren't necessarily from a Gamma distribution.
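Here is a small illustration of point i (not part of the original solution), using plain Exponential(1) draws with no rejection step: −log U and −log(1 − U) are negatively correlated, so the average of an antithetic pair has a smaller variance than the average of two independent draws.

u = runif(50000)
y1 = -log(u)         # Exponential(1) draw
y2 = -log(1 - u)     # its antithetic partner
var((y1 + y2)/2)     # roughly 0.18
var((-log(runif(50000)) - log(runif(50000)))/2)   # independent pairs: roughly 0.5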


Grading Rubric for Question 1

(a) i. Find k [3]

ii. Condition on λ [1]

iii. Condition on m [1]

(b) Find value of µ [2]

(c) i. Write a program . . . [4]

ii. and plot [1]

(d) i. Decide samples [1]

ii. Construct estimate [1]

(e) i. Incorporate antithetic sampling [4]

ii. Variance reduction ? [1]

iii. Comment [1]

⋆ Bonus: Recognize why there's (some) variance reduction, or realize that the sampler isn't producing true gammas [2]

⋆ Grand Total: [20]

Comments are on individual scripts!


2. Convergence of the Trapezoid Rule

This is a question in many first-year undergraduate courses in Analysis, so most of you have probably seen this proof before. But here is a worked-out proof (with explanations)³, so those who may not have a background in Analysis can follow as well. We assume that f is continuous on [a, b] and twice⁴ differentiable on (a, b). Let's see what happens if we have one trapezoid.

[Figure: the curve f over [a, b], with the trapezoid through (a, f(a)) and (b, f(b)) shaded.]

The trapezoid rule states that we can approximate an integral by

∫_a^b f(x) dx ≈ ((f(a) + f(b))/2) (b − a),

i.e. we approximate the area under the curve with a trapezoid (the green shaded area), and our "error" is the small white flying-saucer-like area between the curve and the trapezoid.

[Figure: the interval [a, b] with midpoint p and half-width h = (b − a)/2; a symmetric sub-interval (p − t, p + t) of half-width t is marked.]

³ Blatantly copied from undergraduate notes, but this follows the same style as http://math.stackexchange.com/questions/312429/trapezoid-rule-error-analysis (proof attributed to G. H. Hardy); thanks to the three who cited this!

⁴ See comments.


Suppose we let p be the midpoint of a and b, and let 2h = b − a. Now, we define:

g(t) = ∫_{p−t}^{p+t} f(x) dx − (2t/2)(f(p − t) + f(p + t))   (⋆)

for 0 < t ≤ h, which gives our error term for any trapezoid constructed within this interval.

The idea here is to find an alternative expression for g(t), since we can rearrange equation (⋆) to write:

∫_{p−t}^{p+t} f(x) dx = t (f(p − t) + f(p + t)) + g(t)

⇒ ∫_a^b f(x) dx = ((b − a)/2)(f(a) + f(b)) + g(h)   (setting t = h)

Therefore, finding an expression for g(h) is finding an expression for our error.

The next step here is to construct another function (which isn't intuitive at all),

r(t) = g(t) − (t/h)³ g(h)   (♥)

because we want to use the Mean Value Theorem⁵ to claim that r′(c) = 0 for some c ∈ (0, h).

We can claim this since r(0) = r(h) = 0; thus, since r is continuous on [0, h] and differentiable on (0, h) (being the sum of continuous and differentiable functions), by the Mean Value Theorem there must exist c ∈ (0, h) such that r′(c) = (r(h) − r(0))/h = 0.

We have:

r′(c) = g′(c) − 3(c²/h³) g(h) = 0   (♦)

Now, we take a slight sojourn back to (⋆), and differentiate it to get:

g′(t) = (f(p + t) − (−f(p − t))) − (t(−f′(p − t) + f′(p + t)) + f(p − t) + f(p + t))
      = −t (f′(p + t) − f′(p − t))   (†)

Under our assumptions on f (which satisfy the MVT), there must be a point c′ ∈ (a, b) such that:

f′(c′) = (f(b) − f(a)) / (b − a)

⁵ Technically Rolle's Theorem.


and, applying the same argument to f′ instead of f, there is a point c′ with:

f″(c′) = (f′(b) − f′(a)) / (b − a)  ⇒  (b − a) f″(c′) = f′(b) − f′(a)   (††)

Setting b = p + t, a = p − t, and substituting (††) in (†), we get:

g′(t) = −2t² f″(c′)   (♣)

for c′ ∈ (p − t, p + t).

Substituting (♣) into (♦), we have:

r′(c) = −2c² f″(c′) − 3(c²/h³) g(h) = −c² (2 f″(c′) + (3/h³) g(h)) = 0

This implies that:

g(h) = −(2h³/3) f″(c′)

and substituting this back into (⋆) with t = h, we have:

∫_a^b f(x) dx = ((b − a)/2)(f(a) + f(b)) − (2h³/3) f″(c′)

with c′ ∈ (a, b), where the last term, −(2h³/3) f″(c′), is the error.

We can see that this is the error for one trapezoid. Now, what happens if we have many trapezoids, each of width dx?

Suppose we have n partitions of the interval (a, b), where a = x_0 and b = x_n, say (x_0, x_1), (x_1, x_2), ..., (x_{n−1}, x_n), each of width dx. Applying the one-trapezoid result (with half-width (x_i − x_{i−1})/2) on each sub-interval, our error E must then be:

E = Σ_{i=1}^{n} −(2/3)((x_i − x_{i−1})/2)³ f″(c′_i)   with c′_i ∈ (x_{i−1}, x_i)
  = Σ_{i=1}^{n} −(2/3)(dx/2)³ f″(c′_i)
  = Σ_{i=1}^{n} −(dx³/12) f″(c′_i)


From our previous assumptions, we know that each f″(c′_i) = (f′(x_i) − f′(x_{i−1}))/(x_i − x_{i−1}) exists, since f′(x) exists (our function is continuously differentiable on the interval). Thus, if we set U = sup_i |f″(c′_i)|, we have:

|E| = |Σ_{i=1}^{n} −(dx³/12) f″(c′_i)|
    ≤ Σ_{i=1}^{n} (dx³/12) U
    = n U dx³ / 12
    = ((b − a)/dx) · U dx³ / 12
    = ((b − a)/12) U dx²
    = U′ dx²

Thus the error in the trapezoid rule is O(dx²).
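As an empirical companion to the proof (not required by the question), we can check the O(dx²) rate in R; sin on [0, π], whose integral is 2, is used as an arbitrary test function:

trap = function(f, a, b, n){
  # composite trapezoid rule with n equal sub-intervals
  x = seq(a, b, len = n + 1)
  sum(diff(x) * (f(x[-1]) + f(x[-(n+1)]))/2)
}
ns = c(10, 20, 40, 80)
errs = sapply(ns, function(n) abs(trap(sin, 0, pi, n) - 2))
errs[-length(errs)] / errs[-1]   # each ratio should be close to 4 (halving dx)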


Grading Rubric for Question 2

(a) General correct idea of proof [5]

(b) Completion of proof [5]

(c) Some minor mistakes / conceptual errors [-variable]

⋆ Bonus 1: [2]

⋆ Bonus 2: [2]

Comments:

(a) For (analysis) questions like this, it is nice to draw diagrams⁶ for intuition, and also to accompany your proof (this is where LaTeXing up solutions fails unless you include graphics / use the tikz package), to

i. make it easier for the grader to know what you are trying to show

ii. make it easier for yourself when you read your proofs a week or so later

(b) The function r(t) isn’t very intuitive, and at first glance, you could ask: Why

can’t we have r(t) = g(t)−(th

)kg(h) for any k, since this fulfills r(0) = r(h) = 0?

Try the steps of the proof for k = 1 and k = 2, and see what you get.

(c) Consider the (continuous) function:

f(x) = x²/2 for x > 0,  f(0) = 0,  f(x) = −x²/2 for x < 0.

This function is differentiable, since the left (and right) limits of its derivative exist at 0 and are equal to each other, but f′(x) = |x|. If our interval contains 0, f″(x) doesn't exist at 0. In short, there should be an underlying assumption that f is twice differentiable.

(d) I probably took a closer look at your Question 2 if you referred to Stack Exchange and didn't cite it. Sorry.

(e) A common mistake (from those who paraphrased from Stack Exchange) was to take a fixed g(t) = ..., instead of letting t be a varying parameter. Why is this a mistake? Because in the proof you will end up differentiating (and integrating) g(t), and differentiating a constant gives zero.

⁶ But diagrams by themselves aren't a proof.


3. Vectorization, Bootstrapping and the Nadaraya-Watson Smoother
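Note: the single-point NadarayaWatson function from lab is called below but is not reproduced in these solutions. A minimal sketch consistent with how it is called here (a normal-kernel weighted average at a single evaluation point) might look like:

NadarayaWatson = function(x, X, Y, sigma){
  # x: a single evaluation point; X, Y: observed data; sigma: kernel sd
  w = dnorm(x - X, sd = sigma)
  sum(w * Y) / sum(w)
}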

(a) x = seq(0,365,len=101)
smooth.rupert = rep(0,101)
# Loop over the entries in x and calculate the smooth at each entry. Here
# we will also record the processing time for part (b).
start = proc.time()
for(i in 1:length(x)){
  smooth.rupert[i] = NadarayaWatson(x[i],Rupert[,1],Rupert[,2],sigma=10)
}
NWtime = proc.time()-start
# Now we'll plot the data and add a line for the smooth
plot(Rupert)
lines(x,smooth.rupert,col='Blue',lwd=2)

[Figure: Rainfall against day for the Rupert data, with the Nadaraya-Watson smooth (sigma = 10) overlaid in blue.]

(b) # We'll call our function NadarayaWatson2 to do this; we'll change the
# input x so that it will work for a vector of values of x.
NadarayaWatson2 = function(x,X,Y,sigma){
  # x = vector of points to evaluate our estimate.
  # X = vector of observation X values
  # Y = vector of observation Y values
  # sigma = standard deviation of the normal kernel.
  # First, we'll set up xmat and Xmat that store x repeated over columns
  # and X repeated over rows.
  xmat = matrix(x,length(x),length(X),byrow=FALSE)
  Xmat = matrix(X,length(x),length(X),byrow=TRUE)
  # Now (xmat-Xmat)[i,j] gives x[i] - X[j] which we'll put into dnorm.
  WeightMat = dnorm( xmat-Xmat,sd=sigma)
  # Note that we could also have used dnorm(xmat,mean=Xmat,sd=sigma)
  # We now need to sum the weights over columns (ie over values in X),
  # which we can do with a vector operation as in lab.
  sumweights = WeightMat%*%rep(1,length(X))
  # We can now calculate the weighted sum by a matrix multiplication with
  # Y and an element-wise division by the sum of the weights.
  smooth = (WeightMat%*%Y)/sumweights
  return(smooth)
}

# Let's evaluate the smooth this way:
start = proc.time()
smooth.rupert2 = NadarayaWatson2(x,Rupert[,1],Rupert[,2],sigma=10)
NW2time = proc.time()-start
# Add lines to the existing plot
lines(x,smooth.rupert2,col='Red',lwd=2)
# Where we see that we have exactly the same output as NadarayaWatson.
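# (Not in the original solutions.) A direct numeric check that the vectorized
# version agrees with the looped version up to floating-point error:
max(abs(smooth.rupert - smooth.rupert2))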

# To compare the timing between the two, we already recorded:

# The time for NadarayaWatson:

NWtime

# user system elapsed

# 0.29 0.00 0.36

# The time for NadarayaWatson2:

NW2time

# user system elapsed

# 0 0 0


# which we observe is much faster.

(c) # First we have a vector of sigmas
sigmas = c(2,10,20,50)
# We'll record the resulting smooth in a matrix, one column per sigma:
smooth.ruperts = matrix(0,length(x),length(sigmas))
# Loop over sigmas and record NadarayaWatson2
for(i in 1:length(sigmas)){
  smooth.ruperts[,i] = NadarayaWatson2(x,Rupert[,1],Rupert[,2],sigma=sigmas[i])
}
# Plot the original data
plot(Rupert[,1],Rupert[,2])
# The matplot function adds all the columns of a matrix to a graph at once.
matplot(x,smooth.ruperts,lwd=2,lty=1,col=1:4,type='l',add=TRUE)
# Add a legend:
legend('topleft',legend=sigmas,lty=1,lwd=2,col=1:4)

[Figure: Rupert[,2] against Rupert[,1] with Nadaraya-Watson smooths for sigma = 2, 10, 20, 50 overlaid; legend in the top left.]


(d) i. # First we need to predict at the values in the data set; in this case,
# replace x with Rupert[,1]:
NWpred1 = NadarayaWatson2(Rupert[,1],Rupert[,1],Rupert[,2],sigma=10)
# Now calculate residuals.
resid = Rupert[,2] - NWpred1

ii. # We will compute 500 bootstraps
nboot = 500
# Set up the matrix to contain the bootstrap values; note we evaluate the
# bootstrap smooths at the original x points.
boot.ruperts = matrix(0,length(x),nboot)
# Loop over bootstraps
for(i in 1:nboot){
  bootY = NWpred1 + sample(resid,replace=TRUE)   # second term bootstraps residuals
  # Evaluate smooth and add it to i'th column of boot.ruperts
  boot.ruperts[,i] = NadarayaWatson2(x,Rupert[,1],bootY,sigma=10)
}

iii. # Plot the first 20 columns in boot.ruperts using matplot:
matplot(x,boot.ruperts[,1:20],type='l',lty=1,lwd=2,col='blue')
# Add the mean (I'll make this extra-thick so it's more visible):
lines(x,smooth.rupert,lwd=3,col='red')

[Figure: the first 20 residual-bootstrap smooths (blue) with the original smooth (red).]


iv. # Bias estimate: obtain mean of bootstraps for each row (since columns
# = bootstrap replicates) and subtract values of smooth
# (I'll use an apply function here for readability)
rupert.bias = apply(boot.ruperts,1,mean) - smooth.rupert
# Bias-corrected estimate - subtract bias from smooth
rupert.correct = smooth.rupert - rupert.bias
# Obtain the standard deviation in each row. I've also used an apply
# statement here
rupert.sd = apply(boot.ruperts,1,sd)
# And use this for normal-theory confidence intervals by adding and
# subtracting the critical value times the standard deviation from the
# corrected estimate.
crit = qnorm(0.975)
rupert.normCI = cbind(rupert.correct+crit*rupert.sd,rupert.correct-crit*rupert.sd)
# For the quantile-based intervals, obtain the quantiles in each row,
# again with apply:
rupert.quant = apply(boot.ruperts,1,quantile,c(0.025,0.975))
# and remember that we need to adjust them around the estimate:
rupert.quantCI = cbind(2*smooth.rupert - rupert.quant[1,],
                       2*smooth.rupert - rupert.quant[2,])
# Now we'll plot the smooth and bias-corrected smooth
matplot(x,cbind(smooth.rupert,rupert.correct),
        type='l',lwd=2,col=c('black','blue'),lty=c(1,1,2,2))
# Add rupert.correct +/- crit*rupert.sd
matplot(x,rupert.normCI,type='l',lwd=2,col='red',lty=2,add=TRUE)
# Add the quantiles of the bootstrap distribution; in this case they
# need to be transposed to make them the right dimension
matplot(x,rupert.quantCI,lty=2,col='green',lwd=2,type='l',add=TRUE)
# Add a legend
legend('bottomright',legend=c('smooth','bias-corrected','normal CI','Bootstrap CI'),
       col=c('black','blue','red','green'),lty=c(1,1,2,2),lwd=2)

# Here we see that the bias is largest at the second peak where the

# smooth changes very rapidly. The confidence intervals, however, are

# very close.

[Figure: the smooth and bias-corrected smooth with normal-theory and bootstrap confidence intervals; legend in the bottom right.]

(e) # The complexity is O(mnB) -- n operations for each of m evaluation

# points, B times.

# When m = O(n), the complexity is O(n^2 B)

(f) # First we'll run the bootstrap; we'll store this in boot.ruperts2.
boot.ruperts2 = matrix(0,length(x),nboot)
# Loop over bootstraps
for(i in 1:nboot){
  # Indices of rows of Rupert to use in the bootstrap
  I = sample(nrow(Rupert),replace=TRUE)
  # Evaluate smooth and add it to i'th column of boot.ruperts2
  boot.ruperts2[,i] = NadarayaWatson2(x,Rupert[I,1],Rupert[I,2],sigma=10)
}
# and plot the first 20 entries along with the mean
matplot(x,boot.ruperts2[,1:20],type='l',lty=1,lwd=2,col='blue')
lines(x,smooth.rupert,lwd=3,col='red')

[Figure: the first 20 vanilla-bootstrap smooths (blue) with the original smooth (red).]

# Then repeat the same calculations as in 3(d)iii, remembering not to write
# over the output that we calculated in that part.
rupert.bias2 = apply(boot.ruperts2,1,mean) - smooth.rupert
rupert.correct2 = smooth.rupert - rupert.bias2
rupert.sd2 = apply(boot.ruperts2,1,sd)
rupert.normCI2 = cbind(rupert.correct2+2*rupert.sd2,rupert.correct2-2*rupert.sd2)
rupert.quant2 = apply(boot.ruperts2,1,quantile,c(0.025,0.975))
rupert.quantCI2 = cbind(2*smooth.rupert - rupert.quant2[1,],
                        2*smooth.rupert - rupert.quant2[2,])
# And produce the plots
matplot(x,cbind(smooth.rupert,rupert.correct2),
        type='l',lwd=2,col=c('black','blue'),lty=c(1,1,2,2))
matplot(x,rupert.normCI2,type='l',lwd=2,col='red',lty=2,add=TRUE)
matplot(x,rupert.quantCI2,lty=2,col='green',lwd=2,type='l',add=TRUE)
legend('bottomright',legend=c('smooth','bias-corrected','normal CI','Bootstrap CI'),
       col=c('black','blue','red','green'),lty=c(1,1,2,2),lwd=2)

[Figure: the vanilla-bootstrap smooth, bias-corrected smooth, normal-theory CIs and bootstrap CIs; legend in the bottom right.]

# Here we observe that the bias correction appears to be negligible.
# We'll do some comparisons between residual and vanilla bootstraps.
# Look at the bias
matplot(x,cbind(rupert.bias,rupert.bias2),type='l')

[Figure: bias estimates from the residual bootstrap and the vanilla bootstrap.]

# And the corrected values (along with the original smooth)

matplot(x,cbind(smooth.rupert,rupert.correct,rupert.correct2),type='l')

[Figure: the original smooth and the two bias-corrected smooths.]

# It will be interesting to see if the standard deviations change very

# much between the two bootstraps

matplot(x,cbind(rupert.sd,rupert.sd2),type='l')

[Figure: pointwise bootstrap standard deviations from the two bootstraps.]


# and there appears to be a noticeable difference. This plays out in the
# corrected intervals
matplot(x,cbind(rupert.normCI,rupert.normCI2),type='l',lty=c(1,1,2,2),col=c(1,1,2,2))

[Figure: normal-theory confidence intervals from the two bootstraps.]

# where the residual bootstrap predicts more variation in the middle of
# the function.
# Looking at the quantile intervals
matplot(x,cbind(rupert.quantCI,rupert.quantCI2),type='l',lty=c(1,1,2,2),col=c(1,1,2,2))

[Figure: quantile-based confidence intervals from the two bootstraps.]


Grading Rubric for Question 3

(a) i. Define x correctly. [1]

ii. Calling NadarayaWatson appropriately [2]

iii. Correct plot [1]

(b) i. Producing new function [5]

ii. Add line to graph in previous plot [1]

iii. Calculate timing [2]

iv. Each loop in the function [-2]

⋆ Bonus: No loops / apply [2]

(c) i. Loop over σ and store values [1]

ii. Plot [1]

iii. Legend [1]

(d) i. A. Predictions [1]

B. Residuals [1]

ii. A. Set up storage matrix [1]

B. Resampling residuals and adding onto predictions [2]

iii. Plot [2]

iv. A. Bias estimate [1]

B. Bias corrected estimate [1]

C. Normal theory confidence intervals [1]

D. Standard bootstrap quantiles [1]

E. Plots [1]

F. Legend [1]

(e) Complexity [2]

⋆ Bonus: Implement original bootstrap [3]

⋆ Bonus: Replicate estimates and plot [2]

† Total: [30]


4. A Statistician Plays Darts

(a) x = seq(-170,170,len=101)
DBmat = matrix(DartBoard(rep(x,101),rep(x,1,each=101)),101,101)
library('rgl')
persp3d(x,x,DBmat,col='red')

(b) SimpsonsRule = function(fn,lb,ub,n){
  # Conducts integration on a square using Simpson's rule
  #
  # fn -- function to integrate; should take two vectors and return a
  #       vector of values
  # lb -- lower end point in both x and y
  # ub -- upper end point in both x and y
  # n  -- number of quadrature points to use in each dimension, must be odd.
  x = seq(lb,ub,len=n) # equally spaced points on [lb, ub]
  h = (ub-lb)/(n-1)    # distance between points
  # And define the Simpson's rule weights in one dimension.
  w = rep(1,n)
  w[seq(2,n-1,by=2)] = 4
  w[seq(3,n-2,by=2)] = 2
  w = w*h/3
  # Now we will define the grid of points on the square.
  # Run through x[1] in x and pair it with all values of x in y, then
  # move onto x[2] etc
  X = cbind( rep(x,n), rep(x,1,each=n) )
  # The weight given in Simpson's rule is just the product of the weight
  # in x and the weight in y
  W = rep(w,n)*rep(w,1,each=n)
  return(sum(fn(X[,1],X[,2])*W)/(340^2) )
}

n = 21

I = SimpsonsRule(DartBoard,-170,170,n)

[1] 10.24889

(c) tol = 1e-6

tol.met = FALSE


maxn = 500

ns = 21

Is = I

while(!tol.met){

Iold = I

n = 2*n+1

I = SimpsonsRule(DartBoard,-170,170,n)

ns = c(ns,n)

Is = c(Is,I)

print(c(n,I,Iold,I-Iold))

if( abs(I-Iold) < tol | n > maxn){ tol.met = TRUE }

}

plot(log(ns[2:length(ns)]),log(abs(diff(Is))))

lm(log(abs(diff(Is)))~ log(ns[2:length(ns)]))

(d) (not shown)

(e) BoxMuller = function(n,mu,sig){

U = matrix(runif(2*n),n,2)

r = sqrt(-2*log(U[,1]))

return(mu + sig*cbind(r*cos(2*pi*U[,2]),r*sin(2*pi*U[,2])))

}

X = BoxMuller(100000,0,20)

Ibig = mean( DartBoard(X[,1],X[,2]) )
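A quick sanity check of the Box-Muller draws (not part of the original solution): the two simulated coordinates should have roughly the requested mean and standard deviation.

Z = BoxMuller(100000,0,20)
colMeans(Z)       # both entries should be near 0
apply(Z, 2, sd)   # both entries should be near 20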

(f) X = BoxMuller(1000,0,1)

X1 = 20*X - 20

X2 = 20*X

BM1 = DartBoard(X1[,1],X1[,2])

BM2 = DartBoard(X2[,1],X2[,2])

alpha = cov(BM1,BM2)/var(BM2)

hatI = mean(BM1) - alpha*(mean(BM2) - Ibig)

mean(BM1)


hatIs = rep(0,100)

hatIv = rep(0,100)

for(i in 1:100){

X = BoxMuller(1000,0,1)

X1 = 20*X - 20

X2 = 20*X

BM1 = DartBoard(X1[,1],X1[,2])

BM2 = DartBoard(X2[,1],X2[,2])

alpha = cov(BM1,BM2)/var(BM2)

hatIs[i] = mean(BM1) - alpha*(mean(BM2) - Ibig)

hatIv[i] = mean(BM1)

}

var(hatIs)/var(hatIv)

(g) xgrid = seq(-170,170,len=51)

X = BoxMuller(10000,0,1)

BMv = DartBoard(20*X[,1],20*X[,2])

hatIvErr = mean(BMv) - Ibig

Ivar = var(BMv)

sig = 20

alphamat = matrix(0,51,51)

scoremat = matrix(0,51,51)

for(i in 1:51){

for(j in 1:51){

BMc = DartBoard(sig*X[,1]+xgrid[i], sig*X[,2]+xgrid[j])

alphamat[i,j] = cov(BMc,BMv)/Ivar

scoremat[i,j] = mean(BMc) - alphamat[i,j]*hatIvErr

}

}

contour(xgrid,xgrid,alphamat)


persp3d(xgrid,xgrid,scoremat,col='blue')

rowmax = apply(scoremat,1,max)

ibest = which.max(rowmax)

jbest = which.max(scoremat[ibest,])

points(xgrid[ibest],xgrid[jbest],pch=8,col=2)

(h) (not shown)

(i) sigs = c(5,10,15,20,25,50,100)

xbest = matrix(0,length(sigs),2)

for(k in 1:length(sigs)){

alphamat = matrix(0,51,51)

scoremat = matrix(0,51,51)

for(i in 1:51){

for(j in 1:51){

BMc = DartBoard(sigs[k]*X[,1]+xgrid[i], sigs[k]*X[,2]+xgrid[j])

alphamat[i,j] = cov(BMc,BMv)/Ivar

scoremat[i,j] = mean(BMc) - alphamat[i,j]*hatIvErr

}

}

rowmax = apply(scoremat,1,max)

xbest[k,1] = which.max(rowmax)

xbest[k,2] = which.max(scoremat[xbest[k,1],])

}

contour(x,x,DBmat)

points(xgrid[xbest[,1]],xgrid[xbest[,2]],pch=as.character(1:length(sigs)),col=2,cex=3)


Grading Rubric for Question 4

(a) Perspective plot [2]

(b) i. Simpson’s rule [2]

ii. Check with Monte Carlo integral [2]

(c) i. Function . . . [2]

ii. Plot [1]

iii. Convergence [1]

iv. Why not? [1]

(d) Carry the calculation through [3]

(e) i. Produce a Box Muller algorithm [2]

ii. Produce a Monte Carlo estimate [2]

(f) i. Estimation of correlation [1]

ii. Simulation of 100 estimates [3]

(g) i. Make the control variate correction [1]

ii. Contour plot of covariance [1]

iii. Would you attempt to find this point . . . [2]

(h) Produce exact control variate [4]

⋆ Bonus: [5]

⋆⋆ Total: [30]