multinomial models and the em- algorithm

58
Multinomial models and the EM- algorithm Niels Richard Hansen October 2, 2019

Upload: others

Post on 26-Jul-2022

18 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multinomial models and the EM- algorithm

MultinomialmodelsandtheEM-algorithm

NielsRichardHansen

October2,2019

Page 2: Multinomial models and the EM- algorithm

PepperedMoth

Threealleles,andsixdifferentgenotypes(CC,CI,CT,II,ITandTT).

Threedifferentphenotypes(black,mottled,lightcolored).

Allelefrequencies: , , withpC pI pT pC + pI + pT = 1.

Page 3: Multinomial models and the EM- algorithm

PepperedMothAccordingtoHardy-Weinbergequilibriumthegenotypefrequenciesare

Thecompletemultinomiallog-likelihoodis

Weonlyobserve ,where

Asaspecificdataexamplewehavetheobservation , ,and.

p2C, 2pCpI , 2pCpT , p2

I, 2pI pT , p2

T.

2nCC log(pC) + nCI log(2pCpI) + nCT log(2pCpI) + 2nII log(pI) + nIT log(2pIpT ) + 2nT T log(pT ),

(nC ,nI ,nT )

n = nCC + nCI + nCT=nC

+ nIT + nII=nI

+ nTT=nT

nC = 85 nI = 196nT = 341

Page 4: Multinomial models and the EM- algorithm

MultinomialcellcollapsingThePepperedMothexampleisanexampleofcellcollapsinginamultinomialmodel.

Ingeneral,let beapartitionandlet

bethemapgivenby

A1 ∪… ∪ AK0 = {1,… ,K}

M : NK0 → NK0

0

M((n1,… , nK))j = ∑i∈Aj

ni.

Page 5: Multinomial models and the EM- algorithm

MultinomialcellcollapsingIf with then

ForthePepperedMoths, correspondingtothesixgenotypes, andthepartitioncorrespondingtothephenotypesis

and

Y ∼ Mult(p, n) p = (p1,… , pK)

X = M(Y ) ∼ Mult(M(p),n).

K = 6 K0 = 3

{1, 2, 3} ∪ {4, 5} ∪ {6} = {1,… , 6},

M(n1,… , n6) = (n1 + n2 + n3,n4 + n5, n6).

Page 6: Multinomial models and the EM- algorithm

MultinomialcellcollapsingIntermsofthe parametrization, and

Hence

Thelog-likelihoodis,

(pC, pI) pT = 1− pC − pI

p = (p2C, 2pCpI , 2pCpT , p2

I, 2pIpT , p2

T).

M(p) = (p2C+ 2pCpI + 2pCpT , p2

I+ 2pIpT , p2

T).

ℓ(pC, pI ) = nC log(p2C+ 2pC pI + 2pCpT )

+ nI log(p2I+ 2pIpT) + nT log(p2

T)

Page 7: Multinomial models and the EM- algorithm

PepperedMothsnegativelog-likelihood

Wecancodeaproblemspecificversionofthenegativelog-likelihoodanduseoptimtominimizeit.

## par = c(pC, pI), pT = 1 - pC - pI## x is the data vector of length 3 of counts loglik <- function(par, x) { pT <- 1 - par[1] - par[2]

if (par[1] > 1 || par[1] < 0 || par[2] > 1 || par[2] < 0 || pT < 0) return(Inf)

PC <- par[1]^2 + 2 * par[1] * par[2] + 2 * par[1] * pT PI <- par[2]^2 + 2 * par[2] * pT PT <- pT^2

- (x[1] * log(PC) + x[2] * log(PI) + x[3]* log(PT))}

Notehowparameterconstraintsareencodedviathereturnvalue .∞

Page 8: Multinomial models and the EM- algorithm

PepperedMothsMLE

## Default algorithm Nelder-Mead doesn't use derivativesoptim(c(0.3, 0.3), loglik, x = c(85, 196, 341))

$par[1] 0.07084643 0.18871900

$value[1] 600.481

$countsfunction gradient 71 NA

$convergence[1] 0

$messageNULL

Page 9: Multinomial models and the EM- algorithm

PepperedMothsMLESomethoughthastogointotheinitialparameterchoice.

optim(c(0, 0), loglik, x = c(85, 196, 341))

Error in optim(c(0, 0), loglik, x = c(85, 196, 341)): function cannot be evaluated at initial parameters

Page 10: Multinomial models and the EM- algorithm

PepperedMothsMLEUsingBFGS(orCG)ispossible.Gradientsarecomputedinternallybynumericaldifferentiation.

optim(c(0.3, 0.3), loglik, x = c(85, 196, 341), method = "BFGS")

$par[1] 0.07083646 0.18873621

$value[1] 600.481

$countsfunction gradient 40 9

$convergence[1] 0

$messageNULL

Page 11: Multinomial models and the EM- algorithm

PepperedMothsnegativelog-likelihood

Thecomputationscanbeneficiallybeimplementedingreatergenerality.

ThefunctionMsumsthecellsthatarecollapsed,whichhastobespecifiedbythegroupargument.

M <- function(y, group) as.vector(tapply(y, group, sum))

Page 12: Multinomial models and the EM- algorithm

PepperedMothsnegativelog-likelihood

Thefunctionprobmapstheparameterstothemultinomialprobabilityvector.Thisfunctionwillhavetobeproblemspecific.

prob <- function(p) { p[3] <- 1 - p[1] - p[2] c(p[1]^2, 2 * p[1] * p[2], 2* p[1] * p[3], p[2]^2, 2 * p[2] * p[3], p[3]^2)}

Page 13: Multinomial models and the EM- algorithm

PepperedMothsMLE

loglik <- function(par, x) { pT <- 1 - par[1] - par[2] if (par[1] > 1 || par[1] < 0 || par[2] > 1 || par[2] < 0 || pT < 0) return(Inf)

- sum(x * log(M(prob(par), c(1, 1, 1, 2, 2, 3))))}

Page 14: Multinomial models and the EM- algorithm

PepperedMothsMLE

pep_optim <- optim(c(0.3, 0.3), loglik, x = c(85, 196, 341))pep_optim

$par[1] 0.07084643 0.18871900

$value[1] 600.481

$countsfunction gradient 71 NA

$convergence[1] 0

$messageNULL

Page 15: Multinomial models and the EM- algorithm

PepperedMothssummaryThePepperedMothexampleisverysimple.Thelog-likelihoodfortheobserveddatacaneasilybecomputed.

Thefulldatamodelaswellasthemodelofcollapsedcellsarecurvedexponentialfamilies.

TheexamplewasusedabovetoillustratedifferentwaysofimplementingalikelihoodcomputationinR.Onewasproblemspecificandonewasmoreabstractandgeneral.

BelowtheexampleisusedtoillustratetheEM-algorithm.TheEM-algorithmdoesnotrequirethatwecancomputethemarginallikelihood.In(real)applicationsthiswilloftennotbepossible,oritwillbeanumericallyheavycomputation.

Page 16: Multinomial models and the EM- algorithm

IncompletedatalikelihoodSupposethat isarandomvariableand .Supposethat hasdensity andthat hasmarginaldensity .

Themarginaldensityistypicallyoftheform

forasuitablemeasure dependingon and butnot .

Thelog-likelihoodforobserving is

Y X = M(Y ) Yf(⋅ ∣ θ) X g(x ∣ θ)

g(x ∣ θ) = ∫{y:M(y)=x}

f(y ∣ θ)μx(dy)

μx M x θ

X = x

ℓ(θ) = log g(x ∣ θ).

Page 17: Multinomial models and the EM- algorithm

IncompletedatalikelihoodThemarginallikelihoodisoftenimpossibletocomputeanalyticallyanddifficultandexpensivetocomputenumerically.

Thecompletelog-likelihood, ,isofteneasytocompute,butwedon'tknow ,onlythat .

Insomecasesitispossibletocompute

whichistheconditionalexpectationofthecompletelog-likelihoodgiventheobserveddataandundertheprobabilitymeasuregivenby .

log f(y ∣ θ)Y M(Y ) = x

Q(θ ∣ θ′) := Eθ′(log f(Y ∣ θ) ∣ X = x),

θ′

Page 18: Multinomial models and the EM- algorithm

Idea

Withaninitialguessof computeiteratively

for .

ThisistheEM-algorithm:

E-step:Computetheconditionalexpectation .

M-step:Maximize .

θ′ = θ(0)

θ(n+1) = argmaxQ(θ ∣ θ(n))

n = 0, 1, 2,…

Q(θ ∣ θ(n))

θ ↦ Q(θ ∣ θ(n))

Page 19: Multinomial models and the EM- algorithm

TheEM-algorithmForsomenicemodels(e.g.exponentialfamilies)theconditionalexpectationiseasytocompute,thecompletelog-likelihoodiseasytomaximize,andthistransferstoeasymaximizationof .

Weprovebelowthatthealgorithmisanascentalgorithm;it(weakly)increasesthemarginallikelihoodineverystep.

Butfirstsometheoryaboutconditionaldistributions.

Q

Page 20: Multinomial models and the EM- algorithm

ConditionaldistributionsItholdsingreatgeneralitythattheconditionaldistributionof given hasdensity

w.r.t.asuitablemeasure thatdoesnotdependupon .

Youcanverifythisfordiscretedistributionsandwhen withjointdensityw.r.t.aproductmeasure thatdoesnotdependupon .

Y X = x

h(y ∣ x, θ) =f(y ∣ θ)g(x ∣ θ)

μx θ

Y = (Z,X)μ ⊗ ν θ

Page 21: Multinomial models and the EM- algorithm

ConditionaldistributionsInthelattercase, and

isthemarginaldensityw.r.t. .

Ingeneral

andundersomeintegrabilityconditions,thisdecompositionisusedtoshowthattheEM-algorithmincreasesthelog-likelihood, ,ineachiteration.

f(y ∣ θ) = f(z, x ∣ θ)

g(x ∣ θ) = ∫ f(z, x ∣ θ)μ(dz)

ν

log g(x ∣ θ) = log f(y ∣ θ) − logh(y ∣ x, θ),

ℓ(θ)

Page 22: Multinomial models and the EM- algorithm

Multinomialconditionaldistributions

Theconditionaldistributionof conditionallyon canbefoundtoo.

Theprobabilityparametersaresimply restrictedto andrenormalizedtoaprobabilityvector.Hence

for .

YAj= (Yi)i∈Aj

X

YAj∣ X = x ∼ Mult( , xj) .

pAj

M(p)j

p Aj

E(Yk ∣ X = x) =xjpk

M(p)j

k ∈ Aj

Page 23: Multinomial models and the EM- algorithm

AbstractE-stepTheEM-algorithmcanbeimplementedbytwosimplefunctionsthatcomputetheconditionalexpectationsabove(theE-step)andthenmaximizationofthecompleteobservationlog-likelihood.

EStep0 <- function(p, x, group) { x[group] * p / M(p, group)[group]}

Page 24: Multinomial models and the EM- algorithm

MultinomialMLE

With acompleteobservation,itcanbeshownthattheMLEis

Whichis for

X <- matrix( c(2, 1, 1, 0, 0, 0, 0, 1, 0, 2, 1, 0) / 2, 2, 6, byrow = TRUE)X

[,1] [,2] [,3] [,4] [,5] [,6][1,] 1 0.5 0.5 0 0.0 0[2,] 0 0.5 0.0 1 0.5 0

y = (nCC,nCI ,nCT ,nII ,nIT ,nT T )T

pC = nCC + (nCI + nCT )/2pI = (nCI + nIT )/2 + nII

p = Xy

Page 25: Multinomial models and the EM- algorithm

AbstractM-stepTheMLEofthecompletelog-likelihoodisalinearestimator,asisthecaseinmanyexampleswithexplicitMLEs.

MStep0 <- function(n, X) as.vector(X %*% n / (sum(n)))

TheEStep0andMStep0functionsareabstractimplementations.TheyrequirespecificationoftheargumentsgroupandX,respectively,tobecomeconcrete.

TheM-stepisonlyimplementedinthecasewherethecomplete-dataMLEisalinearestimator,thatis,alinearmapofthecompletedatavector thatcanbeexpressedintermsofamatrix .

yX

Page 26: Multinomial models and the EM- algorithm

PepperedMothsE-andM-stepsConcretefunctionsfortheE-andM-stepsareimplementedfortheparticularexample.

EStep <- function(p, x) EStep0(prob(p), x, c(1, 1, 1, 2, 2, 3))

MStep <- function(n) { X <- matrix( c(2, 1, 1, 0, 0, 0, 0, 1, 0, 2, 1, 0) / 2, 2, 6, byrow = TRUE)

MStep0(n, X)}

Page 27: Multinomial models and the EM- algorithm

PepperedMothsEM

EM <- function(par, x, epsilon = 1e-6, trace = NULL) { repeat{ par0 <- par par <- MStep(EStep(par, x)) if(!is.null(trace)) trace() if(sum((par - par0)^2) <= epsilon * (sum(par^2) + epsilon)) break } par ## Remember to return the parameter estimate}

EM(c(0.3, 0.3), c(85, 196, 341))pep_optim$par

[1] 0.07083693 0.18877365[1] 0.07084643 0.18871900

Page 28: Multinomial models and the EM- algorithm

InsidetheEMCheckwhatisgoingonineachstepoftheEM-algorithm.

source("Debugging_and_tracing.R")

EM_tracer <- tracer("par")EM(c(0.3, 0.3), c(85, 196, 341), trace = EM_tracer$trace)

n = 1: par = 0.08038585, 0.22464192; n = 2: par = 0.07118928, 0.19546961; n = 3: par = 0.07084985, 0.18993393; n = 4: par = 0.07083738, 0.18894757; n = 5: par = 0.07083693, 0.18877365;

[1] 0.07083693 0.18877365

Page 29: Multinomial models and the EM- algorithm

InsidetheEM

summary(EM_tracer)

par.1 par.2 .time1 0.08038585 0.2246419 0.0000000002 0.07118928 0.1954696 0.0002268913 0.07084985 0.1899339 0.0003789184 0.07083738 0.1889476 0.0005293205 0.07083693 0.1887737 0.000670244

Page 30: Multinomial models and the EM- algorithm

InsidetheEM

EM_tracer <- tracer(c("par0", "par"), N = 0)phat <- EM(c(0.3, 0.3), c(85, 196, 341), epsilon = 1e-20, trace = EM_tracer$trace)phat

[1] 0.07083691 0.18873652

EM_trace <- summary(EM_tracer)tail(EM_trace)

par0.1 par0.2 par.1 par.2 .time10 0.07083691 0.1887366 0.07083691 0.1887365 0.00214632411 0.07083691 0.1887365 0.07083691 0.1887365 0.00238005812 0.07083691 0.1887365 0.07083691 0.1887365 0.00267261113 0.07083691 0.1887365 0.07083691 0.1887365 0.00286564914 0.07083691 0.1887365 0.07083691 0.1887365 0.00309316915 0.07083691 0.1887365 0.07083691 0.1887365 0.003308058

Page 31: Multinomial models and the EM- algorithm

Addingcomputedvalues

loglik_pep <- Vectorize(function(p1, p2) loglik(c(p1, p2), c(85, 196, 341)))EM_trace <- transform( EM_trace, n = 1:nrow(EM_trace), par_norm_diff = sqrt((par0.1 - par.1)^2 + (par0.2 - par.2)^2), loglik = loglik_pep(par.1, par.2))

Page 32: Multinomial models and the EM- algorithm

InsidetheEM

qplot(n, log(par_norm_diff), data = EM_trace)

Notethelog-axis.TheEM-algorithmconvergeslinearly(thisistheterminology,seeAlgorithmsandConvergence).

Page 33: Multinomial models and the EM- algorithm

LinearconvergenceThelog-rateoftheconvergencecanbeestimatedbyleast-squares.

log_rate <- coefficients(lm(log(par_norm_diff) ~ n, data = EM_trace))["n"]exp(log_rate)

n 0.1750251

Itisverysmallinthiscaseimplyingfastconvergence.

Thisisnotalwaysthecase.Ifthelog-likelihoodisflat,theEM-algorithmcanbecomequiteslowwitharatecloseto1.

Page 34: Multinomial models and the EM- algorithm

Convergenceofthelog-likelihood

qplot(n, log(loglik- loglik[15]), data = EM_trace)

Page 35: Multinomial models and the EM- algorithm

ObjectorientedimplementationAn(S3)objectorientedimplementationisusefulwhen

somepartsofanalgorithmcanbeimplementedabstractly

otherpartsrequiredetailsspecifictoaparticularproblem

Detailscanbedata,dimensionsandparametrizations.Thesearelogicallyrelated,andtheycanbestoredinalistandgivenalabel.

That'sallthereistoanS3object.

Page 36: Multinomial models and the EM- algorithm

Pep.Mothsobject

pepMoths <- structure( list( x = c(85, 196, 341),

X = matrix( c(2, 1, 1, 0, 0, 0, 0, 1, 0, 2, 1, 0) / 2, 2, 6, byrow = TRUE),

group = c(1, 1, 1, 2, 2, 3),

par = c(0.3, 0.3),

prob = function(p) { p[3] <- 1 - p[1] - p[2] c(p[1]^2, 2 * p[1] * p[2], 2* p[1] * p[3], p[2]^2, 2 * p[2] * p[3], p[3]^2) } ), class = "MultCollapse")

Page 37: Multinomial models and the EM- algorithm

EMgenericfunction

EM <- function(x, ...) ## ... allows for additional arg. UseMethod("EM")

Page 38: Multinomial models and the EM- algorithm

AnEMmethod

EM.MultCollapse <- function(x, epsilon = 1e-6) { par <- x$par repeat{ par0 <- par par <- MStep0(EStep0(x$prob(par), x$x, x$group), x$X) if(sum((par - par0)^2) <= epsilon * (sum(par^2) + epsilon)) break } x$par <- par x ## Returns the entire object}

Page 39: Multinomial models and the EM- algorithm

Pep.MothsEM

pepMoths <- EM(pepMoths)pepMoths$par

[1] 0.07083693 0.18877365

phat

[1] 0.07083691 0.18873652

str(pepMoths)

List of 5 $ x : num [1:3] 85 196 341 $ X : num [1:2, 1:6] 1 0 0.5 0.5 0.5 0 0 1 0 0.5 ... $ group: num [1:6] 1 1 1 2 2 3 $ par : num [1:2] 0.0708 0.1888 $ prob :function (p) ..- attr(*, "srcref")= 'srcref' int [1:8] 14 12 18 5 12 5 14 18 .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fb2431ecd00> - attr(*, "class")= chr "MultCollapse"

Page 40: Multinomial models and the EM- algorithm

OptimizationandstatisticsTheEM-algorithmisageneralalgorithmfornumericaloptimizationofalog-likelihoodfunction.Itworksbyiterativelyoptimizing

Fornumericaloptimizationof orvariantsofEM(likeEMgradientoraccelerationmethods)thegradientandHessianof canbeuseful.

ForstatisticsweneedtheobservedFisherinformation(Hessianofthenegativelog-likelihoodfortheobserveddata).

Thereareinterestingandusefulrelationsbetweenthesedifferentderivativesthatwewillcoverinthislecture.

Q(θ ∣ θ(n)) = Eθ(n)(log f(Y ∣ θ) ∣ X = x).

QQ

Page 41: Multinomial models and the EM- algorithm

GradientNotethatwith insomeparametrizationofthecellprobabilities,

where isdefinedby .

Thegradientof w.r.t. andevaluatedin is

p = p(θ)

Q(θ ∣ θ′) = ∑i

log pi(θ),xj(i)pi(θ′)

M(p(θ′))j(i)

j(i) i ∈ Aj(i)

Q θ θ′

∇θQ(θ′ ∣ θ′) = ∑i

∇pi(θ′).xj(i)

M(p(θ′))j(i)

Page 42: Multinomial models and the EM- algorithm

Gradient

Dprob <- function(p) { matrix( c(2 * p[1], 0, 2 * p[2], 2 * p[1], 2* p[3] - 2 * p[1], -2 * p[1], 0, 2 * p[2], -2 * p[2], 2 * p[3] - 2 * p[2], -2 * p[3], -2 * p[3]), ncol = 2, nrow = 6, byrow = TRUE)}

gradQ <- function(p, x) { p[3] <- 1 - p[1] - p[2] group <- c(1, 1, 1, 2, 2, 3) - (x[group] / M(prob(p), group)[group]) %*% Dprob(p)}

Page 43: Multinomial models and the EM- algorithm

GradientidentityThoughcomputedasthegradientof ,

fromthefactthat maximizes

Thiscanalsobeverifiedbydirectcomputation.

Wecanusethegradientforoptimizationofthelog-likelihood.

Q

∇θQ(θ′ ∣ θ′) = ∇θℓ(θ′)

θ′

θ ↦ Q(θ ∣ θ′) − ℓ(θ).

Page 44: Multinomial models and the EM- algorithm

Gradientusedforoptimization

optim(c(0.3, 0.3), loglik, gr = gradQ, x = c(85, 196, 341), method = "BFGS")[1:3]

$par[1] 0.07083689 0.18873652

$value[1] 600.481

$countsfunction gradient 46 9

phat ## Optimum computed by EM algorithm

[1] 0.07083691 0.18873652

Page 45: Multinomial models and the EM- algorithm

EmpiricalFisherinformationNotethatamultinomialobservationwithsizeparameter canberegardedasi.i.d.samples.

Fori.i.d.samplestheFisherinformation(foronesample)canbeestimatedastheempiricalvarianceofthegradientofthelog-likelihood.Bytheidentity

holdingforeachsample,wecancomputetheempiricalvariance.

n n

∇θQ(θ′ ∣ θ′) = ∇θℓ(θ′)

Page 46: Multinomial models and the EM- algorithm

Pep.MothsempiricalFisher

empFisher <- function(p, x, center = FALSE) { gradQ <- 0 ## is supposed to be 0 in the MLE if (center) gradQ <- gradQ(p, x) / sum(x) grad1 <- gradQ(p, c(1, 0, 0)) - gradQ grad2 <- gradQ(p, c(0, 1, 0)) - gradQ grad3 <- gradQ(p, c(0, 0, 1)) - gradQ x[1] * t(grad1) %*% grad1 + x[2] * t(grad2) %*% grad2 + x[3] * t(grad3) %*% grad3 }

Notethatthegradientis0inthemaximumof ,butthecodeaboveallowsforempiricallycenteringthegradientbeforecomputingtheestimateoftheFisherinformation.

Q(⋅ ∣ θ)

Page 47: Multinomial models and the EM- algorithm

Pep.MothsempiricalFisher

empFisher(phat, c(85, 196, 341))

[,1] [,2][1,] 18487.558 1384.626[2,] 1384.626 6816.612

empFisher(phat, c(85, 196, 341), center = TRUE)

[,1] [,2][1,] 18487.558 1384.626[2,] 1384.626 6816.612

Page 48: Multinomial models and the EM- algorithm

Pep.MothsnumericalFisherinformation

WecancomputetheobservedFisherinformationusingoptim.

ihat <- optim(c(0.3, 0.3), loglik, x = c(85, 196, 341), hessian = TRUE)$hessianihat

[,1] [,2][1,] 18489.734 1384.604[2,] 1384.604 6817.921

NotethatthisHessianandtheempiricalFisherinformationsabovearedifferentestimatesofthesamequantity.Thustheyarenotsupposedtobeidenticalonagivendataset.

Page 49: Multinomial models and the EM- algorithm

FisherinformationComputinganestimateoftheFisherinformationisimportantforcomputingstandarderrorsofMLEs.

TheempiricalFisherinformationcanbecomputedfori.i.d.samplesusingthegradientof ,ifthatgradientcanbecomputedorestimated.

SEMisageneralalternativethatreliesonlyonEM-stepsandanumericaldifferentiationscheme.

Bootstrapisaanotheralternativeforestimatingstandarderrors;inthei.i.d.caseitcanbedonenonparametrically,whichmeansthattheinferenceiscorrectevenifthemodeliswrong;forothermodels,parametricbootstrappingcanbeimplemented,whichthenhingesonmodelassumptionsjustliketheFisherinformation.Bootstrappingiscomputationallyexpensive.Itwillnotbecoveredinthiscourse(butinRegression).

Q

Page 50: Multinomial models and the EM- algorithm

TheEM-mappingDefine by

Aglobalmaximumofthelikelihoodisafixedpointof .

TheEM-algorithmsearchesforafixedpointfor ,thatis,asolutionto

VariationsoftheEM-algorithmcanoftenbeseenasotherwaystofindafixedpointfor .

Φ : Θ ↦ Θ

Φ(θ′) = argmaxQ(θ ∣ θ′).

Φ

Φ

Φ(θ) = θ.

Φ

Page 51: Multinomial models and the EM- algorithm

InformationidentityFrom

itfollowsthattheobservedFisherinformationequals

Itispossibletocompute .ForPep.Moths(andexponentialfamilies)itisasdifficultascomputingtheFisherinformationforcompleteobservations.

Wewanttocompute and isnotcomputableeither.

ℓ(θ) = Q(θ ∣ θ′) − H(θ ∣ θ′)

iX := −D2θℓ(θ) = −D2

θQ(θ ∣ θ′)

= iY (θ′)

+ D2θH(θ ∣ θ′)=− iY ∣X(θ′)

.

iY := iY (θ)

iX iY ∣X := iY ∣X(θ)

Page 52: Multinomial models and the EM- algorithm

SupplementedEMItcanbeshownthat

Hence

canbecomputedvianumericaldifferentiation.

DθΦ(θ)T = iY ∣X( iY )−1.

iX = (I − iY ∣X( iY )−1) iY

= (I − DθΦ(θ)T ) iY .

DθΦ(θ)

Page 53: Multinomial models and the EM- algorithm

Numericalhessianof

Firstweimplementthemap asanRfunction.

Q <- function(p, pp, x = c(85, 196, 341)) { p[3] <- 1 - p[1] - p[2] pp[3] <- 1 - pp[1] - pp[2] group <- c(1, 1, 1, 2, 2, 3) - (x[group] * prob(pp) / M(prob(pp), group)[group]) %*% log(prob(p))}

TheRpackagenumDerivcontainsfunctionsthatcomputenumericalderivatives.

library(numDeriv)iY <- hessian(Q, phat, pp = phat)

Q

Q

Page 54: Multinomial models and the EM- algorithm

SupplementedEM

Phi <- function(pp) MStep(EStep(pp, x = c(85, 196, 341)))DPhi <- jacobian(Phi, phat) ## Using numDeriv function 'jacobian'iX <- (diag(1, 2) - t(DPhi)) %*% iYiX

[,1] [,2][1,] 18487.558 1384.626[2,] 1384.626 6816.612

ihat

[,1] [,2][1,] 18489.734 1384.604[2,] 1384.604 6817.921

Page 55: Multinomial models and the EM- algorithm

SupplementedEMTheinverseFisherinformationis

wherethesecondidentityfollowsbytheNeumannseries.

Thelastformulaaboveexplicitlygivestheasymptoticvariancefortheincompleteobservation astheasymptoticvarianceforthecompleteobservation plusacorrectionterm.

i−1X = i

−1Y (I − DθΦ(θ)T )−1

= i−1Y (I +

∑n=1

(DθΦ(θ)T )n)= i

−1Y + i

−1Y DθΦ(θ)T (I − DθΦ(θ)T )−1

X Y

Page 56: Multinomial models and the EM- algorithm

SupplementedEM

iYinv <- solve(iY)iYinv %*% solve(diag(1, 2) - t(DPhi))

[,1] [,2][1,] 5.492602e-05 -1.115686e-05[2,] -1.115686e-05 1.489667e-04

iYinv + iYinv %*% t(solve(diag(1, 2) - DPhi, DPhi))

[,1] [,2][1,] 5.492602e-05 -1.115686e-05[2,] -1.115686e-05 1.489667e-04

Page 57: Multinomial models and the EM- algorithm

Comparethesewithpreviouscomputations

solve(iX) ## SEM-based, but different use of inversion

[,1] [,2][1,] 5.492602e-05 -1.115686e-05[2,] -1.115686e-05 1.489667e-04

solve(ihat) ## Numerical hessian

[,1] [,2][1,] 5.491927e-05 -1.115318e-05[2,] -1.115318e-05 1.489373e-04

Page 58: Multinomial models and the EM- algorithm

SupplementedEMTheSEMimplementationabovereliesonthehessianandjacobianfunctionsfromthenumDerivpackagefornumericaldifferentiation.

Itispossibletoimplementthecomputationofthehessianof analytically.

Variantsonthestrategyforcomputing vianumericaldifferentiationsuggestusingdifferenceapproximationsalongthesequenceofEM-steps.Idon'tseethepointofdoingthisoverstandardnumericaldifferentiation(besidesthatitexplicitlyusesthat isafixedpoint).Onthecontrary,Idon'tbelievesuchdifferenceapproximationswillconverge.WhenthealgorithmgetssufficientlyclosetotheMLE,thenumericalerrorswilldominate.

Q

DθΦ(θ)

θ