Hidden Markov Models Tutorial - David Springer
TRANSCRIPT
Retrieved 7/25/2019
David Springer, 11/11/2013
1. Hidden Markov Models

Hidden Markov models (HMMs) are a statistical framework used to describe sequential data. They work by making inferences about the likelihood of being in certain "hidden" states, moving between those states, and seeing an observation generated by each state. HMMs have been used in biomedical engineering for labelling ECG waves, identifying apnoeic sounds during sleep, patient degradation in the ICU, sleep staging, image segmentation and DNA sequencing, as well as in other areas like climatology, handwriting recognition, and extensive application in speech processing.

This aims to be a short, simple introduction to the major concepts of HMMs. The majority of the content in this document comes from:

L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition", Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.

N. P. Hughes, "Probabilistic Models for Automated ECG Interval Analysis", Doctoral Thesis, University of Oxford.

1.1. Markov processes

If the probability of obtaining a sample value x at time t is only dependent on a finite history of previous values, this can be written as

P(x(t) | x(t-1), x(t-2), ..., x(t-N))

which defines an Nth-order Markov process. Now, a "Markov process" often refers to a 1st-order process, defined as

P(x(t) | x(t-1))

where the current value in the time series is only dependent on the previous value. This "memoryless" behaviour, where the next value only depends on the current value, is referred to as the Markov property.
If we redefine this above equation in terms of states rather than sample values, we can define a Markov chain as

P(q_t | q_{t-1})

A Markov model can be used to represent any random variable q_t which can occupy one of N possible states at any time t.

1.2. The initial state distribution - π

The initial state distribution π defines the probabilities of being in a certain state at the special case when t = 1. This is a 1 x N array defining the probability of being in a certain state without any prior knowledge. This is defined as

π_i = P(q_1 = i),  1 ≤ i ≤ N

where q_1 is the state at the first time step: the first sample, or first observation.

1.3. The transition matrix - a_ij

The transition matrix defines the probability of moving from one state to another. A single point in this matrix can be defined as

A = {a_ij} = P(q_t = j | q_{t-1} = i),  1 ≤ i, j ≤ N

where q_t = j indicates that the state at time t is state number j. Therefore, the transition matrix A is an N x N matrix defining the probability of transitioning from one state at time t - 1 to another state at time t. By definition, a_ij ≥ 0 and Σ_{j=1}^{N} a_ij = 1.

1.4. Example 1
π_i = [0.3  0.2  0.5]

A = {a_ij} =

    0.4  0.3  0.3
    0.2  0.6  0.2
    0.1  0.1  0.8

What is the probability that the weather follows a particular sequence of states Q = {q_1, q_2, ..., q_8}? This can be written as

P(Q) = P(q_1) P(q_2 | q_1) P(q_3 | q_2) ... P(q_8 | q_7)
     = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_7 q_8}
     = 0.5 x 0.1 x 0.2 x 0.3 x 0.8 x 0.1 x 0.6 x 0.4
     = 5.76 x 10^-5

1.5. Extension to Hidden Markov Models

What happens if we can't directly observe the weather? For instance, what if you are stuck in a lab with no windows, and all you can observe is the average level of light in the atrium - can we then make predictions about the weather outside? These are questions that can be solved using HMMs.

2. Hidden Markov Models

A hidden Markov model (HMM) is a probabilistic model that describes the statistical relationship between an observable sequence and an unobservable, or "hidden", state sequence. The hidden state is discrete(1) and governed by an underlying Markov model as defined above, with an initial state distribution π and a transition matrix {a_ij}. The observations can be continuous or discrete.

An HMM is often called a "doubly embedded stochastic process". This is because of the probabilistic hidden process that is only visible through another set of stochastic observations. The second, observable, probabilistic process is related to the hidden states by an observation probability distribution.

2.1. The observation distribution - B
The observation distribution is, in the case of discrete observations, a probability mass function, while being a probability density function in the case of continuous observations. This can be defined as

B = {b_j(k)} = P(o_t = v_k | q_t = j),  1 ≤ j ≤ N,  1 ≤ k ≤ M

where v_k is the k-th discrete observation symbol, of which there are M. A complete hidden Markov model is written as

λ = (A, B, π)

Let us extend the example used above to visualise an observation probability distribution.

2.2. Example 2: Visualising B

Going back to Section 1.5, we said we can only observe the average level of light in the lab, but not the weather outside. This can be visualised in the form of a dynamic Bayesian network:

Figure 1: A dynamic Bayesian network illustration of an HMM

A more standard way of representing an HMM is this topology:
Figure 2: Standard HMM topology

In order to make inferences about the weather outside, we need to define the three parameters needed for an HMM:

π_i = [0.3  0.2  0.5]

This stays the same as before. We may not be able to observe the weather, but it still has the same likelihood.

A = {a_ij} =

    0.4  0.3  0.3
    0.2  0.6  0.2
    0.1  0.1  0.8

The transitions between the weather states also stay the same. However, we need to define the relationship between the observations and the hidden states. In the case of discrete observations, this could look like this:
B = {b_j(k)}, with rows as states and columns as the three discrete light levels:

                Light Level
                1      2      3
    State 1:    0.2    0.3    0.5
    State 2:    0.2    0.4    0.4
    State 3:    0.6    0.3    0.1

Table 1: Example of a discrete observation probability distribution

If the observations were continuous, this distribution could look more like this:

Figure 3: Example of B with continuous observations
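The three parameters above can be written down directly and sanity-checked. A small Python sketch of the weather model λ = (A, B, π), as read from the tables above (the row/column layout of B follows Table 1, with rows as states):

```python
# The weather HMM lambda = (A, B, pi), as plain Python lists.
# A[i][j] = P(q_t = j | q_{t-1} = i); B[j][k] = P(light level k | state j).
pi = [0.3, 0.2, 0.5]
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]
B = [[0.2, 0.3, 0.5],
     [0.2, 0.4, 0.4],
     [0.6, 0.3, 0.1]]

def is_stochastic(rows, tol=1e-9):
    """Each row must be a valid probability distribution (non-negative, sums to 1)."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) < tol for row in rows)

# pi, every row of A, and every row of B are probability distributions.
assert is_stochastic([pi]) and is_stochastic(A) and is_stochastic(B)
```

A check like `is_stochastic` is worth keeping around: a transition or observation row that does not sum to one is the most common way to mis-specify an HMM by hand.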
3"3"3"3"
#$e #$ree prol#$e #$ree prol#$e #$ree prol#$e #$ree prolems solved using HMMsems solved using HMMsems solved using HMMsems solved using HMMs#$ere are t$ree main prolems or tasks t$at HMMs are used for" #$ese are #$e evaluation prolem +iven a set of oservations RBC E and a model R (Z Q) $ow do we efficientl% compute (A) #$is can e seen as calculating$ow likel% a set of oservations is given a model wit$ application for t$ings likeanormalit% detection" #$e inference prolem +iven a set of oservations RBC E and a model R (Z Q)$ow do we c$oose a state se!uence RNBNCN E Nt$at est eplains t$eoservations #$is can e seen as a calculating t$e most likel% se!uence of states useful
for t$ings like signal segmentation and laelling"
#$e optimisation prolem +iven a set of oservations RBC E $ow do weadust t$e model parameters R (Z Q)to maimise (A) #$is can e seen as an
unsupervised machine learning problem, where you are completely unaware of the model parameters. This is the most difficult of the problems that HMMs are used to solve.
3"1"3"1"3"1"3"1"
5rolem 1 #$e valuation 5rolem5rolem 1 #$e valuation 5rolem5rolem 1 #$e valuation 5rolem5rolem 1 #$e valuation 5roleme want to compute (A) or t$e proailit% of an oservation se!uence given t$e model we$ave",f t$e state se!uence RNBNCN E N is known eactl% (i"e" NB R 1 NC R3E) t$is prolem issimple(A ) R >(KANK )KB
R Qz|z|(B) [(z|z)z(K)
KC #$e e!uations aove make an assumption t$at eac$ oservation in eac$ state is independentand identicall% distriuted (i"i"d) from t$e last" #$is is most serious limitation of HMMs and isaddressed later on in t$e document3"$at if we dont know t$e eact state se!uence of t$e oservations *an we still calculate t$eproailit% of a set of oservations over all t$e possile states#o calculate t$e aove e!uation ut wit$ an unknown se!uence of states ut not knowing eac$state transition would involve an enormous numer of calculations" ,ntuitivel% t$is wouldinvolve summing t$e proailit% of eac$ comination of state se!uences" However t$is processcan e simplified using a d%namic programmingb approac$ called t$e forwardFackwardalgorit$m21"3"1"1"3"1"1"3"1"1"3"1" #$e forward#$e forward#$e forward#$e forward algorit$malgorit$malgorit$malgorit$mDefine t$e forward variale as
α_t(i) = P(o_1, o_2, ..., o_t, q_t = i | λ)

which is the probability of the partial observation sequence up to time t, and of being in state i at time t, given the model λ. Then

α_1(i) = π_i b_i(o_1)

Using the dynamic programming approach, each subsequent value can be computed from the previous values using the induction step

α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}),  1 ≤ t ≤ T - 1,  1 ≤ j ≤ N

(3) Also refer to Section 4.2 in Hughes' DPhil thesis.
(4) Dynamic programming is a term for breaking a problem down into smaller, sequential steps that are not independent.
Finally, the value of interest:

P(O|λ) = Σ_{i=1}^{N} α_T(i)

The induction step, where each value of α_{t+1}(j) is calculated, can be visualised as shown in Figure 4.

Figure 4: Illustration of the computation of the forward variable α_{t+1}(j)
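The initialisation, induction, and termination steps above fit in a few lines. The tutorial's worked examples are in Matlab; this Python version (using the a_ij = P(q_t = j | q_{t-1} = i) convention from Section 1.3) is an illustrative translation, not the author's code:

```python
def forward(pi, A, b):
    """Forward algorithm for P(O | lambda).

    pi : initial state distribution (length N)
    A  : transition matrix, A[i][j] = P(q_t = j | q_{t-1} = i)
    b  : b[t][j] = b_j(o_t), the observation likelihood of each state at each time
    Returns (alpha, prob), where alpha[t][i] is the forward variable and
    prob = sum_i alpha[T-1][i] = P(O | lambda).
    """
    N, T = len(pi), len(b)
    alpha = [[pi[i] * b[0][i] for i in range(N)]]               # initialisation
    for t in range(1, T):                                        # induction
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * b[t][j]
                      for j in range(N)])
    prob = sum(alpha[-1])                                        # termination
    return alpha, prob

# Degenerate one-state sanity check: P(O) is just the product of the likelihoods.
_, p = forward([1.0], [[1.0]], [[0.5], [0.5]])
```

For the one-state model, the recursion collapses to multiplying the observation likelihoods together, which is a quick way to convince yourself the bookkeeping is right.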
3.1.2. The backward algorithm

This algorithm is not necessary to solve Problem 1 of HMMs, but is used in Problems 2 and 3.

In a similar manner to the forward variable, we define the backward variable as

β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ)

which is the probability of seeing the partial observation sequence from time t + 1 to the end, given that the state at time t is state i, and given the model λ. This can again be solved using dynamic programming, but working from the end of the sequence back to our current point (hence the name "backward variable"):

β_T(i) = 1,  1 ≤ i ≤ N

β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j),  t = T - 1, T - 2, ..., 1,  1 ≤ i ≤ N

The backward variable β_T(i) is arbitrarily defined to be 1 for all i. This calculation can be visualised in a similar fashion to the forward variable, as shown in Figure 5. The algorithm reflects that, in order to be in state i at time t and to account for the observations until the end of the sequence, you have to consider all states j at time t + 1, the transition to each of these states (a_ij), and
the observation generated by each of those states (b_j(o_{t+1})), and then account for the rest of the observation sequence (β_{t+1}(j)).

Figure 5: Illustration of the computation of the backward variable β_t(i)

3.2. Example 3: Forward algorithm calculation

Going back to the weather example: what is the probability of seeing a light level sequence that looks like this (presuming a discrete observation probability distribution)?

Figure 6: An example sequence of observed light levels
From the initialisation step, α_1(i) = π_i b_i(o_1) = [0.15  0.08  0.05]. The first induction step, for t = 2, is

α_2(j) = [Σ_{i=1}^{N} α_1(i) a_ij] b_j(o_2)

Laying the calculation out as the dot product of α_1 with each row of the transition matrix:

    [0.15 0.08 0.05] . [0.4 0.3 0.3]       0.099
    [0.15 0.08 0.05] . [0.2 0.6 0.2]   =   0.088
    [0.15 0.08 0.05] . [0.1 0.1 0.8]       0.063

and then weighting by b_j(o_2) = [0.5  0.4  0.1]:

    α_2(j) = [0.099  0.088  0.063] .* [0.5  0.4  0.1] = [0.0495  0.0352  0.0063]

This can get tiring to do by hand. See the Matlab code forwardalgorithmweatherexample.m for the complete calculation. α_t(j) ends up looking like this:

              t = 1    t = 2     t = 3     t = 4
    State 1:  0.15     0.0495    0.0161    0.0054
    State 2:  0.08     0.0352    0.0129    0.0045
    State 3:  0.05     0.0063    0.0014    0.0004

Therefore

P(O|λ) = Σ_{i=1}^{N} α_4(i) = 0.0103
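The table above can be checked with a short Python script (the Matlab script referenced is not reproduced here). Note that reproducing the printed numbers requires dotting α with the rows of the printed A matrix at each induction step, which is exactly how the worked example lays out its calculation:

```python
# Reproduce the alpha table of Example 3 for four identical observations,
# each giving the state likelihoods b = [0.5, 0.4, 0.1].
pi = [0.3, 0.2, 0.5]
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]
b = [0.5, 0.4, 0.1]

alpha = [pi[i] * b[i] for i in range(3)]       # t = 1: [0.15, 0.08, 0.05]
for _ in range(3):                              # induction for t = 2, 3, 4
    # dot alpha with each row of A (matching the worked example's layout),
    # then weight by the observation likelihoods
    alpha = [sum(A[i][j] * alpha[j] for j in range(3)) * b[i] for i in range(3)]

prob = sum(alpha)   # ~0.0103, matching the table above
```

Running this gives the same columns as the table, and the final sum agrees with P(O|λ) = 0.0103 to four decimal places.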
3"3"3"3"3"3"3"3"
5rolem 2 #$e ,nference 5rolem5rolem 2 #$e ,nference 5rolem5rolem 2 #$e ,nference 5rolem5rolem 2 #$e ,nference 5rolem#$is prolem addresses t$e need to find an optimalstate se!uence given a set of oservationsand t$e model" #$is could e t$oug$t of as segmenting or laelling a time series" However $owdo we define optimal in t$is sensee could find t$e most likel% state for eac$ oservation using t$e forward and ackwardvariales defined efore
K(U) R (NK R UA ) R K(U)K(U)` K(U)K(U)ITB
#$en NK R[ m a BTIxK(U)y 1 V @ V #$is met$od maimises t$e numer of correct individual states ut does not take into accountan% information aout t$e se!uence of states7" ,n order to find t$e most optimal se!uence ofstates we need t$e iteri /lgorit$m"
7=or eample t$ink aout state transitions t$at mig$t e impossile suc$ as going from sunn% weat$erto snowing wit$out first going t$roug$ cloud% weat$er or going from t$e 4Fpeak of an *+ to t$e 5Fwavewit$out first transitioning t$roug$ t$e S and # comple"
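The per-time-step posterior γ_t(i) above can be sketched by combining the forward and backward passes. The model numbers in the demo call are illustrative, not from the tutorial:

```python
def posterior_states(pi, A, b):
    """gamma_t(i) = P(q_t = i | O, lambda) via the forward and backward variables.

    A[i][j] = P(q_t = j | q_{t-1} = i); b[t][j] = b_j(o_t).
    """
    N, T = len(pi), len(b)
    # forward pass
    alpha = [[pi[i] * b[0][i] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * b[t][j]
                      for j in range(N)])
    # backward pass (beta_T(i) arbitrarily set to 1)
    beta = [[1.0] * N]
    for t in range(T - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * b[t + 1][j] * nxt[j] for j in range(N))
                        for i in range(N)])
    # normalise alpha * beta at each time step
    gamma = []
    for t in range(T):
        raw = [alpha[t][i] * beta[t][i] for i in range(N)]
        norm = sum(raw)
        gamma.append([r / norm for r in raw])
    return gamma

# Two sticky states; the first two observations favour state 0, the last favours state 1.
g = posterior_states([0.5, 0.5],
                     [[0.9, 0.1], [0.1, 0.9]],
                     [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]])
```

Each γ_t is a distribution over states, so its entries sum to one; picking the argmax at each t independently is exactly the method criticised above.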
3"3"1"3"3"1"3"3"1"3"3"1" #$e iteri lgorit$m#$e iteri lgorit$m#$e iteri lgorit$m#$e iteri lgorit$m,n order to find t$e most likel% se!uence of states associated wit$ a set of oservations givent$e model we need to define a variale to keep track of t$e proailit% along a single pat$
K(U) R maz|zzEz| xNBNC E NK R U BC E KAyw$ic$ is t$e $ig$est proailit% along a single pat$ (or state se!uence NBNC E) at time @ w$ic$accounts for t$e oservations BC E Kup to time @ and ends in state U" #$e iterative process ofcalculating t$is value can e rewritten asKB(]) R maT K(U)[T\ " \(KB)#$is can e t$oug$t of as finding t$e most likel% transition from t$e previous states at time @ tostate ] at time @ 1 and t$en finding t$e proailit% of eing in t$at state given t$e oservation
KB" #$is $as a ver% similar form to t$e forward algorit$m ut finds t$e maimum value rat$ert$an t$e sum of values"#o track t$is optimal se!uence we need to keep track of w$ic$ state we transitioned from toeac$ suse!uent state" #$is means finding t$e argument U w$ic$ maimised t$e e!uationaove" #$is can e t$oug$t of as t$e most likel% previous state and is stored in t$e arra% K(U)"So following a similar procedure to t$e forward algorit$m t$e forward pass of t$e iterialgorit$m isB(U) R QT " T(B) 1 V U V JB(U) R 0#$en K(]) R m a BTIKOB(U)" [T\ " \(K) 2 V @ V K(]) R[ m a BTIxKOB(U)" [T\y 1 V ] V J#$e most likel% final state is given %
R m a BTIx(U)y
N
R[ m a BTIx
(U)y#$en in order to find t$e most likel% se!uenceof states we perform t$e ackward passNK R KB(NKB ) @ R G 1 G 2 E 1
3"b"3"b"3"b"3"b" ample b #$e iteri lgorit$mample b #$e iteri lgorit$mample b #$e iteri lgorit$mample b #$e iteri lgorit$m-sing t$e same model and oservation se!uence as in ample 3 w$at is t$e most likel%se!uence of states
QT R 0"3 0"2 0"7
Z R [T\j R[/q
0"b 0"3 0"3 0"2 0": 0"2 0"1 0"1 0"9 ig$t evel
States
0"2 0"3 0"7 R \R
0"2 0"b 0"b 0": 0"3 0"1-sing t$e aove w$at state se!uence est descries t$e following oservations=or t$e first time step @ R 1
B(]) R QT " T(B) R QB B(B)QC C(B)Q (B) R 0"30"70"20"b0"70"1 R
0"170"090"07B(]) R 000
=or t$e second time step we need to find t$e most likel% transition to eac$ state from B(])
δ_2(j) = max_{1 ≤ i ≤ N} [δ_1(i) . a_ij] . b_j(o_2)

Laying this out in the same style as the forward calculation:

    max([0.15 0.08 0.05] .* [0.4 0.3 0.3])       max([0.06   0.024  0.015])       0.06
    max([0.15 0.08 0.05] .* [0.2 0.6 0.2])   =   max([0.03   0.048  0.01 ])   =   0.048
    max([0.15 0.08 0.05] .* [0.1 0.1 0.8])       max([0.015  0.008  0.04 ])       0.04

so

    δ_2(j) = [0.06  0.048  0.04] .* [0.5  0.4  0.1] = [0.03  0.0192  0.004]

Then

    ψ_2(j) = [1  2  3]

Again, see the Matlab example named viterbialgorithmweatherexample.m. The final matrices end up looking like this:

    δ_t(j):
              t = 1    t = 2     t = 3     t = 4
    State 1:  0.15     0.03      0.006     0.0012
    State 2:  0.08     0.0192    0.0046    0.0011
    State 3:  0.05     0.004     0.0003    0.0001

    ψ_t(j):
              t = 1    t = 2    t = 3    t = 4
    State 1:  0        1        1        1
    State 2:  0        2        2        2
    State 3:  0        3        3        1

    q* = [1  1  1  1]

Let's show a more involved example. Say a sequence of a year's weather is randomly generated using the model above. If you look at the result of viterbialgorithmweatherexamplelong.m, you will see that δ quickly reduces to very small values, due to the large number of probability multiplications. If this number becomes too small for the memory of the computer, this is called the underflow problem(6). This can be solved using log-likelihoods, as seen in viterbialgorithmweatherexamplelonglog.m(7).

(6) The opposite is intuitively called the overflow problem, when a number becomes too big.
(7) The A and B matrices were changed in this example to give a more realistic idea of weather transitions.
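The log-likelihood fix can be sketched in Python (the Matlab scripts named above are not reproduced here; this is an illustrative reimplementation, using the Section 2 model and a year of identical observations purely to show that δ stays finite in log space):

```python
import math

NEG_INF = float("-inf")

def viterbi_log(pi, A, b):
    """Viterbi in log space: products of many probabilities become sums of
    log-probabilities, so delta no longer underflows for long sequences."""
    N, T = len(pi), len(b)
    log = lambda x: math.log(x) if x > 0 else NEG_INF  # log(0) -> -inf safely
    delta = [[log(pi[i]) + log(b[0][i]) for i in range(N)]]
    psi = [[0] * N]
    for t in range(1, T):
        d_t, psi_t = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[-1][i] + log(A[i][j]))
            d_t.append(delta[-1][best_i] + log(A[best_i][j]) + log(b[t][j]))
            psi_t.append(best_i)
        delta.append(d_t)
        psi.append(psi_t)
    q = [max(range(N), key=lambda i: delta[-1][i])]
    for t in range(T - 1, 0, -1):
        q.insert(0, psi[t][q[0]])
    return q

# A year of identical observations: in linear space delta would shrink by
# roughly an order of magnitude per step; in log space it only grows linearly.
pi = [0.3, 0.2, 0.5]
A = [[0.4, 0.3, 0.3], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]
b = [[0.5, 0.4, 0.1]] * 365
path = viterbi_log(pi, A, b)
```

Because log is monotonic, every argmax (and hence the decoded path) is identical to the linear-space version; only the numerical range changes.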
3"7"3"7"3"7"3"7" 5rolem 35rolem 35rolem 35rolem 3 #$e#$e#$e#$e