slides-lecture-6chrome.ws.dei.polimi.it/images/9/95/mis-handout-lecture-7.pdf · title: microsoft...
TRANSCRIPT
Methods for Intelligent Systems
Lecture Notes on Machine Learning

Matteo Matteucci
Department of Electronics and Information
Politecnico di Milano

Probabilistic Modeling of Time
-- Markov Chains --
Time and Uncertainty

Suppose we need a model to make decisions. We have to face uncertainty about the world due to:
• Partial information
• Noisy data
• Changes over time!

Up to now we have used static models; from now on we will use more appropriate dynamic models:
• The present situation (or state) is just one snapshot (described using random variables) in a time sequence
• Random variable values change over time
• The current state depends on past history
Probabilistic Reasoning for Time Series

To describe an ever-changing world we can use a series of random variables describing the world state at any time instant!
• This is a Bayesian Network that forms a chain!
• It represents a sequence of states over time: $X_1, X_2, X_3, \ldots$
• The transition from $X_{t-1}$ to $X_t$ depends only on $X_{t-1}$ (Markov Property):
$$P(X_t \mid X_{t-1}, X_{t-2}, \ldots, X_1, X_0) = P(X_t \mid X_{t-1})$$
• When the transition probabilities are the same at any time $t$, the chain defines a stationary process.

[Figure: chain-structured Bayesian Network over $X_1, X_2, X_3, X_4, \ldots$]
Stochastic Processes and Markov Chains

Let's start from the very beginning! Given $X_t$, the value of a system characteristic at time $t$ described as a (state) random variable, we have:
• A Discrete Stochastic Process describes the relationship between the stochastic descriptions of a system $(X_0, X_1, X_2, \ldots)$ at some discrete time steps.
• A Continuous Stochastic Process is a stochastic process where the state can be observed at any time.

A Discrete Stochastic Process is a (first-order) Markov Chain when, for $t = 1, 2, 3, \ldots$ and for all states, it holds:
$$P(X_{t+1}=i_{t+1} \mid X_t=i_t, X_{t-1}=i_{t-1}, \ldots, X_1=i_1, X_0=i_0) = P(X_{t+1}=i_{t+1} \mid X_t=i_t)$$

Whenever the probability of a transition is independent of time, the Markov Chain is Stationary:
$$P(X_{t+1}=j \mid X_t=i) = p_{ij}$$
Markov Chain Description

A Markov Chain can be described using a Transition Matrix, where $p_{ij}$ is the probability of getting into state $j$ starting from state $i$:
$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{pmatrix}, \qquad \sum_{j=1}^{n} p_{ij} = 1$$

This transition matrix can also be described using a directed graph, as with classical Bayesian Networks:

[Figure: directed graph over states $i$, $j$, $k$ with edges labeled $p_{ii}$, $p_{ij}$, $p_{ji}$, $p_{ik}$, $p_{jk}$, $p_{kk}$]
Computing Probabilities

Given a Markov Chain in state $i$ at time $m$, we can compute the state probabilities after $n$ time steps:
$$P(X_{m+n}=j \mid X_m=i) = P(X_n=j \mid X_0=i) = P_{ij}(n)$$

If we take $n=2$ we have
$$P_{ij}(2) = \sum_k p_{ik} \cdot p_{kj}$$
which is the scalar product of row $i$ and column $j$ of $P$.

In general, $P_{ij}(n)$ is the $ij$-th element of $P^n$.

The probability of being in a given state $j$ at time $n$, without knowing the exact state of the Markov Chain at time 0, is thus:
$$\sum_i q_i \cdot P_{ij}(n) = q \cdot (\text{column } j \text{ of } P^n)$$
where $q_i$ is the probability of state $i$ at time 0.
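As a brief sketch of why the two-step formula holds, one can condition on the intermediate state $X_1$ and then apply the Markov property. The derivation assumes a stationary chain, so that the same $p_{ij}$ applies at every step:

```latex
\begin{align*}
P_{ij}(2) &= P(X_2 = j \mid X_0 = i) \\
          &= \sum_k P(X_2 = j,\, X_1 = k \mid X_0 = i) \\
          &= \sum_k P(X_2 = j \mid X_1 = k,\, X_0 = i)\; P(X_1 = k \mid X_0 = i) \\
          &= \sum_k P(X_2 = j \mid X_1 = k)\; P(X_1 = k \mid X_0 = i) \\
          &= \sum_k p_{ik}\, p_{kj}
\end{align*}
```

Repeating the same argument one step at a time gives $P_{ij}(n)$ as the $ij$-th element of $P^n$.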
The Cola Example (I)

Suppose our company produces two brands of Cola (i.e., Cola1 and Cola2) and there are no other Colas on the market. A person buying Cola1 will buy Cola1 again with probability 0.9. A person buying Cola2 will buy Cola2 again with probability 0.8.
• Someone has bought Cola2; what's the probability he/she will buy Cola1 two purchases later?
• Someone has bought Cola1; what's the probability he/she will buy Cola1 again three purchases later?
• Suppose at some time 60% of clients bought Cola1 and 40% Cola2. After three purchases, what's the percentage of people buying Cola1?

[Figure: two-state transition graph with Cola1 to Cola1 0.90, Cola1 to Cola2 0.10, Cola2 to Cola1 0.20, Cola2 to Cola2 0.80]

$$P = \begin{pmatrix} 0.90 & 0.10 \\ 0.20 & 0.80 \end{pmatrix}$$
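The Cola Example (II)

Someone has bought Cola2; what's the probability he/she will buy Cola1 two purchases later?
$$P(X_2=1 \mid X_0=2) = P_{21}(2) = 0.34$$
$$P^2 = \begin{pmatrix} 0.90 & 0.10 \\ 0.20 & 0.80 \end{pmatrix} \begin{pmatrix} 0.90 & 0.10 \\ 0.20 & 0.80 \end{pmatrix} = \begin{pmatrix} 0.83 & 0.17 \\ 0.34 & 0.66 \end{pmatrix}$$

Someone has bought Cola1; what's the probability he/she will buy Cola1 again three purchases later?
$$P(X_3=1 \mid X_0=1) = P_{11}(3) = 0.781$$
$$P^3 = \begin{pmatrix} 0.90 & 0.10 \\ 0.20 & 0.80 \end{pmatrix} \begin{pmatrix} 0.83 & 0.17 \\ 0.34 & 0.66 \end{pmatrix} = \begin{pmatrix} 0.781 & 0.219 \\ 0.438 & 0.562 \end{pmatrix}$$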
The Cola Example (III)

Suppose at some time 60% of clients bought Cola1 and 40% Cola2. After three purchases, what's the percentage of people buying Cola1?
$$p = \sum_i q_i \cdot P_{i1}(3) = q \cdot (\text{column 1 of } P^3) = \begin{pmatrix} 0.60 & 0.40 \end{pmatrix} \begin{pmatrix} 0.781 \\ 0.438 \end{pmatrix} = 0.6438$$

Note: What we have discussed so far is the first-order Markov Chain. More generally, in a $k$th-order Markov Chain, each state transition depends on the previous $k$ states. What's the size of the transition probability matrix?

[Figure: chain over $X_0, X_1, X_2, X_3, \ldots$]
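The three Cola questions can be checked numerically. The following is a minimal sketch in plain Python; the helper names `mat_mul` and `mat_pow` are illustrative choices, not part of the lecture, and states are indexed 0 for Cola1 and 1 for Cola2.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_pow(p, n):
    """Return P^n for n >= 1 by repeated multiplication."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result


# Transition matrix from the slides: index 0 is Cola1, index 1 is Cola2.
P = [[0.90, 0.10],
     [0.20, 0.80]]

P2 = mat_pow(P, 2)
P3 = mat_pow(P, 3)

p21_2 = P2[1][0]             # bought Cola2, buys Cola1 two purchases later
p11_3 = P3[0][0]             # bought Cola1, buys Cola1 three purchases later

q = [0.60, 0.40]             # market share at time 0
share = mat_mul([q], P3)[0]  # market share after three purchases

print(round(p21_2, 4), round(p11_3, 4), round(share[0], 4))
```

Up to floating-point rounding, the three printed values should agree with the slide results 0.34, 0.781 and 0.6438.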
A Bunch of Definitions

Given a Markov Chain we define:
• State $j$ is reachable from $i$ if there exists a path from $i$ to $j$
• States $i$ and $j$ communicate if $i$ is reachable from $j$ and vice versa
• A set of states $S$ in a Markov Chain is closed if no state outside $S$ is reachable from a state in $S$
• A state $i$ is an absorbing state if $p_{ii}=1$
• A state $i$ is transient if there exists a state $j$ reachable from $i$, but $i$ is not reachable from $j$
• A state that is not transient is defined as recurrent
• A state $i$ is periodic with period $k>1$ if $k$ is the largest number that divides the length of all paths from $i$ to $i$
• A state that is not periodic is said to be aperiodic
Examples of Ergodic Markov Chains

If all states in a Markov Chain are recurrent, aperiodic, and communicate with each other, the chain is said to be Ergodic.

A simple example of an Ergodic Markov Chain is the following:
$$P = \begin{pmatrix} 0.3 & 0.7 & 0 \\ 0.5 & 0 & 0.5 \\ 0 & 0.25 & 0.75 \end{pmatrix}$$
[Figure: transition graph over states 1, 2, 3 for the matrix above]

Do the following transitions represent Ergodic Markov Chains?
$$P = \begin{pmatrix} 1/4 & 1/2 & 1/4 \\ 2/3 & 1/3 & 0 \\ 0 & 2/3 & 1/3 \end{pmatrix} \qquad P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 2/3 & 1/3 \\ 0 & 0 & 1/4 & 3/4 \end{pmatrix}$$
[Figure: transition graphs over states 1, 2, 3 and over states 1, 2, 3, 4 for the two matrices above]
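One way to test the three matrices mechanically is sketched below. It relies on a standard fact that is not stated on the slides: a finite chain has all states communicating and aperiodic exactly when some power of $P$ has only positive entries, and by Wielandt's bound it is enough to check powers up to $(n-1)^2+1$. The function name `is_ergodic` is an illustrative choice.

```python
from fractions import Fraction as F


def is_ergodic(p):
    """Return True if some power of P has only positive entries.

    Only the zero/non-zero pattern of P matters, so the powers are
    computed on boolean matrices.
    """
    n = len(p)
    step = [[p[i][j] > 0 for j in range(n)] for i in range(n)]
    reach = step  # pattern of P^1
    # Wielandt's bound: checking powers 1 .. (n-1)^2 + 1 is sufficient.
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in reach):
            return True
        reach = [[any(reach[i][k] and step[k][j] for k in range(n))
                  for j in range(n)]
                 for i in range(n)]
    return False


example = [[0.3, 0.7, 0], [0.5, 0, 0.5], [0, 0.25, 0.75]]
first = [[F(1, 4), F(1, 2), F(1, 4)],
         [F(2, 3), F(1, 3), 0],
         [0, F(2, 3), F(1, 3)]]
second = [[F(1, 2), F(1, 2), 0, 0],
          [F(1, 2), F(1, 2), 0, 0],
          [0, 0, F(2, 3), F(1, 3)],
          [0, 0, F(1, 4), F(3, 4)]]

results = [is_ergodic(m) for m in (example, first, second)]
```

Under this check the 4-state matrix is rejected because its two blocks of states never reach each other, while the two 3-state matrices pass.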
Steady State Distribution

Let $P$ be the transition matrix of an Ergodic Markov Chain with $n$ states; then we have
$$\lim_{n \to +\infty} P_{ij}(n) = \pi_j$$
with $\pi = [\pi_1\ \pi_2\ \pi_3\ \ldots\ \pi_n] = \pi P$ being the Steady State Distribution.

The Cola Example:

n     P11(n)  P12(n)  P21(n)  P22(n)
1     .90     .10     .20     .80
2     .83     .17     .34     .66
3     .78     .22     .44     .56
5     .72     .28     .56     .44
10    .68     .32     .65     .35
20    .67     .33     .67     .33
30    .67     .33     .67     .33
40    .67     .33     .67     .33

STEADY STATE: from about n = 20 on, the entries no longer change.

$$\pi = \begin{pmatrix} 0.67 & 0.33 \end{pmatrix} = \begin{pmatrix} 0.67 & 0.33 \end{pmatrix} \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$$
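The steady state of the Cola chain can be approximated by applying $\pi \leftarrow \pi P$ repeatedly until the vector stops changing. This is a minimal sketch; the function name `steady_state`, the uniform starting vector, and the tolerance are illustrative assumptions, and the iteration is only expected to converge for an ergodic chain.

```python
def steady_state(p, tol=1e-12, max_iter=10_000):
    """Approximate the steady state distribution by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n  # arbitrary starting distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = steady_state(P)
```

For the Cola matrix the result should be close to $(2/3,\ 1/3)$, which the slide table rounds to .67 and .33.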
Transitory Behavior

The behavior of a Markov Chain before getting to the Steady State is called transitory.

We can compute the expected number of transitions needed to reach state $j$ starting from state $i$ for an Ergodic Markov Chain:
$$m_{ij} = p_{ij} \cdot (1) + \sum_{k \neq j} p_{ik} \cdot (1 + m_{kj}) = 1 + \sum_{k \neq j} p_{ik} \cdot m_{kj}$$

The Cola Example:
• How many bottles on average will a Cola1 buyer have before switching to Cola2?
$$m_{12} = 1 + \sum_{k \neq 2} p_{1k} \cdot m_{k2} = 1 + 0.9 \cdot m_{12} \;\Rightarrow\; m_{12} = 10$$
• What about vice versa?
$$m_{21} = 1 + \sum_{k \neq 1} p_{2k} \cdot m_{k1} = 1 + 0.8 \cdot m_{21} \;\Rightarrow\; m_{21} = 5$$
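For chains with more than two states the equations $m_{ij} = 1 + \sum_{k \neq j} p_{ik} m_{kj}$ form a linear system in the unknowns $m_{ij}$, $i \neq j$. The sketch below solves it with a small Gauss-Jordan routine; the helper names `solve` and `mean_first_passage` are illustrative, states are indexed from 0, and return times $m_{jj}$ are not computed.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mean_first_passage(p, j):
    """Expected number of transitions to reach state j from each other state."""
    others = [i for i in range(len(p)) if i != j]
    # Row i encodes: m_ij - sum_{k != j} p_ik * m_kj = 1
    a = [[(1.0 if i == k else 0.0) - p[i][k] for k in others] for i in others]
    return dict(zip(others, solve(a, [1.0] * len(others))))


P = [[0.9, 0.1],
     [0.2, 0.8]]
m12 = mean_first_passage(P, 1)[0]  # Cola1 buyer until first Cola2
m21 = mean_first_passage(P, 0)[1]  # Cola2 buyer until first Cola1
```

With the Cola matrix this should reproduce $m_{12} = 10$ and $m_{21} = 5$ up to rounding.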
Dealing with Absorbing States

We have an absorbing Markov Chain if there exist one or more absorbing states and all the other states are transient.

For an absorbing Markov Chain we can write the transition matrix as:
$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix}$$
where:
• $Q$ is the transition matrix among transient states
• $R$ is the transition matrix from transient to absorbing states

What kind of inference can we make with this model?
• How long will it take to get into an absorbing state, given that we start from a transient one?
• Starting from a transient state, what is the probability of ending up in a given absorbing state?
Inference in Absorbing Markov Chains

How long will it take to get into an absorbing state, given that we start from a transient one?
• Starting from a transient state $i$, the average time spent in a transient state $j$ is the $ij$-th element of $(I-Q)^{-1}$

Starting from a transient state, what is the probability of ending up in a given absorbing state?
• Starting from a transient state $i$, the probability of getting into an absorbing state $j$ is the $ij$-th element of $(I-Q)^{-1} \cdot R$

Example: in a company there are 3 levels: junior, senior, partner. You can leave the company as a partner or not.
• How long does a junior remain in the company?
• What's the probability for a junior to leave the company as a partner?
The Company Example

States: J (junior), S (senior), P (partner), LN (leaves as non-partner), LP (leaves as partner).

P =
        J     S     P     LN    LP
J       0.80  0.15  0     0.05  0
S       0     0.70  0.20  0.10  0
P       0     0     0.95  0     0.05
LN      0     0     0     1     0
LP      0     0     0     0     1

How long does a junior remain in the company?
$$(I-Q)^{-1} = \begin{pmatrix} 5 & 2.5 & 10 \\ 0 & 3.3 & 13.3 \\ 0 & 0 & 20 \end{pmatrix}$$
• He/she will stay as Junior: $m_{11} = 5$
• He/she will stay as Senior: $m_{12} = 2.5$
• He/she will stay as Partner: $m_{13} = 10$
In total: 17.5 years!

What's the probability for a junior to leave the company as a partner?
$$(I-Q)^{-1} \cdot R = \begin{pmatrix} 0.5 & 0.5 \\ 0.3 & 0.7 \\ 0 & 1 \end{pmatrix}$$
• He/she will end up in state LP: $m_{12} = 0.5$
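The two matrices of the company example can be recomputed from $Q$ and $R$. The sketch below assumes the state order J, S, P for the transient states and LN, LP for the absorbing ones; the helper names `inverse` and `mat_mul` are illustrative.

```python
def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        d = m[col][col]
        m[col] = [x / d for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Transient block Q (J, S, P) and transient-to-absorbing block R (LN, LP).
Q = [[0.80, 0.15, 0.00],
     [0.00, 0.70, 0.20],
     [0.00, 0.00, 0.95]]
R = [[0.05, 0.00],
     [0.10, 0.00],
     [0.00, 0.05]]

I_minus_Q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(3)]
             for i in range(3)]
N = inverse(I_minus_Q)      # expected years spent in each transient state
B = mat_mul(N, R)           # probability of ending in each absorbing state
years_for_junior = sum(N[0])
```

Row 0 of `N` should match the slide values 5, 2.5 and 10 (17.5 years in total), and `B[0][1]` should match the 0.5 probability of leaving as a partner.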
Exercise: Gambler's Ruin

Suppose we are a gambler and we start from a capital of 3$; with probability $p=1/3$ we win 1$ and with probability $1-p=2/3$ we lose 1$. We fail if our capital gets to 0 and we win if our capital becomes 5.

We can describe our capital as a Markov Chain, with $X_t$ being our capital at time $t$:
• Possible states: 0, 1, 2, 3, 4, 5
• Transition probability: $p(X_{t+1}=X_t+1)=1/3$, $p(X_{t+1}=X_t-1)=2/3$

[Figure: chain $X_0=3,\ X_1=?,\ X_2=?,\ X_3=?,\ \ldots$]

What kind of reasoning can we apply to this model?
• What's the probability of the sequence 3, 4, 3, 2, 3, 2, 1, 0?
• What's the probability of success for the gambler?
• What's the average number of bets the gambler will make?
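As a starting point for the exercise, the chain can be written down explicitly and used to evaluate the probability of a given sequence of capitals. The sketch assumes that 0 and 5 are absorbing states (the game stops there); the function names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

WIN, LOSE = Fraction(1, 3), Fraction(2, 3)
GOAL = 5


def transition_matrix(goal=GOAL, win=WIN, lose=LOSE):
    """Transition matrix over capitals 0..goal, with 0 and goal absorbing."""
    size = goal + 1
    p = [[Fraction(0)] * size for _ in range(size)]
    p[0][0] = p[goal][goal] = Fraction(1)  # assumption: the game stops here
    for capital in range(1, goal):
        p[capital][capital + 1] = win
        p[capital][capital - 1] = lose
    return p


def path_probability(path, p):
    """Probability of a sequence of states, given that it starts in path[0]."""
    prob = Fraction(1)
    for cur, nxt in zip(path, path[1:]):
        prob *= p[cur][nxt]
    return prob


P = transition_matrix()
ruin_path = [3, 4, 3, 2, 3, 2, 1, 0]
prob = path_probability(ruin_path, P)
```

The path contains two wins and five losses, so the result is $(1/3)^2 (2/3)^5 = 32/2187$. The other two questions can be approached with the $(I-Q)^{-1}$ machinery for absorbing chains.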
Why Should I Care About All This Crazy Math?

“Nice, but unless I want to gamble why should I care? I'm a computer engineer; what does this have to do with practical intelligent systems?”

What do you think is the greatest revolution (or revolutionary company) on the web in the last decade?
Google's PageRank

Assume a link from page A to page B is a recommendation of page B by the author of A (we say B is a successor of A).
• The quality of a page is related to its in-degree.
• The quality of a page is related to the quality of the pages linking to it.

This recursively defines the PageRank of a page [Brin & Page '98].

For a (better) detailed description feel free to read:
• http://www-db.stanford.edu/~backrub/google.html
• http://www.iprcom.com/papers/pagerank/
Definition of PageRank

Suppose the web is an Ergodic Markov Chain (I know this is a big assumption). Consider browsing as an infinite random walk (surfing):
• Initially the surfer is at a random page
• At each step, the surfer proceeds
  o to a randomly chosen web page with probability $d$
  o to a randomly chosen successor of the current page with probability $1-d$

The PageRank of a page is the fraction of steps the surfer spends on it in the limit.

[Figure: example web graph with pages A, B, C, D, E, F, G]
PageRank = the steady state probability for this Markov Chain:
$$\mathrm{PageRank}(u) = \frac{d}{n} + (1-d) \sum_{(v,u) \in E} \frac{\mathrm{PageRank}(v)}{\mathrm{outdegree}(v)}$$
• $n$ is the total number of nodes in the graph
• $d$ is the probability of a random jump

[Figure: pages A and B linking to page C]
$$\mathrm{PageRank}(C) = \frac{d}{n} + (1-d)\left(\tfrac{1}{4}\,\mathrm{PageRank}(A) + \tfrac{1}{3}\,\mathrm{PageRank}(B)\right)$$

PageRank summarizes the “web opinion” about the page importance:
• Query-independent
• It can be faked … read the provided links if you are curious!
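The PageRank equation can be iterated directly until the values settle. The sketch below uses a hypothetical four-page web (not the graph drawn on the slide), an assumed jump probability `d = 0.15`, and assumes every page has at least one successor; the function name `pagerank` is illustrative.

```python
def pagerank(links, d=0.15, iterations=100):
    """Iterate PageRank(u) = d/n + (1-d) * sum over (v,u) in E of PageRank(v)/outdegree(v).

    `links` maps each page to the list of its successors; every page is
    assumed to have at least one successor (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: d / n for page in pages}      # random jump term
        for v, successors in links.items():
            share = (1 - d) * rank[v] / len(successors)  # follow-a-link term
            for u in successors:
                new_rank[u] += share
        rank = new_rank
    return rank


# Hypothetical four-page web, not the graph drawn on the slide.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
```

Note that $d$ here is the probability of a random jump, as on the slide. In this toy graph page D has no incoming links, so its rank stays at $d/n$, while page C collects rank from three pages.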
Probabilistic Modeling of Time
-- Hidden Markov Models --
Hidden Markov Models

In some Markov processes, we may not be able to observe the states directly. In this case we get another famous Bayesian Network, named the Hidden Markov Model (HMM).

[Figure: chain of hidden states $X_1, \ldots, X_{t-1}, X_t, X_{t+1}, \ldots, X_T$, each emitting an observation $e_1, \ldots, e_{t-1}, e_t, e_{t+1}, \ldots, e_T$]

An HMM is described by a quintuple $\langle S, E, P, A, B \rangle$:
• $S: \{s_1, \ldots, s_N\}$ are the values for the hidden states
• $E: \{e_1, \ldots, e_T\}$ are the values for the observations
• $P$: probability distribution of the initial state
• $A$: transition probability matrix
• $B$: emission probability matrix

For a deeper description feel free to read: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
An Example: The Audio Spectrum

[Figure: audio spectrum of the song of the Prothonotary Warbler]
[Figure: audio spectrum of the song of the Chestnut-sided Warbler]

What can we ask this HMM?
• What bird is this? (Time Series Classification)
• How will the song continue? (Time Series Prediction)
• Is this bird sick? (Outlier Detection)
• What phases does this song have? (Time Series Segmentation)

[Figure: diagrams with rows labeled "Observations" and "State"]
Another Time Series Problem

[Figure: time series labeled Intel, Cisco, GE, MS]

What can we ask this HMM?
• Will the stock go up or down? (Time Series Prediction)
• What type of stock is this (e.g., risky)? (Time Series Classification)
• Is the behavior abnormal (e.g., BF)? (Outlier Detection)
Music Analysis

What can we ask this HMM?
• Can we compose more of that? (Time Series Prediction)
• Is this Beethoven or Bach? (Time Series Classification)
• Can we segment it into themes? (Time Series Segmentation)
Weather: A Markov Chain Model

States: $\{S_{sunny}, S_{rainy}, S_{snowy}\}$

State transition probabilities:
$$P = \begin{pmatrix} 0.80 & 0.15 & 0.05 \\ 0.38 & 0.60 & 0.02 \\ 0.75 & 0.05 & 0.20 \end{pmatrix}$$

[Figure: transition graph over Sunny, Rainy, Snowy with edges 80%, 15%, 5% from Sunny; 38%, 60%, 2% from Rainy; 75%, 5%, 20% from Snowy]

Initial state distribution: $q = (0.7\ \ 0.25\ \ 0.05)$

Given: [Figure: weather sequence sunny, rainy, rainy, rainy, snowy, snowy]

What is the probability of this series?
$$P(s) = P(S_{sunny})\, P(S_{rainy} \mid S_{sunny})\, P(S_{rainy} \mid S_{rainy})\, P(S_{rainy} \mid S_{rainy})\, P(S_{snowy} \mid S_{rainy})\, P(S_{snowy} \mid S_{snowy})$$
$$= 0.7 \cdot 0.15 \cdot 0.6 \cdot 0.6 \cdot 0.02 \cdot 0.2 = 0.0001512$$
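The series probability is a product of one initial probability and one transition probability per step, which is easy to script. A minimal sketch, with the function name `sequence_probability` chosen for illustration:

```python
STATES = ["sunny", "rainy", "snowy"]

# Transition matrix and initial distribution from the slide.
P = [[0.80, 0.15, 0.05],
     [0.38, 0.60, 0.02],
     [0.75, 0.05, 0.20]]
q = [0.70, 0.25, 0.05]


def sequence_probability(seq, initial, transition):
    """P(s) = P(first state) * product of P(next state | current state)."""
    idx = [STATES.index(s) for s in seq]
    prob = initial[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        prob *= transition[prev][cur]
    return prob


series = ["sunny", "rainy", "rainy", "rainy", "snowy", "snowy"]
p_series = sequence_probability(series, q, P)
```

For the series on the slide this evaluates $0.7 \cdot 0.15 \cdot 0.6 \cdot 0.6 \cdot 0.02 \cdot 0.2$.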
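Weather: A Hidden Markov Model

[Figure: the Sunny/Rainy/Snowy transition graph (80%, 15%, 5% from Sunny; 38%, 60%, 2% from Rainy; 75%, 5%, 20% from Snowy) extended with observation probabilities 60%, 30%, 10% for Sunny; 5%, 30%, 65% for Rainy; 0%, 50%, 50% for Snowy]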
Ingredients of HMM and Fundamental Questions

States: $\{S_{sunny}, S_{rainy}, S_{snowy}\}$
Observations: $\{O_{shorts}, O_{coat}, O_{umbrella}\}$

State transition probabilities:
$$A = \begin{pmatrix} 0.80 & 0.15 & 0.05 \\ 0.38 & 0.60 & 0.02 \\ 0.75 & 0.05 & 0.20 \end{pmatrix}$$

Observation probabilities:
$$B = \begin{pmatrix} 0.60 & 0.30 & 0.10 \\ 0.05 & 0.30 & 0.65 \\ 0.00 & 0.50 & 0.50 \end{pmatrix}$$

Initial state distribution: $P = (0.7\ \ 0.25\ \ 0.05)$

Given: [Figure: sequence of observations] …
• What is the probability of this series?
• What is the underlying sequence of states?
• How can I learn my HMM parameters?
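Computing Forward Probability

We define the Forward Probability as the joint probability of the current state and the observations so far:
$$\alpha_i(t) = P(X_t=s_i, e_{1:t})$$

Why compute the forward probability?
• Probability of observations: $P(e_{1:t}) = \sum_i \alpha_i(t)$
• Prediction: $P(X_{t+1}=s_i \mid e_{1:t}) = ?$

$$\begin{aligned}
P(X_t=s_i, e_{1:t}) &= P(X_t=s_i, e_{1:t-1}, e_t) \\
&= \sum_j P(X_{t-1}=s_j, X_t=s_i, e_{1:t-1}, e_t) \\
&= \sum_j P(e_t \mid X_t=s_i, X_{t-1}=s_j, e_{1:t-1})\, P(X_t=s_i, X_{t-1}=s_j, e_{1:t-1}) \\
&= \sum_j P(e_t \mid X_t=s_i)\, P(X_t=s_i \mid X_{t-1}=s_j, e_{1:t-1})\, P(X_{t-1}=s_j, e_{1:t-1}) \\
&= \sum_j P(e_t \mid X_t=s_i)\, P(X_t=s_i \mid X_{t-1}=s_j)\, P(X_{t-1}=s_j, e_{1:t-1})
\end{aligned}$$

The last factor has the same form as the left-hand side, so we can use recursion:
$$\alpha_i(t) = P(X_t=s_i, e_{1:t}) = \sum_j P(X_t=s_i \mid X_{t-1}=s_j)\, P(e_t \mid X_t=s_i)\, \alpha_j(t-1) = \sum_j A_{ji}\, B_{i e_t}\, \alpha_j(t-1)$$
where $A_{ji} = P(X_t=s_i \mid X_{t-1}=s_j)$ and $B_{i e_t} = P(e_t \mid X_t=s_i)$.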
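The forward recursion can be sketched in a few lines using the weather HMM. Two assumptions are made here that the slides do not fix: the initial distribution is applied to the first observed time step, so that $\alpha_i(1) = P_i B_{i e_1}$, and the observation sequence (shorts, umbrella, coat) is a made-up example. Observations are indexed 0 = shorts, 1 = coat, 2 = umbrella.

```python
# Weather HMM from the slides; states are 0 = sunny, 1 = rainy, 2 = snowy.
A = [[0.80, 0.15, 0.05],
     [0.38, 0.60, 0.02],
     [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10],
     [0.05, 0.30, 0.65],
     [0.00, 0.50, 0.50]]
PI = [0.70, 0.25, 0.05]


def forward(obs, pi=PI, a=A, b=B):
    """Return alpha, where alpha[t][i] = P(X_t = s_i, e_1..e_t) (t is 0-based)."""
    n = len(pi)
    # Assumption: the initial distribution applies to the first observed step.
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for e in obs[1:]:
        prev = alpha[-1]
        # a[j][i] is the probability of moving from state j to state i.
        alpha.append([b[i][e] * sum(prev[j] * a[j][i] for j in range(n))
                      for i in range(n)])
    return alpha


obs = [0, 2, 1]  # hypothetical observations: shorts, umbrella, coat
alpha = forward(obs)
likelihood = sum(alpha[-1])  # P(e_1:t)
```

Summing the last row of `alpha` gives the probability of the whole observation sequence at a cost that grows linearly with its length, instead of enumerating all $N^t$ state sequences.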
The Viterbi Algorithm

From observations, compute the most likely hidden state sequence:
$$\arg\max P(x_{1:t} \mid e_{1:t}) = \arg\max P(x_{1:t}, e_{1:t}) / P(e_{1:t}) = \arg\max P(x_{1:t}, e_{1:t})$$

By applying the Bayesian Network property:
$$P(x_{1:t}, e_{1:t}) = P(X_0) \prod_{i=1}^{t} P(X_i \mid X_{i-1})\, P(e_i \mid X_i)$$

The solution we are looking for is the one that minimizes
$$-\log P(x_{1:t}, e_{1:t}) = -\log P(X_0) + \sum_{i=1}^{t} \left( -\log P(X_i \mid X_{i-1}) - \log P(e_i \mid X_i) \right)$$

Given an HMM, construct a graph that consists of $1 + t \cdot N$ nodes:
• One initial node, and $N$ nodes at each time $i$, where the $j$-th node represents $X_i=s_j$.
• The link between the nodes $X_{i-1}=s_j$ and $X_i=s_k$ is associated with the length $-\log\left( P(X_i=s_k \mid X_{i-1}=s_j)\, P(e_i \mid X_i=s_k) \right)$

The problem becomes that of finding the shortest path from $X_0=s_0$ to one of the nodes $X_t=s_t$.
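The shortest-path view translates into a dynamic program over negative log probabilities. The sketch below uses the weather HMM and makes two assumptions beyond the slides: the initial distribution is applied to the first observed time step, and the observation sequence (shorts, umbrella, coat) is invented for illustration. Zero probabilities are mapped to an infinite edge length.

```python
import math

# Weather HMM from the slides; states are 0 = sunny, 1 = rainy, 2 = snowy.
A = [[0.80, 0.15, 0.05],
     [0.38, 0.60, 0.02],
     [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10],
     [0.05, 0.30, 0.65],
     [0.00, 0.50, 0.50]]
PI = [0.70, 0.25, 0.05]


def neg_log(p):
    """Edge length -log(p); impossible events get infinite length."""
    return math.inf if p == 0 else -math.log(p)


def viterbi(obs, pi=PI, a=A, b=B):
    """Return (most likely state path, its joint probability with obs)."""
    n = len(pi)
    cost = [neg_log(pi[k]) + neg_log(b[k][obs[0]]) for k in range(n)]
    back = []  # back[t][k] = best predecessor of state k at step t + 1
    for e in obs[1:]:
        new_cost, pointers = [], []
        for k in range(n):
            j = min(range(n), key=lambda s: cost[s] + neg_log(a[s][k]))
            new_cost.append(cost[j] + neg_log(a[j][k]) + neg_log(b[k][e]))
            pointers.append(j)
        cost = new_cost
        back.append(pointers)
    state = min(range(n), key=lambda s: cost[s])
    path = [state]
    for pointers in reversed(back):  # follow the back-pointers to the start
        state = pointers[state]
        path.append(state)
    return path[::-1], math.exp(-min(cost))


obs = [0, 2, 1]  # hypothetical observations: shorts, umbrella, coat
best_path, best_prob = viterbi(obs)
```

Working with sums of logarithms rather than products of probabilities also limits numerical underflow on long sequences.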
Baum-Welch Algorithm

The previous two kinds of computation need the parameters $\theta = (P, A, B)$. Where do the probabilities come from?

Solution: Baum-Welch Algorithm (special case of EM)
• Unsupervised learning from observations
• Find $\arg\max_\theta P_\theta(e_{1:t})$

Given an observation sequence, find out which transition probability and emission probability tables assign the highest probability to the observations:
1. Start with an initial set of parameters $\theta_0$ (possibly arbitrary)
2. Compute pseudo counts: how many times did the transition from $X_{i-1}=s_j$ to $X_i=s_k$ occur?
3. Use the pseudo counts to obtain a better set of parameters $\theta_1$
4. Iterate until $P_{\theta_1}(e_{1:t})$ is not bigger than $P_{\theta_0}(e_{1:t})$
Pseudo Counts and Backward Probability

Given the observation sequence $e_{1:T}$:
• The pseudo count of state $s_i$ at time $t$ is the probability $P(X_t=s_i \mid e_{1:T})$:
$$\begin{aligned}
P(X_t=s_i \mid e_{1:T}) &= P(X_t=s_i, e_{1:t}, e_{t+1:T}) / P(e_{1:T}) \\
&= P(e_{t+1:T} \mid X_t=s_i, e_{1:t})\, P(X_t=s_i, e_{1:t}) / P(e_{1:T}) \\
&= P(e_{t+1:T} \mid X_t=s_i)\, P(X_t=s_i \mid e_{1:t})\, P(e_{1:t}) / P(e_{1:T}) \\
&= \beta_i(t)\, P(X_t=s_i \mid e_{1:t}) / P(e_{t+1:T} \mid e_{1:t}) \\
&= \alpha_i(t)\, \beta_i(t) / P(e_{1:T})
\end{aligned}$$
• The pseudo count of the link from $X_t=s_i$ to $X_{t+1}=s_j$ is the probability $P(X_t=s_i, X_{t+1}=s_j \mid e_{1:T})$:
$$\begin{aligned}
P(X_t=s_i, X_{t+1}=s_j \mid e_{1:T}) &= P(X_t=s_i, X_{t+1}=s_j, e_{1:t}, e_{t+1}, e_{t+2:T}) / P(e_{1:T}) \\
&= P(X_t=s_i, e_{1:t})\, P(X_{t+1}=s_j \mid X_t=s_i)\, P(e_{t+1} \mid X_{t+1}=s_j)\, P(e_{t+2:T} \mid X_{t+1}=s_j) / P(e_{1:T}) \\
&= P(X_t=s_i, e_{1:t})\, A_{ij}\, B_{j e_{t+1}}\, P(e_{t+2:T} \mid X_{t+1}=s_j) / P(e_{1:T}) \\
&= \alpha_i(t)\, A_{ij}\, B_{j e_{t+1}}\, \beta_j(t+1) / P(e_{1:T})
\end{aligned}$$

With $\beta_j(t) = P(e_{t+1}, \ldots, e_T \mid X_t=s_j)$, we can compute it backward:
• $\beta_j(T) = 1$
• $\beta_i(t) = \sum_j A_{ij}\, B_{j e_{t+1}}\, \beta_j(t+1)$
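The backward recursion and the state pseudo counts can be sketched together on the weather HMM. As assumptions not fixed by the slides, the initial distribution is applied to the first observed time step and the observation sequence (shorts, umbrella, coat) is invented; the function names are illustrative.

```python
# Weather HMM from the slides; states are 0 = sunny, 1 = rainy, 2 = snowy.
A = [[0.80, 0.15, 0.05],
     [0.38, 0.60, 0.02],
     [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10],
     [0.05, 0.30, 0.65],
     [0.00, 0.50, 0.50]]
PI = [0.70, 0.25, 0.05]


def forward(obs):
    """alpha[t][i] = P(X_t = s_i, e_1..e_t), with 0-based t."""
    n = len(PI)
    alpha = [[PI[i] * B[i][obs[0]] for i in range(n)]]
    for e in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[i][e] * sum(prev[j] * A[j][i] for j in range(n))
                      for i in range(n)])
    return alpha


def backward(obs):
    """beta[t][i] = P(e_{t+1}..e_T | X_t = s_i), with beta at the last step = 1."""
    n = len(PI)
    beta = [[1.0] * n]
    for e_next in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][e_next] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def state_pseudo_counts(obs):
    """gamma[t][i] = P(X_t = s_i | e_1:T) = alpha_i(t) * beta_i(t) / P(e_1:T)."""
    alpha, beta = forward(obs), backward(obs)
    likelihood = sum(alpha[-1])
    return [[a * b / likelihood for a, b in zip(alpha_t, beta_t)]
            for alpha_t, beta_t in zip(alpha, beta)]


obs = [0, 2, 1]  # hypothetical observations: shorts, umbrella, coat
gamma = state_pseudo_counts(obs)
```

A useful sanity check is that $\sum_i \alpha_i(t)\,\beta_i(t)$ gives the same value $P(e_{1:T})$ at every time step, so each row of `gamma` sums to one.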
HMM Parameters Update

We can efficiently compute forward and backward probabilities for all the states in the Hidden Markov Model.

To update our estimate of the HMM parameters:
• count(i): the total pseudo count of state $s_i$.
• count(i,j): the total pseudo count of transitions from $s_i$ to $s_j$.
• Add $P(X_t=s_i, X_{t+1}=s_j \mid e_{1:T})$ to count(i,j)
• Add $P(X_t=s_i \mid e_{1:T})$ to count(i)
• Add $P(X_t=s_i \mid e_{1:T})$ to count(i,$e_t$)
• Updated $A_{ij}$ = count(i,j)/count(i)
• Updated $B_{j e_t}$ = count(j,$e_t$)/count(j)

[Figure: trellis over times t-1, t, t+1, t+2 showing $\alpha_i(t)$ at node $X_t=s_i$, $\beta_j(t+1)$ at node $X_{t+1}=s_j$, and the connecting link weighted $a_{ij}\, b_{j e_{t+1}}$]
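One re-estimation step built from the pseudo counts might look like the sketch below. It is a simplified illustration on a hypothetical two-state, three-symbol HMM with made-up parameters and observations. It applies the initial distribution to the first observed step, and it normalizes each row of $A$ by the total outgoing transition count $\sum_j \mathrm{count}(i,j)$ rather than by count(i), so that the rows sum to one. No scaling is used, so it is only suitable for short sequences.

```python
def forward(obs, pi, a, b):
    """alpha[t][i] = P(X_t = s_i, e_1..e_t), with 0-based t."""
    n = len(pi)
    alpha = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for e in obs[1:]:
        prev = alpha[-1]
        alpha.append([b[i][e] * sum(prev[j] * a[j][i] for j in range(n))
                      for i in range(n)])
    return alpha


def backward(obs, a, b):
    """beta[t][i] = P(e_{t+1}..e_T | X_t = s_i)."""
    n = len(a)
    beta = [[1.0] * n]
    for e_next in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b[j][e_next] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(obs, pi, a, b):
    """One re-estimation step; returns (new_pi, new_a, new_b, likelihood)."""
    n, m, T = len(pi), len(b[0]), len(obs)
    alpha, beta = forward(obs, pi, a, b), backward(obs, a, b)
    likelihood = sum(alpha[-1])
    # State pseudo counts: gamma[t][i] = P(X_t = s_i | e_1:T)
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # count(i, j): expected number of transitions s_i -> s_j
    count_ij = [[sum(alpha[t][i] * a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j]
                     for t in range(T - 1)) / likelihood
                 for j in range(n)] for i in range(n)]
    # count(i, e): expected number of times s_i emits symbol e
    count_ie = [[sum(gamma[t][i] for t in range(T) if obs[t] == e)
                 for e in range(m)] for i in range(n)]
    new_a = [[count_ij[i][j] / sum(count_ij[i]) for j in range(n)]
             for i in range(n)]
    new_b = [[count_ie[i][e] / sum(count_ie[i]) for e in range(m)]
             for i in range(n)]
    return gamma[0], new_a, new_b, likelihood


# Hypothetical starting parameters and observations, for illustration only.
pi0 = [0.6, 0.4]
a0 = [[0.7, 0.3], [0.4, 0.6]]
b0 = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2, 1, 0, 0, 2]

pi1, a1, b1, like0 = baum_welch_step(obs, pi0, a0, b0)
_, _, _, like1 = baum_welch_step(obs, pi1, a1, b1)
```

Since Baum-Welch is a special case of EM, the likelihood of the observations under the updated parameters (`like1`) should not be lower than under the starting ones (`like0`).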
Summary on HMM

HMMs are generative probabilistic models for time series with hidden information (state).

There are a few issues remaining:
• Zero probability problem
  o Training sequence: AAABBBAAA
  o Test sequence: AAABBBCAAA
• Finding the “right” number of states and the right structure
• Numerical instabilities

Besides these problems they are extremely practical, and among the best-known methods in speech recognition, computer vision, robotics, …

You'd be surprised by the relationships between HMMs and Kalman Filtering or Kalman Smoothing!