slides-lecture-6chrome.ws.dei.polimi.it/images/9/95/mis-handout-lecture-7.pdf · title: microsoft...

Methods for Intelligent Systems
Lecture Notes on Machine Learning

Matteo Matteucci
[email protected]
Department of Electronics and Information
Politecnico di Milano

Probabilistic Modeling of Time
-- Markov Chains --

Time and Uncertainty

Suppose we need a model to make decisions; we have to face uncertainty about the world due to:
• Partial information
• Noisy data
• Changes over time!

Up to now we have used static models; from now on we will use more appropriate dynamic models:
• The present situation (or state) is just one snapshot (described using random variables) in a time sequence
• Random variable values change over time
• The current state depends on past history

Probabilistic Reasoning for Time Series

To describe an ever-changing world we can use a series of random variables describing the world state at any time instant!
• A Bayesian Network that forms a chain!
• It represents a sequence of states over time: $X_1, X_2, X_3, \dots$
• The transition from $X_{t-1}$ to $X_t$ depends only on $X_{t-1}$ (Markov Property):
  $$P(X_t \mid X_{t-1}, X_{t-2}, \dots, X_1, X_0) = P(X_t \mid X_{t-1})$$
• When transition probabilities are the same at any time $t$, we are facing a stationary process.

[Figure: chain $X_1 \to X_2 \to X_3 \to X_4 \to \dots$]

Stochastic Processes and Markov Chains

Let's start from the very beginning! Given $X_t$, the value of a system characteristic at time $t$ described as a (state) random variable, we have:
• A Discrete Stochastic Process describes the relationship between the stochastic descriptions of a system ($X_0, X_1, X_2, \dots$) at some discrete time steps.
• A Continuous Stochastic Process is a stochastic process where the state can be observed at any time.

A Discrete Stochastic Process is a (first order) Markov Chain when, for $t = 1, 2, 3, \dots$ and for all states, it holds:
$$P(X_{t+1} = i_{t+1} \mid X_t = i_t, X_{t-1} = i_{t-1}, \dots, X_1 = i_1, X_0 = i_0) = P(X_{t+1} = i_{t+1} \mid X_t = i_t)$$

Whenever the probability of a transition is independent of time, the Markov Chain is Stationary:
$$P(X_{t+1} = j \mid X_t = i) = p_{ij}$$

Markov Chain Description

A Markov Chain can be described using a Transition Matrix where $p_{ij}$ describes the probability of getting into state $j$ starting from state $i$:
$$P = \begin{pmatrix} p_{11} & p_{12} & p_{13} & \dots & p_{1n} \\ p_{21} & p_{22} & p_{23} & \dots & p_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ p_{n1} & p_{n2} & p_{n3} & \dots & p_{nn} \end{pmatrix}, \qquad \sum_{j=1}^{n} p_{ij} = 1$$

This transition matrix can also be described using a directed graph, as with classical Bayesian Networks:

[Figure: directed graph over states $i$, $j$, $k$ with edges labeled $p_{ii}$, $p_{ij}$, $p_{ji}$, $p_{ik}$, $p_{jk}$, $p_{kk}$]

Computing Probabilities

Given a Markov Chain in state $i$ at time $m$, we can compute the state probabilities after $n$ time steps:
$$P(X_{m+n} = j \mid X_m = i) = P(X_n = j \mid X_0 = i) = P_{ij}(n)$$

If we take $n = 2$ we have
$$P_{ij}(2) = \sum_k p_{ik} \cdot p_{kj}$$
that is, the scalar product of row $i$ and column $j$ of $P$.

In general $P_{ij}(n)$ is the $ij$-th element of $P^n$.

The probability of being in a given state $j$ at time $n$, without knowing the exact state of the Markov Chain at time 0, is thus:
$$\sum_i q_i \cdot P_{ij}(n) = q \cdot (\text{column } j \text{ of } P^n)$$
where $q_i$ is the probability of state $i$ at time 0.

The Cola Example (I)

Suppose our company produces two brands of Cola (i.e., Cola1 and Cola2) and there are no other Colas on the market. A person buying Cola1 will buy Cola1 again with probability 0.9. A person buying Cola2 will buy Cola2 again with probability 0.8.
• Someone has bought Cola2; what's the probability he/she will buy Cola1 two purchases later?
• Someone has bought Cola1; what's the probability he/she will buy Cola1 again three purchases later?
• Suppose at some time 60% of clients bought Cola1 and 40% Cola2. After three purchases, what's the percentage of people buying Cola1?

With rows and columns ordered as (Cola1, Cola2):
$$P = \begin{pmatrix} 0.90 & 0.10 \\ 0.20 & 0.80 \end{pmatrix}$$

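The Cola Example (II)

Someone has bought Cola2; what's the probability he/she will buy Cola1 two purchases later?
$$P(X_2 = 1 \mid X_0 = 2) = P_{21}(2)$$
$$P^2 = \begin{pmatrix} 0.90 & 0.10 \\ 0.20 & 0.80 \end{pmatrix} \begin{pmatrix} 0.90 & 0.10 \\ 0.20 & 0.80 \end{pmatrix} = \begin{pmatrix} 0.83 & 0.17 \\ 0.34 & 0.66 \end{pmatrix}$$
so $P_{21}(2) = 0.34$.

Someone has bought Cola1; what's the probability he/she will buy Cola1 again three purchases later?
$$P(X_3 = 1 \mid X_0 = 1) = P_{11}(3)$$
$$P^3 = \begin{pmatrix} 0.90 & 0.10 \\ 0.20 & 0.80 \end{pmatrix} \begin{pmatrix} 0.83 & 0.17 \\ 0.34 & 0.66 \end{pmatrix} = \begin{pmatrix} 0.781 & 0.219 \\ 0.438 & 0.562 \end{pmatrix}$$
so $P_{11}(3) = 0.781$.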

The Cola Example (III)

Suppose at some time 60% of clients bought Cola1 and 40% Cola2. After three purchases, what's the percentage of people buying Cola1?
$$p = \sum_i q_i \cdot P_{i1}(3) = q \cdot (\text{column 1 of } P^3) = \begin{pmatrix} 0.60 & 0.40 \end{pmatrix} \begin{pmatrix} 0.781 \\ 0.438 \end{pmatrix} = 0.6438$$

Note: What we have discussed so far is the first-order Markov Chain. More generally, in a $k$-th order Markov Chain, each state transition depends on the previous $k$ states.

What's the size of the transition probability matrix?

[Figure: chain $X_0 \to X_1 \to X_2 \to X_3 \to \dots$]
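The three Cola questions can be checked numerically. The following is a minimal sketch in plain Python; the helper names `mat_mul` and `mat_pow` are illustrative and not part of the lecture, and state index 0 is assumed to stand for Cola1 and index 1 for Cola2.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_pow(p, n):
    """Return P^n for n >= 1 by repeated multiplication."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result


# Transition matrix of the Cola example (index 0 = Cola1, index 1 = Cola2).
P = [[0.9, 0.1],
     [0.2, 0.8]]

P2 = mat_pow(P, 2)
P3 = mat_pow(P, 3)

p21_2 = P2[1][0]   # bought Cola2, buys Cola1 two purchases later
p11_3 = P3[0][0]   # bought Cola1, buys Cola1 three purchases later

q = [0.6, 0.4]     # market shares at time 0
share_cola1 = sum(q[i] * P3[i][0] for i in range(2))  # q . (column 1 of P^3)

print(round(p21_2, 4), round(p11_3, 4), round(share_cola1, 4))
```

Repeated multiplication is enough for small powers; for large $n$ one would normally switch to exponentiation by squaring.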

A Bunch of Definitions

Given a Markov Chain we define:
• State $j$ is reachable from $i$ if there exists a path from $i$ to $j$
• States $i$ and $j$ communicate if $i$ is reachable from $j$ and vice versa
• A set of states $S$ in a Markov Chain is closed if no state outside $S$ is reachable from a state in $S$
• A state $i$ is an absorbing state if $p_{ii} = 1$
• A state $i$ is transient if there exists a state $j$ reachable from $i$, but $i$ is not reachable from $j$
• A state that is not transient is defined as recurrent
• A state $i$ is periodic with period $k > 1$ if $k$ is the largest number that divides the length of every path from $i$ back to $i$
• A state that is not periodic is said to be aperiodic

If all states in a Markov Chain are recurrent, aperiodic, and communicate with each other, the chain is said to be Ergodic.

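Examples of Ergodic Markov Chains

A simple example of an Ergodic Markov Chain is the following:
$$P = \begin{pmatrix} 0.3 & 0.7 & 0 \\ 0.5 & 0 & 0.5 \\ 0 & 0.25 & 0.75 \end{pmatrix}$$

[Figure: transition graph over states 1, 2, 3 with edge probabilities 0.3, 0.7, 0.5, 0.5, 0.25, 0.75]

Do the following transitions represent Ergodic Markov Chains?
$$P = \begin{pmatrix} 1/4 & 1/2 & 1/4 \\ 2/3 & 1/3 & 0 \\ 0 & 2/3 & 1/3 \end{pmatrix}$$

[Figure: transition graph over states 1, 2, 3 with edge probabilities 0.25, 0.5, 0.25, 0.66, 0.33, 0.66, 0.33]

$$P = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 2/3 & 1/3 \\ 0 & 0 & 1/4 & 3/4 \end{pmatrix}$$

[Figure: transition graph over states 1, 2, 3, 4 with edge probabilities 0.5, 0.5, 0.5, 0.5, 0.66, 0.33, 0.25, 0.75]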
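One way to test the examples mechanically is sketched below. It relies on a standard fact that is not stated in the slides: a finite chain is irreducible and aperiodic exactly when some power of $P$ has all entries strictly positive, and powers up to $(n-1)^2 + 1$ are enough to decide. The function name `is_ergodic` is hypothetical, and the matrices are the ones read off the slide.

```python
def is_ergodic(p):
    """Return True if some power of P has only strictly positive entries.

    Only the pattern of non-zero transitions matters, so the check works
    on booleans; powers up to (n - 1)^2 + 1 are sufficient.
    """
    n = len(p)
    step = [[p[i][j] > 0 for j in range(n)] for i in range(n)]
    reach = step  # reach[i][j]: j reachable from i in exactly k steps
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in reach):
            return True
        reach = [[any(reach[i][k] and step[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return False


simple_example = [[0.3, 0.7, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.0, 0.25, 0.75]]
first_question = [[1/4, 1/2, 1/4],
                  [2/3, 1/3, 0.0],
                  [0.0, 2/3, 1/3]]
second_question = [[1/2, 1/2, 0.0, 0.0],
                   [1/2, 1/2, 0.0, 0.0],
                   [0.0, 0.0, 2/3, 1/3],
                   [0.0, 0.0, 1/4, 3/4]]

for name, matrix in [("simple example", simple_example),
                     ("first question", first_question),
                     ("second question", second_question)]:
    print(name, is_ergodic(matrix))
```

The last matrix is block diagonal, so states 1 and 2 never communicate with states 3 and 4, which is why the check fails for it.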

Steady State Distribution

Being $P$ the transition matrix of an Ergodic Markov Chain with $n$ states, we have that
$$\lim_{n \to +\infty} P_{ij}(n) = \pi_j$$
with $\pi = [\pi_1\ \pi_2\ \pi_3\ \dots\ \pi_n] = \pi P$ being the Steady State Distribution.

The Cola Example:

n     P11(n)   P12(n)   P21(n)   P22(n)
1     .90      .10      .20      .80
2     .83      .17      .34      .66
3     .78      .22      .44      .56
5     .72      .28      .56      .44
10    .68      .32      .65      .35
20    .67      .33      .67      .33
30    .67      .33      .67      .33
40    .67      .33      .67      .33

Once the rows stop changing, the chain has reached its STEADY STATE:
$$\begin{pmatrix} 0.67 & 0.33 \end{pmatrix} = \begin{pmatrix} 0.67 & 0.33 \end{pmatrix} \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$$
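The fixed point $\pi = \pi P$ can be approximated by simply iterating the update, which mirrors how the rows of the table settle down. This is a sketch only; `steady_state`, `tol` and `max_iter` are illustrative names, and starting from the uniform distribution is an arbitrary choice.

```python
def steady_state(p, tol=1e-12, max_iter=10_000):
    """Iterate pi <- pi P from the uniform distribution until it stops changing."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


# Cola example (index 0 = Cola1, index 1 = Cola2).
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = steady_state(P)
print([round(x, 2) for x in pi])  # [0.67, 0.33]
```

The iteration is only guaranteed to converge to a unique limit when the chain is ergodic, which is the assumption made on the slide.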

Transitory Behavior

The behavior of a Markov Chain before getting to the Steady State is defined as transitory.

We can compute the expected number of transitions needed to reach state $j$ starting from state $i$ for an Ergodic Markov Chain:
$$m_{ij} = p_{ij} \cdot 1 + \sum_{k \neq j} p_{ik} \cdot (1 + m_{kj}) = 1 + \sum_{k \neq j} p_{ik} \cdot m_{kj}$$

The Cola Example:
• How many bottles will a Cola1 buyer have, on average, before switching to Cola2?
  $$m_{12} = 1 + \sum_{k \neq 2} p_{1k} \cdot m_{k2} = 1 + 0.9 \cdot m_{12} \quad\Rightarrow\quad m_{12} = 10$$
• What about vice versa?
  $$m_{21} = 1 + \sum_{k \neq 1} p_{2k} \cdot m_{k1} = 1 + 0.8 \cdot m_{21} \quad\Rightarrow\quad m_{21} = 5$$
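The recursion $m_{ij} = 1 + \sum_{k \neq j} p_{ik} m_{kj}$ can be solved for a fixed target $j$ by repeated substitution. The sketch below does that; it is one possible approach rather than the method used in the lecture, and `mean_first_passage` and `sweeps` are hypothetical names.

```python
def mean_first_passage(p, j, sweeps=2000):
    """Expected number of transitions to first reach state j from each state.

    Repeatedly applies m_i <- 1 + sum_{k != j} p_ik * m_k, starting from zeros.
    The entry for i == j is the mean return time to j.
    """
    n = len(p)
    m = [0.0] * n
    for _ in range(sweeps):
        m = [1.0 + sum(p[i][k] * m[k] for k in range(n) if k != j)
             for i in range(n)]
    return m


# Cola example (index 0 = Cola1, index 1 = Cola2).
P = [[0.9, 0.1],
     [0.2, 0.8]]
to_cola2 = mean_first_passage(P, j=1)
to_cola1 = mean_first_passage(P, j=0)
m12 = to_cola2[0]  # Cola1 buyer switching to Cola2
m21 = to_cola1[1]  # Cola2 buyer switching to Cola1
print(round(m12, 6), round(m21, 6))
```

For larger chains a direct linear solve would be the more usual choice; the fixed-point iteration is used here only to keep the example free of dependencies.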

Dealing with Absorbing States

We have an absorbing Markov Chain if there exist one or more absorbing states and all the others are transient.

For an absorbing Markov Chain we can write the transition matrix as:
$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix}$$
where:
• $Q$ is the transition matrix for transient states
• $R$ is the transition matrix from transient to absorbing states

What kind of inference could we make with this model?
• How long will it take to get into an absorbing state, given that we start from a transient one?
• Starting from a transient state, what is the probability of ending up in a given absorbing one?

Inference in Absorbing Markov Chains

How long will it take to get into an absorbing state, given that we start from a transient one?
• Starting in a transient state $i$, the average time spent in a transient state $j$ is the $ij$-th element of $(I - Q)^{-1}$

Starting from a transient state, what is the probability of ending up in a given absorbing one?
• Starting in a transient state $i$, the probability of getting into an absorbing state $j$ is the $ij$-th element of $(I - Q)^{-1} \cdot R$

Example: in a company there are 3 levels: junior, senior, partner. You can leave the company as a partner or not.
• How long does a junior remain in the company?
• What's the probability for a junior to leave the company as a partner?

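The Company Example

With states ordered as J (junior), S (senior), P (partner), LNP (leaves as non-partner), LP (leaves as partner):
$$P = \begin{pmatrix} 0.80 & 0.15 & 0 & 0.05 & 0 \\ 0 & 0.70 & 0.20 & 0.10 & 0 \\ 0 & 0 & 0.95 & 0 & 0.05 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

How long does a junior remain in the company?
$$(I - Q)^{-1} = \begin{pmatrix} 5 & 2.5 & 10 \\ 0 & 3.3 & 13.3 \\ 0 & 0 & 20 \end{pmatrix}$$
• He/she will stay as Junior: $m_{11} = 5$
• He/she will stay as Senior: $m_{12} = 2.5$
• He/she will stay as Partner: $m_{13} = 10$
In total: 17.5 years!

What's the probability for a junior to leave the company as a partner?
$$(I - Q)^{-1} \cdot R = \begin{pmatrix} 0.5 & 0.5 \\ 0.3 & 0.7 \\ 0 & 1 \end{pmatrix}$$
• He/she will end up in state LP with probability 0.5 (entry $(1,2)$ of $(I - Q)^{-1} \cdot R$)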
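The two matrices of the company example can be reproduced with a small amount of code. The sketch below assumes the state order J, S, P for the transient states and LNP, LP for the absorbing ones, as read off the slide; `inverse` is a hypothetical helper that performs a plain Gauss-Jordan elimination.

```python
def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Build the augmented matrix [A | I].
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [x / piv for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Transient part Q (J, S, P) and transient-to-absorbing part R (LNP, LP).
Q = [[0.80, 0.15, 0.00],
     [0.00, 0.70, 0.20],
     [0.00, 0.00, 0.95]]
R = [[0.05, 0.00],
     [0.10, 0.00],
     [0.00, 0.05]]

I_minus_Q = [[float(i == j) - Q[i][j] for j in range(3)] for i in range(3)]
N = inverse(I_minus_Q)                      # expected time spent in each level
B = [[sum(N[i][k] * R[k][j] for k in range(3)) for j in range(2)]
     for i in range(3)]                     # absorption probabilities

years_for_junior = sum(N[0])                # total time before leaving
prob_junior_leaves_as_partner = B[0][1]
print([round(x, 2) for x in N[0]], round(years_for_junior, 2),
      round(prob_junior_leaves_as_partner, 2))
```

The second row of `B` comes out as 1/3 and 2/3, which the slide shows rounded to 0.3 and 0.7.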

Exercise: Gambler's Ruin

Suppose we are a gambler and we start from a capital of 3$; with probability $p = 1/3$ we win 1$ and with probability $1 - p = 2/3$ we lose 1$. We fail if our capital gets to 0 and we win if our capital becomes 5.

We can describe our capital as a Markov Chain, with $X_t$ being our capital:
• Possible states: 0, 1, 2, 3, 4, 5
• Transition probability: $p(X_{t+1} = X_t + 1) = 1/3$, $p(X_{t+1} = X_t - 1) = 2/3$

What kind of reasoning can we apply to this model?
• What's the probability of the sequence 3, 4, 3, 2, 3, 2, 1, 0?
• What's the probability of success for the gambler?
• What's the average number of bets the gambler will make?

[Figure: chain $X_0 = 3 \to X_1 = ? \to X_2 = ? \to X_3 = ? \to \dots$]
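A possible way to work the exercise numerically is sketched below. The sequence probability is an exact product of transition probabilities; the success probability and the expected number of bets are obtained by sweeping the one-step equations until they settle, which is an illustrative choice and not necessarily the intended solution method. All function names are hypothetical.

```python
from fractions import Fraction

P_WIN, P_LOSE = Fraction(1, 3), Fraction(2, 3)


def sequence_probability(seq):
    """Probability of a given capital sequence, conditioned on its first value."""
    prob = Fraction(1)
    for a, b in zip(seq, seq[1:]):
        if b == a + 1:
            prob *= P_WIN
        elif b == a - 1:
            prob *= P_LOSE
        else:
            return Fraction(0)  # capital can only move by one unit per bet
    return prob


def solve_gambler(goal=5, sweeps=5000):
    """Return (win, bets): success probability and expected bets per capital."""
    pw, pl = float(P_WIN), float(P_LOSE)
    win = [0.0] * (goal + 1)
    win[goal] = 1.0                      # capital 5 means success
    bets = [0.0] * (goal + 1)            # no more bets at 0 or at the goal
    for _ in range(sweeps):
        for c in range(1, goal):
            win[c] = pw * win[c + 1] + pl * win[c - 1]
            bets[c] = 1.0 + pw * bets[c + 1] + pl * bets[c - 1]
    return win, bets


seq_prob = sequence_probability([3, 4, 3, 2, 3, 2, 1, 0])
win, bets = solve_gambler()
print(seq_prob, round(win[3], 4), round(bets[3], 4))
```

The same numbers could be obtained from $(I - Q)^{-1}$ and $(I - Q)^{-1} \cdot R$ by treating 0 and 5 as absorbing states.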

Why Should I Care About All This Crazy Math?

“Nice, but unless I want to gamble why should I care? I'm a computer engineer; what does this have to do with practical intelligent systems?”

What do you think is the greatest revolution (or revolutionary company) on the web in the last decade?

Google's PageRank

Assume a link from page A to page B is a recommendation of page B by the author of A (we say B is a successor of A).
• The quality of a page is related to its in-degree.
• The quality of a page is related to the quality of the pages linking to it.

This recursively defines the PageRank of a page [Brin & Page '98].

For a (better) detailed description feel free to read:
• http://www-db.stanford.edu/~backrub/google.html
• http://www.iprcom.com/papers/pagerank/

Definition of PageRank

Suppose the web is an Ergodic Markov Chain (I know this is a big assumption).

Consider browsing as an infinite random walk (surfing):
• Initially the surfer is at a random page
• At each step, the surfer proceeds
  o to a randomly chosen web page with probability $d$
  o to a randomly chosen successor of the current page with probability $1 - d$

The PageRank of a page is the fraction of steps the surfer spends on it in the limit.

[Figure: example web graph with pages A, B, C, D, E, F, G]

PageRank = the steady state probability for this Markov Chain
$$\mathrm{PageRank}(u) = \frac{d}{n} + (1 - d) \sum_{(v,u) \in E} \frac{\mathrm{PageRank}(v)}{\mathrm{outdegree}(v)}$$
• $n$ is the total number of nodes in the graph
• $d$ is the probability of a random jump

[Figure: pages A and B linking to page C; A has 4 outgoing links, B has 3]
$$\mathrm{PageRank}(C) = \frac{d}{n} + (1 - d)\left(\tfrac{1}{4}\,\mathrm{PageRank}(A) + \tfrac{1}{3}\,\mathrm{PageRank}(B)\right)$$

PageRank summarizes the “web opinion” about the importance of a page:
• Query-independent
• It can be faked … read the provided links if you are curious!
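The PageRank recursion can be iterated directly until it settles on the steady state. The sketch below uses a tiny made-up graph (`links` is purely illustrative and is not the graph from the slide) and assumes every page has at least one successor. Note that here `d` is the random-jump probability, as in the slides, so it plays the role of one minus the usual damping factor.

```python
def pagerank(successors, d=0.15, iterations=200):
    """Iterate PageRank(u) = d/n + (1-d) * sum over v->u of PageRank(v)/outdegree(v).

    `successors` maps each page to the list of pages it links to; every page
    is assumed to have at least one successor.
    """
    nodes = list(successors)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: d / n for v in nodes}          # random jump part
        for v in nodes:
            share = (1.0 - d) * rank[v] / len(successors[v])
            for u in successors[v]:
                new[u] += share                  # follow-a-link part
        rank = new
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print({page: round(value, 3) for page, value in ranks.items()})
```

Pages without outgoing links would need extra handling (for example treating them as linking to every page), which is left out of this sketch.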

Probabilistic Modeling of Time
-- Hidden Markov Models --

Hidden Markov Models

In some Markov processes, we may not be able to observe the states directly. In this case we get another famous Bayesian Network, named Hidden Markov Model (HMM).

An HMM is described by a quintuple $\langle S, E, P, A, B \rangle$:
• $S$: $\{s_1, \dots, s_N\}$ are the values for the hidden states
• $E$: $\{e_1, \dots, e_T\}$ are the values for the observations
• $P$: probability distribution of the initial state
• $A$: transition probability matrix
• $B$: emission probability matrix

For a deeper description feel free to read:
http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf

[Figure: chain of hidden states $X_1 \to \dots \to X_{t-1} \to X_t \to X_{t+1} \to \dots \to X_T$, each state $X_i$ emitting its observation $e_i$]

An Example: The Audio Spectrum

[Figures: audio spectrum of the song of the Prothonotary Warbler and audio spectrum of the song of the Chestnut-sided Warbler, each annotated with its Observations and State]

What can we ask of this HMM?
• What bird is this? (Time Series Classification)
• How will the song continue? (Time Series Prediction)
• Is this bird sick? (Outlier Detection)
• What phases does this song have? (Time Series Segmentation)

Another Time Series Problem

[Figure: stock price series for Intel, Cisco, GE, MS]

What can we ask of this HMM?
• Will the stock go up or down? (Time Series Prediction)
• What type of stock is this (e.g., risky)? (Time Series Classification)
• Is the behavior abnormal (e.g., BF)? (Outlier Detection)

Music Analysis

What can we ask of this HMM?
• Can we compose more of that? (Time Series Prediction)
• Is this Beethoven or Bach? (Time Series Classification)
• Can we segment it into themes? (Time Series Segmentation)

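Weather: A Markov Chain Model

States: $\{S_{sunny}, S_{rainy}, S_{snowy}\}$

State transition probabilities (rows and columns ordered Sunny, Rainy, Snowy):
$$P = \begin{pmatrix} 0.80 & 0.15 & 0.05 \\ 0.38 & 0.60 & 0.02 \\ 0.75 & 0.05 & 0.20 \end{pmatrix}$$

[Figure: transition graph over Sunny, Rainy, Snowy with edges 80%, 15%, 5% out of Sunny; 38%, 60%, 2% out of Rainy; 75%, 5%, 20% out of Snowy]

Initial state distribution: $q = (0.7\ \ 0.25\ \ 0.05)$

Given the series sunny, rainy, rainy, rainy, snowy, snowy, what is the probability of this series?
$$P(s) = P(S_{sunny})\,P(S_{rainy} \mid S_{sunny})\,P(S_{rainy} \mid S_{rainy})\,P(S_{rainy} \mid S_{rainy})\,P(S_{snowy} \mid S_{rainy})\,P(S_{snowy} \mid S_{snowy})$$
$$= 0.7 \cdot 0.15 \cdot 0.6 \cdot 0.6 \cdot 0.02 \cdot 0.2 = 0.0001512$$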
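The probability of a fully observed weather series is just the initial probability times a chain of transition probabilities. A minimal sketch, with the state names and the helper `sequence_probability` chosen here for illustration:

```python
# Initial distribution and transition probabilities of the weather chain.
q = {"sunny": 0.7, "rainy": 0.25, "snowy": 0.05}
P = {
    "sunny": {"sunny": 0.80, "rainy": 0.15, "snowy": 0.05},
    "rainy": {"sunny": 0.38, "rainy": 0.60, "snowy": 0.02},
    "snowy": {"sunny": 0.75, "rainy": 0.05, "snowy": 0.20},
}


def sequence_probability(series):
    """P(first state) times the product of P(next state | current state)."""
    prob = q[series[0]]
    for current, following in zip(series, series[1:]):
        prob *= P[current][following]
    return prob


series = ["sunny", "rainy", "rainy", "rainy", "snowy", "snowy"]
prob = sequence_probability(series)
print(f"{prob:.7f}")
```

For long series the product underflows quickly, so summing log-probabilities is the usual workaround.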

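Weather: A Hidden Markov Model

[Figure: the Sunny/Rainy/Snowy transition graph (80%, 15%, 5% out of Sunny; 38%, 60%, 2% out of Rainy; 75%, 5%, 20% out of Snowy), extended with the emission probabilities of each state (60%, 30%, 10% for Sunny; 5%, 30%, 65% for Rainy; 0%, 50%, 50% for Snowy)]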

Ingredients of HMM and Fundamental Questions

States: $\{S_{sunny}, S_{rainy}, S_{snowy}\}$
Observations: $\{O_{shorts}, O_{coat}, O_{umbrella}\}$

State transition probabilities:
$$A = \begin{pmatrix} 0.80 & 0.15 & 0.05 \\ 0.38 & 0.60 & 0.02 \\ 0.75 & 0.05 & 0.20 \end{pmatrix}$$

Observation probabilities (rows: Sunny, Rainy, Snowy; columns: shorts, coat, umbrella):
$$B = \begin{pmatrix} 0.60 & 0.30 & 0.10 \\ 0.05 & 0.30 & 0.65 \\ 0.00 & 0.50 & 0.50 \end{pmatrix}$$

Initial state distribution: $P = (0.7\ \ 0.25\ \ 0.05)$

Given: [Figure: a sequence of observations …]
• What is the probability of this series?
• What is the underlying sequence of states?
• How can I learn my HMM parameters?

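Computing Forward Probability

We define the Forward Probability as the joint probability of the current state and the observations so far:
$$P(X_t = s_i, e_{1:t})$$

Why compute the forward probability?
• Probability of observations: $P(e_{1:t})$.
• Prediction: $P(X_{t+1} = s_i \mid e_{1:t}) = ?$

$$\begin{aligned}
P(X_t = s_i, e_{1:t}) &= P(X_t = s_i, e_{1:t-1}, e_t) \\
&= \sum_j P(X_{t-1} = s_j, X_t = s_i, e_{1:t-1}, e_t) \\
&= \sum_j P(e_t \mid X_t = s_i, X_{t-1} = s_j, e_{1:t-1})\, P(X_t = s_i, X_{t-1} = s_j, e_{1:t-1}) \\
&= \sum_j P(e_t \mid X_t = s_i)\, P(X_t = s_i \mid X_{t-1} = s_j, e_{1:t-1})\, P(X_{t-1} = s_j, e_{1:t-1}) \\
&= \sum_j P(e_t \mid X_t = s_i)\, P(X_t = s_i \mid X_{t-1} = s_j)\, P(X_{t-1} = s_j, e_{1:t-1})
\end{aligned}$$

The last factor has the same form as the left-hand side, so we can use recursion. With $A_{ji} = P(X_t = s_i \mid X_{t-1} = s_j)$ and $B_{i e_t} = P(e_t \mid X_t = s_i)$:
$$\alpha_i(t) = P(X_t = s_i, e_{1:t}) = \sum_j P(X_t = s_i \mid X_{t-1} = s_j)\, P(e_t \mid X_t = s_i)\, \alpha_j(t-1) = \sum_j A_{ji}\, B_{i e_t}\, \alpha_j(t-1)$$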
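The forward recursion translates almost line by line into code. The sketch below uses the weather HMM from the earlier slides and makes two assumptions that the slides do not spell out: `A[j][i]` is the probability of moving from state `j` to state `i`, and the recursion starts with $\alpha_i(1) = P_i \cdot B_{i e_1}$, i.e. the initial distribution applies to the first observed step.

```python
STATES = ["sunny", "rainy", "snowy"]
OBS = {"shorts": 0, "coat": 1, "umbrella": 2}

PI = [0.7, 0.25, 0.05]                 # initial state distribution
A = [[0.80, 0.15, 0.05],               # A[j][i] = P(next = i | current = j)
     [0.38, 0.60, 0.02],
     [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10],               # B[i][e] = P(observation e | state i)
     [0.05, 0.30, 0.65],
     [0.00, 0.50, 0.50]]


def forward(observations):
    """Return alpha_i(t) = P(X_t = s_i, e_1:t) for the last time step t."""
    n = len(STATES)
    e = OBS[observations[0]]
    alpha = [PI[i] * B[i][e] for i in range(n)]
    for name in observations[1:]:
        e = OBS[name]
        alpha = [B[i][e] * sum(A[j][i] * alpha[j] for j in range(n))
                 for i in range(n)]
    return alpha


alpha = forward(["shorts", "coat", "umbrella"])
likelihood = sum(alpha)                        # P(e_1:t)
filtered = [a / likelihood for a in alpha]     # P(X_t = s_i | e_1:t)
print(likelihood, filtered)
```

Summing the final `alpha` gives the probability of the observations, and normalizing it gives the filtered state distribution from which a one-step prediction can be made.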

The Viterbi Algorithm

From observations, compute the most likely hidden state sequence:
$$\arg\max_{x_{1:t}} P(x_{1:t} \mid e_{1:t}) = \arg\max_{x_{1:t}} \frac{P(x_{1:t}, e_{1:t})}{P(e_{1:t})} = \arg\max_{x_{1:t}} P(x_{1:t}, e_{1:t})$$

By applying the Bayesian Network property:
$$P(x_{1:t}, e_{1:t}) = P(X_0) \prod_{i=1}^{t} P(X_i \mid X_{i-1})\, P(e_i \mid X_i)$$

The solution we are looking for is the one that minimizes
$$-\log P(x_{1:t}, e_{1:t}) = -\log P(X_0) + \sum_{i=1}^{t} \left( -\log P(X_i \mid X_{i-1}) - \log P(e_i \mid X_i) \right)$$

Given an HMM, construct a graph that consists of $1 + t \cdot N$ nodes:
• One initial node, and $N$ nodes at each time $i$, where the $j$-th node represents $X_i = s_j$.
• The link between the nodes $X_{i-1} = s_j$ and $X_i = s_k$ is associated with the length $-\log\left( P(X_i = s_k \mid X_{i-1} = s_j)\, P(e_i \mid X_i = s_k) \right)$

The problem becomes that of finding the shortest path from $X_0 = s_0$ to one of the nodes $X_t = s_t$.
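The shortest-path view of Viterbi can be sketched as a layer-by-layer dynamic program over link lengths $-\log(\cdot)$. The example reuses the weather HMM; as an assumption of this sketch, the initial distribution is applied directly to the first observed step instead of a separate node $X_0$, and zero-probability links get infinite length. The names `cost` and `viterbi` are illustrative.

```python
import math

STATES = ["sunny", "rainy", "snowy"]
OBS = {"shorts": 0, "coat": 1, "umbrella": 2}
PI = [0.7, 0.25, 0.05]
A = [[0.80, 0.15, 0.05],               # A[j][k] = P(next = k | current = j)
     [0.38, 0.60, 0.02],
     [0.75, 0.05, 0.20]]
B = [[0.60, 0.30, 0.10],               # B[k][e] = P(observation e | state k)
     [0.05, 0.30, 0.65],
     [0.00, 0.50, 0.50]]


def cost(prob):
    """Link length -log(prob); impossible links are infinitely long."""
    return -math.log(prob) if prob > 0 else math.inf


def viterbi(observations):
    """Return the most likely state sequence and its joint probability."""
    n = len(STATES)
    e = OBS[observations[0]]
    dist = [cost(PI[i]) + cost(B[i][e]) for i in range(n)]
    back = []                           # back[t][k] = best predecessor of k
    for name in observations[1:]:
        e = OBS[name]
        new_dist, pointers = [], []
        for k in range(n):
            j_best = min(range(n), key=lambda j: dist[j] + cost(A[j][k]))
            new_dist.append(dist[j_best] + cost(A[j_best][k]) + cost(B[k][e]))
            pointers.append(j_best)
        dist = new_dist
        back.append(pointers)
    state = min(range(n), key=lambda i: dist[i])
    best_length = dist[state]
    path = [state]
    for pointers in reversed(back):     # follow the back-pointers
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[i] for i in path], math.exp(-best_length)


best_path, best_prob = viterbi(["shorts", "umbrella", "umbrella"])
print(best_path, best_prob)
```

Because the graph is layered in time, no general shortest-path algorithm is needed: one pass over the layers with back-pointers is enough.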

Baum-Welch Algorithm

The previous two kinds of computation need the parameters $\theta = (P, A, B)$. Where do the probabilities come from?

Solution: the Baum-Welch Algorithm (a special case of EM)
• Unsupervised learning from observations
• Find $\arg\max_\theta P_\theta(e_{1:t})$

Given an observation sequence, find out which transition probability and emission probability tables assign the highest probability to the observations:
1. Start with an initial set of parameters $\theta_0$ (possibly arbitrary)
2. Compute pseudo counts: how many times did the transition from $X_{i-1} = s_j$ to $X_i = s_k$ occur?
3. Use the pseudo counts to obtain a better set of parameters $\theta_1$
4. Iterate until $P_{\theta_1}(e_{1:t})$ is not bigger than $P_{\theta_0}(e_{1:t})$

Pseudo Counts and Backward Probability

Given the observation sequence $e_{1:T}$,
• the pseudo count of state $s_i$ at time $t$ is the probability $P(X_t = s_i \mid e_{1:T})$
$$\begin{aligned}
P(X_t = s_i \mid e_{1:T}) &= P(X_t = s_i, e_{1:t}, e_{t+1:T}) / P(e_{1:T}) \\
&= P(e_{t+1:T} \mid X_t = s_i, e_{1:t})\, P(X_t = s_i, e_{1:t}) / P(e_{1:T}) \\
&= P(e_{t+1:T} \mid X_t = s_i)\, P(X_t = s_i \mid e_{1:t})\, P(e_{1:t}) / P(e_{1:T}) \\
&= \alpha_i(t)\, \beta_i(t) / P(e_{1:T})
\end{aligned}$$
• the pseudo count of the link from $X_t = s_i$ to $X_{t+1} = s_j$ is the probability
$$\begin{aligned}
P(X_t = s_i, X_{t+1} = s_j \mid e_{1:T}) &= P(X_t = s_i, X_{t+1} = s_j, e_{1:t}, e_{t+1}, e_{t+2:T}) / P(e_{1:T}) \\
&= P(X_t = s_i, e_{1:t})\, P(X_{t+1} = s_j \mid X_t = s_i)\, P(e_{t+1} \mid X_{t+1} = s_j)\, P(e_{t+2:T} \mid X_{t+1} = s_j) / P(e_{1:T}) \\
&= P(X_t = s_i, e_{1:t})\, A_{ij}\, B_{j e_{t+1}}\, P(e_{t+2:T} \mid X_{t+1} = s_j) / P(e_{1:T}) \\
&= \alpha_i(t)\, A_{ij}\, B_{j e_{t+1}}\, \beta_j(t+1) / P(e_{1:T})
\end{aligned}$$

Being $\beta_j(t) = P(e_{t+1}, \dots, e_T \mid X_t = s_j)$, we can compute it backward:
• $\beta_j(T) = 1$;
• $\beta_i(t) = \sum_j A_{ij}\, B_{j e_{t+1}}\, \beta_j(t+1)$.

HMM Parameters Update

We can efficiently compute forward and backward probabilities for all the states in the Hidden Markov Model.

To update our estimate of the HMM parameters:
• count(i): the total pseudo count of state $s_i$.
• count(i,j): the total pseudo count of transitions from $s_i$ to $s_j$.
• Add $P(X_t = s_i, X_{t+1} = s_j \mid e_{1:T})$ to count(i,j)
• Add $P(X_t = s_i \mid e_{1:T})$ to count(i)
• Add $P(X_t = s_i \mid e_{1:T})$ to count(i, $e_t$)
• Updated $A_{ij}$ = count(i,j) / count(i)
• Updated $B_{j e_t}$ = count(j, $e_t$) / count(j)

[Figure: trellis over times $t-1$, $t$, $t+1$, $t+2$ showing $\alpha_i(t)$ at node $X_t = s_i$ and $\beta_j(t+1)$ at node $X_{t+1} = s_j$, connected by a link weighted $a_{ij}\, b_{j e_{t+1}}$]
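One re-estimation step built from forward and backward probabilities and the pseudo counts can be sketched as follows. Everything concrete here is an assumption made for illustration: the two-state, two-symbol model, its starting parameters, and the function names. The transition update divides by the state counts over $t = 1, \dots, T-1$ only, since the last time step has no outgoing transition.

```python
def forward(obs, pi, A, B):
    """alpha[t][i] = P(X_t = s_i, e_1:t), with alpha[0][i] = pi_i * B[i][e_1]."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for e in obs[1:]:
        prev = alpha[-1]
        alpha.append([B[i][e] * sum(prev[j] * A[j][i] for j in range(n))
                      for i in range(n)])
    return alpha


def backward(obs, A, B):
    """beta[t][i] = P(e_t+1:T | X_t = s_i), with beta at the last step equal to 1."""
    n = len(A)
    beta = [[1.0] * n]
    for e in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][e] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(obs, pi, A, B):
    """Return updated (pi, A, B) and the likelihood under the old parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    likelihood = sum(alpha[-1])
    # State pseudo counts P(X_t = s_i | e_1:T).
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # Link pseudo counts P(X_t = s_i, X_t+1 = s_j | e_1:T).
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == e) /
              sum(gamma[t][j] for t in range(T))
              for e in range(m)] for j in range(n)]
    return new_pi, new_A, new_B, likelihood


# Hypothetical model: 2 hidden states, symbols A = 0 and B = 1.
obs = [0, 0, 0, 1, 1, 1, 0, 0, 0]            # the sequence AAABBBAAA
pi0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.8, 0.2], [0.3, 0.7]]

pi1, A1, B1, likelihood0 = baum_welch_step(obs, pi0, A0, B0)
_, _, _, likelihood1 = baum_welch_step(obs, pi1, A1, B1)
print(likelihood0, likelihood1)
```

For long sequences the raw probabilities underflow, so practical implementations rescale `alpha` and `beta` at each step or work with logarithms.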

Summary on HMM

HMMs are generative probabilistic models for time series with hidden information (state).

There are a few issues remaining:
• Zero probability problem
  o Training sequence: AAABBBAAA
  o Test sequence: AAABBBCAAA
• Finding the “right” number of states and the right structure
• Numerical instabilities

Besides these problems they are extremely practical: they are among the best known methods in speech recognition, computer vision, robotics, …

You'd be surprised by the relationships between HMMs and Kalman Filtering or Kalman Smoothing!
