The Learning Problem - CS114 · 2017-03-14

Page 1: The Learning Problem

- Baum-Welch = Forward-Backward Algorithm (Baum 1972)
- It is a special case of the EM (Expectation-Maximization) algorithm (Dempster, Laird, Rubin)
- The algorithm lets us train the transition probabilities A = {a_ij} and the emission probabilities B = {b_i(o_t)} of the HMM

Page 2: Input to Baum-Welch

- O: an unlabeled sequence of observations
- Q: the vocabulary of hidden states
- For the ice-cream task:
  - O = {1, 3, 2, ...}
  - Q = {H, C}

Page 3: Starting Out with Observable Markov Models

- How do we train?
- Run the model on the observation sequence O.
- Since nothing is hidden, we know which states we went through, hence which transitions and observations were used.
- Given that information, training is simple:
  - B = {b_k(o_t)}: since every state can generate only one observation symbol, the observation likelihoods B are all 1.0
  - A = {a_ij}: (a counting sketch follows this slide)

    a_ij = C(i -> j) / Σ_{q ∈ Q} C(i -> q)
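To make the counting concrete, here is a minimal Python sketch of this maximum-likelihood estimate; the function name and the example state sequence are invented for illustration.

```python
from collections import defaultdict

def train_transitions(state_sequence):
    """Maximum-likelihood transition probabilities from a fully observed state sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, curr in zip(state_sequence, state_sequence[1:]):
        counts[prev][curr] += 1                  # C(i -> j)
    A = {}
    for i, row in counts.items():
        total = sum(row.values())                # sum over q of C(i -> q)
        A[i] = {j: c / total for j, c in row.items()}
    return A

# Example with an invented state sequence over the weather states H/C:
print(train_transitions(["H", "H", "C", "H", "C", "C", "H"]))
```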

Page 4: Extending the Intuition to HMMs

- For an HMM, we cannot compute these counts directly from observed sequences
- Baum-Welch intuition: iteratively estimate the counts.
- Start with an estimate for a_ij and b_k, then iteratively improve the estimates
- Get estimated probabilities by:
  - computing the forward probability for an observation
  - dividing that probability mass among all the different paths that contributed to this forward probability

Page 5: The Backward Algorithm

- We define the backward probability as follows:

  β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, Φ)

- This is the probability of generating the partial observation sequence O_{t+1..T} from time t+1 to the end, given that the HMM is in state i at time t (and, of course, given Φ).

Page 6: The Backward Algorithm

- We compute the backward probability by induction (a small sketch follows this slide):

  β_t(i) = Σ_j a_ij b_j(o_{t+1}) β_{t+1}(j)
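A minimal Python sketch of this induction, assuming a generic discrete HMM passed in as a transition table A and emission functions b; the toy parameters in the usage example are made up, not the lecture's exact ice-cream model.

```python
def backward(obs, states, A, b):
    """Backward pass: beta[t][i] = P(o_{t+1} ... o_T | q_t = i).
    A[i][j] is a transition probability and b[j](o) an emission probability."""
    T = len(obs)
    beta = [{i: 0.0 for i in states} for _ in range(T)]
    for i in states:
        beta[T - 1][i] = 1.0                        # initialization at the final time step
    for t in range(T - 2, -1, -1):                  # induction, right to left
        for i in states:
            beta[t][i] = sum(A[i][j] * b[j](obs[t + 1]) * beta[t + 1][j]
                             for j in states)
    return beta

# Toy usage with made-up parameters (states H/C, observations 1-3 ice creams):
A = {"H": {"H": 0.6, "C": 0.4}, "C": {"H": 0.4, "C": 0.6}}
b = {"H": lambda o: {1: 0.2, 2: 0.4, 3: 0.4}[o],
     "C": lambda o: {1: 0.5, 2: 0.4, 3: 0.1}[o]}
print(backward([3, 1, 3], ["H", "C"], A, b))
```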

Page 7: Inductive Step of the Backward Algorithm

- Computation of β_t(i) as a weighted sum of all successive values β_{t+1}(j)

[Figure: trellis diagram showing β_t(i) computed from the β_{t+1}(j) values, weighted by a_ij and b_j(o_{t+1})]

Page 8: Intuition for Re-estimating a_ij

- We will estimate â_ij via this intuition:

  â_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)

- Numerator intuition:
  - Assume we had some estimate of the probability that a given transition i -> j was taken at time t in the observation sequence.
  - If we knew this probability for each time t, we could sum over all t to get the expected value (count) for i -> j.

Page 9: Re-estimating a_ij

- Let ξ_t(i, j) be the probability of being in state i at time t and state j at time t+1, given the observation sequence O_{1..T} and the model λ:

  ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ)

- We can compute ξ from "not-quite-ξ", which is:

  not_quite_ξ_t(i, j) = P(q_t = i, q_{t+1} = j, O | λ)

Page 10: Computing not-quite-ξ

- The four components of P(q_t = i, q_{t+1} = j, O | λ) are α_t(i), a_ij, b_j(o_{t+1}), and β_{t+1}(j) (sketch below):

  not_quite_ξ_t(i, j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)
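Assuming forward probabilities alpha and backward probabilities beta have already been computed (for example with a forward pass plus the backward sketch above), the un-normalized quantity is just the product of those four components; a minimal sketch:

```python
def not_quite_xi(t, i, j, alpha, beta, A, b, obs):
    """P(q_t = i, q_{t+1} = j, O | lambda) = alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)."""
    return alpha[t][i] * A[i][j] * b[j](obs[t + 1]) * beta[t + 1][j]
```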

Page 11: From not-quite-ξ to ξ

- We want:

  ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ)

- We've got:

  not_quite_ξ_t(i, j) = P(q_t = i, q_{t+1} = j, O | λ)

- We compute ξ from not-quite-ξ as shown on the next slides.

Page 12: From not-quite-ξ to ξ

- We want:

  ξ_t(i, j) = P(q_t = i, q_{t+1} = j | O, λ)

- We've got:

  not_quite_ξ_t(i, j) = P(q_t = i, q_{t+1} = j, O | λ)

- Since P(X | Y, Z) = P(X, Y | Z) / P(Y | Z), we need P(O | λ):

  ξ_t(i, j) = not_quite_ξ_t(i, j) / P(O | λ)

Page 13: From not-quite-ξ to ξ

  ξ_t(i, j) = not_quite_ξ_t(i, j) / P(O | λ)

Page 14: From ξ to a_ij

- The expected number of transitions from state i to state j is the sum over all t of ξ_t(i, j)
- The total expected number of transitions out of state i is the sum over all transitions out of state i
- Final formula for the re-estimated a_ij:

  â_ij = (expected number of transitions from state i to state j) / (expected number of transitions from state i)
       = Σ_{t=1..T-1} ξ_t(i, j) / Σ_{t=1..T-1} Σ_k ξ_t(i, k)

Page 15: Re-estimating the Observation Likelihood b

- b̂_j(v_k) = (expected number of times in state j observing symbol v_k) / (expected number of times in state j)

- We'll need to know γ_t(j): the probability of being in state j at time t:

  γ_t(j) = P(q_t = j | O, λ)

Page 16: Computing γ

  γ_t(j) = P(q_t = j | O, λ) = α_t(j) β_t(j) / P(O | λ)

Page 17: Summary

- â_ij: the ratio between the expected number of transitions from state i to state j and the expected number of all transitions from state i
- b̂_j(v_k): the ratio between the expected number of times the observation emitted from state j is v_k and the expected number of times any observation is emitted from state j

Page 18: The Forward-Backward Algorithm

Page 19: Summary: Forward-Backward Algorithm

1. Initialize Φ = (A, B)
2. Compute α, β, ξ
3. Estimate the new Φ' = (A, B)
4. Replace Φ with Φ'
5. If not converged, go to step 2

(A toy sketch of one full iteration follows below.)
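Putting the pieces together, here is a minimal NumPy sketch of one Forward-Backward iteration on a toy discrete HMM. It follows the steps above (compute α, β, ξ, γ, then re-estimate A and B); the toy observation sequence and initial parameters are invented for illustration, and real ASR systems instead work in the log domain with continuous observation densities.

```python
import numpy as np

def forward(obs, pi, A, B):
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(obs, A, B):
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(obs, pi, A, B):
    """One EM iteration: E-step computes gamma and xi, M-step re-estimates A and B."""
    T, N = len(obs), len(pi)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    p_obs = alpha[-1].sum()                          # P(O | lambda)
    gamma = alpha * beta / p_obs                     # gamma[t, i] = P(q_t = i | O, lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):                           # xi[t, i, j] = P(q_t = i, q_{t+1} = j | O, lambda)
        xi[t] = (alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]) / p_obs
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                      # expected counts of emitting symbol k in state j
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return gamma[0], new_A, new_B, p_obs

# Toy run: 2 hidden states, observations (e.g. ice-cream counts 1/2/3) coded as 0/1/2.
obs = [2, 0, 2, 1, 0, 0, 2]
pi = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4], [0.4, 0.6]])
B = np.array([[0.2, 0.3, 0.5], [0.5, 0.3, 0.2]])
for _ in range(10):
    pi, A, B, p = baum_welch_step(obs, pi, A, B)
print("P(O|lambda) after training:", p)
```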

Page 20: Applying Forward-Backward to Speech: Caveats

- The network structure of the HMM is always created by hand
  - No algorithm for double-induction of an optimal structure and probabilities has been able to beat simple hand-built structures.
  - Always a Bakis network: links go forward in time
  - Subcase of a Bakis net: the beads-on-a-string net
- Baum-Welch is only guaranteed to return a local maximum, not the global optimum
- At the end, we throw away A and keep only B

Page 21: CS 224S / LINGUIST 285: Spoken Language Processing

Dan Jurafsky, Stanford University
Spring 2014
Lecture 4b: Advanced Decoding

Page 22: Outline for Today

- Advanced decoding
- How this fits into the ASR component of the course:
  - April 8: HMMs, Forward, Viterbi decoding
  - On your own: N-grams and language modeling
  - April 10: Training: Baum-Welch (Forward-Backward)
  - April 10: Advanced decoding
  - April 15: Acoustic modeling and GMMs
  - April 17: Feature extraction, MFCCs
  - May 27: Deep neural net acoustic models

Page 23: Advanced Search (= Decoding)

- How to weight the AM and LM
- Speeding things up: Viterbi beam decoding
- Multipass decoding
  - N-best lists
  - Lattices
  - Word graphs
  - Meshes / confusion networks
- Finite-state methods

Page 24: What We Are Searching For

- Given an acoustic model (AM) and a language model (LM):

  (1)  Ŵ = argmax_{W ∈ L} P(O | W) P(W)

  where P(O | W) is the AM (likelihood) and P(W) is the LM (prior).

Page 25: Combining Acoustic and Language Models

  (1)  Ŵ = argmax_{W ∈ L} P(O | W) P(W)

- We don't actually use equation (1)
- The AM underestimates the acoustic probability
  - Why? Bad independence assumptions
  - Intuition: we compute (independent) AM probability estimates; if we could look at context, we would assign a much higher probability, so we are underestimating
  - We do this every 10 ms, but the LM applies only once per word
  - Besides, the AM isn't a true probability
- The AM and LM have vastly different dynamic ranges

Page 26: Language Model Scaling Factor

- Solution: add a language model weight (also called the language weight LW or language model scaling factor LMSF):

  (2)  Ŵ = argmax_{W ∈ L} P(O | W) P(W)^LMSF

- The value is determined empirically and is positive (why?)
- Often in the range 10 ± 5.

Page 27: Language Model Scaling Factor

- As the LMSF is increased:
  - More deletion errors (since we increase the penalty for transitioning between words)
  - Fewer insertion errors
  - Need a wider search beam (since path scores are larger)
  - Less influence of the acoustic model observation probabilities

Slide from Bryan Pellom

Page 28: Word Insertion Penalty

- But the LM probability P(W) also functions as a penalty for inserting words
- Intuition: when a uniform language model (every word has an equal probability) is used, the LM probability is a 1/V penalty multiplier taken for each word
  - Each sentence of N words has penalty (1/V)^N
- If the penalty is large (smaller LM probability), the decoder will prefer fewer, longer words
- If the penalty is small (larger LM probability), the decoder will prefer more, shorter words
- When tuning the LMSF to balance the AM, a side effect is that this penalty is modified
- So we add a separate word insertion penalty to offset it, where N(W) is the number of words in W:

  (3)  Ŵ = argmax_{W ∈ L} P(O | W) P(W)^LMSF WIP^N(W)

Page 29: Word Insertion Penalty

- Controls the trade-off between insertion and deletion errors
  - As the penalty becomes larger (more negative):
    - More deletion errors
    - Fewer insertion errors
- Acts as a model of the effect of length on probability
  - But probably not a good model (the geometric assumption is probably bad for short sentences)

Page 30: Log Domain

- We do everything in the log domain
- So the final equation is (a small illustration follows):

  (4)  Ŵ = argmax_{W ∈ L} log P(O | W) + LMSF · log P(W) + N · log WIP
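A one-function illustration of equation (4); the LMSF and WIP values below are invented placeholders rather than tuned settings.

```python
import math

def combined_log_score(log_p_o_given_w, log_p_w, n_words, lmsf=12.0, wip=0.7):
    """Log-domain decoder score: log AM + LMSF * log LM + N * log WIP.
    The default lmsf and wip are made-up placeholders, not tuned values."""
    return log_p_o_given_w + lmsf * log_p_w + n_words * math.log(wip)

# e.g. a 5-word hypothesis with made-up AM and LM log probabilities:
print(combined_log_score(-1234.5, -18.2, 5))
```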

Page 31: Speeding Things Up

- Viterbi is O(N²T), where N is the total number of HMM states and T is the length of the utterance
- This is too large for real-time search
- A ton of work in ASR search goes into just making search faster:
  - Beam search (pruning)
  - Fast match
  - Tree-based lexicons

Page 32: Beam Search

- Instead of retaining all candidates (cells) at every time frame
- Use a threshold T to keep only a subset:
  - At each time t:
    - Identify the state with the lowest cost D_min
    - Each state with cost > D_min + T is discarded ("pruned") before moving on to time t+1
- The unpruned states are called the active states (a pruning sketch follows this slide)
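A minimal sketch of the pruning step, treating scores as costs (negative log probabilities, so lower is better); the state names and beam width are invented.

```python
def prune(state_costs, beam_width):
    """Keep only states whose cost is within beam_width of the best (lowest) cost."""
    d_min = min(state_costs.values())
    return {state: cost for state, cost in state_costs.items()
            if cost <= d_min + beam_width}

# Costs for four active states at some time t, with an invented beam width of 50.0:
active = prune({"s1": 120.0, "s2": 135.0, "s3": 190.0, "s4": 168.0}, beam_width=50.0)
print(active)   # s3 is pruned; s1, s2, and s4 remain active
```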

Page 33: Viterbi Beam Search

[Figure: Viterbi trellis with states A, B, C over time frames t=0 through t=4, showing initial probabilities π_A, π_B, π_C and emission probabilities b_A(1) through b_C(4)]

Slide from John-Paul Hosom

Page 34: Viterbi Beam Search

- The most common search algorithm for LVCSR
- Time-synchronous
  - Comparing paths of equal length
  - For two different word sequences W1 and W2, we are comparing P(W1 | O_0..t) and P(W2 | O_0..t)
  - Both are based on the same partial observation sequence O_0..t, so the denominator is the same and can be ignored
- Time-asynchronous search (A*) is harder

Page 35: Viterbi Beam Search

- Empirically, a beam size of 5-10% of the search space suffices
- Thus 90-95% of HMM states don't have to be considered at each time t
- Vast savings in time.

Page 36: On-line Processing

- Problem with Viterbi search:
  - It doesn't return the best sequence until the final frame
  - This delay is unreasonable for many applications
- On-line processing:
  - Usually a smaller delay in determining the answer
  - At the cost of always-increased processing time

Page 37: On-line Processing

- At every time interval I (e.g. 1000 msec, or 100 frames):
  - At the current time t_curr, for each active state q_tcurr, find the best path P(q_tcurr) that goes from t_0 to t_curr (using the backtrace ψ)
  - Compare the set of best paths P and find the last time t_match at which all paths in P have the same state value (a small t_match sketch follows this slide)
  - If t_match exists:
    - Output the result from t_0 to t_match
    - Reset/remove the ψ values up to t_match
    - Set t_0 to t_match + 1
- Efficiency depends on the interval I, the beam threshold, and how well the observations match the HMM.

Slide from John-Paul Hosom
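A small sketch of the t_match test from the loop above: given the backtraced best path for each active state (all starting at t_0), find the latest time at which every path agrees; the helper name is invented.

```python
def find_t_match(paths):
    """paths: list of state sequences, one per active state, all starting at t0.
    Returns the largest index at which every path has the same state, or None."""
    t_match = None
    for t in range(min(len(p) for p in paths)):
        if len({p[t] for p in paths}) == 1:      # all active paths agree at time t
            t_match = t
    return t_match

# The Page 38 example: best paths for states A, B, C at t_curr = 4 (times 1..4).
print(find_t_match([list("BBAA"), list("BBBB"), list("BBBC")]))  # -> 1, i.e. time 2
```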

Page 38: On-line Processing

- Example (interval = 4 frames):
- At time 4, the best paths for all states A, B, and C have state B in common at time 2, so t_match = 2.
- Now output states B B for times 1 and 2, because no matter what happens in the future, this will not change. Set t_0 to 3.

[Figure: trellis of δ_t(A), δ_t(B), δ_t(C) for t=1..4, with best partial sequences BBAA, BBBB, BBBC; t_0 = 1, t_curr = 4]

Slide from John-Paul Hosom

Page 39: On-line Processing

- Now t_match = 7, so output from t=3 to t=7: B B A B B, then set t_0 to 8.
- If T = 8, then output the state with the best δ_8, for example C. The final result (obtained piece by piece) is then B B B B A B B C.

[Figure: trellis of δ_t(A), δ_t(B), δ_t(C) for t=3..8, with best partial sequences BBABBA, BBABBB, BBABBC; t_0 = 3, t_curr = 8, interval = 4]

Slide from John-Paul Hosom

Page 40: Problems with Viterbi

- It's hard to integrate sophisticated knowledge sources:
  - Trigram grammars
  - Parser-based LMs
    - Long-distance dependencies that violate dynamic programming assumptions
  - Knowledge that isn't left-to-right
    - Following words can help predict preceding words
- Solutions:
  - Return multiple hypotheses and use smart knowledge sources to rescore them
  - Use a different search algorithm, A* decoding (= stack decoding)

Page 41: Multipass Search

Page 42: Ways to Represent Multiple Hypotheses

- N-best list
  - Instead of a single best sentence (word string), return an ordered list of N sentence hypotheses
- Word lattice
  - Compact representation of word hypotheses and their times and scores
- Word graph
  - FSA representation of the lattice in which times are represented by topology

Page 43: Another Problem with Viterbi

- We want the forward probability of the observations given the word string
- But the Viterbi algorithm makes the "Viterbi approximation":
  - It approximates P(O | W)
  - with P(O | best state sequence)

Page 44: Solving the Best-Path-Not-Best-Words Problem

- Viterbi returns the best path (state sequence), not the best word sequence
  - The best path can be very different from the best word string if words have many possible pronunciations
- Two solutions:
  - Modify Viterbi to sum over the different paths that share the same word string
    - Do this as part of the N-best computation: compute N-best word strings, not N-best phone paths
  - Use a different decoding algorithm (A*) that computes the true forward probability

Page 45: Sample N-best List

Page 46: N-best Lists

- Again, we don't want the N best paths
  - That would be trivial: store N values in each state cell of the Viterbi trellis instead of 1
  - But most of the N best paths will have the same word string. Useless!
  - And it turns out that a factor of N is too much to pay

Page 47: Computing N-best Lists

- In the worst case, an admissible algorithm for finding the N most likely hypotheses is exponential in the length of the utterance.
  - S. Young. 1984. "Generating Multiple Solutions from Connected Word DP Recognition Algorithms". Proc. of the Institute of Acoustics, 6:4, 351-354.
- For example, if the AM and LM scores were nearly identical for all word sequences, we would have to consider all permutations of word sequences for the whole sentence (all with the same scores).
- But of course if this were true, we couldn't do ASR at all!

Page 48: Computing N-best Lists

- Instead, various non-admissible algorithms:
  - (Viterbi) Exact N-best
  - (Viterbi) Word-dependent N-best
- And one admissible algorithm:
  - A* N-best

Page 49: Exact N-best for Time-Synchronous Viterbi

- Due to Schwartz and Chow; also called "sentence-dependent N-best"
- Idea: each state stores multiple paths
- Maintain separate records for paths with distinct word histories
  - History: the whole word sequence up to the current time t and word w
- When two or more paths come to the same state at the same time, merge the paths with the same history and sum their probabilities
  - i.e., compute the forward probability within words
- Otherwise, retain only the N best paths for each state (see the sketch after this slide)
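A toy sketch of that merge-and-prune rule: hypotheses arriving at one state carry a word history and a probability; identical histories are summed, and only the N best distinct histories are kept (all names and numbers are invented).

```python
from collections import defaultdict
import heapq

def merge_hypotheses(incoming, n_best):
    """incoming: list of (word_history_tuple, probability) pairs arriving at one
    state at one time frame.  Sum probabilities of identical histories, then
    keep only the N most probable distinct histories."""
    by_history = defaultdict(float)
    for history, prob in incoming:
        by_history[history] += prob              # forward-sum within a word history
    return heapq.nlargest(n_best, by_history.items(), key=lambda kv: kv[1])

# Toy merge of hypotheses from two predecessor states, keeping the top 2:
print(merge_hypotheses([(("the", "cat"), 1e-5), (("the", "cat"), 2e-5),
                        (("a", "cat"), 1.5e-5), (("the", "hat"), 4e-6)], n_best=2))
```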

Page 50: Exact N-best for Time-Synchronous Viterbi

- Efficiency:
  - A typical HMM state has 2 or 3 predecessor states within the word HMM
  - So for each time frame and state, we need to compare/merge 2 or 3 sets of N paths into N new paths
- At the end of the search, the N paths in the final state of the trellis give the N-best word sequences
- Complexity is O(N)
- Still too slow for practical systems
  - N is 100 to 1000
- More efficient versions: word-dependent N-best

Page 51: Word-Dependent ('Bigram') N-best

- Intuition:
  - Instead of each state merging all paths from the start of the sentence
  - We merge all paths that share the same previous word
- Details:
  - This will require us to do a more complex traceback at the end of the sentence to generate the N-best list

Page 52: Word-Dependent ('Bigram') N-best

- At each state, preserve the total probability for each of k << N previous words
  - k is 3 to 6; N is 100 to 1000
- At the end of each word, record the score for each previous-word hypothesis and the name of the previous word
  - So at each word ending we store "alternatives"
  - But, like normal Viterbi, pass on just the best hypothesis
- At the end of the sentence, do a traceback
  - Follow backpointers to get the 1-best
  - But as we follow the pointers, put the alternate words ending at the same point on a queue
  - On the next iteration, pop the next best

Page 53: Word Lattice

- Each arc is annotated with AM and LM log probabilities

Page 54: Word Graph

- Timing information removed
- Overlapping copies of words merged
- AM information removed
- The result is a WFST
- Natural extension to an N-gram language model

Page 55: Converting a Word Lattice to a Word Graph

- A word lattice can have a range of possible end frames for a word
- Create an edge from (w_i, t_i) to (w_j, t_j) if t_j - 1 is one of the end times of w_i (a small sketch follows)

Slide from Bryan Pellom
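A small sketch of that edge-creation rule; the lattice entries are invented (word, start_frame, end_frames) tuples.

```python
def build_word_graph_edges(lattice):
    """lattice: list of (word, start_frame, set_of_end_frames) hypotheses.
    Create an edge w_i -> w_j whenever w_j starts one frame after some end time of w_i."""
    edges = []
    for w_i, start_i, ends_i in lattice:
        for w_j, start_j, ends_j in lattice:
            if start_j - 1 in ends_i:
                edges.append(((w_i, start_i), (w_j, start_j)))
    return edges

# Invented toy lattice: "the" can end at frame 10 or 12, "cat" can start at frame 11 or 13.
lattice = [("the", 1, {10, 12}), ("cat", 11, {25}), ("cat", 13, {27})]
print(build_word_graph_edges(lattice))
```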

Page 56: Lattices

- Some researchers are careful to distinguish between word graphs and word lattices
- But we'll follow convention in using "lattice" to mean both word graphs and word lattices
- Two facts about lattices:
  - Density: the number of word hypotheses or word arcs per uttered word
  - Lattice error rate (also called the "lower bound error rate"): the lowest word error rate for any word sequence in the lattice
    - The lattice error rate is the "oracle" error rate, the best possible error rate you could get from rescoring the lattice
    - We can use this as an upper bound

Page 57: Posterior Lattices

- We don't actually compute posteriors:

  P(W | O) = P(O | W) P(W) / P(O)

- Why do we want posteriors?
  - Without a posterior, we can choose the best hypothesis, but we can't know how good it is!
- In order to compute the posterior, we need to:
  - Normalize over all the different word hypotheses at a time
  - Align all the hypotheses and sum over all paths passing through a word

Page 58: Mesh = Sausage = Pinched Lattice

Page 59: Summary: One-Pass vs. Multipass

- Potential problems with multipass:
  - Can't use it for real-time applications (need the end of the sentence)
    - (But we can keep successive passes really fast)
  - Each pass can introduce inadmissible pruning
    - (But one-pass does the same with beam pruning and fast match)
- Why multipass:
  - Very expensive knowledge sources (NL parsing, higher-order n-grams, etc.)
  - Spoken language understanding: N-best is a perfect interface
  - Research: N-best lists are very powerful offline tools for algorithm development
  - N-best lists are needed for discriminative training (MMIE, MCE) to get rival hypotheses

Page 60: Weighted Finite State Transducers for ASR

- An alternative paradigm for ASR
  - Used by Kaldi
- A weighted finite-state automaton that transduces an input sequence to an output sequence
- Mohri, Mehryar, Fernando Pereira, and Michael Riley. "Speech recognition with weighted finite-state transducers." In Springer Handbook of Speech Processing, pp. 559-584. Springer Berlin Heidelberg, 2008.
  - http://www.cs.nyu.edu/~mohri/pub/hbka.pdf

Page 61: Weighted Finite State Acceptors

Page 62: Weighted Finite State Transducers

Page 63: WFST Algorithms

- Composition: combine transducers at different levels. If G is a finite-state grammar and P is a pronunciation dictionary, P ◦ G transduces a phone string to the word strings allowed by the grammar (a toy composition sketch follows)
- Determinization: ensures each state has no more than one outgoing transition for a given input label
- Minimization: transforms a transducer into an equivalent transducer with the fewest possible states and transitions

slide from Steve Renals
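To make composition concrete, here is a tiny pure-Python sketch of the state-pair construction for two unweighted transducers; it is not the OpenFst/Kaldi implementation (no epsilon handling, no weights), and the toy machines are invented. The same construction underlies P ◦ G: an arc exists in the composed machine whenever the first transducer maps a to b and the second maps b to c.

```python
from itertools import product

def compose(t1, t2):
    """Compose transducers given as dicts: state -> list of (in_sym, out_sym, next_state).
    A composed arc maps a -> c whenever t1 maps a -> b and t2 maps b -> c.
    (Epsilon handling and weights are omitted to keep the sketch short.)"""
    composed = {}
    for (q1, arcs1), (q2, arcs2) in product(t1.items(), t2.items()):
        composed[(q1, q2)] = [(a, c, (n1, n2))
                              for a, b1, n1 in arcs1
                              for b2, c, n2 in arcs2
                              if b1 == b2]
    return composed

# Invented toy machines: t1 rewrites a->A and b->B, t2 rewrites A->x and B->y,
# so their composition rewrites a->x and b->y.
t1 = {0: [("a", "A", 0), ("b", "B", 0)]}
t2 = {0: [("A", "x", 0), ("B", "y", 0)]}
print(compose(t1, t2))   # {(0, 0): [('a', 'x', (0, 0)), ('b', 'y', (0, 0))]}
```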

Page 64: WFST-based Decoding

- Represent the following components as WFSTs:
  - Context-dependent acoustic models (C)
  - Pronunciation dictionary (D)
  - n-gram language model (L)
- The decoding network is defined by their composition: C ◦ D ◦ L
- Successively determinize and combine the component transducers, then minimize the final network

slide from Steve Renals

Page 65: G

Page 66: L

Page 67: G ◦ L

Page 68: min(det(L ◦ G))

Page 69: Advanced Search (= Decoding)

- How to weight the AM and LM
- Speeding things up: Viterbi beam decoding
- Multipass decoding
  - N-best lists
  - Lattices
  - Word graphs
  - Meshes / confusion networks
- Finite-state methods