Learning Structured Models for Phone Recognition

Slav Petrov, Adam Pauls, Dan Klein

Posted on 11-Dec-2015

Acoustic Modeling

Motivation

Standard acoustic models impose many structural constraints

We propose an automatic approach

Use:

TIMIT dataset

MFCC features

Full-covariance Gaussians (Young and Woodland, 1994)
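Every state in the model scores MFCC frames with a full-covariance Gaussian. A minimal sketch of that per-frame log-density in NumPy follows. It is not the authors' code, and the function name `gaussian_logpdf` is hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of one frame x under a full-covariance Gaussian N(mean, cov).

    Hypothetical helper: a sketch, not the authors' implementation.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = x.shape[-1]
    diff = x - mean
    # slogdet avoids overflow/underflow of det(cov) in higher dimensions
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    # Solve cov @ z = diff rather than forming the explicit inverse
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)
```

Using `slogdet` and `solve` instead of `det` and `inv` keeps the computation stable for the correlated, higher-dimensional feature vectors that full covariances are meant to capture.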

Phone Classification

[Figure: speech frames whose phone label is unknown]

Phone Classification

æ

HMMs for Phone Classification

HMMs for Phone Classification

Temporal Structure

Standard subphone/mixture HMM

Temporal Structure

Gaussian Mixtures

Model           Error Rate
HMM Baseline    25.1%
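The HMM baseline classifies a pre-segmented phone. One common way to do this is sketched below under simplifying assumptions: one left-to-right HMM per phone, one-dimensional Gaussian emissions, and a segment that may end in any state. The segment is scored with the forward algorithm under every phone's HMM, and the highest log-likelihood wins. All function names are hypothetical.

```python
import numpy as np


def forward_loglik(log_pi, log_A, log_B):
    """log p(x_1..x_T) under an HMM via the forward algorithm (hypothetical helper).

    log_pi: (S,) initial log-probabilities
    log_A:  (S, S) log transitions, log_A[i, j] = log p(j | i)
    log_B:  (T, S) emission log-likelihoods log p(x_t | state)
    """
    alpha = log_pi + log_B[0]
    for t in range(1, log_B.shape[0]):
        # alpha_t(j) = log sum_i exp(alpha_{t-1}(i) + log_A[i, j]) + log_B[t, j]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[t]
    # Simplification: the segment may end in any state
    return float(np.logaddexp.reduce(alpha))


def left_to_right(n_states, p_stay=0.6):
    """Log transitions of a left-to-right HMM (self-loop or advance by one)."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i], A[i, i + 1] = p_stay, 1.0 - p_stay
    A[-1, -1] = 1.0
    with np.errstate(divide="ignore"):  # log(0) -> -inf is intended
        return np.log(A)


def gaussian_emitter(means, var=1.0):
    """Per-state 1-D Gaussian emissions; returns a (T, S) log-likelihood function."""
    means = np.asarray(means, dtype=float)

    def emit(segment):
        x = np.asarray(segment, dtype=float)[:, None]
        return -0.5 * (np.log(2 * np.pi * var) + (x - means) ** 2 / var)

    return emit


def classify(segment, models):
    """Return the label whose HMM assigns the segment the highest likelihood."""
    scores = {
        label: forward_loglik(log_pi, log_A, emit(segment))
        for label, (log_pi, log_A, emit) in models.items()
    }
    return max(scores, key=scores.get)


# Toy example: every model starts in state 0 of a 3-state left-to-right HMM
with np.errstate(divide="ignore"):
    start = np.log(np.array([1.0, 0.0, 0.0]))
models = {
    "aa": (start, left_to_right(3), gaussian_emitter([0.0, 1.0, 2.0])),
    "iy": (start, left_to_right(3), gaussian_emitter([5.0, 6.0, 7.0])),
}
print(classify([0.1, 0.9, 2.1, 2.0], models))  # aa
```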

Standard Model vs. Our Model

Gaussian mixtures → Single Gaussians

Temporal structure → Fully connected

Hierarchical Baum-Welch Training

Error rate over successive split rounds: 32.1%, 28.7%, 25.6%, 23.9%

5 Split Rounds 21.4%

HMM Baseline 25.1%
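Hierarchical training starts from a coarse model and repeatedly splits every substate in two before re-running Baum-Welch (EM). A minimal sketch of one split round follows. Three details are assumptions rather than facts from the slides: the means are perturbed with small noise to break symmetry, the incoming transition mass is shared equally between the two children, and the function name `split_states` is hypothetical.

```python
import numpy as np


def split_states(means, covs, trans, rng, noise=0.01):
    """One split round: every substate becomes two children (hypothetical helper).

    means: (S, D), covs: (S, D, D), trans: (S, S) row-stochastic matrix.
    Returns arrays of shape (2S, D), (2S, D, D) and (2S, 2S); Baum-Welch
    would then re-estimate all of them from the data.
    """
    S, D = means.shape
    # Perturb the two copies in opposite directions, scaled by each
    # dimension's standard deviation, so EM can pull them apart.
    std = np.sqrt(np.diagonal(covs, axis1=1, axis2=2))
    eps = noise * rng.standard_normal((S, D)) * std
    new_means = np.empty((2 * S, D))
    new_means[0::2] = means + eps
    new_means[1::2] = means - eps
    # Children start with the parent's covariance
    new_covs = np.repeat(covs, 2, axis=0)
    # Children copy the parent's outgoing row; each incoming probability is
    # divided equally between the target's two children, so rows still sum to 1.
    new_trans = np.repeat(np.repeat(trans, 2, axis=0), 2, axis=1) / 2.0
    return new_means, new_covs, new_trans


rng = np.random.default_rng(0)
means, covs, trans = np.zeros((1, 2)), np.eye(2)[None], np.ones((1, 1))
for _ in range(5):  # five split rounds take one substate to 32
    means, covs, trans = split_states(means, covs, trans, rng)
print(means.shape, covs.shape, trans.shape)  # (32, 2) (32, 2, 2) (32, 32)
```

Each round doubles the number of substates. The EM step that would follow each split is omitted here.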

Phone Classification Results

Method                                      Error Rate
GMM Baseline (Sha and Saul, 2006)           26.0%
HMM Baseline (Gunawardana et al., 2005)     25.1%
SVM (Clarkson and Moreno, 1999)             22.4%
Hidden CRF (Gunawardana et al., 2005)       21.7%
Our Work                                    21.4%
Large Margin GMM (Sha and Saul, 2006)       21.1%

Phone Recognition

[Figure: speech frames whose phone labels are unknown]

Standard State-Tied Acoustic Models

No more State-Tying

No more Gaussian Mixtures

Fully connected internal structure

Fully connected external structure

Refinement of the /ih/-phone

Refinement of the /l/-phone

Hierarchical Refinement Results

[Figure: error rate (0.24 to 0.38) vs. number of states (0 to 2000); legend: Split and Merge, Automatic Alignment; Split Only]

HMM Baseline 41.7%

5 Split Rounds 28.4%

Merging

Not all phones are equally complex.

Compute the log-likelihood loss from merging.

[Figure: frames t-1, t, t+1 for the split model and for the model merged at one node]

Merging Criterion

[Figure: two panels over frames t-1, t, t+1]
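The criterion's formula does not survive in this transcript. The sketch below follows the approximation Petrov et al. (2006) used for split-and-merge grammar refinement, carried over to HMM frames; that the slide uses exactly this form is an assumption. At each frame, the two substates' contribution α₁β₁ + α₂β₂ to the likelihood is replaced by that of one merged substate with α = α₁ + α₂ and β = p₁β₁ + p₂β₂, where p₁ and p₂ are the substates' relative frequencies. The per-frame log ratios are then summed. `merge_log_loss` is a hypothetical name.

```python
import numpy as np


def merge_log_loss(alpha1, beta1, alpha2, beta2, p1, p2, total_lik):
    """Approximate log-likelihood change from merging substates s1 and s2.

    Hypothetical helper, assuming the split-merge approximation described above.
    alpha*, beta*: (T,) forward and backward probabilities of each substate
                   at every frame (unscaled, so only for short toy inputs).
    p1, p2:        relative frequencies (expected counts) of s1 and s2.
    total_lik:     P(x) under the split model.
    """
    w1, w2 = p1 / (p1 + p2), p2 / (p1 + p2)
    merged_alpha = alpha1 + alpha2
    merged_beta = w1 * beta1 + w2 * beta2
    # Swap the two substates' share of P(x) at each frame for the merged one
    approx = total_lik - alpha1 * beta1 - alpha2 * beta2 + merged_alpha * merged_beta
    # Treat frames independently and add up the per-frame log ratios
    return float(np.sum(np.log(approx) - np.log(total_lik)))
```

A value near zero means the split bought little likelihood. Ranking sibling pairs by this loss and merging the cheapest ones is one way to let simple phones keep few substates.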

Split and Merge Results

[Figure: error rate (0.24 to 0.38) vs. number of states (0 to 2000); legend: Split and Merge; Split Only]

Split Only 28.4%

Split & Merge 27.3%

[Figure: HMM states per phone (0 to 35) for each phone: ae ao ay eh er ey ih f r s sil aa ah ix iy z cl k sh n vcl ow l m t v uw aw ax ch w th el dh uh p en oy hh jh ng y b d dx g zh epi]

[Figure: the same HMM-states-per-phone chart, highlighting ey, eh, ao]

[Figure: the same HMM-states-per-phone chart, highlighting g, d, b]

Alignment Results

[Figure: error rate (0.24 to 0.38) vs. number of states (0 to 2000); legend: Split and Merge; Split Only; Split and Merge, Automatic Alignment]

Hand Aligned 27.3%

Auto Aligned 26.3%

Alignment State Distribution

[Figure: states per phone (0 to 35), hand-aligned vs. auto-aligned; legend: Hand Aligned, Auto Aligned; phones: ae ao ay eh er ey ih aa ah ix iy ow uw aw ax el uh en oy f r s z k sh n l m t v ch w th dh p hh jh ng y b d dx g zh sil cl vcl epi]

Inference

State sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5

Phone sequence: d-d-d-d-ae-ae-ae-ae-d-d-d-d-d

Transcription: d-ae-d
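The slide's example reads a transcription off a substate sequence in two steps. First, drop each substate's index to get one phone per frame. Second, merge runs of the same phone. The small sketch below assumes labels are a phone name followed by a numeric substate index, as on the slide. `collapse` is a hypothetical name.

```python
import re
from itertools import groupby


def collapse(state_sequence):
    """Map substate labels like 'ae5' to frame-level phones and a transcription.

    Hypothetical helper; assumes labels end in a numeric substate index.
    """
    # Step 1: strip the trailing substate index ('ae5' -> 'ae')
    phones = [re.sub(r"\d+$", "", state) for state in state_sequence]
    # Step 2: merge consecutive frames that carry the same phone
    transcription = [phone for phone, _ in groupby(phones)]
    return phones, transcription


states = "d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5".split("-")
phones, transcription = collapse(states)
print("-".join(phones))         # d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
print("-".join(transcription))  # d-ae-d
```

Because runs are merged, two genuinely adjacent instances of the same phone would be reported as one. This sketch makes no attempt to tell them apart.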

Viterbi

Variational

???
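Viterbi decoding finds the single best substate sequence, which is then collapsed into a phone transcription. A standard log-space sketch follows. It is not the authors' implementation, and `viterbi` with its argument layout is our own.

```python
import numpy as np


def viterbi(log_pi, log_A, log_B):
    """Most likely state path of an HMM in log space (hypothetical helper).

    log_pi: (S,) initial log-probabilities
    log_A:  (S, S) log transitions, log_A[i, j] = log p(j | i)
    log_B:  (T, S) emission log-likelihoods
    Returns (path, log-probability of that path).
    """
    T, S = log_B.shape
    delta = log_pi + log_B[0]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[i, j]: best path via i into j
        backptr[t] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0) + log_B[t]
    # Trace back from the best final state
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(np.max(delta))


log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.9, 0.1], [0.1, 0.9]])
log_B = np.log([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
print(viterbi(log_pi, log_A, log_B)[0])  # [0, 0, 1, 1]
```

Many substate paths can spell the same phone sequence. The best single substate path therefore need not yield the most probable transcription.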

Variational Inference

Variational Approximation:

Viterbi 26.3%

Variational 25.1%

: Posterior edge marginals

Solution:
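The slide describes the approximation only through posterior edge marginals. One concrete reading is sketched below, as an assumption rather than the authors' exact formulation. First, sum the substate posteriors and posterior edge marginals within each phone. Next, define a frame-level Markov chain over phones with q(p_0) = G[0, p_0] and q(p_{t+1} | p_t) = Xi[t, p_t, p_{t+1}] / G[t, p_t]. Finally, decode that chain with Viterbi. Among first-order chains over phones, this q is the one closest to the true posterior in KL(p || q), because minimizing that divergence matches single and pairwise marginals. The inputs `gamma` and `xi` would come from a forward-backward pass; `variational_decode` is a hypothetical name.

```python
import numpy as np


def variational_decode(gamma, xi, phone_of, n_phones):
    """Frame-level phone path from substate posteriors (hypothetical helper).

    gamma:    (T, S) substate posteriors P(s_t = i | x)
    xi:       (T-1, S, S) posterior edge marginals P(s_t = i, s_{t+1} = j | x)
    phone_of: (S,) integer phone index of each substate
    """
    T, S = gamma.shape
    M = np.zeros((S, n_phones))
    M[np.arange(S), phone_of] = 1.0  # one-hot substate -> phone map
    G = gamma @ M  # (T, P) phone posteriors
    Xi = np.einsum("ia,tij,jb->tab", M, xi, M)  # (T-1, P, P) phone edge marginals
    tiny = np.finfo(float).tiny
    log_init = np.where(G[0] > 0, np.log(np.maximum(G[0], tiny)), -np.inf)
    # log q(p_{t+1} | p_t) = log Xi[t] - log G[t]; impossible edges get -inf
    log_trans = np.where(
        Xi > 0,
        np.log(np.maximum(Xi, tiny)) - np.log(np.maximum(G[:-1, :, None], tiny)),
        -np.inf,
    )
    # Viterbi recursion over the approximating chain (time-varying transitions)
    delta = log_init
    back = np.zeros((T, n_phones), dtype=int)
    for t in range(T - 1):
        scores = delta[:, None] + log_trans[t]
        back[t + 1] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0)
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# One frame: substates 0 and 1 belong to phone 0, substate 2 to phone 1.
# The best single substate is 2 (phone 1), but phone 0 has more total mass.
print(variational_decode(np.array([[0.3, 0.3, 0.4]]), np.zeros((0, 3, 3)), np.array([0, 0, 1]), 2))  # [0]
```

The sums pool probability spread across every substate of a phone before decoding. A single best substate path ignores that pooled mass.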

Phone Recognition Results

Method                                                      Error Rate
State-Tied Triphone HMM (HTK) (Young and Woodland, 1994)    27.7%
Gender Dependent Triphone HMM (Lamel and Gauvain, 1993)     27.1%
Our Work                                                    26.1%
Bayesian Triphone HMM (Ming and Smith, 1998)                25.6%
Heterogeneous classifiers (Halberstadt and Glass, 1998)     24.4%

Conclusions

Minimalist, automatic approach: unconstrained and accurate

Phone classification: competitive with state-of-the-art discriminative methods despite being generative

Phone recognition: better than standard state-tied triphone models

Thank you!

http://nlp.cs.berkeley.edu