sphinx 3.4 development progress report in february

Sphinx 3.4 Development Progress Report in February

Arthur Chan, Jahanzeb Sherwani
Carnegie Mellon University

Mar 1, 2004

This Presentation

S3.4 Development Progress
  Speed-up
  Language Model facilities
CALO and S3.5 Development
  Which features should be there to make CALO better?
  Schedule for next three months

Review of Last Month Progress

Last month
  Wrote a speed-up version of s3.
  Completed some coding of the s3.4 speed-up task.
This month
  Backbone of the s3.4 speed-up functionalities completed and tested.
  Basic LM facilities completed and smoke-tested.

Current Systems Specifications (without Gaussian Selection)

Item                                Sphinx 3                                  Sphinx 3.3
Speed on P4-1G (Communicator Task)  ERR 17.2%; 11xRT GMM, 3xRT search         ERR 18.6%; 6xRT GMM, 1xRT search
GMM computation                     Not optimized (little code optimization)  Can apply Sub-VQ-based Gaussian Selection
Lexicon                             Flat                                      Tree
Search                              Beam on search, no beam on GMM            Beam on search, beam on GMM

Speed-up Facilities in s3.3

GMM Computation
  Frame-Level: not implemented
  Senone-Level: not implemented
  Gaussian-Level: SVQ-based GMM Selection, sub-vectors constrained to 3
  Component-Level: SVQ code removed
Search
  Lexicon Structure: tree
  Pruning: standard
  Heuristic Search Speed-up: not implemented

Speed-up Facilities in s3.4

GMM Computation
  Frame-Level: (New) Naïve Down-Sampling; (New) Conditional Down-Sampling
  Senone-Level: (New) CI-based GMM Selection
  Gaussian-Level: (New) VQ-based GMM Selection; (New) unconstrained no. of sub-vectors in SVQ-based GMM Selection
  Component-Level: (New) SVQ code enabled
Search
  Lexicon Structure: tree
  Pruning: (New) Improved Word-end Pruning
  Heuristic Search Speed-up: (New) Phoneme Look-ahead
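To make the senone-level entry concrete, here is a minimal sketch of one common reading of CI-based GMM selection: score the context-independent (CI) senones first, then run the full GMM only for context-dependent (CD) senones whose CI parent is within a beam of the best CI score, and back off to the parent's score otherwise. The function name, the data layout, and the back-off rule are assumptions for illustration, not the s3.4 code.

```python
def ci_based_selection(ci_scores, cd_to_ci, beam, score_cd):
    """Score CD senones, skipping those whose CI parent scored poorly.

    ci_scores: dict mapping CI senone id -> log score for this frame
    cd_to_ci:  dict mapping CD senone id -> its parent CI senone id
    beam:      non-negative log-score width below the best CI score
    score_cd:  callable that runs the full GMM for one CD senone
    (all names here are hypothetical)
    """
    threshold = max(ci_scores.values()) - beam
    scores, n_computed = {}, 0
    for cd, ci in cd_to_ci.items():
        if ci_scores[ci] >= threshold:
            scores[cd] = score_cd(cd)      # parent is promising: full GMM
            n_computed += 1
        else:
            scores[cd] = ci_scores[ci]     # back off to the CI score
    return scores, n_computed


ci_scores = {"AA": -10.0, "B": -50.0}
cd_to_ci = {"AA_1": "AA", "AA_2": "AA", "B_1": "B"}
# Stand-in for the real GMM evaluation of a CD senone.
scores, n_computed = ci_based_selection(ci_scores, cd_to_ci, 20.0, lambda cd: -5.0)
```

With a beam of 20, only the two CD senones under "AA" are evaluated in full; "B_1" inherits the CI score of "B".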

S3.4 Speed Performance in Communicator Task

Metric            Sphinx 3.3                  Sphinx 3.4
Error rate        ERR: 18.6%                  ERR: 18.7%
Speed (P4-1G)     6xRT GMM, 1xRT search       1.2xRT GMM, 1.5xRT search
Speed (P4-2G)     1.6xRT GMM, 0.6xRT search   0.4xRT GMM, 0.9xRT search
Techniques used   -                           CI-based GMM Selection, word-end pruning
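The figures above can be combined into totals with a few lines of arithmetic. The sketch assumes that total decoding time is simply GMM time plus search time, which ignores any other cost such as the front-end.

```python
# (GMM xRT, search xRT) figures copied from the slide above.
results = {
    "P4-1G": {"s3.3": (6.0, 1.0), "s3.4": (1.2, 1.5)},
    "P4-2G": {"s3.3": (1.6, 0.6), "s3.4": (0.4, 0.9)},
}


def summarize(old, new):
    """Return (old total, new total, overall speed-up, GMM reduction)."""
    old_total, new_total = sum(old), sum(new)
    return old_total, new_total, old_total / new_total, 1.0 - new[0] / old[0]


for machine, r in results.items():
    old_total, new_total, speedup, gmm_cut = summarize(r["s3.3"], r["s3.4"])
    print(f"{machine}: {old_total:.1f}xRT -> {new_total:.1f}xRT "
          f"({speedup:.2f}x faster overall, GMM time cut by {gmm_cut:.0%})")
```

Under that assumption the total drops from 7xRT to 2.7xRT on the P4-1G and from 2.2xRT to 1.3xRT on the P4-2G. GMM time falls by 75-80% while search time rises on both machines.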

Issues in Speed Optimization

Implementation issues
  Beams applied to the GMM make many techniques hard to implement.
  Some facilities were hardwired for a specific purpose.
Performance issues
  Each technique reduced computation by 40-50% with <5% degradation. However, the reductions did not add up.
  The remaining computation has a lower bound (usually a 75%-80% reduction is the maximum).
  Overhead is huge in some techniques, e.g. VQ-based Gaussian Selection takes 0.25xRT.
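One way to see why fixed overhead matters more as the GMM stage gets faster is a small back-of-the-envelope model. The sketch below is illustrative only: it pairs the 0.25xRT overhead quoted above with a nominal 45% reduction and with the 6xRT and 1.2xRT GMM costs reported earlier, a combination that is an assumption and was not measured.

```python
def net_cost(base_xrt, reduction, overhead_xrt):
    """Cost after a technique that removes `reduction` of the base work
    but adds a fixed `overhead_xrt` of its own."""
    return base_xrt * (1.0 - reduction) + overhead_xrt


def effective_reduction(base_xrt, reduction, overhead_xrt):
    """Fraction of the base cost actually saved once overhead is paid."""
    return 1.0 - net_cost(base_xrt, reduction, overhead_xrt) / base_xrt


# Same hypothetical technique (45% nominal reduction, 0.25xRT overhead)
# applied to an unoptimized and to an already-optimized GMM stage.
on_unoptimized = effective_reduction(6.0, 0.45, 0.25)
on_optimized = effective_reduction(1.2, 0.45, 0.25)
```

In this model the technique saves about 41% of a 6xRT stage but only about 24% of a 1.2xRT stage, because the fixed 0.25xRT is a much larger share of the smaller budget.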

Language Model Facilities

S3.3 only accepts a single LM, without classes, in binary format.
So far, S3.4 is able to accept multiple class-based LMs in binary format.
  One major modification of the code.
  Affects 6-7 files.
Caveats:
  Not a perfect implementation.
  Text format is not yet supported.
  Backward compatibility is an issue.
  Lack of test cases; only lightly smoke-tested.
  ~1 more week of work.
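As a rough illustration of what "multiple class-based LMs" involves, the sketch below keeps several named LMs in memory and scores a class member as P(class) times P(word | class). It uses unigrams only for brevity, and every class, method, and LM name is hypothetical; it does not reflect the s3.4 data structures or the DMP format.

```python
import math


class ClassUnigramLM:
    """Toy class-based unigram LM working in the natural-log domain."""

    def __init__(self, token_logprob, class_members):
        # token_logprob: word or class tag -> log P(token)
        # class_members: class tag -> {word: log P(word | class)}
        self.token_logprob = token_logprob
        self.class_members = class_members

    def logprob(self, word):
        # A word listed directly in the LM is scored as-is.
        if word in self.token_logprob:
            return self.token_logprob[word]
        # Otherwise use a class containing it:
        # log P(class) + log P(word | class).
        for tag, members in self.class_members.items():
            if word in members and tag in self.token_logprob:
                return self.token_logprob[tag] + members[word]
        return float("-inf")  # unknown word


# Several named LMs held in memory; the decoder picks one by name.
lms = {
    "travel": ClassUnigramLM(
        {"to": math.log(0.5), "[city]": math.log(0.5)},
        {"[city]": {"boston": math.log(0.25), "pittsburgh": math.log(0.75)}},
    ),
    "yesno": ClassUnigramLM({"yes": math.log(0.5), "no": math.log(0.5)}, {}),
}
active = lms["travel"]
p_boston = math.exp(active.logprob("boston"))  # 0.5 * 0.25
```

Switching LMs is then a dictionary lookup, while adding a city only touches the class membership table and leaves the LM counts alone.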

Problems with s3.4 (valid for Feb 29th, 2004)

Only accepts DMP files.
  The text format reader in Sphinx 2 is very complex.
  A straight conversion is not clean.
LMs are all loaded into memory.
  We can work on this.
Lexical trees are all built at the beginning.
  We tried to avoid the overhead of rebuilding the tree in every utterance.
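For readers unfamiliar with the lexical tree, the sketch below builds a phone prefix tree once from a small pronunciation dictionary and compares its size with a flat lexicon. It uses context-independent phones and made-up entries; a real decoder tree is built over context-dependent units, so this is only a simplified picture.

```python
def build_lexical_tree(lexicon):
    """Build a phone prefix tree from {word: [phones]}.

    Each node is a dict {"children": {phone: node}, "words": [...]}.
    """
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            # Reuse the node if another word already shares this prefix.
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []})
        node["words"].append(word)  # the word ends at this node
    return root


def count_nodes(node):
    """Number of phone nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node["children"].values())


lexicon = {
    "start": ["S", "T", "AA", "R", "T"],
    "star": ["S", "T", "AA", "R"],
    "stop": ["S", "T", "AA", "P"],
}
tree = build_lexical_tree(lexicon)           # built once, before decoding
flat_phones = sum(len(p) for p in lexicon.values())
tree_phones = count_nodes(tree)
```

Here the flat lexicon needs 13 phone instances and the tree needs 6, since the three words share the prefix S T AA. Building the tree up front trades memory for not repeating this construction per utterance.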

Summary in Sphinx 3.4 Development

Derivative of s3.3
  With speed optimization
  Better LM facilities
Algorithmic optimization is 90% completed.
  Still need to improve overhead performance.
  Tree-based GMM selection is desirable.
  Improvement for individual techniques.
Got through the major hurdle of multiple LMs and class-based LMs.
  Need more time to make it more stable.
Expected internal release time: March 8, 2004

Sphinx 3.4 and CALO

Which pieces are missing?
  Sphinx 3.4’s decoding is still not streamlined => continuous listening is not yet enabled.
  Sphinx’s speed may still not be ideal.
    From s3 to s3.3, ~10% degradation.
  Sphinx 3.4 doesn’t learn from data yet.

Sphinx 3.5: What should we do in the next 3 months?

Expected release time: May – June
Interfaces
  Streamlined front-end and decoding
  (?) PortAudio-based audio routine
Speed/Accuracy
  Improved lexical tree search
  Machine optimization of Gaussian computation
  Combination of multiple recognizers
Learning
  Acoustic Model adaptation
  (?) Language Model adaptation (in Phoenix)
  Better semantic parsing
Resource Acquisition and Load Balancing

Highlight I: Speed/Accuracy

Improved lexical tree search
  Current implementation uses a single lexical tree.
  May be desirable to create tree copies.
Machine optimization of Gaussian computation
  SIMD (Single Instruction, Multiple Data)
  Requires help from assembly language experts. (Jason/Thomas)
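The Gaussian computation that SIMD would target can be sketched as follows, assuming diagonal-covariance Gaussians (an assumption here; the slides do not state the covariance type). The frame-independent terms are precomputed, leaving an inner loop whose per-dimension operations are independent of each other, which is the property a SIMD implementation would exploit. Function names are hypothetical.

```python
import math


def precompute(var):
    """Per-Gaussian constants that do not depend on the input frame."""
    log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
    half_inv_var = [0.5 / v for v in var]
    return log_norm, half_inv_var


def log_gaussian(x, mean, log_norm, half_inv_var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    dist = 0.0
    for x_d, m_d, w_d in zip(x, mean, half_inv_var):
        diff = x_d - m_d
        dist += diff * diff * w_d  # same operation on every dimension
    return log_norm - dist


mean, var = [0.0, 1.0], [1.0, 4.0]
log_norm, half_inv_var = precompute(var)
score = log_gaussian([0.0, 3.0], mean, log_norm, half_inv_var)
```

Pure Python gains nothing from this layout by itself; the point is only that the subtract, multiply, and accumulate steps map directly onto vector instructions in a compiled implementation.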

Highlight II: Multiple Recognizer Combination and Resource Acquisition

Research by Rong suggests that combining multiple recognizers can improve accuracy.
  Speed worsens by 100% if we run two recognizers.
An interesting solution: computation can be shared by other machines in the meeting.
  Inspired by routing implementation.
  A very natural solution in a meeting scenario because usually only one person will be speaking.
Challenges: bandwidth and load balancing

Highlight III: Learning

Acoustic Model
  Maximum Likelihood Linear Regression (MLLR)
  Jahanzeb will be responsible for this.
(?) Language Model
  How? Cache-based LM?
(?) Improved Robust Parsing
  Better parsing based on previous command history
  Phoenix’s source code is not easy to trace.
  Thomas Harris’s implementation may be a good place to start.
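For the acoustic model item, the core of MLLR mean adaptation is an affine transform shared by many Gaussians: each mean mu becomes A mu + b. The sketch below only applies a given transform; estimating A and b from adaptation data is the harder part and is not shown. The function name and the toy values are illustrative.

```python
def mllr_adapt_means(means, A, b):
    """Apply mu' = A mu + b to every mean vector (plain Python lists)."""
    adapted = []
    for mu in means:
        adapted.append([
            sum(a_ij * mu_j for a_ij, mu_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ])
    return adapted


# Toy 2-dimensional transform shared by all Gaussians in one class.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [1.0, -1.0]
means = [[1.0, 2.0], [0.0, 0.0]]
adapted = mllr_adapt_means(means, A, b)
```

Because one transform moves every mean in the class, a small amount of speaker data can adapt Gaussians that were never observed during adaptation.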

Arthur and Jahanzeb’s Proposed Schedule

Period          Arthur                                           Jahanzeb
Mar 1 – Mar 15  Windows port + streamline S3.4 decoding          Regression test + Adaptation, Milestone 1
Mar 15 – Apr 1  Multiple recognizers experiments
Apr 1 – Apr 15  Preparation for demo + if (we want) {write up paper for ICSLP}
Apr 16 – May 7  Search modification: tree copies implementation  Regression test + Adaptation, Milestone 2
May 7 – June 1  Sphinx 3.5 learning code development + s3.5 release (?)
