
in collaboration with

Hualin Gao, Richard Duncan, Julie A. Baca, Joseph Picone

Human and Systems Engineering, Center for Advanced Vehicular Systems

Mississippi State University

SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION

Presented by Richard Duncan

Tablet PC, Microsoft Corporation


WHICH TWO ARE THE SAME PHONEME?

We need to extract meaningful information from the signal for a speech recognition system to model.


WHICH TWO ARE THE SAME PHONEME?

a: “ow” b: “aa” c: “ow”


WHAT IS AN ACOUSTIC FRONT-END?

It encapsulates the signal processing of a speech recognition system.

It computes a sequence of feature vectors from an audio stream.

These vectors are then processed by HMMs, neural networks, or other classifiers.


WHY REINVENT THE WHEEL?

A front-end has many areas of complexity:

•Run-time efficiency

•File I/O

•Data management (framing)

•DSP algorithm complexity

•Algorithm re-use

Our system shields the researcher or student from these mundane issues so that he or she can focus on the algorithms.


DATA FRAMING

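[Figure: consecutive frames n and n+1 with their analysis windows, window n and window n+1. Labels distinguish the new data in each window from the data it shares with the neighboring window.]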
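The framing idea can be sketched outside the ISIP classes. The helper below is hypothetical (`frameSignal`, `frame_len`, and `window_len` are illustrative names, not ISIP API), and it assumes each window starts at the beginning of its frame; the slide does not say how windows are aligned to frames, and real front-ends often center them instead.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the ISIP foundation classes.
// Each frame contributes frame_len new samples. Each window spans window_len
// samples beginning at the frame start, so when window_len > frame_len,
// consecutive windows share (window_len - frame_len) samples.
std::vector<std::vector<float>> frameSignal(const std::vector<float>& signal,
                                            std::size_t frame_len,
                                            std::size_t window_len) {
  std::vector<std::vector<float>> windows;
  if (frame_len == 0 || window_len == 0) {
    return windows;
  }
  // advance one frame at a time; stop when a full window no longer fits
  for (std::size_t start = 0; start + window_len <= signal.size();
       start += frame_len) {
    const auto first = signal.begin() + static_cast<std::ptrdiff_t>(start);
    windows.emplace_back(first,
                         first + static_cast<std::ptrdiff_t>(window_len));
  }
  return windows;
}
```

With a 10-sample signal, `frame_len = 2`, and `window_len = 4`, this yields four windows, and the last two samples of each window reappear as the first two samples of the next one.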


FEATURES OF ISIP FOUNDATION CLASSES

• Efficient memory management and tracking;

• System and I/O libraries that abstract details of the operating system;

• Math classes that provide basic linear algebra and efficient matrix manipulations;

• Generic data structures;

• Built-in unit tests to verify component correctness.


DESIGN REQUIREMENTS

• A library of standard algorithms provides basic digital signal processing (DSP) functions;

• New algorithms can be added without modifying existing classes;

• A block diagram tool allows rapid prototyping without programming or recompiling;

• The same system is used for offline feature extraction, recognition, and general DSP work.


BASIC DIGITAL PROCESSING FUNCTIONS

This example shows how to use one of the basic digital signal processing functions. It computes the energy of an input vector in dB using the SUM algorithm:

// declare an Energy object, input vector, and output vector
Energy egy;
VectorFloat output;
VectorFloat input(L"0, 1, 2");

// choose algorithm
egy.setAlgorithm(Energy::SUM);

// choose implementation
egy.setImplementation(Energy::DB);

// compute the energy of input data
egy.compute(output, input);


ADDING NEW ALGORITHMS

class AlgorithmBase {
  // Processing:
  virtual boolean init();
  virtual boolean apply();

  // Configuration:
  virtual const String& className() const;
  virtual long getLeadingPad() const;
  virtual long getTrailingPad() const;
  virtual CMODE getOutputMode() const;
  virtual float getOutputSampleFrequency() const;
  virtual boolean setParser();

  // Debugging:
  boolean displayStart();
  boolean displayFinish();
  boolean displayChannel();
  boolean display();
};

• Interface contract allows extensibility to new algorithms;

• All algorithms are classes that implement this interface;

• Most have a default implementation.



boolean Energy::init() { }

const String& className() const { return CLASS_NAME; }

long getLeadingPad() const { return 0; }

long getTrailingPad() const { return 0; }

boolean apply(Vector<AlgorithmData> output, Vector<AlgorithmData> input) {
  // determine what channel to operate on
  …
  if (algorithm_d == SUM) {
    computeSum(output(0).makeVectorFloat(), input(0).getVectorFloat());
  }
  …
}



boolean Energy::computeSum(VectorFloat& output_a, const VectorFloat& input_a) {

  // compute the sum of squares
  Float e = input_a.sumSquare();

  // compute the scale factor according to specified implementation
  float scaled_energy = scale(e, input_a.length());

  // the length of the output vector should be 1 as it only contains the energy
  output_a.setLength(1);

  // assign the value of energy to the output
  output_a(0) = Integral::max(floor_d, scaled_energy);

  // exit gracefully
  return true;
}
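The arithmetic behind computeSum can be tried without the ISIP classes. The function below is a minimal stand-alone sketch, not the ISIP implementation: `energySumDb` and `floor_db` are hypothetical names, and it assumes the DB implementation means 10·log10 of the sum of squares with no normalization by vector length. The actual `scale()` routine may behave differently.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical stand-alone version of a sum-of-squares energy in dB.
// Assumption: dB value = 10 * log10(sum of squares), floored at floor_db.
float energySumDb(const std::vector<float>& input, float floor_db) {
  // compute the sum of squares in double precision
  double sum_sq = 0.0;
  for (float x : input) {
    sum_sq += static_cast<double>(x) * static_cast<double>(x);
  }

  // log10 is undefined at zero, so fall back to the floor
  if (sum_sq <= 0.0) {
    return floor_db;
  }

  // convert to dB and apply the floor, mirroring the max() in computeSum
  const float db = static_cast<float>(10.0 * std::log10(sum_sq));
  return std::max(floor_db, db);
}
```

For the input vector 0, 1, 2 used in the earlier example, the sum of squares is 5, so under these assumptions the result is 10·log10(5), roughly 6.99 dB.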


DEFINITIONS

Algorithm:

• Input and output are arrays of floating-point numbers

• Corresponds to a basic DSP principle

Recipe:

• Collection of algorithms that are run serially: the output of algorithm A(n-1) is the input to algorithm A(n)

• Named inputs and outputs

• Allows reuse of processing blocks between systems
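The two definitions above can be illustrated with a small sketch. It is not the ISIP Recipe class: `Vec`, `Algorithm`, `Recipe`, `squareEach`, and `sumAll` are hypothetical names chosen for illustration, and an algorithm is modeled simply as a function from one array of floats to another.

```cpp
#include <functional>
#include <string>
#include <vector>

using Vec = std::vector<float>;
// Illustrative stand-in for an algorithm: float array in, float array out.
using Algorithm = std::function<Vec(const Vec&)>;

// Illustrative stand-in for a recipe: named input and output plus a list of
// algorithms that are run serially.
struct Recipe {
  std::string input_name;
  std::string output_name;
  std::vector<Algorithm> steps;

  // Feed the output of step n-1 into step n.
  Vec run(const Vec& input) const {
    Vec data = input;
    for (const Algorithm& step : steps) {
      data = step(data);
    }
    return data;
  }
};

// Two toy algorithms used to build an energy-like recipe.
Vec squareEach(const Vec& in) {
  Vec out;
  out.reserve(in.size());
  for (float x : in) {
    out.push_back(x * x);
  }
  return out;
}

Vec sumAll(const Vec& in) {
  float total = 0.0f;
  for (float x : in) {
    total += x;
  }
  return Vec{total};
}
```

A recipe built as `Recipe r{"samples", "energy", {squareEach, sumAll}}` maps the input 0, 1, 2 to a one-element output holding 5. Because the steps are ordinary values, the same `squareEach` block can be reused in another recipe without modification.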


HIERARCHY OF ALGORITHM CLASSES

[Figure: class hierarchy diagram. The following classes inherit from AlgorithmBase: FourierTransform, Energy, Constant, Calculus, Generator, Prediction, Window, Cepstrum, Filter, Reflection, Statistics, Math, Correlation, FilterBank, Mask, Covariance. A Recipe, with an input name and an output name, contains Algorithm 1, Algorithm 2, ..., Algorithm n at runtime.]


FRONT-END CONFIGURATION TOOL

• Design a front-end by creating a block diagram;

• Allows rapid prototyping of ideas;

• New modules can easily be added to the system;

• The resulting parameter file is then the input to a full speech recognition system.



RESPONSIBILITIES OF THE UTILITY

• Parses the file containing the recipe created in the configuration tool;

• Synchronizes different paths along the block flow diagram contained in the recipe;

• Prepares input and output data buffers for each algorithm;

• Schedules the sequence of required signal processing operations;

• Processes data through the recipe;

• Manages large collections of data files.
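The slides do not say how the utility schedules operations. One common way to order the blocks of a flow diagram so that every block runs after its inputs are ready is a topological sort, sketched below with Kahn's algorithm. The function name `scheduleBlocks` and the adjacency-list representation are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical scheduler sketch, not the ISIP utility.
// consumers[i] lists the blocks that read the output of block i; every index
// is assumed to be smaller than consumers.size().
// Returns an execution order, or an empty vector if the diagram has a cycle.
std::vector<std::size_t> scheduleBlocks(
    const std::vector<std::vector<std::size_t>>& consumers) {
  const std::size_t n = consumers.size();

  // count how many inputs each block is still waiting for
  std::vector<std::size_t> pending(n, 0);
  for (const auto& outs : consumers) {
    for (std::size_t c : outs) {
      ++pending[c];
    }
  }

  // blocks with no pending inputs can run immediately
  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < n; ++i) {
    if (pending[i] == 0) {
      ready.push(i);
    }
  }

  std::vector<std::size_t> order;
  while (!ready.empty()) {
    const std::size_t block = ready.front();
    ready.pop();
    order.push_back(block);
    // running this block satisfies one input of each consumer
    for (std::size_t c : consumers[block]) {
      if (--pending[c] == 0) {
        ready.push(c);
      }
    }
  }

  // some block never became ready, so the diagram contains a cycle
  if (order.size() != n) {
    order.clear();
  }
  return order;
}
```

For a diagram in which block 0 feeds blocks 1 and 2, and both of those feed block 3, the two parallel paths are merged only after both have produced their output, which is the synchronization problem the list above mentions.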


VERIFICATION STRATEGY

• Correctness: The implementation of each algorithm is verified manually or against other tools such as MATLAB.

• Usability: The usability of the tools was assessed and improved through extensive user testing conducted over the course of several training sessions.

• Speech recognition experiments: The correctness of the tools was also verified by speech recognition experiments.


STATE-OF-THE-ART FEATURES

• Mel-frequency cepstral coefficients (MFCCs);

• Cepstral mean subtraction;

• Energy normalization;

• 1st and 2nd order differential features;

• These features are used by most commercial speech recognition systems.
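Two of the listed features are simple enough to sketch directly. The code below is an illustration under stated assumptions, not the ISIP implementation: `Frame`, `subtractMean`, and `delta` are hypothetical names, all frames are assumed to have the same dimension, and the differential is a plain central difference with clamped edges, whereas production systems often use a regression over a longer window.

```cpp
#include <cstddef>
#include <vector>

using Frame = std::vector<float>;  // one feature vector, e.g. cepstral coefficients

// Cepstral mean subtraction: remove the per-coefficient mean taken over all
// frames. Assumes every frame has the same dimension.
void subtractMean(std::vector<Frame>& frames) {
  if (frames.empty()) {
    return;
  }
  const std::size_t dim = frames[0].size();
  std::vector<double> mean(dim, 0.0);
  for (const Frame& f : frames) {
    for (std::size_t i = 0; i < dim; ++i) {
      mean[i] += f[i];
    }
  }
  for (std::size_t i = 0; i < dim; ++i) {
    mean[i] /= static_cast<double>(frames.size());
  }
  for (Frame& f : frames) {
    for (std::size_t i = 0; i < dim; ++i) {
      f[i] -= static_cast<float>(mean[i]);
    }
  }
}

// First-order differential features as a central difference,
// (x[t+1] - x[t-1]) / 2, reusing the edge frame at the boundaries.
std::vector<Frame> delta(const std::vector<Frame>& frames) {
  const std::size_t n = frames.size();
  std::vector<Frame> out(n);
  for (std::size_t t = 0; t < n; ++t) {
    const Frame& prev = frames[t == 0 ? 0 : t - 1];
    const Frame& next = frames[t + 1 < n ? t + 1 : n - 1];
    out[t].resize(frames[t].size());
    for (std::size_t i = 0; i < out[t].size(); ++i) {
      out[t][i] = 0.5f * (next[i] - prev[i]);
    }
  }
  return out;
}
```

Under this sketch, second-order features are obtained by applying `delta` to the output of `delta`.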


EXPERIMENTAL RESULTS

[Figure: chart of word error rate, WER (%), on a 0 to 9 scale, comparing the New and Baseline series for the WSJ0 and TIDIGITS experiments.]


CONCLUSION

• The front-end performs signal processing for speech recognition systems;

• The ISIP front-end is implemented on an extensible library of basic DSP building blocks;

• A block diagram interface is used to configure the front-end data flow;

• The tool’s usability was optimized through multiple training sessions with new users;

• The system’s correctness was verified through speech recognition experiments.
