in collaboration with
Hualin Gao, Richard Duncan, Julie A. Baca, Joseph Picone
Human and Systems Engineering
Center for Advanced Vehicular Systems
Mississippi State University
SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION
Presented by Richard Duncan
Tablet PC
Microsoft Corporation
Page 2 of 38Signal Processing Tools for Speech Recognition
WHICH TWO ARE THE SAME PHONEME?
We need to extract meaningful information from the signal for a speech recognition system to model.
WHICH TWO ARE THE SAME PHONEME?
a: “ow” b: “aa” c: “ow”
WHAT IS AN ACOUSTIC FRONT-END?
It encapsulates the signal processing of a speech recognition system.
It computes a sequence of feature vectors from an audio stream.
These vectors are then processed by HMMs, neural networks, or other classifiers.
WHY REINVENT THE WHEEL?
A Front-end has many areas of complexity:
•Run-time efficiency
•File I/O
•Data management (framing)
•DSP algorithm complexity
•Algorithm re-use
Our system shields the researcher or student from these mundane issues so that he or she can focus on the algorithms.
DATA FRAMING
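[Figure: consecutive frames n and n+1 with their analysis windows n and n+1. The windows overlap, so each window contains new data plus data shared with the neighboring window.]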
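The framing idea in the figure can be sketched in a few lines of standard C++. This is a minimal illustration, not the ISIP implementation: the function name `frameSignal` and its parameters `frame_len` (new samples per frame) and `window_len` (samples per analysis window) are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the ISIP library): split a signal into
// overlapping analysis windows.
//   frame_len  = number of new samples contributed by each frame (the hop)
//   window_len = number of samples in each window (must be >= frame_len)
// Consecutive windows share (window_len - frame_len) samples.
std::vector<std::vector<float>> frameSignal(const std::vector<float>& signal,
                                            std::size_t frame_len,
                                            std::size_t window_len) {
  std::vector<std::vector<float>> windows;
  if (frame_len == 0 || window_len < frame_len) return windows;
  for (std::size_t start = 0; start + window_len <= signal.size();
       start += frame_len) {
    const auto first = signal.begin() + static_cast<std::ptrdiff_t>(start);
    const auto last = first + static_cast<std::ptrdiff_t>(window_len);
    windows.emplace_back(first, last);  // copy one window of samples
  }
  return windows;
}
```

With `frame_len = 2` and `window_len = 4`, each window repeats the last two samples of the previous one, which is the "shared data" region in the figure. Trailing samples that do not fill a complete window are dropped in this sketch.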
FEATURES OF ISIP FOUNDATION CLASSES
• Efficient memory management and tracking;
• System and I/O libraries that abstract details of the operating system;
• Math classes that provide basic linear algebra and efficient matrix manipulations;
• Generic data structures;
• Built-in unit tests to verify component correctness.
DESIGN REQUIREMENTS
• A library of standard algorithms provides basic digital signal processing (DSP) functions;
• New algorithms can be added without modifying existing classes;
• A block diagram tool allows rapid prototyping without programming or recompiling;
• The same system is used for offline feature extraction, recognition, and general DSP work.
BASIC DIGITAL PROCESSING FUNCTIONS
This example shows how to use the basic digital signal processing functions. It computes the energy of the input vector in dB using the SUM algorithm:
  // declare an Energy object, input vector, and output vector
  Energy egy;
  VectorFloat output;
  VectorFloat input(L"0, 1, 2");

  // choose algorithm
  egy.setAlgorithm(Energy::SUM);

  // choose implementation
  egy.setImplementation(Energy::DB);

  // compute the energy of input data
  egy.compute(output, input);
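For readers without the ISIP classes at hand, the same computation can be approximated in standard C++. This is a sketch under stated assumptions: it takes the SUM algorithm to mean the sum of squared samples and the DB implementation to mean 10·log10 of that sum. The function name `energySumDb` and the floor value are hypothetical and are not taken from the library.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for Energy with SUM + DB (assumed semantics):
// sum of squared samples, converted to decibels as 10 * log10(energy).
// floor_value is an assumption used only to avoid taking log10(0).
float energySumDb(const std::vector<float>& input, float floor_value = 1e-10f) {
  double sum_sq = 0.0;
  for (float x : input) sum_sq += static_cast<double>(x) * x;
  if (sum_sq < floor_value) sum_sq = floor_value;
  return static_cast<float>(10.0 * std::log10(sum_sq));
}
```

For the input vector 0, 1, 2 used on the slide, the sum of squares is 5, so this sketch returns about 6.99 dB. The library's own scaling and flooring may differ.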
ADDING NEW ALGORITHMS
class AlgorithmBase {
  // Processing:
  virtual boolean init();
  virtual boolean apply();

  // Configuration:
  virtual const String& className() const;
  virtual long getLeadingPad() const;
  virtual long getTrailingPad() const;
  virtual CMODE getOutputMode() const;
  virtual float getOutputSampleFrequency() const;
  virtual boolean setParser();

  // Debugging:
  boolean displayStart();
  boolean displayFinish();
  boolean displayChannel();
  boolean display();
};
• Interface contract allows extensibility to new algorithms;
• All algorithms are classes that implement this interface;
• Most have a default implementation.
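The extensibility pattern described above can be illustrated with a simplified, self-contained interface. This is a sketch, not the ISIP `AlgorithmBase`: the names `AlgorithmSketch` and `HammingWindowSketch` are hypothetical, and `std::vector<float>` stands in for the library's own vector types.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified interface contract (not the ISIP AlgorithmBase).
class AlgorithmSketch {
 public:
  virtual ~AlgorithmSketch() = default;

  // Processing: every algorithm must implement this.
  virtual bool apply(std::vector<float>& output,
                     const std::vector<float>& input) = 0;

  // Configuration: defaults that most algorithms can inherit unchanged.
  virtual std::string className() const { return "AlgorithmSketch"; }
  virtual long getLeadingPad() const { return 0; }
  virtual long getTrailingPad() const { return 0; }
};

// A new algorithm is added by deriving a class; no existing class changes.
class HammingWindowSketch : public AlgorithmSketch {
 public:
  bool apply(std::vector<float>& output,
             const std::vector<float>& input) override {
    const std::size_t n = input.size();
    output.resize(n);
    const double pi = std::acos(-1.0);
    for (std::size_t i = 0; i < n; ++i) {
      // Standard Hamming window: 0.54 - 0.46 * cos(2*pi*i / (N - 1)).
      const double w =
          (n > 1) ? 0.54 - 0.46 * std::cos(2.0 * pi * i / (n - 1)) : 1.0;
      output[i] = static_cast<float>(w * input[i]);
    }
    return true;
  }
  std::string className() const override { return "HammingWindowSketch"; }
};
```

Code that holds an `AlgorithmSketch&` can call `apply()` without knowing the concrete class, which is what lets a front-end accept new algorithms without modification. The padding methods are inherited from the base class here, mirroring the "most have a default implementation" point.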
ADDING NEW ALGORITHMS
boolean Energy::init() { }
const String& className() const { return CLASS_NAME; }
long getLeadingPad() const { return 0; }
long getTrailingPad() const { return 0; }
boolean apply(Vector<AlgorithmData>& output, const Vector<AlgorithmData>& input) {
  // determine what channel to operate on
  …
  if (algorithm_d == SUM) {
    computeSum(output(0).makeVectorFloat(), input(0).getVectorFloat());
  }
  …
}
ADDING NEW ALGORITHMS
boolean Energy::computeSum(VectorFloat& output_a, const VectorFloat& input_a) {
  // compute the sum of squares
  Float e = input_a.sumSquare();

  // compute the scale factor according to specified implementation
  float scaled_energy = scale(e, input_a.length());

  // the length of the output vector should be 1 as it only contains the energy
  output_a.setLength(1);

  // assign the value of energy to the output
  output_a(0) = Integral::max(floor_d, scaled_energy);

  // exit gracefully
  return true;
}
DEFINITIONS
Algorithm:
• Input and output are arrays of floating-point numbers
• Corresponds to a basic DSP operation
Recipe:
• A collection of algorithms that are run serially: the output of algorithm A(n-1) is the input to algorithm A(n)
• Named inputs and outputs
• Allows reuse of processing blocks between systems
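The recipe definition above amounts to function composition over arrays of floats. The following is a minimal sketch of that idea in standard C++. The names `RecipeSketch`, `Stage`, `squareEach`, and `sumAll` are hypothetical and are used only for illustration; they are not ISIP classes.

```cpp
#include <functional>
#include <string>
#include <vector>

using Signal = std::vector<float>;
using Stage = std::function<Signal(const Signal&)>;

// Hypothetical recipe: a named input, a named output, and an ordered
// list of processing stages.
struct RecipeSketch {
  std::string input_name;
  std::string output_name;
  std::vector<Stage> stages;

  // Run the stages serially: the output of stage n-1 feeds stage n.
  Signal run(Signal data) const {
    for (const Stage& stage : stages) data = stage(data);
    return data;
  }
};

// Two toy stages used only for illustration.
inline Signal squareEach(const Signal& in) {
  Signal out;
  for (float x : in) out.push_back(x * x);
  return out;
}

inline Signal sumAll(const Signal& in) {
  float total = 0.0f;
  for (float x : in) total += x;
  return Signal{total};
}
```

Chaining `squareEach` and `sumAll` in one `RecipeSketch` yields a sum-of-squares energy, and the same stages could be reused in another recipe, which is the reuse benefit the slide points to.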
HIERARCHY OF ALGORITHM CLASSES
[Figure: class hierarchy diagram. The following classes inherit from AlgorithmBase: FourierTransform, Energy, Constant, Calculus, Generator, Prediction, Window, Cepstrum, Filter, Reflection, Statistics, Math, Correlation, FilterBank, Mask, Covariance. A Recipe has an input name and an output name and contains (at runtime) Algorithm 1, Algorithm 2, ..., Algorithm n.]
FRONT-END CONFIGURATION TOOL
• Design a front-end by creating a block diagram;
• Allows rapid prototyping of ideas;
• New modules can easily be added to the system;
• The resulting parameter file is then the input to a full speech recognition system.
RESPONSIBILITIES OF THE UTILITY
• Parses the file containing the recipe created in the configuration tool;
• Synchronizes different paths along the block flow diagram contained in the recipe;
• Prepares input and output data buffers for each algorithm;
• Schedules the sequence of required signal processing operations;
• Processes data through the recipe;
• Manages large collections of data files.
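One common way to schedule the blocks of a flow diagram so that every block runs after the blocks that feed it is a topological sort. The sketch below uses Kahn's algorithm purely as an illustration; the slides do not say which scheduling method the utility uses, and the function name `scheduleBlocks` and the integer block numbering are assumptions made for this example.

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical scheduler (not the ISIP utility): blocks are numbered
// 0..num_blocks-1 and each edge (from, to) means that block `to` consumes
// the output of block `from`. Returns an execution order in which every
// block runs after all of its inputs, or an empty vector if the diagram
// contains a cycle.
std::vector<int> scheduleBlocks(int num_blocks,
                                const std::vector<std::pair<int, int>>& edges) {
  std::vector<std::vector<int>> consumers(num_blocks);
  std::vector<int> pending_inputs(num_blocks, 0);
  for (const auto& e : edges) {
    consumers[e.first].push_back(e.second);
    ++pending_inputs[e.second];
  }

  // Blocks with no unprocessed inputs are ready to run.
  std::queue<int> ready;
  for (int b = 0; b < num_blocks; ++b) {
    if (pending_inputs[b] == 0) ready.push(b);
  }

  std::vector<int> order;
  while (!ready.empty()) {
    const int b = ready.front();
    ready.pop();
    order.push_back(b);
    for (int c : consumers[b]) {
      if (--pending_inputs[c] == 0) ready.push(c);
    }
  }

  // If some block never became ready, the diagram has a cycle.
  if (order.size() != static_cast<std::size_t>(num_blocks)) order.clear();
  return order;
}
```

In a diagram where block 0 feeds blocks 1 and 2, and both feed block 3, block 3 is scheduled only after both parallel paths have produced their output, which is the synchronization requirement listed above.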
VERIFICATION STRATEGY
• Correctness: The implementation of each algorithm is verified manually or against other tools such as MATLAB.
• Usability: The usability of our tools was assessed and improved through extensive user testing conducted over the course of several training sessions.
• Speech recognition experiments: The correctness of the tools was also verified by speech recognition experiments.
STATE-OF-THE-ART FEATURES
• Mel-frequency cepstral coefficients (MFCCs);
• Cepstral mean subtraction;
• Energy normalization;
• 1st and 2nd order differential features;
• These features are used by most commercial speech recognition systems.
EXPERIMENTAL RESULTS
[Figure: bar chart of word error rate, WER (%), on an axis from 0 to 9, comparing the New and Baseline front-ends on the WSJ0 and TIDIGITS experiments.]
CONCLUSION
• The front-end performs signal processing for speech recognition systems;
• The ISIP front-end is built on an extensible library of basic DSP building blocks;
• A block diagram interface is used to configure the front-end data flow;
• The tool’s usability was optimized through multiple training sessions with new users;
• The system’s correctness was verified through speech recognition experiments.