Upload: jane-striker

Post on 31-Mar-2015

Page 1

Improved ASR in noise using harmonic decomposition

• Introduction

• Pitch-Scaled Harmonic Filter

• Recognition Experiments

• Results

• Conclusion

Figure: Production of /z/, showing the periodic contribution and the aperiodic contribution.

Page 2

Motivation & Aims

• Most speech sounds are predominantly voiced or unvoiced.

What happens when the two components are “mixed”?

• Voiced and unvoiced components have different natures:

unvoiced: aperiodic signal from turbulence-noise sources

voiced: quasi-periodic signal from vocal-fold vibration

Why not extract their features separately?

Do the two contributions contain complementary information?

• Human speech recognition still performs well in noise.

How? Does it take advantage of harmonic properties?

Introduction

Page 3

Voiced and unvoiced parts of a speech signal

Figure: Production of /z/, showing the periodic contribution and the aperiodic contribution.

Page 4

Automatic Speech Recognition

speech signal → Front End → Pattern Recognition → speech labels

Feature Extraction:

conversion of speech signals to a sequence of parameter vectors

Dynamic Programming:

matching of observation sequences to models of known utterances
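As a minimal sketch of the dynamic-programming step, the Python below uses dynamic time warping (DTW) to align an observation sequence with stored templates and to pick the closest one. DTW is swapped in here as a simple stand-in: the slides do not say which matcher was used, and HMM-based recognisers would normally use Viterbi decoding instead. The function names, the toy one-dimensional features and the template labels are all hypothetical.

```python
import math


def dtw_distance(obs, template):
    """Cumulative cost of the best monotonic alignment between two
    sequences of feature vectors (lists of equal-length float lists)."""
    inf = float("inf")
    n, m = len(obs), len(template)
    # cost[i][j] = best cost of aligning obs[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(obs[i - 1], template[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stay on the same template frame
                cost[i][j - 1],      # skip ahead in the template
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


def recognise(obs, templates):
    """Return the label whose template aligns to `obs` at lowest cost."""
    return min(templates, key=lambda label: dtw_distance(obs, templates[label]))


# Hypothetical one-dimensional "parameter vectors" for two known utterances.
templates = {
    "one": [[0.0], [1.0], [2.0]],
    "two": [[5.0], [5.0], [4.0]],
}
```

Calling `recognise(observation, templates)` returns the label of the best-matching template; the observation may be longer or shorter than the template because the alignment can stretch either sequence.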

Page 5

PSHF block diagram

Figure: two blocks, Pitch optimisation (labels: raw pitch f0raw, optimised pitch f0opt, Nopt) and Harmonic Decomposition. Signal labels: waveform s(n); window w(n); windowed signal sw(n); periodic estimate v̂w(n); aperiodic estimate ûw(n), formed at a +/− junction; output periodic waveform v(n) and aperiodic waveform u(n).

PSHF
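The idea behind the harmonic decomposition can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation. It assumes the analysis frame spans exactly four pitch periods (the `periods` parameter is an assumption), so that the harmonics of the pitch fall on every fourth DFT bin. It also omits the window w(n) and the pitch-optimisation stage shown in the block diagram. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def harmonic_split(frame, periods=4):
    """Split a frame spanning `periods` pitch periods into a periodic
    estimate (harmonic bins only) and an aperiodic remainder."""
    n = len(frame)
    if n % periods != 0:
        raise ValueError("frame length must be a multiple of `periods`")
    spectrum = dft(frame)
    # Harmonics of the pitch sit on bins 0, periods, 2*periods, ...
    comb = [spectrum[k] if k % periods == 0 else 0.0 for k in range(n)]
    # Inverse DFT of the comb-filtered spectrum gives the periodic part.
    periodic = [
        (sum(comb[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
    # The aperiodic part is whatever the harmonic bins did not explain.
    aperiodic = [s - v for s, v in zip(frame, periodic)]
    return periodic, aperiodic
```

Keeping only the harmonic bins is equivalent to averaging the four pitch periods in the frame, so the periodic estimate repeats exactly every pitch period and the two outputs always sum back to the input frame.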

Page 6

Decomposition example (waveforms)

Figure panels: Original, Periodic part, Aperiodic part.

Page 7

Decomposition example (spectrograms)

Figure panels: Original, Periodic part, Aperiodic part.

Page 8

Decomposition example (MFCC specs.)

Figure panels: Original, Periodic part, Aperiodic part.

Page 9

Parameterisations

BASE: waveform → MFCC → +Δ, +Δ2 → features

SPLIT: PSHF → MFCC → +Δ, +Δ2 → cat

PCA26, PCA78, PCA13, PCA39: PSHF, MFCC, +Δ, +Δ2, cat and PCA blocks

Method
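A rough Python sketch of the "+Δ, +Δ2" and "cat" steps follows. It assumes the common regression formula for delta coefficients, acceleration computed as the delta of the delta, and 13 static coefficients per stream (suggested by the names PCA13 and PCA39, but not stated on the slide). The function names are hypothetical, and MFCC extraction and PCA are left out.

```python
def deltas(frames, width=2):
    """Regression deltas: d_t = sum_k k*(c[t+k] - c[t-k]) / (2*sum_k k^2),
    repeating the first and last frames at the edges."""
    last = len(frames) - 1
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(len(frames)):
        row = []
        for i in range(len(frames[t])):
            num = sum(
                k * (frames[min(t + k, last)][i] - frames[max(t - k, 0)][i])
                for k in range(1, width + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamics(static):
    """Append delta and acceleration coefficients to each static frame."""
    d1 = deltas(static)
    d2 = deltas(d1)  # assumption: acceleration is the delta of the delta
    return [s + a + b for s, a, b in zip(static, d1, d2)]


def split_features(periodic_static, aperiodic_static):
    """'cat' step: concatenate the two streams frame by frame."""
    return [
        p + a
        for p, a in zip(add_dynamics(periodic_static), add_dynamics(aperiodic_static))
    ]
```

Under these assumptions each stream grows from 13 to 39 coefficients per frame, and the concatenated two-stream vector has 78.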

Page 10

Speech Database: Aurora 2.0

• TIdigits database at 8 kHz, filtered with G.712 channel

• Connected English digit strings (male & female speakers)

Group                               Signal-to-Noise Ratio (dB)
Train   clean condition
        multi-condition             20 15 10 5
Test    set A (same noises)         20 15 10 5 0 -5
        set B (different noises)    20 15 10 5 0 -5
        set C (different channel)   20 15 10 5 0 -5

Page 11

Description of the experiments

• Baseline experiment: [base]

standard parameterisation of the original waveforms (i.e., MFCC with delta and acceleration coefficients, MFCC+D+A)

• Split experiments: [split]

adjustment of stream weights (voiced vs. unvoiced)

• PCA experiments: [pca26, pca78, pca13 and pca39]

decorrelation of the feature vectors, and reduction of the number of coefficients

Page 12

Split experiments results

Results

Page 13

Split experiments results

Page 14

Split experiments results

Page 15

Summary of results

         Word Accuracy (%)        WER reduction (%)
         clean  multi  overall    abs.   rel.
base     52.6   78.3   65.4       --     --
split    77.9   89.1   83.0       17.6   50.9
pca26    71.2   88.8   78.8       13.4   38.7
pca78    61.9   88.1   74.7        9.3   26.9
pca13    72.6   87.6   79.7       14.3   41.3
pca39    70.9   87.5   78.8       13.4   38.7
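The abs. and rel. columns appear to follow from the overall word accuracies if word error rate is taken as 100 minus accuracy. The short Python check below makes that arithmetic explicit. The definition of WER used here is an assumption, and the function name is hypothetical; the numbers are the overall accuracies from the table.

```python
def wer_reduction(base_acc, new_acc):
    """Absolute and relative WER reduction (both in %), assuming
    WER = 100 - word accuracy."""
    base_wer = 100.0 - base_acc
    new_wer = 100.0 - new_acc
    absolute = base_wer - new_wer
    relative = 100.0 * absolute / base_wer
    return absolute, relative


# Overall word accuracies (%) taken from the summary table.
overall = {
    "base": 65.4,
    "split": 83.0,
    "pca26": 78.8,
    "pca78": 74.7,
    "pca13": 79.7,
    "pca39": 78.8,
}

for name, acc in overall.items():
    if name != "base":
        absolute, relative = wer_reduction(overall["base"], acc)
        print(f"{name}: abs. {absolute:.1f}, rel. {relative:.1f}")
```

For the split system, for example, the baseline error of 34.6% falls by 17.6 points, which is about half of the baseline error.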

Page 16

Conclusions

• PSHF module split Aurora’s speech waveforms into two synchronous streams (periodic and aperiodic).

• Used separately, the two streams slightly degraded accuracy; used together, they substantially increased it in noisy conditions.

• Periodic speech segments provide robustness to noise.

Further Work

• Apply Linear Discriminant Analysis (LDA) to the two-stream feature vector.

• Evaluate the performance of this front end in a more general task, such as phoneme recognition.

• Test the technique for speaker recognition.

Page 17

COLUMBO PROJECT: Harmonic Decomposition applied to ASR

David M. Moreno 1

Philip J.B. Jackson 2

Javier Hernando 1

Martin J. Russell 3

http://www.ee.surrey.ac.uk/Personal/P.Jackson/Columbo/

Page 18

Pitch Optimisation: vowel /u/

Cost function

Spectrum derived from a 268-point DFT

Page 19

Harmonic Decomposition: vowel /u/

Page 20

Word accuracy results (%)

Page 21

Observation probability, with stream weights
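A minimal sketch of a stream-weighted observation probability, assuming the common multi-stream form in which each stream's likelihood is raised to the power of its weight, so that the weights multiply the per-stream log-likelihoods. The single diagonal-covariance Gaussian per stream, the function names and the toy numbers are assumptions for illustration only; they are not taken from the slide.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def stream_weighted_log_prob(streams, models, weights):
    """log b(o) = sum over streams s of weight_s * log b_s(o_s)."""
    return sum(
        w * gaussian_log_pdf(o, model["mean"], model["var"])
        for o, model, w in zip(streams, models, weights)
    )


# Hypothetical state with one Gaussian per stream (periodic, aperiodic).
models = [
    {"mean": [0.0, 1.0], "var": [1.0, 2.0]},
    {"mean": [3.0], "var": [0.5]},
]
observation = [[0.2, 0.7], [2.5]]
```

With weights `[1.0, 1.0]` the result is the ordinary sum of the two stream log-likelihoods; setting a weight to `0.0` removes that stream from the score, which is one way to picture the adjustment of voiced against unvoiced weights in the split experiments.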