SPECTRUM?

Hynek Hermansky

with Jordan Cohen, Sangita Sharma, and Pratibha Jain

/u/ /o/ /a/ // /iy/

Helmholtz

Radio Rex (1917)

“limited commercial success” (John Pierce, 1969)

beer

Newton

[Figure: time-frequency plane (axes: time, frequency). Labels: short-term spectrum of about 20 ms; classify]
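As an illustration of the "about 20 ms short-term spectrum" on this slide, the following is a minimal sketch and not the authors' front end. The 10 ms frame step, the Hamming window, and the function name `short_term_spectra` are assumptions introduced here.

```python
import numpy as np

def short_term_spectra(signal, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Return power spectra of shape (n_frames, n_bins).

    frame_ms follows the slide (about 20 ms); hop_ms and the Hamming
    window are assumptions made for this sketch.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)
    # Cut the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop_len: i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    # Squared magnitude of the one-sided FFT of every frame.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# Usage: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)
spectra = short_term_spectra(tone, sample_rate)
```

Each row of `spectra` is one short-term spectrum, which is the unit a conventional frame-based classifier would see.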

SHORT TERM SPECTRUM

Cortical receptive fields

[Figure axes: time, frequency]

ASR from TempoRAl Patterns (TRAP)

[Figure labels: phone “boundaries”; 1 sec window; temporal pattern of critical band energies → classify; about 20 ms short-term spectrum → classify]
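The temporal pattern of critical band energies over a 1 sec window can be sketched as follows. This is an illustrative sketch only: the 10 ms frame step (so 101 frames span about 1 sec), the mean removal, the Hamming taper, and the name `trap_vectors` are assumptions, not details given on the slide.

```python
import numpy as np

def trap_vectors(band_energies, band, context_frames=50):
    """Cut temporal patterns for one critical band.

    band_energies: array (n_frames, n_bands) of log critical-band energies.
    Returns an array (n_frames - 2 * context_frames, 2 * context_frames + 1);
    row i is the trajectory centred on frame i + context_frames.
    With a 10 ms frame step (assumed), 101 frames cover about 1 sec.
    """
    trajectory = np.asarray(band_energies, dtype=float)[:, band]
    width = 2 * context_frames + 1
    n_out = len(trajectory) - width + 1
    if n_out <= 0:
        raise ValueError("need at least 2 * context_frames + 1 frames")
    patterns = np.stack([trajectory[i: i + width] for i in range(n_out)])
    # Assumed normalisation: remove the mean of each pattern, then taper.
    patterns = patterns - patterns.mean(axis=1, keepdims=True)
    return patterns * np.hamming(width)

# Usage with synthetic energies: 300 frames, 15 bands.
energies = np.arange(300 * 15, dtype=float).reshape(300, 15)
traps = trap_vectors(energies, band=3)
```

Each row of `traps` would be the input to a per-band classifier, in contrast to the frame-based classifier that sees one short-term spectrum across all bands.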

WHY 200-1000 ms?

• because that’s where the information is (coarticulation)
  – mutual info studies (Bilmes, Yang et al.)

• psychophysics of hearing
  – 200 ms “critical time window” (forward masking, perception of loudness, perception of gaps, …)

• physiology of hearing
  – time component of cortical receptive fields (Klein)

• because “it works”
  – ETSI Aurora work

[Figure: time-frequency plane (axes: time, frequency) with a 200-1000 ms span marked]

WHY narrow frequency bands?

• psychophysics of hearing
  – independence of processing within critical bands

• physiology of hearing
  – mechanical selectivity of cochlea
  – cortical receptive fields (e.g. Shamma)

• because “it works”
  – multi-band ASR (Bourlard and Dupont, Hermansky et al., …)
  – decrease in ASR accuracy for wider frequency spans (Jain and Hermansky, Eurospeech 2003)

[Figure: time-frequency plane (axes: time, frequency) with a 1-3 Bark span marked]
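To give a feel for how wide a 1-3 Bark span is in Hz, here is a small sketch. It uses Traunmüller's (1990) approximation of the Bark scale, which is a choice made for this example (the slides do not say which Bark formula was used), and the function names are hypothetical.

```python
def hz_to_bark(freq_hz):
    """Traunmueller (1990) approximation of the Bark scale (assumed here)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53

def bark_to_hz(bark):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (bark + 0.53) / (26.28 - bark)

def band_edges_hz(center_hz, width_bark):
    """Lower and upper edge in Hz of a band width_bark wide, centred on center_hz."""
    center = hz_to_bark(center_hz)
    return bark_to_hz(center - width_bark / 2.0), bark_to_hz(center + width_bark / 2.0)

# Usage: edges of 1 Bark and 3 Bark bands centred at 1 kHz.
narrow = band_edges_hz(1000.0, 1.0)
wide = band_edges_hz(1000.0, 3.0)
```

Comparing `narrow` and `wide` shows how quickly the span in Hz grows as the band widens from 1 to 3 Bark.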

Which features?

• no knowledge is better than wrong knowledge
  – data cannot lie
  – speech evolved to be heard

• data-derived processing is consistent with human-like processing (minus the irrelevant components of the human cognitive processing)

[Figure: time-frequency plane (axes: time, frequency) → data-guided processing → features]

WHY data-guided processing?

• some function of class posteriors
  – class posteriors form the most efficient feature set [e.g. Fukunaga]

• posteriors of which classes?

[Figure: time-frequency plane (axes: time, frequency) → data-guided processing (trained on data) → features]

Speech Events

[Figure: signal → frequency-selective hearing → event detection blocks → p(event, frequency) → class (phoneme?) detection; axes: time, frequency]

[Figure labels: TRAP TANDEM; data → processing (trained system) → some function of phoneme posteriors; data → processing (trained system) → class posteriors]
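One possible "function of phoneme posteriors" is sketched below: take the logarithm of the posteriors and decorrelate them with PCA. This particular choice (log followed by PCA), the posterior floor, and the name `posterior_features` are assumptions made for illustration; the slides do not specify the function. In this sketch the PCA basis is estimated on the same data it is applied to, whereas a real system would estimate it on training data.

```python
import numpy as np

def posterior_features(posteriors, n_components=None, floor=1e-10):
    """Map class posteriors (n_frames, n_classes) to decorrelated features.

    Assumed recipe: log of floored posteriors, mean removal, projection
    onto the principal axes of the covariance matrix.
    """
    log_post = np.log(np.maximum(posteriors, floor))
    centered = log_post - log_post.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; keep the largest ones first.
    order = np.argsort(eigvals)[::-1]
    basis = eigvecs[:, order[:n_components]]
    return centered @ basis

# Usage with synthetic posteriors: softmax of random scores, 10 classes.
rng = np.random.default_rng(0)
scores = rng.normal(size=(500, 10))
post = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
feats = posterior_features(post, n_components=5)
```

The resulting `feats` are approximately decorrelated, which suits a back end that models features with diagonal-covariance Gaussians.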
