SPECTRUM? Hynek Hermansky with Jordan Cohen, Sangita Sharma, and Pratibha Jain



Page 1

SPECTRUM?

Hynek Hermansky with Jordan Cohen, Sangita Sharma, and Pratibha Jain

Page 2

/u/ /o/ /a/ // /iy/

Helmholtz

Radio Rex (1917)

“limited commercial success” (John Pierce, 1969)

beer

Newton

Page 3

SHORT TERM SPECTRUM

[Figure: spectrogram (axes: time, frequency). A short-term spectrum of about 20 ms is taken from the signal and passed to "classify".]
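The short-term spectrum on this slide can be illustrated with a minimal sketch. It assumes a Hamming taper, a 10 ms frame step, and an 8 kHz test tone; none of these are stated on the slide, only the window length of about 20 ms is. The function name `short_term_spectra` is hypothetical.

```python
import numpy as np

def short_term_spectra(signal, sample_rate, window_ms=20.0, step_ms=10.0):
    """Return magnitude spectra, one row per analysis frame.

    window_ms follows the slide ("about 20 ms"); step_ms and the
    Hamming taper are assumptions made for this sketch.
    """
    win = int(round(sample_rate * window_ms / 1000.0))
    step = int(round(sample_rate * step_ms / 1000.0))
    if len(signal) < win:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (len(signal) - win) // step
    taper = np.hamming(win)
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack(
        [signal[i * step : i * step + win] * taper for i in range(n_frames)]
    )
    # One magnitude spectrum per frame (non-negative frequencies only).
    return np.abs(np.fft.rfft(frames, axis=1))

# Usage: a 1 kHz tone sampled at 8 kHz (illustrative values).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)
spectra = short_term_spectra(tone, sample_rate)
window_len = int(round(sample_rate * 20.0 / 1000.0))
peak_hz = np.argmax(spectra[0]) * sample_rate / window_len
```

Each row of `spectra` is what a conventional recognizer would classify on its own, one frame at a time.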

Page 4

Cortical receptive fields

Page 5

ASR from TempoRAl Patterns (TRAP)

[Figure: spectrogram (axes: time, frequency) with phone “boundaries” marked. Two paths to "classify" are shown: a short-term spectrum of about 20 ms, and a 1 sec window of the temporal pattern of critical band energies.]
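The difference between the two paths can be sketched as follows: instead of one spectral slice, a TRAP input is the trajectory of a single critical band over about 1 sec. The 10 ms frame step, the mean removal, and the name `trap_vector` are assumptions for illustration; the slide only gives the window length.

```python
import numpy as np

def trap_vector(band_energies, band, center, half_span=50):
    """Temporal pattern of one critical band around frame `center`.

    band_energies: array of shape (n_bands, n_frames).
    With an assumed 10 ms frame step, half_span=50 gives
    2 * 50 + 1 = 101 frames, i.e. roughly 1 s of context.
    """
    lo, hi = center - half_span, center + half_span + 1
    if lo < 0 or hi > band_energies.shape[1]:
        raise IndexError("not enough context around the center frame")
    pattern = band_energies[band, lo:hi]
    # Mean removal is an assumed normalization, not taken from the slide.
    return pattern - pattern.mean()

# Usage with synthetic data: 15 bands, 300 frames (illustrative sizes).
rng = np.random.default_rng(0)
energies = rng.normal(size=(15, 300))
trap = trap_vector(energies, band=4, center=150)
```

The resulting 101-point vector for one band plays the role that the 20 ms spectral slice plays in the conventional path.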

Page 6

WHY 200-1000 ms?

• because that’s where the information is (coarticulation)
  – mutual info studies (Bilmes, Yang et al.)
• psychophysics of hearing
  – 200 ms “critical time window” (forward masking, perception of loudness, perception of gaps, …)
• physiology of hearing
  – time component of cortical receptive fields (Klein)
• because “it works”
  – ETSI Aurora work

[Figure: time-frequency plane with a 200 – 1000 ms span marked along the time axis.]

Page 7

WHY narrow frequency bands?

• psychophysics of hearing
  – independence of processing within critical bands
• physiology of hearing
  – mechanical selectivity of cochlea
  – cortical receptive fields (e.g. Shamma)
• because “it works”
  – multi-band ASR (Bourlard and Dupont, Hermansky et al., …)
  – decrease in ASR accuracy for wider frequency spans (Jain and Hermansky, Eurospeech 2003)

[Figure: time-frequency plane with a 1-3 Bark span marked along the frequency axis.]
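To give a feel for what a 1-3 Bark span means in Hz, here is a small sketch. It assumes the Schroeder-style approximation Bark = 6 asinh(f / 600), as used in PLP analysis; the slide does not say which Bark formula is intended. The function names and the 1000 Hz center frequency are illustrative.

```python
import math

def hz_to_bark(f_hz):
    """Frequency in Hz to Bark (assumed approximation: 6 * asinh(f / 600))."""
    return 6.0 * math.asinh(f_hz / 600.0)

def bark_to_hz(z):
    """Inverse of hz_to_bark."""
    return 600.0 * math.sinh(z / 6.0)

def band_edges_hz(center_hz, width_bark):
    """Lower and upper edge in Hz of a band `width_bark` wide on the Bark axis."""
    z = hz_to_bark(center_hz)
    return bark_to_hz(z - width_bark / 2.0), bark_to_hz(z + width_bark / 2.0)

# Usage: bands of 1 and 3 Bark centered at 1000 Hz.
low_1, high_1 = band_edges_hz(1000.0, 1.0)
low_3, high_3 = band_edges_hz(1000.0, 3.0)
```

Because the Bark axis is warped, a band of fixed Bark width covers more Hz at high center frequencies than at low ones.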

Page 8

Which features?

• no knowledge is better than wrong knowledge
  – data cannot lie
  – speech evolved to be heard
• data-derived processing is consistent with human-like processing (minus the irrelevant components of the human cognitive processing)

[Figure: time-frequency plane → data-guided processing → features.]

Page 9

WHY data-guided processing?

• some function of class posteriors
  – class posteriors form the most efficient feature set [e.g. Fukunaga]
• posteriors of which classes?

[Figure: time-frequency plane → data-guided (trained on data) processing → features.]
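As a reminder of what "class posteriors" are, the sketch below applies Bayes' rule to a toy problem. The 1-D Gaussian class models, the three classes, and all numeric values are assumptions chosen for illustration; in the talk the posteriors come from a system trained on data, not from hand-set Gaussians.

```python
import numpy as np

def class_posteriors(x, means, stds, priors):
    """Bayes' rule: P(class | x) for 1-D Gaussian class models."""
    means, stds, priors = map(np.asarray, (means, stds, priors))
    # Log-likelihood of x under each class-conditional Gaussian.
    log_lik = -0.5 * ((x - means) / stds) ** 2 - np.log(stds * np.sqrt(2 * np.pi))
    log_joint = log_lik + np.log(priors)
    # Subtract the maximum before exponentiating for numerical stability.
    log_joint = log_joint - log_joint.max()
    joint = np.exp(log_joint)
    return joint / joint.sum()

# Usage: three equally likely classes, observation closest to the first.
post = class_posteriors(
    0.2, means=[0.0, 1.0, 3.0], stds=[1.0, 1.0, 1.0], priors=[1 / 3, 1 / 3, 1 / 3]
)
```

The vector `post` sums to one and has one entry per class, so its dimension is set by the number of classes rather than by the size of the input.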

Page 10

Speech Events

[Figure: block diagram. signal → frequency-selective hearing → event detection (two parallel blocks) → p(event, frequency) → class (phoneme?) detection.]

Page 11

TRAP TANDEM

[Figure: block diagram over the time-frequency plane. “data processing (trained system)” blocks produce class posteriors, which feed a further “processing (trained system)” block that outputs some function of phoneme posteriors.]
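The data flow of this diagram can be sketched in a few lines. Everything below is a placeholder: the weights are random rather than trained, each "trained system" is reduced to a single linear layer with a softmax, the sizes (15 bands, 101 frames, 29 classes) are hypothetical, and the logarithm is only one plausible choice for the "some function" of phoneme posteriors, which the slide does not specify.

```python
import numpy as np

def softmax(z):
    """Softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_bands, trap_len, n_classes = 15, 101, 29  # hypothetical sizes

# Stage 1: one classifier per frequency band (random, untrained stand-ins).
band_weights = rng.normal(scale=0.1, size=(n_bands, trap_len, n_classes))
# Stage 2: merging classifier over all band-level posteriors.
merger_weights = rng.normal(scale=0.1, size=(n_bands * n_classes, n_classes))

def trap_tandem_features(traps):
    """traps: (n_bands, trap_len) temporal patterns for one center frame."""
    # Each band is classified independently from its own temporal pattern.
    band_post = softmax(np.einsum("bt,btc->bc", traps, band_weights))
    # The band posteriors are concatenated and merged into one posterior vector.
    merged_post = softmax(band_post.reshape(-1) @ merger_weights)
    # Assumed "function of posteriors": the logarithm.
    return band_post, merged_post, np.log(merged_post)

# Usage with synthetic temporal patterns.
traps = rng.normal(size=(n_bands, trap_len))
band_post, merged_post, features = trap_tandem_features(traps)
```

The vector `features` is what would be handed to a downstream recognizer in place of short-term spectral features.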