Post on 04-Jan-2016
SPECTRUM?
Hynek Hermansky
with Jordan Cohen, Sangita Sharma, and Pratibha Jain
/u/ /o/ /a/ // /iy/
Helmholtz
Radio Rex (1917)
“limited commercial success” (John Pierce, 1969)
beer
Newton
[Figure labels: axes time and frequency; classify; about 20 ms short-term spectrum]
SHORT TERM SPECTRUM
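To make the "about 20 ms" short-term spectrum concrete, here is a minimal sketch that slices a signal into 20 ms frames and takes the power spectrum of each frame. The 10 ms hop, the Hamming window, and the function name `short_term_spectrum` are assumptions chosen for illustration; the slides specify only the approximate frame length.

```python
import numpy as np

def short_term_spectrum(signal, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Return a (num_frames, num_bins) array of short-term power spectra.

    frame_ms follows the "about 20 ms" on the slide; hop_ms and the
    Hamming window are illustrative assumptions.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)
    # Cut overlapping frames and taper each one before the FFT.
    frames = np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(num_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# Usage: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)
spec = short_term_spectrum(tone, sample_rate)
```

Each row of `spec` is one spectral slice of the time-frequency plane; a conventional recognizer classifies such slices one at a time.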
Cortical receptive fields
[Figure axes: time, frequency]
ASR from TempoRAl Patterns (TRAP)
[Figure labels: phone “boundaries”; 1 sec window; temporal pattern of critical band energies; classify (twice); about 20 ms short-term spectrum]
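The idea of classifying a roughly 1 sec temporal pattern of critical band energies, rather than a 20 ms spectral slice, can be sketched as follows. This is only an illustration of how the input vectors might be cut out: it assumes a precomputed matrix of (log) critical band energies at a 10 ms frame step, so that 101 frames span about 1 sec. The function name `extract_traps` and the parameter `context_frames` are hypothetical, and the classifiers themselves are not shown.

```python
import numpy as np

def extract_traps(band_energies, context_frames=50):
    """Cut per-band temporal patterns out of a band-energy matrix.

    band_energies: array of shape (num_frames, num_bands), e.g. log
        critical band energies at an assumed 10 ms frame step.
    Returns an array of shape (num_centres, num_bands, 2*context_frames + 1):
    for every centre frame and every band, the energy trajectory of that
    single band across the surrounding window.
    """
    num_frames, num_bands = band_energies.shape
    length = 2 * context_frames + 1
    if num_frames < length:
        raise ValueError("not enough frames for one temporal pattern")
    centres = range(context_frames, num_frames - context_frames)
    # Transpose each slice so that the last axis runs over time.
    return np.stack([
        band_energies[c - context_frames : c + context_frames + 1].T
        for c in centres
    ])

# Usage with synthetic data: 3 s of 15 bands at a 10 ms step.
rng = np.random.default_rng(0)
energies = rng.normal(size=(300, 15))
traps = extract_traps(energies)
```

Here `traps[i, b]` is the 101-point trajectory of band `b` around one centre frame, which is the kind of vector a per-band classifier would receive.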
WHY 200-1000 ms?
• because that’s where the information is (coarticulation)
  – mutual info studies (Bilmes, Yang et al.)
• psychophysics of hearing
  – 200 ms “critical time window” (forward masking, perception of loudness, perception of gaps, …)
• physiology of hearing
  – time component of cortical receptive fields (Klein)
• because “it works”
  – ETSI Aurora work
[Figure axes: time, frequency; time span 200-1000 ms]
WHY narrow frequency bands?
• psychophysics of hearing
  – independence of processing within critical bands
• physiology of hearing
  – mechanical selectivity of cochlea
  – cortical receptive fields (e.g. Shamma)
• because “it works”
  – multi-band ASR (Bourlard and Dupont, Hermansky et al., …)
  – decrease in ASR accuracy for wider frequency spans (Jain and Hermansky, Eurospeech 2003)
[Figure axes: time, frequency; frequency span 1-3 Bark]
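To give a feel for what a span of 1-3 Bark means in Hz, the sketch below converts between Hz and Bark and computes the edges of a band of a given Bark width. The slides do not say which Bark formula is meant; this sketch assumes the approximation Bark = 6 asinh(f / 600) used in PLP analysis, and the helper names are hypothetical.

```python
import math

def hz_to_bark(freq_hz):
    """Hz to Bark using 6 * asinh(f / 600) (an assumed choice of formula)."""
    return 6.0 * math.asinh(freq_hz / 600.0)

def bark_to_hz(bark):
    """Inverse of hz_to_bark."""
    return 600.0 * math.sinh(bark / 6.0)

def band_edges_hz(centre_hz, width_bark):
    """Lower and upper edge in Hz of a band width_bark wide, centred on centre_hz."""
    centre = hz_to_bark(centre_hz)
    return (bark_to_hz(centre - width_bark / 2.0),
            bark_to_hz(centre + width_bark / 2.0))

# Usage: a 1 Bark and a 3 Bark band around 1 kHz.
narrow = band_edges_hz(1000.0, 1.0)
wide = band_edges_hz(1000.0, 3.0)
```

Because the scale is roughly logarithmic at higher frequencies, the same Bark width covers more Hz around a higher centre frequency.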
Which features?
• no knowledge is better than wrong knowledge
  – data cannot lie
  – speech evolved to be heard
• data-derived processing is consistent with human-like processing (minus the irrelevant components of the human cognitive processing)
[Figure labels: axes time and frequency; data-guided processing; features]
WHY data-guided processing?
• some function of class posteriors
  – class posteriors form the most efficient feature set [e.g. Fukunaga]
• posteriors of which classes?
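The slide leaves "some function of class posteriors" open. One choice often seen in tandem-style systems is to take the logarithm of the posteriors and decorrelate the result with PCA before passing it to a conventional back end. The sketch below illustrates that choice only; it is an assumption, not necessarily the function intended on the slide, and the name `posterior_features` and the synthetic posteriors are hypothetical.

```python
import numpy as np

def posterior_features(posteriors, num_components=None, floor=1e-10):
    """Map per-frame class posteriors to decorrelated features.

    posteriors: array of shape (num_frames, num_classes), rows sum to 1.
    Assumed recipe: log of the posteriors, mean removal, then projection
    onto the principal axes of the log-posterior covariance.
    """
    log_post = np.log(np.maximum(posteriors, floor))
    centred = log_post - log_post.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; keep the largest-variance axes first.
    order = np.argsort(eigvals)[::-1]
    basis = eigvecs[:, order]
    if num_components is not None:
        basis = basis[:, :num_components]
    return centred @ basis

# Usage with synthetic posteriors for 8 classes over 500 frames.
rng = np.random.default_rng(0)
logits = rng.normal(size=(500, 8))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
feats = posterior_features(posteriors, num_components=5)
```

The log spreads out posterior values that are bunched near 0 and 1, and the PCA step yields roughly uncorrelated features, which suits back ends that assume diagonal covariances.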
[Figure labels: axes time and frequency; data-guided (trained on data) processing; features]
Speech Events
[Figure labels: signal; frequency-selective hearing; event detection (twice); p(event, frequency); class (phoneme?) detection; axes time and frequency; data processing (trained system); processing (trained system); some function of phoneme posteriors]
TRAP TANDEM
[Figure labels: data processing (trained system); class posteriors]