Speech recognition with amplitude and frequency modulations:
Implications for cochlear implant design
• What’re AM and FM?
• What are their perceptual roles?
• Where to find it?
• Implications?
Fan-Gang Zeng
Kaibao Nie
Ginger Stickney
Ying-Yee Kong
Ashish Bhargave
Hongbin Chen
Michael Vongphoe
Janice Chang
What is fine structure?
• Rosen’s definition:
– Envelope (5-50 Hz)
– Periodicity (50-500 Hz)
– Fine structure (500-10,000 Hz)
• Hilbert’s definition:
– Temporal envelope
– Fine structure
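A minimal sketch of the Hilbert-style split listed above, assuming NumPy only. The function names `analytic_signal` and `hilbert_decompose` and the 1 kHz test tone are illustrative choices, not taken from the slides. The temporal envelope is the magnitude of the analytic signal, and the fine structure is the cosine of its phase.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (same construction as scipy.signal.hilbert)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def hilbert_decompose(x):
    """Return (temporal envelope, fine structure) of a real signal."""
    z = analytic_signal(x)
    envelope = np.abs(z)                  # slowly varying amplitude
    fine_structure = np.cos(np.angle(z))  # unit-amplitude carrier
    return envelope, fine_structure

# Illustrative input: a 1 kHz tone with a 4 Hz amplitude modulation.
fs = 16000
t = np.arange(fs) / fs
envelope_true = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
x = envelope_true * np.cos(2 * np.pi * 1000 * t)
envelope, fine = hilbert_decompose(x)
```

Multiplying `envelope` by `fine` gives back the input, which is what makes this a decomposition rather than a lossy analysis.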
[Figure: panels labeled Original, AM, Fine Structure, FM]
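Little math
• Flanagan (1980) “Parametric coding of speech spectra”

$$s(t) = \sum_{k=1}^{N} A_k(t)\cos\left[2\pi f_{ck}t + 2\pi\int_0^t g_k(\tau)\,d\tau + \theta_k\right]$$

– Discard absolute phase:

$$s(t) = \sum_{k=1}^{N} A_k(t)\cos\left[2\pi f_{ck}t + 2\pi\int_0^t g_k(\tau)\,d\tau\right]$$

– Discard relative phase (i.e., frequency modulation):

$$s(t) = \sum_{k=1}^{N} A_k(t)\cos\left(2\pi f_{ck}t\right)$$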
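A rough sketch of the band-sum synthesis above, assuming NumPy. Here each band k has an amplitude modulation (`am`), a frequency modulation in Hz (`fm`), and a center frequency; the function name `synthesize`, the `keep_fm` switch, and the rectangle-rule integral are assumptions made for illustration. Setting `keep_fm=False` corresponds to discarding the relative phase, which leaves AM on fixed-frequency carriers.

```python
import numpy as np

def synthesize(am, fm, center_freqs, fs, keep_fm=True):
    """Sum of N modulated carriers.

    am, fm: arrays of shape (N, T); fm is the frequency deviation in Hz.
    center_freqs: length-N sequence of band center frequencies in Hz.
    """
    am = np.asarray(am, dtype=float)
    fm = np.asarray(fm, dtype=float)
    n_samples = am.shape[1]
    t = np.arange(n_samples) / fs
    out = np.zeros(n_samples)
    for a_k, g_k, f_ck in zip(am, fm, center_freqs):
        phase = 2 * np.pi * f_ck * t
        if keep_fm:
            # Running integral of the FM, approximated by a cumulative sum.
            phase = phase + 2 * np.pi * np.cumsum(g_k) / fs
        out += a_k * np.cos(phase)
    return out

# Illustrative two-band example (all values are made up).
fs = 8000
t = np.arange(fs) / fs
am = np.vstack([1.0 + 0.5 * np.sin(2 * np.pi * 3 * t), np.ones_like(t)])
fm = np.vstack([40.0 * np.sin(2 * np.pi * 5 * t), np.zeros_like(t)])
centers = [500.0, 1500.0]
am_fm_signal = synthesize(am, fm, centers, fs, keep_fm=True)
am_only_signal = synthesize(am, fm, centers, fs, keep_fm=False)
```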
Implementation
• Combo of Dudley’s vocoder and Flanagan’s phase vocoder
[Block diagram with labels: Input; AM filter, Envelope, Compression; FM filter; AM; Output]
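One way to picture the analysis side of this slide is to extract, for a single band, the AM as the band envelope and the FM as the instantaneous frequency relative to the band center. The sketch below assumes NumPy and a brick-wall FFT band-pass; the filter type, the band edges, and the function names `bandpass_analytic` and `extract_am_fm` are assumptions for illustration and are not the processing described in Zeng, Nie, Stickney et al. (2005).

```python
import numpy as np

def bandpass_analytic(x, fs, lo, hi):
    """Analytic band signal: keep only positive-frequency bins in [lo, hi] Hz.

    Assumes 0 < lo < hi < fs / 2.
    """
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= lo) & (freqs <= hi)
    return np.fft.ifft(2.0 * spectrum * mask)

def extract_am_fm(x, fs, lo, hi):
    """Return (am, fm, center) for one analysis band.

    am: envelope of the band signal (length T).
    fm: instantaneous frequency minus band center, in Hz (length T - 1).
    """
    z = bandpass_analytic(x, fs, lo, hi)
    am = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    center = 0.5 * (lo + hi)
    return am, inst_freq - center, center

# Illustrative input: a 1 kHz carrier whose frequency swings by +/-50 Hz at 5 Hz.
fs = 16000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 1000 * t + (50.0 / 5.0) * np.sin(2 * np.pi * 5 * t))
am, fm, center = extract_am_fm(x, fs, 800.0, 1200.0)
```

For this test tone the recovered `fm` follows the 5 Hz frequency swing and `am` stays flat, since the input carries no amplitude modulation.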
Zeng, Nie, Stickney et al. PNAS (2005)
Spectra: What does FM encode?
[Figure: spectrograms, panels A-G; x-axis: Time (s), 0 to 0.5; y-axis: Frequency (Hz), 0 to 5000; level scale: 0 to -30 dB]
Zeng, Nie, Stickney et al. PNAS (2005)
Sentence, speaker, and tone recognition
Zeng, Nie, Stickney et al. PNAS (2005)
Comparison with previous studies
[Figure: Percent Correct (0 to 100) by Condition (CN4, CN8, CS4, CS8, HN4, HN8, HS4, HS8, IN4, IN8, IS4, IS8); legend: Shannon et al. 1995, Dorman et al. 1997, Zeng et al. 2005]
Zeng, Nie, Stickney et al. PNAS (2005)
Spectral resolution and noise type
[Figure: panels A-F. Spectra: Amplitude (dB), 40 to 110, against Frequency (kHz), 0 to 8, for Original, AM, and AM+FM, with Target, Male, and Female labels. Speech Reception Threshold (dB), -25 to 15, with speech-shaped noise, male masker, and female masker, for AM and AM+FM in NH and CI groups. Annotation: 30-dB SRT]
Zeng, Nie, Stickney et al. PNAS (2005)
Speech recognition in combined hearing
Kong, Stickney, and Zeng JASA (2005)
[Figure: Percent correct (0 to 100) against Signal-to-noise Ratio (SNR), 0 to 20, for S2, S3, S5, and Mean; legend: HA, CI, HA+CI. Annotation: 10-dB SRT]
FM detection in CIs: Results
[Figure: Difference limen (Hz), 1 to 1000, against standard frequency (Hz), 10 to 1000; legend: upward, downward, and sinusoid, each with its regression line]
Chen and Zeng JASA (2004)
[Figure: axes labeled Time and Frequency]
Summary
Using FM to improve auditory performance:
– Speech cues are not redundant: FM complements AM in speech perception
– FM is important for speech recognition when competing voices are the maskers
– FM is important for music and tonal language perception
– FM is a slow version of fine structure that can be perceived and used to improve cochlear implant performance
Acknowledgements
• NIH - NIDCD
• Chinese NSF
• Advanced Bionics Corp
• Cochlear Corp
• Med-El
• Peter Assmann
• Ann Bradlow
• Keli Cao and CG Wei
• Larry Feth
• Ruth Litovsky
• Jones Ackland