
Speech recognition with amplitude and frequency modulations:

Implications for cochlear implant design

• What are AM and FM?

• What are their perceptual roles?

• Where to find it?

• Implications?

Fan-Gang Zeng

Kaibao Nie

Ginger Stickney

Ying-Yee Kong

Ashish Bhargave

Hongbin Chen

Michael Vongphoe

Janice Chang

What is fine structure?

• Rosen’s definition:

– Envelope (5-50 Hz)

– Periodicity (50-500 Hz)

– Fine structure (500-10,000 Hz)

• Hilbert’s definition:

– Temporal envelope

– Fine structure

[Figure: panels labeled Original, AM, Fine Structure, and FM]
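As a rough illustration of the Hilbert definition, the sketch below splits a signal into a temporal envelope and a fine-structure carrier using an FFT-based analytic signal. The function names, the 1000 Hz carrier, and the 4 Hz modulator are assumptions chosen for the example; none of them come from the slides.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (a discrete Hilbert-transform construction)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin unchanged
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)  # negative frequencies are zeroed

def envelope_and_fine_structure(x):
    """Return (temporal envelope, unit-amplitude fine structure)."""
    z = analytic_signal(x)
    envelope = np.abs(z)                  # slowly varying amplitude
    fine_structure = np.cos(np.angle(z))  # carrier with the amplitude removed
    return envelope, fine_structure

# Hypothetical test signal: a 1000 Hz carrier with a 4 Hz amplitude modulation.
fs = 16000
t = np.arange(fs) / fs                    # 1 second of samples
am = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
x = am * np.cos(2 * np.pi * 1000 * t)
env, tfs = envelope_and_fine_structure(x)
```

In this construction the product `env * tfs` gives back the original signal, so the two parts together lose nothing; the perceptual question is what happens when only one of them is delivered.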

A little math

• Flanagan (1980) “Parametric coding of speech spectra”

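$$s(t) = \sum_{k=1}^{N} A_k(t)\,\cos\!\left[2\pi f_{ck} t + 2\pi \int_0^t g_k(\tau)\,d\tau + \theta_k\right]$$

where, for the k-th band, $A_k(t)$ is the AM, $g_k(t)$ is the FM, $f_{ck}$ is the center frequency, and $\theta_k$ is the initial phase.

– Discard absolute phase:

$$s(t) = \sum_{k=1}^{N} A_k(t)\,\cos\!\left[2\pi f_{ck} t + 2\pi \int_0^t g_k(\tau)\,d\tau\right]$$

– Discard relative phase (i.e., frequency modulation):

$$s(t) = \sum_{k=1}^{N} A_k(t)\,\cos\!\left(2\pi f_{ck} t\right)$$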
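The band-sum model can be tried numerically. The following is a minimal sketch, assuming each band is a cosine at a center frequency whose amplitude is the band AM and whose phase adds 2π times the running integral of the band FM. The function `synthesize`, its parameters, and the single-band test values are hypothetical, and the integral is approximated with a simple rectangle rule.

```python
import numpy as np

def synthesize(am, fm, fc, fs, theta=None, keep_fm=True):
    """Sum of bands: am[k] * cos(2*pi*fc[k]*t + 2*pi*integral(fm[k]) + theta[k]).

    am, fm : arrays of shape (n_bands, n_samples)
    fc     : center frequency of each band in Hz
    theta  : optional initial phase per band (None drops the absolute phase)
    keep_fm: False drops the FM term, leaving AM on fixed carriers
    """
    am = np.asarray(am, dtype=float)
    fm = np.asarray(fm, dtype=float)
    n_bands, n_samples = am.shape
    t = np.arange(n_samples) / fs
    out = np.zeros(n_samples)
    for k in range(n_bands):
        phase = 2 * np.pi * fc[k] * t
        if keep_fm:
            # Rectangle-rule running integral of the FM, starting at zero.
            phase = phase + 2 * np.pi * (np.cumsum(fm[k]) - fm[k][0]) / fs
        if theta is not None:
            phase = phase + theta[k]
        out += am[k] * np.cos(phase)
    return out

# Hypothetical one-band example: constant AM and a constant +10 Hz FM offset.
fs = 8000
n = 800
t = np.arange(n) / fs
am = np.ones((1, n))
fm = np.full((1, n), 10.0)
with_fm = synthesize(am, fm, [500.0], fs)                    # behaves like a 510 Hz tone
without_fm = synthesize(am, fm, [500.0], fs, keep_fm=False)  # stays at 500 Hz
```

With a constant FM offset the carrier simply shifts in frequency; a time-varying `fm` would make the carrier glide, which is the information lost when the FM term is discarded.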

Implementation

• Combo of Dudley’s vocoder and Flanagan’s phase vocoder

[Figure: block diagram of the processing chain from Input to Output; labeled blocks include Envelope, AM filter, FM filter, and Compression, with ellipses between the channels shown]

Zeng, Nie, Stickney et al. PNAS (2005)
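A per-band analysis in the spirit of combining a channel vocoder with a phase vocoder might look like the sketch below. It is not the processing used in the cited work: the brick-wall FFT band filter and the moving-average smoother are stand-ins for whatever filters were actually used, and the names `band_analytic`, `smooth`, `extract_am_fm`, the 5 ms window, and the band edges are all assumptions.

```python
import numpy as np

def band_analytic(x, fs, lo, hi):
    """Analytic signal of x limited to [lo, hi] Hz (brick-wall FFT filter, lo > 0)."""
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    keep = (freqs >= lo) & (freqs <= hi)        # positive-frequency bins in the band
    return np.fft.ifft(2.0 * spectrum * keep)   # doubling restores the band amplitude

def smooth(x, fs, window_s):
    """Moving average over roughly window_s seconds, with edge padding."""
    half = max(1, int(round(window_s * fs / 2)))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    return np.convolve(np.pad(x, half, mode="edge"), kernel, mode="valid")

def extract_am_fm(x, fs, lo, hi, window_s=0.005):
    """Return (AM, FM relative to the band center, band center frequency)."""
    fc = 0.5 * (lo + hi)                        # arithmetic band center
    z = band_analytic(x, fs, lo, hi)
    am = smooth(np.abs(z), fs, window_s)        # slowly varying envelope
    # Instantaneous frequency in Hz from the slope of the unwrapped phase.
    inst_freq = np.gradient(np.unwrap(np.angle(z))) * fs / (2.0 * np.pi)
    fm = smooth(inst_freq - fc, fs, window_s)   # slow deviation from the center
    return am, fm, fc

# Hypothetical input: a 1100 Hz tone inside the band plus a 3000 Hz tone outside it.
fs = 16000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 1100 * t) + 0.5 * np.cos(2 * np.pi * 3000 * t)
am, fm, fc = extract_am_fm(x, fs, 1000.0, 1500.0)
```

For this input the band contains only the 1100 Hz tone, so the extracted AM is flat and the FM sits at a constant offset below the 1250 Hz band center. Smoothing is what makes the FM track "slow"; a shorter window would let faster frequency fluctuations through.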

Spectra: What does FM encode?

[Figure: spectrograms, panels A-G; axes: Time (s), 0 to 0.5, and Frequency (Hz), 0 to 5000; level scale in dB, 0 to -30]


Sentence, speaker, and tone recognition


Combo:

Target:

Masker:

Comparison with previous studies

[Figure: percent correct (0 to 100) by condition (CN4, CN8, CS4, CS8, HN4, HN8, HS4, HS8, IN4, IN8, IS4, IS8); legend: Shannon et al. 1995, Dorman et al. 1997, Zeng et al. 2005]


Spectral resolution and noise type

[Figure, panels A-F: amplitude (dB), 40 to 110, vs. frequency (kHz), 0 to 8, with labels Target, Male, Female and Original, AM, AM+FM; speech reception threshold (dB), -25 to 15, for speech-shaped noise, male masker, and female masker, comparing AM and AM+FM in NH and CI listeners. Annotation: 30-dB SRT]


Speech recognition in combined hearing

Kong, Stickney, and Zeng JASA (2005)

[Figure: percent correct (0 to 100) vs. signal-to-noise ratio (SNR), 0 to 20, for subjects S2, S3, S5, and the mean; conditions: HA, CI, HA+CI. Annotation: 10-dB SRT]

FM detection in CIs: Results

[Figure: difference limen (Hz), 1 to 1000, vs. standard frequency (Hz), 10 to 1000, on logarithmic axes; legend: upward, downward, and sinusoid, each with a regression line]

Chen and Zeng JASA (2004)

[Schematic: Frequency vs. Time]

Summary

Using FM to improve auditory performance:

– Speech cues are not redundant: FM complements AM in speech perception

– FM is important for speech recognition with competing voices as maskers

– FM is important for music and tonal language perception

– FM is a slow version of fine structure that can be perceived and used to improve cochlear implant performance

Acknowledgements

• NIH - NIDCD

• Chinese NSF

• Advanced Bionics Corp

• Cochlear Corp

• Medel

• Peter Assmann

• Ann Bradlow

• Keli Cao and CG Wei

• Larry Feth

• Ruth Litovsky

• Jones Ackland
