pitch perception in auditory scenes 2 papers on pitch perception… of a single sound source of more...

27
Pitch perception in auditory scenes

Upload: derek-montgomery

Post on 05-Jan-2016

226 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

Pitch perception in auditory scenes

Page 2: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

2

Papers on pitch perception…

• of a single sound source

• of more than one sound source

LOTS - too many?

Almost none ± a few

Page 3: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

4

Bach: Musical Offering (strings)

Page 4: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

5

Talk outline

• Low-level grouping cues in pitch perception– harmonic structure

• Conjoint or disjoint allocation

– onset-time– context– spatial cues.

• Use of a difference in harmonic structure between two sound sources to help identify, track over time, and localise simultaneous sound sources.

Page 5: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

6

Excitation pattern of complex tone on basilar membrane

-5.0

0.0

5.0

10.0

15.0

20.0

25.0

base apexlog (ish) frequency

2004001600

resolvedunresolved

600800

Output of 1600 Hz filter Output of 200 Hz filter

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

0 0.2 0.4 0.6 0.8 1

1/200s = 5ms

-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

1

0 0.2 0.4 0.6 0.8 1

1/200s = 5ms

Page 6: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

7

Mistuned harmonic’s contribution to pitch declines as Gaussian function of mistuning

• Experimental evidencefrom mistuning expts: Moore, Glasberg & Peters JASA (1985).

Darwin (1992). In M. E. H. Schouten (Ed). The auditory processing of speech: from sounds to words Berlin: Mouton de Gruyter

frequency (Hz)

400 800 1200 1600 2000 24000

Match low pitch

-1.5

-1

-0.5

0

0.5

1

1.5

540 560 580 600 620 640 660 680

∆F

0

(Hz)

Frequency of mistuned harmonic f (Hz)

∆Fo = a - k ∆f exp(-∆f

2 / 2 s

2)

s = 19.9

k = 0.073

Page 7: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

8

Harmonic Sieve

• Only consider frequencies that are close enough to harmonic. Useful as front-end to a Goldstein-type model of pitch perception.

Duifhuis, Willems & Sluyter JASA (1982).

frequency (Hz)

400 800 1200 1600 2000 2400

200 Hz sieve spacing

0

blocked

Page 8: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

9

Is “harmonic sieve” necessary with autocorrelation models?

• Autocorrelation could in principle explain mistuning effect– mistuned harmonic

initially shifts autocorrelation peak

– then produces its own peak

• But the numbers do not work out. – Meddis & Hewitt model

is too tolerant of mistuning.

Page 9: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

10

Disjoint allocation ?

Will the 200-Hz series grab the 600-Hz component, and so make it give less of a pitch shift to the 155-Hz series ?

Double Complex, Ipsilateral

TARGET

MATCH

600Hz

155 465 775

155 465 775

200

Darwin, Buffa, Williams and Ciocca (1992). in Auditory physiology and perception edited by Cazals, Horner and Demany (Pergamon, Oxford)

Page 10: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

11

Disjoint allocation… (2)

• No evidence of 600-Hz component being grabbed by "in-tune" 200-Hz complex

-1.5

-1

-0.5

0

0.5

1

1.5

550 570 590 610 630 650 670 690

ipsi doubleipsi singleipsi double-single

Match: 155 HzLeft: 155 Hz F0Right: 200 Hz F05 subjects

Page 11: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

12

Pitch determined by conjoint allocation

• Decision on each simultaneous pitch taken independently.

Page 12: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

13

Mistuning & pitch

-0.2

0

0.2

0.4

0.6

0.8

1

0 1 2 3 5 8

vowelcomplex

Mea

n pi

tch

shif

t (H

z)

% Mistuning of 4th Harmonic

8 subjects

90 ms

Page 13: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

14

Onset asynchrony & pitch

-0.2

0

0.2

0.4

0.6

0.8

1

0 80 160 240 320

vowelcomplex

Onset Asynchrony T (ms)

Mea

n pi

tch

shif

t (H

z) 8 subjects±3% mistuning

90 ms

T

Page 14: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

15

Adaptation ?0, 30, 300 ms 90 ms

155

465

775

1085

1705

1395

640 640

Increases effect of mistuning

Page 15: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

16

Use of onset-tme

• A long onset-time removes a harmonic from pitch calculation.

• Not just due to adaptation because of regrouping experiment

• NB Onset-time much longer for pitch (100s of ms) than for hearing sound out as separate harmonic or for timbre judgements of complex (10s of ms).

Page 16: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

17

Repeated context and pitch50 ms 90 ms

155 Hz

1085

1500

620 ± ∆

1810 Hz

0

0.4

0.8

1.2

1.6

0 2 4 6 8

IsolatedIn sequence

Absolute mistuning of 4th harmonic (%)

Darwin, Hukin, & Al-Khatib, B. Y. (1995). "Grouping in pitch perception: evidence for sequential constraints," J. Acoust. Soc. Am. 98, 880-885.

Page 17: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

18

Pitch perception is not a bacon slicer

• Pitch perception does not operate on independent time-slices

• It uses, for example, the “Old + New” heuristic to parse which components are temporally relevant

Speckschneidermaschine

Page 18: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

19

These conclusions apply to resolved harmonics

Separate simultaneous pitches only audible with unresolved harmonics when each pitch comes predominantly from a separate frequency region.

Problem for autocorrelation models

And the Old+New Heuristic breaks down. No advantage for onset-time.

Page 19: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

20

Superimposing unresolved harmonics

Time (s)0 0.1

ミ 0.9636

0.9636

0

Time (s)0 0.1

ミ 0.9696

0.9696

0

Time (s)0 0.1

ミ 1.929

1.929

0

2kHz - 3kHz

Fo = 100 Hz

Fo = 125 Hz

sum

Carlyon, R. P. (1996). "Encoding the fundamental frequency of a complex tone in the presence of a spectrally overlapping masker," J. Acoust. Soc. Am. 99, 517-24.

Page 20: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

21

Spatial cues ?

• Contralateral mistuned component contributes almost as much as ipsilateral (k= .073 vs .061)Cf Beerends and

Houtsma JASA (1989). -1.5

-1

-0.5

0

0.5

1

1.5

540 560 580 600 620 640 660 680

ipsilateral

contralateral

∆F

0 (H

z)

Frequency of mistuned harmonic f (Hz)

∆Fo = a - k ∆f exp(-∆f

2 / 2 s

2)

s = 17.4

k = 0.061

s = 19.9

k = 0.073

Page 21: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

22

Summary of constraints for pitch

• Harmonic (Gaussian s.d. 3%)• Onset time (~100-200 ms)• Repeated Context• V. weak spatial constraint• Independent estimates of different

simultaneous F0s (conjoint allocation)

• N.B. These constraints different for unresolved harmonics and for e.g. timbre / vowel perception

Old +New

Page 22: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

23

Using pitch differences to separate different objects

• Helps intelligibility of simultaneous speech

• Helps to track one voice in presence of others

• Helps to separate objects for independent

localisation

Page 23: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

24

Fo between two sentences(Bird & Darwin 1998; after Brokx & Nooteboom, 1982)

0

20

40

60

80

100

0 2 4 6 8 10

Normal

Fo difference (semitones)

40 Subjects40 Sentence Pairs

Perfect Fourth ~4:3

Two sentences (same talker)• only voiced consonants • (with very few stops)Thus maximising Fo effect

Task: write down target sentence

Replicates & extends Brokx & Nooteboom

Target sentence Fo = 140 Hz

Masking sentence = 140 Hz ± 0,1,2,5,10 semitones

Page 24: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

25

Example of using harmonicity for grouping for another property

How do we localise complex sounds when more than one sound source is present?

Maybe we group sounds first and then pool the localisation estimates of the component frequencies for that sound. More stable than grouping by location?

Page 25: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

26

Localisation by ITD:Jeffress / Trahiotis & Stern

500 Hz Narrow-band noise

Right ear leads by 1.5 ms

500 Hz Wider-band noise

Right ear leads by 1.5 ms

Hear on Left

(phase ambiguity)

Hear on Right(consistent ITD)

Page 26: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

27

Resolution of phase ambiguityResolution of phase ambiguity by across-frequency comparison

500 Hz: period = 2ms

L lags by 1.5 ms or L leads by 0.5 ms ?

-2.5200

800

600

400

-0.5 1.5 3.5

Delay of cross-correlator ms

Frequency of auditory filter Hz

Cross-correlation peaks for sound delayed in one ear by 1.5 ms

300 Hz: period = 3.3ms

R R LL R

Actual delay

Left ear actually lags by 1.5 ms

L lags by 1.5 ms or L leads by 1.8 ms ?

R

Page 27: Pitch perception in auditory scenes 2 Papers on pitch perception… of a single sound source of more than one sound source LOTS - too many? Almost none

29

Mistuning & localisation

-15

-10

-5

0

5

10

15

Poin

ter

IID

(d

B)

0 1 3 6

6 Ss

percent mistuning