sound, ears, brains and the world - mit opencourseware · 2020-01-04 · when objects in the world...

133
Sound, Ears, Brains and the World Josh McDermott Dept. of Brain and Cognitive Sciences, MIT CBMM Summer School 1

Upload: others

Post on 27-Apr-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Sound, Ears, Brains and the World

Josh McDermott Dept. of Brain and Cognitive Sciences, MIT

CBMM Summer School

1

Page 2: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Radio excerpt:

Scene from cafe:

Scene from sports bar:

Barry White:

The ear receives a pressure waveform.

Pre

ssur

e

Time

Consider some examples of typical auditory input:

2

Page 3: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of a wave. The ears measure this sound energy and transmit it to the brain. The task of the brain is to interpret this signal, and use it to figure out what is out there in the world.

AUDITION

3

Page 4: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

The listener is interested in what happened in the world to cause the sound:

• Most properties of interest are not explicit in the waveform:

How do we derive information about the world from sound?

© Source Unknown. All rights

reserved. This content is

excluded from our Creative

Commons license. For more

information, see

https://ocw.mit.edu/help/faq-

fair-use/.

© Source Unknown. All rights

reserved. This content is

excluded from our Creative

Commons license. For more

information, see

https://ocw.mit.edu/help/faq-

fair-use/.

© Source Unknown. All rights

reserved. This content is

excluded from our Creative

Commons license. For more

information, see

https://ocw.mit.edu/help/faq-

fair-use/.

© John Rowley. All rights

reserved. This content is

excluded from our Creative

Commons license. For more

information, see

https://ocw.mit.edu/help/faq-

fair-use/.

4

Page 5: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

The Cocktail Party Problem

Real-world settings often involve concurrent sounds.

© Paramount Pictures. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

5

Page 6: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

•Presence of other speakers obscures much structure of target utterance, but speech remains intelligible.

•Present-day speech recognition algorithms (e.g. in your iPhone) fall apart in such circumstances.

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H. "The cocktail party problem." Current Biology 19,

no. 22 (2009): R1024-R1027.

6

Page 7: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Figure removed due to copyright restrictions. Please see the video.

Source: Chapter 8, Ochsner, Kevin, and Stephen M. Kosslyn. The Oxford Handbook of

Cognitive Neuroscience, Volume 2: The Cutting Edges. Vol. 2. Oxford University Press, 2013.

7

Page 8: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Human speech recognition is also remarkably invariant:

Dry Reverberant

Perc

en

t E

rro

rs

Machine speech recognition is not robust:

8

Page 9: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Experiments in humans

Auditory neuroscience

Machine algorithms

Psychology Neuroscience Engineering

My research group: Laboratory for Computational Audition

9

Page 10: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

General approach: start with what the brain has to work with.

Work with representations like those in the early auditory system.

© Academic Press. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: Lindsay, Peter H., and Donald A. Norman. Human information processing: An

introduction to psychology. Academic press, 2013. 10

Page 11: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Plan for this morning:

1. Overview of Auditory System 2. Sound Texture Perception 3. Perception of Sound Sources 4. Auditory Scene Analysis

11

Page 12: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Part 1: Overview of Auditory System

12

Page 13: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Functional schematic of the ear:

13

Page 14: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Figure removed due to copyright restrictions.

Please see the video.

Source: Chapter 8, Ochsner, Kevin, and Stephen M. Kosslyn. The Oxford Handbook of

Cognitive Neuroscience, Volume 2: The Cutting Edges. Vol. 2. Oxford University Press, 2013.

14

Page 15: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Cochlear transduction is frequency tuned:

Diagram of fourier transform in the basilar membrane removed due to copyright restrictions.

Source: "Transmission of sound within the inner ear" from Hawkins, Joseph, "Human ear:

Anatomy" Encyclopedia Britannica, Last updated on February, 24 2017.https://www.britannica.com/science/ear/Transmission-of-sound-by-bone-conduction.

15

Page 16: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Cross section of cochlea

© Source Unknown. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

16

Page 17: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Diagram of the inner ear removed due to copyright restrictions.

Please see the video.

17

Movement of the basilar membrane causes the hair cells to move against the tectorial membrane, which causes the cilia to bend.

When the cilia bend, the hair cells release neurotransmitter onto synapses with auditory nerve fibers that send signals to the brain.

Page 18: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

But because only part of the basilar membrane moves for a given frequency of sound, each hair cell and auditory nerve fiber signal only particular frequencies of sound. One example:

© Source Unknown. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

18

Page 19: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Different nerve fibers (synapsing at different points along the basilar membrane) are tuned to different frequencies:

19

© Source Unknown. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 20: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Auditory nerve fibers usually approximated as bandpass filters:

20

Page 21: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Sound signal represented with “subbands”, like auditory nerve:

Original Sound Signal

Time (ms) 0 10 A

mpl

itude

350-520 Hz

Time (ms)

Am

plitu

de

0 10

Cochlear Filter Bank

Subband

Frequency

21

Page 22: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Sound signal represented with “subbands”, like auditory nerve:

Original Sound Signal

Time (ms) 0 10 A

mpl

itude

Cochlear Filter Bank

Subband

Frequency 1330-1760 Hz

Time (ms)

Am

plitu

de

0 10 22

Page 23: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Sound signal represented with “subbands”, like auditory nerve:

Original Sound Signal

Time (ms) 0 10 A

mpl

itude

Cochlear Filter Bank

Subband

Frequency 4000-5000 Hz

Time (ms)

Am

pli

tud

e

0 10

23

Page 24: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Frequency bandwidths

1. Sound signal represented with “subbands”, like auditory nerve:

24

Page 25: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters Sound signal

AUDITORY MODEL

Subbands

© Elsevier. All rights reserved. This content is excluded from our Creative Commons

license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via

statistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5

(2011): 926-940.

25

Page 26: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Frequency selectivity has a host of perceptual consequences.

26

© Source Unknown. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 27: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Frequency selectivity is evident by the ability to “hear out” individual frequency components of a complex tone:

27

Page 28: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Am

plitu

de

Time

Perception of beating constrained by freq. selectivity

Superposition of two pure tones waxes and wanes in amplitude.

•Perception of beating is constrained by the cochlea: -Beats are only heard if two frequency components fall within the filter bandwidth of the cochlea:

1 semitone frequency difference: 3 semitones: 8 semitones:

The perceptual correlate of rapid beating is known as roughness.

28

Page 29: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters Sound signal

AUDITORY MODEL

Subbands

© Elsevier. All rights reserved. This content is excluded from our Creative Commons

license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via

statistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5

(2011): 926-940.

29

Page 30: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Rose (1971) Auditory nerve fiber response to pure tones

at different frequencies and levels:

But… linear filtering provides only an approximate description of cochlear tuning. At high levels, frequency tuning broadens:

Figure removed due to copyright restrictions.

Please see the video.

Source: Rose, Jerzy E., Joseph E. Hind, David J. Anderson, and John F. Brugge. "Some effects of stimulus intensity

on response of auditory nerve fibers in the squirrel monkey." J. Neurophysiol 34, no. 4 (1971): 685-699.

30

Page 31: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

At higher stimulus levels, frequency tuning broadens:

Figures removed due to copyright restrictions.

Please see the video.

Source: Rose, Jerzy E., Joseph E. Hind, David J. Anderson, and John F. Brugge. "Some effects of stimulus intensity

on response of auditory nerve fibers in the squirrel monkey." J. Neurophysiol 34, no. 4 (1971): 685-699.

31

Page 32: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Figure removed due to copyright restrictions.

Please see the video.

Source: Chapter 8, Ochsner, Kevin, and Stephen M. Kosslyn. The Oxford Handbook of

Cognitive Neuroscience, Volume 2: The Cutting Edges. Vol. 2. Oxford University Press, 2013.

32

Page 33: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Another important response property of the cochlea: For low frequencies, auditory nerve spikes are phase-locked to the stimulus:

Neural Coding of Sound

© Source Unknown. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

33

Page 34: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Figure removed due to copyright restrictions.

Please see the video.

Source: Chapter 8, Ochsner, Kevin, and Stephen M. Kosslyn. The Oxford Handbook of

Cognitive Neuroscience, Volume 2: The Cutting Edges. Vol. 2. Oxford University Press, 2013.

34

Phase locking occurs for frequencies under ~4kHz (in nonhuman animals – no available data in humans).

Page 35: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Most nerve fibers don’t fire with every stimulus cycle, especially for higher frequencies. A stimulus is encoded by many nerve fibers at once. This form of population coding may be one reason why there are many more auditory nerve fibers (30,000 per ear) than there are inner hair cells (3500 per ear).

© Source Unknown. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

35

Page 36: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Per ear: 3500 inner hair cells 12,000 outer hair cells 30,000 auditory nerve fibers Per eye: 5 million cones 100 million rods 1.5 million optic nerve fibers

Some interesting numbers:

Per hemisphere: 60 million neurons in primary auditory cortex? 140 million neurons in primary visual cortex

36

Page 37: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Figure removed due to copyright restrictions.

Please see the video.

Source: Chapter 8, Ochsner, Kevin, and Stephen M. Kosslyn. The Oxford Handbook of

Cognitive Neuroscience, Volume 2: The Cutting Edges. Vol. 2. Oxford University Press, 2013.

37

Page 38: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

How does a subband relate to what we see in the auditory nerve?

Original Sound Signal

Time (ms) 0 10 A

mpl

itude

Cochlear Filter Bank

Subband

Frequency 1330-1760 Hz

Time (ms)

Am

plitu

de

0 10 38

Page 39: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Subbands can be characterized by instantaneous amplitude and phase, loosely mapping onto rate and spike timing in auditory nerve:

39

Page 40: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

-envelope is easy to extract from auditory nerve responses (firing rate over local time windows) -we often extract it with Hilbert transform (magnitude of analytic signal)

Much of the information in sound is carried by the way that frequencies are modulated over time, measured by the instantaneous amplitude in a subband:

Amplitude modulations are captured by the envelope

40

Page 41: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

A spectrogram contains the envelope of each subband:

1 2 0

Time (s)

Drumming

41

Page 42: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1 2 0

Time (s)

Drumming

Envelopes often capture all the information that matters perceptually*.

Sounds can be reconstructed just from the envelopes:

42

Page 43: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Start with noise, replace with envelopes, resynthesize, iterate:

+

43

Page 44: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

Sound signal

AUDITORY MODEL

Cochlear subbands

Envelopes

Amplitude compression, simulating that of cochlea

(spectrogram)

© Elsevier. All rights reserved. This content is excluded from our Creative Commons license.

For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via statistics

of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940.

44

Page 45: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Spike-triggered average: a method to characterize a neuron’s receptive field.

45

Page 46: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Figure removed due to copyright restrictions.

Please see the video.

Source: Chapter 8, Ochsner, Kevin, and Stephen M. Kosslyn. The Oxford Handbook of

Cognitive Neuroscience, Volume 2: The Cutting Edges. Vol. 2. Oxford University Press, 2013.

46

STRF = spectro-temporal receptive field Derived from methods like the STA applied to stimulus spectrogram. As early as the midbrain, auditory neurons are tuned to particular modulation rates.

Page 47: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Envelope structure measured with second filter bank:

Old idea (Dau, Viemeister etc.), with fair bit of empirical support. © Elsevier. All rights reserved. This content is excluded from our Creative Commons

license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via

statistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5

(2011): 926-940. 47

Page 48: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Envelope structure measured with second filter bank:

Mod. bands

© Elsevier. All rights reserved. This content is excluded from our Creative Commons

license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via

statistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5

(2011): 926-940. 48

Page 49: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

AUDITORY MODEL

Model of auditory signal processing from cochlea to midbrain/thalamus:

© Elsevier. All rights reserved. This content is excluded from our Creative Commons

license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via

statistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5

(2011): 926-940. 49

Page 50: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Given these representations, how do we recognize sounds and their properties?

Sound signal

Cochlear subbands

Envelopes

Mod. bands

AUDITORY MODEL

© Elsevier. All rights reserved. This content is excluded from our Creative Commons

license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via

statistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5

(2011): 926-940. 50

Page 51: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Part 2: Sound Texture

51

Page 52: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Textures result from large numbers of acoustic events.

•rain •wind •birds in a forest •running water •insects at night •crowd noise •applause •fire

Sound textures are common in the world, but largely unstudied.

SOUND TEXTURE

52

© Source Unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Page 53: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Unlike event sounds, textures are stationary - essential properties do not change over time.

•Stationarity makes textures a good starting point for understanding auditory representation.

Much of hearing research is concerned with the sounds of individual events:

53

Page 54: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

How do people represent, recognize sound textures?

What do you extract and store about these waveforms to recognize that they are the same kind of thing?

54

Page 55: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Key Theoretical Proposal:

•Because they are stationary, textures can be captured by statistics that are time-averages of acoustic measurements.

•When you recognize the sound of fire or the sound of rain, you may be recognizing these statistics.

What kinds of statistics might we be measuring?

55

Page 56: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Whatever statistics the auditory system measures are presumably derived from these representations:

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

© Elsevier. All rights reserved. This content is excluded from our Creative Commons

license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via

statistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5

(2011): 926-940. 56

Page 57: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

How far can we get with generic statistics of standard auditory representations?

© Elsevier. All rights reserved. This content is excluded from our Creative Commons

license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via

statistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5

(2011): 926-940. 57

Page 58: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

How far can we get with generic statistics – marginal moments (mean/variance/skew) and pairwise correlations?

58

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 59: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

•Statistics are not specifically tailored to natural sounds •Ultimately, would be nice to learn statistics from data (working on it…) •Simple, involve operations that could be instantiated in neurons

To be useful for recognition, statistics need to give different values for different sounds…

59

Page 60: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

60

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 61: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Frequency of Occurrence

Am

plitude

Envelope Distribution:

Marginal moments (mean, variance, skew) describe distribution of envelope:

61

Page 62: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Envelope distributions for natural signals differ from those for noise.

Distributions have similar mean, but different shapes.

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via statistics

of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940. 62

Page 63: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Intuition: natural sounds contain events (raindrops, geese calls)

These events are infrequent, but when they occur, they produce large amplitudes.

Natural signals are sparser than noise.

Sparsity reflected in envelope moments (cf Strickland, Lorenzi).

cf Attias and Schreiner

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via statistics

of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940. 63

Page 64: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

64

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 65: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Correlations between envelopes vary across sounds.

Broadband events induce dependencies between channels.

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via statistics

of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940.

65

Page 66: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Correlations reflect broadband events (crackles, claps):

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via statistics

of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940.

66

Page 67: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

67

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 68: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Textures vary in distribution of modulation power:

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via statistics

of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940.

68

Page 69: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

69

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 70: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via statistics

of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940.

70

Page 71: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

These statistics capture variation across sound...

71

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 72: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Can they account for perception of real-world textures?

72

Page 73: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

•If your brain represents sounds with a set of measurements, then:

Signals with the same values of those measurements should sound the same.

•Sounds synthesized to have the same measurements as a real-world recording should sound like it…

IF the measurements are what the brain is using to represent sound.

Key Methodological Proposal:

•Synthesis is a powerful way to test a perceptual theory.

73

Page 74: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Envelopes

Simple example: test the role of the mean of each cochlear envelope (power spectrum)

•Measure average value of each envelope in real-world texture •Then synthesize random signal with same envelope means.

74

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 75: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Start with noise, rescale noise subbands, resynthesize:

+

75

Page 76: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

•Synthesis is not realistic (everything sounds like noise):

•We aren’t simply registering the spectrum (mean values of envelopes) when we recognize textures.

What do they sound like?

Rain

Stream

Bubbles

Fire

Applause

76

Page 77: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

Will additional simple statistics do any better?

77

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 78: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Then we alter noise envelopes to give them the same statistics:

First, we measure the statistics of a real-world sound texture:

McDermott & Simoncelli, Neuron, 2011

*synthesis code now available

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H., and Eero P. Simoncelli."Sound texture perception via statistics of

the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940. 78

Page 79: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

How do they sound?

If statistics account for texture perception, synthetic signals should sound like new examples of the real thing…

The result: a signal that shares the statistics of a real-world sound.

79

Page 80: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Rain

Stream

Bubbles

Fire

Applause

Wind

Insects

Birds

Crowd

With marginal moments and pairwise correlations, synthesis is often compelling:

80

Page 81: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Rustling Paper

Train

Helicopter

Jackhammer

Also works for many “unnatural” sounds:

Success of synthesis suggests these statistics could underlie representation and recognition of textures.

81

Page 82: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

Will any set of statistics do? What if we measure statistics from model deviating from biology?

82

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 83: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Will any set of statistics do? What if we measure statistics from model deviating from biology?

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H., and Eero P. Simoncelli."Sound texture perception via statistics of

the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940.

83

Page 84: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

Sound signal

AUDITORY MODEL

Cochlear subbands

Envelopes

Amplitude compression, simulating that of cochlea

(spectrogram)

84

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 85: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

1. Cochlear filters

2. Envelope & compression

3. Modulation filters

Sound signal

Cochlear subbands

Envelopes

Mod. bands

Will any set of statistics do? What if we measure statistics from model deviating from biology?

85

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 86: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Experiment: Original - Synth1 - Synth2 Which synthetic version sounds more realistic?

McDermott & Simoncelli, Neuron, 2011

86

Page 87: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Biologically inspired model is crucial - altering either filter bank, or compression, degrades synthesis:

Crowd Noise:

Biological

Non-biological

Helicopter:

Biological

Non-biological

McDermott & Simoncelli, Neuron, 2011

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via statistics

of the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940.

87

Page 88: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Procedure is initialized with noise, so produces a different sound signal every time, sharing only statistical properties:

•Statistics define a class of sounds that include the original and many others.

•If the statistics measure what the brain is measuring, the samples should sound like another example of the original sound.

Original Example 1

Example 2 Example 3

88

Page 89: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Original

Statistics of non-biological model define a different class of sounds:

The sounds in the non-biological class don’t sound like the original, because they are not defined with the measurements the brain is making.

89

Page 90: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

•Time-averages provide invariance. •Many sound signals have the same statistics…

Original Example 1

Example 2 Example 3

Many sound signals have the same statistics:

90

Page 91: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Interesting possibility: If the brain just represents time-averaged statistics, different exemplars of a texture should be difficult to discriminate

But only for long excerpts:

91

Page 92: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Which sound was produced by a different source?

First, discrimination of different textures (diff. long-term statistics):

McDermott et al. 2013 © Nature Neuroscience. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summary

statistics in auditory perception.“ Nature neuroscience 16, no. 4 (2013): 493-498.

92

Page 93: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Which sound was different from the other two?

What about excerpts of same texture (same long-term statistics)?

McDermott et al. 2013 © Nature Neuroscience. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summary

statistics in auditory perception.“ Nature neuroscience 16, no. 4 (2013): 493-498. 93

Page 94: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Which sound was different from the other two?

Although information increases w. duration, performance declines!

McDermott et al. 2013 © Nature Neuroscience. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summary

statistics in auditory perception.“ Nature neuroscience 16, no. 4 (2013): 493-498. 94

Page 95: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

•Details are apparently accrued into statistics…

•But details are not otherwise retained.

•Textures contain discriminable detail…

McDermott et al. 2013

© Nature Neuroscience. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summary

statistics in auditory perception.“ Nature neuroscience 16, no. 4 (2013): 493-498.

95

Page 96: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Results suggest representation of time-averaged statistics.

(because statistics converge as duration increases)

McDermott et al. 2013 © Nature Neuroscience. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summary

statistics in auditory perception.“ Nature neuroscience 16, no. 4 (2013): 493-498.96

Page 97: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Could difference be due to decay of memory for detail with time?

Fixed inter-onset interval:

Fixed inter-stimulus interval

© Nature Neuroscience. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summary

statistics in auditory perception.“ Nature neuroscience 16, no. 4 (2013): 493-498. 97

Page 98: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Time delay per se has little effect:

© Nature Neuroscience. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summary

statistics in auditory perception." Nature neuroscience 16, no. 4 (2013): 493-498.

98

Page 99: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Cueing subjects to beginning or end has little effect:

© Nature Neuroscience. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summary

statistics in auditory perception.“ Nature neuroscience 16, no. 4 (2013): 493-498.

99

Page 100: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Textures are normally generated from superposition of sources…

Single speaker

29 speakers (German cocktail party)

McDermott et al. 2013

100

© Nature Neuroscience. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summarystatistics in auditory perception." Nature neuroscience 16, no. 4 (2013): 493-498.

Page 101: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Impairment at long durations is specific to textures, not present for single sources:

McDermott et al. 2013

101

© Nature Neuroscience. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summarystatistics in auditory perception." Nature neuroscience 16, no. 4 (2013): 493-498.

Page 102: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

5 drum hits/sec

50 drum hits/sec

McDermott et al. 2013

102

© Nature Neuroscience. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summarystatistics in auditory perception." Nature neuroscience 16, no. 4 (2013): 493-498.

Page 103: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Impairment at long durations again specific to dense textures.

McDermott et al. 2013

103

© Nature Neuroscience. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summarystatistics in auditory perception." Nature neuroscience 16, no. 4 (2013): 493-498.

Page 104: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

•High performance with short excerpts indicates that all stimuli have discriminable variation.

•But temporal detail is not retained when signals are texture- like.

McDermott et al. 2013

104

© Nature Neuroscience. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summarystatistics in auditory perception." Nature neuroscience 16, no. 4 (2013): 493-498.

Page 105: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

A Speculative Framework:

• Sounds are encoded both as sequences of features, and with statistics that average information over time.

• Feature encodings are sparse for typical natural sound sources, but dense for textures.

• Memory capacity places limits on number of features that can be retained.

• Sound is continuously and obligatorily encoded • When memory capacity for feature sequences is

reached, memory is overwritten by incoming sound, leaving only statistics

105

© Nature Neuroscience. All rights reserved. This content is excluded from our CreativeCommons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. "Summarystatistics in auditory perception." Nature neuroscience 16, no. 4 (2013): 493-498.

Page 106: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Listen to original, then synthetic; rate realism. (170 sounds)

McDermott and Simoncelli, Neuron, 2011

© Elsevier. All rights reserved. This content is excluded from our Creative Commons license. For

more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via statistics of

the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940.

106

Page 107: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Railroad crossing Tapping rhythm - quarter note pairs Wind chimes Running up stairs Tapping rhythm - quarter note triplets Snare drum beats Walking on gravel Snare drum rimshot sequence Music - drum break Music - mambo Bongo drum loop Firecracker explosions Person speaking French Church bells Person speaking English

1.93 1.90 1.77 1.77 1.70 1.67 1.63 1.60 1.60 1.50 1.50 1.47 1.40 1.37 1.20

Lowest rated sounds are among most interesting, as they imply brain is measuring something model is not:

Pitch Rhythm Pitch Reverb Rhythm Reverb Reverb Rhythm Pitch Rhythm Reverb Pitch Pitch Pitch

107

Page 108: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

•Sound synthesis can help us test/explore theories of audition. -variables that produce compelling synthesis could underlie perception. -synthesis failures point the way to new variables that might be important for the perceptual system. •Textures are a nice point of entry into real-world hearing •Many natural sounds may be recognized with relatively simple statistics of early auditory representations

-simplest statistics (spectrum) are not that informative -slightly more complex statistics are quite powerful -for textures of moderate length, statistics may be all we retain

TAKE-HOME MESSAGES

108

Page 109: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

•Locus of time-averaging? •Presumptive integration windows of several seconds are long relative to typical timescales in auditory system…

•Relation to scene analysis?

•What happens when foreground sounds are superimposed on a texture?

•What statistics are needed to account for synthesis failures?

OPEN QUESTIONS

109

Page 110: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

retina/LGN filtering

cortical filtering

envelope

ACAC

AC ACC1C1

C1C1

AC AC

C2C2

M M

Courtesy of Elsevier, Inc., https://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception via statistics of

the auditory periphery: Evidence from sound synthesis." Neuron 71, no. 5 (2011): 926-940.

110

Page 111: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

111

© Elsevier. All rights reserved. This content is excluded from our Creative Commonslicense. For more information, see https://ocw.mit.edu/help/faq-fair-use/.Source: McDermott, Josh H., and Eero P. Simoncelli. "Sound texture perception viastatistics of the auditory periphery: Evidence from sound synthesis." Neuron 71, no.5 (2011): 926-940.

Page 112: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Part 4: Auditory Scene Analysis

112

Page 113: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Auditory Scene Analysis • Process of inferring events in the world from sound

© Paramount Pictures. All rights reserved. This content is excluded from our Creative

Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

113

Page 114: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Cocktail party problem: ear receives mixture of sources:

Source 1

Source 2

Mixture

+ =

The listener is usually interested in individual sources, which must be inferred from the mixture.

114

Page 115: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

+ =

Many sets of possible sounds add up to equal the observed mixture:

= +

Sound segregation is ill-posed:

The brain must choose the correct set over the other possibilities.

X + Y = 17

115

Page 116: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

How do we manage to hear?

+ = = +

Can only solve with assumptions about sources… Can only make assumptions about sources if real-world sound sources have some degree of regularity.

116

Page 117: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Real-world sounds are not random.

Real-world sounds Fully random sounds

Real-world sounds are a very small portion of all possible sounds.

117

Page 118: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

We rely on the regularities of natural sounds in order to hear.

+ = = +

118

Page 119: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

One intuitive view of inferring a target source from a mixture: 1) Determine grouping of observed sound elements.

Red: Other sources Green: Masked

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H. "The cocktail party problem." Current Biology 19,

no. 22 (2009): R1024-R1027.

119

Page 120: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Red: Other sources Green: Masked

One intuitive view of inferring a target source from a mixture: 2) Estimate parts of source that are masked.

Courtesy of Elsevier, Inc., http://www.sciencedirect.com. Used with permission.

Source: McDermott, Josh H. "The cocktail party problem." Current Biology 19,

no. 22 (2009): R1024-R1027. 120

Page 121: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

One example of a regularity that could be used to group sound: harmonic frequencies Voices and instruments produce frequencies that are harmonics (multiples) of a fundamental:

© Sinauer. All rights reserved. This content is excluded from our Creative Commons

license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: Wolfe, Jeremy M., Keith R. Kluender, Dennis M. Levi, Linda M. Bartoshuk,

Rachel S. Herz, Roberta L. Klatzky, Susan J. Lederman, and Daniel M. Merfeld.

Sensation & perception. Sunderland, MA: Sinauer, 2006.

121

Page 122: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Frequency (KHz)

Am

plitu

de

(dB

atte

nuat

ion)

A440 on an oboe:

440Hz 440Hz

F0, 2F0, 3F0, 4F0, 5F0, 6F0, 7F0, …

Harmonicity is a key property of vocal, instrument sounds.

© Elsevier. All rights reserved. This content is excluded from our Creative Commons

license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: McDermott, Josh H., and Andrew J. Oxenham. "Music perception, pitch, and

the auditory system." Current opinion in neurobiology 18, no. 4 (2008): 452-463.

122

Page 123: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Image of moving vocal chords removed due to copyright restrictions.

Please see the video.

123

When air is blown through the vocal cords, they open and close at regular time intervals, generating a periodic series of sound pulses:

Page 124: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Classic evidence for harmonicity as a grouping cue: When one frequency is not harmonically related to a bunch of others, it segregates perceptually.

124

Page 125: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

The pitch of a sound is inferred collectively from its harmonics.

© Sinauer. All rights reserved. This content is excluded from our Creative Commons

license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Source: Wolfe, Jeremy M., Keith R. Kluender, Dennis M. Levi, Linda M. Bartoshuk,

Rachel S. Herz, Roberta L. Klatzky, Susan J. Lederman, and Daniel M. Merfeld.

Sensation & perception. Sunderland, MA: Sinauer, 2006.

125

Page 126: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Moore et al., 1986

For small mistunings, pitch of complex is shifted, but effect is reduced for larger mistunings:

Task: match the pitch of the mistuned complex with a normal complex

© The Journal of the Acoustical Society of America. All rights reserved. This content is

excluded from our Creative Commons license. For more information, see

https://ocw.mit.edu/help/faq-fair-use/.

Source: Moore, Brian CJ, Brian R. Glasberg, and Robert W. Peters. "Thresholds for hearing

mistuned partials as separate tones in harmonic complexes." The Journal of the Acoustical

Society of America 80, no. 2 (1986): 479-483.

126

Page 127: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

The Reynolds-McAdams Oboe

Evidence for grouping/segregation via common frequency modulation? Could be mediated via harmonicity alone…

127

Page 128: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Another potential grouping cue: repetition

Courtesy of National Academy of Sciences, U. S. A. Used with permission.

Source: McDermott, Josh H., David Wrobleski, and Andrew J. Oxenham. "Recovering sound

sources from embedded repetition." Proceedings of the National Academy of Sciences 108,

no. 3 (2011): 1188-1193. Copyright © 2011 National Academy of Sciences, U.S.A.

128

Page 129: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

+ =

Was the probe one of the sounds in the mixture?

Present mixture, then probe sound:

Can repetition be used to segregate sounds?

Sounds have some structure, but not enough to produce segregation of a single mixture.

McDermott, Wrobleski & Oxenham, PNAS 2011

Courtesy of National Academy of Sciences, U. S. A. Used with permission.

Source: McDermott, Josh H., David Wrobleski, and Andrew J. Oxenham. "Recovering sound

sources from embedded repetition." Proceedings of the National Academy of Sciences 108,

no. 3 (2011): 1188-1193. Copyright © 2011 National Academy of Sciences, U.S.A.

129

Page 130: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Listeners can identify sound sources from multiple mixtures:

McDermott, Wrobleski & Oxenham, PNAS 2011

https://ocwcms.mit.edu/resources/res-9-003-brains-minds-and-machines-summer-course-summer-2015/folder_contents

Courtesy of National Academy of Sciences, U. S. A. Used with permission.

Source: McDermott, Josh H., David Wrobleski, and Andrew J. Oxenham. "Recovering sound

sources from embedded repetition." Proceedings of the National Academy of Sciences 108,

no. 3 (2011): 1188-1193. Copyright © 2011 National Academy of Sciences, U.S.A.

130

Page 131: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

Summary of Generic Grouping Cues

• Common onset • Co-modulation • Harmonicity • No clear role for frequency modulation (common fate) • Weak role for spatial cues in concurrent segregation • Repetition

• How do grouping cues relate to natural sound statistics? • Are we optimal given the nature of real-world sounds? • Are grouping cues learned or hard-wired? • How important are generic cues relative to knowledge of

particular sounds (e.g. words)?

Open Questions

131

Page 132: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

•The brain estimates the causes of the sound signal that enters the ears (usually a mixture of sources).

•“Grouping cues” are presumed to be related to statistical regularities of natural sounds.

•Harmonicity

•Common onset

•Repetition

•The brain infers parts of source signals that are masked by other sources, again using prior assumptions.

•We need a proper theory in this domain to be able to predict and explain real-world performance.

TAKE-HOME MESSAGES

132

Page 133: Sound, Ears, Brains and the World - MIT OpenCourseWare · 2020-01-04 · When objects in the world vibrate, they transmit acoustic energy through surrounding medium in the form of

MIT OpenCourseWarehttps://ocw.mit.edu

Resource: Brains, Minds and Machines Summer CourseTomaso Poggio and Gabriel Kreiman

The following may not correspond to a particular course on MIT OpenCourseWare, but has beenprovided by the author as an individual learning resource.

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.