CS 551/651: Structure of Spoken Language Lecture 10: Overview of Sound Perception John-Paul Hosom Fall 2010



Anatomy


Anatomy

The Outer Ear

composed of: pinna (auricle) and external auditory meatus (ear canal)

functions:
• irregular shape of pinna directs high-frequency sounds into ear canal
• shape of pinna helps with determining location of sound
• ear canal acts as resonator (2.7 cm long), with broad resonance between 3 and 5 kHz

implications: smaller animals are better at hearing high-frequency sounds
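The resonance figure above can be checked roughly by treating the ear canal as a uniform tube. The tube is open at the pinna and closed at the eardrum, so it resonates where the canal is a quarter wavelength long: f = c / (4L). This tube model and the 343 m/s speed of sound are assumptions of this sketch, not statements from the lecture. The same formula shows why a shorter canal shifts the resonance upward.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: air at roughly 20 degrees C


def quarter_wave_resonance_hz(length_m: float, c: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """First resonance of a uniform tube closed at one end (assumed ear-canal model)."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    return c / (4.0 * length_m)


if __name__ == "__main__":
    # 2.7 cm human ear canal, as given on the slide
    print(f"2.7 cm canal: {quarter_wave_resonance_hz(0.027):.0f} Hz")
    # hypothetical shorter canals, only to illustrate the size trend
    for length_cm in (2.0, 1.0, 0.5):
        print(f"{length_cm:.1f} cm canal: {quarter_wave_resonance_hz(length_cm / 100.0):.0f} Hz")
```

With L = 2.7 cm this gives about 3.2 kHz, inside the 3 to 5 kHz band quoted above.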


Anatomy

The Middle Ear

composed of:
• chamber (tympanic cavity) containing the ossicular chain: malleus (hammer), incus (anvil), stapes (stirrup)
• middle ear also contains the epitympanic recess (the upper parts of the ossicles, the head of the malleus and the body of the incus, are lodged in the epitympanic recess)

• the tympanic membrane (ear drum) is the partition between the ear canal (outer ear) and the middle ear

• sound is transferred from the tympanic membrane to the cochlea (inner ear) via the ossicular chain

• the stapes ends in a footplate, which is attached to the “oval window,” a membrane of the inner ear

functions:
• matches the acoustic vibration of air to that of the fluid in the cochlea (if sound in air struck the oval window (water-like fluid) directly, there would be about a 30 dB drop in energy)
• has a low-pass filter effect
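The roughly 30 dB figure can be reproduced with the plane-wave formula for intensity transmitted across a boundary at normal incidence: T = 4·Z1·Z2 / (Z1 + Z2)². Here Z1 and Z2 are the characteristic acoustic impedances of the two media. The impedance values below are approximate textbook figures for air and water, assumed for this sketch. Treating cochlear fluid as water follows the slide.

```python
import math

Z_AIR = 415.0     # rayl (Pa*s/m), approx. rho*c for air near 20 C (assumed value)
Z_WATER = 1.48e6  # rayl, approx. for water (assumed stand-in for cochlear fluid)


def intensity_transmission(z1: float, z2: float) -> float:
    """Fraction of incident intensity transmitted across a boundary (normal incidence)."""
    return 4.0 * z1 * z2 / (z1 + z2) ** 2


def transmission_loss_db(z1: float, z2: float) -> float:
    """Transmitted relative to incident intensity, in dB (negative means a loss)."""
    return 10.0 * math.log10(intensity_transmission(z1, z2))


if __name__ == "__main__":
    t = intensity_transmission(Z_AIR, Z_WATER)
    print(f"fraction transmitted: {t:.5f}")
    print(f"change in level: {transmission_loss_db(Z_AIR, Z_WATER):.1f} dB")
```

The result is a loss of about 29.5 dB. In other words, only about 0.1% of the incident energy would enter the fluid without the matching done by the middle ear.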


Anatomy

The Inner Ear

composed of: cochlea, semicircular canals

function: simply speaking, the inner ear performs a frequency analysis of the incoming sound; the result is transmitted via the VIIIth cranial nerve to the CNS.

cochlea:
• spiral in shape
• 35 mm long, wound in 2 ¾ turns
• filled with “incompressible,” “water-like” fluid
• separated into 3 parts by two membranes, the Basilar Membrane (BM) and Vestibular (Reissner’s) Membrane
• thousands of hair cells are attached to BM; these cells are connected to neurons that fire in response to sound


Anatomy

The Inner Ear

cochlea:

• Sound from the ear canal is amplified by the middle ear

• Vibration of the stapes against the oval window is received by the cochlea (not just at the oval window but by the entire cochlea; no standing waves)

• The entire cochlea vibrates at the same frequency as the stimulus

• Different locations of the cochlea respond better to particular frequencies

• Locations near the base of the cochlea respond more to higher frequencies.

• Cochlea and VIIIth nerve have tonotopic organization: direct mapping between place and frequency
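The place-to-frequency mapping is often summarized with Greenwood's function. That function is not part of this lecture; it is swapped in here as one commonly used fit. For humans it is usually quoted as f = 165.4·(10^(2.1·x) − 0.88), where x is the fractional distance along the BM from the apex (x = 0) to the base (x = 1). The sketch below assumes those published constants and uses the 35 mm cochlear length from the earlier slide. Treat its output as approximate.

```python
import math

# Greenwood constants commonly quoted for the human cochlea (assumed, not from the lecture)
A_HZ = 165.4
ALPHA = 2.1   # per unit of fractional length along the BM
K = 0.88
COCHLEA_LENGTH_MM = 35.0  # from the lecture


def greenwood_cf_hz(x_from_apex: float) -> float:
    """Characteristic frequency at fractional distance x (0 = apex, 1 = base)."""
    return A_HZ * (10.0 ** (ALPHA * x_from_apex) - K)


def greenwood_place(freq_hz: float) -> float:
    """Inverse map: fractional distance from the apex whose CF is freq_hz."""
    return math.log10(freq_hz / A_HZ + K) / ALPHA


if __name__ == "__main__":
    for mm_from_base in (0, 5, 10, 20, 30, 35):
        x = 1.0 - mm_from_base / COCHLEA_LENGTH_MM  # convert to fraction from the apex
        print(f"{mm_from_base:2d} mm from base: CF ~ {greenwood_cf_hz(x):7.0f} Hz")
    for f in (250.0, 1000.0, 4000.0):
        mm = (1.0 - greenwood_place(f)) * COCHLEA_LENGTH_MM
        print(f"{f:.0f} Hz peaks about {mm:.1f} mm from the base")
```

The map runs from about 20 Hz at the apex to about 20 kHz at the base. Higher CFs lie nearer the base, consistent with the bullets above.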


Anatomy

A schematic of the cochlea “unrolled” (middle) and basilar membrane (bottom). The top figure indicates the tonotopic organization.

(from J.D. Durrant and J.H. Lovrinic, Bases of Hearing Science, 1977, in Daniloff p. 395)


Anatomy

This figure shows the instantaneous displacement of the BM at two instants in time in response to a 200-Hz sine wave, along with the envelope of amplitude peaks for this wave. Each point on the BM vibrates at a frequency equal to the input frequency (200 Hz).


Tonotopic organization

• The BM varies in tautness and shape along its length, creating different frequency responses

• The taut base responds well to high-frequency sounds; the compliant apex (tip) responds well to low-frequency sounds.

• Each point on the BM has a “characteristic frequency” (CF) at which its frequency response is maximum

• The bandpass filter at each CF has a constant ratio of center frequency to bandwidth (constant Q), implying better frequency resolution (narrower bandwidth in Hz) at lower frequencies

Anatomy
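Under the constant-ratio description in the last bullet, a filter's bandwidth is its CF divided by a fixed quality factor, Q = CF / bandwidth. This sketch assumes a hypothetical Q of 4 purely for illustration; the lecture gives no value.

```python
Q = 4.0  # hypothetical quality factor (CF / bandwidth), chosen only for illustration


def bandwidth_hz(cf_hz: float, q: float = Q) -> float:
    """Bandwidth of a constant-Q bandpass filter centered at cf_hz."""
    return cf_hz / q


if __name__ == "__main__":
    for cf in (250.0, 500.0, 1000.0, 4000.0):
        bw = bandwidth_hz(cf)
        print(f"CF {cf:6.0f} Hz -> bandwidth {bw:6.1f} Hz (CF/BW = {cf / bw:.1f})")
```

Bandwidth in Hz grows with CF: 62.5 Hz at 250 Hz versus 1000 Hz at 4 kHz. This is the sense in which resolution is better at low frequencies.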


Transduction

• Between the BM and the tectorial membrane (a thin, responsive, gelatinous membrane) are the hair cells: about 25,000-30,000 outer hair cells and 3500-5000 inner hair cells in humans (the “tunnel of Corti” separates outer from inner). Each hair cell has 30-300 hair-like projections, called stereocilia, that protrude in a “bundle” from its surface into the fluid-filled cavity.

• When the BM vibrates up and down, it creates a “shearing” motion between the tectorial membrane and the stereocilia. This motion displaces the tips of the stereocilia, changing the electrical potential of the hair cell and triggering action potentials in the neurons connected to it; the electrical signal is then transmitted along the auditory nerve.

Anatomy


Transduction

• Most (95%) of the neurons connecting the cochlea to higher levels of the auditory system connect to inner hair cells

• The function of outer hair cells is less clear; they provide amplification and sharp tuning (partially under the control of higher levels).

• Hair cells connect to neurons; there are about 30,000 neurons in one human auditory nerve.

Anatomy


Anatomy

Organ of Corti contains hair cells and neurons:

[Figure: organ of Corti. 1: inner hair cell; 2: outer hair cells (three parallel rows); 3: tunnel of Corti; 4: basilar membrane; 5: habenula perforata; 6: tectorial membrane; 7: Deiters' cells; 8: space of Nuel; 9: Hensen's cells; 10: inner spiral sulcus; 11: nerves]

from: http://www.iurc.montp.inserm.fr/cric/audition/english/corti/corti.htm


Anatomy

The same picture from Gray’s Anatomy (1918):


Neural Response

Each neuron in the auditory nerve responds to certain frequencies; its response to each frequency can be plotted by stimulating the neuron with that frequency and measuring its response rate (firing rate).

The most sensitive frequency is the “Characteristic Frequency” (CF)
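Finding the CF from such measurements amounts to picking the tested frequency with the highest firing rate when every tone is presented at the same level. The rates below are hypothetical numbers made up for illustration, not recordings from the lecture. The estimate can be no finer than the spacing of the tested frequencies.

```python
from typing import Dict

# Hypothetical firing rates (spikes/sec) for one neuron; all tones at the same level
measured_rates: Dict[float, float] = {
    500.0: 12.0,
    1000.0: 35.0,
    1500.0: 88.0,
    2000.0: 140.0,
    2500.0: 96.0,
    3000.0: 40.0,
}


def characteristic_frequency(rates: Dict[float, float]) -> float:
    """Return the tested frequency that produced the highest firing rate."""
    return max(rates, key=rates.get)


if __name__ == "__main__":
    print(f"estimated CF: {characteristic_frequency(measured_rates):.0f} Hz")
```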


Neural Response

Auditory firings are processed by two types of neurons:
• ones extracting precise temporal features (onset chopper units)
• others extracting spectral features (transient chopper units)

(O’Shaughnessy p. 113)

Each neuron has a spontaneous rate of firing; this rate depends on the sensitivity of the neuron (high spontaneous rates are associated with a low threshold of firing).

3 “groups” of spontaneous rates: high rate (61%, 18 to 250 spikes/sec), medium rate (23%, 0.5 to 18 spikes/sec), low rate (16%, < 0.5 spikes/sec).
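The three spontaneous-rate groups translate directly into a classification rule. This is a minimal sketch. The slide's ranges share their endpoints (18 and 0.5 spikes/sec), so assigning a boundary value to the higher group is an assumption made here.

```python
def spontaneous_rate_group(rate_spikes_per_sec: float) -> str:
    """Classify an auditory-nerve fiber by its spontaneous rate.

    Boundaries come from the slide; placing 18 and 0.5 spikes/sec in the
    higher group is an assumption, and the 250 spikes/sec upper end is not enforced.
    """
    if rate_spikes_per_sec < 0:
        raise ValueError("rate must be non-negative")
    if rate_spikes_per_sec >= 18.0:
        return "high"
    if rate_spikes_per_sec >= 0.5:
        return "medium"
    return "low"


if __name__ == "__main__":
    for r in (0.1, 0.5, 5.0, 18.0, 60.0):
        print(f"{r:5.1f} spikes/sec -> {spontaneous_rate_group(r)}")
```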


Neural Response

The firing rate of a neuron to a given stimulus can be plotted:

• Firing rate has a dynamic range; if intensity is below or above this range, firing rate won’t change.

• Typical range of 20 dB for low-threshold fibers, 40-50 dB for high-threshold fibers

[Figure: firing rate as a function of stimulus intensity; label: audiovisual detection level = threshold]
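The dynamic-range idea can be sketched as a piecewise-linear rate-level function. The rate stays at the spontaneous level below threshold, rises linearly across the dynamic range, and saturates above it. The linear shape is a simplification assumed here. All parameter values in the example (threshold, spontaneous and maximum rates) are hypothetical; only the 20 dB range width comes from the slide.

```python
def firing_rate(level_db: float, threshold_db: float, dynamic_range_db: float,
                spont_rate: float, max_rate: float) -> float:
    """Piecewise-linear rate-level function (a simplification assumed for this sketch)."""
    if level_db <= threshold_db:
        return spont_rate  # below the range: rate stays at the spontaneous level
    if level_db >= threshold_db + dynamic_range_db:
        return max_rate    # above the range: rate saturates and no longer changes
    fraction = (level_db - threshold_db) / dynamic_range_db
    return spont_rate + fraction * (max_rate - spont_rate)


if __name__ == "__main__":
    # hypothetical low-threshold fiber with a 20 dB dynamic range
    for level in (0, 10, 20, 30, 40, 50):
        rate = firing_rate(level, threshold_db=10, dynamic_range_db=20,
                           spont_rate=60, max_rate=250)
        print(f"{level:2d} dB -> {rate:5.1f} spikes/sec")
```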


Neural Response

With three groups of neurons having different thresholds and firing rates, the auditory nerve can cover a wide range of signal levels at a given frequency:

[Figure: firing rate vs. signal level for the neuron groups, from high rate to low rate]
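Three groups can extend coverage because each group's rate varies only within its own dynamic range. The union of the ranges is therefore wider than any single one. The thresholds and the 40 dB width for the medium group are hypothetical. The other widths come from the previous slides: 20 dB for low-threshold (high-spontaneous-rate) fibers and 50 dB for high-threshold (low-spontaneous-rate) fibers.

```python
from typing import List, Tuple

# (name, threshold in dB, dynamic range in dB); thresholds are hypothetical
FIBER_GROUPS: List[Tuple[str, float, float]] = [
    ("high spontaneous rate", 0.0, 20.0),
    ("medium spontaneous rate", 15.0, 40.0),
    ("low spontaneous rate", 35.0, 50.0),
]


def covered_range_db(groups: List[Tuple[str, float, float]]) -> Tuple[float, float]:
    """Lowest and highest level at which some group's rate still changes.

    Assumes the groups' ranges overlap or touch, so their union is one interval.
    """
    low = min(threshold for _, threshold, _ in groups)
    high = max(threshold + width for _, threshold, width in groups)
    return low, high


def groups_in_range(level_db: float, groups: List[Tuple[str, float, float]]) -> List[str]:
    """Names of the groups whose firing rate still varies with level at level_db."""
    return [name for name, threshold, width in groups
            if threshold <= level_db <= threshold + width]


if __name__ == "__main__":
    low, high = covered_range_db(FIBER_GROUPS)
    print(f"combined coverage: {low:.0f} to {high:.0f} dB ({high - low:.0f} dB)")
    for level in (10.0, 30.0, 70.0):
        print(f"{level:.0f} dB: {groups_in_range(level, FIBER_GROUPS)}")
```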


Phase Locking

• In addition to encoding frequency according to place along the BM, information is encoded in the timing of neuron firings: firings tend to occur at a consistent phase of the stimulus waveform (“phase locking”)

• Upper limit of 4 to 5 kHz for phase locking

Neural Response

This figure shows the number of neuron firings over time in response to three different tones; the timing of the firings is related to the frequency of the tone

[Figure: count of firings vs. time (msec) for three tones: 2.45 msec/group = 408 Hz; 1.18 msec/group = 850 Hz; 1.0 msec/group = 1000 Hz]
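The frequencies in the figure follow from inverting the spacing between groups of firings: f = 1 / period. This minimal sketch reproduces that arithmetic and adds a rough check against the 4 to 5 kHz upper limit for phase locking. Using 5 kHz as the cutoff is an assumption.

```python
PHASE_LOCKING_LIMIT_HZ = 5000.0  # assumed cutoff: upper end of the slide's 4-5 kHz limit


def frequency_from_period_ms(period_ms: float) -> float:
    """Tone frequency implied by the spacing (in msec) between groups of firings."""
    return 1000.0 / period_ms


def can_phase_lock(freq_hz: float, limit_hz: float = PHASE_LOCKING_LIMIT_HZ) -> bool:
    """Rough check against the upper frequency limit for phase locking."""
    return freq_hz <= limit_hz


if __name__ == "__main__":
    for period in (2.45, 1.18, 1.0):
        f = frequency_from_period_ms(period)
        print(f"{period:.2f} msec/group -> {f:.0f} Hz, phase locking possible: {can_phase_lock(f)}")
    print(f"shortest period at the cutoff: {1000.0 / PHASE_LOCKING_LIMIT_HZ:.2f} msec")
```

Note that 1 / 1.18 msec is about 847 Hz, which the figure rounds to 850 Hz.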


Neural Response

Neural Recruitment

Another “method” for increasing dynamic range is for multiple neurons to fire in response to the same stimulus

If the stimulus is low in energy, a small number of neurons, with CFs near the stimulus frequency, fire

More intense stimuli cause more neurons, with CFs farther from the stimulus frequency, to fire

[Figure: fibers responding to a weak and a strong stimulus of the same frequency; 1 line = 50 to 100 fibers]
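Recruitment can be illustrated with a toy population model of fibers with log-spaced CFs. Each fiber has a V-shaped tuning curve whose threshold rises by a fixed number of dB per octave away from its CF. A fiber fires when the stimulus level reaches its threshold at the stimulus frequency. The population size, the 10 dB best threshold, and the 20 dB/octave slope are hypothetical values chosen only to show the trend.

```python
import math
from typing import List


def fiber_cfs(n_fibers: int = 200, low_hz: float = 100.0, high_hz: float = 10000.0) -> List[float]:
    """Hypothetical fiber population with log-spaced characteristic frequencies."""
    step = math.log(high_hz / low_hz) / (n_fibers - 1)
    return [low_hz * math.exp(i * step) for i in range(n_fibers)]


def threshold_db(stim_hz: float, cf_hz: float, best_threshold_db: float = 10.0,
                 slope_db_per_octave: float = 20.0) -> float:
    """Assumed V-shaped tuning curve: threshold rises linearly with octaves from the CF."""
    octaves_away = abs(math.log2(stim_hz / cf_hz))
    return best_threshold_db + slope_db_per_octave * octaves_away


def recruited_cfs(stim_hz: float, level_db: float, cfs: List[float]) -> List[float]:
    """CFs of the fibers whose threshold at stim_hz is at or below the stimulus level."""
    return [cf for cf in cfs if threshold_db(stim_hz, cf) <= level_db]


if __name__ == "__main__":
    population = fiber_cfs()
    for level in (20.0, 40.0, 60.0):
        active = recruited_cfs(1000.0, level, population)
        print(f"{level:.0f} dB at 1 kHz: {len(active)} fibers, "
              f"CFs {min(active):.0f} to {max(active):.0f} Hz")
```

As the level rises, the active fibers spread to CFs farther from 1 kHz, matching the weak and strong stimulus cases in the figure.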


Neural Response

Adaptation

If a stimulus persists, neurons “adapt” to it, decreasing their firing rate with an exponential decay (time constant 40 msec).

Most of the decay occurs within 15-20 msec of stimulus onset.

When the stimulus is removed, the firing rate falls to near zero and then increases exponentially back to the “spontaneous” rate.

There may be two classes of neurons:

• neurons that respond to steady-state sounds,

• neurons that respond to changes in frequency, with frequency sensitivity greatest at levels near human speech (O’Shaughnessy p. 119)


Hearing Threshold

This figure shows the absolute thresholds of hearing, as a function of frequency


JND

“Just Noticeable Difference”: measure of ability to perceive a difference

JND tests:

• Two stimuli differ along one dimension, otherwise identical

• Subjects asked if two sounds are the same or different (“AX” test, is X=A?)

• Or subjects are asked which of two sounds most resembles third (“ABX” or “AXB” test, is X = A or B?)

• The JND occurs when 75% of responses are “different” (AX) or correctly identified (ABX)
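The 75% criterion can be applied to measured proportions by linear interpolation between the two tested differences that bracket 75%. The results below are hypothetical AX data invented for illustration. Interpolation is one simple choice; fitting a psychometric function is a common alternative.

```python
from typing import Sequence


def jnd_at_75_percent(differences: Sequence[float], proportions: Sequence[float],
                      criterion: float = 0.75) -> float:
    """Interpolate the stimulus difference at which responses first reach the criterion.

    differences: increasing stimulus differences tested (e.g. in Hz)
    proportions: fraction of "different" (AX) or correct (ABX) responses at each one
    """
    points = list(zip(differences, proportions))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    raise ValueError("criterion not crossed by the data")


if __name__ == "__main__":
    # hypothetical AX results for a 1000 Hz standard; not data from the lecture
    diffs_hz = [0.0, 1.0, 2.0, 3.0, 4.0]
    p_different = [0.10, 0.35, 0.60, 0.85, 0.95]
    print(f"JND ~ {jnd_at_75_percent(diffs_hz, p_different):.2f} Hz")
```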

People are able to discriminate between 100 Hz and 101 Hz, but can’t identify whether a tone is 100, 101, … , 109 Hz without making pairwise comparisons


JND

JND Trivia:

JND is greater for louder sounds, sounds with duration 250 msec

The loudness of sounds of equal intensity increases with duration, up to about 200 msec

Below 1 kHz, two tones must be different by 1 to 3 Hz to be perceived as different

At higher frequencies, JND is larger (approx. 8 kHz tones require a 100 Hz separation to be heard as different)

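Across the entire frequency range there are about 1600 distinguishable frequencies and 350 distinguishable intensities, giving about 300,000 tones of different frequency and intensity that can be distinguished in pairwise comparison (for durations > 200 msec). This is fewer than the 1600 × 350 product because not every frequency step is distinguishable at every intensity.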

For duration < 250 msec, there are 850 frequency levels; for duration < 10 msec only 120 levels and 170 intensities

Identifying frequencies in isolation yields far fewer distinguishable tones.
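The frequency JNDs above can also be expressed as a fraction of the tone frequency, Δf / f. This is often called the Weber fraction, a term not used in the lecture. Expressed this way, the relative JND also grows at high frequencies. The frequency and JND pairings below are assumptions drawn from the figures quoted above: 3 Hz (the top of the 1 to 3 Hz range) at a 500 Hz tone, and 100 Hz at 8 kHz.

```python
def weber_fraction(jnd_hz: float, freq_hz: float) -> float:
    """Relative frequency JND (delta f / f)."""
    return jnd_hz / freq_hz


if __name__ == "__main__":
    # (tone frequency, JND) pairs assumed from the slide's figures
    examples = [(500.0, 3.0), (8000.0, 100.0)]
    for freq, jnd in examples:
        print(f"{freq:.0f} Hz, JND {jnd:.0f} Hz -> {100 * weber_fraction(jnd, freq):.2f}% of f")
```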


JND

JND Trivia, Timing Information

Onsets of two signals must differ by at least 2 msec to be heard as separate sounds

To identify the order of two signals, a gap of about 17 msec is required, and the sounds must be 125-200 msec long

However, people use the rise and fall of amplitude to segment speech: they cannot identify the order of 4 vowels of 200 msec duration in a repeating sequence, but can identify the order of much shorter vowels if there are amplitude onsets and offsets as well as a 50 msec gap between vowels.

Sounds with an energy onset shorter than 20 msec are heard as “plucks”; otherwise, they are heard as “bows”
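The 20 msec pluck/bow boundary can be sketched as a rule on the rise time of an amplitude envelope. Here rise time is measured from the start of the envelope (assumed to be the stimulus onset) to the first sample at peak amplitude. That definition and the linear test envelopes are assumptions of this sketch, and other onset measures would shift the numbers somewhat.

```python
from typing import List, Sequence

PLUCK_RISE_LIMIT_MS = 20.0  # boundary from the slide


def rise_time_ms(envelope: Sequence[float], sample_period_ms: float = 1.0) -> float:
    """Time from the first sample (assumed stimulus onset) to the first sample at peak.

    This onset definition is an assumption; a 10%-90% rise time, for example, would differ.
    """
    peak = max(envelope)
    peak_index = next(i for i, v in enumerate(envelope) if v >= peak)
    return peak_index * sample_period_ms


def pluck_or_bow(envelope: Sequence[float], sample_period_ms: float = 1.0) -> str:
    """Apply the slide's rule: onsets shorter than 20 msec are heard as plucks."""
    return "pluck" if rise_time_ms(envelope, sample_period_ms) < PLUCK_RISE_LIMIT_MS else "bow"


def linear_onset(rise_ms: int, hold_ms: int = 100) -> List[float]:
    """Hypothetical envelope sampled every 1 msec: linear ramp from 0 to 1, then a hold."""
    if rise_ms < 1:
        raise ValueError("rise_ms must be at least 1")
    return [i / rise_ms for i in range(rise_ms + 1)] + [1.0] * hold_ms


if __name__ == "__main__":
    for rise in (5, 15, 40, 80):
        print(f"{rise} msec onset -> {pluck_or_bow(linear_onset(rise))}")
```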