sgn-14006 / a.k. hearingsgn14006/pdf/l03-hearing.pdf · 2 ear physiology sgn-14006 / a.k. ! the...

10
Hearing 1 SGN-14006 / A.K. Contents: 1. Introduction 2. Ear physiology 3. Masking 4. Sound pressure level 5. Loudness 6. Pitch 7. Spatial hearing Hearing Sources: Rossing. (1990). ”The science of sound”. Chapters 5–7. Karjalainen. (1999). ”Kommunikaatioakustiikka”. Moore. (1997). ”An introduction to the psychology of hearing”. Hearing 2 SGN-14006 / A.K. 1 Introduction ! Auditory system can be divided in two parts Peripheral auditory system (outer, middle, and inner ear) Auditory nervous system (in the brain) ! Ear physiology studies the peripheral system ! Psychoacoustics studies the entire sensation: relationships between sound stimuli and the subjective sensation Hearing 3 SGN-14006 / A.K. 1.1 Auditory system ! Dynamic range of hearing is wide ratio of a very loud to a barely audible sound pressure level is 1:10 5 (powers 1:10 10 , 100 dB) ! Frequency range of hearing varies a lot between individuals only few can hear from 20 Hz to 20 kHz sensitivity to low sounds (< 100Hz) is not very good sensitivity to high sounds (> 12 kHz) decreases along with age ! Selectivity of hearing listener can pick an instrument from among an orchestra listener can follow a speaker at a cocktail party One can sleep in background noise but still wake up to an abnormal sound Hearing 4 SGN-14006 / A.K. 1.2 Psychoacoustics ! Perception involves information processing in the brain Information about the brain is limited ! Psychoacoustics studies the relationships between sound stimuli and the resulting sensations Attempt to model the process of perception For example trying to predict the perceived loudness / pitch / timbre from the acoustic properties of the sound signal ! In a psychoacoustic listening test Test subject listens to sounds Questions are made or the subject is asked to describe her sensasions

Upload: others

Post on 20-Jun-2020

28 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SGN-14006 / A.K. Hearingsgn14006/PDF/L03-hearing.pdf · 2 Ear physiology SGN-14006 / A.K. ! The human ear consists of three main parts: (1) outer ear, (2) middle ear, (3) inner ear

Hearing 1 SGN-14006 / A.K.

Contents: 1.  Introduction 2.  Ear physiology 3.  Masking 4.  Sound pressure level 5.  Loudness 6.  Pitch 7.  Spatial hearing

Hearing Sources: Rossing. (1990). ”The science of sound”. Chapters 5–7.

Karjalainen. (1999). ”Kommunikaatioakustiikka”. Moore. (1997). ”An introduction to the psychology of hearing”.

Hearing 2 SGN-14006 / A.K. 1 Introduction

!  Auditory system can be divided in two parts –  Peripheral auditory system (outer, middle, and inner ear) –  Auditory nervous system (in the brain)

!  Ear physiology studies the peripheral system !  Psychoacoustics studies the entire sensation:

relationships between sound stimuli and the subjective sensation

Hearing 3 SGN-14006 / A.K. 1.1 Auditory system

!  Dynamic range of hearing is wide –  ratio of a very loud to a barely audible sound pressure level is

1:105 (powers 1:1010, 100 dB)

!  Frequency range of hearing varies a lot between individuals –  only few can hear from 20 Hz to 20 kHz –  sensitivity to low sounds (< 100Hz) is not very good –  sensitivity to high sounds (> 12 kHz) decreases along with age

!  Selectivity of hearing –  listener can pick an instrument from among an orchestra –  listener can follow a speaker at a cocktail party –  One can sleep in background noise but still wake up to an

abnormal sound

Hearing 4 SGN-14006 / A.K. 1.2 Psychoacoustics

!  Perception involves information processing in the brain –  Information about the brain is limited

!  Psychoacoustics studies the relationships between sound stimuli and the resulting sensations –  Attempt to model the process of perception –  For example trying to predict the perceived loudness / pitch /

timbre from the acoustic properties of the sound signal

!  In a psychoacoustic listening test –  Test subject listens to sounds –  Questions are made or the subject is asked to describe her

sensasions

Page 2: SGN-14006 / A.K. Hearingsgn14006/PDF/L03-hearing.pdf · 2 Ear physiology SGN-14006 / A.K. ! The human ear consists of three main parts: (1) outer ear, (2) middle ear, (3) inner ear

Hearing 5 SGN-14006 / A.K. 2 Ear physiology

!  The human ear consists of three main parts: (1) outer ear, (2) middle ear, (3) inner ear

Hearing 6 SGN-14006 / A.K. 2.1 Outer ear

!  Outer ear consists of: –  pinna – gathers sound; direction-dependent response –  auditory canal (ear canal) - conveys sound to middle ear

Nerve signal to brain

[Chittka05]

Hearing 7 SGN-14006 / A.K. 2.2 Middle ear

!  Middle ear contains –  Eardrum that transforms sound waves into mechanic vibration –  Tiny audtory bones: hammer (resting against the eardrum, see

figure), anvil and stirrup

!  The bones transmit eardrum vibrations to the oval window of the inner ear

!  Acoustic reflex: when sound pressure level exceeds ~80 dB, eardrum tension increases and stirrup is removed from oval window –  Protects the inner ear

from damage

Hearing 8 SGN-14006 / A.K. 2.3 Inner ear, cochlea

!  The inner ear contains the cochlea: a fluid-filled organ where vibrations are converted into nerve impulses to the brain.

!  Cochlea = Greek: “snail shell”. !  Spiral tube: When stretched out, approximately 30 millimeters long. !  Vibrations on the cochlea’s oval window cause hydraulic pressure

waves inside the cochlea !  Inside the cochlea there is

the basilar membrane, !  On the basilar membrane

there is the organ of Corti with nerve cells that are sensitive to vibration

!  Nerve cells transform movement information into neural impulses in the auditory nerve

Page 3: SGN-14006 / A.K. Hearingsgn14006/PDF/L03-hearing.pdf · 2 Ear physiology SGN-14006 / A.K. ! The human ear consists of three main parts: (1) outer ear, (2) middle ear, (3) inner ear

Hearing 9 SGN-14006 / A.K. 2.4 Basilar membrane

!  Figure: cochlea stretched out for illustration purposes –  Basilar membrane divides the fluid of the cochlea into separate

tunnels –  When hydraulic pressure waves travel along the cochlea, they

move the basilar membrane

Hearing 10 SGN-14006 / A.K. Basilar membrane

Best freq (Hz)

Travelling waves:

!  Different frequencies produce highest amplitude at different sites !  Preliminary frequency analysis happens on the basilar membrane

Hearing 11 SGN-14006 / A.K.

!  Distributed along the basilar membrane are sensory hair cells that transform membrane movement into neural impulses

!  When a hair cell bends, it generates neural impulses –  Impulse rate depends on vibrate amplitude and frequency

!  Each nerve cell has a characteristic frequency to which it is most responsive to (Figure: �tuning curves� of 6 different cells)

2.5 Sensory hair cells Hearing 12

SGN-14006 / A.K. 3 Masking

!  Masking describes the situation where a weaker but clearly audible signal (maskee, test tone) becomes inaudible in the presence of a louder signal (masker)

!  Masking depends on both the spectral structure of the sounds and their variation over time

Page 4: SGN-14006 / A.K. Hearingsgn14006/PDF/L03-hearing.pdf · 2 Ear physiology SGN-14006 / A.K. ! The human ear consists of three main parts: (1) outer ear, (2) middle ear, (3) inner ear

Hearing 13 SGN-14006 / A.K. 3.1 Masking in frequency domain

!  Model of the frequency analysis in the auditory system –  subdivision of the frequency axis into critical bands –  frequency components within a same critical band mask each

other easily –  Bark scale: frequency scale that is derived by mapping

frequencies to critical band numbers !  Narrowband noise masks a tone (sinusoidal) easier than

a tone masks noise !  Masked threshold refers to the raised threshold of

audibility caused by the masker –  sounds with a level below the masked threshold are inaudible –  masked threshold in quiet = threshold of hearing in quiet

Hearing 14 SGN-14006 / A.K. Masking in frequency domain

!  Figure: masked thresholds [Herre95] –  masker: narrowband noise around 250 Hz, 1 kHz, 4 kHz –  spreading function: the effect of masking extends to the spectral

vicinity of the masker (spreads more towards high freqencies)

!  Additivity of masking: joint masked thresh is approximately (but slightly more than) sum of the components

Hearing 15 SGN-14006 / A.K. 3.2 Masking in time domain

!  Forward masking –  masking effect extends to times after the masker is switched off

!  Backwards masking –  masking extends to times before the masker is been switched on

!  Forward/backward masking does not extend far in time " simultaneous masking is more important phenomenon

forward masking

backward masking

Hearing 16 SGN-14006 / A.K. Masking: Examples

!  A single tone is played, followed by the same tone and a higher frequency tone. HF tone is reduced in intensity first by 12 dB, then by steps of 5 dB. Sequence repeats twice: second time the frequency separation between the tones is increased.

!  Attempt to mask higher frequencies !  Attempt to mask lower frequencies (not masked as

easily)

Page 5: SGN-14006 / A.K. Hearingsgn14006/PDF/L03-hearing.pdf · 2 Ear physiology SGN-14006 / A.K. ! The human ear consists of three main parts: (1) outer ear, (2) middle ear, (3) inner ear

Hearing 17 SGN-14006 / A.K. Application to audio steganography

!  Idea: hide a message in the audio data, keeping the message inaudible yet decodable

!  Example –  Here robustness to environmental noise was important

Hearing 18 SGN-14006 / A.K. 4 Sound pressure level

!  Sound signal s1(t) at time t represents pressure deviation from normal atmospheric pressure

!  Sound pressure is the (linear) RMS-level of the signal –  E{ } denotes expectation (RMS = root-mean-square level)

!  Due to the wide dynamic range, decibel scale is convenient –  pdB = 20 log10 (pRMS / p0) = Lp

where p0 is a reference pressure

2{ ( ) }RMSp E s t=

Hearing 19 SGN-14006 / A.K. 4.1 Threshold of hearing and dB scale

!  Threshold of hearing –  Weakest audible sound pressure at 1 kHz frequency is 20 µPa,

which has been chosen to be the reference level p0 of the dBscale –  Lp = 20 log10(p/p0) = 10 log10(p2/p0

2) !  Threshold of pain

–  Loudest sound that the auditory system can meaningfully deal with

–  130 dB @ 1 kHz

Hearing 20 SGN-14006 / A.K. 4.2 Multiple sources

!  Two sound sources: s(t) = s1(t) + s2(t) !  RMS pressure level of the summary signal:

!  If the signals are uncorrelated and the above formula simplifies to If p1 = p2, the sound pressure level of the summary signal is 3 dB higher than that of p1 (why?)

2 2 21 1 2 2{ ( ) } { ( ) 2 ( ) ( ) ( ) }RMSp E s t E s t s t s t s t= = + +

1 2{ ( ) ( )} 0E s t s t =

2 21 2RMSp p p= +

Page 6: SGN-14006 / A.K. Hearingsgn14006/PDF/L03-hearing.pdf · 2 Ear physiology SGN-14006 / A.K. ! The human ear consists of three main parts: (1) outer ear, (2) middle ear, (3) inner ear

Hearing 21 SGN-14006 / A.K. Multiple sources

!  Two sources with 80 dB sound pressure level –  Source signals uncorrelated: together produce 83 dB level –  Sources correlate perfectly (same sound): results in 86 dB level

!  Doubling the sound amplitude increases the sound pressure level by 6 dB –  Because: Lp = 20 log10(2·p/p0) = 20 log10(p/p0) + 6 [dB] –  Equivalent to adding another identical source next to the first one

!  Intuitively: if the two sources do not correlate, the components of the two audio signals may amplify or cancel out each other, depending on their relative phases, and hence the level will be only 83 dB

Hearing 22 SGN-14006 / A.K. 5 Loudness

!  Loudness describes the subjective level of sound –  Perception of loudness is relatively complex, but –  consistent phenomenon and –  one of the central parts of psychoacoustics

!  The loudness of a sound can be compared to a standardized reference tone, for example 1000 Hz sinusoidal tone –  Loudness level (phon) is defined to be the sound pressure level

(dB) of a 1000 Hz sinusoidal, that has the the same subjective loudness as the target sound

–  For example if the heard sound is perceived as equally loud as 40 dB 1kHz sinusoidal, is the loudness level 40 phons

Hearing 23 SGN-14006 / A.K. 5.1 Equal-loudness curves

Soun

d pr

essu

re le

vel (

dB)

Frequency (Hz)

Loudness level (phons)

Hearing 24 SGN-14006 / A.K. 5.3 Critical bands

!  Listening to two sinusoids with nearby frequencies and increasing their frequency difference, the perceived loudness increases when the frequency difference exceeds critical bandwidth –  Figure: 1 kHz @ 60 dB, Critical bandwidth is 160 Hz at 1 kHz

!  Ear analyzes sound at critical band resolution. Each critical band contributes to the overall loudness level

Page 7: SGN-14006 / A.K. Hearingsgn14006/PDF/L03-hearing.pdf · 2 Ear physiology SGN-14006 / A.K. ! The human ear consists of three main parts: (1) outer ear, (2) middle ear, (3) inner ear

Hearing 25 SGN-14006 / A.K.

!  Loudness of a complex sound is calculated by using so-called loudness density as intermediate unit

!  Loudness density at each critical band is (roughly) proportional to the log-power of the signal at the band (weighted according to sensitivity of hearing and spread slightly by convolving over frequency)

!  Overall loudness is obtained by summing up loudness density values from each critical band

!  Figure: integration of loudness for a sinusoidal tone and for wideband noise

5.4 Loudness of a complex sound

Frequency / Bark Loud

ness

den

sity

(son

es /

Bar

k)

Hearing 26 SGN-14006 / A.K. 6 Pitch

!  Pitch –  Subjective attribute of sounds that enables us to arrange them on

a frequency-related scale ranging from low to high –  Sound has a certain pitch if human listaners can consistently

match the frequency of a sinusoidal tone to the pitch of the sound

!  Fundamental frequency vs. pitch –  Fundamental frequency is a physical attribute –  Pitch is a perceptual attribute –  Both are measured in Hertz (Hz) –  In practise, perceived pitch ≈ fundamental frequency

!  "Perfect pitch" or "absolute pitch" - ability to recognize the pitch of a musical note without any reference –  Minority of the population can do that

Hearing 27 SGN-14006 / A.K. 6.1 Harmonic sound

!  For a sinusoidal tone –  Fundamental frequency = sinusoidal frequency –  Pitch ≈ sinusoidal frequency

!  Harmonic sound

Trumpet sound: * Fundamental frequency F = 262 Hz * Wavelength 1/F = 3.8 ms

Hearing 28 SGN-14006 / A.K. 6.2 Pitch perception

!  Pitch perception has been tried to explain using two competing theories –  Place theory: “Peak activity along the basilar membrane

determines pitch” (fails to explain missing fundamental) –  Periodicity theory: “Pitch depends on rate, not place, of response.”

Neurons fire in sync with signals !  The real mechanism

is a combination of the above –  Sound is subdivided into

subbands (critical bands) –  Periodicity of the

amplitude envelope (see lowest panel) is analyzed within bands

–  Results are combined across bands

Page 8: SGN-14006 / A.K. Hearingsgn14006/PDF/L03-hearing.pdf · 2 Ear physiology SGN-14006 / A.K. ! The human ear consists of three main parts: (1) outer ear, (2) middle ear, (3) inner ear

Hearing 29 SGN-14006 / A.K. 6.3 Perceptually-motivated frequency scales

mm. on basilar membrane frequency / kHz frequency / mel frequency / Bark

Hearing 30 SGN-14006 / A.K. Subjective attributes of sound

!  Sounds are typically described using four main attributes –  loudness, pitch, timbre, and duration

!  Table: dependence of the subjective attributes on physical parameters

–  ♦♦♦ = strongly dependent, ♦♦ = to some extent ♦ = weak dependency

Loudness Pitch Timbre Duration

Pressure ♦ ♦ ♦ ♦ ♦ ♦

Frequency ♦ ♦ ♦ ♦ ♦ ♦ ♦

Spectrum ♦ ♦ ♦ ♦ ♦ ♦

Duration ♦ ♦ ♦ ♦ ♦ ♦

Envelope ♦ ♦ ♦ ♦ ♦

Subjective attribute

Phys

ical

par

amet

er

Hearing 31 SGN-14006 / A.K. 7 Spatial hearing

!  The most important auditory cues for localizing a sound sources in space are 1.  Interaural time difference 2.  Interaural intensity difference 3.  Direction-dependent filtering of the sound spectrum by head and

pinnae

!  Terms –  Monaural : with one ear –  Binaural : with two ears –  Interaural : between the ears (interaural time difference etc) –  Lateralization : localizing a source in horizontal plane

Hearing 32 SGN-14006 / A.K. 7.1 Monaural source localization

!  Diretional hearing works to some extent even with one ear !  Head and pinna form a direction-dependent filter

–  Direction-dependent changes in the spectrum of the sound arriving in the ear can be described with HRTFs

–  HRTF = head-related transfer function !  HRTFs are crucial

for localizing sources in the median plane (vertical localization)

Page 9: SGN-14006 / A.K. Hearingsgn14006/PDF/L03-hearing.pdf · 2 Ear physiology SGN-14006 / A.K. ! The human ear consists of three main parts: (1) outer ear, (2) middle ear, (3) inner ear

Hearing 33 SGN-14006 / A.K. Monaural source localization

!  HRTFs can be measured by recording –  Sound emitted by a source –  Sounds arriving to the auditory canal or eardrum (transfer function

of the auditory canal does not vary along with direction) !  In practice

–  left: microphone in the ear of a test subject, OR

–  right: head and torso simulator

Hearing 34 SGN-14006 / A.K. 7.2 Localizing a sinusoidal

!  Experimenting with sinusoidal tones helps to understand the localization of more complex sounds

!  Angle-of-arrival perception for sinusoids below 750 Hz is based mainly on interaural time difference

Hearing 35 SGN-14006 / A.K. Localizing a sinusoidal

!  Interaural time difference is useful only up to 750 Hz –  Above that, the time difference is ambiguous, since there are

several wavelengths within the time difference –  Moving the head (or source movement) helps: can be done up to

1500 Hz !  At higher frequencies

(> 750 Hz) the auditory system utilizes interaural intensity difference –  Head causes and acoustic

”shadow” (sound level is lower behind the head)

–  Works especially at high frequencies

Hearing 36 SGN-14006 / A.K. 7.3 Localizing complex sounds

!  Complex sounds refer to sounds that –  involve a number of different frequency components and –  vary over time

!  Localizing sound sources is typically a result of combining all the above-described mechanisms 1.  Interaural time difference (most important) 2.  Interaural intensity difference 3.  HRTFs

!  Wideband noise: directional hearing works well

Page 10: SGN-14006 / A.K. Hearingsgn14006/PDF/L03-hearing.pdf · 2 Ear physiology SGN-14006 / A.K. ! The human ear consists of three main parts: (1) outer ear, (2) middle ear, (3) inner ear

Hearing 37 SGN-14006 / A.K. 7.4 Lateralization in headphone listening

!  When listening with headphones, the sounds are often localized inside the head, on the axis between the ears –  Sound does not seem to come from outside the head because the

diffraction caused by pinnae and head is missing –  If the sounds are processed with HRTFs carefully, they move outside the

head