adsp 10 ac psycho acoustics ec623 adsp

Audio Coding: Psychoacoustics

S. R. M. Prasanna

Dept of ECE,

IIT Guwahati,

[email protected]

Audio Coding – p. 1/45

www.jntuworld.com

http://prosper.sourceforge.net/

Motivation

Acoustics: Study of sounds

Psychoacoustics: Study of perception of sounds

Deals with characterizing human auditory perception

Particularly the time-frequency analysis capabilities of the inner ear

Audio coders achieve significant compression by exploiting the fact that perceptually irrelevant information cannot be heard

Perceptually irrelevant information is identified by incorporating several psychoacoustic principles


Human Speech Perception

Figure 1: Cross Section of Human Ear


Functions of Human Ear

Mainly three regions - outer ear, middle ear & inner ear

Outer ear - directs speech pressure variations towards the middle ear

Middle ear - transforms pressure variations into mechanical motion

Inner ear - converts mechanical vibrations into electrical firings in the auditory neurons, which lead to the brain

Language decoding and message understanding take place at the higher centers of the brain, which are less understood


Inner Ear

Figure 2: Figures Related to Inner Ear


Frequency to Place Transformation

Sound waves are converted to mechanical vibrations by the middle ear

Mechanical vibrations are converted to traveling waves by the inner ear, along the length of the basilar membrane

Neural receptors are connected along the length of the basilar membrane

Traveling waves generate peak responses at frequency-specific membrane positions

Therefore different neural receptors are effectively tuned to different frequency bands according to their locations.


Freq. to Place Tfmn. (contd.)

For sinusoidal stimuli, the peak response occurs near the basilar membrane region with a resonant freq. equal to the input sinusoid freq.

Location of the peak is the characteristic place for the stimulus

Freq. that best excites a particular place is its characteristic frequency

Thus a frequency to place transformation takes place


Signal Processing Perspective

Bank of highly overlapping band pass filters

Magnitude responses are asymmetric

Bandwidths increase with frequency


Sound Pressure Level (SPL)

A std. metric that quantifies the intensity of an acoustical stimulus

SPL gives the level (intensity) of sound pressure in dB relative to an internationally defined ref. level

$L_{SPL} = 20\log_{10}(p/p_0)$ (dB), where $L_{SPL}$ is the SPL of a stimulus, $p$ is the sound pressure of the stimulus in pascals and $p_0$ is the std. ref. level of 20 µPa

About 150 dB SPL spans the dynamic range of intensity for the human auditory system

Min value is the limit of detection for low intensity (quiet) stimuli

Max value is the threshold of pain for high intensity (loud) stimuli
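A minimal sketch of the SPL formula above, assuming the pressure is supplied as an RMS value in pascals. The names `P0` and `spl_db` are illustrative and do not come from the slides.

```python
import math

P0 = 20e-6  # standard reference pressure of 20 micropascals, as stated above


def spl_db(p_pascal: float) -> float:
    """Return the sound pressure level in dB SPL for a pressure in pascals."""
    if p_pascal <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(p_pascal / P0)
```

A stimulus at the reference pressure maps to 0 dB SPL, and every tenfold increase in pressure adds 20 dB.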


Absolute Threshold for Hearing (ATH)

Amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment

ATH is expressed in dB SPL

ATH is a frequency dependent parameter and is given by

$T_q(f) = 3.64(f/1000)^{-0.8} - 6.5e^{-0.6(f/1000 - 3.3)^2} + 10^{-3}(f/1000)^4$ (dB SPL)

In the context of signal compression, $T_q(f)$ could be interpreted naively as a maximum allowable energy level for coding distortions introduced in the frequency domain (Fig 5.1 from Spanias book)

Use of ATH to shape the coding distortion spectrum represents the first step towards perceptual coding.
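The ATH expression can be evaluated directly. The sketch below assumes frequency in Hz, and the function name `ath_db_spl` is illustrative only.

```python
import math


def ath_db_spl(f_hz: float) -> float:
    """Absolute threshold of hearing T_q(f) in dB SPL, for f in Hz (f > 0)."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)
```

The curve dips to its most sensitive region near 3.3 kHz, where the Gaussian term is largest, and rises steeply at both low and high frequencies.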


ATH Diagram

Figure 3: Absolute Threshold for Hearing


Critical Bands (CB)

Critical bandwidth is a function of frequency that quantifies the cochlear filter passbands

CB tends to remain constant (about 100 Hz) up to 500 Hz and increases to approximately 20% of the center frequency above 500 Hz

For an average listener the critical bandwidth is given by $BW_c(f) = 25 + 75[1 + 1.4(f/1000)^2]^{0.69}$ (Hz)

The function $Z_b(f) = 13\tan^{-1}(0.00076f) + 3.5\tan^{-1}((f/7500)^2)$ (Bark) is often used to convert frequency in Hz to the Bark scale

Nonuniform Hz spacing of the filter bank is actually uniform on a Bark scale

One critical band (CB) comprises one Bark. (Table 5.1 and Fig. 5.4)
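A short sketch of the two mappings above, assuming the critical bandwidth formula is written in terms of f/1000 (frequency in kHz). The function names are illustrative.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Map frequency in Hz to the Bark scale, Z_b(f)."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


def critical_bandwidth_hz(f_hz: float) -> float:
    """Critical bandwidth BW_c(f) in Hz for an average listener."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69
```

At low frequencies the bandwidth evaluates to about 100 Hz, consistent with the constant region described above.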


Critical Bands

Figure 4: Table Showing Critical Bands


Mapping from Hz to Bark

Figure 5: Mapping from Hz to Bark Scale


Simultaneous Masking

Masking: One sound is rendered inaudible because of the presence of another sound

Simultaneous masking: When two or more stimuli are simultaneously presented to the auditory system

Freq. Domain: Relative shapes of the masker and maskee magnitude spectra determine to what extent the presence of certain spectral energy will mask the presence of other spectral energy

Time Domain: Phase relationships between stimuli can also affect masking outcomes

In simple words, the presence of a strong noise or tone masker creates an excitation of sufficient strength on the basilar membrane at the critical band location to effectively block detection of a weaker (maskee) signal.


Types of Simultaneous Masking

Noise-Masking-Tone (NMT), Tone-Masking-Noise (TMN) and Noise-Masking-Noise (NMN)

NMT:

A NB noise (1 Bark) masks a tone within the same CB, provided the intensity of the masked tone is below a predictable threshold

Signal-to-Mask Ratio (SMR) (dB) is the difference between the intensities of masker and maskee

Min. SMR at the threshold of detection occurs when the maskee freq. is close to the center freq. of the masker and will be about 5 dB


TMN and NMN

TMN:

Pure tone at the center of a CB masks noise of any subcritical BW, provided the noise spectrum is below a predictable threshold

Min. SMR lies between 21 and 28 dB

NMN:

A NB noise masks another NB noise

Min. SMR is about 26 dB


Masking Schemes

Figure 6: Masking schemes


Asymmetry of Masking

NMT and TMN show an asymmetry in masking power between noise masker and tone masker

Even with both maskers at the same dB SPL, the associated threshold SMRs differ by 20 dB

Hence the interest in all types of masking

Knowledge of all three is critical to succeed in the task of shaping coding distortion

For each temporal analysis interval, a codec's perceptual model should identify, across the freq. spectrum, noise-like and tone-like components within both the audio signal and the coding distortion

Model should then apply the appropriate masking relationships to obtain a global masking threshold


Spread of Masking

Simultaneous masking is not bandlimited to within the boundaries of a single CB

Interband masking also occurs, i.e., a masker centered within one critical band has some predictable effect on detection thresholds in other CBs.

This effect is known as spread of masking

It is often modeled by a triangular spreading function that has slopes of +25 and -10 dB per Bark.

$SF_{dB}(x) = 15.81 + 7.5(x + 0.474) - 17.5\sqrt{1 + (x + 0.474)^2}$ (dB), where $x$ is in Barks and $SF_{dB}(x)$ is expressed in dB.
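The spreading function above can be checked numerically. This sketch assumes x is the Bark distance from the masker (maskee minus masker), and the name `spreading_db` is illustrative.

```python
import math


def spreading_db(x_bark: float) -> float:
    """Spreading function SF_dB(x) in dB, for a Bark distance x from the masker."""
    y = x_bark + 0.474
    return 15.81 + 7.5 * y - 17.5 * math.sqrt(1.0 + y * y)
```

Far from the masker the square-root term behaves like |x + 0.474|, so the expression approaches slopes of +25 dB per Bark below the masker and -10 dB per Bark above it, with a peak near 0 dB at x = 0.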


Just Noticeable Distortion (JND)

Global masking threshold comprises an estimate of the level at which quantization noise becomes just noticeable

Hence the global masking threshold is sometimes referred to as the level of JND


Nonsimultaneous Masking

Also termed temporal masking

Masking phenomenon extends beyond the window of simultaneous stimulus presentation

Masking occurs both prior to masker onset and also after masker removal

Forward (post) and backward (pre) masking are the two types


Figure 7: Temporal Masking

Perceptual Entropy

Entropy gives the min. no. of bits/sample required to store or transmit a given message block

Johnston combined the notion of psychoacoustic masking with signal quantization principles to define Perceptual Entropy (PE).

Perceptual Entropy gives the min. no. of bits/sample required to store or transmit the perceptually relevant information in a given audio message block.

While discussing PE, conventional entropy is termed statistical entropy.

Statistical entropy employs the statistical properties of the signal for computing entropy

Perceptual entropy employs both statistical and perceptual properties of the signal for computing entropy.


Basis for PE

Masking threshold indicates the amount of quantization noise that can be added in the freq. domain without perceptually corrupting the signal.

Assume that the step size and no. of levels in the quantizer for each spectral line could be set independently.

Further, the choice of step size is such that the total noise injected at each frequency corresponds to the masking threshold, i.e., the min. no. of quantization levels is used.

Then the no. of bits required to encode the entire transform represents the min. no. of bits necessary to transmit that block of the signal.

The total number of bits divided by the no. of samples in the transform represents the per-sample bit rate.

This per-sample bit rate is the Perceptual Entropy of the signal.


PE v/s SE

Statistical entropy (SE) exploits signal statistics

Perceptual entropy (PE) exploits signal statistics and also psychoacoustic masking

PE uses just enough quantization levels to avoid perceptual distortion due to quantization, by exploiting masking thresholds.


Steps for PE Computation

DFT computation

Finding Masking thresholds

Calculating no. of bits to quantize DFT spectrum


DFT Computation

Windowing and frequency transformation

2048-sample DFT computed by FFT

The first 1024 spectral lines are considered for further analysis


Calculation of Masking Threshold

Critical band analysis

Applying spreading function to critical band spectrum

Calculating Masking Thresholds

Accounting for absolute thresholds

Relating spread masking threshold to critical band masking threshold


Critical Band Analysis

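DFT spectrum is complex: $S(\omega) = Re(\omega) + j\,Im(\omega)$

Power spectrum: $P(\omega) = Re^2(\omega) + Im^2(\omega)$

$P(\omega)$ is partitioned into CBs

Energy in each CB: $B_i = \sum_{\omega = bl_i}^{bh_i} P(\omega)$, where $bl_i$ and $bh_i$ are the lower and upper boundaries of CB $i$

$B_i$ represents the CB spectrum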
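A sketch of the critical band analysis step using NumPy. The band boundaries are passed in as inclusive bin-index pairs. That input format, the absence of an analysis window, and the name `critical_band_energies` are assumptions made for illustration.

```python
import numpy as np


def critical_band_energies(frame, band_edges):
    """Sum the power spectrum P(w) over each critical band.

    frame      : 1-D real-valued block of audio samples
    band_edges : list of (bl_i, bh_i) inclusive DFT-bin index pairs (hypothetical input)
    """
    spectrum = np.fft.rfft(frame)                      # non-negative frequency bins only
    power = spectrum.real ** 2 + spectrum.imag ** 2    # P(w) = Re^2 + Im^2
    return np.array([power[lo:hi + 1].sum() for lo, hi in band_edges])
```

For a 2048-sample frame, `np.fft.rfft` returns 1025 bins (DC through Nyquist), so the band edges would index into that range.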


Spreading Function (SF)

CB spectrum threshold is also influenced by adjacent CBs, which is accounted for using the SF.

SF is used to estimate effects of masking across CBs

SF is calculated for $|j - i| \le 25$, where $i$ is the Bark freq. of the masked signal and $j$ is the Bark freq. of the masking signal, and placed into a matrix $S_{ij}$

Spread CB spectrum: $C_i = S_{ij} * B_i$, i.e., the CB spectrum convolved with the SF, computed as the matrix product $C_i = \sum_j S_{ij} B_j$

Effect of the spreading function is to spread peaks in $B_i$ and also raise threshold values, especially at higher frequencies.
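A sketch of the spreading step as a matrix product, assuming one band per Bark, the analytical spreading function from the Spread of Masking slide converted from dB to the power domain, and x = i - j as the distance from masker to masked band. The function names are illustrative.

```python
import numpy as np


def spreading_matrix(n_bands):
    """Build S[i, j]: power-domain spreading from masking band j to masked band i."""
    idx = np.arange(n_bands)
    x = idx[:, None] - idx[None, :]          # Bark distance i - j
    y = x + 0.474
    sf_db = 15.81 + 7.5 * y - 17.5 * np.sqrt(1.0 + y ** 2)
    return 10.0 ** (sf_db / 10.0)            # dB -> power ratio


def spread_spectrum(band_energy):
    """Return C = S @ B, the spread critical band spectrum."""
    b = np.asarray(band_energy, dtype=float)
    return spreading_matrix(b.size) @ b
```

Applied to a single occupied band, the result falls off more slowly toward higher bands than toward lower ones, which matches the note above that thresholds are raised especially at higher frequencies.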


Masking Thresholds

TMN threshold is estimated as $14.5 + i$ dB below $C_i$, where $i$ is the Bark freq.

NMT threshold is estimated as 5.5 dB below $C_i$, uniformly across the CB spectrum


Tone Like and Noise Like Components

Spectral Flatness Measure: $SFM = GM/AM$

GM is the geometric mean of $P(\omega)$ and AM is the arithmetic mean of $P(\omega)$

$SFM_{dB} = 10\log_{10}(GM/AM)$

Coeff. of tonality: $\alpha = \min(SFM_{dB}/SFM_{dBmax}, 1)$

$SFM_{dBmax} = -60$ dB is used to estimate tonality

$SFM_{dB} = 0$ dB indicates a completely noise-like signal ($\alpha = 0$)

$SFM_{dB} = -30$ dB indicates $\alpha = 0.5$

$SFM_{dB} = -75$ dB indicates $\alpha = 1.0$
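A sketch of the tonality coefficient computed from a power spectrum. The small floor applied before the logarithm is an added assumption to avoid log(0), and the names are illustrative.

```python
import numpy as np

SFM_DB_MAX = -60.0  # reference flatness for a fully tonal signal, as stated above


def tonality_coefficient(power):
    """Return alpha = min(SFM_dB / SFM_dBmax, 1) for a power spectrum P(w)."""
    p = np.maximum(np.asarray(power, dtype=float), 1e-20)  # assumption: floor to avoid log(0)
    gm = np.exp(np.mean(np.log(p)))    # geometric mean
    am = np.mean(p)                    # arithmetic mean
    sfm_db = 10.0 * np.log10(gm / am)
    return min(sfm_db / SFM_DB_MAX, 1.0)
```

A flat spectrum gives GM = AM and hence alpha near 0, while a spectrum dominated by a single line gives a strongly negative flatness in dB and alpha clipped to 1.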


Offset for Masking Energy

$O_i = \alpha(14.5 + i) + (1 - \alpha)5.5$ (dB), in each band $i$

Index $\alpha$ is used to geometrically weight the two thresholds

$O_i$ is then subtracted from $C_i$ (in the dB domain) to yield the spread threshold estimate $T_i = 10^{\log_{10}(C_i) - O_i/10}$

Since the spreading function does not have normalized gain, the threshold is renormalized by the DC gain of the spreading function for each CB

After normalization, Bark thresholds are compared to absolute thresholds.

Any CB that has a Bark threshold lower than the absolute threshold is changed to the absolute threshold

This will be the threshold used for computing bit rate.
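A sketch of the offset and threshold steps. It assumes the band index i starts at 1 and omits the renormalization by the spreading function's DC gain. The function names are illustrative.

```python
import numpy as np


def spread_threshold(c, alpha):
    """Return T_i = 10 ** (log10(C_i) - O_i / 10) for each band (power domain)."""
    c = np.asarray(c, dtype=float)
    i = np.arange(1, c.size + 1)                        # assumption: band index starts at 1
    offset_db = alpha * (14.5 + i) + (1.0 - alpha) * 5.5
    return 10.0 ** (np.log10(c) - offset_db / 10.0)


def apply_absolute_threshold(t, abs_thr):
    """Raise any band threshold that falls below the absolute threshold."""
    return np.maximum(t, abs_thr)
```

With alpha = 0 (noise-like) every band is offset by 5.5 dB. With alpha = 1 (tone-like) the offset grows with the band index.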


Calculation of Bit Rate

Compute the no. of quantization levels needed to follow the signal in the freq. domain

$T_i$ is in the power domain

Quantization energy must be spread across the $k_i$ spectral lines in each CB

Assuming the noise spreads equally across the entire band, the noise energy of a uniform quantizer with step size $\delta$ will be $\delta^2/12$

Energy at each spectral freq. $= T_i/k_i$

Real and imaginary parts are quantized independently, so the energy per component $= T_i/(2k_i)$

$\delta^2/12 = T_i/(2k_i) \implies \delta = T'_i = \sqrt{6T_i/k_i}$

$T'_i$ is the step size.
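The step size relation can be verified with a few lines. The function name `step_size` is illustrative.

```python
import math


def step_size(t_i: float, k_i: int) -> float:
    """Quantizer step size T'_i = sqrt(6 * T_i / k_i) for a band with k_i spectral lines."""
    return math.sqrt(6.0 * t_i / k_i)
```

Substituting the result back, `step_size(t, k) ** 2 / 12` equals `t / (2 * k)`, the noise energy allotted to each real or imaginary component.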


Computing PE

$N_{Re}(\omega) = |\mathrm{nint}(Re(\omega)/T'_i)|$ and $N_{Im}(\omega) = |\mathrm{nint}(Im(\omega)/T'_i)|$ for each $\omega$ within CB $i$

Let $N^*$ represent the actual (integer) quantized value of each line

If $N_{(Re\ or\ Im)}(\omega) = 0$, then $N'_{(Re\ or\ Im)}(\omega) = 0$

If $N_{(Re\ or\ Im)}(\omega) \neq 0$, then $N'_{(Re\ or\ Im)}(\omega) = \log_2(2N^*(\omega) + 1)$

This operation assigns a bit rate of zero bits to any signal with an amplitude that does not need to be quantized and assigns a bit rate of $\log_2$(no. of levels) to those that must be quantized.

Total bit rate $= \sum_{\omega=0}^{\pi} (N'_{Re}(\omega) + N'_{Im}(\omega))$

Rate per sample, PE = Total bit rate / 2048
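A sketch of the PE computation, assuming the complex spectrum, the inclusive band edges and the per-band thresholds are already available. The argument layout and names are illustrative. Note that `np.rint` rounds exact halves to the nearest even integer, which may differ slightly from a particular nint convention.

```python
import numpy as np


def perceptual_entropy(spectrum, band_edges, thresholds, n_fft=2048):
    """Return PE in bits/sample.

    spectrum   : complex DFT lines (NumPy array)
    band_edges : list of (lo, hi) inclusive bin index pairs, one per CB (hypothetical input)
    thresholds : power-domain threshold T_i for each CB
    """
    total_bits = 0.0
    for (lo, hi), t_i in zip(band_edges, thresholds):
        k_i = hi - lo + 1
        step = np.sqrt(6.0 * t_i / k_i)                  # T'_i
        lines = spectrum[lo:hi + 1]
        for part in (lines.real, lines.imag):
            n = np.abs(np.rint(part / step))             # quantized magnitude N
            total_bits += np.sum(np.log2(2.0 * n + 1.0)) # log2(1) = 0 when N = 0
    return total_bits / n_fft
```

Because log2(2N + 1) is zero for N = 0, the two cases in the definition collapse into a single expression.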


Example codec perceptual model

ISO/IEC 11172-3 (MPEG-1) Psychoacoustic Model-1

Determines max. allowable quantization noise energy in each CB such that it remains inaudible.

Blocking input audio into frames

High resolution spectral computation for each frame

Estimation of tonal and noise maskers for each frame

Decimation and reorganization of maskers

Calculation of individual masking thresholds for components in each CB

Calculation of global masking thresholds for each CB


Spectral Analysis

512 point DFT computation

Power Spectral Density (PSD) $P(k)$ estimation, where $k = 1, 2, \ldots, 512$

[Plot: SPL (dB), -10 to 60, versus Frequency (Hz), 0 to 18000]


Identn. of Tonal and Noise Maskers

$P(k)$ for $k = 1, 2, \ldots, 256$ are considered

Local maxima in the PSD that exceed neighboring components within a certain Bark distance by at least 7 dB are classified as tonal

Tonal set $S_T$ is defined as

$S_T = \{ P(k) \mid P(k) > P(k \pm 1),\ P(k) > P(k \pm \Delta_k) + 7\ \mathrm{dB} \}$

where

$\Delta_k \in \{2\}$ for $2 < k < 63$ (0.17 - 5.5 kHz)

$\Delta_k \in [2, 3]$ for $63 \le k < 127$ (5.5 - 11 kHz)

$\Delta_k \in [2, 6]$ for $127 \le k \le 256$ (11 - 20 kHz)


Tonal and Noise Maskers (contd.)

Tonal maskers $P_{TM}(k)$ are computed from the spectral peaks listed in $S_T$:

$P_{TM}(k) = 10\log_{10} \sum_{j=-1}^{1} 10^{0.1 P(k+j)}$ (dB)

For each neighborhood max, energy from three adjacent spectral components is combined to form a single tonal masker

For each CB, a single noise masker $P_{NM}(\bar{k})$ is then computed from the (remaining) spectral lines not within the $\pm\Delta_k$ neighborhood of a tonal masker using the sum

$P_{NM}(\bar{k}) = 10\log_{10} \sum_{j} 10^{0.1 P(j)}$ (dB), $\forall P(j) \notin \{P_{TM}(k, k \pm 1, k \pm \Delta_k)\}$

where $\bar{k}$ is the geometric mean spectral line of the CB
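A sketch of tonal masker identification from a PSD given in dB. It assumes that every Δk in the stated range must satisfy the 7 dB condition on both sides of the peak, and that `psd_db[k]` holds P(k). The function names are illustrative, and noise masker computation is not shown.

```python
import numpy as np


def delta_k_range(k):
    """Return the neighborhood offsets delta_k to test for spectral line k."""
    if 2 < k < 63:
        return [2]
    if 63 <= k < 127:
        return [2, 3]
    if 127 <= k <= 256:
        return [2, 3, 4, 5, 6]
    return []


def tonal_maskers(psd_db):
    """Return {k: P_TM(k)} for every spectral line classified as tonal."""
    p = np.asarray(psd_db, dtype=float)
    maskers = {}
    upper = min(len(p) - 6, 257)            # keep k + 6 in range and k <= 256
    for k in range(3, upper):
        if not (p[k] > p[k - 1] and p[k] > p[k + 1]):
            continue                        # not a local maximum
        if all(p[k] > p[k + d] + 7.0 and p[k] > p[k - d] + 7.0
               for d in delta_k_range(k)):
            # combine the energy of the peak and its two immediate neighbors
            maskers[k] = 10.0 * np.log10(np.sum(10.0 ** (0.1 * p[k - 1:k + 2])))
    return maskers
```

For a 40 dB peak with 20 dB neighbors on a 0 dB floor, the combined masker level is only slightly above 40 dB, since the neighbors contribute little energy.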

Decimation of Maskers

No. of maskers is reduced using two criteria

First, any tonal or noise maskers below the abs. threshold are discarded, i.e., only maskers with $P_{TM,NM}(k) \ge T_q(k)$ are retained.

Next, a sliding 0.5 Bark-wide window is used to replace any pair of maskers occurring within a distance of 0.5 Bark by the stronger of the two.

Masker freq. bins are reorganized using the decimation scheme

$P_{TM,NM}(i) = P_{TM,NM}(k)$

$P_{TM,NM}(k) = 0$


Decimation (contd.)

$i = k$, for $1 \le k \le 48$

$i = k + (k \bmod 2)$, for $49 \le k \le 96$

$i = k + 3 - ((k - 1) \bmod 4)$, for $97 \le k \le 232$

Net effect is 2:1 decimation of masker bins in CBs 18-22

4:1 decimation of masker bins in CBs 22-25

With no loss of masking components.

Decimation reduces the total no. of tone and noise masker freq. bins under consideration from 256 to 106
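The bin mapping above can be written out directly. The function name `decimated_bin` is illustrative.

```python
def decimated_bin(k: int) -> int:
    """Map original masker bin k (1..232) to its decimated bin index i."""
    if 1 <= k <= 48:
        return k                           # no decimation
    if 49 <= k <= 96:
        return k + (k % 2)                 # 2:1 decimation
    if 97 <= k <= 232:
        return k + 3 - ((k - 1) % 4)       # 4:1 decimation
    raise ValueError("k must be in 1..232")
```

Counting the distinct targets over k = 1..232 gives 48 + 24 + 34 = 106 bins, which agrees with the figure quoted above.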


Individual Masking Thresholds

Using the decimated set of tonal and noise maskers, individual tone and noise masking thresholds are computed

Each individual threshold represents a masking contribution at freq. bin $i$ due to the tone or noise masker located at bin $j$

Tonal masking threshold $T_{TM}(i, j)$ is given by

$T_{TM}(i, j) = P_{TM}(j) - 0.275 z_b(j) + SF(i, j) - 6.025$ (dB SPL)

where $P_{TM}(j)$ is the SPL of the tonal masker in freq. bin $j$, $z_b(j)$ is the Bark freq. of bin $j$ and $SF(i, j)$ is the spread of masking from bin $j$ to bin $i$

Noise masking threshold $T_{NM}(i, j)$ is given by

$T_{NM}(i, j) = P_{NM}(j) - 0.175 z_b(j) + SF(i, j) - 2.025$ (dB SPL)

where $P_{NM}(j)$ is the SPL of the noise masker in freq. bin $j$


Global Masking Thresholds

Individual masking thresholds are combined to estimate a global masking threshold for each freq. bin

$T_g(i) = 10\log_{10}\left(10^{0.1 T_q(i)} + \sum_{l=1}^{L} 10^{0.1 T_{TM}(i,l)} + \sum_{m=1}^{M} 10^{0.1 T_{NM}(i,m)}\right)$ (dB SPL)

where $L$ and $M$ are the number of tonal and noise maskers, respectively.

Bits are allocated based on the global masking thresholds, which is termed perceptual bit allocation.
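A sketch of the global threshold combination, assuming the individual thresholds have already been computed and are supplied as 2-D arrays in dB SPL. The array layout and the name `global_threshold_db` are illustrative.

```python
import numpy as np


def global_threshold_db(tq_db, ttm_db, tnm_db):
    """Combine thresholds by power addition.

    tq_db  : shape (n_bins,)    absolute threshold T_q(i) in dB SPL
    ttm_db : shape (n_bins, L)  tonal thresholds T_TM(i, l) in dB SPL
    tnm_db : shape (n_bins, M)  noise thresholds T_NM(i, m) in dB SPL
    """
    tq = np.asarray(tq_db, dtype=float)
    ttm = np.asarray(ttm_db, dtype=float)
    tnm = np.asarray(tnm_db, dtype=float)
    power = (10.0 ** (0.1 * tq)
             + (10.0 ** (0.1 * ttm)).sum(axis=1)
             + (10.0 ** (0.1 * tnm)).sum(axis=1))
    return 10.0 * np.log10(power)
```

The thresholds are summed in the power domain, so three equal contributions of 10 dB combine to about 14.8 dB. With no maskers present, the result reduces to the absolute threshold.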


Expt. 5-AC- Audio Synthesis using MSE

Problem No. 2.25 (p. 49) of Spanias book on Audio Signal Processing


Expt. 6-AC- Audio Synthesis using Psychoacoustics

Problem No. 5.11 (p. 142) of Spanias book on Audio Signal Processing

