adsp 10 ac psycho acoustics ec623 adsp


Upload: ali-mirza

Post on 28-Nov-2014


Page 1: ADSP 10 AC Psycho Acoustics EC623 ADSP

Audio Coding: Psychoacoustics

S. R. M. Prasanna

Dept of ECE,

IIT Guwahati,

[email protected]

Audio Coding – p. 1/45

www.jntuworld.com

Page 2: ADSP 10 AC Psycho Acoustics EC623 ADSP

Motivation

Acoustics: Study of sounds

Psychoacoustics: Study of perception of sounds

Deals with characterizing human auditory perception

Particularly the time-frequency analysis capabilities of the inner ear

Audio coders achieve significant compression by exploiting the property that perceptually irrelevant information cannot be heard

Perceptually irrelevant information is identified by incorporating several psychoacoustic principles


Page 3: ADSP 10 AC Psycho Acoustics EC623 ADSP

Human Speech Perception

Figure 1: Cross Section of Human Ear


Page 4: ADSP 10 AC Psycho Acoustics EC623 ADSP

Functions of Human Ear

Mainly three regions - outer ear, middle ear & inner ear

Outer ear - directs sound pressure variations towards the middle ear

Middle ear - transforms pressure variations into mechanical motion

Inner ear - converts mechanical vibrations into electrical firings in the auditory neurons, which lead to the brain

Language decoding and message understanding take place at the higher centers of the brain, a process that is less well understood


Page 5: ADSP 10 AC Psycho Acoustics EC623 ADSP

Inner Ear

Figure 2: Figures Related to Inner Ear


Page 6: ADSP 10 AC Psycho Acoustics EC623 ADSP

Frequency to Place Transformation

Sound waves to mechanical vibrations by middle ear

Mechanical vibrations to traveling waves by inner ear along the length of the basilar membrane

Neural receptors are connected along the length of the basilar membrane

Traveling waves generate peak responses at frequency-specific membrane positions

Therefore different neural receptors are effectively tuned to different frequency bands according to their locations.


Page 7: ADSP 10 AC Psycho Acoustics EC623 ADSP

Freq. to Place Tfmn. (contd.)

For sinusoidal stimuli, the peak response occurs near the basilar membrane region with a resonant freq. equal to the input sinusoid freq.

Location of the peak is the characteristic place for the stimulus

Freq. that best excites a particular place is its characteristic frequency

Thus a frequency to place transformation takes place


Page 8: ADSP 10 AC Psycho Acoustics EC623 ADSP

Signal Processing Perspective

Bank of highly overlapping band pass filters

Magnitude responses are asymmetric

Bandwidths increase with frequency


Page 9: ADSP 10 AC Psycho Acoustics EC623 ADSP

Sound Pressure Level (SPL)

A standard metric that quantifies the intensity of an acoustical stimulus

SPL gives the level (intensity) of sound pressure in dB relative to an internationally defined reference level

$L_{SPL} = 20\log_{10}(p/p_0)$ (dB), where $L_{SPL}$ is the SPL of a stimulus, $p$ is the sound pressure of the stimulus in pascals and $p_0$ is the standard reference level of 20 µPa

About 150 dB SPL spans the dynamic range of intensity for the human auditory system

The minimum value is the limit of detection for low-intensity (quiet) stimuli

The maximum value is the threshold of pain for high-intensity (loud) stimuli
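As a quick numerical check of the SPL definition, the following minimal Python sketch evaluates $L_{SPL}$ for a few pressures. The function name `spl_db` and the sample pressures are illustrative choices, not part of the slides.

```python
import math

P0 = 20e-6  # standard reference pressure p0 = 20 µPa, in pascals


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB SPL for a sound pressure given in pascals."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_pa / P0)


if __name__ == "__main__":
    # Each factor of 10 in pressure adds 20 dB.
    for p in (20e-6, 2e-4, 2e-2, 20.0):
        print(f"{p:g} Pa -> {spl_db(p):.1f} dB SPL")
```

A pressure equal to the reference level maps to 0 dB SPL, and every tenfold increase in pressure adds 20 dB.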


Page 10: ADSP 10 AC Psycho Acoustics EC623 ADSP

Absolute Threshold for Hearing (ATH)

Amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment

ATH is expressed in dB SPL

ATH is a frequency-dependent parameter and, with $f$ in Hz, is given by

$T_q(f) = 3.64\,(f/1000)^{-0.8} - 6.5\,e^{-0.6\,(f/1000 - 3.3)^2} + 10^{-3}\,(f/1000)^4$ (dB SPL)

In the context of signal compression, $T_q(f)$ could be interpreted naively as a maximum allowable energy level for coding distortions introduced in the frequency domain (Fig. 5.1 from Spanias book)

Use of ATH to shape the coding distortion spectrum represents the first step towards perceptual coding.
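The threshold formula can be evaluated directly. The sketch below is one possible Python transcription of $T_q(f)$; the function name `ath_db_spl` and the list of probe frequencies are assumptions made for illustration.

```python
import math


def ath_db_spl(f_hz: float) -> float:
    """Absolute threshold of hearing T_q(f) in dB SPL, for f in Hz (f > 0)."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


if __name__ == "__main__":
    # The curve is high at low frequencies, dips below 0 dB SPL near
    # 3-4 kHz, and rises steeply again towards high frequencies.
    for f in (100, 1000, 3300, 10000, 16000):
        print(f"{f:>6} Hz -> {ath_db_spl(f):6.2f} dB SPL")
```

In a coder, such a function would typically be sampled at the DFT bin frequencies to give a per-bin floor for the masking threshold.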


Page 11: ADSP 10 AC Psycho Acoustics EC623 ADSP

ATH Diagram

Figure 3: Absolute Threshold for Hearing


Page 12: ADSP 10 AC Psycho Acoustics EC623 ADSP

Critical Bands (CB)

Critical bandwidth is a function of frequency that quantifies the cochlear filter passbands

CB tends to remain constant (about 100 Hz) up to 500 Hz and increases to approximately 20% of the center frequency above 500 Hz

For an average listener the critical bandwidth is given by $BW_c(f) = 25 + 75\,[1 + 1.4\,(f/1000)^2]^{0.69}$ (Hz)

The function $Z_b(f) = 13\tan^{-1}(0.00076 f) + 3.5\tan^{-1}((f/7500)^2)$ (Bark) is often used to convert frequency in Hz to the Bark scale

The nonuniform Hz spacing of the filter bank is actually uniform on a Bark scale

One critical band (CB) comprises one Bark. (Table 5.1 and Fig. 5.4)
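The two mappings above are easy to tabulate. This is a small Python sketch under the assumption that $f$ is given in Hz; the names `hz_to_bark` and `critical_bandwidth_hz` are hypothetical helpers, not names from the slides.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Z_b(f): frequency in Hz mapped to the Bark scale."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


def critical_bandwidth_hz(f_hz: float) -> float:
    """BW_c(f): critical bandwidth in Hz around center frequency f (Hz)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 10000):
        print(f"{f:>6} Hz  z = {hz_to_bark(f):5.2f} Bark  "
              f"BW = {critical_bandwidth_hz(f):7.1f} Hz")
```

Near 100 Hz the bandwidth comes out close to 100 Hz, and it grows with center frequency, which matches the behaviour described in the bullets.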


Page 13: ADSP 10 AC Psycho Acoustics EC623 ADSP

Critical Bands

Figure 4: Table Showing Critical Bands


Page 14: ADSP 10 AC Psycho Acoustics EC623 ADSP

Mapping from Hz to Bark

Figure 5: Mapping from Hz to Bark Scale


Page 15: ADSP 10 AC Psycho Acoustics EC623 ADSP

Simultaneous Masking

Masking: One sound is rendered inaudible because of the presence of another sound

Simultaneous masking: When two or more stimuli are simultaneously presented to the auditory system

Freq. Domain: Relative shapes of the masker and maskee magnitude spectra determine to what extent the presence of certain spectral energy will mask the presence of other spectral energy

Time Domain: Phase relationships between stimuli can also affect masking outcomes

In simple words, the presence of a strong noise or tone masker creates an excitation of sufficient strength on the basilar membrane at the critical band location to effectively block detection of a weaker (maskee) signal.


Page 16: ADSP 10 AC Psycho Acoustics EC623 ADSP

Types of Simultaneous Masking

Noise-Masking-Tone (NMT), Tone-Masking-Noise (TMN) and Noise-Masking-Noise (NMN)

NMT:

A narrowband (NB) noise (1 Bark) masks a tone within the same CB, provided the intensity of the masked tone is below a predictable threshold

Signal-to-Mask Ratio (SMR) (dB) is the difference between the intensities of the masker and the maskee

Min. SMR at the threshold of detection occurs when the maskee freq. is close to the center freq. of the masker, and is about 5 dB


Page 17: ADSP 10 AC Psycho Acoustics EC623 ADSP

TMN and NMN

TMN:

A pure tone at the center of a CB masks noise of any subcritical BW, provided the noise spectrum is below a predictable threshold

Min. SMR lies between 21 and 28 dB

NMN:

A NB noise masks another NB noise

Min. SMR is about 26 dB


Page 18: ADSP 10 AC Psycho Acoustics EC623 ADSP

Masking Schemes

Figure 6: Masking schemes


Page 19: ADSP 10 AC Psycho Acoustics EC623 ADSP

Asymmetry of Masking

NMT and TMN show an asymmetry in masking power between the noise masker and the tone masker

Even with both maskers at the same dB SPL, the associated threshold SMRs differ by about 20 dB

Hence the interest in all types of masking

Knowledge of all three is critical to succeed in the task of shaping coding distortion

For each temporal analysis interval, a codec's perceptual model should identify, across the freq. spectrum, noise-like and tone-like components within both the audio signal and the coding distortion

The model should then apply the appropriate masking relationships to obtain a global masking threshold


Page 20: ADSP 10 AC Psycho Acoustics EC623 ADSP

Spread of Masking

Simultaneous masking is not bandlimited to within the boundaries of a single CB

Interband masking also occurs, i.e., a masker centered within one critical band has some predictable effect on detection thresholds in other CBs.

This effect is known as spread of masking

It is often modeled by an approximately triangular spreading function that has slopes of +25 and -10 dB per Bark:

$SF_{dB}(x) = 15.81 + 7.5\,(x + 0.474) - 17.5\sqrt{1 + (x + 0.474)^2}$ (dB)

where $x$ is in Barks and $SF_{dB}(x)$ is expressed in dB.
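To see where the +25 and -10 dB per Bark slopes come from, the spreading function can be evaluated at integer Bark offsets. The sketch below is a direct Python transcription; the name `spreading_db` is an illustrative choice.

```python
import math


def spreading_db(x: float) -> float:
    """SF_dB(x) in dB for a Bark-scale offset x from the masker."""
    return (15.81 + 7.5 * (x + 0.474)
            - 17.5 * math.sqrt(1.0 + (x + 0.474) ** 2))


if __name__ == "__main__":
    # Far from the peak the two sides approach straight lines:
    # about +25 dB/Bark below the masker and -10 dB/Bark above it.
    for x in range(-5, 6):
        print(f"x = {x:+d} Bark  SF = {spreading_db(x):7.2f} dB")
```

For large negative $x$ the square-root term behaves like $-(x + 0.474)$, giving a slope of $7.5 + 17.5 = 25$ dB per Bark; for large positive $x$ it gives $7.5 - 17.5 = -10$ dB per Bark.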


Page 21: ADSP 10 AC Psycho Acoustics EC623 ADSP

Just Noticeable Distortion (JND)

Global masking threshold comprises an estimate of the level at which quantization noise becomes just noticeable

Hence the global masking threshold is sometimes referred to as the JND level


Page 22: ADSP 10 AC Psycho Acoustics EC623 ADSP

Nonsimultaneous Masking

Also termed temporal masking

The masking phenomenon extends beyond the window of simultaneous stimulus presentation

Masking occurs both prior to masker onset and after masker removal

Forward (post) masking and backward (pre) masking are the two types


Page 23: ADSP 10 AC Psycho Acoustics EC623 ADSP

Figure 7: Temporal Masking


Page 24: ADSP 10 AC Psycho Acoustics EC623 ADSP

Perceptual Entropy

Entropy gives the min. no. of bits/sample required to store or transmit a given message block

Johnston combined the notion of psychoacoustic masking with signal quantization principles to define Perceptual Entropy (PE).

Perceptual Entropy gives the min. no. of bits/sample required to store or transmit the perceptually relevant information in a given audio message block.

While discussing PE, conventional entropy is termed statistical entropy.

Statistical entropy employs the statistical properties of the signal for computing entropy

Perceptual entropy employs both the statistical and the perceptual properties of the signal for computing entropy.


Page 25: ADSP 10 AC Psycho Acoustics EC623 ADSP

Basis for PE

Masking threshold indicates the amount of quantization noise that can be introduced in the freq. domain without perceptually corrupting the signal.

Assume that the step size and no. of levels in the quantizer for each spectral line could be set independently.

Further, the step size is chosen such that the total noise injected at each frequency corresponds to the masking threshold, i.e., the min. no. of quantization levels is used.

Then the no. of bits required to encode the entire transform represents the min. no. of bits necessary to transmit that block of the signal.

The total number of bits divided by the no. of samples in the transform represents the per-sample bit rate.

This per-sample bit rate is the Perceptual Entropy of the signal.


Page 26: ADSP 10 AC Psycho Acoustics EC623 ADSP

PE v/s SE

Statistical entropy (SE) exploits signal statistics

Perceptual entropy (PE) exploits signal statistics and also psychoacoustic masking

By exploiting masking thresholds, PE uses only as many quantization levels as are needed to avoid perceptible distortion due to quantization.


Page 27: ADSP 10 AC Psycho Acoustics EC623 ADSP

Steps for PE Computation

DFT computation

Finding Masking thresholds

Calculating no. of bits to quantize DFT spectrum


Page 28: ADSP 10 AC Psycho Acoustics EC623 ADSP

DFT Computation

Windowing and frequency transformation

2048-sample DFT computed by FFT

The first 1024 coefficients are considered for further analysis, since the spectrum of a real signal is conjugate-symmetric


Page 29: ADSP 10 AC Psycho Acoustics EC623 ADSP

Calculation of Masking Threshold

Critical band analysis

Applying spreading function to critical band spectrum

Calculating Masking Thresholds

Accounting for absolute thresholds

Relating the spread masking threshold to the critical band masking threshold


Page 30: ADSP 10 AC Psycho Acoustics EC623 ADSP

Critical Band Analysis

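DFT spectrum is complex: $S(\omega) = Re(\omega) + j\,Im(\omega)$

Power spectrum: $P(\omega) = Re^2(\omega) + Im^2(\omega)$

$P(\omega)$ is partitioned into CBs

Energy in each CB: $B_i = \sum_{\omega = bl_i}^{bh_i} P(\omega)$, where $bl_i$ and $bh_i$ are the lower and upper boundaries of CB $i$

$B_i$ represents the CB spectrum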
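The critical band analysis step amounts to squaring magnitudes and summing over index ranges. A minimal Python sketch follows, assuming the band boundaries $bl_i$, $bh_i$ are supplied as inclusive DFT-bin index pairs; the function names and the toy spectrum are hypothetical.

```python
def power_spectrum(spectrum):
    """P(w) = Re^2(w) + Im^2(w) for each complex DFT coefficient."""
    return [c.real ** 2 + c.imag ** 2 for c in spectrum]


def critical_band_energies(power, band_edges):
    """B_i = sum of P(w) over bins bl_i..bh_i (both inclusive).

    band_edges is a list of (bl_i, bh_i) bin-index pairs; how the
    edges are derived from the Bark scale is left to the caller.
    """
    return [sum(power[lo:hi + 1]) for lo, hi in band_edges]


if __name__ == "__main__":
    toy_spectrum = [1 + 1j, 2 + 0j, 0 + 3j, 1 - 1j]   # illustrative values
    p = power_spectrum(toy_spectrum)
    print(critical_band_energies(p, [(0, 1), (2, 3)]))
```

In practice the index pairs would come from a table of critical band edges (such as Table 5.1 referenced earlier) converted to bin numbers for the chosen sampling rate and DFT length.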


Page 31: ADSP 10 AC Psycho Acoustics EC623 ADSP

Spreading Function (SF)

The threshold in each CB is also influenced by adjacent CBs, which is accounted for using the SF.

SF is used to estimate the effects of masking across CBs

SF is calculated for $|j - i| \le 25$, where $i$ is the Bark frequency of the masked signal and $j$ is the Bark frequency of the masking signal, and placed into a matrix $S_{ij}$

Spread CB spectrum: $C_i = \sum_j S_{ij} B_j$ (the matrix $S_{ij}$ multiplied by the CB spectrum)

Effect of the spreading function is to spread peaks in $B_i$ and also raise threshold values, especially at higher frequencies.
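The spreading step can be sketched as a matrix-vector product. The Python below assumes the analytic spreading function $SF_{dB}(x) = 15.81 + 7.5(x + 0.474) - 17.5\sqrt{1 + (x + 0.474)^2}$ with $x = i - j$ (masked band minus masking band) and converts it from dB to linear power before multiplying; both the sign convention for $x$ and the function names are assumptions of this sketch.

```python
import math


def spreading_db(x):
    """Analytic spreading function in dB for a Bark offset x."""
    return (15.81 + 7.5 * (x + 0.474)
            - 17.5 * math.sqrt(1.0 + (x + 0.474) ** 2))


def spreading_matrix(n_bands):
    """S[i][j]: linear power gain from masking band j to masked band i."""
    return [[10.0 ** (spreading_db(i - j) / 10.0) for j in range(n_bands)]
            for i in range(n_bands)]


def spread_spectrum(band_energy):
    """C_i = sum_j S_ij * B_j for a list of critical band energies B."""
    n = len(band_energy)
    s = spreading_matrix(n)
    return [sum(s[i][j] * band_energy[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    b = [0.0] * 10
    b[5] = 1.0                      # a single excited band
    for i, c in enumerate(spread_spectrum(b)):
        print(i, f"{c:.5f}")
```

With a single excited band, the output decays more slowly towards higher bands than towards lower ones, which is the upward spread of masking that raises thresholds at higher frequencies.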


Page 32: ADSP 10 AC Psycho Acoustics EC623 ADSP

Masking Thresholds

TMN threshold is estimated as $14.5 + i$ dB below $C_i$, where $i$ is the Bark frequency.

NMT threshold is estimated as 5.5 dB below $C_i$ uniformly across the CB spectrum


Page 33: ADSP 10 AC Psycho Acoustics EC623 ADSP

Tone Like and Noise Like Components

Spectral flatness measure: $SFM = GM/AM$

GM is the geometric mean of $P(\omega)$ and AM is the arithmetic mean of $P(\omega)$

$SFM_{dB} = 10\log_{10}(GM/AM)$

Coefficient of tonality: $\alpha = \min(SFM_{dB}/SFM_{dB\,max},\ 1)$

$SFM_{dB\,max} = -60$ dB is used to estimate tonality

$SFM_{dB} = 0$ dB indicates a completely noise-like signal ($\alpha = 0$)

$SFM_{dB} = -30$ dB indicates $\alpha = 0.5$

$SFM_{dB} = -75$ dB indicates $\alpha = 1.0$
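The three example values above can be reproduced with a few lines of Python. This is a sketch only: the function names are hypothetical, and the geometric mean is computed in the log domain, which requires strictly positive power values.

```python
import math

SFM_DB_MAX = -60.0  # SFM_dBmax from the slide


def sfm_db(power):
    """Spectral flatness 10*log10(GM/AM); power values must be > 0."""
    n = len(power)
    log10_gm = sum(math.log10(p) for p in power) / n  # log10 of geometric mean
    am = sum(power) / n                               # arithmetic mean
    return 10.0 * (log10_gm - math.log10(am))


def tonality(sfm_db_value):
    """Coefficient of tonality alpha = min(SFM_dB / SFM_dBmax, 1)."""
    return min(sfm_db_value / SFM_DB_MAX, 1.0)


if __name__ == "__main__":
    flat = [2.0] * 8                   # flat spectrum: noise-like
    peaky = [1.0, 1e-6, 1e-6, 1e-6]    # one dominant line: tone-like
    for name, p in (("flat", flat), ("peaky", peaky)):
        s = sfm_db(p)
        print(f"{name}: SFM = {s:.1f} dB, alpha = {tonality(s):.2f}")
```

Since GM never exceeds AM, $SFM_{dB}$ is never positive, so $\alpha$ stays in the range 0 to 1.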


Page 34: ADSP 10 AC Psycho Acoustics EC623 ADSP

Offset for Masking Energy

$O_i = \alpha(14.5 + i) + (1 - \alpha)\,5.5$ (dB), in each band $i$

Index $\alpha$ is used to geometrically weight the two thresholds

$O_i$ is then subtracted from $C_i$ to yield the spread threshold estimate $T_i = 10^{\log_{10}(C_i) - O_i/10}$

Since the spreading function does not have normalized gain, the threshold is normalized by the DC gain for each CB

After normalization, the Bark thresholds are compared to the absolute thresholds.

Any CB whose Bark threshold is lower than the absolute threshold is set to the absolute threshold

This will be the threshold used for computing the bit rate.
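The offset and threshold arithmetic can be sketched as follows. The Python below covers the offset, the subtraction in the log domain, and the comparison against the absolute threshold; the DC-gain normalization is deliberately left out, and all function names are hypothetical.

```python
import math


def masking_offset_db(alpha, bark_index):
    """O_i = alpha*(14.5 + i) + (1 - alpha)*5.5, in dB."""
    return alpha * (14.5 + bark_index) + (1.0 - alpha) * 5.5


def spread_threshold(c_i, offset_db):
    """T_i = 10**(log10(C_i) - O_i/10): C_i lowered by O_i dB (power domain)."""
    return 10.0 ** (math.log10(c_i) - offset_db / 10.0)


def apply_absolute_threshold(t_i, abs_threshold):
    """A band threshold below the absolute threshold is raised to it."""
    return max(t_i, abs_threshold)


if __name__ == "__main__":
    # Purely tonal (alpha = 1) versus purely noise-like (alpha = 0) at Bark 10.
    for alpha in (1.0, 0.5, 0.0):
        o = masking_offset_db(alpha, 10)
        print(f"alpha = {alpha}: O = {o:.2f} dB, "
              f"T = {spread_threshold(100.0, o):.4f}")
```

A tonal signal ($\alpha = 1$) gets the larger offset and therefore the lower threshold, consistent with tones being weaker maskers than noise.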


Page 35: ADSP 10 AC Psycho Acoustics EC623 ADSP

Calculation of Bit Rate

No. of quantization levels needed to follow the signal in the freq. domain

$T_i$ is in the power domain

Quantization energy must be spread across the $k_i$ spectral lines in each CB

Assuming the noise spreads equally across the entire band, a uniform quantizer of step size $\delta$ gives noise energy $\delta^2/12$

Energy at each spectral freq. $= T_i/k_i$

Real and imaginary parts are quantized independently, so the energy per component $= T_i/(2k_i)$

$\delta^2/12 = T_i/(2k_i) \implies \delta = T'_i = \sqrt{6T_i/k_i}$

$T'_i$ is the step size.


Page 36: ADSP 10 AC Psycho Acoustics EC623 ADSP

Computing PE

$N_{Re}(\omega) = |\mathrm{nint}(Re(\omega)/T'_i)|$ and $N_{Im}(\omega) = |\mathrm{nint}(Im(\omega)/T'_i)|$ for each $\omega$ within CB $i$.

Let $N_*(\omega)$, with $*$ standing for Re or Im, represent the actual (integer) quantized value of each line

If $N_*(\omega) = 0$, then $N'_*(\omega) = 0$

If $N_*(\omega) \ne 0$, then $N'_*(\omega) = \log_2(2N_*(\omega) + 1)$

This operation assigns a bit rate of zero bits to any signal with an amplitude that does not need to be quantized and assigns a bit rate of $\log_2$(no. of levels) to those that must be quantized.

Total bit rate $= \sum_{\omega=0}^{\pi} (N'_{Re}(\omega) + N'_{Im}(\omega))$

Rate per sample, PE = Total bit rate / 2048
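The bit-counting rule can be put together in a short Python sketch. It assumes the step size $T'_i = \sqrt{6T_i/k_i}$ for band $i$, takes the per-line band assignment as an input list, and rounds halves away from zero in `nint`; the function names and argument layout are illustrative, not taken from the slides.

```python
import math


def nint(x):
    """Nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def step_size(t_i, k_i):
    """T'_i = sqrt(6*T_i/k_i) for threshold T_i spread over k_i lines."""
    return math.sqrt(6.0 * t_i / k_i)


def line_bits(value, step):
    """0 bits if the component quantizes to zero, else log2(2N + 1)."""
    n = abs(nint(value / step))
    return math.log2(2 * n + 1) if n else 0.0


def perceptual_entropy(spectrum, band_of_line, thresholds, lines_per_band,
                       block_len=2048):
    """Sum the bits of all real and imaginary parts, divided by block length."""
    total = 0.0
    for c, i in zip(spectrum, band_of_line):
        step = step_size(thresholds[i], lines_per_band[i])
        total += line_bits(c.real, step) + line_bits(c.imag, step)
    return total / block_len


if __name__ == "__main__":
    # Two toy lines in one band with T_0 = 6 and k_0 = 1, so the step is 6.
    pe = perceptual_entropy([10 + 0j, 1 + 1j], [0, 0], [6.0], [1], block_len=2)
    print(f"PE = {pe:.3f} bits/sample")
```

Components smaller than half a step quantize to zero and cost nothing, which is how lines lying below the masking threshold drop out of the bit count.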


Page 37: ADSP 10 AC Psycho Acoustics EC623 ADSP

Example codec perceptual model

ISO/IEC 11172-3 (MPEG-1) Psychoacoustic Model-1

Determines the max. allowable quantization noise energy in each CB such that it remains inaudible.

Blocking i/p audio into frames

High resolution spectral computation for each frame

For each frame, tonal and noise maskers estimation

Decimation and reorganization of maskers

Calculation of individual masking thresholds for components in each CB

Calculation of global masking thresholds for each CB


Page 38: ADSP 10 AC Psycho Acoustics EC623 ADSP

Spectral Analysis

512 point DFT computation

Power Spectral Density (PSD) $P(k)$ estimation, where $k = 1, 2, \ldots, 512$

[Figure: PSD estimate; y-axis SPL (dB), x-axis Frequency (Hz)]


Page 39: ADSP 10 AC Psycho Acoustics EC623 ADSP

Identn. of Tonal and Noise Maskers

$P(k)$ for $k = 1, 2, \ldots, 256$ is considered

Local maxima in the PSD that exceed neighboring components within a certain Bark distance by at least 7 dB are classified as tonal

Tonal set $S_T$ is defined as

$S_T = \{\, P(k) \mid P(k) > P(k \pm 1),\ P(k) > P(k \pm \Delta_k) + 7\ \mathrm{dB} \,\}$

where

$\Delta_k \in 2$ for $2 < k < 63$ (0.17 - 5.5 kHz)

$\Delta_k \in [2, 3]$ for $63 \le k < 127$ (5.5 - 11 kHz)

$\Delta_k \in [2, 6]$ for $127 \le k \le 256$ (11 - 20 kHz)
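The tonal-set test can be sketched as a scan over the PSD. The Python below assumes the PSD is stored in a list indexed so that `psd_db[k]` is $P(k)$ in dB, reads $[2, 6]$ as the integer offsets 2 through 6, and skips neighbors that fall outside the list; these conventions and the function names are assumptions of the sketch.

```python
def delta_offsets(k):
    """Neighborhood offsets for bin k, following the three index ranges."""
    if 2 < k < 63:
        return [2]
    if 63 <= k < 127:
        return [2, 3]
    if 127 <= k <= 256:
        return [2, 3, 4, 5, 6]
    return []


def tonal_bins(psd_db):
    """Return the bins k whose PSD value qualifies for the tonal set S_T."""
    found = []
    last = min(256, len(psd_db) - 2)
    for k in range(3, last + 1):
        # Condition 1: strict local maximum against the adjacent bins.
        if not (psd_db[k] > psd_db[k - 1] and psd_db[k] > psd_db[k + 1]):
            continue
        # Condition 2: at least 7 dB above every bin at offsets +/- delta.
        neighbors = [n for d in delta_offsets(k) for n in (k - d, k + d)
                     if 0 <= n < len(psd_db)]
        if all(psd_db[k] > psd_db[n] + 7.0 for n in neighbors):
            found.append(k)
    return found


if __name__ == "__main__":
    psd = [0.0] * 300
    psd[30], psd[100], psd[200] = 20.0, 5.0, 10.0   # 5 dB peak is too weak
    print(tonal_bins(psd))
```

The peak at bin 100 is a local maximum but stands only 5 dB above its surroundings, so it is left for the noise masker computation.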


Page 40: ADSP 10 AC Psycho Acoustics EC623 ADSP

Tonal and Noise Maskers (contd.)

Tonal maskers $P_{TM}(k)$ are computed from the spectral peaks listed in $S_T$:

$P_{TM}(k) = 10\log_{10} \sum_{j=-1}^{1} 10^{0.1 P(k+j)}$ (dB)

For each neighborhood maximum, the energy from three adjacent spectral components is combined to form a single tonal masker

For each CB, a single noise masker $P_{NM}(\bar{k})$ is then computed from the (remaining) spectral lines not within the $\pm\Delta_k$ neighborhood of a tonal masker, using the sum

$P_{NM}(\bar{k}) = 10\log_{10} \sum_{j} 10^{0.1 P(j)}$ (dB), $\forall P(j) \notin \{P_{TM}(k, k \pm 1, k \pm \Delta_k)\}$

where $\bar{k}$ is the geometric mean spectral line of the CB
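Both masker levels are power sums of dB values. A minimal Python sketch, assuming `psd_db[k]` holds $P(k)$ in dB and that the caller supplies the set of bins already claimed by tonal maskers; the helper names are hypothetical.

```python
import math


def db_power_sum(levels_db):
    """10*log10 of the summed linear powers of a list of dB levels."""
    return 10.0 * math.log10(sum(10.0 ** (0.1 * v) for v in levels_db))


def tonal_masker_db(psd_db, k):
    """P_TM(k): combine the peak at bin k with its two adjacent bins."""
    return db_power_sum(psd_db[k - 1:k + 2])


def noise_masker_db(psd_db, band_bins, excluded_bins):
    """P_NM for one critical band from bins not claimed by tonal maskers.

    Returns None when every bin of the band is excluded.
    """
    rest = [psd_db[j] for j in band_bins if j not in excluded_bins]
    return db_power_sum(rest) if rest else None


if __name__ == "__main__":
    psd = [10.0, 10.0, 50.0, 10.0, 10.0]
    print(tonal_masker_db(psd, 2))                    # dominated by the 50 dB peak
    print(noise_masker_db(psd, range(5), {1, 2, 3}))  # two 10 dB lines remain
```

Three equal components add about 4.8 dB to the level of one, and two equal components add about 3 dB, as expected for a sum of powers.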


Page 41: ADSP 10 AC Psycho Acoustics EC623 ADSP

Decimation of Maskers

The number of maskers is reduced using two criteria

First, any tonal or noise maskers below the absolute threshold are discarded, i.e., only maskers with $P_{TM,NM}(k) \ge T_q(k)$ are retained.

Next, a sliding 0.5 Bark-wide window is used to replace any pair of maskers occurring within a distance of 0.5 Bark by the stronger of the two.

Masker freq. bins are then reorganized using the decimation scheme

$P_{TM,NM}(i) = P_{TM,NM}(k)$

$P_{TM,NM}(k) = 0$


Page 42: ADSP 10 AC Psycho Acoustics EC623 ADSP

Decimation (contd.)

$i = k$, for $1 \le k \le 48$

$i = k + (k \bmod 2)$, for $49 \le k \le 96$

$i = k + 3 - ((k - 1) \bmod 4)$, for $97 \le k \le 232$

Net effect is 2:1 decimation of masker bins in CBs 18-22

4:1 decimation of masker bins in CBs 22-25

With no loss of masking components.

Decimation reduces the total no. of tone and noise masker freq. bins under consideration from 256 to 106
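The index mapping can be checked by counting how many distinct target bins it produces. The Python sketch below transcribes the three rules; the function name `decimated_index` is an illustrative choice.

```python
def decimated_index(k):
    """Map masker bin k (1..232) to its bin index i after decimation."""
    if 1 <= k <= 48:
        return k                        # no decimation
    if 49 <= k <= 96:
        return k + (k % 2)              # 2:1, odd bins move to next even bin
    if 97 <= k <= 232:
        return k + 3 - ((k - 1) % 4)    # 4:1, groups of four share one bin
    raise ValueError("k must lie in 1..232")


if __name__ == "__main__":
    targets = sorted({decimated_index(k) for k in range(1, 233)})
    print(len(targets))          # number of distinct masker bins kept
    print(targets[46:52])        # around the 1:1 to 2:1 transition
```

Counting the distinct values gives 48 + 24 + 34 = 106 bins, in agreement with the reduction from 256 to 106 stated above.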


Page 43: ADSP 10 AC Psycho Acoustics EC623 ADSP

Individual Masking Thresholds

Using the decimated set of tonal and noise maskers, individual tone and noise masking thresholds are computed

Each individual threshold represents a masking contribution at freq. bin $i$ due to the tone or noise masker located at bin $j$

Tonal masking threshold $T_{TM}(i, j)$ is given by

$T_{TM}(i, j) = P_{TM}(j) - 0.275\,Z_b(j) + SF(i, j) - 6.025$ (dB SPL)

where $P_{TM}(j)$ is the SPL of the tonal masker in freq. bin $j$, $Z_b(j)$ is the Bark freq. of bin $j$ and $SF(i, j)$ is the spread of masking from bin $j$ to bin $i$

Noise masking threshold $T_{NM}(i, j)$ is given by

$T_{NM}(i, j) = P_{NM}(j) - 0.175\,Z_b(j) + SF(i, j) - 2.025$ (dB SPL)

where $P_{NM}(j)$ is the SPL of the noise masker in freq. bin $j$
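The two threshold formulas differ only in their constants. The Python sketch below treats the spreading term $SF(i, j)$ as a precomputed input in dB, since its definition for this model is not given on the slide; the function names are hypothetical.

```python
def tonal_threshold_db(p_tm_db, bark_j, sf_ij_db):
    """T_TM(i,j) = P_TM(j) - 0.275*Z_b(j) + SF(i,j) - 6.025, in dB SPL."""
    return p_tm_db - 0.275 * bark_j + sf_ij_db - 6.025


def noise_threshold_db(p_nm_db, bark_j, sf_ij_db):
    """T_NM(i,j) = P_NM(j) - 0.175*Z_b(j) + SF(i,j) - 2.025, in dB SPL."""
    return p_nm_db - 0.175 * bark_j + sf_ij_db - 2.025


if __name__ == "__main__":
    # A 60 dB SPL masker at 10 Bark, evaluated at its own bin (SF taken as 0 dB).
    print(tonal_threshold_db(60.0, 10.0, 0.0))
    print(noise_threshold_db(60.0, 10.0, 0.0))
```

For the same masker level, the noise masker yields the higher threshold, reflecting the asymmetry of masking between noise and tone maskers.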


Page 44: ADSP 10 AC Psycho Acoustics EC623 ADSP

Global Masking Thresholds

Individual masking thresholds are combined to estimate a global masking threshold for each freq. bin

$T_g(i) = 10\log_{10}\left(10^{0.1 T_q(i)} + \sum_{l=1}^{L} 10^{0.1 T_{TM}(i,l)} + \sum_{m=1}^{M} 10^{0.1 T_{NM}(i,m)}\right)$ (dB SPL)

where $L$ and $M$ are the numbers of tonal and noise maskers, respectively.

Bits are allocated based on the global masking thresholds; this is termed perceptual bit allocation.
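The global threshold is a power-domain sum of dB quantities. A minimal Python sketch for a single frequency bin, assuming the individual thresholds for that bin have already been collected into two lists; the function name is illustrative.

```python
import math


def global_threshold_db(tq_db, tonal_thresholds_db, noise_thresholds_db):
    """T_g(i) in dB SPL from T_q(i) and the individual thresholds at bin i."""
    total = 10.0 ** (0.1 * tq_db)
    total += sum(10.0 ** (0.1 * t) for t in tonal_thresholds_db)
    total += sum(10.0 ** (0.1 * t) for t in noise_thresholds_db)
    return 10.0 * math.log10(total)


if __name__ == "__main__":
    # One strong tonal contribution dominates the weaker terms.
    print(global_threshold_db(3.0, [40.0, 12.0], [20.0]))
    # With no maskers the global threshold falls back to the absolute threshold.
    print(global_threshold_db(3.0, [], []))
```

Because the contributions add as powers, the result is never below the largest single term and is dominated by it when the others are much weaker.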


Page 45: ADSP 10 AC Psycho Acoustics EC623 ADSP

Expt. 5-AC- Audio Synthesis using MSE

Problem No. 2.25 (p. 49) of Spanias book on Audio Signal Processing


Page 46: ADSP 10 AC Psycho Acoustics EC623 ADSP

Expt. 6-AC- Audio Synthesis using Psychoacoustics

Problem No. 5.11 (p. 142) of Spanias book on Audio Signal Processing
