MPEG, the MP3 Standard, and Audio Compression

Mark Kilgore and Jamie Wu

Mathematics of the Information Age

September 16, 2003

Audio Compression

- Basic Audio Coding.
- Why beneficial to compress?
- Lossless versus Lossy Compression.
- How are MP3s Compressed?
- What makes MP3 Compression Different?
- What other formats lie in our future?


Why Compress?

- Eliminate redundancy
- The most basic encoder/decoder is PCM (pulse-code modulation)
- PCM stores every time sample, so even a basic sine wave is represented with a lot of redundancy
- If the sine wave is represented by frequency rather than time, only its frequency, amplitude, and phase need to be stored
- Can reduce data without information loss
- Extends playing time, allows for miniaturization and greater equipment tolerance, and reduces cost
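
To make the redundancy point concrete, here is a minimal sketch comparing the storage needed for one second of a pure sine tone as PCM samples with the storage needed for its three parameters. The 44.1 kHz rate, 16-bit sample width, and the 440 Hz tone are assumptions chosen for illustration, not values from the slides.

```python
import math
import struct

SAMPLE_RATE = 44100  # assumed: CD-style sampling rate, samples per second


def pcm_sine(freq_hz, amplitude, phase, seconds):
    """Return a sine tone as little-endian signed 16-bit PCM bytes."""
    count = int(SAMPLE_RATE * seconds)
    samples = [
        int(round(amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE + phase)))
        for n in range(count)
    ]
    return struct.pack("<%dh" % count, *samples)


# Time-domain view: one 16-bit number for every sample.
pcm_bytes = pcm_sine(440.0, 10000, 0.0, 1.0)

# Frequency-domain view: three 32-bit floats describe the same tone.
param_bytes = struct.pack("<3f", 440.0, 10000.0, 0.0)

print(len(pcm_bytes), "bytes as PCM")
print(len(param_bytes), "bytes as (frequency, amplitude, phase)")
```

Real audio is not a single sine wave, so the saving is far less dramatic in practice, but the same idea motivates moving to a frequency representation before coding.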


Lossless vs. Lossy (Perceptual)

- Lossless coding allows perfect reconstruction of a signal (theoretically)
- Lossy coding creates a more highly compressed signal, but some perceptually unnecessary frequencies are eliminated
- Perceptually, however, well-designed lossy coding aims for no difference in how it SOUNDS to a person
- MP3s are lossy, but designed to be perceptually lossless
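
The difference between the two kinds of coding can be sketched in a few lines. The lossless half uses Python's `zlib` as a stand-in general-purpose compressor; the lossy half simply discards the low 8 bits of each sample. Both choices are assumptions made for illustration: neither is what MP3 does, and the crude bit-dropping here is not perceptually tuned.

```python
import math
import struct
import zlib

# A short test tone as 16-bit samples (illustrative values).
samples = [int(round(8000 * math.sin(2 * math.pi * n / 100))) for n in range(1000)]
raw = struct.pack("<1000h", *samples)

# Lossless: the decoder gets back exactly the bytes that went in.
packed = zlib.compress(raw)
restored = zlib.decompress(packed)
lossless_ok = restored == raw

# Lossy: keep only the top 8 bits of each sample, halving the storage.
coarse = [s >> 8 for s in samples]   # each value now fits in one signed byte
approx = [c << 8 for c in coarse]    # decoder's best reconstruction
max_error = max(abs(a - b) for a, b in zip(samples, approx))

print("lossless round trip exact:", lossless_ok)
print("largest lossy error (in sample units):", max_error)
```

A perceptual coder makes the same trade, but chooses what to discard according to what a listener can hear rather than dropping bits uniformly.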

MPEG

- Moving Picture Experts Group
- Aims to create standards relating to synchronized audio and video compression
- MPEG-1
- MPEG-2


MPEG-1 Block Diagrams

Topics Discussed in Detail After Diagrams

[Block diagram, Layers I and II. Blocks shown: Filter Bank (32 sub-bands, numbered 0 to 31); DFT 512/1024 with Hann Window; Psychoacoustic Model; Uniform Midtread Quantizer; Coding of Side Information; Bitstream Formatting; output: Coded Audio Data]

[Block diagram, Layer III. Blocks shown: Filter Bank (32 sub-bands, numbered 0 to 31); MDCT (outputs labeled 0 to 511); DFT 2 * 1024 with Hann Window; Psychoacoustic Model; Non-Uniform Midtread Quantizer with Rate/Distortion Loop; Huffman Coding; Bitstream Formatting; output: Coded Audio Data]

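Time to Frequency Mapping

- Filters parse the signal into K bands
- Each band is quantized to a limited number of bits
- Quantization noise is put in bands where it is barely audible
- The result is sent to the decoder, where the sound is restored

[Figure: encoder and decoder filter banks. Encoder: the input x passes through analysis filters H0 ... HK, each followed by a block labeled K, giving band signals y0 ... yK. Decoder: y0 ... yK each pass through a block labeled K and synthesis filters G0 ... GK to produce the output x.]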
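
The split-and-restore idea in the diagram can be sketched with the smallest possible case: a two-band Haar filter bank. This is an assumption chosen for brevity, not the filter bank MPEG uses, but it shows the same structure of analysis filters, downsampling, and synthesis filters that put the signal back together.

```python
import math


def analyze(x):
    """Split x (even length) into a low band and a high band, each half as long."""
    s = math.sqrt(2.0)
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / s for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / s for i in range(half)]
    return low, high


def synthesize(low, high):
    """Recombine the two bands into the original sample sequence."""
    s = math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / s)
        out.append((l - h) / s)
    return out


signal = [0.0, 1.0, 4.0, 3.0, -2.0, -2.5, 0.5, 7.0]
low_band, high_band = analyze(signal)
restored = synthesize(low_band, high_band)
worst = max(abs(a - b) for a, b in zip(signal, restored))
print("largest reconstruction error:", worst)
```

In a real coder the band signals would be quantized between `analyze` and `synthesize`, so the reconstruction would no longer be exact; the quantization noise would then be confined to the band where it was introduced.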


Z Transform

- Assists in splitting frequencies
- Discrete-time generalization of the Fourier transform
- Important properties
  - Linearity
  - Convolution Theorem
  - Delay Theorem
- Can model all kinds of filter banks through it
- Representation of frequency content
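
The three properties listed above can be checked numerically. The sketch below assumes the usual one-sided definition for a finite sequence, X(z) = sum over n of x[n] z^(-n); the test sequences and the evaluation point z are arbitrary illustrative values.

```python
import cmath


def z_transform(x, z):
    """Evaluate X(z) = sum_n x[n] * z**(-n) for a finite sequence x."""
    return sum(xn * z ** (-n) for n, xn in enumerate(x))


def convolve(x, h):
    """Direct linear convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


z = 0.9 * cmath.exp(0.7j)          # arbitrary point in the complex plane
x = [1.0, -2.0, 0.5, 3.0]
y = [2.0, 0.0, -1.0, 4.0]
h = [0.25, 0.5, 0.25]

# Linearity: Z{2x + 3y} = 2 X(z) + 3 Y(z)
mixed = [2 * a + 3 * b for a, b in zip(x, y)]
linearity_gap = abs(z_transform(mixed, z) - (2 * z_transform(x, z) + 3 * z_transform(y, z)))

# Convolution theorem: Z{x * h} = X(z) H(z)
convolution_gap = abs(z_transform(convolve(x, h), z) - z_transform(x, z) * z_transform(h, z))

# Delay theorem: delaying x by d samples multiplies X(z) by z**(-d)
delay = 3
delay_gap = abs(z_transform([0.0] * delay + x, z) - z ** (-delay) * z_transform(x, z))

print(linearity_gap, convolution_gap, delay_gap)
```

All three gaps should be zero up to floating-point rounding. The convolution theorem is what makes the Z transform convenient for filter banks: filtering in time becomes multiplication in z.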

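MPEG Time to Frequency Mapping

Analysis Filter:

$$h_k[n] = h[n]\,\cos\!\left[\left(k + \tfrac{1}{2}\right)(n - 16)\,\frac{\pi}{32}\right]$$

Synthesis Filter:

$$g_k[n] = 32\,h[n]\,\cos\!\left[\left(k + \tfrac{1}{2}\right)(n + 16)\,\frac{\pi}{32}\right]$$

$$k = 0, 1, \ldots, 31; \qquad n = 0, 1, \ldots, 511$$

Here h[n] is the prototype low-pass filter shared by all bands.

- Uses a filter bank of 32 bands; each filter spans 512 samples
- The above equations allow for taking apart the signal (the h part of the time to frequency mapping diagram) and putting it back together (the g part of the time to frequency mapping diagram)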
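
As a sketch of how the 32 analysis and synthesis filters are generated from one prototype, the code below builds them using the cosine-modulation form of MPEG-1 audio, h_k[n] = h[n] cos[(k + 1/2)(n - 16) pi/32] and g_k[n] = 32 h[n] cos[(k + 1/2)(n + 16) pi/32]. The prototype h[n] in the standard is a tabulated window; `stand_in_prototype` here is a hypothetical Hann-windowed sinc used only so the example runs, and it does not reproduce the standard's coefficients.

```python
import math

NUM_BANDS = 32       # k = 0 .. 31
FILTER_LENGTH = 512  # n = 0 .. 511


def stand_in_prototype():
    """Hypothetical low-pass prototype h[n]; NOT the table from the standard."""
    h = []
    for n in range(FILTER_LENGTH):
        t = (n - (FILTER_LENGTH - 1) / 2) / (2 * NUM_BANDS)  # never exactly zero
        sinc = math.sin(math.pi * t) / (math.pi * t)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / FILTER_LENGTH)
        h.append(sinc * hann / (2 * NUM_BANDS))
    return h


def analysis_filters(h):
    """h_k[n] = h[n] * cos((k + 1/2) * (n - 16) * pi / 32)"""
    return [
        [h[n] * math.cos((k + 0.5) * (n - 16) * math.pi / 32) for n in range(FILTER_LENGTH)]
        for k in range(NUM_BANDS)
    ]


def synthesis_filters(h):
    """g_k[n] = 32 * h[n] * cos((k + 1/2) * (n + 16) * pi / 32)"""
    return [
        [32 * h[n] * math.cos((k + 0.5) * (n + 16) * math.pi / 32) for n in range(FILTER_LENGTH)]
        for k in range(NUM_BANDS)
    ]


h = stand_in_prototype()
hk = analysis_filters(h)
gk = synthesis_filters(h)
print(len(hk), "analysis filters of length", len(hk[0]))
```

Each band's filter is the same low-pass shape shifted to a different centre frequency by the cosine term, which is why a single prototype is enough to define the whole bank.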


PQMF & MDCT

- Both are methods of time to frequency mapping
- PQMF: Pseudo-Quadrature Mirror Filter
- MDCT: Modified Discrete Cosine Transform
- Mathematically, they are closely related: both can be described as cosine-modulated filter banks
- PQMF involves using Z transforms to represent the amplitudes of the frequencies
- MDCT involves performing a block transform using a window to represent amplitudes
- These amplitudes are then quantized
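
The windowed block transform can be sketched as follows. This is a direct (slow) MDCT with a sine window and 50% overlapping blocks; the block size M = 8 and the sine window are assumptions for illustration and are not the Layer III block sizes or window shapes. With no quantization in between, overlap-adding the inverse transforms restores the input.

```python
import math


def sine_window(M):
    """Sine window of length 2M; satisfies w[n]^2 + w[n + M]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]


def mdct(block, window):
    """2M windowed input samples -> M transform coefficients."""
    M = len(block) // 2
    return [
        sum(block[n] * window[n] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
            for n in range(2 * M))
        for k in range(M)
    ]


def imdct(coeffs, window):
    """M coefficients -> 2M windowed, time-aliased output samples."""
    M = len(coeffs)
    return [
        2.0 / M * window[n] * sum(
            coeffs[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
            for k in range(M))
        for n in range(2 * M)
    ]


def analysis_synthesis(signal, M):
    """Transform overlapping blocks and overlap-add the inverses.

    Assumes len(signal) is a multiple of M; zero padding of M samples on
    each side lets every real sample be covered by two blocks.
    """
    w = sine_window(M)
    padded = [0.0] * M + list(signal) + [0.0] * M
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * M + 1, M):
        coeffs = mdct(padded[start:start + 2 * M], w)
        for n, value in enumerate(imdct(coeffs, w)):
            out[start + n] += value
    return out[M:M + len(signal)]


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
restored = analysis_synthesis(signal, 8)
worst = max(abs(a - b) for a, b in zip(signal, restored))
print("largest reconstruction error:", worst)
```

Each block of 2M samples yields only M coefficients, so the overlap costs no extra data; the aliasing that each block introduces is cancelled by its neighbours during overlap-add.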


Psychoacoustic Model

- Determines the masking threshold for each sub-band
- Uses the human auditory property of auditory masking
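
How a masking threshold turns into a bit budget can be sketched with the signal-to-mask ratio (SMR): the gap in dB between a sub-band's signal level and its masking threshold. The sketch below assumes the common rule of thumb that each extra bit of a uniform quantizer buys roughly 6 dB of signal-to-noise ratio. The levels and thresholds are made-up numbers, and the function is a simplification rather than the allocation procedure defined in the standard.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gained per additional quantizer bit


def bits_needed(signal_db, mask_db):
    """Bits required to push quantization noise below the masking threshold."""
    smr = signal_db - mask_db
    if smr <= 0:
        return 0  # the band is fully masked; noise here would not be heard
    return math.ceil(smr / DB_PER_BIT)


# Hypothetical per-band signal levels and masking thresholds, in dB.
levels = [60.0, 42.0, 30.0, 12.0]
masks = [35.0, 38.0, 33.0, 20.0]

allocation = [bits_needed(s, m) for s, m in zip(levels, masks)]
print(allocation)
```

Bands whose signal sits below the threshold get no bits at all, which is where much of the saving in a perceptual coder comes from.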


Non-uniform Quantizer

- Like analog-to-digital conversion: continuous values are mapped to discrete levels
- Quantizer: maps amplitude values into a finite number of bits
- Non-uniform: changes step size according to amplitude values
- Parts of the signal with lesser amplitude are coded with greater accuracy, which increases the signal-to-noise ratio (SNR)
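
A power-law quantizer is one way to get finer steps at small amplitudes. The sketch below uses a 3/4 power before rounding and a 4/3 power on reconstruction, which is the general shape used in Layer III; the `step` parameter and the omission of scale factors and rounding offsets are simplifying assumptions, so this is not the exact quantizer from the standard.

```python
def quantize(x, step):
    """Map an amplitude to an integer index using a 3/4 power law."""
    magnitude = int(round((abs(x) / step) ** 0.75))
    return magnitude if x >= 0 else -magnitude


def dequantize(q, step):
    """Reconstruct an amplitude from an integer index (4/3 power law)."""
    magnitude = abs(q) ** (4.0 / 3.0) * step
    return magnitude if q >= 0 else -magnitude


# Reconstruction levels for the first few indices, and the gaps between them.
levels = [dequantize(q, 1.0) for q in range(6)]
gaps = [b - a for a, b in zip(levels, levels[1:])]
print([round(v, 3) for v in levels])
print([round(g, 3) for g in gaps])
```

The gaps between neighbouring levels grow with amplitude, so small values are represented on a finer grid than large ones.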

Huffman Coding

- For better data compression, variable-length Huffman codes are used to encode the quantized samples
- Quantized MDCT coefficients (for long blocks) are arranged in order from lowest to highest frequency
- The whole range is divided into 3 sections, each coded with a different set of Huffman tables
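
The variable-length idea can be sketched with a small Huffman coder. MP3 uses fixed, predefined Huffman tables; the code below instead builds a table from the data, which is an assumption made to keep the example self-contained. The sample values are illustrative.

```python
import heapq
from collections import Counter


def huffman_table(symbols):
    """Build a {symbol: bit string} table from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, partial table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # codes are prefix-free, so the first match is right
            out.append(inverse[current])
            current = ""
    return out


# Quantized values: small magnitudes dominate, as is typical after quantization.
samples = [0] * 8 + [1] * 4 + [-1] * 3 + [2] * 1
table = huffman_table(samples)
bits = encode(samples, table)
print(table)
print(len(bits), "bits, versus", 2 * len(samples), "bits at a fixed 2 bits per value")
```

Frequent values receive short codes and rare values long ones, so the total is shorter than any fixed-length code whenever the distribution is uneven.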

Bitstream Formatting

- Formats the encoded, quantized samples into an encoded bitstream, the final form in which the compressed signal is transmitted.
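
At its simplest, formatting means packing variable-length codes into whole bytes. The sketch below shows only that step; `pack_bits` and `unpack_bits` are hypothetical helpers, and the layout is not the MP3 frame format, which also carries headers and side information.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes, zero-padding the tail.

    Returns the bytes and the number of padding bits that were added.
    """
    pad = (-len(bits)) % 8
    padded = bits + "0" * pad
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, pad


def unpack_bits(data, pad):
    """Recover the original bit string from packed bytes."""
    bits = "".join(format(byte, "08b") for byte in data)
    return bits[:len(bits) - pad] if pad else bits


coded = "1011001110"            # e.g. the output of an entropy coder
data, pad = pack_bits(coded)
print(data.hex(), "with", pad, "padding bits")
```

The decoder needs to know where the real data ends, which is one reason a practical bitstream carries side information alongside the coded samples.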


MPEG-4 and The Future?

- Incorporates speech and music compression
- More of an extension of MPEG-2 compression techniques, with independent techniques geared specifically at coding for speech content (some coding for meaning)
- Hasn't really taken off yet; only time will tell
- AAC (Advanced Audio Coding), first standardized in MPEG-2 and carried into MPEG-4, is the audio format that is used if you download from the Apple iTunes Store
