
Post on 21-Dec-2015


Audio Coding

MPEG1 Layers I, II, III

MPEG2

MPEG4

Sherida Subrati, Anthony Caliendo

Overview

• Explanation of Codecs
  MPEG1 – Layer I, II, III (Differences)
  MPEG2 – Basic Overview
  MPEG4 – Possible Applications
• Applications & Sound Samples Used
• Results & Explanation
  File Size, Bitrate, & Quality
  Waveform Comparison
• Summary & Questions

Sub-Band Coding Overview

• Size of sub-bands varies
• Varying application of psychoacoustic model
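One way to picture how a psychoacoustic model drives a sub-band coder is a greedy bit allocation: bits go to whichever band currently has the most audible quantization noise. The sketch below is a simplified illustration, not the procedure of any particular MPEG layer. The function name `allocate_bits`, the 6.02 dB-per-bit rule of thumb, and the 15-bit cap are assumptions made for the example, and side-information costs are ignored.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Greedy bit allocation across sub-bands (illustrative sketch).

    smr_db: signal-to-mask ratio per band in dB, as a psychoacoustic
            model might report it (hypothetical input).
    Each added bit is assumed to lower quantization noise by ~6.02 dB.
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio: positive means the noise is still audible.
        nmr = [smr - 6.02 * b for smr, b in zip(smr_db, bits)]
        candidates = [i for i, b in enumerate(bits) if b < max_bits_per_band]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # all remaining noise is already masked
        bits[worst] += 1
    return bits


# Three bands: loud and unmasked, mildly unmasked, fully masked.
print(allocate_bits([20.0, 5.0, -3.0], total_bits=10))
```

The fully masked band receives no bits at all, which is where the bitrate savings of perceptual coding come from.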

MPEG1 – Layer I & II

• Time Frequency Mapping
  Polyphase Filter Bank
  32 Equal Bands
• Psychoacoustic Model
  512-point FFT (Layer I) & 1024-point FFT (Layer II)
  Tonal & Noise Masking
• Quantizer
  Scale Factor: 6 bits
  Layer II – Groups 3 successive scale factors & transmits 1-3 of them depending on how much they differ
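The numbers on this slide can be checked with a little arithmetic. The sketch below computes the width of each of the 32 equal bands and the number of values a 6-bit scale factor index can take. The helper names are hypothetical, and the `scale_factor` formula (roughly 2 dB per step, starting at 2.0, with 63 usable indices) is recalled from the MPEG-1 scale factor table rather than taken from the slides, so treat it as an assumption.

```python
def subband_width_hz(sample_rate_hz, num_bands=32):
    """Width of one band when 0..Nyquist is split into equal bands."""
    return (sample_rate_hz / 2) / num_bands


def scale_factor_index_count(bits=6):
    """Number of distinct values a scale factor index of this size can take."""
    return 2 ** bits


def scale_factor(index):
    """Assumed table: 2.0 at index 0, dropping by a factor of 2**(1/3)
    (about 2 dB) per step; indices 0..62 are taken as valid."""
    if not 0 <= index <= 62:
        raise ValueError("index outside the assumed 0..62 range")
    return 2.0 * 2.0 ** (-index / 3.0)


print(subband_width_hz(44_100))    # band width for the 44.1 kHz samples
print(scale_factor_index_count())  # values addressable with 6 bits
print(scale_factor(0), scale_factor(3))
```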

MPEG1 – Layer I & II Diagram

Images from Peter Noll MPEG Digital Audio Coding Standards

MPEG1 – Layer III

• Time Frequency Mapping
  Switched Hybrid Filter Bank
  32 sub-bands further sub-divided using a 6- or 18-point modified DCT (MDCT)
• Psychoacoustic Model
  Variable FFT
  Tonal & Noise Masking
• Quantizer
  Non-uniform Scale Factors
  Huffman Coding, Bit Reservoir, & Iterative Analysis
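To see what the extra subdivision buys, the following sketch counts the frequency lines produced when each of the 32 sub-bands is split into 18 or 6 lines, and the resulting spacing at a 44.1 kHz sampling rate. The function names are hypothetical and the calculation is plain arithmetic on the figures above, not encoder code.

```python
def frequency_lines(num_subbands=32, lines_per_subband=18):
    """Total frequency lines after subdividing every sub-band."""
    return num_subbands * lines_per_subband


def line_spacing_hz(sample_rate_hz, total_lines):
    """Spacing between lines when 0..Nyquist is divided evenly."""
    return (sample_rate_hz / 2) / total_lines


long_lines = frequency_lines(lines_per_subband=18)   # finer frequency detail
short_lines = frequency_lines(lines_per_subband=6)   # finer time detail

print(long_lines, line_spacing_hz(44_100, long_lines))
print(short_lines, line_spacing_hz(44_100, short_lines))
```

Switching between the two settings trades frequency resolution for time resolution, which is the point of a switched filter bank.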

MPEG1 – Layer III Diagram

Images from Peter Noll MPEG Digital Audio Coding Standards

MPEG2 – General Overview

• 5.1 Channel Support
• Advanced Audio Coding (AAC)
  Optional Preprocessing
  Bit-stream Formatter
  Prediction – helps to optimize quantizer
  Noiseless Coding
  3 Profiles
    • Main – Variable length DCT, noiseless coding, etc.
    • Low Complexity – No temporal noise shaping & time domain prediction
    • Sampling Rate Scalability – preprocessor allows for sampling rates of 6, 12, 18, & 24 kHz

MPEG4 – General Overview

• Consists of all previous MPEG iterations
• Uses 3 Core Coders
  Parametric coding for low bit rate speech
  Analysis-by-synthesis for medium bit rates
  Sub-band/Transform coding for high bit rates
• Low Delay (LD) Encoding / Decoding
• Quality Scalability

MPEG4 – Diagram

Applications & Sound Samples

• Applications
  AVI2MP.EXE
  LAMEwin32
  Nero MPEG4 AAC
  Goldwave
• Hardware
  Pentium III – 1.0 GHz
  512 MB RAM
  Win2K SP3
• Sound Samples
  PCM 16-bit Stereo 44.1 kHz
    • Clubbed to Death (Kurayamino Mix) – Rob D
    • Man Who Sold The World – Nirvana
  PCM 8-bit Mono 44.1 kHz
    • Voice Sample
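For reference, the uncompressed bitrate of the two source formats follows directly from sample rate, bit depth, and channel count. The sketch below works that out and compares it with a coded bitrate. The function names are hypothetical, and 1 kbps is taken as 1000 bits per second.

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(source_bps, coded_kbps):
    """How many times smaller the coded stream is than the PCM source."""
    return source_bps / (coded_kbps * 1000)


music_bps = pcm_bitrate_bps(44_100, 16, 2)  # 16-bit stereo music samples
voice_bps = pcm_bitrate_bps(44_100, 8, 1)   # 8-bit mono voice sample

print(music_bps, compression_ratio(music_bps, 128))
print(voice_bps, compression_ratio(voice_bps, 32))
```

Because the voice sample starts at a quarter of the music bitrate, it reaches the same compression ratio at a quarter of the coded bitrate.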

Results – File Size vs. Bitrate

[Chart: Sample 2, File Size vs. Bitrate. x-axis: Bitrate (kbps) at 32, 48, 56, 64, 80, 96, 112, 128, 160, 192; y-axis: file size, 0 to 7,000,000]

[Chart: Sample 3, File Size vs. Bitrate. x-axis: Bitrate (kbps) at 16, 24, 32, 48, 56, 64, 80, 96; y-axis: file size, 0 to 80,000]
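For constant-bitrate encoding, file size should grow linearly with bitrate, since size is roughly bitrate times duration. A minimal sketch of that relationship follows. The 240-second duration is a made-up value for illustration (the slides do not state the sample lengths), and container and header overhead is ignored.

```python
def expected_size_bytes(bitrate_kbps, duration_s):
    """Approximate size of a constant-bitrate stream, ignoring overhead.

    Assumes 1 kbps = 1000 bits per second.
    """
    return bitrate_kbps * 1000 * duration_s / 8


HYPOTHETICAL_DURATION_S = 240  # illustrative only, not from the slides

for kbps in (32, 64, 128, 192):
    print(kbps, expected_size_bytes(kbps, HYPOTHETICAL_DURATION_S))
```

Doubling the bitrate doubles the expected size, so the points of a size-versus-bitrate chart should fall close to a straight line through the origin.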

Results – Encode Time vs. Bitrate

[Chart: Sample 1, Encode Time vs. Bitrate. x-axis: Bitrate (kbps) at 32, 48, 56, 64, 80, 96, 112, 128, 160, 192; y-axis: encode time, 0 to 160]

[Chart: Sample 2, Encode Time vs. Bitrate. x-axis: Bitrate (kbps) at 32, 48, 56, 64, 80, 96, 112, 128, 160, 192; y-axis: encode time, 0 to 120]

Results – Quality vs. Bitrate

[Chart: Sample 2, Quality vs. Bitrate. x-axis: Bitrate (kbps) at 32, 48, 56, 64, 80, 96, 112, 128, 160, 192; y-axis: quality, 0 to 6]

[Chart: Sample 3, Quality vs. Bitrate. x-axis: Bitrate (kbps) at 16, 24, 32, 48, 56, 64, 80, 96; y-axis: quality, 0 to 6]

Sample Sounds

• Music Sample
  Original Sound
  Sample 2 Play list
  S2-M4LT-064S
  S2-M4LT-080S
  S2-M4LT-096S
• Voice Sample
  Original Sound
  Sample 3 Play list
  S3-M4LT-016M
  S3-M4LT-024M
  S3-M4LT-032M

Sample Waveforms – S2-64

Sample Waveforms – S2-128

Sample Waveforms – S3-64

Sample Waveforms – S3-96

Summary

• MPEG1 – Layers I, II have limited options & are not size versus quality efficient
• MPEG1 – Layer III offers excellent quality at low rates but has large overhead
• MPEG2 – Much more comprehensive
• MPEG4 – Encompasses all previous iterations & has new capabilities to increase its lifespan