introduction to coding of audivisual contents

Upload: josep-carner

Post on 02-Jun-2018

224 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/10/2019 Introduction to coding of audivisual contents

    1/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 1

    Coding of Audiovisual Contents

    1. Introduction

    Veu Processing GroupImage and Video Processing Group

    Unit structure

    Motivating multimedia compression

    The ITU601 standard

    ow oes compress on wor

    Redundancy and irrelevance

    Lossless and lossy compression Compression efficiency

    Quality measures

    2

    Coding systems

    Importance of standards

    1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    2/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 2

    Motivating multimedia compression (I)

    Uncompressed multimedia (graphics, audio and video) data requires considerable stora e ca acit and transmission bandwidth .

    Why do we need to compress multimedia content?

    Despite rapid progress in mass storage density, processor speeds , and digital communication system performance, demand for data storage capacity and data transmission bandwidth continues to outstrip the capabilities of available technologies.

    The continuous growth of data intensive multimedia based web applications have not only sustained the need for more efficient ways to encode signals and images but have made compression of such signals central to storage and communication technology .

    3 1. IntroductionCoding of Audiovisual Contents

    Motivating multimedia compression (II)

    More than 500 thousand million digital photos are taken each year .

    3 days of content are uploaded

    Some data (2013)

    every minute to YouTube. 600 years of Youtube content are

    shared through Facebook links every

    day. Every second , on average, 58

    photos are uploaded to Instagram, 15 photos are uploaded to Flickr, 26 photos/videos are shared on Twitter and more than 2000 photos are uploaded to Facebook (though the most part of them are not public).

    4 1. IntroductionCoding of Audiovisual Contents

    http://instagram.com/p/W2FCksR9-e/#

    What a difference 8 years makes. Saint Peters square in 2005 vs 2013 (#NBCPope)

  • 8/10/2019 Introduction to coding of audivisual contents

    3/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 3

    An example: ITU601

    In 1982, the CCIR defined a standard for encoding interlaced analogue video signals in digital form mainly for studio applications.

    The current name of the standard is ITUR BT.601 .

    e co or space s B R: It enables compatibility between color and black&white . It allows a reduction of the chroma resolution thanks to the lower

    sensitivity of the HVS to the color information. The RB signals are selected to create the chroma signals (CBCR)

    since G presents the largest correlation with Y.

    0.299 0.587 0.1140.500 0.419 0.0810.169 0.331 0.500

    R

    B

    Y R

    C G

    BC

    E E

    E E

    E E

    5 1. IntroductionCoding of Audiovisual Contents

    Digital video: Sampling (I)

    The previous signals EY ECB ECR present a bandwidth of

    approximately 6 MHz: Antialiasing filters have a bandpass up to 5.75 MHz (ITU601). The sampling frequency is 13.5 MHz (ITU601) which is a

    multiple of the line frequency in the most common standards: 625 (PAL) or 525 (NTSC) lines.

    Both standards have 720 samples by active line.

    Video signal sampling : An orthogonal sampling pattern is used. Chroma signals can be sampled at lower resolution than the

    luminance signal.

    6 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    4/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 4

    Digital video: Sampling (II)

    The most common sampling formats are (Chroma sampling changes):

    4:4:4 Format 4:2:2 Format 4:2:0 Format

    4:4:4 Format: Also used for the RGB format.

    4:2:2 Format: Useful for most studio applications.

    4:2:0 Format: Not the scope of the ITU601 standard.

    LuminanceChroma

    7 1. IntroductionCoding of Audiovisual Contents

    Digital video: Quantization (I) The EY ECB ECR components

    do not have the same dynamics : Signal EY [0, 1] Volts Signal ECB [0.5, 0.5] Volts Signal ECR [0.5, 0.5] Volts.

    Every signal is quantized using 8 (10) bits ; that is 256 (1024) levels, so that: Values 0 and 255 are preserved for synchronization data

    A protection interval (to allow for overshoot and undershoot) is left in the upper and lower levels of the dynamic range.

    219 16Y Y E

    224 128 160 128 R R C R Y C E E E 224 128 126 128

    B B C B Y C E E E

    The corresponding level number is the nearest integer value and the digital

    equivalents are termed Y , C B and C R .

    8 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    5/28

  • 8/10/2019 Introduction to coding of audivisual contents

    6/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 6

    Bit rates (II)

    The most

    common

    version

    is

    4:2:2

    10

    bits,

    leading

    to

    270

    Mbits/s :

    Standardized as SMPTE 259M

    Lower bit rates can be achieved removing all irrelevant information (without blanking intervals: Buffered video ):

    eo s samp e a , z Stored in a memory Only actives lines are read at minimum speed.

    For instance, at 4:2:2 format (625 lines, 50 Hz), there are 720 samples per line, 576 active lines and 25 frames per second. As every sample is quantized with 8 bits, the final bit rate is:

    =

    (720 x 480 x 30 x 10 x (1 + + ) = 207.36 Mbit/s)

    Although the bit rate is reduced, figures are still very high: Only for studio applications . Broadcasting requires compression systems .

    11 1. IntroductionCoding of Audiovisual Contents

    Unit structure

    Motivating multimedia compression

    The ITU601 standard

    ow oes compress on wor

    Redundancy and irrelevance

    Lossless and lossy compression Compression efficiency

    Quality measures

    12

    Coding systems

    Importance of standards

    1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    7/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 7

    How does compression work?

    Two fundamental components of information compression are

    redundancy and irrelevance reduction.

    Redundancy reduction aims at removing duplication from the signal source (audio / image/ video/ text). In general, three types of redundancy can be identified:

    Temporal Redundancy or correlation between adjacent samples in audio or adjacent frames in a sequence of images (in video applications).

    . Spectral Redundancy or correlation between different spectral

    bands or color components.

    13 1. IntroductionCoding of Audiovisual Contents

    Redundancy / Relevancy

    Two fundamental components of information compression are

    redundancy and irrelevance reduction.

    Irrelevance reduction omits parts of the signal that will not be noticed by the signal receiver:

    The human

    auditory

    system

    (HAS)

    or

    The human visual system (HVS).

    Current coding systems exploit both types of characteristics

    redundancy and irrelevance reduction, introducing losses in

    the decoded signal.

    14 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    8/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 8

    Temporal Redundancy: Video

    In video signals, there is usually a high temporal correlation (similarity) between frames that are captured at around the same time, especially if the temporal sampling rate (the frame rate) is high.

    Table tennis sequenceframe #40

    Table tennis sequenceframe #41

    Table tennis sequenceframe #42

    15 1. IntroductionCoding of Audiovisual Contents

    Temporal Redundancy: Speech

    In speech signals, there is usually a high temporal correlation (similarity) between consecutive (or close) samples that can be appreciated in the signal itself, its (estimated) autocorrelation or its (estimated) spectral density.

    Autocorrelation of

    Speech signal: Voiced and Unvoiced parts

    the voiced signal

    Table tennis sequenceframe #42

    16 1. IntroductionCoding of Audiovisual Contents

    Spectral density of the voiced signal

  • 8/10/2019 Introduction to coding of audivisual contents

    9/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 9

    Temporal Redundancy: Audio

    This temporal redundancy can be clearly observed as well when comparing consecutive samples in an audio signal.

    Strong correlation found in voiced sounds of speech and many sounds from musical instruments

    17 1. IntroductionCoding of Audiovisual Contents

    Spatial Redundancy: Image

    In still images, there is usually a high spatial correlation between pixels (samples) that are close to each other, i.e. the values of neighboring samples are often very similar.

    Amplitude of adjacent pixel

    Gray level image, 8bbpJoint histogram of two horizontal adjacent pixels

    Amplitude of current pixel

    18 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    10/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 10

    Inter channel redundancy: Color Image

    In images, there is usually a high correlation in the spectral domain between color image components

    Statistical dependence between R,G,B is stronger

    19 1. IntroductionCoding of Audiovisual Contents

    Inter channel redundancy: Stereo audio signals

    For some stereo audio signals , there is a strong correlation between channels

    Example of recorded sounds coming from the front (as in live concerts)

    20 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    11/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 11

    Frequency redundancy: Image

    In images, there is usually a high correlation between the various frequency bands of the signal

    First level Second level Third level

    21 1. IntroductionCoding of Audiovisual Contents

    Irrelevance

    For audio signals as well as image and video data to be consumed by humans, there is no need to represent more than the information that can be perceived in

    , , , loudness, brightness,

    Required resolution might depend on audio / image content (masking )

    Irrelevancy can be application dependent : In ima e onl a s ecific area mi ht be relevant for some task e. . in

    medical imaging.

    22 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    12/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 12

    Exploiting limitations of HAS (I)

    Human ear can only hear sound louder than a frequency dependant threshold (20Hz20kHz)

    Masking : some signal components are irrelevant to human perception Depends on spectral characteristics, temporal characteristics and intensity

    It can occur simultaneously with the masking signal (frequency masking)

    Loud sounds mask soft ones, louder sound happens at the same time

    23 1. IntroductionCoding of Audiovisual Contents

    Exploiting limitations of HAS (II) Human ear can only hear sound louder than a frequency dependant threshold

    (20Hz20kHz) Masking : some signal components are irrelevant to human perception

    Depends on intensity and spectral as well as temporal characteristics

    It can occur before or after the masking signal (temporal masking) The auditory system needs an integration time higher for lower sounds

    Signals which are not audible do not need to be transmitted

    Perceptual lossless : assign number of bits so that the quantization noise is not audible

    24 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    13/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 13

    Exploiting limitations of HVS

    Human visual system has much lower acuity for color hue and saturationthan for brightness

    Use color transform to facilitate exploiting that property

    Cb and Cr often sub sampled 2:1 relative to Y

    0.299 0.587 0.114

    0.169 0.331 0.5 .

    0.5 0.419 0.813

    Y R

    Cb G

    Cr B

    x x

    x x

    x x

    RGBcomponents

    Luminance component

    Chrominance components {

    Y

    Cb

    Cr

    25 1. IntroductionCoding of Audiovisual Contents

    Unit structure

    Motivating multimedia compression

    The ITU601 standard

    ow oes compress on wor

    Redundancy and irrelevance

    Lossless and lossy compression Compression efficiency

    Quality measures

    26

    Coding systems

    Importance of standards

    1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    14/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 14

    Lossless and lossy compression

    Lossless compression

    The reconstructed data at the output of the decoder is a perfect copy of the original data.

    n y s a s ca re un ancy s exp o e . Lossless compression of audio, image, video gives only a moderate amount of

    compression. Useful for file compression (rar, zip, arj), medical or satellite images

    Lossy compression

    The decompressed data is not identical to the source data. Both statistical and visual/psychoacoustic redundancy are exploited. Much higher compression ratios can be achieved at the expense of a loss of

    visual/audio quality.

    27 1. IntroductionCoding of Audiovisual Contents

    Performance

    How to evaluate the performance of a compression algorithm?

    Performance can be based on several factors: Algorithm complexity Memory requirements Run time Robustness Compression efficiency Quality measures

    Objective measures Subjective measures

    28 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    15/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 15

    Efficiency (Measures of compression)

    If b and b are the number of binary digits (or information carrying units) used in two representations of the same information,

    Compression ratio : '

    bC

    b Example : grayscale image of size 256x256 at 1 byte per pixel. If the compressed image

    occupies 16384 bytes, the compression ratio is 4

    Compressed bit rate : number of binary digits (bits) used to represent each sample for ima es, in b or number of bits used er second for audio/video, in b/s

    256 256 8 655364

    16384 8 16384

    x x

    x

    C

    Example : for the previous image, the compressed bit rate is 2 bpp (bits per pixel)

    16384 8 1310722

    256 256 65536

    x

    x

    29 1. IntroductionCoding of Audiovisual Contents

    Typical compressed bit rates (I)

    Audio (MP3) 32 Kb/s MW (AM) quality 96 Kb/s FM quality 128160 Kb/s Standard Bitrate quality 192 Kb/s DAB (Digital Audio Broadcasting) quality 224320 Kb/s Near CD quality

    Other audio formats 800 b/s Minimum necessary for recognizable speech 8 Kb/s Telephone quality 0.5:1 Mb/s Lossless audio 1.411 Mb/s PCM sound format of Compact Disc Digital Audio

    30 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    16/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 16

    Typical compressed bit rates (II)

    Color Image Lossless: 3 bpp Lossy quality: Usable: 0.25 bpp, Moderate: 0.5 bpp, High: 1 bpp

    Video (MPEG2) 16 Kb/s Videophone quality (minimum necessary for a

    consumer acceptable "talking head" picture) 128 : 384 Kb/s Business oriented videoconferencing system quality 1.25 Mb/s VCD quality

    5 Mb/s DVD quality 15 Mb/s HDTV quality 36 Mb/s HD DVD quality 54 Mb/s Bluray Disc quality

    31 1. IntroductionCoding of Audiovisual Contents

    Quality measures: Speech (I)

    Speech coders are evaluated in terms of bit rate, complexity, delay, robustness and output quality:

    Objective measures exist for computing bit rate, complexity and delay performance.

    For quality or robustness only subjective measures can be applied.

    Mean Opinion

    Score

    (MOS):

    The MOS is generated by averaging the results of a set of standard, subjective tests where a number of listeners rate the heard audio quality of test sentences read aloud by both male and female speakers over the communications medium bein tested.

    A listener is required to give each sentence a rating. The whole evaluation process is very expensive

    32 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    17/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 17

    Quality measures: Speech (II)

    Mean Opinion Score (MOS): The rating scheme is as follows:

    Mean Opinion Score (MOS)MOS Quality Impairment

    5 Excellent Imperceptible

    4 Good Perceptible but not

    annoying

    3 Fair Slightly annoying2 Poor Annoying

    33 1. IntroductionCoding of Audiovisual Contents

    1 Ba Very annoying

    A common value for usual standards is around 4.0 See http://en.wikipedia.org/wiki/Mean_opinion_score for typical values

    Quality measures: Audio (I)

    Audio coders are evaluated in terms of bit rate, complexity, delay, robustness and output quality:

    Objective measures exist for computing bit rate, complexity and delay performance.

    For quality or robustness only subjective measures can be applied.

    In current

    perceptual

    audio

    coding,

    uncompressed

    waveforms

    can

    be very different form the originals, but still they can sound quite similar.

    Objective measures Classical objective measures, such as Signal to Noise Ratio (SNR) or

    Total Harmonic Distortion (THD) are inadequate. Segmental SNR may be used for some classes of coders.

    34 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    18/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 18

    Quality measures: Audio (II)

    Subjective measures Subjective listening tests are the most reliable tool for assessing

    quality. . . . . ,

    ITUR BS.1116, ITUR BS1387, )

    Listening tests are influenced by Playback level and background noise Loudspeakers and headphones distortions Listening room acoustics

    Listeners expertise

    35 1. IntroductionCoding of Audiovisual Contents

    Quality measures: Image and Video (I)

    Objective measures

    Mean Square Error (MSE)1 2

    0

    1 N

    i i

    i

    MSE X X N

    Mean Absolute Difference (MAD)

    Max Error

    1

    0

    1 N

    i i

    i

    MAD X X N

    max i ii MaxError X X

    Peak Signalto Noise Ratio (PSNR)

    PSNR is useful to compare images with different dynamic ranges :

    M = Maximum peak to peak value of the representation

    2

    1010log ; M

    PSNR MSE

    36 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    19/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 19

    Quality measures: Image and Video (II)

    Subjective measures

    Group of observers ITUR Standard BT 500 Time consuming, expensive, off line

    Model the fundamental properties of Human Visual System Several attempts, no universal solution yet Lower levels of processing (eye imaging system, retina sampling,

    spatial masking, etc.) can be modeled, but higher level in the brain

    Commercial systems, like PQA500 (Tektronix)

    37 1. IntroductionCoding of Audiovisual Contents

    Quality measures: Image and Video (III)

    Example of subjective quality measure : Absolute rating scale of the Television Allocation Study Organization

    R. Gonzlez and R. Woods, Digital Image Processing , Prentice-Hall, 2008.

    38 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    20/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 20

    Quality measures: Subjective vs. Objective (I)

    Original Compressed and reconstructed,

    very good quality

    Compressed and reconstructed,

    noticeable degradation

    Missing visual information and with

    artifacts

    MSE= 5.17 MSE = 15.67 MSE = 14.17

    39 1. IntroductionCoding of Audiovisual Contents

    Quality measures: Subjective vs. Objective (II)

    Best SSIM Researchers are

    continuously looking for

    objective measures that

    ReferenceImage

    Hypersphere

    behavior of the HVS.

    For instance, the Structural

    Similarity Index (SSIM).

    Z. Wang and A.C. Bovik, Mean Square Error: Love it or Leave it? IEEE Signal Processing Magazine , pp. 98 - 117, January 2009.

    Worst SSIM

    40 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    21/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 21

    Unit structure

    Motivating multimedia compression

    The ITU601 standard

    ow oes compress on wor

    Redundancy and irrelevance

    Lossless and lossy compression

    Compression efficiency

    Quality measures

    41

    Coding systems

    Importance of standards

    1. IntroductionCoding of Audiovisual Contents

    A typical coding system (I)

    Source Encoder

    Channel Encoder

    Encoder

    x

    Channel Decoder

    Source Decoder

    Channel

    Decoder

    x

    Two different pairs of blocks: source encoder/decoder and channel encoder/decoder.

    The source encoder converts the source output to a binary sequence and the channel encoder processes the binary sequence for transmission over the channel. The channel decoder recreates the incoming binary sequence (hopefully reliable) and the source decoder recreates the source input.

    42 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    22/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 22

    A typical coding system (II)

    Source Encoder

    Channel Encoder

    Encoder

    x

    The main objective of the source encoder is to reduce the amount of data

    Channel Decoder

    Source Decoder

    Channel

    Decoder

    x

    (average number of bits) necessary to represent the information in the signal. This data compression is mainly carried out by exploiting the redundancy and irrelevancy in the signal information.

    The main objective of the channel encoder is to add redundancy to the output of the source encoder to enhance the reliability on the transmission.

    43 1. IntroductionCoding of Audiovisual Contents

    Typical source codec

    Source Encoder

    PerceptualAnalysis

    Transform orPrediction Quantization

    Channel

    x EntropyEncoder

    ChannelEncoder

    QuantizationTable

    CodingTable

    Transform orPrediction Model

    Inverse Quantization

    InverseTransform or

    Prediction x

    Source Decoder

    EntropyDecoder

    ChannelDecoder

    44 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    23/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 23

    Source encoder (I)

    Perceptual Analysis Transform or Prediction Quantization

    Source Encoder

    EntropyEncoder

    Quant. Table Coding TableSelection T/P

    PerceptualAnalysis

    This block applies the knowledge about human perception on the signal and systems to be used.

    Non perceptual parts of the signal should not be considered

    Depending on the signal to be coded, better models can be used: In the audio case, the impact of the perceptual analysis in the final coding system is

    paramount

    In some cases, the knowledge about the human perception cannot be fully exploited in the coding process.

    Typically, the result of the perceptual analysis affects the kind of signal model as well as the quantization of the parameters of such a model.

    45 1. IntroductionCoding of Audiovisual Contents

    Source encoder (II)

    Transform or Prediction Transform or Prediction Quantization

    Source Encoder

    EntropyEncoder

    Quant. Table Coding TableSelection T/P

    PerceptualAnalysis

    correlated set of samples describing the signal; modifies the signal representation into a format designed to reduce its redundancy.

    Data in the new format has to present a correct structure in order to be useful for coding

    purposes

    The operation is generally reversible , that is the transform step does not introduce losses in the description.

    represent the signal.

    Coding algorithms can be grouped into two main classes: techniques that modified the original samples relying on a prediction model (e.g. Block Matching) and techniques that represent the signal in a different domain and code the transformed coefficients (e.g. DCT)

    46 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    24/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 24

    Source encoder (III)

    Quantization Transform or Prediction Quantization

    Source Encoder

    EntropyEncoder

    Quant. Table Coding TableSelection T/P

    PerceptualAnalysis

    This block reduces the accuracy of the transform or prediction output in accordance with a pre established fidelity criterion.

    The goal is to keep irrelevant information out of the compressed representation

    The quantization represents a range of values of a given transformed sample by a single value in the range. The set of output values are the quantization indices. This step is responsible of the losses in the system and, therefore, determines the achieved com ression here we are assumin scalar uantization rou in a set of

    transformed samples into a vector and performing vector quantization is possible as well).

    This operation is irreversible , so it must be omitted when (lossless compression) error free compression is desired.

    47 1. IntroductionCoding of Audiovisual Contents

    Source encoder (IV)

    Entropy encoder Transform or Prediction Quantization

    Source Encoder

    EntropyEncoder

    Quant. Table Coding TableSelection T/P

    PerceptualAnalysis

    This block produces the final bit stream .

    The entropy encoder generates a fixed or variable length code to represent the quantization output and maps the output in accordance with the code.

    In many cases a variable length code is used: the shortest code words are assigned to the most frequently occurring quantization output values, thus minimizing coding redundancy.

    This operation is reversible

    48 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    25/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 25

    Complete system assessment

    Rate/Distortion Analysis (I)

    Transform or Prediction Quantization

    Source Encoder

    EntropyEncoder

    Quant. Table Coding TableSelection T/P

    PerceptualAnalysis

    The way to analysis the performance of a coding system is by evaluating the quality of the decoded signal when varying the encoding rate .

    In the case of audio signals, since good models for the HAS exist, the concept of quality refers to perceived audio quality .

    D

    , the concept of quality commonly refers toa measure of distortion in the decodedsignal (for instance, MSE or PSNR) R

    49 1. IntroductionCoding of Audiovisual Contents

    Complete system assessment

    Rate/Distortion Analysis (II)

    Transform or Prediction Quantization

    Source Encoder

    EntropyEncoder

    Quant. Table Coding TableSelection T/P

    PerceptualAnalysis

    A Rate/Distortion analysis requires the implementation of the complete encoding system , which sometimes is not available

    The various parts of the system are commonly assessed using other (local) measures:

    Transform / Prediction : Decrease in correlation of the output samples. Quantization : Decrease in entropy of the output symbols

    Those are suboptimal measures and usually the global analysis is necessary since the different blocks can be very related:

    For instance, entropy coders which are very adapted to a given source

    50 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    26/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 26

    Source decoder

    Source Decoder

    The decoder contains three blocks that erform, in reverse order, the inverse operations of the encoder: entropy decoder , inverse quantization and inverse transform block.

    InverseQuantization

    InverseTransform or

    EntropyDecoder

    Prediction

    Source Decoder

    51 1. IntroductionCoding of Audiovisual Contents

    Channel encoder and decoder The channel encoder and decoder play an important role when the

    channel is noisy or prone to error.

    They are designed to reduce the impact of channel noise by inserting a controlled form of redundancy into the source encoded data.

    As the output of the source encoder contains little redundancy, it would be highly sensitive to transmission noise without the addition of this controlled redundancy.

    Source Channel

    Encoder

    x

    ChannelDecoder

    SourceDecoder

    Channel

    Decoder

    x

    52 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    27/28

    CAV: Fall 2014 1. Introduction

    ETSETB TelecomBCN Universitat Politcnica de Catalunya UPC-BarcelonaTech 27

    Normalization and Audiovisual Coding Standards

    What do standards standardize? To correctly decode the bitstream produced by the coding system, the encoder

    has to share some information with the decoder: How the different codewords have been stored in the bitstream ( syntax definition )

    and Which actions are associated to their different decoded values (decoder definition ).

    What is left for commercial competition? Manufacturers still have a margin for innovation and competition

    Codec performance: parameters, coding tables, actual implementation, Added functionalities: interface, adaptation, complexity, quality, price,

    Im ortance of Normalization Normalization allows interconnection of equipment from different

    manufacturers Standards should arrive early in the market, to avoid de facto standards

    53 1. IntroductionCoding of Audiovisual Contents

    Standards ISO/MPEG

    ISO/IEC MPEG has published audio visual standards in the last 10 years to foster the development of the digital environment.

    MPEG1 1992, ISO/IEC 11172Coding of video & associated audio for Digital Storage Media (DSM) up to 1,5 Mbit/s in

    MPEG2 1994, ISO/IEC 13818 1996 Emmy for technical excellence Generic video and audio coding for TV Broadcasting of Standard Definition signals (SDTV) & High Definition (HDTV)

    MPEG4 1998, ISO/IEC 14496 2003 JVT (ISO/ITU-T) AVC Coding of Audio Visual Objects (AVO) oriented to manipulation (e.g. use of acompositor in the decoder)

    MPEG7 2001, ISO/IEC 15938Interface for the description of multimedia content, in order to facilitate search, retrieval and access of audio visual contents

    MPEG21 2003, ISO/IEC 21000Structure for the development of the multimedia, management of adaptation of content to different media and devices (repurposing) & digital rights management for content creation (applications: collections, multimedia libraries...)

    54 1. IntroductionCoding of Audiovisual Contents

  • 8/10/2019 Introduction to coding of audivisual contents

    28/28

    CAV: Fall 2014 1. Introduction

    Course contents

    Transform orPrediction Quantization

    x

    Source Encoder

    EntropyEncoder

    ChannelEncoder

    PerceptualAnalysis

    Inverse Quantization

    InverseTransform or

    Prediction

    Channel

    x EntropyDecoderChannelDecoder

    QuantizationTable

    CodingTable

    Transform orPrediction Model

    Source Decoder

    Unit 2 Entropy Coding CommunicationCourses

    Source Decoder

    Unit 3 Speech Coding, Unit 4 Audio Coding, Unit 5 Image Coding, Unit 6 Video Coding

    55 1. IntroductionCoding of Audiovisual Contents