Data Formats and Codecs
30/8 – 2004
INF 5070 – Media Storage and Distribution Systems
2004 Carsten Griwodz & Pål Halvorsen
INF 5070 – media servers and distribution systems
Why codecs and formats?
  Codecs (coders/decoders)
    Determine how information is represented
    Important for servers and distribution systems
      Required sending speed
      Amount of loss allowed
      Buffers required
      …
  Formats
    Determine how data is stored
    Important for servers and distribution systems
      Where is the data?
      Where is the data about the data?
Media data
Media data
  Medium: "thing in the middle"
    Here: means to distribute and present information
  Media affect human-computer interaction
  The mantra of multimedia users
    Speaking is faster than writing
    Listening is easier than reading
    Showing is easier than describing
Dependence of Media
  Time-independent media (discrete media)
    Text
    Graphics
  Time-dependent media (continuous media)
    Audio
    Video
  Interdependent media (multimedia)
  "Continuous" refers to the user's impression of the data, not necessarily to its representation
  Combined video and audio is multimedia: relations must be specified
Dependence of Media
  Defined by the presentation of the data, not its representation
  Discrete media
    Text
    Graphics
    Video stills (image displayed by pausing a video stream)
  Continuous media
    Audio
    Video
    Animation
    Ticker news (continuously scrolling text)
  Multimedia
    Multiplexed audio and video
Properties of a Multimedia System
  Flexibility
    Provide mechanisms to handle all kinds of media, in particular discrete and continuous media
      A VCR and a desktop publishing system for text and graphics are not multimedia systems
      An editor with voice annotation is a multimedia system
  Integration
    Independent media storage
    Computer-controlled media combination
  Definition
    A multimedia system is characterized by the integrated computer-controlled handling of independent discrete and continuous media
Coding for distribution
Compression – Necessity
  E.g., video sequence
    25 images/s
      PAL standard
    3 byte/pixel
      YUV (luminance + 2 chrominance values)
      RGB (red-green-blue values)
    Image resolution 640 x 480 pixels
    Data rate = 640 * 480 * 3 byte * 25/s = 23040000 byte/s ≈ 22 MByte/s
      Approx. 1/18 of a stream over Ethernet (10 Mbit/s)
      Approx. 1/2 of a stream over Fast Ethernet (100 Mbit/s)
  Compression is necessary
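The data rate calculation can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the link speeds of 10 Mbit/s and 100 Mbit/s with no protocol overhead are assumptions.

```python
# Raw (uncompressed) video data rate for the example on the slide.
WIDTH, HEIGHT = 640, 480      # pixels
BYTES_PER_PIXEL = 3           # YUV or RGB, one byte per component
FRAMES_PER_SECOND = 25


def raw_data_rate(width, height, bytes_per_pixel, fps):
    """Return the uncompressed data rate in byte/s."""
    return width * height * bytes_per_pixel * fps


def streams_per_link(link_bit_rate, stream_byte_rate):
    """Fraction of one stream that a link can carry (overhead ignored)."""
    return link_bit_rate / (stream_byte_rate * 8)


rate = raw_data_rate(WIDTH, HEIGHT, BYTES_PER_PIXEL, FRAMES_PER_SECOND)
print(rate, "byte/s, about", round(rate / 2**20), "MByte/s")
print("Ethernet (assumed 10 Mbit/s):", round(streams_per_link(10e6, rate), 3))
print("Fast Ethernet (assumed 100 Mbit/s):", round(streams_per_link(100e6, rate), 3))
```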
Compression – General Requirements
  Dependence on application type:
    Dialogue mode
    Retrieval mode
Compression – Mode-Dependent Requirements
  Dialogue and retrieval mode requirements:
    Synchronization of audio, video, and other media
  Dialogue mode requirements:
    End-to-end delay < 150 ms
    Compression and decompression in real time
    Symmetric
  Retrieval mode requirements:
    Fast forward and backward data retrieval
    Random access within 1/2 s
    Asymmetric
  We look mainly at retrieval mode!
Compression Categories
Basic Encoding Steps
Run-Length Coding
  Assumption
    Long sequences of identical symbols
  Example
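The slide's own example is not part of this transcript. As a stand-in, here is a minimal sketch of run-length coding; the (symbol, run length) pair representation and the function names are assumptions, not a particular codec's format.

```python
def rle_encode(symbols):
    """Collapse runs of identical symbols into (symbol, run_length) pairs."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((s, 1))               # start a new run
    return runs


def rle_decode(runs):
    """Expand (symbol, run_length) pairs back into the symbol sequence."""
    out = []
    for symbol, length in runs:
        out.extend([symbol] * length)
    return out


encoded = rle_encode("AAAABBBCCD")
print(encoded)
print("".join(rle_decode(encoded)))
```

The gain depends entirely on the assumption on the slide: without long runs, the pairs take more space than the input.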
Bit-Plane Coding
  Assumption
    Even longer sequences of identical bits
  Example
    Values and sign bits
      10,0,6,0,0,3,0,2,2,0,0,2,0,0,1,0, … ,0,0 (absolute)
      0,x,1,x,x,1,x,0,0,x,x,1,x,x,0,x, … ,x,x (sign bits)
    Bit planes and their codes
      1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, … ,0,0 (MSB)    (0,1)
      0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0, … ,0,0 (MSB-1)  (2,1)
      1,0,1,0,0,1,0,1,1,0,0,1,0,0,0,0, … ,0,0 (MSB-2)  (0,0)(1,0)(2,0)(1,0)(0,0)(2,1)
      0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0, … ,0,0 (MSB-3)  (5,0)(8,1)
    Reading the code (0,1): no 0s before a 1, end of plane
  Up to 20% savings over run-length coding can be achieved
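The example can be reproduced with a short sketch. It assumes each plane is coded as (run, end-of-plane) pairs, as the annotations on the slide suggest; the function names are illustrative, and an all-zero plane simply produces an empty list here.

```python
def bit_planes(values, num_planes):
    """Split absolute values into bit planes, most significant plane first."""
    return [[(abs(v) >> bit) & 1 for v in values]
            for bit in range(num_planes - 1, -1, -1)]


def encode_plane(plane):
    """Code one plane as (run, end_of_plane) pairs.

    'run' is the number of 0s before each 1; end_of_plane is 1 for the
    last 1 in the plane and 0 otherwise.
    """
    ones = [i for i, bit in enumerate(plane) if bit]
    pairs, previous = [], -1
    for k, position in enumerate(ones):
        last = 1 if k == len(ones) - 1 else 0
        pairs.append((position - previous - 1, last))
        previous = position
    return pairs


# First 16 absolute values from the slide; the trailing zeros add no symbols.
values = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0]
codes = [encode_plane(plane) for plane in bit_planes(values, 4)]
for name, code in zip(["MSB", "MSB-1", "MSB-2", "MSB-3"], codes):
    print(name, code)
```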
Huffman Coding
  Assumption
    Some symbols occur more often than others
    E.g., character frequencies of the English language
  Fundamental principle
    Frequently occurring symbols are coded with shorter bit strings
Huffman Coding Example
  Characters to be encoded: A, B, C, D, E
  Probability to occur: p(A)=0.3, p(B)=0.3, p(C)=0.1, p(D)=0.15, p(E)=0.15
Huffman Table and example of application to data stream
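The table itself is not part of this transcript. The following sketch builds a Huffman code for the probabilities of the example. The exact bit strings depend on how ties are broken, so they may differ from the slide's table; the code lengths and the average length do not.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Build a prefix code; returns {symbol: bit string}."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # the two least probable subtrees
        p1, _, code1 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in code0.items()}
        merged.update({s: "1" + bits for s, bits in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


probs = {"A": 0.3, "B": 0.3, "C": 0.1, "D": 0.15, "E": 0.15}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print("average code length:", round(avg_len, 2), "bits/symbol")
```

A fixed-length code for five symbols needs 3 bits per symbol, so the variable-length code is shorter on average for this distribution.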
JPEG
  "JPEG": Joint Photographic Experts Group
  International Standard:
    For digital compression and coding of continuous-tone still images:
      Gray-scale
      Color
    Since 1992
  Joint effort of:
    ISO/IEC JTC1/SC2/WG10
    Commission Q.16 of CCITT SGVIII
  Compression rate of 1:10 yields reasonable results
JPEG
  Very general compression scheme
  Independence of
    Image resolution
    Image and pixel aspect ratio
    Color representation
    Image complexity and statistical characteristics
  Well-defined interchange format of encoded data
  Implementation in
    Software only
    Software and hardware
JPEG
  Sequence of compression steps
  Different resolutions possible
  Lossy or lossless mode
    Lossless compression factor ~1.6:1
  Symmetrical codec
JPEG – Baseline Mode: Quantization
  Use of quantization tables for the DCT coefficients
    Map an interval of real numbers to one integer number
    Allows a different granularity for each coefficient
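A minimal sketch of table-based quantization follows. The coefficient values and table entries are hypothetical, and a 2x2 excerpt stands in for the 8x8 blocks used by JPEG; rounding to the nearest integer is assumed.

```python
def quantize(coefficients, table):
    """Divide each DCT coefficient by its table entry and round to an integer."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coefficients, table)]


def dequantize(levels, table):
    """Map each integer level back to one representative coefficient value."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]


coeffs = [[-415.4, 30.2], [-61.0, 27.3]]   # hypothetical DCT coefficients
table = [[16, 11], [12, 14]]               # hypothetical table entries
levels = quantize(coeffs, table)
print(levels)
print(dequantize(levels, table))
```

Every coefficient within half a table entry of a multiple maps to the same integer, which is the "interval to one integer" mapping named on the slide. Larger table entries give coarser granularity for that coefficient.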
JPEG – 4 Modes of Compression
Motion JPEG
  Use series of JPEG frames to encode video
  Pro
    Lossless mode – editing advantage
    Frame-accurate seeking – editing advantage
    Arbitrary frame rates – playback advantage
    Arbitrary frame skipping – playback advantage
    Scaling through progressive mode – distribution advantage
    Min transmission delay = 1/framerate – conferencing advantage
    Supported by popular frame grabbers
  Contra
    Series of JPEG-compressed images
    No standard, no specification
      Worse, several competing quasi-standards
    No relation to audio
    No inter-frame compression
H.261 (px64)
  International Standard
  Video codec for video conferences at p x 64 kbit/s (ISDN):
    Real-time encoding/decoding, max. signal delay of 150 ms
    Constant data rate
  Intraframe coding
    DCT as in JPEG baseline mode
  Interframe coding, motion estimation
    Search for a similar macroblock in the previous image and compare
    Position of this macroblock defines the motion vector
    Difference between similar macroblocks
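The macroblock search can be sketched as exhaustive block matching. Several things here are assumptions for illustration: the sum of absolute differences (SAD) as the similarity measure, the small 4x4 block instead of a full macroblock, the search range, and the synthetic frames.

```python
def sad(prev, cur, px, py, cx, cy, size):
    """Sum of absolute differences between a block in prev and one in cur."""
    return sum(abs(prev[py + j][px + i] - cur[cy + j][cx + i])
               for j in range(size) for i in range(size))


def find_motion_vector(prev, cur, cx, cy, size, search):
    """Search prev around (cx, cy) exhaustively; return (dx, dy, best_sad)."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = cx + dx, cy + dy
            if 0 <= px <= width - size and 0 <= py <= height - size:
                cost = sad(prev, cur, px, py, cx, cy, size)
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best


# Hypothetical 8x8 frames: cur is prev shifted one pixel to the right.
prev = [[3 * x * x + 5 * y * y + x * y for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in prev]
print(find_motion_vector(prev, cur, 2, 2, 4, 2))
```

The returned offset is the motion vector. The remaining per-pixel difference between the two blocks, zero in this synthetic case, is what an encoder would go on to code.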
MPEG (Moving Picture Experts Group)
  International Standard:
    Compression of audio and video for playback (1.5 Mbit/s):
    Real-time decoding
  Sequence of I-, P-, and B-frames:
    Random access
      At I-frames
      At P-frames: i.e., decode previous I-frame first
      At B-frames: i.e., decode I- and P-frames first
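The cost of random access can be made concrete with a small dependency walk. This sketch uses a simplified model that is an assumption, not the full standard: frames are given in display order, the sequence starts with an I-frame, a P-frame depends on the previous I- or P-frame, and a B-frame depends on the nearest I- or P-frame on each side.

```python
def frames_needed(gop, index):
    """Return the sorted indices of all frames to decode to show frame `index`.

    gop is a string of frame types in display order, e.g. "IBBPBBP".
    """
    def prev_ref(i):
        return next(j for j in range(i - 1, -1, -1) if gop[j] in "IP")

    def next_ref(i):
        return next((j for j in range(i + 1, len(gop)) if gop[j] in "IP"), None)

    needed, todo = set(), [index]
    while todo:
        i = todo.pop()
        if i in needed:
            continue
        needed.add(i)
        if gop[i] == "P":
            todo.append(prev_ref(i))
        elif gop[i] == "B":
            todo.append(prev_ref(i))
            following = next_ref(i)
            if following is not None:
                todo.append(following)
    return sorted(needed)


gop = "IBBPBBP"
for target in (0, 3, 4):
    print(gop[target], "frame", target, "needs", frames_needed(gop, target))
```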
MPEG-2
  From MPEG-1 to MPEG-2
    Improvement in quality
      From VCR to TV to HDTV
    No CD-ROM based constraints
      Higher data rates
        MPEG-1: about 1.5 Mbit/s
        MPEG-2: 2-100 Mbit/s
  Evolution
    1994: International Standard
    Also later known as H.262
    Prominent role for digital TV in DVB (digital video broadcasting) and DVD (digital video disk)
    Commercial MPEG-2 realizations available
MPEG-2
  Beyond MPEG-1:
    Higher quality encoding
    Higher data rates
    Interleaved modes
  Use cases
    Broadcast quality production
      DVB-T: Terrestrial
      DVB-S: Satellite
      DVB-C: Cable
    Program Stream for post-processing, storage, and DVD distribution
    Transport Stream for broadcasting, error resilience
  Scaling:
    Signal-to-Noise Ratio (SNR) scaling - progressive compression, error correcting codes
    Spatial scaling - several pixel resolutions
    Temporal scaling - frame dropping
MPEG-4
  MPEG-4 (ISO 14496) originally
    Targeted at systems with very scarce resources
    To support applications like
      Mobile communication
      Videophone and e-mail
    Max. data rates and dimensions (roughly)
      Between 4800 and 64000 bit/s
      176 columns x 144 lines x 10 frames/s
  Further demand
    To provide enhanced functionality to allow for analysis and manipulation of image contents
MPEG-4
  Hence: find standardized ways to
    Represent units of aural, visual or audiovisual content
      "Audio/visual objects" or AVOs
      Object coding independent of other objects, surroundings and background
      Natural and synthetic objects
    Compose these objects together
      I.e., creation of compound objects that form audiovisual scenes
    Multiplex and synchronize the data associated with AVOs
      For transportation over network channels providing a QoS (Quality of Service)
    Interact with the audiovisual scene generated at the decoder's site
MPEG-4: Scope
  Definition of
    "System Decoder Model"
      Specification for decoder implementations
    Description language
      Binary syntax of an AV object's bitstream representation
      Scene description information
    Corresponding concepts, tools and algorithms, especially for
      Content-based compression of simple and compound audiovisual objects
      Manipulation of objects
      Transmission of objects
      Random access to objects
      Animation
      Scaling
      Error robustness
MPEG-4: Scope
  Targeted bit rates for video and audio:
    VLBV core ("Very Low Bit-rate Video")
      5 - 64 kbit/s
      Image sequences with CIF resolution and up to 15 frames/s
    Higher-quality video
      64 kbit/s - 4 Mbit/s
      Quality like digital TV
    Natural audio coding
      2 - 64 kbit/s
MPEG-4: Video and Image Encoding
  Encoding / decoding of
    Rectangular images and video
      Coding similar to MPEG-1/2
      Motion prediction
      Texture coding
    Images and video of arbitrary shape
      As done in conventional approach
      8x8 DCT or shape-adaptive DCT
      Plus coding of shape and transparency information
  Encoder
    Must generate timing information
      Speed of the encoder clock = time base
      Desired decoding times and/or expiration times
      By using time stamps attached to the stream
    Can specify the minimum buffer resources needed for decoding
MPEG-4: Composition of Scenes
  Scene description includes:
    Tree to define hierarchical relationships between objects
    Objects' positions in space and time
      By converting the objects' local coordinate system into a global coordinate system
    Attribute value selection
      E.g., pitch of sound, color, texture, animation parameters
  Description based on some VRML concepts
    VRML = "Virtual Reality Modeling Language"
  Interaction with scenes
    E.g., change viewing point, drag object, start/stop streams, select language
MPEG-4: Example of a Composition
MPEG-4: Synthetic Objects
  Visual objects:
    Virtual parts of scenes
      E.g., virtual background
    Animation
      E.g., animated faces
  Audio objects:
    "Text-to-speech"
      Speech generation from given text and prosodic parameters
      Face animation control
    "Score driven synthesis"
      Music generation from a score
      More general than MIDI
    Special effects
MPEG-4: Error Handling
  Mobile communication:
    Low bit-rate (< 64 kbit/s)
    Error-prone
  MPEG-4 concepts for error handling:
    Resynchronization
      Enables receiver to "tune in" again
      Based on markers within bitstream
    Data recovery
      Enables receiver to reconstruct lost data
      Encode data in an error-resilient manner
    Error concealment
      Enables receiver to bridge gaps in data
      E.g., by repeating parts of old frames
Network-aware coding
Network-aware coding
  Adapt to reality of the Internet
  Content
    Is created once, off-line
    Is sent many times, under different circumstances
  No guarantees concerning
    Throughput
    Jitter
    Packet loss
  Sending rate
    Must adhere to rules
    Often: don't send more than TCP would
  Can't send at the best available encoding rate
Approaches
  Simulcast
  Scalable coding
    SNR Scalability
    Temporal Scalability
    Spatial Scalability
    Fine Grained Scalability
  Multiple Description Coding
Simulcast
  Choose a set of sending rates
  During content creation
    Encode content in best possible quality below that sending rate
  During transmission
    Choose version with the best admissible quality
  [Figure: quality over sending rate. Labels: best possible quality at possible sending rate; single rate codec; 3 simulcast rates]
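The choice made during transmission can be sketched as follows. The three rates are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def choose_version(encoded_rates, available_rate):
    """Pick the highest encoded rate that does not exceed the available rate.

    Returns None if even the lowest-rate version does not fit.
    """
    admissible = [rate for rate in encoded_rates if rate <= available_rate]
    return max(admissible) if admissible else None


rates_kbit = [64, 256, 1024]   # hypothetical simulcast rates in kbit/s
for available in (50, 300, 5000):
    print(available, "kbit/s available ->", choose_version(rates_kbit, available))
```

Between two encoded rates the quality stays flat, which is why simulcast falls short of the best possible quality for most sending rates.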
Scalable coding
  Typically used as layered coding
  A base layer
    Provides basic quality
    Must always be transferred
  One or more enhancement layers
    Improve quality
    Transferred if possible
  [Figure: quality over sending rate. Labels: best possible quality at possible sending rate; base layer; enhancement layer]
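A sender's layer selection can be sketched in a few lines. The layer rates are hypothetical, and the sketch assumes that each enhancement layer is only useful when all layers below it are sent as well.

```python
def layers_to_send(layer_rates, available_rate):
    """Return (number of layers sent, total rate used).

    layer_rates lists the base layer first, then enhancement layers in order.
    A result of (0, 0) means that not even the base layer fits.
    """
    total, sent = 0, 0
    for rate in layer_rates:
        if total + rate > available_rate:
            break
        total += rate
        sent += 1
    return sent, total


layer_rates_kbit = [64, 192, 768]   # hypothetical: base, enhancement 1, enhancement 2
for available in (50, 300, 2000):
    print(available, "kbit/s available ->", layers_to_send(layer_rates_kbit, available))
```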
Temporal Scalability
  Frames can be dropped
    In a controlled manner
    Frame dropping does not violate dependencies
    Low gain example: B-frame dropping in MPEG-1
Spatial Scalability
  Idea
    Base layer
      Downsample the original image (code only 1 pixel instead of 4)
      Send like a lower resolution version
    Enhancement layer
      Subtract base layer pixels from all pixels
      Send like a normal resolution version
    If enhancement layer arrives at client
      Decode both layers
      Add layers
  Example (2x2 block)
    Original            72  61
                        75  83
    Base layer          73        (less data to code)
    Enhancement layer   -1 -12
                         2  10    (better compression due to low values)
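The numbers of the example can be reproduced with a small sketch. It assumes that downsampling takes the rounded average of the 2x2 block, which matches the base layer value 73 on the slide; the function names are illustrative.

```python
def encode_block(block):
    """Split a 2x2 block into a 1-pixel base layer and a 2x2 enhancement layer."""
    pixels = [p for row in block for p in row]
    base = int(sum(pixels) / len(pixels) + 0.5)          # rounded average
    enhancement = [[p - base for p in row] for row in block]
    return base, enhancement


def decode_block(base, enhancement=None):
    """Rebuild the block; without the enhancement layer, repeat the base pixel."""
    if enhancement is None:
        return [[base, base], [base, base]]
    return [[base + e for e in row] for row in enhancement]


original = [[72, 61], [75, 83]]
base, enhancement = encode_block(original)
print(base, enhancement)
print(decode_block(base))               # base layer only
print(decode_block(base, enhancement))  # both layers
```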
Spatial Scalability
  [Figure: encoder block diagram. Input: raw video. Two DS, DCT, Q, VLC chains and one DCT, Q, VLC chain, connected through difference (-) and sum (+) nodes. Outputs: base layer, enhancement layer, enhancement layer 2]
  DS – downsampling
  DCT – discrete cosine transformation
  Q – quantization
  VLC – variable length coding
SNR Scalability
  SNR – signal-to-noise ratio
  Idea
    Base layer
      Is regularly DCT encoded
      A lot of data is removed using quantization
    Enhancement layer is regularly DCT encoded
      Run inverse DCT on quantized base layer
      Subtract from original
      DCT encode the result
    If enhancement layer arrives at client
      Add base and enhancement layer before running inverse DCT
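A minimal sketch of the two layers follows. It forms the residual in the coefficient domain: the base layer is inverse quantized and subtracted from the original coefficients. That simplification, the step sizes, and the coefficient values are assumptions for illustration.

```python
def quantize(coeffs, step):
    """Uniform quantization of a list of DCT coefficients."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Inverse quantization."""
    return [level * step for level in levels]


def snr_encode(coeffs, base_step, enh_step):
    """Coarse base layer plus a finer quantization of what the base layer lost."""
    base = quantize(coeffs, base_step)
    residual = [c - r for c, r in zip(coeffs, dequantize(base, base_step))]
    return base, quantize(residual, enh_step)


def snr_decode(base, enh, base_step, enh_step):
    """Add the layers in the coefficient domain (before any inverse DCT)."""
    coarse = dequantize(base, base_step)
    if enh is None:
        return coarse
    return [b + e for b, e in zip(coarse, dequantize(enh, enh_step))]


coeffs = [-415, 30, -61, 27, 56, -20]    # hypothetical DCT coefficients
base, enh = snr_encode(coeffs, 32, 4)
print(snr_decode(base, None, 32, 4))     # base layer only
print(snr_decode(base, enh, 32, 4))      # base plus enhancement layer
```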
SNR Scalability
  [Figure: encoder block diagram. Input: raw video. Blocks: DCT, Q, VLC for the base layer; IQ, difference (-) and sum (+) nodes, Q, VLC for the enhancement layer]
  DCT – discrete cosine transformation
  Q – quantization
  IQ – inverse quantization
  VLC – variable length coding
Fine Grained Scalability
  Idea
    Cut off compressed tail bits of samples
  Base layer
    As in SNR coding
  Enhancement layer
    Use bit-plane coding for enhancement layer instead of run-level coding
    Cut tail bits off until data rate is reached
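The effect of cutting tail bits can be sketched at the granularity of whole bit planes. A real encoder can cut within a plane as well; truncating at plane boundaries and the function name are simplifications for illustration. The values are the absolute values from the bit-plane coding example.

```python
def truncate_planes(values, num_planes, planes_sent):
    """Keep only the `planes_sent` most significant bit planes of each value."""
    shift = num_planes - planes_sent
    result = []
    for v in values:
        magnitude = (abs(v) >> shift) << shift   # zero the dropped low-order bits
        result.append(magnitude if v >= 0 else -magnitude)
    return result


values = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0]
for sent in range(5):
    print(sent, "planes:", truncate_planes(values, 4, sent))
```

Each additional plane halves the maximum error per value, so quality grows in small steps with the sending rate.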
Fine Grained Scalability
  Bit planes of the enhancement layer
    MSB    (0,1)
    MSB-1  (2,1)
    MSB-2  (0,0)(1,0)(2,0)(1,0)(0,0)(2,1)
    MSB-3  (5,0)(8,1)
    …
  [Figure: quality over sending rate. Labels: best possible quality at possible sending rate; goal of FGS]
Fine Grained Scalability
  [Figure: encoder block diagram. Input: raw video. Blocks: DCT, Q, VLC for the base layer; IQ, difference (-) and sum (+) nodes, Q, BC for the enhancement layer]
  DCT – discrete cosine transformation
  Q – quantization
  IQ – inverse quantization
  VLC – variable length coding
  BC – bitplane coding
Fine Grained Scalability
  [Figure: encoder block diagram with motion estimation. Input: raw video. Blocks: DCT, Q, IQ, IDCT, motion estimation, VLC, BC, difference (-) and sum (+) nodes. Outputs: motion vectors, base layer, enhancement layer]
Multiple Description Coding
  Idea
    Encode data in two streams
    Each stream has acceptable quality
    Both streams combined have good quality
    The redundancy between both streams is low
  Problem
    The same relevant information must exist in both streams
    Old problem: started for audio coding in telephony
    Currently a hot topic
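One simple construction that illustrates the idea is odd/even sample splitting. This is only one of many multiple description schemes and is not necessarily the one the slide has in mind; the sample values and the concealment by repetition are assumptions.

```python
def split_descriptions(samples):
    """Two descriptions: the even-indexed and the odd-indexed samples."""
    return samples[0::2], samples[1::2]


def reconstruct(even=None, odd=None):
    """Merge both descriptions, or conceal the missing one by repetition."""
    if even is not None and odd is not None:
        return [even[i // 2] if i % 2 == 0 else odd[i // 2]
                for i in range(len(even) + len(odd))]
    received = even if even is not None else odd
    # Only one stream arrived: repeat each sample to fill the gaps.
    return [s for s in received for _ in range(2)]


samples = [10, 12, 14, 16, 18, 20]       # hypothetical audio samples
even, odd = split_descriptions(samples)
print(reconstruct(even, odd))            # both streams: exact
print(reconstruct(even=even))            # one stream: coarser approximation
```

Neighbouring samples are correlated, which is why a single stream still gives acceptable quality. That same correlation is the redundancy between the streams that the slide lists as the problem.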
Multimedia File Formats
Overview
  File formats
    Define the storage of media data on disks
    Specify synchronization
    Specify timing
    Contain metadata
  They allow
    Interchange of data without interpretation
      Copying
      Platform independence
    Management
    Editing
    Retrieval for presentation
  Needed for all asynchronous applications
File Format Examples
  Streaming format
    File format and wire format are identical
    MPEG-1, DVI
  Streamable format
    File format specifies wire format(s)
    MPEG-4, Quicktime, Windows Media, Real Video
RTP Recorder Solution
  Pragmatic generic solution
    Stores and sends all MBone sessions
    No interpretation of data
    Interpretation of network timestamps
    Derivation of synchronization information
  [Figure: senders, MBone VCR, receivers; data files and index files]
Stored Motion JPEG
  Motion JPEG Chunk File Format (UC Berkeley)
    Specifies entire clip's length in s+ns
    Contains sequence of images
    Each image in Independent JPEG Group's JFIF format
  AVI MJPEG DIB (Microsoft)
    Supports audio interleaving
    Time-stamped data chunks
    One frame per AVI RIFF data chunk
    Hack for file size > 1 GB
  Quicktime (Apple)
    Dedicated tracks for interleaving and timing
    One frame per field
    Several fields per sample
    Formats A: full JFIF images, B: QT headers and data only
Quicktime File Format
  Run-time choice of tracks
    Availability of codecs
    Bandwidth
    Language
  [Figure: structure of a Quicktime file. Labels: default rate, duration, copyright, author; track, track choice, edit list; media; sample; location, size, order, <prop>]
MPEG-4 File Format
  [Figure: layouts of MP4 files]
  MP4 file
    moov: 1st OD, trak (BIFS), trak (OD), trak (video), trak (audio), other atoms...
    mdat: interleaved, time-ordered BIFS, OD, video, and audio access units
  MP4 file and media file (moov and several mdat atoms)
    mdat: video and audio access units; some units may be unordered, some units may be unused
    mdat: BIFS access units; some units may be unordered, some units may be unused
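The atom structure named above (moov, mdat) can be walked with a few lines of Python. This sketch relies on the general atom header layout: a 32-bit big-endian size followed by a four-character type, with size 1 announcing a 64-bit size and size 0 meaning "to the end of the data". The sample bytes are made up, and nested atoms such as trak inside moov are not descended into.

```python
import struct


def iter_atoms(data):
    """Yield (type, payload) for each top-level atom in a bytes object."""
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:                        # 64-bit size follows the type
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:                      # atom extends to the end of the data
            size = len(data) - offset
        if size < header:
            break                            # malformed input: stop instead of looping
        yield kind.decode("ascii", "replace"), data[offset + header:offset + size]
        offset += size


# Made-up example: a 12-byte moov atom followed by an empty mdat atom.
sample = struct.pack(">I4s", 12, b"moov") + b"abcd" + struct.pack(">I4s", 8, b"mdat")
for kind, payload in iter_atoms(sample):
    print(kind, len(payload))
```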
Other File Formats
  Real Video
    Not published
    Supports various codecs
    Supports various encoding formats per file
    Supports dynamic selection
    Supports dynamic scaling ("stream thinning")
  AVI
    AVI is published
    Uses Resource Interchange File Format (RIFF)
    Supports various codecs
  ASF / Windows Media File Format
    Submitted as MPEG-4 proposal (but refused)
    ASF files can include Windows binary code
    ASF is patented in the USA
Summary
  Storage and distribution system must support:
    Discrete media such as text and graphics
    Continuous media such as audio and video
    Interrelated multiplexed media
  Encoding format and file format must be distinguished
    Separation of file format and wire format
    Streamable files vs. streaming format
  Trend towards
    Formats that define presentation environments
    Interaction of encoding format and application
    Interaction of client and server
  Influence on distribution systems?