9. video coding (vc-1)

Post on 11-Nov-2014


Video coding (Part 5): Microsoft Windows Media and VC-1

Yi-Shin Tung, National Taiwan University (NTU)

Outline

Windows Media family and its evolution
WMV applications
Video coding tools
Comparison with MPEG-2, H.264/AVC
Performance evaluations
Conclusions

Goal and applications

Focus on streaming compressed audio and video over the Internet to personal computers.
The vision is to move forward and enable effective delivery of digital media through any network to any device.
Applications include:
– Internet-based applications such as Web broadcast and VOD.
– Consumer electronics such as DVD, car audio and mobile phones.
– Terrestrial and satellite broadcast (DVB-T and DVB-S).

WM end-to-end delivery

[Figure: end-to-end delivery chain; labels: Windows SDK, WM porting kit]

Windows Media Codecs

Audio codec
– Windows Media Audio 9 (mono/stereo, 8kHz~48kHz, 5kbps~320kbps, CD quality at 48~128kbps)
– Windows Media Audio 9 Professional (5.1 or 7.1 ch, up to 96kHz, up to 24 bits/sample, 128kbps~)
– Windows Media Audio 9 Lossless (2:1 ratio for stereo)
– Windows Media Audio 9 Voice (mono, 4kbps~20kbps, hybrid CELP/transform coding)
Video codec
– Windows Media Video 7 and 8 (non-standard version of MPEG-4)
– Windows Media Video 9 (VC-9, VC-1) (160x120@10kbps, BT.601@2Mbps, 720p@4~6Mbps, 1080i@6~20Mbps)
– Windows Media Video 9 Screen (generally 28kbps, 100kbps for images)
– Windows Media Video 9 Image (slide show and transitions)

Encoding operational modes

One-pass CBR (live encoding and transmission)
Two-pass CBR (offline encoding for on-demand streaming)
One-pass VBR (live capture)
Two-pass VBR (download-and-play applications)
Peak-constrained VBR (constrained reading speed)
– Average/max/min bitrates are specified.
Multiple bitrate encoding (MBR)

WMV status

HD movies were commercially released in 2003.
WMV-9 has been under consideration by the SMPTE C-24 group since Sep 2003, to become VC-1. Promoted to CD, March 2004.
– Previously named “Proposed SMPTE Standard for Television: VC-9 Compressed Video Bitstream Format and Decoding Process”
VC-1 becomes a mandatory codec for two major HD video formats
– HD-DVD: Microsoft on every DVD, Feb 2004 (http://news.com.com/2100-1041_3-5166786.html?tag=nefd_top)
– BD-DVD (Blu-ray): H.264 and VC-1 added to the Blu-ray standard (http://www.digitmag.co.uk/news/index.cfm?NewsID=4382)
MPEG-LA announces plan for joint VC-1 license
– Call for essential patents is the first step (http://www.mpegla.com/pid/vc9/)

Decoding process block diagram

[Figure: decoder block diagram. Conforming implementation: bit-stream parsing; Inv. VLC, Inv. Quant, Inv. Transf; Pred, Inv. VLC; motion compensation (4MV, sub-pel interpolation); intensity compensation and range re-mapping; overlap smoothing and loop filter; decoded frame buffer (1-frame delay). Implementation-specific out-of-loop processing: post-filtering, color conversion, re-sizing.]

Same structure

Internal color format is 8-bit 4:2:0.
Block-based motion compensation and spatial transform.
I/P/B picture definitions are similar to those of MPEG-4 (not those of H.264).

Design criteria

Design metrics
– Rate-distortion curve
– Visual feedback from cinema testing
– Drift-free design for bit-exact reconstruction
– Computational complexity vs. coding gain
Floating-point arithmetic is ruled out.
A 16-bit word size is preferred.
Conditional statements should be minimized.
Guideline: any inefficiency in signal processing operations tends to have a big impact on R-D performance at high rates, whereas any inefficiency in entropy coding has more impact at the low-rate end of the R-D plot.
– Signal processing ops: motion compensation, transform, loop filtering.
– Entropy coding: zigzag scanning, motion vector prediction.

Salient innovations of WMV-9

Adaptive block size transform
Limited precision transform set
Adaptive motion compensation
Adaptive quantization
Advanced entropy coding
Loop filtering
Advanced B frame coding
Interlace coding
Overlap smoothing
Low-rate tools
Fading compensation

Adaptive block size transform

Large transform vs. small transform
– Pros of a large transform: good at capturing trends and periodicities
– Cons of a large transform: spreading of local transients, ringing effects
Trends and textures are better preserved by a large transform, while areas of discontinuity are better handled by a small transform.
A block can be coded with one 8x8, two 8x4, two 4x8 or four 4x4 transforms, so the encoder can use the size best suited to the underlying data.
The transform type can be signaled at the frame, macroblock or block level.
Intra blocks always use the 8x8 transform.

Adaptive block size transform (cont’d)

A large transform is able to retain texture information.
Although the R-D gain is not huge, it provides major subjective quality benefits, especially for subtle texture, film details and grain noise.
The H.264 High profile added an adaptive transform in recognition of this benefit.

16-bit transform

Design constraints
– A full 16-bit operation, where both sums and products of two 16-bit values produce results within 16 bits.
– Forward and inverse transforms form an orthogonal pair: V×U = diag(D).
– The transform approximates a DCT.
– Norms of basis functions within one transform type are identical.
– Norms of basis functions between transform types are identical.
The 8x8 inverse transform places the tightest constraint.
WMV-9 relaxes the last two constraints. The norms are in the ratio 288:289:292 (about 1% difference). This is compensated during the encoding process.
Inverse transform order: row inverse transform => rounding => column inverse transform => rounding
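The orthogonality and norm-ratio claims can be checked numerically. The sketch below is only an illustration: the two integer matrices are not listed on the slides and are taken here, as an assumption, from the commonly published description of the VC-1 transform. The helper names are hypothetical.

```python
# Integer transform matrices as commonly published for VC-1.
# ASSUMPTION: the slides do not list them; they are supplied for illustration.
T8 = [
    [12, 12, 12, 12, 12, 12, 12, 12],
    [16, 15, 9, 4, -4, -9, -15, -16],
    [16, 6, -6, -16, -16, -6, 6, 16],
    [15, -4, -16, -9, 9, 16, 4, -15],
    [12, -12, -12, 12, 12, -12, -12, 12],
    [9, -16, 4, 15, -15, -4, 16, -9],
    [6, -16, 16, -6, -6, 16, -16, 6],
    [4, -9, 15, -16, 16, -15, 9, -4],
]
T4 = [
    [17, 17, 17, 17],
    [22, 10, -10, -22],
    [17, -17, -17, 17],
    [10, -22, 22, -10],
]


def is_orthogonal(t):
    """True if every pair of distinct rows (basis functions) has zero dot product."""
    n = len(t)
    return all(
        sum(a * b for a, b in zip(t[i], t[j])) == 0
        for i in range(n)
        for j in range(n)
        if i != j
    )


def squared_norms(t):
    """Squared norm of each basis function."""
    return [sum(v * v for v in row) for row in t]


if __name__ == "__main__":
    print(is_orthogonal(T8), is_orthogonal(T4))
    # Distinct squared norms, divided by the common factor 4.
    print(sorted({n // 4 for n in squared_norms(T8) + squared_norms(T4)}))
```

With these matrices the basis functions are exactly orthogonal, but their squared norms take three slightly different values, which is the relaxation of the last two constraints mentioned above.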

Motion compensation

8x8 or 16x16 prediction
Motion vectors with up to ¼-pel resolution are used.
An adaptive motion mode, derived from 3 criteria (MV resolution, block size, filter type), is signaled at the frame level.
– Mixed block size (16x16 and 8x8), ¼-pel, bicubic [high bitrate]
– 16x16, ¼-pel, bicubic
– 16x16, ½-pel, bicubic
– 16x16, ½-pel, bilinear [low bitrate]

Bicubic filtering

Direct filtering approach, where the 4-tap filters are
– (-1*P1 + 9*P2 + 9*P3 - 1*P4 + 8 - r) >> 4
– (-4*P1 + 53*P2 + 18*P3 - 3*P4 + 32 - r) >> 6
– (-3*P1 + 18*P2 + 53*P3 - 4*P4 + 32 - r) >> 6
¼-pel bilinear filtering is applied to chrominance components. ½-pel bilinear is optional for low-complexity applications.
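The three 4-tap filters can be tried out on sample values with a few lines of code. This is a minimal sketch, not the normative process: the mapping of the three formulas to the ½-, ¼- and ¾-pel positions is inferred from the symmetry of the coefficients, r is treated as a small rounding term supplied by the caller, and the final clipping to 8 bits is an assumption.

```python
def clip8(value):
    """Clamp to the 8-bit sample range (assumed, not stated on the slide)."""
    return max(0, min(255, value))


def bicubic_sample(p1, p2, p3, p4, frac, r=0):
    """Interpolate between P2 and P3 from four neighbouring samples.

    frac is the offset from P2 in quarter-pel units (1, 2 or 3).
    The assignment of formulas to offsets is inferred from coefficient symmetry.
    """
    if frac == 2:  # half-pel position
        value = (-1 * p1 + 9 * p2 + 9 * p3 - 1 * p4 + 8 - r) >> 4
    elif frac == 1:  # quarter-pel position, closer to P2
        value = (-4 * p1 + 53 * p2 + 18 * p3 - 3 * p4 + 32 - r) >> 6
    elif frac == 3:  # three-quarter-pel position, closer to P3
        value = (-3 * p1 + 18 * p2 + 53 * p3 - 4 * p4 + 32 - r) >> 6
    else:
        raise ValueError("frac must be 1, 2 or 3")
    return clip8(value)


if __name__ == "__main__":
    # A linear ramp 10, 20, 30, 40: results lie between P2 = 20 and P3 = 30.
    print([bicubic_sample(10, 20, 30, 40, f) for f in (1, 2, 3)])
```

Each filter's coefficients sum to the divisor implied by the shift (16 or 64), so a flat area is reproduced unchanged.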

[Figure: sub-pel sample positions around the integer locations, labeled Case 1 to Case 8]

Adaptive quantization

The same quantization rule applies to the coefficients of all 4 transform types.
Two quantization modes, selected for each frame:
– Dead-zone: suitable for low bitrates, reconstruction levels {-kQ-D, 0, kQ+D}
– Regular uniform quantization: for high bitrates, reconstruction levels {kQ}
– The mode changes adaptively according to the running QP
On the encoder side, a dead zone is always used.

[Figure: dead-zone vs. regular uniform quantizer; axis labels 3/2×QP and 5/2×QP]
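The two reconstruction rules, {kQ} and {-kQ-D, 0, kQ+D}, can be written as one small function. The sketch below is illustrative only: the value of D is not given on the slide and is left as a parameter, and the encoder rule (truncation toward zero) is just one simple way to obtain a dead zone, not necessarily the one WMV-9 encoders use.

```python
def quantize(coef, q):
    """Hypothetical encoder rule: truncate toward zero, leaving a dead zone around 0."""
    level = abs(coef) // q
    return level if coef >= 0 else -level


def reconstruct(level, q, d=0):
    """Decoder reconstruction.

    d == 0 gives the regular uniform levels {kQ};
    d > 0 gives the dead-zone levels {-kQ-D, 0, kQ+D}.
    """
    if level == 0:
        return 0
    magnitude = abs(level) * q + d
    return magnitude if level > 0 else -magnitude


if __name__ == "__main__":
    q = 4
    for coef in (-9, -3, 0, 3, 7):
        k = quantize(coef, q)
        print(coef, k, reconstruct(k, q), reconstruct(k, q, d=2))
```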

Entropy coding: context-adaptive multiple VLCs

In WMV-9, up to 8 tables (coding sets) are available for coding each symbol, and one of them is selected for each frame. E.g., there are 8 transform AC coefficient tables. This differs from H.264, in which symbols are coded adaptively with several tables of different symbol distributions (e.g., the run_before tables indexed by zerosLeft).

Coding set correspondence for PQINDEX <= 8

Y blocks                      Cb and Cr blocks
Index   Table                 Index   Table
0       High Rate Intra       0       High Rate Inter
1       High Motion Intra     1       High Motion Inter
2       Mid Rate Intra        2       Mid Rate Inter

Coding set correspondence for PQINDEX > 8

Y blocks                      Cb and Cr blocks
Index   Table                 Index   Table
0       High Rate Intra       0       High Rate Inter
1       High Motion Intra     1       High Motion Inter
2       Mid Rate Intra        2       Mid Rate Inter

H.264 run_before codes by zerosLeft

run_before   zerosLeft: 1    2     3     4     5     6     >6
0                       1    1     11    11    11    11    111
1                       0    01    10    10    10    000   110
2                       -    00    01    01    011   001   101
3                       -    -     00    001   010   011   100
4                       -    -     -     000   001   010   011
5                       -    -     -     -     000   101   010
6                       -    -     -     -     -     100   001
…                       -    -     -     -     -     -     …
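Per-frame selection among several coding sets can be pictured as the encoder costing the frame's symbols against each table and signaling the index of the cheapest one. The sketch below assumes exactly that selection rule, which the slides do not spell out, and uses two made-up code-length tables over a three-symbol alphabet.

```python
def bits_for_frame(symbols, code_lengths):
    """Total bits needed to code the symbols with one table of code lengths."""
    return sum(code_lengths[s] for s in symbols)


def pick_coding_set(symbols, coding_sets):
    """Return (index, bits) of the table that codes this frame most cheaply."""
    costs = [bits_for_frame(symbols, table) for table in coding_sets]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]


if __name__ == "__main__":
    # HYPOTHETICAL code-length tables, each matched to a different symbol distribution.
    coding_sets = [
        {"a": 1, "b": 2, "c": 2},  # favours frames dominated by "a"
        {"a": 2, "b": 2, "c": 1},  # favours frames dominated by "c"
    ]
    print(pick_coding_set(list("ccacb"), coding_sets))
    print(pick_coding_set(list("aaabc"), coding_sets))
```

The table index costs a few bits per frame, while the saving accumulates over every symbol in the frame.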

Entropy coding: bitplane coding

Some symbols are spatially correlated, e.g. the MB type. Bitplane coding encodes such symbols efficiently by taking advantage of the spatial dependency of their bits.
7 modes: Raw, RowSkip, ColSkip, Norm-2, Norm-6, Diff-2 and Diff-6

[Figure: MB types (skip, intra, inter) of a P-VOP collected into bitplanes; panels: Norm-2/Diff-2, Norm-6/Diff-6, Row-skip/Col-skip]
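Of the seven modes, row-skip is the easiest to sketch: each row of the bitplane gets one flag, and only rows containing at least one set bit are sent in raw form. The code below follows that reading of the mode; the exact bitstream syntax is not given on the slides, so the function names and the bit layout are assumptions.

```python
def rowskip_encode(plane):
    """Encode a 2-D list of 0/1 values: flag 0 for an all-zero row, else flag 1 plus the raw row."""
    bits = []
    for row in plane:
        if any(row):
            bits.append(1)
            bits.extend(row)
        else:
            bits.append(0)
    return bits


def rowskip_decode(bits, rows, cols):
    """Inverse of rowskip_encode for a plane of known size."""
    plane, pos = [], 0
    for _ in range(rows):
        flag = bits[pos]
        pos += 1
        if flag:
            plane.append(list(bits[pos:pos + cols]))
            pos += cols
        else:
            plane.append([0] * cols)
    return plane


if __name__ == "__main__":
    # One bit per macroblock, e.g. 1 = skipped MB; most rows are empty.
    plane = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
    coded = rowskip_encode(plane)
    print(len(coded), "bits instead of", sum(len(r) for r in plane))
    assert rowskip_decode(coded, 3, 4) == plane
```

Col-skip is the same idea applied to columns; the gain depends on how many rows or columns are entirely zero.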

Loop filtering

Independent block coding leads to
– Visible “blocky” artifacts
– Reduced quality of reference frames
An in-loop deblocking filter is used, as in H.264.
Filtering is applied to every 4th, 8th, 12th, etc. pixel row or column, depending on the transform type.
Adaptive filtering rule
– A shortcut saves computation.
Filtering energy is smaller than that of H.264.

[Figure: Shortcut]

Interlace coding

Field picture coding mode
– Intra MBs are coded as in the progressive case.
– Inter MBs may be predicted by either one 16x16 MV or four 8x8 MVs, where each MV can refer to either of the two previously encoded fields.

Interlace coding (cont’d)

Frame picture coding mode
– Intra MBs may be coded with frame DCT or field DCT.
– Inter MBs may use frame prediction (1 or 4 MVs) or field prediction (2 or 4 MVs), in addition to the DCT types.

Advanced B-frame coding

Explicit coding of the B frame’s temporal position relative to its two reference frames (variable velocity model).
Intra-coded B frames.
Improved MV coding efficiency.
The bottom B-field is allowed to refer to the top B-field.

Overlap smoothing

Another technique to reduce blocking artifacts in intra areas.
Drawbacks of deblocking filtering
– It is purely a decoder process, which operates equally on both block-aligned true edges and apparent block edges.
– It is usually disabled in the less complex profiles.
The lapped transform is another way to remove blocking effects.
A spatial-domain approach implements the lapped transform as pre- and post-processing.
Adaptive application rule: applied at lower bitrates; it can also be switched on or off on an MB basis.

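$$
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{pmatrix}
=
\left(
\begin{pmatrix}
7 & 0 & 0 & 1 \\
-1 & 7 & 1 & 1 \\
1 & 1 & 7 & -1 \\
1 & 0 & 0 & 7
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}
+
\begin{pmatrix} r_0 \\ r_1 \\ r_0 \\ r_1 \end{pmatrix}
\right) \gg 3
$$

[Figure: pixels across the block edge, labeled a0 a1 b1 b0 and p0 p1 q1 q0]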
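The overlap smoothing filter on this slide is a 4x4 integer matrix applied to the four pixels straddling a block edge, followed by a rounding term and a right shift by 3. A minimal sketch of that operation follows. The rounding values r0 = 4 and r1 = 3 are an assumption (the slide only names them r0 and r1), and no clipping is applied.

```python
# Rows of the overlap smoothing matrix: [7 0 0 1; -1 7 1 1; 1 1 7 -1; 1 0 0 7].
OVERLAP = [
    [7, 0, 0, 1],
    [-1, 7, 1, 1],
    [1, 1, 7, -1],
    [1, 0, 0, 7],
]


def overlap_smooth(x, r0=4, r1=3):
    """Filter four samples x0..x3 across a block edge (x1 | x2).

    r0 and r1 are the rounding terms; the default values are ASSUMED.
    """
    rounding = [r0, r1, r0, r1]
    return [
        (sum(c * v for c, v in zip(row, x)) + rounding[i]) >> 3
        for i, row in enumerate(OVERLAP)
    ]


if __name__ == "__main__":
    print(overlap_smooth([50, 50, 50, 50]))  # flat area is unchanged
    print(overlap_smooth([0, 0, 80, 80]))    # a step across the edge is softened
```

Every row of the matrix sums to 8, which is why the shift by 3 preserves flat areas.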

Low-rate tools (rate control tools)

Dynamic range reduction (intensity resolution)
– Luminance and chrominance values may be scaled down by a factor of 2 before coding.
Dynamic frame resizing (spatial resolution)
– The coded frame size may be halved vertically, horizontally or both to further reduce the rate cost and meet the constant-bitrate requirement.

[Figure: original frame, intensity range reduction, frame re-sizing]
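Dynamic range reduction can be illustrated on single 8-bit samples. The sketch below scales values by a factor of 2 around the mid-level 128 before coding and expands them again after decoding. Centring on 128 and the exact rounding are assumptions; the slide states only the factor of 2.

```python
def range_reduce(sample):
    """Encoder side: halve the distance from mid-level 128 (ASSUMED centring)."""
    return ((sample - 128) >> 1) + 128


def range_expand(sample):
    """Decoder side: double the distance from 128 and clamp to 8 bits."""
    return max(0, min(255, (sample - 128) * 2 + 128))


if __name__ == "__main__":
    for value in (0, 1, 128, 200, 255):
        reduced = range_reduce(value)
        print(value, reduced, range_expand(reduced))
```

The reduced signal occupies only half of the 8-bit range, and the round trip loses at most the least significant bit of each sample.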

Fading compensation

Effective with global illumination changes
– Natural illumination changes
– Artificial transition effects, such as fade-to-black, fade-from-black, dissolves, blending, cross-fades and morphing.
The encoder detects fading prior to motion compensation by comparing an error measure with a threshold.
Encoder and decoder use the quantized fading parameters of a linear first-order function to transform the original reference frame into a new reference frame.
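The linear first-order remapping of the reference frame can be sketched as new = scale × old + offset, applied to every pixel with integer arithmetic. The parameter format below (scale in units of 1/64, integer offset) is a hypothetical quantization chosen for illustration, not the bitstream syntax.

```python
def fade_reference(frame, scale_64, offset):
    """Build a new reference frame with new = clip(scale * old + offset).

    frame is a 2-D list of 8-bit samples; scale_64 is the scale in 1/64 units
    (HYPOTHETICAL quantization); offset is an integer brightness shift.
    """
    def remap(p):
        value = ((scale_64 * p + 32) >> 6) + offset  # +32 rounds before the shift
        return max(0, min(255, value))

    return [[remap(p) for p in row] for row in frame]


if __name__ == "__main__":
    reference = [[0, 64, 128, 255]]
    print(fade_reference(reference, 32, 0))   # fade halfway to black
    print(fade_reference(reference, 64, 10))  # uniform brightness increase
```

Motion compensation then predicts from the remapped reference, so a global fade no longer shows up as a large residual in every block.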

Video smoothing

Interpolates missing frames after decoding; also referred to as frame interpolation.
Uses an advanced optical flow estimation technique (on a per-pixel basis), along with warping, to synthesize new frames.
Needs a 733 MHz CPU to interpolate a 320x240 video clip from 10 to 30 fps.
J. Ribas-Corbera and J. Sklansky, “Interframe interpolation of cinematic sequences,” Journal of VCIR, Dec 1993.

Profiles and levels

Simple profile
Main profile
Advanced profile

                                  MPEG-2 Video             SMPTE VC-9                                                 H.264/AVC

Prediction coding
  Motion res. & interpolation     ½ bilinear               Adaptive: ½ bilinear + ½ 4-tap FIR + ¼ 4-tap FIR/direct    ¼ 6-tap FIR/cascaded
  Motion block size               16x16                    16x16, 8x8                                                 16x16, 16x8, …, 4x4
  Brightness change               N/A                      Intensity compensation (P/B)                               Weighted prediction (B)
  Intra prediction                Freq-domain pred. (DC)   Freq-domain pred. (DC/AC)                                  Spatial-domain prediction
  Ref. frame num (P/B)            1/2                      1/2                                                        M/M
  Generalised B                   N/A                      N/A                                                        Y
  Inter-intra mixed               N/A                      Y                                                          N/A

Transform coding, entropy coding & postprocessing
  Transform size & type           8x8 float                8x8, 8x4, 4x8, 4x4 integer                                 4x4 integer (only +, >>)
  Quantization                    uniform                  Adaptive uniform and non-uniform                           log scale
  CA multiple VLCs                N/A                      Y                                                          Y
  Bitplane coding                 N/A                      Y                                                          N/A
  Arithmetic coding               N/A                      N/A                                                        Y (Main profile)
  Post-processing                 Optional                 In-the-loop deblocking, overlapped transform               In-the-loop deblocking

Rate control
  Dynamic frame resizing,         N/A                      Y                                                          N/A
  dynamic range reduction

Streaming & error resilience
  Data partitioning               N/A                      N/A                                                        Slice level partitioning
  Bitstream switching             N/A                      System level                                               SI/SP frames

Comparison among HD-DVD video candidates

WMV vs. MPEG-2

WMV vs. MPEG-4 SP

WMV vs. H.264

[Figure: Glasgow_qcif_15fps, PSNR-Y (29 to 42) vs. bitrate (0 to 600 Kbps); curves: WMV9, H264-1ref]

Conclusions

Software and hardware components can be developed based on SDKs or WM hardware porting kits.
WM 9 provides a variety of state-of-the-art audio and video codecs for different applications.
The quality of WMV-9 is competitive with H.264/AVC and arguably superior based on several independent tests, with significantly lower computational complexity.
This paper explains why some of the tools unique to WMV-9 provide an intrinsic quality benefit over H.264/AVC.

Reading assignment

Mandatory
– Sridhar Srinivasan et al., Windows Digital Media Division, Microsoft Corporation, “Windows Media Video 9: overview and applications,” Signal Processing: Image Communication, Oct 2004.

Homework

7. A composite symbol represents different properties of one MB and exploits their joint occurrence probability. Bitplane coding collects the same symbol from all MBs and removes the correlation between them. Can you think of a way to take advantage of both simultaneously?
