9. video coding (vc-1)
Post on 11-Nov-2014
Video coding (Part 5): Microsoft Windows Media and VC-1
Yi-Shin Tung, National Taiwan University (NTU)
Outline
Windows Media family and its evolution
WMV applications
Video coding tools
Comparison with MPEG-2, H.264/AVC
Performance evaluations
Conclusions
Goal and applications
Focus on streaming compressed audio and video over the Internet to personal computers.
The vision is to move forward and enable the effective delivery of digital media through any network to any device.
Applications include:
– Internet-based applications such as Web broadcast and VOD.
– Consumer electronics such as DVD, car audio and mobile phones.
– Terrestrial and satellite broadcast (DVB-T and DVB-S).
WM end-2-end delivery
[Figure: end-to-end delivery chain; labels: Windows SDK, WM porting kit.]
Windows Media Codecs
Audio codec
– Windows Media Audio 9 (mono/stereo, 8kHz~48kHz, 5kbps~320kbps, CD quality at 48~128kbps)
– Windows Media Audio 9 Professional (5.1 or 7.1 ch, up to 96kHz, up to 24 bits/sample, 128kbps~)
– Windows Media Audio 9 Lossless (2:1 ratio for stereo)
– Windows Media Audio 9 Voice (mono, 4kbps~20kbps, hybrid CELP/transform coding)
Video codec
– Windows Media Video 7 and 8 (non-standard version of MPEG-4)
– Windows Media Video 9 (VC-9, VC-1) (160x120@10kbps, BT.601@2Mbps, 720p@4~6Mbps, 1080i@6~20Mbps)
– Windows Media Video 9 Screen (generally 28kbps, 100kbps for images)
– Windows Media Video 9 Image (slide show and transitions)
Encoding operational modes
One-pass CBR (live encoding and transmission)
Two-pass CBR (offline encoding for on-demand streaming)
One-pass VBR (live capture)
Two-pass VBR (download-and-play applications)
Peak-constrained VBR (constrained reading speed)
Avg/max/min bitrates are specified.
Multiple bitrate encoding (MBR)
WMV status
HD movies were commercially released in 2003.
WMV-9 is under consideration by SMPTE (C-24 group, Sep 2003) to become VC-1. It was promoted to CD in March 2004.
– Previously named “Proposed SMPTE Standard for Television: VC-9 Compressed Video Bitstream Format and Decoding Process”
VC-1 has become a mandatory codec for two major HD video formats:
– HD-DVD: Microsoft on every DVD, Feb 2004 (http://news.com.com/2100-1041_3-5166786.html?tag=nefd_top)
– BD (Blu-ray): H.264 and VC-1 added to the Blu-ray standard (http://www.digitmag.co.uk/news/index.cfm?NewsID=4382)
MPEG-LA announces a plan for a joint VC-1 license
– A call for essential patents is the first step (http://www.mpegla.com/pid/vc9/)
Decoding process block diagram
[Figure: decoder block diagram. Conforming implementation: bit-stream parsing; inverse VLC, inverse quantization and inverse transform; prediction; motion compensation (sub-pel interpolation, 4MV); intensity compensation and range re-mapping; overlap smoothing and loop filter; decoded frame buffer (1-frame delay). Implementation-specific out-of-loop processing: post-filtering, color conversion, re-sizing.]
Same structure
Internal color format is 8-bit 4:2:0.
Block-based motion compensation and spatial transform.
I/P/B definitions are similar to those of MPEG-4 (not H.264).
Design criteria
Design metrics
– Rate-distortion curve
– Visual feedback from cinema testing
– Drift-free design for bit-exact reconstruction
– Computational complexity vs. coding gain
Floating-point (FP) arithmetic is ruled out.
A 16-bit word size is preferred.
Conditional statements should be minimized.
Guideline: Any inefficiency in signal processing operations tends to have a big impact on R-D performance at high rates, whereas any inefficiency in entropy coding has more impact at the low-rate end of the R-D plot.
– Signal processing ops: motion compensation, transform, loop filtering.
– Entropy coding: zigzag scanning, motion vector prediction.
Salient innovations of WMV-9
Adaptive block size transform
Limited precision transform set
Adaptive motion compensation
Adaptive quantization
Advanced entropy coding
Loop filtering
Advanced B frame coding
Interlace coding
Overlap smoothing
Low-rate tools
Fading compensation
Adaptive block size transform
Large transform vs. small transform
– Pros: good at capturing trends and periodicities
– Cons: spreading effects due to local transients, ringing effects
Trends and textures are better preserved by a large transform, while areas of discontinuity are better handled by a small transform.
A block can be coded with one 8x8, two 8x4, two 4x8 or four 4x4 transforms, so the size best suited to the underlying data can be used.
Transform type can be signaled at the frame, macroblock or block level.
Intra blocks always use the 8x8 transform.
Adaptive block size transform (cont’d)
A large transform is able to retain texture information.
Although the R-D gain is not huge, it provides major subjective quality benefits, especially for subtle texture, film details and grain noise.
The H.264 High profile added an adaptive transform in acknowledgment of this benefit.
16-bit transform
Design constraints
– A full 16-bit operation, where both sums and products of two 16-bit values produce results within 16 bits.
– Forward and inverse transforms form an orthogonal pair: V×U = diag(D).
– The transform approximates a DCT.
– Norms of basis functions within one transform type are identical.
– Norms of basis functions between transform types are identical.
The 8x8 inverse transform places the tightest constraint.
WMV-9 relaxes the last two constraints. The norms are in the ratio 288:289:292 (about 1% difference). This is compensated during the encoding process.
Row inverse transform => rounding => column inverse transform => rounding
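The norm ratio quoted above can be checked numerically. The sketch below assumes the 8x8 integer matrix commonly given for VC-1; the coefficients are not reproduced on these slides, so treat them as an assumption. It verifies that the rows (basis functions) are mutually orthogonal and that their squared norms fall into the ratio 288:289:292.

```python
# 8x8 integer transform matrix as commonly given for VC-1.
# Assumption: the coefficients are NOT taken from the slides.
T8 = [
    [12,  12,  12,  12,  12,  12,  12,  12],
    [16,  15,   9,   4,  -4,  -9, -15, -16],
    [16,   6,  -6, -16, -16,  -6,   6,  16],
    [15,  -4, -16,  -9,   9,  16,   4, -15],
    [12, -12, -12,  12,  12, -12, -12,  12],
    [ 9, -16,   4,  15, -15,  -4,  16,  -9],
    [ 6, -16,  16,  -6,  -6,  16, -16,   6],
    [ 4,  -9,  15, -16,  16, -15,   9,  -4],
]


def dot(a, b):
    """Integer dot product of two equally long rows."""
    return sum(x * y for x, y in zip(a, b))


def squared_norms(matrix):
    """Squared Euclidean norm of every basis function (row)."""
    return [dot(row, row) for row in matrix]


def is_orthogonal(matrix):
    """True if every pair of distinct rows has a zero dot product."""
    n = len(matrix)
    return all(
        dot(matrix[i], matrix[j]) == 0
        for i in range(n)
        for j in range(n)
        if i != j
    )


if __name__ == "__main__":
    norms = squared_norms(T8)
    print(norms)                             # three distinct values
    print(sorted({n // 4 for n in norms}))   # the ratio terms
    print(is_orthogonal(T8))
```

Because the norms are not identical, a decoder using this matrix cannot apply one common scale factor to all coefficients, which is why the mismatch has to be absorbed on the encoder side.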
Motion compensation
8x8 or 16x16 prediction.
Motion vectors with up to ¼-pel resolution are used.
An adaptive motion mode derived from 3 criteria (MV resolution, block size, filtering type) is signaled at the frame level:
– Mixed block size (16x16 and 8x8), ¼-pel, bicubic [high bitrate]
– 16x16, ¼-pel, bicubic
– 16x16, ½-pel, bicubic
– 16x16, ½-pel, bilinear [low bitrate]
Bicubic filtering
Direct filtering approach, where the 4-tap filters are
– (-1*P1 + 9*P2 + 9*P3 - 1*P4 + 8 - r) >> 4
– (-4*P1 + 53*P2 + 18*P3 - 3*P4 + 32 - r) >> 6
– (-3*P1 + 18*P2 + 53*P3 - 4*P4 + 32 - r) >> 6
¼-pel bilinear filtering is applied to chrominance components. ½-pel bilinear is optional for low-complexity applications.
[Figure: sub-pel sample positions relative to the integer locations, Cases 1 to 8.]
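The three 4-tap bicubic filters can be sketched in one dimension as follows. The function names are hypothetical, `r` is treated as a rounding-control value of 0 or 1 (an assumption, since the slides do not define it), and the final clamp to the 8-bit range is also an assumption.

```python
def clip8(value):
    """Clamp to the 8-bit sample range (assumed, not stated on the slide)."""
    return max(0, min(255, value))


def bicubic_half(p1, p2, p3, p4, r=0):
    """Half-pel sample midway between p2 and p3."""
    return clip8((-1 * p1 + 9 * p2 + 9 * p3 - 1 * p4 + 8 - r) >> 4)


def bicubic_quarter(p1, p2, p3, p4, r=0):
    """Quarter-pel sample, one quarter of the way from p2 to p3."""
    return clip8((-4 * p1 + 53 * p2 + 18 * p3 - 3 * p4 + 32 - r) >> 6)


def bicubic_three_quarter(p1, p2, p3, p4, r=0):
    """Quarter-pel sample, three quarters of the way from p2 to p3."""
    return clip8((-3 * p1 + 18 * p2 + 53 * p3 - 4 * p4 + 32 - r) >> 6)


if __name__ == "__main__":
    edge = (0, 0, 255, 255)  # a sharp step between p2 and p3
    print(bicubic_quarter(*edge), bicubic_half(*edge), bicubic_three_quarter(*edge))
```

The taps of each filter sum to 16 or 64, matching the right shifts by 4 and 6, so a flat area passes through unchanged.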
Adaptive quantization
The same quantization rule applies to the coefficients of all 4 transform types.
Two quantization modes, decided per frame:
– Dead-zone, suitable for low bitrates, {-kQ-D, 0, kQ+D}
– Regular uniform quantization, suitable for high bitrates, {kQ}
– The mode can change adaptively according to the running QP
On the encoder side, a dead zone always exists.
[Figure: dead-zone vs. regular uniform quantization; axis labels 3/2×QP and 5/2×QP.]
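The two reconstruction rules, {kQ} and {-kQ-D, 0, kQ+D}, can be written out as a small sketch. The function names and the sample values of Q and D are hypothetical; the slides do not say how D is derived from QP.

```python
def reconstruct_uniform(k, q):
    """Regular uniform quantizer: level k maps to k*Q."""
    return k * q


def reconstruct_dead_zone(k, q, d):
    """Dead-zone quantizer: non-zero levels are pushed outward by D."""
    if k == 0:
        return 0
    sign = 1 if k > 0 else -1
    return sign * (abs(k) * q + d)


if __name__ == "__main__":
    q, d = 4, 2  # illustrative values only
    levels = range(-2, 3)
    print([reconstruct_uniform(k, q) for k in levels])
    print([reconstruct_dead_zone(k, q, d) for k in levels])
```

With the dead-zone rule the gap around zero is wider than the spacing between the other levels, so more small coefficients collapse to zero, which suits low bitrates.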
Entropy coding: Context adaptive multiple VLCs
In WMV9, up to 8 tables (coding sets) are available for coding each symbol, and the table is selected per frame. E.g., there are 8 transform AC coefficient tables. This differs from H.264, where symbols are encoded adaptively using several tables with different symbol distributions.
Coding Set Correspondence for PQINDEX <= 8
Y blocks                    Cb and Cr blocks
Index  Table                Index  Table
0      High Rate Intra      0      High Rate Inter
1      High Motion Intra    1      High Motion Inter
2      Mid Rate Intra       2      Mid Rate Inter

Coding Set Correspondence for PQINDEX > 8
Y blocks                    Cb and Cr blocks
Index  Table                Index  Table
0      High Rate Intra      0      High Rate Inter
1      High Motion Intra    1      High Motion Inter
2      Mid Rate Intra       2      Mid Rate Inter

run_before   zerosLeft
             1    2    3    4    5    6    >6
0            1    1    11   11   11   11   111
1            0    01   10   10   10   000  110
2            -    00   01   01   011  001  101
3            -    -    00   001  010  011  100
4            -    -    -    000  001  010  011
5            -    -    -    -    000  101  010
6            -    -    -    -    -    100  001
…            -    -    -    -    -    -    …
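Per-frame table selection can be illustrated with a toy encoder-side search. Everything in this sketch is hypothetical: the symbol alphabet, the three code-length tables and the exhaustive cost comparison are stand-ins, since the slides state only that one coding set is chosen per frame.

```python
from collections import Counter

# Hypothetical code-length tables (bits per symbol); real coding sets differ.
CODING_SETS = {
    0: {"A": 1, "B": 2, "C": 3, "D": 3},  # skewed toward "A"
    1: {"A": 2, "B": 2, "C": 2, "D": 2},  # flat distribution
    2: {"A": 3, "B": 3, "C": 2, "D": 1},  # skewed toward "D"
}


def frame_cost(symbols, lengths):
    """Total bits needed to code all symbols of a frame with one table."""
    counts = Counter(symbols)
    return sum(count * lengths[sym] for sym, count in counts.items())


def select_coding_set(symbols, coding_sets=CODING_SETS):
    """Index of the cheapest table; ties go to the lower index."""
    return min(
        coding_sets,
        key=lambda idx: (frame_cost(symbols, coding_sets[idx]), idx),
    )


if __name__ == "__main__":
    for frame in ("AAAAAB", "ABCD", "DDDDC"):
        print(frame, select_coding_set(frame))
```

The chosen index would be written once in the frame header, so the signaling overhead is a few bits per frame rather than per block.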
Entropy coding: Bitplane coding
Some symbols are spatially correlated, e.g. MB type. Bitplane coding encodes these symbols efficiently by exploiting the spatial dependency of their bits.
7 modes: Raw, RowSkip, ColSkip, Norm-2, Norm-6, Diff-2 and Diff-6
[Figure: MB types (skip, intra, inter) of a P-VOP arranged as a bitplane, illustrating the Norm-2/Diff-2, Norm-6/Diff-6 and Row-skip/Col-skip modes.]
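As an illustration of how a bitplane mode exploits spatial correlation, here is a minimal sketch of a row-skip style coder. It assumes one flag bit per row (0 for an all-zero row, 1 followed by the raw bits otherwise); the slides name the mode but do not spell out its bit layout, so the layout and function names are assumptions.

```python
def rowskip_encode(plane):
    """Encode a 2-D list of 0/1 flags, skipping all-zero rows."""
    bits = []
    for row in plane:
        if any(row):
            bits.append(1)      # row present: raw bits follow
            bits.extend(row)
        else:
            bits.append(0)      # whole row is zero: one bit only
    return bits


def rowskip_decode(bits, rows, cols):
    """Inverse of rowskip_encode for a plane of known size."""
    plane, pos = [], 0
    for _ in range(rows):
        flag = bits[pos]
        pos += 1
        if flag:
            plane.append(list(bits[pos:pos + cols]))
            pos += cols
        else:
            plane.append([0] * cols)
    return plane


if __name__ == "__main__":
    skip_flags = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    coded = rowskip_encode(skip_flags)
    print(len(coded), "bits instead of", 3 * 4)
    print(rowskip_decode(coded, 3, 4) == skip_flags)
```

A column-skip mode would be the same idea applied to the transposed plane.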
Loop filtering
Independent block coding leads to
– Visible “blocky” artifacts
– Reduced quality of reference frames
An in-loop deblocking filter is used, as in H.264.
Filtering is applied to every 4th, 8th, 12th, etc. pixel row or column, depending on transform type.
Adaptive filtering rule.
A shortcut saves computation.
The filtering energy is smaller than that of H.264.
[Figure: filtering shortcut.]
Interlace coding
Field picture coding mode
– Intra-MB is coded as in the progressive case.
– Inter-MB may be predicted by either one 16x16 MV or four 8x8 MVs, where each MV can refer to either one of the two previously encoded fields.
Interlace coding (cont’d)
Frame picture coding mode
– Intra-MB may be coded by frame DCT or field DCT.
– Inter-MB may adopt frame prediction (1 or 4 MVs) or field prediction (2 or 4 MVs) in addition to DCT types.
Advanced B-frame coding
Explicit coding of the B frame’s temporal position relative to its two reference frames (variable velocity model).
Intra-coded B frames.
Improved MV coding efficiency.
A bottom B-field is allowed to refer to the top B-field.
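One way to see why an explicitly coded temporal position helps is to scale a co-located motion vector by it. The sketch below is an assumption about how such a fraction could be used (direct-mode style scaling); the function name, the use of `Fraction` and the rounding are illustrative and do not reproduce the normative derivation.

```python
from fractions import Fraction


def scale_direct_mv(mv, position):
    """Split a reference-to-reference MV at a B frame's temporal position.

    mv:       (x, y) displacement between the two reference frames.
    position: Fraction in (0, 1); 1/2 means the B frame sits midway.
    Returns (forward_mv, backward_mv) with forward - backward == mv.
    """
    forward = tuple(round(position * c) for c in mv)
    backward = tuple(f - c for f, c in zip(forward, mv))
    return forward, backward


if __name__ == "__main__":
    mv = (12, -8)
    print(scale_direct_mv(mv, Fraction(1, 2)))  # B frame midway
    print(scale_direct_mv(mv, Fraction(1, 3)))  # B frame nearer the past reference
```

With a fixed midpoint assumption, the second case would be predicted with the wrong displacement whenever the B frame is not equidistant from its references.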
Overlap smoothing
Another technique to reduce blocking artifacts in intra areas.
Drawbacks of deblocking filtering
– It is purely a decoder process, which operates equally on both block-aligned true edges and apparent block edges.
– It is usually disabled in the less complex profiles.
The lapped transform is another way to remove blocking effects.
A spatial-domain approach implements the lapped transform as pre- and post-processing.
Adaptive application rule: applied at lower bitrates; it can also be switched on or off on an MB basis.
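\[
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{pmatrix}
=
\left(
\begin{pmatrix}
 7 & 0 & 0 &  1 \\
-1 & 7 & 1 &  1 \\
 1 & 1 & 7 & -1 \\
 1 & 0 & 0 &  7
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}
+
\begin{pmatrix} r_0 \\ r_1 \\ r_0 \\ r_1 \end{pmatrix}
\right) \gg 3
\]
[Figure: pixel labels a0, a1, b1, b0 and p0, p1, q1, q0.]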
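The overlap-smoothing filter maps four input pixels x0..x3 to four outputs y0..y3 through a 4x4 integer matrix, a rounding vector (r0, r1, r0, r1) and a right shift by 3. A minimal sketch follows; the matrix signs are reconstructed from the garbled slide, and the rounding values r0 = 4, r1 = 3 are an assumption not stated in the transcript.

```python
# Filter matrix; every row sums to 8, matching the shift by 3.
OVERLAP_MATRIX = [
    [ 7, 0, 0,  1],
    [-1, 7, 1,  1],
    [ 1, 1, 7, -1],
    [ 1, 0, 0,  7],
]


def overlap_smooth(x, r0=4, r1=3):
    """Filter four pixels x = [x0, x1, x2, x3] across a block edge.

    r0 and r1 are rounding offsets (assumed values).
    """
    rounding = (r0, r1, r0, r1)
    return [
        (sum(c * v for c, v in zip(row, x)) + rnd) >> 3
        for row, rnd in zip(OVERLAP_MATRIX, rounding)
    ]


if __name__ == "__main__":
    print(overlap_smooth([50, 50, 50, 50]))     # flat area
    print(overlap_smooth([100, 100, 20, 20]))   # step at the block edge
```

A flat input is left untouched, while a step at the edge is spread over all four pixels.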
Low-rate tools (Rate control tools)
Dynamic range reduction (intensity resolution)
– Luminance and chrominance values may be scaled down by a factor of 2 before coding.
Dynamic frame resizing (spatial resolution)
– The coded frame size may be halved vertically, horizontally or both to further reduce the rate cost and meet the constant bitrate requirement.
[Figure: original frame compared with intensity range reduction and frame re-sizing.]
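Dynamic range reduction can be sketched as a pair of sample mappings. Scaling around the mid-level 128 and the clamp on expansion are assumptions here; the slides state only that values are scaled down by a factor of 2 before coding.

```python
def reduce_range(sample, mid=128):
    """Encoder side: halve the distance of an 8-bit sample from mid-level."""
    return ((sample - mid) >> 1) + mid


def expand_range(sample, mid=128):
    """Decoder side: restore the range and clamp to 0..255."""
    return max(0, min(255, ((sample - mid) << 1) + mid))


if __name__ == "__main__":
    for s in (0, 100, 128, 255):
        reduced = reduce_range(s)
        print(s, "->", reduced, "->", expand_range(reduced))
```

The round trip is not lossless: the least significant bit of the distance from mid-level is dropped, which is the price paid for the lower bitrate.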
Fading compensation
Effective with global illumination changes
– Natural illumination changes
– Artificial transition effects, such as fade-to-black, fade-from-black, dissolves, blending, cross-fades and morphing.
The encoder detects fading prior to motion compensation by comparing an error measure with a threshold.
The encoder and decoder use quantized fading parameters of a linear first-order function to transform the original reference frame into a new reference frame.
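The reference-frame remapping can be sketched as a fixed-point linear function applied to every luma sample. The parameter names, the 1/64 scale precision and the clamp are hypothetical choices for illustration; the slides specify only that the function is linear and first-order with quantized parameters.

```python
def remap_reference(ref, scale, shift):
    """Apply p' = scale/64 * p + shift to a 2-D list of 8-bit samples.

    scale: gain in units of 1/64 (64 means unity), hypothetical precision.
    shift: brightness offset in sample units.
    """
    return [
        [max(0, min(255, (scale * p + 64 * shift + 32) >> 6)) for p in row]
        for row in ref
    ]


if __name__ == "__main__":
    ref = [[0, 100, 255]]
    print(remap_reference(ref, 64, 0))   # unity gain: unchanged
    print(remap_reference(ref, 32, 0))   # half gain: fading toward black
    print(remap_reference(ref, 64, 10))  # pure brightness offset
```

Motion compensation then runs against the remapped frame, so the residual no longer has to carry the global brightness change.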
Video smoothing
Interpolates missing frames after decoding; also referred to as frame interpolation.
Uses an advanced optical flow estimation technique (on a per-pixel basis), along with warping, to synthesize new frames.
Needs a 733MHz CPU to interpolate a 320x240 video clip from 10 to 30 fps.
J. Ribas-Corbera and J. Sklansky, “Interframe interpolation of cinematic sequences,” Journal of VCIR, Dec 1993.
Profiles and levels
Simple profile
Main profile
Advanced profile
Feature | MPEG-2 Video | SMPTE VC-9 | H.264/AVC
Prediction coding
Motion res. & interpolation | ½ bilinear | Adaptive ½ bilinear + ½ 4-tap FIR + ¼ 4-tap FIR/direct | ¼ 6-tap FIR/cascaded
Motion block size | 16x16 | 16x16, 8x8 | 16x16, 16x8, …, 4x4
Ref. frame num (P/B) | 1/2 | 1/2 | M/M
Generalised B | N/A | N/A | Y
Inter-intra mixed | N/A | Y | N/A
Brightness change | N/A | Intensity compensation (P/B) | Weighted prediction (B)
Intra prediction | Freq-domain pred. (DC) | Freq-domain pred. (DC/AC) | Spatial-domain prediction
Transform coding, entropy coding & postprocessing
Transform size & type | 8x8 float | 8x8, 8x4, 4x8, 4x4 integer | 4x4 integer (only +, >>)
Quantization | Uniform | Adaptive uniform and non-uniform | Log scale
CA multiple VLCs | N/A | Y | Y
Arithmetic coding | N/A | N/A | Y (Main profile)
Bitplane coding | N/A | Y | N/A
Post-processing | Optional | In-the-loop deblocking, overlapped transform | In-the-loop deblocking
Rate control
Dynamic frame resizing, dynamic range reduction | N/A | Y | N/A
Streaming & error resilience
Data partitioning | N/A | N/A | Slice level partitioning
Bitstream switching | N/A | System level | SI/SP frames
Comparison among HD-DVD video candidates
WMV vs. MPEG-2
WMV vs. MPEG-4 SP
WMV vs. H.264
[Figure: rate-distortion plot for Glasgow_qcif_15fps; x-axis: Kbps (0 to 600); y-axis: PSNR-Y (29 to 42); curves: WMV9, H264-1ref.]
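The quality axis of such a plot is the luma PSNR. For reference, it can be computed as below for 8-bit samples; the function name and the flat-list input format are illustrative choices, not taken from the slides.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally long sequences of samples."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    luma_in = [16, 64, 128, 235]
    luma_out = [17, 63, 129, 234]  # every sample off by one
    print(round(psnr(luma_in, luma_out), 2))
```

An error of one grey level on every sample corresponds to roughly 48 dB, well above the 29 to 42 range of the plot.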
Conclusions
Software and hardware components can be developed based on SDKs or WM hardware porting kits.
WM 9 provides a variety of state-of-the-art audio and video codecs for different applications.
The quality of WMV-9 is competitive with H.264/AVC, and arguably superior based on several independent tests, with significantly lower computational complexity.
This paper explains why some of the tools unique to WMV-9 provide an intrinsic quality benefit over H.264/AVC.
Reading assignment
Mandatory
– Sridhar Srinivasan et al., Windows Digital Media Division, Microsoft Corporation, “Windows Media Video 9: overview and applications,” Signal Processing: Image Communication, Oct 2004.
Homework
7. A composite symbol represents different properties of one MB and tries to exploit their joint occurrence probability. Bitplane coding collects the same symbol for all MBs and removes the correlation between them. Can you think of a way to take advantage of both simultaneously?