Introduction to VP8

郭至軒 (KuoE0)

[email protected]

Latest update: Jun 13, 2013

Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)

http://creativecommons.org/licenses/by-sa/3.0/







Situation

Video Codec

VP8

An Open Source Codec

Developed by On2 Technologies

Acquired by Google: February, 2010

Patent: Royalty-Free Terms, March, 2013 (WebM)

Successor: VP9, May 15, 2013

Feature

Focus on Internet/web-based applications

Low Bandwidth Requirement

Image Quality:

watchable (PSNR: ~30dB)

visually lossless (PSNR: ~45dB)
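The PSNR figures above can be made concrete with a small calculation. This is a minimal sketch, assuming 8-bit samples with a peak value of 255; the function name `psnr` and the sample values are made up for illustration.

```python
import math


def psnr(reference, distorted, peak=255):
    """Return the PSNR in dB between two equal-length lists of samples."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same length")
    # Mean squared error between the original and the decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


reference = [100, 120, 140, 160]
distorted = [101, 119, 141, 159]  # every sample is off by exactly 1
print(round(psnr(reference, distorted), 2))  # 48.13
```

An error of one level per sample already lands above the ~45dB "visually lossless" mark, which gives a feel for how small the distortion is at that quality.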

Heterogeneous Client Hardware

Efficient Implementations

Web Video Format

YUV 4:2:0 color sampling

8 bits per channel

Up to 16383 × 16383 pixels
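To see what this format means in memory, the sketch below counts the bytes in one raw frame. It assumes 4:2:0 sampling halves the chroma resolution in both directions (rounding up for odd sizes) and one byte per sample; `yuv420_frame_bytes` is a hypothetical helper name.

```python
MAX_DIMENSION = (1 << 14) - 1  # 16383, the largest value that fits in 14 bits


def yuv420_frame_bytes(width, height):
    """Bytes in one raw 8-bit YUV 4:2:0 frame."""
    if not (1 <= width <= MAX_DIMENSION and 1 <= height <= MAX_DIMENSION):
        raise ValueError("dimension out of range")
    luma = width * height
    # Each chroma plane has half the width and half the height, rounded up.
    chroma = ((width + 1) // 2) * ((height + 1) // 2)
    return luma + 2 * chroma


print(yuv420_frame_bytes(1280, 720))  # 1382400
```

A 720p frame therefore takes 1.5 bytes per pixel before compression.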

Processing Flow

Coding: Predict, Transform + Quantize, Entropy Code, Loop Filter

Decoding: Entropy Decode, Predict, Dequantize + Inverse Transform, Loop Filter

Reference Frame

Golden Frame, Last Frame, Alternate Frame

At most 3 reference frames in VP8.

Last Frame

[Figure: the last frame is used as the reference for the current frame.]

Golden Frame

Choose an arbitrary frame in the past.

Define a number of flags to notify the decoder when and how to update this buffer.

[Figure: an earlier frame is set as the golden frame and later used to reconstruct the background behind a moving object.]
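The flag-driven buffer update can be sketched as follows. This is an illustration only: the class `ReferenceBuffers` and the flag names `refresh_last`, `refresh_golden`, and `refresh_alternate` are hypothetical, and frames are represented by plain labels rather than pixel data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReferenceBuffers:
    """The three reference buffers, holding frame labels for illustration."""
    last: Optional[str] = None
    golden: Optional[str] = None
    alternate: Optional[str] = None

    def update(self, frame: str, refresh_last: bool = True,
               refresh_golden: bool = False,
               refresh_alternate: bool = False) -> None:
        # Each flag (hypothetical name) tells the decoder whether the newly
        # decoded frame replaces the corresponding buffer.
        if refresh_last:
            self.last = frame
        if refresh_golden:
            self.golden = frame
        if refresh_alternate:
            self.alternate = frame


refs = ReferenceBuffers()
refs.update("frame0", refresh_golden=True, refresh_alternate=True)
refs.update("frame1")
refs.update("frame2")
print(refs)
```

After these three updates the last-frame buffer tracks the most recent frame, while the golden buffer still holds the older frame that was explicitly flagged.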

Alternate Frame

[Figure: other frames are decoded and shown; the alternate frame is decoded but not necessarily shown.]

Stores beneficial information

Constructed from multiple frames

Typical Frame

I B B P B B P B B I B B P

VP8

[Figure: a VP8 frame sequence in which frames are marked L (last), G (golden), or A (alternate).]

Prediction

Intra Prediction: uses data within a single video frame

Inter Prediction: uses data from previously encoded frames

Intra Prediction

Luma: 16×16 and 4×4 blocks

Chroma: 8×8 blocks

Four Prediction Modes:

H_PRED (horizontal prediction)

V_PRED (vertical prediction)

DC_PRED (DC prediction)

TM_PRED (TrueMotion prediction)

Horizontal Prediction

Fills each column of the block with a copy of the left column.

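Left block:

a b c d e
f g h i j
k l m n o
p q r s t
u v w x y

Above block:

A B C D E
F G H I J
K L M N O
P Q R S T
U V W X Y

Left column (rightmost column of the left block): e j o t y

Predicted block:

e e e e e
j j j j j
o o o o o
t t t t t
y y y y y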
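Horizontal prediction can be sketched in a few lines. The example reuses the symbolic pixel labels e, j, o, t, y from the slide; `h_pred` is a hypothetical function name.

```python
def h_pred(left_column):
    """Predict an N x N block: row r is filled with left_column[r]."""
    n = len(left_column)
    return [[left_column[r]] * n for r in range(n)]


for row in h_pred(list("ejoty")):
    print(" ".join(row))
```

Each printed row repeats one pixel of the left column, matching the predicted block on the slide.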

Vertical Prediction

Fills each row of the block with a copy of the above row.



Above block:

A B C D E
F G H I J
K L M N O
P Q R S T
U V W X Y

Above row (bottom row of the above block): U V W X Y

Predicted block:

U V W X Y
U V W X Y
U V W X Y
U V W X Y
U V W X Y
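Vertical prediction is the mirror image of the horizontal case. The sketch below again uses the slide's symbolic labels; `v_pred` is a hypothetical function name.

```python
def v_pred(above_row):
    """Predict an N x N block: every row is a copy of above_row."""
    n = len(above_row)
    return [list(above_row) for _ in range(n)]


for row in v_pred(list("UVWXY")):
    print(" ".join(row))
```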

DC Prediction

Fills the block with a single value using the average of the pixels in the above row and the left column.



Above row: U V W X Y

Left column: e j o t y

Z = (U + V + W + X + Y + e + j + o + t + y) ÷ 10

Predicted block:

Z Z Z Z Z
Z Z Z Z Z
Z Z Z Z Z
Z Z Z Z Z
Z Z Z Z Z
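The averaging step can be written out numerically. This is a minimal sketch with made-up pixel values; it rounds the mean to the nearest integer, which is an assumption about how the division is carried out, and `dc_pred` is a hypothetical name.

```python
def dc_pred(above_row, left_column):
    """Fill an N x N block with the rounded mean of the neighboring pixels."""
    samples = list(above_row) + list(left_column)
    # Adding half the divisor before integer division rounds to nearest.
    dc = (sum(samples) + len(samples) // 2) // len(samples)
    n = len(above_row)
    return [[dc] * n for _ in range(n)]


above = [10, 20, 30, 40, 50]   # stands in for U V W X Y
left = [60, 70, 80, 90, 100]   # stands in for e j o t y
print(dc_pred(above, left)[0])  # [55, 55, 55, 55, 55]
```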

TrueMotion Prediction

Horizontal differences between pixels in the above row and vertical differences between pixels in the left column are propagated (starting from C).

[Figure: C is the pixel above and to the left of the block; A0 to A4 are the pixels in the above row; L0 to L4 are the pixels in the left column.]

Xij = Ai + Lj - C

where Xij is the predicted pixel in column i and row j.
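The TrueMotion formula Xij = Ai + Lj - C translates directly into code. The sketch below uses made-up pixel values and clamps the result to the 8-bit range, which is an assumption added here because the sum can leave 0 to 255; `tm_pred` is a hypothetical name.

```python
def tm_pred(above_row, left_column, corner):
    """Predict a block where pixel (column i, row j) = A[i] + L[j] - C."""
    def clamp(value):
        return max(0, min(255, value))

    return [[clamp(a + l - corner) for a in above_row] for l in left_column]


above = [100, 110, 120, 130]  # A0..A3
left = [90, 80, 70, 60]       # L0..L3
corner = 95                   # C

for row in tm_pred(above, left, corner):
    print(row)
```

The first row comes out as `[95, 105, 115, 125]`: the horizontal steps of the above row are preserved, shifted by the difference between L0 and C.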

Inter Prediction

As mentioned above, the reference frames are the golden frame, the last frame, and the alternate frame.

Motion Vector

Reusing vectors from neighboring macroblocks.

Flexible partitioning of a macroblock into sub-blocks.

Sub-pixel Interpolation

Quarter pixel accurate motion vectors for luma pixels.

High performance six-tap interpolation filters.

[3, -16, 77, 77, -16, 3]/128 for 1⁄2 pixel positions
[2, -11, 108, 36, -8, 1]/128 for 1⁄4 pixel positions
[1, -8, 36, 108, -11, 2]/128 for 3⁄4 pixel positions
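The filter taps listed above can be applied to a row of pixels as in the sketch below. It is one-dimensional only, and the rounding (add 64, shift right by 7) and the clamp to 0..255 are assumptions about how the division by 128 is carried out; `interpolate` and `FILTERS` are hypothetical names.

```python
FILTERS = {
    "1/2": (3, -16, 77, 77, -16, 3),
    "1/4": (2, -11, 108, 36, -8, 1),
    "3/4": (1, -8, 36, 108, -11, 2),
}


def interpolate(samples, index, position):
    """Interpolate between samples[index] and samples[index + 1]."""
    if index < 2 or index + 3 >= len(samples):
        raise ValueError("need two samples before and three after index")
    taps = FILTERS[position]
    window = samples[index - 2:index + 4]  # six neighboring samples
    acc = sum(t * s for t, s in zip(taps, window))
    # Divide by 128 with rounding, then clamp to the 8-bit range.
    return max(0, min(255, (acc + 64) >> 7))


ramp = [0, 10, 20, 30, 40, 50]
print(interpolate(ramp, 2, "1/4"))  # 22
print(interpolate(ramp, 2, "1/2"))  # 25
print(interpolate(ramp, 2, "3/4"))  # 28
```

On a linear ramp between 20 and 30 the three positions land close to the ideal 22.5, 25, and 27.5, and each filter's taps sum to 128 so flat regions pass through unchanged.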

Hybrid Transform & Quantization

Divide into Macroblocks

One 16×16 block of luma pixels (Y)
Two 8×8 blocks of chroma pixels (U, V)

Typical Method

[Figure: a macroblock made of one 16×16 luma block and two 8×8 chroma blocks.]

Divide into blocks

VP8 Method: All blocks of luma and chroma are 4×4 blocks
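The division into 4×4 blocks can be sketched as below. The planes are filled with placeholder zeros and `split_into_4x4` is a hypothetical helper; the point is only the block count per macroblock.

```python
def split_into_4x4(plane):
    """Split a square plane (a list of rows) into 4x4 blocks in raster order."""
    size = len(plane)
    blocks = []
    for top in range(0, size, 4):
        for left in range(0, size, 4):
            blocks.append([row[left:left + 4] for row in plane[top:top + 4]])
    return blocks


luma = [[0] * 16 for _ in range(16)]    # one 16x16 Y block
chroma_u = [[0] * 8 for _ in range(8)]  # one 8x8 U block
chroma_v = [[0] * 8 for _ in range(8)]  # one 8x8 V block

counts = [len(split_into_4x4(p)) for p in (luma, chroma_u, chroma_v)]
print(counts, sum(counts))  # [16, 4, 4] 24
```

One macroblock therefore yields 24 blocks of 4×4 pixels to transform.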

Discrete Cosine Transform

Fast implementation

Slightly worse in energy compaction than KLT

Content-independent

Coding: 2-D DCT

Decoding: 4×4 variant of the LLM implementation

LLM: "Practical fast 1-D DCT algorithms with 11 multiplications"

http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=266596&url=http%253A%252F%252Fieeexplore.ieee.org%252Fxpls%252Fabs_all.jsp%253Farnumber%253D266596

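Inverse DCT Graph in VP8

[Figure: signal-flow graph of the inverse DCT with inputs I1 to I4 and outputs O1 to O4, including a block with inputs x0, x1 and outputs y0, y1.]

y0 = √2 (x0 × sin(π/8) - x1 × cos(π/8))
y1 = √2 (x0 × cos(π/8) + x1 × sin(π/8))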
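The two equations for y0 and y1 describe a rotation scaled by √2, which the sketch below evaluates in floating point. A real decoder would presumably use fixed-point constants instead; `rotate` is a hypothetical name.

```python
import math

SQRT2_SIN = math.sqrt(2) * math.sin(math.pi / 8)  # about 0.5412
SQRT2_COS = math.sqrt(2) * math.cos(math.pi / 8)  # about 1.3066


def rotate(x0, x1):
    """Apply y0 = sqrt(2)(x0 sin - x1 cos), y1 = sqrt(2)(x0 cos + x1 sin)."""
    y0 = x0 * SQRT2_SIN - x1 * SQRT2_COS
    y1 = x0 * SQRT2_COS + x1 * SQRT2_SIN
    return y0, y1


y0, y1 = rotate(3, 4)
# The rotation preserves energy up to the factor (sqrt(2))^2 = 2.
print(round(y0 ** 2 + y1 ** 2, 6))  # 50.0
```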

H.264/AVC uses a multiplication-less integer transform.

Energy compaction of the VP8 transform is slightly better than that of H.264/AVC.

The VP8 transform is efficient on processors with SIMD capability.

Walsh-Hadamard Transform

Y = H X H^T

H = [ 1  1  1  1 ]
    [ 1  1 -1 -1 ]
    [ 1 -1  1 -1 ]
    [ 1 -1 -1  1 ]

H^T is the transpose of H.

Takes advantage of the correlation among the DC coefficients of the 4×4 blocks in a macroblock to reduce redundancy.
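The transform Y = H X H^T can be checked with plain matrix arithmetic. This sketch uses the matrix H exactly as given and ignores any scaling or rounding a real codec applies; the helper names are hypothetical. Because H H^T = 4I, the inverse is X = H^T Y H ÷ 16.

```python
H = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, 1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def wht(x):
    """Forward transform: Y = H X H^T."""
    return matmul(matmul(H, x), transpose(H))


def inverse_wht(y):
    """Inverse transform: X = H^T Y H / 16."""
    product = matmul(matmul(transpose(H), y), H)
    return [[v // 16 for v in row] for row in product]


flat = [[3] * 4 for _ in range(4)]  # sixteen equal values
print(wht(flat)[0])  # [48, 0, 0, 0]
```

When all sixteen inputs are equal, the whole block collapses into the single value Y[0][0], which is how correlated inputs get compacted.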

Adaptive Quantization

128 quantization levels.

Different quantization levels within a single frame for six frequency components:

1st order luma DC
1st order luma AC
2nd order luma DC
2nd order luma AC
Chroma DC
Chroma AC
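Using separate quantization levels for DC and AC components can be illustrated with plain scalar quantization. The step sizes and coefficient values below are made up, truncation toward zero stands in for whatever rounding the encoder uses, and `quantize` and `dequantize` are hypothetical names.

```python
def quantize(coeffs, dc_step, ac_step):
    """Quantize a coefficient list; index 0 is DC, the rest are AC."""
    steps = [dc_step] + [ac_step] * (len(coeffs) - 1)
    # int() truncates toward zero, a simple stand-in for encoder rounding.
    return [int(c / s) for c, s in zip(coeffs, steps)]


def dequantize(levels, dc_step, ac_step):
    """Scale quantized levels back to approximate coefficients."""
    steps = [dc_step] + [ac_step] * (len(levels) - 1)
    return [q * s for q, s in zip(levels, steps)]


coeffs = [80, -23, 9, 3]
levels = quantize(coeffs, dc_step=8, ac_step=10)
print(levels)                         # [10, -2, 0, 0]
print(dequantize(levels, 8, 10))      # [80, -20, 0, 0]
```

A finer DC step keeps the block average exact in this example, while the coarser AC step zeroes the small high-frequency values.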

Entropy Coding

Supports distribution updates on a per-frame basis

Boolean arithmetic coder

Stable probability distributions within one frame

Keyframes reset the probability values to the defaults

Adaptive Loop Filter

Removing blocking artifacts introduced by quantization and transformation.


[Figure: different regions of a frame receive slight filtering, strong filtering, or no filtering.]

Parallel Processing

Data Partition

Compressed data is split into two kinds of partitions:

Macroblock coding modes & motion vectors

Transform coefficients

More Transform Coefficient Partitions

Supports up to 8 token partitions

Compare to H.264

[Figure: decoding speed in frames/second, VP8 vs. H.264 High Profile, on an Intel Core i7 3.2GHz. Clips: Night, Sheriff, and Tulip (720p, 2000kbps). Axis range: 100 to 300.]

[Figure: decoding speed in frames/second, VP8 vs. H.264 High Profile, on an Intel Atom N270 1.66GHz. Clips: Night, Sheriff, and Tulip (720p, 2000kbps). Axis range: 20 to 45.]

Any Questions?

Thanks for listening :)