VLC 2009 PART 2


2012/4/28 VLC 2008 PART 2

Outline
- Variable length coding
- Information entropy
- Huffman code vs. arithmetic code
- Arithmetic coding
- Why CABAC?
- Rescaling and integer arithmetic coding
- Golomb codes
- Binary arithmetic coding
- CABAC

The CABAC Framework
- Binarization (LPS and MPS)
- Context modeling
- Binary arithmetic coding

Binary Arithmetic Coding
- From word-based to image-based models
- Q and QM coders: JBIG, JBIG-2, and JPEG-LS
- M coder: H.264
- Three steps: modeling, statistics gathering, and coding with many sets of probabilities

Binary Arithmetic Coding (2)
- Least probable symbol (LPS) and most probable symbol (MPS).
- Binary arithmetic coding is based on the principle of recursive interval subdivision. Suppose an estimate of the probability pLPS in (0, 0.5] is given, together with the current interval's lower bound L and width R. The interval is subdivided into two subintervals: RLPS = R * pLPS and the dual interval RMPS = R - RLPS.
- In a practical implementation, the main throughput bottleneck is the multiplication required. The calculation is sped up by approximating either the range R or the probability pLPS so that the multiplication can be avoided.

Binary Coding
- Binary coding allows a number of the components of a coder for a multi-symbol alphabet to be eliminated. There is no need for a statistics data structure, since cumulative frequency counts are trivially available (c0 and c1: the frequencies of symbols zero and one, respectively).
- One multiplication can be eliminated in the encoder when the LPS is transmitted; in the decoder, one multiplication is avoided for the MPS and two for the LPS.
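The recursive interval subdivision described above can be sketched in a few lines of Python. This is a floating-point illustration of one coding step only, not the multiplication-free integer coder used in practice; the function name and the choice of mapping the MPS to the lower subinterval follow the slides.

```python
def encode_bin(low, rng, bit, mps, p_lps):
    """One step of binary arithmetic coding by recursive interval
    subdivision: split [low, low + rng) into an MPS part of width
    rng * (1 - p_lps) (the lower subinterval) and an LPS part of
    width rng * p_lps, then keep the part selected by the input bit."""
    r_lps = rng * p_lps
    if bit == mps:
        rng = rng - r_lps          # keep the lower (MPS) subinterval
    else:
        low = low + (rng - r_lps)  # jump to the upper (LPS) subinterval
        rng = r_lps
    return low, rng

# Encode the bit sequence 0, 0, 1 with MPS = 0 and p_LPS = 0.25.
low, rng = 0.0, 1.0
for b in (0, 0, 1):
    low, rng = encode_bin(low, rng, b, mps=0, p_lps=0.25)
# Final interval: low = 0.421875, rng = 0.140625.
```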
Interval Update Equations
- For a general alphabet {a1, ..., am} with cumulative distribution F_X(l) = sum over i <= l of p(x = a_i), coding the n-th observed symbol x_n updates the interval [l, u):
  l(n) = l(n-1) + (u(n-1) - l(n-1)) * F_X(x_n - 1)
  u(n) = l(n-1) + (u(n-1) - l(n-1)) * F_X(x_n)
- In the binary case, the MPS is mapped to the lower subinterval. With p_LPS^C the probability of occurrence of the LPS for context C:
  For MPS: l(n) = l(n-1),                            R(n) = R(n-1) * (1 - p_LPS^C)
  For LPS: l(n) = l(n-1) + R(n-1) * (1 - p_LPS^C),   R(n) = R(n-1) * p_LPS^C

Issues in Word-based Models
- An efficient data structure is needed to accumulate frequency counts for a large alphabet.
- Multiple coding contexts are necessary: for tokens, characters, and lengths, for both words and nonwords. Here, a coding context is a conditioning class on which the probability distribution for the next symbol is based.
- An escape mechanism is required to switch from one coding context to another.
- Data structures must be resizable, because there is no a priori bound on the alphabet size.

The Models
- The word-based model uses six contexts: a zero-order context for words, a zero-order context for nonwords (sequences of spaces and punctuation), a zero-order character context for spelling out new words, a zero-order character context for spelling out new nonwords, and contexts for specifying the lengths of words and of nonwords.

The Statistics Module
- Manages the data structure that records cumulative symbol frequencies.
- To encode a symbol s in context C: l_{C,s} and h_{C,s} are the cumulative frequency counts in context C of symbols respectively prior to and including s, according to some symbol ordering, and t_C is the total frequency of all symbols recorded in context C.
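The statistics module's (l, h, t) triple can be sketched as follows. This is an illustrative toy (class and method names are mine): a real implementation keeps the cumulative counts in a structure such as a Fenwick tree rather than scanning the alphabet linearly.

```python
class ContextStats:
    """Minimal statistics module for one coding context: adaptive
    frequency counts over a fixed alphabet, initialised to 1 so every
    symbol is codable. triple(s) returns (l, h, t), where l is the
    cumulative count of symbols prior to s, h the count up to and
    including s, and t the total; s is thus allocated the probability
    range [l/t, h/t)."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.freq = {s: 1 for s in self.alphabet}

    def triple(self, s):
        l = 0
        for sym in self.alphabet:       # linear scan; a real coder
            if sym == s:                # would use a Fenwick tree
                break
            l += self.freq[sym]
        return l, l + self.freq[s], sum(self.freq.values())

    def update(self, s):
        self.freq[s] += 1

# Example: adaptive counts over a three-symbol alphabet.
stats = ContextStats("abc")
t0 = stats.triple("b")      # (1, 2, 3): every symbol starts at count 1
stats.update("b")
t1 = stats.triple("b")      # (1, 3, 4) after observing one more "b"
```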
Performing the Arithmetic
- arithmetic_encode(l, h, t): encodes a symbol implicitly assumed to have occurred h - l times out of a total of t, and which is allocated the probability range [l/t, h/t).
- [L, L + R): L and R are the current lower bound and range of the coding interval.
- L and R are b-bit integers. L is initially 0 and takes on values in [0, 2^b - 2^(b-2)); R is initially 2^(b-1) and takes on values between 2^(b-2) + 1 and 2^(b-1).
- Renormalization: to minimize the loss of compression effectiveness due to imprecise division of code space, R should be kept as large as possible. This is done by maintaining R in the interval 2^(b-2) < R <= 2^(b-1) prior to each coding step, and by making sure that t < 2^f, with f <= b - 2.

The Q-Coder
- The Q-Coder is the benchmark. It combines probability estimation and renormalization in a particularly elegant manner, and implements all operations as table lookups.
- QM Coder: assume that R(n) has a value close to one, and rescale if not. Under this approximation R * p_LPS is replaced by p_LPS, so no multiplication is needed:
  For MPS: l(n) = l(n-1),                      R(n) = R(n-1) - p_LPS^C
  For LPS: l(n) = l(n-1) + R(n-1) - p_LPS^C,   R(n) = p_LPS^C
- The probability for context C is updated each time a rescaling takes place.
- It may happen that the symbol assigned to the LPS actually occurs more often than the symbol assigned to the MPS. This condition is detected, and the assignments are reversed, when p_LPS^C > R(n) - p_LPS^C.

Caveat! (Arithmetic Coding Revisited, ACM Trans. on Information Systems, 1998)
- For binary alphabets, if there is only one state and the symbols are independent and can be aggregated into runs, then Golomb or other similar codes should be used.
- For multi-symbol alphabets in which the MPS is relatively infrequent, the error bounds on minimum-redundancy (Huffman) coding are such that the compression loss compared to arithmetic coding is very small.
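The integer coding step and renormalization described above can be sketched as follows. This is a simplified sketch under the slide's conventions (b-bit L and R, R kept above 2^(b-2)); carry propagation on the output bits is deliberately omitted, so it is not a complete encoder.

```python
def arithmetic_encode_step(low, rng, l, h, t):
    """One step of arithmetic_encode(l, h, t): the symbol occupying
    counts [l, h) out of t narrows the integer interval [low, low + rng)
    to the sub-range corresponding to [l/t, h/t), using the truncated
    division r = rng // t. The last symbol (h == t) absorbs the
    rounding leftover so no code space is wasted."""
    r = rng // t
    if h < t:
        return low + r * l, r * (h - l)
    return low + r * l, rng - r * l

def renormalize(low, rng, b=16):
    """Restore 2^(b-2) < rng <= 2^(b-1) by doubling, emitting the top
    bit of low at each doubling (carry handling omitted here)."""
    out = []
    while rng <= 1 << (b - 2):
        out.append((low >> (b - 1)) & 1)
        low = (low << 1) & ((1 << b) - 1)
        rng <<= 1
    return low, rng, out

# Encode one symbol with counts (l, h, t) = (1, 2, 3) in a 16-bit coder.
low, rng = arithmetic_encode_step(0, 1 << 15, 1, 2, 3)   # -> (10922, 10922)
low, rng, bits = renormalize(low, rng)                   # -> (21844, 21844, [0])
```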
- If static or semi-static coding (that is, with fixed probabilities) is to be used with such an alphabet, a Huffman coder will operate several times faster than the best arithmetic coding implementations, while using very little memory.

CABAC Overview
- Binarization: unary, truncated unary, kth-order exp-Golomb, and fixed-length codes (instead of a Huffman tree)
- Context modeling: adaptive probability models
- Binary arithmetic coding: the M coder, a table-based BAC

Background & Goal
- QM-coder: a fast, multiplication-free variant of binary arithmetic coding, used in JBIG-2, JPEG-LS, and JPEG-2000.
- Goal (M coder): design a variant whose computational complexity is about 10% lower than the QM coder and about 20% lower than standard arithmetic coding, measured on the total decoder execution time at medium bitrate.

An Example
- A given non-binary valued syntax element is uniquely mapped to a binary sequence, a so-called bin string.
- The value 3 of mb_type (macroblock type P_8x8) is coded as 001.
- The symbol probability p(3) is the product of p_C0(0), p_C1(0), and p_C2(1), where C0, C1, and C2 denote the binary probability models of the internal nodes.

Why Binarization?
- Adaptive m-ary arithmetic coding (m > 2) in general requires at least two multiplications for each symbol to encode, as well as a number of fairly complex operations to perform the probability update.
- Fast, multiplication-free variants of binary arithmetic coding exist.
- Since the probability of symbols with longer bin strings is typically very low, the computational overhead is fairly small and can easily be compensated by using a fast binary coding engine.
- Binarization enables context modeling at the sub-symbol level: for the most frequently observed bins, conditional probabilities can be used, while less frequently observed bins can be treated using a joint, typically zero-order, probability model.
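The probability-product view of a bin string can be made concrete with a few lines of Python. The node probabilities below are made up purely for illustration; only the path 0, 0, 1 through the internal nodes C0, C1, C2 comes from the mb_type example.

```python
def bin_string_prob(bins, node_models):
    """Probability of a non-binary symbol as the product of the
    per-bin probabilities along its path through the binarization
    tree, one binary model per internal node."""
    prob = 1.0
    for model, b in zip(node_models, bins):
        prob *= model[b]
    return prob

# mb_type value 3 maps to the bin string 0, 0, 1; C0, C1, C2 are the
# binary models of the internal nodes (probabilities invented here).
c0, c1, c2 = {0: 0.5, 1: 0.5}, {0: 0.4, 1: 0.6}, {0: 0.7, 1: 0.3}
p3 = bin_string_prob((0, 0, 1), (c0, c1, c2))   # 0.5 * 0.4 * 0.3 = 0.06
```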
(A zero-order model treats the bins as independent, i.e., memoryless.)

Binarization Schemes
- A binary representation for a given non-binary valued syntax element should be close to a minimum-redundancy code.
- Instead of a Huffman tree, the design of CABAC (mostly) relies on a few basic code trees whose structure enables a simple on-line computation of all code words, without the need to store any tables:
  - the unary code (U) and the truncated unary code (TU),
  - the kth-order exp-Golomb code (EGk),
  - the fixed-length code (FL).
- In addition, there are binarization schemes based on a concatenation of these elementary types.
- As an exception, there are five specific binary trees, selected manually, for the coding of macroblock and sub-macroblock types.

Unary and Truncated Unary Binarization
- For an unsigned integer valued symbol x >= 0, the unary code word consists of x "1" bits plus a terminating "0" bit. U: 5 -> 111110.
- The truncated unary (TU) code is only defined for x with 0 <= x <= S; for x < S the code is given by the unary code, whereas for x = S the terminating "0" bit is dropped. S = 9: 6 -> 1111110, 9 -> 111111111.

kth-order Exp-Golomb Binarization
- The prefix part: a unary code corresponding to the value l(x) = floor(log2(x / 2^k + 1)).
- The suffix part: computed as the binary representation of x + 2^k * (1 - 2^l(x)), using k + l(x) significant bits.

Fixed-Length Binarization
- For a syntax element x with 0 <= x <= S, the FL code word of x is the binary representation of x with a fixed (minimum) number l_FL = ceil(log2(S + 1)) of bits.
- Typically, FL binarization is applied to syntax elements with a nearly uniform distribution, or to syntax elements where each bit of the FL binary representation represents a specific coding decision.
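The four elementary binarizations above can be written down directly; each code word is computed on the fly, with no tables, which is exactly the property the slides emphasize. A small sketch (bit strings returned as text for readability):

```python
import math

def unary(x):
    """Unary code (U): x ones followed by a terminating zero."""
    return "1" * x + "0"

def truncated_unary(x, s):
    """Truncated unary (TU), defined for 0 <= x <= s: unary for x < s;
    for x == s the terminating zero bit is dropped."""
    return "1" * x + ("0" if x < s else "")

def exp_golomb(x, k):
    """kth-order exp-Golomb (EGk): unary prefix for
    l(x) = floor(log2(x / 2**k + 1)), then the value
    x + 2**k * (1 - 2**l(x)) written in k + l(x) suffix bits."""
    l = (x // 2**k + 1).bit_length() - 1        # floor of the log2
    suffix_val = x - 2**k * (2**l - 1)          # = x + 2^k * (1 - 2^l)
    suffix = format(suffix_val, "b").zfill(k + l) if k + l else ""
    return unary(l) + suffix

def fixed_length(x, s):
    """Fixed-length (FL) code: x in ceil(log2(s + 1)) bits."""
    return format(x, "b").zfill(math.ceil(math.log2(s + 1)))
```

With these, the slide's examples reproduce directly: `unary(5)` gives `111110`, and `truncated_unary(6, 9)` and `truncated_unary(9, 9)` give `1111110` and `111111111`.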
Concatenation Schemes
- coded_block_pattern: concatenation of
  - prefix: a 4-bit FL code for luminance,
  - suffix: a TU code with S = 2 for chrominance.
- Motion vector difference (mvd): truncated unary / kth-order exp-Golomb binarization. The unary code is the simplest prefix-free code, and it permits a fast adaptation of the individual symbol probabilities in the subsequent context modeling.
  - Prefix: TU code with S = 9 on min(|mvd|, 9).
  - Suffix: EG3 code for |mvd| - 9, if |mvd| >= 9.
  - Sign bit.
- These observations are only accurate for small values of the absolute motion vector differences and transform coefficient levels.

Transform Coefficient Levels
- Prefix: TU code with S = 14 for |level|.
- Suffix: EG0 code for |level| - 14, if |level| >= 14.

Context Modeling
- The aim is a clean interface between modeling and coding: in the modeling stage, a model probability distribution is assigned to the given symbols, which then, in the subsequent coding stage, drives the actual coding engine to generate a sequence of bits as a coded representation of the symbols according to the model distribution.
- F: T -> C, operating on the context template T, maps to a related set C = {0, ..., C-1} of contexts.
- x: the symbol to be coded; z: the already coded neighboring symbols in T.
- A conditional probability p(x | F(z)) is estimated by switching between different probability models; p(x | F(z)) is thus estimated on the fly by tracking the actual source statistics.
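The mvd concatenation scheme above (TU prefix with S = 9, EG3 suffix, sign bit) can be sketched as one self-contained function. The sign-bit convention (0 for positive, 1 for negative) is an assumption for illustration, as is placing it last.

```python
def binarize_mvd(mvd):
    """Sketch of the concatenated TU/EG3 binarization for a motion
    vector difference: TU prefix with S = 9 on min(|mvd|, 9), an EG3
    suffix for |mvd| - 9 when |mvd| >= 9, and a trailing sign bit for
    nonzero mvd (0: positive, 1: negative; convention assumed here)."""
    a = abs(mvd)
    bits = "1" * min(a, 9) + ("0" if a < 9 else "")   # TU prefix, S = 9
    if a >= 9:
        x = a - 9                                     # EG3 suffix
        l = (x // 8 + 1).bit_length() - 1             # floor(log2(x/8 + 1))
        bits += "1" * l + "0"                         # unary part of suffix
        bits += format(x - 8 * (2**l - 1), "b").zfill(3 + l)
    if mvd != 0:
        bits += "0" if mvd > 0 else "1"               # sign bit
    return bits

# Small |mvd| stays in the fast TU regime: binarize_mvd(3) -> "11100";
# larger values spill into the EG3 suffix, e.g. binarize_mvd(-12).
```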
Context Modeling (2)
- The number of different conditional probabilities to be estimated for an alphabet of size m is equal to C * (m - 1), so it is intuitively clear that the model cost (the cost of learning the model) is proportional to this number. As the number of contexts C increases, there is a point where overfitting of the model may occur.
- In CABAC, only very limited context templates T, consisting of a few neighbors of the current symbol to encode, are employed, so that only a small number of different context models C is used.
- Context modeling is restricted to selected bins of the binarized symbols. As a result, the model cost is drastically reduced.

Types of Context Modeling
Four basic design types of context models:
- 1st: based on two neighboring syntax elements in the past of the current syntax element, where the specific definition of the kind of neighborhood depends on the syntax element.
- 2nd: used for the syntax elements mb_type and sub_mb_type. The values of prior coded bins (b0, b1, ..., b(i-1)) are used for the choice of a model for a given bin with index i. These context models are only used to select different models for different internal nodes of the corresponding binary trees.
- 3rd and 4th: applied to residual data only. In contrast to all other types of context models, both types depend on the context categories of the different block types.
  - 3rd: does not rely on past coded data, but on the position in the scanning path.
  - 4th: involves the evaluation of the accumulated number of encoded (decoded) levels with a specific value prior to the current level bin to encode (decode).
(Significance map)

4th Type
- Non-zero levels are coded in reverse scanning order.
- Each non-zero coefficient is coded by abs_m1 = |level| - 1, binarized by the unary code, plus a sign. One set of context models is used for the first bin, and another set for all other bins.
- 5 context models for the first bin of abs_m1:
  - If all levels coded so far are 1 or -1: Context ID = NumT1 (capped at 3).
  - If a level greater than 1 has already been coded: Context ID = 4.

4th Type (2)
- 5 context models for the remaining bins of abs_m1. With NumLgt1 the number of already coded levels that are greater than 1: Context ID = 5 + min(4, NumLgt1).
  - NumLgt1 = 0 -> Context ID = 5; ...; NumLgt1 = 4 -> Context ID = 9; NumLgt1 = 5 -> Context ID = 9.

4th Type (3): Level Coding Example
- Zigzag scanned result: 9 0 -5 3 0 0 -1 0 1 0 0

  Reverse order of nonzero |level|:   1    1    3      5        9
  NumLgt1 (before current level):     0    0    0      1        2
  Unary code of (|level| - 1):        0    0    110    11110    111111110
  Context IDs:                        0    1    2,5    4,6      4,7

Context Index
- The entity of probability models can be arranged in a linear fashion, indexed by a context index gamma with 0 <= gamma <= 398. Each probability model related to a given context index is determined by two values, together represented as a 7-bit unsigned integer: a 6-bit probability state index sigma and the (binary) value of the MPS.
- Each syntax element has an associated range of context indices within (0, 398).
- Context indices 0 to 72 are related to the syntax elements of macroblock and sub-macroblock types, spatial and temporal prediction modes, as well as slice-based and macroblock-based control information.
- A corresponding context index can be calculated as gamma = Gamma_S + chi_S, where Gamma_S is the context index offset (the lower value of the range) and chi_S is the context i...
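The 4th-type context selection rules can be checked against the level coding example by coding the context IDs directly. A sketch under the rules stated above (function name is mine):

```python
def level_context_ids(abs_levels):
    """Context selection for the abs_m1 bins: levels arrive in reverse
    scan order; the first bin uses context min(NumT1, 3) while only
    +/-1 levels have been coded, and context 4 once some |level| > 1
    has been coded; all remaining unary bins share context
    5 + min(4, NumLgt1)."""
    num_t1 = 0       # count of trailing +/-1 levels coded so far
    num_gt1 = 0      # count of coded levels greater than 1 (NumLgt1)
    ids = []
    for a in abs_levels:                    # already in reverse order
        first = 4 if num_gt1 > 0 else min(num_t1, 3)
        ctx = [first]
        if a > 1:                           # unary code has more bins
            ctx.append(5 + min(4, num_gt1))
            num_gt1 += 1
        else:
            num_t1 += 1
        ids.append(ctx)
    return ids

# The slide's example: reverse-order |level| sequence 1 1 3 5 9
# yields context IDs 0; 1; 2,5; 4,6; 4,7.
ids = level_context_ids([1, 1, 3, 5, 9])
```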