High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing
DESCRIPTION
High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing (適用於雲端架構下兼具高效能與平行化設計之分散式視訊編碼). Cheng, Han-Ping (程瀚平). Advisor: Prof. Wu, Ja-Ling (吳家麟). 2010/6/2. Outline: Introduction, DISPAC video codec, RD performance of DISPAC, Parallelizing DISPAC decoder
![Page 1: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/1.jpg)
High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing
適用於雲端架構下兼具高效能與平行化設計之分散式視訊編碼
Cheng, Han-Ping (程瀚平)
Advisor: Prof. Wu, Ja-Ling (吳家麟)
CMLab, CSIE, NTU
2010/6/2
![Page 2: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/2.jpg)
Outline
Introduction
DISPAC video codec
RD performance of DISPAC
Parallelizing DISPAC decoder
Decoding speed of DISPAC
Conclusions and future work
![Page 3: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/3.jpg)
Trends of Cloud Computing
Cloud computing makes clients slimmer and thinner
![Page 4: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/4.jpg)
Video Coding in Cloud Computing
Only low-complexity encoders and decoders are needed at the client side
Conventional video coding (e.g. H.264)
  Encode once, decode many times
  Low-complexity decoder
Distributed Video Coding (DVC)
  e.g. video surveillance, wireless sensor networks
  Low-complexity encoder
![Page 5: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/5.jpg)
Distributed Video Coding
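Slepian-Wolf Theorem (1973)
Wyner-Ziv Theorem (1976)
[Figure: conventional video coding paradigm. Sources X and Y are statistically dependent; a joint encoder and a joint decoder achieve $R_X + R_Y \ge H(X, Y)$.]
[Figure: distributed coding. Encoder X and Encoder Y work separately, so the dependency exists but is not exploited at the encoders ($R_X \ge H(X)$, $R_Y \ge H(Y)$); a joint decoder recovers X and Y. What is the bound on $R_X + R_Y$?]
Slepian and Wolf: $R_X + R_Y \ge H(X, Y)$ can still be achieved with separate encoders and a joint decoder.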
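To make the rate bounds concrete, the following minimal sketch computes the entropies for a small joint distribution. The joint pmf is a hypothetical example chosen for illustration, not data from the slides.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits (zero-probability terms are skipped)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical joint pmf of two correlated binary sources X and Y.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

# Marginal distributions of X and Y.
p_x = [sum(p for (x, _), p in joint.items() if x == v) for v in (0, 1)]
p_y = [sum(p for (_, y), p in joint.items() if y == v) for v in (0, 1)]

h_x = entropy(p_x)
h_y = entropy(p_y)
h_xy = entropy(joint.values())
h_x_given_y = h_xy - h_y  # chain rule: H(X|Y) = H(X,Y) - H(Y)

print(f"H(X) + H(Y) = {h_x + h_y:.3f} bits")  # rate without exploiting the dependency
print(f"H(X, Y)     = {h_xy:.3f} bits")       # Slepian-Wolf sum-rate bound
print(f"H(X | Y)    = {h_x_given_y:.3f} bits")
```

For correlated sources the joint entropy is smaller than the sum of the marginal entropies, which is the saving that distributed coding tries to capture at the decoder.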
![Page 6: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/6.jpg)
Distributed Video Coding
Wyner-Ziv Theorem (1976): extends the Slepian-Wolf result to lossy coding
DVC is also called Wyner-Ziv (WZ) video coding
[Figure: WZ coding model. Source X: Encoder X (quantizer, channel encoder) → virtual channel → joint decoder (channel decoder, reconstruction) → X'. Source Y: Encoder Y (source encoder) → source decoder → side information estimation → Y, which feeds the joint decoder. The dependency exists but is not exploited at the encoders; the correlation is exploited at the decoder.]
[Figure: channel coding (error control code) analogy. The channel encoder adds parity P to X; X+P passes through a noisy channel as (X+P)'; the channel decoder outputs X'.]
$R_Y \ge H(Y)$
$R_X \ge H(X|Y)$
$R_X + R_Y \ge$ ? Wyner and Ziv: $H(X, Y)$
![Page 7: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/7.jpg)
Video Coding in Cloud Computing
WZ to H.264 video transcoder
[Figure: WZ encoder (low complexity) → WZ encoded bitstream → WZ to H.264 transcoder (cloud computational resource) → H.264 encoded bitstream → H.264 decoder (low complexity)]
![Page 8: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/8.jpg)
Motivation
There is still a gap between Wyner-Ziv video coding and conventional video coding (e.g. H.264/AVC)
Most reported WZ codecs have a high time delay in the decoder
Trends of parallel computing
  e.g. multi-core CPU, GPU
  Parallelizability of the decoder is essential
![Page 9: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/9.jpg)
DISPAC Video Codec
DIStributed video coding with PArallelized design for Cloud computing (DISPAC)
To improve rate-distortion (RD) performance
  Combine coding tools developed in the recent literature with some newly developed modules.
To reduce decoding time delay
  Highly parallelized decoder.
![Page 10: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/10.jpg)
Outline
Introduction
DISPAC video codec
RD performance of DISPAC
Parallelizing DISPAC decoder
Decoding speed of DISPAC
Conclusions and future work
![Page 11: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/11.jpg)
DISPAC Video Codec
Combine coding tools of two state-of-the-art WZ codecs:
DISCOVER codec (Distributed coding for video services)
  X. Artigas et al., “The DISCOVER codec: architecture, techniques and evaluation”, PCS, 2007
MLWZ codec (Motion-learning based Wyner-Ziv video coding)
  R. Martin et al., “Statistical motion learning for improved transform domain Wyner-Ziv video coding”, IET Image Processing, 2010
![Page 12: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/12.jpg)
DISCOVER Video Codec
Ref. X. Artigas et al., PCS, 2007
[Figure: DISCOVER codec architecture, with the arrangement of key frames and WZ frames for GOP sizes 2 and 4]
![Page 13: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/13.jpg)
Quantization
Eight quantization matrices
Q1
16 8 0 0
8 0 0 0
0 0 0 0
0 0 0 0
Q2
32 8 0 0
8 0 0 0
0 0 0 0
0 0 0 0
Q3
32 8 4 0
8 4 0 0
4 0 0 0
0 0 0 0
Q4
32 16 8 4
16 8 4 0
8 4 0 0
4 0 0 0
Q5
32 16 8 4
16 8 4 4
8 4 4 0
4 4 0 0
Q6
64 16 8 8
16 8 8 4
8 8 4 4
8 4 4 0
Q7
64 32 16 8
32 16 8 4
16 8 4 4
8 4 4 0
Q8
128 64 32 16
64 32 16 8
32 16 8 4
16 8 4 0
32 = 2^5 => use 5 bits
8 = 2^3 => use 3 bits
0 => 0 bits (not transmitted)
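The mapping from quantization levels to bit counts can be sketched as follows, using the Q4 matrix from the slide. The helper name `bits_per_band` is hypothetical, and the sketch assumes every non-zero entry is a power of two, as in the eight matrices shown.

```python
# Quantization matrix Q4 from the slide: number of quantization levels per
# DCT band (0 means the band is not transmitted).
Q4 = [
    [32, 16, 8, 4],
    [16, 8, 4, 0],
    [8, 4, 0, 0],
    [4, 0, 0, 0],
]

def bits_per_band(levels):
    """Bits needed for a band with `levels` levels (levels is 0 or a power of two)."""
    return levels.bit_length() - 1 if levels > 0 else 0

bit_matrix = [[bits_per_band(v) for v in row] for row in Q4]
total_bit_planes = sum(sum(row) for row in bit_matrix)

for row in bit_matrix:
    print(row)
print("bit planes per 4x4 block position set:", total_bit_planes)
```

Each entry of `bit_matrix` is also the number of bit planes that the corresponding band contributes to the channel encoder.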
![Page 14: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/14.jpg)
Quantization
DCT coefficient band
Each 4x4 block $n$ holds 16 DCT coefficients $S^n_k$, indexed $k = 1, \dots, 16$ in zig-zag order:
$$\text{Block } n: \begin{bmatrix} S^n_1 & S^n_2 & S^n_6 & S^n_7 \\ S^n_3 & S^n_5 & S^n_8 & S^n_{13} \\ S^n_4 & S^n_9 & S^n_{12} & S^n_{14} \\ S^n_{10} & S^n_{11} & S^n_{15} & S^n_{16} \end{bmatrix}, \quad n = 1, 2, 3, \dots, N$$
DCT coefficient band $b_1$ (DC band): $\{ S^1_1, S^2_1, S^3_1, \dots, S^N_1 \}$
DCT coefficient band $b_2$: $\{ S^1_2, S^2_2, S^3_2, \dots, S^N_2 \}$
…
DCT coefficient band $b_{16}$: $\{ S^1_{16}, S^2_{16}, S^3_{16}, \dots, S^N_{16} \}$
Bands $b_2$ to $b_{16}$ are the AC bands.
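The band grouping can be sketched in a few lines. The zig-zag index table follows the block layout on the slide; the function name and the two example blocks are hypothetical.

```python
# Zig-zag scan index (1..16) of each position in a 4x4 block, as on the slide.
ZIGZAG_4X4 = [
    [1, 2, 6, 7],
    [3, 5, 8, 13],
    [4, 9, 12, 14],
    [10, 11, 15, 16],
]

def group_into_bands(blocks):
    """Return bands where bands[k-1] = [S^1_k, S^2_k, ..., S^N_k] for k = 1..16."""
    bands = [[] for _ in range(16)]
    for block in blocks:
        for r in range(4):
            for c in range(4):
                bands[ZIGZAG_4X4[r][c] - 1].append(block[r][c])
    return bands

# Two hypothetical 4x4 blocks of DCT coefficients (values are placeholders).
block1 = [[r * 4 + c for c in range(4)] for r in range(4)]
block2 = [[100 + r * 4 + c for c in range(4)] for r in range(4)]
bands = group_into_bands([block1, block2])

print("DC band b1:", bands[0])
print("AC band b2:", bands[1])
```

Every band ends up with one coefficient per block, so for N blocks each band has length N.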
![Page 15: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/15.jpg)
Bit plane Extraction
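For each DCT coefficient band, the quantized coefficients are split into bit planes, from MSB to LSB, and each bit plane is channel encoded (LDPCA).
Example: with Q4 the DC band has 32 levels, so each DC coefficient takes 5 bits, e.g. 4 = 00100, 1 = 00001, 0 = 00000, 30 = 11110.
Q4
32 16 8 4
16 8 4 0
8 4 0 0
4 0 0 0
Bit planes of DC band: bit plane 1 (MSB), bit plane 2, bit plane 3, bit plane 4, bit plane 5 (LSB)
[Figure: quantized DC coefficients of the blocks (values shown: 4, 6, 7, 0, 6, 3, 1, 7, 7, 30, 1, 5) feeding the LDPCA channel encoder]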
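Bit-plane extraction and its inverse can be sketched as below. The four symbols 4, 1, 0 and 30 are the 5-bit examples shown on the slide; the function names are hypothetical.

```python
def extract_bit_planes(symbols, num_bits):
    """Split quantized symbols into bit planes, most significant plane first."""
    return [[(s >> shift) & 1 for s in symbols]
            for shift in range(num_bits - 1, -1, -1)]

def merge_bit_planes(planes):
    """Inverse operation: rebuild the symbols from MSB-first bit planes."""
    symbols = [0] * len(planes[0])
    for plane in planes:
        symbols = [(s << 1) | b for s, b in zip(symbols, plane)]
    return symbols

dc_symbols = [4, 1, 0, 30]  # 00100, 00001, 00000, 11110
planes = extract_bit_planes(dc_symbols, num_bits=5)

for index, plane in enumerate(planes, start=1):
    print(f"Bit plane {index}:", *plane)
```

Merging the planes again with `merge_bit_planes(planes)` returns the original symbols, which is the step the decoder performs after channel decoding.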
![Page 16: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/16.jpg)
DISCOVER Video Codec
Ref. X. Artigas et al., PCS, 2007
[Figure: DISCOVER codec architecture mapped onto the WZ coding model (label: 白育姍). Source X: quantizer and channel encoder → virtual channel → channel decoder → X'. Source Y: source encoder → source decoder → side information estimation → Y. The dependency exists but is not exploited at the encoders.]
$R_Y \ge H(Y)$
$R_X \ge H(X|Y)$
![Page 17: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/17.jpg)
Side Information Creation
[Figure: reference frames $X_F$ and $X_B$]
Low-pass filter (3x3 mean filter)
Divide the frame into 16x16 non-overlapping blocks
Motion estimation (search window: ±32)
$$MAD(d_x, d_y) = \frac{1}{N} \sum_{(x, y) \in B} \left| X_F(x, y) - X_B(x + d_x, y + d_y) \right|$$
$$CF(d_x, d_y) = MAD(d_x, d_y) \left( 1 + K \sqrt{d_x^2 + d_y^2} \right)$$
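A minimal full-search sketch of this block matching is given below. It assumes the penalty term is the Euclidean length of the motion vector; the value `k=0.05`, the function names, and the small synthetic frames are hypothetical and chosen only for illustration (the slide uses 16x16 blocks and a ±32 window).

```python
import math

def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    n = len(block_a) * len(block_a[0])
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b)) / n

def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def best_motion_vector(frame_f, frame_b, top, left, size, search, k=0.05):
    """Full search minimizing CF = MAD * (1 + k * sqrt(dx^2 + dy^2))."""
    reference = get_block(frame_f, top, left, size)
    height, width = len(frame_b), len(frame_b[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate block falls outside the frame
            cost = mad(reference, get_block(frame_b, t, l, size))
            cost *= 1 + k * math.hypot(dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic 8x8 frames: frame_b is frame_f shifted by (dx, dy) = (2, 1).
frame_f = [[x + 10 * y for x in range(8)] for y in range(8)]
frame_b = [[(x - 2) + 10 * (y - 1) for x in range(8)] for y in range(8)]
best = best_motion_vector(frame_f, frame_b, top=2, left=2, size=4, search=2)
print("cost, dx, dy =", best)
```

The penalty factor biases the search toward short motion vectors when several candidates have a similar MAD.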
![Page 18: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/18.jpg)
Side Information Creation
[Figure: reference frames $X_F$ and $X_B$]
![Page 19: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/19.jpg)
Side Information Creation
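[Figure: reference frames $X_F$ and $X_B$; the neighboring motion vectors are bounded by $(x_L, y_L)$, $(x_U, y_U)$, $(x_R, y_R)$ and $(x_B, y_B)$, and the bounds are extended by a margin $N$ on each side]
Adaptive search range:
$$x_L - N \le d_x \le x_R + N$$
$$y_U - N \le d_y \le y_B + N$$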
![Page 20: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/20.jpg)
Side Information Creation
[Figure: reference frames $X_F$ and $X_B$]
Half-pixel motion estimation
![Page 21: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/21.jpg)
Side Information Creation
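[Figure: reference frames $X_F$ and $X_B$; candidate motion vectors $x_1, \dots, x_9$]
Spatial motion smoothing
Weighted vector median filter:
$$x_{wvmf} = \arg\min_{x_i} \sum_{j=1, j \ne i}^{9} w_j \left\| x_i - x_j \right\|, \quad \text{for } 1 \le i \le 9$$
$$w_j = \frac{MSE(x_i, B)}{MSE(x_j, B)}$$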
![Page 22: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/22.jpg)
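Side Information Creation
[Figure: reference frames $X_F$ and $X_B$; candidate motion vectors $x_1$ and $x_2$ with their matching errors $MSE_1$ and $MSE_2$]
Weighted vector median filter:
$$x_{wvmf} = \arg\min_{x_i} \sum_{j=1, j \ne i}^{9} w_j \left\| x_i - x_j \right\|, \quad \text{for } 1 \le i \le 9$$
$$w_j = \frac{MSE(x_i, B)}{MSE(x_j, B)}$$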
![Page 23: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/23.jpg)
Side Information Creation
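[Figure: reference frames $X_F$ and $X_B$; candidate motion vector $x_1$]
Weighted vector median filter:
$$x_{wvmf} = \arg\min_{x_i} \sum_{j=1, j \ne i}^{9} w_j \left\| x_i - x_j \right\|, \quad \text{for } 1 \le i \le 9$$
$$w_j = \frac{MSE(x_i, B)}{MSE(x_j, B)}$$
For $i = 1$:
$$\sum_{j=1, j \ne 1}^{9} w_j \left\| x_1 - x_j \right\| = \frac{MSE_1}{MSE_2} \left\| x_1 - x_2 \right\| + \frac{MSE_1}{MSE_3} \left\| x_1 - x_3 \right\| + \dots + \frac{MSE_1}{MSE_9} \left\| x_1 - x_9 \right\|$$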
![Page 24: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/24.jpg)
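Side Information Creation
[Figure: reference frames $X_F$ and $X_B$; candidate motion vector $x_6$]
Weighted vector median filter:
$$x_{wvmf} = \arg\min_{x_i} \sum_{j=1, j \ne i}^{9} w_j \left\| x_i - x_j \right\|, \quad \text{for } 1 \le i \le 9$$
$$w_j = \frac{MSE(x_i, B)}{MSE(x_j, B)}$$
The result for $x_6$ is the minimum, so $x_{wvmf} = x_6$ (the final motion vector).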
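The weighted vector median filter can be sketched as follows, assuming the weight of candidate j relative to candidate i is MSE(x_i, B) / MSE(x_j, B) and that all MSE values are positive. The nine motion vectors and MSE values are hypothetical.

```python
import math

def weighted_vector_median(vectors, mses):
    """Return (x_wvmf, index): the candidate x_i minimizing
    sum over j != i of w_j * ||x_i - x_j||, with w_j = MSE_i / MSE_j."""
    best_index, best_cost = None, None
    for i, (x_i, mse_i) in enumerate(zip(vectors, mses)):
        cost = sum(
            (mse_i / mse_j) * math.dist(x_i, x_j)
            for j, (x_j, mse_j) in enumerate(zip(vectors, mses))
            if j != i
        )
        if best_cost is None or cost < best_cost:
            best_index, best_cost = i, cost
    return vectors[best_index], best_index

# Hypothetical candidates: eight consistent vectors and one outlier (index 4)
# whose matching error is much larger.
vectors = [(2, 1)] * 4 + [(9, -7)] + [(2, 1)] * 4
mses = [12.0, 10.0, 11.0, 9.0, 40.0, 10.0, 13.0, 9.5, 10.5]
smoothed, chosen = weighted_vector_median(vectors, mses)
print("x_wvmf =", smoothed, "from candidate", chosen + 1)
```

The outlier is rejected because its summed weighted distance to the other eight candidates is far larger than theirs.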
![Page 25: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/25.jpg)
Side Information Creation
[Figure: reference frames $X_F$ and $X_B$]
Bidirectional motion compensation
Block interpolation (0.75*XB + 0.25*XF)
![Page 26: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/26.jpg)
DISCOVER Video Codec
Ref. X. Artigas et al., PCS, 2007
[Figure: DISCOVER codec architecture (label: 白育姍); the correlation noise is modeled with a Laplacian distribution]
![Page 27: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/27.jpg)
CNM Parameter Estimation
[Figure: reference frames $X_F$ and $X_B$]
Residual frame generation:
$$R(x, y) = \frac{X_F(x + d_{xf}, y + d_{yf}) - X_B(x + d_{xb}, y + d_{yb})}{2}$$
![Page 28: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/28.jpg)
CNM Parameter Estimation
Residual frame DCT transform (4x4):
$$T_n(u, v) = DCT[R_n(x, y)]$$
[Figure: residual frame R and its DCT coefficients T (example values: 258, 10, -30, 120, 0.5, -6, 35, 5, -24, 200, -40, 20)]
Variance of band 1: $\hat{\sigma}_1^2$
Variance of band 2: $\hat{\sigma}_2^2$
Variance of band 3: $\hat{\sigma}_3^2$
![Page 29: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/29.jpg)
CNM Parameter Estimation
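[Figure: DCT coefficients T (example values: 258, 10, -30, 120, 0.5, -6, 35, 5, -24, 200, -40, 20)]
CNM parameter computation:
$$D_n(u, v) = |T_n(u, v)| - E[|T_b|]$$
$$\hat{\alpha}_n(u, v) = \begin{cases} \sqrt{2 / \hat{\sigma}_b^2}, & [D_n(u, v)]^2 \le \hat{\sigma}_b^2 \\ \sqrt{2 / [D_n(u, v)]^2}, & [D_n(u, v)]^2 > \hat{\sigma}_b^2 \end{cases}$$
Assume the variance and mean of band 1 are $\hat{\sigma}_1^2 = 1000$ and $E[|T_1|] = 150$:
$[D_n]^2 = (|258| - 150)^2 = 108^2 > 1000 \Rightarrow \hat{\alpha} = \sqrt{2 / 108^2}$
$[D_n]^2 = (|120| - 150)^2 = 30^2 < 1000 \Rightarrow \hat{\alpha} = \sqrt{2 / 1000}$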
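The two numeric cases on this slide can be reproduced with a short sketch. It assumes the coefficient-level rule is alpha = sqrt(2 / max(band variance, D^2)), which is equivalent to the two-branch definition; the function name is hypothetical, and the inputs (variance 1000, mean 150, coefficients 258 and 120) are the slide's example values.

```python
import math

def cnm_alpha(coefficient, band_mean_abs, band_variance):
    """Laplacian parameter for one residual DCT coefficient.

    Uses the band variance when the coefficient is close to the band mean,
    and the squared distance D^2 when it is farther away.
    Assumes band_variance > 0.
    """
    distance_sq = (abs(coefficient) - band_mean_abs) ** 2
    return math.sqrt(2.0 / max(band_variance, distance_sq))

alpha_258 = cnm_alpha(258, band_mean_abs=150, band_variance=1000)
alpha_120 = cnm_alpha(120, band_mean_abs=150, band_variance=1000)

print(f"T = 258: D^2 = {108 ** 2} > 1000, alpha = {alpha_258:.5f}")
print(f"T = 120: D^2 = {30 ** 2} < 1000, alpha = {alpha_120:.5f}")
```

A larger distance from the band mean gives a smaller alpha, which means a wider Laplacian and less confidence in the side information at that position.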
![Page 30: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/30.jpg)
DISCOVER Video Codec
Ref. X. Artigas et al., PCS, 2007
[Figure: DISCOVER codec architecture (label: 白育姍)]
![Page 31: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/31.jpg)
Correlation Noise Distribution Modeling
[Figure: for each WZ coefficient, the CNM parameter and the side information define a Laplacian distribution]
![Page 32: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/32.jpg)
DISCOVER Video Codec
Ref. X. Artigas et al., PCS, 2007
[Figure: DISCOVER codec architecture (label: 白育姍)]
![Page 33: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/33.jpg)
Conditional Bit Prob Computation
$P(B_k = 1 \mid Y, B^{k-1})$: probability that the k-th bit is one, given the side information (Y) and the previous k-1 decoded bits
[Figure: QCIF frame of 176/4 x 144/4 blocks of WZ coefficients; for each coefficient, the Laplacian pdf over X-Y is summed over the candidate values to obtain the bit probability]
Need to sum up 256 probabilities: from 0011000 (24) to 0011111 (31)
Assume quantization step size is 32
(31-24+1) x 32 = 256
R.P. Westerlaken et al., “Analyzing symbol and bit plane-based LDPC in distributed video coding”, ICIP, 2007.
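The counting argument and the probability itself can be sketched as follows. The sketch assumes non-negative coefficients, uniform bins of width `step`, a single Laplacian centered on the side information value `y`, and at least one undecoded bit; the function names and the values of `y` and `alpha` are hypothetical.

```python
import math

def laplacian(diff, alpha):
    """Laplacian pdf value at X - Y = diff."""
    return 0.5 * alpha * math.exp(-alpha * abs(diff))

def candidate_range(prefix_bits, num_bits, step):
    """Integer values of X that are still possible after decoding prefix_bits."""
    remaining = num_bits - len(prefix_bits)
    prefix_value = int("".join(map(str, prefix_bits)), 2) if prefix_bits else 0
    low_bin = prefix_value << remaining            # e.g. 0011000 = 24
    high_bin = low_bin + (1 << remaining) - 1      # e.g. 0011111 = 31
    return range(low_bin * step, (high_bin + 1) * step)

def prob_next_bit_is_one(prefix_bits, num_bits, step, y, alpha):
    """P(next bit = 1 | Y = y, decoded prefix); needs at least one undecoded bit."""
    values = candidate_range(prefix_bits, num_bits, step)
    # Values in the upper half of the range have the next bit equal to one.
    midpoint = values.start + len(values) // 2
    total = sum(laplacian(x - y, alpha) for x in values)
    ones = sum(laplacian(x - y, alpha) for x in values if x >= midpoint)
    return ones / total

prefix = [0, 0, 1, 1]
print("probabilities to sum:", len(candidate_range(prefix, 7, 32)))
print("P(bit = 1), y = 1000:", prob_next_bit_is_one(prefix, 7, 32, y=1000, alpha=0.1))
print("P(bit = 1), y = 800: ", prob_next_bit_is_one(prefix, 7, 32, y=800, alpha=0.1))
```

Every coefficient and every bit plane needs such a sum, and the sums are independent of each other, which is what makes this module a natural candidate for data-parallel execution.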
![Page 34: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/34.jpg)
DISCOVER Video Codec
Ref. X. Artigas et al., PCS, 2007
[Figure: DISCOVER codec architecture (label: 白育姍)]
![Page 35: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/35.jpg)
Reconstruction
Channel decode (LDPCA)
Bit planes of DC band:
Bit plane 1: 0 0 0 1
Bit plane 2: 0 0 0 1
Bit plane 3: 1 0 0 1
Bit plane 4: 0 0 0 1
Bit plane 5: 0 1 0 0
Zig-zag order
[Figure: the decoded bit planes are merged back into quantized DC coefficients (values shown: 4, 7, 6, 1, 7, 7, 0, 3, 6, 30, 5, 1)]
![Page 36: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/36.jpg)
Reconstruction
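$$\hat{x}_{opt} = E[x \mid x \in [z_i, z_{i+1}), y] = \begin{cases} z_i + \dfrac{1}{\alpha} + \dfrac{\Delta}{1 - e^{\alpha \Delta}}, & y < z_i \\ y + \dfrac{\left(\gamma + \frac{1}{\alpha}\right) e^{-\alpha \gamma} - \left(\delta + \frac{1}{\alpha}\right) e^{-\alpha \delta}}{2 - \left(e^{-\alpha \gamma} + e^{-\alpha \delta}\right)}, & y \in [z_i, z_{i+1}) \\ z_{i+1} - \dfrac{1}{\alpha} - \dfrac{\Delta}{1 - e^{\alpha \Delta}}, & y \ge z_{i+1} \end{cases}$$
$\gamma = y - z_i$, $\delta = z_{i+1} - y$, $\Delta$ is the quantization step size
$\alpha$ is the model parameter related to the variance of the Laplacian distribution as $\sigma^2 = 2 / \alpha^2$
D. Kubasov et al., “Optimal reconstruction in Wyner–Ziv video coding with multiple side information”, IEEE workshop on MMSP, 2007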
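A minimal sketch of this reconstruction rule is shown below, assuming the closed form obtained by taking the conditional mean of a Laplacian centered on y over the decoded bin [z_i, z_{i+1}). The function name and the example values are hypothetical, and very large products of alpha and the step size would overflow `math.exp`.

```python
import math

def reconstruct(y, z_low, z_high, alpha):
    """Conditional mean of x in [z_low, z_high) for a Laplacian centered on y."""
    delta_step = z_high - z_low  # quantization step size
    if y < z_low:
        return z_low + 1.0 / alpha + delta_step / (1.0 - math.exp(alpha * delta_step))
    if y >= z_high:
        return z_high - 1.0 / alpha - delta_step / (1.0 - math.exp(alpha * delta_step))
    gamma = y - z_low    # distance from y to the lower bin edge
    delta = z_high - y   # distance from y to the upper bin edge
    numerator = ((gamma + 1.0 / alpha) * math.exp(-alpha * gamma)
                 - (delta + 1.0 / alpha) * math.exp(-alpha * delta))
    denominator = 2.0 - (math.exp(-alpha * gamma) + math.exp(-alpha * delta))
    return y + numerator / denominator

inside = reconstruct(16.0, 0.0, 32.0, alpha=0.1)   # side information in the bin center
below = reconstruct(-5.0, 0.0, 32.0, alpha=0.1)    # side information below the bin
above = reconstruct(50.0, 0.0, 32.0, alpha=0.1)    # side information above the bin
print(inside, below, above)
```

When the side information lies outside the decoded bin, the reconstruction is pulled toward the bin edge nearest to it instead of being clipped to that edge.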
![Page 37: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/37.jpg)
DISCOVER Video Codec
Ref. X. Artigas et al., PCS, 2007
[Figure: DISCOVER codec architecture (label: 白育姍)]
Poor RD performance for high-motion sequences and large GOP sizes
![Page 38: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/38.jpg)
DISCOVER Video Codec
Ref. X. Artigas et al., PCS, 2007
[Figure: DISCOVER codec architecture (label: 白育姍)]
Room for improvement
![Page 39: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/39.jpg)
MLWZ Video Codec
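Ref. R. Martin et al., IET Image Processing, 2010
[Figure: MLWZ codec (label: 白育姍). For each block of the WZ frame (R), candidate blocks of the side information (Y) are compared within the search range; each candidate holds an SMF value, e.g. SMF_1 = 0.1, SMF_2 = 0.02, ..., SMF_81 = 0.1]
Update SMF:
$$SMF_n(m_x, m_y) = P\{(m_x, m_y)\} \, e^{-\mu \, SSE_n^b(m_x, m_y)}$$
where $\mu$ is a scaling constant.
Normalize SMF:
$$SMF_n(m_x, m_y) = \frac{SMF_n(m_x, m_y)}{\sum_{m_x = -S}^{S} \sum_{m_y = -S}^{S} SMF_n(m_x, m_y)}$$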
![Page 40: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/40.jpg)
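MLWZ Video Codec
Ref. R. Martin et al., IET Image Processing, 2010
[Figure: MLWZ codec (label: 白育姍). The candidate side information blocks $Y^{DCT}$ within the search range are combined into the re-estimated side information $Y^{ML}$]
Side information re-estimation:
$$Y_n^{ML}(u, v) = \sum_{m_x = -S}^{S} \sum_{m_y = -S}^{S} Y_{n, (m_x, m_y)}^{DCT}(u, v) \, SMF_n(m_x, m_y)$$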
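The normalization and the weighted re-estimation can be sketched together. The data layout (a dictionary keyed by motion candidate), the function names, and the 2x2 example blocks are hypothetical; the codec described in the slides works on 4x4 DCT blocks.

```python
def normalize_smf(smf):
    """Scale the SMF weights of all motion candidates so that they sum to one."""
    total = sum(smf.values())
    return {motion: weight / total for motion, weight in smf.items()}

def reestimate_side_information(candidate_blocks, smf):
    """Y_ML(u, v) = sum over (mx, my) of SMF(mx, my) * Y_DCT_(mx, my)(u, v)."""
    size = len(next(iter(candidate_blocks.values())))
    y_ml = [[0.0] * size for _ in range(size)]
    for motion, block in candidate_blocks.items():
        weight = smf[motion]
        for u in range(size):
            for v in range(size):
                y_ml[u][v] += weight * block[u][v]
    return y_ml

# Two hypothetical motion candidates with unnormalized weights 3 and 1.
smf = normalize_smf({(0, 0): 3.0, (1, 0): 1.0})
candidates = {
    (0, 0): [[8.0, 0.0], [0.0, 0.0]],
    (1, 0): [[4.0, 4.0], [0.0, 0.0]],
}
y_ml = reestimate_side_information(candidates, smf)
print(smf)
print(y_ml)
```

Because the weights sum to one, the re-estimated block is a convex combination of the candidate blocks and stays within their value range.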
![Page 41: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/41.jpg)
MLWZ Video Codec
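Ref. R. Martin et al., IET Image Processing, 2010
[Figure: MLWZ codec (label: 白育姍)]
Correlation Noise Distribution Modeling:
$$p^{ML}\left( X_n^{DCT}(u, v) \mid Y_n^{DCT}(u, v) \right) = \sum_{m_x = -S}^{S} \sum_{m_y = -S}^{S} SMF_n(m_x, m_y) \, \frac{\hat{\alpha}_n(u, v)}{2} \, e^{-\hat{\alpha}_n(u, v) \left| X_n^{DCT}(u, v) - Y_{n, (m_x, m_y)}^{DCT}(u, v) \right|}$$
$X_n^{DCT}(u, v)$: DCT coefficient of the WZ frame
$Y_{n, (m_x, m_y)}^{DCT}(u, v)$: DCT coefficient of the SI
$\hat{\alpha}_n(u, v)$: Laplacian parameter
Each term is a Laplacian distribution, so the model is a sum of Laplacians.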
![Page 42: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/42.jpg)
MLWZ Video Codec
Ref. R. Martin et al., IET Image Processing, 2010
[Figure: MLWZ codec architecture (label: 白育姍)]
Improves RD performance for high-motion sequences and large GOP sizes
Room for improvement
![Page 43: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/43.jpg)
DISPAC Video Codec
[Figure: DISPAC codec architecture with name labels on the modules (邱柏叡, 白育姍) and the following annotations]
Reduce decoding time and improve RD performance
Improve subjective quality
Improve SI for motion learning
For low motion parts
For high motion parts
Improve initial SI and motion learning
Half-pixel motion estimation:
$$MAD(d_x, d_y) = \frac{1}{N} \sum_{(x, y) \in B} \left| X_R(x, y) - X_P(x + d_x, y + d_y) \right|$$
$$CF(d_x, d_y) = MAD(d_x, d_y) \left( 1 + K \sqrt{d_x^2 + d_y^2} \right)$$
$$MAD(d_x, d_y) = \frac{1}{N} \sum_{(x, y) \in B} \left| X_R(x, y) - X_F(x + d_x, y + d_y) \right|$$
![Page 44: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/44.jpg)
DISPAC Video Codec
[Figure: DISPAC codec architecture with name labels on the modules (邱柏叡, 白育姍, 程瀚平)]
![Page 45: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/45.jpg)
Outline
Introduction
DISPAC video codec
RD performance of DISPAC
Parallelizing DISPAC decoder
Decoding speed of DISPAC
Conclusions and future work
![Page 46: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/46.jpg)
RD Performance of DISPAC
Test sequences: QCIF, 15 Hz, all frames (150 for Soccer, Foreman, Coastguard and 164 for Hall Monitor)
Motion, from high to low: Soccer, Foreman, Coastguard, Hall Monitor
GOP size: 2, 4, 8
Bitrate and PSNR: only luminance component
![Page 47: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/47.jpg)
RD Performance (GOP=2)
![Page 48: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/48.jpg)
RD Performance (GOP=4)
![Page 49: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/49.jpg)
RD Performance (GOP=8)
[Figure: RD curves for GOP = 8; annotated gains: 3.6 dB, 3.1 dB, 0.9 dB, 2.6 dB, 3.1 dB, 1.6 dB, 0.2 dB, 2.6 dB]
![Page 50: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/50.jpg)
Outline
Introduction
DISPAC video codec
RD performance of DISPAC
Parallelizing DISPAC decoder
Decoding speed of DISPAC
Conclusions and future work
![Page 51: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/51.jpg)
Parallelizing DISPAC Decoder
[Figure: DISPAC decoder architecture; modules are marked as parallelized with OpenMP or CUDA (name labels: 白育姍, 邱柏叡)]
![Page 52: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/52.jpg)
Side Information Re-Creation
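Assume a QCIF sequence, 800 4x4 WZ blocks, and 1024 search candidates within the search range
[Figure: CUDA mapping of the search; frame data is read from texture memory and the candidates are processed 128 at a time (first iteration: 128 candidates; second iteration: 128 candidates)]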
![Page 53: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/53.jpg)
Side Information Re-Creation
Reduction algorithm
Mark Harris, “Optimizing parallel reduction in CUDA”, NVIDIA Developer Technology, 2007.
$$MAD(d_x, d_y) = \frac{1}{N} \sum_{(x, y) \in B} \left| X_R(x, y) - X_B(x + d_x, y + d_y) \right|$$
$$CF(d_x, d_y) = MAD(d_x, d_y) \left( 1 + K \sqrt{d_x^2 + d_y^2} \right)$$
$$MAD(d_x, d_y) = \frac{1}{N} \sum_{(x, y) \in B} \left| X_R(x, y) - X_F(x + d_x, y + d_y) \right|$$
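The access pattern of a tree-style parallel reduction can be illustrated with a sequential sketch that finds the minimum-cost candidate. This is plain Python that only mimics the halving-stride scheme; it is not the CUDA kernel used in the work, and the function name and cost values are hypothetical.

```python
def reduce_min_cost(costs):
    """Return (minimum cost, candidate index) using a halving-stride reduction."""
    # Pad to a power of two with neutral elements, as a kernel would do
    # when the block size exceeds the number of valid candidates.
    size = 1
    while size < len(costs):
        size *= 2
    work = [(cost, index) for index, cost in enumerate(costs)]
    work += [(float("inf"), -1)] * (size - len(costs))

    stride = size // 2
    while stride > 0:
        # On a GPU, every iteration of this inner loop would be one thread,
        # followed by a synchronization barrier before the stride is halved.
        for tid in range(stride):
            work[tid] = min(work[tid], work[tid + stride])
        stride //= 2
    return work[0]

best_cost, best_candidate = reduce_min_cost([5.0, 2.5, 9.0, 2.5, 7.0])
print(best_cost, best_candidate)
```

With n candidates the reduction needs about log2(n) steps instead of n-1 sequential comparisons, so 128 candidates are reduced in 7 steps.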
![Page 54: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/54.jpg)
Parallelizing DISPAC Decoder
[Figure: DISPAC decoder architecture; modules are marked as parallelized with CUDA (name labels: 白育姍, 邱柏叡)]
![Page 55: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/55.jpg)
Correlation Noise Distribution Modeling
Assume a QCIF sequence, 800 4x4 WZ blocks, and 1024 possible integer values of X-Y for DCT coefficient band 2
[Figure: QCIF frame of 176/4 x 144/4 blocks in WZ, Skip or Intra mode; for each WZ block, $P_{CNM}$ is evaluated as a sum of Laplacian pdfs at each of the 1024 integer values of X-Y]
![Page 56: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/56.jpg)
Correlation Noise Distribution Modeling
![Page 57: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/57.jpg)
Conditional Bit Prob Computation
$P(B_k = 1 \mid Y, B^{k-1})$: probability that the k-th bit is one, given the side information (Y) and the previous k-1 decoded bits
[Figure: QCIF frame of 176/4 x 144/4 blocks in WZ, Skip or Intra mode; for each WZ block, $P_{CNM}$ (a sum of Laplacian pdfs over X-Y) is summed over the candidate values to obtain the bit probability]
Need to sum up 256 probabilities: from 0011000 (24) to 0011111 (31)
Assume quantization step size is 32
(31-24+1) x 32 = 256
R.P. Westerlaken et al., “Analyzing symbol and bit plane-based LDPC in distributed video coding”, ICIP, 2007.
![Page 58: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/58.jpg)
Conditional Bit Prob Computation
![Page 59: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/59.jpg)
Outline
Introduction
DISPAC video codec
RD performance of DISPAC
Parallelizing DISPAC decoder
Decoding speed of DISPAC
Conclusions and future work
![Page 60: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/60.jpg)
Decoding speed of DISPAC
A workstation equipped with an Intel Xeon E5530 CPU at 2.4 GHz and an NVIDIA Tesla C1060 graphics card is used to emulate the basic unit of a cloud computing environment.
Operating system: Debian squeeze/sid with the 2.6.32-5-amd64 kernel.
Test conditions: QCIF, 15 Hz, whole sequence, GOP size 8, quantization table 8 (Q8)
![Page 61: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/61.jpg)
Decoding speed of DISPAC
Bottleneck analysis (sequential decoding)
CNM: Correlation Noise Modeling
![Page 62: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/62.jpg)
Decoding speed of DISPAC
Speedup ratio of decoding modules (8core+GPU)
| Module | Foreman | Soccer | Coastguard | Hall Monitor |
| --- | --- | --- | --- | --- |
| LDPCA Decode | 22.81 | 16.64 | 27.77 | 9.21 |
| CNM | 232.06 | 179.95 | 293.17 | 184.08 |
| SI Re-Creation | 120.27 | 115.51 | 126.88 | 104.39 |
| Others | 8.75 | 8.43 | 9.06 | 8.02 |
![Page 63: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/63.jpg)
Decoding speed of DISPAC
Average decoding time per frame (sec.)
| Sequence | DISCOVER (sequential) | MLWZ (sequential) | DISPAC (sequential) | DISPAC (8core+GPU) |
| --- | --- | --- | --- | --- |
| Foreman | 84.78 | 75.3 | 48.35 | 1.54 |
| Soccer | 81.31 | 84.38 | 29.83 | 1.33 |
| Coastguard | 62.31 | 74.72 | 77.95 | 1.9 |
| Hall Monitor | 13.78 | 33.18 | 15.93 | 1.19 |
![Page 64: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/64.jpg)
Decoding speed of DISPAC
Speed-up ratio (compared to DISCOVER)
| Sequence | DISCOVER (sequential) | MLWZ (sequential) | DISPAC (sequential) | DISPAC (8core+GPU) |
| --- | --- | --- | --- | --- |
| Foreman | 1.00 | 1.13 | 1.75 | 55.12 |
| Soccer | 1.00 | 0.96 | 2.73 | 60.97 |
| Coastguard | 1.00 | 0.83 | 0.80 | 32.79 |
| Hall Monitor | 1.00 | 0.42 | 0.87 | 11.57 |
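The speed-up ratios follow from the per-frame decoding times as the DISCOVER time divided by the time of each codec. The sketch below recomputes them from the rounded times reported for the four sequences, so the last digits can differ slightly from the ratios on the slide; the assignment of each time to a codec is an assumption based on the chart order.

```python
# Average decoding time per frame in seconds, as reported on the previous slide.
decoding_time = {
    "Foreman":      {"DISCOVER": 84.78, "MLWZ": 75.3,  "DISPAC": 48.35, "DISPAC 8core+GPU": 1.54},
    "Soccer":       {"DISCOVER": 81.31, "MLWZ": 84.38, "DISPAC": 29.83, "DISPAC 8core+GPU": 1.33},
    "Coastguard":   {"DISCOVER": 62.31, "MLWZ": 74.72, "DISPAC": 77.95, "DISPAC 8core+GPU": 1.9},
    "Hall Monitor": {"DISCOVER": 13.78, "MLWZ": 33.18, "DISPAC": 15.93, "DISPAC 8core+GPU": 1.19},
}

# Speed-up of each codec relative to sequential DISCOVER.
speedup = {
    sequence: {codec: times["DISCOVER"] / t for codec, t in times.items()}
    for sequence, times in decoding_time.items()
}

for sequence, ratios in speedup.items():
    print(sequence, {codec: round(r, 2) for codec, r in ratios.items()})
```

A ratio below 1 means the codec decodes more slowly than sequential DISCOVER for that sequence.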
![Page 65: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/65.jpg)
Outline
Introduction
DISPAC video codec
RD performance of DISPAC
Parallelizing DISPAC decoder
Decoding speed of DISPAC
Conclusions and future work
![Page 66: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/66.jpg)
Conclusions
DISPAC combines coding tools developed in the recent literature (e.g. MLWZ codec) with some newly developed modules (block mode selection, SI re-creation and adaptive deblocking filter).
  Up to 3.6 dB gain in RD performance
The decoding modules can be highly parallelized.
  Up to 61 times faster than a state-of-the-art DVC codec
![Page 67: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/67.jpg)
Future Work
Update the correlation noise model parameter during the decoding process.
  For RD performance
Improve the parallelizability of the parallel LDPCA decoding algorithm for small parity check matrices.
  For decoding speed
WZ to H.264 video transcoder.
  For a real demo system
![Page 68: High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing](https://reader036.vdocuments.mx/reader036/viewer/2022062322/56814577550346895db246e5/html5/thumbnails/68.jpg)
Thank You