data compression conference 2013 chenggang yan, yongdong zhang, feng dai and liang li 1
HIGHLY PARALLEL FRAMEWORK FOR HEVC MOTION ESTIMATION ON MANY-CORE PLATFORM
Data Compression Conference 2013
Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li
Outline
Introduction
Related Work
Proposed Method
Experimental Results
Conclusion
Introduction(1/2)
HEVC coding tree unit (CTU)
Introduction(2/2)
Local parallel method (LPM)
The maximum parallelism of LPM is at most 8.
Independent PUs (IPUs)
Directed acyclic graph (DAG)
Related Work(1/2)
Local parallel method (LPM) [16]
Motion estimation region (MER)
[16] Minhua Zhou, “AHG10: Configurable and CU-group level parallel merge/skip,” JCTVC-H0082, Feb. 2012
Related Work(2/2)
Local parallel method (LPM)
M = 16 or 8
Proposed Method
A. Data Dependency Analysis
B. DAG for CTUs
C. Highly Parallel Framework
Proposed Method.A(1/3)
Independent PUs (IPUs)
The IPU’s left boundary and the MER’s left boundary do not overlap.
The IPU’s upper boundary and the MER’s upper boundary do not overlap.
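The two boundary conditions above can be sketched as a simple predicate. This is only an illustration under stated assumptions: it assumes MERs tile the picture on a grid aligned to multiples of the MER size M, that a PU is described by the pixel coordinates of its top-left corner, and that both conditions must hold together. The function name `is_independent_pu` and its parameters are hypothetical, not taken from the slides.

```python
def is_independent_pu(pu_x: int, pu_y: int, mer_size: int) -> bool:
    """Return True if neither the PU's left nor its upper boundary
    coincides with the corresponding boundary of the enclosing MER.

    Assumption: MERs form a grid aligned to multiples of mer_size,
    so the enclosing MER starts at (pu_x // mer_size * mer_size,
    pu_y // mer_size * mer_size).
    """
    touches_mer_left = pu_x % mer_size == 0   # left boundaries overlap
    touches_mer_top = pu_y % mer_size == 0    # upper boundaries overlap
    return not touches_mer_left and not touches_mer_top


if __name__ == "__main__":
    # With M = 16: a PU at (8, 8) is strictly inside its MER,
    # while a PU at (0, 8) shares the MER's left boundary.
    print(is_independent_pu(8, 8, 16))
    print(is_independent_pu(0, 8, 16))
```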
Proposed Method.A(2/3)
Proposed Method.A(3/3)
Neighboring CTUs: left, upper, upper-left, upper-right
Proposed Method.B(1/4)
Generate a DAG to capture the dependency relationships of CTUs.
Proposed Method.B(2/4)
A DAG consists of a set of vertices V and a set of edges E.
A data dependency corresponds to an edge.
When a CTU has been processed, its vertex is removed.
Proposed Method.B(3/4)
Condition matrix (CM)
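As a rough illustration, a condition matrix can be initialized by counting, for each CTU, how many of its four neighboring CTUs (left, upper, upper-left, upper-right) lie inside the frame. The sketch below assumes row-major indexing `cm[i][j]` with `i` as the CTU row and `j` as the CTU column; the function name `build_condition_matrix` is hypothetical.

```python
from typing import List

# (row offset, column offset) of the CTUs a given CTU depends on:
# left, upper, upper-left, upper-right.
DEPENDENCY_OFFSETS = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]


def build_condition_matrix(rows: int, cols: int) -> List[List[int]]:
    """Count, for every CTU, the neighboring CTUs it depends on
    that actually exist inside a frame of rows x cols CTUs."""
    cm = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            cm[i][j] = sum(
                1
                for di, dj in DEPENDENCY_OFFSETS
                if 0 <= i + di < rows and 0 <= j + dj < cols
            )
    return cm


if __name__ == "__main__":
    for row in build_condition_matrix(3, 4):
        print(row)
```

Only the top-left CTU starts with a count of zero, so it is the only CTU that can be processed immediately.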
Proposed Method.B(4/4)
Proposed Method.C(1/5)
Proposed Method.C(2/5)
Step 1: Initialize DQ and CM. DQ is a waiting queue. CM is designed to record the number of related CTUs for each CTU.
Step 2: When some values in CM become zero, get the corresponding coordinates and push them into DQ.
Proposed Method.C(3/5)
Step 3: Get coordinates from DQ and process the corresponding CTUs in parallel on the many-core platform.
Step 4: Update CM. When the CTU with coordinate (i, j) in CM has been processed, the values at coordinates (i+1, j), (i+1, j-1), (i, j+1) and (i+1, j+1) in CM are each decremented by one.
Step 5: Repeat steps 2 to 4 until the whole frame has been processed.
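The five steps can be sketched as a small single-threaded simulation that reports which CTUs become ready together. This is a minimal sketch, not the authors' implementation: it assumes `i` is the CTU row and `j` the CTU column, it processes the queue in synchronous batches ("waves") instead of dispatching CTUs to cores as they become ready, and the name `schedule_ctus` is hypothetical.

```python
from collections import deque
from typing import List, Tuple


def schedule_ctus(rows: int, cols: int) -> List[List[Tuple[int, int]]]:
    """Return a list of waves; each wave holds the (i, j) coordinates
    of CTUs whose dependencies are all satisfied at the same time."""
    depends_on = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]  # left, upper, upper-left, upper-right
    dependents = [(1, 0), (1, -1), (0, 1), (1, 1)]      # CTUs updated in step 4

    def inside(i: int, j: int) -> bool:
        return 0 <= i < rows and 0 <= j < cols

    # Step 1: initialize CM (dependency counts) and DQ (waiting queue).
    cm = [
        [sum(1 for di, dj in depends_on if inside(i + di, j + dj)) for j in range(cols)]
        for i in range(rows)
    ]
    # Step 2 (initial pass): push every CTU whose count is already zero.
    dq = deque((i, j) for i in range(rows) for j in range(cols) if cm[i][j] == 0)

    waves = []
    while dq:  # Step 5: repeat until the frame is finished.
        # Step 3: take all ready CTUs; a real encoder would run them in parallel.
        wave = list(dq)
        dq.clear()
        waves.append(wave)
        # Step 4: decrement the counts of the CTUs that depend on each processed CTU.
        for i, j in wave:
            for di, dj in dependents:
                ni, nj = i + di, j + dj
                if inside(ni, nj):
                    cm[ni][nj] -= 1
                    if cm[ni][nj] == 0:  # Step 2: newly ready CTU.
                        dq.append((ni, nj))
    return waves


if __name__ == "__main__":
    for step, wave in enumerate(schedule_ctus(3, 4)):
        print(step, wave)
```

In this simulation the size of each wave is the CTU-level parallelism available at that step, so small frames expose little parallelism and larger frames expose more.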
Proposed Method.C(4/5)
Maximum parallelism of CTU
Maximum parallelism of highly parallel framework
Average parallelism of highly parallel framework
Proposed Method.C(5/5)
Experimental Results(1/5)
Experimental Results(2/5)
Experimental Results(3/5)
Experimental Results(4/5)
Experimental Results(5/5)
Conclusion(1/1)
The highly parallel framework provides sufficient parallelism for many-core platforms.
It uses the DAG-based order to parallelize CTUs.