Page 1: Optimizing Video Editing Software with - Home - AMDdeveloper.amd.com/wordpress/media/2013/06/1741_final.pdf · Optimizing Video Editing Software with OpenCL Stanley Lam Cyberlink
Optimizing Video Editing Software with OpenCL

Stanley Lam

Cyberlink

Cyberlink Senior Program Manager/Technologist

Optimizing Video Editing Software with OpenCL - Introduction

Introduction

Video Editing Pipeline

– Decode → Effect → Blend → Encode

– 200+ effects

OpenCL for Effect/Blend acceleration

– Compatibility

Single code base for multiple devices

– Performance

GPU support

Concerns

– Host-to-device memory copy

– Host code security

[Figure: pipeline diagram showing the Decoder, Effects, Blender, and Encoder modules behind the Decode, Effect, Blend, and Encode stages]

Video Editing Pipeline

4-stage pipeline

1. Obtain decoded frames from Decoder modules

2. Apply Video Effect on demand

3. Apply Blender to merge frame layers 1-by-1

4. Pass frame into Encoder to produce final video

[Figure: pipeline diagram in which two Decoder, Effect, Blender branches feed a single Encoder]
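The four stages can be sketched as plain functions. This is a toy model, not the presenter's implementation: frames are lists of integer pixel values, and the names `decode`, `brighten`, `blend`, `encode`, and `render` are hypothetical stand-ins for the real modules.

```python
from functools import reduce

def decode(source):
    """Stage 1: stand-in decoder; the 'source' is already a list of pixel values."""
    return list(source)

def brighten(frame, amount=10):
    """Stage 2: a hypothetical effect, applied only when requested."""
    return [min(255, p + amount) for p in frame]

def blend(bottom, top):
    """Stage 3: merge two layers with a 50/50 mix (an arbitrary choice)."""
    return [(a + b) // 2 for a, b in zip(bottom, top)]

def encode(frame):
    """Stage 4: stand-in encoder that packs the pixels into bytes."""
    return bytes(frame)

def render(sources, effects):
    """Run one output frame through decode, effect, blend, and encode."""
    layers = []
    for source, effect in zip(sources, effects):
        frame = decode(source)
        if effect is not None:          # effects are applied on demand
            frame = effect(frame)
        layers.append(frame)
    merged = reduce(blend, layers)      # layers are merged one by one
    return encode(merged)

output = render([[0, 100, 250], [20, 40, 60]], [brighten, None])
print(list(output))
```

Every stage here reads and writes ordinary host memory, which corresponds to the CPU-only baseline that the rest of the talk compares against.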

Optimizing Video Editing Software with OpenCL – Problem statement

Problem statement

OpenCL source buffer needs to be uploaded to the device before applying kernel operations

OpenCL result buffer needs to be downloaded to the host and passed to the next pipeline stage

Such Host-to-Device buffer movements could eliminate any HW acceleration gain

                CPU          OCL_GPU                               OCL_Kernel   OCL_GPU
Effect Name     Total (ms)   Total (ms)  Kernel  Upload  Download  Gain         Gain
TV Simulator    14.11        15.46       2.88    6.11    6.47      389.93 %     -8.73 %
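The figures in this measurement can be re-derived with a few lines of arithmetic. The gain formula used below, (CPU time / OpenCL time - 1) × 100, is inferred from the numbers rather than stated on the slide, and the variable names are illustrative.

```python
# Timings for the "TV Simulator" effect, in milliseconds, as measured on the slide.
cpu_total = 14.11
kernel, upload, download = 2.88, 6.11, 6.47

def gain_percent(baseline_ms, accelerated_ms):
    """Speed gain relative to the baseline (assumed formula, see the lead-in)."""
    return (baseline_ms / accelerated_ms - 1.0) * 100.0

ocl_total = kernel + upload + download            # whole OpenCL path
transfer_share = (upload + download) / ocl_total  # fraction spent moving frames

print(f"OCL_GPU total:   {ocl_total:.2f} ms")
print(f"OCL_Kernel gain: {gain_percent(cpu_total, kernel):.2f} %")
print(f"OCL_GPU gain:    {gain_percent(cpu_total, ocl_total):.2f} %")
print(f"Time spent in transfers: {transfer_share:.0%}")
```

Roughly four fifths of the OpenCL path goes to moving the frame, which is why the kernel-only gain turns into a small loss once upload and download are included.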

Platform: Armorhead

Driver: 8.832

SDK: APP SDK 2.4 RC

OS: Win7 Ultimate 64bits

Optimizing Video Editing Software with OpenCL – Resource inventory

Resource inventory

Pipeline: n Source Videos (hence n blenders) with k effects

[Figure: pipeline diagram with n sources (Decoder 1 to Decoder n), k effects (Effect 1 to Effect k), n blenders, and 1 encoder]

Resource inventory

Typical case: All modules are executed by the CPU (Host) on system memory.

• Minimum # of frame copies = n + k + n (decoders: n, effects: k, blenders: n, encoder: 0)

[Figure: data flow for n decoders, 2n effects (drawn for k == 2n), and n blenders feeding one encoder; every frame is in system memory, and the frame copies are numbered 1 to 4n]

If GPGPU Filter is used…

Logic:

– Upload input frame to the Device

– Upload filter kernel code to the Device

– Run the kernel code on the Device

– Read-back output frame to the Host
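The sequence can be mimicked without a GPU. The sketch below uses a hypothetical `FakeDevice` class that only simulates device memory and logs each step; it makes no real OpenCL calls, and the `invert` kernel is an arbitrary example.

```python
class FakeDevice:
    """Simulated device: a dict stands in for device memory (not real OpenCL)."""

    def __init__(self):
        self.memory = {}
        self.kernels = {}
        self.log = []

    def write_buffer(self, name, frame):
        self.memory[name] = list(frame)       # host -> device copy
        self.log.append(f"upload {name}")

    def build_kernel(self, name, func):
        self.kernels[name] = func             # stands in for sending kernel code
        self.log.append(f"build {name}")

    def run_kernel(self, name, src, dst):
        self.memory[dst] = [self.kernels[name](p) for p in self.memory[src]]
        self.log.append(f"run {name}")

    def read_buffer(self, name):
        self.log.append(f"download {name}")
        return list(self.memory[name])        # device -> host copy


device = FakeDevice()
device.write_buffer("input", [0, 128, 255])        # 1. upload the input frame
device.build_kernel("invert", lambda p: 255 - p)   # 2. upload the kernel code
device.run_kernel("invert", "input", "output")     # 3. run the kernel
result = device.read_buffer("output")              # 4. read back the output

print(result)
print(device.log)
```

Each frame crosses the host/device boundary twice per filter in this model, once in each direction.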


OpenCL GPGPU Effect – Host/Device copy

Pipeline OCL Host/Device Frame Movement

• 1 Input frame uploaded to Device

• 1 Output frame read-back to Host

[Figure: the Input frame on the CPU Host is written to the GPU Device, the Effect (OpenCL) runs there, and the Output frame is read back to the Host]

OpenCL GPGPU Blender – Host/Device copy

Pipeline OCL Host/Device Frame Movement

• 2 Input frames uploaded to Device

• 1 Output frame read-back to Host

[Figure: Input1 and Input2 on the Host are each written to the Device, the Blender (OpenCL) runs there, and the Output frame is read back to the Host]

Data Flow – Use OpenCL Effect/Blender

• If we just simply put OpenCL GPGPU modules into the pipeline

• Minimum # of frame copies = n + 3k + 4n (decoders: n, effects: 3k, blenders: 4n, encoder: 0)

• Frame copy overhead = 2k + 3n

[Figure: data flow for n decoders, 2n OpenCL effects (drawn for k == 2n), and n OpenCL blenders feeding one encoder, with numbered frame copies in Host and Device memory]

Reduced Host/Device copy

Data Flow OCL Host/Device Frame Movement

• If we keep frames in Device as long as possible

[Figure: the Effect (OpenCL) takes its Input frame from Device memory and leaves its Output frame in Device memory]

Reduced Host/Device copy

Data Flow OCL Host/Device Frame Movement

• Keep frames in Device as long as possible

[Figure: the Blender (OpenCL) takes Input1 and Input2 from Device memory and leaves its Output frame in Device memory]
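A small counting model illustrates the benefit of keeping frames on the Device. It is a sketch under simple assumptions: every stage in the chain is an OpenCL effect, each Host/Device crossing costs one transfer, and the function name `count_transfers` is made up for illustration.

```python
def count_transfers(num_effects, keep_on_device):
    """Count Host/Device transfers for one frame through a chain of OpenCL effects."""
    transfers = 0
    on_device = False
    for _ in range(num_effects):
        if not on_device:
            transfers += 1        # upload the input frame
            on_device = True
        # ... the effect kernel would run here; its output is in device memory ...
        if not keep_on_device:
            transfers += 1        # read the output back to the host
            on_device = False
    if on_device:
        transfers += 1            # final read-back for the next host-side stage
    return transfers

for effects in (1, 2, 4):
    naive = count_transfers(effects, keep_on_device=False)
    resident = count_transfers(effects, keep_on_device=True)
    print(f"{effects} effects: naive={naive}, resident={resident}")
```

In this model the naive count grows with the length of the chain, while the resident count stays at one upload and one read-back.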

OpenCL Effects Data Flow – Reduced

• Keeping frames in GPGPU Device to reduce frame copy

• Minimum # of frame copies = 2n + k + n + 1 (source videos: 2n, effects: k, blenders: n, encoder: 1)

• Frame copy overhead = n + 1

[Figure: data flow for n source videos, 2n OpenCL effects (drawn for k == 2n), and n OpenCL blenders feeding one encoder; frames stay in Device memory between the OpenCL modules, and the frame copies are numbered 1 to 5n+1]
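The three frame-copy formulas from the data-flow slides can be compared directly. The function names are illustrative and the per-module attribution in the docstrings is inferred from the diagram labels; the expressions themselves are taken from the slides as written.

```python
def copies_cpu_only(n, k):
    """All modules on the CPU: n (decoders) + k (effects) + n (blenders)."""
    return n + k + n

def copies_naive_opencl(n, k):
    """OpenCL effects and blenders with upload and read-back around each module."""
    return n + 3 * k + 4 * n

def copies_reduced_opencl(n, k):
    """Frames kept on the Device: 2n + k + n, plus 1 read-back for the encoder."""
    return 2 * n + k + n + 1

n = 3          # source videos (and blenders)
k = 2 * n      # effects, using the slides' k == 2n case
base = copies_cpu_only(n, k)
for name, func in [("CPU only", copies_cpu_only),
                   ("naive OpenCL", copies_naive_opencl),
                   ("reduced OpenCL", copies_reduced_opencl)]:
    total = func(n, k)
    print(f"{name:15s} copies={total:3d} overhead={total - base}")
```

Subtracting the CPU-only count reproduces the stated overheads: 2k + 3n for the naive OpenCL pipeline and n + 1 for the reduced one.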

Resource inventory

Case 1-1: Decoder → OpenCL Effect

– SW Decoder + OpenCL Effect

1 frame copy (OCL Host → OCL Device)

– HW Decoder + OpenCL Effect (no share)

2 frame copies (DxVA Device → OCL Host (System) → OCL Device)

– HW Decoder + OpenCL Effect (DxVA/OpenDecode share)

0 frame copies

Case 1-2: Decoder → SW Effect

– SW Decoder + SW Effect

0 frame copies

– HW Decoder + SW Effect

1 frame copy (DxVA Device → System)

Resource inventory

Case 2-1: OpenCL Effect → Blender

– 2 OpenCL Effects + SW Blender

2 frame copies (2 OCL Device → 2 OCL Host)

– 2 OpenCL Effects + OpenCL Blender (no share)

4 frame copies (2 OCL Device → 2 OCL Host (System) → 2 OCL Device)

– 2 OpenCL Effects + OpenCL Blender (OpenCL shared)

0 frame copies

Case 2-2: SW Effect → Blender

– 2 SW Effects + SW Blender

0 frame copies

– 2 SW Effects + OpenCL Blender

2 frame copies (2 OCL Host → 2 OCL Device)

Resource inventory

Case 3-1: OpenCL Blender → Encoder

– OpenCL Blender + SW Encoder

1 frame copy (OCL Device → OCL Host)

– OpenCL Blender + HW Encoder (no share)

2 frame copies (OCL Device → OCL Host (System) → Enc Device)

– OpenCL Blender + HW Encoder (OpenEncode shared)

0 frame copies

Case 3-2: SW Blender → Encoder

– SW Blender + SW Encoder

0 frame copies

– SW Blender + HW Encoder

1 frame copy (System → Enc Device)
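The cases in this inventory appear to follow one pattern, which can be written as a small lookup. This generalization is an inference from the listed cases, not something the slides state; the domain labels and the `copies_per_frame` function are hypothetical.

```python
SYSTEM = "system"   # host memory used by SW modules

def copies_per_frame(producer, consumer, shared=False):
    """Frame copies needed to hand one frame from producer to consumer.

    producer and consumer are memory-domain labels such as "system",
    "dxva", "opencl", or "encoder" (labels chosen for this sketch).
    """
    if producer == SYSTEM and consumer == SYSTEM:
        return 0                      # SW to SW: same system memory
    if producer == SYSTEM or consumer == SYSTEM:
        return 1                      # one upload or one read-back
    if shared:
        return 0                      # device-side sharing: no copy
    return 2                          # device to system to device

# Case 1-1: HW decoder feeding an OpenCL effect
print(copies_per_frame("dxva", "opencl"))                # no share
print(copies_per_frame("dxva", "opencl", shared=True))   # DxVA/OpenDecode share
# Case 2-1: two OpenCL effects feeding an OpenCL blender, no share
print(2 * copies_per_frame("opencl", "opencl"))
```

The blender cases multiply the per-frame count by two because a blender consumes two input frames.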

Resource inventory

Memory is a limited resource within the GPU

e.g. 1024 MB in the HD 6800 series

Lots of memory activities in video editing pipeline

Memory management is critical

Host-to-Device frame-copy must occur when the pipeline uses a mix of CPU and GPU filters

Can be improved by “zero-copy”, “fast copy” techniques

Frame-copy can be avoided if whole pipeline is in the same Device
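To give a rough sense of scale for the 1024 MB figure, the arithmetic below estimates how many uncompressed Full HD frames would fit. The pixel formats (8-bit RGBA at 4 bytes per pixel, NV12 at 1.5 bytes per pixel) are assumptions, as is treating 1024 MB as 1024 MiB that is entirely available for frames; in practice part of the GPU memory is taken by other allocations.

```python
GPU_MEMORY_BYTES = 1024 * 1024 * 1024      # 1024 MB, treated as MiB (assumption)
WIDTH, HEIGHT = 1920, 1080                 # Full HD frame

def frames_that_fit(bytes_per_pixel):
    """Upper bound on whole frames that fit, ignoring all other allocations."""
    frame_bytes = int(WIDTH * HEIGHT * bytes_per_pixel)
    return GPU_MEMORY_BYTES // frame_bytes

print("RGBA frames:", frames_that_fit(4))
print("NV12 frames:", frames_that_fit(1.5))
```

With several source videos, intermediate effect outputs, and multi-threaded rendering all holding frames at once, a budget of this order is consumed quickly.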

Resource inventory

Various HW acceleration techniques adopted

They handle resources differently

– Buffers duplicated

Module          HW Acceleration Tech
Decoder         DxVA
Effects
PiP             OpenCL
DSP             OpenCL/IntelQuickSync/APP/CUDA
Particle        D3D11
3D Template     D3D9
Title           OpenCL
Blender         OpenCL
Encoder         MSDK/AVT/NVPVENC (Note 1)

Note 1: Use HW Encoder filter from IHV

Resource inventory

Multi-thread rendering

– Can leverage available computational resources

– Further increases overall rendering performance

Dedicated Memory Management is necessary

– To improve performance in CPU+GPU mixed pipeline cases

– To avoid memory movement in pure GPGPU pipeline cases

– OpenCL/D3D9/D3D11

– To use limited GPGPU memory in a more efficient way

Optimizing Video Editing Software with OpenCL – Memory Management

Memory management

2 pipeline scenarios

– CPU-only pipeline – use only CPU resource

– HW accelerated pipeline – use GPU resource if possible

To manage resources for different pipeline scenarios

– To share memory objects

– To reduce memory allocation/destruction

– To increase memory usage efficiency

2-layer design

– Manager + Object

Memory Manager

Memory object manager

– Allocate/Free/Monitor memory objects from both system and GPGPU memory

– Sync up GPGPU device used in pipeline

– Handle out-of-memory situation

– Keep track of memory object status

Total object amount

Total used/locked memory size

Used by editing kernel to deliver/manage frame buffers in pipeline

Memory Object

Abstracted frame buffer object

– To carry System/OpenCL/DxVA/D3D9/D3D11 frames

Used by all modules within pipeline

Do host/device migration

– Notify out-of-memory situation

Centrally managed by Memory Manager
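The Manager + Object design might look roughly like the sketch below. It is not the presenter's code: the class and method names (`MemoryManager`, `MemoryObject`, `acquire`, `release`, `migrate`) and the byte budget are invented for illustration, and device memory is simulated with a `bytearray` rather than real OpenCL, DxVA, or D3D buffers.

```python
class MemoryObject:
    """Abstract frame buffer that records where its data currently lives."""

    def __init__(self, size, location):
        self.size = size
        self.location = location            # "host" or "device"
        self.data = bytearray(size)
        self.migrations = 0

    def migrate(self, location):
        """Move the frame between host and device; a no-op if already there."""
        if location != self.location:
            self.location = location
            self.migrations += 1


class MemoryManager:
    """Allocates, recycles, and tracks MemoryObjects within a byte budget."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.allocated = 0      # bytes held by all live objects, pooled or in use
        self.used = 0           # bytes currently handed out to pipeline modules
        self.total_objects = 0
        self.free_pool = []     # released objects kept for reuse

    def acquire(self, size, location="host"):
        for obj in self.free_pool:          # reuse instead of reallocating
            if obj.size == size and obj.location == location:
                self.free_pool.remove(obj)
                self.used += size
                return obj
        # Out-of-memory handling: discard idle pooled buffers before failing.
        while self.free_pool and self.allocated + size > self.budget:
            dropped = self.free_pool.pop()
            self.allocated -= dropped.size
            self.total_objects -= 1
        if self.allocated + size > self.budget:
            raise MemoryError("out of memory: no idle buffers left to free")
        self.allocated += size
        self.used += size
        self.total_objects += 1
        return MemoryObject(size, location)

    def release(self, obj):
        self.used -= obj.size
        self.free_pool.append(obj)


manager = MemoryManager(budget_bytes=3 * 1024)
frame = manager.acquire(1024, "host")
frame.migrate("device")          # upload once for an OpenCL effect
frame.migrate("device")          # already on the device: no extra copy
manager.release(frame)
reused = manager.acquire(1024, "device")
print(reused is frame, frame.migrations, manager.total_objects, manager.used)
```

Pooling released buffers addresses the goal of reducing allocate/destroy churn, and tracking the location on the object lets a module request a frame in the domain it needs without knowing which module produced it.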

Optimizing Video Editing Software with OpenCL – Coding examples

Examples

[Figure: pipeline of Decoder, Effect 1, Effect 2, Blender, and Encoder. Legend: Memory Manager; Memory Object (buffer passed through the pipeline); temporary buffer]

Optimizing Video Editing Software with OpenCL - Demo

Demo

Show whole editing pipeline

– PDR10 (build 5/30)

Demo cases

– No HW acceleration

SW Dec + SW Effect + SW Blender + SW Enc

– Enable all OCL acceleration w/ share

SW Dec + OCL Effect + OCL Blender + SW Enc

– Enable all HW acceleration w/o share

HW Dec + OCL Effect + OCL Blender + HW Enc

– Enable all HW acceleration w/ share

Not ready yet

Performance

Performance numbers of 46 OCL Effects

Performance numbers of Demo cases

Test Projects: PackedProject\Project1.pds - PDR 10 2x2 TVWall with 3 OpenCL effects

Source: Sample_H264.m2ts - Full HD H264 clip, 30 sec

Destination: MPEG2 - Profile "(HD) MPEG-2, 1080i" PDR10 build

Sabine Platform: 1.8 GHz, Radeon HD 6620G, 4GB system memory

Configuration                                             Total (s)   Gain (%)
SW (Effect + Blender) + SW (Dec + Enc)                    1045        -
OCL (Effect + Blender) + SW (Dec + Enc) (with overhead)   485         115.46
OCL (Effect + Blender) + SW (Dec + Enc) (w/o overhead)    405         158.02
OCL (Effect + Blender) + HW (Dec + Enc) (w/o overhead)    455         129.67

Q & A

34 | Optimizing Video Editing Software with OpenCL | June 2011

Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD’s positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied.