Upload: bernard-chambers

Post on 16-Jan-2016

OpenCL Introduction

AN EXAMPLE FOR OPENCL

LU LU

OCT. 11, 2014

OPENCL INTRODUCTION | APRIL 11, 2014

CONTENTS

1. Environment Configuration

2. Case Analysis

1. ENVIRONMENT CONFIGURATION

IDE
– Any C/C++ IDE can be used with OpenCL.
– We use Microsoft Visual Studio 2010.

Settings required for the project:
– Add the include path of the SDK to Additional Include Directories.
– Add the library path of the SDK to Additional Library Directories.

Include Directory

Lib Directory

OpenCL Lib

2. CASE ANALYSIS

1. Problem Description

2. Algorithm

3. Calculation Features

4. Parallelizing

5. Programming
   1. Kernel
   2. Host

6. Tools
   1. AMD Profiler
   2. gDEBugger

2.1 PROBLEM DESCRIPTION

Input: an image, the rotation center and the rotation angle.

Output: the rotated image, with the same size as the input (original) image.

[Figure: original image and rotated image]

2.2 ALGORITHM

Let $(x_0, y_0)$ be the rotation center and $\theta$ the rotation angle.

A point $(x, y)$ in the original image moves to the new position $(x', y')$ after rotating clockwise, according to the following formula:
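The formula is not reproduced in the text here. As a sketch, assuming the standard clockwise rotation in a coordinate system whose y axis points up, and using $(x_0, y_0)$ for the center, $\theta$ for the angle, $(x, y)$ for the original point and $(x', y')$ for the new position (all symbol names are choices made for this sketch), it would read:

```latex
\begin{aligned}
x' &= x_0 + (x - x_0)\cos\theta + (y - y_0)\sin\theta \\
y' &= y_0 - (x - x_0)\sin\theta + (y - y_0)\cos\theta
\end{aligned}
```

With $x_0 = y_0 = 0$ these expressions have the same form as the `xpos` and `ypos` calculations in the kernel of section 2.5.1, which applies them to the output index to find the source position.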

2.3 CALCULATION FEATURES

The calculation for each point is the same and independent of the other points;

There is a large number of points.

So the problem is well suited to parallel computing on a GPU.

2.4 PARALLELIZING

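With the OpenCL framework, assign one work-item to the calculation for each point.

There are two ways to implement the algorithm:
– Assign work-items as per the original image (input decomposition).
  • For each point, calculate the new position and copy the point to the output image;
  • Write-memory conflict: several work-items may write to the same output position.
– Assign work-items as per the output image (output decomposition).
  • For each point, calculate the source position and copy it from the original image;
  • Read-memory conflict: several work-items may read the same source position, which does not corrupt the result.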
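The difference between the two decompositions can be made visible on the CPU. The following is a minimal sketch in plain C, not OpenCL code: each loop iteration stands in for one work-item, and the image size `IMG_W` x `IMG_H`, the helper names and the truncation rule are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

enum { IMG_W = 8, IMG_H = 8 };  /* small illustrative image size (assumption) */

/* Truncate a coordinate to a pixel index; -1 if it lies outside [0, limit). */
static int to_index(float v, int limit)
{
    assert(limit > 0);
    if (v < 0.0f || v >= (float)limit) return -1;
    return (int)v;
}

/* Input decomposition: one iteration per SOURCE pixel.
   writes[y][x] counts how many iterations target output pixel (x, y). */
static void scatter_counts(float sinT, float cosT, int writes[IMG_H][IMG_W])
{
    memset(writes, 0, sizeof(int) * IMG_W * IMG_H);
    for (int y = 0; y < IMG_H; ++y)
        for (int x = 0; x < IMG_W; ++x) {
            int nx = to_index(x * cosT - y * sinT, IMG_W);
            int ny = to_index(x * sinT + y * cosT, IMG_H);
            if (nx >= 0 && ny >= 0) writes[ny][nx]++;
        }
}

/* Output decomposition: one iteration per OUTPUT pixel, so each output pixel
   is written at most once. reads[y][x] counts reads of source pixel (x, y). */
static void gather_counts(float sinT, float cosT,
                          int writes[IMG_H][IMG_W], int reads[IMG_H][IMG_W])
{
    memset(writes, 0, sizeof(int) * IMG_W * IMG_H);
    memset(reads, 0, sizeof(int) * IMG_W * IMG_H);
    for (int iy = 0; iy < IMG_H; ++iy)
        for (int ix = 0; ix < IMG_W; ++ix) {
            int sx = to_index(ix * cosT + iy * sinT, IMG_W);
            int sy = to_index(iy * cosT - ix * sinT, IMG_H);
            if (sx >= 0 && sy >= 0) { writes[iy][ix]++; reads[sy][sx]++; }
        }
}

/* Largest entry of a count table. */
static int max_count(int counts[IMG_H][IMG_W])
{
    int m = 0;
    for (int y = 0; y < IMG_H; ++y)
        for (int x = 0; x < IMG_W; ++x)
            if (counts[y][x] > m) m = counts[y][x];
    return m;
}
```

For a 45 degree rotation (sine and cosine both about 0.7071), `scatter_counts` reports output pixels that are targeted more than once, which on a GPU would be a race between work-items. `gather_counts` writes each output pixel at most once and only shares reads, which is why the output decomposition is the easier one to parallelize.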

2.5 PROGRAMMING

1. Kernel: runs on the GPU.

2. Host: runs on the CPU.

2.5.1 KERNEL

__kernel void image_rotate(
    __global const float *src_data,   // input image in global memory
    __global float *dest_data,        // output image in global memory
    int W, int H,                     // image dimensions
    float sinTheta, float cosTheta)   // rotation parameters
{
    // Each work-item gets its index within the index space
    const int ix = get_global_id(0);
    const int iy = get_global_id(1);

    // Calculate the source location of the data to move into (ix, iy):
    // output decomposition, as mentioned earlier
    float xpos = ((float)ix) * cosTheta + ((float)iy) * sinTheta;
    float ypos = ((float)iy) * cosTheta - ((float)ix) * sinTheta;

    // Bounds checking on both the output index and the source position
    if (ix < W && iy < H &&
        xpos >= 0.0f && xpos < (float)W &&
        ypos >= 0.0f && ypos < (float)H)
    {
        // Read src_data at (xpos, ypos) and store it at (ix, iy) in dest_data.
        // Truncate each coordinate before forming the linear index.
        dest_data[iy * W + ix] = src_data[(int)ypos * W + (int)xpos];
    }
}

This kernel rotates the image anticlockwise by the rotation angle.

OpenCL provides built-in math functions such as sin and cos, but here these values are calculated on the host and passed to the kernel as parameters, because they are the same for every work-item.
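When testing a kernel like this one, it can help to have a CPU reference to compare the GPU output against. The sketch below is plain C and only mirrors the per-pixel logic described above; the function names `rotate_reference` and `count_mismatches` are hypothetical, and truncating each coordinate separately is an assumption about the intended indexing.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for the rotation: one loop iteration per output pixel.
   Output pixels whose source lies outside the image are left untouched. */
static void rotate_reference(const float *src, float *dst, int W, int H,
                             float sinTheta, float cosTheta)
{
    assert(src != NULL && dst != NULL && W > 0 && H > 0);
    for (int iy = 0; iy < H; ++iy) {
        for (int ix = 0; ix < W; ++ix) {
            /* Source position for output pixel (ix, iy) */
            float xpos = (float)ix * cosTheta + (float)iy * sinTheta;
            float ypos = (float)iy * cosTheta - (float)ix * sinTheta;
            if (xpos >= 0.0f && xpos < (float)W &&
                ypos >= 0.0f && ypos < (float)H) {
                dst[iy * W + ix] = src[(int)ypos * W + (int)xpos];
            }
        }
    }
}

/* Count positions where two images differ by more than tol. */
static int count_mismatches(const float *a, const float *b, int n, float tol)
{
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        float d = a[i] - b[i];
        if (d > tol || d < -tol) ++bad;
    }
    return bad;
}
```

A typical use would be to run `rotate_reference` with the same `sinTheta` and `cosTheta` passed to the kernel, then call `count_mismatches` on the buffer read back from the device. Passing `sinTheta = 0` and `cosTheta = 1` gives an identity copy, which is a convenient first check.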

KernelAnalyzer
– The reported bottleneck is ALU operations.
– This means that the main work of the kernel is calculation rather than data transfer.
– In that sense the kernel performs well: it is compute-bound rather than memory-bound.

2.5.2 HOST

Platform
• Query Platform
• Query Devices
• Create Context
• Create Command Queue

Compiler
• Compile Program
• Create Kernel

Runtime
• Create Buffers
• Write Buffers
• Set Kernel Arguments
• Run Kernel
• Read Buffers

Query Platform

cl_int clGetPlatformIDs (cl_uint num_entries,
                         cl_platform_id *platforms,
                         cl_uint *num_platforms)

– This function is usually called twice: the first call gets the number of platforms, and the second call gets the platforms themselves.
– First call:
  • clGetPlatformIDs(0, NULL, &num)
– Second call:
  • clGetPlatformIDs(num, platforms, NULL)
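The count-then-fetch idiom described above can be tried out without an OpenCL installation. In the sketch below, `fake_get_ids` is a HYPOTHETICAL stand-in written for this example, not an OpenCL function; it only imitates the (num_entries, output array, output count) calling convention, and its error rule is loosely modeled on the real call.

```c
#include <assert.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-in, not an OpenCL function: reports three made-up IDs
   using a (num_entries, out_array, out_count) calling convention. */
static int fake_get_ids(unsigned num_entries, int *ids, unsigned *num_ids)
{
    static const int available[] = { 101, 102, 103 };
    const unsigned total = sizeof available / sizeof available[0];

    if (ids == NULL && num_ids == NULL) return -1;   /* nothing requested */
    if (ids != NULL && num_entries == 0) return -1;  /* array without capacity */
    if (num_ids != NULL) *num_ids = total;
    if (ids != NULL)
        for (unsigned i = 0; i < num_entries && i < total; ++i)
            ids[i] = available[i];
    return 0;
}

/* Two-call idiom: ask for the count, allocate, then ask for the entries.
   Returns a malloc'd array (the caller frees it) or NULL on failure. */
static int *query_all_ids(unsigned *count)
{
    assert(count != NULL);
    /* First call: no array, only the count */
    if (fake_get_ids(0, NULL, count) != 0 || *count == 0) return NULL;

    int *ids = malloc(*count * sizeof *ids);
    if (ids == NULL) return NULL;

    /* Second call: the array is now large enough for every entry */
    if (fake_get_ids(*count, ids, NULL) != 0) { free(ids); return NULL; }
    return ids;
}
```

With the real API the shape would be the same: pass 0 and NULL with a pointer to the count on the first call, allocate an array of `cl_platform_id` of that length, then pass the count and the array on the second call.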

Query Devices

cl_int clGetDeviceIDs (cl_platform_id platform,
                       cl_device_type device_type,
                       cl_uint num_entries,
                       cl_device_id *devices,
                       cl_uint *num_devices)

– This function is also usually called twice, just like clGetPlatformIDs.
– device_type, for example:
  • CL_DEVICE_TYPE_ALL
  • CL_DEVICE_TYPE_CPU
  • CL_DEVICE_TYPE_GPU

Create Context

cl_context clCreateContext (
    const cl_context_properties *properties,
    cl_uint num_devices,
    const cl_device_id *devices,
    void (CL_CALLBACK *pfn_notify)(const char *errinfo,
                                   const void *private_info,
                                   size_t cb,
                                   void *user_data),
    void *user_data,
    cl_int *errcode_ret)

Create Command Queue

cl_command_queue clCreateCommandQueue (
    cl_context context,
    cl_device_id device,
    cl_command_queue_properties properties,
    cl_int *errcode_ret)

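Compile Program

cl_program clCreateProgramWithSource (
    cl_context context,
    cl_uint count,
    const char **strings,
    const size_t *lengths,
    cl_int *errcode_ret)

– The program object created from source must then be built with clBuildProgram before a kernel can be created from it.

Create Kernel

cl_kernel clCreateKernel (
    cl_program program,
    const char *kernel_name,
    cl_int *errcode_ret)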

Create Buffers

cl_mem clCreateBuffer (
    cl_context context,
    cl_mem_flags flags,
    size_t size,
    void *host_ptr,
    cl_int *errcode_ret)

Write Buffers

cl_int clEnqueueWriteBuffer (
    cl_command_queue command_queue,
    cl_mem buffer,
    cl_bool blocking_write,
    size_t offset,
    size_t size,
    const void *ptr,
    cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list,
    cl_event *event)

Set Kernel Arguments (for each one)

cl_int clSetKernelArg (
    cl_kernel kernel,
    cl_uint arg_index,
    size_t arg_size,
    const void *arg_value)

Run Kernel

cl_int clEnqueueNDRangeKernel (
    cl_command_queue command_queue,
    cl_kernel kernel,
    cl_uint work_dim,
    const size_t *global_work_offset,
    const size_t *global_work_size,
    const size_t *local_work_size,
    cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list,
    cl_event *event)

Parameters of clEnqueueNDRangeKernel
– work_dim is the number of dimensions used to specify the global work-items and the work-items in a work-group.
– global_work_offset can be used to specify an array of work_dim unsigned values that describe the offset used to calculate the global ID of a work-item.
– If global_work_offset is NULL, the global IDs start at offset (0, 0, … 0).
– local_work_size points to an array of work_dim unsigned values that describe the number of work-items that make up a work-group (also referred to as the size of the work-group) that will execute the kernel specified by kernel.
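How the offset and the work-group size relate to the IDs a work-item sees can be sketched for a single dimension. The code below is plain C for illustration only; the struct `work_item_ids` and the function `ids_for` are hypothetical names, and the sketch assumes every work-group has the same size.

```c
#include <assert.h>
#include <stddef.h>

/* Indices of one work-item along one dimension d. */
typedef struct {
    size_t global_id;  /* what get_global_id(d) would return */
    size_t group_id;   /* what get_group_id(d) would return  */
    size_t local_id;   /* what get_local_id(d) would return  */
} work_item_ids;

/* IDs of the n-th work-item (0-based) along one dimension, given the
   global offset and the work-group size for that dimension. */
static work_item_ids ids_for(size_t n, size_t global_offset, size_t local_size)
{
    assert(local_size > 0);
    work_item_ids ids;
    ids.group_id  = n / local_size;   /* which work-group it falls into */
    ids.local_id  = n % local_size;   /* position inside that work-group */
    ids.global_id = global_offset + ids.group_id * local_size + ids.local_id;
    return ids;
}
```

For example, with a work-group size of 16 and no offset, work-item number 37 lies in group 2 at local position 5 and has global ID 37. With an offset of 100 the group and local IDs stay the same and the global ID becomes 137.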

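Parameters of clEnqueueNDRangeKernel (continued)
– global_work_size points to an array of work_dim unsigned values that describe the number of global work-items in each dimension. If local_work_size is NULL, the OpenCL implementation determines how to break the global work-items into appropriate work-group instances. If local_work_size is specified, global_work_size must be evenly divisible by local_work_size.
– event_wait_list and num_events_in_wait_list specify events that need to complete before this particular command can be executed.
– event returns an event object that identifies this particular kernel execution instance.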
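Because of the divisibility requirement, image dimensions that are not a multiple of the work-group size are commonly padded up on the host. The helper below is a minimal sketch in plain C; `round_up` and `padded_global_size` are hypothetical names, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of local (local must be > 0). */
static size_t round_up(size_t size, size_t local)
{
    assert(local > 0);
    size_t rem = size % local;
    return rem == 0 ? size : size + (local - rem);
}

/* Fill a 2-D global work size for a W x H image so that each dimension
   is evenly divisible by the matching work-group size. */
static void padded_global_size(size_t W, size_t H,
                               const size_t local[2], size_t global[2])
{
    global[0] = round_up(W, local[0]);
    global[1] = round_up(H, local[1]);
}
```

For a 1000 x 600 image and 16 x 16 work-groups this gives a global size of 1008 x 608. The extra work-items fall outside the image, so the kernel then needs a guard such as `ix < W && iy < H` before it writes anything.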

Read Buffers

cl_int clEnqueueReadBuffer (
    cl_command_queue command_queue,
    cl_mem buffer,
    cl_bool blocking_read,
    size_t offset,
    size_t size,
    void *ptr,
    cl_uint num_events_in_wait_list,
    const cl_event *event_wait_list,
    cl_event *event)

Release
– clReleaseKernel
– clReleaseProgram
– clReleaseMemObject
– clReleaseCommandQueue
– clReleaseContext
– clReleaseDevice

2.6 TOOLS

1. AMD Profiler

2. gDEBugger

2.6.1 AMD PROFILER

Counters

The counters show run-time information for any kernel.

Trace

Traces calls into the OpenCL runtime.

2.6.2 GDEBUGGER

Step into and debug kernel code.

THANK YOU!

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.