parallel image processing - purdue engineeringparallel image processing lei cao and yan wang...

22
Parallel Image Processing Lei Cao and Yan Wang 2013-04-19 1

Upload: others

Post on 22-Mar-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

Parallel Image Processing Lei Cao and Yan Wang

2013-04-19

1

Page 2: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

2

Image Size: 4753 * 3168 (*3 RGB)

Page 3: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

3

LOMO effect: boundaries are made darker; add contrast to the red channel.

Page 4: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

4

Page 5: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

5

Tilt-shift effect: certain regions are made out-of-focus by Gaussian filtering

Page 6: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

Outline

Algorithm &Serial Code Profiling

OpenMP Parallelization

MPI Parallelization

Performance

Future Work

6

Page 7: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

Algorithm

7

Tilt-shift LOMO

Page 8: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

Serial Program Profiling

8

4%

71%

8%

15%

2% 0%

Time

Read Image

Tilt Shift

Red Tune

Vignette

Write Image

Others

Tested at Purdue’s Hansen cluster (Dell compute nodes with four 12-core AMD Opteron

6176 processors). Compiler: intel/12.0.084. R/W the image with the JPEG library

LOMO

Page 9: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

9

255 135 2

221 127 18

94 100 33 a1 a2 a3

a4 a5 a6

a7 a8 a9

( )* ( )( )

( )

p i a ip center

a i

Gaussian Filtering:

Center Weighted Average

4753 * 3168 (*3

RGB)

Gaussian Filter:

Page 10: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

10

255 135 2

221 127 18

94 100 33 a1 a2 a3

a4 a5 a6

a7 a8 a9

Page 11: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

11

221 127 18

94 100 33

32 93 105

a1 a2 a3

a4 a5 a6

a7 a8 a9

Page 12: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

94 100 33

32 93 105

19 18 17

12

a1 a2 a3

a4 a5 a6

a7 a8 a9

Page 13: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

19 18 94 100 33

32 93 32 93 105

19 18 19 18 17

19 18 94 100 33

13

a1 a2 a3

a4 a5 a6

a7 a8 a9

Less computation Vs. More computation

Load Imbalance: dynamical scheduling in OpenMP

Page 14: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

OMP Parallelization

Dynamics scheduling

nowait

Loop coalescing: transform 2D image matrix

to 1D array

14

Page 15: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

Performance

15

4%

71%

8%

15%

2% 0%

Serial

Read Image

Tilt Shift

Red Tune

Vignette

Write Image

Others

Hansen cluster (Dell compute nodes with four 12-core AMD Opteron 6176 processors);

Serial compiler: intel/12.0.084

Parallel compiler: openmpi/1.4.4_intel-12.0.084; 8 cores are used.

16%

44% 6%

26%

6%

2%

8-core OpenMP

Page 16: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

Dynamic Scheduling

16

17%

39%

7%

28%

7%

2%

Dynamic

Hansen cluster (Dell compute nodes with four 12-core AMD Opteron 6176 processors);

Serial compiler: intel/12.0.084

Parallel compiler: openmpi/1.4.4_intel-12.0.084; 8 cores are used.

16%

44% 6%

26%

6%

2%

Static

Read Image

Tilt Shift

Red Tune

Vignette

Write Image

Others

Page 17: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

nowait

17

17%

39%

7%

28%

7%

2%

w/ nowait

16%

44% 6%

26%

6%

2%

w/o nowait

Read Image

Tilt Shift

Red Tune

Vignette

Write Image

Others

Page 18: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

Speedup

18

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

Original Dynamic w/o

nowait

Nowait w/o

dynamic

Dynamic

&nowait

Time

Hansen cluster (Dell compute nodes with four 12-core AMD Opteron 6176 processors);

Parallel compiler: openmpi/1.4.4_intel-12.0.084; 8 cores are used.

+15% +2%

+15%

Page 19: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

Speedup of OpenMP and MPI

19

0 5 10 15 20 250

5

10

15

20

25

No. of procs

Sp

ee

du

p

Ideal

OpenMP

MPI

Hansen cluster (Dell compute nodes with four 12-core AMD Opteron 6176 processors);

OMP compiler: openmpi/1.4.4_intel-12.0.084;

MPI compiler: mpich2-intel/1.4.4_intel-12.0.084

Page 20: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

MPI Vs. OpenMP

20

9%

52% 6%

25%

6%

2%

8-core MPI

Read Image

Tilt Shift

Red Tune

Vignette

Write Image

Others

Hansen cluster (Dell compute nodes with four 12-core AMD Opteron 6176 processors);

Serial compiler: intel/12.0.084; OpenMP compiler: openmpi/1.4.4_intel-12.0.084; MPI

compiler: mpich2/1.4.4_intel-12.0.0848

16%

44% 6%

26%

6%

2%

8-core OpenMP

MPI: creates a large array in each thread, which is inefficient.

Page 21: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

Future Work

Other image sizes (smaller or larger)

Find other aspects to improve the speedup

Try different chunk size in dynamic

scheduling

21

Page 22: Parallel Image Processing - Purdue EngineeringParallel Image Processing Lei Cao and Yan Wang 2013-04-19 1 . 2 ... (Dell compute nodes with four 12-core AMD Opteron 6176 processors)

22

Thanks!