

Image Treatment Implementing Extended Depth of Field with NVIDIA® CUDA®

M. Hernández Ariza, C. J. Barrios Hernández, A. Plata Gómez and D. A. Sierra Bueno

Universidad Industrial de Santander, Bucaramanga, Colombia

http://sc3.uis.edu.co

Extended depth of field (EDF) is a method used in optical research to analyze and treat specific zones of an image. Because of the complexity of EDF and the potentially large volume of data processed in optics problems, EDF is a good candidate for parallel architectures. Starting from a large set of images taken through a microscope, we propose a parallel implementation of extended depth of field aimed at massively parallel computing machines based on NVIDIA® GPUs. The proposed algorithms were implemented using NVIDIA® CUDA® and MPI, with promising efficiency results observed across different platforms while maintaining accuracy.

Image Processing with EDF

[Figure: the image stack is processed with parallel EDF to produce the final topographic image.]

Tests and Results

We used several platforms. The first was IDGraf, a GPGPU cluster at the Grenoble Informatics Laboratory with 6 NVIDIA® Tesla® C2050 and 1 NVIDIA® GeForce® GTX 295 GPUs, 72 GB of RAM and 2 Intel Xeon X5650 processors. The second was a GPGPU Beowulf cluster equipped with the NVIDIA® FX 570 GPU, 2 GB of RAM and 2 Intel 2.2 GHz QuadCore 2 processors. The third consisted of regular nodes of the French Grid’5000 platform and generic nodes of the GridUIS-2 platform, in both cases without GPUs. The results below correspond to tests on high-resolution images of 1920*2560 pixels.

The grid defined for NVIDIA® CUDA® has the same dimensions as the image resolution (1920*2560 threads), so the thread at position i-j computes the i-j element of the matrix. In the MPI code, each process computes over all i-j elements of the matrix.

The purpose of using different platforms was to observe the behavior of the implementations across a range of resources, from generic nodes of Beowulf clusters on grid computing platforms to sophisticated GPU infrastructures.

The NVIDIA® CUDA® implementation performed best across the platforms tested. The main reason is that the CUDA® kernel can be launched so that all i-j elements of the matrix are processed in parallel. The MPI implementation, on the other hand, is further limited by the cost of communication among nodes.
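The one-thread-per-pixel mapping described in this section can be sketched as follows. The poster does not give the launch configuration, so the 16x16 block size, the row-major layout and the reading of 1920*2560 as 1920 rows by 2560 columns are assumptions made for illustration, and all names are hypothetical. The sketch is plain C++ that mirrors the usual CUDA index arithmetic (blockIdx, blockDim, threadIdx) so that it can be checked on a CPU.

```cpp
#include <cassert>

// Two-component index, standing in for the x/y fields of CUDA's dim3.
struct Dim2 { int x; int y; };

// Number of blocks needed so that blocks * block_size covers n elements.
inline int blocks_for(int n, int block_size) {
    assert(n > 0 && block_size > 0);
    return (n + block_size - 1) / block_size;  // ceiling division
}

// Launch grid that covers a rows x cols image with one thread per pixel.
inline Dim2 grid_for_image(int rows, int cols, Dim2 block) {
    return Dim2{blocks_for(cols, block.x), blocks_for(rows, block.y)};
}

// Mirrors the index computation a CUDA kernel would perform:
//   j = blockIdx.x * blockDim.x + threadIdx.x   (column)
//   i = blockIdx.y * blockDim.y + threadIdx.y   (row)
// Returns the row-major offset of pixel (i, j), or -1 when the thread
// falls outside the image and should do nothing.
inline long pixel_offset(Dim2 block_idx, Dim2 block_dim, Dim2 thread_idx,
                         int rows, int cols) {
    int j = block_idx.x * block_dim.x + thread_idx.x;
    int i = block_idx.y * block_dim.y + thread_idx.y;
    if (i >= rows || j >= cols) return -1;
    return static_cast<long>(i) * cols + j;
}
```

Under these assumptions, a 1920 x 2560 image with 16x16 blocks needs a grid of 160 x 120 blocks. The bounds check only matters when the image size is not a multiple of the block size.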

Parallel EDF

First, the R, G and B components of each image in the stack are computed.

[Figure: array R of a stack image]

The variation of color intensity is computed for each image pixel by examining its neighboring cells, in order to determine the zones with a high degree of focus. From the G matrix of each image, a variance matrix is computed to identify the focused points of the image.

[Figure: G matrix (enhanced)]
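The variance step can be illustrated with a minimal CPU sketch of the work a single thread would do for its pixel. The poster does not state the window size or the exact formula; the 3x3 neighborhood, the zero padding at the borders (suggested by the poster's zero-bordered arrays) and the use of the population variance are assumptions, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Value of a row-major rows x cols matrix at (i, j).
// Positions outside the matrix read as 0 (zero padding, an assumption).
inline double padded_at(const std::vector<double>& m, int rows, int cols,
                        int i, int j) {
    if (i < 0 || i >= rows || j < 0 || j >= cols) return 0.0;
    return m[static_cast<std::size_t>(i) * cols + j];
}

// Population variance of the 3x3 neighborhood centred on (i, j).
// In a CUDA kernel, one thread would evaluate this for its own pixel.
inline double local_variance(const std::vector<double>& g, int rows, int cols,
                             int i, int j) {
    double sum = 0.0;
    double sum_sq = 0.0;
    for (int di = -1; di <= 1; ++di) {
        for (int dj = -1; dj <= 1; ++dj) {
            double v = padded_at(g, rows, cols, i + di, j + dj);
            sum += v;
            sum_sq += v * v;
        }
    }
    double mean = sum / 9.0;
    return sum_sq / 9.0 - mean * mean;  // E[v^2] - (E[v])^2
}

// Variance matrix of the G component; the two loops play the role of the
// CUDA grid, visiting every (i, j) once.
std::vector<double> variance_matrix(const std::vector<double>& g,
                                    int rows, int cols) {
    assert(rows > 0 && cols > 0);
    assert(g.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<double> var(g.size());
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            var[static_cast<std::size_t>(i) * cols + j] =
                local_variance(g, rows, cols, i, j);
        }
    }
    return var;
}
```

One consequence of zero padding is that border pixels of a uniform image still get a non-zero variance, because the padded zeros differ from the image values. A real implementation might treat the border differently.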

From the variance matrices, a position matrix is computed to build the topographic matrix. In this matrix, each i-j position holds the position in the image stack of the image with the highest intensity value for the i-j pixel. This procedure yields the topographic image, or relief, of the object scanned by the microscope.

[Figure: topography]
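A minimal sketch of the position matrix step follows, under the assumption that the "highest value" is taken over the per-image variance matrices, so that each pixel records the stack index with the largest variance. The function name and the data layout (one row-major vector per stack image) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// var_stack[k] is the variance matrix of stack image k, stored row-major
// with `pixels` elements. Returns, for every pixel, the 0-based index of
// the stack image whose variance is largest there. Ties keep the lowest
// index because the comparison is strict.
std::vector<int> position_matrix(
        const std::vector<std::vector<double>>& var_stack,
        std::size_t pixels) {
    assert(!var_stack.empty());
    for (const std::vector<double>& v : var_stack) {
        assert(v.size() == pixels);
    }
    std::vector<int> pos(pixels, 0);
    for (std::size_t p = 0; p < pixels; ++p) {   // one CUDA thread per p
        double best = var_stack[0][p];
        for (std::size_t k = 1; k < var_stack.size(); ++k) {
            if (var_stack[k][p] > best) {
                best = var_stack[k][p];
                pos[p] = static_cast<int>(k);
            }
        }
    }
    return pos;
}
```

The example position matrix on the poster holds values from 1 to 10, which suggests 1-based stack positions; the sketch uses 0-based indices, so adding 1 to each entry would give the poster's convention.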

Finally, the R, G and B components are computed for each i-j element of the position matrix. For each component, this step obtains the maximum color intensity value from among the color intensity values of the stack images. The final image built this way is therefore in focus along all axes.
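One way to read this final step is as a gather: for each pixel, the R, G and B values are copied from the stack image that the position matrix selects. That interpretation is an assumption, as are the `Rgb` type, the function name and the 0-based positions used in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One pixel with 8-bit colour components (hypothetical layout).
struct Rgb {
    std::uint8_t r;
    std::uint8_t g;
    std::uint8_t b;
};

// stack[k][p] is pixel p of stack image k; pos[p] is the 0-based stack
// index selected for pixel p. The result takes every pixel from the
// image that the position matrix points to.
std::vector<Rgb> compose_focused(const std::vector<std::vector<Rgb>>& stack,
                                 const std::vector<int>& pos) {
    assert(!stack.empty());
    std::vector<Rgb> out(pos.size());
    for (std::size_t p = 0; p < pos.size(); ++p) {  // one CUDA thread per p
        int k = pos[p];
        assert(k >= 0 && static_cast<std::size_t>(k) < stack.size());
        assert(stack[static_cast<std::size_t>(k)].size() == pos.size());
        out[p] = stack[static_cast<std::size_t>(k)][p];
    }
    return out;
}
```

Each output pixel depends only on its own entry of the position matrix, so this step has no dependencies between pixels and fits the same thread-per-pixel layout as the earlier steps.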

[Figure: R, G and B image components at stack position 9.]

[Figure: final images: topographic image, focused image and render.]


[Figure: variance computation. The R, G and B arrays of a stack image are shown as 5x5 matrices with elements indexed 00 to 44; two of the arrays are shown surrounded by a one-element border of zeros.]

[Figure: example 5x5 position matrix with rows (5 9 3 4 10), (2 6 8 1 6), (9 1 10 3 5), (4 8 2 7 9) and (10 6 9 3 5), shown with the R, G and B arrays of the stack images.]

[Figure: R, G and B elements of the final focused image.]

Acknowledgements

The authors thank L. Camargo Forero and A. Lobo, engineers of the Scientific and High Performance Computing Service at UIS (SC3-UIS); Professors B. Raffin (INRIA Rhône Alpes), O. Richard and Y. Denneulin (LIG Laboratory and Grid’5000 Project); Mr. M. Lansen of NVIDIA® Corporation; and the Optics and Signal Treatment Research Group at UIS (GOTS-UIS).

Experiments presented in this paper were carried out using the Grid'5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr), and the GridUIS-2 platform, being developed under the Universidad Industrial de Santander (UIS) High Performance and Scientific Computing Service development action with support from UIS Vicerrectoria de Investigación y Extension (VIE-UIS) and several UIS research groups and academic units as well as other funding bodies (see https://grid.uis.edu.co).