VEGA France, Grenoble, 02/07/2010
April 2010
Earth Observation Image Processing on GPU
1. VEGA France presentation
2. Case studies :
   1. CNES : Port 3 algorithms on GPU
   2. EADS Astrium : MATLAB functions on GPU
   3. VEGA : SMAC on GPU for S2PAD project
3. VEGA Expertise & Approach for GPU Computing
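VEGA France : inside FinMeccanica
[Organisation chart; only the figures below are recoverable]
– Top of chart (label not recoverable) : Revenue €13,428M, Staff 60,748
– Selex Sistemi Integrati SpA : Revenue €616M, Staff 3000
– Selex Systems Integration Ltd : Revenue €145M, Staff 965
– Four further entities (labels not recoverable) : Staff 540, 355, 45 and 25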
GPU Computing : inside VEGA France
• GPU Computing :
– A dedicated expertise in VEGA France since the end of 2008
– Attached to Ground Segment operations
GPU Computing : VEGA Expertise
• VEGA GPU Expertise is based on :
– More than thirty years of expertise in Space Sciences, with the development of Satellite Ground Segments, Satellite Training Centres, highly complex Earth Observation algorithms, etc.
– More than ten years of expertise in GPU programming, from the original technology (OpenGL) to the latest ones (CUDA, OpenCL, MATLAB GPU)
– Partnership with GPU Computing specialists (NVIDIA, GPU-Tech, etc.)
– Software R&D & Hardware investment (Tesla)
GPU Computing : VEGA approach
• A four-step approach :
1. Study the algorithm/code to port to GPU :
   1. How much of the code is “portable” to GPU ?
      • bottleneck identification
   2. What will the real gain be (x5 ? x10 ? x50 ?) ?
      • analogy with existing GPU code, prototype (MATLAB)
2. Port or rewrite the code for the GPU :
   1. Port means “encapsulate” the existing code so that it is run by the GPU
      • the algorithm is kept as is, without any modification
   2. Rewrite means “rethink” the algorithm to run specifically on GPU
      • the algorithm is fully rewritten and needs to be validated again
3. Optimize the GPU code to get more performance (VEGA expertise)
4. Integrate the GPU code in the existing/new SW/HW environment
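The two study-phase questions (how much of the code is portable, and what the real gain will be) are linked by Amdahl's law. The sketch below is only an illustration of that relationship, not VEGA's estimation method; the function and parameter names are hypothetical.

```c
#include <assert.h>

/* Amdahl's law: overall gain when a fraction `portable` (0..1) of the
 * CPU run time is accelerated by a factor `gpu_gain` and the remainder
 * (I/O, non-portable code) is left unchanged.
 * Names are illustrative, not taken from the slides. */
double estimated_total_gain(double portable, double gpu_gain)
{
    assert(portable >= 0.0 && portable <= 1.0);
    assert(gpu_gain > 0.0);
    return 1.0 / ((1.0 - portable) + portable / gpu_gain);
}
```

For example, if 90% of the run time is portable and that part runs 10 times faster on GPU, the overall gain is about x5.3, which is why bottleneck identification comes first.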
• The customer can start with a simple study (phase 1), which is very “light” (< 5 man-months)
• During the porting phase (phase 2), VEGA respects :
1. Non-regression : GPU results are consistent with CPU results
2. Portability : GPU code can run on any GPU HW solution (mono or multi-GPU, AMD/ATI, NVIDIA) using the OpenCL standard
3. Compatibility : GPU interface is identical to CPU interface
4. Performance : VEGA assures the customer of the gain estimated during the study phase
CNES : Earth observation image processing on GPU
• Problem : how to reduce the “age of the data”, i.e. the time between satellite acquisition and data availability, in the world of Earth Observation where :
– Image size is increasing
– Algorithm complexity is increasing
• Solutions :
– Buy thousands of CPUs : total cost explodes !
– R&T study to validate the use of GPU
• CNES R&T Project : “Earth observation image processing on GPU”, submitted in spring 2008, won and carried out by VEGA (09/08-05/09)
• Two phases :
1. Study
   • Software and hardware solutions on the market (study, then selection)
   • Algorithm feasibility
2. Port and tests
   • Port the selected algorithms to GPU
   • Optimisation, test and comparison with CPU version
• Hardware solution synthesis (12/2008)
• Software solution synthesis (12/2008)
• Algorithm feasibility : select 3 among 6 algorithms to port to GPU :
– De-convolution
– De-noising
– Correlation
– Zoom / De-zoom with or without rotation
– Multi-spectral fusion
– JPEG2000 compression
• De-convolution
– Bottleneck : FFT
• An optimised GPU implementation exists (cuFFT)
• Inconsistent memory access
– Preliminary results
• De-noising
– Wavelet decomposition
– Convolution then decimation
– Many input parameters
• Correlation
– Independent data with coherent access
– Close to spatial convolution
• Zoom / De-zoom
– Great for GPU : parallel operation on separate data
– Preliminary results :
• Multi-spectral fusion
– Frequency-domain zoom + convolution (FFT)
– Tile processing
• JPEG2000 compression
– Algorithms :
• Colour conversion (RGB->YUV)
• Wavelet transformation
• Quantization
– Entropy coding :
• 90% of compression time
• Not (easily) parallel
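To illustrate why the colour conversion step suits a GPU while entropy coding does not: each output pixel depends only on the matching input pixel. The slide does not say which transform was used; the sketch below assumes the reversible colour transform (RCT) of JPEG 2000, and the type and function names are hypothetical.

```c
#include <assert.h>

typedef struct { int r, g, b; } Rgb;
typedef struct { int y, cb, cr; } Ycc;

/* floor(v / 4) for any int; C division alone truncates toward zero. */
static int floor_div4(int v)
{
    return (v >= 0) ? v / 4 : -((-v + 3) / 4);
}

/* Forward RCT for one pixel (assumed transform, see lead-in). */
Ycc rct_forward(Rgb p)
{
    assert(p.r >= 0 && p.g >= 0 && p.b >= 0);   /* unsigned samples expected */
    Ycc q;
    q.y  = floor_div4(p.r + 2 * p.g + p.b);
    q.cb = p.b - p.g;
    q.cr = p.r - p.g;
    return q;
}

/* Inverse RCT: recovers the original integers exactly. */
Rgb rct_inverse(Ycc q)
{
    Rgb p;
    p.g = q.y - floor_div4(q.cb + q.cr);
    p.r = q.cr + p.g;
    p.b = q.cb + p.g;
    return p;
}
```

Because no pixel reads its neighbours, one GPU thread per pixel is enough here, whereas the entropy coder carries state from one symbol to the next.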
• Algorithms classification
• Study phase conclusions :
– Hardware selection : no big difference, selection is based on software maturity
– Software selection : CUDA (NVIDIA) is more mature and offers the best performance
– Algorithm selection :
• Zoom / De-zoom
• Fusion
• Correlation
• Second phase starts :
– Code each algorithm with CUDA (starting from the algorithm specifications, not the CNES code)
– For each algorithm :
• we define the most common use case
• we compare with an existing CPU version from a CNES library (MARIO, ASTRID, MEDICIS)
– All the tests (CPU & GPU) are done on the same machine :
• CPU : Intel Xeon, 4 cores @ 2.33 GHz, 8 GB RAM
• GPU : NVIDIA GeForce GTX285, 1 GB RAM
• Zoom / De-zoom results :
– Image size : 3000x4000
– Zoom factor (bi-cubic) : 0.25, 0.5, 2, 4 & 8
– Comparison with MARIO library (time in s)
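As a reminder of what a bi-cubic zoom computes per output sample, here is a one-dimensional sketch (the 2D case applies it along rows, then columns). The slides do not give the kernel; the Keys cubic convolution kernel with a = -0.5 and the border clamping are assumptions, and the function names are hypothetical.

```c
#include <assert.h>

/* Keys cubic convolution kernel, a = -0.5 (assumed; not from the slides). */
static double cubic_weight(double x)
{
    const double a = -0.5;
    if (x < 0.0) x = -x;
    if (x <= 1.0) return (a + 2.0) * x * x * x - (a + 3.0) * x * x + 1.0;
    if (x < 2.0)  return a * x * x * x - 5.0 * a * x * x + 8.0 * a * x - 4.0 * a;
    return 0.0;
}

/* Interpolate samples[0..n-1] at real position pos using 4 neighbours.
 * Indices outside the array are clamped to the border (assumed policy). */
double cubic_interp_1d(const double *samples, int n, double pos)
{
    assert(samples != 0 && n > 0);
    int base = (int)pos;                 /* floor(pos) without libm */
    if ((double)base > pos) base--;
    double sum = 0.0;
    for (int k = -1; k <= 2; k++) {
        int i = base + k;
        if (i < 0) i = 0;
        if (i > n - 1) i = n - 1;
        sum += samples[i] * cubic_weight(pos - (double)(base + k));
    }
    return sum;
}
```

Each output sample reads four inputs and writes one value, with no dependency on other outputs, which matches the "parallel operation on separate data" remark made for this algorithm.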
• Zoom / De-zoom results :
– Single precision results (time in ms)
– Double precision results (time in ms)
• Zoom / De-zoom validation :
– Difference between the CPU and the GPU image :
• Min : 0.0
• Max : 0.0
• Mean : 0.0
• Standard deviation : 0.0
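A non-regression check of this kind can be sketched as below. This is not the validation code used in the study; the struct and function names are hypothetical, and the sketch returns the variance (the standard deviation on the slide is its square root) so that it needs no math library.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double min, max, mean, variance; } DiffStats;

/* Statistics of the per-pixel difference gpu[i] - cpu[i]. */
DiffStats image_diff_stats(const float *cpu, const float *gpu, size_t count)
{
    assert(cpu != NULL && gpu != NULL && count > 0);
    DiffStats s;
    double sum = 0.0, sum_sq = 0.0;
    s.min = s.max = (double)gpu[0] - (double)cpu[0];
    for (size_t i = 0; i < count; i++) {
        double d = (double)gpu[i] - (double)cpu[i];
        if (d < s.min) s.min = d;
        if (d > s.max) s.max = d;
        sum += d;
        sum_sq += d * d;
    }
    s.mean = sum / (double)count;
    s.variance = sum_sq / (double)count - s.mean * s.mean;  /* population variance */
    return s;
}
```

Identical images give 0.0 for every field, as reported above for the zoom.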
• Fusion results :
– Comparison of the 3 key steps :
• Frequency-domain zoom x7/5 from ASTRID validated data (image 1460x1460)
• Frequency-domain zoom x4 from low resolution Pleiades images (image 8550x2500)
• Convolution on high resolution Pleiades images (image 34200x10000)
– GPU works in single precision on 512x512 tiles
– Comparison with ASTRID (time in s) :
• Fusion results (time in ms) :
• Fusion validation :
– Difference between the CPU and the GPU images (16-bit images) :
• Min : -1.0
• Max : 1.0
• Mean : 0.0
• Standard deviation : 0.004209
• Correlation results :
– GPU disparity map generation compared to the MEDICIS launcher
– The launcher is configured to be close to the GPU algorithm
– GPU works on tiles of the destination image : tile size changes according to the sampling step (512x512 to 128x128)
– Comparison with MEDICIS :
• MEDICIS parameters :
– Computation method : frequency-domain linear correlation
– Sub-pixel computation method : direct interpolation with CNES interpolator
– Analysis window size : 13x13
– Exploration window size : 7x7
• Computation in single and double precision
• Input image size : 6360x6360
• Step size 1 optimized for GPU
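For readers unfamiliar with the quantity being maximised, the sketch below computes a normalized correlation coefficient between two windows of equal size (for example 13x13 = 169 samples each). It works in the spatial domain, which is a deliberate simplification: MEDICIS is configured above for frequency-domain correlation, and this is neither MEDICIS nor VEGA code. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Small Newton iteration so the sketch needs no math library;
 * sqrt() from <math.h> is the usual choice. */
static double newton_sqrt(double v)
{
    if (v <= 0.0) return 0.0;
    double r = (v > 1.0) ? v : 1.0;
    for (int i = 0; i < 60; i++) r = 0.5 * (r + v / r);
    return r;
}

/* Normalized correlation coefficient of windows a and b (n samples each).
 * Returns a value in [-1, 1], or 0 when one window is flat. */
double normalized_correlation(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL && n > 0);
    double mean_a = 0.0, mean_b = 0.0;
    for (size_t i = 0; i < n; i++) { mean_a += a[i]; mean_b += b[i]; }
    mean_a /= (double)n;
    mean_b /= (double)n;

    double num = 0.0, var_a = 0.0, var_b = 0.0;
    for (size_t i = 0; i < n; i++) {
        double da = a[i] - mean_a, db = b[i] - mean_b;
        num += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    if (var_a == 0.0 || var_b == 0.0) return 0.0;
    return num / newton_sqrt(var_a * var_b);
}
```

A disparity map is then obtained by evaluating this score for each candidate shift inside the exploration window and keeping the best one, before sub-pixel refinement.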
• Correlation results (time in s) :
• Correlation results (time in ms) :
• Correlation validation :
– Normalized correlation coefficient :
• Min : -0.00001
• Max : 0.0
• Mean : 0.0
• Standard deviation : 0.0
– Disparity :
• Min : -1.955496
• Max : 3.0
• Mean : 0.014435
• Standard deviation : 0.269406
• Finally :
– For the two most time-consuming algorithms, there is a real gain compared to the CPU versions (above x10)
– The gain is measured on a complete comparison (GPU computing + read/write access to the data) and without GPU-specific optimization
• As a consequence (for CNES) :
– There is no need for new R&T on the subject, since CNES has validated the results of the study and is now porting some of its libraries to GPU
– New interest from the users' point of view, who can access new functionalities
EADS Astrium : MATLAB functions on GPU
• EADS Astrium develops its own Image Processing toolbox (SIMAGE) based on MATLAB functions
• VEGA expertise to port three of the SIMAGE functions to GPU :
– Convolution
– Interpolation
– Correlation
• Port means :
– Either rewrite the functions in CUDA
– Or test with existing MATLAB-GPU solutions (plug-ins, MATLAB beta versions including GPU acceleration)
• Rewrite in CUDA means :
– Current code/algorithm analysis
– Identification of the CPU bottleneck to port to GPU
– CUDA writing
– CUDA code « encapsulation » to be called from MATLAB
– MATLAB script writing to call the CUDA function
• MATLAB-GPU solutions studied :
– Jacket from AccelerEyes
– GPUlib from Tech-X Corporation
– GPUmat from GP-you
– MATLAB beta 1 and beta 2 from MathWorks
• Convolution
– Test 1 : 101x101 filter applied to a 500x500 image, 5 times
– Test 2 : Test 1 + disk I/O

Convolution, CONV_TEST1 :
Version                              | Time (s) | Gain (vs CPU-MATLAB) | Max error
CPU-MATLAB                           | 9.9      | n/a                  | n/a
CPU-VEGA « naïve »                   | 22.78    | 0.434                | 0
GPU-VEGA « naïve »                   | 2.22     | 4.88                 | 1.82E-12
GPU-VEGA optimized                   | 0.71     | 14                   | 3.31E-12
GPU-VEGA optimized, single precision | 0.28     | 35                   | 0.00266
GPU-MATLAB Jacket (single)           | 0.92     | 10.65                | 2.67E-05

Convolution, CONV_TEST2 :
Version            | Time (s) | Gain (vs CPU-MATLAB) | Max error
CPU-MATLAB         | 275.4    | n/a                  | n/a
GPU-VEGA optimized | 15.58    | 17.68                | 2.70E-06

Convolution, time breakdown :
Step             | CONV_TEST1     | CONV_TEST2
Disk I/O         | 0              | 2.714 s (17%)
CPU-GPU transfer | 0.002 s (4.2%) | 0.366 s (2%)
GPU computation  | 0.052 s (95%)  | 12.5 s (80%)
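For reference, a « naïve » spatial convolution is just four nested loops. The sketch below is a generic illustration, not the VEGA or SIMAGE code; zero padding at the borders and the function name are assumptions. For Test 1 (101x101 filter, 500x500 image, 5 passes) this form performs about 1.3E10 multiply-adds.

```c
#include <assert.h>

/* Naive 2D convolution, row-major float images.
 * Kernel dimensions must be odd; pixels outside the image count as 0
 * (assumed border policy). */
void convolve2d_naive(const float *img, int w, int h,
                      const float *ker, int kw, int kh, float *out)
{
    assert(img != 0 && ker != 0 && out != 0);
    assert(w > 0 && h > 0 && kw % 2 == 1 && kh % 2 == 1);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            float acc = 0.0f;
            for (int j = 0; j < kh; j++) {
                for (int i = 0; i < kw; i++) {
                    /* true convolution: the kernel is flipped */
                    int sy = y - (j - kh / 2);
                    int sx = x - (i - kw / 2);
                    if (sy >= 0 && sy < h && sx >= 0 && sx < w)
                        acc += ker[j * kw + i] * img[sy * w + sx];
                }
            }
            out[y * w + x] = acc;
        }
    }
}
```

Every output pixel is computed independently, so the two outer loops map directly onto GPU threads; the optimized versions in the table presumably go further (for instance in memory access), which the slides do not detail.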
• Correlation
– Test 1 : correlation between analysis window (size 11x11) & exploration window (25x25), applied 2500 times
– Test 2 : Test 1 applied at each point of a regular grid
– Test 3 : Test 2 with disk I/O

Correlation results :
Version                    | Test       | Time (s) | Gain (vs CPU-MATLAB) | Max error
CPU-MATLAB                 | CORR_TEST1 | 1.07     | n/a                  | n/a
CPU-MATLAB                 | CORR_TEST2 | 1.40     | n/a                  | n/a
CPU-MATLAB                 | CORR_TEST3 | 1.51     | n/a                  | n/a
GPU-VEGA optimized         | CORR_TEST1 | 0.41     | 2.6                  | 2.50E-16
GPU-VEGA optimized         | CORR_TEST2 | 0.04     | 35                   | 5.28E-08
GPU-VEGA optimized         | CORR_TEST3 | 0.11     | 13.7                 | 5.28E-08
GPU-MATLAB Jacket (single) | CORR_TEST1 | 2.05     | 0.52                 | 0.00E+00

Correlation, time breakdown :
Step             | CORR_TEST1       | CORR_TEST2    | CORR_TEST3
Disk I/O         | 0                | 0             | 0.068 s (61%)
CPU-GPU transfer | 0.00375 s (0.9%) | 0.0015 s (3%) | 0.0015 s (1.3%)
GPU computation  | 0.0275 s (7%)    | 0.04 s (96%)  | 0.04 s (36%)
• Interpolation
– Test : compute the output image from a reference image and a re-sampling grid; final value interpolation is based on an apodized sinc (« sinus cardinal apodisé ») function

Interpolation results :
Version                    | Time (s) | Gain (vs CPU-MATLAB) | Max error
CPU-MATLAB                 | 9.06     | n/a                  | n/a
GPU-VEGA (double)          | 0.11     | 80                   | 2.00E-13
GPU-VEGA (single)          | 0.02     | 440                  | 7.80E-05
GPU-MATLAB Jacket (single) | 0.38     | 25.08                | 8.50E-07
GPU-MATLAB GPUmat (single) | 3.93     | 2.35                 | 8.90E-05
GPU-MATLAB GPUmat (double) | 3.86     | 2.32                 | 5.70E-14

Interpolation, time breakdown :
Step             | GPU-VEGA (double) | GPU-VEGA (single)
Disk I/O         | 0                 | 0
CPU-GPU transfer | 0.005 s (4.8%)    | 0.004 s (28%)
GPU computation  | 0.098 s (93%)     | 0.010 s (71%)
• Results analysis (all in double precision) :
– Convolution : x14 gain factor with the CUDA optimized version, with 3.31E-12 error (good GPU « candidate » since more than 80% of the time is spent on GPU processing)
– Correlation : x13.7 gain factor with the CUDA optimized version, with 5.28E-08 error (good GPU « candidate » since it is a complicated operation which needs a lot of GPU processing)
– Interpolation : x80 gain factor with the CUDA optimized version, with 2E-13 error (good GPU « candidate » since more than 90% of the time is spent on GPU processing)
• MATLAB GPU solution synthesis :
VEGA : Sentinel 2 MSI Products
• VEGA France is in charge of investigating the SMAC algorithm for atmospheric correction within the Level 2A Product (ESRIN S2PAD).
• SMAC (Simplified Method for Atmospheric Correction), initially developed by Rahman and Dedieu (1994), is a method for correcting atmospheric effects in satellite data time series. The method was upgraded by Berthelot and Dedieu (1997) and used for the correction of atmospheric effects of VEGETATION data in an operational processing line. It has also been added as a processor in the ESA BEAM toolbox to correct MERIS L1B data for atmospheric effects over land.
• SMAC estimates the surface reflectances from TOA reflectances and knowledge of the state of the atmosphere (water vapour, ozone and aerosol contents).
• What is the gain from porting the current version of the SMAC code (in C) from CPU to GPU without any algorithm optimisation (the algorithm is not modified) ?
VEGA : Sentinel 2 MSI Products on CUDA/OpenCL
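CPU code :
for (int x = 0; x < nYSize; x++) {
    for (int y = 0; y < nXSize; y++) {
        SMAC_ca(x, y);   /* one pixel per iteration */
    }
}

GPU code (kernel body, one thread per pixel) :
int x = blockIdx.x * BLOCK_SIZE + threadIdx.x;
int y = blockIdx.y * BLOCK_SIZE + threadIdx.y;
if (x < nYSize && y < nXSize) {   /* guard added here: skip threads outside the image */
    SMAC_ca(x, y);
}

Gain : x ?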
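The index arithmetic of the GPU version can be checked on a CPU. The sketch below is plain C that emulates a 2D launch, not CUDA and not the S2PAD code; the function names are hypothetical and the neutral names width/height are used instead of the slide's nXSize/nYSize. It shows that, with a bounds guard, every pixel is visited exactly once even when the image size is not a multiple of the block size.

```c
#include <assert.h>

/* Number of blocks needed to cover `size` pixels (ceiling division). */
static int blocks_needed(int size, int block_size)
{
    return (size + block_size - 1) / block_size;
}

/* Emulates blockIdx/threadIdx for a square block of block_size threads.
 * visits[] (width*height entries, zero-initialised by the caller) counts
 * how often each pixel is processed. Returns the number of threads launched. */
int emulate_launch(int width, int height, int block_size, unsigned char *visits)
{
    assert(width > 0 && height > 0 && block_size > 0 && visits != 0);
    int grid_x = blocks_needed(width, block_size);
    int grid_y = blocks_needed(height, block_size);
    int launched = 0;
    for (int by = 0; by < grid_y; by++)
        for (int bx = 0; bx < grid_x; bx++)
            for (int ty = 0; ty < block_size; ty++)
                for (int tx = 0; tx < block_size; tx++) {
                    int x = bx * block_size + tx;
                    int y = by * block_size + ty;
                    launched++;
                    if (x < width && y < height)   /* same guard as a kernel */
                        visits[y * width + x]++;
                }
    return launched;
}
```

For a 5x3 image and 4x4 blocks, 32 threads are launched for 15 pixels; the 17 surplus threads do nothing thanks to the guard.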
• Test platform :
• CPU : Intel Xeon, 4 cores @ 2.33 GHz, 8 GB RAM
• GPU : NVIDIA Tesla C1060, 4 GB video RAM
• Test data : 8000x8000x1 band images
• Results : GPU code is almost 18 times faster than CPU code

S2PAD               | CPU     | OpenCL | CUDA   | CUDA Optimized | Ratio CPU / OpenCL | Ratio CPU / CUDA | Ratio CPU / CUDA Optimized
Read DISK Time      | 2283.4  | 2271.8 | 2122   | 2124.6         | 1.01               | 1.08             | 1.07
Write DISK Time     | 863.2   | 1114.8 | 1132.3 | 1117.3         | 0.77               | 0.76             | 0.77
Read from GPU Time  | n/a     | 447.8  | 59.4   | 60             | n/a                | n/a              | n/a
Write to GPU Time   | n/a     | 163    | 50     | 49             | n/a                | n/a              | n/a
Compile Time        | n/a     | 1209.6 | n/a    | n/a            | n/a                | n/a              | n/a
GPU Time            | 79897.3 | 862.9  | 681.4  | 284.4          | 92.59              | 117.25           | 280.93
Total Time          | 85061.7 | 8188.6 | 5114.9 | 4744.3         | 10.39              | 16.63            | 17.93
• What does x18 mean ?
[Figure : starting from the same initial image, the GPU produces 18 result images in the time the CPU produces 1 result image]
Thank you !