Simulation of Microwave Induced Thermoacoustic Imaging Model using GPU
Nilangshu Bidyanta, Ramaprasad Kulkarni
ECE 562 Term Project
Project Overview
• Introduction
Figure: Schematic model, (a) side view (the longer transverse dimension of the waveguide lies along the x-axis) and (b) top view. Figure taken from reference [1].
• Motivation
• Hypothesis
• Project goals
Methodology
• Implementation of the PSTD (pseudospectral time-domain) method from the paper [Ref 1]
• Starting from existing Matlab code, we wrote C++ code using CUDA
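• Code analysis: A simplified version of the PSTD implementation can be written as follows:
for (i = 3; i < 2000; i++) {  // start at 3 so that index i-3 stays in bounds
    // Update Bx from its three previous values and from Cx
    Bx[i] = a*(Bx[i-1] + Bx[i-2] + Bx[i-3]) - b*myFunc(Cx[i-1], Cx[i-2], Cx[i-3]);
    Bx[i-3] = Bx[i-2]; Bx[i-2] = Bx[i-1]; Bx[i-1] = Bx[i];  // shift the Bx history
    // ...
    // Update Cx from its three previous values and from the freshly updated Bx
    Cx[i] = a*(Cx[i-1] + Cx[i-2] + Cx[i-3]) - b*myFunc(Bx[i-1], Bx[i-2], Bx[i-3]);
    Cx[i-3] = Cx[i-2]; Cx[i-2] = Cx[i-1]; Cx[i-1] = Cx[i];  // shift the Cx history
    // ...
}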
• In each iteration, the set of four statements shown for Bx is repeated for By and Bz, and likewise the set for Cx is repeated for Cy and Cz. The function 'myFunc' also involves FFT and IFFT computations.
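The slides say only that 'myFunc' contains FFT and IFFT computations. In a pseudospectral method, a typical use of that pair is to take a spatial derivative: transform, multiply each mode by i times its wavenumber, and transform back. The sketch below illustrates that idea under stated assumptions. The name spectralDerivative, the periodic 1-D signal, and the naive O(N^2) DFT standing in for an FFT library are all hypothetical choices, not the authors' code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(N^2) discrete Fourier transform (assumption: used here only to keep
// the sketch self-contained; a real code would call an FFT library).
// sign = -1 is the forward transform, sign = +1 the inverse (scaled by 1/N).
std::vector<cplx> dft(const std::vector<cplx>& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi *
                static_cast<double>(k * j % n) / static_cast<double>(n);
            sum += in[j] * cplx(std::cos(angle), std::sin(angle));
        }
        out[k] = (sign > 0) ? sum / static_cast<double>(n) : sum;
    }
    return out;
}

// Hypothetical stand-in for 'myFunc': derivative of a periodic signal sampled
// at f.size() equally spaced points over a domain of the given length.
std::vector<double> spectralDerivative(const std::vector<double>& f, double length) {
    const std::size_t n = f.size();
    assert(n > 0 && length > 0.0);
    const double pi = std::acos(-1.0);

    std::vector<cplx> spectrum(f.begin(), f.end());
    spectrum = dft(spectrum, -1);  // forward transform

    for (std::size_t k = 0; k < n; ++k) {
        // Map index k to a signed mode number: low indices are positive
        // frequencies, high indices are negative frequencies.
        double m = (k < (n + 1) / 2)
                       ? static_cast<double>(k)
                       : static_cast<double>(k) - static_cast<double>(n);
        if (n % 2 == 0 && k == n / 2) m = 0.0;  // drop the Nyquist mode
        spectrum[k] *= cplx(0.0, 2.0 * pi * m / length);  // multiply by i*k
    }

    const std::vector<cplx> deriv = dft(spectrum, +1);  // inverse transform
    std::vector<double> out(n);
    for (std::size_t j = 0; j < n; ++j) out[j] = deriv[j].real();
    return out;
}
```

As a check, sampling sin(x) at 16 points over one period of length 2*pi and passing it to spectralDerivative should reproduce cos(x) at the same points to within rounding error. Because each call needs a forward and an inverse transform over the whole array, every time step touches all of the data, which is consistent with the data-intensive behaviour reported later in the slides.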
Results and Discussions
• Observed speedup w.r.t. Matlab code
– 11.84x speedup for GPU vs Matlab (2 core)
– 2.68x speedup for GPU vs Matlab (6 core)
• Due to the significant amount of data transfer, the GPU speedup over the multicore machine is modest.
• Matlab was run on a 6-core Intel i7 machine and a 2-core Intel Core 2 Duo machine, both running 64-bit Windows 7.
• GPU code was run on a 4-core Intel Xeon machine running 64-bit Linux.
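The speedup figures above are ratios of run times. The short sketch below shows that arithmetic and what the two reported numbers imply about the two Matlab machines relative to each other. It assumes both speedups were measured at the same problem size, which the slides do not state, and the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Speedup of a candidate over a reference: reference time / candidate time.
double speedup(double referenceMs, double candidateMs) {
    assert(referenceMs > 0.0 && candidateMs > 0.0);
    return referenceMs / candidateMs;
}

// Given one candidate's speedup over a slow and a fast reference, the ratio
// slowTime / fastTime follows by division (the candidate time cancels out).
double impliedReferenceRatio(double speedupOverSlow, double speedupOverFast) {
    assert(speedupOverSlow > 0.0 && speedupOverFast > 0.0);
    return speedupOverSlow / speedupOverFast;
}
```

With the reported values, impliedReferenceRatio(11.84, 2.68) is about 4.4, so under the same-problem-size assumption the 6-core Matlab run would be roughly 4.4 times faster than the 2-core run.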
Figure: Matrix Size vs. Computation Time. Horizontal axis: Matrix Size (0 to 200). Primary vertical axis: Computation Time (ms), logarithmic scale (1 to 100000). Secondary vertical axis: 0 to 25. Series: 2 core, 6 core, GPU time, GPU vs 2 core, GPU vs 6 core.
Figure: Matrix Size vs. Computation Time. Horizontal axis: # of Matrix Elements (0 to 9000000). Primary vertical axis: Computation time (ms), 0 to 16000. Secondary vertical axis: 0 to 25. Series: 2 core, 6 core, GPU time, GPU vs 2 core, GPU vs 6 core.
• Limitations for further speedup:
– Loop-carried dependency
– The workload is data intensive rather than computation intensive
– Memory requirement grows rapidly with matrix size (cubically in N for an N x N x N matrix)
– For a 320x320x320 matrix, out of 3000 ms for each iteration, around 2000 ms is taken for data transfer
– Data transfer time is significant compared to computation time.
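The numbers in the limitations list can be turned into a quick estimate. The sketch below computes the storage for one N x N x N array, the share of an iteration spent on data transfer, and the best overall speedup possible if only the non-transfer part were accelerated (an Amdahl-style bound). The element size is an assumption, since the slides do not state the precision or how many arrays are held, and the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bytes needed for one N x N x N array with the given element size.
std::uint64_t gridBytes(std::uint64_t n, std::uint64_t bytesPerElement) {
    return n * n * n * bytesPerElement;
}

// Fraction of one iteration spent on data transfer.
double transferFraction(double totalMs, double transferMs) {
    assert(totalMs > 0.0 && transferMs >= 0.0 && transferMs <= totalMs);
    return transferMs / totalMs;
}

// Upper bound on the overall speedup when the transfer time stays fixed and
// everything else is made arbitrarily fast.
double maxSpeedupWithFixedTransfer(double totalMs, double transferMs) {
    assert(transferMs > 0.0 && transferMs <= totalMs);
    return totalMs / transferMs;
}
```

For the 320x320x320 case, gridBytes(320, 4) gives 131072000 bytes (about 131 MB) per single-precision array, and doubling N to 640 multiplies that by 8. With 2000 ms of a 3000 ms iteration spent on transfer, transferFraction is about 0.67 and maxSpeedupWithFixedTransfer is 1.5, so faster kernels alone could not gain more than 1.5x per iteration without reducing the transfers.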
Lessons learnt
• Access to the entire code offers potential for further speedup
• Serial computation on CPU can be improved by parallelizing the code to utilize multiple cores.
References
• [1] Wang, X.; Bauer, D. R.; Witte, R.; Xin, H., "Microwave-Induced Thermoacoustic Imaging Model for Potential Breast Cancer Detection," IEEE Transactions on Biomedical Engineering, vol. 59, no. 10, pp. 2782-2791, Oct. 2012. doi: 10.1109/TBME.2012.2210218.
• [2] NVIDIA Corp. 2007. NVIDIA CUDA Compute Unified Device Architecture Programming Guide 1.1. Technical report, NVIDIA, Santa Clara, CA.
THANK YOU
QUESTIONS