GPU DAS
CSIRO ASTRONOMY AND SPACE SCIENCE
Chris Phillips
23rd October 2012
1st International VLBI Technology Workshop: GPU DAS | Chris Phillips
VLBI DAS – What is needed
• Convert analog signal to digital
• Format data including time code
In the age of disk recording and software correlation anything else is “optional”
VLBI DAS – What is desirable
• Channelization
  • Polyphase filterbank
• Re-quantize data
  • Reduced data rate
• Tsys extraction
• Digital linear to circular conversion
• Plus lots more – phasecal extraction etc
“Traditional DAS”
• High speed (multi-bit) sampler and FPGA for processing
  • RDBE, DBBC etc
• FPGA: high compute capability, low power usage and compact
• Ideal for VLBI usage
• But…
GPU Processing
• GPUs are extremely fast and cheap
• Nvidia GTX 690
  • $1000
  • 6 Tflops (theoretical)
GPU Arithmetic Performance
GPU Power Efficiency
GPU Memory Bandwidth
CUDA
• A parallel computing architecture developed by NVIDIA
• Extensions to C(++) to allow massive parallelization running on a GPU
• Run thousands of threads in parallel
• Very simple to program
__global__ void multiply_kernel(float *a, const float *b,
                                const float *c) {
  // One thread per element: global index from block and thread position
  const size_t i = blockDim.x * blockIdx.x + threadIdx.x;
  a[i] = b[i] * c[i];
}
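To see what the index expression does, the same multiply can be walked through on the CPU with explicit loops over blocks and threads. This is a plain C++ sketch rather than CUDA. `multiply_emulated`, `block_dim` and `grid_dim` are illustrative names, and the bounds check is an addition for element counts that are not a multiple of the block size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU emulation of a 1-D CUDA launch: every (block, thread) pair handles
// one element, using the same global index formula as the kernel.
std::vector<float> multiply_emulated(const std::vector<float>& b,
                                     const std::vector<float>& c,
                                     std::size_t block_dim) {
  assert(b.size() == c.size() && block_dim > 0);
  const std::size_t n = b.size();
  // Round up so the last, partially filled block is still "launched".
  const std::size_t grid_dim = (n + block_dim - 1) / block_dim;
  std::vector<float> a(n, 0.0f);
  for (std::size_t block_idx = 0; block_idx < grid_dim; ++block_idx) {
    for (std::size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
      const std::size_t i = block_dim * block_idx + thread_idx;
      if (i < n) {  // guard: indices past the end do nothing
        a[i] = b[i] * c[i];
      }
    }
  }
  return a;
}
```

On a GPU the two loops disappear: each iteration becomes one thread, and only the body remains in the kernel.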
GPU DAS Experiment
• Goal: 1 GHz dual pol
  • 8 Gbps @ 2 bits
• Assume dual pol sampled at 8 bit real
• Channelize to 16 MHz
• Linear to Circular conversion
• (Tsys)
• Convert to 2 bit
• Pack (interleave)
Steps Required
• Copy data to GPU
• Convert 8 bit to float
• De-interleave data
• FFT (eventually polyphase filterbank)
• Complex gain correction
• Linear to circular
• Calculate RMS
• Quantization and interleave
• Testing on GTX 590 ($500)
I/O Bandwidth
• 1 GHz real samples @ 8 bits, dual pol: 32 Gbps
• bandwidthTest
  • Host → GPU: 47 Gbps
  • GPU → Host: 50 Gbps
• CUDA allows DMA transfer while processing
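The 32 Gbps input rate and the 8 Gbps goal at 2 bits both follow from the same arithmetic, assuming Nyquist sampling of real data (two samples per Hz of bandwidth). A minimal C++ sketch of that calculation, with `data_rate_gbps` and `link_utilization` as illustrative names:

```cpp
#include <cassert>

// Aggregate data rate in Gbps for Nyquist-sampled real data.
// bandwidth_ghz: analog bandwidth per polarization
// bits: bits per sample, npol: number of polarizations
double data_rate_gbps(double bandwidth_ghz, int bits, int npol) {
  assert(bandwidth_ghz > 0.0 && bits > 0 && npol > 0);
  const double gsamples_per_sec = 2.0 * bandwidth_ghz;  // Nyquist rate, real sampling
  return gsamples_per_sec * bits * npol;
}

// Fraction of a measured link bandwidth that a stream would occupy.
double link_utilization(double stream_gbps, double link_gbps) {
  assert(link_gbps > 0.0);
  return stream_gbps / link_gbps;
}
```

With the figures above, `data_rate_gbps(1.0, 8, 2)` gives 32 and `data_rate_gbps(1.0, 2, 2)` gives 8, and the 8-bit input stream would use roughly two thirds of the measured 47 Gbps host-to-GPU bandwidth.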
Convert to Float
• 142 ms
__global__ void unpack8bit_2chan_kernel(float *dest, const int8_t *src, int N) {
  // Each thread converts one interleaved sample pair into two planar channels
  const size_t i = blockDim.x * blockIdx.x + threadIdx.x;
  const size_t j = i * 2;
  dest[i] = static_cast<float>(src[j]);
  dest[i + N] = static_cast<float>(src[j + 1]);
}
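A CPU reference of the same unpack step is useful for checking the kernel output on a small buffer. The sketch below is plain C++ under the same layout the kernel implies: interleaved input (channel 0, channel 1, channel 0, ...) and planar output with channel 1 stored after channel 0. The function name `unpack8bit_2chan_ref` is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// De-interleave 8-bit samples (ch0, ch1, ch0, ch1, ...) into a planar float
// buffer: channel 0 in dest[0..n), channel 1 in dest[n..2n).
std::vector<float> unpack8bit_2chan_ref(const std::vector<std::int8_t>& src) {
  assert(src.size() % 2 == 0);
  const std::size_t n = src.size() / 2;  // samples per channel
  std::vector<float> dest(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    dest[i] = static_cast<float>(src[2 * i]);
    dest[i + n] = static_cast<float>(src[2 * i + 1]);
  }
  return dest;
}
```

Here `n` plays the role of the kernel's `N`, and the loop variable `i` corresponds to the thread index.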
FFT
• 465 ms
cufftPlan1d(&plan, 32, CUFFT_R2C, batch);
cufftSetCompatibilityMode(plan, CUFFT_COMPATIBILITY_NATIVE);
cufftExecR2C(plan, in, out);
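A real-to-complex transform of N real samples yields N/2 + 1 non-redundant complex bins, so each 32-point block in the batch produces 17 outputs. The sketch below is a naive C++ DFT of one block, intended only as a slow reference for checking results on small inputs. It is not how cuFFT computes the transform, and `dft_r2c_ref` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive real-to-complex DFT of one block of n real samples.
// Returns the n/2 + 1 non-redundant bins (DC up to Nyquist), unnormalized.
std::vector<std::complex<float>> dft_r2c_ref(const std::vector<float>& in) {
  const std::size_t n = in.size();
  assert(n > 0 && n % 2 == 0);
  const double two_pi = 2.0 * std::acos(-1.0);
  std::vector<std::complex<float>> out(n / 2 + 1);
  for (std::size_t k = 0; k < out.size(); ++k) {
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
      const double phase =
          -two_pi * static_cast<double>(k * t) / static_cast<double>(n);
      acc += static_cast<double>(in[t]) *
             std::complex<double>(std::cos(phase), std::sin(phase));
    }
    out[k] = std::complex<float>(static_cast<float>(acc.real()),
                                 static_cast<float>(acc.imag()));
  }
  return out;
}
```

A cosine with exactly four cycles across a 32-sample block should put all of its power in bin 4, with magnitude 16 for unit amplitude.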
Linear to Circular
• 254 ms
__global__ void linear2circular_kernel(cuFloatComplex *data, int nchan, int N, cuFloatComplex *gain) {
  int i = blockDim.x * blockIdx.x + threadIdx.x;
  int c = i % nchan;
  cuFloatComplex temp;
  // Per-channel complex gain correction for each polarization
  data[i] = cuCmulf(data[i], gain[c]);
  data[i + N] = cuCmulf(data[i + N], gain[c + nchan]);
  // Difference and sum of the two corrected polarizations
  temp = cuCsubf(data[i], data[i + N]);
  data[i + N] = cuCaddf(data[i], data[i + N]);
  data[i] = temp;
}
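The kernel only forms the difference and sum of the two gain-corrected polarizations, so any 90° phase term and amplitude scaling are presumably folded into the complex gains. A plain C++ reference using `std::complex`, under that assumption and with the same planar layout, could look like this (`linear2circular_ref` and `cfloat` are illustrative names):

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// data holds two planar polarizations: first in [0, n), second in [n, 2n).
// gain holds nchan gains for the first polarization, then nchan for the second.
void linear2circular_ref(std::vector<cfloat>& data,
                         const std::vector<cfloat>& gain, std::size_t nchan) {
  assert(data.size() % 2 == 0 && nchan > 0 && gain.size() == 2 * nchan);
  const std::size_t n = data.size() / 2;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t c = i % nchan;                 // channel of this sample
    const cfloat x = data[i] * gain[c];              // gain-corrected pol 1
    const cfloat y = data[i + n] * gain[c + nchan];  // gain-corrected pol 2
    data[i] = x - y;
    data[i + n] = x + y;
  }
}
```

With unit gains the outputs are simply the difference and sum of the inputs, which makes a convenient first check against the GPU result.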
RMS
• “Reduction”, so potentially problematic
• Took SDK sample for mean calculation
• Assumed real sampled data
• 115 msec
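A reduction is awkward on a GPU because every output depends on all inputs, so it is usually done as a tree: each pass halves the number of partial sums. The sketch below shows that pattern in plain C++ for an RMS over real samples. It is not the SDK sample referred to above, and `rms_tree` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RMS of real samples using a tree-style reduction: square every sample,
// then repeatedly add the upper half of the buffer onto the lower half.
// On a GPU each pass would run in parallel; here the passes are plain loops.
float rms_tree(const std::vector<float>& x) {
  assert(!x.empty());
  std::vector<double> partial(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    partial[i] = static_cast<double>(x[i]) * static_cast<double>(x[i]);
  }
  std::size_t active = partial.size();
  while (active > 1) {
    const std::size_t half = (active + 1) / 2;  // handles odd lengths
    for (std::size_t i = 0; i + half < active; ++i) {
      partial[i] += partial[i + half];
    }
    active = half;
  }
  return static_cast<float>(
      std::sqrt(partial[0] / static_cast<double>(x.size())));
}
```

The number of passes grows as log2 of the sample count, which is why the step stays cheap compared with the FFT.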
Convert to 2bit and interleave
• Just use if/then/else block with 3 thresholds
• Specially coded 32 channel dual pol case
• 5.8 seconds!!!
Convert to 2bit and interleave, try 2
• Just use if/then/else block with 3 thresholds
• Used simple dual pol case
• 247 msec
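The if/then/else quantizer and a simple dual pol interleave can be sketched in plain C++ as follows. The thresholds are taken as -t, 0 and +t. The code assignment (0 for the most negative level up to 3 for the most positive) and the bit order within each byte are assumptions made for illustration, not the format used in the talk, and `quantize2bit` and `pack2bit_dualpol` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Map a sample to a 2-bit code using three thresholds: -t, 0 and +t.
// Assumed coding: 0 = most negative level ... 3 = most positive level.
std::uint8_t quantize2bit(float v, float t) {
  if (v < -t) return 0;
  else if (v < 0.0f) return 1;
  else if (v < t) return 2;
  else return 3;
}

// Quantize two polarizations and interleave them sample by sample,
// four 2-bit codes per byte, first code in the least significant bits.
std::vector<std::uint8_t> pack2bit_dualpol(const std::vector<float>& pol0,
                                           const std::vector<float>& pol1,
                                           float t) {
  assert(pol0.size() == pol1.size() && pol0.size() % 2 == 0 && t > 0.0f);
  std::vector<std::uint8_t> out(pol0.size() / 2, 0);
  for (std::size_t i = 0; i < pol0.size(); ++i) {
    const std::size_t code_index = 2 * i;  // position of the pol0 code
    const std::size_t byte = code_index / 4;
    const unsigned shift = static_cast<unsigned>(code_index % 4) * 2;
    out[byte] |= static_cast<std::uint8_t>(quantize2bit(pol0[i], t) << shift);
    out[byte] |=
        static_cast<std::uint8_t>(quantize2bit(pol1[i], t) << (shift + 2));
  }
  return out;
}
```

In practice the threshold `t` would be derived from the measured RMS of each channel, which is one reason the RMS step precedes quantization.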
Summary
| Stage | Time (msec) |
|---|---|
| Transfer to GPU | OK |
| Unpack and convert to float | 142 |
| FFT | 465 |
| RMS | 115 |
| Linear to Circular | 253 |
| Quantize and pack | 247 |
| Total | 1222 |

• But 2 GPUs, so OK
Polyphase filterbank?
• Tests used FFT channelization
• Prefer PFB
• Conventional wisdom says a PFB needs 1.5-2x the compute of an FFT
• Assuming 2x time for FFT and dual GPU gives total compute time as 0.84 seconds
• 320 msec to implement Tsys extraction if simple RMS calculation is not enough
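The extra cost of a polyphase filterbank comes from a weighting step ahead of the FFT: several consecutive blocks of input are multiplied by a prototype filter and summed into one block, which is then transformed as usual. A minimal C++ sketch of that front end is shown below. The filter design itself (typically a windowed sinc) is not covered, and `pfb_frontend`, `nfft` and `ntap` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Polyphase filterbank front end for one output spectrum.
// x:     at least ntap * nfft input samples (oldest first)
// coeff: prototype filter with ntap * nfft coefficients
// Returns nfft weighted-and-summed samples, ready for an nfft-point FFT.
std::vector<float> pfb_frontend(const std::vector<float>& x,
                                const std::vector<float>& coeff,
                                std::size_t nfft, std::size_t ntap) {
  assert(nfft > 0 && ntap > 0);
  assert(coeff.size() == nfft * ntap && x.size() >= nfft * ntap);
  std::vector<float> block(nfft, 0.0f);
  for (std::size_t tap = 0; tap < ntap; ++tap) {
    for (std::size_t n = 0; n < nfft; ++n) {
      const std::size_t idx = tap * nfft + n;
      block[n] += x[idx] * coeff[idx];  // ntap multiply-adds per output sample
    }
  }
  return block;
}
```

With `ntap` equal to 1 and all coefficients equal to 1, the function returns its input unchanged and the scheme reduces to plain FFT channelization.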
CPU Comparison?
| Stage (msec unless noted) | GTX 590 | 2x Intel E5-2609 @ 2.40GHz |
|---|---|---|
| Unpack and convert to float | 142 | 315 |
| FFT (PFB) | 465 (930) | 1469 (2939) |
| RMS | 115 | 243 |
| Linear to Circular | 253 | 1196 |
| Quantize and pack | 247 | 479 |
| Total (sec) | 0.61 (0.84) | 3.7 (5.2) |
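
The totals appear to follow directly from the per-stage times: the GTX 590 column sums to 1222 msec and is halved for its two GPUs, while the CPU column is summed as measured. A small C++ sketch of that bookkeeping, with `StageTimes` and `total_seconds` as illustrative names and an even split across devices as an assumption:

```cpp
#include <cassert>

// Per-stage times in milliseconds, as listed in the table.
struct StageTimes {
  double unpack, fft, rms, lin2circ, pack;
};

// Total time in seconds. fft_scale models a PFB as a multiple of the FFT
// cost; n_devices assumes the work divides evenly across identical devices.
double total_seconds(const StageTimes& t, double fft_scale, int n_devices) {
  assert(fft_scale > 0.0 && n_devices > 0);
  const double ms =
      t.unpack + fft_scale * t.fft + t.rms + t.lin2circ + t.pack;
  return ms / 1000.0 / n_devices;
}
```

For the GTX 590 figures, `total_seconds(t, 1.0, 2)` gives about 0.61 and `total_seconds(t, 2.0, 2)` about 0.84. For the CPU figures, `total_seconds(t, 1.0, 1)` gives about 3.7.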
GPU Comparison
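| Stage (msec unless noted) | GTX 590 | GTX 690 | GTX 670* | GTX 480 |
|---|---|---|---|---|
| Unpack | 142 | 156 | 148 | 136 |
| FFT | 465 | 420 | 407 | 450 |
| RMS | 115 | 111 | 174 | 110 |
| Linear to Circular | 253 | 220 | 213 | 241 |
| Pack | 247 | 123 | 353 | 260 |
| Total (sec) | 1.22 | 1.14 | 1.30 | 1.20 |

*Single GPU, rest are dual GPU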
Conclusion
• Modern GPU should have enough compute power to cover most VLBA DAS requirements
• A lot of scope for speed improvements
  • Combine some stages together
• A working system would allow for a very generic backend
  • VLBI, spectral line, pulsar, fast transient detector…
CSIRO Astronomy and Space Science
t +61 2 9372 4608
E [email protected]
www.atnf.csiro.au
CSIRO ASTRONOMY AND SPACE SCIENCE
Thank you