![Page 1: Performance Analysis of AMD Multi-core Processor and Graphical …developer.amd.com/wordpress/media/2013/09/2907_2_final.pdf · 2013. 10. 24. · Performance Analysis of AMD Multi-core](https://reader034.vdocuments.mx/reader034/viewer/2022051901/5ff00e1f97828018441804eb/html5/thumbnails/1.jpg)
Performance Analysis of AMD Multi-core Processor and Graphical Processing Units
Mohammad Ashraf Bhuiyan, Melissa C. Smith, Vivek K. Pallipuram
June 2011
This work was supported in part by NSF Grant No. CCF-0916387.
Motivation
The recent trend of computing
- Multicore and many-core processors
- Many-core GPUs

Various types of accelerators available
- Number of cores, threads
- Memory hierarchy
- Programming models
- Code optimization techniques

Parallel program development requires knowledge of
- Acceleration techniques and optimizations
- Application characteristics

This calls for
- Performance analysis of accelerators for applications
- Understanding the match between accelerators and applications
Outline
Experimental System
- AMD 8 core and 32 core CPU
- AMD 1600 core GPU

Spiking Neural Network
- Biological Models
- Network Design

Preliminary Results
- Effect of problem size
- Effect of optimizations
- Effect of threads/cores
Future Work
Experimental Systems
Utilizing several leading architectures
- AMD 8-core (Opteron 2356)
- AMD 32-core (Opteron 6134)
- AMD 1600-core GPU (Radeon 5870)
Case Study: Neuron Models & Network
Two Layer Network: [Figure: Image, Level 1 neurons, Level 2 neurons]

| SNN Model | FLOPs per neuron update | Memory access per neuron (bytes) | FLOP/byte ratio |
|---|---|---|---|
| Izhikevich | 13 | 20 | 0.65 |
| Wilson | 38 | 44 | 0.86 |
| Morris-Lecar | 132 | 28 | 4.71 |
| Hodgkin-Huxley | 246 | 44 | 6.02 |
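The FLOP/byte column is the FLOPs per neuron update divided by the bytes accessed per neuron. The following is a minimal Python sketch that recomputes the ratio from the two tabulated columns; the names `MODELS`, `flop_per_byte`, and `RATIOS` are illustrative and do not come from the slides. Recomputing from the listed values gives 246/44 ≈ 5.59 for Hodgkin-Huxley rather than the listed 6.02, so one of the Hodgkin-Huxley entries may differ from the value the authors used.

```python
# FLOPs per neuron update and bytes accessed per neuron, as listed in the table.
# (Names are hypothetical; only the numbers come from the slide.)
MODELS = {
    "Izhikevich": (13, 20),
    "Wilson": (38, 44),
    "Morris-Lecar": (132, 28),
    "Hodgkin-Huxley": (246, 44),
}


def flop_per_byte(flops, bytes_accessed):
    """Arithmetic intensity: floating-point operations per byte of memory traffic."""
    return flops / bytes_accessed


# Ratios rounded to two decimals, matching the table's precision.
RATIOS = {name: round(flop_per_byte(f, b), 2) for name, (f, b) in MODELS.items()}

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name:15s} {ratio:.2f}")
```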
Network (Problem Size) Scaling
| Image Size | Level 1 Neurons | Level 2 Neurons | Total Neurons |
|---|---|---|---|
| 96×96 | 9,216 | 48 | 9,264 |
| 192×192 | 36,864 | 48 | 36,912 |
| 240×240 | 57,600 | 48 | 57,648 |
| … | … | … | … |
| 2400×2400 | 5,760,000 | 48 | 5,760,048 |
| 3120×3120 | 9,734,400 | 48 | 9,734,448 |
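The table rows are consistent with one level 1 neuron per image pixel plus a fixed 48 level 2 neurons. A short Python sketch of that relationship follows; the one-neuron-per-pixel mapping is inferred from the numbers rather than stated on the slide, and the names `LEVEL2_NEURONS` and `network_size` are illustrative.

```python
LEVEL2_NEURONS = 48  # constant across all image sizes in the table


def network_size(side):
    """Return (level 1 neurons, total neurons) for a side x side input image.

    Assumes one level 1 neuron per pixel, which matches every listed row.
    """
    level1 = side * side
    return level1, level1 + LEVEL2_NEURONS


if __name__ == "__main__":
    for side in (96, 192, 240, 2400, 3120):
        level1, total = network_size(side)
        print(f"{side}x{side}: level 1 = {level1:,}, total = {total:,}")
```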
Preliminary Results
Accelerator performance study
- Problem size
- Optimization techniques
- Accelerator configuration
  - Number of threads for CPU
  - Local work group size for GPU
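For the GPU configuration, the local work group size determines how many work groups are launched. In OpenCL 1.x the global work size must be a multiple of the local work group size, so the global size is commonly rounded up. The sketch below shows that rounding in Python; it assumes one work-item per neuron, which the slides do not state, and `padded_global_size` is a hypothetical helper name.

```python
def padded_global_size(num_items, local_size):
    """Round num_items up to a multiple of local_size.

    Returns (global_size, num_work_groups). Work-items whose index is
    >= num_items would need to be masked out inside the kernel.
    """
    if num_items < 0 or local_size <= 0:
        raise ValueError("num_items must be >= 0 and local_size must be > 0")
    groups = -(-num_items // local_size)  # ceiling division on integers
    return groups * local_size, groups


if __name__ == "__main__":
    # 9,264 neurons is the smallest network size listed in the scaling table.
    for local in (64, 128, 256):
        print(local, padded_global_size(9264, local))
```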
Problem Size Variation
[Figure panels: Izhikevich, Wilson]
Speedup over a serial implementation on an Intel Core 2 Quad, 2.66 GHz, using all compiler optimizations
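Speedup here is the ratio of the serial baseline's run time to the accelerated run time for the same problem. A minimal Python sketch, with hypothetical timings (the slides report only the resulting ratios, not the underlying times):

```python
def speedup(serial_seconds, accelerated_seconds):
    """Speedup of an accelerated run relative to the serial baseline."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return serial_seconds / accelerated_seconds


if __name__ == "__main__":
    # Hypothetical example: a 120 s serial run that takes 10 s when accelerated.
    print(speedup(120.0, 10.0))
```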
Problem Size Variation Cont.
[Figure panels: Morris-Lecar, Hodgkin-Huxley]
Optimization Techniques Used
AMD Multi-core
1. pth: POSIX thread
2. SSE: Streaming SIMD Extension 3
3. SP: Software Prefetching

AMD Radeon GPU
1. MT: Multithread
2. SP: Software Prefetching
3. LM: Local Memory
4. MW: Memory Write
5. MAT: Unsafe Math and Native Math
6. RCS: Reducing Conditional Statement
7. VEC: Vector Calculation
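As an illustration of what reducing conditional statements (RCS) can look like, the sketch below performs one update of the Izhikevich model and replaces the usual `if v >= 30` reset branch with an arithmetic select. This is one possible interpretation, not the authors' kernel: the slides do not show how RCS was implemented. The parameter values are the standard regular-spiking settings from Izhikevich's published model, and the single forward-Euler step with `dt = 1.0` ms is an assumption.

```python
def izhikevich_step(v, u, i_in, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """One forward-Euler step of the Izhikevich neuron model.

    v: membrane potential (mV), u: recovery variable, i_in: input current.
    Returns (v, u, fired).
    """
    v_new = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + i_in)
    u_new = u + dt * a * (b * v - u)

    # The comparison yields a bool that acts as 0 or 1 in arithmetic.
    fired = v_new >= 30.0

    # Branch-free reset: pick c when the neuron fired, else keep v_new.
    v_out = fired * c + (1 - fired) * v_new
    u_out = u_new + fired * d
    return v_out, u_out, bool(fired)


if __name__ == "__main__":
    v, u = -65.0, -13.0
    for step in range(5):
        v, u, fired = izhikevich_step(v, u, i_in=10.0)
        print(step, round(v, 3), round(u, 3), fired)
```

On a GPU the same select pattern keeps all work-items in a group on one control path, which is the usual motivation for removing branches.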
Optimization: AMD 8 core
[Figure panels: Izhikevich, Wilson]
pth: POSIX thread, SSE: Streaming SIMD Extension 3, SP: Software Prefetching
Optimization: AMD 8 core Cont.
[Figure panels: Morris-Lecar, Hodgkin-Huxley]
Optimization: AMD 32 core
[Figure panels: Izhikevich, Wilson]
Optimization: AMD 32 core Cont.
[Figure panels: Morris-Lecar, Hodgkin-Huxley]
Optimization: AMD 1600 core GPU
[Figure panels: Izhikevich, Wilson]
MT: multithread, SP: software prefetching, LM: local memory, MW: memory write, RCS: reducing conditional statement, MAT: Unsafe and Native math, VEC: Vector Calculation
Optimization: AMD 1600 core GPU
[Figure panels: Morris-Lecar, Hodgkin-Huxley]
Thread Effect: AMD 8 core
Thread Effect: AMD 32 core
Thread Effect: AMD 1600 core GPU
Performance Observations
Problem Size Effect
- Generally performance improves with problem size
- Izhikevich model on AMD 8 core CPU: speedup of 9x for 9000 neurons; 16x for 9.7 million neurons
- HH model on AMD 1600 core GPU: speedup of 11x for 9000 neurons; 603x for 9.7 million neurons

Flop:byte Ratio Effects
- Higher value provides better performance
- Izhikevich (0.65): 12x
- HH (6.02): 603x
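One common way to reason about why a higher flop:byte ratio helps is a roofline-style bound, in which attainable throughput is the smaller of the compute peak and the memory bandwidth multiplied by the flop:byte ratio. The slides do not use this model; the sketch below is offered only as an illustrative lens. The peak figures are the vendor-published single-precision peak and memory bandwidth of the Radeon HD 5870, and `attainable_gflops` is a hypothetical helper name. The measured speedups are relative to a serial CPU baseline, so this bound alone does not predict them.

```python
# Vendor-published peak figures for the Radeon HD 5870 (not from the slides).
HD5870_PEAK_GFLOPS = 2720.0    # single precision
HD5870_BANDWIDTH_GB_S = 153.6  # memory bandwidth


def attainable_gflops(flop_per_byte, peak_gflops, bandwidth_gb_s):
    """Roofline-style upper bound on throughput for a given flop:byte ratio."""
    return min(peak_gflops, flop_per_byte * bandwidth_gb_s)


if __name__ == "__main__":
    # Flop:byte ratios as listed on the slide.
    for name, ratio in (("Izhikevich", 0.65), ("HH", 6.02)):
        bound = attainable_gflops(ratio, HD5870_PEAK_GFLOPS, HD5870_BANDWIDTH_GB_S)
        print(f"{name}: at most {bound:.1f} GFLOP/s under this model")
```

Under these assumptions both models sit below the compute peak, so the bound scales with the flop:byte ratio, which is consistent in direction with the slide's observation.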
Performance Observations
Architecture Specific Optimizations
- Generally performance improves with optimizations
- Also depends on
  - Problem size
  - Flop:byte ratio

Threading Effect
- Generally performance improves with threads
- Also depends on
  - Problem size
  - Overhead for intra-processor communications
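The dependence on problem size can be seen by looking at how much work each thread receives. The sketch below splits neurons into contiguous blocks, one per thread; this static block partition is an assumption, since the slides do not say how neurons were divided among POSIX threads, and `partition` is a hypothetical helper name.

```python
def partition(num_neurons, num_threads):
    """Split range(num_neurons) into num_threads contiguous (start, end) blocks.

    Block sizes differ by at most one neuron; end is exclusive.
    """
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    base, extra = divmod(num_neurons, num_threads)
    ranges = []
    start = 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)  # first `extra` threads get one more
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    for total in (9264, 9734448):  # smallest and largest networks in the deck
        blocks = partition(total, 32)
        print(total, "neurons ->", blocks[0][1] - blocks[0][0], "per thread (first block)")
```

With 32 threads the smallest network leaves each thread only about 290 neurons per time step, so fixed per-thread costs weigh more heavily than they do for the largest network.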
Future Work
Extend the experiment
- Heterogeneous architecture (multi-core + GPU)
- Multi-node accelerators (Supercomputers)
- Accelerators from other vendors
- Other application kernels such as
  - Bioinformatics
  - Molecular Dynamics
  - Optimization problems (Simulated Annealing)
Related Publications
Journal
- Mohammad Bhuiyan, Melissa C. Smith, Vivek K. Pallipuram, “Performance, Optimization and Fitness: Connecting Applications to Architectures”, in Journal of Concurrency and Computation: Practice and Experience, Wiley, December 2010, DOI: 10.1002/cpe.1688
- Vivek K. Pallipuram, Mohammad Bhuiyan, and Melissa C. Smith, “A Comparative Study of GPU Programming Models and Architectures”, in Journal of Supercomputing, Springer, May 2011, DOI: 10.1007/s11227-011-0631-3

Conference
- Mohammad Bhuiyan, Ananth Nallamuthu, Melissa C. Smith, and Vivek K. Pallipuram, “Optimization and Performance Study of Large-scale Biological Networks For Reconfigurable Computing,” in proceedings of HPRCTA, SC 10, New Orleans, April 2010
- Mohammad Bhuiyan, Vivek K. Pallipuram and Melissa C. Smith, “Acceleration of Spiking Neural Networks in Emerging Multi-core and GPU Architectures,” in IEEE proceedings HiCOMB, IPDPS, Atlanta, GA, April 2010
- Kenneth Rice, Mohammad Bhuiyan, Tarek M. Taha, Christopher N. Vutsinas, Melissa C. Smith, “FPGA Implementation of Izhikevich Spiking Neural Networks for Character Recognition,” in proceedings ReConFig09, pp. 451–456, Dec. 2009
Thank you