Performance Bugs on Real-World Heterogeneous Architecture
Tsung Tai Yeh, Purdue University
Heterogeneous GPU Architecture
Integrated GPU
+ Found in mobile devices and notebooks
+ Low power (ARM + Tegra)
+ Shares global memory
Discrete GPU
+ Found in laptops and servers
+ High performance
+ Dedicated global memory
Heterogeneous architecture is everywhere
Heterogeneous Architecture Programming
• Programming languages
– CUDA
– OpenCL
– OpenACC
• Programming model
– CPU-GPU interaction
– Hierarchical thread architecture
– Multiple memories (shared, constant, global memory)
What is the critical issue on heterogeneous architecture?

| Specification | CPU (Intel i7) | GPU (GeForce GTX 690) |
| --- | --- | --- |
| Number of cores | 4 | 3072 |
| Clock speed | 2.5 GHz | 975 MHz |
| Memory bandwidth | (not given) | 384 GB/s |
| Power consumption | 140-160 W | 300 W |

• Power and performance
Source: Nvidia, Intel
How to Fight Performance Bugs
Bug Detection
Bug Fixing
Performance Testing
Bug Avoidance
What are performance bugs?
• Performance bugs
– Defects where relatively simple source-code changes can significantly speed up software while preserving functionality.
• Software efficiency is increasingly important
– Hardware is not getting faster (per core)
– Software is getting more complex
– Energy saving is getting more urgent
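As a hedged illustration of the definition above (not taken from any of the studied applications), the sketch below shows a small source change that preserves functionality while reducing work: the membership test moves from a list to a set. The function names and sample data are hypothetical, and the elements are assumed to be hashable.

```python
def common_ids_slow(wanted, available):
    """Return the wanted IDs that are available, in order.

    Each `in available` test scans the list, so the cost grows with
    len(wanted) * len(available).
    """
    return [x for x in wanted if x in available]


def common_ids_fast(wanted, available):
    """Same result; one extra line builds a set for fast average-case lookups."""
    available_set = set(available)
    return [x for x in wanted if x in available_set]


# Hypothetical sample data: both versions must agree on the output.
wanted = list(range(0, 2000, 3))
available = list(range(0, 2000, 2))
assert common_ids_slow(wanted, available) == common_ids_fast(wanted, available)
```

The patch is two lines and does not change what the function returns, which is the pattern the definition describes.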
Performance bugs and traditional bugs
• Root cause of bugs
– Similar to traditional bugs, since both are related to usage rules of functions/APIs
• Bug fixing
– Developers cannot fight performance bugs by themselves, since they cannot predict future workloads or code changes to avoid bugs
• Long lifetime of performance bugs
– The 36 Mozilla performance bugs took 935 days on average to be discovered and another 140 days on average to be fixed. By comparison, functional bugs took 252 days on average to be discovered and 117 days to be fixed.
What are the challenges of performance bugs?
• Not compiler-associated
– These defects cannot be optimized away by compilers.
• Drawbacks of testing techniques
– Traditional testing methods allow most performance bugs to escape.
– Performance bugs need inputs with special features to manifest.
– Performance bugs need large-scale inputs to manifest in a perceivable way.
– Non-fail-stop symptoms.
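The testing drawbacks above can be made concrete with a small sketch. It uses a hypothetical quadratic duplicate check that also counts its comparisons, so the effect of input feature and input scale is visible without timing anything. The function and inputs are illustrative assumptions, not code from the studied applications.

```python
def has_duplicate(items):
    """Quadratic duplicate check that also reports how many comparisons ran."""
    comparisons = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons


# Input with an early duplicate: the loop exits almost immediately.
print(has_duplicate([7, 7] + list(range(1000))))   # (True, 1)

# Small duplicate-free input: the inefficiency exists but is not perceivable.
print(has_duplicate(list(range(10))))              # (False, 45)

# Large duplicate-free input: special feature plus scale exposes the cost.
print(has_duplicate(list(range(1000))))            # (False, 499500)
```

All three calls return the correct answer, so nothing fails or stops. Only the last input, which is both duplicate-free and large, makes the extra work noticeable.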
Where do performance bugs come from?
• Uncoordinated functions
– Misuse of inefficient function-call combinations
• Skippable functions
– Conduct unnecessary work given the calling context
• Synchronization issues
– Unnecessary synchronization intensifies thread competition
• Others
– Wrong data structures, hardware architecture issues, high-level design/algorithm issues
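A minimal sketch of the skippable-function root cause listed above, under invented assumptions: `compute_layout` stands in for an expensive step, the image records are hypothetical, and alpha values are assumed to be non-negative. The buggy version does the expensive work even when the calling context makes the result useless.

```python
# Shared counter so the amount of expensive work can be observed.
stats = {"layout_calls": 0}


def compute_layout(image):
    """Stand-in for an expensive step; counts how often it runs."""
    stats["layout_calls"] += 1
    return (image["width"] * 2, image["height"] * 2)


def draw_all(images):
    """Buggy version: computes layout even for fully transparent images."""
    drawn = []
    for image in images:
        size = compute_layout(image)
        if image["alpha"] > 0:
            drawn.append((image["name"], size))
    return drawn


def draw_visible(images):
    """Fixed version: the check moves up so invisible images are skipped."""
    drawn = []
    for image in images:
        if image["alpha"] <= 0:
            continue  # nothing to show, so the layout work is skippable
        drawn.append((image["name"], compute_layout(image)))
    return drawn


# Hypothetical data: every second image is fully transparent.
images = [
    {"name": f"img{i}", "width": 4, "height": 3, "alpha": i % 2}
    for i in range(8)
]
assert draw_all(images) == draw_visible(images)
```

Both versions draw the same images. The fix only reorders a condition, which matches the small patches typically associated with this kind of bug.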
![Page 10: Performance Bugs on real world heterogeneous architecture](https://reader034.vdocuments.mx/reader034/viewer/2022043013/626b61c5c5e8a14195699650/html5/thumbnails/10.jpg)
0 10 20 30 40 50
Uncoo
rdinate
d Fun
cPon
s
Skippable
FuncPo
n
Synchron
izaP
on Issue
Others
MySQL
Mozilla
GCC
Chrome
Apache
Root Causes of Performance Bugs�
Performance Bug Detec)on�
Domina)ng �
Implica)on: Future bug detecPon research should focus on these common root causes.
How Performance Bugs Manifest
Performance Testing
[Figure: bar chart, scale 0 to 80, of triggering conditions (Always Active, Special Feature, Special Scale, Feature+Scale) for MySQL, Mozilla, GCC, Chrome, and Apache]
Implication: New input generation tools are needed.
How Performance Bugs Manifest
Performance Testing
[Figure: bar chart, scale 0 to 80, of triggering conditions (Always Active, Special Feature, Special Scale, Feature+Scale) for MySQL, Mozilla, GCC, Chrome, and Apache; callouts: "Special Feature", "Large Scale"]
Locations of Performance Bugs
Performance Bug Detection
Implication: Detecting inefficiency in nested loops is critical.
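One common form of nested-loop inefficiency is loop-invariant work hidden inside the inner loop. The sketch below is a hypothetical example of that pattern, assuming non-empty rows whose maximum is non-zero; it is meant only to show the shape of the problem, not a bug from the studied applications.

```python
def normalize_rows_slow(matrix):
    """Scale each row by its maximum; max(row) is recomputed per element."""
    result = []
    for row in matrix:
        scaled = []
        for value in row:
            scaled.append(value / max(row))  # hidden inner loop over the row
        result.append(scaled)
    return result


def normalize_rows_fast(matrix):
    """Same output; the loop-invariant max(row) is hoisted out of the inner loop."""
    result = []
    for row in matrix:
        row_max = max(row)
        result.append([value / row_max for value in row])
    return result


matrix = [[1, 2, 4], [3, 6, 9]]
assert normalize_rows_slow(matrix) == normalize_rows_fast(matrix)
```

The slow version looks like two nested loops but behaves like three, because `max(row)` walks the row again for every element.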
How Performance Bugs are Fixed
Performance Bug Fixing
[Figure: bar chart, scale 0 to 50, of fix strategies (Change Call Sequence, Change Condition, Change a Parameter, Others) for MySQL, Mozilla, GCC, Chrome, and Apache]
• Patch sizes are small
– 42 patches are no larger than 5 LOC
– Median patch size = 8 lines of code
Fixing performance bugs does not hurt readability
Performance bugs on GPU
• Global memory data access patterns
– Adjacent, row-based, shared data accesses
• Block dimensions in thread hierarchy
– No data reuse through shared memory
• Code portability
– Different GPUs necessitate different performance considerations
• Function specialization
• Floating-point number computations
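To give a rough sense of why global memory access patterns matter, the sketch below counts how many memory segments one group of threads touches for a given access stride. It is a deliberately simplified model and not a description of any specific GPU: the warp size of 32, 4-byte elements, and 128-byte segments are assumptions, and real hardware behavior also depends on caching and GPU generation.

```python
def transactions_per_warp(stride, warp_size=32, elem_bytes=4, segment_bytes=128):
    """Count distinct memory segments touched when thread t reads element t * stride."""
    segments = {
        (t * stride * elem_bytes) // segment_bytes for t in range(warp_size)
    }
    return len(segments)


# Adjacent threads read adjacent elements: one segment serves the whole warp.
print(transactions_per_warp(1))    # 1

# Every other element: the same data request now spans two segments.
print(transactions_per_warp(2))    # 2

# Stride of 32 elements, e.g. walking down a column of a 32-wide row-major
# matrix: every thread lands in its own segment.
print(transactions_per_warp(32))   # 32
```

Under these assumptions, changing which index varies fastest across threads changes the memory traffic by a factor of 32 without changing the result of the computation.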
Performance bugs and reliability
• Performance and energy
– Performance ↑, and energy cost ↓
– Energy = power × time = (dynamic power + leakage power) × time
• Energy and temperature (heat)
– Energy ↑, and temperature ↑
• Heat and reliability
– Heat ↑, and reliability ↓
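The energy relation above can be worked through with a short calculation. The wattages and run times below are invented for illustration, and the sketch assumes that fixing a performance bug shortens the run time without changing the power draw.

```python
def energy_joules(dynamic_watts, leakage_watts, seconds):
    """Energy = (dynamic power + leakage power) x time."""
    return (dynamic_watts + leakage_watts) * seconds


# Hypothetical numbers: same power draw, run time cut from 10 s to 4 s.
before = energy_joules(dynamic_watts=200.0, leakage_watts=50.0, seconds=10.0)
after = energy_joules(dynamic_watts=200.0, leakage_watts=50.0, seconds=4.0)
print(before, after)  # 2500.0 1000.0
```

With power held constant, energy falls in direct proportion to run time, which is the link between performance and energy cost stated on the slide.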
Future work
• Performance bug recognition
– Caused by heterogeneous architecture
– Caused by parallel programming
– What do these bugs look like?
– How do they differ from traditional bugs?
– What are the root causes of these bugs?
• Performance bug detection
– How can these bugs be detected?
– How does detection differ from traditional bug detection techniques?
Future work
• Bug avoidance
– How to fix them?
– Automatic or manual?
• Modeling performance bugs and system reliability
– How do performance bugs affect system and hardware reliability?
– Energy-aware, reliability-aware programming
Conclusion
• Performance bug research
– Performance bugs are becoming critical in computing
– Research on performance bug analysis is in its infancy
– Software complexity makes performance bugs difficult to discover
• Heterogeneous architecture programming
– Portability
– Power, performance
References
• [1] Guoliang Jin, Linhai Song, Xiaoming Shi, Joel Scherpelz, and Shan Lu, "Understanding and Detecting Real-World Performance Bugs", PLDI, 2012
• [2] Y. Yang, P. Xiang, M. Mantor, and H. Zhou, "Fixing Performance Bugs: An Empirical Study of Open-Source GPGPU Programs", ICPP, 2012
• [3] Guodong Li, Peng Li, Geof Sawaya, Ganesh Gopalakrishnan, Indradeep Ghosh, and Sreeranga P. Rajan, "GKLEE: Concolic Verification and Test Generation for GPUs", PPoPP, 2012
• [4] Mai Zheng, Vignesh T. Ravi, Feng Qin, and Gagan Agrawal, "GRace: A Low-Overhead Mechanism for Detecting Data Races in GPU Programs", PPoPP, 2011
• [5] Nvidia, www.nvidia.com