Exploiting OpenCL for heterogeneous computing: a case study
Simon McIntosh-Smith, simonm@cs.bris.ac.uk
Head of Microelectronics Research
Slides: ...simonm/publications/sms_bude_gpu_devcon_2011.pdf
! Collaborators
• Richard B. Sessions, Amaurys Avila Ibarra
  – University of Bristol, Biochemistry
• James Price
  – University of Bristol, Computer Science
• Tsuyoshi Hamada, Felipe Cruz
  – University of Nagasaki, Japan
! Molecular docking
Proteins: typically O(1000) atoms
Ligands: typically O(100) atoms
! BUDE: Bristol University Docking Engine
Comparison of scoring approaches (slide axes: Accuracy, Speed)

                            Typical docking      BUDE (Empirical Free   Free Energy calculations
                            scoring functions    Energy Forcefield)     (MM [1,2], QM/MM [3])
Entropy: solvation          No                   Yes                    Yes
Entropy: configurational    Approx               Approx                 Yes
Electrostatics              ?                    Approx                 Yes
All atom                    No                   Yes                    Yes
Explicit solvent            No                   No                     Yes

1. MD Tyka, AR Clarke, RB Sessions, J. Phys. Chem. B 110 17212-20 (2006)
2. MD Tyka, RB Sessions, AR Clarke, J. Phys. Chem. B 111 9571-80 (2007)
3. CJ Woods, FR Manby, AJ Mulholland, J. Chem. Phys. 128 014109 (2008)
! How BUDE’s EMC works
! Experimental results
! Redocking into X-ray structure
1CIL (Human carbonic anhydrase II)
! Another example
1EZQ (Human Factor Xa)
! OpenCL for heterogeneous computing
A modern computer includes:
– One or more CPUs
– One or more GPUs

OpenCL (Open Computing Language) lets programmers write a single portable program that uses ALL resources in the heterogeneous platform
! BUDE’s heterogeneous approach
1. Discover all OpenCL platforms/devices, inc. both CPUs and GPUs
2. Run a micro benchmark on each device, ideally a short piece of real work
3. Load balance using micro benchmark results
4. Re-run micro benchmark at regular intervals in case load changes
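
The four steps above can be sketched in a few lines of Python. This is only an illustration of proportional load balancing, not BUDE's actual code: the device names, the throughput numbers, and the helpers `micro_benchmark` and `split_work` are all hypothetical, and real device discovery would go through the OpenCL host API rather than a hard-coded dictionary.

```python
import time
from typing import Callable, Dict


def micro_benchmark(run_sample: Callable[[], None]) -> float:
    """Time one short piece of work and return a throughput (runs per second)."""
    start = time.perf_counter()
    run_sample()
    elapsed = time.perf_counter() - start
    return 1.0 / max(elapsed, 1e-9)  # guard against a zero reading


def split_work(total_items: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Divide total_items across devices in proportion to measured throughput."""
    total_rate = sum(throughput.values())
    shares = {
        name: int(total_items * rate / total_rate)
        for name, rate in throughput.items()
    }
    # Items lost to integer truncation go to the fastest device.
    fastest = max(throughput, key=throughput.get)
    shares[fastest] += total_items - sum(shares.values())
    return shares


# Hypothetical micro benchmark results (higher is faster); not measured data.
rates = {"gpu0": 12.0, "gpu1": 6.0, "cpu0": 2.0}
print(split_work(1000, rates))  # {'gpu0': 600, 'gpu1': 300, 'cpu0': 100}
```

Re-running `micro_benchmark` at intervals and calling `split_work` again with the fresh rates would correspond to step 4; how often to do that is a tuning choice the slides do not specify.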
! Benchmark results
[Bar chart: speedup (higher is better), axis 0 to 16. Bar values as extracted, device labels not recovered: 13.4, 11.7, 8.0, 7.3, 6.5, 5.3, 4.3, 0.8, 13.6, 13.3, 11.7, 11.0, 6.6, 6.3, 5.9, 4.6, 4.5, 1.2, 1.2, 1.0, 0.9, 0.9, 0.7, 0.3, 0.3]
! Selected performance results: GPUs

Device        Speedup (higher is better)
M2050 (x2)    13.4
GTX-580       11.7
M2070          8.1
M2050          7.3
GTX-285        6.5
HD5870         5.3
! Selected performance results: CPUs

Device                Speedup (higher is better)
E5462 (Fortran) x2    1.0
E5620 x2              0.9
i7-2600K              0.9
i5-2500T              0.7
A8-3850 CPU           0.3
! Selected performance results: Heterogeneous (CPUs+GPUs)

Devices                    Speedup (higher is better)
M2050 (x2) & E5620 (x2)    13.6
GTX-580 & E8500            11.7
HD5870 & i5-2500T           5.9
A8-3850 GPU & CPU           1.2
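
One way to read the heterogeneous numbers is to compare each combined speedup with the sum of the GPU-only and CPU-only speedups reported on the GPU and CPU slides. The sketch below does that arithmetic. It assumes the same hardware and baseline across the slides and treats a plain sum as the ideal, which is a simplification of mine rather than a claim from the talk; the function name `combined_efficiency` is hypothetical.

```python
# Speedups copied from the selected-results slides.
gpu_only = {"M2050 (x2)": 13.4, "HD5870": 5.3}
cpu_only = {"E5620 (x2)": 0.9, "i5-2500T": 0.7}
combined = {
    ("M2050 (x2)", "E5620 (x2)"): 13.6,
    ("HD5870", "i5-2500T"): 5.9,
}


def combined_efficiency(gpu: str, cpu: str) -> float:
    """Measured combined speedup divided by the simple sum of the parts."""
    ideal = gpu_only[gpu] + cpu_only[cpu]  # assumed ideal: perfect additivity
    return combined[(gpu, cpu)] / ideal


for gpu, cpu in combined:
    print(f"{gpu} & {cpu}: {combined_efficiency(gpu, cpu):.3f}")
# M2050 (x2) & E5620 (x2): 0.951
# HD5870 & i5-2500T: 0.983
```

Under these assumptions both pairings land within about 5% of the additive ideal, which is what effective load balancing across CPUs and GPUs would look like.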
! Selected performance results

Device                Speedup (higher is better)
M2050 x2              13.4
GTX-580               11.7
E5462 (Fortran) x2     1.0

Slide annotations: Where we started; Where we finished; Where we could go; £340
! Relative energy and run-time
Measurements are for a constant amount of work. Energy measurements are “at the wall” and include any idle components.
88% reduction in energy
93% reduction in time

System                      Relative Performance Per Watt    Relative Performance
GTX-580                     16.2                             13.2
M2050 (x2)                  11.4                             15.2
M2050 (x2) + E5620 (x2)     10.6                             15.4
HD5870                       9.1                              5.9
HD5870 + i5-2500T            9.3                              6.6
i5-2500T                     2.4                              0.7
E5620 (x2)                   1.0                              1.0
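
For readers who think in "times faster" rather than "percent reduction", the two headline figures on this slide convert as follows. This is a small arithmetic sketch; the helper names `reduction_to_factor` and `factor_to_reduction` are made up for the example, and which system each headline figure refers to is not stated in the transcript.

```python
def reduction_to_factor(reduction_percent: float) -> float:
    """Convert an X% reduction into an improvement factor.

    A 93% reduction leaves 7% of the original, so the factor is 1 / 0.07.
    """
    remaining = 1.0 - reduction_percent / 100.0
    return 1.0 / remaining


def factor_to_reduction(factor: float) -> float:
    """Inverse conversion: an N-fold improvement expressed as a percent reduction."""
    return (1.0 - 1.0 / factor) * 100.0


print(f"93% less time   -> {reduction_to_factor(93):.1f}x faster")       # 14.3x
print(f"88% less energy -> {reduction_to_factor(88):.1f}x less energy")  # 8.3x
```

Because the measurements are for a constant amount of work, a time reduction maps directly onto a speedup and an energy reduction maps onto an energy-efficiency gain.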
! What does this let us do?
! Potentially save lives
NDM-1 is responsible for antibiotic resistance, giving rise to “superbugs”
! GPU-system DEGIMA
• Used 144 GPUs in parallel for drug docking simulations
• ATI Radeon HD5870 & Intel i5-2500T
• ~300 TFLOPS single precision
• Courtesy of Tsuyoshi Hamada and Felipe Cruz, Nagasaki
! NDM-1 experiment
• 1 million candidate drug molecules times 20 conformers each → 20M dockings
• 1.23 x 10^17 atom-atom energies calculated
• 267 days of GPU compute time and 224 days of CPU compute time
• ~55 hours actual wall-time
• A second run with 8 million molecules, 160M conformers on >200 GPUs is running right now!
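
The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; it assumes that "days of compute time" means device time summed over all devices, so dividing by the wall-time gives an average number of devices kept busy. That interpretation and the variable names are mine, not the speaker's.

```python
molecules = 1_000_000
conformers_per_molecule = 20
dockings = molecules * conformers_per_molecule  # 20,000,000 dockings

atom_atom_energies = 1.23e17
gpu_days, cpu_days = 267, 224
wall_hours = 55  # the slide says "~55 hours", so results are approximate

energies_per_docking = atom_atom_energies / dockings             # about 6.15e9
energies_per_second = atom_atom_energies / (wall_hours * 3600)   # about 6.2e11
avg_busy_gpus = gpu_days * 24 / wall_hours                       # about 116.5
avg_busy_cpus = cpu_days * 24 / wall_hours                       # about 97.7

print(f"{dockings:,} dockings")
print(f"{energies_per_docking:.3g} atom-atom energies per docking")
print(f"{energies_per_second:.3g} atom-atom energies per second overall")
print(f"{avg_busy_gpus:.1f} GPUs and {avg_busy_cpus:.1f} CPUs busy on average")
```

The same multiplication reproduces the second run's size: 8 million molecules at 20 conformers each gives the 160M conformers quoted in the last bullet.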
! Conclusions
• OpenCL enables truly heterogeneous computing, harnessing all hardware resources in a system
• GPUs can yield significant savings in energy costs (and equipment costs)
• OpenCL can work just as well for multi-core CPUs as it does for GPUs

It’s possible to screen libraries of millions of molecules against complex targets using highly accurate, computationally-expensive methods in one weekend using equipment costing O(£100K)
! For an introduction to GPUs
The GPU Computing Revolution – a Knowledge Transfer Report from the London Mathematical Society and the KTN for Industrial Mathematics
• https://ktn.innovateuk.org/web/mathsktn/articles/-/blogs/the-gpu-computing-revolution
! References
• S. McIntosh-Smith, T. Wilson, A.A. Ibarra, J. Crisp and R.B. Sessions, "Benchmarking energy efficiency, power costs and carbon emissions on heterogeneous systems", The Computer Journal, September 12th 2011. DOI: 10.1093/comjnl/bxr091
• N. Gibbs, A.R. Clarke & R.B. Sessions, "Ab-initio Protein Folding using Physicochemical Potentials and a Simplified Off-Lattice Model", Proteins 43:186-202,200