© Copyright 2009
Turbomachinery R&D Acceleration using Titan
GTC 2015, San Jose, CA
R. Srinivasan
Aero/Thermodynamic Engineer
Dresser-Rand
Outline
Background
• Software Developments
• Application Strategy
Titan focused development
Applications
• Open Science
• Turbomachinery R&D
Conclusions
BACKGROUND
Background
Partnership with Oak Ridge National Laboratory’s Leadership Computing Facility (OLCF)
Use large-scale DOE hardware at OLCF to shorten the development time of new turbomachinery designs
Commercial software package
The Fine/Turbo suite
• CFD tools for high-fidelity turbomachinery analysis
• Advanced features for fast, high-fidelity simulations
Software development program
Jointly funded
Time allocation through the Director's Discretionary (DD) and ASCR Leadership Computing Challenge (ALCC) programs
Background – Software Developments
Virtual decomposition of the multi-block grid
Distributed-memory parallel computing with MPI
I/O improvements
Parallelization of turbomachinery-related boundary conditions (BC)
Efficient scalability with up to 4000 compute processes
Ongoing effort
Application Strategy
Turbomachinery R&D
Time frame for design and development
Parametric optimization
Ensemble runs for database generation
Design refinement
TITAN FOCUSED DEVELOPMENT
GPU Strategy
Utilization of new GPU hardware on Titan
Software Limitations
Industrial Limitations
Cost of rewriting the solver
Investigated alternatives
Selection of appropriate code module for acceleration
GPU Accelerated Module
Advanced residual smoothing algorithm that brings a level of implicitness to the time integration solver
Permits the use of high CFL numbers
Reduces the number of iterations needed for convergence by roughly 4x
Roughly doubles the time per iteration (~2x) because of the module's additional arithmetic
Attractive candidate for threaded execution, either on CPU or GPU
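The slides do not describe the internals of the module, so the following is only a minimal sketch of the classic implicit residual smoothing idea in one dimension, not the actual algorithm used in the solver. It assumes a constant smoothing coefficient `eps` and a simple end-point treatment, both chosen for illustration, and solves the tridiagonal system `-eps*r[i-1] + (1 + 2*eps)*r[i] - eps*r[i+1] = R[i]` with the Thomas algorithm.

```python
def smooth_residual(residual, eps):
    """Implicitly smooth a 1D residual array.

    Solves (1 - eps * delta^2) r_smooth = residual, where delta^2 is the
    second-difference operator. The end rows use a diagonal of (1 + eps),
    an illustrative choice that keeps the sum of the residuals unchanged.
    """
    n = len(residual)
    if n == 1:
        return list(residual)

    lower = upper = -eps
    diag = [1.0 + 2.0 * eps] * n
    diag[0] = diag[-1] = 1.0 + eps

    # Forward elimination (Thomas algorithm).
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper / diag[0]
    d_prime[0] = residual[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower * c_prime[i - 1]
        c_prime[i] = upper / denom
        d_prime[i] = (residual[i] - lower * d_prime[i - 1]) / denom

    # Back substitution.
    smoothed = [0.0] * n
    smoothed[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        smoothed[i] = d_prime[i] - c_prime[i] * smoothed[i + 1]
    return smoothed


if __name__ == "__main__":
    raw = [0.0, 0.0, 1.0, 0.0, 0.0]  # a single sharp residual spike
    print(smooth_residual(raw, eps=2.0))
```

The spike is spread over neighbouring cells while its total is preserved, which is the mechanism that lets an explicit scheme tolerate a larger time step. Each grid line can be solved independently of the others, which is one reason this kind of module suits threaded execution.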
Hardware and Software Considerations
Titan nodes are equipped with PCIe 2.0
Each node contains 16 CPU cores, all of which transfer data to a single GPU simultaneously
Bandwidth in and out of the GPU is a bottleneck
GPU memory is also a limitation for larger computations
OpenACC programming model
Directive-based approach for usability and ease of maintenance
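A back-of-the-envelope estimate shows why the shared link and the GPU memory matter. This is a rough sketch under stated assumptions: the 8 GB/s figure is the theoretical one-direction peak of a PCIe 2.0 x16 link, the 6 GB GPU memory size is assumed, and the per-rank data sizes are hypothetical. Only the 16 cores per GPU come from the slide.

```python
# Illustrative estimate only: the link rate and GPU memory size below are
# assumptions, not figures taken from the slides.
PCIE2_X16_PEAK_GB_PER_S = 8.0   # assumed theoretical peak, one direction
GPU_MEMORY_GB = 6.0             # assumed accelerator memory per node
RANKS_PER_NODE = 16             # from the slide: 16 CPU cores share one GPU


def transfer_seconds(gb_per_rank, ranks=RANKS_PER_NODE,
                     link_gb_per_s=PCIE2_X16_PEAK_GB_PER_S):
    """Best-case time to move every rank's data across the shared link once."""
    return gb_per_rank * ranks / link_gb_per_s


def fits_on_gpu(gb_per_rank, ranks=RANKS_PER_NODE,
                gpu_memory_gb=GPU_MEMORY_GB):
    """True if the combined working set of all ranks fits in GPU memory."""
    return gb_per_rank * ranks <= gpu_memory_gb


if __name__ == "__main__":
    for gb in (0.1, 0.25, 0.5):  # hypothetical per-rank data sizes in GB
        print(f"{gb:.2f} GB/rank: {transfer_seconds(gb):.2f} s per transfer, "
              f"fits on GPU: {fits_on_gpu(gb)}")
```

Because the transfer cost grows with the number of ranks sharing the link, keeping data resident on the GPU across iterations matters more than the raw kernel speed in this kind of setup.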
Test Case Performance
Typical sample speedup of 1.25x, maximum of 1.7x.
Speedup of 1.25x observed when running a typical sample model
Speedup depends on the internal boundary conditions
Peak speedup of 1.7x observed for a model with conformal BC
I/O Performance
Order-of-magnitude increase in write speed.
APPLICATIONS
Open Science
Supersonic inlet/isolator model
Shock wave/boundary layer interaction
Conformal block boundaries
Experimental comparison
Ongoing work
Supersonic inlet/isolator simulation utilizing GPU developments
Optimization
D-R utilizes optimization techniques for enhancing turbomachinery design
Results for perturbed database designs
Analyze multi-dimensional data
Database refinement and design selection
Tolerance sensitivity
Titan – An enabling technology for accelerating turbomachinery R&D.
Impact on Time-to-Solution
Summary
Development efforts to enable scalable solver execution on OLCF hardware
GPU acceleration of the solver module
The CPUBooster convergence acceleration module was ported to GPU
Restructuring the CPUBooster module and instrumenting it with OpenACC directives yielded a global iteration speedup of 1.2x to 1.7x
With proper code restructuring, OpenACC allows easy GPU acceleration
Summary
Using the CPUBooster module together with the Titan GPUs gives a 2.5x decrease in turnaround time and resource usage for D-R’s optimization work.
Large-scale optimization-related simulations are feasible because of access to Titan.
The successful port of the CPUBooster module has led to a new development program to port additional solver modules to GPUs.
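The slides do not show how the 2.5x figure was derived. One plausible reading, sketched below as an assumption, combines the numbers quoted earlier: about 4x fewer iterations, about 2x more time per iteration, and a typical 1.25x GPU speedup per global iteration. The function name is hypothetical.

```python
def turnaround_gain(iteration_reduction, per_iteration_cost, gpu_iteration_speedup):
    """Overall speedup relative to the baseline solver.

    iteration_reduction:   factor by which the iteration count drops (~4x)
    per_iteration_cost:    factor by which each iteration slows on CPU (~2x)
    gpu_iteration_speedup: speedup of a global iteration on the GPU
    """
    return iteration_reduction / per_iteration_cost * gpu_iteration_speedup


if __name__ == "__main__":
    print(turnaround_gain(4.0, 2.0, 1.0))   # convergence acceleration on CPU only
    print(turnaround_gain(4.0, 2.0, 1.25))  # with the typical GPU speedup
    print(turnaround_gain(4.0, 2.0, 1.7))   # with the peak GPU speedup
```

Under this reading, the convergence acceleration alone gives about 2x on CPU, and the typical GPU speedup raises that to the quoted 2.5x.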
Acknowledgements
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Dr. Jack Wells, Director of Science, ORNL
Suzy Tichenor, Director of Industrial Partnerships, ORNL
David Gutzwiller and Mathieu Gontier, Numeca
John Levesque (Cray)
Jeff Larkin (NVIDIA)
Dresser-Rand Employees
Questions