Copyright © Takayuki Aoki / Global Scientific Information and Computing Center, Tokyo Institute of Technology
A Peta-scale LES (Large-Eddy Simulation) for Turbulent Flows Based on
Lattice Boltzmann Method
GTC (GPU Technology Conference) 2013, San Jose, March 20, 2013
Takayuki Aoki
Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology
TSUBAME 2.0
Compute Node (3 Tesla M2050 GPUs)
Performance: 1.7 TFLOPS
Memory: 58.0 GB (CPU) + 9.7 GB (GPU)
Rack (30 nodes)
Performance: 51.0 TFLOPS
Memory: 2.03 TB
System (58 racks)
1442 nodes: 2952 CPU sockets, 4264 GPUs
Performance: 224.7 TFLOPS (CPU, with Turbo Boost) + 2196 TFLOPS (GPU)
Total: 2420 TFLOPS
TSUBAME Supercomputer
Graph 500: No. 3 (2011)
2013 Q3 or Q4: All the GPUs will be replaced by new accelerators.
TSUBAME 2.5 will have 15-17 PFLOPS of single-precision performance.
Development of New Materials
Microstructure / Mechanical Structure
Improvement of fuel efficiency by reducing the weight of transportation and mechanical structures
Developing lightweight, high-strength materials by controlling the microstructure
Low-carbon society
Weather News
Japan Meteorological Agency (http://www.jma.go.jp/jma/index.html)
Full GPU Implementation: ASUCA
Full GPU Approach
GPU: Dynamics, Physics
CPU: Initial condition, output
J. Ishida, C. Muroi, K. Kawano, Y. Kitamura, Development of a new nonhydrostatic model “ASUCA” at JMA, CAS/JSC WGNE Research Activities in Atmospheric and Oceanic Modelling.
ASUCA Typhoon Simulation
500 m horizontal resolution, 4792×4696×48 mesh, using 437 GPUs
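Computational Area in Central Tokyo
• A 10 km × 10 km area including Shinjuku-ku, Shibuya-ku, Meguro-ku, Chiyoda-ku, Chuo-ku, Minato-ku, and Koto-ku
• Building data: Pasco Co. Ltd. TDM 3D (simplified version)
Map data ©2012 Google, ZENRIN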
Air Flow in a 10km x 10km Area of Tokyo
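Lattice Boltzmann Method

$$ f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\ t + \Delta t) = f_i(\mathbf{x}, t) - \frac{1}{\tau} \left[ f_i(\mathbf{x}, t) - f_i^{eq}(\mathbf{x}, t) \right] $$

$$ f_i^{eq} = w_i \rho \left[ 1 + \frac{3\, \mathbf{e}_i \cdot \mathbf{u}}{c^2} + \frac{9\, (\mathbf{e}_i \cdot \mathbf{u})^2}{2 c^4} - \frac{3\, \mathbf{u} \cdot \mathbf{u}}{2 c^2} \right] $$

f_i is the value of the distribution function in the direction of the i-th discrete velocity; e_i is the i-th discrete velocity; w_i is the weighting factor; c is the particle velocity; u is the macroscopic velocity; ρ is the density; τ is the relaxation time.

[Figure: discrete velocity directions on the lattice, numbered 1 to 18]

Collision step and streaming step: a strongly memory-bound problem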
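The collision and streaming steps can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the CUDA code from the talk: it assumes a D3Q19 velocity set (a rest velocity plus 18 moving directions) with the standard weights, lattice units (c = 1, Δt = 1), and periodic boundaries. The function names `equilibrium`, `macroscopic`, `collide`, and `stream` are hypothetical.

```python
import numpy as np

# Assumed D3Q19 velocity set: rest vector, 6 axis neighbours, 12 diagonals.
E = np.array(
    [(0, 0, 0)]
    + [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    + [(sx, sy, 0) for sx in (1, -1) for sy in (1, -1)]
    + [(sx, 0, sz) for sx in (1, -1) for sz in (1, -1)]
    + [(0, sy, sz) for sy in (1, -1) for sz in (1, -1)]
)
# Standard D3Q19 weights (assumed): 1/3 rest, 1/18 axis, 1/36 diagonal.
W = np.array([1 / 3] + [1 / 18] * 6 + [1 / 36] * 12)


def equilibrium(rho, u, c=1.0):
    """f_i^eq = w_i rho [1 + 3 e.u/c^2 + 9 (e.u)^2/(2 c^4) - 3 u.u/(2 c^2)]."""
    eu = np.einsum("id,d...->i...", E, u)  # e_i . u, shape (19, nx, ny, nz)
    uu = np.sum(u * u, axis=0)             # u . u, shape (nx, ny, nz)
    w = W.reshape(-1, 1, 1, 1)
    return w * rho * (1 + 3 * eu / c**2 + 4.5 * eu**2 / c**4 - 1.5 * uu / c**2)


def macroscopic(f):
    """Density and velocity as the zeroth and first moments of f."""
    rho = f.sum(axis=0)
    u = np.einsum("id,i...->d...", E, f) / rho
    return rho, u


def collide(f, tau):
    """Collision step: relax f toward equilibrium with relaxation time tau."""
    rho, u = macroscopic(f)
    return f - (f - equilibrium(rho, u)) / tau


def stream(f):
    """Streaming step: shift each f_i by its velocity e_i (periodic domain)."""
    out = np.empty_like(f)
    for i, e in enumerate(E):
        out[i] = np.roll(f[i], shift=tuple(int(s) for s in e), axis=(0, 1, 2))
    return out
```

One time step is then `f = stream(collide(f, tau))`. Each update reads and writes all 19 values per lattice site while doing comparatively little arithmetic, which is why the method is memory bound.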
LES (Large-Eddy Simulation)
Molecular viscosity and eddy viscosity
[Figure: energy spectrum, with grid-scale (GS) and subgrid-scale (SGS) ranges]
Relaxation time for LES model
LES modeling

Smagorinsky model
○ Simple
× Inaccurate for flows with wall boundaries
× Empirical tuning of the constant model coefficient

Dynamic Smagorinsky model
○ Applicable to wall boundaries
× Complicated calculation
× Averaging over a wide area
→ not available for complex-shaped bodies
→ not suitable for large-scale problems

Coherent-Structure Smagorinsky model*
○ Applicable to wall boundaries
○ Model coefficient is locally determined
→ model coefficient determined by the second invariant of the velocity gradient tensor

*H. Kobayashi, Phys. Fluids 17 (2005).
LES modeling on LBM
Turbulence model: molecular viscosity + eddy viscosity
Smagorinsky model subgrid closure, C_S = 0.22
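As a rough sketch of how the eddy viscosity can enter the LBM relaxation time, the code below uses the standard Smagorinsky closure ν_t = (C_S Δ)² |S| with |S| = sqrt(2 S_ij S_ij), and the usual BGK relation τ = 1/2 + 3 ν / (c² Δt). These formulas are textbook forms assumed here, since the slide equations did not survive extraction. Only C_S = 0.22 comes from the slide. The function names and the choice to take a velocity gradient tensor as input are hypothetical.

```python
import numpy as np


def strain_rate_magnitude(grad_u):
    """|S| = sqrt(2 S_ij S_ij), with grad_u[i, j] = du_i/dx_j (3x3 array)."""
    s = 0.5 * (grad_u + grad_u.T)  # symmetric part: strain-rate tensor
    return np.sqrt(2.0 * np.sum(s * s))


def les_relaxation_time(grad_u, nu0, cs=0.22, delta=1.0, c=1.0, dt=1.0):
    """Relaxation time from molecular viscosity nu0 plus Smagorinsky eddy viscosity.

    Assumed relations (not recovered from the slides):
      nu_t = (cs * delta)^2 * |S|
      tau  = 1/2 + 3 * (nu0 + nu_t) / (c^2 * dt)
    """
    nu_t = (cs * delta) ** 2 * strain_rate_magnitude(grad_u)
    return 0.5 + 3.0 * (nu0 + nu_t) / (c**2 * dt)
```

For example, a simple shear du_x/dy = 0.01 with ν0 = 0.001 in lattice units gives |S| = 0.01 and a slightly larger τ than the laminar value 0.503.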
Coherent-structure SGS model

Dynamic Smagorinsky model (DSM)
DSM requires an averaging operation over a wide area to determine the model parameter.
<> : averaging operation
○ Automatically determines the model coefficient
× Turbulent flow around a complex object
× Computational efficiency is poor

Coherent-structure Smagorinsky model
Second invariant of the velocity gradient tensor (Q) and energy dissipation (ε)
The model parameter is locally determined by the second invariant of the velocity gradient tensor.
◎ Turbulent flow around a complex object
◎ Large-scale parallel computation
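The local character of the coherent-structure model can be illustrated with a short sketch. The form below, C = C1 |F_CS|^(3/2) with F_CS = Q / E and C1 = 1/20, is my recollection of the non-rotating variant in the cited Kobayashi (2005) paper and is an assumption, not something recoverable from the slides. Here Q is the second invariant of the velocity gradient tensor and E is the magnitude of the velocity gradient tensor, which appears to be the quantity the slide labels as energy dissipation. The function name is hypothetical.

```python
import numpy as np


def csm_coefficient(grad_u, c1=1.0 / 20.0):
    """Locally determined model coefficient, assuming C = c1 * |Q / E|^(3/2).

    grad_u[i, j] = du_i/dx_j is a 3x3 array at a single lattice point.
    """
    s = 0.5 * (grad_u + grad_u.T)  # strain-rate tensor S_ij
    w = 0.5 * (grad_u - grad_u.T)  # vorticity tensor W_ij
    ss = np.sum(s * s)
    ww = np.sum(w * w)
    q = 0.5 * (ww - ss)            # second invariant Q
    e = 0.5 * (ww + ss)            # magnitude of the velocity gradient tensor
    if e == 0.0:
        return 0.0                 # no velocity gradient: no eddy viscosity
    f_cs = q / e                   # coherent-structure function, in [-1, 1]
    return c1 * abs(f_cs) ** 1.5
```

Only the velocity gradient at the point itself is needed, so no wide-area averaging (as in DSM) and no extra inter-GPU communication are required. In a pure shear flow Q = 0, so the coefficient vanishes there.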
Computational Area
Major part of Tokyo, including Shinjuku-ku, Chiyoda-ku, Minato-ku, Meguro-ku, and Chuo-ku
10 km × 10 km
Building data: Pasco Co. Ltd. TDM 3D
Map ©2012 Google, ZENRIN
[Map labels: Shinjuku, Tokyo, Shinagawa, Shibuya]
Area Around Metropolitan Government Building
Flow profile at a height of 25 m above the ground
[Figure: 960 m × 640 m region, with wind direction indicated]
Map data ©2012 Google, ZENRIN
Performance of the GPU code
Performance estimation using the Improved Roofline Model

CUDA programming tuning:
* Using the SFU (Special Function Unit) and single-precision computation
* Kernel fusion of the collision step and streaming step
* Loop unrolling to save register usage
+ Reduction of the address calculation by use of a 32-bit compile option

64-bit compile: 183 GFlops (efficiency 88%)
32-bit compile: 198 GFlops (efficiency 92%)
310 MLUPS (Mega Lattice site Updates per second)
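To make the efficiency figures concrete, here is a small sketch. It uses the classic roofline bound, min(peak, bandwidth × arithmetic intensity), which is a stand-in for the improved model named on the slide (whose exact form is not given here). The Tesla M2050 figures used in the usage note (about 1030 GFlops single-precision peak and 148 GB/s memory bandwidth) are published hardware specifications, not numbers from the slides. The function names are hypothetical, and reading "efficiency" as measured performance divided by the model estimate is an assumption.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Classic roofline bound: limited by compute peak or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def implied_estimate(measured_gflops, efficiency):
    """Model estimate implied by a measured rate and its reported efficiency."""
    return measured_gflops / efficiency
```

With the assumed M2050 numbers, a kernel at 1 flop per byte is bounded by `roofline_gflops(1030, 148, 1.0)`, that is, by memory bandwidth rather than arithmetic. Under the stated reading of efficiency, `implied_estimate(198, 0.92)` gives an estimate of roughly 215 GFlops for the 32-bit build.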
Performance (Strong Scalability)
• For a fixed problem size, the performance is shown as the number of GPUs increases. Introducing the overlapping technique improves the performance by up to 30%.
• The elapsed time is shortened by increasing the number of GPUs.
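The benefit of overlapping communication with computation can be illustrated with a simple timing model. This is a sketch under assumptions, not the scheme from the talk: it assumes each GPU first updates its boundary cells, then exchanges them with neighbours while updating the interior cells. The function names and all timings are hypothetical.

```python
def step_time(t_inner, t_boundary, t_comm, overlap):
    """Time per step for one GPU, with or without overlapping communication."""
    if overlap:
        # Boundary exchange runs concurrently with the interior update.
        return t_boundary + max(t_inner, t_comm)
    # Everything runs back to back.
    return t_boundary + t_inner + t_comm


def performance_gain(t_inner, t_boundary, t_comm):
    """Relative performance improvement from overlapping (0.3 means 30%)."""
    base = step_time(t_inner, t_boundary, t_comm, overlap=False)
    over = step_time(t_inner, t_boundary, t_comm, overlap=True)
    return base / over - 1.0
```

With hypothetical timings `t_inner=7.0`, `t_boundary=1.0`, and `t_comm=2.4` (arbitrary units), communication is fully hidden and the gain is about 30%. In this model the gain shrinks once `t_comm` exceeds `t_inner`, which tends to happen in strong scaling as the per-GPU subdomain gets smaller.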
Performance (Weak Scalability)
600 TFLOPS on 4000 GPUs
15% of the peak performance
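The 15% figure can be checked with a short calculation. The per-GPU peak implied by the TSUBAME 2.0 numbers earlier in the talk (2196 TFLOPS over 4264 GPUs, about 0.515 TFLOPS) matches the double-precision peak of the Tesla M2050. The sketch below assumes the single-precision peak is twice that, which is the published ratio for this GPU generation but is not stated on the slides. The function name is hypothetical.

```python
def fraction_of_peak(sustained_tflops, n_gpus, peak_per_gpu_tflops):
    """Sustained performance as a fraction of the aggregate peak."""
    return sustained_tflops / (n_gpus * peak_per_gpu_tflops)


# Per-GPU double-precision peak implied by the system figures in the talk.
dp_peak_per_gpu = 2196.0 / 4264.0
# Assumption: single-precision peak is twice the double-precision peak.
sp_peak_per_gpu = 2.0 * dp_peak_per_gpu

frac = fraction_of_peak(600.0, 4000, sp_peak_per_gpu)
print(f"{frac:.1%} of single-precision peak")
```

Under these assumptions the result rounds to 15%, which suggests the quoted fraction is measured against the single-precision peak.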
Turbulent Flow behind a Football
Re = 100,000
Mesh: 2000×1000×1000
SUMMARY
■ A Lattice Boltzmann LES turbulent-flow simulation has been successfully conducted with 1 m resolution for a 10 km × 10 km area by using the whole TSUBAME 2.0 resource.
■ The Coherent-Structure Smagorinsky model works well in association with LBM.
■ 15% of the peak performance has been achieved on TSUBAME 2.0.