
A Peta-scale LES (Large-Eddy Simulation) for Turbulent Flows Based on Lattice Boltzmann Method

GTC (GPU Technology Conference) 2013, San Jose, March 20, 2013

Takayuki Aoki
Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology

Copyright © Takayuki Aoki / Global Scientific Information and Computing Center, Tokyo Institute of Technology




TSUBAME 2.0

• Compute node (3 Tesla M2050 GPUs): performance 1.7 TFLOPS; memory 58.0 GB (CPU) + 9.7 GB (GPU)
• Rack (30 nodes): performance 51.0 TFLOPS; memory 2.03 TB
• System (58 racks): 1442 nodes, 2952 CPU sockets, 4264 GPUs; performance 224.7 TFLOPS (CPU, with Turbo Boost) + 2196 TFLOPS (GPU), total 2420 TFLOPS
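The system-level figures on this slide are consistent with each other, as a few lines of arithmetic show. Only numbers from the slide are used; reading the resulting ~515 GFLOPS per GPU as the double-precision peak of one Tesla M2050 is our interpretation, not a statement on the slide.

```python
# Cross-check of the TSUBAME 2.0 peak figures quoted on the slide.
CPU_TFLOPS = 224.7     # system CPU peak (with Turbo Boost)
GPU_TFLOPS = 2196.0    # system GPU peak
NUM_GPUS = 4264
NODE_TFLOPS = 1.7      # one compute node
NODES_PER_RACK = 30

total_tflops = CPU_TFLOPS + GPU_TFLOPS            # should be ~2420 TFLOPS
per_gpu_gflops = GPU_TFLOPS * 1000.0 / NUM_GPUS   # peak per GPU in GFLOPS
rack_tflops = NODE_TFLOPS * NODES_PER_RACK        # should be ~51.0 TFLOPS

print(f"total   {total_tflops:.1f} TFLOPS")
print(f"per GPU {per_gpu_gflops:.1f} GFLOPS")
print(f"rack    {rack_tflops:.1f} TFLOPS")
```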


TSUBAME Supercomputer

• Graph 500: No. 3 (2011)
• 2013 Q3 or Q4: all the GPUs will be replaced by new accelerators.
• TSUBAME 2.5 will have 15-17 PFLOPS of single-precision performance.

Drop on dry floor


Industrial Application: Steering Oil


Development of New Materials (microstructure and mechanical structure)

Improvement of fuel efficiency by reducing the weight of transportation and mechanical structures

Developing lightweight, high-strength materials by controlling the microstructure

Low-carbon society


Weather News

Japan Meteorological Agency (気象庁) (http://www.jma.go.jp/jma/index.html)


Full GPU Implementation: ASUCA


[Figure: Full GPU approach. GPU: dynamics and physics; CPU: initial condition and output]

J. Ishida, C. Muroi, K. Kawano, Y. Kitamura, Development of a new nonhydrostatic model “ASUCA” at JMA, CAS/JSC WGNE Research Activities in Atmospheric and Oceanic Modelling.


ASUCA Typhoon Simulation: 500 m horizontal resolution, 4792×4696×48 mesh, using 437 GPUs



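Computational Area in Central Tokyo

• A 10 km × 10 km area including Shinjuku, Shibuya, Meguro, Chiyoda, Chuo, Minato, and Koto wards
• Building data: Pasco Co., Ltd., TDM 3D (simplified version)

Map data ©2012 Google, ZENRIN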

Air Flow in a 10 km × 10 km Area of Tokyo


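Lattice Boltzmann Method

$$ f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\ t + \Delta t) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{eq}(\mathbf{x}, t) \right] $$

$$ f_i^{eq} = w_i \rho \left[ 1 + \frac{3\,(\mathbf{e}_i \cdot \mathbf{u})}{c^2} + \frac{9\,(\mathbf{e}_i \cdot \mathbf{u})^2}{2c^4} - \frac{3\,\mathbf{u}^2}{2c^2} \right] $$

f_i is the distribution function in the direction of the i-th discrete velocity; e_i is the discrete velocity set; w_i is the weighting factor; c is the particle velocity; u is the macroscopic velocity; ρ is the density; τ is the relaxation time.

[Figure: discrete velocity directions numbered 1-18 around the rest velocity (D3Q19 stencil)]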

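Collision step: relaxation of f_i toward f_i^eq (right-hand side). Streaming step: moving the post-collision values to x + e_iΔt. Every site reads and writes all of its distributions each time step with little arithmetic in between, so the LBM is a strongly memory-bound problem.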
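To make the two steps concrete, here is a minimal single-node sketch of one BGK collide-and-stream update on the D3Q19 lattice in lattice units (Δx = Δt = c = 1). It is plain Python written for readability, assumes periodic boundaries and a single relaxation time, and is not the CUDA code described in the talk; the names `equilibrium`, `moments` and `step` are hypothetical.

```python
import itertools

# D3Q19 velocities: rest, 6 face neighbours (|e|^2 = 1), 12 edge neighbours (|e|^2 = 2).
E = [(0, 0, 0)]
E += [v for v in itertools.product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 1]
E += [v for v in itertools.product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 2]
W = [1 / 3] + [1 / 18] * 6 + [1 / 36] * 12


def equilibrium(rho, u):
    """f_i^eq = w_i rho [1 + 3 e.u + 9/2 (e.u)^2 - 3/2 u.u] with c = 1."""
    uu = sum(x * x for x in u)
    feq = []
    for w, e in zip(W, E):
        eu = sum(a * b for a, b in zip(e, u))
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * uu))
    return feq


def moments(f):
    """Density and velocity of one site from its 19 distributions."""
    rho = sum(f)
    u = tuple(sum(fi * e[d] for fi, e in zip(f, E)) / rho for d in range(3))
    return rho, u


def step(grid, n, tau):
    """One collide-and-stream update on an n^3 periodic grid.

    grid maps (x, y, z) -> list of 19 distributions. Results go to a second
    dict (two buffers), so each site is read once and each value written once.
    """
    new = {site: [0.0] * 19 for site in grid}
    for (x, y, z), f in grid.items():
        feq = equilibrium(*moments(f))
        for i, (ex, ey, ez) in enumerate(E):
            post = f[i] - (f[i] - feq[i]) / tau                # collision (BGK)
            dest = ((x + ex) % n, (y + ey) % n, (z + ez) % n)
            new[dest][i] = post                                # streaming
    return new


if __name__ == "__main__":
    n = 4
    grid = {s: equilibrium(1.0 + 0.01 * s[0], (0.02, 0.0, 0.0))
            for s in itertools.product(range(n), repeat=3)}
    mass0 = sum(map(sum, grid.values()))
    for _ in range(10):
        grid = step(grid, n, tau=0.8)
    print("mass drift:", sum(map(sum, grid.values())) - mass0)
```

As in a fused collision-streaming kernel, the post-collision values are written straight to the neighbouring sites in the destination buffer, so each time step touches every distribution exactly twice (one read, one write).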


LES (Large-Eddy Simulation)

Molecular viscosity and eddy viscosity

[Figure: energy spectrum divided into grid-scale (GS) and subgrid-scale (SGS) parts]

Relaxation time for LES model
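In the LBM the viscosity enters only through the relaxation time, so an eddy-viscosity LES model changes τ locally. The sketch assumes the standard BGK relation ν = c_s²(τ - 1/2)Δt with c_s² = c²/3 and c = Δx/Δt, and simply adds the eddy viscosity ν_t to the molecular viscosity ν0; the function name `relaxation_time` is hypothetical.

```python
def relaxation_time(nu0, nu_t, dx, dt):
    """BGK relaxation time for the total viscosity nu0 + nu_t.

    From nu = c_s^2 (tau - 1/2) dt with c_s^2 = c^2 / 3 and c = dx / dt:
        tau = 3 (nu0 + nu_t) dt / dx^2 + 1/2
    """
    return 3.0 * (nu0 + nu_t) * dt / (dx * dx) + 0.5


if __name__ == "__main__":
    # Lattice units (dx = dt = 1): nu = 1/6 gives tau = 1; adding eddy viscosity raises tau.
    print(relaxation_time(1.0 / 6.0, 0.0, 1.0, 1.0))
    print(relaxation_time(1.0 / 6.0, 1.0 / 6.0, 1.0, 1.0))
```

Because ν_t ≥ 0, the eddy viscosity can only move τ further above 1/2.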


LES modeling

Smagorinsky model
○ Simple
× Inaccurate for flows with wall boundaries
× Empirical tuning of the constant model coefficient

Dynamic Smagorinsky model
○ Applicable to wall boundaries
× Complicated calculation
× Averaging over a wide area → not available for complex-shaped bodies → not suitable for large-scale problems

Coherent-Structure Smagorinsky model*
→ Model coefficient determined by the second invariant of the velocity gradient tensor
○ Applicable to wall boundaries
○ Model coefficient is determined locally

*H. Kobayashi, Phys. Fluids 17 (2005).


LES modeling on LBM

Turbulence model: molecular viscosity + eddy viscosity

Smagorinsky model subgrid closure: C_S = 0.22
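The standard Smagorinsky closure computes the eddy viscosity from the resolved strain rate, ν_t = (C_S Δ)² |S| with |S| = sqrt(2 S_ij S_ij). The sketch below evaluates it at one point from a given velocity-gradient tensor, using C_S = 0.22 from the slide; how the gradient is obtained (finite differences or non-equilibrium moments) is left open, and the function names are hypothetical.

```python
import math


def strain_rate(grad):
    """S_ij = (du_i/dx_j + du_j/dx_i) / 2 for grad[i][j] = du_i/dx_j."""
    return [[0.5 * (grad[i][j] + grad[j][i]) for j in range(3)] for i in range(3)]


def smagorinsky_nu_t(grad, delta, cs=0.22):
    """Smagorinsky eddy viscosity nu_t = (cs * delta)^2 * |S|."""
    s = strain_rate(grad)
    s_mag = math.sqrt(2.0 * sum(s[i][j] ** 2 for i in range(3) for j in range(3)))
    return (cs * delta) ** 2 * s_mag


if __name__ == "__main__":
    shear = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # du_x/dy = 1
    print(smagorinsky_nu_t(shear, delta=1.0))
```

The resulting ν_t is added to the molecular viscosity before the local relaxation time is computed. Note that the coefficient is a single global constant, which is what the comparison slide means by empirical tuning.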


Coherent-structure SGS model

Dynamic Smagorinsky model (DSM)
• DSM requires averaging over a wide area to determine the model parameter (<> denotes the averaging operation).
○ Determines the model coefficient automatically
× Turbulent flow around a complex object
× Poor computational efficiency

Coherent-structure Smagorinsky model
• The model parameter is determined locally from the second invariant of the velocity gradient tensor (Q) and the energy dissipation (ε).
◎ Turbulent flow around a complex object
◎ Large-scale parallel computation
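The coherent-structure model keeps the Smagorinsky form ν_t = C Δ² |S| but computes C locally from F_CS = Q/E, where Q = (W_ij W_ij - S_ij S_ij)/2 is the second invariant of the velocity gradient tensor and E = (W_ij W_ij + S_ij S_ij)/2 is its magnitude (we assume E corresponds to the quantity written as ε on the slide). The sketch uses C = C1 |F_CS|^(3/2), one form associated with Kobayashi (2005); C1 = 1/20 is an assumed value, and the exact form and constant should be checked against that paper. All function names are hypothetical.

```python
import math


def split_gradient(grad):
    """Symmetric part S and antisymmetric part W of grad[i][j] = du_i/dx_j."""
    s = [[0.5 * (grad[i][j] + grad[j][i]) for j in range(3)] for i in range(3)]
    w = [[0.5 * (grad[i][j] - grad[j][i]) for j in range(3)] for i in range(3)]
    return s, w


def contract(a):
    """Double contraction a_ij a_ij."""
    return sum(a[i][j] ** 2 for i in range(3) for j in range(3))


def coherent_structure_function(grad):
    """F_CS = Q / E, bounded by -1 <= F_CS <= 1 (0 where the gradient vanishes)."""
    s, w = split_gradient(grad)
    q = 0.5 * (contract(w) - contract(s))   # second invariant Q
    e = 0.5 * (contract(w) + contract(s))   # gradient magnitude E
    return 0.0 if e == 0.0 else q / e


def csm_eddy_viscosity(grad, delta, c1=1.0 / 20.0):
    """nu_t = C * delta^2 * |S| with C = c1 * |F_CS|^(3/2); c1 is an assumed constant."""
    s, _ = split_gradient(grad)
    c = c1 * abs(coherent_structure_function(grad)) ** 1.5   # local model coefficient
    return c * delta ** 2 * math.sqrt(2.0 * contract(s))


if __name__ == "__main__":
    strain = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]
    print(coherent_structure_function(strain), csm_eddy_viscosity(strain, 1.0))
```

Everything is computed from the gradient at a single point, with no averaging over neighbouring cells, which is the property the slide credits for suitability to complex geometries and large-scale parallel runs.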


Computational Area
Major part of Tokyo, including Shinjuku-ku, Chiyoda-ku, Minato-ku, Meguro-ku, Chuo-ku,

10 km × 10 km

Building data: Pasco Co., Ltd., TDM 3D

Map ©2012 Google, ZENRIN

[Map labels: Shinjuku, Tokyo, Shinagawa, Shibuya]


Area Around Metropolitan Government Building


Flow profile at a height of 25 m above the ground

[Figure: 640 m × 960 m area with the wind direction indicated]

Map data ©2012 Google, ZENRIN


Performance of the GPU code

Performance estimation using an improved roofline model

CUDA programming tuning:
• Using the SFU (Special Function Unit) and single-precision computation
• Kernel fusion of the collision and streaming steps
• Loop unrolling to save register usage
• Reduction of address calculation by using a 32-bit compile option

32-bit compile: 198 GFLOPS (efficiency 92%)
310 MLUPS (mega lattice-site updates per second)
64-bit compile: 183 GFLOPS (efficiency 88%)
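The slide does not spell out the improved roofline model, so the sketch below uses the classic roofline (attainable = min(peak, bandwidth × arithmetic intensity)) and labels it as such. Assuming each quoted efficiency is the achieved rate divided by the model's estimate, the figures imply estimates of about 215 GFLOPS (32-bit) and 208 GFLOPS (64-bit); assuming further that 310 MLUPS belongs to the 32-bit run, each lattice-site update costs roughly 640 floating-point operations. The 1030 GFLOPS peak and 148 GB/s bandwidth in the last example are assumed Tesla M2050 values used only for illustration.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Classic roofline: attainable = min(compute peak, bandwidth x arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def implied_estimate(achieved_gflops, efficiency):
    """Model estimate implied by an achieved rate and its quoted efficiency (assumed definition)."""
    return achieved_gflops / efficiency


if __name__ == "__main__":
    # Figures from the slide.
    for build, gflops, eff in (("32-bit", 198.0, 0.92), ("64-bit", 183.0, 0.88)):
        print(build, f"model estimate ~{implied_estimate(gflops, eff):.0f} GFLOPS")
    # Assumption: 310 MLUPS was measured for the 32-bit build.
    print(f"~{198e9 / 310e6:.0f} flops per lattice-site update")
    # Hypothetical memory-bound kernel at 1 flop/byte on an assumed 148 GB/s GPU.
    print(roofline_gflops(1030.0, 148.0, 1.0), "GFLOPS attainable")
```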


Performance (Strong Scalability)

• For a fixed problem size, performance is shown as the number of GPUs increases. Overlapping communication with computation improves performance by up to 30%.

• The elapsed time decreases as the number of GPUs increases.
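A simple timing model illustrates why overlapping helps. The model is an assumption, not the author's: each GPU first updates its boundary layers, then exchanges halos with its neighbours while updating the interior. Without overlap the three phases add up; with overlap the exchange hides behind the interior update. The function name and timings are hypothetical.

```python
def step_time(t_boundary, t_inner, t_comm, overlap):
    """Per-step time of a domain-decomposed stencil update (simple model).

    Without overlap: boundary update, interior update and halo exchange run in sequence.
    With overlap: the halo exchange runs concurrently with the interior update.
    """
    if overlap:
        return t_boundary + max(t_inner, t_comm)
    return t_boundary + t_inner + t_comm


if __name__ == "__main__":
    # Hypothetical timings in milliseconds, not measurements from the talk.
    seq = step_time(1.0, 5.0, 2.0, overlap=False)
    ovl = step_time(1.0, 5.0, 2.0, overlap=True)
    print(f"time saved by overlap: {1 - ovl / seq:.0%}")
```

Under this model the saving equals min(t_inner, t_comm) divided by the sequential step time, so it is largest when interior work and communication take similar time; in strong scaling the interior shrinks faster than the halo as GPUs are added, which shifts that balance.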


Performance (Weak Scalability)


600 TFLOPS on 4000 GPUs

15% of the peak performance
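The 15% figure can be reproduced with simple arithmetic if the peak is taken as the GPUs' single-precision peak, with the host CPUs ignored. The per-GPU peaks below are assumed Tesla M2050 specifications, not values from the slide: against single precision the sustained rate is about 14.6% of peak, against double precision about 29%. The single-precision reading fits the single-precision computation listed among the CUDA tuning steps, but it is our interpretation.

```python
# Figures from the slide.
SUSTAINED_TFLOPS = 600.0
NUM_GPUS = 4000

# Assumed Tesla M2050 peak rates in GFLOPS (not stated on the slide).
SP_PEAK_GFLOPS = 1030.0
DP_PEAK_GFLOPS = 515.0

per_gpu_gflops = SUSTAINED_TFLOPS * 1000.0 / NUM_GPUS   # sustained rate per GPU
frac_sp = per_gpu_gflops / SP_PEAK_GFLOPS               # fraction of single-precision peak
frac_dp = per_gpu_gflops / DP_PEAK_GFLOPS               # fraction of double-precision peak

print(f"{per_gpu_gflops:.0f} GFLOPS per GPU")
print(f"{frac_sp:.1%} of single-precision peak")
print(f"{frac_dp:.1%} of double-precision peak")
```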


Turbulent Flow behind a Football

Re = 100,000

Mesh: 2000×1000×1000

DrivAer model (BMW-Audi)

Mesh: 3,000×1,500×1,500, Re = 1,000,000
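For a sense of scale, the memory needed just for the distribution functions can be estimated from the mesh sizes. This sketch assumes D3Q19, single precision and two buffers of distributions (source and destination); LES fields, geometry flags and halo buffers are not counted, so real usage is higher. Under these assumptions the football mesh needs about 304 GB and the car mesh about 1026 GB, i.e. at least roughly 100 and 340 GPUs respectively at an assumed ~3 GB per GPU. The function name is hypothetical.

```python
def lbm_memory_gb(nx, ny, nz, q=19, bytes_per_value=4, copies=2):
    """Memory (GB) for the LBM distribution functions alone.

    Assumes q values per cell (D3Q19), single precision (4 bytes) and two
    copies of the distributions (source and destination buffers).
    """
    return nx * ny * nz * q * bytes_per_value * copies / 1e9


if __name__ == "__main__":
    cases = {"football": (2000, 1000, 1000), "car": (3000, 1500, 1500)}
    for name, (nx, ny, nz) in cases.items():
        print(f"{name}: {nx * ny * nz:.3g} cells, ~{lbm_memory_gb(nx, ny, nz):.0f} GB")
```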


SUMMARY

■ A lattice Boltzmann LES of turbulent flow has been successfully conducted at 1-m resolution over a 10 km × 10 km area using the whole TSUBAME 2.0 system.

■ The coherent-structure Smagorinsky model works well in combination with LBM.

■ A sustained performance of 15% of peak (600 TFLOPS on 4000 GPUs) has been achieved on TSUBAME 2.0.


Thank you for your kind attention
