
The GPU Supercomputer of CQSE

Workshop on GPU Supercomputing January 16, 2009

Ting-Wai Chiu (趙挺偉)

Department of Physics, and

Center for Quantum Science and Engineering

National Taiwan University

Graphics Processing Unit (GPU) Supercomputing

A graphics card (e.g., Nvidia GTX280) can deliver > 100 Gflops (sustained) at a price below NT$12,000. It gives a speedup of 10x to 100x compared with a single CPU.

[Figure: two GTX280 cards in one motherboard]

• This opens up a great opportunity for the many scientific and engineering problems (in CQSE) that require an enormous amount of number-crunching power.

• Recall that in the past 50 years, each 10x jump in computing power motivated new ways of computing, which in turn led to many scientific breakthroughs.

T.W. Chiu, GPU Workshop, Jan 16, '09

Basic criteria for acquiring computing hardware

• What is its half-life? (How long does it take before its worth drops to half of its value at the time of purchase?)

• What scientific/educational impact can it produce within its half-life? (Note that, in science, only being first counts.)

• Are codes ready for production runs when the hardware is installed? (Never buy any hardware before your code for production runs is ready!)

• Is the price/performance ratio optimal? (Taking into account the power consumption and the air-conditioning.)

GPU Supercomputer of CQSE

• It consists of 16 units of Nvidia Tesla S1070 (64 GPUs in total, 64 x 4 GB), with 16 servers (32 quad-core CPUs in total, 16 x 32 GB).

• Peak performance is 64 Tflops (50 times higher than that of any PC cluster with the same price tag).

• With our GPU supercomputer, we can tackle many large scale computations without using prohibitively expensive supercomputers like IBM BlueGene.

• We have developed highly efficient CUDA (Compute Unified Device Architecture) codes for our computationally intense problems (quantum chromodynamics, quantum spin systems, and astrophysics).
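As a quick cross-check of the totals quoted on this slide, the arithmetic can be written out. This is only a sketch: the figure of about 1 Tflops peak per GPU is an assumption inferred from 64 Tflops over 64 GPUs, and four GPUs per S1070 unit follows from 64 GPUs in 16 units. The function name `cluster_totals` is illustrative.

```python
# Totals for the cluster described on this slide.
S1070_UNITS = 16
GPUS_PER_UNIT = 4            # 64 GPUs / 16 units
GPU_MEM_GB = 4
SERVERS = 16
SERVER_MEM_GB = 32
PEAK_TFLOPS_PER_GPU = 1.0    # assumption: implied by 64 Tflops / 64 GPUs


def cluster_totals():
    """Return GPU count, memory totals and peak performance of the cluster."""
    gpus = S1070_UNITS * GPUS_PER_UNIT
    return {
        "gpus": gpus,
        "gpu_mem_gb": gpus * GPU_MEM_GB,
        "host_mem_gb": SERVERS * SERVER_MEM_GB,
        "peak_tflops": gpus * PEAK_TFLOPS_PER_GPU,
    }
```

Calling `cluster_totals()` reproduces the 64 GPUs and 64 Tflops stated above, together with the total GPU and host memory.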

Projects for the GPU Supercomputer of CQSE

• Lattice QCD with Optimal Domain-Wall Fermion (PI: Ting-Wai Chiu)

• Self-gravitating Gas Dynamics (PI: Tzihong Chiueh)

• Quantum Phase Transition in Strongly Correlated Systems (PI: Ying-Jer Kao)

Lattice QCD with Optimal Domain-Wall Fermion

Quantum ChromoDynamics (QCD): the quantum field theory for the strong interaction, e.g., the strong nuclear force.

Lattice QCD: the formulation of QCD on a 4d space-time lattice, such that numerical solutions of QCD can be obtained. [K. Wilson, PRD, 1974]

Optimal Domain-Wall Fermion (ODWF) [T.W. Chiu, Phys. Rev. Lett. 90 (2003) 071601]

For computing the quark propagator in lattice QCD with ODWF, Nvidia GTX280 (C1060) attains 120 Gflops, 85x faster than an Intel QuadCore CPU.

Nonperturbative Strong Interaction Physics to be tackled with GPU Supercomputer

• To understand the QCD vacuum fluctuations, and their role in color confinement and chiral symmetry breaking.

• To obtain the mass spectra of mesons and baryons, their decay constants, and weak matrix elements.

• To obtain the mass spectra of exotic hadrons, e.g., hybrid mesons, 4-quark mesons, and pentaquark baryons.

This is the first large scale (state-of-the-art) lattice QCD computation with Tesla S1070, without using any expensive supercomputers like IBM BlueGene.

Self-gravitating Gas Dynamics (PI: Tzihong Chiueh)

3D Navier-Stokes solver with Adaptive Mesh Refinement (AMR) scheme. GTX280 (C1060) is 15x faster than an Intel QuadCore CPU.

[Figures: one slice of 3D turbulence; AMR]

For details, see Justin Schive’s talk in the afternoon.

Cosmology problems to be tackled with GPU Supercomputer

• Highest resolution cosmology simulations to address the galaxy formation problem

• Highest resolution MHD simulations to address the black-hole accretion problem

• Highest resolution MHD simulations to address the star formation problem

Quantum Phase Transition in Strongly Correlated Systems (PI: Ying-Jer Kao)

[Diagram labels: simulation; algorithms; Computational Physics; Science and Engineering; entanglement; Quantum Information Theory; quantum many-body systems]

Entanglement: tensor networks

• Simulation of quantum many-body problems on a classical computer is hard.

• Use tensor networks (matrix/tensor product states) to reduce the number of degrees of freedom.

• Possible solution for simulating frustrated systems.

GTX280 (C1060) attains 92 Gflops, 92x faster than an Intel QuadCore CPU.

(For details, see Ying-Jer Kao’s talk)

Impacts of the CQSE GPU Supercomputer

• CQSE is playing the leading role in GPU supercomputing in Taiwan, achieving world-class contributions at the frontiers of QCD, quantum spin systems, and cosmology.

• CQSE will offer a graduate course on CUDA programming for science and engineering students at NTU in the next semester.

• CQSE is designing state-of-the-art CUDA codes for a wide range of physics and engineering problems, which will lead to exciting scientific discoveries and cutting-edge technologies.

[Figures, with GPU speedup over CPU: Quantum ChromoDynamics (85x); One slice of 3D Turbulence (15x); Quantum Spin System (92x)]

Simulating Lattice QCD with GPU supercomputer

Workshop on GPU Supercomputing January 16, 2009

Ting-Wai Chiu (趙挺偉)

Department of Physics, and

Center for Quantum Science and Engineering

National Taiwan University

Quantum Chromodynamics (QCD)

The quantum field theory for the strong interaction between quarks and gluons.

Salient features:

• Gauge group $SU(3)$: gluons have self-interactions.

• Asymptotic freedom: $g(r) \to 0$ as $r \to 0$.

• IR slavery: $g(r) \gtrsim 1$ as $r \gtrsim 10^{-15}$ m (quark/color confinement).

• No exact analytic solutions.

Quarks

Quarks are spin-$\frac{1}{2}$ fermions carrying color, and there are 6 species (flavors) of quarks, each in three colors:

$$\begin{pmatrix} u & c & t \\ d & s & b \end{pmatrix}$$

Hadrons are color singlets composed of quarks:

• $P = (uud)$, antisym. in color

• $N = (udd)$, antisym. in color

• Quark-antiquark combinations such as $u\bar{u}$, $u\bar{d}$, $d\bar{d}$

The nuclear force between nucleons emerges as residual interactions of QCD.

The action of QCD

$$S_{QCD} = \int d^4x \left\{ -\frac{1}{2}\,\mathrm{tr}\left(F_{\mu\nu}F^{\mu\nu}\right) + \sum_{f \in \text{flavors}} \bar\psi_f \left[ i\gamma^\mu(\partial_\mu + i g A_\mu) - m_f \right]\psi_f \right\}$$

$$A_\mu = T^a A^a_\mu, \qquad [T^a, T^b] = i f^{abc} T^c, \qquad \mathrm{tr}(T^a T^b) = \delta^{ab}/2$$

$$F_{\mu\nu} = T^a F^a_{\mu\nu}, \qquad F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu - g f^{abc} A^b_\mu A^c_\nu$$

$T^a$, $a = 1, \cdots, 8$: generators of $SU(3)$, $3 \times 3$ Hermitian matrices.

Here the color and Dirac indices of quark fields are suppressed. Explicitly, for quark $u$ at $x^\mu = (t, x, y, z)$: $u_{\alpha c}(x)$, $\alpha = 1,2,3,4$, $c = r, g, y$.

$$Z[J, \bar\eta, \eta] = \int dA\, d\bar\psi\, d\psi\, \exp\left\{ i \left[ S_{QCD} + \int d^4x \left( J^a_\mu A^{a\mu} + \bar\eta\psi + \bar\psi\eta \right) \right] \right\}$$

Nobody can solve the ground state (vacuum) of QCD!

The Challenge of QCD

At the hadronic scale, $g(r) \sim 1$, perturbation theory is incapable of extracting any quantities from QCD, or of tackling the most interesting physics, namely, the spontaneous chiral symmetry breaking and the color confinement.

To extract any physical quantities from the first principles of QCD, one has to solve QCD nonperturbatively.

A viable nonperturbative formulation of QCD, i.e., Lattice QCD, was first proposed by K. G. Wilson in 1974.

But the problem of lattice fermions, i.e., how to formulate exact chiral symmetry on the lattice, was not resolved until 1992-98.

Basic notions of Lattice QCD

1. Perform Wick rotation: $t \to -i x_4$, then $\exp(iS) \to \exp(-S_E)$, and the expectation value of any observable $O$ is

$$\langle O \rangle = \frac{1}{Z}\int dA\, d\bar\psi\, d\psi\; O(A, \bar\psi, \psi)\, e^{-S_E}, \qquad Z = \int dA\, d\bar\psi\, d\psi\; e^{-S_E}$$

Recall that the divergences in QFT, which require regularization and renormalization, stem from the infinite number of d.o.f. and the proximity of field operators.

2. Discretize the space-time as a 4-d lattice with lattice spacing $a$ and volume $L^4 = (Na)^4$. Then the path integral in QFT becomes a well-defined multiple integral which can be evaluated via Monte Carlo:

$$\langle O \rangle = \frac{1}{Z}\int \prod_i dA_i \prod_j d\bar\psi_j \prod_k d\psi_k\; O(A, \bar\psi, \psi)\, e^{-S_E}$$
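To make the Monte Carlo evaluation of $\langle O \rangle = \frac{1}{Z}\int O\, e^{-S_E}$ concrete, here is a minimal sketch with the Metropolis algorithm. It is a toy under stated assumptions, not the lattice QCD code of the talk: there is a single real degree of freedom $x$, the Euclidean action is taken to be $S_E(x) = x^2/2$, and the names `euclidean_action` and `metropolis_expectation` are illustrative.

```python
import numpy as np


def euclidean_action(x):
    # Toy Euclidean action for one degree of freedom (assumption for illustration).
    return 0.5 * x * x


def metropolis_expectation(observable, n_steps=50_000, step=1.5, seed=0):
    """Estimate <O> = (1/Z) * integral dx O(x) exp(-S_E(x)) by Metropolis sampling."""
    rng = np.random.default_rng(seed)
    proposals = rng.uniform(-step, step, size=n_steps)
    uniforms = rng.random(n_steps)
    x, s = 0.0, euclidean_action(0.0)
    total = 0.0
    for dx, u in zip(proposals, uniforms):
        x_new = x + dx
        s_new = euclidean_action(x_new)
        # Accept the proposal with probability min(1, exp(-(S_new - S_old))).
        if u < np.exp(min(0.0, s - s_new)):
            x, s = x_new, s_new
        total += observable(x)
    return total / n_steps
```

For this Gaussian weight the exact answer is $\langle x^2 \rangle = 1$, so `metropolis_expectation(lambda x: x * x)` should land close to 1 within statistical error. On the lattice the same sample average runs over gauge configurations instead of a single number.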

Gluon fields on the Lattice

The color gluon fields $A_\mu(x)$ of $SU(3)$ are defined on each link connecting $x$ and $x + a\hat\mu$, through the link variable

$$U_\mu(x) = \exp\left[ i a g A_\mu\left(x + \frac{a}{2}\hat\mu\right) \right]$$

[Figure: a plaquette with corners $x$, $x + a\hat\mu$, $x + a\hat\mu + a\hat\nu$, $x + a\hat\nu$]

Then the gluon action on the lattice can be written as

$$S_g[U] = \frac{6}{g^2} \sum_{\text{plaquette}} \left[ 1 - \frac{1}{3}\,\mathrm{Re}\,\mathrm{tr}(U_p) \right] \;\xrightarrow{a \to 0}\; \frac{1}{2}\int d^4x\; \mathrm{tr}\left[F_{\mu\nu}(x) F_{\mu\nu}(x)\right]$$

where

$$U_p = U_\mu(x)\, U_\nu(x + a\hat\mu)\, U_\mu^\dagger(x + a\hat\nu)\, U_\nu^\dagger(x)$$
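The link variable $U = \exp(iagA)$ and the plaquette $U_p = U_\mu(x) U_\nu(x + a\hat\mu) U_\mu^\dagger(x + a\hat\nu) U_\nu^\dagger(x)$ can be checked numerically on a single plaquette. The sketch below assumes the normalization $\mathrm{tr}(T^a T^b) = \delta^{ab}/2$ with $T^a$ built from the Gell-Mann matrices; the function names and the random field values are illustrative only.

```python
import numpy as np


def gell_mann():
    """Return the eight Gell-Mann matrices as an array of shape (8, 3, 3)."""
    l = np.zeros((8, 3, 3), dtype=complex)
    l[0][0, 1] = l[0][1, 0] = 1
    l[1][0, 1], l[1][1, 0] = -1j, 1j
    l[2][0, 0], l[2][1, 1] = 1, -1
    l[3][0, 2] = l[3][2, 0] = 1
    l[4][0, 2], l[4][2, 0] = -1j, 1j
    l[5][1, 2] = l[5][2, 1] = 1
    l[6][1, 2], l[6][2, 1] = -1j, 1j
    l[7] = np.diag([1, 1, -2]) / np.sqrt(3)
    return l


T = gell_mann() / 2  # generators with tr(T^a T^b) = delta_ab / 2


def link_variable(A_components, a=1.0, g=1.0):
    """U = exp(i a g A), with A = sum_a A^a T^a Hermitian and traceless."""
    A = np.tensordot(A_components, T, axes=1)
    w, V = np.linalg.eigh(a * g * A)
    # exp(i H) = V diag(exp(i w)) V^dagger for Hermitian H.
    return (V * np.exp(1j * w)) @ V.conj().T


def plaquette(U_mu_x, U_nu_xmu, U_mu_xnu, U_nu_x):
    """U_p = U_mu(x) U_nu(x + a mu) U_mu(x + a nu)^dagger U_nu(x)^dagger."""
    return U_mu_x @ U_nu_xmu @ U_mu_xnu.conj().T @ U_nu_x.conj().T


def plaquette_action_term(U_p):
    """Contribution 1 - (1/3) Re tr U_p of one plaquette to the gluon action."""
    return 1.0 - np.trace(U_p).real / 3.0
```

Because $A$ is Hermitian and traceless, each link built this way is unitary with unit determinant, and a plaquette of identity links contributes zero to the action.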

Lattice QCD

The QCD action

$$S = S_G(U) + \sum_{f \in \text{flavors}} \bar\psi^f_{a\alpha x}\, D^f_{a\alpha x,\, b\beta y}(U)\, \psi^f_{b\beta y}, \qquad f = u, d, s, c, b, t,$$

where $S_G(U)$ is the action of the gluon fields, and

• $a, b = 1,2,3$: color index

• $\alpha, \beta = 1,2,3,4$: Dirac index

• $x, y = 1, \cdots, N_s$, with $N_s = N_x N_y N_z N_t$: site index

For example, on the $16^3 \times 32$ lattice, $D$ is a complex matrix of size $1{,}572{,}864 \times 1{,}572{,}864$.

$$\langle O(U, \bar\psi, \psi) \rangle = \frac{\int dU\, d\bar\psi\, d\psi\; O(U, \bar\psi, \psi)\, e^{-S}}{\int dU\, d\bar\psi\, d\psi\; e^{-S}} = \frac{\int dU\, \det(D)\; O(U, D^{-1})\, e^{-S_G}}{\int dU\, \det(D)\, e^{-S_G}}$$
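The quoted matrix size follows from multiplying the number of sites by the color and Dirac index ranges. The short sketch below reproduces it; the helper names are illustrative, and the dense-storage estimate (16 bytes per double-precision complex entry) is added only to show the scale, since an operator of this size would normally be applied rather than stored as a dense matrix.

```python
def dirac_matrix_dimension(nx, ny, nz, nt, n_color=3, n_dirac=4):
    """Linear dimension of D: sites x colors x Dirac components."""
    return nx * ny * nz * nt * n_color * n_dirac


def dense_storage_bytes(dim, bytes_per_complex=16):
    """Memory needed to store a dense dim x dim complex matrix."""
    return dim * dim * bytes_per_complex
```

For the $16^3 \times 32$ lattice, `dirac_matrix_dimension(16, 16, 16, 32)` gives 1,572,864, and a dense copy would need $36 \times 2^{40}$ bytes (36 TiB).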

The Challenge of Lattice QCD

• To have lattice volume large enough such that $m_\pi L \gg 1$.

• To have lattice spacing small enough such that $m_q a \ll 1$.

To meet the above two conditions:

• The lattice size should be at least $100^3 \times 200$.

• The required computing power is around Petaflops!

So far, the lightest u/d quark cannot be put on the lattice. One has to use ChPT to extrapolate lattice results to physical ones.

HMC Algorithm for 2 flavor QCD

1. Initial gauge configuration $\{U_l\}$.

2. Generate $P_l^a$ with probability distribution $\exp(-P_l^a P_l^a / 2)$.

3. Generate $\eta$ with probability distribution $\exp(-\eta^\dagger \eta)$.

4. Fixing the pseudofermion field $\phi = D^\dagger \eta$. Recall: $\exp[-\phi^\dagger (D^\dagger D)^{-1} \phi] = \exp[-\eta^\dagger \eta]$.

5. Molecular dynamics (leap-frog/Omelyan integrator), the most expensive part of HMC:

$$\dot U_l(\tau) = i P_l(\tau)\, U_l(\tau), \qquad P_l(\tau) = P_l^a(\tau)\, T^a$$

$$\dot P_l^a = -D_l^a S_G(U) - D_l^a \left[ \phi^\dagger (D^\dagger D)^{-1} \phi \right]$$

6. Accept $\{U_l'\}$ with the probability $P_A = \min\left[1, \exp(-\Delta H)\right]$, where $\Delta H$ is the change of the Hamiltonian $H$ along the trajectory.

7. Go to 2.
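The structure of the HMC loop (refresh momenta, integrate the molecular dynamics with leap-frog, accept with probability $\min[1, \exp(-\Delta H)]$) can be sketched on a toy problem. The code below is an illustration under stated assumptions, not the QCD implementation: the gauge links are replaced by a real vector `q`, the full action $S_G(U)$ plus pseudofermion term is replaced by the toy action $S(q) = q \cdot q / 2$, and all function names are hypothetical.

```python
import numpy as np


def action(q):
    # Toy action standing in for S_G(U) plus the pseudofermion term (assumption).
    return 0.5 * np.dot(q, q)


def force(q):
    # Molecular-dynamics force -dS/dq for the toy action.
    return -q


def hamiltonian(q, p):
    return 0.5 * np.dot(p, p) + action(q)


def leapfrog(q, p, dt, n_steps):
    """Integrate dq/dt = p, dp/dt = -dS/dq with the leap-frog scheme."""
    p = p + 0.5 * dt * force(q)          # initial half step in momentum
    for _ in range(n_steps - 1):
        q = q + dt * p
        p = p + dt * force(q)
    q = q + dt * p
    p = p + 0.5 * dt * force(q)          # final half step in momentum
    return q, p


def hmc_step(q, rng, dt=0.2, n_steps=10):
    """One HMC trajectory: refresh momenta, integrate, accept or reject."""
    p = rng.normal(size=q.shape)         # momenta with weight exp(-p^2 / 2)
    q_new, p_new = leapfrog(q, p, dt, n_steps)
    dH = hamiltonian(q_new, p_new) - hamiltonian(q, p)
    accept = rng.random() < np.exp(min(0.0, -dH))   # min(1, exp(-dH))
    return (q_new if accept else q), accept
```

The leap-frog scheme is time-reversible and conserves $H$ up to errors of order $dt^2$, which is why the accept/reject step at the end makes the algorithm exact while keeping the acceptance rate high.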

Conjugate Gradient algorithm for $Ax = b$, $A = D^\dagger D$

$$x_0 = 0, \qquad r_0 = b - A x_0, \qquad p_0 = r_0$$

$$\alpha_k = \frac{(r_k, r_k)}{(p_k, A p_k)}$$

$$x_{k+1} = x_k + \alpha_k p_k$$

$$r_{k+1} = r_k - \alpha_k A p_k$$

If $|r_{k+1}| < \epsilon |b|$, then stop

$$\beta_k = \frac{(r_{k+1}, r_{k+1})}{(r_k, r_k)}$$

$$p_{k+1} = r_{k+1} + \beta_k p_k$$

The most time-consuming operations are the matrix-vector multiplications $(D^\dagger D)\, p_k$.

GPU computes in single precision much faster than in double precision!
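The conjugate gradient recursion for $Ax = b$ with $A = D^\dagger D$ translates almost line by line into NumPy. This is a minimal CPU sketch, not the CUDA kernel of the talk; the callable `apply_A`, the tolerance name `tol`, and the iteration cap are assumptions made for illustration.

```python
import numpy as np


def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for Hermitian positive-definite A given as a function.

    Returns the solution and the number of iterations used.
    """
    x = np.zeros_like(b)
    r = b - apply_A(x)
    p = r.copy()
    rr = np.vdot(r, r).real
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        Ap = apply_A(p)                      # the dominant cost: (D^dagger D) p
        alpha = rr / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = np.vdot(r, r).real
        if np.sqrt(rr_new) < tol * b_norm:   # stop when |r| < tol * |b|
            return x, k + 1
        beta = rr_new / rr
        p = r + beta * p
        rr = rr_new
    return x, max_iter
```

Passing `apply_A` as a function rather than a matrix mirrors how the operator is used in practice: only the product $(D^\dagger D)\,p_k$ is ever needed, computed as two successive applications, `D.conj().T @ (D @ v)` in a dense toy example.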

CG algorithm with mixed precision

1. $r_k = b - A x_k$

2. If $|r_k| < \epsilon |b|$, then stop

3. Solve $A t_k = r_k$ in single precision to an accuracy $\delta < 1$

4. $x_{k+1} = x_k + t_k$

5. Go to 1.

Proof of convergence: Let $s_k = r_k - A t_k$, with $|s_k| < \delta |r_k|$. Then

$$|r_{k+1}| = |b - A x_{k+1}| = |r_k - A t_k| = |s_k| < \delta |r_k|$$

(For an account of the tuning of our CUDA kernel for CG with mixed precision, see Kenji Ogawa’s talk)
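The mixed-precision scheme (outer residual $r_k = b - A x_k$, inner solve $A t_k = r_k$ in single precision, update $x_{k+1} = x_k + t_k$) can be sketched in NumPy as follows. Several details are assumptions made for illustration: the outer residual is evaluated in double precision, the inner solver is a plain CG on `complex64` copies, the matrix is stored densely, and the names `eps` and `delta` stand for the outer stopping tolerance and the inner accuracy.

```python
import numpy as np


def cg_solve(A, b, rel_tol, max_iter=500):
    """Plain CG in the precision of A and b; stops when |residual| <= rel_tol * |b|."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = np.vdot(r, r).real
    stop = (rel_tol ** 2) * rr
    for _ in range(max_iter):
        if rr <= stop:
            break
        Ap = A @ p
        alpha = rr / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = np.vdot(r, r).real
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x


def mixed_precision_solve(A, b, eps=1e-12, delta=1e-3, max_outer=50):
    """Outer loop in double precision, inner CG solves in single precision."""
    A32 = A.astype(np.complex64)
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    for outer in range(max_outer):
        r = b - A @ x                                      # step 1 (double precision)
        if np.linalg.norm(r) < eps * b_norm:               # step 2
            return x, outer
        t = cg_solve(A32, r.astype(np.complex64), delta)   # step 3 (single precision)
        x = x + t.astype(np.complex128)                    # step 4
    return x, max_outer
```

Each outer pass shrinks the residual by roughly the factor `delta`, so a handful of single-precision solves reaches an accuracy that single precision alone cannot, while almost all arithmetic stays in the fast precision.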

First results of the QCD Vacuum from GPU

• The vacuum (ground state) of QCD consists of various quantum fluctuations.

• These quantum fluctuations are the origin of much interesting and important nonperturbative physics.

• In QCD, each gauge configuration possesses a well-defined topological charge $Q$ with integer value, $Q = \cdots, -1, 0, 1, \cdots$:

$$Z(\theta) = \sum_Q e^{i\theta Q} Z_Q, \qquad Z_Q = \int_{Q[A] = Q} dA\, d\bar\psi\, d\psi\; e^{-S_E},$$

where $Z_Q$ is the path integral restricted to gauge configurations with topological charge $Q$.

• Thus it is important to determine the topological charge fluctuation in the QCD vacuum.

Quantum fluctuations in the QCD vacuum

[Figure: quantum fluctuations in the QCD vacuum on the $16^3 \times 32$ lattice, $Q_t = 0$; $\Delta x = a = 1.2 \times 10^{-16}$ m, $L^3 = (2 \times 10^{-15}\ \mathrm{m})^3$, $\Delta t = 3.3 \times 10^{-24}$ sec]

Quantum fluctuations in the QCD vacuum

[Figure: quantum fluctuations in the QCD vacuum on the $16^3 \times 32$ lattice, $Q_t = 1$; $\Delta x = a = 1.2 \times 10^{-16}$ m, $L^3 = (2 \times 10^{-15}\ \mathrm{m})^3$, $\Delta t = 3.3 \times 10^{-24}$ sec]

Outlook

• To clarify the nature of the QCD vacuum: whether it is more instanton-like, or has a more complicated 2-dim or 3-dim sheet-like structure. Namely, to test the (anti-)self-duality,

$$F_{\mu\nu} = \pm \tilde F_{\mu\nu} \equiv \pm \frac{1}{2} \epsilon_{\mu\nu\alpha\beta} F_{\alpha\beta}$$

• To identify the QCD vacuum fluctuations which are the most relevant to the mechanism of color confinement.

• GPU supercomputers will play the most important role in simulating lattice QCD, which will unveil the nonperturbative strong interaction physics from first principles.


Backup slides

How to avoid computing fermion determinant

The central problem of lattice QCD is to generate a set of gauge configurations $\{C_1, \cdots, C_N\}$, $C_i = \{U\}$, with probability

$$p(C_i) = \frac{1}{Z_{QCD}} \prod_f \det D_f(U)\; e^{-S_G(U)}.$$

But the computation of $\det D_f(C_i)$ is too costly, since $D_f(C_i)$ is a huge matrix.

Introduce pseudofermions $\phi_f$, which carry the same quantum numbers as the quarks but obey Bose statistics, i.e.

$$\det D_f = \left(\det D_f^\dagger D_f\right)^{1/2} = \int d\phi_f^\dagger\, d\phi_f\, \exp\left[ -\phi_f^\dagger (D_f^\dagger D_f)^{-1/2} \phi_f \right]$$

Then

$$Z = \int dU \prod_f \det D_f\, \exp[-S_G(U)] = \int dU \prod_f d\phi_f^\dagger\, d\phi_f\, \exp\left[ -S_G(U) - \sum_f \phi_f^\dagger (D_f^\dagger D_f)^{-1/2} \phi_f \right]$$

HMC for QCD

For each link variable $U_\mu(x) = \exp(iA_\mu(x)) \equiv U_l = \exp(iA_l)$, introduce the conjugate momentum $P_l = \dot A_l$.

Then the Hamiltonian for HMC is

$$H = \frac{1}{2} \sum_l \mathrm{tr}(P_l^2) + S_G(U) + \sum_f \phi_f^\dagger (D_f^\dagger D_f)^{-1/2} \phi_f$$

$$Z = \int \prod_l dU_l\, dP_l \prod_f d\phi_f^\dagger\, d\phi_f\; e^{-H}$$

$$P_l = P_l^a T^a, \qquad A_l = A_l^a T^a,$$

where $T^a$ are generators of the $SU(3)$ gauge group, satisfying

$$\mathrm{tr}(T^a T^b) = \delta_{ab}, \qquad \sum_{a=1}^{8} T^a_{ij} T^a_{mn} = \delta_{in}\delta_{mj} - \frac{1}{3}\delta_{ij}\delta_{mn}$$

HMC for QCD (cont)

$$\dot U_l = i P_l U_l$$

$$U_l U_l^\dagger = U_l^\dagger U_l = 1 \;\Rightarrow\; P_l^\dagger = P_l, \quad \mathrm{tr}(P_l) = 0$$

$$\dot H = \sum_l \left[ P_l^a \dot P_l^a + \frac{\partial S(U)}{\partial (U_l)_{ij}}\, i (P_l U_l)_{ij} \right] = \sum_l P_l^a \left[ \dot P_l^a + i \frac{\partial S(U)}{\partial (U_l)_{ij}} (T^a U_l)_{ij} \right]$$

Impose $\dot H = 0$:

$$\dot P_l^a = -i \frac{\partial S(U)}{\partial (U_l)_{ij}} (T^a U_l)_{ij}$$

Define

$$D_l^a f(U) \equiv i \frac{\partial f(U)}{\partial (U_l)_{ij}} (T^a U_l)_{ij}$$

$$\Rightarrow\; \dot P_l^a = -D_l^a S(U)$$

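HMC for 2 flavor QCD ($m_u = m_d$)

$$\prod_f \det D_f = \det D_u \det D_d = \det(D^\dagger D) = \int d\phi_u^\dagger\, d\phi_u\, \exp\left[ -\phi_u^\dagger (D^\dagger D)^{-1} \phi_u \right]$$

$$H = \frac{1}{2} \sum_l \mathrm{tr}(P_l^2) + S_G(U) + \phi_u^\dagger (D^\dagger D)^{-1} \phi_u, \qquad Z = \int \prod_l dU_l\, dP_l\; d\phi_u^\dagger\, d\phi_u\, \exp(-H)$$

$$\dot P_l^a = -D_l^a S(U) = -D_l^a S_G(U) - \phi_u^\dagger\, D_l^a (D^\dagger D)^{-1}\, \phi_u$$

$$\phantom{\dot P_l^a} = -D_l^a S_G(U) + \left[ (D^\dagger D)^{-1} \phi_u \right]^\dagger D_l^a (D^\dagger D) \left[ (D^\dagger D)^{-1} \phi_u \right]$$

Note that $\exp[-\phi_u^\dagger (D^\dagger D)^{-1} \phi_u] = \exp[-\eta^\dagger \eta]$, with $\phi_u = D^\dagger \eta$.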
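The closing identity, $\phi_u^\dagger (D^\dagger D)^{-1} \phi_u = \eta^\dagger \eta$ for $\phi_u = D^\dagger \eta$, is what allows the pseudofermion field to be drawn exactly from Gaussian noise. The sketch below verifies it numerically on a small random matrix standing in for $D$; the matrix, its size, the dense solve, and the function names are assumptions for illustration only.

```python
import numpy as np


def sample_pseudofermion(D, rng):
    """Draw eta with weight exp(-eta^dagger eta) and return (phi, eta), phi = D^dagger eta."""
    n = D.shape[0]
    # Real and imaginary parts each have variance 1/2, so that <eta eta^dagger> = 1.
    eta = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
    return D.conj().T @ eta, eta


def pseudofermion_action(D, phi):
    """phi^dagger (D^dagger D)^{-1} phi, using a dense solve (toy sizes only)."""
    chi = np.linalg.solve(D.conj().T @ D, phi)
    return np.vdot(phi, chi).real
```

In a production code the dense solve would be replaced by a conjugate gradient solve of $(D^\dagger D)\chi = \phi_u$, which is also the vector needed for the fermion force.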