
National Computational Science Alliance

Supercomputing: Directions in Technology, Architecture and Applications

• Keynote Talk to Supercomputer’98 in Mannheim, Germany

• June 18, 1998



Abstract

"By using the results of the Top 500 over the last five years, one can easily trace out the complete transformation of the supercomputer industry. In 1993, none of the Top500 was made by a broadly based market driven company, while today over 3/4 of the Top500 are made by SGI, IBM, HP, or Sun. Similarly, vector architectures have been replaced in market share by microprocessor based SMPs. We now see a strong move to replace many MPPs and SMPs by the new architecture of Distributed Shared Memory (DSM) such as the SGI Origin or HP SPP series. A key trend is the move toward clusters of DSMs instead of monolithic MPPs. The next major change will be the emergence of Intel processors replacing RISC processors, particularly the Intel Merced processor which should become dominant shortly after 2000. A major battle will shape up between UNIX and Microsoft's NT operating systems, particularly at the lower end of the Top500. Finally, with each new architecture comes a new set of applications we can now attack. I will discuss how DSM will enable dynamic load balancing needed to support the multi-scale problems that teraflop machines will enable us to tackle."


NCSA is the Leading Edge Site for the National Computational Science Alliance

www.ncsa.uiuc.edu


Scientific Applications Continue to Require Exponential Growth in Capacity

[Figure: machine requirement in FLOPS (10^8 to 10^20) versus memory in bytes (10^8 to 10^14) for Molecular Dynamics for Biological Molecules, Computational Cosmology, Turbulent Convection in Stars, Atomic/Diatomic Interaction, and QCD. Reference markers: 1995 NSF Capability; 2000 NSF Leading Edge; NSF in 2004 (Projected); ASCI in 2004; 100 year climate model in hours.]

Legend:
• Long Range Projections from Recent Applications Workshop
• Next Step Projections by NSF Grand Challenge Research Teams
• Recent Computations by NSF Grand Challenge Research Teams

From Bob Voigt, NSF


The Promise of the Teraflop - From Thunderstorm to National-Scale Simulation

Simulation by Wilhelmson, et al.; Figure from Supercomputing and the Transformation of Science, Kaufmann and Smarr, Freeman, 1993


Accelerated Strategic Computing Initiative is Coupling DOE Defense Labs to Universities

• Access to ASCI Leading Edge Supercomputers
• Academic Strategic Alliances Program
• Data and Visualization Corridors

http://www.llnl.gov/asci-alliances/centers.html


Comparison of the DoE ASCI and the NSF PACI Origin Array Scale Through FY99

www.lanl.gov/projects/asci/bluemtn/Hardware/schedule.html

Los Alamos Origin System, FY99: 5-6000 processors

NCSA Proposed System, FY99: 6x128 and 4x64 = 1024 processors

Future Upgrade Under Negotiation with NSF

NCSA Combines Shared Memory Programming with Massive Parallelism

[Figure labels: CM-5, CM-2]


The Exponential Growth of NCSA’s SGI Shared Memory Supercomputers

[Figure: number of SGI processors at NCSA (log scale, 1 to 10,000) from Jan-94 to Jan-01, across the Challenge, Power Challenge, Origin, and SN1 generations.]

Doubling Every Nine Months!
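The nine-month doubling time quoted on this slide implies a growth factor that is easy to work out. The following is a minimal sketch, assuming a clean exponential trend; the names `growth_factor` and `DOUBLING_MONTHS` are hypothetical, and because the underlying processor counts are not given in the text, only ratios are computed.

```python
# Growth implied by a fixed doubling time (illustrative; the slide gives
# only the doubling period, not the underlying processor counts).
DOUBLING_MONTHS = 9  # "Doubling Every Nine Months" from the slide


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Return the multiplicative growth after `months` of steady doubling."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Span of the chart's time axis: Jan-94 to Jan-01 is 84 months.
    print(f"per year: {growth_factor(12):.2f}x")
    print(f"Jan-94 to Jan-01: {growth_factor(84):.0f}x")
```

Under that assumption, the trend works out to roughly 2.5x per year, or about a 645-fold increase over the seven years on the chart's time axis.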


TOP500 Systems by Vendor

TOP500 Reports: http://www.netlib.org/benchmark/top500.html

[Figure: stacked number of TOP500 systems (0 to 500) by vendor for each list from Jun-93 to Jun-98. Vendors: CRI, SGI, IBM, Convex, HP, Sun, TMC, Intel, DEC, Japanese, Other.]


Why NCSA Switched From Vector to RISC Processors

NCSA 1992 Supercomputing Community

[Figure: number of users (0 to 150) versus average user MFLOPS (20 to 300) on the Cray Y-MP4 / 64. March, 1992 - February, 1993 Average Performance, Users > 0.5 CPU Hour. Average Speed 70 MFLOPS. Reference markers: Peak Speed MIPS R8000; Peak Speed Y-MP1.]


Replacement of Shared Memory Vector Supercomputers by Microprocessor SMPs


[Figure: Top 500 installed supercomputers (0 to 500) by architecture class (MPP, SMP/DSM, PVP), June lists from Jun-93 to Jun-98.]


Top500 Shared Memory Systems

[Figure, two panels: number of TOP500 systems (0 to 300) for each list from Jun-93 to Jun-98, by region (Europe, Japan, USA). Vector Processors panel: PVP Systems. Microprocessors panel: SMP + DSM Systems.]


Simulation of the Evolution of the Universe on a Massively Parallel Supercomputer

[Figure labels: 12 Billion Light Years; 4 Billion Light Years]

Virgo Project - Evolving a Billion Pieces of Cold Dark Matter in a Hubble Volume - 688-processor CRAY T3E at Garching Computing Centre of the Max-Planck-Society

http://www.mpg.de/universe.htm


Limitations of Uniform Grids for Complex Scientific and Engineering Problems

Source: Greg Bryan, Mike Norman, NCSA

512x512x512 Run on 512-node CM-5

Gravitation Causes Continuous Increase in Density Until There is a Large Mass in a Single Grid Zone


Use of Shared Memory Adaptive Grids To Achieve Dynamic Load Balancing

Source: Greg Bryan, Mike Norman, John Shalf, NCSA

64x64x64 Run with Seven Levels of Adaption on SGI Power Challenge, Locally Equivalent to 8192x8192x8192 Resolution
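The "locally equivalent" resolution quoted here can be reproduced with a short calculation. This is a minimal sketch, assuming each level of refinement halves the zone size; the slide does not state the refinement factor, but a factor of two is the one consistent with going from 64 to 8192 zones per side in seven levels. All names are hypothetical.

```python
# Effective resolution of an adaptively refined grid (illustrative).
# ASSUMPTION: each refinement level halves the cell size (factor 2);
# the slide does not state the factor, but 2 reproduces its numbers.
BASE_CELLS_PER_SIDE = 64
LEVELS = 7
REFINEMENT_FACTOR = 2  # assumed


def effective_cells_per_side(base: int, levels: int,
                             factor: int = REFINEMENT_FACTOR) -> int:
    """Cells per side a uniform grid would need to match the finest level."""
    return base * factor ** levels


if __name__ == "__main__":
    side = effective_cells_per_side(BASE_CELLS_PER_SIDE, LEVELS)
    uniform_cells = side ** 3
    base_cells = BASE_CELLS_PER_SIDE ** 3
    print(f"effective resolution: {side}^3")
    print(f"uniform grid cells:   {uniform_cells:.3e}")
    print(f"base grid cells:      {base_cells}")
```

A uniform grid at that resolution would need 2^39 (about 5.5 x 10^11) zones, compared with 262,144 in the base grid, which is why refinement is applied only where the density grows.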


Extreme and Large PIs Dominate Usage of NCSA Origin

January thru April, 1998

[Figure: CPU-hours burned (log scale, 1 to 1,000,000) versus PI rank (1 to 181), grouped into bins of 100k to 1M, 10k to 100k, 1k to 10k, 100 to 1k, 10 to 100, and 1 to 10 CPU-hours.]


Disciplines Using the NCSA Origin 2000

CPU-Hours in March 1995

[Figure: share of CPU-hours by discipline: Particle Physics, Chemistry, Materials Sciences, Engineering CFD, Astronomy, Physics, Industry, Molecular Biology, Other.]


A Variety of Discipline Codes - Single Processor Performance Origin vs. T3E

[Figure: single processor MFLOPS (0 to 160) on Origin and T3E for the codes QMC, RIEMANN, Laplace, QCD, PPM, PIMC, and ZEUS.]


Solving 2D Navier-Stokes Kernel - Performance of Scalable Systems

[Figure: gigaflops (0 to 7) versus number of processors (0 to 60) for Origin-DSM, Origin-MPI, NT-MPI, SP2-MPI, T3E-MPI, and SPP2000-DSM.]

Source: Danesh Tafti, NCSA

Preconditioned Conjugate Gradient Method With Multi-level Additive Schwarz Richardson Pre-conditioner

(2D 1024x1024)
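For readers unfamiliar with the preconditioned conjugate gradient method named on this slide, the iteration can be sketched in a few lines. This is only an illustration under stated assumptions: the model problem is a small 2D Poisson system rather than the Navier-Stokes kernel, and a simple Jacobi (diagonal) preconditioner is swapped in for the multi-level additive Schwarz Richardson preconditioner, which is not reproduced here. All function names are hypothetical.

```python
# Preconditioned conjugate gradient (PCG) on a small 2D Poisson problem.
# ASSUMPTIONS: the model problem, grid size, and the simple Jacobi
# preconditioner are illustrative stand-ins, not the kernel or the
# multi-level additive Schwarz preconditioner named on the slide.
from typing import Callable, List

Vector = List[float]


def apply_laplacian(u: Vector, n: int) -> Vector:
    """5-point stencil (4 on the diagonal, -1 to neighbours), zero boundary."""
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            k = i * n + j
            s = 4.0 * u[k]
            if i > 0:
                s -= u[k - n]
            if i < n - 1:
                s -= u[k + n]
            if j > 0:
                s -= u[k - 1]
            if j < n - 1:
                s -= u[k + 1]
            out[k] = s
    return out


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pcg(apply_a: Callable[[Vector], Vector],
        apply_minv: Callable[[Vector], Vector],
        b: Vector, tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x with x = 0
    z = apply_minv(r)           # preconditioned residual
    p = list(z)                 # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            break
        ap = apply_a(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = apply_minv(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


def solve_demo(n: int = 16) -> float:
    """Solve with a unit right-hand side and return the final residual norm."""
    b = [1.0] * (n * n)
    x = pcg(lambda v: apply_laplacian(v, n),
            lambda v: [vi / 4.0 for vi in v],   # Jacobi: divide by diagonal
            b)
    r = [bi - ai for bi, ai in zip(b, apply_laplacian(x, n))]
    return dot(r, r) ** 0.5


if __name__ == "__main__":
    print(f"residual norm: {solve_demo():.2e}")
```

The preconditioner enters only through `apply_minv`, so a stronger preconditioner could be substituted without changing the iteration itself.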


Alliance PACS Origin2000 Repository

http://scv.bu.edu/SCV/Origin2000/

Kadin Tseng, BU, Gary Jensen, NCSA, Chuck Swanson, SGI

John Connolly, U Kentucky Developing Repository for HP Exemplar


High-End Architecture 2000 - Scalable Clusters of Shared Memory Modules

• NEC SX-5
– 32 x 16 vector processor SMP
– 512 Processors
– 8 Gigaflop Peak Vector Processor

• IBM SP
– 256 x 16 RISC Processor SMP
– 4096 Processors
– 1 Gigaflop Peak RISC Processor

• SGI Origin Follow-on
– 32 x 128 RISC Processor DSM
– 4096 Processors
– 1 Gigaflop Peak EPIC Processor

Each is 4 Teraflops Peak
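The "4 Teraflops Peak" figure can be checked from the module counts and per-processor peaks listed for the three systems. A minimal sketch follows; the numbers are the slide's, while the `Cluster` type and its field names are hypothetical.

```python
# Peak performance of clusters of shared-memory modules (illustrative).
# Values are taken from the slide; the field names are hypothetical.
from typing import NamedTuple


class Cluster(NamedTuple):
    name: str
    modules: int            # number of shared-memory modules
    procs_per_module: int   # processors in each module
    gflops_per_proc: float  # peak gigaflops per processor

    @property
    def processors(self) -> int:
        return self.modules * self.procs_per_module

    @property
    def peak_teraflops(self) -> float:
        return self.processors * self.gflops_per_proc / 1000.0


CLUSTERS = [
    Cluster("NEC SX-5", 32, 16, 8.0),
    Cluster("IBM SP", 256, 16, 1.0),
    Cluster("SGI Origin Follow-on", 32, 128, 1.0),
]

if __name__ == "__main__":
    for c in CLUSTERS:
        print(f"{c.name}: {c.processors} processors, "
              f"{c.peak_teraflops:.3f} TF peak")
```

Taking 1 teraflop as 1000 gigaflops, each configuration comes to 4.096 TF, which the slide rounds to 4. The three designs reach the same peak with very different module sizes and processor counts.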


Emerging Portable Computing Standards

• HPF
• MPI
• OpenMP
• Hybrids of MPI and OpenMP


Basket of Applications Average Performance as Percentage of Linpack Performance

[Figure: Linpack versus average application performance (axis 0 to 1800) on T90, C90, SPP-2000, SP2-160, Origin195, and PCA. Bar labels: 22%, 25%, 14%, 19%, 33%, 26%.]

Applications Codes: CFD, Biomolecular, Chemistry, Materials, QCD


Harnessing Distributed UNIX Workstations - University of Wisconsin Condor Pool

[Figure: Condor Cycles]

CondorView, Courtesy of Miron Livny, Todd Tannenbaum (UWisc)


NT Workstation Shipments Rapidly Surpassing UNIX

[Figure: workstations shipped (millions, 0 to 1.4) in 1995, 1996, and 1997, UNIX versus NT.]

Source: IDC, Wall Street Journal, 3/6/98


First Scaling Testing of ZEUS-MP on CRAY T3E and Origin vs. NT Supercluster

“Supercomputer performance at mail-order prices” -- Jim Gray, Microsoft

access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html

Zeus-MP Hydro Code Running Under MPI

• Alliance Cosmology Team
• Andrew Chien, UIUC
• Rob Pennington, NCSA

[Figure: single processor speed on ZEUS-MP (MFLOPS, 0 to 140) for T3E, Origin, and NT.]

[Figure: GFLOPS (0 to 8) versus number of processors (0 to 200) for T3E, Origin, and NT/Intel.]


NCSA NT Supercluster Solving Navier-Stokes Kernel

Preconditioned Conjugate Gradient Method With Multi-level Additive Schwarz Richardson Pre-conditioner

(2D 1024x1024)

Single Processor Performance:
MIPS R10k 117 MFLOPS
Intel Pentium II 80 MFLOPS

Danesh Tafti, Rob Pennington, Andrew Chien NCSA

[Figure: speedup (0 to 60) versus number of processors (0 to 60) for NT MPI, Origin MPI, and Origin SM, with a "Perfect" reference line.]

[Figure: gigaflops (0 to 7) versus number of processors (0 to 70) for NT MPI, Origin MPI, and Origin SM.]
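The speedup plot on this slide compares each system against a "Perfect" line, which corresponds to the usual definitions of speedup and parallel efficiency. A minimal sketch follows. The timings in `EXAMPLE_TIMINGS` are made-up placeholders, since the measured values are not recoverable from the text; only the two single-processor MFLOPS rates come from the slide. All names are hypothetical.

```python
# Speedup and parallel efficiency as plotted against the "Perfect" line.
# ASSUMPTION: the timings below are made-up placeholders for illustration;
# the slide's measured values are not recoverable from the text.
from typing import Dict


def speedup(t_serial: float, t_parallel: float) -> float:
    """Classic speedup: one-processor time over p-processor time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Fraction of perfect (linear) speedup achieved on `processors`."""
    return speedup(t_serial, t_parallel) / processors


# Hypothetical wall-clock seconds keyed by processor count.
EXAMPLE_TIMINGS: Dict[int, float] = {1: 640.0, 8: 100.0, 32: 32.0}

# Single-processor rates quoted on the slide (MFLOPS).
R10K_MFLOPS = 117.0
PENTIUM_II_MFLOPS = 80.0

if __name__ == "__main__":
    t1 = EXAMPLE_TIMINGS[1]
    for p, tp in sorted(EXAMPLE_TIMINGS.items()):
        print(f"p={p:3d}  speedup={speedup(t1, tp):6.2f}  "
              f"efficiency={efficiency(t1, tp, p):5.2f}")
    print(f"R10k / Pentium II per-processor ratio: "
          f"{R10K_MFLOPS / PENTIUM_II_MFLOPS:.2f}")
```

Speedup normalises each system to its own single-processor time, so a cluster of slower processors can track the perfect line closely while still delivering fewer gigaflops. That is one reason to read the speedup and gigaflops plots together.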


Near Perfect Scaling of Cactus - 3D Dynamic Solver for the Einstein GR Equations

[Figure: scaling (0 to 120) versus number of processors (0 to 120) for Origin and NT SC.]

Ratio of GFLOPs: Origin = 2.5x NT SC

Danesh Tafti, Rob Pennington, Andrew Chien NCSA

Cactus was Developed by Paul Walker, MPI-Potsdam, UIUC, NCSA


NCSA Symbio - A Distributed Object Framework Bringing Scalable Computing to NT Desktops

http://access.ncsa.uiuc.edu/Features/Symbio/Symbio.html

• Parallel Computing on NT Clusters
– Briand Sanderson, NCSA
– Microsoft Co-Funds Development

• Features
– Based on Microsoft DCOM
– Batch or Interactive Modes
– Application Development Wizards

• Current Status & Future Plans
– Symbio Developer Preview 2 Released
– Princeton University Testbed


The Road to Merced

http://developer.intel.com/solutions/archive/issue5/focus.htm#FOUR
