
SAN DIEGO SUPERCOMPUTER CENTER

Oct 16 2006

at the UNIVERSITY OF CALIFORNIA, SAN DIEGO

Overview of HPC SDSC Machines

Science Enabled at SDSC

Amit Majumdar

Scientific Computing Applications Group, San Diego Supercomputer Center, University of California San Diego

SAN DIEGO SUPERCOMPUTER CENTER 2


Topics

1. Supercomputing in General

2. Supercomputers at SDSC

3. Science Enabled at SDSC

SAN DIEGO SUPERCOMPUTER CENTER 3


DOE, DOD, NASA, NSF Centers in the US
• DOE National Labs – LANL, LLNL, Sandia
• DOE Office of Science Labs – ORNL, NERSC
• DOD, NASA supercomputer centers

• National Science Foundation supercomputer centers for academic users
  • San Diego Supercomputer Center (UCSD)
  • National Center for Supercomputing Applications (UIUC)
  • Pittsburgh Supercomputing Center (Pittsburgh)
  • Texas Advanced Computing Center (U. Texas)
  • Indiana-Purdue
  • ANL-Chicago

SAN DIEGO SUPERCOMPUTER CENTER 4


TeraGrid: Integrating NSF Cyberinfrastructure

[Map: TeraGrid resource-provider sites – SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR – plus partner/network sites at Caltech, USC-ISI, Utah, Iowa, Cornell, Buffalo, UNC-RENCI, and Wisconsin.]

TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.

SAN DIEGO SUPERCOMPUTER CENTER 5


Measure of Supercomputers

• Top 500 list (HPL code performance)
  • Is one of the measures, but not the measure
  • Japan’s Earth Simulator (NEC) was on top for 3 years
• In Nov 2005 LLNL IBM BlueGene reached the top spot: ~65,000 nodes, 280 TFLOP on HPL, 367 TFLOP peak
  • First 100 TFLOP sustained on a real application last year
  • Very recently 200+ TFLOP sustained on a real application
• New HPCC benchmarks
• Many others – NAS, NERSC, NSF, DOD TI06, etc.
• Ultimate measure is the usefulness of a center for you – enabling better or new science through simulations on balanced machines

SAN DIEGO SUPERCOMPUTER CENTER 6


Top500 Benchmarks

• 27th Top 500 – June 2006

• NSF Supercomputer Centers in Top500

Rank  Site and system                                              Procs   Rmax (GFLOP)   Rpeak (GFLOP)   Nmax
#37   NCSA, PowerEdge 1750, P4 Xeon 3.06 GHz, Myrinet              2500    9819           15300           630000
#44   SDSC, IBM Power4 p655/690 1.5/1.7 GHz, Federation            2464    9121           15628           605000
#55   PSC, Cray XT3, 2.4 GHz AMD x86, XT3 internal interconnect    2060    7935.82        9888
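As a quick worked sketch, HPL efficiency is just Rmax divided by Rpeak; using the figures above plus the LLNL BlueGene numbers from the previous slide:

```python
# HPL efficiency = Rmax / Rpeak, using the figures quoted on these slides.
systems = {
    "NCSA PowerEdge 1750 (#37)": (9819.0, 15300.0),    # (Rmax, Rpeak) in GFLOP/s
    "SDSC DataStar (#44)":       (9121.0, 15628.0),
    "PSC Cray XT3 (#55)":        (7935.82, 9888.0),
    "LLNL BlueGene/L":           (280000.0, 367000.0), # 280 TFLOP HPL, 367 TFLOP peak
}
for name, (rmax, rpeak) in systems.items():
    print(f"{name:27s} HPL efficiency = {rmax / rpeak:.0%}")
```

The XT3 runs HPL at about 80% of peak, the Power4 system near 58%, illustrating why a single Top500 number is "one of the measures, but not the measure."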

SAN DIEGO SUPERCOMPUTER CENTER 7


Historical Trends in Top500

• 1000 X increase in top machine power in 10 years

SAN DIEGO SUPERCOMPUTER CENTER 8


Other Benchmarks

• HPCC – High Performance Computing Challenge benchmarks – no rankings

• NSF benchmarks – HPCC, SPIO, and applications: WRF, OOCORE, GAMESS, MILC, PARATEC, HOMME (these are changing; new ones are being considered)

• DoD HPCMP – TI06 benchmarks

SAN DIEGO SUPERCOMPUTER CENTER 9


Kiviat diagrams
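Kiviat (radar) diagrams plot one machine's scores on several benchmark axes at once, so balanced and unbalanced systems are easy to tell apart at a glance. A minimal matplotlib sketch of such a chart; the benchmark axes and normalized scores below are hypothetical placeholders, not results from this talk:

```python
# Minimal Kiviat (radar) chart sketch with matplotlib.
# The benchmark axes and normalized scores are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt

axes_labels = ["HPL", "STREAM", "RandomAccess", "FFT", "PTRANS"]   # hypothetical axes
scores      = [0.8, 0.6, 0.4, 0.7, 0.5]                            # hypothetical, 0..1

angles = np.linspace(0.0, 2.0 * np.pi, len(axes_labels), endpoint=False)
angles = np.concatenate([angles, angles[:1]])     # close the polygon
values = np.array(scores + scores[:1])

ax = plt.subplot(polar=True)
ax.plot(angles, values, marker="o")
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(axes_labels)
ax.set_ylim(0, 1)
ax.set_title("One machine across several benchmark axes")
plt.show()
```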

SAN DIEGO SUPERCOMPUTER CENTER 10


Capability Computing
• The full power of a machine is used for a given scientific problem, utilizing CPUs, memory, interconnect, and I/O performance
• Enables the solution of problems that cannot otherwise be solved in a reasonable period of time – figure of merit: time to solution
• E.g. moving from a two-dimensional to a three-dimensional simulation, using finer grids, or using more realistic models

Capacity Computing
• Modest problems are tackled, often simultaneously, on a machine, each with less demanding requirements
• Smaller or cheaper systems are used for capacity computing, where smaller problems are solved
• Used for parametric studies or to explore design alternatives
• The main figure of merit is sustained performance per unit cost

SAN DIEGO SUPERCOMPUTER CENTER 11


Strong Scaling
• For a fixed problem size, how does the time to solution vary with the number of processors?
• Run a fixed-size problem and plot the speedup
• When scaling of parallel codes is discussed it is normally strong scaling that is being referred to

Weak Scaling
• How the time to solution varies with processor count for a fixed problem size per processor
• Interesting for O(N) algorithms, where perfect weak scaling is a constant time to solution, independent of processor count
• Deviations from this indicate that either
  • the algorithm is not truly O(N), or
  • the overhead due to parallelism is increasing, or both
(Speedup and scaling efficiency are computed as in the sketch below.)
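A minimal sketch, in Python, of how these quantities are typically computed from measured run times; the timings below are illustrative placeholders, not measurements from any machine in this talk:

```python
# Strong scaling: fixed total problem size.  Speedup S(p) = T(1)/T(p),
# efficiency E(p) = S(p)/p.
# Weak scaling: fixed problem size per processor.  Efficiency E(p) = T(1)/T(p).
# All timings below are illustrative placeholders, not measurements.

strong_times = {1: 100.0, 2: 52.0, 4: 27.0, 8: 15.0}     # seconds, same total problem
weak_times   = {1: 10.0, 8: 10.4, 64: 11.0, 512: 12.5}   # seconds, fixed size per proc

t1 = strong_times[1]
for p, t in sorted(strong_times.items()):
    s = t1 / t
    print(f"strong: p={p:4d}  speedup={s:5.2f}  efficiency={s / p:.0%}")

t1 = weak_times[1]
for p, t in sorted(weak_times.items()):
    print(f"weak:   p={p:4d}  efficiency={t1 / t:.0%}")
```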

SAN DIEGO SUPERCOMPUTER CENTER 12


Weak Vs Strong Scaling Examples

• The linked cell algorithm employed in DL_POLY 3 [1] for the short ranged forces should be strictly O(N) in time.

• Study the weak scaling of three model systems (two shown next), the times being reported for HPCx, a large IBM P690+ cluster sited at Daresbury.

• http://www.cse.clrc.ac.uk/arc/dlpoly_scale.shtml
• I. J. Bush and W. Smith, CCLRC Daresbury Laboratory

SAN DIEGO SUPERCOMPUTER CENTER 13


Weak scaling for argon is shown. The smallest system size is 32,000 atoms, the largest 32,768,000. It can be seen that the scaling is very good, the time per step increasing from 0.6 s to 0.7 s in going from 1 processor to 1024. This simulation is a direct test of the linked cell algorithm, as it only requires short-ranged forces, and so the results show it is behaving as expected.

SAN DIEGO SUPERCOMPUTER CENTER 14


Weak scaling for water. The time per step increases from 1.9 seconds on 1 processor, where the system size is 20,736 particles, to 3.9 seconds on 1024 (system size 21,233,664). Ewald terms must also be calculated in this case, and constraint forces must be calculated as well. These forces are short-ranged and should scale as O(N); their calculation requires a large number of short messages to be sent, and some latency effects become appreciable.
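Plugging the per-step times quoted above for argon and water into the weak-scaling efficiency T(1)/T(P) gives a quick sketch of why the water system scales less well:

```python
# Weak-scaling efficiency T(1)/T(P) from the per-step times quoted above.
runs = {
    "Argon, 1 -> 1024 procs": (0.6, 0.7),   # seconds per step on 1 and 1024 processors
    "Water, 1 -> 1024 procs": (1.9, 3.9),
}
for name, (t1, tp) in runs.items():
    print(f"{name}: weak-scaling efficiency = {t1 / tp:.0%}")
# Argon stays near 86%, while water drops to ~49% because of the extra Ewald and
# constraint-force communication described above.
```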

SAN DIEGO SUPERCOMPUTER CENTER 15


Next Leap in Supercomputer Power

• PetaFLOP: 10^15 floating point operations/sec

• Multiple PFLOP machines are expected in the US during 2008–2011

• NSF, DOE (ORNL, LANL, NNSA) are considering this

• Similar initiatives in Japan and Europe

SAN DIEGO SUPERCOMPUTER CENTER 16


Topics

1. Supercomputing in General

2. Supercomputers at SDSC

3. Science Enabled at SDSC

SAN DIEGO SUPERCOMPUTER CENTER 17


[Chart: SDSC Data Science Environment. Axes: Compute (increasing FLOPS) vs. Data (increasing I/O and storage). Quadrant labels: Campus, Departmental and Desktop Computing; Traditional HEC Environment; Data Storage/Preservation Environment; Extreme I/O Environment. Example applications plotted include QCD, protein folding, turbulence (reattachment length; field data), CHARMM, Gaussian, CPMD, CFD, climate, NVO, EOL, Cypres, SCEC simulation and post-processing, and ENZO simulation and post-processing. Notes: 1. time variation of field-variable simulation; 2. out-of-core. SDSC's focus: apps in the top two (data-intensive) quadrants.]

SAN DIEGO SUPERCOMPUTER CENTER 18


SDSC Production Computing Environment: 25 TF compute, 1.4 PB disk, 6 PB tape

• DataStar: IBM Power4+, 15.6 TFlops
• TeraGrid Linux Cluster: IBM/Intel IA-64, 4.4 TFlops
• Blue Gene Data: IBM PowerPC, 2 x 5.7 TFlops
• Archival systems: 18 PB capacity (~3.5 PB used)
• Storage Area Network disk: 1400 TB, Sun F15K disk server

SAN DIEGO SUPERCOMPUTER CENTER 19


DataStar is a powerful compute resource well-suited to “extreme I/O” applications

• Peak speed 15.6 TFlops
• #44 in June 2006 Top500 list
• IBM Power4+ processors (2,528 total)
• Hybrid of 2 node types, all on a single switch
  • 272 8-way p655 nodes:
    • 176 with 1.5 GHz procs, 16 GB/node (2 GB/proc)
    • 96 with 1.7 GHz procs, 32 GB/node (4 GB/proc)
  • 11 32-way p690 nodes: 1.7 GHz, 64–256 GB/node (2–8 GB/proc)
• Federation switch: ~6 µs latency, ~1.4 GB/sec point-to-point bandwidth
  • At 283 nodes, ours is one of the largest IBM Federation switches
• All nodes are direct-attached to high-performance SAN disk: 3.8 GB/sec write, 2.0 GB/sec read to GPFS
• GPFS now has 115 TB capacity
• 225 TB of gpfs-wan across NCSA, ANL

Due to consistently high demand, in FY05 we added 96 1.7 GHz/32 GB p655 nodes and increased GPFS storage from 60 to 125 TB:
  - Enables 2,048-processor capability jobs
  - ~50% more throughput capacity
  - More GPFS capacity and bandwidth
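A small sanity-check sketch of the node and processor counts quoted above:

```python
# Node and processor counts for DataStar, as listed above.
p655_nodes, p655_way = 272, 8
p690_nodes, p690_way = 11, 32

nodes = p655_nodes + p690_nodes
procs = p655_nodes * p655_way + p690_nodes * p690_way
print("nodes =", nodes)   # 283, matching "At 283 nodes ..."
print("procs =", procs)   # 2528, matching "IBM Power4+ processors (2,528 total)"
```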

SAN DIEGO SUPERCOMPUTER CENTER 20


BG System Overview: Novel, massively parallel system from IBM

• Full system installed at LLNL from 4Q04 to 3Q05
  • 65,000+ compute nodes in 64 racks
  • Each node comprises two low-power PowerPC processors + memory
  • Compact footprint with very high processor density
  • Slow processors & modest memory per processor
  • Very high peak speed of 367 Tflop/s
  • #1 Linpack speed of 280 Tflop/s

• 1024 compute nodes in a single rack installed at SDSC in 4Q04
  • Another 1024 compute nodes will be installed soon
  • Maximum I/O configuration with 128 I/O nodes/rack for data-intensive computing

• Systems at 14 sites outside IBM & 4 within IBM as of 2Q06

• Need to select apps carefully
  • Must scale (at least weakly) to many processors (because they're slow)
  • Must fit in limited memory

SAN DIEGO SUPERCOMPUTER CENTER 21


Blue Gene packaging hierarchy (peak speed, memory):

  Chip (2 processors):                              2.8/5.6 GF/s,   4 MB
  Compute Card (2 chips, 2x1x1):                    5.6/11.2 GF/s,  0.5 GB DDR
  Node Board (32 chips, 4x4x2; 16 compute cards):   90/180 GF/s,    8 GB DDR
  Cabinet (32 node boards, 8x8x16):                 2.9/5.7 TF/s,   256 GB DDR
  System (64 cabinets, 64x32x32):                   180/360 TF/s,   16 TB DDR
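A sketch that rebuilds the peak-speed column of this hierarchy from the per-processor figures given on the "Processor Chip (2)" slide later in this talk (700 MHz, 4 flops per clock):

```python
# Peak speed at each Blue Gene/L packaging level, built up from one processor.
clock_ghz       = 0.7     # 700 MHz PowerPC 440
flops_per_clock = 4       # from the "Processor Chip (2)" slide
proc_gf = clock_ghz * flops_per_clock        # 2.8 GF/s per processor

chip_gf    = 2 * proc_gf                     # 2 processors/chip ->   5.6 GF/s
card_gf    = 2 * chip_gf                     # 2 chips/card      ->  11.2 GF/s
board_gf   = 16 * card_gf                    # 16 cards/board    -> 179.2 GF/s (~180)
cabinet_tf = 32 * board_gf / 1000.0          # 32 boards/cabinet -> ~5.7 TF/s per rack
system_tf  = 64 * cabinet_tf                 # 64 cabinets       -> ~367 TF/s

print(chip_gf, card_gf, board_gf, cabinet_tf, system_tf)
# The same arithmetic gives the 367 TFLOP Top500 peak quoted earlier:
# 65,536 nodes x 5.6 GF/s per node.
```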

SDSC was first academic institution with an IBM Blue Gene system

SDSC rack has maximum ratio of I/O to compute nodes at 1:8 (LLNL's is 1:64). Each of 128 I/O nodes in the rack has a 1 Gbps Ethernet connection => 16 GBps/rack potential.

SDSC procured the 1-rack system 12/04. Used initially for code evaluation and benchmarking; production 10/05. (LLNL system is 64 racks.)

Another rack (1,024 compute nodes) will be installed soon.
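The 16 GBps/rack figure follows directly from the numbers above; a one-line sketch:

```python
# Aggregate I/O bandwidth of one SDSC Blue Gene rack, from the numbers above.
io_nodes, gbps_per_node = 128, 1.0          # 128 I/O nodes, 1 Gbps Ethernet each
print(io_nodes * gbps_per_node / 8, "GB/s per rack")   # 16.0 GB/s
```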

SAN DIEGO SUPERCOMPUTER CENTER 22


SDSC Blue Gene - a new resource

• First academic installation of this novel architecture
• Configured for data-intensive computing
  • 1,024 compute nodes (soon to be 2,048), 128 I/O nodes
  • Peak compute performance of 5.7 TFLOPS (soon will be 11.4 TFLOPS)
  • Two 700-MHz PowerPC 440 CPUs, 512 MB per node
  • IBM network: 4 µs latency, 0.16 GB/sec point-to-point bandwidth
  • I/O rates of 3.4 GB/s for writes and 2.7 GB/s for reads achieved on GPFS-WAN
  • Has its own GPFS of 20 TB, plus gpfs-wan
• System targets runs of 512 CPUs or more
• Production in October 2005
• Multiple 1 million-SU awards at LRAC and several smaller awards for physics, engineering, biochemistry

In Dec '04, SDSC brought in a single-rack Blue Gene system
- Initially an experimental system to evaluate NSF applications on this unique architecture
- Tailored to high-I/O applications
- Entered production as an allocated resource in October 2005


SAN DIEGO SUPERCOMPUTER CENTER 23


BG System Overview: Processor Chip (1)

[Block diagram: the Blue Gene/L compute ASIC. Two PowerPC 440 cores (one can serve as the I/O processor), each with a “Double FPU” and 32k/32k L1 caches; per-core L2 with snoop; a multiported shared SRAM buffer; a shared L3 directory (with ECC) for 4 MB of embedded DRAM, usable as L3 cache or memory; DDR control with ECC driving 144-bit-wide external 512 MB DDR; and interfaces for the torus (6 out and 6 in, each link at 1.4 Gbit/s), the tree (3 out and 3 in, each link at 2.8 Gbit/s), the global interrupt network, Gbit Ethernet, and JTAG access. Internal datapaths are 128/256 bits wide, with a 1024+144-bit ECC path to the EDRAM.]

SAN DIEGO SUPERCOMPUTER CENTER 24


BG System Overview: Processor Chip (2) (= System-on-a-chip)

• Two 700-MHz PowerPC 440 processors
  • Each with two floating-point units
  • Each with 32-kB L1 data caches that are not coherent
  • 4 flops/proc-clock peak (= 2.8 Gflop/s-proc)
  • 2 8-B loads or stores / proc-clock peak in L1 (= 11.2 GB/s-proc)
• Shared 2-kB L2 cache (or prefetch buffer)
• Shared 4-MB L3 cache
• Five network controllers (though not all wired to each node)
  • 3-D torus (for point-to-point MPI operations: 175 MB/s nom x 6 links x 2 ways)
  • Tree (for most collective MPI operations: 350 MB/s nom x 3 links x 2 ways)
  • Global interrupt (for MPI_Barrier: low latency)
  • Gigabit Ethernet (for I/O)
  • JTAG (for machine control)
• Memory controller for 512 MB of off-chip, shared memory

SAN DIEGO SUPERCOMPUTER CENTER 25


DataStar p655 Usage, by Node Size

SAN DIEGO SUPERCOMPUTER CENTER 26


SDSC Academic Use, by Directorate

SAN DIEGO SUPERCOMPUTER CENTER 27


Strategic Applications Collaborations

• Cellulose to Ethanol: Biochemistry (J. Brady, Cornell)
• LES Turbulence: Mechanics (M. Krishnan, U. Minnesota)
• NEES: Earthquake Engr (Ahmed Elgamal, UCSD)
• ENZO: Astronomy (M. Norman, UCSD)
• EM Tomography: Neuroscience (M. Ellisman, UCSD)
• DNS Turbulence: Aerospace Engr (P.K. Yeung, Georgia Tech)
• NVO Mosaicking: Astronomy (R. Williams, Caltech; Alex Szalay, Johns Hopkins)
• Understanding Pronouns: Linguistics (A. Kehler, UCSD)
• Climate: Atmospheric Sc. (C. Wunsch, MIT)
• Protein Structure: Biochemistry (D. Baker, Univ. of Washington)
• SCEC, TeraShake: Geological Science (T. Jordan and C. Kesselman, USC; K. Olsen, UCSB; B. Minster, SIO)

SAN DIEGO SUPERCOMPUTER CENTER 28


Topics

1. Supercomputing in General

2. Supercomputers at SDSC

3. Science Enabled at SDSC

SAN DIEGO SUPERCOMPUTER CENTER 29


Enabling Users – User Centric Focus of SDSC

• SDSC leadership in enabling users
  • Recruit new users/communities
  • Enable their HPC
  • Help write allocation proposals
  • Make recruited users 100K–1000K allocated-SU users
    (D. Baker/U. Washington, C. Wunsch/MITgcm, Mark Ellisman/BIRN, NEES, P.K. Yeung/Georgia Tech, M. Krishnan/U. Minn, K. Droegemeier, GEON, etc.)

• Balanced machine – memory/node, I/O, queue management – these attract and retain users

• Work on community users, and community/3rd-party codes, tools, libraries

• Procure machines based on the needs and characteristics of users' codes

SAN DIEGO SUPERCOMPUTER CENTER 30


SAC Program and Science Enabled

• Achieve breakthrough computational science that users couldn’t do before

• Pair up SDSC's computational scientists (experts in many domain-science disciplines and in parallel computing) with NSF PIs for 3–12 months

• SAC projects span all the NSF directorates and universities across the US

• Scaling applications up (in processors/communication and I/O) is a major thrust

• Develop and apply solutions for wider user community

SAN DIEGO SUPERCOMPUTER CENTER 31


Scaling DNS Turbulence (PI: Dr. P.K. Yeung, Georgia Tech; SAC staff: Dr. Dmitry Pekurovsky)

• Original DNS code used for years to simulate a range of phenomena in turbulence and turbulent mixing

• Over the years the PI has had millions of allocated SUs on SDSC's and other NSF centers' machines

• Currently computing at 2048^3 resolution

• Would like to reach the grid size done on the Earth Simulator, i.e. 4096^3 resolution, to better understand physics at micro scales

• Original code is limited in scalability to N (4096) processors for an N^3 grid problem

SAN DIEGO SUPERCOMPUTER CENTER 32


2-D Parallel Decomposed Code

• Reimplemented with a 2-D parallel decomposition of the compute-intensive part (the 3D FFT)
• Now capable of scaling up to N^2 processors (16M) – see the sketch after the plot below
• New code successfully tested and running on 32,768 BG processors at the IBM Watson lab (4096^3 – first ever attempted in the US)
• By-product: an optimized library for scalable 3D FFT, for use in other codes. A beta version is available at the SDSC Web site. Currently using the library in another turbulence code, as part of another SAC project.

[Plot: scaling of the 2-D decomposed DNS code. X axis: Nproc (128 to 32,768, log scale); Y axis: N^3 log2(N) / T, i.e. the execution speed (number of steps per second of execution) normalized by the problem size; separate curves for 512^3, 1024^3, 2048^3, and 4096^3 grids.]
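A sketch of why the original 1-D (slab) decomposition tops out at N processors while the new 2-D (pencil) decomposition can use up to N^2; the grid sizes are the ones quoted above, and the limits are the generic ones for an N^3 FFT, not details of the actual library:

```python
# Decomposition limits for computing a 3D FFT on an N^3 grid.
# 1-D (slab):   each rank owns whole 2-D planes   -> at most N   ranks.
# 2-D (pencil): each rank owns a 1-D "pencil"     -> at most N^2 ranks.
for n in (512, 1024, 2048, 4096):
    print(f"N = {n:5d}:  slab limit = {n:6d} procs,  pencil limit = {n * n:>11,d} procs")
# For N = 4096 the pencil limit is 16,777,216 ("16M" processors), which is why the
# 2-D code could run the 4096^3 problem on 32,768 Blue Gene processors.
```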

SAN DIEGO SUPERCOMPUTER CENTER 33


SDSC Enables “CASP in 3 Hours” to Speed Simulations for Drug Design (SDSC recruited, parallelized code; millions of SUs) (PI: David Baker, U. Washington; SAC staff: Dr. Ross Walker)

• SDSC Blue Gene
  • Workhorse for the CASP 7 competition.
  • Provided access to an order of magnitude more computing power than was available for CASP 6.
  • Only NSF machine available that could provide a job "turnaround" (queue + run time) of less than 1 week for all CASP targets.
  • Test bed for "extreme scaling" modifications.

Provided a development environment to successfully scale HHMI Professor Baker's Rosetta code to over 40,000 processors using the IBM TJ Watson Blue Gene system.

• SDSC DataStar
  • Used for the 10% of CASP targets that required a large memory footprint.
  • 2,000+ CPU jobs possible for large structure-prediction problems.

Image shows the blind prediction (Blue) of a CASP7 target. Red shows the x-ray structure (released after the prediction was submitted) and Green shows a low resolution NMR structure. The prediction was performed by Ross Walker (SDSC) and Srivatsan Raman (UW) in an unprecedented 3 hours using 40,960 cpus of IBM TJWatson Blue Gene/L Machine. Such a calculation was only possible from the experience learnt via the SDSC SAC collaboration.

SAN DIEGO SUPERCOMPUTER CENTER 34


SDSC SAC Group Improves CHARMM Scaling for Cellulase Research (PIs from TSRI, NREL, Cornell; SAC staff Dr. Ross Walker)

• Cellulase: key enzyme in the production of cellulosic ethanol.
• Opportunity to reduce the USA's dependence on foreign oil.
• A true molecular machine.
• 1-million-atom+ simulations need high-performance capability computing.
• DataStar is the perfect platform for this.
• SDSC (Ross Walker) is working on improving the performance and scaling of the CHARMM MD code.
• Improvements will ultimately benefit thousands of researchers.

SAN DIEGO SUPERCOMPUTER CENTER 35


NEES (PIs – Iowa St., Stanford, Princeton, U. Missouri, UC Berkeley, Davis, etc.; SAC staff Dr. Dong Ju Choi)

• Network for Earthquake Engineering Simulation (NEES) is an NSF-funded MRE project.

• Provides world-class experimental facilities, coordinated IT (NEESit), data, networking and computational support, including HPC simulation support, to the NEES community.

• SDSC SAC staff is working with NEESit and NEES scientists to optimize code performance and scalability and to enable HPC for NEES community (new) users

Shake-table viz done by the SDSC Viz group – Steve Cutchin and Amit Chourasia

• SDSC recruited NEES for HPC usage and wrote a successful allocation proposal

SAN DIEGO SUPERCOMPUTER CENTER 36


NEES SAC

• OpenSees (core structural finite-element, object-oriented code for the NEES community): scalability is much improved using various parallel solver algorithms (PETSc, MUMPS, distributed SuperLU) and different communication schemes

• Recently demonstrated 2048-processor DataStar runs for 25 million elements, with good scalability and single-PE performance, on a Puente Hills earthquake simulation (originally the code modeled 1 million elements on a few procs)

• 13 sub-PIs (over 30 new users) are new to HPC but are using DataStar and the TG IA-64 through the NEES HPC allocation, as a type of community allocation

• Users are running their improved code and/or existing structural/fluid codes (OpenSees, LS-DYNA, Abaqus, Ansys, Fluent, etc.), resulting in significant increases in HPC usage

• Designed and developed a utility for parametric runs and worked with the users to successfully complete the jobs (a minimal sketch of such a sweep appears below)
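A minimal sketch of what such a parametric-run utility can look like; the parameter names, file layout, and invocation below are hypothetical, not the actual NEESit tool:

```python
# Hypothetical parameter-sweep driver: one run directory and command per
# parameter combination.  Names, paths, and parameters are illustrative only.
import itertools
import os

damping_ratios = [0.02, 0.05, 0.10]          # hypothetical study parameters
ground_motions = ["motion_a", "motion_b"]

commands = []
for damping, motion in itertools.product(damping_ratios, ground_motions):
    run_dir = f"run_d{damping:.2f}_{motion}"
    os.makedirs(run_dir, exist_ok=True)
    with open(os.path.join(run_dir, "params.tcl"), "w") as f:
        f.write(f"set damping {damping}\nset motion {motion}\n")
    commands.append(f"cd {run_dir} && OpenSees model.tcl")   # hypothetical invocation

with open("run_all.sh", "w") as f:
    f.write("\n".join(commands) + "\n")
print(f"generated {len(commands)} parametric runs")
```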

SAN DIEGO SUPERCOMPUTER CENTER 37


ENZO SAC – Scaling & Optimization (PI: Mike Norman, UCSD; SAC staff: Dr. Robert Harkness) (SDSC contributes to writing the allocation proposal)

• Non-AMR – Lyman Alpha Forest simulation – compare results of the simulation, based on the concordance model of cosmology, with observation to constrain cosmological parameters.

• AMR1 – cluster of galaxies, x-ray emissivity – comparison with x-ray observations. AMR2 – the AMR "light cone" simulations to support the construction of the LSST (Large Synoptic Survey Telescope)

• ENZO problem sizes increased by ~8^3, cost by ~8^4, in 3 years – expect a further increase of ~8^3 in 3 years (see the sketch below)

• Today non-AMR grids up to 2048^3 with 8 billion dark matter particles are possible on 2048 CPUs of DataStar, compared to 256^3 grids on about 64 processors a few years ago – this is a result of the SDSC SAC effort

• AMR 512^3 top-level grids with 7 levels of refinement, including 512^3 dark matter particles, generating > 350,000 subgrids (SAC effort resulted in an N^2 to N log N scaling improvement)

• Shared-memory parallelism used in the initial-conditions generator
• Massively parallel dark matter particle sort enables 100% parallel I/O
• Weak scaling shows linear behavior up to 2048 CPUs
• Strong scaling limited by ghost cells and boundary exchanges
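A quick worked sketch of the growth figures quoted above (256^3 to 2048^3 unigrids, ~8^3 in size and ~8^4 in cost):

```python
# ENZO problem-size growth, using the unigrid sizes quoted above.
old_n, new_n = 256, 2048
print(f"2048^3 cells       = {new_n ** 3:,}")              # ~8.6 billion
print(f"size growth factor = {(new_n // old_n) ** 3}x")    # 512x = 8^3
print(f"cost growth factor = {(new_n // old_n) ** 4}x")    # 4096x = 8^4: finer grids
                                                           # also need ~8x more steps
```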

SAN DIEGO SUPERCOMPUTER CENTER 38


ENZO – New physics and enhanced scaling

• ENZO will incorporate MHD and 3D flux-limited diffusion
• Advanced parallel multigrid solvers for gravity and RT
• Refactoring of the AMR grid hierarchy for unlimited scaling
• Gadget equilibrium cooling
• Unigrid scale-up to 4096^3 and 8192^3 at petascale on 16K to 64K processors
• 2048^3 L6 AMR at petascale
• I/O strategies for managing multi-petabyte results
• Integrated visualization, steering and tracking

2048^3 LAF on 2048 CPUs of DataStar (only NSF machine capable of this – 5 TB memory required)

SAN DIEGO SUPERCOMPUTER CENTER 39


SDSC SAC TeraShake Efforts (PIs: Tom Jordan, USC; K. Olsen, SDSU; B. Minster, SIO; SAC staff Dr. Yifeng Cui) (SDSC helps with the allocation proposal)

Before SDSC SAC involvement
• Code handled up to 56 million mesh points
• Code scaled up to 512 processors
• Ran on local clusters only
• No checkpoint/restart capability
• Wave propagation simulation only
• K. Olsen's own code
• Poor single-processor performance
• Initialization slow, and memory problems
• MPI-I/O bugs, not scalable

After SDSC SAC efforts
• Code enhanced to handle an 8.6-billion-point mesh
• Excellent speed-up to 2048 processors, achieving 1 Tflop/s
• Ported to DataStar, BG/L, TG IA-64, Lemieux, etc.
• Added checkpoint/restart/checksum capability
• Integrated dynamic rupture + wave propagation as one code
• Serves as the SCEC Community Velocity Model
• 4x speed-up of single-processor performance
• 10x speed-up of initialization, and memory needs reduced
• MPI-I/O improved 10x, generating 47 TB of output per run

[Plot: TeraShake code total execution time on IBM Power4 DataStar. X axis: number of processors (120, 240, 480, 960, 1920); Y axis: wall-clock time in seconds for 101 steps (log scale, 10 to 10,000). Curves: WCT with improved I/O, WCT ideal, WCT with TeraShake-2, WCT with TeraShake-1; parallel efficiencies of 95% and 86% are marked. Run parameters – source region: 600 x 300 x 80 km; mesh: 3000 x 1500 x 400; spatial resolution: 200 m; number of steps: 101; output every time step.]
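The mesh dimensions in the run parameters above follow directly from the domain size and the 200 m spacing; a small sketch:

```python
# TeraShake mesh dimensions from the domain size and resolution quoted above.
domain_km = (600, 300, 80)       # simulated region, km
spacing_m = 200                  # spatial resolution, m

dims = [int(d_km * 1000 / spacing_m) for d_km in domain_km]
points = dims[0] * dims[1] * dims[2]
print("mesh dims  :", dims)              # [3000, 1500, 400]
print(f"mesh points: {points:,}")        # 1,800,000,000 for this 200 m run
```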

SAN DIEGO SUPERCOMPUTER CENTER 40


Real World Engr Flows – PI: Mahesh Krishnan, U. Minnesota (contributed to acquiring a million SUs)

• Numerical methods and turbulence models that handle real-world engineering geometries without compromising the accuracy needed to reliably simulate the complicated details of turbulence

• DNS of turbulent jet in cross flow: 12 million control volumes (CV), 144 DS procs
• Propeller crashback: 13 million CV, 384 TG procs, Re ~480,000
• Spatially evolving turbulent round jet:
  today: ~50 million CV (unstructured) on 1024 DataStar procs, Re ~2400
  yesterday: ~6.5 million CV on 160 DataStar procs, Re ~1000
• Fourier spectral code runs on Blue Gene – SDSC SAC effort ongoing for memory scaling

A direct numerical simulation (DNS) of a turbulent jet, resolving the turbulence without modeling approximations: www.aem.umn.edu/~mahesh/forsdsc/jic_vort.avi

Simulation of flow around a propeller in sudden reversal, known as crashback. Flow is left to right; the image shows streamlines and pressure contours in the cross-section.

SAN DIEGO SUPERCOMPUTER CENTER 41


SDSC Enables Accurate Simulation of Sun’s Corona (PI: Chuck Goodrich, BC, Z. Mikic, SAIC)

• The most true-to-life computer simulation ever made of our sun's multimillion-degree outer atmosphere, the corona, successfully predicted its actual appearance during the total solar eclipse of March 29, 2006

• The demanding calculations required four days running on more than 600 processors of the DataStar system at the SDSC

• Computer model based on spacecraft observations of magnetic activity

• More realistic physics of how energy is transferred in the corona

• PMaC (Allan Snavely, Nick Wright) group involved in scaling work

A composite of observations of the eclipse. Solar north is up. Solar Physics Group, SAIC; Williams College Eclipse Expedition with support from NSF/NASA/National Geographic, and SOHO, supported by NASA and ESA

SAN DIEGO SUPERCOMPUTER CENTER 42


Longest-ever Simulation of Type Ia Supernova (Alexei Khokhlov, Don Lamb – U. Chicago)

• The first self-consistent 3-D numerical simulation of the Type Ia supernova deflagration explosion from the moment of ignition through the active explosion phase and followed up to the period of 11 days

• The current state of the art multidimensional models of such astrophysical phenomena have typically followed the evolution of the system for a few tens of seconds

• Post-explosion evolution of Type Ia supernova lasts for much longer periods of time going through various stages with different physical processes being important at different stages

• On 512 DS processors – total SU usage in August was ~30,000; overall SU usage, including development & testing of the numerical code, was ~200,000 SUs

Flame structure in the star at 2 sec and at 77 min (color scale: 0 – unburned, or fuel; 1 – totally burned)

SAN DIEGO SUPERCOMPUTER CENTER 43


Estimating the State of the Southern Ocean – Carl Wunsch, Matt Mazloff, MIT, ECCO Consortium (SDSC recruited this group and enabled them to get a million SUs)

• “The ECCO group faces a computationally massive problem that is only feasible thanks to computing centers like the SDSC.” – Matt Mazloff, MIT

• Diagnosing and evaluating the state of the Southern Ocean

• Global ocean circulation impacts climate change; ocean currents affect fisheries dynamics, shipping, offshore mining, sea-level height change, sea surface temperatures, storm development, and seasonal droughts and floods

• Key adjoint method used in the MITgcm code – a balanced machine, 4 GB/proc, and good I/O are vital

• Simulations on DataStar provided an improved estimate of the Southern Ocean for the year 2000

• Received about one million SUs last March on DataStar; the short-term goal is to improve the year 2000 estimate and extend it through 2003

SAN DIEGO SUPERCOMPUTER CENTER 44


Thank you