SAN DIEGO SUPERCOMPUTER CENTER
Oct 16 2006
at the UNIVERSITY OF CALIFORNIA, SAN DIEGO
Overview of HPC SDSC Machines
Science Enabled at SDSC
Amit Majumdar
Scientific Computing Applications Group, San Diego Supercomputer Center, University of California San Diego
Topics
1. Supercomputing in General
2. Supercomputers at SDSC
3. Science Enabled at SDSC
DOE, DOD, NASA, NSF Centers in US
• DOE National Labs - LANL, LLNL, Sandia
• DOE Office of Science Labs - ORNL, NERSC
• DOD, NASA Supercomputer Centers
• National Science Foundation supercomputer centers for academic users
  • San Diego Supercomputer Center (UCSD)
  • National Center for Supercomputing Applications (UIUC)
  • Pittsburgh Supercomputing Center (Pittsburgh)
  • Texas Advanced Computing Center (U. Texas)
  • Indiana-Purdue
  • ANL-Chicago
TeraGrid: Integrating NSF Cyberinfrastructure
[Figure: map of TeraGrid sites. Labels: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC-ISI, Utah, Iowa, Cornell, Buffalo, UNC-RENCI, Wisc]
TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.
Measure of Supercomputers
• Top 500 list (HPL code performance)
  • Is one of the measures, but not the measure
  • Japan’s Earth Simulator (NEC) was on top for 3 years
• In Nov 2005 LLNL IBM BlueGene reached the top spot: ~65,000 nodes, 280 TFLOP on HPL, 367 TFLOP peak
  • First 100 TFLOP sustained on a real application last year
  • Very recently 200+ TFLOP sustained on a real application
• New HPCC benchmarks
• Many others - NAS, NERSC, NSF, DOD TI06 etc.
• Ultimate measure is usefulness of a center for you - enabling better or new science through simulations on balanced machines
Top500 Benchmarks
• 27th Top 500 – June 2006
• NSF Supercomputer Centers in Top500
Rank  Site, system                                                Procs  Rmax (GFLOP)  Rpeak (GFLOP)  Nmax
#37   NCSA, PowerEdge 1750, P4 Xeon, 3.06 GHz, Myrinet            2500   9819          15300          630000
#44   SDSC, IBM Power4, P655/690, 1.5/1.7 GHz, Federation         2464   9121          15628          605000
#55   PSC, Cray XT3, 2.4 GHz AMD-X86, XT3 internal interconnect   2060   7935.82       9888           -
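One way to read the Rmax and Rpeak columns together is as an HPL efficiency, the fraction of theoretical peak that the benchmark actually achieved. The sketch below is an illustration only; the dictionary keys and the function name `hpl_efficiency` are chosen here and are not part of the Top500 data.

```python
# Rmax and Rpeak (GFLOP/s) copied from the three Top500 entries listed above.
systems = {
    "NCSA PowerEdge 1750": (9819.0, 15300.0),
    "SDSC DataStar": (9121.0, 15628.0),
    "PSC Cray XT3": (7935.82, 9888.0),
}


def hpl_efficiency(rmax, rpeak):
    """Fraction of theoretical peak achieved on the HPL benchmark."""
    return rmax / rpeak


efficiencies = {name: hpl_efficiency(*values) for name, values in systems.items()}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%} of peak on HPL")
```

A higher ratio on HPL does not by itself say how a machine will behave on a real application, which is the point of the "one of the measures, but not the measure" remark earlier.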
Historical Trends in Top500
• 1000 X increase in top machine power in 10 years
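To put the 1000x-in-10-years figure in more familiar terms, it can be converted to an annual growth factor and a doubling time. This is a small worked calculation assuming smooth exponential growth, which the real list only approximates; the variable names are chosen for illustration.

```python
import math

growth_factor = 1000.0  # increase in top machine power quoted above
years = 10.0            # period over which that increase occurred

# Constant yearly multiplier that compounds to 1000x over 10 years.
annual_growth = growth_factor ** (1.0 / years)

# Time needed to double at that constant rate.
doubling_time_years = years * math.log(2) / math.log(growth_factor)

print(f"annual growth factor: {annual_growth:.3f}")
print(f"doubling time: {doubling_time_years:.2f} years")
```

Under this assumption the top machine roughly doubles in power every year.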
Other Benchmarks
• HPCC – High Performance Computing Challenge benchmarks – no rankings
• NSF benchmarks - HPCC, SPIO, and applications: WRF, OOCORE, GAMESS, MILC, PARATEC, HOMME (these are changing; new ones are being considered)
• DoD HPCMP – TI06 benchmarks
Kiviat diagrams
Capability Computing
• Full power of a machine is used for a given scientific problem, utilizing CPUs, memory, interconnect, I/O performance
• Enables the solution of problems that cannot otherwise be solved in a reasonable period of time - figure of merit is time to solution
• E.g., moving from a two-dimensional to a three-dimensional simulation, using finer grids, or using more realistic models
Capacity Computing
• Modest problems are tackled, often simultaneously, on a machine, each with less demanding requirements
• Smaller or cheaper systems are used for capacity computing, where smaller problems are solved
• Used for parametric studies or to explore design alternatives
• The main figure of merit is sustained performance per unit cost
Strong Scaling
• For a fixed problem size how does the time to solution vary with the number of processors
• Run a fixed size problem and plot the speedup
• When scaling of parallel codes is discussed it is normally strong scaling that is being referred to
Weak Scaling
• How the time to solution varies with processor count with a fixed problem size per processor
• Interesting for O(N) algorithms where perfect weak scaling is a constant time to solution, independent of processor count
• Deviations from this indicate that either
  • The algorithm is not truly O(N), or
  • The overhead due to parallelism is increasing, or both
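The two notions of scaling above can be written down as simple ratios. The sketch below uses function names chosen for illustration, and the timings in the example calls are hypothetical values, not measurements from any SDSC machine.

```python
def strong_speedup(t1, tp):
    """Speedup for a fixed total problem size: time on 1 processor / time on p."""
    return t1 / tp


def strong_efficiency(t1, tp, p):
    """Strong-scaling efficiency: achieved speedup divided by the ideal speedup p."""
    return t1 / (p * tp)


def weak_efficiency(t1, tp):
    """Weak-scaling efficiency for a fixed problem size per processor.

    Perfect weak scaling keeps the time constant, so the ratio is 1.0.
    """
    return t1 / tp


# Hypothetical timings, for illustration only.
example_speedup = strong_speedup(100.0, 2.0)             # fixed problem, 64 processors
example_strong_eff = strong_efficiency(100.0, 2.0, 64)
example_weak_eff = weak_efficiency(10.0, 12.5)           # problem grown with processor count

print(example_speedup, example_strong_eff, example_weak_eff)
```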
Weak Vs Strong Scaling Examples
• The linked cell algorithm employed in DL_POLY 3 [1] for the short ranged forces should be strictly O(N) in time.
• Study the weak scaling of three model systems (two shown next), the times being reported for HPCx, a large IBM P690+ cluster sited at Daresbury.
• http://www.cse.clrc.ac.uk/arc/dlpoly_scale.shtml
• I.J. Bush and W. Smith, CCLRC Daresbury Laboratory
Weak scaling for Argon is shown. The smallest system size is 32,000 atoms, the largest 32,768,000. It can be seen that the scaling is very good, the time per step increasing from 0.6 s to 0.7 s on going from 1 processor to 1024. This simulation is a direct test of the linked cell algorithm as it only requires short ranged forces, and so the results show it is behaving as expected.
Weak scaling for water. The time per step increases from 1.9 seconds on 1 processor, where the system size is 20,736 particles, to 3.9 seconds on 1024 (system size 21,233,664). In this case Ewald terms must be calculated, and constraint forces must be calculated as well. The constraint forces are short range and should scale as O(N), but their calculation requires a large number of short messages to be sent, and some latency effects become appreciable.
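The timings quoted for the two DL_POLY cases can be turned into weak-scaling efficiencies (time on 1 processor divided by time on 1024 processors). The sketch below only restates numbers from the text; the dictionary layout and function name are chosen for illustration.

```python
# (seconds per step on 1 processor, seconds per step on 1024 processors)
timings = {
    "argon": (0.6, 0.7),
    "water": (1.9, 3.9),
}

# (system size on 1 processor, system size on 1024 processors)
sizes = {
    "argon": (32_000, 32_768_000),
    "water": (20_736, 21_233_664),
}


def weak_scaling_efficiency(t_base, t_scaled):
    """1.0 means the time per step did not grow as the problem and machine grew together."""
    return t_base / t_scaled


efficiency = {name: weak_scaling_efficiency(*t) for name, t in timings.items()}

# In a weak-scaling study the work per processor is held fixed, so the
# largest system should be exactly 1024 times the smallest.
size_ratio = {name: big // small for name, (small, big) in sizes.items()}

for name in timings:
    print(f"{name}: efficiency {efficiency[name]:.0%}, size ratio {size_ratio[name]}")
```

The argon case stays near 86%, while the water case drops to about 49%, consistent with the latency effects described for the constraint forces.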
Next Leap in Supercomputer Power
• PetaFLOP: 10^15 floating point operations/sec
• Expected multiple PFLOP(s) machines in the US during 2008 - 2011
• NSF, DOE (ORNL, LANL, NNSA) are considering this
• Similar initiative in Japan, Europe
Topic
1. Supercomputing in General
2. Supercomputers at SDSC
3. Science Enabled at SDSC
[Figure: quadrant diagram of application environments. Vertical axis: Data (increasing I/O and storage). Horizontal axis: Compute (increasing FLOPS).]
Regions: SDSC Data Science Env; Traditional HEC Env; Campus, Departmental and Desktop Computing; Data Storage/Preservation Env; Extreme I/O Environment (1. Time Variation of Field Variable Simulation, 2. Out-of-Core)
Applications shown: QCD, Protein Folding, Turbulence Reattachment length, CHARMM, Gaussian, CPMD, NVO, EOL, Cypres, SCEC Post-processing, ENZO Post-processing, CFD, Turbulence field, Climate, SCEC Simulation, ENZO simulation
SDSC’s focus: Apps in top two quadrants
SDSC Production Computing Environment
25 TF compute, 1.4 PB disk, 6 PB tape
• DataStar: IBM Power4+, 15.6 TFlops
• TeraGrid Linux Cluster: IBM/Intel IA-64, 4.4 TFlops
• Blue Gene Data: IBM PowerPC, 2 x 5.7 TFlops
• Storage Area Network Disk: 1400 TB
• Sun F15K Disk Server
• Archival Systems: 18 PB capacity (~3.5 PB used)
DataStar is a powerful compute resource well-suited to “extreme I/O” applications
• Peak speed 15.6 TFlops
• #44 in June 2006 Top500 list
• IBM Power4+ processors (2528 total)
• Hybrid of 2 node types, all on single switch
• 272 8-way p655 nodes:
• 176 1.5 GHz proc, 16 GB/node (2 GB/proc)
• 96 1.7 GHz proc, 32 GB/node (4 GB/proc)
• 11 32-way p690 nodes: 1.7 GHz, 64-256 GB/node (2-8 GB/proc)
• Federation switch: ~6 µsec latency, ~1.4 GB/sec pp-bandwidth
• At 283 nodes, ours is one of the largest IBM Federation switches
• All nodes are direct-attached to high-performance SAN disk, 3.8 GB/sec write, 2.0 GB/sec read to GPFS
• GPFS now has 115TB capacity
• 225 TB of gpfs-wan across NCSA, ANL
Due to consistent high demand, in FY05 we added 96 1.7GHz/32GB p655 nodes & increased GPFS storage from 60 -> 125TB
- Enables 2048-processor capability jobs
- ~50% more throughput capacity
- More GPFS capacity and bandwidth
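The node counts listed for DataStar can be cross-checked against the quoted totals of 2528 processors and 283 nodes. The sketch below is simple bookkeeping over the figures on the slide; the data layout is chosen for illustration, and p690 memory is left out because only a range (64-256 GB/node) is given.

```python
# Node groups as listed for DataStar; gb_per_node is None where only a range is quoted.
node_groups = [
    {"model": "p655", "nodes": 176, "procs_per_node": 8, "ghz": 1.5, "gb_per_node": 16},
    {"model": "p655", "nodes": 96, "procs_per_node": 8, "ghz": 1.7, "gb_per_node": 32},
    {"model": "p690", "nodes": 11, "procs_per_node": 32, "ghz": 1.7, "gb_per_node": None},
]

total_nodes = sum(g["nodes"] for g in node_groups)
total_procs = sum(g["nodes"] * g["procs_per_node"] for g in node_groups)

p655_groups = [g for g in node_groups if g["model"] == "p655"]
p655_procs = sum(g["nodes"] * g["procs_per_node"] for g in p655_groups)
p655_memory_gb = sum(g["nodes"] * g["gb_per_node"] for g in p655_groups)

print(f"nodes: {total_nodes}, processors: {total_procs}")
print(f"p655 partition: {p655_procs} processors, {p655_memory_gb} GB memory")
```

The p655 partition alone holds 2176 processors, which is why the added nodes make 2048-processor capability jobs possible.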
BG System Overview: Novel, massively parallel system from IBM
• Full system installed at LLNL from 4Q04 to 3Q05
  • 65,000+ compute nodes in 64 racks
  • Each node being two low-power PowerPC processors + memory
  • Compact footprint with very high processor density
  • Slow processors & modest memory per processor
  • Very high peak speed of 367 Tflop/s
  • #1 Linpack speed of 280 Tflop/s
• 1024 compute nodes in single rack installed at SDSC in 4Q04
  • Another 1024 compute nodes will be installed soon
  • Maximum I/O-configuration with 128 I/O nodes/rack for data-intensive computing
• Systems at 14 sites outside IBM & 4 within IBM as of 2Q06
• Need to select apps carefully
  • Must scale (at least weakly) to many processors (because they’re slow)
  • Must fit in limited memory
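[Figure: Blue Gene packaging hierarchy, with peak rate (two values per level, as on the slide) and memory at each level]
• Chip (2 processors): 2.8/5.6 GF/s, 4 MB
• Compute Card (2 chips, 2x1x1): 5.6/11.2 GF/s, 0.5 GB DDR
• Node Board (32 chips, 4x4x2; 16 compute cards): 90/180 GF/s, 8 GB DDR
• Cabinet (32 node boards, 8x8x16): 2.9/5.7 TF/s, 256 GB DDR
• System (64 cabinets, 64x32x32): 180/360 TF/s, 16 TB DDR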
SDSC was the first academic institution with an IBM Blue Gene system.
SDSC rack has maximum ratio of I/O to compute nodes at 1:8 (LLNL’s is 1:64). Each of 128 I/O nodes in rack has 1 Gbps Ethernet connection => 16 GBps/rack potential.
SDSC procured 1-rack system 12/04. Used initially for code evaluation and benchmarking; production 10/05. (LLNL system is 64 racks.)
Another 1,024 compute nodes will be installed soon.
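The "16 GBps/rack potential" figure follows directly from the I/O node count and the per-node Ethernet link speed. The short calculation below restates that arithmetic; the LLNL per-rack values are derived here from the quoted 1:64 ratio as an assumption, not taken from the slide.

```python
COMPUTE_NODES_PER_RACK = 1024
LINK_GBIT_PER_S = 1  # one Gigabit Ethernet connection per I/O node


def rack_io(io_nodes):
    """Return (compute nodes per I/O node, aggregate Ethernet bandwidth in GB/s)."""
    compute_per_io = COMPUTE_NODES_PER_RACK // io_nodes
    aggregate_gbyte_per_s = io_nodes * LINK_GBIT_PER_S / 8  # 8 bits per byte
    return compute_per_io, aggregate_gbyte_per_s


# SDSC rack: 128 I/O nodes, as stated above.
sdsc_ratio, sdsc_gbyte_per_s = rack_io(128)

# Assumption: a 1:64 ratio implies 1024 / 64 = 16 I/O nodes per LLNL rack.
llnl_ratio, llnl_gbyte_per_s = rack_io(COMPUTE_NODES_PER_RACK // 64)

print(f"SDSC: 1:{sdsc_ratio}, {sdsc_gbyte_per_s} GB/s per rack")
print(f"LLNL (derived): 1:{llnl_ratio}, {llnl_gbyte_per_s} GB/s per rack")
```

These are link-rate upper bounds; the measured GPFS rates reported elsewhere in the talk are lower.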
SDSC Blue Gene - a new resource
• First academic installation of this novel architecture
• Configured for data-intensive computing
  • 1,024 compute nodes (soon to be 2,048), 128 I/O nodes
  • Peak compute performance of 5.7 TFLOPS (soon will be 11.4 TFLOPS)
  • Two 700-MHz PowerPC 440 CPUs, 512 MB per node
  • IBM network: 4 µs latency, 0.16 GB/sec pp-bandwidth
  • I/O rates of 3.4 GB/s for writes and 2.7 GB/s for reads achieved on GPFS-WAN
  • Has own GPFS of 20 TB and gpfs-wan
• System targets runs of 512 CPUs or more
• Production in October 2005
  • Multiple 1 million-SU awards at LRAC and several smaller awards for physics, engineering, biochemistry
In Dec ‘04, SDSC brought in a single-rack Blue Gene system
- Initially an experimental system to evaluate NSF applications on this unique architecture
- Tailored to high I/O applications
- Entered production as allocated resource in October 2005
BG System Overview: Processor Chip (1)
[Figure: block diagram of the Blue Gene processor chip. Labeled components: two 440 CPUs (one usable as I/O processor), each with a “Double FPU” and 32k/32k L1; two L2 units with snoop; multiported shared SRAM buffer; shared L3 directory for EDRAM (includes ECC); 4MB EDRAM usable as L3 cache or memory; DDR control with ECC (144 bit wide DDR, 512MB); Gbit Ethernet; JTAG access; torus (6 out and 6 in, each at 1.4 Gbit/s link); tree (3 out and 3 in, each at 2.8 Gbit/s link); global interrupt. Internal bus widths shown: 128, 256, 1024+144 ECC.]
BG System Overview: Processor Chip (2) (= System-on-a-chip)
• Two 700-MHz PowerPC 440 processors
  • Each with two floating-point units
  • Each with 32-kB L1 data caches that are not coherent
  • 4 flops/proc-clock peak (=2.8 Gflop/s-proc)
  • 2 8-B loads or stores / proc-clock peak in L1 (=11.2 GB/s-proc)
• Shared 2-kB L2 cache (or prefetch buffer)
• Shared 4-MB L3 cache
• Five network controllers (though not all wired to each node)
  • 3-D torus (for point-to-point MPI operations: 175 MB/s nom x 6 links x 2 ways)
  • Tree (for most collective MPI operations: 350 MB/s nom x 3 links x 2 ways)
  • Global interrupt (for MPI_Barrier: low latency)
  • Gigabit Ethernet (for I/O)
  • JTAG (for machine control)
• Memory controller for 512 MB of off-chip, shared memory
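The peak figures quoted for the chip follow from the clock rate and the per-clock limits, and multiplying up by node count reproduces the rack and full-system peaks mentioned earlier in the talk. The sketch below restates that arithmetic; the constant and function names are illustrative, and the 64 x 1024 node count for the LLNL system is an assumption based on "64 racks" and "1024 compute nodes in single rack".

```python
CLOCK_MHZ = 700          # PowerPC 440 clock
FLOPS_PER_CLOCK = 4      # two floating-point units per processor, peak
PROCS_PER_NODE = 2

# Per-processor peaks (MFLOP/s and MB/s, integer arithmetic).
peak_mflops_per_proc = CLOCK_MHZ * FLOPS_PER_CLOCK   # 2800 = 2.8 Gflop/s
l1_mb_per_s_per_proc = 2 * 8 * CLOCK_MHZ             # two 8-byte accesses per clock

# Nominal per-node network bandwidth: MB/s per link x links x directions.
torus_mb_per_s = 175 * 6 * 2
tree_mb_per_s = 350 * 3 * 2


def peak_tflops(nodes):
    """Peak TFLOP/s for a given number of two-processor compute nodes."""
    return nodes * PROCS_PER_NODE * peak_mflops_per_proc / 1e6


sdsc_rack_peak = peak_tflops(1024)         # one rack
llnl_system_peak = peak_tflops(64 * 1024)  # assumed 64 racks of 1024 nodes

print(peak_mflops_per_proc, l1_mb_per_s_per_proc, torus_mb_per_s, tree_mb_per_s)
print(f"rack: {sdsc_rack_peak:.1f} TFLOP/s, full system: {llnl_system_peak:.0f} TFLOP/s")
```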
DataStar p655 Usage, by Node Size
SDSC Academic Use, by Directorate
Strategic Applications Collaborations
• Cellulose to Ethanol: Biochemistry (J. Brady, Cornell)
• LES Turbulence: Mechanics (M. Krishnan, U. Minnesota)
• NEES: Earthquake Engr (Ahmed Elgamal, UCSD)
• ENZO: Astronomy (M. Norman, UCSD)
• EM Tomography: Neuroscience (M. Ellisman, UCSD)
• DNS Turbulence: Aerospace Engr (PK Yeung, Georgia Tech)
• NVO Mosaicking: Astronomy (R. Williams, Caltech; Alex Szalay, Johns Hopkins)
• Understanding Pronouns: Linguistics (A. Kehler, UCSD)
• Climate: Atmospheric Sc. (C. Wunsch, MIT)
• Protein Structure: Biochemistry (D. Baker, Univ. of Washington)
• SCEC, TeraShake: Geological Science (T. Jordan and C. Kesselman, USC; K. Olsen, UCSB; B. Minster, SIO)
Topic
1. Supercomputing in General
2. Supercomputers at SDSC
3. Science Enabled at SDSC
Enabling Users – User Centric Focus of SDSC
• SDSC leadership in enabling users
  • Recruit new users/communities
  • Enable their HPC
  • Help write allocations proposals
  • Make recruited users 100K - 1000K allocated SU users
  • (D. Baker/U. Washington, C. Wunsch/MITgcm, Mark Ellisman/BIRN, NEES, PK Yeung/Georgia Tech, M. Krishnan/U. Minn, K. Droegemeier, GEON etc.)
• Balanced machine – memory/node, I/O, queue management – these attract users and retain users
• Work on community users, and comm/3rd party codes, tools, libs
• Procure machines based on needs and characteristics of users’ codes
SAC Program and Science Enabled
• Achieve breakthrough computational science that users couldn’t do before
• Pair up SDSC’s computational scientists (experts in many domain-science disciplines and in parallel computing) with NSF PIs for 3-12 months
• Span all the NSF directorates and universities across US for SAC projects
• Scaling up applications (procs/communication and I/O) is a major thrust
• Develop and apply solutions for wider user community
Scaling DNS Turbulence (PI: Dr. P.K.Yeung, Georgia Tech, SAC staff Dr. Dmitry Pekurovsky)
• Original DNS code used for years to simulate a range of phenomena in turbulence and turbulent mixing
• Over the years the PI has had millions of allocated SUs on SDSC and other NSF centers’ machines
• Currently computing at 2048^3 resolution
• Would like to reach the grid size done on the Earth Simulator, i.e. 4096^3 resolution, to better understand physics at micro scales
• Original code is limited in scalability to N processors (4096) for an N^3 grid problem
2-D Parallel Decomposed Code
• Reimplemented in 2-D parallel decomposition of the compute-intensive part (3D FFT)
• Now capable of scaling up to N^2 processors (16M)
• New code successfully tested and running on 32,768 BG processors at IBM Watson lab (4096^3 - first ever attempted in US)
• By-product: optimized library for scalable 3D FFT, for use in other codes. Beta version available at SDSC Web site. Currently using the library in another turbulence code, as part of another SAC project.
[Figure: scaling of the 2-D decomposed DNS code. X-axis: Nproc (log scale, 1.E+02 to 1.E+05; runs from 128 to 32768 processors). Y-axis: N^3 LOG2(N) / T. Curves: 512^3, 1024^3, 2048^3, 4096^3.]
The execution speed (# of steps per second of execution),
normalized by the problem size, is plotted on the Y-axis.
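The processor limits of the two decompositions can be sketched as follows: a 1-D (slab) decomposition of an N^3 grid can use at most N processors, while a 2-D (pencil) decomposition can use up to N^2. This is a minimal illustration, not the SAC code itself; the function names and the 128 x 256 processor grid in the example are hypothetical.

```python
def max_procs_1d(n):
    """Slab decomposition: each processor needs at least one full 2-D plane."""
    return n


def max_procs_2d(n):
    """Pencil decomposition: each processor needs at least one full 1-D line."""
    return n * n


def pencil_shape(n, p_rows, p_cols):
    """Local block size on each processor for an evenly divisible 2-D decomposition.

    One dimension stays whole (so a 1-D FFT can run along it locally);
    the other two are split across a p_rows x p_cols processor grid.
    """
    if n % p_rows or n % p_cols:
        raise ValueError("grid size must be divisible by both processor-grid dimensions")
    return (n, n // p_rows, n // p_cols)


n = 4096
limit_1d = max_procs_1d(n)
limit_2d = max_procs_2d(n)

# Hypothetical layout of 32,768 processors as a 128 x 256 grid.
local_block = pencil_shape(n, 128, 256)

print(limit_1d, limit_2d, local_block)
```

With N = 4096 the 1-D limit is 4096 processors, which is why the 32,768-processor run needed the 2-D version.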
SDSC Enables “CASP in 3 Hours” to Speed Simulations for Drug Design (SDSC recruited, parallelized code; millions of SUs) (PI David Baker, U. Washington; SAC staff Dr. Ross Walker)
• SDSC Blue Gene
  • Work horse for CASP 7 competition.
  • Provided access to an order of magnitude more computing power than was available for CASP 6.
  • Only NSF machine available that could provide a job “turnaround” (Queue+Runtime) of less than 1 week for all CASP targets.
  • Test bed for “extreme scaling” modifications. Provided a development environment to successfully scale HHMI Professor Baker’s Rosetta code to over 40,000 processors using the IBM TJ Watson Blue Gene system.
• SDSC Datastar
  • Used for the 10% of CASP targets that required a large memory footprint.
  • 2000+ cpu jobs possible for large structure prediction problems.
Image shows the blind prediction (Blue) of a CASP7 target. Red shows the x-ray structure (released after the prediction was submitted) and Green shows a low resolution NMR structure. The prediction was performed by Ross Walker (SDSC) and Srivatsan Raman (UW) in an unprecedented 3 hours using 40,960 cpus of the IBM TJ Watson Blue Gene/L machine. Such a calculation was possible only because of the experience gained through the SDSC SAC collaboration.
SDSC SAC Group Improves CHARMM Scaling for Cellulase Research (PIs from TSRI, NREL, Cornell; SAC staff Dr. Ross Walker)
• Cellulase: key enzyme in the production of cellulosic ethanol.
• Opportunity to reduce the USA’s dependence on foreign oil.
• True molecular machine.
• 1 million atom+ simulations need high performance capability computing.
• Datastar is the perfect platform for this.
• SDSC (Ross Walker) is working on improving the performance and scaling of the CHARMM MD code.
• Improvements will ultimately benefit thousands of researchers.
NEES (PIs - Iowa St., Stanford, Princeton, U. Missouri, UC Berkeley, Davis etc.; SAC Staff Dr. Dong Ju Choi)
• Network for Earthquake Engineering Simulation (NEES) is an NSF-funded MRE project.
• Provides world-class experimental facilities, coordinated IT (NEESit), data, networking and computational support, including HPC simulation support, to the NEES community.
• SDSC SAC staff is working with NEESit and NEES scientists to optimize code performance and scalability and to enable HPC for NEES community (new) users
• SDSC recruited NEES for HPC usage and wrote successful allocation proposal
Shaketable Viz done by SDSC Viz group - Steve Cutchin and Amit Chourasia
NEES SAC
• OpenSees (core structural finite element object-oriented code for the NEES community): scalability is much improved using various parallel solver algorithms (PETSc, MUMPS, Distributed SuperLU) and different communication schemes
• Recently demonstrated 2048-processor DataStar runs for 25 million elements with good scalability and single PE performance on Puente Hills earthquake simulation (originally the code was modeling 1 million elements on a few procs)
• 13 sub-PIs (over 30 new users) are new to HPC but using the DataStar and the TG ia64 through the NEES HPC allocation as a type of community allocation
• Users are using their improved code and/or existing structural/fluid codes (OpenSees, LS-Dyna, Abaqus, Ansys, Fluent etc.), which has resulted in significant increases in HPC usage
• Designed and developed a utility for parametric runs and worked with the users to successfully complete the jobs
ENZO SAC - Scaling & Optimization (PI: Mike Norman, UCSD; SAC staff: Dr. Robert Harkness)
(SDSC contributes to writing alloc prop)
• NonAMR - Lyman Alpha Forest simulation - compare results of the simulation based on concordance model of cosmology with observation to constrain cosmological parameters.
• AMR1 - cluster of galaxies, x-ray emissivity - comparison with x-ray obs.
• AMR2 - The AMR “light cone” simulations to support the construction of the LSST (Large Synoptic Survey Telescope)
• ENZO problem sizes increased by ~8^3, cost ~8^4 in 3 years - expect a further increase of ~8^3 in 3 years
• Today non-AMR grids up to 2048^3 with 8 billion dark matter particles possible on 2048 cpus of DataStar compared to 256^3 grids on about 64 processors a few years ago - this is a result of the SDSC SAC effort
• AMR 512^3 top-level grids with 7 levels of refinement, including 512^3 dark matter particles, generating > 350,000 subgrids (SAC effort resulted in N^2 to NlogN scaling improvement)
• Shared-memory parallelism used in initial conditions generator
• Massively parallel dark matter particle sort enables 100% parallel I/O
• Weak scaling shows linear behavior up to 2048 cpus
• Strong scaling limited by ghost cells and boundary exchanges
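The size figures in the ENZO bullets can be checked with a few lines of arithmetic. The sketch below confirms that going from 256^3 to 2048^3 is the quoted ~8^3 growth and that 2048^3 is about 8.6 billion points, and it estimates how much an N^2 to N log N improvement matters at 350,000 subgrids. Treating the log as base 2 and ignoring constant factors are assumptions made here for illustration.

```python
import math

points_before = 256 ** 3
points_now = 2048 ** 3

# Growth in grid size; 2048 / 256 = 8 per dimension, so 8^3 overall.
growth = points_now // points_before

# Rough operation counts for work over the subgrid hierarchy.
subgrids = 350_000
pairwise_ops = subgrids ** 2                  # O(N^2) behaviour before the SAC effort
nlogn_ops = subgrids * math.log2(subgrids)    # O(N log N) behaviour after (base 2 assumed)
reduction = pairwise_ops / nlogn_ops

print(f"2048^3 = {points_now:,} points, growth factor {growth}")
print(f"N^2 / (N log2 N) at N = {subgrids:,}: about {reduction:,.0f}x")
```

The extra factor of 8 in cost (8^4 rather than 8^3) is commonly attributed to the smaller time step needed at finer resolution; that explanation is an assumption here, not stated on the slide.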
ENZO – New physics and enhanced scaling
• ENZO will incorporate MHD and 3D flux-limited diffusion
• Advanced parallel multigrid solvers for gravity and RT
• Refactoring of AMR grid hierarchy for unlimited scaling
• Gadget equilibrium cooling
• Unigrid scale up to 4096^3 and 8192^3 at Petascale on 16K to 64K processors
• 2048^3 L6 AMR at Petascale
• I/O strategies for managing multi-Petabyte results
• Integrated visualization, steering and tracking
[Figure: 2048^3 LAF on 2048 CPUs of DataStar (only NSF machine capable of this - 5TB memory required)]
SDSC SAC TeraShake Efforts (PIs: Tom Jordan, USC; K. Olsen, SDSU; B. Minster, SIO; SAC Staff Dr. Yifeng Cui) (SDSC helps in allocation proposal)
Before SDSC SAC involvement
• Code handles a mesh of up to 56 million points
• Code scales up to 512 processors
• Ran on local clusters only
• No checkpoint/restart capability
• Wave propagation simulation only
• K. Olsen’s own code
• Poor single-processor performance
• Slow initialization and memory problems
• MPI-I/O bugs, not scalable
After SDSC SAC efforts
• Codes enhanced to handle a mesh of 8.6 billion points
• Excellent speed-up to 2048 processors, achieving 1 Tflop/s
• Ported to Datastar, BG/L, TG IA-64, Lemieux etc.
• Added checkpoint/restart/checksum capability
• Integrated dynamic rupture + wave propagation as one
• Serves as SCEC Community Velocity Model
• 4x speed-up of single-processor performance
• 10x speed-up of initialization, and memory needs reduced
• MPI-I/O improved 10x, generating 47TB outputs per run
[Figure: TeraShake code total execution time on IBM Power4 Datastar. Y-axis: wall clock time (sec, 101 steps), log scale from 10 to 10,000. X-axis: number of processors (120, 240, 480, 960, 1920). Curves: WCT with improved I/O, WCT ideal, WCT with TeraShake-2, WCT with TeraShake-1. Efficiency annotations: 95%, 86%, 86%.]
Source: 600x300x80 km
Mesh: 3000x1500x400
Spatial resolution: 200 m
Number of steps: 101
Output: every time step
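The benchmark configuration above is internally consistent, as a short calculation shows: dividing the source volume by the mesh dimensions gives the quoted 200 m resolution, and the mesh holds 1.8 billion points. The sketch also includes a generic strong-scaling efficiency helper of the kind behind the percentage annotations on the plot; the timings passed to it are hypothetical, since the plotted values are not recoverable from the transcript.

```python
# Benchmark configuration from the figure notes (extent converted to metres).
extent_m = (600_000, 300_000, 80_000)
mesh = (3000, 1500, 400)

# Grid spacing along each axis, in metres.
spacing_m = tuple(e // n for e, n in zip(extent_m, mesh))

mesh_points = mesh[0] * mesh[1] * mesh[2]


def parallel_efficiency(t_base, p_base, t_p, p):
    """Strong-scaling efficiency relative to a baseline run on p_base processors."""
    return (t_base * p_base) / (t_p * p)


# Hypothetical timings: doubling processors and exactly halving the time is 100%.
example_efficiency = parallel_efficiency(100.0, 120, 50.0, 240)

print(spacing_m, f"{mesh_points:,} mesh points", example_efficiency)
```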
Real World Engr Flows - PI: Mahesh Krishnan, U. Minnesota (contributed to acquiring million SUs)
• Numerical methods and turbulence models that handle real-world engineering geometries without compromising the accuracy needed to reliably simulate the complicated details of turbulence
• DNS of turbulent jet in cross flow: 12 million control volumes (CV), 144 DS procs
• Propeller crashback: 13 million CV, 384 TG procs, Re ~480,000
• Spatially evolving turbulent round jet:
  • today: ~50 million CV (unstructured) on 1024 DataStar procs, Re ~2400
  • yesterday: ~6.5 million CV on 160 DataStar procs, Re ~1000
• Fourier Spectral code runs on Blue Gene - SDSC SAC effort ongoing for memory scaling
[Figure: An exact simulation, without approximations, of a turbulent jet using DNS. www.aem.umn.edu/~mahesh/forsdsc/jic_vort.avi]
[Figure: Simulation of flow around a propeller in sudden reversal, known as crashback. Flow is left to right; the image shows streamlines and pressure contours in the cross-section.]
SDSC Enables Accurate Simulation of Sun’s Corona (PI: Chuck Goodrich, BC, Z. Mikic, SAIC)
• The most true-to-life computer simulation ever made of our sun's multimillion-degree outer atmosphere, the corona, successfully predicted its actual appearance during the total solar eclipse of March 29, 2006
• The demanding calculations required four days running on more than 600 processors of the DataStar system at the SDSC
• Computer model based on spacecraft observations of magnetic activity
• More realistic physics of how energy is transferred in the corona
• PMaC (Allan Snavely, Nick Wright) group involved in scaling work
A composite of observations of the eclipse. Solar north is up. Solar Physics Group, SAIC; Williams College Eclipse Expedition with support from NSF/NASA/National Geographic, and SOHO, supported by NASA and ESA
Longest-ever Simulation of Type Ia Supernova
Alexei Khokhlov, Don Lamb - U. Chicago
• The first self-consistent 3-D numerical simulation of the Type Ia supernova deflagration explosion from the moment of ignition through the active explosion phase and followed up to the period of 11 days
• The current state of the art multidimensional models of such astrophysical phenomena have typically followed the evolution of the system for a few tens of seconds
• Post-explosion evolution of Type Ia supernova lasts for much longer periods of time going through various stages with different physical processes being important at different stages
• On 512 DS processors - total SU usage in August was ~30,000; overall SU usage, including development & testing of the numerical code, was ~200,000
[Figure: Flame structure in the star at 2 sec and at 77 min. Scale: 0 - unburned or fuel; 1 - totally burned]
Estimating the State of the Southern Ocean
Carl Wunsch, Matt Mazloff, MIT, ECCO Consort.
(recruited this group and enabled to get million SUs)
• “The ECCO group faces a computationally massive problem that is only feasible thanks to computing centers like the SDSC.” Matt Mazloff, MIT
• Diagnosing and evaluating the state of the Southern Ocean
• Global ocean circulation impacts climate change; ocean currents affect fisheries dynamics, shipping, offshore mining, sea level height change, sea surface temperatures, storm development, seasonal droughts and floods
• Key adjoint method used in the MITgcm code – balanced machine, 4 GB/procs, good I/O vital
• Simulations on DataStar provided improved estimate of southern ocean for the year 2000
• Received about one million SUs last March on DataStar; short term goal is to improve year 2000 estimate and extend thru 2003