
Computer Science and Mathematics Division Overview

Briefing for RAMS Program

Barney Maccabe, PhD, Director, Computer Science and Mathematics Division, Oak Ridge National Laboratory

January 21, 2009

Computer Science and Mathematics Division

Mission

Our mission includes basic research in the computational sciences and the application of advanced computing systems and computational, mathematical, and analysis techniques to the solution of scientific problems of national importance. We seek to work collaboratively with universities throughout the world to enhance science education and progress in the computational sciences.

Vision

The Computer Science and Mathematics Division (CSMD) seeks to maintain its position as the premier location among DOE laboratories, and to become the premier location worldwide, where outstanding scientists, computer scientists, and mathematicians can perform interdisciplinary computational research.


The Computer Science and Mathematics Division (CSMD) is ORNL's premier source of basic and applied research in high-performance computing, applied mathematics, and intelligent systems. Its research programs focus on computational sciences, intelligent systems, and information technologies.

CSMD Organization (organization chart dated 1/7/2009)

Division Office: A. B. Maccabe, Director; L. M. Wolfe, Division Secretary; S. W. Poole, Chief Scientist and Director of Special Programs

Advisory Committee: Jerry Bernholc, David Keyes, Thomas Sterling, Warren Washington

Operations Council
• Finance: U. F. Henderson
• Technical Information and Communications: B. A. Riley6b
• Human Resources: N. Y. Wright
• Recruiter: W. L. Peper
• Facility 5600/5700: D. L. Pack, J. Gergel
• Computer Security: J. G. Winfree6
• ESH/Safety Officer: D. L. Pack
• Quality Assurance: L. C. Pouchard6
• CSB Computing Center Manager: M. W. Dobbs6
• Division Training Officer: L. M. Wolfe

Research Groups
• Complex Systems: J. Barhen, P. A. Boyd, Y. Y. Braiman, C. W. Glover, T. S. Humble, N. Imam, S. M. Lenhart7, B. Liu5, L. E. Parker7, N. S. V. Rao, D. B. Reister7, F. M. Rudakov3, D. R. Tufano, B. D. Sullivan
• Computer Science Research: G. A. Geist, C. Sonewald, P. K. Agarwal, D. E. Bernholdt, M. L. Chen, X. Cheng, J. J. Dongarra7, W. R. Elwasif, C. Engelmann, S. S. Hampton5, G. H. Kora8, J. A. Kohl, A. Longo2, X. Ma2, T. J. Naughton, H. H. Ong, B-H. Park, N. F. Samatova2, S. L. Scott, A. Tikotekar, A. Shet5a, G. R. Vallee, S. Vazhkudai, W. R. Wing, M. Wolf2
• Statistics and Data Science: A. B. Maccabe (acting), L. E. Thurston, R. W. Counts, B. L. Jackson, G. Ostrouchov, L. C. Pouchard, D. D. Schmoyer, K. A. Stewart6, D. A. Wolf
• Computational Earth Sciences: J. B. Drake, C. Sonewald, M. L. Branstetter, D. J. Erickson, K. J. Evans, M. W. Ham, F. M. Hoffman, R. T. Mills, P. H. Worley
• Computational Mathematics: E. F. D’Azevedo, L. E. Thurston, V. Alexiades7, R. K. Archibald, B. V. Asokan5, V. K. Chakravarthy, R. Deiterding, G. I. Fann, S. N. Fata, L. J. Gray, C. S. Groer, R. Hartman-Baker6, J. C. Hill, J. Jia5, A. K. Khamayseh, M. R. Leuze7, V. Mantic2a, Y. Ou8, S. Pannala, P. Plechac2, O. Sallah2a
• Computational Materials Science: J. C. Wells, K. L. Lewis, G. Alvarez6a, R. M. Day8, Y. Gao2, A. A. Gorin, I. Jouline2, P. R. Kent8, T. A. Maier6a, D. M. Nicholson, P. K. Nukala, B. Radhakrishnan, G. B. Sarma, S. Simunovic, X. Tao5, X-G. Zhang6a
• Computational Chemical Sciences: R. J. Harrison, L. E. Thurston, E. Apra, J. Bernholc, A. Beste8, D. Crawford2, E. Cruz-Siva5, T. Fridman8, M. Goswami5, J. Huang5, W. Lu2, V. Meunier, M. B. Nardelli2, C. Pan, K. Saha5, W. A. Shelton, B. G. Sumpter, A. Vazquez-Mayagoitia
• Computational Astrophysics: A. Mezzacappa6a, P. A. Boyd, C. Y. Cardall6a, E. Endeve5,6a, H. R. Hix6a, E. Lentz5,6a, B. Messer6a
• Computational Engineering and Energy Sciences: J. A. Turner, T. S. Darland
• Future Technologies: J. S. Vetter, T. S. Darland, S. R. Alam, D. A. Bader2, C. B. McCurdy5, J. S. Meredith, K. J. Roche, P. C. Roth, T. L. Sterling2a, O. O. Storaasli, V. Tipparaju
• Application Performance Tools: R. Graham, T. S. Darland, R. F. Barrett, L. G. Broto5, S. W. Hodson, T. R. Jones, G. A. Koenig, J. A. Kuehn, J. S. Ladd, T. C. Schulthess
• TeraGrid: J. Cobb, Lead

Centers
• Center for Molecular Biophysics: J. C. Smith, Director
• Center for Engineering Science Advanced Research (CESAR): J. Barhen, Director
• Experimental Computing Laboratory (ExCL): J. S. Vetter, Director

Key to superscripts: 1 Co-op; 2 Joint Faculty / 2a ORISE Faculty; 3 Wigner Fellow; 4 Householder Fellow; 5 Postdoc / 5a Postmaster; 6 Dual Assignment / 6a Matrix / 6b Off-site Assignment; 7 Part-time; 8 JICS

Computer Science and Mathematics

• General CS areas
− Architectures & performance measurement
− Tools & I/O
− System software & programming environments

• General math areas
− Computational math (linear algebra, meshing, multiscale methods)
− Data science

• "Rigorous Confidence Measures for Decision Making"
− V&V, uncertainty quantification, sensitivity analysis


Computational Science

• Materials Science

• Chemistry

• Astrophysics

• Molecular Biophysics

• Engineering and Energy Science

• Complex Systems
− Spanning virtual to physical


The Big Challenge: Computation at Scale

• The end goal is impact on science related to energy

• Computational Science provides the context

• Mathematics and Data Science provide the foundations

• Computer Science provides the architectures, tools, and programming environments needed for computation

• How do we provide the right computational environment for science?

• How do we think differently about computation?


Institutes

• Institute for Advanced Architectures and Algorithms
− Math (algorithms) and computer science (architecture and tools)
− ORNL and Sandia

• Extreme Scale System Center
− Focus on DARPA HPCS systems
− DoD, DARPA, DOE, and ORNL

• Understand and influence architectures


Computer Science Research

• Heterogeneous Distributed Computing – PVM, Harness, Open MPI (NCCS)

• Holistic Fault Tolerance – CIFTS

• CCA – Changing the way scientific software is developed and used

• Cluster computing management and reliability – OSCAR, MOLAR

• Building a new way to do Neutron Science – SNS portal

• Building tools to enable the LCF science teams – Workbench

• Data-Intensive Computing for Complex Biological Systems – BioPilot

• Earth System Grid – Turning climate datasets into community resources (SDS, Climate)

• Robust storage management from supercomputer to desktop – Freeloader

• UltraScienceNet – Defining the future of national networking

• Miscellaneous – electronic lab notebooks, Cumulvs, bilab, smart dust


Perform basic research and develop software and tools to make high performance computing more effective and accessible for scientists and engineers.


Sponsors include:

• SciDAC: Performance Engineering Research Institute; Scientific Data Management; Petascale Data Storage Institute; Visualization (VACET); Fusion; COMPASS
• DOE Office of Science: Fast OS – MOLAR; Software Effectiveness
• DOD: HPC Modernization Program
• NSA: Peta-SSI FASTOS
• DARPA: Peta-Scale Performance (DARPA HPCS)
• LDRD: Performance Tools for Large Scale Systems; FPGA Programmability; Perumalla
• NCCS: Vendor Interaction Evaluations; Scientific Computing

Special Projects / Future Technologies

Research mission: perform basic research in core technologies for future generations of high-end computing architectures and system software, including experimental computing systems, with the goal of improving the performance, reliability, and usability of these architectures for users.

Topics include:
• Emerging architectures
− IBM Cell (i.e., PlayStation)
− Graphics processors (e.g., NVIDIA)
− FPGAs (e.g., Xilinx, Altera, Cray, SRC)
− Cray XMT multithreaded architecture
• Operating systems
− Hypervisors
− Lightweight kernels for HPC
• Programming systems
− Portable programming models for heterogeneous systems
• Parallel I/O (see the sketch after this list)
− Improving Lustre for Cray
• Performance modeling and analysis
− Improving performance on today's systems
− Modeling performance on tomorrow's systems (e.g., DARPA HPCS)
− Tools for understanding performance
• Applications
− Fusion, physics, bio
• Visualization
− New methods for CNMS
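To make the parallel I/O topic above concrete, here is a minimal sketch, not CSMD or Lustre code: a generic MPI-IO pattern (file name, slice size, and data are made up for illustration) in which every rank writes its own contiguous slice of a shared file with one collective call, the kind of access pattern the Lustre work aims to make efficient.

```c
#include <mpi.h>

#define LOCAL_COUNT 1024   /* doubles per rank; illustrative only */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[LOCAL_COUNT];
    for (int i = 0; i < LOCAL_COUNT; i++)
        buf[i] = (double)rank;             /* dummy data */

    /* All ranks write disjoint slices of one shared file with a single
     * collective call, letting the MPI-IO layer aggregate the requests
     * into large, well-aligned operations for the parallel file system. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "slices.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    MPI_Offset offset = (MPI_Offset)rank * LOCAL_COUNT * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, LOCAL_COUNT, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```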

Complex Systems


Examples of current research topics:
• Missile defense: C2BMC (tracking and discrimination), NATO, flash hyperspectral imaging
• Modeling and simulation: sensitivity and uncertainty analysis, global optimization
• Laser arrays: directed energy, ultraweak signal detection, terahertz sources, underwater communications, SNS laser stripping
• Terascale embedded computing: emerging multicore processors for real-time signal processing applications (Cell, HyperX, …)
• Anti-submarine warfare: ultra-sensitive detection, coherent sensor networks, advanced computational architectures for nuclear submarines, Doppler-sensitive waveforms, synthetic aperture sonar
• Quantum optics: cryptography, quantum teleportation
• Computer science: UltraScience network (40-100 Gb per λ)
• Intelligent systems: neural networks, mobile robotics

Mission: support DOD and the Intelligence Community. Theory – Computation – Experiments.

Sponsors: DOD (AFRL, DARPA, MDA, ONR, NAVSEA), DOE (SC), NASA, NSF, IC (CIA, DNI/DTO, NSA)

UltraScience Net

Statistics and Data Science
• Chemical and Biological Mass Spectrometer Project
• Discrimination of UXO
• Forensics - Time Since Death
• A Statistical Framework for Guiding Visualization of Petascale Data Sets
• Statistical Visualization on Ultra High Resolution Displays
• Local Feature Motion Density Analysis
• Statistical Decomposition of Time Varying Simulation Data
• Site-wide Estimation of Item Density from Limited Area Samples
• Network Intrusion Detection
• Bayesian Individual Dose Estimation
• Sparse Matrix Computation for Complex Problems
• Environmental Tobacco Smoke (ETS)
• Explosive Detection Canines
• Fingerprint Uniqueness Research
• Chemical Security Assessment Tool (CSAT)
• Sharing a World of Data: Scaling the Earth Systems Grid to Petascale
• Group Violent Intent Modeling Project
• ORCAT: A desktop tool for the intelligence analyst
• Data model for end-to-end simulations with Leadership Class Computing
• A Knowledge-Based Middleware and Visualization Framework for the Virtual Soldier Project
• GWAVA: An Information Retrieval Web application for the Virtual Autopsy


Computational Mathematics

• Development of multiresolution analysis for integro-differential equations and Y-PDE
• Boundary integral modeling of Functionally Graded Materials (FGMs)
• Large-scale parallel Cartesian structured adaptive mesh refinement
• Fast multipole / non-uniform FFTs
• New large-scale first-principles electronic structure code
• New electronic structure method
• Fracture of 3-D cubic lattice system
• Adventure system
• Eigensolver with low-rank upgrades for spin-fermion models



The Science Case for Peta- (and Exa-) scale Computing

• Energy
− Climate, materials, chemistry, combustion, biology, nuclear fusion/fission
• Fundamental sciences
− Materials, chemistry, astrophysics
• There are many others
− QCD, accelerator physics, wind, solar, engineering design (aircraft, ships, cars, buildings)
• What are the key system attribute issues?
− Processor speed
− Memory (capacity, B/W, latency)
− Interconnect (B/W, latency)

Computational Chemical Sciences

• Application areas
− Chemistry, materials, nanoscience, electronics
• Major techniques
− Atomistic and quantum modeling of chemical processes
− Statistical mechanics and chemical reaction dynamics
• Strong ties to experiment
− Catalysis, fuel cells (H2), atomic microscopy, neutron spectroscopy
• Theory
− Electronic structure, statistical mechanics, reaction mechanisms, polymer chemistry, electron transport
• Development
− Programming models for petascale computers
− Petascale applications


Computational Chemical Sciences is focused on the development and application of major new capabilities for the rigorous modeling of large molecular systems found in, e.g., catalysis, energy science, and nanoscale chemistry.

Chemical Science: a partnership of experiment, theory, and simulation working toward shared goals. New simulation capabilities draw on applied math, theoretical & computational chemistry, and computer science.

Funding sources: OBES, OASCR, NIH, DARPA


Science Case: Example - Chemistry

• 250 TF
− Accurate large-scale, all-electron, density functional simulations
− Absorption on transition metal oxide surfaces (an important catalytic phenomenon)
− Benchmarking of density functionals (importance of Hartree-Fock exchange)
• 1 PF
− Dynamics of few-electron systems
− Model of few-electron systems' interaction with intense radiation to a guaranteed finite precision
• Sustained PF
− Treatment of the absorption problem with larger unit cells to avoid any source of error
• 1 EF
− Extension of the interaction with intense radiation to more realistic systems containing more electrons

Design catalysts, understand radiation interaction with materials, quantitative prediction of absorption processes.


System Attribute: Example - Memory B/W

• Application drivers
− Multi-physics, multi-scale applications stress memory bandwidth as much as they do node memory capacity
− "Intuitive" coding styles often produce poor memory access patterns: poor data structures, excessive data copying, indirect addressing
• Algorithm drivers
− Unstructured grids, linear algebra
− Example: AMR codes work on blocks or patches, encapsulating small amounts of work per memory access to make the codes readable and maintainable; this sort of structure requires very good memory B/W to achieve good performance
• Memory B/W suffers in the multi-core future
− Must applications simply "get used to not having any"?
• Methods to (easily) exploit multiple levels of hierarchical memory are needed
• Cache blocking, cache blocking, cache blocking (see the sketch below)
• Gather-scatter
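As a minimal sketch of the cache-blocking point (illustrative only; the matrix size N and tile size B are arbitrary choices, not taken from any CSMD code), compare a naive matrix transpose, which strides through one array and wastes most of each cache line, with a tiled version that reuses cache-resident data:

```c
#include <stddef.h>

#define N 4096   /* matrix dimension (illustrative) */
#define B 64     /* tile edge; chosen so a B x B tile of doubles fits in cache */

/* Naive transpose: the column-wise writes to 'out' stride through memory,
 * so almost every access touches a new cache line. */
void transpose_naive(const double *in, double *out)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            out[j * N + i] = in[i * N + j];
}

/* Blocked (tiled) transpose: the same work done one B x B tile at a time,
 * so each cache line brought in is reused before it is evicted. */
void transpose_blocked(const double *in, double *out)
{
    for (size_t ii = 0; ii < N; ii += B)
        for (size_t jj = 0; jj < N; jj += B)
            for (size_t i = ii; i < ii + B; i++)
                for (size_t j = jj; j < jj + B; j++)
                    out[j * N + i] = in[i * N + j];
}
```

The same idea carries over to the gather-scatter bullet: packing indirectly addressed elements into a contiguous buffer before the compute loop trades an explicit copy for regular, bandwidth-friendly access.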


System Attribute: Interconnect Latency

• Application drivers
− Biology: the stated goal of 1 ms of simulated time per wall-clock day requires about 0.1 μs of wall-clock time per time step using current algorithms (with typical ~1 fs time steps, that is roughly 10^12 steps in 86,400 seconds)
− Others: chemistry, materials science
• Algorithm drivers
− Explicit algorithms using nearest-neighbor or systolic communication
− Medium- to fine-grain parallelization strategies (e.g., various distributed-data approaches in computational chemistry)
• The speed of light puts a fundamental limit on interconnect latency
− Yet raw compute power keeps getting faster, increasing the imbalance
• Path forward
− Need new algorithms to meet performance goals
− The combination of software and hardware must allow communication and computation to be fully overlapped (see the sketch below)
− Specialized hardware for common operations? Synchronization; global and semi-global reductions
− Vectorization/multithreading of communication?
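A common way to chase the overlap called for above is nonblocking message passing. The sketch below is a generic 1-D halo exchange in C with MPI (the field u, the neighbor ranks, and the simple averaging update are all illustrative, not a CSMD code): interior work proceeds while the halo messages are, ideally, in flight.

```c
#include <mpi.h>

/* Illustrative 1-D halo exchange overlapped with interior work.
 * u[0] and u[n+1] are ghost cells; u[1..n] are owned points.
 * 'left' and 'right' are neighbor ranks (MPI_PROC_NULL at the ends). */
void halo_update(double *u, double *unew, int n,
                 int left, int right, MPI_Comm comm)
{
    MPI_Request req[4];

    /* Post the halo exchange without blocking. */
    MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Irecv(&u[n + 1], 1, MPI_DOUBLE, right, 1, comm, &req[1]);
    MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, comm, &req[2]);
    MPI_Isend(&u[n],     1, MPI_DOUBLE, right, 0, comm, &req[3]);

    /* Interior points need no remote data: compute them while the
     * messages are (ideally) in flight. The update rule is a stand-in. */
    for (int i = 2; i <= n - 1; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    /* Only the two edge points have to wait for communication. */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    unew[1] = 0.5 * (u[0] + u[2]);
    unew[n] = 0.5 * (u[n - 1] + u[n + 1]);
}
```

Whether this actually hides latency depends on the MPI library and network hardware making asynchronous progress, which is exactly the hardware/software co-design question raised above.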


Analysis of applications drives architecture

[Table: per-application software requirements. Programming models: Fortran 77, Fortran 90, Fortran 95, C/C++. Scaling aids: MPI, OpenMP, Pthreads, CAF/UPC, SHMEM, MPI-2, assembler. Performance libraries: BLAS, FFTW, LAPACK, PBLAS, PETSc, ScaLAPACK, Cray SciLib. I/O: HDF5, MPI-IO, netCDF. Applications surveyed: DNS, MILC, NAMD, WRF, POP, HOMME, CICE, RMG, PARSEC, Espresso, LSMS, SPECFEM3D, Chimera, GTC, GAMESS.]

[Chart: application-driven importance scores for system attributes: node peak flops, memory bandwidth, interconnect latency, memory latency, interconnect bandwidth, node memory capacity, disk bandwidth, local storage capacity, disk latency, WAN network bandwidth, mean time to interrupt, and archival storage capacity. Scores shown: 96, 91, 88, 84, 82, 77, 61, 56, 44, 40, 40, 38. Maximum possible score = 97; minimum possible score = 19.]

• System design driven by usability across a wide range of applications

• We understand the applications' needs and their implications for architectures to deliver science at the exascale

Engaging application communities to determine system requirements


Community engaged / process of engagement / outcome:

• Users: Enabling Petascale Science and Engineering Applications Workshop (December 2005, Atlanta, GA). Outcome: science breakthroughs requiring sustained petascale; community benchmarks.

• Application Developers: Sustained Petaflops: Science Requirements, Application Design and Potential Breakthroughs Workshop (November 2006, Oak Ridge, TN). Outcome: application walk-throughs; identification of important HPC system characteristics.

• Application Teams: weekly conference calls (November 2006–January 2007). Outcome: benchmark results; model problems.

• Application Teams: Cray Application Projection Workshop (January 2007, Oak Ridge, TN). Outcome: projection of benchmarks on the XT4 to the model problem on a sustained-petaflops system.

We are Getting a Better Handle on Estimating I/O Requirements


[Charts: projected disk bandwidth and disk capacity requirements.]

Software Implementations


• Fortran still winning

• NetCDF and HDF5 use is widespread, but not their parallel equivalents

• Widespread use of BLAS and LAPACK (see the sketch below)
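The BLAS/LAPACK finding is easy to illustrate: applications hand dense linear algebra to a tuned library rather than writing the loops themselves. A minimal sketch using the standard CBLAS interface (the matrix sizes here are arbitrary, chosen only for the example):

```c
#include <stdlib.h>
#include <cblas.h>

int main(void)
{
    const int m = 512, n = 512, k = 512;   /* arbitrary sizes */

    double *A = calloc((size_t)m * k, sizeof *A);
    double *B = calloc((size_t)k * n, sizeof *B);
    double *C = calloc((size_t)m * n, sizeof *C);
    if (!A || !B || !C)
        return 1;

    /* C = 1.0*A*B + 0.0*C in row-major storage. The tuned dgemm handles
     * blocking and vectorization internally, which is why applications
     * lean on BLAS instead of hand-written triple loops. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, k, B, n, 0.0, C, n);

    free(A); free(B); free(C);
    return 0;
}
```

Linked against whatever optimized BLAS the target system provides, the same call runs unchanged, which is part of why the survey found BLAS and LAPACK use so widespread.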


Mission: Deploy and operate the computational resources needed to tackle global challenges

Vision: Maximize scientific productivity and progress on the largest-scale computational problems

• Climate change
• Terrestrial sequestration of carbon
• Sustainable nuclear energy
• Bio-fuels and bio-energy
• Clean and efficient combustion
• Nanoscale materials
• Energy, ecology, and security

• Providing world-class computational resources and specialized services for the most computationally intensive problems

• Providing a stable hardware/software path of increasing scale to maximize productive applications development

• Series of increasingly powerful systems for science

Plan for delivering exascale systems in a decade:
• FY2009: Cray “Baker”: 1+ PF, AMD multi-core socket G3, with upgrade to 3-6 PF
• FY2011: 20 PF system with upgrade to 40 PF
• FY2014: 100 PF system with upgrade to 250 PF, based on disruptive technologies
• FY2017: 1 EF system with upgrade to 2 EF


ORNL design for a 20 PF system:
• 264 compute cabinets, 54 router cabinets, 25,000 uplink cables (up to ~20 m long, 3.125 Gbps signaling, potential for optical links)
• Fat-tree network
• 20 PF peak, 1.56 PB memory, 234 TB/s global bandwidth
• 6,224 SMP nodes, 112 I/O nodes
• Disk: 46 PB, 480 GB/s
• 7,600 square feet, 14.6 MW

Performance projected from 68 TF Cray XT4 to 20 PF Cray Cascade


For each application the projection gives the percentage of peak achieved in the benchmark, the projected time for the model problem, the projected performance for the model problem, and (where estimated) the projected improvement in socket performance (maximum = 3200%):

• DNS: 13.20% of peak; 5.79 s/step; 1.5 PF
• MILC: 18-50% of peak; 29.2 hours; 1 PF
• NAMD: 15.00% of peak; 7 hours; 3.5 PF
• WRF: 11% of peak; 22.63 hours; 1 PF; 1200%
• POP: 3.80% of peak; 965%
• HOMME: 12% of peak; 1200%
• RMG: 36.50% of peak; 310/iter; 2200%
• PARSEC: 35% of peak; 1.3 hours/iter; 2400%
• Espresso: 36% of peak; 786 s/iter; 1080%
• LSMS: 68% of peak; 10 PF; 2000%
• Chimera: 19% of peak; 4.825 s/step; 1400%
• GTC: 13.30% of peak; 1100%
• SPECFEM3D: 22% of peak; 5.6 hours; 1.6 PF; 1550%

Bottlenecks identified per application include interconnect bandwidth, interconnect latency, interconnect injection bandwidth, node performance, node memory bandwidth, and problem size.


Processor Trends

[Chart: processor performance trends, 2007-2012: GFLOPS per socket under High, Low, Likely, and Disruptive projections, on a scale of 0 to 2200 GFLOPS.]


How do Disruptive Technologies change the game?

• Assume 1 TF to 2 TF per socket
• Assume 400 sockets per rack (400 to 800 TF/rack)
• 25 to 50 racks for a 20 PF system (see the sketch below)
• Memory technology investment needed to get the bandwidth without using 400,000 DIMMs
• Roughly 150 kW/rack
• Water-cooled racks
• Lots of potential partners: Sun, Cray, Intel, AMD, IBM, DoD
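The rack counts here and on the next two slides are simple arithmetic; a small sketch (the numbers come straight from the bullets above, nothing else) shows how they are reproduced:

```c
#include <stdio.h>

/* Racks needed to reach a target peak, given TF per socket and sockets
 * per rack (400 in all three scenarios on these slides). The same
 * arithmetic underlies the 100-250 PF and exaflop sizings that follow. */
static double racks_needed(double target_tf, double tf_per_socket,
                           int sockets_per_rack)
{
    return target_tf / (tf_per_socket * sockets_per_rack);
}

int main(void)
{
    /* 20 PF = 20,000 TF with 1-2 TF sockets: 50 down to 25 racks. */
    printf("20 PF at 1 TF/socket: %.0f racks\n",
           racks_needed(20000.0, 1.0, 400));
    printf("20 PF at 2 TF/socket: %.0f racks\n",
           racks_needed(20000.0, 2.0, 400));
    return 0;
}
```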


Building a 100-250 PF system
• Assume 5 TF to 10 TF per socket
• Assume 400 sockets per rack (2 PF to 4 PF/rack)
• 50 to 100 racks for a 100-250 PF system
• Memory technology investment needed to get the bandwidth without using 800,000 DIMMs
• Various 3D technologies/solutions could be available
• Partial photonic interconnect
• Roughly ~150 kW/rack
• Water-cooled racks
• Liquid-cooled processors
• Hybrid is certainly an option
• Potential partners: Sun, Cray, Intel, IBM, AMD, NVIDIA??


Building an Exaflop System
• Assume 20 TF to 40 TF per socket
• Assume 400 sockets per rack (8 PF to 16 PF/rack)
• 125 to 250 racks for a 1-2 EF system
• Memory technology investment needed to get the bandwidth without using 1,000,000 DIMMs
• Various 3D technologies will be available, plus new memory design(s)
• All-photonic interconnect (no copper)
• Roughly ~250 kW/rack
• Water-cooled racks
• Liquid-cooled processors
• Hybrid is certainly an option
• Potential partners: Sun, Cray, Intel, IBM, AMD, NVIDIA??


What investments are needed?
• Memory bandwidth on processors
− Potential 2-10X; need 2-4X just to maintain the memory B/W-to-FLOP ratio
− Potential joint project with AMD and Sun to increase effective bandwidth
− Another possibility is 3D memory; assume $15M plus partners
• Interconnect performance
− Working with IBM, Intel, Mellanox, Sun
− Define 8X, 16X, 32X (4X and 12X already defined)
− Define ODR in the IBTA
− Already showing excellent latency characteristics
− All-optical (exists currently)
• Systems software, tools, and applications
• Proprietary networks: possible 10X, but large investment and risk; assume $20M
• Packaging density and cooling will continue to require investment
• Optical cabling/interconnect is a requirement, yet always in the future

Potential InfiniBand Performance

[Chart: potential InfiniBand signaling rates in Gb/s (0 to 700) for IB-SDR, IB-DDR, IB-QDR, and IB-ODR at link widths IB-4X, IB-8X, IB-12X, IB-16X, and IB-32X.]


Strengths in five key areas will drive project success with reduced risk, from petaflops to exaflops:

• Exceptional operational expertise and experience; in-depth applications expertise in house; strong partners; proven management team

• Driving the architectural innovation needed for exascale; superior global and injection bandwidths; purpose-built for scientific computing; leverages DARPA HPCS technologies

• Broad system software development partnerships; experienced performance/optimization tool development teams; partnerships with vendors and agencies to lead the way

• Power (reliability, availability, cost); space (current and growth path); global network access capable of 96 x 100 Gb/s

• Multidisciplinary application development teams; partnerships to drive application performance; science base and thought leadership

Questions?
