Transcript
Page 1: SCICOMP, IBM, and TACC: Then, Now, and Next

SCICOMP, IBM, and TACC: Then, Now, and Next

Jay Boisseau, Director

Texas Advanced Computing Center

The University of Texas at Austin

August 10, 2004

Page 2: SCICOMP, IBM, and TACC: Then, Now, and Next

Precautions

• This presentation contains some historical recollections from over 5 years ago. I can’t usually recall what I had for lunch yesterday.

• This presentation contains some ideas on where I think things might be going next. If I can’t recall yesterday’s lunch, it seems unlikely that I can predict anything.

• This presentation contains many tongue-in-cheek observations, exaggerations for dramatic effect, etc.

• This presentation may cause boredom, drowsiness, nausea, or hunger.

Page 3: SCICOMP, IBM, and TACC: Then, Now, and Next

Outline

1. Why Did We Create SCICOMP 5 Years Ago?

2. What Did I Do with My Summer (and the Previous 3 Years)?

3. What is TACC Doing Now with IBM?

4. Where Are We Now? Where Are We Going?

Page 4: SCICOMP, IBM, and TACC: Then, Now, and Next

Why Did We Create SCICOMP 5 Years Ago?

Page 5: SCICOMP, IBM, and TACC: Then, Now, and Next

The Dark Ages of HPC

• In the late 1990s, most supercomputing was accomplished on proprietary systems from IBM, HP, SGI (including Cray), etc.
– User environments were not very friendly
– Limited development environment (debuggers, optimization tools, etc.)
– Very few cross-platform tools
– Difficult programming tools (MPI, OpenMP… some things haven’t changed)

Page 6: SCICOMP, IBM, and TACC: Then, Now, and Next

Missing Cray Research…

• Cray was no longer the dominant company, and it showed
– Trend towards commoditization had begun
– Systems were not balanced
• Cray T3Es were used longer than any production MPP
– Software for HPC was limited, not as reliable
• Who doesn’t miss real checkpoint/restart, automatic performance monitoring, no weekly PM downtime, etc.?
– Companies were not as focused on HPC/research customers as on larger markets

Page 7: SCICOMP, IBM, and TACC: Then, Now, and Next

1998-99: Making Things Better

• John Levesque hired by IBM to start the Advanced Computing Technology Center
– Goal: ACTC should provide to customers what Cray Research used to provide
• Jay Boisseau became first Associate Director of Scientific Computing at SDSC
– Goal: Ensure SDSC helped users migrate from Cray T3E to IBM SP and do important, effective computational research

Page 8: SCICOMP, IBM, and TACC: Then, Now, and Next

Creating SCICOMP

• John and Jay hosted a workshop at SDSC in March 1999, open to users and center staff
– to discuss current state, issues, techniques, and results in using IBM systems for HPC
– SP-XXL already existed, but was exclusive and more systems-oriented
• Success led to the first IBM SP Scientific Computing User Group meeting (SCICOMP) in August 1999 in Yorktown Heights
– Jay as first director
• Second meeting held in early 2000 at SDSC
• In late 2000, John & Jay invited international participation in SCICOMP at the IBM ACTC workshop in Paris

Page 9: SCICOMP, IBM, and TACC: Then, Now, and Next

What Did I Do with My Summer (and the Previous 3 Years)?

Page 10: SCICOMP, IBM, and TACC: Then, Now, and Next

Moving to TACC?

• In 2001, I accepted the job as director of TACC
• Major rebuilding task:
– Only 14 staff
– No R&D programs
– Outdated HPC systems
– No visualization, grid computing, or data-intensive computing
– Little funding
– Not much profile
– Past political issues

Page 11: SCICOMP, IBM, and TACC: Then, Now, and Next

Moving to TACC!

• But big opportunities
– Talented key staff in HPC, systems, and operations
– Space for growth
– IBM Austin across the street
– Almost every other major HPC vendor has a large presence in Austin
– UT Austin has both quality and scale in sciences, engineering, CS
– UT and Texas have unparalleled internal & external support (pride is not always a vice)
– Austin is a fantastic place to live (and recruit)

Page 12: SCICOMP, IBM, and TACC: Then, Now, and Next

Moving to TACC!

• TEXAS-SIZED opportunities
– Talented key staff in HPC, systems, and operations
– Space for growth
– IBM Austin across the street
– Almost every other major HPC vendor has a large presence in Austin
– UT Austin has both quality and scale in sciences, engineering, CS
– UT and Texas have unparalleled internal & external support (pride is not always a vice)
– Austin is a fantastic place to live (and recruit)

Page 13: SCICOMP, IBM, and TACC: Then, Now, and Next

Moving to TACC!

• TEXAS-SIZED opportunities
– Talented key staff in HPC, systems, and operations
– Space for growth
– IBM Austin across the street
– Almost every other major HPC vendor has a large presence in Austin
– UT Austin has both quality and scale in sciences, engineering, CS
– UT and Texas have unparalleled internal & external support (pride is not always a vice)
– Austin is a fantastic place to live (and recruit)
– I got the chance to build something else good and important

Page 14: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC Mission

To enhance the research & education programs

of The University of Texas at Austin and its partners

through research, development, operation & support

of advanced computing technologies.

Page 15: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC Strategy

To accomplish this mission, TACC:

– Evaluates, acquires & operates advanced computing systems

– Provides training, consulting, and documentation to users

– Collaborates with researchers to apply advanced computing techniques

– Conducts research & development to produce new computational technologies

[Slide groups these as Resources & Services and Research & Development]

Page 16: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC Advanced Computing Technology Areas

• High Performance Computing (HPC) – numerically intensive computing: produces data

Page 17: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC Advanced Computing Technology Areas

• High Performance Computing (HPC) – numerically intensive computing: produces data

• Scientific Visualization (SciVis) – rendering data into information & knowledge

Page 18: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC Advanced Computing Technology Areas

• High Performance Computing (HPC) – numerically intensive computing: produces data

• Scientific Visualization (SciVis) – rendering data into information & knowledge

• Data & Information Systems (DIS) – managing and analyzing data for information & knowledge

Page 19: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC Advanced Computing Technology Areas

• High Performance Computing (HPC) – numerically intensive computing: produces data

• Scientific Visualization (SciVis) – rendering data into information & knowledge

• Data & Information Systems (DIS) – managing and analyzing data for information & knowledge

• Distributed and Grid Computing (DGC) – integrating diverse resources, data, and people to produce and share knowledge

Page 20: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC Activities & Scope

[Chart: TACC activities (Research, Development, Services, Resources, EOT) across HPC, SciVis, DGC, DIS, and Network areas – some since 1986, the newer areas since 2001]

Page 21: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC Applications Focus Areas

• TACC advanced computing technology R&D must be driven by applications

• TACC Applications Focus Areas
– Chemistry -> Biosciences
– Climate/Weather/Ocean -> Geosciences
– CFD

Page 22: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC HPC & Storage Systems

• ARCHIVE – STK PowderHorns (2): 2.8 PB max capacity, managed by Cray DMF

• LONGHORN – IBM Power4 system: 224 CPUs (1.16 Tflops), ½ TB memory, 7.1 TB disk

• TEJAS – IBM Linux Pentium III cluster: 64 CPUs (64 Gflops), 32 GB memory, ~1 TB disk

• LONESTAR – Cray-Dell Xeon Linux cluster: 1028 CPUs (6.3 Tflops), 1 TB memory, 40+ TB disk

• SAN – Sun SANs (2): 8 TB / 4 TB, to be expanded

Page 23: SCICOMP, IBM, and TACC: Then, Now, and Next

ACES VisLab

• Front and Rear Projection Systems
– 3x1 cylindrical immersive environment, 24’ diameter
– 5x2 large-screen, 16:9 panel tiled display
– Full immersive capabilities with head/motion tracking
• High-end rendering systems
– Sun E25K: 128 processors, ½ TB memory, > 3 Gpoly/sec
– SGI Onyx2: 24 CPUs, 6 IR2 Graphics Pipes, 25 GB memory

• Matrix switch between systems, projectors, rooms

Page 24: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC Services

• TACC resources and services include:
– Consulting
– Training
– Technical documentation
– Data storage/archival
– System selection/configuration consulting
– System hosting

Page 25: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC R&D – High Performance Computing

• Scalability, performance optimization, and performance modeling for HPC applications

• Evaluation of cluster technologies for HPC
• Portability and performance issues of applications on clusters
• Climate, weather, ocean modeling collaboration and support of DoD
• Starting CFD activities

Page 26: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC R&D – Scientific Visualization

• Feature detection / terascale data analysis
• Evaluation of performance characteristics and capabilities of high-end visualization technologies

• Hardware accelerated visualization and computation on GPUs

• Remote interactive visualization / grid-enabled interactive visualization

Page 27: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC R&D – Data & Information Systems

• Newest technology group at TACC
• Initial R&D focused on creating/hosting scientific data collections
• Interests / plans
– Geospatial and biological database extensions
– Efficient ways to collect/create metadata
– DB clusters / parallel DB I/O for scientific data

Page 28: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC R&D – Distributed & Grid Computing

• Web-based grid portals
• Grid resource data collection and information services
• Grid scheduling and workflow
• Grid-enabled visualization
• Grid-enabled data collection hosting
• Overall grid deployment and integration

Page 29: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC R&D – Networking

• Very new activities:
– Exploring high-bandwidth (OC-12, GigE, OC-48, OC-192) remote and collaborative grid-enabled visualization

– Exploring network performance for moving terascale data on 10 Gbps networks (TeraGrid)

– Exploring GigE aggregation to fill 10 Gbps networks (parallel file I/O, parallel database I/O)

• Recruiting a leader for TACC networking R&D activities
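For a rough sense of the scale behind the terascale-data bullets above, the back-of-the-envelope calculation below shows why a single GigE link is nowhere near enough and why aggregation and parallel I/O matter. This is an idealized sketch: it assumes full line rate with no protocol or disk overhead.

```python
# Idealized transfer-time estimate: bytes -> bits, divided by link rate.
# Real GridFTP/TCP transfers will be slower due to protocol and disk overheads.

def transfer_time_seconds(data_bytes: float, link_gbps: float) -> float:
    """Seconds to move data_bytes over a link running at link_gbps (line rate)."""
    return (data_bytes * 8) / (link_gbps * 1e9)

one_terabyte = 1e12  # bytes
print(transfer_time_seconds(one_terabyte, 10.0))  # ~800 s, about 13 minutes on 10 Gbps
print(transfer_time_seconds(one_terabyte, 1.0))   # ~8000 s, over 2 hours on a single GigE
```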

Page 30: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC Growth

• New infrastructure provides UT with comprehensive, balanced, world-class resources:
– 50x HPC capability
– 20x archival capability
– 10x network capability
– World-class VisLab
– New SAN
• New comprehensive R&D program with focus on impact
– Activities in HPC, SciVis, DIS, DGC
– New opportunities for professional staff
• 40+ new, wonderful people in 3 years, adding to the excellent core of talented people who have been at TACC for many years

Page 31: SCICOMP, IBM, and TACC: Then, Now, and Next

Summary of My Time with TACC Over the Past 3 Years

• TACC provides terascale HPC, SciVis, storage, data collection, and network resources

• TACC provides expert support services: consulting, documentation, and training in HPC, SciVis, and Grid

• TACC conducts applied research & development in these advanced computing technologies

• TACC has become one of the leading academic advanced computing centers in just 3 years

• I have the best job in the world, mainly because I have the best staff in the world (but also because of UT and Austin)

Page 32: SCICOMP, IBM, and TACC: Then, Now, and Next

And one other thing kept me busy the past 3 years…

Page 33: SCICOMP, IBM, and TACC: Then, Now, and Next

What is TACC Doing Now with IBM?

Page 34: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Enable Campus-wide Terascale Distributed Computing

• Vision: provide high-end systems, but move from ‘island’ to hub of campus computing continuum
– provide models for local resources (clusters, vislabs, etc.), training, and documentation
– develop procedures for connecting local systems to campus grid
• single sign-on, data space, compute space
• leverage every PC, cluster, NAS, etc. on campus!
– integrate digital assets into campus grid
– integrate UT instruments & sensors into campus grid

• Joint project with IBM

Page 35: SCICOMP, IBM, and TACC: Then, Now, and Next

Building a Grid Together

• UT Grid: Joint Between UT and IBM
– TACC wants to be a leader in e-science
– IBM is a leader in e-business
– UT Grid enables both to
• Gain deployment experience (IBM Global Services)
• Have an R&D testbed
– Deliverables/Benefits
• Deployment experience
• Grid Zone papers
• Other papers

Page 36: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Initial Focus on Computing

• High-throughput parallel computing
– Project Rodeo
– Use CSF to schedule to LSF, PBS, SGE clusters across campus
– Use Globus 3.2 -> GT4
• High-throughput serial computing
– Project Roundup uses United Devices software on campus PCs
– Also interfacing to Condor flock in CS department

Page 37: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Initial Focus on Computing

• Develop CSF adapters for popular resource management systems through collaboration (a sketch of the adapter idea follows this list):
– LSF: done by Platform Computing
– Globus: done by Platform Computing
– PBS: partially done
– SGE
– LoadLeveler
– Condor
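To picture what an adapter has to do, the sketch below maps one generic submit call onto each scheduler's own command line. This is only a conceptual illustration, not the actual CSF adapter interface (the real adapters are Globus-based services developed with Platform Computing); the class names are invented here, and the SGE parallel-environment name "mpi" is a site-specific assumption.

```python
# Conceptual sketch only: hypothetical Python API, not the real CSF adapter interface.
# The command-line flags (PBS "qsub -l nodes=...", SGE "qsub -pe ...", LSF "bsub -n ...")
# are standard, but the SGE parallel-environment name "mpi" varies by site.

import subprocess
from abc import ABC, abstractmethod


class ResourceManagerAdapter(ABC):
    """Translate one generic job request into a local scheduler's submit syntax."""

    @abstractmethod
    def submit(self, script_path: str, ncpus: int) -> str:
        """Submit a job script and return the scheduler's job identifier."""


class PBSAdapter(ResourceManagerAdapter):
    def submit(self, script_path: str, ncpus: int) -> str:
        out = subprocess.run(["qsub", "-l", f"nodes={ncpus}", script_path],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()          # PBS prints the job ID on stdout


class SGEAdapter(ResourceManagerAdapter):
    def submit(self, script_path: str, ncpus: int) -> str:
        out = subprocess.run(["qsub", "-pe", "mpi", str(ncpus), script_path],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()


class LSFAdapter(ResourceManagerAdapter):
    def submit(self, script_path: str, ncpus: int) -> str:
        with open(script_path) as script:  # LSF's bsub reads the job script on stdin
            out = subprocess.run(["bsub", "-n", str(ncpus)], stdin=script,
                                 capture_output=True, text=True, check=True)
        return out.stdout.strip()
```

With adapters like these, a metascheduler can treat every campus cluster uniformly; only the per-resource-manager translation differs.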

Page 38: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Initial Focus on Computing

• Develop CSF capability for flexible job requirements:
– Serial vs parallel: no diff, just specify Ncpus
– Number: facilitate ensembles
– Batch: whenever, or by priority
– Advance reservation: needed for coupling, interactive
– On-demand: needed for urgency
• Integrate data management for jobs into CSF (see the sketch below)
– SAN makes it easy
– GridFTP is somewhat simple, if crude
– Avaki Data Grid is a possibility
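As a rough picture of the flexible job description plus data staging argued for above, here is a minimal sketch. The GridJobSpec fields are hypothetical names chosen for illustration, not the real CSF job schema; globus-url-copy is the standard GridFTP command-line client from the Globus Toolkit.

```python
# Hypothetical illustration of a flexible job description plus GridFTP staging;
# the field names are invented here, not taken from any actual CSF schema.

import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class GridJobSpec:
    executable: str
    ncpus: int = 1                            # serial vs parallel: just specify Ncpus
    ensemble_size: int = 1                    # number of instances, to facilitate ensembles
    priority: str = "batch"                   # "batch", "priority", or "on-demand"
    reservation_start: Optional[str] = None   # ISO 8601 time, for advance reservation
    input_url: Optional[str] = None           # GridFTP URL of the input data
    staging_url: Optional[str] = None         # GridFTP URL of the work area on the target cluster


def stage_input(spec: GridJobSpec) -> None:
    """Move input data to the chosen cluster with GridFTP before the job starts."""
    if spec.input_url and spec.staging_url:
        # globus-url-copy is the GridFTP client shipped with the Globus Toolkit
        subprocess.run(["globus-url-copy", spec.input_url, spec.staging_url],
                       check=True)
```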

Page 39: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Initial Focus on Computing

• Completion time in a compute grid is a function of
– data transfer times
• Use NWS for network bandwidth predictions, file transfer time predictions (Rich Wolski, UCSB)
– queue wait times
• Use new software from Wolski for prediction of start of execution in batch systems
– application performance times
• Use Prophesy (Valerie Taylor) for application performance prediction

• Develop CSF scheduling module that is data, network, and performance aware
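The decomposition above reduces to a simple comparison: for each candidate cluster, sum predicted transfer time, predicted queue wait, and predicted run time, then pick the minimum. The sketch below illustrates just that selection step; the predict_* callables are placeholders standing in for NWS, the batch-queue wait predictor, and Prophesy, not those tools' actual APIs.

```python
# Minimal sketch of the data-, network-, and performance-aware selection step.
# The predict_* callables are placeholders for NWS bandwidth/transfer predictions,
# batch-queue wait predictions, and Prophesy runtime models.

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class CandidateCluster:
    name: str
    predict_transfer_s: Callable[[float], float]   # seconds to move the input data
    predict_queue_wait_s: Callable[[int], float]   # expected wait before ncpus are free
    predict_runtime_s: Callable[[int], float]      # modeled application run time on ncpus


def predicted_completion_s(c: CandidateCluster, input_bytes: float, ncpus: int) -> float:
    """Completion time = data transfer time + queue wait time + application run time."""
    return (c.predict_transfer_s(input_bytes)
            + c.predict_queue_wait_s(ncpus)
            + c.predict_runtime_s(ncpus))


def choose_cluster(clusters: Iterable[CandidateCluster],
                   input_bytes: float, ncpus: int) -> CandidateCluster:
    """Pick the cluster with the smallest predicted completion time."""
    return min(clusters, key=lambda c: predicted_completion_s(c, input_bytes, ncpus))
```

A production module would also have to weigh prediction uncertainty and advance reservations, but the core data-, network-, and performance-aware choice is this minimization.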

Page 40: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Full Service!

• UT Grid will offer a complete set of services:
– Compute services
– Storage services
– Data collections services
– Visualization services
– Instrument services
• But this will take 2 years; focusing on compute services now

Page 41: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid Interfaces

• Grid User Portal
– Hosted, built on GridPort
– Augment developers by providing info services
– Enable productivity by simplifying production usage
• Grid User Node
– Hosted, software includes GridShell plus client versions of all other UT Grid software
– Downloadable version enables configuring a local Linux box into UT Grid (eventually, Windows and Mac)

Page 42: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Logical View

• Integrate distributed TACC resources first (Globus, LSF, NWS, SRB, United Devices, GridPort)

[Diagram: TACC HPC, Vis, and Storage as the initial hub (actually spread across two campuses)]

Page 43: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Logical View

• Next add other UT resources in one building as a spoke, using the same tools and procedures

[Diagram: TACC HPC, Vis, Storage hub with ICES Cluster and ICES Data spokes]

Page 44: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Logical View

• Next add other UT resources in one building as a spoke, using the same tools and procedures

[Diagram: TACC hub with ICES Cluster spokes, now adding PGE Cluster and PGE Data spokes]

Page 45: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Logical View

• Next add other UT resources in one building as a spoke, using the same tools and procedures

[Diagram: TACC HPC, Vis, Storage hub with ICES, PGE, BIO (cluster, instrument), and GEO (data, instrument) spokes]

Page 46: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Logical View

• Finally negotiate connections between spokes for willing participants to develop a P2P grid.

[Diagram: peer-to-peer connections among the TACC hub and the ICES, PGE, BIO, and GEO spokes]

Page 47: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Physical View – TACC Systems

[Diagram: TACC Power4, cluster, and storage systems in the CMS machine room on the research campus, and TACC Vis in ACES on the main campus, connected through switches and NOCs via GAATN and external networks]

Page 48: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Physical View – Add ICES Resources

[Diagram: ICES clusters and ICES data in ACES on the main campus join the TACC systems over the same switches, NOCs, GAATN, and external networks]

Page 49: SCICOMP, IBM, and TACC: Then, Now, and Next

UT Grid: Physical View – Add Other Resources

[Diagram: PGE clusters and PGE data on the main campus added alongside the ICES and TACC resources]

Page 50: SCICOMP, IBM, and TACC: Then, Now, and Next

Texas Internet Grid for Research & Education (TIGRE)

• Multi-university grid: Texas, A&M, Houston, Rice, Texas Tech
– Build-out in 2004-5
– Will integrate additional universities

• Will facilitate academic research capabilities across Texas using Internet2 initially

• Will extend to industrial partners to foster academic/industrial collaboration on R&D

Page 51: SCICOMP, IBM, and TACC: Then, Now, and Next

NSF TeraGrid: National Cyberinfrastructure for Computational Science

• TeraGrid is the world’s largest cyberinfrastructure for computational research

• Includes NCSA, SDSC, PSC, Caltech, Argonne, Oak Ridge, Indiana, Purdue

• Massive bandwidth! Each connection is one or more 10 Gbps links!

- TACC will provide terascale computing, storage, and visualization resources
- UT will provide terascale geosciences data sets

Page 52: SCICOMP, IBM, and TACC: Then, Now, and Next

Where Are We Now? Where Are We Going?

Page 53: SCICOMP, IBM, and TACC: Then, Now, and Next

The Buzz Words

• Clusters, Clusters, Clusters
• Grids & Cyberinfrastructure
• Data, Data, Data

Page 54: SCICOMP, IBM, and TACC: Then, Now, and Next

Clusters, Clusters, Clusters

• No sense in trying to make long-term predictions here
– 64-bit is going to be more important (duh), but is not yet (for most workloads)
– Evaluate options, but differences are not so great (for diverse workloads)
– Pricing is generally normalized to performance (via sales) for commodities

Page 55: SCICOMP, IBM, and TACC: Then, Now, and Next

Grids & Cyberinfrastructure Are Coming… Really!

• ‘The Grid’ is coming… eventually
– The concept of a Grid was ahead of the standards
– But we all use distributed computing anyway, and the advantages are just too big not to solve the issues
– Still have to solve many of the same distributed computing research problems (but at least now we have standards to develop to)
• ‘grid computing’ is here… almost
– WSRF means finally getting the standards right
– Federal agencies and companies alike are investing heavily in good projects and starting to see results

Page 56: SCICOMP, IBM, and TACC: Then, Now, and Next

TACC Grid Tools and Deployments

• Grid Computing Tools
– GridPort: transparent grid computing from Web
– GridShell: transparent grid computing from CLI
– CSF: grid scheduling
– GridFlow / GridSteer: for coupling vis, steering simulations
• Cyberinfrastructure Deployments
– TeraGrid: national cyberinfrastructure
– TIGRE: state-wide cyberinfrastructure
– UT Grid: campus cyberinfrastructure for research & education

Page 57: SCICOMP, IBM, and TACC: Then, Now, and Next

Data, Data, Data

• Our ability to create and collect data (computing systems, instruments, sensors) is exploding

• Availability of data is even driving new modes of science (e.g., bioinformatics)

• Data availability and the need for sharing and analysis are driving the other aspects of computing
– Need for 64-bit microprocessors, improved memory systems
– Parallel file I/O
– Use of scientific databases, parallel databases
– Increased network bandwidth
– Grids for managing, sharing remote data

Page 58: SCICOMP, IBM, and TACC: Then, Now, and Next

Renewed U.S. Interest in HEC Will Have Impact

• While clusters are important, ‘non-clusters’ are still important!!!
– Projects like IBM Blue Gene/L, Cray Red Storm, etc. address different problems than clusters
– DARPA HPCS program is really important, but only a start
– Strategic national interests require national investment!!!
– I think we’ll see more federal funding for innovative research into computer systems

Page 59: SCICOMP, IBM, and TACC: Then, Now, and Next

Visualization Will Catch Up

• Visualization often lags behind HPC, storage
– Flops get publicity
– Bytes can’t get lost
– Even Rainman can’t get insight from terabytes of 0’s and 1’s
• Explosion in data creates limitations requiring
– Feature detection (good)
– Downsizing problem (bad)
– Downsampling data (ugly)

Page 60: SCICOMP, IBM, and TACC: Then, Now, and Next

Visualization Will Catch Up

• As PCs impacted HPC, so are graphics cards impacting visualization
– Custom SMP systems using graphics cards (Sun, SGI)
– Graphics clusters (Linux, Windows)
• As with HPC, still a need for custom, powerful visualization solutions on certain problems
– SGI has largely exited this market
– IBM left long ago (please come back!)
– Again, requires federal investment

Page 61: SCICOMP, IBM, and TACC: Then, Now, and Next

What Should You Do This Week?

Page 62: SCICOMP, IBM, and TACC: Then, Now, and Next

Austin is Fun, Cool, Weird, & Wonderful

• Mix of hippies, slackers, academics, geeks, politicos, musicians, and cowboys

• “Keep Austin Weird”
• Live Music Capital of the World (seriously)
• Also great restaurants, cafes, clubs, bars, theaters, galleries, etc.
– http://www.austinchronicle.com/
– http://www.austin360.com/xl/content/xl/index.html
– http://www.research.ibm.com/arl/austin/index.html

Page 63: SCICOMP, IBM, and TACC: Then, Now, and Next

Your Austin To-Do List

Eat barbecue at Rudy’s, Stubb’s, Iron Works, Green Mesquite, etc.
Eat Tex-Mex at Chuy’s, Trudy’s, Maudie’s, etc.
Have a cold Shiner Bock (not Lone Star)
Visit 6th Street and the Warehouse District at night
See sketch comedy at Esther’s Follies
Go to at least one live music show
Learn to two-step at The Broken Spoke
Walk/jog/bike around Town Lake
See a million bats emerge from the Congress Ave. bridge at sunset
Visit the Texas State History Museum
Visit the UT main campus
See a movie at Alamo Drafthouse Cinema (arrive early, order beer & food)
See the Round Rock Express at the Dell Diamond
Drive into the Hill Country, visit small towns and wineries
Eat Amy’s Ice Cream
Listen to and buy local music at Waterloo Records
Buy a bottle each of Rudy’s Barbecue Sauce and Tito’s Vodka

Page 64: SCICOMP, IBM, and TACC: Then, Now, and Next

Final Comments & Thoughts

• I’m very pleased to see SCICOMP is still going strong
– Great leaders and a great community make it last
• Still a need for groups like this
– technologies get more powerful, but not necessarily simpler, and impact comes from effective utilization
• More importantly, always a need for energetic, talented people to make a difference in advanced computing
– Contribute to valuable efforts
– Don’t be afraid to start something if necessary
– Change is good (even if “the only thing certain about change is that things will be different afterwards”)
• Enjoy Austin!
– Ask any TACC staff about places to go and things to do

Page 65: SCICOMP, IBM, and TACC: Then, Now, and Next

More About TACC:

Texas Advanced Computing Center
[email protected]

(512) 475-9411

