SCICOMP, IBM, and TACC: Then, Now, and Next


  • SCICOMP, IBM, and TACC: Then, Now, and Next
    Jay Boisseau, Director, Texas Advanced Computing Center, The University of Texas at Austin
    August 10, 2004

  • Precautions
    This presentation contains some historical recollections from over 5 years ago. I can't usually recall what I had for lunch yesterday.
    This presentation contains some ideas on where I think things might be going next. If I can't recall yesterday's lunch, it seems unlikely that I can predict anything.
    This presentation contains many tongue-in-cheek observations, exaggerations for dramatic effect, etc.
    This presentation may cause boredom, drowsiness, nausea, or hunger.

  • Outline
    Why Did We Create SCICOMP 5 Years Ago?
    What Did I Do with My Summer (and the Previous 3 Years)?
    What is TACC Doing Now with IBM?
    Where Are We Now? Where Are We Going?

  • Why Did We Create SCICOMP 5 Years Ago?

  • The Dark Ages of HPC
    In the late 1990s, most supercomputing was accomplished on proprietary systems from IBM, HP, SGI (including Cray), etc.
    User environments were not very friendly
    Limited development environments (debuggers, optimization tools, etc.)
    Very few cross-platform tools
    Difficult programming tools (MPI, OpenMP; some things haven't changed)

  • Missing Cray Research
    Cray was no longer the dominant company, and it showed
    The trend toward commoditization had begun
    Systems were not balanced
    Cray T3Es were used longer than any production MPP
    Software for HPC was limited and not as reliable
    Who doesn't miss real checkpoint/restart, automatic performance monitoring, no weekly PM downtime, etc.?
    Companies were not as focused on HPC/research customers as on larger markets

  • 1998-99: Making Things Better
    John Levesque hired by IBM to start the Advanced Computing Technology Center (ACTC)
    Goal: ACTC should provide to customers what Cray Research used to provide
    Jay Boisseau became first Associate Director of Scientific Computing at SDSC
    Goal: ensure SDSC helped users migrate from the Cray T3E to the IBM SP and do important, effective computational research

  • Creating SCICOMP
    John and Jay hosted a workshop at SDSC in March 1999, open to users and center staff, to discuss the current state, issues, techniques, and results in using IBM systems for HPC
    SP-XXL already existed, but was exclusive and more systems-oriented
    Success led to the first IBM SP Scientific Computing User Group meeting (SCICOMP) in August 1999 in Yorktown Heights, with Jay as first director
    Second meeting held in early 2000 at SDSC
    In late 2000, John & Jay invited international participation in SCICOMP at the IBM ACTC workshop in Paris

  • What Did I Do with My Summer (and the Previous 3 Years)?

  • Moving to TACC?
    In 2001, I accepted the job as director of TACC
    Major rebuilding task:
    Only 14 staff
    No R&D programs
    Outdated HPC systems
    No visualization, grid computing, or data-intensive computing
    Little funding
    Not much profile
    Past political issues

  • Moving to TACC!
    But big (TEXAS-SIZED) opportunities:
    Talented key staff in HPC, systems, and operations
    Space for growth
    IBM Austin across the street
    Almost every other major HPC vendor has a large presence in Austin
    UT Austin has both quality and scale in sciences, engineering, and CS
    UT and Texas have unparalleled internal & external support (pride is not always a vice)
    Austin is a fantastic place to live (and recruit)
    I got the chance to build something else good and important

  • TACC Mission
    To enhance the research & education programs of The University of Texas at Austin and its partners through research, development, operation & support of advanced computing technologies.

  • TACC Strategy
    To accomplish this mission, TACC:
    Resources & Services: evaluates, acquires & operates advanced computing systems; provides training, consulting, and documentation to users
    Research & Development: collaborates with researchers to apply advanced computing techniques; conducts research & development to produce new computational technologies

  • TACC Advanced Computing Technology Areas
    High Performance Computing (HPC): numerically intensive computing; produces data
    Scientific Visualization (SciVis): rendering data into information & knowledge
    Data & Information Systems (DIS): managing and analyzing data for information & knowledge
    Distributed and Grid Computing (DGC): integrating diverse resources, data, and people to produce and share knowledge

  • TACC Activities & Scope
    [Chart contrasting TACC's scope since 1986 with its growth since 2001!]

  • TACC Applications Focus Areas
    TACC advanced computing technology R&D must be driven by applications
    TACC applications focus areas:
    Chemistry -> Biosciences
    Climate/Weather/Ocean -> Geosciences
    CFD

  • TACC HPC & Storage Systems
    LONESTAR: Cray-Dell Xeon Linux cluster, 1028 CPUs (6.3 Tflops), 1 TB memory, 40+ TB disk
    LONGHORN: IBM Power4 system, 224 CPUs (1.16 Tflops), TB memory, 7.1 TB disk
    TEJAS: IBM Linux Pentium III cluster, 64 CPUs (64 Gflops), 32 GB memory, ~1 TB disk
    ARCHIVE: STK PowderHorns (2), 2.8 PB max capacity, managed by Cray DMF
    SAN: Sun SANs (2), 8 TB / 4 TB, to be expanded

  • ACES VisLab
    Front and rear projection systems
    3x1 cylindrical immersive environment, 24' diameter
    5x2 large-screen, 16:9 panel tiled display
    Full immersive capabilities with head/motion tracking

    High-end rendering systems:
    Sun E25K: 128 processors, TB memory, > 3 Gpoly/sec
    SGI Onyx2: 24 CPUs, 6 IR2 graphics pipes, 25 GB memory
    Matrix switch between systems, projectors, rooms

  • TACC Services
    TACC resources and services include:
    Consulting
    Training
    Technical documentation
    Data storage/archival
    System selection/configuration consulting
    System hosting

  • TACC R&D: High Performance Computing
    Scalability, performance optimization, and performance modeling for HPC applications
    Evaluation of cluster technologies for HPC
    Portability and performance issues of applications on clusters
    Climate, weather, and ocean modeling collaboration and support for DoD
    Starting CFD activities

  • TACC R&D: Scientific Visualization
    Feature detection / terascale data analysis
    Evaluation of performance characteristics and capabilities of high-end visualization technologies
    Hardware-accelerated visualization and computation on GPUs
    Remote interactive visualization / grid-enabled interactive visualization

  • TACC R&D: Data & Information Systems
    Newest technology group at TACC
    Initial R&D focused on creating/hosting scientific data collections
    Interests / plans:
    Geospatial and biological database extensions
    Efficient ways to collect/create metadata
    DB clusters / parallel DB I/O for scientific data

  • TACC R&D: Distributed & Grid Computing
    Web-based grid portals
    Grid resource data collection and information services
    Grid scheduling and workflow
    Grid-enabled visualization
    Grid-enabled data collection hosting
    Overall grid deployment and integration

  • TACC R&D: Networking
    Very new activities:
    Exploring high-bandwidth (OC-12, GigE, OC-48, OC-192) remote and collaborative grid-enabled visualization
    Exploring network performance for moving terascale data on 10 Gbps networks (TeraGrid)
    Exploring GigE aggregation to fill 10 Gbps networks (parallel file I/O, parallel database I/O)
    Recruiting a leader for TACC networking R&D activities

  • TACC Growth
    New infrastructure provides UT with comprehensive, balanced, world-class resources:
    50x HPC capability
    20x archival capability
    10x network capability
    World-class VisLab
    New SAN
    New comprehensive R&D program with focus on impact
    Activities in HPC, SciVis, DIS, DGC
    New opportunities for professional staff
    40+ new, wonderful people in 3 years, adding to the excellent core of talented people who have been at TACC for many years

  • Summary of My Time with TACC
    Over the past 3 years:
    TACC provides terascale HPC, SciVis, storage, data collection, and network resources
    TACC provides expert support services: consulting, documentation, and training in HPC, SciVis, and Grid
    TACC conducts applied research & development in these advanced computing technologies
    TACC has become one of the leading academic advanced computing centers in only 3 years
    I have the best job in the world, mainly because I have the best staff in the world (but also because of UT and Austin)

  • And one other thing kept me busy the past 3 years…

  • What is TACC Doing Now with IBM?

  • UT Grid: Enable Campus-wide Terascale Distributed Computing
    Vision: provide high-end systems, but move from island to hub of the campus computing continuum
    Provide models for local resources (clusters, vislabs, etc.), training, and documentation
    Develop procedures for connecting local systems to the campus grid
    Single sign-on, data space, compute space
    Leverage every PC, cluster, NAS, etc. on campus!
    Integrate digital assets into the campus grid
    Integrate UT instruments & sensors into the campus grid
    Joint project with IBM

  • Building a Grid Together
    UT Grid: joint between UT and IBM
    TACC wants to be a leader in e-science
    IBM is a leader in e-business
    UT Grid enables both to gain deployment experience (IBM Global Services) and have an R&D testbed
    Deliverables/benefits: deployment experience, Grid Zone papers, other papers

  • UT Grid: Initial Focus on Computing
    High-throughput parallel computing:
    Project Rodeo
    Use CSF to schedule to LSF, PBS, and SGE clusters across campus
    Use Globus 3.2 -> GT4
    High-throughput serial computing:
    Project Roundup uses United Devices software on campus PCs
    Also interfacing to the Condor flock in the CS department

  • UT Grid: Initial Focus on Computing
    Develop CSF adapters for popular resource management systems through collaboration (the adapter idea is sketched below):
    LSF: done by Platform Computing
    Globus: done by Platform Computing
    PBS: partially done
    SGE
    LoadLeveler
    Condor
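To make the adapter idea concrete, here is a minimal Python sketch of a common submit interface mapped onto scheduler-specific command lines. It is an illustration only, not the real CSF plugin API: the actual adapters were built within the Globus/CSF framework, and the class names, fields, and flag choices below are invented for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class JobRequest:
    """Scheduler-neutral job description (hypothetical fields)."""
    command: str
    ncpus: int = 1
    queue: str = "normal"


class SchedulerAdapter(ABC):
    """Common interface every local resource manager adapter implements."""

    @abstractmethod
    def submit_args(self, job: JobRequest) -> list[str]:
        """Translate the generic request into a native submit command line."""


class LSFAdapter(SchedulerAdapter):
    def submit_args(self, job: JobRequest) -> list[str]:
        return ["bsub", "-q", job.queue, "-n", str(job.ncpus), job.command]


class PBSAdapter(SchedulerAdapter):
    def submit_args(self, job: JobRequest) -> list[str]:
        return ["qsub", "-q", job.queue, "-l", f"nodes={job.ncpus}", job.command]


class SGEAdapter(SchedulerAdapter):
    def submit_args(self, job: JobRequest) -> list[str]:
        return ["qsub", "-q", job.queue, "-pe", "mpi", str(job.ncpus), job.command]


# The metascheduler picks a cluster, then hands the job to that
# cluster's adapter; only the adapter knows the native syntax.
job = JobRequest(command="run_simulation.sh", ncpus=16)
for adapter in (LSFAdapter(), PBSAdapter(), SGEAdapter()):
    print(" ".join(adapter.submit_args(job)))
```

The point of the design is that the metascheduler never needs to learn native scheduler syntax; supporting a new resource manager (LoadLeveler, Condor) means writing one more adapter, not changing the scheduler.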

  • UT Grid: Initial Focus on Computing
    Develop CSF capability for flexible job requirements (a sketch of one flexible spec follows below):
    Serial vs. parallel: no difference, just specify Ncpus
    Number: facilitate ensembles
    Advance reservation: needed for coupling, interactive use
    Batch: whenever, or by priority
    On-demand: needed for urgency
    Integrate data management for jobs into CSF:
    SAN makes it easy
    GridFTP is somewhat simple, if crude
    Avaki Data Grid is a possibility
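As a rough illustration of the "one flexible spec" idea from the list above, here is a sketch with invented field names; this is not CSF's actual job schema. Serial jobs are just ncpus=1, ensembles are a count, and urgency selects batch, advance-reservation, or on-demand handling.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Urgency(Enum):
    BATCH = auto()        # run whenever, or by priority
    RESERVATION = auto()  # advance reservation: coupled or interactive jobs
    ON_DEMAND = auto()    # urgent work jumps ahead of batch jobs


@dataclass
class FlexibleJobSpec:
    """One spec covers every case on the slide (field names invented)."""
    command: str
    ncpus: int = 1          # serial vs. parallel is just this number
    ensemble_size: int = 1  # N independent instances of the same job
    urgency: Urgency = Urgency.BATCH
    input_files: tuple[str, ...] = ()  # staged (e.g., via GridFTP) before the run

    def expand(self) -> list[str]:
        """One submission per ensemble member."""
        return [f"{self.command} --member={i}" for i in range(self.ensemble_size)]


spec = FlexibleJobSpec("./storm_forecast", ncpus=64, ensemble_size=5,
                       urgency=Urgency.ON_DEMAND,
                       input_files=("terrain.nc", "initial_conditions.nc"))
print(spec.urgency.name, spec.expand())
```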

  • UT Grid: Initial Focus on Computing
    Completion time in a compute grid is a function of:
    Data transfer times: use NWS for network bandwidth predictions and file transfer time predictions (Rich Wolski, UCSB)
    Queue wait times: use new software from Wolski for prediction of start of execution in batch systems
    Application performance times: use Prophesy (Valerie Taylor) for application performance prediction
    Develop a CSF scheduling module that is data, network, and performance aware (a toy version of the selection step is sketched below)
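A toy version of that selection step: predicted completion time is just the sum of the three predicted stages, and the scheduler picks the resource minimizing the sum. The numbers below are invented stand-ins for NWS, the queue-wait predictor, and Prophesy; the sketch does not actually call any of those systems.

```python
# Hypothetical per-cluster predictions, in seconds.
predictions = {
    # cluster: (transfer_s, queue_wait_s, runtime_s)
    "lonestar": (120.0, 3600.0, 1800.0),
    "longhorn": (45.0, 7200.0, 2400.0),
    "tejas":    (45.0, 600.0, 8000.0),
}


def predicted_completion(transfer_s: float, queue_wait_s: float,
                         runtime_s: float) -> float:
    """Total time from submission to results landing on disk."""
    return transfer_s + queue_wait_s + runtime_s


# The scheduling module would pick the resource minimizing the estimate.
best = min(predictions, key=lambda name: predicted_completion(*predictions[name]))
for name, parts in sorted(predictions.items()):
    print(f"{name:9s} {predicted_completion(*parts):7.0f} s")
print("schedule on:", best)
```

Note that the cluster with the shortest queue does not win here; the transfer and runtime predictions change the answer, which is exactly why the scheduling module needs to weigh all three terms rather than queue depth alone.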

  • UT Grid: Full Service!
    UT Grid will offer a complete set of services:
    Compute services
    Storage services
    Data collections services
    Visualization services
    Instruments services
    But this will take 2 years; focusing on compute services now

  • UT Grid Interfaces
    Grid User Portal:
    Hosted, built on GridPort
    Augment developers by providing info services
    Enable productivity by simplifying production usage
    Grid User Node:
    Hosted; software includes GridShell plus client versions of all other UT Grid software
    Downloadable version enables configuring a local Linux box into UT Grid (eventually, Windows and Mac)

  • UT Grid: Logical View
    Integrate distributed TACC resources first (Globus, LSF, NWS, SRB, United Devices, GridPort)
    [Diagram: TACC HPC, Vis, and Storage, actually spread across two campuses]

  • UT Grid: Logical View
    Next add other UT resources in one building as a spoke, using the same tools and procedures
    [Diagram builds: TACC hub with spokes for ICES clusters and data, then PGE clusters and data, then BIO and GEO instruments, clusters, and data]

  • UT Grid: Logical View
    Finally negotiate connections between spokes for willing participants to develop a P2P grid (sketched below)
    [Diagram: spoke-to-spoke links added among the ICES, PGE, BIO, and GEO resources around the TACC hub]

  • UT Grid: Physical View
    [Diagram: TACC systems on the research and main campuses; TACC Power4, CMS, storage, and cluster systems behind switches, with TACC Vis, linked through NOCs to GAATN, the ACES switch, and external networks]

  • UT Grid: Physical View
    Add ICES resources, then other resources
    [Diagram builds: ICES clusters and data join via the ACES switch; PGE clusters and data join via the PGE switch and NOC]

  • Texas Internet Grid for Research & Education (TIGRE)
    Multi-university grid: Texas, A&M, Houston, Rice, Texas Tech
    Build-out in 2004-05
    Will integrate additional universities
    Will facilitate academic research capabilities across Texas, using Internet2 initially
    Will extend to industrial partners to foster academic/industrial collaboration on R&D

  • NSF TeraGrid: National Cyberinfrastructure for Computational Science
    TeraGrid is the world's largest cyberinfrastructure for computational research
    Includes NCSA, SDSC, PSC, Caltech, Argonne, Oak Ridge, Indiana, Purdue
    Massive bandwidth! Each connection is one or more 10 Gbps links!
    TACC will provide terascale computing, storage, and visualization resources
    UT will provide terascale geosciences data sets

  • Where Are We Now? Where Are We Going?

  • The Buzz Words
    Clusters, Clusters, Clusters
    Grids & Cyberinfrastructure
    Data, Data, Data

  • Clusters, Clusters, Clusters
    No sense in trying to make long-term predictions here
    64-bit is going to be more important (duh), but is not yet (for most workloads)
    Evaluate options, but differences are not so great (for diverse workloads)
    Pricing is generally normalized to performance (via sales) for commodities

  • Grids & Cyberinfrastructure Are Coming. Really!
    The Grid is coming eventually:
    The concept of a Grid was ahead of the standards
    But we all use distributed computing anyway, and the advantages are just too big not to solve the issues
    Still have to solve many of the same distributed computing research problems (but at least now we have standards to develop to)
    Grid computing is here, almost:
    WSRF means finally getting the standards right
    Federal agencies and companies alike are investing heavily in good projects and starting to see results

  • TACC Grid Tools and Deployments
    Grid computing tools:
    GridPort: transparent grid computing from the Web
    GridShell: transparent grid computing from the CLI
    CSF: grid scheduling
    GridFlow / GridSteer: for coupling vis, steering simulations

    Cyberinfrastructure deployments:
    TeraGrid: national cyberinfrastructure
    TIGRE: state-wide cyberinfrastructure
    UT Grid: campus cyberinfrastructure for research & education

  • Data, Data, Data
    Our ability to create and collect data (computing systems, instruments, sensors) is exploding
    Availability of data is even driving new modes of science (e.g., bioinformatics)
    Data availability and the need for sharing and analysis are driving the other aspects of computing:
    Need for 64-bit microprocessors, improved memory systems
    Parallel file I/O
    Use of scientific databases, parallel databases
    Increased network bandwidth
    Grids for managing and sharing remote data

  • Renewed U.S. Interest in HEC Will Have Impact
    While clusters are important, non-clusters are still important!!!
    Projects like IBM Blue Gene/L, Cray Red Storm, etc. address different problems than clusters
    The DARPA HPCS program is really important, but only a start
    Strategic national interests require national investment!!!
    I think we'll see more federal funding for innovative research into computer systems

  • Visualization Will Catch Up
    Visualization often lags behind HPC and storage:
    Flops get publicity
    Bytes can't get lost
    Even Rain Man can't get insight from terabytes of 0s and 1s
    Explosion in data creates limitations, requiring:
    Feature detection (good)
    Downsizing the problem (bad)
    Downsampling data (ugly)

  • Visualization Will Catch Up
    As PCs impacted HPC, so are graphics cards impacting visualization:
    Custom SMP systems using graphics cards (Sun, SGI)
    Graphics clusters (Linux, Windows)
    As with HPC, there is still a need for custom, powerful visualization solutions for certain problems:
    SGI has largely exited this market
    IBM left long ago. Please come back!
    Again, this requires federal investment

  • What Should You Do This Week?

  • Austin is Fun, Cool, Weird, & Wonderful
    Mix of hippies, slackers, academics, geeks, politicos, musicians, and cowboys
    "Keep Austin Weird"
    Live Music Capital of the World (seriously)
    Also great restaurants, cafes, clubs, bars, theaters, galleries, etc.
    http://www.austinchronicle.com/
    http://www.austin360.com/xl/content/xl/index.html
    http://www.research.ibm.com/arl/austin/index.html

  • Your Austin To-Do List
    Eat barbecue at Rudy's, Stubb's, Iron Works, Green Mesquite, etc.
    Eat Tex-Mex at Chuy's, Trudy's, Maudie's, etc.
    Have a cold Shiner Bock (not Lone Star)
    Visit 6th Street and the Warehouse District at night
    See sketch comedy at Esther's Follies
    Go to at least one live music show
    Learn to two-step at The Broken Spoke
    Walk/jog/bike around Town Lake
    See a million bats emerge from the Congress Ave. bridge at sunset
    Visit the Texas State History Museum
    Visit the UT main campus
    See a movie at Alamo Drafthouse Cinema (arrive early, order beer & food)
    See the Round Rock Express at the Dell Diamond
    Drive into the Hill Country, visit small towns and wineries
    Eat Amy's Ice Cream
    Listen to and buy local music at Waterloo Records
    Buy a bottle each of Rudy's Barbecue Sauce and Tito's Vodka

  • Final Comments & Thoughts
    I'm very pleased to see SCICOMP is still going strong; great leaders and a great community make it last
    Still a need for groups like this: technologies get more powerful, but not necessarily simpler, and impact comes from effective utilization
    More importantly, always a need for energetic, talented people to make a difference in advanced computing:
    Contribute to valuable efforts
    Don't be afraid to start something if necessary
    Change is good (even if the only thing certain about change is that things will be different afterwards)
    Enjoy Austin! Ask any TACC staff about places to go and things to do

  • More About TACC
    Texas Advanced Computing Center
    www.tacc.utexas.edu
    info@tacc.utexas.edu
    (512) 475-9411