SCICOMP, IBM, and TACC: Then, Now, and Next

TRANSCRIPT

SCICOMP, IBM, and TACC: Then, Now, and Next
Jay Boisseau, Director
Texas Advanced Computing Center
The University of Texas at Austin
August 10, 2004

Precautions
- This presentation contains some historical recollections from over 5 years ago. I can't usually recall what I had for lunch yesterday.
- This presentation contains some ideas on where I think things might be going next. If I can't recall yesterday's lunch, it seems unlikely that I can predict anything.
- This presentation contains many tongue-in-cheek observations, exaggerations for dramatic effect, etc.
- This presentation may cause boredom, drowsiness, nausea, or hunger.

Outline
- Why Did We Create SCICOMP 5 Years Ago?
- What Did I Do with My Summer (and the Previous 3 Years)?
- What is TACC Doing Now with IBM?
- Where Are We Now? Where Are We Going?

Why Did We Create SCICOMP 5 Years Ago?

The Dark Ages of HPC
- In the late 1990s, most supercomputing was done on proprietary systems from IBM, HP, SGI (including Cray), etc.
- User environments were not very friendly
- Limited development environments (debuggers, optimization tools, etc.)
- Very few cross-platform tools
- Difficult programming tools (MPI, OpenMP; some things haven't changed)

Missing Cray Research
- Cray was no longer the dominant company, and it showed
- The trend toward commoditization had begun
- Systems were not balanced
- Cray T3Es were used longer than any production MPP
- Software for HPC was limited and not as reliable
- Who doesn't miss real checkpoint/restart, automatic performance monitoring, no weekly PM downtime, etc.?
- Companies were not as focused on HPC/research customers as on larger markets

1998-99: Making Things Better
- John Levesque hired by IBM to start the Advanced Computing Technology Center (ACTC)
  - Goal: ACTC should provide to customers what Cray Research used to provide
- Jay Boisseau became the first Associate Director of Scientific Computing at SDSC
  - Goal: ensure SDSC helped users migrate from the Cray T3E to the IBM SP and do important, effective computational research

Creating SCICOMP
- John and Jay hosted a workshop at SDSC in March 1999, open to users and center staff, to discuss the current state, issues, techniques, and results in using IBM systems for HPC
  - SP-XXL already existed, but was exclusive and more systems-oriented
- Success led to the first IBM SP Scientific Computing User Group (SCICOMP) meeting in August 1999 in Yorktown Heights, with Jay as first director
- Second meeting held in early 2000 at SDSC
- In late 2000, John & Jay invited international participation in SCICOMP at the IBM ACTC workshop in Paris

What Did I Do with My Summer (and the Previous 3 Years)?

Moving to TACC?
- In 2001, I accepted the job as director of TACC
- Major rebuilding task:
  - Only 14 staff
  - No R&D programs
  - Outdated HPC systems
  - No visualization, grid computing, or data-intensive computing
  - Little funding
  - Not much profile
  - Past political issues

Moving to TACC!
- But TEXAS-SIZED opportunities:
  - Talented key staff in HPC, systems, and operations
  - Space for growth
  - IBM Austin across the street
  - Almost every other major HPC vendor has a large presence in Austin
  - UT Austin has both quality and scale in sciences, engineering, and CS
  - UT and Texas have unparalleled internal & external support (pride is not always a vice)
  - Austin is a fantastic place to live (and recruit)
- I got the chance to build something else good and important

TACC Mission
To enhance the research & education programs of The University of Texas at Austin and its partners through research, development, operation & support of advanced computing technologies.

TACC Strategy
To accomplish this mission, TACC:
- Resources & Services
  - Evaluates, acquires & operates advanced computing systems
  - Provides training, consulting, and documentation to users
- Research & Development
  - Collaborates with researchers to apply advanced computing techniques
  - Conducts research & development to produce new computational technologies

TACC Advanced Computing Technology Areas
- High Performance Computing (HPC): numerically intensive computing; produces data
- Scientific Visualization (SciVis): rendering data into information & knowledge
- Data & Information Systems (DIS): managing and analyzing data for information & knowledge
- Distributed and Grid Computing (DGC): integrating diverse resources, data, and people to produce and share knowledge

TACC Activities & Scope
(diagram contrasting TACC's scope since 1986 with its expanded scope since 2001)

TACC Applications Focus Areas
- TACC advanced computing technology R&D must be driven by applications
- TACC applications focus areas:
  - Chemistry -> Biosciences
  - Climate/Weather/Ocean -> Geosciences
  - CFD

TACC HPC & Storage Systems
- LONESTAR: Cray-Dell Xeon Linux cluster, 1028 CPUs (6.3 Tflops), 1 TB memory, 40+ TB disk
- LONGHORN: IBM Power4 system, 224 CPUs (1.16 Tflops), TB memory, 7.1 TB disk
- TEJAS: IBM Linux Pentium III cluster, 64 CPUs (64 Gflops), 32 GB memory, ~1 TB disk
- ARCHIVE: STK PowderHorns (2), 2.8 PB max capacity, managed by Cray DMF
- SAN: Sun SANs (2), 8 TB / 4 TB, to be expanded

ACES VisLab
- Front and rear projection systems
  - 3x1 cylindrical immersive environment, 24' diameter
  - 5x2 large-screen, 16:9 panel tiled display
  - Full immersive capabilities with head/motion tracking
- High-end rendering systems
  - Sun E25K: 128 processors, TB memory, >3 Gpoly/sec
  - SGI Onyx2: 24 CPUs, 6 IR2 graphics pipes, 25 GB memory
  - Matrix switch between systems, projectors, rooms

TACC Services
TACC resources and services include:
- Consulting
- Training
- Technical documentation
- Data storage/archival
- System selection/configuration consulting
- System hosting

TACC R&D - High Performance Computing
- Scalability, performance optimization, and performance modeling for HPC applications (a toy scaling-model sketch follows below)
- Evaluation of cluster technologies for HPC
- Portability and performance issues of applications on clusters
- Climate, weather, ocean modeling collaboration and support of DoD
- Starting CFD activities
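
The performance-modeling work listed above can be illustrated with the simplest of analytic scaling laws. The sketch below is a minimal, hypothetical example rather than a TACC model: it applies Amdahl's law to estimate speedup and parallel efficiency for an application with an assumed serial fraction, at a few CPU counts in the range of the systems described earlier.

```python
# Minimal sketch: Amdahl's-law speedup and efficiency estimates.
# The serial_fraction value is an assumed, illustrative number,
# not a measured characteristic of any TACC application.

def amdahl_speedup(serial_fraction: float, n_cpus: int) -> float:
    """Ideal speedup on n_cpus when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

serial_fraction = 0.02  # assume 2% of the runtime is inherently serial
for n in (16, 64, 256, 1024):
    s = amdahl_speedup(serial_fraction, n)
    print(f"{n:5d} CPUs: speedup {s:6.1f}x, parallel efficiency {s / n:6.1%}")
```

Even a 2% serial fraction caps the achievable speedup near 50x, which is why this kind of scalability analysis matters before committing thousands of CPUs to a code.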

TACC R&D - Scientific Visualization
- Feature detection / terascale data analysis
- Evaluation of performance characteristics and capabilities of high-end visualization technologies
- Hardware-accelerated visualization and computation on GPUs
- Remote interactive visualization / grid-enabled interactive visualization

TACC R&D - Data & Information Systems
- Newest technology group at TACC
- Initial R&D focused on creating/hosting scientific data collections
- Interests / plans:
  - Geospatial and biological database extensions
  - Efficient ways to collect/create metadata
  - DB clusters / parallel DB I/O for scientific data

TACC R&D - Distributed & Grid Computing
- Web-based grid portals
- Grid resource data collection and information services
- Grid scheduling and workflow
- Grid-enabled visualization
- Grid-enabled data collection hosting
- Overall grid deployment and integration

TACC R&D - Networking
Very new activities:
- Exploring high-bandwidth (OC-12, GigE, OC-48, OC-192) remote and collaborative grid-enabled visualization
- Exploring network performance for moving terascale data on 10 Gbps networks (TeraGrid)
- Exploring GigE aggregation to fill 10 Gbps networks (parallel file I/O, parallel database I/O); see the back-of-envelope calculation below
- Recruiting a leader for TACC networking R&D activities
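
As context for the 10 Gbps figures above, here is a back-of-envelope sketch of how long a terabyte takes to move over such a link and how many GigE streams would have to be aggregated to keep it full. The efficiency values are illustrative assumptions, not measurements.

```python
# Back-of-envelope: moving 1 TB over a 10 Gbps link, and how many 1 Gbps
# (GigE) streams are needed to fill it. Efficiencies are assumed values.

TB_BITS = 1e12 * 8   # one terabyte, expressed in bits (decimal TB)
LINK_GBPS = 10.0     # a TeraGrid-class 10 Gbps link
GIGE_GBPS = 1.0      # a single Gigabit Ethernet stream

for efficiency in (1.00, 0.50, 0.25):  # assumed fraction of line rate actually achieved
    achieved_gbps = LINK_GBPS * efficiency
    minutes = TB_BITS / (achieved_gbps * 1e9) / 60.0
    streams = LINK_GBPS / (GIGE_GBPS * efficiency)
    print(f"{efficiency:4.0%} of line rate: 1 TB in {minutes:5.1f} min; "
          f"~{streams:.0f} GigE streams needed to fill the 10 Gbps link")
```

At full line rate a terabyte moves in roughly 13 minutes, but at a more realistic 25% of line rate it takes closer to an hour and requires on the order of 40 aggregated GigE streams, which is why parallel file and database I/O are interesting here.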

TACC Growth
- New infrastructure provides UT with comprehensive, balanced, world-class resources:
  - 50x HPC capability
  - 20x archival capability
  - 10x network capability
  - World-class VisLab
  - New SAN
- New comprehensive R&D program with focus on impact
  - Activities in HPC, SciVis, DIS, DGC
- New opportunities for professional staff
  - 40+ new, wonderful people in 3 years, adding to the excellent core of talented people that have been at TACC for many years

Summary of My Time with TACC
Over the past 3 years:
- TACC provides terascale HPC, SciVis, storage, data collection, and network resources
- TACC provides expert support services: consulting, documentation, and training in HPC, SciVis, and Grid
- TACC conducts applied research & development in these advanced computing technologies
- TACC has become one of the leading academic advanced computing centers in 3 years
- I have the best job in the world, mainly because I have the best staff in the world (but also because of UT and Austin)

And one other thing kept me busy the past 3 years...

What is TACC Doing Now with IBM?

UT Grid: Enable Campus-wide Terascale Distributed Computing
- Vision: provide high-end systems, but move from island to hub of the campus computing continuum
  - Provide models for local resources (clusters, vislabs, etc.), training, and documentation
  - Develop procedures for connecting local systems to the campus grid
  - Single sign-on, data space, compute space
  - Leverage every PC, cluster, NAS, etc. on campus!
  - Integrate digital assets into the campus grid
  - Integrate UT instruments & sensors into the campus grid
- Joint project with IBM

Building a Grid Together
- UT Grid: joint between UT and IBM
  - TACC wants to be a leader in e-science
  - IBM is a leader in e-business
  - UT Grid enables both to gain deployment experience (IBM Global Services) and have an R&D testbed
- Deliverables/benefits:
  - Deployment experience
  - Grid Zone papers
  - Other papers

UT Grid: Initial Focus on Computing
- High-throughput parallel computing: Project Rodeo
  - Use CSF to schedule to LSF, PBS, SGE clusters across campus
  - Use Globus 3.2 -> GT4
- High-throughput serial computing: Project Roundup
  - Uses United Devices software on campus PCs
  - Also interfacing to the Condor flock in the CS department

UT Grid: Initial Focus on Computing
- Develop CSF adapters for popular resource management systems through collaboration:
  - LSF: done by Platform Computing
  - Globus: done by Platform Computing
  - PBS: partially done
  - SGE
  - LoadLeveler
  - Condor

UT Grid: Initial Focus on Computing
- Develop CSF capability for flexible job requirements:
  - Serial vs. parallel: no difference, just specify Ncpus
  - Number: facilitate ensembles
  - Batch: whenever, or by priority
  - Advance reservation: needed for coupling, interactive use
  - On-demand: needed for urgency
- Integrate data management for jobs into CSF
  - SAN makes it easy
  - GridFTP is somewhat simple, if crude
  - Avaki Data Grid is a possibility

UT Grid: Initial Focus on Computing
- Completion time in a compute grid is a function of:
  - Data transfer times: use NWS for network bandwidth predictions and file transfer time predictions (Rich Wolski, UCSB)
  - Queue wait times: use new software from Wolski for prediction of start of execution in batch systems
  - Application performance times: use Prophesy (Valerie Taylor) for application performance prediction
- Develop a CSF scheduling module that is data, network, and performance aware (a toy version of this cost model is sketched below)
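
The cost model behind a data-, network-, and performance-aware scheduler can be written down directly: estimated completion time = predicted transfer time + predicted queue wait + predicted run time, evaluated for each candidate cluster. The sketch below only illustrates that selection logic; the cluster names and numbers are invented stand-ins for what NWS (bandwidth), batch queue wait-time prediction, and Prophesy (application runtime) would supply, and it is not the actual CSF scheduling module.

```python
# Illustrative sketch of completion-time-aware cluster selection.
# The per-cluster values stand in for predictions that would come from
# NWS (bandwidth), queue wait-time prediction, and Prophesy (runtime);
# all numbers and cluster names here are invented for the example.

input_gb = 50.0  # size of the job's input data set

candidate_clusters = {
    "clusterA": dict(predicted_mbps=600.0, queue_wait_s=1800.0, run_time_s=7200.0),
    "clusterB": dict(predicted_mbps=80.0,  queue_wait_s=300.0,  run_time_s=6500.0),
}

def estimated_completion_s(c: dict) -> float:
    transfer_s = (input_gb * 8000.0) / c["predicted_mbps"]  # GB -> megabits
    return transfer_s + c["queue_wait_s"] + c["run_time_s"]

best = min(candidate_clusters, key=lambda name: estimated_completion_s(candidate_clusters[name]))
for name, c in candidate_clusters.items():
    print(f"{name}: ~{estimated_completion_s(c) / 3600:.2f} h estimated completion")
print(f"Schedule job on {best}")
```

In this made-up case the cluster with the faster network wins despite its longer queue wait; with a larger input data set or a slower link the trade-off can easily flip, which is exactly why the scheduler needs all three kinds of predictions.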

UT Grid: Full Service!
- UT Grid will offer a complete set of services:
  - Compute services
  - Storage services
  - Data collections services
  - Visualization services
  - Instruments services
- But this will take 2 years; focusing on compute services now

UT Grid Interfaces
- Grid User Portal
  - Hosted, built on GridPort
  - Augment developers by providing info services
  - Enable productivity by simplifying production usage
- Grid User Node
  - Hosted; software includes GridShell plus client versions of all other UT Grid software
  - Downloadable version enables configuring a local Linux box into UT Grid (eventually, Windows and Mac)

UT Grid: Logical View
- Integrate distributed TACC resources first (Globus, LSF, NWS, SRB, United Devices, GridPort)
  - TACC HPC, vis, and storage resources (actually spread across two campuses)
- Next, add other UT resources one building at a time as spokes, using the same tools and procedures
  - ICES clusters and data
  - PGE clusters and data
  - BIO instrument and cluster; GEO instrument and data
- Finally, negotiate connections between spokes for willing participants to develop a P2P grid

UT Grid: Physical View
(network diagrams, one per stage, spanning the research campus and main campus NOCs, external networks, GAATN, and the ACES switch)
- TACC systems: TACC Power4, CMS, TACC storage, TACC cluster, and TACC vis
- Add ICES resources: ICES clusters and ICES data
- Add other resources: PGE clusters and PGE data behind a PGE switch

Texas Internet Grid for Research & Education (TIGRE)
- Multi-university grid: Texas, A&M, Houston, Rice, Texas Tech
- Build-out in 2004-05
- Will integrate additional universities
- Will facilitate academic research capabilities across Texas, using Internet2 initially
- Will extend to industrial partners to foster academic/industrial collaboration on R&D

NSF TeraGrid: National Cyberinfrastructure for Computational Science
- TeraGrid is the world's largest cyberinfrastructure for computational research
- Includes NCSA, SDSC, PSC, Caltech, Argonne, Oak Ridge, Indiana, Purdue
- Massive bandwidth! Each connection is one or more 10 Gbps links!
- TACC will provide terascale computing, storage, and visualization resources
- UT will provide terascale geosciences data sets

Where Are We Now? Where Are We Going?

The Buzz Words
- Clusters, Clusters, Clusters
- Grids & Cyberinfrastructure
- Data, Data, Data

Clusters, Clusters, Clusters
- No sense in trying to make long-term predictions here
- 64-bit is going to be more important (duh), but is not yet (for most workloads)
- Evaluate options, but differences are not so great (for diverse workloads)
- Pricing is generally normalized to performance (via sales) for commodities

Grids & Cyberinfrastructure Are Coming - Really!
- The Grid is coming... eventually
- The concept of a Grid was ahead of the standards
- But we all use distributed computing anyway, and the advantages are just too big not to solve the issues
- Still...