supercomputer best practices...

15
© 2004 IBM Corporation IBM Systems and Technology Group May 11/12, 2005 David V. Gelardi VP, Industry Solutions and Proof of Concept Centers [email protected] Supercomputer Best Practices Seminar I HPC Platforms in the Industry Solutions and Proof of Concept Centers

Upload: others

Post on 30-May-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

May 11/12, 2005

David V. GelardiVP, Industry Solutions and Proof of Concept Centers [email protected]

Supercomputer Best Practices Seminar I

HPC Platforms in the Industry Solutions and Proof of Concept Centers

Page 2: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

Agenda

Industry Solutions and Proof of Concept CentersMissions

Locations

Skills and Resources Supercomputing center best practices

Performance Tuning

Leveraging Results

Page 3: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

Industry Solutions and Proof of Concept Centers

Design Center foron Demand Business

Customer BenchmarkProof of Concept

Deep ComputingCapacity on Demand

Development Center forSolution Integration

High AvailabilityCenter of Competency

High Availability

pSeries

zSeriesiSeries

xSeriesIBM TotalStorage

GRIDWebSphere DB2

SAP

Oracle

IT Architects

Project Managers

IT SpecialistsSystem Administration

Database AdministrationWAS

Tivoli

AIX

Middleware

Linux

i5/OS

z/OS

Security

SAP

Cisco

Virtualization

HPCAIX

i5/OS

z/OS

Windows

DB2

OracleNetworking

BladeCenter OpenPower

Linux Cluster

Blue Gene

Tactical, Closing Sales Driving Value, InfluencingClients

P&L CPU’s by the HourFirst of a Kind forOn Demand Technology

Solution OfferingArchitecture and Integration

STG PerformanceMarketing

eServer and TotalStorageBenchmark Planning & Marketing

20041,000+ Engagements

$1B STG Revenue7,000+ CPU’s

1.5 Petabytes Storage15 Locations WW

Page 4: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

HPC MissionsWW Customer Benchmark Centers

Provide customer demanded benchmark capability WW for eServer and TotalStorage

Include proof of concept, scaling and performanceAssist in the execution of ISV Application benchmarks

Deep Computing Capacity on Demand CentersDevelop and deploy a Capacity on Demand offering that provides clients with an

alternative means to meet their peak computational needs, do so in a manner that is complementary with traditional server and storage sales.

Provide increased Business Value to our customers through innovative On-Demand delivery mechanisms.Provide a competitive differentiator for IBM.Drive incremental revenue and profit.

Performance MarketingLead series in the creation of plans for industry benchmarks and assist in the

marketing of results (e.g. TOP500)Participate in planning with Series product marketing and development to

optimize cross-series plans and leverage cross-series opportunitiesCreate eServer and TotalStorage marketing collateral and provide customized

assistance to sales opportunities

Page 5: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

WW Benchmark & Design CentersPoughkeepsie

pSerieszSeriesxSeriesDesign CenterDCCoDHACoCLinux Clusters

MontpellierpSeriesiSerieszSeriesxSeriesDesign CenterFSS SolutionsDCCoDStorageLinux ClustersKirkland

xSeries (MS)

Rochester iSeriesDCCoDLife Sciences

Solutions

DallaspSeriesxSeries

Gaithersburg zSeriesStorage

TokyopSeriesxSeriesDesign CenterStorage

MainzStorage

BoeblingenzSeries (Linux)

Other Regional FacilitiesSeoul pSeriesSydney pSeriesSilicon Valley Lab HVWSGreenock xSeries

Examples of ProjectsDesign EngagementsHPC BenchmarksCommercial BenchmarksISV Enablement &

Benchmarking (w/R. Warren)BICoC (w/SWG)

HoustonDCCoD

BeijingpSeriesxSeriesiSerieszSeries

Page 6: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

What is a Client Benchmark

A Demand to Prove a 'Capability' to a Specific ClientOften Part of an RFP ResponseOften CompetitiveClient Sometimes Needs Help Defining RequirementsCategories of Information are Performance, Scaling and Proof of Capability, etc.Client Sets Criteria against their Data and WorkloadStrict Response Date, Usually Very Short Term

Different from...Industry Standard Benchmarks (usually done in development)TPC-C, TPC-H, SPECnnn, STREAM , Pallas, NPB, Linpack HPL etc.Application Benchmarks (usually done in ISV Enablement)SAP SD, Peoplesoft, BAAN, Siebel, etc.

Page 7: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

The Challenge: Optimizing HPC Capacity UtilizationTraditional infrastructure build-out increases in step-function phasesCompanies that build for average demand must be able to respond quickly to peak workload demands or suffer lost opportunity Companies that over-build to address peak workloads are left with over-capacity and under-utilization in business downturnIBM Deep Computing Capacity on Demand serves unfulfilled peak workload requirements

In-house Resources or Outsourced Data Center

Time

Cap

acity

UnfulfilledBusiness Peaks

UnutilizedResources

Page 8: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

Management of Resources Multiple clients access “virtual”clusters made up of compute and storage resources

Dedicated to one client at a time – dedicated custom environment or– timesliced optimized environment (no root access, job scheduling)

Some resources open to all for testing, prototyping, tuningAllocations subject to availability

Highly secure and resilient infrastructure Client components

Application software and licensesDataCustom hardware

Compute NodeStorage Nodes + Storage HWManagement Node

Client A

Client B

Clients C, D, E Open nodes

Master Mgmt NodeSwitches

VPN Router

Tape Server

Page 9: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

HPC Benchmark Systemsp5 Machines Open Powerp520 x 1 p720 x 4p550 x 2p570 x 26p595 x 3

p4 Machines PowerPCp615 x 1 JS20 x 8p630 x 2p630+ x 6p655+ x 12p690 x 1p690+ x 1p690++ x 1

Note: Quantities can vary based on active projectsAccess to Loaner equipment leveraged

Clusters32 x p575 w/ High Performance Switch8 x p575 w/ Gigabit Ethernet

16 x p710 w/ Gigabit Ethernet/Myrinet2 x Blue Gene

StorageESS x 1 DS4500 x 26SAN Switches x 11EXP 700 x 117

Pentium4x335 x 16

Intel EM64Tx336 x 36 w/Myrinet

and IBx346 x 4

Opteron250e325 x 64e326 x 68 w/ Myrinet

and IB

Xeonx345 x 26

Page 10: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

IBM DCCoD: Scale Beyond In-House HPC Limits

Client HPCInfrastructure

Virtual Private Network

Houston, TX

Poughkeepsie, NY Montpellier, France

= Variable Capacity / Variable Cost

= Fixed Capacity / Fixed Cost

4 DCCoD centersIntel® Xeon ™AMD Opteron™IBM POWER™IBM Blue Gene®

7000+ CPUs

Rochester, MNNew!Available

March 2005

Secure Internet access to supercomputing power owned and hosted by IBM enables clients to rapidly and temporarily flex up/down HPC capacity proportional to business demands - to

respond to peak workloads and capture business opportunities that would otherwise be out of reach.

IBM HPC Grid

Page 11: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

IBM Blue Gene® Capacity on Demand

Rack2,048 PowerPC® CPUs

2/8/5.6 TF512GB Memory

Flexible and convenient pay-for-use access to reserved capacity for scientists, researchers, anddevelopers - to a new family of supercomputers optimized for scalability, bandwidth, and massive data handling while consuming a fraction of the power and floor space required by today’s fastest systems.

Multi-use on demand access to …Up to 2x 1,024 dual PowerPC®processor compute nodes per rack64 IO Nodes per rackFront-End Node I/O File Server Service Node

… on a sub-rack basis

World’s fastest supercomputer!Ultra scalable performance Ultra floor space densityUltra performance per W of powerInnovative architecture and system designFamiliar programmer/user Linux-based Environments

#1 on TOP50070.72 TFLOPs

sustained

AvailableMarch 2005

Page 12: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

Skills Scientific and technical application skills

Architecture portingAlgorithmsCompilers: Fortran 77, Fortran 90, C , C++, Libraries & ToolsApplication tuningParallel Programming, message passing Performance analysis

System Administration skillsJob scheduling: Load Leveler, PBSOS skills: AIX, Red Hat, SLES distributionssystem tuning SW installs and systems management, ID managementInterconnect: High Performance Switch, Myrinet, Topspin IB, Voltaire IBDisk subsystems, GPFSTroubleshooting

Infrastructure SkillsHardware provisioningNetworkingSecurity and Access

Project Management skills

Page 13: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

HPC Typical Workloads

Cluster SizeStandalone Large

Uni

High

MPI Codes

Ope

nMP

Cod

es

Univer

sities

CAE:Nastran,STAR-CD

Petroleum:ReservoirModeling

Weather: MM5Life Sci: Blast

Petroleum:SeismicGov't: RYO

Life Sci: Charmm,

AmberCAE:Fluent,

AbaqusLife Sci:Gaussian

Kernels

Page 14: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

Performance Tuning Process

Result correctness is the basis of all tuningReference data from customerComparison of results from very different levels of compiler optimizations.

Platform analysisProfiling and analyzing code for potential improvements

Compiler optimizationPerformance libraries such as IBM ESSL and MASSExplore auto-parallelization by compiler and ESSLSMP (part of ESSL library) libraryApply hand tuning (optimization, OpenMP parallelization and/or MPI parallelization) based on the profile

Repeat, Repeat, RepeatWithin time and cost/benefit constraints

Page 15: Supercomputer Best Practices Seminartkwon/course/5315/HW/BG/10.SuperComputer-Gelardi.pdfSupercomputer Best Practices SeminarI ... Middleware Linux i5/OS z/OS Security SAP Cisco Virtualization

© 2004 IBM Corporation

IBM Systems and Technology Group

Discussion PointsMission – What is it?Planning is KeyInfrastructure is KeySecurity a NightmarePower / CoolingSkillsBroad Applications (and long lived!)No Test Systems (FOAK)SLAsChiba City Like OS/Software StackBenchmarking v. Production (DCCoD)Metrics / Cust Sat / SurveysUser InteractionRapid Turnover of Assets In/OutWhat is not 7x24x365?