The Ranger Supercomputer and its legacy


TRANSCRIPT

Page 1: The Ranger Supercomputer and its legacy

The Ranger Supercomputer and its legacy

Dan Stanzione

Texas Advanced Computing Center

The University of Texas at Austin

December 2, 2013

[email protected]

Page 2: The Ranger Supercomputer and its legacy

The Texas Advanced Computing Center: A World Leader in High Performance Computing

1,000,000x performance increase in UT computing capability in 10 years.

Ranger: 62,976 processor cores, 123 TB RAM, 579 teraflops. Fastest open-science machine in the world, 2008.

Lonestar: 23,000 processors, 44 TB RAM, shared-memory and GPU subsystems. #25 in the world, 2011.

Stampede: #7 in the world today. Roughly half a million processor cores (Intel Sandy Bridge processors plus Intel MIC coprocessors, built by Dell): >10 petaflops.

P.S. Computation improved 1,000,000x; networks only 100x.

Page 3: The Ranger Supercomputer and its legacy

NSF Cyberinfrastructure Strategic Plan, circa 2007 (much of this never happened)

• NSF Cyberinfrastructure Strategic Plan released March 2007

– Articulates importance of CI overall
– Chapters on computing, data, collaboration, and workforce development

• NSF investing in world-class computing
– Annual “Track2” HPC systems ($30M)
– Single “Track1” HPC system in 2011 ($200M)

• Complementary solicitations for software, applications, education

– Software Development for CI (SDCI)
– Strategic Technologies for CI (STCI)
– Petascale Applications (PetaApps)
– CI-Training, Education, Advancement, Mentoring (CI-TEAM)
– Cyber-enabled Discovery & Innovation (CDI) starting in 2008: $0.75B!

http://www.nsf.gov/od/oci/CI_Vision_March07.pdf

Page 4: The Ranger Supercomputer and its legacy

First NSF Track2 System: 1/2 Petaflop

• TACC selected for first NSF ‘Track2’ HPC system
– $30M system acquisition
– Sun Constellation Cluster
– AMD Opteron processors

• Project included 4 years of operations and support
– System maintenance
– User support
– Technology insertion
– Extended to 5 years

Page 5: The Ranger Supercomputer and its legacy

Ranger System Summary

• Compute power: 579 teraflops (a quick peak-flops check follows this list)
– 3,936 Sun four-socket blades
– 15,744 AMD Opteron “Barcelona” processors
– Quad-core, 2.0 GHz, four flops/cycle (dual pipelines)

• Memory: 125 terabytes
– 2 GB/core, 32 GB/node
– 132 GB/s aggregate bandwidth

• Disk subsystem: 1.7 petabytes
– 72 Sun x4500 “Thumper” I/O servers, 24 TB each
– ~72 GB/sec total aggregate bandwidth
– 1 PB in largest /work filesystem

• Interconnect: 10 Gbps bandwidth, 2.3 µs latency
– Sun InfiniBand-based switches (2) with 3,456 ports each
– Full non-blocking 7-stage Clos fabric
– Mellanox ConnectX IB cards
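As a sanity check on the peak figure: peak flops = cores × clock × flops/cycle. A minimal sketch in Python (at the 2.0 GHz launch clock this yields ~504 teraflops; the quoted 579 teraflops matches the 2.3 GHz parts Ranger’s processors were later upgraded to):

    # Peak-flops arithmetic for Ranger (illustrative only).
    blades = 3936                 # four-socket Sun blades
    sockets_per_blade = 4
    cores_per_socket = 4          # quad-core Barcelona
    flops_per_cycle = 4           # dual floating-point pipelines

    cores = blades * sockets_per_blade * cores_per_socket   # 62,976 cores
    for ghz in (2.0, 2.3):
        tflops = cores * ghz * flops_per_cycle / 1000.0
        print(f"{ghz} GHz -> {tflops:,.0f} teraflops peak")

    # 2.0 GHz -> 504 teraflops (as first deployed)
    # 2.3 GHz -> 579 teraflops (matching the figure quoted above)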

Page 6: The Ranger Supercomputer and its legacy

Ranger I/O Subsystem

• Disk Object Storage Servers (OSS) based on Sun x4500 “Thumper” servers
– Each x4500:
  • 48 SATA II 500 GB drives (24 TB total)
  • Running internal software RAID
  • Dual-socket, dual-core Opterons @ 2.6 GHz
– Downside: these nodes have PCI-X, and raw I/O bandwidth can exceed a single 4X InfiniBand HCA, so we use dual PCI-X HCAs
– 72 servers total: 1.7 PB raw storage

• Metadata Servers (MDS) based on Sun Fire x4600s
– MDS is Fibre Channel-connected to 9 TB of Flexline storage

• Target performance (see the quick arithmetic after this list)
– Aggregate bandwidth: 70+ GB/sec
– To largest $WORK filesystem: ~40 GB/sec
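Those targets are consistent with simple multiplication over the server count; a rough sketch (the ~1 GB/s sustained rate per server is an assumption inferred from the stated totals, not a number from the slide):

    # Rough I/O subsystem arithmetic (illustrative; per-server
    # throughput is an assumed value inferred from the stated totals).
    oss_servers = 72
    gbps_per_oss = 1.0            # ~1 GB/s sustained per Thumper (assumed)

    aggregate = oss_servers * gbps_per_oss
    print(f"aggregate: ~{aggregate:.0f} GB/s")    # ~72 GB/s, as quoted

    drives = oss_servers * 48                     # 48 drives per server
    raw_tb = drives * 0.5                         # 500 GB per drive
    print(f"raw capacity: {raw_tb/1000:.1f} PB")  # 1.7 PB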

Page 7: The Ranger Supercomputer and its legacy

Ranger Space, Power, and Cooling

• Total project power: 3.4 MW
• System: 2.4 MW
– 96 racks: 82 compute, 12 support, plus 2 switches
– 116 APC In-Row cooling units
– 2,054 sq ft total footprint (~4,500 sq ft including PDUs)

• Cooling: ~1 MW
– In-row units fed by three 350-ton chillers (N+1)
– Enclosed hot aisles by APC
– Supplemental 280 tons of cooling from CRAC units

• Observations (per-rack arithmetic follows this list):
– Space less an issue than power
– Cooling > 25 kW per rack is difficult
– Power distribution a challenge: almost 1,400 circuits
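The 25 kW observation follows from the numbers above; a minimal sketch (dividing system power over the compute racks alone is a simplification, since support racks and switches also draw power):

    # Per-rack power arithmetic (illustrative; a rough upper bound,
    # since support racks and switches also draw part of the 2.4 MW).
    system_kw = 2400
    compute_racks = 82

    print(f"~{system_kw / compute_racks:.0f} kW per compute rack")
    # ~29 kW, so cooling capability beyond 25 kW/rack was a real constraint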

Page 8: The Ranger Supercomputer and its legacy

Interconnect Architecture: Ranger InfiniBand Topology

[Figure: Ranger InfiniBand topology. Rows of NEMs (Network Express Modules; 78 elided in the original diagram) connect through two “Magnum” switches via 12x InfiniBand cables, each combining three 4x cables.]
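The 3,456-port count quoted for each Magnum switch on the system summary slide is exactly what a folded Clos (fat tree) built from 24-port switch elements provides at three levels. A minimal sketch of that relationship (the radix-24 building block is an assumption consistent with the port count, not a figure from the slides):

    # Non-blocking port count of a folded Clos (fat tree) built from
    # k-port switch elements with l levels: 2 * (k/2)**l end ports.
    # (Radix-24 elements are an assumption; the result matches the
    # 3,456-port Magnum switch quoted on the system summary slide.)
    def clos_ports(k: int, levels: int) -> int:
        return 2 * (k // 2) ** levels

    print(clos_ports(24, 3))   # 3456 ports; three switch levels unfold to
                               # 5 stages, and the NEM leaves make it the
                               # 7-stage fabric described earlier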

Page 9: The Ranger Supercomputer and its legacy

Who Used Ranger?

• On Ranger alone, TACC has ~6,000 users who have run about three million simulations over the last four years:
– UT-Austin
– UT System (through UT Research Cyberinfrastructure)
– Texas A&M and Texas Tech (through the Lonestar Partnership)
– Industry (through the STAR program)
– Users from around the nation and world (through NSF’s TeraGrid/XSEDE)

Page 10: The Ranger Supercomputer and its legacy

Japanese Earthquake Simulation

• Simulation of the seismic wave from the earthquake in Japan, propagating through an earth model

• Researchers using TACC’s Ranger supercomputer have modeled the processes responsible for continental drift and plate tectonics in greater detail than any previous simulation.

• Modeling the propagation of seismic waves through the earth is an essential first step to inferring the structure of earth's interior.

• This research is led by Omar Ghattas at The University of Texas at Austin.

Page 11: The Ranger Supercomputer and its legacy

Studying H1N1 (“Swine Flu”)

• Researchers at the University of Illinois and the University of Utah used Ranger to simulate the molecular dynamics of antiviral drugs interacting with different strains of flu.

• They discovered how commercial medications reach the “binding pocket” – and why Tamiflu wasn’t working on the new swine flu strain.

• UT researcher Lauren Meyers also used Lonestar to predict the best course of action in the event of an outbreak.

Image produced by Brandt Westing, TACC

Page 12: The Ranger Supercomputer and its legacy

Science at the Center of the Storm

Using the Ranger supercomputer at the Texas Advanced Computing Center, National Oceanic and Atmospheric Administration (NOAA) scientists and their university colleagues tracked Hurricanes Ike and Gustav during the recent storms.

The real-time, high-resolution global and mesoscale (regional) weather predictions they produced used up to 40,000 processing cores at once (nearly two-thirds of Ranger) and included, for the first time, data streamed directly from NOAA planes inside the storm. The forecasts also took advantage of ensemble modeling, a method of prediction that runs dozens of simulations with slightly different starting points in order to determine the most likely path and intensity forecasts (a toy sketch of the idea follows below).

This new method and workflow were only possible because of the massive parallel processing power that TeraGrid resources can devote to complex scientific problems and the interagency collaboration that brought scientists, resources, and infrastructure together seamlessly.
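Ensemble forecasting is conceptually simple: perturb the initial conditions, run the model many times, and summarize the spread. A toy sketch of the idea (the model and perturbation below are invented placeholders, not NOAA’s actual code):

    # Toy ensemble-forecast sketch: perturb initial conditions, run a
    # stand-in model many times, and summarize the spread of outcomes.
    # The "model" below is a made-up placeholder, not a real forecast code.
    import random

    def toy_model(initial_intensity: float) -> float:
        # Stand-in for an expensive weather simulation.
        return initial_intensity * random.uniform(0.9, 1.2)

    observed = 100.0                 # nominal initial storm intensity
    ensemble = [toy_model(observed + random.gauss(0, 2.0))  # perturbed starts
                for _ in range(48)]  # dozens of ensemble members

    mean = sum(ensemble) / len(ensemble)
    spread = max(ensemble) - min(ensemble)
    print(f"most likely intensity ~{mean:.1f}, spread {spread:.1f}")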

A simulation of Hurricane Ike on TACC's Ranger supercomputer shortly before the storm made landfall in Galveston, Texas, on Sept. 13, 2008. Credit: NOAA; Bill Barth, John Cazes, Greg P. Johnson, Romy Schneider and Karl Schulz, TACC

Page 13: The Ranger Supercomputer and its legacy

Large Eddy Simulation of the Near-Nozzle Region of Jets Exhausting from Chevron Nozzles

Noise from jet engines causes hearing damage in the military and angers communities near airports. With funding from NASA, Ali Uzun (Florida State University) is using Ranger to simulate new exhaust designs that may significantly reduce jet noise.

One way to minimize jet noise is to modify the turbulent mixing process using special control devices, such as chevrons (triangle-shaped protrusions at the end of the nozzle). Since noise is a by-product of the turbulent mixing of jet exhaust with ambient air, one can reduce the noise by modifying the mixing process.

To determine how a given design would react to high-speed jet exhaust, Uzun first created a computer model of the chevron-shaped exhaust nozzle. This was then integrated into a parallel simulation code that calculated the turbulence of the air as exhaust was forced through the nozzle. Uzun’s simulations had unprecedented resolution and detail. They proved that computational simulations can match experimental results, while supplying much more detailed information about minute physical processes.

A two-dimensional cut through the jet flow, visualizing the turbulence in the jet and the resulting noise radiating away from it.

Page 14: The Ranger Supercomputer and its legacy

Ranger Project Costs

• NSF award: $59M
– Purchases full system, plus initial test equipment
– Includes 4 years of system maintenance
– Covers 4 years of operations and scientific support

• UT Austin providing power: $1M/year
• UT Austin upgraded data center infrastructure: $10-15M
• TACC upgrading storage archival system: $1M
• Total cost: $75-80M
– Thus, system cost > $50K per operational day (see the arithmetic after this list)
– Must enable users to conduct world-class science every day!
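That per-day figure is simple division; a minimal sketch, assuming the four funded years of operations quoted above:

    # Cost-per-operational-day arithmetic (illustrative).
    total_cost = 75e6               # low end of the $75-80M project total
    operational_days = 4 * 365      # four funded years of operations

    print(f"~${total_cost / operational_days / 1000:.0f}K per day")  # ~$51K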

Page 15: The Ranger Supercomputer and its legacy

Ranger-Era TeraGrid HPC Systems

Page 16: The Ranger Supercomputer and its legacy

Big Deployments Always Have Challenges

• We’ve gotten extremely good at bringing in large deployments on time, but it is not an easy process.
– It is impossible to rely solely on vendors; it must be a cooperative process.

• Ranger slipped several months and was changed from the original proposed plan:
– Original two-phase deployment scrapped in favor of a single, larger phase.
– Several “early product” design flaws detected and corrected through the course of the project.

Page 17: The Ranger Supercomputer and its legacy

Cable Manufacturing Defect

Illustration of example problematic InfiniBand 12X cables resulting from kinks imposed by the initial manufacturing process: (left) dismantled cable with inner foil removed and (right) cracked twinax as seen through a microscope.

Page 18: The Ranger Supercomputer and its legacy

Ranger: Circa 2007

Page 19: The Ranger Supercomputer and its legacy

Ranger Lives On

• 20 Ranger cabinets have been sent to CHPC for distribution to South African universities.
• 16 more racks have been shipped to Tanzania.
• 4 are awaiting shipment to Botswana.
• Other components are at Texas A&M, Baylor College of Medicine, and ARL (a UT classified facility).
• The original Ranger user community has now migrated to Stampede.
• After a remarkably successful production run, Ranger will continue to deliver science and educate HPC researchers around the world.

Page 20: The Ranger Supercomputer and its legacy

Ongoing Partnerships

• We at TACC are eager to use Ranger as a basis for building sustained and meaningful collaborations.

• Hardware is a start (and there is always the *next* system), but training, staff development, data sharing, etc. provide new opportunities as well.