Cray® Sonexion® 3000 Storage System The Cray® Sonexion® 3000 scale-out storage system combines powerful servers with the Lustre® parallel file system and management software in a modular storage product that scales efficiently, reduces TCO and comes performance optimized end to end by Cray.


Table of Contents

• Introduction
• Cray Sonexion 3000 Scale-Out System Overview
• System Architecture
• TCO Comparison
• Hardware Modules
  o Metadata Units
  o Scalable Storage Unit (SSU) Hardware
  o Expansion Storage Units (ESUs)
  o Sonexion Racks
• End-to-End System Architecture
• System Software
• Grid RAID
• Reliability and Serviceability
• Power Efficiency and Cooling
• Service and Support
• Cray System Snapshot Analyzer
• Specifications


Introduction

The Cray Sonexion scale-out Lustre system provides the cornerstone of Cray’s performance-engineered storage solutions for big data and supercomputing. The Sonexion 3000 system improves performance density by up to 40 percent over the competition — delivering more real-world throughput per rack unit than server-attached solutions. In a performance-optimized configuration, this translates to nearly 100 GB/s of sustained performance in a single rack — and over 1.7 TB/s sustained performance in a single file system.

Built by Cray, the global leader in storage performance and I/O for supercomputing, the Sonexion system’s modular, pre-integrated and compact design keeps costs low while delivering the right performance for analytics and compute clusters and supercomputers of all types. Performance and capacity scale efficiently in modular building blocks, reducing the number of hard-disk drives needed to achieve sustained performance at scale. Performance is optimized end to end, from the compute clients to the network to the storage subsystem, based on the application workload. Overall, TCO is reduced by up to 25 percent — using fewer components and racks than the competition to achieve your desired performance goals.

Data protection is provided through Grid RAID, a declustered parity form of data protection that speeds rebuild times three and a half times over traditional RAID while maintaining Lustre performance. Management tools, including the Cray System Snapshot Analyzer (SSA), the Cray Sonexion System Manager (CSSM) and new management diagnostic infrastructure, provide comprehensive health monitoring and management capabilities, essential to maintaining and supporting Lustre at scale.

Cray Sonexion 3000 Scale-Out System Overview

The Sonexion 3000 scale-out storage system embeds everything needed to deploy, scale and manage large-scale parallel storage for big data and supercomputing. Each module combines software and hardware in an appliance-like design to scale efficiently, simplify deployment and management, and reduce the number of physical management points and data center footprint by, in some cases, 40 percent over the competition. The Cray Sonexion 3000 scale-out system is designed specifically for high-throughput big data and supercomputing workloads — for any x86 Linux® computing environment.

With an increasing need for faster, more productive high performance computing (HPC) storage systems that are easier to manage, the Sonexion system — as deployed and integrated by Cray — gives users a complete Lustre solution that scales efficiently, reduces TCO, protects data and is performance engineered end to end. The system combines powerful servers with the Lustre parallel file system and management software in a modular storage product that is built, thoroughly tested, shipped and supported as a complete solution.


The Sonexion system simplifies deploying, managing and scaling storage as customers’ needs change over time. It includes three structural sets of components: high-performance scalable storage units (SSUs); a Cray management unit (CMU), comprising a Sonexion management unit (SMU) and a metadata management unit (MMU); and network-ready cabinets that include InfiniBand switch connectivity. Customers’ varying capacity and bandwidth requirements can be met and upgraded on the fly. The Sonexion system is shipped as an integrated solution with all management and Lustre software installed, and Cray provides global support and services. Most popular Linux client operating systems for x86 environments are supported (including Red Hat Enterprise Linux, SuSE Linux Enterprise, CentOS and other distributions).

System Architecture

The Sonexion 3000 scale-out storage system, as described above, packages everything needed to deploy, scale and manage large-scale parallel storage for big data and supercomputing into appliance-like modules. The traditional alternative is a “component-based” or integrate-your-own approach, in which a Lustre solution is assembled from separately sourced servers, storage arrays, networks and software. Challenges to this component-based approach include:

• Planning and deployment times average weeks to months. These costs are often hidden in professional services which, at $3,000 per day, can add up to $60,000 for one month of protracted services.
• Device- and software-level incompatibilities add time, complexity, effort and risk when tuning and configuring the system to meet customers’ production requirements and expectations.
• Additional professional services are often required to cope with the complexity of installing, configuring and managing storage component sprawl — from HBAs and cables to host-to-logical-unit-number (LUN) mappings and RAID group configuration.
• Professional services are often required to cope with the complexity of installing, configuring and managing server component sprawl — from installing and deploying Lustre across various compute systems, to configuring InfiniBand, to troubleshooting HBA compatibility.
• The cost and risk of sourcing, integrating, testing and supporting components from multiple vendors often leads to finger pointing among those vendors.
• Developing and honing in-house Lustre expertise to offset professional services increases dependence on a few key individuals within the organization.
• Power and cooling costs accrue for the extra racks of servers and components required by controller-based solutions.


Cray reduces TCO by:

• Reducing deployment time from months to days at no charge to the customer
• Reducing the total hardware and data center footprint by up to 40 percent compared to the competition
• Providing a single point of support for everything — all hardware and software — which optionally includes compute capabilities
• Enabling customer agility — accommodating unpredictable growth rates or real-time changes
• Reducing power consumption by up to 30 percent per year over the competition

The Sonexion 3000 system alleviates these issues by providing a single scalable building block that unifies the file system and architecture. This rack-based storage solution is built on high-performance scalable storage units (SSUs) with dense capacity and processing technology that provides up to 3.36 PB of usable file system capacity in a single seven-SSU rack using 8 TB disk drives. This dense capacity takes up about half the footprint of traditional file system data-storage solutions.

The SSU is the heart of the system’s scale-out storage architecture. The SSU consolidates and integrates traditional block storage, network and file system components. Every SSU balances capacity with performance. Moreover, each SSU comes pre-integrated with everything needed to rapidly deploy and scale a Lustre file system — hardware and software. SSUs may be configured to deliver an unmatched level of performance that scales almost linearly with minimal storage or network degradation.

The system scales capacity from terabytes to over 50 PB in a single file system. In a capacity-optimized configuration using 8 TB hard drives, the Sonexion system scales from 480 TB of usable capacity per SSU to nearly 3.4 PB of usable capacity in a single 42U (seven-SSU) expansion rack. In this configuration, sustained performance scales from 9.5 GB/s per SSU to 66.5 GB/s in one expansion rack (and up to 96 GB/s at theoretical peak). In a performance-optimized configuration using 4 TB 10,000 RPM drives, the system scales sustained performance from 14 GB/s in a single SSU to 98 GB/s in a single expansion rack (and up to 112 GB/s at theoretical peak) to over 1.7 TB/s in a single file system. The result is more bandwidth per rack unit — and per gigabyte of capacity — than any other Lustre implementation.
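
The scaling arithmetic above follows directly from the per-SSU figures quoted in this paper. As a quick illustration, the following Python sketch reproduces the rack-level capacity and bandwidth numbers; the constants and function names are illustrative only and are not part of any Cray tooling.

# Illustrative sizing arithmetic based on the per-SSU figures quoted above.
# Constants reflect the capacity-optimized (8 TB drives) and performance-
# optimized (4 TB 10K RPM drives) configurations described in this paper.

USABLE_TB_PER_SSU_8TB = 480      # usable capacity per SSU with 8 TB drives
SUSTAINED_GBS_PER_SSU_8TB = 9.5  # sustained GB/s per SSU with 7.2K RPM drives
SUSTAINED_GBS_PER_SSU_10K = 14   # sustained GB/s per SSU with 10K RPM drives
SSUS_PER_EXPANSION_RACK = 7

def rack_capacity_pb(ssus: int, usable_tb_per_ssu: float) -> float:
    """Usable file system capacity, in PB, for a rack of `ssus` SSUs."""
    return ssus * usable_tb_per_ssu / 1000.0

def rack_bandwidth_gbs(ssus: int, gbs_per_ssu: float) -> float:
    """Sustained IOR bandwidth, in GB/s, for a rack of `ssus` SSUs."""
    return ssus * gbs_per_ssu

# Capacity-optimized expansion rack: ~3.36 PB usable and ~66.5 GB/s sustained.
print(rack_capacity_pb(SSUS_PER_EXPANSION_RACK, USABLE_TB_PER_SSU_8TB))        # 3.36
print(rack_bandwidth_gbs(SSUS_PER_EXPANSION_RACK, SUSTAINED_GBS_PER_SSU_8TB))  # 66.5

# Performance-optimized expansion rack: ~98 GB/s sustained.
print(rack_bandwidth_gbs(SSUS_PER_EXPANSION_RACK, SUSTAINED_GBS_PER_SSU_10K))  # 98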


TCO Comparison

In a sample configuration where the customer’s goal is to achieve 1 TB/s of sustained performance and 10 PB of storage capacity, we compare two “appliance-like” models. In one column is the Sonexion 3000 system with the latest interconnect, with Lustre running directly on dedicated hardware inside the SSU. In the other column is a virtualized model, in which Lustre runs in virtual machines on a hypervisor and is embedded in a high-performance monolithic controller.

Metric | Sonexion 3000 | Virtualized Lustre Embedded in Controller
Number of scalable units (SSUs or arrays) | 72 | 25 embedded (400 spindles each)
Number of disks | 5,904 | 10,000
Size of disk | 4 TB 10K RPM | 4 TB
Number of racks | 11 (42U) | 18 (50U)
Power draw (kW) | 165 | 187
Actual capacity (PB) | 17.28 | 32
Actual bandwidth (GB/s) | 1,000 | 1,008

Both the CAPEX and OPEX of the system need to be accounted for. Some of the issues relating to CAPEX are:

• Total hardware footprint – measured in both rack units and data center floor space. In this case, to achieve 1 TB/s, the Sonexion system reduces the total number of drives and racks used by about 40 percent.

Some of the issues relating to OPEX are:

• Total number of physical components to be managed — disk drives, cables, HBAs, servers, switches and power supplies

• Total number of logical components to be managed — virtual machine images, host mappings, RAID groups, LUNs, etc.

• Total labor in hours required to plan, design, configure, deploy, and manage or operate the system. This can be measured by the total hours required to deliver and operate the system over a given multi-year period of, for example, three years.

• Total power, cooling and hosting cost. This can be measured by the kWh, BTUs and cost per square foot (also over a given multi-year period of, for example, three years).

• Power and cooling: In the above comparison, based on a metered cost of 10 cents per kilowatt-hour, the Sonexion system saves about 12 percent in power and cooling costs alone in one year compared to the virtualized model.
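
As a rough cross-check of that savings figure, the following sketch computes annual electricity cost from the power-draw numbers in the TCO table, assuming continuous operation and a rate of $0.10 per kWh; the helper is illustrative only, and cooling cost is assumed to scale with power draw.

# Rough annual power-cost comparison using the power-draw figures from the
# TCO table above. Assumes continuous (24x7) operation and $0.10 per kWh;
# cooling costs are assumed to scale with power draw.

RATE_PER_KWH = 0.10        # assumed electricity rate, USD per kWh
HOURS_PER_YEAR = 24 * 365

def annual_power_cost(draw_kw: float) -> float:
    """Annual electricity cost in USD for a constant draw in kilowatts."""
    return draw_kw * HOURS_PER_YEAR * RATE_PER_KWH

sonexion = annual_power_cost(165)     # ~$144,540 per year
virtualized = annual_power_cost(187)  # ~$163,812 per year

saving = 1 - sonexion / virtualized
print(f"Relative saving: {saving:.1%}")  # ~11.8%, i.e. about 12 percent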


Hardware Modules

The Sonexion 3000 system is easy to configure and can be expanded as customers’ storage and performance needs increase. The system is built from three major components:

• A Cray management unit (CMU), comprised of:
  o Sonexion management unit (SMU)
  o Metadata management unit (MMU)
• High-performance scalable storage units (SSUs)
• Network-ready cabinets, including gigabit management paths, EDR InfiniBand and optional 100/40 gigabit Ethernet connectivity

The system comes factory integrated and preconfigured, including all hardware and software. Once the solution is on site, Cray provides everything required to deploy, integrate and optimize it — from systems architecture to global support and services. Cray supports Sonexion 3000 data storage systems for Lustre clients on both Cray and non-Cray computer systems.

The Cray Sonexion 3000 scale-out storage system is a complete hardware and software solution housed in network-ready 42U racks. The first rack is called the base rack, and additional racks are called expansion racks. Each file system is configured separately using its own base rack and expansion racks. The maximum number of racks in a single file system is 29. The base rack contains switches, the SMU, the MMU and up to six SSUs. Expansion racks can be added to provide additional SSUs, with up to seven SSUs per expansion rack.



The system rack includes two 36-port enhanced data rate (EDR) InfiniBand switches to manage I/O traffic and provide network redundancy with client systems. The racks also contain two 24-port or 48-port gigabit Ethernet management network switches. The hardware architecture consists of a preconfigured, rack-level storage system that can be expanded easily using modular storage building blocks. The principal hardware components include:

• A 42U rack containing power supplies, cabling and switches
• An SSU containing Lustre Object Storage Servers (OSS) and Object Storage Target (OST) drives
• A metadata management unit (MMU) containing the Lustre Metadata Server (MDS), the Lustre Management Server (MGS) and Metadata Target (MDT) drives
• An SMU with a high availability (HA) pair of embedded servers for file system management, boot and storage
• Two 36-port EDR InfiniBand or 100/40 GigE network fabric switches
• Two gigabit Ethernet management switches

The system’s configuration guidelines include:

• Every file system has a base rack containing one SMU, one MMU and from one to six SSUs.
• Expansion racks contain one to seven SSUs, increasing the capacity and performance of the Lustre file system. The actual capacity increase of the file system is defined by the capacity of the disk drives in the additional SSUs. The disks must be the same size as the disk drives in the existing SSUs.
• Each 42U rack is configured with domestic or international power and can be installed with optional water-cooled doors, along with an optional associated coolant conditioning and distribution unit, if needed.

All components are managed via a web-based Cray Sonexion System Manager (CSSM), which provides both graphical and command-line interfaces.

Metadata Units

The Cray Sonexion 3000 scale-out storage system differs from its predecessor, the Sonexion 2000 data storage system, which had a 4U quad MMU that contained the system management, the Lustre MDS/MGS server nodes, the metadata target (MDT), and CSSM data and logs. The Sonexion 3000 system instead has a separate CMU that includes an SMU and an MMU. Both the SMU and MMU servers are embedded application controllers (EACs): dedicated hardware CPU modules that use the storage bridge bay (SBB) form factor so they plug directly into the enclosure. Each 2U24 enclosure can house two of these servers in an HA configuration plus up to 24 2.5-inch HDDs or SSDs for storage.


The SMU provides the two management servers (EAC), which are configured in an HA pair. Should the primary management node fail, the secondary management node will automatically take over. The management servers use the following storage components:

• A RAID 10 array (four 10K 2.5-inch SAS HDDs in 2+2) for all management data storage and logging

• A RAID 1 array (two 10K SAS HDDs in 1+1) for the NFS root volume, which is hosted by the secondary management node

• 2 x RAID 1 arrays (four 2.5-inch 15K SAS HDDs in two 1+1 pairs) for dedicated local database transaction logging

In addition, the management server pair provides the following:

• Cray Sonexion System Manager (CSSM)
• CSSM web server (HTTPS)
• Boot services (DHCP, NFS and PXEBOOT servers)
• InfiniBand subnet manager, if needed
• Automatic failover services for the two management servers (failback is manual)

The MMU also includes two servers (EACs) which, in this case, are used as metadata servers (MDSs). A difference between the Sonexion 3000 system and the Sonexion 2000 system is that the two-server MMU uses the same hardware used for the distributed namespace (DNE) feature. A base MMU therefore provides not only the basic MDS functionality for MDS0, but also a second server that can be used in an active-active fashion when implementing the DNE feature. The MDS servers in the MMU (and the DNE MMU) are installed in failover pairs, so if one MDS server fails, its enclosure partner will take over the MDS and its associated MDT services. Each MMU contains:

• Two EAC Lustre MDS server nodes
• Storage components:
  o Two RAID 10 (5+5) arrays configured as Lustre MDTs
  o Two HDDs available as global hot spares
  o MDT0 in the base MMU, configured with the Lustre MGT
  o Use of Lustre DNE Phase 1 is required in order to make use of MDTs beyond MDT0

MDS node failover happens automatically, but failback requires manual intervention.


Figure 1. Front and rear view of the 2U24 used for the SMU and MMU

Scalable Storage Unit (SSU) Hardware

The Sonexion 3000 system’s core building block is the SSU. The SSU consolidates and integrates traditional block storage, network and file system components. Every SSU is a balanced performance building block designed to run Lustre according to best practices, such as incorporating distributed hot spare drives and a mirrored solid-state disk that helps accelerate file system journaling. SSUs may be configured to deliver an unmatched level of performance that scales almost linearly with minimal storage or network degradation. Within a single file system, all SSUs are configured with identical hardware and software components.

Each SSU contains two embedded SBB form factor servers and an 84-bay disk drive enclosure. The SBB servers run the Lustre OSS software and have an EDR IB interface for Lustre data movement and a GigE interface for system management. The IB interface connects to a top-of-rack IB switch. The SBB servers run in an active/active configuration. The OSSs are also connected in a shared, redundant mode, via a 6 Gb/s SAS midplane (12 Gb/s SAS in the future with the Gen3 chassis), to the disk drive enclosure, where 82 dual-ported 3.5-inch 7.2K RPM or 10K HPC SAS disk drives are configured into two declustered RAID groups known as Grid RAID, and two SSDs are configured as a (1+1) RAID 1 array for file system journals.

Figure 2. SSU drawers, front and rear views


The OSSs run Linux. Each module has its own dedicated CPU, memory, network and storage connectivity. Each SSU delivers 9 to 10 GB/s of sustained read or write bandwidth over IB, measured by a large sequential I/O, fixed-time IOR benchmark using the 7.2K RPM drives, and 10.5 GB/s using the 10K RPM HPC drives (a future 10K RPM configuration with the Gen3 chassis is rated at 14 GB/s). A rack with six SSUs will sustain over 54 GB/s of IOR bandwidth (63 GB/s in a seven-SSU expansion configuration) using the 7.2K RPM drives, and 63 GB/s with six SSUs (73.5 GB/s with seven SSUs) using the 10K RPM HPC SAS drives. In the future, using the 10K RPM HPC SAS drives and the Gen3 12 Gb/s SAS enclosure, a six-SSU base rack can sustain 84 GB/s — and a seven-SSU expansion rack can sustain 98 GB/s.

When SSUs are configured with 160 drives each, a rack can have, at most, three SSUs. This is because an expansion storage unit, or ESU, is attached to each SSU. From the front of the rack, the ESU has the same appearance as an SSU. From the back, the ESU can be identified by the lack of a connection to the IB switch (or the 100/40 GigE switch) at the top of the rack.

SSUs are comprised of:

• A high-density 5U 84-slot base enclosure with:
  o Two trays, each containing 42 HDDs with a 6 Gb/s SAS interconnect
  o 82 dual-ported 3.5-inch SAS drives: either 4 TB, 6 TB or 8 TB 7.2K RPM disk drives, or 4 TB 10K RPM disk drives
  o HDDs providing 236 TB, 354 TB or 472 TB of usable storage capacity per SSU in a declustered RAID format (Grid RAID)
  o Two dual-ported SSDs
• Two HA embedded Lustre OSSs per SSU providing:
  o SBB 2.0 form factor controller modules
  o 7 GB/s sustained IOR read or write bandwidth (with EDR IB or 100/40 GigE) using 10K RPM drives
  o One QSFP data cable per OSS to each rack switch
• Data protection/integrity using Grid RAID with:
  o Two OSTs, each consisting of 41 NL-SAS HDDs with Grid RAID (declustered parity for fast rebuilds and RAID 6 data protection, i.e., protection from double drive failure)
  o One OST per OSS
  o Two OSSs per SSU
  o 2 x SSD OST RAID 1 mirrored disks for Lustre journaling

Expansion Storage Units (ESUs)

ESUs increase system capacity by adding additional OSTs. ESUs are based on the same 5U84 enclosure used for SSUs, but the OSSs are replaced with EBOD (SAS expansion) modules. Each ESU attaches to an adjacent SSU using SAS cables. All the OSS and OST components in a Sonexion system must be configured identically. If an ESU is attached to one SSU, then an ESU must be attached to all SSUs in the system. In addition, the capacity of the individual drives in all SSUs and ESUs must match.


Figure 3. SAS expansion (EBOD) module in an ESU

The configuration of the disks in an ESU is 82 disks arranged in a Grid RAID array.
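
The symmetry rules above (an ESU on every SSU or on none, and matching drive capacities throughout) lend themselves to a simple consistency check. The following Python sketch illustrates those rules; the data structures and function names are hypothetical and are not part of any Sonexion tooling.

# Illustrative consistency check for the SSU/ESU configuration rules described
# above: all drives must be the same capacity, and if any SSU has an ESU
# attached then every SSU must have one. This is a sketch, not Sonexion tooling.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class StorageUnit:
    drive_capacity_tb: int                        # capacity of the 82 data drives in this SSU
    esu_drive_capacity_tb: Optional[int] = None   # None if no ESU is attached

def validate_config(ssus: List[StorageUnit]) -> List[str]:
    errors = []
    capacities = {u.drive_capacity_tb for u in ssus}
    if len(capacities) > 1:
        errors.append(f"SSU drive capacities differ: {sorted(capacities)}")
    with_esu = [u for u in ssus if u.esu_drive_capacity_tb is not None]
    if with_esu and len(with_esu) != len(ssus):
        errors.append("If any SSU has an ESU, every SSU must have one")
    for u in with_esu:
        if u.esu_drive_capacity_tb != u.drive_capacity_tb:
            errors.append("ESU drive capacity must match its SSU")
    return errors

# Example: one SSU is missing its ESU, so the check reports an error.
print(validate_config([StorageUnit(8, 8), StorageUnit(8, None)]))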

Sonexion Racks

Two rack types are used in a Cray Sonexion configuration: the base rack and the expansion rack. Every Sonexion file system must have a base rack. Expansion racks are optional and are used for expanding the capacity and performance of the file system.

From the top down, the primary components of a Sonexion system rack include:

• Two GigE management switches
• Two InfiniBand switches
• One Cray Management Unit (CMU)
• SSUs — up to six in a base rack, up to seven in expansion racks
• Two power distribution units (PDUs) — on the left and right sides
• Service panel
• Optional: additional DNE unit (commonly called an ADU MMU)

Figure 4 shows a Sonexion storage system that includes the following:

• A base rack with dual-management Ethernet and EDR IB switches, SMU, MMU and six SSUs
• An expansion rack with dual-management Ethernet and EDR IB switches, ADU and seven SSUs
• A second expansion rack with dual-management Ethernet and EDR IB switches and seven SSUs


Figure 4. Sonexion base and expansion racks

Each system rack contains two PDUs, one on the left side and one on the right. The type of PDU installed depends on the type of power available at the customer site. Each component with two (N+1) power supplies is connected to both PDUs. This provides enhanced resiliency to the loss of power in either PDU.

End-to-End System Architecture

Each Sonexion system contains two 36-port EDR InfiniBand switches, allowing for multiple client connectivity methods. The system supports communication with Lustre 1.8.6 and higher clients. Non-Cray systems are also supported with a variety of operating systems and connection topologies. The Sonexion system supports either Ethernet or InfiniBand connectivity to the following client platforms:

• Any x86 Linux cluster running Red Hat Enterprise Linux (RHEL), SuSE Linux Enterprise Server (SLES), CentOS or the Cray Linux Environment (CLE)

• Cray® CS™ cluster systems, including the CS-400™ and CS-Storm™
• Cray® Urika®-GX agile analytics platform
• Cray® XC™ series supercomputers


A client or an LNET router node, similar to that used on Cray® XC40™ compute systems, can be connected directly to one of the Sonexion system’s two top-of-rack (TOR) switches, or the TOR switches can be connected to a director-class core switch that provides client access. At this point, the client can access both the MMU and the SSUs through the InfiniBand fabric. Optional 100/40 GigE connectivity is also available on request.

System Software

The Sonexion 3000 system runs Lustre 2.5 server software in a standard Linux environment (Scientific Linux 6.5 operating system). The file system is fully integrated with the Cray Sonexion System Manager (CSSM), the hardware unit management software (GEM) and the RAID layers in the stack.

Figure 5. Cray Sonexion 3000 software stack

The CSSM provides a single-pane-of-glass view of the system infrastructure. It includes a browser-based GUI that simplifies system configuration and provides consolidated management and control of the entire storage system. Additionally, the CSSM provides distributed component services to manage and monitor system hardware and software. The CSSM includes interactive wizards to guide users through configuration tasks and node provisioning. Administrators can use the GUI to effectively manage the storage environment by:

• Starting and stopping file systems
• Managing node failover
• Monitoring node status

CSSM elevates the experience of controlling and managing a Lustre file system to a level associated with enterprise file systems. CSSM provides status and control of all system components — storage hardware, RAID controller, operating system and Lustre file system — in a single, easy-to-use administrator interface. From the creation of file systems to ongoing monitoring and management to expansion and software upgrades, the CSSM gives the user control of the entire storage solution.


The system’s CSSM includes a web client interface hosted from the MMU and interfaces with distributed system manager component services. CSSM also integrates a comprehensive set of community-developed tools such as Icinga that collect, index and analyze all the fast-moving data the system generates. As a result, administrators and datacenter managers have the tools they require to keep the file system balanced and stable. Figure 6 shows an example of a CSSM pane.

Figure 6. Cray Sonexion System Manager (CSSM)

Grid RAID

Grid RAID is the Sonexion system’s implementation of a parity declustered RAID level. Grid RAID is a new Sonexion RAID level that combines the logical stripe structure of RAID 6 (8+2) data protection with the pseudo-random distribution of the RAID 6 parity groups, along with reserved spare data blocks, across a large number of physical storage devices. By decoupling the logical data layout from the physical storage organization and incorporating dedicated spare data blocks in the distribution process, Grid RAID can restore data redundancy for an array at an accelerated rate by overcoming single-drive throughput limits — in some cases, up to four times that of a single-drive reconstruct.

This accelerated restoration of redundancy, called “Grid RAID reconstruction,” represents the first phase of the two-phase recovery process that Grid RAID uses to recover from drive failures. The reconstruction process regenerates the data for the failed drive and writes it to a distributed spare volume. The second recovery phase is referred to as “Grid RAID rebalancing.” This phase restores the previously reconstructed data from the distributed spare onto a physical replacement drive. In most cases this involves simply copying the reconstructed data from the distributed spare volume to the replacement drive and then freeing the spare volume. For specific failure modes involving three or four failed drives, the rebalance process may reconstruct pieces of missing data on the fly, as needed.

From the Lustre configuration standpoint, with Grid RAID, the 82 drives are configured as two 41-drive Grid RAID arrays with no dedicated hot spare drives. Instead, a drive’s worth of spare space is distributed evenly along the 8+2 parity groups across all 41 drives in each array, providing the storage for the Lustre OSTs.
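
To make the declustered layout concrete, the following Python sketch scatters 8+2 parity groups pseudo-randomly across a 41-drive array and then counts how many surviving drives participate when one drive fails. It is a simplified illustration of the parity-declustering idea, not Cray’s actual Grid RAID placement algorithm.

# Simplified illustration of parity declustering as described above: RAID 6
# (8+2) stripes plus distributed spare blocks are spread pseudo-randomly over
# all 41 drives, so a rebuild reads from (and writes spare blocks on) many
# drives in parallel instead of funneling through one hot-spare drive.
# Conceptual sketch only, not Cray's actual Grid RAID placement algorithm.

import random

DRIVES = 41          # drives per Grid RAID array (one OST)
STRIPE_WIDTH = 10    # 8 data + 2 parity blocks per stripe
STRIPES = 1000       # number of stripes laid out in this toy example

random.seed(0)
layout = [random.sample(range(DRIVES), STRIPE_WIDTH) for _ in range(STRIPES)]

failed = 7  # pretend drive 7 fails
affected = [set(stripe) - {failed} for stripe in layout if failed in stripe]

# Nearly every surviving drive holds blocks of some affected stripe, so the
# reconstruction work (and the spare space it writes to) is spread across the array.
drives_involved = set().union(*affected)
print(f"Stripes to rebuild: {len(affected)}")
print(f"Surviving drives participating in the rebuild: {len(drives_involved)} of {DRIVES - 1}")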


Reliability and Serviceability

The Sonexion 3000 system was designed from the ground up as an enterprise-class storage solution, with resiliency engineered into all primary subsystems. Metadata and management server operations reside on the MMU. SSUs include active/active integrated server modules, along with redundant and independent system interconnections.

The SSU is fully redundant and fault tolerant, ensuring maximum data availability. Each OSS accesses the disk as shared OST storage and provides active-active failover. If an OSS fails, the other active OSS manages the OSTs and the disk operations of the failed OSS. In nonfailure mode, I/O load is balanced between the OSSs. The OSS cache is protected by power protection technology; in the event of a power outage, the OSS can write its cache to persistent storage. The shared storage of the SSU consists of high-capacity SAS disk drives configured in a RAID6 array to protect against double disk failures and drive failure during rebuilds.

To maximize network reliability, the OSSs in the SSU are connected to redundant EDR InfiniBand (or optional 100/40 GigE) network switches. If one switch fails, the second OSS connected to the active switch manages the OSTs of the OSS connected to the failed switch. Additionally, to maintain continuous management connectivity within the solution, the network switches are fully redundant at every point and interconnected to provide local access from the MDS and MGS nodes to all storage nodes.

Each CMU (with its SMU and MMU) is fully redundant and fault tolerant. Both controller modules are configured for active-passive failover, with an active instance of the node running on one system and a passive instance of the node running on the peer system. If an active node fails — for example, if an MDS goes down — the passive MDS on the peer system takes over the MDT operations of the failed MDS. The shared storage supports small form factor SAS and SSD drives, configured in a RAID array for data protection and integrity. Each MMU supports EDR InfiniBand connections to the MDS and MGS nodes. Additionally, each server connects, via GigE, to dedicated management and IPMI networks.

The system comes with Grid RAID, an enhanced RAID option that accelerates the rebuild process. Grid RAID uses the technique of parity declustering to enable rebuilds to be written to multiple disk drives simultaneously, eliminating the traditional bottlenecks common with MDRAID. Because of this feature, a single OST on the Sonexion 3000 system using Grid RAID may rebuild up to four times faster than a traditional MDRAID (8+2) RAID6 OST.

Within the SMU, MMU and SSU enclosures, all drives are individually serviceable and hot swappable. Additionally, each drive is equipped with individual drive power control, enabling superior availability with drive recovery from soft errors. The SMU, MMU and SSU platforms include unique dampening technologies that minimize the impact of rotational vibration interference (RVI). All system enclosures, including the SSU platform, maximize disk drive performance by mitigating RVI sources, including cooling fans and other disk drives, and other enclosures mounted in the same rack.

The CSSM is tightly integrated into the system stack — from storage and embedded server modules, RAID operations and operating system health, through to the Lustre file system and the entire storage system. Errors can be diagnosed rapidly in the CSSM with direct correlation to the problem component.


Each rack provides a dual dedicated local network on a GigE switch that is used for configuration management and health monitoring of all components. The management network is private and not used for data I/O in the storage system. This network is also used for IPMI traffic to the SMU management servers, the MMU’s MDS nodes and the SSUs’ OSS nodes, enabling them to be power-cycled by the CSSM. Software and firmware upgrades across an entire system are executed through the CSSM, removing the burden and risks that come with large, complex Lustre implementations.

Power Efficiency and Cooling

As high-performance computing environments continue to scale in the amount of data that needs to be processed, the requirement to increase power efficiencies and reduce the overall power draw for servers and storage becomes crucial. The Sonexion 3000 system provides significant savings in both power efficiency and cost by providing over 90 percent energy efficiency in each SSU. This power efficiency and extreme density result in the best Lustre file system price/performance and capacity per kilowatt-hour available in the HPC storage marketplace. System racks are front-to-rear air-cooled, and thus dissipate 100 percent of heat to air. A water-cooled option, which utilizes a rear-door heat-exchange unit, is available.

Service and Support

The Sonexion 3000 scale-out storage system can be purchased with varying levels of support services, each with its own support service level agreement. Descriptions of support service levels can be found at www.cray.com. Cray support can also be purchased for the entire file system solution, including both the Sonexion 3000 system and the Cray file system clients on the host systems. Cray will provide notices about updates and patches to the system, as well as for the Cray Linux Environment (CLE) and Lustre, as they become available.

Cray System Snapshot Analyzer

Cray offers a support automation capability called the System Snapshot Analyzer (SSA). The SSA uses a call-home capability, proactively monitors your system for health, state and configuration changes, and provides faster response during a reported issue. The SSA automates collection and reporting of support information. It can operate in a site-private stand-alone mode, or it can be enabled to call home to Cray support. This enables Cray to rapidly diagnose and respond to problems — and to provide an easy, nondisruptive solution for remote support.


Specifications

Rack
• Height: 42U
• Width: 600 mm
• Depth: 1,200 mm
• Data Switches: Dual 36-port InfiniBand EDR switches standard
• Rack Management Switches: Dual 24-port gigabit Ethernet switches standard; dual 48-port gigabit Ethernet switches optional
• Standard Cooling: Passive
• Water-Cooled Option: Rear-door heat-exchange unit
• Full Rack Weight (Standard Air-Cooled): 1,138.9 kg (2,510.8 lbs)
• Full Rack Weight (Water-Cooled Door): 34.5 kg (76 lbs) additional

Metadata Management Unit
• Metadata Controller Height: 2U, high-availability server pair for metadata management
• Metadata Disk Enclosure: 2U24 drive enclosure

System Management Unit
• Management Controller Height: 2U, high-availability server pair for system management
• Management Disk Enclosure: 2U24 drive enclosure

Scalable Storage Unit (SSU)
• Base/Expansion Unit Height: 5U
• Base/Expansion Unit Data Storage: 84 drive slots
• Base/Expansion Unit Data Drive Types: 82 x 4 TB, 6 TB or 8 TB 7.2K RPM SAS; 2 x SSDs
• IOR Read/Write Bandwidth: 9 GB/s sustainable (InfiniBand) using 7.2K RPM SAS (13.7 GB/s peak); 12-14 GB/s sustainable (InfiniBand) using 10K RPM SAS (16 GB/s peak)

Power and Cooling
• Power Consumption, Rack with Switches: <16 kilowatts
• Heat Dissipation, Rack: <55,000 BTU

System Availability
• Hot-Swappable Components: Disk drives, power supply units, fans, power cooling modules, SBB controller modules

Software & Support
• Software: CSSM, Linux® and Lustre® included, 1 year renewable
• Hardware: 1 year renewable

General System Environmental Specifications
• Altitude: -30 to 3,048 m (-100 to 10,000 ft)
• Operating Temperature: 5-35°C; maximum temperature de-rated by 1°C per 300 m of altitude above 900 m
• Relative Humidity: 20% to 80%, noncondensing

Power (High Power, Preferred U.S.)
• PDUs: 2 input PDUs per rack (2 total inputs per rack)
• Type: 3-phase AC
• Volts: 208V AC
• Amps: 60A rated
• Connector: (2) IEC60309 60A


© 2016 Cray Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the copyright owners. Cray, the Cray logo, Sonexion and Urika are registered trademarks of, and Cray XC and Cray CS are trademarks of, Cray Inc. Other product and service names mentioned herein are the trademarks of their respective owners.