
INFOBrief

Dell™ PowerEdge™ 2650 High Performance Computing Clusters

Key Points

The second generation of Dell’s High Performance Computing Cluster (HPCC) provides compute-intensive capacity by leveraging the latest technology available in the market.

HPCC is a cost-effective method for delivering a parallel computing platform targeted at compute- and data-intensive applications.

Through Dell HPCC, users can aggregate standards-based server and storage resources into an inexpensive yet powerful supercomputing platform.

High Performance Computing Clusters are a popular method for solving these compute- and data-intensive problems because of their low price points and excellent scalability.

Dell helps provide investment protection by offering solutions based on industry-standard building blocks that can be redeployed as traditional application servers as users integrate newer technology into their network infrastructures.

Dell delivers high-volume, standards-based solutions to scientific and compute-intensive environments that can benefit from economies of scale and can add systems as requirements change.

Dell’s technology and methodology are designed to provide high reliability, price/performance leadership, easy scalability, and simplicity by bundling order codes for hardware, software, and support services for 8-, 16-, 32-, and 64-node clusters.

Product Description

The concept of HPCC, or “Beowulf,” clusters (the project name used by the original designers) originated at the Center of Excellence in Space Data and Information Sciences (CESDIS), located at the NASA Goddard Space Flight Center in Maryland. The project’s goal was to design a cost-effective parallel computing cluster from off-the-shelf components that would satisfy the computational requirements of the earth and space sciences community.

As cluster solutions have gained acceptance for solving complex computing problems, High Performance Computing Clusters have begun to replace supercomputers in this role. The low cost of commodity HPCC systems has shifted the purchase decision from evaluating expensive proprietary solutions, where cost was not the primary issue, to evaluating vendors on their ability to deliver exceptional price-to-performance ratios and support capabilities.

Logical View of a High Performance Computing Cluster

The strategy behind parallel computing is to “divide and conquer.” By dividing a complex problem into smaller component tasks that can be worked on simultaneously, the problem can often be solved more quickly, saving time, resources, and money. Dell’s HPCC uses a multi-computer architecture, as depicted in Figure 1: a parallel computing system consisting of one master node and multiple compute nodes connected via standard network interconnects. All of the server nodes in a typical HPCC run an industry-standard operating system, which typically offers substantial savings over proprietary operating systems.
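As a minimal illustration of this divide-and-conquer model, the hedged sketch below uses MPI, the standard message-passing middleware discussed later in this brief, to split a summation across the compute nodes and combine the partial results on the master node. The program name, problem size, and node count are hypothetical.

    /* Hypothetical sketch: divide-and-conquer summation with MPI.
     * Each compute node sums its slice of the range; rank 0 (the
     * master node) combines the partial results with MPI_Reduce.
     * Compile: mpicc -o partial_sum partial_sum.c
     * Run on 16 nodes: mpirun -np 16 ./partial_sum
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const long N = 100000000L;   /* overall problem size (hypothetical) */
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Divide: each rank takes a contiguous slice of the range. */
        long chunk = N / size;
        long lo = (long)rank * chunk;
        long hi = (rank == size - 1) ? N : lo + chunk;

        double local = 0.0;
        for (long i = lo; i < hi; i++)
            local += 1.0 / (double)(i + 1);   /* stand-in workload */

        /* Conquer: combine the partial sums on the master (rank 0). */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %ld terms = %f\n", N, total);

        MPI_Finalize();
        return 0;
    }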


Figure 1
Logical View of a High Performance Computing Cluster

[Figure: a master node (file server/gateway) connected to multiple compute nodes and external storage; the nodes run Linux, parallel applications, a message-passing library, and cluster management tools.]


The master node of the cluster acts as a server for the Network File System (NFS), handles job scheduling and security, and serves as a gateway to end users. As the larger task is broken into sub-functions, the master node assigns one or more tasks to each of the compute nodes. As a gateway, the master node allows users to gain access to the compute nodes.

The sole task of the compute nodes is to execute assigned tasks in parallel. A compute node does not have a keyboard, mouse, video card, or monitor; access to compute nodes is provided via remote connections through the master node.

From a user's perspective, an HPCC appears as a Massively Parallel Processor (MPP) system. Users commonly access the master node directly or through Telnet or remote login from personal workstations. Once logged on to the master node, users can prepare and compile their parallel applications and spawn jobs on the desired number of compute nodes in the cluster.

In addition to compute nodes and master nodes, the key components of an HPCC include systems management utilities, applications, file systems, interconnects, storage, and a software solution stack.

Dell OpenManage Systems Management
Because HPCC systems can consist of many nodes (a single cluster can contain thousands), it is important to be able to monitor and manage those nodes from a single console. To help manage such sizable clusters, Dell OpenManage™ systems management utilities are designed to provide system discovery, event filtering, systems monitoring, proactive alerts, inventory and asset management, and remote manageability for the compute nodes and master nodes.

Embedded Remote Access (ERA) is a separate management fabric whose features include remote power operations; virtualization of floppy, CD-ROM, and other peripherals; console redirection; and BIOS flash updates.

Applications
Applications may be written to run in parallel on multiple systems using the message-passing programming model. The jobs of a parallel application are spawned on compute nodes, which work collaboratively until the jobs are complete. During execution, the compute nodes use standard message-passing middleware to coordinate activities and exchange information.

Parallel Virtual File System
The Parallel Virtual File System (PVFS) is used as a high-performance, large parallel file system for temporary storage and as an infrastructure for parallel I/O research. PVFS stores data on the existing local file systems of multiple cluster nodes, enabling many clients to access the data simultaneously. Within an HPC cluster, PVFS enables high-performance I/O comparable to that of proprietary parallel file systems.
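As a hedged sketch of how an application typically reaches a parallel file system such as PVFS: parallel codes commonly use the standard MPI-IO interface, which PVFS supports, so that every compute node reads or writes its own region of a single shared file. The mount point and data sizes below are placeholders.

    /* Hypothetical sketch: each rank writes its own block of one
     * shared file through MPI-IO, a standard route into a parallel
     * file system such as PVFS. Path and sizes are placeholders. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        enum { COUNT = 1024 };
        double block[COUNT];          /* this rank's partial results */
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < COUNT; i++)
            block[i] = (double)rank;  /* stand-in data */

        /* All ranks open the same file on the parallel file system. */
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "/pvfs/results.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* Each rank writes at its own non-overlapping offset. */
        MPI_Offset offset = (MPI_Offset)rank * sizeof(block);
        MPI_File_write_at(fh, offset, block, COUNT, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }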

Interconnect
To communicate with each other, the cluster nodes are connected through a network. The choice of interconnect technology depends on the amount of interaction between nodes when an application executes. Some applications resemble batch environments, with limited communication between compute nodes; for these, Fast Ethernet may be adequate. In environments that require more frequent communication, however, a Gigabit Ethernet interconnect is preferable.

Some application environments can also benefit from a special interconnect designed to provide high speed and low latency between the compute nodes. For these applications, Dell’s bundles are available with Myricom’s Myrinet.

High Performance Computing Cluster Solution Stack
Dell partners with service providers to deliver the software components necessary for implementing an HPCC solution. The HPCC stack includes the job scheduler, cluster management tools, message-passing libraries, and compilers.


High Performance Computing Market

Target markets for high performance computing clusters are higher education, large corporations, the federal government, and technology sectors that require high-performance computation. Industry examples include oil and gas, aerospace, automotive, chemistry, national security, financial services, and pharmaceuticals.

Typical high-computation applications include war and airline simulations, financial modeling, molecular modeling, fluid dynamics, circuit board design, ocean flow analysis, seismic data filtering, and visualization.

Applications that use HPC clusters and their specific vertical markets can be found in Table 1.

Table 1
Vertical Markets Appropriate for HPCC

Vertical      | Description of Requirements                                         | Typical Applications
Manufacturing | Crashworthiness, stress analysis, shock and vibration, aerodynamics | Fluent, Radioss, Nastran, Ansys, PowerFLOW
Energy        | Seismic processing, geophysical modeling, reservoir modeling        | VIP, Eclipse, Veritas
Life Sciences | Drug design, bioinformatics, DNA mapping, disease research          | BLAST, CHARMM, NAMD, PC-GAMESS, Gaussian
Digital Media | Render farms                                                         | RenderMan, Discreet
Finance       | Portfolio management (Monte Carlo simulation), risk analysis        | Barra, RMG, Sungard

Although market opportunities exist in environments made up of thousands of nodes, standard HPCC configurations target the majority of clusters, which fall within the 8-node to 128-node range. Customers investigating larger cluster configurations should contact the Dell Professional Services organization for assistance.

Dell’s bundled HPCC solutions target customers with varying levels of expertise, ranging from complete turnkey solutions (including hardware and software) to easy-to-order hardware-only bundles. For those who require a complete solution, Dell also offers consulting assistance.


Features and Benefits

The Dell High Performance Computing Cluster leverages many advantages of Dell’s product line, including server, storage, peripheral, and services components. By creating standard product offerings, Dell solutions are designed to help minimize configuration complexity. These standard packages consist of 8-, 16-, 32-, 64-, and 128-node configurations.

The key technology features of a Dell High Performance Computing cluster configuration are shown in Table 2.

Table 2
The Key Technology Features of a Dell HPCC Configuration

Feature: Full-featured hardware configurations
Function: Pre-bundled order codes for 8-node, 16-node, 32-node, 64-node, and 128-node configurations (16 to 256 CPUs).
Benefit: Simplified ordering process and pre-qualified configurations.

Feature: PowerEdge™ 2650 (compute node)
Function: Dual Intel Xeon processors at 2.0GHz, 2.4GHz, 2.8GHz, 3.06GHz, or 3.2GHz with a 533MHz front-side bus; 2U form factor; 3 slots on separate I/O buses; 256MB of DDR SDRAM, expandable to 12GB; configurable SCSI drives (expandable to 5) for internal storage.
Benefit: High-performance compute node for the most challenging applications; high density enables large compute clusters in a rack; helps minimize I/O bottlenecks; flexibility for increasing storage capacity on the compute node.

Feature: PowerEdge 2650 (master node)
Function: Same configuration as the compute node (dual Intel Xeon processors at 2.0GHz to 3.2GHz with a 533MHz front-side bus, 2U form factor, 3 slots on separate I/O buses, 256MB of DDR SDRAM expandable to 12GB, and up to 5 internal SCSI drives).
Benefit: High-performance, highly available server in a dense form factor.

Feature: Interconnect options (Fast Ethernet: low cost; Gigabit Ethernet: high performance; Myrinet: high speed, low latency)
Function: The interconnect allows the servers in an HPCC configuration to communicate with each other; it is designed for message passing between the nodes.
Benefit: Offering Fast Ethernet and Gigabit Ethernet lets customers choose between a low-cost and a higher-performance solution; Myrinet provides a high-speed, low-latency interconnect for application environments that require frequent node-to-node communication.

Feature: Storage devices
Function: PowerVault™ 220S SCSI external storage device on the master node for primary storage.
Benefit: Provides a cost-effective method for adding a large amount of external storage capacity that can be allocated across multiple channels for maximized I/O performance.

Feature: Headless operation
Function: The ability to operate a system without keyboard, video, or mouse (KVM).
Benefit: Simplifies cable management and helps lower the cost of the solution by eliminating monitors, keyboards, and mice.

Feature: Serial port redirection
Function: ERA enhanced features for out-of-band and remote management, allowing centralized control of network devices through serial console ports.
Benefit: Helps increase manageability of the cluster from a single console device.

Feature: Operating system software pre-install
Function: Factory installation of the Red Hat Linux operating system.
Benefit: Facilitates setup of the cluster configuration.

Feature: Wake on LAN
Function: Provides the capability to remotely power on compute nodes over the Ethernet network (see the sketch following this table).
Benefit: A remote management tool that can reduce system management workload, give the system administrator flexibility, and help save time-consuming effort and costs.

Feature: HPCC software solution stack (optional, delivered through Dell Professional Services)
Function: Cluster manager, compilers, job scheduler, math libraries (MKL, BLAS, ATLAS), and an MPI interface.
Benefit: Dell-tested tools for creating the system environment for a parallel computing infrastructure.

Feature: Server management
Function: Embedded systems management detects errors such as fan failures and temperature and voltage problems, generating alerts and reports to the Dell OpenManage console.
Benefit: Helps detect and remedy problems within the cluster.
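To make the Wake on LAN entry above concrete, the hedged sketch below shows the standard “magic packet” that a management tool might broadcast: six 0xFF bytes followed by the target NIC’s MAC address repeated 16 times, sent over UDP. The MAC address is a placeholder, and Dell’s own tooling may implement this differently.

    /* Hypothetical sketch: broadcast a Wake-on-LAN "magic packet":
     * six 0xFF bytes followed by the target MAC repeated 16 times,
     * sent as a UDP broadcast. The MAC address is a placeholder. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        const unsigned char mac[6] = {0x00, 0x06, 0x5B, 0x12, 0x34, 0x56};
        unsigned char packet[102];

        memset(packet, 0xFF, 6);                  /* synchronization stream */
        for (int i = 0; i < 16; i++)              /* MAC repeated 16 times */
            memcpy(packet + 6 + i * 6, mac, 6);

        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        int on = 1;
        setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(9);                 /* common WOL port */
        addr.sin_addr.s_addr = htonl(INADDR_BROADCAST);

        sendto(sock, packet, sizeof(packet), 0,
               (struct sockaddr *)&addr, sizeof(addr));
        close(sock);
        return 0;
    }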


Key Customer Benefits

The performance of commodity computers and network hardware continually improves as new technology is introduced, while market conditions have driven down the price of these components. As a result, it is now practical to build parallel computational systems from low-cost, high-density servers, such as the Dell PowerEdge 2650, rather than buy CPU time on expensive supercomputers. Dell PowerEdge servers are tuned to take advantage of the chosen server/OS/application combination, and their performance and price/performance are typically among the industry leaders on a variety of standard benchmarks (TPC-C, TPC-W, and SPECweb99).

Low cost and high performance are only two of the advantages of using a Dell High Performance Computing Cluster solution. Other key benefits of HPCC versus large symmetric multiprocessor (SMP) systems are shown in Table 3.

Table 3
Comparison of SMP and HPCC Environments

                            Large SMPs                 HPCC
Scalability                 Fixed                      Unbounded
Availability                High                       High
Ease of Technology Refresh  Low                        High
Application Porting         None                       Required
Operating System Porting    Difficult                  None
Service and Support         Expensive                  Affordable
Standards vs. Proprietary   Proprietary                Standards
Vendor Lock-in              Required                   None
System Manageability        Custom; better usability   Standard; moderate usability
Application Availability    High                       Moderate
Reusability of Components   Low                        High
Disaster Recovery Ability   Weak                       Strong
Installation                Non-standard               Standard

The features compared in Table 3 are defined as follows:


Scalability: The ability to grow in overall capacity and to meet higher usage demand as needed. When additional computational resources are needed, servers can be added to the cluster. Clusters can consist of thousands of servers.

Availability: Access to compute resources. To help ensure high availability, it is necessary to remove any single point of failure in the hardware and software, so that any individual component, the system as a whole, or the overall solution (i.e., multiple systems) stays continuously available. An HPCC solution offers high availability because components can be isolated and, in many cases, the loss of a compute node does not have a large impact on the overall cluster: the workload of that node is allocated among the remaining compute nodes.

Ease of Technology Refresh: Integrating a new processor, memory, disk, or operating system technology can be accomplished with relative ease. In an HPCC, as technology moves forward, modular pieces of the solution stack can be replaced as time, budget, and needs require or permit; there is no need for a one-time 'switch-over' to the latest technology. In addition, new technology is often integrated more quickly into standards-based volume servers than into proprietary systems.

Service and Support: Total cost of ownership, including the post-sales costs of maintaining hardware and software (standard upgrades, unit replacement, and staff training and education), is generally much lower than for proprietary implementations, which typically require a high level of technical services due to their inherently complex and sophisticated nature.

Vendor Lock-in: Proprietary solutions require a commitment to a particular vendor, whereas industry-standard implementations are interchangeable. Many proprietary solutions accept only components developed by that vendor, and depending on the revision and technology, application performance may suffer. HPCC enables solutions to be built from the best-performing industry-standard components.

System Manageability: System management covers the installation, configuration, and monitoring of key elements of computer systems, such as hardware, the operating system, and applications. Most large SMPs have proprietary enabling technologies (custom hardware extensions and software components) that can complicate system management. On the other hand, it is easier to manage one large system than hundreds of nodes. However, with the wide deployment of network infrastructure and enterprise management software, it is possible to manage the many servers of an HPCC system from a single point.

Reusability of Components: Commodity components can be reused when taken off line, preserving a customer’s investment. In the future, when refreshing a Dell HPCC PowerEdge solution with next-generation platforms, the older Dell PowerEdge compute nodes can be redeployed as file/print servers, Web servers, or other infrastructure servers.

Installation: Specialized equipment generally requires expert installation teams trained to handle such systems, as well as dedicated facilities for power, cooling, and the like. Because HPCC components are off-the-shelf commodities, installation is generic and widely supported.

Hardware Options
The High Performance Computing Cluster configurations can be enhanced in the following ways:

- Increased memory in the compute nodes
- Increased internal HDD storage capacity in the compute nodes
- Increased external storage on the master node
- Additional NICs for the compute nodes and master node
- Faster interconnect technologies, per Dell Professional Services’ recommendations

Related Web Sites

http://www.dell.com/clustering

http://www.oscar.org/

http://www.csm.ornl.gov/oscar/

http://www.beowulf.org/

http://www.dell.com/us/en/esg/topics/segtopic_servers_pedge_rackmain.htm

Service and Support
Dell HPCC systems come with the following:

- Three-year limited warranty[1], with three years of standard Next Business Day (NBD) parts replacement and one year of NBD on-site labor[2]
- 30-day “Getting Started” help line[3]
- DirectLine network operating system support upgrades, available with the limited three-year warranty
- Telephone support 24 hours a day, 7 days a week, 365 days a year for the duration of the limited three-year warranty


Dell Professional Services offers additional services to assist with:

- Solution design consultation
- Installation and setup
- Pre-staging of the solution at an off-site location

[1] For a copy of our Guarantees or Limited Warranties, please write Dell USA, L.P., Attn: Warranties, One Dell Way, Round Rock, TX 78682. For more information, visit www.dell.com/us/en/gen/services/service_service_plans.htm.
[2] Service may be provided by a third party. A technician will be dispatched if necessary following phone-based troubleshooting. Subject to parts availability, geographical restrictions, and the terms of the service contract. Service timing depends upon the time of day the call is placed to Dell. U.S. only.
[3] The 30-day telephone support program is provided at no additional charge to help customers with installation, optimization, and configuration questions during the critical 30-day period after shipment of PowerEdge systems. This program is available to customers who purchase Novell NetWare® or Microsoft Windows NT Server or Windows 2000 Server with their PowerEdge server from Dell. Support provided after the 30-day Getting Started program covers only the Dell hardware. Beyond 30 days from the invoice date, Dell’s DirectLine telephone support service is available for purchase for NOS support.

Dell, OpenManage, PowerVault and PowerEdge are trademarks of Dell Computer Corporation. Microsoft and Windows NT are registered trademarks of Microsoft Corporation. Intel is a registered trademark of Intel Corporation. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others.

©Copyright 2002 Dell Computer Corporation. All rights reserved. Reproduction in any manner whatsoever without the express written permission of Dell Computer Corporation is strictly forbidden. For more information contact Dell. Dell cannot be responsible for errors in typography or photography.
