NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE

Managing Configuration of Computing Clusters with Kickstart and XML using NPACI Rocks

Philip M. Papadopoulos
Program Director, Grid and Cluster Computing
San Diego Supercomputer Center
University of California, San Diego

The Rocks Guys

• Philip Papadopoulos
– Parallel message-passing expert (PVM and Fast Messages)
• Mason Katz
– Network protocol expert (x-kernel, Scout and Fast Messages)
• Greg Bruno
– 10 years of experience with NCR's Teradata Systems
  • Builders of clusters that drive very large commercial databases

• All three of us have worked together for the past 3 years building NT and Linux clusters

Computing Clusters

• Background
• Overview of the Rocks methodology and toolkit
• Description-based configuration
– Taking the administrator out of cluster administration
• XML-based assembly instructions
• What's next

Scoping Rules

• Focused on computing clusters
– Large number of nodes that need similar system software footprints
– MPI-style parallelism is the dominant application model
– Not assuming homogeneity of hardware configurations
  • Do assume the same OS
  • Even "homogeneous" systems exhibit hardware differences

• Not high-availability clusters
– Our techniques can help here, but we don't address the specific software needs of HA

Many variations on a basic layout

[Figure: basic cluster layout showing front-end node(s) on the public Ethernet, a Fast-Ethernet switching complex, a Gigabit network switching complex, two rows of compute nodes, and power distribution (network-addressable units as an option).]

Current Configuration of the Meteor

• Rocks v2.2 (Red Hat 7.2)

• 2 frontends, 4 NFS servers

• 100 nodes
– Compaq
  • 800, 933, IA-64
  • SCSI, IDA
– IBM
  • 733, 1000
  • SCSI

• 50 GB RAM
• Ethernet
– For management
• Myrinet 2000

NPACI Rocks Toolkit – rocks.npaci.edu

• Techniques and software for easy installation, management, monitoring and update of Linux clusters

• Installation
– Bootable CD + floppy, which contain all the packages and site configuration info needed to bring up an entire cluster

• Management and update philosophies
– Trivial to completely reinstall any (or all) nodes
– Nodes are 100% automatically configured
  • Use of DHCP and NIS for configuration
– Use Red Hat's Kickstart to define the set of software that defines a node
– All software is delivered in a Red Hat package (RPM)
  • Encapsulate configuration for a package (e.g., Myrinet)
  • Manage dependencies
– Never try to figure out if node software is consistent
  • If you ever ask yourself this question, reinstall the node

Goals

• Simplify cluster management (make clusters easy)
– Remove the system administrator
– Make software available to a wide audience
– Build on de facto standards
– Allow contributors to solve specific problems and package software components

• Track the rapid pace of Linux development
– Red Hat 6.2: one update every 3 days
– Red Hat 7.x: two updates every 3 days

• Leverage and remain open source
– Unlikely that computational cluster management is a long-term commercial business
– Some components should be purchased! (compilers, debuggers …)

Who is Using It?

• Growing list of users that we know about:
– SDSC, SIO, UCSD (8 clusters, including the CMS (GriPhyN) prototype)
– Caltech
– Burnham Cancer Institute
– PNNL (several clusters: small, medium, large)
– University of Texas
– University of North Texas
– Northwestern University
– University of Hong Kong
– Compaq (working relationship with their Intel Standard Servers Group)
– Singapore Bioinformatics Institute
– Myricom (their internal development cluster)

What we thought we “Learned”

• Clusters are phenomenal price/performance computational engines, but are hard to manage

• Cluster management is a full-time job that gets linearly harder as one scales out.

• “Heterogeneous” Nodes are a bummer (network, memory, disk, MHz, current kernel version, PXE, CDs).

You Must Unlearn What You Have Learned

Installation/Management

• Need to have a strategy for managing cluster nodes

• Pitfalls
– Installing each node "by hand"
  • Difficult to keep software on nodes up to date
– Disk imaging techniques (e.g., VA Disk Imager)
  • Difficult to handle heterogeneous nodes
  • Treats the OS as a single monolithic system
– Specialized installation programs (e.g., IBM's LUI, or RWCP's Multicast installer)
  • Let OS packaging vendors do their job

• Penultimate
– Red Hat Kickstart
  • Define the packages needed for the OS on nodes; kickstart gives a reasonable measure of control
  • Need to fully automate to scale out (Rocks gets you there)

Networks

• High-performance networks
– Myrinet, Giganet, Servernet, Gigabit Ethernet, etc.
– Ethernet only: Beowulf-class

• Management networks (Light Side)
– Ethernet, 100 Mbit
  • Management network used to manage compute nodes and launch jobs
  • Nodes are in private IP (192.168.x.x) space; the front-end does NAT
– Ethernet, 802.11b
  • Easy access to the cluster via laptops
  • Plus, wireless will change your life

• Evil management networks (Dark Side)
– A serial "console" network is not necessary
– A KVM (keyboard/video/monitor) switching system adds too much complexity, cabling, and cost

How to Build Your Rocks Cluster

1. Get and burn the ISO CD image from rocks.npaci.edu

2. Fill out the form to build the initial kickstart file for your first front-end machine

3. Kickstart the "naked" frontend with the CD and kickstart file

4. Reboot the frontend machine

5. Integrate compute nodes with "insert-ethers"

6. Ready to go!

insert-ethers

• Used to populate the "nodes" MySQL table

• Parses a file (e.g., /var/log/messages) for DHCPDISCOVER messages
– Extracts the MAC address and, if it is not in the table, adds the MAC address and hostname to the table

• For every new entry:
– Rebuilds /etc/hosts and /etc/dhcpd.conf
– Reconfigures NIS
– Restarts DHCP and PBS

• Hostname is
– <basename>-<cabinet>-<chassis>

• Configurable to change hostname
– E.g., when adding new cabinets
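The discovery loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the insert-ethers source. It assumes the ISC dhcpd syslog wording `DHCPDISCOVER from <mac> via <iface>`, and it uses a plain dict in place of the MySQL "nodes" table. The chassis-numbering rule (next free number in the cabinet) and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator

# Assumed syslog format: "... dhcpd: DHCPDISCOVER from 00:50:8b:a5:1e:2c via eth0"
DISCOVER_RE = re.compile(r"DHCPDISCOVER from ((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})")


def discover_macs(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield the MAC address of every DHCPDISCOVER line, in log order."""
    for line in log_lines:
        match = DISCOVER_RE.search(line)
        if match:
            yield match.group(1).lower()


def insert_new_nodes(log_lines: Iterable[str], nodes: Dict[str, str],
                     basename: str = "compute", cabinet: int = 0) -> Dict[str, str]:
    """Add unseen MACs to `nodes` (MAC -> hostname) and return only the new entries.

    Hostnames follow <basename>-<cabinet>-<chassis>; the chassis number here is
    simply the count of hosts already named in this cabinet (hypothetical rule).
    """
    added: Dict[str, str] = {}
    prefix = f"{basename}-{cabinet}-"
    for mac in discover_macs(log_lines):
        if mac in nodes:
            continue  # already known; the node is just retrying DHCP
        chassis = sum(1 for host in nodes.values() if host.startswith(prefix))
        nodes[mac] = added[mac] = f"{prefix}{chassis}"
    return added
```

A real tool would then regenerate /etc/hosts and /etc/dhcpd.conf, reconfigure NIS, and restart DHCP and PBS for each new entry. The sketch stops at the table update.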

Configuration Derived from Database

[Figure: automated node discovery. insert-ethers records Node 0 … Node N in the MySQL database; makehosts, makedhcp and pbs-config-sql then generate /etc/hosts, /etc/dhcpd.conf and the PBS node list from it.]
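To make "configuration derived from the database" concrete, here is a hedged Python sketch. Two files are generated from one node table. sqlite3 stands in for MySQL, and the `nodes(name, mac, ip)` schema is hypothetical. The two functions only mimic the role of makehosts and makedhcp; they are not those tools. The dhcpd output uses the standard ISC `host { hardware ethernet …; fixed-address …; }` declaration.

```python
import sqlite3


def make_hosts(conn: sqlite3.Connection) -> str:
    """Render /etc/hosts lines (IP, tab, hostname) from the nodes table."""
    rows = conn.execute("SELECT ip, name FROM nodes ORDER BY name")
    return "".join(f"{ip}\t{name}\n" for ip, name in rows)


def make_dhcp(conn: sqlite3.Connection) -> str:
    """Render one ISC dhcpd 'host' declaration per node, pinning its MAC to its IP."""
    out = []
    for name, mac, ip in conn.execute("SELECT name, mac, ip FROM nodes ORDER BY name"):
        out.append(f"host {name} {{\n  hardware ethernet {mac};\n  fixed-address {ip};\n}}\n")
    return "".join(out)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (name TEXT, mac TEXT, ip TEXT)")
    # Illustrative rows only; the addresses are not a claim about Rocks' allocation scheme.
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
        ("compute-0-0", "00:50:8b:a5:1e:2c", "192.168.254.254"),
        ("compute-0-1", "00:e0:81:20:11:7a", "192.168.254.253"),
    ])
    print(make_hosts(conn))
    print(make_dhcp(conn))
```

Both outputs are derived state. Because neither file is ever edited by hand, regenerating them after each insert is always safe.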

Remote Re-installation: Shoot-node and eKV

• Rocks provides a simple method to remotely reinstall a node
– CD/floppy used to install the first time

• By default, hard power cycling will cause a node to reinstall itself
– Addressable PDUs can do this on generic hardware

• With no serial (or KVM) console, we are able to watch a node as it installs (eKV), but …
– Can't see BIOS messages at boot-up

• Syslog for all nodes is sent to a log host (and to local disk)
– Can look at what a node was complaining about before it went offline

Remote Re-installation: Shoot-node and eKV

[Figure: remotely starting reinstallation on two nodes, 192.168.254.254 and 192.168.254.253.]

Key Ideas

• No difference between OS and application software
– OS installation is disposable
– Unique state that is kept only on a node is bad
– Identical mechanisms are used to install both

• Single-step installation of updated OS software
– Security patches pre-applied to the distribution, not post-applied on the node

• Inheritance of software configurations
– Distribution
– Configuration

• Description-based configuration rather than image-based

Don’t Differentiate OS and Application SW

• All software delivered in RPM packages

• Use a package manager to handle conflicts
– RPM is not totally complete, but
  • Packages will not overwrite each other without explicit override
  • Tracks what has changed between the software as packaged and what is on disk: rpm --verify

• We install a complete system from a selected list of packages and associated configuration
– Latest security patches applied before installation
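`rpm --verify` reports drift as one line per changed file. Each line starts with a string of attribute flags, where `.` means the test passed and `?` means it could not be performed. A hedged Python sketch that turns that report into readable attribute names follows. The flag letters come from the rpm(8) manual, and lines beginning with `missing` are handled. The helper names are hypothetical. Paths containing spaces are not handled, because the parser takes the last whitespace-separated field.

```python
import re
import subprocess
from typing import List, Optional, Tuple

# Flag letters as documented for `rpm --verify` in rpm(8).
FLAG_NAMES = {
    "S": "size", "M": "mode", "5": "digest", "D": "device", "L": "symlink",
    "U": "user", "G": "group", "T": "mtime", "P": "capabilities",
}
FLAGS_RE = re.compile(r"^[.?SM5DLUGTP]{8,9}$")


def parse_verify_line(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (path, changed attributes), or None if the line is not a file result."""
    fields = line.split()
    if not fields:
        return None
    if fields[0] == "missing":
        return fields[-1], ["missing"]
    if not FLAGS_RE.match(fields[0]):
        return None  # e.g. dependency messages mixed into the report
    return fields[-1], [FLAG_NAMES[c] for c in fields[0] if c in FLAG_NAMES]


def verify_package(name: str) -> List[Tuple[str, List[str]]]:
    """Run `rpm --verify` on one installed package and parse its report."""
    result = subprocess.run(["rpm", "--verify", name], capture_output=True, text=True)
    parsed = (parse_verify_line(line) for line in result.stdout.splitlines())
    return [entry for entry in parsed if entry is not None]
```

For example, `parse_verify_line("S.5....T.  c /etc/ssh/ssh_config")` yields the path with `["size", "digest", "mtime"]`. Under the Rocks philosophy above, a non-empty report is a reason to reinstall rather than to repair by hand.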

System State ?

• What is the installed state of a system?
– Software bits on disk
– Configuration information (files, registry, database)
OR
– Software bits in memory
– Configuration in memory

• How you answer this question is fundamental to how you move (update) a system from one state to the next.
– If the first, then you can update an installation and its configuration in a single (re)install/reboot step
– If the second, you may have to make several state changes (with ordering dependencies) to update state

Rocks Hierarchy

[Figure: the collection of all possible software packages (AKA the distribution, as RPMs) is combined with descriptive information to configure a node (the kickstart file) to produce appliances: compute node, IO server, web server.]

Description-based Configuration

[Figure: descriptive information to configure a node (the kickstart file) is applied to the collection of all possible software packages (AKA the distribution, as RPMs) to yield compute node, IO server and web server appliances.]

Building Distributions: Rocks-dist

• Integrate packages from
– Red Hat (mirror): base distribution + updates
– Contrib directory
– Locally produced packages
– Local contrib (e.g., commercially bought code)
– Packages from rocks.npaci.edu

• Produces a single updated distribution that resides on the front-end
– This is a Red Hat distribution with patches and updates pre-applied

NPACI / SDSC

• # rocks-dist mirror
– Red Hat mirror
  • Red Hat 7.2 release
  • Red Hat 7.2 updates

• # rocks-dist dist
– Rocks 2.2 release
  • Red Hat 7.2 release
  • Red Hat 7.2 updates
  • Rocks software
  • Contributed software

Your Site

• # rocks-dist mirror
– Rocks mirror
  • Rocks 2.2 release
  • Rocks 2.2 updates

• # rocks-dist dist
– Kickstart distribution
  • Rocks 2.2 release
  • Rocks 2.2 updates
  • Local software
  • Contributed software

• This is the same procedure NPACI Rocks uses.
– Organizations can customize Rocks for their site.
  • Departments can customize

Rocks-dist Summary

• Created for us to build software releases
• Modifies a stock Red Hat release
– Applies all updates
– Adds local and contributed software
– Patches boot images

• eKV allows us to monitor a remote installation without a KVM

• URL kickstart: the kickstart description and RPMs are transferred over HTTP

• Inheritance hierarchy allows customization of the software collection at many levels
– End-user
– Group
– Department
– Company
– Community (important for distributed science groups)

Description-based Configuration

[Figure: the collection of all possible software packages (AKA the distribution, as RPMs) and descriptive information to configure a node (the kickstart file) together define the compute node, IO server and web server appliances.]

Description-based Configuration

• Built an infrastructure that "describes" the roles of cluster nodes
– Nodes are installed using Red Hat's kickstart
  • ASCII file with names of packages to install and "post-processing" commands

• Rocks builds the kickstart file on the fly, tailored for each node

• NPACI Rocks kickstart is general configuration + local node configuration
– General configuration is described by modules linked in a configuration graph
– Local node configuration (applied during post-processing) is stored in a MySQL database


What Are the Challenges?

• Kickstart file is ASCII
– There is some structure
  • Pre-configuration
  • Package list
  • Post-configuration

• Not a "programmable" format
– Most complicated section is post-configuration
  • Usually this is handcrafted
– Want to be able to build sections of the kickstart file from pieces
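The "some structure" in a kickstart file consists of a command section followed by `%`-headed sections such as `%pre`, `%packages` and `%post`. The Python sketch below splits a handcrafted file along those lines, which is the first step toward treating its pieces as composable parts. The function name is hypothetical. Section options such as `%post --nochroot` are dropped. The `%end` handling covers newer kickstart versions that close sections explicitly.

```python
from typing import Dict, List


def split_kickstart(text: str) -> Dict[str, List[str]]:
    """Group the lines of a kickstart file by section.

    Lines before the first %-header form the "commands" section; a line that
    starts with '%' (e.g. %pre, %packages, %post) opens a new section.
    """
    sections: Dict[str, List[str]] = {"commands": []}
    current = "commands"
    for line in text.splitlines():
        if line.startswith("%"):
            header = line.split()[0]  # keep "%post", drop options like --nochroot
            if header == "%end":      # newer kickstart versions close sections explicitly
                current = "commands"
                continue
            current = header
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections
```

Splitting an existing file this way exposes the difficulty the slide describes: the `%post` lines are just opaque shell. Nothing in the format says which lines belong to which subsystem.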

Break down configuration of appliances into small compositional pieces

Cluster Description Appliances

Allows small differences in configuration to be easily described

Architecture Dependencies

• Allows users to focus only on the differences

• Architecture type is passed from the top
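The graph idea can be sketched in Python as a depth-first walk from an appliance that follows only the edges valid for the architecture passed in from the top. This is a hedged illustration, not Kpp itself. The module names and the edge representation (child plus an optional architecture tag) are hypothetical. The sketch returns the module order only, whereas the slides describe Kpp emitting one merged XML file.

```python
from typing import Dict, List, Optional, Set, Tuple

# Edge list per module: (child module, architecture, or None for every architecture).
Graph = Dict[str, List[Tuple[str, Optional[str]]]]


def traverse(graph: Graph, appliance: str, arch: str) -> List[str]:
    """Depth-first walk from an appliance, following only edges valid for `arch`.

    Each module appears once, in first-visit order, so a module shared by
    several parents (like "ssh" below) is included a single time.
    """
    order: List[str] = []
    seen: Set[str] = set()

    def visit(module: str) -> None:
        if module in seen:
            return
        seen.add(module)
        order.append(module)
        for child, edge_arch in graph.get(module, []):
            if edge_arch is None or edge_arch == arch:
                visit(child)

    visit(appliance)
    return order


# Hypothetical graph: names are illustrative, not the actual Rocks graph.
GRAPH: Graph = {
    "compute": [("client", None), ("mpi", None)],
    "client": [("base", None), ("ssh", None)],
    "base": [("lilo", "i386"), ("elilo", "ia64")],
    "mpi": [("ssh", None)],
}
```

With this graph, `traverse(GRAPH, "compute", "i386")` visits `lilo`, while the same call with `"ia64"` visits `elilo` instead. Everything else is shared, so only the difference has to be described.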

XML Used to Describe Modules

• Abstract package names, versions, architecture
– ssh-client
Not
– ssh-client-2.1.5.i386.rpm

• Allow an administrator to encapsulate a logical subsystem

• Node-specific configuration can be retrieved from a database
– IP address
– Firewall policies
– Remote access policies
– …

<?xml version="1.0" standalone="no"?>
<!DOCTYPE kickstart SYSTEM "@KICKSTART_DTD@" [<!ENTITY ssh "openssh">]>
<kickstart>

  <description>
  Enable SSH
  </description>

  <package>&ssh;</package>
  <package>&ssh;-clients</package>
  <package>&ssh;-server</package>
  <package>&ssh;-askpass</package>

  <!-- include XFree86 packages for xauth -->
  <package>XFree86</package>
  <package>XFree86-libs</package>

<post>
cat &gt; /etc/ssh/ssh_config &lt;&lt; 'EOF' <!-- default client setup -->
Host *
        CheckHostIP             no
        ForwardX11              yes
        ForwardAgent            yes
        StrictHostKeyChecking   no
        UsePrivilegedPort       no
        FallBackToRsh           no
        Protocol                1,2
EOF
</post>

</kickstart>
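A module like the SSH example can be read with Python's standard ElementTree. The sketch below collects the description, the abstract package names, and the post scripts. It relies on entities declared in the internal DTD subset, such as `ssh` above, being expanded by the parser. ElementTree does not fetch the external `@KICKSTART_DTD@` file, and XML comments are dropped. The `Module` type and the function name are hypothetical, not the Rocks tooling.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Module(NamedTuple):
    description: str
    packages: List[str]  # abstract names, e.g. "openssh-server", not file names
    post: List[str]      # shell text of each <post> section


def parse_module(xml_text: str) -> Module:
    """Collect the description, package names and post scripts of one module."""
    root = ET.fromstring(xml_text)  # expands entities declared in the internal subset
    description = (root.findtext("description") or "").strip()
    packages = [p.text.strip() for p in root.findall("package") if p.text]
    # itertext() rejoins the text around any dropped comments; &gt;/&lt; arrive as > and <.
    post = ["".join(p.itertext()).strip("\n") for p in root.findall("post")]
    return Module(description, packages, post)
```

Because packages are kept as abstract names, the same module can serve any architecture. Mapping a name to a concrete RPM file is left to the distribution.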

Creating the Kickstart file

1. Node makes an HTTP request to get its configuration
   • Can be online or captured to a file
   • Node reports architecture type, IP address, [appliance type], [options]

2. Kpp: preprocessor
   • Start at the appliance type (node) and make a single large XML file by traversing the graph

3. Kgen: generation
   • Translation to kickstart format. Other formats could be supported
   • Node-specific configuration looked up in a database

• Graph visualization using dot (AT&T)
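The generation step can be sketched in Python as follows. The modules are taken in traversal order, their package lists are merged without duplicates, their post scripts are concatenated, and node-specific values are filled in. Everything here is an assumption for illustration. The module dictionaries, the `@KEY@` placeholder convention for database values, and the `kgen` name are hypothetical. The output also omits the kickstart command section and the `%end` markers that newer kickstart versions expect.

```python
from typing import Dict, List


def kgen(modules: List[Dict[str, List[str]]], node: Dict[str, str]) -> str:
    """Translate traversed modules into kickstart %packages/%post text.

    modules: in traversal order; each may have "packages" and "post" lists.
    node: node-specific values from the database; each replaces an @KEY@ token.
    """
    packages: List[str] = []
    for module in modules:
        for pkg in module.get("packages", []):
            if pkg not in packages:
                packages.append(pkg)  # first occurrence wins; duplicates dropped
    post = "\n".join(script for module in modules for script in module.get("post", []))
    for key, value in node.items():
        post = post.replace(f"@{key.upper()}@", value)
    return "%packages\n" + "".join(p + "\n" for p in packages) + "%post\n" + post + "\n"
```

Keeping the database lookup in this last step means the XML modules stay generic, and only the generated file is specific to one node.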

HTTP as Transport

• Kickstart file is retrieved via HTTP
– The Rocks web site provides a form to build the configuration for a remote site's frontend (bootstrap, captured to a file)
– Cluster frontend acts as the server for cluster nodes (online, bootstrap nodes)

• RPMs transported via HTTP
• Web infrastructure is very scalable and robust
• Managing configurations can go beyond a cluster
– We've installed/configured our home machines from SDSC over a cable modem
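Using HTTP as the transport means the frontend needs only an ordinary web handler. The handler reads what the node reports, runs the generator, and returns plain text. The Python sketch below uses the standard `http.server` module to show that shape. The `/kickstart` path, the `arch` query parameter, and `generate_kickstart` are hypothetical placeholders for the Kpp/Kgen pipeline; this is not the Rocks server code. The node's IP address is taken from the connection itself.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def generate_kickstart(arch: str, ip: str) -> str:
    """Placeholder for the preprocess/generate pipeline (hypothetical)."""
    return f"# kickstart for {ip} ({arch})\n%packages\nopenssh\n"


class KickstartHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        query = parse_qs(urlparse(self.path).query)
        arch = query.get("arch", ["i386"])[0]  # architecture reported by the node
        ip = self.client_address[0]            # requesting node's address
        body = generate_kickstart(arch, ip).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the demo quiet


def serve(port: int = 0) -> HTTPServer:
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), KickstartHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = serve()
    url = f"http://127.0.0.1:{server.server_port}/kickstart?arch=ia64"
    print(urllib.request.urlopen(url).read().decode())
    server.shutdown()
```

In a real deployment, the same requests would be served by a production web server, which is where the scalability the slide mentions comes from. This single-threaded demo makes no such claim.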

Payoff: Never-Before-Seen Hardware

• Dual Athlon, white box, 20 GB IDE, 3Com Ethernet
– 3:00 PM: In cardboard box
– Shook out the loose screws
– Dropped in a Myrinet card
– Inserted it into cabinet 0
  • Cabled it up
– 3:25 PM: Inserted the NPACI Rocks CD
– Ran insert-ethers (assigned node name compute-0-24)
– 3:40 PM: Ran Linpack

Futures

• Improve monitoring, debugging, and self-diagnosis of cluster-specific software

• Improve documentation!
• Continue tracking Red Hat updates/releases
• Prepare for the InfiniBand interconnect
– Global file systems: I/O is an Achilles heel of clusters

• Grid tools (development and testing)
– Globus
– Grid research tools (APST)
– GridPort toolkit

• Integration with other SDSC projects
– SRB
– MiX: data mediation
– Visualization Cluster: Display Wall

Summary

• Rocks significantly lowers the bar for users to deploy usable compute clusters
– Very simple hardware assumptions

• XML module descriptions allow encapsulation

• Graph interconnection allows appliances to share configuration
– Deltas among appliances are easily visualized

• HTTP transport is scalable in
– Performance
– Distance