
FastOS Workshop: PROSE Presentation

Systems Support for
Many Task Computing

Holistic Aggregate Resource Environment

Eric Van Hensbergen (IBM) and Ron Minnich (Sandia National Labs)


Motivation

Overview of Approach

Targeting Blue Gene/P to provide a complementary runtime environment

Using Plan 9 Research Operating System

Right Weight Kernel: balances simplicity and function

Built from the ground up as a distributed system

Leverage HPC interconnects for system services

Distribute system services among compute nodes

Leverage aggregation as a first-class systems construct to help manage complexity and provide a foundation for scalability, reliability, and efficiency.

Related Work

Default Blue Gene runtime: Linux on I/O nodes + CNK on compute nodes

High Throughput Computing (HTC) Mode

Compute Node Linux

ZeptoOS

Kittyhawk

Foundation: Plan 9 Distributed System

Right Weight Kernel: general-purpose multi-thread, multi-user environment

Pleasantly portable

Relatively Lightweight (compared to Linux)

Core Principles: all resources are synthetic file hierarchies

Local & remote resources accessed via simple API

Each thread can organize local and remote resources via a dynamic private namespace
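To make the namespace idea concrete, here is a minimal sketch of a per-process namespace in the spirit of Plan 9's bind/mount and rfork. The class, method names, and server names are illustrative assumptions, not the actual Plan 9 kernel API.

```python
# Hypothetical sketch: each process assembles its own private view of
# local and remote resources, as in Plan 9's per-process namespaces.

class Namespace:
    """A private mapping from path prefixes to resource servers."""

    def __init__(self, mounts=None):
        self.mounts = dict(mounts or {})  # path prefix -> server name

    def bind(self, server, path):
        """Attach a (possibly remote) server at a path in this namespace."""
        self.mounts[path] = server

    def fork(self):
        """Child inherits a copy; later binds stay private (like rfork)."""
        return Namespace(self.mounts)

    def resolve(self, path):
        """Find the server responsible for the longest matching prefix."""
        best = max((p for p in self.mounts if path.startswith(p)),
                   key=len, default=None)
        if best is None:
            raise FileNotFoundError(path)
        return self.mounts[best]

ns = Namespace()
ns.bind("local-kernel", "/dev")
ns.bind("io-node-fs", "/n/home")           # remote file server, e.g. via 9P
child = ns.fork()
child.bind("compute-node-7", "/dev/cons")  # private override in the child

print(ns.resolve("/dev/cons"))      # local-kernel (parent unaffected)
print(child.resolve("/dev/cons"))   # compute-node-7
print(child.resolve("/n/home/data.txt"))  # io-node-fs
```

The key property shown: a child's binds never disturb the parent's view, which is what lets each task wire remote services into its own file tree safely.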

Everything Represented as File Systems

(Figure: everything from hardware devices to application services exported as file trees.)

Hardware devices: disk (/dev/hda1, /dev/hda2), network (/dev/eth0)

System services: TCP/IP stack (/net with /arp, /udp, /tcp, /clone, /stats, and per-connection directories /0, /1, ... each holding /ctl, /data, /listen, /local, /remote, /status), DNS (/net/cs, /net/dns)

Application services: GUI (/win with /clone and per-window directories /0, /1, /2 holding /ctl, /data, /refresh), console, audio, etc.

Wiki, authentication, and service control

Process control, debug, etc.
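As a sketch of how the /net file tree above is actually used, the following simulates Plan 9-style dialing: read the clone file to allocate a connection directory, write a connect message to its ctl file, then do I/O on the data file. The SyntheticNet class is a stand-in for illustration; in real Plan 9 the kernel serves these files.

```python
# Hypothetical sketch of Plan 9's file-based network interface.

class SyntheticNet:
    def __init__(self):
        self.next_conn = 0
        self.ctl = {}  # connection number -> control messages received

    def read(self, path):
        if path == "/net/tcp/clone":
            n = self.next_conn          # reading clone allocates a new
            self.next_conn += 1         # connection directory /net/tcp/N
            self.ctl[n] = []
            return str(n)
        raise FileNotFoundError(path)

    def write(self, path, msg):
        # e.g. write "connect 9.3.61.1!564" to /net/tcp/0/ctl
        parts = path.split("/")         # ['', 'net', 'tcp', '0', 'ctl']
        self.ctl[int(parts[3])].append(msg)

def dial(net, addr):
    """Open a connection the Plan 9 way: clone, then write 'connect'."""
    conn = net.read("/net/tcp/clone")
    net.write(f"/net/tcp/{conn}/ctl", f"connect {addr}")
    return f"/net/tcp/{conn}/data"      # I/O then happens on the data file

net = SyntheticNet()
print(dial(net, "9.3.61.1!564"))   # /net/tcp/0/data
print(dial(net, "9.3.61.2!564"))   # /net/tcp/1/data
```

Because the whole interaction is ordinary opens, reads, and writes, the same protocol works identically whether /net is served by the local kernel or imported from another node.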

Plan 9 Networks

(Figure: a Plan 9 network. CPU servers, a file server, and content-addressable storage share a high-bandwidth (10 GB/s) network; terminals sit on a LAN (1 GB/s) network; PDAs, smartphones, set-top boxes, and screen phones reach the system over the Internet via Wifi/Edge and Cable/DSL.)

An Issue of Scale

Chip: BG/P, 4-way

Compute Card: 2 chips

Node Card (4x4x2): 32 compute cards, 0-2 I/O cards

Rack: 32 node cards

System: 72 racks

Aggregation as a First Class Concept

(Diagram: a local service interacts with an aggregate service, which stands in front of several remote services via a proxy service.)

Issues of Topology

File Cache Example

Proxy Service: monitors access to remote file server & local resources

Local cache mode

Collaborative cache mode

Designated cache server(s)

Integrate replication and redundancy

Explore write coherence via territories, a la Envoy

Based on experiences with Xget deployment model

Leverage natural topology of machine where possible.
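The designated-cache-server idea above can be sketched as follows: each file is assigned a cache owner chosen deterministically among the nodes that share a node card, so neighbors resolve the same file to the same cache and only one copy crosses to the remote file server. The hashing scheme and topology grouping here are illustrative assumptions, not the HARE design itself.

```python
# Hypothetical sketch of collaborative caching keyed to machine topology.
import hashlib

def cache_server(path, local_node, nodes_per_card=32):
    """Pick the designated cache node for `path` within this node's card."""
    card = local_node // nodes_per_card
    card_nodes = list(range(card * nodes_per_card,
                            (card + 1) * nodes_per_card))
    h = int(hashlib.sha1(path.encode()).hexdigest(), 16)
    return card_nodes[h % nodes_per_card]

# Every node on card 0 agrees on the same cache server for this file,
# so the remote I/O-node file server sees at most one fetch per card.
owners = {cache_server("/n/home/input.dat", n) for n in range(32)}
print(len(owners))  # 1  (all 32 nodes picked the same designated server)
```

Choosing the owner from the local node card rather than the whole machine keeps cache traffic on short torus paths, which is the "leverage natural topology" point in the list above.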

Monitoring Example

Distribute monitoring throughout the system: use for system health monitoring and load balancing

Allow for application-specific monitoring agents

Distribute filtering & control agents at key points in topology

Allow for localized monitoring and control as well as high-level global reporting and control

Explore both push and pull methods of modeling

Based on experiences with supermon system.
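The filtering-and-aggregation idea can be sketched as a reduction tree: leaves report raw samples, and agents at each level of the topology roll them up so the top sees a summary rather than tens of thousands of raw streams. The structure and statistics below are illustrative assumptions, not the supermon protocol.

```python
# Hypothetical sketch of hierarchical monitoring aggregation.

def aggregate(samples, fanout=4):
    """Roll per-node load samples up a tree, keeping min/max/mean per group."""
    level = [{"min": s, "max": s, "mean": s, "n": 1} for s in samples]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]   # one aggregation agent's children
            n = sum(g["n"] for g in group)
            nxt.append({
                "min": min(g["min"] for g in group),
                "max": max(g["max"] for g in group),
                "mean": sum(g["mean"] * g["n"] for g in group) / n,
                "n": n,
            })
        level = nxt
    return level[0]

summary = aggregate([0.2, 0.3, 0.9, 0.1, 0.4, 0.4, 0.4, 0.5])
print(summary["n"], summary["min"], summary["max"])  # 8 0.1 0.9
```

An intermediate agent could also act locally (e.g., rebalance within its subtree when its group's max load spikes) without involving the global level, which is the localized-control point above.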

Workload Management Example

Provide file system interface to job execution and scheduling.

Allows scheduling of new work from within the cluster, using localized as well as global scheduling controls.

Can allow for more organic growth of workloads as well as top-down and bottom-up models.

Can be extended to allow direct access from end-user workstations.

Based on experiences with Xcpu mechanism.
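A file-system interface to job execution might look like the sketch below, in the spirit of Xcpu: read a clone file to create a job session, write an exec command to its ctl file, read its status file to observe the job. All file names and commands here are hypothetical illustrations.

```python
# Hypothetical sketch of a file-served job-execution interface.

class ExecFS:
    def __init__(self):
        self.sessions = {}

    def read(self, path):
        if path == "/proc/clone":
            sid = len(self.sessions)        # allocate a new job session
            self.sessions[sid] = {"status": "idle"}
            return str(sid)
        sid = int(path.split("/")[2])       # e.g. /proc/0/status
        return self.sessions[sid]["status"]

    def write(self, path, msg):
        sid = int(path.split("/")[2])       # e.g. /proc/0/ctl
        if msg.startswith("exec "):
            self.sessions[sid]["status"] = "running " + msg[5:]

fs = ExecFS()
sid = fs.read("/proc/clone")                # new job session
fs.write(f"/proc/{sid}/ctl", "exec ./a.out")
print(fs.read(f"/proc/{sid}/status"))       # running ./a.out
```

Because scheduling is just file I/O, any node that can mount this tree into its namespace can submit or inspect work, which is what enables both in-cluster scheduling and direct access from end-user workstations.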

Status

Initial Port to BG/P 90% Complete

Applications: Linux emulation environment

CNK emulation environment

Native ports of applications

Also have a port of the Inferno virtual machine to BG/P: runs on Kittyhawk as well as native

Baseline boot & runtime infrastructure complete

HARE Team

David Eckhardt (Carnegie Mellon University)

Charles Forsyth (Vitanuova)

Jim McKie (Bell Labs)

Ron Minnich (Sandia National Labs)

Eric Van Hensbergen (IBM Research)

Thanks

Funding: This material is based upon work supported by the Department of Energy under Award Number DE-FG02-08ER25851.

Resources: This work is being conducted on resources provided by the Department of Energy's Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program.

Information: The authors would also like to thank the IBM Research Blue Gene team along with the IBM Research Kittyhawk team for their assistance.

Questions? Discussion?

Links

FastOS Web Site: http://www.cs.unm.edu/~fastos/

Phase II CFP: http://www.sc.doe.gov/grants/FAPN07-23.html

BlueGene: http://www.research.ibm.com/bluegene/

Plan 9: http://plan9.bell-labs.com/plan9

LibraryOS: http://www.research.ibm.com/prose

Plan 9 Characteristics

Kernel Breakdown - Lines of Code

Architecture-Specific Code: BG/L ~10,000 lines of code

Portable Code: ~25,000 lines of code

TCP/IP Stack: ~14,000 lines of code

Binary Sizes: 415k text + 140k data + 107k BSS

Runtime Memory Footprint: ~4 MB for compute node kernels; could be smaller or larger depending on application-specific tuning.

Why not Linux?

Not a distributed system

Core systems inflexible: VM based on x86 MMU

Networking tightly tied to sockets & TCP/IP, with a long call path

Typical installations extremely overweight and noisy

Benefits of modularity and open source are overcome by complexity, dependencies, and rapid rate of change

Community has become conservative: support for alternative interfaces waning

Support for large systems that hurts small systems is not acceptable

Ultimately a customer constraint: FastOS was developed to prevent OS monoculture in HPC

Few Linux projects were even invited to submit final proposals

FTQ on BG/L IO Node running Linux

FTQ on BG/L IO Node Running Plan 9

Right Weight Kernels Project (Phase I)

Motivation: OS effect on applications; metric is based on OS interference on FWQ & FTQ benchmarks.

AIX/Linux has more capability than many apps need

LWK and CNK have less capability than apps want

Approach: customize the kernel to the application

Ongoing Challenges: need to balance capability with overhead

Why Blue Gene?

Readily available large-scale cluster: minimum allocation is 37 nodes

Easy to get 512 and 1024 node configurations

Up to 8192 nodes available upon request internally

FastOS will make a 64k configuration available

DOE interest: Blue Gene was a specified target

Variety of interconnects allows exploration of alternatives

Embedded core design provides a simple architecture that is quick to port to and doesn't require heavyweight systems software management, device drivers, or firmware

Department of Energy FastOS CFP
aka: Operating and Runtime System for Extreme Scale Scientific Computation (DE-PS02-07ER07-23)

Goal:

Stimulate R&D related to operating and runtime systems for petascale systems in the 2010 to 2015 time frame.

Expected Output:

A unified operating and runtime system that could fully support and exploit petascale and beyond systems.

Near-Term Hardware Targets: Blue Gene, Cray XD3, and HPCS machines.

Blue Gene Interconnects

3-Dimensional Torus: interconnects all compute nodes (65,536)

Virtual cut-through hardware routing

1.4 Gb/s on all 12 node links (2.1 GB/s per node)

1 μs latency between nearest neighbors, 5 μs to the farthest

4 μs latency for one hop with MPI, 10 μs to the farthest

Communications backbone for computations

0.7/1.4 TB/s bisection bandwidth, 68 TB/s total bandwidth
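The torus figures above are consistent with each other, which a quick back-of-the-envelope check shows (assuming the 64k-node configuration and counting each link once for the aggregate):

```python
# Arithmetic check of the torus bandwidth figures (64k-node machine assumed).
link_gbps = 1.4          # Gb/s per torus link
links_per_node = 12      # 6 neighbors x 2 directions
nodes = 65536

per_node_GBps = link_gbps * links_per_node / 8          # bits -> bytes
total_TBps = nodes * links_per_node / 2 * link_gbps / 8 / 1000

print(round(per_node_GBps, 1))  # 2.1  -> the slide's 2.1 GB/s per node
print(round(total_TBps))        # 69   -> the slide's ~68 TB/s aggregate
```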

Global Tree: one-to-all broadcast functionality

Reduction operations functionality

2.8 Gb/s of bandwidth per link

Latency of one-way tree traversal: 2.5 μs

~23 TB/s total binary tree bandwidth (64k machine)

Interconnects all compute and I/O nodes (1024)
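The tree's aggregate figure also checks out, assuming one 2.8 Gb/s link per node in the ~64k-node binary tree:

```python
# Arithmetic check of the global tree bandwidth figure.
link_gbps = 2.8
nodes = 65536
total_TBps = nodes * link_gbps / 8 / 1000   # bits -> bytes, GB -> TB

print(round(total_TBps, 1))   # 22.9  -> the slide's ~23 TB/s
```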

Ethernet: incorporated into every node ASIC

Active in the I/O nodes (1:64)

All external comm. (file I/O, control, user interaction, etc.)

Low-Latency Global Barrier and Interrupt: latency of round trip is 1.3 μs

Control Network


IBM Research, Sandia National Labs, Bell Labs, and CMU. (c) 2008 IBM Corporation

Systems Support for Many Task Computing

11/17/2008


IBM Research

FastOS Workshop

05/30/06

(c) 2006 IBM Corporation