Designing Parallel Operating Systems via Parallel Programming

Euro-Par - August 31- September 3, 2004 - Pisa (Italy) Designing Parallel Operating Systems via Parallel Programming email:[email protected] Eitan Frachtenberg 1 , Kei Davis 1 , Fabrizio Petrini 1 , Juan Fernández 1,2 and José Carlos Sancho 1 1 Performance and Architecture Lab (PAL) 2 Grupo de Arquitectura y Computación Paralelas (GACOP) CCS-3 Modeling, Algorithms and Informatics Dpto. Ingeniería y Tecnología de Computadores Los Alamos National Laboratory, NM 87545, USA Universidad de Murcia, 30071 Murcia, SPAIN URL: http://www.c3.lanl.gov URL: http:// www.ditec.um.es


Page 1: Designing Parallel Operating Systems via Parallel Programming

Euro-Par - August 31- September 3, 2004 - Pisa (Italy)

Designing Parallel Operating Systems via Parallel Programming

email:[email protected]

Eitan Frachtenberg (1), Kei Davis (1), Fabrizio Petrini (1), Juan Fernández (1,2) and José Carlos Sancho (1)

(1) Performance and Architecture Lab (PAL), CCS-3 Modeling, Algorithms and Informatics, Los Alamos National Laboratory, NM 87545, USA. URL: http://www.c3.lanl.gov

(2) Grupo de Arquitectura y Computación Paralelas (GACOP), Dpto. Ingeniería y Tecnología de Computadores, Universidad de Murcia, 30071 Murcia, SPAIN. URL: http://www.ditec.um.es

Page 2: Designing Parallel Operating Systems via Parallel Programming

Motivation

Clusters have been the most successful player in high-performance computing in the last decade

HARDWARE = Independent Nodes + High-speed Network

SOFTWARE = Commodity OS + Parallel Apps + System Software

[Figure: cluster nodes, each labeled OS]

Page 3: Designing Parallel Operating Systems via Parallel Programming

Motivation

Ever-increasing demand for computing capability is driving the construction of ever-larger clusters

– Earth Simulator: 5120 processors
– Thunder (LLNL): 4096 processors
– ASCI Q (LANL): 8192 processors

Systems are becoming more complex, less efficient and less reliable

Page 4: Designing Parallel Operating Systems via Parallel Programming

Motivation

Clusters are loosely-coupled systems used for solving inherently tightly-coupled problems

Parallel software keeps all the pieces together

Development of parallel software is a time- and resource-consuming task due to its complexity

PROBLEM: parallel software has neither evolved nor scaled in step with cluster sizes

SOLUTION: new approach to the design of parallel software for large-scale clusters

Page 5: Designing Parallel Operating Systems via Parallel Programming

Goals

Target
– New methodology for the design of parallel software
– Simplicity, performance, scalability, reliability
– Backbone to integrate all nodes into a parallel OS

Vision
– BSP-like system running MIMD applications (variable granularity on the order of hundreds of μs)

Approach
– BSP-like global control and coordination of all system activities
– Small set of collective communication primitives for global coordination

Page 6: Designing Parallel Operating Systems via Parallel Programming

Outline

– Motivation and Goals
– Toward a Parallel Operating System
– Core Primitives
– Parallel Software Design
– Case Studies
– Concluding remarks

Page 7: Designing Parallel Operating Systems via Parallel Programming

Toward a Parallel OS

Designing a Parallel OS:
– Lack of global coordination (loose coupling)
– Redundant/missing functionality (complexity)

[Figure: layered diagram showing Hardware; Comm Protocol 1, Comm Protocol 2, ..., Comm Protocol N; Resource Management, Parallel Application, ..., Parallel File System]

Page 8: Designing Parallel Operating Systems via Parallel Programming

Toward a Parallel OS

Scientific applications are tightly coupled …
– Data dependencies between nodes
– They exchange messages very often

… but the processing nodes are “bolted together” in a loosely coupled fashion

Need for global control and coordination of all the system activities, enforced by global collective communication primitives

Page 9: Designing Parallel Operating Systems via Parallel Programming

Toward a Parallel OS

Designing a Parallel OS:
– System-level, global control and coordination of all application and system software activities

[Figure: layered diagram showing Hardware; Comm Protocol 1, Comm Protocol 2, ..., Comm Protocol N; Global control and coordination; Resource Management, Parallel Application, ..., Parallel File System]

Page 10: Designing Parallel Operating Systems via Parallel Programming

Toward a Parallel OS

Parallel applications use point-to-point and collective communication

System software tasks are either collective operations or can be cast in terms of them

Parallel applications and system software can be built atop the same communication primitives

Page 11: Designing Parallel Operating Systems via Parallel Programming

Toward a Parallel OS

Designing a Parallel OS:
– Least common denominator of system and application software: Core Primitives

[Figure: layered diagram showing Hardware; Comm Protocol 1, Comm Protocol 2, ..., Comm Protocol N; Core Primitives; Global control and coordination; Resource Management, Parallel Application, ..., Parallel File System]

Page 12: Designing Parallel Operating Systems via Parallel Programming

Outline

– Motivation and Goals
– Toward a Parallel Operating System
– Core Primitives
– Parallel Software Design
– Case Studies
– Concluding remarks

Page 13: Designing Parallel Operating Systems via Parallel Programming

Core Primitives

Parallel software built atop three primitives

Xfer-And-Signal
– Transfer block of data to a set of nodes
– Optionally signal local/remote event upon completion

Test-Event
– Poll local event

Compare-And-Write
– Compare global variable on a set of nodes
– Optionally write global variable on the same set of nodes
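The three primitives can be mimicked in a few lines of single-process Python. This is only a sketch under stated assumptions, not the QsNet implementation: the `Node` class, the function names and signatures, the choice to clear an event once it has been polled, and the rule that the optional write happens only when the comparison holds on every node are all assumptions made for illustration.

```python
import operator
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one cluster node."""
    memory: dict = field(default_factory=dict)     # named data blocks
    variables: dict = field(default_factory=dict)  # global variables
    events: set = field(default_factory=set)       # signaled events


def xfer_and_signal(src, dests, key, local_event=None, remote_event=None):
    """Copy the block stored under `key` on `src` to every node in `dests`,
    then optionally signal an event locally and/or on the destinations."""
    for d in dests:
        d.memory[key] = src.memory[key]
    if remote_event is not None:
        for d in dests:
            d.events.add(remote_event)
    if local_event is not None:
        src.events.add(local_event)


def poll_event(node, event):
    """Test-Event: report whether a local event was signaled.
    Clearing the event after a successful poll is an assumption."""
    if event in node.events:
        node.events.discard(event)
        return True
    return False


def compare_and_write(nodes, var, op, value, write=None):
    """Compare variable `var` against `value` with `op` on all `nodes`.
    If the comparison holds everywhere and `write` is (name, new_value),
    write that variable on the same set of nodes (assumed semantics)."""
    ok = all(op(n.variables[var], value) for n in nodes)
    if ok and write is not None:
        name, new_value = write
        for n in nodes:
            n.variables[name] = new_value
    return ok
```

For example, `compare_and_write(nodes, "V", operator.ge, 3, write=("go", 1))` asks whether V is at least 3 on every node and, if so, sets `go` to 1 on all of them.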

Page 14: Designing Parallel Operating Systems via Parallel Programming

Core Primitives

Parallel software built atop three primitives

Xfer-And-Signal (QsNet):
– Node S transfers block of data to nodes D1, D2, D3 and D4

[Figure: source node S and destination nodes D1, D2, D3, D4]

Page 15: Designing Parallel Operating Systems via Parallel Programming

Core Primitives

Parallel software built atop three primitives

Xfer-And-Signal (QsNet):
– Node S transfers block of data to nodes D1, D2, D3 and D4
– Events triggered at source and destinations

[Figure: source node S and destination nodes D1, D2, D3, D4, with a Source Event at S and Destination Events at D1 to D4]

Page 16: Designing Parallel Operating Systems via Parallel Programming

Core Primitives

Parallel software built atop three primitives

Compare-And-Write (QsNet):
– Node S compares variable V on nodes D1, D2, D3 and D4

• Is V {, , >} to Value?

[Figure: source node S and destination nodes D1, D2, D3, D4]

Page 17: Designing Parallel Operating Systems via Parallel Programming

Core Primitives

Parallel software built atop three primitives

Compare-And-Write (QsNet):
– Node S compares variable V on nodes D1, D2, D3 and D4
– Partial results are combined in the switches

[Figure: source node S and destination nodes D1, D2, D3, D4]
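One way to picture "partial results are combined in the switches" is an AND-reduction over a tree: each switch combines the answers of its children and forwards a single result upward. The sketch below is purely illustrative; the function names, the switch arity of 4, and the use of a logical AND as the combining operation are assumptions, not details taken from the slides.

```python
import operator


def combine_in_tree(leaf_results, arity=4):
    """AND-combine per-node results level by level, the way a tree of
    switches with `arity` down-links each might (assumed topology).
    Returns (combined_result, number_of_switch_levels_traversed)."""
    level = list(leaf_results)
    if not level:
        raise ValueError("at least one node is required")
    levels = 0
    while len(level) > 1:
        # Each "switch" reduces up to `arity` partial results to one.
        level = [all(level[i:i + arity]) for i in range(0, len(level), arity)]
        levels += 1
    return level[0], levels


def compare_on_nodes(values, op, value, arity=4):
    """Evaluate `op(V, value)` on every node, then combine in the tree."""
    return combine_in_tree([op(v, value) for v in values], arity)
```

With 16 nodes and an arity of 4, the result reaches the root after two levels of combining, so the source receives one answer instead of 16 individual replies.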

Page 18: Designing Parallel Operating Systems via Parallel Programming

Outline

– Motivation and Goals
– Toward a Parallel Operating System
– Core Primitives
– Parallel Software Design
– Case Studies
– Concluding remarks

Page 19: Designing Parallel Operating Systems via Parallel Programming

Toward a Parallel OS

Global control/coordination of all system activities

[Figure: one time slice (hundreds of μs), delimited by a global strobe at its start and a global strobe at its end; Task 1, Task 2 and Task 3 run within the slice, with global synchronizations between them]

Page 20: Designing Parallel Operating Systems via Parallel Programming

Parallel Software Design

Using the core primitives…

Global control and coordination
– Strobe sent at regular intervals (time slices): Compare-And-Write + Xfer-And-Signal (Master), Test-Event (Slaves)
– All system activities are tightly coupled
– Global information is required to schedule resources; global synchronization facilitates the task but it is not enough

Global resource scheduling
– Exchange of requirements/restrictions: Xfer-And-Signal + Test-Event
– Resource scheduling
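The strobe mechanism can be sketched as a small, deterministic simulation. Everything here is an assumption made for illustration: the `Node` fields, the `done_slice` variable, and the rule that the master only sends the next strobe once every slave has completed the previous slice. Real strobes are sent at regular time intervals, which this sketch does not model.

```python
class Node:
    """Hypothetical slave node state."""
    def __init__(self):
        self.memory = {}       # blocks received via Xfer-And-Signal
        self.events = set()    # signaled local events
        self.done_slice = -1   # global variable: last completed time slice


def compare_and_write(nodes, attr, expected):
    """True if the global variable `attr` equals `expected` on every node."""
    return all(getattr(n, attr) == expected for n in nodes)


def xfer_and_signal(dests, key, block, event):
    """Deliver `block` to every destination and signal `event` there."""
    for n in dests:
        n.memory[key] = block
        n.events.add(event)


def poll_event(node, event):
    """Test-Event: consume the event if it has been signaled."""
    if event in node.events:
        node.events.remove(event)
        return True
    return False


def master_strobe(slaves, slice_no):
    """Master: check that all slaves finished the previous slice
    (Compare-And-Write), then multicast the strobe (Xfer-And-Signal)."""
    if not compare_and_write(slaves, "done_slice", slice_no - 1):
        return False
    xfer_and_signal(slaves, "slice", slice_no, "strobe")
    return True


def slave_step(node):
    """Slave: poll for the strobe (Test-Event); on arrival, do this
    slice's work (omitted here) and record the slice as completed."""
    if poll_event(node, "strobe"):
        node.done_slice = node.memory["slice"]
        return True
    return False


def run(num_slaves=4, num_slices=3):
    slaves = [Node() for _ in range(num_slaves)]
    for t in range(num_slices):
        if not master_strobe(slaves, t):
            break
        for s in slaves:
            slave_step(s)
    return [s.done_slice for s in slaves]
```

Calling `run(4, 3)` returns `[2, 2, 2, 2]`: every slave has seen and completed slices 0, 1 and 2. If one slave never polls its strobe, `master_strobe` returns `False` for the next slice, which is how the sketch exposes a straggler.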

Page 21: Designing Parallel Operating Systems via Parallel Programming

Parallel Software Design

Characteristic | Workstation | Cluster
Applications | System calls | Rely on System Software
Job Launching | OS | Scripts atop the OS
Job Scheduling | Timeshared by OS | Batch queued or gang scheduled by middleware
Communication | Standard IPC and shared memory | MPI
Storage | Standard file system | Custom parallel file system
Debuggability | Standard tools | Custom parallel debugging tools
Fault Tolerance | Little or none | Application-specified checkpointing

(Rows Job Launching through Fault Tolerance are labeled SYSTEM SOFTWARE)

Page 22: Designing Parallel Operating Systems via Parallel Programming

Parallel Software Design

Using the core primitives…

Characteristic | Requirement | Solution
Job Launching | Data Dissemination | Xfer-And-Signal
Job Launching | Flow Control | Compare-And-Write
Job Launching | Termination Detection | Compare-And-Write
Job Scheduling | Heartbeat | Xfer-And-Signal
Job Scheduling | Context Switch | Prioritized messages/multiple rails
Communication | PUT | Xfer-And-Signal
Communication | GET | Xfer-And-Signal
Communication | Barrier | Compare-And-Write
Communication | Broadcast | Compare-And-Write + Xfer-And-Signal
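Two rows of this mapping, Barrier and Broadcast, can be sketched as follows. This is a hedged illustration rather than the authors' code: the variable names `arrived`, `released` and `buffer_free`, the helper names, and the idea that the broadcast uses Compare-And-Write as a flow-control check before the transfer are all assumptions.

```python
import operator


class Node:
    """Hypothetical node with a few global variables."""
    def __init__(self):
        self.vars = {"arrived": 0, "released": 0, "buffer_free": 1}
        self.memory = {}
        self.events = set()


def compare_and_write(nodes, var, op, value, write=None):
    """Compare `var` on all nodes; if the test holds everywhere and
    `write` is (name, new_value), write it on all nodes."""
    ok = all(op(n.vars[var], value) for n in nodes)
    if ok and write is not None:
        name, new_value = write
        for n in nodes:
            n.vars[name] = new_value
    return ok


def xfer_and_signal(dests, key, block, event):
    """Deliver `block` to every destination and signal `event` there."""
    for n in dests:
        n.memory[key] = block
        n.events.add(event)


def barrier_try_release(nodes, epoch):
    """Barrier via Compare-And-Write: once every node has arrived at
    barrier number `epoch`, mark the barrier as released on all nodes."""
    return compare_and_write(nodes, "arrived", operator.ge, epoch,
                             write=("released", epoch))


def broadcast(dests, key, block):
    """Broadcast via Compare-And-Write + Xfer-And-Signal: claim every
    receive buffer first, then multicast the block."""
    if not compare_and_write(dests, "buffer_free", operator.eq, 1,
                             write=("buffer_free", 0)):
        return False
    xfer_and_signal(dests, key, block, "bcast")
    return True
```

In this sketch a node enters the barrier by setting its own `arrived` to the barrier number and then waits until `released` reaches the same value. A broadcast that finds any buffer still occupied returns `False` and would be retried later.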

Page 23: Designing Parallel Operating Systems via Parallel Programming

Parallel Software Design

Can we really build system software using this new approach?

Page 24: Designing Parallel Operating Systems via Parallel Programming

Outline

– Motivation and Goals
– Introduction
– Core Primitives
– Parallel Software Design
– Case Studies
– Concluding remarks

Page 25: Designing Parallel Operating Systems via Parallel Programming

Case Studies

Experimental Setup

Characteristic | Crescendo Cluster | Wolverine Cluster
Nodes | 32 x Dell 1550 | 64 AlphaServer ES40
CPUs/Node | 2 x 1GHz Pentium-III | 4 x 833MHz EV68
Memory/Node | 1 GB | 8 GB
Network Cards | QM-400 Elan3 | QM-400 Elan3
OS | RH 7.3 + QsNet kernel | RH 7.1 + QsNet kernel
Software | Qsnetlibs v1.5.0-0 + Intel C/Fortran Compiler 5.0.1 | Qsnetlibs v1.5.0-0 + Compaq's C Compiler

Page 26: Designing Parallel Operating Systems via Parallel Programming

Case Studies

STORM (Scalable TOol for Resource Management)
– Architecture:
    Set of dæmons running on the management/compute nodes
    Built atop the three core primitives
    BSP-like behavior: management activities are synchronized and scheduled every few hundreds of microseconds
– Functionality:
    Job Launching
    Job Scheduling (FCFS, gang scheduling and others)
    New scheduling algorithms can be “plugged in”
    Resource Accounting

Page 27: Designing Parallel Operating Systems via Parallel Programming

Case Studies

Job Launching: send/execute/check for completion

40 times faster than the best reported result!!!

Page 28: Designing Parallel Operating Systems via Parallel Programming

Case Studies

BCS-MPI (Buffered CoScheduled MPI)
– Architecture:
    Set of cooperative threads running in the NIC
    Built atop the three core primitives
    BSP-like behavior: communications are synchronized and scheduled every few hundreds of microseconds
– Functionality:
    Subset of the MPI standard
– Paves the way to provide:
    Traffic segregation
    Deterministic replay of user applications
    System-level fault tolerance

Page 29: Designing Parallel Operating Systems via Parallel Programming

Case Studies

SWEEP3D and SAGE Performance (IA32)
– Production-level MPI versus BCS-MPI

[Figure: two performance plots, annotated 0.5% SPEEDUP and 2% SPEEDUP]

Page 30: Designing Parallel Operating Systems via Parallel Programming

Outline

– Motivation and Goals
– Introduction
– Core Primitives
– Parallel Software Design
– Case Studies
– Concluding remarks

Page 31: Designing Parallel Operating Systems via Parallel Programming

Concluding Remarks

Methodology for designing parallel software
– Coordination of all system and application software activities in a BSP-like fashion
– Parallel applications and system software built atop a basic set of collective primitives for global coordination
– Backbone to integrate all nodes into a parallel OS

Promising preliminary results demonstrate that this approach is indeed feasible

Page 32: Designing Parallel Operating Systems via Parallel Programming

Future Work

Kernel-level implementation
– User-level solution is already working

Deterministic replay of MPI programs
– Ordered resource scheduling may enforce reproducibility

Transparent fault tolerance
– Global coordination simplifies the state of the machine


Page 34: Designing Parallel Operating Systems via Parallel Programming

Parallel Software Design

Using the core primitives…

Characteristic | Requirement | Solution
Storage | Metadata Transfer / File Data Transfer | Xfer-And-Signal
Debuggability | Debug Data Transfer | Xfer-And-Signal
Debuggability | Debug Synchronization | Compare-And-Write
Fault Tolerance | Fault Detection | Compare-And-Write
Fault Tolerance | Checkpointing Synchronization | Compare-And-Write
Fault Tolerance | Checkpointing Data Transfer | Xfer-And-Signal

Page 35: Designing Parallel Operating Systems via Parallel Programming

Case Studies

Job Scheduling: gang scheduling

Very small time slices: RESPONSIVENESS !!!

Page 36: Designing Parallel Operating Systems via Parallel Programming

Toward a Parallel OS

BCS-MPI: real-time communication scheduling

[Figure: one time slice (hundreds of μs), delimited by a global strobe at its start and a global strobe at its end; the slice contains three phases (exchange of comm requirements, communication scheduling, real transmission), with global synchronizations between them]
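The three phases named in the figure (exchange of communication requirements, communication scheduling, real transmission) can be sketched for a single time slice. The descriptor formats, the function name `run_time_slice`, and the simple first-match policy are assumptions for illustration only; the slides do not describe how BCS-MPI matches or orders messages.

```python
def run_time_slice(posted_sends, posted_recvs):
    """Simulate one time slice.

    posted_sends: list of (src, dst, payload) send descriptors
    posted_recvs: list of (dst, src) receive descriptors
    Returns (delivered, pending_sends, pending_recvs); unmatched
    descriptors are carried over to a later slice.
    """
    # Phase 1: exchange of communication requirements.
    # Here every descriptor is simply gathered into two global lists.
    sends = list(posted_sends)
    recvs = list(posted_recvs)

    # Phase 2: communication scheduling.
    # A send is scheduled only if a matching receive has been posted.
    schedule, pending_sends = [], []
    for src, dst, payload in sends:
        if (dst, src) in recvs:
            recvs.remove((dst, src))
            schedule.append((src, dst, payload))
        else:
            pending_sends.append((src, dst, payload))

    # Phase 3: real transmission of the scheduled messages.
    delivered = {}
    for src, dst, payload in schedule:
        delivered.setdefault(dst, []).append((src, payload))
    return delivered, pending_sends, recvs
```

Because data only moves in the third phase, after all nodes have agreed on the schedule, the order of transmissions within a slice is decided globally rather than by message arrival times.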

Page 37: Designing Parallel Operating Systems via Parallel Programming

Toward a Parallel OS

BCS-MPI: real-time communication scheduling