smd149 - operating systems - multiprocessing · smd149 - operating systems - multiprocessing ......

28
Introduction Multiprocessing architecture Operating system organization Memory access architectures Scheduling and process migration SMD149 - Operating Systems - Multiprocessing Roland Parviainen December 1, 2005 1 / 55 Introduction Multiprocessing architecture Operating system organization Memory access architectures Scheduling and process migration Overview Multiprocessor systems Multiprocessor, operating system and memory organizations 2 / 55

Upload: duongthu

Post on 20-Aug-2018

233 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

SMD149 - Operating Systems - Multiprocessing

Roland Parviainen

December 1, 2005

1 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Overview

Multiprocessor systems

Multiprocessor, operating system and memory organizations

2 / 55

Page 2: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Introduction

Multiprocessor system

Computer that contains more than one processor

Benefits

Increased processing powerScale resource use to application requirements

Additional operating system responsibilities

All processors remain busyEven distribution of processes throughout the systemAll processors work on consistent copies of shared dataExecution of related processes synchronizedMutual exclusion enforced

3 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Multiprocessing architecture

Examples of multiprocessors

Dual-processor personal computerPowerful server containing many processorsCluster of workstations

Classifications of multiprocessor architecture

Nature of datapathInterconnection schemeHow processors share resources

4 / 55

Page 3: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Flynn’s taxonomy

Based upon the number of concurrent instruction (or control) anddata streams available

SISD: Single instruction, single data streamMISD: Multiple instruction, single data streamSIMD: Single instruction, multiple data streamsMIMD: Multiple instruction, multiple data streams

5 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

SISD

Single instruction, single data stream

Sequential computer

Exploit minor parallelism:

PipeliningSuperscalar processorsHyper threading

6 / 55

Page 4: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

MISD

Multiple instruction, single data stream

Not commonly used

Redundancy

Three processors do the same computation, results are compared

7 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

SIMD

Single instruction, multiple data streams

One or more processing units

Same instruction on a block of data

Array or vector processors

Most processors have SIMD instructions

PowerPC’s AltiVec, Intel’s MMX, SSE, SSE2 and SSE3, AMD’s3DNow!, SPARCs VIS, PA-RISCs MAX and the MIPS MDMX andMIPS-3D.Multimedia

8 / 55

Page 5: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

MIMD

Multiple instruction, multiple data streams

Multiple autonomous processors simultaneously executing differentinstructions on different data

Multiprocessor systems

Network of workstations

Distributed systems

9 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Processor interconnection schemes

Describes how the system’s components, such as processors andmemory modules, are connected

Consists of nodes (components or switches) and links (connections)

Parameters used to evaluate interconnection schemes

Node degreeBisection widthNetwork diameterCost of the interconnection scheme

10 / 55

Page 6: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Shared bus

Single communication path between all nodes

Contention can build up for shared bus

Lacks scalability

Useful for interconnecting a small number of processors

For systems with many processors, a bus could be used tointerconnect nodes within a supernode

Dynamic network

Example: dual-processor Intel Pentium

11 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Shared bus

12 / 55

Page 7: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Crossbar-switch matrix

Separate path from every processor to every memory module (or ingeneral, from every node to every other node)

For n processors and m memory modules, there will be n x m totalswitches

High fault tolerance:

Bisection width = (n x m)/2

High performance:

Simultaneous transmissions can be supported

Network diameter = 1 yields low latency

High cost: proportional to n x m

Example: Sun UltraSPARC-III

13 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Crossbar-switch matrix

14 / 55

Page 8: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

2-D mesh network

n rows and m columns, in which a node is connected to nodesdirectly north, south, east and west of it

Maximum node degree = 4

Moderate performance, fault tolerance and cost

Not very scalable given that network diameter may be substantial

Example: Intel Paragon (large-scale system, but communication iskept between neighboring nodes)

15 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

2-D mesh network

16 / 55

Page 9: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Hypercube

n-dimensional hypercube has 2n nodes in which each node isconnected to n neighbor nodes

Better scalability and performance

4-D hypercube (16 nodes) results in network diameter of 4, while a2-D mesh network with 16 nodes results in network diameter of 6

More fault tolerant, but more expensive than crossbar switch or 2-Dmesh networks

Example: nCUBE (up to 8192 processors)

17 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Hypercube

18 / 55

Page 10: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Multistage networks

Switch nodes act as hubs routing messages between nodes

Cheaper (use simple hardware for switches that can be packedtightly)

Less fault tolerant and worse performance compared to acrossbar-switch matrix

Contention can occur at switches and network diameter is typicallylarger

Example: IBM POWER4

19 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Multistage networks

20 / 55

Page 11: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Loosely coupled vs. tightly coupled systems

Tightly coupled systems

Processors share most resources including memoryCommunicate over shared busesLess burden for operating system programmersLess flexible, poor scalability and fault tolerance

Loosely coupled systems

Processors do not share most resourcesDedicated memoryExplicit messages using communication link or shared virtual memoryMore flexible, fault tolerant, scalable

21 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Loosely coupled vs. tightly coupled systems

22 / 55

Page 12: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Multiprocessor operating system organizations

Significantly different from uniprocessor systems

Systems classified into three types (based on how processors shareoperating system responsibilities)

Master/slaveSeparate kernelsSymmetrical organization

23 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Master/slave

Master processor executes operating system code

Computation and I/O jobs

Slaves execute only user programs

Processor-bound jobs

Hardware asymmetry

Slave processors have to wait for the master processor to handle theinterrupt for OS operations

Low fault tolerance

Failure of the master processor results in system failure

Good for computationally intensive jobs

Example: nCUBE system

24 / 55

Page 13: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Separate kernels

Each processor executes its own operating system and responds onlyto interrupts generated from user processes running on thatprocessor

Loosely coupled

Little contention over resourcesSome globally shared operating system data

System failure unlikely, but failure of one processor results intermination of processes on that processor

Useful in environments where processes do not interact much andhigh availability is important

25 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Symmetrical

Operating system manages a pool of identical processors

High amount of resource sharingNeed for mutual exclusionAllows processes to truly share the processors and exploit parallelism

Highest degree of fault tolerance of any organization

Facilitates graceful degradation

Some contention for resources can limit performance improvementgained by adding more processors

26 / 55

Page 14: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Memory access architectures

Classification based on how memory is shared

Trade offs between performance, scalability and cost

27 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

UMA

Uniform memory access (UMA) multiprocessor system

All processors share main memory

Access to any memory page is nearly the same for all processors andall memory modules

Typically uses shared bus or crossbar-switch matrix

Used in small multiprocessor systems (typically two to eightprocessors)

28 / 55

Page 15: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

UMA

29 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

NUMA

Nonuniform memory access (NUMA) multiprocessor

Global memory divided into memory partitions or modules

Each node contains a few processors and a portion of systemmemory, which is local to that node

Access to local memory faster than access to global memory (rest ofthe memory modules)

More scalable than UMA with fewer bus collisions

Most of processor’s memory request served by local memory

30 / 55

Page 16: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

NUMA

31 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

COMA

Basic physical interconnection similar to NUMA Main memory isviewed as a cache and called an attraction memory (AM)

Allows system to migrate data to node that most often accesses it atgranularity of a memory line (more efficient than a memory page)Reduces the number of cache misses serviced remotelyOverhead: duplicated data items and complex protocol to ensure allupdates are received at all processors

32 / 55

Page 17: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

COMA

33 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

NORMA

No-Remote Memory Access

Loosely coupled

Simple to build: no complex interconnection

No shared global memory

Each node maintains its own local memory

Communication through explicit messages

Some implement the illusion of shared global memory known asshared virtual memory (SVM)

Distributed systems

Involve message passing via remote procedure calls (RPCs)

Example: Google server farms

34 / 55

Page 18: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

NORMA

35 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Memory sharing

Memory coherence

Ensure that all the processors have access to the most up- to-datecopy of dataCache coherency

Strategies to increase percentage of local memory hits

Page migrationPage replication

36 / 55

Page 19: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Cache coherence

UMA cache coherenceThree different ways to implement

Bus snooping (a.k.a. cache snooping)Maintain a centralized directory to indicate when to remove stale dataOnly one processor can cache a data item

Cache coherent NUMA

Each address is associated with a home nodeContact home node on reads and writes to receive most updatedversion of dataThe protocol requires a maximum of three network traversals percoherency operation

37 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Page replication and migration

Page replication

System places a copy of a memory page at a new nodeGood for pages read by processes at different nodes, but not written

Page migration

System moves a memory page from one node to anotherGood for pages written to by a single remote process

Increases percentage of local memory hits

Drawbacks

Memory overheadReplicating and migrating more expensive than remotely referencingso only useful if referenced again

38 / 55

Page 20: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Scheduling

Determines the order and to which processors processes aredispatched

Two goals

Parallelism - timesharing schedulingProcessor affinity - space-partitioning scheduling

Soft affinityHard affinity

Types of scheduling algorithms

Job-blindJob-aware

39 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Job-blind multiprocessor scheduling

Straightforward extension of uniprocessor algorithms

Examples

First-in-first out (FIFO) multiprocessor schedulingRound-robin (RR) multiprocessor schedulingShortest process first (SPF) multiprocessor scheduling

40 / 55

Page 21: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Job-aware multiprocessor scheduling

Algorithms that maximize parallelism

Shortest-number-of-processes-first (SNPF) schedulingRound-robin job (RRjob) schedulingCoscheduling

Processes from the same job placed in adjacent spots in the globalrun queueSliding window moves down run queue running adjacent processesthat fit into window (size = number of processors)

Algorithms that maximize processor affinity

Dynamic partitioning

41 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Coscheduling example

42 / 55

Page 22: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Dynamic partitioning

43 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Process migration

Transferring a process from one node to another node

Benefits of process migration

Increases fault toleranceLoad balancingReduces communication costsResource sharing

44 / 55

Page 23: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Flow of process migration

Either sender or receiver initiates request

Sender suspends migrating process

Sender creates message queue for migrating process’s messages

Sender transmits state to a “dummy” process at the receiver

Sender and receiver notify other nodes of process’s new location

Sender deletes its instance of the process

45 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Flow of process migration

46 / 55

Page 24: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Process migration concepts

Trade-off between residual dependency and initial migration latency

Characteristics of a successful migration strategy

Minimal residual dependencyTransparentScalableFault tolerantHeterogeneous

47 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Process migration strategies

Eager migration

Transfer entire state during initial migration

Dirty eager migration

Only transfer dirty memory pagesClean pages brought in from secondary storage

Copy-on reference migration

Similar to dirty eager except clean pages can also be acquired fromsending node on demand

Lazy copying

No initial transfer of pages

48 / 55

Page 25: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Process migration strategies

Flushing migration

Flush all dirty pages to disk before migrationTransfer no memory pages during initial migration

Precopy migration

Begin transferring memory pages before suspending processMigrate once dirty pages reach a lower threshold

49 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Load balacing

System attempts to distribute the processing load evenly amongprocessors

Reduces variance in response timesEnsures no processors remain idle while other processors areoverloadedTwo types

Static load balancingDynamic load balancing

50 / 55

Page 26: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Static load balacing

Useful in environments in which jobs exhibit predictable patternsGoals

Distribute the loadReduce communication costs

Solve using a graphMust use approximations for large systems to balance loads withreasonable overhead

51 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Static load balacing using graphs

52 / 55

Page 27: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Dynamic load balancing

More useful than static load balancing in environments wherecommunication patterns change and processes are created orterminated unpredictably

Types of policies

Sender-initiatedReceiver-initiatedSymmetricRandom

Algorithms

Bidding algorithmDrafting algorithm

53 / 55

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Dynamic load balancing

Communication issues

Communication with remote nodes can have long delays; load on aprocessor might change in this timeCoordination between processors to balance load creates manymessagesCan overload the systemSolution

Only communicate with neighbor nodesLoad diffuses throughout the system

54 / 55

Page 28: SMD149 - Operating Systems - Multiprocessing · SMD149 - Operating Systems - Multiprocessing ... Exploit minor parallelism: Pipelining ... Straightforward extension of uniprocessor

IntroductionMultiprocessing architecture

Operating system organizationMemory access architectures

Scheduling and process migration

Summary

Next time: networks and distributed systems

55 / 55