
Distributed Shared Memory Systems and Programming

By: Kenzie MacNeil

Adapted from Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers by Barry Wilkinson and Michael Allen

Distributed Shared Memory Systems

• Shared memory programming model on a cluster

• Has physically distributed and separate memory

• Programming Viewpoint:
– Memory is grouped together and sharable between processes

• Known as Distributed Shared Memory (DSM)

Distributed Shared Memory Systems

• Can be achieved by software or hardware

• Software:
– Easy to use on clusters
– Inferior to using explicit message passing on the same cluster

• Utilizes the same techniques as true shared memory systems (Chapter 8)

Distributed Shared Memory

• Shared memory programming is generally more convenient than message passing

• Data can be accessed by individual processors without explicitly sending data

• Shared data has to be controlled
– Locks or other means

• Both message passing and shared memory often require synchronization

Distributed Shared Memory

• Distributed Shared Memory is a group of interconnected computers appearing to have a single memory with a single address space

• Each computer has its own memory, which is physically distributed

• Any memory location can be accessed by any processor in the cluster
– Regardless of whether the memory resides locally

Distributed Shared Memory

Advantages of DSM

• Normal shared memory programming techniques can be used

• Easily scalable, compared to traditional bus-connected shared memory multiprocessors

• Message passing is hidden from the user

• Can handle complex and large databases without replication or sending the data to processes

Disadvantages of DSM

• Lower performance than true shared memory multiprocessor systems

• Must provide for protection against simultaneous access to shared data
– Locks, etc.

• Little programmer control over actual messages being generated

• Incur performance penalties when compared to message passing routines on a cluster

Hardware DSM Systems

• Special network interfaces and cache coherence circuits are required

• Several interfaces that support shared memory operations

• Higher level of performance

• More expensive

Software DSM Systems

• Requires no hardware changes

• Performed by software routines

• Software layer added between the operating system and the applications
– Kernel may or may not be modified

• Software layer can be:
– Page based
– Shared variable based
– Object based

Page Based DSM

• Existing virtual memory is used to instigate movement of data between computers

• Occurs when page referenced does not reside locally

• Referred to as a virtual shared memory system

• Page based systems include:
– The first DSM system by Li (1986), TreadMarks (1996), Locust (1998)

Page Based DSM System

Page Based DSM Disadvantages

• Size of the unit of the data, a page, can be too big

• More than the specific data is usually referenced
– Leads to longer messages

• Not portable, because they are tied to a particular virtual memory hardware and software

• False sharing effects appear at the page level
– A situation in which different parts of a page are required by different processors without any actual sharing of information, yet the whole page must be shared by each process to access its different parts (see the sketch below)
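To make the false-sharing point concrete, here is a small, purely illustrative C program (not from the original slides; the 4 KB page size is an assumption) that checks whether two independent counters land on the same virtual-memory page. If they do, a page-based DSM system would move the whole page between processors even though the two processes never share any actual data.

/* Illustrative check for potential false sharing on a page-based DSM system. */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096                  /* assumed page size for the example */

int counter_for_proc0;                  /* intended to be updated only by processor 0 */
int counter_for_proc1;                  /* intended to be updated only by processor 1 */

int main(void)
{
    uintptr_t page0 = (uintptr_t)&counter_for_proc0 / PAGE_SIZE;
    uintptr_t page1 = (uintptr_t)&counter_for_proc1 / PAGE_SIZE;

    printf("same page: %s\n",
           page0 == page1 ? "yes (false sharing possible)" : "no");
    return 0;
}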

Shared Variable DSM

• Only variables declared as shared are transferred

• Transferred on demand
– Paging mechanism is not used

• Software routines perform the actions

• Shared variable DSM approaches include:
– Munin (1990), JIAJIA (1999), Adsmith (1996)

Object Based DSM

• Shared data is embodied in objects
– Includes data items and procedures/methods
– Methods used to access data

• Similar to shared variable approach, even considered an extension

• Easily implemented in OO languages

Managing Shared Data

• Many ways a processor can be given access to shared data

• Simplest is the use of a central server
– Responsible for all read/write operations on shared data
– Requests are sent to this server
– Requests are handled sequentially on the server
– Implements a single reader/single writer policy (a rough MPI-based sketch follows)
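As a rough illustration of the central-server idea, the MPI-based fragment below runs on one rank and serves read/write requests for the shared variables one at a time, which is exactly the single reader/single writer behaviour described above. The tags, the {index, value} message layout, and the int-only data are assumptions made for this sketch, not details from the slides.

/* Sketch of a central DSM server running on one MPI rank (e.g. rank 0). */
#include <mpi.h>

#define NVARS     64
#define TAG_READ   1
#define TAG_WRITE  2

static int shared_vars[NVARS];          /* the shared data, held only by the server */

void dsm_server(void)                   /* serve requests forever, one at a time */
{
    int req[2];                         /* req[0] = variable index, req[1] = value */
    MPI_Status st;

    for (;;) {
        MPI_Recv(req, 2, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_READ) {   /* read request: reply with the current value */
            MPI_Send(&shared_vars[req[0]], 1, MPI_INT, st.MPI_SOURCE,
                     TAG_READ, MPI_COMM_WORLD);
        } else {                        /* write request: update, then acknowledge */
            shared_vars[req[0]] = req[1];
            MPI_Send(&shared_vars[req[0]], 1, MPI_INT, st.MPI_SOURCE,
                     TAG_WRITE, MPI_COMM_WORLD);
        }
    }
}

Matching client-side routines are sketched later, with the Basic Shared-Variable Implementation slides.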

Managing Shared Data

• Single reader/writer policy incurs a bottleneck

• Additional servers can be added to relieve this bottleneck by dividing the shared variables among them

• However, multiple copies of data are preferable
– Allows simultaneous access to the data by different processors
– A coherence policy must be used to maintain these copies

Multiple Reader / Single Writer

• Allows multiple processors to read shared data
– Which can be achieved by replicating data

• Allows only one processor, the owner, to alter data at any instant

• When an owner alters data, two policies are available:
– Update policy
– Invalidate policy

Multiple Reader/Single Writer Policy

• Update policy
– Utilizes broadcast
– All copies are altered to reflect the broadcast message

• Invalidate policy
– All unaltered copies of the data are flagged as invalid
– Requires a processor to request the data from the owner when it needs it
– Any copies of the data that are not accessed remain invalid

• Both policies must be implemented reliably (the invalidate policy is sketched below)
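A minimal, single-process simulation of the invalidate policy is sketched below; the data structures and routine names are assumptions made purely to illustrate the idea. The owner's write marks every other copy invalid, and a later read by another processor fetches a fresh copy from the owner.

/* Toy simulation of multiple reader / single writer with an invalidate policy. */
#include <stdio.h>
#include <stdbool.h>

#define NPROCS 4

struct shared_var {
    int  value[NPROCS];                 /* per-processor copy of the variable */
    bool valid[NPROCS];                 /* is the local copy up to date? */
    int  owner;                         /* only the owner may write */
};

void write_var(struct shared_var *v, int proc, int newval)
{
    if (proc != v->owner)
        return;                         /* single writer: only the owner writes */
    v->value[proc] = newval;
    for (int p = 0; p < NPROCS; p++)    /* invalidate every other copy */
        v->valid[p] = (p == proc);
}

int read_var(struct shared_var *v, int proc)
{
    if (!v->valid[proc]) {              /* local copy invalid: request it from the owner */
        v->value[proc] = v->value[v->owner];
        v->valid[proc] = true;
    }
    return v->value[proc];              /* multiple readers may hold valid copies */
}

int main(void)
{
    struct shared_var x = { .owner = 0 };
    write_var(&x, 0, 42);               /* owner writes; all other copies become invalid */
    printf("processor 2 reads %d\n", read_var(&x, 2));
    return 0;
}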

Multiple Reader/Single Writer Policy

• Page based approach

• The complete page, which holds the variable, is transferred

• A variable stored on the page which is not being shared will also be moved or invalidated

• Protocols are offered by systems such as TreadMarks that allow more than one process to write to a single page

Achieving Consistent Memory in DSM

• Memory consistency addresses when the current value of a shared variable is seen by other processors

• Various models are available:
– Strict Consistency
– Sequential Consistency
– Relaxed Consistency
– Weak Consistency
– Release Consistency
– Lazy Release Consistency

Strict Consistency

• Variable is obtained from the most recent write to the shared variable

• As soon as a variable is altered, all other processors are informed
– Can be done by update or invalidation

• Disadvantages are the large number of messages and that changes are not instantaneous

• With relaxed memory consistency, writes are delayed to reduce message passing

Strict Consistency

Sequential and Weak Consistency

• Sequential consistency: the result of any execution is the same as some interleaving of the individual programs

• Weak consistency: synchronization operations are used by the programmer to enforce sequential consistency

• Any accesses to shared data can be controlled with synchronization operations
– Locks, etc.

Release Consistency

• Extension of weak consistency

• Specifies two synchronization operations:
– Acquire operation, used before a shared variable or variables are to be read
– Release operation, used after the shared variable or variables have been altered

• Acquire is performed with a lock operation

• Release is performed with an unlock operation
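In code, the acquire/release pattern typically looks like the fragment below. The names dsm_lock and dsm_unlock are illustrative placeholders for whatever lock/unlock routines a particular DSM library provides; they are not calls defined in the slides.

/* Release-consistency usage pattern with hypothetical DSM lock routines. */
extern void dsm_lock(int lock_id);      /* acquire: bring shared data up to date */
extern void dsm_unlock(int lock_id);    /* release: make local writes visible to others */

extern int shared_sum;                  /* assume this was declared shared elsewhere */

void add_to_sum(int local_value)
{
    dsm_lock(1);                        /* acquire before reading/writing the shared data */
    shared_sum += local_value;
    dsm_unlock(1);                      /* release after the shared data has been altered */
}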

Release Consistency

Lazy Release Consistency

• Version of release consistency

• Update is only done at the time of acquire rather than at release

• Generates fewer messages than release consistency

Lazy Release Consistency

Distributed Shared Memory Programming Primitives

• Four fundamental and necessary operations of shared memory programming:
– Process/thread creation and termination
– Shared data creation
– Mutual exclusion synchronization (controlled access to shared data)
– Process/thread and event synchronization

• Typically provided by user-level library calls

Process Creation

• A set of routines is defined by DSM systems
– Such as Adsmith and TreadMarks

• Used to start a new process if process creation is supported
– dsm_spawn(filename, num_processes);
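A possible use of such a routine is sketched below; dsm_spawn is written as on the slide, and everything else is an illustrative assumption.

/* Sketch: a master program starting worker processes with dsm_spawn(). */
extern void dsm_spawn(const char *filename, int num_processes);

int main(void)
{
    dsm_spawn("worker", 4);             /* start 4 instances of the program "worker" */
    /* ... the master continues; the workers now share the DSM address space ... */
    return 0;
}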

Shared Data Creation

• A routine is necessary to declare shared data
– dsm_shared(&x); or shared int x;
– Dynamically creates memory space for shared data in the manner of a C malloc

• Afterwards, the memory space can be discarded
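For example (an illustrative sketch only; the exact prototype of a dsm_shared-style routine varies between systems):

/* Sketch: declaring shared data with a dsm_shared()-style routine, which
 * creates space in the shared address space much as malloc() creates heap space. */
extern void dsm_shared(void *ptr);      /* register/allocate the variable as shared */

int main(void)
{
    int x;                              /* ordinary declaration */
    dsm_shared(&x);                     /* x is now part of the shared data */
    x = 0;                              /* visible to other processes per the consistency model */
    /* ... use x; the shared space can be discarded when it is no longer needed ... */
    return 0;
}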

Shared Data Access

• Various forms of data access are provided depending on the memory consistency used

• Some systems provide efficient routines for different classes of accesses

• Adsmith provides three types of accesses:
– Ordinary Access
– Synchronization Access
– Non-Synchronization Access

Synchronization Accesses

• Two principal forms:
– Global synchronization and process-process pair synchronization

• Global synchronization is usually done through barrier routines

• Process-process pair synchronization can be done by the same routine or by separate routines using simple synchronous send/receive operations

• DSM systems could also provide their own routines
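As a small illustration of global synchronization, the fragment below uses a hypothetical dsm_barrier() routine (an assumed name, standing in for whatever barrier call a given DSM system provides): no process passes the barrier until all have reached it, so phase 2 can safely read what every process wrote in phase 1.

/* Sketch of barrier-based global synchronization in a DSM program. */
extern void dsm_barrier(void);          /* hypothetical global barrier routine */

extern int shared_data[100];            /* assume declared shared elsewhere */

void phase_example(int my_rank)
{
    /* phase 1: each process fills its own element of the shared array */
    shared_data[my_rank] = my_rank * my_rank;

    dsm_barrier();                      /* global synchronization point */

    /* phase 2: now safe to read what the other processes wrote in phase 1 */
    int total = 0;
    for (int i = 0; i < 100; i++)
        total += shared_data[i];
    (void)total;                        /* use the result as needed */
}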

Overlapping Computations with Communications

• Can be provided by starting a nonblocking communication before its results are needed
– Called a prefetch routine

• Program continues execution after the prefetch has been called and while the data is being fetched

• Could even be done speculatively

• A special mechanism must be in place to handle memory exceptions

• Similar to the speculative load mechanism used in advanced processors, which overlaps memory operations with program execution
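A prefetch might be used roughly as below; dsm_prefetch is a hypothetical routine name used only for this sketch.

/* Sketch: overlapping communication with computation via a nonblocking prefetch. */
extern void dsm_prefetch(void *addr, int nbytes);    /* start fetching shared data, do not wait */

extern double remote_block[1000];                    /* assume declared shared elsewhere */

double overlap_example(void)
{
    dsm_prefetch(remote_block, sizeof remote_block); /* start the transfer early */

    double local = 0.0;
    for (int i = 0; i < 1000000; i++)                /* unrelated local work proceeds */
        local += (double)i * 0.5;                    /* ...while the data is in flight */

    return local + remote_block[0];                  /* data should have arrived by now */
}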

Distributed Shared Memory Programming

• DSM programming on a cluster uses the same concepts as shared memory programming on a shared memory multiprocessor system

• Uses user-level library routines or methods

• Message passing is hidden from the user

Basic Shared-Variable Implementation

• The simplest DSM implementation is to use a shared variable approach with user-level DSM library routines
– Sitting on top of an existing message passing system, such as MPI
– Routines can be embodied into classes and methods

• The routines could send messages to a central location that is responsible for the shared variables
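A possible shape for such user-level routines, layered directly on MPI, is sketched below. The server rank, tags, and {index, value} message layout are assumptions chosen to match the centralized-server sketch given earlier in these notes.

/* Sketch of client-side shared-variable access routines built on MPI. */
#include <mpi.h>

#define SERVER     0                    /* rank holding the shared variables */
#define TAG_READ   1
#define TAG_WRITE  2

int dsm_read(int var_index)             /* fetch the current value of a shared variable */
{
    int req[2] = { var_index, 0 }, value;
    MPI_Send(req, 2, MPI_INT, SERVER, TAG_READ, MPI_COMM_WORLD);
    MPI_Recv(&value, 1, MPI_INT, SERVER, TAG_READ, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    return value;
}

void dsm_write(int var_index, int value) /* update a shared variable */
{
    int req[2] = { var_index, value }, ack;
    MPI_Send(req, 2, MPI_INT, SERVER, TAG_WRITE, MPI_COMM_WORLD);
    MPI_Recv(&ack, 1, MPI_INT, SERVER, TAG_WRITE, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);        /* wait for the server's acknowledgement */
}

Because every access becomes a round trip to the server, the message passing is hidden from the programmer but not eliminated, which is the source of the bottleneck discussed next.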

Simple DSM System using a Centralized Server

Single reader/writer protocol

Basic Shared-Variable Implementation

• A simple DSM system using a centralized server can easily result in a bottleneck

• One method to reduce this bottleneck is to have multiple servers running on different processors

• Each server responsible for specific shared variables

• This is a single reader / single writer protocol
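One simple way to assign variables to servers (an illustrative choice, not a scheme prescribed by the slides) is a modulo mapping, so that requests for a given variable always go to the same server:

/* Sketch: mapping each shared variable to one of several DSM servers. */
#define NUM_SERVERS 4

int server_for_variable(int var_index)
{
    return var_index % NUM_SERVERS;     /* all requests for this variable go to this server */
}

The dsm_read/dsm_write routines sketched earlier would then send to server_for_variable(var_index) instead of a single fixed server rank.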

Simple DSM System using Multiple Servers

Basic Shared-Variable Implementation

• Can also provide multiple reader capability

• A specific server is responsible for the shared variable

• Other local copies are invalidated

Simple DSM System using Multiple Servers and Multiple Reader Policy

Overlapping Data Groups

• Existing interconnection structure

• Access patterns of the application

• Static overlapping
– Defined by the programmer prior to execution

• Shared variables can migrate according to usage

Symmetrical Multiprocessor System with Overlapping Data Regions

Simple DSM System using Multiple Servers and Multiple Reader Policy

Questions or Comments?
