
Page 1

JOEL CRICHLOW
DISTRIBUTED SYSTEMS: COMPUTING OVER NETWORKS, PHI

MANAGING DISTRIBUTED RESOURCES

Page 2

ISSUES

• Naming and Addressing
• Sharing
• Availability and Reliability
• Replication
• Privacy and Security

Page 3

NAMING AND ADDRESSING

• Identify
  • node/group/user
  • root-directory/sub-directory/filename
• Locate/Find
• Location Independence
• Mapping
• Name Servers

Page 4

NAME SERVERS

• Allocate the address translation responsibilities to a name server

• Users interact with the client machines using symbolic names

• The clients communicate with a name server, which does the name-to-address resolution

[Figure: the client sends a symbolic name to the name server (1), receives the resolved address (2), and then contacts the other server at that address (3).]
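To make the three-step flow concrete, here is a minimal Python sketch of the resolution step; the table contents and the resolve function are illustrative assumptions, not part of the slides.

# Minimal name-server sketch: the client presents a symbolic name (1),
# the name server returns the address (2), and the client then
# contacts the target server at that address (3).
NAME_TABLE = {
    "fileserver": "192.0.2.10",   # illustrative entries only
    "printer":    "192.0.2.20",
}

def resolve(symbolic_name):
    if symbolic_name not in NAME_TABLE:
        raise LookupError("unknown name: " + symbolic_name)
    return NAME_TABLE[symbolic_name]

address = resolve("fileserver")            # steps 1-2
print("contacting server at", address)     # step 3 would use this address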

Page 5

NAME SERVERS

• The name server may be designed to answer requests for the name of a resource/service given its address

• Performance
  • Table entries for critical resources may be held in nonvolatile primary storage
  • Caching at the server, caching at clients
• Cooperating Name Servers
  • Replication, Partitioning

Page 6

DOMAIN NAME SYSTEM

• Distributed Name Service
• Multi-level set of domains
• Partitioning
• Replication
• Caching
• IPv4 (32 bits), IPv6 (128 bits)

Page 7

DNS: IPV4 ADDRESS FORMATS

(32 bits = four 8-bit fields)

Class A: 0     | Network (7 bits)  | Host (24 bits)
Class B: 10    | Network (14 bits) | Host (16 bits)
Class C: 110   | Network (21 bits) | Host (8 bits)
Class D: 1110  | Multicast address (28 bits)
Class E: 11110 | Reserved for future use
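As a quick check of how the leading bits select the class, a small Python sketch (the function name and sample addresses are illustrative):

def ipv4_class(address):
    first_octet = int(address.split(".")[0])
    if first_octet < 128:    # leading bit  0     -> Class A
        return "A"
    if first_octet < 192:    # leading bits 10    -> Class B
        return "B"
    if first_octet < 224:    # leading bits 110   -> Class C
        return "C"
    if first_octet < 240:    # leading bits 1110  -> Class D (multicast)
        return "D"
    return "E"               # leading bits 11110 -> Class E (reserved)

assert ipv4_class("10.1.2.3") == "A"
assert ipv4_class("172.16.0.1") == "B"
assert ipv4_class("224.0.0.1") == "D"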

Page 8

DNS

• A slow but steady transition to IPv6 is taking place

• IPv6 is not interoperable with IPv4, so a transition technology is needed

• Tunneling places IPv6 packets within IPv4 packets

• The Dual-stack implementation allows both protocols to run in the same network

[Figure: two IPv6 networks exchanging packets across an intervening IPv4 network through a tunnel.]

Page 9

DIRECTORY SERVICE

• The name service can be incorporated into a more comprehensive directory service, which allows not only the locating of services and resources but also the supplying of information on people

• X.500, defined by CCITT and ISO, is a good early example of such a directory service

• Several other directory services exist
• A notable example, based on X.500, is LDAP, the Lightweight Directory Access Protocol, which runs over the TCP/IP stack

Page 10

SHARING

• Access Control
• Scheduling
• Allocation
• Sharing Primary Memory

Page 11

SHARING

Access Control List (ACL): a per-resource list

ACL for Resource0:
  Staff    RE
  System   RWE
  Student  R
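A per-resource ACL check might look like the following Python sketch; the allowed function and the set-of-rights encoding are assumptions for illustration:

# ACL mirroring the Resource0 table above: rights are held per user class.
ACL = {
    "Resource0": {
        "Staff":   {"R", "E"},
        "System":  {"R", "W", "E"},
        "Student": {"R"},
    },
}

def allowed(user_class, resource, right):
    return right in ACL.get(resource, {}).get(user_class, set())

assert allowed("System", "Resource0", "W")
assert not allowed("Student", "Resource0", "W")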

Page 12

SHARING

Capability List (CL): a per-user list

             System Class CL       Resource0 CL
Resource0    capability with RWE   capability with RWE
Resource1    capability with RE    capability with RE
Resource2    capability with E     capability with E

Page 13

SHARING

• Scheduling
  • Pool of identical resources
  • Only one resource
• Allocation
  • Local vs remote resources
  • Mutually exclusive access
  • Indefinite postponement
• Hardware
• Software
• Consistency

Page 14

SHARING PRIMARY MEMORY

• Distributed Shared Memory
• Shareable Unit
  • Physical block
  • Logical block
• Synchronization
• Consistency

Page 15

DISTRIBUTED SHARED MEMORY

Sequential vs Release Consistency

Process 1:
  begin
    a = 0
    b = 0
    a = a + 1
    b = b + 1
  end

Process 2:
  begin
    acquire-lock(CS)
    a = a + 1
    b = b + 1
    release-lock(CS)
  end
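The contrast can be sketched with Python threads: the release-consistent version brackets the updates in a lock, so no other thread observes a and b mid-update. The names cs and update are illustrative.

import threading

a, b = 0, 0
cs = threading.Lock()            # the critical-section lock "CS"

def update():
    global a, b
    with cs:                     # acquire-lock(CS)
        a = a + 1
        b = b + 1                # release-lock(CS) happens on block exit

threads = [threading.Thread(target=update) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert a == b == 4               # the pair stayed consistent under concurrency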

Page 16

PAGED DSM

[Figure: each entry in a process's PMT carries L/NL, S/NS, an SSA or PM link, and a page frame number; shared-page IDs point through the DSM Page Manager's page table into the shared paged global memory.]

Process PMT entries indicate loaded/not-loaded (L/NL), shared/not-shared (S/NS), if not-shared the secondary storage address (SSA), if shared the link to the DSM Page Manager (PM), and if loaded the page frame number in the local memory
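Read as a record, a PMT entry could be modeled as below; the dataclass and its field names are an illustrative reading of the slide's abbreviations, not code from the book.

from dataclasses import dataclass
from typing import Optional

@dataclass
class PMTEntry:
    loaded: bool                        # L/NL
    shared: bool                        # S/NS
    ssa: Optional[int] = None           # secondary storage address, if not shared
    page_manager: Optional[str] = None  # link to the DSM Page Manager, if shared
    page_frame: Optional[int] = None    # local page frame number, if loaded

# A shared, loaded page: found through the Page Manager, resident in frame 7.
entry = PMTEntry(loaded=True, shared=True, page_manager="PM-0", page_frame=7)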

Page 17

LOGICAL DSM

• Linda
  • The ‘tuple-space’ model of parallel programming
  • It consists of two types of logical tuples: process tuples and data tuples
  • Process tuples are active and can execute; data tuples are passive
  • Process tuples can execute simultaneously
  • When a process tuple is finished executing, it turns into a data tuple
  • There are four basic primitives in Linda: out, in, rd, eval (see the sketch after this list)
• Orca
  • An object-based, language-based DSM
• Component Technologies and Java
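A toy tuple space conveys the flavour of the data-tuple primitives; this Python sketch implements out, rd and in only (eval, which spawns process tuples, is omitted) and treats None as a wildcard, an illustrative convention rather than Linda syntax.

space = []                            # the shared tuple space

def out(*fields):                     # out: deposit a data tuple
    space.append(tuple(fields))

def _match(t, pattern):
    return len(t) == len(pattern) and all(
        p is None or p == f for f, p in zip(t, pattern))

def rd(*pattern):                     # rd: read a matching tuple, leave it
    return next(t for t in space if _match(t, pattern))

def in_(*pattern):                    # in: read and remove ('in' is a Python keyword)
    t = rd(*pattern)
    space.remove(t)
    return t

out("count", 42)
print(rd("count", None))              # ('count', 42) -- the tuple stays
print(in_("count", None))             # ('count', 42) -- the tuple is removed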

Page 18

AVAILABILITY AND RELIABILITY

• Performance
• Service Outcomes
• How Reachable
  • LAN
  • WAN

Page 19

AVAILABILITY AND RELIABILITY

WAN
• The number of possible routes through the network between user and resource
• The channel capacity through the various communication links
• The communication protocols employed

Page 20

AVAILABILITY AND RELIABILITY

Processor and Memory Upgrades
• Faster Processor
• More Memory
• Caches
• Secondary Memory

Page 21

CACHING

• Locality principle
• Cache consistency
• Cacheable and non-cacheable data
• Write-through
• Copy-back
• Write-invalidate
• Write-update
• Snoopy cache
• Directory scheme

Page 22

AVAILABILITY AND RELIABILITY

Software Design

[Figure: multiple clients feeding requests into a queue served by the SERVER.]

Page 23

AVAILABILITY AND RELIABILITY

Databases
• Partitioning
• Replication
• Replicated Dictionary
• Queries and Sub-queries

Make a reservation for Dorothy Swift on a red sports car to be picked up in New York on (date and time given), a small hatch back to be picked up by Jill Plain in Los Angeles on (date and time given) and a station wagon for Jack Baggage in London on (date and time given).

Page 24

AVAILABILITY AND RELIABILITY

• Find the relevant relations (or objects) quickly
  • A replicated dictionary is required
• Once the relations (objects) are located, a decision must be made quickly on what should be shipped
  • The request can be split into three queries
  • The sub-query processing is then done at the three sites
  • Alternatively, the pertinent records or pages could be shipped from the remote sites for processing at the initiator

Page 25

AVAILABILITY AND RELIABILITY: MEMCACHED

• Distributed databases also form the backbone for many dynamic web-based applications
• A key approach to improving availability in such systems is to cache recently referenced data in memory
• A noteworthy software tool that provides such a caching service is Memcached (memcached.org)
  • It is a free and open-source distributed caching service that uses memory as a cache for data objects that are normally stored in the back-end database
  • It uses a key-value store with a hash table that can be distributed across many computers, a pool of servers
  • When the table is full, new arrivals are accommodated by removing old data in LRU (Least Recently Used) order
  • The client uses the input key and a hashing algorithm to locate a server from among that client’s list of Memcached servers (see the sketch below)
  • That server then uses the key to store the key-value pair in its internal hash table
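The client-side server selection can be sketched in a few lines; real Memcached clients typically use consistent hashing, so the simple modulo scheme and the server names below are illustrative only.

import hashlib

SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]

def server_for(key):
    digest = hashlib.md5(key.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]   # same key, same server

print(server_for("user:1001"))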

Page 26

REPLICATION

Maintaining copies of resources at separate nodes in the network can:
• Improve the pattern of communication traffic
• Help load sharing
• Reduce response times
• Offer an alternative when a resource becomes unavailable

Page 27

REPLICATION

• How many copies
• Replicas as members of a group
• Membership Service
  • CreateGroup
  • JoinGroup
  • LeaveGroup
  • A member may leave the group voluntarily or through failure

Page 28

GROUP CHANGES

• A new member joining or a member leaving changes the composition of the group
  • If the membership changes during the multicast of a message, how should the outcome of the multicast be classified?
  • What if the replicas were participating in transaction processing and a member failed, what happens next? What happens if the failed member was, in fact, the coordinator of the activity?
• It is necessary that group changes be known and that a group change be synchronized with other pertinent group activity
• In the ISIS toolkit group changes are handled by the maintenance of group views
  • A view captures the current membership list and bears a unique identifier (i.e. a sequence number)
  • A group view i+1 differs from its immediate predecessor i either by the addition of a new member or the departure of a member, voluntarily or through failure
• Group activity, e.g. message passing, can then be associated with a particular view
  • If the view changes before an activity is complete, a decision can be made with respect to the outcome of that activity
  • Coordinating view changes is primarily a message delivery issue

Page 29

RELIABILITY OF MESSAGE DELIVERY

• Unreliable multicast
  • Deliver the message to all the members of the group without acknowledgement
• Reliable multicast
  • Ensure that some (if not all) members of the group receive the message
• Atomic multicast
  • All operational members in the group receive the message or none of them do

Page 30

RELIABILITY OF MESSAGE DELIVERY

• Any member of the group can fail during the multicast
• What if the originator fails?
  • The originator must be monitored to detect the failure
  • An effective way to do this is to require that the originator multicast an “I’m alive” message periodically
  • On detecting the failure, another member must assume the role of originator and attempt to complete the multicast
• An election algorithm is invoked when a member of the group concludes that the leader has failed
• The Bully Algorithm can be used (see the sketch below)
  • The biggest bloke on the block always wins
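A compact sketch of the Bully election in Python; the alive function stands in for a real failure detector, and the member ids are illustrative.

MEMBERS = [1, 2, 3, 4, 5]            # higher id = "bigger bloke"

def alive(member):
    return member != 5               # pretend the current leader, 5, has failed

def elect(starter):
    # Ask every bigger live member to take over; if none answers, starter wins.
    higher = [m for m in MEMBERS if m > starter and alive(m)]
    if not higher:
        return starter
    return elect(max(higher))

print(elect(2))                      # -> 4, the largest surviving member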

Page 31

MESSAGE ORDERING

• Unordered
• Totally ordered
  • Centralized sequencing
  • Distributed sequencing, ISIS
  • Clocks
• Causally ordered
  • Vector timestamps
• Sync ordered

Page 32

TOTALLY ORDERED: CENTRALIZED

• A single member, the sequencer, is responsible for allocating the sequence number to a message

• Before a message can be delivered a sequence number must be obtained from the sequencer

• Lower numbered messages are processed before higher numbered ones

• Members keep a record of the next sequence number expected (or the last one received) so that should an out-of-order one arrive, it can be held back until the correct one in the sequence arrives

Page 33

TOTALLY ORDERED: CENTRALIZED/DISTRIBUTED

Incoming messages are held on a hold-back queue, where the final ordering is established before a message is moved to a stable queue for processing (see the sketch below)
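The hold-back mechanism can be sketched with a priority queue: a message moves to the stable queue only when its sequence number is the next one expected. Names here are illustrative.

import heapq

holdback, stable = [], []
next_expected = 1

def receive(seq, msg):
    global next_expected
    heapq.heappush(holdback, (seq, msg))
    # Release messages in sequence order as soon as the gap is filled.
    while holdback and holdback[0][0] == next_expected:
        stable.append(heapq.heappop(holdback)[1])
        next_expected += 1

for seq, msg in [(2, "b"), (1, "a"), (4, "d"), (3, "c")]:
    receive(seq, msg)
print(stable)                        # ['a', 'b', 'c', 'd'] despite arrival order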

Page 34

TOTALLY ORDERED: DISTRIBUTED

• In ISIS each member records Fmax, the largest final number agreed, and Pmax, its own largest proposed number

• On receiving a message with a proposed number, each receiving member i responds to the initiator with its own proposed number computed as

  Max(Fmax, Pmax) + 1 + i/n where n is the number of members

• Each member will place the message with its own proposed number on its hold-back queue

• The initiator collects all the proposals from which it selects the largest number

• All members are then notified of this final number
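Numerically, the proposal rule works out as in this sketch; the propose function is an illustration, and the i/n term simply makes ties between members impossible.

def propose(fmax, pmax, i, n):
    return max(fmax, pmax) + 1 + i / n

n = 3
proposals = [propose(fmax=4, pmax=5, i=i, n=n) for i in range(1, n + 1)]
print(proposals)                 # [6.33..., 6.66..., 7.0]
final = max(proposals)           # the initiator multicasts 7.0 as the final number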

Page 35

CAUSALLY ORDERED

• Happened-before ordering
• Vector timestamps
  • Let replicas exist in different versions and ensure that these versions can be causally ordered
  • The version number is expressed as a vector with an entry for the number of messages received from each member
• For example, the timestamp (2, 4, 3, 1, 3) would indicate that there are 5 members in the group and the member holding this vector timestamp has received 2 messages from member 1, 4 from member 2, and so on

Page 36

CAUSALLY ORDERED

• Two versions Vi and Vj are causally ordered if and only if each entry in Vi’s vector timestamp is less than or equal to the corresponding entry in Vj’s vector timestamp

• The multicast message must carry a vector timestamp provided by the initiating member

• It is generated by the initiator incrementing by 1 its own entry in the vector timestamp of the replica that it owns

• For example, if the replica owned by member 3 has the vector timestamp (2, 4, 3, 1, 3) and it initiates a multicast relating to this replica then it would update its timestamp to (2, 4, 4, 1, 3)

• Receiving members can use the arriving timestamp to establish causal order
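The causal-order test is a one-liner over the two vectors; this sketch reuses the slide's own example timestamps.

def causally_ordered(vi, vj):
    # Vi precedes Vj iff every entry of Vi is <= the matching entry of Vj.
    return all(a <= b for a, b in zip(vi, vj))

old = (2, 4, 3, 1, 3)
new = (2, 4, 4, 1, 3)            # member 3 incremented its own entry
assert causally_ordered(old, new)
assert not causally_ordered(new, old)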

Page 37

SYNC-ORDERED

• Put everything “in-sync”
• A sync-ordered message m divides the messages received at all members into two mutually exclusive sets: a set of messages received before m and a set of messages received after m
• Recall group view changes

Page 38

PRIVACY AND SECURITY

• Protection
• Cryptography
• Secret Key Cryptography
• Public Key Cryptography
• Digital Signatures
• Kerberos and others

Page 39

CRYPTOGRAPHY

Block diagram of cryptographic message transfer from A to B

Principal A: key + plaintext → encryption algorithm → ciphertext
Principal B: key + ciphertext → decryption algorithm → plaintext
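In Python, the whole diagram collapses to a few calls with a symmetric cipher; this uses the third-party cryptography package's Fernet recipe as an illustrative stand-in for the encryption and decryption algorithms, not the book's own scheme.

from cryptography.fernet import Fernet   # pip install cryptography

key = Fernet.generate_key()                    # the shared secret key
ciphertext = Fernet(key).encrypt(b"hello B")   # at principal A
plaintext = Fernet(key).decrypt(ciphertext)    # at principal B
assert plaintext == b"hello B"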

Page 40

SECRET KEY DISTRIBUTION

Secret key authentication using a protocol derived from Needham-Schroeder

1. A → S: A, B, NA
2. S → A: KA{NA, B, KAB, KB{KAB, A, t}}
3. A → B: KB{KAB, A, t}, KAB{NA'}
4. B → A: KAB{NA' - 1}, NB
5. A → B: KAB{NB - 1}

Page 41

PUBLIC KEY AUTHENTICATION

Public key authentication protocol from Needham-Schroeder-Lowe

1. A → S: A, B
2. S → A: DKS{B, EKB, t}
3. A → B: EKB{A, NA}
4. B → S: B, A
5. S → B: DKS{A, EKA, t}
6. B → A: EKA{NA, NB, B}
7. A → B: EKB{NB}

Page 42

DIGITAL SIGNATURES

• Verification of an electronic document
• Public key cryptography provides a simple mechanism for digital signatures
• Principal A can send a signed message M to principal B with two levels of encryption as follows: EKB{DKA{M}}

Page 43

DIGITAL SIGNATURES

• The Message Digest
  • A message digest function MD transforms a variable-length message M into a fixed-length bit string MD(M), called the message digest, such that:
    • no two messages will have the same message digest
    • given M, it is easy to compute MD(M)
    • given MD(M), it is effectively impossible to generate M
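Using SHA-256 as an example MD function, the fixed-length property can be seen directly; the sample messages are illustrative.

import hashlib

def md(message):
    return hashlib.sha256(message).hexdigest()   # easy to compute

print(md(b"pay Bob $10"))     # fixed-length digest of a variable-length M
print(md(b"pay Bob $1000"))   # any change yields a completely different digest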

Page 44

PRIVACY AND SECURITY

• Kerberos
  • Key Distribution Centre (KDC)
  • Authentication Server (AS)
  • Ticket Granting Server (TGS)
• PGP
• PEM
• SSL