distributed systems - konkukdms.konkuk.ac.kr/wordpress/wp-content/uploads/2018/03/01... ·...
TRANSCRIPT
Distributed Systems
(3rd Edition)
Chapter 1: Introduction
Version: February 25, 2017
Distributed System
Introduction: What is a distributed system?
Definition
A distributed system is a collection of autonomous computing elements that
appears to its users as a single coherent system.
2/56
Characteristic features
1. Autonomous computing elements, also referred to as nodes, be they
hardware devices or software processes.
2. Single coherent system: users or applications perceive a single system ⇒nodes need to collaborate.
Characteristic 1: Collection of autonomous computing elements
Distributed System
Introduction: What is a distributed system?
Title
Each node is autonomous and will thus have its own notion of time: there is no
global clock. Leads to fundamental synchronization and coordination problems.
3/56
Collection of nodes
1. How to manage group membership?
2. How to know that you are indeed communicating with an authorized
(non)member?
Characteristic 1: Collection of autonomous computing elements
Organization
Introduction: What is a distributed system?
Overlay network
Each node in the collection communicates only with other nodes in the system, its
neighbors. The set of neighbors may be dynamic, or may even be known only
implicitly (i.e., requires a look up).
4/56
Overlay networkWell-known example of overlay networks: peer-to-peer systems.
Structured: each node has a well-defined set of neighbors with whom it can
communicate (tree, ring).
Unstructured: each node has references to randomly selected other nodes from
the system.
Examples1. An end user cannot tell where a computation is taking place
2. Where data is exactly stored should be irrelevant to an application
3. If or not data has been replicated is completely hidden
Keyword is distribution transparency
Essence
The collection of nodes as a whole operates the same, no matter where, when,
and how interaction between a user and the system takes place.
Characteristic 2: Single coherent system
Coherent system
Introduction: What is a distributed system?
5/56
The snag: partial failures
It is inevitable that at any time only a part of the distributed system fails. Hiding
partial failures and their recovery is often very difficult and in general impossible
to hide.
Middleware and distributed systems
Middleware: the OS of distributed systems
Introduction: What is a distributed system?
What does it contain?
Commonly used components and functions that need not be implemented by
applications separately.
6/56
What do we want to achieve?
Introduction: Design goals
7/56
1. Support sharing of resources
2. Distribution transparency
3. Openness
4. Scalability
Supporting resource sharing
Collection of autonomous nodes
Introduction: Design goals
Observation
“The network is the computer”
(quote from John Gage, then at Sun Microsystems)
8/56
Canonical examples
1. Cloud-based shared storage and files
2. Peer-to-peer assisted multimedia streaming
3. Shared mail services (think of outsourced mail systems)
4. Shared Web hosting (think of content distribution networks)
Making distribution transparent
Distribution transparency
Introduction: Design goals
Types
9/56Types of distribution transparency
Transparency Description
Access Hide differences in data representation and how an object is accessed
Location Hide where an object is located
Relocation Hide that an object may be moved to another locationwhile in use
Migration Hide that an object may move to another location
Replication Hide that an object is replicated
Concurrency Hide that an object may be shared by severalindependent users
Failure Hide the failure and recovery of an object
Making distribution transparent
Degree of transparency
Introduction: Design goals
Observation
10/56Degree of distribution transparency
Aiming at full distribution transparency may be too much:
1. There are communication latencies that cannot be hidden\
2. Completely hiding failures of networks and nodes is (theoretically and
practically)impossible
a) You cannot distinguish a slow computer from a failing one
b) You can never be sure that a server actually performed an
operation before a crash
3. Full transparency will cost performance, exposing distribution of the
system
a) Keeping replicas exactly up-to-date with the master takes time
b) Immediately flushing write operations to disk for fault tolerance
Making distribution transparent
Degree of transparency
Introduction: Design goals
12/56Interoperability, composability, and extensibility
Exposing distribution may be good
1. Making use of location-based services (finding your nearby friends)
2. When dealing with users in different time zones
3. When it makes it easier for a user to understand what’s going on (when
e.g., a server does not respond for a long time, report it as failing).
Conclusion
Distribution transparency is a nice a goal, but achieving it is a different story, and
it should often not even be aimed at.
Being open
Openness of distributed systems
Introduction: Design goals
What are we talking about?
Be able to interact with services from other open systems, irrespective of the
underlying environment:
1. Systems should conform to well-defined interfaces
2. Systems should easily interoperate
3. Systems should support portability of applications
4. Systems should be easily extensible
12/56Interoperability, composability, and extensibility
Being open
Policies versus mechanisms
Introduction: Design goals
Observation
1. What level of consistency do we require for client-cached data?
2. Which operations do we allow downloaded code to perform?
3. Which QoS requirements do we adjust in the face of varying bandwidth?
4. What level of secrecy do we require for communication?
13/56Separating policy from mechanism
Implementing openness: mechanisms
1. Allow (dynamic) setting of caching policies
2. Support different levels of trust for mobile code
3. Provide adjustable QoS parameters per data stream
4. Offer different encryption algorithms
Being open
On strict separation
Introduction: Design goals
Observation
The stricter the separation between policy and mechanism, the more we need to
make ensure proper mechanisms, potentially leading to many configuration
parameters and complex management.
14/56Separating policy from mechanism
Finding a balance
Hard coding policies often simplifies management and reduces complexity at the
price of less flexibility. There is no obvious solution.
Being scalable
Scale in distributed systems
Introduction: Design goals
Observation
Many developers of modern distributed systems easily use the adjective
“scalable” without making clear why their system actually scales.
15/56Scalability dimensions
At least three components1. Number of users and/or processes (size scalability)
2. Maximum distance between nodes (geographical scalability)
3. Number of administrative domains (administrative scalability)
Observation
Most systems account only, to a certain extent, for size scalability. Often a
solution: multiple powerful servers operating independently in parallel. Today, the
challenge still lies in geographical and administrative scalability.
Being scalable
Size scalability
Introduction: Design goals
Root causes for scalability problems with centralized solutions
1. The computational capacity, limited by the CPUs
2. The storage capacity, including the transfer rate between CPUs and disks
3. The network between the user and the centralized service
16/56Scalability dimensions
Being scalable
Problems with geographical scalability
Introduction: Design goals
20/56Scalability dimensions
1. Cannot simply go from LAN to WAN: many distributed systems assume synchronous client-server interactions: client sends request and waits
for an answer. Latency may easily prohibit this scheme.
2. WAN links are often inherently unreliable: simply moving streaming
video from LAN to WAN is bound to fail.
3. Lack of multipoint communication, so that a simple search broadcast
cannot be deployed. Solution is to develop separate naming and
directory services(having their own scalability problems).
Being scalable
Scale in distributed systems
Introduction: Design goals
Essence
Conflicting policies concerning usage (and thus payment), management, and
security
21/56Scalability dimensions
Examples
1. Computational grids: share expensive resources between different
domains.
2. Shared equipment: how to control, manage, and use a shared radio
telescope constructed as large-scale shared sensor network?
Exception: several peer-to-peer networks1. File-sharing systems (based, e.g., on BitTorrent)
2. Peer-to-peer telephony (Skype)
3. Peer-assisted audio streaming (Spotify)
Note: end users collaborate and not administrative entities.
Being scalable
Techniques for scaling
Introduction: Design goals
22/56Scaling techniques
Hide communication latencies
1. Make use of asynchronous communication 2. Have separate handler for incoming response 3. Problem: not every application fits this model
Being scalable
Techniques for scaling
Introduction: Design goals
23/56Scaling techniques
Facilitate solution by moving computations to client
Being scalable
Techniques for scaling
Introduction: Design goals
24/56Scaling techniques
Partition data and computations across multiple machines
1. Move computations to clients (Java applets) 2. Decentralized naming services (DNS)3. Decentralized information systems (WWW)
Being scalable
Techniques for scaling
Introduction: Design goals
25/56Scaling techniques
Replication and caching: Make copies of data available at different machines
1. Replicated file servers and databases2. Mirrored Web sites3. Web caches (in browsers and proxies) 4. File caching (at server and client)
Being scalable
Scaling: The problem with replication
Introduction: Design goals
26/56Scaling techniques
Applying replication is easy, except for one thing
1. Having multiple copies (cached or replicated), leads to in
consistencies: modifying one copy makes that copy different from
the rest.
2. Always keeping copies consistent and in a general way requires
global synchronization on each modification.
3. Global synchronization precludes large-scale solutions.
Being scalable
Scaling: The problem with replication
Introduction: Design goals
26/56Scaling techniques
Applying replication is easy, except for one thing
1. Having multiple copies (cached or replicated), leads to in
consistencies: modifying one copy makes that copy different from
the rest.
2. Always keeping copies consistent and in a general way requires
global synchronization on each modification.
3. Global synchronization precludes large-scale solutions.
Observation
If we can tolerate inconsistencies, we may reduce the need for global
synchronization, but tolerating inconsistencies is application dependent.
Pitfalls
Developing distributed systems: Pitfalls
Introduction: Design goals
27/56
Observation
Many distributed systems are needlessly complex caused by mistakes that
required patching later on. Many false assumptions are often made.
False (and often hidden) assumptions
1. The network is reliable
2. The network is secure
3. The network is homogeneous
4. The topology does not change
5. Latency is zero
6. Bandwidth is infinite
7. Transport cost is zero
8. There is one administrator
Three types of distributed systems
Introduction: Types of distributed systems
28/56
1. High performance distributed computing systems
2. Distributed information systems
3. Distributed systems for pervasive computing
High performance distributed computing
Parallel computing
Introduction: Types of distributed systems
29/56
Observation
High-performance distributed computing started with parallel computing
Multiprocessor and multicore versus multicomputer
High performance distributed computing
Distributed shared memory systems
Introduction: Types of distributed systems
30/56
ObservationMulti processors are relatively easy to program in comparison to multi computers,
yet have problems when increasing the number of processors (or cores).Solution:
Try to implement a shared-memory model on top of a multi computer.
Example through virtual-memory techniques
Map all main-memory pages (from different processors) into one single virtual
address space. If process at processor A addresses a page P located at processor
B, the OS at A traps and fetches P from B, just as it would if P had been located on
local disk.
ProblemPerformance of distributed shared memory could never compete with that of
multiprocessors, and failed to meet the expectations of programmers. It has been
widely abandoned by now.
High performance distributed computing
Cluster computing
Introduction: Types of distributed systems
31/56Cluster computing
Essentially a group of high-end systems connected through a LAN
1. Homogeneous: same OS, near-identical hardware
2. Single managing node
High performance distributed computing
Grid computing
Introduction: Types of distributed systems
32/56Grid computing
The next step: lots of nodes from everywhere
1. Heterogeneous
2. Dispersed across several organizations
3. Can easily span a wide-area network
Note
To allow for collaborations, grids generally use virtual organizations. In essence,
this is a grouping of users (or better: their IDs) that will allow for authorization on
resource allocation.
High performance distributed computing
Architecture for grid computing
Introduction: Types of distributed systems
33/56Grid computing
The layers
Fabric: Provides interfaces to local resources
(for querying state and capabilities, locking, etc.)
Connectivity: Communication/transaction
protocols, e.g., for moving data between
resources. Also various authentication
protocols.
Resource: Manages a single resource, such as
creating processes or reading data.
Collective: Handles access to multiple
resources: discovery, scheduling,
replication.
Application: Contains actual grid applications in a
single organization.
High performance distributed computing
Cloud computing
Introduction: Types of distributed systems
34/56Cloud computing
High performance distributed computing
Cloud computing
Introduction: Types of distributed systems
35/56Cloud computing
Make a distinction between four layers
1. Hardware: Processors, routers, power and cooling systems. Customers
normally never get to see these.
2. Infrastructure: Deploys virtualization techniques. Evolves around
allocating and managing virtual storage devices and virtual servers.
3. Platform: Provides higher-level abstractions for storage and such. Example:
Amazon S3 storage system offers an API for (locally created) files to be
organized and stored in so-calledbuckets.
4. Application: Actual applications, such as office suites (text processors,
spreadsheet applications, presentation applications). Comparable to the
suite of apps shipped with OSes.
Distributed information systems
Integrating applications
Introduction: Types of distributed systems
42/56
Situation
Organizations confronted with many networked applications, but achieving
interoperability was painful.
Basic approach
Next step
Allow direct application-to-application communication, leading to Enterprise
Application Integration.
A networked application is one that runs on a server making its services
available to remote clients. Simple integration: clients combine requests for
(different) applications; send that off; collect responses, and present a
coherent result to the user.
Example EAI: (nested) transactions
Introduction: Types of distributed systems
43/56Distributed transaction processing
Transaction
Distributed information systems
Primitive Description
BEGIN TRANSACTION Mark the start of a transaction
END TRANSACTION Terminate the transaction and try to commit
ABORT TRANSACTION Kill the transaction and restore the old values
READ Read data from a file, a table, or otherwise
WRITE Write data to a file, a table, or otherwise
Issue: all-or-nothing
1. Atomic: happens indivisibly (seemingly) 2. Consistent: does not violate system invariants 3. Isolated: not mutual interference4. Durable: commit means changes are permanent
TPM: Transaction Processing Monitor
Introduction: Types of distributed systems
44/56Distributed transaction processing
Observation
In many cases, the data involved in a transaction is distributed across several
servers. A TP Monitoris responsible for coordinating the execution of a
transaction.
Distributed information systems
Middleware and EAI
Introduction: Types of distributed systems
45/56Enterprise application integration
Middleware offers communication facilities for integration
Remote Procedure Call (RPC):Requests are sent through local procedure call,
packaged as message, processed, responded through message, and result
returned as return from call.
Message Oriented Middleware (MOM):Messages are sent to logical contact
point (published), and forwarded to subscribed applications.
Distributed information systems
Middleware and EAI
Introduction: Types of distributed systems
46/56Enterprise application integration
Distributed information systems
File transfer: Technically simple, but not flexible:
1. Figure out file format and layout
2. Figure out file management
3. Update propagation, and update notifications.
Shared database: Much more flexible, but still requires common data scheme
next to risk of bottleneck
Remote procedure call: Effective when execution of a series of actions is
needed.
Messaging: RPCs require caller and callee to be up and running at the same
time. Messaging allows decoupling in time and space.
Distributed pervasive systems
Introduction: Types of distributed systems
47/56
Three (overlapping) subtypes
Pervasive systems
Middleware offers communication facilities for integration
Emerging next-generation of distributed systems in which nodes are small,
mobile, and often embedded in a larger system, characterized by the fact that
the system naturally blends into the user’s environment.
1. Ubiquitous computing systems: pervasive and continuously present, i.e.,
there is a continuous interaction between system and user.
2. Mobile computing systems: pervasive, but emphasis is on the fact that
devices are inherently mobile.
3. Sensor (and actuator) networks: pervasive, with emphasis on the actual
(collaborative) sensing and actuation of the environment.
Ubiquitous systems
Introduction: Types of distributed systems
48/56Ubiquitous computing systems
Middleware offers communication facilities for integration
1. (Distribution) Devices are networked, distributed, and accessible in a
transparent manner
2. (Interaction) Interaction between users and devices is highly unobtrusive
3. (Context awareness) The system is aware of a user’s context in order to
optimize interaction
4. (Autonomy) Devices operate autonomously without human intervention,
and are thus highly self-managed
5. (Intelligence) The system as a whole can handle a wide range of dynamic
actions and interactions
Pervasive systems
Mobile computing
Introduction: Types of distributed systems
49/56Mobile computing systems
Distinctive features
1. A myriad of different mobile devices (smartphones, tablets, GPS devices,
remote controls, active badges.
2. Mobile implies that a device’s location is expected to change over time ⇒change of local services, reachability, etc. Keyword: discovery.
3. Communication may become more difficult: no stable route, but also
perhaps no guaranteed connectivity ⇒ disruption-tolerant networking.
Pervasive systems
Mobility patterns
Introduction: Types of distributed systems
50/56Mobile computing systems
Issue
What is the relationship between information dissemination and human
mobility? Basic idea: an encounter allows for the exchange of information
Pervasive systems
A successful strategy
1. Alice’s world consists of friends and strangers.
2. If Alice wants to get a message to Bob: hand it out to all her friends
3. Friend passes message to Bob at first encounter
Issue
This strategy works because (apparently) there are relatively closed
communities of friends.
How mobile are people?
Introduction: Types of distributed systems
52/56Mobile computing systems
Experimental results
Tracing 100,000 cell-phone users during six months leads to:
Pervasive systems
Moreover: people tend to return to the same place after 24, 48, or 72 hours ⇒we’re not that mobile.
Sensor networks
Introduction: Types of distributed systems
53/56Sensor networks
Characteristics
The nodes to which sensors are attached are:
1. Many (10s-1000s)
2. Simple (small memory/compute/communication capacity)
3. Often battery-powered (or even battery-less)
Pervasive systems
Sensor networks as distributed databases
Introduction: Types of distributed systems
54/56Sensor networks
Two extremes
Pervasive systems