Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
DISTRIBUTED SYSTEMS
Principles and ParadigmsSecond Edition
ANDREW S. TANENBAUM
MAARTEN VAN STEEN
Chapter 1
Introduction
Modified by: Dr. Ramzi Saifan
Definition of a Distributed System (1)
• A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.
• Autonomous computing elements, also referred to as nodes, can be either hardware devices or software processes.
• Single coherent system: users or applications perceive a single system, so nodes need to collaborate.
Collection of autonomous nodes
• Independent behavior
• Each node is autonomous and thus has its own notion of time: there is no global clock. This leads to fundamental synchronization and coordination problems.
• Collection of nodes
• How to manage group membership?
• How to know that you are indeed communicating with an authorized (non)member?
Organization
• Overlay network
• Each node in the collection communicates only with other nodes in the system, its neighbors. The set of neighbors may be dynamic, or may even be known only implicitly (i.e., it requires a lookup).
• Overlay types (peer-to-peer systems are a well-known example of overlay networks):
• Structured: each node has a well-defined set of neighbors with which it can communicate (e.g., a tree or a ring).
• Unstructured: each node has references to randomly selected other nodes in the system.
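The two overlay types can be contrasted in a few lines of Python. This is an illustrative sketch only; the function names and the choice of a ring for the structured case are assumptions, not from the text.

```python
import random

def ring_neighbors(node_id, num_nodes):
    """Structured overlay: in a ring, a node's neighbors are
    fixed by its position (predecessor and successor)."""
    return [(node_id - 1) % num_nodes, (node_id + 1) % num_nodes]

def random_neighbors(node_id, num_nodes, k=3):
    """Unstructured overlay: a node keeps references to k
    randomly selected other nodes in the system."""
    others = [n for n in range(num_nodes) if n != node_id]
    return random.sample(others, k)
```

In the structured case the neighbor set is deterministic and lookups can exploit the topology; in the unstructured case the neighbor set changes from run to run and searching typically falls back on flooding or random walks.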
Coherent system
• Essence
• The collection of nodes as a whole operates the same way, no matter where, when, and how interaction between a user and the system takes place.
• Examples
• An end user cannot tell where a computation is taking place.
• Where exactly data is stored should be irrelevant to an application.
• Whether or not data has been replicated is completely hidden.
• The keyword is distribution transparency.
• The snag: partial failures
• It is inevitable that at any time some part of the distributed system fails. Hiding partial failures and their recovery is very difficult and in general impossible.
Middleware: OS of Distributed Systems
• The middleware layer extends over multiple machines, and
offers each application the same interface.
• What does it contain?
• Commonly used components and functions that need not be implemented by
applications separately.
What do we want to achieve?
1. Sharing resources
2. Transparency
3. Openness
4. Scalability
Transparency in a Distributed System
Figure 1-2. Different forms of transparency in a
distributed system (ISO, 1995).
Degree of transparency
• Aiming at full distribution transparency may be too much:
• There are communication latencies that cannot be hidden.
• Completely hiding failures of networks and nodes is (theoretically and practically) impossible:
• You cannot distinguish a slow computer from a failing one.
• You can never be sure that a server actually performed an operation before a crash.
• Full transparency will cost performance, exposing the distribution of the system:
• Keeping replicas exactly up-to-date with the master takes time.
• Immediately flushing write operations to disk for fault tolerance.
• Conclusion: distribution transparency is a nice goal, but achieving it is a different story, and it often should not even be aimed at.
Openness of distributed systems
• Open distributed system: be able to interact with services from other open systems, irrespective of the underlying environment:
o Systems should conform to well-defined interfaces
o Systems should support portability of applications
o Systems should easily interoperate
o Systems should be extensible
• Achieving openness: at least make the distributed system independent from the heterogeneity of the underlying environment:
o Hardware
o Platforms
o Languages
Degrees of scalability
1. Size
2. Geographic
3. Administrative
Scalability Problems
Figure 1-3. Examples of scalability limitations.
Characteristics of decentralized algorithms:
• No machine has complete information about the
system state.
• Machines make decisions based only on local
information.
• Failure of one machine does not ruin the
algorithm.
• There is no implicit assumption that a global
clock exists.
Problems with geographical scalability
• Cannot simply go from LAN to WAN: many distributed
systems assume synchronous client-server interactions:
client sends request and waits for an answer. Latency may
easily prohibit this scheme.
• WAN links are often inherently unreliable: simply moving
streaming video from LAN to WAN is bound to fail.
• Lack of multipoint communication, so a simple search broadcast cannot be deployed. The solution is to develop separate naming and directory services (which have their own scalability problems).
Problems with administrative scalability
• Conflicting policies concerning usage (and thus payment), management, and security.
• We do not trust other domains:
• With multiple domains, security is an issue: how do we trust the other system?
• How do they manage my data, and who gets access to it?
• Example: multiple cloud organizations interacting to offer combined services to a client.
• Another example: sensor networks that collect data and send it to a central point.
Scaling techniques
• Hide communication latencies
• Make use of asynchronous communication (interrupt)
• Have separate handler for incoming response (multithreading)
• Problem: not every application fits this model (interactive)
• Partition data and computations across multiple machines
• Move computations to clients (Java applets)
• Decentralized naming services (DNS)
• Decentralized information systems (WWW)
• Replication and caching: Make copies of data available at
different machines
• Replicated file servers and databases
• Web caches (in browsers and proxies)
• File caching (at server and client)
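The first technique, hiding communication latency with asynchronous communication and a separate response handler, can be sketched with a worker thread. This is a minimal illustration; `slow_remote_call` and the simulated delay are invented for the example, not from the text.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_remote_call(x):
    """Stand-in for a request over a high-latency link."""
    time.sleep(0.05)   # simulated network round-trip
    return x * 2

# Issue the request asynchronously: a separate worker thread handles it,
# so the caller is free to do other useful work instead of blocking.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_remote_call, 21)
    # ... caller continues with other work here ...
    result = future.result()   # collect the response once it arrives
```

As the slide notes, this only helps when the application has other work to overlap with the wait; a strictly interactive application gains little.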
Scaling Techniques (1)
Figure 1-4. The difference between letting (a) a server
or (b) a client check forms as they are being filled.
Scaling Techniques (2)
Figure 1-5. An example of dividing the DNS
name space into zones.
Scaling Techniques (3)
• Replication: Replicated file servers and databases, Mirrored
Web sites
o Performance enhancement
o Fault tolerance
o Load balancing
o Geographic scalability
• A cache is an example of replication.
• Problems of replication: a consistency/cost tradeoff.
• Strong consistency needs global synchronization.
• Global synchronization may make replication impractical for large systems.
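The consistency/cost tradeoff can be shown in miniature with a cache that tolerates bounded staleness instead of synchronizing globally. The class name, interface, and time-to-live policy below are illustrative assumptions, not anything prescribed by the text.

```python
import time

class TTLCache:
    """Serve reads from a local copy, tolerating bounded staleness:
    a larger ttl means fewer trips to the origin but staler replicas."""

    def __init__(self, ttl):
        self.ttl = ttl       # seconds a cached copy stays valid
        self.store = {}      # key -> (value, time cached)

    def get(self, key, fetch):
        entry = self.store.get(key)
        now = time.monotonic()
        if entry is None or now - entry[1] > self.ttl:
            value = fetch(key)               # consult the master copy
            self.store[key] = (value, now)
            return value
        return entry[0]                      # possibly stale local replica
```

Web caches and DNS resolvers make essentially this choice: they accept answers that may be out of date in exchange for not contacting the origin on every request.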
Pitfalls when Developing
Distributed Systems
False assumptions made by first-time developers:
• The network is reliable.
• The network is secure.
• The network is homogeneous.
• The topology does not change.
• Latency is zero.
• Bandwidth is infinite.
• Transport cost is zero.
• There is one administrator.
Types of Distributed Systems
1. High performance distributed computing systems
a) Cluster systems
b) Grid systems
c) Cloud Computing
2. Distributed information systems
a) Transaction processing
b) Integrated enterprise applications
3. Distributed pervasive systems
a) Electronic health care systems
b) Sensor networks
Parallel computing
• High-performance distributed computing started with parallel
computing.
• Multiprocessor: shared memory
• Multicomputer
• Distributed shared memory
Cluster Computing Systems
Figure 1-6. An example of a cluster computing system.
Grid Computing Systems
Figure 1-7. A layered architecture for grid computing systems.
Cloud Computing
Is cloud computing cost-effective?
• Outsource the entire infrastructure: hardware and software.
• More than just providing high-performance computing.
• Is outsourcing also cheaper?
Is cloud computing cost-effective?
There is an objective function that must capture costs and benefits.
Also, a set of constraints.
Transaction Processing Systems (1)
Figure 1-8. Example primitives for transactions.
Transaction Processing Systems (2)
Characteristic properties of transactions (ACID):
• Atomic: To the outside world, the transaction
happens indivisibly.
• Consistent: The transaction does not violate
system invariants.
• Isolated: Concurrent transactions do not
interfere with each other.
• Durable: Once a transaction commits, the
changes are permanent.
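The Atomicity property can be demonstrated with a transfer that aborts halfway. This sketch uses Python's standard sqlite3 module purely as an illustration; the table, account names, and amounts are made up, and the simulated crash stands in for any mid-transaction failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

try:
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
        raise RuntimeError("simulated crash before the matching credit")
except RuntimeError:
    pass  # the transaction aborted

# Atomicity: the debit left no partial effect behind; A still has 100.
balance = conn.execute(
    "SELECT balance FROM account WHERE name = 'A'").fetchone()[0]
```

To the outside world the aborted transfer never happened, which is exactly the "happens indivisibly" guarantee stated above.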
Transaction Processing Systems (3)
Figure 1-9. A nested transaction.
Transaction Processing Systems (4)
Figure 1-10. The role of a TP monitor in distributed systems.
Middleware offers communication facilities for integration
• Remote Procedure Call (RPC): a request is issued as a local procedure call, packaged as a message, processed remotely, answered with a message, and the result returned as the return value of the call.
• Message-Oriented Middleware (MOM): messages are sent to a logical contact point (published) and forwarded to subscribed applications.
Enterprise Application Integration
How to integrate applications
• File transfer: Technically simple, but not flexible:
• Figure out file format and layout
• Figure out file management
• Update propagation, and update notifications.
• Shared database: Much more flexible, but still requires a common data schema, next to the risk of becoming a bottleneck.
• Remote procedure call: Effective when execution of a series
of actions is needed.
• Messaging: RPCs require caller and callee to be up and
running at the same time. Messaging allows decoupling in
time and space.
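The decoupling-in-time point can be made concrete with an in-process queue standing in for a message broker. This is a toy sketch under that assumption; real MOM systems add persistence, routing, and subscriptions.

```python
import queue
import threading

broker = queue.Queue()   # stands in for a message-oriented middleware queue

# The producer publishes and moves on; it does not wait for any receiver.
# Decoupling in time: the consumer is not even running yet.
broker.put({"order": 17})
broker.put(None)         # sentinel: no more messages

received = []

def consumer():
    """Started later, the consumer simply drains whatever queued up."""
    while True:
        msg = broker.get()
        if msg is None:
            break
        received.append(msg)

t = threading.Thread(target=consumer)
t.start()
t.join()
```

Contrast this with RPC, where the call fails outright if the callee is down at the moment of invocation.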
Requirements for pervasive systems
• Embrace contextual changes: changes in the network are always expected.
• Encourage ad hoc composition.
• Recognize sharing as the default.
• Three (overlapping) subtypes
• Ubiquitous computing systems: pervasive and continuously present,
i.e., there is a continuous interaction between system and user.
• Mobile computing systems: pervasive, but emphasis is on the fact that
devices are inherently mobile.
• Sensor (and actuator) networks: pervasive, with emphasis on the actual
(collaborative) sensing and actuation of the environment.
Distributed Pervasive Systems
Sensor Networks (1)
Questions concerning sensor networks:
• How do we (dynamically) set up an
efficient tree in a sensor network?
• How does aggregation of results take
place? Can it be controlled?
• What happens when network links fail?
Sensor Networks (2)
Figure 1-13. Organizing a sensor network database, while storing and processing data (a) only at the operator’s site or …
Sensor Networks (3)
Figure 1-13. Organizing a sensor network database, while storing and processing data … or (b) only at the sensors.
Duty-cycled networks
Many sensor networks must operate on a strict energy budget, and therefore introduce duty cycles.
A node is active during T_active time units and then suspended for T_suspended units, after which it becomes active again.
Duty cycle = T_active / (T_active + T_suspended)
Typical duty cycles are 10-30%, but can also be lower than 1%.
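The formula above is a one-liner; the example values below (a node awake 5 ms out of every 500 ms) are chosen only to illustrate the sub-1% regime mentioned in the text.

```python
def duty_cycle(t_active, t_suspended):
    """Fraction of time a node is awake: T_active / (T_active + T_suspended)."""
    return t_active / (t_active + t_suspended)

# A node awake 5 ms and suspended 495 ms runs at a 1% duty cycle.
ratio = duty_cycle(5, 495)
```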
Keeping duty-cycled networks in sync
• If duty cycles are low, sensor nodes may not wake up at the
same time anymore and become permanently disconnected:
they are active during different, non-overlapping time slots.
• Solution
• Each node A adopts a cluster ID C_A, which is a number.
• Let a node send a join message during its suspended period.
• When A receives a join message from B and C_A < C_B, it sends a join message to its neighbors (in cluster C_A) before joining B.
• When C_A > C_B, it sends a join message to B during B's active period.
• Note
• Once a join message reaches a whole cluster, merging two clusters is very fast. Merging means: re-adjusting clocks.
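The net effect of the join rule is that the larger cluster ID wins everywhere, which can be modelled as max-propagation over the neighbor graph. This sketch abstracts away clocks and message timing entirely; the function name and the dict-based graph representation are assumptions made for the example.

```python
def synchronize(neighbors, cluster):
    """Repeatedly exchange 'join' information until every node has adopted
    the largest cluster ID it can hear of (re-adjusting its clock each time
    it switches cluster, in the full protocol)."""
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            for nbr in nbrs:
                if cluster[nbr] > cluster[node]:
                    cluster[node] = cluster[nbr]  # adopt the larger ID
                    changed = True
    return cluster
```

On a small line graph where the middle node starts in the lowest cluster, a couple of sweeps suffice for all three nodes to converge on the maximum ID.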