Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
DISTRIBUTED SYSTEMS
Principles and ParadigmsSecond Edition
ANDREW S. TANENBAUM
MAARTEN VAN STEEN
Chapter 1
Introduction
Modified by: Dr. Ramzi Saifan
Definition of a Distributed System (1)
• A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.
• Autonomous computing elements, also referred to as nodes, can be either hardware devices or software processes.
• Single coherent system: users or applications perceive a single system, so nodes need to collaborate.
Collection of autonomous nodes
• Independent behavior
• Each node is autonomous and thus has its own notion of time: there is no global clock. This leads to fundamental synchronization and coordination problems.
• Collection of nodes
• How to manage group membership?
• How to know that you are indeed communicating with an authorized (non)member?
Organization
• Overlay network
• Each node in the collection communicates only with other nodes in the system, its neighbors. The set of neighbors may be dynamic, or may even be known only implicitly (i.e., it requires a lookup).
• Overlay types (peer-to-peer systems are a well-known example of overlay networks):
• Structured: each node has a well-defined set of neighbors with which it can communicate (e.g., a tree or a ring).
• Unstructured: each node has references to randomly selected other nodes in the system.
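The two overlay types can be contrasted in a few lines of Python. This is an illustrative sketch only; the function names and the choice of a ring for the structured case are assumptions, not from the text.

```python
import random

def ring_neighbors(node_id, num_nodes):
    """Structured overlay: in a ring, a node's neighbors are
    fixed by its position (predecessor and successor)."""
    return [(node_id - 1) % num_nodes, (node_id + 1) % num_nodes]

def random_neighbors(node_id, num_nodes, k=3):
    """Unstructured overlay: a node keeps references to k
    randomly selected other nodes in the system."""
    others = [n for n in range(num_nodes) if n != node_id]
    return random.sample(others, k)
```

In the structured case the neighbor set is deterministic and lookups can exploit the topology; in the unstructured case the neighbor set changes from run to run and searching typically falls back on flooding or random walks.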
Coherent system
• Essence
• The collection of nodes as a whole operates the same way, no matter where, when, and how interaction between a user and the system takes place.
• Examples
• An end user cannot tell where a computation is taking place.
• Where exactly data is stored should be irrelevant to an application.
• Whether or not data has been replicated is completely hidden.
• The keyword is distribution transparency.
• The snag: partial failures
• It is inevitable that at any time some part of the distributed system fails. Hiding partial failures and their recovery is very difficult and in general impossible.
Middleware: OS of Distributed Systems
• The middleware layer extends over multiple machines, and
offers each application the same interface.
• What does it contain?
• Commonly used components and functions that need not be implemented by
applications separately.
What do we want to achieve?
1. Sharing resources
2. Transparency
3. Openness
4. Scalability
Transparency in a Distributed System
Figure 1-2. Different forms of transparency in a
distributed system (ISO, 1995).
Degree of transparency
• Aiming at full distribution transparency may be too much:
• There are communication latencies that cannot be hidden.
• Completely hiding failures of networks and nodes is (theoretically and practically) impossible:
• You cannot distinguish a slow computer from a failing one.
• You can never be sure that a server actually performed an operation before a crash.
• Full transparency will cost performance, exposing the distribution of the system:
• Keeping replicas exactly up-to-date with the master takes time.
• Immediately flushing write operations to disk for fault tolerance.
• Conclusion: distribution transparency is a nice goal, but achieving it is a different story, and it often should not even be aimed at.
Openness of distributed systems
• Open distributed system: be able to interact with services from other open systems, irrespective of the underlying environment:
o Systems should conform to well-defined interfaces
o Systems should support portability of applications
o Systems should easily interoperate
o Systems should be extensible
• Achieving openness: at least make the distributed system independent from the heterogeneity of the underlying environment:
o Hardware
o Platforms
o Languages
Degrees of scalability
1. Size
2. Geographic
3. Administrative
Scalability Problems
Figure 1-3. Examples of scalability limitations.
Characteristics of decentralized algorithms:
• No machine has complete information about the
system state.
• Machines make decisions based only on local
information.
• Failure of one machine does not ruin the
algorithm.
• There is no implicit assumption that a global
clock exists.
Problems with geographical scalability
• Cannot simply go from LAN to WAN: many distributed
systems assume synchronous client-server interactions:
client sends request and waits for an answer. Latency may
easily prohibit this scheme.
• WAN links are often inherently unreliable: simply moving
streaming video from LAN to WAN is bound to fail.
• Lack of multipoint communication, so a simple search broadcast cannot be deployed. The solution is to develop separate naming and directory services (which have their own scalability problems).
Problems with administrative scalability
• Conflicting policies concerning usage (and thus payment), management, and security.
• We do not trust other domains:
• With multiple domains, security is an issue: how do we trust the other system?
• How do they manage my data, and who gets access to it?
• Example: multiple cloud organizations interacting to offer combined services to a client.
• Another example: sensor networks that collect data and send it to a central point.
Scaling techniques
• Hide communication latencies
• Make use of asynchronous communication (interrupt)
• Have separate handler for incoming response (multithreading)
• Problem: not every application fits this model (interactive)
• Partition data and computations across multiple machines
• Move computations to clients (Java applets)
• Decentralized naming services (DNS)
• Decentralized information systems (WWW)
• Replication and caching: Make copies of data available at
different machines
• Replicated file servers and databases
• Web caches (in browsers and proxies)
• File caching (at server and client)
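The first technique, hiding communication latency with asynchronous communication and a separate response handler, can be sketched with a worker thread. This is a minimal illustration; `slow_remote_call` and the simulated delay are invented for the example, not from the text.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_remote_call(x):
    """Stand-in for a request over a high-latency link."""
    time.sleep(0.05)   # simulated network round-trip
    return x * 2

# Issue the request asynchronously: a separate worker thread handles it,
# so the caller is free to do other useful work instead of blocking.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_remote_call, 21)
    # ... caller continues with other work here ...
    result = future.result()   # collect the response once it arrives
```

As the slide notes, this only helps when the application has other work to overlap with the wait; a strictly interactive application gains little.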
Scaling Techniques (1)
Figure 1-4. The difference between letting (a) a server
or (b) a client check forms as they are being filled.
Scaling Techniques (2)
Figure 1-5. An example of dividing the DNS
name space into zones.
Scaling Techniques (3)
• Replication: Replicated file servers and databases, Mirrored
Web sites
o Performance enhancement
o Fault tolerance
o Load balancing
o Geographic scalability
• A cache is an example of replication.
• Problems of replication: a consistency/cost tradeoff.
• Strong consistency needs global synchronization.
• Global synchronization may make replication impractical for large systems.
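The consistency/cost tradeoff can be shown in miniature with a cache that tolerates bounded staleness instead of synchronizing globally. The class name, interface, and time-to-live policy below are illustrative assumptions, not anything prescribed by the text.

```python
import time

class TTLCache:
    """Serve reads from a local copy, tolerating bounded staleness:
    a larger ttl means fewer trips to the origin but staler replicas."""

    def __init__(self, ttl):
        self.ttl = ttl       # seconds a cached copy stays valid
        self.store = {}      # key -> (value, time cached)

    def get(self, key, fetch):
        entry = self.store.get(key)
        now = time.monotonic()
        if entry is None or now - entry[1] > self.ttl:
            value = fetch(key)               # consult the master copy
            self.store[key] = (value, now)
            return value
        return entry[0]                      # possibly stale local replica
```

Web caches and DNS resolvers make essentially this choice: they accept answers that may be out of date in exchange for not contacting the origin on every request.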
Pitfalls when Developing
Distributed Systems
False assumptions made by first-time developers:
• The network is reliable.
• The network is secure.
• The network is homogeneous.
• The topology does not change.
• Latency is zero.
• Bandwidth is infinite.
• Transport cost is zero.
• There is one administrator.
Types of Distributed Systems
1. High performance distributed computing systems
a) Cluster systems
b) Grid systems
c) Cloud Computing
2. Distributed information systems
a) Transaction processing
b) Integrated enterprise applications
3. Distributed pervasive systems
a) Electronic health care systems
b) Sensor networks
Parallel computing
• High-performance distributed computing started with parallel
computing.
• Multiprocessor: shared memory
• Multicomputer
• Distributed shared memory
Cluster Computing Systems
Figure 1-6. An example of a cluster computing system.
Grid Computing Systems
Figure 1-7. A layered architecture for grid computing systems.
Cloud Computing
Is cloud computing cost-effective?
• Outsource the entire infrastructure: hardware and software.
• More than just providing high-performance computing.
• Is outsourcing also cheaper?
Is cloud computing cost-effective?
There is an objective function that must capture costs and benefits.
Also, a set of constraints.
Transaction Processing Systems (1)
Figure 1-8. Example primitives for transactions.
Transaction Processing Systems (2)
Characteristic properties of transactions (ACID):
• Atomic: To the outside world, the transaction
happens indivisibly.
• Consistent: The transaction does not violate
system invariants.
• Isolated: Concurrent transactions do not
interfere with each other.
• Durable: Once a transaction commits, the
changes are permanent.
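The Atomicity property can be demonstrated with a transfer that aborts halfway. This sketch uses Python's standard sqlite3 module purely as an illustration; the table, account names, and amounts are made up, and the simulated crash stands in for any mid-transaction failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

try:
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
        raise RuntimeError("simulated crash before the matching credit")
except RuntimeError:
    pass  # the transaction aborted

# Atomicity: the debit left no partial effect behind; A still has 100.
balance = conn.execute(
    "SELECT balance FROM account WHERE name = 'A'").fetchone()[0]
```

To the outside world the aborted transfer never happened, which is exactly the "happens indivisibly" guarantee stated above.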
Transaction Processing Systems (3)
Figure 1-9. A nested transaction.
Transaction Processing Systems (4)
Figure 1-10. The role of a TP monitor in distributed systems.
Middleware offers communication facilities for integration
• Remote Procedure Call (RPC): a request is issued as a local procedure call, packaged as a message, processed remotely, answered with a message, and the result returned as the return value of the call.
• Message-Oriented Middleware (MOM): messages are sent to a logical contact point (published) and forwarded to subscribed applications.
Enterprise Application Integration
How to integrate applications
• File transfer: Technically simple, but not flexible:
• Figure out file format and layout
• Figure out file management
• Update propagation, and update notifications.
• Shared database: Much more flexible, but still requires a common data schema, next to the risk of becoming a bottleneck.
• Remote procedure call: Effective when execution of a series
of actions is needed.
• Messaging: RPCs require caller and callee to be up and
running at the same time. Messaging allows decoupling in
time and space.
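The decoupling-in-time point can be made concrete with an in-process queue standing in for a message broker. This is a toy sketch under that assumption; real MOM systems add persistence, routing, and subscriptions.

```python
import queue
import threading

broker = queue.Queue()   # stands in for a message-oriented middleware queue

# The producer publishes and moves on; it does not wait for any receiver.
# Decoupling in time: the consumer is not even running yet.
broker.put({"order": 17})
broker.put(None)         # sentinel: no more messages

received = []

def consumer():
    """Started later, the consumer simply drains whatever queued up."""
    while True:
        msg = broker.get()
        if msg is None:
            break
        received.append(msg)

t = threading.Thread(target=consumer)
t.start()
t.join()
```

Contrast this with RPC, where the call fails outright if the callee is down at the moment of invocation.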
Requirements for pervasive systems
• Embrace contextual changes: changes in the network are always expected.
• Encourage ad hoc composition.
• Recognize sharing as the default.
• Three (overlapping) subtypes
• Ubiquitous computing systems: pervasive and continuously present,
i.e., there is a continuous interaction between system and user.
• Mobile computing systems: pervasive, but emphasis is on the fact that
devices are inherently mobile.
• Sensor (and actuator) networks: pervasive, with emphasis on the actual
(collaborative) sensing and actuation of the environment.
Distributed Pervasive Systems
Sensor Networks (1)
Questions concerning sensor networks:
• How do we (dynamically) set up an
efficient tree in a sensor network?
• How does aggregation of results take
place? Can it be controlled?
• What happens when network links fail?
Sensor Networks (2)
Figure 1-13. Organizing a sensor network database, while storing and processing data (a) only at the operator’s site or …
Sensor Networks (3)
Figure 1-13. Organizing a sensor network database, while storing and processing data … or (b) only at the sensors.
Duty-cycled networks
Many sensor networks must operate on a strict energy budget, and therefore introduce duty cycles.
A node is active during T_active time units and then suspended for T_suspended units, after which it becomes active again.
Duty cycle = T_active / (T_active + T_suspended)
Typical duty cycles are 10-30%, but can also be lower than 1%.
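The formula above is a one-liner; the example values below (a node awake 5 ms out of every 500 ms) are chosen only to illustrate the sub-1% regime mentioned in the text.

```python
def duty_cycle(t_active, t_suspended):
    """Fraction of time a node is awake: T_active / (T_active + T_suspended)."""
    return t_active / (t_active + t_suspended)

# A node awake 5 ms and suspended 495 ms runs at a 1% duty cycle.
ratio = duty_cycle(5, 495)
```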
Keeping duty-cycled networks in sync
• If duty cycles are low, sensor nodes may not wake up at the
same time anymore and become permanently disconnected:
they are active during different, non-overlapping time slots.
• Solution
• Each node A adopts a cluster ID C_A, which is a number.
• Let a node send a join message during its suspended period.
• When A receives a join message from B and C_A < C_B, it sends a join message to its neighbors (in cluster C_A) before joining B.
• When C_A > C_B, it sends a join message to B during B's active period.
• Note
• Once a join message reaches a whole cluster, merging two clusters is very fast. Merging means: re-adjusting clocks.
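The net effect of the join rule is that the larger cluster ID wins everywhere, which can be modelled as max-propagation over the neighbor graph. This sketch abstracts away clocks and message timing entirely; the function name and the dict-based graph representation are assumptions made for the example.

```python
def synchronize(neighbors, cluster):
    """Repeatedly exchange 'join' information until every node has adopted
    the largest cluster ID it can hear of (re-adjusting its clock each time
    it switches cluster, in the full protocol)."""
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            for nbr in nbrs:
                if cluster[nbr] > cluster[node]:
                    cluster[node] = cluster[nbr]  # adopt the larger ID
                    changed = True
    return cluster
```

On a small line graph where the middle node starts in the lowest cluster, a couple of sweeps suffice for all three nodes to converge on the maximum ID.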