DDD Lecturer 1

Upload: thri-bagan

Posted on 05-Apr-2018


TRANSCRIPT

  • Slide 1/17

    Distributed Design & Development

    T. [email protected]

  • Slide 2/17

    Learning outcomes

    Understand the Microsoft architecture for enterprise applications

    Design a distributed application

    Build a distributed application

    Build and use components

  • Slide 3/17

    What you'll learn

    Where you can go

    Design techniques, tools & technology for the enterprise level

    How Microsoft Visual Studio supports large systems

    Multi-tier solutions based on the Windows DNA architecture

    The relevant aspects of Windows NT, MTS and IIS

    At the end you'll be able to sit the MCP exams in:

    Visual Basic Distributed Applications

    Analyzing Requirements & Defining Solution Architectures

  • Slide 4/17

    Assignments?

    An assignment will be set in one month's time, with a one-month duration

    An exam at the end of the term

    Remember: NO EXCUSES for late submission

    Your mark will simply be capped at 50

  • Slide 5/17

    The Basics

  • Slide 6/17

    What is a distributed system?

    It's one of those things that's hard to define without first defining many
    other things. Here is a "cascading" definition of a distributed system:

    A program is the code you write.

    A process is what you get when you run it.

    A message is used to communicate between processes.

    A packet is a fragment of a message that might travel on a wire.

    A protocol is a formal description of message formats and the rules that
    two processes must follow in order to exchange those messages.

    A network is the infrastructure that links computers, workstations,
    terminals, servers, etc. It consists of routers which are connected by
    communication links.

    A component can be a process or any piece of hardware required to run a
    process, support communications between processes, store data, etc.
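
    As a rough illustration of these terms, here is a minimal sketch (not from
    the lecture): two endpoints exchange a message over a socket using a tiny
    protocol, a 4-byte length prefix followed by a UTF-8 payload. The host,
    port and message text are made-up values, and for brevity the "server
    process" is just a thread inside the same program rather than a separate
    process on another machine.

      import socket
      import struct
      import threading

      HOST, PORT = "127.0.0.1", 9009          # assumed values for this sketch

      def send_message(sock, text):
          # Tiny protocol: 4-byte big-endian length prefix, then UTF-8 payload.
          payload = text.encode("utf-8")
          sock.sendall(struct.pack("!I", len(payload)) + payload)

      def recv_message(sock):
          # Read the fixed-size header, then exactly that many payload bytes.
          (length,) = struct.unpack("!I", sock.recv(4))
          payload = b""
          while len(payload) < length:
              payload += sock.recv(length - len(payload))
          return payload.decode("utf-8")

      ready = threading.Event()

      def server():
          # Stands in for a second process; in a real system it would run on
          # another machine and be reached over the network.
          with socket.socket() as srv:
              srv.bind((HOST, PORT))
              srv.listen(1)
              ready.set()                     # we are now accepting connections
              conn, _ = srv.accept()
              with conn:
                  print("server got:", recv_message(conn))
                  send_message(conn, "ack")

      threading.Thread(target=server, daemon=True).start()
      ready.wait()

      with socket.socket() as client:
          client.connect((HOST, PORT))
          send_message(client, "hello from the client process")
          print("client got:", recv_message(client))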

  • Slide 7/17

    A distributed system

    is an application that executes a collection of protocols to coordinate
    the actions of multiple processes on a network, such that all components
    cooperate together to perform a single or small set of related tasks.

  • Slide 8/17

    Why build a distributed system?

    There are lots of advantages, including the ability to connect remote
    users with remote resources in an open and scalable way. When we say open,
    we mean each component is continually open to interaction with other
    components. When we say scalable, we mean the system can easily be altered
    to accommodate changes in the number of users, resources and computing
    entities.

    Thus, a distributed system can be much larger and more powerful, given the
    combined capabilities of the distributed components, than combinations of
    stand-alone systems. But it's not easy - for a distributed system to be
    useful, it must be reliable. This is a difficult goal to achieve because
    of the complexity of the interactions between simultaneously running
    components.

  • Slide 9/17

    To be truly reliable, a distributed system must have the following
    characteristics:

    Fault-Tolerant: It can recover from component failures without performing
    incorrect actions (a minimal retry sketch follows this list).

    Highly Available: It can restore operations, permitting it to resume
    providing services even when some components have failed.

    Recoverable: Failed components can restart themselves and rejoin the
    system, after the cause of failure has been repaired.

    Consistent: The system can coordinate actions by multiple components,
    often in the presence of concurrency and failure. This underlies the
    ability of a distributed system to act like a non-distributed system.

    Scalable: It can operate correctly even as some aspect of the system is
    scaled to a larger size. For example, we might increase the size of the
    network on which the system is running. This increases the frequency of
    network outages and could degrade a "non-scalable" system. Similarly, we
    might increase the number of users or servers, or the overall load on the
    system. In a scalable system, this should not have a significant effect.

    Predictable Performance: The ability to provide desired responsiveness in
    a timely manner.

    Secure: The system authenticates access to data and services.
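
    As a rough sketch of one fault-tolerance building block (the fetch_profile
    call and its failure rate are hypothetical, and real systems need much
    more than this), the helper below retries a transient failure a bounded
    number of times with exponential backoff, and otherwise reports the fault
    instead of acting on a bad result.

      import random
      import time

      class TransientError(Exception):
          """Stands in for a timeout or a temporarily unavailable component."""

      def fetch_profile(user_id):
          # Hypothetical remote call; fails randomly to simulate transient faults.
          if random.random() < 0.5:
              raise TransientError("service temporarily unavailable")
          return {"user_id": user_id, "name": "example"}

      def call_with_retries(fn, *args, attempts=4, base_delay=0.1):
          for attempt in range(1, attempts + 1):
              try:
                  return fn(*args)
              except TransientError:
                  if attempt == attempts:
                      raise                                    # give up cleanly
                  time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff

      try:
          print(call_with_retries(fetch_profile, 42))
      except TransientError:
          print("still failing after retries; report the fault rather than act on a bad result")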

  • Slide 10/17

    Things you should consider

    Failure is the defining difference between distributed and local
    programming, so you have to design distributed systems with the
    expectation of failure. Imagine asking people, "If the probability of
    something happening is one in 10^13, how often would it happen?" Common
    sense would be to answer, "Never." That is an infinitely large number in
    human terms. But if you ask a physicist, she would say, "All the time. In
    a cubic foot of air, those things happen all the time."

  • Slide 11/17

    Things you should consider

    When you design distributed systems, you have to say, "Failure happens all
    the time." So when you design, you design for failure. It is your number
    one concern. What does designing for failure mean? One classic problem is
    partial failure. If I send a message to you and then a network failure
    occurs, there are two possible outcomes. One is that the message got to
    you, and then the network broke, and I just didn't get the response. The
    other is that the message never got to you because the network broke
    before it arrived.

    So if I never receive a response, how do I know which of those two results
    happened? I cannot determine that without eventually finding you. The
    network has to be repaired or you have to come up, because maybe what
    happened was not a network failure but you died. How does this change how
    I design things? For one thing, it puts a multiplier on the value of
    simplicity. The more things I can do with you, the more things I have to
    think about recovering from.
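
    The sketch below shows how this ambiguity surfaces in ordinary client code
    (the server address, port and "debit" request are made-up placeholders): a
    timeout while waiting for the reply tells the caller nothing about whether
    the request was applied, which is why retries are only safe for requests
    the server can recognise and de-duplicate.

      import socket
      import uuid

      def send_request(host, port, body, timeout=2.0):
          request_id = str(uuid.uuid4())      # lets the server de-duplicate a retry
          try:
              sock = socket.create_connection((host, port), timeout=timeout)
          except OSError as exc:
              return ("not sent", str(exc))   # never connected: request not applied
          try:
              with sock:
                  sock.sendall(f"{request_id} {body}\n".encode())
                  reply = sock.recv(1024)     # waits up to `timeout` for an answer
                  return ("ok", reply.decode())
          except socket.timeout:
              # The classic partial-failure ambiguity: either the request was
              # lost before it was applied, or it was applied and the reply
              # was lost. The client cannot tell which.
              return ("unknown", request_id)
          except OSError as exc:
              return ("error", str(exc))

      # 198.51.100.7 is a reserved test address, so this demo exercises the
      # failure paths rather than the happy path.
      print(send_request("198.51.100.7", 7000, "debit account 42 by 10"))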

  • Slide 12/17

    Handling failures

    Handling failures is an important theme in distributed systems design.

    Failures fall into two obvious categories: hardware and software.

    Hardware failures were a dominant concern until the late 1980s, but since
    then internal hardware reliability has improved enormously. Decreased heat
    production and power consumption of smaller circuits, reduction of
    off-chip connections and wiring, and high-quality manufacturing techniques
    have all played a positive role in improving hardware reliability.

    Today, problems are most often associated with connections and mechanical
    devices, i.e., network failures and drive failures.

  • Slide 13/17

    Software failures

    Software failures are a significant issue in distributed systems. Even
    with rigorous testing, software bugs account for a substantial fraction of
    unplanned downtime (estimated at 25-35%). Residual bugs in mature systems
    can be classified into two main categories.

    Heisenbug: A bug that seems to disappear or alter its characteristics when
    it is observed or researched. A common example is a bug that occurs in a
    release-mode compile of a program, but not when researched under debug
    mode. The name "heisenbug" is a pun on the "Heisenberg uncertainty
    principle," a quantum physics term which is commonly (yet inaccurately)
    used to refer to the way in which observers affect the measurements of the
    things that they are observing, by the act of observing alone (this is
    actually the observer effect, and is commonly confused with the Heisenberg
    uncertainty principle).

    Bohrbug: A bug (named after the Bohr atom model) that, in contrast to a
    heisenbug, does not disappear or alter its characteristics when it is
    researched. A Bohrbug typically manifests itself reliably under a
    well-defined set of conditions.

    Heisenbugs tend to be more prevalent in distributed systems than in local
    systems. One reason for this is the difficulty programmers have in
    obtaining a coherent and comprehensive view of the interactions of
    concurrent processes.

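
    The toy program below (not from the lecture) shows the kind of race
    condition that typically behaves as a heisenbug: two threads perform an
    unsynchronised read-modify-write on a shared counter, so some updates are
    silently lost, and adding print statements or running under a debugger
    often changes the thread schedule enough to make the bug seem to
    disappear.

      import threading

      counter = 0

      def worker(iterations=1_000_000):
          global counter
          for _ in range(iterations):
              current = counter        # read
              counter = current + 1    # write: may overwrite the other thread's update

      threads = [threading.Thread(target=worker) for _ in range(2)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()

      # Usually prints less than 2000000 because updates were lost in the race.
      print("expected 2000000, got", counter)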

  • Slide 14/17

    A little more specific about the types of failures in a Distributed
    System:

    1. Halting failures: A component simply stops. There is no way to detect
       the failure except by timeout: it either stops sending "I'm alive"
       (heartbeat) messages or fails to respond to requests. Your computer
       freezing is a halting failure. (A minimal timeout-based detection
       sketch follows this list.)

    2. Fail-stop: A halting failure with some kind of notification to other
       components. A network file server telling its clients it is about to
       go down is a fail-stop.

    3. Omission failures: Failure to send/receive messages, primarily due to
       lack of buffering space, which causes a message to be discarded with no
       notification to either the sender or receiver. This can happen when
       routers become overloaded.

    4. Network failures: A network link breaks.

    5. Network partition failures: A network fragments into two or more
       disjoint sub-networks within which messages can be sent, but between
       which messages are lost. This can occur due to a network failure.

    6. Timing failures: A temporal property of the system is violated. For
       example, clocks on different computers which are used to coordinate
       processes are not synchronized, or a message is delayed longer than a
       threshold period.

    7. Byzantine failures: This captures several types of faulty behaviors,
       including data corruption or loss, failures caused by malicious
       programs, etc.
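
    As a rough sketch of the timeout-based detection mentioned in item 1 (the
    component names and threshold are illustrative), the monitor below records
    the last heartbeat from each component and reports any component whose
    heartbeat is older than the threshold. Note that it cannot distinguish a
    halted peer from a slow or partitioned network.

      import time

      class HeartbeatMonitor:
          def __init__(self, timeout_seconds=3.0):
              self.timeout = timeout_seconds
              self.last_seen = {}          # component name -> last heartbeat time

          def heartbeat(self, component):
              # Called whenever an "I'm alive" message arrives from a component.
              self.last_seen[component] = time.monotonic()

          def suspected_failed(self):
              # Components whose last heartbeat is older than the threshold.
              now = time.monotonic()
              return [name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout]

      monitor = HeartbeatMonitor(timeout_seconds=0.5)
      monitor.heartbeat("file-server")
      monitor.heartbeat("print-server")
      time.sleep(0.6)
      monitor.heartbeat("print-server")    # only one component keeps reporting in
      print(monitor.suspected_failed())    # ['file-server']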

  • Slide 15/17

    Our goal

    Our goal is to design a distributed system with the characteristics listed
    above (fault-tolerant, highly available, recoverable, etc.), which means
    we must design for failure. To design for failure, we must be careful not
    to make any assumptions about the reliability of the components of a
    system.

  • Slide 16/17

    8 Fallacies

    Everyone, when they first build a distributed system, makes the following
    eight assumptions. These are so well known in this field that they are
    commonly referred to as the "8 Fallacies" (a small sketch at the end of
    this slide pushes back on the first two):

    The network is reliable.

    Latency is zero.

    Bandwidth is infinite.

    The network is secure.

    Topology doesn't change.

    There is one administrator.

    Transport cost is zero.

    The network is homogeneous.

    Latency: the time between initiating a request for data and the beginning
    of the actual data transfer.

    Bandwidth: A measure of the capacity of a communications channel. The
    higher a channel's bandwidth, the more information it can carry.

    Topology: The different configurations that can be adopted in building
    networks, such as a ring, bus, star or mesh.

    Homogeneous network: A network running a single network protocol.
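
    As a small counterpoint to the first two fallacies (the host names below
    are illustrative), this sketch gives every connection an explicit timeout
    instead of assuming the network is reliable, and measures the round trip
    instead of assuming latency is zero.

      import socket
      import time

      def timed_connect(host, port=80, timeout=2.0):
          # Returns either ("ok", seconds) or ("error", reason).
          start = time.monotonic()
          try:
              with socket.create_connection((host, port), timeout=timeout):
                  return ("ok", round(time.monotonic() - start, 4))
          except OSError as exc:
              return ("error", str(exc))

      for host in ["example.com", "192.0.2.1"]:   # the second address is unroutable
          status, detail = timed_connect(host)
          print(f"{host}: {status} ({detail})")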

  • Slide 17/17

    So How Is It Done?

    Next class