DDD Lecturer 1

Upload: thri-bagan

Posted on 05-Apr-2018


TRANSCRIPT

  • Slide 1/17

    Distributed Design & Development

    T. [email protected]

  • Slide 2/17

    Learning outcomes

    Understand the Microsoft architecture for enterprise applications

    Design a distributed application

    Build a distributed application

    Build and use components

  • Slide 3/17

    What you'll learn

    Where you can go

    Design techniques, tools & technology for the enterprise level

    How Microsoft Visual Studio supports large systems

    Multi-tier solutions based on the Windows DNA architecture

    The relevant aspects of Windows NT, MTS and IIS

    At the end you'll be able to sit the MCP exams in:

    Visual Basic Distributed Applications

    Analyzing Requirements & Defining Solution Architectures

  • Slide 4/17

    Assignments?

    An assignment will be set in one month's time, with a one-month duration

    An exam at the end of the term

    Remember: NO EXCUSES for late submission

    Your mark will simply be capped at 50

  • Slide 5/17

    The Basics

  • Slide 6/17

    What is a distributed system?

    It's one of those things that's hard to define without first defining many
    other things. Here is a "cascading" definition of a distributed system:

    A program is the code you write.

    A process is what you get when you run it.

    A message is used to communicate between processes.

    A packet is a fragment of a message that might travel on a wire.

    A protocol is a formal description of message formats and the rules that
    two processes must follow in order to exchange those messages.

    A network is the infrastructure that links computers, workstations,
    terminals, servers, etc. It consists of routers which are connected by
    communication links.

    A component can be a process or any piece of hardware required to run a
    process, support communications between processes, store data, etc.
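
    As a rough illustration of these terms, here is a minimal sketch (not from
    the lecture): two endpoints exchange a message over a socket using a tiny
    protocol, a 4-byte length prefix followed by a UTF-8 payload. The host,
    port and message text are made-up values, and for brevity the "server
    process" is just a thread inside the same program rather than a separate
    process on another machine.

      import socket
      import struct
      import threading

      HOST, PORT = "127.0.0.1", 9009          # assumed values for this sketch

      def send_message(sock, text):
          # Tiny protocol: 4-byte big-endian length prefix, then UTF-8 payload.
          payload = text.encode("utf-8")
          sock.sendall(struct.pack("!I", len(payload)) + payload)

      def recv_message(sock):
          # Read the fixed-size header, then exactly that many payload bytes.
          (length,) = struct.unpack("!I", sock.recv(4))
          payload = b""
          while len(payload) < length:
              payload += sock.recv(length - len(payload))
          return payload.decode("utf-8")

      ready = threading.Event()

      def server():
          # Stands in for a second process; in a real system it would run on
          # another machine and be reached over the network.
          with socket.socket() as srv:
              srv.bind((HOST, PORT))
              srv.listen(1)
              ready.set()                     # we are now accepting connections
              conn, _ = srv.accept()
              with conn:
                  print("server got:", recv_message(conn))
                  send_message(conn, "ack")

      threading.Thread(target=server, daemon=True).start()
      ready.wait()

      with socket.socket() as client:
          client.connect((HOST, PORT))
          send_message(client, "hello from the client process")
          print("client got:", recv_message(client))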

  • Slide 7/17

    A distributed system

    is an application that executes a collection of protocols to coordinate
    the actions of multiple processes on a network, such that all components
    cooperate together to perform a single or small set of related tasks.

  • Slide 8/17

    Why build a distributed system?

    There are lots of advantages, including the ability to connect remote
    users with remote resources in an open and scalable way. When we say open,
    we mean each component is continually open to interaction with other
    components. When we say scalable, we mean the system can easily be altered
    to accommodate changes in the number of users, resources and computing
    entities.

    Thus, a distributed system can be much larger and more powerful, given the
    combined capabilities of the distributed components, than combinations of
    stand-alone systems. But it's not easy - for a distributed system to be
    useful, it must be reliable. This is a difficult goal to achieve because
    of the complexity of the interactions between simultaneously running
    components.

  • Slide 9/17

    To be truly reliable, a distributed system must have the following
    characteristics:

    Fault-Tolerant: It can recover from component failures without performing
    incorrect actions (a minimal retry sketch follows this list).

    Highly Available: It can restore operations, permitting it to resume
    providing services even when some components have failed.

    Recoverable: Failed components can restart themselves and rejoin the
    system, after the cause of failure has been repaired.

    Consistent: The system can coordinate actions by multiple components,
    often in the presence of concurrency and failure. This underlies the
    ability of a distributed system to act like a non-distributed system.

    Scalable: It can operate correctly even as some aspect of the system is
    scaled to a larger size. For example, we might increase the size of the
    network on which the system is running. This increases the frequency of
    network outages and could degrade a "non-scalable" system. Similarly, we
    might increase the number of users or servers, or the overall load on the
    system. In a scalable system, this should not have a significant effect.

    Predictable Performance: The ability to provide desired responsiveness in
    a timely manner.

    Secure: The system authenticates access to data and services.
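
    As a rough sketch of one fault-tolerance building block (the fetch_profile
    call and its failure rate are hypothetical, and real systems need much
    more than this), the helper below retries a transient failure a bounded
    number of times with exponential backoff, and otherwise reports the fault
    instead of acting on a bad result.

      import random
      import time

      class TransientError(Exception):
          """Stands in for a timeout or a temporarily unavailable component."""

      def fetch_profile(user_id):
          # Hypothetical remote call; fails randomly to simulate transient faults.
          if random.random() < 0.5:
              raise TransientError("service temporarily unavailable")
          return {"user_id": user_id, "name": "example"}

      def call_with_retries(fn, *args, attempts=4, base_delay=0.1):
          for attempt in range(1, attempts + 1):
              try:
                  return fn(*args)
              except TransientError:
                  if attempt == attempts:
                      raise                                    # give up cleanly
                  time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff

      try:
          print(call_with_retries(fetch_profile, 42))
      except TransientError:
          print("still failing after retries; report the fault rather than act on a bad result")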

  • Slide 10/17

    Things you should consider

    Failure is the defining difference between distributed and local
    programming, so you have to design distributed systems with the
    expectation of failure. Imagine asking people, "If the probability of
    something happening is one in 10^13, how often would it happen?" Common
    sense would be to answer, "Never." That is an infinitely large number in
    human terms. But if you ask a physicist, she would say, "All the time. In
    a cubic foot of air, those things happen all the time."

  • Slide 11/17

    Things you should consider

    When you design distributed systems, you have to say, "Failure happens all
    the time." So when you design, you design for failure. It is your number
    one concern. What does designing for failure mean? One classic problem is
    partial failure. If I send a message to you and then a network failure
    occurs, there are two possible outcomes. One is that the message got to
    you, and then the network broke, and I just didn't get the response. The
    other is that the message never got to you because the network broke
    before it arrived.

    So if I never receive a response, how do I know which of those two results
    happened? I cannot determine that without eventually finding you. The
    network has to be repaired or you have to come up, because maybe what
    happened was not a network failure but you died. How does this change how
    I design things? For one thing, it puts a multiplier on the value of
    simplicity. The more things I can do with you, the more things I have to
    think about recovering from.
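
    The sketch below shows how this ambiguity surfaces in ordinary client code
    (the server address, port and "debit" request are made-up placeholders): a
    timeout while waiting for the reply tells the caller nothing about whether
    the request was applied, which is why retries are only safe for requests
    the server can recognise and de-duplicate.

      import socket
      import uuid

      def send_request(host, port, body, timeout=2.0):
          request_id = str(uuid.uuid4())      # lets the server de-duplicate a retry
          try:
              sock = socket.create_connection((host, port), timeout=timeout)
          except OSError as exc:
              return ("not sent", str(exc))   # never connected: request not applied
          try:
              with sock:
                  sock.sendall(f"{request_id} {body}\n".encode())
                  reply = sock.recv(1024)     # waits up to `timeout` for an answer
                  return ("ok", reply.decode())
          except socket.timeout:
              # The classic partial-failure ambiguity: either the request was
              # lost before it was applied, or it was applied and the reply
              # was lost. The client cannot tell which.
              return ("unknown", request_id)
          except OSError as exc:
              return ("error", str(exc))

      # 198.51.100.7 is a reserved test address, so this demo exercises the
      # failure paths rather than the happy path.
      print(send_request("198.51.100.7", 7000, "debit account 42 by 10"))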

  • Slide 12/17

    Handling failures

    Handling failures is an important theme in distributed systems design.

    Failures fall into two obvious categories: hardware and software.

    Hardware failures were a dominant concern until the late 1980s, but since
    then internal hardware reliability has improved enormously. Decreased heat
    production and power consumption of smaller circuits, reduction of
    off-chip connections and wiring, and high-quality manufacturing techniques
    have all played a positive role in improving hardware reliability.

    Today, problems are most often associated with connections and mechanical
    devices, i.e., network failures and drive failures.

  • Slide 13/17

    Software failures

    Software failures are a significant issue in distributed systems. Even
    with rigorous testing, software bugs account for a substantial fraction of
    unplanned downtime (estimated at 25-35%). Residual bugs in mature systems
    can be classified into two main categories.

    Heisenbug: A bug that seems to disappear or alter its characteristics when
    it is observed or researched. A common example is a bug that occurs in a
    release-mode compile of a program, but not when researched under debug
    mode. The name "heisenbug" is a pun on the "Heisenberg uncertainty
    principle," a quantum physics term which is commonly (yet inaccurately)
    used to refer to the way in which observers affect the measurements of the
    things that they are observing, by the act of observing alone (this is
    actually the observer effect, and is commonly confused with the Heisenberg
    uncertainty principle).

    Bohrbug: A bug (named after the Bohr atom model) that, in contrast to a
    heisenbug, does not disappear or alter its characteristics when it is
    researched. A Bohrbug typically manifests itself reliably under a
    well-defined set of conditions.

    Heisenbugs tend to be more prevalent in distributed systems than in local
    systems. One reason for this is the difficulty programmers have in
    obtaining a coherent and comprehensive view of the interactions of
    concurrent processes.

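
    The toy program below (not from the lecture) shows the kind of race
    condition that typically behaves as a heisenbug: two threads perform an
    unsynchronised read-modify-write on a shared counter, so some updates are
    silently lost, and adding print statements or running under a debugger
    often changes the thread schedule enough to make the bug seem to
    disappear.

      import threading

      counter = 0

      def worker(iterations=1_000_000):
          global counter
          for _ in range(iterations):
              current = counter        # read
              counter = current + 1    # write: may overwrite the other thread's update

      threads = [threading.Thread(target=worker) for _ in range(2)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()

      # Usually prints less than 2000000 because updates were lost in the race.
      print("expected 2000000, got", counter)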

  • Slide 14/17

    A little more specific about the types of failures in a Distributed
    System:

    1. Halting failures: A component simply stops. There is no way to detect
       the failure except by timeout: it either stops sending "I'm alive"
       (heartbeat) messages or fails to respond to requests. Your computer
       freezing is a halting failure. (A minimal timeout-based detection
       sketch follows this list.)

    2. Fail-stop: A halting failure with some kind of notification to other
       components. A network file server telling its clients it is about to
       go down is a fail-stop.

    3. Omission failures: Failure to send/receive messages, primarily due to
       lack of buffering space, which causes a message to be discarded with no
       notification to either the sender or receiver. This can happen when
       routers become overloaded.

    4. Network failures: A network link breaks.

    5. Network partition failures: A network fragments into two or more
       disjoint sub-networks within which messages can be sent, but between
       which messages are lost. This can occur due to a network failure.

    6. Timing failures: A temporal property of the system is violated. For
       example, clocks on different computers which are used to coordinate
       processes are not synchronized, or a message is delayed longer than a
       threshold period.

    7. Byzantine failures: This captures several types of faulty behaviors,
       including data corruption or loss, failures caused by malicious
       programs, etc.
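
    As a rough sketch of the timeout-based detection mentioned in item 1 (the
    component names and threshold are illustrative), the monitor below records
    the last heartbeat from each component and reports any component whose
    heartbeat is older than the threshold. Note that it cannot distinguish a
    halted peer from a slow or partitioned network.

      import time

      class HeartbeatMonitor:
          def __init__(self, timeout_seconds=3.0):
              self.timeout = timeout_seconds
              self.last_seen = {}          # component name -> last heartbeat time

          def heartbeat(self, component):
              # Called whenever an "I'm alive" message arrives from a component.
              self.last_seen[component] = time.monotonic()

          def suspected_failed(self):
              # Components whose last heartbeat is older than the threshold.
              now = time.monotonic()
              return [name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout]

      monitor = HeartbeatMonitor(timeout_seconds=0.5)
      monitor.heartbeat("file-server")
      monitor.heartbeat("print-server")
      time.sleep(0.6)
      monitor.heartbeat("print-server")    # only one component keeps reporting in
      print(monitor.suspected_failed())    # ['file-server']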

  • Slide 15/17

    Our goal

    Our goal is to design a distributed system with the characteristics listed
    above (fault-tolerant, highly available, recoverable, etc.), which means
    we must design for failure. To design for failure, we must be careful not
    to make any assumptions about the reliability of the components of a
    system.

  • Slide 16/17

    8 Fallacies

    Everyone, when they first build a distributed system, makes the following
    eight assumptions. These are so well known in this field that they are
    commonly referred to as the "8 Fallacies" (a small sketch at the end of
    this slide pushes back on the first two):

    The network is reliable.

    Latency is zero.

    Bandwidth is infinite.

    The network is secure.

    Topology doesn't change.

    There is one administrator.

    Transport cost is zero.

    The network is homogeneous.

    Latency: the time between initiating a request for data and the beginning
    of the actual data transfer.

    Bandwidth: A measure of the capacity of a communications channel. The
    higher a channel's bandwidth, the more information it can carry.

    Topology: The different configurations that can be adopted in building
    networks, such as a ring, bus, star or mesh.

    Homogeneous network: A network running a single network protocol.
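
    As a small counterpoint to the first two fallacies (the host names below
    are illustrative), this sketch gives every connection an explicit timeout
    instead of assuming the network is reliable, and measures the round trip
    instead of assuming latency is zero.

      import socket
      import time

      def timed_connect(host, port=80, timeout=2.0):
          # Returns either ("ok", seconds) or ("error", reason).
          start = time.monotonic()
          try:
              with socket.create_connection((host, port), timeout=timeout):
                  return ("ok", round(time.monotonic() - start, 4))
          except OSError as exc:
              return ("error", str(exc))

      for host in ["example.com", "192.0.2.1"]:   # the second address is unroutable
          status, detail = timed_connect(host)
          print(f"{host}: {status} ({detail})")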

  • Slide 17/17

    So How Is It Done?

    Next class