intro to distributed systems - github pages

Intro to distributed systems

What is a distributed system?

A distributed system consists of hardware and software components located in a network of computers that communicate and coordinate their actions only by passing messages. [Coulouris]

A distributed system is a collection of independent computers that appears to its users as a single coherent system. [Tanenbaum & van Steen]

A distributed system is a system that prevents you from doing any work when a computer you have never heard about fails. [Lamport]

• The above definitions take different perspectives
  • Operational perspective
  • User perspective
  • DS characteristics perspective

Examples of distributed systems

• Intranet, Internet, WWW
• DNS: hierarchical distributed database
• Network of workstations (NOW), cluster computers
• Email
• Electronic banking
• Airline reservation system
• Peer-to-peer networks
• Sensor networks
• Mobile and pervasive computing
• Cellular phone systems
• IP telephony
• Flight management system in an aircraft
• Automotive control systems (50+ embedded processors in a Mercedes S-class)
• Distributed file systems (NFS, Samba)
• P2P file sharing
• Etc.

The Internet

Google

"Google is technologically a large supercomputer. It's a distributed supercomputer among many data centers doing all sorts of interesting things over fiber optic network that eventually are services available to end-users." Eric Schmidt, Google CEO, 2007

"Google runs on hundreds of thousands of servers (by one estimate, in excess of 450,000) racked up in thousands of clusters in dozens of data centers around the world. It has data centers in Dublin, Ireland; in Virginia; and in California, where it just acquired the million-square-foot headquarters it had been leasing. It recently opened a new center in Atlanta, and is currently building two football-field-sized centers in The Dalles, Ore." (2006)

An estimate today says that Google runs more than 2,000,000 servers in 36 data centers around the globe.

Google

"Our view is it's better to have twice as much hardware that's not as reliable than half as much that's more reliable. You have to provide reliability on a software level. If you're running 10,000 machines, something is going to die every day." Jeff Dean, Google Fellow, 2008

"A typical search will require actions from between 700 to 1,000 machines today." Marissa Mayer, vice president of Google's search products and user experience, 2008

Google processes over 40,000 queries every second on average, which translates to over 3.5 billion searches per day (internetlivestats).

"Our current generation, Jupiter fabrics, can deliver more than 1 Petabit/sec of total bisection bandwidth. To put this in perspective, such capacity would be enough for 100,000 servers to exchange information at 10 Gb/s each, enough to read the entire scanned contents of the Library of Congress in less than 1/10th of a second." Amin Vahdat, Google Fellow, 2015
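The figures quoted above can be checked with a few lines of arithmetic. This is only a sanity check under the stated numbers; the variable names are illustrative, and the query rate is taken as exactly 40,000 per second even though the slide says "over 40,000".

```python
# Bisection bandwidth: 100,000 servers exchanging data at 10 Gb/s each.
servers = 100_000
per_server_bits_per_s = 10 * 10**9
total_bits_per_s = servers * per_server_bits_per_s
petabits_per_s = total_bits_per_s / 10**15

# Search load: queries per second scaled up to one day.
queries_per_s = 40_000          # assumption: exactly 40,000, the slide says "over"
queries_per_day = queries_per_s * 24 * 60 * 60

print(petabits_per_s)    # petabits per second
print(queries_per_day)   # queries per day at exactly 40,000 per second
```

At exactly 40,000 queries per second the daily total is slightly under 3.5 billion, so the "over 3.5 billion" figure corresponds to an average rate a little above 40,000.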

Google data centers

E.g., Dallas site: three 68,000 square foot data center buildings

https://www.google.com/about/datacenters/

Distributed systems vs centralized systems

• Concurrency
  • In centralized systems, concurrency is a design choice.
  • In distributed systems, computers run concurrently.

• Independent and partial failures
  • Centralized systems usually fail completely.
  • Distributed systems usually fail partially, often because of communication. When a component fails, the others are still running. Detecting failures may be hard. Recovery is also hard because the state of an application is distributed.

• Absence of global clock
  • In centralized systems, the physical clock of the computer can be used for synchronization.
  • In distributed systems, clocks may not be in sync.
  • Example: bank account, starting balance = $100
    • Client at bank machine A makes a deposit of $100
    • Client at bank machine B makes a withdrawal of $150
    • Which event happened first?
    • Should the bank charge the overdraft fee?
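The bank account example can be made concrete with a small simulation. The sketch below is illustrative only: the event times, the clock offsets, and the names `Event` and `overdraft_charged` are assumptions, not part of the slides. It replays the same two events ordered by each machine's local timestamp and shows that a one-second clock skew is enough to change whether the account appears overdrawn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    machine: str      # bank machine that recorded the event
    kind: str         # "deposit" or "withdrawal"
    amount: int       # dollars
    true_time: float  # seconds on an imaginary perfect clock


def overdraft_charged(events, skew, start_balance=100):
    """Replay events ordered by local timestamp (true time + machine skew).

    Returns True if the balance ever goes negative during the replay.
    """
    ordered = sorted(events, key=lambda e: e.true_time + skew[e.machine])
    balance = start_balance
    for e in ordered:
        balance += e.amount if e.kind == "deposit" else -e.amount
        if balance < 0:
            return True
    return False


events = [
    Event("A", "deposit", 100, true_time=10.0),
    Event("B", "withdrawal", 150, true_time=10.5),
]

# Clocks in sync: the deposit is ordered first, no overdraft.
in_sync = overdraft_charged(events, skew={"A": 0.0, "B": 0.0})
# Machine B's clock is one second behind: the withdrawal appears to come first.
b_runs_slow = overdraft_charged(events, skew={"A": 0.0, "B": -1.0})
print(in_sync, b_runs_slow)
```

The events themselves are identical in both runs; only the timestamps differ, which is why physical clocks alone cannot settle the question.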

The eight fallacies of distributed systems

"Essentially everyone, when they first build a distributed application, makes the following eight simplifying assumptions. All false in the long run." (Peter Deutsch)

1. The network is reliable.
2. The network is secure.
3. The network is homogeneous.
4. The topology does not change.
5. Latency is zero.
6. Bandwidth is infinite.
7. Transport cost is zero.
8. There is one administrator.

These assumptions ultimately prove false, resulting either in the failure of the system, a substantial reduction in system scope, or in large, unplanned expenses required to redesign the system to meet its original goals.

Building distributed systems is hard!

Why Distributed Systems

• Functional distribution
  • Computers have different functional capabilities (e.g., file server, printer) yet may need to share resources
  • Client/server
  • Data gathering/data processing

• Incremental growth
  • Easier to evolve the system
  • Modular expandability

• Inherent distribution in application domain
  • Banks, reservation services, distributed games, mobile apps
  • Physically or across administrative domains
  • Cash register and inventory systems for supermarket chains
  • Computer-supported collaborative work

Why Distributed Systems

• Economics
  • Collections of microprocessors offer a better price/performance ratio than large mainframes.
  • Low price/performance ratio: a cost-effective way to increase computing power.

• Better performance
  • Load balancing
  • Replication of processing power
  • A distributed system may have more total computing power than a mainframe. Ex.: 10,000 CPU chips, each running at 50 MIPS. It is not possible to build a single 500,000 MIPS processor, since it would require a 0.002 nsec instruction cycle. Enhanced performance through load distribution.

• Increased reliability
  • Exploit the independent failures property
  • If one machine crashes, the system as a whole can still survive.

• Another driving force: the existence of a large number of personal computers, and the need for people to collaborate and share information.
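The 0.002 nsec figure follows directly from the numbers in the example. The short calculation below reproduces it; the final step (how far light travels in one such cycle) is an added illustration, not from the slides, of why a single processor that fast is physically implausible.

```python
cpus = 10_000
mips_per_cpu = 50

total_mips = cpus * mips_per_cpu                    # aggregate MIPS of the cluster
instructions_per_second = total_mips * 1_000_000
cycle_ns = 1e9 / instructions_per_second            # nanoseconds per instruction

# Added illustration: distance light covers in one instruction cycle.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
light_mm_per_cycle = SPEED_OF_LIGHT_M_PER_S * (cycle_ns * 1e-9) * 1000

print(total_mips, cycle_ns, light_mm_per_cycle)
```

A signal could cover well under a millimetre per instruction, which is smaller than a processor die.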

Goals and challenges of distributed systems

• Transparency
  • How to achieve the single-system image

• Performance
  • The system provides high (computing, storage, ...) performance

• Scalability
  • The ability to serve more users and provide acceptable response times with an increased amount of data

• Openness
  • An open distributed system can be extended and improved incrementally
  • Requires publication of component interfaces and standard protocols for accessing interfaces

• Reliability/fault tolerance
  • Maintain availability even when individual components fail

• Heterogeneity
  • Network, hardware, operating system, programming languages, different developers

• Security
  • Confidentiality, integrity and availability

Transparency

• How to achieve the single-system image, i.e., how to make a collection of computers appear as a single computer.

• Hiding the distribution at two levels:
  • Hide the distribution from users
  • At a lower level, make the system look transparent to programs.
  -> Requires uniform interfaces such as access to files, communication.

• Different forms of transparency in a DS (ISO, 1995).
• Trade-off between transparency and performance of a system

Transparency in distributed systems

Access transparency: enables local and remote resources to be accessed using identical operations.

• Dropbox
• SQL queries

Location transparency: enables resources to be accessed without knowledge of their physical or network location (for example, which building or IP address).

• Users cannot tell where hardware and software resources such as CPUs, printers, files, databases are located.
• Navigation in the web
• Tables in distributed database

Migration transparency: resources must be free to move from one location to another without their names changed. E.g., /usr/lee, /central/usr/lee


Concurrency transparency: enables several processes to operate concurrently using shared resources without interference.

• The users are not aware of the existence of other users.
• Need to allow multiple users to concurrently access the same resource. Lock and unlock for mutual exclusion.
• Distributed file system, distributed database

Parallelism transparency: automatic use of parallelism without having to program explicitly. The holy grail for distributed and parallel system designers.
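The "lock and unlock for mutual exclusion" idea can be sketched on a single machine with threads. This is a local analogy only, with an illustrative `SharedCounter` class; a distributed file system or database needs a distributed locking or transaction protocol, which is beyond this sketch.

```python
import threading


class SharedCounter:
    """A shared resource whose updates are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "with" acquires the lock on entry and releases it on exit,
        # so only one thread at a time performs the read-modify-write.
        with self._lock:
            self.value += 1


def worker(counter, times):
    for _ in range(times):
        counter.increment()


counter = SharedCounter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
```

Each worker behaves as if it were the only user of the counter, which is the user-visible effect concurrency transparency aims for.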


Replication transparency: enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers.

• OS can make additional copies of files and resources without users noticing.

Failure transparency: enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.

• DBMS, Big Data processing systems


Mobility transparency: allows the movement of resources and clients within a system without affecting the operation of users or programs.

• Roaming, moving between two access points.

Performance transparency: allows the system to be reconfigured to improve performance as loads vary.

Scaling transparency: allows the system and applications to expand in scale without change to the system structure or the application algorithms.


In certain cases transparency is impracticable or not convenient

• Some things cannot be made transparent
  • Time zones
  • Communication delays

• Hiding too much may have a negative performance impact
  • Accessing a remote object multiple times without knowing it is remote

• Sometimes transparency is just undesirable
  • Users do not always want complete transparency: a fancy printer 1000 miles away

Reliability

• Hardware, software and networks fail
  • A DS must maintain availability even in cases where hardware/software/network have low reliability

• Failures in distributed systems are partial
  • Makes error handling particularly difficult

• Detection of failures may be impossible
  • In some cases it is easy, e.g., checksum in communication
  • Has a component crashed? Or is it just slow?
  • Is the network down? Or is it just slow?
  • If it's slow, how long should we wait?

• Many techniques for handling failures
  • Masking failures (retransmission in protocols)
  • Tolerating failures, degrading the offered service (as in web browsers)
  • Recovery from failures (periodically save state of a component, roll back partially completed task)
  • Redundancy (replicate servers in failure-independent ways, duplicate network routes)
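The "crashed or just slow?" question shows up as soon as one tries to write a failure detector. Below is a minimal sketch of a timeout-based detector; the node names, heartbeat times, and timeout values are hypothetical. It cannot distinguish a crashed node from a slow one; it only reports which nodes have been silent for longer than the chosen timeout.

```python
def suspected(last_heartbeat, now, timeout):
    """Return the set of nodes whose last heartbeat is older than `timeout` seconds."""
    return {node for node, t in last_heartbeat.items() if now - t > timeout}


# Hypothetical observations: time (in seconds) each node was last heard from.
last_heartbeat = {"n1": 99.5, "n2": 97.0, "n3": 80.0}
now = 100.0

impatient = suspected(last_heartbeat, now, timeout=2.0)
patient = suspected(last_heartbeat, now, timeout=10.0)
print(sorted(impatient), sorted(patient))
```

A short timeout reacts quickly but suspects `n2`, which may only be slow; a long timeout avoids that false suspicion at the price of taking longer to notice a real crash. The choice of timeout is the "how long should we wait?" trade-off.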

Reliability

• A distributed system should be more reliable than a single system.
• Example:
  • Single machine: 0.95 probability of being up.
  • System with 3 machines (all machines need to break, failures assumed independent): 1 - 0.05**3 = 0.999875 probability of being up.

Availability: fraction of time the system is usable.
• Redundancy improves it
• Recovery between failures
• Need to maintain consistency
• Need to mask failures
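The example generalizes to any number of machines. The sketch below assumes independent failures and uses illustrative function names. It also computes the opposite case, where every machine is required, to show that adding machines without redundancy makes a system less reliable.

```python
def availability_any(p_up, n):
    """Probability that at least one of n independent replicas is up."""
    return 1 - (1 - p_up) ** n


def availability_all(p_up, n):
    """Probability that all n independent machines are up (each one is required)."""
    return p_up ** n


replicated = availability_any(0.95, 3)   # system survives unless all three fail
chained = availability_all(0.95, 3)      # system fails if any one fails
print(replicated, chained)
```

With three replicas the system is up about 99.99% of the time; with three machines that are all required it is up only about 86% of the time, which is the situation Lamport's definition jokes about.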

Performance

• Without a gain in performance, why bother with distributed systems?
• Performance loss due to communication delays:
  • fine-grain parallelism: high degree of interaction
  • coarse-grain parallelism
• Performance loss due to making the system fault tolerant.

Scalability

• Does the system remain effective as it grows?
• As you add more components:
  • More synchronization
  • More communication -> the system runs slowly.

• A system is scalable if it remains effective when there is a significant increase in the amount of resources (data) and number of users
  • Internet: number of users and services has grown enormously

• Scalability denotes the ability of a system to handle an increasing future load

Scalability

• Requirements of scalability often lead to a distributed system architecture (several computers)

• Systems grow with time or become obsolete.
• Techniques that require resources linearly in terms of the size of the system are not scalable.
  • E.g., broadcast-based query won't work for large distributed systems.

• Examples of bottlenecks: everyone is waiting for a single shared resource
  • Centralized services: a single mail server
  • Centralized data: a single URL address book
  • Centralized algorithms: routing based on complete information

Scaling techniques

Distribution
• Splitting a resource (such as data) into smaller parts, and spreading the parts across the system (cf. DNS)
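One common way to split data into parts is hash partitioning. Note that this is a different technique from the hierarchical split DNS uses; it is shown here only as a minimal sketch of "spreading the parts across the system". The function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    cryptographic digest is used to get the same answer on every machine.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Distribute keys over num_shards lists according to shard_for."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


keys = [f"user-{i}" for i in range(1000)]
shards = partition(keys, num_shards=4)
print([len(s) for s in shards])
```

Any machine can compute which shard holds a key without asking a central directory, so no single node has to know or store everything.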

Scaling techniques: DNS

Recursive mode also possible. What is the issue?

Initially, all host-address mappings were in a file hosts.txt (in /etc/hosts)
• Changes were submitted to SRI (Stanford Research Institute) by email
• New versions of hosts.txt ftp'd periodically from SRI
• An administrator could pick names at their discretion
• Any name is allowed: eugenesdesktopatrice (flat name space)

As the Internet grew this system broke:
• SRI couldn't handle the load
• Hard to enforce name uniqueness
• Many hosts: inaccurate hosts.txt

Domain Name System (DNS) was born in '83
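The hierarchical idea behind DNS can be sketched with a toy resolver working in iterative mode, where the client follows referrals itself. Everything in the sketch is invented for illustration: the server names, the zone contents, and the address (taken from the 192.0.2.0/24 documentation range). Real DNS involves record types, caching, and a wire protocol that are not modelled here.

```python
# Toy zone data: each server knows only its own zone and whom to refer to.
SERVERS = {
    "root": {"referrals": {"edu": "edu-server"}, "records": {}},
    "edu-server": {"referrals": {"rice.edu": "rice-server"}, "records": {}},
    "rice-server": {"referrals": {}, "records": {"eugene.rice.edu": "192.0.2.10"}},
}


def resolve(name):
    """Iteratively resolve `name`; return (address or None, servers contacted)."""
    server = "root"
    path = []
    while True:
        path.append(server)
        zone = SERVERS[server]
        if name in zone["records"]:
            return zone["records"][name], path
        # Follow the referral whose domain suffix matches the name.
        for suffix, next_server in zone["referrals"].items():
            if name == suffix or name.endswith("." + suffix):
                server = next_server
                break
        else:
            return None, path  # no record and no matching referral


address, path = resolve("eugene.rice.edu")
print(address, path)
```

No server holds the full name space: the root only knows who is responsible for `edu`, and only the last server knows the host. That is what removes the single hosts.txt bottleneck.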

Scaling techniques

• Replication
  • Replicate resources (services, data) across the system, so they can be accessed in multiple places
  • Caching to avoid recomputation
  • Increased availability reduces the probability that a bigger system breaks

• Hiding communication latencies
  • Avoid waiting for responses to remote service requests
  • Use asynchronous communication
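The benefit of asynchronous communication can be sketched with `asyncio`. The `remote_call` coroutine below is a stand-in for a real remote request: it only sleeps to simulate network latency, and the delays are arbitrary. The sketch issues three requests one after another and then issues the same three so that their waiting times overlap.

```python
import asyncio
import time


async def remote_call(name, delay):
    """Stand-in for a remote request; asyncio.sleep simulates network latency."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def sequential(calls):
    # Wait for each response before sending the next request.
    return [await remote_call(name, delay) for name, delay in calls]


async def overlapped(calls):
    # Send all requests first, then wait for all responses together.
    return list(await asyncio.gather(*(remote_call(n, d) for n, d in calls)))


calls = [("a", 0.05), ("b", 0.05), ("c", 0.05)]

start = time.perf_counter()
seq_results = asyncio.run(sequential(calls))
seq_elapsed = time.perf_counter() - start

start = time.perf_counter()
par_results = asyncio.run(overlapped(calls))
par_elapsed = time.perf_counter() - start

print(f"sequential {seq_elapsed:.3f}s, overlapped {par_elapsed:.3f}s")
```

Both versions return the same results; the overlapped version takes roughly the time of the slowest request instead of the sum of all of them.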

Scaling techniques

• Reducing amount of remote requests
  • Form validation example: (a) the server checks the form fields as they are being filled in, or (b) the client does.
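The difference between the two options can be counted in requests. The sketch below is a toy model with made-up validation rules and class names; the "server" is an in-process object that simply counts how often it is called.

```python
def valid(field, value):
    """Hypothetical validation rules, usable by both client and server."""
    if field == "email":
        return "@" in value
    if field == "age":
        return value.isdigit()
    return value != ""


class Server:
    def __init__(self):
        self.requests = 0  # number of remote requests received

    def check_field(self, field, value):
        self.requests += 1
        return valid(field, value)

    def submit(self, form):
        self.requests += 1
        # The server still validates on submit; it cannot trust the client.
        return all(valid(f, v) for f, v in form.items())


def server_side_checks(server, form):
    """Option (a): one remote request per field, then the final submit."""
    for field, value in form.items():
        if not server.check_field(field, value):
            return False
    return server.submit(form)


def client_side_checks(server, form):
    """Option (b): validate locally, contact the server only to submit."""
    if not all(valid(f, v) for f, v in form.items()):
        return False
    return server.submit(form)


form = {"name": "Lee", "email": "lee@example.org", "age": "42"}
server_a, server_b = Server(), Server()
ok_a = server_side_checks(server_a, form)
ok_b = client_side_checks(server_b, form)
print(server_a.requests, server_b.requests)
```

Both options accept the form, but option (a) costs one round trip per field plus the submit, while option (b) needs a single request.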

Openness

• Can the system be extended and reimplemented in various ways?

• To be achieved:
  • Publish all key aspects of the system
    • Protocols
    • Interfaces to services
  • Adopt standards as much as possible
  • Take design decisions that favor interoperability and portability

Example: the Internet. RFCs and an open standardization body (IETF)

Heterogeneity

• Hardware and software (e.g., operating systems, processors)
  • How can an Intel/Windows system understand messages sent by a Macintosh OS X system?
  • Different performance. E.g., mobile devices have low computing power

• Different network infrastructures (Ethernet, 802.11 wireless)
• Programming languages
  • How can a Java program and a C program communicate?
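A common answer to the "how can they communicate?" questions is to agree on an explicit wire format that does not depend on either machine's native representation. The sketch below uses Python's `struct` module with network (big-endian) byte order; the message layout, a 32-bit sensor id followed by a 16-bit temperature, is a hypothetical example.

```python
import struct

# Hypothetical wire format: unsigned 32-bit id, signed 16-bit temperature.
# The "!" prefix selects network byte order (big-endian) with no padding.
WIRE_FORMAT = "!Ih"


def encode(sensor_id, temperature):
    """Serialize a reading into bytes that any platform can decode."""
    return struct.pack(WIRE_FORMAT, sensor_id, temperature)


def decode(payload):
    """Parse bytes produced by encode() back into (sensor_id, temperature)."""
    return struct.unpack(WIRE_FORMAT, payload)


payload = encode(1, -5)
print(payload.hex(), decode(payload))

# The same integer laid out in little-endian order has its bytes reversed,
# which is why sender and receiver must agree on one order.
little_endian_id = struct.pack("<I", 1)
```

A receiver written in C or Java that reads four big-endian bytes followed by two big-endian bytes recovers the same values, regardless of its own processor's byte order.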

Security

• Security for the information resources made available and maintained in the distributed system has three components
  • Confidentiality: protection against disclosure to unauthorized individuals
  • Integrity: protection against alteration or corruption
  • Availability: protection against interference with the means to access the resource (e.g., DoS attack)

• Encryption is a powerful mechanism but several issues are still open
  • DoS attacks
  • Mobile code
  • …

Example: DNS spoofing

(More) Basic concepts

Parallel vs. distributed computing

Middleware

• Middleware provides horizontal services to help build distributed applications
• It masks platform differences
• Example: message-oriented middleware
  • Store (buffer), route, or transform messages as they travel from senders to receivers
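The three roles of message-oriented middleware (store, route, transform) can be sketched with a toy in-process broker. The `Broker` class and its methods are invented for this illustration and do not correspond to any particular product; a real broker runs as a separate networked service and persists its queues.

```python
from collections import defaultdict, deque


class Broker:
    """Toy broker: buffers messages per topic until a receiver polls for them."""

    def __init__(self, transform=None):
        self.queues = defaultdict(deque)                  # one FIFO queue per topic
        self.transform = transform or (lambda body: body)

    def publish(self, topic, body):
        # Route by topic, transform the body, and store it until delivery.
        self.queues[topic].append(self.transform(body))

    def poll(self, topic):
        """Return the oldest buffered message for `topic`, or None if empty."""
        queue = self.queues[topic]
        return queue.popleft() if queue else None


broker = Broker(transform=str.upper)

# The sender publishes while no receiver is listening; messages are buffered.
broker.publish("orders", "buy 10")
broker.publish("orders", "sell 3")
broker.publish("alerts", "disk full")

first = broker.poll("orders")
second = broker.poll("orders")
empty = broker.poll("orders")
print(first, second, empty)
```

Sender and receiver never talk to each other directly and do not need to be running at the same time, which is how the middleware decouples them.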

Intranet: A portion of the Internet

[Figure: intranets connected through ISPs, a backbone and a satellite link. Legend: desktop computer, server, network link.]

Intranet

A portion of the Internet that
• is separately administered
• usually proprietary
• provides internal and external services
• can be configured to enforce local security policies
• may use a firewall to prevent unauthorized messages leaving or entering
• may be connected to the Internet via a router

Services:
• File, print services, backup, program-sharing, user-, system-administration, Internet access

[Figure: a typical intranet. Desktop computers, email server, web server, file server, print and other servers on a local area network, connected through a router/firewall to the rest of the Internet.]

Throughput/Latency

• Latency: "wire delay"
  • Time to send and receive one byte of data
  • Depends on "distance"

• Throughput
  • Bytes per second
  • Depends on the size of the vehicle

• Latency is often the bottleneck
  • Improves more slowly than bandwidth
  • Speed of light
  • Routes in the middle (traffic stops)
  • Request-response cycle often dominates the application
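A first-order model makes the latency/throughput distinction concrete: the time to move a message is one latency plus the message size divided by the throughput. The link parameters below (50 ms latency, 100 MB/s) are hypothetical, and the model ignores protocol overhead and congestion.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order model: one latency plus the time to push the bytes through."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical link: 50 ms latency, 100 MB/s throughput.
LATENCY = 0.050
BANDWIDTH = 100e6

small = transfer_time(1_000, LATENCY, BANDWIDTH)           # 1 KB request
large = transfer_time(1_000_000_000, LATENCY, BANDWIDTH)   # 1 GB bulk transfer

small_latency_share = LATENCY / small
large_latency_share = LATENCY / large
print(f"{small:.5f}s ({small_latency_share:.1%} latency)")
print(f"{large:.2f}s ({large_latency_share:.1%} latency)")
```

For the small request almost all of the time is latency, so a faster link would barely help; for the bulk transfer the opposite holds. Applications built from many small request-response exchanges therefore tend to be latency-bound.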

Performance scales

Register            1
L2                 10
Memory            200
LAN           100,000
Disk        2,000,000
WAN        20,000,000

Exercise

Reading
• There's Just No Getting around It: You're Building a Distributed System [Mark Cavage]

• Answer questions in the exercise sheet available on the website
• Open questions for discussion

Questions?
