intro to distributed systems - github pages


Post on 29-Dec-2021


Page 1: Intro to distributed systems - GitHub Pages

Intro to distributed systems


What is a distributed system?

A distributed system consists of hardware and software components located in a network of computers that communicate and coordinate their actions only by passing messages. [Coulouris]

A distributed system is a collection of independent computers that appears to its users as a single coherent system. [Tanenbaum & van Steen]

A distributed system is a system that prevents you from doing any work when a computer you have never heard about, fails. [Lamport]

• The above definitions take different perspectives
  • Operational perspective
  • User perspective
  • DS characteristics perspective


Examples of distributed systems

• Intra-net, inter-net, WWW
• DNS: Hierarchical distributed database
• Network of workstations (NOW), Cluster Computers
• Email
• Electronic banking
• Airline reservation system
• Peer-to-peer networks
• Sensor networks
• Mobile and Pervasive Computing
• Cellular phone systems
• IP Telephony
• Flight management system in an aircraft
• Automotive control systems (50+ embedded processors in a Mercedes S-class)
• Distributed file systems (NFS, Samba)
• P2P file sharing
• Etc., etc., etc.


The Internet


Google

“Google is technologically a large supercomputer. It's a distributed supercomputer among many data centers doing all sorts of interesting things over fiber optic network that eventually are services available to end-users.” Eric Schmidt, Google CEO, 2007

“Google runs on hundreds of thousands of servers (by one estimate, in excess of 450,000) racked up in thousands of clusters in dozens of data centers around the world. It has data centers in Dublin, Ireland; in Virginia; and in California, where it just acquired the million-square-foot headquarters it had been leasing. It recently opened a new center in Atlanta, and is currently building two football-field-sized centers in The Dalles, Ore.” 2006

An estimate today says that Google runs more than 2,000,000 servers in 36 data centers around the globe


Google

“Our view is it's better to have twice as much hardware that's not as reliable than half as much that's more reliable. You have to provide reliability on a software level. If you're running 10,000 machines, something is going to die every day.” Jeff Dean, Google Fellow, 2008

“A typical search will require actions from between 700 to 1,000 machines today.” Marissa Mayer, vice president of Google’s search products and user experience, 2008

Google processes over 40,000 queries every second on average, which translates to over 3.5 billion searches per day (Internet Live Stats)

“Our current generation, Jupiter fabrics, can deliver more than 1 Petabit/sec of total bisection bandwidth. To put this in perspective, such capacity would be enough for 100,000 servers to exchange information at 10 Gb/s each, enough to read the entire scanned contents of the Library of Congress in less than 1/10th of a second.” Amin Vahdat, Google Fellow, 2015


Google data centers

E.g., Dallas site: Three 68,000 square foot data center buildings

https://www.google.com/about/datacenters/


Distributed systems vs centralized systems

• Concurrency
  • In centralized systems, concurrency is a design choice.
  • In distributed systems, computers run concurrently.
• Independent and partial failures
  • Centralized systems usually fail completely.
  • Distributed systems usually fail partially, often because of communication. When a component fails, the others are still running. Detecting failures may be hard. Recovery is also hard because the state of an application is distributed.
• Absence of global clock
  • In centralized systems, the physical clock of the computer can be used for synchronization.
  • In distributed systems clocks may not be in sync
  • Example: Bank account, starting balance = $100
    • Client at bank machine A makes a deposit of $100
    • Client at bank machine B makes a withdrawal of $150
    • Which event happened first?
    • Should the bank charge the overdraft fee?
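The bank account example can be played through in a few lines. The following is only a sketch under invented assumptions: the event times, the 3-second clock offset on machine A, and the names `Event`, `local_timestamp` and `overdrawn` are all hypothetical. It shows that ordering events by local timestamps can disagree with the real order, which changes whether an overdraft is detected.

```python
from dataclasses import dataclass


@dataclass
class Event:
    machine: str      # which bank machine recorded the event
    action: str       # "deposit" or "withdraw"
    amount: int
    true_time: float  # real time in seconds; no machine can observe this directly


def local_timestamp(event, skew):
    """Timestamp the machine would record: true time plus its clock offset."""
    return event.true_time + skew[event.machine]


def overdrawn(events, start_balance=100):
    """Replay events in the given order and report whether the balance goes negative."""
    balance = start_balance
    for e in events:
        balance += e.amount if e.action == "deposit" else -e.amount
        if balance < 0:
            return True
    return False


# Hypothetical scenario: the deposit really happens 1 second before the withdrawal.
deposit = Event("A", "deposit", 100, true_time=10.0)
withdrawal = Event("B", "withdraw", 150, true_time=11.0)

# Assumed clock offsets: machine A runs 3 seconds fast, machine B is exact.
skew = {"A": 3.0, "B": 0.0}

true_order = sorted([deposit, withdrawal], key=lambda e: e.true_time)
recorded_order = sorted([deposit, withdrawal], key=lambda e: local_timestamp(e, skew))

print([e.machine for e in true_order], overdrawn(true_order))
print([e.machine for e in recorded_order], overdrawn(recorded_order))
```

In real order the deposit comes first and the account never goes negative. With the skewed timestamps the withdrawal appears to come first, so the bank would charge a fee the customer does not owe.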


The eight fallacies of distributed systems

“Essentially everyone, when they first build a distributed application, makes the following eight simplifying assumptions. All false in the long run.” (Peter Deutsch)

1. The network is reliable.
2. The network is secure.
3. The network is homogeneous.
4. The topology does not change.
5. Latency is zero.
6. Bandwidth is infinite.
7. Transport cost is zero.
8. There is one administrator.

These assumptions ultimately prove false, resulting either in the failure of the system, a substantial reduction in system scope, or in large, unplanned expenses required to redesign the system to meet its original goals.

Building distributed systems is hard!


Why Distributed Systems

• Functional distribution
  • Computers have different functional capabilities (e.g., file server, printer) yet may need to share resources
  • Client/server
  • Data gathering/data processing
• Incremental growth
  • Easier to evolve the system
  • Modular expandability
• Inherent distribution in application domain
  • Banks, reservation services, distributed games, mobile apps
  • physically or across administrative domains
  • cash register and inventory systems for supermarket chains
  • computer supported collaborative work


Why Distributed Systems

• Economics
  • Collections of microprocessors offer a better price/performance ratio than large mainframes.
  • Low price/performance ratio: cost effective way to increase computing power.
• Better performance
  • Load balancing
  • Replication of processing power
  • A distributed system may have more total computing power than a mainframe. Ex. 10,000 CPU chips, each running at 50 MIPS. It is not possible to build a 500,000 MIPS single processor, since it would require a 0.002 nsec instruction cycle. Enhanced performance through load distributing.
• Increased reliability
  • Exploit independent failures property
  • If one machine crashes, the system as a whole can still survive.
• Another driving force: the existence of a large number of personal computers, the need for people to collaborate and share information.
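The 0.002 nsec figure in the "Better performance" bullet can be checked with a short calculation. This sketch assumes one instruction per cycle, which the slide implies but does not state. The speed-of-light line is an added illustration of why such a cycle time is physically out of reach for a single processor.

```python
CPUS = 10_000
MIPS_PER_CPU = 50

total_mips = CPUS * MIPS_PER_CPU                  # aggregate rate of all chips
instructions_per_second = total_mips * 1_000_000  # MIPS = million instructions per second

# Assumption: one instruction completes per cycle.
cycle_seconds = 1 / instructions_per_second
cycle_ns = cycle_seconds * 1e9

# How far a signal travelling at the speed of light gets in one such cycle.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
light_travel_mm = SPEED_OF_LIGHT_M_PER_S * cycle_seconds * 1000

print(f"total: {total_mips:,} MIPS")
print(f"cycle: {cycle_ns:.3f} ns")
print(f"light travels about {light_travel_mm:.2f} mm per cycle")
```

Under these assumptions a signal could cross well under a millimetre per cycle, far less than the size of a processor chip.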


Goals and challenges of distributed systems

• Transparency
  • How to achieve the single-system image
• Performance
  • The system provides high (computing, storage, ..) performance
• Scalability
  • The ability to serve more users, provide acceptable response times with increased amount of data
• Openness
  • An open distributed system can be extended and improved incrementally
  • Requires publication of component interfaces and standard protocols for accessing interfaces
• Reliability/fault tolerance
  • Maintain availability even when individual components fail
• Heterogeneity
  • Network, hardware, operating system, programming languages, different developers
• Security
  • Confidentiality, integrity and availability


Transparency

• How to achieve the single-system image, i.e., how to make a collection of computers appear as a single computer.
• Hiding the distribution at two levels:
  • Hide the distribution from users
  • At a lower level, make the system look transparent to programs.
  -> Require uniform interfaces such as access to files, communication.
• Different forms of transparency in a DS (ISO, 1995).
• Trade-off between transparency and performance of a system


Transparency in distributed systems

Access transparency: enables local and remote resources to be accessed using identical operations.
• Dropbox
• SQL queries

Location transparency: enables resources to be accessed without knowledge of their physical or network location (for example, which building or IP address).
• Users cannot tell where hardware and software resources such as CPUs, printers, files, databases are located.
• Navigation in the web
• Tables in distributed database

Migration transparency: resources must be free to move from one location to another without their names changed. E.g., /usr/lee, /central/usr/lee


Transparency in distributed systems

Concurrency transparency: enables several processes to operate concurrently using shared resources without interference.
• The users are not aware of the existence of other users.
• Need to allow multiple users to concurrently access the same resource. Lock and unlock for mutual exclusion.
• Distributed file system, distributed database

Parallelism transparency: Automatic use of parallelism without having to program explicitly. The holy grail for distributed and parallel system designers.
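The "lock and unlock for mutual exclusion" idea under concurrency transparency can be sketched on a single machine with threads. The `Account` class and `run_clients` helper are hypothetical names for illustration. A distributed file system or database would need a distributed locking protocol instead of `threading.Lock`.

```python
import threading


class Account:
    """Shared resource guarded by a lock (hypothetical example class)."""

    def __init__(self, balance=0):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        # Lock before the read-modify-write, unlock on leaving the block,
        # so concurrent clients cannot interleave inside the update.
        with self._lock:
            current = self.balance
            self.balance = current + amount


def run_clients(account, clients=8, deposits_per_client=1000):
    """Start several client threads that all deposit into the same account."""
    def client():
        for _ in range(deposits_per_client):
            account.deposit(1)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


final_balance = run_clients(Account())
print(final_balance)
```

Each client calls `deposit` as if it were the only user. The lock inside the resource hides the other clients, which is the point of concurrency transparency.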


Transparency in distributed systems

Replication transparency: enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers.
• OS can make additional copies of files and resources without users noticing.

Failure transparency: enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.
• DBMS, Big Data processing systems


Transparency in distributed systems

Mobility transparency: allows the movement of resources and clients within a system without affecting the operation of users or programs.
• Roaming, moving between two access points.

Performance transparency: allows the system to be reconfigured to improve performance as loads vary.

Scaling transparency: allows the system and applications to expand in scale without change to the system structure or the application algorithms.


Transparency in distributed systems

In certain cases transparency is impracticable or not convenient

• Some things cannot be made transparent
  • Time zones
  • Communication delays
• Hiding too much may have a negative performance impact
  • Accessing a remote object multiple times without knowing it is remote
• Sometimes transparency is just undesirable
  • Users do not always want complete transparency: a fancy printer 1000 miles away


Reliability

• Hardware, software and network fail
  • DS must maintain availability even in cases where hardware/software/network have low reliability
• Failures in distributed systems are partial
  • Makes error handling particularly difficult
• Detection of failures: may be impossible
  • In some cases it is easy, e.g., checksum in communication
  • Has a component crashed? Or is it just slow?
  • Is the network down? Or is it just slow?
  • If it's slow, how long should we wait?
• Many techniques for handling failures
  • Masking failures (retransmission in protocols)
  • Tolerating failures, degrading the offered service (as in web browsers)
  • Recovery from failures (periodically save state of a component, roll back partially completed task)
  • Redundancy (replicate servers in failure-independent ways, duplicate network routes)
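Masking failures by retransmission, and the limits of failure detection, can be sketched together. This is a toy model under stated assumptions: `LossyChannel` is a hypothetical stand-in for a network that drops a fixed number of messages, and a return value of `None` models a timeout.

```python
class LossyChannel:
    """Hypothetical channel that silently drops the first `drops` messages."""

    def __init__(self, drops):
        self.drops = drops
        self.attempts = 0

    def send(self, message):
        """Return an acknowledgement, or None to model a timeout with no reply."""
        self.attempts += 1
        if self.attempts <= self.drops:
            return None
        return f"ack:{message}"


def send_with_retry(channel, message, max_attempts=3):
    """Retransmit until acknowledged; give up after max_attempts timeouts."""
    for attempt in range(1, max_attempts + 1):
        reply = channel.send(message)
        if reply is not None:
            return reply, attempt
    # At this point the sender can only suspect a failure: it cannot tell
    # whether the peer crashed, the network is down, or something is just slow.
    return None, max_attempts


masked = send_with_retry(LossyChannel(drops=2), "hello")
suspected = send_with_retry(LossyChannel(drops=5), "hello")
print(masked)
print(suspected)
```

In the first call two lost messages are hidden from the caller, which only sees a successful reply. In the second call the retry budget runs out, and choosing `max_attempts` is the "how long should we wait?" question in code form.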


Reliability

• Distributed system should be more reliable than single system.
• Example:
  • Single machine: 0.95 probability of being up.
  • System with 3 machines (all machines need to break): 1 - 0.05**3 probability of being up.

Availability: fraction of time the system is usable.
• Redundancy improves it
• Recovery between failures
• Need to maintain consistency
• Need to mask failures
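The three-machine example can be computed directly. The sketch assumes machine failures are independent, which the slide's formula also relies on. The function names are invented for illustration. The second function is an added contrast: if the system needs every machine to be up, adding machines lowers availability.

```python
def parallel_availability(p_up, n):
    """System is up if at least one of n independent replicas is up."""
    return 1 - (1 - p_up) ** n


def series_availability(p_up, n):
    """System is up only if all n independent components are up."""
    return p_up ** n


print(parallel_availability(0.95, 3))  # redundancy: approximately 0.999875
print(series_availability(0.95, 3))    # dependency chain: approximately 0.857
```

The same three machines give very different results depending on whether they are redundant copies or a chain of dependencies. This is why redundancy, not the number of machines, is what improves availability.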


Performance

• Without gain on this, why bother with distributed systems?
• Performance loss due to communication delays:
  – fine-grain parallelism: high degree of interaction
  – coarse-grain parallelism
• Performance loss due to making the system fault tolerant.


Scalability

• Does the system remain effective as it grows?
• As you add more components:
  • More synchronization
  • More communication -> the system runs slowly.
• A system is scalable if it remains effective when there is a significant increase in the amount of resources (data) and number of users
  • Internet: number of users and services has grown enormously
• Scalability denotes the ability of a system to handle an increasing future load


Scalability

• Scalability requirements often lead to a distributed system architecture (several computers)
• Systems grow with time or become obsolete.
• Techniques that require resources linearly in terms of the size of the system are not scalable.
  • E.g., broadcast based query won't work for large distributed systems.
• Examples of bottlenecks: Everyone is waiting for a single shared resource
  • Centralized services: a single mail server
  • Centralized data: a single URL address book
  • Centralized algorithms: routing based on complete information


Scaling techniques

Distribution
• Splitting a resource (such as data) into smaller parts, and spreading the parts across the system (cf. DNS)
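One common way to split data into parts is hash partitioning, sketched below. This is a swapped-in technique for illustration, not how DNS divides its name space (DNS partitions by name hierarchy). `PartitionedStore` and `partition_for` are hypothetical names, and each in-memory dict stands in for a separate machine.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition using a stable hash.

    md5 is used only to spread keys evenly, not for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedStore:
    """Toy key-value store whose data is split across several partitions."""

    def __init__(self, num_partitions):
        self.partitions = [dict() for _ in range(num_partitions)]

    def put(self, key, value):
        index = partition_for(key, len(self.partitions))
        self.partitions[index][key] = value

    def get(self, key):
        index = partition_for(key, len(self.partitions))
        return self.partitions[index].get(key)


store = PartitionedStore(num_partitions=4)
for i in range(100):
    store.put(f"user{i}", i)

print([len(p) for p in store.partitions])  # how the 100 keys were spread
print(store.get("user42"))
```

Every lookup touches exactly one partition, so no single part has to hold all the data or serve all the requests.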


Scaling techniques: DNS

Recursive mode also possible. What is the issue?

Initially, all host-address mappings were in a file hosts.txt (in /etc/hosts)
• Changes were submitted to SRI (Stanford Research Institute) by email
• New versions of hosts.txt ftp'd periodically from SRI
• An administrator could pick names at their discretion
• Any name is allowed: eugenesdesktopatrice (flat name space)

As the internet grew this system broke:
• SRI couldn't handle the load
• Hard to enforce name uniqueness
• Many hosts: inaccurate hosts.txt

Domain Name System (DNS) was born in '83


Scaling techniques

• Replication
  • Replicate resources (services, data) across the system, can access them in multiple places
  • Caching to avoid recomputation
  • Increased availability reduces the probability that a bigger system breaks
• Hiding communication latencies
  • Avoid waiting for responses to remote service requests
  • Use asynchronous communication
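Hiding communication latency with asynchronous communication can be sketched with `asyncio`. The `remote_request` coroutine is a hypothetical stand-in for a remote call, and the 50 ms delay is an invented latency. The sketch compares waiting for each reply in turn against issuing all requests before waiting.

```python
import asyncio
import time


async def remote_request(name, delay):
    """Stand-in for a remote call; asyncio.sleep models network latency."""
    await asyncio.sleep(delay)
    return f"{name}:done"


async def sequential(delay, n):
    """Wait for each response before sending the next request."""
    return [await remote_request(f"req{i}", delay) for i in range(n)]


async def concurrent(delay, n):
    """Send all requests first, then collect the responses together."""
    return await asyncio.gather(*(remote_request(f"req{i}", delay) for i in range(n)))


def timed(coro_fn, delay=0.05, n=4):
    """Run one strategy and return its results with the elapsed wall time."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(delay, n))
    return results, time.perf_counter() - start


seq_results, seq_time = timed(sequential)
con_results, con_time = timed(concurrent)
print(f"sequential {seq_time:.2f}s, concurrent {con_time:.2f}s")
```

Both strategies return the same answers. The sequential one pays the latency once per request, while the concurrent one overlaps the waits and pays it roughly once in total.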


Scaling techniques

• Reducing amount of remote requests
  • (a) the server checks the forms as they are being filled (b) the client does.
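The difference between cases (a) and (b) can be made concrete by counting requests. Everything here is a hypothetical sketch: the `Server` class, the validation rule in `is_valid`, and the sample form are invented for illustration.

```python
def is_valid(name, value):
    """Hypothetical validation rule shared by client and server."""
    if name == "email":
        return "@" in value
    return bool(value.strip())


class Server:
    """Toy form-processing server that counts the requests it receives."""

    def __init__(self):
        self.requests = 0

    def check_field(self, name, value):
        self.requests += 1
        return is_valid(name, value)

    def submit(self, form):
        self.requests += 1
        # The server still validates on submit; it cannot trust the client.
        return all(is_valid(k, v) for k, v in form.items())


def fill_with_server_checks(server, form):
    """Case (a): one remote request per field, then the final submit."""
    for name, value in form.items():
        if not server.check_field(name, value):
            return False
    return server.submit(form)


def fill_with_client_checks(server, form):
    """Case (b): fields are checked locally, only the submit goes remote."""
    if not all(is_valid(k, v) for k, v in form.items()):
        return False
    return server.submit(form)


form = {"first": "Ada", "last": "Lovelace", "email": "ada@example.org"}

server_a, server_b = Server(), Server()
fill_with_server_checks(server_a, form)
fill_with_client_checks(server_b, form)
print(server_a.requests, server_b.requests)
```

For this three-field form, case (a) costs four requests and case (b) costs one. The gap grows with the number of fields and with every correction the user makes.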


Openness

• Can the system be extended and reimplemented in various ways?
• To be achieved
  • Publish all key aspects of the system
    • Protocols
    • Interfaces to services
  • Adopting standards as much as possible
  • Take design decisions that favor interoperability and portability

Example: The Internet. RFCs and an open standardization body (IETF)


Heterogeneity

• Hardware and software (e.g., operating systems, processors)
  • How can an Intel/Windows system understand messages sent by a Macintosh OS X system?
• Different performance. E.g. mobile devices have low computing power
• Different network infrastructures (Ethernet, 802.11 wireless)
• Programming languages
  • How can a Java program and a C program communicate?
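One part of the answer to both questions is agreeing on an explicit wire format, so that no side depends on its own hardware or language representation. The sketch below uses Python's `struct` module with network (big-endian) byte order. The message layout, a 4-byte id followed by an 8-byte float, is a hypothetical format chosen for illustration.

```python
import struct

# Hypothetical wire format: "!" selects network (big-endian) byte order with
# no padding, "I" is a 4-byte unsigned int, "d" is an 8-byte IEEE 754 double.
WIRE_FORMAT = "!Id"


def encode(account_id, balance):
    """Serialize a message into the agreed byte layout."""
    return struct.pack(WIRE_FORMAT, account_id, balance)


def decode(payload):
    """Parse bytes in the agreed layout back into (account_id, balance)."""
    return struct.unpack(WIRE_FORMAT, payload)


payload = encode(1, 100.0)
print(payload.hex())
print(decode(payload))

# The same integer laid out in the two byte orders a machine might use natively.
little = struct.pack("<I", 1)
big = struct.pack(">I", 1)
print(little.hex(), big.hex())
```

A program in any language, on any processor, can read this message as long as it follows the same layout. Middleware and serialization formats exist largely to automate this kind of agreement.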


Security

• Security for the information resources made available and maintained in the distributed system has three components
  • Confidentiality: Protection against disclosure to unauthorized individuals
  • Integrity: Protection against alteration or corruption
  • Availability: Protection against interference with the means to access the resource (e.g., DoS attack)
• Encryption is a powerful mechanism but several issues are still open
  • DoS attacks
  • Mobile code
  • …

Example: DNS Spoofing


(More) Basic concepts


Parallel vs. distributed computing


Middleware

• Middleware provides horizontal services to help build distributed applications
• It masks platform differences
• Example: message oriented middleware
  • Store (buffer), route, or transform messages while conveying them from senders to receivers
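The store, route and transform roles of message oriented middleware can be sketched with an in-process toy. The `Broker` class and its method names are hypothetical and do not correspond to any real middleware product. A real broker would run as a separate service reachable over the network.

```python
import queue
import threading


class Broker:
    """Toy in-process broker: routes by topic, buffers, and optionally transforms."""

    def __init__(self):
        self._queues = {}
        self._transforms = {}
        self._lock = threading.Lock()

    def _queue_for(self, topic):
        # Route: each topic has its own buffer, created on first use.
        with self._lock:
            return self._queues.setdefault(topic, queue.Queue())

    def set_transform(self, topic, fn):
        """Register a function applied to every message published on a topic."""
        self._transforms[topic] = fn

    def publish(self, topic, message):
        # Transform, then store (buffer) until some receiver asks for it.
        fn = self._transforms.get(topic, lambda m: m)
        self._queue_for(topic).put(fn(message))

    def receive(self, topic, timeout=1.0):
        """Take the oldest buffered message for a topic; raises queue.Empty on timeout."""
        return self._queue_for(topic).get(timeout=timeout)


broker = Broker()
broker.set_transform("orders", str.upper)
broker.publish("orders", "buy 10")
broker.publish("alerts", "disk full")

order_message = broker.receive("orders")
alert_message = broker.receive("alerts")
print(order_message, "|", alert_message)
```

The sender finishes publishing before any receiver exists, so the two sides never need to be running at the same moment or know each other's address.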


Intranet: A portion of the Internet

[Figure: a portion of the Internet. Labels: intranet, ISP, backbone, satellite link, desktop computer, server, network link]


Intranet

A portion of the Internet that
• is separately administered
• usually proprietary
• provides internal and external services
• can be configured to enforce local security policies
• may use a firewall to prevent unauthorized messages leaving or entering
• may be connected to the internet via a router

Services:
• File, print services, backup, program-sharing, user-, system-administration, internet access

[Figure: a typical intranet. Labels: desktop computers, email server, Web server, file server, print and other servers, local area network, router/firewall, the rest of the Internet]


Throughput/Latency

• Latency: “wire delay”
  • Time to send and receive one byte of data
  • Depends on “distance”
• Throughput
  • Bytes per second
  • Depends on the size of the vehicle
• Latency is often the bottleneck
  • Improves slower than bandwidth
  • Speed of light
  • Routers in the middle (traffic stops)
  • Request-response cycle often dominates the application
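Why latency is often the bottleneck shows up in a simple cost model: time equals latency per round trip plus size divided by bandwidth. The model and the numbers below (50 ms latency, a 100 MB/s link) are illustrative assumptions, not measurements.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s, round_trips=1):
    """Simple model: each round trip pays the latency, the payload pays size/bandwidth."""
    return round_trips * latency_s + size_bytes / bandwidth_bytes_per_s


# Illustrative numbers (assumptions): 50 ms latency, 100 MB/s link.
LATENCY = 0.050
BANDWIDTH = 100e6

small = transfer_time(1_000, LATENCY, BANDWIDTH)
small_faster_link = transfer_time(1_000, LATENCY, 10 * BANDWIDTH)
large = transfer_time(1_000_000_000, LATENCY, BANDWIDTH)

print(f"1 KB request:              {small * 1000:.3f} ms")
print(f"1 KB request, 10x the link: {small_faster_link * 1000:.3f} ms")
print(f"1 GB transfer:             {large:.2f} s")
```

For the small request, a ten times faster link changes almost nothing because nearly all the time is latency. Only the large transfer is dominated by bandwidth, which is why request-response applications are so sensitive to the number of round trips.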


Performance scales

Register   1
L2         10
Memory     200
LAN        100,000
Disk       2,000,000
WAN        20,000,000


Exercise


Exercise

Reading
• There’s Just No Getting around It: You’re Building a Distributed System [Mark Cavage]
• Answer questions in the exercise sheet available on the website
• Open questions for discussion


Questions?
