Intro to distributed systems
What is a distributed system?
A distributed system consists of hardware and software components located in a network of computers that communicate and coordinate their actions only by passing messages. [Coulouris]
A distributed system is a collection of independent computers that appears to its users as a single coherent system. [Tanenbaum & van Steen]
A distributed system is a system that prevents you from doing any work when a computer you have never heard about fails. [Lamport]
• The above definitions take different perspectives
  • Operational perspective
  • User perspective
  • DS characteristics perspective
Examples of distributed systems
• Intranet, Internet, WWW
• DNS: Hierarchical distributed database
• Network of workstations (NOW), cluster computers
• Email
• Electronic banking
• Airline reservation system
• Peer-to-peer networks
• Sensor networks
• Mobile and pervasive computing
• Cellular phone systems
• IP telephony
• Flight management system in an aircraft
• Automotive control systems (50+ embedded processors in a Mercedes S-class)
• Distributed file systems (NFS, Samba)
• P2P file sharing
• Etc., etc., etc.
The Internet
“Google is technologically a large supercomputer. It's a distributed supercomputer among many data centers doing all sorts of interesting things over fiber optic network that eventually are services available to end-users.” Eric Schmidt, Google CEO, 2007
“Google runs on hundreds of thousands of servers (by one estimate, in excess of 450,000) racked up in thousands of clusters in dozens of data centers around the world. It has data centers in Dublin, Ireland; in Virginia; and in California, where it just acquired the million-square-foot headquarters it had been leasing. It recently opened a new center in Atlanta, and is currently building two football-field-sized centers in The Dalles, Ore.” 2006
An estimate today says that Google runs more than 2,000,000 servers in 36 data centers around the globe.
“Our view is it's better to have twice as much hardware that's not as reliable than half as much that's more reliable. You have to provide reliability on a software level. If you're running 10,000 machines, something is going to die every day.” Jeff Dean, Google Fellow, 2008
“A typical search will require actions from between 700 to 1,000 machines today.” Marissa Mayer, vice president of Google's search products and user experience, 2008
Google processes over 40,000 queries every second on average, which translates to over 3.5 billion searches per day (internetlivestats).
“Our current generation, Jupiter fabrics, can deliver more than 1 Petabit/sec of total bisection bandwidth. To put this in perspective, such capacity would be enough for 100,000 servers to exchange information at 10 Gb/s each, enough to read the entire scanned contents of the Library of Congress in less than 1/10th of a second.” Amin Vahdat, Google Fellow, 2015
Google data centers
E.g., Dallas site: Three 68,000 square foot data center buildings
https://www.google.com/about/datacenters/
Distributed systems vs centralized systems
• Concurrency
  • In centralized systems, concurrency is a design choice.
  • In distributed systems, computers run concurrently.
• Independent and partial failures
  • Centralized systems usually fail completely.
  • Distributed systems usually fail partially, often because of communication. When a component fails, the others are still running. Detecting failures may be hard. Recovery is also hard because the state of an application is distributed.
• Absence of global clock
  • In centralized systems, the physical clock of the computer can be used for synchronization.
  • In distributed systems, clocks may not be in sync
  • Example: Bank account, starting balance = $100
    • Client at bank machine A makes a deposit of $100
    • Client at bank machine B makes a withdrawal of $150
    • Which event happened first?
    • Should the bank charge the overdraft fee?
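The bank account example can be replayed in a few lines. This is a minimal sketch, not a real banking protocol: the `Event` type, the timestamps, and the one-second clock skew on machine B are all hypothetical values chosen for illustration. It shows that the overdraft decision depends entirely on which local clock you trust.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    machine: str       # bank machine that reported the event
    local_time: float  # timestamp taken from that machine's own clock (seconds)
    amount: int        # positive = deposit, negative = withdrawal


def overdraft_occurs(start_balance: int, events: list[Event]) -> bool:
    """Replay events in local-timestamp order; True if the balance ever goes negative."""
    balance = start_balance
    for event in sorted(events, key=lambda e: e.local_time):
        balance += event.amount
        if balance < 0:
            return True
    return False


deposit = Event("A", local_time=10.0, amount=+100)
withdrawal = Event("B", local_time=10.5, amount=-150)

# Trusting the reported timestamps, the deposit comes first: no overdraft.
print(overdraft_occurs(100, [deposit, withdrawal]))  # False

# Assumption: machine B's clock ran 1 second fast, so the withdrawal really came first.
corrected = Event("B", local_time=9.5, amount=-150)
print(overdraft_occurs(100, [deposit, corrected]))   # True
```

Both runs use the same two operations; only the assumed clock offset differs, which is why physical timestamps alone cannot settle the ordering question.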
The eight fallacies of distributed systems
“Essentially everyone, when they first build a distributed application, makes the following eight simplifying assumptions. All false in the long run.” (Peter Deutsch)
1. The network is reliable.
2. The network is secure.
3. The network is homogeneous.
4. The topology does not change.
5. Latency is zero.
6. Bandwidth is infinite.
7. Transport cost is zero.
8. There is one administrator.
These assumptions ultimately prove false, resulting either in the failure of the system, a substantial reduction in system scope, or in large, unplanned expenses required to redesign the system to meet its original goals.
Building distributed systems is hard!
Why Distributed Systems
• Functional distribution
  • Computers have different functional capabilities (e.g., file server, printer) yet may need to share resources
  • Client/server
  • Data gathering/data processing
• Incremental growth
  • Easier to evolve the system
  • Modular expandability
• Inherent distribution in application domain
  • Banks, reservation services, distributed games, mobile apps
  • Physically or across administrative domains
  • Cash register and inventory systems for supermarket chains
  • Computer supported collaborative work
Why Distributed Systems
• Economics
  • Collections of microprocessors offer a better price/performance ratio than large mainframes.
  • Low price/performance ratio: cost effective way to increase computing power.
• Better performance
  • Load balancing
  • Replication of processing power
  • A distributed system may have more total computing power than a mainframe. Ex. 10,000 CPU chips, each running at 50 MIPS. Not possible to build a 500,000 MIPS single processor since it would require a 0.002 nsec instruction cycle. Enhanced performance through load distributing.
• Increased reliability
  • Exploit independent failures property
  • If one machine crashes, the system as a whole can still survive.
• Another driving force: the existence of a large number of personal computers, the need for people to collaborate and share information.
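The 500,000 MIPS figure and the 0.002 nsec cycle time follow from simple arithmetic, checked below. The last line is added context rather than something stated on the slide: it computes how far light travels in one such cycle, which gives a feel for why a single processor of that speed is impractical.

```python
cpus = 10_000
mips_per_cpu = 50

# Aggregate instruction rate of the whole collection.
total_mips = cpus * mips_per_cpu                   # 500,000 MIPS
instructions_per_second = total_mips * 1_000_000   # MIPS -> instructions/s

# A single processor matching that rate needs this instruction cycle time.
cycle_time_ns = 1e9 / instructions_per_second      # 0.002 ns

# Added context: distance light covers in a vacuum during one such cycle.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
light_distance_mm = SPEED_OF_LIGHT_M_PER_S * cycle_time_ns * 1e-9 * 1000

print(total_mips, cycle_time_ns, round(light_distance_mm, 2))
```

Under these numbers a signal could cross well under a millimetre per instruction, so the single-processor design is limited by physics rather than by cost.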
Goals and challenges of distributed systems
• Transparency
  • How to achieve the single-system image
• Performance
  • The system provides high (computing, storage, ...) performance
• Scalability
  • The ability to serve more users, provide acceptable response times with increased amount of data
• Openness
  • An open distributed system can be extended and improved incrementally
  • Requires publication of component interfaces and standard protocols for accessing interfaces
• Reliability/fault tolerance
  • Maintain availability even when individual components fail
• Heterogeneity
  • Network, hardware, operating system, programming languages, different developers
• Security
  • Confidentiality, integrity and availability
Transparency
• How to achieve the single-system image, i.e., how to make a collection of computers appear as a single computer.
• Hiding the distribution at two levels:
  • Hide the distribution from users
  • At a lower level, make the system look transparent to programs.
  -> Require uniform interfaces such as access to files, communication.
• Different forms of transparency in a DS (ISO, 1995).
• Trade-off between transparency and performance of a system
Transparency in distributed systems
Access transparency: enables local and remote resources to be accessed using identical operations.
• Dropbox
• SQL queries
Location transparency: enables resources to be accessed without knowledge of their physical or network location (for example, which building or IP address).
• Users cannot tell where hardware and software resources such as CPUs, printers, files, databases are located.
• Navigation in the web
• Tables in distributed database
Migration transparency: resources must be free to move from one location to another without their names changed. E.g., /usr/lee, /central/usr/lee
Transparency in distributed systems
Concurrency transparency: enables several processes to operate concurrently using shared resources without interference.
• The users are not aware of the existence of other users.
• Need to allow multiple users to concurrently access the same resource. Lock and unlock for mutual exclusion.
• Distributed file system, distributed database
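The lock-and-unlock idea for mutual exclusion can be sketched on a single machine with threads. This is only a local analogue under stated assumptions: the `Counter` class and the `run` helper are hypothetical names, and a real distributed system would need a distributed lock or transaction mechanism instead of `threading.Lock`.

```python
import threading


class Counter:
    """A shared resource whose updates are serialized by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # "Lock" before touching the shared state, "unlock" on leaving the block.
        with self._lock:
            self.value += 1


def run(num_threads: int = 4, increments: int = 10_000) -> int:
    counter = Counter()

    def worker() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


print(run())  # 40000
```

Each worker behaves as if it were the only user of the counter, which is the effect concurrency transparency aims for.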
Parallelism transparency: automatic use of parallelism without having to program explicitly. The holy grail for distributed and parallel system designers.
Transparency in distributed systems
Replication transparency: enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers.
• OS can make additional copies of files and resources without users noticing.
Failure transparency: enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.
• DBMS, Big Data processing systems
Transparency in distributed systems
Mobility transparency: allows the movement of resources and clients within a system without affecting the operation of users or programs.
• Roaming, moving between two access points.
Performance transparency: allows the system to be reconfigured to improve performance as loads vary.
Scaling transparency: allows the system and applications to expand in scale without change to the system structure or the application algorithms.
Transparency in distributed systems
In certain cases transparency is impracticable or not convenient
• Some things cannot be made transparent
  • Time zones
  • Communication delays
• Hiding too much may have a negative performance impact
  • Accessing a remote object multiple times without knowing it is remote
• Sometimes transparency is just undesirable
  • Users do not always want complete transparency: a fancy printer 1000 miles away
Reliability
• Hardware, software and network fail
  • DS must maintain availability even in cases where hardware/software/network have low reliability
• Failures in distributed systems are partial
  • Makes error handling particularly difficult
• Detection of failures: may be impossible
  • In some cases it is easy, e.g., checksum in communication
  • Has a component crashed? Or is it just slow?
  • Is the network down? Or is it just slow?
  • If it's slow, how long should we wait?
• Many techniques for handling failures
  • Masking failures (retransmission in protocols)
  • Tolerating failures, degrading the offered service (as in web browsers)
  • Recovery from failures (periodically save state of a component, roll back partially completed task)
  • Redundancy (replicate servers in failure-independent ways, duplicate network routes)
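Masking failures by retransmission can be sketched as a bounded retry loop. Everything here is an assumption made for illustration: `unreliable_send` simulates a lossy channel with a random number generator, and `Timeout` stands in for "no reply arrived before the deadline". Real protocols add backoff, duplicate detection, and idempotent requests.

```python
import random


class Timeout(Exception):
    """Raised when no reply arrives within the deadline."""


def unreliable_send(message: str, rng: random.Random, loss_rate: float) -> str:
    """Hypothetical channel: loses the message with probability loss_rate."""
    if rng.random() < loss_rate:
        raise Timeout(message)
    return f"ack:{message}"


def send_with_retries(message: str, rng: random.Random,
                      loss_rate: float = 0.5, max_attempts: int = 5):
    """Retransmit on timeout; return (reply, attempts used) or give up."""
    for attempt in range(1, max_attempts + 1):
        try:
            return unreliable_send(message, rng, loss_rate), attempt
        except Timeout:
            # The sender cannot tell "lost" from "slow"; it simply tries again.
            continue
    raise Timeout(f"no reply after {max_attempts} attempts")


reply, attempts = send_with_retries("hello", random.Random(), loss_rate=0.0)
print(reply, attempts)  # ack:hello 1
```

The `max_attempts` bound is one possible answer to "how long should we wait?": after it is exhausted the failure is no longer masked and surfaces to the caller.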
Reliability
• A distributed system should be more reliable than a single system.
• Example:
  • Single machine: 0.95 probability of being up.
  • System with 3 machines (the system is down only if all machines break): 1 - 0.05**3 = 0.999875 probability of being up.
Availability: fraction of time the system is usable.
• Redundancy improves it
• Recovery between failures
• Need to maintain consistency
• Need to mask failures
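The availability example generalizes to any number of machines. The sketch below assumes failures are independent, which is the property the example relies on. The second function is an added contrast: if every machine is required for the system to work, adding machines lowers availability instead of raising it.

```python
def availability_any_up(p_up: float, n: int) -> float:
    """Replicated system: usable as long as at least one of n machines is up."""
    return 1 - (1 - p_up) ** n


def availability_all_up(p_up: float, n: int) -> float:
    """System that needs every one of its n machines to be up."""
    return p_up ** n


print(round(availability_any_up(0.95, 3), 6))  # 0.999875
print(round(availability_all_up(0.95, 3), 6))  # 0.857375
```

The gap between the two results is why redundancy only helps when the replicas can actually substitute for one another.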
Performance
• Without gain on this, why bother with distributed systems?
• Performance loss due to communication delays:
  • Fine-grain parallelism: high degree of interaction
  • Coarse-grain parallelism
• Performance loss due to making the system fault tolerant.
Scalability
• Does the system remain effective as it grows?
• As you add more components:
  • More synchronization
  • More communication -> the system runs slowly.
• A system is scalable if it remains effective when there is a significant increase in the amount of resources (data) and number of users
  • Internet: number of users and services has grown enormously
• Scalability denotes the ability of a system to handle an increasing future load
Scalability
• Requirements of scalability often lead to a distributed system architecture (several computers)
• Systems grow with time or become obsolete.
• Techniques that require resources linearly in terms of the size of the system are not scalable.
  • E.g., broadcast based query won't work for large distributed systems.
• Examples of bottlenecks: Everyone is waiting for a single shared resource
  • Centralized services: a single mail server
  • Centralized data: a single URL address book
  • Centralized algorithms: routing based on complete information
Scaling techniques
• Distribution
  • Splitting a resource (such as data) into smaller parts, and spreading the parts across the system (cf. DNS)
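One common way to split data into parts and spread them across machines is hash partitioning. The sketch below is an illustration, not something the slides prescribe: the node names and keys are made up, and a production system would more likely use consistent hashing so that adding a node does not move most keys.

```python
import hashlib


def owner(key: str, nodes: list[str]) -> str:
    """Map a key to one node using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["node-0", "node-1", "node-2"]  # hypothetical machines
placement: dict[str, list[str]] = {}
for key in ["alice", "bob", "carol", "dave"]:
    placement.setdefault(owner(key, nodes), []).append(key)

print(placement)
```

Any client can compute the owner locally, so no central directory has to be consulted for each lookup.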
Scaling techniques: DNS
Recursive mode also possible. What is the issue?
Initially, all host-address mappings were in a file hosts.txt (in /etc/hosts)
• Changes were submitted to SRI (Stanford Research Institute) by email
• New versions of hosts.txt ftp'd periodically from SRI
• An administrator could pick names at their discretion
• Any name is allowed: eugenesdesktopatrice (flat namespace)
As the Internet grew this system broke:
• SRI couldn't handle the load
• Hard to enforce name uniqueness
• Many hosts: inaccurate hosts.txt
Domain Name System (DNS) was born in '83
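The move from a flat hosts.txt to a hierarchy can be illustrated with a toy resolver in which each server knows only its own slice of the namespace. This is a heavily simplified sketch: the `ZONES` table, the server names, and the address are hypothetical, and real DNS involves record types, caching, and TTLs that are ignored here.

```python
# Hypothetical zone data: no single server holds the whole namespace.
ZONES = {
    "root": {"referrals": {"org": "org-server"}, "records": {}},
    "org-server": {"referrals": {"example.org": "example-server"}, "records": {}},
    "example-server": {"referrals": {}, "records": {"www.example.org": "192.0.2.10"}},
}


def resolve_iteratively(name: str) -> tuple[str, list[str]]:
    """Start at the root and follow referrals; the client contacts each server itself."""
    server = "root"
    contacted = []
    while True:
        contacted.append(server)
        zone = ZONES[server]
        if name in zone["records"]:
            return zone["records"][name], contacted
        # Follow the referral whose suffix matches the name being resolved.
        next_server = next(
            (target for suffix, target in zone["referrals"].items()
             if name == suffix or name.endswith("." + suffix)),
            None,
        )
        if next_server is None:
            raise KeyError(name)
        server = next_server


address, path = resolve_iteratively("www.example.org")
print(address, path)
```

In this iterative style the client does the walking; in recursive mode each server would forward the query on the client's behalf, which shifts the work onto the servers higher in the hierarchy.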
Scaling techniques
• Replication
  • Replicate resources (services, data) across the system, can access them in multiple places
  • Caching to avoid recomputation
  • Increased availability reduces the probability that a bigger system breaks
• Hiding communication latencies
  • Avoid waiting for responses to remote service requests
  • Use asynchronous communication
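Hiding communication latency with asynchronous communication can be sketched with `asyncio`. The `remote_call` coroutine is a stand-in that only sleeps; the delay value and the call names are assumptions for illustration.

```python
import asyncio
import time


async def remote_call(name: str, delay: float) -> str:
    """Stand-in for a remote request: just waits for `delay` seconds."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def sequential(delay: float) -> list[str]:
    # Each request waits for the previous reply before being sent.
    return [await remote_call(name, delay) for name in "abc"]


async def concurrent(delay: float) -> list[str]:
    # All requests are in flight at once; the waits overlap.
    return list(await asyncio.gather(*(remote_call(name, delay) for name in "abc")))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


seq_result, seq_time = timed(sequential(0.05))
con_result, con_time = timed(concurrent(0.05))
print(f"sequential {seq_time:.2f}s, concurrent {con_time:.2f}s")
```

Both versions return the same replies; the concurrent one takes roughly the time of a single round trip because the three waits overlap instead of adding up.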
Scaling techniques
• Reducing amount of remote requests
  • (a) the server checks the forms as they are being filled, (b) a client does.
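The difference between cases (a) and (b) can be made concrete by counting remote requests. The `Server` class, the form fields, and the validation rules below are hypothetical; the point is only the request count.

```python
def is_valid(field: str, value: str) -> bool:
    """Toy validation rules shared by client and server."""
    if field == "email":
        return "@" in value and "." in value.split("@")[-1]
    return bool(value.strip())  # other fields only need to be non-empty


class Server:
    """Counts how many requests reach it."""

    def __init__(self) -> None:
        self.requests = 0

    def check_field(self, field: str, value: str) -> bool:
        self.requests += 1
        return is_valid(field, value)

    def submit(self, form: dict[str, str]) -> bool:
        self.requests += 1
        # The server still validates on submit; client checks are a convenience.
        return all(is_valid(f, v) for f, v in form.items())


form = {"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.org"}

# (a) The server checks every field as it is filled in.
server_a = Server()
ok_a = all(server_a.check_field(f, v) for f, v in form.items()) and server_a.submit(form)

# (b) The client checks locally and contacts the server once.
server_b = Server()
ok_b = all(is_valid(f, v) for f, v in form.items()) and server_b.submit(form)

print(server_a.requests, server_b.requests)  # 4 1
```

With more fields or more users the gap widens, which is why moving checks to the client is listed as a scaling technique.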
Openness
• Can the systems be extended and reimplemented in various ways?
• To be achieved
  • Publish all key aspects of the system
    • Protocols
    • Interfaces to services
  • Adopting standards as much as possible
  • Take design decisions that favor interoperability and portability
Example: The Internet. RFCs and an open standardization body (IETF)
Heterogeneity
• Hardware and software (e.g., operating systems, processors)
  • How can an Intel/Windows system understand messages sent by a Macintosh OS X system?
• Different performance. E.g., mobile devices have low computing power
• Different network infrastructures (Ethernet, 802.11 wireless)
• Programming languages
  • How can a Java program and a C program communicate?
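One standard answer to the heterogeneity questions is to agree on an explicit wire format, so that byte order and field sizes do not depend on the sender's hardware or language. The sketch below uses Python's `struct` module with a made-up two-field message; the field layout is an assumption for illustration.

```python
import struct

# Hypothetical wire format: 4-byte unsigned account id followed by an 8-byte
# IEEE 754 double, both big-endian ("network byte order"), with no padding.
WIRE_FORMAT = "!Id"


def encode(account_id: int, balance: float) -> bytes:
    return struct.pack(WIRE_FORMAT, account_id, balance)


def decode(payload: bytes) -> tuple[int, float]:
    account_id, balance = struct.unpack(WIRE_FORMAT, payload)
    return account_id, balance


payload = encode(7, 100.5)
print(len(payload), payload[:4].hex(), decode(payload))
```

A program in any language can produce or parse these 12 bytes by following the same layout, regardless of the native byte order of the machine it runs on.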
Security
• Security for the information resources made available and maintained in the distributed system has three components
  • Confidentiality: Protection against disclosure to unauthorized individuals
  • Integrity: Protection against alteration or corruption
  • Availability: Protection against interference with the means to access the resource (e.g., DoS attack)
• Encryption is a powerful mechanism but several issues are still open
  • DoS attacks
  • Mobile code
  • …
Example: DNS Spoofing
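Of the three components, integrity is the easiest to illustrate in a few lines. The sketch below uses a keyed hash (HMAC) from the Python standard library to detect altered messages; it assumes sender and receiver already share a secret key, and the key and messages shown are placeholders. It is one possible mechanism, not the one the slides have in mind, and it does nothing for confidentiality or availability.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag for the message under a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag in constant time; False means the message was altered."""
    return hmac.compare_digest(sign(key, message), tag)


key = b"shared-secret-for-illustration"  # placeholder key
message = b"withdraw 150 from account 7"
tag = sign(key, message)

print(verify(key, message, tag))                          # True
print(verify(key, b"withdraw 950 from account 7", tag))   # False
```

A receiver that rejects messages failing `verify` is protected against alteration in transit, provided the key stays secret.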
(More) Basic concepts
Parallel vs. distributed computing
Middleware
• Middleware provides horizontal services to help build distributed applications
• It masks platform differences
• Example: message oriented middleware
  • Store (buffer), route, or transform messages while conveying them from senders to receivers
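The store, route, and transform roles of message oriented middleware can be sketched with an in-process broker. The `Broker` class and the destination names are hypothetical; real middleware adds persistence, acknowledgements, and network transport.

```python
from collections import defaultdict, deque
from typing import Callable, Optional


class Broker:
    """Toy message broker: buffers messages per destination."""

    def __init__(self) -> None:
        self.queues: dict[str, deque] = defaultdict(deque)  # store (buffer)
        self.transforms: dict[str, Callable[[str], str]] = {}

    def set_transform(self, destination: str, func: Callable[[str], str]) -> None:
        self.transforms[destination] = func

    def send(self, destination: str, message: str) -> None:
        # Route by destination name and transform on the way in.
        func = self.transforms.get(destination, lambda m: m)
        self.queues[destination].append(func(message))

    def receive(self, destination: str) -> Optional[str]:
        # The receiver can pick messages up later, in arrival order.
        queue = self.queues[destination]
        return queue.popleft() if queue else None


broker = Broker()
broker.set_transform("billing", str.upper)
broker.send("billing", "invoice 1")
broker.send("audit", "login")

print(broker.receive("billing"), broker.receive("audit"), broker.receive("audit"))
```

Sender and receiver never call each other directly, so either side can be slow, offline, or written in a different language without the other noticing.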
Intranet: A portion of the Internet
[Figure: intranets connected through ISPs, a backbone, and a satellite link; legend: desktop computer, server, network link]
Intranet
A portion of the Internet that
• is separately administered
• usually proprietary
• provides internal and external services
• can be configured to enforce local security policies
• may use a firewall to prevent unauthorized messages leaving or entering
• may be connected to the Internet via a router
Services:
• File, print services, backup, program-sharing, user-, system-administration, internet access
[Figure: a local area network with desktop computers, email servers, a web server, a file server, and print and other servers, connected through a router/firewall to the rest of the Internet]
Throughput/Latency
• Latency: “wire delay”
  • Time to send and receive one byte of data
  • Depends on “distance”
• Throughput
  • Bytes per second
  • Depends on the size of the vehicle
• Latency is often the bottleneck
  • Improves more slowly than bandwidth
  • Speed of light
  • Routes in the middle (traffic stops)
  • Request-response cycle often dominates the application
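A simple model makes the latency-versus-throughput point concrete: the time for one transfer is the fixed latency plus the payload size divided by the bandwidth. The link parameters below are hypothetical round numbers, and the model ignores protocol overhead and congestion.

```python
def transfer_time(size_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Time for one transfer: fixed delay plus time to push the bytes through."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


LATENCY_S = 0.05          # assumed 50 ms one-way delay
BANDWIDTH = 100_000_000   # assumed 100 MB/s link

small = transfer_time(1_000, LATENCY_S, BANDWIDTH)          # 1 KB request
large = transfer_time(1_000_000_000, LATENCY_S, BANDWIDTH)  # 1 GB bulk copy

print(f"1 KB: {small:.5f} s, 1 GB: {large:.2f} s")
```

For the small request almost all of the time is latency, so a faster link barely helps; only the bulk copy benefits from more bandwidth. Applications built from many small request-response exchanges therefore tend to be latency bound.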
Performance scales
Register   1
L2         10
Memory     200
LAN        100,000
Disk       2,000,000
WAN        20,000,000
Exercise
Reading
• There's Just No Getting around It: You're Building a Distributed System [Mark Cavage]
• Answer questions in the exercise sheet available on the website
• Open questions for discussion
Questions?