Dynamo: Amazon's Highly Available Key-value Store
Authors: Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
Presentation: Shijie Xu, Ying Wang


Why Dynamo?

● Fully Managed
● Fast, Consistent Performance
● Highly Scalable
● Flexible

System Assumptions and Requirements

● Query Model: simple read and write operations on a small data item that is uniquely identified by a key
● ACID Properties: Atomicity, (Weaker) Consistency, (No) Isolation, Durability
● Efficiency: Latency requirements, which are in general measured at the 99.9th percentile of the distribution
● Other Assumption: Only deals with benign failures

Service Level Agreements

● Application can deliver its functionality in a bounded time

Fig-1 Service-oriented architecture of Amazon's platform

Design Consideration

● Sacrifice strong consistency for availability
● Always writeable
● Conflict resolution is executed during read instead of write
● Other principles:
  ○ Incremental scalability
  ○ Symmetry
  ○ Decentralization
  ○ Heterogeneity

System Architecture

Core techniques used:
● Partitioning
● Replication
● Versioning
● Membership
● Failure handling

API

● get(key)
  ○ Returns a single object or a list of objects, and a context
● put(key, context, object)
  ○ Uses key to determine the write replicas
  ○ Writes the replicas to disk
● Context
  ○ System metadata about the object

Partitioning

● Consistent Hashing
  ○ The output range of the hash function is treated as a "ring"
  ○ Pros: incrementally scalable; adding a single node does not affect the system significantly
  ○ Cons: leads to unevenly distributed load, and is oblivious to the heterogeneity in the performance of nodes

Partitioning

● "Virtual Node"
  ○ Each node can be responsible for more than one virtual node
  ○ Work distribution proportional to the capabilities of the individual node
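
A minimal sketch of a consistent-hash ring with virtual nodes may help make the two Partitioning slides concrete. The `Ring` class, the `ring_position` helper, and the node names and token counts are assumptions made for illustration; MD5 is used only as a stable hash that spreads keys over a large range.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a position on the ring using a stable hash."""
    return int(hashlib.md5(label.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, tokens_per_node: dict[str, int]):
        # Each physical node gets several tokens ("virtual nodes");
        # a more capable node can be given more of them.
        self.tokens = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node, count in tokens_per_node.items()
            for i in range(count)
        )
        self.positions = [pos for pos, _ in self.tokens]

    def owner(self, key: str) -> str:
        # Walk clockwise to the first token at or after the key's position,
        # wrapping around at the end of the ring.
        idx = bisect.bisect_left(self.positions, ring_position(key))
        return self.tokens[idx % len(self.tokens)][1]


# Hypothetical cluster: C is assumed to be twice as capable as A and B.
ring = Ring({"A": 8, "B": 8, "C": 16})
print(ring.owner("cart:42"))
```

Adding a node only inserts its own tokens into the sorted list, so only the keys that fall just before those tokens change owner.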

Replication

● Each data item is replicated at N hosts
● Preference list: The list of nodes that is responsible for storing a particular key
  ○ May contain more than N nodes due to failures
  ○ Contains only distinct physical nodes

Replication

● Example: N=3
  ○ Node B replicates the key k at nodes C and D in addition to storing it locally
  ○ Node D will store the keys in the ranges (A, B], (B, C], and (C, D]

Fig-4 Partitioning and replication of keys in Dynamo ring.
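
The preference list can be sketched as a clockwise walk that skips tokens belonging to a physical node already chosen. The token positions in `TOKENS` and the function name `preference_list` are hypothetical; this is an illustration of the idea, not Dynamo's actual code.

```python
import bisect

# Hypothetical ring: (position, physical node). A and B each hold two tokens.
TOKENS = [(10, "A"), (20, "B"), (30, "A"), (40, "C"), (50, "B"), (60, "D")]


def preference_list(key_pos: int, n: int) -> list[str]:
    """Walk clockwise from the key and collect the first n distinct physical nodes."""
    positions = [pos for pos, _ in TOKENS]
    start = bisect.bisect_left(positions, key_pos)
    chosen: list[str] = []
    for step in range(len(TOKENS)):
        node = TOKENS[(start + step) % len(TOKENS)][1]
        if node not in chosen:  # skip tokens of a node already in the list
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen


print(preference_list(15, 3))  # ['B', 'A', 'C']
```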

Data Versioning

● System is eventually consistent, thus a get() call may return many versions of the same object
● Challenge: An object can have distinct version sub-histories, which the system needs to reconcile in the future
● Solution: Syntactic reconciliation and Semantic reconciliation

Vector Clock

● A vector clock is a list of [node, counter] pairs
● Versioned object -> vector clock
● Update an object: put(key, context, object)
● "context" is obtained from an earlier read operation, which contains the vector clock information
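
As a rough illustration of how [node, counter] pairs order versions, the sketch below represents a clock as a dictionary. The helper names `descends`, `increment`, and `concurrent` and the node names Sx, Sy, Sz are assumptions chosen for the example.

```python
VectorClock = dict[str, int]  # node name -> counter


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if a equals or succeeds b: every counter in b is <= the one in a."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a new clock with the coordinating node's counter bumped."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither version descends from the other, so they conflict."""
    return not descends(a, b) and not descends(b, a)


d1 = increment({}, "Sx")  # {"Sx": 1}
d2 = increment(d1, "Sx")  # {"Sx": 2}
d3 = increment(d2, "Sy")  # written via Sy: {"Sx": 2, "Sy": 1}
d4 = increment(d2, "Sz")  # written via Sz: {"Sx": 2, "Sz": 1}

print(descends(d2, d1))    # True: d2 supersedes d1
print(concurrent(d3, d4))  # True: d3 and d4 are conflicting branches
```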

Syntactic reconciliation

What if?

Source: Rick and Morty S02E01

Semantic reconciliation

● Failures + concurrent updating => version branching
● Collapse
● Version branching is resolved by the data store or the application
  ○ Data store: latest write wins
  ○ Application: merge
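
The two resolution options can be contrasted with a small sketch. Both functions are hypothetical: `last_write_wins` assumes each version carries a timestamp, and `merge_carts` assumes the object is a shopping cart modeled as a set of item names.

```python
def last_write_wins(versions: list[tuple[str, float]]) -> str:
    """Data-store policy: keep the value with the largest timestamp."""
    return max(versions, key=lambda version: version[1])[0]


def merge_carts(versions: list[set[str]]) -> set[str]:
    """Application policy: union the branches so no added item is lost."""
    merged: set[str] = set()
    for cart in versions:
        merged |= cart
    return merged


print(last_write_wins([("old", 1.0), ("new", 2.0)]))        # new
print(sorted(merge_carts([{"book"}, {"book", "pen"}])))     # ['book', 'pen']
```

A union-style merge never loses an addition, but an item removed in one branch can reappear after the merge.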

Vector Clock Issue

● Vector clock may grow when many servers coordinate the writes to one object
● Truncation Scheme
  ○ Delete the oldest [node, counter] pair when the number of pairs reaches a threshold
● Further issue: Inefficiencies in reconciliation because of missing information
  ○ Has not surfaced in production
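
A possible shape for the truncation scheme is sketched below. It assumes each pair also carries the time at which that node last updated the object, so that "oldest" is well defined; the `truncate` name and the exact threshold handling (trim until at most `threshold` pairs remain) are assumptions.

```python
def truncate(clock: dict[str, tuple[int, float]], threshold: int) -> dict[str, tuple[int, float]]:
    """clock maps node -> (counter, last_update_timestamp).

    Drops the pairs with the oldest timestamps until at most
    `threshold` pairs remain. The input is left unchanged.
    """
    trimmed = dict(clock)
    while len(trimmed) > threshold:
        oldest = min(trimmed, key=lambda node: trimmed[node][1])
        del trimmed[oldest]
    return trimmed


clock = {"Sx": (3, 100.0), "Sy": (1, 50.0), "Sz": (2, 75.0)}
print(truncate(clock, 2))  # {'Sx': (3, 100.0), 'Sz': (2, 75.0)}
```

Dropping Sy here is what can later make two causally related versions look concurrent, which is the reconciliation inefficiency the slide mentions.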

Client request choices

● Generic load balancer
  ○ No code specific to Dynamo
  ○ Extra request forwarding step
● Partition-aware client library
  ○ Better performance

Execute get and put: Quorum in Dynamo

● The first reachable node in the preference list is the coordinator
● R: minimum number of nodes that must participate in a successful read operation
● W: minimum number of nodes that must participate in a successful write operation
● Setting R + W > N yields a quorum-like system
● The latency of a get() (or put()) operation is dictated by the slowest of the R (or W) replicas
● R and W are usually configured to be less than N, to provide better latency
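
The arithmetic behind these bullets is small enough to write out. In the sketch below, `is_quorum_like` checks the R + W > N condition and `operation_latency` treats the latency of an operation as that of the quorum-th fastest replica; both names and the sample latencies are made up for illustration.

```python
def is_quorum_like(n: int, r: int, w: int) -> bool:
    """R + W > N guarantees that every read set overlaps every write set."""
    return r + w > n


def operation_latency(replica_latencies_ms: list[float], quorum: int) -> float:
    """The coordinator returns once `quorum` replicas have answered, so the
    operation is as slow as the slowest replica among the fastest `quorum`."""
    return sorted(replica_latencies_ms)[quorum - 1]


print(is_quorum_like(3, 2, 2))             # True
print(is_quorum_like(3, 1, 1))             # False
print(operation_latency([12, 5, 40], 2))   # 12
```

With R = 2 the 40 ms straggler does not affect the read; with R = 3 it would set the latency.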

Execution of get() operation

get():
● Coordinator requests reads from N nodes, waits for R responses
● If the responses agree, returns the object with context
● If they disagree
  ○ If they are causally related, returns the most recent value
  ○ If they are causally unrelated, returns all versions
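
The decision the coordinator makes on the R responses can be sketched as a filter over (value, vector clock) pairs. The `reconcile` function and the dictionary representation of clocks are assumptions for illustration.

```python
def descends(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if clock a equals or succeeds clock b."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def reconcile(versions: list[tuple[str, dict[str, int]]]) -> list[tuple[str, dict[str, int]]]:
    """Drop every version that another response supersedes.

    One survivor means the responses agreed or were causally related;
    several survivors are causally unrelated and all go back to the client.
    """
    survivors = []
    for value, clock in versions:
        dominated = any(
            descends(other, clock) and other != clock for _, other in versions
        )
        if not dominated and (value, clock) not in survivors:
            survivors.append((value, clock))
    return survivors


print(reconcile([("v1", {"Sx": 1}), ("v2", {"Sx": 2})]))
# [('v2', {'Sx': 2})]
print(len(reconcile([("a", {"Sx": 2, "Sy": 1}), ("b", {"Sx": 2, "Sz": 1})])))
# 2
```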

Execution of put() operation

put():
● Coordinator generates the vector clock for the new version and writes the new version locally
● Forwards the new version (with its vector clock) to the N highest-ranked reachable nodes in the preference list
● Waits for W-1 or more writes to succeed

Handling Failures: Hinted Handoff

● "Always writeable"
● Avoid failure of read and write operations due to temporary node or network failures

Fig-6 Partitioning and replication of keys in Dynamo ring.
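
A toy model of hinted handoff is sketched below: when an intended replica is down, the write goes to the next healthy node along with a hint naming the intended owner, and is handed back once that owner recovers. The `Node` class and the `write` and `deliver_hints` functions are hypothetical names.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.store: dict[str, str] = {}
        self.hinted: list[tuple["Node", str, str]] = []  # (intended owner, key, value)


def write(key: str, value: str, preference: list[Node], n: int) -> None:
    """Write to the first n nodes; a down node's replica goes to a fallback with a hint."""
    fallbacks = [node for node in preference[n:] if node.up]
    for owner in preference[:n]:
        if owner.up:
            owner.store[key] = value
        elif fallbacks:
            fallbacks.pop(0).hinted.append((owner, key, value))


def deliver_hints(node: Node) -> None:
    """Hand hinted replicas back to owners that have recovered."""
    remaining = []
    for owner, key, value in node.hinted:
        if owner.up:
            owner.store[key] = value
        else:
            remaining.append((owner, key, value))
    node.hinted = remaining


a, b, c, d = Node("A"), Node("B"), Node("C"), Node("D")
a.up = False
write("k", "v", [a, b, c, d], n=3)  # D holds A's replica as a hint
a.up = True
deliver_hints(d)
print(a.store)  # {'k': 'v'}
```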

Handling permanent Failures: Replica Synchronization

● Merkle tree:
  ○ Parent nodes are hashes of their (immediate) children
  ○ Comparison of parents at the same level tells the difference in children
  ○ Does not require transferring entire (key, value) pairs
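
A compact sketch of the comparison follows. It assumes both replicas hold the same set of keys, so that the two trees have the same shape, and it uses SHA-256 as the hash; `build_levels` and `differing_leaves` are illustrative names.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(items: list[tuple[str, str]]) -> list[list[bytes]]:
    """Return the tree as levels: leaves first, root last."""
    level = [h(f"{key}={value}".encode()) for key, value in sorted(items)]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its (up to two) immediate children.
        level = [h(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Descend from the root, following only the branches whose hashes differ."""
    if a[-1] == b[-1]:
        return []  # equal roots: replicas are in sync, nothing to transfer
    suspects = [0]
    for depth in range(len(a) - 1, 0, -1):
        below_a, below_b = a[depth - 1], b[depth - 1]
        suspects = [
            child
            for idx in suspects
            for child in (2 * idx, 2 * idx + 1)
            if child < len(below_a) and below_a[child] != below_b[child]
        ]
    return suspects


fresh = build_levels([("k0", "v0"), ("k1", "v1"), ("k2", "v2"), ("k3", "v3")])
stale = build_levels([("k0", "v0"), ("k1", "v1"), ("k2", "old"), ("k3", "v3")])
print(differing_leaves(fresh, stale))  # [2]
```

Only the hashes along the differing path are compared, and only the key at leaf index 2 needs to be transferred.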

The power of gossip

The power of gossip

● Ring Membership
  ○ All nodes exchange their membership histories
  ○ Each node randomly contacts a peer every second
  ○ Eventually consistent
  ○ Each node forwards a key's read/write operations directly to the right set of nodes

The power of gossip

● External Discovery
  ○ Nodes may not know each other - logical partitions
  ○ Seed nodes to avoid logical partitions

The power of gossip

● Failure Detection
  ○ Detecting failure locally is sufficient
  ○ Periodically retry failed node(s)
  ○ No need for a decentralized failure detector

Implementation

● Java
● Local persistence component allows for different storage engines to be plugged in:
  ○ Berkeley Database (BDB) Transactional Data Store: objects of tens of kilobytes
  ○ MySQL: objects larger than tens of kilobytes

Main modes of Dynamo

● Business logic specific reconciliation
  ○ Merge
  ○ Application-specific reconciliation
● Timestamp based reconciliation
  ○ Last write wins
● High performance read engine
  ○ R = 1 and W = N; Dynamo lets services partition and replicate their data across multiple nodes, thereby offering incremental scalability

Experiences

● N: durability
● W and R: availability, durability and consistency
  ○ Increasing W can increase durability but reduce availability
● (N, R, W) = (3, 2, 2) meets the required performance, durability, consistency, and availability SLAs

Performance

● Guarantee Service Level Agreements (SLA)
  ○ Latencies: diurnal pattern (follows the incoming request rate)
  ○ Write latencies >> Read latencies
● 99.9th percentile latencies around 200 ms

Fig-9 Average and 99.9th percentile latencies for read and write requests during the peak season of Dec. 2006

Better Performance

● Trade durability for better performance
● Each storage node maintains an object buffer in its main memory
● Objects in the buffer are written to disk periodically by a writer thread
● Reads check the in-memory buffer first

Fig-10 Comparison of performance of 99.9th percentile latencies for buffered vs. non-buffered writes over a period of 24 hours

Balance

● Out of balance
  ○ If the node's request load deviates from the average load by more than a certain threshold (here 15%)
● Imbalance ratio decreases with increasing load
● Under high loads, a large number of popular keys are accessed and the load is evenly distributed

Fig-11 Fraction of nodes that are out of balance, and their corresponding request load. The interval between ticks on the x-axis corresponds to a time period of 30 mins.
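
The out-of-balance definition translates directly into a short calculation. The function name and the sample loads below are made up for illustration.

```python
def out_of_balance_fraction(loads: list[float], threshold: float = 0.15) -> float:
    """Fraction of nodes whose load deviates from the average by more than the threshold."""
    average = sum(loads) / len(loads)
    out = [load for load in loads if abs(load - average) > threshold * average]
    return len(out) / len(loads)


print(out_of_balance_fraction([90, 100, 110, 100]))  # 0.0
print(out_of_balance_fraction([50, 100, 100, 150]))  # 0.5
```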

Partitioning and placement of key

Strategy 1: T random tokens per node and partition by token value
● Slow bootstrapping process
● Recalculation of the Merkle tree
● Data partitioning and data placement are intertwined

Strategy 2 and Strategy 3
● Equal size partitioning strategies to distribute load uniformly

Server-driven and Client-driven Coordination

● Use a state machine to handle incoming requests
● Move the state machine to the client nodes

Balancing background & foreground

● Each node performs both background and foreground operations
● Background tasks trigger resource contention
● Admission controller: changes the runtime slices of the resource available to background tasks

Conclusion

● Dynamo is a highly available and scalable data store for Amazon's e-commerce platform.
● Techniques:
  ○ Gossiping for membership and failure detection
  ○ Consistent hashing for node and key distribution
  ○ Object versioning for eventually-consistent data objects
  ○ Quorums for partition/failure tolerance
  ○ Merkle tree for resynchronization after failures/partitions

Questions?

Handling permanent Failures: Replica Synchronization

● Comparing two nodes that are synchronized
  ○ Two (key, value) pairs: (k0, v0) & (k1, v1)

Fig-7 Replica Synchronization

Handling permanent Failures: Replica Synchronization

● Comparing two nodes that are not synchronized
  ○ Two (key, value) pairs: (k0, v0) & (k1, v1)

Fig-8 Replicas Not Synchronized

Partitioning and placement of key

Strategy 1:
● T random tokens per node and partition by token value

Problems:
● Slow bootstrapping process
● Recalculation of the Merkle tree
● Complicated archival process

Fig-12 Partitioning and placement of key, strategy 1

Partitioning and placement of key

Fig-13 Partitioning and placement of key, strategy 2

Strategy 2:
● T random tokens per node and equal sized partitions
● Divides the hash space into Q equally sized partitions

Partitioning and placement of key

Fig-14 Partitioning and placement of key, strategy 3

Strategy 3: Q/S tokens per node, equal-sized partitions

• Divides the hash space into Q equally sized partitions
• Each node is assigned Q/S tokens, where S is the number of nodes in the system
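
Strategy 3 can be illustrated with a fixed partition count and a simple assignment. The 128-bit MD5 ring, the `partition_of` and `assign` helpers, and the round-robin placement are assumptions made for this sketch; the slides do not say how the Q/S tokens are chosen for each node.

```python
import hashlib
from collections import Counter

RING_SIZE = 2 ** 128  # MD5 produces 128-bit positions


def partition_of(key: str, q: int) -> int:
    """Equal-sized partitions: the partition depends only on the key's hash and Q."""
    position = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return position * q // RING_SIZE


def assign(q: int, nodes: list[str]) -> dict[int, str]:
    """Give each of the S nodes Q/S partitions (round-robin, assuming S divides Q)."""
    return {partition: nodes[partition % len(nodes)] for partition in range(q)}


placement = assign(12, ["A", "B", "C"])
print(Counter(placement.values()))  # each node owns 4 partitions
print(placement[partition_of("cart:42", 12)])
```

Because partition boundaries are fixed, membership changes only alter the placement map; the partitioning of the key space itself stays the same, which is what decouples partitioning from placement.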

Thank you