design principles of scalable, distributed systems

Upload: tinniam-ganesh

Post on 05-Dec-2014

DESCRIPTION

Key algorithms of scalable, distributed systems

TRANSCRIPT

Page 1: Design principles of scalable, distributed systems

04/10/23 Tinniam V Ganesh - http://gigadom.wordpress.com

1

Design Principles of Scalable, Distributed Systems

Tinniam V Ganesh

[email protected]

Page 2: Design principles of scalable, distributed systems

Distributed Systems

There are two classes of systems:
- Monolithic
- Distributed

Page 3: Design principles of scalable, distributed systems

Traditional Client Server Architecture

[Figure: client and server]

Page 4: Design principles of scalable, distributed systems

Properties of Distributed Systems

Distributed systems are made up of hundreds of commodity servers.
• No machine has complete information about the system state
• Machines make decisions based on local information
• Failure of one machine does not bring down the system as a whole
• There is no implicit assumption about a global clock

Page 5: Design principles of scalable, distributed systems

Characteristics of Distributed Systems

Distributed systems are characterized by
• Commodity servers
• A large number of servers
• Servers that crash, networks that fail, and messages that are not sent or not received
• New servers that can join without changing the behavior of the system

Page 6: Design principles of scalable, distributed systems

Examples of Distributed Systems
• Amazon’s e-retail store
• Google
• Yahoo
• Facebook
• Twitter
• YouTube
etc.

Page 7: Design principles of scalable, distributed systems

Key principles of distributed systems

• Incremental scalability
• Symmetry – All nodes are equal
• Decentralization – No central control
• Heterogeneity in work distribution

Page 8: Design principles of scalable, distributed systems

Transaction Processing System
• Traditional databases have to ensure that transactions are consistent. A transaction must complete fully or not at all.
• Successful transactions are committed.
• Otherwise, transactions are rolled back.

Page 9: Design principles of scalable, distributed systems

ACID postulate

Transactions in a traditional system must have the following properties. Earlier systems were designed for ACID properties.
A – Atomic
C – Consistent
I – Isolated
D – Durable

Page 10: Design principles of scalable, distributed systems

ACID

Atomic – This property ensures that each transaction happens completely or not at all.

Consistent – The transaction should maintain system invariants. For example, after an internal bank transfer the total amount in the bank should be the same as before the transaction. The total may differ temporarily while the transaction is in progress.

Isolated – Different transactions should be isolated or serializable. It must appear that transactions happen sequentially in some particular order.

Durable – Once the transaction commits, its effect is complete and durable going forward.
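
The atomicity and consistency properties above can be illustrated with a small sketch. It uses Python's standard `sqlite3` module as a stand-in for a traditional database; the `accounts` table, the account names and the `transfer` helper are assumptions made for this example only.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False  # the debit above was rolled back

def balance_of(conn, name):
    return conn.execute("SELECT balance FROM accounts WHERE name = ?",
                        (name,)).fetchone()[0]

def total(conn):
    """The invariant: the sum of all balances."""
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and commits
failed = transfer(conn, "alice", "bob", 500)  # fails and leaves no trace
```

The failed transfer leaves both balances untouched, so the total in the bank is the same before and after every transaction.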

Page 11: Design principles of scalable, distributed systems

Scaling

There are 2 types of scaling.

Vertical scaling – This method scales by adding a faster CPU, more memory and a larger database. It does not scale beyond a particular point.

Horizontal scaling – This method scales laterally by adding more servers with the same capacity.

Page 12: Design principles of scalable, distributed systems

System behavior on Scaling

[Figure: response time and throughput (transactions per second) plotted against load]

Page 13: Design principles of scalable, distributed systems

Consistency and Replication

In order to increase reliability against failures, data has to be replicated across multiple servers.

The problem with replicas is the need to keep the data consistent.

Page 14: Design principles of scalable, distributed systems

Reasons for Replication

Data is replicated in distributed systems for two reasons:
- Reliability – The data survives the failure of a replica, provided it is consistent across a majority of the replicas
- Performance – Performance can be improved by accessing a replica that is closer to the user. Replicas in different locations also provide geographical resiliency

Page 15: Design principles of scalable, distributed systems

Downside of Replication
• Replication of data has several advantages, but the downside is the problem of maintaining consistency
• A modification of one copy makes it different from the rest, and this update has to be propagated to all copies to ensure consistency

Page 16: Design principles of scalable, distributed systems

Synchronization

No machine has a view of the global system state.

Problems with distributed systems:
• How can processes synchronize?
• Clocks on different systems will be slightly different
• Is there a way to maintain a global view of the clock?
• Can we order events causally?

Page 17: Design principles of scalable, distributed systems

Hypothetical situation

Consider a hypothetical situation with banks:
- Man deposits Rs 55,000/- at 10.00 am
- Man withdraws Rs 20,000/- at 10.02 am

What will happen if the updates are applied in a different order?
- Operations must be idempotent. Idempotency means getting the same result no matter how many times the operation is performed.

eCommerce site – Amazon
- Add to shopping cart
- Delete from shopping cart
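
A minimal sketch of what idempotency means for the operations above. The function names and the request-ID scheme are assumptions for illustration; they are not taken from any particular system.

```python
def add_to_cart(cart, item):
    """Idempotent: adding the same item again leaves the cart unchanged."""
    return cart | {item}

def deposit(balance, amount):
    """Not idempotent: every replay of the message changes the balance again."""
    return balance + amount

def deposit_once(account, request_id, amount):
    """Idempotent variant: a request ID that was already applied is ignored."""
    balance, seen = account
    if request_id in seen:
        return account
    return balance + amount, seen | {request_id}

cart = add_to_cart(add_to_cart(frozenset(), "book"), "book")   # delivered twice
replayed = deposit(deposit(0, 55000), 55000)                   # delivered twice
account = deposit_once(deposit_once((0, frozenset()), "req-1", 55000),
                       "req-1", 55000)                         # delivered twice
```

Here the cart holds one book and the account built with `deposit_once` holds 55000, while the plain `deposit` has been applied twice.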

Page 18: Design principles of scalable, distributed systems

Vector Clocks

Vector clocks are used to capture causality between different versions of the same object.
Amazon’s Dynamo uses vector clocks to reconcile different versions of the objects and determine the causal ordering of events.
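
The sketch below shows one common way to represent and compare vector clocks, as a dictionary of per-node counters. It is a simplified illustration and not Dynamo's actual code; the node names `Sx`, `Sy`, `Sz` and all function names are assumptions.

```python
def increment(clock, node):
    """Return a copy of `clock` with the counter of `node` advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    """Classify a relative to b: 'equal', 'after', 'before' or 'concurrent'."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"   # neither version is an ancestor of the other

def merge(a, b):
    """Element-wise maximum, used after conflicting versions are reconciled."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

d1 = increment({}, "Sx")              # first write, handled by node Sx
d2 = increment(d1, "Sx")              # overwrite, again handled by Sx
d3 = increment(d2, "Sy")              # update of d2 handled by Sy
d4 = increment(d2, "Sz")              # a different update of d2 handled by Sz
d5 = increment(merge(d3, d4), "Sx")   # reconciled version written through Sx
```

Versions `d3` and `d4` are concurrent, so neither can be discarded automatically; the reconciled version `d5` descends from both.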

Page 19: Design principles of scalable, distributed systems

Vector Clocks

[Figure: timelines of three processes whose clocks advance at different rates, with message arrivals marked "OK" or "Adjust"]

Page 20: Design principles of scalable, distributed systems

Dynamo’s reconciliation process

Page 21: Design principles of scalable, distributed systems

Problem with Relational Databases

Relational databases (RDBMS) give the user the ability to construct complex queries, but they do not scale well.

Problem: Performance deteriorates as the number of records reaches several million.

Solution: Partition the database horizontally and distribute the records across several servers.

Page 22: Design principles of scalable, distributed systems

NoSQL Databases
• Databases are horizontally partitioned
• Simple queries based on gets() and sets()
• Access is by key/value pairs
• Cannot do complex queries like joins
• A database can contain several hundred million records

Page 23: Design principles of scalable, distributed systems

Databases that use Consistent Hashing

1. Cassandra
2. Amazon’s Dynamo
3. NoSQL
4. HBase
5. CouchDB
6. MongoDB

Page 24: Design principles of scalable, distributed systems

Hash Tables
• Records are distributed among many servers
• Distribution is based on the key, which is hashed
• Keys are 128 or 160 bits long
• Hash values fall into a range; the servers are visualized as lying on the circumference of a circle, going clockwise

Page 25: Design principles of scalable, distributed systems

Distributed Hash Table
• Hashing a key leads to a server; the servers are assumed to reside on the circumference of a circle
• The highest key wraps around to the beginning of this circle
• The movement is clockwise

Page 26: Design principles of scalable, distributed systems

Distributed Hash Table

An entity with key K falls under the jurisdiction of the node with the smallest id >= K.

• For example, suppose we have two nodes, one at position 50 and another at position 200.
• If we want to store a key/value pair in the DHT and the key hash is 100, it would go to node 200.
• Another key hash of 30 would go to node 50.
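
The rule "smallest id >= K" can be sketched with a binary search over the sorted node positions. The function name `responsible_node` is an assumption for this example; the wrap-around for keys beyond the last node follows the circular layout described on the earlier slides.

```python
from bisect import bisect_left

def responsible_node(node_ids, key_hash):
    """Return the node with the smallest id >= key_hash, wrapping around the ring."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key_hash)   # index of the first id >= key_hash
    return ids[i % len(ids)]         # past the last node, wrap to the first

nodes = [50, 200]
```

With these two nodes, a key hash of 100 maps to node 200, a key hash of 30 maps to node 50, and a key hash above 200 wraps around to node 50.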

Page 27: Design principles of scalable, distributed systems

Consistent Hashing

A naïve approach with 8 nodes and 100 keys could use a simple modulo algorithm, so key 18 would end up on node 2 and key 63 on node 7.

But how do we handle servers crashing or new servers joining the system? With modulo placement, changing the number of nodes moves most keys to a different node. Consistent hashing handles this issue.
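
A rough comparison of the two schemes when a ninth node joins. This is a sketch under stated assumptions: MD5 is used only to place nodes and keys on the ring, the node names `node0` to `node8` are made up, and no virtual nodes are used.

```python
import hashlib
from bisect import bisect_left

def ring_position(value):
    """Place a node name or key on the ring using a 128-bit MD5 hash."""
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16)

def ring_owner(nodes, key):
    """Node at the first ring position >= the key's position, wrapping around."""
    points = sorted((ring_position(n), n) for n in nodes)
    positions = [p for p, _ in points]
    i = bisect_left(positions, ring_position(key))
    return points[i % len(points)][1]

keys = range(100)
old_nodes = ["node%d" % i for i in range(8)]
new_nodes = old_nodes + ["node8"]

# Modulo placement: a key moves whenever k % 8 differs from k % 9.
moved_modulo = [k for k in keys if k % 8 != k % 9]

# Consistent hashing: a key moves only if the new node now sits in front of it.
moved_ring = [k for k in keys if ring_owner(old_nodes, k) != ring_owner(new_nodes, k)]

print(len(moved_modulo), "keys move with modulo placement")
print(len(moved_ring), "keys move with consistent hashing")
```

With modulo placement 84 of the 100 keys change nodes. On the ring, the only keys that move are those taken over by the new node; every other key stays where it was.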

Page 28: Design principles of scalable, distributed systems

Consistent Hashing

Source: http://offthelip.org/

Page 29: Design principles of scalable, distributed systems

Distributed Hash Table

Page 30: Design principles of scalable, distributed systems

Consistent Hashing

Source: http://horicky.blogspot.in

Page 31: Design principles of scalable, distributed systems

Chord System

[Figure: Chord ring with nodes 1, 4, 9, 14, 18, 20, 21 and 28, each with its finger table, showing the resolution of K = 26]

FTp[i] = succ(p + 2^(i-1))

Finger tables shown in the figure (entries i = 1 to 5):
Node 1: 4, 4, 9, 9, 18
Node 18: 20, 20, 28, 28, 4
Node 20: 21, 28, 28, 28, 4
Node 21: 28, 28, 28, 1, 9
Node 28: 1, 1, 1, 4, 14

Page 32: Design principles of scalable, distributed systems

Process of determining node

To look up a key k, node p forwards the request to the node q with index j in p’s finger table such that

q = FTp[j] <= k < FTp[j+1]

To resolve k = 26, starting at node 1:
1. 26 > FT1[5] = 18. Hence the request is forwarded to node 18.
2. FT18[2] = 20 <= 26 < FT18[3] = 28. Hence the request is forwarded to node 20.
3. FT20[1] = 21 <= 26 < FT20[2] = 28. Hence the request is forwarded to node 21.
4. 21 < 26 <= FT21[1] = 28. Hence node 28 is responsible for key 26.
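
The lookup above can be reproduced with a short simulation. It is a sketch, not a full Chord implementation: it assumes a 5-bit identifier space (32 positions) and the node set 1, 4, 9, 14, 18, 20, 21, 28, and it computes finger tables on demand from global knowledge of the nodes, which a real node would not have.

```python
from bisect import bisect_left

M = 5                                   # identifier bits (assumed); ring size 2**M = 32
RING = 2 ** M
NODES = [1, 4, 9, 14, 18, 20, 21, 28]   # sorted node ids (assumed for this example)

def succ(k):
    """First node whose id is >= k, wrapping around the ring."""
    i = bisect_left(NODES, k % RING)
    return NODES[i % len(NODES)]

def finger_table(p):
    """FTp[i] = succ(p + 2^(i-1)) for i = 1..M, returned as a 0-based list."""
    return [succ(p + 2 ** (i - 1)) for i in range(1, M + 1)]

def between(x, a, b):
    """True if x lies in the ring interval (a, b], walking clockwise from a."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def lookup(start, k):
    """Return the nodes visited; the last one is responsible for key k."""
    path = [start]
    p = start
    while p != k:                        # a node is responsible for its own id
        ft = finger_table(p)
        if between(k, p, ft[0]):         # k lies between p and its successor
            path.append(ft[0])
            break
        # otherwise forward to the farthest finger that does not pass k
        p = next(q for q in reversed(ft) if between(q, p, k))
        path.append(p)
    return path

print(lookup(1, 26))   # → [1, 18, 20, 21, 28]
```

Each hop roughly halves the remaining distance to the key, which is where the logarithmic number of steps comes from.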

Page 33: Design principles of scalable, distributed systems

Hashing efficiency of Chord System

The Chord System gets to the node in O(log n) steps.
There are other hashing techniques that get to the node in O(1) steps but use a larger local table. For example, … attains an O(1) hashing method.

Page 34: Design principles of scalable, distributed systems

Joining the Chord System

Suppose node p wants to join. It performs the following steps:
- Requests a lookup for succ(p+1)
- Inserts itself before this node

Page 35: Design principles of scalable, distributed systems

Maintaining consistency

Periodically each node checks its successor’s predecessor.

Node q contacts succ(q+1) and requests it to return pred(succ(q+1)).

If q = pred(succ(q+1)) then nothing has changed. If the node returns another value, then q knows that a new node p has joined the system with q < p < succ(q+1), so q updates its finger table by setting FTq[1] = p.

Page 36: Design principles of scalable, distributed systems

CAP Theorem

Databases that are designed based on ACID properties have poor availability.

Postulated by Eric Brewer of the University of California, Berkeley.
At most 2 of the 3 properties can be guaranteed at the same time in a distributed system:
C – Consistency
A – Availability
P – Partition Tolerance

Page 37: Design principles of scalable, distributed systems

CAP Theorem
• Consistency – Ability for repeated reads to provide the same value
• Availability – Ability to be resilient to server crashes
• Partition Tolerance – Ability to keep operating when a network failure splits the servers into groups that cannot communicate with each other

Page 38: Design principles of scalable, distributed systems

Real world examples of CAP Theorem

Amazon’s Dynamo chooses availability over consistency. Dynamo implements eventual consistency, where data becomes consistent over time.

Google’s BigTable chooses consistency over availability.

Consistency, Partition Tolerance (CP)
- BigTable
- HBase

Availability, Partition Tolerance (AP)
- Dynamo
- Voldemort
- Cassandra

Page 39: Design principles of scalable, distributed systems

Consistency issues

Data replication in many commercial systems performs synchronous replica coordination to provide strongly consistent data.
The downside of this approach is poor availability.
These systems treat the data as unavailable if they are not able to ensure consistency.

For example, if data is replicated on 5 servers and an update needs to be made, then the following has to be done:
- Update all 5 copies
- Ensure all of them are successful
- If one of them fails, roll back the updates on the other 4

If a read is done when one of the servers has failed, a strongly consistent system would return “data unavailable” when correctness cannot be determined.

Page 40: Design principles of scalable, distributed systems

Quorum Protocol

To maintain consistency, data is replicated on many servers. For example, let us assume there are N servers in the system.

Typical algorithms require a write to reach more than half of the servers, that is, at least N/2 + 1 of them. Usually Nw > N/2.

A write is successful if it has been committed on at least N/2 + 1 servers. This is known as the write quorum.

Page 41: Design principles of scalable, distributed systems

Quorum Protocol

Similarly, reads are done from an arbitrary number of server replicas, Nr. This is known as the read quorum.
Reads from different servers are compared.
A consistent design requires that Nw + Nr > N, so that every read quorum shares at least one server with every write quorum.
With this you are assured of reading your writes.
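
The quorum conditions can be checked mechanically. The sketch below uses hypothetical helper names; `always_overlap` verifies by brute force that every possible read set shares a server with every possible write set, which is practical only for small N.

```python
from itertools import combinations

def is_valid_quorum(n, nw, nr):
    """Check the two conditions: Nw > N/2 and Nw + Nr > N."""
    return nw > n / 2 and nw + nr > n

def always_overlap(n, nw, nr):
    """True if every write set of size nw intersects every read set of size nr."""
    servers = range(n)
    return all(set(w) & set(r)
               for w in combinations(servers, nw)
               for r in combinations(servers, nr))

print(is_valid_quorum(5, 3, 3), always_overlap(5, 3, 3))   # → True True
print(is_valid_quorum(5, 3, 2), always_overlap(5, 3, 2))   # → False False
```

With N = 5, Nw = 3 and Nr = 2, a read of servers 3 and 4 can miss a write that went to servers 0, 1 and 2, which is why the sum must exceed N.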

Page 42: Design principles of scalable, distributed systems

Election Algorithm

Many distributed systems have one process that acts as a coordinator. If the coordinator crashes, an election takes place to identify the new coordinator.

1. P sends an ELECTION message to all higher numbered processes
2. If no one responds, P becomes coordinator
3. If a higher numbered process answers, it takes over the election process
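
The three steps above are commonly known as the bully algorithm. A minimal single-process simulation is sketched below; message passing and timeouts are replaced by a set of live process numbers, and the function name is an assumption.

```python
def bully_election(initiator, alive):
    """Return the coordinator elected when `initiator` starts an election.

    `alive` is the set of process numbers that are currently running;
    the initiator itself is assumed to be alive.
    """
    current = initiator
    while True:
        # Step 1: send ELECTION to every higher numbered process
        responders = [p for p in alive if p > current]
        # Step 2: nobody answered, so the current process wins
        if not responders:
            return current
        # Step 3: a higher process takes over; follow the lowest responder
        # here, the outcome is the same whichever responder continues
        current = min(responders)

alive = {1, 2, 4, 5, 6}          # process 7, the old coordinator, has crashed
print(bully_election(4, alive))  # → 6
```

The highest numbered live process always ends up as coordinator, regardless of which process notices the crash first.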

Page 43: Design principles of scalable, distributed systems

Traditional Fault Tolerance

Traditional systems use redundancy to handle failures and be tolerant to faults, as shown below.

[Figure: active and standby server pairs]

Page 44: Design principles of scalable, distributed systems

Process Resilience

Handling failures in distributed systems is much more difficult, as no system has any view of the global state.

Page 45: Design principles of scalable, distributed systems

Byzantine Failures

Byzantine refers to the Byzantine Generals Problem, in which the generals of an army must unanimously decide whether to attack another army. The problem is complicated because the generals must use messengers to communicate and because some of them may be traitors.

Distributed systems are prone to a type of failure known as Byzantine failures:
Omission failures – Disk crashes, network congestion, failure to receive a request etc.
Commission failures – Failures where the server behaves incorrectly, corrupts local state etc.

Solution: To handle Byzantine failures in which k processes are faulty, a minimum of 2k+1 processes is needed, so that k+1 correct replies remain even when k processes are behaving incorrectly.
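
The 2k+1 rule can be sketched as a vote over replies: a value is accepted only if at least k+1 processes agree on it. The function name `vote` and the reply values are assumptions for illustration.

```python
from collections import Counter

def vote(replies, k):
    """Return the value backed by at least k+1 replies, or None if there is none."""
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= k + 1 else None

# k = 2 faulty processes among 2k+1 = 5: the 3 correct replies still win.
print(vote(["commit", "commit", "abort", "commit", "abort"], k=2))   # → commit

# With only 2k = 2 processes and k = 1 faulty, no value reaches k+1 votes.
print(vote(["commit", "abort"], k=1))                                # → None
```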

Page 46: Design principles of scalable, distributed systems

Checkpointing

In fault tolerant distributed computing, backward error recovery requires that the system save its state at regular intervals. We need to create a consistent global state called a distributed snapshot.

In a distributed snapshot, if a process P has recorded the receipt of a message, then there should be a process Q that has recorded sending the corresponding message.

Each process saves its state from time to time. To recover, we need to construct a consistent global state from these local states.
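
The consistency condition for a distributed snapshot can be expressed as a simple check over the saved local states. The data layout below (each process recording sets of sent and received message ids) is an assumption made for this sketch.

```python
def is_consistent(snapshot):
    """True if every message recorded as received was also recorded as sent.

    `snapshot` maps a process name to {"sent": set_of_ids, "received": set_of_ids}.
    """
    sent = set().union(*(state["sent"] for state in snapshot.values()))
    received = set().union(*(state["received"] for state in snapshot.values()))
    return received <= sent

good = {
    "P": {"sent": set(), "received": {"m1"}},
    "Q": {"sent": {"m1", "m2"}, "received": set()},   # m2 is still in transit
}
bad = {
    "P": {"sent": set(), "received": {"m1"}},
    "Q": {"sent": set(), "received": set()},          # Q saved its state before sending m1
}
```

A message that was sent but not yet received (`m2` above) is acceptable; a message that was received but never sent is not, so recovery would have to fall back to an earlier set of local checkpoints.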

Page 47: Design principles of scalable, distributed systems

Gossip Protocol

Used to handle server crashes and servers joining the system.
Changes to the distributed system, such as membership changes, are spread in a manner similar to gossiping:
- A server picks another random server and sends a message regarding a server crash or a server joining
- If the receiver has already received this message, it is dropped
- The receiving server similarly gossips to other servers, and the system soon reaches a steady state
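
A small simulation of the spreading process described above. It is a sketch under simplifying assumptions: rounds are synchronous, every informed server contacts exactly one random peer per round, and no messages are lost. The function name and the seed parameter are made up for the example.

```python
import random

def gossip_rounds(n, seed=0):
    """Count the rounds until one piece of news reaches all n servers."""
    rng = random.Random(seed)
    informed = {0}                      # server 0 learns about the change first
    rounds = 0
    while len(informed) < n:
        for server in list(informed):
            # pick another server at random and tell it the news
            peer = rng.choice([s for s in range(n) if s != server])
            informed.add(peer)          # a server that already knows drops the duplicate
        rounds += 1
    return rounds

print(gossip_rounds(64), "rounds to inform 64 servers")
```

The number of informed servers can at most double in each round, so at least 6 rounds are needed for 64 servers; in practice the total stays close to a small multiple of log n.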

Page 48: Design principles of scalable, distributed systems

Sloppy Quorum

The quorum protocol is applied to the first N healthy nodes found while walking clockwise around the ring, rather than strictly to the first N nodes.

Data meant for Node A is sent to Node D if A is temporarily down. Node D keeps a hint in its metadata naming A as the intended recipient (hinted handoff) and delivers the data to Node A when it is back up.
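
The Node A / Node D scenario can be sketched as follows. This is an illustration only, not Dynamo's implementation: the ring is a plain list of node names, and `stores`, `hints` and all function names are assumptions.

```python
def healthy_nodes(ring, start, n, healthy):
    """First n healthy nodes found walking clockwise from index `start`."""
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)]
        if node in healthy:
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen

def write(ring, start, n, healthy, key, value, stores, hints):
    """Store the value on n healthy nodes and record hints for skipped nodes."""
    intended = [ring[(start + i) % len(ring)] for i in range(n)]
    actual = healthy_nodes(ring, start, n, healthy)
    down = [node for node in intended if node not in healthy]
    stand_ins = [node for node in actual if node not in intended]
    for node in actual:
        stores[node][key] = value
    for stand_in, owner in zip(stand_ins, down):
        hints[stand_in].append((owner, key, value))   # remember who it was meant for

def handoff(node, healthy, stores, hints):
    """Deliver hinted data to owners that are back up; keep the remaining hints."""
    remaining = []
    for owner, key, value in hints[node]:
        if owner in healthy:
            stores[owner][key] = value
        else:
            remaining.append((owner, key, value))
    hints[node] = remaining

ring = ["A", "B", "C", "D", "E"]
stores = {node: {} for node in ring}
hints = {node: [] for node in ring}

write(ring, 0, 3, {"B", "C", "D", "E"}, "cart", "book", stores, hints)   # A is down
hint_on_d = list(hints["D"])                                             # D holds A's data
handoff("D", {"A", "B", "C", "D", "E"}, stores, hints)                   # A is back up
```

After the handoff, Node A holds the value it missed and Node D no longer carries the hint.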

Page 49: Design principles of scalable, distributed systems

Thank You!

Tinniam V Ganesh

[email protected]

Read my blogs: http://gigadom.wordpress.com/

http://savvydom.wordpress.com/