Scalable Persistent Storage for Erlang: Theory and Practice

Amir Ghaffari, Natalia Chechina, Phil Trinder

May 7, 2013



Page 2: Outline

• Why Persistent Storage?
• General principles of scalable DBMSs
• NoSQL DBMSs for Erlang
• Reliability of Riak in Practice
• Scalability of Riak in Practice
• Investigating the scalability of distributed Erlang
• Conclusion & Future work

Page 3: RELEASE project

• RELEASE is a European project aiming to scale Erlang onto commodity architectures with 100000 cores.

Page 4: Why Persistent Storage?

• Erlang applications need to store their data persistently.
• Scalability limits of persistent storage can limit the scalability of Erlang applications.

Page 5: General principles of scalable DBMSs

Data Fragmentation
1. Decentralized model (e.g. P2P model)
2. Systematic load balancing (make life easier for developer)
3. Location transparency

e.g. 20kB of data is fragmented among 10 nodes:

0-2k, 2k-4k, 4k-6k, ..., 16k-18k, 18k-20k
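The fragmentation example can be made concrete with a small sketch. It assumes the 20k of data is 20,000 addressable units split into equal contiguous ranges, one per node. The function names `fragment_ranges` and `node_for_offset` are hypothetical and are not taken from any of the DBMSs discussed in this talk.

```python
def fragment_ranges(total_size, node_count):
    """Split [0, total_size) into node_count contiguous half-open ranges."""
    step = total_size // node_count
    ranges = []
    for index in range(node_count):
        start = index * step
        # The last node absorbs any remainder so that no data is left unowned.
        end = total_size if index == node_count - 1 else start + step
        ranges.append((start, end))
    return ranges


def node_for_offset(offset, total_size, node_count):
    """Return the index of the node that owns the given offset."""
    if not 0 <= offset < total_size:
        raise ValueError("offset outside the stored data")
    step = total_size // node_count
    return min(offset // step, node_count - 1)


# Assumption: "20k" is read as 20,000 units spread over 10 nodes.
ranges = fragment_ranges(20_000, 10)
owner = node_for_offset(4_500, 20_000, 10)
```

A caller needs only the offset to find the owning node, which is one simple way to hide data location from the application.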

Page 6: General principles of scalable DBMSs

Replication
1. Decentralized model (e.g. P2P model)
2. Location transparency
3. Asynchronous replication (a write is considered complete as soon as one node acknowledges it)

e.g. Key X is replicated on three nodes
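A minimal sketch of asynchronous replication follows. It is a single-process toy model, not the mechanism of any particular DBMS: the classes `Replica` and `AsyncReplicatedStore` and the explicit `propagate` step are assumptions made for illustration. The write returns once the first replica has stored the value, and the remaining replicas catch up later.

```python
from collections import deque


class Replica:
    """One storage node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class AsyncReplicatedStore:
    """Toy model: a write completes after one replica has acknowledged it."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.pending = deque()  # (replica, key, value) updates not yet applied

    def put(self, key, value):
        first, *others = self.replicas
        first.data[key] = value  # the single acknowledgement
        for replica in others:
            self.pending.append((replica, key, value))
        return "ok"  # returned before the other replicas are updated

    def propagate(self):
        """Apply queued updates; stands in for background replication."""
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


replicas = [Replica("n1"), Replica("n2"), Replica("n3")]
store = AsyncReplicatedStore(replicas)
ack = store.put("X", "v1")
stale_before = [r.data.get("X") for r in replicas]
store.propagate()
fresh_after = [r.data.get("X") for r in replicas]
```

Between `put` and `propagate` two of the three replicas still lack key X, which is the window in which readers can observe stale data.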

Page 7: General principles of scalable DBMSs

CAP theorem: cannot simultaneously guarantee:

• Partition tolerance: the system continues to operate even when nodes can't talk to each other
• Availability: guarantee that every request receives a response
• Consistency: all nodes see the same data at the same time

Diagram: Consistency, Availability and Partition Tolerance, with ACID Systems and Eventual Consistency marked, and the annotation "Not achievable because network failures are inevitable".

Solution: Eventual consistency and reconciling conflicts via data versioning

ACID = Atomicity, Consistency, Isolation, Durability
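One common form of data versioning is the vector clock, which the comparison of DBMSs in this talk lists as Riak's reconciliation mechanism. The sketch below is a generic illustration under that assumption. The function names are hypothetical, and it does not reproduce Riak's actual implementation.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two versions: equal, one newer, or a genuine conflict."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    return "conflict"  # concurrent writes that must be reconciled


def merge(a, b):
    """Clock for the reconciled value: element-wise maximum of both clocks."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


base = increment({}, "n1")            # first write, coordinated by n1
version_a = increment(base, "n1")     # update applied on n1
version_b = increment(base, "n2")     # concurrent update applied on n2
status = compare(version_a, version_b)
reconciled = merge(version_a, version_b)
```

Here the two updates were made without knowledge of each other, so neither clock descends from the other and the versions are flagged for reconciliation.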

Page 8: NoSQL DBMSs for Erlang

Fragmentation
• Mnesia: explicit placement; client-server; automatic by using a hash function
• CouchDB: explicit placement; multi-server; Lounge is not part of each CouchDB node
• Riak: implicit placement; peer to peer; automatic by using consistent hash technique
• Cassandra: implicit placement; peer to peer; automatic by using consistent hash technique

Replication
• Mnesia: explicit placement; client-server; asynchronous (dirty operation)
• CouchDB: explicit placement; multi-server; asynchronous
• Riak: implicit placement; peer to peer; asynchronous
• Cassandra: implicit placement; peer to peer; asynchronous

Partition Tolerance
• Mnesia: strong consistency
• CouchDB: eventual consistency; Multi-Version Concurrency Control for reconciliation
• Riak: eventual consistency; vector clocks for reconciliation
• Cassandra: eventual consistency; uses timestamps to reconcile

Query Processing & Backend Storage
• Mnesia: the largest possible Mnesia table is 4Gb
• CouchDB: no limitation; supports Map/Reduce queries
• Riak: Bitcask has memory limitation; LevelDB has no limitation; supports Map/Reduce queries
• Cassandra: no limitation; supports Map/Reduce queries
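The consistent hash technique listed for Riak and Cassandra can be sketched generically as follows. This is a textbook-style hash ring, not the ring implementation of either system. The class `HashRing`, the `vnodes` parameter and the use of SHA-1 are assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent hash ring with several points per node."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed at `vnodes` pseudo-random points on the ring.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big")

    def _start(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        return bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)

    def owner(self, key):
        return self._ring[self._start(key)][1]

    def preference_list(self, key, n=3):
        """First n distinct nodes clockwise from the key (replica placement)."""
        owners = []
        start = self._start(key)
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


ring = HashRing(["riak1", "riak2", "riak3", "riak4"])
replicas_for_x = ring.preference_list("X", n=3)
```

Because placement follows from the hash alone, clients do not choose a node explicitly, and removing a node moves only the keys that node owned.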

Page 9: Initial Evaluation Results

General Principles

Initial Evaluation
• Mnesia
• CouchDB
• Riak
• Cassandra

Scalable persistent storage for SD Erlang can be provided by Dynamo-like DBMSs, e.g. Riak, Cassandra

Page 10: Availability and Scalability of Riak in Practice

• Basho Bench, a benchmarking tool for Riak
• We use Basho Bench on the 348-node Kalkyl cluster
• How does Riak cope with node failure? (Availability)
• How does adding more Riak nodes affect the throughput? (Scalability)
• There are two kinds of nodes in a cluster:
  • Traffic generators
  • Riak nodes

Page 11: Node Organisation

We use one traffic generator per 3 Riak nodes

Page 12: Traffic Generator

Page 13: Riak Availability

Time-line shows Riak cluster losing nodes

Page 14: Riak Availability

How Riak deals with failures

Page 15: Observation

• Number of failures (37)
• Number of successful operations (approximately 3.41 million)
• When failed nodes come back up, the throughput grows, which means Riak has good elasticity.

Page 16: Riak Scalability

Benchmark on 348-node Kalkyl cluster at Uppsala University

Page 17: Failure

Page 18: What is the Bottleneck?

CPU Usage

Page 19: Profiling DISK

DISK Usage

Page 20: Profiling RAM

Memory Usage

Page 21: Profiling Network (Generator)

Network Traffic of Generator Nodes

Page 22: Profiling Network (Riak)

Network Traffic of Riak Nodes

Page 23: Bottleneck for Riak Scalability

The results of profiling CPU, RAM, Disk, and Network reveal that they can't be the bottleneck for Riak scalability.

Are Riak's scalability limits due to limits in distributed Erlang? To find out, we need to measure the scalability of distributed Erlang.

Page 24: DEbench

• We designed DEbench to measure the scalability of distributed Erlang
• Based on Basho Bench
• Measures the throughput and latency of distributed Erlang commands
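The two quantities DEbench reports, throughput and latency, can be illustrated with a generic measurement loop. This is only a sketch of the idea. `run_benchmark` and `fake_command` are hypothetical names, the command is a local stand-in rather than a distributed Erlang call, and nothing here reflects how DEbench or Basho Bench is actually implemented.

```python
import statistics
import time


def run_benchmark(operation, duration_s=0.05):
    """Run operation repeatedly for about duration_s seconds and summarise it."""
    latencies = []
    started = time.perf_counter()
    deadline = started + duration_s
    while True:
        before = time.perf_counter()
        operation()
        after = time.perf_counter()
        latencies.append(after - before)  # latency of this single command
        if after >= deadline:
            break
    elapsed = time.perf_counter() - started
    return {
        "operations": len(latencies),
        "throughput_per_s": len(latencies) / elapsed,  # completed commands per second
        "mean_latency_s": statistics.fmean(latencies),
        "max_latency_s": max(latencies),
    }


registry = {}


def fake_command():
    # Stand-in for a benchmarked command such as a name registration.
    registry["worker"] = object()


result = run_benchmark(fake_command)
```

Running the same loop while the cluster size varies gives the kind of throughput and latency curves shown later in the talk.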

Page 25: Distributed Erlang Commands

• spawn: a peer-to-peer command
• register_name: global name tables located on every node
• unregister_name: global name tables located on every node
• whereis_name: a lookup in the local table

Diagram: Register/Unregister across four Erlang VMs, each holding a global name table
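The difference between the global and the local commands can be shown with a toy model. It assumes that a registration must update the name table on every node while a lookup reads only the table of the node asking. The classes `Node` and `Cluster` and the `table_updates` counter are hypothetical, and the model ignores the locking and messaging that Erlang's `global` module performs.

```python
class Node:
    """One Erlang VM in the model, holding its own copy of the name table."""

    def __init__(self, name):
        self.name = name
        self.global_names = {}


class Cluster:
    def __init__(self, size):
        self.nodes = [Node(f"node{i}") for i in range(size)]
        self.table_updates = 0  # counts per-node table writes

    def register_name(self, name, pid):
        if name in self.nodes[0].global_names:
            return False  # name already taken
        for node in self.nodes:  # every node's table is updated
            node.global_names[name] = pid
            self.table_updates += 1
        return True

    def unregister_name(self, name):
        for node in self.nodes:  # again touches every node
            node.global_names.pop(name, None)
            self.table_updates += 1

    def whereis_name(self, node_index, name):
        # Purely local lookup: no other node is involved.
        return self.nodes[node_index].global_names.get(name)


cluster = Cluster(4)
registered = cluster.register_name("db_worker", "pid_1")
found = cluster.whereis_name(3, "db_worker")
updates_after_register = cluster.table_updates
cluster.unregister_name("db_worker")
```

In this model the cost of a registration grows with the number of nodes, while the cost of a lookup does not.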

Page 26: DEbench P2P Nodes

Physical host 1, Physical host 2

Page 27: Scalability of Distributed Erlang

0.5% Global operation

• Throughput peaks at 50 nodes
• Little improvement beyond 40 nodes

Page 28: Frequency of Global Operation

Frequency    Max Throughput
1%           30 nodes
0.5%         50 nodes
0.33%        70 nodes
0%           1600 nodes

Page 29: What is the Bottleneck?

Latency for register and unregister for 2% global update

Page 30: What is the Bottleneck?

Latency of spawn

Page 31: What is the Bottleneck?

Latency of whereis_name

Page 32: Conclusion and Future work

• Our benchmark confirms that Riak is highly available and fault-tolerant.
• We have discovered that the scalability limit of Riak is ~60 nodes.
• Global operations limit the scalability of distributed Erlang.
• We are trying to find the Riak global operations.
• In RELEASE, we are working to scale up distributed Erlang by grouping nodes in smaller partitions.
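The reasoning behind grouping nodes can be sketched with simple arithmetic. The model below assumes that a name registered inside a group is replicated only to the nodes of that group. That rule, the function names and the cluster size of 100 are assumptions for illustration and are not a description of the RELEASE design.

```python
def make_groups(nodes, group_size):
    """Split the node list into consecutive groups of at most group_size nodes."""
    return [nodes[i:i + group_size] for i in range(0, len(nodes), group_size)]


def tables_updated_per_registration(nodes, group_size=None):
    """Name tables written by one registration, with or without grouping."""
    if group_size is None:
        return len(nodes)  # flat cluster: every node is updated
    # Grouped cluster: only the members of the largest group, at worst.
    return max(len(group) for group in make_groups(nodes, group_size))


nodes = [f"node{i}" for i in range(100)]  # hypothetical cluster size
flat_cost = tables_updated_per_registration(nodes)
grouped_cost = tables_updated_per_registration(nodes, group_size=10)
```

Under this assumption the cost of a global operation depends on the group size rather than on the total number of nodes.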
