Evaluating NoSQL Performance: Which Database is Right for Your Data? - Sergey Sverchkov (Altoros)


Uploaded by jaxlondonconference; posted on 10-May-2015


DESCRIPTION

Presented at JAX London 2013. The need to operate terabyte-size databases is very common these days. Unless you have already implemented architectures that use NoSQL databases and frameworks for data-intensive distributed applications, many of the available technology options are probably something of an enigma. This session focuses on real-world, successful attempts to benchmark four of the most popular NoSQL databases side by side. The base tool selected for this research is the Yahoo! Cloud Serving Benchmark (YCSB), and benchmarking is performed on Amazon Elastic Compute Cloud (EC2) instances.

TRANSCRIPT

Page 1: Evaluating NoSQL performance: Which database is right for your data? - Sergey Sverchkov (Altoros)

© ALTOROS Systems | CONFIDENTIAL

Evaluating NoSQL Performance: Which Database is Right for Your Data

Sergey Sverchkov, Project Manager

[email protected]

Page 2

Relational databases are great… But

Problem: Complex object graphs

Object/relational impedance mismatch

It is complicated to map a rich domain model to a relational schema

Performance issues

Problem: Schema evolution

Adding attributes to an object => columns have to be added to a table

Expensive if there is a lot of data in that table

Problem: Scaling

Scaling writes is difficult, expensive, or impossible => a problem for big data

Vertical scaling is limited and expensive

Horizontal scaling is limited and expensive

Relational Databases

[Figure: the relational tables ORDER, ADDRESS, CUSTOMER, and ORDER_LINES shown next to the Order object below]

Order

ID: 1001
Order Date: 15.9.2012

Customer

First Name: Peter
Last Name: Sample

Billing Address

Street: Somestreet 10
City: Somewhere
Postal Code: 55901

Line Items

Name          Quantity   Price
iPod Touch    1          220.95
Monster Beat  2          190.00
Apple Mouse   1          69.90
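To make the impedance mismatch concrete, the following Python sketch stores the order from this slide two ways: as one nested document, the shape a document store keeps, and flattened into rows for the four tables. The field and column names, the customer and address IDs, and the helper `to_relational_rows` are illustrative assumptions, not the schema behind the slide.

```python
from typing import Any

# The order from the slide as one nested document. Field names and the
# customer/address IDs are hypothetical; the values come from the slide.
order_document: dict[str, Any] = {
    "id": 1001,
    "order_date": "15.9.2012",
    "customer": {
        "id": 1,
        "first_name": "Peter",
        "last_name": "Sample",
        "billing_address": {
            "id": 1,
            "street": "Somestreet 10",
            "city": "Somewhere",
            "postal_code": "55901",
        },
    },
    "line_items": [
        {"name": "iPod Touch", "quantity": 1, "price": 220.95},
        {"name": "Monster Beat", "quantity": 2, "price": 190.00},
        {"name": "Apple Mouse", "quantity": 1, "price": 69.90},
    ],
}


def to_relational_rows(doc: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Flatten the nested order into four normalized tables linked by foreign keys."""
    customer = doc["customer"]
    address = customer["billing_address"]
    return {
        "ADDRESS": [dict(address)],
        "CUSTOMER": [{"id": customer["id"],
                      "first_name": customer["first_name"],
                      "last_name": customer["last_name"],
                      "address_id": address["id"]}],          # FK -> ADDRESS
        "ORDER": [{"id": doc["id"],
                   "order_date": doc["order_date"],
                   "customer_id": customer["id"]}],           # FK -> CUSTOMER
        "ORDER_LINES": [{"order_id": doc["id"], "line_no": n, **item}  # FK -> ORDER
                        for n, item in enumerate(doc["line_items"], start=1)],
    }


rows = to_relational_rows(order_document)
for table, table_rows in rows.items():
    print(table, len(table_rows))
```

Each table ends up with its own rows tied together by foreign keys, so reassembling the order means joining four tables, while a document store can return the nested record in a single lookup by key. Adding an attribute to the document is just a new field; on the relational side it is the schema change the slide calls expensive.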

Page 3

Why evaluate

• There is a wide variety of NoSQL databases: more than 150 as of 2013

• Different NoSQL database types exist: key-value, columnar, document, and graph

• NoSQL DBs don’t use the relational data model and don’t use SQL

• They are schema-free, with a flexible data model

• They have different APIs

• Some NoSQL data stores support certain SQL notions

• Many of them operate with eventual consistency

• NoSQL DBs tend to be designed to run on a cluster

• They support horizontal scaling (scaling out)

[Figure: overview of the NoSQL ecosystem]

Page 4

Evaluation criteria:

• Data model: key-value, document, column family, or graph

• Query possibilities: REST API, query language, or MapReduce support

• Concurrency control: optimistic locking or multi-version concurrency control (MVCC)

• Partitioning: range or hash

• Consistency and replication: availability or consistency

• Performance: typical workloads

How to evaluate NoSQL data stores
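The partitioning criterion decides which node owns each key. A minimal sketch of the two schemes over four nodes follows; the node names, the split points, and the use of MD5 with a simple modulo are assumptions made for illustration. Real systems differ: Cassandra's Murmur3Partitioner, for example, hashes keys with MurmurHash3 onto a token ring instead of taking a modulo.

```python
import bisect
import hashlib

NODES = ["node1", "node2", "node3", "node4"]
# Hypothetical split points for range partitioning: node i owns the keys in
# [SPLITS[i-1], SPLITS[i]); node1 owns everything below "user25", node4 the rest.
SPLITS = ["user25", "user50", "user75"]


def range_partition(key: str) -> str:
    """Range partitioning: each node owns a contiguous, sorted key range."""
    return NODES[bisect.bisect_right(SPLITS, key)]


def hash_partition(key: str) -> str:
    """Hash partitioning: a hash of the key picks the node (simple modulo here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


sequential_keys = [f"user{1000 + i}" for i in range(100)]
print("range:", sorted({range_partition(k) for k in sequential_keys}))  # ['node1']
print("hash: ", sorted({hash_partition(k) for k in sequential_keys}))
```

Sequential keys all land on one node under range partitioning, a potential hotspot during bulk inserts, but spread out under hashing. In exchange, range partitioning keeps neighbouring keys together, which keeps ordered scans cheap.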

Page 5

Performance evaluation approach: definitions

• Yahoo! Cloud Serving Benchmark (YCSB)

a framework with a workload generator

a set of workload scenarios

• A workload is defined by distributions that determine

which operation to perform

which record to read or write

• Operations are of the following types:

Insert: Inserts a new record.

Update: Updates a record by replacing the value of one field.

Read: Reads a record, either one randomly selected field, or all fields.

Scan: Scans records in order, starting from a randomly selected record key.

How to evaluate NoSQL data stores
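The two distributions can be sketched as follows: one picks the operation from the workload's proportions, the other picks the record, either uniformly or with a Zipfian skew so that a few records are requested far more often than the rest. This is a simplified illustration, not YCSB's code; the record count is scaled down, the 95/5 mix is just an example, and YCSB's own generators (including its scrambled Zipfian and "latest" distributions) work differently in detail. The exponent 0.99 mirrors YCSB's default Zipfian constant.

```python
import itertools
import random
from collections import Counter
from typing import Optional

RECORD_COUNT = 1_000                          # scaled down for the example
OPERATIONS = {"READ": 0.95, "UPDATE": 0.05}   # example read/update proportions


def choose_operation(rng: random.Random) -> str:
    """Which operation to perform: sampled from the workload proportions."""
    return rng.choices(list(OPERATIONS), weights=list(OPERATIONS.values()))[0]


def zipfian_cum_weights(n: int, theta: float = 0.99) -> list[float]:
    """Cumulative weights for P(rank r) proportional to 1 / r**theta."""
    return list(itertools.accumulate(1.0 / r ** theta for r in range(1, n + 1)))


def choose_key(rng: random.Random, cum_weights: Optional[list[float]]) -> str:
    """Which record to touch: uniform if cum_weights is None, otherwise skewed."""
    if cum_weights is None:
        index = rng.randrange(RECORD_COUNT)
    else:
        index = rng.choices(range(RECORD_COUNT), cum_weights=cum_weights)[0]
    return f"user{index}"


rng = random.Random(42)
zipf = zipfian_cum_weights(RECORD_COUNT)
ops = Counter(choose_operation(rng) for _ in range(10_000))
keys = Counter(choose_key(rng, zipf) for _ in range(10_000))
print(ops)
print(keys.most_common(3))
```

Precomputing the cumulative weights once keeps each skewed draw cheap; passing `None` instead gives the uniform case for comparison.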

Page 6

Performance evaluation approach - definitions

• Table of 100,000,000 records

Each record is 1,000 bytes in size and contains 10 fields

Fields are named field0, field1, ..., field9

A primary key identifies each record, e.g. “user234123”

Values in each field are random strings of ASCII characters, 100 bytes each

• Workload executor

runs multiple client threads

each thread executes a sequential series of operations

two phases: the load phase and the transaction phase

How to evaluate NoSQL data stores
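A sketch of a single record as described above, plus the arithmetic for the table's raw size. The helper `build_record` and the use of letters and digits as the ASCII alphabet are assumptions; YCSB generates its own random values.

```python
import random
import string

FIELD_COUNT = 10            # fieldcount=10
FIELD_LENGTH = 100          # fieldlength=100
RECORD_COUNT = 100_000_000  # recordcount=100000000


def build_record(rng: random.Random, key_number: int) -> tuple[str, dict[str, str]]:
    """One record: a 'user<N>' key and ten fields of 100 random ASCII characters."""
    key = f"user{key_number}"
    alphabet = string.ascii_letters + string.digits
    fields = {
        f"field{i}": "".join(rng.choices(alphabet, k=FIELD_LENGTH))
        for i in range(FIELD_COUNT)
    }
    return key, fields


key, record = build_record(random.Random(1), 234123)
payload_bytes = sum(len(v) for v in record.values())
print(key, sorted(record)[:3], payload_bytes)

# Raw field payload for the whole table (keys, field names, overhead ignored).
total_bytes = RECORD_COUNT * FIELD_COUNT * FIELD_LENGTH
print(total_bytes / 10**9, "GB")  # 100.0 GB
```

That is about 100 GB of raw field values before keys, field names, compression, and each database's storage overhead are taken into account.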

Page 7

Performance evaluation approach – component diagram

How to evaluate NoSQL data stores

Page 8

Testing environment diagram

Where to evaluate

Page 9

Performance evaluation – environment specification

• Amazon AWS EC2 instances:

Single availability zone eu-west-1b, Ireland region

Single security group with all required ports open

4 m1.xlarge 64-bit instances for cluster nodes: 16 GB RAM, 4 vCPU, 8 ECU, high-performance network

1 c1.xlarge 64-bit instance for the YCSB client: 7 GB RAM, 8 vCPU, 20 ECU, high-performance network

2 additional c1.medium 64-bit instances for mongo routers: 1.7 GB RAM, 2 vCPU, 5 ECU, moderate network

• Storage for each NoSQL cluster node:

4 EBS volumes of 25 GB each in RAID 0

EBS-optimized, no Provisioned IOPS

Where to evaluate
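A quick capacity check with the figures above, assuming the raw data set of roughly 100 GB (100,000,000 records of 1,000 bytes) is spread evenly over the four nodes and ignoring compression and storage overhead. The variable names are illustrative.

```python
NODES = 4
RAM_PER_NODE_GB = 16            # m1.xlarge, as listed on the slide
EBS_VOLUMES_PER_NODE = 4
EBS_VOLUME_GB = 25
RAW_DATASET_GB = 100            # 100,000,000 records x 1,000 bytes
REPLICATION_FACTOR = 1

cluster_ram_gb = NODES * RAM_PER_NODE_GB                       # 64
raid0_per_node_gb = EBS_VOLUMES_PER_NODE * EBS_VOLUME_GB        # RAID 0 sums capacity: 100
data_per_node_gb = RAW_DATASET_GB * REPLICATION_FACTOR / NODES  # 25.0 (even spread)

print(f"cluster RAM: {cluster_ram_gb} GB")
print(f"RAID 0 capacity per node: {raid0_per_node_gb} GB")
print(f"raw data per node: {data_per_node_gb} GB")
print("data set fits in RAM:", RAW_DATASET_GB * REPLICATION_FACTOR <= cluster_ram_gb)
```

Under these assumptions each node holds about 25 GB on a 100 GB RAID 0 array, and the data set is larger than the cluster's combined 64 GB of RAM, so it cannot all be cached in memory.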

Page 10

Databases to evaluate

• Cassandra 2.0, settings for each cluster node

partitioner: org.apache.cassandra.dht.Murmur3Partitioner

key_cache_size_in_mb: 1024

row_cache_size_in_mb: 6096

JVM heap size: 6GB

Snappy compressor

Replication factor 1

• MongoDB 2.4.6

2 c1.medium nodes running the mongo router process (mongos)

Replication factor 1

Sharded by the internal key “_id”

Databases to evaluate

Page 11

Databases to evaluate

• Couchbase 2.1

Replication factor 1

Memory + disk mode

• HBase 0.92, settings for HRegionServer

JVM heap size: 12GB

Replication factor 1

Snappy compressor

Databases to evaluate

Page 12

Workloads

Performance of the systems was evaluated under different workloads:

Workload A: Update heavy - read/update ratio 50/50

Workload B: Read mostly - read/update ratio 95/5

Workload C: Read only - 100% read

Workload D: Read latest - read/insert ratio 95/5

Workload E: Scan - scan/insert ratio 95/5

Workload F: Read-modify-write - read-modify-write/read ratio 50/50

Workload G: Write heavy - read/insert ratio 10/90

Workload definition parameters:

fieldcount=10
fieldlength=100
threadcount=100
operationcount=10000
recordcount=100000000
workload=com.yahoo.ycsb.workloads.CoreWorkload

Workloads
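The operation mixes above map onto CoreWorkload's proportion properties. The sketch renders each mix as a properties file; the property names (`readproportion`, `updateproportion`, `insertproportion`, `scanproportion`, `readmodifywriteproportion`, `requestdistribution`) are CoreWorkload's, but the files are a reconstruction, not the ones used in the talk. Workload E's mix is taken from its chart title and the `latest` distribution for Workload D from its name; every other request distribution is left at its default. Unused proportions are set to 0 explicitly because CoreWorkload does not default all of them to zero.

```python
# Operation mixes from the workload list (E inferred from its chart title).
WORKLOADS = {
    "A": {"readproportion": 0.50, "updateproportion": 0.50},
    "B": {"readproportion": 0.95, "updateproportion": 0.05},
    "C": {"readproportion": 1.00},
    "D": {"readproportion": 0.95, "insertproportion": 0.05,
          "requestdistribution": "latest"},
    "E": {"scanproportion": 0.95, "insertproportion": 0.05},
    "F": {"readproportion": 0.50, "readmodifywriteproportion": 0.50},
    "G": {"readproportion": 0.10, "insertproportion": 0.90},
}

# Parameters shared by every workload, as listed on the slide.
COMMON = {
    "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
    "recordcount": 100000000,
    "operationcount": 10000,
    "fieldcount": 10,
    "fieldlength": 100,
    "threadcount": 100,
}

PROPORTION_KEYS = ("readproportion", "updateproportion", "insertproportion",
                   "scanproportion", "readmodifywriteproportion")


def to_properties(name: str) -> str:
    """Render one workload as key=value lines, zeroing every unused proportion."""
    settings = {**COMMON, **{k: 0 for k in PROPORTION_KEYS}, **WORKLOADS[name]}
    total = sum(settings[k] for k in PROPORTION_KEYS)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"workload {name}: proportions sum to {total}, not 1")
    return "\n".join(f"{k}={v}" for k, v in settings.items())


print(to_properties("D"))
```

A file like this would be passed to YCSB with `-P` for both the load and the transaction (run) phase.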

Page 13

Load phase, average latency vs. throughput

[Chart: Load phase, 100,000,000 records × 1 KB, INSERT; average latency (ms) vs. throughput (ops/sec); series: HBase, Cassandra, Couchbase, MongoDB]
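Each point on these charts pairs a throughput with an average latency. A minimal sketch of how one such pair can be computed from per-operation timings follows; the `Sample` type and the numbers are hypothetical, and YCSB's own measurement code (which records latencies in histograms) is more involved. To trace a whole curve, a run is typically repeated at increasing target throughputs, for instance with YCSB's `-target` option.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    start_s: float     # when the client issued the operation, in seconds
    latency_ms: float  # how long the operation took, in milliseconds


def summarize(samples: list[Sample]) -> tuple[float, float]:
    """Return (throughput in ops/sec, average latency in ms) for one run."""
    if not samples:
        raise ValueError("no samples")
    first_start = min(s.start_s for s in samples)
    last_end = max(s.start_s + s.latency_ms / 1000 for s in samples)
    throughput = len(samples) / (last_end - first_start)
    avg_latency = sum(s.latency_ms for s in samples) / len(samples)
    return throughput, avg_latency


run = [Sample(0.0, 100.0), Sample(0.5, 300.0), Sample(1.0, 200.0), Sample(1.5, 500.0)]
print(summarize(run))  # (2.0, 275.0)
```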

Page 14

Workload A – 50% update operations

[Chart: Workload A: Update (Update 50%, Read 50%); series: Cassandra, Couchbase, HBase, MongoDB]

Page 15

Workload A – 50% read operations

[Chart: Workload A: Read (Update 50%, Read 50%); series: Cassandra, Couchbase, HBase, MongoDB]

Page 16

Workload B – 5% update operations

[Chart: Workload B: Update (update 5%, read 95%); series: Cassandra, Couchbase, HBase, MongoDB]

Page 17

Workload B – 95% read operations

[Chart: Workload B: Read (update 5%, read 95%); series: Cassandra, Couchbase, HBase, MongoDB]

Page 18

Workload C – 100% read operations

[Chart: Workload C: 100% Read; series: Cassandra, Couchbase, HBase, MongoDB]

Page 19

Workload D – 5% insert operations

[Chart: Workload D: Insert (insert 5%, read 95%); series: Cassandra, Couchbase, HBase, MongoDB]

Page 20

Workload D – 95% read operations

[Chart: Workload D: Read (insert 5%, read 95%); series: Cassandra, Couchbase, HBase, MongoDB]

Page 21

Workload E – 95% scan operations

[Chart: Workload E: Insert (Insert 5%, Scan 95%); series: Cassandra, HBase]

Page 22

Workload F – 50% Read operations

[Chart: Workload F: Read (Read-Modify-Write 50%, Read 50%); series: Cassandra, Couchbase, HBase, MongoDB]

Page 23

Workload F – Update part of Read-Modify-Write

[Chart: Workload F: Update (Read-Modify-Write 50%, Read 50%); series: Cassandra, Couchbase, HBase, MongoDB]

Page 24

Workload F – 50% Read-Modify-Write operations

[Chart: Workload F: Read-Modify-Write (Read-Modify-Write 50%, Read 50%); series: Cassandra, Couchbase, HBase, MongoDB]
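In YCSB's core workload a read-modify-write is a read followed by an update of the same record, and its latency covers both calls, so it is roughly the sum of the two. YCSB does not protect that cycle against concurrent writers; if an application needs that, optimistic locking (one of the concurrency-control options listed under the evaluation criteria) is one way to do it. The sketch below uses a hypothetical in-memory `VersionedStore` with a per-record version number; it illustrates the technique and is not the API of any of the benchmarked databases.

```python
import threading
import time
from typing import Callable


class VersionedStore:
    """Hypothetical in-memory store where every record carries a version number."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[int, dict[str, str]]] = {}
        self._lock = threading.Lock()  # makes each single call atomic

    def read(self, key: str) -> tuple[int, dict[str, str]]:
        with self._lock:
            version, fields = self._data.get(key, (0, {}))
            return version, dict(fields)

    def update_if_version(self, key: str, expected: int, fields: dict[str, str]) -> bool:
        """Compare-and-set: write only if the record is unchanged since our read."""
        with self._lock:
            current, _ = self._data.get(key, (0, {}))
            if current != expected:
                return False
            self._data[key] = (current + 1, fields)
            return True


def read_modify_write(store: VersionedStore, key: str,
                      modify: Callable[[dict[str, str]], dict[str, str]]) -> float:
    """Read, modify, write back; retry on a version conflict. Returns elapsed ms."""
    start = time.perf_counter()
    while True:
        version, fields = store.read(key)
        if store.update_if_version(key, version, modify(fields)):
            return (time.perf_counter() - start) * 1000


def increment(fields: dict[str, str]) -> dict[str, str]:
    return {**fields, "field0": str(int(fields.get("field0", "0")) + 1)}


store = VersionedStore()


def worker() -> None:
    for _ in range(100):
        read_modify_write(store, "user1", increment)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.read("user1"))  # (400, {'field0': '400'})
```

Four threads perform 100 increments each; because a write only succeeds when the version still matches what was read, no increment is lost and the counter ends at 400. The price is retries under contention, which add to the measured read-modify-write latency.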

Page 25

Workload G – 90% Insert operations

[Chart: Workload G: Insert (Insert 90%, Read 10%); series: Cassandra, Couchbase, HBase, MongoDB]

Page 26

Workload G – 10% Read operations

[Chart: Workload G: Read (Insert 90%, Read 10%); series: Cassandra, Couchbase, HBase, MongoDB]

Page 27

Choose a solution based on your needs:

• Identify typical application operations

• Identify datasets and a potential data model

• Identify transaction, replication and consistency requirements

• Identify performance requirements

• Identify how you can migrate, if needed

• Evaluate functionality and performance of chosen databases

• Build proof-of-concept for the solution

• There is no perfect NoSQL or RDBMS database, and no “bad” one

Conclusion

Page 28

Evaluating NoSQL Performance

Sergey Sverchkov

Project Manager

[email protected]

Altoros, 2013

Thank you