Sergey Sverchkov - Evaluating NoSQL solutions: which database...
IT_Share. Highload 2.0
© ALTOROS Systems | CONFIDENTIAL
Evaluating NoSQL performance: which database is right for your data
Sergey Sverchkov, Project Manager
Why evaluate
• Wide variety of NoSQL databases: 150+ in 2013
• Different NoSQL database types: key-value, columnar, document, graph
• Schema-free, with a flexible data model
• Different APIs, some with a notion of SQL
• Eventual consistency
• Usually designed to run on a cluster
NoSQL world
Evaluation criteria
• Data model: key-value, document, column family, graph
• Query capabilities: REST API, query language, MapReduce support
• Concurrency control: optimistic locking, multiversion concurrency control
• Partitioning: range, hash
• Consistency and replication: availability or consistency
• Performance: typical workloads
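To make the partitioning criterion concrete, here is a minimal Python sketch contrasting hash and range placement of keys across nodes. The node count is taken from the test cluster; the split points, the key format, and the function names `hash_partition` and `range_partition` are illustrative assumptions, not the behavior of any specific database in this evaluation.

```python
import bisect
import hashlib

NODES = 4  # assumption: a four-node cluster, as in the test environment


def hash_partition(key: str, nodes: int = NODES) -> int:
    """Place a key by hashing it: spreads load evenly, loses key order."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


# Hypothetical split points: node 0 holds keys below "user25",
# node 1 holds keys from "user25" up to (not including) "user50", and so on.
SPLIT_POINTS = ["user25", "user50", "user75"]


def range_partition(key: str) -> int:
    """Place a key by sorted range: neighboring keys stay on the same node."""
    return bisect.bisect_right(SPLIT_POINTS, key)


keys = [f"user{i:02d}" for i in range(100)]
hash_nodes = [hash_partition(k) for k in keys]
range_nodes = [range_partition(k) for k in keys]
```

With range placement, consecutive keys land on the same node, which favors ordered scans; with hash placement, consecutive keys scatter across nodes, which favors even load on point reads and writes.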
How to evaluate
Performance evaluation approach - definitions
• Yahoo! Cloud Serving Benchmark (YCSB)
  a framework with a workload generator
  a set of workload scenarios
• A workload is defined by distributions that select
  which operation to perform
  which record to read or write
• Operations are of the following types:
  Insert: inserts a new record.
  Update: updates a record by replacing the value of one field.
  Read: reads a record, either one randomly selected field or all fields.
  Scan: scans records in order, starting at a randomly selected record key.
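The idea that a workload is just two distributions (which operation, which record) can be sketched in a few lines of Python. This is an illustration only, not YCSB's code: the names `choose_operation`, `choose_key` and `generate_operations` are hypothetical, the keyspace is shrunk to 1,000 records, and the cubed-uniform skew is a crude stand-in for a real skewed request distribution such as Zipfian.

```python
import random
from collections import Counter

# Operation mix shaped like a 50% read / 50% update workload.
OPERATION_MIX = {"read": 0.5, "update": 0.5}
RECORD_COUNT = 1_000  # assumption: small keyspace so the sketch runs instantly


def choose_operation(rng: random.Random) -> str:
    """Pick which operation to perform, weighted by the mix."""
    ops = list(OPERATION_MIX)
    weights = list(OPERATION_MIX.values())
    return rng.choices(ops, weights=weights, k=1)[0]


def choose_key(rng: random.Random, skewed: bool = False) -> str:
    """Pick which record to touch: uniform, or skewed toward low indexes."""
    u = rng.random()
    if skewed:
        u = u ** 3  # crude skew: most picks fall on a small set of "hot" records
    return f"user{int(u * RECORD_COUNT)}"


def generate_operations(count: int, seed: int = 0, skewed: bool = False):
    """Produce a reproducible list of (operation, key) pairs."""
    rng = random.Random(seed)
    return [(choose_operation(rng), choose_key(rng, skewed)) for _ in range(count)]


ops = generate_operations(10_000, seed=42, skewed=True)
mix = Counter(op for op, _ in ops)
# Share of requests that hit the "hottest" 10% of records.
hot_share = sum(1 for _, key in ops if int(key[4:]) < RECORD_COUNT // 10) / len(ops)
print(mix, hot_share)
```

Changing `OPERATION_MIX` changes the first distribution; switching `skewed` changes the second. Everything else about the generator stays the same.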
Performance evaluation approach - definitions
• Table of 100,000,000 records
  Each record is 1,000 bytes in size and contains 10 fields
  Fields are named field0, field1, ..., field9
  A primary key identifies each record
  Values in each field are random strings of ASCII characters, 100 bytes each
• Workload executor
  multiple client threads
  each thread runs a sequential series of operations
  the load phase
  the transaction phase
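The record layout and the two phases can be sketched as follows. This is a simplified, single-threaded illustration against an in-memory dict, not the benchmark's executor: `build_record`, `load_phase` and `transaction_phase` are hypothetical names, the `user<N>` key format is an assumption, and the 50/50 read/update split is just one example mix.

```python
import random
import string

FIELD_COUNT = 10    # from the slide: 10 fields per record
FIELD_LENGTH = 100  # from the slide: 100 bytes per field
ALPHABET = string.ascii_letters + string.digits


def random_value(rng: random.Random) -> str:
    """One field value: a random ASCII string of FIELD_LENGTH characters."""
    return "".join(rng.choices(ALPHABET, k=FIELD_LENGTH))


def build_record(rng: random.Random) -> dict:
    """Build one record with fields field0 .. field9 (1,000 bytes of values)."""
    return {f"field{i}": random_value(rng) for i in range(FIELD_COUNT)}


def load_phase(store: dict, record_count: int, rng: random.Random) -> None:
    """Load phase: insert record_count new records."""
    for i in range(record_count):
        store[f"user{i}"] = build_record(rng)


def transaction_phase(store: dict, operation_count: int, rng: random.Random) -> dict:
    """Transaction phase: run reads and single-field updates on loaded records."""
    counts = {"read": 0, "update": 0}
    keys = list(store)
    for _ in range(operation_count):
        key = rng.choice(keys)
        if rng.random() < 0.5:
            _record = store[key]  # read all fields
            counts["read"] += 1
        else:
            field = f"field{rng.randrange(FIELD_COUNT)}"
            store[key][field] = random_value(rng)  # replace one field
            counts["update"] += 1
    return counts


rng = random.Random(1)
store = {}
load_phase(store, 50, rng)
counts = transaction_phase(store, 200, rng)
```

A real run would replace the dict with a database client and start several threads, each executing its own sequence of operations.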
Performance evaluation approach – component diagram
Performance evaluation – environment specification
• Amazon AWS EC2 instances:
  Single availability zone eu-west-1b, Ireland region
  Single security group with all required ports open
  4 m1.xlarge 64-bit instances for cluster nodes: 16 GB RAM, 4 vCPU, 8 ECU, high-performance network
  1 c1.xlarge 64-bit instance for the YCSB client: 7 GB RAM, 8 vCPU, 20 ECU, high-performance network
  2 additional c1.medium 64-bit instances for mongo routers: 1.7 GB RAM, 2 vCPU, 5 ECU, moderate network
• Storage for each NoSQL cluster node:
  4 EBS volumes of 25 GB each in RAID 0
  EBS-optimized volumes, no Provisioned IOPS
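A quick back-of-the-envelope check relates the dataset to this hardware. The sketch below uses only figures stated in the slides (100,000,000 records of 1,000 bytes, four cluster nodes, replication factor 1). It assumes decimal gigabytes and ignores keys, per-record storage overhead, and compression, so the numbers are rough lower bounds rather than measured sizes.

```python
RECORDS = 100_000_000   # table size from the benchmark definition
RECORD_BYTES = 1_000    # 10 fields x 100 bytes
NODES = 4               # m1.xlarge cluster nodes
RAM_PER_NODE_GB = 16    # as listed in the environment specification
VOLUMES_PER_NODE = 4    # EBS volumes striped in RAID 0
VOLUME_GB = 25

raw_data_gb = RECORDS * RECORD_BYTES / 1e9        # raw field data in the table
data_per_node_gb = raw_data_gb / NODES            # even split, replication factor 1
cluster_ram_gb = NODES * RAM_PER_NODE_GB          # total memory across the cluster
disk_per_node_gb = VOLUMES_PER_NODE * VOLUME_GB   # striped capacity per node

print(f"raw data:       {raw_data_gb:.0f} GB")
print(f"data per node:  {data_per_node_gb:.0f} GB")
print(f"cluster RAM:    {cluster_ram_gb} GB")
print(f"disk per node:  {disk_per_node_gb} GB")
```

Under these assumptions the raw data is larger than the combined RAM of the cluster, so the databases cannot serve the whole table from memory and disk access is part of what is being measured.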
Where to evaluate
Testing environment diagram
Databases to evaluate
• Cassandra 2.0, settings for each cluster node
  partitioner: org.apache.cassandra.dht.Murmur3Partitioner
  key_cache_size_in_mb: 1024
  row_cache_size_in_mb: 6096
  JVM heap size: 6 GB
  Snappy compression
  Replication factor 1
• MongoDB 2.4.6
  2 c1.medium nodes running the mongo router process (mongos)
  Replication factor 1
  Sharding by the internal key “_id”
• Couchbase 2.1
  Replication factor 1
  Memory + disk mode
• HBase 0.92, settings for HRegionServer
  JVM heap size: 12 GB
  Replication factor 1
  Snappy compression
Workloads
System performance was evaluated under the following workloads:
Workload A: Update heavy - read/update ratio 50/50
Workload B: Read mostly - read/update ratio 95/5
Workload C: Read only - 100% read
Workload D: Read latest - read/insert ratio 95/5
Workload F: Read-modify-write - read-modify-write/read ratio 50/50
Workload G: Write heavy - read/insert ratio 10/90
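The mixes listed above can be written down as data and rendered as benchmark properties. In this Python sketch the property names (`readproportion`, `updateproportion`, `insertproportion`, `readmodifywriteproportion`) follow YCSB's CoreWorkload naming; the `WORKLOADS` table and the `to_properties` helper are hypothetical, and the slides do not show the actual workload files used in the tests.

```python
# Operation proportions per workload, transcribed from the list above.
WORKLOADS = {
    "A": {"readproportion": 0.50, "updateproportion": 0.50},
    "B": {"readproportion": 0.95, "updateproportion": 0.05},
    "C": {"readproportion": 1.00},
    "D": {"readproportion": 0.95, "insertproportion": 0.05},
    "F": {"readproportion": 0.50, "readmodifywriteproportion": 0.50},
    "G": {"readproportion": 0.10, "insertproportion": 0.90},
}


def validate(mix: dict) -> bool:
    """A mix is valid when its proportions add up to 1."""
    return abs(sum(mix.values()) - 1.0) < 1e-9


def to_properties(name: str) -> str:
    """Render one workload as key=value lines, one property per line."""
    mix = WORKLOADS[name]
    if not validate(mix):
        raise ValueError(f"workload {name} proportions do not sum to 1")
    return "\n".join(f"{key}={value}" for key, value in sorted(mix.items()))


print(to_properties("B"))
```

Keeping the mixes in one table makes it easy to check that each workload is well formed before a long benchmark run starts.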
Load phase, average latency for throughput
[Chart: Load phase, 100,000,000 records × 1 KB, INSERT. Axes: throughput (ops/sec) and average latency (ms). Series: hbase, cassandra, couchbase, mongodb.]
Workload A – 50% update operations
[Chart: Workload A: Update (Update 50%, Read 50%). Series: cassandra, couchbase, hbase, mongodb.]
Workload A – 50% read operations
[Chart: Workload A: Read (Update 50%, Read 50%). Series: cassandra, couchbase, hbase, mongodb.]
Workload B – 5% update operations
[Chart: Workload B: Update (Update 5%, Read 95%). Series: cassandra, couchbase, hbase, mongodb.]
Workload B – 95% read operations
[Chart: Workload B: Read (Update 5%, Read 95%). Series: cassandra, couchbase, hbase, mongodb.]
Workload C – 100% read operations
[Chart: Workload C: 100% Read. Series: cassandra, couchbase, hbase, mongodb.]
Workload D – 5% insert operations
[Chart: Workload D: Insert (Insert 5%, Read 95%). Series: cassandra, couchbase, hbase, mongodb.]
Workload D – 95% read operations
[Chart: Workload D: Read (Insert 5%, Read 95%). Series: cassandra, couchbase, hbase, mongodb.]
Workload E – 95% scan operations
[Chart: Workload E: Insert (Insert 5%, Scan 95%). Series: cassandra, hbase.]
Workload F – 50% Read operations
[Chart: Workload F: Read (Read-Modify-Write 50%, Read 50%). Series: cassandra, couchbase, hbase, mongodb.]
Workload F – Update part of Read-Modify-Write
[Chart: Workload F: Update (Read-Modify-Write 50%, Read 50%). Series: cassandra, couchbase, hbase, mongodb.]
Workload F – 50% Read-Modify-Write operations
[Chart: Workload F: Read-Modify-Write (Read-Modify-Write 50%, Read 50%). Series: cassandra, couchbase, hbase, mongodb.]
Workload G – 90% Insert operations
[Chart: Workload G: Insert (Insert 90%, Read 10%). Series: cassandra, couchbase, hbase, mongodb.]
Workload G – 10% Read operations
[Chart: Workload G: Read (Insert 90%, Read 10%). Series: cassandra, couchbase, hbase, mongodb.]
Choose your own path
• Identify typical application operations
• Identify datasets and a potential data model
• Identify transaction, replication, and consistency requirements
• Identify performance requirements
• Identify how you can migrate, if needed
• Evaluate the functionality and performance of the chosen databases
• Build a proof of concept for the solution
• There is no perfect NoSQL or RDBMS database, and no “bad” one
Conclusion