Clouder: A Flexible Large Scale Decentralized Object Store
Ricardo Vilaça
December 14, 2012
๏ Traditional RDBMS were not designed for massive scale.
๏ Storage of digital data has reached unprecedented levels.
๏ Trade consistency for availability.
๏ Massive-scale distributed computing is a challenge at our doorstep.
๏ The limits of centralized storage and processing make extensive and flexible data partitioning unavoidable.
๏ Emergence of Cloud Computing paradigm/business model.
Context
๏ Elastic computing.
๏ Highly decentralized, scalable, and dependable systems.
๏ Google’s Bigtable, Amazon’s Dynamo, Yahoo’s PNUTS, Facebook’s Cassandra.
๏ A simple data store interface that allows applications to insert, query, and remove individual elements.
๏ Forfeit complex relational and processing facilities.
๏ Specific narrow tradeoff between consistency, availability, performance, scale, and cost.
Large scale data stores
๏ Most enterprises do not have a large in-house research and development team to redesign applications.
๏ Hard to provide a smooth migration path for existing applications, mostly SQL based.
๏ Hurdle to the adoption of Cloud computing by a wider potential market.
[Figure: three diagrams, one each for RDBMS, LSDS, and Clouder, comparing availability, consistency, network partition tolerance, and scalability.]
Tradeoffs
๏ New generation large scale data stores and traditional RDBMS are in disjoint design spaces, and there is a huge gap between them.
๏ This thesis aims at reducing this gap by:
  ๏ seeking mechanisms to provide additional consistency guarantees;
  ๏ higher-level data processing primitives in large scale data stores:
    ๏ extending data stores with additional operations, such as general multi-item operations;
    ๏ coupling data stores with other existing processing facilities.
Problem and goals
DataDroplets
๏ Assumptions
  ๏ Controlled environment: same owner, reasonably stable membership, specific set of applications.
  ๏ Nodes can be commodity machines mainly dedicated to enterprise or academic business tasks.
๏ Goals
  ๏ General-purpose, consistent, conflict-free data storage and in-place processing capabilities.
  ๏ Leverage large computing infrastructures.
  ๏ Dependability in the face of massive distributed data.
Clouder
• Hybrid
  • The structured approach is suited to a smaller scale and offers an adequate interface to the client.
  • The unstructured approach is fit to manage the massive scale of the system.
[Figure: Clouder architecture. Labels: Clients; Soft-state layer (Partition, Replication, Request Routing, DHT); Persistent-state layer (Storage, Replication).]
Clouder architecture
๏ Elastic data store providing atomic access to tuples and flexible data replication.
๏ Environment
  ๏ Hundreds of nodes with a reasonably stable membership.
๏ Functionalities
  ๏ Storage cache managed as allowed by the adopted consistency criteria.
  ๏ Data partitioning.
  ๏ Request routing and processing.
DataDroplets
String ⇀ (K ⇀ (V × P String))
[Figure: example collection shown as a table with columns Key, Value, and Tags; keys 1, 3, 4; values a, c, b; tags drawn from Blue, Red, Oval, and Rectangular.]
๏ New operations:
  ๏ Complex queries containing tags, wildcards, or ranges of tags.
DataDroplets data model and API
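The type on this slide can be read as: a collection name maps to a partial function from keys to pairs of a value and a set of string tags. A minimal Python sketch of that reading follows. The class name `Store`, its method names, and the tag assignments in the example are hypothetical and are not the DataDroplets API.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Hashable, List, Set, Tuple

# collection name -> key -> (value, set of tags)
Collections = Dict[str, Dict[Hashable, Tuple[Any, Set[str]]]]


class Store:
    """Toy in-memory model of String -> (K -> (V x P String))."""

    def __init__(self) -> None:
        self.collections: Collections = {}

    def put(self, collection: str, key: Hashable, value: Any, tags: Set[str]) -> None:
        self.collections.setdefault(collection, {})[key] = (value, set(tags))

    def get(self, collection: str, key: Hashable) -> Any:
        return self.collections[collection][key][0]

    def get_by_tag_pattern(self, collection: str, pattern: str) -> List[Any]:
        """Return values (in key order) with at least one tag matching a wildcard pattern."""
        items = self.collections.get(collection, {})
        return [
            value
            for _, (value, tags) in sorted(items.items(), key=lambda kv: kv[0])
            if any(fnmatchcase(tag, pattern) for tag in tags)
        ]


# Hypothetical data: the tag-to-key assignment is made up for this example.
store = Store()
store.put("shapes", 1, "a", {"Blue", "Oval"})
store.put("shapes", 3, "c", {"Red", "Oval"})
store.put("shapes", 4, "b", {"Red", "Rectangular"})
```

A wildcard query such as `store.get_by_tag_pattern("shapes", "Re*")` touches every tuple carrying a tag that starts with `Re`, which is the kind of multi-tuple request the following slides are concerned with.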
๏ Existing data placement strategies only efficiently support single tuple or range queries.
๏ Most applications, however, have general multi-tuple queries.
  ๏ Their performance is highly affected by the data placement strategy.
๏ Correlation
  ๏ The probability of a pair of tuples being requested together in a query is not uniform but often highly skewed.
  ๏ Stable over time for most applications.
๏ Challenge of achieving such placement in a decentralized fashion.
State of the art data placement
Our approach: Tagged
๏ Set of tags defined per tuple.
๏ Uses a dimension-reducing and locality-preserving indexing scheme based on a space-filling curve (SFC).
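To illustrate how a space-filling curve can reduce several dimensions to one while keeping nearby points close, here is a minimal sketch using a Z-order (Morton) curve over two dimensions. The choice of curve, the two-dimensional setup, and the function names are assumptions made for illustration; the slides only state that an SFC is used. Each coordinate is assumed to be a non-negative integer already derived from a tag, for example its rank in a sorted vocabulary.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) index: interleave the low `bits` bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # bits of y go to odd positions
    return z


def ring_position(x: int, y: int, bits: int = 16) -> float:
    """Map a 2-D point to a position in [0, 1) on the identifier ring."""
    return interleave_bits(x, y, bits) / float(1 << (2 * bits))
```

Points that share their high-order coordinate bits also share the high-order bits of the index, so they land in the same region of the ring and therefore tend to be stored on the same node or on neighbouring nodes.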
[Figure: Tagged strategy, showing nodes and data items placed on the identifier ring.]
Data placement
[Figure: latency (ms) versus throughput (ops/sec) for the random, ordered, and tagged strategies. Real deployment, Twitter-like workload, 2500 users, 20 nodes, without replication.]
SQL on large scale data stores
๏ Attaining scalable SQL processing on top of a large scale data store.
๏ Does not introduce coordination bottlenecks.
๏ Use the query engine functionality of RDBMS, removing components that limit scalability.
๏ Separation of concerns: execution, transactions, and storage.
๏ Resolve impedance mismatches between the relational model and the data model of large scale data stores.
Distributed query engine
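One way to picture the mismatch between relational rows and a key-value data model is the following sketch, which stores each row under its primary key and keeps a separate index collection for a secondary column. The layout, the collection names, and the helper functions are hypothetical and are not taken from the DQE implementation.

```python
from typing import Dict, List, Tuple

rows: Dict[Tuple[str, int], Dict[str, object]] = {}   # (table, pk) -> column values
index: Dict[Tuple[str, str, object], List[int]] = {}  # (table, column, value) -> pks


def insert_row(table: str, pk: int, values: Dict[str, object], indexed: str) -> None:
    """Store the row and record its primary key under the indexed column value."""
    rows[(table, pk)] = dict(values)
    index.setdefault((table, indexed, values[indexed]), []).append(pk)


def index_scan(table: str, column: str, value: object) -> List[Dict[str, object]]:
    """Fetch rows through the secondary index instead of scanning every row."""
    return [rows[(table, pk)] for pk in index.get((table, column, value), [])]


insert_row("users", 1, {"name": "ana", "city": "Braga"}, indexed="city")
insert_row("users", 2, {"name": "rui", "city": "Porto"}, indexed="city")
insert_row("users", 3, {"name": "eva", "city": "Braga"}, indexed="city")
```

With this layout, a predicate on `city` becomes one index lookup followed by point reads on the row collection, both of which are operations a key-value store supports directly.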
Distributed query engine architecture
[Figure: SQL applications connect through a JDBC driver to DQE instances. Each DQE shows a connection handler, compiler, optimizer, the available operators (selection, projection, join, seqscan, indexscan), and transactions and statistics components, and reaches the large scale data store (rows, indexes, metadata) through the large scale data store client library. Large scale data store applications use the client library directly.]
๏ YCSB workload.
DQE overhead

| Operation | HBase (1 client, 100 tps) | DQE (1 client, 100 tps) | HBase (50 clients, 100 tps) | DQE (50 clients, 100 tps) |
|---|---|---|---|---|
| Insert | 0.58 | 0.93 | 1.04 | 1.98 |
| Update | 0.51 | 1.3 | 2.66 | 3.1 |
| Read | 0.53 | 0.79 | 1.63 | 1.7 |
| Scan | 1.43 | 2.9 | 4.64 | 6.1 |
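The relative overhead of DQE over plain HBase can be read off these measurements by dividing each DQE figure by the matching HBase figure. The short sketch below does that arithmetic; the dictionary layout and variable names are illustrative, and the numbers are copied from the slide.

```python
# (HBase, DQE) for 1 client, then (HBase, DQE) for 50 clients, both at 100 tps.
measurements = {
    "Insert": ((0.58, 0.93), (1.04, 1.98)),
    "Update": ((0.51, 1.3), (2.66, 3.1)),
    "Read": ((0.53, 0.79), (1.63, 1.7)),
    "Scan": ((1.43, 2.9), (4.64, 6.1)),
}

# Ratio DQE / HBase per operation and per client configuration.
overhead = {
    op: tuple(dqe / hbase for hbase, dqe in pairs)
    for op, pairs in measurements.items()
}

for op, (one_client, fifty_clients) in overhead.items():
    print(f"{op}: x{one_client:.2f} (1 client), x{fifty_clients:.2f} (50 clients)")
```

Because the result is a ratio, it does not depend on the unit of the measurements, which the slide does not state.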
๏ 1 to 30 HBase RegionServers (1 to 10 DQE instances).
๏ TPC-C workload.
DQE non-transactional scalability
[Figure: throughput (tpmC, 0 to 1.5 · 10^5) versus number of nodes (0 to 30) for DQE and PyTPCC, annotated "Linear scalability" and "Network saturated".]
๏ A new architecture for large scale data stores.
๏ New large scale data store with additional consistency guarantees and higher level data processing primitives.
๏ A multi-tuple data placement strategy that allows large sets of related data to be stored and retrieved efficiently at once.
  ๏ Provides better results in overall query performance.
  ๏ Usefulness of having multiple simultaneous placement strategies in a multi-tenant system.
๏ Design modifications to existing relational SQL query engines, allowing them to be distributed and to efficiently run SQL on scalable data stores while preserving scalability.
Main contributions
๏ A prototype of a new large scale data store, following the proposed architecture and implementing the novel data placement strategy.
  ๏ Extensive simulation and real results, under a workload representative of applications currently exploiting the scalability of large scale data stores.
๏ A simple but realistic benchmark for large scale stores based on Twitter and currently known statistical data about its usage.
๏ A prototype of a distributed SQL query engine running on a large scale data store.
  ๏ Experimental results, using standard industrial database benchmarks.
Main results
Clouder: A Flexible Large Scale Decentralized Object Store
Ricardo Vilaça
Supervised by Rui Oliveira
December 14, 2012
๏ Involved in a National research project, Stratus (PTDC/EIA-CCO/115570/2009), and two European research projects: CumuloNimbo (FP7-257993) and GORDA (FP6-IST2-004758)
๏ Ricardo Vilaça, Rui Carlos Oliveira and José Pereira. A correlation-aware data placement strategy for key-value stores. In DAIS 2011
๏ Miguel Matos, Ricardo Vilaça, José Pereira and Rui Oliveira. An epidemic approach to dependable key-value substrates. In DCDV 2011
๏ Ricardo Vilaça, Francisco Cruz and Rui Oliveira. On the expressiveness and trade-offs of large scale tuple stores. In DOA 2010
๏ Ricardo Vilaça and Rui Oliveira. Clouder: a flexible large scale decentralized object store: architecture overview. In WDDDM 2009
Publications
๏ Builds on the Chord structured ring overlay network.
  ๏ Nodes in the overlay have unique identifiers uniformly picked from the [0,1] interval and ordered along the ring.
๏ Each node is responsible for the storage of buckets of a distributed hash table (DHT) also mapped into the same [0,1] interval.
๏ Several data placement strategies defined on a per collection basis.
๏ Automatic load redistribution on membership changes.
  ๏ As some workloads may impair the uniform data distribution, the system implements dynamic load-balancing.
Data placement
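A minimal sketch of how a position in the [0,1] interval can be resolved to the responsible node on a ring, assuming Chord's successor rule (the first node at or after the position, wrapping around at the end). The node names and positions are made up for illustration, and the class is not DataDroplets code.

```python
from bisect import bisect_left
from typing import List, Tuple


class Ring:
    def __init__(self, nodes: List[Tuple[float, str]]) -> None:
        self.nodes = sorted(nodes)                    # (position, node id)
        self.positions = [p for p, _ in self.nodes]

    def responsible(self, position: float) -> str:
        """Successor rule: first node at or after `position`, wrapping to the start."""
        i = bisect_left(self.positions, position) % len(self.nodes)
        return self.nodes[i][1]

    def join(self, position: float, node_id: str) -> None:
        """Add a node; only positions between its predecessor and itself change owner."""
        self.nodes = sorted(self.nodes + [(position, node_id)])
        self.positions = [p for p, _ in self.nodes]


ring = Ring([(0.0, "n1"), (0.25, "n2"), (0.5, "n3"), (0.75, "n4")])
```

When a node joins at position 0.6, only the interval between 0.5 and 0.6 moves to it; every other position keeps its previous owner, which is what makes redistribution on membership changes local.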
๏ The random strategy is based on a consistent hash.
  ๏ Pseudo-randomly hash the tuple’s key.
  ๏ Uniformly maps tuple identifiers to the identifier space, providing automatic load balancing.
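The random strategy can be sketched as hashing the tuple's key to a pseudo-random position in [0, 1). SHA-1 is used here only as a convenient, well-mixed hash; the slides do not name the hash function, so that choice and the function name are assumptions.

```python
import hashlib


def random_position(key: str) -> float:
    """Hash a tuple key to a deterministic, roughly uniform position in [0, 1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Keep 53 bits so the integer converts to a float exactly and stays below 1.0.
    value = int.from_bytes(digest[:7], "big") >> 3
    return value / float(1 << 53)


# Count how many of 10000 synthetic keys land in each quarter of the ring.
quarters = [0, 0, 0, 0]
for i in range(10000):
    quarters[int(random_position(f"user{i}") * 4)] += 1
```

The counts in `quarters` come out close to one another, which is the automatic load balancing the slide refers to; the price is that related tuples are scattered across the ring.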
[Figure: Random strategy, showing nodes and data items a, b, c, d placed on the identifier ring at positions between 0.000 and 0.850.]
Random
๏The ordered strategy places tuples according to the partial order of the tuple’
[Figure: Ordered strategy, showing nodes and data items a, b, c, d placed on the identifier ring at positions between 0.000 and 0.850.]
Ordered
DataDroplets Simulated evaluation Setting
๏ 2 Dual-Core AMD Opteron processors running at 2.53GHz and 2GB of RAM.
๏ Network delay model with latency uniformly distributed between 1 ms and 2 ms to simulate a LAN.
๏ Hybrid simulation, with CPU time profiled from real execution.
๏ The store was populated with 10000 users, and the same number of concurrently active users was simulated.
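The delay model described on this slide is simple to reproduce: each message gets a latency drawn uniformly between 1 ms and 2 ms. The sketch below is illustrative only; the function name and the seed are assumptions and are not part of the simulator.

```python
import random


def lan_delay_ms(rng: random.Random) -> float:
    """Draw one message latency from the uniform [1 ms, 2 ms] LAN model."""
    return rng.uniform(1.0, 2.0)


rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [lan_delay_ms(rng) for _ in range(10000)]
mean_delay = sum(samples) / len(samples)
```

For a uniform distribution on [1, 2] the expected latency is 1.5 ms, so `mean_delay` should sit close to that value.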
DataDroplets real evaluation setting
๏ Real
  ๏ 24 AMD Opteron processor cores running at 2.1GHz, 128GB of RAM.
  ๏ 20 instances of the Java Virtual Machine (1.6.0) running ProtoPeer.
  ๏ Apache MINA 1.1.3 for communication.
  ๏ All data persistently stored using Berkeley DB Java Edition 4.0.5.
  ๏ Populated with 2500 concurrent users and the same number of active users were
Twitter-like workload
๏ The workload mimics a Twitter-like application.
๏ It needs three collections to store its information: users, tweets, and users_timeline.
๏ Operations
  ๏ List<Tweet> statuses_user_timeline(String userID, int start, int count)
  ๏ List<Tweet> statuses_friends_timeline(String userID, int start, int count)
  ๏ List<Tweet> search_contains_hashtag(String topic)
  ๏ List<Tweet> statuses_mentions(String owner)
  ๏ statuses_update(Tweet tweet)
  ๏ friendships_create(String userID, String toStartUserID)
  ๏ friendships_destroy(String userID, String toStopUserID)
๏ Open-Source
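A rough in-memory sketch of a few of these operations over the three collections. It is written in Python rather than mirroring the Java-style signatures, and the data layout (plain dictionaries, newest-first timelines, a set of followed users per user, tweets as dictionaries with `owner` and `text`) is an assumption made for illustration, not the benchmark's implementation.

```python
from typing import Dict, List, Set

users: Dict[str, Set[str]] = {}            # user id -> ids of users they follow
tweets: Dict[int, dict] = {}               # tweet id -> tweet
users_timeline: Dict[str, List[int]] = {}  # user id -> own tweet ids, newest first


def friendships_create(user_id: str, to_start_user_id: str) -> None:
    users.setdefault(user_id, set()).add(to_start_user_id)


def friendships_destroy(user_id: str, to_stop_user_id: str) -> None:
    users.setdefault(user_id, set()).discard(to_stop_user_id)


def statuses_update(tweet: dict) -> None:
    tweet_id = len(tweets) + 1  # increasing ids stand in for timestamps
    tweets[tweet_id] = dict(tweet, id=tweet_id)
    users_timeline.setdefault(tweet["owner"], []).insert(0, tweet_id)


def statuses_user_timeline(user_id: str, start: int, count: int) -> List[dict]:
    ids = users_timeline.get(user_id, [])[start:start + count]
    return [tweets[i] for i in ids]


def statuses_friends_timeline(user_id: str, start: int, count: int) -> List[dict]:
    ids = [i for f in users.get(user_id, set()) for i in users_timeline.get(f, [])]
    ids.sort(reverse=True)  # higher id means more recent in this sketch
    return [tweets[i] for i in ids[start:start + count]]


def search_contains_hashtag(topic: str) -> List[dict]:
    return [t for t in tweets.values() if "#" + topic in t["text"].split()]


friendships_create("ana", "rui")
friendships_create("ana", "eva")
statuses_update({"owner": "rui", "text": "hello #clouder"})
statuses_update({"owner": "eva", "text": "reading about #dht"})
statuses_update({"owner": "rui", "text": "more on #dht"})
```

`statuses_friends_timeline` and `search_contains_hashtag` both read many tuples per request, which is why the placement of correlated tuples matters for this workload.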
DataDroplets Replication
[Figure: latency (ms) versus incoming throughput (ops/sec, up to 1 · 10^4) with no replication, 3 replicas synchronously, and 3 replicas asynchronously. Simulation, Twitter-like workload, population of 10000 users, 10000 active users, 100 nodes, with tags.]
DataDroplets Scalability
[Figure: latency (ms) versus incoming throughput (ops/sec, up to 2 · 10^4) for 100 nodes and 200 nodes. Simulation, Twitter-like workload, population of 10000 users, 10000 active users, without replication, with tags.]
Data placement simulated results
[Figure: latency (ms) versus throughput (ops/sec, up to 1 · 10^4) for the random, ordered, and tagged strategies. Twitter-like workload, population of 10000 users, 10000 active users, 100 nodes, without replication.]
Number of exchanged messages
[Figure: total messages (up to 1 · 10^7) versus number of nodes (0 to 200) for the random and tagged strategies.]
๏ CumuloNimbo FP7 project.
๏ HBase and Apache Derby.
Distributed query engine implementation
[Figure: SQL applications, each with a DQE instance, form the query engine layer on top of a storage layer (Master and HRegionServers) and a file system layer (Name Node and DataNodes), alongside transaction management.]
DQE non-transactional evaluation setting
• We ran the experiments on a cluster of forty-two machines, each with a 3.10GHz Quad-Core i3-2100 CPU, 4GB of memory, and a local SATA disk.
• The number of client machines ranged from one to ten, each running a hundred and fifty client threads. Each client machine also ran a DQE instance in embedded mode.
• One machine was used to run the HDFS namenode, HBase Master, and ZooKeeper.
• The remaining machines are RegionServers, each configured with a heap of 3GB and also running an HDFS datanode instance.
• The TPC-C database ranges from five warehouses for a single RegionServer to a hundred and fifty warehouses for thirty RegionServers.
DQE transactional evaluation setting
• Cluster of thirty-six machines.
• Six of them run the HDFS namenode, the HBase Master, the SO server, and a ZooKeeper ensemble.
• The YCSB workload is run from five machines, each running a hundred client threads.
• Workloads: read-only, single-row write, multi-row write, and a mixed workload with 30% scans.
• Remaining machines as RegionServers, each configured with a heap of 16GB.
• Each machine has two Xeon Quad-Core 2.40GHz processors, 24GB of memory, gigabit Ethernet, and four SATA hard disks.
• MySQL is configured with a single management node and a single MySQL daemon.
• Redundancy of 3.
DQE transactional scalability
[Figure: four panels (Read-Only, Write-Only single, Write-Only multi, Mixed) plotting throughput (ops/sec) against the number of storage nodes (0 to 25) for HBase, SO+DQE, and MySQL.]