![Page 1: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/1.jpg)
Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services
Ke Wang, Abhishek Kulkarni, Michael Lang, Dorian Arnold, Ioan Raicu
USRC @ Los Alamos National Laboratory
Datasys @ Illinois Institute of Technology
CS @ Indiana University
CS @ University of New Mexico
November 20th, 2013 at IEEE/ACM Supercomputing/SC 2013
![Page 2: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/2.jpg)
Current HPC System Services
• Extreme scale
• Lack of decomposition for insight
• Many services have centralized designs
• Impacts of service architectures: an open question
![Page 3: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/3.jpg)
Long Term Goals
• Modular components design for composable services
• Explore the design space for HPC services
• Evaluate the impacts of different design choices
![Page 4: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/4.jpg)
Contribution
• A taxonomy for classifying HPC system services
• A simulation tool to explore Distributed Key-Value Stores (KVS) design choices for large-scale system services
• An evaluation of KVS design choices for extreme-scale systems using both synthetic and real workload traces
![Page 5: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/5.jpg)
Outline
• Introduction & Motivation
• Key-Value Store Taxonomy
• Key-Value Store Simulation
• Evaluation
• Conclusions & Future Work
![Page 7: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/7.jpg)
Distributed System Services
• Job Launch, Resource Management Systems
• System Monitoring
• I/O Forwarding, File Systems
• Function Call Shipping
• Key-Value Stores
![Page 8: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/8.jpg)
Key Issues: Distributed System Services
• Scalability
• Dynamicity
• Fault Tolerance
• Consistency
![Page 9: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/9.jpg)
Key-Value Stores and HPC
• Large volume of data and state information
• Distributed NoSQL data stores used as building blocks
• Examples:
  – Resource management (job, node status info)
  – Monitoring (system active logs)
  – File systems (metadata)
  – SLURM++, MATRIX [1], FusionFS [2]

[1] K. Wang, I. Raicu. “Paving the Road to Exascale through Many Task Computing”, Doctoral Showcase, IEEE/ACM Supercomputing 2012 (SC12)
[2] D. Zhao, I. Raicu. “Distributed File Systems for Exascale Computing”, Doctoral Showcase, IEEE/ACM Supercomputing 2012 (SC12)
![Page 11: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/11.jpg)
HPC KVS Taxonomy: Why?
• Decomposition
• Categorization
• Suggestion
• Implication
![Page 12: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/12.jpg)
HPC KVS Taxonomy: Components
• Service model: functionality
• Data model: distribution and management of data
• Network model: dictates how the components are connected
• Recovery model: how to deal with component failures
• Consistency model: how rapidly data modifications propagate
![Page 13: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/13.jpg)
Centralized Architectures
• Data model: centralized
• Network model: aggregation tree
• Recovery model: fail-over
• Consistency model: strong
![Page 14: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/14.jpg)
Distributed Architectures
• Data model: distributed with partition
• Network model: fully-connected, partial knowledge
• Recovery model: consecutive replicas
• Consistency model: strong, eventual

| | Voldemort | Pastry | ZHT |
|---|---|---|---|
| Data | distributed | distributed | distributed |
| Network | fully-connected | partially-connected | fully-connected |
| Recovery | n-way replication | n-way replication | n-way replication |
| Consistency | eventual | strong | eventual |
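One way to make the taxonomy concrete is to treat each system as a record with one value per component model. The sketch below is illustrative only: the class name `KVSDesign`, the list `DESIGNS`, and the helper `differing_components` are hypothetical names, and the rows are transcribed from the Voldemort / Pastry / ZHT comparison above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVSDesign:
    """One point in the taxonomy: a value for each component model."""
    name: str
    data: str         # data model
    network: str      # network model
    recovery: str     # recovery model
    consistency: str  # consistency model


# Rows transcribed from the comparison of Voldemort, Pastry, and ZHT.
DESIGNS = [
    KVSDesign("Voldemort", "distributed", "fully-connected",
              "n-way replication", "eventual"),
    KVSDesign("Pastry", "distributed", "partially-connected",
              "n-way replication", "strong"),
    KVSDesign("ZHT", "distributed", "fully-connected",
              "n-way replication", "eventual"),
]


def differing_components(a, b):
    """Return the names of the component models on which two designs differ."""
    fields = ("data", "network", "recovery", "consistency")
    return [f for f in fields if getattr(a, f) != getattr(b, f)]
```

Comparing two records this way shows which design choices separate two systems, for example Voldemort and Pastry differ in network and consistency models only.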
![Page 16: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/16.jpg)
KVS Simulation Design
• Discrete event simulation
  – PeerSim
  – Evaluated others: OMNeT++, OverSim, SimPy
• Configurable number of servers and clients
• Different architectures
• Two parallel queues in a server
  – Communication queue (send/receive requests)
  – Processing queue (process request locally)
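The two-queue server can be illustrated with a very small discrete-event loop. This is a sketch in plain Python, not PeerSim code and not the authors' simulator: the function name `simulate_server`, the three fixed costs, and the assumption that each queue serves one item at a time in FIFO order are all choices made for the example.

```python
import heapq


def simulate_server(arrivals, recv_cost, proc_cost, send_cost):
    """Return {request_id: completion_time} for a single server.

    The server has two queues that work in parallel:
      - "comm": receives requests and sends responses
      - "proc": resolves requests locally
    Each queue is assumed to serve one item at a time, FIFO.
    """
    events = []  # heap of (ready_time, seq, queue, stage, request_id)
    seq = 0      # tie-breaker so equal times keep insertion order
    for rid, t in enumerate(arrivals):
        heapq.heappush(events, (t, seq, "comm", "recv", rid))
        seq += 1

    free_at = {"comm": 0, "proc": 0}  # time at which each queue is idle
    cost = {"recv": recv_cost, "proc": proc_cost, "send": send_cost}
    # After a stage finishes, the request moves to (queue, stage).
    next_stage = {"recv": ("proc", "proc"), "proc": ("comm", "send")}
    done = {}

    while events:
        t, _, queue, stage, rid = heapq.heappop(events)
        start = max(t, free_at[queue])   # wait if the queue is busy
        finish = start + cost[stage]
        free_at[queue] = finish
        if stage in next_stage:
            nq, ns = next_stage[stage]
            heapq.heappush(events, (finish, seq, nq, ns, rid))
            seq += 1
        else:
            done[rid] = finish           # response has been sent
    return done
```

With two requests arriving at time 0 and costs of 1, 2, and 1, the second request waits behind the first in both queues, while the communication queue is free to receive it as soon as the first request moves on to processing.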
![Page 17: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/17.jpg)
Simulation Cost Model
The time to resolve a query locally ($t_{LR}$) and the time to resolve a remote query ($t_{RR}$) are given by:

$$t_{LR} = CS + SR + LP + SS + CR$$

For fully connected:

$$t_{RR} = t_{LR} + 2 \times (SS + SR)$$

For partially connected:

$$t_{RR} = t_{LR} + 2k \times (SS + SR)$$

where $k$ is the number of hops to find the predecessor.
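The cost model translates directly into two small functions. The sketch below uses hypothetical names (`t_local`, `t_remote`) and treats the fully-connected case as one hop. The slide does not expand the abbreviations, so reading CS, SR, LP, SS, and CR as client send, server receive, local processing, server send, and client receive is an assumption.

```python
def t_local(cs, sr, lp, ss, cr):
    """Time to resolve a query locally: tLR = CS + SR + LP + SS + CR.

    Assumed meanings (not stated on the slide): client send, server
    receive, local processing, server send, client receive.
    """
    return cs + sr + lp + ss + cr


def t_remote(cs, sr, lp, ss, cr, hops=1):
    """Time to resolve a remote query: tRR = tLR + 2k * (SS + SR).

    hops=1 reproduces the fully-connected formula; hops=k is the
    partially-connected case, where k hops are needed to find the
    predecessor.
    """
    return t_local(cs, sr, lp, ss, cr) + 2 * hops * (ss + sr)
```

Because the extra cost grows linearly with the hop count, the gap between the two topologies widens as routing paths get longer.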
![Page 18: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/18.jpg)
Failure/Recovery Model
• Defines what to do when a node fails
• How a node-state recovers when rejoining after failure

[Figure: two panels showing six servers s0 to s5, a client, and an EM. Each server stores the first replica of its predecessor and the second replica of the server two positions back (s0 holds r5,1 and r4,2; s1 holds r0,1 and r5,2; and so on). Panel 1, s0 fails: "notify failure", "replicate s0 data", "first replica down", "second replica down", "replicate my data". Panel 2, s0 rejoins: "notify back", "s0, s4, s5 data", "remove s0 data", "s0 is back", "remove s5 data".]
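The replica layout in the figure follows a consecutive-placement rule: replica j of server i lives on server (i + j) mod n. The sketch below encodes that rule with hypothetical function names. The `replacement_target` rule (copy to the next consecutive server past the current replica set) is an assumption inferred from the figure labels, not something the slide states.

```python
def replica_holders(server, n_servers, n_replicas=2):
    """Servers that hold replicas 1..n_replicas of `server`'s data."""
    return [(server + j) % n_servers for j in range(1, n_replicas + 1)]


def replicas_stored_on(server, n_servers, n_replicas=2):
    """(owner, replica_index) pairs stored on `server`.

    These are exactly the replicas lost when `server` fails.
    """
    return [((server - j) % n_servers, j) for j in range(1, n_replicas + 1)]


def replacement_target(owner, n_servers, n_replicas=2):
    """Assumed rule: the extra copy of `owner`'s data goes to the next
    consecutive server after its current replica set.
    Requires n_servers > n_replicas + 1.
    """
    return (owner + n_replicas + 1) % n_servers
```

For six servers, a failure of s0 takes down the first replica of s5 and the second replica of s4, which matches the "first replica down" and "second replica down" notifications in the figure.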
![Page 19: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/19.jpg)
Consistency Model
• Strong consistency
  – Every replica observes every update in the same order
  – Client sends requests to a dedicated server (primary replica)
• Eventual consistency
  – Requests are sent to a randomly chosen replica (coordinator)
  – Three key parameters: N, R, W, satisfying R + W > N
  – Use Dynamo [G. DeCandia, 2007] vector clocks to track different versions of data and detect conflicts
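The two mechanisms behind eventual consistency here, the quorum condition R + W > N and version tracking with vector clocks as described in the Dynamo paper, can be sketched in a few lines. The function names and the dictionary representation of a clock (node name to counter) are assumptions made for the example, not the simulator's data structures.

```python
def quorums_overlap(n, r, w):
    """True when every read quorum intersects every write quorum,
    i.e. the condition R + W > N holds."""
    return r + w > n


def compare_clocks(a, b):
    """Compare two vector clocks given as {node: counter} dicts.

    Returns "equal", "before" (a precedes b), "after" (b precedes a),
    or "conflict" (concurrent versions that need reconciliation).
    Missing nodes count as 0.
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(x, 0) <= b.get(x, 0) for x in nodes)
    b_le_a = all(b.get(x, 0) <= a.get(x, 0) for x in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "conflict"
```

With N = 3, choosing R = 2 and W = 2 satisfies the condition, while R = 1 and W = 1 does not, so a read could miss the latest write.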
![Page 21: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/21.jpg)
Evaluation
• Evaluate the overheads
  – Different architectures, focus on distributed ones
  – Different models
• Light-weight simulations
  – Largest experiments: 25 GB RAM, 40 min walltime
• Workloads
  – Synthetic workload with 64-bit key space
  – Real workload traces from 3 representative system services: job launch, system monitoring, and I/O forwarding
![Page 22: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/22.jpg)
Validation
• Validate against ZHT [1] (left) and Voldemort (right)
• ZHT: BG/P, up to 8K nodes (32K cores)
• Voldemort: PRObE Kodiak cluster, up to 800 nodes
[1] T. Li, X. Zhou, K. Brandstatter, D. Zhao, K. Wang, A. Rajendran, Z. Zhang, I. Raicu. “ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table”, IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2013
![Page 23: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/23.jpg)
Fully-connected vs. Partially-connected
• Partial connectivity: higher latency due to the additional routing
• Fully-connected topology: faster response (twice as fast at extreme scale)
![Page 24: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/24.jpg)
Replication Overhead
• Adding replicas always involves overhead
• Replicas have a larger impact on the fully-connected topology than on the partially-connected one
![Page 25: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/25.jpg)
Failure Effect
• Higher failure frequency introduces more overhead, but client request-processing messages remain the dominating factor
![Page 26: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/26.jpg)
Combined Overhead
• Eventual consistency has more overhead than strong consistency
![Page 27: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/27.jpg)
Real Workloads
[Figure panels: Fully connected; Partially connected]
• Job launch and I/O forwarding
  – Eventual consistency performs worse: almost uniform random distribution (URD) for both request type and the key
• Monitoring
  – Eventual consistency works better: all requests are “put”
![Page 28: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/28.jpg)
Simulation Real Services
• ZHT (distributed key/value storage)
  – DKVS implementation
• MATRIX (runtime system)
  – DKVS is used to keep task meta-data
• SLURM++ (job management system)
  – DKVS is used to store task & resource information
• FusionFS (distributed file system)
  – DKVS is used to maintain file/directory meta-data
![Page 30: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/30.jpg)
Conclusions
• Key-value store is a building block
• Service taxonomy is important
• Simulation framework to study services
• Distributed architecture is demanded
• Replication adds overhead
• Fully-connected topology is good
  – As long as the request processing message dominates
• Consistency tradeoffs

[Figure: consistency tradeoff chart with axes Write-Intensity/Availability and Read-Intensity/Performance; regions labeled Eventual Consistency, Strong Consistency, and Weak Consistency]
![Page 31: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/31.jpg)
Future Work
• Extend the simulator to cover more of the taxonomy
• Explore other recovery models
  – Log-based
  – Information dispersal algorithm
• Explore other consistency models
• Explore using DKVS in the development of:
  – General building block library
  – Distributed monitoring system service
  – Distributed message queue system
![Page 32: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/32.jpg)
Acknowledgement
• DOE contract: DE-FC02-06ER25750
• Part of NSF award: CNS-1042543 (PRObE)
• Collaboration with FusionFS project under NSF grant: NSF-1054974
• BG/P resource from ANL
• Thanks to Tonglin Li, Dongfang Zhao, Hakan Akkan
![Page 33: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/33.jpg)
More Information
• More information:
  – http://datasys.cs.iit.edu/~kewang/
• Contact:
  – [email protected]
• Questions?
![Page 34: Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services](https://reader036.vdocuments.mx/reader036/viewer/2022062813/5681649c550346895dd67cbc/html5/thumbnails/34.jpg)
Related Work
• Service simulation
  – Peer-to-peer network simulation
  – Telephony simulations
  – Simulation of consistency
  – Problem: these do not focus on HPC or combine distributed features
• Taxonomy
  – Investigation of distributed hash tables, and an algorithm taxonomy
  – Grid computing workflows taxonomy
  – Problem: none of them drive features in a simulation