



[Figure: request latency distribution; x-axis: request latency in ms]

Distributed NoSQL Storage for Extreme-Scale System Services
Tonglin Li¹, Ioan Raicu¹,²

¹Illinois Institute of Technology, ²Argonne National Laboratory

Abstract

FRIEDA-State: Scalable State Management for Scientific Applications on Cloud

Graph/Z: A Key-Value Store Based Scalable Graph Processing System

Selected Publications

WaggleDB: A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks

Acknowledgement

Journal papers
• Tonglin Li, Xiaobing Zhou, Ioan Raicu, et al. A Convergence of Distributed Key-Value Storage in Cloud Computing and Supercomputing, Journal of CCPE, 2015.
• Iman Sadooghi, Tonglin Li, Kevin Brandstatter, Ioan Raicu, et al. Understanding the Performance and Potential of Cloud Computing for Scientific Applications, IEEE Transactions on Cloud Computing (TCC), 2015.
• Ke Wang, Kan Qiao, Tonglin Li, Michael Lang, Ioan Raicu, et al. Load-balanced and Locality-aware Scheduling for Data-intensive Workloads at Extreme Scales, Journal of CCPE, 2015.

Conference papers
• Tonglin Li, Ke Wang, Dongfang Zhao, Kan Qiao, Iman Sadooghi, Xiaobing Zhou, Ioan Raicu. A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud, IEEE International Conference on Big Data, 2015.
• Tonglin Li, Kate Keahey, Ke Wang, Dongfang Zhao, Ioan Raicu. A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks, ScienceCloud, 2015.
• Tonglin Li, Ioan Raicu, Lavanya Ramakrishnan. Scalable State Management for Scientific Applications in the Cloud, IEEE International Congress on Big Data, 2014.
• Tonglin Li, Xiaobing Zhou, Ioan Raicu, et al. ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table, IPDPS, 2013.
• Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, Dries Kimpe, Philip Carns, Robert Ross, Ioan Raicu. FusionFS: Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems, IEEE International Conference on Big Data, 2014.
• Ke Wang, Xiaobing Zhou, Tonglin Li, Dongfang Zhao, Michael Lang, Ioan Raicu. Optimizing Load Balancing and Data-Locality with Data-aware Scheduling, IEEE International Conference on Big Data, 2014.

Posters and extended abstracts
• Tonglin Li, Chaoqi Ma, Jiabao Li, Xiaobing Zhou, Ioan Raicu, et al. GRAPH/Z: A Key-Value Store Based Scalable Graph Processing System, IEEE Cluster, 2015.
• Tonglin Li, Kate Keahey, Rajesh Sankaran, Pete Beckman, Ioan Raicu. A Cloud-based Interactive Data Infrastructure for Sensor Networks, SC 2014.
• Tonglin Li, Raman Verma, Xi Duan, Hui Jin, Ioan Raicu. Exploring Distributed Hash Tables in High-End Computing, ACM SIGMETRICS Performance Evaluation Review (PER), 2011.

Motivation
• Processing graph queries
• Handling large data sets
• Fault tolerance

Design and Implementation
• Pregel-like processing model (see the sketch below)
• Uses ZHT as the storage backend
• Partitioning performed at the master node
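The Pregel-like model processes the graph in bulk-synchronous supersteps: each vertex consumes the messages sent to it in the previous superstep, updates its value, and emits messages along its out-edges, with vertex state held in the key-value backend. The following is a minimal C++ sketch of one superstep; the types and the placeholder compute step are illustrative assumptions, not GRAPH/Z's actual code.

```cpp
// Minimal BSP-style superstep sketch (hypothetical types; not GRAPH/Z's actual API).
#include <string>
#include <unordered_map>
#include <vector>

struct Vertex {
    std::string id;
    double value = 0.0;
    std::vector<std::string> out_edges;
    bool active = true;
};

using MessageBox = std::unordered_map<std::string, std::vector<double>>;

// One superstep: consume messages addressed to local vertices, update state,
// and produce messages for the next superstep (delivered via the key-value backend).
void superstep(std::unordered_map<std::string, Vertex>& local_partition,
               const MessageBox& inbox, MessageBox& outbox) {
    for (auto& [id, v] : local_partition) {
        auto it = inbox.find(id);
        if (it == inbox.end() && !v.active) continue;   // vote-to-halt semantics
        double sum = 0.0;
        if (it != inbox.end())
            for (double m : it->second) sum += m;
        v.value = 0.5 * v.value + 0.5 * sum;            // placeholder compute step
        for (const auto& dst : v.out_edges)             // scatter along out-edges
            outbox[dst].push_back(v.value / v.out_edges.size());
        v.active = !v.out_edges.empty();
    }
}
```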

Highlighted features
• Data locality
• Load balance

Performance

Contribution

• ZHT: a light-weight, reliable, persistent, dynamic, scalable zero-hop distributed hash table
  ◦ Design and implementation of ZHT, optimized for high-end computing
  ◦ Verified scalability at 32K-core scale
  ◦ Achieved latencies of 1.1 ms and throughput of 18M ops/sec on a supercomputer, and 0.8 ms and 1.2M ops/sec on a cloud
  ◦ Simulated ZHT at 1-million-node scale to assess its potential use in extreme-scale systems
• ZHT/Q: a flexible QoS-fortified distributed key-value storage system for the cloud
  ◦ Supports different QoS latency targets on a single deployment for multiple concurrent applications
  ◦ Provides both guaranteed and best-effort services
  ◦ Benchmarked on a real system (16 nodes) and in simulation (512 nodes)
• FRIEDA-State: scalable state management for scientific applications on the cloud
  ◦ Design and implementation of FRIEDA-State
  ◦ Lightweight capturing, storage, and vector clock-based event ordering
  ◦ Evaluated on multiple platforms at scales of up to 64 VMs
• WaggleDB: a dynamically scalable cloud data infrastructure for sensor networks
  ◦ Design and implementation of WaggleDB
  ◦ Supports high write concurrency, transactional command execution, and tier-independent dynamic scalability
  ◦ Evaluated with up to 128 concurrent clients
• GRAPH/Z: a key-value store based scalable graph processing system
  ◦ Design and implementation of GRAPH/Z, a BSP-model graph processing system built on top of ZHT
  ◦ Exploits data locality and minimizes data movement between nodes
  ◦ Benchmarked at up to 16-node scale

ZHT: A Light-weight Reliable Dynamic Scalable Zero-hop Distributed Hash Table

Motivation
• Widening performance gap between storage and computing resources
• Large storage systems suffer from metadata bottlenecks
• No suitable key-value store solution for HPC platforms

Design and Implementation
• Written in C++ with few dependencies
• Modified consistent hashing (zero-hop routing; see the sketch below)
• Persistent back end: NoVoHT
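Zero-hop lookup is the key design point: because every node keeps the full membership table, a client can hash a key and resolve the responsible server locally, with no intermediate routing hops. A minimal sketch, assuming a simplified hash-modulo placement (ZHT's modified consistent hashing differs in detail):

```cpp
// Zero-hop addressing sketch (illustrative only).
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct Member { std::string ip; uint16_t port; };

// Hash the key and map it onto the partition space, then index the
// membership table that all nodes keep in sync; no routing hops needed.
const Member& locate(const std::string& key,
                     const std::vector<Member>& membership) {
    uint64_t h = std::hash<std::string>{}(key);
    size_t partition = h % membership.size();   // assumes a non-empty table
    return membership[partition];
}
```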

Primitives
• insert, lookup, remove
• append, cswap, callback
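A hypothetical client-side interface showing the primitive set above; the signatures are illustrative assumptions, not ZHT's actual API:

```cpp
// Hypothetical key-value client interface (not ZHT's real C++ signatures).
#include <functional>
#include <string>

class KVClient {
public:
    virtual ~KVClient() = default;
    virtual int insert(const std::string& key, const std::string& value) = 0;
    virtual int lookup(const std::string& key, std::string& value_out) = 0;
    virtual int remove(const std::string& key) = 0;
    // append: add to an existing value without a read-modify-write round trip
    virtual int append(const std::string& key, const std::string& delta) = 0;
    // cswap: compare-and-swap; succeeds only if the stored value equals `expected`
    virtual int cswap(const std::string& key, const std::string& expected,
                      const std::string& desired) = 0;
    // callback: register a function invoked when the key's value changes
    virtual int callback(const std::string& key,
                         std::function<void(const std::string&)> on_change) = 0;
};
```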

Highlighted features
• Persistence
• Dynamic membership
• Fault tolerance via replication

Performance

Motivation
• Cloud platforms for scientific applications
• Need for application reproducibility and persistence of state
• Clock drift issues in dynamic environments

Design and Implementation
• Local files store captured states
• States are merged and reordered with vector clocks (see the sketch below)
• A key-value store provides storage and query support
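Vector clocks sidestep clock drift by ordering captured state events logically rather than by wall-clock time: merging two clocks takes the element-wise maximum, and one event precedes another only if its clock is no greater in every component and strictly smaller in at least one. A minimal sketch, assuming a simple per-process counter vector (FRIEDA-State's actual record format is not shown on the poster):

```cpp
// Vector-clock merge and comparison sketch (illustrative assumptions).
#include <algorithm>
#include <cstdint>
#include <vector>

using VClock = std::vector<uint64_t>;   // one counter per process

// Element-wise maximum: the clock of a merged/observed state.
VClock merge(const VClock& a, const VClock& b) {
    VClock out(std::max(a.size(), b.size()), 0);
    for (size_t i = 0; i < out.size(); ++i)
        out[i] = std::max(i < a.size() ? a[i] : 0, i < b.size() ? b[i] : 0);
    return out;
}

// True if the event with clock `a` happened before the event with clock `b`.
bool happened_before(const VClock& a, const VClock& b) {
    bool strictly_less = false;
    for (size_t i = 0; i < std::max(a.size(), b.size()); ++i) {
        uint64_t ai = i < a.size() ? a[i] : 0, bi = i < b.size() ? b[i] : 0;
        if (ai > bi) return false;
        if (ai < bi) strictly_less = true;
    }
    return strictly_less;
}
```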

Motivation
• Unreliable networks
• Wide range of request rates
• Administrators need to interact with nodes
• High write concurrency
• Many sensor data types
• Scalable architecture required

Design and Implementation
• Multi-tier architecture
• Independent components in each tier
• Each tier organized as a Phantom domain for dynamic scaling
• Message queues as write buffers (see the sketch below)
• Transactional interaction via the database
• Column-family storage with semi-structured data for various data types
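Buffering writes in message queues lets the ingest tier absorb bursts from many sensors and hand the database tier large batched inserts, decoupling the tiers so each can scale independently. A minimal sketch of such a write buffer, with hypothetical types that are not WaggleDB's actual interfaces:

```cpp
// Write-buffering sketch for the ingest tier (hypothetical types).
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

struct SensorRecord {
    std::string node_id;     // row key
    std::string sensor;      // column qualifier (varies by sensor type)
    std::string payload;     // serialized reading, semi-structured
    uint64_t    timestamp;
};

class WriteBuffer {
public:
    void enqueue(SensorRecord r) {                 // called by the message-queue tier
        std::lock_guard<std::mutex> g(mu_);
        pending_.push_back(std::move(r));
    }
    // Drain up to `max_batch` records; the caller issues one batched
    // insert into the database tier per call.
    std::vector<SensorRecord> drain(size_t max_batch) {
        std::lock_guard<std::mutex> g(mu_);
        size_t n = std::min(max_batch, pending_.size());
        std::vector<SensorRecord> batch(pending_.begin(), pending_.begin() + n);
        pending_.erase(pending_.begin(), pending_.begin() + n);
        return batch;
    }
private:
    std::mutex mu_;
    std::deque<SensorRecord> pending_;
};
```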

Performance

On both HPC systems and clouds, the continuously widening performance gap between storage and computing resources prevents us from building scalable data-intensive systems. Distributed NoSQL storage systems are known for their ease of use and attractive performance, and are increasingly used as building blocks of large-scale applications in clouds and data centers. However, little work has been done on bridging this performance gap on supercomputers with NoSQL data stores.

This work presents a convergence of distributed NoSQL storage systems in clouds and supercomputers. It first presents ZHT, a dynamically scalable zero-hop distributed key-value store that aims to be a building block of large-scale systems on clouds and supercomputers. This work also presents several real systems that have adopted ZHT as well as other NoSQL systems: ZHT/Q (a flexible QoS-fortified distributed key-value storage system for the cloud), FRIEDA-State (state management for scientific applications on the cloud), WaggleDB (a cloud-based interactive data infrastructure for sensor network applications), and Graph/Z (a key-value store based scalable graph processing system). All of these systems have been significantly simplified by building on NoSQL storage systems and have shown scalable performance.

[Figure: ZHT architecture. A ZHT manager broadcasts membership updates; each physical node hosts ZHT instances holding partitions and a membership table (UUID, IP, port, capacity, workload) and responds to requests directly.]

Applications
• Distributed storage systems: ZHT/Q, FusionFS, IStore
• Job scheduling/launching systems: MATRIX, Slurm++
• Other systems: Graph/Z, Fabriq

ZHT/Q: A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud

Motivation
• Need to run multiple applications on a single data store
• Optimizing a single deployment for many different requirements

Design and Implementation
• Request batching proxy (see the sketch below)
• Dynamic batching strategy
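The batching proxy accumulates requests and flushes a batch when either a size or an age threshold is reached; latency feedback then tunes the batch size so that each application's QoS latency target is respected. A minimal sketch under those assumptions (ZHT/Q's actual strategy plugins are not reproduced here):

```cpp
// Adaptive request-batching sketch (illustrative, not ZHT/Q's real implementation).
#include <chrono>
#include <ratio>
#include <string>
#include <vector>

struct Request { std::string key, value; };

class BatchingProxy {
public:
    explicit BatchingProxy(double latency_target_ms)
        : target_ms_(latency_target_ms) {}

    // Returns a non-empty batch when it is time to send one.
    std::vector<Request> submit(Request r) {
        if (bucket_.empty()) opened_ = Clock::now();
        bucket_.push_back(std::move(r));
        double age_ms = std::chrono::duration<double, std::milli>(
                            Clock::now() - opened_).count();
        if (bucket_.size() >= max_batch_ || age_ms >= target_ms_ / 2.0)
            return flush();
        return {};
    }

    // Latency feedback: shrink batches when the QoS target is missed, grow otherwise.
    void report_latency(double observed_ms) {
        if (observed_ms > target_ms_ && max_batch_ > 1) max_batch_ /= 2;
        else if (observed_ms < 0.5 * target_ms_)        max_batch_ += 8;
    }

private:
    using Clock = std::chrono::steady_clock;
    std::vector<Request> flush() {
        std::vector<Request> out;
        out.swap(bucket_);
        return out;
    }
    double target_ms_;
    size_t max_batch_ = 64;
    std::vector<Request> bucket_;
    Clock::time_point opened_;
};
```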

Highlighted features
• Adaptive request batching
• QoS support
• Traffic-aware automatic performance tuning

Performance

[Figure: ZHT/Q request batching proxy architecture. A client API wrapper pushes requests into batch buckets; a batching strategy engine with pluggable strategies chooses a strategy using latency feedback; a condition monitor and sender checks batch conditions and sends batches to the key-value servers; a result service unpacks returned batch results into a response buffer.]

0"

2000"

4000"

6000"

8000"

10000"

12000"

14000"

16000"

1" 2" 4" 8" 16"

Single'nod

e'throughp

ut'in'ops/s'

Number'of'nodes'

Pa*ern"1"

Pa*ern"2"

Pa*ern"3"

Pa*ern"4"

Workloads with multiple QoS

Performance

[Figures: Storage solution comparison (amortized latency vs. number of clients for file-based storage, 1–8 Cassandra servers, and DynamoDB) and overhead analysis of the file-based storage solution (amortized merging, amortized moving, and file write latency vs. number of clients).]

[Figures: Scalable concurrent writes (latency in ms vs. number of clients for 1–8 queue servers), speedup of distributed servers, dynamic tier scaling (average and real-time latency as the number of queue servers grows from 1 to 8), batch request latency distributions, throughputs and scalability, system architecture, and distributed event ordering.]