



[Figure: request latency distribution; x-axis: request latency in ms]

Distributed NoSQL Storage for Extreme-Scale System Services
Tonglin Li¹, Ioan Raicu¹,²

¹Illinois Institute of Technology, ²Argonne National Laboratory

Abstract

FRIEDA-State: Scalable State Management for Scientific Applications on Cloud

Graph/Z: A Key-Value Store Based Scalable Graph Processing System

Selected Publications

WaggleDB: A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks

Acknowledgement

Journal papers
• Tonglin Li, Xiaobing Zhou, Ioan Raicu, et al. A Convergence of Distributed Key-Value Storage in Cloud Computing and Supercomputing, Journal of CCPE, 2015.
• Iman Sadooghi, Tonglin Li, Kevin Brandstatter, Ioan Raicu, et al. Understanding the Performance and Potential of Cloud Computing for Scientific Applications, IEEE Transactions on Cloud Computing (TCC), 2015.
• Ke Wang, Kan Qiao, Tonglin Li, Michael Lang, Ioan Raicu, et al. Load-balanced and Locality-aware Scheduling for Data-intensive Workloads at Extreme Scales, Journal of CCPE, 2015.

Conference papers
• Tonglin Li, Ke Wang, Dongfang Zhao, Kan Qiao, Iman Sadooghi, Xiaobing Zhou, Ioan Raicu. A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud, IEEE International Conference on Big Data, 2015.
• Tonglin Li, Kate Keahey, Ke Wang, Dongfang Zhao, Ioan Raicu. A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks, ScienceCloud, 2015.
• Tonglin Li, Ioan Raicu, Lavanya Ramakrishnan. Scalable State Management for Scientific Applications in the Cloud, IEEE International Congress on Big Data, 2014.
• Tonglin Li, Xiaobing Zhou, Ioan Raicu, et al. ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table, IPDPS, 2013.
• Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, Dries Kimpe, Philip Carns, Robert Ross, Ioan Raicu. FusionFS: Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems, IEEE International Conference on Big Data, 2014.
• Ke Wang, Xiaobing Zhou, Tonglin Li, Dongfang Zhao, Michael Lang, Ioan Raicu. Optimizing Load Balancing and Data-Locality with Data-aware Scheduling, IEEE International Conference on Big Data, 2014.

Posters and extended abstracts
• Tonglin Li, Chaoqi Ma, Jiabao Li, Xiaobing Zhou, Ioan Raicu, et al. GRAPH/Z: A Key-Value Store Based Scalable Graph Processing System, IEEE Cluster, 2015.
• Tonglin Li, Kate Keahey, Rajesh Sankaran, Pete Beckman, Ioan Raicu. A Cloud-based Interactive Data Infrastructure for Sensor Networks, SC 2014.
• Tonglin Li, Raman Verma, Xi Duan, Hui Jin, Ioan Raicu. Exploring Distributed Hash Tables in High-End Computing, ACM SIGMETRICS Performance Evaluation Review (PER), 2011.

Motivation
• Processing graph queries
• Handling large data sets
• Fault tolerance

Design and Implementation
• Pregel-like processing model (see the sketch below)
• Uses ZHT as the storage backend
• Partitioning performed at the master node
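The Pregel-like model processes the graph in bulk-synchronous supersteps: each vertex consumes the messages sent to it in the previous superstep, updates its value, and emits messages along its out-edges, with vertex state held in the key-value backend. The following is a minimal C++ sketch of one superstep; the types and the placeholder compute step are illustrative assumptions, not GRAPH/Z's actual code.

```cpp
// Minimal BSP-style superstep sketch (hypothetical types; not GRAPH/Z's actual API).
#include <string>
#include <unordered_map>
#include <vector>

struct Vertex {
    std::string id;
    double value = 0.0;
    std::vector<std::string> out_edges;
    bool active = true;
};

using MessageBox = std::unordered_map<std::string, std::vector<double>>;

// One superstep: consume messages addressed to local vertices, update state,
// and produce messages for the next superstep (delivered via the key-value backend).
void superstep(std::unordered_map<std::string, Vertex>& local_partition,
               const MessageBox& inbox, MessageBox& outbox) {
    for (auto& [id, v] : local_partition) {
        auto it = inbox.find(id);
        if (it == inbox.end() && !v.active) continue;   // vote-to-halt semantics
        double sum = 0.0;
        if (it != inbox.end())
            for (double m : it->second) sum += m;
        v.value = 0.5 * v.value + 0.5 * sum;            // placeholder compute step
        for (const auto& dst : v.out_edges)             // scatter along out-edges
            outbox[dst].push_back(v.value / v.out_edges.size());
        v.active = !v.out_edges.empty();
    }
}
```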

Highlighted features
• Data locality
• Load balance

Performance

Contribution

• ZHT: a light-weight, reliable, persistent, dynamic, scalable zero-hop distributed hash table
  ◦ Design and implementation of ZHT, optimized for high-end computing
  ◦ Verified scalability at 32K-core scale
  ◦ Achieved latencies of 1.1 ms and throughput of 18M ops/sec on a supercomputer, and 0.8 ms and 1.2M ops/sec on a cloud
  ◦ Simulated ZHT at 1-million-node scale to assess its potential use in extreme-scale systems
• ZHT/Q: a flexible QoS-fortified distributed key-value storage system for the cloud
  ◦ Supports different QoS latency targets on a single deployment for multiple concurrent applications
  ◦ Provides both guaranteed and best-effort services
  ◦ Benchmarked on a real system (16 nodes) and in simulation (512 nodes)
• FRIEDA-State: scalable state management for scientific applications on the cloud
  ◦ Design and implementation of FRIEDA-State
  ◦ Lightweight capturing, storage, and vector clock-based event ordering
  ◦ Evaluated on multiple platforms at scales of up to 64 VMs
• WaggleDB: a dynamically scalable cloud data infrastructure for sensor networks
  ◦ Design and implementation of WaggleDB
  ◦ Supports high write concurrency, transactional command execution, and tier-independent dynamic scalability
  ◦ Evaluated with up to 128 concurrent clients
• GRAPH/Z: a key-value store based scalable graph processing system
  ◦ Design and implementation of GRAPH/Z, a BSP-model graph processing system built on top of ZHT
  ◦ Exploits data locality and minimizes data movement between nodes
  ◦ Benchmarked at up to 16-node scale

ZHT: A Light-weight Reliable Dynamic Scalable Zero-hop Distributed Hash Table

Motivation
• Widening performance gap between storage and computing resources
• Large storage systems suffer from metadata bottlenecks
• No suitable key-value store solution for HPC platforms

Design and Implementation
• Written in C++ with few dependencies
• Modified consistent hashing (zero-hop routing; see the sketch below)
• Persistent back end: NoVoHT
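Zero-hop lookup is the key design point: because every node keeps the full membership table, a client can hash a key and resolve the responsible server locally, with no intermediate routing hops. A minimal sketch, assuming a simplified hash-modulo placement (ZHT's modified consistent hashing differs in detail):

```cpp
// Zero-hop addressing sketch (illustrative only).
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct Member { std::string ip; uint16_t port; };

// Hash the key and map it onto the partition space, then index the
// membership table that all nodes keep in sync; no routing hops needed.
const Member& locate(const std::string& key,
                     const std::vector<Member>& membership) {
    uint64_t h = std::hash<std::string>{}(key);
    size_t partition = h % membership.size();   // assumes a non-empty table
    return membership[partition];
}
```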

Primitives
• insert, lookup, remove
• append, cswap, callback
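A hypothetical client-side interface showing the primitive set above; the signatures are illustrative assumptions, not ZHT's actual API:

```cpp
// Hypothetical key-value client interface (not ZHT's real C++ signatures).
#include <functional>
#include <string>

class KVClient {
public:
    virtual ~KVClient() = default;
    virtual int insert(const std::string& key, const std::string& value) = 0;
    virtual int lookup(const std::string& key, std::string& value_out) = 0;
    virtual int remove(const std::string& key) = 0;
    // append: add to an existing value without a read-modify-write round trip
    virtual int append(const std::string& key, const std::string& delta) = 0;
    // cswap: compare-and-swap; succeeds only if the stored value equals `expected`
    virtual int cswap(const std::string& key, const std::string& expected,
                      const std::string& desired) = 0;
    // callback: register a function invoked when the key's value changes
    virtual int callback(const std::string& key,
                         std::function<void(const std::string&)> on_change) = 0;
};
```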

Highlighted features
• Persistence
• Dynamic membership
• Fault tolerance via replication

Performance

Motivation
• Cloud platforms for scientific applications
• Need for application reproducibility and persistence of state
• Clock drift issues in dynamic environments

Design and Implementation
• Local files store captured states
• States are merged and reordered with vector clocks (see the sketch below)
• A key-value store provides storage and query support
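Vector clocks sidestep clock drift by ordering captured state events logically rather than by wall-clock time: merging two clocks takes the element-wise maximum, and one event precedes another only if its clock is no greater in every component and strictly smaller in at least one. A minimal sketch, assuming a simple per-process counter vector (FRIEDA-State's actual record format is not shown on the poster):

```cpp
// Vector-clock merge and comparison sketch (illustrative assumptions).
#include <algorithm>
#include <cstdint>
#include <vector>

using VClock = std::vector<uint64_t>;   // one counter per process

// Element-wise maximum: the clock of a merged/observed state.
VClock merge(const VClock& a, const VClock& b) {
    VClock out(std::max(a.size(), b.size()), 0);
    for (size_t i = 0; i < out.size(); ++i)
        out[i] = std::max(i < a.size() ? a[i] : 0, i < b.size() ? b[i] : 0);
    return out;
}

// True if the event with clock `a` happened before the event with clock `b`.
bool happened_before(const VClock& a, const VClock& b) {
    bool strictly_less = false;
    for (size_t i = 0; i < std::max(a.size(), b.size()); ++i) {
        uint64_t ai = i < a.size() ? a[i] : 0, bi = i < b.size() ? b[i] : 0;
        if (ai > bi) return false;
        if (ai < bi) strictly_less = true;
    }
    return strictly_less;
}
```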

Motivation
• Unreliable networks
• Wide range of request rates
• Administrators need to interact with nodes
• High write concurrency
• Many sensor data types
• Scalable architecture required

Design and Implementation
• Multi-tier architecture
• Independent components in each tier
• Each tier organized as a Phantom domain for dynamic scaling
• Message queues as write buffers (see the sketch below)
• Transactional interaction via the database
• Column-family storage with semi-structured data for various data types
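Buffering writes in message queues lets the ingest tier absorb bursts from many sensors and hand the database tier large batched inserts, decoupling the tiers so each can scale independently. A minimal sketch of such a write buffer, with hypothetical types that are not WaggleDB's actual interfaces:

```cpp
// Write-buffering sketch for the ingest tier (hypothetical types).
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

struct SensorRecord {
    std::string node_id;     // row key
    std::string sensor;      // column qualifier (varies by sensor type)
    std::string payload;     // serialized reading, semi-structured
    uint64_t    timestamp;
};

class WriteBuffer {
public:
    void enqueue(SensorRecord r) {                 // called by the message-queue tier
        std::lock_guard<std::mutex> g(mu_);
        pending_.push_back(std::move(r));
    }
    // Drain up to `max_batch` records; the caller issues one batched
    // insert into the database tier per call.
    std::vector<SensorRecord> drain(size_t max_batch) {
        std::lock_guard<std::mutex> g(mu_);
        size_t n = std::min(max_batch, pending_.size());
        std::vector<SensorRecord> batch(pending_.begin(), pending_.begin() + n);
        pending_.erase(pending_.begin(), pending_.begin() + n);
        return batch;
    }
private:
    std::mutex mu_;
    std::deque<SensorRecord> pending_;
};
```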

Performance

On both HPC systems and clouds, the continuously widening performance gap between storage and computing resources prevents us from building scalable data-intensive systems. Distributed NoSQL storage systems are known for their ease of use and attractive performance, and are increasingly used as building blocks of large-scale applications in clouds and data centers. However, little work has been done on bridging this performance gap on supercomputers with NoSQL data stores.

This work presents a convergence of distributed NoSQL storage systems in clouds and supercomputers. It first presents ZHT, a dynamically scalable zero-hop distributed key-value store that aims to be a building block of large-scale systems on clouds and supercomputers. This work also presents several real systems that have adopted ZHT as well as other NoSQL systems: ZHT/Q (a flexible QoS-fortified distributed key-value storage system for the cloud), FRIEDA-State (state management for scientific applications on the cloud), WaggleDB (a cloud-based interactive data infrastructure for sensor network applications), and Graph/Z (a key-value store based scalable graph processing system). All of these systems have been significantly simplified by building on NoSQL storage systems and have shown scalable performance.

[Figure: ZHT architecture. A ZHT manager broadcasts membership updates; each physical node hosts ZHT instances holding partitions and a membership table (UUID, IP, port, capacity, workload) and responds to requests directly.]

Applications
• Distributed storage systems: ZHT/Q, FusionFS, IStore
• Job scheduling/launching systems: MATRIX, Slurm++
• Other systems: Graph/Z, Fabriq

ZHT/Q: A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud

Motivation
• Need to run multiple applications on a single data store
• Optimizing a single deployment for many different requirements

Design and Implementation
• Request batching proxy (see the sketch below)
• Dynamic batching strategy
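The batching proxy accumulates requests and flushes a batch when either a size or an age threshold is reached; latency feedback then tunes the batch size so that each application's QoS latency target is respected. A minimal sketch under those assumptions (ZHT/Q's actual strategy plugins are not reproduced here):

```cpp
// Adaptive request-batching sketch (illustrative, not ZHT/Q's real implementation).
#include <chrono>
#include <ratio>
#include <string>
#include <vector>

struct Request { std::string key, value; };

class BatchingProxy {
public:
    explicit BatchingProxy(double latency_target_ms)
        : target_ms_(latency_target_ms) {}

    // Returns a non-empty batch when it is time to send one.
    std::vector<Request> submit(Request r) {
        if (bucket_.empty()) opened_ = Clock::now();
        bucket_.push_back(std::move(r));
        double age_ms = std::chrono::duration<double, std::milli>(
                            Clock::now() - opened_).count();
        if (bucket_.size() >= max_batch_ || age_ms >= target_ms_ / 2.0)
            return flush();
        return {};
    }

    // Latency feedback: shrink batches when the QoS target is missed, grow otherwise.
    void report_latency(double observed_ms) {
        if (observed_ms > target_ms_ && max_batch_ > 1) max_batch_ /= 2;
        else if (observed_ms < 0.5 * target_ms_)        max_batch_ += 8;
    }

private:
    using Clock = std::chrono::steady_clock;
    std::vector<Request> flush() {
        std::vector<Request> out;
        out.swap(bucket_);
        return out;
    }
    double target_ms_;
    size_t max_batch_ = 64;
    std::vector<Request> bucket_;
    Clock::time_point opened_;
};
```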

Highlighted features
• Adaptive request batching
• QoS support
• Traffic-aware automatic performance tuning

Performance

[Figure: ZHT/Q request batching proxy architecture. A client API wrapper pushes requests into batch buckets; a batching strategy engine with pluggable strategies chooses a strategy using latency feedback; a condition monitor and sender checks batch conditions and sends batches to the key-value servers; a result service unpacks returned batch results into a response buffer.]

0"

2000"

4000"

6000"

8000"

10000"

12000"

14000"

16000"

1" 2" 4" 8" 16"

Single'nod

e'throughp

ut'in'ops/s'

Number'of'nodes'

Pa*ern"1"

Pa*ern"2"

Pa*ern"3"

Pa*ern"4"

Workloads with multiple QoS

Performance

[Figures: Storage solution comparison (amortized latency vs. number of clients for file-based storage, 1–8 Cassandra servers, and DynamoDB) and overhead analysis of the file-based storage solution (amortized merging, amortized moving, and file write latency vs. number of clients).]

[Figures: Scalable concurrent writes (latency in ms vs. number of clients for 1–8 queue servers), speedup of distributed servers, dynamic tier scaling (average and real-time latency as the number of queue servers grows from 1 to 8), batch request latency distributions, throughputs and scalability, system architecture, and distributed event ordering.]