dbos: it’s time for a principled approach to distributed

28
DBOS: It’s Time for a Principled Approach to Distributed Systems State Kostis Kaffes 1 , Peter Kraft 1 , Qian Li 1 , Athinagoras Skiadopoulos 1 , Daniel Hong 2 , Shana Mathew 2 , Michael Cafarella 2 , Vijay Gadepally 2 , Goetz Graefe 3 , Jeremy Kepner 2 , Christos Kozyrakis 1 , Tim Kraska 2 , Michael Stonebraker 2 , Lalith Suresh 4 , Matei Zaharia 1 ( 1 Stanford, 2 MIT, 3 Google, 4 VMWare) Feb. 11, 2021 1

Upload: others

Post on 14-Jan-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DBOS: It’s Time for a Principled Approach to Distributed

DBOS: It’s Time for a Principled Approach to Distributed Systems State

Kostis Kaffes1, Peter Kraft1, Qian Li1, Athinagoras Skiadopoulos1, Daniel Hong2, Shana Mathew2, Michael Cafarella2, Vijay Gadepally2,

Goetz Graefe3, Jeremy Kepner2, Christos Kozyrakis1, Tim Kraska2, Michael Stonebraker2, Lalith Suresh4, Matei Zaharia1

(1Stanford, 2MIT, 3Google, 4VMWare)Feb. 11, 2021

1

Page 2: DBOS: It’s Time for a Principled Approach to Distributed

Problem: Distributed Systems are Hard!

•Hard to build, maintain, extend, and optimize.• Basic features take years to develop for the best

engineers and teams in industry and academia.• Examples: distributing a scheduler, partitioning a file

system namespace, peer discovery, anything involving security, and much more.

2

Page 3: DBOS: It’s Time for a Principled Approach to Distributed

Why are Distributed Systems Hard?

•Distributed systems lack a principled approach to managing state.• State includes cluster configurations, task and

worker metadata, user data, etc.• All system operations act on state.

3

Page 4: DBOS: It’s Time for a Principled Approach to Distributed

Overview1. Describe bad state management.2. Propose a radical approach to good state

management: centralize all state in a DBMS.3. Present case studies showing the benefits of our

proposal to core system operations (e.g. scheduling).

4. Discuss proof-of-concept experiments showing this can work in practice.

4

Page 5: DBOS: It’s Time for a Principled Approach to Distributed

Three Sins of Bad State Management1. Dividing state across many disjoint data stores.2. Providing weak abstractions for manipulating state.3. Using data stores that cannot scale.

5

Page 6: DBOS: It’s Time for a Principled Approach to Distributed

Sin 1: Dividing state across many disjoint data stores.• Example: OpenWhisk/Kubernetes divide state between

stores (CouchDB, Consul, ZK, Kafka, etc.) and in-memory data structures.•Makes cross-cutting operations hard.• OpenWhisk monitoring requires ad-hoc interfaces on

every system component.• Kubernetes scheduler isn’t NUMA aware because it no

one can communicate NUMA state to it.

6

Page 7: DBOS: It’s Time for a Principled Approach to Distributed

Sin 2: Providing weak abstractions for manipulating state.

• Systems missing basic primitives for state manipulation (e.g. atomic updates).• Example: Distributed schedulers must reinvent

concurrency control (over in-memory state).• Example: Analytics over file systems requires

exporting metadata to an external database.

7

Page 8: DBOS: It’s Time for a Principled Approach to Distributed

Sin 3: Using data stores that cannot scale.

• Systems centralize all state on a “master” node.•Master bottlenecks performance for large clusters.• Example: Spark scheduler capped at 1K tasks/second.

8

Page 9: DBOS: It’s Time for a Principled Approach to Distributed

Two Principles of Good State Management

1. Centralize system state and user data in a uniform data model as database tables in a distributed DBMS.

2. Execute all operations on state as DBMS transactions invoked from otherwise stateless processes.

9

Page 10: DBOS: It’s Time for a Principled Approach to Distributed

Distributed DBMS

Linux/Hardware

Well-Managed State

Filesystem Scheduler Communication Layer

Auditor

10

Application Layer

Page 11: DBOS: It’s Time for a Principled Approach to Distributed

A Radical Redesign

Prior Systems• Design data structures to store

state.

DBMS-based Systems• Design schemas and indexes

to store state.

11

Page 12: DBOS: It’s Time for a Principled Approach to Distributed

A Radical Redesign

Prior Systems• Design data structures to store

state.• Use RPCs, manual concurrency

control to modify state.

DBMS-based Systems• Design schemas and indexes

to store state.• Modify state using DBMS

transactions.

12

Page 13: DBOS: It’s Time for a Principled Approach to Distributed

A Radical Redesign

Prior Systems• Design data structures to store

state.• Use RPCs, manual concurrency

control to modify state.• Divide user data across file

systems, remote stores.

DBMS-based Systems• Design schemas and indexes

to store state.• Modify state using DBMS

transactions.• Store user data in the DBMS

blob store.

13

Page 14: DBOS: It’s Time for a Principled Approach to Distributed

A Radical Redesign

Prior Systems• Design data structures to store

state.• Use RPCs, manual concurrency

control to modify state.• Divide user data across file

systems, remote stores.• Analyze data with ad-hoc

monitoring interfaces, log parsers.

DBMS-based Systems• Design schemas and indexes

to store state.• Modify state using DBMS

transactions.• Store user data in the DBMS

blob store.• Analyze state in the DBMS

directly!

14

Page 15: DBOS: It’s Time for a Principled Approach to Distributed

Our Proposal: DBOS•We are designing DBOS, a cluster operating system

based on these two principles.•We hope future systems can use it as a framework to

more easily build a system with good state management.• Just starting work on DBOS, details unclear!

15

Page 16: DBOS: It’s Time for a Principled Approach to Distributed

Distributed DBMS

Linux/Hardware

DBOS on the Stack

Filesystem Scheduler Communication Layer

Auditor

16

Application Layer

Page 17: DBOS: It’s Time for a Principled Approach to Distributed

Benefits of Good State Management

No more ad-hoc approach!• Extensibility• Introspection

17

Page 18: DBOS: It’s Time for a Principled Approach to Distributed

https://twitter.com/redpenblackpen/status/875100791165648898/photo/1

Benefits of Good State Management• Extensibility• Introspection

No more ad-hoc approach!

18

Page 19: DBOS: It’s Time for a Principled Approach to Distributed

https://twitter.com/redpenblackpen/status/875100791165648898/photo/1

Benefits of Good State Management• Extensibility• Introspection

No more ad-hoc approach!

19

DBOS approach+ Easier to maintain, extend, and optimize+ Easier to monitor, analyze, and debug

Page 20: DBOS: It’s Time for a Principled Approach to Distributed

Case Study I: Cluster Scheduling

• A scheduler = a stored procedure interacts with DB+Easy to add new dimensions across layers: NUMA,

heterogenous hardware, data locality+Faster innovation using query interface: DCM (OSDI’20) --

scalable, flexible new algorithms+Debuggability: “Find hot spots: workers have CPU utilization

higher than X and temperature higher than Y?”

20

select * from Workerwhere CpuUtil > X and CpuTemp > Y;

Page 21: DBOS: It’s Time for a Principled Approach to Distributed

Case Study II: Serverless Task I/O

•No more hacky, high-overhead solutions - Rendezvous server, TCP hole punching, S3-based I/O

•Query task location, pass messages through DB+Easy to support more patterns: broadcast, aggregation,...+Easy to optimize: batching, parallelization,...

21

Page 22: DBOS: It’s Time for a Principled Approach to Distributed

Case Study III: File System

• FS stores both user data and metadata in a DBMS+Easy to implement various types of data stores (block vs.

object) by changing DB schemas+Adaptive to workload changes (block size, indexes)+Native support for efficient analytics: “find all files

belonging to user X and larger than Y bytes”

22

select FileName from Filewhere UserName = X and FileSize > Y;

Page 23: DBOS: It’s Time for a Principled Approach to Distributed

Proof of Concept• A simple FS using VoltDB• Store all data (UserName, FileName, UserData,...<metadata>) in

a single File table• Two synthetic benchmarks: create and read 1KB files

insert into File values (user, fileName, userData,...);

select UserData from Filewhere UserName=user and FileName=file;

23

Page 24: DBOS: It’s Time for a Principled Approach to Distributed

Proof of Concept• Partition File table across 40 parallel partitions on 2 servers

0 1 2Throughput (⇥106 TPS)

101

102

103

104

Late

ncy

(us) Median

P99

Create files

0 1 2Throughput (⇥106 TPS)

101

102

103

104

Late

ncy

(us) Median

P99

Read 1KB files

Takeaway: VoltDB delivers sub-millisecond latency and sustain 1M+ operations/second per server

24

Page 25: DBOS: It’s Time for a Principled Approach to Distributed

The Time for Revolution is Now• High performance datacenter hardware• Database systems are finally ready -- NewSQL• Low latency, high throughput, distributed, scalable, in-memory

transactional DBMSs• Any distributed DBMS that supports:

• ACID transactions, rigorous query semantics (e.g., SQL)• Scalability: tables partitioned across nodes• Low latency: data mostly be memory-resident

• Examples• VoltDB -- millions of transactions/sec at low latency• SingleStore(MemSQL), ClustrixDB, NuoDB, CockroachDB, ...

25

Page 26: DBOS: It’s Time for a Principled Approach to Distributed

The Time for Revolution is Now• Limitations• Limited support for heterogenous storage formats• Multi-tenant interference on shared resources

• Active research areas: we are optimistic J• Polystores: uniform interface over diverse formats• Performance isolation, admission control

26

Page 27: DBOS: It’s Time for a Principled Approach to Distributed

Research Directions

• Starting to build DBOS• Many open questions

•DBOS will enable:• Security and privacy• Data provenance (GDPR compliance)• Self-adaptivity using ML/RL

27

Page 28: DBOS: It’s Time for a Principled Approach to Distributed

Conclusion

•Distributed systems are hard to build, maintain, extend, and scale• An extreme solution: manage all state centrally in a

distributed DBMS• Can be practical with today’s distributed DBMSs

Questions?Read more: https://arxiv.org/abs/2007.11112

28

DBOS: It’s Time for a Principled Approach to Distributed Systems State!