
Source: NewSQL - Portland State University, datalab.cs.pdx.edu/education/clouddbms/notes/volt2011.pdf

NewSQL: Flying on ACID

David Maier

Thanks to H-Store folks, Mike Stonebraker, Fred Holahan

NewSQL

•  Keep SQL (some of it) and ACID
•  But be speedy and scalable

11/21/11 David Maier, Portland State University 2


Database Landscape

[Figure: database landscape map. From: the 451 Group]


OLTP Focus

•  On-Line Transaction Processing
•  Lots of small reads and updates
•  Many transactions no longer have a human intermediary
   For example, buying sports or show tickets
•  100K+ xact/sec, maybe millions
•  Horses for courses


Premises

•  If you want a fast multi-node DBMS, you need a fast single-node DBMS.

•  If you want a single-node DBMS to go 100x as fast, you need to execute 1/100 of the instructions.
   ▪  You won’t get there on clever disk I/O: most of the data is living in memory


Where Does the Time Go?


•  TPC-C CPU cycles
•  On Shore DBMS
•  Instruction counts have similar pattern


A Bit More Detail


Source: S. Harizopoulos, D. J. Abadi, S. Madden, M. Stonebraker, “OLTP Through the Looking Glass, and What We Found There”, SIGMOD 2008.

What are These Different Parts?

Buffer manager: Manages the slots that hold disk pages
   ▪  Locates pages by a hash table
   ▪  Employs an eviction strategy (clock scan, which approximates LRU)
   ▪  Coordinates with recovery system
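The hash-table lookup and clock scan described above can be sketched roughly as follows. This is a toy illustration, not Shore's code: the class name ClockBufferPool, the use of plain integers as page ids, and the omission of pinning, dirty pages and recovery coordination are all assumptions made to keep it short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy buffer pool (hypothetical): hash-table directory plus clock eviction. */
final class ClockBufferPool {
    private final int[] pageInSlot;      // page id held by each slot, -1 if empty
    private final boolean[] referenced;  // "recently used" bit per slot
    private final Map<Integer, Integer> directory = new HashMap<>(); // page id -> slot
    private int hand = 0;                // current clock-hand position

    ClockBufferPool(int slots) {
        pageInSlot = new int[slots];
        referenced = new boolean[slots];
        Arrays.fill(pageInSlot, -1);
    }

    /** Returns true on a hit; on a miss, loads the page, evicting a victim if needed. */
    boolean access(int pageId) {
        Integer slot = directory.get(pageId);
        if (slot != null) {
            referenced[slot] = true;     // give the page a second chance
            return true;
        }
        // Sweep the hand, clearing reference bits, until an unreferenced slot turns up.
        while (referenced[hand]) {
            referenced[hand] = false;
            hand = (hand + 1) % pageInSlot.length;
        }
        if (pageInSlot[hand] != -1) {
            directory.remove(pageInSlot[hand]); // evict the victim page
        }
        pageInSlot[hand] = pageId;
        referenced[hand] = true;
        directory.put(pageId, hand);
        hand = (hand + 1) % pageInSlot.length;
        return false;
    }

    boolean contains(int pageId) {
        return directory.containsKey(pageId);
    }

    public static void main(String[] args) {
        ClockBufferPool pool = new ClockBufferPool(2);
        pool.access(1);
        pool.access(2);
        pool.access(3); // pool is full, so one resident page is evicted
        System.out.println("page 3 cached: " + pool.contains(3));
    }
}
```

Every access pays for the directory lookup and the reference-bit update even when the page is already in memory, which is the kind of overhead the following slides set out to remove.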


Different Parts 2

Locks: Logical-level shared and exclusive claims to data items and index nodes
   ▪  Locks are typically held until the end of a transaction
   ▪  Lock manager must also manage deadlocks


Different Parts 3

Latches: Low-level locks on shared structures
   ▪  Free-space list
   ▪  Buffer-pool directory (hash table)
   ▪  Buffer “clock”
Also, “pinning” pages in the buffer pool


Different Parts 4

Logging: Undo and redo information in case of transaction, application or system failure
   ▪  Must be written to disk before corresponding page can be removed from buffer pool


Strategies to Reduce Cost

•  All data lives in main memory
•  Multi-copy for high-assurance
   Still need undo info (in memory) for rollback and disk-based information for recovery
•  No user interaction in transactions
•  Avoid run-time interpretation and planning
   Register all transactions in advance


Strategies, cont.

•  Serialize transactions
   Possible, since there aren’t waits for disk I/O or user input
•  Parallelize
   •  Between transactions
   •  Between parts of a single transaction
   •  Between primary and secondary copies


H-Store & VoltDB

•  H-Store is the academic project
   Brown/Yale/MIT
   http://hstore.cs.brown.edu/
•  VoltDB is the company
   Velocity OnLine Transactions
   http://community.voltdb.com/documentation
   Community and Enterprise editions


VoltDB Techniques

Data in main memory
   ▪  32-way cluster can have a terabyte of MM
   ▪  Don’t need a buffer manager
   ▪  No waiting for disk
   ▪  All in-use data generally resides in MM for OLTP systems anyway


VoltDB Techniques 2

Interact only via stored procedures
   ▪  No roundtrips to client during multi-query transactions
   ▪  No user waiting
   ▪  Can compile & optimize in advance
   ▪  (Might pre-analyze conflicts)
Need to structure applications carefully


Discussion Problem

Want to support on-line course registration
1.  Search for courses: number, time
2.  User gets list of matching courses
3.  User chooses a course
4.  Show enrollment status of course
5.  If not full, allow user to register
    Validate prerequisites


Tables

Offering(CRN, Course#, Days, Limit)

Registered(CRN, SID)

Student(SID, First, Last, Status)

Prereq(Course#, PCourse#, MinMark)

Transcript(SID, Course#, Grade)

Don’t over-enroll course
No user input in transaction
Don’t turn student away if you’ve shown space in the course
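One possible way to meet the three constraints above (not necessarily the answer intended in the lecture) is to split the interaction into short transactions with the user acting between them: showing the enrollment status also places a hold on a seat, and a later transaction either confirms or releases it. The sketch below models that with in-memory maps. The class CourseRegistration, the hold idea, and all method names are assumptions; prerequisite validation against Prereq and Transcript is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical seat-hold scheme for the course registration problem. */
final class CourseRegistration {
    private final Map<Integer, Integer> limit = new HashMap<>();           // CRN -> Limit
    private final Map<Integer, Set<Integer>> registered = new HashMap<>(); // CRN -> registered SIDs
    private final Map<Integer, Set<Integer>> held = new HashMap<>();       // CRN -> SIDs holding a seat

    void addOffering(int crn, int seatLimit) {
        limit.put(crn, seatLimit);
        registered.put(crn, new HashSet<>());
        held.put(crn, new HashSet<>());
    }

    /** Transaction A: report whether there is space and, if so, hold a seat for this student. */
    boolean showStatusAndHold(int crn, int sid) {
        if (held.get(crn).contains(sid)) {
            return true; // this student already has a hold
        }
        int taken = registered.get(crn).size() + held.get(crn).size();
        if (taken >= limit.get(crn)) {
            return false; // full: holds count against the limit, so no over-enrollment
        }
        held.get(crn).add(sid);
        return true;
    }

    /** Transaction B: turn a hold into a registration; fails if there is no hold. */
    boolean confirm(int crn, int sid) {
        if (!held.get(crn).remove(sid)) {
            return false;
        }
        registered.get(crn).add(sid);
        return true;
    }

    /** Transaction B': the student walked away, so free the seat. */
    void release(int crn, int sid) {
        held.get(crn).remove(sid);
    }

    int enrolled(int crn) {
        return registered.get(crn).size();
    }

    public static void main(String[] args) {
        CourseRegistration reg = new CourseRegistration();
        reg.addOffering(101, 1);
        System.out.println(reg.showStatusAndHold(101, 1)); // space shown, seat held
        System.out.println(reg.showStatusAndHold(101, 2)); // no space shown to the second student
    }
}
```

No transaction waits on user input, and a student who was shown space is never turned away, at the cost of needing some policy (not shown) for expiring abandoned holds.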


VoltDB Techniques 3

Serial execution of transactions
   ▪  Avoids locking and latching
   ▪  Avoids thread or process switches
   ▪  Avoids some logging

Still need undo buffer for rollback
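A rough sketch of what serial execution with an undo buffer could look like for one partition: transactions run one at a time to completion, each write records how to reverse itself, and an abort replays those records in reverse order. The class SerialPartition and its key-value data model are assumptions for illustration, not VoltDB internals.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical single-threaded partition: no locks or latches, only an undo buffer. */
final class SerialPartition {
    private final Map<String, Integer> data = new HashMap<>();
    private final Deque<Runnable> undo = new ArrayDeque<>(); // most recent write on top

    /** Write a value and remember how to reverse the write. */
    void put(String key, int value) {
        final Integer old = data.put(key, value); // null if the key was absent
        undo.push(() -> {
            if (old == null) {
                data.remove(key);
            } else {
                data.put(key, old);
            }
        });
    }

    Integer get(String key) {
        return data.get(key);
    }

    /** Run one transaction to completion; roll back if it throws. Returns true on commit. */
    boolean execute(Consumer<SerialPartition> txn) {
        undo.clear();
        try {
            txn.accept(this);
            undo.clear();              // commit: undo records are no longer needed
            return true;
        } catch (RuntimeException abort) {
            while (!undo.isEmpty()) {
                undo.pop().run();      // reverse the writes, newest first
            }
            return false;
        }
    }

    public static void main(String[] args) {
        SerialPartition p = new SerialPartition();
        p.execute(t -> t.put("seats", 10));
        p.execute(t -> {
            t.put("seats", 9);
            throw new IllegalStateException("abort");
        });
        System.out.println("seats after aborted txn: " + p.get("seats"));
    }
}
```

Because only one transaction is active at a time, the undo buffer never has to separate one transaction's writes from another's.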

11/22/11 David Maier, Portland State University 19

VoltDB Techniques 4

Multiple copies for high availability
   ▪  Can specify k-factor for redundancy: can tolerate up to k node failures
   ▪  For complete durability:
      ◆  Snapshot of DB state to disk
      ◆  Log commands to disk


VoltDB Techniques 5

Shared-nothing parallelism: tables can be partitioned (or replicated) and spread across multiple sites.
   ▪  Each site has its own execution engine and data structures
   ▪  No latching of shared structures
   ▪  Does incur some latency on multi-partition transactions
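To make the partitioning idea concrete, here is a minimal sketch of routing by a partitioning-column value. The slides do not say which hash function VoltDB uses, so Java's hashCode stands in for it here; the class HashPartitioner and its methods are hypothetical.

```java
/** Hypothetical router: maps a partitioning-column value to a partition number. */
final class HashPartitioner {
    private final int partitionCount;

    HashPartitioner(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("need at least one partition");
        }
        this.partitionCount = partitionCount;
    }

    /** Partition number in [0, partitionCount); hashCode is a stand-in hash (assumption). */
    int partitionFor(Object partitioningValue) {
        return Math.floorMod(partitioningValue.hashCode(), partitionCount);
    }

    /** True if all values route to one partition, so a single site can run the transaction. */
    boolean singlePartition(Object... values) {
        for (Object v : values) {
            if (partitionFor(v) != partitionFor(values[0])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HashPartitioner router = new HashPartitioner(5);
        int crn = 40213; // made-up course reference number
        // Rows of two tables that share a partitioning value land on the same partition.
        System.out.println(router.partitionFor(crn) == router.partitionFor(crn));
    }
}
```

A transaction whose rows all share one partitioning value stays on one site; one that touches values on different partitions needs coordination, which is where the multi-partition latency comes from.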


Can have partitions of several tables at each site

[Figure: schema tree (WAREHOUSE, DISTRICT, CUSTOMER, ORDERS, ORDER_ITEM, STOCK) split into partitions P1 to P5, with the ITEM table replicated in every partition]


Data Placement
   ▪  Assign partitions to sites on nodes.

[Figure: partitions P1 to P5, each with a copy of ITEM, assigned to sites on cluster nodes Node 1 through Node n; each node is drawn with Core1 and Core2, each with hyperthreads HT1 and HT2. Slide dated October 26, 2009.]

Results

•  45X conventional RDBMS
•  7X Cassandra on key-value workload
•  Has been scaled to 3.3M (simple) transactions per second


What VoltDB Isn’t Doing

•  Reducing latency: aim is increased throughput
   Might take a while to get results back
•  All of SQL (e.g., no NOT in WHERE)
•  Big aggregates
•  Dynamic DDL
•  Ad hoc queries (possible, not fast)


System Structure

Hosts (nodes) each with several sites (< #cores)

Each site has data (partitions), indexes, views, stored procedures

Client can connect to any host
   Encouraged for load balancing and availability
   Also, request queue per host


In Operation

1.  Client invokes stored procedures with parameters
2.  Sent to some host
3.  Rerouted to site with correct partition
4.  SPs execute serially (need coordinator if more than one partition)
5.  Partition forwards queries to redundant copies and waits
6.  [Rollback if aborted]
7.  Results come back in VoltTable (array)


Setting up a Database

•  Schema definition
   •  Tables (strings are stored out of line)
   •  Indexes, views
•  Select partitioning column (or replicate)
   •  Can be different for different tables
   •  Needn’t be a key
   •  But may want same column to keep transactions in one partition: Use CRN for Offering and Registered


Setting up a Database 2

•  Stored procedures
   •  In Java and a subset of SQL (some limits)
   •  SQL can contain ‘?’ for parameters
   •  Must be deterministic (don’t read system clock or do network I/O)
   •  Can submit groups of SQL statements
   •  Can declare that procedure runs in a single partition (fastest)
      Multi-partition, multi-round can have waits and network delays

Stored Procedure Example

package fadvisor.procedures;
import org.voltdb.*;

@ProcInfo(
    singlePartition = true,
    partitionInfo = "Reservation.FlightID: 0"
)
public class HowManySeats extends VoltProcedure {

    public final SQLStmt GetSeatCount = new SQLStmt(
        "SELECT NumOfSeats, COUNT(ReserveID) " +
        "FROM Flight AS F, Reservation AS R " +
        "WHERE F.FlightID=R.FlightID AND R.FlightID=?;");


Stored Procedure Example cont.

    public long run(int flightid)
        throws VoltAbortException {

        long numofseats;
        long seatsinuse;
        VoltTable[] queryresults;

        voltQueueSQL(GetSeatCount, flightid);
        queryresults = voltExecuteSQL();

        VoltTable result = queryresults[0];
        if (result.getRowCount() < 1) { return -1; }
        numofseats = result.fetchRow(0).getLong(0);
        seatsinuse = result.fetchRow(0).getLong(1);

        numofseats = numofseats - seatsinuse;
        return numofseats; // Return available seats
    }
}

Setting Up a Database 3

•  Compile stored procedures and client apps
•  Set up a Project Definition File
   •  Schema
   •  Stored Procedures
   •  Partitioning
   •  Groups & permissions


Project Definition File

<?xml version="1.0" ?>
<project>
    <database name="database">
        <schemas>
            <schema path="flight.ddl" />
        </schemas>
        <procedures>
            <procedure class="procedures.LookupFlight"/>
            <procedure class="procedures.HowManySeats"/>
            <procedure class="procedures.MakeReservation"/>
            <procedure class="procedures.CancelReservation"/>
            <procedure class="procedures.RemoveFlight"/>
        </procedures>
        <partitions>
            <partition table="Reservation" column="FlightID"/>
            <partition table="Customer" column="CustomerID"/>
        </partitions>
    </database>
</project>

Starting a Database

•  Need a configuration file

<?xml version="1.0"?>
<deployment>
    <cluster hostcount="16"
             sitesperhost="6"
             kfactor="2"
    />
</deployment>

•  Ask a “lead node” to start VoltDB
   Lead becomes a peer after start up
•  Start client apps
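As a worked example of how the three configuration values interact, the sketch below computes the number of distinct partitions, assuming each site hosts one partition copy and that a k-factor of k means k + 1 copies of every partition. The class and method names are made up for illustration, and rejecting uneven divisions is a simplification of this sketch, not a documented rule.

```java
/** Hypothetical helper: how many distinct partitions a deployment yields. */
final class ClusterLayout {
    /** Distinct partitions = total sites / copies per partition (kFactor + 1 copies each). */
    static int distinctPartitions(int hostCount, int sitesPerHost, int kFactor) {
        int totalSites = hostCount * sitesPerHost;
        int copiesPerPartition = kFactor + 1;
        if (totalSites % copiesPerPartition != 0) {
            // Simplification of this sketch: refuse layouts that do not divide evenly.
            throw new IllegalArgumentException("sites do not divide evenly into copies");
        }
        return totalSites / copiesPerPartition;
    }

    public static void main(String[] args) {
        // Values from the configuration file above: 16 hosts, 6 sites per host, k-factor 2.
        System.out.println(distinctPartitions(16, 6, 2)); // 96 sites / 3 copies = 32
    }
}
```

So raising the k-factor buys fault tolerance by spending sites that could otherwise hold additional partitions.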


From the Client Side

•  Connect to DB
•  Call stored procedures

VoltTable[] results;
try {
    results = client.callProcedure("LookupFlight",
                                   origin,
                                   dest,
                                   departtime).getResults();
} catch (Exception e) {
    e.printStackTrace();
    System.exit(-1);
}

•  Can also be asynch. with callback

What Can You Change?

•  Can add or modify stored procedures while DB is running
   Need to coordinate change with client apps
•  Add columns, tables
   Need to snapshot DB, stop, restart, restore
•  Add nodes, change partitions
   Same drill


High Availability

•  If a site is unavailable, use a redundant copy
•  A node can rejoin a cluster, rebuild the partitions it has
   Partition being copied is locked for duration
•  Can specify that, on a cluster split, only the larger group keeps running


Snapshots

Can write a consistent snapshot of the database to disk
•  Manual or on a schedule
•  Each node stores a file locally
•  Transaction consistent: will maintain multiple versions of data temporarily
•  Can restore with changes
   •  New column
   •  Different partitioning


Command Logging

Can log commands to disk, then play back from last snapshot
•  Don’t need to log SELECTs
•  Can be synchronous, will delay client responses
•  Snapshot + synchronous command logging shouldn’t lose anything
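The recovery idea above, restore the snapshot and then replay the logged commands, can be sketched as follows. The log holds procedure invocations (name plus parameters), not changed data, so replay reproduces the lost state only because the procedures are deterministic. The class CommandLogSketch and the Deposit and Withdraw procedures are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical snapshot-plus-command-log recovery. */
final class CommandLogSketch {
    /** One log entry: which procedure ran and with what parameters. */
    static final class Command {
        final String procedure;
        final String account;
        final int amount;

        Command(String procedure, String account, int amount) {
            this.procedure = procedure;
            this.account = account;
            this.amount = amount;
        }
    }

    /** Deterministic "stored procedures": same state and inputs always give the same result. */
    static void apply(Map<String, Integer> db, Command c) {
        if (c.procedure.equals("Deposit")) {
            db.merge(c.account, c.amount, Integer::sum);
        } else if (c.procedure.equals("Withdraw")) {
            db.merge(c.account, -c.amount, Integer::sum);
        } else {
            throw new IllegalArgumentException("unknown procedure " + c.procedure);
        }
    }

    /** Rebuild the database: start from the snapshot, then replay every command logged since. */
    static Map<String, Integer> recover(Map<String, Integer> snapshot, List<Command> logSinceSnapshot) {
        Map<String, Integer> db = new HashMap<>(snapshot);
        for (Command c : logSinceSnapshot) {
            apply(db, c);
        }
        return db;
    }

    public static void main(String[] args) {
        Map<String, Integer> snapshot = new HashMap<>();
        snapshot.put("alice", 10);
        List<Command> log = Arrays.asList(
                new Command("Deposit", "alice", 5),
                new Command("Withdraw", "alice", 3));
        System.out.println(recover(snapshot, log).get("alice"));
    }
}
```

A procedure that read the system clock could take a different path on replay, which is one way to see why stored procedures are required to be deterministic.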


Views

•  Views are materialized
•  Must have group-by and return all grouping columns
•  Aggregates are COUNT and SUM (??)
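A plausible reason for these restrictions, offered here as an assumption rather than something the slides state, is that a grouped COUNT or SUM can be kept current from the inserted or deleted row alone, without rescanning the base table. The sketch below maintains such a view incrementally; the class CountSumView and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical incrementally maintained view: COUNT(*) and SUM(value) per group. */
final class CountSumView {
    private final Map<String, long[]> groups = new HashMap<>(); // group key -> {COUNT, SUM}

    /** Called inside the transaction that inserts a base-table row. */
    void onInsert(String groupKey, long value) {
        long[] agg = groups.computeIfAbsent(groupKey, k -> new long[2]);
        agg[0] += 1;
        agg[1] += value;
    }

    /** Called inside the transaction that deletes a base-table row. */
    void onDelete(String groupKey, long value) {
        long[] agg = groups.get(groupKey);
        if (agg == null) {
            throw new IllegalStateException("no rows in group " + groupKey);
        }
        agg[0] -= 1;
        agg[1] -= value;
        if (agg[0] == 0) {
            groups.remove(groupKey); // an empty group disappears from the view
        }
    }

    long count(String groupKey) {
        long[] agg = groups.get(groupKey);
        return agg == null ? 0 : agg[0];
    }

    long sum(String groupKey) {
        long[] agg = groups.get(groupKey);
        return agg == null ? 0 : agg[1];
    }

    public static void main(String[] args) {
        CountSumView view = new CountSumView();
        view.onInsert("flight-7", 2);
        view.onInsert("flight-7", 3);
        view.onDelete("flight-7", 2);
        System.out.println(view.count("flight-7") + " rows, sum " + view.sum("flight-7"));
    }
}
```

An aggregate such as MAX does not fit this pattern as neatly: deleting the current maximum would force a look back at the remaining rows of the group.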


Export

VoltDB can be the front end to a warehouse or map-reduce engine
Export-only tables
   ▪  Can only insert into them (but will undo)
   ▪  Contents are spooled to a Connector
   ▪  Export client polls the Connector
   ▪  Export data overflows to disk
Have an export client that uses Sqoop to populate HDFS


Languages

•  C#
•  C++
•  Erlang
•  Java
•  JDBC
•  JSON (HTTP from PHP, Python, Perl, C#)
•  PHP
•  Python
•  Ruby


Minimal Configuration

•  OS: 64-bit Linux
•  Dual-core, 64-bit proc. (4-8 cores better)
•  4 Gbytes memory minimum
•  Sun Java SDK 6
•  Network Time Protocol (NTP)


Ongoing Work

VoltDB uses 2-phase commit on multi-partition procedures

Considering speculative execution of transactions at sites waiting for commit/abort

Would require multi-transaction rollback
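For reference, a textbook two-phase commit round looks roughly like the sketch below: the coordinator collects a vote from every partition involved, then tells all of them to commit only if every vote was yes. The Participant interface and Site class are hypothetical, and failure handling (timeouts, coordinator crashes) is omitted; this is the generic protocol, not VoltDB's implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Generic two-phase commit sketch (hypothetical names, no failure handling). */
final class TwoPhaseCommitSketch {
    /** One participant per partition touched by the procedure. */
    interface Participant {
        boolean prepare();  // phase 1: vote yes (true) or no (false)
        void commit();      // phase 2 when every vote was yes
        void rollback();    // phase 2 otherwise
    }

    /** Toy participant that records its final state. */
    static final class Site implements Participant {
        private final boolean canCommit;
        String state = "ACTIVE";

        Site(boolean canCommit) {
            this.canCommit = canCommit;
        }

        public boolean prepare() { return canCommit; }
        public void commit() { state = "COMMITTED"; }
        public void rollback() { state = "ROLLED_BACK"; }
    }

    /** Runs both phases; returns true if the transaction committed everywhere. */
    static boolean run(List<? extends Participant> participants) {
        boolean allYes = true;
        for (Participant p : participants) {
            if (!p.prepare()) {      // a single "no" vote aborts the transaction
                allYes = false;
                break;
            }
        }
        for (Participant p : participants) {
            if (allYes) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allYes;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(new Site(true), new Site(true))));
        System.out.println(run(Arrays.asList(new Site(true), new Site(false))));
    }
}
```

Between its yes vote and the coordinator's decision a site sits idle, which is the window that speculative execution would try to fill; anything run in that window has to be undone along with the waiting transaction if the decision is abort.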
