
Page 1: Staged Database Systems

@Carnegie Mellon Databases

Staged Database Systems

Thesis Oral

Stavros Harizopoulos

Page 2: Staged Database Systems

Database world: a 30,000 ft view

[Diagram: internet users reach the DBMS, which offloads data to a separate system]

OLTP: Online Transaction Processing: many short-lived requests (Sarah: “Buy this book”)

DSS: Decision Support Systems: few long-running queries (Jeff: “Which store needs more advertising?”)

DB systems fuel most e-applications

Improved performance has an impact on everyday life

Page 3: Staged Database Systems

New HW/SW requirements

• More capacity, throughput efficiency

• CPUs run much faster than they can access data

[Diagram: CPU-to-memory access cost has grown from roughly 1 cycle in the ’80s to 10-300 cycles today; DSS workloads also stress the I/O subsystem]

Need to optimize all levels of memory hierarchy

Page 4: Staged Database Systems


The further, the slower

• Keep data close to CPU

• Locality and predictability are key

DBMS core design contradicts above goals

Overlap memory accesses with computation

Modify algorithms and structures to exhibit more locality

Page 5: Staged Database Systems


Thread-based execution in DBMS

• Queries are handled by a pool of threads

• Threads execute independently

• No means to exploit common operations

[Diagram: in a conventional DBMS, pool threads execute with no coordination; StagedDB introduces coordination across them]

New design to expose locality across threads

Page 6: Staged Database Systems


Staged Database Systems

• Organize system components into stages

• No need to change algorithms / structures

[Diagram: in StagedDB, queries flow through Stage 1, Stage 2, and Stage 3, instead of directly into a monolithic DBMS]

High concurrency and locality across requests
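The staged idea above can be sketched in a few lines. This is a hypothetical toy, not code from the thesis: each stage owns its own queue, and draining that queue in one batch is what lets many requests reuse the same operator code while it is cache-resident.

```python
from collections import deque

class Stage:
    """A self-contained stage: its own queue plus the operator it runs.

    Hypothetical sketch: processing all queued requests in one batch keeps
    the stage's code and data hot in cache across requests.
    """
    def __init__(self, name, op):
        self.name = name
        self.op = op
        self.queue = deque()

    def enqueue(self, request):
        self.queue.append(request)

    def run_batch(self):
        # Drain the queue in one pass: every request reuses the same
        # operator code while it is cache-resident.
        results = []
        while self.queue:
            results.append(self.op(self.queue.popleft()))
        return results

# A toy two-stage pipeline: parse, then execute.
parse = Stage("parse", lambda q: q.upper())
execute = Stage("execute", lambda q: q + "!")

for query in ["select", "update"]:
    parse.enqueue(query)
for parsed in parse.run_batch():    # batch 1: parse both queries together
    execute.enqueue(parsed)
print(execute.run_batch())          # batch 2: execute both together
```

Note that no algorithm inside a stage changes; only the scheduling of requests through the stages does, which is the point of the design.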

Page 7: Staged Database Systems


Thesis

“By organizing and assigning system components into self-contained stages, database systems can exploit instruction and data commonality across concurrent requests, thereby improving performance.”

Page 8: Staged Database Systems

Summary of main results

• STEPS: 56%-96% fewer I-cache misses; full-system evaluation on Shore

• QPipe: 1.2x-2x throughput; full-system evaluation on BerkeleyDB

[Diagram: memory hierarchy, from the L1 I- and D-caches through L2-L3, RAM, and disks]

Page 9: Staged Database Systems

Contributions and dissemination

• Introduced StagedDB design; scheduling algorithms for staged systems

• Built novel query engine design; QPipe engine maximizes data and work sharing

• Addressed instruction cache in OLTP; STEPS applies to any DBMS with few changes

Venues: CIDR’03, CMU-TR’02, IEEE Data Eng. ’05, VLDB’04, SIGMOD’05, ICDE’06 demo subm., CMU-TR’05, HDMS’05, VLDB J. subm., TODS subm.

Page 10: Staged Database Systems

Outline

• Introduction

• QPipe

• STEPS

• Conclusions

Page 11: Staged Database Systems


Query-centric design of DB engines

• Queries are evaluated independently

• No means to share across queries

• Need new design to exploit common data, instructions, and work across operators

Page 12: Staged Database Systems

QPipe: operator-centric engine

• Conventional: “one query, many operators”

• QPipe: “one operator, many queries”

• Relational operators become Engines

• Queries break up into tasks and queue up

[Diagram: conventional runtime vs. QPipe, where each operator has its own queue]
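The operator-centric routing above can be made concrete with a small sketch. The `Packet` and `Dispatcher` names are hypothetical, chosen for illustration: each query's plan is broken into per-operator packets, and all packets for one operator type land in one queue, regardless of which query they came from.

```python
from collections import defaultdict, deque

class Packet:
    """Hypothetical unit of work: one operator from some query's plan."""
    def __init__(self, query_id, op_type, payload):
        self.query_id = query_id
        self.op_type = op_type      # e.g. "scan", "join", "aggregate"
        self.payload = payload

class Dispatcher:
    """Routes each packet to the queue of the engine for its operator type."""
    def __init__(self):
        self.engines = defaultdict(deque)   # op_type -> queue of packets

    def dispatch(self, packet):
        self.engines[packet.op_type].append(packet)

    def queued(self, op_type):
        # All queries' packets for one operator sit in one queue, so the
        # engine can spot overlapping work across queries.
        return [p.query_id for p in self.engines[op_type]]

d = Dispatcher()
# Two queries, each broken into per-operator packets.
d.dispatch(Packet(1, "scan", "LINEITEM"))
d.dispatch(Packet(1, "aggregate", "avg"))
d.dispatch(Packet(2, "scan", "LINEITEM"))
print(d.queued("scan"))   # both queries' scans share one engine queue
```

Because both scans of LINEITEM sit side by side in the scan engine's queue, the engine is in a position to detect and exploit the overlap, which a one-thread-per-query design never sees.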

Page 13: Staged Database Systems

QPipe design

[Diagram: in the conventional design, query plans are handled by a thread pool over the storage engine; in QPipe, a packet dispatcher routes packets to Engine-S, Engine-J, and Engine-A (scan, join, aggregate), each with its own queue and thread pool, reading and writing through the storage engine]

Page 14: Staged Database Systems


Reusing data & work in QPipe

• Detect overlap at run time

• Shared pages and intermediate results are simultaneously pipelined to parent nodes

[Diagram: the plans of Q1 and Q2 overlap in one operator; simultaneous pipelining lets both queries consume that operator's output]
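A minimal sketch of the run-time attachment described above (the `SPCoordinator` name is hypothetical): one in-progress operator feeds every query currently attached to it, and a query that attaches mid-production only sees results from that point on.

```python
class SPCoordinator:
    """Hypothetical simultaneous-pipelining coordinator: one in-progress
    operator pipelines each result to every attached query."""
    def __init__(self):
        self.consumers = {}          # query_id -> list of received results

    def attach(self, query_id):
        # A newly arrived query with an overlapping operator hooks in here;
        # it receives only results produced from this point on.
        self.consumers[query_id] = []

    def produce(self, result):
        # One production step is pipelined to all attached queries at once.
        for received in self.consumers.values():
            received.append(result)

coord = SPCoordinator()
coord.attach("Q1")
coord.produce("page-1")          # only Q1 is attached yet
coord.attach("Q2")               # Q2 arrives mid-scan and attaches
coord.produce("page-2")          # now both queries share each page
print(coord.consumers["Q1"])
print(coord.consumers["Q2"])
```

In this toy, Q2 misses "page-1"; handling what a late query missed (e.g. wrapping around an order-insensitive scan) is the part of the real mechanism not modeled here.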

Page 15: Staged Database Systems

Mechanisms for sharing

• Multi-query optimization: requires workload knowledge, not used in practice

• Materialized views: requires workload knowledge

• Buffer pool management: opportunistic

• Shared scans (RedBrick, Teradata, SQL Server): limited use

QPipe complements above approaches

Page 16: Staged Database Systems

Experimental setup

• QPipe prototype: built on top of BerkeleyDB, 7,000 lines of C++; shared-memory buffers, native OS threads

• Platform: 2GHz Pentium 4, 2GB RAM, 4 SCSI disks

• Benchmarks: TPC-H (4GB)

Page 17: Staged Database Systems

Sharing order-sensitive scans

• Two clients send TPC-H Query 4 at different intervals

• QPipe performs 2 separate joins

[Diagram: both plans run index scans of ORDERS and LINEITEM into a merge-join (M-J), with sort (S) and aggregate (A) above; the scans are order-sensitive, so the two merge-joins stay separate, whereas order-insensitive scans could be combined into one]

Page 18: Staged Database Systems

Sharing order-sensitive scans

• Two clients send query at different intervals; QPipe performs 2 separate joins

[Chart: total response time (sec), 0-300, vs. time difference between arrivals (0-140 sec), for Baseline and QPipe w/SP]

Page 19: Staged Database Systems

TPC-H workload

• Clients use a pool of 8 TPC-H queries

• QPipe reuses large scans, runs up to 2x faster

• ...while maintaining low response times

[Chart: throughput (queries/hr), 0-80, vs. number of clients (0-12), for QPipe w/SP, DBMS X, and Baseline]

Page 20: Staged Database Systems

QPipe: conclusions

• DB engines evaluate queries independently

• Limited existing mechanisms for sharing

• QPipe requires few code changes

• SP is a simple yet powerful technique

• Allows dynamic sharing of data and work

• Other benefits (not described here): I-cache and D-cache performance; efficient execution of MQO plans

Page 21: Staged Database Systems

Outline

• Introduction

• QPipe

• STEPS

• Conclusions

Page 22: Staged Database Systems

Online Transaction Processing

• High-end servers, non I/O bound

• L1-I stalls are 20-40% of execution time

• Instruction caches cannot grow

[Chart: cache size (10KB to 10MB) vs. year introduced (’96-’04): L1-I sizes for various CPUs stay flat while the max on-chip L2/L3 cache keeps growing]

Need solution for instruction cache-residency

Page 23: Staged Database Systems

Related work

• Hardware and compiler approaches

  • Increased block size, stream buffer [Ranganathan98]

  • Code layout optimizations [Ramirez01]

• Database software approaches

  • Instruction cache for DSS [Padmanabhan01][Zhou04]

  • Instruction cache for OLTP: challenging!

Page 24: Staged Database Systems

STEPS for cache-resident code

STEPS: Synchronized Transactions through Explicit Processor Scheduling

• Microbenchmark: eliminates 96% of L1-I misses

• TPC-C: eliminates 2/3 of misses, 1.4 speedup

[Diagram: a transaction (Begin, Select, Update, Insert, Delete, Commit); STEPS keeps the thread model and inserts synchronization points; each operation is still larger than the I-cache, so execution is multiplexed to reuse instructions]

Page 25: Staged Database Systems

I-cache aware context-switching

[Diagram: select() is split into segments s1-s7; each segment fits in the I-cache and ends at a context-switch (CTX) point. Without STEPS, thread 1 runs all of select() (all misses) and then thread 2 runs it again (all misses again). With STEPS, thread 1 runs one segment (misses), context-switches, and thread 2 runs the same segment while its instructions are still cached (hits)]
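The miss pattern on this slide can be reproduced with a toy trace model. This is a hypothetical simulation, not the thesis's simulator: the "cache" holds a few code segments with LRU eviction, and we compare running each thread to completion against STEPS-style multiplexing one segment at a time.

```python
def run(schedule, cache_size):
    """Count I-cache misses for a sequence of (thread, segment) steps.

    Toy model (assumption): the cache holds the last `cache_size`
    distinct code segments, evicted in LRU order.
    """
    cache, misses = [], 0
    for _, segment in schedule:
        if segment in cache:
            cache.remove(segment)       # refresh LRU position
        else:
            misses += 1
            if len(cache) == cache_size:
                cache.pop(0)            # evict least recently used
        cache.append(segment)
    return misses

segments = ["s%d" % i for i in range(1, 8)]   # select() split into s1..s7
threads = [1, 2]

# No STEPS: each thread runs the whole code path before the next starts.
plain = [(t, s) for t in threads for s in segments]
# STEPS: every thread runs one cache-resident segment before moving on.
steps = [(t, s) for s in segments for t in threads]

cache_size = 4   # the cache holds fewer segments than the full code path
print(run(plain, cache_size), run(steps, cache_size))   # prints: 14 7
```

With the plain schedule, every segment misses for both threads (14 misses); with the STEPS schedule, only the first thread misses on each segment and the second thread hits (7 misses), mirroring the M/H pattern on the slide.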

Page 26: Staged Database Systems

Placing CTX calls in source

AutoSTEPS tool: run the DBMS binary under valgrind to collect the trace of instruction memory references; feed the trace to a STEPS simulation, which selects the memory addresses for CTX points; then map those addresses back through gdb to the source lines (e.g. file1.c:30, file2.c:40) where CTX calls are inserted.

Evaluation:

• Comparable performance to manual placement

• ...while being more conservative

Page 27: Staged Database Systems

Experimental setup (1st part)

• Implemented on top of Shore

• AMD AthlonXP: 64KB L1-I + 64KB L1-D, 256KB L2

• Microbenchmark: index fetch, in-memory index

• Fast CTX for both systems, warm cache

Page 28: Staged Database Systems

Microbenchmark: L1-I misses

STEPS eliminates 92-96% of misses for additional threads

[Chart (AthlonXP): L1-I cache misses (1K-4K) vs. concurrent threads (1-10), for Shore and Shore w/STEPS]

Page 29: Staged Database Systems

L1-I misses & speedup

STEPS achieves max performance for 6-10 threads

• No need for larger thread groups

[Charts (AthlonXP): L1-I miss reduction (40-100%, with an upper limit line) and speedup (1.1-1.4) vs. concurrent threads (10-80)]

Page 30: Staged Database Systems

Challenges in full-system operation

So far:

• Threads are interested in same Op

• Uninterrupted flow

• No thread scheduler

Full-system requirements:

• High concurrency on similar Ops

• Handle exceptions: disk I/O, locks, latches, abort

• Co-exist with system threads: deadlock detection, buffer pool housekeeping

Page 31: Staged Database Systems

System design

• Fast CTX through fixed scheduling

• Repair thread structures at exceptions

• Modify only the thread package

[Diagram: per-operation STEPS wrappers (Op X, Op Y, Op Z); each wrapper runs an execution team, and a stray thread leaves its team for another Op]
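The team-plus-stray mechanism can be sketched as follows. The `ExecutionTeam` class and its methods are hypothetical names for illustration: threads on the same operation are switched in a fixed round-robin, and a thread that hits an exception (disk I/O, lock wait) drops out of the team as a stray rather than stalling everyone.

```python
class ExecutionTeam:
    """Hypothetical STEPS-style team: threads executing the same operation
    are context-switched in a fixed round-robin; a thread that blocks
    (I/O, lock, abort) becomes a stray and leaves the team."""
    def __init__(self, op, thread_ids):
        self.op = op
        self.team = list(thread_ids)
        self.strays = []

    def run_segment(self, blocks=()):
        # One CTX round: every team member executes the current
        # cache-resident code segment in turn.
        for tid in list(self.team):
            if tid in blocks:            # e.g. a page miss forces disk I/O
                self.team.remove(tid)
                self.strays.append(tid)  # repaired and regrouped later

team = ExecutionTeam("index_fetch", [1, 2, 3, 4])
team.run_segment()                # all four threads reuse each segment
team.run_segment(blocks={3})      # thread 3 blocks on I/O, goes stray
print(team.team, team.strays)
```

The fixed schedule is what keeps context switches cheap; the stray path is the "repair thread structures at exceptions" bullet, so system threads and blocked transactions can coexist with the teams.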

Page 32: Staged Database Systems

Experimental setup (2nd part)

• AMD AthlonXP: 64KB L1-I + 64KB L1-D, 256KB L2

• TPC-C (wholesale parts supplier): 2GB RAM, 2 disks

• 10-30 warehouses (1-3GB), 100-300 users

• Zero think time, in-memory, lazy commits

Page 33: Staged Database Systems

One transaction: payment

• STEPS outperforms baseline system

• 1.4 speedup, 65% fewer L1-I misses

[Chart: normalized count (20-100%) of cycles and L1-I misses for 100, 200, and 300 users]

Page 34: Staged Database Systems

Mix of four transactions

• Xaction mix reduces team size

• Still, 56% fewer L1-I misses

[Chart: normalized count (20-100%) of cycles and L1-I misses for 100 and 200 users]

Page 35: Staged Database Systems

STEPS: conclusions

• STEPS can handle full OLTP workloads

• Significant improvements in TPC-C: 65% fewer L1-I misses, 1.2-1.4 speedup

STEPS minimizes both capacity and conflict misses without increasing I-cache size or associativity

Page 36: Staged Database Systems

StagedDB: future work

• Promising platform for chip multiprocessors

  • DBMSs suffer from CPU-to-CPU cache misses

  • StagedDB allows work to follow data, not the other way around!

• Resource scheduling

  • Stages cluster requests for DB locks, I/O

  • Potential for deeper, more effective scheduling

Page 37: Staged Database Systems

Conclusions

• New hardware, new requirements

• Server core design remains the same

• Need new design to fit modern hardware

StagedDB: optimizes all memory hierarchy levels

A promising design for future installations

Page 38: Staged Database Systems

The speaker would like to thank:

his academic advisor, Anastassia Ailamaki;

his thesis committee members, Panos K. Chrysanthis, Christos Faloutsos, Todd C. Mowry, and Michael Stonebraker;

and his coauthors, Kun Gao, Vladislav Shkapenyuk, and Ryan Williams.

Thank you

Page 39: Staged Database Systems


QPipe backup

Page 40: Staged Database Systems

An Engine in detail

• Tuple batching: I-cache

• Query grouping: I- & D-cache

References on the slide: Harizopoulos04 (VLDB), Zhou03 (VLDB), Padmanabhan01 (ICDE), Zhou04 (SIGMOD)

[Diagram: an Engine with its queue, parameters, main routine, and relational operator code; a scheduling thread manages free and busy threads, and simultaneous pipelining hooks into the operator]

Page 41: Staged Database Systems

Simultaneous Pipelining in QPipe

[Diagram, without SP vs. with SP: without SP, Q1's join writes its results and, once COMPLETE, Q2 copies and reads them; with SP, Q2 attaches (1) to the in-progress join, whose SP coordinator copies what was already produced (2, 3) and pipelines (4) new join results to both Q1 and Q2]

Page 42: Staged Database Systems

Sharing data & work across queries

Query 1: “Find average age of students enrolled in both class A and class B” [plan: scans of TABLE A and TABLE B feed a merge-join, then an aggregate]

Query 2: max over a scan of TABLE A (data sharing opportunity with Query 1's scan)

Query 3: min over the same merge-join of TABLE A and TABLE B (work sharing opportunity with Query 1's join)

Page 43: Staged Database Systems

Sharing opportunities at run time

• Q1 executes operator R

• Q2 arrives with R in its plan

[Diagram: the sharing potential spans the window where result production for R in Q1 overlaps result production for R in Q2; without SP, Q1 writes R's results and Q2 reads them; with SP, an SP coordinator pipelines R's output so Q2 reads along with Q1]

Page 44: Staged Database Systems

TPC-H workload

• Clients use a pool of 8 TPC-H queries

• QPipe reuses large scans, runs up to 2x faster

• ...while maintaining low response times

[Charts: throughput (queries/hr), 0-80, vs. number of clients (0-12), for QPipe w/SP, DBMS X, and Baseline; average response time (0-1200) vs. think time (0-240 sec), for Baseline and QPipe w/SP]

Page 45: Staged Database Systems


STEPS backup

Page 46: Staged Database Systems

Smaller L1-I cache

STEPS outperforms Shore even on smaller caches (PIII)

• 62-64% fewer mispredicted branches on both CPUs

[Chart (AthlonXP and Pentium III, 10 threads): normalized counts (20-120%) of cycles, L1-I misses, mispredicted branches, L1-D misses, branches, branches that missed the BTB, and instruction stalls (cycles); one bar reaches 209%]

Page 47: Staged Database Systems

SimFlex: L1-I misses

STEPS eliminates all capacity misses (16KB, 32KB caches)

• Up to 89% overall miss reduction (upper limit is 90%)

[Chart (AthlonXP, 64-byte cache blocks, 10 threads): L1-I cache misses (2K-10K) vs. associativity (direct, 2-way, 4-way, 8-way, full), for Shore and STEPS at 16KB, 32KB, and 64KB caches, each with its MIN line]

Page 48: Staged Database Systems

One Xaction: payment

STEPS outperforms Shore

• 1.4 speedup, 65% fewer L1-I misses

• 48% fewer mispredicted branches

[Chart: normalized counts (20-100%) of cycles, L1-I misses, L1-D misses, L2-I misses, L2-D misses, and mispredicted branches for 10, 20, and 30 warehouses]

Page 49: Staged Database Systems

Mix of four Xactions

• Xaction mix reduces average team size (4.3 in 10W)

• Still, STEPS has 56% fewer L1-I misses (out of 77% max)

[Chart: normalized counts of cycles, L1-I misses, L1-D misses, L2-I misses, L2-D misses, and mispredicted branches for 10 and 20 warehouses; two bars reach 121% and 125%]