From A to E: Analyzing TPC’s OLTP Benchmarks Pınar Tözün Ippokratis Pandis* Cansu Kaynak Djordje Jevdjic Anastasia Ailamaki École Polytechnique Fédérale de Lausanne *IBM Almaden Research Center The obsolete, the ubiquitous, the unknown


Post on 29-Mar-2015


TRANSCRIPT

Page 1:

From A to E: Analyzing TPC’s OLTP Benchmarks

Pınar Tözün, Ippokratis Pandis*, Cansu Kaynak, Djordje Jevdjic, Anastasia Ailamaki

École Polytechnique Fédérale de Lausanne
*IBM Almaden Research Center

The obsolete, the ubiquitous, the unknown

Page 2:

OLTP Benchmarks of TPC

• Allow fair product comparisons
• Drive innovations for better performance

[Figure: timeline from 1985 to 2015 showing when each benchmark was introduced.]

• TPC-A (1989), TPC-B (1990): Banking. Obsolete
• TPC-C (1992): Wholesale supplier. Ubiquitous, most common
• TPC-E (2007): Brokerage house. Unknown, results from one DBMS vendor

Page 3:

How is TPC-E different?

• Hardware: Micro-architectural behavior
– Under-utilization due to instruction stalls
– Fewer cache misses and higher IPC

• Storage Manager: Where does time go?
– Harder to partition requests
– Logical lock contention

• Workload: Characteristics/Statistics
– More page re-use
– Complex schema & transactions
– Longer held locks

Page 4:

Outline
• Preview
• Setup & Methodology
• Micro-architectural behavior
• Within the storage manager
• Conclusions

Page 5:

Experimental Setup

Server              Fat (Intel Xeon X5660)               Lean (Sun Niagara T2)
#Sockets            2                                    1
#Cores per Socket   6 (OoO)                              8 (in-order)
#HW Contexts        24                                   64
Clock Speed         2.80GHz                              1.40GHz
Memory              48GB                                 64GB
L3                  12MB (shared)                        –
L2                  256KB (per core)                     4MB (shared)
L1-D                32KB (per core)                      8KB (per core)
L1-I                32KB (per core)                      16KB (per core)
OS                  Ubuntu 10.04, Linux kernel 2.6.32    SunOS 5.10, Generic_141414-10

Page 6:

Methodology
• Shore-MT*
– Scalable open-source storage manager
• Shore-Kits*
– Application layer for Shore-MT
– Workloads: TPC-B, TPC-C, TPC-E, ++
• Micro-architectural
– Xeon X5660: VTune, Niagara T2: cputrack
– Measured at peak throughput
• Storage manager profiling
– Niagara T2: dtrace

*https://sites.google.com/site/shoremt


Page 8:

IPC on Fat & Lean Cores

[Figure: two bar charts of instructions per cycle (y-axis 0 to 4) for TPC-B, TPC-C, and TPC-E, one panel for the Intel Xeon X5660 and one for the Sun Niagara T2; each panel marks the maximum IPC.]

OLTP utilizes lean cores better
TPC-E has higher IPC
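The comparison above rests on two quantities: IPC itself and how close it comes to the core's peak. A minimal sketch of that arithmetic follows. The counter values are invented for illustration and are not measurements from the talk, and the peak values (4 for the fat core, 2 for the lean core) are assumptions about the "Maximum" marks in the charts.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions retired per cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def utilization(measured_ipc: float, peak_ipc: float) -> float:
    """Fraction of the core's peak issue rate that is actually used."""
    return measured_ipc / peak_ipc


# Hypothetical counter readings (not from the talk).
fat = ipc(instructions=800_000_000, cycles=1_000_000_000)
lean = ipc(instructions=600_000_000, cycles=1_000_000_000)

# Assumed peak IPC per core: 4 (fat, out-of-order), 2 (lean, in-order).
fat_util = utilization(fat, peak_ipc=4.0)
lean_util = utilization(lean, peak_ipc=2.0)
print(f"fat: {fat_util:.0%}, lean: {lean_util:.0%}")
```

With these made-up numbers the lean core has the lower absolute IPC but the higher utilization, which is the sense in which a workload can "utilize lean cores better".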

Page 9:

Execution Cycles and Stalls
Intel Xeon X5660

[Figure: two stacked bar charts (0% to 100%) for TPC-B, TPC-C, and TPC-E. "Execution Cycles Breakdown" splits cycles into Busy and Stalled; "Core Stalls" splits stalls into Instruction and Rest.]

More than half of execution time goes to stalls
Instruction stalls are the main problem

Page 10:

Cache Misses

[Figure: two stacked bar charts of misses per k-instructions (y-axis 0 to 100) for TPC-B, TPC-C, and TPC-E. One panel plots L1I, L2I, L1D, and L2D misses; the other additionally plots LLC misses.]

Intel Xeon X5660: 32KB L1-I & 32KB L1-D
Sun Niagara T2: 16KB L1-I & 8KB L1-D

TPC-E has lower data miss ratio (MPKI)
L1-I misses dominate
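For readers unfamiliar with the metric, MPKI (misses per k-instructions) normalizes a raw miss count by the number of retired instructions. The sketch below shows the calculation; the counter names and values are hypothetical and are not taken from the charts.

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per thousand (kilo) retired instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * misses / instructions


# Hypothetical counter readings for one run (not from the talk).
counters = {
    "instructions": 2_000_000_000,
    "L1I": 80_000_000,
    "L1D": 30_000_000,
}

l1i_mpki = mpki(counters["L1I"], counters["instructions"])
l1d_mpki = mpki(counters["L1D"], counters["instructions"])
print(l1i_mpki, l1d_mpki)
```

Normalizing by instructions rather than by cache accesses makes workloads with different instruction mixes comparable on one axis.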

Page 11:

Why does TPC-E have a lower miss ratio?

[Figure: two bar charts for TPC-B, TPC-C, and TPC-E, averaged per transaction. "#Records Accessed" (y-axis 0 to 150) and "#Pages Accessed" (y-axis 0 to 40), the latter split into Heap and Index pages.]

More scans in TPC-E
Increased page reuse
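One way to read the two charts together is as a ratio of records touched to pages touched. The sketch below uses that ratio as an illustrative proxy for page reuse; it is an assumption made here, not a metric defined in the talk, and the numbers are invented.

```python
def records_per_page(records_accessed: float, pages_accessed: float) -> float:
    """Average number of record accesses served by each page touched."""
    if pages_accessed <= 0:
        raise ValueError("pages_accessed must be positive")
    return records_accessed / pages_accessed


# Hypothetical per-transaction averages (not read off the charts).
point_lookups = records_per_page(records_accessed=10, pages_accessed=8)
scan_heavy = records_per_page(records_accessed=120, pages_accessed=30)
print(point_lookups, scan_heavy)
```

A scan-heavy transaction reads many neighbouring records from the same page, so the ratio rises and each page brought into the cache is used more often.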


Page 13:

From A to E: Schema

[Figure: schema diagrams for TPC-B, TPC-C, and TPC-E, rooted at the branch, warehouse, and customer tables respectively. Tables are marked as Fixed, Scaling, or Growing.]

Increasing schema complexity

Page 14:

From A to E: Transactions

                             TPC-B       TPC-C            TPC-E
#Transactions                1           5                12
Transaction Mix              RW 100%     RW 92%           RW 23%
Secondary Indexes            None        2 transactions   10 transactions
Transaction Input includes   Branch ID   Warehouse ID     Customer ID or Broker ID or Trade ID or

Harder to partition
More complexity & variety in transaction mix
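The "Transaction Input includes" row hints at why partitioning gets harder: a router needs one key that every transaction carries. The sketch below illustrates this with a simple modulo router. The field names, the partition count, and the `route` helper are hypothetical and do not come from Shore-Kits.

```python
from typing import Mapping, Optional

NUM_PARTITIONS = 4  # hypothetical partition count


def route(txn_input: Mapping[str, int], partition_key: str) -> Optional[int]:
    """Return the partition owning this transaction, or None if the
    input does not carry the partitioning key."""
    key = txn_input.get(partition_key)
    if key is None:
        return None  # cannot be sent to a single partition up front
    return key % NUM_PARTITIONS


# TPC-B-like input: always carries the branch ID.
b = route({"branch_id": 10}, "branch_id")

# TPC-E-like inputs: the identifying field varies by transaction type.
e1 = route({"customer_id": 7}, "customer_id")
e2 = route({"trade_id": 99}, "customer_id")
print(b, e1, e2)
```

When the input may carry a customer, broker, or trade identifier, no single choice of `partition_key` routes every transaction, so some requests must consult another structure or touch several partitions.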

Page 15:

Within the Storage Manager
Sun Niagara T2, 64 HW Contexts

[Figure: stacked bars of the time breakdown (0% to 100%) at 4, 16, and 48 HW contexts for each workload; components: Locking, Logging, BPool, Btree, Other.]

• TPC-B: SF 64 – 0.6GB, Spread
• TPC-C: SF 64 – 8.2GB, Spread
• TPC-E: SF 1 – 20GB, No-Spread

Page 16:

Within the Storage Manager
Sun Niagara T2, 64 HW Contexts

[Figure: stacked bars of the time breakdown (0% to 100%) at 4, 16, and 48 HW contexts for each workload; components: Locking, Logging, BPool, Btree, Other.]

• TPC-B: SF 64 – 0.6GB, Spread
• TPC-C: SF 64 – 8.2GB, Spread
• TPC-E: SF 1 – 20GB, No-Spread

Lock manager is the main bottleneck for TPC-E

Page 17:

Inside the Lock Manager

[Figure: stacked bars of the lock manager time breakdown (0% to 100%) at 4, 16, and 48 HW contexts for each workload; components: Lock Acquisition, Logical Contention, Physical Contention.]

• TPC-B: SF 64 – 0.6GB, Spread
• TPC-C: SF 64 – 8.2GB, Spread
• TPC-E: SF 1 – 20GB, No-Spread

Logical contention even for a large DB

Page 18:

Conclusions
• Modern hardware is still highly under-utilized
– TPC-E: fewer misses, less stall time, higher IPC
– OLTP utilizes less aggressive cores better
• Instruction footprint is too large to fit in L1-I
– Spread instructions, (software guided) prefetching
– Code/Compiler optimizations
• Logical lock contention due to hotspots
– Increased complexity in schema and transactions
– TPC-E: harder to physically partition
– Logical partitioning, OCC
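The last bullet names OCC (optimistic concurrency control) as one direction for avoiding logical lock contention. The sketch below is a generic, textbook-style illustration of the idea: read without locks, buffer writes, and validate at commit. It is not Shore-MT code or the authors' design; the `Store` and `Transaction` classes are hypothetical, and the example is single-threaded, whereas a real system would have to make validation and installation atomic.

```python
class Store:
    """Key -> (value, version). A toy, single-threaded record store."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key, (None, 0))


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version seen at first read
        self.write_set = {}  # key -> new value, buffered until commit

    def read(self, key):
        if key in self.write_set:
            return self.write_set[key]
        value, version = self.store.read(key)
        self.read_set.setdefault(key, version)
        return value

    def write(self, key, value):
        self.write_set[key] = value

    def commit(self):
        # Validation: abort if anything we read has changed since.
        for key, seen in self.read_set.items():
            if self.store.read(key)[1] != seen:
                return False
        # Write phase: install buffered writes and bump versions.
        for key, value in self.write_set.items():
            _, version = self.store.read(key)
            self.store.data[key] = (value, version + 1)
        return True


store = Store()
t1, t2 = Transaction(store), Transaction(store)
t1.write("x", (t1.read("x") or 0) + 1)
t2.write("x", (t2.read("x") or 0) + 1)
r1 = t1.commit()  # succeeds: nothing it read has changed
r2 = t2.commit()  # fails validation: "x" changed after t2 read it
print(r1, r2)
```

No transaction waits on a lock here; the conflict on the hot record shows up as an abort at commit time, so whether this helps depends on how often hotspots cause validation failures.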

Page 19:

The obsolete: TPC-B
The ubiquitous: TPC-C
The unexplored: TPC-E

Directed by

Produced by

Also starring: Shore-MT, Xeon X5660, Niagara T2