Load Balancing & Update Propagation in Tashkent+
Sameh Elnikety, Steven Dropsho, Willy Zwaenepoel
EPFL
Load Balancing
• Traditional: optimize for equal load on replicas
• MALB (Memory-Aware Load Balancing): optimize for in-memory execution

MALB Doubles Throughput
[Chart: TPC-W Ordering, 16 replicas; normalized throughput — single DBMS 1×, LeastLoaded 12×, MALB 25×, MALB+UF 37×]
Update Propagation
• Traditional: propagate updates everywhere
• Update Filtering: propagate updates only to where they are needed

MALB + UF Triples Throughput
[Chart: TPC-W Ordering, 16 replicas; normalized throughput — single DBMS 1×, LeastLoaded 12×, MALB 25×, MALB+UF 37×]
Contributions
• MALB (Memory-Aware Load Balancing): optimize for in-memory execution
• Update filtering: propagate updates only to where they are needed
• Implementation: Tashkent+ middleware
• Experimental evaluation: big performance gains
Background: DB Replication
[Diagram: a load balancer in front of replicas 1–3 presents the cluster as a single DBMS]
[Diagram: a read tx T executes at one replica]
[Diagram: an update tx T executes at one replica; its writeset (ws) is propagated to the other replicas]
Traditional Load Balancing
[Diagram: the load balancer places equal load on replicas 1–3]

MALB (Memory-Aware Load Balancing): optimize for in-memory execution
How Does MALB Work?
[Diagram: the database holds tables 1, 2 and 3, but each replica's memory holds only part of it. Tx type A accesses tables 1 and 2; tx type B accesses tables 2 and 3. The incoming workload is the mixed stream A, B, A, B.]
Read Data From Disk
[Diagram: LeastLoaded sends the mixed stream A, B, A, B to both replicas. Each replica then needs tables 1, 2 and 3 in memory at once; they do not fit, so both replicas read from disk and both run slowly.]
Data Fits in Memory
[Diagram: MALB sends A, A, A, A to replica 1 and B, B, B, B to replica 2. Tables 1 and 2 fit in replica 1's memory; tables 2 and 3 fit in replica 2's memory. Both replicas run fast.]
Open questions: where does the memory info come from, and what happens with many tx types and replicas?
Estimate Tx Memory Needs
• Exploit the tx execution plan: which tables & indices are accessed, and their access pattern (linear scan vs. direct access)
• Metadata from the database: sizes of tables and indices
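The estimate above can be sketched as follows. This is a minimal illustration, not Tashkent+'s actual calculation: the function name, the "scan"/"direct" labels, and the 10% hot fraction assumed for direct access are all hypothetical.

```python
# Sketch: a linear scan touches the whole table/index, while direct (index)
# access is assumed to touch only a small hot fraction of it.

def estimate_tx_memory(plan, sizes):
    """plan: list of (object_name, access_pattern) from the execution plan.
    sizes: dict object_name -> size in bytes (from database metadata)."""
    needed = 0
    for obj, pattern in plan:
        if pattern == "scan":        # linear scan: full object is hot
            needed += sizes[obj]
        elif pattern == "direct":    # direct access: assume a 10% hot fraction
            needed += int(0.1 * sizes[obj])
    return needed

# Hypothetical tx type that scans table t1 and probes index i2 directly.
sizes = {"t1": 100_000_000, "i2": 20_000_000}
plan_a = [("t1", "scan"), ("i2", "direct")]
print(estimate_tx_memory(plan_a, sizes))  # 102000000
```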
Grouping Transactions
• Objective: construct tx groups that fit together in memory
• Bin packing: item = tx memory needs; bin = replica memory; heuristic = Best Fit Decreasing
• Allocate replicas to tx groups, adjusting for group loads
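The bin-packing step can be sketched with the Best Fit Decreasing heuristic named above. The memory needs below are illustrative numbers chosen so the grouping matches the A / B C / D E F example on the "MALB in Action" slide; they are not values from the paper.

```python
# Best Fit Decreasing: items = per-tx-type memory needs, bins = replica memory.
# Tx types placed in the same bin form a group whose working set fits together.

def best_fit_decreasing(needs, bin_capacity):
    """needs: dict tx_type -> memory need; returns a list of groups."""
    bins = []  # each entry: [remaining_capacity, [tx types]]
    for tx, need in sorted(needs.items(), key=lambda kv: -kv[1]):
        # best fit: pick the open bin that leaves the least slack
        fitting = [b for b in bins if b[0] >= need]
        if fitting:
            best = min(fitting, key=lambda b: b[0] - need)
            best[0] -= need
            best[1].append(tx)
        else:
            bins.append([bin_capacity - need, [tx]])  # open a new group
    return [group for _, group in bins]

needs = {"A": 60, "B": 30, "C": 25, "D": 20, "E": 15, "F": 10}
print(best_fit_decreasing(needs, 64))  # [['A'], ['B', 'C'], ['D', 'E', 'F']]
```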
MALB in Action
[Diagram: given the memory needs of tx types A–F, MALB forms Group A, Group B C, and Group D E F, and assigns each group its own replica so that each group's working set fits in that replica's memory.]
MALB Summary
• Objective: optimize for in-memory execution
• Method: estimate tx memory needs, construct tx groups, allocate replicas to tx groups
Update Propagation
• Traditional: propagate updates everywhere
• Update Filtering: propagate updates only to where they are needed
Update Filtering Example
[Diagram: MALB + UF. Replica 1 serves Group A with tables 1 and 2 in memory; replica 2 serves Group B with tables 2 and 3 in memory. Both receive the workload A, B, A, B split by group.]
[Diagram: an update to table 1 is propagated only to replica 1; an update to table 3 only to replica 2. Filtered updates never touch tables outside a replica's working set.]
Update Filtering in Action
[Diagram: UF routes each update only to the replicas that serve the updated table — the update to the red table and the update to the green table go to different replicas.]
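The routing decision can be sketched as a simple set intersection. The function and the group-to-table map below are illustrative assumptions, not the Tashkent+ implementation.

```python
# Update-filtering sketch: a writeset is propagated only to replicas whose
# assigned tx group accesses at least one of the updated tables.

def replicas_for_update(updated_tables, replica_tables):
    """replica_tables: dict replica -> set of tables its tx group accesses."""
    return sorted(r for r, tables in replica_tables.items()
                  if tables & set(updated_tables))

# Matches the example: replica 1 serves tables 1-2, replica 2 serves tables 2-3.
replica_tables = {"replica1": {"t1", "t2"}, "replica2": {"t2", "t3"}}
print(replicas_for_update(["t1"], replica_tables))  # ['replica1']
print(replicas_for_update(["t2"], replica_tables))  # ['replica1', 'replica2']
```

An update to the shared table 2 still goes to both replicas, so filtering saves work only on tables that are not shared across groups.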
Tashkent+ Implementation
• Middleware: based on Tashkent [EuroSys 2006]
• Consistency: no change — replicas run GSI (Generalized Snapshot Isolation), and our workloads run serializably
Experimental Evaluation
• Compare: LeastLoaded (optimize for equal load), MALB (optimize for in-memory execution), MALB+UF (propagate updates only to where needed)
• Environment: Linux cluster running PostgreSQL
• Workload: TPC-W Ordering (50% update txs)
MALB Doubles Throughput
[Chart: TPC-W Ordering, 16 replicas — MALB reaches 25× the single-DBMS throughput vs 12× for LeastLoaded, a 105% gain; MALB+UF reaches 37×]
Big Gains with MALB
[Table (grid layout lost in transcript): MALB's gain over LeastLoaded for combinations of DB size and memory size (Small → Big on both axes): 4%, 0%, 29%, 48%, 105%, 45%, 182%, 75%, 12%. Gains are near zero where the data fits in memory at every replica, modest where everything runs from disk, and largest (105–182%) where MALB turns disk-bound execution into in-memory execution.]
MALB + UF Triples Throughput
[Chart: TPC-W Ordering, 16 replicas — MALB+UF reaches 37× the single-DBMS throughput vs 25× for MALB, a 49% gain, and 12× for LeastLoaded]
Filtering Opportunities
[Chart: MALB+UF throughput normalized to MALB — 1.49× for the Ordering mix (50% updates), 1.02× for the Browsing mix (5% updates). Filtering pays off when updates dominate the workload.]
Check Out the Paper
• MALB is better than LARD (we explain why)
• Methods for grouping transactions: it is better to be conservative when estimating memory
• Dynamic reallocation for MALB: adjust to workload changes
• More workloads: TPC-W browsing & shopping mixes, RUBiS bidding
Conclusions
• MALB (Memory-Aware Load Balancing): optimize for in-memory execution
• Update filtering: propagate updates only to where they are needed
• Implementation: Tashkent+ middleware
• Experimental evaluation: big performance gains, achievable in many cases

Questions?
Thank You
• MALB (Memory-Aware Load Balancing): optimize for in-memory execution
• Update filtering: propagate updates only to where they are needed
Example
• Transaction type:
  SELECT item-name FROM shopping-cart WHERE user = ?
• Transaction type A has instances A1, A2, A3
Transaction Types
A: SELECT * FROM address WHERE id=$1
B: INSERT INTO course VALUES ($1, $2)
C: UPDATE emp SET sal=$1 WHERE id=$2
D: DELETE FROM bid WHERE id=$1

Instances of type A:
A1: SELECT * FROM address WHERE id=10
A2: SELECT * FROM address WHERE id=20
A3: SELECT * FROM address WHERE id=76
A4: SELECT * FROM address WHERE id=78
A5: SELECT * FROM address WHERE id=97
Tx Grouping Alternatives
• MALB-S: size only
• MALB-SC: size and contents; no double counting of shared tables & indices
• MALB-SCAP: size, contents, and access pattern
• Overflow tx types: each such tx type goes in its own group
MALB Allocation – Load
• Collect CPU and disk utilizations; maintain smoothed averages
• Group load: average across the group's replicas
  Example: 3 replicas with (CPU, disk) = (45, 10), (53, 9), (40, 8) → group load = (46, 9)
• Compare loads via max(CPU, disk)
  Example: max(46, 9) = 46%
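The load computation above reduces to two one-liners; this sketch just reproduces the slide's worked example (function names are illustrative).

```python
# Group load = component-wise average of (CPU, disk) utilization over the
# group's replicas; groups are then compared on max(CPU, disk).

def group_load(samples):
    """samples: list of (cpu_util, disk_util) tuples, one per replica."""
    cpus, disks = zip(*samples)
    return (sum(cpus) / len(cpus), sum(disks) / len(disks))

def load_metric(load):
    return max(load)  # the bottleneck resource dominates

samples = [(45, 10), (53, 9), (40, 8)]
load = group_load(samples)
print(load)               # (46.0, 9.0)
print(load_metric(load))  # 46.0
```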
MALB Allocation – Basic
• Move one replica at a time, from the least loaded group to the most loaded group
• Estimate future loads: 2 replicas at 20% → 1 replica at 40%; 6 replicas at 25% → 5 replicas at 30%
• Hysteresis to prevent oscillations: trigger re-allocation only if the load difference exceeds 1.25×
MALB Allocation – Fast
• Quickly snap to a good configuration by solving a set of simultaneous equations
• Example, with the current configuration Group M on 3 replicas at 70% load and Group N on 7 replicas at 10% load:
  Mrep + Nrep = 10
  (70% × 3) / Mrep = (10% × 7) / Nrep
• Fine tune with additional adjustments
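The two equations have a closed form: each group's total work is utilization × replica count, and replicas are handed out in proportion to that work. A sketch under that reading (the function name is illustrative):

```python
# "Fast" allocation sketch: solve Mrep + Nrep = total and
# (util_m * reps_m) / Mrep = (util_n * reps_n) / Nrep for Mrep, Nrep.

def fast_allocation(total, util_m, reps_m, util_n, reps_n):
    work_m = util_m * reps_m  # e.g. 70% on 3 replicas -> 210
    work_n = util_n * reps_n  # e.g. 10% on 7 replicas -> 70
    m = total * work_m / (work_m + work_n)
    return m, total - m

# The slide's example: the answer is fractional, so it is rounded to whole
# replicas and then fine-tuned with additional adjustments.
print(fast_allocation(10, 70, 3, 10, 7))  # (7.5, 2.5)
```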
Configuration of MALB-SC
[Chart]

Average Disk IO per Transaction
[Chart: disk IO]

MALB & Update Filtering
[Charts: TPC-W Ordering with LeastCon, MALB, and MALB+UF — MidDB (1.8 GB) with different memory sizes; SmallDB (0.7 GB) and BigDB (2.9 GB)]

Dynamic Reallocation
[Chart]
Related Work
• Load balancing: LARD, HACC, FLEX — target workloads of small static files and do not estimate the working set
• Replicated databases: front-end load balancer
• Conflict-aware scheduling [Pedone06]: reduce conflicts and therefore the abort rate
• Conflict-aware scheduling [Amza03]: correctness; partial replication
Related Work – References
• Replicated database systems:
  – J. Gray et al., The Dangers of Replication and a Solution, SIGMOD'96.
  – C. Amza et al., Consistent Replication for Scaling Back-end Databases for Dynamic Content Servers, Middleware'03.
  – C. Plattner et al., Ganymed: Scalable Replication for Transactional Web Applications, Middleware'04.
  – B. Kemme et al., Postgres-R(SI): Middleware-based Data Replication Providing Snapshot Isolation, SIGMOD'05.
  – K. Daudjee and K. Salem, Lazy Database Replication with Snapshot Isolation, VLDB'06.
• Networked server systems:
  – V. Pai et al., Locality-aware Request Distribution in Cluster-based Network Servers, ASPLOS'98.