CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Zefu Dai, Mark Jarvin and Jianwen Zhu, University of Toronto


TRANSCRIPT

Page 1: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Zefu Dai, Mark Jarvin and Jianwen Zhu

University of Toronto

Page 2: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

23/4/19 University of Toronto 2

Background

Consumer electronics is part of everyday life!

[Diagram labels: SoC, Mem Contr., DRAM]

Page 3: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Background

A portable media player SoC example

Page 4: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Background

A portable media player SoC example

Page 5: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Background

A portable media player SoC example

[Figure labels: 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94 MB/s]

Page 6: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Background

A portable media player SoC example

[Figure labels: 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94 MB/s; annotation: 1000x]

Page 7: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Background

A portable media player SoC example

[Figure labels: 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94 MB/s]

"Give me 10 KB in 1 us, please."

Page 8: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Background

A portable media player SoC example

[Figure labels: 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94 MB/s]

"Give me 10 KB in 1 us, please."

"I want the data NOW!!!"

Page 9: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Background

A portable media player SoC example

[Figure labels: 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94 MB/s]

"Give me 10 KB in 1 us, please."

"I want the data NOW!!!"

"I can only supply a maximum of 6.4 GB every second."
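The mismatch in this exchange can be checked with a little arithmetic. The sketch below assumes decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), which the slide does not specify, and the variable names are illustrative only.

```python
# Assumption: decimal units (1 KB = 1_000 bytes, 1 GB = 1_000_000_000 bytes).
request_bytes = 10 * 1_000          # "10 KB"
window_us = 1                       # "in 1 us"
peak_bytes_per_s = 6_400_000_000    # "6.4 GB every second"

# Bandwidth implied by the request, in bytes per second (integer arithmetic).
demand_bytes_per_s = request_bytes * 1_000_000 // window_us

print(f"demand:          {demand_bytes_per_s / 1e9} GB/s")
print(f"peak supply:     {peak_bytes_per_s / 1e9} GB/s")
print(f"demand / supply: {demand_bytes_per_s / peak_bytes_per_s}")
```

Under these assumptions the burst asks for 10 GB/s, roughly 1.56 times the 6.4 GB/s the memory can supply at best, so a single requester can exceed the peak even before the other clients are counted.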

Page 10: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Challenges

Simultaneously satisfy:
- Bandwidth requirements
- Latency requirements

Page 11: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Previous Work

QoS aware
- Bandwidth or latency is heuristically improved

QoS guaranteed
- Guaranteed minimum bandwidth and/or latency

Page 12: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Main Ideas

Start with the Bandwidth Guaranteed Prioritized Queuing (BGPQ) algorithm
- Bandwidth guarantee

Improve it using the Credit Borrow and Repay (CBR) mechanism
- Minimum latency guarantee

Page 13: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Bandwidth Guaranteed Prioritized Queuing

Combines the benefits of Priority Queuing and Weighted Fair Queuing
- Credit-based Weighted Fair Queuing
- Prioritized service for residual bandwidth allocation

Residual bandwidth:
- The bandwidth assigned to one user that is unused at a specific point in time

Page 14: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

BGPQ Algorithm

Case 1: all queues are busy
- No residual bandwidth
- Act as WFQ

[Diagram: Q0 (50%), Q1 (20%), Q2 (30%), multiplexer, BGPQ scheduler, shared resource. Credits: Q0 = 0.0, Q1 = 0.0, Q2 = 0.0]

Initial state: every queue has a credit of zero.

Page 15: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

BGPQ Algorithm

Case 1: all queues are busy
- No residual bandwidth
- Act as WFQ

[Diagram: Q0 (50%), Q1 (20%), Q2 (30%), multiplexer, BGPQ scheduler, shared resource. Credits: Q0 = 0.5, Q1 = 0.2, Q2 = 0.3]

Step 1: calculate the dynamic credit for each queue.

Page 16: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

BGPQ Algorithm

Case 1: all queues are busy
- No residual bandwidth
- Act as WFQ

[Diagram: Q0 (50%), Q1 (20%), Q2 (30%), multiplexer, BGPQ scheduler, shared resource. Credits: Q0 = 0.5, Q1 = 0.2, Q2 = 0.3]

Step 2: turn on the switch box and transfer data from the granted queue.

Page 17: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

BGPQ Algorithm

Case 1: all queues are busy
- No residual bandwidth
- Act as WFQ

[Diagram: Q0 (50%), Q1 (20%), Q2 (30%), multiplexer, BGPQ scheduler, shared resource. Credits: Q0 = -0.5, Q1 = 0.2, Q2 = 0.3]

Step 3: subtract 1 from the credit of the granted queue.

One scheduling cycle is done. Sum of credits = 0.
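The three steps of Case 1 can be sketched in a few lines. This is a minimal model rather than the authors' implementation: credits are kept as integers in percent units (100 = one scheduling opportunity) to avoid floating-point rounding, and breaking ties toward the lowest queue index is an assumption.

```python
SLOT = 100  # one scheduling opportunity, in percent units

def wfq_cycle(credits, shares):
    """One scheduling cycle when every queue is busy. Returns the granted queue."""
    # Step 1: each queue earns its bandwidth share as credit.
    for i, share in enumerate(shares):
        credits[i] += share
    # Step 2: grant the queue with the highest credit
    # (assumption: ties go to the lowest index).
    granted = max(range(len(credits)), key=lambda i: credits[i])
    # Step 3: the granted queue pays one full slot.
    credits[granted] -= SLOT
    return granted

shares = [50, 20, 30]          # Q0, Q1, Q2 as on the slides
credits = [0, 0, 0]            # initial state: all credits are zero
first = wfq_cycle(credits, shares)
after_first = list(credits)

grants = [0, 0, 0]
grants[first] += 1
for _ in range(9):             # nine more cycles, ten in total
    grants[wfq_cycle(credits, shares)] += 1

print(first, after_first)      # granted queue and credits after one cycle
print(grants, credits)         # grants per queue and credits after ten cycles
```

After the first cycle the credits are -50, 20 and 30 (that is, -0.5, 0.2 and 0.3), as on the slide. Over ten cycles the grants split 5/2/3, which is the 50%/20%/30% allocation, and the credits return to zero.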

Page 18: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

BGPQ Algorithm

Case 2: some queues are empty
- Has residual bandwidth
- Prioritized service on residual bandwidth

[Diagram: Q0 (50%), Q1 (20%), Q2 (30%), multiplexer, BGPQ scheduler, shared resource. Credits: Q0 = -0.5, Q1 = 0.2, Q2 = 0.3]

Before the new scheduling cycle: Q1 is empty.

Priority: Q0 > Q1 > Q2

Page 19: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

BGPQ Algorithm

Case 2: some queues are empty
- Has residual bandwidth
- Prioritized service on residual bandwidth

[Diagram: Q0 (50%), Q1 (20%), Q2 (30%), multiplexer, BGPQ scheduler, shared resource. Credits: Q0 = 0.0, Q1 = 0.2, Q2 = 0.6]

Step 1: calculate a dynamic credit for each queue. The credit of an empty queue remains unchanged.

Priority: Q0 > Q1 > Q2

Page 20: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

BGPQ Algorithm

Case 2: some queues are empty
- Has residual bandwidth
- Prioritized service on residual bandwidth

[Diagram: Q0 (50%), Q1 (20%), Q2 (30%), multiplexer, BGPQ scheduler, shared resource. Credits: Q0 = 0.2, Q1 = 0.2, Q2 = 0.6]

Step 2: allocate the residual bandwidth to the non-empty queue with the highest priority.

Priority: Q0 > Q1 > Q2

Page 21: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

BGPQ Algorithm

Case 2: some queues are empty
- Has residual bandwidth
- Prioritized service on residual bandwidth

[Diagram: Q0 (50%), Q1 (20%), Q2 (30%), multiplexer, BGPQ scheduler, shared resource. Credits: Q0 = 0.2, Q1 = 0.2, Q2 = 0.6]

Step 3: transfer data from the granted queue.

Priority: Q0 > Q1 > Q2

Page 22: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

BGPQ Algorithm

Case 2: some queues are empty
- Has residual bandwidth
- Prioritized service on residual bandwidth

[Diagram: Q0 (50%), Q1 (20%), Q2 (30%), multiplexer, BGPQ scheduler, shared resource. Credits: Q0 = 0.2, Q1 = 0.2, Q2 = -0.4]

Step 4: subtract 1 from the credit of the granted queue.

Priority: Q0 > Q1 > Q2

One scheduling cycle is done. Sum of credits = 0.
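The four steps of Case 2 can be sketched as a single function. This is an illustrative model, not the authors' code: credits are integers in percent units (100 = one scheduling opportunity), only non-empty queues are considered for the grant, and ties are resolved in priority order; the last two points are assumptions.

```python
SLOT = 100  # one scheduling opportunity, in percent units

def bgpq_cycle(credits, shares, busy, priority):
    """One BGPQ cycle. `priority` lists queue indices, highest priority first."""
    active = [q for q in priority if busy[q]]
    if not active:
        return None                      # nothing to schedule
    # Step 1: busy queues earn their share; empty queues keep their credit.
    residual = 0
    for q, share in enumerate(shares):
        if busy[q]:
            credits[q] += share
        else:
            residual += share            # unused share becomes residual bandwidth
    # Step 2: the residual goes to the highest-priority non-empty queue.
    credits[active[0]] += residual
    # Step 3: grant the non-empty queue with the highest credit
    # (assumption: ties resolve in priority order).
    granted = max(active, key=lambda q: credits[q])
    # Step 4: the granted queue pays one slot.
    credits[granted] -= SLOT
    return granted

credits = [-50, 20, 30]                  # state at the end of Case 1
granted = bgpq_cycle(credits, [50, 20, 30],
                     busy=[True, False, True], priority=[0, 1, 2])
print(granted, credits)
```

Starting from the credits left by Case 1 with Q1 empty, the sketch grants Q2 and ends with credits 20, 20 and -40 (0.2, 0.2 and -0.4), so the sum of credits is again zero.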

Page 23: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

BGPQ Advantages

BGPQ = WFQ + PQ
- Bandwidth guarantee
- Prioritized access to residual bandwidth

Low implementation cost:
- 3 adders for credit calculation
- 1 comparator tree to find the highest dynamic credit
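As a rough software model of the comparator tree, the sketch below finds the highest credit with a pairwise tournament. The circuit itself is not shown in this transcript, so the structure, the function name and the tie-breaking rule (the earlier queue wins) are assumptions.

```python
def comparator_tree(credits):
    """Return (index of the highest credit, number of comparator levels)."""
    contenders = list(range(len(credits)))
    levels = 0
    while len(contenders) > 1:
        winners = []
        # Compare neighbouring contenders pairwise; each pair is one comparator.
        for k in range(0, len(contenders) - 1, 2):
            a, b = contenders[k], contenders[k + 1]
            winners.append(a if credits[a] >= credits[b] else b)
        # An unpaired contender advances to the next level unchanged.
        if len(contenders) % 2:
            winners.append(contenders[-1])
        contenders = winners
        levels += 1
    return contenders[0], levels

print(comparator_tree([20, 20, 60]))      # three queues, as in the examples
print(comparator_tree(list(range(8))))    # eight inputs
```

In this model three inputs need two levels of comparison and eight inputs need three, so the depth grows with the logarithm of the number of queues.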

Page 24: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

BGPQ Disadvantage

Low-latency, low-bandwidth requirement class:
- No minimum latency guarantee

Minimum latency:
- No need to wait for any request that has lower priority

Page 25: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Latency Problem of BGPQ

Example:

Optimal scheduling:

Page 26: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Credit Borrow and Repay Mechanism

Borrow
- Allow the low-latency requirement class to borrow the scheduling opportunity from other classes

Repay
- Return the credit later when convenient

Page 27: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

CBR Mechanism

Case 3: Credit Borrow and Repay
- Maintain a debt queue for Q0: a borrowed ID FIFO

[Diagram: Q0 (10%), Q1 (20%), Q2 (70%), DebtQ, multiplexer, CBR scheduler, shared resource. Credits: Q0 = 0.3, Q1 = 0.0, Q2 = 0.7]

Step 1: calculate the dynamic credits and allocate the residual bandwidth.

Priority: Q0 > Q1 > Q2

Page 28: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

CBR Mechanism

Case 3: Credit Borrow and Repay
- Maintain a debt queue for Q0

[Diagram: Q0 (10%), Q1 (20%), Q2 (70%), DebtQ, multiplexer, CBR scheduler, shared resource. Credits: Q0 = 0.3, Q1 = 0.0, Q2 = 0.7]

Step 2: reassign the scheduling opportunity to Q0 and record the borrowed ID.

Priority: Q0 > Q1 > Q2

Page 29: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

CBR Mechanism

Case 3: Credit Borrow and Repay
- Maintain a debt queue for Q0

[Diagram: Q0 (10%), Q1 (20%), Q2 (70%), DebtQ, multiplexer, CBR scheduler, shared resource. Credits: Q0 = 0.3, Q1 = 0.0, Q2 = 0.7]

Step 3: transfer data.

Priority: Q0 > Q1 > Q2

Page 30: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

CBR Mechanism

Case 3: Credit Borrow
- Maintain a debt queue for Q0

[Diagram: Q0 (10%), Q1 (20%), Q2 (70%), DebtQ, multiplexer, CBR scheduler, shared resource. Credits: Q0 = 0.3, Q1 = 0.0, Q2 = -0.3]

Step 4: subtract 1 from the credit of the originally scheduled queue.

Priority: Q0 > Q1 > Q2

One scheduling cycle is done. Sum of credits = 0.
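The borrow steps of Case 3 can be sketched as follows. This is a hedged model, not the authors' implementation: it assumes zero starting credits and an empty Q1 (both inferred from the credit values on the slides), takes queue index order as the priority order, uses percent units (100 = one scheduling opportunity), and omits the DebtQ depth limit. The name `borrow_cycle` is illustrative.

```python
from collections import deque

SLOT = 100  # one scheduling opportunity, in percent units

def borrow_cycle(credits, shares, busy, debtq, low_latency=0):
    """One cycle in which queue `low_latency` may borrow the winner's slot.
    Assumes at least one busy queue. Returns (served_queue, scheduled_queue)."""
    # Step 1: busy queues earn their share; the residual bandwidth of empty
    # queues goes to the highest-priority busy queue (lowest index here).
    residual = sum(s for q, s in enumerate(shares) if not busy[q])
    for q, s in enumerate(shares):
        if busy[q]:
            credits[q] += s
    active = [q for q in range(len(shares)) if busy[q]]
    credits[active[0]] += residual
    scheduled = max(active, key=lambda q: credits[q])
    # Step 2: if the low-latency queue is waiting but did not win, it takes
    # the slot and the lender's ID is recorded in the debt FIFO.
    served = scheduled
    if busy[low_latency] and scheduled != low_latency:
        debtq.append(scheduled)
        served = low_latency
    # Step 3 would transfer data from `served`.
    # Step 4: the originally scheduled queue pays for the slot.
    credits[scheduled] -= SLOT
    return served, scheduled

credits = [0, 0, 0]
debtq = deque()
result = borrow_cycle(credits, [10, 20, 70], [True, False, True], debtq)
print(result, credits, list(debtq))
```

With these inputs Q2 wins the credit comparison, Q0 is served in its place, Q2's ID is pushed onto the DebtQ, and the credits end at 30, 0 and -30 (0.3, 0.0 and -0.3), matching the slide.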

Page 31: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

CBR Mechanism

Case 4: Credit Repay
- It is time to repay the credit

[Diagram: Q0 (10%), Q1 (20%), Q2 (70%), DebtQ, multiplexer, CBR scheduler, shared resource. Credits: Q0 = 0.3, Q1 = 0.0, Q2 = -0.3]

Initial state: Q0 is empty but has debt, so it will "appear" to be non-empty.

Priority: Q0 > Q1 > Q2

Page 32: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

CBR Mechanism

Case 4: Credit Repay
- It is time to repay the credit

[Diagram: Q0 (10%), Q1 (20%), Q2 (70%), DebtQ, multiplexer, CBR scheduler, shared resource. Credits: Q0 = 0.6, Q1 = 0.0, Q2 = 0.4]

Step 1: calculate the dynamic credits and allocate the residual bandwidth.

Priority: Q0 > Q1 > Q2

Page 33: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

CBR Mechanism

Case 4: Credit Repay
- It is time to repay the credit

[Diagram: Q0 (10%), Q1 (20%), Q2 (70%), DebtQ, multiplexer, CBR scheduler, shared resource. Credits: Q0 = 0.6, Q1 = 0.0, Q2 = 0.4]

Step 2: return the scheduling opportunity and clear the DebtQ.

Priority: Q0 > Q1 > Q2

Page 34: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

CBR Mechanism

Case 4: Credit Repay
- It is time to repay the credit

[Diagram: Q0 (10%), Q1 (20%), Q2 (70%), DebtQ, multiplexer, CBR scheduler, shared resource. Credits: Q0 = 0.6, Q1 = 0.0, Q2 = 0.4]

Step 3: transfer data.

Priority: Q0 > Q1 > Q2

Page 35: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

CBR Mechanism

Case 4: Credit Repay
- It is time to repay the credit

[Diagram: Q0 (10%), Q1 (20%), Q2 (70%), DebtQ, multiplexer, CBR scheduler, shared resource. Credits: Q0 = -0.4, Q1 = 0.0, Q2 = 0.4]

Step 4: subtract 1 from the credit of the scheduled queue.

Priority: Q0 > Q1 > Q2

One scheduling cycle is done. Sum of credits = 0.
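The repay steps of Case 4 can be sketched in the same style. Again this is a model under stated assumptions rather than the authors' code: the indebted queue is treated as non-empty for the credit update, one debt entry is returned per cycle, Q1 is assumed empty, queue index order is the priority order, and credits use percent units (100 = one scheduling opportunity). The name `repay_cycle` is illustrative.

```python
from collections import deque

SLOT = 100  # one scheduling opportunity, in percent units

def repay_cycle(credits, shares, busy, debtq, low_latency=0):
    """One cycle in which an empty, indebted queue returns a borrowed slot.
    Returns (served_queue, scheduled_queue)."""
    # A queue with outstanding debt "appears" non-empty to the scheduler.
    appears_busy = list(busy)
    if debtq:
        appears_busy[low_latency] = True
    # Step 1: credit update and residual allocation, as in BGPQ.
    residual = sum(s for q, s in enumerate(shares) if not appears_busy[q])
    for q, s in enumerate(shares):
        if appears_busy[q]:
            credits[q] += s
    active = [q for q in range(len(shares)) if appears_busy[q]]
    credits[active[0]] += residual
    scheduled = max(active, key=lambda q: credits[q])
    # Step 2: if the indebted queue wins while actually empty, the slot goes
    # back to the oldest lender and that debt entry is removed.
    served = scheduled
    if scheduled == low_latency and not busy[low_latency] and debtq:
        served = debtq.popleft()
    # Step 3 would transfer data from `served`.
    # Step 4: the scheduled queue (not the served one) pays for the slot.
    credits[scheduled] -= SLOT
    return served, scheduled

credits = [30, 0, -30]                   # state at the end of Case 3
debtq = deque([2])                       # Q0 owes one slot to Q2
result = repay_cycle(credits, [10, 20, 70], [False, False, True], debtq)
print(result, credits, list(debtq))
```

Here Q0 wins the credit comparison while empty, so the slot is handed back to Q2, the DebtQ is emptied, and the credits end at -40, 0 and 40 (-0.4, 0.0 and 0.4) as on the slide.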

Page 36: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

CBR Mechanism

Minimum latency guarantee using CBR
- No need to wait for requests in other queues

Worst case: Q0 is not empty while DebtQ is full
- No minimum latency guarantee in this case

Page 37: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Implementation in FPGA

CBR MPMC top-level diagram
- Number of ports configurable at instantiation time
- Priority and bandwidth programmable at run time

Page 38: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees


Implementation in FPGA

Credit calculation circuit

Sorting Network and CBR

Page 39: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Implementation Cost

8-port CBR-MPMC with 16-depth DebtQ
- Xilinx Virtex-5 XC5VLX50T
- Speedy DDR backend memory controller

Page 40: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Evaluation

Simulation framework
- Cycle-accurate C model of MPMC
- Simple close-page DDR memory model
- Trace capturing and converting method

Page 41: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Evaluation

CPU workload trace file (from B. Jacob)
- Cache simulation on the standard SPEC2000 integer benchmark

Irregular access pattern, low bandwidth requirement: 0.4 memory transactions per 1k instructions.

Page 42: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Evaluation

Accelerator workload
- ALPBench suite of parallel multimedia applications

Page 43: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Evaluation

Accelerator workload
- ALPBench suite of parallel multimedia applications

Periodically repeated access pattern, high bandwidth requirement: 18.3 memory transactions per 1k instructions.

Page 44: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Results

BGPQ scheduler
- Latency: number of clock cycles
- Bandwidth: number of memory transactions per 1k clock cycles
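The two metrics can be computed from a per-port list of transactions. The sketch below is illustrative only: the trace format (issue cycle, completion cycle) and the sample values are hypothetical, and it reports the average latency, whereas the slides do not say whether average or worst-case latency is plotted.

```python
def port_metrics(transactions, total_cycles):
    """transactions: list of (issue_cycle, completion_cycle) pairs for one port.
    Returns (average latency in cycles, transactions per 1k clock cycles)."""
    latencies = [done - issued for issued, done in transactions]
    avg_latency = sum(latencies) / len(latencies)
    bandwidth = len(transactions) * 1000 / total_cycles
    return avg_latency, bandwidth

# Hypothetical trace: three transactions observed over 2000 clock cycles.
trace = [(0, 12), (100, 130), (400, 418)]
avg_latency, bandwidth = port_metrics(trace, total_cycles=2000)
print(avg_latency, bandwidth)
```

For this made-up trace the latencies are 12, 30 and 18 cycles, giving an average of 20 cycles and a bandwidth of 1.5 transactions per 1k clock cycles.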

Page 45: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Results

CBR scheduler with a 16-depth DebtQ

Page 46: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Impact of DebtQ Size

Repay conditions:
- DebtQ is full
- Q0 is empty

[Diagram: Q0 (10%), Q1 (20%), Q2 (70%), DebtQ, multiplexer, CBR scheduler, shared resource. Credits: Q0 = 0.6, Q1 = 0.0, Q2 = 0.4]

Priority: Q0 > Q1 > Q2

When DebtQ is full, the remaining requests in Q0 will not be served with the minimum latency guarantee!
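The repay conditions above can be written as a small decision function. This is a simplified sketch: it ignores the credit comparison (a borrow is only needed when Q0 did not win the slot itself), and the function name and return labels are hypothetical.

```python
def cbr_action(q0_busy, debt_len, depth):
    """Classify what the scheduler may do for Q0 in the current cycle."""
    full = debt_len >= depth
    # Repay conditions from the slide: DebtQ is full, or Q0 is empty
    # (and there is some debt to repay).
    if debt_len > 0 and (full or not q0_busy):
        return "repay"
    # Room is left to record another borrowed ID.
    if q0_busy and not full:
        return "borrow"
    # Otherwise fall back to plain BGPQ scheduling.
    return "normal"

for state in [(True, 3, 16), (True, 16, 16), (False, 3, 16), (False, 0, 16)]:
    print(state, cbr_action(*state))
```

The second state is the worst case named on the slide: Q0 still has requests, but the DebtQ is full, so those requests wait like any other traffic until some debt has been repaid.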

Page 47: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Impact of DebtQ Size

How big is enough for DebtQ?
- Determined by the instantaneous bandwidth requirement

Irregular access pattern means:
- Large range of DebtQ size requirements

Tradeoff
- Resource efficiency vs. performance

Page 48: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Results

Impact of debt queue size

Page 49: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Conclusions

The CBR scheduler can provide minimum bandwidth and latency guarantees

Low implementation cost and power consumption

We expect its successful use in a wide range of multimedia applications

Page 50: CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees

Questions?

[Diagram: Q0 (10%), Q1 (20%), Q2 (70%), DebtQ, multiplexer, CBR scheduler, shared resource. Credits: Q0 = 0.3, Q1 = 0.0, Q2 = -0.3]

Priority: Q0 > Q1 > Q2