CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees
Zefu Dai, Mark Jarvin and Jianwen Zhu
University of Toronto
23/4/19 University of Toronto 2
Background
Consumer Electronics is part of everyday life!
[Figure: SoC, memory controller (Mem Contr.), DRAM]
Background
A portable media player SoC example
[Figure: the SoC example with bandwidth labels 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94 MB/s; annotation "1000x"]
Callouts in the figure:
- "Give me 10 KB in 1 us, please."
- "I want the data NOW!!!"
- "I can only supply a maximum of 6.4 GB every second."
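Read literally, the first callout alone asks for more than the DRAM callout can supply. A quick check of the arithmetic, assuming decimal units (1 KB = 10^3 B, 1 GB = 10^9 B), which the slide does not state:

```latex
\[
\frac{10\ \text{KB}}{1\ \mu\text{s}}
  = \frac{10^{4}\ \text{B}}{10^{-6}\ \text{s}}
  = 10^{10}\ \text{B/s}
  = 10\ \text{GB/s}
  \;>\; 6.4\ \text{GB/s}
\]
```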
Challenges
Simultaneously satisfy:
- Bandwidth requirements
- Latency requirements
Previous Work
QoS aware
- Bandwidth or latency is heuristically improved
QoS guaranteed
- Guaranteed minimum bandwidth and/or latency
Main Ideas
Start with the Bandwidth Guaranteed Prioritized Queuing (BGPQ) algorithm
- Bandwidth guarantee
Improve it using the Credit Borrow and Repay (CBR) mechanism
- Minimum latency guarantee
Bandwidth Guaranteed Prioritized Queuing
Combines the benefits of Priority Queuing and Weighted Fair Queuing
- Credit-based Weighted Fair Queuing
- Prioritized service for residual bandwidth allocation
Residual bandwidth:
- The bandwidth assigned to one user that is unused at a specific point in time
BGPQ Algorithm
Case 1: all queues are busy
- No residual bandwidth
- Acts as WFQ
[Figure: queues Q0, Q1 and Q2, with bandwidth shares of 50%, 20% and 30%, feed a multiplexer controlled by the BGPQ scheduler in front of the shared resource]
Initial state: everybody has a credit of zero.
  Credits (Q0, Q1, Q2) = (0.0, 0.0, 0.0)
Step 1: calculate the dynamic credit for each queue.
  Credits (Q0, Q1, Q2) = (0.5, 0.2, 0.3)
Step 2: turn on the switch box and transfer data from the granted queue.
Step 3: subtract 1 from the credit of the granted queue.
  Credits (Q0, Q1, Q2) = (-0.5, 0.2, 0.3)
One scheduling cycle is done! Sum of credits = 0!
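The three steps of Case 1 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `wfq_cycle`, the use of `Fraction` for exact credit arithmetic, and breaking ties toward the lowest queue index are all assumptions made for illustration.

```python
from fractions import Fraction

def wfq_cycle(credits, shares):
    """Run one scheduling cycle with every queue busy (hypothetical helper)."""
    # Step 1: each queue earns its bandwidth share as dynamic credit.
    credits = [c + s for c, s in zip(credits, shares)]
    # Step 2: grant the queue with the highest dynamic credit.
    # Assumption: ties go to the lowest queue index.
    granted = max(range(len(credits)), key=lambda i: credits[i])
    # Step 3: the granted queue pays one credit for the slot it used.
    credits[granted] -= 1
    return granted, credits

shares = [Fraction("0.5"), Fraction("0.2"), Fraction("0.3")]
granted, credits = wfq_cycle([Fraction(0)] * 3, shares)
print(granted, [float(c) for c in credits])  # 0 [-0.5, 0.2, 0.3]
```

Because the shares sum to 1 and exactly one credit is charged per cycle, the credits keep summing to zero. Running this sketch for ten cycles grants Q0, Q1 and Q2 five, two and three slots respectively, matching the 50%, 20% and 30% shares.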
BGPQ Algorithm
Case 2: some queues are empty
- Has residual bandwidth
- Prioritized service on residual bandwidth
[Figure: queues Q0, Q1 and Q2, with bandwidth shares of 50%, 20% and 30%, feed a multiplexer controlled by the BGPQ scheduler in front of the shared resource; priority Q0 > Q1 > Q2]
Before the new scheduling cycle: Q1 is empty.
  Credits (Q0, Q1, Q2) = (-0.5, 0.2, 0.3)
Step 1: calculate a dynamic credit for each queue. The credit of an empty queue remains unchanged.
  Credits (Q0, Q1, Q2) = (0.0, 0.2, 0.6)
Step 2: allocate the residual bandwidth to the non-empty queue with the highest priority.
  Credits (Q0, Q1, Q2) = (0.2, 0.2, 0.6)
Step 3: transfer data from the granted queue.
Step 4: subtract 1 from the credit of the granted queue.
  Credits (Q0, Q1, Q2) = (0.2, 0.2, -0.4)
One scheduling cycle is done! Sum of credits = 0!
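The four steps of Case 2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `bgpq_cycle`, the `busy` flags and the `priority` list are hypothetical names, and ties between equal credits are resolved in priority order, which the slides do not specify.

```python
from fractions import Fraction

def bgpq_cycle(credits, shares, busy, priority):
    """One BGPQ cycle; busy[i] is False when queue i is empty (sketch)."""
    credits = list(credits)
    if not any(busy):
        return None, credits  # nothing to schedule
    residual = Fraction(0)
    # Step 1: busy queues earn their share; an empty queue keeps its
    # credit and its unused share becomes residual bandwidth.
    for i, share in enumerate(shares):
        if busy[i]:
            credits[i] += share
        else:
            residual += share
    # Step 2: the residual goes to the highest-priority non-empty queue.
    top = next(i for i in priority if busy[i])
    credits[top] += residual
    # Step 3: grant the busy queue with the highest dynamic credit
    # (assumption: ties are resolved in priority order).
    granted = max((i for i in priority if busy[i]), key=lambda i: credits[i])
    # Step 4: charge the granted queue one credit.
    credits[granted] -= 1
    return granted, credits

shares = [Fraction("0.5"), Fraction("0.2"), Fraction("0.3")]
start = [Fraction("-0.5"), Fraction("0.2"), Fraction("0.3")]
granted, credits = bgpq_cycle(start, shares, [True, False, True], [0, 1, 2])
print(granted, [float(c) for c in credits])  # 2 [0.2, 0.2, -0.4]
```

Q1's unused 20% share is the residual bandwidth here; it raises Q0's credit from 0.0 to 0.2, while Q2 still wins the slot with 0.6.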
BGPQ Advantages
BGPQ = WFQ + PQ
- Bandwidth guarantee
- Prioritized access to residual bandwidth
Low implementation cost:
- 3 adders for credit calculation
- 1 comparator tree to find the highest dynamic credit
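To illustrate what a comparator tree computes, here is a software sketch of a pairwise (tournament) reduction that finds the index of the highest credit. It models only the logic, not the hardware; the function name and the rule that ties keep the lower index are assumptions.

```python
def comparator_tree_argmax(credits):
    """Index of the highest credit via rounds of pairwise comparisons.

    Requires a non-empty list. Each loop iteration models one tree level.
    """
    level = list(enumerate(credits))  # (queue index, credit) pairs
    while len(level) > 1:
        nxt = []
        for j in range(0, len(level) - 1, 2):
            left, right = level[j], level[j + 1]
            # One comparator: keep the larger credit, left wins ties.
            nxt.append(left if left[1] >= right[1] else right)
        if len(level) % 2:
            nxt.append(level[-1])  # odd entry passes through unchanged
        level = nxt
    return level[0][0]

print(comparator_tree_argmax([0.2, 0.2, 0.6]))  # 2
```

A tree of this shape needs about log2(N) comparison levels for N queues, which is why it maps well to hardware.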
BGPQ Disadvantage
Low-latency, low-bandwidth requirement class:
- No minimum latency guarantee
Minimum latency:
- No need to wait for any request that has lower priority
Latency Problem of BGPQ
[Figures: Example; Optimal Scheduling]
Credit Borrow and Repay Mechanism
Borrow
- Allow the low-latency requirement class to borrow the scheduling opportunity from other classes
Repay
- Return the credit later when convenient
CBR Mechanism
Case 3: Credit Borrow
- Maintain a debt queue (DebtQ) for Q0: a FIFO of borrowed IDs
[Figure: queues Q0, Q1 and Q2, with bandwidth shares of 10%, 20% and 70%, feed a multiplexer controlled by the CBR scheduler, which holds the DebtQ, in front of the shared resource; priority Q0 > Q1 > Q2]
Step 1: calculate the dynamic credits and allocate the residual bandwidth.
  Credits (Q0, Q1, Q2) = (0.3, 0.0, 0.7)
Step 2: re-assign the scheduling opportunity to Q0 and record the borrowed ID.
Step 3: transfer data.
Step 4: subtract 1 from the credit of the originally scheduled queue.
  Credits (Q0, Q1, Q2) = (0.3, 0.0, -0.3)
One scheduling cycle is done! Sum of credits = 0!
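The borrow step can be sketched as below. This is a hedged illustration rather than the authors' design: the names `dynamic_credits`, `borrow_cycle`, `ll` (the low-latency queue index) and `depth` are hypothetical. The starting state (all credits zero, Q1 empty) is an assumption chosen because it reproduces the credit values shown on the slides.

```python
from collections import deque
from fractions import Fraction

def dynamic_credits(credits, shares, active, priority):
    """Add shares for active queues; give the residual to the top active one."""
    credits = list(credits)
    residual = sum((s for s, a in zip(shares, active) if not a), Fraction(0))
    for i, a in enumerate(active):
        if a:
            credits[i] += shares[i]
    top = next((i for i in priority if active[i]), None)
    if top is not None:
        credits[top] += residual
    return credits

def borrow_cycle(credits, shares, busy, priority, debt_q, depth, ll=0):
    """One cycle in which low-latency queue `ll` may borrow a slot (sketch)."""
    if not any(busy):
        return None, list(credits)
    # Step 1: dynamic credits plus residual bandwidth, as in BGPQ.
    credits = dynamic_credits(credits, shares, busy, priority)
    winner = max((i for i in priority if busy[i]), key=lambda i: credits[i])
    served = winner
    # Step 2: if another queue won and the DebtQ has room, Q0 borrows.
    if busy[ll] and winner != ll and len(debt_q) < depth:
        debt_q.append(winner)   # record the borrowed ID
        served = ll             # Q0 takes the scheduling opportunity
    # Step 4: the originally scheduled queue is charged, not Q0.
    credits[winner] -= 1
    return served, credits

shares = [Fraction("0.1"), Fraction("0.2"), Fraction("0.7")]
debt_q = deque()
served, credits = borrow_cycle([Fraction(0)] * 3, shares,
                               [True, False, True], [0, 1, 2], debt_q, depth=16)
print(served, [float(c) for c in credits], list(debt_q))
# 0 [0.3, 0.0, -0.3] [2]
```

Q0 is served immediately even though Q2 held the highest credit; Q2 pays for the slot now and is recorded in the DebtQ so the slot can be returned later.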
CBR Mechanism
Case 4: Credit Repay
- It is time to repay the credit
[Figure: queues Q0, Q1 and Q2, with bandwidth shares of 10%, 20% and 70%, feed a multiplexer controlled by the CBR scheduler, which holds the DebtQ, in front of the shared resource; priority Q0 > Q1 > Q2]
Initial state: Q0 is empty but has debt. It will ‘appear’ to be non-empty.
  Credits (Q0, Q1, Q2) = (0.3, 0.0, -0.3)
Step 1: calculate the dynamic credits and allocate the residual bandwidth.
  Credits (Q0, Q1, Q2) = (0.6, 0.0, 0.4)
Step 2: return the scheduling opportunity and clear the DebtQ.
Step 3: transfer data.
Step 4: subtract 1 from the credit of the scheduled queue.
  Credits (Q0, Q1, Q2) = (-0.4, 0.0, 0.4)
One scheduling cycle is done! Sum of credits = 0!
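The repay step can be sketched in the same style. Again this is an assumption-laden illustration, not the authors' implementation: `repay_cycle` and `ll` are hypothetical names, the sketch returns one borrowed slot per cycle in FIFO order (with a single DebtQ entry this equals the "clear the DebtQ" step on the slide), and it assumes the lender has a request waiting when the slot is returned.

```python
from collections import deque
from fractions import Fraction

def repay_cycle(credits, shares, busy, priority, debt_q, ll=0):
    """One cycle in which queue `ll` may repay a borrowed slot (sketch)."""
    credits = list(credits)
    # An empty low-latency queue that still has debt appears non-empty.
    active = [b or (i == ll and len(debt_q) > 0) for i, b in enumerate(busy)]
    if not any(active):
        return None, credits
    # Step 1: dynamic credits plus residual bandwidth, as in BGPQ.
    residual = Fraction(0)
    for i, share in enumerate(shares):
        if active[i]:
            credits[i] += share
        else:
            residual += share
    credits[next(i for i in priority if active[i])] += residual
    winner = max((i for i in priority if active[i]), key=lambda i: credits[i])
    served = winner
    # Step 2: if the indebted, empty queue wins, hand the slot back to
    # the oldest lender recorded in the DebtQ.
    if winner == ll and not busy[ll]:
        served = debt_q.popleft()
    # Step 4: the scheduled queue (the winner) is charged, not the lender.
    credits[winner] -= 1
    return served, credits

shares = [Fraction("0.1"), Fraction("0.2"), Fraction("0.7")]
start = [Fraction("0.3"), Fraction("0.0"), Fraction("-0.3")]
debt_q = deque([2])
served, credits = repay_cycle(start, shares, [False, False, True],
                              [0, 1, 2], debt_q)
print(served, [float(c) for c in credits], list(debt_q))
# 2 [-0.4, 0.0, 0.4] []
```

After the repay, Q2 has recovered the credit it lost when Q0 borrowed its slot, and the credits still sum to zero.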
CBR Mechanism
Minimum latency guarantee using CBR
- No need to wait for requests in other queues
Worst case: Q0 is not empty while DebtQ is full
- No minimum latency guarantee in this case
Implementation in FPGA
CBR MPMC top-level diagram
- Instantiation-time configurable port number
- Run-time programmable priority and bandwidth
Implementation in FPGA
[Figures: credit calculation circuit; sorting network and CBR]
Implementation Cost
8-port CBR-MPMC with 16-depth DebtQ
- Xilinx Virtex-5 XC5VLX50T
- Speedy DDR backend memory controller
Evaluation
Simulation Framework
- Cycle-accurate C model of MPMC
- Simple close-page DDR memory model
- Trace capturing and converting method
Evaluation
CPU workload trace file (from B. Jacob)
- Cache simulation on standard SPEC2000 integer benchmark
Irregular and low bandwidth requirement:
0.4 memory transactions per 1k instructions.
Evaluation
Accelerator Workload
- ALPBench suite of parallel multimedia applications
Periodically repeated access pattern, high bandwidth requirement:
18.3 memory transactions per 1k instructions.
Results
BGPQ Scheduler
- Latency: number of clock cycles
- Bandwidth: number of memory transactions per 1k clock cycles
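The two metrics can be computed from a trace roughly as follows. This is a sketch only: the function names and the trace layout (issue and completion cycle per transaction) are hypothetical, averaging the latency is an assumption since the slide only gives the unit, and the numbers in the usage lines are made up for illustration and are not results from the talk.

```python
def mean_latency(issue_cycles, done_cycles):
    """Average latency in clock cycles over completed transactions."""
    latencies = [d - i for i, d in zip(issue_cycles, done_cycles)]
    return sum(latencies) / len(latencies)

def bandwidth_per_kcycle(num_transactions, total_cycles):
    """Memory transactions per 1k clock cycles."""
    return 1000 * num_transactions / total_cycles

# Made-up sample values, for illustration only.
print(mean_latency([0, 10, 20], [8, 22, 30]))  # 10.0
print(bandwidth_per_kcycle(45, 3000))          # 15.0
```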
Results
CBR Scheduler with a 16-depth DebtQ
Impact of DebtQ Size
Repay conditions:
- DebtQ is full
- Q0 is empty
[Figure: CBR scheduler with DebtQ; queues Q0, Q1 and Q2 with bandwidth shares of 10%, 20% and 70%; priority Q0 > Q1 > Q2; credits (Q0, Q1, Q2) = (0.6, 0.0, 0.4)]
When DebtQ is full, remaining requests in Q0 will not be served with minimum latency guarantee!
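The borrow limit and the repay conditions can be written as two small predicates. This is a sketch under assumptions: the function names are hypothetical, and treating the two listed repay conditions as alternatives (either one triggers a repay) is an interpretation of the slide, not something it states.

```python
from collections import deque

def can_borrow(debt_q, depth):
    """Q0 may borrow a slot only while the DebtQ has a free entry."""
    return len(debt_q) < depth

def should_repay(debt_q, depth, q0_busy):
    """Assumption: a repay is due when the DebtQ is full or Q0 is empty."""
    if not debt_q:
        return False  # nothing is owed
    return len(debt_q) >= depth or not q0_busy

debt_q = deque([2, 2])  # hypothetical lender IDs
print(can_borrow(debt_q, depth=2))                   # False
print(should_repay(debt_q, depth=2, q0_busy=True))   # True
print(should_repay(debt_q, depth=16, q0_busy=True))  # False
```

When `can_borrow` is false, a pending Q0 request has to compete for the slot like any other queue, which is the case where the minimum latency guarantee is lost.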
Impact of DebtQ Size
How big is enough for DebtQ?
- Determined by the instantaneous bandwidth requirement
Irregular access pattern means:
- Large range of DebtQ size requirement
Tradeoff
- Resource efficiency vs. performance
Results
Impact of debt queue size
Conclusions
The CBR scheduler can provide minimum bandwidth and latency guarantees
Low implementation cost and power consumption
We expect its successful use in a wide range of multimedia applications
Questions?