Post on 18-Dec-2015
Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication
Maarten Wiggers
PhD student, University of Twente, NL
Co-authors and supervisors:
Marco Bekooij, NXP Semiconductors Research
Gerard Smit, University of Twente
Maarten Wiggers -- University of Twente 2
Outline
Context
– Streaming applications
– Programming multiprocessor architectures
Problem
– Problem statement
– Related work
Variable Rate Dataflow
– Chain topology
– Arbitrary graph topology
Experiment
Conclusion
[Wiggers – DATE 2008, Wiggers – RTAS 2008]
Application model
[Figure: use-cases containing an FRT video job (input data stream, tasks, output stream to display) and an FRT audio job (tasks, output stream to speakers)]
Jobs process streams of data
Jobs are composed of tasks
Simultaneously running jobs together form use-cases
Jobs often have real-time requirements
– Firm (FRT) if deadline misses are highly undesirable (steep quality degradation)
Task graphs
Jobs are implemented as task graphs
– Tasks communicate fixed-sized containers over fixed-sized FIFO buffers
Container is a place-holder for data
– Task has random access in container
Task only starts an execution when there are sufficient
– Full containers in input buffers
– Empty containers in output buffers (back-pressure)
• Back-pressure robustly prevents buffer overflow
Required quanta of containers can be
– Known at design-time
– Dependent on the actual processed stream
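The enabling condition above can be sketched in a few lines of Python. This is only an illustration: the names `Fifo`, `can_start` and `execute` are hypothetical and do not come from the talk, and real tasks would transfer data during the execution rather than in one step.

```python
from dataclasses import dataclass


@dataclass
class Fifo:
    capacity: int  # fixed number of containers in the buffer
    full: int = 0  # containers that currently hold data

    @property
    def empty(self) -> int:
        return self.capacity - self.full


def can_start(inputs, outputs):
    """inputs/outputs are lists of (fifo, quantum) pairs for one execution."""
    enough_data = all(f.full >= q for f, q in inputs)
    enough_space = all(f.empty >= q for f, q in outputs)  # back-pressure
    return enough_data and enough_space


def execute(inputs, outputs):
    """Run one execution if enabled; return whether the task executed."""
    if not can_start(inputs, outputs):
        return False
    for f, q in inputs:
        f.full -= q  # consumed containers become empty
    for f, q in outputs:
        f.full += q  # produced containers become full
    return True
```

Because a task never starts without enough empty containers on its outputs, `full` can never exceed `capacity`, which is the overflow protection the slide refers to.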
Example job – MP3 playback
MP3 decoding task consumes a variable number of bytes per frame
– Every execution a different number of bytes consumed
– BR task executes aperiodically
– No static-order schedule for BR and MP3 → run-time arbitration
Throughput constraint: sink needs to execute strictly periodically
– All tasks are pushing data towards the sink
– For sufficiently large buffers, sink can execute strictly periodically
n=[0,960]
Example job – H.263 video decoder
Variable length decoder (VLD) consumes a variable number of bytes per frame
VLD produces a variable number of blocks per frame
DQ and IDCT process blocks
Motion compensator assembles a frame from blocks
Throughput constraint : sink needs to execute strictly periodically
m=[0,6536] n=[0,2376]
Application trend
Behaviour of applications is increasingly input-data dependent, e.g.
– Entropy encoding
– Adaptation to channel conditions by digital radios
Reflected in
– Input-data dependent execution times
– Conditional execution of code
– Mode changes
– Input-data dependent execution rates
Input-data dependent execution rates require run-time arbitration
Trend challenge
Required properties
– Functionally deterministic behaviour: output values completely determined by input values
– Deadlock free
– Throughput constraint satisfied
Research challenge is to define models
– For which required properties are decidable
– That can model applications with input-data dependent behaviour
– That include effects of run-time arbitration
– E.g. Variable-Rate Dataflow
Multi-processor architecture template
Multi-processor system required for performance and power reasons
[Figure: architecture template with labels DSP, P, $, mem, Arb, CA, ctrl, I/O, External SDRAM, NI and Network-on-Chip]
[Hansson – TODAES 2008]
Compute settings
[Figure: dataflow synthesis takes a (cyclic) task graph, WCETs, a multiprocessor instance, and a throughput and latency constraint, and produces scheduler settings and buffer capacities]
Compute settings
Guarantees on end-to-end throughput require guarantees on deadlock-freedom
Models that provide end-to-end throughput guarantees are not Turing complete
– Poses restrictions on
• Applications: e.g. inter-task synchronisation behaviour
• Architectures: e.g. applicable run-time arbitration schemes
Goal: define a model that can guarantee throughput for H.263
Example
Every execution, task B can choose to consume either 2 or 3
Required buffer capacity for deadlock freedom?
Example (cont.)
Attempt : assume maximum consumption quantum in every execution
Requires buffer capacity of 3 for deadlock freedom
Example (cont.)
However, when consuming the minimum quantum
Buffer capacity of 3 is insufficient!
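The effect can be reproduced with a small simulation. The slide figure is not part of this transcript, so the production quantum of task A is an assumption here: the sketch supposes that A produces 3 containers per execution, while B consumes 2 or 3. The function name `deadlocks` is hypothetical.

```python
def deadlocks(capacity, produce, consume_seq):
    """Return True if producer A and consumer B get stuck on one buffer.

    capacity    -- number of containers in the buffer
    produce     -- containers A produces per execution (assumed value)
    consume_seq -- quantum chosen by B in each of its executions
    """
    full = 0  # containers currently holding data
    i = 0     # index of the next execution of B
    while i < len(consume_seq):
        need = consume_seq[i]
        if full >= need:                   # B is enabled
            full -= need
            i += 1
        elif capacity - full >= produce:   # A is enabled (enough space)
            full += produce
        else:                              # neither task can start
            return True
    return False
```

With these assumed quanta, `deadlocks(3, 3, [3, 3, 3])` is `False`, but `deadlocks(3, 3, [2, 2, 2])` is `True`: after B consumes 2, one full container remains, so B lacks data and A lacks space.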
Problem
Compute buffer capacities
– Guarantee satisfaction of throughput constraint
– Tasks can require data-dependent quantum of data and space per execution
Assumptions
– Run-time arbitration on shared resources
– Upper and lower bounds on transferred quanta
– Upper bound on execution time
– Throughput constraint: sink or source that executes strictly periodically
Related work
Quasi static-order scheduling
– Transfer quanta change only after (sub)graph iterations
– For every iteration a static-order schedule is computed
• Bounded memory is decidable
– Models are amenable to code synthesis
– Examples
• Heterochronous Dataflow [Girault – TCAD 1999]
• Parameterised Dataflow [Bhattacharya – TSP 2001]
– Requirement that quanta change only after graph iterations is a global requirement
• Iteration is a graph property
• VLD parses stream and decides next quantum locally
– Static-order scheduling excludes overlapped schedules of graphs with different transfer quanta
Requirements on quanta change
Quasi static-order scheduling: 2*A and 3*B before change
Variable-Rate Dataflow: can change every firing
Related work
Variable token sizes instead of variable number of transferred tokens
– [Sen – ASSP 2005]
– Experiment will show that this results in larger buffers
– Variable consumption quantum by VLD depends on processed stream
• BR task is unaware of the semantics of the stream → cannot know quantum
Related work
Run-time arbitration
– Not required to compute schedules at design-time
– Only need to show that for all transfer quanta a schedule exists
– State-of-the-art
• Real-time calculus (group of Thiele at ETH Zurich)
• SymTA/S (group of Ernst at TU Braunschweig)
– These approaches have
• Difficulties with cyclic dependencies that influence the temporal behaviour
• No means to reason about bounded memory or deadlock properties, e.g. no concept similar to consistency
Phase 1 and 2
Next slides discuss buffer capacity computation in case of chain topology
Subsequent slides discuss extension to graphs
Variable Rate Dataflow (by example)
Implementation = task graph
Model = dataflow graph
Variable Rate Dataflow
Task graph
– Tasks
– Buffers
Tasks
– Have a bounded response time
– Consume and produce data between start and finish
Buffers have a finite and fixed capacity
Dataflow graph
– Actors
– Queues
Actors
– Have a fixed response time
– Consume tokens atomically at the start
– Produce tokens atomically at the finish
Queues have infinite depth
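Execution time → response time
[Figure: period T and time-slice T_x]
$$\mathit{WCRT}(w_x) = \mathit{WCET}(w_x) + (T - T_x)\left\lceil \frac{\mathit{WCET}(w_x)}{T_x} \right\rceil$$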
Explained in detail in [Wiggers – RTAS 2007]
Generalisation that includes all starvation-free schedulers in [Wiggers – SCOPES 2007]
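As a worked illustration, the step from execution time to response time can be computed as below. The sketch assumes a time-division (TDM) style scheduler with period T and time-slice T_x, and uses the bound WCRT = WCET + (T − T_x)·⌈WCET / T_x⌉ as reconstructed from the slide's garbled formula; the function name and the integer time unit are assumptions.

```python
def wcrt_tdm(wcet: int, period: int, time_slice: int) -> int:
    """Worst-case response time of a task with a time slice in every period.

    All arguments are in the same integer time unit (e.g. cycles).
    """
    if wcet < 0 or not 0 < time_slice <= period:
        raise ValueError("need wcet >= 0 and 0 < time_slice <= period")
    slices_needed = -(-wcet // time_slice)  # integer ceiling of wcet / time_slice
    # In the worst case each needed slice is preceded by the part of the
    # period that is allocated to other tasks.
    return wcet + (period - time_slice) * slices_needed
```

For example, a task with WCET 5, a slice of 2 and a period of 10 needs three slices, giving a bound of 5 + 8·3 = 29 time units.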
Task graph: input specification
Dataflow graph: analysis vehicle
Approach
Model task graph on architecture by Variable-Rate Dataflow graph
Let actor vτ model the throughput-constraining task
Compute sufficient number of tokens to enable actor vτ to execute strictly periodically
Computed number of tokens equals required buffer capacity
– One-to-one correspondence
• Containers in task graph – tokens in dataflow graph
• Enabling condition of task – firing rule of actor
• Containers consumed and produced – tokens consumed and produced
– Execution times of actors are upper bounds on execution times of tasks
– Self-timed execution of Variable-Rate Dataflow is temporally monotonic
Monotonic temporal behaviour
VRDF actors have sequential firing rules [Lee – 1995]
– The number of tokens that is required to be present on inputs is completely determined by already consumed tokens
VRDF actors are functional
– The produced tokens are a function of the consumed tokens
Given self-timed execution, if a token arrives earlier on an input, then
– This can only lead to an earlier satisfaction of the firing rule, and
– This can only lead to an earlier production of the same tokens
E.g. a smaller response time of a VRDF actor cannot lead to any later token arrival time
Because of scheduling anomalies this is not true for the task graph!
– A smaller response time can lead to later container arrival times
Token arrival times conservatively bound container arrival times
Approach – computation of sufficient tokens
Find valuation of token transfer parameters that leads to maximum required token transfer rates
On each edge, take maximum required rate as the slope of
– A linear upper bound on token production times, and
– A linear lower bound on token consumption times
Derive offset of linear bounds such that for all sequences of transfer quanta there exists a schedule for which bounds are conservative
– Offset is relative to start of first firing of actor
Use linear bounds to compute sufficient number of initial tokens
This number of tokens is also sufficient for smaller transfer rates
Approach – step 1
Determine on each edge the maximum required transfer and firing rates
Sink has to fire strictly periodically
Maximum required transfer rate on edge for
– Maximum consumption quantum
Maximum required firing rate of A for
– Minimum production quantum
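For a single edge from a producer A to a strictly periodic sink, step 1 amounts to two divisions. The following is a minimal sketch under that single-edge assumption; the function and parameter names are hypothetical, and a minimum production quantum of zero is excluded because it would give an unbounded firing rate.

```python
from fractions import Fraction


def max_required_rates(sink_period, max_consumption, min_production):
    """Return (transfer rate in tokens/time, firing rate of A in firings/time)."""
    if sink_period <= 0 or min_production <= 0:
        raise ValueError("sink_period and min_production must be positive")
    # The sink fires once per period and takes at most max_consumption tokens.
    transfer_rate = Fraction(max_consumption, sink_period)
    # A must sustain that rate even when it produces its minimum quantum.
    firing_rate = transfer_rate / min_production
    return transfer_rate, firing_rate
```

For instance, a sink with period 10 that consumes at most 3 tokens per firing requires 3/10 tokens per time unit; if A produces at least 2 tokens per firing, A must be able to fire at 3/20 firings per time unit.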
Approach – step 2
Actor starts at t=0
Consumes tokens at start
Produces tokens at finish
Finish – start = response time
Given linear bounds on production and consumption times
Find difference between bounds that allows existence of schedule for all sequences of quanta
Larger quantum → larger difference between bounds
Larger quantum → larger delay of next start time
If the largest quantum is between the bounds, then every sequence is between the bounds
Approach – step 3
Buffer capacity is maximum difference between tokens consumed and produced
Difference between linear bounds is buffer capacity
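The statement that capacity is the maximum difference between produced and consumed tokens can be checked on a concrete trace. This sketch does not use the linear bounds of the talk; it simply measures the peak occupancy of one buffer for a given list of timed transfers. The event format and the name `max_occupancy` are assumptions.

```python
def max_occupancy(events):
    """Peak number of tokens in a buffer for a trace of transfers.

    events -- iterable of (time, delta); delta > 0 is a production,
              delta < 0 a consumption of that many tokens.
    """
    occupancy = 0
    peak = 0
    # At equal times, count productions before consumptions so that the
    # measured peak is conservative.
    for _, delta in sorted(events, key=lambda e: (e[0], -e[1])):
        occupancy += delta
        peak = max(peak, occupancy)
    return peak
```

For the trace `[(0, 3), (1, -2), (2, 3), (3, -2), (4, -2)]` the occupancy runs 3, 1, 4, 2, 0, so the function returns 4.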
Approach – step 4
Buffer capacities are sufficient for smaller rates
Smaller rate by A → delay in schedule of A
VRDF graphs have linear temporal behaviour
– A delay Δ in production time cannot lead to a production that is delayed by more than Δ
Chains of buffers
Find the maximum firing rates for all actors
Compute buffer capacities for these rates
If MP3 consumes less, then starts of BR are postponed
By linearity data will still arrive on time at MP3
Computed buffer capacities verified in our dataflow simulator
Relaxing constraints on topology
Graph definition
– Consistency of task graph
– Consistency is not sufficient for bounded memory
Computation of buffer capacities is now a global problem
Parameter communication
Communication of parameter values
Enables modelling of conditional execution of tasks
Sequential firing rules
if-then-else
Buffer capacities computed for all combinations of sequences of t and f
t=!f (mutual exclusivity) is just a subset
Model abstracts from actual relations between parameters
Consistency
Transfer quanta on edges determine relative firing rates
[Lee – TC 1987][Lee – TPDS 1991]
Multiple paths between two actors
– Requires check whether there exist firing rates with bounded memory
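For fixed transfer quanta, the check on relative firing rates is the classical balance-equation test: on every edge, produced quantum times the firing rate of the source must equal consumed quantum times the firing rate of the destination. The sketch below covers only this fixed-quantum case, not the symbolic parameters of Variable-Rate Dataflow; the function name and edge format are assumptions.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Solve the balance equations of a connected fixed-rate dataflow graph.

    edges -- list of (src, dst, produced, consumed) with positive quanta.
    Returns a dict actor -> integer firing count, or None if inconsistent.
    """
    neighbours = {a: [] for a in actors}
    for src, dst, produced, consumed in edges:
        # produced * rate[src] == consumed * rate[dst]
        neighbours[src].append((dst, Fraction(produced, consumed)))
        neighbours[dst].append((src, Fraction(consumed, produced)))
    rate = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        a = pending.pop()
        for b, ratio in neighbours[a]:
            required = rate[a] * ratio
            if b not in rate:
                rate[b] = required
                pending.append(b)
            elif rate[b] != required:
                return None  # two paths demand different relative rates
    if len(rate) != len(actors):
        raise ValueError("graph must be connected")
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    return {a: int(r * scale) for a, r in rate.items()}
```

A graph with edges A→B (1,1), B→C (1,1) and A→C (2,1) returns `None`: the path through B requires C to fire as often as A, while the direct edge requires it to fire twice as often.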
Consistency
Fixed transfer quanta cannot model data-dependent behaviour
Allowing for different transfer quantum in every firing
Specification of intervals is insufficient
Therefore introduce transfer parameters
Variable-Rate Dataflow graph is (strongly) consistent if there exists a non-trivial symbolic solution to the symbolic balance equations
Consistency is insufficient
Boolean dataflow graph
Bounded memory depends on control values
Bounded memory can be undecidable
[Buck – 1993]
Chosen restriction
In the VRDF graph we require that repetition rate of actors in this sub-graph is one
Every parameter value should correspond with an iteration of this sub-graph
This restriction implies that (strong) consistency is sufficient for bounded memory
Buffer capacities
Requirement
– Sink determines throughput for all transfer quanta
Tasks are pushing data to sink
– Different quanta imply different task execution rates
– Tasks always need to be able to follow
Buffer capacity
– Should enable tasks to follow maximum required rate
– Variation in quanta requires larger buffers
General topology
[Figure: actors A, B and C connected by edges labelled β=1, annotated with start times s=0, s=1 and s=2 and a difference of 2]
Minimum difference between start times of actors
– Not a property of an edge
– Determined by all paths
Network flow problem
– Constraints
• minimum differences per edge
– Objective
• start times as close together as possible
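Why the difference between start times is determined by all paths can be seen with a much simpler technique than the network flow formulation of the talk: a longest-path relaxation over the per-edge minimum differences. The sketch below is that substitute technique, not the authors' algorithm, and the edge structure A→B, B→C, A→C with a minimum difference of 1 each is an assumed reading of the missing figure.

```python
def earliest_starts(actors, min_diff):
    """Smallest non-negative start times with s[v] - s[u] >= d per edge.

    min_diff -- dict mapping (u, v) to the minimum difference d.
    Raises ValueError if the constraints cannot be satisfied
    (a cycle with positive total difference).
    """
    s = {a: 0 for a in actors}
    for _ in range(len(actors) - 1):
        changed = False
        for (u, v), d in min_diff.items():
            if s[u] + d > s[v]:
                s[v] = s[u] + d  # push v later to respect this edge
                changed = True
        if not changed:
            break
    if any(s[u] + d > s[v] for (u, v), d in min_diff.items()):
        raise ValueError("constraints contain a positive cycle")
    return s
```

With `{("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 1}` the result is A=0, B=1, C=2. The edge from A to C only asks for a difference of 1, but the path through B forces a difference of 2.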
H.263 decoder
m is number of bytes read per picture
n is number of blocks per picture
Motion compensation needs to know how many blocks to read to assemble a picture
Buffer capacity
Our implementation
– Buffer capacity is in blocks
Alternative implementation
– Buffer capacity is in frames
Conclusion
Trend: streaming applications are increasingly dynamic
– Include tasks that have data-dependent execution rates
– Implies run-time arbitration
Variable Rate Dataflow
– Production and consumption quanta can change in every execution
– Can include effects of run-time arbitration
– Efficient checks on execution in bounded memory
Compute buffer capacities that guarantee satisfaction of a throughput constraint
– Temporal monotonicity: token arrival times conservatively bound container arrival times
– Temporal linearity: a Δ later token arrival time cannot result in any token arrival time that is delayed by more than Δ
References
[Bhattacharya – TSP 2001] B. Bhattacharya and S.S. Bhattacharyya. Parameterized Dataflow Modeling for DSP Systems. IEEE Transactions on Signal Processing. October 2001
[Buck – 1993] J. Buck. Scheduling Dynamic Dataflow Graphs with Bounded Memory using the Token Flow Model. PhD thesis, University of California, Berkeley. 1993
[Girault – TCAD 1999] A. Girault, B. Lee and E.A. Lee. Hierarchical Finite State Machines with Multiple Concurrency Models. IEEE Transactions on CAD. June 1999
[Hansson – TODAES 2008] A. Hansson, K.G.W. Goossens, M.J.G. Bekooij and J. Huisken. CoMPSoC: A Composable and Predictable Multi-Processor System on Chip Template. ACM Transactions on Design Automation of Electronic Systems. To appear
[Lee – TC 1987] E.A. Lee and D. Messerschmitt. Static Scheduling of Synchronous Dataflow Programs for Digital Signal Processing. IEEE Transactions on Computers. January 1987
[Lee – TPDS 1991] E.A. Lee. Consistency in Dataflow Graphs. IEEE Transactions on Parallel and Distributed Systems. 1991
[Lee – 1995] E.A. Lee and T. Parks. Dataflow Process Networks. Proc. of the IEEE. May 1995
[Sen – ASSP 2005] M. Sen, S.S. Bhattacharyya, T. Lv, and W. Wolf. Modeling Image Processing Systems with Homogeneous Parameterized Dataflow Graphs. In Proc. ASSP. March 2005
[Wiggers – RTAS 2007] M.H. Wiggers, M.J.G. Bekooij, P.G. Jansen and G.J.M. Smit. Efficient Computation of Buffer Capacities for Cyclo-Static Real-Time Systems with Back-Pressure. In Proc. RTAS. April 2007
[Wiggers – SCOPES 2007] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Modelling Run-Time Arbitration by Latency-Rate Servers in Dataflow Graphs. In Proc. SCOPES. April 2007
[Wiggers – DATE 2008] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Computation of Buffer Capacities for Throughput Constrained and Data-Dependent Inter-Task Communication. In Proc. DATE. April 2008
[Wiggers – RTAS 2008] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication. In Proc. RTAS. April 2008