temporal placement. 2 wasted resources wasted resource wr(v i ) of a node v i : unused area...
TRANSCRIPT
![Page 1: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/1.jpg)
Temporal Placement
![Page 2: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/2.jpg)
2
Wasted Resources• Wasted resource wr(vi) of a node vi:
Unused area occupied by the node vi during the computation of a partition.
wr(vi) = (t(Pi)−ti)×ai, t(Pi): run-time of partition Pi.ti: run-time of the component vi
ai: area of vi
• Wasted resource avoidance: A component is placed on the chip only when its computation is
required and remains on the device only for time it is active.
− Idle components can be replaced by new ones.• Assumptions:
There is partial reconfiguration support. Blocks are hard
− For soft library modules, temporal partitioning + 2d placement may suffice.
![Page 3: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/3.jpg)
3
• Temporal placement: Time-dependent placement Management of task execution at run-time
• Graphical Representation: Z axis: life time of a configuration. Some overlaps (resource sharing) in 2-D is
allowed:− If not at the same time.
Temporal Placement
A horizontal cut: RFU configuration at a given time
![Page 4: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/4.jpg)
4
Temporal Placement
• Off-line (compile time) Pre-defined placement sequence
− In DSP applications, program flow can be predicted.
• On-line (during execution) Computation sequence is not known in advance.
− Dynamically at run time. Needs to solve computational intensive problems in a
fraction of millisecond:− Efficient management of the free space− Selection of the best site for a new component − Management of communication
![Page 5: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/5.jpg)
5
Off-Line Temporal Placement
• Off-line temporal placement: Given a DFG G = (V,E) and a reconfigurable device H
(with length Hx and width Hy), a temporal placement is a 3-dimensional vector function:
p = (px, py, pt) : V → R3,
pt defines a feasible schedule (pt(vi): Starting time of vi.)
px(vi), py(vi): Coordinates of vi on H,
![Page 6: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/6.jpg)
6
Off-Line Temporal Placement
[Bazargan]
![Page 7: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/7.jpg)
7
Off-Line Temporal Placement
• Algorithms:
1. First fit
2. Best fit
3. Packing
4. Simulated annealing (KAMER)
![Page 8: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/8.jpg)
8
First/Best Fit Algorithm
• First fit:
1. Select a node among ready nodes.− i.e. nodes whose predecessors have been placed
2. Place it in the first free location.
• Advantages: Fast (linear w.r.t. number of free locations)
• Disadvantages: Unused resources very high Fragmentation
![Page 9: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/9.jpg)
9
First/Best Fit Algorithm
• Best fit:1. Select a node among ready nodes.2. Place it in the best free location.
• Advantages: Better area utilization
• Disadvantages: Much slower:
− Complete list of free locations must be searched. Fragmentation
• Simplifying the problem: Restricting the modules to columns (Virtex II) or part
of a column (Virtex 4 / Virtex 5 / Virtex 6).
Frames
CLBs
![Page 10: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/10.jpg)
10
First Fit Algorithm• Algorithm: First-fit temporal placement of clusters
1. While all the clusters are not placed do
1.1. Select one cluster Cact from the list of ready-to-run clusters
1.2. From the clusters already placed, select the cluster Ctop with the smallest finishing-time such that
Ctop ≤ Cact
1.3. Place the cluster Cact on top
of Ctop
2. end while
![Page 11: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/11.jpg)
11
First Fit Algorithm• Configuration sequence
produced:
a) {0, 1, 2, 3, 4}
b) {5, 1, 2, 3, 4}
c) {5, 1 , 6, 7, 4}
d) {5, 1 , 6, 7, 8}
e) {5, 9 , 6, 7, 8}
f) {10, 9 , 6, 7, 8}
![Page 12: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/12.jpg)
12
ILP
• ILP: Can construct a constraint set An objective function
• Advantage: Exact solution
• Limitation: Only for small size problems.
![Page 13: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/13.jpg)
13
Packing Approach
• Variations of Packing Problem:
1. Base Minimization Problem (BMP): Given a set of boxes B and a height H, find a container with
minimal size (x, y, H) that can accommodate the set of boxes B i.e. in temporal placement:
− Find the device with minimum size (x, y) on which a set of components can be implemented given an overall run-time t = H.
2. Strip Packing Problem (SPP): Given a set of boxes and a base (X, Y), find the minimum height h
container with size (X, Y, h) that can hold all the boxes. i.e. in temporal
− Find the minimum run-time t = h for a set of components given a reconfigurable device with size (X, Y).
• The device is usually fixed SPP is more common.
![Page 14: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/14.jpg)
KAMER
![Page 15: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/15.jpg)
15
KAMER Model of an RCS
• Larger picture: An RFU operation ri can be
either accepted or rejected− based on availability of
RFU real-estate If rejected, run on CPU
− running time penalty
• Assumption: No communications among
RFUs After running an operation
on RFU, results are saved in CPU registers
![Page 16: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/16.jpg)
16
KAMER Model of an RCS
• On-line temporal placement
• RFUOPS = {r1, …, rn| ri = (wi, hi, si, ei)}
ei - si : time span the operation is resident in the system
• Placement of RFUOPS: Modeled as 3D template placement
• Empty Rectangle (ER): A rectangle that does not overlap a placed module on the
chip• Maximum Empty Rectangle (MER):
An ER not included in any other rectangle than the device bounding box
![Page 17: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/17.jpg)
17
Example
• Non-ER: (E,F)
• ER: (E,D): MER (A,D): MER (E,C): not MER
• KAMER method: keeping all maximum empty rectangles:
− Permanently keeps track of all MERs
− Whenever a request for placing a component v arrives, the list of MERs is searched for a rectangle that can accommodate v.
Possible to have many MERs in which v can fit Strategies: first-fit or best-fit
![Page 18: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/18.jpg)
18
KAMER
Once a rectangle is chosen:− Candidate points are those that do not allow an overlap
with the external part of the rectangle.
• Problem: Number of empty rectangles does not grow linear with
the number of components included
![Page 19: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/19.jpg)
19
KAMER
Run-time of free rectangle placement: O(n2)
• Heuristic: Keep only the non-overlapping empty rectangles
− Linear time, lower quality
• Problem 1: Several non-overlapping rectangle representations
![Page 20: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/20.jpg)
20
Keeping Non-Overlapping ERs
• Problem 2: Non-overlapping empty rectangles are not necessarily
maximal− A module may exists that could fit onto the device, but
cannot be placed
Can fitCan’t fit
(bad decomposition)
![Page 21: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/21.jpg)
21
Keeping Non-Overlapping ERs
• Problem 2:
Can fitCan’t fit
![Page 22: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/22.jpg)
22
Keeping Non-Overlapping ERs
• Incremental update: Whenever a new component v1 is placed in a rectangle,
two possibilities:− Horizontal splitting− Vertical splitting
Negative impact on next components
![Page 23: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/23.jpg)
23
Keeping Non-Overlapping ERs
• Horizontal
Can’t fit
Can fit
• Vertical
•Solution:Delaying the split decision for a number of steps later
![Page 24: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/24.jpg)
24
Keeping Non-Overlapping ERs
KAMER always finds a rectangle to place a new component, if one exists.
But position to place the component within the rectangle must be selected from a set of points
− whose number is the area of the rectangle in worst case• In [Bazargan00]:
Bottom-left position− Note: No communication is assumed between the rectangle to be
placed and those already placed.• If communication exists:
An optimal algorithm should consider any single point• Solution:
[Ahmadinia04]
![Page 25: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/25.jpg)
25
KAMER Off-Line Placement
• Uses simulated annealing in a 3D space:
1. Places X% largest modules− X is a parameter of the algorithm− Idea: Remove small modules
− Open space for larger ones− Small modules fragment more
Initial placement: from KAMER- best fit Perturb:
− Can displace a module on the RFU− No change in start or end time
− Can move an operation from CPU to RFU and vice versa
2. Use zero-temperature SA for fine tuning Place as many (100-X)% modules as it can
![Page 26: Temporal Placement. 2 Wasted Resources Wasted resource wr(v i ) of a node v i : Unused area occupied by the node v i during the computation of a partition](https://reader033.vdocuments.mx/reader033/viewer/2022050909/56649c745503460f949271e9/html5/thumbnails/26.jpg)
35
References [Bobda07] C. Bobda, “Introduction to Reconfigurable Computing:
Architectures, Algorithms and Applications,” Springer, 2007. [Hauck08] S. Hauck, A. DeHon, "Reconfigurable Computing: The
Theory and Practice of FPGA-Based Computation" Morgan-Kaufmann, 2007
[Mehdipour06] F. Mehdipour*, M. Saheb Zamani, M. Sedighi, “An integrated temporal partitioning and physical design framework for static compilation of reconfigurable computing systems,” Journal of Microprocessors and Microsystems, Elsevier, v30, 2006, pp. 52–62.
[Bazargan00] K. Bazargan, R. Kastner and M. Sarrafzadeh, "Fast Template Placement for Reconfigurable Computing Systems," IEEE Design and Test - Special Issue on Reconfigurable Computing, January-March 2000.
[Bazargan] Bazargan, “Fast Placement Methods for Reconfigurable Computing Systems,” http://www.ece.umn.edu/~kia/Research/Pre2000/RCSPlace/
[Ahmadinia04] A. Ahmadinia, C. Bobda, M. Bednara, and J. Teich, “A new approach for on-line placement on reconfigurable devices,” in Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS) / Reconfigurable Architectures Workshop (RAW), 2004.