Faculty of Sciences and Technology, University of Algarve, Faro, Portugal
João M. P. Cardoso
April 30, 2001
IEEE Symposium on Field-Programmable Custom Computing Machines, Rohnert Park, CA, USA
A Novel Algorithm Combining Temporal Partitioning and Sharing of Functional Units
Index
Introduction
Temporal Partitioning
Problem Definition
New vs Previous Approach
Algorithm Working Through an Example
Experimental Results
Related Work
Conclusions
Future Work
Introduction
“Virtual Hardware”: reuse of devices; save silicon area; view of “unlimited resources”; enabled by dynamically reconfigurable FPGAs
Two concepts: context switching among functionalities; allowing a large “function” to be executed
FPGA devices allowing virtualization: off-chip configurations; on-chip configurations
Several research efforts…
[Figure: example dataflow graph with inputs x, y, u, dx and outputs x_1, y_1, u_1]
Size larger than the available reconfigware area?
Answers: temporal partitioning; sharing of functional units
Goal: combining the two...
Temporal Partitioning
[Figure sequence: the example dataflow graph split into two temporal partitions placed one after the other along a time axis; the value aux1 appears in both partitions]
Create temporal partitions to be executed by time-sharing the device
Netlist level (structural): difficulties when dealing with feedback; loss of information; flat structure; intricate for exploiting sharing of functional units
Behavioral level (functional): loops can be explicitly represented; better design decisions; “a must” for compilers for reconfigurable computing
Problem Definition
But what if we decrease the needed area by sharing functional units?
Simultaneous temporal partitioning and sharing of functional units
THE PROBLEM:
Given a dataflow graph (representing a behavioral description), a library of components,...
Map the dataflow graph onto the available resources of the FPGA device: considering sharing of functional units; considering temporal partitioning; decreasing the overall execution latency
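The constraints in the problem statement can be made concrete with a small checker. The following is a minimal Python sketch, not the talk's algorithm: it only tests whether a candidate temporal partitioning is legal when no functional units are shared. The function name `is_valid_partitioning`, the four-node graph and its areas are hypothetical, chosen for illustration.

```python
from collections import defaultdict

def is_valid_partitioning(edges, area, tp_of, available_area):
    """Check a candidate temporal partitioning (no sharing of functional units).

    edges: list of (producer, consumer) node pairs of the dataflow graph
    area: dict node -> area in cells
    tp_of: dict node -> index of the temporal partition that executes the node
    """
    # A consumer may not run in an earlier partition than its producer.
    for u, v in edges:
        if tp_of[u] > tp_of[v]:
            return False
    # Without sharing, every node needs its own functional unit.
    used = defaultdict(int)
    for node, tp in tp_of.items():
        used[tp] += area[node]
    return all(cells <= available_area for cells in used.values())

# Hypothetical graph: two adders feed a multiplier, which feeds an adder.
edges = [("a", "m"), ("b", "m"), ("m", "c")]
area = {"a": 1, "b": 1, "m": 2, "c": 1}

ok = is_valid_partitioning(edges, area, {"a": 0, "b": 0, "m": 1, "c": 1}, 3)
too_big = is_valid_partitioning(edges, area, {"a": 0, "b": 0, "m": 0, "c": 1}, 3)
backwards = is_valid_partitioning(edges, area, {"a": 1, "b": 0, "m": 0, "c": 1}, 3)
```

Here `ok` is accepted, `too_big` is rejected because partition 0 would need 4 cells, and `backwards` is rejected because the multiplier would run before one of its inputs. The check says nothing about latency, which the problem statement also asks to decrease.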
New vs Previous Approach
[Figure: two design flows]
Previous: DFG/CDFG and constraints -> Temporal Partitioning -> High-Level Synthesis (with Component Library) -> Circuit generation, Logic Synthesis
New: DFG/CDFG and constraints -> Simultaneous Temporal Partitioning and High-Level Synthesis (with Component Library) -> Circuit generation, Logic Synthesis
Algorithm Working Through an Example
Suppose the following dataflow graph. Consider:
Area(+) = 1 cell; Area(x) = 2 cells; Delay(+) = 1 control step (cs); Delay(x) = 2 cs
Total area of the DFG: 8 cells
Available area: 3 cells
[Figure: dataflow graph with nodes 0 to 5]
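The area figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes an operator mix of four additions and two multiplications, which is the only mix of six nodes that gives 8 cells with the stated areas; the slide itself does not label the node types. The bound it computes is a generic lower bound for the no-sharing case, not necessarily how the initial number of TPs is chosen in the talk.

```python
import math

AREA = {"+": 1, "x": 2}  # cells per operator, from the slide

# Assumed operator mix: 4 additions and 2 multiplications (4*1 + 2*2 = 8 cells).
ops = ["+", "+", "x", "x", "+", "+"]

total_area = sum(AREA[op] for op in ops)
available_area = 3

# The whole graph does not fit in one configuration of the device.
fits_in_one_configuration = total_area <= available_area

# Lower bound on the number of temporal partitions when nothing is shared.
min_partitions_without_sharing = math.ceil(total_area / available_area)
```

With these numbers the graph needs at least 3 temporal partitions if every node gets its own functional unit, which is what motivates sharing units to shrink the area.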
Calculate ASAP and ALAP values

| Node | 0 | 1 | 2 | 3 | 4 | 5 |
|------|---|---|---|---|---|---|
| ASAP | 0 | 0 | 1 | 0 | 2 | 3 |
| ALAP | 1 | 1 | 2 | 0 | 2 | 3 |
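For readers who want to reproduce the ASAP and ALAP values, here is a minimal Python sketch. The edge list and the choice of nodes 2 and 3 as multipliers are assumptions: the figure's edges did not survive in this transcript, so they were picked to be consistent with the table and with the 8-cell total area. The ASAP/ALAP computation itself is the standard one.

```python
# Assumed DFG (hypothetical edges chosen to reproduce the slide's table).
DELAY = {0: 1, 1: 1, 2: 2, 3: 2, 4: 1, 5: 1}  # cs; nodes 2 and 3 assumed multipliers
EDGES = [(0, 2), (1, 2), (3, 4), (4, 5)]
NODES = sorted(DELAY)  # node ids happen to be in topological order here

preds = {n: [u for u, v in EDGES if v == n] for n in NODES}
succs = {n: [v for u, v in EDGES if u == n] for n in NODES}

# ASAP: a node starts as soon as its last predecessor has finished.
asap = {}
for n in NODES:
    asap[n] = max((asap[p] + DELAY[p] for p in preds[n]), default=0)

# Schedule length implied by the ASAP schedule.
latency = max(asap[n] + DELAY[n] for n in NODES)

# ALAP: latest start that still lets every successor start on time.
alap = {}
for n in reversed(NODES):
    alap[n] = min((alap[s] for s in succs[n]), default=latency) - DELAY[n]

print(asap)
print(alap)
```

Under these assumptions the computed start times match the table and the unconstrained schedule length is 4 cs.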
Identify the critical path
Nodes with ASAP = ALAP lie on the critical path: nodes 3, 4 and 5.
[Figure: the same graph and ASAP/ALAP table with the critical path marked]
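The critical path can be read off the ASAP/ALAP values as the nodes with zero mobility. The short Python sketch below uses the values from the slide's table; the variable names are illustrative. In a graph with several critical paths the zero-mobility nodes would need to be grouped by following edges, which this sketch does not do.

```python
# ASAP and ALAP start times copied from the slide's table.
asap = {0: 0, 1: 0, 2: 1, 3: 0, 4: 2, 5: 3}
alap = {0: 1, 1: 1, 2: 2, 3: 0, 4: 2, 5: 3}

# Mobility (slack) is how far a node can slide without stretching the schedule.
mobility = {n: alap[n] - asap[n] for n in asap}

# Zero-mobility nodes, ordered by start time, form the critical path here.
critical_path = sorted((n for n in mobility if mobility[n] == 0), key=asap.get)
```

Nodes 0, 1 and 2 each have one control step of slack, which is what leaves room to move them between temporal partitions in the steps that follow.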
Create an initial number of TPs: suppose 3
[Figure: three empty temporal partitions (1, 2, 3) drawn against MAXCS and Area axes, next to the graph]
Map each node of the critical path on each temporal partition
[Figure: node 3 in partition 1 (2 cs), node 4 in partition 2 (1 cs), node 5 in partition 3 (1 cs)]
Try to map nodes in each temporal partition (1), (2), (3)
[Figure sequence over six slides: placement attempts for the remaining nodes 0, 1 and 2 in temporal partitions 1, 2 and 3, drawn against the MAXCS and Area axes]
Relax: add 1 clock step to MAXCS
Try to map nodes in each temporal partition (1), (2)
[Figure sequence over four slides: with MAXCS relaxed by one step, the placement attempts are repeated for temporal partitions 1 and 2]
Merge Operation (1)
[Figure: temporal partitions 1 (2 cs) and 2 (2 cs) are merged into partition "1,2" (4 cs); partition 3 (1 cs) is unchanged]
Merge Operation (2)
[Figure: partition "1,2" (4 cs) and partition 3 (1 cs) are merged into a single partition "1,2,3" (4 cs)]
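The final slide shows all six nodes in one temporal partition taking 4 cs. As a plausibility check, the Python sketch below schedules the example with one shared adder and one shared multiplier (1 + 2 = 3 cells, the available area). It uses plain resource-constrained list scheduling with ALAP as the priority, which is a swapped-in textbook technique and not the relax-and-merge algorithm presented in the talk. The edges and the multiplier nodes are the same assumptions as before, reconstructed from the ASAP/ALAP table.

```python
DELAY = {0: 1, 1: 1, 2: 2, 3: 2, 4: 1, 5: 1}          # cs per node
KIND = {0: "+", 1: "+", 2: "x", 3: "x", 4: "+", 5: "+"}  # assumed operator types
EDGES = [(0, 2), (1, 2), (3, 4), (4, 5)]                  # assumed edges
ALAP = {0: 1, 1: 1, 2: 2, 3: 0, 4: 2, 5: 3}               # from the slide's table
UNITS = {"+": 1, "x": 1}  # one adder + one multiplier = 3 cells

def list_schedule():
    preds = {n: [u for u, v in EDGES if v == n] for n in DELAY}
    start, finish = {}, {}
    # For each unit instance, the control step at which it becomes free again.
    busy_until = {kind: [0] * count for kind, count in UNITS.items()}
    cs = 0
    while len(start) < len(DELAY):
        # Ready nodes: unscheduled, with every predecessor finished by this step.
        ready = [n for n in DELAY if n not in start
                 and all(p in finish and finish[p] <= cs for p in preds[n])]
        # Most urgent first: smallest ALAP, ties broken by node id.
        for n in sorted(ready, key=lambda n: (ALAP[n], n)):
            units = busy_until[KIND[n]]
            for i, free_at in enumerate(units):
                if free_at <= cs:  # non-pipelined unit, busy for the whole delay
                    start[n] = cs
                    finish[n] = cs + DELAY[n]
                    units[i] = finish[n]
                    break
        cs += 1
    return start, max(finish.values())

start, total_cs = list_schedule()
```

Under these assumptions the shared units complete the graph in 4 cs, the same length as the single merged partition on the slide, so sharing removes the need for reconfiguration in this small example.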
Experimental Results
Near-optimal w/o sharing vs sharing
[Chart: number of temporal partitions (#TPs, left axis; series #p(SA), #p(Our*), #p(Our*)) and performance improvement (right axis, -30% to 30%; series %(#cs-Our*), %(#cs-Our**)) for the examples EX1, SEHWA, HAL and EWF]
Near-optimal w/o sharing vs sharing (continued)
[Chart: the same series (#TPs on the left axis, performance improvement from -16% to 32% on the right axis) for the examples FIR and MAT4x4; data labels 72 and 37]
Performance vs No. of Temporal Partitions
Mult4x4, RMAX=10 (no sharing of adders)
[Chart: final #TPs and execution time (#cs) versus the initial number of TPs, from 1 to 25]
Is the algorithm good for scheduling?
[Chart: #cs of known scheduling results versus ours for EWF and SEHWA]
Comparison to some optimum results
Related Work
List-Scheduling considering dynamic reconfiguration [Vasilko et al., FPL’96]
ASAP [GajjalaPurna et al., IEEE Trans. on Comp., 1999]
Minimize latency taking into account communication costs [Cardoso et al., VLSI’99]: enhanced static-list scheduling; iterative approach (simulated annealing)
ILP formulation [SPARCs, DATE’98; RAW’98]
Enhanced Force-Directed List Scheduling [Pandey et al., SPIE’99]
And others [see the Related Work section]
Conclusions
Novel algorithm that simultaneously performs temporal partitioning and sharing of functional units: low complexity; heuristic approach; based on gradually enlarging the time slots
Allows exploiting the duality between the number of temporal partitions and resource sharing
Close-to-optimum results with some examples
Results showed that the algorithm is also competitive when performing scheduling
Future Work
Enhancements to the algorithm: consider functional units with pipelining; consider pipelining between execution and reconfiguration
Study the possibility of taking into account communication and reconfiguration costs
Test results with a reconfigurable computing system (commercial board)
Contact Author
João M. P. Cardoso
http://w3.ualg.pt/~jmcardo
THANK YOU!