RCIM 2008 - general model
POLITECNICO DI MILANO
Core Identification for Reconfigurable Systems
driven by Specification Self-Similarity
Roberto Cordone: [email protected]
Massimo Redaelli: [email protected]
Reconfigurable Computing Italian Meeting, 19 December 2008
Room S01, Politecnico di Milano - Milan (Italy)
Outline
Introduction
General Problem
Rationale
Core Identification solutions
Results
Concluding Remarks
The problem
1. Partition a specification into subsets of operations (tasks)
2. Map each task onto a compatible circuit design (mode)
3. Assign a portion of the device to each task, compatibly with its mode (size, shape, heterogeneity)
4. Assign a reconfiguration time to each task
5. Assign an execution time to each task
The data (1)
A specification DFG = (O,P)
operations O, including os, oe for start and end
precedences P: (o, o’) means that o ends before o’ starts
A set M of modes, characterized by
size cm (number of CLBs, possibly shape)
reconfiguration time dm
A compatibility relation between modes and tasks
a task S can be implemented in different modes (MS)
a mode can implement different tasks
The data (2)
A latency lS,m associated with each task S and compatible mode m
A set U of reconfigurable units (RUs)
size γu is the number of CLBs in unit u
A scheduling time horizon T (provided by a heuristic)
Decision variables
Partition O into tasks (set xS = 1 or 0 for each S ⊆ O)
Map each used task S onto a compatible mode mS ∈ MS
Assign to each used task S a portion US ⊆ U compatible with mS
Assign to each used task S a reconfiguration start time τS
Assign to each used task S an execution start time tS
A general model (1)
Minimize the completion time
Subject to
xS defines a partition of O, with singletons for os, oe and no induced cyclic precedence
mode mS is compatible with task S
mode mS fits into portion US
portion US is connected (to minimize communication overhead)
further shape constraints on portion US
further compatibility constraints between mode mS and portion US (e.g., heterogeneous RUs)
A general model (2)
the execution follows the reconfiguration
the precedences are respected:
for all S and S’ such that xS = xS’ = 1 and an operation of S precedes an operation of S’
two tasks cannot run together on the same RU
for all S and S’ such that xS = xS’ = 1
when a task is in execution, its RUs cannot be reconfigured
for all S and S’ such that xS = xS’ = 1
when a task is in reconfiguration, another task can share the reconfiguration, but only using the same RUs and mode
for all S and S’ such that xS = xS’ = 1
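The formulas that accompanied these constraints are not present in the text. The following is only one possible formalization of the verbal statements, written with the symbols introduced in the data and decision-variable slides. The interval names E_S (execution) and R_S (reconfiguration) are assumptions made here for compactness, and the original slide may have stated the constraints differently.

```latex
% Assumed shorthand (not in the original):
%   E_S = [t_S, t_S + l_{S,m_S})        execution interval of task S
%   R_S = [\tau_S, \tau_S + d_{m_S})    reconfiguration interval of task S
\begin{align*}
  & t_S \ge \tau_S + d_{m_S}
    && \forall S : x_S = 1 \\
  & t_{S'} \ge t_S + l_{S,m_S}
    && \forall S, S' : x_S = x_{S'} = 1,\ \exists (o,o') \in P,\ o \in S,\ o' \in S' \\
  & U_S \cap U_{S'} \ne \emptyset \;\Rightarrow\; E_S \cap E_{S'} = \emptyset
    && \forall S \ne S' : x_S = x_{S'} = 1 \\
  & U_S \cap U_{S'} \ne \emptyset \;\Rightarrow\; E_S \cap R_{S'} = \emptyset
    && \forall S \ne S' : x_S = x_{S'} = 1 \\
  & U_S \cap U_{S'} \ne \emptyset \,\wedge\, R_S \cap R_{S'} \ne \emptyset
    \;\Rightarrow\; U_S = U_{S'} \,\wedge\, m_S = m_{S'}
    && \forall S \ne S' : x_S = x_{S'} = 1
\end{align*}
```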
Some remarks
The partition of O turns the DFG (O,P) into a Task Dependency Graph TDG = (N,A)
The TDG is also acyclic (precedence constraints)
Partitioning, mapping, placing and scheduling are not independent
The size of the search space is overwhelming:
for each subset of operations, one must define
a mode, out of |M| available ones
a subset of RUs, out of |U| available ones
a reconfiguration start time out of |T| available ones
an execution start time out of |T| available ones
Decomposition approach: build a partition xS independent from the
scheduling, but good enough for scheduling purposes
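To give a feeling for the size of the search space described above, here is a rough back-of-the-envelope count in Python. It assumes that time is discretized into a finite number of steps and that any non-empty subset of RUs is a candidate portion (connectivity and shape constraints are ignored). The function names and the sample sizes are hypothetical, chosen only for illustration.

```python
def candidate_tasks(num_operations: int) -> int:
    """Number of non-empty subsets of operations, i.e. candidate tasks S."""
    return 2 ** num_operations - 1


def per_task_choices(num_modes: int, num_rus: int, horizon: int) -> int:
    """Choices for one task: mode, non-empty RU subset,
    reconfiguration start time and execution start time."""
    ru_subsets = 2 ** num_rus - 1
    return num_modes * ru_subsets * horizon * horizon


# Hypothetical small instance: 10 operations, 4 modes, 8 RUs, 100 time steps.
tasks = candidate_tasks(10)
choices = per_task_choices(4, 8, 100)
print(tasks, choices)
```

Even for this toy instance each candidate task admits millions of combinations, which is the motivation for building the partition separately from the scheduling.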
The Proposed Approach - Rationale
Reconfiguration times impact heavily on the final solution’s latency
Reuse the configurable modules!
Our approach: identify recurrent structures in the specification, automatically
The Proposed Approach
[Figure: C source (int test_code(int io, int *o1) { int a = 2, b = 10; ...) → Specification DFG → Partitioned DFG → Reconfigurable Implementation]
The Proposed Approach: DFG Partitioning
Our approach: two phases
Template Identification
Produce a collection of isomorphism equivalence classes, each containing some isomorphic subgraphs of the original specification
Graph covering (template choice)
Choose which of the identified templates are best suited for implementation as (re)configurable modules
Template identification
Problem: find the groups of operations that are repeated in the specification.
In available literature (Software Engineering):
extracting procedures from flat (maybe legacy) code
Text-based matching approach (Ducasse et al. 1999, Baker 1995)
AST approach (Baxter et al. 1998)
Source-based metrics approach (Higo et al. 2002, 2004)
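Isomorphic graphs
Two graphs G1 = (V1, E1) and G2 = (V2, E2) are isomorphic iff there exists a bijection f: V1 → V2 such that {u, v} ∈ E1 ⇔ {f(u), f(v)} ∈ E2
or, if directed, (u, v) ∈ E1 ⇔ (f(u), f(v)) ∈ E2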
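As an illustration of the definition for directed graphs (an edge-preserving bijection between the node sets), the following minimal sketch checks isomorphism by brute force over all bijections. It is only meant for tiny examples, since the number of bijections grows factorially; the function name and the sample graphs are hypothetical.

```python
from itertools import permutations


def are_isomorphic(nodes1, edges1, nodes2, edges2):
    """Brute-force test: is there a bijection f with
    (u, v) in edges1  <=>  (f(u), f(v)) in edges2 ?"""
    nodes1, nodes2 = list(nodes1), list(nodes2)
    e1, e2 = set(edges1), set(edges2)
    if len(nodes1) != len(nodes2) or len(e1) != len(e2):
        return False
    for perm in permutations(nodes2):
        f = dict(zip(nodes1, perm))
        # f is injective and |e1| == |e2|, so mapping every edge of e1
        # into e2 is enough to guarantee the equivalence in both directions.
        if all((f[u], f[v]) in e2 for (u, v) in e1):
            return True
    return False


# A directed path a -> b -> c matches the path 2 -> 3 -> 1 ...
same = are_isomorphic("abc", [("a", "b"), ("b", "c")],
                      [1, 2, 3], [(2, 3), (3, 1)])
# ... but not the graph where both edges enter node 2.
different = are_isomorphic("abc", [("a", "b"), ("b", "c")],
                           [1, 2, 3], [(1, 2), (3, 2)])
print(same, different)
```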
Problems with Isomorphism
• Several problems have been investigated:
1. Graph Isomorphism
2. Subgraph Isomorphism (GT48)
3. Largest Common Subgraph (GT49)
• However, we are concerned with only one graph:
• Isomorphic Subgraphs
• Find two isomorphic subgraphs S1 and S2 of a given
graph G
Our problem peculiarities
The input graph is a Data Flow Graph. Therefore:
Each operation/node has an associated action;
The inputs of every operation performing a non-commutative action must be distinguished
The Algorithm
1. Build a collection V of pairs of basic isomorphic subgraphs;
2. Extract one pair (S, S’) from V;
a) build the non-overlapping neighborhoods N(S) and N(S’), which include the nodes adjacent to S and S’, respectively. If either of them is empty, go to 3;
b) perform a maximum cardinality bipartite matching between N(S) and N(S’);
c) for each matched pair, if adding the two nodes to S and S’ preserves the isomorphism, add them to S and S’. Go to 2(a);
3. Save the maximal isomorphic non-overlapping subgraphs S and S’. Go to 2.
Sample run
The initialization?
Choose good starting points…
Iterate through all the edges, and create the sets of those with the same
Source operation o1
Sink operation o2
Input order
They induce pairs of nodes which are good starting points for the algorithm
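The edge grouping described above can be sketched as follows. The sketch assumes that each edge is given as a triple (source node, sink node, input index of the sink) and that only pairs of edges sharing no node are kept as seeds; both the representation and the name `seed_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def seed_pairs(ops, edges):
    """Group edges by (source operation, sink operation, input index) and
    return pairs of node-disjoint edges from the same group."""
    groups = defaultdict(list)
    for src, dst, port in edges:
        groups[(ops[src], ops[dst], port)].append((src, dst))

    seeds = []
    for same_kind in groups.values():
        for (s1, d1), (s2, d2) in combinations(same_kind, 2):
            if len({s1, d1, s2, d2}) == 4:   # the two edges share no node
                seeds.append(((s1, d1), (s2, d2)))
    return seeds


ops = {"a1": "add", "m1": "mul", "a2": "add", "m2": "mul", "s": "sub"}
edges = [("a1", "m1", 0), ("a2", "m2", 0), ("a1", "s", 1)]
seeds = seed_pairs(ops, edges)
print(seeds)
```

Each returned seed is a pair of two-node subgraphs that are isomorphic by construction, since both edges connect the same operations through the same input.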
Structuring the output
The algorithm returns a list of pairs:
{ (S1, S2), (S3, S4), (S5, S6), …}
Suppose S1 and S3 are isomorphic. Then so are S2 and S4!
Suppose S3 is isomorphic to a subgraph of S1. Then S2 has a subgraph isomorphic to S4!
Hierarchical Template Graph
Size matters, but so does frequency…
Template choice: metrics
Largest Fit First
Largest templates are best
Most Frequent Fit First
Templates with the largest number of instances are best
Communication Weight metrics
E.g., #internal edges vs. #boundary edges ratio
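The three metrics can be read as three ways of ranking the candidate templates. The sketch below assumes a template is summarized by its size in nodes, its number of instances, and its internal and boundary edge counts; the `Template` record, the function names and the sample values are hypothetical and are not taken from the experiments.

```python
from typing import NamedTuple


class Template(NamedTuple):
    name: str
    size: int             # number of nodes in the template
    instances: int        # number of occurrences in the specification
    internal_edges: int
    boundary_edges: int


def largest_fit_first(templates):
    """Rank templates by decreasing size."""
    return sorted(templates, key=lambda t: t.size, reverse=True)


def most_frequent_fit_first(templates):
    """Rank templates by decreasing number of instances."""
    return sorted(templates, key=lambda t: t.instances, reverse=True)


def communication_weight(t):
    """Ratio of internal to boundary edges (higher means less external traffic)."""
    if t.boundary_edges == 0:
        return float("inf")
    return t.internal_edges / t.boundary_edges


def best_communication_first(templates):
    """Rank templates by decreasing internal/boundary edge ratio."""
    return sorted(templates, key=communication_weight, reverse=True)


candidates = [Template("A", 10, 2, 12, 4), Template("B", 3, 7, 2, 4)]
lff = [t.name for t in largest_fit_first(candidates)]
mff = [t.name for t in most_frequent_fit_first(candidates)]
comm = [t.name for t in best_communication_first(candidates)]
print(lff, mff, comm)
```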
Experimental Results: Reversed-tree templates
Benchmark            Largest Template   Largest #Instances   #Templates
AES - encryptblock   16                 3                    151
AES - decryptblock   19                 3                    162
DES - des_encrypt    38                 4                    57
FDCT                 6                  6                    40
Experimental Results: Free-shape templates
Benchmark            Largest Template   Largest #Instances   #Templates
AES - encryptblock   132                2                    6790
AES - decryptblock   147                2                    11006
DES - des_encrypt    100                2                    1802
FDCT                 62                 2                    1470
Experimental Results: Graph covering
Benchmark            Cover % LFF   Cover % MFF   Cover % Comm   CPU Time
AES - encryptblock   74.3          32.7          74.1           32.5 sec
AES - decryptblock   85.31         51.7          70.8           61 sec
DES - des_encrypt    90.5          59.6          87.8           8.3 sec
FDCT                 76.7          53.8          73.3           6.4 sec
Experimental results
Questions