RCIM 2008 - general model

POLITECNICO DI MILANO

Core Identification for Reconfigurable Systems driven by Specification Self-Similarity

Roberto Cordone: [email protected]

Massimo Redaelli: [email protected]

Reconfigurable Computing Italian Meeting, 19 December 2008

Room S01, Politecnico di Milano - Milan (Italy)

Outline

Introduction

General Problem

Rationale

Core Identification solutions

Results

Concluding Remarks

The problem

1. Partition a specification into subsets of operations (tasks)

2. Map each task onto a compatible circuit design (mode)

3. Assign a portion of the device to each task, compatibly with its mode (size, shape, heterogeneity)

4. Assign a reconfiguration time to each task

5. Assign an execution time to each task

The data (1)

A specification DFG = (O,P)

operations O, including $o_s$ and $o_e$ for the start and the end

precedences P: (o, o’) means that o ends before o’ starts

A set M of modes, characterized by

size $c_m$ (number of CLBs, possibly shape)

reconfiguration time $d_m$

A compatibility relation between modes and tasks

a task S can be implemented in different modes ($M_S$)

a mode can implement different tasks

The data (2)

A latency $l_{S,m}$ associated with each task S and each compatible mode m

A set U of reconfigurable units (RUs)

size $\gamma_u$ is the number of CLBs in unit u

A scheduling time horizon T (provided by a heuristic)

Decision variables

Partition O into tasks (set $x_S = 1$ or $0$ for each $S \subseteq O$)

Map each used task S onto a compatible mode $m_S \in M_S$

Assign to each used task S a portion $U_S \subseteq U$ compatible with $m_S$

Assign to each used task S a reconfiguration start time $\tau_S$

Assign to each used task S an execution start time $t_S$

A general model (1)

Minimize the completion time

subject to

$x_S$ defines a partition of O, with singletons for $o_s$ and $o_e$ and no induced cyclic precedence

mode $m_S$ is compatible with task S ($m_S \in M_S$)

mode $m_S$ fits into portion $U_S$ ($c_{m_S} \le \sum_{u \in U_S} \gamma_u$)

portion $U_S$ is connected (to minimize communication overhead)

further shape constraints on portion $U_S$

further compatibility constraints between mode $m_S$ and portion $U_S$ (e.g., heterogeneous RUs)

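A general model (2)

the execution follows the reconfiguration: $t_S \ge \tau_S + d_{m_S}$

the precedences are respected: $t_{S'} \ge t_S + l_{S,m_S}$

for all S and S' such that $x_S = x_{S'} = 1$ and $(o, o') \in P$ for some $o \in S$, $o' \in S'$

two tasks cannot run together on the same RU: $[t_S, t_S + l_{S,m_S}) \cap [t_{S'}, t_{S'} + l_{S',m_{S'}}) = \emptyset$

for all S and S' such that $x_S = x_{S'} = 1$ and $U_S \cap U_{S'} \neq \emptyset$

when a task is in execution, its RUs cannot be reconfigured

when a task is in reconfiguration, another task can share the reconfiguration, but only using the same RUs and mode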
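The scheduling constraints of this slide can be checked mechanically for a candidate solution. The C sketch below is illustrative only: the `Task` and `Arc` types, the `schedule_is_feasible` function, integer time instants, half-open intervals, and a 32-bit mask for $U_S$ are all hypothetical choices. It checks only the constraints listed here (no objective, no placement or shape constraints).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of one used task S (x_S = 1). */
typedef struct {
    int tau;       /* reconfiguration start time tau_S */
    int t;         /* execution start time t_S */
    int d;         /* reconfiguration time d_{m_S} of the chosen mode */
    int l;         /* latency l_{S,m_S} */
    unsigned rus;  /* bitmask of the RUs in U_S (assumes |U| <= 32) */
    int mode;      /* index of the chosen mode m_S */
} Task;

/* TDG arc: task `from` must finish before task `to` starts. */
typedef struct { int from, to; } Arc;

/* Do the half-open intervals [a1, b1) and [a2, b2) intersect? */
static bool overlap(int a1, int b1, int a2, int b2) {
    return a1 < b2 && a2 < b1;
}

bool schedule_is_feasible(const Task *tk, size_t n,
                          const Arc *arcs, size_t na) {
    /* Execution follows reconfiguration: t_S >= tau_S + d_{m_S}. */
    for (size_t i = 0; i < n; i++)
        if (tk[i].t < tk[i].tau + tk[i].d) return false;

    /* Precedences: t_{S'} >= t_S + l_{S,m_S}. */
    for (size_t k = 0; k < na; k++) {
        const Task *a = &tk[arcs[k].from], *b = &tk[arcs[k].to];
        if (b->t < a->t + a->l) return false;
    }

    /* Pairwise checks for tasks that share at least one RU. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++) {
            const Task *a = &tk[i], *b = &tk[j];
            if ((a->rus & b->rus) == 0) continue;
            int ea = a->t + a->l, eb = b->t + b->l;      /* execution ends */
            int ra = a->tau + a->d, rb = b->tau + b->d;  /* reconfiguration ends */
            /* two tasks cannot run together on the same RU */
            if (overlap(a->t, ea, b->t, eb)) return false;
            /* RUs in execution cannot be reconfigured */
            if (overlap(b->tau, rb, a->t, ea) || overlap(a->tau, ra, b->t, eb))
                return false;
            /* overlapping reconfigurations must be one shared reconfiguration:
               same start, same RUs, same mode */
            if (overlap(a->tau, ra, b->tau, rb) &&
                !(a->tau == b->tau && a->rus == b->rus && a->mode == b->mode))
                return false;
        }
    return true;
}
```

Two tasks that share one reconfiguration (same start, RUs and mode) and then execute one after the other pass the check; reconfiguring an RU while another task still runs on it does not.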


Some remarks

The partition of O turns the DFG (O,P) into a Task Dependency Graph TDG = (N,A)

The TDG is also acyclic (by the precedence constraints)
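Whether a candidate partition induces an acyclic TDG (the "no induced cyclic precedence" requirement) can be tested by collapsing each task into one node and running a topological sort. A minimal C sketch using Kahn's algorithm, assuming tasks are numbered 0..n_tasks-1 and that `task_of[o]` gives the task of operation o; `Prec` and `tdg_is_acyclic` are hypothetical names.

```c
#include <stdbool.h>
#include <stdlib.h>

/* DFG precedence (o, o'): operation `from` ends before `to` starts. */
typedef struct { int from, to; } Prec;

/* Returns true iff the partition task_of[] (task index of each operation,
   tasks numbered 0..n_tasks-1) induces an acyclic Task Dependency Graph. */
bool tdg_is_acyclic(int n_tasks, const int *task_of,
                    const Prec *prec, int n_prec) {
    if (n_tasks <= 0) return true;
    size_t n = (size_t)n_tasks;
    bool *adj = calloc(n * n, sizeof *adj);   /* TDG adjacency matrix */
    int *indeg = calloc(n, sizeof *indeg);
    int *queue = malloc(n * sizeof *queue);
    if (!adj || !indeg || !queue) {           /* out of memory: report failure */
        free(adj); free(indeg); free(queue);
        return false;
    }

    /* Each DFG arc between different tasks becomes one TDG arc. */
    for (int k = 0; k < n_prec; k++) {
        int a = task_of[prec[k].from], b = task_of[prec[k].to];
        if (a != b && !adj[a * n_tasks + b]) {
            adj[a * n_tasks + b] = true;
            indeg[b]++;
        }
    }

    /* Kahn's algorithm: acyclic iff every task is eventually dequeued. */
    int head = 0, tail = 0;
    for (int v = 0; v < n_tasks; v++)
        if (indeg[v] == 0) queue[tail++] = v;
    while (head < tail) {
        int v = queue[head++];
        for (int w = 0; w < n_tasks; w++)
            if (adj[v * n_tasks + w] && --indeg[w] == 0) queue[tail++] = w;
    }

    bool acyclic = (tail == n_tasks);
    free(adj); free(indeg); free(queue);
    return acyclic;
}
```

On the chain of operations 0 → 1 → 2 → 3, grouping {1, 2} into one task keeps the TDG acyclic, while grouping {0, 2} creates a cycle between that task and the task holding operation 1.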

Partitioning, mapping, placing and scheduling are not independent

The size of the search space is overwhelming: for each subset of operations, one must define

a mode, out of |M| available ones

a subset of RUs, out of the |U| available ones

a reconfiguration start time, out of |T| available ones

an execution start time, out of |T| available ones

Decomposition approach: build a partition $x_S$ independently of the scheduling, but good enough for scheduling purposes

The Proposed Approach - Rationale

Reconfiguration times have a heavy impact on the latency of the final solution

Reuse the configurable modules!

Our approach: identify recurrent structures in the specification automatically

The Proposed Approach

[Figure: sample C specification (int test_code( int io , int * o1) { int a = 2, b = 10; ...), shown as Specification DFG, Partitioned DFG, and Reconfigurable Implementation]

The Proposed Approach: DFG Partitioning

Our approach: two phases

Template Identification: produce a collection of isomorphism equivalence classes, each containing some isomorphic subgraphs of the original specification

Graph covering (template choice): choose which of the identified templates are best suited for implementation as (re)configurable modules

Template identification

Problem: find groups of operations that are performed repeatedly in the specification.

In the available literature (Software Engineering), the analogous problem is extracting procedures from flat (possibly legacy) code:

Text-based matching approach (Ducasse et al. 1999, Baker 1995)

AST approach (Baxter et al. 1998)

Source-based metrics approach (Higo et al. 2002, 2004)

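Isomorphic graphs

Two graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic iff there exists a bijection $f: V_1 \to V_2$ such that $\{u, v\} \in E_1 \iff \{f(u), f(v)\} \in E_2$

or, if directed, $(u, v) \in E_1 \iff (f(u), f(v)) \in E_2$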
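For directed graphs the definition can be checked directly once a candidate bijection is given. This C sketch assumes small graphs stored as adjacency matrices bounded by a hypothetical `MAXN`; it only verifies a mapping, it does not search for one (the search is the hard part).

```c
#include <stdbool.h>

#define MAXN 16   /* hypothetical bound on the number of nodes */

/* Returns true iff f (f[v] is the image of node v) is a bijection on
   {0, ..., n-1} such that (u, v) is an arc of g1 exactly when
   (f[u], f[v]) is an arc of g2. */
bool is_isomorphism(int n, bool g1[MAXN][MAXN], bool g2[MAXN][MAXN],
                    const int f[]) {
    bool used[MAXN] = { false };
    for (int v = 0; v < n; v++) {               /* f must be a bijection */
        if (f[v] < 0 || f[v] >= n || used[f[v]]) return false;
        used[f[v]] = true;
    }
    for (int u = 0; u < n; u++)                 /* arcs map onto arcs, */
        for (int v = 0; v < n; v++)             /* non-arcs onto non-arcs */
            if (g1[u][v] != g2[f[u]][f[v]]) return false;
    return true;
}
```

For example, the path 0 → 1 → 2 is isomorphic to 2 → 0 → 1 under f = (2, 0, 1), but not under the identity.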

Problems with Isomorphism

• Several problems have been investigated:

1. Graph Isomorphism

2. Subgraph Isomorphism (GT48)

3. Largest Common Subgraph (GT49)

• However, we are concerned with only one graph:

• Isomorphic Subgraphs

• Find two isomorphic subgraphs S1 and S2 of a given graph G

Our problem peculiarities

The input graph is a Data Flow Graph. Therefore:

Each operation/node has an associated action;

The inputs of every operation performing a non-commutative action must be distinguished
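These two peculiarities turn into a node-compatibility test when growing a pair of subgraphs. The C sketch below assumes at most two inputs per node, a hypothetical action set (`INPUT`, `ADD`, `MUL`, `SUB`, with `ADD` and `MUL` commutative) and a partial mapping `map[x]` (image of x, or -1); it checks only inputs that are already mapped.

```c
#include <stdbool.h>

enum Action { INPUT, ADD, MUL, SUB };    /* hypothetical action set */

typedef struct {
    enum Action act;
    int in[2];        /* producer of input port 0 and port 1 (-1 if none) */
} Node;

static bool is_commutative(enum Action a) { return a == ADD || a == MUL; }

/* Input a of one node and input b of the other are consistent if both are
   absent, or a is not mapped yet, or a is mapped exactly onto b. */
static bool port_ok(const int *map, int a, int b) {
    if (a < 0 || b < 0) return a == b;
    return map[a] == -1 || map[a] == b;
}

/* Can node u (in S) be paired with node v (in S') under the partial
   mapping map[] (map[x] = image of x in S', or -1)? */
bool nodes_match(const Node *g, const int *map, int u, int v) {
    if (g[u].act != g[v].act) return false;          /* same action */
    bool straight = port_ok(map, g[u].in[0], g[v].in[0]) &&
                    port_ok(map, g[u].in[1], g[v].in[1]);
    if (straight || !is_commutative(g[u].act)) return straight;
    /* commutative action: operands may also be matched crosswise */
    return port_ok(map, g[u].in[0], g[v].in[1]) &&
           port_ok(map, g[u].in[1], g[v].in[0]);
}
```

With inputs x and y mapped onto themselves, x - y does not match y - x, while x + y does match y + x.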

The Algorithm

1. Build a collection V of pairs of basic isomorphic subgraphs;

2. Extract one pair (S, S') from V;

a) build the non-overlapping neighborhoods N(S) and N(S'), which include the nodes adjacent to S and to S', respectively. If either of them is empty, go to 3;

b) perform a maximum cardinality bipartite matching between N(S) and N(S');

c) for each matched pair, if adding the two nodes to S and S' preserves the isomorphism, add them to S and S'. Go to 2(a)

3. Save the maximal isomorphic non-overlapping subgraphs S and S'. Go to 2.
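Step 2(b) can use any maximum cardinality bipartite matching algorithm; the slides do not say which. The sketch below uses Kuhn's augmenting-path method on a compatibility matrix (an assumption; Hopcroft-Karp would be faster on large neighborhoods). `compat[i][j]` states that neighbor i of S may pair with neighbor j of S' (for instance, same action and consistent input ports); `MAXNB` and the function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define MAXNB 32   /* hypothetical bound on neighborhood size */

/* Try to match left node i, re-routing earlier matches if needed. */
static bool try_augment(int i, int nr, bool compat[MAXNB][MAXNB],
                        bool seen[MAXNB], int match_r[MAXNB]) {
    for (int j = 0; j < nr; j++) {
        if (!compat[i][j] || seen[j]) continue;
        seen[j] = true;
        /* j is free, or its current partner can move to another node */
        if (match_r[j] < 0 ||
            try_augment(match_r[j], nr, compat, seen, match_r)) {
            match_r[j] = i;
            return true;
        }
    }
    return false;
}

/* Kuhn's algorithm, O(nl * nl * nr) on a matrix. On return, match_r[j]
   is the neighbor of S matched to neighbor j of S', or -1. */
int max_bipartite_matching(int nl, int nr, bool compat[MAXNB][MAXNB],
                           int match_r[MAXNB]) {
    int size = 0;
    for (int j = 0; j < nr; j++) match_r[j] = -1;
    for (int i = 0; i < nl; i++) {
        bool seen[MAXNB];
        memset(seen, 0, sizeof seen);
        if (try_augment(i, nr, compat, seen, match_r)) size++;
    }
    return size;
}
```

Each matched pair is then only a candidate: step 2(c) still has to verify that adding it preserves the isomorphism.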

Sample run

The initialization?

Choose good starting points…

Iterate through all the edges, and create the sets of those with the same

Source operation $o_1$

Sink operation $o_2$

Input order

They induce pairs of nodes which are good starting points for the algorithm
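The seeding step can be sketched as a pairwise scan over the edges. This C version assumes each edge records the input port it enters at the sink (the "input order") and that actions are integer codes; it also discards pairs that share a node, so the two seeds cannot overlap. `Edge`, `SeedPair` and `find_seed_pairs` are hypothetical names; a hash on the (source action, sink action, port) key would avoid the quadratic scan.

```c
#include <stddef.h>

/* Hypothetical encoding: arc from node src into input port `port` of dst. */
typedef struct { int src, dst, port; } Edge;
typedef struct { int e1, e2; } SeedPair;   /* indices into the edge array */

/* Writes into out[] (capacity cap) every pair of edges with the same
   source action, sink action and input port whose four endpoints are
   distinct; returns how many pairs were written. */
size_t find_seed_pairs(const Edge *e, size_t ne, const int *action,
                       SeedPair *out, size_t cap) {
    size_t k = 0;
    for (size_t i = 0; i < ne; i++)
        for (size_t j = i + 1; j < ne; j++) {
            if (action[e[i].src] != action[e[j].src] ||
                action[e[i].dst] != action[e[j].dst] ||
                e[i].port != e[j].port)
                continue;
            /* the two seeds must not share nodes (non-overlapping S, S') */
            if (e[i].src == e[j].src || e[i].src == e[j].dst ||
                e[i].dst == e[j].src || e[i].dst == e[j].dst)
                continue;
            if (k == cap) return k;
            out[k].e1 = (int)i;
            out[k].e2 = (int)j;
            k++;
        }
    return k;
}
```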

Structuring the output

The algorithm returns a list of pairs: { (S1, S2), (S3, S4), (S5, S6), …}

Suppose S1 and S3 are isomorphic. Then so are S2 and S4!

Suppose S3 is isomorphic to a subgraph of S1. Then S2 has a subgraph isomorphic to S4!
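The first observation says that isomorphism propagates across pairs, so the pairs can be merged into equivalence classes with a disjoint-set (union-find) structure. A minimal C sketch, assuming subgraphs are numbered 0..n-1 (S1 as 0, S2 as 1, and so on) and that S1 ≅ S3 has already been detected; `Classes` and its functions are hypothetical. It does not capture the second, subgraph-containment relation.

```c
#define MAXS 64   /* hypothetical bound on the number of subgraphs */

/* Disjoint-set forest over subgraph identifiers 0..n-1. */
typedef struct { int parent[MAXS]; } Classes;

void classes_init(Classes *c, int n) {
    for (int i = 0; i < n; i++) c->parent[i] = i;
}

/* Representative of x's class, with path halving. */
int classes_find(Classes *c, int x) {
    while (c->parent[x] != x) {
        c->parent[x] = c->parent[c->parent[x]];
        x = c->parent[x];
    }
    return x;
}

/* Record that subgraphs a and b are isomorphic. */
void classes_union(Classes *c, int a, int b) {
    int ra = classes_find(c, a), rb = classes_find(c, b);
    if (ra != rb) c->parent[ra] = rb;
}
```

After uniting the returned pairs (S1, S2), (S3, S4), (S5, S6) and the detected S1 ≅ S3, S2 and S4 end up in the same class, while S5 stays apart.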

Hierarchical Template Graph

Size does matter, but so does frequency…

Template choice: metrics

Largest Fit First (LFF)

Largest templates are best

Most Frequent fit First (MFF)

Templates with the largest number of instances are best

Communication Weight metrics (Comm)

E.g., the ratio of internal edges to boundary edges
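The three metrics can be expressed as orderings on templates, which a covering heuristic would then visit best-first. The C sketch below assumes a hypothetical `Template` summary (nodes per instance, number of instances, internal and boundary edges of one instance) and uses the standard `qsort`; the covering step itself is not shown.

```c
#include <stdlib.h>

/* Hypothetical summary of one template (isomorphism class). */
typedef struct {
    int size;       /* nodes in one instance */
    int instances;  /* number of instances found */
    int internal;   /* edges inside one instance */
    int boundary;   /* edges crossing the boundary of one instance */
} Template;

/* qsort comparators: the "best" template comes first. */
int cmp_lff(const void *x, const void *y) {          /* Largest Fit First */
    const Template *a = x, *b = y;
    return (b->size > a->size) - (b->size < a->size);
}

int cmp_mff(const void *x, const void *y) {          /* Most Frequent fit First */
    const Template *a = x, *b = y;
    return (b->instances > a->instances) - (b->instances < a->instances);
}

/* Communication weight: higher internal/boundary ratio first, compared by
   cross-multiplication so that a zero boundary count needs no division. */
int cmp_comm(const void *x, const void *y) {
    const Template *a = x, *b = y;
    long long ra = (long long)a->internal * b->boundary;
    long long rb = (long long)b->internal * a->boundary;
    return (rb > ra) - (rb < ra);
}
```

Usage: `qsort(templates, n, sizeof templates[0], cmp_lff);` and likewise with `cmp_mff` or `cmp_comm`.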

Experimental Results: Reversed-tree templates

Benchmark            Largest template   Largest #instances   #Templates
AES - encryptblock   16                 3                    151
AES - decryptblock   19                 3                    162
DES - des_encrypt    38                 4                    57
FDCT                 6                  6                    40

Experimental Results: Free-shape templates

Benchmark            Largest template   Largest #instances   #Templates
AES - encryptblock   132                2                    6790
AES - decryptblock   147                2                    11006
DES - des_encrypt    100                2                    1802
FDCT                 62                 2                    1470

Experimental Results: Graph covering

Benchmark            Cover % (LFF)   Cover % (MFF)   Cover % (Comm)   CPU time
AES - encryptblock   74.3            32.7            74.1             32.5 sec
AES - decryptblock   85.31           51.7            70.8             61 sec
DES - des_encrypt    90.5            59.6            87.8             8.3 sec
FDCT                 76.7            53.8            73.3             6.4 sec

Experimental results

Questions
