imag€¦ · motivationapplication modeldeployment using smtsymmetry eliminationdistributed memory...

192
Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions Mapping and Scheduling Streaming Applications using SMT Solvers Pranav Tendulkar Supervisors: Dr. Oded Maler Dr. Peter Poplavko Verimag, FRANCE 13 October 2014 Tendulkar Mapping/scheduling for many-core 1 / 52

Upload: others

Post on 26-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Mapping and Scheduling StreamingApplications using SMT Solvers

Pranav Tendulkar

Supervisors:Dr. Oded Maler Dr. Peter Poplavko

Verimag, FRANCE

13 October 2014

Tendulkar Mapping/scheduling for many-core 1 / 52

Page 2: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Multi-core Processors Everywhere

Cars

Phones

Space-shuttle

Tablets Laptops

Cameras

Smart-TV

Tendulkar Mapping/scheduling for many-core 2 / 52

Page 3: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Multi-core Processors Everywhere

source : http://www.csl.cornell.edu/courses/ece5745/handouts.html

Tendulkar Mapping/scheduling for many-core 3 / 52

Page 4: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Multi-core Processors Everywhere

source : http://www.csl.cornell.edu/courses/ece5745/handouts.html

Tendulkar Mapping/scheduling for many-core 3 / 52

Page 5: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Multi-core systems

How To:

Deploy the application to the platform

Decide number of processors to use?

Allocate tasks to processors and schedule them

Tendulkar Mapping/scheduling for many-core 4 / 52

Page 6: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Multi-core systems

How To:

Deploy the application to the platform

Decide number of processors to use?

Allocate tasks to processors and schedule them

Tendulkar Mapping/scheduling for many-core 4 / 52

Page 7: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Multi-core systems

How To:

Deploy the application to the platform

Decide number of processors to use?

Allocate tasks to processors and schedule them

Tendulkar Mapping/scheduling for many-core 4 / 52

Page 8: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Multi-core systems

How To:

Deploy the application to the platform

Decide number of processors to use?

Allocate tasks to processors and schedule them

Tendulkar Mapping/scheduling for many-core 4 / 52

Page 9: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Our Deployment Framework

Constraints

Performance

Mem

ory

#Pro

cess

ors

Platform Model

Application Model

OptimizationTechniques

CodeSolution

Tendulkar Mapping/scheduling for many-core 5 / 52

Page 10: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Our Deployment Framework

Constraints

Performance

Mem

ory

#Pro

cess

ors

Platform ModelApplication Model

OptimizationTechniques

CodeSolution

Tendulkar Mapping/scheduling for many-core 5 / 52

Page 11: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Our Deployment Framework

Constraints

Performance

Mem

ory

#Pro

cess

ors

Platform ModelApplication Model

OptimizationTechniques

CodeSolution

Tendulkar Mapping/scheduling for many-core 5 / 52

Page 12: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Our Deployment Framework

Constraints

Performance

Mem

ory

#Pro

cess

ors

Platform ModelApplication Model

OptimizationTechniques

CodeSolution

Tendulkar Mapping/scheduling for many-core 5 / 52

Page 13: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Our Deployment Framework

Constraints

Performance

Mem

ory

#Pro

cess

ors

Platform ModelApplication Model

OptimizationTechniques

Code

Solution

Tendulkar Mapping/scheduling for many-core 5 / 52

Page 14: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Our Deployment Framework

Constraints

Performance

Mem

ory

#Pro

cess

ors

Platform ModelApplication Model

OptimizationTechniques

CodeSolution

Tendulkar Mapping/scheduling for many-core 5 / 52

Page 15: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Application Model

Task Graph

D

E

F

G

B

C

A

H

I

J

Tasks : Software procedure

Edges : Precedence relationsannotated with execution time

Tendulkar Mapping/scheduling for many-core 6 / 52

Page 16: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Application Model

Task Graph

D

E

F

G

B

C

A

H

I

J

Tasks : Software procedure

Edges : Precedence relationsannotated with execution time

Tendulkar Mapping/scheduling for many-core 6 / 52

Page 17: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Application Model

Task Graph

D

E

F

G

B

C

A

H

I

J

Tasks : Software procedure

Edges : Precedence relations

annotated with execution time

Tendulkar Mapping/scheduling for many-core 6 / 52

Page 18: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Application Model

Task Graph

D

E

F

G

B

C

A

H

I

J

Tasks : Software procedure

Edges : Precedence relations

annotated with execution time

Tendulkar Mapping/scheduling for many-core 6 / 52

Page 19: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Deployment Problem

Task Graph

D

E

F

G

B

C

A

H

I

J

Tasks : Software procedure

Edges : Precedence relations

Deployment Solution

C F E D H J

A B G I

Time

P1

P2

Pro

cess

ors

Mapping : Task⇒ Processor

Scheduling : Task⇒ Time

Tendulkar Mapping/scheduling for many-core 7 / 52

Page 20: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Deployment Problem

Task Graph

D

E

F

G

B

C

A

H

I

J

Tasks : Software procedure

Edges : Precedence relations

Deployment Solution

C F E D H J

A B G I

Time

P1

P2

Pro

cess

ors

Mapping : Task⇒ Processor

Scheduling : Task⇒ Time

Tendulkar Mapping/scheduling for many-core 7 / 52

Page 21: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Deployment Problem

Task Graph

D

E

F

G

B

C

A

H

I

J

Tasks : Software procedure

Edges : Precedence relations

Deployment Solution

C F E D H J

A B G I

Time

P1

P2

Pro

cess

ors

xC

Mapping : Task⇒ Processor

Scheduling : Task⇒ Time

Tendulkar Mapping/scheduling for many-core 7 / 52

Page 22: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Deployment Problem

Solution1:

C F E D H J

A B G I

Time

P1

P2

Pro

cess

ors

Solution2:

A B C D E F G H I J

Time

P1

Pro

cess

ors

Tendulkar Mapping/scheduling for many-core 8 / 52

Page 23: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Deployment Problem

Solution1:

C F E D H J

A B G I

Time

P1

P2

Pro

cess

ors

Solution2:

A B C D E F G H I J

Time

P1

Pro

cess

ors

Tendulkar Mapping/scheduling for many-core 8 / 52

Page 24: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Solution space is large

Tendulkar Mapping/scheduling for many-core 9 / 52

Page 25: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Deployment problem

How to:find optimal solutions in exponential design space.

model complex hardware which has Processors, Network, DMA

evaluate multiple criteriaLatencyMemory usedProcessors used...

Tendulkar Mapping/scheduling for many-core 10 / 52

Page 26: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Deployment problem

How to:find optimal solutions in exponential design space.

model complex hardware which has Processors, Network, DMA

evaluate multiple criteriaLatencyMemory usedProcessors used...

Tendulkar Mapping/scheduling for many-core 10 / 52

Page 27: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Deployment problem

How to:find optimal solutions in exponential design space.

model complex hardware which has Processors, Network, DMA

evaluate multiple criteriaLatencyMemory usedProcessors used...

Tendulkar Mapping/scheduling for many-core 10 / 52

Page 28: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Outline

1 Motivation

2 Application Model

3 Deployment using SMT

4 Symmetry elimination

5 Distributed memory scheduling

6 SMT Solving

7 Conclusions

Tendulkar Mapping/scheduling for many-core 11 / 52

Page 29: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Overview

1 Motivation

2 Application Model

3 Deployment using SMT

4 Symmetry elimination

5 Distributed memory scheduling

6 SMT Solving

7 Conclusions

Tendulkar Mapping/scheduling for many-core 12 / 52

Page 30: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Model of Computation

Synchronous Dataflow graphs (SDF)by Edward Lee and David Messerschmitt in 1987

represents Streaming Applications

Computation

Input output

Tendulkar Mapping/scheduling for many-core 13 / 52

Page 31: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Model of Computation

Synchronous Dataflow graphs (SDF)by Edward Lee and David Messerschmitt in 1987

represents Streaming Applications

Computation

Input output

Tendulkar Mapping/scheduling for many-core 13 / 52

Page 32: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Model of Computation

Synchronous Dataflow graphs (SDF)by Edward Lee and David Messerschmitt in 1987

represents Streaming Applications

Computation

Input output

Tendulkar Mapping/scheduling for many-core 13 / 52

Page 33: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Synchronous DataFlow

Pre

Blur1

Blur0

Blur2

Blur3

Post

images images

Tendulkar Mapping/scheduling for many-core 14 / 52

Page 34: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Synchronous DataFlow

Pre Blur Post

SDF Graph

4 1 1 4

Pre

Blur1

Blur0

Blur2

Blur3

Post

Task Graph

Actors

Edges

Rates

Tendulkar Mapping/scheduling for many-core 15 / 52

Page 35: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Synchronous DataFlow

Pre Blur Post

SDF Graph

4 1 1 4Pre Blur Post

Pre

Blur1

Blur0

Blur2

Blur3

Post

Task Graph

Actors

Edges

Rates

Tendulkar Mapping/scheduling for many-core 15 / 52

Page 36: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Synchronous DataFlow

Pre Blur Post

SDF Graph

4 1 1 4

Pre

Blur1

Blur0

Blur2

Blur3

Post

Task Graph

Actors

Edges

Rates

Tendulkar Mapping/scheduling for many-core 15 / 52

Page 37: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Synchronous DataFlow

Pre Blur Post

SDF Graph

4 1 1 4

Pre

Blur1

Blur0

Blur2

Blur3

Post

Task Graph

Actors

Edges

Rates

Tendulkar Mapping/scheduling for many-core 15 / 52

Page 38: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Synchronous DataFlow

Pre Blur Post

SDF Graph

4 1 1 4

Pre

Blur1

Blur0

Blur2

Blur3

Post

Task Graph

Actors

Edges

Rates

Actor Blur is compact representation of data parallel tasks.

All Blur tasks have same properties such as execution time.

Tendulkar Mapping/scheduling for many-core 15 / 52

Page 39: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Synchronous DataFlow

Pre Blur Post

SDF Graph

4 1 1 4

Pre

Blur1

Blur0

Blur2

Blur3

Post

Task Graph

Actors

Edges

Rates

Actor Blur is compact representation of data parallel tasks.All Blur tasks have same properties such as execution time.

Tendulkar Mapping/scheduling for many-core 15 / 52

Page 40: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Split-Join Graphswe use split-join graphs : restriction of SDF

still covering perhaps 90% of use cases in the literature

a simple example:

A B Cα 1/α

α : spawn and split

1/α: wait and join

A0

B1

B0

. . .

Bα−1

C0

Tendulkar Mapping/scheduling for many-core 16 / 52

Page 41: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Split-Join Graphswe use split-join graphs : restriction of SDF

still covering perhaps 90% of use cases in the literature

a simple example:

A B Cα 1/α

α : spawn and split

1/α: wait and join

A0

B1

B0

. . .

Bα−1

C0

Tendulkar Mapping/scheduling for many-core 16 / 52

Page 42: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Split-Join Graphswe use split-join graphs : restriction of SDF

still covering perhaps 90% of use cases in the literature

a simple example:

A B Cα 1/α

α : spawn and split

1/α: wait and join

A0

B1

B0

. . .

Bα−1

C0

Tendulkar Mapping/scheduling for many-core 16 / 52

Page 43: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Restrictions compared to general SDF

Split-join does not support:

Stateful actors

Non-proportional rates

Initial tokens and cyclic paths

A B

2 3

32

4

Tendulkar Mapping/scheduling for many-core 17 / 52

Page 44: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Restrictions compared to general SDF

Split-join does not support:

Stateful actors

Non-proportional rates

Initial tokens and cyclic paths

A B2 3

32

4

Tendulkar Mapping/scheduling for many-core 17 / 52

Page 45: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Restrictions compared to general SDF

Split-join does not support:

Stateful actors

Non-proportional rates

Initial tokens and cyclic paths

A B2 3

32

4

Tendulkar Mapping/scheduling for many-core 17 / 52

Page 46: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Overview

1 Motivation

2 Application Model

3 Deployment using SMT

4 Symmetry elimination

5 Distributed memory scheduling

6 SMT Solving

7 Conclusions

Tendulkar Mapping/scheduling for many-core 18 / 52

Page 47: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

SATisfiability solver (SAT / SMT)

Boolean variablesin0, in1, in2 ...out0, out1, out2 ...

Constraintsout0 = in0 ∨ in1 ⊕ in2 ...

SAT solver

variables

constraints

out0 = true&

out1 = trueUNSAT

out0 = false&

out1 = trueSAT

in0 = true,in1 = false, ...

Tendulkar Mapping/scheduling for many-core 19 / 52

Page 48: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

SATisfiability solver (SAT / SMT)

Boolean variablesin0, in1, in2 ...out0, out1, out2 ...

Constraintsout0 = in0 ∨ in1 ⊕ in2 ...

SAT solver

variables

constraints

out0 = true&

out1 = trueUNSAT

out0 = false&

out1 = trueSAT

in0 = true,in1 = false, ...

Tendulkar Mapping/scheduling for many-core 19 / 52

Page 49: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

SATisfiability solver (SAT / SMT)

Boolean variablesin0, in1, in2 ...out0, out1, out2 ...

Constraintsout0 = in0 ∨ in1 ⊕ in2 ...

SAT solver

variables

constraints

out0 = true&

out1 = trueUNSAT

out0 = false&

out1 = trueSAT

in0 = true,in1 = false, ...

Tendulkar Mapping/scheduling for many-core 19 / 52

Page 50: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

SATisfiability solver (SAT / SMT)

Boolean variablesin0, in1, in2 ...out0, out1, out2 ...

Constraintsout0 = in0 ∨ in1 ⊕ in2 ...

SAT solver

variables

constraints

out0 = true&

out1 = trueUNSAT

out0 = false&

out1 = trueSAT

in0 = true,in1 = false, ...

Tendulkar Mapping/scheduling for many-core 19 / 52

Page 51: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

SATisfiability solver (SAT / SMT)

Boolean variablesin0, in1, in2 ...out0, out1, out2 ...

Constraintsout0 = in0 ∨ in1 ⊕ in2 ...

SAT solver

variables

constraints

out0 = true&

out1 = true

UNSATout0 = false

&out1 = true

SATin0 = true,

in1 = false, ...

Tendulkar Mapping/scheduling for many-core 19 / 52

Page 52: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

SATisfiability solver (SAT / SMT)

Boolean variablesin0, in1, in2 ...out0, out1, out2 ...

Constraintsout0 = in0 ∨ in1 ⊕ in2 ...

SAT solver

variables

constraints

out0 = true&

out1 = trueUNSAT

out0 = false&

out1 = trueSAT

in0 = true,in1 = false, ...

Tendulkar Mapping/scheduling for many-core 19 / 52

Page 53: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

SATisfiability solver (SAT / SMT)

Boolean variablesin0, in1, in2 ...out0, out1, out2 ...

Constraintsout0 = in0 ∨ in1 ⊕ in2 ...

SAT solver

variables

constraints

out0 = true&

out1 = trueUNSAT

out0 = false&

out1 = true

SATin0 = true,

in1 = false, ...

Tendulkar Mapping/scheduling for many-core 19 / 52

Page 54: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

SATisfiability solver (SAT / SMT)

Boolean variablesin0, in1, in2 ...out0, out1, out2 ...

Constraintsout0 = in0 ∨ in1 ⊕ in2 ...

SAT solver

variables

constraints

out0 = true&

out1 = trueUNSAT

out0 = false&

out1 = trueSAT

in0 = true,in1 = false, ...

Tendulkar Mapping/scheduling for many-core 19 / 52

Page 55: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

SATisfiability solver (SAT / SMT)

Boolean variablesin0, in1, in2 ...out0, out1, out2 ...

Constraintsout0 = in0 ∨ in1 ⊕ in2 ...

SAT solver

variables

constraints

out0 = true&

out1 = trueUNSAT

out0 = false&

out1 = trueSAT

in0 = true,in1 = false, ...

SMT extends SAT by numeric variables and constants

Tendulkar Mapping/scheduling for many-core 19 / 52

Page 56: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Encoding deployment with constraints

A0

B1

B0

B2

B3

C0

Task Graph

Actor A B CTasks A0 B0 B1 B2 B3 C0

Description Variables

Start time xA0 xB0 xB1 xB2 xB3 xC0

Allocated proc. pA0 pB0 pB1 pB2 pB3 pC0

Duration dA dB dC

Precedence ConstraintsxB0 ≥ (xA0 + dA)

Mutual Exclusion Constraintsif (pB1 = pB2) then

xB1 ≥ (xB2 + dB) ∨ xB2 ≥ (xB1 + dB)

Latency CostLatency = (xC0 + dC)

Tendulkar Mapping/scheduling for many-core 20 / 52

Page 57: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Encoding deployment with constraints

A0

B1

B0

B2

B3

C0

Task Graph

Actor A B CTasks A0 B0 B1 B2 B3 C0

Description VariablesStart time xA0 xB0 xB1 xB2 xB3 xC0

Allocated proc. pA0 pB0 pB1 pB2 pB3 pC0

Duration dA dB dC

Precedence ConstraintsxB0 ≥ (xA0 + dA)

Mutual Exclusion Constraintsif (pB1 = pB2) then

xB1 ≥ (xB2 + dB) ∨ xB2 ≥ (xB1 + dB)

Latency CostLatency = (xC0 + dC)

Tendulkar Mapping/scheduling for many-core 20 / 52

Page 58: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Encoding deployment with constraints

A0

B1

B0

B2

B3

C0

Task Graph

Actor A B CTasks A0 B0 B1 B2 B3 C0

Description VariablesStart time xA0 xB0 xB1 xB2 xB3 xC0

Allocated proc. pA0 pB0 pB1 pB2 pB3 pC0

Duration dA dB dC

Precedence ConstraintsxB0 ≥ (xA0 + dA)

Mutual Exclusion Constraintsif (pB1 = pB2) then

xB1 ≥ (xB2 + dB) ∨ xB2 ≥ (xB1 + dB)

Latency CostLatency = (xC0 + dC)

Tendulkar Mapping/scheduling for many-core 20 / 52

Page 59: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Encoding deployment with constraints

A0

B1

B0

B2

B3

C0

Task Graph

Actor A B CTasks A0 B0 B1 B2 B3 C0

Description VariablesStart time xA0 xB0 xB1 xB2 xB3 xC0

Allocated proc. pA0 pB0 pB1 pB2 pB3 pC0

Duration dA dB dC

Precedence ConstraintsxB0 ≥ (xA0 + dA)

Mutual Exclusion Constraintsif (pB1 = pB2) then

xB1 ≥ (xB2 + dB) ∨ xB2 ≥ (xB1 + dB)

Latency CostLatency = (xC0 + dC)

Tendulkar Mapping/scheduling for many-core 20 / 52

Page 60: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Encoding deployment with constraints

A0

B1

B0

B2

B3

C0

Task Graph

Actor A B CTasks A0 B0 B1 B2 B3 C0

Description VariablesStart time xA0 xB0 xB1 xB2 xB3 xC0

Allocated proc. pA0 pB0 pB1 pB2 pB3 pC0

Duration dA dB dC

Precedence ConstraintsxB0 ≥ (xA0 + dA)

Mutual Exclusion Constraintsif (pB1 = pB2) then

xB1 ≥ (xB2 + dB) ∨ xB2 ≥ (xB1 + dB)

Latency CostLatency = (xC0 + dC)

Tendulkar Mapping/scheduling for many-core 20 / 52

A0

B0

Time

Pro

cess

ors

Page 61: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Encoding deployment with constraints

A0

B1

B0

B2

B3

C0

Task Graph

Actor A B CTasks A0 B0 B1 B2 B3 C0

Description VariablesStart time xA0 xB0 xB1 xB2 xB3 xC0

Allocated proc. pA0 pB0 pB1 pB2 pB3 pC0

Duration dA dB dC

Precedence ConstraintsxB0 ≥ (xA0 + dA)

Mutual Exclusion Constraintsif (pB1 = pB2) then

xB1 ≥ (xB2 + dB) ∨ xB2 ≥ (xB1 + dB)

Latency CostLatency = (xC0 + dC)

Tendulkar Mapping/scheduling for many-core 20 / 52

B2 B1

TimeP

roce

ssor

s

OR

B1 B2

Time

Pro

cess

ors

Page 62: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Encoding deployment with constraints

A0

B1

B0

B2

B3

C0

Task Graph

Actor A B CTasks A0 B0 B1 B2 B3 C0

Description VariablesStart time xA0 xB0 xB1 xB2 xB3 xC0

Allocated proc. pA0 pB0 pB1 pB2 pB3 pC0

Duration dA dB dC

Precedence ConstraintsxB0 ≥ (xA0 + dA)

Mutual Exclusion Constraintsif (pB1 = pB2) then

xB1 ≥ (xB2 + dB) ∨ xB2 ≥ (xB1 + dB)

Latency CostLatency = (xC0 + dC)

Tendulkar Mapping/scheduling for many-core 20 / 52

A0 B0 B2

B1 B3 C0

Latency

Time

Pro

cess

ors

Page 63: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Multi-criteria Problem

A0 B0 B2

B1 B3 C0

Time

P1

P2

Latency = 4#Proc = 2

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3

P4

Latency = 3#Proc = 4

Latency

Pro

cess

ors

(4,2)

(3,4)Pareto Set

Tendulkar Mapping/scheduling for many-core 21 / 52

Page 64: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Multi-criteria Problem

A0 B0 B2

B1 B3 C0

Time

P1

P2

Latency = 4#Proc = 2

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3

P4

Latency = 3#Proc = 4

Latency

Pro

cess

ors

(4,2)

(3,4)

Pareto Set

Tendulkar Mapping/scheduling for many-core 21 / 52

Page 65: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Multi-criteria Problem

A0 B0 B2

B1 B3 C0

Time

P1

P2

Latency = 4#Proc = 2

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3

P4

Latency = 3#Proc = 4

Latency

Pro

cess

ors

(4,2)

(3,4)

Pareto Set

Tendulkar Mapping/scheduling for many-core 21 / 52

Conflicting Criteria

Page 66: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Multi-criteria Problem

A0 B0 B2

B1 B3 C0

Time

P1

P2

Latency = 4#Proc = 2

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3

P4

Latency = 3#Proc = 4

Latency

Pro

cess

ors

(4,2)

(3,4)Pareto Set

Tendulkar Mapping/scheduling for many-core 21 / 52

Page 67: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Problem Monotonicity

Upper Bound UpperB

ound

Latency

Pro

cess

ors

Tendulkar Mapping/scheduling for many-core 22 / 52

Page 68: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Problem Monotonicity

A0 B0 B2

B1 B3 C0

Time

P1

P2

Latency = 4#Proc = 2

Upper Bound UpperB

ound

Latency

Pro

cess

ors

Tendulkar Mapping/scheduling for many-core 22 / 52

Page 69: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Problem Monotonicity

A0 B0 B2

B1 B3 C0

Time

P1

P2

Latency = 4#Proc = 2

A0

B0 B2

B1 B3 C0

Time

P1

P2

P3

Latency = 5#Proc = 3

Upper Bound UpperB

ound

Latency

Pro

cess

ors

Tendulkar Mapping/scheduling for many-core 22 / 52

Page 70: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Problem Monotonicity

A0 B0 B2

B1 B3 C0

Time

P1

P2

Latency = 4#Proc = 2

A0

B0 B2

B1 B3 C0

Time

P1

P2

P3

Latency = 5#Proc = 3

Upper Bound UpperB

ound

Latency

Pro

cess

ors

SAT

Tendulkar Mapping/scheduling for many-core 22 / 52

Page 71: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Problem Monotonicity

Latency = 4#Proc = 2

Not Possible

Upper Bound UpperB

ound

Latency

Pro

cess

ors

Tendulkar Mapping/scheduling for many-core 22 / 52

Page 72: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Problem Monotonicity

Latency = 4#Proc = 2

Not Possible

Latency = 2#Proc = 1

Also Not Possible

Upper Bound UpperB

ound

Latency

Pro

cess

ors

Tendulkar Mapping/scheduling for many-core 22 / 52

Page 73: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Problem Monotonicity

Latency = 4#Proc = 2

Not Possible

Latency = 2#Proc = 1

Also Not Possible

Upper Bound UpperB

ound

Latency

Pro

cess

ors

UNSAT

Tendulkar Mapping/scheduling for many-core 22 / 52

Page 74: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design Space Exploration

Split-join Graph

SMT Constraints SMT Solver

Design SpaceExploration Algorithm

costconstraints

solutions

(x1, y1)

SAT

(x2, y2)

UNSAT

(x3, y3)

?TIMEOUT

Timeout:Cannot decide SAT / UNSAT in a given TIME-BUDGET.

Tendulkar Mapping/scheduling for many-core 23 / 52

Page 75: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design Space Exploration

Split-join Graph

SMT Constraints

SMT Solver

Design SpaceExploration Algorithm

costconstraints

solutions

(x1, y1)

SAT

(x2, y2)

UNSAT

(x3, y3)

?TIMEOUT

Timeout:Cannot decide SAT / UNSAT in a given TIME-BUDGET.

Tendulkar Mapping/scheduling for many-core 23 / 52

Page 76: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design Space Exploration

Split-join Graph

SMT Constraints SMT Solver

Design SpaceExploration Algorithm

costconstraints

solutions

(x1, y1)

SAT

(x2, y2)

UNSAT

(x3, y3)

?TIMEOUT

Timeout:Cannot decide SAT / UNSAT in a given TIME-BUDGET.

Tendulkar Mapping/scheduling for many-core 23 / 52

Page 77: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design Space Exploration

Split-join Graph

SMT Constraints SMT Solver

Design SpaceExploration Algorithm

costconstraints

solutions

(x1, y1)

SAT

(x2, y2)

UNSAT

(x3, y3)

?TIMEOUT

Timeout:Cannot decide SAT / UNSAT in a given TIME-BUDGET.

Tendulkar Mapping/scheduling for many-core 23 / 52

Page 78: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design Space Exploration

Split-join Graph

SMT Constraints SMT Solver

Design SpaceExploration Algorithm

costconstraints

solutions

(x1, y1)

SAT

(x2, y2)

UNSAT

(x3, y3)

?TIMEOUT

Timeout:Cannot decide SAT / UNSAT in a given TIME-BUDGET.

Tendulkar Mapping/scheduling for many-core 23 / 52

Page 79: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design Space Exploration

Split-join Graph

SMT Constraints SMT Solver

Design SpaceExploration Algorithm

costconstraints

solutions

(x1, y1)

SAT

(x2, y2)

UNSAT

(x3, y3)

?TIMEOUT

Timeout:Cannot decide SAT / UNSAT in a given TIME-BUDGET.

Tendulkar Mapping/scheduling for many-core 23 / 52

Page 80: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design Space Exploration

Split-join Graph

SMT Constraints SMT Solver

Design SpaceExploration Algorithm

costconstraints

solutions

(x1, y1)

SAT

(x2, y2)

UNSAT

(x3, y3)

?TIMEOUT

Timeout:Cannot decide SAT / UNSAT in a given TIME-BUDGET.

Tendulkar Mapping/scheduling for many-core 23 / 52

Page 81: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design Space Exploration

Split-join Graph

SMT Constraints SMT Solver

Design SpaceExploration Algorithm

costconstraints

solutions

(x1, y1)

SAT

(x2, y2)

UNSAT

(x3, y3)

?TIMEOUT

Timeout:Cannot decide SAT / UNSAT in a given TIME-BUDGET.

Tendulkar Mapping/scheduling for many-core 23 / 52

Page 82: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design Space Exploration

Split-join Graph

SMT Constraints SMT Solver

Design SpaceExploration Algorithm

costconstraints

solutions

(x1, y1)

SAT

(x2, y2)

UNSAT

(x3, y3)

?TIMEOUT

Timeout:Cannot decide SAT / UNSAT in a given TIME-BUDGET.

Tendulkar Mapping/scheduling for many-core 23 / 52

Page 83: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Exploration Algorithm

Divide cost space using grids

One SMT query per point on the grid

Finer grid after every iteration

Don’t query in known area

sat points unsat points not yet explored points

Tendulkar Mapping/scheduling for many-core 24 / 52

Page 84: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Exploration Algorithm

Divide cost space using grids

One SMT query per point on the grid

Finer grid after every iteration

Don’t query in known area

sat points unsat points not yet explored points

Tendulkar Mapping/scheduling for many-core 24 / 52

Page 85: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Exploration Algorithm

Divide cost space using grids

One SMT query per point on the grid

Finer grid after every iteration

Don’t query in known area

sat points unsat points not yet explored points

Tendulkar Mapping/scheduling for many-core 24 / 52

Page 86: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Exploration Algorithm

Divide cost space using grids

One SMT query per point on the grid

Finer grid after every iteration

Don’t query in known area

sat points unsat points not yet explored points

Tendulkar Mapping/scheduling for many-core 24 / 52

Page 87: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Exploration Algorithm

Divide cost space using grids

One SMT query per point on the grid

Finer grid after every iteration

Don’t query in known area

sat points unsat points not yet explored points

Tendulkar Mapping/scheduling for many-core 24 / 52

Page 88: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Overview

1 Motivation

2 Application Model

3 Deployment using SMT

4 Symmetry elimination

5 Distributed memory scheduling

6 SMT Solving

7 Conclusions

Tendulkar Mapping/scheduling for many-core 25 / 52

Page 89: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

all instances of actor C are similar (symmetric)

No change in latency !

Huge number of such symmetric solutions

Add constraints to eliminate all but one

Tendulkar Mapping/scheduling for many-core 26 / 52

Page 90: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

all instances of actor C are similar (symmetric)

No change in latency !

Huge number of such symmetric solutions

Add constraints to eliminate all but one

Tendulkar Mapping/scheduling for many-core 26 / 52

Page 91: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

all instances of actor C are similar (symmetric)

No change in latency !

Huge number of such symmetric solutions

Add constraints to eliminate all but one

Tendulkar Mapping/scheduling for many-core 26 / 52

Page 92: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

all instances of actor C are similar (symmetric)

No change in latency !

Huge number of such symmetric solutions

Add constraints to eliminate all but one

Tendulkar Mapping/scheduling for many-core 26 / 52

Page 93: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

all instances of actor C are similar (symmetric)

No change in latency !

Huge number of such symmetric solutions

Add constraints to eliminate all but one

Tendulkar Mapping/scheduling for many-core 26 / 52

Page 94: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

all instances of actor C are similar (symmetric)

No change in latency !

Huge number of such symmetric solutions

Add constraints to eliminate all but one

Tendulkar Mapping/scheduling for many-core 26 / 52

Page 95: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

all instances of actor C are similar (symmetric)

No change in latency !

Huge number of such symmetric solutions

Add constraints to eliminate all but one

Tendulkar Mapping/scheduling for many-core 26 / 52

Page 96: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

C10

C11

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

a lexicographic schedule

B1 C01 C10 C11 D1 E

A0 B0 C00 D0

time

P1

P2

lexicographic order : C00 � C01 � C10 � C11

enforce lexicographic order in schedule:s(u) ≤ s(u′) for u� u′

s(C00) ≤ s(C01) ≤ s(C10) ≤ s(C11)

Tendulkar Mapping/scheduling for many-core 27 / 52

Page 97: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

C10

C11

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

a lexicographic schedule

B1 C01 C10 C11 D1 E

A0 B0 C00 D0

time

P1

P2

lexicographic order : C00 � C01 � C10 � C11

enforce lexicographic order in schedule:s(u) ≤ s(u′) for u� u′

s(C00) ≤ s(C01) ≤ s(C10) ≤ s(C11)

Tendulkar Mapping/scheduling for many-core 27 / 52

Page 98: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

C10

C11

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

a lexicographic schedule

B1 C01 C10 C11 D1 E

A0 B0 C00 D0

time

P1

P2

lexicographic order : C00 � C01 � C10 � C11

enforce lexicographic order in schedule:s(u) ≤ s(u′) for u� u′

s(C00) ≤ s(C01) ≤ s(C10) ≤ s(C11)

Tendulkar Mapping/scheduling for many-core 27 / 52

Page 99: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

C10

C11

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

a lexicographic schedule

B1 C01 C10 C11 D1 E

A0 B0 C00 D0

time

P1

P2

lexicographic order : C00 � C01 � C10 � C11

enforce lexicographic order in schedule:s(u) ≤ s(u′) for u� u′

s(C00) ≤ s(C01) ≤ s(C10) ≤ s(C11)

Tendulkar Mapping/scheduling for many-core 27 / 52

Page 100: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

C00

C01

C10

C11

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

a lexicographic schedule

B1 C01 C10 C11 D1 E

A0 B0 C00 D0

time

P1

P2

lexicographic order : C00 � C01 � C10 � C11

enforce lexicographic order in schedule:s(u) ≤ s(u′) for u� u′

s(C00) ≤ s(C01) ≤ s(C10) ≤ s(C11)

Tendulkar Mapping/scheduling for many-core 27 / 52

Page 101: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry : Theorem

Lexicographic Schedule

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

Theorem : Every group has a lexicographic schedule

Corollary : No feasible cost is lost

Tendulkar Mapping/scheduling for many-core 28 / 52

Page 102: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry : Theorem

Lexicographic Schedule

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

Theorem : Every group has a lexicographic schedule

Corollary : No feasible cost is lost

Tendulkar Mapping/scheduling for many-core 28 / 52

Page 103: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry : Theorem

Lexicographic Schedule

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

Theorem : Every group has a lexicographic schedule

Corollary : No feasible cost is lost

Tendulkar Mapping/scheduling for many-core 28 / 52

Page 104: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry : Theorem

Lexicographic Schedule

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

Theorem : Every group has a lexicographic schedule

Corollary : No feasible cost is lost

Tendulkar Mapping/scheduling for many-core 28 / 52

Page 105: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Task Symmetry : Theorem

Lexicographic Schedule

a schedule

B1 C10 C01 C00 D0 E0

A0 B0 C11 D1

time

P1

P2

C01 C00

a permuted schedule

B1 C10 C00 C01 D0 E0

A0 B0 C11 D1

time

P1

P2

C00 C01

Theorem : Every group has a lexicographic schedule

Corollary : No feasible cost is lost

Tendulkar Mapping/scheduling for many-core 28 / 52

Page 106: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Processor Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

A0

B1

B0 C00

C10

C01

C11

D0

D1

E0

Time

P1

P2

schedule

A0

B1

B0 C00

C10

C01

C11

D0

D1

E0

Time

P1

P2

swapped P1 and P2

Tendulkar Mapping/scheduling for many-core 29 / 52

Page 107: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Processor Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

A0

B1

B0 C00

C10

C01

C11

D0

D1

E0

Time

P1

P2

schedule

A0

B1

B0 C00

C10

C01

C11

D0

D1

E0

Time

P1

P2

swapped P1 and P2

Tendulkar Mapping/scheduling for many-core 29 / 52

Page 108: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Processor Symmetry

C00

C01

C10

C11

B0

B1

A0

D0

D1

E0

task graph

A0

B1

B0 C00

C10

C01

C11

D0

D1

E0

Time

P1

P2

schedule

A0

B1

B0 C00

C10

C01

C11

D0

D1

E0

Time

P1

P2

swapped P1 and P2

Tendulkar Mapping/scheduling for many-core 29 / 52

Page 109: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Pareto Exploration

0 5 10 15 20 25 30

Latency

0

5

10

15

20

25

30

Pro

cess

ors

Sat Points Unsat Points Pareto Curve

without symmetry breaking

0 5 10 15 20 25 30

Latency

0

5

10

15

20

25

30

Pro

cess

ors

Sat Points Unsat Points Pareto Curve

with symmetry breaking

0 8 16 24 32 40 48 56

Takes no time Times out

Exploration : Processors vs Latency α = 30

Solver PerformanceTimeouts reduce !The gap between SAT and UNSAT points is smaller.

Tendulkar Mapping/scheduling for many-core 30 / 52

Page 110: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Pareto Exploration

0 5 10 15 20 25 30

Latency

0

5

10

15

20

25

30

Pro

cess

ors

Sat Points Unsat Points Pareto Curve

without symmetry breaking

0 5 10 15 20 25 30

Latency

0

5

10

15

20

25

30

Pro

cess

ors

Sat Points Unsat Points Pareto Curve

with symmetry breaking

0 8 16 24 32 40 48 56

Takes no time Times out

Exploration : Processors vs Latency α = 30

Solver PerformanceTimeouts reduce !The gap between SAT and UNSAT points is smaller.

Tendulkar Mapping/scheduling for many-core 30 / 52

Page 111: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Pareto Exploration

0 5 10 15 20 25 30

Latency

0

5

10

15

20

25

30

Pro

cess

ors

Sat Points Unsat Points Pareto Curve

without symmetry breaking

0 5 10 15 20 25 30

Latency

0

5

10

15

20

25

30

Pro

cess

ors

Sat Points Unsat Points Pareto Curve

with symmetry breaking

0 8 16 24 32 40 48 56

Takes no time Times out

Exploration : Processors vs Latency α = 30

Solver PerformanceTimeouts reduce !The gap between SAT and UNSAT points is smaller.

Tendulkar Mapping/scheduling for many-core 30 / 52

Page 112: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Pareto Exploration

0 5 10 15 20 25 30

Latency

0

5

10

15

20

25

30

Pro

cess

ors

Sat Points Unsat Points Pareto Curve

without symmetry breaking

0 5 10 15 20 25 30

Latency

0

5

10

15

20

25

30

Pro

cess

ors

Sat Points Unsat Points Pareto Curve

with symmetry breaking

0 8 16 24 32 40 48 56

Takes no time Times out

Exploration : Processors vs Latency α = 30

Solver PerformanceTimeouts reduce !The gap between SAT and UNSAT points is smaller.Tendulkar Mapping/scheduling for many-core 30 / 52

Page 113: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Video Decoder3D cost space (CL,CP ,CB) exploration, CB - total buffer size

MPEG video decoder:

122 Tasks

Latency(.10 3)

8

12

16

20

24

Buffer

Size

150

200

250

300

350

400

Pro

cess

or

0

20

40

60

80

100

120

140

[5,367,91]

[24,276,1]

[14,276,122]

[14,333,62]

[10,323,122]

[17,182,122]

[7,205,122]

[24,182,1]

[19,182,31]

[10,229,31]

With symmetry Without symmetry

Better Pareto points in same TIME-Budget !

Tendulkar Mapping/scheduling for many-core 31 / 52

Page 114: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Video Decoder3D cost space (CL,CP ,CB) exploration, CB - total buffer size

MPEG video decoder:

122 TasksLatency(.10 3)

8

12

16

20

24

Buffer

Size

150

200

250

300

350

400

Pro

cess

or

0

20

40

60

80

100

120

140

[5,367,91]

[24,276,1]

[14,276,122]

[14,333,62]

[10,323,122]

[17,182,122]

[7,205,122]

[24,182,1]

[19,182,31]

[10,229,31]

With symmetry Without symmetry

Better Pareto points in same TIME-Budget !

Tendulkar Mapping/scheduling for many-core 31 / 52

Page 115: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Video Decoder3D cost space (CL,CP ,CB) exploration, CB - total buffer size

MPEG video decoder:

122 TasksLatency(.10 3)

8

12

16

20

24

Buffer

Size

150

200

250

300

350

400

Pro

cess

or

0

20

40

60

80

100

120

140

[5,367,91]

[24,276,1]

[14,276,122]

[14,333,62]

[10,323,122]

[17,182,122]

[7,205,122]

[24,182,1]

[19,182,31]

[10,229,31]

Latency(.10 3)

8

12

16

20

24

Buffer

Size

150

200

250

300

350

400

Pro

cess

or

0

20

40

60

80

100

120

140

[5,367,91]

[24,276,1]

[14,276,122]

[14,333,62]

[10,323,122]

[17,182,122]

[7,205,122]

[24,182,1]

[19,182,31]

[10,229,31]

Latency(.10 3)

8

12

16

20

24Buff

erSize

150

200

250

300

350

400

Pro

cess

or

0

20

40

60

80

100

120

140

[5,367,91]

[24,276,1]

[14,276,122]

[14,333,62]

[10,323,122]

[17,182,122]

[7,205,122]

[24,182,1]

[19,182,31]

[10,229,31]

With symmetry Without symmetry

Better Pareto points in same TIME-Budget !

Tendulkar Mapping/scheduling for many-core 31 / 52

Page 116: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Video Decoder3D cost space (CL,CP ,CB) exploration, CB - total buffer size

MPEG video decoder:

122 TasksLatency(.10 3)

8

12

16

20

24

Buffer

Size

150

200

250

300

350

400

Pro

cess

or

0

20

40

60

80

100

120

140

[5,367,91]

[24,276,1]

[14,276,122]

[14,333,62]

[10,323,122]

[17,182,122]

[7,205,122]

[24,182,1]

[19,182,31]

[10,229,31]

With symmetry Without symmetry

Better Pareto points

in same TIME-Budget !

Tendulkar Mapping/scheduling for many-core 31 / 52

Page 117: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Video Decoder3D cost space (CL,CP ,CB) exploration, CB - total buffer size

MPEG video decoder:

122 TasksLatency(.10 3)

8

12

16

20

24

Buffer

Size

150

200

250

300

350

400

Pro

cess

or

0

20

40

60

80

100

120

140

[5,367,91]

[24,276,1]

[14,276,122]

[14,333,62]

[10,323,122]

[17,182,122]

[7,205,122]

[24,182,1]

[19,182,31]

[10,229,31]

With symmetry Without symmetry

Better Pareto points in same TIME-Budget !

Tendulkar Mapping/scheduling for many-core 31 / 52

Page 118: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Distributed memory scheduling

So far we ignored the communication costs

For distributed memory, communication needs to be modeled

Tendulkar Mapping/scheduling for many-core 32 / 52

Page 119: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Distributed memory scheduling

So far we ignored the communication costs

For distributed memory, communication needs to be modeled

Tendulkar Mapping/scheduling for many-core 32 / 52

Page 120: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Overview

1 Motivation

2 Application Model

3 Deployment using SMT

4 Symmetry elimination

5 Distributed memory scheduling

6 SMT Solving

7 Conclusions

Tendulkar Mapping/scheduling for many-core 33 / 52

Page 121: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Kalray MPPA-256512KB

QuadCore

USMC PCIe inter laken DDR

GPIOs

Eth

Interlaken

Quad

Core

512K

B

Eth

Interlaken

Quad

Core

512K

B

DDR

GPIOs PCIe interlaken

QuadCore

512KB

SharedMemory

D-NocRouter

DMA

syst.core

C-NocRouter

C-NoC

DSU

P0 P1

P2 P3

P8 P9

P10 P11

P4 P5

P6 P7

P12 P13

P14 P15

16 compute clusters16 processors2 MB Shared MemoryDMA

Toroidal 2D network

Tendulkar Mapping/scheduling for many-core 34 / 52

Page 122: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Kalray MPPA-256512KB

QuadCore

USMC PCIe inter laken DDR

GPIOs

Eth

Interlaken

Quad

Core

512K

B

Eth

Interlaken

Quad

Core

512K

B

DDR

GPIOs PCIe interlaken

QuadCore

512KB

SharedMemory

D-NocRouter

DMA

syst.core

C-NocRouter

C-NoC

DSU

P0 P1

P2 P3

P8 P9

P10 P11

P4 P5

P6 P7

P12 P13

P14 P15

16 compute clusters

16 processors2 MB Shared MemoryDMA

Toroidal 2D network

Tendulkar Mapping/scheduling for many-core 34 / 52

Page 123: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Kalray MPPA-256512KB

QuadCore

USMC PCIe inter laken DDR

GPIOs

Eth

Interlaken

Quad

Core

512K

B

Eth

Interlaken

Quad

Core

512K

B

DDR

GPIOs PCIe interlaken

QuadCore

512KB

SharedMemory

D-NocRouter

DMA

syst.core

C-NocRouter

C-NoC

DSU

P0 P1

P2 P3

P8 P9

P10 P11

P4 P5

P6 P7

P12 P13

P14 P15

16 compute clusters

16 processors2 MB Shared MemoryDMA

Toroidal 2D network

Tendulkar Mapping/scheduling for many-core 34 / 52

Page 124: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Kalray MPPA-256512KB

QuadCore

USMC PCIe inter laken DDR

GPIOs

Eth

Interlaken

Quad

Core

512K

B

Eth

Interlaken

Quad

Core

512K

B

DDR

GPIOs PCIe interlaken

QuadCore

512KB

SharedMemory

D-NocRouter

DMA

syst.core

C-NocRouter

C-NoC

DSU

P0 P1

P2 P3

P8 P9

P10 P11

P4 P5

P6 P7

P12 P13

P14 P15

16 compute clusters16 processors

2 MB Shared MemoryDMA

Toroidal 2D network

Tendulkar Mapping/scheduling for many-core 34 / 52

Page 125: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Kalray MPPA-256512KB

QuadCore

USMC PCIe inter laken DDR

GPIOs

Eth

Interlaken

Quad

Core

512K

B

Eth

Interlaken

Quad

Core

512K

B

DDR

GPIOs PCIe interlaken

QuadCore

512KB

SharedMemory

D-NocRouter

DMA

syst.core

C-NocRouter

C-NoC

DSU

P0 P1

P2 P3

P8 P9

P10 P11

P4 P5

P6 P7

P12 P13

P14 P15

16 compute clusters16 processors2 MB Shared Memory

DMA

Toroidal 2D network

Tendulkar Mapping/scheduling for many-core 34 / 52

Page 126: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Kalray MPPA-256512KB

QuadCore

USMC PCIe inter laken DDR

GPIOs

Eth

Interlaken

Quad

Core

512K

B

Eth

Interlaken

Quad

Core

512K

B

DDR

GPIOs PCIe interlaken

QuadCore

512KB

SharedMemory

D-NocRouter

DMA

syst.core

C-NocRouter

C-NoC

DSU

P0 P1

P2 P3

P8 P9

P10 P11

P4 P5

P6 P7

P12 P13

P14 P15

16 compute clusters16 processors2 MB Shared MemoryDMA

Toroidal 2D network

Tendulkar Mapping/scheduling for many-core 34 / 52

Page 127: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Kalray MPPA-256512KB

QuadCore

USMC PCIe inter laken DDR

GPIOs

Eth

Interlaken

Quad

Core

512K

B

Eth

Interlaken

Quad

Core

512K

B

DDR

GPIOs PCIe interlaken

QuadCore

512KB

SharedMemory

D-NocRouter

DMA

syst.core

C-NocRouter

C-NoC

DSU

P0 P1

P2 P3

P8 P9

P10 P11

P4 P5

P6 P7

P12 P13

P14 P15

16 compute clusters16 processors2 MB Shared MemoryDMA

Toroidal 2D network

Tendulkar Mapping/scheduling for many-core 34 / 52

Page 128: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

The problem?

Which cluster to allocate?

Which processor to allocate?

Connected tasks in same or different cluster?

Communicating tasks if to be added, which DMA?

And the constraintsPrecedence

Mutual Exclusion

Costs

For 10 tasks, 256 processors,

1.20892582× 1024 potential solutions!

Tendulkar Mapping/scheduling for many-core 35 / 52

Page 129: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

The problem?

Which cluster to allocate?

Which processor to allocate?

Connected tasks in same or different cluster?

Communicating tasks if to be added, which DMA?

And the constraintsPrecedence

Mutual Exclusion

Costs

For 10 tasks, 256 processors,

1.20892582× 1024 potential solutions!

Tendulkar Mapping/scheduling for many-core 35 / 52

Split the problem into sub-problems.

Page 130: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design FlowApplication

Graph

Partitioning

Placement

Multi-clusterScheduling

Tendulkar Mapping/scheduling for many-core 36 / 52

Page 131: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design FlowApplication

Graph

Partitioning

Placement

Multi-clusterScheduling

A B

C

D

E F

Group the Actors

Tendulkar Mapping/scheduling for many-core 36 / 52

Page 132: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design FlowApplication

Graph

Partitioning

Placement

Multi-clusterScheduling

A B

C

D

E F

Group the Actors

GoalsLoad balance the groupsMinimize data exchange

Tendulkar Mapping/scheduling for many-core 36 / 52

Page 133: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design FlowApplication

Graph

Partitioning

Placement

Multi-clusterScheduling

A B

C

D

E F

Place the Groups

Tendulkar Mapping/scheduling for many-core 36 / 52

Page 134: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design FlowApplication

Graph

Partitioning

Placement

Multi-clusterScheduling

A B

C

D

E F

Place the Groups

GoalsMinimize distance between communicating groups

Tendulkar Mapping/scheduling for many-core 36 / 52

Page 135: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design FlowApplication

Graph

Partitioning

Placement

Multi-clusterScheduling

C0

D0

C1

D1

B0

B1

A0

E0

E1

F0

A0 B0

B1

C0

C1

D0

D1

E0

E1

F0

transfer

Time

Clu

ster

0C

lust

er1

P1

P2

DMA0

P1

P2

ScheduleTasks

Transfer

Tendulkar Mapping/scheduling for many-core 36 / 52

Page 136: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Design FlowApplication

Graph

Partitioning

Placement

Multi-clusterScheduling

C0

D0

C1

D1

B0

B1

A0

E0

E1

F0

A0 B0

B1

C0

C1

D0

D1

E0

E1

F0

transfer

Time

Clu

ster

0C

lust

er1

P1

P2

DMA0

P1

P2

ScheduleTasks

Transfer

GoalsMinimize LatencyMinimize Buffer size

Tendulkar Mapping/scheduling for many-core 36 / 52

Page 137: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Output of Design Flow

Tasks and Transfers

Cluster Mapping

Processor and DMA Mapping

Start time

EdgesCommunication buffer size

ApplicationLatency

Tendulkar Mapping/scheduling for many-core 37 / 52

Page 138: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Output of Design Flow

Tasks and TransfersCluster Mapping

Processor and DMA Mapping

Start time

EdgesCommunication buffer size

ApplicationLatency

Tendulkar Mapping/scheduling for many-core 37 / 52

Page 139: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Output of Design Flow

Tasks and TransfersCluster Mapping

Processor and DMA Mapping

Start time

EdgesCommunication buffer size

ApplicationLatency

Tendulkar Mapping/scheduling for many-core 37 / 52

Page 140: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Output of Design Flow

Tasks and TransfersCluster Mapping

Processor and DMA Mapping

Start time

EdgesCommunication buffer size

ApplicationLatency

Tendulkar Mapping/scheduling for many-core 37 / 52

Page 141: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Output of Design Flow

Tasks and TransfersCluster Mapping

Processor and DMA Mapping

Start time

EdgesCommunication buffer size

ApplicationLatency

Tendulkar Mapping/scheduling for many-core 37 / 52

Page 142: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Output of Design Flow

Tasks and TransfersCluster Mapping

Processor and DMA Mapping

Start time

EdgesCommunication buffer size

ApplicationLatency

Tendulkar Mapping/scheduling for many-core 37 / 52

Page 143: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

DMA Model

Tasks communicating via DMA:

A I G B

DMA

I G

A

I

I

G

B

Time

Clu

ster

0C

lust

er1

P1

DMA0

P1

Task Description Resources used Task duration

I Initialization Processor and DMA Constant

G Network Transfer Only DMA Transfer size dependent

Tendulkar Mapping/scheduling for many-core 38 / 52

Page 144: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

DMA Model

Tasks communicating via DMA:

A I G B

DMA

I G

A

I

I

G

B

Time

Clu

ster

0C

lust

er1

P1

DMA0

P1

Task Description Resources used Task duration

I Initialization Processor and DMA Constant

G Network Transfer Only DMA Transfer size dependent

Tendulkar Mapping/scheduling for many-core 38 / 52

Page 145: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

DMA Model

Tasks communicating via DMA:

A I G B

DMA

I

G

A

I

I

G

B

Time

Clu

ster

0C

lust

er1

P1

DMA0

P1

Task Description Resources used Task duration

I Initialization Processor and DMA Constant

G Network Transfer Only DMA Transfer size dependent

Tendulkar Mapping/scheduling for many-core 38 / 52

Page 146: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

DMA Model

Tasks communicating via DMA:

A I G B

DMA

I GA

I

I

G

B

Time

Clu

ster

0C

lust

er1

P1

DMA0

P1

Task Description Resources used Task duration

I Initialization Processor and DMA Constant

G Network Transfer Only DMA Transfer size dependent

Tendulkar Mapping/scheduling for many-core 38 / 52

Page 147: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Model TransformationAn example application graph:

A B[α, ω]

Partition-Aware graph:

A Iwr Gwr Bewt : [1, w

↑] ewn : [1] ert : [α, ω]

Buffer-Aware graph:

A Iwr Gwr

Fst

B

IrdGrd

ewt : [1, w↑] ewn : [1] ert : [α, ω]

ews: [1]

ewb : [1, 0, b(e

wt )]

e rs:[1]

ern : [1]

erb : [α −1

, 0, b(ert )]

DMA : Data

DMA : flow-controlDMA-Completion

Tendulkar Mapping/scheduling for many-core 39 / 52

Page 148: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Model TransformationAn example application graph:

A B[α, ω]

Partition-Aware graph:

A Iwr Gwr Bewt : [1, w

↑] ewn : [1] ert : [α, ω]

Buffer-Aware graph:

A Iwr Gwr

Fst

B

IrdGrd

ewt : [1, w↑] ewn : [1] ert : [α, ω]

ews: [1]

ewb : [1, 0, b(e

wt )]

e rs:[1]

ern : [1]

erb : [α −1

, 0, b(ert )]

DMA : Data

DMA : flow-controlDMA-Completion

Tendulkar Mapping/scheduling for many-core 39 / 52

Page 149: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Model TransformationAn example application graph:

A B[α, ω]

Partition-Aware graph:

A Iwr Gwr Bewt : [1, w

↑] ewn : [1] ert : [α, ω]

Buffer-Aware graph:

A Iwr Gwr

Fst

B

IrdGrd

ewt : [1, w↑] ewn : [1] ert : [α, ω]

ews: [1]

ewb : [1, 0, b(e

wt )]

e rs:[1]

ern : [1]

erb : [α −1

, 0, b(ert )]

DMA : Data

DMA : flow-controlDMA-Completion

Tendulkar Mapping/scheduling for many-core 39 / 52

Page 150: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Model TransformationAn example application graph:

A B[α, ω]

Partition-Aware graph:

A Iwr Gwr Bewt : [1, w

↑] ewn : [1] ert : [α, ω]

Buffer-Aware graph:

A Iwr Gwr

Fst

B

IrdGrd

ewt : [1, w↑] ewn : [1] ert : [α, ω]

ews: [1]

ewb : [1, 0, b(e

wt )]

e rs:[1]

ern : [1]

erb : [α −1

, 0, b(ert )]

DMA : Data

DMA : flow-controlDMA-Completion

Tendulkar Mapping/scheduling for many-core 39 / 52

Page 151: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Model TransformationAn example application graph:

A B[α, ω]

Partition-Aware graph:

A Iwr Gwr Bewt : [1, w

↑] ewn : [1] ert : [α, ω]

Buffer-Aware graph:

A Iwr Gwr

Fst

B

IrdGrd

ewt : [1, w↑] ewn : [1] ert : [α, ω]

ews: [1]

ewb : [1, 0, b(e

wt )]

e rs:[1]

ern : [1]

erb : [α −1

, 0, b(ert )]

DMA : Data

DMA : flow-controlDMA-Completion

Tendulkar Mapping/scheduling for many-core 39 / 52

Page 152: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Model TransformationAn example application graph:

A B[α, ω]

Partition-Aware graph:

A Iwr Gwr Bewt : [1, w

↑] ewn : [1] ert : [α, ω]

Buffer-Aware graph:

A Iwr Gwr

Fst

B

IrdGrd

ewt : [1, w↑] ewn : [1] ert : [α, ω]

ews: [1]

ewb : [1, 0, b(e

wt )]

e rs:[1]

ern : [1]

erb : [α −1

, 0, b(ert )]

DMA : Data

DMA : flow-control

DMA-Completion

Tendulkar Mapping/scheduling for many-core 39 / 52

Page 153: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Model TransformationAn example application graph:

A B[α, ω]

Partition-Aware graph:

A Iwr Gwr Bewt : [1, w

↑] ewn : [1] ert : [α, ω]

Buffer-Aware graph:

A Iwr Gwr

Fst

B

IrdGrd

ewt : [1, w↑] ewn : [1] ert : [α, ω]

ews: [1]

ewb : [1, 0, b(e

wt )]

e rs:[1]

ern : [1]

erb : [α −1

, 0, b(ert )]

DMA : Data

DMA : flow-controlDMA-Completion

Tendulkar Mapping/scheduling for many-core 39 / 52

Page 154: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

VLD IQ/IDCT COLOR

12 1

12

VLD : Variable Length Decoder

IQ / IDCT : Inverse Quantization / Inverse Discrete Cosine Transform

Color : Color Conversion

Tendulkar Mapping/scheduling for many-core 40 / 52

Page 155: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

VLD IQ/IDCT COLOR

12 1

12

VLD : Variable Length Decoder

IQ / IDCT : Inverse Quantization / Inverse Discrete Cosine Transform

Color : Color Conversion

Tendulkar Mapping/scheduling for many-core 40 / 52

Page 156: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

VLD IQ/IDCT COLOR

12 1

12

VLD : Variable Length Decoder

IQ / IDCT : Inverse Quantization / Inverse Discrete Cosine Transform

Color : Color Conversion

Tendulkar Mapping/scheduling for many-core 40 / 52

Page 157: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

VLD IQ/IDCT COLOR

12 1

12

VLD : Variable Length Decoder

IQ / IDCT : Inverse Quantization / Inverse Discrete Cosine Transform

Color : Color Conversion

Tendulkar Mapping/scheduling for many-core 40 / 52

Page 158: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

JPEGDecoder

Partitioning

Placement

Multi-clusterScheduling

Partitioning

Placement

Multi-clusterScheduling

Partitioning Solutions:

Cz : No. of Groups

Cη : Total communication cost

Cτ : Max. workload per group

Allocated group Exploration CostSolution vld iq color Cz Cη Cτ

Ps0 0 1 2 3 12384 424012Ps1 0 0 1 2 2736 758116Ps2 0 0 0 1 0 934288Ps3 0 1 1 2 9648 510276

Scheduling Solutions:

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize

(byt

es)

Ps0 Ps1 Ps2 Ps3

Tendulkar Mapping/scheduling for many-core 41 / 52

Page 159: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

JPEGDecoder

Partitioning

Placement

Multi-clusterScheduling

Partitioning

Placement

Multi-clusterScheduling

Partitioning Solutions:

Cz : No. of Groups

Cη : Total communication cost

Cτ : Max. workload per group

Allocated group Exploration CostSolution vld iq color Cz Cη Cτ

Ps0 0 1 2 3 12384 424012Ps1 0 0 1 2 2736 758116Ps2 0 0 0 1 0 934288Ps3 0 1 1 2 9648 510276

Scheduling Solutions:

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize

(byt

es)

Ps0 Ps1 Ps2 Ps3

Tendulkar Mapping/scheduling for many-core 41 / 52

Page 160: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

JPEGDecoder

Partitioning

Placement

Multi-clusterScheduling

Partitioning

Placement

Multi-clusterScheduling

Partitioning Solutions:

Cz : No. of Groups

Cη : Total communication cost

Cτ : Max. workload per group

Allocated group Exploration CostSolution vld iq color Cz Cη Cτ

Ps0 0 1 2 3 12384 424012Ps1 0 0 1 2 2736 758116Ps2 0 0 0 1 0 934288Ps3 0 1 1 2 9648 510276

Scheduling Solutions:

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize

(byt

es)

Ps0 Ps1 Ps2 Ps3

Tendulkar Mapping/scheduling for many-core 41 / 52

Page 161: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

JPEGDecoder

Partitioning

Placement

Multi-clusterScheduling

Partitioning

Placement

Multi-clusterScheduling

Partitioning Solutions:

Cz : No. of Groups

Cη : Total communication cost

Cτ : Max. workload per group

Allocated group Exploration CostSolution vld iq color Cz Cη Cτ

Ps0 0 1 2 3 12384 424012Ps1 0 0 1 2 2736 758116Ps2 0 0 0 1 0 934288Ps3 0 1 1 2 9648 510276

Scheduling Solutions:

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize

(byt

es)

Ps0 Ps1 Ps2 Ps3

Tendulkar Mapping/scheduling for many-core 41 / 52

Page 162: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

JPEGDecoder

Partitioning

Placement

Multi-clusterScheduling

Partitioning

Placement

Multi-clusterScheduling

Partitioning Solutions:

Cz : No. of Groups

Cη : Total communication cost

Cτ : Max. workload per group

Allocated group Exploration CostSolution vld iq color Cz Cη Cτ

Ps0 0 1 2 3 12384 424012Ps1 0 0 1 2 2736 758116Ps2 0 0 0 1 0 934288Ps3 0 1 1 2 9648 510276

Scheduling Solutions:

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize

(byt

es)

Ps0 Ps1 Ps2 Ps3

Tendulkar Mapping/scheduling for many-core 41 / 52

Page 163: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

JPEGDecoder

Partitioning

Placement

Multi-clusterScheduling

Partitioning

Placement

Multi-clusterScheduling

Partitioning Solutions:

Cz : No. of Groups

Cη : Total communication cost

Cτ : Max. workload per group

Allocated group Exploration CostSolution vld iq color Cz Cη Cτ

Ps0 0 1 2 3 12384 424012Ps1 0 0 1 2 2736 758116Ps2 0 0 0 1 0 934288Ps3 0 1 1 2 9648 510276

Scheduling Solutions:

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize

(byt

es)

Ps0 Ps1 Ps2 Ps3

Tendulkar Mapping/scheduling for many-core 41 / 52

Page 164: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

JPEG decoder latency measured on Kalray platform

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize

(byt

es)

Ps0

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize

(byt

es)

Ps1

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize(

byte

s)

Ps2

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize(

byte

s)

Ps3

model measured-min. measured-max.

Maximum prediction error of 9%

Tendulkar Mapping/scheduling for many-core 42 / 52

Page 165: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

JPEG Decoder Example

JPEG decoder latency measured on Kalray platform

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize

(byt

es)

Ps0

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize

(byt

es)

Ps1

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize(

byte

s)

Ps2

0.4 0.5 0.6 0.7 0.8 0.9 1

·106

1

1.1

1.2

·104

Latency (cycles)

Buf

ferS

ize(

byte

s)

Ps3

model measured-min. measured-max.

Maximum prediction error of 9%

Tendulkar Mapping/scheduling for many-core 42 / 52

Page 166: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

StreamIt Benchmarks

JPEG

Dec.

Beam

Form

er

Inser

tion Sor

t

Merge

Sort

Radix

Sort

Dct1 Dct2 Dct3 Dct4 Dct5 Dct6 Dct7 Dct8

DctCoa

rse

DctFine

Comp.

coun

t

Matrix

Mult.

Fft0

20

40

60

80

100

25

155

7

37

6 4

8

4

8 8

24

7 10

3

6 4

8 8

#Solutions %error

Tendulkar Mapping/scheduling for many-core 43 / 52

Page 167: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

StreamIt Benchmarks

JPEG

Dec.

Beam

Form

er

Inser

tion Sor

t

Merge

Sort

Radix

Sort

Dct1 Dct2 Dct3 Dct4 Dct5 Dct6 Dct7 Dct8

DctCoa

rse

DctFine

Comp.

coun

t

Matrix

Mult.

Fft0

20

40

60

80

100

25

155

7

37

6 4

8

4

8 8

24

7 10

3

6 4

8 8

#Solutions %error

Tendulkar Mapping/scheduling for many-core 43 / 52

Page 168: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Overview

1 Motivation

2 Application Model

3 Deployment using SMT

4 Symmetry elimination

5 Distributed memory scheduling

6 SMT Solving

7 Conclusions

Tendulkar Mapping/scheduling for many-core 44 / 52

Page 169: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

lower bound upper boundoptimal

SAT

UNSAT

Tendulkar Mapping/scheduling for many-core 45 / 52

Page 170: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

lower bound upper boundoptimal SAT

UNSAT

Tendulkar Mapping/scheduling for many-core 45 / 52

Page 171: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

lower bound upper boundoptimal SAT

UNSAT

Tendulkar Mapping/scheduling for many-core 45 / 52

Page 172: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

lower bound upper boundoptimal SAT

UNSAT

Tendulkar Mapping/scheduling for many-core 45 / 52

Page 173: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

lower bound upper boundoptimal SAT

UNSAT

Tendulkar Mapping/scheduling for many-core 45 / 52

Page 174: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

lower bound upper boundoptimal

TIMEOUT

foundoptimal

Tendulkar Mapping/scheduling for many-core 46 / 52

Page 175: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

lower bound upper boundoptimal

TIMEOUT

foundoptimal

Tendulkar Mapping/scheduling for many-core 46 / 52

Page 176: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

lower bound upper boundoptimal

TIMEOUT

foundoptimal

Tendulkar Mapping/scheduling for many-core 46 / 52

Page 177: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

lower bound upper boundoptimal

TIMEOUT

foundoptimal

Tendulkar Mapping/scheduling for many-core 46 / 52

Page 178: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3 Optimized Schedule

Such constraints makes the problem harder for SMT

Tendulkar Mapping/scheduling for many-core 47 / 52

Page 179: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3 Optimized Schedule

Such constraints makes the problem harder for SMT

Tendulkar Mapping/scheduling for many-core 47 / 52

Page 180: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3 Optimized Schedule

Such constraints makes the problem harder for SMT

Tendulkar Mapping/scheduling for many-core 47 / 52

Page 181: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3 Optimized Schedule

Such constraints makes the problem harder for SMT

Tendulkar Mapping/scheduling for many-core 47 / 52

Page 182: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Lessons learnt from SMT solver

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3

A0 B0

B1

B2

B3

C0

Time

P1

P2

P3 Optimized Schedule

Such constraints makes the problem harder for SMT

Tendulkar Mapping/scheduling for many-core 47 / 52

Page 183: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Two-step optimization

Get a loose schedule from the solver

Optimize it for:LatencyProcessors used

Upper Bound UpperB

ound

LatencyP

roce

ssor

s

LooseSolution

Tendulkar Mapping/scheduling for many-core 48 / 52

Page 184: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Two-step optimization

Get a loose schedule from the solver

Optimize it for:LatencyProcessors used

Upper Bound UpperB

ound

LatencyP

roce

ssor

s

LooseSolution

Tendulkar Mapping/scheduling for many-core 48 / 52

Page 185: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Two-step optimization

Get a loose schedule from the solver

Optimize it for:LatencyProcessors used

Upper Bound UpperB

ound

LatencyP

roce

ssor

s

LooseSolution

optimized

Tendulkar Mapping/scheduling for many-core 48 / 52

Page 186: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Two-step optimization

Get a loose schedule from the solver

Optimize it for:LatencyProcessors used

Upper Bound UpperB

ound

LatencyP

roce

ssor

s

LooseSolution

optimized

SAT

Tendulkar Mapping/scheduling for many-core 48 / 52

Page 187: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Overview

1 Motivation

2 Application Model

3 Deployment using SMT

4 Symmetry elimination

5 Distributed memory scheduling

6 SMT Solving

7 Conclusions

Tendulkar Mapping/scheduling for many-core 49 / 52

Page 188: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Conclusions and Future Work

Conclusions:

Symmetry elimination finds better solutions

Combined Optimization with Communication modeling

Automated design flow for distributed memory

Tendulkar Mapping/scheduling for many-core 50 / 52

Page 189: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Conclusions and Future Work

Conclusions:

Symmetry elimination finds better solutions

Combined Optimization with Communication modeling

Automated design flow for distributed memory

Tendulkar Mapping/scheduling for many-core 50 / 52

Page 190: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Conclusions and Future Work

Conclusions:

Symmetry elimination finds better solutions

Combined Optimization with Communication modeling

Automated design flow for distributed memory

Tendulkar Mapping/scheduling for many-core 50 / 52

Page 191: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

References

P. Tendulkar, P. Poplavko, and O. Maler. “Symmetry Breaking forMulti-criteria Mapping and Scheduling on Multicores”. In:FORMATS. 2013

P. Tendulkar, P. Poplavko, I. Galanommatis, and O. Maler.“Many-Core Scheduling of Data Parallel Applications using SMTSolvers”. In: DSD. 2014

P. Tendulkar, P. Poplavko, and O. Maler. Strictly PeriodicScheduling of Acyclic Synchronous Dataflow Graphs using SMTSolvers. Tech. rep. Verimag Research Report, 2014

Tendulkar Mapping/scheduling for many-core 51 / 52

Page 192: imag€¦ · MotivationApplication ModelDeployment using SMTSymmetry eliminationDistributed memory schedulingSMT SolvingConclusions Pareto Exploration 0 5 10 15 20 25 30 Latency 0

Motivation Application Model Deployment using SMT Symmetry elimination Distributed memory scheduling SMT Solving Conclusions

Thank You

Questions?

Tendulkar Mapping/scheduling for many-core 52 / 52