MULTICORE DIGITAL SIGNAL PROCESSING

Karol Desnos (kdesnos@insa-rennes.fr)

Slides from M. Pelcat, K. Desnos, J.-F. Nezan, D. Ménard, M. Raulet, J. Gorin

Previously on MDSPs

Design Challenges for MPSoC-Based Systems

• Exploit architecture parallelism
• Express application parallelism
• Balance computational load on PEs
• Hardware/Software co-design process
• Complex design-space exploration
• Respect constraints
• Predict/guarantee application performance
• Reuse legacy code

Grail of Heterogeneous MPSoCs Programming

[Figure: toolchain diagram. Labels: Algorithm, Architecture, Multicore Compiler, Simulator + Debugger + Profiler, Portable Multicore Program, Multicore Runtime; target platform with Main Proc., PE, Peripherals, and Main Memory.]

Properties

• Synchronous Dataflow (SDF)
  • Data-driven execution: an actor is fired when its input FIFOs contain enough data tokens.

[Figure: SDF graph with actors A, B, C, D, E and token rates (1 or 2) on the edges; single-core schedule on Core1: A B C C D E.]

Source: E. Lee and D. Messerschmitt, “Synchronous data flow”, Proceedings of the IEEE, 1987.
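The data-driven firing rule can be sketched in a few lines of Python. This is only an illustration, not taken from the slides: the two-actor graph, its rates (A produces 2 tokens per firing, B consumes 1), and the helper names `can_fire` and `fire` are assumptions.

```python
from collections import deque

def can_fire(actor, fifos):
    """An actor may fire when every input FIFO holds at least its consumption rate."""
    return all(len(fifos[name]) >= rate for name, rate in actor["inputs"].items())

def fire(actor, fifos):
    """Consume tokens from the input FIFOs, then produce tokens on the output FIFOs."""
    for name, rate in actor["inputs"].items():
        for _ in range(rate):
            fifos[name].popleft()
    for name, rate in actor["outputs"].items():
        fifos[name].extend([actor["name"]] * rate)

# Hypothetical graph: A --(2 produced, 1 consumed)--> B
fifos = {"a_to_b": deque()}
actor_a = {"name": "A", "inputs": {}, "outputs": {"a_to_b": 2}}
actor_b = {"name": "B", "inputs": {"a_to_b": 1}, "outputs": {}}

trace = []
fire(actor_a, fifos)          # A has no input FIFO, so it can always fire
trace.append("A")
while can_fire(actor_b, fifos):
    fire(actor_b, fifos)      # B fires once per available token
    trace.append("B")
print(trace)
```

With these assumed rates, one firing of A enables two firings of B, which is the kind of repetition visible in the schedule "A B C C D E".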

Properties

• Synchronous Dataflow (SDF)
  • Parallelisms: task / data / pipeline / internal

[Figure: the SDF graph (actors A, B, C, D, E) mapped onto Core1, Core2, and Core3, with callouts for task parallelism, data parallelism, pipeline parallelism, and internal parallelism.]

Source: E. Lee and D. Messerschmitt, “Synchronous data flow”, Proceedings of the IEEE, 1987.

Course Outline

• Lecture 1 – Maxime Pelcat
  • Introduction to the course
  • Applications for MDSPs
• Lecture 2 – Karol Desnos
  • Languages and MoCs
  • Programming MPSoCs
  • Dataflow MoCs
• Lecture 3 – Maxime Pelcat
  • Hardware Architectures
• Lecture 4 – Karol Desnos
  • Theoretical Bounds
  • Mapping/Scheduling Strategies
• Lecture 5 – Karol Desnos
  • Lab Session


Theoretical Bounds

Amdahl’s Law

• Developed in 1967 by Gene Amdahl
• A generic performance metric for applications
• Notations
  • x: fraction of the code that is perfectly parallel; the rest is sequential
  • N: number of processing elements (PEs)
  • Speedup S: the acceleration brought by adding cores
• Formulation
  • Ideal speedup for N PEs:
    $S = \dfrac{1}{(1 - x) + \dfrac{x}{N}}$
  • Maximum achievable speedup:
    $S_{max} = \lim_{N \to \infty} S = \dfrac{1}{1 - x}$

Amdahl’s Law

• Example: with 70% of parallel code

[Figure: a single thread runs the 30% of sequential code; as many threads as we want run the 70% of parallel code.]

Amdahl’s Law

• Example: with 70% of parallel code
  • Speedup is limited to 1.0 on 1 core (no kidding!)
  • Speedup is limited to 1.5 on 2 cores
  • Speedup is limited to 2.1 on 4 cores
  • Speedup is limited to 2.6 on 8 cores
  • …

[Figure: speedup (1 to 3.5) versus number of cores (0 to 35); the curve saturates below the maximum speedup of 3.33.]
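The numbers of this example can be checked by evaluating Amdahl's formula directly. A minimal Python sketch; the function names are illustrative and not part of the slides.

```python
def amdahl_speedup(x, n):
    """Ideal speedup on n PEs when a fraction x of the code is perfectly parallel."""
    return 1.0 / ((1.0 - x) + x / n)

def amdahl_max_speedup(x):
    """Limit of the speedup when the number of PEs tends to infinity."""
    return 1.0 / (1.0 - x)

for n in (1, 2, 4, 8):
    print(f"{n} core(s): speedup limited to {amdahl_speedup(0.7, n):.1f}")
print(f"maximum speedup: {amdahl_max_speedup(0.7):.2f}")
```

The sequential 30% dominates quickly: going from 4 to 8 cores only raises the bound from about 2.1 to about 2.6.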

Amdahl’s Law

• Example:
  • Max speedup of 5.0 for 80% of parallel code
  • Max speedup of 3.3 for 70% of parallel code
  • Max speedup of 2.5 for 60% of parallel code

[Figure: speedup (1 to 5) versus number of cores (0 to 35) for the three cases.]

Amdahl’s Law

• Limitations:
  • Inter-process communications are ignored
  • No computation is perfectly parallel
• Amdahl’s law has raised many doubts about multicores
  • Why add more cores if the parallelism of applications limits speedups so much?

Gustafson’s Law

• Developed in 1988 by John Gustafson
• Hypothesis: more cores imply more parallelism
  • Sequential code latency remains constant regardless of the number of PEs
  • Parallel code is increased (by the developer) to fit the number of PEs
• Notations
  • $S + P$: sequential time + parallel time with 1 PE
  • $S + N \cdot P$: sequential time + parallel time with N PEs
  • $x = \dfrac{S}{S + P}$: ratio of sequential time over total time (/!\ ≠ Amdahl /!\)
• Formulation
  • Ideal speedup for N PEs:
    $\mathit{Speedup} = \dfrac{S + N \cdot P}{S + P} = \cdots = N - x \cdot (N - 1)$

Gustafson’s Law

• Example: with 70% of parallel code
  • Speedup is limited to 1.7 on 2 cores (Amdahl: 1.5)
  • Speedup is limited to 3.1 on 4 cores (Amdahl: 2.1)
  • Speedup is limited to 5.9 on 8 cores (Amdahl: 2.6)

[Figure: speedup (1 to 29) versus number of cores (0 to 35).]
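These values follow from Gustafson's formula with x = 0.3, since x is here the sequential ratio (30% sequential for 70% parallel). One way to obtain the closed form is to write S/(S+P) = x and P/(S+P) = 1 − x, so that (S + N·P)/(S + P) = x + N·(1 − x) = N − x·(N − 1). The Python sketch below checks both forms; the function names and the sample times S = 3, P = 7 are illustrative assumptions.

```python
def gustafson_speedup(x_seq, n):
    """Closed form: x_seq is the ratio of sequential time over total time."""
    return n - x_seq * (n - 1)

def scaled_speedup(seq_time, par_time, n):
    """Same quantity computed from the times: (S + N*P) / (S + P)."""
    return (seq_time + n * par_time) / (seq_time + par_time)

for n in (2, 4, 8):
    print(f"{n} cores: {gustafson_speedup(0.3, n):.1f} "
          f"(from times: {scaled_speedup(3, 7, n):.1f})")
```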

Dataflow Speedup

• Maximum speedup is given by finding the critical path
  • Data path whose sum of actor execution times is the largest
• Example (communications not considered):
  • Critical path length = 1 + 6 + 3 + 1 = 11 ms
  • Total work = 23 ms
  • Max speedup = 23 / 11 = 2.09

[Figure: dataflow graph with actors A (1 ms), B (4 ms), C (6 ms), D (3 ms), E (3 ms), F (5 ms), G (1 ms).]
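The bound can be computed as a longest-path search in the actor graph. In the Python sketch below the execution times come from the slide, but the edges are an assumption: the transcript does not preserve the graph topology, so a hypothetical set of edges was chosen whose longest path is A, C, D, G.

```python
from functools import lru_cache

# Execution times in ms, as given on the slide.
exec_time = {"A": 1, "B": 4, "C": 6, "D": 3, "E": 3, "F": 5, "G": 1}

# ASSUMED edges (hypothetical topology, not recoverable from the transcript).
successors = {
    "A": ["B", "C", "F"],
    "B": ["E"],
    "C": ["D"],
    "D": ["G"],
    "E": ["G"],
    "F": ["G"],
    "G": [],
}

@lru_cache(maxsize=None)
def longest_path_from(actor):
    """Length of the longest path starting at `actor`, its own time included."""
    tail = max((longest_path_from(s) for s in successors[actor]), default=0)
    return exec_time[actor] + tail

critical_path = max(longest_path_from(a) for a in exec_time)
total_work = sum(exec_time.values())
max_speedup = total_work / critical_path
print(critical_path, total_work, round(max_speedup, 2))
```

No schedule can finish before the critical path does, whatever the number of PEs, which is why total work divided by critical path length bounds the speedup.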

Dataflow Speedup

• PREESM Speedup Assessment Chart
  • Evaluate quality of a schedule

[Figure: the dataflow graph (A 1 ms, B 4 ms, C 6 ms, D 3 ms, E 3 ms, F 5 ms, G 1 ms) and a chart of speedup (0 to 3) versus number of PEs (1 to 7), showing the critical path length bound, the architecture limit, and the dummy scheduling (fast but far from optimal).]

Dataflow Speedup

• Limitations of PREESM Speedup Assessment Chart
  • Only latency is considered
  • Software pipelining is not considered
    – Example: New critical path: 1+6 / New max speedup = 3.8
  • All cores are identical
  • All communications have the same speed

[Figure: the dataflow graph (A 1 ms, B 4 ms, C 6 ms, D 3 ms, E 3 ms, F 5 ms, G 1 ms) split into pipeline stage 1 and pipeline stage 2.]

Mapping/Scheduling Strategies

• Heterogeneous Mapping/Scheduling Problem
• Heuristic Algorithms
• Load Balancing
• Runtime Systems

Schedu-what?

• Reminder

[Figure: tasks (Task1 to Task7) and an architecture (Core1, Core2, PE1) go through three steps: mapping (each task is assigned to a core or PE), scheduling (the tasks are ordered on each core or PE), and timing (each task is placed in time on its core or PE).]

Different Strategies

• Choices can be made at compile time or at run time.

[Figure: the mapping, scheduling, and timing steps for Task1 to Task7 on Core1, Core2, and PE1.]

Strategy            Mapping   Scheduling   Timing
fully dynamic       run       run          run
static-assignment   compile   run          run
self-timed          compile   compile      run
fully static        compile   compile      compile

Adaptivity increases toward fully dynamic; performance increases toward fully static.

Source: E. Lee, “Scheduling Strategies for Multiprocessor real-time DSP”

About Mapping, Scheduling, and Timing

• Part of “Operations Research”
  • How to organize a company
  • How to organize a project (Gantt chart, …)
  • How to make decisions in general
• NP-Hard Problem
  • Verifying the validity of a solution to the problem can be done in polynomial time (e.g. verifying that a schedule is valid).
  • No polynomial-time algorithm for solving NP-complete problems is known (and it is likely that none exists).
  • When the problem grows (e.g. number of cores or actors), the time needed to solve it grows exponentially.

About Mapping, Scheduling, and Timing

• Multicore scheduling is equivalent to the quadratic assignment problem, which is NP-hard
  • N facilities, each pair of facilities (f,g) associated with a flow of communication
  • N locations to put the facilities, each pair of locations (l,m) associated with a distance
  • Objective: put each facility on a location and minimize traffic (i.e. the sum of the distances multiplied by the corresponding flows)

[Figure: a graph of facilities f1 to f4 with flows on its edges, and a graph of locations l1 to l4 with distances on its edges.]
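The objective can be made concrete with an exhaustive search over all placements. The Python sketch below is an illustration under assumptions: the 3×3 flow and distance matrices are hypothetical (the values of the figure cannot be recovered from the transcript), and `traffic` and `solve_qap` are illustrative names.

```python
from itertools import permutations

# Hypothetical symmetric data: flow[f][g] between facilities, dist[l][m] between locations.
flow = [[0, 5, 1],
        [5, 0, 2],
        [1, 2, 0]]
dist = [[0, 3, 8],
        [3, 0, 6],
        [8, 6, 0]]

def traffic(assignment):
    """Sum of flow * distance over all facility pairs; assignment[f] is the location of f."""
    n = len(assignment)
    return sum(flow[f][g] * dist[assignment[f]][assignment[g]]
               for f in range(n) for g in range(f + 1, n))

def solve_qap():
    """Exhaustive search: evaluates all N! placements, so it only works for tiny N."""
    return min(permutations(range(len(flow))), key=traffic)

best = solve_qap()
print(best, traffic(best))
```

The search space has N! candidates (6 here, but more than 3.6 million for N = 10), which is why heuristics are used in practice.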

About Mapping, Scheduling, and Timing

• Real problem is even more complex
  • M facilities (i.e. actors)
  • N<M locations (i.e. cores) to put the facilities
  • Heterogeneity: actors have different costs on different cores
  • Objective is not only communication minimization but also latency, throughput, memory, power…

[Figure: the same facilities graph (f1 to f4, with flows) and locations graph (l1 to l4, with distances).]


Heterogeneous Mapping/Scheduling

• Exact vs Heuristic Algorithms
  • Exact algorithms find the optimal solution (exponential time)
  • Heuristics explore only parts of the given problem.
• Many heuristics exist
  • List scheduling, greedy scheduling
  • FAST scheduling (Y.-K. Kwok)
  • Hybrid flow-shop scheduling (J. Boutellier)
  • Meta-heuristics (genetic algorithms, ant colonies…)
  • …
• Quality of heuristic results cannot be predicted
  • But models should contain enough information to make decisions

Heterogeneous Mapping/Scheduling

• Several classes of heuristic algorithms

[Figure: classification grid with columns “Problem Specific” and “Generic Algorithms” and rows “Constructive” and “Iterative”, containing the heuristics listed below.]

• List scheduling
• Greedy scheduling
• Hybrid flow-shop
• Divide and conquer
• Branch and bound
• Integer Linear Programming
• Genetic Algorithms
• Simulated annealing
• Ant colony
• FAST scheduling

Source: Z. Peng, lecture notes of “Computer aided design of electronics”, LiU

List Scheduling Algorithm

1. Create a list of actors sorted in:
  • Topological order (i.e. data dependency order)
  • When equivalent, a secondary sorting criterion is used: longest execution time, critical path before last task, …

[Figure: the dataflow graph (A 1 ms, B 4 ms, C 6 ms, D 3 ms, E 3 ms, F 5 ms, G 1 ms) and the actor lists obtained with the “longest execution time” and “longest critical path” criteria.]

List Scheduling Algorithm

2. Map and schedule actors to the first available PE:

[Figure: the dataflow graph, the “longest execution time” and “longest critical path” lists, and the resulting Gantt charts on Core1 and Core2. With longest execution time, the latency is 14 ms.]
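Both steps of list scheduling can be sketched in Python as follows. The execution times are those of the slides; the dependency edges are an assumption (the graph topology is not preserved in the transcript), and the tie-breaking rules (alphabetical order, lowest core index) are arbitrary choices made for this illustration.

```python
# Execution times in ms, as given on the slides.
exec_time = {"A": 1, "B": 4, "C": 6, "D": 3, "E": 3, "F": 5, "G": 1}

# ASSUMED dependencies (hypothetical topology).
predecessors = {"A": [], "B": ["A"], "C": ["A"], "F": ["A"],
                "D": ["C"], "E": ["B"], "G": ["D", "E", "F"]}

def build_list():
    """Step 1: topological order, longest execution time first among ready actors."""
    order, done = [], set()
    while len(order) < len(exec_time):
        ready = [a for a in exec_time
                 if a not in done and all(p in done for p in predecessors[a])]
        nxt = min(ready, key=lambda a: (-exec_time[a], a))
        order.append(nxt)
        done.add(nxt)
    return order

def list_schedule(order, n_cores):
    """Step 2: give each actor, in list order, to the first available core."""
    core_free = [0] * n_cores
    finish, mapping = {}, {}
    for actor in order:
        core = min(range(n_cores), key=lambda c: core_free[c])
        start = max([core_free[core]] + [finish[p] for p in predecessors[actor]])
        finish[actor] = start + exec_time[actor]
        core_free[core] = finish[actor]
        mapping[actor] = core
    return mapping, max(finish.values())

order = build_list()
mapping, latency = list_schedule(order, n_cores=2)
print(order, mapping, latency)
```

The heuristic is constructive: each actor is placed once and never moved, which makes it fast but gives no guarantee of optimality.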

Heterogeneous List Scheduling Algorithm

• The core that can finish the actor execution first wins
  • For heterogeneous targets

Actor   tCPU   tDSP   tAcc.
A       2 ms   3 ms
B       4 ms   8 ms
C       5 ms   2 ms
E       3 ms   4 ms
D       2 ms   1 ms
F       8 ms   4 ms   0.5 ms

Longest (shortest exec. time) order: A C F B E D

[Figure: dataflow graph of actors A to F and the resulting Gantt chart on CPU, DSP, and Acc.; latency: 8 ms.]
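The “first to finish wins” rule can be sketched in Python as below. The execution times are those of the slide. The dependency edges are an assumption: the transcript does not preserve the graph, so a hypothetical set of edges was chosen that is compatible with both actor orders shown on the slides and that yields their quoted latencies (8 ms and 9 ms).

```python
# Execution times in ms per PE, as given on the slide (only F can run on Acc.).
exec_time = {
    "A": {"CPU": 2, "DSP": 3},
    "B": {"CPU": 4, "DSP": 8},
    "C": {"CPU": 5, "DSP": 2},
    "D": {"CPU": 2, "DSP": 1},
    "E": {"CPU": 3, "DSP": 4},
    "F": {"CPU": 8, "DSP": 4, "Acc": 0.5},
}

# ASSUMED dependencies (hypothetical topology).
predecessors = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["C"], "F": ["C"]}

def hetero_list_schedule(order):
    """Map each actor, in list order, to the PE on which it finishes earliest."""
    pe_free = {"CPU": 0, "DSP": 0, "Acc": 0}
    finish, mapping = {}, {}
    for actor in order:
        ready = max((finish[p] for p in predecessors[actor]), default=0)

        def finish_on(pe):
            return max(pe_free[pe], ready) + exec_time[actor][pe]

        best_pe = min(exec_time[actor], key=finish_on)
        finish[actor] = finish_on(best_pe)
        pe_free[best_pe] = finish[actor]
        mapping[actor] = best_pe
    return mapping, max(finish.values())

for order in (["A", "C", "F", "B", "E", "D"], ["A", "B", "C", "D", "E", "F"]):
    mapping, latency = hetero_list_schedule(order)
    print(order, mapping, latency)
```

The two orders lead to different mappings and latencies, which illustrates why the order of the list matters.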

Heterogeneous List Scheduling Algorithm

• Scheduling order is important
• An optimal order always exists
  • Try all orders (exhaustive search) to find the optimal one

Topological order: A B C D E F

[Figure: execution-time table of actors A to F on CPU, DSP, and Acc., and the resulting Gantt chart; latency: 9 ms.]

FAST Iterative Heuristic

1. Create an initial solution with list scheduling
2. Iteratively:
  1. Select a random actor from the critical path
  2. Change its mapping
  3. Reschedule and evaluate the resulting latency
3. Keep the best result

[Figure: execution-time table of actors A to F on CPU, DSP, and Acc., and three successive Gantt charts with latencies of 9 ms, 9 ms, and 8 ms.]
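The loop structure of such an iterative heuristic can be sketched in Python. This is a simplified random-remapping search inspired by the steps above, not the published FAST algorithm: for brevity the actor is drawn from the whole graph rather than from the critical path, the actor order is fixed, and the dependency edges are an assumption (hypothetical topology). The initial mapping is the one a heterogeneous list scheduler would produce for the topological order under these assumed edges.

```python
import random

# Execution times in ms per PE, as given on the slides.
exec_time = {
    "A": {"CPU": 2, "DSP": 3}, "B": {"CPU": 4, "DSP": 8},
    "C": {"CPU": 5, "DSP": 2}, "D": {"CPU": 2, "DSP": 1},
    "E": {"CPU": 3, "DSP": 4}, "F": {"CPU": 8, "DSP": 4, "Acc": 0.5},
}
# ASSUMED dependencies (hypothetical topology) and fixed topological order.
predecessors = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["C"], "F": ["C"]}
ORDER = ["A", "B", "C", "D", "E", "F"]

def latency(mapping):
    """Reschedule all actors in ORDER on their mapped PE and return the latency."""
    pe_free, finish = {}, {}
    for actor in ORDER:
        pe = mapping[actor]
        ready = max((finish[p] for p in predecessors[actor]), default=0)
        finish[actor] = max(pe_free.get(pe, 0), ready) + exec_time[actor][pe]
        pe_free[pe] = finish[actor]
    return max(finish.values())

def remap_search(initial_mapping, iterations=200, seed=0):
    """Randomly remap one actor at a time and keep the best mapping seen."""
    rng = random.Random(seed)
    best, best_latency = dict(initial_mapping), latency(initial_mapping)
    for _ in range(iterations):
        candidate = dict(best)
        actor = rng.choice(ORDER)                       # simplified actor selection
        candidate[actor] = rng.choice(list(exec_time[actor]))
        cand_latency = latency(candidate)
        if cand_latency < best_latency:                 # keep only improvements
            best, best_latency = candidate, cand_latency
    return best, best_latency

initial = {"A": "CPU", "B": "CPU", "C": "DSP", "D": "DSP", "E": "CPU", "F": "Acc"}
best, best_latency = remap_search(initial)
print(latency(initial), best_latency)
```

Because only strict improvements are accepted, such a search can get stuck in a local optimum; this is one motivation for the population-based heuristics presented next.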

Genetic Iterative Heuristic

1. Create a pool of solutions with list/FAST scheduling
  • Each solution is represented by its ordered list
2. Iteratively:
  1. Discard the worst solutions
  2. Produce new solutions using cross-over and mutation
  3. Reschedule and evaluate the resulting latency
3. Keep the best result

[Figure: execution-time table of actors A to F on CPU, DSP, and Acc.; a mutation example (lists A D C B F E and A D C B E F) and a cross-over example (lists A D C B F E and A D B C E F).]

Scheduling under multiple constraints

• Example with latency and energy

Actor   tCPU   tDSP   tAcc.    ECPU   EDSP    EAcc.
A       2 ms   3 ms            2 J    1.5 J
B       4 ms   8 ms            4 J    4 J
C       5 ms   2 ms            5 J    1 J
E       3 ms   4 ms            3 J    2 J
D       2 ms   1 ms            2 J    0.5 J
F       8 ms   4 ms   0.5 ms   10 J   2.5 J   0.1 J

[Figure: two Gantt charts on CPU, DSP, and Acc.: one schedule with 8 ms latency and 11.1 J, and one with 10 ms latency and 9.1 J.]


Load Balancing

[Figure: the two Gantt charts on CPU, DSP, and Acc., annotated with the busy time and energy of each PE.]

Schedule          CPU          DSP           Acc.            Remark
8 ms, 11.1 J      8 ms, 8 J    6 ms, 3 J     0.5 ms, 0.1 J   Unbalanced energy consumption
10 ms, 9.1 J      4 ms, 4 J    10 ms, 5 J    0.5 ms, 0.1 J   Unbalanced computational load
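The per-PE figures can be recomputed from the time and energy tables. In the Python sketch below, the tables come from the slides; the two mappings are inferred from the per-PE totals rather than read from the Gantt charts, so they should be treated as assumptions.

```python
# Time (ms) and energy (J) per actor and PE, as given on the slides.
time_ms = {
    "A": {"CPU": 2, "DSP": 3}, "B": {"CPU": 4, "DSP": 8},
    "C": {"CPU": 5, "DSP": 2}, "D": {"CPU": 2, "DSP": 1},
    "E": {"CPU": 3, "DSP": 4}, "F": {"CPU": 8, "DSP": 4, "Acc": 0.5},
}
energy_j = {
    "A": {"CPU": 2, "DSP": 1.5}, "B": {"CPU": 4, "DSP": 4},
    "C": {"CPU": 5, "DSP": 1}, "D": {"CPU": 2, "DSP": 0.5},
    "E": {"CPU": 3, "DSP": 2}, "F": {"CPU": 10, "DSP": 2.5, "Acc": 0.1},
}

# ASSUMED mappings, inferred from the per-PE totals shown on the slide.
fast_mapping = {"A": "CPU", "B": "CPU", "D": "CPU", "C": "DSP", "E": "DSP", "F": "Acc"}
frugal_mapping = {"B": "CPU", "A": "DSP", "C": "DSP", "D": "DSP", "E": "DSP", "F": "Acc"}

def per_pe_load(mapping):
    """Return {pe: (busy time in ms, energy in J)} for a given actor-to-PE mapping."""
    load = {}
    for actor, pe in mapping.items():
        busy, energy = load.get(pe, (0, 0))
        load[pe] = (busy + time_ms[actor][pe], energy + energy_j[actor][pe])
    return load

for name, mapping in (("fast", fast_mapping), ("frugal", frugal_mapping)):
    load = per_pe_load(mapping)
    total_energy = sum(energy for _, energy in load.values())
    print(name, load, round(total_energy, 1))
```

The busy time of a PE is a load measure, not the latency: the latency also depends on idle time caused by data dependencies.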

Load Balancing Strategies

• With total predictability (i.e. known number of tasks, SDF-like)
  • Decentralized static decision
  • No adaptivity to algorithm modifications
  • No decision overhead
  • Self-timed execution

[Figure: decentralized execution (Preesm, SynDEx): each thread/core runs its own fixed sequence of actors (Actor A to Actor E) over time.]

Load Balancing Strategies

• With high predictability (i.e. reconfigurable tasks)
  • Master/Slave
  • Adaptivity to algorithm variations
  • Master core can become a bottleneck

[Figure: master/slave execution (Spider Runtime): the master operator runs the multicore runtime on Core 1 and assigns actors; each slave thread/core dequeues an actor, processes it, and signals that it has finished.]

Load Balancing Strategies

• Without predictability (i.e. highly dynamic number of tasks)
  • Work Queueing
  • Implemented over multi-threading
  • Great freedom in thread creation
  • The shared task queue becomes the bottleneck

[Figure: work-queueing (Apple Grand Central Dispatch, OpenMP): each thread/core pops tasks from and pushes tasks to a shared queue, looping over dequeue task, process task, enqueue task(s).]

Load Balancing Strategies

• Without predictability (i.e. highly dynamic number of tasks)
  • Job Stealing
  • One task queue per core: no more bottleneck
  • Hard to predict performance

[Figure: job stealing (Cilk, Intel Threading Building Blocks): each thread/core pushes to and pops from its own queue, and an idle core steals from another core's queue; each core loops over dequeue task, process task, enqueue task(s).]
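The principle of job stealing can be illustrated with a single-threaded simulation in Python. This is only a sketch under assumptions: real runtimes such as Cilk or Intel Threading Building Blocks use concurrent queues and their own victim-selection policies, whereas here workers take turns in a loop, tasks are plain labels that spawn no new tasks, and the first non-empty queue is chosen as victim.

```python
from collections import deque

def run_with_stealing(initial_tasks, n_workers):
    """Simulate workers that pop from their own queue and steal when it is empty."""
    queues = [deque() for _ in range(n_workers)]
    queues[0].extend(initial_tasks)            # all work starts on worker 0
    executed = [[] for _ in range(n_workers)]
    steals = 0
    while any(queues):
        for worker in range(n_workers):
            if queues[worker]:
                task = queues[worker].pop()    # own queue: take the newest task
            else:
                victims = [v for v in range(n_workers)
                           if v != worker and queues[v]]
                if not victims:
                    continue                   # nothing to steal this round
                task = queues[victims[0]].popleft()   # steal the oldest task
                steals += 1
            executed[worker].append(task)
    return executed, steals

executed, steals = run_with_stealing(list(range(6)), n_workers=2)
print(executed, steals)
```

Owners and thieves work on opposite ends of a queue, which limits contention; the price is that the final distribution of tasks depends on runtime behaviour and is hard to predict.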


Runtime Systems

• Role of Runtime Systems for dynamic adaptation

[Figure: toolchain diagram. Labels: Algorithm, Architecture, Multicore Compiler, Simulator + Debugger + Profiler, Portable Multicore Program, Multicore Runtime; target platform with Main Proc., PE, Peripherals, and Main Memory.]

Runtime Systems

• Role of Runtime Systems for dynamic adaptation
  • A runtime system is the extension of an operating system for distributed hardware
• 2 types of multiprocessing management systems:
  • AMP (Asymmetric Multiprocessing)
    – Complexity is not masked: each core has its own OS (e.g. SYS/BIOS), no runtime
  • SMP (Symmetric Multiprocessing)
    – Simulates a unique core: one OS controls the whole architecture (e.g. SMP Linux)
• These notions are limited: compilers and runtime systems are now combined
  • For instance, OpenMP is based on a runtime system to dispatch threads

Runtime Systems

• The industry develops new runtime systems
  • Apple Grand Central Dispatch
  • Intel Threading Building Blocks
  • Texas Instruments Open Event Machine
• They are based on task and data synchronization descriptions
• The semantics are getting close to dataflow
• Several runtime systems are being experimented with at IETR
  • Based on dataflow algorithm descriptions
  • Spider: Synchronous Parameterized and Interfaced Dataflow Embedded Runtime

Runtime Systems

[Figure: runtime organization diagram. Labels: Symmetric Phase, Master/Slave Phase, Dataflow Management, Template graph, Parameter Values, uC-OS/II Tasks, uC-OS/II based RTOS Scheduling, four cores, CLK.]

General Conclusion

• Applications and architectures are increasingly complex
• Model-based system design helps at several design stages
• To evaluate languages/models: focus on MoC
• MoCs offer “pure” semantics, free of syntax
• No one-size-fits-all solution to design Multicore DSP systems
• Many solutions exist now, complex choices have to be made