castness’11, rome - 17th–18th jan....

23
CASTNESS’11, ROME - 17th–18th Jan. 2011

Upload: others

Post on 20-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11, ROME - 17th–18th Jan. 2011

Page 2: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

EURETILE in a Nutshell

European Reference Experimental Platform for Brain-Inspired Many-Tile Architectures HW prototype: 64 tiles/128 cores multiple, dynamic applications:

HP numerical and DSP many-tile programming

environment: scalability, dynamism, efficiency

fault-tolerance: system-level

dynamic applications

many-tile hardware

dynamic mapping

fault

high-temperature

2

Page 3: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

EURETILE in a Nutshell

many-tile hardware

dynamic mapping

fault

high-temperature

dynamic applications

European Reference Experimental Platform for Brain-Inspired Many-Tile Architectures HW prototype: 64 tiles/128 cores multiple, dynamic applications:

HP numerical and DSP many-tile programming

environment: scalability, dynamism, efficiency

fault-tolerance: system-level

3

Page 4: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

DOL* Programming Model Kahn process network MoC

expressiveness: static, single (stream-oriented) applications

Mapping (incl. scheduling) expressiveness: static mapping

optimized at design-time

Distributed memory architecture

DNP

RISC

DSP/ASIP

MEM

***

***

Mapping

MPSoC Architecture

*DOL – distributed operation layerhttp://www.tik.ee.ethz.ch/~shapes

4

Page 5: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

scenario2

How to Specify Dynamic Applications?

scenario1

scenario3 scenario4

scenario1

scenario4

scenario2

scenario3

t1

t2

t3t4

t5

t6

t7 t8

t9

t10

app #1 app #2 app #3

Hierarchical spec top-level: scenarios FSM bottom-level: DOL KPN control: API,

special proc.

predictability

scalability

5

Page 6: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

DAL API Provided by Programmer/HdS

01: procedure DAL_INIT(Process p)02: //initialize process state03: end procedure04: 05: procedure DAL_FIRE(Process p)06: DAL_READ(input, size, buf)07: //processing08: DAL_WRITE(output, size, buf)09: end procedure

01: procedure DAL_INIT(Process p)02: //initialize process state03: end procedure04: 05: procedure DAL_FIRE(Process p)06: DAL_READ(input, size, buf)07: //processing08: DAL_WRITE(output, size, buf)09: end procedure10:11: procedure DAL_FINISH(Process p)12: //clean up, free mem, etc.13: end procedure

01: procedure DAL_INIT(Process p)02: //initialize process state03: end procedure04: 05: procedure DAL_FIRE(Process p)06: DAL_READ(input, size, buf)07: //processing08: DAL_WRITE(output, size, buf)09: end procedure10:11: procedure DAL_FINISH(Process p)12: //clean up, free mem, etc.13: end procedure14:15: procedure DAL_SAVE(Process p)16: //save current process status17: end procedure18:19: procedure DAL_RESTORE(Process p)20: //restore process status21: end procedure

01: procedure DAL_START(Application A)02: //attach all processes to schedulers03: //call DAL_INIT for all processes04: //call repeatedly DAL_FIRE for all processes05: end procedure06: 07: procedure DAL_STOP(Application A)08: //permanently detach all processes from schedulers09: //call DAL_FINISH for all processes10: end procedure

01: procedure DAL_START(Application A)02: //attach all processes to schedulers03: //call DAL_INIT for all processes04: //call repeatedly DAL_FIRE for all processes05: end procedure06: 07: procedure DAL_STOP(Application A)08: //permanently detach all processes from schedulers09: //call DAL_FINISH for all processes10: end procedure11: 12: procedure DAL_PAUSE(Application A)13: //temporarily detach all processes from schedulers14: //call DAL_SAVE for all processes15: end procedure16: 17: procedure DAL_RESUME(Application A)18: //re-attach processes to schedulers19: //call DAL_RESTORE for all processes20: end procedure

scenario1 scenario2

6

Page 7: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

How to Implement Dynamic Applications?

scalable solution: hierarchically centralized control clusters divide many-cores local controller controls cluster main controller controls local

controllers

Control: centralized vs. distributed? centralized control

control tile(s): manage dynamic execution

“slave” tiles: execute (statically) mapped applications

communication: system command lines

global controller local controllers

7

Page 8: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

Centralized Controller Implementation

Centralized control mechanisms in control tile

- FSM for task control and migration- control API

in HdS of “slaves”- scheduling and FIFO management- “implicit” control tasks

8

Page 9: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

EURETILE in a Nutshell

many-tile hardware

dynamic mapping

fault

high-temperature

dynamic applications

European Reference Experimental Platform for Brain-Inspired Many-Tile Architectures HW prototype: 64 tiles/128 cores multiple, dynamic applications:

HP numerical and DSP many-tile programming

environment: scalability, dynamism, efficiency

fault-tolerance: system-level

9

Page 10: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

Run-Time Mapping of Dynamic ApplicationsIdea: each “application scenario” linked to one (or several) mapping(s)

scenario1 scenario2

t1

t2

Core

N

f Core

N

f

Core

N

f Core

N

f

Core

N

f Core

N

f

Core

N

f Core

N

f

scenario3

t3t4

t5t6

Core

N

f Core

N

f

Core

N

f Core

N

f

Trade-offs mapping optimality

vs. remapping cost

10

Page 11: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

Core

N

f Core

N

f

Core

N

f Core

N

f

Run-Time Mapping of Dynamic Applications

Trade-offs mapping optimality

vs. remapping cost

Remapping is needed! … necessary to

accommodate new resource-demanding applications

… useful for overall optimality

… necessary for system-level fault tolerance

scenario1 scenario2

t1

t2

Core

N

f Core

N

f

Core

N

f Core

N

f

Core

N

f Core

N

f

Core

N

f Core

N

f

scenario3

t3t4

t5t6

Idea: each “application scenario” linked to one (or several) mapping(s)

11

Page 12: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

EURETILE in a Nutshell

many-tile hardware

dynamic mapping

fault

high-temperature

dynamic applications

European Reference Experimental Platform for Brain-Inspired Many-Tile Architectures HW prototype: 64 tiles/128 cores multiple, dynamic applications:

HP numerical and DSP many-tile programming

environment: scalability, dynamism, efficiency

fault-tolerance: system-level

faultfault

high-temperature

12

Page 13: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

Faults in EURETILE

DNP

RISC

RISC

MEM

***

***

fault

faultfault

HOT!

13

Page 14: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

Fault-Tolerance at System-Level in DAL

remapping relocate applications/processes on faulty

tiles/cores … (mostly) at run-time

14

Page 15: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

Fault! How to Proceed?

Shutdown/Restart Reconfiguration1) entire system shuts down

2) remap with fault info

3) restart the system

1) stop processes on faulty proc

simple procedureall contexts lost

all contexts preserveddifficult to analyze/predict

2) remap w/ context

3) restart processes

VS.

fault

15

Page 16: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

Fault-Tolerance via Dynamic Remapping

Dynamic remappingfeasible even in a busy system

composable scheduling needed; schedulabilityat run-time

DNP

RISC

RISC

MEM

***

***

DNP

RISC

RISC

MEM

***

***

DNP

RISC

RISC

MEM

***

***

DNP

RISC

RISC

MEM

***

***

fault

16

Page 17: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

Fault-Tolerance via Static Remapping

DNP

RISC

RISC

MEM

***

***

DNP

RISC

RISC

MEM

***

***

DNP

RISC

RISC

MEM

***

***

DNP

RISC

RISC

MEM

***

***

Static remappingmove entire mapping to empty cluster

- easy to determine (mapping topology preserved) - no analysis needed (only reconfiguration overhead)

over-provisioningfault

17

Page 18: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

Runtime Remapping Decisionsscenario

(re-)mapping

#empty processors in

cluster>0

#empty processors on ≠clusters>0

(semi-dynamic analysis)

Y

N

Y

remapping strategy (implemented in controller)

system faultsperformance data:

resource utilization, app. requirements

fault detected

schedulable system?

intra-cluster remapping Y

N

inter-cluster static

remapping

(no analysis needed)

(complete or semi-dynamic analysis)

inter-clusterfully dynamic

remapping schedulable system?

Y

N

run at lower performance OR stop applications OR shutdown

Remapping compact

(computation) guarantee

performance, constraints

Fault recovery1. shutdown and

restart2. reconfiguration

Remapping1. static2. semi-dynamic3. dynamic

18

Page 19: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

Fault-Tolerance at System-Level in DAL

remapping relocate applications/processes on faulty

tiles/cores … (mostly) at run-time

system-level analysis provide guarantees, e.g., guarantee real-time

and temperature constraints … at design-time

19

Page 20: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

EURETILE in a Nutshell

many-tile hardware

dynamic mapping

fault

high-temperature

dynamic applications

European Reference Experimental Platform for Brain-Inspired Many-Tile Architectures HW prototype: 64 tiles/128 cores multiple, dynamic applications:

HP numerical and DSP many-tile programming

environment: scalability, dynamism, efficiency

fault-tolerance: system-level

fault

20

Page 21: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

DAL in EURETILE – Vision

performance analysis (off-line)

evaluations on workstation

HdS task migration

reconfiguration

multi-mapping specification

(off-line)

systemsynthesis

(HdS generation)

simulation/VPor

emulation/HW

functional simulation generation

simulation onworkstations

on/off-line mapping spec.

specification of app scenarios &

coordination FSM

architecture specification

dynamic mapping adaption (on-line)

test

&de

bug

calib

ratio

n da

ta b

ack-

anno

tatio

n

performance datadesign space

exploration of multi- applications/mappings

21

Page 22: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

CASTNESS’11 – Rome, 17-18 Jan. 2011Iuliana Bacivarov, ETH Zürich

DAL in EURETILE – Vision

performance analysis (off-line)

evaluations on workstation

HdS task migration

reconfiguration

multi-mapping specification

(off-line)

systemsynthesis

(HdS generation)

simulation/VPor

emulation/HW

functional simulation generation

simulation onworkstations

on/off-line mapping spec.

specification of app scenarios &

coordination FSM

architecture specification

dynamic mapping adaption (on-line)

test

&de

bug

calib

ratio

n da

ta b

ack-

anno

tatio

n

performance datadesign space

exploration of multi- applications/mappings

fault-tolerant remapping

dynamic mapping adaption to fault

(on-line)

HdS task migration

fault-tolerance aware

performance analysis (off-line)

reconfiguration evaluations on workstation

HdSadaption

simulation/VPor

emulation/HW

functional simulation of faults

simulation onworkstations

fault

22

Page 23: CASTNESS’11, ROME - 17th–18th Jan. 2011euretile.roma1.infn.it/mediawiki/img_auth.php/8/88/EURETILE-2-Iulian... · CASTNESS’11 – Rome, 17-18 Jan. 2011. DOL* Programming Model

[email protected]://euretile.roma1.infn.it

Thank You!Questions?