CASTNESS’11, ROME - 17th–18th Jan. 2011
CASTNESS’11 – Rome, 17-18 Jan. 2011
Iuliana Bacivarov, ETH Zürich
EURETILE in a Nutshell
European Reference Experimental Platform for Brain-Inspired Many-Tile Architectures
HW prototype: 64 tiles/128 cores
multiple, dynamic applications: HP numerical and DSP
many-tile programming environment: scalability, dynamism, efficiency
fault-tolerance: system-level
[Figure: dynamic applications, dynamic mapping, many-tile hardware; fault and high-temperature events marked]
DOL* Programming Model
Kahn process network MoC
expressiveness: static, single (stream-oriented) applications
Mapping (incl. scheduling)
expressiveness: static mapping, optimized at design-time
Distributed memory architecture
[Figure: application mapped onto the MPSoC architecture; tiles contain DNP, RISC, DSP/ASIP, MEM]
*DOL – distributed operation layer, http://www.tik.ee.ethz.ch/~shapes
How to Specify Dynamic Applications?
Hierarchical spec
top-level: scenarios FSM
bottom-level: DOL KPN
control: API, special proc.
predictability
scalability
[Figure: FSM over scenario1 to scenario4 with transitions t1 to t10; scenarios are built from app #1, app #2, app #3]
DAL API Provided by Programmer/HdS

procedure DAL_INIT(Process p)
    // initialize process state
end procedure

procedure DAL_FIRE(Process p)
    DAL_READ(input, size, buf)
    // processing
    DAL_WRITE(output, size, buf)
end procedure

procedure DAL_FINISH(Process p)
    // clean up, free mem, etc.
end procedure

procedure DAL_SAVE(Process p)
    // save current process status
end procedure

procedure DAL_RESTORE(Process p)
    // restore process status
end procedure

procedure DAL_START(Application A)
    // attach all processes to schedulers
    // call DAL_INIT for all processes
    // call repeatedly DAL_FIRE for all processes
end procedure

procedure DAL_STOP(Application A)
    // permanently detach all processes from schedulers
    // call DAL_FINISH for all processes
end procedure

procedure DAL_PAUSE(Application A)
    // temporarily detach all processes from schedulers
    // call DAL_SAVE for all processes
end procedure

procedure DAL_RESUME(Application A)
    // re-attach processes to schedulers
    // call DAL_RESTORE for all processes
end procedure

[Figure: scenario1, scenario2]
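The interplay of the process-level and application-level calls can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Process` and `Application`, the integer `state`, and the `run` helper (a stand-in for the scheduler that repeatedly fires attached processes) are hypothetical and not part of the DAL API shown on the slide.

```python
class Process:
    """Hypothetical process exposing the five process-level hooks."""

    def __init__(self, name):
        self.name = name
        self.state = None   # working state of the process
        self.saved = None   # snapshot taken by save()
        self.log = []       # order in which hooks were called

    def init(self):         # mirrors DAL_INIT
        self.state = 0
        self.log.append("init")

    def fire(self):         # mirrors DAL_FIRE (read, process, write)
        self.state += 1
        self.log.append("fire")

    def finish(self):       # mirrors DAL_FINISH
        self.state = None
        self.log.append("finish")

    def save(self):         # mirrors DAL_SAVE
        self.saved = self.state
        self.log.append("save")

    def restore(self):      # mirrors DAL_RESTORE
        self.state = self.saved
        self.log.append("restore")


class Application:
    """Hypothetical application-level control built on the process hooks."""

    def __init__(self, processes):
        self.processes = processes
        self.attached = False   # True while processes are attached to schedulers

    def start(self):            # mirrors DAL_START
        self.attached = True
        for p in self.processes:
            p.init()

    def run(self, rounds):      # assumption: stand-in for the scheduler loop
        if not self.attached:
            return              # detached processes are never fired
        for _ in range(rounds):
            for p in self.processes:
                p.fire()

    def pause(self):            # mirrors DAL_PAUSE
        self.attached = False
        for p in self.processes:
            p.save()

    def resume(self):           # mirrors DAL_RESUME
        self.attached = True
        for p in self.processes:
            p.restore()

    def stop(self):             # mirrors DAL_STOP
        self.attached = False
        for p in self.processes:
            p.finish()
```

A paused application keeps its saved state and ignores scheduler rounds until it is resumed, which is the behavior a scenario transition would rely on.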
How to Implement Dynamic Applications?
Control: centralized vs. distributed?
centralized control
control tile(s): manage dynamic execution
“slave” tiles: execute (statically) mapped applications
communication: system command lines
scalable solution: hierarchically centralized control
clusters divide many-cores
local controller controls cluster
main controller controls local controllers
[Figure: global controller, local controllers]
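The hierarchically centralized scheme can be illustrated with a small Python sketch. All names here (`LocalController`, `MainController`, `build_hierarchy`, the command strings) are hypothetical, and the cluster size of 8 used in the usage note is an assumption; only the 64-tile count comes from the slides.

```python
class LocalController:
    """Controls the tiles of one cluster (hypothetical sketch)."""

    def __init__(self, cluster_id, tiles):
        self.cluster_id = cluster_id
        # Commands received so far, recorded per tile id.
        self.tiles = {tile: [] for tile in tiles}

    def dispatch(self, command):
        """Forward a command to every tile in the cluster; return tile count."""
        for received in self.tiles.values():
            received.append(command)
        return len(self.tiles)


class MainController:
    """Controls the local controllers, never the tiles directly."""

    def __init__(self, local_controllers):
        self.locals = {lc.cluster_id: lc for lc in local_controllers}

    def broadcast(self, command):
        """Send a command to all clusters; return number of tiles reached."""
        return sum(lc.dispatch(command) for lc in self.locals.values())

    def send(self, cluster_id, command):
        """Send a command to a single cluster."""
        return self.locals[cluster_id].dispatch(command)


def build_hierarchy(num_tiles, cluster_size):
    """Divide tiles 0..num_tiles-1 into consecutive clusters."""
    clusters = []
    for cid, first in enumerate(range(0, num_tiles, cluster_size)):
        tiles = range(first, min(first + cluster_size, num_tiles))
        clusters.append(LocalController(cid, tiles))
    return MainController(clusters)
```

With `build_hierarchy(64, 8)` the main controller talks to 8 local controllers instead of 64 tiles, which is the scalability argument of the slide.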
Centralized Controller Implementation
Centralized control mechanisms
in control tile
- FSM for task control and migration
- control API
in HdS of “slaves”
- scheduling and FIFO management
- “implicit” control tasks
Run-Time Mapping of Dynamic Applications
Idea: each “application scenario” linked to one (or several) mapping(s)
Trade-offs: mapping optimality vs. remapping cost
[Figure: scenario1, scenario2, scenario3 connected by transitions t1 to t6; each scenario has its own mapping onto a grid of tiles (Core, N, f)]
Remapping is needed!
… necessary to accommodate new resource-demanding applications
… useful for overall optimality
… necessary for system-level fault tolerance
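The trade-off between mapping optimality and remapping cost can be made concrete with a small Python sketch. It assumes, purely for illustration, that each candidate mapping of a scenario carries a scalar execution cost and that remapping cost is proportional to the number of processes that change core; the function names and the `migration_weight` parameter are hypothetical.

```python
def remap_cost(current, candidate):
    """Count processes that run in both mappings but on a different core."""
    return sum(
        1
        for proc, core in candidate.items()
        if proc in current and current[proc] != core
    )


def choose_mapping(current, candidates, migration_weight):
    """Pick the candidate with the lowest combined cost.

    current:          dict process -> core of the running scenario
    candidates:       list of (mapping, exec_cost) for the next scenario
    migration_weight: assumed cost of moving one process
    """
    return min(
        candidates,
        key=lambda c: c[1] + migration_weight * remap_cost(current, c[0]),
    )


# Two candidate mappings for the next scenario: one keeps the running
# processes in place, the other is faster but moves both of them.
current = {"p1": 0, "p2": 1}
keep_in_place = ({"p1": 0, "p2": 1, "p3": 2}, 10.0)
faster_but_moves = ({"p1": 2, "p2": 3, "p3": 0}, 7.0)
candidates = [keep_in_place, faster_but_moves]
```

With cheap migration (`migration_weight=1`) the faster mapping wins; with expensive migration (`migration_weight=2`) keeping processes in place is preferred.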
Faults in EURETILE
[Figure: tile with DNP, RISC, RISC, MEM; faults and an overheated (“HOT!”) component marked]
Fault-Tolerance at System-Level in DAL
remapping
relocate applications/processes mapped on faulty tiles/cores
… (mostly) at run-time
Fault! How to Proceed?
Shutdown/Restart
1) entire system shuts down
2) remap with fault info
3) restart the system
simple procedure
all contexts lost
VS.
Reconfiguration
1) stop processes on faulty proc
2) remap w/ context
3) restart processes
all contexts preserved
difficult to analyze/predict
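The difference between the two recovery options can be sketched in Python. This is a simplified illustration, not the DAL implementation: mappings are assumed to be dictionaries from process name to processor id, a single spare processor is assumed to be available, and the function names are hypothetical.

```python
def _move_off_faulty(mapping, faulty, spare):
    """Remap with fault info: processes on the faulty processor go to the spare."""
    return {p: (spare if proc == faulty else proc) for p, proc in mapping.items()}


def shutdown_restart(mapping, contexts, faulty, spare):
    """Whole system stops and restarts: every process restarts, contexts are lost."""
    new_mapping = _move_off_faulty(mapping, faulty, spare)
    new_contexts = {p: None for p in mapping}   # all contexts lost
    restarted = sorted(mapping)                 # every process is restarted
    return new_mapping, new_contexts, restarted


def reconfigure(mapping, contexts, faulty, spare):
    """Only processes on the faulty processor stop; contexts are kept."""
    new_mapping = _move_off_faulty(mapping, faulty, spare)
    new_contexts = dict(contexts)               # all contexts preserved
    restarted = sorted(p for p, proc in mapping.items() if proc == faulty)
    return new_mapping, new_contexts, restarted


mapping = {"a": 0, "b": 1, "c": 1}
contexts = {"a": 5, "b": 7, "c": 9}
```

Both functions produce the same new mapping; they differ in which processes are restarted and in whether the saved contexts survive.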
Fault-Tolerance via Dynamic Remapping
Dynamic remapping
feasible even in a busy system
composable scheduling needed; schedulability at run-time
[Figure: four tiles, each with DNP, RISC, RISC, MEM; one tile marked with a fault]
Fault-Tolerance via Static Remapping
Static remapping
move entire mapping to empty cluster
- easy to determine (mapping topology preserved)
- no analysis needed (only reconfiguration overhead)
over-provisioning
[Figure: four tiles, each with DNP, RISC, RISC, MEM; one tile marked with a fault]
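Why static remapping is easy to determine can be seen in a short Python sketch. It assumes, for illustration only, that a mapping assigns each process to a pair (cluster id, local core index) and that all clusters have the same layout; `static_remap` is a hypothetical name.

```python
def static_remap(mapping, faulty_cluster, empty_clusters):
    """Move everything mapped on the faulty cluster to an empty cluster.

    mapping:        dict process -> (cluster_id, local_core_index)
    faulty_cluster: id of the cluster that contains the fault
    empty_clusters: list of cluster ids kept free (over-provisioning)

    The local core index is kept, so the mapping topology is preserved
    and no new analysis is required. Returns None if no cluster is free.
    """
    if not empty_clusters:
        return None
    target = empty_clusters[0]
    return {
        proc: ((target, core) if cluster == faulty_cluster else (cluster, core))
        for proc, (cluster, core) in mapping.items()
    }


mapping = {"a": (0, 0), "b": (0, 1), "c": (1, 0)}
```

The price of this simplicity is the spare cluster that has to be kept empty, which is the over-provisioning noted on the slide.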
Runtime Remapping Decisions
remapping strategy (implemented in controller)
inputs: scenario (re-)mapping; system faults; performance data: resource utilization, app. requirements
[Flowchart, starting at “fault detected”]
decisions (Y/N):
- #empty processors in cluster > 0
- #empty processors on ≠ clusters > 0
- schedulable system?
actions:
- intra-cluster remapping (semi-dynamic analysis)
- inter-cluster static remapping (no analysis needed)
- inter-cluster fully dynamic remapping (complete or semi-dynamic analysis)
- run at lower performance OR stop applications OR shutdown
Remapping: compact (computation), guarantee performance, constraints
Fault recovery: 1. shutdown and restart 2. reconfiguration
Remapping: 1. static 2. semi-dynamic 3. dynamic
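One possible reading of the decision flow is sketched below in Python. The exact edges of the flowchart are not recoverable from the transcript, so the ordering of the checks is an assumption, as are the function name, the parameters, and the `schedulable` callback that stands in for the (semi-)dynamic schedulability analysis.

```python
FALLBACK = "run at lower performance OR stop applications OR shutdown"


def decide_remapping(free_in_cluster, free_elsewhere,
                     empty_cluster_available, schedulable):
    """Choose a remapping strategy after a fault has been detected.

    free_in_cluster:         number of empty processors in the faulty cluster
    free_elsewhere:          number of empty processors on other clusters
    empty_cluster_available: True if a whole cluster is free (assumption)
    schedulable:             callable(strategy) -> bool, stand-in for analysis
    """
    # Assumed first step: try to stay inside the cluster.
    if free_in_cluster > 0 and schedulable("intra-cluster"):
        return "intra-cluster remapping"
    # Assumed second step: look at the other clusters.
    if free_elsewhere > 0:
        if empty_cluster_available:
            # Topology is preserved, so no analysis is needed.
            return "inter-cluster static remapping"
        if schedulable("inter-cluster dynamic"):
            return "inter-cluster fully dynamic remapping"
    # No schedulable remapping was found.
    return FALLBACK
```

The sketch orders the options from cheapest to most expensive analysis, which matches the static, semi-dynamic, dynamic ranking listed on the slide.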
Fault-Tolerance at System-Level in DAL
remapping
relocate applications/processes on faulty tiles/cores
… (mostly) at run-time
system-level analysis
provide guarantees, e.g., guarantee real-time and temperature constraints
… at design-time
DAL in EURETILE – Vision
[Figure: design flow with the following elements]
specification of app scenarios & coordination FSM
architecture specification
on/off-line mapping spec.
multi-mapping specification (off-line)
dynamic mapping adaption (on-line)
system synthesis (HdS generation)
HdS task migration, reconfiguration
functional simulation generation
simulation on workstations
simulation/VP or emulation/HW
performance analysis (off-line), evaluations on workstation
design space exploration of multi-applications/mappings
performance data
test & debug
calibration data back-annotation
[Figure overlay: fault-tolerance additions to the design flow]
fault-tolerant remapping
dynamic mapping adaption to fault (on-line)
HdS task migration, reconfiguration
fault-tolerance aware performance analysis (off-line), evaluations on workstation
HdS adaption
simulation/VP or emulation/HW
functional simulation of faults
simulation on workstations
http://euretile.roma1.infn.it
Thank You! Questions?