Distributed Operation Layer: Efficient and Predictable KPN-Based Design Flow
Iuliana Bacivarov, Wolfgang Haid, Kai Huang, and Lothar Thiele
ETH Zürich, Switzerland
CASA, ESWEEK – DOL: Efficient and Predictable Design Flow Iuliana Bacivarov
Efficiency vs. Predictability?
Efficiency is…
  … speed-up
  … scalability
  … small memory
  … portability
  … small effort
Predictability is…
  … analyzability
  … guarantees
  … fast estimates
  … good estimates
  … early in design
Distributed Operation Layer (DOL): efficient and predictable system-level MPSoC design flow
Distributed Operation Layer
Reduce “accidental complexity” in design by raising the level of abstraction and automation:
  System specification: abstract MoC (KPN) vs. BSP
  Performance analysis: system-level (formal) analysis vs. complete system simulation
  Design space exploration: automated system-level exploration vs. trial-and-error
  (Software) synthesis: automated synthesis on various MPSoCs (possible due to formal MoC)
Outline
Introduction
Distributed operation layer design flow
  Specification
  Synthesis
  Design space exploration
  Performance analysis
Some experimental results
Conclusions
DOL Software System-Level Design Flow
Goals
  Efficiency
  Predictability
Challenges
  Scalable specification
  Automated synthesis
  System-level design space exploration
  Analytic performance evaluation
Strengths
  Abstraction
  Automation
[Figure: DOL design flow. Blocks: application specification (XML & C); architecture specification (XML); mapping specification (XML); functional simulation generation; simulation on workstation; system synthesis (HdS generation); simulation on virtual platform; analysis model generation; evaluation on workstation; design space exploration. Edge labels: calibration data, back-annotation, performance data, test & debug.]
System Specification
Roles
  Express data and functional parallelism in application
  Specify mapping of application on target architecture
Challenges
  Scalability
  Platform-independence
Formal MoC: basis for efficient and predictable design
Programming Model
Model of computation: Kahn process network
Coordination: XML with performance annotations
Functionality: C/C++ with a DOL-specific programming API
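The Kahn process network model named above can be illustrated with a minimal sketch. It is written in Python rather than the DOL C/C++ API, and every name in it (`source`, `square`, `sink`, `run_network`, the `END` sentinel) is hypothetical. It assumes unbounded FIFO channels with blocking reads, which is what makes the result independent of thread scheduling.

```python
import queue
import threading

END = object()  # end-of-stream sentinel (an assumption of this sketch, not part of KPN)

def source(out, n):
    """Produce the tokens 0..n-1."""
    for i in range(n):
        out.put(i)            # write never blocks on an unbounded FIFO
    out.put(END)

def square(inp, out):
    """Read one token, write its square."""
    while True:
        token = inp.get()     # blocking read: waits until a token is available
        if token is END:
            out.put(END)
            return
        out.put(token * token)

def sink(inp, results):
    """Collect all tokens until the stream ends."""
    while True:
        token = inp.get()
        if token is END:
            return
        results.append(token)

def run_network(n):
    c1, c2 = queue.Queue(), queue.Queue()   # unbounded FIFO channels
    results = []
    threads = [
        threading.Thread(target=source, args=(c1, n)),
        threading.Thread(target=square, args=(c1, c2)),
        threading.Thread(target=sink, args=(c2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_network(5))
```

Because each process only communicates through blocking FIFO reads, the collected sequence is the same for every interleaving of the three threads.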
Programming Model – Scalability
Scalability: “iterators” for large, multi-tile descriptions
<process name="src">
  <port type="output" name="out"/>
  <source type="c" location="src.c"/>
</process>

<iterator variable="i" range="N">
  <process name="src">
    <append function="i"/>
    <port type="output" name="out"/>
    <source type="c" location="src.c"/>
  </process>
</iterator>
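How an iterator might be unrolled into concrete processes can be sketched as follows. This is an illustration only, not the DOL tool's implementation: the wrapping `processnetwork` element, the `expand_iterators` helper, and the `name_index` naming rule are assumptions made for the example.

```python
import copy
import xml.etree.ElementTree as ET

# The iterator from the slide, wrapped in an assumed root element.
SPEC = """
<processnetwork>
  <iterator variable="i" range="N">
    <process name="src">
      <append function="i"/>
      <port type="output" name="out"/>
      <source type="c" location="src.c"/>
    </process>
  </iterator>
</processnetwork>
"""

def expand_iterators(root, variables):
    """Return a flat list of process elements, unrolling each top-level iterator."""
    processes = []
    for child in root:
        if child.tag == "process":
            processes.append(child)
        elif child.tag == "iterator":
            var = child.get("variable")
            count = int(variables[child.get("range")])
            for value in range(count):
                for template in child.findall("process"):
                    instance = copy.deepcopy(template)
                    for append in instance.findall("append"):
                        # Assumed naming rule: suffix the iterator value after an underscore.
                        if append.get("function") == var:
                            instance.set("name", f"{instance.get('name')}_{value}")
                        instance.remove(append)
                    processes.append(instance)
    return processes

root = ET.fromstring(SPEC)
names = [p.get("name") for p in expand_iterators(root, {"N": 3})]
print(names)
```

With `N` bound to 3, the single template yields three process instances, so the size of the description stays constant while the network grows.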
Abstract Platform Modeling
Elements
  Structure: processors, peripherals, memories, buses, etc.
  Interconnect: explicit read and write communication paths
  Performance data: e.g. latency and bandwidth of HW communication
Abstract Platform – Scalability
Specification: XML, including “iterators” capability
Mapping Specification
Mapping
  Binding
    Processes to processors
    SW channels to HW paths
  Scheduling
  Constraints
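The two binding relations can be made concrete with a small consistency check. This is a sketch under assumptions: the `Channel` and `Path` records, the `check_mapping` helper, and the rule that a channel's path must connect the processors of its two endpoint processes are all hypothetical and not taken from the DOL mapping schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Channel:
    name: str
    src: str   # producing process
    dst: str   # consuming process

@dataclass(frozen=True)
class Path:
    name: str
    src: str   # processor at the writing end
    dst: str   # processor at the reading end

def check_mapping(channels, paths, process_binding, channel_binding):
    """Return a list of binding errors; the list is empty if the mapping is consistent."""
    errors = []
    path_by_name = {p.name: p for p in paths}
    for ch in channels:
        path = path_by_name.get(channel_binding.get(ch.name))
        if path is None:
            errors.append(f"{ch.name}: no HW path bound")
            continue
        want = (process_binding[ch.src], process_binding[ch.dst])
        if (path.src, path.dst) != want:
            errors.append(
                f"{ch.name}: path {path.name} connects {path.src}->{path.dst}, "
                f"need {want[0]}->{want[1]}"
            )
    return errors

channels = [Channel("c1", "src", "filter"), Channel("c2", "filter", "sink")]
paths = [Path("local_arm0", "arm0", "arm0"), Path("arm0_to_arm1", "arm0", "arm1")]
process_binding = {"src": "arm0", "filter": "arm0", "sink": "arm1"}

good = {"c1": "local_arm0", "c2": "arm0_to_arm1"}
bad = {"c1": "arm0_to_arm1", "c2": "arm0_to_arm1"}
print(check_mapping(channels, paths, process_binding, good))
print(check_mapping(channels, paths, process_binding, bad))
```

A check of this kind is cheap, so it could be run on every candidate mapping before any performance evaluation is attempted.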
System Synthesis
Role
  Close the gap between system-level specification and implementation
Challenges
  Achieve desired performance
  Handle deadlocks, starvation, and data races
  Preserve KPN semantics
Automatic software synthesis: essential for efficient design
DOL Synthesis
Synthesis
  Functional synthesis: SystemC untimed, native execution model generation
  Software synthesis: HdS generation for MPARM, Atmel DIOPSIS, CELL
Strategy
  Source-to-source code generators from DOL KPN to implementation
  Automatic generation of “glue code”: process and channel implementations, bootstrapping, and scheduling
Functional Synthesis
Synthesis
  DOL processes and FIFOs: SystemC threads and channels
  SystemC main file: bootstrapping and scheduling
Features
  Execution: native, un-timed
  Debugging: standard tools, e.g., gdb
  Performance data extraction: monitor READ/WRITE/FIRE
Automatic synthesis of DOL KPN in functional SystemC
[Figure: processes P1, P2, P3 as SystemC threads (sc thread) with sc ports, connected by sc channels; labels: scheduler, P1.fire(), P2.fire(), P3.fire(), write(), read().]
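The idea of extracting performance data by monitoring READ, WRITE, and FIRE events can be sketched without SystemC. The following Python fragment is an illustration only: `MonitoredFifo`, `run`, and the readiness-based round-robin scheduler are assumptions of the sketch, and a process is fired only when it cannot block, which simplifies the blocking-read behavior of the real threads.

```python
from collections import Counter, deque

class MonitoredFifo:
    """FIFO channel that counts every read and write in a shared log."""
    def __init__(self, name, log):
        self.name, self.log, self.tokens = name, log, deque()

    def write(self, token):
        self.log[("WRITE", self.name)] += 1
        self.tokens.append(token)

    def read(self):
        self.log[("READ", self.name)] += 1
        return self.tokens.popleft()

    def __len__(self):
        return len(self.tokens)

def run(n_tokens):
    log = Counter()
    c1, c2 = MonitoredFifo("c1", log), MonitoredFifo("c2", log)
    produced = iter(range(n_tokens))
    received = []

    def p1_fire():
        c1.write(next(produced))

    def p2_fire():
        c2.write(c1.read() + 1)

    def p3_fire():
        received.append(c2.read())

    # (name, fire function, readiness test)
    processes = [
        ("P1", p1_fire, lambda: log[("FIRE", "P1")] < n_tokens),
        ("P2", p2_fire, lambda: len(c1) > 0),
        ("P3", p3_fire, lambda: len(c2) > 0),
    ]
    progress = True
    while progress:                      # stop when no process can fire
        progress = False
        for name, fire, ready in processes:
            if ready():
                log[("FIRE", name)] += 1
                fire()
                progress = True
    return log, received

log, received = run(3)
print(received)
print(sorted(log.items()))
```

The resulting counters are the kind of per-process and per-channel activity data that a later calibration step could combine with timing measurements.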
DOL Software Synthesis
  MPARM: multi-ARM tiles connected by NoC
  Atmel Diopsis 940: each tile is an ARM9 + DSP connected by an AMBA bus; several tiles connected via NoC
  Cell BE: PowerPC and 8 SPEs connected via ring bus
[Figure: Cell BE block diagram. Labels: PPE (PPU, L1 cache, L2 cache), eight SPEs (each with SPU, LS, MFC), MIC, memory, main storage, element interconnect bus (EIB).]
Legend:
  LS: Local Store
  MFC: Memory Flow Controller
  MIC: Memory Interface Controller
  PPE: Power Processor Element
  PPU: Power Processor Unit
  SPE: Synergistic Processor Element
  SPU: Synergistic Processor Unit
[Figure: tiled platform. Labels: tile (ARM core, SP, x-bar, DRAM ctrl, NI), switch, NoC.]
Design Space Exploration
Role
  Find Pareto-optimal mappings of an application on target architecture
Challenges
  Multiple contradictory objectives
  Exhaustive search not feasible
  Instruction-accurate simulation too slow for design space exploration
System-level automated design space exploration: the key element of an efficient design
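What "Pareto-optimal" means for mappings with several objectives can be shown with a short sketch. The candidate values below are made up for illustration, both objectives are assumed to be minimized, and the brute-force filter is not the SPEA2 algorithm used in the flow; it only defines the set that such a search approximates.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (delay, cost) pairs for five candidate mappings.
candidates = [(10.0, 4), (12.0, 3), (11.0, 5), (20.0, 1), (12.0, 4)]
front = pareto_front(candidates)
print(front)
```

This filter compares every pair of candidates, which is fine for a handful of points but is exactly what becomes infeasible when the candidates are all possible mappings.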
Mapping Optimization Framework
Control & GUI: EXPO (https://www.tik.ee.ethz.ch/expo), a tool to explore the design space for network processor architectures
Interface: PISA (https://www.tik.ee.ethz.ch/pisa), a Platform and language independent Interface for Search Algorithms
SPEA2 (Strength Pareto Evolutionary Algorithm)
MPA (Modular Performance Analysis), http://www.mpa.ethz.ch
EXPO-PISA Illustration
[Figure: plot with axes max. processor load and max. bus load.]
Performance Analysis
Roles
  Feedback for developer
  Verification of single designs
  Decision basis for design space exploration
Challenges
  Accuracy
  Speed
Formal performance analysis: the key element of a predictable design
DOL Performance Analysis
Goal: design real-time systems (multi-media, signal processing)
Method: Modular Performance Analysis (MPA), http://www.mpa.ethz.ch
Challenge: integrate MPA in DOL
  Generate MPA model from high-level spec
  Calibrate MPA model
[Figure: DOL design flow with the analysis step labeled “MPA analysis model generation”; inset plots with axes labeled #(events) and Δ.]
Modular Performance Analysis (MPA)*
Model
  Based on Network Calculus
  Modeling of streams and resources based on arrival and service curves
Output
  Worst-case bounds on system properties
(Large) MPSoC extensions
  Complex activation schemes, timing correlations, blocking semantics, cyclic dependencies
[Figure: MPA model. Labels: streams α, α′; resources β_RISC, β_BUS, β_DSP and β′_RISC, β′_BUS, β′_DSP; components P1, FIFO1, P2, FIFO2, P3.]
*http://www.mpa.ethz.ch
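How arrival and service curves lead to worst-case bounds can be illustrated for one simple, assumed pair of curve shapes: a token-bucket arrival curve α(Δ) = b + r·Δ and a rate-latency service curve β(Δ) = R·max(0, Δ − T), with r ≤ R. For this pair, standard Network Calculus gives a delay bound of T + b/R and a backlog bound of b + r·T. The parameter values and function names below are hypothetical, and the sketch does not use the MPA toolbox, which handles more general curves.

```python
def alpha(delta, b, r):
    """Token-bucket arrival curve: at most b + r*delta events in any window of length delta."""
    return b + r * delta

def beta(delta, R, T):
    """Rate-latency service curve: at least R*(delta - T) events served after latency T."""
    return R * max(0.0, delta - T)

def delay_bound(b, r, R, T):
    """Maximum horizontal distance between alpha and beta (requires r <= R)."""
    assert r <= R, "long-term arrival rate must not exceed service rate"
    return T + b / R

def backlog_bound(b, r, R, T):
    """Maximum vertical distance between alpha and beta (requires r <= R)."""
    assert r <= R, "long-term arrival rate must not exceed service rate"
    return b + r * T

# Hypothetical parameters: burst of 4 events, 1 event per time unit,
# service of 2 events per time unit after a latency of 3 time units.
b, r, R, T = 4.0, 1.0, 2.0, 3.0
delay = delay_bound(b, r, R, T)
backlog = backlog_bound(b, r, R, T)

# Cross-check the backlog bound numerically on a grid of window lengths.
grid = [k * 0.5 for k in range(41)]
numeric_backlog = max(alpha(d, b, r) - beta(d, R, T) for d in grid)
print(delay, backlog, numeric_backlog)
```

The backlog bound corresponds to a FIFO size that never overflows, and the delay bound to the worst-case latency through the component, which is the kind of guarantee the slide refers to as worst-case bounds on system properties.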
Modeling in MPA
[Figure: MPA modeling of a process, intra-processor communication, inter-processor communication, and complex computation.]
MPA Model Generation
Automatic MPA model generation in 2 steps
  1. Framework-independent model (XML format)
  2. Framework-specific model (Matlab script)
Challenges
  Relation between DOL spec and MPA model
  Sequential evaluation of parallel MPA model
  Accurate parameters
MPA Model Calibration
Goal: collect accurate performance data from simulation
Problem: simulation is too slow to run during design space exploration
Strategy: collect parameters beforehand, with “calibration mappings”
… A Few Results
Executing MJPEG decoder on MPARM*
[Figure: MPARM platform with ARM tile 1 to ARM tile N connected by a bus; each tile is labeled with ARM core, scratchpad memory, DMA controller, MMS, and instruction and data memory; the (optimal) mapping is indicated.]
*MPARM: virtual simulation platform of U. Bologna
Design Space Exploration
Set-up: PISA* and EXPO* (SPEA2)
Objectives
  1. end-to-end delay (upper bound in MPA)
  2. cost (additive model)
Population: 60 individuals x 50 generations
Pareto front: 6 solutions
Search time: ~2 hours
[Figure: current population plotted with axes end-to-end delay and cost; groups labeled 1 proc., 2 procs., 3 procs., 4 procs.]
*EXPO - https://www.tik.ee.ethz.ch/expo
*PISA - https://www.tik.ee.ethz.ch/pisa
Performance Analysis
Mapping MJPEG decoder on 3-tile MPARM
… Some Performance Figures: Speed
Model calibration: time-expensive (usual for all flows); cannot be included in the design space exploration loop
Model generation and performance analysis in MPA: seconds; reasonable for design space exploration
… Some Performance Figures: Accuracy
Differences: ~20%
  Some MPA operators do not produce tight bounds
  Simulation cannot provide actual worst/best-case behavior
…but system model and underlying architecture are well suited for analyzing this application!
[Figure: observed (simulation) vs. estimated bounds (MPA).]
… Some More Performance Figures
The DOL framework is mainly implemented in Java
(available at http://www.tik.ee.ethz.ch/~shapes)
Code size of different parts of the design flow:
Conclusions
“Accidental complexity” can be considerably reduced, resulting in a design flow that is both efficient and predictable, by…
  …using a fixed MoC (KPN) (vs. BSP approaches)
  …formal performance analysis (vs. simulation)
  …automated, system-level design space exploration (vs. ad-hoc, manual techniques that include synthesis)
Complete SW design flow (specification, synthesis, design space exploration, performance analysis) available: http://www.tik.ee.ethz.ch/~shapes