gajski vahid book ss slides
TRANSCRIPT
UC Irvine. Copyright (c) 1994 Daniel D. Gajski, Frank Vahid, Sanjiv Narayan, and Jie Gong. 1 of 214
SPECIFICATION AND DESIGN OF EMBEDDED SYSTEMS
by
Daniel D. Gajski, Frank Vahid, Sanjiv Narayan, Jie Gong
University of California at Irvine
Department of Computer Science
Irvine, CA 92715-3425

Design representations
• Behavioral: represents functionality but not implementation
• Structural: represents connectivity but not dimensionality
• Physical: represents dimensionality but not functionality

Levels of abstraction
Levels | Behavioral forms | Structural components | Physical objects
Transistor | Differential eq., current-voltage diagrams | Transistors, resistors, capacitors | Analog and digital cells
Gate | Boolean equations, finite-state machines | Gates, flip-flops | Modules, units
Register | Algorithms, flowcharts, instruction sets, generalized FSM | Adders, comparators, registers, counters, register files, queues | Microchips, ASICs
Processor | Executable spec., programs | Processors, controllers, memories, ASICs | PCBs, MCMs

Design methodologies
• Capture-and-simulate
    Schematic capture
    Simulation
• Describe-and-synthesize
    Hardware description language
    Behavioral synthesis
    Logic synthesis
• Specify-explore-refine
    Executable specification
    Software and hardware partitioning
    Estimation and exploration
    Specification refinement

Motivation
[Figure: an executable specification (for example "if (x = 0) then y = a * b / 2") is taken to a system implementation made of a processor, memory, ASIC, I/O, and a video accelerator. Labels on the path from specification to implementation: models, languages, partitioning, estimation, refinement. Downstream steps: behavioral synthesis, logic synthesis, software compilation, physical design, test generation, manufacturing.]

Outline
• Introduction
• Design models and architectures
• System-design languages
• An example
• Translation
• Partitioning
• Estimation
• Refinement
• Methodology and environments

Models and architectures
• Models are conceptual views of the system’s functionality
• Architectures are abstract views of the system’s implementation
[Figure: the design process leads from specification + constraints, expressed as models (specification), to an implementation, expressed as architectures (implementation).]

Models and architectures
• Model: a set of functional objects and rules for composing these objects
• Architecture: a set of implementation components and their connections

Models of an elevator controller

(a) English description
"If the elevator is stationary and the floor requested is equal to the current floor, then the elevator remains idle.
If the elevator is stationary and the floor requested is less than the current floor, then lower the elevator to the requested floor.
If the elevator is stationary and the floor requested is greater than the current floor, then raise the elevator to the requested floor."

(b) Algorithmic model
    loop
        if (req_floor = curr_floor) then
            direction := idle;
        elsif (req_floor < curr_floor) then
            direction := down;
        elsif (req_floor > curr_floor) then
            direction := up;
        end if;
    end loop;

(c) State-machine model
States: Down, Idle, Up. Each state has the same three outgoing transitions:
    (req_floor < curr_floor) / direction := down    (to Down)
    (req_floor = curr_floor) / direction := idle    (to Idle)
    (req_floor > curr_floor) / direction := up      (to Up)
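
The algorithmic and state-machine views of the elevator controller can be sketched in Python as follows. This is only an illustration and not part of the original slides: the names `next_direction` and `ElevatorFSM` are hypothetical, and floors are assumed to be plain integers.

```python
def next_direction(req_floor, curr_floor):
    """Algorithmic model: one pass through the body of the loop."""
    if req_floor == curr_floor:
        return "idle"
    elif req_floor < curr_floor:
        return "down"
    else:
        return "up"


class ElevatorFSM:
    """State-machine model with the three states Idle, Down, and Up."""

    def __init__(self):
        self.state = "Idle"
        self.direction = "idle"

    def step(self, req_floor, curr_floor):
        # Every state has the same three outgoing arcs, so the next state
        # depends only on how the requested and current floors compare.
        self.direction = next_direction(req_floor, curr_floor)
        self.state = self.direction.capitalize()
        return self.direction
```

Both forms compute the same direction; the state machine additionally records which state the controller is in after each step.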

Architectures for implementing the elevator controller
[Figure: (a) Register level: a state register and combinational logic, with signals req_floor, curr_floor, and direction. (b) System level: a processor and a memory connected by a bus, with in/out ports for req_floor, curr_floor, and direction.]

Models
• State-oriented models
    Finite-state machine (FSM), Petri net, Hierarchical concurrent FSM
• Activity-oriented models
    Dataflow graph, Flowchart
• Structure-oriented models
    Block diagram, RT netlist, Gate netlist
• Data-oriented models
    Entity-relationship diagram, Jackson’s diagram
• Heterogeneous models
    Control/dataflow graph, Structure chart, Programming language paradigm, Object-oriented paradigm, Program-state machine, Queueing model

State oriented: Finite-state machine (Mealy model)
S = {s1, s2, s3}
I = {r1, r2, r3}
O = {d2, d1, n, u1, u2}
f: S x I -> S
h: S x I -> O
[Figure: state diagram with states S1, S2, S3 and a start arc. Arcs are labeled input/output: r1/n, r2/n, r3/n, r2/u1, r3/u1, r3/u2, r1/d1, r2/d1, r1/d2.]
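
A Mealy machine of this shape can be sketched in Python by writing f and h as functions. The arc endpoints are not recoverable from this transcript, so the sketch assumes an elevator reading: state s_k means "at floor k", input r_j means "floor j requested", and the outputs d2, d1, n, u1, u2 mean down two, down one, none, up one, up two. The helper name `run` is hypothetical.

```python
def f(state, inp):
    """Next-state function f: S x I -> S (assumed: move to the requested floor)."""
    return "s" + inp[1]


def h(state, inp):
    """Output function h: S x I -> O (assumed: signed floor difference)."""
    diff = int(inp[1]) - int(state[1])
    if diff == 0:
        return "n"
    return ("u" if diff > 0 else "d") + str(abs(diff))


def run(inputs, state="s1"):
    """Feed a sequence of inputs to the machine and collect its outputs."""
    outputs = []
    for inp in inputs:
        # Mealy model: the output depends on the current state AND the input.
        outputs.append(h(state, inp))
        state = f(state, inp)
    return state, outputs
```

Under this assumption the nine (state, input) pairs produce exactly the nine arc labels listed for the figure.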

State oriented: Finite-state machine (Moore model)
[Figure: state diagram with nine states S11, S12, S13, S21, S22, S23, S31, S32, S33 and a start arc. Each state is labeled with its output (/d2, /d1, /n, /u1, or /u2); arcs are labeled with inputs only (r1, r2, r3).]

State oriented: Finite-state machine with datapath
[Figure: a single state S1 with a start arc and two transitions:]
    (curr_floor != req_floor) / output := req_floor − curr_floor; curr_floor := req_floor
    (curr_floor = req_floor) / output := 0
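
The single-state FSMD can be sketched in Python by keeping curr_floor as a datapath variable instead of encoding it in the state. The class name `ElevatorFSMD` and the default starting floor are assumptions made for illustration.

```python
class ElevatorFSMD:
    """One control state (S1); the floor lives in a datapath variable."""

    def __init__(self, curr_floor=1):
        self.curr_floor = curr_floor

    def step(self, req_floor):
        if self.curr_floor != req_floor:
            # output := req_floor - curr_floor; curr_floor := req_floor
            output = req_floor - self.curr_floor
            self.curr_floor = req_floor
        else:
            # output := 0
            output = 0
        return output
```

Because the floor is stored as data, the number of control states does not grow with the number of floors.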

Finite-state machines
• Merits:
    represent system’s temporal behavior explicitly
    suitable for control-dominated systems
• Demerits:
    lack of hierarchy and concurrency, resulting in state or arc explosion when representing complex systems

State oriented: Petri nets
Net = (P, T, I, O, u)
P = {p1, p2, p3, p4, p5}
T = {t1, t2, t3, t4}
I: I(t1) = {p1}, I(t2) = {p2, p3, p5}, I(t3) = {p3}, I(t4) = {p4}
O: O(t1) = {p5}, O(t2) = {p3, p5}, O(t3) = {p4}, O(t4) = {p2, p3}
u: u(p1) = 1, u(p2) = 1, u(p3) = 2, u(p4) = 0, u(p5) = 1
[Figure: the net drawn with places p1 to p5 and transitions t1 to t4.]
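
The net defined above can be executed with a few lines of Python. This sketch assumes the usual firing rule with unit arc weights: a transition is enabled when every input place holds at least one token, and firing removes one token from each input place and adds one to each output place. The function names `enabled` and `fire` are hypothetical.

```python
# Input and output functions and the initial marking, copied from the slide.
I = {"t1": ["p1"], "t2": ["p2", "p3", "p5"], "t3": ["p3"], "t4": ["p4"]}
O = {"t1": ["p5"], "t2": ["p3", "p5"], "t3": ["p4"], "t4": ["p2", "p3"]}
u0 = {"p1": 1, "p2": 1, "p3": 2, "p4": 0, "p5": 1}


def enabled(marking, t):
    """A transition is enabled if each of its input places has a token."""
    return all(marking[p] >= 1 for p in I[t])


def fire(marking, t):
    """Return the marking reached by firing transition t."""
    if not enabled(marking, t):
        raise ValueError(t + " is not enabled")
    new_marking = dict(marking)
    for p in I[t]:
        new_marking[p] -= 1
    for p in O[t]:
        new_marking[p] += 1
    return new_marking
```

In the initial marking t1, t2, and t3 are enabled and t4 is not, because p4 holds no token.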

Petri nets
[Figure: Petri net fragments modeling (a) sequence, (b) branch, (c) synchronization, (d) resource contention, (e) concurrency.]

Petri nets
• Merits:
    good at modeling and analyzing concurrent systems
• Demerits:
    ‘flat’ model that is incomprehensible when system complexity increases

State oriented: Hierarchical concurrent FSM
[Figure: statechart with top-level state Y containing states A through G; arcs are labeled a, b, r, s, u, and a(P)/c.]

Hierarchical concurrent FSMs
• Merits:
    support both hierarchy and concurrency
    good for representing complex systems
• Demerits:
    concentrate only on modeling control aspects and not data and activities

Activity oriented: Dataflow graphs (DFG)
[Figure: (a) Activity level: activities A1 and A2, with A2 decomposed into A2.1, A2.2, and A2.3, plus a file, an input, and outputs, connected by data flows V, V’, W, X, Y, Z. (b) Operation level: + and * operations on data X, Y, W, and Z.]

Dataflow graphs
• Merits:
    support hierarchy
    suitable for specifying complex transformational systems
    represent problem-inherent data dependencies
• Demerits:
    do not express temporal behaviors or control sequencing
    weak for modeling embedded systems

Activity oriented: Flowchart (CFG)
[Figure: flowchart with the following boxes and decisions:]
    start
    J = 1; MAX = 0
    J > N ?             Yes: end
    MEM(J) > MAX ?      Yes: MAX = MEM(J)
    J = J+1             (back to the J > N test)
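
Read as a program, the flowchart scans MEM(1..N) for its largest element. The Python sketch below follows that reading; the arc routing is inferred from the box labels, and the function name `find_max` is hypothetical. The 1-based index J is kept to match the flowchart.

```python
def find_max(mem):
    """Scan mem as MEM(1..N) and return MAX, starting from MAX = 0."""
    n = len(mem)
    j = 1
    max_val = 0
    while not j > n:              # decision "J > N": Yes leads to end
        if mem[j - 1] > max_val:  # decision "MEM(J) > MAX"
            max_val = mem[j - 1]  # box "MAX = MEM(J)"
        j = j + 1                 # box "J = J+1"
    return max_val
```

Because MAX starts at 0, the result is 0 for an empty memory or one that holds only negative values.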

Flowcharts
• Merits:
    useful to represent tasks governed by control flow
    can impose an order to supersede natural data dependencies
• Characteristics:
    used only when the system’s computation is well known

Structure oriented: Component-connectivity diagrams
[Figure: (a) Block diagram: processor, program memory, data memory, I/O coprocessor, and application specific hardware on a system bus. (b) RT netlist: register file, ALU with inputs A and B, LIR, RIR, left bus, right bus. (c) Gate netlist.]

Component-connectivity diagrams
• Merits:
    good at representing system’s structure
• Characteristics:
    often used in the later phases of design process

Data oriented: Entity-relationship diagram
[Figure: entities Customer, Order, Product, and Supplier; relationships Request, Availability, and P.O. instance.]

Entity-relationship diagrams
• Merits:
    provide a good view of the data in the system
    suitable for expressing complex relations among various kinds of data
• Demerits:
    do not describe any functional or temporal behavior of the system

Data oriented: Jackson’s diagram
[Figure: tree with nodes Drawing, Users, Name, Shape, Color, Rectangle, Circle, Width, Height, and Radius; decompositions are marked AND, OR, and *.]

Jackson’s diagrams
• Merits:
    suitable for representing data having a complex composite structure
• Demerits:
    do not describe any functional or temporal behavior of the system

Heterogeneous: Control/dataflow graph
[Figure: (a) Activity level: a control node with states S0, S1, S2 and a start arc enables and disables the activities A1, A2, A3, which exchange data W, X, Y, Z. Transition labels: start / enable A1, enable A2; W = 10 / disable A1, enable A3; stop / disable A2, disable A3.
(b) Operation level: a control flow graph whose nodes refer to data flow graphs for the assignments A := X + W, A := X + 3, X := X + 2, A := X + 5, each built from Read, Const, +, and Write nodes.]

Control/dataflow graphs
• Merits:
    correct the inability of DFG to represent the control of a system
    correct the inability of CFG to represent data dependencies

Heterogeneous: Structure chart
[Figure: structure chart with modules Main, Get, Get_A, Get_B, Transform, Change_A, Change_B, Compute, Do_Loop1, Do_Loop2, and Out_C. Arcs carry data (A, B, A’, B’, C, D) and control; symbols mark branch and iteration.]

Structure charts
• Merits:
    represent both data and control
• Characteristics:
    used in the preliminary stages of program design

Heterogeneous: Programming languages
• Imperative vs declarative programming languages:
    Imperative: C, Pascal, Ada, C++, etc.
    Declarative: LISP, PROLOG, etc.
• Sequential vs concurrent programming languages:
    Sequential: Pascal, C, etc.
    Concurrent: CSP, Ada, VHDL, etc.

Programming languages
• Merits:
    model data, activity, and control
• Demerits:
    do not explicitly model the system’s states

Heterogeneous: Object-oriented paradigm
[Figure: three objects, each encapsulating data and operations, together with a transformation function.]

Object-oriented paradigms
• Merits:
    support information hiding, inheritance, natural concurrency
• Demerits:
    not suitable for systems with complicated transformation functions

Heterogeneous: Program-state machine
[Figure: state Y containing states A, B, C, D with transitions on events e1, e2, e3; one state is described by the following program:]
    variable A: array[1..20] of integer;
    variable i, max: integer;
    max = 0;
    for i = 1 to 20 do
        if (A[i] > max) then
            max = A[i];
        end if;
    end for

Program-state machines
• Merits:
    represent system’s states, data, control and activities in a single model
    overcome the limitations of programming languages and HCFSM models

Heterogeneous: Queueing model
[Figure: (a) One server: arriving requests enter a queue in front of a server. (b) Multiple servers: arriving requests are handled by several servers.]

Queueing model
• Characteristics:
    used for analyzing system’s performance
    can find utilization, queueing length, throughput

Architectures
• Application-specific architectures
    Controller architecture
    Datapath architecture
    Finite-state machine with datapath (FSMD)
• General-purpose processors
    Complex instruction set computer (CISC)
    Reduced instruction set computer (RISC)
    Vector machine
    Very long instruction word computer (VLIW)
• Parallel processors

Controller architecture
[Figure: a state register, a next-state function, and an output function, with inputs and outputs.]

Datapath architecture
[Figure: datapath computing y(i) from x(i), x(i−1), x(i−2), x(i−3) and coefficients b(0), b(1), b(2), b(3) with * and + units, divided into pipeline stages. (a) Three stage pipeline. (b) Four stage pipeline.]

FSMD
[Figure: a control unit (state register, next-state function, output function) sends control signals to a datapath and receives status from it; the datapath has its own datapath inputs and datapath outputs.]

CISC architecture
[Figure: control unit with microprogram memory, MicroPC, +1 incrementer, and address selection logic; instruction register; PC; datapath; memory. Control and status signals connect the control unit and the datapath.]

RISC architecture
[Figure: control unit with state register, instruction register, and hardwired output and next-state logic; datapath with register file, ALU, instruction cache, and data cache; memory. Control and status signals connect the control unit and the datapath.]

Vector machines
[Figure: interleaved memory, memory pipes, vector registers, scalar registers, a vector functional unit, and a scalar functional unit.]

VLIW architecture
[Figure: memory, a register file, and four functional units (+, +, *, *).]

Parallel processors: SIMD/MIMD
[Figure: a control unit with processing elements PE0, PE1, ..., PE N−1. (a) Message passing: processors Proc. 0 to Proc. N−1 with memories Mem. 0 to Mem. N−1 and an interconnection network. (b) Shared memory: processors Proc. 0 to Proc. N−1, an interconnection network, and memories Mem. 0 to Mem. N−1.]

Conclusion
• Different models focus on different aspects
• Proper model needs to represent system’s features
• Models are implemented in architectures
• Smooth transformation of models to architectures increases productivity

System specification
• For every design, there exists a conceptual view
• Conceptual view depends on application
    Computation: conceptualized as a program
    Controller: conceptualized as a state-machine
• Goal of specification language
    Capture conceptual view with minimum designer effort
• Ideal language
    1-to-1 mapping between conceptual model & language constructs

Outline
• Characteristics of commonly used conceptual models:
    Concurrency, hierarchy, synchronization
• Requirements for embedded system specification
• Evaluate HDLs with respect to embedded systems
    VHDL, Verilog, Esterel, CSP, Statecharts, SDL, SpecCharts

Concurrency
• Behavior: a chunk of system functionality
    e.g. process, procedure, state-machine
• System often conceptualized as set of concurrent behaviors
• Concurrency can exist at different abstraction levels:
    Job-level
    Task-level
    Statement-level
    Operation-level
    Bit-level
• Two types of concurrency within a behavior
    Data-driven, Control-driven

Data-driven concurrency
• Operations execute when input data is available
• Execution order determined by data dependencies
    1: Q = A + B
    2: Y = X + P
    3: P = (C − D) * Q
[Figure: dataflow graph: A and B feed an add that produces Q; C and D feed a subtract; the subtract result and Q feed a multiply that produces P; X and P feed an add that produces Y.]
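
The three statements can be executed in data-driven order with a small Python scheduler: a statement runs as soon as all of its operands have values, regardless of its position in the text. The scheduler `run_dataflow` and the table layout are hypothetical names chosen for this sketch.

```python
# statement number -> (target, function of the value table, operands needed)
STATEMENTS = {
    1: ("Q", lambda v: v["A"] + v["B"], ["A", "B"]),
    2: ("Y", lambda v: v["X"] + v["P"], ["X", "P"]),
    3: ("P", lambda v: (v["C"] - v["D"]) * v["Q"], ["C", "D", "Q"]),
}


def run_dataflow(inputs):
    """Execute statements when their input data is available."""
    values = dict(inputs)
    pending = dict(STATEMENTS)
    order = []
    while pending:
        ready = [n for n, (_, _, deps) in sorted(pending.items())
                 if all(d in values for d in deps)]
        if not ready:
            raise ValueError("remaining statements can never execute")
        for n in ready:
            target, fn, _ = pending.pop(n)
            values[target] = fn(values)
            order.append(n)
    return values, order
```

With A, B, C, D, and X supplied, the statements execute in the order 1, 3, 2: statement 2 has to wait for P even though it is written second.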

Control-driven concurrency
• Control thread: set of operations executed sequentially
• Concurrency represented by multiple control threads

Fork-join statement
    sequential behavior X
    begin
        Q();
        fork A(); B(); C(); join;
        R();
    end behavior X;
[Figure: Q, then A, B, C in parallel, then R.]

Process statement
    concurrent behavior X
    begin
        process A();
        process B();
        process C();
    end behavior X;
[Figure: A, B, C in parallel.]
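
The fork-join form of behavior X can be approximated with Python threads. This is only a sketch: the sub-behaviors are reduced to log entries, and the helper `run` and the function name `behavior_x` are hypothetical.

```python
import threading


def behavior_x():
    """Q, then A, B, C as concurrent control threads, then R."""
    log = []
    lock = threading.Lock()

    def run(name):
        # Stand-in for the body of a sub-behavior: record that it ran.
        with lock:
            log.append(name)

    run("Q")
    threads = [threading.Thread(target=run, args=(name,))
               for name in ("A", "B", "C")]
    for t in threads:   # fork: start one control thread per sub-behavior
        t.start()
    for t in threads:   # join: wait until all three have completed
        t.join()
    run("R")
    return log
```

Q is always logged first and R last; the relative order of A, B, and C is not fixed, which is the point of the fork.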

State-transitions
• Systems often are state-based, e.g. controllers
• State may represent
    mode or stage of being
    computation
• Difficult to capture using programming constructs
[Figure: state diagram with states P, Q, R, S, T, start and finish arcs, and transitions labeled u, v, w, x, y, z.]

Hierarchy
• Required for managing system complexity
    Allows system modeler to focus on one subsystem at a time
    Enhances comprehension of system functionality
    Scoping mechanism for objects like types and variables
• Two types of hierarchy
    Structural hierarchy
    Behavioral hierarchy

Structural hierarchy
• System represented as set of interconnected components
• Interconnections between components represent wires
• Several levels: systems, chips, RT-components, gates
[Figure: a System containing a Memory and a Processor; the Processor contains Control Logic and a Datapath; interconnections include a data bus and control lines.]

Behavioral hierarchy
• Ability to successively decompose behavior into sub-behaviors
• Concurrent decomposition
    Fork-join
    Process
• Sequential decomposition
    Procedure
    State-machine
    behavior P
        variable x, y;
    begin
        Q(x);
        R(y);
    end behavior P;
[Figure: behavior P decomposed into Q (with Q1, Q2, Q3) and R (with R1, R2), with transitions on events e1 through e8.]

Programming constructs
• Some behaviors easily conceptualized as sequential algorithms
• Wide variety of constructs available
    Assignment, branching, iteration, subprograms, recursion, complex data types (records, lists)
    type buffer_type is array (1 to 10) of integer;
    variable buf : buffer_type;
    variable i, j : integer;
    for i = 1 to 10
        for j = i to 10
            if (buf(i) > buf(j)) then
                SWAP(buf(i), buf(j));
            end if;
        end for;
    end for;

Behavioral completion
• Behavior completes when all computations performed
• Advantages
    Behavior can be viewed without inter-level transitions
    Allows natural decomposition into sequential subbehaviors
[Figure: behavior B with sub-behaviors X (X1, X2, X3) and Y (Y1, Y2) and transitions on events e1 through e5; a state machine with states q0 through q3, a start arc, and a final state.]

Communication
• Concurrent behaviors exchange data
• Shared-memory model
    Sender updates common medium
    Persistent, non-persistent
• Message-passing model
    Data sent over abstract channels
    Unidirectional / bidirectional
    Point-to-point / multiway
    Blocking / non-blocking
[Figure: process P and process Q communicating through shared memory, and through channel C:]
    process P
    begin
        variable x
        ....
        send (x);
        ....
    end

    process Q
    begin
        variable y
        ....
        receive (y);
        ....
    end
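
The message-passing model can be sketched in Python with two threads and a bounded queue standing in for channel C. The names `run_channel`, `process_p`, and `process_q` and the end-of-stream sentinel are assumptions of this sketch. A `queue.Queue` with `maxsize=1` blocks the sender only while the channel is full, so it approximates blocking communication but is not a strict rendezvous.

```python
import queue
import threading


def run_channel(values):
    """Process P sends each value over channel C; process Q receives them."""
    channel = queue.Queue(maxsize=1)   # unidirectional, point-to-point
    received = []
    done = object()                    # sentinel marking the end of the stream

    def process_p():
        for x in values:
            channel.put(x)             # send(x): blocks while the channel is full
        channel.put(done)

    def process_q():
        while True:
            y = channel.get()          # receive(y): blocks until data arrives
            if y is done:
                break
            received.append(y)

    p = threading.Thread(target=process_p)
    q = threading.Thread(target=process_q)
    p.start()
    q.start()
    p.join()
    q.join()
    return received
```

The receiver sees the values in the order they were sent, without P and Q sharing any variable other than the channel itself.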

Synchronization
• Concurrent behaviors execute at different speeds
• Synchronization required when
    Data exchanged between behaviors
    Different activities must be performed simultaneously
• Two types of synchronization mechanisms
    Control-dependent
    Data-dependent

Control-dependent synchronization
• Synchronization based on control structure of behavior

Fork-join
    behavior X
    begin
        Q();
        fork A(); B(); C(); join;
        R();
    end behavior X;
[Figure: Q, then A, B, C in parallel, then a synchronization point before R.]

Reset
[Figure: parent behaviors ABC (with A, B, C) and AB (with A containing A1, A2 and B containing B1, B2); an arc labeled e on the parent behavior.]

Data-dependent synchronization
• Synchronization based on communication of data between behaviors
[Figure: three versions of behavior AB with sub-behaviors A (A1, A2) and B (B1, B2).
    Synchronization by common event: both A and B have a transition on event e.
    Synchronization by common variable: A1 sets x:=0 and A2 sets x:=1; B has a transition guarded by (x=1).
    Synchronization by status detection: B has a transition guarded by "entered A2".]

Exception handling
• Occurrence of event terminates current computation
• Control transferred to appropriate next mode
• Example of exceptions: interrupts, resets
[Figure: behavior P with sub-behaviors P1 and P2; an arc labeled e leads from P to Q.]

Timing
• Required to represent real world implementations
• Functional timing: affects simulation of system specification
    wait for 200 ns;
    A <= A + 1 after 100 ns;
• Timing constraints: guide synthesis and verification tools
[Figure: timing diagram of IN and OUT over time with constraints min 50 ns and max 10 ms; behaviors B, P, and Q; channel C (max 10 Mb/s).]

Embedded system specification
• Embedded system: behavior defined by interaction with environment
• Essential characteristics
    State-transitions
    Exceptions
    Behavioral hierarchy
    Concurrency
    Programming constructs
    Behavioral completion
[Figure: three small diagrams: a state machine with states P, Q, R, a start arc, and transitions u, v, w, x; behavior P (with P1, P2) with an arc e to Q; and P followed by a fork into Q and R and a join into S.]

VHDL
• IEEE standard, intended for documentation and exchange of designs [IEE88]
• Characteristics supported
    Behavioral hierarchy: single level of processes
    Structural hierarchy: nested blocks and component instantiations
    Concurrency: task-level (process), statement-level (signal assignment)
    Programming constructs
    Communication: shared-memory using global signals
    Synchronization: wait on and wait until statements
    Timing: wait for statement, after clause in assignments
• Characteristics not supported
    Exceptions: partially supported by guarded signal assignments
    State transitions

Verilog and Esterel
• Verilog [TM91] developed as proprietary language for specification, simulation
• Esterel [Hal93] developed for specification of reactive systems
• Characteristics supported:
    Behavioral hierarchy: fork-join
    Structural hierarchy: hierarchy of interconnected modules
    Programming constructs
    Communication: shared registers (Verilog) and broadcasting (Esterel)
    Synchronization: wait for an event on a signal
    Timing: modeling of gate, net, assignment delays in Verilog
    Exceptions: disable (Verilog), watching, do-upto, trap statements (Esterel)
• Characteristics not supported:
    State transitions

SDL (Specification and Description Language)
• CCITT standard in telecommunication for protocol specification [BHS91]
• Characteristics supported
    Behavioral hierarchy: nested dataflow
    Structural hierarchy: nested blocks
    State transitions: state machine in processes
    Communication: message passing
    Timing: timeouts generated by timer object
• Characteristics not supported
    Exceptions
    Programming constructs
[Figure: a system containing blocks connected by channels; blocks contain processes connected by signal routes.]

CSP (Communicating Sequential Processes)
• Intended to specify programs running on multiprocessor machines [Hoa78]
• Characteristics supported
    Behavioral hierarchy: fork-join using parallel command
    Programming constructs
    Communication: message passing using input, output commands
    Synchronization: blocking message passing
• Characteristics not supported
    Exceptions
    State transitions
    Structural hierarchy
    Timing

SpecCharts
• Developed for embedded system specification [NVG92]
• PSM (program-state machine) model + VHDL
• Characteristics supported
    Behavioral hierarchy: sequential/concurrent behaviors
    State transitions: TOC (transition on completion) arcs
    Communication: shared memory, message passing
    Exceptions: TI (transition immediately) arcs
• Characteristics similar to VHDL
    Programming constructs
    Structural hierarchy
    Synchronization and Timing
[Figure: behavior E containing B, X (with X1, X2), and Y, with transitions on events e1, e2, e3, annotated with the following declarations and code:]
    port P, Q : in integer;
    type INTARRAY is array (natural range <>) of integer;
    signal A : INTARRAY (15 downto 0);
    variable MAX : integer;
    MAX := 0;
    for J in 0 to 15 loop
        if (A(J) > MAX) then
            MAX := A(J);
        end if;
    end loop

SpecCharts: state transitions
• State transitions represented by TOC and TI arcs between behaviors
[Figure: states P, Q, R with a start arc and transitions labeled u, v, w, x.]
    behavior MAIN type sequential subbehaviors is
    begin
        P : (TOC, u, Q);
        Q : (TOC, v, P), (TOC, w, R);
        R : (TOC, x, Q);
        behavior P .....
        behavior Q .....
        behavior R .....
    end MAIN;
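
The arc list of MAIN can be read as a transition table. The Python sketch below stores the three entries as data and follows TOC arcs; it is an illustration rather than SpecCharts semantics in full. The names `ARCS`, `next_behavior`, and `trace` are hypothetical, and each step is assumed to model the moment at which the current sub-behavior completes.

```python
# behavior -> list of (arc type, condition, next behavior), as in MAIN
ARCS = {
    "P": [("TOC", "u", "Q")],
    "Q": [("TOC", "v", "P"), ("TOC", "w", "R")],
    "R": [("TOC", "x", "Q")],
}


def next_behavior(current, true_conditions):
    """On completion of `current`, take the first TOC arc whose condition holds."""
    for kind, cond, target in ARCS[current]:
        if kind == "TOC" and cond in true_conditions:
            return target
    return None


def trace(start, condition_sequence):
    """Follow arcs from `start`, given the conditions true at each completion."""
    path = [start]
    current = start
    for conditions in condition_sequence:
        nxt = next_behavior(current, conditions)
        if nxt is None:
            break
        path.append(nxt)
        current = nxt
    return path
```

For example, with conditions u, w, x, v becoming true in turn, the active behavior moves along P, Q, R, Q, P.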

SpecCharts: behavioral hierarchy
• Hierarchy represented by nested behaviors
• Behavior decomposed into sequential or concurrent subbehaviors
[Figure: P, then a fork into Q and R, then a join into S.]
    behavior MAIN type sequential subbehaviors is
    begin
        P : (TOC, true, Q_R);
        Q_R : (TOC, true, S);
        S : ;
        behavior P .....
        behavior Q_R type concurrent subbehaviors is
        begin
            Q : (TOC, true, halt);
            R : (TOC, true, halt);
            behavior Q .....
            behavior R .....
        end Q_R;
        behavior S .....
    end MAIN;

SpecCharts: exceptions
• Exceptions represented by TI (transition immediately) arcs
[Figure: behavior P with sub-behaviors P1 and P2; an arc labeled e leads from P to Q.]
    behavior MAIN type sequential subbehaviors is
    begin
        P : (TI, e, Q);
        Q : ;
        behavior P .......
            behavior P1 .......
            behavior P2 .......
        behavior Q ......
    end MAIN;

Summary
[Table: embedded system features by language.
    Rows (language): VHDL, Verilog, Esterel, SDL, CSP, Statecharts, SpecCharts.
    Columns (embedded system features): state transitions, behavioral hierarchy, concurrency, program constructs, exceptions, behavioral completion.
    Legend: feature fully supported, feature partially supported, feature not supported.]

Specification example
• An executable specification language enables:
    Early verification
    Precision
    Automation
    Documentation
• A good language/model match reduces:
    Capture time
    Comprehension time
    Functional errors

Outline
• Capture an example’s model in a particular language
    PSM model in the SpecCharts language
• Point out the benefits of a good language/model match
• Highlight experiments that demonstrate those benefits

Answering machine controller’s environment
[Figure: the controller and its environment.
    Line circuitry (phone line): hangup, offhook, beep, ring, tone
    Machine panel: rec ann, hear ann, memo, stop, rew, play, fwd, play msgs, tollsaver, power on/off, light, mic
    Announcement unit: ann_done, ann_play, ann_rec
    Tape unit (messages): tape_fwd, tape_play, tape_rec, tape_rew, tape_cnt]

Highest-level view of the controller
[Figure: behavior Controller with sub-behaviors SystemOff and SystemOn; the transitions between them are labeled power=’1’ and power=’0’.]

The SystemOn behavior
• System usually responds to the line
• Pressing any machine button gets immediate response
[Figure: behavior SystemOn with sub-behaviors RespondToLine and RespondToMachineButton; an arc labeled rising(any_button_pushed) leads to RespondToMachineButton.]

The RespondToMachineButton behavior
Two views, (a) and (b), of the behavior: as code, and as a set of sub-behaviors selected by conditions.
    behavior RespondToMachineButton type code is
    begin
        if (play=’1’) then
            HandlePlay;
        elsif (fwd=’1’) then
            HandleFwd;
        elsif (rew=’1’) then
            HandleRew;
        elsif (memo=’1’) then
            HandleMemo;
        elsif (stop=’1’) then
            HandleStop;
        elsif (hear_ann=’1’) then
            HandleHearAnn;
        elsif (rec_ann=’1’) then
            HandleRecAnn;
        elsif (play_msgs=’1’) then
            HandlePlayMsgs;
        end if;
    end;
[Figure: RespondToMachineButton with sub-behaviors HandlePlay (play=’1’), HandleFwd (fwd=’1’), HandleRew (rew=’1’), HandleMemo (memo=’1’), HandleStop (stop=’1’), HandleHearAnn (hear_ann=’1’), HandleRecAnn (rec_ann=’1’), HandlePlayMsgs (play_msgs=’1’).]
The RespondToLine behavior
• Monitors line for rings
• Answers line
• Responds to exceptions
  - Hangup
  - Machine turned off
[Figure: RespondToLine contains Monitor and Answer; arcs are labeled rising(hangup) and falling(machine_on).]
The Monitor behavior
• Counts for required rings
• Requirements may change
[Figure: Monitor contains two concurrent subbehaviors, MaintainRingsToWait and CountRings.]

signal rings_to_wait : integer range 1 to 20 := 4;

-- MaintainRingsToWait
loop
    rings_to_wait <= DetermineRingsToWait;
    wait on tollsaver, machine_on;
end loop;

function DetermineRingsToWait return integer is
begin
    if ((num_msgs > 0) and (tollsaver='1') and (machine_on='1')) then
        return(2);
    elsif (machine_on='1') then
        return(4);
    else
        return(15);
    end if;
end;

-- CountRings
variable i : integer range 0 to 20;

i := 0;
while (i < rings_to_wait) loop
    wait on rings_to_wait, ring;
    if (rising(ring)) then
        i := i + 1;
    end if;
end loop;
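The ring-count rule in DetermineRingsToWait can be checked in isolation. The following is a minimal Python sketch of that decision only; the function and parameter names are illustrative, and booleans stand in for the '0'/'1' signal values.

```python
def determine_rings_to_wait(num_msgs, tollsaver, machine_on):
    """Mirror of the slide's DetermineRingsToWait decision (illustrative names)."""
    if num_msgs > 0 and tollsaver and machine_on:
        return 2    # tollsaver: answer early when messages are waiting
    if machine_on:
        return 4    # normal operation
    return 15       # machine switched off: answer only after many rings

rings_with_messages = determine_rings_to_wait(3, True, True)
rings_normal = determine_rings_to_wait(0, True, True)
rings_machine_off = determine_rings_to_wait(3, True, False)
```

All three results stay inside the declared range 1 to 20 of rings_to_wait.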
The Answer behavior
[Figure, panels (a) to (c): Answer contains PlayAnnouncement, RecordMsg, Hangup, and RemoteOperation; arcs are labeled rising(hangup) and button="0001" (twice). The code of two subbehaviors follows.]

behavior PlayAnnouncement type code is
begin
    ann_play <= '1';
    wait until ann_done = '1';
    ann_play <= '0';
end;

behavior RecordMsg type code is
begin
    ProduceBeep(1 s);
    if (hangup = '0') then
        tape_rec <= '1';
        wait until hangup='1' for 100 s;
        ProduceBeep(1 s);
        num_msgs <= num_msgs + 1;
        tape_rec <= '0';
    end if;
end;
The RemoteOperation behavior
• Owner can operate machine remotely by phone
• Owner identifies himself with a four-button ID
[Figure, panels (a) and (b): RemoteOperation contains CheckCode and RespondToCmds; arcs are labeled code_ok='1', code_ok='0', and hangup='1'.]

behavior CheckUserCode type code is
begin
    code_ok <= true;
    for (i in 1 to 4) loop
        wait until tone /= "1111" and tone'event;
        if (tone /= user_code(i)) then
            code_ok <= false;
        end if;
    end loop;
end;
The answering machine controller specification
[Figure: the complete hierarchy.
 Controller: SystemOff, SystemOn; arcs power='1', power='0'.
 SystemOn: InitializeSystem, RespondToLine, RespondToMachineButton; arc rising(any_button_pushed).
 RespondToLine: Monitor, Answer; arcs rising(hangup), falling(machine_on).
 Answer: PlayAnnouncement, RecordMsg, Hangup, RemoteOperation; arcs tone="0001", rising(hangup).
 RemoteOperation: CheckUserCode, RespondToCmds; arcs code_ok, not code_ok, hangup='1'.
 RespondToCmds: HearMsgsCmds, MiscCmds, ResetTape; arcs tone="0010", hangup='1', other.]
Executable specification use
• Precision
  - Readability and precision compete in a natural language
  - Executable specification encourages precision
  - Designer asks questions, specification answers them
• Language/model match (SpecCharts/PSM):
  - Hierarchy
  - State-transitions
  - Programming constructs
  - Concurrency
  - Exceptions
  - Completion
  - Equivalence of states and programs
Specification capture experiment
                                                  VHDL   SpecCharts
Number of modelers                                3      3
Average specification time in minutes             40     16
Number of incorrect specifications first time     2      0
Number of incorrect specifications second time    1      0
• VHDL modelers required 2.5 times longer
• Two VHDL specifications possessed control errors
• SpecCharts were effective for state-transitions and exceptions
Comparison of SpecCharts, VHDL and Statecharts
Answering machine example
Specification attributes:
                   Conceptual model   SpecCharts   VHDL (hierarch.)   VHDL (flat)   Statecharts
Program-states     42                 42           42                 32            80
Arcs               40                 40           40                 152           135
Control signals    --                 0            84                 1             0
Lines/leaf         --                 7            27                 29            --
Lines              --                 446          1592               963           --
Words              --                 1733         6740               8088          --
Shortcomings (the original table marks each with an X per language; the column placement of its eight X marks is not recoverable here):
  - No sequential program constructs
  - No state-transition constructs
  - No hierarchy
  - No exception constructs
  - No hierarchical events
Design quality experiment
Design attribute        Designed from English   Designed from SpecCharts
Control transistors     3130                    2630
Datapath transistors    2277                    2251
Total transistors       5407                    4881
Total pins              38                      38
• No loss in design quality with an executable language
Summary
• Executable languages encourage precision and automation
• The language should support an appropriate model
  - Makes specification easy
• Strongly parallels programming languages
  - Structured vs. assembly languages
  - Object-oriented model and C++
Translation
• Model often unsupported by a standard language
(1) Use a standard language anyway
  - Many tools available
  - But, captures model unnaturally
(2) Use an application-specific language
  - Captures model naturally
  - But, not many tools available
(3) Use a front-end language
  - Captures model naturally
  - Many tools available after translating to a standard
Outline
• Front-end language in VHDL environment
• State machine translation
• Fork-join translation
• Exception translation
A front-end language in a VHDL environment
[Figure: SpecCharts is converted by a Translator into VHDL, which enters the VHDL environment (Simulator, Debugger, Test-generator, Synthesis tool) alongside directly written VHDL; the environment produces the tool output.]
State machine translation
[Figure, panels (a) and (b): a state machine and its translation. The machine starts in P; P goes to Q on u and to R on not u; Q returns to P; R goes to Q.]

type state_type is (P, Q, R);
variable state : state_type := P;

loop
    case (state) is
        when P =>
            <actions for P>
            if (u) then
                state := Q;
            elsif (not u) then
                state := R;
            end if;
        when Q =>
            <actions for Q>
            state := P;
        when R =>
            <actions for R>
            state := Q;
    end case;
end loop;
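The same loop-and-case pattern can be tried outside VHDL. This is a minimal Python sketch, assuming one value of u is sampled per visit to a state; the names State and run are illustrative and the per-state actions are reduced to recording the state name.

```python
from enum import Enum

class State(Enum):
    P = "P"
    Q = "Q"
    R = "R"

def run(inputs):
    """Step the P/Q/R machine once per input value u and return the visited states."""
    state = State.P                      # initial state, as in "state_type := P"
    trace = []
    for u in inputs:
        trace.append(state.name)         # stands in for <actions for ...>
        if state is State.P:
            state = State.Q if u else State.R
        elif state is State.Q:
            state = State.P
        else:                            # State.R
            state = State.Q
    return trace

trace = run([True, False, False, True, False])
```

Each branch of the if chain corresponds to one "when" arm of the case statement, and the for loop plays the role of the enclosing loop.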
Fork-join translation
(a) Original process:
Main : process
begin
    statement1;
    parallel { P1; P2; }
    statement2;
    ...

(b) Translation:
signal fork, P1_done, P2_done : boolean;

Main : process
begin
    statement1;
    fork <= true;
    wait until P1_done and P2_done;
    fork <= false;    -- releases the forked processes so they can rearm
    statement2;
    ...

P1_process : process
begin
    wait until fork;
    P1;
    P1_done <= true;
    wait until not fork;
    P1_done <= false;
end;
-- P2_process is analogous
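The fork/done handshake maps directly onto threads and events. The following Python sketch is an analogy rather than the translator's output: threading.Event objects stand in for the boolean signals, run_fork_join is an illustrative name, and the reset of the done flags is omitted because the sketch forks only once.

```python
import threading

def run_fork_join(statement1, p1, p2, statement2):
    """Run statement1, then p1 and p2 concurrently, then statement2."""
    fork = threading.Event()
    done = [threading.Event(), threading.Event()]

    def forked(body, done_flag):
        fork.wait()        # wait until fork
        body()             # P1 or P2
        done_flag.set()    # P1_done <= true

    workers = [threading.Thread(target=forked, args=(body, flag))
               for body, flag in zip((p1, p2), done)]
    for worker in workers:
        worker.start()

    statement1()
    fork.set()             # fork <= true
    for flag in done:      # wait until P1_done and P2_done
        flag.wait()
    statement2()
    for worker in workers:
        worker.join()

log = []
run_fork_join(lambda: log.append("statement1"),
              lambda: log.append("P1"),
              lambda: log.append("P2"),
              lambda: log.append("statement2"))
```

The order of P1 and P2 in the log is not fixed, but both always fall between statement1 and statement2.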
Exception translation
(a) Original: event e interrupts behavior T and transfers control to S
    event e : T --> S;
    T : statement1; statement2; statement3;
    S : statement4; statement5;

(b) Translation with goto:
    -- T
    statement1;
    if (e) goto S_start;
    statement2;
    if (e) goto S_start;
    statement3;
    S_start:
    -- S
    statement4;
    statement5;

(c) Translation with a loop exit:
    -- T
    T_loop : loop
        statement1;
        if (e) exit T_loop;
        statement2;
        if (e) exit T_loop;
        statement3;
        exit T_loop;
    end loop;
    -- S
    statement4;
    statement5;
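The loop-exit form polls the event between statements. The following is a minimal Python sketch of that polling scheme, assuming the event can be sampled through a callable; run_with_exception and demo are illustrative names, and the statements only log their own names.

```python
def run_with_exception(t_statements, s_statements, event_raised):
    """Run T's statements, leaving T early if the event is seen between two of them, then run S."""
    last = len(t_statements) - 1
    for index, statement in enumerate(t_statements):
        statement()
        if index < last and event_raised():   # "if (e) exit T_loop;"
            break
    for statement in s_statements:            # control always reaches S
        statement()

def demo(raise_after):
    """Hypothetical driver: the event becomes true once raise_after statements have run."""
    log = []

    def stmt(name):
        return lambda: log.append(name)

    t = [stmt("statement1"), stmt("statement2"), stmt("statement3")]
    s = [stmt("statement4"), stmt("statement5")]
    run_with_exception(t, s, lambda: len(log) >= raise_after)
    return log

interrupted_early = demo(1)
not_interrupted = demo(99)
```

As in the slide's translation, the event is only noticed at statement boundaries, so a statement that has started always completes.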
Summary
• The perfect standard language may never exist
• No standard language supports all models
• Using a front-end language solves the problem
  - Natural capture
  - Large base of tools and expertise
• Translators are simple
  - Maps characteristics to existing constructs
  - Generates well-structured and consistent output
System partitioning
• System functionality is implemented on system components
  - ASICs, processors, memories, buses
• Two design tasks:
  - Allocate system components or ASIC constraints
  - Partition functionality among components
• Constraints
  - Cost, performance, size, power
• Partitioning is a central system design task
Outline
• Structural vs. functional partitioning
• Natural vs. executable language specifications
• Basic partitioning issues and algorithms
• Functional partitioning techniques for hardware
• Hardware/software partitioning
• Functional partitioning techniques for software
• Exploring tradeoffs with functional partitioning
Structural vs. functional partitioning
• Structural: Implement structure, then partition
• Functional: Partition function, then implement
  - Enables better size/performance tradeoffs
  - Uses fewer objects, better for algorithms/humans
  - Permits hardware/software solutions
  - But, it's harder than graph partitioning
Natural vs. executable language specifications
• Alternative methods for specifying functionality
• Natural languages common in practice
• Executable languages becoming popular
  - Automated estimation/partitioning explores solutions
  - Early verification reduces costly late changes
  - Precision eases integration
Basic partitioning issues
[Figure: the basic partitioning issues: specification abstraction level, granularity, system-component allocation, metrics and estimations, objective and closeness functions, partitioning algorithms, output.]
Basic partitioning issues (cont.)
• Specification-abstraction level: input definition
  - Just indicating the language is insufficient
  - Abstraction level indicates amount of design already done
  - e.g. task DFG, tasks, CDFG, FSMD
• Granularity: specification size in each object
  - Fine granularity yields more possible designs
  - Coarse granularity better for computation, designer interaction
  - e.g. tasks, procedures, statement blocks, statements
• Component allocation: types and numbers
  - e.g. ASICs, processors, memories, buses
• Output: format and uses
  - e.g. new specification, hints to synthesis tool
Basic partitioning issues (cont.)
• Metrics and estimations: "good" partition attributes
  - e.g. cost, speed, power, size, pins, testability, reliability
  - Estimates derived from quick, rough implementation
  - Speed and accuracy are competing goals of estimation
• Objective and closeness functions
  - Combines multiple metric values
  - Closeness used for grouping before complete partition
  - Weighted sum common
  - e.g. $k_1 \cdot F(area, c) + k_2 \cdot F(delay, c) + k_3 \cdot F(power, c)$
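A weighted-sum objective function can be sketched in a few lines. The slide does not define F, so the sketch below assumes F(metric, c) is the fraction by which an estimate exceeds its constraint c (zero when the constraint is met); that choice, the function names, and all numbers are assumptions for illustration.

```python
def violation(value, constraint):
    """Assumed form of F: relative amount by which value exceeds constraint, 0 if met."""
    return max(0.0, (value - constraint) / constraint)

def objective(estimates, constraints, weights):
    """Weighted sum k1*F(area, c) + k2*F(delay, c) + k3*F(power, c)."""
    return sum(weights[m] * violation(estimates[m], constraints[m]) for m in weights)

# Hypothetical estimates, constraints, and weights k1..k3
estimates = {"area": 12000.0, "delay": 90.0, "power": 2.0}
constraints = {"area": 10000.0, "delay": 100.0, "power": 1.0}
weights = {"area": 1.0, "delay": 2.0, "power": 0.5}

cost = objective(estimates, constraints, weights)
```

Here area is 20% over its constraint, delay is within bounds, and power is 100% over, giving 1.0 * 0.2 + 2.0 * 0 + 0.5 * 1.0 = 0.7.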
Basic partitioning issues (cont.)
• Algorithms: control strategies seeking best partition
  - Constructive creates partition
  - Iterative improves partition
  - Key is to escape local minimum
[Figure: cost versus number of moves, with points A and B marked on the curve.]
Typical partitioning-system configuration
[Figure: components of a typical partitioning system: input, model, estimators, objective function, algorithms, user interface, design feedback, and output.]
Basic partitioning algorithms
• Clustering and multi-stage clustering [Joh67, LT91]
• Group migration (a.k.a. min-cut or Kernighan/Lin) [KL70, FM82]
• Ratio cut [KC91]
• Simulated annealing [KGV83]
• Genetic evolution
• Integer linear programming
Hierarchical clustering
• Constructive algorithm using closeness metrics
• Overview
  - Groups closest objects
  - Recomputes closenesses
  - Repeats until termination condition met
• Cluster tree maintains history of merges
  - Cutline across the tree defines a partition
Hierarchical clustering algorithm
/* Initialize each object as a group */
for each o_i loop
    p_i = o_i
    P = P ∪ p_i
end loop

/* Compute closenesses between objects */
for each p_i loop
    for each p_j loop
        c_{i,j} = ComputeCloseness(p_i, p_j)
    end loop
end loop

/* Merge closest objects and recompute closenesses */
while not Terminate(P) loop
    p_i, p_j = FindClosestObjects(P, C)
    P = P − p_i − p_j ∪ p_ij
    for each p_k loop
        c_{ij,k} = ComputeCloseness(p_ij, p_k)
    end loop
end loop
return P
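The clustering loop above can be sketched in runnable form. This Python version makes two assumptions that the pseudocode leaves open: Terminate(P) is taken to mean "the requested number of groups is reached", and ComputeCloseness for a merged group is the average of its two parts' closenesses. The closeness values are hypothetical.

```python
from itertools import combinations

def hierarchical_clustering(objects, closeness, num_groups):
    """Merge the closest pair of groups until only num_groups groups remain."""
    groups = [frozenset([o]) for o in objects]
    # Closeness table keyed by an unordered pair of groups
    c = {frozenset([frozenset([a]), frozenset([b])]): v
         for (a, b), v in closeness.items()}

    def close(g, h):
        return c.get(frozenset([g, h]), 0.0)

    while len(groups) > num_groups:               # assumed Terminate(P)
        gi, gj = max(combinations(groups, 2), key=lambda pair: close(*pair))
        merged = gi | gj
        rest = [g for g in groups if g != gi and g != gj]
        for gk in rest:                           # recompute closeness by averaging
            c[frozenset([merged, gk])] = (close(gi, gk) + close(gj, gk)) / 2
        groups = rest + [merged]
    return groups

# Hypothetical closeness values between four objects
closeness = {(1, 2): 30, (1, 3): 25, (2, 3): 15, (1, 4): 10, (2, 4): 10, (3, 4): 10}
partition = hierarchical_clustering([1, 2, 3, 4], closeness, num_groups=2)
```

With these values, objects 1 and 2 merge first (closeness 30), the merged group's closeness to object 3 becomes Avg(25, 15) = 20, and object 3 joins next, leaving object 4 on its own.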
Hierarchical clustering example
[Figure, panels (a) to (d): four objects o1 to o4 are merged step by step. Closenesses to a merged group are recomputed by averaging, e.g. Avg(10,10) = 10 and Avg(15,25) = 20, and a cluster tree under each panel records the merges so far.]
Simulated annealing
• Iterative algorithm modeled after physical annealing process
• Overview
  - Starts with initial partition and temperature
  - Slowly decreases temperature
  - For each temperature, generates random moves
  - Accepts any move that improves cost
  - Accepts some bad moves, less likely at low temperatures
• Results and complexity depend on temperature decrease rate
Simulated annealing algorithm
temp = initial temperature
cost = Objfct(P)
while not Frozen loop
    while not Equilibrium loop
        P_tentative = Move(P)
        cost_tentative = Objfct(P_tentative)
        Δcost = cost_tentative − cost
        if (Accept(Δcost, temp) > Random(0, 1)) then
            P = P_tentative
            cost = cost_tentative
        end if
    end loop
    temp = DecreaseTemp(temp)
end loop

where: $Accept(\Delta cost, temp) = \min(1, e^{-\Delta cost / temp})$
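The annealing loop can be sketched as follows. The cooling schedule (geometric, factor 0.9), the freezing threshold, the fixed move count per temperature that stands in for Equilibrium, and the two-way partitioning problem are all assumptions chosen for illustration; only the acceptance rule comes from the slide.

```python
import math
import random

def accept(delta_cost, temp):
    """min(1, e^(-delta_cost / temp)); the early return avoids overflow for big improvements."""
    if delta_cost <= 0:
        return 1.0
    return math.exp(-delta_cost / temp)

def anneal(partition, objfct, move, rng, temp=10.0, cooling=0.9,
           frozen=0.01, moves_per_temp=50):
    cost = objfct(partition)
    while temp > frozen:                          # "not Frozen"
        for _ in range(moves_per_temp):           # stands in for "not Equilibrium"
            tentative = move(partition, rng)
            cost_tentative = objfct(tentative)
            if accept(cost_tentative - cost, temp) > rng.random():
                partition, cost = tentative, cost_tentative
        temp *= cooling                           # DecreaseTemp
    return partition, cost

# Hypothetical problem: split six objects into two groups,
# minimizing the weight of cut edges plus an imbalance penalty.
EDGES = {(0, 1): 5, (1, 2): 4, (0, 2): 3, (3, 4): 5, (4, 5): 4, (3, 5): 3, (2, 3): 1}

def cut_cost(partition):
    cut = sum(w for (a, b), w in EDGES.items() if partition[a] != partition[b])
    imbalance = abs(sum(partition) - len(partition) / 2)
    return cut + 2 * imbalance

def flip_one(partition, rng):
    i = rng.randrange(len(partition))
    return partition[:i] + [1 - partition[i]] + partition[i + 1:]

final_partition, final_cost = anneal([0, 1, 0, 1, 0, 1], cut_cost, flip_one,
                                     random.Random(0))
```

Because bad moves are accepted with probability e^(-Δcost/temp), the search can climb out of a local minimum early on and behaves almost greedily once the temperature is low.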
Functional partitioning for hardware: BUD
• Goal: incorporate area/time into synthesis [MK90]
• Clusters CDFG operations into datapath modules
• Closeness metrics:
  - Interconnecting wires
  - Concurrency
  - Shared hardware
• Each clustering corresponds to an allocation/scheduling
• Selects clustering with best area/time
BUD example
(bit-widths = 4)
x := a + b;
if (a = b) c := ((x − y) < z);
[Figure, panels (a) to (c): the specification above, its control/data flow graph (inputs a, b, x, y, z; output c; operations +, =, −, <; from start to finish), and the graph of pairwise closeness values between the operations: −.38, .24, .7, .2, 0, 0.]
BUD example (cont.)
[Figure, panels (a) to (c): successive clusterings of the operations. After + and − are merged, the closeness between = and < is .2, and the recomputed closenesses are Avg(−.38, 0) = −.19 and Avg(0, .24) = .12; after = and < are merged, AVG(−.19, .12) = −.035. The cluster tree yields the candidate clusterings {+−=<}; {+−, =<}; {+−, =, <}; {+, −, =, <}. A chip diagram shows the controller and the clusters for the 3-cluster case.]
Evaluation of the candidate clusterings, with Objfct = A x T:
Chip area A   Expected cycle time T   A x T
17.5          36                      630
15.8          26                      411
16.4          26                      426
13.8          26                      359 (best)
Functional partitioning for hardware: Aparty
• Extends BUD clustering to multiple stages [LT91]
  - Different closeness metrics for each stage
• Closeness metrics:
  - Control transfer reduction
  - Data transfer reduction
  - Hardware sharing
Aparty example
[Figure, panels (a) to (c): multi-stage clustering of four objects o1 to o4 with closeness-weighted edges; o1 and o2 are merged into o12, and the cluster tree is carried from one stage to the next.]
Hardware/software partitioning
• Combined hardware/software systems are common
• Software is cheap, modifiable, and quick to design
• Hardware is fast
• Special algorithms are needed to favor software
• Proposed algorithms
  - Greedy [GD92]
  - Hill climbing [EHB94]
  - Binary-constraint search with hill climbing [VGG93]
Functional partitioning for systems: Vulcan, Cosyma
• Vulcan I [GD90]
  - Partitions CDFG operations among hardware only
  - Group migration and simulated annealing algorithms
• Vulcan II [GD93]
  - Partitions operations among hardware/software
  - Architecture: processor, hardware, memory, bus
  - All communication through memory
  - Uses greedy algorithm, extracts behaviors from hardware
• Cosyma [EHB94]
  - Partitions statement blocks among hardware/software
  - Architecture: processor, hardware, memory, bus
  - Simulated annealing, extracts behaviors from software
Functional partitioning for systems: SpecSyn
• Solves three partitioning problems
  - Behaviors to processors/ASICs
  - Variables to memories
  - Communication channels to buses
• Uses fast incremental-update estimators
• Covers both hardware and hardware/software partitioning [GVN94, VG92]
Exploring tradeoffs with functional partitioning
• Each line represents a different vendor's chip set
• Each point represents an allocation and partition
• Many designs quickly examined
[Figure: performance (microseconds, 200 to 1200) versus cost (dollars, 0 to 140) for chipset1, chipset2, and chipset3, with design points labeled A through D.]
Summary
• Partitioning heavily influences design quality
• Functional partitioning is necessary
• Executable specification enables:
  - Automation
  - Exploration
  - Documentation
• Variety of algorithms exist
• Variety of techniques exist for different applications
Future directions
• Metrics from real design to guide partitioning
• Comparison of functional partitioning algorithms
• Impact of metric selections and orderings
• Impact of granularity on partition quality
• Exploitation of regularity in partitioning
Estimation
• Estimates allow
  - Evaluation of design quality
  - Design space exploration
• Design model
  - Represents degree of design detail computed
  - Simple vs. complex models
• Issues for estimation
  - Accuracy
  - Speed
  - Fidelity
Outline
• Accuracy versus speed
• Fidelity
• Quality metrics
  - Performance metrics
  - Hardware and software cost metrics
Accuracy vs. Speed
• Accuracy: difference between estimated and actual value
  $A = 1 - \frac{|E(D) - M(D)|}{M(D)}$
• Speed: computation time for obtaining estimate
[Figure: estimation error and computation time plotted against the design model, ranging from a simple model to the actual design.]
Fidelity
• Estimates must predict quality metrics for different design alternatives
• Fidelity: % of correct predictions for pairs of design implementations
• Higher fidelity ⟹ correct decisions based on estimates
[Figure: estimated and measured values of a metric for design points A, B, and C.]
(A, B): E(A) > E(B), M(A) < M(B)   incorrect prediction
(B, C): E(B) < E(C), M(B) > M(C)   incorrect prediction
(A, C): E(A) < E(C), M(A) < M(C)   correct prediction
Fidelity = 1 of 3 pairs = 33 %
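Fidelity as a percentage of correctly ordered design pairs can be computed directly. In this Python sketch, the numeric values for A, B, and C are hypothetical; they were picked only so that the three pairwise orderings match the inequalities on the slide.

```python
from itertools import combinations

def fidelity(estimated, measured):
    """Percentage of design pairs whose estimated ordering matches the measured ordering."""
    def sign(x):
        return (x > 0) - (x < 0)

    pairs = list(combinations(sorted(estimated), 2))
    correct = sum(
        1 for a, b in pairs
        if sign(estimated[a] - estimated[b]) == sign(measured[a] - measured[b])
    )
    return 100.0 * correct / len(pairs)

# Hypothetical values: E(A) > E(B), E(B) < E(C), E(A) < E(C)
estimated = {"A": 3.0, "B": 2.0, "C": 4.0}
# and M(A) < M(B), M(B) > M(C), M(A) < M(C)
measured = {"A": 1.0, "B": 3.0, "C": 2.0}

result = fidelity(estimated, measured)
```

Only the pair (A, C) is ordered the same way by estimate and measurement, so the result is one pair in three, about 33%.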
Quality metrics
• Performance metrics
  - Clock cycle, control steps, execution time, communication rates
• Cost metrics
  - Hardware: manufacturing cost (area), packaging cost (pins)
  - Software: program size, data memory size
• Other metrics
  - Power, testability, design time, time to market
Hardware design model
[Figure: hardware design model with a control unit and a datapath. Control unit: state register, next-state logic, control logic. Datapath: registers/register files (RF, R1, R2), memory with AR and DR, muxes, and functional units (FU). A control register, a status register, and status bits connect the two; labels n1 to n6 and p1 to p3 mark points in the figure.]
Clock cycle estimation
• Clock cycle determines:
  - Resources, execution time
• Determining clock cycle
  - Designer specified [PK89, MK90]
  - Maximum delay of any functional unit [PPM86, JMP88]
  - Clock utilization [NG92]
[Figure: one dataflow graph (inputs i1 to i6, outputs o1 and o2, x and + operations with delays of 150 ns and 80 ns) scheduled with three different clock cycles.]
Clock cycle   Execution time   Resources
380 ns        380 ns           2 x, 4 +
150 ns        600 ns           1 x, 1 +
80 ns         400 ns           1 x, 1 +
Clock slack and utilization
• Slack: portion of clock cycle for which FU is idle
  $slack(clk, t_i) = (\lceil delay(t_i) / clk \rceil \times clk) - delay(t_i)$
• Average slack: FU slack averaged over all operations
  $ave\_slack(clk) = \frac{\sum_{i}^{T} [occur(t_i) \times slack(clk, t_i)]}{\sum_{i}^{T} occur(t_i)}$
• Clock utilization: % of clock cycle utilized for computations
  $utilization(clk) = 1 - \frac{ave\_slack(clk)}{clk}$
Clock utilization
[Figure: functional-unit delays of the x, −, and + operations on a time axis (ns, marked at 50, 100, 150) against the boundaries 1 x CLK, 2 x CLK, and 3 x CLK for Clock = 65 ns; each unit's slack is the gap to the next clock boundary.]
Number of operations: occur(x) = 6, occur(−) = 2, occur(+) = 2
ave_slack(65 ns) = (6 x 32 + 2 x 9 + 2 x 17) / (6 + 2 + 2) = 24.4 ns
utilization(65 ns) = 1 − (24.4 / 65.0) = 62 %
Slack minimization algorithm
Clock Slack Minimization [NG92]
Compute range: clk_max, clk_min
Compute occurrences: occur(t_i)
max_utilization = 0
/* Examine each clock cycle in range */
for clk_min ≤ clk ≤ clk_max loop
    for all operation types t_i ∈ T loop
        Compute slack: slack(clk, t_i)
    end loop
    Compute average slack: ave_slack(clk)
    Compute utilization: utilization(clk)
    /* If highest utilization */
    if utilization(clk) > max_utilization then
        max_utilization = utilization(clk)
        max_utilization_clk = clk
    end if
end loop
clk(SM) = max_utilization_clk
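The slack and utilization formulas and the search loop can be sketched together. The operation counts (6, 2, 2) and the 65 ns slacks (32, 9, 17 ns) come from the slides; the delays of 163, 56, and 48 ns are inferred from those slacks under the assumption that the multiplier needs three 65 ns cycles, and the 50 to 70 ns search range and the operation names are arbitrary illustrative choices.

```python
def slack(clk, delay):
    """Idle time of a unit: (ceil(delay / clk) * clk) - delay, in integer ns."""
    cycles = -(-delay // clk)          # integer ceiling of delay / clk
    return cycles * clk - delay

def ave_slack(clk, delays, occur):
    total = sum(occur[t] * slack(clk, delays[t]) for t in delays)
    return total / sum(occur[t] for t in delays)

def utilization(clk, delays, occur):
    return 1 - ave_slack(clk, delays, occur) / clk

def min_slack_clock(clk_min, clk_max, delays, occur):
    """Examine every clock in the range and keep the one with the highest utilization."""
    best_clk, best_util = None, 0.0
    for clk in range(clk_min, clk_max + 1):
        u = utilization(clk, delays, occur)
        if u > best_util:
            best_clk, best_util = clk, u
    return best_clk, best_util

# Delays inferred from the 65 ns slacks (assumption), occurrences from the slide
delays = {"mul": 163, "sub": 56, "add": 48}
occur = {"mul": 6, "sub": 2, "add": 2}

slack_at_65 = ave_slack(65, delays, occur)
best_clk, best_util = min_slack_clock(50, 70, delays, occur)
```

Under these assumptions the sketch reproduces the 24.4 ns average slack at 65 ns and selects a 56 ns clock with about 92% utilization within the chosen range.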
Execution time vs. clock utilization
Second-order differential equation example
• Clock with highest utilization results in better execution times
[Figure: two plots against utilization (%, 0 to 100). Clock cycle vs. utilization: clock cycle (ns, 0 to 160), with the point 56 ns at 92% marked. Execution time vs. utilization: execution time (ns, 400 to 1200), with the point 560 ns at 92% marked.]
Control steps estimation
• Operations in the specification are assigned to control steps
• Number of control steps determines:
  - Execution time of design
  - Complexity of control unit
• Scheduling
  - Granularity is operations in a dataflow graph
  - Computationally expensive
Operator-use method
• Granularity is statements in specification
• Faster than scheduling, average error 13%
Specification:
    u1 := u x dx;
    u2 := 5 x w;
    u3 := 3 x y;
    y1 := i x dx;
    w := w + dx;
    u4 := u1 x u2;
    u5 := dx x u3;
    y := y + y1;
    u6 := u − u4;
    u := u6 − u5;
Functional units:
    t_i     num(t_i)   clocks(t_i)
    add     1          1
    mult    2          4
    sub     1          1
Macro-nodes and their maximum control steps:
    n1: u1 := u x dx; u2 := 5 x w; u3 := 3 x y; y1 := i x dx; w := w + dx
        add: (1/1)*1 = 1, mult: (4/2)*4 = 8, max(1, 8) = 8
    n2: u4 := u1 x u2; u5 := dx x u3; y := y + y1
        add: (1/1)*1 = 1, mult: (2/2)*4 = 4, max(1, 4) = 4
    n3: u6 := u − u4
        sub: (1/1)*1 = 1, max(1) = 1
    n4: u := u6 − u5
        sub: (1/1)*1 = 1, max(1) = 1
Estimated total control steps = 8 + 4 + 1 + 1 = 14
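The operator-use estimate for this example can be reproduced with a short calculation. The sketch below assumes that the per-type term is ceil(occurrences / num) * clocks, which agrees with the slide's values where the division is exact; the dictionary layout and names are illustrative.

```python
import math

def macro_node_steps(op_counts, num, clocks):
    """Control steps of one macro-node: the slowest operation type dominates."""
    return max(math.ceil(count / num[t]) * clocks[t] for t, count in op_counts.items())

# Available units and their delays in clocks, from the slide's table
num = {"add": 1, "mult": 2, "sub": 1}
clocks = {"add": 1, "mult": 4, "sub": 1}

# Operation counts per macro-node n1..n4, read off the specification
macro_nodes = [
    {"add": 1, "mult": 4},   # n1
    {"add": 1, "mult": 2},   # n2
    {"sub": 1},              # n3
    {"sub": 1},              # n4
]

steps = [macro_node_steps(node, num, clocks) for node in macro_nodes]
total_steps = sum(steps)
```

The per-node results are 8, 4, 1, and 1, matching the estimated total of 14 control steps.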
Branching in behaviors
• Control steps may be shared across exclusive branches
  - Sharing schedule: fewer states, status register
  - Non-sharing schedule: more states, no status registers
[Figure, panels (a) to (c): operations o1 to o8 grouped into basic blocks B1 to B4, with B2 and B3 on exclusive branches; one schedule of the operations into six states s1 to s6, and one into eight states s1 to s8.]
Execution time estimation
• Average start-to-finish time of behavior
• Straight-line code behaviors
  $exectime(B) = csteps(B) \times clk$
• Behavior with branching
  - Estimate execution time for each basic block
  - Create control flow graph from basic blocks
  - Determine branching probabilities
  - Formulate equations for node frequencies
  - Solve set of equations
  $exectime(B) = \sum_{b_i \in B} exectime(b_i) \times freq(b_i)$
Probability-based flow analysis
A := A + 1;
for I in 1 to 10 loop
    B := B + 1;
    C := C − A;
    if (D > A) then
        D := D + 2;
    else
        D := D + 3;
    end if;
    E := D * 2;
end loop;
B := B * A;
C := 3;
[Figure: control flow graph of the code above, with start node S and nodes v1 to v6 for the basic blocks B1 to B6: v1 (A := A + 1), v2 (B := B + 1; C := C − A), v3 (D := D + 2), v4 (D := D + 3), v5 (E := D * 2), v6 (B := B * A; C := 3). Edges: e12; e23 (D > A, probability 0.5); e24 (D <= A, probability 0.5); e35; e45; e52 (I <= 10, probability 0.9); e56 (I > 10, probability 0.1).]
Probability-based flow analysis
• Flow equations:
  freq(S) = 1.0
  freq(v1) = 1.0 × freq(S)
  freq(v2) = 1.0 × freq(v1) + 0.9 × freq(v5)
  freq(v3) = 0.5 × freq(v2)
  freq(v4) = 0.5 × freq(v2)
  freq(v5) = 1.0 × freq(v3) + 1.0 × freq(v4)
  freq(v6) = 0.1 × freq(v5)
• Node execution frequencies:
  freq(v1) = 1.0    freq(v2) = 10.0
  freq(v3) = 5.0    freq(v4) = 5.0
  freq(v5) = 10.0   freq(v6) = 1.0
• Can be used to estimate number of accesses to variables, channels or procedures
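The flow equations form a small linear system. The Python sketch below solves them by repeated substitution rather than by a linear solver, which is a choice made here for brevity, not the method named on the slides; it converges because the loop is re-entered with probability 0.9, which is less than 1. The edge list is taken from the flow equations.

```python
def node_frequencies(edges, source, iterations=2000):
    """Iterate freq(v) = sum over incoming edges of prob * freq(u) until it settles."""
    nodes = {source} | {u for u, _, _ in edges} | {v for _, v, _ in edges}
    freq = {n: 0.0 for n in nodes}
    freq[source] = 1.0
    for _ in range(iterations):
        updated = {n: 0.0 for n in nodes}
        updated[source] = 1.0                 # freq(S) = 1.0
        for u, v, prob in edges:
            updated[v] += prob * freq[u]
        freq = updated
    return freq

# (from, to, branch probability), one entry per term of the flow equations
edges = [
    ("S", "v1", 1.0),
    ("v1", "v2", 1.0), ("v5", "v2", 0.9),
    ("v2", "v3", 0.5), ("v2", "v4", 0.5),
    ("v3", "v5", 1.0), ("v4", "v5", 1.0),
    ("v5", "v6", 0.1),
]

freq = node_frequencies(edges, "S")
```

The loop body nodes settle at frequency 10 (5 for each branch of the if), and the entry and exit nodes at 1, as listed on the slide.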
Communication rates
[Figure: bits sent over channel C versus time (ns, 200 to 1000): seven transfers of 8 bits each.]
• Average channel rate
  - rate of data transfer over lifetime of behavior
  $averate(C) = \frac{56\ bits}{1000\ ns} = 56\ Mb/s$
• Peak channel rate
  - rate of data transfer of single message
  $peakrate(C) = \frac{8\ bits}{100\ ns} = 80\ Mb/s$
Communication rate estimation
• Total behavior execution time consists of
  - Computation time, $comptime(B)$, obtained from flow-analysis
  - Communication time, $commtime(B, C) = access(B, C) \times delay(C)$
• Total bits transferred by the channel
  $total\_bits(B, C) = access(B, C) \times bits(C)$
• Channel average rate
  $averate(C) = \frac{total\_bits(B, C)}{comptime(B) + commtime(B, C)}$
• Channel peak rate
  $peakrate(C) = \frac{bits(C)}{protocol\_delay(C)}$
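The rate formulas can be checked against the 56 Mb/s and 80 Mb/s example. In this Python sketch, the 300 ns computation time is a hypothetical value chosen so that the behavior's lifetime is 1000 ns, and the same 100 ns figure is assumed for both delay(C) and protocol_delay(C).

```python
def channel_rates(accesses, bits, delay_ns, comptime_ns):
    """Return (average rate, peak rate) of a channel in Mb/s."""
    commtime_ns = accesses * delay_ns            # commtime = access * delay
    total_bits = accesses * bits                 # total_bits = access * bits
    averate = total_bits / (comptime_ns + commtime_ns)
    peakrate = bits / delay_ns                   # one message over its transfer delay
    return averate * 1000, peakrate * 1000       # 1 bit/ns = 1000 Mb/s

# Seven 8-bit transfers of 100 ns each; 300 ns of computation is assumed
averate_mbps, peakrate_mbps = channel_rates(accesses=7, bits=8,
                                            delay_ns=100, comptime_ns=300)
```

The peak rate depends only on a single transfer, while the average rate drops as computation time between transfers grows.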
Area estimation
• Two tasks:
  - Determining number and type of components required
  - Estimating component size for a specific technology (FSMD, gate arrays etc.)
• Behavior implemented as an FSMD (finite state machine with datapath)
  - Datapath components: registers, functional units, multiplexers/buses
  - Control unit: state register, control logic, next-state logic
• We will discuss
  - Datapath component estimation
  - Control unit estimation
  - Layout area for a custom implementation
Clique-partitioning
• Commonly used for determining datapath components
• Let G = (V, E) be a graph, where V and E are the sets of vertices and edges
• A clique is a complete subgraph of G
• Clique-partitioning
  - Divides the vertices into a minimal number of cliques
  - Each vertex is in exactly one clique
• One heuristic: maximum number of common neighbors [CS86]
  - Two nodes with maximum number of common neighbors are merged
  - Edges to two nodes replaced by edges to merged node
  - Process repeated till no more nodes can be merged
Clique-partitioning
[Figure: step-by-step clique partitioning of a graph with vertices v1 to v5 and edges e'1,3, e'1,4, e'3,4, e'2,3, e'2,5, e'4,5.]
Step 1:
    Edge      Common neighbors
    e'1,3     1
    e'1,4     1
    e'3,4     1
    e'2,3     0
    e'2,5     0
    e'4,5     0
  v1 and v3 are merged into s13.
Step 2:
    Edge      Common neighbors
    e'13,4    0
    e'2,5     0
    e'4,5     0
  s13 and v4 are merged into s134.
Step 3:
    Edge      Common neighbors
    e'2,5     0
  v2 and v5 are merged into s25.
Cliques:
    s134 = {v1, v3, v4}
    s25 = {v2, v5}
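The common-neighbor heuristic can be run on this five-vertex graph. The Python sketch below is a simplified illustration, not the exact procedure of [CS86]: it merges the edge with the most common neighbors and, as an assumed tie-break, prefers the merge that removes the fewest other edges.

```python
def clique_partition(vertices, edges):
    """Greedy clique partitioning; each node is a frozenset of original vertices."""
    adj = {frozenset([v]): set() for v in vertices}
    for a, b in edges:
        adj[frozenset([a])].add(frozenset([b]))
        adj[frozenset([b])].add(frozenset([a]))

    def order(node):
        return sorted(node)                       # deterministic visiting order

    while True:
        best, best_key = None, None
        for a in sorted(adj, key=order):
            for b in sorted(adj[a], key=order):
                if order(a) >= order(b):
                    continue                      # visit each edge once
                common = adj[a] & adj[b]
                removed = (len(adj[a]) - 1 - len(common)) + (len(adj[b]) - 1 - len(common))
                key = (len(common), -removed)     # assumed tie-break on removed edges
                if best_key is None or key > best_key:
                    best, best_key = (a, b, common), key
        if best is None:
            break                                 # no edges left: nothing can be merged
        a, b, common = best
        for node in (a, b):                       # detach both nodes from the graph
            for neighbor in adj.pop(node):
                if neighbor in adj:
                    adj[neighbor].discard(node)
        merged = a | b
        adj[merged] = set(common)                 # keep only edges to common neighbors
        for neighbor in common:
            adj[neighbor].add(merged)
    return sorted(sorted(node) for node in adj)

vertices = ["v1", "v2", "v3", "v4", "v5"]
edges = [("v1", "v3"), ("v1", "v4"), ("v3", "v4"),
         ("v2", "v3"), ("v2", "v5"), ("v4", "v5")]
cliques = clique_partition(vertices, edges)
```

On this graph the sketch merges v1 with v3, then adds v4, then merges v2 with v5, giving the two cliques shown above. Without the tie-break, a zero-common-neighbor merge such as v4 with v5 could be picked in the second step and leave three cliques.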
Storage-unit estimation
• Variables not used concurrently may be mapped to the same storage unit
• To use clique-partitioning, construct a graph where
  - Each variable is represented by a vertex
  - Variables with non-overlapping lifetimes have an edge between their vertices
[Figure: lifetimes of variables v1 to v11 over states s0 to s5, the resulting graph, and its partition into five cliques, each mapped to one of the storage units R1 to R5.]
Functional-unit and interconnect-unit estimation
• Clique-partitioning can be applied
• For determining the number of FUs required, construct a graph where
  - Each operation in the behavior is represented by a vertex
  - An edge connects two vertices if
      the corresponding operations are assigned to different control steps, and
      there exists an FU that can implement both operations
• For determining the number of interconnect units, construct a graph where
  - Each connection between two units is represented by a vertex
  - An edge connects two vertices if the corresponding connections are not used in the same control step
UC IrvineCopyright (c) 1994 Daniel D. Gajski, Frank Vahid, Sanjiv Narayan, and Jie GongEstimation 153 of 214
Computing datapath area
• Bit-sliced datapath
    $L_{bit} = \alpha \times tr(DP)$
    $H_{rt} = \frac{nets}{nets\_per\_track} \times \beta$
    $area(bit) = L_{bit} \times (H_{cell} + H_{rt})$
    $area(DP) = bitwidth(DP) \times area(bit)$
[Figure: bit-sliced datapath layout. Bit slices run from LSB to MSB; the labels L_bit, H_cell, H_rt and H_bit mark the slice length, the datapath components, the routing channel and the slice height; control lines cross the slices.]
Pin estimation
• Number of wires at a behavior's boundary depends on
    Global data
    Ports accessed
    Communication channels used
    Procedure calls
[Figure: processes Main and Factorial connected by channels ch1 and ch2, with ports portF and portG, global variables N and X, and procedure SUM.]

variable N : integer;
variable X : bit_vector(15 downto 0);

procedure SUM(A, B, OUT) is
begin
    ....
end SUM;

process Main (ch1, ch2)
    out channel ch1;
    in channel ch2;
{
    send (ch1, N);
    portF <= portG + 4;
    ............
    receive (ch2, Result);
}

process Factorial (ch1, ch2)
    in channel ch1;
    out channel ch2;
{
    receive (ch1, M);
    /* compute factorial */
    ................
    send (ch2, result);
}
Software estimation models
[Figure: two software estimation models.
  Processor-specific model: the specification is compiled separately to 8086, 68000 and MIPS instructions; a dedicated estimator for each processor uses that processor's instruction timing and size information to produce software metrics.
  Generic model: the specification is compiled once to generic instructions; a single estimator uses technology files for the target processors (8086, 68000 and MIPS instruction timing and size information) to produce software metrics.]
Deriving processor technology files
Generic instruction: dmem3 = dmem1 + dmem2

8086 instructions                        clocks      bytes
    mov ax, word ptr[bp+offset1]         (10)        3
    add ax, word ptr[bp+offset2]         (9 + EA1)   4
    mov word ptr[bp+offset3], ax         (10)        3

68020 instructions                       clocks      bytes
    mov a6@(offset1), d0                 (7)         2
    add a6@(offset2), d0                 (2 + EA2)   2
    mov d0, a6@(offset3)                 (5)         2

Technology file for 8086
    generic instruction        execution time   size
    ...
    dmem3 = dmem1 + dmem2      35 clocks        10 bytes
    ...

Technology file for 68020
    generic instruction        execution time   size
    ...
    dmem3 = dmem1 + dmem2      22 clocks        6 bytes
    ...
Software estimation
• Program execution time
    Create basic blocks and compile them into generic instructions
    Estimate the execution time of each basic block
    Perform probability-based flow analysis
    Compute the execution time of the entire behavior:
        $exectime(B) = \delta \times \sum_{b_i \in B} \left( exectime(b_i) \times freq(b_i) \right)$
    where $\delta$ accounts for compiler optimizations
• Program memory size
        $progsize(B) = \sum_{g \in G} instr\_size(g)$
• Data memory size
        $datasize(B) = \sum_{d \in D} datasize(d)$
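The three estimates can be sketched in Python as below. This is only an illustration under assumptions: the technology file is modeled as a dictionary, the `branch` entry and the block frequencies are made-up values, and only the `dmem3 = dmem1 + dmem2` entry (35 clocks, 10 bytes) is taken from the 8086 example.

```python
# Hypothetical technology file: generic instruction -> (clocks, bytes).
TECH_8086 = {
    "dmem3 = dmem1 + dmem2": (35, 10),  # from the 8086 example
    "branch": (16, 2),                  # assumed value, for illustration only
}


def block_time(instrs, tech):
    """Execution time of one basic block, in clocks."""
    return sum(tech[g][0] for g in instrs)


def exectime(blocks, tech, delta=1.0):
    """delta * sum over blocks of (block time * execution frequency)."""
    return delta * sum(block_time(instrs, tech) * freq for instrs, freq in blocks)


def progsize(blocks, tech):
    """Sum of the sizes of all generic instructions, in bytes."""
    return sum(tech[g][1] for instrs, _ in blocks for g in instrs)


def datasize(declaration_sizes):
    """Sum of the sizes of all data declarations."""
    return sum(declaration_sizes)


# Two basic blocks: (generic instructions, execution frequency).
blocks = [
    (["dmem3 = dmem1 + dmem2"], 1),
    (["dmem3 = dmem1 + dmem2", "branch"], 10),
]
print(exectime(blocks, TECH_8086))             # 545.0 clocks
print(exectime(blocks, TECH_8086, delta=0.5))  # 272.5 clocks
print(progsize(blocks, TECH_8086))             # 22 bytes
```

Retargeting the estimate to another processor only requires swapping the technology file, which is the point of the generic model.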
Summary and future directions
• We described methods for estimating:
    Performance metrics: clock, control steps, execution time, communication rates
    Cost metrics: design area, pins, program and data memory size
• Future directions:
    Incorporating synthesis/compilation optimizations
    New metrics for testability, power, integration cost, etc.
    New architectural features for the estimation model
Refinement
• Functional objects are grouped and mapped to system components
    Functional objects: variables, behaviors, and channels
    System components: memories, chips or processors, and buses
• Refinement is the update of the specification to reflect the mapping
• Need for refinement
    Makes the specification consistent
    Enables simulation of the specification
    Generates input for synthesis, compilation and verification tools
Outline
• Refining variable groups
• Channel refinement
• Resolving access conflicts
• Refining incompatible interfaces
Refining variable groups
• Group of variables mapped to a memory
• Variable folding:
    Implementing each variable in a memory with a fixed word size
• Memory address translation
    Assignment of addresses to each variable in the group
    Replacement of references to a variable by accesses to memory
Variable folding
variable A : bit_vector( 3 downto 0);
variable B : bit_vector(15 downto 0);
variable C : bit_vector(11 downto 0);
variable D : bit_vector(11 downto 0);

[Figure: folding of the variables into an 8-bit memory (bit positions 7 to 0). Memory words shown: A(3 downto 0); B(15 downto 8) and B(7 downto 0); C(11 downto 8) and C(7 downto 0); D(11 downto 6) and D(5 downto 0). The figure also shows how the 12-bit values of C (split 11..8 / 7..0) and D (split 11..6 / 5..0) are routed to their words in memory.]
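One simple folding scheme can be sketched as follows. The helper names `fold` and `layout` are hypothetical, and the scheme (fill the low-order word completely, put the remainder in the next word) is an assumption: it reproduces the split shown for A, B and C, but not the 6/6 split used for D in the figure.

```python
def fold(name, width, word_size=8):
    """Split a variable of `width` bits into word-sized slices, low-order bits first."""
    slices = []
    low = 0
    while low < width:
        high = min(low + word_size, width) - 1
        slices.append(f"{name}({high} downto {low})")
        low = high + 1
    return slices


def layout(variables, word_size=8):
    """Concatenate the slices of all variables; list index = memory word address."""
    memory = []
    for name, width in variables:
        memory.extend(fold(name, width, word_size))
    return memory


words = layout([("A", 4), ("B", 16), ("C", 12), ("D", 12)])
for address, contents in enumerate(words):
    print(address, contents)
```

With an 8-bit word the four variables occupy seven memory words in total.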
Memory address translation
Original specification:
    variable J, K : integer := 0;
    variable V : IntArray (63 downto 0);
    ....
    V(K) := 3;
    X := V(36);
    V(J) := X;
    ....
    for J in 0 to 63 loop
        SUM := SUM + V(J);
    end loop;
    ....

Assigning addresses to V:
    V(63 downto 0) is mapped to MEM(163 downto 100)

Refined specification:
    variable J, K : integer := 0;
    variable MEM : IntArray (255 downto 0);
    ....
    MEM(K + 100) := 3;
    X := MEM(136);
    MEM(J + 100) := X;
    ....
    for J in 0 to 63 loop
        SUM := SUM + MEM(J + 100);
    end loop;
    ....

Refined specification without offsets for index J:
    variable J : integer := 100;
    variable K : integer := 0;
    variable MEM : IntArray (255 downto 0);
    ....
    MEM(K + 100) := 3;
    X := MEM(136);
    MEM(J) := X;
    ....
    for J in 100 to 163 loop
        SUM := SUM + MEM(J);
    end loop;
    ....
Refining channel groups
• Channels are virtual entities over which messages are transferred
• A bus is a physical medium that implements groups of channels
• A bus consists of:
    Wires representing data and control lines
    A protocol defining the sequence of assignments to data and control lines
• Two refinement tasks
    Bus generation: determining the buswidth, i.e. the number of data lines
    Protocol generation: specifying the mechanism of transfer over the bus
Characterizing communication channels
• For a given behavior P that sends data over channel C,
    Message size, bits(C): number of bits in each message
    Accesses, accesses(P, C): number of times P transfers data over C
    Average rate, averate(C): rate of data transfer of C over the lifetime of the behavior
    Peak rate, peakrate(C): rate of transfer of a single message
[Figure: channel X carries three 8-bit messages X1, X2 and X3 on a time axis from t = 0 to 400 ns.]
    bits(C) = 8 bits
    averate(C) = 24 bits / 400 ns = 60 Mbits/s
    peakrate(C) = 8 bits / 100 ns = 80 Mbits/s
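The two rates in this example can be checked with a few lines of Python. The function names and the choice of units (times in ns, results in Mbits/s) are assumptions made for this sketch.

```python
def averate_mbps(bits_per_message, accesses, lifetime_ns):
    """Average rate: all bits transferred, divided by the behavior's lifetime."""
    return bits_per_message * accesses * 1000 / lifetime_ns


def peakrate_mbps(bits_per_message, transfer_time_ns):
    """Peak rate: bits of one message, divided by the time of a single transfer."""
    return bits_per_message * 1000 / transfer_time_ns


print(averate_mbps(8, 3, 400))  # 60.0
print(peakrate_mbps(8, 100))    # 80.0
```

The average rate is lower than the peak rate because the channel is idle for part of the behavior's lifetime.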
Characterizing buses
• For a given bus B,
    Buswidth, buswidth(B): number of data lines in B
    Protocol delay, protdelay(B): delay for a single message transfer over the bus
    Average rate, averate(B): rate of data transfer over B over the lifetime of the system
    Peak rate, peakrate(B): maximum rate of transfer of data on the bus
        $peakrate(B) = \frac{buswidth(B)}{protdelay(B)}$
Determining bus rates
• Idle slots of a channel are used for messages of other channels
• To ensure that channel average rates are unaffected by the bus:
        $averate(B) \geq \sum_{C \in B} averate(C)$
• Goal: to synthesize a bus that constantly transfers data, i.e.
        $peakrate(B) = \sum_{C \in B} averate(C)$
[Figure: timing of channels X and Y and of bus B from t = 0 to 4 s. Channel X sends two 8-bit messages (X1, X2); channel Y sends three 16-bit messages (Y1, Y2, Y3); bus B carries all five messages.]
    Average rate of channel X: (2 x 8 bits) / 4 s = 4 bits/s
    Average rate of channel Y: (3 x 16 bits) / 4 s = 12 bits/s
    Average rate of bus B: (4 + 12) bits/s = 16 bits/s
Constraints for bus generation
• Buswidth: affects the number of pins on chip boundaries
• Channel average rates: affect the execution time of behaviors
• Channel peak rates: affect the time required for a single message transfer
[Figure: channel X (messages X1, X2) from t = 0 to 4 s, with averate(X) = 8 bits/s, and two implementations of bus B carrying it: one with averate(B) = 8 bits/s and peakrate(B) = 8 bits/s, the other with averate(B) = 8 bits/s and peakrate(B) = 16 bits/s.]
Bus generation algorithm [NG94]
/* Determine range of buswidths */
minwidth = 1, maxwidth = Max(bits(C))
mincost = infinity, mincostwidth = infinity
for currwidth in minwidth to maxwidth loop
    /* compute bus peak rate */
    peakrate(B) = currwidth / protdelay(B)
    /* compute sum of channel average rates */
    averatesum = 0;
    for all channels C in B loop
        averate(C) = (access(P, C) * bits(C)) / (comptime(P) + commtime(P))
        averatesum = averatesum + averate(C);
    end loop
    if (peakrate(B) > averatesum) then
        /* feasible solution, determine minimal cost */
        currcost = ComputeCost(currwidth)
        if (currcost < mincost) then
            mincost = currcost, mincostwidth = currwidth
        end if
    end if
end loop
return(mincostwidth)
Bus generation algorithm
• Compute buswidth range: minwidth = 1, maxwidth = Max(bits(C))
• For $minwidth \leq currwidth \leq maxwidth$ loop
    Compute bus peak rate:
        $peakrate(B) = currwidth \div protdelay(B)$
    Compute channel average rates:
        $commtime(P) = access(P, C) \times \left[ \left\lceil \frac{bits(C)}{currwidth} \right\rceil \times protdelay(B) \right]$
        $averate(C) = \frac{access(P, C) \times bits(C)}{comptime(P) + commtime(P)}$
    if $peakrate(B) \geq \sum_{C \in B} averate(C)$ then
        if bestcost > ComputeCost(currwidth) then
            bestcost = ComputeCost(currwidth)
            bestwidth = currwidth
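The loop above can be sketched in Python as follows. This is an illustration, not the tool from [NG94]: the channel record layout, the example numbers and the two cost functions are all assumptions, and the feasibility test uses the "greater than or equal" form.

```python
import math


def generate_bus(channels, protdelay, cost):
    """Return the feasible buswidth with the lowest cost, or None if none is feasible.

    channels: list of dicts with keys 'bits', 'accesses', 'comptime' (assumed layout)
    protdelay: delay of one bus transfer, in the same time unit as 'comptime'
    cost: function mapping a buswidth to a cost value
    """
    best_width, best_cost = None, math.inf
    maxwidth = max(ch["bits"] for ch in channels)
    for width in range(1, maxwidth + 1):
        peakrate = width / protdelay
        averatesum = 0.0
        for ch in channels:
            transfers = -(-ch["bits"] // width)  # ceil(bits / width)
            commtime = ch["accesses"] * transfers * protdelay
            averatesum += ch["accesses"] * ch["bits"] / (ch["comptime"] + commtime)
        if peakrate >= averatesum:  # feasible implementation
            current = cost(width)
            if current < best_cost:
                best_width, best_cost = width, current
    return best_width


# Two behaviors, each sending 16-bit data 10 times (made-up numbers).
channels = [
    {"bits": 16, "accesses": 10, "comptime": 100},
    {"bits": 16, "accesses": 10, "comptime": 100},
]
print(generate_bus(channels, protdelay=1, cost=lambda w: w))           # 2
print(generate_bus(channels, protdelay=1, cost=lambda w: abs(w - 8)))  # 8
```

With a cost that grows with the number of pins, the narrowest feasible bus wins; a different cost function moves the selection to a wider bus.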
Bus generation example
• Two behaviors accessing 16-bit data over two channels
• Constraints specified for channel peak rates
[Figure: plot of cost function value (y-axis, -1000 to 9000) against buswidth (x-axis, 0 to 24), marking the infeasible implementations, the feasible implementations and the selected buswidth.]
Performance vs. buswidth tradeoffs
• Allows a buswidth to be selected, given performance constraints
    e.g. if behavior P1 has a performance constraint of 2500 clocks, a buswidth of 4 or greater must be selected
[Figure: plot of behavior execution time in clocks (y-axis, 0 to 7000) against buswidth in pins (x-axis, 0 to 24).]
Protocol generation
• A bus consists of several sets of wires:
    Data lines, used for transferring message bits
    Control lines, used for synchronization between behaviors
    ID lines, used for identifying the channel active on the bus
• All channels mapped to the bus share these lines
• The number of data lines is determined by the bus generation algorithm
• Protocol generation consists of six steps
Protocol generation
1. Protocol selection: full handshake, half-handshake, etc.
2. ID assignment: N channels require $\lceil \log_2(N) \rceil$ ID lines
[Figure: behaviors P and Q access variables X and MEM over bus B through four channels, CH0 to CH3, which are assigned the IDs "00", "01", "10" and "11".]

variable X : bit_vector(15 downto 0);
variable MEM : bit_vector(63 downto 0, 15 downto 0);

behavior P
    variable AD;
begin
    .....
    X <= 32;
    .....
    MEM(AD) := X + 7;
    .....
end;

behavior Q
    variable COUNT;
begin
    .....
    MEM(60) := COUNT;
    .....
end;
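ID assignment for the channels of a bus can be sketched in Python. The helper names are hypothetical, and assigning codes in channel order is an assumption; any one-to-one assignment would do.

```python
def id_lines(n_channels):
    """Number of ID lines needed for n channels: ceil(log2(n))."""
    return (n_channels - 1).bit_length()


def assign_ids(channels):
    """Give each channel a distinct binary code of id_lines(N) bits."""
    width = id_lines(len(channels))
    return {
        ch: format(i, "b").zfill(width) if width else ""
        for i, ch in enumerate(channels)
    }


print(id_lines(4))                               # 2
print(assign_ids(["CH0", "CH1", "CH2", "CH3"]))  # CH0 -> "00", ..., CH3 -> "11"
```

A bus that carries a single channel needs no ID lines at all, which is why `id_lines(1)` is 0.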
Protocol generation
3. Bus structure definition
4. Bus protocol definition

type HandShakeBus is record
    START, DONE : bit;
    ID : bit_vector(1 downto 0);
    DATA : bit_vector(7 downto 0);
end record;

signal B : HandShakeBus;

procedure ReceiveCH0(rxdata : out bit_vector) is
begin
    for J in 1 to 2 loop
        wait until (B.START = '1') and (B.ID = "00");
        rxdata(8*J-1 downto 8*(J-1)) <= B.DATA;
        B.DONE <= '1';
        wait until (B.START = '0');
        B.DONE <= '0';
    end loop;
end ReceiveCH0;

procedure SendCH0(txdata : in bit_vector) is
begin
    B.ID <= "00";
    for J in 1 to 2 loop
        B.DATA <= txdata(8*J-1 downto 8*(J-1));
        B.START <= '1';
        wait until (B.DONE = '1');
        B.START <= '0';
        wait until (B.DONE = '0');
    end loop;
end SendCH0;
Protocol generation
5. Update variable references
6. Generate behaviors for variables
[Figure: processes P, Q, Xproc and MEMproc connected to the 8-bit bus B.]

process P
    variable AD, Xtemp;
begin
    .....
    SendCH0(32);
    .....
    ReceiveCH1(Xtemp);
    SendCH2(AD, Xtemp+7);
    .....
end;

process Q
    variable COUNT;
begin
    .....
    SendCH3(60, COUNT);
    .....
end;

process Xproc
    variable X;
begin
    wait on B.ID;
    if (B.ID = "00") then
        receiveCH0(X);
    elsif (B.ID = "01") then
        sendCH1(X);
    end if;
end;

process MEMproc
    variable MEM : array(0 to 63);
begin
    wait on B.ID;
    if (B.ID = "10") then
        receiveCH2(MEM);
    elsif (B.ID = "11") then
        receiveCH3(MEM);
    end if;
end;
Resolving access conflicts
• System partitioning may result in concurrent accesses to a resource
    Channels mapped to a bus may attempt data transfers simultaneously
    Variables mapped to a memory may be accessed by behaviors simultaneously
• An arbiter needs to be generated to resolve such access conflicts
• Three tasks
    Arbitration model selection
    Arbitration scheme selection
    Arbiter generation
Arbitration models
[Figure: static and dynamic arbitration models. In each, behaviors P, Q and R access memory MEM through ports port1 and port2 over addr/data lines, and a MemArbiter exchanges req/grant signals with the behaviors.]
Arbiter generation
• Example of bus arbitration
    Two behaviors accessing a single resource, bus B
    Behavior P is assigned higher priority than Q
    Fixed priority is implemented with two handshake signals, Req and Grant
[Figure: processes P and Q connected to the arbiter by Req_P/Grant_P and Req_Q/Grant_Q, and to processes Xproc and MEMproc over the 8-bit bus B.]

process P
    variable AD, Xtemp;
begin
    .....
    Req_P <= '1';
    wait until (Grant_P = '1');
    SendCH0(32);
    Req_P <= '0';
    .....
end process;

process Q
    variable COUNT;
begin
    .....
    Req_Q <= '1';
    wait until (Grant_Q = '1');
    SendCH3(60, COUNT);
    Req_Q <= '0';
    .....
end process;

process B_arbiter
begin
    wait until (Req_P = '1') or (Req_Q = '1');
    if (Req_P = '1') then
        Grant_P <= '1';
        wait until (Req_P = '0');
        Grant_P <= '0';
    elsif (Req_Q = '1') then
        Grant_Q <= '1';
        wait until (Req_Q = '0');
        Grant_Q <= '0';
    end if;
end process;

process Xproc
    variable X;
begin
    wait on B.ID;
    if (B.ID = "00") then
        receiveCH0(X);
    elsif (B.ID = "01") then
        sendCH1(X);
    end if;
end process;

process MEMproc
    variable MEM : array(0 to 63);
begin
    wait on B.ID;
    if (B.ID = "10") then
        receiveCH2(MEM);
    elsif (B.ID = "11") then
        receiveCH3(MEM);
    end if;
end process;
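The behavior of the fixed-priority arbiter can be sketched as a small Python state machine. The class name and the step-based interface are assumptions made for this illustration; like the B_arbiter process, it holds a grant until the owner drops its request and does not preempt.

```python
class FixedPriorityArbiter:
    """Grants the bus to the highest-priority requester; no preemption."""

    def __init__(self, priority):
        self.priority = list(priority)  # highest priority first
        self.owner = None               # behavior currently holding Grant

    def step(self, requests):
        """requests: dict name -> bool (the Req signals). Returns the granted name or None."""
        # The owner lowering its Req withdraws its Grant.
        if self.owner is not None and not requests.get(self.owner, False):
            self.owner = None
        # With the bus free, grant the highest-priority pending request.
        if self.owner is None:
            for name in self.priority:
                if requests.get(name, False):
                    self.owner = name
                    break
        return self.owner


arbiter = FixedPriorityArbiter(["P", "Q"])
print(arbiter.step({"Q": True}))              # Q: only Q is requesting
print(arbiter.step({"P": True, "Q": True}))   # Q: Q keeps the bus until it releases it
print(arbiter.step({"P": True, "Q": False}))  # P: Q released, P is granted
print(arbiter.step({}))                       # None: nobody is requesting
```

When both behaviors request an idle bus in the same step, P wins because it is listed first.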
Effect of binding on interfaces
[Figure: effect of binding on interfaces. Behaviors A and B communicate over channel X; the cases shown combine standard and custom components with protocols Pa and Pb, and one case introduces an interface process between the two protocols.]
Protocol operations
• Protocols usually consist of five atomic operations
    Waiting for an event on an input control line
    Assigning a value to an output control line
    Reading a value from an input data port
    Assigning a value to an output data port
    Waiting for a fixed time interval
• Protocol operations may be specified in one of three ways
    Finite state machines (FSMs)
    Timing diagrams
    Hardware description languages (HDLs)
Protocol specification: FSMs
• Protocol operations are ordered by sequencing between states
• Constraints between events may be specified using timing arcs
• Conditional and repetitive event sequences require extra states and transitions
[Figure: FSMs for protocols Pa and Pb.]
Protocol Pa (states start, a1, a2, a3), conditions and actions in sequence:
    ADDRp <= AddrVar(7 downto 0); ARDYp <= '1';
    (ARCVp = '1')
    ADDRp <= AddrVar(15 downto 8); DREQp <= '1';
    (DRDYp = '1')
    DataVar <= DATAp
Protocol Pb (states start, b1, b2, b3), conditions and actions in sequence:
    (RDp = '1')
    MAddrVar := MADDRp
    (100 ns)
    MDATAp <= MemVar(MAddrVar)
Protocol specification: Timing diagrams
• Advantages:
    Ease of comprehension, representation of timing constraints
• Disadvantages:
    Lack of action language, not simulatable
    Difficult to specify conditional and repetitive event sequences
[Figure: timing diagrams. Protocol Pa: signals ARDYp, ADDRp (bits 7..0, then 15..8), ARCVp, DREQp, DRDYp and DATAp (15..0). Protocol Pb: signals MADDRp (15..0), RDp and MDATAp (15..0), with a 100 ns timing constraint.]
Protocol specification: HDLs
• Advantages:
    Functionality can be verified by simulation
    Easy to specify conditional and repetitive event sequences
• Disadvantages:
    Cumbersome to represent timing constraints between events
[Figure: port connections. Protocol Pa: ADDRp (8), DATAp (16), ARDYp, ARCVp, DREQp, DRDYp. Protocol Pb: MADDRp (16), MDATAp (16), RDp.]

Protocol Pa:
    port ADDRp : out bit_vector(7 downto 0);
    port DATAp : in bit_vector(15 downto 0);
    port ARDYp : out bit;
    port ARCVp : in bit;
    port DREQp : out bit;
    port DRDYp : in bit;

    ADDRp <= AddrVar(7 downto 0);
    ARDYp <= '1';
    wait until (ARCVp = '1');
    ADDRp <= AddrVar(15 downto 8);
    DREQp <= '1';
    wait until (DRDYp = '1');
    DataVar <= DATAp;

Protocol Pb:
    port MADDRp : in bit_vector(15 downto 0);
    port MDATAp : out bit_vector(15 downto 0);
    port RDp : in bit;

    wait until (RDp = '1');
    MAddrVar := MADDRp;
    wait for 100 ns;
    MDATAp <= MemVar(MAddrVar);
Interface process generation
• Input: HDL description of two fixed, but incompatible, protocols
• Output: HDL process that translates one protocol to the other, i.e. responds to their control signals and sequences their data transfers
• Four steps required for generating an interface process (IP):
    Creating relations
    Partitioning relations into groups
    Generating interface process statements
    Interconnect optimization
IP generation: creating relations
• A protocol is represented as an ordered set of relations
• Relations are sequences of events/actions

Protocol Pa:
    ADDRp <= AddrVar(7 downto 0);
    ARDYp <= '1';
    wait until (ARCVp = '1');
    ADDRp <= AddrVar(15 downto 8);
    DREQp <= '1';
    wait until (DRDYp = '1');
    DataVar <= DATAp;

Relations:
    A1 [ (true) : ADDRp <= AddrVar(7 downto 0)
                  ARDYp <= '1' ]
    A2 [ (ARCVp = '1') : ADDRp <= AddrVar(15 downto 8)
                         DREQp <= '1' ]
    A3 [ (DRDYp = '1') : DataVar <= DATAp ]
IP generation: partitioning relations
• Partition the set of relations from both protocols into groups
• A group represents a unit of data transfer

    Protocol Pa          Protocol Pb
    A1 (8 bits out)      B1 (16 bits in)
    A2 (8 bits out)
    A3 (16 bits in)      B2 (16 bits out)

    G1 = ( A1 A2 B1 )
    G2 = ( B2 A3 )
IP generation: inverting protocol operations
• For each operation in a group, add its dual to the interface process
• The dual of an operation represents the complementary operation
• A temporary variable may be required to hold data values
[Figure: the interface process sits between protocol Pa ports ADDRp (8), DATAp (16), ARDYp, ARCVp, DREQp, DRDYp and protocol Pb ports MADDRp (16), MDATAp (16), RDp.]

Interface process:
    /* (group G1)' */
    wait until (ARDYp = '1');
    TempVar1(7 downto 0) := ADDRp;
    ARCVp <= '1';
    wait until (DREQp = '1');
    TempVar1(15 downto 8) := ADDRp;
    RDp <= '1';
    MADDRp <= TempVar1;
    /* (group G2)' */
    wait for 100 ns;
    TempVar2 := MDATAp;
    DRDYp <= '1';
    DATAp <= TempVar2;

    Atomic operation             Dual operation
    wait until (Cp = '1')        Cp <= '1'
    Cp <= '1'                    wait until (Cp = '1')
    var <= Dp                    Dp <= TempVar
    Dp <= var                    TempVar := Dp
    wait for 100 ns              wait for 100 ns
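The table of dual operations can be sketched as a Python lookup. The tuple encoding of atomic operations (`wait_ctrl`, `set_ctrl`, `read_data`, `write_data`, `delay`) is an assumption made for this sketch; it only maps each operation to its dual and does not reorder statements or split temporaries into byte slices as the interface process in the slide does.

```python
def dual(op, temp="TempVar"):
    """Return the dual of one atomic protocol operation as an HDL-style statement.

    op is a (kind, argument) tuple; the kinds are a hypothetical encoding
    of the five atomic operations.
    """
    kind, arg = op
    if kind == "wait_ctrl":    # wait until (Cp = '1')  ->  Cp <= '1'
        return f"{arg} <= '1';"
    if kind == "set_ctrl":     # Cp <= '1'  ->  wait until (Cp = '1')
        return f"wait until ({arg} = '1');"
    if kind == "read_data":    # var <= Dp  ->  Dp <= TempVar
        return f"{arg} <= {temp};"
    if kind == "write_data":   # Dp <= var  ->  TempVar := Dp
        return f"{temp} := {arg};"
    if kind == "delay":        # a fixed delay is its own dual
        return f"wait for {arg};"
    raise ValueError(f"unknown operation kind: {kind}")


# Relation A3 of protocol Pa: wait for DRDYp, then read DATAp.
relation_a3 = [("wait_ctrl", "DRDYp"), ("read_data", "DATAp")]
for operation in relation_a3:
    print(dual(operation, temp="TempVar2"))
# DRDYp <= '1';
# DATAp <= TempVar2;
```

These two statements match the last two lines of the (group G2)' part of the interface process.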
IP generation: interconnect optimization
• Certain ports of both protocols may be directly connected
• Advantages:
    Bypassing the interface process reduces interconnect cost
    Operations related to these ports can be eliminated from the interface process
[Figure: interface process between A (ports ADDRp (8), DATAp (16), ARDYp, ARCVp, DREQp, DRDYp) and B (ports MADDRp (16), MDATAp, RDp) after interconnect optimization.]

Interface process:
    wait until (ARDYp = '1');
    TempVar1(7 downto 0) := ADDRp;
    ARCVp <= '1';
    wait until (DREQp = '1');
    TempVar1(15 downto 8) := ADDRp;
    RDp <= '1';
    MADDRp <= TempVar1;
    wait for 100 ns;
    DRDYp <= '1';
Transducer synthesis [BK87]
• Input: timing diagram description of two fixed protocols
• Output: logic circuit description of the transducer
• Steps for generating the logic circuit from timing diagrams:
    Create event graphs for both protocols
    Connect the graphs based on data dependencies or explicitly specified ordering
    Add templates for each output node in the combined graph
    Merge and connect templates
    Satisfy min/max timing constraints
    Optimize the skeletal circuit
Generating event graphs from timing diagrams
e.g. FIFO stack control cell
[Figure: the control cell with signals Ri, Ro, Ai, Ao and L, its timing diagram, and the event graph derived from it.]
Deriving skeletal circuit from event graph
[Figure: skeletal circuit derived from the event graph, built from set/reset elements (S, R, Q) and the signals Ri, Ro, Ai, Ao and L.]
• Advantages:
    Synthesizes logic for the transducer circuit directly
    Accounts for min/max timing constraints between events
• Disadvantages:
    Cannot interface protocols with different data port sizes
    Transducer is not simulatable with the timing diagram description of protocols
Hardware/Software interface refinement
[Figure: (a) Partitioned specification: a software partition and a hardware partition containing behaviors B1 to B4, variables v1 to v4, objects s1 and s2 and ports p1 to p3, linked by data accesses. (b) Mapping to architecture: a processor, a memory and an ASIC, with ports and a buffer.]
Tasks of hardware/software interfacing
• Data access (e.g., behavior accessing variable) refinement
• Control access (e.g., behavior starting behavior) refinement
• Select a bus to satisfy the data transfer rate and reduce interfacing cost
• Interface software/hardware components to standard buses
• Schedule software behaviors to satisfy data input/output rates
• Distribute variables to reduce ASIC cost and satisfy performance
Summary and future directions
• In this section, we described:
    Refinement of variable groups: variable folding, address translation
    Refinement of channel groups: bus and protocol generation
    Resolution of access conflicts: arbiter generation
    Refinement of incompatible interfaces: IP generation, transducer synthesis
• Future work should address the following issues:
    Effects of bus arbitration delays on the performance of a behavior
    Developing metrics to guide the selection of protocols and arbitration schemes
    Efficient synthesis of arbiter and interface processes
Methodology
• Past design effort focused on lower levels
• Higher levels lack a well-defined methodology and tools
• A paradigm shift to higher levels can increase productivity
• Need methodology and tools for the system level
Outline
• Basic concepts in design methodology
• Example
• A design methodology
• A generic synthesis system
• Conceptualization environment
Items a design methodology must specify
• Syntax and semantics of input and output
• Algorithms for transforming input to output
• Components to be used in the design implementation
• Definition and ranges of constraints
• Mechanism for selection of architectural styles
• Control strategies (scenarios or scripts)
Example: Interactive TV processor
[Figure: the interactive TV processor and its environment: a main computer, two analog subsystems and a keypad receiver IC surround the digital subsystem. Signals: audio_in, video_in, av_cmd and button as inputs; audio_out and video_out as outputs.]
Example's dataflow behavior
[Figure: dataflow of the digital subsystem.
  Behaviors: StoreAudio, GenerateAudio, StoreGenerateVideo, OverlayCharacters, StoreAVCmd, ProcessAVCmd, ProcessMainCmds, ProcessRemoteButtons.
  Variables: audio1[100k][8], audio2[100k][8], video[500k][8], av_cmd[8], fonts[128][16][16], screen_chars[30][30][8].
  External signals: audio_in, video_in, av_cmd, main_cmds, button, audio_out, video_out.]
Example’s implementation after system design
[Figure: the digital subsystem after system design, with its behaviors and variables distributed over ASIC1, ASIC2, a processor and the memories Memory1, Memory2 and Memory3.]
An example design methodology
[Figure: current practice versus the proposed methodology, each in three stages: functionality specification, system design and component implementation.
  Current practice: natural-language functional specification and manual system design.
  Proposed methodology: executable-language functional specification; system design by allocation, partitioning and refinement, producing functional specifications for the processor, the ASICs and the memory (variables) connected by a bus; component implementation producing C code, RTL structures, a detailed bus protocol and a mapped address space.]
System-design tasks
System-design tasks, by functional object:

    Functional objects   Allocation    Partitioning              Refinement
    Variables            Memories      Variables to memories     Address assignment
    Behaviors            Processors    Behaviors to processors   Interfacing
    Channels             Buses         Channels to buses         Arbitration/protocols
One possible ordering of tasks
[Figure: one possible ordering of tasks; three of the steps are numbered 1 to 3 in the figure.]
Functionality specification
    Specification
System design
    ASIC/processor allocation
    Behavior-to-ASIC/processor partitioning
    Memory allocation
    Variable-to-memory partitioning
    Bus allocation
    Channel-to-bus partitioning
    Interface synthesis
    Arbiter synthesis
Component implementation
    Implement hardware
    Implement software
Generic synthesis system requirements
• Completeness
    All levels of design, all implementation styles
• Extensibility
    Allow addition of new algorithms and tools
• Controllability
    User control of tools, design-quality feedback
• Interactivity
    Partial design, design modification
• Upgradability
    Evolve to describe-and-synthesize method
A generic synthesis system
[Figure: a generic synthesis system. The designer works through a conceptualization environment, starting from a system specification. Components: system synthesis, ASIC synthesis, logic/sequential synthesis, physical design synthesis and software synthesis (compilation to assembly code), supported by the SDB and CDB databases, intermediate forms, description generators and a verification/simulation suite. Output: ASIC description to manufacturing.]
A generic system-synthesis tool
[Figure: a generic system-synthesis tool. A compiler translates the system behavioral specification into the internal representation SR, on which the allocator, partitioner, transformer, estimators and interface & arbitration synthesis operate. The resulting system-module behavioral specifications go to chip synthesis and to software synthesis.]
A generic chip-synthesis tool
[Figure: a generic chip-synthesis tool. A compiler translates the behavioral description into a CDFG. Tools operating on it: component selector, scheduler, storage binder, functional unit binder, interconnection binder, module selector, microarchitecture optimizer and technology mapper, supported by the CDB. Results go to logic/sequential synthesis and to physical design.]
A generic logic-synthesis tool
[Figure: a generic logic-synthesis tool. Inputs: state tables, timing diagrams, memory specifications and Boolean expressions. Tasks: timing graph compiler, interface synthesis, memory synthesis, state minimization, state encoding, logic minimization and technology mapping. Output goes to physical design.]
Conceptualization environment
• A tool is only effective if the designer can use it
    Understandable display of data
    Highlighting of design parts that need attention
• Must support many design avenues
A system-synthesis tool interface
• Allocation
• Partition
• Estimates
• Constraints
[Figure: screen of a system-synthesis tool. A table lists the system's modules (ASIC1, ASIC2, Memory1, Memory2, Processor1) with module type (X100, V1000, Y900), area, pins, execution time and the behaviors and variables mapped to each module (for example CaptureAudio, GenerateAudio, CaptureGenerateVideo, CaptureAVCmd, ProcessRemoteButtons, ProcessMiscCmds, audio_array1, audio_array2, video_array). Values are shown as estimate/constraint pairs such as 100/110, 16000/20000 and 18000/20000; the pairs 6000/5000 (Instr) and 105/100 ($) are marked with an asterisk. Controls: Cost: 5.43, Partition/Allocate, Refine, View options.]
An optional design view
[Figure: each quality metric is plotted against its constraint (axis origin labeled "0 constraint").]

    Quality metric                          Estimate/Constraint
    Execution-time(CaptureAudio)            100/110
    Execution-time(GenerateAudio)           100/110
    Execution-time(CaptureGenerateVideo)    100/110
    Execution-time(CaptureAVCmd)            100/110
    Area(ASIC1)                             16000/20000
    Area(ASIC2)                             18000/20000
    Pins(ASIC1)                             56/60
    Pins(ASIC2)                             58/60
    $(System)                               105/100
    Instr(Processor1)                       6000/5000

Violation? The estimates for $(System) and Instr(Processor1) exceed their constraints.
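Flagging violated constraints from such estimate/constraint pairs takes only a few lines of Python. This is a sketch of the idea, not the tool's implementation; the pairing of values with metrics follows the order in which they are listed in the slide.

```python
# Metric name -> (estimate, constraint), in slide order.
METRICS = {
    "Execution-time(CaptureAudio)": (100, 110),
    "Execution-time(GenerateAudio)": (100, 110),
    "Execution-time(CaptureGenerateVideo)": (100, 110),
    "Execution-time(CaptureAVCmd)": (100, 110),
    "Area(ASIC1)": (16000, 20000),
    "Area(ASIC2)": (18000, 20000),
    "Pins(ASIC1)": (56, 60),
    "Pins(ASIC2)": (58, 60),
    "$(System)": (105, 100),
    "Instr(Processor1)": (6000, 5000),
}


def violations(metrics):
    """Names of all metrics whose estimate exceeds its constraint."""
    return [name for name, (estimate, limit) in metrics.items() if estimate > limit]


print(violations(METRICS))  # ['$(System)', 'Instr(Processor1)']
```

A display built on this check can highlight exactly the design parts that need the designer's attention.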
Summary
• Three-step design methodology
    Functionality specification
    System design
    Component implementation
• Major tasks in system design
    Allocation
    Partitioning
    Refinement
• Generic synthesis tool
• Conceptualization environment
    Crucial to practical use
Future directions
• Advanced estimation methods
• Formal verification
• Testability
• Frameworks and databases
• Regularity exploitation
• System-level transformations
• Feedback incorporation
References
[BHS91] F. Belina, D. Hogrefe, and A. Sarma. SDL with Applications from Protocol Specifications. Prentice Hall, 1991.
[BK87] G. Borriello and R.H. Katz. "Synthesis and optimization of interface transducer logic." In Proceedings of the International Conference on Computer-Aided Design, 1987.
[CS86] C. Tseng and D.P. Siewiorek. "Automated synthesis of datapaths in digital systems." IEEE Transactions on Computer-Aided Design, pages 379-395, July 1986.
[EHB94] R. Ernst, J. Henkel, and T. Benner. "Hardware-software cosynthesis for microcontrollers." In IEEE Design & Test of Computers, pages 64-75, December 1994.
[FM82] C.M. Fiduccia and R.M. Mattheyses. "A linear-time heuristic for improving network partitions." In Proceedings of the Design Automation Conference, 1982.
[GD90] R. Gupta and G. DeMicheli. "Partitioning of functional models of synchronous digital systems." In Proceedings of the International Conference on Computer-Aided Design, pages 216-219, 1990.
[GD92] R. Gupta and G. DeMicheli. "System-level synthesis using re-programmable components." In Proceedings of the European Conference on Design Automation (EDAC), pages 2-7, 1992.
[GD93] R. Gupta and G. DeMicheli. "Hardware-software cosynthesis for digital systems." In IEEE Design & Test of Computers, pages 29-41, October 1993.
[GVN94] D.D. Gajski, F. Vahid, and S. Narayan. "A system-design methodology: Executable-specification refinement." In Proceedings of the European Conference on Design Automation (EDAC), 1994.
[Hal93] Nicolas Halbwachs. Synchronous Programming of Reactive Systems. Kluwer Academic Publishers, 1993.
[Hoa78] C.A.R. Hoare. "Communicating sequential processes." Communications of the ACM, 21(8): 666-677, 1978.
[IEE88] IEEE Inc., N.Y. IEEE Standard VHDL Language Reference Manual, 1988.
[JMP88] R. Jain, M. Mlinar, and A. Parker. "Area-time model for synthesis of non-pipelined designs." In Proceedings of the International Conference on Computer-Aided Design, 1988.
[Joh67] S.C. Johnson. "Hierarchical clustering schemes." Psychometrika, pages 241-254, September 1967.
[KC91] Y.C. Kirkpatrick and C.K. Cheng. \Ratio cut partitioning for hierarchical designs,". IEEE Transactions on Computer-AidedDesign, 10(7): 911{921, 1991.
[KGV83] S. Kirkpatrick, C.D. Gelatt, and M. P. Vecchi. \Optimization by simulated annealing,". Science, 220(4598): 671{680, 1983.
[KL70] B.W. Kernighan and S. Lin. \An ef cient heuristic procedure for partitioning graphs,". Bell System Technical Journal, February1970.
[LT91] E.D. Lagnese and D.E. Thomas. \Architectural partitioning for system level synthesis of integrated circuits,". IEEE Transactionson Computer-Aided Design, July 1991.
[MK90] M.C. McFarland and T.J. Kowalski. \Incorporating bottom-up design into hardware synthesis,". IEEE Transactions onComputer-Aided Design, September 1990.
[NG92] S. Narayan and D.D. Gajski. \System clock estimation based on clock slack minimization,". In Proceedings of the EuropeanDesign Automation Conference (EuroDAC), 1992.
[NG94] S. Narayan and D.D. Gajski. \Synthesis of system-level bus interfaces,". In Proceedings of the European Conference onDesign Automation (EDAC), 1994.
[NVG92] S. Narayan, F. Vahid, and D.D. Gajski. \System speci cation with the SpecCharts language,". In IEEE Design & Test ofComputers, Dec. 1992.
[PK89] P.G. Paulin and J.P. Knight. \Algorithms for high-level synthesis,". In IEEE Design & Test of Computers, Dec. 1989.
[PPM86] A.C. Parker, T. Pizzaro, and M. Mlinar. \MAHA: A program for datapath synthesis,". In Proceedings of the Design AutomationConference, 1986.
[TM91] D.E. Thomas and P. Moorby. The Verilog Hardware Description Language. Kluwer Academic Publishers, 1991.
[VG92] F. Vahid and D.D. Gajski. \Speci cation partitioning for system design,". In Proceedings of the Design Automation Conference,1992.
[VGG93] F. Vahid, J. Gong, and D.D. Gajski. \A hardware-software partitioning algorithm for minimizing hardware,". UC Irvine, Dept.of ICS, Technical Report 93-38,1993.