cosynthesis algorithms partitioning

Upload: honeygupta480

Post on 04-Jun-2018


  • 8/13/2019 CoSynthesis Algorithms Partitioning


    Winter-Spring 2001 Codesign of Embedded Systems 2

    Topics

      Introduction
      Preliminaries
      Hardware/Software Partitioning
      Distributed System Co-Synthesis


    Topics

      Introduction
      A Classification
      Examples
        Vulcan
        Cosyma


    Introduction to HW/SW Partitioning

      The first variety of co-synthesis applications
      Definition
        A HW/SW partitioning algorithm implements a specification on some sort of multiprocessor architecture
      Usually
        Multiprocessor architecture = one CPU + some ASICs on the CPU bus


    Introduction to HW/SW Partitioning (contd)

      Terminology
      Allocation
        Synthesis methods that design the multiprocessor topology along with the processing elements (PEs) and the SW architecture
      Scheduling
        The process of assigning PE (CPU and/or ASIC) time to processes so that they can execute


    Introduction to HW/SW Partitioning (contd)

      In most partitioning algorithms
        The type of CPU is fixed and given
        The ASICs must be synthesized
          What function to implement on each ASIC?
          What characteristics should the implementation have?
        The problems are single-rate synthesis problems
        A CDFG (control/data flow graph) is the starting model


    HW/SW Partitioning (contd)

      Normal use of architectural components
        The CPU performs the less computationally intensive functions
        ASICs are used to accelerate core functions
      Where to use?
        High-performance applications
          No CPU is fast enough for the operations
        Low-cost applications
          ASIC accelerators allow use of a much smaller, cheaper CPU


    A Classification

      Criterion: Optimization Strategy
        Trade-off between performance and cost
      Primal Approach
        Performance is the primary goal
        First, all functionality is in ASICs. Progressively move more to the CPU to reduce cost.
      Dual Approach
        Cost is the primary goal
        First, all functions are in the CPU. Move operations to the ASIC to meet the performance goal.


    A Classification (contd)

      Classification by optimization strategy (contd)
        Example co-synthesis systems
          Vulcan (Stanford): Primal strategy
          Cosyma (Braunschweig, Germany): Dual strategy


    Co-Synthesis Algorithms: HW/SW Partitioning

      HW/SW Partitioning Examples: Vulcan


    Partitioning Examples: Vulcan

      Gupta, De Micheli, Stanford University
      Primal approach
        1. All-HW initial implementation.
        2. Iteratively move functionality to the CPU to reduce cost.
      System specification language
        HardwareC
        Is compiled into a flow graph


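    Partitioning Examples: Vulcan (contd)

      [Figure: two HardwareC fragments and their flow-graph equivalents]
        HardwareC: x=a; y=b;
          Flow graph: a nop node that forks to the nodes x=a and y=b
        HardwareC: if (c>d) x=e; else y=f;
          Flow graph: a cond node that branches to x=e when c>d and to y=f otherwise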


    Partitioning Examples: Vulcan (contd)

      Flow Graph
        is executed repeatedly at some rate
        can have initiation-time constraints for each node
          $t(v_i) + l_{ij} \le t(v_j) \le t(v_i) + u_{ij}$
        can have rate constraints on each node
          $m_i \le R_i \le M_i$
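The two kinds of flow-graph constraint can be read as simple interval checks: the start time of a node must fall within a window relative to another node, and the execution rate of a node must lie between a lower and an upper bound. The following is a minimal sketch under that reading. The function names, the dictionary layout, and the sample node names are hypothetical and are not part of Vulcan.

```python
from typing import Dict, Tuple

def check_initiation(start: Dict[str, float],
                     bounds: Dict[Tuple[str, str], Tuple[float, float]]) -> bool:
    """Return True if start[vi] + l <= start[vj] <= start[vi] + u for every pair.

    bounds maps a pair (vi, vj) to its (l, u) window; this layout is an assumption.
    """
    return all(start[vi] + low <= start[vj] <= start[vi] + high
               for (vi, vj), (low, high) in bounds.items())

def check_rates(rate: Dict[str, float],
                limits: Dict[str, Tuple[float, float]]) -> bool:
    """Return True if m <= rate[v] <= M for every node v that has a rate limit."""
    return all(low <= rate[v] <= high for v, (low, high) in limits.items())

# Hypothetical two-node example: "filter" must start 2 to 5 time units after "read".
start_times = {"read": 0.0, "filter": 3.0}
windows = {("read", "filter"): (2.0, 5.0)}
rates = {"read": 10.0}
rate_limits = {"read": (8.0, 12.0)}

ok = check_initiation(start_times, windows) and check_rates(rates, rate_limits)
```

A schedule that starts "filter" at time 6.0 would fail the first check, because 6.0 lies outside the window from 2.0 to 5.0.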


    Partitioning Examples: Vulcan (contd)

      Vulcan Co-synthesis Algorithm
        Partitioning quantum is a thread
          The algorithm divides the flow graph into threads and allocates them
        A thread boundary is determined by
          1. (always) a non-deterministic delay element, such as a wait for an external variable
          2. (by choice) other points of the flow graph
        Target architecture
          CPU + co-processor (multiple ASICs)


    Partitioning Examples: Vulcan (contd)

      Vulcan Co-synthesis Algorithm (contd)
        Allocation
          Primal approach
        Scheduling
          is done by a scheduler on the target CPU
          the scheduler is generated as part of the synthesis process
          the scheduler schedules all threads (both HW and SW threads)
          cannot be static, because some threads have non-deterministic initiation times


    Partitioning Examples: Vulcan (contd)

      Vulcan Co-synthesis Algorithm (contd)
        Cost estimation
          SW implementation
            Code size
              relatively straightforward
            Data size
              Biggest challenge.
              Vulcan puts some effort into finding bounds for each thread
          HW implementation
            ?


    Partitioning Examples: Vulcan (contd)

      Vulcan Co-synthesis Algorithm (contd)
        Performance estimation
          Both SW and HW implementations
            From the flow graph and the basic execution times for the operators


    Partitioning Examples: Vulcan (contd)

      Algorithm Details
        Partitioning goal
          Allocate each thread to one of two partitions
            CPU set: $F_S$
            Co-processor set: $F_H$
          The required execution rate must be met, and the total cost minimized


    Partitioning Examples: Vulcan (contd)

      Algorithm Details (contd)
        Algorithm steps
          1. Put all threads in the $F_H$ set
          2. Iteratively do
            2.1. Move some operations to $F_S$.
              2.1.1. Select a group of operations to move to $F_S$.
              2.1.2. Check performance feasibility by computing the worst-case delay through the flow graph given the new thread times
              2.1.3. Do the move, if feasible
            2.2. Incrementally update the cost function to reflect the new partition
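The steps on this slide can be sketched as a greedy loop. This is an illustration only, not Vulcan's implementation: it assumes each "group" is a single thread, that candidates are tried in order of decreasing hardware area, that the flow graph is acyclic, and that the feasibility test is a longest-path delay compared against a deadline. The field names `hw_time`, `sw_time`, and `hw_area` and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def worst_case_delay(succ: Dict[str, List[str]], time: Dict[str, float]) -> float:
    """Longest path through an acyclic flow graph, using per-thread times."""
    memo: Dict[str, float] = {}

    def longest_from(v: str) -> float:
        if v not in memo:
            tail = max((longest_from(s) for s in succ.get(v, [])), default=0.0)
            memo[v] = time[v] + tail
        return memo[v]

    return max(longest_from(v) for v in time)

def primal_partition(threads: Dict[str, Dict[str, float]],
                     succ: Dict[str, List[str]],
                     deadline: float) -> Tuple[Set[str], Set[str]]:
    f_h = set(threads)            # step 1: everything starts in hardware
    f_s: Set[str] = set()
    # Step 2.1.1 (assumed policy): try the most area-hungry threads first.
    order = sorted(threads, key=lambda n: threads[n]["hw_area"], reverse=True)
    for name in order:
        trial = f_s | {name}
        times = {n: threads[n]["sw_time"] if n in trial else threads[n]["hw_time"]
                 for n in threads}
        # Step 2.1.2: feasibility check on the worst-case delay.
        if worst_case_delay(succ, times) <= deadline:
            f_s = trial           # step 2.1.3: commit the move, never undone
            f_h.discard(name)
    return f_h, f_s

# Hypothetical chain of three threads: a -> b -> c.
threads = {
    "a": {"hw_time": 1.0, "sw_time": 4.0, "hw_area": 10.0},
    "b": {"hw_time": 1.0, "sw_time": 2.0, "hw_area": 30.0},
    "c": {"hw_time": 1.0, "sw_time": 3.0, "hw_area": 20.0},
}
succ = {"a": ["b"], "b": ["c"], "c": []}
f_h, f_s = primal_partition(threads, succ, deadline=6.0)
```

With a deadline of 6.0, threads b and c move to software and thread a stays in hardware, because moving a as well would stretch the chain to 9.0 time units. A single pass is enough here only because software times are assumed to be no shorter than hardware times, so a rejected move cannot become feasible later.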


    Partitioning Examples: Vulcan (contd)

      Algorithm Details (contd)
        Vulcan cost function
          $f(w) = c_1 S_h(F_H) - c_2 S_s(F_S) + c_3 B - c_4 P + c_5 |m|$
          $c_i$: weight constants
          $S_h()$, $S_s()$: size functions
          B: Bus utilization (
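As a worked instance of the cost function, the sketch below evaluates the weighted sum for one candidate partition. The slide text does not survive far enough to define P or m, so they are treated here as opaque numeric inputs, with |m| passed in as a non-negative count. All weights and measurements are made-up values chosen for easy arithmetic.

```python
from typing import Sequence

def vulcan_cost(c: Sequence[float], size_hw: float, size_sw: float,
                bus_util: float, p: float, m_size: float) -> float:
    """Evaluate f = c1*Sh(FH) - c2*Ss(FS) + c3*B - c4*P + c5*|m|.

    size_hw and size_sw stand for Sh(FH) and Ss(FS); p and m_size are
    placeholders for the terms P and |m|, which the source leaves undefined.
    """
    c1, c2, c3, c4, c5 = c
    return c1 * size_hw - c2 * size_sw + c3 * bus_util - c4 * p + c5 * m_size

# Hypothetical weights and measurements.
weights = (2.0, 1.0, 10.0, 4.0, 0.5)
cost = vulcan_cost(weights, size_hw=100.0, size_sw=40.0,
                   bus_util=0.5, p=0.75, m_size=6.0)
# 200.0 - 40.0 + 5.0 - 3.0 + 3.0
```

Term by term, the example gives 200 - 40 + 5 - 3 + 3 = 165.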


    Partitioning Examples: Vulcan (contd)

      Algorithm Details (contd)
        Complementary notes
          A heuristic to minimize communication
            Once a thread is moved to $F_S$, its immediate successors are placed in the list for evaluation in the next iteration.
          No backtracking
            Once a thread is assigned to $F_S$, it remains there
        Experimental results
          The designs produced are considerably faster than all-SW implementations, but much cheaper than all-HW designs


    Co-Synthesis Algorithms: HW/SW Partitioning

      HW/SW Partitioning Examples: Cosyma


    Partitioning Examples: Cosyma

      Rolf Ernst et al., Technical University of Braunschweig, Germany
      Dual approach
        1. All-SW initial implementation.
        2. Iteratively move basic blocks to the ASIC accelerator to meet the performance objective.
      System specification language
        Cx
        Is compiled into an ESG (Extended Syntax Graph)
        The ESG is much like a CDFG
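The dual approach can be illustrated with a small loop that starts from an all-software design and moves basic blocks to hardware until a time target is met. The greedy "largest saving first" selection below is a stand-in chosen for brevity and should not be taken as Cosyma's actual selection method. The sketch also ignores communication cost. The block names, the fields `sw_time` and `hw_time`, and the numbers are hypothetical.

```python
from typing import Dict, Set, Tuple

def dual_partition(blocks: Dict[str, Dict[str, float]],
                   target_time: float) -> Tuple[Set[str], float]:
    """Move basic blocks to hardware until total time <= target_time.

    Returns the set of blocks placed in hardware and the resulting total time.
    The total may still exceed the target if no further move helps.
    """
    in_hw: Set[str] = set()
    total = sum(b["sw_time"] for b in blocks.values())   # all-SW starting point

    def gain(name: str) -> float:
        return blocks[name]["sw_time"] - blocks[name]["hw_time"]

    for name in sorted(blocks, key=gain, reverse=True):
        if total <= target_time or gain(name) <= 0:
            break                                        # goal met, or no move helps
        in_hw.add(name)
        total -= gain(name)
    return in_hw, total

# Hypothetical basic blocks with software and hardware execution times.
blocks = {
    "bb1": {"sw_time": 10.0, "hw_time": 2.0},
    "bb2": {"sw_time": 6.0, "hw_time": 5.0},
    "bb3": {"sw_time": 8.0, "hw_time": 3.0},
}
hw_blocks, total_time = dual_partition(blocks, target_time=12.0)
```

In this example the all-software time is 24.0. Moving bb1 brings it to 16.0 and moving bb3 brings it to 11.0, which meets the target, so bb2 stays in software.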


    Partitioning Examples: Cosyma (contd)

      Cosyma Co-synthesis Algorithm
        Partitioning quantum is a basic block
          A basic block is a branch-free block of program code
        Target Architecture
          CPU + accelerator ASIC(s)
        Scheduling
        Allocation
        Cost Estimation
        Performance Estimation
        Algorithm Details


    Partitioning Examples: Cosyma (contd)

      Cosyma Co-synthesis Algorithm (contd)
        Performance Estimation
          SW implementation
            Done by examining the object code for the basic block generated by a compiler
          HW implementation
            Assumes one operator per clock cycle.
            Creates a list schedule for the DFG of the basic block.
            Depth of the list gives the number of clock cycles required.
          Communication
            Done by data-flow analysis of the adjacent basic blocks.
            In shared memory
              Proportional to the number of variables to be accessed
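The hardware estimate can be sketched as a depth computation over the data-flow graph (DFG) of a basic block. The sketch reads "one operator per clock cycle" as every operator taking exactly one cycle with no limit on how many run in parallel, which is an assumption about the slide's meaning. Under that reading, the schedule depth equals the longest dependency chain. The function name and the sample block are hypothetical.

```python
from typing import Dict, List

def hw_cycles(deps: Dict[str, List[str]]) -> int:
    """Estimate clock cycles for a basic block from its DFG.

    deps maps each operation to the operations whose results it consumes.
    Each operation is scheduled one cycle after its latest predecessor.
    """
    level: Dict[str, int] = {}

    def depth(op: str) -> int:
        if op not in level:
            level[op] = 1 + max((depth(p) for p in deps[op]), default=0)
        return level[op]

    return max((depth(op) for op in deps), default=0)

# Hypothetical basic block:
#   t1 = a * b;  t2 = c * d;  t3 = t1 + t2;  t4 = t3 - e
dfg = {"t1": [], "t2": [], "t3": ["t1", "t2"], "t4": ["t3"]}
cycles = hw_cycles(dfg)
```

The two multiplications share the first cycle, the addition takes the second, and the subtraction the third, so the estimate is 3 cycles for 4 operations. That gap between operation count and depth is the intra-block parallelism the hardware can exploit.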


    Partitioning Examples: Cosyma (contd)

      Experimental Results
        By moving only basic blocks to HW
          Typical speedup of only 2x
          Reason:
            Limited intra-basic-block parallelism
          Cure:
            Implement several control-flow optimizations to increase parallelism in the basic block, and hence in the ASIC
            Examples: loop pipelining, speculative branch execution with multiple branch prediction, operator pipelining
          Result:
            Speedups: 2.7 to 9.7
            CPU times: 35 to 304 seconds on a typical workstation


    What we learned today

      HW/SW Partitioning: one broad category of co-synthesis algorithms
      Criteria by which a co-synthesis algorithm is categorized