image processing ppt



Design and Synthesis of Image Processing Systems using Reconfigurable Dataflow Graphs

Mainak Sen and Shuvra S. Bhattacharyya 

Department of Electrical and Computer Engineering, and 

Institute for Advanced Computer Studies 

University of Maryland at College Park 

Maryland DSPCAD Research Group

http://www.ece.umd.edu/DSPCAD/home/dspcad.htm

November 22, 2005 Leiden University, The Netherlands

Design and Synthesis of Image Processing Systems,  

University of Maryland at College Park 


Outline 

Dataflow-based model of computation for modeling the behavior of DSP applications

Decidable dataflow models

o Example: use of decidable dataflow as a model of computation for modeling the mapping of (decidable) dataflow behaviors onto embedded multiprocessors

Structured reconfiguration of dataflow graphs

Examples of meta-modeling techniques that can be classified as structured, reconfigurable dataflow

o Parameterized dataflow and its application to SDF

o Homogeneous-parameterized dataflow and its application to SDF and CSDF

o Experiments on a gesture recognition application

Summary


Dataflow-based design for DSP (Example from Agilent ADS tool)


DSP-oriented Dataflow Models of Computation 

Used widely in design tools for DSP

Application is modeled as a directed graph

o Nodes (actors) represent functions

o Edges represent communication channels between functions

o Nodes produce and consume data from edges


o Edges buffer data in FIFO (first-in first-out) fashion

Data-driven execution model

o A node can execute whenever it has sufficient data on its input edges

o The order in which nodes execute is not part of the specification

o The order is typically determined by the compiler, the hardware, or both

Iterative execution

o Body of loop to be iterated a large or infinite number of times
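The execution model above can be sketched in a few lines of Python. This is only an illustration: the names `Edge`, `Actor`, and `run`, the three-actor chain, and the "fire the last enabled actor" policy are all assumptions made for the example, not part of any tool named in these slides.

```python
from collections import deque

class Edge:
    """FIFO channel that buffers tokens between two actors."""
    def __init__(self, initial=()):
        self.fifo = deque(initial)

class Actor:
    """Node that consumes and produces a fixed number of tokens per firing."""
    def __init__(self, name, inputs, outputs, func):
        self.name = name
        self.inputs = inputs    # list of (edge, tokens consumed per firing)
        self.outputs = outputs  # list of (edge, tokens produced per firing)
        self.func = func        # list of consumed tokens -> list of produced tokens

    def can_fire(self):
        # Data-driven rule: enabled once every input edge holds enough tokens.
        return all(len(edge.fifo) >= n for edge, n in self.inputs)

    def fire(self):
        consumed = [edge.fifo.popleft() for edge, n in self.inputs for _ in range(n)]
        produced = iter(self.func(consumed))
        for edge, n in self.outputs:
            for _ in range(n):
                edge.fifo.append(next(produced))

def run(actors, max_firings=1000):
    """Fire enabled actors until none is enabled or the budget is spent."""
    trace = []
    while len(trace) < max_firings:
        ready = [a for a in actors if a.can_fire()]
        if not ready:
            break
        actor = ready[-1]  # arbitrary policy: the specification does not fix the order
        actor.fire()
        trace.append(actor.name)
    return trace

# Hypothetical chain: scale (1 in, 1 out) -> add2 (2 in, 1 out) -> sink (1 in)
e0, e1, e2 = Edge([1, 2, 3, 4, 5, 6]), Edge(), Edge()
results = []
actors = [
    Actor("scale", [(e0, 1)], [(e1, 1)], lambda t: [t[0] * 10]),
    Actor("add2", [(e1, 2)], [(e2, 1)], lambda t: [t[0] + t[1]]),
    Actor("sink", [(e2, 1)], [], lambda t: results.append(t[0]) or []),
]
trace = run(actors)
print(results)  # [30, 70, 110]
```

Any other choice among the enabled actors would yield the same `results`, which is why the firing order can be left to the compiler or the hardware.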


Dataflow Features and Advantages 

Exposes coarse-grain parallelism.


Exposes high-level structure that facilitates analysis, verification, and optimization.

Captures multi-rate behavior.

Complementary to ongoing advances in DSP compiler technology for procedural languages, such as C and MATLAB.

Encourages desirable software engineering practices: modularity and code reuse

o Amenable also to aspect-oriented design.

Intuitive to DSP algorithm designers: signal flow graphs.


Evolution of Dataflow Models for DSP 

Synchronous dataflow: static multirate behavior

o Agilent ADS, Cadence SPW, etc.


Well-behaved dataflow: schemas for bounded dynamics

Boolean/integer dataflow: Turing complete models

Multidimensional synchronous dataflow: image and video

Scalable synchronous dataflow: block processing

o Synopsys COSSAP

Cyclo-static dataflow: phased behavior

o Synopsys El Greco, Eonic Systems Virtuoso Synchro, System Canvas

Bounded dynamic dataflow : bounded dynamics

The processing graph method: reconfigurable dynamic DF

o US Naval Research Laboratory, MCCI Autocoding Toolset

Parameterized dataflow: dynamically-reconfigurable static DF


Blocked dataflow: image and video in terms of reconfigurable dataflow


Modeling Design Space

[Figure: design-space chart with an axis labeled "Verification / synthesis power". Models plotted: C, BDF, DDF, SDF, CSDF, SSDF, MDSDF, WBDF, PSDF, PCSDF. (Third dimension: simplicity and intuitive appeal)]


Decidable Dataflow Models 

Modeling flow for representing static flowgraph behavior:

o Cyclo-static dataflow (CSDF), multiphase modeling

o Synchronous dataflow (SDF), multirate modeling


o Homogeneous synchronous dataflow (HSDF)

o Acyclic homogeneous synchronous dataflow (“task graphs”)

These are in decreasing order of generality

Designs represented in the more general models can be converted to equivalent representations in the less general ones

o e.g., CSDF → SDF → HSDF → task graph

HSDF: each actor (graph node) produces/consumes exactly one data value to/from  each incident output/input edge

o Suitable for exposing parallelism

o Not the best model for minimizing memory requirements
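As a rough illustration of why SDF is decidable, the sketch below solves the standard SDF balance equations (firings of source × tokens produced = firings of sink × tokens consumed, on every edge) for the smallest positive integer firing counts. The function name `repetitions` and the example graph are assumptions made for this sketch, and it assumes a connected graph. The sum of the firing counts is the number of nodes in the equivalent HSDF graph, which hints at why HSDF is not the best model for minimizing memory.

```python
from fractions import Fraction
from math import gcd

def repetitions(actors, edges):
    """edges: list of (src, produced, dst, consumed).
    Returns the minimal positive integer firings per actor,
    or None if the balance equations have no solution."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along edges
        changed = False
        for src, p, dst, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for src, p, dst, c in edges:  # every edge must balance
        if rate[src] * p != rate[dst] * c:
            return None
    scale = 1
    for r in rate.values():  # least common multiple of the denominators
        scale = scale * r.denominator // gcd(scale, r.denominator)
    counts = {a: int(r * scale) for a, r in rate.items()}
    g = 0
    for v in counts.values():
        g = gcd(g, v)
    return {a: v // g for a, v in counts.items()}

# Hypothetical multirate chain: A --(2,3)--> B --(1,2)--> C
q = repetitions(["A", "B", "C"], [("A", 2, "B", 3), ("B", 1, "C", 2)])
print(q)                 # {'A': 3, 'B': 2, 'C': 1}
print(sum(q.values()))   # 6 nodes in the equivalent HSDF graph
```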


Synthesis Techniques for Decidable Models 


Static scheduling: low overhead, predictability

Performance analysis through synchronization graphs

Loop scheduling

o Implicit repetition in the dataflow graph (through changes in sample rate) needs to be translated into explicit repetition in the form of loops on the execution target.

o Complex design space exists for such translation

o Complementary to procedural language techniques for nested loop compilation

Loop scheduling techniques

o Simulation speedup (minimization of scheduling complexity)

o Code/data minimization

o Hierarchical parallel scheduling

o Block processing

Task scheduling for latency/throughput optimization

Probabilistic design: exploiting tolerances to deadline misses


Example: Intermediate representations for synthesis from decidable dataflow models 

Consider a decidable dataflow behavior that is to be implemented on a self-timed, embedded multiprocessor

o Natural way to implement DSP multiprocessors from decidable dataflow

o Actor assignment and ordering are performed statically

o Invocation (dispatch) of actors is performed dynamically, through synchronization

Candidate mappings of the behavior onto the architecture can be represented through an intermediate representation that also has decidable dataflow semantics


o This representation is useful for understanding the performance, communication overhead, and synchronization structure associated with the candidate mapping

Facilitates the separation of communication and synchronization functionality

This is a useful modeling methodology for design space exploration


Interprocessor Communication Graph (Gipc) 

[Figure: IPC graph; node labels include 2r1, 4s1, 4s2, 4s3, 5s1, 7r1, 8r1, 9r1]

Every edge (vi, vj) induces the precedence constraint [equation not recovered from the slide]

Self-Timed Schedule

Proc 1: (1, 2, 3, 4, 6)

Proc 2: (5, 7, 8)

Proc 3: (9)

Self-timed schedule and its IPC graph


The synchronization graph Gs

 

Derived from the interprocessor communication graph

Synchronization edges are distinguished from interprocessor communication (IPC) edges

o Synchronization edges represent precedence constraints that are enforced by synchronization protocols

o IPC edges represent data transfers

Interprocessor connections

o Coincident synchronization and IPC edges: communication together with synchronization protocol (conventional approach)

o IPC edge only: communication without synchronization protocol

o Synchronization edge only: synchronization protocol only


Applications of Synchronization Graphs 

Simulation

Throughput estimation through cycle mean analysis

Removal of redundant synchronizations

Resynchronization

Conversion to more efficient synchronization protocols (strongly connected synchronization graphs)

Statically determining and minimizing the sizes of interprocessor communication buffers

 

All are post-processing methods that can be applied to improve a wide range of existing task graph scheduling techniques on a wide range of multiprocessor architectures.

These techniques benefit from good execution time estimates, but do not depend on exact execution time values to deliver useful results.
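Cycle mean analysis can be sketched as follows. For a graph whose nodes carry execution-time estimates and whose edges carry delays (initial tokens), the maximum over all cycles of (sum of execution times) / (sum of delays) bounds the iteration period, and its reciprocal estimates throughput. The code below is a brute-force illustration for small graphs only. The function name, the example graph, and its numbers are assumptions; practical tools use more efficient algorithms.

```python
from fractions import Fraction

def max_cycle_mean(exec_time, edges):
    """exec_time: {node: integer time estimate}; edges: list of (src, dst, delay).
    Enumerates simple cycles and returns the largest time/delay ratio."""
    adj = {}
    for u, v, d in edges:
        adj.setdefault(u, []).append((v, d))
    best = None

    def dfs(start, node, time, delay, visited):
        nonlocal best
        for nxt, d in adj.get(node, []):
            if nxt == start:  # closed a cycle back to its smallest node
                total_delay = delay + d
                if total_delay == 0:
                    raise ValueError("delay-free cycle: the graph deadlocks")
                mean = Fraction(time, total_delay)
                if best is None or mean > best:
                    best = mean
            elif nxt not in visited and nxt > start:
                # Only visit nodes larger than start so each cycle is found once.
                dfs(start, nxt, time + exec_time[nxt], delay + d, visited | {nxt})

    for s in sorted(exec_time):
        dfs(s, s, exec_time[s], 0, {s})
    return best

# Hypothetical graph: execution times in cycles, delays in tokens
times = {"A": 3, "B": 2, "C": 4}
graph = [("A", "B", 0), ("B", "A", 1), ("B", "C", 0), ("C", "A", 2)]
mcm = max_cycle_mean(times, graph)
print(mcm)      # 5  (cycle A -> B -> A: (3 + 2) / 1)
print(1 / mcm)  # 1/5 iterations per time unit
```

Because the result depends only on sums of estimates around cycles, a moderately inaccurate execution time shifts the bound slightly rather than invalidating it.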


Beyond Decidable Models 


Limited expressive power: DSP applications increasingly employ high-level dynamics in their behavior

o User interface functionality

o Mode changes

o Adaptive algorithms

o Reconfiguration of processing resources/parameters

However, key subsystems still exhibit large amounts of “quasi-static” structure --- structure that stays fixed across significant windows of time.

Various dynamic dataflow models have been proposed that address the limitation above by abandoning most or all restrictions related to decidable dataflow

However, these methods are correspondingly limited in their ability to exploit the quasi-static structure described above


Parameterized Dataflow: Structured Control of Dynamic Parameters 

The key discipline imposed on reconfiguration is that each subsystem must have a consistent view of each of its actors (hierarchical or primitive) throughout any given iteration of that subsystem.


Parameterized Dataflow 

Hierarchical modeling

  

[Figure: a subsystem inside a parent graph. The subsystem contains subinit, init, and body graphs and has parameters (n, ...); edge labels: "writes n", "reads n"]

A parameterized dataflow subsystem is composed of three parameterized dataflow graphs:

o init, subinit, body

  

Subsystem parameters

o configured in init/subinit, used in body

  

Dynamically reconfigurable
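The init/subinit/body split can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the class name `Subsystem`, the parameters `gain` and `n`, and the calling order (init once, then subinit before each body iteration) are chosen for the example and are not taken from the slides. The point it shows is that parameters are written only outside the body, so the body sees a consistent view for a whole iteration.

```python
from types import MappingProxyType

class Subsystem:
    """Hypothetical parameterized subsystem: init/subinit write parameters, body reads them."""
    def __init__(self, init, subinit, body):
        self.init, self.subinit, self.body = init, subinit, body
        self.params = {}

    def invoke(self, batches):
        self.init(self.params)                  # configure once per invocation
        outputs = []
        for tokens in batches:
            self.subinit(self.params, tokens)   # reconfigure between body iterations
            view = MappingProxyType(dict(self.params))  # read-only snapshot for the body
            outputs.append(self.body(view, tokens))
        return outputs

def init(params):
    params["gain"] = 2

def subinit(params, tokens):
    params["n"] = len(tokens)   # e.g. a block length that varies at run time

def body(params, tokens):
    return params["gain"] * sum(tokens[:params["n"]])

sub = Subsystem(init, subinit, body)
result = sub.invoke([[1, 2, 3], [4, 5]])
print(result)  # [12, 18]
```

A body that tried to assign to `params` would raise `TypeError`, which is one way to enforce the rule that reconfiguration happens only at iteration boundaries.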


Meta-modeling with parameterized dataflow 

Parameterized dataflow can be applied to any dataflow model of computation (“base model”) to augment that model with dynamic reconfiguration capabilities in a structured way

o Provides for efficient quasi-static scheduling

o Enables execution to be viewed in terms of a sequence of dataflow graphs in the base model

Parameterized dataflow + XYZ → “Parameterized XYZ”

Examples of parameterized dataflow models of computation that we are developing and experimenting with


o parameterized synchronous dataflow (PSDF)

o parameterized cyclo-static dataflow (PCSDF)


Parameterized Synchronous Dataflow (PSDF) 

“Local synchrony” conditions can be formulated and checked in a quasi-static fashion to ensure that bounded token production and consumption, along with bounded delays, lead to bounded memory requirements overall.

o This is not true of unstructured dynamic dataflow models, such as general dynamic dataflow, boolean dataflow, and bounded dynamic dataflow

Techniques for construction of streamlined looped schedules for synchronous dataflow graphs have natural and efficient extensions to the construction of parameterized looped schedules for PSDF graphs.


PSDF Example: CD to DAT Conversion 

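[Figure: PSDF specification of the CD-to-DAT converter. The init graph contains setFac, which sets i1, …, d4. The body graph contains the actors CD, PF1, PF2, PF3, PF4, and DAT; its edges are annotated with the rates 1, d1, i1, d2, i2, d3, i3, d4, and i4. Other labels: initChild, preamble. Subsystem params: i1, d1, …, i4, d4]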

    repeat 5 times {
        fire setFac   /* sets i1, d1, i2, d2, i3, d3, i4, d4 */
        int _g1 = gcd(i1, d2);
        int _g2 = gcd((i2 * i1) / _g1, d3);
        int _g3 = gcd((i3 * i2 * i1) / (_g2 * _g1), d4);
        repeat (d4 / _g3) times {
            repeat (d3 / _g2) times {
                repeat (d2 / _g1) times {
                    repeat (d1) times { fire CD }
                    fire PF1
                }
                repeat (i1 / _g1) times { fire PF2 }
            }
            repeat ((i2 * i1) / (_g2 * _g1)) times { fire PF3 }
        }
        repeat ((i3 * i2 * i1) / (_g3 * _g2 * _g1)) times {
            fire PF4
            repeat (i4) times { fire DAT }   /* DAT drains the i4 tokens of each PF4 firing */
        }
    }
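To see how such a parameterized looped schedule behaves for one setting of the parameters, the Python sketch below expands it into a firing list and checks token balance on every edge. It rests on assumptions: PFk consumes dk tokens and produces ik tokens per firing, CD produces one token and DAT consumes one token per firing (so DAT fires i4 times for each PF4 firing), and the factor values are merely one illustrative factorization of the 147:160 ratio between 44.1 kHz and 48 kHz. The outer "repeat 5 times" loop and setFac are omitted, because the parameters stay fixed within one iteration.

```python
from collections import Counter
from math import gcd

def cd_dat_firings(i, d):
    """Expand one schedule iteration for fixed factors i = (i1..i4), d = (d1..d4)."""
    i1, i2, i3, i4 = i
    d1, d2, d3, d4 = d
    g1 = gcd(i1, d2)
    g2 = gcd((i2 * i1) // g1, d3)
    g3 = gcd((i3 * i2 * i1) // (g2 * g1), d4)
    fired = []
    for _ in range(d4 // g3):
        for _ in range(d3 // g2):
            for _ in range(d2 // g1):
                fired += ["CD"] * d1 + ["PF1"]
            fired += ["PF2"] * (i1 // g1)
        fired += ["PF3"] * ((i2 * i1) // (g2 * g1))
    for _ in range((i3 * i2 * i1) // (g3 * g2 * g1)):
        fired += ["PF4"] + ["DAT"] * i4
    return fired

def is_balanced(fired, i, d):
    """Replay the firings; every edge must stay non-negative and end empty."""
    chain = ["CD", "PF1", "PF2", "PF3", "PF4", "DAT"]
    consume = dict(zip(chain, (0,) + tuple(d) + (1,)))  # tokens read per firing
    produce = dict(zip(chain, (1,) + tuple(i) + (0,)))  # tokens written per firing
    level = [0] * 5  # level[k] is the edge between chain[k] and chain[k + 1]
    for name in fired:
        k = chain.index(name)
        if k > 0:
            level[k - 1] -= consume[name]
            if level[k - 1] < 0:
                return False
        if k < 5:
            level[k] += produce[name]
    return all(v == 0 for v in level)

i, d = (2, 2, 8, 5), (1, 3, 7, 7)   # assumed example values
fired = cd_dat_firings(i, d)
counts = Counter(fired)
print(counts["CD"], counts["DAT"])  # 147 160
print(is_balanced(fired, i, d))     # True
```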


PSDF Example: Speech Compression


PCSDF Version of Speech Compression



Homogeneous Parameterized Dataflow (HPDF)

A parameterized dataflow model that can encapsulate the dynamics of an application.

Meta-modeling technique. Hierarchical actors can have any other underlying dataflow model (SDF, CSDF, PSDF, etc.)

Data production and consumption rates, though dynamic, are equal across an edge for a large number of applications; hence the name homogeneous.

Reconfiguration can be performed without introducing hierarchy when more natural to do so (advantage over parameterized dataflow).

Parameterized dataflow is a more powerful technique and thus can be used to represent a wider set of applications.


Applications 

Applications with dynamic run-time data and aggregating final-stage processes are modeled especially well by HPDF over SDF semantics.

Many applications in image and speech processing seem well suited for our model.

We applied the model to two applications:

- A real-time video processing algorithm for a smart camera developed at Princeton

- A face detection algorithm developed at CFAR labs at UMD.


Application characteristics 

[Figure labels: dynamic but balanced amount of data; aggregating final stage]

This structure seems to be abundant in many audio/video applications.

Our HPDF model is a natural fit for applications with the above structure.


Gesture recognition algorithm 

Real-time video processing for gesture recognition.

Performs low-level (red oval) and high-level processing.

Low-level processing recognizes body parts and identifies movements.

High-level processing recognizes actions.

We concentrate on low-level processing.

 

Ref: W. Wolf, B. Ozer, and T. Lv. Smart cameras as embedded systems. IEEE Computer Magazine, 35(9):48-53, September 2002.


HPDF model of gesture recognition algorithm

[Figure: HPDF graph with the actors Region finding, Contour following, Ellipse fitting, and Graph matching. Annotations: "Dynamic data" (twice), "Aggregating final-stage", and matched edge rates (n, n) and (p, p)]

Ptolemy II implementation


Modeling with HPDF/CSDF 

[Figure: HPDF/CSDF graph with the actors VIDEO INPUT, REGION EXTRACTION, CONTOUR FOLLOWING, ELLIPSE FITTING, and MATCH. Edge rate annotations as extracted: (s 1), (Xi, Yi), (I 0, I ki), (n 1), p (pi1, qi 0)]

p phases with 1 token and (n-p) phases with 0 token production

#phases = #pixels = s


Integrating HPDF and CSDF 


Number of phases in a fundamental period can vary dynamically.

Number of tokens produced or consumed in a given phase can also vary dynamically.

HPDF constraint: the total number of tokens produced by a source actor of a given edge in a given invocation (a fundamental period) must equal the total number of tokens consumed by the sink in its corresponding invocation.
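A small Python sketch of the HPDF constraint, with per-phase token counts that change from one invocation to the next. The function names and the example phase patterns are assumptions made for illustration. The first pattern mirrors the "p phases with 1 token, the rest with 0" shape mentioned earlier.

```python
def satisfies_hpdf(produced_per_phase, consumed_per_phase):
    """Totals over one invocation (fundamental period) must match across the edge."""
    return sum(produced_per_phase) == sum(consumed_per_phase)

def leftover_after_invocation(produced_per_phase, consumed_per_phase):
    """Run all source phases, then all sink phases; return tokens left on the edge."""
    tokens = sum(produced_per_phase)
    for n in consumed_per_phase:
        if n > tokens:
            raise ValueError("sink underflow: consumes more than was produced")
        tokens -= n
    return tokens

# Hypothetical invocations: both the phase count and the per-phase rates vary.
invocations = [
    ([1, 0, 1, 1], [3]),         # 4 source phases, 3 of them produce a token
    ([0, 0, 0], [0]),            # nothing found in this frame
    ([1, 1, 1, 1, 1], [2, 3]),   # sink spreads its consumption over two phases
]
checks = [satisfies_hpdf(p, c) for p, c in invocations]
leftovers = [leftover_after_invocation(p, c) for p, c in invocations]
print(checks)     # [True, True, True]
print(leftovers)  # [0, 0, 0]
print(satisfies_hpdf([1, 1], [1]))  # False: one token would pile up per invocation
```

When the constraint holds, the edge is empty again after every invocation, so buffer sizes stay bounded even though the rates are dynamic.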


Finer granularity and Input modeling

Each frame has 384x240 pixels, so we model the input as a CSDF actor with s = 92160 phases.

The model captures the pixel-level parallelism present in Region.

It also captures the frame-level parallelism through the number of phases in Input (s).

[Figure: VIDEO INPUT feeding REGION EXTRACTION over edges annotated (s 1); #phases = #pixels = s]


Modeling dynamicity - Contour 

Contour has 2 phases.

The first phase scans until it finds a contour.

o Output = 0 tokens

The second phase follows this contour and all the overlapping ones.

o Output = ki tokens; each token is a list of pixels from a contour

Homogeneous condition remains: [left-hand side lost in extraction] = s

Scheduling 

VRCEM

(s V)(s R)(2I C)(n E)M

(s VR)(2I C)(n E)M


Results

We also applied HPDF to successfully model a face detection algorithm.

We developed a TI DSP implementation of the HPDF model of the gesture recognition algorithm.

The application was run on a TMS320C64xx fixed-point processor.

When implemented with our HPDF model, the runtime was 21405671 cycles.

With a 40 ns cycle period, execution time for the application was 0.86 sec.

Results (contd.) 

Scheduling overhead was minimal because a highly streamlined quasi-static schedule was obtained.

Worst-case buffer size was 642 Kb when the input images were 384x240 pixels. HPDF modeling suggested buffer reuse between the edges.

The original C code had a runtime of 27741882 cycles; execution time was 1.11 sec with the same clock period of 40 ns.

HPDF improved runtime by 23%.

Efficient hardware code generation is being investigated using a hardware synthesis framework developed in our research group.
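The reported times and the improvement figure follow directly from the cycle counts and the 40 ns cycle period. A quick arithmetic check in Python (the variable names are ours):

```python
CYCLE_NS = 40                 # clock period from the slides
hpdf_cycles = 21_405_671      # HPDF-based implementation
c_cycles = 27_741_882         # original C code

hpdf_sec = hpdf_cycles * CYCLE_NS / 1e9
c_sec = c_cycles * CYCLE_NS / 1e9
improvement = (c_cycles - hpdf_cycles) / c_cycles

print(round(hpdf_sec, 2))         # 0.86
print(round(c_sec, 2))            # 1.11
print(round(improvement * 100))   # 23
```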


Summary 

Dataflow-based models of computation are attractive for modeling the behavior of DSP applications

Decidable dataflow models are useful for exposing and exploiting static structure in synthesis tools for DSP

Decidable dataflow models in conjunction with structured reconfigurable techniques allow for efficient handling of application dynamics

Examples of structured, reconfigurable dataflow techniques that we discussed:

o Parameterized dataflow and its application to SDF

o Homogeneous-parameterized dataflow and its application to SDF and CSDF

o Experiments on a gesture recognition application

Other examples include dynamic configuration of graph topologies, and blocked dataflow modeling.


References 

B. Bhattacharya and S. S. Bhattacharyya. Parameterized dataflow modeling for DSP systems. IEEE Transactions on Signal Processing, 49(10):2408-2421, October 2001.

S. S. Bhattacharyya, R. Leupers, and P. Marwedel. Software synthesis and code generation for DSP. IEEE Transactions on Circuits and Systems --- II: Analog and Digital Signal Processing, 47(9):849-875, September 2000.

G. Bilsen, M. Engels, R. Lauwereins, and J. A. Peperstraete. Cyclo-static dataflow. IEEE Transactions on Signal Processing, 44(2):397-408, February 1996.

D. Ko and S. S. Bhattacharyya. Dynamic configuration of dataflow graph topology for DSP system design. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages V-69-V-72, Philadelphia, Pennsylvania, March 2005.

E. A. Lee and D. G. Messerschmitt. Static scheduling of synchronous dataflow programs for digital signal processing. IEEE Transactions on Computers, February 1987.

S. Neuendorffer and E. Lee. Hierarchical reconfiguration of dataflow models. In Proceedings of the International Conference on Formal Methods and Models for Codesign, June 2004.

M. Sen, S. S. Bhattacharyya, T. Lv, and W. Wolf. Modeling image processing systems with homogeneous parameterized dataflow graphs. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pages V-133-V-136, Philadelphia, Pennsylvania, March 2005.