
Embedded System Environment

Exercises

Contact:

Bern University of Applied Sciences

Engineering and Information Technology

HuCE-microLab

Quellgasse 21

CH-2501 Biel

Website HuCE-microLab

www.microlab.ch

Website BFH-TI

www.ti.bfh.ch

Biel - March 20, 2018

Contents

1 Introduction
  1.1 Motivation
  1.2 Naming conventions and marks
  1.3 ESE Startup and Settings

2 Image Processing Theory
  2.1 Description of the Video Processing System
  2.2 PGM Image File Format
    2.2.1 PGM example
  2.3 Pixel Operation
    2.3.1 Contrast Enhancement
    2.3.2 Gamma Correction
  2.4 Median Filter
    2.4.1 Mathematics
    2.4.2 How it Works
    2.4.3 Guidelines for Use
  2.5 Edge Detection with the Sobel Operator
    2.5.1 Sobel Operator
    2.5.2 Mathematics
    2.5.3 Pseudo-Convolution Operator
    2.5.4 Guidelines for Use
  2.6 Inverting

3 Exercises Image Processing
  3.1 Exercise Nr 1: Create the ESE-Model with one CPU
  3.2 Exercise Nr 2: Split the program into four processes
    3.2.1 Verify the reading and writing processes
    3.2.2 Modify the C code for the image processing
    3.2.3 Use one processor with four processes
    3.2.4 Use four processors with one process per processor
  3.3 Exercise Nr 3: Insertion of parallelism in the processes
  3.4 Exercise Nr 4: Optimize the communication
  3.5 Discussion

4 Exercise Video Processing
  4.1 Exercise Nr 1: Optimize the Load Balance
  4.2 Exercise Nr 2: Add an Inverter to the System
  4.3 Exercise Nr 3: Reduce Edge Detection to a Single Process
  4.4 Exercise Nr 4: Optimize the Median Filter
  4.5 Outlook

A Using Eclipse
  A.1 Importing an Existing Project
  A.2 Debugging a Functional or Timed TLM in Eclipse with GDB
    A.2.1 Import the Model
    A.2.2 Common Problems

Document information

Version Summary

AUTHOR  DATE        VERSION  DESCRIPTION
zor1    23-04-2010  1.00     Initial version
hga3    25-05-2010  —        Testing environment
hga3    04-03-2013  1.10     Set-up new VM environment
kip1    13-03-2013  1.10     Adapted documentation to new VM environment
kip1    11-03-2014  1.20     Added the appendix again and removed bad links
bwc1    19-03-2018  1.30     Completely refined instructions for the image processing exercise

Version verification

This tutorial has been verified on the following software releases:

Software

TOOL           VERSION
OS             Ubuntu 10.04 (Dedicated HuCE-microLab VM)
ESE Front End  2.0a (API Version 0.1.0b)


Preface

This exercise is based on the labs of the tutorial for the ESE tools from the Center for Embedded Computer Systems (CECS) at the University of California, Irvine. The tutorial was rewritten and slightly adapted to fit the environment of BFH-TI.

The authors of the original tutorial are Yongjin Ahn, Samar Abdi and Daniel Gajski from CECS at UCI. The author of the BFH version is the same as the author of the current document.


1 Introduction

The basic purpose of this exercise is to introduce the user more deeply to the Embedded System Environment (ESE) Front End. ESE helps designers take C/C++ application processes and a graphical platform capture and automatically produce Transaction Level Models (TLMs) for functional verification and performance estimation. Extensive information about ESE and its projected impact on embedded system design processes is available on the website at http://www.cecs.uci.edu/~ese

The exercises assume the basic knowledge of the tutorial. First you will optimize an implemented design by using CPUs for multiple processes. In the second part you will add additional processes to the system, and then you again have to optimize the whole system. The system includes basic image processing methods which are described in the theoretical part of this document. A detailed description of the system is given in section 2.1.

1.1 Motivation

The rise in complexity of modern designs has forced system designers to move to higher levels of abstraction above Register Transfer Level (RTL) and traditional cycle-accurate design. Therefore, models such as TLMs that provide a manifold speedup over RTL simulation are being used. However, in order for TLMs to be synthesizable to Hardware (HW) and Software (SW) implementations, they must follow well-defined semantics. These semantics are currently missing in the industry and TLM standards. Moreover, enforcing semantics is not easy with manual modeling.

Secondly, embedded application developers come from a variety of different engineering backgrounds and are not necessarily adept at electronic design. Model automation tools are needed for such developers so that they do not need to learn modeling languages such as SystemC.

Thirdly, businesses that use external suppliers for their embedded system designs need unambiguous executable specifications for design hand-off. An even better proposition would be to build pre-silicon board prototypes in house. This would reduce the chances for miscommunication in requirement specification and lead to a more robust design process. Consequently, tools are required that take abstract applications and platforms and quickly produce fast TLMs and board prototypes.


It is with these challenges in mind that we came up with ESE, which takes the drudgery of manual modeling off system designers. It enables non-experts to create system models and generate board prototypes using a convenient graphical interface.

1.2 Naming conventions and marks

We use some conventions, especially on naming, that are intended to be consistent throughout this document. Manufacturer and product names are formatted in accordance with the standard rules of English grammar, e.g. “This is an example”. Manufacturer and model names are proper nouns, and are therefore written bold and beginning with a capital letter, e.g. MicroBlaze.

This tutorial has some parts that are marked with icons in the margin to help you find important parts or parts you could skip. The following icons are used:

Note: This indicates a note. Notes are used to mark information that could help you, or indicate a possible “weirdness” in a specific lab or in a sub-part of a lab, which is explained with additional information or helpful links.

Warning: This is a warning. In contrast to the notes mentioned above, a warning should be taken more seriously. While ignoring notes will not cause any problems, ignoring warnings could cause problems.

Exercise: This icon marks a section that is intended especially for students. The exercises check the knowledge learnt during the previous steps. It is important to do these parts patiently in order to gain a solid and well-formed basic knowledge of ESE and of working with the ESE Front End.

1.3 ESE Startup and Settings

To start the ESE Front End type the command ese into a terminal.


2 Image Processing Theory

Contents

2.1 Description of the Video Processing System
2.2 PGM Image File Format
  2.2.1 PGM example
2.3 Pixel Operation
  2.3.1 Contrast Enhancement
  2.3.2 Gamma Correction
2.4 Median Filter
  2.4.1 Mathematics
  2.4.2 How it Works
  2.4.3 Guidelines for Use
2.5 Edge Detection with the Sobel Operator
  2.5.1 Sobel Operator
  2.5.2 Mathematics
  2.5.3 Pseudo-Convolution Operator
  2.5.4 Guidelines for Use
2.6 Inverting

In this exercise we implement a video processing system. Since videos are made up of several pictures, we can use classical image processing methods to process the videos. The difference compared to image processing is that your input is a stream of images instead of a single image. This fact allows you to use the advantages of parallelism in multiprocessor and heterogeneous systems.

In the following sections the relevant parts of the huge field of image processing are explained briefly. You should read the first section to get introduced to the system which the exercises are based on. The subsequent sections give details about the different techniques which are used in the system. You can read them when you need the corresponding information to solve the task of an exercise.


2.1 Description of the Video Processing System

The idea behind the system is to process the video in such a manner that you can finally extract features from it. An example is speed or distance measurement of vehicles in a traffic surveillance video.

Unfortunately the input signal is noisy. Therefore the first element in the system is a denoising element in the form of a 3x3 median filter. The next element will extract the features out of the image. This is done with an edge detection algorithm which applies the Sobel operator to the image. The whole scheme is presented in figure 2.1.

Figure 2.1.: Block schema of the video processing system.

Each yellow block in figure 2.1 represents a processing element (PE). In the basic configuration every PE is placed on a CPU (see figure 2.2). Needless to say, this design would not be optimal. Therefore the first exercise is to optimize this multiprocessor design by placing several PEs on one CPU. In addition, the edge detection and the median filter algorithms are not implemented in an optimal way. Therefore two of the exercises are optimizations of the implementations of these algorithms.

Figure 2.2.: Block schema of the video processing system including the CPUs in the basic configuration.


2.2 PGM Image File Format

The PGM file format is part of the group of netpbm formats. These formats were developed in the early 1980s to easily exchange images between platforms. They are also sometimes referred to collectively as the portable anymap format (PNM).1

Each format differs in what colors it is designed to represent:

• PBM is for bitmaps (black and white, no grays)
• PGM is for grayscale
• PPM is for “pixmaps”, which represent full RGB color.

Each file starts with a two-byte file descriptor (in ASCII) that indicates the type of file (PBM, PGM, or PPM) and its encoding (ASCII or binary). The descriptor is a capital P followed by a single-digit number (see table 2.1).

File Descriptor  Type              Encoding
P1               Portable bitmap   ASCII
P2               Portable graymap  ASCII
P3               Portable pixmap   ASCII
P4               Portable bitmap   Binary
P5               Portable graymap  Binary
P6               Portable pixmap   Binary

Table 2.1.: File format descriptors for netpbm formats.

The ASCII-based formats allow for human readability and easy transport to other platforms (so long as those platforms understand ASCII), while the binary formats are more efficient, both at saving space in the file and at parsing, due to the lack of whitespace. When using the binary formats, PBM uses 1 bit per pixel, PGM uses 8 bits per pixel, and PPM uses 24 bits per pixel: 8 for red, 8 for green, 8 for blue.

2.2.1 PGM example

The PGM and PPM formats (both ASCII and binary versions) have an additional parameter for the maximum value (the number of gray levels between black and white) after the X and Y dimensions and before the actual pixel data. Black is 0 and the maximum value is white. There is a newline character at the end of each line.

1http://en.wikipedia.org/wiki/Netpbm_format


(a) Content of an ASCII PGM file

(b) Image representation of the content above.

Figure 2.3.: PGM example.

2.3 Pixel Operation

Operations which can be applied to a pixel without knowing the values of its neighbors are so-called pixel operations. In the image processing exercise we use two pixel operations:

contrast enhancement Stretch the gray values of the image to the full range of 256 values

of the 8-bit gray-level.

gamma correction Is used to code and decode luminance or tristimulus values in video or

image systems.

These two operations are described in the following subsections.

2.3.1 Contrast Enhancement

Frequently, an image is scanned in such a way that the resulting brightness values do not make full use of the available dynamic range. This can be easily observed in the histogram of the brightness values. By stretching the histogram over the available dynamic range we attempt to correct this situation. If the image is intended to go from brightness 0 to brightness 2^B − 1, then one generally maps the 0% value (or minimum) to the value 0 and the 100% value (or maximum) to the value 2^B − 1. The appropriate transformation is given by:

HuCE-microLab 11

2.3. Pixel Operation

b_{m,n} = (2^B − 1) · (a_{m,n} − minimum) / (maximum − minimum)

Figure 2.4.: The left image shows the original picture. In the right image, the contrast is enhanced to the maximum possible without any loss.

It is also possible to apply the contrast-stretching operation on a regional basis, using the histogram from a region to determine the appropriate limits for the algorithm.
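The transformation above can be sketched in C for an 8-bit image (B = 8). The function name and the flat-buffer layout are our own choices:

```c
#include <stdint.h>
#include <stddef.h>

/* Stretch the gray values of an 8-bit image to the full 0..255 range:
 * b = (2^B - 1) * (a - minimum) / (maximum - minimum), with B = 8. */
void contrast_stretch(uint8_t *pixels, size_t n)
{
    if (n == 0)
        return;

    /* First pass: find the minimum and maximum brightness. */
    uint8_t min = pixels[0], max = pixels[0];
    for (size_t i = 1; i < n; i++) {
        if (pixels[i] < min) min = pixels[i];
        if (pixels[i] > max) max = pixels[i];
    }
    if (min == max)   /* flat image: nothing to stretch */
        return;

    /* Second pass: map [min, max] linearly onto [0, 255]. */
    for (size_t i = 0; i < n; i++)
        pixels[i] = (uint8_t)((255u * (pixels[i] - min)) / (max - min));
}
```

Two passes are needed because the minimum and maximum must be known before any pixel can be remapped.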

2.3.2 Gamma Correction

Gamma correction, gamma non-linearity, gamma encoding, or often simply gamma, is the name of a nonlinear operation used to code and decode luminance or tristimulus values in video or still image systems. Gamma correction is, in the simplest cases, defined by the following power-law expression:

V_{out} = V_{in}^γ

where the input and output values are non-negative real values, typically in a predetermined range such as 0 to 1. A gamma value γ < 1 is sometimes called an encoding gamma, and the process of encoding with this compressive power-law non-linearity is called gamma compression; conversely, a gamma value γ > 1 is called a decoding gamma, and the application of the expansive power-law non-linearity is called gamma expansion. The appropriate transformation is given by:

b_{m,n} = (2^B − 1) · (a_{m,n} / (2^B − 1))^γ


Figure 2.5.: Example of CRT gamma correction.

Figure 2.6.: Gamma correction demonstration: each panel shows the display gamma that the pixel values have been adjusted for; for example, the pixels in the second panel are proportional to intensity to the 1/2 power, so the image looks approximately correct on a typical PC monitor.

2.4 Median Filter

The median filter is normally used to reduce noise in an image, somewhat like the mean

filter. However, it often does a better job than the mean filter of preserving useful detail in

the image.2

2http://homepages.inf.ed.ac.uk/rbf/HIPR2/median.htm


2.4.1 Mathematics

The median x̃ of an ordered sample set (x_1, x_2, ..., x_n) of n measured values is

x̃ = x_{(n+1)/2}                   if n is odd,
x̃ = (x_{n/2} + x_{n/2+1}) / 2     if n is even.

2.4.2 How it Works

Like the mean filter, the median filter considers each pixel in the image in turn and looks at

its nearby neighbors to decide whether or not it is representative of its surroundings. Instead

of simply replacing the pixel value with the mean of neighboring pixel values, it replaces it

with the median of those values. The median is calculated by first sorting all the pixel values

from the surrounding neighborhood into numerical order and then replacing the pixel being

considered with the middle pixel value. (If the neighborhood under consideration contains

an even number of pixels, the average of the two middle pixel values is used.) Figure 2.7 illustrates an example calculation.

Neighbourhood values: 115, 119, 120, 123, 124, 125, 126, 127, 150

Median value: 124

Figure 2.7.: Calculating the median value of a pixel neighborhood. As can be seen, the central pixel value of 150 is rather unrepresentative of the surrounding pixels and is replaced with the median value 124. A 3×3 square neighborhood is used here; larger neighborhoods will produce more severe smoothing.

2.4.3 Guidelines for Use

By calculating the median value of a neighborhood rather than the mean, the median filter has two main advantages over the mean filter:

• The median is a more robust average than the mean, and so a single very unrepresentative pixel in a neighborhood will not affect the median value significantly.


• Since the median value must actually be the value of one of the pixels in the neighborhood, the median filter does not create new unrealistic pixel values when the filter straddles an edge. For this reason the median filter is much better at preserving sharp edges than the mean filter.

In general, the median filter allows a great deal of high spatial frequency detail to pass while

remaining very effective at removing noise on images where less than half of the pixels in

a smoothing neighborhood have been affected. (As a consequence of this, median filtering

can be less effective at removing noise from images corrupted with Gaussian noise.)

One of the major problems with the median filter is that it is relatively expensive and complex to compute. To find the median it is necessary to sort all the values in the neighborhood

into numerical order and this is relatively slow, even with fast sorting algorithms such as

quicksort. The basic algorithm can, however, be enhanced somewhat for speed. A common

technique is to notice that when the neighborhood window is slid across the image, many of

the pixels in the window are the same from one step to the next, and the relative ordering of

these with each other will obviously not have changed. Clever algorithms make use of this

to improve performance.

2.5 Edge Detection with the Sobel Operator

Edge detection is a term in image processing and computer vision, particularly in the areas of feature detection and feature extraction, referring to algorithms which aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities.3

The purpose of detecting sharp changes in image brightness is to capture important events

and changes in properties of the world. It can be shown that under rather general assumptions

for an image formation model, discontinuities in image brightness are likely to correspond

to

• discontinuities in depth,
• discontinuities in surface orientation,
• changes in material properties and
• variations in scene illumination.

In the ideal case, the result of applying an edge detector to an image may lead to a set of connected curves that indicate the boundaries of objects, the boundaries of surface markings, as well as curves that correspond to discontinuities in surface orientation. Thus, applying an

3http://en.wikipedia.org/wiki/Edge_detection


edge detector to an image may significantly reduce the amount of data to be processed and

may therefore filter out information that may be regarded as less relevant, while preserving

the important structural properties of an image. If the edge detection step is successful, the

subsequent task of interpreting the information contents in the original image may therefore

be substantially simplified.

2.5.1 Sobel Operator

The Sobel operator is used in image processing, particularly within edge detection algorithms. Technically, it is a discrete differentiation operator, computing an approximation of

the gradient of the image intensity function. At each point in the image, the result of the

Sobel operator is either the corresponding gradient vector or the norm of this vector. The

Sobel operator is based on convolving the image with a small, separable, and integer valued

filter in horizontal and vertical direction and is therefore relatively inexpensive in terms of

computations. On the other hand, the gradient approximation which it produces is relatively

crude, in particular for high frequency variations in the image.4

(a) A color picture of a steam engine. (b) The Sobel operator applied to that image.

Figure 2.8.: Example of edge detection with the Sobel operator.

In simple terms, the operator calculates the gradient of the image intensity at each point, giving the direction of the largest possible increase from light to dark and the rate of change in

that direction. The result therefore shows how “abruptly” or “smoothly” the image changes

at that point, and therefore how likely it is that that part of the image represents an edge, as

well as how that edge is likely to be oriented. In practice, the magnitude (likelihood of an

edge) calculation is more reliable and easier to interpret than the direction calculation.

4http://en.wikipedia.org/wiki/Sobel_operator


2.5.2 Mathematics

Mathematically, the gradient of a two-variable function (here the image intensity function) is

at each image point a 2D vector with the components given by the derivatives in the horizontal and vertical directions. At each image point, the gradient vector points in the direction of

largest possible intensity increase, and the length of the gradient vector corresponds to the

rate of change in that direction. This implies that the result of the Sobel operator at an image

point which is in a region of constant image intensity is a zero vector and at a point on an

edge is a vector which points across the edge, from darker to brighter values.

Mathematically, the operator uses two 3x3 kernels which are convolved with the original

image to calculate approximations of the derivatives - one for horizontal changes, and one

for vertical. If we define A as the source image, and Gx and Gy are two images which at

each point contain the horizontal and vertical derivative approximations, the computations

are as follows:

G_y = | -1 -2 -1 |             G_x = | +1  0 -1 |
      |  0  0  0 | * A               | +2  0 -2 | * A
      | +1 +2 +1 |                   | +1  0 -1 |

where ∗ here denotes the 2-dimensional convolution operation.

The x-coordinate is here defined as increasing in the “right”-direction, and the y-coordinate

is defined as increasing in the “down”-direction. At each point in the image, the resulting

gradient approximations can be combined to give the gradient magnitude, using:

G = √(G_x^2 + G_y^2)

In practice, the gradient magnitude is typically approximated using the following computation:

|G| = |G_x| + |G_y|

Using this information, we can also calculate the gradient’s direction:

Θ = arctan(G_y / G_x)

where, for example, Θ is 0 for a vertical edge which is darker on the left side.


2.5.3 Pseudo-Convolution Operator

Often, the absolute magnitude is the only output the user needs; the two components of the gradient are conveniently computed and added in a single pass over the input image using the pseudo-convolution operator G_p.

G_p = | P1 P2 P3 |
      | P4 P5 P6 |
      | P7 P8 P9 |

Using this kernel the approximate magnitude is given by:

|G| = |(P1 + 2·P2 + P3) − (P7 + 2·P8 + P9)| + |(P3 + 2·P6 + P9) − (P1 + 2·P4 + P7)|
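A sketch of this pseudo-convolution in C, computing the approximate gradient magnitude at one pixel of a row-major 8-bit image. The function name and the saturation to 255 are our own choices:

```c
#include <stdint.h>
#include <stdlib.h>

/* Approximate Sobel magnitude at pixel (x, y) of a w-pixel-wide image.
 * (x, y) must not lie on the image border. */
uint8_t sobel_magnitude(const uint8_t *img, int w, int x, int y)
{
    /* P1..P9 laid out as in the pseudo-convolution operator Gp. */
    const uint8_t *p = img + (y - 1) * w + (x - 1);
    int p1 = p[0],         p2 = p[1],         p3 = p[2];
    int p4 = p[w],                            p6 = p[w + 2];
    int p7 = p[2 * w],     p8 = p[2 * w + 1], p9 = p[2 * w + 2];

    /* |G| = |(P1+2P2+P3)-(P7+2P8+P9)| + |(P3+2P6+P9)-(P1+2P4+P7)| */
    int gy = (p1 + 2 * p2 + p3) - (p7 + 2 * p8 + p9);
    int gx = (p3 + 2 * p6 + p9) - (p1 + 2 * p4 + p7);
    int g  = abs(gx) + abs(gy);

    return (uint8_t)(g > 255 ? 255 : g);   /* saturate, see section 2.5.4 */
}
```

For a pixel on an ideal black-to-white vertical step edge the raw sum is 1020, which saturates to 255, as discussed in the guidelines below.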

2.5.4 Guidelines for Use

The Sobel operator is slower to compute than other operators, but its larger convolution

kernel smooths the input image to a greater extent and so makes the operator less sensitive

to noise. The operator also generally produces considerably higher output values for similar

edges, compared with the others.

As with other operators, output values from the operator can easily overflow the maximum

allowed pixel value for image types that only support smallish integer pixel values (e.g. 8-bit

integer images). When this happens the standard practice is to simply set overflowing output

pixels to the maximum allowed value. The problem can be avoided by using an image type

that supports pixel values with a larger range.

Natural edges in images often lead to lines in the output image that are several pixels wide

due to the smoothing effect of the Sobel operator. Some thinning may be desirable to counter

this. Failing that, some sort of hysteresis ridge tracking could be used, as in the Canny operator.5

2.6 Inverting

Inverting means “reversing the colors”. Since we are working with grayscale pictures, this means exchanging black with white and so on. Inverting is a pixel operation; therefore, it is applied to every point of the image regardless of its neighbours.

5You can find more information about this topic in R. Gonzalez and R. Woods, Digital Image Processing, Addison-Wesley, 3rd Edition, 2008, Chapter 3.


To calculate the inverted value of a pixel, you have to subtract the current value from the absolute maximum. In our example (8-bit PGM) the maximum is 2^8 − 1 = 255.
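As a short sketch, assuming the same 8-bit flat-buffer layout as in the earlier examples:

```c
#include <stdint.h>
#include <stddef.h>

/* Invert an 8-bit grayscale image: b = 255 - a for every pixel. */
void invert(uint8_t *pixels, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pixels[i] = (uint8_t)(255 - pixels[i]);
}
```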


3 Exercises Image Processing

Contents

3.1 Exercise Nr 1: Create the ESE-Model with one CPU
3.2 Exercise Nr 2: Split the program into four processes
  3.2.1 Verify the reading and writing processes
  3.2.2 Modify the C code for the image processing
  3.2.3 Use one processor with four processes
  3.2.4 Use four processors with one process per processor
3.3 Exercise Nr 3: Insertion of parallelism in the processes
3.4 Exercise Nr 4: Optimize the communication
3.5 Discussion

Welcome to the 21st century! The Digital Revolution which began around the 1980s continues into the present. You have just begun your new position in a high-tech company providing video processing solutions for big companies such as Samsung or LG. Your first task is to implement some basic image processing, which will then be used for the video processing.

Welcome to the 21st century! The algorithms for image processing are well known and you won’t get anybody’s attention just by implementing yet another contrast-enhancement algorithm. In fact, everybody’s expectation is to get this simple task done in a matter of days, not weeks. Your idea of some fancy VHDL design task just went down the sink...

Welcome to the 21st century! Synthesis tools help you out of your misery. The standard way to implement any image/signal processing algorithm is to hack in some code and to test it on a model. You’ll then get an estimation of the computation power needed. Maybe you don’t need your fancy 20$-FPGA, and a 1$-microcontroller will do just fine. Or maybe your ARM Cortex-M3 is not as powerful as expected and you’ll need an ARM Cortex-M4. Who would have guessed? Maybe your future boss with his solid 10 years of experience, but he will most probably have to worry about other things than the computation power of the new ARM Cortex-M7...

Welcome to the 21st century! Do you want your algorithm implemented in hardware, software, or both? That of course depends on the computation power needed. Luckily, you know this number since you know how to model it! Our model begins with a simple sequential program flow; this simply means that you handle one task after the other. In the following exercise we will implement two simple image processing tasks. In a first task, we


stretch the contrast of the picture and in a second task we apply a gamma correction to the

image.

The goal of this exercise is to get an idea of the points you have to pay attention to when writing code for processes which have to run in parallel.


3.1 Exercise Nr 1: Create the ESE-Model with one CPU

In this exercise we will debug a program with Eclipse. We use Eclipse because ESE does not supply a syntax check when changing source-file code, which makes code development very time-consuming. Please proceed as follows:

1. Download the source files from the microLab website (Course SoC, File: TBZ2 (Image Ex.1)).

2. Type the following commands into your terminal:

mkdir ~/ese_exercise ~/ese_exercise/ex1

cd ~/ese_exercise/ex1

tar -xf ~/Downloads/imageprocessing_ex1.tar.bz2

3. Start up Eclipse for C/C++ development with the command eclipse in a terminal. Let your workspace point to ~/ese_exercise.

4. Import the unpacked project into your workspace.6

Examine the code and make sure that your project builds. As ESE uses slightly different build options compared to Eclipse, there is a preprocessor definition which distinguishes between ESE and Eclipse: search for the line #define MODEL in the file global.h. If you want to model the code in Eclipse, make sure that MODEL is defined. If you want to use it in ESE, erase (or better, comment) the line.

Once the project builds, we can close Eclipse (do not forget to comment the line #define MODEL in the file global.h). Start ESE. For this exercise, we will start with a blank project. We can now add a MicroBlaze SW processor, and use the settings as shown in Table 3.1.

CPU   Process          Process Ports  .c/.h-Files
CPU0  imageprocessing  -              ReadPGM_aux.c
                                      WritePGM_aux.c
                                      pixel_operation.c
                                      ReadPGM_aux.h
                                      WritePGM_aux.h
                                      global.h
                                      landschaft.h

Table 3.1.: Processor settings for Exercise Nr 1

6You can find a detailed description in the appendix.


Next, you can generate and simulate a functional TLM. It should work without errors and

two images should pop up ("landschaft.pgm" and "image_processed.pgm").

The image processing should be clearly visible (enhanced contrast and a brighter image). Next, run the timed TLM and note the number of cycles used to execute the entire program.


3.2 Exercise Nr 2: Split the program into four processes

In this exercise, you have to split the program into four independent processes which are

communicating over the ESE FIFO channel interface.

Proceed as in exercise 1, but download and extract the file TBZ2 (Image Ex.1) into a new folder ~/ese_exercise/ex2. Several steps have to be taken:

1. Verify the reading and writing processes

2. Modify the c-code for the image processing

3. Use one processor with four processes

4. Use four processors with one process per processor

3.2.1 Verify the reading and writing processes

It is always wise in engineering tasks to tackle one problem at a time. This reduces any new bugs to a well-known source and thus makes debugging much easier. Therefore, we will first divide our program into a read process and a write process for the image files. No image processing is done for the moment.

Open a terminal, cd to your newly created directory and open ese. Create a new design, save it as ex2_rw.eds, and add a MicroBlaze processor and the following processes:

CPU   Process   Process Ports                      .c/.h-Files
CPU0  readpgm   r2c_if (blocking_write, send_r2c)  readpgm.c
                                                   ReadPGM_aux.c
                                                   ReadPGM_aux.h
                                                   global.h
                                                   landschaft.h
CPU0  writepgm  g2w_if (blocking_read, recv_g2w)   writepgm.c
                                                   WritePGM_aux.c
                                                   WritePGM_aux.h
                                                   global.h

Table 3.2.: Processor settings for Exercise Nr 2-rw

Next, we have to add a communication channel and need to calculate the FIFO buffer size. Examine the C code (file readpgm.c) and calculate the FIFO size by yourself. Add 100 extra bytes in order to avoid any nasty out-of-array effects. Discuss the buffer size with your


colleague. Once you are done, you can add a new channel (Channel type: FIFO Channel, Name: r2w, Size: <your size>, Writer: readpgm, Port: r2c_if, Reader: writepgm, Port: g2w_if, Mapping: LOCAL ACCESS, Route: LOCAL ACCESS).
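To make the size reasoning concrete, here is a minimal sketch of the calculation. The image dimensions used below (600×400, 8-bit) are an assumption for illustration only; the real values must come from readpgm.c.

```c
/* FIFO size sketch.  The 600x400 8-bit image is a hypothetical example;
 * read the real dimensions from readpgm.c. */
#define SAFETY_PAD 100   /* extra bytes against out-of-array effects */

static unsigned fifo_size_bytes(unsigned width, unsigned height)
{
    /* one byte per 8-bit pixel, whole image buffered at once */
    return width * height + SAFETY_PAD;
}
```

For the hypothetical 600×400 image this gives 600 * 400 + 100 = 240100 bytes.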

Next, you can generate and simulate a functional TLM. It should work without errors and

two images should pop up ("landschaft.pgm" and "image_processed.pgm").

They should look exactly the same. Next, run the timed TLM and note the number of cycles used to execute the entire program. Close ESE afterwards.

3.2.2 Modify the c-code for the image processing

After making sure that you can read and write the image files, you now have to finish the two

image processing processes (gamma.c and contrast.c). The read and write processes

are already done, so you do not need to modify any other files. Open Eclipse, uncomment the line #define MODEL in the file global.h and make sure that your project

builds. Once you run the binary (right-click on "pixel_operation" in the binaries-folder and

select "run"), two images should pop up ("landschaft.pgm" and "image_processed.pgm").

The second picture is black.

Your job is to modify the code in the files gamma.c and contrast.c so that the second image is not black, but a nice landscape with perfect contrast and brightness. The job consists of two tasks: firstly, you have to handle the communication (i.e. the FIFO receive and send code snippets); secondly, you have to do the contrast and gamma enhancement. As in all engineering tasks, it is wise to tackle one problem at a time.

(Hint: Have a look at the files readpgm.c and writepgm.c to get an idea of the communication, and then have a look at the code of the first exercise to get an idea of the image processing tasks.)

3.2.3 Use one processor with four processes

If you have tested the C model successfully in Eclipse, we will test it in ESE. Do not forget to first comment out the line #define MODEL in the file global.h before switching to ESE.

Open ESE, open your old design ex2_rw.eds and save it as ex2_1processor.eds.

Afterwards, add the following processes:


CPU    Process      Process ports                       .c/.h-files
CPU0   readpgm      r2c_if (blocking_write, send_r2c)   readpgm.c, ReadPGM_aux.c,
                                                        ReadPGM_aux.h, global.h,
                                                        landschaft.h
CPU0   contrast     r2c_if (blocking_read, recv_r2c)    contrast.c, global.h
                    c2g_if (blocking_write, send_c2g)
CPU0   gammavalue   c2g_if (blocking_read, recv_c2g)    gamma.c, global.h
                    g2w_if (blocking_write, send_g2w)
CPU0   writepgm     g2w_if (blocking_read, recv_g2w)    writepgm.c, WritePGM_aux.c,
                                                        WritePGM_aux.h, global.h

Table 3.3.: Processor settings for Exercise Nr 2 - one processor

Do not forget to add an RTOS to the processor, as we are running multiple processes. Now add the communication channels. All channels use the same buffer size which you calculated earlier. You should manage to figure out the communication channels by yourself; if not, ask your colleague.

Next, you can generate and simulate a functional TLM. It should work without errors and

two images should pop up ("landschaft.pgm" and "image_processed.pgm").

The image processing should be clearly visible (enhanced contrast and brighter). Next, run the timed TLM and note the number of cycles used to execute the entire program.

3.2.4 Use four processors with one process per processor

In the next design, we will use four processors with one process per processor. Open ESE, open your old design ex2_rw.eds and save it as ex2_4processors.eds. Add 3 more processors, an OPB bus and a transducer. The transducer is needed as a shared memory element for the FIFOs and to make it possible for one master (processor) to talk to another master (processor). Therefore you have to add the transducer to your system as a slave. You


should manage to figure out the processes and communication channels by yourself; if not, ask your colleague.

Once you generate a design with an OPB bus in the ESE tool, you will run into an error. The error message is: "Address not set in: CE0 in bus: Bus". To solve this problem, you have to enter the lower address in the properties of the bus. However, the ESE tool has a bug which makes it impossible to enter this address. As a workaround, close your design and open the *.eds file in a text editor. There, search for the following content:

<TXENTRY name = "CE0 ESETX_PORT_0"/>

and add the value for the requested address to it:

<TXENTRY name = "CE0 ESETX_PORT_0" lowaddr = "0x00200000"/>

After reopening the design in the ESE tool, you should see this address in the properties of

the OPB bus.

Next, you can generate and simulate a functional TLM. It should work without errors and

two images should pop up ("landschaft.pgm" and "image_processed.pgm").

The image processing should be clearly visible (enhanced contrast and brighter). Next, run the timed TLM and note the number of cycles used to execute the entire program. Close ESE afterwards.


3.3 Exercise Nr 3: Insertion of parallelism in the processes

As you have discovered, the execution time of the program in the design with 4 CPUs is not decreased (compared to the first one). This happens because each process is blocked until the previous one has finished its task. The only thing you have introduced up to now is communication overhead. Worst trade deal in the history of trade deals, maybe ever. Wiser men would conclude that transferring the whole image at once is a bad idea. The processes should therefore be parallelized such that lines are sent to the next process immediately after processing. In such a system, the processors can work in parallel, and the execution time of the whole task will decrease significantly.
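The intended block-wise structure can be sketched as follows. The image geometry (600 pixels per line as stated later in the text, 400 lines chosen for illustration) and the dummy per-pixel operation are assumptions, and the ESE FIFO calls are modelled here by plain input/output pointers, not the real API.

```c
#include <stddef.h>

/* Assumed geometry for illustration; check global.h for real values. */
#define LINES        400
#define LINE_LEN     600
#define BLOCK_LINES  40

/* Placeholder for the per-pixel work done in contrast.c / gamma.c. */
static unsigned char enhance(unsigned char p)
{
    return (unsigned char)(255 - p);   /* dummy operation for the sketch */
}

/* Receive one block of 40 lines, process it, and forward it at once so
 * the next processor can already start working on it. */
static void process_blockwise(const unsigned char *in, unsigned char *out)
{
    size_t block = (size_t)BLOCK_LINES * LINE_LEN;
    size_t n, i;
    for (n = 0; n < LINES / BLOCK_LINES; n++) {
        const unsigned char *src = in  + n * block;  /* "recv" one block */
        unsigned char       *dst = out + n * block;
        for (i = 0; i < block; i++)
            dst[i] = enhance(src[i]);
        /* "send" the processed block here, before the next one arrives */
    }
}
```

In the real processes, the "recv"/"send" comments correspond to the blocking FIFO reads and writes on the r2c/c2g/g2w channels.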

In a real situation, you would parallelize all four processes. However, in the simulation tool we are using, disk read/write cycles are excluded from the simulation. Thus, you do not need to parallelize the I/O processes (components readpgm and writepgm), but you can focus on the two image processing processes (files gamma.c and contrast.c). You have to change them so that they receive, process and send smaller blocks of 40 lines. To avoid problems with ESE, the recommended way to handle the copy-pasting is as follows:

1. Type the following commands into your terminal:

mkdir ~/ese_exercise/ex3

cd ~/ese_exercise/ex3

cp -r ../ex2/src ../ex2/image ../ex2/Debug \

../ex2/.project ../ex2/.cproject .

2. Start up Eclipse for C/C++ development with the command eclipse on a terminal. Let your workspace point to ~/ese_exercise.

3. Import the unpacked project into your workspace.7

Uncomment the line #define MODEL and hack in your modifications. If your code is running correctly in the model, close Eclipse and comment out the line #define MODEL prior to opening ESE. Then proceed in ESE as follows:

1. Open ESE and open your old design ex2_4processors.eds. Save it in your new

folder ~/ese_exercise/ex3 as ex3_4processors.eds.

2. Change the modified sources to point to the new location (e.g. delete the contrast.c file in the process contrast and re-add the new file ~/ese_exercise/ex3/src/contrast.c, and do the same for gamma.c)

7 You will find a detailed description in the appendix


Generate and run the timed TLM model, take another look at the execution time, and compare it with the times from above.


3.4 Exercise Nr 4: Optimize the communication

As a last step in this exercise, we want to optimize the communication. In every communication scenario you have to decide on the block size that you want to transfer in one package. There is always a tradeoff between large packages (less overhead, long reservation time of the bus) and small packages (more overhead, short reservation time of the bus). Transferring 40 lines (with 600 pixels each) seems close to the "large package" case. The opposite extreme would be to transfer only one pixel per block, which would produce a huge communication overhead. As always, the optimum lies somewhere between the two extremes.
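The tradeoff can be illustrated with a toy cost model. All numbers below are assumptions for illustration, not measured ESE values: a 4-stage pipeline over 400 lines, a fixed overhead of 10 time units per package, and 1 unit per transferred line.

```c
/* Toy pipeline cost model for the block-size tradeoff.  Total time is
 * roughly (number of blocks + pipeline fill) * (time per block). */
#define TOTAL_LINES 400
#define STAGES      4
#define OVERHEAD    10
#define LINE_COST   1

static long pipeline_cost(long lines_per_block)
{
    long nblocks = TOTAL_LINES / lines_per_block;          /* packages  */
    long t_block = OVERHEAD + lines_per_block * LINE_COST; /* per block */
    return (nblocks + STAGES - 1) * t_block;               /* fill + run */
}
```

With these made-up parameters, one line per block costs 403 * 11 = 4433 units, the whole image at once 4 * 410 = 1640 units, and 40-line blocks only 13 * 50 = 650 units; the optimum sits between the two extremes, exactly as argued above.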

In order not to break your design of exercise nr 3, we recommend creating a new folder (ex4) and doing the same copy-pasting you just did.

Then try to transfer a few lines, or only one line, per block and analyze the performance of the system. We are interested in the total execution cycles, so focus on the output of the terminal.

If you have coded in an intelligent way, you only need to change the value of the symbol LINES_PER_BLOCK in the global header file to change the number of transferred lines.


3.5 Discussion

Discuss the following questions with your colleague:

1. Which design has the fewest total execution cycles?

2. Why is the one-processor design of exercise 2 performing so badly?

3. To gain speed, we could parallelize the particular algorithms. In such a case, processor 1 would process all odd-numbered packages and processor 2 all even-numbered packages, so each processor would only need to process half of the total data. Is this a good idea? Guess the new total execution cycles.

4. In the tasks above, we only optimized for total execution cycles. What are other goals and resources that your future boss could be interested in? Is the final design (four MicroBlaze softcore processors implemented in an FPGA) the best design that you could think of? What could be another interesting design to investigate? How could you save resources?


4 Exercise Video Processing

Contents

4.1 Exercise Nr 1: Optimize the Load Balance 32

4.2 Exercise Nr 2: Add an Inverter to the System 33

4.3 Exercise Nr 3: Reduce Edge Detection to a Single Process 34

4.4 Exercise Nr 4: Optimize the Median Filter 35

4.5 Outlook 36

4.1 Exercise Nr 1: Optimize the Load Balance

Download the source files from the microLab website (Course SoC, File: TGZ (Video)).

Copy the source to your desired destination in the same way as you did in the previous

exercise.

Figure 4.1.: Block schema of the system.

Try to optimize the load balance between the CPUs. In the basic configuration, which you can download from the website, every process has its own CPU. This is the simplest but not the most efficient approach in terms of resource usage. First, have a look at the charts which show you the percentage of time used for computing. Use the Xilinx RTOS (xilkernel) to run more than one process on a CPU, as you have seen in Lab 2 of the tutorial. Take care of the RTOS overhead that occurs due to the context switches between the different tasks. This value increases non-linearly with the number of processes that you run on a CPU.


4.2 Exercise Nr 2: Add an Inverter to the System

[Figure: processes readpgm, median, sobelhoriz, sobelvert, gradient, inverter, writepgm]

Figure 4.2.: New block schema after adding an inverter to the system.

To be able to test changes to the code behind the processes, we need a model. The additional functions needed to run the ESE code are already integrated in the project. You can enable/disable them by uncommenting the line #define MODEL. To be able to edit and test the C model in Eclipse, you have to execute the following steps:

1. Start up Eclipse for C/C++ development with the command eclipse on a terminal. Let your workspace point to ~/ese_videoexercise.

2. Import the downloaded project into your workspace.

Now you are able to debug your program when you implement new features. Have a look at the code of the existing components to get used to the communication interface used in ESE and to the programming style used for a PE. For the theory about image inverting, see section 2.6.

When you have implemented and tested the inverter successfully, you can close Eclipse; you then have to rearrange your design in ESE to match the changed source code. As a last step, optimize the new system again to reach a good load balance.


4.3 Exercise Nr 3: Reduce Edge Detection to a Single Process

Figure 4.3.: New block schema after combining the three edge detection processes into one single process.

Combine the three edge detection processes into one single process by using the pseudo-convolution operator (see subsection 2.5.3).

Hint: You have to create a new filter function which does this calculation and apply it to the frame by using the Filter3x3C() function.
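A standalone sketch of such a filter function follows, using the standard Sobel kernels and the pseudo-convolution |G| = |Gx| + |Gy| from subsection 2.5.3. The 3x3 neighbourhood is passed here as a flat row-major array; the real function must match the Filter3x3C() signature, which may differ from this form.

```c
/* Combined Sobel edge step on one 3x3 neighbourhood p[0..8] (row-major):
 *   Gx kernel: -1 0 1 / -2 0 2 / -1 0 1
 *   Gy kernel:  1 2 1 /  0 0 0 / -1 -2 -1
 * The pseudo-convolution |Gx| + |Gy| avoids the costly square root. */
static int sobel_pseudo(const unsigned char p[9])
{
    int gx = (p[2] + 2 * p[5] + p[8]) - (p[0] + 2 * p[3] + p[6]);
    int gy = (p[0] + 2 * p[1] + p[2]) - (p[6] + 2 * p[7] + p[8]);
    int g  = (gx < 0 ? -gx : gx) + (gy < 0 ? -gy : gy);
    return g > 255 ? 255 : g;   /* clamp to the 8-bit pixel range */
}
```

A flat region yields 0, while a strong vertical edge saturates to 255, which is exactly the behaviour the separate sobelhoriz/sobelvert/gradient chain produced before.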


4.4 Exercise Nr 4: Optimize the Median Filter

As you have seen in the previous exercises, the quicksort part of the median filter is the limiting element for the whole system. There are other algorithms than sorting to find the median. Look on the internet8 for a faster algorithm and implement one.

Hint: Again, you only have to edit the filter function.
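One possible direction, sketched standalone: a selection that stops as soon as the median slot is filled, so at most five of the nine positions are ever sorted. This is only one option; the sorting networks on the referenced website are faster still.

```c
/* Median of 9 pixels without a full quicksort: partial selection sort
 * that stops once v[4] (the median slot) holds the 5th-smallest value. */
static unsigned char median9(const unsigned char in[9])
{
    unsigned char v[9];
    int i, j;
    for (i = 0; i < 9; i++)
        v[i] = in[i];
    for (i = 0; i <= 4; i++) {            /* only up to the median slot */
        int min = i;
        for (j = i + 1; j < 9; j++)
            if (v[j] < v[min])
                min = j;
        unsigned char t = v[i];
        v[i] = v[min];
        v[min] = t;
    }
    return v[4];
}
```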

Figure 4.4.: Improving the speed of the median filter.

8For example: http://ndevilla.free.fr/median/median/index.html


4.5 Outlook

As you have seen in the exercises, the median filter algorithm is the computationally intensive element. The next step is now to implement this component, or at least the median algorithm, in a hardware block. The tool provides different PEs from its library to do this:

• NISC (No-Instruction-Set Computer)

• Forte (ESL9 Synthesis Tool)

This way, you are able to speed up your design by an order of magnitude or even more.

9Electronic system level


A Using Eclipse

A.1 Importing an Existing Project

To import an existing project into your current workspace, you have to apply the following

steps:

1. Open the menu "File" ⇒ "Import". The dialog shown in figure A.1 then appears.

2. Choose “Existing Projects into Workspace” out of the list.

3. Select the downloaded archive file. The included project is automatically chosen. (See

figure A.2)

4. Click the Finish-button to import the project.

Figure A.1.: Select an import source.


Figure A.2.: Select the root directory where you unpacked the downloaded templates.

A.2 Debugging a Functional or Timed TLM in Eclipse with GDB

A.2.1 Import the Model

If you get segmentation faults while running your transaction-level models, you normally have a problem in your memory management. Therefore, you want to know where your program crashes, which requires the ability to debug your code. Execute the following steps to import the TLM into the Eclipse IDE and use GDB to debug the program:

1. As the first step, you have to create a new C Project. (See figure A.3 and A.4)


Figure A.3.: Open the project creating dialog.

Figure A.4.: Create a new C Project.


Figure A.5.: Select “Makefile project” as project type.

2. You have to select “Makefile project” as project type, because the TLM is built with a

Makefile with the target tlm. (See figure A.5)


Figure A.6.: Copy the TLM into the project folder.

3. Copy all the files in the {Project_name}_timed_TLM folder into your Eclipse project folder. (See figure A.6)


Figure A.7.: Create a new make-target.

4. Create a new make target called tlm in the "Make Target" view. If you don't have this view, you can open it via the menu "Window" ⇒ "Show View" ⇒ "Other...". Type "make" into the filter mask and then select Make Targets. (See figure A.7)


Figure A.8.: Enter “tlm” as target name.

Figure A.9.: Change the build target in the properties.

5. As a last step, you have to change the properties of your project. Right-click on the project name and select "Properties" in the context menu that appears. Then change the Make build targets for saving and building in the "Behavior" tab of the "C/C++ Build" options. (See figures A.8 and A.9)

Now you can debug your generated TLMs in the same way as you are doing it for the other

projects. If you made some changes in the code, you have to rebuild the TLM with the ESE

tool and copy the files again into the eclipse project folder.


A.2.2 Common Problems

Wrong project name: If the name of your Eclipse project is not equal to the ESE project name, make can't find the resources. The solution is to rename the Eclipse project or to change the folder name in the Makefile.
