Embedded System Environment
Exercises
Contact:
Bern University of Applied Sciences
Engineering and Information Technology
HuCE-microLab
Quellgasse 21
CH-2501 Biel
Website HuCE-microLab
www.microlab.ch
Website BFH-TI
www.ti.bfh.ch
Biel - March 20, 2018
Contents
1 Introduction
1.1 Motivation
1.2 Naming conventions and marks
1.3 ESE Startup and Settings
2 Image Processing Theory
2.1 Description of the Video Processing System
2.2 PGM Image File Format
2.2.1 PGM example
2.3 Pixel Operation
2.3.1 Contrast Enhancement
2.3.2 Gamma Correction
2.4 Median Filter
2.4.1 Mathematics
2.4.2 How it Works
2.4.3 Guidelines for Use
2.5 Edge Detection with the Sobel Operator
2.5.1 Sobel Operator
2.5.2 Mathematics
2.5.3 Pseudo-Convolution Operator
2.5.4 Guidelines for Use
2.6 Inverting
3 Exercises Image Processing
3.1 Exercise Nr 1: Create the ESE-Model with one CPU
3.2 Exercise Nr 2: Split the program into four processes
3.2.1 Verify the reading and writing processes
3.2.2 Modify the c-code for the image processing
3.2.3 Use one processor with four processes
3.2.4 Use four processors with one process per processor
3.3 Exercise Nr 3: Insertion of parallelism in the processes
3.4 Exercise Nr 4: Optimize the communication
3.5 Discussion
HuCE-microLab 2
4 Exercise Video Processing
4.1 Exercise Nr 1: Optimize the Load Balance
4.2 Exercise Nr 2: Add an Inverter to the System
4.3 Exercise Nr 3: Reduce Edge Detection to a Single Process
4.4 Exercise Nr 4: Optimize the Median Filter
4.5 Outlook
A Using Eclipse
A.1 Importing an Existing Project
A.2 Debugging a Functional or Timed TLM in Eclipse with GDB
A.2.1 Import the Model
A.2.2 Common Problems
Document information
Version Summary
AUTHOR DATE VERSION DESCRIPTION
zor1 23-04-2010 1.00 Initial version
hga3 25-05-2010 — Testing environment
hga3 04-03-2013 1.10 Set-up new VM env
kip1 13-03-2013 1.10 Adapt docu to new VM env
kip1 11-03-2014 1.20 Added the appendix again and removed bad links
bwc1 19-03-2018 1.30 Completely refined instruction of image processing exercise
Version verification
This tutorial has been verified on the following software releases:
Software
TOOL VERSION
OS Ubuntu 10.04 (Dedicated HuCE-microLab VM)
ESE Front End 2.0a (API Version 0.1.0b)
Preface
This exercise is based on the labs of the tutorial for the ESE tools from the Center for
Embedded Computer Systems (CECS) at the University of California, Irvine. The tutorial was
rewritten and slightly adapted to fit the environment of BFH-TI.
The authors of the original tutorial are Yongjin Ahn, Samar Abdi and Daniel Gajski from
CECS at UCI. The author of the BFH version is the same as the author of the current document.
1 Introduction
The basic purpose of this exercise is to give the user a deeper introduction to the Embedded
System Environment (ESE) Front End. ESE helps designers take C/C++ application processes
and a graphical platform capture and automatically produce Transaction Level Models
(TLMs) for functional verification and performance estimation. Extensive information about
ESE and its projected impact on embedded system design processes is available on the
website at http://www.cecs.uci.edu/~ese
The exercises assume basic knowledge of the tutorial. First you will optimize an implemented
design by using CPUs for multiple processes. In the second part you will add
additional processes to the system and then optimize the whole system again.
The system includes basic image processing methods, which are described in the theoretical
part of this document. A detailed description of the system is given in section 2.1.
1.1 Motivation
The rise in complexity of modern design has forced system designers to move to higher lev-
els of abstraction above Register Transfer Level (RTL) and traditional cycle accurate design.
Therefore, models such as TLMs, which provide a manifold speedup over RTL simulation, are
being used. However, in order for TLMs to be synthesizable to Hardware (HW) and Software
(SW) implementations, they must follow well-defined semantics. These semantics are
currently missing in the industry and TLM standards. Moreover, enforcing semantics is not
easy with manual modeling.
Secondly, embedded application developers come from a variety of different engineering
backgrounds and are not necessarily adept at electronic design. Model automation tools are
needed for such developers so that they do not need to learn modeling languages such as
SystemC.
Thirdly, businesses that use external suppliers for their embedded system designs need
unambiguous executable specifications for design hand-off. An even better proposition would
be to build pre-silicon board prototypes in house. This would reduce the chances of
miscommunication in requirement specifications and lead to a more robust design process.
Consequently, tools are required that take abstract applications and platforms and quickly
produce fast TLMs and board prototypes.
It is with these challenges in mind that we developed ESE, which takes the drudgery of
manual modeling off system designers. It enables non-experts to create system models
and generate board prototypes using a convenient graphical interface.
1.2 Naming conventions and marks
We have some conventions, especially on naming, that are intended to be consistent throughout
this document. Ordinary text is formatted in accordance with the standard rules of
English grammar, e.g. “This is an example”. Manufacturer and model names are proper
nouns, and are thus written in bold and beginning with a capital letter, e.g. MicroBlaze.
This tutorial has some parts that are marked with icons in the margin to help you find
important parts or parts you could skip. The following icons are used:
Notes: This indicates a note. Notes mark information that could help you, or indicate a
possible “weirdness” in a specific lab (or in a sub-part of a lab) that is explained with
additional information or helpful links.
Warning: This is a warning. In contrast to the notes mentioned above, a warning should be
taken more seriously. While ignoring notes will not cause any problems, ignoring warnings
could cause problems.
Exercise: This icon marks a section that is intended especially for students. The exercises
check the knowledge you have hopefully learnt during the previous steps. It is important to
do these parts patiently to get a solid and well-formed basic knowledge of ESE and of
working with the ESE Front End.
1.3 ESE Startup and Settings
To start the ESE Front End type the command ese into a terminal.
2 Image Processing Theory
Contents
2.1 Description of the Video Processing System
2.2 PGM Image File Format
2.2.1 PGM example
2.3 Pixel Operation
2.3.1 Contrast Enhancement
2.3.2 Gamma Correction
2.4 Median Filter
2.4.1 Mathematics
2.4.2 How it Works
2.4.3 Guidelines for Use
2.5 Edge Detection with the Sobel Operator
2.5.1 Sobel Operator
2.5.2 Mathematics
2.5.3 Pseudo-Convolution Operator
2.5.4 Guidelines for Use
2.6 Inverting
In this exercise we implement a video processing system. Since videos consist of
several pictures, we can use classical image processing methods to process them. The
difference compared to image processing is that the input is a stream of images instead of
a single image. This fact allows you to use the advantages of parallelism in multi-processor
and heterogeneous systems.
In the following sections, the relevant parts of the huge field of image processing are
explained briefly. You should read the first section to get introduced to the system
on which the exercises are based. The additional sections give details about the different
techniques used in the system. You can read them when you need the corresponding
information to solve the task of an exercise.
2.1 Description of the Video Processing System
The idea behind the system is to process the video in such a manner that you can finally
extract features from it. An example is speed or distance measurement in a traffic
surveillance video of vehicles.
Unfortunately, the input signal is noisy. Therefore the first element in the system is a
denoising element in the form of a 3x3 median filter. The next element extracts the features
from the image. This is done with an edge detection algorithm which applies the Sobel
operator to the image. The whole scheme is presented in figure 2.1.
Figure 2.1.: Block schema of the video processing system.
Each yellow block in figure 2.1 represents a processing element (PE). In the basic configuration,
every PE is placed on a CPU (see figure 2.2). Needless to say, this design would not be
optimal. Therefore the first exercise is to optimize this multi-processor design by placing
several PEs on one CPU. In addition, the edge detection and median filter algorithms are
not implemented in an optimal way. Therefore two of the exercises are optimizations of the
implementations of these algorithms.
Figure 2.2.: Block schema of the video processing system including the CPUs in the basic configuration.
2.2 PGM Image File Format
The PGM file format is part of the group of netpbm formats. They were developed
in the early 1980s to easily exchange images between platforms. They are also sometimes
referred to collectively as the portable anymap format (PNM).1
Each format differs in what colors it is designed to represent:
• PBM is for bitmaps (black and white, no grays)
• PGM is for grayscale
• PPM is for “pixmaps”, which represent full RGB color.
Each file starts with a two-byte file descriptor (in ASCII) that indicates the type of file
(PBM, PGM, or PPM) and its encoding (ASCII or binary). The descriptor is a capital P
followed by a single digit number (see table 2.1).
File Descriptor Type Encoding
P1 Portable bitmap ASCII
P2 Portable graymap ASCII
P3 Portable pixmap ASCII
P4 Portable bitmap Binary
P5 Portable graymap Binary
P6 Portable pixmap Binary
Table 2.1.: File format descriptors for netpbm formats.
The ASCII based formats allow for human-readability and easy transport to other platforms
(so long as those platforms understand ASCII), while the binary formats are more efficient
both at saving space in the file, as well as being easier to parse due to the lack of whitespace.
When using the binary formats, PBM uses 1 bit per pixel, PGM uses 8 bits per pixel, and
PPM uses 24 bits per pixel: 8 for red, 8 for green, 8 for blue.
2.2.1 PGM example
The PGM and PPM formats (both ASCII and binary versions) have an additional parame-
ter for the maximum value (numbers of grey between black and white) after the X and Y
dimensions and before the actual pixel data. Black is 0 and max value is white. There is a
newline character at the end of each lines.
1http://en.wikipedia.org/wiki/Netpbm_format
(a) Content of a ASCII PGM file
(b) Image representation of the content above.
Figure 2.3.: PGM example.
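To make the header layout described above concrete, the following sketch writes a tiny 4x4 ASCII PGM. The function name, file path and ramp pattern are made up for illustration and are not part of the exercise sources.

```c
#include <stdio.h>

/* Write a 4x4 ASCII PGM (P2): descriptor, width and height,
   maximum gray value, then the pixel values row by row. */
int write_pgm(const char *path) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    fprintf(f, "P2\n4 4\n255\n");
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            fprintf(f, "%d ", (y * 4 + x) * 17);  /* gray ramp 0..255 */
        fprintf(f, "\n");                         /* newline per row */
    }
    fclose(f);
    return 0;
}
```

A netpbm-aware viewer displays this file as a 4x4 gradient from black (top left) to white (bottom right).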
2.3 Pixel Operation
Operations which can be applied to a pixel without knowing the values of its neighbors are
so-called pixel operations. In the image processing exercise we use two pixel operations:
contrast enhancement Stretch the gray values of the image to the full range of 256 values
of the 8-bit gray-level.
gamma correction Is used to code and decode luminance or tristimulus values in video or
image systems.
These two operations are described in the following subsections.
2.3.1 Contrast Enhancement
Frequently, an image is scanned in such a way that the resulting brightness values do not
make full use of the available dynamic range. This can be easily observed in the histogram
of the brightness values. By stretching the histogram over the available dynamic range we
attempt to correct this situation. If the image is intended to go from brightness 0 to brightness
2B - 1, then one generally maps the 0% value (or minimum) to the value 0 and the 100%
value (or maximum) to the value 2B - 1. The appropriate transformation is given by:
b_{m,n} = (2^B − 1) · (a_{m,n} − minimum) / (maximum − minimum)
Figure 2.4.: The left image shows the original picture. In the right image, the contrast is enhanced to the maximum possible without any loss.
It is also possible to apply the contrast-stretching operation on a regional basis using the
histogram from a region to determine the appropriate limits for the algorithm.
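The transformation above can be sketched in C as follows. The function name is made up for illustration; the actual exercise code in pixel_operation.c may differ.

```c
#define MAXVAL 255  /* 2^B - 1 for B = 8-bit gray levels */

/* Stretch the gray values of an image buffer to the full range
   0..MAXVAL: map the minimum to 0 and the maximum to MAXVAL. */
void contrast_stretch(unsigned char *buf, int len) {
    int min = MAXVAL, max = 0;
    for (int i = 0; i < len; i++) {     /* find the dynamic range */
        if (buf[i] < min) min = buf[i];
        if (buf[i] > max) max = buf[i];
    }
    if (max == min) return;             /* flat image: nothing to stretch */
    for (int i = 0; i < len; i++)
        buf[i] = (unsigned char)((MAXVAL * (buf[i] - min)) / (max - min));
}
```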
2.3.2 Gamma Correction
Gamma correction, gamma non-linearity, gamma encoding, or often simply gamma, is the
name of a nonlinear operation used to code and decode luminance or tristimulus values in
video or still image systems. Gamma correction is, in the simplest cases, defined by the
following power-law expression:
V_out = V_in^γ
where the input and output values are non-negative real values, typically in a predetermined
range such as 0 to 1. A gamma value γ < 1 is sometimes called an encoding gamma, and the
process of encoding with this compressive power-law non-linearity is called gamma com-
pression; conversely a gamma value γ > 1 is called a decoding gamma and the application
of the expansive power-law non-linearity is called gamma expansion. The appropriate trans-
formation is given by:
b_{m,n} = (2^B − 1) · ( a_{m,n} / (2^B − 1) )^γ
Figure 2.5.: Example of CRT gamma correction.
Figure 2.6.: Gamma correction demonstration: Each panel shows the display gamma that the pixel values have been adjusted for; for example, the pixels in the second panel are proportional to intensity to the 1/2 power, so the image looks approximately correct on a typical PC monitor.
2.4 Median Filter
The median filter is normally used to reduce noise in an image, somewhat like the mean
filter. However, it often does a better job than the mean filter of preserving useful detail in
the image.2
2http://homepages.inf.ed.ac.uk/rbf/HIPR2/median.htm
2.4.1 Mathematics
The median x̃ of a sorted sample (x_1, x_2, . . . , x_n) of n measured values is

x̃ = x_{(n+1)/2}                  if n is odd,
x̃ = (x_{n/2} + x_{n/2+1}) / 2    if n is even.
2.4.2 How it Works
Like the mean filter, the median filter considers each pixel in the image in turn and looks at
its nearby neighbors to decide whether or not it is representative of its surroundings. Instead
of simply replacing the pixel value with the mean of neighboring pixel values, it replaces it
with the median of those values. The median is calculated by first sorting all the pixel values
from the surrounding neighborhood into numerical order and then replacing the pixel being
considered with the middle pixel value. (If the neighborhood under consideration contains
an even number of pixels, the average of the two middle pixel values is used.) Figure 2.7
illustrates an example calculation.
3×3 neighbourhood (centre pixel 150):
124 126 127
125 150 120
115 119 123
Neighbourhood values:
115, 119, 120, 123, 124, 125, 126, 127, 150
Median value: 124
Figure 2.7.: Calculating the median value of a pixel neighborhood. As can be seen, the central pixel value of 150 is rather unrepresentative of the surrounding pixels and is replaced with the median value: 124. A 3×3 square neighborhood is used here - larger neighborhoods will produce more severe smoothing.
2.4.3 Guidelines for Use
By calculating the median value of a neighborhood rather than the mean, the median
filter has two main advantages over the mean filter:
• The median is a more robust average than the mean, and so a single very unrepresentative
pixel in a neighborhood will not affect the median value significantly.
• Since the median value must actually be the value of one of the pixels in the neighborhood,
the median filter does not create new unrealistic pixel values when the filter
straddles an edge. For this reason the median filter is much better at preserving sharp
edges than the mean filter.
In general, the median filter allows a great deal of high spatial frequency detail to pass while
remaining very effective at removing noise on images where less than half of the pixels in
a smoothing neighborhood have been affected. (As a consequence of this, median filtering
can be less effective at removing noise from images corrupted with Gaussian noise.)
One of the major problems with the median filter is that it is relatively expensive and com-
plex to compute. To find the median it is necessary to sort all the values in the neighborhood
into numerical order and this is relatively slow, even with fast sorting algorithms such as
quicksort. The basic algorithm can, however, be enhanced somewhat for speed. A common
technique is to notice that when the neighborhood window is slid across the image, many of
the pixels in the window are the same from one step to the next, and the relative ordering of
these with each other will obviously not have changed. Clever algorithms make use of this
to improve performance.
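The sorting step described above can be sketched in C; median9 is a hypothetical helper, not the exercise implementation.

```c
/* Median of a 3x3 neighborhood: copy the nine values, sort them
   with insertion sort, and return the middle (fifth) element. */
unsigned char median9(const unsigned char n[9]) {
    unsigned char v[9];
    for (int i = 0; i < 9; i++) v[i] = n[i];
    for (int i = 1; i < 9; i++) {            /* insertion sort */
        unsigned char key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) { v[j + 1] = v[j]; j--; }
        v[j + 1] = key;
    }
    return v[4];                             /* middle of 9 sorted values */
}
```

For the neighborhood values shown in figure 2.7 this returns the median 124.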
2.5 Edge Detection with the Sobel Operator
Edge detection is a term in image processing and computer vision, particularly in
the areas of feature detection and feature extraction, referring to algorithms which aim at
identifying points in a digital image at which the image brightness changes sharply or,
more formally, has discontinuities.3
The purpose of detecting sharp changes in image brightness is to capture important events
and changes in properties of the world. It can be shown that under rather general assumptions
for an image formation model, discontinuities in image brightness are likely to correspond
to
• discontinuities in depth,
• discontinuities in surface orientation,
• changes in material properties and
• variations in scene illumination.
In the ideal case, the result of applying an edge detector to an image may lead to a set of
connected curves that indicate the boundaries of objects, the boundaries of surface markings
as well as curves that correspond to discontinuities in surface orientation. Thus, applying an
3http://en.wikipedia.org/wiki/Edge_detection
edge detector to an image may significantly reduce the amount of data to be processed and
may therefore filter out information that may be regarded as less relevant, while preserving
the important structural properties of an image. If the edge detection step is successful, the
subsequent task of interpreting the information contents in the original image may therefore
be substantially simplified.
2.5.1 Sobel Operator
The Sobel operator is used in image processing, particularly within edge detection algo-
rithms. Technically, it is a discrete differentiation operator, computing an approximation of
the gradient of the image intensity function. At each point in the image, the result of the
Sobel operator is either the corresponding gradient vector or the norm of this vector. The
Sobel operator is based on convolving the image with a small, separable, and integer valued
filter in horizontal and vertical direction and is therefore relatively inexpensive in terms of
computations. On the other hand, the gradient approximation which it produces is relatively
crude, in particular for high frequency variations in the image.4
(a) A color picture of a steam engine. (b) The Sobel operator applied to that image.
Figure 2.8.: Example of edge detection with the Sobel operator.
In simple terms, the operator calculates the gradient of the image intensity at each point, giv-
ing the direction of the largest possible increase from light to dark and the rate of change in
that direction. The result therefore shows how “abruptly” or “smoothly” the image changes
at that point, and therefore how likely it is that that part of the image represents an edge, as
well as how that edge is likely to be oriented. In practice, the magnitude (likelihood of an
edge) calculation is more reliable and easier to interpret than the direction calculation.
4http://en.wikipedia.org/wiki/Sobel_operator
2.5.2 Mathematics
Mathematically, the gradient of a two-variable function (here the image intensity function) is
at each image point a 2D vector with the components given by the derivatives in the horizon-
tal and vertical directions. At each image point, the gradient vector points in the direction of
largest possible intensity increase, and the length of the gradient vector corresponds to the
rate of change in that direction. This implies that the result of the Sobel operator at an image
point which is in a region of constant image intensity is a zero vector and at a point on an
edge is a vector which points across the edge, from darker to brighter values.
Mathematically, the operator uses two 3x3 kernels which are convolved with the original
image to calculate approximations of the derivatives - one for horizontal changes, and one
for vertical. If we define A as the source image, and Gx and Gy are two images which at
each point contain the horizontal and vertical derivative approximations, the computations
are as follows:
Gy = | −1 −2 −1 |         Gx = | +1  0 −1 |
     |  0  0  0 | ∗ A          | +2  0 −2 | ∗ A
     | +1 +2 +1 |              | +1  0 −1 |
where ∗ here denotes the 2-dimensional convolution operation.
The x-coordinate is here defined as increasing in the “right”-direction, and the y-coordinate
is defined as increasing in the “down”-direction. At each point in the image, the resulting
gradient approximations can be combined to give the gradient magnitude, using:
G = √(Gx² + Gy²)
The gradient magnitude is typically approximated using the following computation:
|G| = |Gx|+ |Gy|
Using this information, we can also calculate the gradient’s direction:
Θ = arctan(Gy / Gx)

where, for example, Θ is 0 for a vertical edge which is darker on the left side.
2.5.3 Pseudo-Convolution Operator
Often, the absolute magnitude is the only output the user needs - the two components of the
gradient are conveniently computed and added in a single pass over the input image using
the pseudo-convolution operator Gp.
Gp =
P1 P2 P3
P4 P5 P6
P7 P8 P9
Using this kernel the approximate magnitude is given by:
|G| = |(P1 + 2P2 + P3)− (P7 + 2P8 + P9)|+ |(P3 + 2P6 + P9)− (P1 + 2P4 + P7)|
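A sketch of this computation in C, assuming p[0..8] hold the kernel positions P1..P9 row by row; the function name is made up for illustration.

```c
/* Approximate gradient magnitude |G| = |Gx| + |Gy| for the pixel
   under P5, computed in a single pass as in the formula above.
   The result is clamped to 255 to avoid 8-bit overflow. */
int sobel_magnitude(const int p[9]) {
    int gy = (p[0] + 2 * p[1] + p[2]) - (p[6] + 2 * p[7] + p[8]);
    int gx = (p[2] + 2 * p[5] + p[8]) - (p[0] + 2 * p[3] + p[6]);
    int g = (gx < 0 ? -gx : gx) + (gy < 0 ? -gy : gy);
    return g > 255 ? 255 : g;
}
```

A flat region yields 0; a sharp vertical edge saturates at the clamped maximum.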
2.5.4 Guidelines for Use
The Sobel operator is slower to compute than other operators, but its larger convolution
kernel smooths the input image to a greater extent and so makes the operator less sensitive
to noise. The operator also generally produces considerably higher output values for similar
edges, compared with the others.
As with other operators, output values from the operator can easily overflow the maximum
allowed pixel value for image types that only support smallish integer pixel values (e.g. 8-bit
integer images). When this happens the standard practice is to simply set overflowing output
pixels to the maximum allowed value. The problem can be avoided by using an image type
that supports pixel values with a larger range.
Natural edges in images often lead to lines in the output image that are several pixels wide
due to the smoothing effect of the Sobel operator. Some thinning may be desirable to counter
this. Failing that, some sort of hysteresis ridge tracking could be used, as in the Canny
operator.5
2.6 Inverting
Inverting means “reversing the colors”. Since we are working with grayscale pictures, this
means exchanging black with white and so on. Inverting is a pixel operation. Therefore, this
operation is applied to every point of the image regardless of its neighbours.
5You find more information about this topic in R. Gonzalez and R. Woods, Digital Image Processing, Addison-Wesley, 3rd Edition, 2008, Chapter 3.
To calculate the inverted value of a pixel, you have to subtract the current value from the
absolute maximum. In our example (8-bit PGM) the maximum is 2^8 − 1 = 255.
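As a one-line sketch in C (the function name is made up for illustration):

```c
#define PGM_MAXVAL 255  /* absolute maximum of an 8-bit PGM: 2^8 - 1 */

/* Invert one pixel: subtract the current value from the maximum. */
unsigned char invert(unsigned char pixel) {
    return (unsigned char)(PGM_MAXVAL - pixel);
}
```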
3 Exercises Image Processing
Contents
3.1 Exercise Nr 1: Create the ESE-Model with one CPU
3.2 Exercise Nr 2: Split the program into four processes
3.2.1 Verify the reading and writing processes
3.2.2 Modify the c-code for the image processing
3.2.3 Use one processor with four processes
3.2.4 Use four processors with one process per processor
3.3 Exercise Nr 3: Insertion of parallelism in the processes
3.4 Exercise Nr 4: Optimize the communication
3.5 Discussion
Welcome to the 21st century! The Digital Revolution, which began around the 1980s,
continues into the present. You have just begun your new position in a high-tech company
providing video processing solutions for big companies such as Samsung or LG. Your first task
is to implement some basic image processing, which will then be used for the video processing.
Welcome to the 21st century! The algorithms for image processing are well known and
you won’t get anybody’s attention just by implementing yet another contrast-enhancement
algorithm. In fact, everybody’s expectation is to get this simple task done in a matter of
days, not weeks. Your idea of some fancy VHDL design task just went down the sink...
Welcome to the 21st century! Synthesis tools help you out of your misery. The standard
way to implement any image/signal processing algorithm is to hack in some code and to
test it on a model. You’ll then get an estimate of the computation power needed. Maybe
you don’t need your fancy $20 FPGA, and a $1 microcontroller will do just fine. Or maybe
your ARM Cortex-M3 is not as powerful as expected and you’ll need an ARM Cortex-M4.
Who would have guessed? Maybe your future boss with a solid 10 years of experience, but
he will most probably have to worry about other things than the computation power of the
new ARM Cortex-M7...
Welcome to the 21st century! Do you want your algorithm implemented in hardware,
software, or both? That of course depends on the computation power needed. Luckily,
you know this number since you know how to model it! Our model begins with a simple
sequential program flow - this simply means that you handle one task after the other. In the
following exercise we will implement two simple image processing tasks. In a first task, we
stretch the contrast of the picture and in a second task we apply a gamma correction to the
image.
The goal of this exercise is to get an idea of the points you have to pay attention to when
writing code for processes which have to run in parallel.
3.1 Exercise Nr 1: Create the ESE-Model with one CPU
In this exercise we will debug a program with Eclipse. We use Eclipse because ESE
does not supply a syntax check when changing source-file code, which makes code
development very time-consuming. Please proceed as follows:
1. Download the source files from the microLab website (Course SoC, File: TBZ2
(Image Ex.1)).
2. Type the following commands into your terminal:
mkdir ~/ese_exercise ~/ese_exercise/ex1
cd ~/ese_exercise/ex1
tar -xf ~/Downloads/imageprocessing_ex1.tar.bz2
3. Start up the eclipse for C/C++ development with the command eclipse on a termi-
nal. Let your workspace point to ~/ese_exercise.
4. Import the unpacked project into your workspace.6
Examine the code and make sure that your project builds. As ESE uses slightly different
build options compared to Eclipse, there is a precompiler command which distinguishes
between ESE and Eclipse: search for the line #define MODEL in the file global.h. If you
want to model the code in Eclipse, make sure that MODEL is defined. If you want to use it in
ESE, erase (or better, comment out) the line.
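The switch might look like this (a sketch; the actual contents of global.h may differ):

```c
/* global.h (sketch): MODEL is defined when building in Eclipse.
   Comment the line out before generating the TLM with ESE. */
#define MODEL

#ifdef MODEL
/* Eclipse build: stand-alone code paths, e.g. plain file I/O */
#else
/* ESE build: the tool supplies its own build options */
#endif
```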
Once the project builds, we can close Eclipse (do not forget to comment out the line #define
MODEL in the file global.h). Start ESE. For this exercise, we will start with a blank
project. We can now add a MicroBlaze SW processor, using the settings shown in Table
3.1.
CPU Process Process Ports .c/.h-Files
CPU0 imageprocessing - ReadPGM_aux.c
WritePGM_aux.c
pixel_operation.c
ReadPGM_aux.h
WritePGM_aux.h
global.h
landschaft.h
Table 3.1.: Processor settings for Exercise Nr 1
6You find a detailed description in the appendix
Next, you can generate and simulate a functional TLM. It should work without errors and
two images should pop up ("landschaft.pgm" and "image_processed.pgm").
The image processing should be clearly visible (enhanced contrast and a brighter image). Next,
run the timed TLM and note the number of cycles used to execute the entire program.
3.2 Exercise Nr 2: Split the program into four processes
In this exercise, you have to split the program into four independent processes which are
communicating over the ESE FIFO channel interface.
Proceed as in exercise 1, but download and extract the file TBZ2 (Image Ex.1) into a
new folder ~/ese_exercise/ex2. Several steps have to be made:
1. Verify the reading and writing processes
2. Modify the c-code for the image processing
3. Use one processor with four processes
4. Use four processors with one process per processor
3.2.1 Verify the reading and writing processes
It is always wise in engineering tasks to tackle one problem at a time. This reduces any new
bugs to a well-known source, thus making debugging much easier. Therefore, we will
first divide our program into a read process and a write process for the image files. No image
processing is done for the moment.
Open a terminal, cd to your newly created directory and start ese. Create a new design, save
it as ex2_rw.eds, and add a MicroBlaze processor and the following processes:
CPU Process Process Ports .c/.h-files
CPU0 readpgm r2c_if
(blocking_write, send_r2c)
readpgm.c
ReadPGM_aux.c
ReadPGM_aux.h
global.h
landschaft.h
CPU0 writepgm g2w_if
(blocking_read, recv_g2w)
writepgm.c
WritePGM_aux.c
WritePGM_aux.h
global.h
Table 3.2.: Processor settings for Exercise Nr 2-rw
Next, we have to add a communication channel and need to calculate the FIFO buffer size.
Examine the C code (file readpgm.c) and calculate the FIFO size yourself. Add 100 extra
bytes in order to overcome any nasty out-of-array effects. Discuss the buffer size with your
colleague. Once you are done, you can add a new channel (Channel type: FIFO Channel,
Name: r2w, Size: <your size>, Writer: readpgm, Port: r2c_if, Reader: writepgm, Port:
g2w_if, Mapping: LOCAL ACCESS, Route: LOCAL ACCESS).
Next, you can generate and simulate a functional TLM. It should work without errors and
two images should pop up ("landschaft.pgm" and "image_processed.pgm").
They should look exactly the same. Next, run the timed TLM and note the number of cycles
used to execute the entire program. Close ESE afterwards.
3.2.2 Modify the c-code for the image processing
After making sure that you can read and write the image files, you now have to finish the two
image processing processes (gamma.c and contrast.c). The read and write processes
are already done, so you do not need to modify any other files. Open Eclipse, uncomment
the line #define MODEL in the file global.h and make sure that your project
builds. Once you run the binary (right-click on "pixel_operation" in the binaries-folder and
select "run"), two images should pop up ("landschaft.pgm" and "image_processed.pgm").
The second picture is black.
Your job is to modify the code in the files gamma.c and contrast.c so that the second
image is not black, but a nice landscape with perfect contrast and brightness. Your job
consists of two tasks: firstly, you have to handle the communication (i.e. the FIFO receive
and send code snippets); secondly, you have to do the contrast and gamma enhancement.
As in all engineering tasks, it is wise to tackle one problem at a time.
(Hint: Have a look at the files readpgm.c and writepgm.c to get an idea of the com-
munication, and then have a look at the code of the first exercise to get an idea of the image
processing tasks.)
3.2.3 Use one processor with four processes
If you have tested the c-model successfully in eclipse, we will test it in ESE. Do not forget to
first comment the line #define MODEL in the file global.h before switching to ESE.
Open ESE, open your old design ex2_rw.eds and save it as ex2_1processor.eds.
Afterwards, add the following processes:
CPU    Process      Process Ports                        .c/.h files
CPU0   readpgm      r2c_if (blocking_write, send_r2c)    readpgm.c, ReadPGM_aux.c,
                                                         ReadPGM_aux.h, global.h,
                                                         landschaft.h
CPU0   contrast     r2c_if (blocking_read, recv_r2c),    contrast.c, global.h
                    c2g_if (blocking_write, send_c2g)
CPU0   gammavalue   c2g_if (blocking_read, recv_c2g),    gamma.c, global.h
                    g2w_if (blocking_write, send_g2w)
CPU0   writepgm     g2w_if (blocking_read, recv_g2w)     writepgm.c, WritePGM_aux.c,
                                                         WritePGM_aux.h, global.h

Table 3.3.: Processor settings for Exercise Nr 2 - one process
Do not forget to add an RTOS to the processor, as we are running multiple processes. Now
add the communication channels. All channels use the same buffer size which you calculated
earlier. You should manage to figure out the communication channels by yourself; if not,
ask your colleague.
Next, you can generate and simulate a functional TLM. It should work without errors and
two images should pop up ("landschaft.pgm" and "image_processed.pgm").
The image processing should be clearly visible (enhanced contrast and brightness). Next, run
the timed TLM and note the number of cycles used to execute the entire program.
3.2.4 Use four processors with one process per processor
In the next design, we will use four processors with one process per processor. Open ESE,
open your old design ex2_rw.eds and save it as ex2_4processors.eds. Add three
more processors, an OPB bus and a transducer. The transducer is needed as a shared-memory
element for the FIFOs, and it makes it possible for one master (processor) to talk to another
master (processor). You therefore have to add the transducer to your system as a slave. You
should manage to figure out the processes and communication channels by yourself, if not,
ask your collegue.
Once you generate a design with an OPB bus in the ESE tool, you will run into an error. The
error message is: “Address not set in: CE0 in bus: Bus”. To solve this problem, you have to
enter the lower address in the properties of the bus. But the ESE tool has a bug which makes
it impossible to enter this address. As a workaround, close your design and open
the *.eds file in a text editor. There you have to search for the following content:
<TXENTRY name = "CE0 ESETX_PORT_0"/>
and add the value for the requested address to it:
<TXENTRY name = "CE0 ESETX_PORT_0" lowaddr = "0x00200000"/>
After reopening the design in the ESE tool, you should see this address in the properties of
the OPB bus.
Next, you can generate and simulate a functional TLM. It should work without errors and
two images should pop up ("landschaft.pgm" and "image_processed.pgm").
The image processing should be clearly visible (enhanced contrast and brightness). Next, run
the timed TLM and note the number of cycles used to execute the entire program.
Close ESE afterwards.
3.3 Exercise Nr 3: Insertion of parallelism in the processes
As you have discovered, the execution time of the program in the design with four CPUs has
not decreased (compared to the first one). This happens because each process is blocked
until the previous one has finished its task; the only thing you have introduced so far is
communication overhead. The conclusion is that transferring the whole image at once is a
bad idea. The processes should therefore be parallelized so that lines are sent to the next
process immediately after processing. In such a system, the processors can work in parallel,
and the execution time of the whole task will decrease significantly.
In a real situation, you would parallelize all four processes. However, in the simulation
tool we are using, disk read/write cycles are excluded from the simulation. Thus, you do
not need to parallelize the I/O processes (components readpgm and writepgm) and can
focus on the two image processing processes (files gamma.c and contrast.c). You
have to change them so that they receive, process and send smaller blocks of 40 lines.
To avoid problems with ESE, the recommended way to handle the copy-pasting is as
follows:
1. Type the following commands into your terminal:
mkdir ~/ese_exercise/ex3
cd ~/ese_exercise/ex3
cp -r ../ex2/src ../ex2/image ../ex2/Debug \
../ex2/.project ../ex2/.cproject .
2. Start up Eclipse for C/C++ development with the command eclipse in a terminal.
Let your workspace point to ~/ese_exercise.
3. Import the unpacked project into your workspace.7
Uncomment the line #define MODEL and hack in your modifications. If your code runs
correctly in the model, close Eclipse and comment out the line #define MODEL prior
to opening ESE. The recommended way to update the design is as follows:
1. Open ESE and open your old design ex2_4processors.eds. Save it in your new
folder ~/ese_exercise/ex3 as ex3_4processors.eds.
2. Change the modified sources to point to the new location (e.g. delete the contrast.c
file in the process contrast and re-add the new file ~/ese_exercise/ex3/src/
contrast.c; do the same for gamma.c)
7You find a detailed description in the appendix
Generate and run the timed TLM, have another look at the execution time and compare
it with the times from above.
3.4 Exercise Nr 4: Optimize the communication
As a last step in this exercise we want to optimize the communication. In every communi-
cation scenario you have to decide on the block size which you want to transfer in one
package. There is always a trade-off between large packages (less overhead, long reserva-
tion time of the bus) and small packages (more overhead, short reservation time of the bus).
Transferring 40 lines (with 600 pixels each) is close to the "large package" extreme.
The opposite would be to transfer only one pixel per block, which would produce a huge
communication overhead. As always, the optimum lies somewhere between the two
extremes.
In order not to break your design from exercise Nr 3, we recommend creating a new folder
(ex4) and doing the same copy-pasting you just did.
Then try to transfer a few lines, or only one line, per block and analyze the performance of the
system. We are interested in the total execution cycles, so focus on the output of the
terminal.
If you have coded in an intelligent way, you only need to change the value of the symbol LINES_PER_BLOCK
in the global header file to change the number of transferred lines.
3.5 Discussion
Discuss the following questions with your colleague:
1. Which design has the fewest total execution cycles?
2. Why is the one-processor design of exercise 2 performing so badly?
3. To gain speed, we could parallelize the individual algorithms. In such a case, processor
1 would process all odd packages and processor 2 would process all even packages,
so each processor would only need to process half of the total data. Is this a good
idea? Guess the new total execution cycles.
4. In the tasks above, we only optimized for total execution cycles. What other goals
and resources could your future boss be interested in? Is the final design
(four MicroBlaze softcore processors implemented in an FPGA) the best design that
you could think of? What could be another interesting design to investigate? How
could you save resources?
4 Exercise Video Processing
Contents
4.1 Exercise Nr 1: Optimize the Load Balance
4.2 Exercise Nr 2: Add an Inverter to the System
4.3 Exercise Nr 3: Reduce Edge Detection to a Single Process
4.4 Exercise Nr 4: Optimize the Median Filter
4.5 Outlook
4.1 Exercise Nr 1: Optimize the Load Balance
Download the source files from the microLab website (Course SoC, File: TGZ (Video)).
Copy the source to your desired destination in the same way as you did in the previous
exercise.
Figure 4.1.: Block schema of the system.
Try to optimize the load balance between the CPUs. In the basic configuration which you
can download from the website, every process has its own CPU. This is the simplest setup,
but not the most efficient use of resources. First have a look at the charts
which show you the percentage of time used for computing. Use the Xilinx RTOS
(xilkernel) to run more than one process on a CPU, as you have seen in Lab
2 of the tutorial. Watch out for the RTOS overhead caused by the context
switches between the different tasks; it increases nonlinearly with the number
of processes you run on a CPU.
4.2 Exercise Nr 2: Add an Inverter to the System
[Block schema: readpgm, median, sobelhoriz, sobelvert, gradient, inverter, writepgm]
Figure 4.2.: New block schema after adding an inverter to the system.
To be able to test changes to the code behind the processes, we need a model. The
additional functions we need to run the ESE code are already integrated in the project. You
can enable/disable them by uncommenting/commenting the line #define MODEL. To be able to edit and
test the C model in Eclipse, you have to execute the following steps:
1. Start up Eclipse for C/C++ development with the command eclipse in a terminal.
Let your workspace point to ~/ese_videoexercise.
2. Import the downloaded project into your workspace.
Now you are able to debug your program when you implement new features.
Have a look at the code of the existing components to get familiar with the communication
interface used in ESE and with the programming style used for a PE. For the theory
about image inverting, see section 2.6.
When you have implemented and tested the inverter successfully, you can close Eclipse;
you then have to rearrange your design in ESE to match the changed source code.
As a last step, optimize the new system again to reach a good load balance.
4.3 Exercise Nr 3: Reduce Edge Detection to a Single Process
Figure 4.3.: New block schema after combining the three edge detection processes into one single process.
Combine the three edge detection processes into one single process by using the pseudo-
convolution operator (see subsection 2.5.3).
Hint: You have to create a new filter function which does this calculation and apply it to the
frame using the Filter3x3C() function.
4.4 Exercise Nr 4: Optimize the Median Filter
As you have seen in the previous exercises, the quicksort part of the median filter is the
bottleneck of the whole system. There are other algorithms besides sorting to find
the median. Search the internet8 for a faster algorithm and implement one.
Hint: Again, you have to edit only the filter function.
Figure 4.4.: Improving the speed of the median filter.
8For example: http://ndevilla.free.fr/median/median/index.html
4.5 Outlook
As you have seen in the exercises, the median filter algorithm is the computationally inten-
sive element. The next step is now to implement this component, or at least the median
algorithm, in a hardware block. The tool provides different PEs out of the library to do this:
- NISC (No-Instruction-Set Computer)
- Forte (ESL9 Synthesis Tool)
In this way, you are able to speed up your design by a factor of ten or even more.
9Electronic system level
A Using Eclipse
A.1 Importing an Existing Project
To import an existing project into your current workspace, you have to apply the following
steps:
1. Open the menu “File”⇒ “Import”. Then the dialog shown in figure A.1 appears.
2. Choose “Existing Projects into Workspace” out of the list.
3. Select the downloaded archive file. The included project is automatically chosen. (See
figure A.2)
4. Click the Finish-button to import the project.
Figure A.1.: Select an import source.
Figure A.2.: Select the root directory where you unpacked the downloaded templates.
A.2 Debugging a Functional or Timed TLM in Eclipse with GDB
A.2.1 Import the Model
If you get segmentation faults while running your transaction-level models, you normally
have a problem in your memory management, so you want to know where your
program crashes. To do this, you need to be able to debug your code. Perform
the following steps to import the TLM into the Eclipse IDE and use GDB to debug the
program:
1. As the first step, you have to create a new C Project. (See figure A.3 and A.4)
Figure A.3.: Open the project creating dialog.
Figure A.4.: Create a new C Project.
Figure A.5.: Select “Makefile project” as project type.
2. You have to select “Makefile project” as project type, because the TLM is built with a
Makefile with the target tlm. (See figure A.5)
Figure A.6.: Copy the TLM into the project folder.
3. Copy all the files which are in the {Project_name}_timed_TLM folder into your
eclipse project folder. (See figure A.6)
Figure A.7.: Create a new make-target.
4. Create a new make-target called tlm in the “Make Target” view. If you don’t have this
view, you can open it by going to the menu “Window”⇒ “Show View”⇒ “Other...”.
Type “make” into the filter mask and then select Make Targets. (See figures A.7 and A.8)
Figure A.8.: Enter “tlm” as target name.
Figure A.9.: Change the build target in the properties.
5. As a last step, you have to change the properties of your project. Right-click on the
project name and select “Properties” in the context menu that appears. Then change the
Make build targets for saving and building on the “Behavior” tab in the “C/C++ Build”
options. (See figure A.9)
Now you can debug your generated TLMs in the same way as for your other
projects. If you make changes to the code, you have to rebuild the TLM with the ESE
tool and copy the files into the Eclipse project folder again.
A.2.2 Common Problems
Wrong project name: If the name of your Eclipse project is not equal to the ESE project
name, then make can’t find the resources. The solution is to rename the Eclipse project or
to change the folder name in the Makefile.