

    Towards a Programmable Image

    Processing Machine on a Spartan-3 FPGA

    Thomas Anthony Gartlan, B.A, B.A.I, M.Sc

Athlone Institute of Technology, Athlone, Ireland

    July 2006

    Submitted in part-fulfilment of the degree Master of Science

    (Advanced Engineering Techniques)


    Declaration

I declare that this thesis, unless otherwise stated, is entirely my own work and that it has not been submitted at this or any other University as an exercise for a degree.

    Signed_________________________

    Date_________________


    Acknowledgements

I would like to thank Dr Fearghal Morgan, NUI Galway, my supervisor, for his advice, pointers and discussion at crucial times during this project. Thanks also to him for the project idea and the equipment made available to me.

A special thanks to my beautiful wife Eleanor for her patience and understanding, whilst I abandoned ship for a little while every day to write up this thesis.


    Abstract

This project investigates the design and implementation of image processing algorithms on an FPGA. The FPGA is part of an overall system, developed by a team guided by Dr Fearghal Morgan, that consists of a development board produced by Digilent and user interface software produced by students in NUI, Galway. The image processing algorithms designed and implemented in this project are numerous point operations, such as modifying brightness and contrast; Prewitt edge detection, as an example of neighbourhood operations; and the morphological operations dilation and erosion. In addition, the theory of warping was investigated and a foundation for future development in this area was presented.


    Acronyms

CAD Computer Aided Design
CSR Control Status Register
DSP Digital Signal Processing
FPGA Field Programmable Gate Array
JPEG Joint Photographic Experts Group
JTAG Joint Test Action Group
LAN Local Area Network
LED Light Emitting Diode
MPEG Moving Picture Experts Group
NUIG National University of Ireland, Galway
PC Personal Computer
PCB Printed Circuit Board
USB Universal Serial Bus
VHDL Very High Speed Integrated Circuit Hardware Description Language


    List of Figures

    Figure 1 User Interface for the AppliedVHDL system ______________________ 6

    Figure 2 Binary Image File ___________________________________________ 9

Figure 3 8-bit Intensity image (4*4) _____________________________________ 9

Figure 4 A 256 Colour Bitmap (Indexed) image after the Black and White attribute is set. _______________________________________________________ 10

    Figure 5 Colourmap file if Black and White attribute is not set._____________ 11

    Figure 6 Correct colourmap for a greyscale image. _______________________ 11

    Figure 7 Different size greyscale images _______________________________ 12

    Figure 8 180*180 Black and white ____________________________________ 12

Figure 9 Various Point Operations represented graphically (Burdick, Digital Imaging) ____________________________________________________________ 19

    Figure 10 2-D Kernel is passed over the input image during convolution ______ 20

    Figure 11 Examples of 3*3 Kernels ____________________________________ 21

    Figure 12 2-D Kernel is passed over input image during Erosion ____________ 23

Figure 13 Kernel for Erosion using 4-connectivity _______________________ 23

Figure 14 Overview of DSP block architecture __________________________ 27

    Figure 15 Main cycles of DSP Block Version 1 ___________________________ 28

    Figure 16 Main Cycles of DSP Block Version 2 __________________________ 29

    Figure 17 DSP Block Architecture ____________________________________ 31

    Figure 18 DSP block main State Machine _______________________________ 33

    Figure 19 Inputs of the Point Operations block___________________________ 34

    Figure 20 Outputs of the Point Operations block _________________________ 34

    Figure 21 Architecture of the Point Operations block _____________________ 35

Figure 22 Original Greyscale image used to illustrate point operations (eye_32*32_grey.bmp) ______________________________________ 36

Figure 23 Processed images from various point operations _________________ 37

Figure 24 Part 1 of the Architecture of the Kernel3_img_proc block __________ 40

    Figure 25 Timing of DataEn and Pixel_En signals ________________________ 40

    Figure 26 Part 2 (a) of the Architecture of the Kernel3_img_proc block _______ 42

    Figure 27 Part 2 (b) of the Architecture of the Kernel3_img_proc block______ 43

    Figure 28 Part 3 of the Architecture of the Kernel3_img_proc block __________ 44

    Figure 29 Original eye 32*32 pixels image_____________________________ 45

    Figure 30 Full Edge Detection with different threshold values _______________ 45

    Figure 31 Horizontal Edge Detection with a sensitive value of CSR_7 ________ 46

    Figure 32 Vertical edge Detection ____________________________________ 46

Figure 33 Black and White image received from NUI (NUIGImage1_180*180.bmp) ___________________________________________ 47

Figure 34 Full, Horizontal and Vertical Edge Detection on Black and White image (180*180) ______________________________________________________ 47

    Figure 35 Original Image developed to demonstrate Erosion _______________ 48

    Figure 36 Performing Erosion ________________________________________ 48

    Figure 37 Image developed to demonstrate usefulness of dilation ____________ 49

    Figure 38 Result of one dilation ______________________________________ 49

    Figure 39 Result of Erosion after Dilation_______________________________ 49

    Figure 40 Illustration showing how line buffers are implemented_____________ 52

Figure 41 Core generator used to generate efficient shift registers based on RAM. ___________________________________________________________________ 54

Figure 42 Source and Destination Image for Table 10 ____________________ 62


Figure 43 Backward Transform for 90-degree rotation around centre pixel of 15*15 image _________________________________________________________ 65

    Figure 44 Flowchart for Warping algorithm ____________________________ 68

Figure 45 Use to replace shaded section in earlier flowchart, if Bilinear Interpolation is used. __________________________________________________ 69


    List of Tables

Table 1 Neighbourhood and Morphological Image Processing functions added to the system ......................................................................................................... 25

    Table 2 Point Image Processing Operations added to the system ........................... 26

Table 3 DSP Block Inputs .......................................................................................... 30

Table 4 DSP Block Outputs ....................................................................................... 30

    Table 5 Internal Signals of the DSP block ................................................................ 32

Table 6 Pixel values of the original image (subset (8:12, 3:7)) .................................. 38

Table 7 Pixel values of the brightened image (subset (8:12, 3:7)) ............................ 38

    Table 8 Types of Transformations ............................................................................ 57

    Table 9 Various Affine Transformations .................................................................. 58

    Table 10 Forward Transform example for 90 degree rotation around the origin. 62

Table 11 Forward Transformation example for 60 degrees rotation around origin ........................................................................................................................... 63

Table 12 Backward Transform for 90-degree rotation around centre pixel of 15*15 image ................................................................................................................ 64


    Contents

Chapter 1  Introduction .................................................................................... 1
    Overview ....................................................................................................... 1
    Summary of objectives ................................................................................ 2
    Report Organisation .................................................................................... 3

Chapter 2  Review of Current System ............................................................ 4
    Introduction .................................................................................................. 4
    Overview of the Current System ................................................................ 4
    Limitations of the Current System ............................................................. 7
    Conclusion .................................................................................................. 14

Chapter 3  Image processing algorithms ..................................................... 16
    Introduction ................................................................................................ 16
    Point Operations ........................................................................................ 16
    Neighbourhood Operations ...................................................................... 20
    Morphological Operations ........................................................................ 22
    Summary of New Image Processing functions added to the system .... 24
    Conclusion .................................................................................................. 26

Chapter 4  The New DSP controller block and image processing sub-blocks ... 27
    Introduction ................................................................................................ 27
    The DSP Controller, DSPblk.vhd ............................................................ 29
    The Point Operations sub-block (pixel_img_proc.vhd) ........................ 34
    The Neighbourhood & Morphological Operations sub-block (kernel3_img_proc.vhd) ... 39
    Conclusion .................................................................................................. 50

Chapter 5  Implementation of Line Buffers ................................................ 51
    Introduction ................................................................................................ 51
    Method 1: Synthesising the Line Buffers using VHDL code ................ 52
    Method 2: Creating the Line Buffers using the CORE Generator ....... 54
    Conclusion .................................................................................................. 55

Chapter 6  Warping and Morphing .............................................................. 56
    Introduction ................................................................................................ 56
    Basic Theory and Applications ................................................................. 56
    Detailed look at the Affine transformation Rotation ............................. 61
    Implementation of Warping Algorithms ................................................. 66
    Conclusion .................................................................................................. 70

Chapter 7  Conclusion .................................................................................... 71


CHAPTER 1

    INTRODUCTION

    Overview

Image processing is an area of growing significance. Until recently, electronics technology in the area of imaging has mainly focused on the capture and delivery of images in the form of analogue television. However, in the last five years, as predicted by Moore's law, the advent of relatively cheap and compact memory has seen digital images become viable for consumer products such as cameras and mobile phones. Digital cameras have now almost completely replaced their analogue ancestors. This is purely a consequence of the availability of cheap and high-density memory, since this allows the technology to overcome the biggest problem that digital images have always presented, that of large memory requirements. If we take a single 4*6 inch print and scan it at 400 ppi (pixels per inch), then the resultant digital image will have 1600*2400 pixels. If we want colour, then the memory required to store this image is 11 MB. A greyscale image requires slightly less, at 3.67 MB [1]. Continuing this line of thought means we need almost 250 MB to store just over 20 images. Ten years ago this type of memory was only available on hard disks.

Video involves rapidly changing images and has obviously even higher memory requirements. For a colour image of modest size, 640*480 at 25 frames per second, we need 23 MB for every second of video. However, advances in technology have allowed even this formidable hurdle to be overcome, and we now have small digital video cameras with Flash memory replacing their large, outdated analogue equivalents.
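These storage figures follow from simple arithmetic. The sketch below (in Python, purely for illustration — the project's own code is VHDL) reproduces them; note that the 3.67 MB greyscale figure counts a megabyte as 2^20 bytes.

```python
# Rough storage arithmetic behind the figures quoted above.

def image_bytes(width_px, height_px, channels=1):
    """Raw (uncompressed) storage for an image, one byte per channel."""
    return width_px * height_px * channels

# A 4*6 inch print scanned at 400 pixels per inch gives 1600*2400 pixels.
w, h = 4 * 400, 6 * 400
colour = image_bytes(w, h, channels=3)   # R, G and B planes
grey = image_bytes(w, h)                 # single 8-bit plane

print(colour)             # 11520000 bytes, i.e. roughly 11 MB
print(grey / 2**20)       # roughly 3.67 (2^20-byte) MB

# Uncompressed colour video: 640*480 pixels, 25 frames per second.
video_per_second = image_bytes(640, 480, channels=3) * 25
print(video_per_second)   # 23040000 bytes, roughly 23 MB per second
```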

Advances in memory technology have allowed us to capture digital images and video cheaply, but this in itself would be of no use if we did not have the bandwidth to transport the large amounts of digital data created from one place to another, for example from a digital camera to a PC or across the internet. Solving this communication problem is a combination of technology and intelligence. Technology gives us fibre optic cables and higher bit rates across USB and FireWire cables, while intelligence gives us coding standards such as JPEG and MPEG. We have become accustomed to moving images across the greatest network of all, the internet. Over the next few years, as broadband technology and coding standards evolve, we will see more and more high-quality video on the internet and on LANs.

So what now? We can capture digital images and video with ease, and we can shuttle them around quickly from one location to another. However, can we interpret them in a meaningful and intelligent way? Interpreting images requires sophisticated algorithms that allow us to gather useful information from images. The most obvious applications to date have been in the areas of security, for example face recognition at airports to hinder known terrorists, or eye recognition to validate authorised personnel. Google have strived recently to allow one to use images to search the internet. This is only the beginning. One does not have to dwell too long to think of dozens of image applications, if a system could only take an image and from it garner information in much the same way as the brain does automatically. Various


other image and video applications in the areas of medicine, the military and the studio are enumerated in [2] and [3].

The algorithms that will allow us to do this will require large amounts of computing power working on very large amounts of data. If algorithms can be broken down so that some of the tasks can be completed in parallel, then great savings in computing time can be made. This is where FPGAs come in. An FPGA is basically programmable hardware that allows us to implement image processing algorithms in hardware as opposed to software. This allows us to take advantage of the inherent parallelism of hardware in implementing these computing-intensive algorithms [3], [4] and [5].

A system has been developed by NUI Galway that allows transfer of images between a host PC and on-board SRAM via a Xilinx Spartan-3 FPGA. The system provides a versatile user-defined dspblk element within the FPGA. This element includes memory read/write access and is programmable via the host using control registers.

The aim of this master's project is to design, simulate and implement (in the dspblk) some common image processing algorithms to process these images within the FPGA. The system then allows the processed images to be uploaded to the host PC for viewing.

The FPGA used is a Spartan-3, developed by Xilinx [6][7]. The board on which the FPGA is placed is called a Digilent Spartan-3 Starter Board Rev E [8]. This board also has a power supply and regulation, some interface sockets to allow other boards to be connected, switches, LEDs, flash memory and a serial port connection. This serial port connection is used to transfer data and images to and from the PC. A separate parallel-to-JTAG connection is used to transfer configuration data to the FPGA. Students in NUIG have developed hardware, via a VHDL [9] design flow and Xilinx ISE CAD tools, that configures the FPGA to allow images to be transferred between the PC and the on-board memory. In addition, Visual Basic was used to develop the application software that allows the user to interact with the FPGA board. The entire system is well explained and documented in [10] and [11].

    Summary of objectives

This project, in general terms, looks at the implementation of some established image processing algorithms on a Spartan-3 FPGA. The image processing algorithms considered can, broadly speaking, fit into three main categories: Point, Neighbourhood and Morphological operations.

Point operations involve modifying each individual pixel value, based on its current value, and independent of all other pixel values. Neighbourhood operations involve modifying each individual pixel value based on some computation performed on a number of the surrounding pixel values as well as its own value. Both Point and Neighbourhood operations alter the image. Morphological operations also depend on algorithms that involve neighbourhood pixels, but these operations are used to understand the form or structure of an image as opposed to altering it.
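The three categories can be illustrated with a small sketch (Python for brevity — the thesis implements these operations in VHDL on the FPGA). A brightness adjustment is a point operation, applying a 3*3 Prewitt kernel is a neighbourhood operation, and binary erosion with 4-connectivity is a morphological operation. The nested-list image representation, border handling and clamping below are illustrative assumptions, not the project's implementation.

```python
def brighten(img, offset):
    """Point operation: each output pixel depends only on its own value."""
    return [[min(255, max(0, p + offset)) for p in row] for row in img]

def convolve3x3(img, k):
    """Neighbourhood operation: each output pixel depends on a 3*3 window.
    Border pixels are skipped for simplicity."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            s = sum(k[j][i] * img[y + j - 1][x + i - 1]
                    for j in range(3) for i in range(3))
            out[y][x] = min(255, max(0, s))
    return out

def erode(img):
    """Morphological operation (4-connectivity): a pixel stays set only if
    it and its four nearest neighbours are all set."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = 1 if (img[y][x] and img[y - 1][x] and img[y + 1][x]
                              and img[y][x - 1] and img[y][x + 1]) else 0
    return out

# Prewitt kernel responding to vertical edges.
prewitt_x = [[-1, 0, 1],
             [-1, 0, 1],
             [-1, 0, 1]]
```

Note that on a uniform region the Prewitt kernel sums to zero, which is why edge detectors produce dark output away from edges.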

More specifically, the project's main objectives are:


Review of the current system setup, in terms of, first, understanding its operation and, second, highlighting the limitations from the point of view of implementing image processing algorithms.

Research of image processing algorithms, with a view to selecting the most appropriate to implement on the system available.

Design of the selected image processing algorithms, and implementation on the current system.

Make recommendations in terms of changes to the current system architecture, in order to better cope with the varied demands of image processing algorithms.

    Report Organisation

Chapter 2 reviews the current system hardware and software. The system is scrutinised to determine its operation and limitations. Since the operation is well discussed elsewhere, the bulk of the focus in this chapter will be on the limitations of the system. While acknowledging the fantastic work done so far in developing the system via undergraduate and postgraduate projects, it is only with an understanding of the system's limitations that the design work here can begin. In addition, a critical analysis here will benefit any future work regarding improvements.

Chapter 3 introduces the theory of the image processing algorithms that are considered here. These are point, neighbourhood and morphological operations. In each case, without showing how, it is stated which particular operations are implemented. Two very useful tables are presented in this chapter that show the user what register values are required to select specific functions.

Having discussed the theory in chapter 3 and decided on the specific algorithms to implement, chapter 4 shows how the image functions were designed and implemented. First to be discussed is the new design of the DSP controller block, which allows for heavy pipelining of some image processing tasks. Then it is shown how two new sub-blocks have been created to cater for the separate categories of image processing tasks.

Chapter 5 focuses on one specific design feature, the line buffers, which involved a large amount of research and experiment. Line buffers are used to store one complete line of an image. Since, depending on the image width, they have the potential to use up a large amount of chip area on the FPGA, it was felt worthwhile to research the most efficient way to implement these area-hungry structures. The results are discussed.

Finally, chapter 6 discusses the theory of warping and morphing. Due to time constraints, no actual design work occurred in this area. However, as a guideline for future projects, the realisation of warping algorithms is discussed and a flowchart for one specific design is presented.


CHAPTER 2

    REVIEW OF CURRENT SYSTEM

    Introduction

This chapter describes in general detail the salient features of the system received from NUIG. First the architecture and operation is described; then the limitations are focused on, specifically with regard to implementing image processing algorithms.

Since a detailed overview of the system architecture has been provided elsewhere, in [10] and [11], a large amount of effort will not be dedicated here to reiterating this work. However, a brief description of the design will ensue, in order to give the reader a feel for the work to date. Of perhaps more significance, and not dealt with elsewhere, are the limitations of the current system. These limitations still exist, since it was not the purpose of this project to address them. However, it was important from the point of view of this project to identify these limitations, since they have a direct impact on the results of the work carried out here, i.e. the implementation of image processing algorithms. Addressing some of these limitations is the subject of current projects, whereas solving other problems highlighted here may provide topics for future work.

    Overview of the Current System

The system used for this project can, broadly speaking, be categorised into two main parts: hardware and software.

The hardware part consists of a PCB (Printed Circuit Board) based on the Xilinx Spartan-3 FPGA and developed by Digilent. The board itself is referred to as the Spartan-3 Starter Board [8].

The FPGA design is developed using the VHDL design language and the Xilinx Integrated Software Environment (ISE) 6.3i software. In addition, simulations were performed using ModelSim XE II/Starter 5.8c.

Work has already been done, by students of NUI Galway, towards developing an image processing system, and this is used as a starting point for this project. An overview of the FPGA design as it existed at the start of this project is described below. The design is well documented in [10] and [11]; however, a brief description will now ensue.

The design on the FPGA is hierarchical. The top-level block is called AppliedVHDL. This block consists of three sub-blocks:


The main sub-block is called NUIProject. The bulk of the design is contained within this sub-block. This is described in more detail later.

The second top-level sub-block is the UART (Universal Asynchronous Receiver Transmitter). Its role is to communicate data between the PC and the NUIProject block. Its main job is to convert serial data to parallel data and vice versa, since data within the FPGA is byte wide while data to and from the PC is via the serial port on the PC and the JTAG interface on the board.

Last is the display controller sub-block, dispCtrlr. It receives data from the NUIProject block and performs tasks such as binary to 7-segment code conversion, to allow data to be displayed on the board's 7-segment displays.

The NUIProject top-level sub-block is where the majority of the design resides. This block in turn is sub-divided into the following sub-blocks:

The Memory controller, MemCtrlr, controls the flow of data to and from the on-board RAM [12]. It does this by generating all the relevant RAM control signals upon a request to read or write to the RAM. This request could come from the IOCSR block or the DSP block. Data to/from the RAM is 32 bits wide.

The IOCSR block liaises between the UART on the one hand and the CSR and RAM on the other. The CSR (Control and Status Register) consists of 8 byte-wide registers. These registers allow the user to impart information to, or program, the system. The bottom 3 registers specify the RAM address if the user wishes to perform a read or write from/to a single RAM location. The next 2 registers allow the user to specify the number of RAM locations to use in its DSP operation. These two registers can therefore also be used to specify image size. The top three registers were unused. In this project it was decided to use the top two unused registers, CSR_6 and CSR_7, to program the image processing function required. This is discussed in a later chapter.
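As a rough illustration of how a group of byte-wide registers can carry a RAM address, the sketch below splits an address across three registers. The register names follow the CSR naming above, but the exact bit assignment used by the real hardware is not stated in the source, so the little-endian split here is an assumption.

```python
def address_to_csr(addr):
    """Split an up-to-24-bit RAM address across three byte-wide registers.
    Assumed layout: CSR_0 = low byte, CSR_1 = middle byte, CSR_2 = high byte."""
    assert 0 <= addr < 2**24
    return {
        "CSR_0": addr & 0xFF,
        "CSR_1": (addr >> 8) & 0xFF,
        "CSR_2": (addr >> 16) & 0xFF,
    }

def csr_to_address(csr):
    """Reassemble the address from the three registers."""
    return csr["CSR_0"] | (csr["CSR_1"] << 8) | (csr["CSR_2"] << 16)

# Example: the last memory quadrant starts at 30000H.
regs = address_to_csr(0x30000)
print(regs)                     # {'CSR_0': 0, 'CSR_1': 0, 'CSR_2': 3}
assert csr_to_address(regs) == 0x30000
```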

The Data controller, DatCtrlr, bundles 8-bit data from the UART into 32-bit-wide data for writing to RAM, and in the opposite direction it unbundles the data. In addition, this block selects between RAM data and CSR data when sending data to the UART for transmission.
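The bundling performed by DatCtrlr can be modelled as follows (Python, illustrative only — the byte ordering within the 32-bit word is an assumption, as the source does not state it).

```python
def bundle(bytes4):
    """Pack four 8-bit values into one 32-bit word, with the first byte in
    the least significant position (assumed ordering)."""
    assert len(bytes4) == 4 and all(0 <= b <= 0xFF for b in bytes4)
    word = 0
    for i, b in enumerate(bytes4):
        word |= b << (8 * i)
    return word

def unbundle(word):
    """Split a 32-bit word back into four 8-bit values."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

w = bundle([0x11, 0x22, 0x33, 0x44])
print(hex(w))                             # 0x44332211
assert unbundle(w) == [0x11, 0x22, 0x33, 0x44]
```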

Finally, and most importantly from this project's point of view, is the DSP block. This block has been redesigned during the course of this project to perform image processing tasks on an image stored in RAM. The processed image is written to a different location in RAM. This block is discussed in great detail in chapter 4.

The user interface software, used to download images and access memory and the CSR registers, was designed in NUI Galway. It was developed using the Visual Basic programming language. However, some 3rd-party translation programs (written in C) are also required, to translate bitmap images created in Microsoft Paint into greyscale images that can be downloaded to memory. More on this translation process later. The user interface program calls these translation programs automatically, and so the user is isolated as much as possible from this translation process.


    The user interface is shown in the figure below.

    Figure 1 User Interface for the AppliedVHDL system

The figure annotations note the following: the CSR registers can be written and read here, with data entered in hex format; individual RAM locations can be written and read, with addresses and data both in hex format; a bitmap image is selected to be loaded into the lowest quadrant in memory (00000H); the processed image is read from the last quadrant in memory (30000H); a second image is loaded into the second quadrant in memory (10000H), used if the processing task requires two images; and the size of the image in pixels is specified.

The description above of the AppliedVHDL system is admittedly brief; however, a more detailed one can be found in [10] and [11]. Attention will now turn, in the next section, to


the system's limitations. The purpose of this ensuing discussion is in no way to detract from the excellent work done so far, but to cast a critical eye, to determine where problems might be encountered in the future, and to point out areas where the system might be improved.

    Limitations of the Current System

The starting system, as it was received for this project, has, in my opinion, a number of limitations. Some of these have been addressed by this project; some have been addressed elsewhere; while some others still exist. This section will discuss these limitations from the point of view of the ultimate desired goal: that of a versatile, fast and efficient real-time image processing system. These limitations concern the following: a) image size, b) image transfer, c) image translation software, d) user interface software, e) hardware architecture and f) image processing functions. A short discussion ensues under each of these headings.

    Image Size

All image sizes are specified in pixels. For example, an image that has a size of 16*16 is square, with 16 pixels in the x-direction and 16 pixels in the y-direction. The total number of pixels for this image would then be 256. Each pixel value at present is represented by an 8-bit value and hence can be stored using a 1-byte location. There are currently a few obvious ways in which the image size of the system may be limited. The factors affecting image size are memory, the FPGA and the application software.

The first is memory. Currently the amount of memory available is 4 quadrants of 256 Kbytes. If the storing of images is limited to one quadrant, then the maximum image size is 512*512 pixels. It is possible, in the case where the image processing function only requires 1 image, to use half the available memory for the input image and the other half for the processed image. The image size then could be approximately 725*725 pixels. This would, however, involve a redesign of the memory controller. If, on the other hand, the processing of colour images is required, then 3 times as much memory is required for the same size of image, since each pixel will have three 8-bit values associated with it, one for each of the three primary colours: Red (R), Green (G) and Blue (B).
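The quoted image-size limits follow directly from the memory arithmetic; a quick check (Python, illustrative only):

```python
from math import isqrt

QUADRANT = 256 * 1024          # one memory quadrant: 256 Kbytes
TOTAL = 4 * QUADRANT           # four quadrants in total

# One 8-bit pixel per byte, so a square image of side n needs n*n bytes.
side_one_quadrant = isqrt(QUADRANT)    # largest square fitting one quadrant
side_half_memory = isqrt(TOTAL // 2)   # input image held in half of all memory

print(side_one_quadrant)   # 512, the 512*512 limit quoted above
print(side_half_memory)    # 724, i.e. approximately 725*725
```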

    The second factor that affects the size of image processed is the actual size of the FPGA. Currently it is a Xilinx Spartan-3. During the course of the project, it was discovered that, for some image processing functions, line buffers are required; in the minimum case, at least 3 line buffers. A great deal of effort was spent investigating the implementation of these line buffers on the FPGA. Initially it was thought the line buffers could be synthesised by modelling them in VHDL code. This would have the advantage of a semi-flexible image width, since changing the image size would require the change of only one parameter, width, before synthesis. The solution is only semi-flexible because if the image width changes, the design has to be re-synthesised and downloaded. However, it was discovered that this synthesis option led to a very inefficient use of FPGA resources, such that down this route the maximum image width that could be accommodated was 128 pixels. Using synthesis, the line buffers were implemented using flip-flops, and mapping fails when all of these are used up.

  • 7/30/2019 Towards a Programmable Image Processing Machine on a Spartan-3 FPGA

    17/115

    8

    The eventual solution fixed on an image width of 180 and then used the Xilinx CORE Generator to generate the line buffers. In this case the line buffers are more efficiently implemented using RAM-based shift registers. Using this method the image width can be increased to approximately 320 pixels, after which mapping fails.

    Finally, there is an image size limitation due to the application software. With the first version of the software received, the image size was limited to 8*8 pixels; when this size was exceeded the software crashed with a runtime error. A new version of the software was received and the image size limit is now 180*180 pixels. Images above this size, once again, cause the software to crash.

    In conclusion, the current image limit is 180*180 pixels, due to a user interface software limitation. If this is fixed then the next expected limit would be 320*320, set by the FPGA size due to the area used up by the line buffers. If this is overcome, by choosing perhaps a Virtex FPGA, then it is anticipated that the next hurdle would be presented by the memory size.

    Upload and Download of Images

    At the moment, the current maximum image size is 180*180 pixels. This limit is due to the application software, which terminates abruptly with a runtime error if this image size is exceeded.

    The current method of uploading and downloading images is via the serial port. The time it takes to download a 180*180 image is approximately 95 seconds; the time it takes to upload the processed image is approximately 125 seconds. The execution of the DSP function, i.e. internally processing the image, is effectively instantaneous, since it can operate at the system clock rate of 50 MHz. However, the DSP block must wait for data to be read from and written to RAM, and this reduces the effective clock rate by approximately a factor of 10. Even at this 5 MHz rate, a 180*180 frame would be processed in approximately 0.0065 seconds (180*180/5 MHz), or about 154 frames/sec. If the frame size were 640*480 the frame rate would be 16 frames/sec, not far off the 25 frames/sec required for real time. The bottleneck, however, occurs in transferring data between memory and the host PC.
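The throughput figures quoted above can be reproduced directly. The 5 MHz effective pixel rate and the one-pixel-per-effective-clock assumption are taken from the discussion above; the function name is illustrative.

```python
EFFECTIVE_RATE_HZ = 50_000_000 / 10  # 50 MHz clock derated ~10x by RAM waits

def frame_time_s(width, height, rate_hz=EFFECTIVE_RATE_HZ):
    """Seconds to process one frame, assuming one pixel per effective clock."""
    return width * height / rate_hz

print(round(frame_time_s(180, 180), 4))   # 0.0065 s
print(round(1 / frame_time_s(180, 180)))  # ~154 frames/sec
print(round(1 / frame_time_s(640, 480)))  # ~16 frames/sec
```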

    An approximately 3000-fold increase on the current communication throughput is required to approach real-time data rates. Projects in NUI Galway are underway to improve the situation; USB and Ethernet technologies are being investigated. These will certainly yield an improvement, but by exactly how much is as yet unknown.

    Image Type and Translation programs

    A great deal of confusion surrounds this area. This is primarily because 3rd-party software is used both in the creation of the bitmap images (Microsoft Paint) and in the translation of these images, eventually, to a txt file consisting of a series of commands used to write pixel values to the correct memory locations. The exact operation of this software, and the file types used, needed to be investigated, since the results obtained were often hard to understand. In particular, not changing the image attributes to


    Black and White in Microsoft Paint before saving seemed to cause data corruption, even though the attributes revert once the image is saved as a 256-colour bitmap. Clearly the problem lies in the fact that, on the one hand, our system requires greyscale images with 256 levels, where 00 represents black and FF represents white with various shades of grey in between, while on the other hand we are creating bitmap files which are colour, Indexed-type images. To help clarify the process, a brief discussion follows on the various image types.

    There are three main image types: Binary, Intensity and Indexed. Bitmap files are Indexed images, while Raw (binary) files are Intensity-type images.

    In a binary image, each pixel assumes one of two discrete values, zero (off) and one (on). The figure below shows the pixel values and corresponding image for a 4*4 pixel binary image.

    Figure 2 Binary Image File

    Intensity images are the most commonly used images within the context of image processing. These are also known as RAW data files. They are greyscale images, where the pixel value represents the grey intensity. If 8 bits are used for the pixel value then there are 256 possible grey levels, with 00 (Hex) representing black and FF (Hex) representing white. All values in between represent some shade of grey.

    Figure 3 8-bit Intensity image (4*4)

    An Indexed image consists of a data matrix and a colormap matrix. The colormap matrix has 3 columns of values, representing Red, Green and Blue values. Any colour can be made by combining varying amounts of these primary colours. The number of colours required determines the length of the columns; for example, if 256 colours are required then the colormap matrix will be 256 by 3.

    1 0 1 1
    0 1 1 0
    1 0 1 1
    1 1 1 1

    (a) Pixel Values   (b) Image

    255 160 80 0

    FF A0 50 00

    (a) Pixel values (decimal)   (b) Pixel values (Hex)   (c) Image


    The data matrix has a value for each pixel in the image. However, in this case these values do not directly represent the pixel shade; instead they are used to index the colourmap matrix. This is illustrated in the figure below.

    Figure 4 A 256-colour Bitmap (Indexed) image after the Black and White attribute is set.

    The map file (colour matrix) shown above is part of the colour matrix created by MS Paint after the attributes have been changed to Black and White but the file saved as 256 colour, hence the 256 entries. The point to note is that the colour white is at entry FF and black at entry 00. However, since the black and white attribute was set, only two colours will exist, black and white; in other words the data matrix will contain only the values 00 and FF. It seems these are the values that are used in the RAW file after translation is finished. The full map file, examined using Matlab, can be seen in the file map_bw. Part of the file is also shown in Appendix[1].

    If the black and white attribute is not set then we get the colour matrix file shown in thefigure below.

    Index  Red    Green  Blue
    00     0      0      0
    01     0.502  0.502  0.502
    02     0.502  0      0
    03     0.502  0.502  0
    04     0      0.502  0
    05     0      0.502  0.502
    06     0      0      0.502
    07     0.502  0      0.502
    08     0.502  0.502  0.251
    09     0      0.251  0.251

    FF 00 FC FA F9 07

    (a) Data Matrix

    Index  Red    Green  Blue
    00     0.000  0.000  0.000
    ::     ::     ::     ::
    07     0.752  0.752  0.752
    ::     ::     ::     ::
    F9     1.000  0.000  0.000
    FA     0.000  1.000  0.000
    FB     1.000  1.000  0.000
    FC     0.000  0.000  1.000
    FD     1.000  0.000  1.000
    FE     0.000  1.000  1.000
    FF     1.000  1.000  1.000

    (b) Color Matrix   (c) Image


    0A     0       0.502   1
    0B     0       0.251   0.502
    0C     0.251   0       1
    0D     0.502   0.251   0
    0E     1       1       1
    0F     0.7529  0.7529  0.7529
    10     1       0       0
    11     1       1       0

    Figure 5 Colourmap file if Black and White attribute is not set.

    The full file can be found at map_orig, or a large part of it is given in Appendix[2]. Note here that the colour white (1,1,1) is at entry 0E. Therefore, for an image with black and white colours only, the entries 00 and 0E appear in the data matrix table. Once again it is these values that appear in the RAW file and are ultimately downloaded to memory. This is wrong, since in our greyscale interpretation of the world FF should represent white, not 0E.

    MS Paint, as far as can be discerned, does not support greyscale images. To overcome this, Photostudio and Matlab were used to create a greyscale bitmap image with 256 shades of grey. In this case, for any entry in the colour matrix, all values in a row are the same; in other words, if the same amounts of Red, Green and Blue are mixed, a shade of grey is obtained. A portion of a greyscale colourmap is shown below.

    Index  Red     Green   Blue
    0      0       0       0
    1      0.0039  0.0039  0.0039
    2      0.0078  0.0078  0.0078
    3      0.0118  0.0118  0.0118
    4      0.0157  0.0157  0.0157
    5      0.0196  0.0196  0.0196
    6      0.0235  0.0235  0.0235
    7      0.0275  0.0275  0.0275
    8      0.0314  0.0314  0.0314
    9      0.0353  0.0353  0.0353
    10     0.0392  0.0392  0.0392
    ::     ::      ::      ::
    249    0.9765  0.9765  0.9765
    250    0.9804  0.9804  0.9804
    251    0.9843  0.9843  0.9843
    252    0.9882  0.9882  0.9882
    253    0.9922  0.9922  0.9922
    254    0.9961  0.9961  0.9961
    255    1       1       1

    Figure 6 Correct colourmap for a greyscale image.


    See the full greyscale colour matrix created by Photostudio in the file 256_grey.
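The index-versus-intensity pitfall described in this section can be made concrete with a small sketch. The two-entry `paint_map` below is a simplified stand-in for the MS Paint palette (white at index 0E), and `grey_map` is the identity greyscale ramp of Figure 6; both are illustrative.

```python
def indexed_to_raw(data, colormap):
    """Resolve each palette index to an 8-bit grey level via the colourmap.

    colormap maps index -> (r, g, b) with components in 0.0..1.0; for a true
    greyscale image r == g == b, so any one component gives the intensity.
    """
    return [round(colormap[i][0] * 255) for i in data]

# Simplified stand-in for the MS Paint palette: white sits at index 0x0E.
paint_map = {0x00: (0.0, 0.0, 0.0), 0x0E: (1.0, 1.0, 1.0)}
# Identity greyscale ramp (Figure 6): entry i holds shade i/255.
grey_map = {i: (i / 255, i / 255, i / 255) for i in range(256)}

data = [0x00, 0x0E, 0x0E, 0x00]         # black, white, white, black
print(indexed_to_raw(data, paint_map))  # [0, 255, 255, 0] -- correct shades
# Writing the raw indices directly, as the translation step effectively does,
# stores 0x0E for white instead of 0xFF -- the error described above.
print(data)                             # [0, 14, 14, 0]
```

With the identity greyscale ramp the index and the intensity coincide, which is why a proper greyscale colourmap makes the translation trivial.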

    Unfortunately, for some unknown reason, the translation programs in this project can only handle a grey bitmap image created in this way if the file is very small. For example, the bitmap image eye.bmp, shown on the left in the figure below, can be translated, but the bitmap image meg_8bitgray.bmp cannot. The first has a size of 32*32 pixels, the second a size of 180*180 pixels.

    a) 32*32 greyscale b) 180*180 greyscale

    Figure 7 Different size greyscale images

    However, the image NUIGImage1_180x180_bw.bmp, shown below, can be translated even though it is also 180*180 pixels in size. The difference is that, although a 256-colour matrix is used, only the values FF and 00 appear in the data matrix, i.e. black and white, whereas for the image meg_8bitgray.bmp the 256-grey colour matrix is used but the data matrix contains values other than 00 and FF.

    Figure 8 180*180 Black and white

    It is interesting to note that if the Matlab command imfinfo is used to show all essential information on an image file, then both of the above 180*180 files look exactly the same. The file info_meg_8bitgray can be compared with the file info_NUIGimage in Appendix[3].


    It is not within the remit of this project to solve the problems associated with the front-end software. However, it is hoped that the above analysis and review will be of benefit to whoever takes on this task. The job, in my opinion, consists of first taking the data matrix of a proper greyscale image created with 256 levels of grey, and then creating a C program, or other such program, to embed this data into the command file that is sent down to the board.

    Application Software

    In addition to the translation software, there also seem to be limitations in the user interface software itself. If the image size exceeds 180*180 pixels the interface simply crashes. This originally happened for a smaller image size with the first version of the software received; with the new version of the software, this limit has been increased.

    This project has developed a number of image processing functions. If an image is downloaded into memory, it should be possible to perform a number of these functions on the same image, even though the processed image would be overwritten each time; the time spent downloading the input image each time would then be saved. This is not the case: if successive image processing functions are attempted, the user interface program once again terminates abnormally.

    Hardware Architecture

    In addition to the current limit on memory size, discussed earlier, there are also constraints imposed by the present system in terms of how the memory is organised. At present each location in memory is 32 bits wide and therefore stores 4 pixels. There is therefore the extra hardware, and time, of ungrouping pixels when reading and regrouping pixels when writing. For most image processing functions, and those implemented here, this is not a serious problem. However, when it comes to Warping, where translation functions depend on pixel addresses, the way the memory is organised at present makes implementation of Warping algorithms very complicated, since each pixel does not have a unique address.
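The grouping and ungrouping of pixels can be modelled as follows. This is an illustrative software model of the 4-pixels-per-32-bit-word organisation, not the VHDL memory controller itself; placing the first pixel in the top byte is an assumption made for the sketch.

```python
def pack_word(pixels):
    """Pack four 8-bit pixels into one 32-bit word, first pixel in the top byte."""
    assert len(pixels) == 4
    word = 0
    for p in pixels:
        word = (word << 8) | (p & 0xFF)
    return word

def unpack_word(word):
    """Recover the four pixels; note no pixel has a memory address of its own."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]

w = pack_word([0x00, 0x80, 0xC0, 0xFF])
print(f"{w:08X}")      # 0080C0FF
print(unpack_word(w))  # [0, 128, 192, 255]
```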

    Image Processing Functions

    On receiving the system, the DSP function implemented in the dspBlk was reviewed. Though of no obvious practical use, it sufficed to illustrate the operation of the system in general, in terms of reading data, performing some function and writing back to memory.

    The DSP function performed was to subtract image1 from image2. This was done not on a pixel-by-pixel basis but on a 32-bit word basis. In VHDL the 32-bit values are treated as unsigned.

    The following examples explain:-

    Example 1:-

    If Image2 has only 4 pixels, all white (FF FF FF FF), and Image1 is all black (00 00 00 00), then the resulting calculation is (FF FF FF FF) - (00 00 00 00) = (FF FF FF FF).


    The resulting image is all white.

    However, if Image2 is all black and Image1 all white, then the resulting calculation is (00 00 00 00) - (FF FF FF FF) = (00 00 00 01).

    The result looks all black but in fact has one pixel value of 01, which is not quite black. In effect an overflow has occurred.

    Example 2:-

    If Image1 has 4 pixels shaded black, dark grey, light grey and white, then its values will be 00 80 C0 FF.

    If Image2 is all black (00 00 00 00) then the resulting calculation will be

    (00 00 00 00) - (00 80 C0 FF) = (FF 7F 3F 01)

    In summary then, the current DSP function, while useful in proving the general operation of the system, does not have any discernible merit as an image processing algorithm. It is the objective of this project to implement some commonly known image processing algorithms.
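The wrap-around in the two examples above follows from modulo-2^32 unsigned arithmetic and can be checked directly:

```python
MASK32 = 0xFFFFFFFF

def sub32(a, b):
    """Word-wise unsigned subtraction, as the version 1 DSP block performs it."""
    return (a - b) & MASK32

# Example 1: black minus white wraps around to ...01.
print(f"{sub32(0x00000000, 0xFFFFFFFF):08X}")  # 00000001

# Example 2: black minus the four shades 00 80 C0 FF.
print(f"{sub32(0x00000000, 0x0080C0FF):08X}")  # FF7F3F01
```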

    Conclusion

    This chapter has given a brief overview of the current system as it was received from NUI Galway. Since the system has been described elsewhere, only a summary is given here. Time was dedicated to putting the system through its paces, so to speak, in order to highlight any limitations that may have a direct effect on the size and nature of images that can be processed, as well as the type of algorithms that can be implemented. It was discovered that while the system works well for black and white images up to an image size of 180*180 pixels, it is not adept at handling greyscale images: the largest greyscale image that can be handled is 32*32 pixels. These limitations are due to the application software. If they are overcome, it was shown that there are other factors that will affect image size, namely memory size and FPGA size. Speed of image transfer is also an issue. At the moment, since a serial port is being used, image transfer speeds are barely tolerable once the image size exceeds 32*32 pixels. If real-time processing is an ultimate goal then faster connections need to be explored. Finally, when considering the FPGA architecture required for image processing, it is desirable that the FPGA should have a good number of flexible multipliers for computation, as well as memory for internal line buffers. Multipliers are not in abundance on the Spartan-3; however, their use was avoided here by carefully choosing algorithms for implementation. The use of line buffers cannot be avoided. It is hugely advantageous to have these on chip, and it is shown later how they have been implemented efficiently so as not to become a limiting factor. In other words, the XC3S200 is sufficient here, but only just.

    The next chapter discusses various image processing algorithms in general terms and indicates those that are chosen for implementation.


    CHAPTER 3

    IMAGE PROCESSING ALGORITHMS

    Introduction

    Image processing functions may be separated broadly into three categories: point, neighbourhood and morphological operations[13].

    Point operations are algorithms where each pixel in an image is processed individually, i.e. independently of all other pixels in the image. Operations can be unary, where only one image is involved, or binary, where two images are combined in some way. This project has tended to concentrate on unary point operations.

    Neighbourhood operations use not only the value of the current pixel but also those of its neighbours on the same line and on surrounding lines before and after. This requires large amounts of memory to store complete image lines, the number of lines stored depending on the size of neighbourhood required. Low- and high-pass filtering are typical neighbourhood operations, as is edge detection.

    Morphological operations are neighbourhood operations of a kind, but are more interested in the shapes and forms in images. They generally work on binary images; if the image is greyscale, a threshold operation of sorts is performed before the morphological operation. The most common morphological operations are Erosion and Dilation.

    Point Operations

    For 8-bit greyscale images, each pixel has a value between 0 and 255, where 0 represents black and 255 represents white; values in between represent varying shades of grey. Point operations involve taking each pixel value in turn and modifying it according to some mathematical formula. For example, the following are point operations, where x represents the input pixel value and f(x) represents the output or processed pixel value:

    Reduction in Intensity:


    The first two operations are straightforward and can easily be performed on a Spartan FPGA. The last is a more complicated mathematical operation. The most obvious way to implement it is by means of a LUT (Look-Up Table), since there are only 256 possible values of x.
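A software model of the LUT approach is sketched below. The particular transfer functions (invert, threshold at 80 Hex, brighten by 20 Hex with saturation) are illustrative choices, not necessarily the exact functions used in the design.

```python
def apply_lut(pixels, f):
    """Precompute f for all 256 input values, then map each pixel through the
    table -- the same structure a 256-entry LUT gives in hardware."""
    lut = [f(x) for x in range(256)]
    return [lut[p] for p in pixels]

invert    = lambda x: 255 - x                  # f(x) = 255 - x
threshold = lambda x: 255 if x >= 0x80 else 0  # greyscale to black and white
brighten  = lambda x: min(x + 0x20, 255)       # saturate rather than wrap

img = [0x00, 0x7F, 0x80, 0xF0, 0xFF]
print(apply_lut(img, invert))     # [255, 128, 127, 15, 0]
print(apply_lut(img, threshold))  # [0, 0, 255, 255, 255]
print(apply_lut(img, brighten))   # [32, 159, 160, 255, 255]
```

Since the table is built once and then indexed per pixel, an arbitrarily complicated f(x) costs no more per pixel than a simple one, which is exactly the appeal of the LUT on the FPGA.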

    These operations can be represented graphically, as shown in the figure below, as produced originally by Burdick[13]. All of the point operations illustrated in this figure will be implemented.


    Each panel plots output pixel value (0 to 255) against input pixel value (0 to 255):

    a) Unity. Input pixel values are unchanged
    b) Invert. All pixel values are inverted
    c) Threshold. Convert greyscale image to black and white
    d) Decrease contrast
    e) Increase contrast
    f) Decrease brightness
    g) Increase brightness

    Figure 9 Various Point Operations represented graphically (Burdick, Digital Imaging)


    Neighbourhood Operations

    Neighbourhood operations combine a number of pixels in a pixel's neighbourhood in some way to determine the output, or processed, pixel. How the pixels are combined (e.g. averaged) and the size of the neighbourhood depend on the algorithm in question. The most common and useful neighbourhood operation is convolution. It is effectively 2-D convolution, as opposed to the 1-D convolution associated with general DSP, since a 2-D kernel or convolution mask is passed over the input image in order to calculate the pixel values of the output image. The size of the kernel depends on the application, but in general a 3*3 kernel is sufficient. The kernel will therefore have 9 values or coefficients. It is these values that determine the exact function of the processing, for example low-pass filtering, high-pass filtering or, in this project's case, Prewitt edge detection.

    The convolution process is illustrated in the following figure.

    Figure 10 2-D Kernel is passed over the input image during convolution

    As depicted above, convolution involves passing the centre of the 2-D kernel (C12) over each image pixel value and multiplying the kernel coefficients by the overlapped image pixel values. The products are summed and divided by the weight of the kernel. The result is the pixel value of the output image, positioned in the same relative position as the centre coefficient of the kernel on the input image.

    An equation for the current output pixel (Q35) in the figure above is:

    Q35 = [(C01 * P24) + (C02 * P25) + (C03 * P26) + (C11 * P34) + (C12 * P35) + (C13 * P36) + (C21 * P44) + (C22 * P45) + (C23 * P46)] / (C01 + C02 + C03 + C11 + C12 + C13 + C21 + C22 + C23)

    C01 C02 C03
    C11 C12 C13
    C21 C22 C23

    (a) Kernel

    P01 P02 P03 ..  ..  ..  ..
    P11 P12 P13 ..  ..  ..  ..
    P21 P22 P23 P24 P25 P26 P27
    P31 P32 P33 P34 P35 P36 P37
     :   :  P43 P44 P45 P46 P47
     :   :  P53 P54 P55 P56 P57

    (b) Input Image


    Note that, as always in convolution, the kernel is first flipped before being passed over the input image. In practice this is not an issue since, as we will see, the kernel is generally symmetrical in both directions.
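The convolution just described can be prototyped in software before committing to hardware. The sketch below is illustrative: it normalises by the kernel weight, clamps the result to 8 bits, processes interior pixels only (the border-handling strategy is left open) and omits the kernel flip on the assumption of a symmetric kernel.

```python
def convolve3x3(img, kernel):
    """3*3 convolution with normalisation by the kernel weight.

    img is a list of rows of 8-bit pixels. Border pixels are left at zero,
    results are clamped to 0..255, and no kernel flip is performed (the
    kernel is assumed symmetric).
    """
    h, w = len(img), len(img[0])
    weight = sum(sum(row) for row in kernel) or 1  # avoid division by zero
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(kernel[j][i] * img[y + j - 1][x + i - 1]
                      for j in range(3) for i in range(3))
            out[y][x] = min(max(acc // weight, 0), 255)
    return out

low_pass = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # weight 9: a 3*3 average
img = [[90] * 5 for _ in range(5)]
img[2][2] = 255                                # one bright pixel
print(convolve3x3(img, low_pass)[2][2])        # (8*90 + 255) // 9 = 108
```

The low-pass kernel smears the single bright pixel into its neighbourhood, which is exactly the averaging behaviour described above.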

    The following figure demonstrates the effect of changing the coefficient values of the kernel.

    1 1 1
    1 1 1
    1 1 1

    a) Coefficients for a Low Pass Filter

    -1 -1 -1
    -1  9 -1
    -1 -1 -1

    b) Coefficients for a High Pass Filter

    c) Prewitt Horizontal and Vertical Edge Detection

    d) Sobel Horizontal and Vertical Edge Detection

    Figure 11 Examples of 3*3 Kernels

    Convolution, as can easily be seen, could require a huge amount of computational effort, depending on the actual coefficient values. Calculation of each output pixel value using a 3*3 kernel could potentially require 9 multiplications, 8 additions and 1 division. For edge detection, the actual output pixel value is not of interest as it is in the case of filtering, since it simply needs to be established whether an edge exists or not. Therefore the division step can be eliminated. The numerator, which represents the gradient, is simply examined and a decision made as to whether an edge exists depending on its value in relation

    -1 -1 -1
     0  0  0
     1  1  1

    -1 0 1
    -1 0 1
    -1 0 1

    (Prewitt horizontal and vertical kernels)

    -1 -2 -1
     0  0  0
     1  2  1

    -1 0 1
    -2 0 2
    -1 0 1

    (Sobel horizontal and vertical kernels)


    to a preset threshold. The output pixel is then set to FF (white) if an edge exists and 00 (black) otherwise.

    The resource savings possible, in terms of the number of multiplications and additions, are more than an order of magnitude, depending on the kernel size, if the 2-D kernel is separable and symmetric, as is the case for Prewitt. This is discussed in an application note by Altera[14]. In addition, if the coefficients are 1 then no multiplication is required. For these reasons the Prewitt edge detection algorithm was chosen as the most natural place to start in implementing edge detection algorithms on an FPGA, since it leads to the simplest architecture. The complexities involved in implementing other algorithms can then be easily deduced.

    In summary then, as an example of neighbourhood operations, Prewitt edge detection has been selected. This can be horizontal, vertical or full (vertical and horizontal combined), as selected by the user. In addition, the user will be able to program a threshold value against which the gradient values are compared. Changing this threshold value changes the sensitivity of the process, i.e. the higher the threshold value the fewer edges will be detected.
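The selected Prewitt scheme can be sketched in software as follows. The horizontal/vertical/full modes and the programmable threshold follow the description above; combining the two gradients as |Gh| + |Gv| is one common choice and is an assumption in this sketch, as is processing interior pixels only.

```python
PREWITT_H = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
PREWITT_V = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

def gradient(img, x, y, kernel):
    """Weighted sum of the 3*3 neighbourhood centred on (x, y)."""
    return sum(kernel[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def prewitt(img, threshold=0x20, mode="full"):
    """Mark each interior pixel FF if its gradient exceeds the threshold."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            g = 0
            if mode in ("full", "horizontal"):
                g += abs(gradient(img, x, y, PREWITT_H))
            if mode in ("full", "vertical"):
                g += abs(gradient(img, x, y, PREWITT_V))
            out[y][x] = 0xFF if g > threshold else 0x00
    return out

# A vertical black/white boundary: only the vertical detector fires on it.
img = [[0, 0, 255, 255]] * 4
print(prewitt(img, mode="vertical")[1])    # [0, 255, 255, 0]
print(prewitt(img, mode="horizontal")[1])  # [0, 0, 0, 0]
```

Raising the threshold suppresses weaker gradients, reproducing the sensitivity control described above.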

    Morphological Operations

    Morphological operations are generally concerned with the structure or form of an image, as opposed to its appearance. They are used extensively in automatic object detection when trying to establish the shape and size of objects. It could be argued that edge detection is a morphological operation; however, morphological operations are generally performed on binary images, i.e. black and white, where 1 represents white and 0 represents black. Morphological operations can still be performed on greyscale images, but first the image has to be converted to black and white. The two most common morphological operations are Erosion and Dilation. Here pixel values are modified depending on the binary values of their neighbours. Once again a 3*3 kernel is generally used, and in this respect morphological operations are very similar to neighbourhood operations. A common binary erosion mask is one where all the kernel values are 1, as shown below.


    Figure 12 2-D Kernel is passed over input image during Erosion

    Passing this mask over the image as shown above means that, for a pixel to remain white (i.e. 1), all its neighbours must be 1. Consider the kernel placed over the image as shown above; the following logical operation is performed to determine P35, the output pixel.

    For Erosion, P35(out) = 1 (white) if
    P35 AND P24 AND P25 AND P26 AND P34 AND P36 AND P44 AND P45 AND P46 = 1,
    else 0.

    This effectively means that the output pixel, if white in the first place, will only remain white if all its neighbours are white. The erosion above is performed using 8-connectivity. If 4-connectivity is used, then the corner pixels P24, P26, P44 and P46 for the specific operation above would not be used. The kernel for the 4-connectivity operation is shown in the figure below:

    Figure 13 Kernel for Erosion using 4-connectivity

    Therefore, with successive erosions, large white areas will shrink in size and small isolated white areas will disappear altogether. Erosion can be used to clean up object boundaries

    1 1 1
    1 1 1
    1 1 1

    (a) Kernel

    P01 P02 P03 ..  ..  ..  ..
    P11 P12 P13 ..  ..  ..  ..
    P21 P22 P23 P24 P25 P26 P27
    P31 P32 P33 P34 P35 P36 P37
     :   :  P43 P44 P45 P46 P47
     :   :  P53 P54 P55 P56 P57

    (b) Input Image

    0 1 0
    1 1 1
    0 1 0

    (Kernel for 4-connectivity)


    and remove spurious objects, to allow easy identification of objects by size and calculation of the number of objects present.

    Dilation is the opposite of erosion: here white areas dilate as opposed to erode. The pixel in question will become white if it was white already or if any of its neighbours is white. This time the following logical operation is performed to determine P35(out):

    For Dilation, P35(out) = 1 (white) if
    P35 OR P24 OR P25 OR P26 OR P34 OR P36 OR P44 OR P45 OR P46 = 1,
    else 0.

    Just as is the case for erosion, dilation can also be performed using 8-connectivity or 4-connectivity. As for erosion, with 4-connectivity dilation the corner values are not used.
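Both operations reduce to an AND (erosion) or an OR (dilation) across the neighbourhood, which a short software model makes clear. This is an illustrative sketch, not the VHDL implementation; it processes interior pixels only.

```python
OFFSETS_8 = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
OFFSETS_4 = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]  # corners omitted

def morph(img, dilate=False, connectivity=8):
    """Binary erosion (AND over neighbours) or dilation (OR over neighbours)."""
    offsets = OFFSETS_8 if connectivity == 8 else OFFSETS_4
    combine = any if dilate else all
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(combine(img[y + dy][x + dx] for dx, dy in offsets))
    return out

blob = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
dot = [[0] * 5 for _ in range(5)]
dot[2][2] = 1

print(morph(blob)[2])                              # [0, 0, 1, 0, 0]: the 3*3 blob erodes to its centre
print(morph(dot, dilate=True)[1])                  # [0, 1, 1, 1, 0]: 8-connectivity grows a 3*3 block
print(morph(dot, dilate=True, connectivity=4)[1])  # [0, 0, 1, 0, 0]: 4-connectivity grows a cross
```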

    All the morphological operations described above will be implemented. The following section specifies exactly the image processing algorithms that have been implemented. To allow the user to select and control the image processing tasks required, the previously unused bytes 6 and 7 within the CSR (Control Status Register) have been used.

    Summary of New Image Processing functions added to the system

    Previously unused bytes 6 and 7 of the CSR have been assigned to managing the image processing functions. The user, via the user interface, can select the image processing task required using byte CSR_6, while CSR_7 is used if any control values, e.g. threshold values, are required for the operation. Bit 3 of the CSR_6 byte is used to select between point operations and neighbourhood operations, while the three lower bits, CSR_6(2:0), are then used to select the specific operation within the category. The tables below show the exact values required for each operation.
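The CSR_6 bit assignment can be modelled to cross-check the tables that follow; the function and field names here are illustrative.

```python
def decode_csr6(csr_6):
    """Split CSR_6 into the category select (bit 3) and operation select (bits 2:0)."""
    category = "point" if csr_6 & 0b1000 else "neighbourhood"
    return category, csr_6 & 0b0111

print(decode_csr6(0x00))  # ('neighbourhood', 0) -> full edge detection
print(decode_csr6(0x03))  # ('neighbourhood', 3) -> erosion, 8-connectivity
print(decode_csr6(0x09))  # ('point', 1)         -> invert
print(decode_csr6(0x0E))  # ('point', 6)         -> greyscale to black and white
```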


    Description                                 CSR_6(7:0) Hex   CSR_7(7:0) Hex
    Full Edge Detection                         00               Edge threshold value. A good starting point is 20 (Hex)
    Horizontal Edge Detection Only              01               Edge threshold value. A good starting point is 20 (Hex)
    Vertical Edge Detection Only                02               Edge threshold value. A good starting point is 20 (Hex)
    Erosion with 8-connectivity                 03               Not Applicable
    Erosion with 4-connectivity                 04               Not Applicable
    Dilation with 8-connectivity                05               Not Applicable
    Dilation with 4-connectivity                06               Not Applicable
    Not used (Full Edge Detection by default)   07

    Table 1 Neighbourhood and Morphological Image Processing functions added to the system


    Description                                          CSR_6(7:0) Hex   CSR_7(7:0) Hex
    No change                                            08 (0000 1000)   Not Applicable
    Invert                                               09 (0000 1001)   Not Applicable
    Brighten                                             0A (0000 1010)   Value determines the amount to brighten by
    Darken                                               0B (0000 1011)   Value determines the amount to darken by
    Increase contrast                                    0C (0000 1100)   Value determines the amount to increase contrast by
    Decrease contrast                                    0D (0000 1101)   Value determines the amount to decrease contrast by
    Convert Greyscale to Black and White                 0E (0000 1110)   Value determines the grey value threshold to use
    Not used (Greyscale to Black and White by default)   0F (0000 1111)   Value determines the grey value threshold to use

    Table 2 Point Image Processing Operations added to the system

    Conclusion

    This chapter discussed the theory behind some point, neighbourhood and morphological image processing tasks. Though admittedly not an extensive discussion, it hopefully conveys enough information to give an appreciation of the task of implementing these operations. Implementation is the focus of the next chapter. All the tasks listed in the two tables above will be implemented. For a more detailed discussion of image processing theory, see Burdick's excellent book[13], where many algorithms are implemented in C. Awcock's book[15] provides another perspective, and for a very mathematical treatment of the subject see [16].


    CHAPTER 4

    THE NEW DSP CONTROLLER BLOCK AND IMAGE PROCESSING SUB-BLOCKS

    Introduction

    This chapter focuses on the implementation of the image processing algorithms described in the last chapter.

    A new version of the DSP controller has been designed. This will henceforth be referred to as version 2, to distinguish it from the original one received from NUI. This original DSP controller will be referred to as version 1. Version 2 is almost pin compatible with version 1, except for two new 8-bit inputs, CSR_6 and CSR_7, that are used to select and control the image processing function required. Though the footprint is very similar to the old version, which allows for easy replacement, the internal architecture is very different.

    There are two major differences between version 1 and version 2 of the DSP controller.

    First, the controller operation and image processing operations have been separated structurally by creating two new sub-blocks within the DSP controller to handle the various image-processing operations. An overview of the DSP controller version 2 architecture is shown in figure 14 below.

    Figure 14 Overview of DSP block architecture
    [Figure: DSPblk.vhd contains a State Machine plus two sub-blocks: Pixel_Img_Proc.vhd (Point Operations) and Kernel3_Img_Proc.vhd (Neighbourhood Operations)]


    The central part of the architecture is the state machine controller that takes care of data transfer to and from the RAM and to and from the sub-blocks. The image processing tasks have been divided into two groups: point operations (pixel_img_proc.vhd) and neighbourhood operations (kernel3_img_proc.vhd).

    The second major change is a redesign of the state machine within the DSP controller. Version 1 has a major limitation that needed to be overcome before time-intensive image processing algorithms could be implemented. The state machine in DSP controller version 1 is designed in such a way as to assume that all DSP functions can happen in one clock cycle. This is not the case for even the simplest of functions here, since even stripping the individual pixel values from the 32-bit values read from RAM takes at least one clock cycle. Note that operations in version 1 of the controller were previously performed on 32-bit values.

    A brief overview of the cycles performed in the state machine of version 1 is illustrated below:

    Figure 15 Main cycles of DSP Block Version 1
    [Figure: Read data, Perform Task, Write data]

    This is sufficient when the Perform Task step only takes one clock cycle. However, long delays would be introduced if the task were heavily pipelined, as is the case when neighbourhood operations are performed.

    Version 2 of the DSP controller addresses this problem by allowing for the fact that various DSP tasks may be pipelined and therefore there may be a large time lag of many clock cycles between when the function block receives valid data and when valid data is output. In version 2 of the DSP block the state machine has been modified to perform the cycles shown in the figure below.


    Figure 16 Main Cycles of DSP Block Version 2
    [Figure: while data is left to read, continue reading and passing data to the function block until valid data appears; when valid data appears, continue reading and writing until all data is read; when all data is read, continue writing until all data is written]

    From the above it can be seen that at any stage the controller might be continuously reading data (at the beginning), alternating between reading and writing data (once the time lag of the sub-block or function block has been overcome), or continuously writing data (towards the end).
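    The read/write interleaving of figure 16 can be modelled in software to check the counting logic. The following Python sketch is a behavioural model only (the names are mine; the authoritative controller is the VHDL state machine in Appendix [4]), assuming one data enable is issued per cycle and writes may begin once the pipeline has filled:

```python
def dsp_cycles(total_locations, pipeline_length):
    """Model the version-2 controller: read until the sub-block's pipeline
    has filled, then alternate reads and writes, then flush remaining
    writes. Returns the sequence of 'R'/'W' operations performed."""
    ops = []
    read_cnt = write_cnt = 0
    enables = 0  # data enables issued to the function block
    while write_cnt < total_locations:
        if read_cnt < total_locations:
            ops.append("R")
            read_cnt += 1
        # A data enable is issued every cycle; once reads are exhausted the
        # enables simply pad out (flush) the pipeline.
        enables += 1
        if enables > pipeline_length and write_cnt < read_cnt:
            ops.append("W")
            write_cnt += 1
    return ops

ops = dsp_cycles(total_locations=6, pipeline_length=2)
```

    For six locations and a pipeline length of two, the model produces three leading reads, an alternating middle phase, and trailing writes, matching the three phases described above.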

    The design of the 3 VHDL blocks that make up the new DSP controller will now be examined. First the top-level block DSPblk.vhd is described in more detail, and following that the two sub-blocks, pixel_img_proc.vhd and kernel3_img_proc.vhd, are discussed.

    The DSP Controller, DSPblk.vhd

    The pinout for DSP block version 2 is shown below. As can be seen, the pinout is identical to version 1, except for the addition of two new 8-bit input buses called CSR_6 and CSR_7. These two bytes are the upper two bytes of the CSR (Control Status Register) and up until now were unused. The user has access to these bytes via the user interface and can therefore read from and write to them. Making these two bytes available in the DSP controller block allows their values to be passed to the image processing sub-blocks, where they are used to select and control the image-processing task required. Chapter 3



    documents the CSR values required to execute each task. The following is a complete list of the DSP block's pins.

    Pin                 Description

    clk                 System clock
    rst                 System reset
    dspActive           This signal is used to activate the DSP block. While it is 0, the DSP block's main state machine remains in the idle state
    DatFromRam(31:0)    Data read from RAM. Each RAM location is 32 bits wide
    DspAddRange(15:0)   Number of RAM locations to be accessed. This is determined by the size of the image
    CSR_6(7:0)          This input has been added in order to allow an image processing task to be selected by the user. This value can be accessed via the user interface
    CSR_7(7:0)          This input has been added and is used by some of the image processing functions as a threshold value. It is also set by the user
    ramDone             Used to indicate to the DSP when a read or write to RAM is completed

    Table 3 DSP Block Inputs

    dspDone             Signal used by dspBlock to indicate it has finished the allotted task
    dspRamWr            Enable RAM wr access by dspBlock
    dspRamRd            Enable RAM rd access by dspBlock
    dspDat2Ram(31:0)    Data from dspBlock to be written to RAM
    dspRamAdd(17:0)     RAM address (from dspBlock)

    Table 4 DSP Block Outputs

    A more detailed view of the architecture of the DSP block is shown below. The complete VHDL code for the DSP block can be found in Appendix [4].


    Figure 17 DSP Block Architecture


    The external signals in figure 17 above are shown in bold. A description of the internal signals is given in the table below:

    DataEn           Generated by state machine to indicate to sub-blocks that data is available
    ldcnt0           Generated by the state machine to start the read and write address counters
    incReadcnt       Generated by the state machine to increment the read address counter
    incWritecnt      Generated by the state machine to increment the write address counter
    CS               Current state of the state machine
    Writecnt         Keeps track of number of locations written
    Readcnt          Keeps track of number of locations read
    Cnt              Keeps track of number of times DSP function block has been enabled
    Pipeline_lnt     Pipeline length for active sub-block
    DataVal          Indicates to state machine that data is valid from sub-block
    Finish_Writing   DSP block is finished writing
    Finish_Reading   DSP block is finished reading
    Total_length     Total number of data enables required to complete task

    Table 5 Internal Signals of the DSP block

    The DSP block contains the two sub-blocks. The sub-block pixel_img_proc is responsible for point operations, while the sub-block kernel3_img_proc is responsible for neighbourhood and morphological operations. Both of these blocks are activated and fed data; however, only the output of one is chosen, using a multiplexer. Bit 3 of the CSR_6 register, which is set by the user, determines which block output is chosen. The lower bits of the CSR_6 register determine which particular function is selected within the sub-block.

    The main part of the DSP block is the state machine, which controls all activity including data transfer to and from the RAM.

    A flowchart that describes the state machine operation in detail is shown below. The VHDL code can be found in Appendix [4].


    Figure 18 DSP block main State Machine

    As explained earlier, the state machine has been expanded to allow for the fact that the operation of the sub-blocks can be heavily pipelined. Each of the sub-blocks will now be explained in more detail.


    The Point Operations sub-block (pixel_img_proc.vhd)

    The Pixel Image processing block is a sub-block within the main DSP controller block. It has the task of performing any point operations required on the image, for example converting a greyscale image to black and white. Point operations were discussed in chapter 3. Table 2 listed all the point operations that are implemented by this block.

    The following is a complete list of the block's pins.

    Pin                       Description

    clk                       System clock
    rst                       System reset
    RamDataIn(31 downto 0)    Data read from RAM. Each RAM location is 32 bits wide
    DataEn                    Used to register the RAM data when it is valid
    CSR_6(7 downto 0)         At present only the lower three bits, CSR_6(2:0), are used to select the specific point operation required
    CSR_7(7 downto 0)         Some operations require a threshold value. CSR_7 is used to pass this threshold value

    Figure 19 Inputs of the Point Operations block

    There are only two outputs, relating to the processed data and the pipeline length.

    RamDataOut(31 downto 0)     Processed data sent back to the DSP controller block
    Pipeline_lnt(15 downto 0)   Pipeline length of the operation. This is required by the DSP controller in order to determine when the data will become valid. At present all point operations have the same pipeline length

    Figure 20 Outputs of the Point Operations block

    A detailed view of the block architecture is shown in figure 21 below. The VHDL code for the block is given in Appendix [5].


    Figure 21 Architecture of the Point Operations block

    The 32-bit data from the RAM is first separated into four individual 8-bit pixels. A multiplexer controlled by the lower three bits of the CSR_6 register determines which point operation is selected. The CSR_7 register value is used by all but two of the operations, as illustrated above. Finally, the processed bytes are reassembled into a 32-bit word before being made available to the main DSP block for writing to RAM. Note that there is in effect only a 32-bit register at the input and output, with the rest of the internal logic being combinational. The four bytes are processed in parallel, benefiting from the power of a hardware implementation. Only the combinational logic for one of the bytes (pixel 0) is shown in figure 21 above. The pipeline length of this block in terms of the DataEn signal is 2, i.e. very small, since no neighbourhood pixel values are required.
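    The datapath just described (unpack four pixels, operate on each in parallel, repack) can be mirrored in software. The Python sketch below is a behavioural model of a few of the Table 2 operations; the function and argument names are mine, and the authoritative logic is the VHDL in Appendix [5]:

```python
def saturate(v):
    """Clamp to the 8-bit range, as the hardware adders must."""
    return max(0, min(255, v))

def point_op_word(word32, op, csr_7=0):
    """Apply a point operation to all four 8-bit pixels packed in one
    32-bit RAM word, mimicking the parallel per-byte datapath."""
    pixels = [(word32 >> (8 * i)) & 0xFF for i in range(4)]  # unpack
    if op == "brighten":
        pixels = [saturate(p + csr_7) for p in pixels]
    elif op == "darken":
        pixels = [saturate(p - csr_7) for p in pixels]
    elif op == "invert":  # invert around the centre value of 80 hex
        pixels = [255 - p for p in pixels]
    elif op == "threshold":  # greyscale to black and white
        pixels = [0xFF if p >= csr_7 else 0x00 for p in pixels]
    out = 0
    for i, p in enumerate(pixels):  # repack into a 32-bit word
        out |= p << (8 * i)
    return out
```

    For instance, brightening the word 40302010 (hex) by 20 (hex) yields 60504030 (hex), each byte having been increased independently.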

    The greyscale image shown in figure 22 below will be used to illustrate the effect of the various operations. It is 32*32 pixels in size, although it has been stretched here for better visibility at the expense of quality. If reading this report online, clicking any of the images in this report should invoke Microsoft Paint to show the image.


    Figure 22 Original greyscale image used to illustrate point operations (eye_32*32_grey.bmp)

    The following figure shows the effect of the various point operations on this image and the specific values used for CSR_6 and CSR_7.

    Processed Image   Function                                CSR_6(7:0) hex   CSR_7(7:0) hex

    (image)           No Change (a useful test to ensure      08               Not Applicable
                      pipeline delay etc. is correct)
    (image)           Invert (all pixel values are inverted   09               Not Applicable
                      around the centre value of 80 hex)
    (image)           Brighten                                0A               Value used to brighten each pixel by (20 in this example)
    (image)           Darken                                  0B               Value used to darken each pixel by (20 in this example)
    (image)           Increase Contrast                       0C               Value used to adjust pixel value by (20 in this example)
    (image)           Decrease Contrast                       0D               Value used to adjust pixel value by (20 in this example)
    (image)           Convert to Black and White              0E or 0F         Threshold value to use in determining whether a pixel should be set to black or white (56 hex in this example)

    Figure 23 Processed images from various point operations

    In addition to viewing the images, the actual pixel values were also examined using Matlab. The No Change operation is a useful function to ensure pixel values are not being corrupted or shifted in any way as a result of processing.

    The following Matlab command can be used to read a bitmap image.

    [im1 map1] = imread('eye_32x32_grey.bmp');

    im1 will be a 32*32 matrix containing the pixel values, whereas map1 is the 256*3 colourmap.

    Individual pixel values can then be examined as follows. For example, the following command looks at a 5*5 subset of a 32*32 matrix.

    im1(8:12,3:7)

    The original image has the following pixel values for the subset im1(8:12,3:7) i.e. 25 values.


    122 110 96 86 79

    104 90 79 79 82

    99 92 88 92 94

    105 108 111 111 104

    123 125 127 124 121

    Table 6 Pixel values of the original image (subset (8:12,3:7))

    When the pixels are brightened by 20 (hex), i.e. 32 (decimal), their values become as shown in the table below. This can be obtained by reading the image into Matlab and examining the pixel values.

    The Matlab commands used are:

    [im3 map3] = imread('point_brighten.bmp');
    im3(8:12,3:7)

    while the pixel values of the brightened image are as follows:-

    154 142 128 118 111

    136 122 111 111 114

    131 124 120 124 126

    137 140 143 143 136

    155 157 159 156 153

    Table 7 Pixel values of the brightened image (subset (8:12,3:7))
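    The same check can be made without Matlab. The short Python sketch below reproduces the Table 6 to Table 7 comparison, assuming the hardware brightens by adding the CSR_7 value and saturating at FF (hex):

```python
# Table 6: pixel values of the original image, subset (8:12,3:7).
original = [
    [122, 110,  96,  86,  79],
    [104,  90,  79,  79,  82],
    [ 99,  92,  88,  92,  94],
    [105, 108, 111, 111, 104],
    [123, 125, 127, 124, 121],
]

# Brighten by 20 hex (32 decimal), clamping at 255 as the hardware does.
brightened = [[min(255, p + 0x20) for p in row] for row in original]
```

    The resulting matrix matches Table 7 value for value, confirming the hardware arithmetic.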

    In conclusion, the implementation of point operations proved successful and relatively straightforward. The next section discusses the implementation of neighbourhood and morphological operations.


    The Neighbourhood & Morphological Operations sub-block (kernel3_img_proc.vhd)

    In addition to point operations, the DSP block has also been redesigned to perform neighbourhood and morphological operations. The neighbourhood operation chosen is Prewitt edge detection, while the common morphological operations of Erosion and Dilation are also implemented. Since all of these operations require the use of a 3*3 kernel, they have been grouped together into a sub-block called kernel3_img_proc.vhd.

    The VHDL code for this block is available in Appendix [6].

    The user chooses between point and these neighbourhood/morphological operations using bit 3 of the CSR_6 register (CSR_6(3)=0 => neighbourhood operation, CSR_6(3)=1 => point operation). Neighbourhood operations are far more complex to implement since they require knowledge of a pixel's neighbourhood values. This implies a lot of memory, ideally internal, easily accessible memory, since neighbourhood values must be readily available. The size of memory or buffering required depends on the image size and the kernel size. This will be discussed further in the next chapter, where implementation issues are explained.

    For this project, which is an exploration of image processing on FPGAs and the issues involved, it was decided to fix upon the widely used default kernel size of 3*3 pixels.

    In the case of edge detection, a threshold value also needs to be set to determine how sensitive the edge detection process should be. CSR_7 is used to allow the user to pass this threshold value. A value of 20 hex is a good starting value for the threshold; this can then be increased or decreased as required.

    The pinout for this sub-block is the same as for the point operations sub-block. This is very deliberate, since it was felt all image processing sub-blocks could, for ease of use and simplicity, have the same pinouts.

    The architecture of this sub-block has 3 main parts. Each of these parts is shown on a separate sheet below.

    The first part of this sub-block's architecture is depicted below. It shows RAM data entering the block on a 32-bit bus, being separated into 8-bit pixels and then entering the three line buffers. Three line buffers are required to make neighbourhood values from the preceding line and the next line available, as well as the current line. The line buffers are enabled using the Pixel_En signal, which is also generated in this block. Four pixel enable signals are generated each time a Data_En signal is generated, as illustrated in the timing diagram below. As can be seen, each valid DataEn activates the Pixel_En signal for 4 clock cycles. A great deal of time was spent considering the actual implementation of the line buffers, and the result of this work is presented in the next chapter. At this point it suffices to note that the line buffer length is equal to the image width.
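    As a software analogue of this arrangement, the sketch below (Python; class and method names are mine, and the real buffers are hardware shift registers) models three cascaded line buffers of length equal to the image width and shows how they expose a 3*3 window around the current pixel:

```python
from collections import deque

class LineBuffers:
    """Three shift-register line buffers, each one image-width long.
    Once enough pixels have been shifted in, the newest three entries of
    each buffer form the 3x3 neighbourhood of the centre pixel."""
    def __init__(self, width):
        self.rows = [deque([0] * width, maxlen=width) for _ in range(3)]

    def shift(self, pixel):
        # On each Pixel_En the new pixel enters the bottom buffer; each
        # buffer's oldest pixel cascades into the buffer above it.
        cascade_mid = self.rows[1][0]
        cascade_bot = self.rows[2][0]
        self.rows[0].append(cascade_mid)
        self.rows[1].append(cascade_bot)
        self.rows[2].append(pixel)

    def window(self):
        """The current 3x3 kernel window (rows: previous, current, next)."""
        return [list(r)[-3:] for r in self.rows]
```

    Feeding in a 4-pixel-wide image row by row, the window eventually holds one column-aligned 3*3 patch per shift, which is exactly what the kernel operations consume.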


    Figure 24 Part 1 of the Architecture of the Kernel3_img_proc block

    Figure 25 Timing of DataEn and Pixel_En signals

    The second part of this sub-block, shown in figure 26, consists of the actual operations described above. All of these operations use the pixel values supplied by the line buffers. Part 2(a) below shows the implementation of the following morphological operations:

    - Erosion using 8-connectivity



    - Erosion using 4-connectivity
    - Dilation using 8-connectivity
    - Dilation using 4-connectivity
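    As a behavioural reference for these four operations, the following Python sketch applies erosion or dilation to a binary image under either connectivity. It is my own formulation of the standard definitions, not the VHDL itself; out-of-image neighbours are assumed to read as 0:

```python
# 4-connectivity uses the centre plus N/S/E/W neighbours; 8-connectivity
# additionally includes the four diagonals.
OFFSETS_4 = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
OFFSETS_8 = OFFSETS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def morph(img, op, offsets):
    """Erode or dilate a binary image (values 0/1). Erosion keeps a pixel
    only if every structuring-element neighbour is set; dilation sets a
    pixel if any neighbour is set."""
    h, w = len(img), len(img[0])
    def at(r, c):
        # Pixels outside the image are treated as background (0).
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0
    combine = all if op == "erode" else any
    return [[1 if combine(at(r + dr, c + dc) for dr, dc in offsets) else 0
             for c in range(w)] for r in range(h)]
```

    Eroding a 3*3 block of set pixels with 8-connectivity leaves only its centre pixel, while dilating it grows the block outwards by one pixel in every direction, which is the expected shrink/grow behaviour of these operators.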

    Meanwhile, part 2(b) of this sub-block, depicted in the following diagram, illustrates the implementation of the Prewitt edge detection algorithm. This consists of the following:

    - Horizontal Edge Detection
    - Vertical Edge Detection
    - Full Edge Detection (Horizontal and Vertical combined)
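    For reference, the computation these three options perform can be sketched in Python. This is a software model only; the hardware works from the line-buffer window and compares against the CSR_7 threshold, and the |Gx| + |Gy| magnitude approximation is my assumption about the combining step:

```python
# Standard 3x3 Prewitt kernels: Gx responds as the kernel moves
# horizontally (so it picks out vertical lines), Gy as it moves
# vertically (picking out horizontal lines).
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

def convolve3(window, kernel):
    """Sum of element-wise products of a 3x3 window and kernel."""
    return sum(window[r][c] * kernel[r][c] for r in range(3) for c in range(3))

def prewitt_edge(window, threshold, mode="full"):
    """Return 0xFF (edge, white on black) or 0x00, using |Gx| + |Gy| as
    the gradient magnitude; 'mode' picks which gradients contribute."""
    gx = abs(convolve3(window, PREWITT_X)) if mode in ("full", "horizontal") else 0
    gy = abs(convolve3(window, PREWITT_Y)) if mode in ("full", "vertical") else 0
    return 0xFF if gx + gy >= threshold else 0x00
```

    A window containing a vertical step edge triggers horizontal detection but not vertical detection, matching the behaviour described for figures 31 and 32 below.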

    Finally, the third part of this sub-block's architecture, illustrated in figure 28, shows how one of the operation outputs is selected using the value of the CSR_6 register. Since there are fewer than eight operations, only the lower three bits are required. Once the appropriate operation is selected, the processed pixel values from this block are reassembled into 32-bit words and passed back to the DSP controller block.


    Figure 26 Part 2(a) of the Architecture of the Kernel3_img_proc block


    Figure 27 Part 2(b) of the Architecture of the Kernel3_img_proc block


    Figure 28 Part 3 of the Architecture of the Kernel3_img_proc block


    In chapter 2, the limitations of the current system were discussed. One of these limitations concerns the size of greyscale images: it was found that the user software was only reliable for greyscale images up to 32*32 pixels. For Black and White images the maximum image size is 180*180. Edge detection can generally be performed on both greyscale and Black and White images, and therefore the implementation was tested using the 32*32 greyscale eye image shown earlier as well as the following Black and White image received from NUI Galway.

    First, using the eye image, the following results were obtained for Full Edge Detection using different threshold values, which have the effect of making the process more or less sensitive to edges.

    Figure 29 Original eye 32*32 pixels image

    CSR_7 = 20    CSR_7 = 30    CSR_7 = 17

    Figure 30 Full Edge Detection with different threshold values

    Horizontal edge detection means detecting edges as the kernel moves horizontally across the image; in effect, therefore, it will detect vertical lines best. The image above consists mostly of horizontal lines, and therefore horizontal edge detection in this case detects few edges, even at the sensitive value of CSR_7 = 17, as shown in the figure below.


    CSR_7 = 17

    Figure 31 Horizontal Edge Detection with a sensitive value of CSR_7

    On the other hand, for this particular image, vertical edge detection will detect quite a lot, since these are the edges detected as the kernel moves vertically down the image. Vertical edge detection will be good at detecting horizontal lines.

    The result of vertical edge detection on this image is shown below.

    CSR_7 = 17

    Figure 32 Vertical Edge Detection

    Note that if the results of horizontal edge detection and vertical edge detection are added, the result is full edge detection for this particular threshold value.

    If a Black and White image is used as opposed to a greyscale image, then the user software, for some reason unknown at present, can cope with larger image sizes, up to 180*180 pixels. The image below, received from NUI, will be used to illustrate edge detection on larger (180*180) black and white images. Since the images are black and white, edge detection will obviously not be as sensitive to the threshold value: pixel values are FF or 00 with nothing in between, hence gradient values will either be very large or very small.


    Figure 33 Black and White image received from NUI (NUIGImage1_180*180.bmp)

    The results of full, horizontal and vertical edge detection are shown below:

    Full Edge Detection with CSR_7 = 20    Horizontal Edge Detection with CSR_7 = 20    Vertical Edge Detection with CSR_7 = 20

    Figure 34 Full, Horizontal and Vertical Edge Detection on Black and White image (180*180)

    Note, edges are depicted here as white lines on a black background. It is very simple to change the code to have edges shown as black lines on a white background.

    Erosion and Dilation are generally performed on binary images. Though the implementation here does allow greyscale images to be used, it is such that only the m.s.b. of the pixel value is examined, thus performing a sort of black and white conversion before erosion or dilation. From the point of view of illustrating the results of the erosion and dilation implementations, specific images have been designed to better demonstrate the usefulness of these functions.
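    The m.s.b. test described above is equivalent to thresholding the greyscale value at 80 (hex), as the following illustrative Python snippet shows (the function name is mine):

```python
def msb_binarize(pixel):
    """Examining only the m.s.b. of an 8-bit pixel is equivalent to
    thresholding the greyscale value at 0x80: values 0x80..0xFF map to
    white, values 0x00..0x7F map to black."""
    return 0xFF if pixel & 0x80 else 0x00
```

    In hardware this costs nothing: the comparison is simply a wire from bit 7 of the pixel.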

    The first image developed to illustrate th