
Report on

Design solution for utilization in a low cost portable Ultrasound machine for Telemedicine.

Submitted to Dr. Ajai Jain, Professor, Dept. of Computer Science and Engg.

IIT Kanpur, July 10, 2006

By Jayant Kumar (Y4174)

Naresh Kumar Kachhi (Y4248) Rishi Kumar (Y4355) Sankalp Bose (Y4382) Shishir Jain (Y4414)


Table of Contents

Abstract
Acknowledgements

Chapters
1. Fundamentals of image formation using ULTRASONIC waves
2. Simulation on MATLAB
3. Excitation Pulse
4. Mapping of excitation pulse into SRAM
5. Design flow for generation of excitation pulse for ULTRASOUND Imaging
6. The Interface for communication of streaming images
7. Using the Memory on the FPGA
8. Design components
9. Image Compression


ABSTRACT

Ultrasonic waves are mechanical waves generated by applying a potential difference across a transducer. The conversion of electrical pulses into mechanical vibrations, and the conversion of the returned mechanical vibrations back into electrical energy, is the basis of ultrasonic testing. In the present work we address three different issues in the development of a design solution for use in a low-cost portable ultrasound machine for telemedicine.

The first problem deals with the generation of excitation pulses. Ultrasonic imaging is done with a system of linear transducers; an array of 64 to 128 elements is used for this purpose, and a larger number of transducers allows a wider area to be scanned. To steer the beam in a particular direction, an appropriate set of delays is applied across the transducers. This delay is implemented by applying electric signals across the transducers at staggered instants of time: the excitation pulse is simply the switching of each transducer on and off at regular intervals with the proper time delays. The excitation pulse is mapped onto the FPGA by writing Verilog code, which is used for steering and focusing on a number of scatterers placed in the region of interest. In the second problem we develop a framework for lossless compression of streaming images using the idea of sparse matrices. The third problem deals with designing hardware controller components such as the SDRAM, SRAM, PS/2 mouse, PS/2 keyboard, RS-232 transceiver and VGA controllers.


Acknowledgements

Our project supervisor, Dr. Ajai Jain, is the person who gave us the opportunity to work on this project. We are deeply grateful to him for his guidance, patience and understanding. His idea of sparse processing of images made it possible to achieve high-quality lossless compression, and his invaluable comments proved highly helpful to us. We consider ourselves most fortunate for having had the opportunity to work under the guidance of Dr. Ajai Jain. Thanks a lot, sir.

Our project guide, Mr. Shashwat Raizada, played a significant role in guiding our efforts in the right direction. He got us acquainted with design tools like Quartus, Nios II, MATLAB and ModelSim, which proved to be highly beneficial. His optimism and enthusiasm enabled us to try out difficult tasks without the fear of failure, and for every problem that we encountered he was able to guide us towards finding the solution.


Fundamentals of image formation using ULTRASONIC waves

Introduction Ultrasound imaging is a method of obtaining images from inside the human body through the use of high-frequency sound waves. The reflected sound wave echoes are recorded and displayed as a real-time visual image. No ionizing radiation (x-ray) is involved in ultrasound imaging. Ultrasound imaging is based on the same principles involved in the sonar used by bats, ships at sea and anglers with fish detectors. As the sound passes through the body, echoes are produced that can be used to identify how far away an object is, how large it is, its shape and its consistency (fluid, solid or mixed). The ultrasound transducer functions as both a generator of sound (like a speaker) and a detector (like a microphone). When the transducer is pressed against the skin it directs inaudible, high-frequency sound waves into the body. As the sound echoes from the body’s fluids and tissues the transducer records the strength and character of the reflected waves.

Ultrasonic waves are waves of frequency greater than 20 kHz. In ultrasound practice many sound waves are used together, encompassing a certain thickness called a "beam." A sound beam can be visualized as the region in front of the transducer that can receive or "hear" the sound wave, within which objects can be imaged. Immediately in front of the transducer this region has the shape of the transducer face, but at farther distances the profile changes significantly. In the case of sound, each point source generates sound equally in all directions, creating a spherical wavefront.


The beam converges in a region called the near field, whose length is given by

Near field length = a^2 / wavelength

where a is the radius of the transducer. The beam then diverges at an angle given by

Divergence angle = sin^-1 (0.61 * wavelength / a)

By changing the frequency, the beam shape can be altered. The beam thickness is at a minimum at medium distances, an area called the focal region; this region is the optimum range for obtaining an image from the sound wave. A sound wave pulse projected into a medium is partially reflected back towards its generating transducer upon striking any interface. On receiving the echoes, the transducer converts them into electrical signals, which are used later in image construction.

Construction of a single line: When an ultrasonic beam is transmitted by the transducer, it is reflected and scattered upon hitting an interface. The echo is characterized by two quantities: time and magnitude. Time is translated into the distance of the interface from the transducer:

d = (1/2) * c * t

where c is the velocity of sound in the medium and t is the time taken for the wave to travel back to the transducer. The magnitude of the echo represents the impedance mismatch on either side of the interface. Once the location of each interface and the impedance values of the tissues on either side of each interface are known, an image can be constructed. The area covered by a single sound wave is narrow, so the resulting image is a one-dimensional strip.
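As a quick worked example of this range equation (an illustrative calculation, not taken from the report): with c = 1540 m/s, an echo arriving about 130 microseconds after transmission corresponds to an interface roughly 10 cm deep.

#include <stdio.h>

int main(void)
{
    const double c = 1540.0;   /* speed of sound in tissue, m/s           */
    const double t = 130e-6;   /* round-trip echo time, s (example value) */
    double d = 0.5 * c * t;    /* depth of the reflecting interface, m    */
    printf("depth = %.1f cm\n", d * 100.0);   /* prints: depth = 10.0 cm  */
    return 0;
}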

The magnitude of the impedance mismatch can be plotted directly (a format known as A-mode) or converted into a brightness level in the image (a format known as B-mode).


A two-dimensional image can easily be reconstructed by repeating the single-line procedure many times.


Simulation on MATLAB

Various transducer parameters like the center frequency, sampling frequency, focus etc. were simulated using MATLAB. MATLAB is a software package that allows easy matrix manipulation, plotting of functions and data, implementation of algorithms, creation of user interfaces and many other computations. For simulating any organ of the human body, first its bitmap image, known as a phantom, is drawn and then thousands of scatterers are placed in it. A single RF line in an image can be calculated by summing the response from a collection of scatterers, in which the scattering strength is determined by the density and speed-of-sound perturbations in the tissue. The phantoms typically consist of 100,000 or more scatterers, and simulating 50 to 128 RF lines can take several days; in order to reduce this time, a slight modification was made in the MATLAB code: the number of scatterers was reduced and 50 RF lines were simulated.

Simulation of the kidney: In the simulation of the kidney, a phantom was made and a large number of scatterers were placed in it at random.

Phantom of the kidney

In this phantom, 200,000 scatterers were placed and it was scanned with a 5 MHz 64-element phased array transducer with lambda/2 spacing and Hanning apodization. A single transmit focus 70 mm from the transducer was used, and focusing during reception is at 40 to 140 mm in 10 mm increments. The image consists of 128 lines with 0.7 degrees between lines. Simulating with a large number of scatterers takes many days, so in order to save time we simulated the phantom by placing 2000 scatterers and generated 64 RF lines. We made these modifications in the m-files obtained from the following website: http://www.es.oersted.dtu.dk/staff/jaj/field/examples/example_kidney.html


Two images simulated using the phantom are shown below.

Real-time simulation: After simulating the phantom image of the kidney, we obtain the transducer parameters for real-time simulation. The table below lists the transducer parameters for real-time simulation.

Transducer center frequency: 5 MHz
Sampling frequency: 100 MHz
Number of RF lines in image: 50
Size of image sector: 40/1000
Angle increment for image: size of image sector / number of lines
Speed of sound in human body: 1540 m/s


Excitation Pulse

During real-time simulation, electrical signals are sent to the piezoelectric transducers, which convert them into ultrasonic waves. Piezoelectric materials develop an electric potential across their surface when mechanical stress is applied to them and, conversely, deform mechanically when a potential difference is applied across them. The excitation pulse is the set of electrical signals applied across the transducers.

Beam steering: To steer and focus the beam in a particular direction, proper time delays have to be applied; that is, the transducers should be switched on and off in such a manner that the beam gets focused in the desired direction.

The electric signals for focusing and steering in a particular direction are applied using the delay formula

t_n = (R - sqrt(R^2 + x_n^2 - 2*x_n*R*sin(theta))) / c

where t_n is the time delay for the nth transducer, x_n is the coordinate of the nth transducer, theta is the steering angle, R is the scan depth, and c is the speed of sound. Source: http://www.eecs.umich.edu/~dnoll/BME516

The time-delay value for each transducer at a particular scan depth is calculated, and these values are then converted into numbers of clock cycles.
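A minimal sketch of such a delay-table generator in C is given below. This is an illustration, not the project's actual C code: the element pitch, steering angle, focal depth and clock frequency are assumed values, and the geometric delay formula above is applied with the farthest element firing first so that all delays are non-negative.

#include <math.h>
#include <stdio.h>

#define N_ELEM  64          /* number of transducer elements               */
#define C_SOUND 1540.0      /* speed of sound in tissue, m/s               */
#define F_CLK   50.0e6      /* clock used to count the delays, Hz          */
#define PITCH   0.000154    /* element pitch ~ lambda/2 at 5 MHz, metres   */

int main(void)
{
    const double pi    = acos(-1.0);
    const double R     = 0.07;              /* scan depth: 70 mm focus     */
    const double theta = 10.0 * pi / 180.0; /* steering angle: 10 degrees  */
    double dist[N_ELEM], dmax = 0.0;

    /* distance from every element to the focal point */
    for (int n = 0; n < N_ELEM; n++) {
        double x = (n - (N_ELEM - 1) / 2.0) * PITCH;  /* element coordinate */
        dist[n] = sqrt(R * R + x * x - 2.0 * x * R * sin(theta));
        if (dist[n] > dmax)
            dmax = dist[n];
    }

    /* the farthest element fires first (delay 0); the others wait so that
       all wavefronts arrive at the focal point at the same instant        */
    for (int n = 0; n < N_ELEM; n++) {
        double t_n  = (dmax - dist[n]) / C_SOUND;  /* delay in seconds     */
        long cycles = lround(t_n * F_CLK);         /* delay in clock ticks */
        printf("element %2d: %ld cycles\n", n, cycles);
    }
    return 0;
}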


Mapping of excitation pulse into SRAM

To implement the delays, the values obtained from the formula have to be loaded into SRAM. These values are then read from SRAM by the Verilog code, and the time delays are simulated using the Vector Waveform editor, a GUI provided by Quartus that shows at what time a particular transducer gets triggered.

Problem of huge data: Generation of the excitation pulse requires a huge amount of data to be loaded into the FPGA memory. For an image line with a length dl = 10 cm, sampled at a frequency fs = 100 MHz, and for a speed of sound c = 1540 m/s, the number of delay values is

Nl = 2*dl*fs/c ≈ 12,987 (source: PhD thesis by Tomov)

That number, multiplied by the number of channels n, yields the necessary storage size for the beamformation of one image line. To calculate the necessary storage for a whole image, it has to be multiplied by the number of lines m. For a modest ultrasound imaging system with 64 channels, the number of delay values per frame consisting of 50 lines is

Nf = m*n*2*dl*fs/c ≈ 40*10^6 (source: PhD thesis by Tomov)
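The same storage figures can be reproduced with a short calculation (a sketch using the values quoted above: dl = 10 cm, fs = 100 MHz, c = 1540 m/s, 64 channels and 50 lines):

#include <stdio.h>

int main(void)
{
    const double dl = 0.10;     /* image line length, m      */
    const double fs = 100e6;    /* sampling frequency, Hz    */
    const double c  = 1540.0;   /* speed of sound, m/s       */
    const int    n  = 64;       /* number of channels        */
    const int    m  = 50;       /* number of lines per frame */

    double Nl = 2.0 * dl * fs / c;    /* delay values per line  */
    double Nf = (double)m * n * Nl;   /* delay values per frame */

    printf("Nl = %.0f values per line\n", Nl);    /* ~12987  */
    printf("Nf = %.2e values per frame\n", Nf);   /* ~4.2e7  */
    return 0;
}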

Solution: To solve the problem of huge data, the symmetry in the beamforming can be exploited so that the number of values is halved. Apart from exploiting the symmetry in phased array imaging, a PC interface can be built that passes parameters like the initial scan depth, the final scan depth and the increment in scan depth. This type of interface can easily be made with Borland C++. When values are passed to the interface, code written in C generates the table of time delays for each transducer. The interface loads these values into SRAM, and the Verilog code implements them.


Design flow for generation of excitation pulse for ULTRASOUND Imaging

1. The interface passes the parameters.
2. C code generates the delay values for each transducer.
3. The values are loaded into a file.
4. The file is transferred into SRAM using the interface.
5. Verilog code reads the values.
6. The time delays are implemented to generate the excitation pulse.
7. The excitation pulse is simulated in the Vector Waveform editor.


The Interface for communication of streaming images

The RS-232 protocol standard was chosen as the interface for streaming the compressed images. RS-232 is an industry-standard communications interface between a PC and a peripheral device. Most computers have one or two serial RS-232 interfaces as standard equipment. RS-232 is used to send the information (compressed image frames) from the remote location to the receiver, which decompresses the image and displays it on a VGA monitor. Introduction:

• Full duplex communication: receiving and transmitting are independent.
• Asynchronous communication: no clock signal is shared between the two ends.
• Can communicate at a maximum speed of roughly 10 KBytes/s.

Pin 2: RxD (receive data)
Pin 3: TxD (transmit data)
Pin 5: GND (ground)

Data is sent using pin 3 and received using pin 2. Data is serialized before sending and deserialized after receiving, since only one bit is sent or received at a time. Format of the serialized bits:
Mark state - the initial idle state
Start bit - active low at the start of a byte
Stop bit - marks the end of a byte


Speed: The speed is specified in baud, i.e. how many bits per second can be sent or received. For example, 115200 baud means 115200 bits per second can be received or transmitted. For this a baud-tick signal is generated which asserts 115200 times a second. The transmit speed and the receive speed must be the same. There are some standard speeds: 1200 baud, 9600 baud, 38400 baud and 115200 baud (usually the fastest).

Developing the RS232 transmitter

1. Baud generator
Here we want to use the serial link at maximum speed, i.e. 115200 baud. FPGAs usually run at speeds well above 115200 Hz. Traditionally, RS-232 chips use a 1.8432 MHz clock, because that makes generating the standard baud frequencies very easy: 1.8432 MHz divided by 16 gives 115200 Hz, so "BaudTick" is asserted once every 16 clocks, i.e. 115200 times a second when using a 1.8432 MHz clock. The clock on the FPGA is 50 MHz, therefore a frequency divider circuit needs to be implemented on the FPGA so as to obtain a clock as close to 1.8432 MHz as possible.
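An equivalent way to look at the divider arithmetic is to divide the 50 MHz clock directly down to the 115200 Hz baud tick. The small C model below (an illustration only, not the Verilog baud generator) computes the divider value and the resulting rate error:

#include <stdio.h>

int main(void)
{
    const long f_clk  = 50000000;                   /* FPGA clock, Hz         */
    const long baud   = 115200;                     /* target baud rate       */
    const long divide = (f_clk + baud / 2) / baud;  /* nearest divider: 434   */

    long ticks = 0;
    for (long clk = 0, cnt = 0; clk < f_clk; clk++) {
        if (++cnt == divide) {    /* assert the baud tick for one clock */
            cnt = 0;
            ticks++;
        }
    }

    printf("divider %ld -> %ld ticks per second (target %ld, error %.3f%%)\n",
           divide, ticks, baud, 100.0 * (ticks - baud) / baud);
    return 0;
}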

2. RS232 transmitter
A block diagram of the RS232 transmitter interface is shown below. The transmitter takes 8-bit data and serializes it (starting when the "TxD_start" signal is asserted).

• The "busy" signal is asserted while a transmission occurs. The "TxD_start" signal is ignored during that time.

The RS-232 parameters used are fixed: 8 data bits, 2 stop bits, no parity.

3. RS-232 receiver module

A block diagram of the RS232 receiver interface is shown below.


Our implementation works as follows:

• The module assembles data from the RxD line as it comes.
• As a byte is being received, it appears on the "data" bus. Once a complete byte has been received, "data_ready" is asserted for one clock.

Note that "data" is valid only when "data_ready" is asserted. The rest of the time, don't use it as new data may come that shuffles it. 4. Problems and its solutions: (I) In RS-232 receiver sampling is done 8 or 16 times of receiving rate. For this a signal Baud8 is generated. This will change for 115200x8 times in a second. This is done because real hardware is not perfect.

(II) Short spikes on the receive line can be mistaken for a start bit. This problem is solved by checking that the line stays low for at least 4 Baud8 cycles before accepting a start bit.
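A C model of this start-bit filter is sketched below (illustrative only; rxd_sample() is a hypothetical function standing in for one sample of the RxD line per Baud8 tick):

#include <stdbool.h>

/* Wait for a genuine start bit: the RxD line must be low on the sample
   that detects the falling edge and stay low for the next 3 Baud8 samples
   (4 low samples in total); shorter spikes are ignored.                  */
static bool wait_for_start_bit(int (*rxd_sample)(void))
{
    for (;;) {
        if (rxd_sample() == 0) {             /* possible start bit         */
            int low_samples = 1;
            while (low_samples < 4 && rxd_sample() == 0)
                low_samples++;
            if (low_samples == 4)
                return true;                 /* accepted as a start bit    */
            /* otherwise it was a short spike: keep waiting                */
        }
    }
}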

5. Verification of the modules
Transmitter: The design was tested by interfacing it with the serial port on the PC. A program called the serial device tester was used to send and receive data. A predetermined series of data was generated on the FPGA and received on the PC, and the results were found to be consistent.
Receiver: Some bytes were sent to the FPGA board by the serial device tester and stored in SRAM. After storing them, they were checked on a 7-segment display.


Using the Memory on the FPGA

The raw image data received on the FPGA needs to be stored in the primary memory. A 640 x 480 image with 1 byte per pixel comes to around 300 KB per image. At any given time we need to hold the information of two image frames, and memory equivalent to about two more frames is required for parsing the images and storing the compressed frame.

The memory options available on the FPGA board are SRAM, SDRAM and FLASH. For the purpose of compressing the images, SRAM is unsuitable because of space constraints. FLASH was ruled out because of its characteristic of wearing out after a certain number of read/write cycles, which makes it highly unsuitable for a dynamic operation such as real-time video compression. SDRAM is suitable for our purpose, as it is fast and the space available (16 MB) is quite sizeable for our requirements.

SRAM is used as the VGA buffer. The received data is stored in SRAM and is read by the VGA module, which displays it on the screen. When data is received from the keyboard, the information corresponding to that character is loaded into SRAM, and the character is displayed on the screen.

SRAM overview
The DE2 board has 512 KB of SRAM. A read or a write can be done in one clock cycle, but only one at a time. The SRAM has an 18-bit address, and each address returns 16-bit data. For the VGA buffer, every pixel is assigned 1 byte, so the input address is 19 bits wide: Addr[0] tells whether the byte is data[7:0] or data[15:8], and the actual SRAM address is addr[17:0] = in_addr[18:1].
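The address mapping just described can be written out as a small C sketch (illustrative; the function name and the in-memory model of the SRAM are assumptions):

#include <stdint.h>

/* Return the 8-bit pixel stored at 19-bit byte address in_addr.
   sram models the 256K x 16 memory as an array of 16-bit words.  */
static uint8_t sram_read_byte(const uint16_t *sram, uint32_t in_addr)
{
    uint32_t word_addr = (in_addr >> 1) & 0x3FFFF;  /* addr[17:0] = in_addr[18:1] */
    uint16_t word = sram[word_addr];                /* one 16-bit read            */

    /* in_addr[0] selects the byte lane: 0 -> data[7:0], 1 -> data[15:8] */
    return (in_addr & 1) ? (uint8_t)(word >> 8) : (uint8_t)(word & 0xFF);
}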


SRAM schematic

IP Core: the SRAM controller block (module "sram", instance "inst"), with ports address[18..0], clock, reset, data[15..0], wren and q[7..0] on the host side, and SRAM_CE_N, SRAM_OE_N, SRAM_LB_N, SRAM_UB_N, SRAM_WE_N, SRAM_ADDR[17..0] and SRAM_DQ[15..0] on the memory side.

Developing the SDRAM interface module
The function of this interface is to present a simple "SRAM-like" interface for performing operations on the SDRAM and to hide the complicated internal signals controlling the actual hardware.

SDRAM overview
SDRAM is high-speed dynamic random access memory (DRAM) with a synchronous interface. Internally, SDRAM devices are organized in banks of memory, which are addressed by row and column. The number of row- and column-address bits and the number of banks depend on the size of the memory.


SDRAM is controlled by bus commands that are formed using combinations of the RASN, CASN, and WEN signals. For instance, on a clock cycle where all three signals are high, the associated command is a no operation (NOP). A NOP is also indicated when the chip select is not asserted.
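For reference, the standard encodings of these bus commands on the RASN/CASN/WEN pins are listed below. These are the generic JEDEC SDR SDRAM encodings, not values taken from the report, and they are distinct from the controller's host-side CMD[2:0] codes mentioned later, so treat them as an assumption to be checked against the datasheet.

/* {RASN, CASN, WEN} packed into bits 2..0, all signals active low */
enum sdram_cmd {
    CMD_NOP       = 0x7,  /* 1 1 1 : no operation                 */
    CMD_BST       = 0x6,  /* 1 1 0 : burst terminate (BT)         */
    CMD_READ      = 0x5,  /* 1 0 1 : read (RD)                    */
    CMD_WRITE     = 0x4,  /* 1 0 0 : write (WR)                   */
    CMD_ACTIVE    = 0x3,  /* 0 1 1 : open row in a bank (ACT)     */
    CMD_PRECHARGE = 0x2,  /* 0 1 0 : close row in a bank (PCH)    */
    CMD_REFRESH   = 0x1,  /* 0 0 1 : auto refresh (ARF)           */
    CMD_LOAD_MODE = 0x0   /* 0 0 0 : load mode register (LMR)     */
};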

SDRAM banks must be opened before a range of addresses can be written to or read from. The row and bank to be opened are registered coincident with the ACT command. When a bank is accessed for a read or a write, it may be necessary to close the bank and re-open it if the row to be accessed differs from the row that is currently open. Closing a bank is done with the PCH command.

The primary commands used to access SDRAM are RD and WR. When the WR command is issued, the initial column address and data word are registered. When a RD command is issued, the initial address is registered. The initial data appears on the data bus 1 to 3 clock cycles later. This is known as CAS latency and is due to the time required to physically read the internal DRAM core and register the data on the bus. The CAS latency depends on the speed of the SDRAM and the frequency of the memory clock; in general, the faster the clock, the more cycles of CAS latency are required. After the initial RD or WR command, sequential reads and writes continue until the burst length is reached or a BT command is issued. SDRAM devices support burst lengths of 1, 2, 4, or 8 data cycles. The ARF command is issued periodically to ensure data retention; this function is performed by the SDR SDRAM controller and is transparent to the user. The LMR command is used to configure the SDRAM mode register, which stores the CAS latency, burst length, burst type, and write burst mode. To reduce pin count, SDRAM row and column addresses are multiplexed on the same pins.

The controller interface
The block diagram for the SDRAM controller module is shown below. The host side is driven by the 3-bit command signal CMD[2:0]. The full list of commands, their 3-bit codes and their timing requirements can be found in the SDRAM application note at http://www.altera.com.cn/literature/wp/sdr_sdram.pdf

Debugging/Testing the design

Debugging the design was a major challenge given the size of the logic implemented. To make the job easier, the design was first tested in simulation, where the behavior of the actual SDRAM hardware is simulated using a module provided by http://www.micron.com. Since the Quartus environment has an inherent limitation of not being able to simulate results for 2^24 memory locations, ModelSim had to be used for the purpose. For the simulation a testbench was written in Verilog which generates a ramp value, writes it onto contiguous memory locations, and then reads the values back after a series of them has been written. After removing the bugs and obtaining consistent results in simulation, the same test was implemented on hardware and consistent results were obtained.

SDRAM interface system level diagram


Design components

Design of the Video (VGA) Controller: The main purpose of the video controller is to stream display data from the FPGA to the Video Graphics Array (VGA) monitor. The VGA monitor displays images on the screen by turning pixels (picture elements) on or off. The screen is a two-dimensional array having 640 columns and 480 rows of pixels. The monitor has an electron gun which illuminates the pixels. The electron beam is steered so that it illuminates the pixels row-wise from left to right, starting from the topmost row down to the bottom one. The steering is done using a magnetic field that is controlled by a sync stripper, which synchronizes the beam position with the horizontal and vertical sync. The color of each pixel is generated according to the value of the analog video signal. The video controller must be able to get data from the FPGA's memory/peripherals and generate the timing and control pulses for driving the monitor.

VGA Interface:

The interface between the monitor and the FPGA board is the VGA connector. It has fifteen pins which are described in the figure below.

Signals from the controller to the VGA: (a) Horizontal Sync (HSync). (b) Vertical Sync (VSync). (c) Red (R). (d) Green (G). (e) Blue (B).

The R, G and B are low-level analog signals (from 0 V to 0.7 V), while HSync and VSync are digital signals. R, G and B are 10-bit signals in the module; these are converted to the analog R, G, B signals which are sent to the VGA monitor. The monitor draws the picture line by line, from top to bottom, and each line is drawn from left to right. The drawing is synchronized by sending short pulses on HSync and VSync at fixed intervals: an HSync pulse makes a new line start, and a VSync pulse indicates that the bottom has been reached (it makes the monitor go back up to the top line).


Layout of the VGA screen

For the standard 640x480 VGA video signal, the frequencies of the pulses should be: (a) Vertical frequency (VSync): 60 Hz (60 pulses per second). (b) Horizontal frequency (HSync): 31.5 kHz (31500 pulses per second). The structure of a single horizontal and vertical pulse is shown in the diagram below.

Horizontal and vertical timing pulses


Generating the VGA signals: (i) Sync pulses. The timing for the VGA monitor is obtained from a 25 MHz clock using two counters. These counters keep track of the time for generating the horizontal and vertical sync pulses. Depending on the count, the polarity of the sync pulse is set; the counters account for the front porch, active time, back porch and blanking time. The sync pulses are fed to the VGA monitor.
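As an illustration of the two-counter scheme, the following C model counts sync pulses over one simulated second (this is not the Verilog module; the porch and sync widths are the standard 640x480 figures, which the report does not list, so they are assumptions):

#include <stdio.h>

/* Standard 640x480 line layout: visible + front porch + sync + back porch */
#define H_VISIBLE 640
#define H_FRONT    16
#define H_SYNC     96
#define H_BACK     48
#define H_TOTAL   (H_VISIBLE + H_FRONT + H_SYNC + H_BACK)   /* 800 pixel clocks */

#define V_VISIBLE 480
#define V_FRONT    10
#define V_SYNC      2
#define V_BACK     33
#define V_TOTAL   (V_VISIBLE + V_FRONT + V_SYNC + V_BACK)   /* 525 lines */

int main(void)
{
    const long f_pixel = 25000000;   /* pixel clock, Hz */
    int hcount = 0, vcount = 0;
    long hsync_pulses = 0, frames = 0;

    /* simulate one second of pixel clocks with the two counters */
    for (long tick = 0; tick < f_pixel; tick++) {
        if (hcount == H_VISIBLE + H_FRONT)   /* start of the HSync pulse */
            hsync_pulses++;
        if (++hcount == H_TOTAL) {           /* end of line              */
            hcount = 0;
            if (++vcount == V_TOTAL) {       /* end of frame             */
                vcount = 0;
                frames++;
            }
        }
    }

    printf("HSync: %ld pulses/s\n", hsync_pulses);  /* ~31250 with a 25 MHz clock */
    printf("VSync: %ld frames/s\n", frames);        /* ~59-60 frames per second   */
    return 0;
}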

Use of the VGA: The VGA monitor shows the ultrasound image received from the remote location.

Introduction to the PS/2 Protocol
Connector pins:
1. Data
2. Not implemented
3. Ground
4. Vcc (+5 V)
5. Clock
6. Not implemented

The PS/2 mouse and keyboard implement a bidirectional synchronous serial protocol. The device always generates the clock signal, but the host always has ultimate control over communication.


- host refers to the FPGA or computer
- device refers to the keyboard or mouse

Bus states:
Data = high, Clock = high: Idle state.
Data = high, Clock = low: Communication inhibited.
Data = low, Clock = high: Host request-to-send.

Serial protocol with 11-bit frames:
1 start bit (always 0)
8 data bits, least significant bit first
1 parity bit (odd parity)
1 stop bit (always 1)
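A small C sketch of this 11-bit frame (illustrative only; the helper names are made up) shows how the odd parity bit and the bit order work:

#include <stdint.h>

static int odd_parity(uint8_t b)
{
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (b >> i) & 1;
    return (ones % 2) ? 0 : 1;   /* parity bit makes the total count of 1s odd */
}

/* frame[0] is sent first; each entry is the level of the Data line for one clock */
static void ps2_frame(uint8_t data, int frame[11])
{
    frame[0] = 0;                        /* start bit, always 0   */
    for (int i = 0; i < 8; i++)
        frame[1 + i] = (data >> i) & 1;  /* data bits, LSB first  */
    frame[9]  = odd_parity(data);        /* odd parity bit        */
    frame[10] = 1;                       /* stop bit, always 1    */
}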

Communication: Device-to-Host:
The keyboard/mouse writes a bit on the Data line when Clock is high, and it is read by the host when Clock is low.

Communication: Host-to-Device:
The PS/2 device always generates the clock signal. If the host wants to send data, it must first put the Clock and Data lines in a "request-to-send" state: it inhibits communication by pulling Clock low for at least 100 microseconds, applies "request-to-send" by pulling Data low, and then releases Clock. The host changes the Data line only when the Clock line is low, and data is read by the device when Clock is high.


PS/2 Mouse
Each time, a packet of three bytes is transmitted. Data bit formats:

• Byte 0: YV, XV, YS, XS, 1, 0, R, L • Byte 1: X7..X0 • Byte 2: Y7..Y0

o YV, XV are set to indicate overflow conditions.
o XS, YS are set to indicate negative quantities (sign bits).
o R, L are set when the right and left buttons, respectively, are pressed.
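A C sketch of decoding one such packet (illustrative; the struct and function names are assumptions) makes the sign handling explicit:

#include <stdint.h>

struct mouse_event {
    int left, right;   /* button states              */
    int dx, dy;        /* signed movement counts     */
    int overflow;      /* XV/YV overflow flags       */
};

static struct mouse_event decode_packet(uint8_t b0, uint8_t b1, uint8_t b2)
{
    struct mouse_event e;
    e.left     =  b0 & 0x01;                   /* L                         */
    e.right    = (b0 >> 1) & 0x01;             /* R                         */
    e.overflow = (b0 >> 6) & 0x03;             /* XV (bit 6), YV (bit 7)    */
    e.dx = (int)b1 - ((b0 & 0x10) ? 256 : 0);  /* XS sign-extends byte 1    */
    e.dy = (int)b2 - ((b0 & 0x20) ? 256 : 0);  /* YS sign-extends byte 2    */
    return e;
}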

IP Core: the ps2_mouse block (instance "inst"), with inputs clk and reset_ud; outputs left_button, right_button, packet_good, data_ready, error_no_ack and the seven-segment display buses HEX0[6..0] to HEX7[6..0]; and the bidirectional ps2_clk and ps2_data lines.

Testing the PS/2 mouse: The PS/2 mouse module was implemented on the FPGA, and the data from the mouse (corresponding to the mouse action) was displayed on a seven-segment display. The module was also interfaced to the VGA controller module and a cursor was generated; the cursor trace pattern corresponded to the mouse movements, and the color of the cursor changed on a mouse click.

Use of the PS/2 mouse: There will be a GUI with buttons (e.g. print, zoom, send message); the mouse will be used to press these buttons.

PS/2 Keyboard
One byte is received at a time. The byte received from the keyboard is converted into an ASCII code (with a case statement).


Host commands: commands sent by the host to the keyboard. The most common command is setting/resetting the status indicators (the Num Lock, Caps Lock and Scroll Lock LEDs). The more common and useful commands are listed below.

ED  Set Status LEDs - This command turns the Num Lock, Caps Lock and Scroll Lock LEDs on and off. First send ED to the keyboard; the keyboard replies with ACK (FA) and waits for another byte. In that byte, bit 0 controls Scroll Lock, bit 1 Num Lock and bit 2 Caps Lock; bits 3 to 7 are ignored.

EE  Echo - Upon sending an Echo command to the keyboard, the keyboard replies with an Echo (EE).

F0  Set Scan Code Set - Upon sending F0, the keyboard replies with ACK (FA) and waits for another byte, 01-03, which determines the scan code set used. Sending 00 as the second byte returns the scan code set currently in use.

F3  Set Typematic Repeat Rate - The keyboard acknowledges the command with FA and waits for a second byte, which determines the typematic repeat rate.

F4  Keyboard Enable - Clears the keyboard's output buffer, enables keyboard scanning and returns an acknowledgment.

F5  Keyboard Disable - Resets the keyboard, disables keyboard scanning and returns an acknowledgment.

FE  Resend - Upon receipt of the Resend command, the keyboard retransmits the last byte sent.

FF  Reset - Resets the keyboard.


IP Core: the ps2_keyboard_interface block (instance "inst"), with inputs clk, reset, rx_read, tx_data[7..0] and tx_write; outputs rx_extended, rx_released, rx_shift_key_on, rx_scan_code[7..0], rx_ascii[7..0], rx_data_ready, tx_write_ack_o and tx_error_no_keyboard_ack; and the bidirectional ps2_clk and ps2_data lines.

Use of the keyboard: The keyboard is used to enter text such as the identity of the patient, important notes, chat messages, etc.



Image Compression

Image compression is needed in the following places: (a) during transmission of the video to the PC; (b) during storage of the video on the PC; (c) for transmission over the network, as in the telemedicine application.

Image compression can be divided into: (a) Lossless - algorithms that maintain full fidelity of the original digital data during compression; (b) Lossy - algorithms that do not maintain full fidelity of the original digital data during compression; the degree of "lossiness" is usually under operator control. Considering the criticality of the application, it was decided to implement a lossless compression technique.

An analysis of ultrasound videos from different sources reveals that successive incremental frames differ in only about 20% of the pixels. Representing the incremental frame as a sparse matrix gave compression ratios of up to 78% - 80%. Thus the sparse matrix representation of the incremental frame appears to be the most suitable contender for lossless compression. The problem, however, is in obtaining successive compression at rates over 30 frames per second; using a purely software approach it is possible to compress only 3-4 frames per second. To circumvent this problem it was decided to implement the algorithm on hardware.

The image compression algorithm
The image compression algorithm uses the sparse matrix principle described above to achieve compression. For this purpose we store the two incremental frames as two chunks of data in the memory. The two sets of data are then read back, their XOR is taken, and the result is written back as another chunk in the memory. This chunk is then parsed to remove redundant information about the unchanged pixels. Additionally, header information is added for decompression of the compressed data; it contains the locations and counts of changed and unchanged pixels. Moreover, there is provision for detecting a considerably changed image, i.e. an image in which the number of changed pixels exceeds a certain threshold value; such an image is transmitted as it is, without any compression.

The flowchart given below roughly describes the framework for the lossless image compression algorithm to be implemented. The previous image and the incremental image are stored in areas 1 and 2 of the memory. They are then XORed and the result is stored in area 3 of the memory. This XORed image is read back pixel by pixel, and the header information is generated for the image: the counts of the changed and unchanged run lengths are obtained and stored, along with an indicator bit, in memory area 4. Additionally, the values of the changed pixels are stored in memory area 5. After an entire frame is processed, the count of the total number of changed pixels is compared against a threshold value. If it exceeds the threshold, the incremental frame in memory area 2 is transmitted over the communication interface; otherwise memory area 4 plus memory area 5 are transmitted.
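A software sketch of this compression step is given below (a minimal C model for clarity, not the hardware implementation; the run length is capped at 255 and the indicator bit is stored as a whole byte, both of which are assumptions the report does not fix). The caller would compare the returned count of changed pixels against the threshold to decide whether to send the raw frame instead.

#include <stdint.h>
#include <stddef.h>

#define WIDTH  640
#define HEIGHT 480
#define NPIX   (WIDTH * HEIGHT)

/* header  (area 4): indicator byte + run length for each run
   changed (area 5): values of the changed pixels
   Returns the total number of changed pixels for the threshold test. */
size_t compress_frame(const uint8_t *prev, const uint8_t *curr,
                      uint8_t *header, size_t *header_len,
                      uint8_t *changed, size_t *changed_len)
{
    size_t h = 0, c = 0, total_changed = 0;
    size_t i = 0;

    while (i < NPIX) {
        int is_changed = (prev[i] ^ curr[i]) != 0;   /* XOR of the two frames */
        size_t run = 0;

        /* count a run of pixels with the same changed/unchanged state */
        while (i < NPIX && run < 255 &&
               (((prev[i] ^ curr[i]) != 0) == is_changed)) {
            if (is_changed)
                changed[c++] = curr[i];   /* keep new values of changed pixels */
            run++;
            i++;
        }

        header[h++] = (uint8_t)is_changed;   /* indicator bit (a whole byte here) */
        header[h++] = (uint8_t)run;          /* run length, capped at 255         */
        if (is_changed)
            total_changed += run;
    }

    *header_len  = h;
    *changed_len = c;
    return total_changed;
}

Decompression simply walks the header: for an unchanged run it copies pixels from the previous frame, and for a changed run it takes the next values from the changed-pixel buffer.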


(Flowchart A: each pixel of the XORed image is tested. Runs of changed pixels are counted in counter2, with counter1 accumulating the total number of changed pixels, and runs of unchanged pixels are counted in counter3. When a run ends, an indicator bit (1'b1 for a changed run, 1'b0 for an unchanged run) and the run count are written to memory area 4, the values of changed pixels are copied to memory area 5, and the corresponding run counter is reset. The loop repeats until the complete image has been read.)


Timing analysis of the compression process
Since the speed of this entire operation is a critical issue, we carried out a timing analysis for the circuit. The analysis given below does not take into account the bottlenecks imposed by the communication interface.

A 640 x 480 image with 8 bits per pixel is 300 KB. Every WRITE operation takes 2 clock cycles, i.e. 1 clock cycle for every byte. Each READ takes 4 clock cycles, taking into account that the registers have to be set up before the read command is given. For our purpose we need to write a complete frame, read the two frames back, write the XOR value back and then read it back to get the sparse matrix. Further, we need to write about 20% of the pixels (the changed ones) again into area 5, based on the assumption that only about 20% of the pixels change. The writing time of the header information can be neglected for such a frame, as it is very small. The compressed image then obtained has to be read back for transmission.

For such a scenario, we have roughly 2.2 WRITEs and 3.2 READs for every 16 bits. Hence processing the entire frame takes about 0.053 s, which gives a frame rate of about 19 frames per second assuming a 50 MHz clock. In the worst case (a completely changed image) we have 3 WRITEs and 4 READs for every 16 bits, which gives a frame rate of about 14 frames per second.
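The arithmetic behind these frame-rate figures can be checked with a few lines of C (using the cycle counts quoted above and a 50 MHz clock):

#include <stdio.h>

int main(void)
{
    const double f_clk    = 50e6;                 /* memory clock, Hz         */
    const double units    = 640.0 * 480.0 / 2.0;  /* 16-bit units per frame   */
    const double w_cycles = 2.0, r_cycles = 4.0;  /* cycles per WRITE / READ  */

    /* typical case: 2.2 WRITEs and 3.2 READs per 16-bit unit */
    double typical = units * (2.2 * w_cycles + 3.2 * r_cycles) / f_clk;
    /* worst case (completely changed image): 3 WRITEs and 4 READs per unit */
    double worst   = units * (3.0 * w_cycles + 4.0 * r_cycles) / f_clk;

    printf("typical: %.3f s per frame -> %.1f fps\n", typical, 1.0 / typical);
    printf("worst:   %.3f s per frame -> %.1f fps\n", worst, 1.0 / worst);
    return 0;
}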

(Flowchart B: if counter1 exceeds the threshold value, 8'hff followed by the incremental frame is transmitted; otherwise 8'h0f followed by the complete contents of areas 4 and 5 is transmitted. The next image is then read and the process repeats.)


References:
DE2 user manual
http://www.theavguide.co.uk/AdvHTML_Upload/vgaplug3.jpg
http://www.beyondlogic.org/keyboard/keybrd.htm
http://developer.apple.com/documentation/mac/NetworkingOT/NetworkingWOT-79.html
http://penguin.dcs.bbk.ac.uk/academic/technology/physical-layer/asynchronous/
http://www.taltech.com/TALtech_web/resources/intro-sc.html
http://en.wikipedia.org/wiki/VGA
http://www.computer-engineering.org/ps2protocol/