FPGA IMPLEMENTATION OF MIMO SYSTEMS
FOR ENSURING MULTIMEDIA QOS OVER
WIRELESS CHANNELS

SAKET GUPTA AND SPARSH MITTAL
B.TECH. IV YEAR, DEPARTMENT OF ELECTRONICS AND COMPUTERS,
INDIAN INSTITUTE OF TECHNOLOGY, ROORKEE

FOR PARTIAL FULFILLMENT OF BACHELOR OF TECHNOLOGY IN ELECTRONICS AND COMMUNICATIONS ENGINEERING

UNDER THE SUPERVISION OF DR. S. DASGUPTA
ASSISTANT PROFESSOR, DEPARTMENT OF ELECTRONICS & COMPUTERS, INDIAN INSTITUTE OF TECHNOLOGY, ROORKEE

DATE: 22ND MAY, 2008
CERTIFICATE
This is to certify that Mr. Saket Gupta (Enrollment No. 040157) and Mr. Sparsh Mittal (Enrollment No. 041112), students of B.Tech IV year, Electronics and Communications Engineering, Department of Electronics & Computer Engineering, Indian Institute of Technology, Roorkee, have successfully worked on a research project entitled “FPGA implementation of MIMO Systems for ensuring Multimedia QoS over Wireless Channels” as part of the final year B.Tech project.
They have obtained highly competent results for the problem assigned to them, using several novel approaches. Their results are satisfactory, and they have four accepted and two submitted research papers at reputed ACM and IEEE conferences.
Date: May 22, 2008
…………………..
Dr. Sudeb Dasgupta
Assistant Professor,
Department of Electronics & Computer Engineering,
Indian Institute of Technology Roorkee
ACKNOWLEDGEMENT
We would like to thank our project guide, Dr. Sudeb Dasgupta, for his help and guidance in the Final Year B.Tech. project, and for continuously monitoring our work through regular reports. Under his mature guidance we felt secure taking up work even in the most challenging areas, in which we had no background whatsoever. Dr. Dasgupta gave us vital direction on what our line of action should be. Although equipment in this area is limited in the department, he provided access to everything he had, including working space in the lab.
We would also like to thank Dr. Ankush Mittal for his help and support. His constant encouragement and background vision for our work, apart from his wonderful suggestions and personal interest, have helped us a lot. Through his personal example we have learnt to be dedicated and responsible in our work, and this made working on the project a true learning experience. He has very kindly answered our unceasing questions and doubts in the area.
We would also like to thank our seniors: Amit Pande (B.Tech ’06 batch), PhD student at Iowa State University, Ames; Praveen Kumar Verma (senior PhD student, IITR); and Naveen (M.Tech 2nd year), who helped us come up with ideas for our project and assisted us from time to time.
Sparsh Mittal
Saket Gupta
B.Tech. (IV Year, Electronics and Communications Engineering)
Department of Electronics & Computer Engineering
Indian Institute of Technology Roorkee
ABSTRACT
Existing multimedia software in e-learning does not provide excellent multimedia data service to the common user; hence e-learning services are still short of intelligence and sophisticated end-user tools for visualization and retrieval. Network QoS (Quality of Service) becomes critical for precision-requiring low motion video streaming over scarce-resource wireless networks with fluctuating bandwidth, fading channels, multiple paths and a requirement of minimal, optimal power usage. In this project, QoS for low motion video streaming with the best perceptual quality is guaranteed with novel techniques. We consider educational videos for our research. Our strategy is to segment the video into different segments, code them with our pooled compression scheme (CEZW+), and employ Forward Error Correction coded OFDM signals for transmission over MIMO (Multiple Input Multiple Output) wireless channels.
We guarantee transmission QoS for such compressed video streaming with maximum reliability and perceptual quality; selective video frame transmission for least data redundancy; high data rates with MIMO systems; optimal power allocation for transmission at different bandwidth levels; and preferential allocation of fluctuating bandwidth according to the relative importance of segmented video blocks. We exploit both Spatial Multiplexing and Alamouti Space Time Block Coding for transmission. Experimental results demonstrate the effectiveness of our proposed schemes. We exploit low motion video characteristics to achieve maximum compression and streaming throughput. The MIMO system is implemented on a Xilinx Spartan FPGA (Field Programmable Gate Array). Parallel implementation of the MIMO-OFDM internal configuration on the FPGA, through a specifically designed process using the System Generator tool, guarantees optimal performance of the testbed, measured through parameters such as prototype development time, synthesis error elimination, processing time for transmission bit generation and decoding, FPGA resource utilization, and reliability, compared with conventional FPGA implementation flows such as those employing VHDL and Verilog. The results are compared with state-of-the-art transmission and hardware schemes over the network to illustrate the superior performance of our approach.
PUBLICATIONS FROM THE WORK
PUBLISHED AND ACCEPTED:
1. Saket Gupta, Sparsh Mittal, S. Dasgupta and A. Mittal, "MIMO Systems For Ensuring
Multimedia QoS Over Scarce Resource Wireless Networks", Published in the proceedings of
ACM International Conference On Advance Computing, India, February 21-22, 2008,
proceedings yet to appear.
2. Sparsh Mittal, Saket Gupta, and S. Dasgupta, "System Generator: The State-Of-Art FPGA
Design Tool For DSP Applications", Accepted for the proceedings of Third International
Innovative Conference On Embedded Systems, Mobile Communication And Computing
(ICEMC2 2008), August 11-14, 2008, Global Education Center, Infosys.
3. Sparsh Mittal, Saket Gupta, and S. Dasgupta, "FPGA: An Efficient And Promising Platform For
Real-Time Image Processing Applications", Accepted for the proceedings of National
Conference On Research & Development In Hardware & Systems (CSI-RDHS 2008) June 20-21,
2008, Kolkata.
4. Saket Gupta, Sparsh Mittal and Sudeb Dasgupta, “Guaranteed QoS with MIMO systems for
Scalable Low Motion Video Streaming over Scarce Resource Wireless Channels”, Accepted for
the proceedings of the International Conference on Information Processing (ICIP 2008), August 8-10, 2008; proceedings to be published by Springer.
SUBMITTED:
5. Sparsh Mittal, Saket Gupta, and S. Dasgupta, "MIMO Systems on FPGA using System
Generator for Providing Educational Multimedia services to Masses", submitted to IEEE
TENCON 2008, Hyderabad, India, November 18-21, 2008.
6. Saket Gupta, Sparsh Mittal and S. Dasgupta, "Optimal Performance FPGA testbeds for low
motion video streaming over MIMO wireless Channels", submitted to 15th IEEE International
Conference on High Performance Computing (HiPC), December 17-20, 2008.
TO BE SUBMITTED:
7. Saket Gupta, Sparsh Mittal and Sudeb Dasgupta, “OTMS: An optimal Testbed for Low
motion video streaming over MIMO wireless channels”, to be submitted in Elsevier
International Journal of Electronics and Communications.
TABLE OF CONTENTS

CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
PUBLICATIONS FROM THE WORK
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
1.1 Need for Video Compression
1.2 Need for QoS Enhancement for Multimedia Delivery
1.3 Need for MIMO Testbeds
1.4 Problem Statement
1.5 Contributions of this Work
1.6 Organisation of the Report
CHAPTER 2: BACKGROUND STUDY
2.1 Video Compression Scheme
2.1.1 DWT: The Discrete Wavelet Transform
2.1.2 Color Embedded Zerotree Wavelet (CEZW) Scheme
2.1.3 Motion Estimation and Motion Compensation as in MPEG-1
2.1.4 Frame Packaging
2.2 MIMO Issues
2.2.1 Existing MIMO Coding Techniques
2.2.2 MIMO-OFDM
2.3 MIMO FPGA Testbed Issues
2.3.1 Utilization of FPGA in DIP and DSP Applications
2.3.2 Superiority of FPGA over Other Implementation Platforms
2.4 FPGA Design Options
2.4.1 Xilinx System Generator (XSG)
2.4.2 Superior Performance Offered by XSG
CHAPTER 3: SYSTEM OVERVIEW
3.1 Video Classification and Segmentation
3.2 Pooled Video Compression
3.3 MIMO System Architecture
3.3.1 FEC and OFDM Coding for Overcoming ISI
3.3.2 Spatial Multiplexing
3.3.3 STBC and Transmitter System
3.3.4 Channel and CSI
3.4 QoS Guaranteeing
3.4.1 Fluctuating Bandwidth
3.4.2 Optimal Power Allocation (OPA)
3.4.3 Data Rate Increase
3.4.4 Reliability
3.5 XSG Implementation of MIMO System on FPGA
3.6 FPGA Hardware Design
CHAPTER 4: IMPLEMENTATION
4.1 Steps That Led to the Final Design
4.2 Code Development Model
4.2.1 Segmentation Module
4.2.2 Compression Module
4.2.3 Network Streaming Scheme
4.2.4 Bandwidth Estimation
4.2.5 MATLAB to FPGA
4.3 Challenges / Failures Faced
4.4 Code Development Features for Higher Performance
4.5 Limitations & Bottlenecks
CHAPTER 5: RESULTS
5.1 PIB Frame Packaging Compression
5.2 Data Rate Increase
5.3 Reliability
5.4 Bandwidth Allocation
5.6 BER Through FPGA Implementation of Hybrid Alamouti
5.7 Optimal Performance of XSG for FPGAs
CHAPTER 6: CONCLUSIONS AND FUTURE WORK
REFERENCES
APPENDIX: CONTENTS OF THE CD
List of Figures
CHAPTER 1: INTRODUCTION
Demands for multimedia services over wireless networks are rapidly increasing, while the expectation of quality for these services is becoming higher and higher. However, inherently limited channel bandwidth and the unpredictability of channel propagation become significant obstacles for wireless communication providers in offering high quality, reliability and data rates at minimum cost. Transmission and streaming of e-learning videos has been a hot topic and a challenging issue for many years. Real time delivery, or streaming, is essential for most educational structures. Many institutes, such as MIT and the IITs, have opened their web servers for free lecture-on-demand on several courses [1-2]. The concept of remote laboratories also demands real-time multimedia content delivery [3-4].
Educational videos possess inherent characteristics which can be exploited for robust and optimal transmission over wireless. These include acceptable coding with a scalable bitstream, provisioning of low bpp for transmission using suitable coding schemes, use of a static camera, predefined components such as the instructor, blackboard and background, and a low component motion rate in the video allowing minimal transmission of coded bits. General video coding standards and formats such as MPEG-1, MPEG-2 and H.261 achieve a high rate of video compression, but educational videos and these characteristics are not dealt with separately by these standards. Thus, an end-to-end structure for the transmission of such videos needs to be assembled.
Fig 1 : Snapshots of classroom lecture sessions in low motion videos.
1.1 Need for video compression
The smallest unit of quantization in an image is called a pixel element or pixel. A standard video
monitor displays a frame usually with the resolution of 800 * 600 pixels. In color image a pixel is
represented by 3 bytes of data.(one for Red, Blue and Green respectively).Thus even one
uncompressed image requires 1.373 MB of storage. One hour Video at 15 frames per second
will require 72.07 GB of space in our hard-disk and is impossible to transmit. This leads to need
for compression of videos.
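The storage arithmetic above can be sketched in a few lines. The figures below are the illustrative ones from the text (800 × 600 RGB at 15 fps), not measurements from any actual system:

```python
# Raw (uncompressed) video storage estimate for the figures used in the text.

def raw_frame_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed frame size in bytes (3 bytes/pixel for RGB)."""
    return width * height * bytes_per_pixel

frame = raw_frame_bytes(800, 600)      # 1,440,000 bytes per frame
frame_mb = frame / 2**20               # ~1.37 MB per frame

one_hour = frame * 15 * 3600           # 15 fps for 3600 seconds
one_hour_gb = one_hour / 2**30         # ~72.4 GB per hour

print(f"{frame_mb:.3f} MB/frame, {one_hour_gb:.2f} GB/hour")
```

This makes the case for compression concrete: a single hour of raw lecture video exceeds the capacity of most 2008-era storage and any consumer network link.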
One hour of video coded with MPEG standards still takes 500-600 MB of storage. MIT videos are coded in RealVideo, and the size is further decreased to around 160 MB, but this is also unsatisfactory. Educational lectures are slow-moving videos, and specific applications built to compress them on the basis of their content can achieve very high compression. In this project we have achieved a high compression rate for educational videos, providing a scalable solution to ensure the best multimedia QoS over various network conditions.
1.2 Need for QoS Enhancement for Multimedia Delivery
The distribution of network resources is generally done statically in traditional multimedia frameworks. However, serious obstacles to such educational multimedia delivery exist. These delivery and QoS issues are summarized as follows:
1 Addressing QoS issues directly through network and channel issues necessitates techniques far superior to traditional protocol development for guaranteeing QoS. Such a need is more acute in video streaming.
2 An efficient coding scheme is required that avoids transmission of redundant data. Generally, in such videos, the frame-to-frame motion of instructor, blackboard or background macroblocks is very slow. Conventional coding schemes do not exploit such redundancy.
3 Fluctuating bandwidth proves to be a bottleneck for viewing lecture videos at good resolution because of their large size.
4 Educational lectures are extensively used in various institutions and firms, thus entailing delivery at minimum cost and requiring optimal power usage for transmission. With fluctuations in bandwidth, power requirements and channel noise also change, requiring dynamic system adaptation.
5 In a bit rate regulation scheme, the educational video source might sometimes be required to decrease its output flow due to high traffic load across the network. This decrease leads to quality degradation, since quantization distortion becomes more noticeable at lower bit rates. However, real time educational video streaming requires high data rates to ensure the best uninterrupted perceptual quality of the video. Thus, a tradeoff between the two requirements is needed.
6 High reliability is required, as any loss of instructional content is prohibited.
7 Fading in wireless channels during transmission results in unpredictable loss.
8 Multipath delays and ISI must be accounted for, as these are harmful to educational video streaming (which requires high precision in bit transmission).
Although considerable work has been done on content-based classification [5-6], content-based streaming [7], bandwidth adaptation and network issues [8], a complete framework that addresses all these issues to provide an end-to-end solution with QoS for educational videos does not exist. Many attempts to exploit the above characteristics for overcoming these obstacles have been made. Liu et al. [9] provide a real-time content analysis method to detect and extract content regions from instructional videos and then adjust the Quality of Service (QoS) of video streams dynamically based on video content. However, real time network scenarios as mentioned above are not included. [7] uses a content-based retransmission scheme, but retransmission is generally not preferred in streaming over wireless. In videos with a static camera, there is a need to segment an individual frame into objects so that special coding techniques can be applied to each of them. Moreover, the regions having instructional content obtained by the other approaches have arbitrary dimensions which cannot be directly used with the CEZW compression scheme or ISO/ITU standards like MPEG-1, H.261, etc. [10].
In the area of resource allocation schemes, Zhang et al. [11] address such resource allocation problems. The work is novel and addresses many QoS issues from the network and protocol perspective. Real time QoS guaranteeing, however, involves wireless network behavior feedback and channel state information for effective feedback and system adjustment.
1.3 Need for MIMO Testbeds
WiFi and the existing high-speed cellular networks being deployed today meet some of the above needs, but MIMO-OFDM, used by WiMAX 802.16e and beyond-3G systems, is the technology needed to allow economical and scalable wireless broadband. MIMO and OFDM are key technologies for enabling the wireless industry to deliver on the vast potential and promise of wireless broadband.
MIMO systems can be used to increase system capacity as well as data reliability in wireless communication systems. Research has focused on developing space-time codes [12] for transmission over MIMO systems. While these codes increase capacity and improve data reliability, they assume that all data bits are equally important to the receiver. However, in videos coded using most current standards, different parts of the bitstream have different importance. This is especially so in educational videos [13]. For high data rates, spatial multiplexing schemes are employed for parallel sub-stream transmission [14]. MIMO systems are also used to effectively distribute the available power between different video segments in the most optimal way. [15] presents an unequal power allocation (UPA) scheme for transmission of JPEG compressed images over MIMO systems. [16] guarantees QoS on MIMO wireless for enhancement and base layers, with differential power allocation. However, these work under a constant transmit power constraint. Power requirements depend heavily upon network conditions, video quality and network bandwidth. Low cost optimal power allocation can be achieved only by considering all three factors simultaneously. While a large amount of work has been done on UPA for SISO wireless systems, there is very little published work to date on UPA for image and video communication over MIMO systems, and practically no such research on real time streaming.
MIMO systems generally require a large processing time when working with video lectures, as the blocks of bits generated from compressed educational videos are still quite large in number for real time processing, necessitating a huge processing time in software. FPGAs can be employed for speed enhancement, as they offer parallel implementation of time-consuming blocks, increasing speed drastically. While the speed of software is limited by internal processor clocking and other processes running on the system, dedicated hardware for such MIMO systems can be developed using FPGAs.
Recently, Field Programmable Gate Array (FPGA) technology [18] has become a viable target for the implementation of algorithms suited to Digital Signal Processing applications [19]. Field-programmable gate arrays (FPGAs) are nonconventional processors built primarily out of logic blocks connected by programmable wires. Each logic block has one or more lookup tables (LUTs) and several bits of memory. As a result, logic blocks can implement arbitrary logic functions (up to a few bits). Therefore, FPGAs as a whole can implement circuit diagrams by mapping the gates and registers onto logic blocks. With more than 1,000 built-in functions as well as toolbox extensions, MATLAB is an excellent tool for algorithm development and data analysis [20]. An estimated 90% of the algorithms used today in DSP originate as MATLAB models [21]. Simulink is a graphical tool which lets a user graphically design the architecture and simulate the timing and behavior of the whole system. It augments MATLAB, allowing the user to model digital, analog and event-driven components together in one simulation. Using Simulink, one can quickly build up models from libraries of pre-built blocks. Xilinx System Generator (XSG) for DSP is a tool offering block libraries that plug into Simulink, containing bit-true and cycle-accurate models of the math, logic and DSP functions of Xilinx FPGAs.
1.4 Problem Statement
The problem tackled in this project is to design a framework for an end-to-end e-learning
solution capable of dynamic video compression and transmission over scarce resource wireless
networks, and implement the system on dedicated hardware for real time robust processing.
The system must be competent to guarantee all necessary QoS in wireless networks, without
manipulating the network protocol systems.
1.5 Contributions of the work
In this project, we propose a new approach for guaranteeing QoS by bridging all the above
mentioned gaps in transmission, with network aware optimal resource allocation for
educational video streaming. This approach, employing MIMO systems implemented on FPGA,
is applicable to all low motion videos is exploited and explained in this project for educational
videos. The major contributions of our work are:
a) System rises above conventional network protocol issues to address QoS for videos.
b) Enabling of a streaming server-client system to perform real-time processing (using FPGA’s)
of videos.
c) Optimal bitstream generation (without redundancy) by pooled compression scheme.
d) Network QoS of optimal power, bandwidth adaptability, high reliability, high data rate and no loss due to routing delays is guaranteed by the system.
e) We combine both spatial multiplexing and STBC schemes to achieve high data rates even over long distances by the use of MIMO channels. Thus we use the available spectrum with the utmost efficiency to allow higher data throughput over the wireless link.
f) The huge processing time required by the MIMO-OFDM system is reduced to a fraction of the original by implementation on an FPGA testbed.
g) The method employed for this implementation compares favorably with other conventional methods of ‘burning’ systems onto FPGAs.
h) Our system works with different kinds of lecture videos, with varying illumination, changing
lecture environment and noise (unpredictable situations) in the lecture video.
i) MIMO-OFDM coding scheme for video streaming eliminates many conventional complex
processing techniques usually required at transmitter and receiver side (like CRC
retransmissions, ISI removal, etc).
j) Many research groups, professionals and companies working in the fields of digital electronics, MIMO, wireless communication, image processing, medical science, etc. are shifting towards the use of higher level tools and superior methodologies, and are ultimately going for hardware prototyping and implementation of their projects. Our work will boost their research by providing many insights and directions.
The proposed system architecture takes pre-recorded lecture videos and efficiently compresses
them so that they can be transmitted by the user in a scalable manner over MIMO channels.
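The Alamouti space-time block code mentioned in contribution e) can be illustrated with a short sketch. This is a minimal, idealized model (one receive antenna, perfect channel knowledge, no noise), not the FPGA implementation described in this report; the symbol and channel values are arbitrary:

```python
import numpy as np

def alamouti_encode(s1, s2):
    """2x2 transmission matrix: rows = time slots, columns = antennas."""
    return np.array([[s1,           s2],
                     [-np.conj(s2), np.conj(s1)]])

def alamouti_combine(r1, r2, h1, h2):
    """Linear combining at the receiver; recovers s1, s2 up to channel gain."""
    s1_hat = np.conj(h1) * r1 + h2 * np.conj(r2)
    s2_hat = np.conj(h2) * r1 - h1 * np.conj(r2)
    return s1_hat, s2_hat

# Toy run: two complex symbols over a fixed flat-fading channel, no noise.
s1, s2 = 1 + 1j, -1 + 1j
h1, h2 = 0.8 - 0.3j, 0.4 + 0.5j
X = alamouti_encode(s1, s2)
r1 = h1 * X[0, 0] + h2 * X[0, 1]   # received in slot 1
r2 = h1 * X[1, 0] + h2 * X[1, 1]   # received in slot 2
s1_hat, s2_hat = alamouti_combine(r1, r2, h1, h2)
gain = abs(h1)**2 + abs(h2)**2     # both estimates scale by |h1|^2 + |h2|^2
print(np.allclose(s1_hat / gain, s1), np.allclose(s2_hat / gain, s2))
```

The combining step shows why the code provides transmit diversity: each recovered symbol is scaled by the sum of both channel energies, so a deep fade on one antenna path does not destroy the symbol.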
1.6 Organisation of the report
Chapter 2 gives the introduction and related work in video compression and packaging schemes and in MIMO FPGA testbeds.
Chapter 3 explains the overall system architecture both at server and client ends. It explains the
theory behind various modules.
Chapter 4 discusses the implementation of the proposed system architecture. It was first prototyped and simulated in MATLAB; later, the implementation was performed on FPGA. The chapter also gives the details of the implementation, the challenges faced at each step, the code development model, the steps that led to the final design, and briefly explains the code development features.
Chapter 5 discusses the results obtained over several input videos.
Chapter 6 gives the conclusion and suggested future works.
The references are given at the end.
CHAPTER 2: BACKGROUND STUDY
This section discusses the basics of video processing and computer networks and also presents
a literature review of recent developments in these fields.
2.1 Video Compression Scheme
Transform coding has been a dominant method of video and still image compression. It takes advantage of the energy compaction properties of various transforms (such as the DCT, DFT and DWT) and of properties of the Human Visual System to minimize the number of useful coefficients. The DCT has been the popular choice for image and video processing schemes [22, 23]. Here we discuss the techniques which lay the foundation for understanding our hybrid compression scheme.
2.1.1 DWT: The Discrete Wavelet Transform
In image processing, the DWT (Discrete Wavelet Transform) is obtained for the entire image,
and it results in a set of independent, spatially oriented frequency channels or subbands. The
wavelet transform is typically implemented using separable and possibly different filters. It
allows localization in both the space and frequency domains.
Fig 2 : One level of Wavelet Decomposition
Typically the full image is decomposed into a hierarchy of frequency subbands. The decomposition is achieved by filtering along one spatial dimension at a time to effectively obtain four frequency bands. The lowest subband (LL) represents the information at all coarser scales (as shown in Figure 2), and it is further decomposed and subsampled to form another set of four subbands.
This process can be continued until the desired number of levels of decomposition is attained. Two analysis filters, namely g and h, carry out the decomposition into independent frequency spectra of different resolutions, producing different levels of detail. Formation of subbands does not itself cause any compression (the same number of samples is required to represent the subbands as for the original image), but it arranges the data in a more efficiently codable format. Figure 2 shows one level of wavelet decomposition.
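One level of this decomposition can be sketched with the simple Haar filter pair standing in for the analysis filters g and h (real codecs typically use longer biorthogonal filters; this is only an illustration of the subband structure):

```python
import numpy as np

def haar_dwt2(img):
    """One level of 2-D Haar decomposition of an even-sized image
    into LL, LH, HL, HH subbands (each a quarter of the input size)."""
    a = img.astype(float)
    # filter and subsample along rows (averages = lowpass, differences = highpass)
    lo = (a[:, 0::2] + a[:, 1::2]) / 2
    hi = (a[:, 0::2] - a[:, 1::2]) / 2
    # filter and subsample along columns
    LL = (lo[0::2, :] + lo[1::2, :]) / 2
    LH = (lo[0::2, :] - lo[1::2, :]) / 2
    HL = (hi[0::2, :] + hi[1::2, :]) / 2
    HH = (hi[0::2, :] - hi[1::2, :]) / 2
    return LL, LH, HL, HH

img = np.arange(64).reshape(8, 8)
LL, LH, HL, HH = haar_dwt2(img)
print(LL.shape)  # (4, 4): four quarter-size subbands, same total sample count
```

Note that the four 4 × 4 subbands together hold exactly as many samples as the 8 × 8 input, matching the remark above that subband formation by itself achieves no compression.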
Several efficient coding schemes have been used for coding DWT coefficients. Embedded
Zerotree Wavelet (EZW) introduced by Shapiro [24] is one such coding scheme for gray scale
images. It exploits the interdependence between the coefficients of the wavelet decomposition
of an image, by grouping them into spatial orientation trees (SOTs). It outputs an embedded bitstream. An embedded bitstream can be truncated at any point during decoding and used to obtain a coarse version of the image; decoding additional data from the compressed bitstream can then refine this version.
2.1.2 Color Embedded Zerotree Wavelet (CEZW) Scheme
For color images, the same coding scheme can be used on each color component. However,
this approach fails to exploit the interdependence between color components. It has been
noted that strong chrominance edges are accompanied by strong luminance edges. However,
the reverse is not true, that is, many luminance transitions are not accompanied by transitions
in the chrominance components. This spatial correlation, in the form of a unique spatial
orientation tree (SOT) in the YUV color space, is used in a technique for still image compression
known as Color Embedded Zerotree Wavelet (CEZW) [25-26]. CEZW exploits the
interdependence of the color components to achieve a higher degree of compression. The
parent child dependency in CEZW is illustrated in figure 3.
The coding strategy in CEZW is similar to Shapiro's EZW [24], and can be summarized as follows: let f_i(m, n) be a YUV image, where i ∈ {Y, U, V}, and let W[f_i(m, n)] be the coefficients of the wavelet decomposition of component i.
Fig 3 : Parent Child Dependency in CEZW scheme
1. Set the threshold T = max(|W[f_i(m, n)]|)/2, for i ∈ {Y, U, V}.
2. Dominant pass:
The luminance component is scanned first. Compare the magnitude of each wavelet coefficient
in a tree, starting with the root, to the threshold T.
If the magnitudes of all the wavelet coefficients in the tree (including the coefficients in the
luminance and chrominance components) are smaller than T, then the entire tree structure
(that is, the root and all its descendants) is represented by one symbol, the zerotree (ZTR)
symbol.
Otherwise, the root is said to be significant (when its magnitude is greater than T), or
insignificant (when its magnitude is less than T). A significant coefficient is represented by one
of two symbols, POS or NEG, depending on whether its value is positive or negative.
The magnitude of a significant coefficient is set to zero to facilitate the formation of zero tree
structures. An insignificant coefficient is represented by the symbol IZ, isolated zero. The two
chrominance components are scanned after the luminance component. Coefficients in the
chrominance components that have already been encoded as part of a zerotree are not
examined. This process is carried out such that all the coefficients in the tree are examined for
possible subzerotree structures.
3. Subordinate pass (essentially the same as in EZW): the significant wavelet coefficients in the image are refined by determining whether their magnitudes lie within the interval [T, 3T/2), represented by the symbol LOW, or the interval [3T/2, 2T), represented by the symbol HIGH.
4. Set T = T/2 and go to Step 2. Only the coefficients that have not yet been found to be significant are examined. The compressed bitstream consists of the initial threshold T, followed by the resulting symbols from the dominant and subordinate passes, which are entropy coded using an arithmetic coder [23].
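The thresholding and refinement logic of the two passes can be sketched on a flat array of coefficients. This toy omits the zerotree grouping across subbands and color components (the defining feature of CEZW), showing only how symbols are assigned against the threshold T; the coefficient values are arbitrary:

```python
import numpy as np

def dominant_pass(coeffs, T):
    """Classify each coefficient against threshold T (zerotrees not modeled,
    so every insignificant coefficient gets the isolated-zero symbol IZ)."""
    symbols = []
    for c in coeffs:
        if abs(c) >= T:
            symbols.append('POS' if c > 0 else 'NEG')
        else:
            symbols.append('IZ')
    return symbols

def subordinate_pass(coeffs, T):
    """Refine significant magnitudes: LOW for [T, 3T/2), HIGH for [3T/2, 2T)."""
    return ['HIGH' if abs(c) >= 1.5 * T else 'LOW'
            for c in coeffs if abs(c) >= T]

coeffs = np.array([34.0, -20.0, 9.0, -3.0])
T = np.max(np.abs(coeffs)) / 2       # initial threshold = 17, as in step 1
print(dominant_pass(coeffs, T))      # ['POS', 'NEG', 'IZ', 'IZ']
print(subordinate_pass(coeffs, T))   # ['HIGH', 'LOW']
```

Halving T and repeating (step 4) would progressively promote the smaller coefficients to significance, which is what makes the output bitstream embedded.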
Fig 4 : Flow diagram of CEZW coding algorithm
2.1.3 Motion Estimation and Motion Compensation as in MPEG 1
In image and video coding schemes, the image is divided into small blocks operated on by prediction techniques. Motion estimation is used to determine the movement of a macroblock from the reference frame to the current frame. Motion is estimated by searching for the macroblock in the reference picture that provides the closest match, as shown in figure 5. The difference between the values of the two macroblocks is coded for reconstruction at the decoder. To reduce the distortion between the decoded and the original picture, the encoder uses a reconstructed reference frame to perform motion estimation. This reconstructed reference frame is the same as that used at the decoder side.
Fig 5 : Motion estimation of a block
Motion estimation computes one motion vector per macroblock. Usually, the search is
conducted for the luminance component only. A predictive frame is constructed from the
motion vectors obtained for all macroblocks in the frame, by replicating the macroblocks from
the reference frame at the new locations indicated by the motion vectors. The difference
between the values of the predicted and the current frames, known as the predictive error frame (PEF), is then encoded using the same procedure as for an intra-coded frame. The frame obtained by adding the predictive frame to the PEF is known as the reconstructed frame. The energy of the PEF is low; thus many coefficients are zero, reducing the number of bits needed to encode the frame.
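The macroblock search described above can be sketched as an exhaustive (full-search) block matcher using the sum of absolute differences (SAD) as the matching criterion. The frame contents, block size and search range here are arbitrary toy values:

```python
import numpy as np

def best_match(ref, block, top, left, search=4):
    """Full search in a +/-search window around (top, left);
    returns the motion vector (dy, dx) with minimum SAD."""
    h, w = block.shape
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + h <= ref.shape[0] and x + w <= ref.shape[1]:
                sad = np.abs(ref[y:y+h, x:x+w] - block).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
    return best, best_sad

# Toy frames: the "current" block is a block of the reference frame,
# so the search should find it exactly with zero residual.
ref = np.arange(256, dtype=float).reshape(16, 16)
cur_block = ref[5:9, 6:10]            # 4x4 block located at (5, 6)
mv, sad = best_match(ref, cur_block, top=4, left=4)
print(mv, sad)                        # (1, 2) 0.0
```

A zero-SAD match means the PEF for this block is all zeros, which is the ideal case the text describes for low motion video: almost nothing needs to be coded beyond the motion vector.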
2.1.4 Frame Packaging
A video compression system identifies three types of frames, depending on the location of the reference blocks: the I-frame (intra-coded frame), the P-frame (predictive frame), and the B-frame (bi-directionally predictive coded frame). An I-frame is coded using information within the same frame only (to reduce spatial redundancy), whereas the location of the target block for motion estimation/compensation decides the type of coded frame (P-frame or B-frame). P-frames are coded relative to a temporally preceding I- or P-frame, and B-frames are coded relative to the nearest previous and/or future I- and P-frames. The reasons for coding B-frames are: 1) better matching of blocks, as a block can be matched in the reverse as well as the forward direction, giving a lower bpp (bits per pixel) after compression; and 2) B-frames can be filtered out of the video stream to lower the frame rate and improve the load condition of an under-performing network, without considerably affecting the video quality at the client side.
Fig 6 : Temporal relations between I, P and B frames
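The B-frame filtering mentioned in point 2) above is trivial to express, because no other frame is predicted from a B-frame in an MPEG-1-style GOP: dropping them never breaks decoding of the remaining frames. A sketch, using a typical (assumed) 12-frame GOP pattern:

```python
def drop_b_frames(gop):
    """Keep only I- and P-frames from a sequence of frame-type labels."""
    return [f for f in gop if f != 'B']

gop = list("IBBPBBPBBPBB")        # common 12-frame GOP pattern
reduced = drop_b_frames(gop)
print(''.join(reduced))            # "IPPP": frame rate drops to one third
```

In this pattern two of every three frames are B-frames, so filtering them reduces the streamed frame rate (and bandwidth) to roughly a third under network congestion.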
2.2 MIMO Issues
The availability of multimedia resources places new demands on the service that a network must provide. The most important of these are data rates, power requirements for transmission, bit error rate, delay and delay variation, and fluctuations in bandwidth. Compression algorithms and techniques are critical to the viability of multimedia streaming and transmission. Uncompressed digital television requires about 140 Mbps of bandwidth, which is impractical even in view of today's technological advancement. Since few users have this sort of network access, compression and QoS guaranteeing over the network are the only hope for the widespread deployment of digital video and multimedia.
Multimedia applications, particularly those using video and images, demand large bandwidths. However, bandwidth for the foreseeable future will be limited. The limitations arise from the cost of installing optical-fibre transmission, terminal equipment complexity and speed, tariffing regimes, switching speeds, and the increasing numbers of users sharing equipment and networks.
In the Indian context, bandwidth is a crucial factor: poor network infrastructure in most areas implies the need for video transmission of sufficient quality over poor network conditions. Moreover, designing new protocols, or modifying existing ones, to enhance QoS leads either to many conflicts or to increased protocol complexity.
Multimedia also makes new demands on the workstations used to reproduce audio and video. Processor speeds, transmission systems, storage media, and network samplers must all be capable of handling multimedia transmission at high rates.
2.2.1. Existing MIMO coding Techniques
Basic transmission schemes in MIMO systems include:
Precoding: Precoding is essentially multi-layer beamforming. In (single-layer) beamforming, the same signal is
emitted from each of the transmit antennas with appropriate phase (and sometimes gain)
weighting such that the signal power is maximized at the receiver input. The benefits of
beamforming are to increase the signal gain from constructive combining and to reduce the
multipath fading effect. In the absence of scattering, beamforming results in a well-defined directional pattern, but in typical cellular environments conventional beams are not a good analogy.
Precoding requires knowledge of the channel state information (CSI) at the transmitter.
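The phase-weighting idea can be illustrated with a small sketch. This is not the report's precoder: the conjugate (matched-filter) weights, function name and channel values below are assumptions chosen to show how CSI-based weighting makes the per-antenna contributions add in phase at the receiver.

```python
import cmath

# Single-layer transmit beamforming sketch: with channel knowledge (CSI),
# weighting each antenna by the conjugate of its channel coefficient
# (normalized to unit total power) aligns the phases at the receiver.

def beamforming_weights(h):
    """Matched-filter (conjugate) weights with unit total transmit power."""
    norm = sum(abs(c) ** 2 for c in h) ** 0.5
    return [c.conjugate() / norm for c in h]

h = [cmath.rect(1.0, 0.3), cmath.rect(0.8, -1.2)]  # 2-antenna channel (illustrative)
w = beamforming_weights(h)

# Coherent combining: the received amplitude equals the channel norm ||h||
rx = sum(wi * hi for wi, hi in zip(w, h))
print(abs(rx))  # sqrt(1.0**2 + 0.8**2) ~= 1.2806
```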
Spatial multiplexing (SM): requires a MIMO antenna configuration. In spatial multiplexing, a high-rate signal is split into multiple lower-rate streams and each stream is transmitted from a
different transmit antenna in the same frequency channel. If these signals arrive at the receiver
antenna array with sufficiently different spatial signatures, the receiver can separate these
streams, creating parallel channels essentially for free. Spatial multiplexing is a very powerful technique for increasing channel capacity at higher Signal-to-Noise Ratios (SNR). The maximum number of spatial streams is limited by the lesser of the number of antennas at the transmitter and at the receiver. Spatial multiplexing can be used with or without transmit channel knowledge; such techniques are discussed in [14]. The V-BLAST [25] and D-BLAST [26] techniques, aimed at increasing the data rate, have also been developed with a layered architecture for achieving spatial multiplexing.
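The stream-splitting step can be sketched as a simple round-robin demultiplexer (an illustrative toy, not a BLAST implementation; names and the bit pattern are assumptions):

```python
# Spatial multiplexing sketch: one high-rate stream is demultiplexed
# round-robin into per-antenna substreams, each at 1/N of the source rate.

def spatial_demux(bits, n_antennas):
    return [bits[i::n_antennas] for i in range(n_antennas)]

def spatial_mux(streams):
    """Receiver side: re-interleave the separated substreams."""
    out = []
    for group in zip(*streams):
        out.extend(group)
    return out

bits = [1, 0, 1, 1, 0, 0, 1, 0]
streams = spatial_demux(bits, 2)
print(streams)                        # [[1, 1, 0, 1], [0, 1, 0, 0]]
print(spatial_mux(streams) == bits)   # True
```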
Space Time (Diversity) Block Coding (STBC): techniques are used when there is no channel
knowledge at the transmitter. In diversity methods a single stream (unlike multiple streams in
spatial multiplexing) is transmitted, but the signal is coded using techniques called space-time
coding. The signal is emitted from each of the transmit antennas using certain principles of full or near-orthogonal coding. Diversity coding exploits the independent fading in the multiple antenna links to enhance signal reliability. Because there is no channel knowledge,
there is no beamforming or array gain from diversity coding. Alamouti [27] proposed a basic yet
robust architecture for space time coding, which has been the foundation for many coding
schemes proposed until now [28, 29, 30].
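Alamouti's scheme can be shown end to end in a few lines. The sketch below uses a 2x1 configuration over a noiseless flat channel for brevity (the report's testbed uses 2x2 sets); the channel coefficients and symbols are illustrative values.

```python
# Alamouti 2x1 space-time block code sketch. Two symbols are sent over two
# slots: slot 1 -> (s1, s2), slot 2 -> (-s2*, s1*). Linear combining at
# the receiver recovers both symbols without channel knowledge at the
# transmitter.

def alamouti_encode(s1, s2):
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]

def alamouti_decode(r1, r2, h1, h2):
    g = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / g
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / g
    return s1_hat, s2_hat

h1, h2 = 0.8 + 0.3j, -0.2 + 0.9j      # per-antenna channel coefficients
s1, s2 = 1 + 1j, -1 + 1j              # QPSK-like symbols
(t1a, t1b), (t2a, t2b) = alamouti_encode(s1, s2)
r1 = h1 * t1a + h2 * t1b              # slot-1 received sample
r2 = h1 * t2a + h2 * t2b              # slot-2 received sample
s1_hat, s2_hat = alamouti_decode(r1, r2, h1, h2)
print(abs(s1_hat - s1) < 1e-12 and abs(s2_hat - s2) < 1e-12)  # True
```

The combining works because the cross terms h1*·h2 cancel, leaving each estimate scaled by |h1|^2 + |h2|^2, which is exactly the diversity gain.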
Fig 7: A simple MIMO structure
2.2.2. MIMO-OFDM:
MIMO works by creating multiple parallel data streams between the multiple transmit and receive antennas (see Fig. 7). Using the multi-path phenomenon, the receiver can differentiate the separate signal paths from each MIMO antenna. MIMO is often combined with OFDM (Orthogonal Frequency Division Multiplexing), another radio technology with tremendous potential for helping solve spectrum challenges. OFDM is a modulation technique that uses many sub-carriers, or tones, to carry a signal [32].
Typically, OFDM, a spread-spectrum technology that gives wireless networks a new physical
layer, is implemented in embedded chipsets made up of radio transceivers, Fast Fourier
Transform (FFT) processors, system input/output (I/O), serial-to-parallel (and back) converters, and OFDM logic. Fig. 8 shows the working of OFDM. An OFDM system takes a data stream and splits it into N parallel data streams, each at a rate 1/N of the original. Each stream is then mapped to a tone at a unique frequency; the tones are combined and sent through an Inverse Fast Fourier Transform (IFFT) to yield the time-domain waveform to be transmitted. By creating slower parallel data streams, the bandwidth of the modulation symbol is effectively decreased. OFDM can be simply defined as a form of multi-carrier modulation whose carrier spacing is carefully selected so that each sub-carrier is orthogonal to the other sub-carriers. As
it is well known, orthogonal signals can be separated at the receiver by correlation techniques,
hence, inter-symbol interference (ISI) among channels can be eliminated. Proper selection of
system parameters, such as number of tones and tone spacing, can greatly reduce, or even
eliminate ISI.
Some of the benefits of OFDM are: better coverage and penetration; reduced operation and installation costs; ultra-high spectral efficiency; high resistance to multipath; and minimized inter-symbol interference. A general OFDM modulator is implemented as:
Fig 8. Basic Structure for OFDM Modulation of bitstreams.
2.3 MIMO FPGA Testbed Issues
2.3.1 Utilization of FPGA in DIP and DSP applications
A lot of research has recently been done on utilizing FPGAs as a development platform for DIP and DSP algorithms. The authors in [33] have developed a high-level language (called SA-C) for expressing DIP algorithms, and an optimizing compiler that compiles high-level programs written in SA-C and runs them on FPGAs. SA-C is a single-assignment dialect of the C programming language designed to exploit many features of FPGAs [34, 35]. To compare the performance of FPGAs and Pentium processors, they compared SA-C programs compiled to a Xilinx FPGA against equivalent programs running on an 800 MHz Pentium III. For 8 common DIP routines implemented on both platforms, FPGAs offer 8- to 800-fold speedups over the Pentium. Experimental results and analysis of issues such as pipelining, parallelism, optimizations, memory and I/O bring out many prominent features of FPGAs relevant to the image-processing realm. In [36] they present performance numbers for several image-processing routines, such as the Gaussian, max and Laplace filters, written in SA-C.
The authors in [37] present a pipelined architecture of image processing algorithms like median
filter, basic morphological operators, convolution and edge detection implemented on FPGA.
The hardware modeling is done with the Handel-C language. Moreover, in their work [38], the
performance and efficiency of Handel-C language on image processing algorithms is compared
at simulation level with another C-based system level language called SystemC and at synthesis
level with the industry-standard Hardware Description Language (HDL), Verilog. Comparison parameters at the simulation level include man-hours for implementation, compile time and lines of code; comparison parameters at the synthesis level include logic resources required, maximum frequency of operation and execution time.
The author in [39] implemented the Rank Order Filter, Erosion, Dilation, Opening, Closing and
Convolution algorithms using VHDL and MATLAB on two FPGA platforms. He also integrates the
FPGA algorithms into the modeling environment called ACS. The authors in [40] report the
speed ups that FPGAs offer on image processing methods (such as image denoising and
restoration, segmentation, morphological shape recovery etc.) on 2D and 3D images. In
computer vision and image processing, FPGAs have already been used to accelerate real-time
point tracking, stereo, color-based object detection, and video and image compression [40] (see
also [38]). Crookes presented a hardware FPGA implementation of image filtering to increase
the speed [42]. The authors in [43] applied three 2-input bubble sorting algorithm to obtain a
triple input sorter and implemented it in FPGA. This algorithm can be utilized to obtain the
maximum, middle, and minimum values and hence can be used to realize the 2-D sorting.
In the field of digital signal processing, FPGAs have been used widely, for implementation of
various algorithms of diverse complexity. FPGAs provide an ideal implementation platform for
developing a variety of wireless systems [44]. The authors in [45] discuss the implementation of an FIR (Finite Impulse Response) filter with variable coefficients that fits in a single FPGA. In some DSP applications, such as software-defined radio (SDR) [46], performance depends not only on latency and throughput but also on energy efficiency. Due to SDR's adaptivity and high computational requirements, an FPGA-based system is a very viable solution.
The authors in [47] present techniques for energy-efficient design at the algorithm level using
FPGAs. They apply these techniques to create energy-efficient designs for two signal processing
kernel applications: fast Fourier transform (FFT) and matrix multiplication (see also [48]).
2.3.2 Superiority of FPGA over other implementation platforms
DSPs: The primary reason most engineers choose an FPGA over a DSP is driven by the MIPS
requirements of an application. The shift towards FPGA today is motivated by the emergence of
innovative technologies like MIMO that utilize complex DSP algorithms. Such high performance and complexity requirements call for the massively parallel processing capabilities inherent to FPGAs. Even novices in FPGA design can implement and validate FPGA-based DSP designs by using model-based design tools and methodologies, which have greatly simplified the design flow [49]. As many of its features demonstrate, an FPGA is a more natural implementation platform for most digital signal processing algorithms.
The authors in [50] compare the development effort and performance of a field programmable
gate array (FPGA)-based implementation of a signal processing solution with that of a
traditional digital signal processor (DSP) implementation. They have implemented an acoustic
array processing task, where several traditional DSP “functions” found in many applications can
be employed. As they note, in terms of timing performance, the FPGA implementation is
significantly faster than the DSP.
ASICs: The authors in [51] discuss the advantage that FPGAs have due to their product reliability and maintainability, which also improve the development process. Since FPGA verification tools are closely related cousins of their ASIC counterparts, they have benefited enormously from the many years of investment in ASIC verification. The use of FPGA partitioning, test benches and simulation models has made both integration and ongoing regression testing very effective for quickly isolating problems, speeding up the development process, and simplifying product maintenance and feature additions.
General-purpose software: Software implementation of most image-processing algorithms faces several limitations. Complex operations have to be realized by a long sequence of simple operations, which can only be executed serially, and the range of available operations is limited to common basic ones. The constraint of real-time processing introduces additional complications, including limited memory bandwidth, resource conflicts, and the need for pipelining. The CPU is also burdened with additional tasks, such as OS requests and user interaction, which is a major drawback in the context of real-time processing. At real-time video rates of 25 frames per second, a single operation performed on every pixel of a 768 by 576 colour image (PAL frame) equates to 33 million operations per second (excluding the overhead of storing and retrieving pixel values). Many image-processing applications require several operations per pixel, resulting in an even larger number of operations per second. As a result, it is difficult to meet hard real-time requirements in software [40].
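The 33-million figure follows directly from the frame geometry, assuming one operation per pixel per colour channel:

```python
# Back-of-envelope check of the real-time figure quoted above: one
# operation per pixel per channel on a 768x576 PAL colour frame at 25 fps.

width, height, channels, fps = 768, 576, 3, 25
ops_per_second = width * height * channels * fps
print(ops_per_second)  # 33177600 -> about 33 million operations per second
```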
The salient features that make FPGAs superior in speed to conventional general-purpose processors like Pentiums are their greater I/O bandwidth to local memory, pipelining, parallelism, and the availability of optimizing compilers. Complex tasks that involve multiple image operators run much faster on FPGAs than on Pentiums; in fact, ref. [33] reports speedups of up to 800x on an FPGA using SA-C.
2.4 FPGA Design Options
The initial efforts to generate a hardware netlist for an FPGA target have been to use some
form of a Hardware Description Language (HDL), such as VHDL or Verilog, as a behavioral or
structural specification. The focus has been shifting, however, from traditional HDLs to higher-level languages. Presently, a range of high-level tools and languages for FPGA design are available, including: Celoxica's Handel-C [52], a C-based parallel language that generates EDIF; C++ extensions such as SystemC [53]; AccelChip's AccelFPGA, a Matlab synthesis tool that generates RTL; Xilinx's Forge [54], a Java-to-Verilog tool; Java classes such as JHDL [55]; and X-BLOX from Xilinx (see also [56]). Certain tools such as C-Level Design [57] convert "C" software into a hardware description language (HDL) format such as Verilog or VHDL that can be processed by traditional FPGA design flows. Annapolis MicroSystems has developed "CoreFire", which uses prebuilt blocks and removes the need for the back-end processes of the FPGA design flow. Ptolemy is a system that allows modeling, design, and simulation of embedded systems, and provides software synthesis from models. However, all these systems are still under development and have their own limitations.
2.4.1. Xilinx System Generator
Xilinx System Generator (XSG) for DSP is a tool offering block libraries that plug into the Simulink environment (containing bit-true and cycle-accurate models of Xilinx FPGAs' math, logic, and DSP functions). The AccelDSP tool allows DSP algorithm developers to create HDL
designs from MATLAB and export them into System Generator for DSP. With more than 1,000
built-in functions as well as toolbox extensions, MATLAB is an excellent tool for algorithm
development and data analysis. An estimated 90% of the algorithms used today in DSP
originate as MATLAB models [58]. Simulink is a graphical tool, which lets a user graphically
design the architecture and simulate the timing and behavior of the whole system. It augments
MATLAB, allowing the user to model the digital, analog and event driven components together
in one simulation.
The authors in [49] design a complete 2 × 2 transceiver, including the Alamouti space-time
decoder, using blocks from the System Generator for DSP tool. The authors in [59] discuss
design of digital electronics using XSG and Matlab. The design of digital electronics today begins
with the building of a model in Simulink Matlab. The model is later implemented in an FPGA
from Xilinx using VHDL source code. They implemented a model of a frequency estimator often
used in digital radar receivers in Matlab using XSG. The authors in [60] present a methodology
for implementing real-time DSP applications on a reconfigurable logic platform using XSG for
Matlab. As an example for the demonstration of the design methodology they implement a
function that measures the time delay between two sine waves. This function has applications
in a radar system.
The authors in [61] have developed a rapid-prototyping FPGA-based platform for wireless transmission using the Matlab and System Generator tools. Their goal is to build a versatile educational platform for wireless transmission based on FPGA design. The FPGA-based platform design flow in System Generator consists of a basic four-step methodology: system design, verification and simulation, implementation, and configuration.
2.4.2. Superior performance offered by XSG
A promising and efficient platform such as FPGA can be fully exploited with a powerful tool.
Current research has demonstrated the performance gains offered by System Generator, over
the other design tools, such as classical HDLs. Ref. [62] reports an 80% reduction in project development time and a 4-to-1 reduction in overall project time, including hardware integration and lab testing, using XSG. In a project involving the creation of a fully functional SDR waveform, the traditional design flow of hand-coding in VHDL took an experienced designer 645 hours, while a less experienced engineer completed the same task in fewer than 46 hours using Simulink and Xilinx System Generator. An efficient GUI for design entry and the facility of code reuse, together with automatic configuration of design blocks for varying bit widths and numbers of iterations, considerably reduce the design time and the engineer's overhead.
It has been argued [56] that the decrease in time for testing and verification alone is worth the
migration to System Generator. A 2X to 4X productivity improvement is likely to be achieved
using System Generator over conventional HDL language development methods due to its
design environment and simulation speed. In literature, comparison has been performed
between several design options. In [38], the performance and efficiency of Handel-C language
on image processing algorithms is compared at simulation level with another C-based system
level language called SystemC, and at the synthesis level with the industry-standard Hardware Description Language (HDL), Verilog.
CHAPTER 3: SYSTEM OVERVIEW
We present an algorithmic implementation of video streaming for preferential adaptation of CEZW+ (Color Embedded Zerotree Wavelet) compressed videos over wireless, in the face of variable network bandwidth and channel conditions. The video lecture is segmented into its component blocks. The compression scheme for these segments consists of CEZW coding of the video bits combined with motion compensation and prediction; this combined scheme removes the redundancy present in the video. Compressed bits are then sent to the Dynamic Decision Maker (DDM), which consists of a QoS (bandwidth, transmission power, delay, data rate and reliability) provisioning module. Bandwidth is estimated and fed to the DDM, where segments with higher relevance are allocated more bandwidth, and vice versa, based on their relative importance and motion.
Fig. 9. System Overview (video segmentation block; video component classification into teacher, blackboard and background motion sequences; QoS provisioning with control feedback information and CSI; (2-2) x 3 MIMO transmission system; implementation on FPGA)
The DDM also allocates power to sub-channels according to the fluctuating BW, and determines the total power constraint to be allocated for minimal cost. CSI from the receiver is fed back to the transmitter. The DDM adjusts the system's operation according to the BW and CSI. Bits are then passed to the MIMO system, where they are OFDM modulated; the resulting symbols are Alamouti STBC coded, transmitted through a (2-2) x 3 MIMO antenna system, received, and ML decoded at the receiver. The decoded bits are demodulated and checked for errors. The resulting bitstream is decompressed and the reconstructed video is presented to the user. Alamouti STBC is combined with spatial multiplexing (SM); SM is achieved by our novel method of transmitting segments of the segmented video through different sets of the transmitting antennas.
3.1 Video classification and Segmentation:
A typical classroom lecture has Teacher Writing On Board (TWOB) sessions (Fig. 1). These sessions show a teacher (in complete view) writing on the blackboard and explaining points to the class. Other types of frames constitute a very small portion of the video, have low motion and can be coded directly. For the sake of brevity we limit our discussion to TWOB sessions only.
Since video classification is the first and most common step employed in parallel data transmission systems, we exploit the characteristics of TWOB frames, which possess some typical traits that distinguish them from other frames: the complete board is visible, motion is slow, some students may be visible in the view, and a significant part of the tutor's body may be visible in the frame (see Fig. 1). The height of the teacher in these frames generally does not vary much, so they can be easily identified. They are classified into three visual objects: background, blackboard and tutor. In this work, we have modeled the background for a static camera.
Many such schemes have been proposed [13, 17]. In this system, Sobel edge detection based blackboard extraction is used for the horizontal and vertical BB edges, followed by blurring and intra-pixel filling to determine the BB. For background prediction, we choose a set of frames that have significant tutor motion; the teacher region is removed for extraction of the BG. Finally, the teacher region is extracted by comparing the present frame with the modeled background and blackboard.
3.2. Pooled Video Compression
A combined video compression scheme, including a) an enhanced and more efficient algorithm for conventional PIB (MPEG-1) packaging and b) CEZW coding (hence the name CEZW+), is used for video compression and removal of redundancy in the frame bits of each segment (teacher, BB or BG). Slow-motion videos consist of macroblocks with negligible motion vectors across consecutive frames. Our novel method motion-compensates individual segments to yield PIB frames with just one I-frame in a GOP of size n (say 24). This is achieved as follows. The change in motion vector is thresholded at a value θ, determined by an MSE cutoff function δ computed for a particular GOP as:

δ = max{ MSE(F_I,1 , F_I,GOPsize/m), MSE(F_I,GOPsize/m , F_I,2·GOPsize/m), ..., MSE(F_I,(m-1)·GOPsize/m , F_I,GOPsize) }

θ = k · δ
Fig. 10. Consecutive differences of frames 1 to 10 in the relatively fast and slow motion "Sparsh Lecture" video: changes in MV with frame transition; significant change at frame 8
where MSE(F_I,GOPsize/m , F_I,2·GOPsize/m) denotes the motion estimation error between frames situated every 1/m-th of the GOP size apart. m is known as the leap factor and represents the distance between two non-redundant frames. F_I,GOPsize/m represents the frame contents as if this frame were an intra frame at 1/m of the GOP size. θ is made proportional to δ through the constant k, which enables flexibility in the choice of the motion-vector threshold desired by the coder; the leap can be increased if a higher δ can be afforded with the same perceptual quality. Once θ is determined, there is no need to code or construct any P or B frames whose motion-vector value is below θ; estimation at the receiver is done by reusing the same I-frame for reconstruction, without affecting perceptual quality. This removes a large amount of redundancy present in the original PIB frame packaging.
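The thresholding scheme can be sketched as follows. This is a toy illustration, not the report's implementation: frames are modelled as flat pixel lists, and k and the sample GOP are illustrative values.

```python
# Sketch of the redundant-frame test: delta is the largest MSE between
# frames spaced GOPsize/m apart within one GOP, theta = k * delta, and any
# frame whose difference from the I-frame stays below theta is not coded
# (the decoder simply reuses the I-frame).

def mse(f1, f2):
    return sum((a - b) ** 2 for a, b in zip(f1, f2)) / len(f1)

def motion_threshold(frames, gop_size, m, k):
    step = gop_size // m
    samples = frames[0:gop_size + 1:step]
    delta = max(mse(a, b) for a, b in zip(samples, samples[1:]))
    return k * delta

def redundant(frame, intra_frame, theta):
    return mse(frame, intra_frame) < theta

# toy GOP: an almost-static sequence with one clear change near the end
frames = [[10, 10, 10]] * 6 + [[10, 40, 10]] * 3
theta = motion_threshold(frames, gop_size=8, m=4, k=0.5)
flags = [redundant(f, frames[0], theta) for f in frames]
print(flags)  # first six frames reusable, last three must be coded
```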
The resultant frames are then further coded using the CEZW scheme [13]. The compressed bit stream consists of the initial threshold T, followed by the symbols resulting from the dominant and subordinate passes, which are entropy coded using an arithmetic coder. A scaling factor decides the ratio of the bits per pixel available to the Y component to the bits per pixel available to the U and V components. The input segment frame is read as R, G and B components, converted to the Y, U, V colour space and down-sampled.
Fig 11. CEZW+ compression module steps
3.3 MIMO System Architecture
The MIMO system consists of multiple input and output antennas. Input data, in the form of compressed video bits, are suitably coded and then transmitted with a special hybrid scheme of spatial multiplexing and space-time block coding. These steps are described as follows:
Fig. 12. Architecture of the transmitter testbed (video input; CEZW+ compression: advanced PIB frame packaging, 2-D wavelet decomposition, CEZW compression coding; component classification into teacher and blackboard; Dynamic Decision Maker (DDM); hybrid Alamouti/SM MIMO-OFDM transmission structure; antenna sets to channel; synthesized on FPGA)
3.3.1. FEC and OFDM coding for overcoming ISI: FEC is used for avoiding retransmissions and for bit correction. A major obstacle in streaming is ISI (inter-symbol interference due to delays). Normal video transmission can cope with delays through buffering and other algorithms; however, such methods cannot be employed in real-time video streaming, where any delay hinders video display altogether and is thus intolerable. OFDM guarantees that ISI is overcome through the formation of orthogonal sub-channels. We use a 1024-point FFT for coding the compressed bitstream blocks. These blocks are appended with a cyclic prefix (CP), which ensures further protection from delay and ISI.
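The IFFT-plus-cyclic-prefix step can be sketched as below. This is a minimal illustration, not the testbed code: the report uses a 1024-point FFT, while N = 8 here keeps the example small, and the symbol values are assumptions.

```python
import cmath

# OFDM modulation sketch: sub-carrier symbols are IDFT'd into a time-domain
# block, and the last cp samples are prepended as the cyclic prefix. The
# receiver strips the CP and applies the DFT to recover the symbols.

def idft(symbols):
    n = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * k * t / n)
                for k, s in enumerate(symbols)) / n for t in range(n)]

def dft(samples):
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)) for k in range(n)]

def add_cyclic_prefix(block, cp):
    return block[-cp:] + block

symbols = [1, -1, 1, 1, -1, 1, -1, -1]    # BPSK-mapped bits on 8 tones
tx = add_cyclic_prefix(idft(symbols), cp=2)
rx = dft(tx[2:])                          # receiver strips the CP, then FFTs
print([round(abs(s - r), 9) for s, r in zip(symbols, rx)])  # all ~0
```

The CP makes the linear channel convolution look circular over the FFT window, which is what lets delayed copies of the block land inside the prefix instead of smearing into the next symbol.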
3.3.2. Spatial Multiplexing: Our structure achieves spatial multiplexing by transmitting different segments of the video through different antenna sets, as shown in the figure. Note that segment C is the background. This segment changes negligibly over a whole educational video, as determined by the above coding scheme, and is thus transmitted only when a significant change has occurred due to unpredicted movement in the background while the teacher is instructing. In such a case, the background bits are accommodated with the other two segments, although in different bandwidth regions; thus, a separate transmit antenna set for segment C is not constructed.
3.3.3. STBC and Transmitter system: For STBC, transmit and receive diversity is achieved with a (2-2) x 3 MIMO structure using Alamouti coding and an ML decoding scheme. There are two sets of antennas at the transmitter: set A transmits segment A (teacher), and set B transmits segment B (blackboard). Each set consists of two antennas for Alamouti coding of each segment separately.
Fig 13. Coding schematic for Simple Alamouti STBC.
These are then transmitted. Reception has a complex yet efficient hybrid structure. At the receiver, antennas A and B form a set for receiving segment A, while antennas B and C form the set for receiving segment B. Thus the original 2x2 Alamouti scheme is still employed, but within a hybrid, robust structure. Such a hybrid structure cannot be used at the transmitter due to the space-time coding constraint of Alamouti, which does not apply at the receiver. Moreover, the downlink received signals for the different segments are easily separated through filters, as different segments are transmitted in different BW regions. Fig. 14 summarizes the transmitter-receiver system.
Fig.14. Receiver system Antenna Selection
3.3.4. Channel and CSI: In this system, the primary MIMO channel model under consideration is a random, frequency non-selective, Rayleigh fading channel. For a MIMO system with MT transmit antennas and MR receive antennas, the system equation is Y = HX + N, where Y is the MR x 1 received signal vector, X is the MT x 1 transmitted signal vector, H is the MR x MT channel matrix, and N is the MR x 1 Rayleigh channel noise vector. We further assume that the channel state information (CSI) estimated at the receiver is available at the transmitter through feedback. This is used for extracting noise-to-carrier ratios, for bandwidth estimation, and for adjusting the total signal power, as discussed in the following sections.
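A toy realization of the system equation Y = HX + N can make the dimensions concrete. This is a sketch under stated assumptions: MT = MR = 2, and channel entries drawn as circularly symmetric complex Gaussians so that their magnitudes are Rayleigh distributed.

```python
import random

# Toy realization of Y = HX + N for a 2x2 Rayleigh-fading MIMO channel.

random.seed(1)

def cgauss(sigma=1.0):
    """Complex Gaussian sample; its magnitude is Rayleigh distributed."""
    return complex(random.gauss(0, sigma / 2 ** 0.5),
                   random.gauss(0, sigma / 2 ** 0.5))

MT, MR = 2, 2
H = [[cgauss() for _ in range(MT)] for _ in range(MR)]   # MR x MT channel
X = [1 + 0j, -1 + 0j]                                    # MT x 1 transmit vector
N = [cgauss(0.1) for _ in range(MR)]                     # MR x 1 noise vector

Y = [sum(H[r][t] * X[t] for t in range(MT)) + N[r] for r in range(MR)]
print(len(Y))  # MR received samples, one per receive antenna
```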
3.4. QoS Guaranteeing
3.4.1. Fluctuating Bandwidth: With changes in bandwidth, and a constraint on the amount of data transmitted, important data needs to be streamed even at lower bandwidth, while relatively little BW is allocated to data of lesser importance (changes in BB contents are given higher importance than teacher movement, and vice versa when teacher motion is present rather than changes in BB contents). This is achieved through bandwidth distribution based on the PSNR of the video frames, which is available from the coding block. Let the total available BW at time t be W_t^T, and let W_t^TCR and W_t^BB be the BW allocated to the teacher and blackboard. Let PSNR_Ft^TCR and PSNR_Ft^BB be the PSNR of the teacher and blackboard regions of frame Ft. We also assume the BW to be constant over a frame, which is quite practical. BW is then allocated to the segments according to the following metric:
W_t^α = W_t^T · PSNR_Ft^α / ( PSNR_Ft^TCR + PSNR_Ft^BB + ε · PSNR_Ft^BG )

where α is TCR, BB or BG (the BG term carrying the weight ε), and ε = 1 if the MSE of the BG frames exceeds the threshold δ, and 0 otherwise.
Here, MSE cannot serve as the metric as in [13], because we have used advanced PIB coding with redundant motion-vector removal, which inhibits the use of an MSE metric. Giving higher preference to coded frames with greater PSNR ensures maximum information streaming. Also, images with PSNR below the threshold need not be communicated, as they can be replaced by earlier frames without any perceptual loss.
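The PSNR-proportional split can be sketched as follows. This is an illustrative sketch, not the system code: the function name, the PSNR values in dB and the use of a dict are all assumptions, with eps gating the background's share as described in this section.

```python
# Sketch of PSNR-proportional bandwidth allocation across the three
# segments; eps = 1 only when the background has changed significantly,
# otherwise the background's share is withheld entirely.

def allocate_bw(total_bw, psnr, eps):
    denom = psnr["TCR"] + psnr["BB"] + eps * psnr["BG"]
    weights = {"TCR": psnr["TCR"], "BB": psnr["BB"], "BG": eps * psnr["BG"]}
    return {seg: total_bw * w / denom for seg, w in weights.items()}

psnr = {"TCR": 38.0, "BB": 42.0, "BG": 30.0}
alloc = allocate_bw(total_bw=1.0e6, psnr=psnr, eps=0)   # static background
print(alloc["BG"])                  # 0.0 -> no bandwidth spent on background
print(alloc["TCR"] + alloc["BB"])   # 1000000.0 -> rest shared by PSNR
```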
3.4.2. Optimal Power Allocation (OPA): Noise in the channels increases as BW increases, so the total power must conform to the increased power requirement; we can no longer depend on a constant total power constraint for all the different transmit streams. Lower power at lower BW means lower consumption, saving power and reducing cost, while higher BW needs a higher power allocation. We therefore develop an OPA scheme that determines the total transmit power based on CSI and BW estimation. Let the reference power requirement at time t and BW W_t^R be P_t^R (all powers in dBm). Let the changed BW at time t+t' be W_(t+t')^C, and the total power allocated for transmission be P_(t+t')^A; an increase or decrease in power is then determined dynamically through the following equation:
P_(t+t')^A = P_t^R + P_t^R · [ ( (1/n) Σ_(i=1..n) N_i^A / C_i^A )_(t+t') / ( (1/n) Σ_(i=1..n) N_i^R / C_i^R )_t ] · [ log10 W_(t+t')^C − log10 W_t^R ] / log10 W_t^R

where ( (1/n) Σ_(i=1..n) N_i^A / C_i^A )_(t+t') is the mean noise-to-carrier ratio, determined through CSI, for the new power allocation over the n sub-channels at time t+t'; the reference term at time t is defined similarly. The factor [log10 W_(t+t')^C − log10 W_t^R] / log10 W_t^R is included to account for the fact that a 10 kHz fluctuation at a 100 kHz reference is far less significant than a 10 MHz fluctuation at a 100 MHz reference; this factor ensures that the above formula does not yield an absurd value of the new power in such cases.
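Under our reading of this update rule, it can be sketched as below. The function name, sample N/C values and bandwidths are illustrative assumptions; the combine parameter covers both the averaged N/C used here and the maximum-N/C refinement discussed next in this section.

```python
import math

# OPA update sketch: the reference power is scaled by the ratio of
# noise-to-carrier conditions and by the log-scaled bandwidth fluctuation.

def opa_power(p_ref_dbm, nc_new, nc_ref, w_new, w_ref, combine=max):
    """combine=max follows the report's refinement; pass a mean function
    to use the averaged N/C instead."""
    nc_ratio = combine(nc_new) / combine(nc_ref)
    bw_factor = (math.log10(w_new) - math.log10(w_ref)) / math.log10(w_ref)
    return p_ref_dbm + p_ref_dbm * nc_ratio * bw_factor

nc_ref = [0.02, 0.03, 0.025, 0.02]   # per-sub-channel N/C from CSI at time t
nc_new = [0.04, 0.03, 0.05, 0.045]   # conditions after the BW change
p = opa_power(20.0, nc_new, nc_ref, w_new=2.0e5, w_ref=1.0e5)
print(round(p, 3))  # > 20: wider bandwidth and worse N/C demand more power
```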
Fig 15. N/C increase with increase in BW
The N/C may be high for one sub-channel while low in others; in such cases, averaging yields a far-off estimate of the actual channel conditions. Thus, the mean is replaced by the maximum value of N/C over the sub-channels. In our system, as there are two sets with two transmit antennas each and two receive antennas per set, n = 8. Fig. 16 shows the wastage of power at lower BW and the extra requirement at higher BW; due to the increase in N/C at higher BW, additional power is required for streaming.
Fig. 16-17. Constant power allocation and the resultant bottlenecks; integration of the power allocation module
3.4.3. Data Rate Increase: We achieve a high data rate due to the employment of spatial multiplexing in the system, over conventional SISO, MISO or non-SM MIMO channels. Parallel
transmission ensures a high data rate. The data rate also increases because redundant PB frames are removed during compression, so more video data can be transmitted in much less time; this is shown in the experimentation. Further, owing to the inherent MIMO property, the data rate increase is captured by the conventional Shannon capacity equations. Here H denotes the channel matrix (the transfer function of the channel), with the received data given by r_t = H s_t + v_t, where r is the received signal vector, s is the transmitted signal vector, and v is the noise vector.
The capacity equations are:
C = log2(1 + SNR × |H|²) for a SISO channel
C = log2(1 + SNR × HH*) for a MISO/SIMO channel
C = log2 det(I_M + (SNR/N) × HH*) for an M×N MIMO channel
where H = abs(H); H* = conj(trans(abs(H))); I_M is the identity matrix; M is the number of transmit antennas; and N is the number of receive antennas. However, even a fraction of these capacities is hardly reached in practice. We therefore use the water-filling (WF) algorithm together with the OPA scheme discussed above to ensure maximum throughput, as the WF algorithm iterates toward the Shannon limit. This is demonstrated experimentally.
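The water-filling step can be sketched in Python (illustrative names; our actual implementation is in MATLAB):

```python
import math

def waterfill(gains, total_power):
    """Water-filling over n parallel sub-channels: choose powers p_i with
    sum(p_i) = total_power to maximize sum log2(1 + p_i * g_i), where
    g_i is the linear gain (C/N per unit power) of sub-channel i."""
    # Sort sub-channels by inverse gain; drop the weakest ones until the
    # water level mu leaves every kept channel with non-negative power.
    inv = sorted((1.0 / g, i) for i, g in enumerate(gains))
    k = len(inv)
    while k > 0:
        mu = (total_power + sum(v for v, _ in inv[:k])) / k  # water level
        if mu - inv[k - 1][0] >= 0.0:
            break
        k -= 1
    alloc = [0.0] * len(gains)
    for v, i in inv[:k]:
        alloc[i] = mu - v
    return alloc

def throughput(gains, alloc):
    """Resulting rate in bits/s/Hz, the quantity WF pushes toward the
    Shannon limit."""
    return sum(math.log2(1.0 + p * g) for g, p in zip(gains, alloc))

# A deeply faded sub-channel (gain 0.1) receives no power at a low budget.
p = waterfill([1.0, 0.1], 1.0)
```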
3.4.4. Reliability: The Alamouti scheme decreases the BER of the system. We additionally use channel FEC coding to improve the BER and thus the perceptual quality. Moreover, as with the BW allocation, we give more power to the more important segments depending on their PSNR, delivering more information to the receiver.
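As an illustrative sketch of such a split (the proportional rule and names are our assumption; the text above only states that more important segments receive more power):

```python
def segment_power_split(psnr_db, total_power_mw):
    """Split the transmit power budget across the video segments in
    proportion to their PSNR-based importance weights, mirroring the
    BW allocation (hypothetical proportional rule)."""
    total = sum(psnr_db)
    return [total_power_mw * w / total for w in psnr_db]

# Teacher, blackboard, and background segments:
shares = segment_power_split([38.0, 30.0, 22.0], 100.0)
```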
3.5. XSG implementation of MIMO System on FPGA
Figure 18 shows the design of the system for the Alamouti MIMO scheme. The unique advantages of the design are its graphical modeling, ease of implementation, clear flow diagram, suitable interfacing between modules implementable in hardware and software, reusability, modularity, and ease of development. The system view can be made more modular by integrating the separate blocks into fewer subsystems.
Here is a brief description of the system block diagram.
Fig 18. FPGA MIMO System Generator Implementation of One segment of Hybrid Alamouti
A. The block on the transmitter side carries out BPSK modulation and feeds the two transmit antennas. The input signal is randomly generated and is ultimately also used to verify errors in the transmission. This completes the transmitter side.
B. The channel is simulated in the design itself. It is modeled as a flat-fading Rayleigh channel with random noise; the H matrix models the channel characteristic, which affects the signal in a particular way. The noise is additive in nature. To balance the delays introduced by some of the blocks, intentional delay is introduced in the other paths.
C. The receiver implements Maximum Likelihood decoding, which is used to finally take the decision about the transmitted symbol. This is followed by BPSK demodulation, so that the symbols are converted back to the original signal format. A comparison of the received signal with the original signal shows the errors in transmission.
D. Performance of the system is assessed by simulating the transmission of a large number of bits. The cumulative error, determined by accumulating individual errors, gives the Bit Error Rate (BER). This model can be simulated in software for accuracy.
E. The ‘System Generator’ token on the left is finally used to convert the design into a fixed-point model implementable on hardware. XSG generates complete files that can be used directly in the Xilinx ISE software for creating the final design.
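The BER measurement described in item D can be mimicked in software. The sketch below simulates the simpler 2×1 Alamouti case with BPSK over flat Rayleigh fading (illustrative parameters; the actual XSG model is the 2×2 fixed-point design):

```python
import math, random

def alamouti_ber(n_pairs=5000, snr_db=10.0, seed=1):
    """Monte-Carlo BER of Alamouti 2x1 BPSK over flat Rayleigh fading
    with ML combining and threshold decision."""
    rng = random.Random(seed)
    sigma = math.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)  # per-dimension noise
    errors = 0
    for _ in range(n_pairs):
        s0 = rng.choice((-1.0, 1.0))
        s1 = rng.choice((-1.0, 1.0))
        # Rayleigh channel taps, constant over the two symbol periods
        h0 = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        h1 = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        n0 = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        n1 = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        # Period 1 transmits (s0, s1); period 2 transmits (-s1*, s0*)
        r0 = h0 * s0 + h1 * s1 + n0
        r1 = -h0 * s1 + h1 * s0 + n1   # conjugates are trivial for real BPSK
        # Alamouti ML combining followed by a sign decision
        d0 = h0.conjugate() * r0 + h1 * r1.conjugate()
        d1 = h1.conjugate() * r0 - h0 * r1.conjugate()
        errors += (d0.real > 0) != (s0 > 0)
        errors += (d1.real > 0) != (s1 > 0)
    return errors / (2.0 * n_pairs)
```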
The models of any circuit or system developed in XSG can easily be simulated in Matlab, which is one of the strongest arguments for using the Xilinx Blockset and XSG. System Generator (XSG) provides bit-true and cycle-accurate simulation for DSP, so the user can validate the design before implementing it on hardware. Because it is generic to any Xilinx
FPGA device, by simply changing one single parameter the user can compare the results of say,
a Virtex-II implementation to that of a Virtex-4 device. It also provides many features such as
System Resource Estimation to take full advantage of the FPGA resources, Hardware Co-
Simulation and accelerated simulation through hardware in the loop co-simulation etc. Along
with the high level abstraction, the tool also provides access to underlying FPGA resources
through low-level abstractions, allowing the construction of highly efficient FPGA designs.
Figure 19 shows the hardware optimization made possible by choosing a proper rolling factor in any loop. For a nested loop, the option is also provided to optimize both loops separately by choosing loop rolling factors individually for the row and column loops.
Figure 19 also shows the design flow in AccelDSP, from Matlab code to the generation of a System Generator block that can be added as a custom, user-defined reusable block. For the generation of such a block, all the previous steps must have completed without any error. Note that AccelDSP executes both the original Matlab design (floating-point) and the generated hardware design (fixed-point) so that the user can compare the two.
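The float-vs-fixed comparison that AccelDSP automates can be illustrated with a hand-rolled quantizer (the word length and fraction bits below are illustrative choices, not the tool's defaults):

```python
def to_fixed(x, frac_bits=14, word_bits=16):
    """Quantize x to a signed fixed-point grid (word_bits total,
    frac_bits fractional), saturating at the representable range --
    the kind of float-vs-fixed comparison a fixed-point report covers."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

# Quantization error is bounded by half an LSB inside the range,
# while out-of-range values saturate.
err = max(abs(v - to_fixed(v)) for v in (0.1, -0.73, 1.2345))
```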
Fig 19. Optimization steps for implementation
3.6 FPGA Hardware Design
We have implemented our work on the FPGA Spartan 3E xc3s500e-5fg320 platform. The design takes into consideration practical limitations of the hardware, namely the number of input/output pins, interfacing, and clock issues. A major characteristic of our design is that all the logic in the FPGA can be rewired, or reconfigured, with a different design. This allows a large variety of logic designs (dependent on the device's resources), which can be interchanged for a new design as soon as the device is reprogrammed. The final design on the FPGA thus provides flexibility to reprogram, and reusability. An informal discussion with several students doing their projects on FPGAs or reconfigurable hardware confirmed the effectiveness of our approach (using XSG) for producing the final FPGA design in a short time and with little debugging effort.
The steps taken, the optimization techniques employed, the results, and the limitations are shown in the next chapter. Many advantages of FPGAs make them a preferred implementation choice in the DIP realm. Iterative multi-pass processing of data sets, such as the four stages of the Canny edge detector that require multiple passes over the image, must be performed sequentially on a general-purpose computer but can be fused into one pass on an FPGA, whose structure exploits both spatial and temporal parallelism. An FPGA can process multiple image windows in parallel, and multiple operations within one window in parallel as well. By employing optimization techniques such as loop fusion and loop unrolling, efficient usage of FPGA resources and implementation speed-ups are possible, as many redundant operations are avoided. FPGAs are capable of parallel I/O, which allows them to read (from memory), process, and write (to memory) simultaneously. Many operations, such as convolutions and square roots, can be executed much faster using pipelining and parallelism. The high computational density of FPGAs, together with low development costs, allows even the lowest-volume consumer market to bear the development costs of FPGAs.
CHAPTER 4: IMPLEMENTATION
The segmentation and pooled compression modules and the MIMO-OFDM system were implemented and tested in MATLAB. QoS guaranteeing was also done in MATLAB, prior to transmission. We then implemented the whole MIMO system on FPGA.
4.1 Steps that led to the final design
The following steps were involved:
Step 1: Development of the segmentation module.
Step 2: Development of the CEZW+ coder and decoder.
Step 3: OFDM MIMO system implementation on MATLAB
Step 4: Incorporation of QoS guaranteeing modules.
Step 5: Implementation of MIMO system on FPGA:
i. A pure Simulink model-based design of the functionality and simulation
ii. Using AccelDSP to generate System Generator blocks
iii. Using the blocks generated by AccelDSP and with the basic blocks in Xilinx library, design
of system by XSG in Simulink and simulation
iv. Generation of hardware code on VHDL, optimization, debugging, simulation etc in ISE
and generating programming file for a particular FPGA
v. Downloading to FPGA after configuration, testing, and demonstrating the functionality
Fig 20. Steps 1-2: Development of the segmentation and CEZW+ module
Fig 21. Steps 3-4: Integration of OFDM-MIMO system and QoS Guaranteeing
Fig 22. Step 5: Development and Integration of FPGA Network Streaming Module
4.2 Code Development Model
The code development was done using Modular approach. The project was divided into
different modules and each module was independently developed. Figure 22 shows the
different modules.
4.2.1 Segmentation Module
The segmentation module was developed and tested in Matlab. It segments each video frame into three visual objects. The GUI developed for the implementation is shown in fig. 18. Several techniques, such as an edge-detection-based approach, a Kalman-filtering approach for motion prediction, and a contour-mapping-based approach for object segmentation, were applied, and the optimal results were obtained.
4.2.2 Pooled Compression Module
The compression module was first implemented in Matlab. Block and image processing was done using the built-in Matlab Image Processing Toolbox functions. Advanced PIB frame packaging is performed with MATLAB and the MV threshold. The packet format is specially built to carry the information regarding PIB frames: special bits specify the number of times a particular frame has to be repeated in order to achieve successful PIB decoding and the best perceptual quality. The packet format is as shown in the figure:
Fig 23. MAC layer PDU format for transmission of bits
For implementing CEZW scheme, we employ open source Matlab code of EZW compression
scheme. However, the EZW scheme is limited to compression of grayscale images only. Thus,
we break the frames into their YUV components and run the code for separate compression of
each color space. The coding scheme is shown in the figure below:
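The per-pixel split can be sketched as follows (BT.601 coefficients assumed; the report does not state which conversion matrix was used):

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV so the luma (Y) and two chroma
    planes can each be fed to the grayscale-only EZW coder."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

# A gray pixel carries no chroma: Y equals the gray level, U = V = 0.
y, u, v = rgb_to_yuv(0.5, 0.5, 0.5)
```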
4.2.3 Network Streaming Scheme – Implementation of MIMO on MATLAB
For the transmitter, we develop the MATLAB code for the hybrid Alamouti scheme, with the formal mathematical analysis required by the Alamouti coder. The antennas had to be simulated owing to the lack of proper resources and the absence of any wireless network in the department; thus, each path from a transmit antenna to a receive antenna is separately modeled as a Rayleigh fading channel. The data is then QAM modulated and further OFDM modulated with a cyclic prefix (CP) to ensure removal of ISI.
The functions used are:
channel = rayleighchan(ST, DS, DMP, PG) constructs a frequency-selective ("multiple path") fading channel object that models each discrete path as an independent Rayleigh fading process. ST is the sampling time of the channel, kept equal to the bit duration; DS is the Doppler shift, kept negligible for testing since we stream to static users; DMP is a vector of multipath delays, each specified in seconds; and PG is a vector of average path gains, each specified in dB.
FFT and IFFT functions are used for the OFDM implementation, with an FFT size of 1024, which is thus also the block length of the transmission. The cyclic prefix is chosen to be of length ceiling(0.4 times the block size).
CSI is obtained through feedback from the receiver, with random modeling of Rayleigh fading and additive noise in the channel. Moreover, the CSI is set to provide a dynamic carrier-to-noise ratio and channel modeling in the form of changes in the fading of the sub-channels.
Self-designed code performs ML decoding of the hybrid Alamouti receive structure, based on the original decoding criterion set by Alamouti. Filters implemented for the downlink transmission separate the different segments, which have an inherent frequency separation due to the employment of OFDM coding. Each segment's PDUs are dealt with by separate code streams, and the outputs are then merged through a commonly shared function that integrates the segments into the whole frame sequence and passes it to the decompression module. Following decompression, frames are de-interleaved to account for the jump factor, and finally reconstructed.
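The OFDM step above (IFFT plus a cyclic prefix of length ceiling(0.4 × block)) can be sketched in Python; the recursive radix-2 FFT is a stand-in for MATLAB's fft/ifft:

```python
import cmath, math

def fft(x):
    """Recursive radix-2 FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    n = len(X)
    return [v.conjugate() / n for v in fft([x.conjugate() for x in X])]

def ofdm_symbol(block):
    """IFFT the modulated block and prepend a cyclic prefix of
    ceil(0.4 * block length), as in our MATLAB chain (FFT size 1024)."""
    time = ifft(block)
    cp = math.ceil(0.4 * len(block))
    return time[-cp:] + time

# Toy 8-point block instead of the full 1024-point FFT:
sym = ofdm_symbol([complex(b) for b in (1, -1, 1, 1, -1, 1, -1, -1)])
```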
4.2.4 Bandwidth Estimation
Most of the bandwidth estimation tools available, as well as those proposed in research papers, are based on the principle of self-induced congestion. Since bandwidth estimation has to be done every 5 seconds, using them is very inefficient, as they may themselves cause network congestion and thereby deteriorate performance. Therefore, we simply use the feedback-based packet transmission technique, which has inherent bandwidth estimation capability: the data is sent to the receiver at the rate at which feedback is received.
4.2.5 MATLAB to FPGA
We employ a two-pronged approach. An intelligent and practical design must balance two aspects. First, one must work on the software: developing the code, design, and block diagram to satisfy the design specifications, and assembling all the required software tools. Second, the choice of proper hardware, the resources required, and the learning time and development effort have to be taken care of. A successful design is one in which the two aspects influence each other and take into consideration the constraints and advantages the other platform offers.
Fig 24. Design approach for migrating from Matlab to FPGA: Matlab code → AccelDSP block design → System Generator design → Simulink design and simulation → downloading to FPGA, with mutual correspondence and influence among the stages.
4.3 Challenges / Failures faced
We faced certain challenges in deciding the jump factors for each individual segment in the presence of background noise, such as the falling of books. In those cases, the background gains more importance in terms of PSNR and thus needs to be compensated in the system. We did this by accommodating the teacher and background bits in the same transmission path and differentiating the bits corresponding to the two components using the specification bits of the MAC layer PDU shown in figure 23.
Another challenge was sudden change in the channel response and noise. Once CSI is obtained, the system has to wait for the transmission time of a whole block. Although this time is relatively small, if the channel changes abruptly during it, a loss in the accuracy and optimality of the DDM allocations occurs. Such a case was encountered while running the system to determine data rates with hybrid Alamouti. The result is shown in the figure below:
Fig. 24. Test case failure due to an abrupt change in channel state immediately after CSI information is obtained. The situation, however, soon rectifies itself as progressive CSI becomes available.
While working practically on the FPGA design, we developed many insights into the actual scenario, limitations, and advantages of these systems. The important challenges we faced were:
1. For any non-trivial design, downloading it onto the FPGA requires enormous resources. The difference between the number of I/O pins required and the number actually available on the hardware becomes a problem and forces us to modify the program.
2. Image processing applications present special problems when designing for FPGAs, because the software tools, such as blocksets developed using AccelDSP, do not accept three- or even two-dimensional data directly. Such data therefore has to be supplied sequentially as one-dimensional data. This precludes the use of many built-in MATLAB functions, since they work with two- or three-dimensional image matrices.
3. The number of blocks and the functions of the blocks in System Generator are also limited today. This has forced us to design many of the blocks ourselves, especially in the image processing field.
4. Not all blocks from Simulink are available in the Xilinx Blockset, and there is no way to generate hardware from models that use the basic Simulink Blockset (apart from the blocks provided by Xilinx). So although Simulink blocks provide graphical block-based design for simulation purposes, the actual implementation possible on hardware is quite different. This creates a big gap between modeling and hardware design and, as such, restricts the usefulness of the approach.
4.4 Code Development features for higher performance
The following code development and optimization techniques were used in the implementation
to enhance performance:
FPGA migration from MATLAB: The transmission and decoding structures eat up most of the available streaming time and thus slow down the system considerably, owing to slow processing on the software platform. The code is therefore mapped to the FPGA using the Xilinx System Generator tool. Initial coding is done in Matlab, and the compressed data is finally passed to the programmed FPGAs through JTAG cables for parallel processing of a large number of bits.
Program Level Optimizations: The use of inline functions and macros has greatly reduced the number of function calls, thus accelerating performance. Loops are resolved into parallel processing blocks in the FPGA implementation, drastically reducing the per-frame computation and processing time.
A powerful and valuable design option we used for efficient hardware generation in AccelDSP is "Unrolling a Loop to Increase Hardware Performance". Figure 25 shows the 'Fixed Point Report' generated by AccelDSP for a design of the MIMO block. In an FPGA, the 'loop unrolling' feature gives the designer a choice to trade off area against performance. A general 'for' or 'while' loop implemented in software, or in hardware with no optimization, must be executed as many times as the iteration count (say N) of the loop, meaning the data is applied to a single data path each time. However, if the loop construct is completely unrolled, the AccelDSP Synthesis Tool builds a different hardware structure in which N data samples are applied simultaneously to N identical parallel data paths. Performance increases N times at the cost of increasing the hardware area N times. If a full unroll consumes too much hardware, the user can opt for a partial unroll, striking a suitable balance between area and performance requirements. The encircled portion in figure 25 refers to the optimization of the statement r = r + 0.5 in AccelDSP, where r is a two-dimensional (108×2) array; the statement adds 0.5 to each element of r. Note that the option is provided to optimize the loop in both directions, which serves as a typical example of optimizing an N-dimensional loop in AccelDSP.
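In software terms, a factor-4 partial unroll of such an element-wise loop looks like this (a Python stand-in; in hardware each replicated statement becomes a parallel data path):

```python
def add_half_unrolled(r):
    """Add 0.5 to every element, with the loop body replicated four
    times per iteration -- the software analogue of a partial unroll
    (4 parallel data paths; area x4, iteration count /4)."""
    out = list(r)
    n = len(r)
    i = 0
    while i + 4 <= n:
        out[i] = r[i] + 0.5          # data path 0
        out[i + 1] = r[i + 1] + 0.5  # data path 1
        out[i + 2] = r[i + 2] + 0.5  # data path 2
        out[i + 3] = r[i + 3] + 0.5  # data path 3
        i += 4
    while i < n:                     # remainder when n is not a multiple of 4
        out[i] = r[i] + 0.5
        i += 1
    return out
```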
Fig 25: Output of AccelDSP “Generate Fixed Point Report”
Apart from the blocks provided by the standard library, we have also made custom blocks
using both MCode blocks (for non-algorithmic code) and AccelDSP (for algorithmic code).
This helped us to extend the utility of the XSG tool. An important attribute of our design
using AccelDSP was that the blocksets generated in AccelDSP for XSG, are reusable and can
be neatly divided into appropriate libraries each containing blocks specific to a certain field
such as (for example) ImageProcessingLibrary , MIMOSystemLibrary etc depending on their
applications. The MIMOSystemLibrary for example may contain all blocks made by us which
are useful in MIMO communication field and so on. Figure 26 shows the library consisting of
custom blocks designed by us. The MySysGenBlocks library in the Simulink Library Browser (encircled) contains blocks used in the design of MIMO systems. Another Simulink file, named MyMCodeBlocks, is the library of blocks made from MCode and gives the user the facility to augment the functionality of XSG. The user's model file, named 'untitled', can use blocks from both of these sets.
Figure 26. Snapshots of user-defined blocks augmenting functionality of XSG
Scalability is defined as the ability to support larger volumes of data and more users in cost-effective increments. In this section, we discuss two types of scalability:
1) Scalability in terms of the number of videos: The CEZW+ coding technique stores the lecture videos in highly compressed form, which enables storage of a large volume of such videos. The videos are encoded once and saved on the server for subsequent streaming to the clients at adaptive bitrates.
2) Scalability in terms of the number of users: To stream a video to a client, feedback-based QoS guaranteeing is performed at the server and a part of the encoded video bitstream is transferred to the client. Decoding is performed at the client side; the server performs no coding or decoding while processing the request. Thus, while serving a request, the server behaves like a normal web server and can handle a large number of simultaneous requests.
4.5 Limitations & Bottlenecks
Lecture videos of arbitrary dimensions cannot be used with the codec, because the nth-level wavelet decomposition requires the dimensions to be multiples of 2^n.
Transmission of coded bits through multiple channels also results in a change of frequency in practical transmission systems, generally called frequency shift. When implementing on a practical wireless transmission system, this can result in some data bits not being received, and thus in increased retransmission through the routers.
A sudden change in channel fading, whose probability is quite low, can defeat the QoS guaranteeing of DDM. This, however, occurs only a few times and is rectified when the next CSI feedback arrives after the block transmission.
The segmentation algorithm may not function properly if there are illumination changes in
the video thus leading to higher bitrates.
The blocks provided by Xilinx do not offer as much flexibility as their counterparts in the Simulink Blockset. Apart from requiring different treatment for these blocks, this also burdens the designer with calculating many design parameters and specifications, such as binary points, for every block output and for the constants used in blocks.
The number of blocks and functions of the blocks are also limited today. For example, the
existing FFT block provided with System Generator can only handle 16-bit input and 16-bit
output.
Although modern FPGAs can be reconfigured quickly, to achieve this dynamically while
continuing to process data is a complex and challenging task.
The size of memory that can be implemented using standard logic cells on an FPGA is
limited, as implementing memory is an inefficient use of FPGA resources.
However, many of the problems of the System Generator can be explained by the fact that it is
a new product not fully developed yet and it is expected that experience of the experts from
industry and universities will help it grow into a more practical system. Moreover, it is updated
continually, giving better performance and more blocks in Xilinx Blockset.
CHAPTER 5: RESULTS
5.1. PIB Frame Packaging Compression: GOPs of size 24 were chosen for the videos, and three different lecture videos were used. CEZW encoding threshold = 20; wavelet types employed: db4 and bior4.4; video dimensions: 240×320. The figure below shows the difference in the MVs of the GOP frames with jumps. The teacher movement from frame 1 jumps to frame 10 on crossing the threshold, with frame 1 as the reference. The blackboard, however, still shows no change significant enough to cross the MV threshold, so the same frame is repeated for consecutive frame reconstruction at the receiver. Table 1 shows the very low compressed bit output achieved by our pooled compression scheme:
Fig 27. Differences in Teacher and Blackboard frames and respective crossing of MV Thresholds.
Table 1. Performance comparison of videos for the lecture video sequence

Average bpp, CEZW compression only: Video 1 = 0.39, Video 2 = 0.52, Video 3 = 0.45

Average bpp with pooled compression:
m    Video 1   Video 2   Video 3
8    0.23      0.30      0.26
7    0.20      0.26      0.22
6    0.17      0.22      0.20
5    0.15      0.20      0.18
4    0.14      0.19      0.16
5.2. Data rate increase: Data rates are measured as the maximum number of bits allocated per Hz for transmission while maintaining the BER, since larger data rates may increase the BER. A better measure is to plot the data rate normalized with respect to the Shannon limits discussed earlier, rather than in bits/Hz; this is a more realistic measure of system performance. The experiment is performed under varying channel conditions and different SNR values of the transmitted signals. The graphs show the increased performance approaching the Shannon limits with WF.
Fig 28. Normalized Data rate increase under different channel conditions.
5.3. Reliability: The compressed video bits undergo channel effects in MIMO transmission. Different segments are allotted different power, similar to the BW allocation, depending on segment PSNR. The figure below shows the improvement in the Teacher segment's perceptual quality for frame no. 94 due to the increase in power allocation.
Fig. 29. Perceptual quality improvement in Teacher segment transmission with SNR = 3 dB, 6 dB, 9 dB, 12 dB; resultant BERs = 0.2938, 0.0328, 0.000562, 2.91e-5.
5.4. Bandwidth allocation: BW allocation was done for a special lecture of 286 frames in which we deliberately introduced noise in the background (falling of books) to verify a suitable resultant allocation. Different amounts of the available BW are allocated to the TCR (magenta), BB (green), and BG (black) segments.
Fig. 30. Bandwidth Allocation. Note the extra noise in BG at Frame number 209
5.5. Power allocation at different BW levels: The graph shows the power variation in the new scheme, as opposed to constant power allocation. 10 MHz is chosen as the reference BW, N/C is calculated from CSI, and the experimental BW is chosen to be 25 MHz. The change in the reconstruction of the frame is clearly visible.
Fig. 31. New Total power allocation and resultant reconstruction quality increase.
Fig. 32. New Total power allocation under different noise distributions.
5.6. BER through FPGA implementation of hybrid Alamouti
Figure 33 shows the output of the System Generator block simulation for the transmission of 500 bits. The two plots correspond to the two receive antennas in the 2×2 scheme. The encircled signal values correspond to transmission errors due to noise and inter-symbol interference. The design nevertheless shows a very small BER of 10/500 (0.02), which supports the superiority of our approach. Clearly, graphical design and output, together with the power of Simulink, is a great boon for the programmer.
Fig 33. System Generator simulation output for the two receive antennas.
Based on our work, we have concluded the following comparison of performance between VHDL and the System Generator.
5.7. Optimal Performance of XSG for FPGA’s
Table 2 describes the significant milestones that our approach of employing XSG achieves compared to the conventional approach of using VHDL platforms for FPGA implementation. Not only are the processing time, code memory requirement, and other aspects improved; XSG also bypasses the learning time required for VHDL coding. It thus proves a convenient tool in the hands of the generations to come.
Table 2. Comparison of VHDL and XSG platforms for FPGA implementation

Learning and design time: 3 months (VHDL) vs 4 weeks (XSG)
Debugging time: larger (VHDL) vs small, thanks to GUI design (XSG)
Code reuse: difficult and likely to create errors (VHDL) vs easy reuse of blocks by integrating them into a library (XSG)
Design overheads: the user must involve himself in many low-level design issues (VHDL) vs automatic generation of the HDL netlist, place-and-route information, etc. (XSG)
Flexibility: modifying the program is very error-prone (VHDL) vs model-based design that makes modification very easy (XSG)
Simulation/test and debugging support: large time required for simulating and creating waveforms (VHDL) vs the power of Simulink simulation and Matlab design (XSG)
Time for simulation: high (VHDL) vs an order of magnitude faster than an HDL simulator (XSG)
Higher-level routines: few (VHDL) vs 90 DSP blocks available, extensible (XSG)
Code lines/blocks for a comparable program: 200 lines of code (VHDL) vs 3-5 blocks (XSG)
Productivity: low (VHDL) vs 2X to 4X over HDLs (XSG)
Figure 34 shows the resource utilization on different target platforms in terms of four parameters common to all the platforms. The comparison clearly shows the greater efficiency of the higher-end platforms: the Spartan 2 platform uses the most resources on all parameters, while the Virtex 4 kit uses the fewest. Note that resource usage depends critically on the nature of the platform (the family specification) and may also vary with the design.
Figure 34. Resource utilization in different target platforms. (Where Spa2= Spartan2 xc2s200-
6fg456, Spa3= Spartan3E xc3s500e -5fg320, Vrtx2Pro = Virtex2Pro xc2vp30 ff6896, Vrtx4=
Virtex4 xc4vfx12 ML402)
CHAPTER 6: CONCLUSIONS AND FUTURE WORK
We ensure guaranteed QoS for low-motion video MIMO streaming through numerous novel techniques with supporting results [63] [64] [65] [66] [67] [68] [69]. We provide a new algorithm that significantly reduces the lecture video size and hence increases performance. Many bottlenecks and their solutions were considered and addressed through MIMO system architectures. Furthermore, our work highlighted the design improvements offered by FPGAs through techniques such as parallel I/O and pipelining, and it clearly demonstrates the optimal performance of System Generator, which makes it an excellent high-level design tool for FPGA design. The decrease in testing and verification time alone is worth the migration to System Generator, as it offers a 2X to 4X productivity improvement over conventional HDL development methods owing to its design environment and simulation speed. Our future work includes implementing such robust systems on FPGA testbeds and further increasing the performance of the system; such hardware testbeds are going to be the next-generation revolution systems. Further work on slides, QA-session videos, etc. is intended.
This work can be extended to improve the accuracy, run time, and robustness of the system by
1. Implementing several basic and complex modules of MIMO OFDM communication systems
in XSG and comparing their performance with those implemented on other design
platforms such as HandelC, SystemC etc
2. Making prototype of advanced MIMO, DSP and Compression systems and validating their
designs in XSG and downloading it on FPGA
3. Critically analyzing the performance of several FPGA platforms such as Spartan and Virtex
family, for complex modules such as Radar Signal detection application, Image convolution
and FFT/ IFFT.
4. Extending the functionality of XSG further by making libraries of XSG blocks using AccelDSP,
so that a large number of Matlab modules can be converted into hardware implementable
blocks.
REFERENCES
[1] J. Gerhard and P. Mayr, “Competing in the e-learning environment - strategies for universities,” Proceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS), January 2002, pp. 3270-3279.
[2] MIT Website, http://ocw.mit.edu/index.html.
[3] Z. Nedic, J. Machotka, A. Nafalski, "Remote laboratories versus virtual and real laboratories", 33rd Annual Frontiers in Education, 5-8 Nov. 2003, Vol.1, Iss. pp T3E-1- T3E-6.
[4] H.H Hann, M.W. Spong, "Remote laboratories for control education", Proceedings of the 39th IEEE Conference on Decision and Control, 12-15 Dec. 2000, Vol.1 pp895 – 900.
[5] B. G. Haskell, A. Dumitras, “ A background modeling method by texture replacement and mapping with application to content-based movie coding”, ICIP (1) 2002 pp 65-68.
[6] S. Murphy, S. Goor and L. Murphy, “Performance comparison of multiplexing techniques for MPEG-4 object based content”, IPCC 2005. 24thIEEE International 7-9 April 2005 pp 505 – 510.
[7] J.Dong, Y.F.Zheng, “Content-based retransmission for 3-D Wavelet video streaming over lossy networks”, IEEE Transactions on Circuits and Systems for Video Technology, Volume 16, Issue 9, September. 2006, pp.1125 – 1133..
[8] J. Lee, and H. Radha, “Interleaved Source Coding (ISC) for Predictive Video Coded Frames over the Internet,” IEEE Proc. ICC, May, 2005, pp 1224-1228.
[9] T. Liu and C. Choudary, “Content-Adaptive Wireless Streaming of Instructional Videos”, Multimedia Tools and Applications, 23(2), pp. 157-171, 2006.
[10] J. O. Limb and C. B. Rubinstein, “Plateau coding of the chrominance component of color picture signals", IEEE Transactions on Communications, vol. COM-22, no. 3, pp. 12{820, June 1974.
[11] Q. Zhang, W. Zhu and Y.Q. Zhang “End-to-End QoS for Video Delivery Over Wireless Internet.”, Invited paper for the proceedings of the IEEE, vol. 93, no. 1, January 2005
[12] D. Gesbert, M. Shafi, D. Shiu, P. J. Smith, and A. Naguib, “From theory to practice: an overview of MIMO space-time coded wireless systems,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 3, 2003, pp. 281-302.
[13] A. Mittal, A. Pande, and P. Verma, “Content-based network resource allocation for mobile engineering laboratory applications,” 6th International Conference on Mobile Learning (mLearn 2007), 16-19 October 2007, Melbourne, Australia.
[14] H. Bölcskei, “MIMO-OFDM wireless systems: basics, perspectives, and challenges,” IEEE Wireless Communications, August 2006, pp. 31-37.
[15] M. F. Sabir, R. W. Heath Jr., and A. C. Bovik, “Unequal power allocation for JPEG transmission over MIMO systems,” Proceedings of the IEEE Asilomar Conference on Signals, Systems, and Computers, pp. 1608-1612, Pacific Grove, CA, USA, Oct. 30 - Nov. 2, 2005.
[16] D. Song and C. W. Chen, “QoS guaranteed scalable video transmission over MIMO systems with time-varying channel capacity,” IEEE International Conference on Multimedia and Expo, 2007, pp. 1215-1218.
[17] M. Onishi, M. Izumi, and K. Fukunaga, “Blackboard segmentation using video image of lecture and its applications,” Proceedings of the 15th International Conference on Pattern Recognition, vol. 4, 3-7 Sept. 2000, pp. 615-618.
[18] A. Plotkin, Reconfigurable Hardware: FPGA Device, 2003.
[19] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays, Springer, Heidelberg, 2004.
[20] U. Vera, A. Meyer, A. Pattichis, and M. Perry, “Discrete wavelet transform FPGA design using MATLAB/Simulink,” Proceedings of SPIE - The International Society for Optical Engineering, vol. 6247, 2006.
[21] T. Vanevenhoven, “Using MATLAB to Create IP for System Generator for DSP,” Xcell Journal, Fourth Quarter 2007.
[22] A. K. Jain, Fundamentals of Digital Image Processing, Prentice Hall Inc.
[23] R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image Processing Using MATLAB, Pearson/Prentice-Hall, 2004.
[24] J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3445-3462, December 1993.
[25] C. Choudary and T. Liu, “Extracting content from instructional videos by statistical modelling and classification,” Pattern Analysis & Applications, 2006.
[26] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela, “V-BLAST: an architecture for realizing very high data rates over the rich-scattering wireless channel,” http://www.bell-labs.com/project/blast/, last accessed 25 April 2008.
[27] J. Du and Y. Geoffrey Li, “D-BLAST OFDM with channel estimation,” EURASIP Journal on Applied Signal Processing, 2005, pp. 605-612.
[28] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE J. Select. Areas Commun., vol. 16, pp. 1451-1458, Oct. 1998.
[29] G. J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multiple antennas,” Bell Labs Technical Journal, vol. 1, no. 2, 1996, pp. 41-59.
[30] R. S. Blum, Y. Geoffrey Li, J. H. Winters, and Q. Yan, “Improved space-time coding for MIMO-OFDM wireless communications,” IEEE Transactions on Communications, vol. 49, no. 11, November 2001.
[31] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block coding for wireless communications: performance results,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 3, March 1999.
[32] J. V. Nichols, “OFDM: Old Technology for New Markets,” Wi-Fi Planet, Nov. 2002. Available: http://www.wi-fiplanet.com/tutorials/article.php/1500641.
[33] B. A. Draper et al., “Accelerated image processing on FPGAs,” IEEE Transactions on Image Processing, vol. 12, no. 12, Dec. 2003, pp. 1543-1551.
[34] A. P. W. Böhm et al., “Mapping a single assignment programming language to reconfigurable systems,” Supercomputing, vol. 21, pp. 117-130, 2002.
[35] http://www.cs.colostate.edu/~cameron/, last accessed 20 April 2008.
[36] B. Draper et al., “Compiling and optimizing image processing algorithms for FPGAs,” Proceedings of the Fifth IEEE International Workshop on Computer Architectures for Machine Perception, 2000, pp. 222-231.
[37] V. Rao Daggu and M. Venkatesan, “Design and implementation of an efficient reconfigurable architecture for image processing algorithms using Handel-C,” Celoxica Inc. research papers, 2004. Available: http://www.celoxica.com/techlib/files/CEL-W040414XQ7-281.pdf.
[38] V. R. Daggu et al., “Implementation and evaluation of image processing algorithms on reconfigurable architecture using C-based hardware descriptive languages,” International Journal of Theoretical and Applied Computer Sciences (IJTACS). Available: www.gbspublisher.com/ijtacs/1002.pdf.
[39] A. Nelson, “Implementation of Image Processing Algorithms on FPGA Hardware,” Master's thesis, Graduate School of Vanderbilt University, 2000.
[40] S. Klupsch et al., “Real time image processing based on reconfigurable hardware acceleration,” Proceedings of the IEEE Workshop on Heterogeneous Reconfigurable Systems on Chip, 2002.
[41] R. W. Hartenstein, J. Becker, R. Kress, H. Reinig, and K. Schmidt, “A reconfigurable machine for applications in image and video compression,” Proc. Conf. Compression Technologies and Standards for Image and Video Compression, Amsterdam, The Netherlands, 1995.
[42] D. Crookes et al., “Design and implementation of a high level programming environment for FPGA-based image processing,” IEE Proceedings on Vision, Image and Signal Processing, vol. 147, no. 4, Aug. 2000, pp. 377-384.
[43] R. Maheshwari, S. S. S. P. Rao, and P. G. Poonacha, “FPGA implementation of median filter,” Tenth International Conference on VLSI Design, June 1997, pp. 523-524.
[44] A. Plotkin, Reconfigurable Hardware: FPGA Device, 2003.
[45] “FPGA-based FIR filter using bit-serial digital signal processing,” Atmel technical paper. Available: www.atmel.com/dyn/resources/prod_documents/DOC0529.PDF.
[46] C. Dick, “The platform FPGA: enabling the software radio,” Software Defined Radio Technical Conference and Product Exposition (SDR), November 2002.
[47] S. Choi, R. Scrofano, V. K. Prasanna, and J. W. Jang, “Energy-efficient signal processing using FPGAs,” Proceedings of the 2003 ACM/SIGDA Eleventh International Symposium on Field Programmable Gate Arrays, pp. 225-234.
[48] R. Scrofano, S. Choi, and V. K. Prasanna, “Energy efficiency of FPGAs and programmable processors for matrix multiplication,” Proceedings of the IEEE International Conference on Field-Programmable Technology (FPT), 16-18 Dec. 2002, pp. 422-425.
[49] A. Telikepalli and E. Fiset, “Platform FPGA design for high-performance DSP,” white paper. Available: http://www.lyrtech.com/DSP-development/technical_lib/form1_wp.php.
[50] R. Duren, J. Stevenson, and M. Thompson, “A comparison of FPGA and DSP development environments and performance for acoustic array processing,” 50th Midwest Symposium on Circuits and Systems (MWSCAS 2007), pp. 1177-1180.
[51] M. Parker, “FPGA versus DSP design and maintenance,” Altera Corporation technical papers. Available: http://www.techonline.com/learning/techpaper/199000540.
[52] Celoxica, Abingdon, Oxfordshire, UK, DK Design Suite. Available: http://www.celoxica.com/methodology/c2rtl.asp.
[53] J. Gerlach and W. Rosenstiel, “System level design using the SystemC modeling platform,” Workshop on System Design Automation, pp. 185-189, Rathen, Germany, Mar. 2000.
[54] Xilinx Inc., Forge, www.xilinx.com/ise/advanced/forge.htm.
[55] P. Bellows and B. Hutchings, “JHDL - an HDL for reconfigurable systems,” Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, pp. 175-184, April 1998.
[56] J. Frigo, T. Braun, J. Arrowood, and M. Gokhale, “Comparison of high-level FPGA design tools for a BPSK signal detection application,” Proceedings, SDR Forum Technical Conference, 2003.
[57] http://www.synopsys.com/C-level.html.
[58] T. Vanevenhoven, “Using MATLAB to Create IP for System Generator for DSP,” Xcell Journal, Fourth Quarter 2007.
[59] P. Fandén, “Evaluation of Xilinx System Generator,” Master's thesis, Linköping University, Department of Science and Technology, 2001.
[60] M. Ownby and W. H. Mahmoud, “A design methodology for implementing DSP with Xilinx System Generator for Matlab,” Proceedings of the 35th Southeastern Symposium on System Theory, 16-18 March 2003, pp. 404-408.
[61] G. E. Martinez-Torres, J. M. Luna-Rivera, and R. E. Balderas-Navarro, “FPGA-based educational platform for wireless transmission using System Generator,” IEEE International Conference on Reconfigurable Computing and FPGAs (ReConFig 2006), Sept. 2006, pp. 1-9.
[62] User Stories, BAE Systems. Available: http://www.mathworks.com/company/user_stories/userstory12386.html.
[63] S. Gupta, S. Mittal, S. Dasgupta, and A. Mittal, “MIMO Systems for Ensuring Multimedia QoS over Scarce Resource Wireless Networks,” ACM International Conference on Advance Computing, India, February 21-22, 2008.
[64] S. Mittal, S. Gupta, and S. Dasgupta, “System Generator: The State-of-Art FPGA Design Tool for DSP Applications,” Third International Innovative Conference on Embedded Systems, Mobile Communication and Computing (ICEMC2 2008), August 11-14, 2008, Global Education Center, Infosys.
[65] S. Mittal, S. Gupta, and S. Dasgupta, “FPGA: An Efficient and Promising Platform for Real-Time Image Processing Applications,” National Conference on Research & Development in Hardware & Systems (CSI-RDHS 2008), June 20-21, 2008, Kolkata, India.
[66] S. Gupta, S. Mittal, and S. Dasgupta, “Guaranteed QoS with MIMO Systems for Scalable Low Motion Video Streaming over Scarce Resource Wireless Channels,” International Conference on Information Processing (ICIP), 2008, IK International Pvt Ltd.
[67] S. Gupta, S. Mittal, and A. Mittal, “EureQA: Overcoming the Digital Divide Through a Multidocument QA System for E-Learning,” The National Conference on Emerging Trends in Information Technology, India, 2008.
[68] S. Mittal, S. Gupta, A. Mittal, and S. Bhatia, “BioinQA: Addressing Bottlenecks of Biomedical Domain through Biomedical Question Answering System,” International Conference on Systemics, Cybernetics and Informatics (ICSCI-2008), India, pp. 98-103.
[69] S. Mittal, S. Gupta, and A. Mittal, “BioinQA Multidocument Question Answering System: Enabling E-Learning for Masses,” Journal of Engineering Students, CAFFET INNOVA Technical Society, India, pp. 28-37.