FPGA IMPLEMENTATION OF MIMO SYSTEMS
FOR ENSURING MULTIMEDIA QOS OVER
WIRELESS CHANNELS

SAKET GUPTA AND SPARSH MITTAL
B.TECH. IV YEAR, DEPARTMENT OF ELECTRONICS AND COMPUTERS,
INDIAN INSTITUTE OF TECHNOLOGY, ROORKEE

FOR PARTIAL FULFILLMENT OF BACHELOR OF TECHNOLOGY IN ELECTRONICS AND COMMUNICATIONS ENGINEERING

UNDER THE SUPERVISION OF DR. S. DASGUPTA
ASSISTANT PROFESSOR, DEPARTMENT OF ELECTRONICS & COMPUTERS, INDIAN INSTITUTE OF TECHNOLOGY, ROORKEE

DATE: 22ND MAY, 2008
CERTIFICATE
This is to certify that Mr. Saket Gupta (Enrollment No. 040157) and Mr. Sparsh Mittal (Enrollment No. 041112), students of B.Tech IV year, Electronics and Communications Engineering, Department of Electronics & Computer Engineering, Indian Institute of Technology, Roorkee, have successfully worked on a research project entitled “FPGA implementation of MIMO Systems for ensuring Multimedia QoS over Wireless Channels” as part of the final year B.Tech project.
They have obtained highly competent results for the problem assigned to them, using several novel approaches. Their results are satisfactory, and they have four accepted and two submitted research papers at reputed ACM and IEEE conferences.
Date: May 22, 2008
…………………..
Dr. Sudeb Dasgupta
Assistant Professor,
Department of Electronics & Computer Engineering,
Indian Institute of Technology Roorkee
ACKNOWLEDGEMENT
We would like to thank our project guide, Dr. Sudeb Dasgupta, for his help and guidance in the Final Year B.Tech. project, and for continuously monitoring our work through regular reports. Under his mature guidance we felt secure taking up work even in the most challenging areas, in which we had no background whatsoever. Dr. Dasgupta gave us vital direction on what our line of action should be. Although equipment in this area is limited in the department, he provided access to everything he had, including working space in the lab.
We would also like to thank Dr. Ankush Mittal for his help and support. His constant encouragement and background vision for our work, apart from his wonderful suggestions and personal interest, have helped us a lot. Through his personal example we have learnt to be dedicated and responsible in our work, and this made working on the project a true learning experience. He has very kindly answered our unceasing questions and doubts in the area.
We would also like to thank our seniors: Amit Pande (B.Tech ’06 batch), PhD student at Iowa State University, Ames; Praveen Kumar Verma (senior PhD student, IITR); and Naveen (M.Tech 2nd year), who helped us come up with ideas for our project and assisted us from time to time.
Sparsh Mittal
Saket Gupta
B.Tech. (IV Year, Electronics and Communications Engineering)
Department of Electronics & Computer Engineering
Indian Institute of Technology Roorkee
ABSTRACT
Existing multimedia software in e-learning does not provide excellent multimedia data service to the common user; hence e-learning services are still short of intelligence and sophisticated end-user tools for visualization and retrieval. Network QoS (Quality of Service) becomes critical for precision-requiring low motion video streaming over scarce-resource wireless networks with fluctuating bandwidth, fading channels, multiple paths and a requirement of minimal, optimal power usage. In this project, QoS for low motion video streaming with the best perceptual quality is guaranteed with novel techniques. We consider educational videos for our research. Our strategy is to segment the video into different segments, code them with our pooled compression scheme (CEZW+), and employ Forward Error Correction coded OFDM signals for transmission over MIMO (Multiple Input Multiple Output) wireless channels.
We guarantee transmission QoS for such compressed video streaming with maximum reliability and perceptual quality; selective video frame transmission for least data redundancy; high data rates with MIMO systems; optimal power allocation for transmission at different bandwidth levels; and preferential allocation of fluctuating bandwidth according to the relative importance of segmented video blocks. We exploit both Spatial Multiplexing and Alamouti Space Time Block Coding for transmission. Experimental results demonstrate the effectiveness of our proposed schemes. We exploit low motion video characteristics to achieve maximum compression and streaming throughput. The MIMO system is implemented on a Xilinx Spartan FPGA (Field Programmable Gate Array). Parallel implementation of the MIMO-OFDM internal configuration on the FPGA, through a specifically designed process using the System Generator tool, guarantees optimal performance of the testbed, measured through parameters such as prototype development time, synthesis error elimination, processing time for transmission bit generation and decoding, FPGA resource utilization, and reliability, compared with conventional FPGA implementation flows such as those employing VHDL and Verilog. The results are compared with state-of-the-art transmission and hardware schemes over the network to illustrate the superior performance of our approach.
PUBLICATIONS FROM THE WORK
PUBLISHED AND ACCEPTED:
1. Saket Gupta, Sparsh Mittal, S. Dasgupta and A. Mittal, "MIMO Systems For Ensuring
Multimedia QoS Over Scarce Resource Wireless Networks", Published in the proceedings of
ACM International Conference On Advance Computing, India, February 21-22, 2008,
proceedings yet to appear.
2. Sparsh Mittal, Saket Gupta, and S. Dasgupta, "System Generator: The State-Of-Art FPGA
Design Tool For DSP Applications", Accepted for the proceedings of Third International
Innovative Conference On Embedded Systems, Mobile Communication And Computing
(ICEMC2 2008), August 11-14, 2008, Global Education Center, Infosys.
3. Sparsh Mittal, Saket Gupta, and S. Dasgupta, "FPGA: An Efficient And Promising Platform For
Real-Time Image Processing Applications", Accepted for the proceedings of National
Conference On Research & Development In Hardware & Systems (CSI-RDHS 2008) June 20-21,
2008, Kolkata.
4. Saket Gupta, Sparsh Mittal and Sudeb Dasgupta, “Guaranteed QoS with MIMO systems for
Scalable Low Motion Video Streaming over Scarce Resource Wireless Channels”, Accepted for
the proceedings of the International Conference on Information Processing (ICIP 2008), August 8-10, 2008; proceedings to be published by Springer.
SUBMITTED:
5. Sparsh Mittal, Saket Gupta, and S. Dasgupta, "MIMO Systems on FPGA using System
Generator for Providing Educational Multimedia services to Masses", submitted to IEEE
TENCON 2008, Hyderabad, India, November 18-21, 2008.
6. Saket Gupta, Sparsh Mittal and S. Dasgupta, "Optimal Performance FPGA testbeds for low
motion video streaming over MIMO wireless Channels", submitted to 15th IEEE International
Conference on High Performance Computing (HiPC), December 17-20, 2008.
TO BE SUBMITTED:
7. Saket Gupta, Sparsh Mittal and Sudeb Dasgupta, “OTMS: An optimal Testbed for Low
motion video streaming over MIMO wireless channels”, to be submitted in Elsevier
International Journal of Electronics and Communications.
TABLE OF CONTENTS

CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
PUBLICATIONS FROM THE WORK
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
1.1 Need for Video Compression
1.2 Need for QoS Enhancement for Multimedia Delivery
1.3 Need for MIMO Testbeds
1.4 Problem Statement
1.5 Contributions of this Work
1.6 Organisation of the Report
CHAPTER 2: BACKGROUND STUDY
2.1 Video Compression Scheme
2.1.1 DWT: The Discrete Wavelet Transform
2.1.2 Color Embedded Zerotree Wavelet (CEZW) Scheme
2.1.3 Motion Estimation and Motion Compensation as in MPEG-1
2.1.4 Frame Packaging
2.2 MIMO Issues
2.2.1 Existing MIMO Coding Techniques
2.2.2 MIMO-OFDM
2.3 MIMO FPGA Testbed Issues
2.3.1 Utilization of FPGA in DIP and DSP Applications
2.3.2 Superiority of FPGA over Other Implementation Platforms
2.4 FPGA Design Options
2.4.1 Xilinx System Generator (XSG)
2.4.2 Superior Performance Offered by XSG
CHAPTER 3: SYSTEM OVERVIEW
3.1 Video Classification and Segmentation
3.2 Pooled Video Compression
3.3 MIMO System Architecture
3.3.1 FEC and OFDM Coding for Overcoming ISI
3.3.2 Spatial Multiplexing
3.3.3 STBC and Transmitter System
3.3.4 Channel and CSI
3.4 QoS Guaranteeing
3.4.1 Fluctuating Bandwidth
3.4.2 Optimal Power Allocation (OPA)
3.4.3 Data Rate Increase
3.4.4 Reliability
3.5 XSG Implementation of MIMO System on FPGA
3.6 FPGA Hardware Design
CHAPTER 4: IMPLEMENTATION
4.1 Steps That Led to the Final Design
4.2 Code Development Model
4.2.1 Segmentation Module
4.2.2 Compression Module
4.2.3 Network Streaming Scheme
4.2.4 Bandwidth Estimation
4.2.5 MATLAB to FPGA
4.3 Challenges / Failures Faced
4.4 Code Development Features for Higher Performance
4.5 Limitations & Bottlenecks
CHAPTER 5: RESULTS
5.1 PIB Frame Packaging Compression
5.2 Data Rate Increase
5.3 Reliability
5.4 Bandwidth Allocation
5.6 BER Through FPGA Implementation of Hybrid Alamouti
5.7 Optimal Performance of XSG for FPGAs
CHAPTER 6: CONCLUSIONS AND FUTURE WORK
REFERENCES
APPENDIX: CONTENTS OF THE CD
List of Figures
CHAPTER 1: INTRODUCTION
Demands for multimedia services over wireless networks are rapidly increasing, while the expectation of quality for these services is becoming higher and higher. However, inherently limited channel bandwidth and the unpredictability of channel propagation become significant obstacles for wireless communication providers in offering high quality, reliability and data rates at minimum cost. Transmission and streaming of e-learning videos has been a hot topic and a challenging issue for many years. Real time delivery, or streaming, is essential for most educational structures. Many institutes, such as MIT and the IITs, have opened their web servers for free lecture-on-demand on several courses [1-2]. The concept of remote laboratories also demands real-time multimedia content delivery [3-4].
Educational videos possess inherent characteristics which can be exploited for robust and optimal transmission over wireless. These include acceptable coding with a scalable bitstream, provisioning of low bpp for transmission using suitable coding schemes, use of a static camera, predefined components such as the instructor, blackboard and background, and a low component motion rate in the video allowing minimal transmission of coded bits. General video coding standards and formats such as MPEG-1, MPEG-2 and H.261 achieve a high rate of video compression, but educational videos and these characteristics are not dealt with separately by these standards. Thus, an end-to-end structure for the transmission of such videos needs to be assembled.
Fig 1 : Snapshots of classroom lecture sessions in low motion videos.
1.1 Need for video compression
The smallest unit of quantization in an image is called a pixel element or pixel. A standard video
monitor displays a frame usually with the resolution of 800 * 600 pixels. In color image a pixel is
represented by 3 bytes of data.(one for Red, Blue and Green respectively).Thus even one
uncompressed image requires 1.373 MB of storage. One hour Video at 15 frames per second
will require 72.07 GB of space in our hard-disk and is impossible to transmit. This leads to need
for compression of videos.
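The storage arithmetic above can be sketched in a few lines. The figures below are the illustrative ones from the text (800 × 600 RGB at 15 fps), not measurements from any actual system:

```python
# Raw (uncompressed) video storage estimate for the figures used in the text.

def raw_frame_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed frame size in bytes (3 bytes/pixel for RGB)."""
    return width * height * bytes_per_pixel

frame = raw_frame_bytes(800, 600)      # 1,440,000 bytes per frame
frame_mb = frame / 2**20               # ~1.37 MB per frame

one_hour = frame * 15 * 3600           # 15 fps for 3600 seconds
one_hour_gb = one_hour / 2**30         # ~72.4 GB per hour

print(f"{frame_mb:.3f} MB/frame, {one_hour_gb:.2f} GB/hour")
```

This makes the case for compression concrete: a single hour of raw lecture video exceeds the capacity of most 2008-era storage and any consumer network link.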
One hour of video coded with MPEG standards still takes 500-600 MB of storage. MIT videos are coded in RealVideo, and the size is further decreased to around 160 MB, but this is also unsatisfactory. Educational lectures are slow-moving videos, and specific applications built to compress them on the basis of their content can achieve very high compression. In this project we have achieved a high compression rate for educational videos, providing a scalable solution to ensure the best multimedia QoS over various network conditions.
1.2 Need for QoS Enhancement for Multimedia Delivery
The distribution of network resources is generally done statically in traditional multimedia frameworks. However, serious obstacles to such educational multimedia delivery exist. These delivery and QoS issues are summarized as follows:
1 Addressing QoS issues directly through network and channel issues necessitates techniques far superior to traditional protocol development for guaranteeing QoS. Such a need is more acute in video streaming.
2 An efficient coding scheme is required that avoids transmission of redundant data. Generally, in such videos, the frame-to-frame motion of instructor, blackboard or background macroblocks is very slow. Conventional coding schemes do not exploit such redundancy.
3 Fluctuating bandwidth proves to be a bottleneck for viewing lecture videos at good resolution because of their large size.
4 Educational lectures are extensively used in various institutions and firms, thus entailing delivery at minimum cost and requiring optimal power usage for transmission. With fluctuations in bandwidth, power requirements and channel noise also change, requiring dynamic system adaptation.
5 In a bit rate regulation scheme, the educational video source might sometimes be required to decrease its output flow due to high traffic load across the network. This decrease leads to quality degradation, since quantization distortion becomes more noticeable at lower bit rates. However, real time educational video streaming requires high data rates to ensure the best uninterrupted perceptual quality of the video. Thus, a tradeoff between the two requirements is needed.
6 High reliability is required, as any loss of instructional content is prohibited.
7 Fading in wireless channels during transmission results in unpredictable loss.
8 Multipath delays and ISI must be accounted for, as these are harmful to educational video streaming (which requires high precision in bit transmission).
Although considerable work has been done on content-based classification [5-6], content-based streaming [7], bandwidth adaptation and network issues [8], a complete framework that addresses all these issues to provide an end-to-end solution with QoS for educational videos does not exist. Many attempts to exploit the above characteristics for overcoming these obstacles have been made. Liu et al. [9] provide a real-time content analysis method to detect and extract content regions from instructional videos and then adjust the Quality of Service (QoS) of video streams dynamically based on video content. However, real time network scenarios as mentioned above are not included. [7] uses a content-based retransmission scheme, but retransmission is generally not preferred in streaming over wireless. In videos with a static camera, there is a need to segment an individual frame into objects so that special coding techniques can be applied to each of them. Moreover, the regions having instructional content obtained by the other approaches have arbitrary dimensions which cannot be directly used with the CEZW compression scheme or ISO/ITU standards like MPEG-1, H.261, etc. [10].
In the area of resource allocation schemes, Zhang et al. [11] address such resource allocation problems. The work is novel and addresses many QoS issues from the network and protocol perspective. Real time QoS guaranteeing, however, involves wireless network behavior feedback and channel state information for effective feedback and system adjustment.
1.3 Need for MIMO Testbeds
WiFi and the existing high-speed cellular networks being deployed today meet some of the above needs, but MIMO-OFDM, used by WiMAX 802.16e and beyond-3G systems, is the technology needed to allow economical and scalable wireless broadband. MIMO and OFDM are key technologies for enabling the wireless industry to deliver on the vast potential and promise of wireless broadband.
MIMO systems can be used to increase system capacity as well as data reliability in wireless communication systems. Research has focused on developing space-time codes [12] for transmission over MIMO systems. While these codes increase capacity and improve data reliability, they assume that all data bits are equally important to the receiver. However, in videos coded using most current standards, different parts of the bitstream have different importance. This is especially so in educational videos [13]. For high data rates, spatial multiplexing schemes are employed for parallel sub-stream transmission [14]. MIMO systems are also used to effectively distribute the available power between different video segments in the most optimal way. [15] presents an unequal power allocation (UPA) scheme for transmission of JPEG compressed images over MIMO systems. [16] guarantees QoS on MIMO wireless for enhancement and base layers, with differential power allocation. However, these work under a constant transmit power constraint. Power requirements depend heavily upon network conditions, video quality and network bandwidth. Low cost optimal power allocation can be achieved only by considering all three factors simultaneously. While a large amount of work has been done on UPA for SISO wireless systems, there is very little published work to date on UPA for image and video communication over MIMO systems, and practically no such research on real time streaming.
MIMO systems generally require a large processing time when working with video lectures, as the blocks of bits generated from compressed educational videos are still quite large in number for real time processing, necessitating a huge processing time in software. FPGAs can be employed for speed enhancement, as they offer parallel implementation of time-consuming blocks, increasing speed drastically. While the speed of software is limited by internal processor clocking and other processes running on the system, dedicated hardware for such MIMO systems can be developed using FPGAs.
Recently, Field Programmable Gate Array (FPGA) technology [18] has become a viable target for the implementation of algorithms suited to Digital Signal Processing applications [19]. Field-programmable gate arrays (FPGAs) are nonconventional processors built primarily out of logic blocks connected by programmable wires. Each logic block has one or more lookup tables (LUTs) and several bits of memory. As a result, logic blocks can implement arbitrary logic functions (up to a few bits). Therefore, FPGAs as a whole can implement circuit diagrams by mapping the gates and registers onto logic blocks. With more than 1,000 built-in functions as well as toolbox extensions, MATLAB is an excellent tool for algorithm development and data analysis [20]. An estimated 90% of the algorithms used today in DSP originate as MATLAB models [21]. Simulink is a graphical tool which lets a user graphically design the architecture and simulate the timing and behavior of the whole system. It augments MATLAB, allowing the user to model digital, analog and event-driven components together in one simulation. Using Simulink, one can quickly build up models from libraries of pre-built blocks. Xilinx System Generator (XSG) for DSP is a tool offering block libraries that plug into Simulink, containing bit-true and cycle-accurate models of the math, logic and DSP functions of Xilinx FPGAs.
1.4 Problem Statement
The problem tackled in this project is to design a framework for an end-to-end e-learning
solution capable of dynamic video compression and transmission over scarce resource wireless
networks, and implement the system on dedicated hardware for real time robust processing.
The system must be competent to guarantee all necessary QoS in wireless networks, without
manipulating the network protocol systems.
1.5 Contributions of the work
In this project, we propose a new approach for guaranteeing QoS by bridging all the above
mentioned gaps in transmission, with network aware optimal resource allocation for
educational video streaming. This approach, employing MIMO systems implemented on FPGA,
is applicable to all low motion videos is exploited and explained in this project for educational
videos. The major contributions of our work are:
a) System rises above conventional network protocol issues to address QoS for videos.
b) Enabling of a streaming server-client system to perform real-time processing (using FPGA’s)
of videos.
c) Optimal bitstream generation (without redundancy) by pooled compression scheme.
d) Network QoS of optimal power, bandwidth adaptability, high reliability, high data rate and no loss due to routing delays is guaranteed by the system.
e) We combine both spatial multiplexing and STBC schemes to achieve high data rates even over long distances by the use of MIMO channels. Thus we use the available spectrum with the utmost efficiency to allow higher data throughput over the wireless link.
f) The huge processing time required by the MIMO-OFDM system is reduced to a fraction of the original by implementation on an FPGA testbed.
g) The method employed for this implementation compares favorably with other conventional methods of ‘burning’ systems onto FPGAs.
h) Our system works with different kinds of lecture videos, with varying illumination, changing
lecture environment and noise (unpredictable situations) in the lecture video.
i) MIMO-OFDM coding scheme for video streaming eliminates many conventional complex
processing techniques usually required at transmitter and receiver side (like CRC
retransmissions, ISI removal, etc).
j) Many research groups, professionals and companies working in the fields of digital electronics, MIMO, wireless communication, image processing, medical science, etc. are shifting towards the use of higher level tools and superior methodologies, and are ultimately going for hardware prototyping and implementation of their projects. Our work will boost their research by providing many insights and directions.
The proposed system architecture takes pre-recorded lecture videos and efficiently compresses
them so that they can be transmitted by the user in a scalable manner over MIMO channels.
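The Alamouti space-time block code mentioned in contribution e) can be illustrated with a short sketch. This is a minimal, idealized model (one receive antenna, perfect channel knowledge, no noise), not the FPGA implementation described in this report; the symbol and channel values are arbitrary:

```python
import numpy as np

def alamouti_encode(s1, s2):
    """2x2 transmission matrix: rows = time slots, columns = antennas."""
    return np.array([[s1,           s2],
                     [-np.conj(s2), np.conj(s1)]])

def alamouti_combine(r1, r2, h1, h2):
    """Linear combining at the receiver; recovers s1, s2 up to channel gain."""
    s1_hat = np.conj(h1) * r1 + h2 * np.conj(r2)
    s2_hat = np.conj(h2) * r1 - h1 * np.conj(r2)
    return s1_hat, s2_hat

# Toy run: two complex symbols over a fixed flat-fading channel, no noise.
s1, s2 = 1 + 1j, -1 + 1j
h1, h2 = 0.8 - 0.3j, 0.4 + 0.5j
X = alamouti_encode(s1, s2)
r1 = h1 * X[0, 0] + h2 * X[0, 1]   # received in slot 1
r2 = h1 * X[1, 0] + h2 * X[1, 1]   # received in slot 2
s1_hat, s2_hat = alamouti_combine(r1, r2, h1, h2)
gain = abs(h1)**2 + abs(h2)**2     # both estimates scale by |h1|^2 + |h2|^2
print(np.allclose(s1_hat / gain, s1), np.allclose(s2_hat / gain, s2))
```

The combining step shows why the code provides transmit diversity: each recovered symbol is scaled by the sum of both channel energies, so a deep fade on one antenna path does not destroy the symbol.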
1.6 Organisation of the report
Chapter 2 gives the introduction and related work in video compression and packaging schemes and in MIMO FPGA testbeds.
Chapter 3 explains the overall system architecture both at server and client ends. It explains the
theory behind various modules.
Chapter 4 discusses the implementation of the proposed system architecture. It was first prototyped and simulated in MATLAB; later, the implementation was performed on FPGA. The chapter also gives the details of the implementation, the challenges faced at each step, the code development model, the steps that led to the final design, and briefly explains the code development features.
Chapter 5 discusses the results obtained over several input videos.
Chapter 6 gives the conclusion and suggested future works.
The references are given at the end.
CHAPTER 2: BACKGROUND STUDY
This section discusses the basics of video processing and computer networks and also presents
a literature review of recent developments in these fields.
2.1 Video Compression Scheme
Transform coding has been a dominant method of video and still image compression. It takes advantage of the energy compaction properties of various transforms (such as the DCT, DFT and DWT) and of properties of the Human Visual System to minimize the number of useful coefficients. The DCT has been the popular choice for image and video processing schemes [22, 23]. Here we discuss the techniques which lay the foundation for understanding our hybrid compression scheme.
2.1.1 DWT: The Discrete Wavelet Transform
In image processing, the DWT (Discrete Wavelet Transform) is obtained for the entire image,
and it results in a set of independent, spatially oriented frequency channels or subbands. The
wavelet transform is typically implemented using separable and possibly different filters. It
allows localization in both the space and frequency domains.
Fig 2 : One level of Wavelet Decomposition
Typically the full image is decomposed into a hierarchy of frequency subbands. The decomposition is achieved by filtering along one spatial dimension at a time to effectively obtain four frequency bands. The lowest subband (LL) represents the information at all coarser scales (as shown in Figure 2), and it is further decomposed and subsampled to form another set of four subbands.
This process can be continued until the desired number of levels of decomposition is attained. Two analysis filters, namely g and h, carry out the decomposition into independent frequency spectra of different resolutions, producing different levels of detail. Formation of subbands does not itself cause any compression (the same number of samples is required to represent the subbands as for the original image), but it arranges the data in a more efficiently codable format. Figure 2 shows one level of wavelet decomposition.
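One level of this decomposition can be sketched with the simple Haar filter pair standing in for the analysis filters g and h (real codecs typically use longer biorthogonal filters; this is only an illustration of the subband structure):

```python
import numpy as np

def haar_dwt2(img):
    """One level of 2-D Haar decomposition of an even-sized image
    into LL, LH, HL, HH subbands (each a quarter of the input size)."""
    a = img.astype(float)
    # filter and subsample along rows (averages = lowpass, differences = highpass)
    lo = (a[:, 0::2] + a[:, 1::2]) / 2
    hi = (a[:, 0::2] - a[:, 1::2]) / 2
    # filter and subsample along columns
    LL = (lo[0::2, :] + lo[1::2, :]) / 2
    LH = (lo[0::2, :] - lo[1::2, :]) / 2
    HL = (hi[0::2, :] + hi[1::2, :]) / 2
    HH = (hi[0::2, :] - hi[1::2, :]) / 2
    return LL, LH, HL, HH

img = np.arange(64).reshape(8, 8)
LL, LH, HL, HH = haar_dwt2(img)
print(LL.shape)  # (4, 4): four quarter-size subbands, same total sample count
```

Note that the four 4 × 4 subbands together hold exactly as many samples as the 8 × 8 input, matching the remark above that subband formation by itself achieves no compression.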
Several efficient coding schemes have been used for coding DWT coefficients. Embedded
Zerotree Wavelet (EZW) introduced by Shapiro [24] is one such coding scheme for gray scale
images. It exploits the interdependence between the coefficients of the wavelet decomposition
of an image, by grouping them into spatial orientation trees (SOTs). It outputs an embedded bitstream. An embedded bitstream can be truncated at any point during decoding and used to obtain a coarse version of the image; decoding additional data from the compressed bitstream can then refine this version.
2.1.2 Color Embedded Zerotree Wavelet (CEZW) Scheme
For color images, the same coding scheme can be used on each color component. However,
this approach fails to exploit the interdependence between color components. It has been
noted that strong chrominance edges are accompanied by strong luminance edges. However,
the reverse is not true, that is, many luminance transitions are not accompanied by transitions
in the chrominance components. This spatial correlation, in the form of a unique spatial
orientation tree (SOT) in the YUV color space, is used in a technique for still image compression
known as Color Embedded Zerotree Wavelet (CEZW) [25-26]. CEZW exploits the
interdependence of the color components to achieve a higher degree of compression. The
parent child dependency in CEZW is illustrated in figure 3.
The coding strategy in CEZW is similar to Shapiro's EZW [24], and can be summarized as follows: let f_i(m, n) be a YUV image, where i ∈ {Y, U, V}, and let W[f_i(m, n)] be the coefficients of the wavelet decomposition of component i.
Fig 3 : Parent Child Dependency in CEZW scheme
1. Set the threshold T = max(|W[f_i(m, n)]|)/2, for i ∈ {Y, U, V}.
2. Dominant pass:
The luminance component is scanned first. Compare the magnitude of each wavelet coefficient
in a tree, starting with the root, to the threshold T.
If the magnitudes of all the wavelet coefficients in the tree (including the coefficients in the
luminance and chrominance components) are smaller than T, then the entire tree structure
(that is, the root and all its descendants) is represented by one symbol, the zerotree (ZTR)
symbol.
Otherwise, the root is said to be significant (when its magnitude is greater than T), or
insignificant (when its magnitude is less than T). A significant coefficient is represented by one
of two symbols, POS or NEG, depending on whether its value is positive or negative.
The magnitude of a significant coefficient is set to zero to facilitate the formation of zero tree
structures. An insignificant coefficient is represented by the symbol IZ, isolated zero. The two
chrominance components are scanned after the luminance component. Coefficients in the
chrominance components that have already been encoded as part of a zerotree are not
examined. This process is carried out such that all the coefficients in the tree are examined for
possible subzerotree structures.
3. Subordinate pass (essentially the same as in EZW): the significant wavelet coefficients in the image are refined by determining whether their magnitudes lie within the interval [T, 3T/2), represented by the symbol LOW, or the interval [3T/2, 2T), represented by the symbol HIGH.
4. Set T = T/2 and go to Step 2. Only the coefficients that have not yet been found to be significant are examined. The compressed bitstream consists of the initial threshold T, followed by the resulting symbols from the dominant and subordinate passes, which are entropy coded using an arithmetic coder [23].
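The thresholding and refinement logic of the two passes can be sketched on a flat array of coefficients. This toy omits the zerotree grouping across subbands and color components (the defining feature of CEZW), showing only how symbols are assigned against the threshold T; the coefficient values are arbitrary:

```python
import numpy as np

def dominant_pass(coeffs, T):
    """Classify each coefficient against threshold T (zerotrees not modeled,
    so every insignificant coefficient gets the isolated-zero symbol IZ)."""
    symbols = []
    for c in coeffs:
        if abs(c) >= T:
            symbols.append('POS' if c > 0 else 'NEG')
        else:
            symbols.append('IZ')
    return symbols

def subordinate_pass(coeffs, T):
    """Refine significant magnitudes: LOW for [T, 3T/2), HIGH for [3T/2, 2T)."""
    return ['HIGH' if abs(c) >= 1.5 * T else 'LOW'
            for c in coeffs if abs(c) >= T]

coeffs = np.array([34.0, -20.0, 9.0, -3.0])
T = np.max(np.abs(coeffs)) / 2       # initial threshold = 17, as in step 1
print(dominant_pass(coeffs, T))      # ['POS', 'NEG', 'IZ', 'IZ']
print(subordinate_pass(coeffs, T))   # ['HIGH', 'LOW']
```

Halving T and repeating (step 4) would progressively promote the smaller coefficients to significance, which is what makes the output bitstream embedded.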
Fig 4 : Flow diagram of CEZW coding algorithm
2.1.3 Motion Estimation and Motion Compensation as in MPEG 1
In image and video coding schemes, the image is divided into small blocks operated on by prediction techniques. Motion estimation is used to determine the movement of a macroblock from the reference frame to the current frame. Motion is estimated by searching for the macroblock in the reference picture that provides the closest match, as shown in figure 5. The difference between the values of the two macroblocks is coded for reconstruction at the decoder. To reduce the distortion between the decoded and the original picture, the encoder uses a reconstructed reference frame to perform motion estimation. This reconstructed reference frame is the same as that used at the decoder side.
Fig 5 : Motion estimation of a block
Motion estimation computes one motion vector per macroblock. Usually, the search is
conducted for the luminance component only. A predictive frame is constructed from the
motion vectors obtained for all macroblocks in the frame, by replicating the macroblocks from
the reference frame at the new locations indicated by the motion vectors. The difference
between the values of the predicted and the current frames, known as the predictive error frame (PEF), is then encoded using the same procedure as for an intra-coded frame. The frame obtained by adding the predictive frame to the PEF is known as the reconstructed frame. The energy of the PEF is low; thus many coefficients are zero, reducing the number of bits needed to encode the frame.
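The macroblock search described above can be sketched as an exhaustive (full-search) block matcher using the sum of absolute differences (SAD) as the matching criterion. The frame contents, block size and search range here are arbitrary toy values:

```python
import numpy as np

def best_match(ref, block, top, left, search=4):
    """Full search in a +/-search window around (top, left);
    returns the motion vector (dy, dx) with minimum SAD."""
    h, w = block.shape
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + h <= ref.shape[0] and x + w <= ref.shape[1]:
                sad = np.abs(ref[y:y+h, x:x+w] - block).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
    return best, best_sad

# Toy frames: the "current" block is a block of the reference frame,
# so the search should find it exactly with zero residual.
ref = np.arange(256, dtype=float).reshape(16, 16)
cur_block = ref[5:9, 6:10]            # 4x4 block located at (5, 6)
mv, sad = best_match(ref, cur_block, top=4, left=4)
print(mv, sad)                        # (1, 2) 0.0
```

A zero-SAD match means the PEF for this block is all zeros, which is the ideal case the text describes for low motion video: almost nothing needs to be coded beyond the motion vector.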
2.1.4 Frame Packaging
A video compression system identifies three types of frames, depending on the location of the reference blocks: the I-frame (intra-coded frame), the P-frame (predictive frame), and the B-frame (bi-directionally predictive coded frame). An I-frame is coded using information within the same frame only (to reduce spatial redundancy), whereas the location of the target block for motion estimation/compensation decides the type of coded frame (P-frame or B-frame). P-frames are coded relative to a temporally preceding I- or P-frame, and B-frames are coded relative to the nearest previous and/or future I- and P-frames. The reasons for coding B-frames are: 1) better matching of blocks, as a block can be matched in the reverse as well as the forward direction, giving a lower bpp (bits per pixel) after compression; and 2) B-frames can be filtered out of the video stream to lower the frame rate and improve the load condition of an under-performing network, without considerably affecting the video quality at the client side.
Fig 6 : Temporal relations between I, P and B frames
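The B-frame filtering mentioned in point 2) above is trivial to express, because no other frame is predicted from a B-frame in an MPEG-1-style GOP: dropping them never breaks decoding of the remaining frames. A sketch, using a typical (assumed) 12-frame GOP pattern:

```python
def drop_b_frames(gop):
    """Keep only I- and P-frames from a sequence of frame-type labels."""
    return [f for f in gop if f != 'B']

gop = list("IBBPBBPBBPBB")        # common 12-frame GOP pattern
reduced = drop_b_frames(gop)
print(''.join(reduced))            # "IPPP": frame rate drops to one third
```

In this pattern two of every three frames are B-frames, so filtering them reduces the streamed frame rate (and bandwidth) to roughly a third under network congestion.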
2.2 MIMO Issues
The availability of multimedia resources places new demands on the service that a network must provide. The most important of these are data rates, power requirements for transmission, bit error rate, delay and delay variation, and fluctuations in bandwidth. Compression algorithms and techniques are critical to the viability of multimedia streaming and transmission. Uncompressed digital television requires about 140 Mbps of bandwidth, which is impractical even in view of today's technological advancement. Since few users have this sort of network access, compression and QoS guaranteeing over the network are the only hope for the widespread deployment of digital video and multimedia.
Multimedia applications, particularly those using video and images, demand large bandwidths. However, bandwidth for the foreseeable future will be limited. The limitations arise from the cost of installing optical-fibre transmission, terminal equipment complexity and speed, tariffing regimes, switching speeds, and the increasing numbers of users sharing equipment and networks.
In the Indian context, bandwidth is a crucial factor: poor network infrastructure in most areas implies the need for video transmission of sufficient quality over poor network conditions. Moreover, designing new protocols, or modifying existing ones, to enhance QoS leads either to many conflicts or to increased protocol complexity.
Multimedia also makes new demands on the workstations used to reproduce audio and video. Processor speeds, transmission systems, storage media, and network samplers must all be capable of handling multimedia transmission at high rates.
2.2.1. Existing MIMO coding Techniques
Basic transmission schemes in MIMO systems include:
Precoding: Precoding is essentially multi-layer beamforming. In (single-layer) beamforming, the same signal is
emitted from each of the transmit antennas with appropriate phase (and sometimes gain)
weighting such that the signal power is maximized at the receiver input. The benefits of
beamforming are to increase the signal gain from constructive combining and to reduce the
multipath fading effect. In the absence of scattering, beamforming results in a well-defined directional pattern, but in typical cellular environments conventional beams are not a good analogy.
Precoding requires knowledge of the channel state information (CSI) at the transmitter.
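The phase-weighting idea can be illustrated with a small sketch. This is not the report's precoder: the conjugate (matched-filter) weights, function name and channel values below are assumptions chosen to show how CSI-based weighting makes the per-antenna contributions add in phase at the receiver.

```python
import cmath

# Single-layer transmit beamforming sketch: with channel knowledge (CSI),
# weighting each antenna by the conjugate of its channel coefficient
# (normalized to unit total power) aligns the phases at the receiver.

def beamforming_weights(h):
    """Matched-filter (conjugate) weights with unit total transmit power."""
    norm = sum(abs(c) ** 2 for c in h) ** 0.5
    return [c.conjugate() / norm for c in h]

h = [cmath.rect(1.0, 0.3), cmath.rect(0.8, -1.2)]  # 2-antenna channel (illustrative)
w = beamforming_weights(h)

# Coherent combining: the received amplitude equals the channel norm ||h||
rx = sum(wi * hi for wi, hi in zip(w, h))
print(abs(rx))  # sqrt(1.0**2 + 0.8**2) ~= 1.2806
```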
Spatial multiplexing (SM): requires a MIMO antenna configuration. In spatial multiplexing, a high-rate signal is split into multiple lower-rate streams and each stream is transmitted from a
different transmit antenna in the same frequency channel. If these signals arrive at the receiver
antenna array with sufficiently different spatial signatures, the receiver can separate these
streams, creating parallel channels essentially for free. Spatial multiplexing is a very powerful technique for increasing channel capacity at higher Signal-to-Noise Ratios (SNR). The maximum number of spatial streams is limited by the lesser of the number of antennas at the transmitter and at the receiver. Spatial multiplexing can be used with or without transmit channel knowledge; such techniques are discussed in [14]. The V-BLAST [25] and D-BLAST [26] techniques, aimed at increasing the data rate, have also been developed with a layered architecture for achieving spatial multiplexing.
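The stream-splitting step can be sketched as a simple round-robin demultiplexer (an illustrative toy, not a BLAST implementation; names and the bit pattern are assumptions):

```python
# Spatial multiplexing sketch: one high-rate stream is demultiplexed
# round-robin into per-antenna substreams, each at 1/N of the source rate.

def spatial_demux(bits, n_antennas):
    return [bits[i::n_antennas] for i in range(n_antennas)]

def spatial_mux(streams):
    """Receiver side: re-interleave the separated substreams."""
    out = []
    for group in zip(*streams):
        out.extend(group)
    return out

bits = [1, 0, 1, 1, 0, 0, 1, 0]
streams = spatial_demux(bits, 2)
print(streams)                        # [[1, 1, 0, 1], [0, 1, 0, 0]]
print(spatial_mux(streams) == bits)   # True
```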
Space Time (Diversity) Block Coding (STBC): techniques are used when there is no channel
knowledge at the transmitter. In diversity methods a single stream (unlike multiple streams in
spatial multiplexing) is transmitted, but the signal is coded using techniques called space-time
coding. The signal is emitted from each of the transmit antennas using certain principles of full or near-orthogonal coding. Diversity coding exploits the independent fading in the multiple antenna links to enhance signal reliability. Because there is no channel knowledge,
there is no beamforming or array gain from diversity coding. Alamouti [27] proposed a basic yet
robust architecture for space time coding, which has been the foundation for many coding
schemes proposed until now [28, 29, 30].
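Alamouti's scheme can be shown end to end in a few lines. The sketch below uses a 2x1 configuration over a noiseless flat channel for brevity (the report's testbed uses 2x2 sets); the channel coefficients and symbols are illustrative values.

```python
# Alamouti 2x1 space-time block code sketch. Two symbols are sent over two
# slots: slot 1 -> (s1, s2), slot 2 -> (-s2*, s1*). Linear combining at
# the receiver recovers both symbols without channel knowledge at the
# transmitter.

def alamouti_encode(s1, s2):
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]

def alamouti_decode(r1, r2, h1, h2):
    g = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / g
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / g
    return s1_hat, s2_hat

h1, h2 = 0.8 + 0.3j, -0.2 + 0.9j      # per-antenna channel coefficients
s1, s2 = 1 + 1j, -1 + 1j              # QPSK-like symbols
(t1a, t1b), (t2a, t2b) = alamouti_encode(s1, s2)
r1 = h1 * t1a + h2 * t1b              # slot-1 received sample
r2 = h1 * t2a + h2 * t2b              # slot-2 received sample
s1_hat, s2_hat = alamouti_decode(r1, r2, h1, h2)
print(abs(s1_hat - s1) < 1e-12 and abs(s2_hat - s2) < 1e-12)  # True
```

The combining works because the cross terms h1*·h2 cancel, leaving each estimate scaled by |h1|^2 + |h2|^2, which is exactly the diversity gain.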
Fig 7: A simple MIMO structure
2.2.2. MIMO-OFDM:
MIMO works by creating multiple parallel data streams between the multiple transmit and receive antennas (see Fig. 7). Using the multi-path phenomenon, the receiver can differentiate the separate signal paths from each MIMO antenna. MIMO is often combined with OFDM (Orthogonal Frequency Division Multiplexing), another radio technology with tremendous potential for helping solve spectrum challenges. OFDM is a modulation technique that uses many sub-carriers, or tones, to carry a signal [32].
Typically, OFDM, a spread-spectrum technology that gives wireless networks a new physical
layer, is implemented in embedded chipsets made up of radio transceivers, Fast Fourier
Transform (FFT) processors, system input/output (I/O), serial-to-parallel (and back) converters, and OFDM logic. Fig. 8 shows the working of OFDM. An OFDM system takes a data stream and splits it into N parallel data streams, each at a rate 1/N of the original. Each stream is then mapped to a tone at a unique frequency; the tones are combined and sent through an Inverse Fast Fourier Transform (IFFT) to yield the time-domain waveform to be transmitted. By creating slower parallel data streams, the bandwidth of the modulation symbol is effectively decreased. OFDM can be simply defined as a form of multi-carrier modulation whose carrier spacing is carefully selected so that each sub-carrier is orthogonal to the other sub-carriers. As
it is well known, orthogonal signals can be separated at the receiver by correlation techniques,
hence, inter-symbol interference (ISI) among channels can be eliminated. Proper selection of
system parameters, such as number of tones and tone spacing, can greatly reduce, or even
eliminate ISI.
Some of the benefits of OFDM are: better coverage and penetration; reduced operation and installation costs; ultra-high spectral efficiency; high resistance to multipath; and minimized inter-symbol interference. A general OFDM modulator is implemented as:
Fig 8. Basic Structure for OFDM Modulation of bitstreams.
2.3 MIMO FPGA Testbed Issues
2.3.1 Utilization of FPGA in DIP and DSP applications
A lot of research has recently been done on utilizing FPGAs as a development platform for DIP and DSP algorithms. The authors in [33] have developed a high-level language (called SA-C) for expressing DIP algorithms, and an optimizing compiler that compiles high-level programs written in SA-C and runs them on FPGAs. SA-C is a single-assignment dialect of the C programming language designed to exploit many features of FPGAs [34, 35]. To compare the performance of FPGAs and Pentium processors, they compared SA-C programs compiled to a Xilinx FPGA against equivalent programs running on an 800 MHz Pentium III. For 8 common DIP routines implemented on both platforms, FPGAs offer 8- to 800-fold speedups over the Pentium. Experimental results and analysis of issues such as pipelining, parallelism, optimizations, memory and I/O bring out many prominent features of FPGAs relevant to the image-processing realm. In [36] they present performance numbers for several image-processing routines, such as the Gaussian, max and Laplace filters, written in SA-C.
The authors in [37] present a pipelined architecture of image processing algorithms like median
filter, basic morphological operators, convolution and edge detection implemented on FPGA.
The hardware modeling is done with the Handel-C language. Moreover, in their work [38], the
performance and efficiency of Handel-C language on image processing algorithms is compared
at simulation level with another C-based system level language called SystemC and at synthesis
level with the industry-standard Hardware Description Language (HDL), Verilog. Comparison parameters at the simulation level include man-hours for implementation, compile time and lines of code; comparison parameters at the synthesis level include logic resources required, maximum frequency of operation and execution time.
The author in [39] implemented the Rank Order Filter, Erosion, Dilation, Opening, Closing and
Convolution algorithms using VHDL and MATLAB on two FPGA platforms. He also integrates the
FPGA algorithms into the modeling environment called ACS. The authors in [40] report the
speed ups that FPGAs offer on image processing methods (such as image denoising and
restoration, segmentation, morphological shape recovery etc.) on 2D and 3D images. In
computer vision and image processing, FPGAs have already been used to accelerate real-time
point tracking, stereo, color-based object detection, and video and image compression [40] (see
also [38]). Crookes presented a hardware FPGA implementation of image filtering to increase
the speed [42]. The authors in [43] applied three 2-input bubble sorting algorithm to obtain a
triple input sorter and implemented it in FPGA. This algorithm can be utilized to obtain the
maximum, middle, and minimum values and hence can be used to realize the 2-D sorting.
In the field of digital signal processing, FPGAs have been used widely, for implementation of
various algorithms of diverse complexity. FPGAs provide an ideal implementation platform for
developing a variety of wireless systems [44]. The authors in [45] discuss the implementation of an FIR (Finite Impulse Response) filter with variable coefficients that fits in a single FPGA. In some DSP applications, such as software-defined radio (SDR) [46], performance depends not only on latency and throughput but also on energy efficiency. Due to SDR's adaptivity and high computational requirements, an FPGA-based system is a very viable solution.
The authors in [47] present techniques for energy-efficient design at the algorithm level using
FPGAs. They apply these techniques to create energy-efficient designs for two signal processing
kernel applications: fast Fourier transform (FFT) and matrix multiplication (see also [48]).
2.3.2 Superiority of FPGA over other implementation platforms
DSPs: The primary reason most engineers choose an FPGA over a DSP is driven by the MIPS
requirements of an application. The shift towards FPGA today is motivated by the emergence of
innovative technologies like MIMO that utilize complex DSP algorithms. Such high performance and complexity requirements call for the massively parallel processing capabilities inherent to FPGAs. Even novices in FPGA design can implement and validate FPGA-based DSP designs by using model-based design tools and methodologies, which have greatly simplified the design flow [49]. As many of its features demonstrate, an FPGA is a more natural implementation platform for most digital signal processing algorithms.
The authors in [50] compare the development effort and performance of a field programmable
gate array (FPGA)-based implementation of a signal processing solution with that of a
traditional digital signal processor (DSP) implementation. They have implemented an acoustic
array processing task, where several traditional DSP “functions” found in many applications can
be employed. As they note, in terms of timing performance, the FPGA implementation is
significantly faster than the DSP.
ASICs: The authors in [51] discuss the advantage that FPGAs have due to their product reliability and maintainability, which also improve the development process. Since FPGA verification tools are closely related cousins of their ASIC counterparts, they have benefited enormously from the many years of investment in ASIC verification. The use of FPGA partitioning, test benches and simulation models has made both integration and ongoing regression testing very effective for quickly isolating problems, speeding up the development process, and simplifying product maintenance and feature additions.
General-purpose software: Software implementation of most image-processing algorithms faces several limitations. Complex operations have to be realized by a long sequence of simple operations, which can only be executed serially, and the range of available operations is limited to common basic ones. The constraint of real-time processing introduces additional complications, including limited memory bandwidth, resource conflicts, and the need for pipelining. The CPU is also burdened with additional tasks, such as OS requests and user interaction, which is a major drawback in the context of real-time processing. At real-time video rates of 25 frames per second, a single operation performed on every pixel of a 768 by 576 colour image (PAL frame) equates to 33 million operations per second (excluding the overhead of storing and retrieving pixel values). Many image-processing applications require several operations per pixel, resulting in an even larger number of operations per second. As a result, it is difficult to meet hard real-time requirements in software [40].
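The 33-million figure follows directly from the frame geometry, assuming one operation per pixel per colour channel:

```python
# Back-of-envelope check of the real-time figure quoted above: one
# operation per pixel per channel on a 768x576 PAL colour frame at 25 fps.

width, height, channels, fps = 768, 576, 3, 25
ops_per_second = width * height * channels * fps
print(ops_per_second)  # 33177600 -> about 33 million operations per second
```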
The salient features that make FPGAs superior in speed to conventional general-purpose processors like Pentiums are their greater I/O bandwidth to local memory, pipelining, parallelism, and the availability of optimizing compilers. Complex tasks that involve multiple image operators run much faster on FPGAs than on Pentiums; in fact, ref. [33] reports speedups of up to 800x on an FPGA using SA-C.
2.4 FPGA Design Options
The initial efforts to generate a hardware netlist for an FPGA target have been to use some
form of a Hardware Description Language (HDL), such as VHDL or Verilog, as a behavioral or
structural specification. The focus has been shifting, however, from traditional HDLs to higher-level languages. Presently, a range of high-level tools and languages for FPGA design are available, including: Celoxica's Handel-C [52], a C-based parallel language that generates EDIF; C++ extensions such as SystemC [53]; AccelChip's AccelFPGA, a Matlab synthesis tool that generates RTL; Xilinx's Forge [54], a Java-to-Verilog tool; Java classes such as JHDL [55]; and X-BLOX from Xilinx (see also [56]). Certain tools such as C-Level Design [57] convert "C" software into a hardware description language (HDL) format such as Verilog or VHDL that can be processed by traditional FPGA design flows. Annapolis MicroSystems has developed "CoreFire", which uses prebuilt blocks and removes the need for the back-end processes of the FPGA design flow. Ptolemy is a system that allows modeling, design, and simulation of embedded systems, and provides software synthesis from models. However, all these systems are still under development and have their own limitations.
2.4.1. Xilinx System Generator
Xilinx System Generator (XSG) for DSP is a tool offering block libraries that plug into the Simulink environment (containing bit-true and cycle-accurate models of Xilinx FPGAs' math, logic, and DSP functions). The AccelDSP tool allows DSP algorithm developers to create HDL
designs from MATLAB and export them into System Generator for DSP. With more than 1,000
built-in functions as well as toolbox extensions, MATLAB is an excellent tool for algorithm
development and data analysis. An estimated 90% of the algorithms used today in DSP
originate as MATLAB models [58]. Simulink is a graphical tool, which lets a user graphically
design the architecture and simulate the timing and behavior of the whole system. It augments
MATLAB, allowing the user to model the digital, analog and event driven components together
in one simulation.
The authors in [49] design a complete 2 × 2 transceiver, including the Alamouti space-time
decoder, using blocks from the System Generator for DSP tool. The authors in [59] discuss
design of digital electronics using XSG and Matlab. The design of digital electronics today begins
with the building of a model in Simulink Matlab. The model is later implemented in an FPGA
from Xilinx using VHDL source code. They implemented a model of a frequency estimator often
used in digital radar receivers in Matlab using XSG. The authors in [60] present a methodology
for implementing real-time DSP applications on a reconfigurable logic platform using XSG for
Matlab. As an example for the demonstration of the design methodology they implement a
function that measures the time delay between two sine waves. This function has applications
in a radar system.
The authors in [61] have developed a rapid-prototyping FPGA-based platform for wireless transmission using the Matlab and System Generator tools. Their goal is to build a versatile educational platform for wireless transmission based on FPGA design. The FPGA-based platform design flow in System Generator consists of a basic four-step methodology: system design, verification and simulation, implementation, and configuration.
2.4.2. Superior performance offered by XSG
A promising and efficient platform such as FPGA can be fully exploited with a powerful tool.
Current research has demonstrated the performance gains offered by System Generator, over
the other design tools, such as classical HDLs. Ref. [62] reports an 80% reduction in project development time and a 4-to-1 reduction in overall project time, including hardware integration and lab testing, using XSG. In a project involving the creation of a fully functional SDR waveform, the traditional design flow of hand-coding in VHDL took an experienced designer 645 hours, while a less experienced engineer completed the same task in fewer than 46 hours using Simulink and Xilinx System Generator. An efficient GUI for design entry and the facility of code reuse, together with automatic configuration of design blocks for varying bit widths and numbers of iterations, considerably reduce the design time and the engineer's overhead.
It has been argued [56] that the decrease in time for testing and verification alone is worth the
migration to System Generator. A 2X to 4X productivity improvement is likely to be achieved
using System Generator over conventional HDL language development methods due to its
design environment and simulation speed. In literature, comparison has been performed
between several design options. In [38], the performance and efficiency of Handel-C language
on image processing algorithms is compared at simulation level with another C-based system
level language called SystemC, and at the synthesis level with the industry-standard Hardware Description Language (HDL), Verilog.
CHAPTER 3: SYSTEM OVERVIEW
We present an algorithmic implementation of video streaming for preferential adaptation of CEZW+ (Color Embedded Zerotree Wavelet) compressed videos over wireless, in the face of variable network bandwidth and channel conditions. The video lecture is segmented into its component blocks. The compression scheme for these segments consists of CEZW coding of the video bits combined with motion compensation and prediction; this combined scheme removes the redundancy present in the video. Compressed bits are then sent to the Dynamic Decision Maker (DDM), which consists of a QoS (bandwidth, transmission power, delay, data rate and reliability) provisioning module. Bandwidth is estimated and fed to the DDM, where segments with higher relevance are allocated more bandwidth, and vice versa, based on their relative importance and motion.
Fig. 9. System Overview (video segmentation block; video component classification into teacher, blackboard and background motion sequences; QoS provisioning with control feedback information and CSI; (2-2) x 3 MIMO transmission system; implementation on FPGA)
The DDM also allocates power to sub-channels according to the fluctuating BW, and determines the total power constraint to be allocated for minimal cost. CSI from the receiver is fed back to the transmitter. The DDM adjusts the system's operation according to the BW and CSI. Bits are then passed to the MIMO system, where they are OFDM modulated; the resulting symbols are Alamouti STBC coded, transmitted through a (2-2) x 3 MIMO antenna system, received, and ML decoded at the receiver. The decoded bits are demodulated and checked for errors. The resulting bitstream is decompressed and the reconstructed video is presented to the user. Alamouti STBC is combined with spatial multiplexing (SM); SM is achieved by our novel method of transmitting segments of the segmented video through different sets of the transmitting antennas.
3.1 Video classification and Segmentation:
A typical classroom lecture has Teacher Writing On Board (TWOB) sessions (Fig. 1). These sessions show a teacher (in complete view) writing on the blackboard and explaining points to the class. Other types of frames constitute a very small portion of the video, have low motion and can be coded directly. For the sake of brevity we limit our discussion to TWOB sessions only.
Since video classification is the first and most common step employed in parallel data transmission systems, we exploit the characteristics of TWOB frames, which possess some typical traits that distinguish them from other frames: the complete board is visible, motion is slow, some students may be visible in the view, and a significant part of the tutor's body may be visible in the frame (see Fig. 1). The height of the teacher in these frames generally does not vary much, so they can be easily identified. They are classified into three visual objects: background, blackboard and tutor. In this work, we have modeled the background for a static camera.
Many such schemes have been proposed [13, 17]. In this system, Sobel edge detection based blackboard extraction is used for the horizontal and vertical BB edges, followed by blurring and intra-pixel filling to determine the BB. For background prediction, we choose a set of frames that have significant tutor motion; the teacher region is removed for extraction of the BG. Finally, the teacher region is extracted by comparing the present frame with the modeled background and blackboard.
3.2. Pooled Video Compression
A combined video compression scheme, including a) an enhanced and more efficient algorithm for conventional PIB (MPEG-1) packaging and b) CEZW coding (hence the name CEZW+), is used for video compression and removal of redundancy in the frame bits of each segment (teacher, BB or BG). Slow-motion videos consist of macroblocks with negligible motion vectors across consecutive frames. Our novel method motion-compensates individual segments to yield PIB frames with just one I-frame in a GOP of size n (say 24). This is achieved as follows. The change in motion vector is thresholded at a value θ, determined by an MSE cutoff function δ computed for a particular GOP as:

δ = max{ MSE(F_I,1 , F_I,GOPsize/m), MSE(F_I,GOPsize/m , F_I,2·GOPsize/m), ..., MSE(F_I,(m-1)·GOPsize/m , F_I,GOPsize) }

θ = k · δ
Fig. 10. Consecutive differences of frames 1 to 10 in the relatively fast and slow motion "Sparsh Lecture" video: changes in MV with frame transition; significant change at frame 8
where MSE(F_I,GOPsize/m , F_I,2·GOPsize/m) denotes the motion estimation error between frames situated every 1/m-th of the GOP size apart. m is known as the leap factor and represents the distance between two non-redundant frames. F_I,GOPsize/m represents the frame contents as if this frame were an intra frame at 1/m of the GOP size. θ is made proportional to δ through the constant k, which enables flexibility in the choice of the motion-vector threshold desired by the coder; the leap can be increased if a higher δ can be afforded with the same perceptual quality. Once θ is determined, there is no need to code or construct any P or B frames whose motion-vector value is below θ; estimation at the receiver is done by reusing the same I-frame for reconstruction, without affecting perceptual quality. This removes a large amount of redundancy present in the original PIB frame packaging.
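The thresholding scheme can be sketched as follows. This is a toy illustration, not the report's implementation: frames are modelled as flat pixel lists, and k and the sample GOP are illustrative values.

```python
# Sketch of the redundant-frame test: delta is the largest MSE between
# frames spaced GOPsize/m apart within one GOP, theta = k * delta, and any
# frame whose difference from the I-frame stays below theta is not coded
# (the decoder simply reuses the I-frame).

def mse(f1, f2):
    return sum((a - b) ** 2 for a, b in zip(f1, f2)) / len(f1)

def motion_threshold(frames, gop_size, m, k):
    step = gop_size // m
    samples = frames[0:gop_size + 1:step]
    delta = max(mse(a, b) for a, b in zip(samples, samples[1:]))
    return k * delta

def redundant(frame, intra_frame, theta):
    return mse(frame, intra_frame) < theta

# toy GOP: an almost-static sequence with one clear change near the end
frames = [[10, 10, 10]] * 6 + [[10, 40, 10]] * 3
theta = motion_threshold(frames, gop_size=8, m=4, k=0.5)
flags = [redundant(f, frames[0], theta) for f in frames]
print(flags)  # first six frames reusable, last three must be coded
```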
The resultant frames are then further coded using the CEZW scheme [13]. The compressed bit stream consists of the initial threshold T, followed by the symbols resulting from the dominant and subordinate passes, which are entropy coded using an arithmetic coder. A scaling factor decides the ratio of the bits per pixel available to the Y component to the bits per pixel available to the U and V components. The input segment frame is read as R, G and B components, converted to the Y, U, V colour space and down-sampled.
Fig 11. CEZW+ compression module steps
3.3 MIMO System Architecture
The MIMO system consists of multiple input and output antennas. Input data, in the form of compressed video bits, are suitably coded and then transmitted with a special hybrid scheme of spatial multiplexing and space-time block coding. These steps are described as follows:
Fig. 12. Architecture of the transmitter testbed (video input; CEZW+ compression: advanced PIB frame packaging, 2-D wavelet decomposition, CEZW compression coding; component classification into teacher and blackboard; Dynamic Decision Maker (DDM); hybrid Alamouti/SM MIMO-OFDM transmission structure; antenna sets to channel; synthesized on FPGA)
3.3.1. FEC and OFDM coding for overcoming ISI: FEC is used for avoiding retransmissions and for bit correction. A major obstacle in streaming is ISI (inter-symbol interference due to delays). Normal video transmission can cope with delays through buffering and other algorithms; however, such methods cannot be employed in real-time video streaming, where any delay hinders video display altogether and is thus intolerable. OFDM guarantees that ISI is overcome through the formation of orthogonal sub-channels. We use a 1024-point FFT for coding the compressed bitstream blocks. These blocks are appended with a cyclic prefix (CP), which ensures further protection from delay and ISI.
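The IFFT-plus-cyclic-prefix step can be sketched as below. This is a minimal illustration, not the testbed code: the report uses a 1024-point FFT, while N = 8 here keeps the example small, and the symbol values are assumptions.

```python
import cmath

# OFDM modulation sketch: sub-carrier symbols are IDFT'd into a time-domain
# block, and the last cp samples are prepended as the cyclic prefix. The
# receiver strips the CP and applies the DFT to recover the symbols.

def idft(symbols):
    n = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * k * t / n)
                for k, s in enumerate(symbols)) / n for t in range(n)]

def dft(samples):
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)) for k in range(n)]

def add_cyclic_prefix(block, cp):
    return block[-cp:] + block

symbols = [1, -1, 1, 1, -1, 1, -1, -1]    # BPSK-mapped bits on 8 tones
tx = add_cyclic_prefix(idft(symbols), cp=2)
rx = dft(tx[2:])                          # receiver strips the CP, then FFTs
print([round(abs(s - r), 9) for s, r in zip(symbols, rx)])  # all ~0
```

The CP makes the linear channel convolution look circular over the FFT window, which is what lets delayed copies of the block land inside the prefix instead of smearing into the next symbol.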
3.3.2. Spatial Multiplexing: Our structure achieves spatial multiplexing by transmitting different segments of the video through different antenna sets, as shown in the figure. Note that segment C is the background. This segment changes negligibly over a whole educational video, as determined by the above coding scheme, and is thus transmitted only when a significant change has occurred due to unpredicted movement in the background while the teacher is instructing. In such a case, the background bits are accommodated with the other two segments, although in different bandwidth regions; thus, a separate transmit antenna set for segment C is not constructed.
3.3.3. STBC and Transmitter system: For STBC, transmit and receive diversity is achieved with a (2-2) x 3 MIMO structure using Alamouti coding and an ML decoding scheme. There are two sets of antennas at the transmitter: set A transmits segment A (teacher), and set B transmits segment B (blackboard). Each set consists of two antennas for Alamouti coding of each segment separately.
Fig 13. Coding schematic for Simple Alamouti STBC.
These are then transmitted. Reception has a complex yet efficient hybrid structure. At the receiver, antennas A and B form a set for receiving segment A, while antennas B and C form the set for receiving segment B. Thus the original 2x2 Alamouti scheme is still employed, but within a hybrid, robust structure. Such a hybrid structure cannot be used at the transmitter due to the space-time coding constraint of Alamouti, which does not apply at the receiver. Moreover, the downlink received signals for the different segments are easily separated through filters, as different segments are transmitted in different BW regions. Fig. 14 summarizes the transmitter-receiver system.
Fig.14. Receiver system Antenna Selection
3.3.4. Channel and CSI: In this system, the primary MIMO channel model under consideration is a random, frequency non-selective, Rayleigh fading channel. For a MIMO system with MT transmit antennas and MR receive antennas, the system equation is Y = HX + N, where Y is the MR x 1 received signal vector, X is the MT x 1 transmitted signal vector, H is the MR x MT channel matrix, and N is the MR x 1 Rayleigh channel noise vector. We further assume that the channel state information (CSI) estimated at the receiver is available at the transmitter through feedback. This is used for extracting noise-to-carrier ratios, for bandwidth estimation, and for adjusting the total signal power, as discussed in the following sections.
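A toy realization of the system equation Y = HX + N can make the dimensions concrete. This is a sketch under stated assumptions: MT = MR = 2, and channel entries drawn as circularly symmetric complex Gaussians so that their magnitudes are Rayleigh distributed.

```python
import random

# Toy realization of Y = HX + N for a 2x2 Rayleigh-fading MIMO channel.

random.seed(1)

def cgauss(sigma=1.0):
    """Complex Gaussian sample; its magnitude is Rayleigh distributed."""
    return complex(random.gauss(0, sigma / 2 ** 0.5),
                   random.gauss(0, sigma / 2 ** 0.5))

MT, MR = 2, 2
H = [[cgauss() for _ in range(MT)] for _ in range(MR)]   # MR x MT channel
X = [1 + 0j, -1 + 0j]                                    # MT x 1 transmit vector
N = [cgauss(0.1) for _ in range(MR)]                     # MR x 1 noise vector

Y = [sum(H[r][t] * X[t] for t in range(MT)) + N[r] for r in range(MR)]
print(len(Y))  # MR received samples, one per receive antenna
```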
3.4. QoS Guaranteeing
3.4.1. Fluctuating Bandwidth: With changes in bandwidth, and a constraint on the amount of data transmitted, important data needs to be streamed even at lower bandwidth, while relatively little BW is allocated to data of lesser importance (changes in BB contents are given higher importance than teacher movement, and vice versa when teacher motion is present rather than changes in BB contents). This is achieved through bandwidth distribution based on the PSNR of the video frames, which is available from the coding block. Let the total available BW at time t be W_t^T, and let W_t^TCR and W_t^BB be the BW allocated to the teacher and blackboard. Let PSNR_Ft^TCR and PSNR_Ft^BB be the PSNR of the teacher and blackboard regions of frame Ft. We also assume the BW to be constant over a frame, which is quite practical. BW is then allocated to the segments according to the following metric:
W_t^α = W_t^T · PSNR_Ft^α / ( PSNR_Ft^TCR + PSNR_Ft^BB + ε · PSNR_Ft^BG )

where α is TCR, BB or BG (the BG term carrying the weight ε), and ε = 1 if the MSE of the BG frames exceeds the threshold δ, and 0 otherwise.
Here, MSE cannot serve as the metric as in [13], because we have used advanced PIB coding with redundant motion-vector removal, which inhibits the use of an MSE metric. Giving higher preference to coded frames with greater PSNR ensures maximum information streaming. Also, images with PSNR below the threshold need not be communicated, as they can be replaced by earlier frames without any perceptual loss.
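The PSNR-proportional split can be sketched as follows. This is an illustrative sketch, not the system code: the function name, the PSNR values in dB and the use of a dict are all assumptions, with eps gating the background's share as described in this section.

```python
# Sketch of PSNR-proportional bandwidth allocation across the three
# segments; eps = 1 only when the background has changed significantly,
# otherwise the background's share is withheld entirely.

def allocate_bw(total_bw, psnr, eps):
    denom = psnr["TCR"] + psnr["BB"] + eps * psnr["BG"]
    weights = {"TCR": psnr["TCR"], "BB": psnr["BB"], "BG": eps * psnr["BG"]}
    return {seg: total_bw * w / denom for seg, w in weights.items()}

psnr = {"TCR": 38.0, "BB": 42.0, "BG": 30.0}
alloc = allocate_bw(total_bw=1.0e6, psnr=psnr, eps=0)   # static background
print(alloc["BG"])                  # 0.0 -> no bandwidth spent on background
print(alloc["TCR"] + alloc["BB"])   # 1000000.0 -> rest shared by PSNR
```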
3.4.2. Optimal Power Allocation (OPA): Noise in the channels increases as BW increases, so the total power must conform to the increased power requirement; we can no longer depend on a constant total power constraint for all the different transmit streams. Lower power at lower BW means lower consumption, saving power and reducing cost, while higher BW needs a higher power allocation. We therefore develop an OPA scheme that determines the total transmit power based on CSI and BW estimation. Let the reference power requirement at time t and BW W_t^R be P_t^R (all powers in dBm). Let the changed BW at time t+t' be W_(t+t')^C, and the total power allocated for transmission be P_(t+t')^A; an increase or decrease in power is then determined dynamically through the following equation:
P_(t+t')^A = P_t^R + P_t^R · [ ( (1/n) Σ_(i=1..n) N_i^A / C_i^A )_(t+t') / ( (1/n) Σ_(i=1..n) N_i^R / C_i^R )_t ] · [ log10 W_(t+t')^C − log10 W_t^R ] / log10 W_t^R

where ( (1/n) Σ_(i=1..n) N_i^A / C_i^A )_(t+t') is the mean noise-to-carrier ratio, determined through CSI, for the new power allocation over the n sub-channels at time t+t'; the reference term at time t is defined similarly. The factor [log10 W_(t+t')^C − log10 W_t^R] / log10 W_t^R is included to account for the fact that a 10 kHz fluctuation at a 100 kHz reference is far less significant than a 10 MHz fluctuation at a 100 MHz reference; this factor ensures that the above formula does not yield an absurd value of the new power in such cases.
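Under our reading of this update rule, it can be sketched as below. The function name, sample N/C values and bandwidths are illustrative assumptions; the combine parameter covers both the averaged N/C used here and the maximum-N/C refinement discussed next in this section.

```python
import math

# OPA update sketch: the reference power is scaled by the ratio of
# noise-to-carrier conditions and by the log-scaled bandwidth fluctuation.

def opa_power(p_ref_dbm, nc_new, nc_ref, w_new, w_ref, combine=max):
    """combine=max follows the report's refinement; pass a mean function
    to use the averaged N/C instead."""
    nc_ratio = combine(nc_new) / combine(nc_ref)
    bw_factor = (math.log10(w_new) - math.log10(w_ref)) / math.log10(w_ref)
    return p_ref_dbm + p_ref_dbm * nc_ratio * bw_factor

nc_ref = [0.02, 0.03, 0.025, 0.02]   # per-sub-channel N/C from CSI at time t
nc_new = [0.04, 0.03, 0.05, 0.045]   # conditions after the BW change
p = opa_power(20.0, nc_new, nc_ref, w_new=2.0e5, w_ref=1.0e5)
print(round(p, 3))  # > 20: wider bandwidth and worse N/C demand more power
```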
Fig 15. N/C increase with increase in BW
The N/C may be high for one sub-channel while low in others; in such cases, averaging yields a far-off estimate of the actual channel conditions. Thus, the mean is replaced by the maximum value of N/C over the sub-channels. In our system, as there are two sets with two transmit antennas each and two receive antennas per set, n = 8. Fig. 16 shows the wastage of power at lower BW and the extra requirement at higher BW; due to the increase in N/C at higher BW, additional power is required for streaming.
Fig. 16-17. Constant power allocation and the resultant bottlenecks; integration of the power allocation module
3.4.3. Data Rate Increase: We achieve a high data rate due to the employment of spatial multiplexing in the system, over conventional SISO, MISO or non-SM MIMO channels. Parallel
transmission ensures a high data rate. The data rate also increases because redundant PB frames are removed during compression, so more video data can be transmitted in much less time; this is shown in the experimentation. Further, owing to the inherent MIMO property, the data rate increase is captured by the conventional Shannon capacity equations. Here H denotes the channel matrix (the transfer function of the channel), with the received data given by r_t = H s_t + v_t, where r is the received signal vector, s is the transmitted signal vector, and v is the noise vector.
The capacity equations are:
C = log2(1 + SNR × |H|²) for a SISO channel
C = log2(1 + SNR × HH*) for a MISO/SIMO channel
C = log2 det(I_M + (SNR/N) × HH*) for an M×N MIMO channel
where H = abs(H); H* = conj(trans(abs(H))); I_M is the identity matrix; M is the number of transmit antennas; and N is the number of receive antennas. However, even a fraction of these capacities is hardly reached in practice. We therefore use the water-filling (WF) algorithm together with the OPA scheme discussed above to ensure maximum throughput, as the WF algorithm iterates toward the Shannon limit. This is demonstrated experimentally.
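The water-filling step can be sketched in Python (illustrative names; our actual implementation is in MATLAB):

```python
import math

def waterfill(gains, total_power):
    """Water-filling over n parallel sub-channels: choose powers p_i with
    sum(p_i) = total_power to maximize sum log2(1 + p_i * g_i), where
    g_i is the linear gain (C/N per unit power) of sub-channel i."""
    # Sort sub-channels by inverse gain; drop the weakest ones until the
    # water level mu leaves every kept channel with non-negative power.
    inv = sorted((1.0 / g, i) for i, g in enumerate(gains))
    k = len(inv)
    while k > 0:
        mu = (total_power + sum(v for v, _ in inv[:k])) / k  # water level
        if mu - inv[k - 1][0] >= 0.0:
            break
        k -= 1
    alloc = [0.0] * len(gains)
    for v, i in inv[:k]:
        alloc[i] = mu - v
    return alloc

def throughput(gains, alloc):
    """Resulting rate in bits/s/Hz, the quantity WF pushes toward the
    Shannon limit."""
    return sum(math.log2(1.0 + p * g) for g, p in zip(gains, alloc))

# A deeply faded sub-channel (gain 0.1) receives no power at a low budget.
p = waterfill([1.0, 0.1], 1.0)
```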
3.4.4. Reliability: The Alamouti scheme decreases the BER of the system. We additionally use channel FEC coding to improve the BER and thus the perceptual quality. Moreover, as with the BW allocation, we give more power to the more important segments depending on their PSNR, delivering more information to the receiver.
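As an illustrative sketch of such a split (the proportional rule and names are our assumption; the text above only states that more important segments receive more power):

```python
def segment_power_split(psnr_db, total_power_mw):
    """Split the transmit power budget across the video segments in
    proportion to their PSNR-based importance weights, mirroring the
    BW allocation (hypothetical proportional rule)."""
    total = sum(psnr_db)
    return [total_power_mw * w / total for w in psnr_db]

# Teacher, blackboard, and background segments:
shares = segment_power_split([38.0, 30.0, 22.0], 100.0)
```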
3.5. XSG implementation of MIMO System on FPGA
Figure 18 shows the design of the system for the Alamouti MIMO scheme. The unique advantages of the design are its graphical modeling, ease of implementation, clear flow diagram, suitable interfacing between modules implementable in hardware and software, reusability, modularity, and ease of development. The system view can be made more modular by integrating the separate blocks into fewer subsystems.
Here is a brief description of the system block diagram.
Fig 18. FPGA MIMO System Generator Implementation of One segment of Hybrid Alamouti
A. The block on the transmitter side carries out BPSK modulation and feeds the two transmit antennas. The input signal is randomly generated and is ultimately also used to verify errors in the transmission. This completes the transmitter side.
B. The channel is simulated in the design itself. It is modeled as a flat-fading Rayleigh channel with random noise; the H matrix models the channel characteristic, which affects the signal in a particular way. The noise is additive in nature. To balance the delays introduced by some of the blocks, intentional delay is introduced in the other paths.
C. The receiver implements Maximum Likelihood decoding, which is used to finally take the decision about the transmitted symbol. This is followed by BPSK demodulation, so that the symbols are converted back to the original signal format. A comparison of the received signal with the original signal shows the errors in transmission.
D. Performance of the system is assessed by simulating the transmission of a large number of bits. The cumulative error, determined by accumulating individual errors, gives the Bit Error Rate (BER). This model can be simulated in software for accuracy.
E. The ‘System Generator’ token on the left is finally used to convert the design into a fixed-point model implementable on hardware. XSG generates complete files that can be used directly in the Xilinx ISE software for creating the final design.
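The BER measurement described in item D can be mimicked in software. The sketch below simulates the simpler 2×1 Alamouti case with BPSK over flat Rayleigh fading (illustrative parameters; the actual XSG model is the 2×2 fixed-point design):

```python
import math, random

def alamouti_ber(n_pairs=5000, snr_db=10.0, seed=1):
    """Monte-Carlo BER of Alamouti 2x1 BPSK over flat Rayleigh fading
    with ML combining and threshold decision."""
    rng = random.Random(seed)
    sigma = math.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)  # per-dimension noise
    errors = 0
    for _ in range(n_pairs):
        s0 = rng.choice((-1.0, 1.0))
        s1 = rng.choice((-1.0, 1.0))
        # Rayleigh channel taps, constant over the two symbol periods
        h0 = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        h1 = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        n0 = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        n1 = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        # Period 1 transmits (s0, s1); period 2 transmits (-s1*, s0*)
        r0 = h0 * s0 + h1 * s1 + n0
        r1 = -h0 * s1 + h1 * s0 + n1   # conjugates are trivial for real BPSK
        # Alamouti ML combining followed by a sign decision
        d0 = h0.conjugate() * r0 + h1 * r1.conjugate()
        d1 = h1.conjugate() * r0 - h0 * r1.conjugate()
        errors += (d0.real > 0) != (s0 > 0)
        errors += (d1.real > 0) != (s1 > 0)
    return errors / (2.0 * n_pairs)
```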
The models of any circuit or system developed in XSG can easily be simulated in Matlab, which is one of the strongest arguments for using the Xilinx Blockset and XSG. System Generator (XSG) provides bit-true and cycle-accurate simulation for DSP, so the user can validate the design before implementing it on hardware. Because it is generic to any Xilinx
FPGA device, by simply changing one single parameter the user can compare the results of say,
a Virtex-II implementation to that of a Virtex-4 device. It also provides many features such as
System Resource Estimation to take full advantage of the FPGA resources, Hardware Co-
Simulation and accelerated simulation through hardware in the loop co-simulation etc. Along
with the high level abstraction, the tool also provides access to underlying FPGA resources
through low-level abstractions, allowing the construction of highly efficient FPGA designs.
Figure 19 shows the hardware optimization made possible by choosing a proper rolling factor in any loop. For a nested loop, the option is also provided to optimize both loops separately by choosing loop rolling factors individually for the row and column loops.
Figure 19 also shows the design flow in AccelDSP, from Matlab code to the generation of a System Generator block that can be added as a custom, user-defined reusable block. For the generation of such a block, all the previous steps must have completed without any error. Note that AccelDSP executes both the original Matlab design (floating-point) and the generated hardware design (fixed-point) so that the user can compare the two.
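The float-vs-fixed comparison that AccelDSP automates can be illustrated with a hand-rolled quantizer (the word length and fraction bits below are illustrative choices, not the tool's defaults):

```python
def to_fixed(x, frac_bits=14, word_bits=16):
    """Quantize x to a signed fixed-point grid (word_bits total,
    frac_bits fractional), saturating at the representable range --
    the kind of float-vs-fixed comparison a fixed-point report covers."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

# Quantization error is bounded by half an LSB inside the range,
# while out-of-range values saturate.
err = max(abs(v - to_fixed(v)) for v in (0.1, -0.73, 1.2345))
```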
Fig 19. Optimization steps for implementation
3.6 FPGA Hardware Design
We have implemented our work on the FPGA Spartan 3E xc3s500e-5fg320 platform. The design takes into consideration practical limitations of the hardware, namely the number of input/output pins, interfacing, and clock issues. A major characteristic of our design is that all the logic in the FPGA can be rewired, or reconfigured, with a different design. This allows a large variety of logic designs (dependent on the device's resources), which can be interchanged for a new design as soon as the device is reprogrammed. The final design on the FPGA thus provides flexibility to reprogram, and reusability. An informal discussion with several students doing their projects on FPGAs or reconfigurable hardware confirmed the effectiveness of our approach (using XSG) for producing the final FPGA design in a short time and with little debugging effort.
The steps taken, the optimization techniques employed, the results, and the limitations are shown in the next chapter. Many advantages of FPGAs make them a preferred implementation choice in the DIP realm. Iterative multi-pass processing of data sets, such as the four stages of the Canny edge detector that require multiple passes over the image, must be performed sequentially on a general-purpose computer but can be fused into one pass on an FPGA, whose structure exploits both spatial and temporal parallelism. An FPGA can process multiple image windows in parallel, and multiple operations within one window in parallel as well. By employing optimization techniques such as loop fusion and loop unrolling, efficient usage of FPGA resources and implementation speed-ups are possible, as many redundant operations are avoided. FPGAs are capable of parallel I/O, which allows them to read (from memory), process, and write (to memory) simultaneously. Many operations, such as convolutions and square roots, can be executed much faster using pipelining and parallelism. The high computational density of FPGAs, together with low development costs, allows even the lowest-volume consumer market to bear the development costs of FPGAs.
CHAPTER 4: IMPLEMENTATION
The segmentation and pooled compression modules and the MIMO-OFDM system were implemented and tested in MATLAB. QoS guaranteeing was also done in MATLAB, prior to transmission. We then implemented the whole MIMO system on FPGA.
4.1 Steps that led to the final design
The following steps were involved:
Step 1: Development of the segmentation module.
Step 2: Development of the CEZW+ coder and decoder.
Step 3: OFDM MIMO system implementation on MATLAB
Step 4: Incorporation of QoS guaranteeing modules.
Step 5: Implementation of MIMO system on FPGA:
i. A pure Simulink model-based design of the functionality and simulation
ii. Using AccelDSP to generate System Generator blocks
iii. Using the blocks generated by AccelDSP and with the basic blocks in Xilinx library, design
of system by XSG in Simulink and simulation
iv. Generation of hardware code on VHDL, optimization, debugging, simulation etc in ISE
and generating programming file for a particular FPGA
v. Downloading to FPGA after configuration, testing, and demonstrating the functionality
Fig 20. Steps 1-2: Development of the segmentation and CEZW+ module
Fig 21. Steps 3-4: Integration of OFDM-MIMO system and QoS Guaranteeing
Fig 22. Step 5: Development and Integration of FPGA Network Streaming Module
4.2 Code Development Model
The code development was done using Modular approach. The project was divided into
different modules and each module was independently developed. Figure 22 shows the
different modules.
4.2.1 Segmentation Module
The segmentation module was developed and tested in Matlab. It segments each video frame into three visual objects. The GUI developed for the implementation is shown in fig. 18. Several techniques, such as an edge-detection-based approach, a Kalman-filtering approach for motion prediction, and a contour-mapping-based approach for object segmentation, were applied, and the optimal results were obtained.
4.2.2 Pooled Compression Module
The compression module was first implemented in Matlab. Block and image processing was done using the built-in Matlab Image Processing Toolbox functions. Advanced PIB frame packaging is performed with MATLAB and the MV threshold. The packet format is specially built to carry the information regarding PIB frames: special bits specify the number of times a particular frame has to be repeated in order to achieve successful PIB decoding and the best perceptual quality. The packet format is as shown in the figure:
Fig 23. MAC layer PDU format for transmission of bits
For implementing CEZW scheme, we employ open source Matlab code of EZW compression
scheme. However, the EZW scheme is limited to compression of grayscale images only. Thus,
we break the frames into their YUV components and run the code for separate compression of
each color space. The coding scheme is shown in the figure below:
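The per-pixel split can be sketched as follows (BT.601 coefficients assumed; the report does not state which conversion matrix was used):

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV so the luma (Y) and two chroma
    planes can each be fed to the grayscale-only EZW coder."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

# A gray pixel carries no chroma: Y equals the gray level, U = V = 0.
y, u, v = rgb_to_yuv(0.5, 0.5, 0.5)
```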
4.2.3 Network Streaming Scheme – Implementation of MIMO on MATLAB
For the transmitter, we develop the MATLAB code for the hybrid Alamouti scheme, with the formal mathematical analysis required by the Alamouti coder. The antennas had to be simulated owing to the lack of proper resources and the absence of any wireless network in the department; thus, each path from a transmit antenna to a receive antenna is separately modeled as a Rayleigh fading channel. The data is then QAM modulated and further OFDM modulated with a cyclic prefix (CP) to ensure removal of ISI.
The functions used are:
channel = rayleighchan(ST, DS, DMP, PG) constructs a frequency-selective ("multiple path") fading channel object that models each discrete path as an independent Rayleigh fading process. ST is the sampling time of the channel, kept equal to the bit duration; DS is the Doppler shift, kept negligible for testing since we stream to static users; DMP is a vector of multipath delays, each specified in seconds; and PG is a vector of average path gains, each specified in dB.
FFT and IFFT functions are used for the OFDM implementation, with an FFT size of 1024, which is thus also the block length of the transmission. The cyclic prefix is chosen to be of length ceiling(0.4 times the block size).
CSI is obtained through feedback from the receiver, with random modeling of Rayleigh fading and additive noise in the channel. Moreover, the CSI is set to provide a dynamic carrier-to-noise ratio and channel modeling in the form of changes in the fading of the sub-channels.
Self-designed code performs ML decoding of the hybrid Alamouti receive structure, based on the original decoding criterion set by Alamouti. Filters implemented for the downlink transmission separate the different segments, which have an inherent frequency separation due to the employment of OFDM coding. Each segment's PDUs are dealt with by separate code streams, and the outputs are then merged through a commonly shared function that integrates the segments into the whole frame sequence and passes it to the decompression module. Following decompression, frames are de-interleaved to account for the jump factor, and finally reconstructed.
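The OFDM step above (IFFT plus a cyclic prefix of length ceiling(0.4 × block)) can be sketched in Python; the recursive radix-2 FFT is a stand-in for MATLAB's fft/ifft:

```python
import cmath, math

def fft(x):
    """Recursive radix-2 FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    n = len(X)
    return [v.conjugate() / n for v in fft([x.conjugate() for x in X])]

def ofdm_symbol(block):
    """IFFT the modulated block and prepend a cyclic prefix of
    ceil(0.4 * block length), as in our MATLAB chain (FFT size 1024)."""
    time = ifft(block)
    cp = math.ceil(0.4 * len(block))
    return time[-cp:] + time

# Toy 8-point block instead of the full 1024-point FFT:
sym = ofdm_symbol([complex(b) for b in (1, -1, 1, 1, -1, 1, -1, -1)])
```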
4.2.4 Bandwidth Estimation
Most of the bandwidth estimation tools available, as well as those proposed in research papers, are based on the principle of self-induced congestion. Since bandwidth estimation has to be done every 5 seconds, using them is very inefficient, as they may themselves cause network congestion and thereby deteriorate performance. Therefore, we simply use the feedback-based packet transmission technique, which has inherent bandwidth estimation capability: the data is sent to the receiver at the rate at which feedback is received.
4.2.5 MATLAB to FPGA
We employ a two-pronged approach. An intelligent and practical design must balance two aspects. First, one must work on the software: developing the code, design, and block diagram to satisfy the design specifications, and assembling all the required software tools. Second, the choice of proper hardware, the resources required, and the learning time and development effort have to be taken care of. A successful design is one in which the two aspects influence each other and take into consideration the constraints and advantages the other platform offers.
Fig 24. Design approach for migrating from Matlab to FPGA: Matlab code → AccelDSP block design → System Generator design → Simulink design and simulation → downloading to FPGA, with mutual correspondence and influence among the stages.
4.3 Challenges / Failures faced
We faced certain challenges in deciding the jump factors for each individual segment in the presence of background noise, such as the falling of books. In those cases, the background gains more importance in terms of PSNR and thus needs to be compensated in the system. We did this by accommodating the teacher and background bits in the same transmission path and differentiating the bits corresponding to the two components using the specification bits of the MAC layer PDU shown in figure 23.
Another challenge was sudden change in the channel response and noise. Once CSI is obtained, the system has to wait for the transmission time of a whole block. Although this time is relatively small, if the channel changes abruptly during it, a loss in the accuracy and optimality of the DDM allocations occurs. Such a case was encountered while running the system to determine data rates with hybrid Alamouti. The result is shown in the figure below:
Fig. 24. Test case failure due to an abrupt change in channel state immediately after CSI information is obtained. The situation, however, soon rectifies itself as progressive CSI becomes available.
While working practically on the FPGA design, we developed many insights into the actual scenario, limitations, and advantages of these systems. The important challenges we faced were:
1. For any non-trivial design, downloading it onto the FPGA requires enormous resources. The difference between the number of I/O pins required and the number actually available on the hardware becomes a problem and forces us to modify the program.
2. Image processing applications present special problems when designing for FPGAs, because the software tools, such as blocksets developed using AccelDSP, do not accept three- or even two-dimensional data directly. Such data therefore has to be supplied sequentially as one-dimensional data. This precludes the use of many built-in MATLAB functions, since they work with two- or three-dimensional image matrices.
3. The number of blocks and the functions of the blocks in System Generator are also limited today. This has forced us to design many of the blocks ourselves, especially in the image processing field.
4. Not all blocks from Simulink are available in the Xilinx Blockset, and there is no way to generate hardware from models that use the basic Simulink Blockset (apart from the blocks provided by Xilinx). So although Simulink blocks provide graphical block-based design for simulation purposes, the actual implementation possible on hardware is quite different. This creates a big gap between modeling and hardware design and, as such, restricts the usefulness of the approach.
4.4 Code Development features for higher performance
The following code development and optimization techniques were used in the implementation
to enhance performance:
FPGA migration from MATLAB: The transmission and decoding structures eat up most of the available streaming time and thus slow down the system considerably, owing to slow processing on the software platform. The code is therefore mapped to the FPGA using the Xilinx System Generator tool. Initial coding is done in Matlab, and the compressed data is finally passed to the programmed FPGAs through JTAG cables for parallel processing of a large number of bits.
Program Level Optimizations: The use of inline functions and macros has greatly reduced the number of function calls, thus accelerating performance. Loops are resolved into parallel processing blocks in the FPGA implementation, drastically reducing the per-frame computation and processing time.
A powerful and valuable design option we used for efficient hardware generation in AccelDSP is "Unrolling a Loop to Increase Hardware Performance". Figure 25 shows the 'Fixed Point Report' generated by AccelDSP for a design of the MIMO block. In an FPGA, the 'loop unrolling' feature gives the designer a choice to trade off area against performance. A general 'for' or 'while' loop implemented in software, or in hardware with no optimization, must be executed as many times as the iteration count (say N) of the loop, meaning the data is applied to a single data path each time. However, if the loop construct is completely unrolled, the AccelDSP Synthesis Tool builds a different hardware structure in which N data samples are applied simultaneously to N identical parallel data paths. Performance increases N times at the cost of increasing the hardware area N times. If a full unroll consumes too much hardware, the user can opt for a partial unroll, striking a suitable balance between area and performance requirements. The encircled portion in figure 25 refers to the optimization of the statement r = r + 0.5 in AccelDSP, where r is a two-dimensional (108×2) array; the statement adds 0.5 to each element of r. Note that the option is provided to optimize the loop in both directions, which serves as a typical example of optimizing an N-dimensional loop in AccelDSP.
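In software terms, a factor-4 partial unroll of such an element-wise loop looks like this (a Python stand-in; in hardware each replicated statement becomes a parallel data path):

```python
def add_half_unrolled(r):
    """Add 0.5 to every element, with the loop body replicated four
    times per iteration -- the software analogue of a partial unroll
    (4 parallel data paths; area x4, iteration count /4)."""
    out = list(r)
    n = len(r)
    i = 0
    while i + 4 <= n:
        out[i] = r[i] + 0.5          # data path 0
        out[i + 1] = r[i + 1] + 0.5  # data path 1
        out[i + 2] = r[i + 2] + 0.5  # data path 2
        out[i + 3] = r[i + 3] + 0.5  # data path 3
        i += 4
    while i < n:                     # remainder when n is not a multiple of 4
        out[i] = r[i] + 0.5
        i += 1
    return out
```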
Fig 25: Output of AccelDSP “Generate Fixed Point Report”
Apart from the blocks provided by the standard library, we have also made custom blocks
using both MCode blocks (for non-algorithmic code) and AccelDSP (for algorithmic code).
This helped us to extend the utility of the XSG tool. An important attribute of our design
using AccelDSP was that the blocksets generated in AccelDSP for XSG, are reusable and can
be neatly divided into appropriate libraries each containing blocks specific to a certain field
such as (for example) ImageProcessingLibrary , MIMOSystemLibrary etc depending on their
applications. The MIMOSystemLibrary for example may contain all blocks made by us which
are useful in MIMO communication field and so on. Figure 26 shows the library consisting of
custom blocks designed by us. The MySysGenBlocks library in the Simulink Library Browser (encircled) contains blocks used in the design of MIMO systems. Another Simulink file, named MyMCodeBlocks, is the library of blocks made from MCode and gives the user the facility to augment the functionality of XSG. The user's model file, named 'untitled', can use blocks from both of these sets.
Figure 26. Snapshots of user-defined blocks augmenting functionality of XSG
Scalability is defined as the ability to support larger volumes of data and more users in cost-effective increments. In this section, we discuss two types of scalability:
1) Scalability in terms of the number of videos: The CEZW+ coding technique stores the lecture videos in highly compressed form, which enables storage of a large volume of such videos. The videos are encoded once and saved on the server for subsequent streaming to the clients at adaptive bitrates.
2) Scalability in terms of the number of users: To stream a video to a client, feedback-based QoS guaranteeing is performed at the server and a part of the encoded video bitstream is transferred to the client. Decoding is performed at the client side; the server performs no coding or decoding while processing the request. Thus, while serving a request, the server behaves like a normal web server and can handle a large number of simultaneous requests.
4.5 Limitations & Bottlenecks
Lecture videos of arbitrary dimensions cannot be used with the codec, because the nth-level wavelet decomposition requires the dimensions to be multiples of 2^n.
Transmission of coded bits through multiple channels also results in a change of frequency in practical transmission systems, generally called frequency shift. When implementing on a practical wireless transmission system, this can result in some data bits not being received, and thus in increased retransmission through the routers.
A sudden change in channel fading, whose probability is quite low, can defeat the QoS guaranteeing of DDM. This, however, occurs only a few times and is rectified when the next CSI feedback arrives after the block transmission.
The segmentation algorithm may not function properly if there are illumination changes in
the video thus leading to higher bitrates.
The blocks provided by Xilinx do not offer as much flexibility as their counterparts in the Simulink Blockset. Apart from requiring different treatment for these blocks, this also burdens the designer with calculating many design parameters and specifications, such as binary points, for every block output and for the constants used in blocks.
The number of blocks and functions of the blocks are also limited today. For example, the
existing FFT block provided with System Generator can only handle 16-bit input and 16-bit
output.
Although modern FPGAs can be reconfigured quickly, to achieve this dynamically while
continuing to process data is a complex and challenging task.
The size of memory that can be implemented using standard logic cells on an FPGA is
limited, as implementing memory is an inefficient use of FPGA resources.
However, many of the problems of the System Generator can be explained by the fact that it is
a new product not fully developed yet and it is expected that experience of the experts from
industry and universities will help it grow into a more practical system. Moreover, it is updated
continually, giving better performance and more blocks in Xilinx Blockset.
CHAPTER 5: RESULTS
5.1. PIB Frame Packaging Compression: GOPs of size 24 were chosen for the videos, and three different lecture videos were used. CEZW encoding threshold = 20; wavelet types employed: db4 and bior4.4; video dimensions: 240×320. The figure below shows the difference in the MVs of the GOP frames with jumps. The teacher movement from frame 1 jumps to frame 10 on crossing the threshold, with frame 1 as the reference. The blackboard, however, still shows no change significant enough to cross the MV threshold, so the same frame is repeated for consecutive frame reconstruction at the receiver. Table 1 shows the very low compressed bit output achieved by our pooled compression scheme:
Fig 27. Differences in Teacher and Blackboard frames and respective crossing of MV Thresholds.
Table 1. Performance comparison of videos for the lecture video sequence

Average bpp, CEZW compression only: Video 1 = 0.39, Video 2 = 0.52, Video 3 = 0.45

Average bpp with pooled compression:
m    Video 1   Video 2   Video 3
8    0.23      0.30      0.26
7    0.20      0.26      0.22
6    0.17      0.22      0.20
5    0.15      0.20      0.18
4    0.14      0.19      0.16
5.2. Data rate increase: Data rates are measured as the maximum number of bits allocated per Hz for transmission while maintaining the BER, since larger data rates may increase the BER. A better measure is to plot the data rate normalized with respect to the Shannon limits discussed earlier, rather than in bits/Hz; this is a more realistic measure of system performance. The experiment is performed under varying channel conditions and different SNR values of the transmitted signals. The graphs show the increased performance approaching the Shannon limits with WF.
Fig 28. Normalized Data rate increase under different channel conditions.
5.3. Reliability: The compressed video bits undergo channel effects in MIMO transmission. Different segments are allotted different power, similar to the BW allocation, depending on segment PSNR. The figure below shows the improvement in the Teacher segment's perceptual quality for frame no. 94 due to the increase in power allocation.
Fig. 29. Perceptual quality improvement in Teacher segment transmission with SNR = 3 dB, 6 dB, 9 dB, 12 dB; resultant BERs = 0.2938, 0.0328, 0.000562, 2.91e-5.
5.4. Bandwidth allocation: BW allocation was done for a special lecture of 286 frames in which we deliberately introduced noise in the background (falling of books) to verify a suitable resultant allocation. Different amounts of the available BW are allocated to the TCR (magenta), BB (green), and BG (black) segments.
Fig. 30. Bandwidth Allocation. Note the extra noise in BG at Frame number 209
5.5. Power allocation at different BW levels: The graph shows the power variation in the new scheme, as opposed to constant power allocation. 10 MHz is chosen as the reference BW, N/C is calculated from CSI, and the experimental BW is chosen to be 25 MHz. The change in the reconstruction of the frame is clearly visible.
Fig. 31. New Total power allocation and resultant reconstruction quality increase.
Fig. 32. New Total power allocation under different noise distributions.
5.6. BER through FPGA implementation of hybrid Alamouti
Figure 33 shows the output of the System Generator block simulation for the transmission of 500 bits. The two plots correspond to the two receive antennas in the 2×2 scheme. The encircled signal values correspond to transmission errors due to noise and inter-symbol interference. The design nevertheless shows a very small BER of 10/500 (0.02), which supports the superiority of our approach. Clearly, graphical design and output, together with the power of Simulink, is a great boon for the programmer.
Fig 33. System Generator simulation output for the two receive antennas.
Based on our work, we have concluded the following comparison of performance between VHDL and the System Generator.
5.7. Optimal Performance of XSG for FPGA’s
Table 2 describes the significant milestones that our approach of employing XSG achieves compared to the conventional approach of using VHDL platforms for FPGA implementation. Not only are the processing time, code memory requirement, and other aspects improved; XSG also bypasses the learning time required for VHDL coding. It thus proves a convenient tool in the hands of the generations to come.
Table 2. Comparison of VHDL and XSG platforms for FPGA implementation

Learning and design time: 3 months (VHDL) vs 4 weeks (XSG)
Debugging time: larger (VHDL) vs small, thanks to GUI design (XSG)
Code reuse: difficult and likely to create errors (VHDL) vs easy reuse of blocks by integrating them into a library (XSG)
Design overheads: the user must involve himself in many low-level design issues (VHDL) vs automatic generation of the HDL netlist, place-and-route information, etc. (XSG)
Flexibility: modifying the program is very error-prone (VHDL) vs model-based design that makes modification very easy (XSG)
Simulation/test and debugging support: large time required for simulating and creating waveforms (VHDL) vs the power of Simulink simulation and Matlab design (XSG)
Time for simulation: high (VHDL) vs an order of magnitude faster than an HDL simulator (XSG)
Higher-level routines: few (VHDL) vs 90 DSP blocks available, extensible (XSG)
Code lines/blocks for a comparable program: 200 lines of code (VHDL) vs 3-5 blocks (XSG)
Productivity: low (VHDL) vs 2X to 4X over HDLs (XSG)
Figure 34 shows the resource utilization on different target platforms in terms of four parameters common to all the platforms. The comparison clearly shows the greater efficiency of the higher-end platforms: the Spartan 2 platform uses the most resources on all parameters, while the Virtex 4 kit uses the fewest. Note that resource usage depends critically on the nature of the platform (the family specification) and may also vary with the design.
Figure 34. Resource utilization in different target platforms. (Where Spa2= Spartan2 xc2s200-
6fg456, Spa3= Spartan3E xc3s500e -5fg320, Vrtx2Pro = Virtex2Pro xc2vp30 ff6896, Vrtx4=
Virtex4 xc4vfx12 ML402)
CHAPTER 6: CONCLUSIONS AND FUTURE WORK
We ensure guaranteed QoS for low-motion video MIMO streaming through numerous novel techniques with supporting results [63] [64] [65] [66] [67] [68] [69]. We provide a new algorithm that significantly reduces the lecture video size and hence increases performance. Many bottlenecks and their solutions were considered and addressed through MIMO system architectures. Furthermore, our work highlighted the design improvements offered by FPGAs through techniques such as parallel I/O and pipelining, and it clearly demonstrates the optimal performance of System Generator, which makes it an excellent high-level design tool for FPGA design. The decrease in testing and verification time alone is worth the migration to System Generator, as it offers a 2X to 4X productivity improvement over conventional HDL development methods owing to its design environment and simulation speed. Our future work includes implementing such robust systems on FPGA testbeds and further increasing the performance of the system; such hardware testbeds are going to be the next-generation revolution systems. Further work on slides, QA-session videos, etc. is intended.
This work can be extended to improve the accuracy, run time, and robustness of the system by
1. Implementing several basic and complex modules of MIMO OFDM communication systems
in XSG and comparing their performance with those implemented on other design
platforms such as HandelC, SystemC etc
2. Making prototype of advanced MIMO, DSP and Compression systems and validating their
designs in XSG and downloading it on FPGA
3. Critically analyzing the performance of several FPGA platforms such as Spartan and Virtex
family, for complex modules such as Radar Signal detection application, Image convolution
and FFT/ IFFT.
4. Extending the functionality of XSG further by making libraries of XSG blocks using AccelDSP,
so that a large number of Matlab modules can be converted into hardware implementable
blocks.
REFERENCES
[1] J. Gerhard and P. Mayr, “Competing in the e-learning environment - strategies for universities,” Proceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS), January 2002, pp. 3270-3279.
[2] MIT Website, http://ocw.mit.edu/index.html.
[3] Z. Nedic, J. Machotka, A. Nafalski, "Remote laboratories versus virtual and real laboratories", 33rd Annual Frontiers in Education, 5-8 Nov. 2003, Vol.1, Iss. pp T3E-1- T3E-6.
[4] H.H Hann, M.W. Spong, "Remote laboratories for control education", Proceedings of the 39th IEEE Conference on Decision and Control, 12-15 Dec. 2000, Vol.1 pp895 – 900.
[5] B. G. Haskell, A. Dumitras, “ A background modeling method by texture replacement and mapping with application to content-based movie coding”, ICIP (1) 2002 pp 65-68.
[6] S. Murphy, S. Goor and L. Murphy, “Performance comparison of multiplexing techniques for MPEG-4 object based content”, IPCC 2005. 24thIEEE International 7-9 April 2005 pp 505 – 510.
[7] J.Dong, Y.F.Zheng, “Content-based retransmission for 3-D Wavelet video streaming over lossy networks”, IEEE Transactions on Circuits and Systems for Video Technology, Volume 16, Issue 9, September. 2006, pp.1125 – 1133..
[8] J. Lee, and H. Radha, “Interleaved Source Coding (ISC) for Predictive Video Coded Frames over the Internet,” IEEE Proc. ICC, May, 2005, pp 1224-1228.
[9] T. Liu and C. Choudary, “Content-Adaptive Wireless Streaming of Instructional Videos”, Multimedia Tools and Applications, 23(2), pp. 157-171, 2006.
[10] J. O. Limb and C. B. Rubinstein, “Plateau coding of the chrominance component of color picture signals", IEEE Transactions on Communications, vol. COM-22, no. 3, pp. 12{820, June 1974.
[11] Q. Zhang, W. Zhu and Y.Q. Zhang “End-to-End QoS for Video Delivery Over Wireless Internet.”, Invited paper for the proceedings of the IEEE, vol. 93, no. 1, January 2005
[12] D. Gesbert, M. Shafi, D. Shiu, P. J. Smith, and A. Naguib, “From theory to practice: an overview of MIMO space-time coded wireless systems,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 3, 2003, pp. 281-302.
[13] A. Mittal, A. Pande, and P. Verma, “Content-based network resource allocation for mobile engineering laboratory applications,” 6th International Conference on Mobile Learning (mLearn 2007), 16-19 October 2007, Melbourne, Australia.
[14] H. Bölcskei, “MIMO-OFDM wireless systems: basics, perspectives, and challenges,” IEEE Wireless Communications, August 2006, pp. 31-37.
[15] M. F. Sabir, R. W. Heath Jr., and A. C. Bovik, “Unequal power allocation for JPEG transmission over MIMO systems,” Proceedings of the IEEE Asilomar Conference on Signals, Systems, and Computers, pp. 1608-1612, Pacific Grove, CA, USA, Oct. 30 - Nov. 2, 2005.
[16] D. Song and C. W. Chen, “QoS guaranteed scalable video transmission over MIMO systems with time-varying channel capacity,” IEEE International Conference on Multimedia and Expo, 2007, pp. 1215-1218.
[17] M. Onishi, M. Izumi, and K. Fukunaga, “Blackboard segmentation using video image of lecture and its applications,” Proceedings of the 15th International Conference on Pattern Recognition, vol. 4, 3-7 Sept. 2000, pp. 615-618.
[18] A. Plotkin, Reconfigurable Hardware: FPGA Device, 2003.
[19] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays, Springer, Heidelberg, 2004.
[20] U. Vera, A. Meyer, A. Pattichis, and M. Perry, “Discrete wavelet transform FPGA design using MATLAB/Simulink,” Proceedings of SPIE - The International Society for Optical Engineering, vol. 6247, 2006.
[21] T. Vanevenhoven, “Using MATLAB to Create IP for System Generator for DSP,” Xcell Journal, Fourth Quarter 2007.
[22] A. K. Jain, Fundamentals of Digital Image Processing, Prentice Hall Inc.
[23] R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image Processing Using MATLAB, Pearson/Prentice-Hall, 2004.
[24] J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3445-3462, December 1993.
[25] C. Choudary and T. Liu, “Extracting content from instructional videos by statistical modelling and classification,” Pattern Analysis & Applications, 2006.
[26] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela, “V-BLAST: an architecture for realizing very high data rates over the rich-scattering wireless channel,” http://www.bell-labs.com/project/blast/, last accessed 25 April 2008.
[27] J. Du and Y. Geoffrey Li, “D-BLAST OFDM with channel estimation,” EURASIP Journal on Applied Signal Processing, 2005, pp. 605-612.
[28] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE J. Select. Areas Commun., vol. 16, pp. 1451-1458, Oct. 1998.
[29] G. J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multiple antennas,” Bell Labs Technical Journal, vol. 1, no. 2, 1996, pp. 41-59.
[30] R. S. Blum, Y. Geoffrey Li, J. H. Winters, and Q. Yan, “Improved space-time coding for MIMO-OFDM wireless communications,” IEEE Transactions on Communications, vol. 49, no. 11, November 2001.
[31] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block coding for wireless communications: performance results,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 3, March 1999.
[32] J. V. Nichols, “OFDM: Old Technology for New Markets,” Wi-Fi Planet, Nov. 2002. Available: http://www.wi-fiplanet.com/tutorials/article.php/1500641.
[33] B. A. Draper et al., “Accelerated image processing on FPGAs,” IEEE Transactions on Image Processing, vol. 12, no. 12, Dec. 2003, pp. 1543-1551.
[34] A. P. W. Böhm et al., “Mapping a single assignment programming language to reconfigurable systems,” Supercomputing, vol. 21, pp. 117-130, 2002.
[35] http://www.cs.colostate.edu/~cameron/, last accessed 20 April 2008.
[36] B. Draper et al., “Compiling and optimizing image processing algorithms for FPGAs,” Proceedings of the Fifth IEEE International Workshop on Computer Architectures for Machine Perception, 2000, pp. 222-231.
[37] V. Rao Daggu and M. Venkatesan, “Design and implementation of an efficient reconfigurable architecture for image processing algorithms using Handel-C,” Celoxica Inc. research papers, 2004. Available: http://www.celoxica.com/techlib/files/CEL-W040414XQ7-281.pdf.
[38] V. R. Daggu et al., “Implementation and evaluation of image processing algorithms on reconfigurable architecture using C-based hardware descriptive languages,” International Journal of Theoretical and Applied Computer Sciences (IJTACS). Available: www.gbspublisher.com/ijtacs/1002.pdf.
[39] A. Nelson, “Implementation of Image Processing Algorithms on FPGA Hardware,” Master's thesis, Graduate School of Vanderbilt University, 2000.
[40] S. Klupsch et al., “Real time image processing based on reconfigurable hardware acceleration,” Proceedings of the IEEE Workshop on Heterogeneous Reconfigurable Systems on Chip, 2002.
[41] R. W. Hartenstein, J. Becker, R. Kress, H. Reinig, and K. Schmidt, “A reconfigurable machine for applications in image and video compression,” Proc. Conf. Compression Technologies and Standards for Image and Video Compression, Amsterdam, The Netherlands, 1995.
[42] D. Crookes et al., “Design and implementation of a high level programming environment for FPGA-based image processing,” IEE Proceedings on Vision, Image and Signal Processing, vol. 147, no. 4, Aug. 2000, pp. 377-384.
[43] R. Maheshwari, S. S. S. P. Rao, and P. G. Poonacha, “FPGA implementation of median filter,” Tenth International Conference on VLSI Design, June 1997, pp. 523-524.
[44] A. Plotkin, Reconfigurable Hardware: FPGA Device, 2003.
[45] “FPGA-based FIR filter using bit-serial digital signal processing,” Atmel technical paper. Available: www.atmel.com/dyn/resources/prod_documents/DOC0529.PDF.
[46] C. Dick, “The platform FPGA: enabling the software radio,” Software Defined Radio Technical Conference and Product Exposition (SDR), November 2002.
[47] S. Choi, R. Scrofano, V. K. Prasanna, and J. W. Jang, “Energy-efficient signal processing using FPGAs,” Proceedings of the 2003 ACM/SIGDA Eleventh International Symposium on Field Programmable Gate Arrays, pp. 225-234.
[48] R. Scrofano, S. Choi, and V. K. Prasanna, “Energy efficiency of FPGAs and programmable processors for matrix multiplication,” Proceedings of the IEEE International Conference on Field-Programmable Technology (FPT), 16-18 Dec. 2002, pp. 422-425.
[49] A. Telikepalli and E. Fiset, “Platform FPGA design for high-performance DSP,” white paper. Available: http://www.lyrtech.com/DSP-development/technical_lib/form1_wp.php.
[50] R. Duren, J. Stevenson, and M. Thompson, “A comparison of FPGA and DSP development environments and performance for acoustic array processing,” 50th Midwest Symposium on Circuits and Systems (MWSCAS 2007), pp. 1177-1180.
[51] M. Parker, “FPGA versus DSP design and maintenance,” Altera Corporation technical papers. Available: http://www.techonline.com/learning/techpaper/199000540.
[52] Celoxica, Abingdon, Oxfordshire, UK, DK Design Suite. Available: http://www.celoxica.com/methodology/c2rtl.asp.
[53] J. Gerlach and W. Rosenstiel, “System level design using the SystemC modeling platform,” Workshop on System Design Automation, pp. 185-189, Rathen, Germany, Mar. 2000.
[54] Xilinx Inc., Forge, www.xilinx.com/ise/advanced/forge.htm.
[55] P. Bellows and B. Hutchings, “JHDL - an HDL for reconfigurable systems,” Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, pp. 175-184, April 1998.
[56] J. Frigo, T. Braun, J. Arrowood, and M. Gokhale, “Comparison of high-level FPGA design tools for a BPSK signal detection application,” Proceedings, SDR Forum Technical Conference, 2003.
[57] http://www.synopsys.com/C-level.html.
[58] T. Vanevenhoven, “Using MATLAB to Create IP for System Generator for DSP,” Xcell Journal, Fourth Quarter 2007.
[59] P. Fandén, “Evaluation of Xilinx System Generator,” Master's thesis, Linköping University, Department of Science and Technology, 2001.
[60] M. Ownby and W. H. Mahmoud, “A design methodology for implementing DSP with Xilinx System Generator for Matlab,” Proceedings of the 35th Southeastern Symposium on System Theory, 16-18 March 2003, pp. 404-408.
[61] G. E. Martinez-Torres, J. M. Luna-Rivera, and R. E. Balderas-Navarro, “FPGA-based educational platform for wireless transmission using System Generator,” IEEE International Conference on Reconfigurable Computing and FPGAs (ReConFig 2006), Sept. 2006, pp. 1-9.
[62] User Stories, BAE Systems. Available: http://www.mathworks.com/company/user_stories/userstory12386.html.
[63] S. Gupta, S. Mittal, S. Dasgupta, and A. Mittal, “MIMO Systems for Ensuring Multimedia QoS over Scarce Resource Wireless Networks,” ACM International Conference on Advance Computing, India, February 21-22, 2008.
[64] S. Mittal, S. Gupta, and S. Dasgupta, “System Generator: The State-of-Art FPGA Design Tool for DSP Applications,” Third International Innovative Conference on Embedded Systems, Mobile Communication and Computing (ICEMC2 2008), August 11-14, 2008, Global Education Center, Infosys.
[65] S. Mittal, S. Gupta, and S. Dasgupta, “FPGA: An Efficient and Promising Platform for Real-Time Image Processing Applications,” National Conference on Research & Development in Hardware & Systems (CSI-RDHS 2008), June 20-21, 2008, Kolkata, India.
[66] S. Gupta, S. Mittal, and S. Dasgupta, “Guaranteed QoS with MIMO Systems for Scalable Low Motion Video Streaming over Scarce Resource Wireless Channels,” International Conference on Information Processing (ICIP), 2008, IK International Pvt Ltd.
[67] S. Gupta, S. Mittal, and A. Mittal, “EureQA: Overcoming the Digital Divide Through a Multidocument QA System for E-Learning,” The National Conference on Emerging Trends in Information Technology, India, 2008.
[68] S. Mittal, S. Gupta, A. Mittal, and S. Bhatia, “BioinQA: Addressing Bottlenecks of Biomedical Domain through Biomedical Question Answering System,” International Conference on Systemics, Cybernetics and Informatics (ICSCI-2008), India, pp. 98-103.
[69] S. Mittal, S. Gupta, and A. Mittal, “BioinQA Multidocument Question Answering System: Enabling E-Learning for Masses,” Journal of Engineering Students, CAFFET INNOVA Technical Society, India, pp. 28-37.