
Copyright

by

Jacob S. Schneider

2005

Error Correction Logic for Wireless USB

by

Jacob S. Schneider, B.S.

Report

Presented to the Faculty of the Graduate School of

The University of Texas at Austin

in Partial Fulfillment

of the Requirements

for the Degree of

Master of Science in Engineering

The University of Texas at Austin

December 2005


Approved by Supervising Committee:

Dedication

To my wife Anne, who moved to the great city of Austin and always supported me through the trials and tribulations of juggling school and work.


Acknowledgements

The author would like to acknowledge the significant guidance received from Adnan Aziz, Mark McDermott, and Saf Asghar, as well as the contributions of project members Dan Dankert, Sanjeev Gokhale, and Dimitry Patent. Finally, he would like to give special recognition to Intel Corporation for providing the opportunity to pursue this degree.

December 2005


Abstract


Jacob S. Schneider, M.S.E.

The University of Texas at Austin, 2005

Supervisor: Adnan Aziz

This report presents the motivation behind, and the steps taken to create, a wireless device compatible with Universal Serial Bus (USB) 2.0. The device is intended for portable equipment that needs a USB link to a host controller, where it replaces the normal wired transceiver. Integrating a small wireless transceiver with standard USB 2.0 host, hub, and function controllers in lieu of the standard wired connection would help eliminate nests of wires without compromising the usefulness of the broad range of designs that already conform to the USB specifications. Wireless mice and keyboards that connect to USB can already be purchased, but these are all low-speed human interface devices. The proposed transceivers would extend this wireless capability to the full-speed and high-speed USB 2.0 protocols. Devices such as disk drives and digital cameras could then connect wirelessly to a PC while still relying on the robustness of the USB protocol.


Area and power savings were the two main focal points in implementing this transceiver. A unique protocol layer was developed for this application to aid the transmission and reception of the various analog USB states. Both digital and analog clock recovery systems were employed, along with an error correction block to minimize the bit error rate. A simple ROM-based CORDIC sine wave generation scheme was employed for the reference clocks in the local oscillators. In the RF front end, the emphasis was on limiting the number of discrete components needed to transmit and receive. Finally, a combination of MATLAB, HSPICE, and VCS simulations was used to determine and fine-tune the operation of both the digital and analog components.

This report focuses on the top-level architecture and the error correction employed in this design. The error correction reduces the bit error rate caused by the wireless channel and by noise from the various system components. It does require additional circuitry to perform the encoding and decoding, as well as a few other design features to enable the use of desirable clock frequencies. The encoding scheme employed here is a 1/3 convolutional code with Viterbi decoding.


Table of Contents

List of Tables
List of Figures
ERROR CORRECTION LOGIC FOR WIRELESS USB
Chapter 1: Introduction
1.1 Design Space
1.2 Overall Design Problem
1.3 Specific Design Problem
Chapter 2: Top-Level Architecture
2.1 Desired Characteristics and Features
2.2 USB 2.0 Requirements
2.3 Transceiver Requirements
2.4 Clocking and Wireless Requirements
2.5 USB Interface
2.6 Top Level Details
2.7 Transmit and Receive Details
2.8 Clocking Details
Chapter 3: Error Correction
3.1 Introduction
3.2 The Convolutional Coder
3.3 The Viterbi Decoder
Chapter 4: Error Correction Simulation Results
4.1 Matlab Simulation Results
4.2 Verilog Simulation Results
4.2.1 Encoder Testing
4.2.2 Encoder/Decoder Pair Test Results
4.2.3 Decoder Error Correction Test Results
4.2.4 Encoder/Decoder Summary
Chapter 5: Project Integration and Conclusion
5.1 Digital USB Interface and Error Correction Circuitry
5.2 Clock Recovery Mechanism
5.3 Low-Noise Amplifier, Power Amplifier, and Antenna
5.4 Frequency Synthesizer
5.5 Summary
Appendices
A1 Acronym Definitions
A2 Matlab Convolutional Code and Viterbi Decoding Source Code
A3 Verilog Code for the Convolutional Encoder
A4 Verilog Code for the Viterbi Decoder
Works Cited
Vita


List of Tables

Table 1: List of possible states
Table 2: Convolutional Codes for Various Constraint Lengths
Table 3: Simulation Viterbi Decoding Simulation Results
Table 4: Valid Encoding Outputs
Table 5: Encoder Input and Output Examples


List of Figures

Figure 1: System-Level Block Diagram
Figure 2: USB Transceiver Details [5]
Figure 3: Wireless USB Transceiver Top Level Diagram
Figure 4: Encoder Block Diagram
Figure 5: Decoder Block Diagram
Figure 6: Clock Selection Circuitry
Figure 7: Reset Detection Circuitry
Figure 8: Sample Convolutional Coder
Figure 9: Simulink Viterbi Decoding Simulation
Figure 10: Wireless USB Convolutional Coder
Figure 11: Convolutional Coder State Machine
Figure 12: Convolutional Coder Trellis Diagram
Figure 13: Example of Convolutional Encoding
Figure 16: Encoding example
Figure 17: Encoded symbols modulated by QPSK
Figure 18: Transmitted vs. Received Data
Figure 19: Original vs. Decoded symbols
Figure 20: First Packet Encoding
Figure 21: Second Packet Encoding
Figure 22: First Packet Decoding
Figure 23: Second Packet Decoding
Figure 24: Correcting a Single Error
Figure 25: Two Non-Simultaneous Errors
Figure 26: Correcting Two Simultaneous Errors
Figure 27: Two Simultaneous Errors That Are Not Correctable


ERROR CORRECTION LOGIC FOR WIRELESS USB

Chapter 1: Introduction

1.1 DESIGN SPACE

The design space that this project focuses on is the Universal Serial Bus (USB) 2.0 domain. More specifically, it focuses on the implementation of a wireless transmit and receive scheme that adheres to all USB 2.0 protocols as defined in the Universal Serial Bus Specification Revision 2.0 [5]. Additional efforts were made to incorporate some power reduction techniques to broaden the range of products that might use it. Furthermore, efforts were made to simplify the device interface with the host controller so that future changes to the protocol would result in minimal changes to the interface.

1.2 OVERALL DESIGN PROBLEM

As with any design, it is important to step back and identify the problem being solved. The problem was first identified when looking at the computer setup in one of the designers' houses. The spider web of USB cables running between the CPU box and the peripheral devices had expanded beyond control. The need for a solution quickly became apparent: a generic wireless USB device that could be substituted for cables. Preliminary research showed that no generic wireless USB solution existed in the marketplace. Several questions came to light. How would this device need to function? How would it be powered? How could it be made desirable in the marketplace? What wireless transmission scheme would be best suited for it? Answering these questions and more is the basis for this report.

First and foremost, the device needs to adhere to the USB 2.0 protocol [5]. Additionally, it needs to use a limited amount of power so that it can be powered either by the USB power bus (ideally) or by a single AA-sized lithium-ion battery. Any additional power needs would require a bulkier power supply or wires, both of which would decrease its popularity in the marketplace.

This device also requires a transmitter, receiver, antenna, clock recovery mechanism, error correcting scheme, low noise amplifier (LNA), and an algorithm for sine wave generation. This set of components is the minimum required regardless of the transmission scheme used. Once a transmission scheme has been chosen, these components can be designed to best fit the system.

Since this device is wireless, it needs a modulation scheme, and quadrature phase shift keying (QPSK) was chosen for this solution. Why was QPSK chosen over other schemes? QPSK offers a good balance between the number of bits it transmits per symbol period and the modulator complexity required to implement it. Binary phase shift keying (BPSK) provides a large distance between the symbols it transmits and is a very simple modulation scheme, but it can only transmit one bit per symbol period. QPSK, on the other hand, transmits two bits per symbol period with only a slightly more complex modulator. This essentially halves the clock speed needed to transmit a given chunk of data compared with BPSK. Other modulation schemes can transmit more bits per period, but they add modulator complexity. Also, the frequency reduction achieved by transmitting three bits per period instead of two is only 33%, whereas going from one bit to two gave a 50% reduction. The percentage reduction keeps shrinking as more bits are transmitted per period, while the complexity of the modulator increases and the distance between distinct symbols decreases. As the distance between symbols decreases, the probability increases that channel noise causes a symbol other than the one transmitted to be received. Hence, QPSK was chosen because it offers a favorable balance between modulation complexity, the number of bits transmitted per period, and the distance between distinct symbols [2].
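The trade-off described above can be checked numerically. The short Python sketch below is an illustration only, not part of the original design flow. It assumes unit-energy M-PSK constellations (BPSK, QPSK, 8-PSK) carrying a fixed bit rate. For each scheme it compares the symbol rate needed relative to BPSK, the reduction over the previous scheme, and the minimum Euclidean distance between neighboring constellation points.

```python
import math

def mpsk_tradeoff(bits_per_symbol):
    """Return (M, relative symbol rate, minimum distance) for M-PSK.

    The symbol rate is relative to BPSK at the same bit rate; the minimum
    distance assumes unit-energy points evenly spaced on a circle.
    """
    m = 2 ** bits_per_symbol
    relative_rate = 1.0 / bits_per_symbol      # symbols needed per bit
    d_min = 2.0 * math.sin(math.pi / m)        # chord between adjacent points
    return m, relative_rate, d_min

rows = [mpsk_tradeoff(k) for k in (1, 2, 3)]
for (m, rate, d), prev in zip(rows, [None] + rows[:-1]):
    # Fractional symbol-rate reduction compared with the previous scheme
    reduction = 0.0 if prev is None else 1.0 - rate / prev[1]
    print(f"{m}-PSK: symbol rate x{rate:.3f}, reduction {reduction:.0%}, d_min {d:.3f}")
```

Moving from BPSK to QPSK halves the symbol rate, and the minimum distance drops from 2 to about 1.414. Moving on to 8-PSK saves only a further third of the symbol rate, and the distance falls to about 0.765, which is the diminishing return the paragraph describes.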

The design needs an upper-level protocol that adheres to the host controller specifications. Implementing this protocol in a simple digital controller allows the design to be completed with simple digital building blocks, which speeds design and validation. In addition, in a current USB transceiver all of the signals are digital up to the point where they drive the wire. Eliminating the wire and using the USB controller's digital signals directly avoids analog-to-digital converters (ADCs) and digital-to-analog converters (DACs), along with the added complexity they create. Another advantage of the digital controller is that it can easily be updated should future USB protocols require different specifications, providing a quick path to future product iterations.

Each of the above requirements presents a different challenge to the design team. By attacking these problems from many different angles, the team found solutions to each of them that were tailored to best suit the design. Additionally, by providing innovative and unique solutions to these problems, the design could be marketed as an attractive solution in the marketplace.

1.3 SPECIFIC DESIGN PROBLEM

Claude Shannon described the basis of modern communication systems as a system composed of an information source, a transmitter, a channel, a receiver, and a destination [3]. The main requirement of this system is that the symbols at the source and at the destination match. The channel has a maximum transmission rate that depends on the probability of an information symbol passing through it correctly. If the channel is noisy, errors can occur, because noise can cause the probability distributions of two different symbols to overlap at the receiver. The channel can be converted into a correction channel by adding an encoder before the transmitter and a decoder after the receiver. This pushes apart the probabilities of two distinct symbols occurring, lowering the error rate seen by the system.

The device presented in this report will be assailed by noise from many sources:

• noise from the power amplifier on the transmit path and the low noise amplifier on the receive path;

• jitter from the clocking domains;

• phase mismatches in the clock recovery scheme;

• noise from the QPSK modulator and demodulator;

• quantization error in the sine wave generation;

• multipath signals, interference, and others.

On average, these noise sources may not affect the sequences being transmitted and received, but occasionally they may cause an error, resulting in some portion of a sequence being received incorrectly. Adding error correction to this system essentially converts the channel into a correction channel, allowing some of these noise-related errors to be eliminated. The error correction in this device is accomplished through convolutional encoding with Viterbi decoding.


Chapter 2: Top-Level Architecture

2.1 DESIRED CHARACTERISTICS AND FEATURES:

The wireless transceiver serves as an interpreter, translating the outputs of a USB device into the desired wireless protocol for transmission and performing the opposite translation during reception. Ideally, the transceiver simply replaces the wired transceiver used in USB 2.0, bolting onto the USB function, hub, or host controller with minimal integration. An alternative solution would allow the transceiver to simply plug into the USB receptacles on the host and function devices. The device supports all forms of USB transmission, including the low-, full-, and high-speed data rates. It uses the power of the USB bus wherever possible to minimize the use of external power sources.

Ideally, the transceiver would support numerous function devices when attached to a hub or a host, but this is not a requirement of this first-pass system. Each of the components in this system has its own requirements. Figure 1 details some of the major sub-blocks of this system. The USB interface is represented by the host controller. The transceiver contains the digital interface, the error correction blocks, and the QPSK blocks. The clocking block consists of the clock recovery, the direct digital frequency synthesizer, and the phase-locked loops. The wireless block consists of the power amplifier, the low noise amplifier, the filters, and the antenna. The bulleted lists below state the requirements briefly; they are expanded upon later.


Figure 1: System-Level Block Diagram

2.2 USB 2.0 REQUIREMENTS

The list below gives the specifications that must be met to support USB 2.0 [5].

• USB 2.0 low-, full-, and high-speed data transmission must be supported, meaning data can be transmitted at 1.5 Mb/s, 12 Mb/s, and 480 Mb/s.

• Device connection, disconnection, reset, and suspend should not be impeded by the wireless protocol.

• All data normally transmitted through a wired USB connection must be supported through the wireless connection.

• Data high and low values for all speeds (referred to as differential 1 and differential 0, also called the J and K states).

• Single-ended zero values for all speeds, which are used to indicate a reset condition.

• Chirp J and Chirp K states, used during reset to reset devices into high-speed mode.

• Squelching of invalid data in high-speed mode.

• Device disconnection in high-speed mode.

• Total delay from USB transmit to USB receive (through the wireless protocol) cannot exceed the maximum allowable USB cable delay of 30 ns.

• Total current consumption of the wireless transceiver plus the attached function controller cannot exceed 200 mA drawn from the USB controller.

2.3 TRANSCEIVER REQUIREMENTS

The list below gives the requirements of the wireless transceiver.

• Must provide some simple error correction.

• Must attach easily to the controller portion of the USB device. It must be attached where the wire would normally attach and must support digital controls, allowing it to be turned off and on by the state of the USB controller.

• A finite state machine must perform the translation from USB to wireless and back. All of the states mentioned in the USB requirements above must be accounted for.

• Some knowledge of reset is needed to allow for the proper transmission of the chirp J and chirp K states. A counter determines how many consecutive single-ended zero (SE0) states have been received and indicates a reset accordingly.

• Two extra bits are used to indicate the start of a transmission packet, which helps the receiver locate each packet when it must be decoded.

• Must handle switching between full-speed and high-speed transmission rates. Low-speed devices cannot connect as high-speed devices, but high-speed devices need to connect as full-speed devices before indicating that they can utilize high-speed data rates.

2.4 CLOCKING AND WIRELESS REQUIREMENTS

The list below gives the clocking and carrier frequency requirements.

• Must support clock synchronization through the wireless channel.

• Must support the 1.5 Mb/s, 12 Mb/s, and 480 Mb/s transmission rates, as well as the wireless carrier wave, ideally 3.6 GHz.

• If current-state transmission and start/end-of-packet transmission are used, clock speeds of three times the various transmission rates must be supported and generated.

• Use some form of digital frequency synthesis for carrier frequency generation; otherwise, use a PLL or off-chip crystal to generate the carrier frequency.

2.5 USB INTERFACE

The USB interface shown in Figure 2 is taken from the USB 2.0 specification [5], and it was the starting point for interfacing the transceiver with USB. The transceiver design essentially removes the wire and the various resistors, but it still has to look functionally similar to the USB host controller. The USB signals listed below the figure must all be accounted for in the wireless transceiver.


Figure 2: USB Transceiver Details [5]

The list below gives the signals required in a USB transceiver.

• Rpu_enable – pull-up resistor enable. Not needed, as there are no resistors in the wireless transceiver.

• HS_Current_Source_Enable – enables the high-speed current source of the high-speed transmitter. Used in conjunction with HS_Drive_Enable to indicate high-speed data rates.

• HS_Drive_Enable – enables high-speed data transmission.

• HS_Data_Drive_Input – high-speed data stream for transmission.

• LS/FS_Data_Drive_Input – low- or full-speed data stream for transmission.

• Assert_Single_Ended_Zero – asserts a single-ended zero on the output of the low- or full-speed transmitter.

• FS_Edge_Mode_Sel – chooses low-speed or full-speed data rates for the LS/FS transmitter.

• HS_Differential_Receiver_Output – data stream from the receiver during high-speed operation.

• Squelch – used during high-speed operation to indicate that invalid data has been received (in wired operation, the data was below the expected differential thresholds).

• LS/FS_Differential_Receiver_Output – data stream from the receiver during low- or full-speed operation.

• HS_Disconnect – used during high-speed operation to indicate that a device has been disconnected.

• SE_Data+_Receiver_Output – D+ signal used when single-ended data is received (SE one is not allowed).

• SE_Data-_Receiver_Output – D- signal used when single-ended data is received (SE one is not allowed).

2.6 TOP LEVEL DETAILS

The top-level block diagram of the wireless transceiver in Figure 3 shares a substantial portion of its USB controller interface with the USB specification transceiver shown in Figure 2. This diagram does not include the complex registers that connect to the ECC blocks. Those registers are covered later in Figures 4 and 5, as are all the components in the dotted-line box in the bottom-right corner.


Figure 3: Wireless USB Transceiver Top Level Diagram

A circuit in the upper-left corner of Figure 3 handles high-speed transmission. It converts a 1 on the HS_Data_Driver line to a 1 on the Data+ terminal and a 0 on the Data- terminal, and does the reverse when a 0 occurs on the same line. A similar circuit handles low-speed and full-speed transmission. In full-speed mode, a 1 is transmitted as Data+ = 1 and Data- = 0 and a 0 as the reverse, while the opposite translation is used in low-speed mode. The receiving circuitry translates the data on the Data+ and Data- lines into the correct value for the host controller, and it is essentially the same as in the USB specification transceiver. The only addition is the RxEn gating signal, which enables the output of the receiving circuitry only when a valid packet reception has occurred.
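The line translation just described can be captured in a small behavioral model, which may be handy as a reference when checking the translation logic in simulation. This Python sketch is hypothetical: the function names are not from the design. It assumes, as stated above, that high speed uses the same polarity as full speed, and it treats SE0 as carrying no data bit.

```python
def drive_lines(bit, mode):
    """Map one transmit bit to (Data+, Data-) for mode "HS", "FS", or "LS".

    HS and FS drive 1 -> (1, 0) and 0 -> (0, 1); LS uses the opposite
    translation, as described for Figure 3.
    """
    if mode not in ("HS", "FS", "LS"):
        raise ValueError(f"unknown mode: {mode}")
    dp, dm = (1, 0) if bit else (0, 1)
    return (dm, dp) if mode == "LS" else (dp, dm)

def receive_bit(dp, dm, mode):
    """Inverse of drive_lines; returns None for SE0 (both lines low)."""
    if (dp, dm) == (0, 0):
        return None                      # single-ended zero carries no data bit
    if (dp, dm) == (1, 1):
        raise ValueError("SE1 is not a valid USB state")
    bit = 1 if dp else 0
    return 1 - bit if mode == "LS" else bit

print(drive_lines(1, "FS"), drive_lines(1, "LS"))
```

Keeping the mode as an explicit argument mirrors the FS_Edge_Mode_Sel and HS_Drive_Enable controls, which select the translation in the hardware.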

The wireless transceiver contains a number of signals that are not in the USB specification transceiver.

• RxEn – indicates that the receiver and receiver buffer should be enabled. This occurs when none of the transmitter drivers are enabled by the USB controller.

• TxEn – indicates that the transmitter and transmission buffer should be enabled. This occurs when the USB controller indicates that either the HS or LS/FS driver should be enabled. It also deactivates the receiving circuitry.

• TxReset – detects a reset state when 3.0 ms or more of the SE0 state is transmitted. Note that the counters that measure the duration must know the current transmission speed to detect this 3.0 ms accurately.

• RxReset – detects a reset state when 2.5 µs or more of SE0 is received.

• Reset – indicates a reset state (either transmit reset or receive reset).

• LS_Clock_3x – clock running at 3x the LS transmission rate (4.5 MHz).

• FS_Clock_3x – clock running at 3x the FS transmission rate (36 MHz).

• HS_Clock_3x – clock running at 3x the HS transmission rate (1.44 GHz).

• Clock_3x – clock running at 3x the current USB mode clock, selected according to the speed at which the USB controller is transmitting.

• Clock_1x – clock running at the current USB mode clock rate, derived from Clock_3x by a divide-by-3 clock divider.

• TxChirp – indicates that the J or K state being sent is actually a chirp signal, because the device is in reset.

• RxChirp – indicates that a chirp has been detected.

• RxPacket – indicates that a packet has been successfully received. It is asserted when both the RX0 and RX1 flops hold a 1 while RxEn has the receiver enabled.
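The enable and reset-detection signals above can be modeled in a few lines. In this Python sketch the SE0 counter is assumed to run on the 1x clock of the current mode. That is one way to give the counter the knowledge of the transmission speed that TxReset requires. The class and function names are hypothetical, and the design's actual counter clock and widths may differ. Thresholds are rounded up to whole clock cycles.

```python
RATES_HZ = {"LS": 1_500_000, "FS": 12_000_000, "HS": 480_000_000}  # 1x rates

TX_RESET_NS = 3_000_000   # 3.0 ms of SE0 while transmitting (TxReset)
RX_RESET_NS = 2_500       # 2.5 us of SE0 while receiving (RxReset)

def cycles_for(duration_ns, mode):
    """Smallest number of 1x clock cycles spanning at least duration_ns."""
    rate = RATES_HZ[mode]
    return -(-duration_ns * rate // 1_000_000_000)   # integer ceiling division

def enables(hs_drive_enable, lsfs_drive_enable):
    """Return (TxEn, RxEn): transmit if either driver is enabled, else receive."""
    tx_en = hs_drive_enable or lsfs_drive_enable
    return tx_en, not tx_en

class Se0Counter:
    """Counts consecutive SE0 states on the 1x clock and flags a reset."""

    def __init__(self, mode, transmitting):
        limit_ns = TX_RESET_NS if transmitting else RX_RESET_NS
        self.threshold = cycles_for(limit_ns, mode)
        self.count = 0

    def clock(self, is_se0):
        # Any non-SE0 state restarts the count
        self.count = self.count + 1 if is_se0 else 0
        return self.count >= self.threshold

for mode in RATES_HZ:
    print(mode, cycles_for(TX_RESET_NS, mode), cycles_for(RX_RESET_NS, mode))
```

The spread of thresholds, from 4 cycles (2.5 µs at low speed) to 1,440,000 cycles (3.0 ms at high speed), shows why the counter must know the current speed.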

Many of the signals in the transceiver depend on the states being transmitted or received, and the USB 2.0 architecture has numerous possible states. In low speed and full speed there are two main states, J and K, which correspond to either 01 or 10 on the Data+ and Data- lines. There is also a single-ended zero (SE0) state, denoted by zeros on both data lines. High-speed mode also has J, K, and SE0 states. Since there are three main states for all transmission modes, two bits are used to denote the state transmitted. However, high-speed mode has some additional states. When coming out of reset in high-speed mode, the host controller broadcasts chirp J and chirp K states. In a wired solution these have a larger voltage swing than normal; in the wireless solution they require an extra bit indicating that the device is chirping. In high-speed mode, the transceiver must also indicate when data is being squelched or when a device is disconnecting. Chirping, squelch, and disconnect therefore add three more states that must be accounted for.

To allow error correction to work without increasing the clock frequency, six USB transmissions are buffered together before transmission, so each packet must carry twelve bits of USB data. To account for the squelch, disconnect, and chirp states, it might seem that an additional 12 bits should be transmitted to indicate whether any of these states occurred during the transmission of the USB data. However, these are all persistent states, meaning they most likely last across a long string of USB states, so only two bits are sent. If a squelch, disconnect, or chirp occurred in any of the six USB transmissions, these bits are set appropriately. Therefore, each packet consists of sixteen bits: two header bits to indicate the start of transmission, twelve USB data bits, and two flag bits (S/D and Chirp). The header consists of two consecutive ones, and the rest of the bits are guaranteed never to repeat that sequence (once the data stream is partitioned for QPSK modulation). This feature helps the receiver recognize the start of a packet.
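The packet layout described above can be sketched as follows. The bit order used here is an assumption: the header first, then the six (D+, D-) pairs, then the OR of the S/D bits and the OR of the Chirp bits. The text fixes the fields and their sizes but not their exact positions, and the function names are hypothetical.

```python
HEADER = (1, 1)   # two consecutive ones mark the start of a packet

def build_packet(states):
    """Pack six USB states into a 16-bit packet (assumed bit order).

    states: six (dp, dm, sd, chirp) tuples, one per 1x clock cycle.
    Layout: header, six (D+, D-) pairs, OR of S/D, OR of Chirp.
    """
    if len(states) != 6:
        raise ValueError("a packet carries exactly six USB states")
    bits = list(HEADER)
    for dp, dm, _sd, _chirp in states:
        bits += [dp, dm]
    # Squelch/disconnect and chirp are persistent, so one bit each suffices
    bits.append(int(any(s[2] for s in states)))
    bits.append(int(any(s[3] for s in states)))
    return bits

def unpack_packet(bits):
    """Inverse of build_packet; returns (list of (D+, D-) pairs, sd, chirp)."""
    if len(bits) != 16 or tuple(bits[:2]) != HEADER:
        raise ValueError("not a valid 16-bit packet")
    pairs = [tuple(bits[i:i + 2]) for i in range(2, 14, 2)]
    return pairs, bits[14], bits[15]
```

Sixteen bits per six USB states is the figure the encoder and decoder sizes in the following sections depend on.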

Table 1 lists the possible states if only one USB state were packetized (six bits sent instead of sixteen). The list of possible states for the packet of six buffered USB states is too long to include here.

Header   D+   D-   S/D   Chirp   Definition
1 1      0    0    0     0       Single-Ended 0
1 1      1    0    0     0       Differential 1
1 1      0    1    0     0       Differential 0
1 1      0    0    1     0       Squelch (invalid range)
1 1      0    0    1     1       Disconnect Detected
1 1      1    0    0     1       Chirp J State
1 1      0    1    0     1       Chirp K State

Table 1: List of possible states
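To check decoder output against Table 1, the table can be transcribed directly into a lookup. The Python helper below is hypothetical and not part of the design. Any six-bit pattern that does not appear in the table, for example D+ = D- = 1, is reported as invalid.

```python
# Six-bit single-state packets from Table 1: (Header, Header, D+, D-, S/D, Chirp)
TABLE_1 = {
    (1, 1, 0, 0, 0, 0): "Single-Ended 0",
    (1, 1, 1, 0, 0, 0): "Differential 1",
    (1, 1, 0, 1, 0, 0): "Differential 0",
    (1, 1, 0, 0, 1, 0): "Squelch (invalid range)",
    (1, 1, 0, 0, 1, 1): "Disconnect Detected",
    (1, 1, 1, 0, 0, 1): "Chirp J State",
    (1, 1, 0, 1, 0, 1): "Chirp K State",
}

def classify(bits):
    """Return the Table 1 definition for a six-bit packet, or None if invalid."""
    return TABLE_1.get(tuple(bits))

print(classify([1, 1, 0, 1, 0, 1]))
```

Note that disconnect is signaled by setting both the S/D and Chirp bits while D+ and D- are both zero.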

A register captures these sixteen bits and passes them sequentially to the ECC block. The ECC block encodes the sixteen bits using convolutional coding and passes the encoded sequence to the QPSK transmit block, which modulates the signal.

2.7 TRANSMIT AND RECEIVE DETAILS

After QPSK modulation, the modulated signal is filtered and sent to the power amplifier, where only the desired frequency is amplified before being sent to the antenna for transmission. On the receiving side, the encoded packet passes through a filter and a low noise amplifier. It then passes through the QPSK demodulator, which hands the received sequence to the ECC decoder (the Viterbi decoder). The Viterbi decoder reconstructs the original packet from the received packet. Once the packet is received, its bits are used to create the RxPacket, squelch, disconnect, and data signals, which are passed on to the USB controller on the receive side.

Figure 4 focuses on the register in the top-level diagram that precedes the ECC block on the transmit path. This register packetizes the data that needs to be encoded. Four six-bit shift registers capture the information on the Data+, Data-, S/D, and Chirp lines on the 1x clock. These registers are enabled only when the device is transmitting; the clocks are discussed in Section 2.8. The outputs of the Data+ and Data- shift registers are fed sequentially to two inputs of a six-input multiplexor. The logical OR of the outputs of the S/D shift register drives another input of the multiplexor, as does the logical OR of the outputs of the Chirp shift register. The final two inputs of the multiplexor are tied to logic one and represent the start-of-packet information.

A small state machine selects which multiplexor input drives the output to the ECC encoder. The state machine is triggered on rising edges of the 3x clock, which ensures that all eighteen bits (sixteen data bits and two flush bits) are encoded in six 1x clock cycles. The eighteen-bit input packet is encoded into a 36-bit packet. The encoder also partitions the data into two 18-bit data streams that are sent to the I and Q inputs of the QPSK modulator.

16

Figure 4: Encoder Block Diagram

Figure 5: Decoder Block Diagram


On the receive side, the two outputs of the QPSK demodulator are sent straight to

the Viterbi decoder on the 3x clock. The Viterbi decoder takes these two 18-bit

sequences and decodes them into a single 16-bit sequence that contains all of the original

data. It then sends this data to the USB logic in packets of six bits that look like those

shown in Table 1. Figure 5 omits the filter and LNA that precede the demodulator, as well as the internals of the Viterbi decoder, which are covered in Section 3.3.

2.8 CLOCKING DETAILS

Figure 6: Clock Selection Circuitry

USB 2.0 requires support of three different data speeds: 1.5 Mb/s, 12 Mb/s, and 480 Mb/s (Compaq). For the purposes of this project, clocks at these rates (1.5 MHz, 12 MHz, and 480 MHz) are defined as the 1x clocks. The transmit and receive registers, as well as the ECC blocks, require a clock at three times this frequency. All of these clocks are derived from the same high-frequency clock, which is synthesized by a phase-locked loop (PLL). The PLL clock is divided down to the three possible 3x clock frequencies (4.5 MHz, 36 MHz, and 1.44 GHz). As portrayed in Figure 6, one system-level 3x clock is chosen depending on the mode of operation, and this clock is passed through a divide-by-three circuit to create the system-level 1x clock, which runs at the USB data rate.
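The clock arithmetic can be summarized in a short sketch. The report does not state the PLL frequency, so the 1.44 GHz PLL_HZ below is an assumption, chosen because all three 3x clocks divide it evenly; the names are hypothetical.

```python
USB_RATES_HZ = {"low": 1_500_000, "full": 12_000_000, "high": 480_000_000}
PLL_HZ = 1_440_000_000  # assumption: PLL runs at the highest 3x clock

def clocks(mode: str) -> tuple:
    """Return (3x clock, 1x clock, PLL divide ratio) for a USB speed mode."""
    three_x = 3 * USB_RATES_HZ[mode]
    if PLL_HZ % three_x:
        raise ValueError("PLL frequency is not a multiple of the 3x clock")
    one_x = three_x // 3                 # divide-by-three circuit of Figure 6
    return three_x, one_x, PLL_HZ // three_x

for mode in USB_RATES_HZ:
    print(mode, clocks(mode))
```

Under this assumption the PLL would be divided by 320, 40, and 1 for low, full, and high speed; a different PLL frequency changes only the divide ratios.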

Figure 7: Reset Detection Circuitry

The USB specification [5] defines two reset conditions, and a circuit to detect them is shown in Figure 7. If a device is in transmit mode and has transmitted more than three milliseconds of the single-ended zero (SE0) state, the transceiver needs to recognize the reset state. If the device is in receive mode and has received more than 2.5 µs of SE0, it must recognize that the transmitting device is in reset. Both conditions are detected using counters clocked by the 1x clock. If a non-SE0 state occurs, the counters are reset. If a counter reaches either 3 ms on the transmit side or 2.5 µs on the receive side, the reset signal is asserted.
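A behavioral sketch of the counter-based reset detection follows. It assumes a single counter whose threshold depends on whether the device is transmitting or receiving (the circuit in Figure 7 uses two counters), and the class and function names are hypothetical. Durations are converted to 1x clock cycles with integer arithmetic, rounding up.

```python
from dataclasses import dataclass

TX_RESET_NS = 3_000_000   # 3 ms of SE0 while transmitting
RX_RESET_NS = 2_500       # 2.5 us of SE0 while receiving

def cycles_for(duration_ns: int, clock_hz: int) -> int:
    """Number of clock cycles needed to cover duration_ns, rounded up."""
    return (duration_ns * clock_hz + 10**9 - 1) // 10**9

@dataclass
class ResetDetector:
    clock_hz: int     # 1x clock for the current USB mode
    count: int = 0    # consecutive 1x cycles spent in SE0

    def step(self, se0: bool, transmitting: bool) -> bool:
        """Advance one 1x clock cycle; return True while reset is detected."""
        if not se0:
            self.count = 0            # any non-SE0 state clears the counter
            return False
        self.count += 1
        limit_ns = TX_RESET_NS if transmitting else RX_RESET_NS
        return self.count >= cycles_for(limit_ns, self.clock_hz)

det = ResetDetector(clock_hz=480_000_000)
hits = [det.step(se0=True, transmitting=False) for _ in range(1200)]
print(hits.index(True) + 1)  # 1200 cycles = 2.5 us at 480 MHz
```

At 480 MHz the transmit threshold becomes 1,440,000 cycles; at 1.5 MHz the 2.5 µs receive threshold rounds up to 4 cycles.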

Chapter 3: Error Correction

3.1 INTRODUCTION

In the realm of wireless communications, one of the most difficult challenges is minimizing the effects of noise throughout the solution. Noise enters a design from many sources, including thermal noise, channel noise, interference, and multipath. There are various ways to minimize the effects of these noise sources, including carefully designed low noise amplifiers that limit the noise to a small region around the desired frequency of operation, and transmitting numerous carrier periods per transmitted bit. Another common technique for combating the resulting bit errors is the use of error-correcting codes (ECC); one of the most common is convolutional coding with Viterbi decoding.

Convolutional codes generate n output bits for every k input bits in a stream. The n output bits are generated from a combination of the current k input bits and the L − 1 previous sets of input bits. L is referred to as the constraint length, and it represents how many time steps of inputs are convolved. For example, if the constraint length is three and the number of input bits per time step is two (k = 2), the number of bits that need to be saved is four. The bits that combine to produce the output bits in this case are the two current bits, the two bits from the previous time step, and the two bits from two time steps earlier. Convolutional coding schemes are generally referred to as rate k/n codes. If the number of output bits generated in the example above is three, the code would be referred to as a rate-2/3 code, as there are three output bits generated per two input bits.

The code is generated by combining certain sequences of the current bit and the stored bits: each output is the modulo-2 sum (XOR) of a selected subset of these bits, as in the example below [9].

Figure 8: Sample Convolutional Coder (input u1; stored bits u0 and u-1; outputs v1, v2, and v3; generator taps (1,1,1), (1,0,1), and (0,1,1))

Figure 8 shows a rate-1/3 code: there are three output bits for every input bit. The first output is the modulo-2 sum of the current bit and the two previous bits, the second output combines the two previous bits, and the third output combines the current bit and the bit from two time steps earlier.
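The rate-1/n structure can be made concrete with a small encoder. Because Figure 8 itself did not survive extraction, this sketch assumes the output order given in the prose (v1 uses taps (1,1,1), v2 uses (0,1,1), v3 uses (1,0,1)), with each tap tuple ordered current bit first; conv_encode is a hypothetical helper, not code from the report.

```python
from typing import List, Sequence, Tuple

def conv_encode(bits: Sequence[int], taps: List[Tuple[int, ...]]) -> List[int]:
    """Rate-1/n convolutional encoder.

    taps[j][i] is 1 when output j includes the input from i time steps ago
    (i = 0 is the current bit), so len(taps[j]) is the constraint length.
    """
    window = [0] * len(taps[0])          # window[i] holds the bit from i steps ago
    out = []
    for b in bits:
        window = [b] + window[:-1]       # shift the new bit in, drop the oldest
        for g in taps:
            out.append(sum(t & w for t, w in zip(g, window)) % 2)  # modulo-2 sum
    return out

# Figure 8, with outputs ordered as in the prose: v1, v2, v3
FIG8_TAPS = [(1, 1, 1), (0, 1, 1), (1, 0, 1)]
print(conv_encode([1, 0, 0], FIG8_TAPS))  # [1, 0, 1, 1, 1, 0, 1, 1, 1]
```

Passing taps [(1, 1, 0), (1, 1, 1)] instead gives the rate-½, constraint-length-3 code chosen later in this chapter (G1 = 110, G2 = 111).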

The sequences used to generate the output bits cannot be chosen haphazardly. They should ensure that the encoded output sequences have as large a Hamming distance from each other as possible, which results in lower error rates [4]. Good generator sequences are usually found by computer search. For the purposes of the ECC used in this project, however, the sequences for a rate-½ code at various constraint lengths can be taken directly from Table 2 [9].

Constraint Length    G1        G2
3                    110       111
4                    1101      1110
5                    11010     11101
6                    110101    111011

Table 2: Convolutional Codes for Various Constraint Lengths

Once generator sequences are chosen, it is easy to create a state diagram, or the equivalent trellis diagram that unrolls the states over time, to illustrate how the inputs are translated into the outputs. The key idea is that every possible input sequence encodes into a unique output sequence, and the possible output sequences differ from each other in as many bits as possible.

There are a couple of ways to decode convolutionally encoded data. One option is sequential decoding, where the bits are compared against the possible states as they are received. If the received bits differ from what is possible at that point in the sequence, a state machine increments the number of errors received and makes a choice as to what the input bits most likely were. It continues in this fashion until either it reaches the end of what is received or a certain threshold of errors is reached. If there are too many errors, the decoder backtracks until it finds a path that minimizes the number of errors when compared to possible sequences [9]. The problem with this decoding technique is how long it can take. If the decision making in the face of an error is poor, there can be a lot of backtracking, which takes a lot of time. Conversely, if many errors are allowed before backtracking, it leaves the door open for higher error rates. Therefore, another method of decoding, called maximum likelihood decoding (one flavor of which is Viterbi decoding), will be used.

Viterbi decoding examines the entire received sequence, possibly chopping it into smaller pieces first, and computes how closely each received piece matches each of the valid sequences [4]. Each candidate path is assigned a metric, usually the Hamming distance between the received bits and the bits that path would have produced, indicating how close the received sequence is to that valid path. As each bit is received, the candidate paths grow in length, and each extended path is given an updated path metric. The number of possible paths could quickly balloon to large values, but the list is continually pruned: at each time step, several paths merge into each decoder state, and only the one closest to the received sequence is retained for that state. Therefore, the total number of paths kept at any one time is the same as the number of states in the encoder and decoder.

3.2 THE CONVOLUTIONAL CODER

Once convolutional coding has been chosen, an appropriate code rate must be selected. Since multiple bits will need to be transmitted per USB transmission, and the high-speed USB rate is 480 Mb/s, choosing a code with few output bits per input bit will help keep the needed clock frequency at a reasonable level. Therefore, the desired code will be a rate-½ code.

Now that a code rate has been settled upon, a constraint length must be chosen. A constraint length of three requires only a few states and is fairly simple to decode, but that alone is not enough reason to settle on it. Simulations were run on the Simulink setup shown in Figure 9, varying the constraint length from three to six.

Figure 9: Simulink Viterbi Decoding Simulation

The results of the simulations are summarized in Table 3:

Constraint Length    G1        G2        BER
3                    110       111       0.00617
4                    1101      1110      0.00323
5                    11010     11101     0.00434
6                    110101    111011    0.00061

Table 3: Viterbi Decoding Simulation Results

The channel used in this case simply adds white Gaussian noise to the input, and the modulation scheme used was BPSK. As the table shows, a constraint length of three provides a BER of 0.6%. The BER decreases at constraint lengths of four and six, although, interestingly, the BER at length five is higher than at length four. The minimum BER occurs, not surprisingly, with a constraint length of six, which provides a BER of 0.06%. So, a constraint length of three is not the best option, but it still provides a viable alternative, as the project will actually be using QPSK, which should cut the BER in half. The SNR in this case was 5 dB, which is a fairly small signal-to-noise ratio (probably more noise than the environment this device will be used in). Higher-level protocols could be used for retransmission should the BER prove problematic, although they are not included in this design.

Using the generator codes of G1=110 and G2=111, the basic encoding structure

looks like Figure 10:

Figure 10: Wireless USB Convolutional Coder

The first output bit is generated from the current bit and the previous one, while the second output bit is generated from the current bit and the two previous ones. The encoder is easiest to discuss in terms of states, which are determined by the storage elements U0 and U-1 in the diagram above. If the three most recent inputs, from newest to oldest, had been 0, 1, 1, the current bit would be 0 and the state (U0 U-1) would be 11.

The state diagram for the encoder is shown in Figure 11.


Figure 11: Convolutional Coder State Machine

Another way to view the structure of the encoder is via a trellis diagram, as shown

in Figure 12.

Figure 12: Convolutional Coder Trellis Diagram


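The structure of the encoder ensures that, from any state, the two possible outputs (for an input of 0 or 1) differ in both bits: 11 vs. 00 or 01 vs. 10. This follows because both generators include the current input bit, so flipping that bit flips both outputs. The result is a greater Hamming distance between the possible outputs, lowering the error rate [4].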

Walking through the state diagram beginning in state 00, if a 0 is received, all of the stored bits and the current bit are 0, so the output bits are 00: V1 = 0 + 0 and V2 = 0 + 0 + 0, and the next state is state 00. Similarly, if a 1 is received while in state 00, the output bits are 11: V1 = 1 + 0 and V2 = 1 + 0 + 0, and the next state is state 10. State 00 always behaves this way, so assume a 1 is received and the current state becomes 10. Now, if a 0 is received, the output bits are 11: V1 = 0 + 1 and V2 = 0 + 1 + 0, and the state machine progresses to state 01. However, if a 1 were received, the output bits would be 00, since in modulo-2 arithmetic V1 = 1 + 1 = 0 and V2 = 1 + 1 + 0 = 0, and the state machine progresses to state 11. The trellis diagram can now be followed easily, starting from state 00. The dotted lines indicate that a 1 was received in the current state, while the solid lines indicate that the input bit was 0. The text above each line shows the bit received and, in parentheses, the corresponding output bits.
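The walkthrough above can be checked mechanically. This sketch (with a hypothetical helper named step) applies G1 = 110 and G2 = 111 to every state and input and prints the resulting output pair and next state, which should match the transitions of Figure 11; states are written as the string U0U-1.

```python
def step(state: str, u: int) -> tuple:
    """One encoder step for G1 = 110, G2 = 111.

    state is the string 'U0U-1' (for example '10'); returns (next_state, 'V1V2').
    """
    u0, um1 = int(state[0]), int(state[1])
    v1 = u ^ u0            # G1 = 110: current bit + previous bit (mod 2)
    v2 = u ^ u0 ^ um1      # G2 = 111: current bit + two previous bits (mod 2)
    return f"{u}{u0}", f"{v1}{v2}"

for s in ("00", "01", "10", "11"):
    for u in (0, 1):
        nxt, out = step(s, u)
        print(f"state {s}, input {u}: output {out}, next state {nxt}")
```

The printed table reproduces each transition discussed above, for example state 10 with input 0 giving output 11 and next state 01.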

To show an example of encoding, the information in Table 1 will be used, even though the actual packets will be 16 bits, not the 6 from the table. Although six bits are sent from the USB/logic interface per USB transmission, there are in reality only nine possible values for this six-bit packet. The first two bits are always ones, indicating the start of a packet, and neither of the next two pairs can be 11, leaving three choices for each pair. The nine possible input packets and their encoded outputs are shown in Table 4.

Valid Input Sequence    Encoded Output    Flush Bits
110000                  110010010000      0000
110001                  110010010011      1101
110010                  110010011111      0100
110100                  110010101101      0000
110101                  110010101110      1101
110110                  110010100010      0100
111000                  110001100100      0000
111001                  110001100111      1101
111010                  110001101011      0100

Table 4: Valid Encoding Outputs

To explain how the encoded output is obtained, and the significance of the flush bits, the encoding of 110101 is traced using the trellis diagram of Figure 13.

Figure 13: Example of Convolutional Encoding

Walking through the diagram, a 1 is received in the initial state 00. This causes a transition to state 10, with output bits 11. In state 10, a 1 is received, causing a transition to state 11 and creating outputs 00. In state 11, a 0 is received, creating outputs 10 and changing the state to 01. Now, a 1 is received, transitioning the state to 10 with outputs of 10. The next two inputs are 0 and 1, causing outputs of 11 and 10, respectively, and the sequence has reached the grey box. So far, the input sequence 110101 has created outputs of 110010101110, which matches Table 4. The grey box represents the outputs that are generated as the state machine is flushed. After the last input bit (a one in this case) is received, it is moved into the state machine. To flush the residual packet data out of the state machine, two zeros are then passed in. The purpose of these zeros is to return the state machine to state 00, but as each zero moves through, it also creates a pair of output bits [9].

Are these flush output bits worth transmitting, or should they just be discarded? Well, in order to correct all patterns that contain t or fewer errors, every codeword must differ from every other codeword in at least 2t + 1 positions [4]. Without the flush bits, the encoded outputs differ from each other by as few as 2 bits. That means that without the flush bits, the encoding scheme cannot guarantee correcting ANY error; it can only detect single errors. With the flush bits, all codes differ from each other by at least 4 bits, meaning that the scheme can guarantee to correct all single-bit errors and detect all double-bit errors. These details are for hard-decision decoding, which uses absolute voltage thresholds to determine the bit received. With soft-decision decoding, which uses conditional probabilities based on the magnitude of the received voltage and the previous bit received [9], the scheme could come very close to fixing most 2-bit errors, although soft-decision decoding is not used in this project. So, the flush bits are worth keeping, even in the project case, where 16 bits are already in each packet. In addition, adding the 2 flush bits raises the total number of encoded bits to transmit from 32 to 36, which is divisible by three and therefore easily transmitted using the 3x clock.
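The distance argument can be verified by brute force. The sketch below re-encodes the nine valid inputs of Table 4 (with the two flush zeros) using G1 = 110 and G2 = 111, then computes the minimum pairwise Hamming distance with and without the flush bits. The helper names are hypothetical.

```python
from itertools import combinations

def encode(bits):
    """Rate-1/2 encoder with G1 = 110, G2 = 111; returns 'V1V2V1V2...'."""
    u0 = um1 = 0
    out = []
    for u in bits:
        out.append(f"{u ^ u0}{u ^ u0 ^ um1}")
        u0, um1 = u, u0                 # shift the register
    return "".join(out)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# The nine valid six-bit inputs: header 11, then two pairs that are never 11.
pairs = ["00", "01", "10"]
inputs = ["11" + p + q for p in pairs for q in pairs]

table = {}
for word in inputs:
    coded = encode([int(c) for c in word + "00"])   # append two flush zeros
    table[word] = (coded[:12], coded[12:])          # (encoded output, flush bits)
    print(word, *table[word])

without_flush = min(hamming(table[a][0], table[b][0])
                    for a, b in combinations(inputs, 2))
with_flush = min(hamming("".join(table[a]), "".join(table[b]))
                 for a, b in combinations(inputs, 2))
print(without_flush, with_flush)  # 2 4
```

The printed rows match Table 4, and the two minimum distances agree with the figures of 2 and 4 bits given above.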

3.3 THE VITERBI DECODER

At first glance, there is an overwhelming number of possible paths to compare the input against. For the 36-bit packets that are transmitted, there would be $2^{36}$ possible packets (more than 68 billion possibilities). Even with the constraints placed on the possible inputs to the encoder (the first 2 bits are always one, each two-bit field has only 3 valid values instead of 4, etc.), the number of valid packets is $3^{7} = 2187$, which is still over 2000 valid packets. Comparing the received values against these 2000+ valid packets could get overwhelming very quickly. Fortunately, the Viterbi decoder is designed to avoid this blowup. As each pair of bits is received, it prunes the number of possible sequences down to one per encoder/decoder state: the most likely received packet per state [4]. Since there are only four possible encoder/decoder states in this solution, the decoder cuts the 2000+ possible packets down to the four most likely received codes. Once the full packet has been received, the decoder can quickly choose the most likely received packet from these four and pass the result on to the rest of the receiver.

There are three main sub-blocks in the Viterbi decoder used in this project: the path-decision block, the path and error updating block, and the path-output block. The path-decision block is responsible for deciding what the two bits received from the QPSK demodulator on every 3x clock cycle were supposed to be. It has a decision-making unit for each state. Each two-bit QPSK output has four possible values; however, only two of those four represent a sequence that could put the state machine into a given state. For example, from Figure 11, only states 00 and 01 can lead to state 00, and those transitions would have produced the bit sequences 00 and 01, respectively. So, the decision-making unit of state 00 compares the two-bit input to 00 and 01 and decides which one implies fewer errors. If the 00 sequence is more similar to the received bits, it decides that state 00 was most likely the preceding state, so it shifts 00 into the LSBs of the state 00 path register. If 01 is deemed more correct, then state 01 was most likely the preceding state. Therefore, the path from state 01 is shifted into the MSBs of the state 00 path register, while 01 is shifted into the LSBs. There is a similar path-decision unit for each of the four states: for state 01, it compares the input to 10 and 11; for state 10, it compares it to 10 and 11; and for state 11, it compares it to 00 and 01. The registers are updated for all states in the same manner as explained for state 00.
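The pairs that each decision-making unit compares against follow directly from the encoder equations. This sketch (hypothetical names) inverts the transition function to list, for every state, its two predecessor states and the output pair on each incoming branch.

```python
def step(state, u):
    """Encoder transition for G1 = 110, G2 = 111; state is (U0, U-1)."""
    u0, um1 = state
    return (u, u0), (u ^ u0, u ^ u0 ^ um1)

states = [(0, 0), (0, 1), (1, 0), (1, 1)]
predecessors = {s: [] for s in states}
for s in states:
    for u in (0, 1):
        nxt, out = step(s, u)
        predecessors[nxt].append((s, out))   # record the branch entering nxt

def name(bits):
    return "".join(map(str, bits))

for s in states:
    print(name(s), "<-", [(name(p), name(o)) for p, o in predecessors[s]])
```

The result lists 00 and 01 for states 00 and 11, and 10 and 11 for states 01 and 10, as described above. The two candidate pairs for each state differ only in the second (V2) bit.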

The path updating mentioned above takes place as part of the path and error update block. Once the paths for each state have been updated, the errors must be updated. The decoder has already computed how many errors occurred for the two-bit input that was shifted into a state's path, so it simply adds that count to the running error total for that state. The other operation in this block is on-the-fly decoding of the input sequence according to each state's current path. Since the decoder knows which state the input sequence is coming from, and knows the received bits, it can easily determine which input bit generated them. So, the decoded output for each state is also computed during path updating.

Finally, once the whole packet has been received, there are four paths from which to choose the output: one from each state. The choice is made by comparing the error totals that have accumulated for each state. The path with the fewest errors is chosen, and the decoded data corresponding to the chosen state is sent out in six-bit parcels to the receiver. The six bits consist of the header bits, the D+ and D- bits, and the S/D and Chirp bits. The full 16 bits of original data are sent in six 1x clock cycles, matching the rate at which they were created by the transmitter and encoder.
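For comparison, a compact textbook hard-decision Viterbi decoder for the same code is sketched below. It selects survivors by accumulated Hamming metric (add-compare-select), which is the standard formulation and may differ in detail from the path-decision logic described above. It also reads the result from state 00, because the flush bits return the encoder there, whereas the report's hardware picks whichever state has the fewest errors. All names are hypothetical.

```python
from typing import List, Tuple

State = Tuple[int, int]  # (U0, U-1)
STATES: List[State] = [(0, 0), (0, 1), (1, 0), (1, 1)]

def step(state: State, u: int) -> Tuple[State, int, int]:
    """Encoder transition for G1 = 110, G2 = 111."""
    u0, um1 = state
    return (u, u0), u ^ u0, u ^ u0 ^ um1

def viterbi(in1: List[int], in2: List[int]) -> Tuple[List[int], int]:
    """Hard-decision Viterbi decoding of one flushed packet.
    Returns the decoded input bits and the error count of the chosen path."""
    INF = float("inf")
    metric = {s: (0 if s == (0, 0) else INF) for s in STATES}  # start in 00
    paths = {s: [] for s in STATES}
    for r1, r2 in zip(in1, in2):
        new_metric = {s: INF for s in STATES}
        new_paths = {s: [] for s in STATES}
        for s in STATES:
            if metric[s] == INF:
                continue                                   # not yet reachable
            for u in (0, 1):
                nxt, v1, v2 = step(s, u)
                m = metric[s] + (v1 != r1) + (v2 != r2)    # add branch metric
                if m < new_metric[nxt]:                    # compare and select
                    new_metric[nxt] = m
                    new_paths[nxt] = paths[s] + [u]
        metric, paths = new_metric, new_paths
    # The flush bits force the encoder back to 00, so read the result there.
    return paths[(0, 0)], metric[(0, 0)]

def to_bits(value: int, width: int = 18) -> List[int]:
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]

decoded, errors = viterbi(to_bits(0x2DB50), to_bits(0x20148))  # one flipped bit
print(hex(int("".join(map(str, decoded)), 2)), errors)  # 0x36c60 1
```

With the input streams used in Section 4.2.3, a single flipped bit (2DB50 in place of 2DA50) still decodes to 36C60 with an error count of 1.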

Chapter 4: Error Correction Simulation Results

4.1 MATLAB SIMULATION RESULTS

The first step in designing the convolutional code is to determine the effect of the constraint length on the bit error rate. While longer constraint lengths can provide better bit error rates, they also require more bits to be encoded per transmission (a code with constraint length L needs L − 1 flush bits per packet), which in turn requires a faster clock frequency. Therefore, for this study, constraint lengths of three, four, and five were compared. The generating sequences used are listed in Table 3.

Figure 15: Bit Error Rate vs. SNR for varying constraint lengths


The bit error rates for lengths four and five are almost identical, while that of length three is a little worse. However, the benefits of the length-three code outweigh its slightly higher bit error rate. Using the length-three code allows for a slower clock speed and a simpler decoder, saving both power and area. Besides, the error rates at high SNR values, which most resemble the operating region of this device, are still minuscule for the length-three code.

Figure 16: Encoding example

Figure 16 shows an example of encoded data. The top plot is the original data,

while the bottom is the encoded data. Since the code has two generating sequences, there

are two encoded bits produced for each bit in the original data sequence.


Figure 17: Encoded symbols modulated by QPSK

An added benefit of a code that generates two bits per bit of the original sequence is that it interfaces easily with QPSK modulation. The QPSK algorithm requires two inputs, one for the in-phase output and one for the quadrature output. So, in essence, each bit of the original sequence is encoded and mapped to both inputs of the QPSK algorithm in every time slot. Figure 17 shows how the previously generated encoded signal maps to the QPSK inputs.
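The mapping from encoder outputs to QPSK inputs can be sketched as follows. The report does not specify the constellation, so the choices here (out1 on the in-phase axis, out2 on the quadrature axis, bit 0 to +1 and bit 1 to −1, scaled to unit energy) are assumptions, and qpsk_map is a hypothetical name.

```python
import math
from typing import List

def qpsk_map(out1: List[int], out2: List[int]) -> List[complex]:
    """Map encoder output pairs to unit-energy QPSK symbols (assumed mapping:
    out1 drives I, out2 drives Q, bit 0 -> +1 and bit 1 -> -1)."""
    scale = 1 / math.sqrt(2)
    return [complex(1 - 2 * i_bit, 1 - 2 * q_bit) * scale
            for i_bit, q_bit in zip(out1, out2)]

# First four encoder output pairs of the first transmission in Table 5
symbols = qpsk_map([1, 0, 1, 1], [1, 0, 0, 0])
for s in symbols:
    print(f"I = {s.real:+.3f}, Q = {s.imag:+.3f}")
```

Each time slot carries one encoded pair, so one symbol is produced per original bit, which is the property the paragraph above relies on.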


Figure 18: Transmitted vs. Received Data

The channel noise in these Matlab simulations is modeled as simple white Gaussian noise. Figure 18 shows the differences between the data that was transmitted through the channel and the data that was received after the channel. Since such a short time period is shown, a low SNR (5 dB in this case) was used to show what happens when errors occur. The transmitted bits are marked by an ‘x’, while the received bits are marked by an ‘o’. Three total errors occur in this sample: a single error at time 6 and consecutive errors around time 43.

Figure 19: Original vs. Decoded symbols

So, what happens to the errors that occur in the received data depicted in Figure 18? As shown in Figure 19, the single error that occurs around time 6 disappears. However, the two consecutive errors that occur around time 43 combine to cause a single bit error in the decoded output. The encoding/decoding scheme should be able to correct any single bit error that occurs, but consecutive or simultaneous errors can cause an error in the decoded signal.

4.2 VERILOG SIMULATION RESULTS

After the specifications for the encoder and decoder were defined and simulated in Matlab, they were ported to Verilog. The verification of the Verilog implementation of the encoder/decoder pair was accomplished in three parts. First, the encoder was coded and tested by passing through a single packet of data, and then a string of packets. Once the encoder was working, it was used to generate encoded data streams for testing the decoder, whose inputs were connected directly to the outputs of the encoder. Like the encoder, the decoder was tested by sending first a single packet through and then a string of packets. This set of tests could prove that data can be encoded and decoded, but it does not exercise the error correction of the decoder, because the data stream to the decoder contained no errors. To test the error correction capabilities of the decoder, it was disconnected from the encoder. The decoder then received inputs that were the same as the ones the encoder generated, but with some of the bits flipped. Codes were sent through with a single bit error, then with a couple of widely spaced bit errors, and then with consecutive bit errors. The results from these tests are discussed here.

4.2.1 Encoder Testing

Transmission Data (2 Packets)
D+                0 1 1 0 0 1   1 0 0 1 1 0
D-                1 0 1 0 1 0   0 1 0 0 1 0
Squelch/Discon    0 0 0 0 0 0   0 0 0 1 0 0
Chirp             0 0 0 0 0 0   0 0 0 0 0 0

First Transmission    Header | Data Transmission         | S/D & Chirp | Flush
State                 0 1    | 3 2 1 3 2 1 3 2 0 0 1 3   | 2 0         | 0 0
Encoder Input         1 1    | 0 1 1 0 1 1 0 0 0 1 1 0   | 0 0         | 0 0
Encoder Out1          1 0    | 1 1 0 1 1 0 1 0 0 1 0 1   | 0 0         | 0 0
Encoder Out2          1 0    | 0 0 0 0 0 0 0 1 0 1 0 0   | 1 0         | 0 0

Second Transmission   Header | Data Transmission         | S/D & Chirp | Flush
State                 0 1    | 3 3 2 0 1 2 0 1 2 1 3 2   | 0 1         | 2 0
Encoder Input         1 1    | 1 0 0 1 0 0 1 0 1 1 0 0   | 1 0         | 0 0
Encoder Out1          1 0    | 0 1 0 1 1 0 1 1 1 0 1 0   | 1 1         | 0 0
Encoder Out2          1 0    | 1 0 1 1 1 1 1 1 0 0 0 1   | 1 1         | 1 0

Table 5: Encoder Input and Output Examples

After the encoder was deemed to be functional when receiving a single packet, another test with multiple consecutive packets was performed. Table 5 shows two of the packets that were encoded back to back. The top portion of the table details the incoming data streams, while the next two portions detail the encoding of the two packets, including state transitions, encoder input bits, and the corresponding output bits. For the first transmission, the input data on the D+ line is 011001 (19 in hexadecimal), while the D- line input data is 101010 (2A in hex). These two values are captured in the dplus_broadcast and dminus_broadcast registers in Figure 20. These two registers send the inputs to the encoder and represent the buffered data from the USB controller; they show the correct values, so the data is being buffered correctly. Comparing the state, input, out1, and out2 rows in Table 5 to the encoder state, ecc_input, out1, and out2 signals in Figures 20 and 21 shows an exact correlation. In short, the encoder is functioning properly.
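The state, out1, and out2 rows of Table 5 can be regenerated in software as an independent cross-check. This sketch numbers each state as U0 + 2·U-1, an inference that reproduces the table's numbering rather than something the report states; trace is a hypothetical helper.

```python
def trace(bits):
    """Return (state, out1, out2) per step for G1 = 110, G2 = 111.
    The state is numbered U0 + 2*U-1 (assumed, to match Table 5)."""
    u0 = um1 = 0
    rows = []
    for u in bits:
        rows.append((u0 + 2 * um1, u ^ u0, u ^ u0 ^ um1))
        u0, um1 = u, u0                  # shift the register
    return rows

first_input = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
rows = trace(first_input)
states = [r[0] for r in rows]
out1 = "".join(str(r[1]) for r in rows)
out2 = "".join(str(r[2]) for r in rows)
print(states)
print(hex(int(out1, 2)), hex(int(out2, 2)))  # 0x2da50 0x20148
```

The out1 and out2 strings for the first transmission are the 2DA50 and 20148 streams later fed to the decoder in Section 4.2.3.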

Figure 20: First Packet Encoding

Figure 21: Second Packet Encoding

4.2.2 Encoder/Decoder Pair Test Results

Testing the decoder is a slightly more complex task. Not only does it have to function when the inputs are error-free, but it should also provide some error correction in the presence of corrupted data streams. To test that the decoder can decode error-free data from the encoder, the outputs of the encoder were connected directly to the inputs of the decoder, so there is no data corruption between transmission and reception. The same packets shown in Table 5 are the ones tracked in Figures 22 and 23 below. The correct path can be followed through the various states by looking at the state##_err signals. Since there is no noise, the correct path will have no errors, so the err signal whose value is 00 marks the correct path as it passes through the decoder. It starts at state 00 and moves to 10, 11, 01, and so on. At the end of packet reception, when the packet_received signal goes high, the correct data is chosen from the four state data options. In this case, the correct value is 36C60 (hex), residing in state 00, which can be derived from the first transmission portion of Table 5.

Figure 22: First Packet Decoding

The second packet to be decoded is the one from the second transmission portion of Table 5. As in the previous figure, the correct path can be followed through the decoder by tracking the value 00 as it moves through the state##_err signals. In this case, the correct data to be captured is 392C8, which, again, can be computed from Table 5. The other interesting thing shown in Figure 23 is the transmission of the data from the previous packet to the rest of the receiver, represented by the rx[5:0] signal. The correct sequence of six six-bit packets to send for the received 36C60 data is 110100, 111000, 111100, 110000, 110100, and 111000, which are 34, 38, 3C, 30, 34, and 38 in hexadecimal. These hex values match those present in the figure.
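The split of a decoded packet into rx[5:0] parcels can be sketched as below, assuming the 18-bit layout of header, six D+/D- pairs, S/D, Chirp, and two flush bits (an inference from Table 5); rx_parcels is a hypothetical name.

```python
def rx_parcels(packet18: int) -> list:
    """Split a decoded 18-bit packet (16 bits + 2 flush bits) into six six-bit
    parcels: header 11, one D+/D- pair, then the S/D and Chirp bits."""
    bits = [(packet18 >> (17 - i)) & 1 for i in range(18)]
    header, data, sd, chirp = bits[0:2], bits[2:14], bits[14], bits[15]
    parcels = []
    for k in range(6):                        # one parcel per 1x clock cycle
        dp, dm = data[2 * k], data[2 * k + 1]
        word = header + [dp, dm, sd, chirp]
        parcels.append(int("".join(map(str, word)), 2))
    return parcels

print([format(p, "02X") for p in rx_parcels(0x36C60)])
# ['34', '38', '3C', '30', '34', '38']
```

The output reproduces the six hex values listed above; the flush bits are simply dropped.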

Figure 23: Second Packet Decoding

4.2.3 Decoder Error Correction Test Results

Once the decoder is working in the absence of errors, its ability to correct errors was assessed. Three kinds of errors are presented here: a single error in a packet, two widely separated errors in a single packet, and two errors in a packet that occur concurrently. In all three cases, the pristine data stream used is one of the two presented above: 36C60. With no errors, the two input streams from the QPSK demodulator that produce this sequence are 101101101001010000 on input1 and 100000000101001000 on input2, which are 2DA50 and 20148 in hex. For the single error case, a sequence of 2DB50 is placed on the first input instead of 2DA50. In the second case, with two errors separated by a significant margin, the two sequences are 2DB50 and 20158. Finally, in the third case, errors occur on both inputs at the same time, using the sequence pairs 2DB50 and 20058 as well as 0DA50 and 00148.

Figure 24: Correcting a Single Error

Figure 24 shows what happens when a single error occurs during a packet. Tracing the value 00 through the state##_err signals shows which state the correct path is in up until about the middle of the packet, when an error occurs. After this error, the decoder registers it, the number of errors is incremented, and 01 becomes the lowest error total left. So, for the rest of the sequence, the error count 01 corresponds to the correct path. At the end of reception, the decoded sequence that is selected is again 36C60. So, the decoder corrected the error! Figure 24 is just one example; this single bit error was tried in a number of different positions in the incoming sequence on both inputs, and every time the decoder was able to correct it. So, the decoder does a great job handling single bit errors.

The next case is that of two non-simultaneous errors in the received packet, as presented in Figure 25. As in the previous case, the sequence is error-free until about a third of the way through, when the lowest number of errors in the state##_err registers becomes 01. So, one error has occurred. The decoder still handles this until about four packets before the end, when another error in the received sequence occurs. The minimum number of errors becomes 02, and then quickly becomes 03 and 04. Finally, at the end of the packet, the path with the minimum number of errors is in state 11, and even that path claims four errors! So, the decoder could not recover from these two non-simultaneous errors. In fact, it decoded 2DAA0, which is significantly different from the expected 36C60. However, it would be able to indicate that a substantial number of errors have occurred. Also, due to the flush bits, the decoder should always start and end each packet in state 00, so if the path with the fewest errors ends in a state other than 00, the decoder would know that something went wrong and could indicate this to the USB interface.

Figure 25: Two Non-Simultaneous Errors

After trying two non-simultaneous errors, the case of two simultaneous errors was checked, with the expectation that it would cause rather impressive failures. Figure 27 appears to support this, as the received sequence (36DD6) has numerous errors in it. However, the results changed depending on the indices of the input sequence at which the affected bits occurred. In some cases, such as the one shown in Figure 26, which had the first bit of both input streams flipped, the decoder was still able to recover the original sequence, 36C60, while also indicating that two errors occurred. So, in some cases, the decoder is able to recover from 2 errors.

Figure 26: Correcting Two Simultaneous Errors

Figure 27: Two Simultaneous Errors That Are Not Correctable


4.2.4 Encoder/Decoder Summary

The encoder-decoder pair should be able to correct any single-bit error that occurs during the transmission of a packet. It may also be able to correct some two-bit errors, but not all of them. When it cannot, the decoder still knows that errors have occurred. Although this was not implemented in this project, it could then tell the host controller that the errors could not be corrected, so that the data could be squelched or a retransmission requested. Single-bit error correction is sufficient for this project because the packets are short enough to avoid most bursts of errors. To correct more bits, a larger constraint length or a larger k value would have had to be chosen for the encoder.
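The single-error guarantee can be checked with a small software model. The Python sketch below makes several assumptions:

- the rate-1/2, constraint-length-3 code with generators G = (6, 7) octal, as used in Appendix A3;
- an 18-bit frame whose last two bits are flush zeros;
- a textbook hard-decision Viterbi decoder (add-compare-select over the four states, keeping the survivor that ends in state 00). This is not a transcription of the Verilog in Appendix A4.

The function names and the payload are hypothetical. For this generator pair the free distance works out to 4: the input 1, 1 followed by the flush zeros produces coded weight 4. That is why every single error is correctable but only some double errors are.

```python
from typing import List, Tuple

# Rate-1/2, constraint-length-3 convolutional code, G = (6, 7) octal:
#   out1 = u ^ s0        (taps 110)
#   out2 = u ^ s1 ^ s0   (taps 111)
# s0 is the previous input bit and s1 the bit before that.


def encode(bits: List[int]) -> List[int]:
    """Encode a frame whose flush zeros are already appended."""
    s1 = s0 = 0
    coded: List[int] = []
    for u in bits:
        coded += [u ^ s0, u ^ s1 ^ s0]
        s1, s0 = s0, u
    return coded


def viterbi_decode(coded: List[int]) -> Tuple[List[int], int]:
    """Hard-decision Viterbi decoding; returns (bits, errors on best path).

    State index = (s1 << 1) | s0. The flush bits drive the encoder back to
    state 0, so the survivor ending in state 0 is the one returned.
    """
    inf = float("inf")
    metrics = [0, inf, inf, inf]            # encoder starts in state 0
    paths: List[List[int]] = [[], [], [], []]
    for i in range(0, len(coded), 2):
        r1, r2 = coded[i], coded[i + 1]
        new_metrics = [inf] * 4
        new_paths: List[List[int]] = [[], [], [], []]
        for state in range(4):
            if metrics[state] == inf:
                continue                    # state not reachable yet
            s1, s0 = state >> 1, state & 1
            for u in (0, 1):
                # branch metric: Hamming distance to the expected output pair
                branch = ((u ^ s0) != r1) + ((u ^ s1 ^ s0) != r2)
                nxt = (s0 << 1) | u
                candidate = metrics[state] + branch
                if candidate < new_metrics[nxt]:   # add-compare-select
                    new_metrics[nxt] = candidate
                    new_paths[nxt] = paths[state] + [u]
        metrics, paths = new_metrics, new_paths
    return paths[0], int(metrics[0])


# Hypothetical 16-bit payload plus two flush bits -> 18 bits -> 36 coded bits
frame = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0] + [0, 0]
codeword = encode(frame)

# Flip every bit position in turn: each single error is corrected and counted.
for pos in range(len(codeword)):
    received = codeword.copy()
    received[pos] ^= 1
    decoded, errors = viterbi_decode(received)
    assert decoded == frame and errors == 1
print("all", len(codeword), "single-bit errors corrected")
```

The same harness can flip two positions at once to reproduce the mixed behavior described for double errors.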


Chapter 5: Project Integration and Conclusion

A systematic approach has been presented for the design of a wireless USB transceiver, from a problem statement to the realization of a low-power monolithic IC.

The first chapter covers the overall design problem and the proposed solution. The goal is to replace the complex tangle of wiring used to connect consumer electronics, computer peripherals, and mobile devices with high-bandwidth, low-power, low-cost wireless links.

The second chapter covers high-level details of the wireless USB device: how the transceiver communicates with the USB physical interface, how the device packages the data for wireless communication, and the wireless transmission itself.

The USB wireless transceiver needs to transmit 4 bits (D+, D–, squelch/disconnect, and chirp) to cover all the USB states. This implies a 6-bit packet, including 2 header bits, for every USB transmission at USB data speeds. However, to enable error correction, six USB transmissions are buffered together before encoding. For each wireless packet, 18 bits (2 header bits, six D+/D– pairs, the S/D and chirp bits, and 2 flush bits) are encoded into 36 bits.

Because QPSK carries two bits per symbol, these 36 bits are sent as 18 QPSK symbols over the course of six USB transmissions, that is, three QPSK symbols per USB transmission. Therefore, the data for high-speed USB must be transmitted at 1.44 GHz (3 × 480 Mb/s). The frequency of the carrier waves that carry these data packets was set to 3.6 GHz.
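The packet arithmetic above can be retraced step by step. The Python sketch below assumes the frame layout used by the transmit state machine in Appendix A3:

- two header bits;
- six D+/D– pairs;
- one S/D bit and one chirp bit, each ORed over the six samples;
- two flush bits.

The constant names are illustrative only. The ratio of carrier frequency to symbol rate reproduces the 2.5 carrier periods per symbol noted in the Appendix A2 MATLAB comments.

```python
# Per-packet bit budget for high-speed USB (values from the text and Appendix A3)
USB_SAMPLES_PER_PACKET = 6     # USB transmissions buffered per wireless packet
HEADER_BITS = 2
DATA_BITS_PER_SAMPLE = 2       # D+ and D-
STATUS_BITS = 2                # S/D and chirp, ORed over the six samples
FLUSH_BITS = 2                 # constraint length 3 -> 2 flush zeros
CODED_BITS_PER_INPUT = 2       # rate-1/2 convolutional code
BITS_PER_QPSK_SYMBOL = 2
USB_HS_RATE_HZ = 480e6
CARRIER_HZ = 3.6e9

frame_bits = (HEADER_BITS + USB_SAMPLES_PER_PACKET * DATA_BITS_PER_SAMPLE
              + STATUS_BITS + FLUSH_BITS)
coded_bits = frame_bits * CODED_BITS_PER_INPUT
qpsk_symbols = coded_bits // BITS_PER_QPSK_SYMBOL
symbols_per_usb_bit = qpsk_symbols // USB_SAMPLES_PER_PACKET
symbol_rate_hz = symbols_per_usb_bit * USB_HS_RATE_HZ
carrier_cycles_per_symbol = CARRIER_HZ / symbol_rate_hz

print(frame_bits, coded_bits, qpsk_symbols, symbols_per_usb_bit)
print(symbol_rate_hz / 1e9, "GHz,", carrier_cycles_per_symbol, "carrier cycles per symbol")
```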

The wireless USB project then split into the design of four major components:

1) USB interface, packet generation, and error correction scheme;
2) clock recovery mechanism between the transmitter and receiver [6];
3) low noise amplifier (LNA), power amplifier (PA), and antenna [13];
4) frequency synthesizer for sine and cosine wave generation for the QPSK transmission scheme [8].

Emphasis was placed on reducing the power dissipation and area of each of these components.

5.1 DIGITAL USB INTERFACE AND ERROR CORRECTION CIRCUITRY

A method has been presented for a simple interface, with basic error correction, between a USB 2.0 device and the QPSK wireless transmission scheme. The interface takes inputs from an attached USB host controller, generated at a maximum frequency of 480 MHz. It buffers six USB transmissions at a time for encoding by a rate-1/2 convolutional coder (18 input bits produce 36 coded bits). The two outputs of the convolutional coder feed the QPSK modulator. The modulator places the coded data, generated at a maximum frequency of 1.44 GHz, onto a 3.6 GHz carrier.

On the receiving end, the decoder receives two streams of data from the QPSK demodulator. It decodes the data using the Viterbi algorithm, eventually choosing the most likely received data from four saved paths, one per encoder state. The selected data stream is then sent to the receiving end of the USB interface in six-bit parcels, generated off the 1x clock (maximum frequency 480 MHz). Each parcel contains two header bits, the D+ and D– bits, and the squelch/disconnect (S/D) and chirp bits. These are converted inside the interface to the correct format and then sent on to the host controller.
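The packet framing described above can be modeled in a few lines. This Python sketch assumes the bit order of `tx_register_ecc` in Appendix A3:

1. two header bits of 1;
2. the six D+/D– pairs, oldest sample first;
3. the ORed S/D and chirp bits;
4. two flush zeros.

It also assumes the parcel layout read out by `rx_register_ecc` in Appendix A4. `build_frame` and `split_parcels` are hypothetical helper names.

```python
from typing import List, Tuple

HEADER = [1, 1]   # two header bits, both 1 in tx_register_ecc
FLUSH = [0, 0]    # two flush bits return the encoder to state 00


def build_frame(dplus: List[int], dminus: List[int],
                sd: List[int], chirp: List[int]) -> List[int]:
    """Pack six buffered USB samples (oldest first) into an 18-bit frame."""
    assert len(dplus) == len(dminus) == len(sd) == len(chirp) == 6
    frame = list(HEADER)
    for p, m in zip(dplus, dminus):
        frame += [p, m]                        # one D+/D- pair per USB sample
    frame += [int(any(sd)), int(any(chirp))]   # S/D and chirp ORed over the packet
    return frame + FLUSH


def split_parcels(frame: List[int]) -> List[Tuple[int, ...]]:
    """Unpack a decoded 18-bit frame into six 6-bit parcels:
    (header, header, D+, D-, S/D, chirp)."""
    header, pairs = frame[0:2], frame[2:14]
    sd, chirp = frame[14], frame[15]           # frame[16:18] are the flush bits
    return [tuple(header + pairs[2 * i:2 * i + 2] + [sd, chirp])
            for i in range(6)]


frame = build_frame([1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1],
                    [0] * 6, [0, 0, 1, 0, 0, 0])
for parcel in split_parcels(frame):
    print(parcel)
```

Because S/D and chirp are sent once per packet, every parcel of a packet carries the same S/D and chirp values.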

The overall interface is quite simple from a digital circuit standpoint. It consists mostly of small pieces of logic that perform the data conversion and determine when a packet is received or transmitted. The largest portion of it is the counters used to detect a reset condition: one counter must detect 2.5 µs of a certain state and another must detect 3 ms of that state. With the low-speed 1x clock at 1.5 MHz, 2.5 µs is about 4 clock cycles and 3 ms is 4,500 clock cycles. A 2-bit counter and a 13-bit counter, respectively, are therefore sufficient. In total, the digital interface consumes about 1 mW of power during transmit or receive.
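The counter sizing follows from the number of clock cycles each interval spans. This is a minimal Python sketch. It assumes each counter must count through ceil(duration × f_clk) cycles; the exact terminal counts used in the design are not given here. `counter_bits` is a hypothetical helper.

```python
import math


def counter_bits(duration_s: float, clock_hz: float) -> int:
    """Smallest width n with 2**n >= the number of clock cycles in duration_s."""
    cycles = math.ceil(duration_s * clock_hz)   # clock ticks the interval spans
    return max(1, math.ceil(math.log2(cycles)))


LOW_SPEED_CLOCK_HZ = 1.5e6                               # low-speed 1x clock
short_bits = counter_bits(2.5e-6, LOW_SPEED_CLOCK_HZ)   # 2.5 us window
long_bits = counter_bits(3e-3, LOW_SPEED_CLOCK_HZ)      # 3 ms window
print(short_bits, long_bits)
```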

The transmit register is more complex than the top interface, but still rather simple. It consists of 9 flip-flops running on the 3x clock (1.44 GHz) and 41 flip-flops running on the 1x clock (480 MHz). It also contains two single-bit (modulo-2) adders, that is, XOR functions, to perform the convolutional encoding. Simulations show that the transmit register consumes 4.2 mW during transmit.
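The two modulo-2 adders correspond to the sum-of-products equations for `out1` and `out2` in Appendix A3. This Python sketch transcribes those equations by hand and checks them over all eight input combinations. It confirms that they equal u XOR s0 and u XOR s1 XOR s0, which are the generators 110 and 111 (6 and 7 octal). Here u stands for `ecc_input`, s0 for `encode_state[0]`, and s1 for `encode_state[1]`.

```python
from itertools import product


def out1_sop(u: int, s1: int, s0: int) -> int:
    """out1 as written in tx_register_ecc (sum-of-products form)."""
    return (u & (1 - s0)) | ((1 - u) & s0)


def out2_sop(u: int, s1: int, s0: int) -> int:
    """out2 as written in tx_register_ecc (sum-of-products form)."""
    nu, ns1, ns0 = 1 - u, 1 - s1, 1 - s0
    return (u & s1 & s0) | (u & ns1 & ns0) | (nu & ns1 & s0) | (nu & s1 & ns0)


for u, s1, s0 in product((0, 1), repeat=3):
    assert out1_sop(u, s1, s0) == u ^ s0         # generator 110 (octal 6)
    assert out2_sop(u, s1, s0) == u ^ s1 ^ s0    # generator 111 (octal 7)
print("SOP equations match G = (6, 7)")
```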

The receive register is the most complex portion of the digital interface. It contains 253 flip-flops operating on the 3x clock, which store the various paths, the number of errors they contain, and related state. It also contains 11 flip-flops that operate on the 1x clock and 18 flip-flops that effectively operate at 80 MHz (one update per six-transmission packet). These handle the interface between the top interface and the decoder. Finally, it contains five 6-bit adders to update the number of errors found in the stored paths, and three 5-bit comparators to decide which of the paths contains the fewest errors. Simulations show that the receive register consumes 63 mW during reception.
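Finding the path with the fewest errors takes three comparators, arranged as a two-level tournament. This Python sketch assumes the pairing used by `output_choice` in Appendix A4: state 10 against 11, state 00 against 01, then the two winners. Because the comparisons are strict (as with the Verilog `<`), a tie goes to the second operand. `select_survivor` is a hypothetical name.

```python
from typing import Dict


def select_survivor(errors: Dict[str, int]) -> str:
    """Return the state whose survivor path has the fewest errors, using
    three two-input comparisons arranged as a tournament."""
    # Level 1: one comparison per pair of states
    best_hi = "10" if errors["10"] < errors["11"] else "11"
    best_lo = "00" if errors["00"] < errors["01"] else "01"
    # Level 2: compare the two pair winners
    return best_hi if errors[best_hi] < errors[best_lo] else best_lo


print(select_survivor({"00": 4, "01": 2, "10": 3, "11": 1}))
```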

The encoder's and decoder's primary interfaces are with the top-level interface and the QPSK modulator and demodulator, as described above. However, they will not perform any encoding or decoding until they receive a signal from the clock recovery mechanism indicating that the PLL has locked.

5.2 CLOCK RECOVERY MECHANISM

The clock recovery scheme takes a very straightforward approach. A QPSK transmit and receive scheme lets the clock be recovered easily using standard analog integration techniques. Starting from a receive clock that is “close” to the transmitted frequency, the clock recovery circuitry can recover the data and the phase offset using integrators and mixers. Frame recovery can also be realized using a standard integrator.
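The mixer-and-integrator idea can be illustrated numerically. The Python sketch below rests on several assumptions:

- an idealized, noise-free QPSK symbol lasting 2.5 carrier periods (3.6 GHz carrier, 1.44 GHz symbol rate);
- a local oscillator at exactly the carrier frequency;
- discrete sums standing in for the analog integrators.

The function names and sampling resolution are hypothetical, and this is not the circuit of [6]. Mixing with the local cosine and sine and averaging over the symbol returns the transmitted constellation point rotated by the carrier phase offset. The signs give the data bits, and with a known symbol the angle gives the offset.

```python
import math

SAMPLES_PER_PERIOD = 100   # simulation resolution (assumed)
PERIODS_PER_SYMBOL = 2.5   # 3.6 GHz carrier / 1.44 GHz symbol rate


def qpsk_symbol(i_bit: int, q_bit: int, phase: float = 0.0) -> list:
    """One QPSK symbol, I*cos(wt + phase) - Q*sin(wt + phase), with bits
    mapped to +/-1 and time measured in carrier periods."""
    i_lvl = 1.0 if i_bit else -1.0
    q_lvl = 1.0 if q_bit else -1.0
    n = int(SAMPLES_PER_PERIOD * PERIODS_PER_SYMBOL)
    out = []
    for k in range(n):
        wt = 2 * math.pi * k / SAMPLES_PER_PERIOD
        out.append(i_lvl * math.cos(wt + phase) - q_lvl * math.sin(wt + phase))
    return out


def integrate_and_dump(samples: list) -> tuple:
    """Mix with the local cosine and (negated) sine, then average over the
    symbol; the twice-carrier mixing products integrate to zero."""
    i_acc = q_acc = 0.0
    for k, s in enumerate(samples):
        wt = 2 * math.pi * k / SAMPLES_PER_PERIOD
        i_acc += s * math.cos(wt)    # in-phase mixer + integrator
        q_acc -= s * math.sin(wt)    # quadrature mixer + integrator
    n = len(samples)
    return 2 * i_acc / n, 2 * q_acc / n


# Data recovery with a phase-aligned receive clock
i_hat, q_hat = integrate_and_dump(qpsk_symbol(1, 0))
print("bits:", int(i_hat > 0), int(q_hat > 0))

# Phase-offset recovery from a known (1, 1) symbol
offset = 0.3
i_hat, q_hat = integrate_and_dump(qpsk_symbol(1, 1, offset))
print("estimated offset:", math.atan2(q_hat, i_hat) - math.pi / 4)
```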

The phase offset recovered from the incoming wave can then be fed back to the PLLs to help them lock onto the transmitted frequency. There are two PLLs in this system, one to generate sine terms and one to generate cosine terms [6]. Each PLL receives a reference clock from the frequency synthesizer (one sine, one cosine) that is used as a direct input to the phase detector. The phase detector compares the frequency and phase of the reference clock with the phase term generated by the clock recovery circuitry and controls the VCO accordingly. The system requires a start signal from the LNA circuitry that enables the PLL to start the locking sequence. The transmitter begins transmitting random data upon power-up, so there is sufficient time to achieve lock.
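Locking can be illustrated with a generic discrete-time PLL. The Python sketch below is a textbook second-order (type-2) loop, not the circuit of [6]. The following elements are all assumptions:

- the sinusoidal phase detector;
- the proportional-plus-integral loop filter and its gains `kp` and `ki`;
- the numerically controlled oscillator (NCO) standing in for the VCO;
- the frequency offset and starting phase error.

In the model, the integrator settles on the frequency offset while the phase error decays to zero.

```python
import math


def simulate_pll(freq_offset: float, initial_phase_error: float,
                 kp: float = 0.1, ki: float = 0.005, steps: int = 2000):
    """Discrete-time second-order PLL model.

    Phases are in radians relative to the nominal frequency, so the
    reference advances by freq_offset per step. Returns the final phase
    error and the integrator value (the loop's frequency estimate).
    """
    ref_phase = initial_phase_error
    nco_phase = 0.0                 # NCO output phase (stand-in for the VCO)
    integrator = 0.0
    for _ in range(steps):
        error = math.sin(ref_phase - nco_phase)   # sinusoidal phase detector
        integrator += ki * error                  # integral path learns the offset
        control = kp * error + integrator         # proportional-plus-integral filter
        ref_phase += freq_offset
        nco_phase += control                      # NCO frequency follows the control
    return ref_phase - nco_phase, integrator


phase_error, freq_estimate = simulate_pll(freq_offset=0.02, initial_phase_error=0.5)
print(f"phase error {phase_error:.2e} rad, frequency estimate {freq_estimate:.4f} rad/step")
```

With these gains, the linearized loop has closed-loop poles of magnitude about 0.95 per step, so it settles within a few hundred steps without cycle slipping.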

Once the PLL has locked, the clock recovery circuitry begins passing data to the ECC block. Before doing so, it sends the ECC block an enable signal indicating that the incoming data stream is valid and that the PLL is locked.

Since QPSK is fairly tolerant of clock jitter, the response time of the PLL is not absolutely critical. This clock recovery mechanism adjusts to phase shifts every cycle, but is limited by the response time of the PLL. Overall, the system will be able to correct for phase shifts and large amounts of jitter while continuing to align to the frame boundary [6].


5.3 LOW-NOISE AMPLIFIER, POWER AMPLIFIER, AND ANTENNA

The straightforward approach of the single-ended low noise amplifier (LNA) allowed for a clean and simple input mechanism to receive the transmitted signal from the antenna [13]. The single-ended LNA produced an output signal that met the noise-rejection requirements with a low-power implementation. A direct down-conversion input architecture keeps the number of off-chip components to a minimum, continuing the trend of a low-power, low-cost implementation. The single-ended LNA was critical to meeting these constraints.

The single-ended LNA front-end architecture consists of the antenna followed by the off-chip channel-select bandpass filter. The signal is received by the on-die LNA and further bandpass filtered to home in on the fundamental 3.6 GHz signal. The main mechanism through which the LNA achieves gain at a given frequency while attenuating other frequencies is tuned inductive resonance. The LNA tries to minimize the amount of noise it injects into the system at 3.6 GHz while suppressing noise at other frequencies. In that manner, it can achieve a very high signal-to-noise ratio with a small, single stage of amplification. The overall LNA solution should also include sine-wave input detection circuitry. This circuitry notifies downstream components, such as the clock recovery block, that a signal is being received.
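Tuned resonance rests on the ideal LC tank relation f0 = 1/(2π√(LC)). This minimal Python sketch assumes a lossless tank and a hypothetical 1 nH inductor (neither value comes from [13]). It computes the capacitance that centers the tank on the 3.6 GHz carrier.

```python
import math


def resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal LC tank: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))


def tuning_capacitance(f0_hz: float, inductance_h: float) -> float:
    """Capacitance that resonates with the given inductance at f0."""
    return 1.0 / ((2 * math.pi * f0_hz) ** 2 * inductance_h)


L_TANK = 1e-9                                  # hypothetical 1 nH on-chip inductor
C_TANK = tuning_capacitance(3.6e9, L_TANK)     # capacitance for a 3.6 GHz tank
print(f"C = {C_TANK * 1e12:.2f} pF for 3.6 GHz with 1 nH")
```

A real on-chip tank also includes parasitic capacitance and inductor loss, which shift and broaden the resonance.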

The power amplifier is essentially the reverse of the LNA. It tries to deliver a given amount of power to the antenna at the desired frequency and to transmit at no other frequencies. The need to deliver power to a load drives the amount of power consumed. The design goals called for minimizing both the power consumed and the number of off-die components. Again, the direct up-conversion architecture makes it easy to lower the component count, helping meet these constraints [13].


The Class C power amplifier presented here achieves a good balance of power delivery and power consumption. With zero gate bias on the output transistor, the amplifier can be sized large enough to deliver the needed current to the load without consuming a large static current [13]. The Class C amplifier seeks to minimize the conduction angle so that the transistor behaves as close to an ideal switch as possible. Ideally, there is zero current when there is a large voltage across the transistor, and a large current only when the voltage across it is very small.

Details on Noise Contribution and Power:

PA noise transmitted: 5 aV/√Hz
PA power-added efficiency: 18%
PA drain efficiency: 48%
PA SFDR: 55 dB
PA power consumed: 33 mW
LNA SNR: 70 dB
LNA noise injected: 17 fV/√Hz
LNA power consumed: 15 mW
RMS power supplied to the antenna: 23 mW

5.4 FREQUENCY SYNTHESIZER

A multilevel abstraction approach has been presented for the design of a frequency synthesizer that will produce the sine/cosine carrier waves used by the transceiver in this wireless USB design [8]. Options for frequency synthesizers were presented. Analog frequency synthesizers such as the PLL are most prevalent. However, many new wireless applications prefer digital frequency synthesizers such as the DDFS, because they offer:

- high frequency accuracy;
- temperature and time stability;
- frequency agility;
- phase continuity.

An often-overlooked advantage is that digital frequency synthesizers do not need to be tuned, and hence could reduce test time and potentially cost.

Reference [8] also covers the system-level considerations for choosing a digital frequency synthesizer design and analyzes the optimal DDFS settings for meeting the design specifications. The analysis shows that a 64-entry LUT with a 5-bit output (excluding 1 sign bit) can achieve an SQNR greater than 30 dB and an SFDR greater than 55 dBc.
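The SQNR figure can be sanity-checked numerically. The Python sketch below assumes the following; the folding scheme, half-step sample offset, and rounding are illustrative choices, not details taken from [8]:

- the 64 entries hold one quarter of a sine wave;
- the table is read through quarter-wave folding with an 8-bit phase word, which is consistent with the 8-bit adder and registers quoted for this design;
- the output is 5 magnitude bits plus a sign bit.

The sketch computes SQNR only. SFDR would require a spectral analysis and is not attempted. Since each table entry is within half a quantization step of the true sine, the SQNR cannot fall below about 32.8 dB under these assumptions.

```python
import math

LUT_ENTRIES = 64                  # quarter-wave table size
MAG_BITS = 5                      # magnitude bits; the sign bit is separate
FULL_SCALE = 2 ** MAG_BITS - 1    # largest 5-bit magnitude (31)
PHASE_STEPS = 4 * LUT_ENTRIES     # 256 phases per cycle -> 8-bit phase word

# Quarter-wave magnitude table sampled at half-step offsets, so that mirroring
# the index in odd quadrants reproduces the exact sine symmetry.
LUT = [round(FULL_SCALE * math.sin((i + 0.5) * 2 * math.pi / PHASE_STEPS))
       for i in range(LUT_ENTRIES)]


def folded_sine(phase: int) -> float:
    """Quantized sine for an 8-bit phase word, read from the quarter-wave LUT."""
    quadrant, idx = phase // LUT_ENTRIES, phase % LUT_ENTRIES
    if quadrant & 1:                       # 2nd and 4th quadrants: mirror index
        idx = LUT_ENTRIES - 1 - idx
    value = LUT[idx] / FULL_SCALE
    return -value if quadrant & 2 else value   # 3rd and 4th quadrants: negate


def sqnr_db() -> float:
    """Signal-to-quantization-noise ratio over one full cycle."""
    ideal = [math.sin((p + 0.5) * 2 * math.pi / PHASE_STEPS)
             for p in range(PHASE_STEPS)]
    quantized = [folded_sine(p) for p in range(PHASE_STEPS)]
    signal = sum(x * x for x in ideal)
    noise = sum((x - q) ** 2 for x, q in zip(ideal, quantized))
    return 10 * math.log10(signal / noise)


print(f"SQNR = {sqnr_db():.1f} dB")
```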

The following figures are for a 100 MHz operating frequency, at room temperature, in TSMC 0.18 µm 3.3 V technology:

- each ROM of the folded-ROM architecture consumes an average of 3.5 mW;
- the 8-bit ripple-carry adder (RCA) consumes an average of 1.2 mW;
- the two 8-bit registers that store the phase offset consume an average of 0.7 mW.

Excluding the DAC, the total power consumption of the digital frequency synthesizer is less than 10 mW [8].

5.5 SUMMARY

The transceiver presented can be used in two configurations: attached to a USB host that has its own power supply, or attached to a device that is powered by batteries. In the first case, the device must draw less than 200 mA of current in order to be powered by the USB bus [5]. In the second case, the transceiver is connected to a battery-powered device such as a digital camera. There, it should ideally be able to run continuously on a single AA lithium battery for an entire day. In battery-connected mode, the transceiver should draw current only during transmit or receive; when the USB connection is idle, the current drawn should decrease considerably.

When the device is transmitting, it uses the power amplifier, the DDFS, the PLLs, the digital interface, the ECC encoder, and the QPSK modulator. The power draws are:

- power amplifier: 33 mW;
- PLLs, clock recovery, and modulator: 180 mW [6];
- digital interface: 1 mW;
- ECC encoder: 4.2 mW.

The entire system therefore consumes about 218 mW when transmitting. Dividing by the 3.3 V supply voltage gives a current consumption of 66 mA.

In the receiving state, the components used are the low noise amplifier, the PLLs, the digital interface, the ECC decoder, and the QPSK demodulator. The power draws are:

- LNA: 15 mW;
- PLLs, clock recovery, and demodulator: 180 mW;
- digital interface: 1 mW;
- ECC decoder: 63 mW.

This results in a total power consumption of 259 mW, or a current consumption of 78 mA.

Assume the device is always on and spends half of its time transmitting and half receiving. The average current consumption would then be about 72 mA ((66 mA + 78 mA) / 2). This easily meets the constraint for bus-powered USB operation (200 mA or less). A single AA lithium battery has a capacity of about 2900 mAh, so the transceiver could run continuously for more than 40 hours (2900 mAh / 72 mA ≈ 40 h) in battery-connected mode. Since this device is targeted at consumer products such as digital cameras, that should be acceptable.

The power numbers presented here are for high-speed USB operation. Power consumption would be significantly lower for devices operating in full-speed or low-speed USB modes, such as mice and keyboards, because of their much lower frequency requirements.
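The current and battery-life arithmetic can be reproduced from the component figures quoted above. The Python sketch makes the same simplifications as the text: a 3.3 V supply, a 50/50 split between transmit and receive, and a 2900 mAh cell with no regulator losses. The dictionary labels are illustrative.

```python
SUPPLY_V = 3.3
BATTERY_MAH = 2900

# Component power in mW, as quoted in this section
transmit_mw = {"power amplifier": 33, "PLLs, clock recovery, modulator": 180,
               "digital interface": 1, "ECC encoder": 4.2}
receive_mw = {"LNA": 15, "PLLs, clock recovery, demodulator": 180,
              "digital interface": 1, "ECC decoder": 63}

tx_mw = sum(transmit_mw.values())
rx_mw = sum(receive_mw.values())
tx_ma = tx_mw / SUPPLY_V            # I = P / V
rx_ma = rx_mw / SUPPLY_V
avg_ma = (tx_ma + rx_ma) / 2        # half the time transmitting, half receiving
battery_hours = BATTERY_MAH / avg_ma

print(f"TX {tx_mw:.1f} mW -> {tx_ma:.0f} mA; RX {rx_mw:.0f} mW -> {rx_ma:.0f} mA")
print(f"average {avg_ma:.0f} mA, about {battery_hours:.0f} h on one AA cell")
```

Note that the DDFS, although listed among the transmit-path blocks, is not part of the 218 mW sum in the text, and the sketch follows the text in leaving it out.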


Appendices

A1 ACRONYM DEFINITIONS

ADC – Analog to Digital Converter

BER – Bit Error Rate

BPSK – Binary Phase Shift Keying

DAC – Digital to Analog Converter

DDFS – Direct Digital Frequency Synthesizer

ECC – Error Correcting Code

LNA – Low Noise Amplifier

LSB – Least Significant Bit

LUT – Look Up Table

MSB – Most Significant Bit

PA – Power Amplifier

PLL – Phase-Locked Loop

QPSK – Quadrature Phase Shift Keying

RCA – Ripple Carry Adder

ROM – Read Only Memory

SE0 – Single-Ended Zero

SNR – Signal to Noise Ratio

SQNR – Signal to Quantization Noise Ratio

USB – Universal Serial Bus

VCO – Voltage Controlled Oscillator


A2 MATLAB CONVOLUTIONAL CODE AND VITERBI DECODING SOURCE CODE

The MATLAB code below was adapted from two MATLAB tutorials [10], [11].

SNR = 4.5:.5:14;
linSNR = 10.^(SNR(:).*0.1);
M = 4;              % QPSK modulation order; the carrier has 2.5 periods per symbol in this project
codeRate = 1/2;     % number of input bits/number of output bits
constlen1 = 3;      % constraint length of 3
constlen2 = 4;      % constraint length of 4
constlen3 = 5;      % constraint length of 5
k = log2(M);        % bits per QPSK symbol
codegen1 = [6 7];   % convolutional generating codes: v1 = u0+u1, v2 = u0+u1+u-1
codegen2 = [15 16]; % codes for constlen2
codegen3 = [32 35]; % codes for constlen3
tblen = 8;          % traceback length for the Viterbi decoder
trellis1 = poly2trellis(constlen1, codegen1);  % create the trellis for the encoder
trellis2 = poly2trellis(constlen2, codegen2);
trellis3 = poly2trellis(constlen3, codegen3);
dspec1 = distspec(trellis1, 3);  % compute the distances between the codes
dspec2 = distspec(trellis2, 4);
dspec3 = distspec(trellis3, 5);
expectedBER1 = bercoding(SNR, 'conv', 'hard', codeRate, dspec1);  % compute the BER upper bound
expectedBER2 = bercoding(SNR, 'conv', 'hard', codeRate, dspec2);
expectedBER3 = bercoding(SNR, 'conv', 'hard', codeRate, dspec3);
figure;
semilogy(SNR, expectedBER1, 'g', SNR, expectedBER2, 'r', SNR, expectedBER3, 'b');
xlabel('SNR (dB)'); ylabel('BER');
title('Performance for R=1/2, K=3,4,5 Convolutional Code with Hard Decision');
grid on;
axis([4 14 10e-20 10e-1]);
legend('Constraint Length 3', 'Constraint Length 4', 'Constraint Length 5', 'Location', 'best');

% generate random data
numberSymbols = 100;
Nsamp = 4;          % oversampling rate
SNR_temp = 5;       % set SNR for channel
EsNO = SNR_temp + 10*log10(k);
seed = [192837 564738];
rand('state', seed(1));
randn('state', seed(2));
msg_orig = randsrc(numberSymbols, 1, 0:1);
figure;
subplot(2,1,1);
stem(0:numberSymbols-1, msg_orig(1:numberSymbols), 'bx');
axis([0 numberSymbols -0.2 1.2]);
xlabel('Time'); ylabel('Amplitude');
title('Binary Symbols Before Encoding');
legend off;

% convolutionally encode the data
[msg_encode] = convenc(msg_orig, trellis1);
numberEncoded = numberSymbols / codeRate;
tEnc = (0:numberEncoded-1) * codeRate;
subplot(2,1,2);
stem(tEnc, msg_encode(1:length(tEnc)), 'ro');
axis([min(tEnc) max(tEnc) -0.2, 1.2]);
xlabel('Time'); ylabel('Amplitude');
title('Binary Symbols After Encoding');

% modulate using QPSK
randn('state', seed(2));
msg_enc = bi2de(reshape(msg_encode, size(msg_encode, 2)*k, size(msg_encode, 1)/k)');
msg_tx = pskmod(msg_enc, M, pi/4);
msg_tx = rectpulse(msg_tx, Nsamp);
msg_rx = awgn(msg_tx, EsNO-10*log10(1/codeRate)-10*log10(Nsamp));
numberModulated = numberEncoded * Nsamp ./ k;
timeModulated = (0:numberModulated-1) ./ Nsamp .* k;
figure;
plot(timeModulated, real(msg_tx(1:length(timeModulated))), 'c-', ...
     timeModulated, imag(msg_tx(1:length(timeModulated))), 'm-');
axis([min(timeModulated) max(timeModulated) -1.5 1.5]);
xlabel('Time'); ylabel('Amplitude');
title('Encoded Symbols After QPSK Modulation');
legend('In-Phase', 'Quadrature', 'Location', 'best');

% now demodulate and compare to original signal
msg_rx_de = intdump(msg_rx, Nsamp);
msg_demod_de = pskdemod(msg_rx_de, M, pi/4);
msg_demod = de2bi(msg_demod_de, k)';
msg_demod = msg_demod(:);
figure;
stem(tEnc, msg_encode(1:numberEncoded), 'bx');
hold on;
stem(tEnc, msg_demod(1:numberEncoded), 'ro');
axis([0 60 -0.2 1.2]);
xlabel('Time'); ylabel('Amplitude');
title('Transmitted and Demodulated Symbols');
legend('Transmitted', 'Received', 'Location', 'best');

% use the Viterbi decoder to decode the demodulated bits
msg_decode = vitdec(msg_demod, trellis1, tblen, 'cont', 'hard');
figure;
stem(0:59, msg_orig(1:60), 'bo');
hold on;
stem(0:59, msg_decode(1+tblen:60+tblen), 'rx');
hold off;
axis([0 50, -0.2, 1.2]);
xlabel('Time'); ylabel('Amplitude');
title('Original vs. Decoded Symbols');
legend('Original', 'Decoded', 'Location', 'best');

% compute BER through channel and received stream BER
[channelErrors BERchannel] = biterr(msg_encode, msg_demod)
[codeErrors BERcode] = biterr(msg_orig(1:end-tblen), msg_decode(1+tblen:end))

% results from simulation
snr_array = [4 5 6 7 8 9 10 11 12 13 14]
channel_errors = [161762 109369 67553 37087 17869 7217 2256 526 97 4 0]
channel_ber = [0.080881 0.054684 0.033777 0.018544 0.0089345 0.0036085 0.001128 0.000263 4.85e-005 2e-006 0]
symbol_errors = [73091 46059 26553 13714 6331 2527 726 157 34 0 0]
symbol_ber = [0.073092 0.046059 0.026553 0.013714 0.0063311 0.002527 0.00072601 0.000157 3.4e-005 0 0]
figure;
semilogy(SNR, expectedBER1, 'g', snr_array, channel_ber, 'b*-', snr_array, symbol_ber, 'rx-')
axis([4 14 10e-14 10e-1]);
xlabel('SNR (dB)'); ylabel('BER');
title('Performance: Simulated and Ideal');
legend('Ideal', 'Channel BER', 'Code BER');

A3 VERILOG CODE FOR THE CONVOLUTIONAL ENCODER

The Verilog code written by Stojanovic and Rao [15] was helpful for understanding the general operation of the Viterbi decoder, although the code for this project was written from scratch.

module tx_register_ecc (reset, txen, dplus, dminus, squelch_discon, chirp,
                        clock1x, clock3x, out1, out2);
  input dplus, dminus, squelch_discon, chirp, clock1x, clock3x, reset, txen;
  output out1, out2;
  reg tx2, tx3, tx4, tx5;
  reg [4:0] state;  // state bits that control the encoding
  reg dplus0, dplus1, dplus2, dplus3, dplus4, dplus5;        // shift registers to store DPlus data
  reg dminus0, dminus1, dminus2, dminus3, dminus4, dminus5;  // shift registers to store DMinus data
  reg [5:0] dplus_broadcast, dminus_broadcast;  // holding registers used to send data to QPSK while
                                                // shift registers get overwritten
  reg sd0, sd1, sd2, sd3, sd4, sd5;                     // squelch/disconnect storage
  reg chirp0, chirp1, chirp2, chirp3, chirp4, chirp5;   // chirp storage
  reg sd_broadcast, chirp_broadcast;                    // used to send to QPSK
  reg buffer_full;        // used to tell QPSK that there is data to send (enabled in state 18).
  reg [2:0] buff_state;   // state bits that control the buffer filling
  wire out2, out1;
  reg ecc_input;
  reg encode;             // register to enable encoding
  reg [1:0] encode_state;

  // select states
  parameter sel0 = 6'b000001;
  parameter sel1 = 6'b000010;
  parameter sel2 = 6'b000100;
  parameter sel3 = 6'b001000;
  parameter sel4 = 6'b010000;
  parameter sel5 = 6'b100000;

  // buff_state states
  parameter bs0 = 3'b000;
  parameter bs1 = 3'b001;
  parameter bs2 = 3'b010;
  parameter bs3 = 3'b011;
  parameter bs4 = 3'b100;
  parameter bs5 = 3'b101;

  // state machine states for sending data to QPSK
  parameter s0 = 5'b00000;
  parameter s1 = 5'b00001;
  parameter s2 = 5'b00010;
  parameter s3 = 5'b00011;
  parameter s4 = 5'b00100;
  parameter s5 = 5'b00101;
  parameter s6 = 5'b00110;
  parameter s7 = 5'b00111;
  parameter s8 = 5'b01000;
  parameter s9 = 5'b01001;
  parameter s10 = 5'b01010;
  parameter s11 = 5'b01011;
  parameter s12 = 5'b01100;
  parameter s13 = 5'b01101;
  parameter s14 = 5'b01110;
  parameter s15 = 5'b01111;
  parameter s16 = 5'b10000;
  parameter s17 = 5'b10001;

  // capture latest USB data into storage registers
  // shift older data through the registers
  always @(posedge clock1x) begin
    if (reset) begin
      dplus0 <= 1'b0; dminus0 <= 1'b0; sd0 <= 1'b0; chirp0 <= 1'b0;
      dplus1 <= 1'b0; dminus1 <= 1'b0; sd1 <= 1'b0; chirp1 <= 1'b0;
      dplus2 <= 1'b0; dminus2 <= 1'b0; sd2 <= 1'b0; chirp2 <= 1'b0;
      dplus3 <= 1'b0; dminus3 <= 1'b0; sd3 <= 1'b0; chirp3 <= 1'b0;
      dplus4 <= 1'b0; dminus4 <= 1'b0; sd4 <= 1'b0; chirp4 <= 1'b0;
      dplus5 <= 1'b0; dminus5 <= 1'b0; sd5 <= 1'b0; chirp5 <= 1'b0;
    end
    else if (txen) begin
      dplus0 <= dplus;  dminus0 <= dminus;  sd0 <= squelch_discon; chirp0 <= chirp;
      dplus1 <= dplus0; dminus1 <= dminus0; sd1 <= sd0; chirp1 <= chirp0;
      dplus2 <= dplus1; dminus2 <= dminus1; sd2 <= sd1; chirp2 <= chirp1;
      dplus3 <= dplus2; dminus3 <= dminus2; sd3 <= sd2; chirp3 <= chirp2;
      dplus4 <= dplus3; dminus4 <= dminus3; sd4 <= sd3; chirp4 <= chirp3;
      dplus5 <= dplus4; dminus5 <= dminus4; sd5 <= sd4; chirp5 <= chirp4;
    end
  end

  // state machine that controls buffering of USB data
  always @(posedge clock1x) begin
    if (reset) begin
      buff_state[2:0] <= bs0;
      buffer_full <= 1'b0;
    end
    else if (buff_state[2:0] == bs0) begin
      if (txen)
        buff_state[2:0] <= bs1;
      else begin
        buffer_full <= 1'b0;
        buff_state[2:0] <= buff_state[2:0];
      end
    end
    else if (buff_state[2:0] == bs1) begin
      buff_state[2:0] <= bs2;
    end
    else if (buff_state[2:0] == bs2) begin
      buff_state[2:0] <= bs3;
    end
    else if (buff_state[2:0] == bs3) begin
      buff_state[2:0] <= bs4;
    end
    else if (buff_state[2:0] == bs4) begin
      buff_state[2:0] <= bs5;
    end
    else if (buff_state[2:0] == bs5) begin
      buff_state[2:0] <= bs0;
      buffer_full <= 1'b1;  // indicate to the other state machine
                            // that it can copy over the data
    end
  end

  // transmit state machine that controls sending data to QPSK
  // need to reset into state s0.
  always @(posedge clock3x) begin
    if (reset) begin
      state[4:0] <= s0;
      encode <= 1'b0;
      ecc_input <= 1'b0;
    end
    else if (state[4:0] == s0) begin
      if (buffer_full) begin
        state[4:0] <= s1;    // USB buffer full, start transmitting ECC data
        ecc_input <= 1'b1;   // 1st header bit
        encode <= 1'b1;
        // Make a copy of the USB buffer
        dplus_broadcast[5] <= dplus5; dminus_broadcast[5] <= dminus5;
        dplus_broadcast[4] <= dplus4; dminus_broadcast[4] <= dminus4;
        dplus_broadcast[3] <= dplus3; dminus_broadcast[3] <= dminus3;
        dplus_broadcast[2] <= dplus2; dminus_broadcast[2] <= dminus2;
        dplus_broadcast[1] <= dplus1; dminus_broadcast[1] <= dminus1;
        dplus_broadcast[0] <= dplus0; dminus_broadcast[0] <= dminus0;
        // minimize SD & Chirp data
        sd_broadcast <= sd5 | sd4 | sd3 | sd2 | sd1 | sd0;
        chirp_broadcast <= chirp5 | chirp4 | chirp3 | chirp2 | chirp1 | chirp0;
      end // if (buffer_full)
      else begin
        encode <= 1'b0;
        state[4:0] <= s0;
      end
    end // if (state[4:0] == s0)
    else if (state[4:0] == s1) begin
      state[4:0] <= s2;
      ecc_input <= 1'b1;                  // 2nd header bit
    end
    else if (state[4:0] == s2) begin
      state[4:0] <= s3;
      ecc_input <= dplus_broadcast[5];    // 1st D+ bit
    end
    else if (state[4:0] == s3) begin
      state[4:0] <= s4;
      ecc_input <= dminus_broadcast[5];   // 1st D- bit
    end
    else if (state[4:0] == s4) begin
      state[4:0] <= s5;
      ecc_input <= dplus_broadcast[4];    // 2nd D+ bit
    end
    else if (state[4:0] == s5) begin
      state[4:0] <= s6;
      ecc_input <= dminus_broadcast[4];   // 2nd D- bit
    end
    else if (state[4:0] == s6) begin
      state[4:0] <= s7;
      ecc_input <= dplus_broadcast[3];    // 3rd D+ bit
    end
    else if (state[4:0] == s7) begin
      state[4:0] <= s8;
      ecc_input <= dminus_broadcast[3];   // 3rd D- bit
    end
    else if (state[4:0] == s8) begin
      state[4:0] <= s9;
      ecc_input <= dplus_broadcast[2];    // 4th D+ bit
    end
    else if (state[4:0] == s9) begin
      state[4:0] <= s10;
      ecc_input <= dminus_broadcast[2];   // 4th D- bit
    end
    else if (state[4:0] == s10) begin
      state[4:0] <= s11;
      ecc_input <= dplus_broadcast[1];    // 5th D+ bit
    end
    else if (state[4:0] == s11) begin
      state[4:0] <= s12;
      ecc_input <= dminus_broadcast[1];   // 5th D- bit
    end
    else if (state[4:0] == s12) begin
      state[4:0] <= s13;
      ecc_input <= dplus_broadcast[0];    // 6th D+ bit
    end
    else if (state[4:0] == s13) begin
      state[4:0] <= s14;
      ecc_input <= dminus_broadcast[0];   // 6th D- bit
    end
    else if (state[4:0] == s14) begin
      state[4:0] <= s15;
      ecc_input <= sd_broadcast;          // SD bit
    end
    else if (state[4:0] == s15) begin
      state[4:0] <= s16;
      ecc_input <= chirp_broadcast;       // Chirp bit
    end
    else if (state[4:0] == s16) begin
      state[4:0] <= s17;
      ecc_input <= 1'b0;                  // 1st flush bit
    end
    else if (state[4:0] == s17) begin
      state[4:0] <= s0;                   // s0 will capture data again
      ecc_input <= 1'b0;                  // 2nd flush bit
    end
    else begin
      state[4:0] <= state[4:0];
    end
  end // always @ (posedge clock3x)

  // perform the convolutional encoding
  // rate-1/2, constraint length 3 code, G = (6, 7) octal
  always @(posedge clock3x) begin
    if (reset) begin
      encode_state[1:0] <= 2'b00;
    end
    else if (encode) begin
      encode_state[1] <= encode_state[0];
      encode_state[0] <= ecc_input;
    end
  end

  // out1 = ecc_input XOR encode_state[0]
  assign out1 = (ecc_input & ~encode_state[0]) | (~ecc_input & encode_state[0]);
  // out2 = ecc_input XOR encode_state[1] XOR encode_state[0]
  assign out2 = (ecc_input & encode_state[1] & encode_state[0]) |
                (ecc_input & ~encode_state[1] & ~encode_state[0]) |
                (~ecc_input & ~encode_state[1] & encode_state[0]) |
                (~ecc_input & encode_state[1] & ~encode_state[0]);
endmodule // tx_register

A4 VERILOG CODE FOR THE VITERBI DECODER module rx_register_ecc (reset, clock1x, clock3x, rxen, in1, in2, rx); input reset, rxen, in1, in2, clock1x, clock3x; output [5:0] rx; // output to send to the transceiver reg [5:0] rx; // wires containing output of comparisons with input bits wire [1:0] state00_comp00, state00_comp01, state01_comp11, state01_comp10; wire [1:0] state10_comp11, state10_comp10, state11_comp00, state11_comp01; // wires for the decision between which possible inputs to shift // into the possible paths for each state wire state00_decision, state01_decision, state10_decision, state11_decision; reg buffer_full; // place holders for the decision of which state to keep wire [1:0] state00_holder, state01_holder, state10_holder, state11_holder; // # errors placeholders for each state wire [5:0] state00_sum, state01_sum, state10_sum, state11_sum; // shift registers containing best path for each state reg [35:0] state00_path, state01_path, state10_path, state11_path; // state decoded data reg [17:0] state00_data, state01_data, state10_data, state11_data; // decoded data to be broadcast reg [15:0] rx_data; // registers to hold hamming metric data for each state reg [5:0] state00_err, state01_err, state10_err, state11_err; // temporary registers to hold the errors during summing wire [5:0] state00_temp, state01_temp, state10_temp, state11_temp; // register to hold receive count reg [4:0] receive_counter; wire packet_received; // a packet is received when 36 bits are received wire [2:0] output_choice; // wires to choose the final outputs wire [5:0] out_comp0, out_comp1; reg [17:0] output_buffer; // output state machine reg [4:0] output_state;

xiii

// state machine states for sending data out of the receiver // buff_state states parameter bs0 = 3'b000; parameter bs1 = 3'b001; parameter bs2 = 3'b010; parameter bs3 = 3'b011; parameter bs4 = 3'b100; parameter bs5 = 3'b101; // perform comparison of input data with all allowed // input combinations for each state // the comparison is an XOR, but since one input of each // XOR is known beforehand, the comparison can be done // solely with inverters. assign state00_comp00[1] = in1; // compare inputs to 00 assign state00_comp00[0] = in2; assign state00_comp01[1] = in1; // compare inputs to 01 assign state00_comp01[0] = ~in2; assign state01_comp11[1] = ~in1; // compare inputs to 11 assign state01_comp11[0] = ~in2; assign state01_comp10[1] = ~in1; // compare inputs to 10 assign state01_comp10[0] = in2; assign state10_comp11[1] = ~in1; // compare inputs to 11 assign state10_comp11[0] = ~in2; assign state10_comp10[1] = ~in1; // compare inputs to 10 assign state10_comp10[0] = in2; assign state11_comp00[1] = in1; // compare inputs to 00 assign state11_comp00[0] = in2; assign state11_comp01[1] = in1; // compare inputs to 01 assign state11_comp01[0] = ~in2; // decide which of the two inputs should be chosen for each state assign state00_decision = (~state00_comp00[1] & ~state00_comp00[0]) | (state00_comp01[1] & state00_comp01[0]); assign state01_decision = (~state01_comp11[1] & ~state01_comp11[0]) | (state01_comp10[1] & state01_comp10[0]); assign state10_decision = (~state10_comp11[1] & ~state10_comp11[0]) | (state10_comp10[1] & state10_comp10[0]); assign state11_decision = (~state11_comp00[1] & ~state11_comp00[0]) | (state11_comp01[1] & state11_comp01[0]); // update the possible paths for each state // current state decision shift from value to shift in decoded input

xiv

// 00 1 00 00 0
// 00 0 01 01 0
// 01 1 10 11 0
// 01 0 11 10 0
// 10 1 00 11 1
// 10 0 01 10 1
// 11 1 10 00 1
// 11 0 11 01 1

always @(posedge clock3x) begin
   if (reset) begin
      state00_path <= 36'h000000000;
      state01_path <= 36'h000000000;
      state10_path <= 36'h000000000;
      state11_path <= 36'h000000000;
      state00_err <= 6'b000000;
      state01_err <= 6'b000000;
      state10_err <= 6'b000000;
      state11_err <= 6'b000000;
      state00_data <= 18'b000000000000000000;
      state01_data <= 18'b000000000000000000;
      state10_data <= 18'b000000000000000000;
      state11_data <= 18'b000000000000000000;
      receive_counter <= 5'b00000;
   end
   else if (rxen) begin
      // update paths and decoded data for each state
      if (receive_counter == 5'b10011) begin
         receive_counter <= 5'b00010;
         state00_err <= {4'b0000, state00_holder}; // 6'b000000
         state01_err <= {4'b0000, state01_holder};
         state10_err <= {4'b0000, state10_holder};
         state11_err <= {4'b0000, state11_holder};
      end
      else begin
         receive_counter <= receive_counter + 1;
         state00_err <= state00_sum;
         state01_err <= state01_sum;
         state10_err <= state10_sum;
         state11_err <= state11_sum;
      end // else: !if(receive_counter == 5'b10011)

      state00_path <= state00_decision ? {state00_path[33:0], 1'b0, 1'b0}
                                       : {state01_path[33:0], 1'b1, 1'b0};
      state00_data <= state00_decision ? {state00_data[16:0], 1'b0}
                                       : {state01_data[16:0], 1'b0};
      state01_path <= state01_decision ? {state10_path[33:0], 1'b1, 1'b1}
                                       : {state11_path[33:0], 1'b1, 1'b0};
      state01_data <= state01_decision ? {state10_data[16:0], 1'b0}
                                       : {state11_data[16:0], 1'b0};
      state10_path <= state10_decision ? {state00_path[33:0], 1'b1, 1'b1}
                                       : {state01_path[33:0], 1'b0, 1'b1};
      state10_data <= state10_decision ? {state00_data[16:0], 1'b1}
                                       : {state01_data[16:0], 1'b1};
      state11_path <= state11_decision ? {state10_path[33:0], 1'b0, 1'b0}
                                       : {state11_path[33:0], 1'b0, 1'b1};
      state11_data <= state11_decision ? {state10_data[16:0], 1'b1}
                                       : {state11_data[16:0], 1'b1};
   end // if (rxen)
   else begin
      receive_counter <= 5'b00000; // receive_counter is 5 bits wide
   end // else: !if(rxen)
end // always @ (posedge clock3x)

// create a placeholder for each bit that will be added
// to the existing # of errors for each state
assign state00_holder[1:0] = state00_decision ? state00_comp00[1:0] : state00_comp01[1:0];
assign state01_holder[1:0] = state01_decision ? state01_comp11[1:0] : state01_comp10[1:0];
assign state10_holder[1:0] = state10_decision ? state10_comp11[1:0] : state10_comp10[1:0];
assign state11_holder[1:0] = state11_decision ? state11_comp00[1:0] : state11_comp01[1:0];

assign state00_temp[5:0] = state00_decision ? state00_err[5:0] : state01_err[5:0];
assign state01_temp[5:0] = state01_decision ? state10_err[5:0] : state11_err[5:0];
assign state10_temp[5:0] = state10_decision ? state00_err[5:0] : state01_err[5:0];
assign state11_temp[5:0] = state11_decision ? state10_err[5:0] : state11_err[5:0];

// update the errors for the path present in each state
adder6 s0add(state00_temp, {5'b00000, state00_holder[1]}, state00_sum);
adder6 s1add(state01_temp, {5'b00000, state01_holder[1]}, state01_sum);
adder6 s2add(state10_temp, {5'b00000, state10_holder[1]}, state10_sum);
adder6 s3add(state11_temp, {5'b00000, state11_holder[1]}, state11_sum);

// pick the state with the smallest accumulated error
assign output_choice[1] = state10_err < state11_err;
assign output_choice[0] = state00_err < state01_err;
assign out_comp0 = output_choice[1] ? state10_err : state11_err;
assign out_comp1 = output_choice[0] ? state00_err : state01_err;
assign output_choice[2] = out_comp0 < out_comp1;
assign packet_received = (receive_counter == 5'b10011);

always @(negedge clock3x) begin // posedge works for one packet
   if (reset) begin
      output_buffer[17:0] <= 18'b000000000000000000;
      buffer_full <= 1'b0;
   end
   else begin
      if (packet_received) begin
         // output_choice[2] selects the {10, 11} pair; within it,
         // output_choice[1] selects state10, otherwise state11
         output_buffer <= output_choice[2] ?
                             (output_choice[1] ? state10_data[17:0] : state11_data[17:0]) :
                             (output_choice[0] ? state00_data[17:0] : state01_data[17:0]);
         buffer_full <= 1'b1;
      end
   end
end

always @(posedge clock1x) begin
   if (reset) begin
      output_state <= 5'b00000;
      rx[5:0] <= 6'b000000;
   end
   else begin
      if (output_state == bs0) begin
         if (buffer_full) begin
            output_state <= bs1;
            rx[5:0] <= {output_buffer[17:14], output_buffer[3:2]};
            //rx[5:0] <= {2'b1, (output_choice[2] ? (output_choice[1] ? state00_data[13:12] : state01_data[13:12]) : (output_choice[0] ? state10_data[13:12] : state11_data[13:12])), (output_choice[2] ? (output_choice[1] ? state00_data[1:0] : state01_data[1:0]) : (output_choice[0] ? state10_data[1:0] : state11_data[1:0]))};
         end
         else begin
            // buffer_full is driven only by the negedge clock3x block;
            // a second procedural driver here would not be synthesizable
            output_state <= bs0;
         end
      end
      if (output_state == bs1) begin
         rx[5:0] <= {output_buffer[17:16], output_buffer[13:12], output_buffer[3:2]};
         output_state <= bs2;
      end
      if (output_state == bs2) begin
         rx[5:0] <= {output_buffer[17:16], output_buffer[11:10], output_buffer[3:2]};
         output_state <= bs3;
      end
      if (output_state == bs3) begin
         rx[5:0] <= {output_buffer[17:16], output_buffer[9:8], output_buffer[3:2]};
         output_state <= bs4;
      end
      if (output_state == bs4) begin
         rx[5:0] <= {output_buffer[17:16], output_buffer[7:6], output_buffer[3:2]};
         output_state <= bs5;
      end
      if (output_state == bs5) begin
         rx[5:0] <= {output_buffer[17:16], output_buffer[5:4], output_buffer[3:2]};
         output_state <= bs0;
      end
   end
end
endmodule // rx_register_ecc

module full_adder (a, b, cin, out, cout);
   input  a, b, cin;
   output cout, out;
   // sum bit: 1 when an odd number of inputs are 1 (equivalent to a ^ b ^ cin)
   assign out = (a & b & cin) | (a & ~b & ~cin) | (~a & b & ~cin) | (~a & ~b & cin);
   // carry out: majority of the three inputs
   assign cout = (a & b) | (a & cin) | (b & cin);
endmodule // full_adder

module adder6 (a, b, sum);
   input  [5:0] a, b;
   output [5:0] sum;
   wire   [5:0] carry, sum;
   // 6-bit ripple-carry adder; carry[5] is dropped, so the sum wraps modulo 64
   full_adder fa0(a[0], b[0], 1'b0,     sum[0], carry[0]);
   full_adder fa1(a[1], b[1], carry[0], sum[1], carry[1]);
   full_adder fa2(a[2], b[2], carry[1], sum[2], carry[2]);
   full_adder fa3(a[3], b[3], carry[2], sum[3], carry[3]);
   full_adder fa4(a[4], b[4], carry[3], sum[4], carry[4]);
   full_adder fa5(a[5], b[5], carry[4], sum[5], carry[5]);
endmodule // adder6
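The final survivor selection in rx_register_ecc is a two-level compare tree: each pair of states ({00, 01} and {10, 11}) is reduced to its smaller path metric, and the two winners are then compared. The following is a minimal standalone sketch of that tree, written so that each decoded-data word always travels with the metric of its own state. The module name survivor_select and all of its port names are hypothetical and are not part of the original design; the widths (6-bit metrics, 18-bit decoded data) follow the listing, and the strict `<` comparisons mean ties resolve to state11 or state01 within a pair and to the {00, 01} pair at the second level.

```verilog
// Hypothetical sketch (not from the original design): choose the decoded
// data of the state with the smallest accumulated path metric, using the
// same two-level compare structure as output_choice[2:0].
module survivor_select (
   input  wire [5:0]  err00, err01, err10, err11,     // path metrics
   input  wire [17:0] data00, data01, data10, data11, // decoded data per state
   output wire [17:0] best_data,
   output wire [5:0]  best_err
);
   // First level: best state within each pair (plays the role of
   // output_choice[1] and output_choice[0]).
   wire        pick10  = err10 < err11;
   wire        pick00  = err00 < err01;
   wire [5:0]  min_hi  = pick10 ? err10  : err11;
   wire [5:0]  min_lo  = pick00 ? err00  : err01;
   wire [17:0] data_hi = pick10 ? data10 : data11;
   wire [17:0] data_lo = pick00 ? data00 : data01;

   // Second level: does the {10, 11} winner beat the {00, 01} winner?
   // (plays the role of output_choice[2])
   wire pick_hi = min_hi < min_lo;
   assign best_data = pick_hi ? data_hi : data_lo;
   assign best_err  = pick_hi ? min_hi  : min_lo;
endmodule
```

Only the combinational selection is modeled here; in the listing, the chosen word is latched into output_buffer when packet_received is asserted and then shifted out to rx two bits at a time.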


Works Cited

[1] Asghar, Saf, PhD. Personal Communication. Feb. 2005.

[2] Aziz, Adnan, PhD. Personal Communication. Summer 2005.

[3] Baldwin, Richard. “Information Theory and Creationism: Classical Information Theory (Shannon).” 2003. <http://home.mira.net/~reynella/debate/shannon.htm>.

[4] Clark, George C., Jr., and J. Bibb Cain. “Convolutional Code Structure and Viterbi Decoding.” Error-Correction Coding for Digital Communications. New York: Plenum Press, 1982. 227-66.

[5] Compaq et al. Universal Serial Bus Specification. Revision 2.0 ed. 2000.

[6] Dankert, Dan. “Wireless USB Transmit and Receive Scheme Clock Recovery Circuit.” Fall 2005.

[7] Fleming, Chip. A Tutorial on Convolutional Coding with Viterbi Decoding. 31 Jan. 2003. Sept. 2005 <http://home.netcom.com/~chip.f/viterbi/tutorial.html>.

[8] Gokhale, Sanjeev. “Design of a Digital Frequency Synthesizer for Wireless USB.” Fall 2005.

[9] Langton, Charan. “Tutorial 12 – Coding and Decoding with Convolutional Codes.” Complex2Real.com Complex Communications Technology Made Easy. July 1999. Sept. 2005 <http://www.complextoreal.com/chapters/convo.pdf>.

[10] Mathworks, The. “Convolutional Encoder and Viterbi Decoder - Demo.” Matlab Version 7. Apr. 2004.

[11] Mathworks, The. “Phase Shift Keying Simulation.” Matlab Version 7. Apr. 2004.

[12] McDermott, Mark. Personal Communication. Spring 2005.

[13] Patent, Dimitry. “Wireless USB RF Transceiver Circuitry.” Fall 2005.

[14] Poli, Alain, and Llorenç Huguet. “Application of Codes.” Error Correcting Codes: Theory and Applications. Trans. Iain Craig. Hertfordshire, UK: Prentice Hall International Ltd, 1992. 410-58.

[15] Stojanovic, Vladimir, and Ketaki Rao. Viterbi Decoder - Verilog Code. July 2000. Sept. 2005 <http://mos.stanford.edu/ee272/proj99/babyviterbi/verilogcode.html>.


Vita

Jacob S. Schneider was born February 13, 1979, in St. Louis, Missouri. He is the son of Paul and Kathy Schneider. He received a Bachelor of Science in Electrical Engineering from Rice University in Houston, Texas. He has worked at Intel Corporation in Austin, Texas, for the past four years as a circuit designer.

Permanent address: 3103 Stanwood Drive, Austin, Texas, 78757

This report was typed by Jacob S. Schneider.
