WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc.


Post on 12-Jan-2016


Page 1: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc

WIRELESS COMMUNICATIONS From Systems to Silicon

Raghu Rao, Wireless Systems Group, Xilinx Inc.

Page 2: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


R. M. Rao, 2008

Agenda
• Introduction to Wireless communications
  – Systems design and considerations
    • The wireless environment
    • Link budget
    • MIMO and OFDM Systems
  – High level view of wireless communication systems
    • Mobile WiMax, an example of wireless comm system
    • Hardware/software partitioning
    • PHY/MAC etc.
• The Platform FPGA
  – Overview of FPGAs and FPGA tools
  – Building DSP sub-systems on FPGAs
  – Digital baseband
• FPGA tools and design methodology

Page 3: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Communications Roadmap

• Key markets
• Core DSP technologies
  – OFDM
  – MIMO
• IP Network is key
• Enables new approaches to
  – QoS management
  – Robustness
  – Capacity

Page 4: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Wireless Environment

• Multipaths caused by reflections from various objects.

Page 5: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Modeling the Channel
• As the mobile moves through the environment, the field strength varies due to:
  – Free space path loss
  – Long term (slow) fading
  – Short term (fast) fading

[Figure: signal level (dB) versus log(distance), showing path loss, long term fading and short term fading]

Page 6: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Doppler

• Changes in the received carrier frequency due to the relative motion of the mobile to the base station

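• Δf = fd = (v/λ)·cos(θ), where λ is the carrier wavelength and θ is the angle between the direction of motion and the incoming wave
  – For f = 900 MHz, v = 70 mph (112 km/h)
  – fD-max = v/λ = 93.3 Hz

[Figure: mobile travelling a distance D = v·t]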
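As a quick check of the numbers on this slide, the sketch below evaluates the Doppler relation fd = (v/λ)·cos(θ) for the 900 MHz, 112 km/h case. The function name `doppler_shift_hz` and the use of the exact speed of light are illustrative choices, not part of the original deck.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def doppler_shift_hz(carrier_hz, speed_mps, angle_rad=0.0):
    """Doppler shift f_d = (v / wavelength) * cos(angle)."""
    wavelength = C / carrier_hz
    return speed_mps / wavelength * math.cos(angle_rad)


speed = 112.0 / 3.6  # 112 km/h converted to m/s
fd_max = doppler_shift_hz(900e6, speed)  # angle 0 gives the maximum shift
print(f"maximum Doppler shift: {fd_max:.1f} Hz")
```

The result lands close to the 93.3 Hz quoted on the slide; the small difference comes from rounding the speed of light to 3e8 m/s.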

Page 7: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Delay Spread
• Measure of the time distribution of power in the channel impulse response
  – Typical office: 25 ns to 60 ns
  – Large lobbies and atria: 100 ns
  – Warehouse and factory floors: 100 ns to 200 ns
  – Delay spreads are up to 10 microseconds in cellular environments
    • Greater than 3 µs in urban areas
    • 0.5 µs in suburban and open areas

Page 8: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Exponential Power Delay Profile

• If the delay spread of the channel is larger than the symbol interval we will see multiple paths in our channel.
• Leads to inter-symbol interference (ISI).
• Leads to a frequency selective channel.
• Average energy of the channel impulse response follows an exponential power-delay profile.

Page 9: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Coherence Bandwidth

• Maximum frequency bandwidth for which the signals are still considered to be correlated.

• Bc (in Hz) = 1/(2π·τrms) when considering amplitude correlation (correlation coefficient = 0.5)

• τrms is the rms delay spread of the channel
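To get a feel for the scale of the coherence bandwidth, the sketch below evaluates Bc = 1/(2π·τrms) for a few delay spreads taken from the ranges on the Delay Spread slide. The function name and the specific values picked within those ranges are assumptions made for illustration.

```python
import math


def coherence_bandwidth_hz(rms_delay_spread_s):
    """B_c = 1 / (2*pi*tau_rms), for an amplitude correlation of 0.5."""
    return 1.0 / (2.0 * math.pi * rms_delay_spread_s)


# Illustrative delay spreads picked from the ranges quoted in the deck.
examples = [("office", 50e-9), ("factory floor", 200e-9), ("urban cellular", 3e-6)]
for label, tau in examples:
    bc_khz = coherence_bandwidth_hz(tau) / 1e3
    print(f"{label}: tau_rms = {tau * 1e9:.0f} ns, B_c = {bc_khz:.0f} kHz")
```

A signal much wider than Bc sees a frequency selective channel, which is the situation OFDM is designed for.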

Page 10: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Coherence Time

• Maximum time period for which the signals are still considered to be correlated.

• It is used to characterize the time varying nature of the channel.

• Rule of thumb: 9/(16π·fm) < Tc < 0.423/fm

  – fm is the maximum Doppler frequency
  – Correlation coefficient = 0.5
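The rule of thumb can be evaluated for the 93.3 Hz maximum Doppler frequency from the Doppler slide. This is a minimal sketch; the function name is hypothetical and the bounds are the two expressions 9/(16π·fm) and 0.423/fm.

```python
import math


def coherence_time_bounds_s(max_doppler_hz):
    """Return (lower, upper) rule-of-thumb bounds on the coherence time."""
    lower = 9.0 / (16.0 * math.pi * max_doppler_hz)
    upper = 0.423 / max_doppler_hz
    return lower, upper


lower, upper = coherence_time_bounds_s(93.3)
print(f"coherence time between {lower * 1e3:.2f} ms and {upper * 1e3:.2f} ms")
```

For this example the channel can be treated as roughly constant over a few milliseconds, which bounds how often channel estimates need refreshing.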

Page 11: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Link Budget

• A link budget is used to compute the range, transmit power, receiver sensitivity and other requirements of the communication system.

• In free space the path loss is given by the Friis equation:

$$P_r = \frac{P_t G_t G_r \lambda^2}{(4\pi)^2 d^2}$$

• Gt, Gr represent transmit and receive antenna gains. Pt, Pr represent the transmit power and receive power. λ is the wavelength, d is the distance.

Page 12: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Link Budget

• Expressing path loss in dB:

$$P_r(\mathrm{dB}) = P_t(\mathrm{dB}) + G_t(\mathrm{dB}) + G_r(\mathrm{dB}) + 20\log_{10}\!\left(\frac{\lambda}{4\pi}\right) - \gamma \cdot 10\log_{10}(d)$$

• Note: γ is the path loss exponent depending on the environment (2 in free space).

• To compute the SNR at the baseband we need to include thermal noise in the signal bandwidth B, and noise figure of the system NF.

$$P_r(\mathrm{dB}) = -174\ \mathrm{dBm/Hz} + NF + 10\log_{10}(B) + SNR$$
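The two dB-domain relations on this slide (received power from the Friis equation with a path loss exponent, and the required receive power from thermal noise, noise figure, bandwidth and SNR) can be sketched as below. The function names, the 2.5 GHz carrier and the 1 km distance are assumptions chosen only for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def received_power_dbm(pt_dbm, gt_dbi, gr_dbi, freq_hz, dist_m, exponent=2.0):
    """Friis equation in dB with a configurable path loss exponent."""
    wavelength = C / freq_hz
    return (pt_dbm + gt_dbi + gr_dbi
            + 20.0 * math.log10(wavelength / (4.0 * math.pi))
            - exponent * 10.0 * math.log10(dist_m))


def required_power_dbm(noise_figure_db, bandwidth_hz, snr_db):
    """Thermal noise floor (-174 dBm/Hz) plus NF, bandwidth and target SNR."""
    return -174.0 + noise_figure_db + 10.0 * math.log10(bandwidth_hz) + snr_db


# Hypothetical link: 40 dBm transmitter, 15 dBi and -1 dBi antennas, 2.5 GHz, 1 km.
pr = received_power_dbm(40.0, 15.0, -1.0, 2.5e9, 1000.0)
need = required_power_dbm(9.0, 10e6, -3.31)
print(f"received {pr:.1f} dBm, required {need:.1f} dBm, margin {pr - need:.1f} dB")
```

The difference between the two values is the margin available for fading, interference and the other impairments the link budget must absorb.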

Page 13: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Link Budget
• Margin for desired outage taking into account receiver structure and antenna diversity.
  – Standards specify outage probabilities
  – WiMax – 90% in the cell, 75% at the boundary of the cell.
• Compensation factors for other impairments
  – Interference from neighbouring cell
  – Shadow fading, etc.
• Diversity helps achieve the outage probability (or reduces the margin for outage) without increase in transmit power.

Page 14: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Diversity

• Diversity provides the receiver with multiple looks at the transmitted signal.

• Prob(all channels in a fade) << Prob(any 1 channel in a fade)
• Diversity improves link reliability.

[Figure: signal level (dB) versus time for Channel 1, Channel 2 and the combined channel]
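The inequality Prob(all channels in a fade) << Prob(any 1 channel in a fade) can be illustrated with a simple model. The sketch assumes the branches fade independently and with the same probability, which is an idealization added here and not stated on the slide.

```python
def prob_all_in_fade(p_single, branches):
    """Probability that every one of `branches` independent channels is faded."""
    return p_single ** branches


p = 0.1  # hypothetical probability that a single channel is in a fade
for n in (1, 2, 4):
    print(f"{n} branch(es): P(all faded) = {prob_all_in_fade(p, n):.4f}")
```

Under these assumptions each extra branch multiplies the outage probability by p, which is why diversity can reduce the fade margin without raising transmit power.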

Page 15: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Diversity Techniques
• Spatial Diversity
  – Antennas “sufficiently spaced” apart (> ½ wavelength).
  – Will result in an independent channel response and provide another look at the transmitted signal.
• Frequency Diversity
  – Transmit over multiple carrier frequencies.
  – If the frequencies are “sufficiently far” (coherence bandwidth) apart the channel response will be different on the different frequencies.
• Time Diversity
  – Channel is continuously changing.
  – Transmit signals “sufficiently spaced” (coherence time) apart in time so the 2nd transmission “sees” a different channel compared to the first one.
• Polarization Diversity
  – Signals transmitted on two orthogonal polarizations exhibit uncorrelated fading statistics.

Page 16: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


MIMO Systems

[Figure: Tx antennas 1 to M and Rx antennas 1 to N connected through the channel H]

• MIMO systems:
  • Multiple antennas at the transmitter and receiver.
  • 3 types of MIMO systems:
    • STBC MIMO systems
      • Diversity gain.
    • Spatial Multiplexing MIMO systems
      • Capacity/throughput gain.
    • Feedback MIMO systems
      • Higher performance through interference reduction.
• MISO (multiple input single output) systems:
  • STBC can be used with just 1 receive antenna.
  • Provides diversity gain.
  • To achieve array gain, need knowledge of channel at the transmitter (feedback).

Page 17: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Spatial Multiplexing

• A spatial multiplexing MIMO system transmits different data symbols from each transmitter.

• The signals from each transmitter combine over the air and are received by multiple receive antennas.

• SM systems have a rate=M (num transmit antennas). The diversity order depends on the type of encoding and receiver (uncoded SM with ML decoding has diversity order=N (num receive antennas)).

[Figure: three modulators transmit x(t), y(t) and z(t) from the symbol streams x(n), y(n), z(n). Each receive antenna sees a mixture, from r1(t) = a11x(t)+a12y(t)+a13z(t) to r3(t) = a31x(t)+a32y(t)+a33z(t), and the MIMO receiver recovers x(n), y(n), z(n).]

Page 18: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Spatial Multiplexing Receivers

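Zero Forcing receiver:

[Figure: Tx antennas 1 and 2 and Rx antennas 1 and 2, linked by the channel gains h11, h12, h21, h22]

$$y_1 = h_{11}x_1 + h_{12}x_2 + n_1$$
$$y_2 = h_{21}x_1 + h_{22}x_2 + n_2$$

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \end{bmatrix}$$

$$\begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = W\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}^{-1}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$

For ZF receivers $W = H^{-1}$.

Significant increase in noise when the channel is in a deep fade.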
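A minimal sketch of the 2x2 zero forcing receiver follows, written in plain Python so the matrix inverse is visible. The channel gains, symbols and noise samples are made-up values; only the structure (invert H, apply it to the received vector) comes from the slide.

```python
def inv2x2(m):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


# Hypothetical channel, QPSK symbols and noise samples.
H = [[0.9 + 0.2j, 0.3 - 0.4j],
     [-0.5 + 0.1j, 0.8 + 0.6j]]
x = [1 + 1j, -1 + 1j]
n = [0.01 - 0.02j, -0.015 + 0.005j]

y = [s + w for s, w in zip(matvec(H, x), n)]  # y = Hx + n
W = inv2x2(H)                                 # ZF weights: W = H^-1
x_hat = matvec(W, y)                          # estimate = x + W n
noise_out = matvec(W, n)                      # noise after equalization
print(x_hat, noise_out)
```

The estimate equals the transmitted vector plus W·n, so when H is close to singular (a deep fade) the entries of W grow and the noise term is amplified.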

Page 19: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Spatial Multiplexing Receivers

• MMSE MIMO Decoders:
  – Cancels interference and minimizes noise.
  – Minimizes the overall error (mean squared error).

$$E\left[(x - \hat{x})^2\right]$$

$$W_{MMSE} = \sqrt{\frac{M}{E_s}}\left(H^H H + \frac{M}{SNR} I_M\right)^{-1} H^H$$
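The sketch below contrasts MMSE and zero forcing weights for a nearly singular 2x2 channel. It assumes a normalized model y = Hx + n with unit-energy symbols and noise variance `noise_var`, so the weights reduce to (H^H H + noise_var·I)^-1 H^H; this normalization, the helper names and the example channel are assumptions and may differ from the scaling used on the slide.

```python
def herm(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[m[0][0].conjugate(), m[1][0].conjugate()],
            [m[0][1].conjugate(), m[1][1].conjugate()]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def frob_sq(m):
    """Squared Frobenius norm; proportional to the output noise power."""
    return sum(abs(v) ** 2 for row in m for v in row)


def mmse_weights(h, noise_var):
    """W = (H^H H + noise_var * I)^-1 H^H under the normalized model."""
    hh = herm(h)
    gram = matmul(hh, h)
    gram[0][0] += noise_var
    gram[1][1] += noise_var
    return matmul(inv2x2(gram), hh)


H = [[1.0 + 0j, 0.95 + 0j], [1.0 + 0j, 1.0 + 0j]]  # hypothetical, nearly rank deficient
W_zf = inv2x2(H)
W_mmse = mmse_weights(H, 0.1)
print(frob_sq(W_zf), frob_sq(W_mmse))
```

The regularizing term keeps the MMSE weights bounded where the ZF weights blow up, at the cost of leaving some residual interference.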

Page 20: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Spatial Multiplexing Receivers

• Zero-Forcing
• MMSE
• Successive Interference cancellation receivers
• Sphere detectors (sub-optimal Maximum Likelihood)

Page 21: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Transmit Diversity

• Space Time Block Code (STBC)
  – 2 Antenna STBC also known as “Alamouti Code”.
  – Improves BER/SER performance.

[Figure: information source, constellation mapper and Alamouti ST block code feeding two transmit antennas over channels h1 and h2; the STBC decoder produces soft decisions for c1 and c2, each followed by an ML decision]

Symbol period 1: $r_1 = h_1 c_1 + h_2 c_2$
Symbol period 2: $r_2 = h_1(-c_2)^* + h_2(c_1)^*$

Page 22: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


STBC Decoder

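In matrix form the received signal is:

$$\begin{bmatrix} r_1 \\ r_2^* \end{bmatrix} = \begin{bmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2^* \end{bmatrix}$$

$$r = Hc + n$$

Decoder:

$$\hat{c} = H^H r = H^H H c + \tilde{n}$$

$$\begin{bmatrix} \hat{c}_1 \\ \hat{c}_2 \end{bmatrix} = \left(|h_1|^2 + |h_2|^2\right)\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \begin{bmatrix} \tilde{n}_1 \\ \tilde{n}_2 \end{bmatrix}$$

Low complexity decoder. Just 2 complex mults per symbol for a 2 antenna system (and grows linearly with block length/num antennas).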
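The Alamouti decoder can be sketched in a few lines. The code assumes the usual two-period transmission (c1, c2 in the first period, then -c2* and c1* in the second) over channels h1 and h2 that stay constant for both periods; function names and numeric values are illustrative.

```python
def alamouti_receive(c1, c2, h1, h2, n1=0j, n2=0j):
    """Received samples over two symbol periods with one receive antenna."""
    r1 = h1 * c1 + h2 * c2 + n1
    r2 = -h1 * c2.conjugate() + h2 * c1.conjugate() + n2
    return r1, r2


def alamouti_decode(r1, r2, h1, h2):
    """Apply H^H to [r1, r2*]; two complex multiplies per symbol estimate."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    c1_hat = h1.conjugate() * r1 + h2 * r2.conjugate()
    c2_hat = h2.conjugate() * r1 - h1 * r2.conjugate()
    # Dividing by the gain is only done here to compare with the inputs.
    return c1_hat / gain, c2_hat / gain


h1, h2 = 0.8 - 0.3j, -0.2 + 0.9j      # hypothetical channel gains
c1, c2 = 1 + 1j, -1 + 1j              # QPSK symbols
r1, r2 = alamouti_receive(c1, c2, h1, h2)
c1_hat, c2_hat = alamouti_decode(r1, r2, h1, h2)
print(c1_hat, c2_hat)
```

Because H^H H is a scaled identity, the two symbols separate without any matrix inversion, and each one is scaled by |h1|² + |h2|², which is the source of the diversity gain.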

Page 23: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Other MIMO schemes

• Achieving high rate high diversity MIMO systems is an area of active research.

• There are many suboptimal STBC schemes that improve the rate but reduce the diversity order.

• There are also combinations of spatial multiplexing and STBC schemes.

• One such scheme is 2 (or more) Alamouti’s in parallel.

Page 24: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Stacked Alamouti

[Figure: Transmitter for Interference Cancelling STBC. The information source is split into Data Stream 1 and Data Stream 2, each with its own constellation mapper and Alamouti ST block code.]

[Figure: Receiver for Interference Cancelling STBC. Received signals r1 and r2 enter an interference cancellation and ML decision block, which outputs C1 and C2 as Data Stream 1 and Data Stream 2.]

• Interference Cancelling STBC
• 2 Alamouti’s in parallel
• Rate 2 system
• Diversity order = N*(M-K+1)
  – K : co-channel users
  – N : transmit antennas per user.
  – M : receive antennas
• Requires N*(K-1)+1 antennas at the receiver to suppress K-1 interferers.

Page 25: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Orthogonal Frequency Division Multiplexing (OFDM)

[Figure: magnitude versus frequency]

OFDM divides a frequency selective channel into a number of flat fading channels

Page 26: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


OFDM Modulation

[Figure (a), transmitter: QAM mapping, S/P, IFFT (frequency domain to time domain), cyclic prefix, P/S, D/A and RF]

[Figure (b), receiver: RF and A/D, strip cyclic prefix, S/P, FFT (time domain to frequency domain), FEQ, P/S, QAM decoding]

• A QAM symbol is modulated onto each subcarrier

• IFFT/FFT are used for efficient modulation and demodulation

Page 27: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Combating Multipath

• Sampling at instant Ts all channels experience the same channel and there is no ICI

[Figure: an OFDM symbol with its cyclic prefix (CP), multipath components spread over τmax, and the sampling instant Ts]

Constructing the cyclic prefix (CP)
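The effect of the cyclic prefix can be demonstrated end to end with a small example: IFFT, prepend the CP, pass through a multipath channel, strip the CP, FFT, then equalize each subcarrier with a single complex division. The sketch uses a plain DFT so it needs no external libraries; the 8-subcarrier size, CP length and channel taps are assumptions chosen to keep it short.

```python
import cmath


def dft(x, inverse=False):
    """Direct DFT (or inverse DFT with 1/N scaling) of a list of complex values."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


N, CP = 8, 2
symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
channel = [1.0, 0.5, 0.25]  # hypothetical multipath taps; delay spread <= CP

time = dft(symbols, inverse=True)   # one QAM symbol per subcarrier
tx = time[-CP:] + time              # cyclic prefix = copy of the symbol tail

# Linear convolution with the multipath channel (noise-free).
rx = [sum(channel[l] * tx[i - l] for l in range(len(channel)) if i - l >= 0)
      for i in range(len(tx))]

rx_sym = dft(rx[CP:CP + N])         # strip the CP, back to the frequency domain
h_freq = dft(channel + [0.0] * (N - len(channel)))
equalized = [r / h for r, h in zip(rx_sym, h_freq)]  # one-tap FEQ per subcarrier
print([complex(round(v.real, 6), round(v.imag, 6)) for v in equalized])
```

As long as the channel memory fits inside the CP, the linear convolution looks circular over the retained samples, so each subcarrier sees a single flat gain and the transmitted symbols are recovered exactly in the noise-free case.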

Page 28: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


MIMO and OFDM

• MIMO – Multiple Input Multiple Output Communication System. Employs multiple antennas at both transmitter and receiver.

• OFDM – Orthogonal Frequency Division Multiplexing. Breaks up a broadband channel into many parallel narrowband channels (subcarriers).

• MIMO-OFDM – A Combination of MIMO and OFDM. Appears like many parallel MIMO systems on orthogonal subcarriers.

Page 29: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


MIMO-OFDM System

[Figure: OFDM transmitters 1 to N send through a rich scattering environment to OFDM demodulators 1 to N, which feed a MIMO decoder]

Each transmitter is an independent OFDM modulator.

The source symbols could be space-time block coded or just QAM modulated for spatial multiplexing.

Each receiver is an OFDM demodulator combined with a MIMO decoder to invert the channel on each subcarrier and extract the source symbols.


Page 31: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


802.16/802.16e

• The 802.16 WirelessMAN standard includes requirements for operation in:
  – Line Of Sight (LOS), 10-66 GHz for fixed wireless systems.
  – Non Line Of Sight (NLOS), <11 GHz for fixed wireless systems.

• 802.16e (Mobile WiMax) adds enhancements for mobility in the <11 GHz licensed and unlicensed bands.
  – Operation in mobile mode is limited to licensed bands between 2 GHz and 6 GHz.

Page 32: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Scalable OFDMA parameters

Parameters                     | Values
System bandwidth (MHz)         | 1.25   | 5      | 10    | 20
FFT size (NFFT)                | 128    | 512    | 1024  | 2048
Sampling frequency (Fs, MHz)   | 1.4    | 5.6    | 11.2  | 22.4
Sample time (1/Fs, ns)         | 714.28 | 178.57 | 89.28 | 44.64
Subcarrier spacing             | 10.94 kHz
Useful symbol time             | 91.4 µs
Guard interval                 | 11.4 µs
OFDMA symbol time              | 102.9 µs
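The point of scalable OFDMA is that the FFT size grows with the bandwidth so the subcarrier spacing, and therefore the symbol timing, stays fixed. The sketch below recomputes the derived rows of the table from Fs and NFFT. The guard fraction of 1/8 is an assumption inferred from the ratio of the guard interval to the useful symbol time in the table.

```python
def ofdma_parameters(fs_hz, nfft, guard_fraction=1 / 8):
    """Return (subcarrier spacing, useful time, guard time, symbol time)."""
    spacing = fs_hz / nfft
    useful = 1.0 / spacing
    guard = useful * guard_fraction
    return spacing, useful, guard, useful + guard


configs = [(1.25, 128, 1.4e6), (5, 512, 5.6e6), (10, 1024, 11.2e6), (20, 2048, 22.4e6)]
for bw_mhz, nfft, fs in configs:
    spacing, useful, guard, total = ofdma_parameters(fs, nfft)
    print(f"{bw_mhz} MHz: spacing {spacing / 1e3:.2f} kHz, useful {useful * 1e6:.1f} us, "
          f"guard {guard * 1e6:.1f} us, symbol {total * 1e6:.1f} us")
```

All four bandwidths give the same spacing and symbol time, matching the single values listed in the lower half of the table.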

Page 33: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Link Budget

Parameter | Downlink | Uplink
Transmit power | 10 W = 40 dBm (max = 20 W) | 200 mW = 23 dBm (max = 200 mW)
Antenna height | 32 m | 1.5 m
Antenna gain | 15 dBi (BS) | -1 dBi (mobile)
EIRP | 55 dBm (approx) | 22 dBm
# occupied subcarriers | 840 out of 1024 | 840 out of 1024
Power/subcarrier | 28 dBm | 3.44 dBm
Noise figure | 9 dB (at mobile) | 4 dB (at BS)
Total margin for interference, shadow fading, .. (75% coverage at cell edge, 90% overall) | 20 dB | 20 dB
BS to BS distance | 2.8 km | 2.8 km
SNR required (modulation QPSK 1/8, repetition code = 4; BER = 10^-6 after FEC) | -3.31 dB | -2.5 dB
Rx sensitivity | -100.7 dB | -111.1 dB
Max allowable path loss | 136.4 dB | 133 dB

Page 34: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Time Division Duplexing

• 802.16e can be deployed in TDD and FDD environments.
• Initial certification profiles are only for TDD.
• The DL subframe and UL subframe lengths are adjustable.
• TDD assures channel reciprocity.

[Figure: frames (j-2) to (j+2); each frame is split into a downlink subframe and an uplink subframe with an adaptive boundary]

TTG : Transmit-Receive transition gap

RTG : Receive-Transmit transition gap

Page 35: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


OFDMA Frame Structure

DL-MAP – Downlink MAP : downlink allocations
UL-MAP – Uplink MAP : uplink allocations
FCH – Frame control header : contains information about the DL-MAP

[Figure: OFDMA frame, subchannel logical number versus OFDMA symbol number. The downlink (DL) subframe holds the preamble, FCH, DL-MAP, UL-MAP and the DL bursts (SS1, SS2, SS3, SS4, broadcast, multicast, and SS1 from BS2). After the TTG, the uplink (UL) subframe holds the ranging subchannel and UL bursts SS1 to SS4. After the RTG the next frame starts with its preamble, FCH and DL-MAP.]

Page 36: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Data rates for SIMO/MIMO configurations

Source: WiMax Forum

64 QAM with 5/6 CTC

Page 37: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Baseband Transmission Model

• OFDM receiver provides estimates of
  – Channel hn,i(t)
  – Frequency offset 0
  – Sample timing T'
  – OFDM symbol timing

[Figure: symbols ai,k enter the OFDM transmitter, producing s(t). The resulting channel hi(t) combines the channel hn,i(t) with a timing delay d(t-eT'); a frequency offset term and noise n(t) are added to give r(t), which the ADC samples at T' to produce r(n) for the inner receiver and outer receiver.]

Page 38: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Generic OFDM Transmitter

• Figure shows a generic MIMO OFDM Tx
  – MIMO not an element of 802.11a, but it is in 802.11n, 3GPP-LTE and 802.16e

[Figure: MAC, source, coding (e.g. LDPC), space-time encoder and beamforming, followed by one chain per antenna: insert pilots, IFFT, append CP, CFR, DUC, DPD, DAC, RF, PA]

Page 39: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


OFDM Receiver Architecture

• Figure illustrates architecture for generic OFDM Rx
• Details will vary as a function of
  – Packet-based versus broadcast transmission
  – Existence of a preamble (or not) in the waveform

[Figure: ADC, DDC, sample clock adjustment (through a DAC), coarse frequency offset correction, symbol timing, CP removal, FFT, extract pilots, fine sample clock adjustment, fine frequency offset adjustment, frequency domain equalizer, channel estimation, power estimation, extract preamble, channel decoding (e.g. LDPC), medium access controller, to/from network]


Page 41: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Digital Receiver Architecture: Abstracted Architecture

• Common model of abstraction for digital receiver is inner/outer receiver

Receiver Abstraction

Digital IF Processing
  – Up-Conversion
  – Down-Conversion
  – Channelizer
  – Fast AGC

Inner Receiver
  – Frequency Offset Estimation/Correction
  – Sample Clock Offset Correction
  – Channel Estimation/Equalization
  – Frame detection
  – AGC
  – Successive Interference Cancellation
  – Space-Time-Coding
  – IFFT/FFT
  – Per sub-carrier processing
    • Beamforming
    • QRD-RLS

Outer Receiver
  – Channel Coding
    • LDPC
    • TPC
    • CTC
    • Viterbi
    • (De-) Interleave

Control, Protocol and Link Layer processing
  – Medium Access Control (MAC)
  – Link Layer Processing
  – System Initialization, Control and Monitoring
  – Application
  – Ethernet
  – PCI Express
  – SRIO
  – CPRI
  – OBSAI

Page 42: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Receiver Abstraction and Projection on to Platform FPGA

Receiver function: Digital IF Processing
  Characteristics: MAC intensive
  FPGA platform: SX
  Comments: DSP48 main requirement

Receiver function: Inner Receiver
  Characteristics: MAC intensive; some functions LUT intensive (CORDIC in QRD-RLS); FFT processing for OFDM; correlation processing for timing; per-carrier complexity processing (MIMO-OFDM)
  FPGA platform: SX/LX
  Comments: DSP48 leveraged FFT; FPGA fabric for CORDIC FFT

Receiver function: Outer Receiver
  Characteristics: Symbol rate tasks; channel coding
  FPGA platform: LX
  Comments: ACS/ACSO dominated by low bit precision add/multiplexors; good match for fabric; lots of memory required

Receiver function: Control/Protocol
  Characteristics: Gigabit connectivity; Linux OS; “heavy” tasks; TCP/IP
  FPGA platform: FX
  Comments: Embedded PPC used; Rocket IO for PCI Express, SRIO

[Figure: receiver abstraction mapped onto the SX, SX/LX, LX and FX devices; per-carrier processing is annotated with the number of sub-carriers and the TX and RX antenna counts N]

FPGA product portfolio tailored for various processing tasks in communications receiver

Page 43: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Digital Frontend

• Digital upconversion (downconversion)
• Crest factor reduction
• Digital pre-distortion

Page 44: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


The Platform

Connectivity
  – Serial Gigabit
  – OBSAI/CPRI
  – Proprietary serial backplane
  – Inter-chip connectivity

Embedded Software
  – MAC (Media Access)
  – Decision oriented tasks
  – CORBA
  – RTOS
  – NBAP
  – SCA (JTRS radios)

Logic & IO
  – OBSAI/CPRI
  – SRIO
  – AD/DA interface
  – EMIF

High Performance Processing
  – High MIPs tasks
  – Radio PHY
  – Supported by embedded DSP tiles, distributed memory, block memory and logic fabric

[Figure: platform FPGA containing DUC, DDC, CFR, DPD, RACH, searcher, OFDM PHY, TCC and MIMO blocks, with SRIO and EMIF interfaces and DAC/ADC connections]

Page 45: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Virtex-4/5 FPGA Architecture: High-Level View

• FPGA family with 3 members tailored for specific classes of processing
  – SX: DSP
  – LX: Logic centric
  – FX: Full featured

• Embedded PowerPC hard IP

• Giga-bit serial connectivity

• DSP processing tiles “DSP48”

Page 46: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Virtex-5 FPGA Platform

• 2 slices per CLB, 4 LUTs per slice
• Can be configured as a shift register
• Can be configured as distributed memory (RAM)

Page 47: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Virtex-4 DSP48 Slice

• Arithmatica Parallel Counter: 20% faster performance and uses less area
• Integrated cascade routing enables scalable performance
• Arithmatica A+Adder: 20% faster than other implementations
• Pipeline registers enable 500 MHz performance
• Scalable 500 MHz performance not possible using standard cell libraries and standard cell design flow

Page 48: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


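[Figure: DSP48 slice datapath. 18-bit inputs A and B (with the BCIN/BCOUT cascade) feed a pipelined multiplier with 3-delay latency (z-3); the 36b product is sign extended to 48b. Multiplexers X, Y and Z select among the product, the 48-bit C input, ZERO, the P feedback and PCIN (optionally wire-shifted right by 17b). A 48-bit adder/subtracter with CIN and SUB drives the P register and PCOUT to the adjacent DSP48 tile. A second panel shows a wide multiply split into LS word and MS word.]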

Page 49: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Pipelined Complex 18x18 MPY

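[Figure: four cascaded DSP48 slices S1 to S4 (sn = slice n). The 18-bit inputs Ar, Ai, Br and Bi are multiplied pairwise, the 36-bit products are sign extended, and the 48-bit cascade accumulates them (with one subtraction) into the registered outputs Pr and Pi.]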

Page 50: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Wide Filters At Full Speed Within the Virtex-4 DSP Slice Column

• Systolic N-tap FIR
  – Scalable N-levels deep implementation
  – N-levels deep at 500 MHz performance

• Uses Integrated Pipeline Registers to Synchronize Filter Inputs

• Utilizes Input and Output Cascade Routing

• Build a massively parallel 512-tap FIR filter in a single device, achieving 256 GMACCs/s performance

• An equivalent implementation would consume 444 embedded multipliers and 77,008 LCs and would only achieve ½ the performance

Page 51: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Xilinx FFT IP (4)

• FFT fully utilizes FPGA arithmetic hardware resources
• FFT viewed as a recursion using a butterfly kernel

[Figure: butterfly kernel built from complex adders CADD1 and CADD2 and a complex multiplier CMPY]

Phase factors: e^(-j2πk/N)

• CADD{1|2}: complex adder
• CMPY: complex multiplier
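The view of the FFT as a recursion over a butterfly kernel can be sketched as a radix-2 decimation-in-time FFT. The mapping of the two additions and the multiplication to the names CADD1, CADD2 and CMPY is an interpretation of the slide's figure, and the code is a software illustration, not the Xilinx IP.

```python
import cmath


def butterfly(a, b, w):
    """One radix-2 butterfly: a complex multiply and two complex adds."""
    t = w * b              # CMPY: rotate by the phase factor
    return a + t, a - t    # CADD1 and CADD2


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # phase factor e^(-j2*pi*k/N)
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out


samples = [complex(i, -i) for i in range(8)]
spectrum = fft(samples)
print([complex(round(v.real, 6), round(v.imag, 6)) for v in spectrum])
```

In hardware the same kernel is either reused iteratively (small footprint) or replicated into an array of butterflies (high throughput), which is the area/throughput tradeoff discussed later in the deck.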

Page 52: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Virtex-4 DSP Slice
• DSP slice key for implementing high-performance arithmetic
• Embedded 18x18 MPY and 48b adder
  – Butterfly phase rotator
  – Cross-addition

Page 53: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Butterfly CMPLX MPY

• Complex MPY used in FFT butterfly

• Optimized to employ Virtex-4 DSP Slice
  – 4 and 3 MPY option

• Complex MPY available as IP module†

[Figure: DSP slices 1 to 4 take inputs Ar, Ai, Br, Bi and produce Pr and Pi]

Pr + jPi = (Ar + jAi) x (Br + jBi)

† Available: 6.2i IP Update 2
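The 4 and 3 MPY options refer to two ways of forming Pr + jPi = (Ar + jAi) x (Br + jBi). The sketch below shows the direct four-multiplier form next to one common three-multiplier rearrangement; the specific rearrangement is a textbook identity chosen for illustration and is not necessarily the one used in the Xilinx IP.

```python
def cmul4(ar, ai, br, bi):
    """Direct form: four real multiplies, two add/subtracts."""
    return ar * br - ai * bi, ar * bi + ai * br


def cmul3(ar, ai, br, bi):
    """Three real multiplies at the cost of three extra pre-additions."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2


for operands in [(3, -4, 5, 7), (-12345, 6789, 131071, -1), (0, 1, 0, 1)]:
    print(operands, cmul4(*operands), cmul3(*operands))
```

The three-multiplier form saves one DSP multiplier per complex product, but its pre-adders produce operands one bit wider than the inputs, which matters when the inputs already fill an 18-bit multiplier port.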

Page 54: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Performance/Parallelism/Area
• FPGA: highly parallel computing machine
• Achieve performance using functional unit parallelism
• Area/throughput tradeoff delivered via Xilinx IP library
• Butterfly array to produce high-performance FFT processor
• High computation rate using (possibly) hundreds of DSP slices
  – Allocate resources as appropriate to meet system requirements
• Large memory bandwidth using multi-port memory constructed from BRAMs

Mem read BW: 320 x 36 x 500e6 = 5.76 Tera-bps

Page 55: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


FFT Architecture
• For small number of carriers and modest data rates single butterfly (I)FFT is probably suitable
  – Small FPGA footprint

[Figure: iteration engine with input data, two switches, Data RAM 0, Data RAM 1, a phase factor ROM and output data]

Page 56: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Block boundary detection/Fine timing acquisition

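[Figure: the incoming SAMPLES and the KNOWN SEQUENCE pass through tapped delay lines (z^-1) and are multiplied with conjugation ()*. The squared magnitude | |^2 gives the timing estimate; the argument (arg), averaged (ave) over half an OFDM block, gives the frequency estimate. The preamble is 1 OFDM block of repeated data.]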

F. Tufvesson, O. Edfors, M. Faulkner, “Time and Frequency Synchronization for OFDM using PN-Sequence Preambles”, VTC-1999/Fall, vol 4, pp.2203-7, New Jersey, 1999.
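The timing part of this scheme amounts to sliding a known sequence across the received samples and picking the offset where the squared correlation magnitude peaks. The sketch below shows only that cross-correlation step under simplified, noise-free conditions; the 15-chip sequence, the gain and phase applied to it, and the function name are assumptions, and the frequency estimate from the cited paper is not reproduced.

```python
import cmath


def correlate_peak(samples, known):
    """Return (offset, metric) maximizing |sum(samples * conj(known))|^2."""
    best_idx, best_metric = 0, -1.0
    for start in range(len(samples) - len(known) + 1):
        acc = sum(samples[start + k] * known[k].conjugate()
                  for k in range(len(known)))
        metric = abs(acc) ** 2
        if metric > best_metric:
            best_idx, best_metric = start, metric
    return best_idx, best_metric


# Hypothetical +/-1 preamble, received with unknown gain and phase at offset 5.
known = [complex(v) for v in (1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1)]
rotation = 0.7 * cmath.exp(0.3j)
samples = [0j] * 5 + [rotation * v for v in known] + [0j] * 6

offset, metric = correlate_peak(samples, known)
print(offset, round(metric, 3))
```

Using the squared magnitude makes the timing decision insensitive to the unknown carrier phase, which is why the figure takes | |^2 before the timing estimate.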

Page 57: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Fine-timing acquisition using a clipped correlator

[Figure: System Generator model of the clipped correlator. A bank of time-multiplexed 1-bit correlators, each built from a ROM, a register, an add/subtract MAC and delay blocks, is driven by an FSM that generates the data address, coefficient address and load signals from the baud clock.]

• Bank of correlators: 10 time multiplexed correlators
• Each 1-bit correlator: 10 slices
• Total for clipped correlator: 589 slices
• Full precision correlators: 32 embedded multipliers, 896 flipflops

Page 58: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


QRD

• One of the popular methods of matrix inversion is based on QRD:

$$H = QR$$

• Q is unitary and R is upper triangular
• A unitary matrix has a trivial inverse:

$$Q^{-1} = Q^H$$

• An upper triangular matrix can be inverted by back-substitution, so

$$H^{-1} = R^{-1} Q^H$$

Page 59: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Givens Rotations

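• For a 2x1 vector of real numbers

$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sqrt{a^2+b^2} \\ 0 \end{bmatrix}, \qquad c = \frac{a}{\sqrt{a^2+b^2}},\quad s = \frac{b}{\sqrt{a^2+b^2}}$$

• For a NxM matrix, repeat the process 2 cells at a time.

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$$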
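A software sketch of QR decomposition by Givens rotations follows, for a real square matrix. It zeroes the sub-diagonal entries one at a time (bottom of each column first) and applies the same rotations to an identity matrix to accumulate the transpose of Q. Function names and the example matrix are illustrative, and the code models the arithmetic only, not the systolic array timing.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def qr_givens(matrix):
    """Return (Q^T, R) for a real square matrix using Givens rotations."""
    n = len(matrix)
    r = [row[:] for row in matrix]
    qt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        for row in range(n - 1, col, -1):      # zero entries below the diagonal
            c, s = givens(r[col][col], r[row][col])
            for m in (r, qt):                  # same rotation on R and on Q^T
                for j in range(n):
                    top, bot = m[col][j], m[row][j]
                    m[col][j] = c * top + s * bot
                    m[row][j] = -s * top + c * bot
    return qt, r


A = [[2.0, 1.0, 1.0], [1.0, 3.0, 2.0], [1.0, 0.0, 4.0]]
qt, r = qr_givens(A)
for row in r:
    print([round(v, 6) for v in row])
```

Each rotation touches only two rows, which is what lets the computation be spread over a regular array of identical cells; the rotation itself is commonly computed with CORDIC in hardware.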

Page 60: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


Systolic Arrays

• Structured arrays with identical cells. Usually a “boundary” cell and an “internal” cell for the QRD process.

[Figure: triangular systolic array made of boundary cells and internal cells]

1. The boundary cell generates the rotations.

2. Internal cell applies the rotations to all the cells in the row.

3. The systolic array in this figure can handle any matrix up to 3x3.

Page 61: WIRELESS COMMUNICATIONS From Systems to Silicon Raghu Rao Wireless Systems Group, Xilinx Inc


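Triangularization mode
• For QRD of up to a 3x3 matrix we need 3 boundary cells and 3 internal cells.
• Boundary cells calculate rotation vectors and internal cells store them.
• Data is fed column-wise into the systolic array.
• This may have to be staggered depending on the pipelining delays through the boundary cell and internal cell.

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{bmatrix} \to \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$$

[Figure: the columns (a11, a21, a31), (a12, a22, a32) and (a13, a23, a33) are fed into the three columns of the systolic array]

The rotation factors for zeroing out cell A(2,1) are stored in cell A(1,2), etc.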


Q-matrix computation mode

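$$Q^H A = R, \qquad Q^H I = Q^H$$

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & c_{32} & s_{32} \\ 0 & -s_{32} & c_{32} \end{bmatrix} \begin{bmatrix} c_{31} & 0 & s_{31} \\ 0 & 1 & 0 \\ -s_{31} & 0 & c_{31} \end{bmatrix} \begin{bmatrix} c_{21} & s_{21} & 0 \\ -s_{21} & c_{21} & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}$$

[Figure: the columns of the identity matrix, $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$, are fed into the systolic array that holds the stored rotations; the outputs are labeled first, second and third column of the Q matrix. The slide also shows the cell update equations in terms of c, s, x and z, and the labels $Q^H$, R and A.]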
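The two operating modes can be mimicked in software as follows. This is only a behavioral sketch for real matrices, not a model of the cell timing: `triangularize` records each rotation as it is computed (the role of the boundary cells, with internal cells storing the factors), and `apply_rotations` replays the stored rotations on new input, here the identity matrix. Both function names are hypothetical.

```python
import numpy as np

def triangularize(A):
    """Triangularization mode: return R and the stored rotations (i, j, c, s)."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    rotations = []
    for col in range(n):
        for row in range(col + 1, m):
            r = np.hypot(R[col, col], R[row, col])
            c, s = (1.0, 0.0) if r == 0.0 else (R[col, col] / r, R[row, col] / r)
            top, bot = R[col, :].copy(), R[row, :].copy()
            R[col, :] = c * top + s * bot
            R[row, :] = -s * top + c * bot     # zeroes R[row, col]
            rotations.append((col, row, c, s))
    return R, rotations

def apply_rotations(rotations, X):
    """Q-matrix mode: replay the stored rotations on X (for example the identity)."""
    Y = np.array(X, dtype=float)
    for i, j, c, s in rotations:
        top, bot = Y[i, :].copy(), Y[j, :].copy()
        Y[i, :] = c * top + s * bot
        Y[j, :] = -s * top + c * bot
    return Y

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 0.0, 4.0]])
R, rotations = triangularize(A)
Q_h = apply_rotations(rotations, np.eye(3))    # Q^H I = Q^H
print(np.allclose(Q_h @ A, R))
```

Feeding the identity through the stored rotations yields $Q^H$; its transpose (conjugate transpose in the complex case) is Q.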


Agenda

• Introduction to Wireless communications
  – Systems design and considerations
    • The wireless environment
    • Link budget
    • MIMO and OFDM Systems
  – High level view of wireless communication systems
    • Mobile WiMax, an example of wireless comm system
    • Hardware/software partitioning
    • PHY/MAC etc.

• The Platform FPGA
  – Overview of FPGAs and FPGA tools
  – Building DSP sub-systems on FPGAs
  – Digital baseband

• FPGA tools and design methodology


FPGA Tools for DSP Systems Design

• Higher level tools are raising the level of abstraction.

• Allows non-hardware engineers (algorithm designers) to get a first look at hardware.

• System Generator
  – Simulink to Hardware

• C-to-Gates tools
  – C or “higher” level languages to gates


System Generator

System Level Modeling & Simulation Framework

Work in the language of your problem

HDL

C


HDL Simulation Flow

[Figure: DSP Development Flow / System Generator Flow]

1. Develop Algorithm & System Model

2. Automatic Code Generation

3. Xilinx Implementation Flow

Download to FPGA

Figure labels: Simulink MDL; RTL VHDL & Cores; HDL Test Bench; Test Vectors; Bitstream; FPGA


Configurable MIMO-OFDM Transmitter

[Figure: System Generator model of the transmitter. Inputs: DataIn, DataEnable, DataDone. Outputs: RealOut1 to RealOut4 and ImagOut1 to ImagOut4. Other blocks: Clock Generator (SampleClk, BaudClk). Signal types shown: double, Bool, Fix_16_10, UFix_6_0.]

Processing blocks in the model:

• Packet Controller

• Packetization and configurable STBC encoding

• Pilot insertion and data loading

• Time shared FFT across antennas

• Add Cyclic Extension/Block Shaping

• Spatial Demultiplexing and Interpolation

Resource sharing (folding factor): a ratio of system clock rate to symbol rate > 8 is needed for a 4 transmit antenna system.


MIMO Receiver Architecture

[Figure: receiver block diagram. The front end processes samples at the sample clock rate; the remaining blocks process samples at the system clock rate.]

• Per receive antenna (Rx 1 to Rx 4): Packet Detection, Strip CP, Input FIFO, FFT, Output FIFO

• Synchronization: Combine PD, Block Boundary Detection, Coarse CFO estimate, CFO estimator, Pilot based CFO estimator, CFO Compensator

• Channel Estimator

• MIMO Decoder Matrix (MMSE, etc), MIMO Decoder FIFO, MIMO Decode, Soft Decisions

• Packet Controller (Preamble, Payload)


Fine-timing acquisition using a clipped correlator

[Figure: System Generator model of the clipped correlator. Blocks: FSM (Data Addr, Coef Addr, load), coefficient ROM, MAC, Register, Delay and AddSub blocks, and correlator instances C1 to C7 with ports xn, DAddr, CAddr, LD, yn, xnz. Input: x (BaudClk). Output: y.]

Bank of correlators

1-bit correlator

10 time-multiplexed correlators

Each 1-bit correlator: 10 slices

Total for clipped correlator: 589 slices

Full-precision correlators: 32 embedded multipliers, 896 flip-flops
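The idea behind a clipped (1-bit) correlator can be sketched as below. This is an illustrative model for real-valued samples, not the slide's FPGA design: both the received samples and the reference are reduced to their sign, so every product is +1 or -1 and the hardware needs only XNOR-style logic and adders rather than multipliers. The function name `clipped_correlate` and the test signal are assumptions.

```python
import numpy as np

def clipped_correlate(x, ref):
    """Correlate sign-only samples against a sign-only reference (real-valued case)."""
    xs = np.where(np.asarray(x) >= 0, 1, -1)
    rs = np.where(np.asarray(ref) >= 0, 1, -1)
    # Each product is +/-1, so hardware can replace multipliers with XNOR and add/sub.
    return np.correlate(xs, rs, mode="valid")

# Example: a 32-sample reference buried at offset 40 in otherwise random samples.
rng = np.random.default_rng(1)
ref = rng.standard_normal(32)
x = np.concatenate([rng.standard_normal(40), ref, rng.standard_normal(40)])
corr = clipped_correlate(x, ref)
peak = int(np.argmax(corr))
print(peak, corr[peak])
```

At the true offset every sign agrees, so the output reaches the reference length; elsewhere the agreements and disagreements largely cancel.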


MIMO-OFDM Receiver

[Figure: System Generator model of the receiver. Inputs: RealIn1 to RealIn4, ImagIn1 to ImagIn4, Reset. Outputs: SoftDecReal1 to SoftDecReal4, SoftDecImag1 to SoftDecImag4, PacketDetect, ValidOut. Internal signals include the 4x4 channel estimates Ch_tx1rx1 to Ch_tx4rx4 and the decoder weights wreal_i_j / wimag_i_j for i, j = 1 to 4. Other blocks: Input Buffer, Clock Generator (SampleClk, BaudClk).]

Processing blocks in the model:

• Packet Detection

• Fine Timing Acq

• Cyclic prefix removal

• Carrier Frequency Offset Correction

• FFT

• Channel Estimation

• Weight Matrix Computation

• MIMO Decoder

• Output FIFO


Channel Estimation

[Figure: System Generator model of the channel estimator. Inputs: Rxreal1 to Rxreal4, Rximag1 to Rximag4, ValidData, Addr, ChEstPilots, ReadAddr. Outputs: Chreal1 to Chreal16 and Chimag1 to Chimag16. Blocks: sixteen estimator blocks ChEst Tx1-Rx1 to ChEst Tx4-Rx4, Training Symbols Tx1 to Tx4, Single Port RAMs, constant multipliers (x 0.3535), multiplexers, counters and delays. Signal types shown: double, Bool, Fix_16_10, Fix_32_20, Fix_2_0, UFix_6_0.]

Figure annotations:

• Channel Estimation Pilots for Tx1

• Channel Estimation Pilots for Tx4

• 4x4 Channel Estimation Memory

• Control Signals

• Input FIFO


Packet Detection

Schmidl and Cox algorithm for Packet Detection and coarse carrier frequency offset estimation.

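T. M. Schmidl, D. C. Cox, “Low Overhead Low Complexity Synchronization for OFDM”, ICC 1996, Vol 3, pp 1301-1306.

[Figure: block diagram. The received signal r(n) is delayed by D samples ($z^{-D}$) and multiplied by a conjugate to form the correlation c(n) (block C); the power p(n) is computed in block P; the metric m(n) is formed from $|c(n)|^2$ and $(p(n))^2$.]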

Identical halves of 1 OFDM symbol
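A minimal software sketch of this detector is shown below, assuming a preamble made of two identical halves of D samples and the metric $m(n) = |c(n)|^2 / p(n)^2$, with p(n) taken as the energy of the second half-window. The function name `schmidl_cox_metric` and the test signal are hypothetical; a hardware version would use sliding-window sums instead of recomputing each window.

```python
import numpy as np

def schmidl_cox_metric(r, D):
    """Timing metric m(n) = |c(n)|^2 / p(n)^2 for two identical halves of D samples."""
    r = np.asarray(r, dtype=complex)
    n_out = r.size - 2 * D + 1
    m = np.empty(n_out)
    for n in range(n_out):
        first = r[n:n + D]
        second = r[n + D:n + 2 * D]
        c = np.sum(np.conj(first) * second)      # correlation between the halves
        p = np.sum(np.abs(second) ** 2)          # energy of the second half
        m[n] = np.abs(c) ** 2 / max(p, 1e-12) ** 2
    return m

# Example: unit-power noise, then a preamble with two identical halves at offset 50.
rng = np.random.default_rng(3)
D = 32
half = rng.standard_normal(D) + 1j * rng.standard_normal(D)
before = rng.standard_normal(50) + 1j * rng.standard_normal(50)
after = rng.standard_normal(50) + 1j * rng.standard_normal(50)
r = np.concatenate([before, half, half, after])
m = schmidl_cox_metric(r, D)
print(round(float(m[50]), 6))
```

The metric is close to 1 when the two half-windows line up with the repeated halves and stays small over uncorrelated samples, so a threshold on m(n) can flag a packet.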


Two Branch CFO estimation using Schmidl and Cox algorithm

[Figure: System Generator model. Inputs: RealIn 1, ImagIn 1, RealIn 2, ImagIn 2, BaudClk, Rst. Outputs: CorrMetric_real, CorrMetric_imag, AvePwr. Blocks: Complex Multiply, Complex Sliding Window Averager, Magnitude-Squared, Sliding Window Averager, Delay ($z^{-32}$), AddSub, Slice and Reinterpret.]

Combine the metric from both Antennas

Carrier Frequency Offset causes a linearly increasing rotation in the time domain

jYe Y


Carrier Frequency Offset Estimation

• Pre-FFT
  – Uses a dedicated preamble or symbol for CFO estimation

• Post-FFT using channel estimation pilots
  – Uses channel estimation training symbols

• Post-FFT CFO Tracking
  – Needs continuous pilots during payload symbols

• CFO Estimation using Cyclic Prefix
  – Works well when you have a lengthy cyclic prefix
  – Examples: WiMax, 3GPP-LTE, DVB-T/H
  – Does not need preamble or pilot support


Pre-FFT Carrier Frequency Offset Estimation

[Figure: System Generator model. Inputs: RealIn1, ImagIn1, RealIn2, ImagIn2, Baud_clk, Rst, BBD. Output: CFO_Est. Blocks: Packet Detection (CorrMetric_real, CorrMetric_imag, AvePwr), Rising edge detector, Register, Delay ($z^{-24}$, $z^{-14}$), Truncate, Convert, CORDIC ATAN (inputs x, y; outputs mag, atan), constant multiplier (x 0.003906).]

The angle of the correlation metric is proportional to the carrier frequency offset.

Right-size the number of bits before the CORDIC operation.

CORDIC ATAN from the Xilinx Math library calculates the angle.

ˆ

22sN
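The relation between the correlation angle and the CFO can be illustrated with a short noise-free sketch. The assumptions here are not from the slide: the CFO is expressed in cycles per sample, the preamble has two identical halves separated by D samples, and `np.angle` stands in for the CORDIC ATAN block. The function name `estimate_cfo` is hypothetical.

```python
import numpy as np

def estimate_cfo(r, start, D):
    """Estimate the CFO (cycles per sample) from two identical halves D samples apart."""
    first = r[start:start + D]
    second = r[start + D:start + 2 * D]
    corr = np.sum(np.conj(first) * second)
    # A CFO of f cycles/sample rotates sample n by 2*pi*f*n, so the two halves
    # differ by a constant phase of 2*pi*f*D.
    return np.angle(corr) / (2.0 * np.pi * D)

# Example: repeated half-symbol with a known offset (illustrative values).
rng = np.random.default_rng(4)
D = 32
f_true = 0.004
half = rng.standard_normal(D) + 1j * rng.standard_normal(D)
n = np.arange(2 * D)
r = np.tile(half, 2) * np.exp(2j * np.pi * f_true * n)
f_hat = estimate_cfo(r, 0, D)
print(round(float(f_hat), 6))
```

Because the angle wraps at plus or minus pi, this estimator is unambiguous only for offsets smaller in magnitude than 1/(2D) cycles per sample.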


Post-FFT CFO Estimation and tracking

Location of channel estimation training symbols for Antenna 1 for a 2 antenna MIMO system

A subset of channel estimation training symbols is used for CFO estimation.

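[Figure: angular rotation on symbol 1 and angular rotation on symbol 2; the phase difference $\theta(k)$ on subcarrier $k$ is proportional to the CFO.]

$$\hat{\epsilon} = \frac{\operatorname{mean}(\theta(k))}{2\pi \left(1 + N_{CP}/N_s\right)}$$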

CFO causes a linear rotation every sample in the time domain.

CFO causes a constant rotation on all subcarriers in the frequency domain.

This rotation increases from OFDM symbol to symbol and can be used to estimate CFO.
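The symbol-to-symbol rotation can be checked numerically with the sketch below. It rests on several assumptions that are not stated on the slide: the CFO `eps` is normalized to the subcarrier spacing, the same training symbol is sent in two consecutive OFDM symbols, the channel is ideal and noise-free, and the FFT size `n_fft` and cyclic prefix length `n_cp` are illustrative values. Under these assumptions the phase advance between the two symbols is $2\pi\,\mathrm{eps}\,(1 + n_{cp}/n_{fft})$ on every subcarrier.

```python
import numpy as np

def post_fft_cfo(Y1, Y2, n_fft, n_cp):
    """Estimate the normalized CFO from the phase change between two OFDM symbols."""
    theta = np.angle(Y2 * np.conj(Y1))           # per-subcarrier rotation theta(k)
    return np.mean(theta) / (2.0 * np.pi * (1.0 + n_cp / n_fft))

# Example: two identical training symbols with cyclic prefix and a known CFO.
rng = np.random.default_rng(2)
n_fft, n_cp, eps = 64, 16, 0.05
pilots = rng.choice([1, -1], n_fft) + 1j * rng.choice([1, -1], n_fft)
sym = np.fft.ifft(pilots)
tx = np.tile(np.concatenate([sym[-n_cp:], sym]), 2)
n = np.arange(tx.size)
rx = tx * np.exp(2j * np.pi * eps * n / n_fft)   # CFO as a rotation in time

L = n_fft + n_cp
Y1 = np.fft.fft(rx[n_cp:L])                      # strip CP, FFT of symbol 1
Y2 = np.fft.fft(rx[L + n_cp:2 * L])              # strip CP, FFT of symbol 2
eps_hat = post_fft_cfo(Y1, Y2, n_fft, n_cp)
print(round(float(eps_hat), 6))
```

Averaging the per-subcarrier angles works here because the rotation is well below pi; with noise or larger offsets, taking the angle of the summed product is a common alternative.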


Carrier Frequency Offset Correction

[Figure: System Generator model. Inputs: RealIn 1 to RealIn 4, ImagIn 1 to ImagIn 4, CFO_Est, CFO_Est_valid, FFT_Start, Reset. Outputs: RealOut 1 to RealOut 4, ImagOut 1 to ImagOut 4. Blocks: DDS (freq_off, Enable, Reset; cos_out, sin_out), four Complex Multiply blocks, Counter, Relational, Logical, Negate, Rising edge detector, constant multiplier (x 0.01563) and delays. Signal types shown: Fix_16_10, Fix_16_12, Fix_16_15, Fix_17_15, Fix_16_16, UFix_16_0, Bool, double.]

Direct digital synthesizer (DDS) from the Xilinx DSP SysGen library.
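In software terms, the correction amounts to multiplying each received sample by a complex exponential that rotates in the opposite direction to the offset; the DDS supplies the cosine and sine and the complex multipliers apply them per antenna. The sketch below assumes the CFO is given in cycles per sample, and the function name `correct_cfo` is hypothetical.

```python
import numpy as np

def correct_cfo(r, cfo_cycles_per_sample):
    """De-rotate samples by the estimated CFO (cos - j*sin, as a DDS would supply)."""
    r = np.asarray(r, dtype=complex)
    n = np.arange(r.size)
    phase = 2.0 * np.pi * cfo_cycles_per_sample * n
    nco = np.cos(phase) - 1j * np.sin(phase)
    return r * nco

# Example: impose an offset on random samples, then remove it.
rng = np.random.default_rng(5)
x = rng.standard_normal(128) + 1j * rng.standard_normal(128)
f = 0.004
r = x * np.exp(2j * np.pi * f * np.arange(x.size))
x_hat = correct_cfo(r, f)
print(np.allclose(x_hat, x))
```

In a multi-antenna receiver the same oscillator output can be shared by all receive branches, since they are typically driven by a common local oscillator.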


Design methodology issues

• FPGA tools
  – Where to from here?

• C-to-gates
  – Higher level design languages to gates
  – Raising the level of abstraction


End of Roadmap for the Von Neumann Model

• CPUs are as smart as they can be! [Plot: SPECInt92/MHz. Source: Ronen [2001]]

• Spot the CPU! [Die photo of the TI 6416 showing L2 $, L1 $ and CPU. Source: Agarwala [2002]]

• Clock frequency scaling

• Absolute power limits: with Moore’s law you also get leakage! [Source: Borkar [1999]]

• Divide and conquer: multi-core arrays [6x6 GALS Processor Array. Source: Zu & Baas [2006]]

• 1945-2005: sequential programming

• 2005-????: concurrent programming


Merging Mindsets: Software Design vs. Hardware Design

[Figure: a software view (class A with start(), class B, class C, class D) next to a hardware view (resourceA, resourceB, resourceC).]

Keywords on the slide:

• Events, Protocols, Ordering, Sequential execution

• Encapsulation, Abstraction, Portability, Re-use

• Implementation Detail, Control Logic

• Interface Glue, Concurrency

• Communication Architecture

• Clocks, Signals, Timing

Combining the strengths of both paradigms can bring about a radical improvement in hardware/software system design productivity.


Objective for a New Methodology: reduce design cost (by a lot)

• Quality of result (QoR) is not a design goal!
  – Performance, power, BOM cost budgets make QoR a design constraint

• The real objective is to meet the QoR target and minimize:
  – Non-recurring engineering costs (NRE)
  – Time-to-market (TTM)

• The new methodology should save on design cost by enabling:
  – Design of portable, retargetable, composable IP blocks
  – Rapid design space exploration and system composition

[Figure: QoR (performance/$, performance/W) plotted against total design cost (NRE $, TTM), comparing the traditional HDL flow with the new methodology; annotations: abstraction profit, abstraction cost.]


‘C’ or higher level language to Gates

• There is interest from the design community in higher level design methodologies, such as C-to-Gates.

• ESL (Electronic system level) tools/design methodologies are being explored.

• But, extracting all the concurrency from a sequential description is not an easy problem.


Actor/Dataflow Programming Model

[Figure: a network of actors. Each actor has encapsulated state and guarded atomic actions; actors are linked by point-to-point, buffered token-passing connections.]

• A well-known and researched model for concurrent systems
  – Edward Lee et al. (UC Berkeley)
  – Arvind et al. (MIT)

• Broadly applicable to heterogeneous HW/SW systems

• Actors are described in the CAL language (UC Berkeley)
  – Open source simulator available from SourceForge
  – Under consideration as reference model for MPEG
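The ingredients named on this slide (encapsulated state, guarded atomic actions, buffered token-passing connections) can be illustrated with a toy Python sketch. This is not CAL and does not use any CAL tooling; the `Actor` class, the `connect` and `run` helpers, and the three example actors are all hypothetical names chosen for illustration.

```python
from collections import deque

class Actor:
    """An actor: encapsulated state plus a list of guarded atomic actions."""

    def __init__(self, name, **state):
        self.name = name
        self.state = dict(state)   # private to this actor
        self.inputs = {}           # port name -> FIFO of tokens
        self.outputs = {}
        self.actions = []          # (guard, body) pairs

    def action(self, guard, body):
        self.actions.append((guard, body))

    def fire(self):
        """Run the first enabled action to completion; report whether one fired."""
        for guard, body in self.actions:
            if guard(self):
                body(self)
                return True
        return False

def connect(src, out_port, dst, in_port):
    """Create a point-to-point buffered connection between two ports."""
    fifo = deque()
    src.outputs[out_port] = fifo
    dst.inputs[in_port] = fifo

def run(actors):
    """Naive scheduler: keep sweeping over the actors until none can fire."""
    while any([a.fire() for a in actors]):
        pass

def emit(a):
    a.outputs["out"].append(a.state["next"])
    a.state["next"] += 1

def accumulate(a):
    a.state["total"] += a.inputs["in"].popleft()
    a.outputs["out"].append(a.state["total"])

def collect(a):
    a.state["seen"].append(a.inputs["in"].popleft())

source = Actor("source", next=1)
acc = Actor("accumulate", total=0)
sink = Actor("sink", seen=[])
connect(source, "out", acc, "in")
connect(acc, "out", sink, "in")

source.action(lambda a: a.state["next"] <= 5, emit)          # guard on own state
acc.action(lambda a: len(a.inputs["in"]) > 0, accumulate)    # guard on token availability
sink.action(lambda a: len(a.inputs["in"]) > 0, collect)

run([source, acc, sink])
print(sink.state["seen"])   # [1, 3, 6, 10, 15]
```

Because actors interact only through the FIFOs, the scheduler is free to fire them in any order or in parallel, which is what makes the model a candidate for mapping onto both hardware and software.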


Conclusion

• FPGAs are finding wide use in infrastructure communication systems and signal processing systems.

• FPGAs are an efficient choice for exploring VLSI architectures.

• FPGA tools are raising the level of abstraction, letting algorithm designers explore h/w architectures without learning “h/w design tools/languages”.


Questions?