flexible wireless communication architectures

RICE UNIVERSITY. Flexible wireless communication architectures. Sridhar Rajagopal, Department of Electrical and Computer Engineering, Rice University, Houston TX. Faculty Candidate Seminar – Southern Methodist University, April 23, 2003. This work has been supported in part by NSF, Nokia and Texas Instruments.


Page 1: Flexible wireless communication architectures

RICE UNIVERSITY

Flexible wireless communication architectures

Sridhar Rajagopal

Department of Electrical and Computer Engineering, Rice University, Houston TX

Faculty Candidate Seminar – Southern Methodist University, April 23, 2003

This work has been supported in part by NSF, Nokia and Texas Instruments

Page 2

Future wireless devices demand flexibility

Multiple algorithms and environments supported in the same device

High-data-rate mobile devices with multimedia

Flexible algorithms: Multiple antennas, complex signal processing

Flexible architectures: High performance (Mbps), low power (mW)

Fast design with structured exploration

[Figure: Bluetooth/Home Networks, Wireless Cellular, Wireless LAN]

Page 3

Flexibility needed in different layers

Physical Layer

MAC Layer

Network Layer

Application Layer (Puppeteer project at Rice: http://www.cs.rice.edu/CS/Systems/Puppeteer/)

[Figure labels: Analog RF, Flexible Algorithms, Mapping, Flexible Architectures]

Page 4

Research vision: Attain flexibility

Algorithms: flexibility to support a variety of sophisticated algorithms

Architectures: flexibility to adapt the hardware to the algorithms

Fast, structured design exploration

Design me

Page 5

Contributions: Algorithms

Multi-user channel estimation [Jnl. of VLSI Sig. Proc.’02, ASAP’00]: matrix inversions replaced by numerical techniques (conjugate-gradient descent) for complexity reduction

Multi-user detection [ISCAS’01]: block-based computation restructured into streaming computation

Pipelining, lower memory requirements

Parallel, fixed-point, streaming VLSI implementations [IEEE Trans. Wireless Comm.’02]
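The channel-estimation bullet's complexity reduction comes from solving the estimator's linear system iteratively instead of inverting a matrix. The C sketch below is textbook conjugate gradient for a real, symmetric positive-definite system R x = b. The name `cg_solve`, the fixed maximum size `CG_MAX_N`, and the use of real floating-point (rather than complex fixed-point) arithmetic are illustrative assumptions, not the published estimator.

```c
#include <stddef.h>

#define CG_MAX_N 16  /* hypothetical upper bound on the system size */

/* Solve R x = b for a real, symmetric positive-definite n x n matrix R
   (row-major) with textbook conjugate gradient, starting from x = 0.
   Stops after max_iter iterations or once the squared residual norm
   r.r falls to tol2 or below. Returns the iteration count, or -1 if n
   exceeds CG_MAX_N. */
int cg_solve(size_t n, const double *R, const double *b, double *x,
             int max_iter, double tol2)
{
    double r[CG_MAX_N], p[CG_MAX_N], Rp[CG_MAX_N];
    double rr = 0.0;
    int k;

    if (n > CG_MAX_N)
        return -1;
    for (size_t i = 0; i < n; ++i) {
        x[i] = 0.0;
        r[i] = b[i];               /* residual b - R*x with x = 0 */
        p[i] = r[i];               /* first search direction */
        rr += r[i] * r[i];
    }
    for (k = 0; k < max_iter && rr > tol2; ++k) {
        double pRp = 0.0, rr_new = 0.0;
        for (size_t i = 0; i < n; ++i) {          /* Rp = R * p */
            Rp[i] = 0.0;
            for (size_t j = 0; j < n; ++j)
                Rp[i] += R[i * n + j] * p[j];
            pRp += p[i] * Rp[i];
        }
        double alpha = rr / pRp;                  /* step length along p */
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Rp[i];
            rr_new += r[i] * r[i];
        }
        double beta = rr_new / rr;                /* keeps directions R-conjugate */
        for (size_t i = 0; i < n; ++i)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return k;
}
```

In exact arithmetic CG reaches the solution of an n × n system in at most n iterations, and each iteration costs one matrix-vector product (O(n²)) rather than the O(n³) of an explicit inversion; stopping after fewer iterations trades accuracy for further savings.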

Page 6

Contributions: Architectures

Heterogeneous DSP-FPGA system designs [ICSPAT’00]

Computer arithmetic [Symp. on Comp. Arith.’01]: dynamic truncation in ASICs using on-line arithmetic with most-significant-digit-first computation

[Ph.D. Thesis]

Scalable Wireless Application-specific Processors (SWAPs)

Rapid, structured architectures with flexibility-performance tradeoffs

Page 7

Scalable Wireless Application-specific Processors

Family of flexible programmable processors

Clusters of ALUs

High performance by supporting hundreds of ALUs

Can provide customization for various algorithms

Adapts (“swaps”) the architecture dynamically for power

[Figure: replicated clusters of adders (+) and multipliers (*); design axes: Scale ALUs (within a cluster) and Scale Clusters]

Page 8

Rapid, structured design for SWAPs

Low-“complexity”, parallel, fixed-point algorithms

[Figure: the algorithms feed an architecture exploration that applies ASIC design and DSP design techniques to produce SWAPs (clusters of adders and multipliers)]

Page 9

Research vision summary

Provide a structured framework to rapidly explore flexible, high-performance, low-power architectures (SWAPs)

Efficient algorithm design for mapping to SWAPs

Understanding of algorithms, DSPs and ASICs used

Flexibility-performance trade-offs

Inter-disciplinary research: Wireless communications, VLSI Signal Processing, Computer architecture, Computer arithmetic, Circuits, CAD, Compilers

Page 10

Talk Outline

Research vision

SWAPs - Background

Algorithm design for SWAPs

Architecture design for SWAPs

Current and Future Research Goals

Page 11

SWAPs borrow from DSPs

DSPs use instruction-level parallelism (ILP) and subword parallelism (MMX)

Not enough ALUs for GOPs of computation: hundreds are needed, but the TI C6x has 8 ALUs

Why not more ALUs?

Cannot support more registers (area, ports)

Difficult to find ILP as the number of ALUs increases

[Figure: register file (RF) size as the number of ALUs grows: 1, 4, 16, 32]
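A first-order model shows why one central register file cannot feed hundreds of ALUs. This sketch assumes (none of it comes from the slide) that each ALU needs two read ports and one write port, that the register count grows linearly with the ALU count, and that each register cell's area grows with the square of the port count because every port adds wiring in both dimensions. Under those assumptions the area grows as N³; the function names are hypothetical.

```c
/* Hypothetical first-order model of a central register file shared by
   n_alus ALUs. Assumptions (not from the slide):
   - each ALU needs 2 read ports and 1 write port, so ports = 3 * n_alus;
   - the register count grows linearly with n_alus (regs_per_alu each);
   - each register cell's area grows with ports^2.
   Returns area in arbitrary units (register count * ports^2). */
unsigned long long rf_area(unsigned n_alus, unsigned regs_per_alu)
{
    unsigned long long ports = 3ULL * n_alus;
    unsigned long long regs = (unsigned long long)regs_per_alu * n_alus;
    return regs * ports * ports;
}

/* Area relative to the 1-ALU register file; under this model it is n^3. */
unsigned long long rf_area_ratio(unsigned n_alus, unsigned regs_per_alu)
{
    return rf_area(n_alus, regs_per_alu) / rf_area(1, regs_per_alu);
}
```

With 32 ALUs the model gives a register file 32,768 times the single-ALU area, which is the motivation for distributing register files across clusters instead.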

Page 12

SWAPs borrow from ASICs

Exploit data parallelism (DP)

Available in many wireless algorithms

This is what ASICs do!

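#define N 1024

int i, a[N], b[N], sum[N];      // 32 bits
short int c[N], d[N], diff[N];  // 16 bits, packed (subword parallelism)

void kernel(void)
{
    for (i = 0; i < N; ++i) {
        sum[i] = a[i] + b[i];   // independent of the next statement: ILP
        diff[i] = c[i] - d[i];  // every iteration is independent: DP
    }
}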
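The 16-bit subtraction `c[i] - d[i]` is where subword parallelism applies: two 16-bit operands fit in one 32-bit word. MMX-style hardware does this with a single packed instruction; as a sketch of the same idea on a plain 32-bit integer unit, the hypothetical `sub16x2` below uses the standard SIMD-within-a-register trick of using each lane's top bit to stop borrows from crossing lanes.

```c
#include <stdint.h>

/* Subtract two pairs of 16-bit lanes packed in 32-bit words, lane by lane,
   modulo 2^16, without letting a borrow cross from the low lane into the
   high lane. H marks the top bit of each lane. */
uint32_t sub16x2(uint32_t x, uint32_t y)
{
    const uint32_t H = 0x80008000u;
    /* Set each lane's top bit in x and clear it in y, so no lane can borrow
       from its neighbour; then repair the top bits with an XOR. */
    return ((x | H) - (y & ~H)) ^ ((x ^ ~y) & H);
}

/* Pack two 16-bit values into one word (hi in bits 31..16). */
uint32_t pack16x2(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}
```

For example, `sub16x2(pack16x2(100, 0), pack16x2(40, 1))` yields lanes 60 and 0xFFFF: the low lane wraps, but the high lane is not disturbed.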

Page 13

SWAPs borrow from stream processors

[Figure: kernels connected by streams of input and output data: received signal → matched filter → interference cancellation → Viterbi decoding → decoded bits, with correlator channel estimation]

Kernels (computation) and streams (communication)

Kernels use data local to the clusters, supporting GOPs of computation

Imagine stream processor at Stanford [Rixner’01]

Scott Rixner. Stream Processor Architecture, Kluwer Academic Publishers: Boston, MA, 2001.
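A minimal sketch of the kernel/stream model, assuming integer chips and a ±1 spreading code: the two hypothetical kernels below form a short chain, a matched filter that despreads a chip stream into soft symbols, followed by a hard-decision kernel that stands in for the later stages (interference cancellation and Viterbi decoding are omitted). Each kernel touches only its input and output streams, which is what lets a stream processor keep intermediate data local to the clusters.

```c
#include <stddef.h>

/* Hypothetical matched-filter kernel: despread n_sym symbols, each spread
   over sf chips, by correlating with the spreading code. Reads only the
   chip stream, writes only the soft-symbol stream. */
void matched_filter_kernel(const int *chips, size_t n_sym,
                           const int *code, size_t sf, int *soft_out)
{
    for (size_t s = 0; s < n_sym; ++s) {   /* symbols are independent: DP */
        int acc = 0;
        for (size_t c = 0; c < sf; ++c)
            acc += chips[s * sf + c] * code[c];
        soft_out[s] = acc;
    }
}

/* Hypothetical hard-decision kernel: map each soft value to a bit
   (1 for a positive correlation, 0 otherwise). */
void hard_decision_kernel(const int *soft_in, size_t n, unsigned char *bits_out)
{
    for (size_t i = 0; i < n; ++i)
        bits_out[i] = soft_in[i] > 0;
}
```

Chaining is just passing one kernel's output array as the next kernel's input; on a stream processor that intermediate stream would stay in the SRF rather than go to external memory.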

Page 14

SWAPs are multi-cluster DSPs

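[Figure: a DSP is a single cluster of adders and multipliers (+++***) exploiting ILP, backed by internal memory; a SWAP replicates identical clusters to exploit DP, backed by a Stream Register File (SRF) as memory]

SWAPs adapt clusters to DP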

Identical clusters perform the same operations. Unused FUs and clusters are powered down.
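To make "identical clusters, same operations" concrete, here is a hypothetical model of one loop running on a configurable number of active clusters. It assumes a simple round-robin assignment (element i goes to cluster i % active in step i / active); the function name and the assignment rule are illustrative, not the SWAP compiler's actual mapping.

```c
#include <stddef.h>

/* Hypothetical model of a loop over `active` identical clusters: in each
   step every cluster executes the same operation on a different element
   (element i goes to cluster i % active in step i / active).
   Returns the number of lockstep steps, or 0 if no cluster is active. */
size_t cluster_vector_add(const int *a, const int *b, int *sum, size_t n,
                          size_t active)
{
    if (active == 0)
        return 0;
    size_t steps = (n + active - 1) / active;    /* ceil(n / active) */
    for (size_t step = 0; step < steps; ++step)
        for (size_t c = 0; c < active; ++c) {    /* clusters run in lockstep */
            size_t i = step * active + c;
            if (i < n)                           /* otherwise cluster c idles */
                sum[i] = a[i] + b[i];
        }
    return steps;
}
```

With 10 elements and 16 active clusters, one step suffices but 6 clusters sit idle; those are the clusters a SWAP would power down instead.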

Page 15

Arithmetic clusters in SWAPs

[Figure: an arithmetic cluster: adders (+), multipliers (*) and dividers (/) on distributed register files (supports more ALUs), connected through a cross point to a scratchpad (indexed accesses), an intercluster network communication unit, and from/to the SRF]

Page 16

Talk Outline

Research vision

SWAPs Background

Algorithm design for SWAPs

Architecture design for SWAPs

Current and Future Research Goals

Page 17

SWAPs: Physical layer algorithms

[Figure: antenna → RF front-end → baseband processing (channel estimation, detection, decoding) → higher (MAC/Network/OS) layers]

Complex signal processing algorithms with GOPs of computation

Page 18

SWAP mapping example: Viterbi decoding

Multiple-antenna (MIMO) systems: complexity exponential with transmit × receive antennas

Estimation: Linear MMSE, blind, conjugate gradient….

Detection: FFT, (blind) interference cancellation….

Decoding: Viterbi, Turbo, LDPC…. & joint schemes

SWAP flexibility lets you use the best algorithms for the situation

Example for concept demonstration: Viterbi decoding

Page 19

Parallel Viterbi Decoding for SWAPs

Add-Compare-Select (ACS): trellis interconnect and computations

Parallelism depends on constraint length (#states)

Traceback: searching

Conventional traceback:

• Sequential (no DP) with dynamic branching

• Difficult to implement on a parallel architecture

Use Register Exchange (RE) instead: a parallel solution

[Figure: detected bits → ACS unit → traceback unit → decoded bits]
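A small sketch of ACS combined with register exchange, assuming a rate-½, constraint-length-3 code with generators 7 and 5 (octal), hard-decision inputs, and at most 32 decoded bits; the talk's decoder uses K = 5, 7 and 9, and all names here are hypothetical. Each state carries its entire survivor path in a register that is shifted and copied during the ACS, so there is no sequential traceback pass.

```c
#include <stddef.h>
#include <stdint.h>

#define VIT_K 3                          /* constraint length (assumed) */
#define VIT_NSTATES (1u << (VIT_K - 1))  /* 4 trellis states */

static unsigned parity(unsigned v)
{
    unsigned p = 0;
    while (v) { p ^= v & 1u; v >>= 1; }
    return p;
}

/* Rate-1/2 encoder, generators 7 and 5 (octal), starting in state 0.
   State = previous VIT_K-1 input bits (newest in the MSB). */
void conv_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    unsigned st = 0;
    for (size_t t = 0; t < n; ++t) {
        unsigned r = ((unsigned)in[t] << (VIT_K - 1)) | st;
        out[2 * t] = (uint8_t)parity(r & 07u);
        out[2 * t + 1] = (uint8_t)parity(r & 05u);
        st = r >> 1;
    }
}

/* Hard-decision Viterbi decoder using register exchange: each state keeps
   its survivor path as a bit register, so no traceback pass is needed.
   Decodes n <= 32 bits from 2n received hard bits. */
void viterbi_re_decode(const uint8_t *rx, size_t n, uint8_t *out)
{
    unsigned metric[VIT_NSTATES], new_metric[VIT_NSTATES];
    uint32_t path[VIT_NSTATES], new_path[VIT_NSTATES];

    for (unsigned s = 0; s < VIT_NSTATES; ++s) {
        metric[s] = (s == 0) ? 0u : 1000u;   /* encoder starts in state 0 */
        path[s] = 0;
    }
    for (size_t t = 0; t < n; ++t) {
        /* One ACS per state; the states are independent: this is the DP. */
        for (unsigned ns = 0; ns < VIT_NSTATES; ++ns) {
            unsigned u = ns >> (VIT_K - 2);           /* input bit into ns */
            unsigned best = ~0u;
            uint32_t best_path = 0;
            for (unsigned b = 0; b < 2; ++b) {        /* two predecessors */
                unsigned ps = ((ns << 1) & (VIT_NSTATES - 1)) | b;
                unsigned r = (u << (VIT_K - 1)) | ps;
                unsigned bm = (parity(r & 07u) != rx[2 * t])
                            + (parity(r & 05u) != rx[2 * t + 1]);
                if (metric[ps] + bm < best) {         /* add, compare, select */
                    best = metric[ps] + bm;
                    best_path = (path[ps] << 1) | u;  /* register exchange */
                }
            }
            new_metric[ns] = best;
            new_path[ns] = best_path;
        }
        for (unsigned s = 0; s < VIT_NSTATES; ++s) {
            metric[s] = new_metric[s];
            path[s] = new_path[s];
        }
    }
    unsigned bs = 0;                                  /* best final state */
    for (unsigned s = 1; s < VIT_NSTATES; ++s)
        if (metric[s] < metric[bs])
            bs = s;
    for (size_t t = 0; t < n; ++t)
        out[t] = (uint8_t)((path[bs] >> (n - 1 - t)) & 1u);
}
```

The price of register exchange is copying whole path registers every step, but those copies are the same for every state, which is why it maps onto parallel clusters where a branching traceback does not.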

Page 20

Parallel Viterbi needs re-ordering for SWAPs

Exploiting Viterbi DP in SWAPs:

Use RE instead of regular traceback

Re-order ACS, RE

[Figure: Regular ACS vs. ACS in SWAPs for 16 states X(0) to X(15). Regular ACS keeps the states in natural order; in SWAPs the DP vector holds the re-ordered states X(0), X(2), …, X(14), X(1), X(3), …, X(15).]
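Why the even/odd re-ordering helps: assume a trellis in which state s has predecessors 2(s mod n/2) and 2(s mod n/2)+1 (one common convention; other encoders permute states differently). Then states j and j + n/2 both come from states 2j and 2j+1, which after re-ordering sit in lane j of the even half and lane j of the odd half. Every lane does identical work on aligned data, with no cross-lane access inside the ACS. The function names and the branch-metric layout `bm[2*s + b]` are hypothetical.

```c
#include <stddef.h>

/* Re-order n state metrics (n even) from natural order X(0..n-1) into
   even-numbered states followed by odd-numbered states. */
void even_odd_reorder(const int *x, int *y, size_t n)
{
    for (size_t j = 0; j < n / 2; ++j) {
        y[j] = x[2 * j];              /* X(0), X(2), ..., X(n-2) */
        y[n / 2 + j] = x[2 * j + 1];  /* X(1), X(3), ..., X(n-1) */
    }
}

/* One ACS step on re-ordered metrics, assuming state s has predecessors
   2*(s mod n/2) and 2*(s mod n/2) + 1, and that bm[2*s + b] is the branch
   metric from predecessor 2*(s mod n/2) + b into s. Lane j of the even and
   odd halves feeds both state j and state j + n/2. Output is in natural
   order, so it is re-ordered again before the next step. */
void acs_lanes(const int *reordered, const int *bm, int *out, size_t n)
{
    size_t h = n / 2;
    const int *even = reordered, *odd = reordered + h;
    for (size_t j = 0; j < h; ++j) {       /* same operations in every lane */
        int a0 = even[j] + bm[2 * j],       a1 = odd[j] + bm[2 * j + 1];
        int b0 = even[j] + bm[2 * (j + h)], b1 = odd[j] + bm[2 * (j + h) + 1];
        out[j] = a0 <= a1 ? a0 : a1;       /* state j */
        out[j + h] = b0 <= b1 ? b0 : b1;   /* state j + n/2 */
    }
}
```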

Page 21

Talk Outline

Research vision

SWAP Background

Algorithm design for SWAPs

Architecture design for SWAPs

Current and Future Research Goals

Page 22

SWAP architecture design

More clusters are better than more ALUs per cluster (if #clusters > 2)

1. Decide how many clusters: exploit DP

2. Decide what to put within each cluster: maximize ILP with high functional-unit efficiency; search the design space with the “explore” tool

Time-power-area characterization

[Figure: clusters of adders and multipliers; ILP within a cluster, DP across clusters]

Page 23

Design a SWAP cluster: “Explore”

Auto-exploration of adders and multipliers for “ACS”

[Figure: instruction count (40 to 160) vs. #adders (1 to 5) and #multipliers (1 to 5) for the ACS kernel; each design point is labeled (adder util%, multiplier util%)]
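The trade-off this plot explores can be approximated with a resource-bound model: more units shorten the schedule but lower utilization. The sketch below is a hypothetical stand-in for the "explore" tool, assuming a kernel with a fixed number of additions and multiplications per iteration and ignoring data dependences and latencies, which a real scheduler must honor. The operation counts in the usage example are illustrative, not taken from the ACS kernel.

```c
/* Hypothetical resource-bound model of one kernel iteration: n_add
   additions and n_mul multiplications scheduled on a cluster with
   `adders` adders and `muls` multipliers (both must be >= 1).
   Ignores data dependences and latencies. */
typedef struct {
    unsigned cycles;    /* lower bound on cycles per iteration */
    unsigned add_util;  /* adder utilization, percent */
    unsigned mul_util;  /* multiplier utilization, percent */
} cluster_eval;

static unsigned ceil_div(unsigned a, unsigned b) { return (a + b - 1) / b; }

cluster_eval evaluate_cluster(unsigned n_add, unsigned n_mul,
                              unsigned adders, unsigned muls)
{
    cluster_eval e;
    unsigned ca = ceil_div(n_add, adders), cm = ceil_div(n_mul, muls);
    e.cycles = ca > cm ? ca : cm;          /* the busier unit type bounds it */
    if (e.cycles == 0)
        e.cycles = 1;                      /* an empty kernel still takes a cycle */
    e.add_util = 100u * n_add / (adders * e.cycles);
    e.mul_util = 100u * n_mul / (muls * e.cycles);
    return e;
}
```

For an illustrative kernel with 12 additions and 6 multiplications, 3 adders and 3 multipliers give 4 cycles at (100%, 50%), while 4 adders and 1 multiplier give 6 cycles at (50%, 100%); sweeping both counts from 1 to 5 reproduces the kind of grid shown in the plot.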

Page 24

“Explore” tool benefits

Instruction count vs. ALU efficiency: what goes inside each cluster

Design customized application-specific units: better performance with increased ALU utilization

Explore multiple algorithms: turn off functional units not in use for a given kernel (Vdd-gating, clock-gating techniques)

Page 25

Example for SWAP architecture design

Explore Algorithm 1 : 3 adders, 3 multipliers, 32 clusters

Explore Algorithm 2 : 4 adders, 1 multiplier, 64 clusters

Explore Algorithm 3 : 2 adders, 2 multipliers, 64 clusters

Explore Algorithm 4 : 2 adders, 2 multipliers, 16 clusters

Chosen Architecture: 4 adders, 3 multipliers, 64 clusters
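The chosen architecture on this slide is the element-wise maximum of the four explorations: max(3, 4, 2, 2) = 4 adders, max(3, 1, 2, 2) = 3 multipliers, and max(32, 64, 64, 16) = 64 clusters. A sketch of that selection rule, with hypothetical type and function names; units an algorithm does not need are then turned off at run time.

```c
#include <stddef.h>

typedef struct { unsigned adders, muls, clusters; } swap_config;

/* Hypothetical selection rule consistent with the slide: size each
   resource for the most demanding algorithm. */
swap_config choose_architecture(const swap_config *reqs, size_t n)
{
    swap_config c = {0, 0, 0};
    for (size_t i = 0; i < n; ++i) {
        if (reqs[i].adders > c.adders) c.adders = reqs[i].adders;
        if (reqs[i].muls > c.muls) c.muls = reqs[i].muls;
        if (reqs[i].clusters > c.clusters) c.clusters = reqs[i].clusters;
    }
    return c;
}
```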


Page 26

SWAP flexibility provides power savings

Multiple algorithms: different ALU and cluster requirements

Turning off ALUs (–add –mul compiler options): use the right #ALUs from the “explore” tool

Turning off clusters: data is spread across the SRF of all clusters, but a cluster only has access to its own SRF, and the next kernel may need data from the SRF of other clusters, so reconfiguration support needs to be provided

Page 27

SWAPs provide cluster reconfiguration

[Figure: the SRF connects to the clusters through a mux-demux network (MDX 1 and MDX 2 stages) with stream buffers and latches]

Additional latency (a few cycles) due to microcontroller stalls: minimal loss in performance

Page 28

Cluster reconfiguration for Viterbi

Packet 1: constraint length 7 (16 clusters)

Packet 2: constraint length 9 (64 clusters)

Packet 3: constraint length 5 (4 clusters)

Clusters beyond the available DP can be turned OFF
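The cluster counts on this slide follow from the trellis size: a constraint-length-K code has 2^(K-1) states, and 16, 64 and 4 clusters for K = 7, 9 and 5 all correspond to 4 states per cluster (64/16, 256/64, 16/4). The helper names are hypothetical; the ratio of 4 is read off the slide, and the cap of 64 assumes the 64-cluster SWAP used elsewhere in the talk.

```c
/* Number of trellis states for constraint length k (k >= 1). */
unsigned viterbi_states(unsigned k)
{
    return 1u << (k - 1);
}

/* Clusters to enable when each cluster handles a fixed number of states,
   capped at the number of physical clusters (at least one is enabled). */
unsigned viterbi_clusters(unsigned k, unsigned states_per_cluster,
                          unsigned max_clusters)
{
    unsigned c = viterbi_states(k) / states_per_cluster;
    if (c == 0)
        c = 1;
    return c > max_clusters ? max_clusters : c;
}
```

Keeping the states-per-cluster ratio fixed is what lets the same kernel code serve every constraint length; only the number of enabled clusters changes.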

Page 29

[Figure: execution time (cycles) for three 64-bit, rate-½ packets (Packet 1: K = 7, Packet 2: K = 9, Packet 3: K = 5), broken down into kernels (computation) on the clusters and memory; no data memory accesses]

SWAPs provide flexibility at negligible overhead

Page 30

SWAP exploration for Viterbi decoding

[Figure: frequency needed to attain real-time (in MHz, log scale) vs. number of clusters (log scale) for K = 9, K = 7 and K = 5; different SWAPs (without reconfiguration) vs. the same SWAP (with reconfiguration), compared with a DSP; the point of maximum DP is marked]

Ideal C64x (w/o co-proc) needs ~200 MHz for real-time

Page 31

SWAPs : Salient features

1-2 orders of magnitude better than a DSP

Any constraint length: 10 MHz at 128 Kbps

Same code for all constraint lengths: no need to recompile or load other code, as long as the parallelism/cluster ratio is constant

Power savings due to dynamic cluster scaling
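The real-time figures can be restated as a cycle budget per decoded bit. Assuming 128 Kbps means 128 × 10³ bits/s, and using the ~200 MHz quoted for an ideal C64x on the previous slide:

```latex
\frac{10 \times 10^{6}\ \text{cycles/s}}{128 \times 10^{3}\ \text{bits/s}} = 78.125\ \text{cycles per bit (SWAP)},
\qquad
\frac{200 \times 10^{6}\ \text{cycles/s}}{128 \times 10^{3}\ \text{bits/s}} = 1562.5\ \text{cycles per bit (DSP)},
\qquad
\frac{200\ \text{MHz}}{10\ \text{MHz}} = 20
```

A factor of 20 in required clock rate is within the "1-2 orders of magnitude" range claimed above.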

Page 32

Expected SWAP power consumption

Power model based on [Khailany’03], with 64 clusters and 1 multiplier per cluster:

0.13 micron, 1.2 V; peak active power: ~9 mW at 1 MHz (DSP ~1 mW); area: ~53.7 mm²

10 MHz, 128 Kbps with reconfiguration

Exploring the VLSI Scalability of Stream Processors, Brucek Khailany et al., Proceedings of the Ninth Symposium on High Performance Computer Architecture, February 8-12, 2003

[Figure: power (in mW) vs. active clusters (max 64)]

| Viterbi | Clusters used | Peak power |
|---|---|---|
| K = 9 | 64 | ~90 mW |
| K = 7 | 16 | ~28.57 mW |
| K = 5 | 4 | ~13.8 mW |
| Overhead | 0 | ~8.1 mW |
| DSP, K = 9 | 1 | ~200 mW |
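A quick consistency check on these numbers, as a hypothetical linear model: peak power at 10 MHz is taken as a fixed overhead plus a constant amount per active cluster, fitted through the "overhead" row (~8.1 mW, no clusters) and the K = 9 row (~90 mW, 64 clusters). The second helper assumes peak power scales linearly with clock frequency at fixed voltage, matching ~9 mW per MHz at 10 MHz and the base-station figure of ~18.19 mW per MHz at 1 GHz.

```c
/* Hypothetical linear power model for the 64-cluster SWAP at 10 MHz,
   fitted through two rows of the table: ~8.1 mW with no active clusters
   and ~90 mW with all 64 clusters active. */
double swap_power_mw(unsigned active_clusters)
{
    const double overhead_mw = 8.1;
    const double per_cluster_mw = (90.0 - 8.1) / 64.0;   /* ~1.28 mW */
    return overhead_mw + per_cluster_mw * active_clusters;
}

/* Peak power assuming linear scaling with clock frequency at fixed voltage. */
double power_at_clock_mw(double mw_per_mhz, double clock_mhz)
{
    return mw_per_mhz * clock_mhz;
}
```

The fit reproduces the K = 7 row (~28.6 mW vs. ~28.57 mW) but gives ~13.2 mW for K = 5 against ~13.8 mW in the table, so it is only a first-order estimate.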

Page 33

Multiuser Estimation-Detection+Decoding

Real-time target : 128 Kbps per user

[Figure: frequency needed to attain real-time (in MHz, log scale) vs. number of clusters (log scale) for a 32-user base-station and a mobile, under fast, medium and slow fading scenarios, compared with a DSP]

Ideal C64x (w/o co-proc) needs ~15 GHz for real-time

Page 34

Expected SWAP power : base-station

32-user base-station with 3 X’s (multipliers) per cluster and 64 clusters: 0.13 micron, 1.2 V; peak active power: ~18.19 mW per MHz (increased due to the extra multipliers); area: ~93.4 mm²

Total peak base-station power consumption: ~18.19 W at 1 GHz for 32 users at 128 Kbps/user

Page 35

Talk Outline

Research vision

SWAP Background

Algorithm design for SWAPs

Architecture design for SWAPs

Current and Future Research Goals

Page 36

Current research: Flexibility vs. performance

SWAPs: 128 Kbps at ~10-100 mW for Viterbi; borrow DP from ASICs!

Suitable for base-stations: flexibility is more important than power

For mobile devices: power constraints are tighter; can be customized for further power savings

Handset SWAPs (H-SWAPs): borrow task pipelining from ASICs! Application-specific units and a specialized communication network

Page 37

Handset SWAPs: H-SWAPs

Trade Data Parallelism for Task Pipelining

[Figure: SWAPs (max. clusters, with reconfiguration) place many identical +++*** clusters on one SRF to exploit DP. A SWAPlet limits the number of clusters to the limited DP available. H-SWAPs are a collection of customized SWAPlets (e.g. +++*, ++* and ++++ clusters), each with limited DP.]

Page 38

Sample points in architecture exploration

DSPs (1 cluster): ILP, subword

SWAPs (multiple clusters): ILP, subword, DP

H-SWAPs (optimized for handsets): ILP, subword, DP, task pipelining, custom ALUs

Programmable solutions with increased customization

Performance and power benefits (with decreasing flexibility)

Page 39

Future: Efficient algorithms and mapping

[Figure: multiple-antenna receiver building blocks: multipath channel, channel estimator, equalizer, MRC, detector, demodulator, beam-forming, coherent and non-coherent STC, turbo equalizer, decoder]

Multiple antenna systems with 1-2 orders-of-magnitude higher complexity

Page 40

Future research: Architectures

Generalized and structured framework and tools: joint algorithm-architecture exploration; area-time-power-flexibility tradeoffs

Potential applications in embedded systems:

Image and video processing:

Cameras: variety of compression algorithms

Biomedical applications: Hearing aids: DSP running on body heat*

Sensor networks: compression of data before transmission

*Quote: Gene Frantz, TI Fellow

Page 41

SWAPs: Flexibility, Performance, Power

Need flexibility in future wireless devices: algorithms and architectures

Rapid exploration for Scalable Wireless Application-specific Processors: a structured approach with flexibility-performance trade-offs

SWAPs: flexibility, high performance and low power

Exploit data parallelism like ASICs

1-2 orders of magnitude better performance than DSPs

Turn off unused clusters and unused ALUs for low power