presentation spprr luebeck wehn - (hardware-software-co ......and performance Öfabrication on 65nm...


Page 1: presentation SPPRR Luebeck wehn - (Hardware-Software-Co ......and performance ÖFabrication on 65nm Technology in 2007 – October 2007 – Energy measurements ÖMultiprocessor solution

1

Software Defined Radio: How to Implement the Outer Modem?

"Design of a Dynamically Reconfigurable Platform for Channel Coding in Future Mobile Communication Systems" (original title: „Entwurf einer dynamisch rekonfigurierbaren Plattform für Kanalcodierung zukünftiger Mobilfunksysteme“)

Timo Vogt, Norbert Wehn

Microelectronic System Design Research Group, University of Kaiserslautern

www.eit.uni-kl.de/wehn

Priority Programme (Schwerpunktprogramm) 1148 "Reconfigurable Computing Systems"

Follow-up colloquium of the second funding period in Lübeck

2

Project Overview

Phase 1 (2005-2007)

– Implementation of a dynamically reconfigurable decoder for trellis-based decoding algorithms: FlexiTreP

– First studies on high throughput

Phase 2 (2007-2009)

– Optimization of the FlexiTreP architecture

– Silicon implementation of FlexiTreP (65nm technology)

– Enhancement of the platform for flexible LDPC decoding

– High throughput (e.g. dynamically reconfigurable multiprocessor architecture)

– Consideration of reliability issues in the platform design


3

Software Defined Radio (SDR)

Mobile communications systems

Flexibility

– Multi-mode (e.g. uplink UMTS, downlink DVB-H), multi-channel

– Adaptivity ("cognitive radio")

Energy efficiency

Flexibility/cost trade-off

Programmable architectures (SDR): SIMD/vector engines

Sandblaster SB3011 Platform (Sandbridge)

Music Architecture (Infineon)

OnDSP (NXP)

Samira (Univ. Dresden)

SODA (ARM, Univ. Michigan)

….

Dynamically reconfigurable architectures

Pleiades Wireless Reconfigurable Processor Architecture (Berkeley)

ADRES Architecture (IMEC)

– Factor 5 smaller area than the Music architecture

– Factor 10 higher energy efficiency than a DSP

4

UMTS/W-CDMA Physical Layer

Filtering: suppresses signals outside the frequency band

Modulation: maps source information onto signal waveforms

Channel estimation: estimates the channel condition for the transceivers

Error correction: corrects errors induced by the noisy channel

[Figure: UMTS/W-CDMA physical layer block diagram, including interleaver, channel encoder, deinterleaver and channel decoder (turbo/Viterbi)]

Source: Scott Mahlke / MPSoC'06


5

UMTS Physical Layer (2 Mbps)

Algorithm        GPP [Mcycles/s]   SODA [Mcycles/s]   Speed-up factor
Turbo Encoder    100               2                  50
Turbo Decoder    17500             540                32.4
Searcher         26500             200                132.5
FIR (Rx)         3900              182                21.4
FIR (Tx)         7900              307                25.7
Combiner         100               3                  33.3
Despreader       3600              11                 327.3
Spreader         300               5                  60
Descrambler      2600              23                 113
Scrambler        240               9                  26.6

GPP: general purpose processor, superscalar architecture

SODA: Signal-processing On-Demand Architecture

– SIMD pipeline with 32 16-bit datapaths

– Scalar pipeline (one 16-bit datapath)

– AGU (address generation unit)

Source: "SODA: A Low-power Architecture For Software Radio", T. Mudge et al., ARM, ISCA'06
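The speed-up column is the ratio of the two cycle budgets. A minimal Python check using a few of the values from this slide (the dictionary and function names are illustrative, not from the talk):

```python
# Cycle budgets in Mcycles/s as listed on the slide: (GPP, SODA).
workload = {
    "Turbo Decoder": (17500, 540),
    "Searcher": (26500, 200),
    "FIR (Rx)": (3900, 182),
    "Despreader": (3600, 11),
}


def speedup(gpp_mcycles, soda_mcycles):
    """Speed-up factor of SODA over the general purpose processor."""
    return gpp_mcycles / soda_mcycles


for name, (gpp, soda) in workload.items():
    print(f"{name:14s} {speedup(gpp, soda):6.1f}x")
```

The turbo decoder dominates the SODA budget (540 of the listed Mcycles/s), which is the motivation for treating the outer modem separately.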

6

ADRES Architecture (IMEC)

VLIW processor tightly coupled with a reconfigurable array

1D VLIW processor (control flow)

2D VLIW reconfigurable matrix (dataflow kernel)

Reconfiguration via config. RAM


7

Channel Coding Law: complexity doubles every 15 months
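As a rough illustration of what a 15-month doubling period implies, the sketch below evaluates the exponential growth factor. The function name and the default parameter are assumptions chosen for this example.

```python
def complexity_growth(months, doubling_period=15.0):
    """Relative complexity after `months`, assuming one doubling per period."""
    return 2.0 ** (months / doubling_period)


print(complexity_growth(15))  # one doubling period
print(complexity_growth(60))  # five years, i.e. four doubling periods
```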

8

Standards/Flexibility

Standard             Codes   Rates            States   Block sizes    Throughput*
HSDPA                bTC     1/2…3/4          8        40-5114        …14.4 Mbps
DVB-T/H              CC      1/4...7/8        64                      ...32 Mbps (broadcast)
Inmarsat             bTC     1/2              16       ...2608        …64 kbps
                     LDPC    1/2...3/4        -        ...~2500       >100 Mbps
IEEE802.16 (WiMax)   dbTC    1/2...3/4        8        ...648         ...54 Mbps
                     CC      2/3              256
                     CC      1/2...7/8        64       ...2040        ...54 Mbps
Hiperlan             CC      1/2, 9/16, 3/4   64       1…4095         6...54 Mbps
                     LDPC    1/2...5/6        -        ...1944        ...450 Mbps
IEEE802.11 (WLAN)    CC      1/2...3/4        64       1…4095         6...54 Mbps
                     bTC     1/2...1/5        8        378...20736    ...2 Mbps
CDMA-2k              CC      1/2...1/6        256      1-744          ...38 kbps
                     bTC     1/3              8        40-5114        ...2 Mbps
UMTS                 CC      1/2, 1/3         256      1-504          ...32 kbps
EDGE                 CC      6/7, 1/3         64       39...870       5...62 kbps
GSM                  CC      3/4...1/4        16, 64   33...876       ...12 kbps

* throughput/channel


9

Implementation Approach

Channel decoding algorithms

Complex iterative algorithms: control and dataflow

Calculations are not the bottleneck (log domain)

Data management (bandwidth/routing/storage) is key

SIMD/vector architectures designed for the inner modem are not well suited
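The "log domain" remark can be illustrated with the standard Jacobian logarithm (often written max*), which lets a MAP decoder replace multiplications and exponentials by additions, a maximum and a small correction term. This is a textbook sketch, not code from FlexiTreP; the function names are assumptions.

```python
import math


def max_star(a, b):
    """Jacobian logarithm: ln(exp(a) + exp(b)) evaluated in the log domain."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """Max-Log approximation: the correction term is dropped."""
    return max(a, b)


print(max_star(1.0, 2.0), max_log(1.0, 2.0))
```

Each update costs only a few additions and a compare, so the arithmetic is cheap. Moving state metrics and extrinsic values between memories and the datapath is the more expensive part.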

Basic Architecture

Application specific instruction set processors (ASIP)

Efficient support of control and dataflow

Design time e.g. Tensilica, LISATek, ARC

Processors with dynamically reconfigurable Hardware

– Loose coupling of processor with an FPGA

– Reconfigurable array as functional unit in pipeline (e.g. XiRISC)

10

FlexiTreP: Flexible Trellis Processor

Exploit programmability

Simple programming model

Decoding algorithms e.g. Log-MAP, Viterbi (control flow)

Exploit hardware reconfigurability (data management)

Fast context switching

Multi-context instructions: simplify instructions and reduce program size

(similar to the ADRES architecture)

Partially dynamically reconfigurable ASIP

Specific application: application knowledge is key

Full ASIP approach i.e. no predefined configurable pipeline template

„Just enough flexibility“: energy efficiency

Assembler code


11

FlexiTreP Features

Supports all trellis-based decoding techniques in current standards

Binary turbo decoding

– Constraint length between 3 and 5

– Arbitrary generator and feedback polynomials

– Rates down to 1/7

– Interleaver table loadable

Duobinary turbo decoding

– Constraint lengths 4 and 5

– Arbitrary generator and feedback polynomials

– Rates down to 1/3

Binary MAP and Viterbi decoding

– Constraint length between 5 and 9

– Arbitrary generator and feedback polynomials

– Rates down to 1/4
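The supported constraint lengths translate directly into trellis sizes. The sketch below assumes the usual convention that a binary code with constraint length K has K-1 memory elements, which matches the pairs quoted later in the talk (Kc=5 with 16 states, Kc=7 with 64 states). The function names are illustrative.

```python
def num_states(constraint_length):
    """Trellis states of a binary convolutional code with K-1 memory elements."""
    return 2 ** (constraint_length - 1)


def num_butterflies(constraint_length):
    """Each butterfly connects two predecessor states to two successor states."""
    return num_states(constraint_length) // 2


for k in (3, 5, 7, 9):
    print(k, num_states(k), num_butterflies(k))
```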

12

FlexiTreP configured for Turbo Decoding

Channel code structure specified in DRCCC

– NSC/RSC, constraint length, BMU/butterfly assignment… (red boxes)

Decoding algorithm programmable in Software


13

DRCCC

Dynamic reconfigurable channel code control (DRCCC): a look-up table (LUT)

Controls the dataflow in the datapath pipeline

Controls the address and data width of the memories

Several configurations can be stored for fast context switching (shadow LUT)

Each configuration memory contains 383 bits

Simplifies programming and reduces instruction length

– Multi-context processing

– 24 bits instead of 68 bits

Saves power and area, and improves throughput
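To get a feel for the effect of the shorter instruction word, the sketch below compares program memory sizes. It assumes the 512-word program memory quoted in the area results of this talk; the constant and function names are illustrative.

```python
PROGRAM_WORDS = 512        # program memory depth quoted in the area results
WIDTH_WITH_DRCCC = 24      # bits per instruction with multi-context processing
WIDTH_WITHOUT_DRCCC = 68   # bits per instruction without it


def program_memory_bits(words, width):
    """Total program memory size in bits."""
    return words * width


saved_fraction = 1 - WIDTH_WITH_DRCCC / WIDTH_WITHOUT_DRCCC
print(program_memory_bits(PROGRAM_WORDS, WIDTH_WITH_DRCCC))
print(program_memory_bits(PROGRAM_WORDS, WIDTH_WITHOUT_DRCCC))
print(f"{saved_fraction:.1%} fewer instruction bits")
```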

14

Multi context processing example

Operand routing for the ACS recursion varies under partially parallel processing

– E.g. 64 states: 16 states processed concurrently -> 4 steps

Control for operand shuffling is stored in the DRCCC for each trellis step

– Instructions only specify the trellis step

– The proper context is loaded into the pipeline

Butterfly calculation (ACS recursion)
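The idea can be sketched in software as a table of operand-routing contexts indexed by trellis step. This is a behavioural illustration only. The state numbering (new bit shifted in at the least significant position), the use of `max` as the select operation, and all names are assumptions, not the FlexiTreP datapath.

```python
def build_contexts(num_states=64, parallel=16):
    """One context per trellis step: predecessor pairs for each target state."""
    half = num_states // 2
    contexts = []
    for step in range(num_states // parallel):
        targets = range(step * parallel, (step + 1) * parallel)
        # With next = ((s << 1) | u) mod num_states, the two predecessors of
        # target t are t >> 1 and (t >> 1) | half.
        contexts.append([(t >> 1, (t >> 1) | half) for t in targets])
    return contexts


def acs_step(metrics, branch_metric, contexts, step):
    """Add-compare-select for the states covered by one trellis step."""
    context = contexts[step]  # the instruction only names `step`
    base = step * len(context)
    new_metrics = []
    for offset, (p0, p1) in enumerate(context):
        target = base + offset
        new_metrics.append(max(metrics[p0] + branch_metric(p0, target),
                               metrics[p1] + branch_metric(p1, target)))
    return new_metrics


contexts = build_contexts()
print(len(contexts), contexts[0][:2])
```

Because the shuffle pattern lives in the table, the instruction word only needs the step index, which is the software analogue of the shorter instruction format.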


15

FlexiTreP configured for Viterbi decoding

16

Assembler code examples

MAP (2 Windows, blocklength 20)

Reconf
….
RPT ->STD_WIN #2
ldSMR (0)
RPT ->FW2 #10
fwdrec 3,3,1
fwdrec 3,3,1
FW2:
modCVA 57
modRDA 19
ldSMR (127)
RPT ->AQ2 #10
bwdacq -3,3
bwdacq -3,3
AQ2:
bwdrecllr (19)
RPT ->BW2 #9
bwdrecllr -1
bwdrecllr -1
BW2:
bwdrecllr (0)
STD_WIN:

64-state Viterbi (block length 26)

Reconf
RPT ->LOOP_END #26
ldSMR -3,1
VA1 3,3,+4
ldSMR -1,-1
VA2 +2
ldSMR -4,-1
VA3 +5
ldSMR -4,1
VA4 +5
LOOP_END:
RPT ->TB_END #13
VATB
VATB
TB_END:

Each command takes 1 clock cycle

Zero-overhead loop control
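With one cycle per instruction and zero-overhead loops, the cycle count of the Viterbi listing can be estimated by hand. The sketch below makes several assumptions: each listed instruction issues in its own cycle, the forward loop body consists of the eight ldSMR/VAn instructions, and reconfiguration and pipeline latency are ignored. The 400 MHz clock is the figure quoted in the synthesis results of this talk.

```python
F_CLK_HZ = 400e6  # clock frequency quoted in the synthesis results


def loop_cycles(repeats, body_instructions):
    """Zero-overhead loop: repeats times body length, one cycle per instruction."""
    return repeats * body_instructions


block_length = 26
forward = loop_cycles(26, 8)    # RPT ->LOOP_END #26 around 4 x (ldSMR, VAn)
traceback = loop_cycles(13, 2)  # RPT ->TB_END #13 around 2 x VATB
total_cycles = forward + traceback
throughput_mbps = block_length * F_CLK_HZ / total_cycles / 1e6

print(total_cycles, round(throughput_mbps, 1))
```

Under these assumptions the estimate lands in the same range as the roughly 40 Mbps reported for Kc=7 (64 states).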


17

Synthesis and Performance Results

Synthesis with 65nm low-power standard cell library

– Area: 73 Kgates (~0.15 mm2)

– 400 MHz clock frequency

– CC throughput (w/o IO): ~190 Mbps @ Kc=5/16 states, ~40 Mbps @ Kc=7/64 states, ~10 Mbps @ Kc=9/256 states

– TC throughput up to 19 Mbps @ 5 iterations (w/o IO)

UMTS TC comparison

– XiRisc: ~0.1 Mbps @ 100 MHz @ 130 nm

– Optimized Tensilica: ~0.4 Mbps @ 133 MHz @ 104 Kgates @ 180 nm

– ENST ASIP: ~4.4 Mbps @ 335 MHz @ 93 Kgates @ 90 nm

Comparison SODA <-> FlexiTreP

– UMTS (2 Mbps service, Turbo): 540 MIPS <-> 50 MIPS

– WLAN (24 Mbps service, Kc=7, Viterbi): 398 MIPS <-> 240 MIPS

18

Area Results/Memories

Logic: 73 Kgates

Memories (for full support of W-CDMA/UMTS)

– Interleaver: (5120*13) [≈ 35.4 Kgates]

– Channel values: 2*(4096*12) [≈ 2 * 26.1 Kgates]

– A priori: 4*(2048*8) [≈ 4 * 8.7 Kgates]

– LIFO: (128*48) [≈ 3.9 Kgates]

– State metric (DP): 2*(128*96) [≈ 2 * 13.1 Kgates]

– Program memory: (512*24) [≈ 6.5 Kgates]

– Total memory: 159 Kgates

Logic and memories together occupy less than 0.5 mm2 in 65nm technology
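The memory total can be cross-checked from the individual entries. The sketch below copies the figures from the list above into a dictionary. The tuple layout and names are illustrative.

```python
memories = {
    # name: (instances, words, bits per word, Kgates per instance)
    "Interleaver": (1, 5120, 13, 35.4),
    "Channel values": (2, 4096, 12, 26.1),
    "A priori": (4, 2048, 8, 8.7),
    "LIFO": (1, 128, 48, 3.9),
    "State metric (DP)": (2, 128, 96, 13.1),
    "Program memory": (1, 512, 24, 6.5),
}

total_kgates = sum(n * kg for n, _, _, kg in memories.values())
total_bits = sum(n * w * b for n, w, b, _ in memories.values())
logic_kgates = 73

print(round(total_kgates, 1), total_bits, round(logic_kgates + total_kgates, 1))
```

The memories account for roughly twice the gate-equivalent area of the logic, which fits the talk's point that data management, not arithmetic, drives the design.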


19

Reconfigurable FlexiTreP Array

[Figure: array of 16 dr-ASIP nodes, each attached to the network through router (R) and interface (IF) blocks]

2D mesh topology, "dimension-order routing", input-queued router with two virtual channels
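Dimension-order routing on a 2D mesh can be sketched in a few lines: a packet first travels along one dimension until that coordinate matches, then along the other. The sketch below uses X-then-Y order and (x, y) tuples as node addresses. Both are assumptions for illustration, and queues and virtual channels are not modelled.

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing: travel along X first, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```

Because every packet resolves the dimensions in the same fixed order, the route is deterministic and each router needs only a coordinate comparison to pick the output port.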

20

Standards/Flexibility

[Table repeated from slide 8: Standard, Codes, Rates, States, Block sizes and Throughput per channel for GSM, EDGE, UMTS, CDMA-2k, IEEE802.11 (WLAN), Hiperlan, IEEE802.16 (WiMax), Inmarsat, DVB-T/H and HSDPA]


21

LDPC Architecture Overview

[Figure: block diagrams of the LDPC decoder architecture variants. Panel titles: Two-Phase, Two-Phase + PN branch, Single Phase, Layered, Combined. Blocks shown: Permutation Network Π and Π−1, ZigZag Network, Controller, Address RAM, Permutation RAM, Channel RAM, IN Msg RAM, PN Msg RAM, Msg RAM, Sum RAM 1, Sum RAM 2, FIFO, CNB, VNB, CNP, VNP]

Algorithm and implementation complexity strongly depend on

– Code structure, code rates, flexibility

Data management problem even worse than for trellis-based decoders

– Factor 5 compared to TC

Memory cuts determine area and throughput

– Large complexity, e.g. 802.11n WiFi standard ~ 500 Kgates

22

UKL LDPC Decoder Implementations

LDPC code             DVB-S2         DVB-S2          WiMax (802.16e)   WiFi (802.11n)    U-S LDPC
Codeword size         64800          64800           576-2304          648, 1296, 1944   9600
Code rate             1/4-9/10       1/4-9/10        1/2-5/6           1/2-5/6           3/4
Parallelism           90             360             24-96             27-81             80
Quantization          6 bit
Algorithm             3-Min                                                              MinSum+MSF/Lay.
Architecture          1-phase                        Combined          1-phase           Layered

Area [mm2], 65nm @ 400 MHz (U-S LDPC: @ 528 MHz)
VNP                   0.130          0.217           0.110             0.096             0
CNP                   0.328          1.200           0.470             0.395             0.212
Network               0.046          0.270           0.206             0.065             0.027
Memory                3.357          4.428           0.551             0.467             0.265
Overall area          3.861          6.115           1.337             1.023             0.504

Net throughput        60-708 Mbps    0.23-2.68 Gbps  48-333 Mbps       54-281 Mbps       1.63 Gbps
Max. efficiency       183 Mbps/mm2   430 Mbps/mm2    250 Mbps/mm2      274 Mbps/mm2      3.2 Gbps/mm2
Latency               270-82 µs      69-21 µs        6.0-5.7 µs        6.0-5.8 µs        4.4 µs
Infobit/cycle         0.15-1.77      0.58-6.70       0.12-0.83         0.14-0.70         3.08

Max. iterations (values as extracted; column assignment not recoverable): 7, 25-20, 25-20, 50-15

Additional label extracted with the WiFi (802.11n) column: PN branch
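The "Max. Efficiency" row is consistent with dividing the maximum net throughput by the overall area. A small Python check using three of the decoders listed on this slide (names are illustrative; the slide values appear to be rounded, so small deviations are expected):

```python
decoders = {
    # name: (max net throughput [Mbps], overall area [mm^2])
    "DVB-S2, parallelism 90": (708, 3.861),
    "WiMax (802.16e)": (333, 1.337),
    "WiFi (802.11n)": (281, 1.023),
}


def efficiency(throughput_mbps, area_mm2):
    """Area efficiency in Mbps per mm^2."""
    return throughput_mbps / area_mm2


for name, (tput, area) in decoders.items():
    print(f"{name:24s} {efficiency(tput, area):6.1f} Mbps/mm2")
```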


23

Conclusion

ASIP for Trellis based Decoding

Combining application-specific instruction-set programmability with dynamic hardware reconfigurability provides a very good trade-off between flexibility and performance

Fabrication on 65nm Technology in 2007

– October 2007

– Energy measurements

Multiprocessor solution for High-Throughput CC and TC decoding

Flexible LDPC decoder implementation is the next big challenge

Efficient memory sharing

E.g. different architectures dynamically reloadable

Consideration of reliability issues in the platform design

24

Publications and Cooperations

Publications

– A Reconfigurable Outer Modem Platform for Future Communications Systems. T. Vogt, C. Neeb and N. Wehn. Dagstuhl Seminar "Dynamically Reconfigurable Architectures", Dagstuhl Seminar Proceedings 06141, April 2006, Dagstuhl, Germany.

– A Reconfigurable Multi-Processor Platform for Convolutional and Turbo Decoding. T. Vogt, C. Neeb and N. Wehn. Reconfigurable Communication-centric SoCs (ReCoSoC) 2006, Montpellier, France.

– Channel Decoding in Software Defined Radio. N. Wehn. MPSoC 2006, August 2006, Estes Park, Colorado, USA.

– A Reconfigurable Application Specific Instruction Set Processor for Viterbi and Log-MAP Decoding. T. Vogt and N. Wehn. IEEE Workshop on Signal Processing (SIPS'06), pages 142-147, October 2006, Banff, Canada.

Cooperations

– Mini-workshop "Applications and Compilers for Coarse-Grained Reconfigurable Architectures" (Prof. Becker, Prof. Rosenstiel)

– Prof. Dr. J. Teich (Universität Erlangen-Nürnberg)


Thanks for your attention

Priority Programme (Schwerpunktprogramm) 1148 "Reconfigurable Computing Systems"

Follow-up colloquium of the second funding period in Lübeck