SODA: A Low-power Architecture For Software Radio
Authors: Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge (Advanced Computer Architecture Laboratory, University of Michigan at Ann Arbor); Chaitali Chakrabarti (Department of Electrical Engineering, Arizona State University); Krisztián Flautner (ARM, Ltd.)
Presenters: Wei Miao, Jingcheng Wang

Upload: solomon-brooks

Post on 17-Dec-2015


Page 1:

SODA: A Low-power Architecture For Software Radio

Author: Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge

Advanced Computer Architecture Laboratory

University of Michigan at Ann Arbor

Chaitali Chakrabarti

Department of Electrical Engineering

Arizona State University

Krisztián Flautner

ARM, Ltd.

Presenters: Wei Miao, Jingcheng Wang

Page 2:

Overview
- Introduction on SDR
- Behavior model and design tradeoff
- Architecture analysis
- Performance analysis
- Summary

Page 3:

INTRODUCTION AND ANALYSIS

Wei Miao

Page 4:

Basic introduction on SODA
- Signal-processing On-Demand Architecture
- Supports software radio
- 4 cores, containing asymmetric pipelines
- Meets the requirements of 2 Mbps WCDMA and 24 Mbps 802.11a

Page 5:

Introduction on SDR
- Software Defined Radio (SDR)
- Decode different signals on a single processor

Page 6:

Why SDR?
- Easy to implement and update
- Multi-mode operation
- Prototyping and bug fixes
- Shorter time to develop

[Figure: a single SDR platform covering UWB, EDGE, 802.16a, Bluetooth, 802.11b, WCDMA, and 802.11n. (Picture from Lin, ISCA’06 slides)]

Page 7:

Challenges of SDR
- Need to achieve high throughput
- Power limitation

Page 8:

Wireless protocols behavior
- Feed-forward, multiple kernels
- Low but heterogeneous requirement for inter-kernel communication
- Real-time deadlines
- Heavy data parallelism
- 8-16 bit data width
- Scalar and vector operations

Page 9:

Design Tradeoff
- Concurrent execution vs. single-context execution
- Static multi-core scheduling vs. multi-threading
- Vector vs. SIMD vs. VLIW

Page 10:

SODA ARCHITECTURE AND RESULTS

Jingcheng Wang

Page 11:

SODA System Architecture
- 4 PEs: static kernel mapping and scheduling; SIMD + scalar units
- 1 ARM GPP controller: scalar algorithms and protocol controls

[Figure: SODA system architecture. Diagram labels: ARM, four PEs (each with LocalMem and ExecutionUnit), GlobalMem; PE detail: SIMD RF, SIMD MEM, scalar RF, scalar MEM, WtoS & StoW, DMA, Scalar ALU, SIMD ALU. (From Lin, ISCA’06 slides)]

Page 12:

SODA PE Architecture

[Figure: PE block diagram. Diagram labels: SIMD pipeline (32-way SIMD; per-lane RF, 16-bit ALU, 16-bit multiplier; ID, EX, WB stages), 32x16bit SSN, SIMD Memory (8KB), Scalar pipeline (Scalar RF, 16-bit ALU), Scalar Memory (4KB), AGU pipeline (AGU RF, 12-bit AGU ALU), Inst. Mem. (4KB), Vector to Scalar (Stage 1 and Stage 2), Scalar to Vector, DMA, 16bit BUS, 512bit. (From Lin, ISCA’06 slides)]

- 2-issue LIW (400 MHz): SIMD + (Scalar or AGU)
- DMA:
  - mem-to-mem transfer
  - access global memory

Page 13:

SODA PE Scalar Pipeline

[Figure: the PE block diagram from Page 12, annotated for the scalar pipeline. (From Lin, ISCA’06 slides)]

- Scalar:
  - One 16-bit datapath
  - No mult unit
- Scalar memory:
  - 16-bit port
  - 1 read/write port
  - 4 KBytes
- Scalar-to-Vector
- Vector-to-Scalar

Page 14:

SODA PE SIMD Pipeline

[Figure: the PE block diagram from Page 12, with a detail view of one SIMD lane. (From Lin, ISCA’06 slides)]

- RF: 16-bit, 16 entries, 2 read / 1 write ports
- EX: 16-bit multiplier, 40-bit ACC, 16-bit ALU
- WB: 16-bit

Page 15:

SODA PE SIMD Pipeline

[Figure: the PE block diagram from Page 12, annotated for the SIMD pipeline and SIMD memory. (From Lin, ISCA’06 slides)]

- SIMD:
  - 32 wide
  - predicated exec.
  - predicated neg.
- Memory:
  - 512-bit port
  - 1 read port
  - 1 write port
  - 8 KBytes
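A minimal sketch of a per-lane conditional negate, assuming "predicated neg." on the slide means predicated negation across the 32 SIMD lanes. The names `predicated_neg` and `wrap16` and the alternating predicate pattern are illustrative, not taken from the SODA instruction set.

```python
LANES = 32  # SIMD width stated on the slide


def wrap16(x):
    """Wrap an integer to signed 16-bit, matching the 16-bit lane width."""
    return (x + 0x8000) % 0x10000 - 0x8000


def predicated_neg(vec, pred):
    """Negate lane i only where pred[i] is true; other lanes pass through."""
    assert len(vec) == LANES and len(pred) == LANES
    return [wrap16(-v) if p else v for v, p in zip(vec, pred)]


samples = list(range(LANES))
# Hypothetical predicate: negate every odd lane.
code_bits = [i % 2 == 1 for i in range(LANES)]
out = predicated_neg(samples, code_bits)
```

An operation of this shape would allow multiplying a vector by a sequence of +1/-1 values without using the multiplier, which is one plausible reason for it to appear in a wireless-oriented SIMD unit.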

Page 16:

SODA PE SIMD Shuffle Network

[Figure: the PE block diagram from Page 12, annotated for the SIMD shuffle network (SSN). (From Lin, ISCA’06 slides)]

SIMD Shuffle Network
- Shuffle Exchange (SE)
- Inverse Shuffle Exchange (ISE)
- Exchange Only (EX)
- Iterative Feedback
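The permutations named on this slide can be sketched with their textbook definitions. This is an illustration on plain Python lists, assuming a perfect shuffle that interleaves the two halves of the vector; it does not model how the SODA SSN is wired or controlled, and the function names are hypothetical.

```python
def shuffle(v):
    """Perfect shuffle: interleave the lower and upper halves of v."""
    half = len(v) // 2
    out = []
    for a, b in zip(v[:half], v[half:]):
        out += [a, b]
    return out


def inverse_shuffle(v):
    """Inverse perfect shuffle: even positions first, then odd positions."""
    return v[0::2] + v[1::2]


def exchange(v, swap):
    """Swap lanes (2k, 2k+1) wherever swap[k] is true."""
    out = list(v)
    for k, s in enumerate(swap):
        if s:
            out[2 * k], out[2 * k + 1] = out[2 * k + 1], out[2 * k]
    return out


lanes = list(range(32))

# Iterative feedback: pass the vector through the shuffle repeatedly.
fed_back = lanes
for _ in range(5):  # log2(32) passes rotate the 5-bit lane index fully
    fed_back = shuffle(fed_back)
```

A shuffle followed by an exchange gives one shuffle-exchange stage. Feeding the output back through the same stage several times is how a single-stage network can build up more general permutations.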

Page 17:

W-CDMA Mapping On SODA

[Figure: W-CDMA physical layer processing, between the upper layers and the frontend (D/A, A/D). Transmitter: channel encoder, interleaver, spreader, scrambler, LPF-Tx. Receiver: LPF-Rx, searcher, descrambler/despreader pairs feeding a combiner, deinterleaver, channel decoder (turbo/viterbi).]

[Figure: mapping of the WCDMA transmitter and receiver kernels onto the ARM, the four PEs, and global memory. Kernel labels: 2 LPF-Rx, 4 LPF-Rx, Scrambler, Spreader, Turbo Encoder, Interleaver, De-scrambler, Despreader, Combiner, Searcher, Turbo Decoder, De-interleaver, Misc. Control, Power Control, PN Code TX/RX. Buffer labels: 1K Bytes, 2K Bytes, 10 Bytes, 20 KBytes, and a 12.5 KByte FIFO queue. (From Lin, ISCA’06 slides)]

Page 19:

SIMD Design and Tradeoffs
- 40 GOPS required
- In a 4-PE system, 10 GOPS in each PE
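A back-of-the-envelope check of the per-PE target against the PE figures given on the earlier slides (32-wide SIMD, 400 MHz). The assumption that each lane completes one operation per cycle is mine, not stated in the slides, so the peak figure is only a rough upper bound.

```python
REQUIRED_GOPS = 40.0  # total requirement from the slide
NUM_PES = 4           # PEs in the system
SIMD_LANES = 32       # SIMD width from the PE slides
CLOCK_HZ = 400e6      # LIW clock from the PE slides

# Per-PE share of the requirement, as stated on the slide.
per_pe_required = REQUIRED_GOPS / NUM_PES

# Assumption: one operation per lane per cycle in the SIMD pipeline.
per_pe_peak = SIMD_LANES * CLOCK_HZ / 1e9

# Fraction of SIMD peak each PE would need to sustain under that assumption.
required_fraction = per_pe_required / per_pe_peak

print(f"required per PE: {per_pe_required:.1f} GOPS")
print(f"assumed SIMD peak per PE: {per_pe_peak:.1f} GOPS")
print(f"fraction of peak needed: {required_fraction:.2f}")
```

Under this assumption the 10 GOPS target sits below the SIMD peak of a single PE, which leaves some headroom for cycles in which the SIMD unit is not fully used.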

Page 20:

Low-power Design
- Clustered register files with 2 read ports and 1 write port
- Fewer memory read/write ports
- Smaller instruction fetch logic

Page 21:

Experiment Methodology
- Area and power estimates calculated using an RTL Verilog model
- Synthesized using Synopsys Physical Compiler and the TSMC 180nm library
- Memories generated by the Artisan SRAM generator
- 90nm and 65nm results estimated using a quadratic scaling factor
- Dynamic power estimated from behavioral simulation on the authors' system simulator
- Leakage power estimated at 30% of the total power
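A rough sketch of the two estimation steps, under stated assumptions. The slide does not say exactly how the quadratic factor was applied, so this assumes a value scales with the square of the feature-size ratio. The leakage step follows from leakage being 30% of total power, so dynamic power is the remaining 70%. The input values are placeholders, not results from the paper.

```python
BASE_NM = 180  # process node used for synthesis, from the slide


def quadratic_scale(value_at_base, target_nm, base_nm=BASE_NM):
    """Scale a value by (target/base)^2. Assumed form of the scaling."""
    return value_at_base * (target_nm / base_nm) ** 2


def total_power_from_dynamic(dynamic_mw, leakage_fraction=0.30):
    """If leakage is a fixed fraction of total, total = dynamic / (1 - f)."""
    return dynamic_mw / (1.0 - leakage_fraction)


# Placeholder inputs for illustration only.
scaled_90 = quadratic_scale(100.0, 90)
scaled_65 = quadratic_scale(100.0, 65)
total_mw = total_power_from_dynamic(70.0)
```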

Page 22:

Performance results

Page 23:

Power and area results
- Typical cellular phone power for the physical layer: ~200 mW

Page 24:

Discussion Points
1. The authors only synthesized the core in TSMC 180nm and estimated the area and power at 90nm and 65nm. Is it fair to claim that the architecture meets the requirement?
2. The authors claim to reduce the CDMA search algorithm from 26.5 Gops on a general-purpose (GP) processor to 200 Mops on SODA, mainly because of SIMD execution. Is SIMD the only and main speedup factor? Is the paper novel enough?
3. Utilization of the 4 PEs is 60%, 50%, 100% and 94% respectively. Can it do better?
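The numbers in these discussion points can be put side by side with a few lines of arithmetic. This is only a framing aid: comparing the operation-count reduction to the 32-lane SIMD width assumes SIMD alone could contribute at most a factor equal to its width, which is my simplification rather than a claim from the paper.

```python
gp_ops = 26.5e9   # searcher cost on a GP processor, from the slide
soda_ops = 200e6  # searcher cost on SODA, from the slide
simd_width = 32   # SIMD lanes per PE, from the PE slides

reduction = gp_ops / soda_ops      # overall reduction factor
residual = reduction / simd_width  # part not explained by width alone

utilization = [0.60, 0.50, 1.00, 0.94]  # per-PE utilization, from the slide
mean_utilization = sum(utilization) / len(utilization)

print(f"reduction: {reduction:.1f}x")
print(f"beyond SIMD width: {residual:.2f}x")
print(f"mean PE utilization: {mean_utilization:.0%}")
```

A reduction larger than the SIMD width would suggest that other factors, such as narrower data types or specialized operations, also contribute, which is the point the second question raises.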