The Tick Programmable Low-Latency SDR System Haoyang Wu 1 , Tao Wang 1 , Zengwen Yuan 2 , Chunyi Peng 3 , Zhiwei Li 1 , Zhaowei Tan 2 , Boyan Ding 1 , Xiaoguang Li 1 , Yuanjie Li 2 , Jun Liu 1 , Songwu Lu 2


Post on 22-May-2020


Page 1: The Tick Programmable Low-Latency SDR System (slides: web.cs.ucla.edu/~zyuan/slides/mobicom17-tick-slides.pdf)

The Tick Programmable Low-Latency SDR System

Haoyang Wu¹, Tao Wang¹, Zengwen Yuan², Chunyi Peng³, Zhiwei Li¹, Zhaowei Tan², Boyan Ding¹, Xiaoguang Li¹, Yuanjie Li², Jun Liu¹, Songwu Lu²

Page 2

Drivers for Software-Defined Radio
  New wireless standards
  Protocol evolutions
  Trending applications

New Requirements on SDR
  Low latency
  Good programmability

Page 3

Current SDR Hard to Achieve Both

Goals: low latency, good programmability

GNU Radio: latency over a millisecond
Sora: real-time 802.11a/g
WARP: real-time 802.11n
Real-time 802.11ac: hard

Page 4

Why?

PHY & MAC differ in their data and control flows

PHY
  Computation-intensive data flow: FFT, channel estimation, …
  Low-latency oriented design: ASIC, DSP, FPGA, …
  In HW, hard to program

MAC
  Complex conditional/branch control flow: retransmission, backoff, …
  Processor for programs: C programming
  Hard to satisfy MAC timing

Page 5

Achieving the Best of Both Worlds

It's not a simple trade-off!
Call for new design

Figure: design space with programmability (poor to good) on one axis and latency (high to low) on the other. Labels: GNU Radio, WARP, Sora, Ideal, PHY optimization, MAC optimization.

Page 6

Tick: A New SDR that Achieves Both

Tick uses HW/SW co-design in PHY & MAC

1. How is latency reduced?
  PHY: new parallel pipeline techniques in FPGA (HW)
  MAC: separate data and control flows (HW)

2. How is programmability retained?
  PHY & MAC: modular programming (SW)

Page 7

Low Latency at PHY: MCDP

Problem: different computing workloads across modules
  Example modules: Interleaver (bit order exchange), Synchronization (complex add, complex mul)
  One clock (clk) drives every module at 45 MHz, set by the bottleneck
  Figure values: a latency of 17 cycles is 0.37 μs at 45 MHz but 0.04 μs at 430 MHz (a 0.33 μs reduction); a latency of 1 cycle is 0.02 μs

Cause: a single clock limits latency reduction in non-bottleneck modules
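The figures on this slide follow from latency = cycles / clock frequency. A minimal sketch of that arithmetic, using only the numbers quoted above; the function and variable names are illustrative and not taken from Tick:

```python
def latency_us(cycles: int, clock_mhz: float) -> float:
    """Time in microseconds to complete `cycles` clock cycles at `clock_mhz`."""
    # 1 MHz is one cycle per microsecond, so cycles / MHz is already in microseconds.
    return cycles / clock_mhz


# Numbers quoted on the slide: 17 cycles, 45 MHz shared clock, 430 MHz faster clock.
shared_clock = latency_us(17, 45)    # the slide shows 0.37 us
faster_clock = latency_us(17, 430)   # the slide shows 0.04 us
saving = shared_clock - faster_clock # the slide shows 0.33 us

print(f"17 cycles at 45 MHz  : {shared_clock:.3f} us")
print(f"17 cycles at 430 MHz : {faster_clock:.3f} us")
print(f"latency saved        : {saving:.3f} us")
```

The point of the calculation is that a module able to run at a much higher frequency gains nothing while it is tied to the bottleneck module's clock.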

Page 8

Low Latency at PHY: MCDP

Problem: different computing workloads

MCDP: Multi-Clock-Domain Pipeline
  Each module operates with its own clock (clk1, clk2, clk3)
  Higher clock frequency, lower latency

How to match rates between different clocks?
  Async FIFO (aFIFO), with full and empty signals

Figure: Interleaver (bit order exchange) and Synchronization (complex add, complex mul) modules with aFIFOs between clock domains.
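To illustrate how an async FIFO matches rates between two clock domains, here is a toy event-driven model. It is a sketch under stated assumptions: the function name, the picosecond clock periods, and the FIFO depth are all hypothetical, and the pointer synchronisation a real async FIFO needs (Gray-coded pointers, multi-flop synchronisers) is deliberately not modelled.

```python
from collections import deque


def simulate_afifo(write_period_ps, read_period_ps, depth, n_items, horizon_ps):
    """Toy async FIFO: one push per write-clock edge unless full,
    one pop per read-clock edge unless empty."""
    fifo = deque()
    produced = 0
    received = []
    write_stalls = 0
    read_stalls = 0
    # Merge the edges of both clocks into a single time-ordered event list.
    events = [(t, "w") for t in range(0, horizon_ps, write_period_ps)]
    events += [(t, "r") for t in range(0, horizon_ps, read_period_ps)]
    events.sort()  # on a tie the read edge is handled before the write edge
    for _, kind in events:
        if kind == "w" and produced < n_items:
            if len(fifo) == depth:   # "full" asserted: the writer must wait
                write_stalls += 1
            else:
                fifo.append(produced)
                produced += 1
        elif kind == "r":
            if not fifo:             # "empty" asserted: the reader must wait
                read_stalls += 1
            else:
                received.append(fifo.popleft())
    return received, write_stalls, read_stalls


# Fast writer (about 430 MHz) feeding a slow reader (about 45 MHz).
data, w_stalls, r_stalls = simulate_afifo(2326, 22222, depth=4,
                                          n_items=20, horizon_ps=600_000)
print(data == list(range(20)), w_stalls > 0)
```

In this model every item arrives in order regardless of which side is faster; the full and empty flags simply make the faster side wait.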

Page 9

Low Latency at PHY: Field-Based Processing

Wireless frame: training fields, data fields, …
Not all fields need to traverse the entire pipeline

Figure: a frame's Training and Data fields passing through the pipeline (Scrambler, …, IFFT), with the whole-pipeline latency marked. Latencies shown: 2.49 μs and 1.86 μs.

Page 10

Low Latency at PHY: Field-Based Processing

Solution: field-based processing
  Process fields in parallel in different pipeline stages

Benefit:
  Flexible field customization with control

Figure: frame formats and pipeline stages (Scrambler, BCC, …, IFFT)
  802.11a: L-header, L-Data
  802.11ac: L-header, VHT-header, VHT-Data
  802.11ac fields: ① L-STF, ② L-LTF, ③ L-SIG, ④ VHT-SIG-A, ⑤ VHT-STF, ⑥ VHT-LTF, ⑦ VHT-SIG-B, ⑧ Data
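One way to picture field-based processing is as a routing table that tells each kind of field where to enter the pipeline. The sketch below assumes a four-stage pipeline with made-up stage latencies and a made-up routing table; none of the numbers or names come from Tick, and the SIG fields are left out for brevity.

```python
# Pipeline stages in order, with HYPOTHETICAL latencies in nanoseconds
# (illustrative values only, not measurements from Tick).
PIPELINE = [("scrambler", 100), ("bcc", 300), ("interleaver", 200), ("ifft", 500)]

# HYPOTHETICAL routing table: the first stage each kind of field must enter.
# Training fields are fixed sequences, so this sketch sends them straight to the IFFT.
ENTRY_STAGE = {"training": "ifft", "data": "scrambler"}

# Kinds assumed for some of the 802.11ac fields named on the slide.
FIELD_KIND = {
    "L-STF": "training", "L-LTF": "training",
    "VHT-STF": "training", "VHT-LTF": "training",
    "Data": "data",
}


def stages_for(kind):
    """Stages a field of this kind traverses under field-based processing."""
    names = [name for name, _ in PIPELINE]
    return names[names.index(ENTRY_STAGE[kind]):]


def latency_ns(kind, field_based=True):
    """Pipeline latency for one field; frame-based means every stage is traversed."""
    wanted = stages_for(kind) if field_based else [name for name, _ in PIPELINE]
    return sum(lat for name, lat in PIPELINE if name in wanted)


for field, kind in FIELD_KIND.items():
    print(field, stages_for(kind), latency_ns(kind), "ns")
```

With these assumed numbers a training field skips the first three stages, which is the effect the slide describes as bypassing the bottleneck.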

Page 11

Low Latency at MAC: Accelerator-Rich

Problem: data blocks the time-critical control flow
  DMA blocks for ~400 μs

Accelerator-rich design
  Offload tasks
  Separate data & control flows
  Significantly reduce latency

Figure: an embedded processor and several accelerators, with separate Control and Data paths.

Page 12

How the Processor & Accelerators Communicate

Bus-based?
  Larger latency per R/W (18%)
  Repetitive control

Why register-based?
  Smaller accumulated R/W latency
    Reduces 1.5 μs per 50 R/W operations
    Saves 10% of SIFS time
  Avoids repetitive control through shared registers

Solution: control & status registers (CSR)
  Separate CR and SR avoid rewrite conflicts

Figure labels: Embedded Processor, Global CSR Controller, Accelerator 1, Accelerator 2, Local CSR (CR, SR), shared, conflict.
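The CR/SR split can be read as a single-writer rule: only the processor writes the control register, only the accelerator writes the status register, so neither side can overwrite the other. A minimal sketch of that idea follows; the class, the bit names, and the CRC-32 accelerator step are hypothetical illustrations, not Tick's interface.

```python
import zlib


class AcceleratorCSR:
    """One control register (CR) and one status register (SR) for an accelerator."""

    START = 0x1  # hypothetical CR bit: processor asks the accelerator to run
    DONE = 0x1   # hypothetical SR bit: accelerator reports completion

    def __init__(self):
        self._cr = 0
        self._sr = 0

    # Processor side: writes CR, reads SR.
    def cpu_write_cr(self, value):
        self._cr = value

    def cpu_read_sr(self):
        return self._sr

    # Accelerator side: reads CR, writes SR.
    def acc_read_cr(self):
        return self._cr

    def acc_write_sr(self, value):
        self._sr = value


def run_crc_accelerator(csr, payload):
    """Hypothetical accelerator step: when START is set, compute CRC-32 and raise DONE."""
    if csr.acc_read_cr() & AcceleratorCSR.START:
        result = zlib.crc32(payload)
        csr.acc_write_sr(AcceleratorCSR.DONE)
        return result
    return None


csr = AcceleratorCSR()
csr.cpu_write_cr(AcceleratorCSR.START)
crc = run_crc_accelerator(csr, b"123456789")
print(hex(crc), bool(csr.cpu_read_sr() & AcceleratorCSR.DONE))

# Arithmetic behind the slide's savings, in nanoseconds.
per_access_ns = 1500 // 50       # 1.5 us saved over 50 R/W operations
sifs_fraction = 1500 / 16000     # share of a 16 us SIFS budget
print(per_access_ns, sifs_fraction)
```

The last two lines show that 1.5 μs over 50 accesses is 30 ns per access, and that 1.5 μs is a little under a tenth of a 16 μs SIFS, in line with the roughly 10% quoted on the slide.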

Page 13

Retain Programmability

Element-based design: similar to Click
  Unified interface
  User-programmable PE

Programming abstraction
  A graph of elements
  XML-based config reduces coding effort

Figure labels: Processing Engine, Storage Unit, Config, Input, Output, Control, Library.
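As an illustration of describing a graph of elements in XML, the sketch below parses a small config into an element table and a processing order. The XML schema, the element names, and the helper functions are invented for this example; Tick's actual config format is not shown in the slides.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # Python 3.9+

# HYPOTHETICAL config schema, made up for this sketch.
CONFIG = """
<pipeline>
  <element name="scrambler" type="Scrambler"/>
  <element name="bcc" type="BCCEncoder"/>
  <element name="ifft" type="IFFT"/>
  <connect from="scrambler" to="bcc"/>
  <connect from="bcc" to="ifft"/>
</pipeline>
"""


def load_graph(xml_text):
    """Return (elements, predecessors) parsed from the XML description."""
    root = ET.fromstring(xml_text.strip())
    elements = {e.get("name"): e.get("type") for e in root.findall("element")}
    preds = {name: set() for name in elements}
    for c in root.findall("connect"):
        src, dst = c.get("from"), c.get("to")
        if src not in elements or dst not in elements:
            raise ValueError(f"connection uses unknown element: {src} -> {dst}")
        preds[dst].add(src)
    return elements, preds


def processing_order(preds):
    """Order in which elements can run so that inputs come before outputs."""
    return list(TopologicalSorter(preds).static_order())


elements, preds = load_graph(CONFIG)
print(processing_order(preds))
```

Changing the pipeline then means editing the config rather than the element code, which is the kind of saving the slide attributes to XML-based configuration.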

Page 14

Implementation

Three real-time 802.11 SDR systems with Tick:
  802.11ac, SISO (first-ever)
  802.11ac, 2x2 MIMO (first-ever)
  802.11a/g, full-duplex

Public code release soon

Photo: 802.11ac SISO setup (TX, RX)

Page 15

Implementation

COTS hardware: Xilinx Kintex-7 KC705

Rich library: supporting 802.11a/g/ac
  28 PHY elements
    BPSK, QPSK, 16/64/256-QAM
    Coding rate: 1/2, 2/3, 3/4, 5/6
    Long GI, short GI, …
  12 MAC accelerators
    CRC-32
    Backoff, Timer
    MAC header processing, …

Easy-to-use APIs
  XML/GUI-based programming and config

Photo: Tick board (w/o host)

Page 16

Lessons Learnt from Implementation

Many viable choices need further fine-tuning
  PHY: eliminate jitter caused by async FIFO in MCDP
  MAC: ensure critical timing for accelerators via timestamp passing
  HW/SW co-design: choose low-latency interfaces

Page 17

Evaluation

Latency benchmark on 802.11ac SISO, 80 MHz
  Reduction compared to a reference design
  PHY: sizable reduction (2.2x)
    MCDP and field-based processing
  MAC: reduction of over two orders of magnitude
    Decoupling control and data flows

Overall evaluation
  Latency
  Programming effort
  FPGA resource usage

Page 18

Evaluation: Latency Benchmark

802.11ac SISO vs. reference design: 80 MHz, 64-QAM, 3/4 rate

PHY latency: 2.2x/1.6x reduction for RX/TX, MTU frame
  MCDP: speed up clock frequency
  Field-based processing: bypass bottleneck

MAC latency: > 497x reduction (8938 → 18 μs), MTU frame
  Reduce data processing bottleneck: CRC + frame header

Figure: Tick PHY Tx latency reduction (1.6x). Bar chart of latency (μs) for SCDP and MCDP, frame-based vs. field-based; bar labels 2.98, 2.49, 2.01, 1.86.
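The reduction factors on this slide are plain before/after ratios. A short sketch of that arithmetic, assuming the largest and smallest bar labels (2.98 μs and 1.86 μs) are the baseline and Tick's PHY Tx latency respectively; that pairing is an inference from the 1.6x caption, not stated on the slide.

```python
def reduction(before_us: float, after_us: float) -> float:
    """How many times smaller the latency became."""
    return before_us / after_us


phy_tx = reduction(2.98, 1.86)  # assumed baseline and Tick PHY Tx latency
mac = reduction(8938, 18)       # MAC latency figures quoted on the slide

print(f"PHY Tx reduction: {phy_tx:.2f}x")
print(f"MAC reduction   : {mac:.1f}x")
```

With the rounded figures shown on the slide, the MAC ratio comes out just under 497, so the quoted "> 497x" presumably rests on unrounded measurements.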

Page 19

Evaluation: Overall System

Overall latency
  End-to-end latency on host: 334 μs (WARP: 2.5 ms)
  SIFS: 14.7 μs (< 16 μs, 802.11ac)

Programmability
  Reduces lines of code (LoC) by 22.8%

Moderate FPGA resource usage
  27.9% of LUTs in Kintex-7 FPGA (65.3% for 2x2 MIMO)

Page 20

Case Study: 802.11ac 2x2 MIMO

First real-time 802.11ac MIMO prototype on SDR

Performance for 64-QAM
  Max PHY processing: Tx: 1 Gbps (1113 Mbps); Rx: 632 Mbps
  Satisfies SIFS timing requirements

Coding effort
  2x2 MIMO reuses 14/20 SISO modules

Page 21

Case Study: 802.11a/g Full-Duplex

Lightweight coding effort
  Only 8% of the original 802.11a/g code is modified
  15/22 elements are unmodified and reused

Throughput
  245 Mbps

Page 22

Conclusion

Future SDR calls for lower latency while retaining programmability

Existing designs cannot meet both for new wireless standards

Tick:
  Three-year dev effort at PKU, UCLA
  Explore HW/SW co-design techniques
  First real-time 802.11ac SDR implementation (SISO & MIMO)

Stimulate community efforts for domain-specific arch design for wireless networking

Page 23

Thanks

Visit http://metro.cs.ucla.edu/tick.html

Peking University: Haoyang Wu, Tao Wang, Zhiwei Li, Boyan Ding, Xiaoguang Li, Jun Liu
UCLA: Zengwen Yuan, Zhaowei Tan, Yuanjie Li, Songwu Lu
Purdue University: Chunyi Peng