Ziria: Wireless Programming for Hardware Dummies
Božidar Radunović, joint work with
Gordon Stewart, Mahanth Gowda, Geoff Mainland, Dimitrios Vytiniotis
http://research.microsoft.com/en-us/projects/ziria/
2
Layout Introduction Ziria Programming Language Compilation and Execution Case Study - WiFi Design Conclusions
3
Motivation – why this course?
Lots of innovation in PHY/MAC design Popular experimental platform: GNURadio
Relatively easy to program but slow, no real network deployment
Modern wireless PHYs require high-rate DSP Real-time platforms [SORA, WARP, …]
Achieve protocol processing requirements, difficult to program, no code portability, lots of low-level hand-tuning
4
Issues for wireless researchers CPU platforms (e.g. SORA)
Manual vectorization, CPU placement Cache / data sizing optimizations
FPGA platforms (e.g. WARP) Latency-sensitive design, difficult for new students/researchers to
break into
Portability/readability Manually highly optimized code is difficult to read and maintain Also: practically impossible to target another platform
Difficulty in writing and reusing code
hampers innovation
5
Hardware Platforms FPGA: Programmer deals with hardware issues
WARP, Airblue CPUs: SORA bricks [MSR Asia], GNURadio blocks
SORA was a huge breakthrough, design of RX/TX with PCI interface, 16Gbps throughput, ~ μs latency
Very efficient C++ library We build on top of SORA
Many other options now available: E.g. http://myriadrf.org/
6
What is wrong with current tools?
7
Current SDR Software Tools Portable (FPGA/CPU), graphical interface:
Simulink, LabView
CPU-based: C/C++/Python GnuRadio, SORA
Control and data separation CodiPhy [U. of Colorado], OpenRadio [Stanford]:
Specialized languages (DSL): Stream processing languages: StreamIt [MIT] DSLs for DSP/arrays, Feldspar [Chalmers]: we put more emphasis on
control Spiral
8
Issues Programming abstraction is tied to execution model Programmer has to reason about how the program will be executed/optimized while writing the code
Verbose programming Shared state Low-level optimization
We next illustrate these on Sora code examples (other platforms have similar problems)
9
Running example: WiFi receiver
removeDC
DetectCarrier
ChannelEstimation
InvertChannel
Packetstart
Channel info
Decode Header
InvertChannel
Decode Packet
Packetinfo
10
How do we execute this on CPU?
removeDC
DetectCarrier
ChannelEstimation
InvertChannel
Packetstart
Channel info
Decode Header
InvertChannel
Decode Packet
Packetinfo
11
Dataflow streaming abstractions
Events (messages) come in
Events (messages) come out
Why unsatisfactory? It does not expose:
(1) When is vertex state (re-)initialized?
(2) Under which external “control” messages can the vertex change behavior?
(3) How can a vertex transmit “control” information to other vertices?
Predominant abstraction today [e.g. SORA, StreamIt, GnuRadio] is that of a “vertex” in a dataflow graph
Reasonable as abstraction of the execution model Unsatisfactory as programming and compilation model
12
Shared state
static inline
void CreateDemodGraph11a_40M (ISource*& srcAll, ISource*& srcViterbi, ISource*& srcCarrierSense)
{
    CREATE_BRICK_SINK   (drop,  TDropAny,        BB11aDemodCtx );
    CREATE_BRICK_SINK   (fsink, TBB11aFrameSink, BB11aDemodCtx );
    CREATE_BRICK_FILTER (desc,  T11aDesc,        BB11aDemodCtx, fsink );
    typedef T11aViterbi <5000*8, 48, 256> T11aViterbiComm;
    CREATE_BRICK_FILTER (viterbi, T11aViterbiComm::Filter,     BB11aDemodCtx, desc );
    CREATE_BRICK_FILTER (vit0,    TThreadSeparator<>::Filter,  BB11aDemodCtx, viterbi );
    // 6M
    CREATE_BRICK_FILTER (di6, T11aDeinterleaveBPSK,  BB11aDemodCtx, vit0 );
    CREATE_BRICK_FILTER (dm6, T11aDemapBPSK::filter, BB11aDemodCtx, di6 );
    …
    …
    CREATE_BRICK_SINK   (plcp,      T11aPLCPParser,        BB11aDemodCtx );
    CREATE_BRICK_FILTER (sviterbik, T11aViterbiSig,        BB11aDemodCtx, plcp );
    CREATE_BRICK_FILTER (dibpsk,    T11aDeinterleaveBPSK,  BB11aDemodCtx, sviterbik );
    CREATE_BRICK_FILTER (dmplcp,    T11aDemapBPSK::filter, BB11aDemodCtx, dibpsk );
    CREATE_BRICK_DEMUX5 (sigsel, TBB11aRxRateSel, BB11aDemodCtx, dmplcp, dm6, dm12, dm24, dm48 );
    CREATE_BRICK_FILTER (pilot, TPilotTrack,          BB11aDemodCtx, sigsel );
    CREATE_BRICK_FILTER (pcomp, TPhaseCompensate,     BB11aDemodCtx, pilot );
    CREATE_BRICK_FILTER (chequ, TChannelEqualization, BB11aDemodCtx, pcomp );
    CREATE_BRICK_FILTER (fft,   TFFT64,               BB11aDemodCtx, chequ );
    CREATE_BRICK_FILTER (fcomp, TFreqCompensation,    BB11aDemodCtx, fft );
    CREATE_BRICK_FILTER (dsym,  T11aDataSymbol,       BB11aDemodCtx, fcomp );
    CREATE_BRICK_FILTER (dsym0, TNoInline,            BB11aDemodCtx, dsym );
Callout: shared state
13
Separation of control and data
void Reset() {
    Next0()->Reset();
    // No need to reset all path, just reset the path we used in this frame
    switch (data_rate_kbps) {
    case 6000:
    case 9000:
        Next1()->Reset();
        break;
    case 12000:
    case 18000:
        Next2()->Reset();
        break;
    case 24000:
    case 36000:
        Next3()->Reset();
        break;
    case 48000:
    case 54000:
        Next4()->Reset();
        break;
    }
}
Resetting whoever* is downstream
* we don’t know who that is when we write this component
14
Verbosity
DEFINE_LOCAL_CONTEXT(TBB11aRxRateSel, CF_11RxPLCPSwitch, CF_11aRxVector );
template<TDEMUX5_ARGS>
class TBB11aRxRateSel : public TDemux<TDEMUX5_PARAMS>
{
    CTX_VAR_RO (CF_11RxPLCPSwitch::PLCPState, plcp_state );
    CTX_VAR_RO (ulong, data_rate_kbps ); // data rate in kbps
public:
    …..
public:
    REFERENCE_LOCAL_CONTEXT(TBB11aRxRateSel);
    STD_DEMUX5_CONSTRUCTOR(TBB11aRxRateSel)
        BIND_CONTEXT(CF_11RxPLCPSwitch::plcp_state, plcp_state)
        BIND_CONTEXT(CF_11aRxVector::data_rate_kbps, data_rate_kbps)
    {}
- Declarations are written in host language
- Language is not specialized, so often verbose
- Hinders fast prototyping
15
SDR manual optimizations (LUT)
struct _init_lut {
    void operator()(uchar (&lut)[256][128]) {
        int i,j,k;
        uchar x, s, o;
        for ( i=0; i<256; i++) {
            for ( j=0; j<128; j++) {
                x = (uchar)i;   // input byte
                s = (uchar)j;   // 7-bit shift-register state
                o = 0;          // output byte
                for ( k=0; k<8; k++) {
                    uchar o1 = (x ^ (s) ^ (s >> 3)) & 0x01;
                    s = (s >> 1) | (o1 << 6);
                    o = (o >> 1) | (o1 << 7);
                    x = x >> 1;
                }
                lut [i][j] = o;
            }
        }
    }
};
Hand-written bit-fiddling code to create lookup tables for specific computations that must run very fast
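To make the table construction easier to follow, here is a minimal Python port of the loop on this slide. It is only a sketch for reading along: the function name `build_scrambler_lut` and the module-level `LUT` are names chosen here, not Sora identifiers, and the port assumes the same 256 x 128 layout (input byte, 7-bit state).

```python
def build_scrambler_lut():
    """Port of the slide's _init_lut loop.

    lut[x][s] is the output byte obtained by pushing the 8 bits of input
    byte x (LSB first) through the bit-level update, starting from the
    7-bit register value s.
    """
    lut = [[0] * 128 for _ in range(256)]
    for i in range(256):
        for j in range(128):
            x, s, o = i, j, 0
            for _ in range(8):
                o1 = (x ^ s ^ (s >> 3)) & 0x01   # one output bit
                s = (s >> 1) | (o1 << 6)          # shift the register, insert o1 at the top
                o = (o >> 1) | (o1 << 7)          # collect output bits, LSB first
                x >>= 1                           # move to the next input bit
            lut[i][j] = o
    return lut


# Built once; afterwards one byte is processed with a single table lookup.
LUT = build_scrambler_lut()
```

The point of the slide still stands in the port: the table is cheap to use but the code that fills it has to be written and kept correct by hand.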
?
16
Vectorization
removeDC
DetectCarrier
ChannelEstimation
InvertChannel
Packetstart
Channel info
Decode Header
InvertChannel
Decode Packet
Packetinfo
- Beneficial to process items in chunks
- But how large can chunks be?
17
My Own Frustrations Implemented several PHY algorithms in FPGA
Never been able to reuse them: Complexity of interfacing (timing and precision) was higher than rewriting!
Implemented several PHY algorithms in Sora
Better reuse but still difficult Spent 2h figuring out which internal state variable I hadn’t initialized when I borrowed a piece of code from another project.
I want tools that allow me to write reusable code and incrementally build ever more complex systems!
18
Improving this situation
New wireless programming platform
1. Code written in a high-level language
2. Compiler deals with low-level code optimization
3. Same code compiles on different platforms (not there just yet!)
Challenges
1. Design PL abstractions that are intuitive and expressive
2. Design efficient compilation schemes (to multiple platforms)
What is special about wireless
1. … that affects abstractions: large degree of separation b/w data and control
2. … that affects compilation: need high-throughput stream processing
19
Our Choice: Domain Specific Language What are domain-specific languages? Examples:
Make SQL
Benefits: Language design captures specifics of the task This enables compiler to optimize better
20
Why is wireless code special? Wireless = lots of signal processing Control vs data flow separation Data processing elements:
FFT/IFFT, Coding/Decoding, Scrambling/Descrambling Predictable execution and performance, independent of data
Control flow elements: Header processing, rate adaptation
21
Programming model
removeDC
DetectCarrier
ChannelEstimation
InvertChannel
Packetstart
Channel info
Decode Header
InvertChannel
Decode Packet
Packetinfo
22
What do we want code to look like? Example: IEEE 802.11a scrambler: S(x) = x^7 + x^4 + 1
Ziria:
x <- take;
do {
  tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
  scrmbl_st[0:5] := scrmbl_st[1:6];
  scrmbl_st[6] := tmp;
  y := x ^ tmp
};
emit (y)
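For readers who want to run the scrambler, the Ziria body above can be mirrored bit for bit in Python. This is a sketch, not Ziria output: the function name `scramble` and the optional `state` argument are chosen here, and the all-ones initial state is borrowed from the later scrambler slides.

```python
def scramble(bits, state=None):
    """Bit-level scrambler mirroring the Ziria snippet.

    bits  : iterable of 0/1 input bits
    state : optional 7-element list of 0/1; defaults to all ones
    """
    st = list(state) if state is not None else [1] * 7
    out = []
    for x in bits:                 # x <- take
        tmp = st[3] ^ st[0]        # tmp := scrmbl_st[3] ^ scrmbl_st[0]
        st[0:6] = st[1:7]          # scrmbl_st[0:5] := scrmbl_st[1:6] (inclusive ranges in Ziria)
        st[6] = tmp                # scrmbl_st[6] := tmp
        out.append(x ^ tmp)        # y := x ^ tmp; emit y
    return out
```

Because the keystream does not depend on the data, running `scramble` twice from the same initial state returns the original bits, which is how the receiver descrambles.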
23
What do we not want to optimize? We assume efficient DSP libraries:
FFT Viterbi/Turbo decoding
Same are used in many standards: WiFi, WiMax, LTE
This is readily available: FPGA (Xilinx, Altera) DSP (coprocessors) CPUs (Volk, Sora libraries, Spiral)
Most of PHY design is in connecting these blocks
24
Layout Introduction Ziria Programming Language Compilation and Execution Case Study - WiFi Design Conclusions
25
Ziria: A 2-layer design Lower layer
Imperative C-like code for manipulating bits, bytes, arrays, etc. NB: You can plug-in any C function in this layer
Higher layer A monadic language for specifying and staging stream processors Enforces clean separation between control and data flow, clean state
semantics
Runtime implements low-level execution model
Monadic pipeline staging language facilitates aggressive compiler optimizations
26
Ziria: control-aware stream abstractions
A stream transformer t, of type: ST T a b
  inStream (a) -> t -> outStream (b)
A stream computer c, of type: ST (C v) a b
  inStream (a) -> c -> outStream (b), plus outControl (v)
27
Staging a pipeline, in diagrams
c1
t1
t2
t3
C T
repeat { v <- (c1 >>> t1) ; t2 >>> t3 }
“Vertical composition” (along data path -- “arrows”)
“Horizontal composition” (along control path -- “monads”)
28
Running example: WiFi Scrambler
let comp scrambler() =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp: bit;
  var y: bit;

  repeat seq {
    x <- take;

    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp;
    };

    emit y
  }
in ...
29
let comp scrambler() =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp: bit;
  var y: bit;

  repeat seq {
    x <- take;

    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp;
    };

    emit y
  }
in <rest of the code>
Start defining computational method
End defining computational method
30
let comp scrambler() =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp: bit;
  var y: bit;

  repeat seq {
    x <- take;

    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp;
    };

    emit y
  }
in ...
Local variables
Types:
- Bit
- Array of bits
Constants
31
let comp scrambler() =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp: bit;
  var y: bit;

  repeat seq {
    x <- take;

    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp;
    };

    emit y
  }
in ...
Special-purpose computers:
32
let comp scrambler() =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp: bit;
  var y: bit;

  repeat seq {
    x <- take;

    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp;
    };

    emit y
  }
in ...
Imperative (C/Matlab-like) code:
33
let comp scrambler() =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp: bit;
  var y: bit;

  repeat seq {
    x <- take;

    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp;
    };

    emit y
  }
in ...
Diagram labels: repeat, take, do, emit, x, y
Computers and transformers
34
Whole program
Read >>> do_something >>> write
Reads and writes can come from RF, IP, file, dummy
35
Computation language primitives Define control flow Two groups:
Transformers Computers
36
Transformers
Map:
let f(x : int) =
  var y : int := 42;
  y := y + 1;
  return (x+y);
in
read >>> map f >>> write
Repeat:
let comp f() =
  seq { x <- take;
        if (x > 0) then
          emit 1
      }
in
read >>> repeat f() >>> write
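The two transformers on this slide can be approximated with Python generators. This is a loose model for intuition only, not how Ziria executes code: `ziria_map`, `ziria_repeat`, `take`, `EndOfStream` and `positive_flag` are names invented here, and the stream is a plain Python iterable rather than a `read`/`write` pair.

```python
class EndOfStream(Exception):
    """Raised by take() when the input stream is exhausted (modelling aid only)."""


def take(it):
    """Consume one item from the input, like `x <- take`."""
    try:
        return next(it)
    except StopIteration:
        raise EndOfStream from None


def ziria_map(f, stream):
    """`map f`: apply a pure function to every item of the stream."""
    for x in stream:
        yield f(x)


def ziria_repeat(computer, stream):
    """`repeat c`: run the computer c, and restart it every time it finishes."""
    it = iter(stream)
    try:
        while True:
            yield from computer(it)
    except EndOfStream:
        return


def f(x):
    """The map example: y is a fresh local on every call, so f(x) == x + 43."""
    y = 42
    y = y + 1
    return x + y


def positive_flag(it):
    """The repeat example: take one item and emit 1 only if it is positive."""
    x = take(it)
    if x > 0:
        yield 1
```

For example, `list(ziria_map(f, [1, 2]))` gives `[44, 45]`, and `list(ziria_repeat(positive_flag, [3, -1, 0, 5]))` gives `[1, 1]`, since only two inputs are positive.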
37
Computers While:
while (!crc > 0) {
x <- take;
do {crc = search(x);}
}
If-then-else:
if (rate == CR_12) then
emit enc12(x);
else
emit enc23(x);
Also: take, emit, for
38
Expression language – data processing Mix of C and Matlab Can be directly linked to any C function Subset of data types (mainly fixed point):
<basetype> ::= bit | bool | double | int | int8 | int16 | int32 | complex | complex16 | complex32 | struct TYPENAME | arr <basetype> | arr[INTEGER] <basetype> | arr[length(VARNAME)] <basetype>
39
Expression language - example
let build_coeff(pcoeffs:arr[64] complex16, ave:int16, delta:int16) =
  var th:int16;
  th := ave - delta * 26;
  for i in [64-26, 26] {
    pcoeffs[i] := complex16{re=cos_int16(th);im=-sin_int16(th)};
    th := th + delta
  };
  th := th + delta;
  for i in [1,26] {
    pcoeffs[i] := complex16{re=cos_int16(th);im=-sin_int16(th)};
    th := th + delta
  }
in
Callouts:
Array (equivalent to [64-26:64])
Fixed-point complex numbers
External C function
Function
40
Libraries
Ziria header:
let external v_sub_complex32(c:arr complex32, a:arr[length(c)] complex32, b:arr[length(c)] complex32 ) : () in
C method:
int __ext_v_add_complex32(struct complex32* c, int len, struct complex32* a, int __unused_2, struct complex32* b, int __unused_1)
Libraries (mainly linked to existing Sora libraries): SIMD instructions, FFT and Viterbi, fixed-point trigonometry, visualisation
41
Frequently Asked Questions Why define a new language? Why not use C/Matlab/<your favourite language>?
How do you share state? Why use let x = 20+3*z in instead of x := 20 + 3*z;?
Why x <- take and not x := take?
42
Question: How do you implement teleport message?
Diagram blocks: Frequency mixing, Equalizer, Decoding
new_freq reconfiguration message
43
Answer: Use repeat to reinitialize in the new state
let processor() =
  var new_freq := X; // initialize
  repeat {
    ret <- ( freq_mixing(new_freq)
             >>> equalizer
             >>> decoding
           ) ;
    do { new_freq := ret }
  }
Diagram blocks: repeat around Freq_mixing, Equalizer, Decoding
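The repeat-based answer can be modelled in a few lines of Python. This is a sketch under assumptions made here: the whole `freq_mixing >>> equalizer >>> decoding` pipeline is collapsed into one hypothetical generator `pipeline`, and reconfiguration requests are represented as `("retune", f)` items in the input, which is not how the slide's blocks signal it.

```python
class EndOfStream(Exception):
    """Raised when the input runs out (modelling aid only)."""


def pipeline(freq, it):
    """Hypothetical stand-in for freq_mixing(freq) >>> equalizer >>> decoding.

    Emits (freq, sample) pairs. When a ("retune", f) item is decoded, the
    pipeline terminates and returns f as its control value.
    """
    for kind, value in it:
        if kind == "retune":
            return value
        yield (freq, value)
    raise EndOfStream


def processor(stream, initial_freq):
    """repeat { ret <- pipeline(new_freq); do { new_freq := ret } }"""
    it = iter(stream)
    new_freq = initial_freq
    try:
        while True:
            # Each iteration rebuilds the pipeline from scratch with the new
            # frequency, so no stale state survives the reconfiguration.
            new_freq = yield from pipeline(new_freq, it)
    except EndOfStream:
        return
```

With the input `[("data", 1), ("retune", 5), ("data", 2), ("data", 3)]` and an initial frequency of 2, the processor emits `(2, 1)`, then `(5, 2)` and `(5, 3)`.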
44
Layout Introduction Ziria Programming Language Compilation and Execution Case Study - WiFi Design Conclusions
45
How to write a compiler? Haskell + libraries
Parsing, code generation, flexible types, pattern matching
First version in <2 months Easily extendible Moral: compilers can be a useful tool!
46
Compilation – High-level view Expression language -> C code Computation language -> Execution model Numerous optimizations on the way:
Vectorization Lookup tables Conventional optimizations: Folding, inlining, …
47
Execution model: How to execute code?
removeDC
DetectCarrier
ChannelEstimation
InvertChannel
Packetstart
Channel info
Decode Header
InvertChannel
Decode Packet
Packetinfo
Runtime
Actions: tick(), process(x)
Return values: YIELD (data_val), SKIP, DONE (control_val)
Diagram: blocks B1 and B2, each driven through tick() and process(x)
Q: Why do we need ticks?
A: Example: emit 1; emit 2; emit 3
49
Execution model - example
let comp test1() =
repeat{
(x:int) <- take;
emit x;
}
in
tick()
SKIP
tick()
SKIP
tick()
YIELD(n)
read[int] >>> test1() >>> test1() >>> write[int]
process(n)
YIELD(n)
process(n)
process(n)DONE(n)
process(n)
YIELD(n)
50
Runtime main loop
L1: t.init()              // init top-level component
L2: whatis := t.tick()
L3: if (whatis == Yield b) then { put_buf(b); goto L2 }
    else if (whatis == Skip) then goto L2
    else if (whatis == Done) then exit()
    else if (whatis == NeedInput) then { c = get_buf(); whatis := t.process(c); goto L3; }
In reality:
• Very few function calls with a CPS-based translation: every “process” function knows its continuation
• Optimizations: never tick components with trivial tick(), never generate process() for tick()-only components
• Only indirection is for bind: at different points in time, function pointers point to the correct “process” and “tick”
• Slightly different approach to input/output
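The main loop above translates almost line by line into Python. The sketch below is illustrative only: the tags are plain strings, `Component` is a made-up block that first emits a fixed prefix without consuming input (the `emit 1; emit 2; emit 3` case that motivates ticks) and then copies a given number of inputs, and `run` stops when the input buffer is empty, which the slide does not specify.

```python
YIELD, SKIP, DONE, NEED_INPUT = "YIELD", "SKIP", "DONE", "NEED_INPUT"


class Component:
    """Hypothetical block: emit a fixed prefix, then copy `copies` inputs, then finish."""

    def __init__(self, prefix, copies):
        self.pending = list(prefix)
        self.copies = copies

    def tick(self):
        # Output that needs no input is produced on tick().
        if self.pending:
            return (YIELD, self.pending.pop(0))
        if self.copies == 0:
            return (DONE, None)
        return (NEED_INPUT, None)

    def process(self, x):
        # Output that depends on an input item is produced on process(x).
        self.copies -= 1
        return (YIELD, x)


def run(t, inputs):
    """Runtime main loop: tick until input is needed, then feed one item."""
    in_buf, out_buf = list(inputs), []
    whatis = t.tick()
    while True:
        tag, value = whatis
        if tag == YIELD:
            out_buf.append(value)          # put_buf(b)
            whatis = t.tick()
        elif tag == SKIP:
            whatis = t.tick()
        elif tag == DONE:
            return out_buf
        else:                              # NEED_INPUT
            if not in_buf:
                return out_buf             # assumption: stop when input is exhausted
            whatis = t.process(in_buf.pop(0))
```

Running `run(Component([1, 2, 3], 2), [10, 20, 30])` yields `[1, 2, 3, 10, 20]`: three items come out on ticks alone before the first input is consumed.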
51
How about performance?
let comp test1() =
  repeat {
    (x:int) <- take;
    emit x + 1;
  }
in
read[int] >>> test1() >>> test1() >>> write[int]

(((read >>> let auto_map_6(x: int32) = x + 1 in {map auto_map_6}) >>>
   let auto_map_7(x: int32) = x + 1 in {map auto_map_7}) >>> write)

buf_getint32(pbuf_ctx, &__yv_tmp_ln10_7_buf);
__yv_tmp_ln11_5_buf = auto_map_6_ln2_9(__yv_tmp_ln10_7_buf);
__yv_tmp_ln12_3_buf = auto_map_7_ln2_10(__yv_tmp_ln11_5_buf);
buf_putint32(pbuf_ctx, __yv_tmp_ln12_3_buf);
52
Type-preserving transformations
let block_VECTORIZED (u: unit) =
  var y: int;
  repeat
    let vect_up_wrap_46 () =
      var vect_ya_48: arr[4] int;
      (vect_xa_47 : arr[4] int) <- take1;
      __unused_174 <- times 4 (\vect_j_50.
        (x : int) <- return vect_xa_47[0*4+vect_j_50*1+0];
        __unused_1 <- return y := x+1;
        return vect_ya_48[vect_j_50*1+0] := y);
      emit vect_ya_48
    in vect_up_wrap_46 (tt)

let block_VECTORIZED (u: unit) =
  var y: int;
  repeat
    let vect_up_wrap_46 () =
      var vect_ya_48: arr[4] int;
      (vect_xa_47 : arr[4] int) <- take1;
      emit
        let __unused_174 =
          for vect_j_50 in 0, 4 {
            let x = vect_xa_47[0*4+vect_j_50*1+0] in
            let __unused_1 = y := x+1 in
            vect_ya_48[vect_j_50*1+0] := y
          }
        in vect_ya_48
    in vect_up_wrap_46 (tt)
Dataflow graph iteration converted to tight loop! In this case we got a 3x speedup
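The effect of this transformation can be seen in a small Python comparison. It is a sketch of the idea only: `scalar_block` and `vectorized_block` are names chosen here, the width of 4 matches the `arr[4] int` in the compiler output, and the returned take count is a stand-in for execution-model overhead rather than a measured cost.

```python
def scalar_block(xs):
    """One take and one emit per item, as in the original block."""
    out, takes = [], 0
    for x in xs:
        takes += 1
        out.append(x + 1)
    return out, takes


def vectorized_block(xs, width=4):
    """One take and one emit per chunk of `width` items, with a tight inner loop."""
    if len(xs) % width:
        raise ValueError("input length must be a multiple of the vector width")
    out, takes = [], 0
    for base in range(0, len(xs), width):
        vect_xa = xs[base:base + width]    # (vect_xa : arr[4] int) <- take
        takes += 1
        vect_ya = [0] * width
        for j in range(width):             # for vect_j in 0, 4 { ... }
            y = vect_xa[j] + 1
            vect_ya[j] = y
        out.extend(vect_ya)                # emit vect_ya
    return out, takes
```

Both versions produce the same output stream, which is what "type-preserving" and respecting the operational semantics require; only the number of interactions with the runtime changes (8 takes versus 2 for eight items).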
53
Transformations on Abstract Syntax Tree One of the main benefits of a compiler Computation optimizations:
Vectorization, inlining, convert to map, optimize tick/process, …
Expression optimization: Lookup tables, inlining, calculate constant expressions, unroll loops, …
Tests: Array boundary checks
Example AST for y := x[0,10]:
seq
  y :=
    EArrRead
      x (TArr)
      EVal (VInt 0)
      EVal (VInt 10)
54
Flexible compiler design Example: x[0,length(x)] -> x
subarr_inline_step e
  | EArrRead evals estart (LILength n) <- unExp e
  , EVal (VInt 0) <- unExp estart
  , TArr (Literal m) _ <- info evals
  , n == m
  = rewrite evals
Easy to add new transformations
expressionOptimization function
If expression e is an EArrRead on array evals of static size m, with start estart == 0 and length (LILength n) where n == m, rewrite it to evals
In the example: evals => x, estart => 0, (LILength n) => length(x)
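The same rewrite rule can be sketched outside the compiler. The Python below uses a toy AST invented for this example (`Var`, `IntLit`, `ArrRead`); it is not the Ziria compiler's data type, it only mirrors the guard structure of `subarr_inline_step`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Array variable with a statically known length (stands in for TArr (Literal m))."""
    name: str
    length: int


@dataclass(frozen=True)
class IntLit:
    """Integer literal (stands in for EVal (VInt n))."""
    value: int


@dataclass(frozen=True)
class ArrRead:
    """Slice read arr[start, length] (stands in for EArrRead)."""
    arr: object
    start: object
    length: int


def subarr_inline_step(e):
    """Rewrite x[0, length(x)] to x; leave every other expression unchanged."""
    if (
        isinstance(e, ArrRead)
        and isinstance(e.start, IntLit)
        and e.start.value == 0
        and isinstance(e.arr, Var)
        and e.arr.length == e.length
    ):
        return e.arr
    return e
```

For instance, `subarr_inline_step(ArrRead(Var("x", 64), IntLit(0), 64))` returns `Var("x", 64)`, while a read of only the first 10 elements is left as it is.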
55
Vectorization Idea: batch processing over multiple data items
repeat {(x:int)<-take; emit x} -> repeat {(x:arr[64] int)<-take; emit x}
Modifications of the execution model: Possible since the execution model is not hardcoded in the code We need to respect the operational semantics
Benefits: LUT: bits -> bytes Lower overhead of the execution model (ticks/processes) Faster memcpy Better cache locality
Vectorization Challenges
[Figure: WiFi TX pipeline after ParseHeader (Len, Rate), annotated with input/output widths per block. As far as the labels can be recovered: for rate == 6 Mbps the branch is CRC -> scrambler (1 bit -> 1 bit) -> ½ encoder (1 bit -> 2 bit) -> interleaver (48 bit -> 48 bit) -> BPSK (1 bit -> 1 complex); the other branch is CRC -> scrambler (1 bit -> 1 bit) -> ¾ encoder (3 bit -> 4 bit) -> interleaver (288 bit -> 288 bit) -> 64 QAM (6 bit -> 1 complex). Further copies of the figure show candidate vectorized widths for the same blocks, e.g. 8 bit -> 8 bit, 4 bit/8 bit, 24 bit/32 bit, 8 bit -> 8 complex, 12 bit -> 2 complex.]
Look-up Table (LUT) Optimizations Key optimization for Sora TX Identify blocks of expressions that transform data and have limited input and output size
Replace them with a LUT Similar to FPGA compilation
Especially beneficial for bit operations
58
LUT Optimizations (by example)
let comp scrambler() =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp,y: bit;
  repeat {
    (x:bit) <- take;
    do {
      tmp := (scrmbl_st[3] ^ scrmbl_st[0]);
      scrmbl_st[0:5] := scrmbl_st[1:6];
      scrmbl_st[6] := tmp;
      y := x ^ tmp
    };
    emit (y)
  }

Vectorization

let comp v_scrambler () =
  var scrmbl_st: arr[7] bit := {'1,'1,'1,'1,'1,'1,'1};
  var tmp,y: bit;
  var vect_ya_26: arr[8] bit;
  let auto_map_71(vect_xa_25: arr[8] bit) =
    LUT for vect_j_28 in 0, 8 {
      vect_ya_26[vect_j_28] :=
        tmp := scrmbl_st[3]^scrmbl_st[0];
        scrmbl_st[0:+6] := scrmbl_st[1:+6];
        scrmbl_st[6] := tmp;
        y := vect_xa_25[0*8+vect_j_28]^tmp;
        return y
    };
    return vect_ya_26
  in map auto_map_71

Automatic lookup-table-compilation
Input-vars = scrmbl_st, vect_xa_25 = 15 bits
Output-vars = vect_ya_26, scrmbl_st = 2 bytes
IDEA: precompile to LUT of 2^15 * 2 = 64 KB
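The size calculation on this slide can be checked with a small Python model of the precompiled table. This is a sketch, not what the Ziria compiler generates: the key layout (input byte in the upper 8 bits, state in the lower 7), the LSB-first bit order and all function names are assumptions made for the example.

```python
def to_bits(value, n):
    """Integer -> list of n bits, least significant bit first."""
    return [(value >> k) & 1 for k in range(n)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(b << k for k, b in enumerate(bits))


def scramble_bits(state, xbits):
    """Bit-level reference: returns (new_state, output_bits)."""
    st, ys = list(state), []
    for x in xbits:
        tmp = st[3] ^ st[0]
        st[0:6] = st[1:7]
        st[6] = tmp
        ys.append(x ^ tmp)
    return st, ys


def build_lut():
    """2^15 entries: 7-bit state + 8-bit input -> (new 7-bit state, 8-bit output)."""
    lut = [None] * (1 << 15)
    for key in range(1 << 15):
        state = to_bits(key & 0x7F, 7)
        xbits = to_bits(key >> 7, 8)
        new_state, ys = scramble_bits(state, xbits)
        lut[key] = (from_bits(new_state), from_bits(ys))
    return lut


def scramble_with_lut(lut, data_bytes):
    """Scramble whole bytes with one table lookup each."""
    state, out = 0x7F, []              # all-ones initial state
    for b in data_bytes:
        state, y = lut[(b << 7) | state]
        out.append(y)
    return out


LUT = build_lut()
```

Each entry holds 7 + 8 = 15 bits of output, which fits in 2 bytes, so the table occupies 2^15 * 2 bytes = 64 KB as stated above.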
59
Question How to implement bit permutation as LUT? (idea from SORA):
out_arr := perm({0,2,3,1}, in_arr);
in_arr = {1,2,3,4} -> out_arr = {1,3,4,2}
Hint: permutation indices are constants!
60
Answer
let perm(p : arr int, iarr : arr bit) =
var oarr : arr[length(p)] bit;
var oarrtmp : arr[length(p)] bit;
var iarr1 : arr[8] bit;
unroll for j in [0,length(p)/8] {
let p1 = p[j*8,8];
iarr1 := iarr[j*8,8];
perm8(p1,iarr1,oarrtmp);
oarr := v_or(oarr,oarrtmp);
}
return oarr;
in
let perm8(p : arr[8] int, iarr : arr[8] bit, oarr : arr bit) =
for i in [0,8] {
oarr[p[i]] := iarr[i];
}
in
iarr
oarr
LUT size: 2^8 * sizeof(oarr)
Constants!
out_arr := perm({0,2,3,1}, in_arr);
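Because the permutation indices are compile-time constants, the 8-bit case reduces to a 256-entry table. The Python below sketches that for a single 8-bit block only; it does not cover the multi-block `perm` with `v_or`. `perm_bits` and `build_perm8_lut` are names chosen here, and the code follows the scatter convention of `perm8` in the answer, `oarr[p[i]] := iarr[i]`.

```python
def perm_bits(p, iarr):
    """Scatter permutation, as in perm8: oarr[p[i]] := iarr[i]."""
    oarr = [0] * len(p)
    for i, dst in enumerate(p):
        oarr[dst] = iarr[i]
    return oarr


def build_perm8_lut(p):
    """Precompute the permutation of every possible 8-bit input (bits LSB first)."""
    if sorted(p) != list(range(8)):
        raise ValueError("p must be a permutation of 0..7")
    lut = []
    for value in range(256):
        bits = [(value >> k) & 1 for k in range(8)]
        out = perm_bits(p, bits)
        lut.append(sum(b << k for k, b in enumerate(out)))
    return lut
```

One detail worth noting: under this scatter convention `perm_bits([0, 2, 3, 1], [1, 2, 3, 4])` gives `[1, 4, 2, 3]`, whereas the `{1,3,4,2}` on the question slide corresponds to the gather convention `out[i] = in[p[i]]`. Either convention can be precompiled in the same way.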
61
Supporting different HW architectures Work in progress… SMP vs FPGA vs ASIC Pipeline and data parallelism SIMD, coprocessors (DSP or ASIC)
62
Pipeline parallelism
ofdm |>>>| decode >>> packetize
ofdm >>> write(q1) >>> read(q1) >>> decode >>> packetize
Thread 1, pin to Core 1: ofdm >>> write(q1)
Thread 2, pin to Core 2: read(q1) >>> decode >>> packetize
Sync queue
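The split at `|>>>|` can be imitated with two Python threads joined by a bounded queue. This only illustrates the structure: `run_two_stage` and the sentinel `_DONE` are invented for the example, `stage1` and `stage2` stand in for `ofdm` and `decode >>> packetize`, and Python threads are neither pinned to cores nor truly parallel for CPU-bound work.

```python
import queue
import threading

_DONE = object()   # sentinel marking the end of the stream


def run_two_stage(inputs, stage1, stage2, queue_size=64):
    """Run stage1 and stage2 in separate threads connected by a sync queue."""
    q1 = queue.Queue(maxsize=queue_size)
    results = []

    def producer():                       # Thread 1: stage1 >>> write(q1)
        for x in inputs:
            q1.put(stage1(x))
        q1.put(_DONE)

    def consumer():                       # Thread 2: read(q1) >>> stage2
        while True:
            item = q1.get()
            if item is _DONE:
                break
            results.append(stage2(item))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a single producer and a single consumer the FIFO queue preserves order, so `run_two_stage(range(5), lambda x: x * 2, lambda y: y + 1)` returns `[1, 3, 5, 7, 9]`.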
63
Code Examples
64
Performance evaluation
- WiFi RX and TX measurements

RX, Msample*/sec (*sample = 32 bits)
              SORA   Ziria   WiFi
RX 6Mbps       164     156     40
RX 12Mbps      125     100     40
RX 24Mbps       81      67     40
RX 48Mbps       61      52     40
RX CCA         289     163     40

TX, Mbps
              SORA   Ziria   WiFi
TX 6Mbps        54      51      6
TX 12Mbps       98      45     12
TX 24Mbps      145      53     24
TX 48Mbps      231      70     48
65
Real-time LTE-like demo
66
Status Released to GitHub under Apache 2.0 WiFi implementation included in release Currently supports SORA platform Essential dependency on CPU/SIMD Looking into porting to other CPU-based SDRs
67
Layout Introduction Ziria Programming Language Compilation and Execution Case Study - WiFi Design Conclusions
68
Motivation Wireless architecture and design are fragmented:
EE considers PHY and parts of MAC CS considers MAC and above Opportunities for synergies missed: Q: How to change PHY to allow better/simpler network designs?
69
Examples: HARQ and/or rateless codes: don’t care about rate adaptation, just keep sending
Correlation and detection: detection takes time => huge MAC overheads and terrible efficiency
Channel impulse response and localization: more precise location information
70
Conventional WiFi design PHY: standardized and cannot be changed
data pipe, correct bits getting in and out
MAC: innovation happen Conventional: CSMA Many alternatives
71
Conventional cellular design PHY and MAC are standardized
Several modes of operations but none can be changed by applications
IP layer and above: innovations can happen
72
In reality: We have standard blocks:
Correlators, scramblers, coders/decoders, interleavers, FFT/IFFT
Why not allow building network from these standard blocks? Respect certain ground rules Allow innovations
73
How to design a wireless transceiver? Challenges: performance, complexity, (cost/patents)
CDMA: Uses pseudo-random code against multi-path Not as complicated to implement as OFDM based systems Difficult to equalise the overall wide spectrum
OFDM: Uses subcarriers to combat multi-path with greater robustness and less complexity. OFDMA can achieve higher spectral efficiency with MIMO than CDMA
Rule of thumb: For data rates >= 10 Mbps use OFDM
74
OFDM OFDM used in most of the contemporary PHYs: WiFi, LTE, WiMax, 60 GHz, UWB
OFDMA is OFDM variant used in cellular (LTE, WiMax): Multiple users sharing the same OFDM symbol MAC scheduling at sub-symbol level Blurred distinction between MAC and PHY
75
WiFi TX Overview OFDM transmitter (@54Mbps):
emits createSTSinTime(); emits createLTSinTime();
crc216(h.len) >>> scrambler() >>> encode34() >>> interleaver_m64qam() >>> modulate_64qam() >>> add_pilots() >>> ifft()
Preamble for detection and channel estimation
CRC (with padding) to check for errors Scrambler prevents high peaks Interleaver decouples errors
76
IFFT example
77
OFDM Symbol Add pilots, perform IFFT and add cyclic prefix
78
Channel impulse response
79
Effect on OFDM symbol in time
80
Effect on OFDM symbol in frequency
81
Channel Estimation
82
Channel inversion/equalization
83
How to deal with multi-path? Add cyclic prefix
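Why the cyclic prefix helps can be checked numerically. With a prefix at least as long as the channel memory, the received symbol (after the prefix is dropped) is the circular convolution of the transmitted symbol with the channel, so in the frequency domain each subcarrier is simply scaled: Y[k] = H[k] X[k]. The sketch below uses a plain-Python DFT and made-up values for the symbol and channel taps; all function names are chosen here.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (fine for tiny examples)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def add_cyclic_prefix(symbol, cp_len):
    """Prepend the last cp_len time-domain samples of the symbol."""
    return symbol[-cp_len:] + symbol


def multipath_channel(tx, taps):
    """Causal FIR channel: rx[i] = sum over m of taps[m] * tx[i - m]."""
    return [
        sum(taps[m] * tx[i - m] for m in range(len(taps)) if i - m >= 0)
        for i in range(len(tx))
    ]


def remove_cyclic_prefix(rx, cp_len):
    return rx[cp_len:]


# Illustrative time-domain OFDM symbol and a 3-tap multipath channel.
symbol = [1 + 0j, -1 + 0j, 1j, -1j, 1 + 1j, 0.5 + 0j, -1 - 1j, 2 + 0j]
taps = [1.0, 0.5, 0.25]
cp_len = 2                      # >= len(taps) - 1

X = dft(symbol)
H = dft(taps + [0.0] * (len(symbol) - len(taps)))

rx_cp = remove_cyclic_prefix(
    multipath_channel(add_cyclic_prefix(symbol, cp_len), taps), cp_len
)
error_with_cp = max(abs(y - h * x) for y, h, x in zip(dft(rx_cp), H, X))

rx_no_cp = multipath_channel(symbol, taps)
error_without_cp = max(abs(y - h * x) for y, h, x in zip(dft(rx_no_cp), H, X))
```

Here `error_with_cp` is at floating-point rounding level, while `error_without_cp` is clearly nonzero: without the prefix the per-subcarrier model breaks, and one-tap equalization no longer recovers the data.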
84
Missing Cyclic Prefix
85
Scrambler
Avoids large peak-to-avg power ratios
86
WiFi RX Overview
let comp receiver() =
  seq { (removeDC() >>> t<-detectSTS())
      ; params <- ChannelEstimation()
      ; removeCP() >>> FFT() >>> ChannelEqualization(params) >>> PilotTrack() >>> RemovePilots() >>> receiveBits()
      }
in
87
Pilots Channel estimation changes across OFDM symbols Channel changes Drift in oscillators between sender and receiver
Pilots are used for channel re-estimation Similar to initial channel estimation Interpolation for data points
88
Carrier Sensing and Synchronization Find where packet starts Accurate timing needed for the rest of RX Also used to estimate CFO Also used in carrier sensing
89
Preamble
90
Detection using correlation Correlate for a known preamble
How do we implement this in CPU/SIMD?
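Before thinking about SIMD, the detector can be written as a scalar reference. The Python below is a sketch of cross-correlation against a known preamble with a normalized threshold; the preamble values, the threshold and the function names are illustrative choices, not the 802.11 short training sequence or the Sora/Ziria implementation.

```python
def correlate(samples, preamble):
    """Sliding cross-correlation of the samples with the conjugated preamble."""
    n = len(preamble)
    return [
        sum(samples[i + k] * preamble[k].conjugate() for k in range(n))
        for i in range(len(samples) - n + 1)
    ]


def detect(samples, preamble, threshold=0.9):
    """Return the first offset where the correlation magnitude reaches
    threshold * preamble energy, or None if no such offset exists."""
    energy = sum(abs(p) ** 2 for p in preamble)
    for i, c in enumerate(correlate(samples, preamble)):
        if abs(c) >= threshold * energy:
            return i
    return None


# Illustrative 5-sample preamble embedded at offset 3 in an otherwise silent signal.
PREAMBLE = [1 + 0j, -1 + 0j, 1 + 0j, 1 + 0j, -1 + 0j]
SIGNAL = [0j, 0j, 0j] + PREAMBLE + [0j, 0j]
```

`detect(SIGNAL, PREAMBLE)` returns 3. The inner sum is a multiply-accumulate over consecutive samples, which is the part a CPU/SIMD implementation would process several complex samples at a time.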
91
Detection
92
Performance of Detector
93
Layout Introduction Ziria Programming Language Compilation and Execution Case Study - WiFi Design Conclusions
94
Conclusions Wireless innovations will happen at intersections of PHY and MAC levels
We need prototypes and test-beds to evaluate ideas
PHY programming in its infancy Difficult, limited portability and scalability Steep learning curve, difficult to compare and extend previous works
Wireless programming is easy and fun – go for it!
http://research.microsoft.com/en-us/projects/ziria/