DSP Design in Wireless Communication
LIANG LIU AND FREDRIK EDMAN
Wireless connections in a phone
Evolution of mobile communication
CDMA, OFDM, MIMO, Massive MIMO
Wideband, high-order modulation
Figure from ResearchGate
Same in other communication systems
Wider road
Multiple dimensions
Frequency
Wider bands
More bits
Multiple antennas
More passengers
Algorithms beat Moore, Moore beats chemists
[Figure: algorithmic complexity, processor performance (~Moore's Law), and battery capacity on a log scale (1 to 10,000,000) over the years 1980 to 2020, with 1G, 2G, and 3G marked. Courtesy: Ravi Subramanian (Morphics)]
Design Considerations
Flexible
High speed
Reliable
Small area
Low power
Low price
Time to market
Optimizing DSP Implementation
Functionality
Implementation Cost
Performance
Optimization
Algorithm
Architecture
Module
Circuit
CDMA
CDMA: Code Division Multiple Access
“A UMTS Baseband Receiver Chip for Infrastructure Applications”, Texas Instruments, S. Sriram et al.
Accumulator
UMTS filter in receiver system
[Block diagram: antenna → RF → ADC → reconfigurable UMTS filter → de-scramble & despread → demod → data. The filter keeps the in-band desired signal and suppresses out-of-band signals.]
“Architectural Optimization for Low Power in a Reconfigurable UMTS Filter”, Deepak Dasalukunte et al., in Proceedings of Wireless Personal Multimedia Communications Symposium (WPMC), San Diego, USA
Adaptive UMTS Filter (Algorithm level)
• Minimum 33 dB stop-band attenuation (3GPP specification)
• Required filter length of 65 taps in a bad channel
• ...but ONLY 5 taps are needed in a good one!
[Figure: filter response with 33 dB stop-band attenuation; tapped delay line built from delay elements D]
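The tap-count adaptation can be sketched as follows. This is a minimal illustration, not the filter from the cited paper: the coefficient values are made up, and `select_taps` with its `active_taps` argument is a hypothetical control that keeps only the center taps of an odd-length symmetric filter when the channel is good.

```python
def select_taps(coeffs, active_taps):
    """Keep only the `active_taps` center coefficients of an odd-length filter."""
    if active_taps % 2 == 0 or active_taps > len(coeffs):
        raise ValueError("active_taps must be odd and no longer than the filter")
    start = (len(coeffs) - active_taps) // 2
    return coeffs[start:start + active_taps]


def fir(samples, coeffs):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


# Hypothetical 65-tap coefficient set (triangular shape, for illustration only).
full = [1.0 - abs(k - 32) / 33.0 for k in range(65)]
short = select_taps(full, 5)

macs_bad_channel = len(full)    # 65 multiply-accumulates per output sample
macs_good_channel = len(short)  # 5 multiply-accumulates per output sample
```

In this sketch the good-channel configuration does 5 instead of 65 multiply-accumulates per output sample, which is where the power saving at the algorithm level would come from.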
Constant Multiplier (Module level)
[Block diagram: FIR tap stages (delay D and adders) whose coefficients Coe[0], Coe[1], ..., Coe[126], Coe[127] are selected through MUXes; the constant multiplications are implemented with shift & add]
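A shift-and-add constant multiplier replaces a general multiplier with one shifted copy of the input per set bit of the constant. The sketch below uses plain binary decomposition on integers; the function names are illustrative, and a hardware design might instead use a signed-digit recoding to reduce the adder count further.

```python
def shift_add_multiply(x, constant):
    """Multiply integer x by a non-negative integer constant using only shifts and adds."""
    if constant < 0:
        raise ValueError("constant must be non-negative in this sketch")
    result = 0
    shift = 0
    while constant:
        if constant & 1:           # this bit of the constant is set
            result += x << shift   # add the correspondingly shifted input
        constant >>= 1
        shift += 1
    return result


def adders_needed(constant):
    """Additions required: one per set bit of the constant, minus one."""
    return max(bin(constant).count("1") - 1, 0)
```

For example, multiplying by 10 (binary 1010) needs two shifted copies of the input and a single adder.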
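Symmetric Coefficient (Architecture level)
[Block diagram: folded FIR structure (delay D and adders) that exploits coefficient symmetry, with MUX-selected coefficients Coe[0], Coe[1], ... and shift & add multipliers]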
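The benefit of symmetric coefficients can be sketched in a few lines: when h[k] equals h[N-1-k], the two input samples that share a coefficient are added first and multiplied once, which roughly halves the number of multiplications. The function names below are illustrative, and the direct-form version is included only as a reference to compare against.

```python
def fir_direct(x, h):
    """Reference direct-form FIR: one multiplication per tap."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_symmetric(x, h):
    """Folded FIR for symmetric h: pre-add the paired samples, multiply once per pair."""
    n_taps = len(h)
    if any(h[k] != h[n_taps - 1 - k] for k in range(n_taps)):
        raise ValueError("coefficients must be symmetric")

    def sample(i):
        return x[i] if 0 <= i < len(x) else 0

    out = []
    for n in range(len(x)):
        acc = 0
        for k in range(n_taps // 2):
            acc += h[k] * (sample(n - k) + sample(n - (n_taps - 1 - k)))
        if n_taps % 2:                      # odd length: the center tap has no partner
            mid = n_taps // 2
            acc += h[mid] * sample(n - mid)
        out.append(acc)
    return out


def multiplications_per_output(n_taps):
    """Multiplications per output sample in the folded structure."""
    return (n_taps + 1) // 2
```

With 65 symmetric taps, the folded structure in this sketch needs 33 multiplications per output sample instead of 65.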
OFDM
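OFDM: Orthogonal Frequency Division Multiplexing
[Block diagram: frequency-domain symbols x_{0,k}, x_{1,k}, ..., x_{N-1,k} → N-point IDFT → time-domain samples s_{0,k}, s_{1,k}, ..., s_{N-1,k} → parallel-to-serial conversion → cyclic prefix (CP) insertion]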
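The transmitter chain in the diagram (N-point IDFT, parallel-to-serial conversion, cyclic prefix) can be sketched as below. This is a minimal illustration with a direct O(N^2) IDFT for readability; the function names and the 1/N scaling convention are assumptions, and a real modulator would use an IFFT.

```python
import cmath


def idft(freq_symbols):
    """N-point inverse DFT (direct form, with 1/N scaling)."""
    n = len(freq_symbols)
    return [sum(freq_symbols[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def ofdm_modulate(freq_symbols, cp_len):
    """Build one OFDM symbol: IDFT output preceded by a copy of its last cp_len samples."""
    time_samples = idft(freq_symbols)
    cyclic_prefix = time_samples[len(time_samples) - cp_len:]
    return cyclic_prefix + time_samples


# Eight QPSK symbols, one per subcarrier (illustrative data).
symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
tx = ofdm_modulate(symbols, cp_len=2)
```

The first `cp_len` samples of `tx` repeat its last `cp_len` samples, which is what makes the channel convolution appear circular at the receiver.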
FFT/IFFT in OFDM Systems
Large number of subcarriers → a large FFT, O(N log2 N)
OFDM:
• DVB: 2k/4k/8k FFT
• WLAN IEEE 802.11a/g: 64-point FFT (48+4 subcarriers)
• LTE (Long Term Evolution): 2k FFT
• 5G: 4k FFT over 200 MHz bandwidth
• 6G?
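[Butterfly diagram: inputs $X_1(k)$ and $X_2(k)$, with $X_2(k)$ multiplied by the twiddle factor $W_N^k$; outputs $X_1(k) + W_N^k X_2(k)$ and $X_1(k) - W_N^k X_2(k)$]
• Folding
• Pipeline
• Parallel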
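The butterfly above is the core operation of a radix-2 decimation-in-time FFT. The recursive sketch below shows where it appears; it is a software illustration only and says nothing about the folded, pipelined, or parallel hardware mappings, and the function name is illustrative.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    x1 = fft_radix2(x[0::2])   # X1(k): FFT of the even-indexed samples
    x2 = fft_radix2(x[1::2])   # X2(k): FFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor W_N^k
        t = w * x2[k]
        out[k] = x1[k] + t                      # X1(k) + W_N^k X2(k)
        out[k + n // 2] = x1[k] - t             # X1(k) - W_N^k X2(k)
    return out
```

Each of the log2(N) recursion levels performs N/2 butterflies, which gives the O(N log2 N) operation count.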
FFT: VLSI Architecture
FFT: VLSI Architecture
• Symmetry of twiddle factors
– $e^{jx} = \cos x + j \sin x$
– Reduces the number of stored twiddle factors by 87.5% (only one eighth of the unit circle needs to be stored)
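One way to exploit this symmetry is to store cosine and sine values for the first octant only and derive every other twiddle factor by swapping and negating them. The sketch below assumes N is a multiple of 8 and uses the convention W_N^k = exp(-j 2πk/N); the table layout and function names are illustrative, not the design from the slides.

```python
import cmath
import math


def build_octant_table(n):
    """(cos, sin) pairs for the angles 2*pi*k/n, k = 0 .. n/8 (first octant only)."""
    if n % 8:
        raise ValueError("n must be a multiple of 8")
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n // 8 + 1)]


def twiddle(k, n, table):
    """Return W_n^k = exp(-j*2*pi*k/n) using only the first-octant table."""
    quadrant, r = divmod(k % n, n // 4)
    if r <= n // 8:
        c, s = table[r]                # angle already lies in the first octant
    else:
        s, c = table[n // 4 - r]       # mirror about 45 degrees: swap cos and sin
    for _ in range(quadrant):          # rotate by 90 degrees per quadrant
        c, s = -s, c
    return complex(c, -s)
```

For N = 64 the table holds 9 entries instead of 64, roughly the 87.5% reduction quoted above (the extra entry is the 45-degree endpoint).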
FFT: Twiddle Factors (module level)
[Figure: twiddle-factor values A, B, -B, -A related by symmetry]
• Shift-add for constant multiplier
[Block diagram: ROM, control logic, and MUX between In and Out]
FFT: Twiddle Factors
MIMO
[Figure: transmit antennas, the radio channel, and receive antennas in four configurations]
• SISO: Single Input Single Output
• MISO: Multiple Input Single Output (transmit diversity)
• SIMO: Single Input Multiple Output (receive diversity)
• MIMO: Multiple Input Multiple Output (multiple data streams)
Multiple Antenna System, MIMO
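[Figure: data → S/P → Tx antennas → channel → Rx antennas; s is the transmitted vector and r is the received vector]
r = Hs + n
$$\mathbf{H} = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1N} \\ h_{21} & h_{22} & \cdots & h_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ h_{M1} & h_{M2} & \cdots & h_{MN} \end{bmatrix} \quad (M \times N)$$
$h_{ij}$ models the fading gain between the $j$th transmit and $i$th receive antenna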
MIMO System Model
[Figure: data → S/P → Tx antennas → channel → Rx antennas → high-complexity signal processing]
r = Hs + n
• Recover the transmitted signal s from the received signal r, which contains interference and noise.
Interference cancellation!
MIMO Signal Processing - Receiver
[Block diagram: data → S/P → Tx antennas → channel (r = Hs + n) → Rx antennas → receiver signal processing. Channel estimation produces Ĥ, matrix manipulation produces Ĥ^-1, and symbol detection computes ŝ = Ĥ^-1 r.]
MIMO Signal Processing - Receiver
• Channel estimation: estimate the channel state from training signals
• Matrix manipulation: matrix inversion or decomposition, depending on the detection algorithm
• Symbol detection: estimate the transmitted signal s given the channel matrix H and the received signal r
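• Linear detection: zero-forcing (ZF) detection
• Maximum Likelihood (ML) detection: exhaustive search
• 1.6×10^7 points per vector detection for 64-QAM, 4×4 MIMO
$$\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{n}$$
$$\hat{\mathbf{s}}_{zf} = \mathbf{H}^{-1}\mathbf{r} = \mathbf{s} + \mathbf{H}^{-1}\mathbf{n}$$
$$\hat{\mathbf{s}}_{ml} = \arg\max_{\mathbf{s} \in \mathcal{Q}^N} p(\mathbf{s} \mid \mathbf{r}, \mathbf{H}) = \arg\min_{\mathbf{s} \in \mathcal{Q}^N} \|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2$$
where $\mathcal{Q}$ is the constellation and the search covers $|\mathcal{Q}|^N$ candidate vectors.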
MIMO Signal Detection
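The two detectors can be compared on a toy 2×2 QPSK link. This is a minimal sketch: the channel matrix, noise values, and function names are made up for illustration, the channel is assumed to be known perfectly, and the nearest-point slicing after zero forcing is an added step not shown in the formulas.

```python
import itertools


def matvec(H, s):
    """Matrix-vector product for lists of complex numbers."""
    return [sum(h * x for h, x in zip(row, s)) for row in H]


def inverse_2x2(H):
    (a, b), (c, d) = H
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def zf_detect(H, r, constellation):
    """Zero forcing: equalize with H^-1, then pick the nearest constellation point."""
    equalized = matvec(inverse_2x2(H), r)
    return [min(constellation, key=lambda q: abs(q - z)) for z in equalized]


def ml_detect(H, r, constellation):
    """Maximum likelihood: exhaustive search for the s that minimizes ||r - H s||^2."""
    def metric(s):
        return sum(abs(ri - yi) ** 2 for ri, yi in zip(r, matvec(H, s)))
    return list(min(itertools.product(constellation, repeat=len(H[0])), key=metric))


QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
H = [[1.0 + 0.2j, 0.3 - 0.1j], [0.2 + 0.4j, 0.9 - 0.3j]]   # illustrative channel
s = [1 - 1j, -1 + 1j]                                       # transmitted vector
noise = [0.05 - 0.02j, -0.03 + 0.04j]
r = [y + w for y, w in zip(matvec(H, s), noise)]            # r = Hs + n

search_points_2x2_qpsk = len(QPSK) ** 2   # 16 candidates in this toy case
search_points_4x4_64qam = 64 ** 4         # 16,777,216 candidates for 64-QAM, 4x4
```

The ML search space grows as the constellation size raised to the number of transmit antennas, which is why the exhaustive search quickly becomes impractical.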
WLAN 802.11n Example
Modulation 256-QAM; 4 Tx antennas; 108 sub-channels; 4 µs per OFDM symbol
ML detection → 1.159 × 10^17 points/sec
Current DSP technology is 1G inst/sec → 10^8 processors!
OR (“Moore's Law”: processor capability doubles every 18 months)
MUST WAIT 40 years!
Mike Faulkner 2005, Victoria Univ.
Today…
Intel i7 CPU: 10^11 inst/sec
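The numbers on this slide can be reproduced with a few lines of arithmetic. The sketch assumes an OFDM symbol duration of 4 microseconds (the value that reproduces the quoted 1.159 × 10^17 points/sec) and, as a simplification, one instruction per search point; the variable names are illustrative.

```python
import math

constellation_size = 256      # 256-QAM
tx_antennas = 4
subchannels = 108
symbol_time_s = 4e-6          # assumption: 4 microseconds per OFDM symbol
dsp_inst_per_s = 1e9          # "current DSP technology" in the 2005 example
doubling_period_years = 1.5   # Moore's Law: capability doubles every 18 months

points_per_vector = constellation_size ** tx_antennas
points_per_second = points_per_vector * subchannels / symbol_time_s

# Assumption: one instruction per search point.
processors_needed = points_per_second / dsp_inst_per_s
years_to_wait = math.log2(processors_needed) * doubling_period_years

print(f"{points_per_second:.3e} points/sec")
print(f"{processors_needed:.2e} processors, or about {years_to_wait:.0f} years")
```

Under these assumptions the search rate is about 1.16 × 10^17 points per second, which corresponds to roughly 10^8 processors or about 27 doublings of processor capability.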
[Block diagram: data → S/P → Tx antennas → channel → Rx antennas → receiver signal processing, with channel estimation (Ĥ) and matrix manipulation feeding ML detection of s]
High Complexity ML Detection
[Figure: ML detection vs. sphere detection, simplified 2D case]
Limited search space → reduced complexity
Sphere Decoding: Algorithm level optimization
Sphere Decoding: complexity reduced
Chia-Hsiang Yang, University of California, Los Angeles, 2007
• Near-optimal ML performance with significantly reduced computational complexity (# search points)
[Figure, panel "Detection Performance": BER (10^-5 to 10^0) vs. SNR (5 to 30 dB) for ML, Tree-Pruning, and Tree-Pruning+Reordering]
[Figure, panel "Computational Complexity": average # of search points (10^0 to 10^5) vs. SNR (5 to 30 dB) for ML, Tree-Pruning, and Tree-Pruning+Reordering]
4×4 array, 16-QAM; total # search points = 65536
Near-ML detection with 0.1% computational complexity
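The tree-pruning idea can be sketched as a depth-first search that abandons a branch as soon as its partial distance exceeds the best full-length candidate found so far. The sketch makes several simplifying assumptions that are not from the slides: the system is real-valued, the channel has already been reduced to an upper-triangular matrix R (for example by a QR decomposition, not shown), and symbols come from a 4-level alphabet. All names are illustrative.

```python
import itertools


def sphere_decode(R, y, alphabet):
    """Depth-first tree search with pruning for y = R s + noise, R upper triangular."""
    n = len(y)
    best = {"dist": float("inf"), "s": None, "visited": 0}
    s = [0] * n

    def search(level, partial_dist):
        # Level n-1 has one unknown; each level up adds one more symbol.
        for symbol in alphabet:
            best["visited"] += 1
            s[level] = symbol
            predicted = sum(R[level][j] * s[j] for j in range(level, n))
            dist = partial_dist + (y[level] - predicted) ** 2
            if dist >= best["dist"]:
                continue                      # prune: cannot beat the best candidate
            if level == 0:
                best["dist"], best["s"] = dist, list(s)
            else:
                search(level - 1, dist)

    search(n - 1, 0.0)
    return best["s"], best["visited"]


def exhaustive_ml(R, y, alphabet):
    """Reference ML detector: evaluate every candidate vector."""
    n = len(y)

    def metric(cand):
        return sum((y[i] - sum(R[i][j] * cand[j] for j in range(n))) ** 2 for i in range(n))

    return list(min(itertools.product(alphabet, repeat=n), key=metric))


ALPHABET = [-3, -1, 1, 3]
R = [[2.0, 0.5, -0.3, 0.1],
     [0.0, 1.5, 0.4, -0.2],
     [0.0, 0.0, 1.8, 0.6],
     [0.0, 0.0, 0.0, 1.2]]                    # illustrative triangular channel
s_true = [1, -3, 3, -1]
y = [sum(R[i][j] * s_true[j] for j in range(i, 4)) for i in range(4)]   # noise-free
s_hat, visited = sphere_decode(R, y, ALPHABET)
```

Because the partial distances only grow as the search goes deeper, pruning never discards the ML solution; the number of visited nodes depends on the channel, the noise, and the order in which symbols are tried.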
Systolic Array
• Systolic array
– A homogeneous network of tightly coupled Processing Elements (PEs), called cells or nodes
– The wave-like propagation of data through a systolic array resembles the pulse of the human circulatory system; the name "systolic" was coined from medical terminology
Systolic Array (example)
Systolic Array (T1)
Systolic Array (T2)
Systolic Array (T3)
Systolic Array (T4)
Systolic Array (T5)
Systolic Array (T6)
Systolic Array (T7)
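The time steps can be imitated in software with a cycle-by-cycle simulation. The sketch below assumes an output-stationary array that multiplies two square matrices, with rows of A entering from the left and columns of B entering from the top, each skewed by one cycle per row or column. Whether this matches the example in the figures is an assumption, although a 3×3 product in this arrangement does take 7 cycles.

```python
def systolic_matmul(A, B):
    """Simulate an n x n output-stationary systolic array computing C = A B."""
    n = len(A)
    a_reg = [[0] * n for _ in range(n)]   # operands travelling left to right
    b_reg = [[0] * n for _ in range(n)]   # operands travelling top to bottom
    acc = [[0] * n for _ in range(n)]     # one accumulator per processing element
    cycles = 3 * n - 2
    for t in range(cycles):
        for i in range(n):
            for j in range(n - 1, 0, -1):     # shift A operands one PE to the right
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i                         # row i is delayed by i cycles
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        for j in range(n):
            for i in range(n - 1, 0, -1):     # shift B operands one PE down
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j                         # column j is delayed by j cycles
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        for i in range(n):
            for j in range(n):                # every PE does one multiply-accumulate
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles
```

Every processing element performs the same multiply-accumulate in every cycle and only talks to its neighbours, which is what makes the structure regular and easy to lay out in hardware.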
Massive MIMO
Goes to ‘LARGE’ Dimension
• Further Scaling Up?
• Size limitation in terminals
• Power consumption in portable devices
Cellular   Antennas   WLAN       Antennas
HSPA+      2×2        802.11n    4×4
LTE        4×4        802.11ac   8×4
LTE-A      8×8        802.11?    16×16
How many antennas can we have?
2 in a phone
16 in a laptop
100X
Massive MIMO or Very-Large MIMO
[Figure: base station (BS) with antennas A1 ... AM serving users UE1, UE2, ..., UEK]
• We think of a very-large (multi-user) MIMO system
• We mean M >> K >> 1
• We are looking for M > 100 antennas!
• We serve 10-20 users concurrently
F. Rusek, D. Persson, B. K. Lau, E.G. Larsson, T.L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling Up MIMO: Opportunities and Challenges with Very Large Arrays”, IEEE Signal Processing Magazine, Jan. 2013
Dream case
Massive MIMO for 5G:
Imagine a highway created just for you,
no matter where you are!
4G System 5G System
Figure source: Softbank, WCP launch ‘5G Project’, deploying 100 Massive MIMO base stations
Dream case
Figure source: "Mitsubishi Electric's New Multibeam Multiplexing 5G Technology Achieves 20Gbps Throughput," 21 January
2016. [Online]. Available: http://www.mitsubishielectric.com/news/2016/0121.html.
Measurement setup
[Photos: base station; closely spaced users; overall setup from the base station view, with the BS and the users marked]
Measurement results
➢ 22 closely located UEs (256-QAM) served simultaneously
➢ Uncoded transmission
➢ Equivalent to a spectral efficiency of 145 b/s/Hz
DSP for Massive MIMO
[Block diagram: three processing domains connected by data transfer networks and memories]
• Per-user processing (1 ... K): channel encoding, interleaving, and symbol mapping on the transmit side; symbol demapping, deinterleaving, and channel decoding on the receive side
• Central processing: MIMO precoding, channel estimation + MIMO detection, reciprocity calibration
• Per-antenna processing (1 ... M): OFDM modulation, resampling/filtering, and analog TX on the transmit side; analog RX, digital front-end, and OFDM demodulation on the receive side
Design challenges
128×16 massive MIMO system with 20 MHz bandwidth
➢ High computation count:
❑ 2×10^11 multiplications/s for ZF-based MIMO processing
• 1.6×10^7 points per vector detection for 64-QAM, 4×4 MIMO
Not just data processing…
Digital circuits compared with a city (Shanghai development in 20 years):
• Data processing
• Data transfer ↔ traffic
• Data storage ↔ parking
Design challenges
128×16 massive MIMO system with 20 MHz bandwidth
➢ High computation count:
❑ 2×10^11 multiplications/s for ZF-based MIMO processing
➢ Large data storage:
❑ 9.8 MB memory for the channel matrix
➢ Complicated data shuffling:
❑ 11 GB/s information exchange for 16-bit wordlength
• 1.6×10^7 points per vector detection for 64-QAM, 4×4 MIMO
Memory subsystem in Massive MIMO
➢ High capacity and throughput
❑ Channel matrix 128×16 in massive MIMO vs. 4×4 in LTE-A
➢ Multiple access patterns
❑ Column-wise: $H^H H$
❑ Row-wise: $Hy$
❑ Diagonal-wise: $H^H H + \alpha I$
➢ Adjustable operand matrix size
[Figure: $H^H H$, $H^H H + \alpha I$, and $(H^H H + \alpha I)^{-1}$, each of size K×K]
Flexible memory access
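[Figure: a 3×3 matrix with rows (1 2 3), (4 5 6), (7 8 9) stored in a multi-bank memory. Three bank assignments are shown: {1, 4, 7}, {2, 5, 8}, {3, 6, 9}; then {1, 5, 9}, {2, 6, 7}, {3, 4, 8}; then {1, 6, 8}, {2, 4, 9}, {3, 5, 7}.]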
Conflict-free parallel memory scheme
[Block diagram: 16 memory banks (Bank 0, 1, 2, ..., 15; 2048×32) placed between a permutation network and an inverse-permutation network on the write and read paths, for vectors v0, v1, v2, ..., v15. An address generate unit, a permutation pattern generate unit, and control logic support row, column, and diagonal access. An accompanying table lists the index of the target memory module for each access pattern.]
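A classic way to make several access patterns conflict-free is to skew the bank index by the row number. The sketch below applies bank = (row + column) mod N to the 3×3 example from the "Flexible memory access" slide; it illustrates the general idea only and is not the 16-bank permutation scheme of the diagram. All names are illustrative.

```python
def bank_of(row, col, n_banks):
    """Skewed storage: shift each row's bank assignment by one position."""
    return (row + col) % n_banks


def is_conflict_free(cells, n_banks):
    """True if every cell in the access pattern lives in a different bank."""
    banks = [bank_of(r, c, n_banks) for r, c in cells]
    return len(set(banks)) == len(banks)


N = 3
rows = [[(r, c) for c in range(N)] for r in range(N)]
cols = [[(r, c) for r in range(N)] for c in range(N)]
main_diagonal = [(i, i) for i in range(N)]

# Which matrix elements (numbered 1..9 row by row) end up in each bank.
layout = {b: [r * N + c + 1 for r in range(N) for c in range(N) if bank_of(r, c, N) == b]
          for b in range(N)}
```

With this skew, any row, any column, and the main diagonal of the 3×3 matrix can each be read in one parallel access, whereas storing column c in bank c would make every column access hit a single bank. For an even number of banks this simple skew no longer keeps the diagonal conflict-free, which is one reason larger designs need more elaborate permutation patterns.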
What can be done to reduce the memory area?
6G?
(sub) THz communication
Large intelligent surfaces
Figure source: Beyond Massive MIMO: The Potential of Data Transmission With Large Intelligent Surfaces
Conclusions
• Digital signal processing is evolving at a fast pace with new wireless technologies and applications
• “Optimal” DSP implementation is crucial to bring new DSP algorithms into practice
• The “best” design achieves balanced trade-offs depending on application requirements
• Optimize at earlier stages and do cross-layer (or co-) optimization
• Keep tracking new technologies for both algorithms and implementation
Thanks