Wireless Communication Extensions for DSPs and General Purpose Processors
Sridhar Rajagopal
COMP 625
April 17, 2000
Motivation
Wireless, the next wave after Multimedia
Highly Compute-Intensive Algorithms
Real-Time Requirements
Design based on Time-to-Market
Outline
Processor Core with Reconfigurable Support
Permutation Based Interleaved Memory
Processor Architecture - EPIC
Instruction Set Extensions
Truncated Multipliers
Software Support Needed
Characteristics of Wireless Algorithms
Massive Parallelism
Bit-level Computations
Matrix Based Operations
Memory Intensive
Complex-valued Data
Approximate Computations
What’s wrong with Current Architectures for these applications?
Problems with Current Architectures
UltraSPARC, C6x, MMX, IA-64
Not enough MIPS/FLOPS
Unable to fully exploit parallelism
Bit Level Computations
Memory Bottlenecks
Specialized Instructions for Wireless Communications
Why Reconfigurable
Adapt algorithms to environment
Seamless and Continuous Data Processing during Handoffs
Home Area Wireless LAN
High Speed Office Wireless LAN
Outdoor CDMA Cellular Network
Reconfigurable Support
OSI Layers 3-7: User Interface, Translation, Synchronization, Transport, Network
OSI Layer 2: Data Link Layer (converts frames to bits)
OSI Layer 1: Physical Layer (hardware; raw bit stream)
Different Protocols
[Figure: Source Coding, Channel Coding, Channel Estimation, Multiuser Detection, Channel Decoding, Source Decoding]
MPEG-4, H.723 - Voice, Multimedia
Convolutional, Turbo - Channel Coding
A New Architecture
[Figure: Processor Core (GPP/DSP), Cache, Q, Q, Crossbar, Main Memory, Reconfigurable Logic, Real-Time I/O Bit Stream, RF Unit; labels: Processor, Add-on PCMCIA Network Interface Card]
Why Reconfigurable
Process initial bit level computations
Optimize for fast I/O transfer
[Figure: RF Unit, Real-Time I/O Bit Stream, Reconfigurable Logic]
Reconfigurable Support
Boolean values
64-bit Datapath
Fast I/O
[Figure: GARP Architecture at UC Berkeley: Configuration Caches, Control Blocks, Sequencer, two 64-bit data buses, one 64-bit address bus]
Reconfigurable Support
Wide Path to Memory
– Data Transfer
– Minimize Load Times
Configuration Caches
– Recently Displaced Configurations(5 cycles)
– Can hold 4 full size Configurations
Independent Execution
Reconfigurable Support
Access to same Memory System as Processor
– Minimize overhead
When idle
– Load Configurations
– Transfer Data
Operation
Load Configuration
– If in configuration cache, minimal time
Copy initial data with coprocessor move instructions
Start execution
Issue wait that interlocks while active
Copy registers back at kernel completion
Memory Interface
Access to Main Memory and L1 Data Cache
– Large, fast Memory Store
Memory Prefetch Queues for Sequential Accesses
– Read aheads and Write Behinds
[Figure: Processor Core (GPP/DSP), Instruction Cache, L1 Data Cache, Q, Q, Crossbar, Main Memory, FPGA]
Permutation Based Interleaved Memory (PBI)
High Memory Bandwidth Needed
Stride-Insensitive Memory System for Matrices
Multiple Banks
Sustained Peak Throughput (95%)
[Figure: L1 Data Cache, Main Memory]
PBI Scheme
N - address length
M = 2^n Banks
2^(N-n) words in each bank
To access a word,
– n-bit bank number
– (N-n)-bit address (high-order)
Calculation of the n-bit Bank Number
Calculate Bank Number
Use all N bits to get n-bit vector Y = A X, where A is an n x N matrix of 0’s & 1’s
Y = A_h X_h + A_l X_l, with X split into its (N-n, n) high and low bits [A_l has rank n]
N-bit parity circuit with log_k N levels of XOR gates (k = fan-in)
[Figure: the N-bit address feeds n parity circuits, one per row of A (Row 0 to Row n-1); the n parity bit signals drive a decoder that produces 2^n bank select signals]
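The bank-number calculation can be sketched in a few lines. This is a minimal illustration, assuming arithmetic modulo 2 (XOR), as the parity circuits imply. The 3 x 6 matrix `A`, the sizes N = 6 and n = 3, and all function names are hypothetical choices for the example, not values from the slides.

```python
def parity(x):
    """XOR of all bits of x, which is what one parity circuit computes."""
    return bin(x).count("1") & 1

def bank_number(addr, rows):
    """Compute Y = A X over GF(2).

    rows[i] is row i of A packed as an N-bit mask; bit i of the result is
    the parity of the address bits selected by that row.
    """
    y = 0
    for i, row in enumerate(rows):
        y |= parity(addr & row) << i
    return y

# Hypothetical sizes: N = 6 address bits, n = 3 bank bits (M = 8 banks).
# Row i selects low bit i and high bit i + 3, so A_l is the identity (rank n).
N, n = 6, 3
A = [0b001001, 0b010010, 0b100100]

def word_in_bank(addr):
    """The N-n high-order address bits locate the word inside its bank."""
    return addr >> n

# A stride-8 access pattern hits a single bank under plain low-order
# interleaving, but is spread over all 8 banks by the XOR mapping.
stride8 = [8 * k for k in range(8)]
plain_banks = [a & (2**n - 1) for a in stride8]
pbi_banks = [bank_number(a, A) for a in stride8]
```

Because A_l has full rank, each (bank, word) pair still corresponds to exactly one address, so the mapping only permutes where words are stored.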
Interleaved Memory Model
[Figure: Address Source, Input Buffers, Memory Banks M(0), M(1), ..., M(M-1), Output Buffers, Data Sequencer, Data Sink]
Processor Core
64-bit EPIC Architecture with Extensions (IA-64/C6x)
Statically determined Parallelism; exploit ILP
Execution Time Predictability
[Figure: Processor Core (GPP/DSP), Cache, Q, Q, Crossbar, FPGA]
EPIC Principle
Explicitly Parallel Instruction Computing
Evolution of VLIW Computing
Compiler- Key role
Architecture to assist Compiler
Better cope with dynamic factors
– which limited VLIW Parallelism
Aspects of EPIC
Designing Plan of Execution (POE) at Compile Time
Permitting Compiler to play Statistics
– Conditional Branches, Memory references
Communicating POE to the hardware
– Static Scheduling
– Branch information
Architecture Features in EPIC
Static Scheduling
– MultiOP
– Non-Unit Assumed Latency (NUAL)
The Branch Problem
– Predicated Execution
– Control Speculation
– Predicated Code Motion
The Memory Problem
– Cache Specifiers
– Data Speculation
Instruction Set Extensions
To accelerate Bit level computations in Wireless
Real/Complex Integer - Bit Multiplications
– Used in Multiuser Detection, Decoding
Bit - Bit Multiplications
– Used in Outer Product Updates
– Correlation, Channel Estimation
Complex Integer-Integer Multiplications
Useful in other Signal Processing applications
– Speech, Video, ...
Architecture Support
Support via Instruction Set Extensions
Minimal ALU Modifications necessary
Transparent to Register Files/Memory
Additional 8-bit Special Purpose Registers
Integer - Bit Multiplications
D[i] = D[i] + b[j]*C[j]
Eg: Cross-Correlation
[Figure: 64-bit Register A, 64-bit Register C, 8-bit Register b, a row of +/- units, 64-bit Register D]
Register Renaming?
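A rough software model of the integer-bit multiply-accumulate may help here. Several things are assumed rather than taken from the slides: each 64-bit register holds eight 8-bit lanes, a bit value of 1 stands for +1 and 0 for -1, lane arithmetic wraps modulo 256, and bit i of `b` controls lane i. The function names are illustrative only.

```python
LANES = 8                      # assumed: eight lanes per 64-bit register
LANE_BITS = 8                  # assumed: 8 bits per lane
LANE_MASK = (1 << LANE_BITS) - 1

def pack(values):
    """Pack eight small integers into one 64-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word

def unpack(word):
    """Unpack a 64-bit word into eight signed 8-bit lane values."""
    out = []
    for i in range(LANES):
        v = (word >> (i * LANE_BITS)) & LANE_MASK
        out.append(v - 256 if v >= 128 else v)
    return out

def int_bit_mac(d, c, b):
    """Lane-wise D[i] += s[i] * C[i], with s[i] = +1 if bit i of b is 1, else -1.

    The multiplication by +/-1 reduces to choosing between add and subtract,
    which is the role of the +/- units in the figure.
    """
    result = 0
    for i in range(LANES):
        d_i = (d >> (i * LANE_BITS)) & LANE_MASK
        c_i = (c >> (i * LANE_BITS)) & LANE_MASK
        r = d_i + c_i if (b >> i) & 1 else d_i - c_i
        result |= (r & LANE_MASK) << (i * LANE_BITS)
    return result

acc = int_bit_mac(pack([0] * 8), pack([1, 2, 3, 4, 5, 6, 7, 8]), 0b01010101)
```

Calling `unpack(acc)` shows the lanes where the control bit is 1 holding +C[i] and the others holding -C[i].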
8-bit to 64-bit conversions
D = D + b*b^T
Eg: Auto-Correlation
b1 = b(1:8), b(1:8), ..., b(1:8)
b2 = b(1)b(1)...b(8)b(8)
[Figure: 8-bit Register b expanded into 64-bit Register A, either as the whole of b(1)..b(8) repeated or as each bit b(1), ..., b(8) repeated]
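The two expansions b1 and b2 can be modelled as follows. This is a sketch under an assumed bit order (b(1) is bit 0, the least significant bit) with each repeated bit of b2 filling one whole byte; the function names are hypothetical.

```python
def replicate_byte(b):
    """b1: the 8-bit value b copied into each of the eight bytes."""
    return (b & 0xFF) * 0x0101010101010101

def spread_bits(b):
    """b2: bit i of b repeated eight times, filling byte i."""
    word = 0
    for i in range(8):
        if (b >> i) & 1:
            word |= 0xFF << (8 * i)
    return word

b1 = replicate_byte(0xA5)
b2 = spread_bits(0b00000101)
```

With these layouts, bit 8*i + j of b1 is bit j of b and the same position in b2 is bit i of b, so one 64-bit bitwise operation can pair every bit of b with every other bit.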
Bit-Bit Multiplications
D = D + b*b^T
Eg: Auto-Correlation
64-bit Register A = b1, 64-bit Register B = b2
Ex-NOR
64-bit Register C = b1*b2

B1  B2  B1*B2
0   0   1
0   1   0
1   0   0
1   1   1
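The truth table matches ordinary multiplication if bit 1 is read as +1 and bit 0 as -1, which is an assumed encoding here. The short sketch below checks that, then forms all 64 products of an 8-bit vector with itself using a single 64-bit Ex-NOR. The packing of b1 and b2 (b(1) as the least significant bit, one byte per repeated bit) and the helper names are assumptions.

```python
MASK64 = (1 << 64) - 1

def xnor64(a, b):
    """Bitwise Ex-NOR of two 64-bit words: 1 wherever the bits agree."""
    return ~(a ^ b) & MASK64

def to_sign(bit):
    """Assumed encoding: bit 1 stands for +1, bit 0 for -1."""
    return 1 if bit else -1

# The Ex-NOR truth table agrees with multiplication of +/-1 values.
for x in (0, 1):
    for y in (0, 1):
        assert to_sign(xnor64(x, y) & 1) == to_sign(x) * to_sign(y)

b = 0b10110010
b1 = b * 0x0101010101010101                                    # b repeated in every byte
b2 = sum(0xFF << (8 * i) for i in range(8) if (b >> i) & 1)    # bit i fills byte i
outer = xnor64(b1, b2)   # bit 8*i + j encodes the product b[i] * b[j]
```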
Increment/Decrement
D = D + b*b^T
Eg: Auto-Correlation
[Figure: 64-bit Register D and 8-bit Register b1*b2 feed a row of +/- units that add or subtract 1, producing 64-bit Register (D + b1*b2)]
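One way to picture the increment/decrement step is as a full auto-correlation update in which each 64-bit register holds one row of D. The sketch assumes eight 8-bit counters per register, wrap-around arithmetic, and the encoding bit 1 = +1, bit 0 = -1; `inc_dec` and `autocorr_update` are hypothetical names, not instructions defined in the slides.

```python
def inc_dec(d, bits):
    """Add +1 to lane i of d if bit i of bits is 1, otherwise add -1."""
    result = 0
    for i in range(8):
        lane = (d >> (8 * i)) & 0xFF
        lane += 1 if (bits >> i) & 1 else -1
        result |= (lane & 0xFF) << (8 * i)   # wrap modulo 256 (assumed)
    return result

def autocorr_update(d_rows, b):
    """One update D = D + b b^T, where d_rows holds eight 64-bit row registers."""
    for i in range(8):
        # Row i of b b^T as 8 bits: Ex-NOR of b with bit i of b replicated.
        rep = 0xFF if (b >> i) & 1 else 0x00
        row_bits = ~(b ^ rep) & 0xFF
        d_rows[i] = inc_dec(d_rows[i], row_bits)
    return d_rows

rows = autocorr_update([0] * 8, 0b00000001)
```

After one update every diagonal counter equals 1, since each element multiplied by itself gives +1.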
Complex-valued Data Processing
Is it easy to add?
Is this worth additional ALU support?
Typically supported by software!
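For reference, the usual software approach expands a complex multiplication into real operations. The sketch below is the textbook form with four real multiplies and two add/subtracts; the function name and the integer operands are illustrative.

```python
def complex_mul(ar, ai, br, bi):
    """(ar + j*ai) * (br + j*bi) as a (real, imaginary) pair.

    Uses four real multiplies and two add/subtracts, the cost a
    software-only implementation pays for each complex product.
    """
    real = ar * br - ai * bi
    imag = ar * bi + ai * br
    return real, imag

product = complex_mul(1, 2, 3, 4)   # (1 + 2j) * (3 + 4j)
```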
Truncated Multipliers
Many applications need approximate computations
Adaptive Algorithms: Y = Y + mu*(Y*C)
Truncate lower bits
Truncated Multipliers - half the area/half the delay
Can do 2 truncated multiplies in parallel with regular multiplies
[Figure: ALU Multipliers: Multiplier 1, Multiplier 2, Truncated Multiplier]
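A rough model of a truncated multiplier, assuming unsigned n-bit operands and that the partial-product bits in the n low-order columns are simply never formed. Practical designs usually add a correction term to reduce the error; that is left out here, and the function names are illustrative.

```python
def full_mul_high(x, y, n=8):
    """Upper n bits of the exact 2n-bit product of two unsigned n-bit values."""
    return (x * y) >> n

def truncated_mul_high(x, y, n=8):
    """Upper n bits when partial products in the n low columns are dropped."""
    acc = 0
    for i in range(n):              # bit i of y
        if (y >> i) & 1:
            for j in range(n):      # bit j of x
                # Keep only partial-product bits that fall in column n or above.
                if (x >> j) & 1 and i + j >= n:
                    acc += 1 << (i + j)
    return acc >> n

def truncation_error(x, y, n=8):
    """How far the truncated result falls below the exact upper bits."""
    return full_mul_high(x, y, n) - truncated_mul_high(x, y, n)

worst = truncation_error(255, 255)
```

The truncated result never exceeds the exact one, and for n = 8 the gap stays within a few least significant bits of the kept half, which is the kind of error an adaptive update such as Y = Y + mu*(Y*C) may be able to tolerate.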
Software Support
Greater Interaction between Compilers and Architectures
– EPIC
– Reconfigurable Logic
Compiler needs to find and exploit bit level computations
Reconfigurable Logic Programming
Area Estimates
Area increases by 20% over an IA-64 architecture due to reconfigurable support
Instruction Set extensions need minimal hardware support
Parallel Interleaved Memory Banks will need larger area
Other Uses
Reconfigurable Logic
– For accelerating loops of general purpose processors
Bit Level Support
– For other voice, video and multimedia applications
Conclusions
Processor Core with Reconfigurable Support developed for Wireless Applications
Instruction Set Extensions added for accelerating performance of the algorithms
Integration of Wireless Appliances with General Purpose Processors
Great Impact on Performance of Wireless Algorithms
Future Work
Simulations for finding performance improvements
Other Processor Architectures
– Bit Slice Architectures
– Out-of-order
References
The GARP Architecture and C Compiler
– T.C. Callahan, J.R. Hauser, J. Wawrzynek, IEEE Computer, April 2000, pp. 62-69
http://brass.cs.berkeley.edu
EPIC: Explicitly Parallel Instruction Computing
– M.S. Schlansker, B.R. Rau, IEEE Computer, Feb 2000, pp. 37-45
High-Bandwidth Interleaved Memories for Vector Processors - A Simulation Study
– G.S. Sohi, IEEE Transactions on Computers, Vol. 42, No. 1, Jan 1993, pp. 34-44
Acknowledgements
Vijay Pai
Partha Ranganathan
Joseph Cavallaro