vlsivada.skku.ac.kr/classinfo/microsystem/dsp/sunwoo.pdf · multi-bus structure ... tms320c5x...
ASIC SYSTEM LAB./AJOU UNIV.
Contents
● Digital Signal Processing
● Basic Architectures for DSP Algorithms
● Comparison with Microprocessors
● Fixed-Point DSP Chips : DSP56100 (Motorola)
● Multimedia DSP Chips
◆ MediaProcessor
◆ TriMedia
● Trends of Future DSPs
● VLSI Architectures for Communications
◆ Fast Fourier Transform
◆ Viterbi Decoder
◆ Reed-Solomon Decoder
◆ Equalizer
What is Digital Signal Processing?
● Analog Signal vs. Digital Signal
◆ Analog Signal : Continuous Time and Continuous Amplitude
◆ Discrete Time Signal : Discrete Time and Continuous Amplitude
◆ Digital Signal : Discrete Time and Discrete Amplitude
● Advantages of Digital Signal Processing
◆ Guaranteed Accuracy
- Specified by Sampling Rate, Word Length and Algorithm
- Independent of Time, Temperature, Humidity
◆ Low Sensitivity to Noise; Errors are Correctable
◆ Digital System : Smaller, Cheaper, Less Power because of VLSI
◆ Flexibility of System : Reprogrammable
◆ Reliable & Predictable
● Disadvantages
◆ Finite Sampling Rate & Word Length Problem
◆ Wide Bandwidth for Data Transfer
Why Digital Signal Processor?
[Figure: in the analog domain, separate analog systems (low-pass filter, high-pass filter, amplifier, convolver, Fourier transform) process the analog signal; in the digital domain, a single DSP placed between converters runs many algorithms on the digital signal]
DSP Algorithms
● Convolution
$y[n] = \sum_{k=0}^{\infty} h[k]\, x[n-k]$
◆ Basic Output Sequence of LTI Digital Systems
● Correlation
$y[k] = \sum_{n=0}^{N-1} x_1[n]\, x_2[n+k]$
◆ Signal Matching
● Discrete Fourier Transform (DFT)
$X[k] = \sum_{n=0}^{N-1} x[n] \exp(-j 2\pi k n / N)$
◆ Spectral Analysis of Signals
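The three sums on this slide can be written out directly. The sketch below is only an illustration in Python (the slides contain no such code); the function names are hypothetical, sequences are assumed to be finite lists, and samples outside a list are treated as zero.

```python
import cmath

def convolve(h, x):
    """y[n] = sum_k h[k] x[n-k] for finite-length h and x (zero elsewhere)."""
    y = [0.0] * (len(h) + len(x) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y

def correlate(x1, x2, k):
    """Sum over n of x1[n] x2[n+k]; samples past the end of x2 count as 0."""
    return sum(x1[n] * x2[n + k] for n in range(len(x1)) if 0 <= n + k < len(x2))

def dft(x):
    """X[k] = sum_{n=0}^{N-1} x[n] exp(-j 2 pi k n / N), computed directly."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_len) for n in range(n_len))
            for k in range(n_len)]
```

Each output sample is a sum of products, which is why the multiply-accumulate operation dominates the architectures discussed on the following slides.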
DSP Algorithms (cont.)
● Z-Transform
$X(z) = \sum_{n=0}^{\infty} x[n]\, z^{-n}$
◆ System and Signal Analysis
● Finite Impulse Response (FIR) Filtering
$y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k]$
◆ Linear Phase and Stable Response Filtering
● Infinite Impulse Response (IIR) Filtering
$y[n] = \sum_{k=0}^{N} a_k\, x[n-k] - \sum_{k=1}^{M} b_k\, y[n-k]$
◆ Sharper Cutoff Filtering than FIR with the Same Number of Taps
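The FIR and IIR difference equations above can be sketched as follows. This is a minimal Python illustration, not code from the slides; the function names are hypothetical, and inputs and outputs before n = 0 are assumed to be zero.

```python
def fir(h, x):
    """y[n] = sum_{k=0}^{N-1} h[k] x[n-k], with x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def iir(a, b, x):
    """y[n] = sum_{k=0}^{N} a[k] x[n-k] - sum_{k=1}^{M} b[k] y[n-k].

    b[0] is unused; it only keeps the list indices aligned with the formula.
    """
    y = []
    for n in range(len(x)):
        acc = sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        acc -= sum(b[k] * y[n - k] for k in range(1, len(b)) if n - k >= 0)
        y.append(acc)
    return y
```

For example, `iir([1.0], [0.0, -0.5], [1, 0, 0, 0])` realizes y[n] = x[n] + 0.5 y[n-1], whose impulse response decays geometrically; the feedback term is what lets an IIR filter reach a sharp cutoff with few taps.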
Basic Architecture for DSP Algorithms
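[Figure: instruction memory, X data memory, and Y data memory connected through the instruction bus, the X and Y data buses, and the X and Y address buses to the program control unit, the address generation unit, and a multiplier (X*Y) followed by an adder and accumulator (A = X*Y + A)]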
Microprocessor System Block Diagram
CPU Block Diagram
MCU Block Diagram
Micro-Instruction Format
Comparisons with Microprocessors
● Harvard Architecture
◆ X&Y Data Memories, Instruction Memory
● Multi-Bus Structure
◆ Minimize Bottleneck Problem
● Three Separate Parallel Units
◆ Data Calculation Unit
◆ Program Control Unit
◆ Address Generation Unit
<Example> MAC x1, y1, A    X:(R0)+, y1    X:(R3)+, x1
● On-chip Peripherals
◆ A/D and D/A Converter, PLL, DMA, Host Interface, SIO and PIO Ports, Timer, Viterbi Accelerator, etc.
Comparisons with Microprocessors (cont.)
● Data Calculation Unit
◆ MAC Unit: Multiply and Accumulate in a Single Inst. Cycle
◆ Extended ALU and Accumulator
- Prevent Overflow and Support Multiprecision
◆ Barrel Shifter
- Variable Length Shift within One Cycle
- Multi-precision and Scaling Operations
◆ Sine or Cosine ROM Table for DFT, FFT, DCT Algorithm
● Program Control Unit
◆ Fast Interrupt Service for Real-time Applications
◆ Multiple Level Hardware Stack for Nested Hardware Do Loop
● Address Generation Unit
◆ Many Memory Address Registers
- Various Addressing Modes
- Linear, Modulo (filtering), Bit-reverse (FFT), Offset
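The modulo and bit-reverse addressing modes listed above can be modeled in a few lines. The sketch below is a software illustration only; the function names and the base/length parameters are assumptions, and a real address generation unit computes these updates in hardware alongside the data path.

```python
def modulo_step(addr, base, length, step=1):
    """Advance addr by step, wrapping inside the circular buffer [base, base + length)."""
    return base + (addr - base + step) % length

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index, as used for FFT data reordering."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result
```

Modulo addressing keeps a filter's delay line in a fixed block of memory, while `[bit_reverse(i, 3) for i in range(8)]` gives the access order an 8-point radix-2 FFT needs.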
Comparisons with Microprocessors (Multimedia DSPs)
● Superscalar, VLIW, SIMD, Multithreading Architectures
◆ Multiple Functional Units
◆ Large and Multi-port Register Files
● Handle Various Data Types
◆ Four Packed Data Types
- Packed Bytes, Packed Words, Packed Double Words and Packed Quad Words
[Figure: a 64-bit register (bits 63 to 0) shown as packed bytes, packed words, packed double words, and a packed quad word, with bit positions 7, 15, 31, and 63 marked]
Comparisons with Microprocessors (cont.) (Multimedia DSPs)
● Load/Store Units
◆ Block Load/Store Scheme
◆ Various Addressing Modes
◆ Big- or Little-endian Addressing Modes
● Packed Operations for Group Data
ex) Packed Addition : Add two Packed Words and Clip (Saturate) the Results to the Maximum Value if there is an Overflow
      a3      a2      a1      7FFFh
  +   b3      b2      b1      0001h
  =   a3+b3   a2+b2   a1+b1   7FFFh
Comparisons with Microprocessors (cont.) (Multimedia DSPs)
● Switching Network
◆ Deal with Mixed-Precision Data
◆ Rearrange, Expand, Pack, Merge
● Compression for MPEG-2 (Motion Estimation)
ex) SAD (Sum of Absolute Difference)
      a1       a2       a3       a4
      b1       b2       b3       b4
      |a1-b1|  |a2-b2|  |a3-b3|  |a4-b4|
      SAD = |a1-b1| + |a2-b2| + |a3-b3| + |a4-b4|
Comparisons with Microprocessors (cont.) (Multimedia DSPs)
● Multiple Operations in One Inst. Cycle
ex) Group-Multiply-and-Add : Multiply four Packed Bytes and Add four Packed Words
      a       b       c       d
  x   e       f       g       h
  +   i       j       k       l
  =   a*e+i   b*f+j   c*g+k   d*h+l
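The packed operations on the preceding slides (saturating packed addition, SAD, and group-multiply-and-add) can be sketched lane by lane. This Python model is only illustrative: the function names are hypothetical, lanes are plain list elements, and the 16-bit signed range is an assumption taken from the 7FFFh example.

```python
INT16_MAX = 0x7FFF
INT16_MIN = -0x8000

def packed_add_sat(a, b):
    """Lane-wise signed 16-bit addition; results are clipped instead of wrapping."""
    return [max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)]

def sad(a, b):
    """Sum of absolute differences, the pixel-distance measure of motion estimation."""
    return sum(abs(x - y) for x, y in zip(a, b))

def group_multiply_add(a, e, i):
    """Lane-wise a*e + i, modeling one group-multiply-and-add instruction."""
    return [x * y + z for x, y, z in zip(a, e, i)]
```

In hardware all lanes are processed in the same cycle; the loops here only make the per-lane arithmetic explicit.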
Special DSP Instructions
● Multiply and Accumulate instruction (MAC)
◆ Major Operation of DSP Algorithms
● Normalization Instruction
◆ Normalize Extended Value in ALU and Accumulator
● Various Arithmetic and Logical Shift Instructions
◆ Multi-precision Data Operations
● Hardware Do Loop Instruction
◆ Useful for Do Loop Type Algorithms
● Block Data Move Instruction
◆ Use Instruction Memory in Single Data Memory DSPs
● Stand-by Instruction
◆ Low Power in Mobile Communications
Special Multimedia DSP Instructions
● Computation
◆ Partitioned Add/Subtract
◆ Partitioned Multiply
◆ Partitioned Compare
◆ Group-Multiply-and-Add
● Data Format Conversions
◆ Pixel Expand
◆ Pixel Packing
◆ Pixel Merge
● Compression
◆ Pixel Distance (SAD)
[Figure: < Partitioned Add/Subtract > src1 and src2 combined into dst1 in 16-bit fields (bits 15 to 0)]
[Figure: < Pixel Expand > src1 and src2 expanded into dst1, with 16-bit (bits 15 to 0) and 32-bit (bits 31 to 0) fields]
Commercial Fixed-Point DSP Chips
Model | DSP56100 | DSP1610 | TMS320C5x | ADSP2100 | OAK | D950CORE | uPD77017
Company | Motorola | AT&T | Texas Instruments | Analog Devices | DSP-Group | SGS-Thomson | NEC
Data/Microcode | 16/16 | 16/16 | 16/16 | 16/24 | 16/16 | 16/16 | 16/32
Inst. Set | 87 | 48 | 124 | 31 | NA | NA | 57
Pipeline Depth | 3 | 3 | 4 | 2 | 3 | 3 | 3
Mem Size, Ext. | 64Kx16 | 64Kx16 | 64Kx16 | 16Kx24, 16Kx16 | 64Kx16 | 64Kx16 | 64Kx16
Mem Size, Int. (PGM) | 2Kx16 | 512x16 | 9Kx16 | 2Kx24 | x | x | (12K+256)x32
Cache Size | x | 15x16 | x | 16x24 | x | x | x
BUS | 6 | 5 | 5 | 5 | 6 | 6 | 3
Acc. Size | 2x40 | 2x36 (Buf. 2x36) | 32 (Buf. 32) | 40 | 4x36 | 2x40 | x
Shifter | 1,4,16 Hardwired | 36 Barrel Shifter | 16 Barrel Shifter | 32 Barrel Shifter | 32 Barrel Shifter | 40 Barrel Shifter | 40 Barrel Shifter
ALU Size | 32 | 36 | 32 | 16x40 | 36 | 40 | 40

The per-chip alignment of the remaining rows is not recoverable from the source; their values are listed in source order.
Mem Size, Int. (Data) : 6Kx16x2; 2Kx16, 2Kx16; 1Kx16; 1056x16 (Dual); 8Kx16 (Dual); 4Kx16 (Dual)
Data Mem Regs, Addr : R0-R3 (4x16); R0-R3 (4x16); AR0-AR7 (8x16); I0-I7 (8x14); AX0-1, AY0-1 (4x16); X pointer, Y pointer; General Purpose Point Registers (8x16)
Data Mem Regs, Offset : N0-N3 (4x16); j, k (2x16); INDX; M0-M7 (8x14); DX0-3, DY0-3 (4x16)
Data Mem Regs, Modulo : BX, MX, BY, MY (4x16); L0-L7 (8x14); CBSR1-2, CBER1-2 (4x16); rb, re (2x16); M0-M3 (4x16)
General Register Bank (8x40)
DSP56100 Features
● Performance : 66MIPS@15ns
● Instruction/Data Width : 16/16
● Multi-bus Structure (Program : 2, Data : 4 )
● Pipeline Stage : Fetch, Decode, Execute
● Hardware Stack Levels : 15 x 32 bit
● Fast Interrupt Processing
● Hardware Loop Structure
◆ LA (16 bit), LC (16 bit)
● Accumulation Width : 2 x 40 bit
● 1, 4, 16 Hardware Shifter
● Modulo, Bit-reverse Addressing
● Division, Double-precision Multiplication Instruction
DSP 56100 Architecture
Data ALU
Data ALU (cont.)
● Two 40 bit Accumulators : 2 x 32 bit Accumulator Registers, 2 x 8 bit Accumulator Extension Registers
● MAC Unit
◆ 16x16 Multiplier with 32 bit Product
◆ Arithmetic Operation : 40 bit Result
◆ Logical Operation : 16 bit Result
◆ ZB Multiplexer
● Accumulator Shifter, Output Shifter
● Data Shifter / Limiter : Scaling, Limiting
● Data ALU Arithmetic and Rounding
◆ Fractional, Integer, Multiprecision Arithmetic Support
◆ Rounding : Convergent, Two's Complement Rounding
Program Control Unit
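[Figure: program control unit with the program counter, loop address, loop count, stack pointer, OMR, SR, and a 32 x 15 hardware stack, connected to the clock, interrupts, control signals, the 16-bit address and data paths, and the global data bus]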
Program Control Unit
● Program Address Generation
● Instruction Decoding
● Hardware Do Loop Control
● Interrupt Control
● Components
◆ Program Counter (PC)
◆ Loop Address (LA) : Address Where the Loop Ends
◆ Loop Counter (LC) : Number of Iterations
◆ Status Register (SR)
◆ Operating Mode Register (OMR)
◆ Stack Pointer (SP)
◆ System Stack : Stores PC and SR for Subroutine Calls and Long Interrupts
Fast Interrupt
Address Generation Unit
Address Generation Unit (cont.)
● Effective Address Calculation
● Perform Linear, Modulo, Bit-reverse Addressing
● Components
◆ Address Register File (Rn), Offset Register File (Nn), Modifier Register File (Mn), Temporary Address Register
- Where : n = 0 ~ 3
◆ AGU Status Register
◆ PC Relative Addressing Unit
◆ Secondary Offset Adder Unit
◆ Modulo Arithmetic Unit : Offset Adder, Modulo Adder, Reverse Carry Adder
DSP56100 Instruction Set
● Number of Instructions : 87
● Arithmetic Instructions : Within Data ALU
◆ Add/Sub Group : ADC, ADD, SBC, SUB, SUBL, DEC, DEC24, INC, INC24
◆ Mul/Div Group : IMPY, MPY, MPYR, MPY(su,uu), DIV
◆ MAC Group : MAC, MACR, DMAC, MAC(su,uu), IMAC
◆ Shift Group : ASL, ASL4, ASR, ASR4, ASR16, NORM
◆ Transfer Group : Tcc, TFR, TFR2, TST, TST2, SWAP
◆ ABS, CLR, EXT, ZERO, etc.
● Logical Instructions
◆ AND, EOR, NOT, OR, LSL, LSR, ROL, ROR
◆ ANDI, ORI : Immediate Program Controller Register
● Bit Field Manipulation Instructions
◆ BFCLR, BFSET, BFCHG, BFTSTL, BFTSTH
DSP56100 Instruction Set (cont.)
● Move Instructions
◆ LEA : Load Effective Address
◆ MOVE, MOVE(C), MOVE(I), MOVE(M), MOVE(P), MOVE(S)
● Program Control Instructions
◆ Bcc, BSR, BRA, BScc : Branch Instruction
◆ Jcc, JMP, JSR, JScc : Jump Instruction
◆ REP, REPcc : Repeat Instruction
◆ DO, DO FOREVER, ENDDO : Loop Instruction
◆ BRKcc : Conditional Exit from Hardware Loop
◆ DEBUG, DEBUGcc : Debug Mode Instruction
◆ RTI, RTS : Return Instruction (Interrupt, Subroutine)
◆ NOP, STOP, WAIT, SWI
FIR Filter Implementation Example
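FIR Filter Segment
    MOVE  #XADDR, R0                              ; R0 points to the sample buffer
    MOVE  #K-1, M0                                ; modulo-K addressing for R0
    MOVE  X:INPUT, X:(R0)                         ; store the new input sample
    MOVE  #CADDR, R3                              ; R3 points to the coefficient table
    MOVE  #K-1, M3                                ; modulo-K addressing for R3
    CLR   A            X:(R0)+, y1                ; clear the accumulator, load the first sample
    MOVE               X:(R3)+, x1                ; load the first coefficient
    REP   #K                                      ; repeat the next instruction K times
    MAC   x1, y1, A    X:(R0)+, y1   X:(R3)+, x1  ; A = A + x1*y1 with two parallel loads
    RND   A                                       ; round the accumulator
    MOVE  A, X:OUTPUT                             ; write the output sample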
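The idea behind the assembly segment, a circular sample buffer combined with a multiply-accumulate loop, can be modeled in software. The Python class below is a behavioral sketch under stated assumptions, not a cycle-accurate translation: the class name is hypothetical, and the buffer is walked from the newest sample backwards, which may differ from the register ordering the DSP56100 code relies on.

```python
class CircularFir:
    """K-tap FIR filter that keeps its delay line in a circular buffer."""

    def __init__(self, coeffs):
        self.h = list(coeffs)
        self.buf = [0.0] * len(self.h)
        self.pos = 0  # index where the next input sample is written

    def step(self, sample):
        k_taps = len(self.h)
        self.buf[self.pos] = sample              # like MOVE X:INPUT, X:(R0)
        acc = 0.0                                # like CLR A
        for k in range(k_taps):                  # like REP #K followed by MAC
            acc += self.h[k] * self.buf[(self.pos - k) % k_taps]
        self.pos = (self.pos + 1) % k_taps       # modulo pointer update
        return acc
```

Feeding a unit impulse through `CircularFir([1.0, 2.0, 3.0])` returns the coefficients one per call, which is a quick check that the pointer arithmetic matches y[n] = sum of h[k] x[n-k].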
SDSP 56116
● ��� ����� 16 �� � !"# DSP $
● %& '�
◆ Motorola() DSP56116 $) �*+ ,-. �/
◆ 0�� �1�) 23 ���� 4 I/O �56 "789 :; �<◆ �<= 23�� �� 4 >� ?@AB CDEF ��
◆ (�� GHI�J 4 K�5 L3 M�
◆ VHDL6 ��N Top-down IC OP
◆ COMPASSTM 4 SYNOPSYSTMQ ��N R�-3
◆ (�) STDU. �� VW
SDSP 56116 Features
● Performance : 20MIPS@40MHz
● Instruction/Data Width : 16/16
● Multi-bus Structure (Program : 2, Data : 4)
● Pipeline Stage : Fetch, Decode, Execute
● Hardware Stack Levels : 15 x 32 bit
● Fast Interrupt Processing
● Hardware Loop Structure : LA (16 bit), LC (16 bit)
● Accumulation Width : 2 x 40 bit
● 1, 4, 16 Hardware Shifter
● Modulo, Bit-reverse Addressing
● Division, Double-precision Multiplication Instruction
● Limiting and Scaling Mode
● Power Down Mode : STOP, WAIT
Chip Characteristics
● 0.8 micron VTI Standard Cell Library (v8r4)
● Total Gate Count : 70,000
● Package : 100 Pin MQFP
● Components
◆ Data ALU (DALU)
◆ Program Control Unit (PCU)
◆ Address Generation Unit (AGU)
◆ Dual-port RAM
Multimedia DSPs
● Architecture Features
◆ VLIW : Multiple Functional Units
◆ SIMD : Partitioned Operations for Multiple Data
◆ Multithread : Multiple Threads Executed in Parallel
◆ Vector Processor : Vectorized Operations
● Multimedia DSP
◆ MediaProcessor (MicroUnity) : Multithreading
◆ TriMedia (Philips) : VLIW
◆ Mpact (Chromatic) : VLIW, SIMD, Vector Processor
◆ TMS320C6x (Texas Instruments) : VLIW
MicroUnity MediaProcessor
[Figure: MediaProcessor system with the MediaCodec (audio, video, radio, net, tape, disc), the MediaBridge (DRAM, PCI), the MediaChannel, and the Media Processor core containing I-BUF/Cache, D-BUF/Cache, ETLB & Tags, Regs, and the Br & Gate, L,S & Sync, +/x, and EMath units]
MicroUnity MediaProcessor (cont.)
● Five-cylinder Multithreaded Microprocessor
◆ 200 MHz for Each Task
◆ Five Register Files : 64 x 64 bit Registers / Register File
● Performance : 300 MHz (0.6 µm CMOS) or 1 GHz (0.5 µm BiCMOS)
● Configuration
◆ Branch and Gate Execution Unit
◆ Load/Store and Synchronization
◆ Adder/Multiplier Unit
◆ 128-bit Data Crossbar Switch
◆ Extended Math Unit
● Interface
◆ Media Codec : Audio, Video, Radio, Network, Tape, Disk, etc.
◆ Media Bridge : PCI, DRAM
◆ Media Processor
Philips TriMedia
● VLIW Architecture
◆ Five RISC Operations per Clock at 100 MHz
◆ Register File : 15 Reads/5 Writes
◆ Crossbar Network
◆ Instruction Coding
- Uncompressed RISC Instruction Encoding : 42 bit
● Performance : 2 to 4 BOPS@100MHz
● Interface
◆ PCI Master/Slave Bridge (400 Mbps)
◆ Digital Camera, Video Encoder, Stereo Audio ADC/DAC
◆ V.34 Modem Analog Front End or ISDN Terminal
Philips TriMedia (cont.)
● 27 Functional Units
◆ 5 Constants, 5 Integer ALUs, 2 Load/Store Units, 2 Shifters, 3 Branch Units, 2 Integer/FP Multipliers, 2 FP ALUs, 1 FP Compare, 1 FP Sqrt/Div, 2 DSP ALUs, 2 DSP Multipliers
◆ Number of Decoders : 5
- The 27 Functional Units are Classified into 5 Groups to Reduce Decoder Size
● VLIW Instruction Size
◆ Uncompressed : 42 bit x 5 = 210 bit
◆ Compressed : 32 bit (Huffman Coding)
● DMA Mastering-video & Audio-I/O Units (Data Prefetch) Configuration
◆ Video/Audio DMA In, Out
◆ VLD (Variable Length Decoder) Coprocessor
◆ Image Coprocessor (MPEG-1, MPEG-2)
0�� MultimediaQ XN � !"# DSP (MDSP)
Fixed-point DSP(Multimedia)
Multimedia DSP(Portability)
Multimedia Portability
+
MDSP(Portable
Multimedia)
● �Y DSP $) ZM#
◆ [\]�+ DSP $� �^, ��_� 0�� `�abc d-e f2
◆ � !"# DSP $� �ghdG [\]�+ i�j��
● VWN MDSP) Lk
◆ [\]�+ + 0� �� `�ab◆ [\]�+ i�j ���l +� !"# DSP �l
MDSP Features
● m^, m�_ � !"# DSP n+● SIMD + Vector Processing + DSP● 23, 2o 4 ap ����Q XN ^qdi�j r� : 8-, 16-, 32-, 40-bit
● JXs t�uv + Packing t�uv◆ ghdG i�j q/
◆ wH %x) y1● 4zP {�C|G : Fetch ➞ Decode ➞ Ex1
➞ Ex2● 8V) Nested 8}~+ FOR �C ��
● 2V) Parallel Move● 24 �� ��vD n}● �CH : Barrel �CH, Prescaler
����� ���
IDB
�� � ��
VALUBashifter
VMPY
prescaler
Vadder Vadder
Pack. net
�� ��� ��
�� ��� ��
VMPY
prescaler
XDB, YDB
Execution 1
Execution 2
<DPU ����>
Chip Characteristics
● 5V 0.6µm SOG (�� KG75)● ��� � : 68,831● �� ��� : 30 MHz● Package : 160 pin QFP● Performance
◆ DCT : MPEG2 MP@ML(720 x 480 )
: 5 frames / sec◆ Motion Estimation : MPEG1 �� CD
(352 x 240) 3 frames / sec◆ ���� ���� (��� �� �) ➞
Test Board
���
����
FFT IIR FIR AdaptiveFilter
10N
7N
6N
5N
N1/2N
3N
2N
DSP56100
MDSP
Chip
What Should We Do for the Next Century?
● Much Circuit-Level Work
◆ High Speed Clock
◆ Low Power, Low Cost
● Parallel Programmable DSP Architectures
◆ Employ VLIW / RISC Superscalar (RISC-SS) Architecture
- High Speed Coupled with Parallel Execution
- Good Compiler Efficiency
- Poor Code Density (VLIW) vs. Good Code Density (RISC-SS)
- High Power (VLIW) vs. Low Power (RISC-SS)
- Difficult (VLIW) vs. Easy (RISC-SS) to Program by Hand
◆ High Level Languages Suitable for Parallel Architectures
● Architecture Driven Algorithms for Multimedia Functions
● Hardware / Software Co-design Approach should be Used for Optimized Systems
Multimedia Terminal Should Have
● 2 MPEG-2 Codecs : 8 GOPS
● 2 CG Generators : 4 GOPS
● Stereo Echo Canceler : 4 GOPS
● Background Removal : 4 GOPS
● Total : 20 GOPS
● Future DSP Chips should be
◆ Low Price
◆ Programmable
◆ 20 GOPS DSP Chip in the Year 2000
MPU History
Programmable DSP Chips
Fast Fourier Transform
● Fast Fourier Transform (FFT) Algorithm
◆ A Fast Method of Computing the Discrete Fourier Transform (DFT)
◆ Reduces Computation
◆ FFT Method : Radix-2, Radix-4
● Example : Orthogonal Frequency Division Multiplexing (OFDM)
[Figure: OFDM system block diagram with serial-to-parallel and parallel-to-serial converters around an IFFT and an FFT, a transmit filter, and the channel]
Reference : JCCI'98 pp 879~883
FFT Algorithm
● Radix-2 Butterfly Algorithm
$OUT_0 = IN_0 + IN_1$
$OUT_1 = (IN_0 - IN_1)\, W_N^{k}$
● Radix-4 Butterfly Algorithm
$OUT_0 = (IN_0 + IN_2) + (IN_1 + IN_3)$
$OUT_1 = [(IN_0 - IN_2) - j(IN_1 - IN_3)]\, W_N^{k}$
$OUT_2 = [(IN_0 + IN_2) - (IN_1 + IN_3)]\, W_N^{2k}$
$OUT_3 = [(IN_0 - IN_2) + j(IN_1 - IN_3)]\, W_N^{3k}$
where $W_N^{k} = e^{-j 2\pi k / N}$
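The radix-2 butterfly above, with the twiddle factor applied after the subtraction, is the decimation-in-frequency form. The recursive Python sketch below shows how repeating that butterfly yields a full FFT; the function name is hypothetical, the input length is assumed to be a power of two, and the recursion is chosen for clarity rather than to mirror any hardware pipeline.

```python
import cmath

def fft_radix2_dif(x):
    """Radix-2 decimation-in-frequency FFT; returns X[k] in natural order."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # OUT0 = IN0 + IN1
    top = [x[i] + x[i + half] for i in range(half)]
    # OUT1 = (IN0 - IN1) * W_N^k with W_N^k = exp(-j 2 pi k / N)
    bottom = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
              for i in range(half)]
    out = [0j] * n
    out[0::2] = fft_radix2_dif(top)     # even-indexed outputs X[0], X[2], ...
    out[1::2] = fft_radix2_dif(bottom)  # odd-indexed outputs X[1], X[3], ...
    return out
```

An in-place hardware implementation produces the same values in bit-reversed order, which is where the bit-reverse addressing mode of the address generation unit comes in.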
Radix-2 FFT Architecture
● Butterfly Architecture
● Number of butterflies (N-point) : $(N/2)(\log_2 N - 1)$
● Number of complex adders : $N(\log_2 N - 1)$
● Number of complex multipliers : $(N/2)(\log_2 N - 1)$
[Figure: radix-2 butterfly datapath with Latch E, Latch G, a W PROM, SUB and ADD units, multipliers MUL1 and MUL2, and a final ADD]
Radix-4 FFT Architecture
● Butterfly Architecture
● Number of butterflies (N-point) : $(N/2)(\log_4 N - 1)$
● Number of complex adders : $N(\log_4 N - 1)$
● Number of complex multipliers : $(3N/4)(\log_4 N - 1)$
[Figure: radix-4 butterfly architecture]
Comparison between Radix-4 and Radix-2
● Algorithm comparisons
* data : Complex number
- Radix-4 reduces the number of additions and multiplications compared with radix-2
● Architecture comparisons
- The butterfly architecture of radix-4 is more complex than that of radix-2
- However, as N increases, the gate count of radix-2 increases more sharply than that of radix-4
Convolutional Encoder for VITERBI Algorithm
● Convolutional Encoder consists of Two Components
◆ Shift Register : Holds K-1 Bits (Number of Shift Register Stages)
◆ v Modulo-2 Adders : v Bits are Output
◆ Example : K = 3, r = 1/2 Convolutional Encoder
[Figure: encoder with shift register stages m1 and m2, outputs c0 and c1, and punctured logic; tap patterns from MSB to LSB are 1 0 1 and 1 1 1]
$G_{c0} = 101_2 = 5_8$
$G_{c1} = 111_2 = 7_8$
VITERBI Decoding
● Viterbi Decoding Procedure
◆ Branch Metric Calculation (BMC)
- Calculate Hamming Distance or Euclidean Distance
◆ Path Metric Calculation (PMC)
- Accumulate the BM onto the PM of the Previous Survivor Path (the one with the smaller PM of the two paths)
◆ Add - Compare - Select (ACS)
- Add : PM + BM
- Compare : Compare the Two Candidate PMs
- Select : Select the Smaller PM
◆ Trace-Back (TB)
- The TB Depth Defines the Trace-Back Length
- Usually, TB Depth = K x 5 or K x 6
- After the TB Memory is Filled to the TB Depth, Trace Back through the TB Memory and Decode the Received Code
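The decoding procedure can be tried end to end on the K = 3, r = 1/2 example code with generators 5 and 7 (octal). The Python sketch below is a software illustration under stated assumptions: the function names are hypothetical, the decision is hard (Hamming distance), the encoder is assumed to start in the all-zero state, and the whole history is traced back once at the end instead of using a fixed TB depth.

```python
def conv_encode(bits):
    """Rate-1/2, K=3 encoder: c0 uses taps 101, c1 uses taps 111."""
    m1 = m2 = 0
    out = []
    for u in bits:
        out += [u ^ m2, u ^ m1 ^ m2]
        m1, m2 = u, m1
    return out

def viterbi_decode(received):
    """Hard-decision Viterbi decoder for conv_encode, with full trace-back."""
    n_states = 4                                   # state = (m1 << 1) | m2
    inf = float("inf")
    pm = [0] + [inf] * (n_states - 1)              # path metrics, start in state 0
    history = []                                   # trace-back memory
    for t in range(0, len(received), 2):
        r0, r1 = received[t], received[t + 1]
        new_pm = [inf] * n_states
        step = [None] * n_states
        for s in range(n_states):
            if pm[s] == inf:
                continue
            m1, m2 = s >> 1, s & 1
            for u in (0, 1):
                c0, c1 = u ^ m2, u ^ m1 ^ m2
                bm = (c0 != r0) + (c1 != r1)       # BMC: Hamming distance
                nxt = (u << 1) | m1
                cand = pm[s] + bm                  # add
                if cand < new_pm[nxt]:             # compare and select
                    new_pm[nxt] = cand
                    step[nxt] = (s, u)
        pm = new_pm
        history.append(step)
    state = min(range(n_states), key=lambda s: pm[s])
    decoded = []
    for step in reversed(history):                 # trace back
        state, u = step[state]
        decoded.append(u)
    return decoded[::-1]
```

Encoding a short message, flipping one received bit, and decoding recovers the original message, which shows the survivor selection at work.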
Punctured Code
● Punctured Code : A Modified Coding Scheme
◆ Increases Code Rate
◆ Decreases Coding Gain (c.f. Coding Gain is $10 \log(P_{\text{without FEC}} / P_{\text{with FEC}})$)
◆ Example : r = 3/4 Punctured Convolutional Code
[Figure: r = 3/4 puncturing example with the bit patterns 011 and 101]
Trellis Diagram
● Trellis Diagram for PMC
◆ Example : K = 3, r = 1/2 Convolutional Code
◆ Branch Metric is Hamming Distance (Hard decision, # of different bits) or Euclidean Distance (Soft decision, difference of decimal code) between Received Code and Branch Word
[Figure: trellis diagram of the K = 3, r = 1/2 code with branch words and accumulated path metrics]
VLSI Architectures for VITERBI Algorithm
● Viterbi Decoder Architecture
◆ Depunctured Logic : Used if the Received Code is a Punctured Code
◆ BMC : Hard/Soft Decision
◆ ACS : After ACS, PMs are Stored in the PM Memory
◆ TB : Trace-Back
[Figure: Viterbi decoder block diagram with depunctured logic, branch metric calculation, add-compare-select, path metric memory, and trace-back memory]
- With hard decision, x is 1 bit; with soft decision, x is 3 bits
- If the upper path is smaller, TB stores 0; if the lower path is smaller, TB stores 1
VLSI Architectures for VITERBI Algorithm
● Serial ACS Viterbi Decoder Architecture
◆ Minimum Gate Count
◆ Maximum Delay
[Figure: serial ACS architecture with a PM memory and compare & select units, shown at Time 0 and Time 1]
Reference : US patent 4,536,878
VLSI Architectures for VITERBI Algorithm
● Parallel ACS Viterbi Decoder Architecture
◆ Minimum Delay
◆ Maximum Gate Count
◆ High Routing Complexity
◆ No PM Memory is Used
[Figure: parallel ACS architecture with four ACS units between the BMC and the trace-back memory]
Reference : US patent 4,614,933
RS Encoder
● Built from an LFSR (Linear Feedback Shift Register)
● (n, k) RS Encoder
◆ n : # of code symbols
◆ k : # of message symbols
◆ gi : coefficients of the generator polynomial
[Figure: LFSR encoder with D (delay) stages and feedback taps]
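An LFSR-based systematic (n, k) RS encoder can be sketched in software as polynomial division by the generator polynomial. Everything specific below is an assumption made for illustration and is not taken from the slides: the field GF(16) with primitive polynomial x^4 + x + 1, the generator roots a^1 to a^(n-k), and all function names.

```python
PRIM_POLY = 0b10011  # x^4 + x + 1, assumed primitive polynomial for GF(16)

# Exponent and logarithm tables for GF(16).
EXP = [0] * 30
LOG = [0] * 16
value = 1
for power in range(15):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0b10000:
        value ^= PRIM_POLY
for power in range(15, 30):
    EXP[power] = EXP[power - 15]

def gf_mul(a, b):
    """Multiply two GF(16) elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def generator_poly(n_parity):
    """g(x) = (x - a^1)...(x - a^n_parity), coefficients highest degree first."""
    g = [1]
    for i in range(1, n_parity + 1):
        nxt = g + [0]                                  # g(x) * x
        for j, coef in enumerate(g):
            nxt[j + 1] ^= gf_mul(coef, EXP[i])         # plus g(x) * a^i
        g = nxt
    return g

def rs_encode(message, n_parity):
    """Systematic encoding: parity = message(x) * x^n_parity mod g(x)."""
    g = generator_poly(n_parity)
    reg = [0] * n_parity                               # LFSR stages
    for symbol in message:
        feedback = symbol ^ reg[0]
        reg = reg[1:] + [0]                            # shift
        for j in range(n_parity):
            reg[j] ^= gf_mul(feedback, g[j + 1])       # taps weighted by g_i
    return list(message) + reg

def syndromes(codeword, n_parity):
    """Evaluate the codeword at a^1 .. a^n_parity; all zero for a valid codeword."""
    result = []
    for i in range(1, n_parity + 1):
        acc = 0
        for symbol in codeword:
            acc = gf_mul(acc, EXP[i]) ^ symbol
        result.append(acc)
    return result
```

With 11 message symbols and 4 parity symbols this gives a (15, 11) code; the syndromes of the encoder output are all zero, and they become nonzero as soon as a symbol is corrupted, which is the starting point of the decoding procedure.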
RS Decoding
● Decoding Procedure
◆ Syndrome (error pattern) Calculation : S1, S2, ..., S2t
◆ Error Locator Polynomial Calculation
◆ Error Location : Roots of the Error Locator Polynomial
◆ Error Value Calculation
◆ Error Correction : XOR the Error Values with the Received Symbols
RS Decoder
[Figure: RS decoder block diagram]
- Syndrome �� ��� �� � �� ��� �- � � � �� �� ��
XOR, Finite Field Multiplier,Shift Register � ��
�� �� ��� �
● Berlekamp-Massey Algorithm
◆ Shift Registers, XORs, Control Logic, Finite Field Multiplier, etc.
[Figure: General Berlekamp-Massey Architecture with four shift registers (2t each), finite field multipliers, a divider, adders, a constant multiplier, MUXs, and a control unit]
< Hardware Complexity >
- Shift Register : 8t (t : number of correctable errors)
- Finite Field Multiplier : 2
- Finite Field Divider : 1
- Control Logic, MUXs
- Small Gate Count, Low Speed
�� �� ��� �
● Euclid Algorithm
◆ Registers, Finite Field Multiplier, etc.
[Figure: Euclid algorithm architecture built from registers (REG) and finite field multipliers]
< Hardware Complexity >
- Register : 4 x t (t : number of correctable errors)
- Finite Field Multiplier : 2 x t
- MUXs, XORs
- Large Gate Count, High Speed
“Reed-Solomon Euclid Algorithm Decoder Having a Process Configurable Euclid Stack,” U. S. Patent 5,170,399, Dec. 8, 1992.
��� FEC � ��
● FEC �� ��
◆ ���/RS/�� �� ��
◆ !" �# �� $% &' ( )*+,- &'
● ��� ./0◆ K=7, r=1/2- 1234 -/0
◆ 3-bit Soft Decision/Hard Decision ��
◆ BER Monitoring
◆ Serial/Parallel 5� ��
● Reed-Solomon ./0◆ (200,188), 662 �� �
◆ 478 79 /�: 8; : (200, 188), (120, 108), (60, 48), (40, 28)
◆ < RS =>?@� A�BC D E F 16.7% GH
����� FEC �� �� !
��� FEC �� ��
● FEC �� ��
◆ Gate Count : F 75,000 (��� : F 45,000, RS : F 30,000)
◆ IJ K)� : 33 MHz (worst case)
Channel Model
● ISI(Intersymbol Interference)
◆ Band-limited Channel Distortion (Wired Channel)
◆ Multipath Fading (Wireless Channel)
● Equalizer - LM� �� NOP �Q RS TQA Discrete Time Filter
[Figure: PSF, Channel, and Equalizer in cascade]
PSF : Pulse Shaping Filter
$y(n) = \sum_{k=0}^{M-1} w_k(n)\, x(n-k)$
y(n) : Equalizer Output
w_k(n) : Tap Coefficient
Criteria of Equalizer
● Frequency Bandwidth
◆ Baseband, Passband
● Sampling Time
◆ Symbol - One Sample/symbol
◆ Fractional Symbol - Two or More Samples/symbol
● Coefficient Characteristics
◆ Fixed, Adaptive
● Architecture
◆ Transversal, DFE, Lattice
Transversal Structure
● Simplest Type
● Small Gate Count, Low Speed
● Low Power Consumption
[Figure: tapped delay line with delays T between the input Xn and the output Yn, and taps C0, C1, C2, ..., Cn-2, Cn-1]
$y(n) = \sum_{k=0}^{M-1} C_k(n)\, x(n-k)$
T : Sample Time
Ck : Tap Coefficient
Register : N, Multiplier : N, Adder : N-1
N : Number of Taps
Decision-Feedback Structure
● Good Performance in the Presence of Severe ISI
● Moderate Gate Count, Power Consumption
● Low Speed
$\hat{I}_k = \sum_{j=-K_1}^{0} C_j X_{k-j} + \sum_{j=1}^{K_2} C_j \tilde{I}_{k-j}$
[Figure: feedforward filter (taps C-K1, C-K1+1, ..., C0) driven by the input data {Xk} and feedback filter (taps C1, ..., CK2) driven by the decisions {I~k}; their sum {I^k} feeds the decision device, and the error is formed against the decision data or the training data]
Register : N, Multiplier : N, Adder : N-1
N : Number of Taps
Lattice Structure
● High Power Consumption
● Large Gate Count, High Speed
$f_m(n) = f_{m-1}(n) + k_m^{*}\, b_{m-1}(n-1)$
$b_m(n) = b_{m-1}(n-1) + k_m\, f_{m-1}(n)$
y(n)� Transversal� �� ��
[Figure: lattice filter with Stage 1 to Stage M-1, coefficients K1, K1*, K2, K2*, ..., KM-1, K*M-1, delays T, forward signals f0(n) to fM-1(n) and backward signals b0(n) to bM-1(n), driven by the input Xn]
Register : N, Multiplier : 2N, Adder : 2N + 1
N : Number of Taps
Comparisons of Tap Update Algorithms
                              ZF       LMS      RLS
Hardware Complexity           Low      Low      High
Speed                         Medium   Low      High
Power Consumption             Low      Low      High
Error Correction Capability   Low      Medium   High
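Of the three tap update algorithms compared above, LMS is the simplest to write down. The Python sketch below adapts a transversal equalizer with the standard LMS rule w_k <- w_k + mu * e(n) * x(n-k); the function name, the real-valued signals, the step size mu, and the use of a known training sequence as the desired signal are all assumptions made for illustration.

```python
def lms_equalize(received, desired, n_taps, mu):
    """Run an LMS-adapted transversal equalizer; returns (outputs, final taps)."""
    taps = [0.0] * n_taps
    delay_line = [0.0] * n_taps
    outputs = []
    for x, d in zip(received, desired):
        delay_line = [x] + delay_line[:-1]                  # shift in the new sample
        y = sum(w * v for w, v in zip(taps, delay_line))    # y(n) = sum w_k x(n-k)
        err = d - y                                         # error against training data
        taps = [w + mu * err * v for w, v in zip(taps, delay_line)]
        outputs.append(y)
    return outputs, taps
```

For a channel that only scales the signal by 0.5, a single tap converges towards 2.0, the inverse of the channel gain. Each update costs one multiply-accumulate per tap, which is consistent with the low hardware complexity listed for LMS.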
WLAN Modem Chip
[Figure: BER versus Eb/No (8 to 14 dB, BER from 10^-2 down to 10^-8) for theoretical DQPSK, the HSP3824, and the proposed modem]
● Wireless LAN Modem (IEEE 802.11)
◆ Data Rate : 4Mbps (DQPSK), 2Mbps (DBPSK)
◆ BER vs. Eb/No : 6.5e-6 @ 14dB (AWGN)
◆ Differential Encoder/Decoder, Spreader/Despreader, Matched Filter, CRC Encoder/Decoder, Scrambler/Descrambler, Symbol Timing Recovery, Carrier Frequency Offset Recovery