ee371 io lecture - stanford...
EE371 Lecture 16
High-Speed Links
Vladimir Stojanovic
(with slides from M. Horowitz, J. Zerbe, K. Yang and W. Ellersick)

Agenda: High-Speed Links
• High-speed links: what and where?
• Signaling faster: evolution
  » Circuits
  » Channel
• System-level design
  » Channel designer's view
  » IC designer's view
• Demo

What Makes a Link?
• Signaling: sending and receiving the information
• Clocking: determining which bit is which
[Figure: Tx → channel (PCB, coax, fiber) → Rx, carrying the bit stream 1 0 0 0 1 0 1; a tbit/2 interval is marked on the waveform]

Spanning a Broad Space
• Links range from a simple inverter to a DSL modem
• Metrics
  » Speed
  » Latency
  » Electrical environment
  » Power & area
  » Volume

Increasing Chip I/O Bandwidth
• Computers:
  » Main memory: SDRAM100 (100 Mbps) → RDRAM (0.8-1.1 Gbps)
  » Peripherals: PCI (66 Mbps) → Infiniband (2.5 Gbps)
• Networks:
  » Physical front end:
    – LAN: Fast Ethernet (100 Mbps) → Gigabit Ethernet (1 Gbps)
    – WAN: OC-12 (625 Mbps) → OC-192 (12.5 Gbps)
  » Switch fabric: 625 Mbps → 2.5 Gbps

Inside the Router
[Figure: router block diagram. Line cards (8 to 16 per system) carry optics, SerDes, MAC, NPU, TM/fabric interface and memories; switch cards (2 to 4 per system) carry SerDes and the crossbar; all connect through a passive backplane. Link types: 4×3.125 Gb/s XAUI serial links (chip-to-chip), OC-192 12.5 Gb/s laser-driver link, 3.125-12.5 Gb/s backplane serial links.]
• Regardless of where the links are, there is a constant desire to signal faster and with less power

Serial Link Signaling Over Backplanes - Past
• Designs were limited by transmitter and receiver speed
• Clever circuit design was enough: no communications or signal-integrity (SI) background needed
[Figure: SerDes on two line cards connected through a backplane; signal at Tx vs. signal at Rx; channel response from 0 to 1 GHz, the "2Gb/s view of the channel"]
• The channel was not an issue up to 2-3 Gb/s

Signaling
[Figure: signaling options: differential vs. single-ended (single-ended lines referenced to a shared VS/2), each driven by high-impedance or low-impedance drivers; signal swing VS]

Transmitter Design
• Critical components: sync, mux, Tx
• Design issues:
  » Slew-rate control vs. ISI, jitter
  » Output current and impedance control
• Clock and driver power dissipation
[Figure: transmitter chain: data generation (encoder, sync, mux) → pre-driver → driver (Tx) with 50 Ω termination]

Output Drivers
• The on-chip clock period is limited to about 6-8 FO4
• To send more bits per clock, multiplex the data

Simple Transmitter
• DDR: send a bit on each clock edge
• Critical issues:
  » 50% duty cycle
  » Tbit > 4 FO4
[Figure: Data_E and Data_O multiplexed on alternate clock edges; plot of output pulse-width closure (%) vs. bit time normalized to FO4 (1 to 5 FO4)]
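
A quick way to see why data must be multiplexed: if the clock period cannot drop below about 6-8 FO4, the number of bits sent per clock is the clock period divided by the bit time. The sketch below assumes an 8 FO4 clock and a hypothetical 50 ps FO4 delay; `mux_ratio` and `bit_rate_gbps` are illustrative helpers, not from the lecture.

```python
import math

def mux_ratio(t_clk_fo4, t_bit_fo4):
    """Bits that must be sent per clock cycle (mux factor), rounded up."""
    return math.ceil(t_clk_fo4 / t_bit_fo4)

def bit_rate_gbps(t_bit_fo4, fo4_ps):
    """Bit rate in Gb/s for a bit time expressed in FO4 delays."""
    return 1000.0 / (t_bit_fo4 * fo4_ps)

if __name__ == "__main__":
    t_clk = 8        # clock period in FO4 (upper end of the 6-8 FO4 limit)
    fo4_ps = 50.0    # hypothetical FO4 delay, in ps
    for t_bit in (4, 2, 1):
        print(f"Tbit = {t_bit} FO4: {mux_ratio(t_clk, t_bit)} bits/clock, "
              f"{bit_rate_gbps(t_bit, fo4_ps):.1f} Gb/s")
```

Under these assumptions the DDR limit of Tbit = 4 FO4 corresponds to 2 bits per clock, while a 1 FO4 bit time needs an 8:1 mux.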

Fastest Transmitter
• Off-chip time constant is smaller than on-chip:
  » Generate current pulses directly at the output
    – Limited only by the output capacitance
  » Limiting time constant: $25\,\Omega \cdot C_{pad}$
  » $C_{pad} = 8\,C_{driver} + C_{esd}$
[Figure: ×8 driver segments with RTERM on out and out_b; data D0, D1, D2 (on ck0) gated by clock ck3; plot of % eye closure vs. bit width (0.5 to 1.0 FO4)]

Simple Receiver
• A regenerative latch has the highest gain-bandwidth product of all amplifiers (gain grows exponentially with time: you just need to wait long enough)
• Preconditioning stage: filter/integrate, rectify CM
• Latch makes the decision (4 FO4)
• A DAC can be used to compensate offsets
[Figure: inputs in and ref → preconditioning amplifier A → latch clocked by clk, with a D/A converter for offset correction]
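
The claim that a latch just needs to wait long enough follows from its exponential growth: if v(t) = v_in·exp(t/τ), the time to reach a full-swing output is τ·ln(v_out/v_in). A minimal sketch, assuming an ideal single-time-constant latch with a hypothetical τ (times are expressed in units of τ):

```python
import math

def regeneration_time(v_in, v_out, tau):
    """Time for an ideal regenerative latch to grow v_in to v_out.

    Assumes pure exponential growth v(t) = v_in * exp(t / tau), where tau is a
    hypothetical regeneration time constant of the cross-coupled pair.
    """
    if not 0 < v_in < v_out:
        raise ValueError("need 0 < v_in < v_out")
    return tau * math.log(v_out / v_in)

def latch_gain(t, tau):
    """Small-signal gain after regenerating for time t."""
    return math.exp(t / tau)

if __name__ == "__main__":
    tau = 1.0  # express times in units of tau
    for v_in in (10e-3, 1e-3, 0.1e-3):
        t = regeneration_time(v_in, 1.0, tau)
        print(f"{v_in * 1e3:4.1f} mV -> 1 V after {t:.1f} tau")
```

Each tenfold reduction in input costs only ln(10) ≈ 2.3τ of extra time, which is why the latch's effective gain-bandwidth is so high.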

Fastest Receiver
• Use multiple input receivers
  » Simplest: 2; more complex: 4-8
  » Decouples Tbit from the latch resolution time
  » Leverage high-input-impedance amplifiers
[Figure: a ring oscillator generates clock phases clk0-clk3 (ck0-ck4) that clock parallel receivers on din, producing D0-D7, which go to amplifiers]
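
The multiple-receiver idea can be sketched as a round-robin demultiplexer: with N receivers clocked by successive oscillator phases, receiver k sees bits k, k+N, k+2N, and so on, so each latch gets N bit times to resolve instead of one. The helper names and the 100 ps bit time used below are hypothetical.

```python
def demux(bits, n_ways):
    """Receiver k (clocked by phase k) takes bits k, k+n, k+2n, ..."""
    return [bits[k::n_ways] for k in range(n_ways)]

def remux(lanes):
    """Re-interleave the per-receiver outputs into the original serial order."""
    n = len(lanes)
    out = [None] * sum(len(lane) for lane in lanes)
    for k, lane in enumerate(lanes):
        out[k::n] = lane
    return out

def resolution_time(n_ways, t_bit):
    """Each latch samples once every n_ways bits, so it has n_ways * t_bit to resolve."""
    return n_ways * t_bit

if __name__ == "__main__":
    bits = [1, 0, 0, 0, 1, 0, 1, 1]
    lanes = demux(bits, 4)
    print(lanes, remux(lanes) == bits)
    print(f"8-way receiver at 100 ps/bit: {resolution_time(8, 100e-12) * 1e12:.0f} ps per decision")
```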

Serial Link Signaling Over Backplanes
• Now that we have built the fastest Tx and Rx, look at what happens to the eye
• We need to look more closely at the channel, since that appears to be the problem
[Figure: SerDes on two line cards connected through a backplane; signal at Tx vs. signal at Rx; channel attenuation (log scale, 0.01 to 1.0) from 0 to 5 GHz, the "10Gb/s view of the channel"]

The Backplane Environment
• The problem: there are many sources of impedance (Z) discontinuity, and thus many possible sources of signal degradation
• Interference
  » Intersymbol (dispersion, reflections)
  » Co-channel (crosstalk)
[Figure: signal path on each side of the backplane: on-chip parasitics (termination resistance and device loading capacitance), package, package via, line card trace, line card via, backplane connector, backplane via, backplane trace [Kollipara, DesignCon03]]

Interference
[Figure: left, attenuation [dB] (0 to -60) vs. frequency [GHz] (0 to 10) for THROUGH, NEXT and FEXT; right, pulse response vs. time (0 to 3 ns) with Tsymbol = 160 ps]
• Inter-symbol interference
  » Dispersion (skin effect, dielectric loss): short latency
  » Reflections (impedance mismatches: connectors, via stubs, device parasitics, package): long latency
• Co-channel interference (far-end and near-end crosstalk)
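
To make the pulse-response picture concrete: sampling the pulse response once per symbol gives the main cursor plus pre- and post-cursor ISI, and the worst-case (peak-distortion) eye for binary ±1 data is the main cursor minus the sum of the magnitudes of all other cursors. The sketch below uses the slide's Tsymbol = 160 ps but a made-up pulse response, and cross-checks the formula by brute force over all neighbouring bit patterns.

```python
from itertools import product

T_SYMBOL = 160e-12              # symbol time quoted on the slide
BIT_RATE = 1.0 / T_SYMBOL       # 6.25 Gb/s for binary signaling

def peak_distortion(pulse, main_idx):
    """Main cursor, total |ISI| and worst-case half eye for +/-1 symbols."""
    main = pulse[main_idx]
    isi = sum(abs(p) for i, p in enumerate(pulse) if i != main_idx)
    return main, isi, main - isi

def brute_force_worst_one(pulse, main_idx):
    """Smallest sample received for a '+1' bit over every pattern of neighbouring bits."""
    others = [p for i, p in enumerate(pulse) if i != main_idx]
    return min(pulse[main_idx] + sum(a * p for a, p in zip(pattern, others))
               for pattern in product((-1, 1), repeat=len(others)))

if __name__ == "__main__":
    # Hypothetical symbol-spaced pulse response: 1 pre-cursor, main cursor, 4 post-cursors
    pulse = [0.05, 0.45, 0.20, 0.10, 0.05, 0.03]
    main, isi, half_eye = peak_distortion(pulse, main_idx=1)
    print(f"bit rate {BIT_RATE / 1e9:.2f} Gb/s; main {main:.2f}, ISI {isi:.2f}, "
          f"worst-case half eye {half_eye:.2f}")
    print("brute force agrees:", abs(brute_force_worst_one(pulse, 1) - half_eye) < 1e-12)
```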

Dispersion: Material Loss
• PCB loss: skin and dielectric loss
  » Skin loss $\propto \sqrt{f}$
  » Dielectric loss $\propto f$: a bigger issue at higher f
[Figure: attenuation vs. frequency (1 MHz to 10 GHz) of an FR4, 8 mil wide, 1 m long, 50 Ω stripline: total loss, conductor loss, dielectric loss]
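
The two frequency dependences can be compared with a toy loss model, loss(f) = k_s·√f + k_d·f; the dielectric term overtakes the skin term above f = (k_s/k_d)². The coefficients `K_SKIN` and `K_DIEL` below are hypothetical and are not fitted to the FR4 curve on the slide.

```python
import math

def loss_db(f_hz, k_skin, k_diel):
    """Total attenuation: skin-effect term grows as sqrt(f), dielectric term as f."""
    return k_skin * math.sqrt(f_hz) + k_diel * f_hz

def crossover_hz(k_skin, k_diel):
    """Frequency above which dielectric loss exceeds skin loss (k_s*sqrt(f) = k_d*f)."""
    return (k_skin / k_diel) ** 2

if __name__ == "__main__":
    K_SKIN = 1e-4   # hypothetical, dB per sqrt(Hz)
    K_DIEL = 5e-9   # hypothetical, dB per Hz
    print(f"crossover ~ {crossover_hz(K_SKIN, K_DIEL) / 1e6:.0f} MHz")
    for f in (1e8, 1e9, 5e9):
        skin, diel = K_SKIN * math.sqrt(f), K_DIEL * f
        print(f"{f / 1e9:4.1f} GHz: skin {skin:5.2f} dB, dielectric {diel:5.2f} dB, "
              f"total {loss_db(f, K_SKIN, K_DIEL):5.2f} dB")
```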

Reflections: Z Discontinuities
• Sources of reflections: Z discontinuities
  » PCB Z mismatch
  » Connector Z mismatch
  » Vias (through) Z mismatch
  » Device parasitics: effective Z mismatch
• At a junction from impedance Z1 into Z2:
  $\Gamma = \frac{Z_2 - Z_1}{Z_1 + Z_2}$ (reflected),  $T = \frac{2 Z_2}{Z_1 + Z_2}$ (transmitted)
[Figure: signal path DC → via → connector → via → BP]
• Energy flow into the junction = transmitted + reflected energy
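
The junction formulas can be checked numerically. Note that T = 1 + Γ, and that energy is conserved once power is computed as V²/Z on each side: Γ² + T²·Z1/Z2 = 1. A small sketch with hypothetical impedances (the 40 Ω and 65 Ω values merely stand in for a mismatched connector or via):

```python
def junction(z1, z2):
    """Voltage reflection and transmission coefficients for a wave travelling from Z1 into Z2."""
    gamma = (z2 - z1) / (z1 + z2)
    t = 2 * z2 / (z1 + z2)
    return gamma, t

def power_split(z1, z2):
    """Fractions of incident power reflected and transmitted (P = V^2 / Z on each side)."""
    gamma, t = junction(z1, z2)
    return gamma ** 2, t ** 2 * z1 / z2

if __name__ == "__main__":
    for z2 in (50.0, 40.0, 65.0):   # hypothetical impedances after a 50-ohm trace
        gamma, t = junction(50.0, z2)
        reflected, transmitted = power_split(50.0, z2)
        print(f"Z2={z2:4.0f}: gamma={gamma:+.3f}, T={t:.3f}, "
              f"power reflected={reflected:.4f}, transmitted={transmitted:.4f}")
```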

Reflections From Via Stubs
• Additional sources of reflections: stubs
  » Vias, particularly on thick backplanes
  » Package plating stubs
• Top-layer signaling results in a large via stub
[Figure: attenuation [dB] (0 to -60) vs. frequency [GHz] (0 to 10) for 9" and 26" FR4 traces, with and without via stubs]

Reflections and Crosstalk
[Figure: received waveforms showing the desired signal, reflections, far-end crosstalk (FEXT) and near-end crosstalk (NEXT) [Sercu, DesignCon03]]

Crosstalk
• Many sources
  » On-chip
  » Package
  » PCB traces
  » Inside the connector
• Differential signaling can help
  » Minimize crosstalk generation and make its effects common-mode
• Both NEXT and FEXT
  » NEXT is very destructive if Rx and Tx pairs are adjacent
    – The full-swing Tx couples into the attenuated Rx signal
    – Its effect on SNR is multiplied by the signal loss
  » Simple solution: group Rx/Tx pairs in the connector
  » NEXT is typically 3-6%, FEXT typically 1-3%
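
Why NEXT is so destructive: the victim signal arrives attenuated by the channel, while the adjacent full-swing transmitter couples in unattenuated, so the signal-to-crosstalk ratio drops dB-for-dB with channel loss. A sketch assuming 5% NEXT coupling (within the 3-6% range quoted above) and hypothetical channel losses:

```python
import math

def next_sxr_db(channel_loss_db, next_fraction):
    """Signal-to-NEXT ratio (dB) at a receiver next to a full-swing transmitter.

    The victim signal is attenuated by channel_loss_db; the aggressor couples in
    with amplitude next_fraction of the full transmit swing and is not attenuated.
    """
    rx_amplitude = 10 ** (-channel_loss_db / 20)   # relative to the Tx swing
    return 20 * math.log10(rx_amplitude / next_fraction)

if __name__ == "__main__":
    for loss in (0, 10, 20, 30):   # hypothetical channel losses at the frequency of interest
        print(f"loss {loss:2d} dB: SXR with 5% NEXT = {next_sxr_db(loss, 0.05):6.1f} dB")
```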

A Complex System
[Figure: channel response built up in stages: PCB only; PCB + connectors; PCB, connectors, via stubs and devices]

Signaling Faster – System-Level Improvements
• Channel designer's view (passive techniques)
  » Try to make Z discontinuities go away
  » Reduce crosstalk (EM isolation)
• IC designer's view (active techniques)
  » Design circuits that compensate for or eliminate interference

Equalization for Loss: Goal Is to Flatten the Response
• The channel is band-limited
• Equalization: boost high frequencies relative to lower frequencies
[Figure: channel response × equalizer response = flat response]

Receive Linear Equalizer
• Amplifies the high frequencies attenuated by the channel
• Pre-decision
• Digital or analog FIR filter
• Issues
  » Amplifies noise
  » Precision
  » Tuning delays (if analog)
  » Setting coefficients
    – Adaptive algorithms such as LMS
[Figure: FIR filter with delay elements D and tap weights W1 … WL summed into H(s); high-pass frequency response]
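
As an illustration of setting receive FIR coefficients adaptively, the sketch below trains a symbol-spaced FFE with the LMS update w ← w + μ·e·r, where e is the difference between a known training symbol and the equalizer output. The channel taps, noise level, step size μ = 0.02, tap count and zero decision delay are all assumptions chosen for the example, not values from the lecture.

```python
import random

def simulate(channel, n_sym, noise, rng):
    """Random +/-1 symbols through a symbol-spaced channel, plus Gaussian noise."""
    tx = [rng.choice((-1.0, 1.0)) for _ in range(n_sym)]
    rx = [sum(h * tx[n - k] for k, h in enumerate(channel) if n >= k) + rng.gauss(0.0, noise)
          for n in range(n_sym)]
    return tx, rx

def lms_train(tx, rx, n_taps=11, mu=0.02):
    """Adapt symbol-spaced FIR taps with LMS against known training symbols."""
    w = [0.0] * n_taps
    for n in range(len(rx)):
        window = [rx[n - i] if n >= i else 0.0 for i in range(n_taps)]
        y = sum(wi * ri for wi, ri in zip(w, window))
        e = tx[n] - y                      # decision delay of 0 symbols (assumed)
        w = [wi + mu * e * ri for wi, ri in zip(w, window)]
    return w

def equalize(rx, w):
    """Apply the FIR equalizer to the received samples (pre-decision)."""
    return [sum(wi * rx[n - i] for i, wi in enumerate(w) if n >= i) for n in range(len(rx))]

def bit_errors(tx, y):
    """Count slicer errors when deciding +/-1 by the sign of y."""
    return sum((yi >= 0) != (ti > 0) for ti, yi in zip(tx, y))

if __name__ == "__main__":
    channel = [0.6, 0.5, 0.3]   # hypothetical pulse response; eye closed (0.6 < 0.5 + 0.3)
    rng = random.Random(1)
    w = lms_train(*simulate(channel, 20000, 0.02, rng))
    tx, rx = simulate(channel, 2000, 0.02, rng)
    print("errors without EQ:", bit_errors(tx, rx))
    print("errors with LMS-trained FFE:", bit_errors(tx, equalize(rx, w)))
```

The assumed channel has a closed eye without equalization, so raw slicing makes many errors; the trained FFE approximately inverts it, at the cost of amplifying the noise, which is the first issue listed above.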

Transmit Linear Equalizer
• Attenuates low frequencies
  » Need to be careful about output amplitude: output power is limited
    – If you could make bigger swings, you would
    – The EQ really attenuates low frequencies to match the high frequencies
  » Also an FIR filter: a D/A converter
• Can get better precision than Rx
• Issues
  » How to set the EQ weights?
  » Doesn't help loss at f
[Figure: high-pass equalizer response H(s) vs. frequency]
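
The Tx equalizer boosts high frequencies only by attenuating low ones, because the peak output swing is fixed. A sketch with a hypothetical 2-tap de-emphasis filter [1-α, -α] (so |main| + |post| = 1): the Nyquist gain stays at 1 while the DC gain drops to 1-2α.

```python
import cmath
import math

def fir_gain(taps, f_norm):
    """|H(f)| of a symbol-spaced FIR; f_norm = f * T_symbol, so 0.5 is the Nyquist frequency."""
    return abs(sum(c * cmath.exp(-2j * math.pi * f_norm * k) for k, c in enumerate(taps)))

def deemphasis_taps(alpha):
    """Hypothetical 2-tap Tx equalizer [main, post] with |main| + |post| = 1 (peak-swing limit)."""
    return [1.0 - alpha, -alpha]

if __name__ == "__main__":
    for alpha in (0.0, 0.1, 0.25):
        taps = deemphasis_taps(alpha)
        dc, nyq = fir_gain(taps, 0.0), fir_gain(taps, 0.5)
        print(f"alpha={alpha:.2f}: DC gain {dc:.2f}, Nyquist gain {nyq:.2f}, "
              f"boost {20 * math.log10(nyq / dc):.1f} dB")
```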

Transmit Linear Equalizer: Single-Bit Operation
[Figure: voltage vs. time (-0.3 to 1.2 ns) for a single bit at the end of the line: unequalized pulse and equalized pulse]

Example: 5 Gb/s over 26" FR4
[Figure: eye diagrams with no equalization and with the Tx linear equalizer]

Decision Feedback Equalization
• Don't invert the channel, just remove the ISI
  » The ISI is known because those symbols have already been received
  » Doesn't amplify noise
  » Has an error-accumulation problem
    – Less of an issue in links, where random noise is small
• Requires a feed-forward equalizer for precursor ISI
  » Reshapes the pulse to eliminate the precursor
[Figure: input → feed-forward EQ (FIR filter) → summing node → decision (slicer); the slicer output feeds back through the feed-back EQ (FIR filter) and is subtracted at the summing node]
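
A minimal DFE sketch: because past decisions are known, the post-cursor ISI they cause can be subtracted before slicing, without amplifying noise. It assumes the post-cursor taps are already known and that there is no pre-cursor ISI (the feed-forward equalizer's job); the channel and noise values are hypothetical.

```python
import random

def simulate(channel, n_sym, noise, rng):
    """Random +/-1 symbols through a symbol-spaced channel, plus Gaussian noise."""
    tx = [rng.choice((-1.0, 1.0)) for _ in range(n_sym)]
    rx = [sum(h * tx[n - k] for k, h in enumerate(channel) if n >= k) + rng.gauss(0.0, noise)
          for n in range(n_sym)]
    return tx, rx

def dfe(rx, post_taps):
    """Binary DFE: subtract the ISI predicted from past decisions, then slice at zero."""
    decisions = []
    for n, r in enumerate(rx):
        isi = sum(b * decisions[n - 1 - k] for k, b in enumerate(post_taps) if n - 1 - k >= 0)
        decisions.append(1.0 if r - isi >= 0 else -1.0)
    return decisions

if __name__ == "__main__":
    channel = [0.6, 0.5, 0.3]   # hypothetical: main cursor + 2 post-cursors, no pre-cursor
    tx, rx = simulate(channel, 2000, 0.02, random.Random(3))
    raw = [1.0 if r >= 0 else -1.0 for r in rx]
    print("errors without DFE:", sum(a != b for a, b in zip(raw, tx)))
    print("errors with DFE:   ", sum(a != b for a, b in zip(dfe(rx, channel[1:]), tx)))
```

If one decision is wrong, the subtracted ISI is wrong as well, which is the error-accumulation problem noted above; with small random noise it rarely happens.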

Transmit and Receive Equalization
• Transmit and receive equalizers are combined to make a range-restricted DFE
  » The Tx equalizer functions as the feed-forward filter
  » The Rx equalizer is restricted by the performance of the feedback loop
[Figure: Tx data and Rx data paths with tap-select logic]

Tx & Rx Equalization Ranges
• Tx driver/equalizer: 5 taps, 1 (pre) + 1 (main) + 3 (post)
• Rx equalizer: 5-17 taps after main; pick any 5 taps

Minimizing Reflections: The Vias
• Minimizing via stubs
  » Thinner PCBs are better... but sometimes impossible
  » Counter-boring
  » Blind vias
  » SMT technology
  » All are costly: 1.1x-2x
[Figure: counter-bored via and blind via]

Vias: Effect of Counter-Boring
• Counter-boring the top layer takes it from highest loss to lowest loss and reduces resonance
[Figure: layer 3 response without and with counter-boring]

Minimizing Reflections: Termination Design
• On-chip termination
  » Bondwire and pad capacitance become part of the channel, instead of a stub (which rings)

Minimizing Reflections: FET Terminations
[Figure: I-V characteristic of a two-element resistor [Dally]]

Alternate Approaches: Multi-Level Signaling
• Binary (NRZ) is 2-PAM
  » 2-PAM uses 2 levels to send one bit per symbol
  » Bit rate = 2 × Nyquist frequency
• 4-PAM uses 4 levels to send 2 bits per symbol
  » Each level has a 2-bit value
  » Bit rate = 4 × Nyquist frequency
[Figure: 2-PAM levels 1 and 0; 4-PAM levels labeled 00, 01, 11, 10]
• Note: both can be either single-ended or differential
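
The level labels 00, 01, 11, 10 form a Gray code: neighbouring levels differ in one bit, so the most likely slicer error (to an adjacent level) costs a single bit error. A sketch of 4-PAM mapping and slicing, assuming the levels are ordered 00, 01, 11, 10 from lowest to highest at -3, -1, +1, +3 (the ordering and amplitudes are assumptions):

```python
# Assumed level order (lowest to highest): 00, 01, 11, 10
GRAY_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
LEVEL_BITS = {level: bits for bits, level in GRAY_LEVELS.items()}

def pam4_encode(bits):
    """Map pairs of bits to one of four symbol levels (2 bits per symbol)."""
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    return [GRAY_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def pam4_slice(sample):
    """Nearest level, using three thresholds at -2, 0 and +2 (three eyes)."""
    if sample < -2:
        return -3
    if sample < 0:
        return -1
    if sample < 2:
        return 1
    return 3

def pam4_decode(samples):
    """Slice each sample and map the level back to its two bits."""
    bits = []
    for s in samples:
        bits.extend(LEVEL_BITS[pam4_slice(s)])
    return bits

if __name__ == "__main__":
    data = [1, 0, 0, 0, 1, 1, 0, 1]
    symbols = pam4_encode(data)
    print(symbols)
    print(pam4_decode([s + 0.4 for s in symbols]) == data)
```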
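
When Does 4-PAM Make Sense?
• First order: look at the slope of S21
  » 4-PAM has 3 eyes in the space of 1 NRZ eye, each 1/3 the height: about a 10 dB penalty ($20\log_{10}3 \approx 9.5$ dB)
  » 4-PAM halves the symbol rate, moving the Nyquist frequency down one octave, so if the loss is steeper than 10 dB/octave, 4-PAM should be considered
[Figure: |H(f)| (-20 to -60 dB) vs. Nyquist frequency (0 to 4 GHz)]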
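
The first-order rule can be written down directly: at the same bit rate, 4-PAM sees the channel loss one octave lower than NRZ but pays the 20·log10(3) ≈ 9.5 dB three-eye penalty, so it wins when the extra NRZ loss over that octave exceeds the penalty. A sketch with hypothetical loss values for a 10 Gb/s link (ISI, noise and crosstalk ignored):

```python
import math

PAM4_PENALTY_DB = 20 * math.log10(3)   # 3 eyes, each 1/3 of the NRZ eye: ~9.5 dB

def eye_db(loss_nrz_nyquist_db, loss_half_nyquist_db):
    """First-order eye amplitudes (dB relative to the Tx swing) at the same bit rate.

    NRZ sees the loss at its Nyquist frequency; 4-PAM runs at half the symbol
    rate, so it sees the loss one octave lower but pays the 3-eye penalty.
    """
    nrz = -loss_nrz_nyquist_db
    pam4 = -loss_half_nyquist_db - PAM4_PENALTY_DB
    return nrz, pam4

def prefer_pam4(loss_nrz_nyquist_db, loss_half_nyquist_db):
    """True if the first-order 4-PAM eye is larger than the NRZ eye."""
    nrz, pam4 = eye_db(loss_nrz_nyquist_db, loss_half_nyquist_db)
    return pam4 > nrz

if __name__ == "__main__":
    # Hypothetical losses for a 10 Gb/s link: NRZ Nyquist = 5 GHz, 4-PAM Nyquist = 2.5 GHz
    for l5, l25 in ((30.0, 18.0), (25.0, 18.0)):
        nrz, pam4 = eye_db(l5, l25)
        print(f"slope {l5 - l25:.0f} dB/octave: NRZ {nrz:.1f} dB, 4-PAM {pam4:.1f} dB, "
              f"4-PAM better: {prefer_pam4(l5, l25)}")
```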

Alternate Approaches: Simultaneous Bidirectional
• Two signals at half speed
  » Makes sense if the bandwidth needs are equal in both directions
• Issues
  » Getting ideal timing between Tx and Rx is tough
[Figure: driver and receiver at one end of the shared line Vline; the transmit and receive signals superimpose on the line; the receiver compares Vline against the shared references VrefH and VrefL]
• Fixed VrefL = Vdd - 1.5·Vswing
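
One way to see how each receiver separates the two directions, sketched under an assumed model: each end pulls the shared line down by Vswing when it sends a 1, so the line takes three levels, and each receiver picks VrefH = Vdd - 0.5·Vswing or VrefL = Vdd - 1.5·Vswing according to its own transmitted bit. The supply and swing values are hypothetical; only the VrefL expression comes from the slide.

```python
VDD = 1.8     # hypothetical supply, V
VSWING = 0.4  # hypothetical swing contributed by one driver, V

VREF_H = VDD - 0.5 * VSWING   # used when the local transmitter sends 0
VREF_L = VDD - 1.5 * VSWING   # used when it sends 1 (the slide's fixed VrefL)

def line_voltage(bit_a, bit_b):
    """Both ends drive the same wire; each '1' pulls the line down by VSWING (assumed model)."""
    return VDD - VSWING * (bit_a + bit_b)

def receive(v_line, own_bit):
    """Recover the far-end bit by comparing against the reference picked by our own bit."""
    vref = VREF_L if own_bit else VREF_H
    return 1 if v_line < vref else 0

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            v = line_voltage(a, b)
            print(f"A sends {a}, B sends {b}: Vline={v:.2f} V, "
                  f"A hears {receive(v, a)}, B hears {receive(v, b)}")
```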

Characterization System
• Multiple:
  » Connectors
  » Backplane materials
  » Trace lengths
  » Layers/via lengths
  » Via technology
• These slides:
  » 20" trace length
  » FR4, not counter-bored
  » Nelco 6000, 2-step counter-bored
  » Top and bottom layers
• Will show the Rambus 10Gb/s backplane SerDes demo on Friday

An Attempt to Shift the Problem to the DSP Side
• 8-way DAC (8-bit) and ADC (4-bit)
• 8 GSa/s
• A lot of power (not even including the DSP section)
• DACs and ADCs are complex: a lot of parasitic filtering, hence channel degradation
• Still, people are moving in that direction: see K. Poulton's 20 GSa/s 8-bit ADC paper at ISSCC03

Time-Interleaved DACs
• DACs enabled by the overlap of two 1 GHz clocks
  » Need precise clocks: 3% pp phase noise ⇒ 24% pp of a symbol
  » Fast clocks (period of 8 gate delays) limit interleaving
  » Capacitance of all 8 DACs loads the output
[Figure: eight current DACs (data0 to data7), each enabled between clkStart_i and clkEnd_i, summed into a 50 Ω load]
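
The numbers on this slide follow from simple arithmetic, assuming the eight DACs fire in turn within each 1 GHz clock period: 8 × 1 GHz gives the 8 GSa/s quoted earlier (a 125 ps symbol), and because one clock period spans eight symbols, a 3% peak-to-peak clock error becomes 24% of a symbol. The function names are hypothetical.

```python
def interleaved_sample_rate(n_ways, f_clk_hz):
    """N DACs, each firing once per clock period at staggered phases, give N samples per period."""
    return n_ways * f_clk_hz

def clock_error_in_symbols(error_fraction_of_clock, n_ways):
    """One clock period spans n_ways symbols, so an error quoted as a fraction of the
    clock period is n_ways times larger as a fraction of one symbol."""
    return error_fraction_of_clock * n_ways

if __name__ == "__main__":
    n, f_clk = 8, 1e9
    rate = interleaved_sample_rate(n, f_clk)
    print(f"{rate / 1e9:.0f} GSa/s, symbol time {1e12 / rate:.0f} ps")
    print(f"3% pp of the clock = {clock_error_in_symbols(0.03, n):.0%} pp of a symbol")
```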

DAC Output Circuitry
[Figure: each DAC output uses a 3-bit thermometer-coded section (7 drivers of size 32) plus a 5-bit binary section (drivers of size 16, 8, 4, 2, 1); each driver has a low-fanout pre-driver gated by clkStart/clkEnd over one symbol time and supplied from VddReg]
• Pre-driver supply VddReg controls the output current
• $RC_{out} = 25\,\Omega \times 4.3\,\text{pF} \Rightarrow$ about 1.5 GHz bandwidth
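
The driver split adds up to an 8-bit DAC (7 × 32 + 31 = 255), and the quoted bandwidth follows from the single-pole formula f = 1/(2πRC). The sketch below checks both; the function names are hypothetical, and mapping the 3 MSBs to the thermometer section and the 5 LSBs to the binary section is an assumption consistent with the figure labels.

```python
import math

BINARY_WEIGHTS = (16, 8, 4, 2, 1)   # 5-bit binary-weighted cells

def segment(code):
    """Split an 8-bit code: 3 MSBs -> thermometer (7 cells of size 32), 5 LSBs -> binary cells."""
    if not 0 <= code <= 255:
        raise ValueError("8-bit code expected")
    msb, lsb = code >> 5, code & 0x1F
    thermometer = [1 if i < msb else 0 for i in range(7)]
    binary = [(lsb >> bit) & 1 for bit in (4, 3, 2, 1, 0)]
    return thermometer, binary

def output_units(code):
    """Total output current in units of the smallest (size-1) cell."""
    thermometer, binary = segment(code)
    return 32 * sum(thermometer) + sum(w * b for w, b in zip(BINARY_WEIGHTS, binary))

def rc_bandwidth_hz(r_ohm, c_farad):
    """-3 dB bandwidth of a single-pole RC output node."""
    return 1.0 / (2 * math.pi * r_ohm * c_farad)

if __name__ == "__main__":
    print(segment(200))
    print(f"{rc_bandwidth_hz(25.0, 4.3e-12) / 1e9:.2f} GHz")
```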

Transceiver Inductors and Clocking
• Phase adjusters correct LC delay and static errors
  » Adjuster: clock mux plus a 1/16th-symbol clock interpolator
  » 8 ADC phase adjusters + 1 for timing recovery
  » 16 DAC phase adjusters (2 clocks for each DAC)
[Figure: transmit side: 4-stage PLL feeding phase adjusters that clock the DACs from transmit memory; receive side: 4-stage PLL feeding phase adjusters that clock the ADCs into Rx memory, plus timing recovery]

Next Challenges
• Improving the power-supply rejection (PSR) of all circuits in the path
• Integration of many links
  » Low-power, small-area, portable solutions
• Control of complex architectures
  » Deal with loss, reflections and crosstalk
• Offset and mismatch
  » Both voltage and time
• Lots of opportunity for design!

Measured S21's: FR4, No Counter-Bore
[Figure: S21 (0 to 100%) vs. Nyquist frequency (0 to 5 GHz) for traces FR420top, FR420bot, FR410top and FR410bot]

26" FR4, Bottom Layer: 3.125 Gb/s, 2-PAM with EQ

26" FR4, Bottom Layer: 6.4 Gb/s, 2-PAM with EQ

26" FR4, Top Layer: 6.4 Gb/s, 2-PAM with EQ

26" FR4, Top Layer: 6.4 Gb/s, 4-PAM with EQ

Measured S21's: N6k Counter-Bored
[Figure: S21 (0 to 100%) vs. Nyquist frequency (0 to 5 GHz) for traces NCB20top, NCB20bot, NCB10top and NCB10bot]

26" N6k Counter-Bored, Top Layer: 6.4 Gb/s, 2-PAM

10G Eyes & System Margin Shmoos
• 3"/20"/3" = 26" trace + 2 connectors
• Tested to BER < $10^{-15}$
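
How long it takes to claim BER < $10^{-15}$: assuming independent errors and a run with zero errors observed, the probability of seeing no errors in N bits is exp(-N·BER), so confidence level CL needs N = -ln(1-CL)/BER bits. A sketch for a 95% confidence level at 10 Gb/s (both assumptions; function names hypothetical):

```python
import math

def bits_for_ber(ber_target, confidence=0.95):
    """Error-free bits needed to claim BER < ber_target at the given confidence.

    Assumes independent errors (Poisson statistics) and zero errors observed:
    P(0 errors | BER) = exp(-N * BER) <= 1 - confidence.
    """
    return -math.log(1.0 - confidence) / ber_target

def measurement_hours(ber_target, bit_rate, confidence=0.95):
    """Wall-clock time for the error-free run, in hours."""
    return bits_for_ber(ber_target, confidence) / bit_rate / 3600.0

if __name__ == "__main__":
    print(f"{bits_for_ber(1e-15):.2e} bits, "
          f"{measurement_hours(1e-15, 10e9):.0f} hours at 10 Gb/s")
```

Roughly 3 × 10^15 error-free bits, or about 83 hours at 10 Gb/s, which is why very low BER claims are often backed by margin (shmoo) measurements rather than by direct counting alone.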

Link Performance vs. Time
[Figure: link performance vs. time (Walker'02)]

Link Efficiency: Gb/W, Gb/mm²
[Figure: link efficiency of designs published at ISSCC 1992-2001 (Walker'02)]