A Blind Baud-Rate CDR and Zero-Forcing Adaptive DFE for an ADC-Based Receiver
by
Clifford Ting
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2013 by Clifford Ting
A Blind Baud-Rate CDR and Zero-Forcing Adaptive DFE for an ADC-Based Receiver
Clifford Ting
Master of Applied Science, 2013
Graduate Department of Electrical and Computer Engineering
University of Toronto
Abstract
This thesis describes two design ideas in the area of ADC-based receivers.
The first contribution of this thesis is a 10Gb/s blind baud-rate CDR. Blind baud-rate
operation is made possible by a 2UI integrate-and-dump filter, which creates
intentional ISI in adjacent bit periods. The blind samples are interpolated to recover
center-of-the-eye samples for a speculative Mueller-Muller PD and a 2-tap DFE operation.
The 65nm CMOS test chip has a measured high-frequency jitter tolerance of 0.19UIPP
at ±300ppm of frequency offset.
The second contribution of this thesis is a digital zero-forcing adaptive DFE. The
DFE coefficients are calculated by correlating data samples with the recovered bits. Sim-
ulations show that the adaptive taps converge to the ISI values on the pulse response of
the data signal. The CDR and adaptive 2-tap DFE have a high-frequency jitter tolerance
of 0.28UIPP when simulated at 10Gb/s with an 8-inch FR4 channel.
Acknowledgements
The work described in this thesis would not have been possible without the help and
support of many people.
I would like to thank my supervisor, Professor Ali Sheikholeslami, for his support
and guidance during my M.A.Sc. studies and for being a great teacher. His optimism
encouraged me to continue measuring and eventually publish the blind baud-rate test
chip, even though it did not work initially.
I would also like to thank the thesis committee members, Professor Tony Chan
Carusone, Professor Antonio Liscidini, and Professor Andreas Moshovos for reviewing
the thesis and for their valuable feedback.
My gratitude goes to Fujitsu Laboratories Ltd. for sending me to their office in
Kawasaki, Japan to tape out the test chip. I am grateful to everyone who made my
visit an enjoyable one – in particular, thank you to Masaya Kibune, Hirotaka Tamura,
Takuji Yamamoto, Kouichi Kanda, Takayuki Hamada, Junji Ogawa, Hirotaka Yamazaki,
Yasumoto Tomita, and Iwao Sugiyama. I would like to thank Tamura-san for sharing
his ideas with me and for always encouraging research discussions. A special thank you
goes to Kibune-san who stayed late to keep me company during the tapeout, made sure
I had everything I needed during my stay in Japan, and took time on weekends to give
me a tour of Kyoto and Tokyo, even though he was very busy with his own work.
Thank you to all the graduate students in BA5000 and BA5158 for making my time
in graduate school a wonderful experience. In particular, the contributions in this thesis
could not have been done without Josh Liang’s and Sadegh Jalali’s help during design
and measurement of the test chips. Also, I would like to thank the previous and
current students of Professor Sheikholeslami’s research group for their friendship and
valuable discussions: Shayan Shahramian, Safeen Huda, Behrooz Abiri, Sadegh Jalali,
Ravi Shivnaraine, Aynaz Vatankhahghadim, Josh Liang, Neno Kovacevic, and Farhad
Ramezankhani.
Most of all, I would like to thank my parents for their unconditional love and support.
Thank you for always being there for me.
Contents
1 Introduction
1.1 Motivation
1.2 Thesis Objectives
1.3 Thesis Outline
2 Background
2.1 Channel effects
2.2 Receiver
2.3 Equalization
2.3.1 Linear Equalization
2.3.2 Decision-Feedback Equalization (DFE)
2.4 Equalizer Adaptation
2.4.1 Zero-Forcing (ZF) Method
2.4.2 Minimum Mean Square Error (MMSE) Method
2.4.3 Maximum Eye Opening Method
2.5 Clock and Data Recovery (CDR)
2.5.1 Phase-Tracking CDR with Clock Feedback
2.5.2 Blind Feed-forward CDR
2.5.3 Blind CDR with Feedback
2.6 Summary
3 A Blind Baud-Rate CDR
3.1 Blind 1x Data Recovery Concepts
3.2 Proposed 1x Blind Receiver Architecture
3.3 Receiver Implementation
3.3.1 Integrate-and-Dump Filter
3.3.2 Clock Generator
3.3.3 Data Interpolator
3.3.4 Mueller-Muller Phase Detector
3.3.5 Decision-Feedback Equalizer
3.3.6 Loop Filter
3.4 Simulation and Measurement Results
3.5 Summary
4 A Zero-Forcing Adaptive DFE for an ADC-Based CDR
4.1 Proposed DFE Adaptation
4.2 Proposed Blind ADC-Based Receiver Architecture
4.3 Proposed Digital CDR with Adaptive 2-tap DFE
4.3.1 Data Interpolator
4.3.2 Low-Pass Filter for DFE Adaptation
4.4 Simulation Results
4.5 Conclusion
5 Conclusion
5.1 Thesis Contributions
5.2 Future Work
5.2.1 Implementation of a Fully Feed-Forward Blind Baud-Rate CDR
5.2.2 Evaluation of Phase-Dependent DFE for Data Interpolators
5.2.3 Adaptive Optimization of Offset Coefficient in MMPD
5.2.4 Calibration of I&D and ADC Front End
References
List of Tables
4.1 Comparison of Adapted Coefficients (c1 and c2) vs. Pulse Response (h1 and h2)
List of Figures
2.1 The basic components of a communication system
2.2 An example of a channel frequency response and the effect on an isolated data pulse
2.3 Intersymbol interference (ISI) when transmitting a ’1111’ sequence
2.4 Comparison of (a) binary and (b) ADC-based receivers
2.5 (a) Linear and (b) non-linear receiver equalizers
2.6 Frequency response of combined channel and linear equalizer
2.7 Source-degenerated continuous time linear equalizer
2.8 A 3-tap DFE example
2.9 A speculative 1-tap DFE
2.10 An example of channel+FFE pulse response (h(t)) and Nyquist response (g(t)). ISI is the difference between the two responses (r=g-h).
2.11 An example of a receiver with a channel (with 2 pre-cursor and 4 post-cursor taps of ISI) and a 2-tap FFE
2.12 A partial model of a discrete-time receiver with channel and FFE
2.13 A geometric representation of optimal zero-forcing FFE coefficients
2.14 A model of a discrete-time receiver, including a ZF adaptation loop
2.15 An example of minimizing average error by using steepest-descent algorithm
2.16 A model of a discrete-time receiver with a DFE and LMS adaptation loop
2.17 A system that adapts equalizer taps based on eye opening
2.18 A recovered clock sampling equalized data
2.19 CDR classification
2.20 System diagram of phase-tracking CDR with clock in feedback loop
2.21 Example of a jitter tolerance chart
2.22 (a) PD inputs and output and (b) linear model
2.23 Alexander PD implementation
2.24 Alexander PD examples with early and late CKRX
2.25 Transfer function of Alexander PD with no jitter on data or CKRX
2.26 Hogge PD implementation
2.27 Hogge PD output with (a) early, (b) on-time, and (c) late CKRX
2.28 Transfer function of Hogge PD
2.29 Example of (a) pulse response and (b) MM function [21]
2.30 System diagram of an 8x oversampled blind feed-forward (burst-mode) CDR [22,27]
2.31 The edge detection and data selection process from Figure 2.30
2.32 A blind 2x ADC-based CDR [32]
2.33 A blind 1.45x ADC-based CDR [33]
2.34 System diagram of blind CDR with feedback [10]
2.35 Analog data interpolator (DI) estimates center and edge samples from blind samples [10]
3.1 Worst-case for 2x, 1.45x and 1x sampling on open eye diagram
3.2 Comparison of theoretical worst-case jitter tolerance given the pulse responses of an ideal channel, 1UI I&D, and 2UI I&D. Blind baud-rate samples can shift across a 1UI range due to frequency offset.
3.3 System block diagram of interleaved analog front end (1 UI I&D and ADC) and digital CDR
3.4 Comparison of (a) fully analog 2UI I&D and (b) analog and digital 2UI I&D
3.5 Handling (a) negative frequency offset: data (TX) is slower than blind receiver clock (CKRX) (b) positive frequency offset: data (TX) is faster than blind receiver clock (CKRX)
3.6 Implementation of integrate-and-dump (I&D) circuit [28]
3.7 I&D operating phases synchronized with clock pulses
3.8 Implementation of clock pulse generator with adjustable delay for deskew
3.9 (a) Effect of clock phase skew on the I&D integration period (b) Equal I&D integration periods after correcting clock skew
3.10 Adjustable clock delay block
3.11 Piecewise linear interpolation of desired sample from blind samples
3.12 (a) Pulse response of an ideal channel followed by 2UI I&D (b) Proposed MM function
3.13 Design and implementation of the speculative Mueller-Muller phase detector (MMPD)
3.14 (a) A speculative 2-tap DFE and (b) the first stage of the parallel speculative DFE that recovers 8 bits per cycle
3.15 The second stage of parallel speculative DFE that recovers 16 bits per cycle
3.16 Loop filter with configurable proportional and integral gains
3.17 Simulated loop filter convergence with 1000ppm of frequency offset for PRBS-7. Signals correspond to nodes on the block diagram of Fig. 3.16
3.18 Frequency response of channel models in simulation
3.19 Simulated eye diagrams using Channel A + 2UI I&D
3.20 Simulated eye diagrams using Channel B
3.21 Simulated jitter tolerance results at 10Gb/s with a BER of 10^-6
3.22 Chip photo
3.23 Measurement setup
3.24 Average ADC output given DC input (a) before and (b) after skew correction
3.25 Measured channel frequency response
3.26 Measured eye diagrams (a) after the channel and (b) after the ADC
3.27 Simulated and measured jitter tolerance results with 10Gb/s PRBS-7 input data and BER of 10^-6 and 10^-12, respectively
4.1 ISI can be calculated by correlating sampled data (Ak, Ak−1, etc.) with recovered bits (xk, xk−1, etc.)
4.2 Zero-forcing controller for n-tap DFE adaptation
4.3 System diagram of proposed receiver with 3-bit ADC-based CDR and adaptive DFE
4.4 Data interpolator calculates sample at desired location from closest blind samples. (a) Negative or (b) positive frequency offsets result in occasional skipped or extra interpolated samples
4.5 Proposed digital CDR with adaptive DFE
4.6 Piecewise linear interpolation of desired sample from 2x blind samples
4.7 Frequency responses of 1x and 2x data interpolators. Both interpolators operate on a 10Gbps data signal with a Nyquist frequency of 5GHz.
4.8 Low-pass filter for DFE coefficients
4.9 Hysteresis block implemented in low-pass filter
4.10 Frequency responses of channel models used in simulation
4.11 Combined channel and interpolator pulse responses showing ISI tap values (h−1, h0, h1, h2, h3) when CDR has locked
4.12 Simulated DFE adaptation with Channel C at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30)
4.13 Simulated DFE adaptation with Channel D at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30)
4.14 Simplified diagram of CDR model used for eye diagram simulations
4.15 Simulated eye diagrams with 5Gbps data and Channel C. Eye diagrams correspond to signals in Figure 4.14
4.16 Simulated eye diagrams with 10Gbps data and Channel C. Eye diagrams correspond to signals in Figure 4.14
4.17 Simulated eye diagrams with 10Gbps data and Channel D. Eye diagrams correspond to signals in Figure 4.14
4.18 Simulated jitter tolerance of proposed receiver
List of Acronyms
ADC Analog-to-Digital Converter
BER Bit-Error Rate
CDR Clock and Data Recovery
CTLE Continuous-Time Linear Equalizer
DFE Decision Feedback Equalizer
DJ Deterministic Jitter
CP Charge pump
DI Data Interpolator
FFE Feed-Forward Equalizer
FIR Finite Impulse Response
FR4 A type of glass-reinforced epoxy laminate printed circuit board
Gb/s Gigabits per second
Gbps Gigabits per second
ISI Intersymbol Interference
LMS Least Mean Square
MMPD Mueller-Muller Phase Detector
MMSE Minimum Mean Square Error
NRZ Non-Return-to-Zero
PCB Printed Circuit Board
PD Phase Detector
PI Phase Interpolator
PRBS Pseudo-Random Binary Sequence
PVT Process, Voltage and Temperature
RJ Random Jitter
UI Unit Interval
USB Universal Serial Bus
VCO Voltage-Controlled Oscillator
ZF Zero-Forcing
1 Introduction
The rapid improvements in processor speeds and other digital computation have enabled
the development of applications such as Internet and video conferencing. These applica-
tions have, in turn, caused a growing demand for high-speed data communication. One
particular trend is the centralization of processing and storage resources in cloud com-
puting centers [4,9]. While cloud computing reduces complexity and power consumption
of client devices (e.g. laptops, tablets, cell phones, etc.), it comes at a cost of requiring
high bandwidth communication between the client device and computing center [4].
This thesis focuses on the part of the communication system that transfers digital data
from one chip to another via wireline channels. In recent years, the wireline channels
used in chip-to-chip communications have not improved at the same rate that silicon
technologies have advanced. In addition, the desire to minimize production costs has
limited the number of I/Os and channels available to each chip. Hence, we are forced to
send ever increasing amounts of data per channel in the presence of channel imperfections.
Accordingly, new circuit innovations are required in order to achieve higher data rates.
1.1. Motivation
The main channel imperfection that limits data rates is the channel’s bandwidth. As
data rates increase, the data signal experiences frequency-dependent attenuation due to
the dielectric and conductive losses in the channel [12, 15]. Analog circuits can be used
to equalize the signal and recover the digital data [13, 20, 30]. Compared to their digital
counterparts, the analog circuits consume less power, but are more vulnerable to process,
voltage, and temperature (PVT) variations. As the technology scales to smaller device
sizes and lower voltages, analog circuits benefit less from the technology advances and,
in fact, are at a disadvantage because they require voltage headroom. Digital circuits,
on the other hand, port easily and perform better with each successive technology.
The role of a clock-and-data recovery (CDR) block is to recover the transmitted data
on the receiver side by sampling the data signal either with a phase-tracking [3,7,11,37]
or blind clock [1,32,36]. Sampling with a blind clock removes the feedback loop between
analog and digital circuits and results in faster development because the analog and
digital blocks can be designed independently.
Digital equalizers and CDRs require a high-speed analog-to-digital converter (ADC).
However, the ADC consumes significant power. One of the goals of this thesis is to
increase data rate without increasing ADC power. We can accomplish this by reducing
the ADC’s sampling rate while maintaining the bit rate. The first part of this thesis
proposes a CDR that can recover data from blind, baud-rate samples. This contrasts
with previous work, which sampled at 2x [32,36] or 1.45x [33] the baud rate.
The second part of this thesis proposes an adaptive DFE for the blind, baud-rate
CDR. The proposal simplifies the DFE controller compared to a previous LMS adaptive
DFE for blind CDRs [1].
1.2. Thesis Objectives
This thesis presents the design and implementation of a blind baud-rate CDR for an
ADC-based receiver, and the architecture of a zero-forcing adaptive DFE for a blind
CDR. The main objectives of the thesis are as follows:
• Provide a background on different types of adaptive equalizers and clock-and-data
recovery systems,
• Investigate and propose a CDR to recover data from blind, baud-rate ADC samples,
• Present the implementation, simulation, and measurement results to show the proposed
CDR’s functionality,
• Investigate and propose an adaptive DFE for a blind CDR,
• Present the implementation and simulation results to show the DFE controller’s
functionality; measurement is left as future work.
1.3. Thesis Outline
The remaining chapters of this thesis are organized as follows:
• Chapter 2 provides a background on different types of adaptive equalizers and
clock-and-data recovery systems,
• Chapter 3 describes the concept of blind, baud-rate data recovery, proposes a CDR
architecture, and presents simulation and measurement results,
• Chapter 4 proposes a novel DFE controller for the CDR developed in Chapter 3,
and presents simulation results,
• Chapter 5 concludes the thesis and provides the future directions for this work.
2 Background
The purpose of this chapter is to present the basic concepts needed to understand the
contributions of this thesis and to review existing architectures of some of the blocks used
in high-speed systems for communicating digital data. The communication system shown
in Figure 2.1 consists of three main components: a transmitter, channel, and receiver.
The channel is the physical medium (e.g. wireline, wireless, or optical) that connects the
transmitter to the receiver. In this thesis, we will focus on systems designed for electrical
wireline channels. Two examples of wireline channels include the traces on an FR4 PCB
and copper wire in Ethernet and USB cables.
Figure 2.1: The basic components of a communication system
[Diagram labels: Digital Data, Transmitter (TX), Channel, Receiver (RX), Recovered Data]
The transmitter converts the digital data into a signal that can be transmitted across
the channel (e.g. electrical pulses with NRZ coding). The receiver samples the signal
at the other end and recovers the digital data. Bit errors occur when the recovered
data does not match the original data at the transmitter. The goal of the high-speed
communication system is to minimize the bit error rate (BER). For wireline applications,
the target BER is usually below 10^-12.
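As a minimal sketch of the BER definition above, the following Python snippet counts mismatches between a transmitted and a recovered bit sequence. The function names and the example sequences are hypothetical. The helper `bits_needed` uses the common zero-error confidence estimate n = -ln(1 - confidence)/BER, which is an assumption added here and does not come from the thesis.

```python
import math

def bit_error_rate(sent, recovered):
    """Fraction of positions where the recovered bit differs from the sent bit."""
    if len(sent) != len(recovered):
        raise ValueError("sequences must have the same length")
    if not sent:
        raise ValueError("sequences must be non-empty")
    errors = sum(1 for s, r in zip(sent, recovered) if s != r)
    return errors / len(sent)

def bits_needed(target_ber, confidence=0.95):
    """Bits to observe with zero errors to claim BER < target_ber
    at the given confidence level (assumed rule of thumb, not from the thesis)."""
    return -math.log(1.0 - confidence) / target_ber

# Hypothetical example: one error in eight bits
sent = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = [1, 0, 1, 0, 0, 0, 1, 0]
print(bit_error_rate(sent, recovered))
print(f"{bits_needed(1e-12):.3g} bits for a 1e-12 target")
```

Under this assumed estimate, verifying a target of 10^-12 requires on the order of 3 × 10^12 error-free bits, which suggests why simulations often use a relaxed BER target.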
At the time of this writing, the term "high-speed" refers to the sending and receiving
of data on the order of gigabits per second (Gbps). Wireline channels have not improved
much as data rates have increased. Hence, the transmitter and receiver must compensate
for the non-idealities of the channel (e.g. bandwidth limitations) in order to reduce the
BER below the target rate while minimizing their power consumption.
In this chapter, we will focus on the blocks and different architectures of the receiver
and omit the details of the transmitter because this thesis contributes in the area of receiver
architecture. The chapter is organized as follows. Section 2.1 describes channel
non-idealities and their effect on the transmitted data signals. Section 2.2 compares binary
and ADC-based receivers. Sections 2.3 and 2.4 describe how equalizers compensate for
channel bandwidth limitations and how to adapt an equalizer to match a particular
channel. Section 2.5 discusses different types of clock-and-data recovery blocks.
2.1. Channel effects
The top of Figure 2.2 shows an example of a frequency response of the channel from
the transmitter to the receiver (also known as the S21 parameter). The frequency, fb,
is the baud rate of the data (e.g. if the data rate is 10Gbps with NRZ coding, then fb
would be 10GHz). We are mostly interested in the channel frequency response up to
fb/2 since the data pattern with the highest transition density ("01010101...") has a
fundamental frequency of fb/2. In an FR4 channel, the skin effect of the copper trace and dielectric
loss from the surrounding PCB substrate cause the channel response to be attenuated
at high frequencies. As we increase the data rate, the channel will further attenuate the
signal at fb/2.
The bottom of Figure 2.2 shows the transmitted and received data pulses. Assuming
NRZ coding, the transmitted pulse has a duration of Tb=1/fb. The figure shows a digital
"1" being sent; if a digital "0" were being sent, then the pulse amplitude would be negative.
The transmitted pulse is a Nyquist pulse; if we sample the pulse at the baud rate, fb, then we
would have only one non-zero sample at h0. However, the channel’s frequency-dependent
attenuation spreads the pulse energy into adjacent Tb bins (h−1, h1, h2, etc.). If we
transmit a sequence of bits, as depicted in Figure 2.3, then the received signal would
become a superposition of pulses.
Figure 2.2: An example of a channel frequency response and the effect on an isolated data pulse
[Plot labels: Channel Freq. Response vs. Freq., marked at fb; Pulse Response vs. Time at TX (width Tb=1/fb, sample h0) and at RX (samples h-1, h0, h1, h2, arriving after Tdelay + Δt)]
Figure 2.3: Intersymbol interference (ISI) when transmitting a ’1111’ sequence
[Plot labels: pulse components Bk+1h-1, Bkh0, Bk-1h1, Bk-2h2]
Let xk represent a sample in the received signal. Our goal is to recover the transmitted
bit, Bk, from the sample, xk. However, due to the spreading of the pulse energy, xk
also includes components from previous bits (Bk−1, Bk−2) and future bits (Bk+1). This
interference is known as intersymbol interference (ISI). The example in Figure 2.3 includes
3 ISI components in addition to Bkh0. If the ISI components are left uncompensated
they can corrupt the recovery of Bk and cause bit errors.
$$x_k = B_{k+1} h_{-1} + B_k h_0 + B_{k-1} h_1 + B_{k-2} h_2 \tag{2.1}$$
In general, the sample, xk, can be expressed as the following:
$$x_k = \underbrace{B_k h_0}_{\text{Main cursor}} + \underbrace{\sum_{i<0} B_{k-i} h_i}_{\text{Pre-cursor ISI}} + \underbrace{\sum_{i>0} B_{k-i} h_i}_{\text{Post-cursor ISI}} \tag{2.2}$$
The ISI caused by future and previous bits is known, respectively, as pre-cursor and
post-cursor ISI. In order to successfully recover the transmitted bit, Bk, the main cursor
must be the dominant cursor: its magnitude must exceed the sum of the magnitudes of
all ISI cursors. When this condition holds, the data eye diagram will be open:
$$|h_0| > \sum_{i \neq 0} |h_i| \tag{2.3}$$
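The sample model of Equation 2.2 and the open-eye condition of Equation 2.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tap values in `h` are hypothetical (not taken from the thesis), bits are represented as NRZ symbols of +1 and -1, and bits outside the sequence are treated as zero.

```python
def received_sample(bits, k, h):
    """x_k = sum over i of B_{k-i} * h_i (Equation 2.2).

    bits: NRZ symbols (+1 / -1); h: dict mapping cursor index i to tap h_i.
    Bits outside the sequence contribute nothing.
    """
    total = 0.0
    for i, h_i in h.items():
        j = k - i
        if 0 <= j < len(bits):
            total += bits[j] * h_i
    return total

def eye_is_open(h):
    """Equation 2.3: |h_0| must exceed the sum of all ISI magnitudes."""
    isi = sum(abs(v) for i, v in h.items() if i != 0)
    return abs(h[0]) > isi

# Hypothetical pulse response: one pre-cursor and two post-cursor taps
h = {-1: 0.1, 0: 0.6, 1: 0.3, 2: 0.1}

# The '1111' sequence of Figure 2.3: all ISI terms add to the main cursor
x_all_ones = received_sample([+1, +1, +1, +1], 2, h)

# Worst case for B_k = +1: every interfering bit has the opposite sign
x_worst = received_sample([-1, -1, +1, -1], 2, h)

print(x_all_ones, x_worst, eye_is_open(h))
```

In the worst case the sample shrinks to |h0| minus the summed ISI magnitudes, so it keeps the correct sign exactly when Equation 2.3 holds.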
In addition to ISI, the channel also introduces a propagation delay, Tdelay, shown in
Figure 2.2, between the transmitted and received pulses. Tdelay is constant when
transmitting a given pulse over a given channel. However, since we usually
design the transmitter and receiver to work with a range of channels, Tdelay is not known
at the time of design.
In practice, the time when the signal arrives at the receiver usually deviates from
Tdelay. This timing deviation is defined as jitter (shown as ∆t in Figure 2.2) and can be
modeled as a random process. In general, jitter can be split into two components [18]:
deterministic jitter (DJ) and random jitter (RJ). DJ is bounded while RJ is unbounded
and has a Gaussian distribution. Channel imperfections (e.g. bandwidth limitations,
reflections, crosstalk, and electromagnetic interference) can cause part of the DJ. RJ is
mostly caused by noise from the circuits in the transmitter and receiver (e.g. thermal,
shot, and flicker noise) [18].
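The split of jitter into a bounded and an unbounded component can be illustrated with a short Python sketch. The text only states that DJ is bounded and that RJ is Gaussian; the uniform distribution chosen for DJ, the function name, and the numeric values (expressed in UI) are assumptions made for illustration.

```python
import random

def sample_jitter(n, dj_pp, rj_rms, seed=0):
    """Draw n timing deviations, split into DJ and RJ components (in UI).

    DJ: bounded, modeled here as uniform over [-dj_pp/2, +dj_pp/2] (assumption).
    RJ: unbounded, Gaussian with standard deviation rj_rms.
    """
    rng = random.Random(seed)
    dj = [rng.uniform(-dj_pp / 2, dj_pp / 2) for _ in range(n)]
    rj = [rng.gauss(0.0, rj_rms) for _ in range(n)]
    return dj, rj

# Hypothetical values: 0.2 UI peak-to-peak DJ, 0.01 UI rms RJ
dj, rj = sample_jitter(1000, dj_pp=0.2, rj_rms=0.01)
total = [d + r for d, r in zip(dj, rj)]

print("largest |DJ| :", max(abs(d) for d in dj))
print("largest |RJ| :", max(abs(r) for r in rj))
print("largest |dt| :", max(abs(t) for t in total))
```

The DJ samples never exceed half the peak-to-peak bound, whereas the largest RJ sample keeps growing as more samples are drawn, which is why RJ is usually specified as an rms value.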
In order to compensate for the channel’s non-idealities, a typical receiver contains two
main blocks: an equalizer and a clock-and-data recovery (CDR) block. Equalizers are
commonly used to reduce pre-cursor and post-cursor ISI in order to fulfill Equation 2.3. The
CDR block compensates for the propagation delay and jitter so that the data is recovered
with a BER below the target.
2.2. Receiver
Figure 2.4a shows a conventional binary receiver with a phase-tracking clock in a feedback
loop. The analog equalizer reduces ISI in order to open the eye. The eye opening allows
the comparator to sample the received data where the error probability is at its minimum
and regenerate the data bit. The comparator is followed by a CDR that detects the phase
of the equalized data signal and aligns the rising edge of the clock (CKREC) with the
center of the data eye. This kind of receiver is binary because the sampling comparator
only captures the sign of the incoming data signal. Since equalization requires both sign and
magnitude, all of the equalization must be done before the comparator in the analog
domain.
Figure 2.4: Comparison of (a) binary and (b) ADC-based receivers
[Diagram labels: (a) Data, Analog Equalizer, CDR, CKREC; (b) Data, Analog Equalizer, ADC, Digital Equalizer, CDR, CKREC]
Alternatively, we can transform the binary receiver into an ADC-based receiver by
replacing the comparator with an ADC as illustrated in Figure 2.4b. By incorporating an
ADC in the receiver, we capture both sign and magnitude of the signal after the analog
equalizer. Now that we have obtained magnitude information, it becomes possible to
perform additional equalization in the digital domain after the ADC. The digital equalizer
and CDR architectures (discussed in Sections 2.3 and 2.5) can be implemented entirely
with HDL code. The main disadvantages of ADC-based receivers are the high power
consumption and large area of the ADC. However, there are several benefits of a digital
equalizer and CDR implementation:
• Digital blocks are immune to PVT variations (assuming that timing constraints are
met across all corners)
• HDL code is easily ported across technology nodes using automatic synthesis, place,
and route software, whereas analog blocks must be manually designed for each
process technology.
• The digital blocks scale with more advanced technology nodes (which often benefit
digital blocks more than analog ones).
• The digital equalizer can be easily combined with a digital controller. Furthermore,
the adaptive controller may benefit by gaining access to both sign and magnitude
information. In analog equalizers, the adaptation controller often only has access
to sign information.
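The difference between the two front ends can be sketched in Python as follows. A comparator keeps only the sign of a sample, while an ADC also keeps a quantized magnitude that a digital equalizer can use. The 3-bit resolution, the uniform quantizer, the unit full-scale range, and the function names are illustrative assumptions.

```python
def comparator(v):
    """Binary front end: only the sign of the sample survives."""
    return 1 if v >= 0 else -1

def adc(v, bits=3, full_scale=1.0):
    """Uniform quantizer over [-full_scale, +full_scale); returns a code 0..2^bits-1."""
    levels = 2 ** bits
    step = 2 * full_scale / levels
    code = int((v + full_scale) / step)
    return max(0, min(levels - 1, code))  # clip out-of-range inputs

def adc_value(code, bits=3, full_scale=1.0):
    """Mid-point voltage represented by an ADC code."""
    step = 2 * full_scale / 2 ** bits
    return -full_scale + (code + 0.5) * step

# Two hypothetical samples: a strong '1' and a weak '1' degraded by ISI
strong, weak = 0.8, 0.1

print(comparator(strong), comparator(weak))            # indistinguishable
print(adc(strong), adc(weak))                          # different codes
print(adc_value(adc(strong)), adc_value(adc(weak)))    # magnitudes preserved
```

Both samples look identical after the comparator, but the ADC codes still show that the second sample has been attenuated, which is the information a digital equalizer or adaptation controller needs.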
2.3. Equalization
Equalization can be implemented at the transmitter (pre-equalizer), receiver (post-equalizer),
or both. It is easier to perform post-equalization for two reasons. First, the equalizer may
change the signal swing, which, if implemented in the transmitter, alters the amplitude
of the output signal. In many cases, the transmitter is designed for a set of communication
standards that impose constraints on the signal swing in the channel. The standards do
not impose constraints on the signals internal to the receiver. Second, adaptive equal-
ization is more easily implemented in the receiver compared to the transmitter. The
adaptive controller requires information about the channel response, which can be esti-
mated at the receiver. If the adaptive controller were implemented at the transmitter, it
would require feedback to be sent back from the receiver through an auxiliary channel.
For these reasons, some systems include a small amount of constant equalization at the
transmitter and adaptively equalize most of the frequency-dependent attenuation at the
receiver [36]. In this thesis, we will focus on post-equalizers.
There are two broad categories of equalizers: linear and non-linear. A receiver may
include one type or both types of equalizers. We will discuss them in more detail in
Sections 2.3.1 and 2.3.2.
Figure 2.5: (a) Linear and (b) non-linear receiver equalizers
[Diagram labels: Data Signal (xK), Linear EQ, Partially Equalized Signal (xK’), Non-Linear Equalizer with C(z), yK, wK, Recovered Data (AK)]
2.3.1. Linear Equalization
The main purpose of equalization is to improve BER by decreasing the ISI caused by the
high-frequency attenuation of the channel. If we can cascade the channel and equalizer
such that the overall response is flat up to fb/2, then most of the ISI will be eliminated.
A linear equalizer achieves the final flat response by either emphasizing (i.e. boosting)
the high frequency content or by de-emphasizing (i.e. attenuating) the low frequency
content in the data signal. Figure 2.6 shows the latter.
[Figure 2.6 panels: Channel Response, Linear EQ Response, and the resulting Channel + Linear EQ Response, each plotted up to fb/2]
Figure 2.6: Frequency response of combined channel and linear equalizer
A linear equalizer can be implemented with a continuous-time or discrete-time ar-
chitecture. Figure 2.7 depicts an example of a commonly-used continuous-time linear
equalizer (CTLE). Usually, RS or CS is made programmable so that the zero at fz can be
adjusted to match the channel response. CTLEs can only be implemented with analog
circuits.
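[Figure 2.7 diagram: differential pair (Vin+, Vin-, transconductance gm) with source degeneration RS and CS, loads RL and CL, and output VOUT; the gain plot marks fZ, fP1, fP2, and AV, with]
\[
f_Z \approx \frac{1}{2\pi R_S C_S}, \qquad
f_{P1} \approx \frac{1 + (g_m R_S)/2}{2\pi R_S C_S}, \qquad
f_{P2} \approx \frac{1}{2\pi R_L C_L}, \qquad
A_V \approx g_m R_L
\]
Figure 2.7: Source-degenerated continuous time linear equalizer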
A discrete-time linear equalizer (also known as a feed-forward equalizer or FFE) can
be implemented in either the analog domain or the digital domain (if the receiver is ADC-based).
It usually includes an infinite impulse response (IIR) or finite impulse response
(FIR) filter that boosts the high-frequency content of the data signal.
The main disadvantage of a linear equalizer is that it boosts not only the high-
frequency content of the signal, but also high-frequency noise. Sources of noise include
thermal noise from the transmitter and receiver, crosstalk, and ADC quantization noise
(in the case of a digital FFE). The latter can add a significant amount of noise to the
signal; hence, we can either increase the ADC resolution to reduce the quantization noise
or reduce the FFE gain (and rely on the decision-feedback equalizer described in the next
section).
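The noise penalty of a digital FFE can be illustrated with a short Python sketch. The 2-tap coefficients and the noise level below are illustrative assumptions, not values from this thesis; the sketch only shows that an FIR filter scales white-noise power by the sum of its squared taps.

```python
import numpy as np

def ffe(samples, taps):
    """Apply an FIR feed-forward equalizer to a sample stream."""
    return np.convolve(samples, taps)[: len(samples)]

def noise_power_gain(taps):
    """White-noise power gain of an FIR filter: the sum of squared taps."""
    return float(np.sum(np.square(taps)))

# Hypothetical 2-tap FFE that boosts high frequencies (values are assumptions).
taps = np.array([1.0, -0.5])

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, 200_000)   # stand-in for quantization/thermal noise
measured_gain = np.var(ffe(noise, taps)) / np.var(noise)
```

With these assumed taps the DC gain is 0.5 while the white-noise power gain is 1.25, so the signal-to-noise ratio of low-frequency content degrades even though the ISI is reduced.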
2.3.2. Decision-Feedback Equalization (DFE)
A decision-feedback equalizer (DFE) is a non-linear equalizer that removes post-cursor
ISI from the channel using the recovered data and a filter (shown as C(z) in Figure 2.5)
whose pulse response matches the post-cursor ISI of the partially equalized signal. The
DFE response is given by:
\[
w_k = \sum_{i>0} A_{k-i} c_i \tag{2.4}
\]
We assume a very low BER such that we can approximate the original data, Bk,
with the recovered data, Ak. The optimal equalization occurs when the DFE coefficients
match the ISI taps of the partially equalized signal (i.e. ci = h′i):
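\[
\begin{aligned}
y_k &= x'_k - w_k = \Big( B_k h'_0 + \sum_{i<0} B_{k-i} h'_i + \sum_{i>0} B_{k-i} h'_i \Big) - \sum_{i>0} A_{k-i} c_i \\
y_k &= B_k h'_0 + \sum_{i<0} B_{k-i} h'_i
\end{aligned}
\tag{2.5}
\]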
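As a minimal numerical sketch of Equations 2.4 and 2.5, the following Python fragment applies a 2-tap DFE whose coefficients equal the post-cursor taps. The pulse response values (one pre-cursor, two post-cursors) are assumptions chosen for illustration, and the channel is noiseless.

```python
import numpy as np

rng = np.random.default_rng(1)
bits = rng.choice([-1.0, 1.0], size=2000)      # B_k, NRZ data

# Assumed pulse response of the partially equalized signal:
# h'_{-1} (pre-cursor), h'_0 (main cursor), h'_1 and h'_2 (post-cursors).
h_pre, h_main, h_post = 0.1, 1.0, np.array([0.4, 0.2])

x = h_main * bits
x[:-1] += h_pre * bits[1:]                     # pre-cursor term, i = -1
for i, h in enumerate(h_post, start=1):
    x[i:] += h * bits[:-i]                     # post-cursor terms, i > 0

c = h_post.copy()                              # optimal taps: c_i = h'_i
decisions = np.zeros_like(bits)                # A_k
y = np.zeros_like(bits)
for k in range(len(bits)):
    # Equation 2.4: feedback from already-recovered bits only
    w_k = sum(c[i - 1] * decisions[k - i] for i in (1, 2) if k - i >= 0)
    y[k] = x[k] - w_k                          # Equation 2.5
    decisions[k] = 1.0 if y[k] >= 0 else -1.0
```

In this sketch the equalized samples retain only the main cursor and the pre-cursor term, which is the residual that Equation 2.5 predicts.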
One disadvantage is that the DFE cannot remove pre-cursor ISI since the DFE feed-
back path would need future recovered data to estimate the pre-cursor ISI. However,
linear equalizers can reduce pre-cursor ISI and are often used in conjunction with a DFE.
Another disadvantage of the DFE is error propagation. When an incorrect decision is
made, the wrong data is fed back through C(z) and may cause incorrect decisions on
future data.
The DFE’s main advantage is the absence of high-frequency noise amplification. Un-
like a linear equalizer that amplifies both signal and noise, the DFE slicer regenerates
the digital data without noise and the noise-less signal is fed back for equalization.
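[Figure 2.8 diagram: three cascaded D flip-flops hold the recovered data AK-1, AK-2, and AK-3; taps c1, c2, and c3 weight them and the sum is subtracted from the partially equalized signal; the critical timing path is marked on the feedback loop. DFE FIR = C(z) = c1 z^-1 + c2 z^-2 + c3 z^-3]
Figure 2.8: A 3-tap DFE example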
The filter, C(z), can be implemented as either an FIR or IIR filter. Figure 2.8 il-
lustrates a DFE example with a 3-tap FIR filter. The coefficients c1, c2, and c3 should
be adjustable in order to accommodate different channels. Section 2.4 describes some
adaptive controllers that can set appropriate DFE coefficients for a given channel.
The feedback loop indicated in Figure 2.8 poses a challenge in meeting timing con-
straints during design of the DFE. If the DFE is implemented in the analog domain, the
high capacitance at the adder node slows down the propagation of the feedback signal.
The problem occurs if Ak−1 cannot be recovered and sent back to the adder in time for
the next bit. A digital DFE in an ADC-based receiver would face similar issues – digital
adders are slower than analog ones. One solution to the timing problem is to employ
speculation on the recovered bit. In this thesis, we assume NRZ coding where Bk is
either +1 or -1. Figure 2.9 shows an example of a 1-tap speculative DFE. It subtracts
both c1 and −c1 from the received signal and later selects the correct result using a mux.
The speculation removes the gain and the adder from the feedback loop; only the mux
and register remain in the critical path. The cost of speculation is the area and power
consumed by the extra adder. In particular, speculative DFEs do not scale easily because
the number of adders increases exponentially as we increase the number of DFE taps.
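[Figure 2.9 diagram: the partially equalized signal is offset by +c1 and by -c1 in parallel; a mux controlled by AK-1 (for the cases -1 and +1) selects one result, which is registered by a D flip-flop. Annotation: critical path is faster]
Figure 2.9: A speculative 1-tap DFE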
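The equivalence between a direct and a speculative 1-tap DFE can be checked with a small Python sketch. The function names, the received samples, and the initial state A(-1) = +1 are assumptions made for illustration only.

```python
def dfe_direct(x, c1):
    """1-tap DFE with the multiplier and adder inside the feedback loop."""
    prev, out = 1, []          # assume A_{-1} = +1 as an arbitrary initial state
    for sample in x:
        prev = 1 if sample - c1 * prev >= 0 else -1
        out.append(prev)
    return out

def dfe_speculative(x, c1):
    """1-tap speculative DFE: slice both candidates, then select with a mux."""
    prev, out = 1, []
    for sample in x:
        if_prev_pos = 1 if sample - c1 >= 0 else -1   # result if A_{k-1} = +1
        if_prev_neg = 1 if sample + c1 >= 0 else -1   # result if A_{k-1} = -1
        prev = if_prev_pos if prev == 1 else if_prev_neg   # only the mux is in the loop
        out.append(prev)
    return out

x = [0.9, -0.2, 0.3, -1.1, 0.1, 0.6, -0.4]   # made-up received samples
```

Both functions return the same decisions; the speculative version simply moves the subtraction outside the loop. Extending the idea to N taps would need one precomputed candidate for each of the 2^N possible bit histories, which is the exponential growth noted above.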
2.4. Equalizer Adaptation
When designing a receiver, we usually intend the receiver to work with a range of chan-
nels. In addition, the ISI in the received signal may vary with process and temperature.
Hence, we would not know the exact amount of equalization required at the time of
design. As described in Sections 2.3.1 and 2.3.2, both linear and decision-feedback equal-
izers usually include configurable coefficients which can be adjusted by a controller to
obtain an appropriate amount of equalization during receiver operation. This section
describes three different adaptation methods: zero-forcing (ZF), minimum mean square
error (MMSE), and maximum eye-opening.
2.4.1. Zero-Forcing (ZF) Method
A zero-forcing (ZF) equalizer attempts to force all ISI components to zero. If the equalizer
does not have enough taps (i.e. degrees of freedom), to force all ISI to zero, then the
optimal tap values should minimize the mean-squared sum of ISI components. This
section presents a ZF controller for a linear equalizer; we will discuss and compare ZF
and LMS controllers for DFEs at the end of Section 2.4.2.
The ZF analysis and examples in this section are taken from [5] and [13] and repro-
duced here for convenience. First, we will find the optimal equalizer coefficients in terms
of zero-forcing criteria. Second, we will describe a feedback loop that converges to the
optimal coefficients.
Figure 2.10 shows an example of a combined channel and equalizer pulse response,
h(t), and the desired Nyquist response, g(t). The sampled versions of the responses can
be represented with vectors, h and g, respectively, and we assume that p1 and p2 are
constants such that pre-cursor and post-cursor taps outside the range of k− p1 to k+ p2
are zero. In Figure 2.10’s example pulse response, p1 and p2 are 2 and 4, respectively. We
define the ISI vector, r, to be the difference between the actual and desired responses, h
and g, respectively. In this and the next section, note that the transmitted and recovered
data are represented with vectors, bk and ak, instead of Bk and Ak to distinguish the
data vectors from other matrix quantities.
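[Figure 2.10 plot: the actual pulse response h(t) and the desired Nyquist response g(t), with]
\[
r_k = g_k - h_k = g(kT_b - T_{delay}) - h(kT_b)
\]
Define:
\[
\begin{aligned}
g &= [\, g_{k-p_1} \;\dots\; g_{k-1} \;\; g_k \;\; g_{k+1} \;\dots\; g_{k+p_2} \,]^T \\
h &= [\, h_{k-p_1} \;\dots\; h_{k-1} \;\; h_k \;\; h_{k+1} \;\dots\; h_{k+p_2} \,]^T \\
r &= [\, r_{k-p_1} \;\dots\; r_{k-1} \;\; r_k \;\; r_{k+1} \;\dots\; r_{k+p_2} \,]^T
\end{aligned}
\]
Figure 2.10: An example of channel+FFE pulse response (h(t)) and Nyquist response (g(t)). ISI is the difference between the two responses (r=g-h).
\[
r = g - h \tag{2.6}
\]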
The goal of a ZF equalizer is to minimize the energy of ISI, ‖r‖^2.
\[
\|r\|^2 = \|g - h\|^2 = (g - h)^T (g - h) \tag{2.7}
\]
Figure 2.11 provides a simple example of a system with a channel, FFE, and receiver.
In this case, the FFE is a 2-tap finite impulse response (FIR) filter. In Figure 2.11, f(t)
represents the channel pulse response. However, if the system includes both a CTLE and
FFE, then f(t) would be the convolution of the channel and CTLE responses. The vector,
c, represents the FFE coefficients. bk, yk, and ak represent the source data, equalized
signal, and recovered data, respectively. Figure 2.12 models the system with matrix and
vector quantities.
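[Figure 2.11 diagram: the source data bk passes through the channel f(t); a delay element Tb produces x1(t) and x2(t), which are weighted by c1 and c2 and summed into y(t), then sliced to give ak, with]
\[
\begin{aligned}
b_k &= [\, b_{k+2} \;\; b_{k+1} \;\dots\; b_{k-4} \;\; b_{k-5} \,]^T \\
c &= [\, c_1 \;\; c_2 \,]^T \\
F &= \begin{bmatrix} f(-2) & f(-1) & \dots & f(4) & 0 \\ 0 & f(-2) & \dots & f(3) & f(4) \end{bmatrix} \\
x_k &= [\, x_1(kT) \;\; x_2(kT) \,]^T
\end{aligned}
\]
Figure 2.11: An example of a receiver with a channel (with 2 pre-cursor and 4 post-cursor taps of ISI) and a 2-tap FFE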
[Figure 2.12 block diagram: bk enters the channel F, giving xk = F bk; the linear EQ c gives yk = xk^T c = bk^T F^T c, which is sliced to give ak]
Figure 2.12: A partial model of a discrete-time receiver with channel and FFE
Given Figure 2.12, we see that the pulse response is h = F^T c. Therefore, we substitute
h into Equation 2.7:
\[
\begin{aligned}
\|r\|^2 &= (g - F^T c)^T (g - F^T c) \\
&= g^T g - 2 g^T F^T c + c^T F F^T c
\end{aligned}
\tag{2.8}
\]
In order to find the optimal c that minimizes ISI (i.e. cOPT), we take the derivative
of Equation 2.8 with respect to c and set it equal to zero:
\[
\frac{\partial}{\partial c}\left(\|r\|^2\right) = 2 c^T F F^T - 2 g^T F^T = 0 \tag{2.9}
\]
\[
c_{OPT} = (F F^T)^{-1} F g \tag{2.10}
\]
As an example, let us assume that the FFE has two taps (i.e. c is a 2x1 vector). As
illustrated in Figure 2.13, h = F^T c can be represented as a 2D plane. If g lies on the
plane, then there exist values for the two taps that can compensate the ISI completely.
However, if g is not on the plane, then we can find c = cOPT such that the length of r is
minimum. This occurs when r is orthogonal to the plane spanned by h.
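[Figure 2.13 sketch: the plane of achievable responses h = F^T c, the desired response g, the closest point hOPT = F^T cOPT on the plane, and the residual r between them]
Figure 2.13: A geometric representation of optimal zero-forcing FFE coefficients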
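Equation 2.10 and the orthogonality condition can be checked numerically. The sketch below uses NumPy and an assumed pulse response f(-2) to f(4); the numbers are illustrative and do not come from a measured channel.

```python
import numpy as np

# Assumed channel pulse response samples f(-2) ... f(4).
f = np.array([0.05, 0.2, 1.0, 0.5, 0.25, 0.1, 0.05])

# F as laid out in Figure 2.11: the second row is f delayed by one bit period.
F = np.vstack([np.append(f, 0.0), np.insert(f, 0, 0.0)])   # shape (2, 8)

# Desired response g: a single unit tap at the main-cursor position.
g = np.zeros(8)
g[2] = 1.0

c_opt = np.linalg.solve(F @ F.T, F @ g)      # Equation 2.10
r = g - F.T @ c_opt                          # residual ISI vector, Equation 2.6
```

Because a 2-tap FFE cannot cancel all eight ISI components, r is not zero here; it is, however, orthogonal to both rows of F, which is the geometric condition shown in Figure 2.13.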
Figure 2.14 shows the model from Figure 2.12 with a ZF feedback loop. The vector nk
represents white noise generated by the receiver’s circuits. The error ek is the difference
between the received sample, yk, and the desired signal, which we generate using the
desired pulse response, g, and recovered data, ak. We define v_k = n_k^T c to be noise
shaped by the FFE coefficients. We also assume a low BER such that ak ≈ bk. In order
to show that the feedback loop converges correctly, we find the error, ek, in terms of bk,
r, and vk.
[Figure 2.14 block diagram: the channel + linear EQ model (F, c) with additive noise nk produces xk, yk, and ak; a shift register and g generate the desired signal from ak; the controller forms M ak ek from the error ek and updates c]
Figure 2.14: A model of a discrete-time receiver, including a ZF adaptation loop
\[
\begin{aligned}
e_k &= a_k^T g - y_k \\
&= b_k^T g - (F b_k + n_k)^T c \\
&= b_k^T (g - h) - n_k^T c \\
&= b_k^T r - v_k
\end{aligned}
\tag{2.11}
\]
The ZF adaptation correlates the error, ek, with the recovered bits, ak. Equation 2.12
takes the average of the correlation term to find the ISI vector, r.
\[
\begin{aligned}
E[a_k e_k] &= E[a_k (b_k^T r - v_k)] \\
&= E[b_k (b_k^T r - v_k)] \\
&= E[b_k b_k^T]\, r - E[b_k v_k] \\
&= r
\end{aligned}
\tag{2.12}
\]
The integrator in the feedback loop forces the average of the weighted quantity M ak ek
to zero. The matrix, M, is a parameter that sets the gain of the feedback loop and
maps ISI taps to the FFE coefficients. To find M, we assume that bk is a sequence of
independent bits such that E[b_k b_k^T] = I where I is the identity matrix. We also assume
that the data is uncorrelated with noise (i.e. E[b_k v_k] = 0).
\[
M E[a_k e_k] = M r = 0 \tag{2.13}
\]
\[
M (g - F^T c) = 0 \tag{2.14}
\]
\[
c = (M F^T)^{-1} M g \tag{2.15}
\]
By comparing Equations 2.10 and 2.15, we see that the feedback converges to the
optimal tap values if M = uF (where u is a scalar that determines loop gain) or, more
generally, M = UF (where U is a matrix). This result implies that an optimal M
should be selected based on channel and equalizer responses. It appears that, by using
ZF adaptation, we have changed the problem of choosing c into one of choosing M .
However, it turns out that M is a less sensitive parameter compared to c. In practice, M
is chosen based on the worst-case channel that the system is designed for; in other cases,
the adaptation loop will not converge optimally, but will be close enough [5].
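The convergence argument above can be exercised with a small simulation. This is a sketch under stated assumptions: the pulse response, the loop gain u, the starting coefficients, and the noiseless, error-free setting (ak = bk) are all chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

f = np.array([0.05, 0.2, 1.0, 0.5, 0.25, 0.1, 0.05])        # assumed f(-2)..f(4)
F = np.vstack([np.append(f, 0.0), np.insert(f, 0, 0.0)])     # Figure 2.11 layout
g = np.zeros(8)
g[2] = 1.0                                                   # desired response

u = 0.002                   # loop gain (assumed)
M = u * F                   # M = uF, the choice that matches Equation 2.10
c = np.array([1.0, 0.0])    # start from a pass-through FFE

bits = rng.choice([-1.0, 1.0], size=50_000)
for k in range(5, len(bits) - 2):
    b_k = bits[k - 5 : k + 3][::-1]     # [b_{k+2}, b_{k+1}, ..., b_{k-5}]
    y_k = (F @ b_k) @ c                 # noiseless FFE output, x_k^T c
    a_k = b_k                           # low-BER assumption: a_k = b_k
    e_k = a_k @ g - y_k                 # Equation 2.11
    c = c + (M @ a_k) * e_k             # integrator accumulates M a_k e_k

c_opt = np.linalg.solve(F @ F.T, F @ g)   # Equation 2.10, for comparison
```

After adaptation, c hovers around c_opt with a small residual dither set by u and by the ISI that two taps cannot remove.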
2.4.2. Minimum Mean Square Error (MMSE) Method
The minimum mean square error method seeks to minimize the average power of the
error, E[e_k^2], between the received signal, xk, and the desired signal. The error may
include the effects of both ISI and random noise. This is in contrast with the zero-forcing
method where the adaptation algorithm minimizes ‖r‖^2, which only includes ISI. An
implementation of an MMSE controller for a DFE is described in [1].
Figure 2.15 illustrates how we can find the MMSE using the steepest descent algorithm.
We assume that E[e_k^2] is well-behaved with respect to the equalizer tap values,
ck = [c1k c2k . . . cik . . . cNk], and that following the gradient at all ck will lead to the
minimum E[e_k^2]. For each cik, we start with an initial value and increment or decrement
it in the direction of decreasing average error power.
[Figure 2.15 plot: E[e^2] versus ci, with the minimum E[e^2] marked. Annotation: increment or decrement ci in the direction of decreasing E[e^2]]
Figure 2.15: An example of minimizing average error by using the steepest-descent algorithm
\[
c_{i(k+1)} = c_{ik} - u \frac{\partial E[e_k^2]}{\partial c_{ik}} \tag{2.16}
\]
In a receiver system, it is usually not practical to measure E[e_k^2]; therefore, we
approximate Equation 2.16 by replacing the expected value with the instantaneous value.
When this approximation is made, the steepest descent algorithm is known as the least
mean square (LMS) algorithm.
\[
\begin{aligned}
c_{i(k+1)} &= c_{ik} - u \frac{\partial (e_k^2)}{\partial c_{ik}} \\
c_{i(k+1)} &= c_{ik} - 2 u e_k \frac{\partial (e_k)}{\partial c_{ik}}
\end{aligned}
\tag{2.17}
\]
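Equation 2.17 can be applied to any equalizer. Figure 2.16 shows an LMS feedback loop
implemented for a DFE. In order to apply the steepest-descent algorithm, it is necessary
to relate the error, ek, to the DFE coefficients, ck, as in Equation 2.18.
[Figure 2.16 block diagram: the DFE output wk is subtracted from xk to give yk, which is sliced to give ak; a shift register holding {ak-1, ak-2, ak-3, …} feeds the DFE and, together with g, the desired signal; the error ek is scaled by 2u in the controller, which updates ck]
Figure 2.16: A model of a discrete-time receiver with a DFE and LMS adaptation loop
\[
\begin{aligned}
e_k &= y_k - \sum_{j=1}^{M} g_{jk} a_{k-j} \\
e_k &= \Big( x_k - \sum_{i=1}^{N} c_{ik} a_{k-i} \Big) - \sum_{j=1}^{M} g_{jk} a_{k-j}
\end{aligned}
\tag{2.18}
\]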
From Equation 2.18, we can find the derivative of ek with respect to cik:
\[
\frac{\partial (e_k)}{\partial c_{ik}} = -a_{k-i} \tag{2.19}
\]
We can substitute Equation 2.19 into Equation 2.17:
\[
c_{i(k+1)} = c_{ik} + 2 u e_k a_{k-i} \tag{2.20}
\]
We can implement Equation 2.20 as the controller in Figure 2.16. It is possible to
further simplify the controller to replace ek or ak−i or both with only their signs (i.e.
sgn(ek) and sgn(ak−i)). These simplified LMS controllers are respectively known as
sign-error, sign-data, or sign-sign.
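A compact Python sketch of the update in Equation 2.20 and of its sign-sign simplification follows. The pulse response, step sizes, and the unit-amplitude error target are assumptions made for illustration, and the function name adapt_dfe is hypothetical.

```python
import numpy as np

def adapt_dfe(x, n_taps, step, sign_sign=False):
    """Adapt DFE taps with Equation 2.20, or with its sign-sign simplification."""
    c = np.zeros(n_taps)
    past = np.zeros(n_taps)                  # [a_{k-1}, a_{k-2}, ...]
    for x_k in x:
        y_k = x_k - c @ past                 # DFE output
        a_k = 1.0 if y_k >= 0 else -1.0      # slicer decision
        e_k = y_k - a_k                      # error against a unit-amplitude target (assumed)
        if sign_sign:
            c += step * np.sign(e_k) * np.sign(past)
        else:
            c += 2 * step * e_k * past       # c_i(k+1) = c_ik + 2u e_k a_{k-i}
        past = np.concatenate(([a_k], past[:-1]))
    return c

rng = np.random.default_rng(3)
bits = rng.choice([-1.0, 1.0], size=20_000)
h = np.array([1.0, 0.4, 0.2])                # assumed pulse response h_0, h_1, h_2
x = np.convolve(bits, h)[: len(bits)]        # received samples with post-cursor ISI

c_lms = adapt_dfe(x, n_taps=2, step=0.01)
c_sign_sign = adapt_dfe(x, n_taps=2, step=0.001, sign_sign=True)
```

In this noiseless setting both variants settle near the post-cursor values h_1 and h_2; the sign-sign version needs only comparators and counters, at the cost of a dither of roughly one step around the final value.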
It is also interesting to compare the ZF and LMS controllers in Figures 2.14 and 2.16.
If we replace the FFE in Figure 2.14 with a DFE and substitute M = 2uI (where I is
the identity matrix), then the ZF controller becomes identical to the LMS controller for a
DFE. This is expected because a DFE does not amplify noise. Therefore, minimizing ISI
(‖r‖^2) and signal error (E[e_k^2]) at the DFE output should lead to the same solution [12].
2.4.3. Maximum Eye Opening Method
The maximum eye-opening method is another commonly-used algorithm [8, 16, 31] for
adjusting equalizer taps. Figure 2.17 shows a system that uses an eye monitor to measure
eye height or width and feeds the information back to the equalizer through a controller.
It should be noted that optimizing an equalizer to maximize eye height may not lead
to an optimal eye width and vice versa. The eye monitors described in [31] and [16]
measure eye height by comparing the outputs of a main sampler and auxiliary sampler
with a shifted threshold. If the outputs are the same, then the threshold of the auxiliary
sampler is within the eye. Thus, the eye monitor estimates eye height by increasing the
threshold of the auxiliary sampler until the outputs differ.
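[Figure 2.17 block diagram: the signal from the channel enters the equalizer, followed by the CDR, which outputs the recovered clock and recovered data; an eye monitor feeds a controller that sets the EQ coefficients]
Figure 2.17: A system that adapts equalizer taps based on eye opening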
The adaptive controllers in [31] and [16] iterate across all possible combinations of
equalizer tap values. For each combination, they determine the eye-opening by plotting
a histogram and, at the end, choose the tap settings that produce the maximum eye-
opening. Compared to the ZF and LMS equalizers, this adaptation method is slower and
cannot run continuously during data recovery because it has to try all of the equalizer
settings. However, the method is more flexible since it can be applied to a variety of
equalizer structures and does not depend on having correctly recovered data.
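The exhaustive search described above can be sketched in Python as follows. The eye monitor is modeled as a threshold sweep in the spirit of [31] and [16], but the channel, the tap grid, the threshold grid, and all function names are assumptions made for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
bits = rng.choice([-1.0, 1.0], size=2000)
h = np.array([1.0, 0.4, 0.2])                    # assumed pulse response
x = np.convolve(bits, h)[: len(bits)]

def dfe_samples(x, c1, c2):
    """Equalized samples of a 2-tap DFE that feeds back its own decisions."""
    a1 = a2 = 0.0
    y = np.empty_like(x)
    for k, x_k in enumerate(x):
        y[k] = x_k - c1 * a1 - c2 * a2
        a2, a1 = a1, (1.0 if y[k] >= 0 else -1.0)
    return y

def eye_height(y, thresholds):
    """Raise the auxiliary threshold until its decisions differ from the main sampler."""
    main = y >= 0
    height = 0.0
    for t in thresholds:
        if not np.array_equal(y >= t, main):
            break
        height = t
    return height

tap_grid = np.arange(7) / 10          # candidate tap values 0.0 ... 0.6
thresholds = np.arange(101) / 100     # auxiliary thresholds 0.00 ... 1.00
best = max(itertools.product(tap_grid, tap_grid),
           key=lambda taps: eye_height(dfe_samples(x, *taps), thresholds))
```

Every one of the 49 tap combinations needs its own measurement pass, which illustrates why this method is slower than ZF or LMS adaptation and cannot run continuously.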
2.5. Clock and Data Recovery (CDR)
In many wireline communication systems, the clock signal is not transmitted with the
data signal in order to reduce the number of wires and, therefore, the cost of the channel.
In addition, the receiver usually has a plesiochronous clock source (i.e. similar in fre-
quency, but phase and frequency are not matched) with respect to the transmitter data.
Hence, the clock and data recovery (CDR) block’s job is to extract the transmitted clock
and binary data from the data signal in the presence of jitter and frequency offset. One
type of CDR generates a phase-tracking clock whose falling and rising edges align, re-
spectively, with the zero-crossings and centers of the data signal (shown in Figure 2.18).
Then, the CDR samples the data signal with the clock’s rising edge and outputs the
recovered data and phase-tracking clock to downstream digital blocks.
[Figure 2.18 sketch: eye diagram of the equalized data signal aligned with the recovered clock, CKRX]
Figure 2.18: A recovered clock sampling equalized data
Another type of CDR blindly samples the data signal with the plesiochronous clock
and post-processes the samples to extract the data bits and phase information. As de-
picted in Figure 2.19, we can classify CDRs into two broad categories where one operates
with a phase-tracking clock and the other with a blind clock. We can further classify
CDRs as having a feedback or feed-forward architecture. In Sections 2.5.1 to 2.5.3, we
will discuss three types of CDRs; burst-mode CDRs [2] are omitted because they are less
relevant to the proposed CDR. Chapter 3 proposes an ADC-based implementation of a
blind-sampling CDR with feedback.
[Figure 2.19 classification tree:
CDR Types
  Phase-Tracking Clock
    Feedback (Conventional)
    Feed-forward (Burst-mode)
  Blind Clock
    Feedback (Data interpolator)
    Feed-forward (Oversampling)]
Figure 2.19: CDR classification
2.5.1. Phase-Tracking CDR with Clock Feedback
Figure 2.20 shows a conventional phase-tracking CDR with clock feedback. The phase
detector (PD) compares the equalized data signal to the recovered clock, CKRX , to esti-
mate the phase difference between them. The PD output is an error signal that is ideally
proportional to the phase difference. The charge pump (CP) is a transconductor that
converts the error signal to a current. The loop filter is a proportional-integral controller
where the resistor, R1, produces a proportional voltage to the current and the capacitor,
C1, integrates the current. The second capacitor, C2, is used to smooth the pulses of
current from the CP and its value is much smaller compared to C1. The voltage from
the loop filter adjusts the frequency (and, indirectly, the phase) of the voltage-controlled
oscillator (VCO) that generates CKRX . When operating in steady state conditions, the
feedback loop forces the phase of CKRX to match that of the incoming data signal.
Although Figure 2.20 shows a CDR with a VCO block, it is also possible to generate
CKRX with a phase interpolator (PI). While a VCO’s frequency is proportional to its
input voltage, a PI’s phase is directly proportional to its input signal. Hence, a PI-based
CDR usually has an extra integrator in the loop filter to replace the integrator from the
VCO. PI-based CDRs can be used in multi-transceiver systems to reduce the number of
VCOs (e.g. to avoid coupling between VCOs). On the other hand, PIs are challenging
to implement in terms of linearity (i.e. phase output not exactly proportional to the
input signal) and noise (i.e. the PI has a lower output amplitude compared to VCO
output) [10].
[Figure 2.20 block diagram: the equalized data signal drives the PD & CP, followed by the loop filter (LF: R1, C1, C2) and the VCO, which generates CKRX; CKRX clocks a D flip-flop that outputs the recovered data (AK). PD: phase detector; CP: charge pump; VCO: voltage-controlled oscillator]
Figure 2.20: System diagram of phase-tracking CDR with clock in feedback loop
We can characterize a CDR’s performance by measuring its jitter tolerance, jitter
transfer, and jitter generation. Jitter tolerance measures the maximum amount of sinu-
soidal jitter between the data signal and CKRX from which the CDR can successfully
recover data given a required BER. Jitter transfer is the amount of jitter the CDR trans-
fers from the data signal to CKRX . Jitter generation is the amount of jitter in CKRX
caused by the CDR’s internal blocks (e.g. VCO). The most important measurement is
jitter tolerance because it directly relates input jitter to BER. A simplified example is
shown in Figure 2.21.
The jitter tolerance curve is separated into two parts by the CDR’s bandwidth. When
the frequency of the input jitter is low, the CDR can shift CKRX to track the center of
the data eye even if it deviates from the ideal location by more than 0.5UI. However,
when the jitter frequency is higher than the CDR bandwidth, the feedback cannot track
the data eye. At most, the data eye can move by 0.5UI in either direction (i.e. 1UIPP)
before a bit error occurs. In practice, the high-frequency jitter tolerance is usually lower than 1UIPP
because the CDR has to recover data in the presence of other components of jitter besides
sinusoidal jitter (e.g. data-dependent jitter, random jitter, etc.).
[Figure 2.21 plot: jitter tolerance (UIPP) versus jitter frequency (Hz), with the 1UIPP level and the CDR bandwidth marked]
Figure 2.21: Example of a jitter tolerance chart
The PD is an important component of the CDR because it provides the error signal
used to guide the feedback loop (shown in Figure 2.22). In the following subsections, we will
discuss three types of PDs: Alexander, Hogge, and Mueller-Muller.
[Figure 2.22 diagrams: (a) PD block with inputs ΦIN and ΦCK and output PDOUT; (b) linear model in which ΦERR, formed from ΦIN and ΦCK, is scaled by KPD to give PDOUT]
Figure 2.22: (a) PD inputs and output and (b) linear model
Alexander (Bang-Bang) Phase Detector
As depicted in Figures 2.23 and 2.24, the Alexander PD, also known as a bang-bang
PD, samples both the edges and centers of the data signal. When a transition occurs,
the PD compares the edge sample to the adjacent center samples to determine if the
clock is early or late with respect to the data signal. In order to capture both center and
edge samples, the Alexander PD must oversample at 2x the baud rate. Alexander PDs
are widely used because they are easily implemented with digital logic, but, as shown in
Figure 2.25, they are highly non-linear when jitter is absent from clock and data. When
jitter exists, the PD can be linearized [17], but its gain is jitter-dependent. This is also
undesirable since we usually cannot predict the jitter in advance.
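The early/late decision of the Alexander PD reduces to a small truth table on the three samples {D1, E, D2}, where E is the edge sample between the center samples D1 and D2. The Python function below is an illustrative sketch of that logic; its name and return convention are assumptions.

```python
def alexander_pd(d1, e, d2):
    """Return (early, late) flags from two center samples and the edge sample between them."""
    if d1 == d2:
        return (0, 0)      # no transition: no phase information
    if e == d1:
        return (1, 0)      # edge sample still equals the old bit: clock is early
    return (0, 1)          # edge sample already equals the new bit: clock is late
```

For example, the sample pattern 110 or 001 flags an early clock, 100 or 011 flags a late clock, and 000 or 111 produces no output. Because the output carries only the sign of the phase error, the PD has the bang-bang characteristic described above.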
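[Figure 2.23 schematic: DIN is sampled by two chains of D flip-flops clocked by CKRX to produce the center samples D1 and D2 and the edge sample E; PD logic generates the Early and Late outputs according to the table below]
{D1, E, D2}    Early  Late
110 or 001       1      0
100 or 011       0      1
000 or 111       0      0
Figure 2.23: Alexander PD implementation
[Figure 2.24 waveforms of CKRX and DIN: (a) D1 = 1, E = 1, D2 = 0: CKRX is early; (b) D1 = 1, E = 0, D2 = 0: CKRX is late]
Figure 2.24: Alexander PD examples with early and late CKRX
[Figure 2.25 plot: PDOUT = Avg(Late – Early) versus ΦERR = ΦIN - ΦCK, with ticks at ±UI/2]
Figure 2.25: Transfer function of Alexander PD with no jitter on data or CKRX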
Hogge Phase Detector
The Hogge PD is depicted in Figure 2.26. In contrast to the Alexander PD, its output
is linear and its gain is independent of jitter. As shown in Figure 2.27, the signal, B, is
a pulse with a constant width of 0.5UI. The other signal, A, measures the time from the
data transition to the rising edge of CKRX . When the rising edge samples the center of
the data eye (Figure 2.27b), the data transition occurs 0.5UI from the rising edge, the
pulses on A and B are equal, and the average PD output is zero. Otherwise, PDOUT is
positive or negative when CKRX is late or early, respectively.
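[Figure 2.26 schematic: DIN drives two D flip-flops, FF1 and FF2, clocked by CKRX, together with buffer BUF1; the signals A and B are combined into PDOUT]
Figure 2.26: Hogge PD implementation
[Figure 2.27 waveforms of CKRX, DIN, A, B, and PDOUT (+1/-1): (a) early, A < B, Avg(PDOUT) < 0; (b) on time, A = B, Avg(PDOUT) = 0; (c) late, A > B, Avg(PDOUT) > 0]
Figure 2.27: Hogge PD output with (a) early, (b) on-time, and (c) late CKRX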
Figure 2.28 shows the transfer function of an ideal Hogge PD with no offset. However,
the Hogge PD is more difficult to implement accurately compared to the Alexander PD.
In particular, the delay of BUF1 should match the clock-to-Q delay of FF1. A delay
mismatch adds a phase offset to the A signal and, in turn, causes PD offset [6].
[Figure 2.28 plot: PDOUT = Avg(Late – Early) versus ΦERR = ΦIN - ΦCK, with ticks at ±UI/2]
Figure 2.28: Transfer function of Hogge PD
Mueller-Muller Phase Detector
One way to reduce power consumption is to reduce the sampling rate. Both the Alexan-
der and Hogge PDs require a 2x oversampling rate. In contrast, Mueller-Muller PDs
(MMPDs) allow the CDR to operate at baud rate (1x) sampling [14, 21, 26] – the PD
calculates phase error from center samples only. The center samples contain mostly
amplitude information about the data signal and the edge samples, which the MMPD
ignores, contain mostly phase information. However, if the pulse response of the data signal
has ISI, then the MMPD can infer the phase information from the center samples and
the slope of the pulse response. Therefore, an MMPD requires ISI in order to function; it
will fail if given a data signal with a Nyquist pulse response (which has infinite slope on
its edges).
Each MMPD is defined by a MM function, F , which should be chosen based on the
pulse response of the channel. The MM function is also the transfer characteristic of the
MMPD. When placed in a CDR feedback loop, the feedback forces the MM function to
zero.
Figure 2.29 shows an example that Mueller and Muller presented in their 1976 paper [21].
The MM function demonstrated in [21] was F = h−1 − h1 (i.e. the difference
between the pre-cursor, h−1, and post-cursor, h1). Given the example pulse response
shape, when the samples h−1 and h1 shift to the left, h1 becomes greater than h−1 and F
is negative. Conversely, if the samples shift to the right, F becomes positive. When the
CDR locks, the feedback forces F to zero and h−1 and h1 are equal such that the main
cursor, h0, is near the optimal sampling position close to the peak of the pulse response.
[Figure 2.29 plots: (a) pulse response example versus time (0, T, 2T, 3T) with the samples h-1, h0, and h1 marked; (b) MM function F = h-1 - h1 versus sampling phase (UI)]
Figure 2.29: Example of (a) pulse response and (b) MM function [21]
Mueller and Muller also showed that we can estimate the points on the pulse response
(e.g. h−1, h0, h1, etc.) by correlating baud-rate samples of the data signal with the
recovered data. The results are listed in Equations 2.21 to 2.24. The derivation is omitted
because the analysis is very similar to that of Equation 2.12. We note that Equations 2.21
to 2.24 assume random, independent data with zero DC bias (E[Ak] = 0); therefore, the
MMPD requires these conditions on the input signal in order to function correctly.
\[
E[x_k A_{k-1}] = h_1 \tag{2.21}
\]
\[
E[x_k A_k] = E[x_{k-1} A_{k-1}] = h_0 \tag{2.22}
\]
\[
E[x_{k-1} A_k] = h_{-1} \tag{2.23}
\]
\[
h_{-1} - h_1 = E[x_{k-1} A_k - x_k A_{k-1}] \tag{2.24}
\]
According to Equation 2.24, we can implement the MMPD described in Figure 2.29
using the expression x_{k-1} A_k − x_k A_{k-1}. The loop filter that follows the MMPD estimates
the expected value by averaging the MMPD output.
From Figure 2.29, we can also observe a disadvantage of the MMPD – namely, its
transfer function is dependent on the shape of the channel pulse response. A sharp pulse
response will lead to a high PD gain, whereas a spread-out pulse response (resulting from
increased ISI) will reduce the PD gain.
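The correlations in Equations 2.21 to 2.24 can be reproduced with a short simulation. In this sketch the pulse response samples are assumed values, the recovered data is taken to be error-free, and no noise is added.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.choice([-1.0, 1.0], size=200_000)        # recovered data, assumed error-free

h_m1, h_0, h_1 = 0.15, 1.0, 0.35                 # assumed pulse response samples

# x_k = h_{-1} A_{k+1} + h_0 A_k + h_1 A_{k-1}, for k = 1 .. N-2
x = h_m1 * A[2:] + h_0 * A[1:-1] + h_1 * A[:-2]
A_k = A[1:-1]                                    # data aligned with x

h1_est = np.mean(x[1:] * A_k[:-1])               # E[x_k A_{k-1}]   (2.21)
h0_est = np.mean(x * A_k)                        # E[x_k A_k]       (2.22)
hm1_est = np.mean(x[:-1] * A_k[1:])              # E[x_{k-1} A_k]   (2.23)
mm_output = np.mean(x[:-1] * A_k[1:] - x[1:] * A_k[:-1])   # h_{-1} - h_1 (2.24)
```

With these assumed values the averaged MMPD output is close to -0.2, which a CDR loop would interpret as a phase error and correct by shifting the sampling phase until the pre-cursor and post-cursor samples are equal.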
2.5.2. Blind Feed-forward CDR
An example of a blind feed-forward CDR is described in [22,27], as shown in Figure 2.30.
The proposed design samples a 10.3Gbps data signal at 82.5GS/s (8x oversampling).
The edge detector locates the rising and falling data transitions by comparing adjacent
samples. As depicted in Figure 2.31, the data selector chooses the sample farthest away
from the edge (i.e. closest to the center of the UI).
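The edge detection and data selection steps can be sketched in a few lines of Python. This is a simplified model under our own assumptions (ideal binary samples, a fixed edge position, no jitter); it is not the logic of [22,27], and the function name is hypothetical.

```python
from collections import Counter

def recover_bits(samples, osr=8):
    """Pick, in each UI, the sample farthest from the most common edge position."""
    # An edge is detected wherever two adjacent samples differ.
    edges = Counter(
        (i + 1) % osr for i in range(len(samples) - 1) if samples[i] != samples[i + 1]
    )
    if not edges:
        return samples[osr // 2 :: osr]          # no transitions: any phase works
    edge_phase = edges.most_common(1)[0][0]      # sample index (mod osr) just after an edge
    center_phase = (edge_phase + osr // 2) % osr
    return samples[center_phase::osr]

# Bits oversampled 8x, with an arbitrary offset of 5 samples into the first bit.
tx_bits = [0, 1, 0, 1, 1, 0]
stream = [b for b in tx_bits for _ in range(8)][5:]
```

Calling recover_bits(stream) returns the bits that follow the partially captured first bit. A practical design would also have to track edge positions that drift with jitter and frequency offset.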
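[Figure 2.30 block diagram: DIN enters the samplers, followed by the edge detection + data selection logic, which outputs the recovered data; a PLL driven by CKREF feeds an 8-phase clock generator that clocks the samplers]
Figure 2.30: System diagram of an 8x oversampled blind feed-forward (burst-mode) CDR [22,27]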
[Figure 2.31 sketch: oversampled data with the detected edges and the UI center marked]
Figure 2.31: The edge detection and data selection process from Figure 2.30
An advantage of the feed-forward architecture is that the CDR blocks can be implemented
and simulated independently. In fact, the data selection logic in Figure 2.30
was implemented on a separate FPGA while the analog front end blocks were implemented
on a test chip. However, the 8x oversampling ratio required a large number of
samplers and a complicated clock distribution network, which resulted in the test chip’s
high power consumption of 5.8W. The oversampling ratio also limits the data rate. The
analog front end’s power consumption and increasing data rates motivate us to reduce
the oversampling ratio.
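[Figure 2.32 block diagram: a 5Gb/s input is sampled by a 5-bit ADC clocked by a 5GHz blind CK (with ÷4 and ÷2 dividers); the digital CDR contains an FFE, a PD (output ΦX), a low-pass filter (output ΦAVG), and a data decision block (output DOUT). Inset: the PD interpolates linearly between 2x blind samples a and b, spaced 0.5UI apart, to find the zero-crossing:]
\[
\phi_X = \frac{0.5\,a}{a - b}
\]
Figure 2.32: A blind 2x ADC-based CDR [32]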
Figure 2.32 shows an ADC-based implementation of a 2x blind feed-forward CDR [32], [36].
A 5Gb/s input is sampled by a 5-bit ADC and is passed to a feed-forward equalizer (FFE)
in the digital CDR. After the FFE, the blind samples are processed by the phase detector
(PD). If two adjacent blind samples are opposite in sign, a zero-crossing is detected which
corresponds to the edge sample in a phase-tracking system. This zero-crossing, denoted
by variable φX , is approximated by the linear interpolation shown in Figure 2.32. The
instantaneous value of φX is low-pass filtered into φAV G by the digital filter. The data
decision block adds 0.5UI to φAV G to find the center of the eye and compares it to φX to
recover the data. This system uses 2x sampling where the blind samples are 0.5UI apart.
However, if the oversampling ratio can be decreased, then the data rate can be increased
without increasing the frequency of the blind clock.
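The linear interpolation used by the PD in Figure 2.32 can be written as a one-line function. The sketch below is illustrative: the function name, the None return for same-sign samples, and the example values are assumptions.

```python
def zero_crossing_phase(a, b, spacing_ui=0.5):
    """Linearly interpolate the zero-crossing between adjacent blind samples a and b.

    Returns the crossing position in UI measured from sample a, or None when
    the samples are not opposite in sign (no transition detected).
    """
    if a * b >= 0:
        return None
    return spacing_ui * a / (a - b)
```

For example, zero_crossing_phase(0.25, -0.75) places the crossing 0.125UI after sample a, since the crossing lies closer to the sample with the smaller magnitude. The estimate is exact only if the waveform is a straight line between the two samples, which is why steep transitions degrade this PD.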
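[Figure 2.33 block diagram: a 6.875Gb/s input is sampled by a 5-bit ADC clocked by a 5GHz blind CK (with ÷4 and ÷2 dividers); the digital CDR contains a PD (output ΦX), a filter (output ΦAVG), a data decision block, and a data compactor (output DOUT). Inset: fractional sampling, 16 samples (S1 to S16) per 11 UI]
Figure 2.33: A blind 1.45x ADC-based CDR [33]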
A subsequent work [33], illustrated in Figure 2.33, reduces the oversampling ratio to
1.45x; the receiver takes 16 samples for every 11UI to achieve 6.875Gb/s. Its architecture
is similar to the one presented in [36], but now the samples are farther apart than 0.5UI
and the linear interpolation used in the PD to estimate zero-crossings is less accurate. To
solve this problem, the PD filters out some of the less accurate results based on sample
amplitude. With this architecture, 1.45x seems to provide a good compromise where
the oversampling ratio can be reduced without much loss in jitter tolerance. In order to
eliminate oversampling altogether, Chapter 3 proposes a different CDR architecture.
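As a worked check of the fractional sampling numbers, assume that the ADC still produces 10GS/s, as it does in the 2x design that samples a 5Gb/s input; this sampling rate is our inference rather than a figure quoted above. The symbols OSR, f_S, f_b, and T_S are introduced here for illustration.

```latex
\[
\mathrm{OSR} = \frac{16\ \text{samples}}{11\ \text{UI}} \approx 1.45, \qquad
f_b = \frac{f_S}{\mathrm{OSR}} = 10\,\text{GS/s} \times \frac{11}{16} = 6.875\,\text{Gb/s}, \qquad
T_S = \frac{11}{16}\,\text{UI} = 0.6875\,\text{UI}
\]
```

The sample spacing of 0.6875UI, compared with 0.5UI in the 2x design, is what makes the linear interpolation in the PD less accurate.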
2.5.3. Blind CDR with Feedback
Due to the linearity and noise drawbacks of PI-based CDRs, [10] proposed a 2x oversam-
pling, 32Gbps design based on a data interpolator (DI) instead of a PI. The DI samples
the data signal blindly and generates the center and edge samples by interpolating be-
tween the blind samples as shown in Figure 2.34. The DI is implemented in the analog
domain by storing the samples on capacitor arrays and interpolating through charge
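sharing.
[Figure 2.34 block diagram: DIN enters the data interpolator (a sampler followed by a switched-capacitor array), which outputs the recovered data; a PD and LF feed ΦAVG back to the data interpolator]
Figure 2.34: System diagram of blind CDR with feedback [10]
[Figure 2.35 sketch: blind samples and interpolated samples on the data waveform, with the data edge and data center positions marked]
Figure 2.35: Analog data interpolator (DI) estimates center and edge samples from blind samples [10]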
A disadvantage of a DI-based CDR is that the DI introduces interpolation error
when estimating the desired samples. In particular, the analog interpolator is a first-
order interpolator (see Figure 2.35). A digital DI can reduce the error by using a more
sophisticated interpolation algorithm. Chapter 3 proposes an ADC-based implementation
of a blind CDR with a digital DI.
2.6. Summary
This chapter discussed fundamental concepts about channels and receivers and reviewed
some previous work on adaptive equalizers and CDR blocks. This thesis builds upon the
background in this chapter by exploring a blind baud-rate CDR architecture in Chapter 3
and a zero-forcing adaptive DFE in Chapter 4.
3 A Blind Baud-Rate CDR
This chapter proposes a CDR that can recover data from blind baud-rate samples. Sec-
tion 3.1 discusses some concepts and challenges arising from blind baud-rate data re-
covery. Sections 3.2 and 3.3 present the receiver, CDR, and each of their components.
Section 3.4 shows the simulated and measured results.
3.1. Blind 1x Data Recovery Concepts
The PDs in the 2x [32,36] and 1.45x [33] blind CDRs (Figures 2.32 and 2.33, respectively)
interpolate between the blind samples in order to detect the phase of the zero crossings;
they require a finite slope in order to calculate phase. The interpolation cannot accurately
estimate phase when given a low-loss channel because the data transitions become too
abrupt. Unlike phase-tracking CDRs, blind ADC-based CDRs perform poorly with low-
loss channels. Since a blind ADC-based CDR should work with a range of channels, we
focus most of the analysis on low-loss channels. Section 3.4 shows how the proposed
CDR can be modified for a high-loss channel.
Figure 3.1 compares eye diagrams with different sampling rates given a low-loss chan-
nel. The worst-case sampling position occurs when adjacent samples are equally far from
the center of the eye. For 2x blind sampling, the worst case is where adjacent samples are
both 0.25UI from the edge, which leads to a high-frequency jitter tolerance of 0.5UIPP.
When the oversampling ratio is decreased to 1.45x, jitter tolerance decreases to 0.31UIPP.
At 1x, the samples may occur on the edges. If jitter shifts samples away from each other,
then the CDR will not capture the bit at all, which results in zero jitter tolerance. The
following paragraph uses the channel’s pulse response to elaborate on this issue and to
arrive at the proposed solution.
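The worst-case figures quoted above follow from the sample spacing alone. As a minimal sketch, assuming an ideal eye that is exactly 1UI wide and uniformly spaced blind samples (the function name is illustrative and not part of the design), the peak-to-peak margin is 1UI minus the sample spacing:

```python
def worst_case_hf_jitter_tolerance(oversampling_ratio):
    """Worst-case high-frequency jitter tolerance in UIpp for blind sampling.

    Assumes an ideal 1UI-wide eye. In the worst case, two adjacent samples
    are equally far from the eye center, so each sits (1 - spacing) / 2 from
    an edge and the peak-to-peak margin is 1 - spacing.
    """
    spacing_ui = 1.0 / oversampling_ratio
    return max(0.0, 1.0 - spacing_ui)


# 16 samples per 11UI is the 1.45x case reviewed in Chapter 2.
for ratio in (2.0, 16 / 11, 1.0):
    print(f"{ratio:.2f}x: {worst_case_hf_jitter_tolerance(ratio):.2f} UIpp")
```

Under these assumptions the sketch reproduces the 0.5UIPP, 0.31UIPP, and 0UIPP values discussed in this section.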
Figure 3.1: Worst-case for 2x, 1.45x and 1x sampling on open eye diagram. High-frequency jitter tolerance (HF JT): 0.5UIPP at 2x, 0.31UIPP at 1.45x, and 0UIPP at 1x.
Figure 3.2 shows the pulse response of an ideal channel. The best sampling position
occurs when the main cursor is at the center of the ideal pulse response. In a clocked
phase-tracking system, the sampling would remain at this position. However, with 1x
blind sampling, any frequency offset between the data and receiver clock will cause the
sampling phase to shift continuously across a 1UI window. When the sampling occurs
near the UI boundary, any high-frequency jitter may shift the sampling outside the 1UI
phase range, resulting in the loss of data bits (i.e. zero jitter tolerance).
In order to increase the jitter tolerance at baud-rate sampling, the pulse response
is extended beyond 1UI by introducing a controlled amount of ISI in the data using a
rectangular filter, which is implemented via an integrate-and-dump (I&D) circuit [28] in
the receiver front end. A rectangular filter is suitable in this case since its response has a
finite length of ISI and requires fewer equalization taps compared to the exponentially-
decaying response of an RC filter. A 1UI rectangular filter, convolved with the ideal
channel, spreads the pulse response to 2UI. If we have a perfect decision feedback equalizer
(DFE) to cancel all post-cursor ISI, then the eye would be open for a range of 1.5UI (this
would have been 2UI if we could cancel pre-cursor ISI). If the blind samples shift beyond
the 1UI window, there is still a remaining jitter margin of 0.5UIPP. A 2UI rectangular
filter increases this margin to 1UIPP and results in a symmetric eye opening with respect
to the blind sampling window. For these reasons, a 2UI I&D circuit was chosen for the
proposed design.
Figure 3.2: Comparison of theoretical worst-case jitter tolerance given the pulse responses of an ideal channel, 1UI I&D, and 2UI I&D. Blind baud-rate samples can shift across a 1UI range due to frequency offset. Panels plot each pulse response with blind baud-rate samples against time (0 to 3T) and the vertical eye opening with an ideal DFE (h0 − h−1) against sampling phase (UI), where h−1 is the pre-cursor and h0 is the main cursor. Jitter tolerance at the boundary of the 1UI blind range: 0UIpp (no margin) for the ideal channel without I&D, 0.5UIpp with the 1UI I&D, and 1UIpp with the 2UI I&D. Faded arrows and dots show possible sampling phases due to frequency offset.
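The 1.5UI and 2UI eye ranges, and the resulting 0.5UIPP and 1UIPP margins, can be checked numerically. The following is a rough sketch rather than the thesis model: it assumes an ideal 1UI rectangular pulse, a unit-area rectangular I&D window, and an ideal DFE that removes all post-cursor ISI, so the eye is open wherever the main cursor exceeds the summed pre-cursors. The helper names are hypothetical.

```python
def pulse(t, width):
    """Ideal 1UI pulse filtered by a unit-area rectangular window of `width` UI."""
    if width == 0:
        return 1.0 if 0.0 <= t < 1.0 else 0.0
    # Overlap between the 1UI pulse [0, 1] and the window [t - width, t].
    overlap = min(t, 1.0) - max(t - width, 0.0)
    return max(overlap, 0.0) / width


def open_phase_range(width, steps_per_ui=1000):
    """Width (UI) of sampling phases where h0 exceeds the total pre-cursor ISI.

    Post-cursor ISI is assumed to be cancelled by an ideal DFE.
    """
    open_points = 0
    for i in range(-steps_per_ui, 5 * steps_per_ui):
        t = i / steps_per_ui
        main = pulse(t, width)
        pre = sum(pulse(t - k, width) for k in range(1, 4))
        if main - pre > 1e-9:
            open_points += 1
    return open_points / steps_per_ui


for width in (0, 1, 2):
    margin = open_phase_range(width) - 1.0  # subtract the 1UI blind window
    print(f"{width}UI I&D: jitter margin ~ {margin:.2f} UIpp")
```

With these idealisations the open range comes out close to 1UI, 1.5UI, and 2UI for no filter, a 1UI window, and a 2UI window, respectively.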
3.2. Proposed 1x Blind Receiver Architecture
Figure 3.3 shows the system diagram of the receiver including an analog front end and
digital CDR. The analog front end consists of four interleaved I&D and ADC blocks, each
operating at 2.5GS/s. Figure 3.4 shows two possible implementations of a 2UI I&D. The
first implementation illustrated in Figure 3.4a is a fully analog 2UI I&D. We have chosen
the second implementation (Figure 3.4b) where the 2UI I&D consists of 2 components:
one piece is analog and the other digital. The I&D circuit integrates 1UI samples and
the ADC converts the samples into 5-bit digital values. An adder in the digital CDR
combines adjacent 5-bit 1UI I&D samples to synthesize 6-bit 2UI I&D samples. Since
the ADC resolution is limited to 5 bits, if we were to obtain 2UI I&D samples directly in
the analog domain and feed them to the ADC, we would have lost the additional 1 bit
of resolution.
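The digital half of the 2UI I&D can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and the −31 offset used to convert the sum to a signed integer is an assumption read from the system diagram rather than a stated implementation detail.

```python
def two_ui_samples(one_ui_codes, previous_code=0):
    """Add adjacent 5-bit 1UI I&D codes and re-centre the sum as a signed value.

    The -31 offset is an assumption: two codes in 0..31 sum to 0..62, so
    subtracting 31 gives a signed range of -31..31 (63 levels, 6 bits).
    """
    samples = []
    prev = previous_code
    for code in one_ui_codes:
        if not 0 <= code <= 31:
            raise ValueError("expected a 5-bit ADC code in 0..31")
        samples.append(code + prev - 31)
        prev = code
    return samples


print(two_ui_samples([31, 31, 0, 0, 31], previous_code=31))
# prints [31, 31, 0, -31, 0]
```

The summed samples span 63 levels, whereas a 5-bit ADC digitizing an analog 2UI I&D output would resolve only 32.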
Simulations showed that the system needed an ADC with a minimum ENOB of 4 bits;
this work uses a previously designed 5-bit ADC with a known ENOB of 4.2 bits [32].
The proposed design does not include ADC calibration; the addition of digital calibration
for gain, offset, and timing mismatches [19, 25, 35] would further improve the receiver
performance.
The samples in the digital CDR are processed by the data interpolator, which estimates
the samples at the center of the eye using the recovered phase, φAVG. The digital
data interpolator allows the use of a more sophisticated interpolation algorithm com-
pared to an analog interpolator. A Mueller-Muller PD and loop filter form a feedback
loop with the data interpolator. Loop latency is critical in this design since the digital
CDR operates on a 625MHz divided clock – each cycle in the loop adds significant delay.
The proposed implementation has a loop latency of 7 cycles. A 2-tap DFE recovers the
binary data, Ak, from the interpolated samples, xk.
Figure 3.3: System block diagram of interleaved analog front end (1UI I&D and ADC) and digital CDR. Labels: 10Gb/s data; 5GHz blind CKRX; clock generation (÷2 and ÷4, giving the 2.5GHz and 625MHz clocks); interleaved 1UI I&D and 5-bit ADC blocks; 16x5b samples into the digital CDR; conversion to signed integer; addition of 1UI I&D samples to form 2UI samples; data interpolator, MM PD, loop filter, and DFE; xK: interpolated samples; AK: resolved bits (17x1b); ΦAVG: average interpolation phase.
Figure 3.4: Comparison of (a) fully analog 2UI I&D and (b) analog and digital 2UI I&D. In (a), an analog 2UI I&D drives the ADC; in (b), an analog 1UI I&D drives the ADC and a digital adder with a z^-1 delay produces the 2UI I&D samples. Both are clocked by the blind CKRX.
The data interpolator compensates for frequency offset. As shown in Figure 3.5a, we
define negative frequency offset to mean the transmitter clock is slower than the blind
receiver clock. When this occurs, an interpolated sample is skipped each time the phase
completes a 1UI rotation. Similarly, Figure 3.5b shows a positive frequency offset where
the transmitter clock is faster than the receiver clock. A positive frequency offset would
result in cases where no blind sample exists between two desired samples; the interpolator
resolves these cases by interpolating twice between the closest two blind samples when
the decreasing φAVG rolls over from 0UI to 1UI. The range of frequency offset supported
by the loop filter is low enough that we can assume the extra interpolated sample is very
close to the blind sample at 1UI. Hence, the implemented interpolator directly uses the
blind sample as the extra interpolated sample.
Figure 3.5: Handling (a) negative frequency offset: data (TX) is slower than blind receiver clock (CKRX) (b) positive frequency offset: data (TX) is faster than blind receiver clock (CKRX). In (a), ΦAVG rolls over from 1UI to 0UI and an interpolation is skipped; in (b), ΦAVG rolls over from 0UI to 1UI and interpolation is done twice.
The data path in the digital CDR is sized for 17 parallel samples. Most of the time,
only 16 paths are active. If there is frequency offset and φAVG rolls over, then the number
of active paths is temporarily reduced to 15 or increased to 17 for one cycle.
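The 15/16/17 path behaviour can be illustrated with a simple phase accumulator. This is a behavioural sketch, not the RTL: it assumes the interpolation phase moves by approximately the frequency offset (in UI) per blind sample, starts at an arbitrary 0.5UI, and uses a hypothetical function name.

```python
def active_paths_per_cycle(offset_ppm, n_cycles, paths=16, phase0=0.5):
    """Count interpolated samples produced in each cycle of `paths` blind samples.

    Positive offset (TX faster than RX): phase decreases and rolls over from
    0UI to 1UI, so one interpolation is done twice (17 active paths).
    Negative offset (TX slower than RX): phase increases and rolls over from
    1UI to 0UI, so one interpolation is skipped (15 active paths).
    """
    step = -offset_ppm * 1e-6  # approximate phase change per blind sample (UI)
    phase = phase0
    counts = []
    for _ in range(n_cycles):
        active = 0
        for _ in range(paths):
            phase += step
            if phase >= 1.0:      # rollover 1UI -> 0UI: skip this interpolation
                phase -= 1.0
            elif phase < 0.0:     # rollover 0UI -> 1UI: interpolate twice
                phase += 1.0
                active += 2
            else:
                active += 1
        counts.append(active)
    return counts


counts = active_paths_per_cycle(offset_ppm=1000, n_cycles=100)
print(sorted(set(counts)), sum(counts))
```

At 1000ppm the phase drifts by 1UI roughly every 1000 blind samples, so only an occasional cycle deviates from 16 active paths.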
Figure 3.6: Implementation of integrate-and-dump (I&D) circuit [28]. Labels: inputs Vin+ and Vin−; outputs V0 to V3 on load capacitors CL; reset switches; clock signals SC0 to SC3 and SC0x to SC3x.
3.3. Receiver Implementation
3.3.1. Integrate-and-Dump Filter
The output from the channel drives the input of the I&D filter. The I&D circuit in Fig-
ure 3.6 introduces controlled ISI into the ADC input and also operates as a frequency-
scalable anti-aliasing filter [28]. The circuit consists of a single source-degenerated
transconductance stage that converts the input voltage to current and integrates the
signal on the input capacitance of the four interleaved ADCs, labelled as CL in Fig-
ure 3.6. Each interleaved I&D block operates in 3 phases: integrate, hold (during which
the ADC samples the value), and reset. The clock pulses (SC0, SC1, SC2, and SC3) reset
the outputs (V0, V1, V2, and V3) and redirect the current to each of the interleaved
ADCs. Each clock pulse is 1UI wide.
Figure 3.7: I&D operating phases synchronized with clock pulses. Each clock pulse (SC0 to SC3) is 1UI wide within a 4UI period; the operating phases are (1) integrate, (2) hold, and (3) reset.
3.3.2. Clock Generator
Figure 3.8: Implementation of clock pulse generator with adjustable delay for deskew. Signal path: 5GHz CKRX, CML toggle FF (÷2), CML-to-CMOS converters with adjustable delay for deskew, CMOS duty-cycle correction, and clock pulse generator producing SC0 to SC3.
Figure 3.8 shows the clock generator which drives the ADC and I&D. A CML toggle
flip-flop divides a 5GHz input clock into 4 phases, each at 2.5GHz. The outputs are then
converted into single-ended CMOS signals and buffered. The clock pulse generator [28]
uses logic gates to generate 1UI-wide pulses from the 4 clock phases.
Figure 3.9: (a) Effect of clock phase skew on the I&D integration period (b) Equal I&D integration periods after correcting clock skew by adjusting clock delays (clock pulses SC0 to SC3 shown)
Figure 3.9a shows an example of the clock pulses when skew exists between the 4
phases. First, we note that any skew could change the integration periods when the pulses
control the I&D operation. There would be gain mismatch between the 4 interleaved
I&D blocks. Second, when high-speed signals are sampled, the clock skew would appear
effectively as high-frequency periodic or duty cycle dependent (DCD) jitter. Both the
gain mismatch and high-frequency jitter will degrade the receiver’s jitter tolerance. This
sensitivity to clock skew is a disadvantage of using the I&D block.
As shown in Figure 3.9b, the clock skew can be compensated by adjusting the clock
phase through deskew circuits. In this design, the skews are manually adjusted by ob-
serving the ADC outputs (e.g. Figure 3.24). Figure 3.10 shows the deskew circuitry
implemented in each of the CML-to-CMOS converters as a 4-bit phase interpolator. The
differential clock signal connects to the In+ and In- inputs and a 20ps delayed clock
connects to In_del+ and In_del-. Combining them achieves ±10ps of deskew range on
each of the 4 clock phases driving the I&D.
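To give a feel for the numbers, the sketch below maps a 4-bit deskew code to a delay and converts residual skew into a gain error. It rests on two assumptions that are not taken from the design: the interpolator mixes the direct and 20ps-delayed clocks linearly with the code, and skew changes a nominal 1UI (100ps at 10Gb/s) integration window by the same amount. All names are hypothetical.

```python
UI_PS = 100.0          # 1UI at 10Gb/s
DELAY_SPAN_PS = 20.0   # delay between the direct and delayed clock inputs


def deskew_ps(code, bits=4):
    """Delay relative to mid-range for a phase-interpolator code (assumed linear)."""
    max_code = (1 << bits) - 1
    if not 0 <= code <= max_code:
        raise ValueError("code out of range")
    return DELAY_SPAN_PS * code / max_code - DELAY_SPAN_PS / 2


def gain_error_from_skew(skew_ps):
    """Fractional gain error if skew stretches a nominal 1UI integration window."""
    return skew_ps / UI_PS


print(deskew_ps(0), deskew_ps(15))   # end points of the deskew range
print(gain_error_from_skew(5.0))     # 5ps of skew on a 100ps window
```

Under the linear assumption, one code step is 20ps/15, or about 1.3ps, which corresponds to roughly 1.3% of the integration window.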
Figure 3.10: Adjustable clock delay block. Labels: inputs In+, In−, In_del+, and In_del−; output Out; control bits Del[3:0] and their complements selecting 1x, 2x, 4x, and 8x weighted branches; bias Vbias.
3.3.3. Data Interpolator
Desired sample ≈ b×(1−ΦAVG) + c×ΦAVG + 0.5×((b−a) + (c−d))×Y(ΦAVG)
Y(ΦAVG) = 0.5×ΦAVG when 0 ≤ ΦAVG < 0.5UI
Y(ΦAVG) = 0.5×(1−ΦAVG) when 0.5UI ≤ ΦAVG ≤ 1UI
Figure 3.11: Piecewise linear interpolation of desired sample from blind samples (a, b, c, and d are consecutive blind samples; the desired sample lies ΦAVG after b)
Given the ADC’s blind samples and the CDR’s recovered phase, φAVG, the data
interpolator estimates the value of the data at the centre of the eye (i.e. the desired
sample). Figure 3.11 shows 4 consecutive blind samples, a, b, c and d, that are separated
by 1UI. The desired sample is φAVG away from sample b. For simplicity, the expression
in Figure 3.11 assumes that φAVG is a floating point value between 0 and 1UI. In the
implementation, φAVG is represented by a 5-bit value.
The desired sample is estimated first by linearly interpolating between samples b
and c. This estimate has a large error because samples b and c are separated by 1UI.
To improve accuracy, extrapolation is performed using the slopes ((b − a)/1UI) and
((c − d)/1UI). The piecewise linear shape, Y(φAVG), in Figure 3.11 is scaled by the average of the
two slopes and superimposed on the linear interpolation. Hence, the accuracy of the
estimate is improved by using four instead of two blind samples.
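The four-sample estimate can be written out directly from the expression in Figure 3.11. The sketch below uses floating-point phase for clarity; the function names and the 1/32UI mapping of the 5-bit phase code are illustrative assumptions.

```python
def shape(phi):
    """Piecewise linear weight Y(phi): zero at both blind samples, 0.25 at mid-UI."""
    return 0.5 * phi if phi < 0.5 else 0.5 * (1.0 - phi)


def interpolate(a, b, c, d, phi):
    """Estimate the sample located phi (0..1 UI) after b, between b and c."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1] UI")
    linear = b * (1.0 - phi) + c * phi
    # Average of the two outer slopes, (b - a) and (c - d), scaled by Y(phi).
    correction = 0.5 * ((b - a) + (c - d)) * shape(phi)
    return linear + correction


def phase_from_code(code):
    """Assumed mapping of the 5-bit phase code to UI (1/32 UI per step)."""
    return code / 32.0


print(interpolate(0, 1, 1, 0, phase_from_code(16)))  # peak between b and c
print(interpolate(0, 1, 2, 3, 0.5))                  # straight line is preserved
```

For a peak lying between b and c, the correction lifts the estimate above the straight line joining them; for samples on a straight line the two slopes cancel and the correction vanishes.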
3.3.4. Mueller-Muller Phase Detector
Figure 3.12: (a) Pulse response of an ideal channel followed by 2UI I&D (b) Proposed MM function. Panel (a) shows the pulse response over time (0 to 3T) with h−1, h0, h1, and h2 marked; panel (b) plots F = h0 − h1 against sampling phase (UI) from −2 to 2, with levels B and −B marked.
In the proposed design, the 2UI I&D provides a wider pulse response such that the
conventional MM function in Figure 2.29 would not provide the optimal sampling phase.
If the receiver includes a DFE to cancel post-cursor ISI, the maximum vertical eye opening
occurs when the main cursor, h0, is at time T in Figure 3.12 because h0 is the maximum
value of the pulse response and h−1 is zero. Setting the pre-cursor tap to zero allows
us to fully benefit from the DFE and eliminates the need for an FFE. This sampling position
occurs when post-cursor ISI, h1, is equal to the main cursor, h0. To identify this desired
phase location, we choose the MM function to be F = h0 − h1 [14] and force it to zero
through the feedback loop. Since the actual sampling phase is blind, the desired phase
is forced on the interpolating phase, φAVG.
Pulse response estimates:
h−1 = h(−T+t) = E[xk−1Ak]
h0 = h(t) = E[xkAk] = E[xk−1Ak−1]
h1 = h(T+t) = E[xkAk−1]
h2 = h(2T+t) = E[xkAk−2]
Mueller-Muller function: F = (h0 − h1) = E[xk−1Ak−1 − xkAk−1] = E[(xk−1 − xk)Ak−1]
Mueller-Muller PD: MMPDout = (xk−1 − xk)Ak−1
Addition and sign operation are done speculatively while the DFE resolves Ak−1.
Figure 3.13: Design and implementation of the speculative Mueller-Muller phase detector (MMPD)
Chapter 2 showed that the pulse response can be estimated using the samples xk, and
the recovered data, Ak [21]. From Equations 2.22 and 2.21, h0 and h1 can be estimated
by the expected values, E[xkAk] and E[xkAk−1], respectively. We substitute the expected
values into the MM function to transform the MM function into the MMPD. The loop
filter in the next block performs the expected value operation by averaging the MMPD
output.
Note that the expressions for pulse response are not unique. For example, according
to Equation 2.22, h0 is also equal to E[xk−1Ak−1]. In the implementation illustrated in
Figure 3.13, we can therefore choose h0 = E[xk−1Ak−1] so that Ak−1 can be factored
out of the expressions for h0 and h1. The DFE has some latency before it recovers Ak−1;
factoring out Ak−1 allows the subtraction to be performed before Ak−1 becomes available.
Since Ak−1 takes on only two values, +1 and -1, it only affects the sign of the MMPD.
In the PD implementation, subtraction is performed first and speculation is used for the
sign of Ak−1. The DFE’s recovered data and the PD output are ready at the same time,
thereby reducing latency in the CDR feedback loop and improving loop stability.
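A short simulation can confirm that averaging the MMPD output estimates h0 − h1. This is a sketch under stated assumptions: uncorrelated random ±1 bits, a noiseless channel described by a few pulse-response taps, and transmitted bits standing in for the DFE decisions. The function name and tap values are hypothetical.

```python
import random


def simulate_mmpd(taps, n_bits=20000, seed=1):
    """Average the MMPD output for random bits sent through a pulse response.

    `taps` maps cursor index j to h_j, so x[k] = sum_j h_j * A[k - j].
    Transmitted bits stand in for the DFE decisions (an idealisation).
    """
    rng = random.Random(seed)
    bits = [rng.choice((-1, 1)) for _ in range(n_bits)]
    lo, hi = min(taps), max(taps)

    def sample(k):
        return sum(h * bits[k - j] for j, h in taps.items())

    total = 0.0
    count = 0
    for k in range(hi + 1, n_bits + lo):  # keep every bit index in range
        difference = sample(k - 1) - sample(k)  # formed before A[k-1] is known
        # A[k-1] only selects the sign of the pre-computed difference.
        total += difference if bits[k - 1] > 0 else -difference
        count += 1
    return total / count


print(simulate_mmpd({0: 0.5, 1: 0.5}))            # h0 = h1: average near zero
print(simulate_mmpd({-1: 0.1, 0: 0.5, 1: 0.4}))   # h0 - h1 = 0.1
```

The inner loop mirrors the speculative ordering described above: the subtraction does not depend on Ak−1, so it can be computed ahead of the decision.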
3.3.5. Decision-Feedback Equalizer
Figure 3.14: (a) A speculative 2-tap DFE and (b) the first stage of the parallel speculative DFE that recovers 8 bits per cycle. In (a), the DFE Sum block forms four candidates from xK and the DFE levels set by C1 and C2, one for each value (00, 01, 10, 11) of AK−2AK−1, and a mux selects AK; in (b), a DFE Sum x8 block chains eight DFE Sum blocks to resolve AK to AK+7 from xK to xK+7.
The DFE compensates for post-cursor ISI from the channel and the I&D filter. As
can be seen from the pulse response in Figure 3.12, recovering data from an ideal channel
and 2UI I&D filter would require one DFE tap to equalize post-cursor h1, while a more
attenuative channel may require more taps. Three pipeline stages, operating at 625MHz,
resolve 16 bits in parallel – actually 15 to 17 bits to handle cases of frequency offset as
discussed in Section 3.2. DFE adaptation was not included in this design.
To recover 16 bits per clock cycle, 16 parallel DFE sum blocks are required. Spec-
ulation is used extensively to reduce latency in the CDR feedback loop. In each DFE
summation block shown in Figure 3.14a, the 2 DFE taps, C1 and C2, are manually set
and speculation is performed by subtracting the 4 possible levels from the interpolated
sample, xk. When the previous two bits Ak−1 and Ak−2 have been recovered, the mux
selects the correct Ak.
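The equivalence between a conventional 2-tap DFE and the speculative form can be sketched as follows. This is a behavioural illustration, not the implemented logic: decisions are ±1, the slicer threshold is zero, and the function names and start-up history are assumptions.

```python
def slicer(value):
    return 1 if value >= 0 else -1


def dfe_direct(samples, c1, c2, a1=1, a2=1):
    """Reference 2-tap DFE: subtract ISI of the two previous decisions, then slice."""
    bits = []
    for x in samples:
        a = slicer(x - c1 * a1 - c2 * a2)
        bits.append(a)
        a1, a2 = a, a1
    return bits


def dfe_speculative(samples, c1, c2, a1=1, a2=1):
    """Pre-compute all four candidate decisions, then select with a mux."""
    # Stage 1: no dependence on earlier decisions, so it is off the critical path.
    candidates = [
        {(p1, p2): slicer(x - c1 * p1 - c2 * p2)
         for p1 in (-1, 1) for p2 in (-1, 1)}
        for x in samples
    ]
    # Stage 2: one mux per bit, selected by the two previous decisions.
    bits = []
    for options in candidates:
        a = options[(a1, a2)]
        bits.append(a)
        a1, a2 = a, a1
    return bits


tx = [1, -1, -1, 1, 1, -1, 1, -1]
h0, h1, h2 = 1.0, 1.0, 0.2            # hypothetical pulse response with h1 = h0
history = [1, 1] + tx                 # assumed start-up bits A[-2] = A[-1] = +1
rx = [h0 * history[k + 2] + h1 * history[k + 1] + h2 * history[k]
      for k in range(len(tx))]
print(dfe_speculative(rx, c1=h1, c2=h2) == tx)
```

Only the mux selection depends on earlier decisions, which is why the adder drops out of the critical path.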
This speculation removes the adder from the critical path. The muxes remain on the
critical path, however, since data must propagate through 16 muxes in order to resolve
all 16 bits, and at 625MHz the data can only propagate through 8 muxes per cycle.
Figure 3.14b shows 8 DFE summation blocks that resolve 8 bits in one clock cycle. For
this reason, another stage of speculation was created.
The next stage speculates on the Ak−1 and Ak−2 inputs to the DFE Sum x8 blocks.
As shown in Figure 3.15, Ak−1 and Ak−2 drive the first 4 parallel DFE Sum x8 blocks in
a speculative structure which resolves bits Ak to Ak+7. The last two bits Ak+6 and Ak+7
of this first stage then drive a second set of 4 DFE Sum x8 blocks which resolve bits Ak+8
to Ak+15. In the end, the complete DFE has a latency of 3 cycles.
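The second level of speculation can be sketched in the same behavioural style. Each 8-sample block is resolved for all four possible incoming bit pairs, and the correct result is selected once the preceding two bits are known. The function names are hypothetical, and the sketch models only the selection logic, not the pipeline timing.

```python
def resolve_block(samples, c1, c2, a1, a2):
    """Serial 2-tap DFE over one block, given the two bits preceding the block."""
    bits = []
    for x in samples:
        a = 1 if x - c1 * a1 - c2 * a2 >= 0 else -1
        bits.append(a)
        a1, a2 = a, a1
    return bits


def dfe_two_stage(samples, c1, c2, a1=1, a2=1, block=8):
    """Resolve all samples using per-block speculation on the incoming history."""
    histories = [(p1, p2) for p1 in (-1, 1) for p2 in (-1, 1)]
    bits = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        # In hardware these four evaluations run in parallel for every block.
        speculated = {h: resolve_block(chunk, c1, c2, *h) for h in histories}
        chosen = speculated[(a1, a2)]  # select once the real history is known
        bits.extend(chosen)
        recent = [a2, a1] + chosen
        a2, a1 = recent[-2], recent[-1]
    return bits
```

With a block size of 8, each block needs at most 8 mux delays, and the last two bits of one block select the result of the next.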
Figure 3.15: The second stage of parallel speculative DFE that recovers 16 bits per cycle. DFE Sum x8 blocks are evaluated for each speculated input pair (00, 01, 10, 11): the first set resolves AK to AK+7 from xK to xK+7 using AK−2AK−1, and the second set resolves AK+8 to AK+15 from xK+8 to xK+15 using AK+6AK+7.
Figure 3.16: Loop filter with configurable proportional and integral gains. Labels: 16x11b input from the PD summed with KSUM=16; proportional gain KP={0.25, 0.5, 0.75, 1}; integral gain KI={0, 0.25, 0.5, 0.75, 1}; saturating counter; ÷256; cyclic counter with KCYC=1/2048; up/down signal; phase counter with KPC=1/32; 5b ΦAVG output.
3.3.6. Loop Filter
The loop filter is a conventional proportional-integral controller as shown in Figure 3.16.
The parallel PD outputs are summed together and the result is scaled by configurable pro-
portional and integral gains. The saturating counter is sized to handle up to ±1900ppm
of frequency offset. At the output, the 5-bit phase counter produces the recovered CDR
phase as discrete φAVG values ranging from 0 to 31 which are fed back to the data
interpolator block, closing the CDR feedback loop.
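A behavioural sketch of such a loop filter is given below. Only the overall structure (summed PD outputs, proportional and integral paths, a saturating integral counter, and a wrapping 5-bit phase counter) follows the description above; the exact wiring, the saturation limit, and the accumulator modulus are assumptions chosen for illustration, and the class name is hypothetical.

```python
PHASE_STEPS = 32  # 5-bit phase counter, 1/32 UI per step


class LoopFilterSketch:
    """Behavioural proportional-integral loop filter (illustrative wiring only)."""

    def __init__(self, kp=0.5, ki=0.25, integral_limit=1024, modulus=2048):
        self.kp, self.ki = kp, ki
        self.integral_limit = integral_limit  # assumed saturation level
        self.modulus = modulus                # assumed accumulator modulus
        self.integral = 0.0  # saturating counter: tracks frequency offset
        self.cyclic = 0.0    # accumulates until it overflows into an up/down step
        self.phase = 0       # recovered phase code, 0..31

    def update(self, pd_outputs):
        error = sum(pd_outputs)  # sum of the parallel PD outputs
        self.integral += self.ki * error
        self.integral = max(-self.integral_limit,
                            min(self.integral_limit, self.integral))
        self.cyclic += self.kp * error + self.integral
        up_down = 0
        if self.cyclic >= self.modulus:
            self.cyclic -= self.modulus
            up_down = 1
        elif self.cyclic <= -self.modulus:
            self.cyclic += self.modulus
            up_down = -1
        self.phase = (self.phase + up_down) % PHASE_STEPS  # wraps 31 -> 0
        return self.phase, up_down
```

With a constant PD input, the integral path saturates and the phase code advances at a steady rate, wrapping around the 0 to 31 range, which is how a constant frequency offset is tracked.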
3.4. Simulation and Measurement Results
This section shows, through simulation, that the feedback loop converges correctly, how
the system can be modified for a more attenuative channel, and simulated jitter tolerance
results. Next, the measured eye diagrams and measured jitter tolerance of the proposed
CDR are presented.
Figure 3.17 illustrates the loop dynamics by showing the transient signals in the loop
filter. When the system in Figure 3.3 starts up, it appears that the MMPD relies on
correctly recovered data to estimate phase and, at the same time, the DFE requires a
correct phase to recover the data. To verify that the feedback loop does not enter into
a deadlock, we have applied an input with 1000ppm of frequency offset so as to start
the loop with both phase and data errors. The proportional gain and saturating counter
outputs are, respectively, the outputs of the proportional and integral paths in the loop
filter. The cycle-slipping causes the saturating counter to temporarily decrease, but the
saturating counter settles to a value corresponding to 1000ppm within 4µs. The up/down
signal increments or decrements φAVG. In steady state, φAVG increases from 0 to 31 and
wraps around in order to track the frequency offset. After 3µs, φAVG is close enough to
the center of the eye to recover the data correctly (i.e. no more bit errors).
Figure 3.17: Simulated loop filter convergence with 1000ppm of frequency offset for PRBS-7. Signals correspond to nodes on the block diagram of Fig. 3.16. Traces plotted against time (0 to 4µs): proportional gain output, saturating counter output, up/down signal, phase output (φAVG), and error count.
Figure 3.17 illustrates the transient signals in the loop filter (Figure 3.16). The
simulation demonstrates the digital CDR locking to the received signal from Channel A
+ 2UI I&D and with 1000ppm of frequency offset. There is cycle slipping; however, the
proportional and integral paths settle to their steady-state values in approximately 4µs.
Similarly, the bit errors stop occurring after 3µs.
Figure 3.18: Frequency response of channel models in simulation. Curves: Channel A, Channel B, and Channel A + 2UI I&D.
Figure 3.19: Simulated eye diagrams using Channel A + 2UI I&D. Signal chain: 10GHz PRBS-7 generator (TX RJ = 0.17UIpp), Channel A, 1UI I&D, ADC clocked by CKRX (RX RJ = 0.23UIpp), 1 + z^-1 filter, data interpolator (output xK), and 2-tap DFE (output AK), with the MM PD and loop filter feeding back ΦAVG. Eye diagrams are plotted over a 0UI to 1UI phase range.
As discussed in Section 3.1, the receiver relies on ISI to spread the pulse response
beyond 1UI. We demonstrate through simulation that the 1x blind CDR can work in
2 cases. In cases where the channel attenuation is low (i.e. there is not enough ISI
produced by the channel), the system relies on the 2UI I&D to produce the ISI. This
situation is demonstrated in Figure 3.18 which shows the combined frequency response of
a low-attenuation Channel A followed by its associated 2UI I&D filter. In contrast, where
the channel is attenuative by itself (i.e. there is enough ISI produced by the channel),
the 2UI I&D is no longer needed to produce extra ISI. This situation is demonstrated
by Channel B in Figure 3.18. Simulations show that the 1x blind CDR works in both
of these cases. If the CDR will be used in applications with a wide variety of channels,
then, ideally, the front-end filter should be adaptive such that it increases the amount of
post-cursor ISI when the channel has less high-frequency loss. However, an adaptive filter
is beyond the scope of this work. The test chip, which is described later, demonstrates
only the first case (i.e. low-attenuation channel with 2UI I&D).
Figure 3.20: Simulated eye diagrams using Channel B. Signal chain: 10GHz PRBS-7 generator (TX RJ = 0.17UIpp), Channel B, 5-bit ADC clocked by CKRX (RX RJ = 0.23UIpp), data interpolator (output xK), and 20-tap DFE (output AK), with the MM PD and loop filter feeding back ΦAVG.
Figures 3.19 and 3.20 show the eye diagrams from simulations done in Simulink using
event-driven models [34]. The data source is 10Gb/s and has 0.17UIPP of random jitter.
Similarly, the blind receiver clock is simulated with 0.23UIPP of random jitter. The two
leftmost eye diagrams in Figure 3.19 show the data eye after Channel A and I&D. The
5-bit ADC quantizes the samples into discrete values from 0 to 31. The eyes are still
open because the analog 1UI I&D does not add much attenuation. The 1 + z^-1 filter
adds further ISI and closes the eye. In order to obtain the eye diagrams in the digital
CDR, we break the feedback loop and set φAVG to 0.5UI. This forces the desired sample
halfway between the blind samples and the data interpolator produces the worst-case
interpolation error in this condition. The open eye after the DFE adder shows that the
data can be successfully recovered.
Figure 3.20 demonstrates that the system can recover the data with Channel B without
the I&D filter; however, it requires a 20-tap DFE. This large number of taps is
necessary for Channel B because it introduces a long tail of ISI. This is not the case for
Channel A with the 2UI I&D because it produces far less ISI.
Figure 3.21: Simulated jitter tolerance results at 10Gb/s with a BER of 10^-6. Axes: jitter tolerance (UIpp, 0.1 to 10) versus jitter frequency (100kHz to 1GHz). Curves: Channel A (1.5" FR4) + 2UI I&D and Channel B (16" FR4, no I&D).
Figure 3.21 compares the simulated jitter tolerance for each of the two channels. The
simulation assumes a bit error rate (BER) of 10^-6. The high-frequency jitter tolerance of
the system in Figure 3.20 (Channel B) is slightly below that of the system in Figure 3.19
(Channel A + 2UI I&D). We also note that the former has a lower CDR bandwidth
compared to the latter, which is caused by a lower PD gain. Compared to Channel A,
Channel B further spreads out the pulse response, which reduces the PD gain (i.e. the
slope of the MM function).
Process: 65nm CMOS
Data Rate: 10Gb/s
Supply: 1.2V
ADC+Demux Power: 109mW
CDR Power: 112mW
Clock Gen. Power: 83mW
I&D Power: 1.7mW
Total Power: 306mW
Figure 3.22: Chip photo. Labelled blocks: digital CDR, 5-bit ADC, 4:16 demux, I&D, and clock generator. Annotated block areas: 420x645μm2, 60x490μm2, 400x490μm2, 85x145μm2, and 150x260μm2.
The proposed receiver was implemented in Fujitsu’s 65nm CMOS process. Figure 3.22
is a photo of the test chip. The I&D, clock generator, and ADC are custom-designed analog
blocks. The digital CDR was designed using Verilog RTL and implemented with standard
cell gates.
Figure 3.23 shows a simplified diagram of the measurement setup. The data source
is a PRBS-7 generator. A logic analyzer captures and stores digital waveforms from the
test chip (i.e. design-under-test or DUT). For jitter tolerance measurements, sinusoidal
jitter was applied to the transmitter clock.
Figure 3.24 shows the average ADC output when the I&D is given a DC input. On
one test chip, we observed that one of the interleaved front end blocks had a lower gain
compared to the other blocks as we varied the DC input. As discussed in Section 3.3,
the gain error is mostly caused by systematic clock skew. If left uncompensated, the
skew will reduce the CDR’s jitter tolerance. Hence, the delays in the clock generator were
manually adjusted. Figure 3.24b shows that the gain at the output of ADC 3
matches the gain of the other interleaved blocks more closely after skew correction.
Figure 3.23: Measurement setup. A PRBS-7 generator, clocked by a 10GHz CK with sinusoidal jitter, drives the test channel and the DUT (I&D, ADC, CDR, and PRBS-7 comparator); the DUT uses a 5GHz CKRX, and a logic analyzer captures its outputs.
Figure 3.24: Average ADC output given DC input (a) before and (b) after skew correction. Axes: average ADC output code (0 to 40) versus DC input voltage (−400 to 400 mVpp differential); curves: ADC 0 to ADC 3.
The measurements were performed with a 48” SMA cable as the channel – its frequency
response is plotted in Figure 3.25. Figure 3.26a shows the data eye at the output
of the channel. Figure 3.26b shows the eye diagrams taken from the outputs of the
interleaved ADCs. The eye has been partially attenuated by the analog 1UI I&D. There is some
mismatch between the 4 interleaved analog front ends, but the digital CDR is able to
tolerate this as demonstrated in the jitter tolerance measurement.
Figure 3.25: Measured channel frequency response
Figure 3.26: Measured eye diagrams (a) after the channel and (b) after the ADC. Panel (a), the channel eye diagram, is annotated with 916mVPP and 93.1ps; panel (b), the channel + 1UI I&D + ADC eye diagram, plots the 1UI I&D digital output (0 to 30) against sampling phase (0 to 1.0UI).
Figure 3.27: Simulated and measured jitter tolerance results with 10Gb/s PRBS-7 input data and BER of 10^-6 and 10^-12, respectively. Axes: jitter tolerance (UIpp, 0.01 to 10) versus jitter frequency (100kHz to 100MHz). Curves: simulation (BER=1e-6); measurements at −300ppm (TX slower than RX), 0ppm, 300ppm, and 1000ppm (TX faster than RX); XLAUI mask.
The jitter tolerance was measured after skew correction and with a maximum BER of
10^-12 at 10Gb/s. In Figure 3.27, we show the results given -300, 0, 300, and 1000 ppm of
frequency offset. A negative frequency offset means that the transmitter is slower than the
blind receiver clock (i.e. above baud-rate sampling). A positive frequency offset means
that the transmitter is faster than the blind receiver clock – this case is worse for jitter
tolerance since we are actually sampling slightly below baud-rate. During measurement,
we were able to push the frequency offset to 1000ppm with a slight degradation in jitter
tolerance.
In addition, the CDR model was simulated with the channel frequency response (as
in Figure 3.25) and 300ppm of frequency offset. Due to simulation time constraints,
the simulation assumes a maximum BER of 10^-6. For this reason, the simulated jitter
tolerance is higher compared to the measured results. The jitter tolerance mask for XL-
Attachment-Unit-Interface (XLAUI) is also shown in Figure 3.27. Although the proposed
design did not specifically target Ethernet applications, the mask
is provided as a reference.
3.5. Summary
This chapter presents a 1x blind ADC-based CDR. The proposed architecture recovers
data by extending the channel pulse response so that the pulse amplitude is greater than
zero, no matter where the blind samples occur within a 1UI window. The receiver adds
controlled ISI to the pulse response through the use of an I&D block in the receiver
front end. The baud-rate design allows the CDR to operate at 10Gb/s given a 10GS/s
sampling rate.
The proposed design was fabricated in a 65nm CMOS process. The test chip successfully
recovers 10Gb/s data with BER below 10^-12. Jitter tolerance measurements show
that the CDR implementation can recover data with below-baud-rate sampling – the
CDR operates with ±300ppm of frequency offset and a high-frequency jitter tolerance of
0.19UIPP.
4 A Zero-Forcing Adaptive DFE for an ADC-Based CDR
This chapter proposes a novel zero-forcing adaptive controller for a DFE in a digital
ADC-based CDR. Section 4.1 provides the concepts of the proposed adaptive controller.
Sections 4.2 and 4.3 describe the architecture and implementation details of the receiver,
respectively. Section 4.4 presents simulation results from Simulink models. At the time of
writing this thesis, the Simulink models and Verilog implementation have been completed.
However, the measurement results are left as future work.
4.1. Proposed DFE Adaptation
Sections 2.5.1 and 3.3.4 showed how samples on a pulse response can be calculated by
correlating samples of random data with recovered bits. The example pulse response
from Figure 2.29 is reproduced in Figure 4.1 for convenience. The MMPD described
in Section 3.3.4 uses this information to estimate phase error by subtracting two pulse
response samples (h0-h1). The MMPD output is processed by a loop filter and fed back
to the data interpolator to form the phase-tracking loop. This chapter shows that it is
possible to use a similar feedback loop to adapt the DFE coefficients.
Figure 4.2 illustrates a controller that adapts "n" DFE coefficients. The data sample,
xk, is correlated with recovered bits, Ak−1 to Ak−n, to estimate pulse response samples.
The low-pass filters provide average values of the pulse samples, which are used as DFE
coefficients, c1 to cn. The n-tap DFE subtracts post-cursor ISI from the current sample,
Figure 4.1: ISI can be calculated by correlating sampled data (xk, xk−1, etc.) with recovered bits (Ak, Ak−1, etc.): h−1 = E[xk−1Ak], h0 = E[xkAk], h1 = E[xkAk−1], h2 = E[xkAk−2]
xk, and the decision block slices the DFE output to recover the binary data, Ak. The
bandwidth of the LPF is the main design parameter. It should be low enough to filter out
transient noise from the correlation terms and, at the same time, high enough to allow
the LPF to settle to the steady state values in reasonable time during receiver start-up.
Figure 4.2: Zero-forcing controller for n-tap DFE adaptation. A shift register holds Ak−1 to Ak−n; the products xkAk−1, xkAk−2, ..., xkAk−n are low-pass filtered to produce c1, c2, ..., cn for the n-tap DFE.
This zero-forcing adaptive DFE architecture has two main advantages: scalability
and ease of design. The blocks in Figure 4.2 are easily scaled when n is increased. The
controller is also simpler compared to the ZF implementations in [30] and [13] since it
does not generate an error signal by subtracting the signals before and after the decision
block. Unlike the LMS adaptation in [1], the proposed architecture does not require a
reference (i.e. desired) signal and the feedback loop does not require a configurable gain
parameter.
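The adaptation loop of Figure 4.2 can be illustrated with a minimal behavioural sketch in Python. It is not the Simulink or Verilog model of this work: the pulse response values, the first-order averaging filter standing in for the counter-based LPF of Section 4.3.2, the averaging constant mu, and the function name simulate_zf_dfe are all assumptions chosen for illustration.

```python
import random


def simulate_zf_dfe(pulse, n_bits=20000, mu=1.0 / 1024, seed=1):
    """Adapt a 2-tap DFE by correlating x_k with the recovered bits.

    pulse maps a tap index to its value (-1: pre-cursor, 0: main, 1, 2: post-cursors).
    mu is the averaging constant of a first-order filter (an assumed stand-in
    for the counter-based LPF described in the thesis).
    """
    rng = random.Random(seed)
    bits = [rng.choice((-1, 1)) for _ in range(n_bits + 1)]
    coeffs = [0.0, 0.0]   # c1, c2
    recovered = [1, 1]    # shift register holding A[k-1], A[k-2]
    for k in range(2, n_bits):
        # Unequalized data sample x_k: sum of h_j * a_(k-j).
        x_k = sum(h * bits[k - j] for j, h in pulse.items())
        # DFE subtracts the estimated post-cursor ISI, then the decision block slices.
        y_k = x_k - sum(c * a for c, a in zip(coeffs, recovered))
        a_k = 1 if y_k >= 0 else -1
        # Zero-forcing update: low-pass filter the correlation terms x_k * A[k-i].
        for i in range(2):
            coeffs[i] += mu * (x_k * recovered[i] - coeffs[i])
        recovered = [a_k, recovered[0]]
    return coeffs


# Hypothetical pulse response, not taken from Figure 4.11.
example_pulse = {-1: 0.05, 0: 1.0, 1: 0.20, 2: 0.08}
c1, c2 = simulate_zf_dfe(example_pulse)
print(f"c1 = {c1:.3f} (h1 = 0.20), c2 = {c2:.3f} (h2 = 0.08)")
```

In this sketch the coefficients settle close to h1 and h2 without an error signal or a reference signal, which is the property claimed for the controller above. The constant mu plays the role of the LPF bandwidth: a larger value settles faster but leaves more ripple on the coefficients.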
4.2. Proposed Blind ADC-Based Receiver Architecture
Figure 4.3 shows the system diagram of the proposed blind receiver. The main com-
ponents are a 20GS/s, 3-bit ADC and a digital CDR with adaptive DFE. The ADC
oversamples the 10Gbps data signal by 2x. Compared to the 1x receiver from Chap-
ter 3, the oversampling reduces the anti-aliasing requirement from the analog front end,
increases the accuracy of the data interpolator in the digital CDR, and removes the need
to extend the pulse response through additional ISI. Hence, the oversampling allows us to
remove the 2UI I&D block from the receiver. The removal of the I&D block simplifies the
clock distribution, reduces the power consumed by the clock divider and pulse generator,
and removes the gain errors resulting from skew between interleaved clocks.
Figure 4.3: System diagram of proposed receiver with 3-bit ADC-based CDR and adaptive DFE. The 10Gbps data passes through the channel to a 20GS/s 3-bit ADC clocked by the blind CKRX; the digital blocks (baud-rate CDR with adaptive DFE) output the recovered data.
Although the front end sampling rate is doubled, the overall ADC area and power
consumption are reduced by decreasing the number of bits from 5 to 3. If we assume
a simple flash ADC architecture, a 5-bit ADC sampling at baud-rate would require
31 comparisons per UI. In contrast, a 3-bit ADC sampling at 2x would only need 14
comparisons per UI.
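The comparison counts above follow from the flash ADC assumption, in which an N-bit conversion uses 2^N - 1 comparators per sample. A short Python check is given below; the function name is illustrative only.

```python
def flash_comparisons_per_ui(bits, samples_per_ui):
    """Comparisons per UI for a simple flash ADC: (2^bits - 1) per sample."""
    return (2 ** bits - 1) * samples_per_ui


baud_rate_5b = flash_comparisons_per_ui(bits=5, samples_per_ui=1)
oversampled_3b = flash_comparisons_per_ui(bits=3, samples_per_ui=2)
print(baud_rate_5b, oversampled_3b)
```

The 3-bit, 2x case needs fewer than half the comparisons of the 5-bit, 1x case, which is the basis for the area and power argument.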
The architecture of the baud-rate digital CDR, however, is mostly the same as the one
proposed in Chapter 3 (Figure 3.3). Hence, this chapter focuses only on DFE adaptation
and a few CDR blocks that were modified. The following paragraph explains how the 2x
ADC is interfaced with the 1x CDR.
The data interpolator at the input of the CDR creates baud-rate samples from the
2x samples. As shown in Figure 4.4, each pair of blind samples (0.5UI apart) is used
to calculate a desired sample in between them. The φAVG quantity tracks the center
of the data eye relative to the blind samples; edge samples are not computed. When
compared to a 2x digital CDR (e.g. [24]), the baud-rate architecture reduces CDR power
consumption because no multipliers and adders are used to interpolate and equalize edge
samples.
Figure 4.4: Data interpolator calculates sample at desired location from closest blind samples. (a) Negative or (b) positive frequency offsets result in occasional skipped or extra interpolated samples. One pair of 2x blind samples is used when 0UI ≤ ΦAVG ≤ 0.5UI and the other when 0.5UI ≤ ΦAVG ≤ 1.0UI.
A negative or positive frequency offset will result in the data interpolator skipping an
interpolation or inserting an extra interpolation in a similar way to the one described in
Section 3.2.
4.3. Proposed Digital CDR with Adaptive 2-tap DFE
Figure 4.5 shows the digital CDR and adaptive DFE. The 3-bit ADC data is demuxed
to 32 parallel samples at 625MHz. The CDR converts the samples to signed integers
before the input to the data interpolator. The phase tracking loop is the same as the one
described in Chapter 3, with two main differences: a different MMPD and a configurable
phase offset coefficient, P.
Figure 4.5: Proposed digital CDR with adaptive DFE. Block gains: KADC=8, KINT=16, KDIV=0.25, KSUM=16. The MMPD provides (xk−2 − xk)Ak−1 to the phase-tracking loop, where the phase offset P is added, and provides the correlation terms xkAk−1, xkAk−2, and xk−2Ak−m−2 to the low-pass filters that produce c1, c2, and cm.
Figure 4.5 also identifies the gains of the ADC, data interpolator, divider, and sum
blocks as KADC, KINT, KDIV, and KSUM. The ADC has a gain of 8 because it has a
resolution of 3 bits. The sum block adds together 16 parallel MMPD outputs and, therefore,
has a gain of 16. The interpolator gain is discussed in Section 4.3.1. Accordingly,
the MM function is:
$$F = h_{-1} - h_1 + \frac{P}{K_{DIV} K_{SUM} K_{AVG} K_{INT}} \tag{4.1}$$
When the CDR has locked to its steady state, we have the relation:
$$h_1 = h_{-1} + \frac{P}{K_{DIV} K_{SUM} K_{AVG} K_{INT}} \tag{4.2}$$
The phase offset coefficient effectively shifts the CDR’s locking phase slightly to the
left (assuming a positive coefficient P), which, in turn, reduces the pre-cursor ISI, and
increases the post-cursor ISI. This takes advantage of the DFE’s ability to cancel the
latter, but not the former. In this work, P is manually set through test registers; in
future work, it may be possible to automatically optimize P for maximum eye opening.
The output of the phase coefficient adder is processed by the loop filter (see Sec-
tion 3.3.6) and fed back to the data interpolator. The interpolator implementation is
discussed in more detail in Section 4.3.1.
From Figure 4.5, the MMPD block also provides three correlation terms. The first
two are used to estimate the first and second DFE taps (c1 and c2). They are low-pass
filtered and the 8-bit coefficients are fed back to the DFE. The third correlation term
provides cm as an ISI monitor for off-chip measurement and optimization. The integer
"m" can be configured to any value from -2 to 13 in order to observe 16 ISI taps.
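The ISI monitor can be illustrated with a small Python sketch that estimates the tap at index m as the average of xk times Ak−m, following the correlations of Figure 4.1. The batch average, the hypothetical pulse response, the error-free recovered bits, and the function name estimate_isi_taps are assumptions for illustration; the implemented monitor uses the low-pass filter of Section 4.3.2 rather than a batch average.

```python
import random


def estimate_isi_taps(samples, bits, m_values):
    """Estimate h_m = E[x_k * A_(k-m)] by averaging over the available samples."""
    taps = {}
    for m in m_values:
        valid = [k for k in range(len(samples)) if 0 <= k - m < len(bits)]
        taps[m] = sum(samples[k] * bits[k - m] for k in valid) / len(valid)
    return taps


rng = random.Random(7)
pulse = {-1: 0.05, 0: 1.0, 1: 0.20, 2: 0.08}   # hypothetical pulse response
bits = [rng.choice((-1, 1)) for _ in range(20000)]
samples = [
    sum(h * bits[k - j] for j, h in pulse.items() if 0 <= k - j < len(bits))
    for k in range(len(bits))
]

# m from -2 to 13 gives the 16 monitored taps.
monitor = estimate_isi_taps(samples, bits, range(-2, 14))
for m in (-1, 0, 1, 2, 3):
    print(f"h[{m}] is estimated as {monitor[m]:.3f}")
```

Taps outside the assumed pulse response average towards zero, so the monitor output gives a direct picture of how far the ISI extends.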
The MMPD-based architecture in Figure 4.5 provides an advantage by decoupling the
phase-tracking and DFE adaptive feedback loops. In an Alexander-based or Hogge-based
phase-tracking CDR, the PD detects the data edges after decision feedback equaliza-
tion [23]. Hence, the DFE affects the CDR’s output phase. At the same time, the output
phase affects the DFE coefficients. In order to prevent the interaction from causing
instability, the DFE adaptive loop is usually implemented with much lower bandwidth
than the phase-tracking loop. However, the low DFE loop bandwidth will increase the
CDR’s start-up time. The MM-based architecture removes the interaction because the
MMPD locks to the unequalized eye – the DFE does not affect the phase-tracking loop.
Hence, the bandwidth of the DFE loop in an MMPD-based architecture can be increased
compared to the DFE loop bandwidth in an Alexander-based or Hogge-based architecture.
4.3.1. Data Interpolator
The data interpolator architecture in Figure 4.6 has been modified for 2x blind samples;
otherwise it is similar to the one presented in Chapter 3 (Figure 3.11). Note that the
worst case for interpolating between 2x blind samples occurs when φAVG is 0.25UI (i.e.
the desired sample is halfway between the 2x blind samples). In contrast, the worst case
for interpolating between 1x blind samples occurs when φAVG is 0.5UI (i.e. the desired
sample is halfway between the 1x blind samples).
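Figure 4.6: Piecewise linear interpolation of desired sample from 2x blind samples. For blind samples a, b, c, d, with the desired sample located between b and c:
$$\text{Desired sample} \approx b\,(1 - 2\Phi'_{AVG}) + c \cdot 2\Phi'_{AVG} + 0.5\,\big((b - a) + (c - d)\big)\,Y(\Phi'_{AVG})$$
$$Y(\Phi'_{AVG}) = \begin{cases} \Phi'_{AVG}, & 0 \le \Phi'_{AVG} < 0.25\,\text{UI} \\ 0.5\,(1 - 2\Phi'_{AVG}), & 0.25 \le \Phi'_{AVG} \le 0.5\,\text{UI} \end{cases}$$
$$\Phi'_{AVG} = \operatorname{mod}(\Phi_{AVG},\ 0.5\,\text{UI})$$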
In the Verilog implementation, φAVG is represented by a 5-bit number. The most
significant bit of φAVG selects the pair of blind samples adjacent to the desired sample
(i.e. b and c). As shown in Figure 4.4, one pair is selected when 0UI ≤ φAVG ≤ 0.5UI
and the other when 0.5UI ≤ φAVG ≤ 1.0UI. The remaining 4 bits are substituted as φ′AVG
in the interpolation expression in Figure 4.6. For clarity, Figure 4.6 shows φ′AVG in terms
of UI, but φ′AVG is actually implemented as an integer between 0 and 15. Therefore, the
implemented interpolator has a gain of 16 (i.e. KINT=16).
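The interpolation expression in Figure 4.6 can be written as a short floating-point sketch in Python. It works in units of UI rather than with the 4-bit integer phase of the Verilog implementation, and it reads the expression as the sum of a linear term and a correction term weighted by Y; the function names are illustrative assumptions.

```python
def wrapped_phase(phi_avg):
    """phi' = mod(phi_avg, 0.5 UI): phase within one 2x sampling interval."""
    return phi_avg % 0.5


def y_weight(phi_prime):
    """Triangular weight Y(phi'), peaking at 0.25 UI."""
    if phi_prime < 0.25:
        return phi_prime
    return 0.5 * (1.0 - 2.0 * phi_prime)


def interpolate(a, b, c, d, phi_prime):
    """Desired sample between b and c from four neighbouring blind samples."""
    linear = b * (1.0 - 2.0 * phi_prime) + c * 2.0 * phi_prime
    correction = 0.5 * ((b - a) + (c - d)) * y_weight(phi_prime)
    return linear + correction


# Samples on a straight line: the correction term vanishes.
print(interpolate(0.0, 1.0, 2.0, 3.0, 0.25))
# Samples around a peak: the correction lifts the halfway estimate above b and c.
print(interpolate(0.0, 1.0, 1.0, 0.0, 0.25))
```

At phi' = 0 the result is exactly b and at phi' = 0.5UI it is exactly c, so the interpolator only departs from the blind samples between the sampling instants.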
One disadvantage of the proposed data interpolators (in this section and Section 3.3.3)
is that they have a phase-dependent frequency response, as shown in Figure 4.7. The
frequency response of an ideal data interpolator has a flat magnitude; the interpolator
should only shift the phase of the data signal. The proposed 2x interpolator has a flat
magnitude only when φAVG is 0UI; in fact, its frequency response has a null at 10GHz
when φAVG is 0.25UI. In the time domain, the interpolator changes the pulse response
shape when φAVG ≠ 0UI. To compensate for this, the DFE should use phase-dependent
coefficients [1, 24]. The DFE architecture described in [1] and [24] stored 8 coefficients
for a 1-tap DFE. The disadvantage is the complexity and area required for storing and
adapting multiple coefficients for each DFE tap. However, this work neglects the pulse
shaping behaviour of the data interpolator because the magnitude of the 2x interpolator's
frequency response is approximately flat up to the Nyquist frequency of 5GHz. Hence,
only one coefficient is implemented per tap. As we will see in Section 4.4, the DFE
adaptation converges to a coefficient that is approximately the average tap value over all
φAVG.
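The phase dependence can be reproduced from the interpolation expression in Figure 4.6 by treating it as a 4-tap FIR filter on the 20GS/s blind samples. The Python sketch below is an idealized calculation under that assumption (floating-point weights, no quantization), and the function names are illustrative.

```python
import cmath
import math

F_SAMPLE = 20e9  # 2x blind sampling rate for 10Gbps data


def interpolator_taps(phi_prime):
    """Weights applied to blind samples a, b, c, d for a given phi' (in UI)."""
    y = phi_prime if phi_prime < 0.25 else 0.5 * (1.0 - 2.0 * phi_prime)
    return [
        -0.5 * y,                              # a
        (1.0 - 2.0 * phi_prime) + 0.5 * y,     # b
        2.0 * phi_prime + 0.5 * y,             # c
        -0.5 * y,                              # d
    ]


def magnitude_db(phi_prime, freq_hz):
    """Magnitude of the interpolator frequency response in dB."""
    taps = interpolator_taps(phi_prime)
    response = sum(
        w * cmath.exp(-2j * math.pi * freq_hz * n / F_SAMPLE)
        for n, w in enumerate(taps)
    )
    return 20.0 * math.log10(max(abs(response), 1e-12))  # floor avoids log(0)


for phi in (0.0, 0.125, 0.25):
    print(f"phi' = {phi:5.3f} UI: "
          f"{magnitude_db(phi, 5e9):7.2f} dB at 5GHz, "
          f"{magnitude_db(phi, 10e9):8.2f} dB at 10GHz")
```

Under these assumptions the response at phi' = 0.25UI stays within about 1dB of flat at 5GHz while collapsing at 10GHz, which is consistent with the decision to keep a single coefficient per tap.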
Figure 4.7: Frequency responses of 1x and 2x data interpolators. Both interpolators operate on a 10Gbps data signal with a Nyquist frequency of 5GHz. Curves: 1x with ΦAVG=0.5; 2x with ΦAVG=0, 0.125, and 0.25; magnitude in dB from 1GHz to 10GHz.
In Figure 4.7, we also observe a further advantage of 2x vs. 1x blind sampling. The
frequency response of the interpolator operating on 1x samples has a null at the Nyquist
frequency when φAVG=0.5UI. The system in Chapter 3 worked because the 2UI I&D
already has a null at 5GHz (see Figures 3.18 and 3.25), and, thus, the I&D mostly
masked the phase-dependent response of the interpolator. The CDR would fail if the
2UI I&D were removed because the 1x interpolator would change the pulse response
significantly. In that case, it would be necessary to implement phase-dependent DFE
coefficients. Therefore, the decision to use 2x oversampling has allowed us to save power
by removing the I&D and by using a simple DFE architecture.
4.3.2. Low-Pass Filter for DFE Adaptation
The low-pass filter (LPF) illustrated in Figure 4.8 is used to approximate the expected
value of the correlation terms from the MMPD. The LPF consists of a single integrator
in an internal feedback loop. A summer adds together a bus of 4 correlation terms at the
LPF input. If faster DFE convergence were needed, it would be possible to sum together up to 16
correlation terms since the CDR processes 16 samples in parallel per cycle. However, a
larger adder would consume more power.
Figure 4.8: Low-pass filter for DFE coefficients. A summer adds the 4 correlation terms (xkAk−1 or xkAk−2); a configurable counter (13b, 14b, or 15b) and an overflow detector produce the up/down signal; a hysteresis block reduces output toggling; an integrating counter produces c1 or c2, which returns to the summer through the X4 feedback gain.
Figure 4.9: Hysteresis block implemented in low-pass filter. The 2-bit up/down signal takes the values {-1, 0, 1}.
The configurable counter and overflow detector act as an adjustable divider that
produces an up/down signal having one of three values: 1, -1, or 0 (i.e. up, down, or
no change). The hysteresis block reduces toggling at the LPF output. As shown in
Figure 4.9, the register in the hysteresis block filters out the "no change" signals and
stores only an up or down signal. If the hysteresis block receives a signal that is opposite
to the stored value, then the mux at the output of the hysteresis block forces the signal
to "no change." The filtered up/down signal at the output of the hysteresis block in
Figure 4.8 is integrated by a counter at the output of the LPF. The gain in the feedback
divides the output by 4; this is needed since the summer adds together 4 terms at the
LPF input.
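The hysteresis behaviour described above can be sketched behaviourally in Python. Two details are assumptions, since the text does not state them: the register is taken to start with no stored direction, and it is taken to store the new direction even when the output is forced to "no change". The class and function names are illustrative.

```python
class HysteresisBlock:
    """Behavioural model of the hysteresis block described for Figure 4.9."""

    def __init__(self):
        self.stored = 0  # assumed reset state: no direction stored yet

    def step(self, up_down):
        """Take one up/down input in {-1, 0, 1} and return the filtered output."""
        if up_down == 0:
            return 0  # "no change" passes through and is not stored
        reverses = self.stored != 0 and up_down == -self.stored
        self.stored = up_down  # assumption: the register always takes the new direction
        return 0 if reverses else up_down


def integrate(signals, start=0):
    """Integrating counter at the LPF output, fed by the filtered up/down signal."""
    block = HysteresisBlock()
    value = start
    trace = []
    for signal in signals:
        value += block.step(signal)
        trace.append(value)
    return trace


# An alternating up/down input no longer toggles the coefficient.
print(integrate([1, -1, 1, -1, 1, -1], start=22))
# A sustained change of direction passes after the first reversal is absorbed.
print(integrate([1, 1, 0, -1, -1], start=0))
```

With these assumptions a single reversal is suppressed and only a repeated request in the new direction moves the coefficient, which is the toggling reduction the block is intended to provide.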
4.4. Simulation Results
This section presents the frequency and pulse responses of the channel models, DFE
adaptation curves, and simulated eye diagrams and jitter tolerance.
Figure 4.10 shows the frequency responses of the channel models used in simulation.
Channels C and D represent 1.5" and 8" traces on an FR4 board, respectively. The CDR
and DFE are demonstrated for three cases: Channel C at 5Gbps, Channel C at 10Gbps,
and Channel D at 10Gbps. The attenuations at the Nyquist frequency are, respectively,
5dB, 10dB, and 13dB.
Figure 4.10: Frequency responses of channel models used in simulation (Channel C and Channel D; magnitude in dB from 100MHz to 10GHz)
Figure 4.11 depicts the pulse responses of the channel models cascaded with the data
interpolator. The pulse responses are shown for two values of φAVG: 0UI and 0.25UI.
The pulse responses are normalized so that the amplitude of the eye diagram is 1. In
simulation, the offset coefficient, P, is chosen to be 77 because it shifts h0 near the peaks
of the pulse responses; hence the CDR locks at a position described by Equation 4.3.
Figure 4.11 shows the pulse response samples at the CDR lock position.
$$\begin{aligned} h_1 &= h_{-1} + \frac{P}{K_{DIV} K_{SUM} K_{AVG} K_{INT}} \\ h_1 &= h_{-1} + 0.15 \end{aligned} \tag{4.3}$$
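Equation 4.3 can be checked numerically. The Python sketch below assumes that the gain written as KAVG in Equations 4.1 to 4.3 takes the value 8 stated for the ADC gain in Section 4.3; that reading is an assumption, although it does reproduce the 0.15 offset.

```python
K_DIV = 0.25   # divider gain from Figure 4.5
K_SUM = 16     # sum block gain from Figure 4.5
K_INT = 16     # interpolator gain from Section 4.3.1
K_AVG = 8      # assumption: equal to the stated ADC gain, K_ADC = 8
P = 77         # offset coefficient used in simulation


def lock_offset(p):
    """Difference h1 - h(-1) at the CDR lock point for offset coefficient p."""
    return p / (K_DIV * K_SUM * K_AVG * K_INT)


print(f"h1 - h(-1) = {lock_offset(P):.4f}")
```

Because the denominator is 512 under this assumption, each step of P moves the lock condition by roughly 0.002 of the normalized eye amplitude.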
Figure 4.11: Combined channel and interpolator pulse responses showing ISI tap values (h−1, h0, h1, h2, h3) when CDR has locked. Panels: Channel C at 5Gbps, Channel C at 10Gbps, and Channel D at 10Gbps, each for ΦAVG=0.0UI and ΦAVG=0.25UI; time axis from 0 to 1.0ns.
Figures 4.12 and 4.13 show the transient output of the adaptation controller given
Channels C and D, respectively, at 10Gbps. Each figure demonstrates that c1 and c2
converge during CDR start-up even when initialized to different values (e.g. 0 or 30).
The coefficients settle in approximately 13µs.
Figure 4.12: Simulated DFE adaptation with Channel C at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30). Adapted c1 ≈ 22, adapted c2 ≈ 8.
Figure 4.13: Simulated DFE adaptation with Channel D at 10Gbps. DFE converges to same steady-state values when given different initial coefficients (i.e. 0 and 30). Adapted c1 ≈ 24, adapted c2 ≈ 10.
Table 4.1 compares the adapted values, c1 and c2, to the pulse response samples, h1
and h2.
Table 4.1: Comparison of Adapted Coefficients (c1 and c2) vs. Pulse Response (h1 and h2)
Channel              c1   c1/(KADC·KINT)   h1, ΦAVG=0UI   h1, ΦAVG=0.25UI   c2   c2/(KADC·KINT)   h2, ΦAVG=0UI   h2, ΦAVG=0.25UI
Channel C, 5Gbps     16   0.125            0.151          0.102             3    0.023            0.045          0.039
Channel C, 10Gbps    22   0.172            0.175          0.170             8    0.063            0.062          0.059
Channel D, 10Gbps    24   0.188            0.197          0.193             10   0.078            0.081          0.079
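The normalization used in Table 4.1 divides each adapted coefficient by KADC·KINT = 128. The Python sketch below repeats that arithmetic on the table entries and prints each normalized coefficient next to the mean of the two listed pulse response samples; the dictionary layout and function name are illustrative, and the mean of two phases is only a rough proxy for the average over all φAVG.

```python
K_ADC = 8
K_INT = 16

# Values copied from Table 4.1: (c1, c2), h at 0UI, h at 0.25UI.
TABLE_4_1 = {
    "Channel C, 5Gbps":  {"c": (16, 3),  "h_0ui": (0.151, 0.045), "h_025ui": (0.102, 0.039)},
    "Channel C, 10Gbps": {"c": (22, 8),  "h_0ui": (0.175, 0.062), "h_025ui": (0.170, 0.059)},
    "Channel D, 10Gbps": {"c": (24, 10), "h_0ui": (0.197, 0.081), "h_025ui": (0.193, 0.079)},
}


def normalize(coefficient):
    """Refer an integer DFE coefficient to the normalized eye amplitude."""
    return coefficient / (K_ADC * K_INT)


for channel, row in TABLE_4_1.items():
    for tap in range(2):
        c_norm = normalize(row["c"][tap])
        h_mean = 0.5 * (row["h_0ui"][tap] + row["h_025ui"][tap])
        print(f"{channel}: c{tap + 1}/(K_ADC*K_INT) = {c_norm:.3f}, "
              f"mean h{tap + 1} = {h_mean:.3f}")
```

For the two 10Gbps cases the normalized coefficients lie within a few thousandths of the listed pulse response samples.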
Figure 4.14 depicts a CDR model used to simulate the eye diagrams in Figures 4.15,
4.16, and 4.17. The c1 and c2 coefficients are set to the values in Table 4.1 and φAVG is
forced to either 0UI (no interpolation) or 0.25UI (worst-case interpolation).
Figure 4.14: Simplified diagram of CDR model used for eye diagram simulations. Observed signals: data signal (TX RJ = 0.17 UIpp), ADC output (CKRX with RX RJ = 0.23 UIpp), interpolator output (xk), and DFE output.
Figure 4.18 shows the simulated jitter tolerance of the receiver with a PRBS-31 data
source and bit error rate (BER) of 10^-6. The ADC is modeled as an ideal 3-bit ADC.
The data source and blind receiver clocks are simulated with 0.17UIPP and 0.23UIPP of
random jitter, respectively.
Figure 4.15: Simulated eye diagrams with 5Gbps data and Channel C. Eye diagrams correspond to signals in Figure 4.14. Panels: data signal, ADC output, and interpolator and DFE outputs for ΦAVG=0UI and ΦAVG=0.25UI; horizontal axis is phase (UI).
Figure 4.16: Simulated eye diagrams with 10Gbps data and Channel C. Eye diagrams correspond to signals in Figure 4.14. Panels: data signal, ADC output, and interpolator and DFE outputs for ΦAVG=0UI and ΦAVG=0.25UI; horizontal axis is phase (UI).
Figure 4.17: Simulated eye diagrams with 10Gbps data and Channel D. Eye diagrams correspond to signals in Figure 4.14. Panels: data signal, ADC output, and interpolator and DFE outputs for ΦAVG=0UI and ΦAVG=0.25UI; horizontal axis is phase (UI).
Figure 4.18: Simulated jitter tolerance of proposed receiver. Jitter tolerance (UIpp, 0.1 to 10) versus jitter frequency (100kHz to 1GHz) for Channel C at 5Gbps, Channel C at 10Gbps, and Channel D at 10Gbps.
4.5. Conclusion
This chapter presented a novel zero-forcing adaptive controller for the DFE coefficients in
the 10Gbps digital CDR presented in Chapter 3. In order to reduce power consumption,
the 2UI I&D was removed from the receiver, oversampling was increased from 1x to 2x,
and the ADC resolution was decreased from 5 bits to 3 bits. Simulations show that the
adaptive DFE converges within 13µs to the ISI taps on the pulse response of the data
signal. The simulated high-frequency jitter tolerance is about 0.28UIPP when given an 8"
FR4 channel.
5 Conclusion
5.1. Thesis Contributions
This thesis provided a background and comparison of the different types of equalizers,
adaptive equalizer controllers, and clock-and-data recovery blocks.
A novel 1x blind ADC-based CDR was developed. The proposed receiver recovers
data by extending the channel pulse response so that the pulse amplitude is greater than
zero, no matter where the blind samples occur within a 1UI window. An I&D block in
the receiver front end extends the pulse response by adding controlled ISI. The baud-rate
design allows the CDR to operate at 10Gb/s given a 10GS/s sampling rate.
The proposed design was fabricated in a 65nm CMOS process. The test chip successfully
recovers 10Gb/s data with BER below 10^-12. Jitter tolerance measurements show
that the CDR implementation can recover data with below-baud rate sampling – the
CDR operates with ±300ppm of frequency offset and a high-frequency jitter tolerance of
0.19UIPP.
Next, a zero-forcing adaptive DFE controller was developed for the digital baud-rate
CDR. In order to reduce receiver power consumption, the 2UI I&D was removed from
the receiver, oversampling was increased from 1x to 2x, and the ADC resolution was
decreased from 5 bits to 3 bits. Simulations show that the adaptive DFE converges
within 13µs to the ISI taps on the pulse response of the data signal. The simulated
high-frequency jitter tolerance is about 0.28UIPP when given an 8" FR4 channel. A test
chip was taped out in August 2013.
The contributions include:
• Proposal of a blind baud-rate ADC-based CDR,
• Implementation of the CDR (I&D design borrowed from previous tapeout by Tina
Tahmoureszadeh, modified and implemented for the proposed design by Joshua
Liang),
• A paper presented at ISSCC 2013 [29],
• A paper accepted for publication in JSSC to appear in the Dec. 2013 issue,
• Implementation of the adaptive DFE (ADC design done by Sadegh Jalali).
I would also like to acknowledge the help of Joshua Liang with the measurement of
the blind baud-rate CDR.
5.2. Future Work
One aspect of the future work (i.e. DFE adaptation) has been discussed in detail in
Chapter 4. There are four other advances that can be made to this work and they will
be described in the following sections.
5.2.1. Implementation of a Fully Feed-Forward Blind Baud-Rate CDR
As noted in Section 3.2, one of the disadvantages of the proposed CDR is the 7-cycle
(112UI) feedback loop latency. The long loop latency limits the CDR bandwidth.
A feed-forward architecture is unconditionally stable [32, 36]. A future enhancement
would implement a feed-forward version of the blind baud-rate CDR. This would require
research on an appropriate baud-rate PD that can operate without feedback.
5.2.2. Evaluation of Phase-Dependent DFE for Data Interpolators
Section 4.3.1 discussed the possibility of implementing phase-dependent DFE coefficients
to compensate for the data interpolator’s phase-dependent response. One future task
would be to evaluate the performance benefits against the area and power cost of the
extra coefficients.
5.2.3. Adaptive Optimization of Offset Coefficient in MMPD
Section 4.3 described a new coefficient, P, which is summed with the MMPD output in
order to shift the sampling phase, φAVG. In this work, P is manually assigned a value
such that the main tap, h0, is sampled near the peak of the pulse response. One future
enhancement is to add an adaptive controller to optimize P for a range of channels.
5.2.4. Calibration of I&D and ADC Front End
The interleaved analog front end blocks described in Chapter 3 did not include any
adaptive calibration. Only manual calibration for clock skew was implemented. The
jitter tolerance can likely be improved by the addition of adaptive calibration for gain,
offset, and timing mismatch in the interleaved I&D and ADC blocks.
References
[1] B. Abiri, A. Sheikholeslami, H. Tamura, and M. Kibune. An Adaptation Engine for
a 2x Blind ADC-Based CDR in 65 nm CMOS. Solid-State Circuits, IEEE Journal
of, 46(12):3140 –3149, dec. 2011.
[2] B. Abiri, R. Shivnaraine, A. Sheikholeslami, H. Tamura, and M. Kibune. A 1-to-
6Gb/s phase-interpolator-based burst-mode CDR in 65nm CMOS. In Solid-State
Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE International,
pages 154 –156, feb. 2011.
[3] O.E. Agazzi, M.R. Hueda, D.E. Crivelli, H.S. Carrer, A. Nazemi, G. Luna, F. Ramos,
R. Lopez, C. Grace, B. Kobeissy, C. Abidin, M. Kazemi, M. Kargar, C. Marquez,
S. Ramprasad, F. Bollo, V. Posse, S. Wang, G. Asmanis, G. Eaton, N. Swenson,
T. Lindsay, and P. Voois. A 90 nm CMOS DSP MLSD Transceiver With Integrated
AFE for Electronic Dispersion Compensation of Multimode Optical Fibers at 10
Gb/s. Solid-State Circuits, IEEE Journal of, 43(12):2939–2957, 2008.
[4] Marco V. Barbera, Sokol Kosta, Alessandro Mei, and Julinda Stefa. To offload or
not to offload? The bandwidth and energy costs of mobile cloud computing. In
INFOCOM, 2013 Proceedings IEEE, pages 1285–1293, 2013.
[5] Jan Bergmans. Digital baseband transmission and recording. Kluwer Academic
Publishers, Boston, 1996.
[6] Jun Cao, Sui Huang, and M.M. Green. Non-idealities in linear CDR phase detectors.
In Circuit Theory and Design (ECCTD), 2011 20th European Conference on, pages
158–161, 2011.
[7] Jun Cao, Bo Zhang, U. Singh, Delong Cui, A. Vasani, A. Garg, Wei Zhang, N. Ko-
caman, Deyi Pi, B. Raghavan, Hui Pan, I. Fujimori, and A. Momtaz. A 500 mW
ADC-Based CMOS AFE With Digital Calibration for 10 Gb/s Serial Links Over KR-
Backplane and Multimode Fiber. Solid-State Circuits, IEEE Journal of, 45(6):1172–
1185, 2010.
[8] E-Hung Chen, Jihong Ren, B. Leibowitz, Hae-Chang Lee, Qi Lin, Kyung Oh,
F. Lambrecht, V. Stojanovic, J. Zerbe, and C.-K.K. Yang. Near-Optimal Equal-
izer and Timing Adaptation for I/O Links Using a BER-Based Metric. Solid-State
Circuits, IEEE Journal of, 43(9):2144–2156, 2008.
[9] S. Dey. Cloud Mobile Media: Opportunities, challenges, and directions. In Comput-
ing, Networking and Communications (ICNC), 2012 International Conference on,
pages 929–933, 2012.
[10] Y. Doi, T. Shibasaki, T. Danjo, W. Chaivipas, T. Hashida, H. Miyaoka, M. Hoshino,
Y. Koyanagi, T. Yamamoto, S. Tsukamoto, and H. Tamura. 32Gb/s data-
interpolator receiver with 2-tap DFE in 28nm CMOS. In Solid-State Circuits Con-
ference Digest of Technical Papers (ISSCC), 2013 IEEE International, pages 36–37,
2013.
[11] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, S. Batty, D. Col-
man, E. Carr, V. Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal, B. Kil-
lips, T. Krause, S. Lytollis, A. Pickering, M. Saxton, D. Sebastio, G. Swanson,
A. Szczepanek, T. Ward, J. Williams, R. Williams, and T. Willwerth. A 12.5Gb/s
SerDes in 65nm CMOS Using a Baud-Rate ADC with Digital Receiver Equalization
and Clock Recovery. In Solid-State Circuits Conference, 2007. ISSCC 2007. Digest
of Technical Papers. IEEE International, pages 436 –591, Feb. 2007.
[12] Yasuo Hidaka. 10-20Gb/s+ Equalizer Design for Electrical Channel with 40dB+
Loss. In ATAC Technical Forum F-3, 10-40 Gb/s I/O Design for Data Communi-
cations, International Solid State Circuits Conference, Feb 2012.
[13] H. Higashi, S. Masaki, M. Kibune, S. Matsubara, T. Chiba, Y. Doi, H. Yamaguchi,
H. Takauchi, H. Ishida, K. Gotoh, and Hirotaka Tamura. A 5-6.4-Gb/s 12-channel
transceiver with pre-emphasis and equalization. Solid-State Circuits, IEEE Journal
of, 40(4):978–985, 2005.
[14] A.K. Joy, H. Mair, Hae-Chang Lee, A. Feldman, C. Portmann, N. Bulman, E.C.
Crespo, P. Hearne, P. Huang, B. Kerr, P. Khandelwal, F. Kuhlmann, S. Lytollis,
J. Machado, C. Morrison, S. Morrison, S. Rabii, D. Rajapaksha, V. Ravinuthula,
and G. Surace. Analog-DFE-based 16Gb/s SerDes in 40nm CMOS that operates
across 34dB loss channels at Nyquist with a baud rate CDR and 1.2Vpp voltage-
mode driver. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC),
2011 IEEE International, pages 350 –351, Feb. 2011.
[15] Andy Joy. (What is so Hard About) SerDes Design Challenges for 20Gb/s+ Data
Rates over Electrical Backplanes? In ATAC Technical Forum F-3, 10-40 Gb/s I/O
Design for Data Communications, International Solid State Circuits Conference,
Feb 2012.
[16] Wang-Soo Kim, Chang-Kyung Seong, and Woo-Young Choi. A 5.4Gb/s adaptive
equalizer using asynchronous-sampling histograms. In Solid-State Circuits Confer-
ence Digest of Technical Papers (ISSCC), 2011 IEEE International, pages 358–359,
2011.
[17] Jri Lee, K.S. Kundert, and B. Razavi. Analysis and modeling of bang-bang clock
and data recovery circuits. Solid-State Circuits, IEEE Journal of, 39(9):1571 – 1580,
sept. 2004.
[18] Mike Li. Jitter, Noise, and Signal Integrity at High-Speed. Prentice Hall, Upper
Saddle River, NJ, 2008.
[19] S.M. Louwsma, A. J M Van Tuijl, M. Vertregt, and B. Nauta. A 1.35 GS/s, 10 b,
175 mW Time-Interleaved AD Converter in 0.13 um CMOS. Solid-State Circuits,
IEEE Journal of, 43(4):778–786, 2008.
[20] A. Momtaz and M.M. Green. An 80 mW 40 Gb/s 7-Tap T/2-Spaced Feed-Forward
Equalizer in 65 nm CMOS. Solid-State Circuits, IEEE Journal of, 45(3):629–639,
2010.
[21] K. Mueller and M. Muller. Timing Recovery in Digital Synchronous Data Receivers.
Communications, IEEE Transactions on, 24(5):516 – 531, May 1976.
[22] J. Nakagawa, M. Nogami, N. Suzuki, M. Noda, S. Yoshima, and H. Tagami. 10.3-
Gb/s Burst-Mode 3R Receiver Incorporating Full AGC Optical Receiver and 82.5-
GS/s Over-Sampling CDR for 10G-EPON Systems. Photonics Technology Letters,
IEEE, 22(7):471–473, 2010.
[23] Massimo Pozzoni, Simone Erba, Paolo Viola, Matteo Pisati, Emanuele Depaoli,
Davide Sanzogni, Riccardo Brama, Daniele Baldi, Matteo Repossi, and Francesco
Svelto. DFE Receiver With a SSC Tolerant CDR for Serial Backplane Communica-
tion. Architecture, 44(4):1306–1315, 2009.
[24] S. Sarvari, T. Tahmoureszadeh, A. Sheikholeslami, Hirotaka Tamura, and M. Ki-
bune. A 5Gb/s speculative DFE for 2x blind ADC-based receivers in 65-nm CMOS.
In VLSI Circuits (VLSIC), 2010 IEEE Symposium on, pages 69–70, 2010.
[25] P. Schvan, J. Bach, C. Fait, P. Flemke, R. Gibbins, Y. Greshishchev, N. Ben-Hamida,
D. Pollex, J. Sitch, Shing-Chi Wang, and J. Wolczanski. A 24GS/s 6b ADC in 90nm
CMOS. In Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical
Papers. IEEE International, pages 544–634, 2008.
[26] F. Spagna, Lidong Chen, M. Deshpande, Yongping Fan, D. Gambetta, S. Gowder,
S. Iyer, R. Kumar, P. Kwok, R. Krishnamurthy, Chien chun Lin, R. Mohanavelu,
R. Nicholson, J. Ou, M. Pasquarella, K. Prasad, H. Rustam, L. Tong, A. Tran, J. Wu,
and Xuguang Zhang. A 78mW 11.8Gb/s serial link transceiver with adaptive RX
equalization and baud-rate CDR in 32nm CMOS. In Solid-State Circuits Conference
Digest of Technical Papers (ISSCC), 2010 IEEE International, pages 366 –367, feb.
2010.
[27] N. Suzuki, K. Nakura, S. Kozaki, H. Tagami, M. Nogami, and J. Nakagawa. 82.5
Gsample/s (10.3125 GHz X 8 phase clocks) burst-mode CDR for 10G-EPON sys-
tems. Electronics Letters, 45(24):1261–1263, 2009.
[28] T. Tahmoureszadeh, S. Sarvari, A. Sheikholeslami, Hirotaka Tamura, Y. Tomita,
and M. Kibune. A combined anti-aliasing filter and 2-tap FFE in 65-nm CMOS for
2x blind 2-10 Gb/s ADC-based receivers. In Custom Integrated Circuits Conference
(CICC), 2010 IEEE, pages 1–4, 2010.
[29] Clifford Ting, Joshua Liang, Ali Sheikholeslami, Masaya Kibune, and Hirotaka
Tamura. A blind baud-rate ADC-based CDR. In Solid-State Circuits Conference
Digest of Technical Papers (ISSCC), 2013 IEEE International, pages 122–123, 2013.
[30] Y. Tomita, M. Kibune, J. Ogawa, W.W. Walker, H. Tamura, and T. Kuroda. A
10-Gb/s receiver with series equalizer and on-chip ISI monitor in 0.11-µm CMOS.
Solid-State Circuits, IEEE Journal of, 40(4):986 – 993, april 2005.
[31] Y. Tomita, H. Yamaguchi, S. Kawahara, T. Higuchi, T. Yamamoto, H. Ishida, K. Go-
toh, and H. Tamura. A 0.12mm2 5Gbps receiver with a level shifting equalizer and
a cumulative-histogram-based adaptation engine. In VLSI Circuits (VLSIC), 2011
Symposium on, pages 86 –87, june 2011.
[32] O. Tyshchenko, A. Sheikholeslami, H. Tamura, M. Kibune, H. Yamaguchi, and
J. Ogawa. A 5-Gb/s ADC-Based Feed-Forward CDR in 65 nm CMOS. Solid-State
Circuits, IEEE Journal of, 45(6):1091 –1098, June 2010.
[33] O. Tyshchenko, A. Sheikholeslami, H. Tamura, Y. Tomita, H. Yamaguchi, M. Ki-
bune, and T. Yamamoto. A fractional-sampling-rate ADC-based CDR with feed-
forward architecture in 65nm CMOS. In Solid-State Circuits Conference Digest of
Technical Papers (ISSCC), 2010 IEEE International, pages 166 –167, Feb. 2010.
[34] M. van Ierssel, H. Yamaguchi, A. Sheikholeslami, Hirotaka Tamura, and W.W.
Walker. Event-Driven Modeling of CDR Jitter Induced by Power-Supply Noise,
Finite Decision-Circuit Bandwidth, and Channel ISI. Circuits and Systems I: Reg-
ular Papers, IEEE Transactions on, 55(5):1306–1315, 2008.
[35] S. Verma, A. Kasapi, Li min Lee, D. Liu, D. Loizos, Song-Hee Paik, A. Varzaghani,
S. Zogopoulos, and S. Sidiropoulos. A 10.3GS/s 6b flash ADC for 10G Ethernet
applications. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC),
2013 IEEE International, pages 462–463, 2013.
[36] H. Yamaguchi, H. Tamura, Y. Doi, Y. Tomita, T. Hamada, M. Kibune, S. Ohmoto,
K. Tateishi, O. Tyshchenko, A. Sheikholeslami, T. Higuchi, J. Ogawa, T. Saito,
H. Ishida, and K. Gotoh. A 5Gb/s transceiver with an ADC-based feed-forward CDR
and CMA adaptive equalizer in 65nm CMOS. In Solid-State Circuits Conference
Digest of Technical Papers (ISSCC), 2010 IEEE International, pages 168 –169, Feb.
2010.
[37] Bo Zhang, Ali Nazemi, Adesh Garg, Namik Kocaman, Mahmoud Reza Ahmadi,
Mehdi Khanpour, Heng Zhang, Jun Cao, and Afshin Momtaz. A 195mW / 55mW
dual-path receiver AFE for multistandard 8.5-to-11.5 Gb/s serial links in 40nm
CMOS. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC),
2013 IEEE International, pages 34–35, 2013.