

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 12, DECEMBER 2003 2121

Equalization and Clock Recovery for a 2.5–10-Gb/s 2-PAM/4-PAM Backplane Transceiver Cell

Jared L. Zerbe, Member, IEEE, Carl W. Werner, Member, IEEE, Vladimir Stojanovic, Member, IEEE, Fred Chen, Member, IEEE, Jason Wei, Member, IEEE, Grace Tsang, Member, IEEE, Dennis Kim, Member, IEEE, William F. Stonecypher, Andrew Ho, Member, IEEE, Timothy P. Thrush, Ravi T. Kollipara, Member, IEEE, Mark A. Horowitz, Fellow, IEEE, and Kevin S. Donnelly, Member, IEEE

Abstract: A folded multitap transmitter equalizer and multitap receiver equalizer counteract the losses and reflections present in the backplane environment. A flexible 2-PAM/4-PAM clock data recovery circuit uses select transitions for receive clock recovery. Bit-error rate less than $10^{-15}$ and power equal to 40 mW/Gb/s have been measured when operating over a 20-in backplane with two connectors at 10 Gb/s.

Index Terms: Adaptive equalizers, decision feedback equalizers, multilevel systems, pulse amplitude modulation, SerDes, serial links, transceivers.

I. INTRODUCTION

A. Backplane Environment

The backplane is a complex environment consisting of many components and represents a serious challenge to signaling rates above 5 Gb/s. As shown in Fig. 1, the signal path includes over 11 different components, each of which has its own impedance variations. In addition, there are up to ten vias in the signal path, each having both a through and a stub component, and each thus presenting an additional potential impedance discontinuity and resonant pole. As a result, the transfer functions (S21s) of channels in this environment vary significantly, as can be seen in Fig. 2. At Nyquist frequencies below 2 GHz, there are some channel differences, but the presence of vias and impedance discontinuities does not have a significant impact. Above 2 GHz, channels vary significantly depending on the signaling layer (and thus the thru/stub ratio of the via), the trace length (and thus the skin and dielectric loss), and the dielectric material. Achieving high data rates across this variance of channel behaviors presents a significant challenge for high-speed serial links. Often, architectures that can achieve 10-Gb/s data rates with newer materials and connectors have also demonstrated operation in older legacy backplane environments at rates up to 6 Gb/s, thus demonstrating the similarity of the two problems. A significant group of 10-Gb/s transceivers [1], however, were not designed for this harsh electrical environment and thus are often poorly suited to the variety of difficulties it presents.

Manuscript received April 12, 2003; revised June 25, 2003.

J. L. Zerbe, C. W. Werner, V. Stojanovic, F. Chen, J. Wei, G. Tsang, D. Kim, W. F. Stonecypher, A. Ho, T. P. Thrush, R. T. Kollipara, and K. S. Donnelly are with Rambus Inc., Los Altos, CA 94022 USA (e-mail: [email protected]).

M. A. Horowitz is with Stanford University, Stanford, CA 94305 USA.

Digital Object Identifier 10.1109/JSSC.2003.818572

B. Worst Case Sequence

As can be seen in the raw single-bit response of Fig. 3, a single 200-ps pulse undergoes both serious loss and dispersion when sent down a backplane channel. In addition, it initiates reflections that can be a significant percentage of an equalized eye. Fig. 3 (inset) shows a zoom of the reflections plotted on a scale roughly equivalent to a single 4-PAM eye after transmit equalization. Because a transmit equalizer functions by attenuating the lower frequency components while operating in a peak-power constrained environment, the single-bit response is smaller after transmit equalization even though the intersymbol interference (ISI) has been reduced. The total usable amplitude shown in Fig. 3 after equalization is slightly smaller than the distance between the peak sample and the next sample of the raw pulse response. While the magnitude of any individual channel reflection may not appear significant when compared with the equalized eye height, the complete set of reflections can quickly become significant when combined in a worst case sequence. In such a sequence, the polarity of each bit is set so that all of the reflections sum in the same direction onto a single victim bit. As there can be encroachment on an eye from either side of the voltage extremes, there are two such sequences for a 2-PAM eye and six such sequences for a 4-PAM eye.
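The construction of a worst case sequence described above can be sketched numerically. The following minimal Python sketch assumes a 2-PAM link with symbols of ±1 and a sampled single-bit response; the pulse values, the function name `worst_case_2pam`, and the cursor index are hypothetical and are not taken from the measured channel of Fig. 3. Each interfering symbol is given the polarity that pushes a +1 victim toward the decision threshold, so the worst case sample is the cursor minus the sum of the magnitudes of all other samples.

```python
def worst_case_2pam(pulse, cursor):
    """Worst case sample of a +1 victim bit on a 2-PAM link (symbols are +/-1).

    pulse  : symbol-spaced samples of the single-bit response (hypothetical data)
    cursor : index of the main (victim) sample within `pulse`
    Returns (worst_sample, pattern), where pattern holds the polarity given to
    the symbol behind each non-cursor sample, in the same order as `pulse`.
    """
    others = [s for i, s in enumerate(pulse) if i != cursor]
    # Choose each polarity so that its contribution is negative (or zero).
    pattern = [-1 if s > 0 else 1 for s in others]
    isi = sum(abs(s) for s in others)
    return pulse[cursor] - isi, pattern


# Hypothetical pulse response: main sample 0.60, the rest is ISI and reflections.
pulse = [0.02, 0.60, 0.15, 0.05, -0.03, 0.0, 0.04]
worst, pattern = worst_case_2pam(pulse, cursor=1)
print(round(worst, 3), pattern)
```

For these illustrative values the worst case sample retains only about half of the cursor amplitude. Negating the returned pattern gives the second 2-PAM sequence, which attacks a −1 victim from the other side of the eye.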

The magnitude and importance of the worst case sequences can be seen in Fig. 4. In Fig. 4(b), 2000 symbols of a simple 2-PAM pseudorandom bit sequence (PRBS) are run across a channel at 6.4 Gb/s. This sequence is then followed by the two worst case sequences (plotted in bold) for this particular channel, and the total result is then folded into an eye. The worst case sequences appear as encroachments into the eye sample point and cause a readily discernible degradation in voltage margin at the sample point. Fig. 4(a) shows the probability distribution function (PDF), plotted on a log scale, of the waveform voltages at the sample point. The PDF was calculated using the technique of [2]. This PDF can then be viewed as the probability that a given voltage margin at the sample point will occur. There is good alignment, as shown, between the upper encroaching worst case sequence sample voltage and the PDF voltage at 10 and also the mean of the eye and the peak of the PDF. It is interesting to note that the PDF distributions show smooth and continuous nonzero tails, which, while bounded, indicate that it will be extremely difficult to rely on coding to minimize the impact of these worst case sequences. If coding were to be used to attempt to eliminate or minimize the worst case sequence, there would be another sequence with nearly the same voltage margins right behind it. As the PDF shows, there are simply too many adjacent sequences with such nearly identical properties for an efficient code to eliminate; thus, the link must actively cancel reflections if it is to minimize their impact on system margins.

0018-9200/03$17.00 © 2003 IEEE

Fig. 1. Backplane signaling environment.

Fig. 2. Backplane transfer functions showing variations between channels, each with 20-in backplane traces. Top and bottom layers, 300-mil-thick FR-4 and Nelco-6000 backplanes. Nelco-6000 top layer via was counter-bored to 100 mils.

Fig. 3. 200-ps pulse response for a 20-in FR4 backplane trace using GbX connectors, 3-in linecards, and 100 mil vias showing dispersion and attenuation of the main pulse with associated reflections. The inset shows the size of the reflections on a scale equivalent to an equalized data eye. Each dot is a symbol sample point.

In summary, the backplane environment is quite complex due to the number of different elements in the signal path, and the fundamental difficulties in achieving higher performance are loss and reflections. There are often significant reflections in backplane traces which can cause serious degradation in worst case margins; these have shown themselves to be difficult to code around. Such constraints set the design environment for performance backplane links.

II. DESIGN

A. 2-PAM/4-PAM Modes

The use of multilevel signaling to achieve higher bandwidth in high-loss systems is well understood [3]–[5]. Any system which has 10 dB of loss difference between the 2-PAM and 4-PAM Nyquist fundamental frequencies would be likely to benefit from 4-PAM signaling. This can be understood from a simple first-order comparison of the relative eye sizes. The transfer functions of two example backplane channels and their resultant 2-PAM and 4-PAM eyes running at 6.4 Gb/s are shown in Fig. 5. It is interesting to note that both channels are from the same backplane with equal trace length and total via length. The only difference between these channels is the signaling layer and the ratio of through via to stub via. In Fig. 5(a), the transfer function is not very steep between the 4-PAM Nyquist frequency of 1.6 GHz and the 2-PAM frequency of 3.2 GHz. As expected, the 2-PAM eye has superior voltage margin in this case. In Fig. 5(b), the channel characteristics show a difference in the transfer function at 1.6 and 3.2 GHz of almost 30 dB, and, as expected, the 4-PAM eye shows superior voltage margin in this case. As these two channels are almost identical physically but so different electrically, this clearly demonstrates that there is no definitive answer to the question of which is better: 2-PAM or 4-PAM. The only conclusion must be that each channel's individual characteristics will determine the answer for that particular channel.
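The first-order argument above can be made concrete. Assuming equally spaced levels and a fixed peak swing, each 4-PAM eye is one third of the 2-PAM eye, a penalty of 20·log10(3) ≈ 9.5 dB, which is consistent with the 10-dB rule of thumb quoted above. The Python sketch below compares that penalty with the loss difference between the two Nyquist frequencies. The function names and the loss figures in the example are illustrative assumptions, and the comparison ignores reflections, crosstalk, and equalization.

```python
import math


def nyquist_ghz(data_rate_gbps, bits_per_symbol):
    """Nyquist frequency in GHz: half the symbol rate."""
    return data_rate_gbps / bits_per_symbol / 2.0


def pam4_preferred(loss_db_at_2pam_nyquist, loss_db_at_4pam_nyquist):
    """First-order test: 4-PAM wins when the extra channel loss that 2-PAM
    sees at its higher Nyquist frequency exceeds the 4-PAM level-spacing
    penalty of 20*log10(3) dB (three stacked eyes within the same swing)."""
    penalty_db = 20.0 * math.log10(3.0)
    return (loss_db_at_2pam_nyquist - loss_db_at_4pam_nyquist) > penalty_db


# 6.4 Gb/s, as in Fig. 5: 3.2 GHz for 2-PAM and 1.6 GHz for 4-PAM.
print(nyquist_ghz(6.4, 1), nyquist_ghz(6.4, 2))

# Hypothetical losses in dB at 3.2 GHz and 1.6 GHz for two channels.
print(pam4_preferred(15.0, 10.0))  # shallow roll-off: 2-PAM keeps the larger eye
print(pam4_preferred(40.0, 10.0))  # steep roll-off: 4-PAM keeps the larger eye
```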

Fig. 4. (a) 2-PAM PDF showing the probability of voltages at the sample point. (b) Eye diagram formed by 2000 symbols of a PRBS sequence and overlaid with the worst case patterns. 6.4 Gb/s over 20-in backplane.

Fig. 5. (a) Transfer functions of low-loss and high-loss backplane channels. (b) Measured 2-PAM and 4-PAM signaling at 6.4 Gb/s over the channels. All four plotted on the same vertical and horizontal scales.

Fig. 6. Compatible (a) 4-PAM and (b) 2-PAM modes via the use of Gray coded levels and LSB = 0 when in 2-PAM mode.

This design supports both 2-PAM and 4-PAM operation via the Gray coded levels shown in Fig. 6. The differential output driver can be operated in 4-PAM mode as in Fig. 6(a) with a 2-bit input T[1:0]. Alternatively, by simply setting the LSB to zero, Gray coding allows the driver to operate in 2-PAM mode, as shown in Fig. 6(b). It is important to note that the 2-PAM mode is a subset of 4-PAM operation, and thus the 4-PAM transmitter and receiver can be used throughout. When switching from 2-PAM to 4-PAM operation, the phase-locked loop (PLL) multiplier needs to be halved in order to maintain a consistent data rate.
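The relationship between the two modes can be illustrated with a small sketch. The Python below assumes normalized, equally spaced levels and one particular Gray assignment of T[1:0] to those levels (00, 01, 11, 10 from lowest to highest); Fig. 6 is not reproduced in this transcript, so the exact assignment, the names `GRAY_LEVELS`, `drive_level`, and `symbol_rate_gbaud`, and the normalization are assumptions. With this assignment, holding the LSB at zero leaves only the two outer levels, which is full-swing 2-PAM.

```python
# Assumed Gray assignment of T[1:0] = (MSB, LSB) to normalized output levels.
GRAY_LEVELS = {0b00: -1.0, 0b01: -1.0 / 3.0, 0b11: 1.0 / 3.0, 0b10: 1.0}


def drive_level(msb, lsb=0):
    """Normalized differential output level for the 2-bit input T[1:0].
    Leaving lsb at 0 restricts the driver to the two outer levels (2-PAM)."""
    return GRAY_LEVELS[(msb << 1) | lsb]


def symbol_rate_gbaud(data_rate_gbps, four_pam):
    """4-PAM carries two bits per symbol, so the symbol clock (and hence the
    PLL multiplier) is half that of 2-PAM at the same data rate."""
    return data_rate_gbps / (2 if four_pam else 1)


print([drive_level(m) for m in (0, 1)])  # 2-PAM mode: the two outer levels
print(sorted(drive_level(m, l) for m in (0, 1) for l in (0, 1)))  # all four 4-PAM levels
print(symbol_rate_gbaud(10.0, four_pam=False), symbol_rate_gbaud(10.0, four_pam=True))
```

Because adjacent levels differ in only one bit, a decision error between neighboring levels corrupts a single bit of the two-bit symbol.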

When considering whether to use 2-PAM or 4-PAM signaling, the effect of reflections must also be carefully considered, as the size of the minimum eye relative to the maximum transition has decreased by . This can be understood by referring to Fig. 3, where the minimum eye size for 4-PAM is and in 2-PAM. The worst case reflection magnitude, however, does not decrease, as the maximum swing remains equal to that of a 2-PAM swing. Thus, the impact of reflections on the 4-PAM receive eyes can be very destructive. In complex backplanes, some channels may have low high-frequency loss and can tolerate 2-PAM signaling. Other channels may have higher loss and lower reflections and thus will be better suited for 4-PAM operation.

Fig. 7. (a) Block diagram showing how transmit and receive equalizers are combined to make a range-restricted DFE. (b) Equalizer ranges overlaid with an unequalized single-bit response. Each dot is a symbol sample point.

B. Equalization Architecture

Approaches to solving ISI are well known, the most common of which is equalization [6]. In the backplane link environment, the question becomes how to perform effective equalization at very high performance with very low cost in area and power. While the use of multiple signaling levels and transmit equalization can be effective in minimizing the effects of dispersion [3], [4], transmit-only equalization is an expensive way to combat the effect of reflections, which can potentially be more destructive to multilevel signaling. Decision-feedback-based receive equalization (DFE) can be effective when dealing with configuration-dependent reflections. This work uses both transmit and receive equalizers and clock recovery circuits for operation in a backplane environment with these issues. The transmit and receive equalizers are combined to make a range-restricted DFE with effective ranges, as shown in Fig. 7. Since dispersion varies as a function of many properties in backplanes, flexibility in the transmit equalizer, both in number of taps and in tap settings, is highly desirable. One completely flexible extreme would involve the use of a digital filter and a digital-to-analog converter (DAC) [7], while the simplest extreme is two-tap pre-emphasis [8]. Any technique must be evaluated for additional insertion loss as well as power and complexity.

C. Transmit Equalization

A simple thermometer-coded 2-PAM/4-PAM transmitter structure is shown in Fig. 8. Pre-decoded data is sent to three different output differential-pair drivers which can be selected to achieve any of the 4-PAM levels. In order to extend this to a five-tap equalizing transmitter, one simple method is to replicate the original driver five times over and feed each driver with individual symbol-delayed inputs of the original data, as shown in Fig. 8 (inset). In order for each tap to have the same range and resolution as the original tap, each replicated driver must be just as large and have a DAC just as fine as the original transmitter. Consequently, this simple approach would result in a 5× increase of the diffusion capacitance on the output pad and a similar increase in power and area.

Fig. 8. Five-tap 2-PAM/4-PAM equalizing transmitter without equalization (original).

Fig. 9. (a) Original five-tap 2-PAM/4-PAM equalizing transmitter. (b) New shared equalizing transmitter.

The five-tap merged differential transmitter/equalizer, shown in Fig. 9(b), leverages the fact that the transmitter is peak-power constrained due to output differential pair saturation margin. Thus, only 1/5 of the equalizing transmitter (or total gate width equal to the original single-tap transmitter) will be active at any given time. Rather than keep this device overhead, a single transmitter is divided into segments that can be shared by any of the taps. However, the use of this approach alone limits the resolution of the output driver to be the inverse of the number of segments into which the transmitter is split. For example, for 16 parts, the transmitter would only have a resolution of 4 bits. This would result in having a five-tap 4-bit digital finite-impulse response (FIR) filter requiring five 4-bit adders running at symbol rate; this would consume an unacceptable amount of power. Instead, the equalizer is partitioned into two sections: a shared section and a dedicated section. The shared section consists of seven large subdrivers, each driving 16× current, where each shared subdriver can select from any of the five equalization tap streams A–E. The dedicated portion consists of five binary weighted drivers, one for each equalization tap, and each capable of driving up to 15× current. This combination of shared and dedicated drivers allows each equalization tap to have the same current range (127×) and resolution (1×) as a nonequalizing 7-bit transmitter with only 50% additional parasitic overhead.
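The range and resolution arithmetic above (7 × 16 + 15 = 127) can be checked with a small sketch. The Python below assumes that each tap magnitude is an integer number of unit currents, that tap polarity is handled separately, and that the sum of the magnitudes is held to 127 by the peak-power constraint. The function `allocate` and its split rule are hypothetical and are not the authors' control logic. Under those assumptions, splitting each tap into whole 16× shared subdrivers plus a 0 to 15× dedicated remainder never needs more than seven shared subdrivers, because the sum of the floors cannot exceed the floor of the sum.

```python
SHARED_SUBDRIVERS = 7   # shared subdrivers, each selectable by any tap stream A-E
SHARED_WEIGHT = 16      # unit currents per shared subdriver
DEDICATED_MAX = 15      # full scale of each tap's binary-weighted dedicated driver
PEAK_BUDGET = SHARED_SUBDRIVERS * SHARED_WEIGHT + DEDICATED_MAX  # 127


def allocate(tap_magnitudes):
    """Split each tap magnitude (in unit currents) into a count of shared
    16x subdrivers and a dedicated 0..15 code. Hypothetical allocation rule."""
    if any(w < 0 for w in tap_magnitudes):
        raise ValueError("pass magnitudes; polarity is assumed to be handled elsewhere")
    if sum(tap_magnitudes) > PEAK_BUDGET:
        raise ValueError("tap magnitudes exceed the peak-power budget of 127")
    plan = [divmod(w, SHARED_WEIGHT) for w in tap_magnitudes]
    # Sum of floors <= floor of sum, so the seven shared subdrivers suffice.
    assert sum(shared for shared, _ in plan) <= SHARED_SUBDRIVERS
    return plan


# Hypothetical five-tap setting that uses the whole budget.
print(allocate([10, 90, 20, 5, 2]))
# A single tap can still reach the full 7-bit range with 1x resolution.
print(allocate([127, 0, 0, 0, 0]))
```

The second call shows why the partition preserves the range of a nonequalizing 7-bit transmitter: one tap may claim all seven shared subdrivers plus its own dedicated driver.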

D. Receive Equalization

Fig. 10. Five-tap adjustable receive equalizer including variable delay for removing output clock-to-Q delay.

For receive equalization, the linearity and high bandwidth of the transmission line environment were leveraged by adding and subtracting currents directly at the input pads, as shown in Fig. 10. The receive equalizer reuses the transmit filter design and is simply equivalent to a th scaled transmit equalizer. High-latency reflections are effectively cancelled by the receive equalizer in this configuration; it is preferred over a transmit equalizer for reflections as the past data is readily available in the receive pipeline. As reflections vary in both location and intensity between channels, the receive equalizer was designed to be very flexible, allowing for selection of any five taps within a window of 5–17 symbols after the received bit. It does not require the taps to be sequential. The selection of position is based on the magnitude of the reflections at each sample point. Thus, the tap select multiplexer and tap weights are separately configured and optimized for each backplane channel.
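The tap selection rule described above (any five positions within the 5 to 17 symbol window, chosen by reflection magnitude) can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the response samples are hypothetical, the function name `select_rx_taps` is invented for the example, and each tap weight is simply set equal to the reflection sample it is meant to cancel, which is not necessarily how the adaptive loop in this design sets the weights.

```python
def select_rx_taps(response, first=5, last=17, count=5):
    """Pick the `count` largest-magnitude samples between `first` and `last`
    symbols after the main sample. response[k] is the single-bit response
    k symbols after the cursor (response[0] is the cursor itself).
    Returns {tap position: weight to subtract}, ordered by position."""
    stop = min(last, len(response) - 1)
    window = range(first, stop + 1)
    ranked = sorted(window, key=lambda k: abs(response[k]), reverse=True)
    return {k: response[k] for k in sorted(ranked[:count])}


# Hypothetical post-cursor response with scattered reflections.
response = [0.0] * 18
response[0], response[1] = 0.60, 0.15            # cursor and near ISI (transmit equalizer range)
for k, v in {6: 0.04, 7: -0.03, 9: 0.01, 11: 0.05,
             12: -0.02, 15: 0.025, 16: 0.005}.items():
    response[k] = v

taps = select_rx_taps(response)
print(taps)
```

Note that the chosen positions need not be contiguous, which matches the statement that the taps are not required to be sequential.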

One difficulty with this type of receive equalizer is the timing alignment of the equalizer outputs to the incoming receive data, as the equalizer output has a clock-to-Q delay which varies over process, voltage, and temperature, and must be compensated for. This is accomplished by a simple limited-range variable delay element in the equalizer clock path. This delay element is adjusted by a training sequence where the receive equalizer sends a 0101 pattern which is received by the data path. During training, the clock data recovery (CDR) outputs are used to adjust the variable delay element while the normal receive phase value is kept fixed.

An adaptive approach is used to set coefficient values for both transmit and receive equalizers, with the goal of optimizing the signal-to-noise ratio (SNR) at the sample point.

E. Input Receiver

The input receiver design, shown in Fig. 11, consists of six slicers, two for MSB and four for the LSB levels. Odd and even slicers are used to receive data on both the rising and falling edge of the bit clock and perform an immediate 2:1 deserialization. The LSB slicers have an input offset voltage applied via a DAC, with a simple polarity reversal between the slicers for the upper eye and the lower eye. Characterization of the receiver showed it to have sensitivity to common-mode differences between the input signal and the DAC output via the nMOS clocking device not acting as a current source. In later versions of the design, a preamplifier stage was added in order to rectify this problem.

F. CDR

A flexible 2-PAM/4-PAM CDR was designed that uses the optimal transitions available for clock recovery in either 2-PAM or 4-PAM mode. The complete set of 4-PAM transitions, shown in Fig. 12, consists of three minor transitions (smallest change in voltage level possible), one major transition (largest change possible), and two intermediate transitions for a total of six different transition types.

Fig. 11. Receiver design showing (a) six input data slicers with LSB DAC and LSB slicers sharing inverted offset polarities. (b) Schematic of single slicer.

If a conventional zero-crossing CDR is used to recover the clock on uncoded 4-PAM data, the problem arises that the edge distribution at the MSB sampler threshold [as shown in Fig. 12(a)] is not uniform. Instead, there are three distinct crossing regions. Similarly, the offset LSB sampler thresholds also contain three distinct crossing regions. Such distributions can cause jitter, or worse, phase offsets, if the data pattern exhibits a predominance of one transition type over another. In this design, the optimal transitions [Fig. 12(b) and (d)] are used for clock recovery depending on the link mode. In 2-PAM mode, the MSB major transition [Fig. 12(d)] is used. In 4-PAM mode, the minor transitions of either the MSB or LSB [Fig. 12(b)] are also included, while the transitions with skewed crossings [Fig. 12(c)] are ignored. Both clock jitter and phase offset are thus minimized. The use of only minor transitions also guarantees immunity to any pathological offset-inducing patterns that 4-PAM data could present to a simple 2-PAM CDR. The CDR logic that was developed in order to do this edge exclusion is shown in Fig. 13. Both MSB and LSB edge and data samplers are used. Adequate density of optimal transitions is assured through means of scrambling, PRBS XOR, or coding.
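The transition classes discussed above can be enumerated in a few lines. The Python sketch below assumes levels numbered 0 to 3 from lowest to highest and a Gray assignment of 00, 01, 11, 10 to those levels; the assignment and the function names are assumptions, since Figs. 6 and 12 are not reproduced here. Under that assumption the two intermediate transitions are exactly the ones in which the MSB and LSB change simultaneously, which is the group that the text says is ignored in 4-PAM mode.

```python
from collections import Counter
from itertools import combinations

# Assumed Gray code (MSB, LSB) for level indices 0..3, lowest to highest.
GRAY = [0b00, 0b01, 0b11, 0b10]


def classify(a, b):
    """Name a transition between two distinct level indices by its size."""
    return {1: "minor", 2: "intermediate", 3: "major"}[abs(a - b)]


def bits_changed(a, b):
    """Return (msb_changed, lsb_changed) for a transition between levels."""
    diff = GRAY[a] ^ GRAY[b]
    return bool(diff & 0b10), bool(diff & 0b01)


def excluded_in_4pam(a, b):
    """True for simultaneous MSB/LSB transitions, whose skewed crossings
    carry poor timing information and are ignored in 4-PAM mode."""
    msb, lsb = bits_changed(a, b)
    return msb and lsb


pairs = list(combinations(range(4), 2))
print(Counter(classify(a, b) for a, b in pairs))
print([(a, b) for a, b in pairs if excluded_in_4pam(a, b)])
```

The count reproduces the three minor, two intermediate, and one major transition types listed in the text, and the excluded list contains only the two intermediate transitions.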

Fig. 12. Optimal 4-PAM and 2-PAM CDR transitions. The complete transition space (a) is made up of (b) minor transitions, (c) the simultaneous LSB/MSB transition, and (d) the major transition. Group (c) has undesirable timing distributions at the LSB slicer thresholds and its timing is ignored in 4-PAM mode.

Fig. 13. Bimodal 2-PAM/4-PAM CDR with edge exclusion to eliminate theuse of transitions with poor timing information.

Fig. 14. Complete link block diagram.

III. RESULTS

A complete block diagram of the link, shown in Fig. 14, consists of a transmitter and receiver which share a common PLL along with a CDR and digitally controlled phase mixers in a clocking architecture similar to [9]. Separate phase mixers are used for receiver edge, data, and receiver equalizer clocks to allow for maximum flexibility. The system transmits and receives data on both edges of a CMOS bit clock. The transmit clock also uses a phase mixer with a fixed setting in order to allow closing of the PLL loop around a common element of the clock path and minimize low-frequency jitter.

Fig. 15. (a) 2-PAM eye with no equalization at 6.4 Gb/s over 20-in backplane. (b) 2-PAM eye with transmit equalization at 6.4 Gb/s over 20-in backplane. (c) 4-PAM eye with transmit equalization at 10 Gb/s over 20-in backplane.

A. Equalization Results

Results for the equalization architecture are shown in Figs. 15–18. Fig. 15(a) shows a measurement of the transmitter running at 6.4 Gb/s over a 20-in backplane with two connectors without any equalization. The eye is completely closed due to ISI. Fig. 15(b) shows the same environment with the five-tap transmit equalizer enabled and shows significant margins and clear improvements in SNR. Fig. 15(c) shows the transmitter running over the same backplane at 10 Gb/s in 4-PAM mode.

Fig. 16. Without receive equalization. (a) 4-PAM PDF showing broad distributions. (b) Eye including worst case transitions. 10 Gb/s over 20-in backplane.

Fig. 17. With receive equalization. (a) 4-PAM PDF showing narrowed distributions. (b) Eye including worst case transitions showing improvement of both distributions and eyes. 10 Gb/s over 20-in backplane.

Fig. 16 shows simulations of the effectiveness of the receive equalizer when operating at 10 Gb/s over a 20-in backplane. In this figure, no receive equalizer is enabled, and, while the PRBS eye appears open, the worst case sequence shows an eye that is virtually closed. The PDF curves of Fig. 16(a) also show inadequate margin between the distributions below 10, indicating that there will be high bit-error rates (BERs) in the patterns with nearly the same probability as the worst case. In the simulation of Fig. 17, the receive equalizer is enabled and the worst case sequence, along with the PRBS distribution, is compressed to create maximum SNR at the sample point. The PDF curves of Fig. 17(a) also show a significant improvement in the spacing between the worst case sequences as well as improved slope to the distributions and thus improved BER versus voltage offset. Fig. 18 shows the measured effectiveness of the receive equalizer in the system voltage and timing margin. The system was margined by adjusting the offset of the receiver in both time and voltage and testing a PRBS sequence for several microseconds for each data point. When operating at 6.25 Gb/s on a relatively short 10-in backplane, a significant improvement in the overall system margin can be observed.

Fig. 18. Effect of receive equalization. Measured system margin shmoos showing final receiver voltage and timing margin (a) without and (b) with the receive equalizer enabled. Both axes plotted on a 1-UI scale at 6.25 Gb/s over 10-in FR-4 backplane.

Fig. 19. Measured receive clock phase versus cycle using (a) 2-PAM mode and (b) 4-PAM mode on 4-PAM data. Initial PLL and CDR locking can also be observed. Edge diagrams indicate the transition types used by CDR (circles) in each mode.

B. CDR Results

Measured results for the 2-PAM/4-PAM CDR are shown in Fig. 19, where the measured phase is plotted versus cycle, and PLL and CDR locking can be observed in the initial transients. When 4-PAM data is used with the CDR in 2-PAM mode, as in Fig. 19(a), peak-to-peak jitter of 60 ps was measured. However, when transition limiting is enabled so the CDR only uses the minor transitions, as in Fig. 19(b), peak-to-peak jitter of 35 ps was measured. Measurements were taken at the symbol rate with a real-time oscilloscope.

Fig. 20. Measured (a) 2-PAM and (b) 4-PAM performance by configuration [0:7] for 5 different connectors.

C. Complete System Results

Complete system results are shown in Fig. 20, where systems were margined (via the receiver voltage and timing offset technique) to a point equivalent to BER 10 . A correlation to measured BER was done prior to these experiments in order to establish proper voltage and timing margin requirements for correlation. The data in Fig. 20 was taken over two different materials (standard FR-4 and Nelco-6000), two different trace lengths (10 and 20 in), two different layers (top and bottom stripline layers of a 0.3-in-thick backplane), as well as five different connector types from multiple vendors (each bar representing a different connector type). The Nelco backplanes, in addition, had a counterboring process done whereby the top layer via was reduced from 0.3 in to 0.1 in. All systems were configured as in Fig. 1 with two connectors, two linecards, and a 0.3-in-thick backplane. The results indicate that 10 Gb/s is achievable using 4-PAM over most Nelco-6000 configurations and some FR-4 configurations. In 2-PAM mode, all configurations were able to achieve performance between 5 and 6.4 Gb/s. A summary of the link characteristics is shown in Fig. 21.


Fig. 21. (a) Cell micrograph. (b) Summary table.

IV. CONCLUSION

The backplane environment can be very complex due to the number of different components involved and the variability of each of these elements. In addition to loss, there are significant reflections which degrade overall signal quality. The exact symbol location of these reflections also varies, making this an even more challenging environment.

Increasing performance in this environment requires flexibility in the implementation in order to be able to adjust to each of the varying problems. The use of both 2-PAM and 4-PAM modes as well as flexible transmit and receive equalization architectures has enabled high performance over a broad configuration space of materials, trace length, via configuration, and connector type.

REFERENCES

[1] M. M. Green et al., “OC-192 transmitter in standard 0.18-μm CMOS,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2002, pp. 248–249.

[2] V. Stojanovic and M. Horowitz, “Modeling and analysis of high speed links,” in IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers, 2003, pp. 589–594.

[3] J. Stonick et al., “An adaptive PAM-4 5-Gb/s backplane transceiver in 0.25-μm CMOS,” IEEE J. Solid-State Circuits, vol. 38, pp. 436–443, Mar. 2003.

[4] J. Zerbe et al., “A 2 Gb/s/pin 4-PAM parallel bus interface with crosstalk cancellation, equalization, and integrating receivers,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2001, pp. 66–67.

[5] R. Farjad-Rad, C.-K. Yang, M. Horowitz, and T. Lee, “A 0.3-μm CMOS 8-Gb/s 4-PAM serial link transceiver,” IEEE J. Solid-State Circuits, vol. 35, pp. 757–764, May 2000.

[6] R. W. Lucky, “Techniques for adaptive equalization of digital communication systems,” Bell Syst. Tech. J., vol. 45, pp. 255–286, 1966.

[7] C.-K. Yang et al., “A serial-link transceiver based on 8-GSamples/s A/D and D/A converters in 0.25-μm CMOS,” IEEE J. Solid-State Circuits, vol. 36, pp. 1684–1692, Nov. 2001.

[8] A. Fiedler et al., “A 1.0625 Gbps transceiver with 2x-oversampling and transmit signal pre-emphasis,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 1997, pp. 238–239.

[9] K. Chang et al., “A 0.4–4 Gb/s CMOS quad transceiver cell using on-chip regulated dual-loop PLLs,” in Symp. VLSI Circuits Dig. Tech. Papers, 2002, pp. 88–91.

[10] S. Sidiropoulos and M. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.

[11] B. Song and D. C. Soo, “NRZ timing recovery technique for band-limited channels,” IEEE J. Solid-State Circuits, vol. 32, pp. 514–520, Apr. 1997.

[12] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.

Jared L. Zerbe (M’90) was born in New York, NY, in 1965. He received the B.S. degree in electrical engineering from Stanford University, Stanford, CA, in 1987.

In 1987, he joined VLSI Technology, Inc., where he worked on custom and semicustom ASIC design. In 1989, he joined MIPS Computer Systems, where he designed high-performance CPU floating-point blocks. In 1992, he joined Rambus, Inc., Los Altos, CA, where he has since specialized in the design of high-speed I/O, PLL/DLL clock-recovery, and data-synchronization circuits. He has authored many papers and patents in the area of high-speed clocking and data transmission. He currently leads a design group focused on high-speed backplane serial links.

Carl W. Werner (M’97) was born in Chicago, IL, in 1962. He received the B.S. degree in electrical engineering from the University of Illinois at Urbana-Champaign in 1984.

He was with Siliconix Inc. from 1984 to 1988, and with National Semiconductor from 1988 to 1997, where he worked on CMOS, bipolar, and BiCMOS analog and mixed-signal integrated circuit design. He holds several U.S. and foreign patents. In 1999, he joined the technical staff of Rambus Inc., Los Altos, CA, where he currently manages a team focused on high-speed circuit design and test.

Vladimir Stojanovic (M’00) was born in Kragujevac, Serbia, Yugoslavia. He received the Dipl. Ing. degree from the University of Belgrade, Yugoslavia, in 1998 and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 2000. He is currently working toward the Ph.D. degree at Stanford University, where he is a member of the VLSI Research Group.

He has also been with Rambus, Inc., Los Altos, CA, since 2001. He was a Visiting Scholar with the Advanced Computer Systems Engineering Laboratory, Department of Electrical and Computer Engineering, University of California, Davis, during 1997–1998. His current research interests include design and modeling of CMOS-based electrical and optical interfaces, application of digital communication techniques to high-speed links (equalization, noise cancellation), and high-speed mixed-signal IC design.

Fred Chen (M’00) was born in Wichita, KS, in 1975. He received the B.S. degree in electrical engineering from the University of Illinois at Urbana-Champaign in 1997 and the M.S. degree in electrical engineering from the University of California at Berkeley in 2000.

In 1997, he joined Motorola, Libertyville, IL, where he worked on discrete RF design for CDMA cell phones. In 2000, he joined Rambus, Inc., Los Altos, CA, where he has worked on the design of high-speed I/O and equalization circuits.


Jason Wei (M’00) was born in Taipei, Taiwan, R.O.C. He received the B.S. degree from National Cheng-Kung University, Tainan, Taiwan, in 1985 and the M.S. degree in electrical engineering from San Jose State University, San Jose, CA, in 1989.

From 1989 to 1994, he was with Raytheon and OKI Semiconductor working on emitter-coupled logic and analog/digital PLLs. In 1994, he joined the technical staff at Rambus Inc., Los Altos, CA, where he has designed high-speed CMOS PLL circuits for clock recovery and data synchronization and high-speed I/O circuits.

Grace Tsang (M’01) received the B.S.E.E. degree from the Massachusetts Institute of Technology, Cambridge, in 1984, and the M.S.E.E. degree from the California Institute of Technology, Pasadena, in 1991.

She designed high-speed bipolar circuits at Tektronix from 1984 to 1989. From 1991 to 1995, she was with Western Digital working on hard disk drive read channel ICs. Since 1995, she has been with Rambus Inc., Los Altos, CA, working on chip-to-chip interfaces and clock recovery.

Dennis Kim (M’01) was born in Pusan, South Korea, in 1975. He received the B.S. degree in electrical and biomedical engineering from Duke University, Durham, NC, in 1997 and the M.S.E. degree in electrical engineering from Stanford University, Stanford, CA, in 2000.

In 2000, he joined the technical staff of Rambus, Inc., Los Altos, CA, where he has been engaged in high-speed I/O circuit design and test.

William F. Stonecypher was born in Huntsville, AL, in 1964. He received the B.S. and M.S. degrees in electrical engineering from the Georgia Institute of Technology, Atlanta, in 1986 and 1987, respectively.

He joined Rambus, Inc., Los Altos, CA, in 1992, where he is currently a Principal Engineer working on the development of high-speed serial links.

Andrew Ho (M’99) was born in Taipei, Taiwan, R.O.C., in 1979. He received the B.S. and M.S. degrees in electrical engineering from Stanford University, Stanford, CA, in 2001 and 2002, respectively.

In 2002, he joined Rambus Inc., Los Altos, CA, working in the areas of high-speed signaling and I/O design and test.

Timothy P. Thrush was born in Kansas City, MO. He attended Foothill College, Los Altos Hills, CA.

He has been an IC layout designer since 1972, working for Fairchild, Signetics, Intel, Hewlett-Packard Labs, and Digital Equipment Corporation. He has been a Member of the Technical Staff with Rambus Inc., Los Altos, CA, since 1991.

Ravi T. Kollipara (M’88) is a Senior Principal Engineer with Rambus Inc., Los Altos, CA, responsible for the signal integrity of the high-speed serial link channels. His responsibilities include design and development of models for packages, line cards, backplanes, connectors, traces and vias, and performing simulations for system level voltage and timing budgets and jitter characterization.

Mark A. Horowitz (S’77–M’78–SM’95–F’00) received the B.S. and M.S. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1978, and the Ph.D. degree from Stanford University, Stanford, CA, in 1984.

He is the Yahoo Founder’s Professor of Electrical Engineering and Computer Science at Stanford University. His research area is in digital system design, and he has led a number of processor designs including MIPS-X, one of the first processors to include an on-chip instruction cache; TORCH, a statically scheduled, superscalar processor that supported speculative execution; and FLASH, a flexible DSM machine. He has also worked in a number of other chip design areas, including high-speed and low-power memory design, high-bandwidth interfaces, and fast floating point. In 1990, he took leave from Stanford to help start Rambus Inc., Los Altos, CA, a company designing high-bandwidth memory interface technology. His current research includes multiprocessor design, low-power circuits, memory design, and high-speed links.

Dr. Horowitz received the Presidential Young Investigator Award and an IBM Faculty Development Award in 1985. In 1993, he received the Best Paper Award at the IEEE International Solid-State Circuits Conference.

Kevin S. Donnelly (M’91) was born in Los Angeles, CA, in 1961. He received the B.S. degree in electrical engineering and computer science from the University of California at Berkeley in 1985 and the M.S. degree in electrical engineering from San Jose State University, San Jose, CA, in 1992.

Since 1984, he has worked at Memorex, Sipex, and National Semiconductor, specializing in bipolar and BiCMOS analog circuits for disk drive read/write and servo channels. In 1992, he joined Rambus, Inc., Los Altos, CA, where he has designed high-speed CMOS PLL circuits for clock recovery and data synchronization and high-speed I/O circuits. He is currently the Vice President of a division at Rambus responsible for developing high-speed serial links. He has authored several papers and received several patents in the areas of high-speed clocking and I/O circuits.