an 8mw frequency detector for 10gb/s half-rate cdr …ali/papers/cicc2013-p12-5.pdf · an 8mw...

4
An 8mW Frequency Detector for 10Gb/s Half-Rate CDR using Clock Phase Selection Mohammad Sadegh Jalali 1 , Ravi Shivnaraine 1 , Ali Sheikholeslami 1 , Masaya Kibune 2 , Hirotaka Tamura 2 1 Department of Electrical and Computer Engineering, University of Toronto, Canada, 2 Fujitsu Laboratories Limited, Japan Abstract—A half-rate single-loop CDR with a new frequency detection scheme is introduced. The proposed frequency detector selects between the clock phases (I and Q) to reduce cycle slipping, hence improving lock time and capture range. This frequency detector, implemented within a 10Gb/s CDR in Fujitsu 65nm CMOS, consumes only 8mW, but improves the capture range by up to 3.6×. The measured capture range with the FD is from 8.675Gb/s to 11Gb/s. I. I NTRODUCTION Most clock and data recovery (CDR) circuits include two locking mechanisms: one for frequency and one for phase. Conventional frequency detectors (in reference-less CDRs) are based on either the analog [1-2] or the digital imple- mentation of the quadricorrelator architecture [3-7]. Digital quadricorrelator-based frequency detectors (DQFD) for half- rate CDRs typically sample four phases of the clock (0 , 45 , 90 and 135 ) at each data edge to uniquely identify phase rotation, as illustrated in Fig. 1. Accordingly, as the data edge crosses the quadrant boundaries of the clock (CK), the FD asserts pulses, directly contributing to the charge-pump (CP) current, slowing down phase rotation, and pushing the VCO towards lock [3-4]. When frequency error is close to zero, the FD becomes inactive, and the PD takes full control of the VCO. Due to concurrent operation, the two loops can interfere with each other [5, 6] and delay phase locking. In [6], the FD and PD loops are uncoupled, where the FD changes the VCO frequency by switching capacitors in and out of the tank. However, this complicates the VCO design. In contrast, we propose an embedded FD, shown in Fig. 1, that uses one loop and two phases of the clock (I and Q). The proposed embedded FD affects the CP through the PD, instead of directly controlling the CP current, enabling the PD to deal with frequency offset. Prior to lock, the clock phases (CK I and CK Q ) into the PD are continually swapped (in a manner described in Section II) to reduce frequency error. We will show in this paper that this clock phase selection (CPS) scheme has a much lower power consumption and complexity than the DQFD. We implement the proposed FD within a half- rate CDR in 65nm CMOS, and demonstrate that the proposed FD, using only 8mW, on average increases the capture range by 3.6× to 8.675Gb/s - 11Gb/s. The rest of this paper is organized as follows. Section II introduces the basic idea of CPS-based frequency detectors. In section III, the FD implementation, along with the circuits used in the CDR are shown. Simulation and experimental results from a test-chip fabricated in a 65-nm CMOS process are included in Section IV. Finally, sectionV concludes the paper. Conventional half-rate CDR Proposed half-rate CDR PD FD + LF DATA PD LF DATA FD 135° 90° CKREC CKREC VCO CP2 CP1 ICP CP ICP Fig. 1. Basic architecture of conventional and proposed half-rate FD II. PROPOSED FREQUENCY DETECTION TECHNIQUE Fig. 2 compares the operation principle of a conventional half-rate DQFD [3-4] with that of the proposed FD in the presence of a large frequency offset. Assuming a positive frequency offset (f DAT A >f CK ), the data phase in terms of the clock phase rotates clockwise, as shown in Fig. 2(a). In a conventional CDR, in regions 3-4 and 7-8, phase error is positive while in the other regions, it is negative. As a result, the PD output increases the VCO frequency in half of the regions and decreases it in the other half. This makes the net PD output near zero and causes cycle slipping. The DQFD, however, is able to detect the direction of this phase rotation by sampling all clock phases on the data edge and tracking the change in polarity of these samples. The FD then asserts pulses to oppose phase rotation. The sum of the PD and the FD outputs reduces frequency error [3-4]. Fig. 2(b) shows the CP current (Φ err ) if CK I or CK Q is selected as the PD clock (CK REC in Fig. 1). In this figure, the net charge into the loop filter due to CK I and CK Q are shown in blue (light) and red (dark), respectively. Here, similar to the conventional case, cycle slipping occurs regardless of whether CK I or CK Q is used. However, we make a critical observation: since CK I and CK Q are 90 apart, at any given time, phase error with respect to either CK I or CK Q will move the VCO frequency in the correct direction. Therefore, if CK REC could switch between CK I and CK Q at appropriate times, it is possible to reduce cycle slipping (ideally avoid it altogether), and the need for a secondary FD loop is obviated. This is done by comparing CK I and CK Q on the rising edge of the data in regions 1-2 and 5-6 and choosing the one closest to the eye center. We disable switching in regions 3-4 and 7-8. This is conceptually demonstrated in Fig. 2(b). We observe that the PD output in regions 1-2 and 5-6 averages to zero, while in regions 3-4 and 7-8, the PD output averages to a positive 978-1-4673-6146-0/13/$31.00 ©2013 IEEE

Upload: vonhu

Post on 25-Mar-2018

218 views

Category:

Documents


4 download

TRANSCRIPT

An 8mW Frequency Detector for 10Gb/s Half-RateCDR using Clock Phase Selection

Mohammad Sadegh Jalali1, Ravi Shivnaraine1, Ali Sheikholeslami1, Masaya Kibune2, Hirotaka Tamura21Department of Electrical and Computer Engineering, University of Toronto, Canada, 2Fujitsu Laboratories Limited, Japan

Abstract—A half-rate single-loop CDR with a new frequencydetection scheme is introduced. The proposed frequency detectorselects between the clock phases (I and Q) to reduce cycleslipping, hence improving lock time and capture range. Thisfrequency detector, implemented within a 10Gb/s CDR in Fujitsu65nm CMOS, consumes only 8mW, but improves the capturerange by up to 3.6×. The measured capture range with the FDis from 8.675Gb/s to 11Gb/s.

I. INTRODUCTION

Most clock and data recovery (CDR) circuits include twolocking mechanisms: one for frequency and one for phase.Conventional frequency detectors (in reference-less CDRs)are based on either the analog [1-2] or the digital imple-mentation of the quadricorrelator architecture [3-7]. Digitalquadricorrelator-based frequency detectors (DQFD) for half-rate CDRs typically sample four phases of the clock (0◦, 45◦,90◦ and 135◦) at each data edge to uniquely identify phaserotation, as illustrated in Fig. 1. Accordingly, as the data edgecrosses the quadrant boundaries of the clock (CK), the FDasserts pulses, directly contributing to the charge-pump (CP)current, slowing down phase rotation, and pushing the VCOtowards lock [3-4]. When frequency error is close to zero,the FD becomes inactive, and the PD takes full control of theVCO. Due to concurrent operation, the two loops can interferewith each other [5, 6] and delay phase locking. In [6], theFD and PD loops are uncoupled, where the FD changes theVCO frequency by switching capacitors in and out of the tank.However, this complicates the VCO design.

In contrast, we propose an embedded FD, shown in Fig. 1,that uses one loop and two phases of the clock (I and Q).The proposed embedded FD affects the CP through the PD,instead of directly controlling the CP current, enabling the PDto deal with frequency offset. Prior to lock, the clock phases(CKI and CKQ) into the PD are continually swapped (in amanner described in Section II) to reduce frequency error. Wewill show in this paper that this clock phase selection (CPS)scheme has a much lower power consumption and complexitythan the DQFD. We implement the proposed FD within a half-rate CDR in 65nm CMOS, and demonstrate that the proposedFD, using only 8mW, on average increases the capture rangeby 3.6× to 8.675Gb/s - 11Gb/s.

The rest of this paper is organized as follows. Section IIintroduces the basic idea of CPS-based frequency detectors. Insection III, the FD implementation, along with the circuits usedin the CDR are shown. Simulation and experimental results

from a test-chip fabricated in a 65-nm CMOS process areincluded in Section IV. Finally, section V concludes the paper.

Conventional half-rate CDR Proposed half-rate CDR

PD

FD

+LF

DATA

PD

LFDATA

FD

135°

90°

CKREC

CKREC

VCO

CP2

CP1 ICP

CP ICP

Fig. 1. Basic architecture of conventional and proposed half-rate FD

II. PROPOSED FREQUENCY DETECTION TECHNIQUE

Fig. 2 compares the operation principle of a conventionalhalf-rate DQFD [3-4] with that of the proposed FD in thepresence of a large frequency offset. Assuming a positivefrequency offset (fDATA > fCK), the data phase in termsof the clock phase rotates clockwise, as shown in Fig. 2(a).In a conventional CDR, in regions 3-4 and 7-8, phase error ispositive while in the other regions, it is negative. As a result,the PD output increases the VCO frequency in half of theregions and decreases it in the other half. This makes the netPD output near zero and causes cycle slipping. The DQFD,however, is able to detect the direction of this phase rotationby sampling all clock phases on the data edge and trackingthe change in polarity of these samples. The FD then assertspulses to oppose phase rotation. The sum of the PD and theFD outputs reduces frequency error [3-4].

Fig. 2(b) shows the CP current (∝ Φerr) if CKI or CKQ

is selected as the PD clock (CKREC in Fig. 1). In this figure,the net charge into the loop filter due to CKI and CKQ areshown in blue (light) and red (dark), respectively. Here, similarto the conventional case, cycle slipping occurs regardless ofwhether CKI or CKQ is used. However, we make a criticalobservation: since CKI and CKQ are 90◦ apart, at any giventime, phase error with respect to either CKI or CKQ willmove the VCO frequency in the correct direction. Therefore, ifCKREC could switch between CKI and CKQ at appropriatetimes, it is possible to reduce cycle slipping (ideally avoid italtogether), and the need for a secondary FD loop is obviated.This is done by comparing CKI and CKQ on the rising edgeof the data in regions 1-2 and 5-6 and choosing the one closestto the eye center. We disable switching in regions 3-4 and 7-8.This is conceptually demonstrated in Fig. 2(b). We observe thatthe PD output in regions 1-2 and 5-6 averages to zero, whilein regions 3-4 and 7-8, the PD output averages to a positive

978-1-4673-6146-0/13/$31.00 ©2013 IEEE

1

23

4

5

6 78

I(0°)

Q(90°) fDATA>fCK

8 7 6 5 4 3 2 1

(a)

t

FD pulse

PD pulse8 7 6 5 4 3 2 1

IMAX

ICP

t

Avg. CP Current > 0

ICP

Avg. CP Current

IMAX

fDATA<fCK

CKREC=CKI

t

Avg. CP Current=0

CKREC=CKQ

fDATA>fCK

(c)

8 7 6 5 4 3 2 1

fDATA>fCK

(b)

Avg. CP Current < 0

fDATA<fCK

t

ICP

-IMAX

IMAX

ICP

IMAX

t

1 2 3 4 5 6 7 8 2 3 4 5 6 71 8

Avg. CP Current=0

CKREC=CKI

CKREC=CKQ

ICP

Fig. 2. Operation principle of conventional (a) and proposed half-rate FD(b) and (c)

value, therefore, overall, the VCO frequency moves in thedirection of reducing the frequency offset over every cycle slipperiod. Similarly, as shown in Fig. 2(c), when fDATA < fCK ,switching between CKI and CKQ results in a net negativeCP current, again moving the VCO frequency in the directionof reducing the frequency offset.

As we will see later in this paper, compared to half-rateDQFDs, the proposed FD is simpler to implement, offers re-duced lock time and increased capture range while consumingless power.

III. CDR ARCHITECTURE AND IMPLEMENTATION

Fig. 3 shows a block level implementation of the proposedhalf-rate FD embedded in the PD loop. We use a linear half-rate PD [8], along with a differential ring VCO. After a datarising edge, one of the two clock phases CKI and CKQ arechosen by the proposed FD, and is fed back to the PD. IfCKREC is positive at the rising edge of the data, the higher (inamplitude) of CKI and CKQ is chosen, while if it is negative,the lower of the two is chosen. Fig. 3 also shows an examplewhere the edge falls in region 6, and the PD clock is swappedfrom CKI to CKQ. This simple FD logic removes the needfor a secondary FD loop, uses a total of only 45 transistors,and consumes 4.75× less power in simulations than the half-rate FDs in [3] and [4]. After the CDR locks, the FD will

become inactive since the data edge will occur at the samephase and the swapping stops.

While in a conventional FD, after phase locking, phase candeviate from its locked position by 1UIpp without activatingthe FD, in a CPS-based FD, this zone is reduced to 0.5UIpp.This is because the CPS-based FD uses the instantaneousphase information to feed the desirable clock phase to thephase detector. To solve this problem, the FDlock signalshown in Fig. 3 can be used to turn the CPS-based FD off.

CP & LF

D QD Q

CKREC

1

0CKI

CKQ

Proposed Half-rate FD

DATA[0]REC

DATA

D Q

L

D Q

LD Q

L

D Q

L

Half-rate linear PDFD lock

eg. CKREC(before data edge)=CKI

Toggle Detector

REF

ERR

CKSEL

CKREC(tedge+ɛ ) = CKI

CKQ

CKSEL=0

CKSEL=1

CKSEL= (CKI>CKQ) (CKREC(tedge)>0)

CKICKQ

1 2 3 4 5 6 7 8

DATA

CKREC

Fig. 3. Half-rate linear CDR with the proposed half-rate FD

Fig. 4 shows the circuit implementation of the VCO. TheVCO delay cell is based on a differential pair with a cross-coupled stage. The delay of each stage is controlled byVTUNE , which adjusts the trans-conductance of the negative-gm stage, varying delay. VTUNE is used in a differential fash-ion to maintain a constant common-mode at the VCO output.The single-ended to differential converter circuit converts thesingle-ended VCNT to the differential VTUNE .

VCNT

CKOUT

Single to

DifferentialVTUNE

+

_

+

_

OUT

_

+

_

IN

+VTUNE

VBIAS_

VBIAS

VTUNE

+_

VCNT +

Fig. 4. Half-rate linear CDR with the proposed half-rate FD

Fig. 5 shows the circuit implementation of the charge-pump.The current associated with the error signal is twice as largeas the current associated with the reference signal, due to thehalf-rate nature of the PD [8]. An on-chip DAC is used toadjust the CP current during capture range measurements.

CP0 CP3

W2

8W2W2

ICP,NOM

IDN

VCTRL

VBN

CDC

EN

EN

EN

EN

W1 2W1

+

_

ERR+

_REF

EN

ɸERR = 2×ERR - REF

Fig. 5. Circuit diagram of the charge-pump

IV. SIMULATION AND EXPERIMENTAL RESULTS

Fig. 6(a) shows the behavioral simulation results of thesystem with a 3% frequency offset. The figure shows thatthe CDR does not acquire lock until the FD is turned on.Fig. 6(b) characterizes the response of the proposed FD versusfrequency offset with a PRBS7 10Gb/s input and compares itto the response of the previous FD [3] with the same inputs,CP currents, and loop filter values. The average gain of theproposed FD is 1.9 times that of the conventional FD. Fig. 6(c)shows the lock time of the CDR with a PRBS7 10Gb/s input(defined as the time it takes for the CDR to start producingerror free data) with a conventional half-rate FD [3] and a CPS-based FD where both systems are simulated under the sameconditions. On average, the CPS-based FD locks 1.8 timesfaster than the conventional frequency detector. Also, the CDRwith the conventional FD fails to lock for very large frequencyoffsets (indicated by the blue region) which is predicted byFig. 6(b) results. Also, our behavioral simulations show thatthe proposed FD locks 2.7 times faster than the FD in [4].

Fig. 7 shows the measured tuning range of the ring VCOwhich is from 3.94GHz to 6.25GHz.

Fig. 8(a) shows the measured recovered demultiplexed eyeof the CDR with and without the FD. In both cases, weinitialize the VCO frequency to 5GHz. With the FD off, theCDR does not lock to a PRBS7 input at 8.675Gb/s, whilelocking is acquired once the frequency detector is turned on.Fig. 8(b) shows the spectrum of the recovered CK beforeand after lock where the incoming data rate is 9.7Gb/s. Inone case, CDR loop is opened and the VCO frequency isheld at 10Gb/s (5GHz). This forced frequency error causesconstant swapping of CKI and CKQ, creating spurs in theclock spectrum. Closing the loop locks the CDR; here, clockspectrum is clean even though the FD is on. This is becausethe freq. error is zero and clock swapping no longer occurs.

Fig. 9 shows the measured capture range of the CDR(defined as (fmax-fmin)/10Gbps) with and without the FDand its jitter tolerance (JT). The VCO frequency is initializedto 5GHz, and PRBS7 data frequency is swept. As expected,increasing CP current improves the capture range. The maxi-mum locking range of the CDR without the frequency detectoris from 9.5Gb/s to 10.15Gb/s, while with the FD this rangeis increased to 8.675Gb/s to 11Gb/s. Due to the bandwidthlimitation of the test fixture, the maximum reliable data rate

FD OFF

Time (ns)

No

rmaliz

ed P

ha

se

err

or

(UI)

VC

O c

on

tro

l

vo

ltag

e (

V)

(a)

(b)

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-1

-0.8

-0.4

0

0.4

0.8

1

Frequency offset (GHz)

No

rmaliz

ed C

P c

urr

en

t

0

-0.50 40 60 1208020 100

0.50 40 60 1208020 100

FD ON

Previous FD [3]Proposed FD

0.5

0.6

0.7

0.8

(c)

Frequency offset (GHz)

Lo

ck t

ime

s)

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 10

0.5

1

1.5

2

Previous FD [3]Proposed FD

Lock range for the

FD in [3]

Fig. 6. Behavioral simulation results for the proposed FD (a) the systemlocks only after the FD is turned on, (b) normalized charge-pump currentversus frequency error with a PRBS7 pattern (c) lock time of the CDR withPRBS7 pattern

0 0.2 0.4 0.6 0.8 1 1.2 1.43.5

4

4.5

5

5.5

6

6.5

Control Voltage

(V)

Me

asu

red

Oscilla

tio

n

fre

que

ncy (

GH

z)

VCNT (V)

Fig. 7. Measured VCO tuning range

for measurements was found to be 11Gb/s. Limited CP swing,caused by charge-pump current sources entering triode, resultsin the CDR capture range being less than the VCO tuningrange. The use of a linear phase detector further limits the CPswing. A bang-bang PD can be used if an even higher capturerange is needed. The measured JT of the CDR for a BER lessthat 10−12 at 10Gb/s is 0.2UIpp at high frequency.

The chip is fabricated in Fujitsu’s 65nm CMOS process. Thedie photo is shown in Fig. 10. The CDR area is 350×400µm2,

Without FD

(b)

4.85 Gb/s, PRBS7

4.3375Gb/s PRBS7

(a)

With FD

CK spectrum after lock (CDR closed, FD on)CK spectrum before lock (CDR open, FD on)

Fig. 8. (a) Demuxed eye for the CDR with and without the FD with an8.675Gb/s PRBS7 input, (b) clock spectrum before and after lock

Measured Capture Range

105

0.1

1

Jitte

r a

mp

litu

de

(U

I pp)

Frequency (Hz)10

610

7

Measured jitter tolerance (@ 10 Gb/s)

1x 1.5x 2x4

6

8

10

12

14

16

18

20

22

24

Charge pump current

Ca

ptu

re r

an

ge (

%) FD on

FD off

On average,

Capture range

increases by

3.6x

Fig. 9. Capture range and JT measurement results

of which 100×65µm2 is occupied by the proposed FD. At a1.2V supply and operating at 10Gb/s, the chip consumes a totalpower of 37.2mW when the FD is on and 28.8mW when thefrequency detector is off. Hence the CPS-based FD consumesonly 8.4mW.

Finally, Table I summarizes the results and compares thispaper against previous work. Also, note that simulating allfrequency detectors in 65nm CMOS at 10Gb/s (with the samegates) result in a power consumption of 6mW for the proposedFD and 29.5mW and 28.6mW for the FDs in [3] and [4],respectively. Since the details of the design in [6] and [7] arenot available, they cannot be simulated. Also exact transistorcount cannot be obtained.

V. CONCLUSION

In this paper, a novel clock phase selection based frequencydetector for half-rate CDRs is introduced. It was shown that bychanging the PD clock (switching between CKI and CKQ)at the right time, the phase detector can be capable of dealingwith frequency offset itself, removing the need for a secondaryFD loop. Based on this idea, a chip was fabricated in Fujitsu65nm CMOS process. It was shown that the FD increases the

2mm

Digital Output

Loop filter

PD

VCO

Control Inputs

2mm

FD

Output buffer BERT400μm

350μm

65μm

100μm

Fig. 10. Die photo

TABLE ICOMPARISON OF CDR RESULTS

FD Type Tech. Lock Lock No. of FD(nm) Rate range trans- power

(Gb/s) (%) istors (mW)[3] Half-rate 180 3.125 11.52 156 30.6[4] Half-rate 180 10 14.3 147 42.2[6] Full-rate 65 10 30 >73 NA[7] Full-rate 180 3.125 16 >76 15.5

This work Half-rate 65 10 23.25 45 8.4

capture range to 23.25% while consuming only 8mW. Oursimulation and measurement results show that this inherentFD consumes much less power and area than its conventionalequivalents.

ACKNOWLEDGMENT

The authors would like the acknowledge CMC Microsys-tems for the provision of test equipment and CAD tools.

REFERENCES

[1] D. Richman, “Color-Carrier Reference Phase Synchronization Accuracyin NTSC Color Television,” Proceedings of the IRE, vol. 42, pp. 106-133,Jan. 1954.

[2] B. Razavi, “A 2.5-Gb/sec 15-mW Clock Recovery Circuit,” IEEE Journalof Solid-State Circuits, vol. 31, pp. 472-480, April 1996.

[3] R. Yang, S. Chen, and S. Liu, “A 3.125-Gb/s Clock and Data RecoveryCircuit for the 10-Gbase-LX4 Ethernet,” IEEE Journal of Solid-StateCircuits, vol. 39, pp. 1356-1360, Aug. 2004.

[4] J. Savoj and B. Razavi, “A 10-Gb/s CMOS Clock and Data RecoveryCircuit with a Half-Rate Binary Phase/Frequency Detector,” IEEE Journalof Solid-State Circuits, vol. 38, pp. 13-21, Jan. 2003.

[5] D. Dalton, S. Fallahi, M. Kargar, M. Khanpour, and A. Momtaz, “A 12.5-Mb/s to 2.7-Gb/s Continuous-Rate CDR With Automatic Frequency Ac-quisition and Data-Rate Readback,” IEEE Journal of Solid-State Circuits,vol. 40, pp. 2713-2725, Dec. 2005.

[6] N. Kocaman, K. Chai, E. Evans, M. Ferriss, D. Hitchcox, P. Murray, S.Selvanayagam, P. Shepherd, and L. DeVito, “An 8.511.5Gbps SONETTransceiver with Reference less Frequency Acquisition,” IEEE CustomIntegrated Circuits Conference, pp. 1-4, Sep. 2012.

[7] M. Lee, and T. Lee, “A clock and data recovery circuit with wide linearrange freq. detector,” IEEE International Symposium on VLSI design,automation and test, pp. 121-124, April 2008.

[8] J. Savoj, and B. Razavi, “A 10-Gb/s CMOS Clock and Data RecoveryCircuit with a Half-Rate Linear Phase Detector,” IEEE Journal of Solid-State Circuits, vol. 36, pp. 761-768, May 2001.