
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 7, JULY 2005 833

A Source-Synchronous Double-Data-Rate Parallel Optical Transceiver IC

Ping Gui, Member, IEEE, Fouad E. Kiamilev, Member, IEEE, Xiaoqing Wang, Student Member, IEEE, Michael J. MacFadden, Student Member, IEEE, Xingle Wang, Student Member, IEEE, Nick Waite, Michael W. Haney, Member, IEEE, and Charlie Kuznia, Member, IEEE

Abstract—Source-synchronous double-data-rate (DDR) signaling is widely used in electrical interconnects to eliminate clock recovery and to double communication bandwidth. This paper describes the design of a parallel optical transceiver integrated circuit (IC) that uses source-synchronous DDR optical signaling. On the transmit side, two 8-b electrical inputs are multiplexed, encoded, and sent over two high-speed optical links. On the receive side, the procedure is reversed to produce two 8-b electrical outputs. The proposed IC integrates analog vertical-cavity surface-emitting laser (VCSEL) drivers and optical receivers with digital DDR multiplexing, serialization, and deserialization circuits. It was fabricated in a 0.5-µm silicon-on-sapphire (SOS) complementary metal–oxide–semiconductor (CMOS) process. Linear arrays of quad VCSELs and photodetectors were attached to the proposed transceiver IC using flip-chip bonding. A free-space optical link system was constructed to demonstrate correct IC functionality. The test results show successful transceiver operation at a data rate of 500 Mb/s with a 250-MHz DDR clock, achieving a gigabit of aggregate bandwidth. While the proposed DDR scheme is well suited for low-skew fiber-ribbon, free-space, and waveguide optical links, it can also be extended to links with higher skew with the addition of skew-compensation circuitry. To the authors’ knowledge, this is the first demonstration of parallel optical transceivers that use source-synchronous DDR signaling.

Index Terms—Flip-chip, high-speed-interconnect, optical interconnects, optoelectronic-integrated circuits, source-synchronous signaling.

I. INTRODUCTION

Typical input–output (I/O) architectures transmit a single data word on each positive or negative clock edge. Double-data-rate (DDR) signaling, in which data is sent on both edges of the clock, is widely used in bandwidth-limited applications to improve data throughput. Many emerging I/O standards, such as HyperTransport [1], RapidIO [2], and POS-PHY Level 4 [3], employ source-synchronous DDR schemes in their link protocols. There are two major advantages to DDR design: 1) the clock frequency is halved for a given data rate (this reduces power consumption because the number of clock transitions per unit time is halved) and 2) the frequency of the clock is identical to the maximum toggle frequency of the data. This can be seen by comparing single-data-rate (SDR) signaling and DDR signaling in Fig. 1. The second feature results in optimum bandwidth utilization in a limited-bandwidth channel and simplifies receiver buffer design.

Fig. 1. Source-synchronous parallel links using (a) SDR signaling and (b) DDR signaling.

Manuscript received March 1, 2004; revised November 17, 2004. This work was supported in part by the Defense Advanced Research Project Agency (DARPA) through the Consortium for Optical and Optoelectronic Technologies in Computing (COOP) at George Mason University, Fairfax, VA, and by the Center for Optoelectronics at Brown University/DARPA, Providence, RI, under Subcontract 1120-24596.

P. Gui is with the Department of Electrical Engineering, Southern Methodist University, Dallas, TX 75275 USA (e-mail: [email protected]).

F. E. Kiamilev, X. Wang, M. J. MacFadden, X. Wang, N. Waite, and M. W. Haney are with the Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716 USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

C. Kuznia is with the Peregrine-Semiconductor Corporation, San Diego, CA 92121 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TVLSI.2005.850101

Incorporating DDR signaling into the proposed parallel optical transceiver design, a two-channel parallel optical transceiver integrated circuit (IC) has been designed and implemented using a source-synchronous DDR scheme. Analog vertical-cavity surface-emitting laser (VCSEL) drivers and optical receivers are integrated with digital DDR multiplexing, serialization, and deserialization circuits in the proposed transceiver IC. On the transmit side, two 8-b electrical inputs are multiplexed into two DDR serial streams and sent over two high-speed optical links through VCSEL driver circuits. The clock signal, oscillating at half the serial data rate, is transmitted along with each data channel. On the receive side, the optical serial data are received by a photodetector (PD) receiver and passed to DDR flip-flops that sample the data on the rising and falling edges of the clock signal. The serial data are then further demultiplexed into parallel data.
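The relation between serial data rate and clock frequency described above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function and variable names are illustrative and do not come from the paper, and the 500-Mb/s and two-channel figures are the ones quoted in the abstract.

```python
def clock_frequency_hz(data_rate_bps: float, bits_per_clock_cycle: int) -> float:
    """Clock frequency needed to move data_rate_bps when each clock
    cycle carries bits_per_clock_cycle bits on one data line."""
    return data_rate_bps / bits_per_clock_cycle


DATA_RATE_BPS = 500e6   # per-channel serial data rate quoted in the paper
NUM_DATA_CHANNELS = 2   # two data channels, each with its own clock channel

# SDR: one bit per clock cycle. DDR: one bit per clock edge, two per cycle.
sdr_clock_hz = clock_frequency_hz(DATA_RATE_BPS, 1)
ddr_clock_hz = clock_frequency_hz(DATA_RATE_BPS, 2)

# A 1010... pattern is the fastest-toggling data; its fundamental
# frequency is half the bit rate, which equals the DDR clock frequency.
max_data_toggle_hz = DATA_RATE_BPS / 2

aggregate_bps = NUM_DATA_CHANNELS * DATA_RATE_BPS

print(f"SDR clock: {sdr_clock_hz / 1e6:.0f} MHz")
print(f"DDR clock: {ddr_clock_hz / 1e6:.0f} MHz")
print(f"Aggregate: {aggregate_bps / 1e9:.1f} Gb/s")
```

With DDR the clock channel never has to toggle faster than the data channel, which is why a clock channel and a data channel can share the same driver and receiver bandwidth.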

The IC was fabricated in a 0.5-µm silicon-on-sapphire (SOS) complementary metal–oxide–semiconductor (CMOS) process with 1 × 4 VCSELs and 1 × 4 PDs heterogeneously bonded to the CMOS circuitry. To test the functionality of the IC, a free-space optical-link demonstration system was built between two chip-carrier boards separated by 76.2 mm on a printed circuit board (PCB) main board. Both electrical and optical tests have been performed on the IC. The test results show that the transceiver IC and the optical links are fully operational at a data rate of up to 500 Mb/s per channel with a 250-MHz DDR clock, achieving 1 Gb/s of aggregate bandwidth. The optical links used for the demonstration suffered minimal skew, but the DDR scheme can be extended to links with higher skew with the addition of skew-compensation circuitry [4].

1063-8210/$20.00 © 2005 IEEE

The DDR scheme was reported in several serial optical link designs [5], [6]. In these serial link designs, a phase-locked-loop (PLL)-based clock data recovery (CDR) circuit is required at the receiver to generate the clock for data reception. In contrast, a source-synchronous design with parallel links eliminates the PLL-based CDR, which reduces the complexity of the receiver. Several source-synchronous parallel optical interconnection designs for multiprocessors and for chip-to-chip and board-to-board communication were reported in [7]–[11], but the signaling schemes used in these systems were mainly based on SDR. In addition, the serialization and deserialization digital circuits were located on a separate chip from the analog VCSEL driver and optical receiver circuits. To the authors’ knowledge, this paper reports the first demonstration of an integrated source-synchronous DDR parallel optical transceiver.

The remainder of the paper is organized as follows. Section II introduces the SOS process integration technology by Peregrine-Semiconductor (San Diego, CA). Section III describes the mixed-signal IC design. Both the analog transceiver circuits and the digital logic circuitry that implements DDR serialization and deserialization are described in detail. Section IV describes the demonstration system, including the PCB design and the free-space optical link setup. Section V presents the electrical and optical test results, and Section VI concludes the paper.

II. INTEGRATION TECHNOLOGY

The optical characteristics of the SOS process allow flip-chip integration of VCSELs and PDs directly onto the ultra-thin-silicon (UTSi) substrate. As shown in Fig. 2(a), the active VCSEL apertures are bonded face-down on the SOS chip with the optical signals passing through the substrate. This allows low-parasitic connections to the optoelectronic (OE) devices in a very simple physical package. Fig. 2(b) and (c) show an array of four VCSEL driver circuits at 250-µm pitch before and after a quad VCSEL array is flip-chip-bonded to it [12].

III. MIXED-SIGNAL IC DESIGN

A. IC Architecture

The transceiver IC consists of separate transmitter and receiver circuitry. Typically, the transmitter and receiver are integrated on the same chip, but due to chip area limitations, the VCSEL array and the PD array were integrated on separate identical chips.

Fig. 3 shows the functional block diagram of the IC. The transmit section has two 8:1 serializers, which convert a 16-b parallel electrical CMOS input into two high-speed serial data streams. The VCSEL drivers convert the serial data streams into optical signals, which carry the data across the free-space links. On the receive side, the high-speed serial data streams are received and then deserialized back into a 16-b parallel electrical output.

Fig. 2. (a) End view of VCSEL flip-chip-bonded to a sapphire substrate. (b) Quad VCSEL driver array before attachment. (c) Quad VCSEL driver array after attachment to VCSEL array.

Fig. 3. Functional block diagram of the DDR transceiver IC.

Quad VCSEL and PD arrays were flip-chip-bonded at the center of the transceiver ICs, forming four optical links: two data channels and two clock channels. Since the clock is transmitted along with the data, clock recovery circuitry is not required on the receiver side, and 8B/10B encoding is not required on the transmitter side.

B. Digital Circuit Design

The serializer/deserializer circuits were designed to convert wide, slow parallel electrical signals to a narrow, fast serial data stream, and vice versa. To implement the DDR scheme, a DDR_MUX and a DDR_DEMUX, each consisting of two flip-flops triggered on opposite edges of the clock signal, were incorporated into the serializer and deserializer circuits, respectively. The DDR_MUX, shown in Fig. 4(a), multiplexes the two input data into a serial stream at each edge of the clock, whereas the DDR_DEMUX, shown in Fig. 4(b), demultiplexes the serial input data into two data outputs at each edge of the clock. This section describes the digital circuit design of the DDR 8:1 serializer and the DDR 1:8 deserializer.

Fig. 4. (a) Circuit diagram for DDR_MUX. (b) Circuit diagram for DDR_DEMUX.

Fig. 5. Schematic diagram for one channel of DDR 8:1 serializer.
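The behavior of the DDR_MUX and DDR_DEMUX described above can be illustrated with a small behavioral model. This is only a sketch of the edge-interleaving idea, not the flip-flop circuits of Fig. 4; the function names and the convention that the rising-edge bit comes first within each clock cycle are assumptions made for the illustration.

```python
from typing import List, Sequence, Tuple


def ddr_mux(rise_bits: Sequence[int], fall_bits: Sequence[int]) -> List[int]:
    """Interleave two bit streams into one DDR stream.

    Each clock cycle contributes two serial bits: the bit launched on
    the rising edge (assumed first), then the bit launched on the
    falling edge.
    """
    if len(rise_bits) != len(fall_bits):
        raise ValueError("both inputs must cover the same number of clock cycles")
    serial = []
    for rise_bit, fall_bit in zip(rise_bits, fall_bits):
        serial.append(rise_bit)
        serial.append(fall_bit)
    return serial


def ddr_demux(serial_bits: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split a DDR stream back into rising-edge and falling-edge samples."""
    if len(serial_bits) % 2 != 0:
        raise ValueError("a DDR stream carries two bits per clock cycle")
    return list(serial_bits[0::2]), list(serial_bits[1::2])


rise = [1, 0, 1, 1]
fall = [0, 0, 1, 0]
stream = ddr_mux(rise, fall)           # 8 serial bits in 4 clock cycles
recovered = ddr_demux(stream)
print(stream, recovered)
```

The round trip returns the two original streams, which mirrors how the receiver's opposite-edge flip-flops undo the transmitter's multiplexing.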

1) DDR Transmitter (8:1 Serializer): Fig. 5 shows the schematic diagram for one channel of the DDR serializer. It is composed of three major stages: the LOAD_GEN; two 4:1 parallel-in-serial-out (PISO) converters, triggered on the rising and the falling edges of the clock, respectively; and a DDR_MUX as the output stage of the serializer [13]. The inputs to the serializer are 8-b parallel data and two clock signals: CLK1X (the system clock, synchronous to the parallel data input) and CLK4X (four times as fast as CLK1X). In our system, CLK4X is generated by the delay-locked-loop (DLL) clock management unit in a Xilinx field-programmable gate array (FPGA). The output of the serializer is the serialized data stream, which is triggered on both edges of CLK4X. A delayed version of CLK4X is sent along with the serial data as the accompanying clock. Buffer stages were added to the CLK4X path, and simulations were performed to ensure that its transition edge falls in the middle of the data bit. The LOAD_GEN module generates the load pulses LOAD_RISE and LOAD_FALL, which are the inputs of PISO_RISE and PISO_FALL, respectively. Fig. 6(a) and (b) show the LOAD_RISE and LOAD_FALL generator circuits, and Fig. 6(e) shows the LOAD_GEN waveforms. Both CLK1X and CLK4X are inputs to the LOAD_GEN module. As can be seen from Fig. 6(e), the LOAD_GEN module is designed such that LOAD_RISE (triggered on the rising edge of CLK4X) and LOAD_FALL (triggered on the falling edge of CLK4X) are generated after the falling edge of CLK1X. Assuming the input data word is positive-edge-triggered with respect to CLK1X, this arrangement guarantees that the parallel data word is most stable when loaded. Both LOAD_RISE and LOAD_FALL last for one cycle of CLK4X. Fig. 6(c) shows a typical PISO circuit that was used in the design. A DDR_MUX, shown in Fig. 4(a), was used at the output stage to combine the outputs of both PISOs into a serial data stream at both edges of CLK4X. Fig. 6(d) shows the transmitter output waveforms [13], [14].

Fig. 6. Schematic diagram for the components of DDR serializer. (a) LOAD_RISE generator. (b) LOAD_FALL generator. (c) PISO. (d) DDR_MUX output waveform. (e) LOAD_RISE and LOAD_FALL waveforms.
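The serializer data path (two 4:1 PISOs feeding a DDR_MUX) can be sketched behaviorally as follows. The paper does not state the bit ordering or how the 8-b word is split between PISO_RISE and PISO_FALL, so the choices here (bit 0 first, even-numbered bits to PISO_RISE, odd-numbered bits to PISO_FALL) are assumptions made only for the illustration, as are all function names.

```python
from typing import List


def split_word(word: int) -> List[int]:
    """Return the 8 bits of word, bit 0 first (assumed ordering)."""
    if not 0 <= word <= 0xFF:
        raise ValueError("word must fit in 8 bits")
    return [(word >> i) & 1 for i in range(8)]


def serialize_word(word: int) -> List[int]:
    """Behavioral 8:1 DDR serializer for one CLK1X period.

    The load pulses are modeled implicitly: both PISOs are loaded once
    per word, then shift out one bit per CLK4X cycle.
    """
    bits = split_word(word)
    piso_rise = bits[0::2]   # assumed: even bits leave on rising edges
    piso_fall = bits[1::2]   # assumed: odd bits leave on falling edges
    serial = []
    for rise_bit, fall_bit in zip(piso_rise, piso_fall):  # 4 CLK4X cycles
        serial.append(rise_bit)   # DDR_MUX output after the rising edge
        serial.append(fall_bit)   # DDR_MUX output after the falling edge
    return serial


CLK4X_HZ = 250e6                 # DDR clock reported in the paper
CLK1X_HZ = CLK4X_HZ / 4          # parallel-word clock
SERIAL_RATE_BPS = 2 * CLK4X_HZ   # one bit per CLK4X edge

print(serialize_word(0b11110000))
print(CLK1X_HZ / 1e6, "MHz word clock,", SERIAL_RATE_BPS / 1e6, "Mb/s serial")
```

Eight bits leave in four CLK4X cycles, so one 8-b word is consumed per CLK1X cycle, which is consistent with CLK4X running at four times CLK1X.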

2) DDR Receiver (1:8 Deserializer): The DDR receiver accepts two serial data streams and their accompanying clock signals and generates a 16-b parallel data stream. Fig. 7 shows the gate-level schematic for one channel of the DDR receiver. It consists of three major stages: the octal data rate demultiplexer, the clock generator, and the parallel data output register at the end of the receiver [14]. Since we use a source-synchronous scheme (CLK4X is sent along with the serial data stream), there is no need for PLL-based clock recovery circuits on the receive side. Instead, half- and quarter-rate phase clocks are generated from CLK4X by the clock generator for demultiplexing the received data. The clock generator, shown in Fig. 7(b), uses three clock dividers (CLKDIVs) to generate the multiple clock outputs. Fig. 8(a) depicts the composition of the CLKDIV, and Fig. 8(b) shows the waveforms of the multiple clock outputs from the clock generator. As shown in Fig. 8(b), C2XR and C2XF are clock signals generated at the rising and falling edges of the input CLK4X, which oscillate at half the rate of CLK4X. Similarly, C1XRR, C1XRF, C1XFR, and C1XFF are clock signals generated at the rising and falling edges of C2XR and C2XF, and oscillate at a quarter of the rate of CLK4X.

Fig. 7. Schematic diagram for one channel of DDR 1:8 deserializer. (a) Octal data rate demultiplexer; (b) clock generator; and (c) parallel data output register.
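The multiphase clock generation described above can be modeled as a chain of edge-triggered toggles. The sketch below is a behavioral illustration only; it is not the gate-level CLKDIV of Fig. 8(a), and the all-zero initial state and the half-cycle time step are assumptions chosen for simplicity.

```python
from typing import Dict, List


def simulate_clock_generator(num_clk4x_cycles: int) -> Dict[str, List[int]]:
    """Sample the six derived clocks once per CLK4X half-cycle.

    Even steps model a rising edge of CLK4X, odd steps a falling edge.
    C2XR/C2XF toggle on CLK4X rising/falling edges; each C1X clock
    toggles on a rising or falling edge of C2XR or C2XF.
    """
    state = {"C2XR": 0, "C2XF": 0, "C1XRR": 0, "C1XRF": 0, "C1XFR": 0, "C1XFF": 0}
    trace: Dict[str, List[int]] = {name: [] for name in state}
    for half_cycle in range(2 * num_clk4x_cycles):
        if half_cycle % 2 == 0:              # rising edge of CLK4X
            state["C2XR"] ^= 1
            if state["C2XR"] == 1:           # C2XR just rose
                state["C1XRR"] ^= 1
            else:                            # C2XR just fell
                state["C1XRF"] ^= 1
        else:                                # falling edge of CLK4X
            state["C2XF"] ^= 1
            if state["C2XF"] == 1:           # C2XF just rose
                state["C1XFR"] ^= 1
            else:                            # C2XF just fell
                state["C1XFF"] ^= 1
        for name, value in state.items():
            trace[name].append(value)
    return trace


def period_in_half_cycles(samples: List[int]) -> int:
    """Smallest shift that maps the sampled waveform onto itself."""
    for shift in range(1, len(samples)):
        if samples[shift:] == samples[:-shift]:
            return shift
    return len(samples)


trace = simulate_clock_generator(8)
for name, samples in trace.items():
    print(f"{name:6s}", "".join(str(bit) for bit in samples))
```

In this model the C2X clocks repeat every two CLK4X cycles and the C1X clocks every four, and the four C1X phases are offset from one another by half a CLK4X cycle, which is what lets a tree of DDR_DEMUX stages pick off eight consecutive serial bits.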

The octal data rate demultiplexer, shown in Fig. 7(a), is a tree of DDR_DEMUX stages that uses the multiphase clock outputs from the clock generator to deserialize the input data stream into an 8-b parallel data output. The parallel data output register, shown in Fig. 7(c), samples the data from eight clock domains and latches it to a single clock domain. Its major component, REG4, is constructed of 4-b parallel D flip-flops (registers) that latch the data. The first four bits from the octal data rate demultiplexer (Q0–Q3) are captured by a REG4 on the rising edge of C1XFR and transferred to another REG4 on the falling edge. The second four bits (Q4–Q7) are captured by a REG4 on the falling edge of C1XFR. C1XFR is the system clock output aligned with the parallel data output.

To establish word alignment between the transmitter and receiver, a barrel shifter [18] is used at the receiver. During the data link initialization stage, special characters are sent to the receiver, which compares the deserialized data word to the known special characters to determine how many bits of rotation are necessary to properly align the output. For example, if the special character is “11110000” but the receiver’s deserializer output is “00111100”, a 2-b left rotation is required to properly align the received data word. A simple finite-state machine, shown in Fig. 9, and a decoder can be constructed to control the barrel shifter and hold the proper rotation value for all data that follow the training interval. This word alignment system is implemented in an off-chip FPGA device.
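The rotation search performed during link initialization can be sketched in a few lines. This is an illustrative software model, not the FPGA state machine of Fig. 9; the function names are hypothetical, and the model assumes the training character has eight distinct rotations so that the answer is unique.

```python
from typing import Optional

WORD_WIDTH = 8


def rotate_left(word: int, amount: int, width: int = WORD_WIDTH) -> int:
    """Rotate a width-bit word left by amount bits (barrel shifter model)."""
    amount %= width
    mask = (1 << width) - 1
    return ((word << amount) | (word >> (width - amount))) & mask


def find_rotation(received: int, expected: int, width: int = WORD_WIDTH) -> Optional[int]:
    """Return the left rotation that maps received onto expected, or None."""
    for amount in range(width):
        if rotate_left(received, amount, width) == expected:
            return amount
    return None


TRAINING_CHARACTER = 0b11110000   # special character from the example in the text
deserializer_output = 0b00111100  # misaligned word from the example in the text

rotation = find_rotation(deserializer_output, TRAINING_CHARACTER)
print("left rotation needed:", rotation)

# After training, the same rotation is applied to every following data word.
aligned = rotate_left(deserializer_output, rotation)
print(format(aligned, "08b"))
```

Once the rotation is found it is held fixed, which corresponds to the state machine holding the rotation value for all data after the training interval.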

C. Analog Circuit Design

1) VCSEL Driver Circuit Design: A typical VCSEL driver circuit uses a differential current-steering topology, as shown in Fig. 10(a) [15]–[17]. The VCSEL is connected to the right-side output, and a dummy load is connected to the left-side output of the amplifier. Constant bias current is supplied by transistor M3 to ensure that the VCSEL is always operating above its threshold current, while M0 and M1 are differentially driven to switch modulation current through the VCSEL.

In our driver circuits, we replaced the analog current sources M2 and M3 with digitally programmable current-mode digital–analog converters (DACs), shown in Fig. 10(b). This enables us to dynamically adjust the optical output power. Each VCSEL has its own individual bias and modulation current setting for greater flexibility. The digital settings are stored in on-chip digital registers.

Fig. 8. (a) Circuit diagram for CLKDIV. (b) Clock outputs waveform of the clock generator.

Fig. 9. Finite-state machine used with barrel shifter for word alignment.

2) Photodetector Receiver Circuit Design: The PD receiver circuitry is composed of a transimpedance amplifier (TIA), a resistance–capacitance (RC) filter, a decision circuit, two postamplifier stages, and a Converted_to_CMOS stage, as shown in Fig. 11(a). The TIA converts the PD current into a voltage signal. The RC filter is used to find the average value of the input signal. The decision and postamplifier stages are differential amplifiers that amplify the signal to digital current mode level (CML). The Converted_to_CMOS stage converts the CML signal to a CMOS signal for digital processing.

Fig. 11(b) shows the schematic of the TIA design. Three n-type metal–oxide–semiconductor field-effect transistors (MOSFETs) of various sizes were used in the feedback paths as feedback resistors. Each of these MOSFETs is digitally controllable, allowing for dynamic adjustment of the gain of the TIA.

D. Chip Layout

Fig. 12 shows the microphotograph of the DDR transceiver IC, fabricated using the SOS 0.5-µm UTSi CMOS process. The chip dimensions are 2.3 mm × 2.7 mm, and it has 44 perimeter I/Os. Due to pad limitations, bidirectional I/O pads were used for the 16 electrical I/Os (Din15–Din0), with one extra pad to configure them as either input pads in the transmitter or output pads in the receiver.

IV. SYSTEM INTEGRATION

A. OE Device Attachment

The VCSEL array operates at 850 nm with a threshold current in the range of 1.5–2.0 mA. The differential resistance at 48 mA is 50 Ω, and the slope efficiency is 0.45 mW/mA. The die size of the VCSEL array is 1.2 mm × 0.45 mm, with 250-µm pitch between channels [19].

The gallium arsenide (GaAs) p-i-n photodiode array operates with a responsivity of 0.5 A/W at 850 nm. The size of the array is 1.055 mm × 0.45 mm, and the pitch between devices is 250 µm [20].
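The device figures quoted above (slope efficiency, threshold current, responsivity) can be combined into a rough first-order estimate of optical power and photocurrent. The sketch below uses the standard linear above-threshold laser model, which is a textbook approximation and not a model taken from the paper. The 1.75-mA threshold (the midpoint of the quoted range), the 5.0-mA drive current, and the lossless-link assumption are all illustrative choices, so the results are upper bounds rather than predictions of the measured values.

```python
SLOPE_EFFICIENCY_MW_PER_MA = 0.45   # from the VCSEL data quoted in the text
RESPONSIVITY_A_PER_W = 0.5          # from the photodiode data quoted in the text
ASSUMED_THRESHOLD_MA = 1.75         # assumption: midpoint of the 1.5-2.0 mA range


def vcsel_power_mw(current_ma: float, threshold_ma: float = ASSUMED_THRESHOLD_MA) -> float:
    """Linear L-I approximation: no light below threshold, then a
    straight line with the quoted slope efficiency."""
    return max(0.0, SLOPE_EFFICIENCY_MW_PER_MA * (current_ma - threshold_ma))


def photocurrent_ua(optical_power_mw: float) -> float:
    """Photodiode current in microamps for a given incident power.

    A/W times mW gives mA; the factor 1000 converts mA to uA.
    """
    return RESPONSIVITY_A_PER_W * optical_power_mw * 1000.0


drive_current_ma = 5.0                       # hypothetical drive current
emitted_mw = vcsel_power_mw(drive_current_ma)
ideal_current_ua = photocurrent_ua(emitted_mw)  # assumes no optical loss at all

print(f"emitted power:       {emitted_mw:.4f} mW")
print(f"ideal photocurrent:  {ideal_current_ua:.2f} uA")
```

In a real link the lens surfaces, the sapphire substrate, and any beam splitter reduce the power reaching the detector, so the photocurrent seen by the TIA is smaller than this idealized figure.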

Compared with backside-emitting VCSELs, which necessitate the removal of the GaAs substrate [16], [21], the sapphire substrate allows VCSEL and photodiode arrays to be flip-chip-bonded face down to the center of the transceiver IC with the optical signals passing through the substrate.

B. Test-Bed System

To test the operation of the IC, we built a PCB main board and two chip-carrier boards, one for the transmitter and one for the receiver. The transceiver ICs with OE arrays attached were wire-bonded to the carrier boards and sealed with epoxy. A small rectangular section of the carrier board under the IC was removed to allow optical access to the OE arrays. The carrier boards were then mounted to the main board with high-speed surface-mount connectors. The distance between the carrier boards on the main board is about 76.2 mm. Fig. 13 shows a schematic of the test-bed system.

The main board is a 7 × 8-in eight-layer FR4 board. A Xilinx Virtex FPGA was placed in the center of the board and was programmed to control the test of the transceiver ICs. The Virtex FPGA has a built-in DLL, which we used to generate CLK1X_IN and CLK4X (see Fig. 3) in phase synchronization for the transmitter. Fig. 14 shows the complete test-bed system assembly.

C. Free-Space Optical Link Setup

We interconnected the chips using the free-space setup shown in Fig. 15. Starting with 15 high-resolution seven-element Universe Kogaku double-Gauss lenses, we selected two lenses that were well matched. Placed at one focal length from the VCSEL array, the first lens collimated the VCSEL beams, while the second lens re-imaged them onto the detector array. Because the lenses were well matched, the magnification error between the imaged VCSEL array and the PD array was minimal.

Fig. 10. (a) Typical VCSEL driver circuit. (b) DAC setup for adjustable VCSEL driver output.

Fig. 11. (a) Block diagram of a typical receiver circuit. (b) TIA circuit.

Fig. 12. Microphotograph of the transceiver IC.

Since the optical system was approximately paraxial, the optical path length (OPL) for each channel did not vary significantly over the array. As a result, the skew between the different channels was very slight. The horizontal pitch between OE devices is 250 µm, and the off-axis vertical distance is 0.385 mm for the VCSEL array and 0.385 mm for the detector array. We used an approximate lens model in an optical design program to estimate the OPLs for the four channels. The results are tabulated in Table I, together with the transmission latency of each of the four channels resulting from these estimated OPLs. The maximum transmission latency difference between channels is less than 0.01 ps.

Fig. 13. Schematic of the test-bed system.

Fig. 14. Free-space optical demonstration system setup.

Fig. 15. Schematic of the optical lens system design.
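The link between optical path length and transmission latency is simply division by the speed of light. The sketch below illustrates the scale of the numbers; it uses the 76.2-mm board separation as a stand-in path length, which is an assumption for illustration only (the per-channel OPLs of Table I are not reproduced here, and an OPL through lenses differs from the geometric distance). The function names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def latency_ps(optical_path_length_mm: float) -> float:
    """Propagation time in picoseconds for a given OPL in millimeters."""
    return optical_path_length_mm * 1e-3 / SPEED_OF_LIGHT_M_PER_S * 1e12


def opl_difference_um(skew_ps: float) -> float:
    """OPL difference in micrometers that produces a given skew."""
    return skew_ps * 1e-12 * SPEED_OF_LIGHT_M_PER_S * 1e6


illustrative_opl_mm = 76.2   # assumption: board separation used as a stand-in OPL
bit_period_ps = 1e12 / 500e6  # 2000 ps at 500 Mb/s

print(f"latency over {illustrative_opl_mm} mm: {latency_ps(illustrative_opl_mm):.2f} ps")
print(f"OPL mismatch for 0.01 ps skew: {opl_difference_um(0.01):.2f} um")
print(f"0.01 ps as a fraction of a bit: {0.01 / bit_period_ps:.1e}")
```

A 0.01-ps skew corresponds to an OPL mismatch of only about 3 µm and is a negligible fraction of the 2-ns bit period, which is why no skew compensation was needed in this demonstration.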

TABLE I. OPL AND TRANSMISSION LATENCY OF EACH OF THE FOUR OPTICAL CHANNELS

Fig. 16. View of IC with OE arrays attached as seen by the camera.

The optical system was designed to permit a reflective neutral density filter to be placed between the lenses in order to allow simultaneous oscilloscope observation of transmitter and receiver signals. The filter also allowed visual verification of the performance and alignment of the lenses with a charge-coupled device (CCD) camera. Using a reflective neutral density filter with an optical density of 0.03, a portion of the signal was split off and coupled into a fiber-coupled detector while the rest impinged upon the detectors. It was not possible to achieve this kind of observation using an infrared-sensitive camera, because there was too much reflection from the lens surfaces as well as problematic CCD blooming. To overcome this issue, we used a color camera that was not sensitive to infrared. Using the color camera allowed us to observe the incident beam spots without the problem of CCD blooming because, although the VCSELs peak in the infrared, they have a small percentage of visible spectral content in the red. Fig. 16 shows a view of the IC with OE detector arrays as seen by the camera. The spots are not in sharp focus in the photograph because, due to the small chromatic focal shift of the lenses, the VCSELs had to be shifted slightly out of focus in the visible red in order to be at optimal focus in the infrared.

V. EXPERIMENTAL RESULTS

We performed both electrical and optical experimental measurements on the IC and the optical link between the transmitter and receiver. Test results show that both the IC and the optical links are fully operational at a data rate up to 500 Mb/s. Table II lists the operating power consumption per channel at a 500-Mb/s data rate.

TABLE II. POWER CONSUMPTION OF ANALOG AND DIGITAL CIRCUITS

Fig. 17. dc test output versus simulation output of the TIA.

We measured the dc characteristics of the TIA. Fig. 17 shows that the measured dc characteristic curve closely matches the simulation results. The same TIA output was monitored during the alignment of the optical elements to fine-tune the alignment of the lenses in the system.

We then performed electrical testing on the IC without OE devices attached to it. The IC was wire-bonded to a universal test board, where we measured the electrical I/O outputs as well as the output on the VCSEL pads. Fig. 18(a) is a snapshot of the stimulus to the transmitter generated from the logic analyzer, and Fig. 18(b) shows the corresponding voltage output measured on the VCSEL pads. The measurement was done using a 50-Ω surface-mount resistor to emulate the VCSEL device. The output was verified to be exactly the serialized version of the parallel electrical input to the IC, triggering on both rising and falling edges of the CLK4X signal.

By using the test-bed system shown in Figs. 14 and 15, weperformed a complete test on the whole system from the trans-mitter to the receiver through free-space optical links. As men-tioned previously, a portion of the optical signal was split off andcoupled with a fiber-coupled PD, which was monitored by an os-cilloscope. Since only a small portion of the optical signal wasused in this measurement, the receiver was simultaneously ableto continue receiving the optically transmitted data (see Fig. 15).Fig. 19 shows a captured optical data waveform at 500 Mb/s.

Fig. 18. (a) Snapshot of the stimulus from the logic analyzer. (b) Electrical data output measured on the VCSEL pads by emulating circuits.

Fig. 19. Optical serial data stream (00 100 111) captured by an oscilloscope through an optical probe. The minimum pulsewidth is 2 ns, as shown in the figure, corresponding to a data rate of 500 Mb/s per channel. Because the DDR clock scheme was used, CLK4X was at 250 MHz, half the data rate. The optical power of the data output was measured as 95 µW with a bias current of 1.8 mA and a modulation current of 5.4 mA.

The optical data was verified to be the same as the deserialized output displayed by the receiver. The bias current and modulation current of the VCSEL driver were set at 1.8 and 5.4 mA, respectively.
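The rate figures quoted for Fig. 19 follow from simple arithmetic, which the short Python sketch below works through. The function name and the returned field names are hypothetical, and the relation CLK1X = CLK4X / 4 is inferred from the clock names rather than stated in this section.

```python
def ddr_link_rates(clk4x_hz, channels=2, word_bits=8):
    """Derive link rates from the fast clock frequency under DDR signaling."""
    bit_rate = 2 * clk4x_hz             # DDR: one bit on each clock edge
    return {
        "sdr_bit_rate_bps": clk4x_hz,   # SDR would send one bit per cycle
        "ddr_bit_rate_bps": bit_rate,
        "min_pulsewidth_s": 1 / bit_rate,
        "aggregate_bps": channels * bit_rate,
        # Parallel word rate needed to keep the serializer fed.
        "word_rate_hz": bit_rate / word_bits,
        # Assumed relation between the two clocks (inferred from their names).
        "clk1x_hz": clk4x_hz / 4,
    }


if __name__ == "__main__":
    for name, value in ddr_link_rates(250_000_000).items():
        print(f"{name}: {value:g}")
```

With a 250-MHz CLK4X, the sketch gives 500 Mb/s per channel, a 2-ns minimum pulsewidth, and 1 Gb/s aggregate over two channels, matching the values reported in the text. The 8-b word rate and the assumed CLK1X frequency both come out at 62.5 MHz, which is consistent with one parallel word per CLK1X cycle.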

Eye-diagram measurements were performed on the transmitter optical serial output. Pseudorandom data in 8-b parallel form were generated for each channel inside the FPGA using linear-feedback shift registers (LFSRs). The random data was sent to the transmitter chip, which generated the serialized data stream through the DDR serializer, after which the optical signals were generated. The optical signal was measured using the fiber-coupled detector mentioned previously. Fig. 20 shows the measured eye diagrams at data rates of 160 and 500 Mb/s, respectively. As can be seen from the figure, the eye is wide open, indicating a low bit-error rate, even at 500 Mb/s.

Fig. 20. Measured eye diagram (a) at 160 Mb/s and (b) at 500 Mb/s.
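As an illustration of how an LFSR can supply 8-b pseudorandom words like those used for the eye-diagram test, the Python sketch below models an 8-bit Fibonacci LFSR. The paper does not give the polynomial, seed, or word-extraction scheme used in the FPGA. The taps here (8, 6, 5, 4) are a commonly tabulated maximal-length choice, and taking the register state directly as the parallel word is a simplification. All function names are hypothetical.

```python
def lfsr8_step(state):
    """Advance an 8-bit Fibonacci LFSR by one shift.

    Taps 8, 6, 5, 4 (an assumed, commonly tabulated maximal-length set).
    """
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (feedback << 7)


def lfsr8_words(seed=0x01, count=8):
    """Return `count` successive 8-b states, starting from a nonzero seed."""
    if not 1 <= seed <= 0xFF:
        raise ValueError("seed must be a nonzero 8-bit value")
    state = seed
    words = []
    for _ in range(count):
        words.append(state)
        state = lfsr8_step(state)
    return words


if __name__ == "__main__":
    print([f"{w:08b}" for w in lfsr8_words(seed=0x01, count=8)])
```

A maximal-length 8-bit LFSR cycles through all 255 nonzero states before repeating, so every 8-b pattern except all zeros is exercised. The all-zero state is excluded because the register would never leave it.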

The performance of the system is limited by the CMOS circuitry, specifically the full-swing CMOS I/Os. The speed can be improved by using low-swing CML drivers and receivers. In addition, since our test bed uses an FPGA to generate the system clock (CLK1X), the fast clock (CLK4X), and the pseudorandom data, a higher-performance FPGA would provide better clock and data inputs and thus improve the overall system performance.

VI. CONCLUSION

A two-channel source-synchronous parallel optical transceiver integrated circuit (IC) using double-data-rate (DDR) clocking has been designed and implemented in a silicon-on-sapphire (SOS) CMOS process. With the DDR scheme, the system achieves twice the single-data-rate (SDR) bandwidth for a given clock speed. A free-space optical demonstration system was constructed to test the functionality of the chip. Test results show that the chip and the link are fully operational at a 1-Gb/s aggregate data rate, with 500 Mb/s per channel, across a free-space link. Currently, the system performance is limited by the electrical I/O in the transceiver chip. With an improved current-mode logic (CML) driver/receiver at the electrical I/O, the proposed system can scale up to higher performance. The success of this first effort shows that novel signaling schemes may promote the use of optics in very short-range, high-density, inside-the-box applications, which were once the sole domain of cables and PCB traces.

ACKNOWLEDGMENT

The authors acknowledge the support from the CO-OP/Peregrine/USC workshop and foundry run and the contribution of Xanoptix Corporation, which allowed us to fabricate our IC using their chip area at the CO-OP foundry run.

REFERENCES

[1] HyperTransport Technical White Paper [Online]. Available: www.hypertransport.org

[2] RapidIO Technical White Paper [Online]. Available: www.rapidio.org

[3] POS-PHY Level 4 Resource Center [Online]. Available: www.pmc-sierra.com/posphylevel4

[4] Rambus Yellowstone Interface [Online]. Available: http://www.rambus.com

[5] M. Rau et al., “Clock/data recovery PLL using half-frequency clock,” IEEE J. Solid-State Circuits, vol. 32, no. 7, pp. 1156–1159, Jul. 1997.

[6] J. Savoj and B. Razavi, High-Speed CMOS Circuits for Optical Receivers. Norwell, MA: Kluwer, 2001.

[7] R. A. Nordin, A. F. J. Levi, R. N. Nottenburg, J. O’Gorman, T. Tanbun-Ek, and R. A. Logan, “A system perspective on digital interconnection technology,” J. Lightw. Technol., vol. 10, pp. 811–827, 1992.

[8] M. Govindarajan, S. Siala, and R. N. Nottenburg, “Optical receiver systems for high speed parallel digital data links,” J. Lightw. Technol., vol. 13, no. 7, pp. 1555–1565, Jul. 1995.

[9] H. Karstensen, C. Hanke, M. Honsberg, J. R. Kropp, J. Wieland, M. Blaser, P. Weger, and J. Popp, “Parallel optical interconnection for uncoded data transmitted with 1 Gb/s-per-channel capacity,” High Dynamic Range, Low Power, no. 6, pp. 1017–1030, Jun. 1995.

[10] S. Nishimura, H. Inoue, H. Matsuoka, and T. Yokota, “Synchronized parallel optical interconnection subsystem implemented in the RWC-1 massively parallel computer,” IEEE Photon. Technol. Lett., vol. 11, no. 10, pp. 360–367, Oct. 1999.

[11] S. Nishimura, H. Inoue, H. Matsuoka, and T. Yokota, “Optical interconnection subsystem used in the RWC-1 massively parallel computer,” IEEE J. Sel. Topics Quantum Electron., vol. 5, no. 2, pp. 360–367, Mar./Apr. 1999.

[12] C. Kuznia, “Ultra-Thin Silicon-on-Sapphire (UTSi) CMOS,” in CO-OP/Peregrine/USC Workshop, Los Angeles, CA, Jun. 12–14, 2001.

[13] E. McGettigan. Eight Channel, One Clock, One Frame LVDS Transmitter/Receiver. [Online]. Available: xilinx.com/apps/XPP245

[14] B. V. Herzen and J. Brunetti. Multi-Channel 622 Mb/s LVDS Data Transfer for Virtex-E Devices. [Online]. Available: xilinx.com/apps/XPP233

[15] F. E. Kiamilev and A. V. Krishnamoorthy, “A high-speed 32-channel CMOS VCSEL driver with built-in self-test and clock generation circuitry,” IEEE J. Sel. Topics Quantum Electron., vol. 5, no. 2, pp. 287–295, Mar./Apr. 1999.

[16] D. V. Plant, M. B. Venditti, E. Laprise, J. Faucher, K. Razavi, M. Chateauneuf, A. G. Kirk, and J. S. Ahearn, “256-channel bi-directional optical interconnect using VCSEL’s and photodiodes on CMOS,” J. Lightw. Technol., vol. 19, no. 8, pp. 1093–1103, Aug. 2001.

[17] C. Wilmsen, H. Temkin, and L. A. Coldren, Vertical-Cavity Surface-Emitting Lasers: Design, Fabrication, Characterization, and Applications. Cambridge, U.K.: Cambridge Univ. Press, 1999.

[18] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2003.

[19] 1 × 4 VCSEL Array Data Sheet. EMCORE Corporation, Somerset, NJ. [Online]. Available: www.emcore.com

[20] 1 × 4 PIN Photodetector Array Data Sheet. EMCORE Corporation, Somerset, NJ. [Online]. Available: www.emcore.com

[21] A. V. Krishnamoorthy, L. M. F. Chirovsky, W. S. Hobson, R. E. Leibenguth, S. P. Hui, G. J. Zydzik, K. W. Goossen, J. D. Wynn, B. J. Tseng, J. Lopata, J. A. Walker, J. E. Cunningham, and L. A. D’Asaro, “Vertical-cavity surface-emitting lasers flip-chip bonded to gigabit-per-second CMOS circuits,” IEEE Photon. Technol. Lett., vol. 11, no. 1, pp. 128–130, Jan. 1999.


Ping Gui (S’00–M’04) received the B.S. degree in electrical engineering from Northwestern Polytechnic University, Xi’an, China, and the Ph.D. degree in electrical engineering from the University of Delaware, Newark.

In 2004, she joined Southern Methodist University, Dallas, TX, where she is currently an Assistant Professor of electrical engineering. Her research interests include mixed-signal VLSI design, optoelectronic integrated circuit design, and system-on-chip design and test.

Dr. Gui is a Member of the IEEE Solid-State Circuits Society and the IEEE Lasers & Electro-Optics Society (LEOS).

Fouad E. Kiamilev (M’92) received the B.S. degree in electrical engineering and computer science and the Ph.D. degree in electrical engineering from the University of California at San Diego in 1988 and 1992, respectively.

From 1992 to 1999, he was on the faculty of the University of North Carolina at Charlotte. He is currently a Professor with the Computer Engineering Department at the University of Delaware, Newark. His research group is engaged in the research and development of hybrid and integrated circuits related to optical sensors, communication, and information processing. He has published 30 journal papers and coauthored six patents and numerous conference papers.

Xiaoqing Wang (S’03) received the B.S. degree in information engineering from Xi’an Electronic Science & Technology University, Xi’an, China, in 1994. He is currently working toward the Ph.D. degree in electrical engineering at the University of Delaware, Newark.

From 1994 to 1998, he was with Motorola (China) Electronics, Ltd., where his work focused on developing advanced automatic surface-mount and testing technologies. His research interests include VLSI, mixed-signal design, and high-speed optoelectronics.

He is a Student Member of the IEEE Lasers & Electro-Optics Society (LEOS).

Michael J. MacFadden (S’02) received the B.S. degree in electrical engineering from George Mason University, Fairfax, VA, in 2000. He is currently working toward the Ph.D. degree in electrical engineering at the University of Delaware, Newark.

He is currently a Research Assistant in the Photonic Architectures Center, University of Delaware. His research currently focuses on multiscale optical interconnects for intrachip global communication.

Mr. MacFadden is a Student Member of the Optical Society of America (OSA) and the IEEE Lasers & Electro-Optics Society (LEOS). He received an Outstanding Performer Award from the Defense Advanced Research Project Agency (DARPA) MTO in 2001 in recognition of his contributions to the PWASSP ACTIVE-EYES project. He has participated in the organization of the IEEE LEOS Workshop on Interconnections within High-Speed Digital Systems for several years and is a General Chair for the 2005 workshop.

Xingle Wang (S’00) received the B.S. degree in physics and the M.S. degree in optics, both from Xiamen University, Xiamen, China, in 1992 and 1998, respectively, and a second M.S. degree in electrical engineering from the University of Vermont, Burlington, in 2002. He is currently working toward the Ph.D. degree in electrical and computer engineering at the University of Delaware, Newark.

His research interests are mixed-signal design and modeling and high-performance complementary metal–oxide–semiconductor (CMOS) interconnect circuits.

Nick Waite is currently an undergraduate student of Electrical Engineering at the University of Delaware, Newark. His research interests include analog and mixed-signal integrated circuit design.

Michael W. Haney (M’80) received the B.S. degree in physics from the University of Massachusetts in 1976, the M.S. degree in electrical engineering from the University of Illinois in 1978, and the Ph.D. degree in electrical engineering from the California Institute of Technology, Pasadena, in 1986.

From 1978 to 1986, he was with General Dynamics, where his work ranged from the development of electrooptic sensors to research in photonic signal processing. In 1986, he joined BDM International, Inc., where he became a Senior Principal Staff Member and the Director of Photonics Programs. In 1994, he joined George Mason University, Fairfax, VA, as an Associate Professor of Electrical and Computer Engineering. In 2001, he joined the University of Delaware, Newark, as a Professor of Electrical and Computer Engineering. His research activities are focused on the application of photonics to new computing, switching, and signal processing architectures.

Dr. Haney is a Fellow of the Optical Society of America (OSA) and a Member of the IEEE Communications Society and the IEEE Lasers & Electro-Optics Society (LEOS). He has chaired and cochaired several technical conferences and is a previous chairman of the IEEE Communications Society’s Technical Committee on Interconnections within High-Speed Digital Systems.

Charlie Kuznia (S’87–M’87) received the B.S. degree from the University of Minnesota, and the M.S. and Ph.D. degrees from the University of Southern California, all in electrical engineering.

He has 12 years of research experience in the design and construction of optoelectronic systems and networks. He previously served as a Professor of electrical engineering at the University of Southern California. He provides technical oversight to the design of the optomechanical module for interfacing optoelectronic components flip-chip-bumped on ultra-thin-silicon (UTSi) circuitry to fiber-array connectors. He has more than 25 technical publications in the areas of integrated optoelectronic/VLSI systems, free-space and fiber-optic interconnects, and diffractive micro-lens array and grating design.