design and optimization of low-power level-crossing adcs
TRANSCRIPT
Design and Optimization of Low-powerLevel-crossing ADCs
Colin Weltin-Wu
Submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
in the Graduate School of Arts and Sciences
COLUMBIA UNIVERSITY
2012
©2012
Colin Weltin-Wu
All Rights Reserved
Abstract
Mixed-Signal Circuit Techniquesfor Low Power and High Speed
Level Crossing ADCs
Colin Weltin-Wu
This thesis investigates some of the practical issues related to the implementation of level-
crossing ADCs in nanometer CMOS. A level-crossing ADC targeting minimum power is designed
and measured. Three techniques to circumvent performance limitations due to the zero-crossing
detector at the heart of the ADC are proposed and demonstrated: an adaptive resolution algorithm,
an adaptive bias current algorithm, and automatic offset cancelation. The ADC, fabricated in 130
nm CMOS, is designed to operate over a 20 kHz bandwidth while consuming a maximum of 8.5
µW. A peak SNDR of 54 dB for this 8-bit ADC demonstrates a key advantage of level-crossing
sampling, namely SNDR higher than the classic Nyquist limit.
Contents
List of Figures iii
List of Tables x
1 Level-Crossing Sampling Systems 1
1.1 Thesis Outline and Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Considerations for Low Power 8
2.1 Topology and Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Feedback DAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Zero Crossing Detectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Digital Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.1 Interconnect Capacitance . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3 Delay Distortion and Noise Analysis 25
3.1 Input-Dependent ZCD Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
i
3.1.1 A Model For ZCD Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1.2 Harmonic Distortion of a Level Crossing Quantizer . . . . . . . . . . . . . 33
3.1.3 Delay-Dispersion-Induced Harmonic Distortion . . . . . . . . . . . . . . . 39
3.2 Noise Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2.1 ZCD Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4 Design of a µW Programmable LCADC 55
4.1 Adaptive Resolution Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Programmable Asynchronous Timing Generation . . . . . . . . . . . . . . . . . . 66
4.3 Zero Crossing Detectors with Dynamic Current Bias . . . . . . . . . . . . . . . . 69
4.3.1 Three-Stage Zero Crossing Detector . . . . . . . . . . . . . . . . . . . . . 69
4.3.2 Current Bias DACs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.4 Segmented Capacitor Feedback DAC . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5 Automatic Offset Calibration with Oscillator . . . . . . . . . . . . . . . . . . . . . 81
4.5.1 Relaxation Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.6 Other Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.6.1 Bootstrapped Input Switch . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.6.2 Current-Steering Reconstruction DAC . . . . . . . . . . . . . . . . . . . . 87
4.6.3 Asynchronous Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.6.4 SPI and Calibration Logic . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5 LCADC Measurements 90
ii
5.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.3 LCADC System Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.4 Calibration Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.5 Delay Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.6 Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6 Conclusions and Suggestions for Future Work 113
6.1 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.1.1 Level-Crossing ADCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.1.2 Synchronous ADCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2 Suggestions for Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
A Derivation of Distortion Equations 123
A.1 The Fourier Components cm,n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
A.2 Non-Uniform Delay Impact on Harmonic Distortion . . . . . . . . . . . . . . . . 125
A.3 Input Slope at T Rm(t) Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . 127
B Silicon Errata 129
iii
List of Figures
1.1 A 2-bit (4-level) level-crossing sampler, with uniform quantization levels. . . . . . 2
2.1 A simple LCADC implemented as a flash. . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Transfer characteristic of an (a) mid-rise and (b) mid-tread quantizer around 0. . . . 10
2.3 The digital output xq(t) of the LCADC contains the information of the samples,
shown as the circles in the graph. The continuous-time DAC generates a zero-
order-held-like reconstruction of the analog input from the samples. . . . . . . . . 11
2.4 Feedback LCADC topology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 Feedback LCADC topology with a capacitive DAC. The capacitive DAC recon-
structs vxq(t) as before, and subtracts it from vx(t) capacitively so that the ZCD
input range is reduced. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6 Schematic of the capacitive DAC, using a classic charge-redistribution array. The
operation of the dacRST and dacEN signals will be explained shortly. . . . . . . . 16
iv
2.7 Timing diagram of the control signals for the capacitive DAC. The ZCD signal
refers to the varying ZCD input, either V+ for the upper ZCD or V− for the lower
ZCD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.8 Different configurations of a binary bus. . . . . . . . . . . . . . . . . . . . . . . . 22
3.1 A ZCD stage modeled with a saturating transconductor and single output pole. . . . 26
3.2 On the left, the ZCD input can be approximated as a series of voltage ramps. The
dashed horizontal grey lines indicate the input range of the ZCD where the output
is not saturated. On the right, the actual ZCD response is superimposed over an
ideal response in grey. The actual ZCD zero crossing at time tc is delayed from the
ideal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Reducing bandwidth and increasing gain proportionally cause the first-order sys-
tem to behave more and more like an ideal integrator. . . . . . . . . . . . . . . . . 30
3.4 Delay vs. input slope, and the two approximations in (3.5) and (3.12). . . . . . . . 31
3.5 Delay of a three stage cascaded amplifier, by stage. Also shown is the delay of
single stage that has the same overall gain and bandwidth as the cascade. . . . . . . 32
3.6 With symmetric quantization levels, it is possible to express the quantized signal
with 2N−1 signals T Rm(t). For an odd-symmetric input, each of the T Rm(t) signals
is also odd symmetric. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.7 The Fourier series approximation of vxq(t) for an almost full-scale time-normalized
sinusoid input, for 150, 800, and 4000 Fourier terms. . . . . . . . . . . . . . . . . 37
v
3.8 Wide variation in SFDR for small changes in input amplitude. The solid vertical
line represents a level of the reconstructed quantized signal, and the dotted lines
are the quantizer thresholds. The figure spans two LSB of the quantizer input range. 38
3.9 The first harmonics of the LCADC output with ZCD delay variation. The input is
a 20 kHz 1 V peak sine wave. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.10 Impact of dispersion on SDR, 500 harmonics considered. The input is a full scale
sinusoid of the specified frequency. . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.11 Dispersion as a function of stage bandwidth for a 3-stage cascade of identical
single-time-constant stages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.12 Visualization of the function p0(∆, t) defined for all real ∆. . . . . . . . . . . . . . 45
3.13 Within one period of a full-scale sine wave there are 2N+1 level crossings, which
occur at times si relative to the beginning of each period. . . . . . . . . . . . . . . 45
3.14 Decomposition of a noisy T Rm(t) signal (a) into the ideal noiseless T Rm(t) signal
(b) and a pulse sequence representing jitter due to noise (c). . . . . . . . . . . . . 46
3.15 Noise corrupts the path of the ZCD output voltage, causing the zero crossing to be
disturbed randomly. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.16 Simulated LCADC noise with ZCD jitter, over a 100 MHz span with a 10 kHz
input. Resolution bandwidth is 100 Hz. . . . . . . . . . . . . . . . . . . . . . . . 51
3.17 Detail of Fig. 3.16 over a 2 MHz span with the same RBW = 100 Hz. At low
frequencies, the impulse approximation is still valid. . . . . . . . . . . . . . . . . 52
vi
3.18 Simulated LCADC noise with ZCD jitter, over a 100 MHz span with a 1 kHz input.
Resolution bandwidth is 10 Hz. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.19 Detail of Fig. 3.18 over a 2 MHz span, with RBW = 10 Hz. Note the LCADC
distortion components have moved closer in, but other than that the impulse ap-
proximation still predicts the noise level. . . . . . . . . . . . . . . . . . . . . . . . 54
4.1 Top level LCADC block diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 Two methods for reducing the sampling rate. . . . . . . . . . . . . . . . . . . . . . 60
4.3 The behavior of an ideal adaptive resolution scheme (a) and the proposed (b). . . . 61
4.4 If the input slope changes, the trailing boundary can catch the input leading to
oscillations, shown in (a). The alternative trailing boundary behavior shown in (b)
does not suffer this problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.5 If the input crosses a threshold during a resolution change, there is a quantization
error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.6 Asynchronous timing generator formed with an 11-tap delay line, each with 4-bit
control. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7 The timing edges from the delay line are launched by either an UP trigger or DN
trigger. If the line has not finished propagating when then next trigger arrives, it
restarts from the beginning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
vii
4.8 Schematic of a delay element, with the devices of interest highlighted in gray.
Devices without sizes given can be assumed to be operating as switches and non-
critical. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.9 The ZCD comprised of a three-stage preamplifier and a latch. . . . . . . . . . . . . 70
4.10 ZCD preamplifier stage. Thick devices are LVT. . . . . . . . . . . . . . . . . . . . 71
4.11 ZCD uni-directional latch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.12 Total ZCD delay with respect to input slope for bias currents between 230 nA and
4.3 µA. The higher bias currents are lower lines (indicating faster response). . . . 74
4.13 A delay-line based slope measurement quantizes the bias current to five discrete
levels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.14 Adaptive current bias keeps delay within dotted lines (60 ns dispersion) until very
slow slopes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.15 Binary-weighted arrays of PMOS transistors forming the current bias DACs for
the ZCDs. The bus of 8 bias voltages is multiplexed onto the BIASH and BIASL
wires. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.16 Adaptive resolution implemented with two capacitor DACs (detail of one shown
in inset, both are identical), to give a 24×LSB thermometric range. . . . . . . . . . 78
4.17 Structure of a unit capacitance cell. Projected views from both sides are shown as
well. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.18 Schematic of the R-string calibration DAC with two output taps. The inset shows
one of the 256 identical unit elements. . . . . . . . . . . . . . . . . . . . . . . . . 82
viii
4.19 Two-step SAR calibration loop using the existing ZCDs as comparators. The ca-
pacitor DACs are modeled as voltage sources, and the offset voltages are shown as
voltage sources in series with the ZCD inputs. First the ZCD offsets are calibrated,
then the VLSB step. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.20 Relaxation oscillator with tunable 4-bit MOS capacitor load. . . . . . . . . . . . . 84
4.21 Traditional switch drivers (a) bring the gate to 0 when off. The proposed driver (b)
clamps the switch gate to the source when off. . . . . . . . . . . . . . . . . . . . 85
4.22 Gate drive boostrapping circuit. The input node is connected to the analog input,
and the output is connected to the capacitive feedback DACs. . . . . . . . . . . . 86
4.23 Differential segmented current steering DAC used for testing the LCADC. Top
layout shown in (a) is comprised of 256 identical unit cells, whose schematic is
shown in (b). The notation 4× means a group of four unit steering cells. . . . . . . 87
5.1 Die microphotograph. The output DAC is not counted toward the active area in
comparison in Table 6.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2 Schematic of the LCADC portion of the testing board used to produce the pre-
sented results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.3 Test board and IC interface board. . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.4 Test DAC reconstructing the LCADC output for a 1 kHz, small-amplitude input. . . 95
ix
5.5 DAC reconstruction of LCADC output processing a cardiac pulse signal. At the
signal peaks and valleys the quantization step is 2.8 mV. During the high-slew
portions of the signal the quantization step increases, to a maximum of 14 mV (5×
minimum). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.6 Spectrum of a single tone, 300 Hz -3dBFS input. . . . . . . . . . . . . . . . . . . 97
5.7 Spectrum of a single tone, 1 kHz -3dBFS input. . . . . . . . . . . . . . . . . . . . 98
5.8 Signal fidelity as a function of input amplitude for a 1 kHz tone input. . . . . . . . 99
5.9 SNDR as a function of input frequency for a -3 dBFS input. . . . . . . . . . . . . . 100
5.10 Power consumption as a function of frequency for a -3 dBFS input tone. . . . . . . 101
5.11 Average sampling rate in function of input frequency. . . . . . . . . . . . . . . . . 103
5.12 SNDR when the AR is disabled, and enabled. . . . . . . . . . . . . . . . . . . . . 104
5.13 Energy per conversion as defined in (5.1) with respect to frequency. . . . . . . . . 105
5.14 Signal fidelity with varying source resistance. . . . . . . . . . . . . . . . . . . . . 106
5.15 Calibrated SNDR of a dozen dies. . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.16 Delay tuning range for a delay cell, vs. the 4-bit T IMEi[3:0] control value. The
multiple lines at each color are the delay for VDD = 0.8,1.0,1.2 V. . . . . . . . . . 110
5.17 Delay jitter histogram, N = 2000, µ = 1.033 µs, σ = 4.1 ns. . . . . . . . . . . . . . 111
5.18 Calibration oscillator frequency vs. the oscillator tuning code OSCTUNE[3:0] for
12 dies and VDD = 0.8,1.0,1.2 V. . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
x
List of Tables
2.1 Comparison of published low-power LCADCs. In the computation of energy/conversion,
the most optimistic numbers are used. A † indicates that offline post-processing
was used to achieve the SNDR numbers reported. . . . . . . . . . . . . . . . . . . 14
2.2 Representative standard cell power consumption breakdown. Leakage is reported
at VDD+10% and 125 C (worst case). . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Approximate coupling capacitances in units of aF/µm, for 130 nm CMOS. . . . . . 22
4.1 ZCD preamplifier device sizes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.1 LCADC system performance summary. For the full-scale input definition, 0.54 V
is the voltage required to traverse 190 digital output codes. This range avoids the
large-input-amplitude distortion problem. . . . . . . . . . . . . . . . . . . . . . . 107
5.2 ZCD offsets in mV. σ = 5.9 mV. . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.3 Low-speed delay element comparison, results are from measurement except for
simulated mismatch. The tuning range of this cell is reduced due to the digital
4-bit control. Its area is also much larger due to the 4-bit bias DAC. . . . . . . . . 112
xi
6.1 Comparison of this work to published LCADCs. A † indicates that offline post-
processing was used to achieve the SNDR numbers reported. . . . . . . . . . . . . 115
xii
Acknowledgments
I would like to thank my advisor, Professor Yannis Tsividis for guiding my research. In spite
of our differing opinions (for which I am also grateful, for keeping me on my toes) I am constantly
inspired by his insight into an unbelievably broad array of topics, engineering and otherwise.
I owe a debt of gratitude to my thesis committee: Profs. Peter Kinget and Charles Zukowski,
and Drs. Mihai Banu and Tod Dickson, for their careful reading and honest criticisms that ulti-
mately helped shape this thesis for the better.
I would like to thank my childhood mentors Andrew Chow and Serge Lang, who taught me the
value of safety and rigor, both in work and in life. Although I had always played with wires and
magnets as a kid, my real exposure to electronics came when Bob and Ellen Lasher hired me to
work at their store Al Lasher’s Electronics in Berkeley. They gave me free reign to use the back
shop area as my little playpen and the experience fostered my interest in electricity greatly. At
MIT, I will always remember the lessons of Profs. James Roberge and Rahul Sarpeshkar, who
can manipulate transistors in ways that still seem magical. In San Diego, I am fortunate to have
found a role model and friend in Prof. Ian Galton, whose honesty and candor keeps me in line, and
whose creativity never fails to amaze. I will never forget my time in Italy, and am grateful for the
opportunity given to me by Prof. Francesco Svelto and Dr. Enrico Temporiti, who took a chance
on me with no supporting evidence.
Colleagues at STMicroelectronics continued to provide support without obligation, which made
xiii
several tape-outs much easier, and Agilent Technologies’ generosity with measurement equipment
and application support eased the typically painful process of chip verification.
I thank all my friends, too many to name here, for all their encouragement, diversions, and
enriching my life experience. I am glad to have met all my colleagues in CISL for the blackboard
discussions, happy hours, and shared enthusiasm for electronics. In particular, I am lucky to have
had the opportunity to mentor Tao Mai, who reminds me very much of myself many years ago.
I thank my family, all over the world, for their support and inspiration: my father to this day
still works harder than I do. Finally, I am forever grateful to Lynn Bos for being my source of
unwavering support and encouragement throughout the thesis writing process.
xiv
Chapter 1
Level-Crossing Sampling Systems
Level crossing sampling (LCS) is a signal processing technique where an analog quantity (volt-
age/current/temperature/etc.) whose value is continuous and time-varying is mapped to a finite set
of quantized values in a manner that preserves more timing information in the signal than with
conventional Nyquist-rate sampling. To see this, in Fig. 1.1 shows an analog signal with range
[0,1) and an LCS block with a 2-bit output, meaning the analog signal is quantized to four possible
values. In this simple case, the interval [0,1) is divided into four equal sub-intervals: [0, 14), [
14 ,
12),
[12 ,
34), and [3
4 ,1). The output of the quantizer is then an indication of in whichever interval the
analog signal currently resides, and in this uniform quantization example a logical choice for this
indication is the midpoint of each sub-interval. Therefore, whenever the analog signal is greater
than 0.25 but smaller than 0.5, the LCS output is 0.375.
Note that, in contrast with conventional uniform sampling, the LCS block does not have an
understanding of “sampling time.” Its output is a memoryless and instantaneous function of the
1
2
LEVEL
CROSSING
SAMPLER
Figure 1.1: A 2-bit (4-level) level-crossing sampler, with uniform quantization levels.
input: for example, as soon as the input crosses from below 0.5 to above 0.5, the LCS quantizer
changes output from 0.375 to 0.625. In the conventional case, the quantizer output changes only on
integer multiples of a sampling period. This re-timing of a signal to a clock leads to the well-known
phenomenon of frequency aliasing, which is absent in the LCS system.
Level-crossing sampling is not a new technique; this scheme has existed for several decades in
the context of control systems and coding theory [1–4] but has only recently been considered for
VLSI implementation [5–8]. From the perspective of information flow, this method of sampling
is very natural: there is no reason to generate a sample if the input has not changed (no new in-
formation), and conversely generating a sample the instant the input changes gives the system the
fastest possible response time. In a uniformly sampled system there is up to a 1/ fS ( fS is the syn-
chronous sampling frequency) delay between when the signal crosses a (or several) quantization
level(s) and when it is actually acknowledged by the system. This instantaneous-response prop-
erty is what was initially attractive from a control standpoint, and is still explored in the context
of switching regulators [9] that have complex dynamics sensitive to loop delay. Another promis-
ing application for LCS systems is in embedded biomedical devices. Biological signals such as
nerve action potentials, signals in the brain, and ECG waveforms are all pulse-like. The high peak-
3
to-average ratio for these signals invite circuit and signal-processing topologies which adaptively
scale activity and power with the signal itself, a property fundamental to level-crossing sampling.
Furthermore, when things go wrong (e.g. cardiac arrythmia) the signals are unpredictable, so the
fast-response property of LCS can catch these signals while maintaining low power consumption
during normal periods of low activity.
The most striking property of LCS systems is their ability to perform complex signal processing
functions typically reserved for a dedicated DSP without a synchronous clock [5,6,10]. Of course
analog signal conditioning is a mature technology and also clock-less, but in the case of continuous-
time DSP (CTDSP) systems with LCS inputs, the overall system retains the wide programmability
of conventional digital systems. The lack of a system clock is a paradigm shift especially at very
high speeds, where more than half of the power budget is dedicated simply to distributing the clock
to all portions of the IC [11]. The problem of distributing the clock is compounded by the fact that
the die size of complex digital ICs is not–in spite of process scaling–generally decreasing, which
means tight clock skew at ever higher frequencies is a major challenge addressed by having local
re-timing PLLs, which again add to the power budget. The lack of a global clock not only reduces
the system power consumption but also eliminates the dominant source of spurious tones in mixed-
signal systems. The huge power draw from the supply at each transition of the clock, in addition to
the radiated power in the IC substrate means system designers must go to great lengths to mitigate
clock injection: at the system level, techniques such as spread-spectrum clocking reduce the spot
intensity of the injected signal, and at the physical level triple wells, guard rings and split domains
4
are all used to isolate sensitive nodes. Unfortunately all these techniques incur large costs in design,
verification, and silicon area.
Without a sampling clock, the number of samples produced by an LCS system is proportional to
the activity of the input signal. Therefore, LCS implicity provides input-activity-dependent power
consumption, and when these digital streams are processed by continuous-time DSP (CTDSP)
filters, the filters too exhibit activity-dependent power consumption [5, 6, 12]. Traditional analog
signal processing has also benefitted from bias power scaling with respect to the system require-
ments [13–15], but these techniques require additional analog circuitry to implement the power
scaling, incurring a power overheard. In the digital domain, the system clock frequency imposed
by the Nyquist sampling constraint puts a large lower bound on a DSP’s power consumption, even
when no signal is being processed. Thanks to the ubiquitous nature of digital circuits, a universal
effort to improve speed, reduce power, and increase integration density has been applied at the cir-
cuit level [16], architecture level [17], and even software level [18]. As numerous and effective as
these solutions are, like the analog solutions they too require dedicated hardware and/or software
to take advantage of the power savings; moreover, the automated synthesis (one of the most at-
tractive features of digital circuits) of circuits with multiple VDD domains, clock gating, and mixed
static/dynamic logic is much more difficult to verify. The digital methods do not address the power
consumption in the ADC itself, and thus the ADC still runs at the full Nyquist rate. For applica-
tions where the input signal has long periods of inactivity, continually sampling wastes power in
the ADC since it spends most of its time sampling a DC input. Based on the above argument LCS
5
have possible applications ranging from biomedical sensing [19], intelligent sensor networks [20],
ultrasonic measurement [21], and speech processing [22] to name a few.
As previously mentioned, a property unique to LCS systems is the lack of aliased quantiza-
tion error: the spectrum of a zero-order held reconstruction of the digital stream contains only
harmonics of the input [5, 6]. The quantization noise floor (which in the Nyquist case is aliased
quantization error [23] and not true noise) is not present since the quantizer does not sample the sig-
nal in time and thus does not alias out-of-band distortion components. This means an N-bit LCS
output has a quantization-contributed SDR better than or equal to a conventional N-bit Nyquist
ADC over any finite bandwidth, since in the level-crossing ADC (LCADC) case only a fraction of
the quantization error falls in band, whereas in the Nyquist case all the quantization error folds in
band. A corollary to this feature is that the signal conditioning requirements on the LCADC input
are greatly relaxed, as noise and signals out-of-band remain out-of-band at the LCADC output,
rather than getting aliased in band. As low-noise, sharp-cutoff filters are expensive analog blocks,
this saves money and allows more architectural flexibility in mixed-signal systems.
In spite of these attractive qualities and promising applications, LCS has thus far taken back
seat to conventional sampling methods. Without speculating all the reasons why, there are a few
standout disadvantages of LCS which must be addressed before it will have mainstream appeal.
First, there is no easy way to store LCS data (at least without losing some of the properties which
make LCS attractive in the first place) as there are RAMs and ROMs for uniformly-sampled data,
since information is partly encoded in the timing of the samples themselves. This restricts LCS to
real-time systems where the data is processed and used as it arrives. For some applications such
6
as power supply controllers, wireline transceivers, or always-on systems e.g. fault monitors this
is not a problem. On the other hand, for extremely low power systems such as remote sensors
which do not have a constant reliable connection to offload the data an LCS solution may face
implementation hurdles. In order to store sampled data for later transmission, it would first have
to be resampled and then stored along with the corresponding timestamps (thus incurring aliased
quantization noise and requiring a high-speed clock, which LCS is supposed to avoid).
Another disadvantage of LCS systems thus far reported is that they are several performance
generations behind equivalent synchronous systems which perform the same task. Although some
of this disparity can be attributed to the vast difference in human effort applied to asynchronous
versus synchronous systems, as will be explored in this thesis there are issues surrounding the
zero-crossing detectors used in LCS-based systems absent from conventional sampled systems
that hamper performance.
Along the same lines as the two preceding points, there is no efficient way to convert LCS
data into something that can be processed with conventional DSPs, so if an LCADC existed which
was superior to a conventional ADC for a particular application, the overhead in the interface
with a conventional system would add complexity and cut into the performance margin. Several
applications have been proposed for LCS, but often the simulations or models in these high-level
theoretical papers fail to consider the actual silicon implementation issues that would otherwise
make the solution infeasible. In light of these complications, it is clear that LCS-based circuits
cannot be used as drop-in replacements for their conventional counterparts and be expected to give
7
superior overall system performance. Rather, new architectures [9, 10, 24] need to be developed
from the ground up with LCS in mind, and only then can the possible benefits be realized.
1.1 Thesis Outline and Goals
This thesis explores the trade-offs and limitations in the ultra-low-power design space for LCADCs.
The design specifications are based on the LCADC in [10] (a complete LCADC-CTDSP system
for low frequencies), but leverages mixed-signal techniques to optimize power consumption. The
LCADC is almost fully integrated, lacking only four external resistors. A treatment of performance
bottlenecks in existing low-frequency LCADCs justifies the design choices made, resulting in an
LCADC that improves over existing designs with respect to classical FOM.
The objective of this thesis is to understand the practical issues facing LCS-based systems by
considering the circuit-level implementation of as many features as possible. By considering only
integrable or implementatable topologies (not relying on post-processing or massive digital correc-
tion) the realizable performance of LCADCs in real-world applications is better understood. The
thesis is organized as follows. Chapter 2 reviews previous work on LCS and possible architectures
with the goal of very low power consumption in mind. Chapter 3 gives a general treatment of
nonideal effects which face all LCS systems so far published–then Chap. 4 presents the implemen-
tation details of an LCADC mindful of the nonidealities, and Chap. 5 presents measurements. The
thesis is concluded in Chap. 6.
Chapter 2
Considerations for Low Power
2.1 Topology and Specification
We begin with the simplest possible implementation of an LCADC, by replicating the standard
flash ADC structure but replacing clocked comparators with zero-crossing detectors (ZCD), shown
in Fig. 2.1. A ZCD performs the comparison operation between two inputs, in continuous time,
i.e.
out(t) =
0 if V+(t)<V−(t),
1 otherwise.
(2.1)
Keep in mind that the voltages at the input to the ZCD are a pair of continuous-time analog signals,
and the output out(t) is a continuous-time binary signal. Such circuits have existed since the
beginning of monolithic ICs, for example the LM106/LM710 released in the 1970’s [25]. The
8
9
+
-
+
-
+
-
TH
vx(t)
TH2N-1
CONTINUOUS-TIME ANALOG CONTINUOUS-TIME DIGITAL
TH0
TH2N-2
LEV0
LEV2N-2
LEV2N-1
QUANTIZER
Figure 2.1: A simple LCADC implemented as a flash.
outputs LEVj(t) are a set of continuous-time, binary signals that together form a thermometric
representation of the input vx(t). Just as in the case of a synchronous flash ADC, the reference
voltages T H j for the ZCDs evenly divide the full-scale analog input range into 2N equal parts (for
an N-bit LCADC). One subtlety is whether a half-LSB offset with respect to 0 is introduced in the
T H j voltages: this differentiates between a mid-rise and mid-tread quantizer characteristic. The
terms “rise” and “tread” refer to the vertical and horizontal dimensions of a step in a staircase,
respectively. For example, if T H j were defined as
T H j =−1+j
2N−1 , for integer j ∈ [1,2N−1], (2.2)
10
(a) (b)
DIG
ITA
L O
UT
PU
T
ANALOG INPUT
DEAD ZONE
Figure 2.2: Transfer characteristic of an (a) mid-rise and (b) mid-tread quantizer around 0.
then 0 is a quantization threshold, at j = 2(N−1). This defines a mid-rise quantizer shown in Fig.
2.2(a), because the output code changes when the analog input transitions through 0. On the other
hand, if T H j were defined as
T H j =−1+1
2N +j
2N−1 , for integer j ∈ [1,2N−1], (2.3)
then the two quantization thresholds closest to 0 are ±1/2N ; this means there is a “dead-zone”
around 0, a region of analog input containing 0 where the ADC’s output code does not change. This
defines a mid-tread quantizer, as shown in 2.2(b). We note that if we want symmetric characteristics
from both curves, then the mid-tread quantizer has one fewer quantization level than the mid-
rise quantizer (just as a two’s compliment number extends to 2N negatively but only to 2N − 1
positively), but in practice the input signal to the quantizer does not exercise the full-scale input,
11
+
-xq(t) RECONSTRUCTION
DACvxq(t)vx(t)
+
-
Figure 2.3: The digital output xq(t) of the LCADC contains the information of the samples, shownas the circles in the graph. The continuous-time DAC generates a zero-order-held-like reconstruc-tion of the analog input from the samples.
so this difference is not seen. Furthermore, N is usually large enough that other factors such as DC
offsets in other parts of the circuit make this distinction of the behavior around 0 irrelevant. The
“dead-zone” phenomenon of mid-tread quantizers is only an issue for low-resolution applications,
such as ??.
In Fig. 2.3, the outputs of the ZCDs LEVj(t) are summed digitally to get the quantized rep-
resentation of the input, xq(t). Since it is the sum of binary signals, xq(t) is a continuous-time,
discrete-leveled signal which changes each time the input to the quantizer vx(t) crosses a T H j
quantization level. In essence, the information contained in xq(t) is the collection of pairs of
12
timestamps and quantized values corresponding to each sample from the quantizer. The digital
signal xq(t) can be reconstructed by a continuous-time DAC to get an analog representation of the
quantized signal, as shown in Fig. 2.3.
In an LCADC, the location of the input signal is always known with respect to the quantization
levels–samples are always spaced in amplitude one LSB apart but their spacing in time is unknown.
This is in contrast with a Nyquist rate ADC, where the amplitude difference between consecutive
samples is unknown, but in time they are always spaced 1/ fS. For this reason, although the flash
topology is structurally identical between an LCADC and synchronous ADC, the requirements are
rather different. In a Nyquist rate ADC, the comparators are powered off for most of the sampling
period, since we know deterministically when the next sample will arrive. On the other hand, when
a sample is taken, all must be enabled to capture the unknown vx(t) within the whole VFS full-scale
input range. In an LCADC, the ZCDs must constantly be enabled, but only two need to be on at
any given time–one comparing vx(t) against the next quantization level above, and one comparing
against the quantization level below.
To save power, it would be possible to individually power-gate each ZCD, so that only the
signal-flanking ZCDs are enabled. This is impractical as 2N gating signals would need to be
generated, and even if an efficient implementation were possible (for example, the thermometric
digital output could be shifted left and right and fed back to the gating inputs) there is always the
power and area overhead of implementing the control logic and adding a gate input to all 2N ZCDs.
Instead, a more elegant solution introduced in [7] wraps two ZCDs (which would be the only two
active ones in the flash topology) within a feedback loop such that the ZCDs are level-shifted to
13
+
-
+
-
UP/DOWN COUNTER
vx(t) xq(t)
+VLSB/2
-VLSB/2
UPPER TRACKING THRESHOLD
LOWER TRACKING THRESHOLD
vxq(t) DAC
Figure 2.4: Feedback LCADC topology.
always contain vx(t) within their comparison interval. This topology is shown in Fig. 2.4. The CT
quantized output xq(t) is reconstructed with a DAC as in Fig. 2.3, and two tracking thresholds are
created, one half an LSB above and one half an LSB below. A crossing of either ZCD updates the
output xq(t), which in turn shifts the tracking interval up or down; this way, the input vx(t) is always
compared against two thresholds VLSB apart, identical to the flash case. Within the bounds of the
loop delay and the fed-back tracking interval’s maximum tracking speed, the two architectures are
functionally identical. The feedback topology even offers the added benefit of requiring only two
ZCD offsets to compensate, since the same two ZCDs are used for every level comparison.
Within the general topology in 2.4, there is much room for innovation. We begin with a com-
parison between published low-power LCADCs in Table 2.1, which all use the feedback topology.
Aside from [24] which uses a flash (but for GHz frequencies), only one other publication was
found that used a flash [26], and this did not have silicon so it was unsuitable for comparison. Out
14
Work [27]† [8] [10] [19]†
Year 2005 2006 2008 2011Tech 130 nm 0.25 µm 90 nm 0.18 µmArea 0.1 (mm)2 0.25 (mm)2 0.06 (mm)2 0.96 (mm)2
Resolution 4 bit 6 bit 8 bit 8 bitAdaptive no yes
DAC Type Cap Res Res Hybrid (cap MSB/res LSB)Bandwidth 160 kHz 55 kHz 10 kHz 1 kHz
SNDR 60 dB 53 dB 58 dB 52 dBPower 180 µW 17 mW 50 µW 25 µWJ/conv 560 fJ/conv 345 pJ/conv 3.14 pJ/conv 31 pJ/conv
Table 2.1: Comparison of published low-power LCADCs. In the computation of en-ergy/conversion, the most optimistic numbers are used. A † indicates that offline post-processingwas used to achieve the SNDR numbers reported.
of fairness in comparison, we should note that refs. [27] and [19] both use high speed re-sampling
of the output and digital post processing, and [19] also implements the digital controller off chip,
not included in the power number.
The design target for the LCADC presented in this thesis was 8 bits, 20 kHz bandwidth; such
specifications make the LCADC suitable for both voice and biomedical applications. The goal
was to optimize the power consumption as much as possible, ideally below 10 µW. In addition,
the SNDR performance should be achievable with simple zero-order hold reconstruction, without
high-speed re-sampling or post processing. Finally, all the performance-affecting circuitry needed
to be integrated on-chip.
15
+
-
+
-
UP/DOWN COUNTER
vx(t) xq(t)
+VLSB 2
zcdEN
-VLSB 2
CA
P
DA
C
Figure 2.5: Feedback LCADC topology with a capacitive DAC. The capacitive DAC reconstructsvxq(t) as before, and subtracts it from vx(t) capacitively so that the ZCD input range is reduced.
2.2 Feedback DAC
In Fig. 2.4, the feedback DAC can be implemented in several ways. In the original imple-
mentation [27], a capacitive feedback DAC was used in a structure similar to a classic charge-
redistribution SAR ADC [28]. The topology of an LCADC with a capacitive feedback DAC is
shown in Fig. 2.5. The advantage of a capacitive DAC is that it does not draw static power. On the
other hand, capacitors do not pass DC, and need to be periodically refreshed. In clocked systems
there are regular occasions to reset the capacitors, but in an asynchronous system the refresh needs
to be carefully placed in order to maximize the “listening” time of the circuit and preserve its fast
response to level crossings. During these refresh periods the ZCDs can be disabled to save power
with the zcdEN signal.
The alternative DAC structure is with resistive unit elements, either as an R− 2R ladder or
R-string (as used by [8, 10]). One advantage for resistive DACs is they tend to be smaller than
capacitive DACs, and therefore the inherently monotonic R-string structure is possible; the analo-
16
REF+
REF-
dacEN dacRST
to ZCDsIN
×2N-1
×1
×2N
DAC[7:0]
dacEN
dacRST
Figure 2.6: Schematic of the capacitive DAC, using a classic charge-redistribution array. Theoperation of the dacRST and dacEN signals will be explained shortly.
gous thermometric capacitive DAC would typically be area-prohibitive. The main disadvantage of
resistive DACs is constant current dissipation. Also, with R-string DACs the ZCD would need to
see the full input voltage range across its common mode, since the subtraction occurs implicity in
the ZCD itself. At low supply voltages, special wide-common-mode-range input stages [10] are
necessary to maximize the input signal range. The added complexity causes the ZCD performance
to suffer.
For a low power design the capacitive DAC was chosen, with the DAC schematic shown in
Fig. 2.6. The core charge redistribution array is identical to that of a conventional SAR introduced
in [28]. However the input is brought in through a series capacitor whose size is nominally equal to
the sum of the DAC array capacitance, rather than sampled directly on the SAR array. If the code
driving the DAC[7:0] bus is equivalent to −xq(t) (a bipolar output is possible because the DAC
has positive and negative references) then the voltage into the ZCDs is equal to (vx(t)− vxq(t))/2.
17
Since the maximum difference between these two voltages is ±VLSB/2, the capacitive DAC level
shifts the input so the comparison interval of the ZCDs is now fixed to a (−VLSB/4,+VLSB/4)
interval around VCM, the desired ZCD common mode voltage. Although in Fig. 2.5 the thresholds
are shown as±VLSB/2, the true voltages are indeed (−VLSB/4,+VLSB/4) due to capacitive division.
The reason that figure (and all subsequent figures, unless noted) show ±VLSB/2 is that VLSB is
referred to the input full-scale, and it simplifies the discussion.
Regarding the problem of when to refresh the capacitors, consider that for an N-bit LCADC
with a maximum sinusoidal input frequency fMAX , the minimum inter-sample spacing (granularity
time) is given by [10]
TGRAN =1
2Nπ fMAX. (2.4)
For our proposed 8-bit, 20 kHz system this gives TGRAN = 62 ns. This means that once a sample
occurs, the system has at least 62 ns of dead time before the next sample occurs, during which
it can reset the capacitors. For a back of the envelope calculation, we assume Cunit for the 8-bit
DAC is 10 fF; then the total DAC capacitance is 2.56 pF. Even if we want 10 time constants for the
voltages to settle, the time constant is 6 ns so the aggregate DAC switch resistance Ron is 2.4 kΩ,
easily achievable.
Figure 2.7 shows the REFRESH and T RACK timing sequence for the capacitive DAC. Once
a ZCD crossing occurs, the system enters the REFRESH state, where the input is disconnected
from the capacitor array by dacEN going low, and the whole capacitor array is reset by dacRST .
During the discharge of the array, a new code is loaded into DAC[7:0]. This sets the positions of
18
IN
ZCD
REFRESHTRACKTRACK
dacRST
dacEN
DAC[7:0]
zcdEN
+VLSB 2
-VLSB 2
Figure 2.7: Timing diagram of the control signals for the capacitive DAC. The ZCD signal refersto the varying ZCD input, either V+ for the upper ZCD or V− for the lower ZCD.
the switches to REF+ or REF− for the bottom plates; the capacitors are not actually connected
to the references until dacRST goes low and dacEN goes high. When the DAC is re-enabled, the
dacRST clamp does not overlap dacEN to ensure that proper charge sharing occurs between the
input voltage and the DAC voltage. The charge redistribution is allowed to settle for a few τ time
constants, after which point the circuit is in T RACK mode. In reference to Fig. 2.5, during the
REFRESH period the ZCDs are disabled to save power with the zcdEN signal, shown in the inset
of Fig. 2.7. One shortcoming of this approach is that the differential voltage at the input of the
ZCD is half that of the input–this reduces the input derivative and thus increases the response time
of the ZCD, as will be explained in the following chapter. The latter disadvantage of reduced ZCD
19
differential input is a small price to pay for having a ZCD common mode input range limited to
VCM±VLSB/2.
2.3 Zero Crossing Detectors
By using a capacitive feedback DAC, the only remaining block which dissipates static power is
the ZCD. For example, in [10] the R-string DAC dissipates 10 µA, and the remaining 40 µA are
burned in the ZCDs. Without regeneration, the choice of topology for the ZCD is fairly limited.
In monolithic ZCDs, the original LM106 [25] was two differential pair stages driving a TTL level
shifter. The more recent LMV7219 uses a folded-cascode open-loop amplifier followed by a dif-
ferential pair and a differential-to-single-ended converter [29]. Many other examples exist, but
they are without fail high-gain amplifiers optimized for open-loop application. Most monolithic
designs are bipolar, so they can get away with one or two stages, but in CMOS realizations often
three or more stages are needed (an inverter counts as a stage, since with insufficient preceding
gain, it will be biased in class A and dissipate static power) [8, 10, 19, 27, 30, 31]. While a few
of these designs [30, 31] use timing signals to disable the ZCD after a crossing occurs (thereby
implementing a quasi-latch), the core high-gain cascaded amplifier is almost identical among all
designs.
In order to minimize the power consumed by the ZCD, there are two approaches. One is
to simply reduce the power to the ZCD–up to the point that performance remains unaffected, as
analyzed in Chap. 3. The second approach uses the fact that after each sample, the ZCD can be
20
Cell Leakage Power (nW) Energy/Transition (fJ)2-NAND 2 9
2 2-3 OR-AND-INVERT 4 24D FLIP FLOP 13 32FULL ADDER 8 37
Table 2.2: Representative standard cell power consumption breakdown. Leakage is reported atVDD+10% and 125 C (worst case).
shut off for at least a period of TGRAN–the minimum inter-sample time spacing–without fear of
missing a sample, as shown using the zcdEN signal in Fig. 2.7. A solution to increase the TGRAN
and thus the ZCD off time is implemented in Chap. 4.
2.4 Digital Logic
So far the focus has been on the analog blocks as the dominant power consumers, as has been the
case in previously reported designs. Toward a low power solution, it is necessary to know how
much can be done digitally with less than 10 µW. Given in Table 2.2 is a sample of digital cell
power in 130 nm CMOS1, broken down into static leakage power and dynamic power.
For a full-amplitude 20 kHz input to the 8-bit LCADC, the digital circuits must process 2×
28×20 kHz = 10 MS/s. In the feedback topology LCADC in Fig. 2.4 there is an up-down counter,
which can be implemented as 8 concatenated full adders with an output 8-wide register to prevent
glitches in the output code. For a full-scale input, the counter continuously ramps up and down
over its full scale. In a binary-coded ramp (i.e. 001,010,100... etc.), there are on average two bit
1The numbers have been slightly adjusted to avoid directly reporting standard cell data.
21
changes per step. If we assume that only the adders whose outputs change consume power (not
true) then the dynamic power consumption is
PD = (2×37fJ/transition+8×32fJ/transition)×10MS/s (2.5)
= 3.3µW (2.6)
This is an optimistic estimate, since this number only counts internal switching power and not
loading effects nor wiring capacitance.
2.4.1 Interconnect Capacitance
Digital logic is often placed as far from the sensitive analog circuitry as possible to prevent switch-
ing noise from propagating through the substrate into high-impedance nodes, which exist in a
charge-redistribution topology. However, the distance between the feedback DAC and digital logic
adds large parasitics to the digital bus driving the feedback DAC, which must switch at the full
sample rate of the LCADC. Let us consider a simple example with an 8-bit binary bus, shown in
Fig. 2.8(a). For a full-scale sinusoidal input, the DAC code will ramp from 0 to 255 and back to 0
in one period (modulo 256). For a binary encoded bus, there are a total of 254 0→ 1 transitions
when counting from 0→ 255→ 0. Furthermore, there are 127 differential transitions per period: a
differential transition is defined as the situation where two adjacent wires transition simultaneously
in opposite directions. These matter because for the wire being charged, the coupling capacitance
to the adjacent wire counts double. Table 2.3 shows some rounded capacitances between a high
22
(a)
b7 b6 b5 b4 b3 b2 b1 b0
(b)
b0 b7 b1b6b5 b3b4b2
Figure 2.8: Different configurations of a binary bus.
Min Space 0.2 µm 0.5 µm 1 µm To Substrate100 60 30 20 50
Table 2.3: Approximate coupling capacitances in units of aF/µm, for 130 nm CMOS.
metal layer to the substrate, and between parallel wires in the same metal with different spacing.
Using the table, we can compute the power dissipated just in driving the bus capacitance. For an
isolated 0→ 1 transition, the capacitance which needs to be charged is Csub +2×CC, where Csub
is the capacitance from the line to substrate, and CC is the coupling capacitance to the two adjacent
lines, assuming the line is in the middle. If an adjacent line is switching 1→ 0, we add another
CC term to double-count that capacitance (and the 1→ 0 transition takes no power itself since it is
being discharged). Adding up all the transitions for a full period, the frequency and length normal-
ized energy dissipated in a minimum-spaced bus comes out to be 180 fJ/µm·Hz. For example, a
300 µm long bus requires 1.1 µW just to drive its capacitance when processing a full-scale 20 kHz
input. There is an additional unaccounted-for increase in power since the line capacitance slows
down the transitions times, increasing the cross-conduction currents in the bus receiver.
23
Bus Reordering
For the regular binary bus 50% of the total transitions were differential. It makes sense that if the
most active lines (LSB) were isolated from one another by interleaving them with more slowly
moving lines (MSB) the differential transitions could be reduced. By simply reordering the bus
lines from being sequential to b0,b7,b2,b5,b4,b3,b6,b1 as shown in Fig. 2.8(b) the number of
differential transitions reduces to 8%, and the normalized energy to drive such a bus reduces to
135 fJ/µm·Hz. For the same 300 µm long bus the driving power is now 810 nW. This is just a
simple example; the optimal order depends on the encoding method of the bus and the particular
switching patterns (even more advanced techniques exist if some bus latency is allowed [32]).
Selective Spacing
The inter-wire coupling capacitance can go to zero if we space the lines infinitely far apart, and
since the coupling capacitance is 80% of the total capacitance on a wire, this would be desirable–
unfortunately this would require infinite area. Instead, only the lines with the most activity are
isolated. Starting with a minimum-spacing bus, increasing all the line spacing to 0.5 µm from 0.2
µm reduces the energy from 180 fJ/µm·Hz to 80 fJ/µm·Hz, but triples the width of the bus, a costly
sacrifice. However, if only the three MSBs are spaced at 0.5 µm, the energy still drops to 105
f J/µm·Hz, a 40% savings, and the bus widens by 50%. Finally, if we combine the two ideas and
reorder the bus as well as space the outer two signals (b0 and b1) 0.5 µm from the other lines, the
24
energy is 95 fJ/µm·Hz. To compare, driving this bus at full speed consumes 570 nW, as opposed
to 1.1 µW for the baseline case.
Although the numbers above were just approximations, it is clear that toward an optimal
minimum-power implementation the total power consumed becomes the sum of many small con-
tributions rather than being dominated by a single block, and attention must be paid equally to all
parts of the system.
Chapter 3
Delay Distortion and Noise Analysis
In this chapter, the impact of ZCD nonidealities on LCADC performance is analyzed. Although
the math is tedious and at times opaque, ultimately simple results are achieved which help direct
the design of the ZCD.
3.1 Input-Dependent ZCD Delay
We first consider how the slope of the input signal at the time of zero crossing affects the delay
through the ZCD, a phenomenon called delay dispersion. Then, a mathematical model of an ideal
LCADC output is derived based on Fourier series, and these two results are combined to achieve
an estimate of the dispersion-induced distortion.
25
26
gm ro CL
vin voutVSWro
VSWgmro
vin
iout
Figure 3.1: A ZCD stage modeled with a saturating transconductor and single output pole.
3.1.1 A Model For ZCD Delay
In the context of the LCADC with a capacitive DAC, the ZCD has a very well-defined input to
process. From Fig. 2.7, we see that the ZCD input is comprised of a chopped signal, and to
simplify we can approximate each segment of the signal as a voltage ramp. For an 8-bit converter
with a VFS of 1 V, the LSB step is 4 mV so a small-signal approximation–at least of the first stage
of the ZCD–is valid [33]. A simple model for a ZCD stage is shown in Fig. 3.1. It models
a transconductor with a load capacitance, and a hard saturation that limits the output voltage to
±VSW . Writing AV = gmro and τ = roCL, we know the step response of this circuit is
vout,s(t) = AV
(1− e−
tτ
)· (1V). (3.1)
We are more interested in the ramp response. Again in the context of an LCADC application, the
ZCD input can be approximated as a linear ramp, whose total span is larger than the ZCD input
linear range. Let us imagine the input looks as in Fig. 3.2, where at initial time t = 0 the transcon-
ductor leaves its saturated state (meaning changes in the input voltage cause the transconductor
27
current to change). The ramp rises with a slope a, so that it will cross 0 at t =VSW/aAV seconds. If
the ZCD were ideal, the ZCD’s zero crossing would coincide exactly with the input’s zero crossing
at t =VSW/aAV seconds. However due to the time constant τ = roCL, the ZCD’s response will be
delayed. The actual response is given by the integral of the step response (3.1), with the initial
condition that vout,r(0) =−VSW :
vout,r(t) =−VSW −aAV τ+aAV t +aAV τe−tτ . (3.2)
This equation is plotted in the right side of Fig. 3.2. This can be algebraically manipulated to arrive
at a closed form solution for the delay between the output zero crossing and input zero crossing
given by
td = τW(−e−
(1+ VSW
aAV τ
))+ τ (3.3)
where W(·) is Lambert’s function [34], defined as a solution x = W(y) to the differential equation
y = xex. Unfortunately this is not too helpful since W itself does not have a closed form.
What is more interesting is if we analyze a few cases, for a fixed AV and τ. First, consider when
the input slope is very small. We are interested in the delay between the actual zero crossing, tc
in Fig. 3.2, and the ideal zero crossing at VSW/aAV seconds. To solve for tc, we begin with the
equation for the ramp response in (3.4) and solve the equation vout,r(tc) = 0, since we want the
time where the response crosses zero.
0 =−VSW −aAV τ+aAV tc +aAV τe−tcτ . (3.4)
28
VSWAV
VSWaAV
a
VSW
ZCD DELAY
t t
vin vout
tc
td
Figure 3.2: On the left, the ZCD input can be approximated as a series of voltage ramps. Thedashed horizontal grey lines indicate the input range of the ZCD where the output is not saturated.On the right, the actual ZCD response is superimposed over an ideal response in grey. The actualZCD zero crossing at time tc is delayed from the ideal.
Since the slope is very low, in absolute terms the zero crossing time tc is large; therefore the
exponential term is negligible so we are left with
0 =−VSW −aAV τ+aAV tc (3.5)
tc =VSW
aAV+ τ (3.6)
We can thus express the difference between tc and the ideal zero crossing as td , the ZCD delay:
td = τ. (3.7)
What this says is that a first order system responding to a linear ramp will eventually settle to a con-
stant τ time shift between the input and output. To quantify what a small input slope means, recall
29
that the ideal zero crossing does not occur until VSW/aAV , so if VSW/aAV τ, this approximation
(the exponential is negligible) is valid.
If we consider the alternative scenario where the input slope a is very high, then the ideal zero
crossing time VSW/aAV τ. To solve (3.4) to get tc, again we want to solve vout,r(tc) = 0, but this
time we need to expand the exponential term into its Taylor series
0 =−VSW −aAV τ+aAV tc +aAV τ
(1− tc
τ+
t2c
2τ2 −t3c
6τ3 . . .
)(3.8)
0 =− VSW
aAV τ−1+
tcτ+
(1− tc
τ+
t2c
2τ2 −t3c
6τ3 . . .
)(3.9)
VSW τ
aAV=
t2c2+
(− t3
c6τ
+t4c
24τ2 . . .
)(3.10)
tc ≈√
2VSW τ
aAV(3.11)
Again we are finally interested in td , the delay between the actual ZCD zero crossing time tc and
the ideal crossing time, expressed as
td ≈√
2VSW τ
aAV− VSW
aAV. (3.12)
Note that AV/τ is the unity gain bandwidth of this single-pole amplifier, so if we kept that con-
stant and let AV → ∞ and 1/τ→ 0 then the system behaves more and more like an idea integrator,
looking at the frequency response in Fig. 3.3. When this happens, the high order terms of the Tay-
lor expansion in (3.10) go to 0, and the response of the system to a ramp is quadratic, as expected.
Then the approximation (3.12) becomes exact.
30
decreasing bandwidth
w
|A(jw)|
incr
easi
ng
gai
n
Figure 3.3: Reducing bandwidth and increasing gain proportionally cause the first-order systemto behave more and more like an ideal integrator.
From the above approximations we can characterize the ZCD’s response by two simplifications,
depending on the relation between the time the ZCD input is not saturating the transconductor
(VSW/aAV , shown in Fig. 3.2(a)) and the time constant τ of the ZCD. If VSW/aAV τ, then
the ZCD delay between ideal zero crossing and actual zero crossing is roughly constant, τ. We
call this the constant delay approximation. On the other hand, if VSW/aAV τ, then the ZCD
behaves like an ideal integrator, and the ZCD delay decreases by (3.12). We call this the integrator
approximation. The two approximations for the delay are shown in Fig. 3.4, overlaid atop the
exact delay expression from (3.3). For an example single stage with VSW = 0.4 V, a bandwidth of
5 MHz, and AV = 10, then the input slope that divides the output response between the constant
delay approximation and the integrator approximation is a = VSW/τAV = 1.2 V/µs. Note that for
steady-state LTI system analysis, all that is needed is the frequency of the input to characterize the
31
103
104
105
106
107
108
109
10−1
100
101
102
Input Slope (V/s)
Del
ay (
ns)
Exact DelayConstant Delay ApproximationIntegrator Approximation
Figure 3.4: Delay vs. input slope, and the two approximations in (3.5) and (3.12).
system response. Since we do not have a steady-state input, both the derivative and the amplitude
of the input affect the ZCD’s response.
This figure suggests that a modest bandwidth of 5 MHz can support a 1 V peak 20 kHz sine
wave with no delay variation since dv/dt|MAX = 125 kV/s. However, there is the practical issue
of gain. With an ADC LSB of 4 mV, the ZCD needs at least 48 dB of gain just to amplify an LSB
signal to logic levels, and in reality much more. So then the ZCD delay becomes the sum of several
stage delays, and the slope at the input of each stage increases down the chain. Figure 3.5 shows
32
103
104
105
106
107
108
109
10−2
10−1
100
101
102
Input Slope (V/s)
Del
ay (
ns)
First StageSecond StageThird StageTotal DelaySingle Stage Approximation
Figure 3.5: Delay of a three stage cascaded amplifier, by stage. Also shown is the delay of singlestage that has the same overall gain and bandwidth as the cascade.
the delay through the same amplifier, cascaded 3 times. The total delay is shown, as well as what
a single stage delay would be, had it the same -3 dB bandwidth and gain of the three cascaded
stages. The single stage approximation severely underestimates the overall delay since in the real
amplifier the signal must physically travel through three times as many devices [21]. Therefore the
simple model of a ZCD stage shown in Fig. 3.1 is useful for estimating a single stage’s delay as a
function of small-signal parameters gm, ro, etc. but cannot be directly applied to a cascade. As a
33
result, in the design process of the ZCD presented in sec. 4.3, the final device sizes were achieved
through optimization scripts.
3.1.2 Harmonic Distortion of a Level Crossing Quantizer
We now consider how the slope-varying ZCD delay affects the spurious performance of the LCADC.
Although in section 2.1 we chose an LCADC topology which only uses two ZCDs in a feedback
loop, for analysis we can still use a flash ADC as shown in Fig. 2.1–during normal operation the
level shifting in the feedback structure (shown in Fig. 2.4) is transparent to the ZCD and we can
think of the flash LCADC as the feedback LCADC “unrolled.” In the previous section, we de-
fined the LEVj binary signal as the output of the jth ZCD. While this is the most straightforward
decomposition, by using our assumption that the quantization thresholds are uniformly spaced and
symmetric around 0, we can conveniently group pairs of positive and negative thresholds together,
as demonstrated in Fig. 3.6 for 8 quantization levels. This results in half the number of signals,
and odd-symmetric waveforms for pure sinusoidal inputs. Defining a set of positive thresholds as
Tm =1
2N +m
2N−1 , for integer m ∈ [0,2N−1−1], (3.13)
34
then the input signal is decomposed into simple signals
T Rm(t) =
−1 if vx(t)<−Tm,
+1 if vx(t)> Tm,
0 otherwise.
(3.14)
The index m indicates the distance of the threshold from 0; m = 0 are the thresholds closest to
0, and m = 2N−1− 1 corresponds to the thresholds closest to ±1. Using this decomposition, the
continuous time reconstruction of the quantized signal is just the scaled sum of all the T Rm levels,
vxq(t) =1
2N−1
2N−1−1
∑m=0
T Rm(t). (3.15)
If we assume we have a periodic input vx(t), then each of the T Rm(t) signals are periodic, and
by linearity we can then write the Fourier series of the reconstructed quantized output vxq(t) as the
summation of the Fourier series for each T Rm(t), precisely
F
vxq(t)=
∞
∑n=−∞
fne−2π jnt (3.16)
35
T0
TR0
TR1
TR2
TR3
-T0-T1-T2-T3
T1T2T3
Figure 3.6: With symmetric quantization levels, it is possible to express the quantized signalwith 2N−1 signals T Rm(t). For an odd-symmetric input, each of the T Rm(t) signals is also oddsymmetric.
where fn is the nth Fourier component of vxq, expressible as
fn =2N−1−1
∑m=0
cm,n, (3.17)
cm,n = Fn T Rm(t) (3.18)
The notation Fn T Rm denotes the nth Fourier coefficient of T Rm(t). This is a useful decompo-
sition, because it allows us to analyze the distortion of the level-crossing quantization with simple
functions. If we have a pure sinusoidal input then ideally the Fourier series of the output F
vxq(t)
will have only one non-zero coefficient, f1. However since the quantization introduces some dis-
tortion we will have harmonics of the input at f3, f5, f7. . . (with a symmetric quantizer we should
36
ideally only have odd-order distortion). The above decomposition allows us to express the Fourier
coefficients and therefore distortion components as the sum of the cm,n Fourier coefficients of T Rm
signals, which are easy to compute.
Appendix A.1 derives the Fourier coefficients cm,n for a full-scale, frequency normalized sinu-
soidal input vx(t) = sin(2πt) as
cm,n =
2cos(narcsin(Tm))
π jn2N−1 , n ∈ [1,3,5 · · ·)
0, otherwise.(3.19)
This is intuitively satisfying, because (a) there are only odd harmonics which is what we would
expect from a signal with odd symmetry, and (b) the coefficients cm,n are purely imaginary, which
means the Fourier series is a real sine series (signals with odd symmetric have sine series, signals
with even symmetry have cosine series). We can simplify the Fourier expansion of vxq(t) to:
vxq(t) =∞
∑n=1
2N−1−1
∑m=0
bm,n sin(2πnt) (3.20)
bm,n = j(cm,n− cm,−n) =
4cos(narcsin(Tm))
πn2N−1 , n ∈ [1,3,5 · · ·)
0, otherwise.(3.21)
As a sanity check, if we let N = 1 and Tm = 0, then this equation boils down to bn = 4/nπ, which
are the Fourier coefficients of a square wave. The partial Fourier sums of this series are shown in
Fig. 3.7. Observe that quite a few terms need to be included before the step-discontinuities of
the quantizer levels are accurately modeled-this is due to the fact that the many small steps (512
37
0.22 0.23 0.24 0.25 0.26 0.27
0.97
0.975
0.98
0.985
0.99
0.995
Time (s)
Vol
tage
(V
)
Input150 Terms800 Terms4000 Terms
Figure 3.7: The Fourier series approximation of vxq(t) for an almost full-scale time-normalizedsinusoid input, for 150, 800, and 4000 Fourier terms.
per period for an 8-bit quantizer) of the LCADC waveform have a lot of high frequency content
beyond the fundamental.
Another aspect of the LCADC waveform which can be explained by the Fourier series is the
extreme input-amplitude-sensitivity of the harmonic distortion components. In Fig. 3.8, the SFDR
(defined as the difference in dB between the fundamental component and the largest distortion
component) is plotted as a function of the input amplitude. The sharp peaks and valleys of this
curve, over an extremely narrow input amplitude window of 2 LSB from full scale, can be ex-
38
0.985 0.99 0.995 1
−100
−95
−90
−85
−80
−75
−70
Input Amplitude (V)
SF
DR
(dB
)
Figure 3.8: Wide variation in SFDR for small changes in input amplitude. The solid verticalline represents a level of the reconstructed quantized signal, and the dotted lines are the quantizerthresholds. The figure spans two LSB of the quantizer input range.
plained as follows. In [35] the LCADC quantization error waveform was viewed as the sum of
two components, a high-frequency sawtooth wave corresponding to the intervals in which the in-
put is quickly moving, and a low-frequency bell shaped wave, corresponding to the local peaks
of the wave. By varying the input amplitude slightly, the sawtooth (high frequency) is mostly un-
affected, while the bell wave is strongly affected, which modulates the low-frequency distortion
39
components. In practical situations, because these peaks and valleys are so narrow, the measured
spurious performance of the LCADC is some sort of average of the extrema.
3.1.3 Delay-Dispersion-Induced Harmonic Distortion
As derived in Sec. 3.1.1, a real ZCD has a delay which varies depending on the slope of the input
signal. The variation of ZCD delay is called delay dispersion. We can incorporate the appropriate
signal-dependent delay into each of the Fourier components of (3.18), written generally as
cm,n = Fn
T Rm
(t− f (v′x(t))
). (3.22)
If we return to a pure sinusoid, as shown in Fig. 3.6 each T Rm(t) signal has two rising transi-
tions and two falling transitions per period. At each of these transitions the magnitude of the input
slope is the same, by symmetry of the sine wave. Therefore each T Rm(t) signal is delayed by a
constant amount of time, dm. Combining equations (3.16,3.17,3.18) we can write
F
vxq(t)=
∞
∑n=−∞
2N−1−1
∑m=0
Fn T Rm(t−dm)e−2π jnt . (3.23)
We note that a constant shift in time dm of a periodic function results in a phase rotation by
e−2π jndm of the nth Fourier coefficient, so we simplify to get
40
F
xq(t)=
∞
∑n=−∞
2N−1−1
∑m=0
Fn T Rm(t−dm)e−2π jnt (3.24)
=∞
∑n=−∞
2N−1−1
∑m=0
e−2π jndmFn T Rm(t)e−2π jnt (3.25)
=∞
∑n=−∞
2N−1−1
∑m=0
e−2π jndmcm,ne−2π jnt (3.26)
From this point onward, to unclutter we omit explicit bounds of summation, with the understand-
ing that summation over the n index is on (−∞,∞) and that summation on the m index is on
[0,2N−1−1]. In appendix A.2 we derive that the power of the nth harmonic distortion component
of F
xq(t)
is equal to the power of fn, the nth distortion component of the ideal quantizer, plus
a term en equal to the extra distortion due to delay dispersion
en = 2πn ∑r 6=s
dr,sIcr,ncs,n−2π2n2
∑r 6=s
d2r,sR cr,ncs,n, (3.27)
where dr,s is defined as the difference between ZCD delay for the T Rr level and the T Rs level. We
can use this to predict the increase in distortion power as a function of dr,s in order to determine
the maximum dr,s allowed for a given distortion specification. To do so, we first have to define the
dm delays. The delay variation is modeled as a linear slope-dependent variable,
d(t) = TDEL−TSPREAD
v′x,MAXv′x(t), (3.28)
41
where TDEL is the absolute delay, and TSPREAD is the difference between maximum and minimum
delays. and v′x,MAX is the maximum derivative of the input to normalize the delay dispersion. As
derived in appendix A.3, for a full-scale frequency normalized input, the magnitude of the slope at
each of the T Rm(t) transitions is 2πcos(arcsin(Tm))). We can therefore write
dr,s = 2πTSPREAD
v′x,MAX(cos(arcsin(Tr))− cos(arcsin(Ts)))). (3.29)
If we plug (3.29) into (3.27), we can plot the effect of delay dispersion on harmonic distortion,
as shown in Fig. 3.9. We see that the close-in terms are most affected.
For a fixed dispersion, the effect is more pronounced at higher input frequencies, since a fixed
time delay results in a larger phase error at higher frequencies. Figure 3.10 shows the variation
in SDR for three different input frequencies on a log scale, necessary to see all plots clearly. For
this example, SDR is defined as the power of the signal divided by the sum power of the first 500
harmonics. To keep the delay variation from affecting SDR by more than 1 dB at all frequencies
the ZCD dispersion must be kept below 60 ns.
To use this information to specify the ZCD, recall from (3.5) and (3.12) that the maximum
delay for low-slope inputs is τ, the time constant of the single-stage ZCD, but goes by the square
root of the input slope for high-slope inputs. Unfortunately with a cascade of stages there is no
simple solution for overall delay. The delay dispersion of a chain of 3 stages with a total gain of
80 dB, with a worst-case 20kHz 1 V peak input is plotted using (3.3) in Fig. 3.11. From this plot
it is seen that each stage needs a bandwidth greater than 3.5 MHz for a dispersion less than 60 ns.
42
5 10 15 20 25 30−80
−75
−70
−65
−60
−55
−50
−45
Harmonic Number (× fin
)
Rel
ativ
e P
ower
(dB
c)
No Delay Dispersion100 ns Delay Dispersion200 ns Delay Dispersion
Figure 3.9: The first harmonics of the LCADC output with ZCD delay variation. The input is a20 kHz 1 V peak sine wave.
At this point, it is worth noting that a cascade of three 3.5 MHz amplifiers can have an absolute
delay up to 140 ns, which is larger than the required TGRAN discussed in section 2. This means that
in the generic LCADC topology shown in Fig. 2.4, the requirement on absolute ZCD delay given
by Eq. (2.4) is a more stringent specification than the delay dispersion requirement. It is not until
TGRAN is increased thanks to the adaptive resolution algorithm introduced in section 4.1 that the
dispersion becomes a limitation and must be compensated.
43
20 40 60 80 100 120 140 160 180 20010
−6
10−5
10−4
10−3
10−2
10−1
100
101
Delay Dispersion (ns)
SD
R D
egra
datio
n (d
B)
200 Hz Input2 kHz Input20 kHz Input
Figure 3.10: Impact of dispersion on SDR, 500 harmonics considered. The input is a full scalesinusoid of the specified frequency.
3.2 Noise Analysis
3.2.1 ZCD Noise
We have just considered the effect of a deterministic, signal-dependent delay in the ZCD output
on the fidelity of the reconstructed quantizer output vxq(t). Now, we analyze the effect of random
variations in delay time, i.e. jitter, on vxq(t). Before proceeding, we define the notation of the pulse
44
1 2 3 4 5 6 7 8 9 1010
1
102
103
Stage Bandwidth (MHz)
Del
ay D
ispe
rsio
n (n
s)
Figure 3.11: Dispersion as a function of stage bandwidth for a 3-stage cascade of identical single-time-constant stages.
shown in Fig. 3.12:
p0(∆, t) =
−1 when ∆ < 0, ∆≤ t < 0
+1 when ∆≥ 0, 0≤ t < ∆
0 otherwise.
(3.30)
45
|D|
1
|D|
-1
D≥0 D<0
Figure 3.12: Visualization of the function p0(∆, t) defined for all real ∆.
s0 s1 s2 s3
s2(N+1)s2(N+1)-1
Figure 3.13: Within one period of a full-scale sine wave there are 2N+1 level crossings, whichoccur at times si relative to the beginning of each period.
By time-shifting and scaling a square pulse we get the Fourier transform of p0(∆, t) as [36]:
P0(∆, f ) = e− jπ∆ f∆sinc(π∆ f ). (3.31)
If we feed a noiseless N-bit quantizer with a full-scale input of frequency fin, then the sample
times themselves form a periodic sequence. The input is periodic with time 1/ fin, so over one
period without loss of generality from 0≤ t < 1/ fin, we have a length 2N+1 sequence of crossing
times s0,s1,s2 . . . such that the crossing of the first positive level closest to 0 is s0, the next level
from 0 at time s1, etc. and this sequence repeats at multiples of 1/ fin. This is depicted in Fig. 3.13.
46
(a)
(b)
(c)
Figure 3.14: Decomposition of a noisy T Rm(t) signal (a) into the ideal noiseless T Rm(t) signal(b) and a pulse sequence representing jitter due to noise (c).
Now let us suppose that when resolving the ith zero crossing of the first period from 0 (at time si),
there is a disturbance in the ZCD which delays its decision time by a time ∆i. Then we can write
the resulting quantizer output (with disturbance) vxqn(t) using the p(∆, t) notation:
vxqn(t) = vxq(t)+1
2N−1 p0(∆i, t− si) (3.32)
The ith level is delayed which is equivalent to appending a small pulse of time ∆i whose height
is equal to the width of each level. Note this is valid for negative ∆i as well, since in this case
the negative pulse subtracts from the level, causing the crossing to occur earlier. From Eq. (3.15)
we know the input can be decomposed into a sum of three-level signals T Rm(t). Each of these
signals is derived by comparing the crossings of the input signal with different thresholds Tm, in
the topology of Fig. 2.5. Therefore, each of the transitions of each T Rm(t) is corrupted by noise.
We can then express the actual T Rm(t) with noise as the sum of an ideal noiseless T Rm(t) signal,
plus a random pulse sequence as shown in Fig. 3.14.
47
Let us define the sequence ∆i,k, which is a Gaussian sequence of independent samples of noise,
which we assume to be cyclostationary on the i index: the suffix i,k refers to the jitter on the ith
crossing of the kth period [k/ fin,(k+1)/ fin) of vx(t). In the context of the LCADC, cyclostation-
arity is reasonable since after each crossing the ZCD resets itself so each sample is independent
from the previous, and the only difference from sample to sample is the slope of the ZCD input,
which is a periodic function. From [37] we can decompose the cyclostationary process ∆i,k into a
stationary process and its envelope, namely
∆i,k = αi( fin)∆τi,k. (3.33)
From the previous section we know the delay is a function of input slope. In our ZCD model
from Fig. 3.1, let’s assume that when the input is saturating the transconductor the voltage at the
output is clamped to −VSW . As the output voltage leaves the clamped state, random noise causes
the output voltage to cross 0 at a different time, thus disturbing the zero crossing randomly. This
is shown in Fig. 3.15. The exact calculation of the statistics of a noisy signal tc seconds after an
initial condition is the solution to a stochastic differential equation [38] and is too complicated to
lend insight. Instead, we can approximate the circuit in Fig. 3.1 as a white current noise source
(modeling drain current and resistor current noises) being integrated onto the capacitor. We know
that for a white noise current charging a capacitor, jitter ∝√
delay, [33, 39, 40]. Thus we can
evaluate the jitter of the ZCD for the maximum input slope, and the samples ∆τi,k are the samples
of jitter at that slope. Then αi( fin) is the scaling factor for each of the i level crossings, which
48
VSW t
vouttctc
Figure 3.15: Noise corrupts the path of the ZCD output voltage, causing the zero crossing to bedisturbed randomly.
decreases as fin increases since the ZCD delay decreases, and vice versa. Then we can rewrite the
noisy quantizer output as
vxqn(t) = vxq(t)+1
2N−1 vn(t). (3.34)
From the above discussion and Fig. 3.14, the noise term vn(t) is the sum of scaled shifted
pulses, the statistics of which are cyclostationary with period 1/ fin. Expanded fully, we can write
vn(t) =∞
∑k=−∞
2N+1−1
∑i=0
p0
(αi( fin)∆τi,k, t− si−
kfin
). (3.35)
The total added noise is the linear sum of pulse trains, repeating at fin, with pulses of random width
αi( fin)∆τi,k. To analyze such pulse trains, typically the approximation is made that the pulse width
αi( fin)∆τi,k is much smaller than the period, 1/ fin [41]. The Fourier transform of this pulse given
in (3.31) substituting ∆ = αi( fin)∆τi,k. The transform is a sinc function with a main lobe width
of 1/αi( fin)∆τi,k, so by saying the pulse width is much narrower than the period is equivalent
to saying the pulse energy is broadband compared to the modulation frequency fin, and hence it
49
appears white within the band of interest. Thus the pulse train is approximated as a Dirac delta
train, whose amplitudes are modulated by the random sequence ∆τi,k:
xn(t) =2N+1−1
∑i=0
αi( fin)∞
∑k=−∞
∆τi,kδ
(t− si−
kfin
). (3.36)
The sequence of points can be thought of as the sum of 2N+1 independent, identically dis-
tributed sub-sequences at rate fin. Each of these sub-sequences is scaled by αi( fin), and the
integrated noise N2n,i of each sequence over a band [0, fBW ) is given by
N2n,i = fBW finσ
2n, (3.37)
where σ2n is the jitter variance of the ZCD at maximum input slope. The appearance of fin in this
equation represents that within a 1 second interval, fin pulses occur, thus in this case it is a unit-less
multiplier. So the total noise power due to ZCD jitter over a [0, fBW ) bandwidth is
Snn( f ) =1
22N−2
2N+1−1
∑i=0
α2i ( fin) fBW σ
2n (3.38)
=1
22N−2 α2RMS( fin) fin fBW σ
2n. (3.39)
Note that this analysis follows closely the noise analysis for oscillators presented in [40]. Since the
50
fundamental has amplitude 1 and therefore power 1/2, the SNR is given by
SNR = 10log10
[2N−3
α2RMS( fin) fin fBW σ2
n
](3.40)
The SNR improves by 3 dB for every bit of resolution. This is because for every bit, the number of
uncorrelated noise sources doubles (which add in variance) yet the amplitude of each noise source
has halved since the LSB step halves, which results in a quarter the power, yielding a net 3 dB
improvement in SNR.
The validity of this approximation depends on the jitter magnitude with respect to the whole
period, (1/ fin). A model in Matlab was built using the precise noise model in (3.35), for an 8-bit
LCADC with 35 ns RMS ZCD jitter (much larger than anticipated, to exacerbate the difference
between approximation and actual). The ZCD input-dependent delay was modeled using the ap-
proximation in (3.28). The system was fed a full-scale 10 kHz tone, and the bandwidth of SNR
calculation was 20 kHz. The precise zero-crossings were calculated analytically, then re-quantized
to a 2 GHz sampling clock to enable an FFT operation. As seen in Fig. 3.16, looking over a very
wide bandwidth the approximation is invalid. On the other hand, it is unlikely a system designed to
process signals up to 20 kHz would have a bandwidth of 100 MHz. Looking at the same spectrum
over a 2 MHz bandwidth in Fig. 3.17, we see the two curves coincide. The precise SNR is 71.9
dB, and SNR with the impulse approximation 71.6 dB (over a 20kHz bandwidth).
With the same system, when the input frequency is reduced to 1 kHz, the same plots in Figs.
3.18 and 3.19 show similar results: now both the real model and approximation model give 75.8
51
0 10 20 30 40 50 60 70 80 90 100−140
−120
−100
−80
−60
−40
−20
0
Frequency (MHz)
Spe
ctru
m (
dB)
Precise Jitter ModelImpulse Jitter Approximation
Figure 3.16: Simulated LCADC noise with ZCD jitter, over a 100 MHz span with a 10 kHz input.Resolution bandwidth is 100 Hz.
dB SNR with the same 35 ns jitter in the ZCD (again, over a 20 kHz bandwidth). The reason the
SNR does not improve by 10 dB as suggested by 3.40 is that the value of αRMS( fin) increases as fin
decreases, due to the longer delays through the ZCDs, and hence greater crossing jitter variation.
It should be noted that although the developed model concerns ZCD jitter-induced output noise,
other noise sources in the circuit such as jitter in the asynchronous logic and switch noise in the
capacitor DAC can be integrated into this model with little effort.
52
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2−120
−100
−80
−60
−40
−20
0
Frequency (MHz)
Spe
ctru
m (
dB)
Precise Jitter ModelImpulse Jitter Approximation
Figure 3.17: Detail of Fig. 3.16 over a 2 MHz span with the same RBW = 100 Hz. At lowfrequencies, the impulse approximation is still valid.
53
0 10 20 30 40 50 60 70 80 90 100−160
−140
−120
−100
−80
−60
−40
−20
0
Frequency (MHz)
Spe
ctru
m (
dB)
Precise Jitter ModelImpulse Jitter Approximation
Figure 3.18: Simulated LCADC noise with ZCD jitter, over a 100 MHz span with a 1 kHz input.Resolution bandwidth is 10 Hz.
54
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2−140
−120
−100
−80
−60
−40
−20
0
Frequency (MHz)
Spe
ctru
m (
dB)
Precise Jitter ModelImpulse Jitter Approximation
Figure 3.19: Detail of Fig. 3.18 over a 2 MHz span, with RBW = 10 Hz. Note the LCADCdistortion components have moved closer in, but other than that the impulse approximation stillpredicts the noise level.
Chapter 4
Design of a µW Programmable LCADC
This chapter describes the implementation of a 20 kHz, 8-bit LCADC in 0.13 µm CMOS. In the
following design, due to incompatibility between the technology kit and available CAD tools, no
post-layout extraction was possible. Not knowing the parasitic capacitances meant many circuit
blocks had to be designed conservatively, and thus consume more power than necessary.
The top level block diagram is shown in Fig. 4.1. What follows is a high-level overview of
the system operation, with an in-depth block-by-block discussion in the following sections. While
Fig. 4.1 bears a resemblance to the prototypical capacitor feedback DAC LCADC structure in Fig.
2.5, there are a few additional blocks. The underlying operation is the same: during operation, a
pair of capacitive DACs perform the subtraction of the input signal with the analog reconstructed
digital output. This difference is nominally contained within a comparison window by the ZCDs.
When the difference grows large enough, indicating that the analog input has diverged from the
55
56
reconstructed output, one of the ZCDs triggers, causing the ASYNC TIMING block to update the
digital output value, as well as the value to be fed back to the capacitive DAC.
The improvements upon the basic architecture are briefly outlined. When a ZCD crossing
occurs, the ASYNC TIMING block triggers computations in the ADAPTIVE RES CONTROL
block which dynamically varies the LCADC resolution on a sample-by-sample basis, in a manner
that reduces the average number of samples taken (versus a non-adaptive LCADC [8,10,27]) while
having minimal impact on the fidelity of the digitization. The need to support a varying resolution
is why the capacitive DAC is split into binary and thermometric banks. The ASYNC TIMING
block also causes the ZCD BIAS CONTROL block to adjust the ZCD bias current in order to
partially cancel the effect of input-slope-dependent delay dispersion through the ZCD, which as
was discussed in Sec. 3.1.3 is a source of distortion in an LCADC. The digital bias values from the
ZCD BIAS CONTROL block are converted through a pair of DACs to bias currents for the ZCDs.
In order to support all the digital computations performed by the sub-blocks in the ASYN-
CHRONOUS CONTROLLER without resorting to an always-on clock, a delay-line based timing
generator is used. The edges generated by this block are synchronized by the ZCD triggers, and
therefore signal-dependent and asynchronous with respect to absolute time; an input which pro-
duces samples sparingly produces correspondingly fewer edges which clock the digital, and there-
fore input-activity dependent digital power consumption is achieved. The delay elements have
individually controllable delays through an array of bias DACs, such that the adaptive algorithms
can be tailored to the expected input characteristics.
Rounding out the IC is a separate digital block which is clocked from an integrated oscillator.
57
The oscillator is required to drive the calibration algorithm that cancels the ZCD offsets; because
the calibration does not need to be run continuously, it is enabled only to perform the calibration
cycle and disabled immediately afterward. Also contained in the block is a standard SPI interface
used to program the IC and read off various internal states during operation.
In the sections that follow, the detailed operation of each of these blocks will be discussed.
58
ZC
D B
IAS
DA
CS
ZC
D B
IAS
CO
NT
RO
L
AD
AP
TIV
E
RE
S
CO
NT
RO
L
AS
YN
CH
RO
NO
US
CO
NT
RO
LL
ER
CAL DAC
DE
LA
Y B
IAS
DA
CS
CA
LIB
RA
TIO
NS
PI
SY
NC
HR
ON
OU
S D
IGIT
AL
IN
OU
T
tt
t
PR
OG
RA
M
OS
CIL
LA
TO
R
+-+-
BIN
AR
Y D
AC
S (M
SB
)T
HE
RM
OM
ET
RIC
DA
CS
(LS
B)
INP
UT
SW
ITC
H
dacRST
dacEN
zcdEN
AS
YN
C
TIM
ING
zcdEN
dacRSTdacEN
TIM
ING
GE
NE
RA
TIO
N
CA
PA
CIT
OR
FE
ED
BA
CK
DA
C
ZC
D
BIASHBIASL
BINARY[4:0]
THERMH[23:0]
THERML[23:0]
TIME0[3:0]
TIME10[3:0]
TIM
ING
ED
GE
S
UP
DN
Figure4.1:
ToplevelL
CA
DC
blockdiagram
.
59
4.1 Adaptive Resolution Algorithm
Since the minimum inter-sample timing TGRAN is a function of input derivative, the issue of narrow
sample spacing only occurs for large-amplitude, high-frequency inputs. For a full scale 100 Hz
input, TGRAN is 12.6 µs, and likewise for a 20 kHz -40 dBFS input, TGRAN is 6 µs–both of these are
far from the minimum TGRAN of 62 ns, obtained for a full-scale, 20 kHz input. Therefore, the full
speed (and power) of the ZCD is only needed for a limited set of inputs, and when the input activity
is reduced so are the ZCD requirements. Fortunately as demonstrated in [42], it is possible to skip
samples during high slope inputs without a significant impact on in-band fidelity, and therefore
get away with a lower-power ZCD. Intuitively, the quantization error from high-slope regions of
the curve contributes to distortion at very high frequencies; thus reducing the “effective” sampling
rate in these portions will only increase the out-of-band error. By skipping samples during high-
slope portions, the ZCD requirements can be relaxed exactly where they are most stringent, so the
average ZCD performance requirements are also relaxed (and therefore the ZCDs consume less
power).
Two proposed implementations of a variable-resolution quantizer for CTDSP applications sug-
gested in [42] are shown in Fig. 4.2. In the 4.2(a), a continuous time derivative of the input is
digitized and controls the resolution of the main ADC. In the more digitally-oriented implemen-
tation 4.2(b), the slope is estimated by comparing the inter-sample time with known time delays,
and if the inter-sample time is lower than a certain threshold the resolution is reduced. The first
implementation unfortunately does not solve our problem of ZCD response time-it just moves it to
60
LCADC
RES
CTL
LCADC
RES
CTL
ttt
(a) (b)
DERIV
LCADC
Figure 4.2: Two methods for reducing the sampling rate.
a different ZCD. The derivative is estimated by a simple CR filter, and the output is quantized by a
coarse flash LCADC, whose value is used to determine the resolution of the main LCADC. How-
ever the ZCDs in the flash LCADC require full speed at all times, and thanks to the added delays
in this controlling path, the ZCD in the controller needs to be faster than the main ZCD would oth-
erwise need to be. In addition, unlike most other analog blocks whose power consumption scales
with resolution, even though the controller path ZCD only needs to quantize a few levels, its power
consumption is still dominated by the static bias current required to get the speed (response time).
In the second proposal, a digital block(RES CTL) computes an estimate of the derivative of the
output signal. In order for the controller to determine that the input slope has increased the LCADC
itself must be able to produce samples at the higher rate, which in turn requires a fast ZCD. This
returns us once again to the problem of high power consumption in the LCADC itself. The fact that
neither of the proposed techniques directly solves the ZCD power problem in LCADCs is under-
standable since the focus of [42] was reducing the power consumption in the subsequent CTDSP
through reduced sample throughput, for which both techniques are ideally suited. The main find-
61
+
-
+
-
VIN
VH
VL
VH
VL
1 LSB STEP
(HIGHEST
RESOLUTION)
5 LSB STEP
3 LSB STEP
VIN
RECONSTRUCTED
QUANTIZED OUTPUT
1×LSB
3×LSB
5×LSB
7×LSB
WINDOW IS WIDEST
IMMEDIATELY AFTER
A SAMPLE OCCURS
NARROWING
WINDOW OVER TIME
TD0
TD1
TD2
(a) (b)
Figure 4.3: The behavior of an ideal adaptive resolution scheme (a) and the proposed (b).
ing that reducing sampling rates during high slope inputs minimally affects in-band fidelity is still
valid, but a different method needs to be sought in order to reduce LCADC power consumption
itself.
Figure 4.3 shows the basic idea of the proposed adaptive resolution (AR) algorithm. We begin
with Fig. 4.3(a), which shows an ideal adaptive resolution controller. We define the “leading
boundary” as the comparison voltage the input is expected to cross should its derivative not change
sign during the current sample, and the “trailing boundary” as the other voltage. In the ideal case,
the leading boundary increments on each sample by an amount proportional to the slope of the
input. Likewise, the LCADC’s output values predict the trajectory of the analog input based on
information at the current sample, visible in Fig. 4.3(a), each time the dotted line (reconstructed
quantized output) leads the solid black line (analog input). The problem with this implementation is
62
that a lot of information is required about the signal. A first derivative allows the system to precisely
predict level crossings for linear (ramp) inputs. In order to accurately predict level crossings for
more complex inputs, more derivatives are needed. Obviously, in this energy-constrained system
it is not feasible to compute and process multiple signal derivatives on a per-sample basis.
Moving toward an algorithm with low implementation complexity, the approach adopted for
this work is shown in Fig. 4.3(b). Instead of predicting when the next sample will occur (and
hence the resolution required), immediately after each sample the (VL,VH) comparison interval is
widest. A level crossing which occurs while the comparison window is wide is quantized with low
resolution. If TD0 seconds after the previous level crossing a new level crossing has not occurred,
the comparison window is narrowed one LSB step. This is equivalent to incrementally increasing
the ADC resolution. Until a crossing occurs, the quantizer resolution continues to increase over
time until the quantizer is in its highest resolution state. In this manner, high-slope inputs result
in boundary crossings soon after the previous sample, when the quantizer is in a low-resolution
state, while slow-moving inputs produce crossings when the quantizer resolution is higher. This
behavior is consistent with the proposed variable-resolution algorithms presented in [42], although
the implementation is different. The quantizer output is always updated with a step equal to twice
the instantaneous resolution step; as seen in Fig. 4.3(b), the dotted line is always symmetrically
spaced above and below each sample. The reason for this is that the adaptive resolution controller
can be simply implemented without need for deep memory of multiple previous samples (only the
directly preceding one is needed) or complex, power-consuming prediction logic.
While the behavior of the leading interval boundary (the boundary which the signal is expected
63
(a) (b)
CROSSING
DURING
RESOLUTION
SWITCH
Figure 4.4: If the input slope changes, the trailing boundary can catch the input leading to oscil-lations, shown in (a). The alternative trailing boundary behavior shown in (b) does not suffer thisproblem.
to cross, assuming its derivative does not change sign) is well defined, there are two options for
the behavior of the trailing boundary, as shown in Fig. 4.4. In 4.4(a), the trailing boundary is
increased in accordance with the TDx time intervals until it is 0.5VLSB below the reconstructed
output value, which means the total comparison interval is now VLSB wide at the highest resolution
setting, similar to the LCADC behavior without AR. In this implementation both the leading and
trailing edges have the same convergent behavior. The problem with this implementation is the
situation where the input derivative decreases enough that the trailing boundary catches the input
signal, thereby triggering a false downward increment. In Fig. 4.4(a), the crossings are hard to
see because the crossings occur when a boundary is switching resolutions: thus the crossing is
not due to the input crossing a quantization threshold, but due to the AR algorithm itself. This is
discussed shortly. Since the signal is still increasing, on the next comparison interval an upward
increment occurs, and this process can repeat, so that the output oscillates around the input value.
64
From behavioral simulations, while this effect did not degrade distortion performance (it increased
high frequency “noise,” similar to a first-order ∆Σ converter) it increased the average sample rate,
wasting power.
The alternative solution shown in 4.4(b) has the trailing boundary start VLSB below the actual
crossing voltage of the previous sample, then increment by one VLSB and stop. That the boundary
initially starts below the actual crossing and increments to the previous crossing threshold is a form
of hysteresis that reduces the chance of noise in slow-moving inputs from triggering false samples.
Compared with the first option, because the trailing boundary stops at the previous crossing voltage
ensures that a negative transition is only possible if the input has an inflection point.
In this system, the AR controller can be programmed to different minimum resolutions (the
quantizer resolution immediately after a sample), from 1×VLSB (effectively fixed at maximum
resolution) to 15×VLSB. Since the feedback DAC hardware is fixed at 8 bits, if each quantizer step
skips 15×VLSB the quantizer resolution is reduced to 17 levels, or about 4 bits. To compute the
new TGRAN at this minimum resolution, we can compute the amount of time it takes a full-scale 20
kHz input to cross the wider quantization step:
TGRAN =Quantization Step
Input Slope=
15VLSB
2π2N−1VLSB fMAX= 930 µs. (4.1)
We have therefore relaxed the minimum inter-sample time from 62 ns to almost 1 µs.
From a hardware perspective this method is simpler than using an analog derivative estima-
tor, because like the scheme in Fig. 4.2(b), it requires no measurement on the input aside from
65
CROSSING
DURING
RESOLUTION
SWITCH
Figure 4.5: If the input crosses a threshold during a resolution change, there is a quantization error.
the LCADC sampling itself. It also requires no additional analog circuitry, just a modification of
the feedback DAC to support the variable comparison interval. The timing intervals TDi are pro-
vided by a delay line which is already present to implement other timing functions, and the entire
algorithm is implemented in the digital controller. Its downside is that because the algorithm is
open-loop, the discrete steps in the resolution change can introduce quantization error if the AR is
in between two resolutions when a crossing occurs, already shown in Fig. 4.4(a) and shown again
clearly in Fig. 4.5. While this could be a performance limitation in high resolution LCADCs, for
this 8-bit converter simulations showed > 50 dB SNDR, so alternative solutions were not consid-
ered.
66
DELAY
CELL
DELAY
CELL
DELAY
CELL
UP
DN
COMBINATIONAL LOGIC
TIMING
EDGES
B
IAS
D
AC
B
IAS
D
AC
B
IAS
D
AC
t
TIME0[3:0]
TIME10[3:0]
11× 4-bit TIMING CONTROL DACs
ZCD SIGNALS
Figure 4.6: Asynchronous timing generator formed with an 11-tap delay line, each with 4-bitcontrol.
4.2 Programmable Asynchronous Timing Generation
The AR algorithm in the preceding section as well as the adaptive ZCD bias algorithm (to be
explained in the following section) both require a time base with which to compare the inter-
sample timing, in order to estimate input activity. In synchronous systems a timer is realized with
a high-speed clock and counter, but in the absence of any clock a delay line must be used. Shown
in Fig. 4.6 is the 11-tap delay line with 4-bit bias DACs, one for each delay cell. The delay values
T IMEi[3:0] are provided externally though the serial-peripheral interface (SPI).
The timing of the first few elements of the delay line are shown in Fig. 4.7. When a ZCD
crossing occurs (UP or DN go high) the combination logic resets each delay element simultane-
ously. The τ delay element at the output of the XOR allows adequate time for the elements to reset,
before the pulse generator launches the delay line. In this manner, the delay line always generates
the same set of edges (whose relative spacings TDi are dictated by the T IMEi[3:0] control words)
67
UP
DN
EDGE 0
EDGE 1
EDGE 2
EDGE 3
EQUAL DELAY
DELAY LINE IS
RESET BY
INCOMING
TRIGGER
Figure 4.7: The timing edges from the delay line are launched by either an UP trigger or DNtrigger. If the line has not finished propagating when then next trigger arrives, it restarts from thebeginning.
relative to a ZCD crossing event, which the AR algorithm uses to control the leading and trailing
comparison boundaries to produce the resolution transitions in Fig. 4.3.
During operation it will often happen that a new sample arrives while the AR is still converg-
ing the comparison boundaries; this is equivalent to saying that the delay line has not finished
propagating through from the previous sample. As explained above, an UP or DN rising edge
asynchronously resets each delay cell simultaneously–shown as the truncated pulse on EDGE 3 in
Fig. 4.7–and the added logic needed to implement a fast asynchronous reset complicates the design
of this cell with respect to those previously reported [8, 43, 44]. The full schematic of each delay
68
IN
BIAS
OUT
RST
RST
1
1
2.4
2
DEVICES WITH NO SIZE ARE
MINIMUM SIZE 0.15/0.13
1
1
1
0.2
M1a M1b
M2
M3
Figure 4.8: Schematic of a delay element, with the devices of interest highlighted in gray. Deviceswithout sizes given can be assumed to be operating as switches and non-critical.
cell is shown in Fig. 4.8. The core devices are those in the dotted box, all other devices behave
as switches. Devices M1a,b are current sources whose gate voltage is set by one of two master
references (set by an off-chip resistor). Before the cell triggers, IN and OUT are both high, and
RST is low. This means M1a is not sourcing current, and the switch M3 is closed thus discharging
the MOS capacitor. When IN goes low, the M1a current begins charging the MOS cap through M3,
until the output of the actively loaded common source amplifier formed by M2 and M1b transitions
from high to low. The following switches and logic implement a regenerative latch with fast reset.
The propagation of the falling M2 drain to the falling OUT is so fast in comparison to the voltage
ramp on the MOS cap that when OUT goes low (and opens M3) the voltage on the MOS capacitor
is not much higher than VT of M2. The total charge taken from the supply1 is therefore CtotalVT ,
where Ctotal is the sum of the MOS capacitance and all parasitics on the shared M1a drain/M2 gate
1Not counting the switching currents in the logic, which is admittedly a large portion of the overall power con-sumption.
69
node. This reduced charging is contrasted with the delay cell presented in [43], where when the
equivalent threshold is reached a positive feedback path quickly charges the capacitor to VDD, and
in the process drawing a total of CtotalVDD charge from the supply.
4.3 Zero Crossing Detectors with Dynamic Current Bias
In a previous LCADC without adaptive resolution [10], TGRAN was 62 ns for a full-scale 20 kHz
input. Since the total sum of the propagation delay around that loop (ZCD delay, digital logic
delays, feedback DAC delay) was smaller than TGRAN = 62 ns, the maximum ZCD delay was by
necessity smaller than 62 ns. The upper bound on ZCD delay dispersion is the absolute delay
through the ZCD thus in that system the input-dependent delay dispersion was not a dominant
effect–in Sec. 3.1.3 it was found that the delay dispersion needed to be less than 60 ns to not affect
SDR. In this system, from eq. (4.1) the loop is allowed to have almost 1 µs total delay, which means
the ZCD delay can also be relaxed. We expect that the ZCD delay dispersion scales with absolute
delay, so with a longer allowable ZCD absolute delay the delay dispersion becomes an issue. To
minimize delay dispersion while maintaining low power, an adaptive current bias is implemented.
4.3.1 Three-Stage Zero Crossing Detector
The ZCD follows closely the topology of a standard comparator, with a preamplifier followed by
a latch as shown in Fig. 4.9.
70
+
-
-
+
+
-
-
+
+
-
-
+
+
-IN+
IN-
OUT
zcdRST
STAGE1 STAGE2 STAGE3
BIAS
Figure 4.9: The ZCD comprised of a three-stage preamplifier and a latch.
Transistor Stage 1 Stage 2 Stage 3M0 10/0.5 6/0.5 8/0.5M1 4/0.4 0.4/0.2 0.3/0.2M2 1.6/0.4 0.8/0.3 0.8/0.13
Table 4.1: ZCD preamplifier device sizes.
Preamplifier
The preamplifier is the cascade of three scaled but otherwise identical stages. Each stage, shown
in Fig. 4.10, is a basic differential pair with local common-mode feedback (CMFB). The ratio
of main tail to CMFB current is 4:1, which puts a 25% overhead on overall power. The choice
of local CMFB guarantees stability over any bias current, which is necessary due to the adaptive
current bias. The CMFB circuit used is very simple and has a limited differential range, but this is
acceptable since the preamplifier only needs to amplify the relative difference between two signals;
normal amplifier parameters, such as linearity and swing, are not important for this application.
Although a single loop around all three stages (and almost identical poles) would have saved a
small amount of power, aside from the complication of tracking the location of stabilizing zeros
with bias current, the CMFB loop would have been too slow to be power-cycled at each sample.
The sizes of important ZCD transistors for each stage is given in Table 4.1. The gain and bandwidth
71
IN-
IN+
VCM
OUT+
OUT-
M0
M1
M2
BIAS
Figure 4.10: ZCD preamplifier stage. Thick devices are LVT.
of each stage was initially specified considering the results in Sec. 3.1, but the final transistor sizes
were determined by a Matlab/Spectre script to minimize absolute delay with respect to power.
Simulated ZCD performance shows, for example, that with a power consumption of 490 nW, the
preamplifier provides 71 dB of gain and has a bandwidth of 1.2 MHz.
Uni-Directional Latch
Because the ZCD must track a continuous input, a true regenerative latch is not an option. However,
based on our chosen topology in Fig. 2.4 we do know that during normal operation, the ZCDs
bound the input within an interval: this means that until a crossing occurs, both ZCDs have VIN+ <
VIN−, and their outputs are low. Referring to the timing diagram in Fig. 2.7, once a ZCD crossing
occurs, the circuit enters REFRESH and when it returns to T RACK, the condition VIN+ <VIN− is
again true. Based on this observation, we can design a circuit which latches in only one direction,
72
zcdRST
IN+
IN-
OUT
2.4
0.13
1.2
0.13
0.2
0.130.2
4
0.3
0.13
0.2
0.13
0.2
2
0.15
0.13M1bM1a
M2bM2a
M3
M4
Figure 4.11: ZCD uni-directional latch.
i.e. while VIN+ < VIN− the output is low and can go high as VIN+ ≥ VIN−, but once VIN+ > VIN−
the output is stuck high and cannot go low again without a reset.
The circuit shown in Fig. 4.11 exhibits a uni-directional latching behavior. The input four-
transistor cell comprised of M1a,b) and M2a,b form a high-gain complementary self-biased
differential amplifier cell [45]; this stage can be visualized as two differential pairs connected drain-
drain, with N- and P- tail current sources M3 and M4, respectively. The remaining transistors in
this stack are switches used to disable the bias current sink, and add to the bias current source when
a crossing occurs, thereby speeding up the transition and eliminating the static bias current once
the cell latches. In some way, the input stack can be thought of as an analog amplifier stuffed into
a NAND gate. The zcdRST signal is used to reset the latch while in REFRESH.
73
4.3.2 Current Bias DACs
Figure 4.12 shows the simulated delay of the complete ZCD (preamplifier and latch) vs. input
slope, for 20 logarithmically spaced bias currents ranging from 230 nA to 4.3 µA. Due to the
voltage division in the capacitive feedback DAC (Fig. 2.6), the swing at the ZCD input is half that
of the input itself, so the maximum input slope expected for a 20 kHz full-scale (where full-scale
is 0.8V peak-peak, equal to VDD) input is around 25 V/ms. As shown in the plot, most of the bias
currents can not achieve a 62 ns delay with a 25 V/ms input slope, which is what would have been
required had this ZCD been used in an LCADC without AR.
From the calculation in (4.1) the system has a minimum TGRAN of almost 1 µs. Of this, 200 ns is
budgeted for the ZCD delay; the remaining time is given to the asynchronous logic, capacitor DAC
reset and settling periods. From Sec. 3.1, we know the ZCD is allowed 60 ns dispersion before the
SDR is affected. This narrow dispersion can be maintained down to low slopes by biasing the ZCD
with five different currents depending on input slope. The circuit shown in Fig. 4.13 measures the
time between consecutive samples by counting edges from the delay line shown in Fig. 4.6 and
uses the count value to select a bias current for the ZCD for the next sample. Immediately after a
sample has occurred, the T RIG edge has not propagated down the delay line and the edge counter
output is reset, thus selecting a low bias current. If another level crossing occurs right away, it
would mean the input is fast-moving, or high-slope, which makes the low bias current appropriate.
As time passes the delay line propagates, incrementing the edge count. Should a level crossing
occur at a later time, the bias current latched in for the next sample is correspondingly higher. In
74
101
102
103
104
101
102
103
Input Slope (V/s)
Del
ay (
ns)
Figure 4.12: Total ZCD delay with respect to input slope for bias currents between 230 nA and4.3 µA. The higher bias currents are lower lines (indicating faster response).
this manner, the ZCD bias current varies inversely with the approximate slope of the input; the
longer the interval between level crossings, the lower the slope of the input signal.
The ZCD delay thus traverses different bias current lines depending on input slope such that
the overall ZCD delay dispersion is reduced, as seen in Fig. 4.14.
Figure 4.15 shows the bias current DACs for the ZCD preamplifiers. Eight identical 6-bit
PMOS current DACs all drive identical NMOS diodes, and these voltages are selected by an 8:1
multiplexer. The two multiplexers are controlled by SELH[2:0] and SELL[2:0], which are gener-
75
t t t t
TRIG
EDGE COUNTERRST
COUNT
time between samples
TRIGCOUNT
0 1 2
bias to ZCD
BIAS
0
BIAS
1
BIAS
2
increasing
bias current
Figure 4.13: A delay-line based slope measurement quantizes the bias current to five discretelevels.
ated according to the scheme shown in Fig. 4.13. The multiplexer outputs BIASH and BIASL are
the bias voltages for the upper and lower ZCDs, respectively. The eight DACs all share the same
current reference, the second of two master references brought from off-chip (the first was used
in the timing generation in Sec. 4.2). The seven bias control words BIASi[5:0] for the DACs are
programmed externally though the SPI interface. Eight DACs were chosen since five were used
in the initial system planning, but in hardware five levels and eight levels both encode to 3 bits so
more flexibility was gained with little overhead. Although it is wasteful to have eight current DACs
active if only one is used at any moment, the ZCDs need to switch bias currents within a few hun-
dred nanoseconds, which would normally require a high bandwidth single current DAC. Instead,
each DAC is decoupled with large capacitors (not shown in schematic) across the diode connected
NMOS devices to minimize the voltage step due to instantaneous charge sharing the multiplexer
76
101
102
103
104
101
102
103
Input Slope (V/s)
Del
ay (
ns)
Bias Current = 0.85µABias Current = 0.61µABias Current = 0.45µABias Current = 0.3µABias Current = 0.24µAAdaptive Bias Current
Figure 4.14: Adaptive current bias keeps delay within dotted lines (60 ns dispersion) until veryslow slopes.
connection changes and a new DAC is connected to the ZCD; the DAC itself is, however, low
bandwidth.
4.4 Segmented Capacitor Feedback DAC
In Fig. 2.5, the feedback DAC is implemented as a single capacitor array, which level-shifts the
input voltage to within the (−VLSB/2,VLSB/2) interval compared by the ZCDs. In the case of
77
REF2
0.5
0.25
0.5
16×BIAS0[5:0]
BIAS1[5:0]
BIAS7[5:0]
SELH[2:0]
SELL[2:0]
BIASH
BIASL
Figure 4.15: Binary-weighted arrays of PMOS transistors forming the current bias DACs for theZCDs. The bus of 8 bias voltages is multiplexed onto the BIASH and BIASL wires.
adaptive resolution, the comparison interval is no longer fixed to (−VLSB/2,VLSB/2). Rather than
adjusting the VL,VH boundaries, this variable interval is implemented by using a pair of feedback
DACs whose digital codes are offset by the width of the variable interval. Since the capacitor DAC
is comprised of unit capacitance elements, the variable resolution is easily implemented by adding
more unit elements. The drawback of this scheme is that two capacitor DACs are required, which
are a significant portion of the total area. The schematic of the dual capacitor DAC architecture
is shown in Fig. 4.16. The signals dacEN and dacRST are controlled as before in Fig. 2.7, and
VCM, REF+ and REF− are the DAC reference voltages, identical to the references in a normal
SAR ADC. Both DACs share the same binary code, while the thermometric code differs between
them. In the AR scheme, the comparison interval converges over time to the highest resolution in
8 LSB steps, where the leading boundary of the interval will have decremented toward the signal
until it is 0.5VLSB from the reconstructed output as shown in Fig. 4.3. Thus each capacitor DAC
78
+
-
+
-
vx(t)
BINARY[4:0]THERMH[23:0]THERML[23:0]
VCM
dacEN
dacRST
REF+
REF-
dacRST
IN
×27
×28
BINARY[4:0]
dacEN×2
3×1 ×1
THERM[23:0]
dacEN
to ZCD
VCM
signals from asynchronous controller
dacRSTdacRST
+VLSB 2
-VLSB 2
CA
P
DA
C
CA
P
DA
C
Figure 4.16: Adaptive resolution implemented with two capacitor DACs (detail of one shown ininset, both are identical), to give a 24×LSB thermometric range.
needs a thermometric range of ±8 LSB from the final value. Therefore, the thermometric LSB
DAC is given a range of 3×8 LSB, so that regardless of the LCADC output value there is an 8 LSB
thermometric range above and below it. For example, if the LCADC output is 8M + 0 (meaning
the output is divisible by 8) then the binary code BINARY equals M, and either T HERMH or
T HERML (depending if the previous crossing was upward or downward) counts from ±7 to 0.
Similarly, if the LCADC output is 8M + 7, then BINARY = M as before, but now T HERMH
or T HERML will count from 14 to 7 (if the previous crossing was upward) or from 0 to 7 (if
the previous crossing was downward). Thus the total continuous range of the thermometric DAC
needs to be from -7 to 14 or 21 LSBs. 24 LSBs are used since it makes the layout of the DAC more
symmetric.
The capacitor DACs are comprised of 20 fF unit cells arranged in a common centroid layout
with the thermometric LSB DACs located in the center. The structure of the unit cell shown in
Fig. 4.17 uses the vertical parallel plate structure [46]. A vertical plate formed of stacked wires
79
METAL 5
METAL 4
METAL 3
METAL 2
VIA
3.5
6
VIEW ALONG
X-AXIS
VIEW ALONG
Y-AXIS
Figure 4.17: Structure of a unit capacitance cell. Projected views from both sides are shown aswell.
in metals one atop another with a dense array of vias, to simulate a vertical sheet of metal in
the IC. In this capacitor, the top plate formed by vertical plates of metal3-metal5 interleaved with
plates of metal2-metal5 and entirely enclosed within a basket of metal2-metal5 forming the bottom
plate (imagine a car battery, with the casing as part of the bottom plate). Minimizing the top
plate parasitic to ground is critical because the lines driving the bottom plates are routed within
the capacitor array, and the single-ended switching of these lines can couple onto the top plate,
causing code-dependent offsets. The bottom M1 layer is reserved for routing, as each thermometric
capacitor has its associated switch underneath. In the binary DAC the switches are outside the array
and under the unit capacitors are dummy switches. Due to the aforementioned lack of parasitic
extraction, estimating the impact of line coupling was difficult so a unit size of 20 fF was chosen
as a conservative compromise between minimizing the impact of coupling and low power.
80
To estimate the worst case power consumption, we know that a full-scale input exercises the
full range of the DAC sequentially. That means, ignoring the time-stamps of the level crossings, if
we just inspected the sequence of codes sent to the DAC we would see a ramp from 0 to 2N then
back to 0, repeating twice per period of the full-scale input sine wave. For a DAC with 2N unit
capacitance Cu elements, suppose the input code is 0 ≤ n ≤ 2N − 1. Then the DAC is configured
as n×Cu capacitors to VDD (256− n)×Cu capacitors to ground. Since the DAC is always reset
to VDD/2 before each new output value, this takes a total of nCuVDD/2 charge from the supply.
When this DAC is reset to VDD/2, the VDD/2 reference generator must source or sink energy; the
total charge taken from the supply during this reset phase is (2N − 2n)CuVDD/2. Combining the
charging and reset phases to get the total charge used for DAC code n, we get (2N − n)CuVDD/2.
Since there are two code ramps per period of full-scale input (in one period there are 2×2N level
crossings) the total power consumption is
Power = 2VDD
2N
∑n=1
(n
VDD
2Cu
)× fin (4.2)
For a full scale 20 kHz input and N = 8, Cu = 20 fF, VDD = 1 V, this means 13 µW of power.
Fortunately the AR algorithm reduces the maximum sampling rate at 20 kHz by almost 10×, so
the power consumed by charging the capacitors is a few µW, within the 10 µW budget. It should
be noted that based on published SAR ADCs in these processes, for 8-bit matching the capacitor
could have been made smaller than 4 fF [47–51], but without an extraction tool 20 fF was chosen
for margin.
81
4.5 Automatic Offset Calibration with Oscillator
So far the generation of the (−VLSB/2,VLSB/2) interval has been assumed, as has been the matching
of the two ZCD offsets. In reality, for the device sizes given in Table 4.1 the simulated 1-sigma
input-referred offset of the ZCD is 6 mV, which is about 2VLSB. In order to generate the VLSB/2
voltages and compensate the ZCD offsets, an 8-bit R-string DAC is used. The calibration DAC
is a dual-tap (one for each VL and VH , which go to the lower and upper ZCD, respectively), 8-bit
version of [52] but is otherwise identical. The structure of the DAC and its tap decoder are shown
in Fig. 4.18. The tradeoff between dynamic range and precision is chosen so that the calibration
DAC has LSB/16 precision, and can compensate up to 2σ of ZCD offset. This precision/range
tradeoff is not hardwired into the DAC, but set by the choice of external bias resistors connected to
V HIGH and V LOW .
In order to automatically calibrate the ZCD offsets, the existing ZCDs are used in a pair of
successive-approximation loops, as shown in Fig. 4.19. In the figure, the capacitor DACs are
abstracted as voltage sources since their voltages do not change during a conversion cycle. The
ZCDs are modeled as ideal ZCDs with offset voltage sources VOH , VOL in series with their inputs.
During offset calibration, the capacitor DACs are held in reset at voltage VCM (“0” with respect
to the input range), which is equivalent to binary code “01111111”. The digital logic then per-
forms two parallel SAR conversions (one for each ZCD) using the calibration DAC, and the digital
conversion results DH0 and DL0 represent the ZCD offset voltages themselves, VOH and VOL quan-
tized to 8 bits (since the calibration DAC has 8-bit resolution, not because the LCADC uses 8-bit
82
COLUMN DECODE
RO
W D
EC
OD
E
VH
VL
VHIGH
VLOW
CH[7:4]
COL H COL L
ROW H
ROW L
CL[7:4]
CH[3:0] CL[3:0]
250
VH VL
Figure 4.18: Schematic of the R-string calibration DAC with two output taps. The inset showsone of the 256 identical unit elements.
capacitor DACs). Once the ZCD offsets are found, the process repeats but with the capacitor DAC
outputs set to 2VLSB and−2VLSB (codes “10000001” and “01111101”). This produces at the end of
conversion DH1 and DL1, the digitized versions of VOH +2VLSB and VOL−2VLSB. Then the correct
DAC codes to generate the (−VLSB/2,VLSB/2) thresholds (accounting for the ZCD offsets) are:
DH = DH0 +DH1−DH0
4(4.3)
DL = DL0−DL0−DL1
4(4.4)
Since these are binary bytes, the divide-by-4 is just a right shift two places. The choice of using
two LSB shifts to calculate the VLSB/2 value, where a single LSB shift would have sufficed, is to
reduce the rounding error. This unfortunately limits the dynamic range of the offset calibration,
83
+
-
+
-
+-
+-
DIGITAL
Set to 0, then -2VLSB
Set to 0, then +2VLSB
+ -
+-
VOH
VOL
CAP DACS
CAL
DAC
UP
DN
Figure 4.19: Two-step SAR calibration loop using the existing ZCDs as comparators. The capac-itor DACs are modeled as voltage sources, and the offset voltages are shown as voltage sources inseries with the ZCD inputs. First the ZCD offsets are calibrated, then the VLSB step.
since the calibration DAC needs a range of 4VLSB +VOH +VOL. For this reason the direction of the
2VLSB shift can be chosen, i.e. if VOH +2VLSB is too positive for the conversion range and the SAR
code DH1 is “11111111”, then the conversion repeats but with VOH −2VLSB, which IS in range of
the calibration DAC (the math to produce the calibrated value adjusts accordingly in the logic).
The same is true for the lower SAR loop.
4.5.1 Relaxation Oscillator
The SAR calibration loops require a clock to operate. Although each calibration loop is completed
in a finite number of cycles and therefore could be driven with edges from a delay line, the nodes
VL and VH driven by the calibration DAC outputs are heavily decoupled and therefore have very
84
50k
200
0.5
10
0.54
0.2
2
0.13
OSCTUNE[3:0]
oscENOSC
1
0.13
1
0.13
DEVICES WITH NO SIZE ARE
MINIMUM SIZE 0.15/0.13
M2
M1
Figure 4.20: Relaxation oscillator with tunable 4-bit MOS capacitor load.
low bandwidth, so the delay elements would need to be very slow. Instead, a relaxation oscillator
and 8-bit clock scaler are integrated to drive the calibration loop. The oscillator shown in Fig.
4.20 is based on a current source charging a capacitance with a tunable MOS capacitor bank that
generates voltage ramps that are periodically reset by the NOR gate in the middle. The switching
transition point of the stack with M1 and the stack with M2 are different, due to the diode below
M2. The difference between these two switching points is the amplitude of the sawtooth wave on
the MOS capacitor load. The narrow pulses that result from the NOR periodic reset are fed to a
pass-gate divide-by-2 to create a 50% duty cycle clock. This clock is further divided by an 8-bit
clock scaler to drive the calibration routine. This bizarre implementation of a standard relaxation
oscillator is due to the fact that the layout of the oscillator is basically a delay cell from Sec. 4.2
with changes in metal1 and metal2, and so the oscillator was achieved with little effort (necessary
to meet the tapeout deadline).
85
IN
GATE
OUT
VIN
VGATE
EN
EN
VIN
VGATE
EN
(a) (b)
Figure 4.21: Traditional switch drivers (a) bring the gate to 0 when off. The proposed driver (b)clamps the switch gate to the source when off.
4.6 Other Circuits
4.6.1 Bootstrapped Input Switch
Because the input full-scale voltage is equal to VDD, were the input sampling switch in Fig. 4.1 im-
plemented with a simple pass transistor, it would have an overdrive voltage that varies significantly
with the input. Not only would this cause an input-dependent Ron (this also plagues differential
sampling networks), but signal-dependent charge injection as well. The solution is a boosted sam-
pling switch. Most differential boosted switches drive the switch gate to vIN +VDD when on, and
down to 0 when off [53, 54], as shown in Fig. 4.21(a). If these switches were used in this imple-
mentation, while the on-resistance nonlinearity would be eliminated, the input-varying change in
the switch vGS during the off-on transition would still cause input dependent charge injection. The
solution is a bootstrap circuit shown in Fig. 4.21(b) whose “on” voltage is vIN +VDD, and whose
“off” voltage is vIN (such that the VGS of the switch transistor is 0 when off). The schematic of
86
EN
independent N-wells
160 fF
ALL HIGH VOLTAGE DEVICES
MINIMUM SIZE 0.3/0.28
1
0.13
M2
M1analog input to CAP DAC
Figure 4.22: Gate drive boostrapping circuit. The input node is connected to the analog input, andthe output is connected to the capacitive feedback DACs.
this circuit is shown in Fig. 4.22, and follows closely the classic design of [54]. The only mean-
ingful difference is that the gate of the pass device is not shorted to ground but clamped to its own
source when the switch is turned off. Unfortunately, with the modified “off” voltage no way was
found to use only low-voltage devices. Aside from the added cost of using high-voltage transis-
tors, this method has the disadvantage that it requires the source impedance to be much lower (AC
impedance) than the impedance of the switched capacitor DAC following it (refer to Fig. 4.1),
otherwise the charge injection of clamping switch M2 will flow to the right [55]. With a 0.8 V
overdrive voltage, the nominal Ron was simulated to be 1 kΩ.
87
BINARY
LSBDECODERDECAP
ONE MSB GROUP
4×
4×
4×
4×
DU
MM
Y
DU
MM
Y
DU
MM
Y
DU
MM
Y
CLK
CLK
DATA
I- I+
0.8
3
0.67
0.13
0.31
0.13
(a) (b)
UNIT
UNIT
UNIT
UNIT
Figure 4.23: Differential segmented current steering DAC used for testing the LCADC. Top layoutshown in (a) is comprised of 256 identical unit cells, whose schematic is shown in (b). The notation4× means a group of four unit steering cells.
4.6.2 Current-Steering Reconstruction DAC
To test the LCADC, the digital bitstream needs to be reconstructed. From the stipulation that no
high-speed re-sampling be used, the only recourse is a zero-order hold reconstruction using a DAC.
In order to clearly see the performance of the LCADC and not the DAC, the DAC is purposely
designed for 10-bit linearity even though it has 8 bits. The DAC is a differential current-steering
design with open drain outputs. In order to minimize glitching a fully thermometric structure is
preferred, but this requires a large routing area. As a compromise, a segmented 4/4 architecture
was chosen. Each of the thermometric MSB banks is comprised of 16 replicas of a unit cell, all
driven by the common thermometric signal. The 16 unit cells are grouped as four banks of four,
as shown in Fig. 4.23(a). The groups of four cells are arranged in a common centroid layout, with
the middle row between the two halves occupied by decoupling capacitance, the binary-weighted
88
LSB bank, and the thermometric MSB decoder. Each unit cell shown in Fig. 4.23(b) is a simple
current steering open-drain switch. The CLK signal is not necessarily a synchronous clock, but a
signal which indicates when the data arriving at the DAC is valid, to prevent glitching in the DAC
output. This signal is produced by the LCADC to interface with other asynchronous circuits that
use handshaking protocols. In the DAC the CLK signal is delayed to account for the propagation
delay through the thermometric decoder, and driven to every cell in an H-tree (not shown in the
layout diagram) to ensure identical switching times of all cells–this is also to reduce glitches in the
output. The steering pair also has charge-injection cancelation devices, although their effectiveness
is questionable since no extraction was performed. A PMOS current source device was chosen as
PMOS devices had better matching for small sizes, and were sized to give a 1-σ mismatch of
LSB/10, to allow the nonlinearity of the LCADC to dominate. The integrated noise power over a
20kHz bandwidth is simulated to be more than 20 dB lower than that expected from the LCADC,
and the simulated maximum frequency of operation is > 200 MHz, although this number was
achieved without parasitics. Since the minimum TGRAN for an LCADC without AR is 62 ns (16
MHz instantaneous clock speed) even when accounting for parasitics the bandwidth should be
sufficient.
4.6.3 Asynchronous Controller
The AR controller and ZCD bias controller are written in verilog and synthesized with Design
Compiler, and verified with Prime Time. In order to address the concerns of dynamic dissipation
89
discussed in Sec. 2.4, entirely hand-written RTL was attempted but it was found that the automated
synthesized logic was superior. The logic was still optimized by hand to minimize large code-
dependent delays: in order to minimize power all minimum-sized logic gates were used, which
results in a propagation delay of several ns.2 It was possible to use standard digital flows even
though this block deals with asynchronous signals because the combinational logic in the timing
block (Fig. 4.6) ensures that minimum bounds on pulse widths and pulse frequency are met. The
asynchronous digital logic contains 461 standard cells and occupies 9,000 µm2.
4.6.4 SPI and Calibration Logic
The IC has a total of 25 control bytes, which are interfaced through a standard SPI 1,0 bus. The
calibration logic and SPI logic as well as various testing functions are merged into one synthesized
module, separate from the asynchronous controller, which occupies the lower portion of the IC.
Since both the SPI and calibration logic are powered down during normal use, their supply is
separated from the rest of the IC (shared with the ESD supply) and not counted toward the power
budget. The combined logic block contains approximately 1,400 standard cells, and occupies
45,000 (µm)2.
2Although all such optimizations are possible (and done better) by Design Compiler, but this was done by handdue to lack of familiarity with the tool.
Chapter 5
LCADC Measurements
5.1 Setup
The IC, whose die photo is shown in Fig. 5.1, contains both the LCADC and its testing DAC.
Since the IC is bond pad limited, in order to save area the bias currents and 8-bit PCM data bus
of the LCADC and DAC are multiplexed onto the same pads. The IC powers up in DAC mode so
that the 8-bit data bus is an input, and through the SPI interface it is put into LCADC mode. Since
the bias pins are also shared it is impossible to use both the LCADC and DAC on one IC, so the
testing board contains two copies of the IC. Although the DAC was purposely over-designed so
that it is not the performance-limiting block in the measurement chain, it should be noted that as
all measurements were performed with the test DAC, the numbers are conservative representations
of the actual LCADC performance. Aside from a few non-critical bugs detailed in Appendix B,
the IC functioned as expected.
90
91
Figure 5.1: Die microphotograph. The output DAC is not counted toward the active area incomparison in Table 6.1.
Because the bias currents needed by the LCADC were in the range of 10’s of nanoamperes
while the ESD diode leakage of the foundry-provided pads was several times larger, the bias cur-
rents from off chip were fed into a 300:1 current divider at the pad before propagating to the
LCADC core. Therefore, the bias current through the external resistor is not counted in the total
power consumption since it would normally be generated on-chip.
The schematic for the circuit used to test the IC and obtain all the measurements is shown in Fig.
92
5.2. The testing board contains many other features for audio processing, regulated supplies for the
V REF+ and V REF− capacitor DAC supplies, etc. but those were not used (when measurements
were taken). Although two potentiometers are shown in the schematic, only the potentiometer
for the ZCD bias current was adjusted to obtain the best performance per-die, then fixed for all
the following comprehensive measurements. In the calibration performance measurement none of
the external components was adjusted, to get an accurate assessment of the calibration algorithm
itself. The testing DAC current outputs drive an LT6204-based differential-to-single-ended buffer
with 60 dB of gain (1 kΩ transresistance) and a simulated bandwidth of 20 MHz, to show clearly
the steps in the LCADC output.
The entire testing procedure, save for swapping dies is automated through Matlab. The two
ICs reside on a common SPI bus that communicates with a custom IC testing interface. This USB
powered interface provides two variable supplies, fixed high voltage supplies and data storage, and
supports multiple I/O standards. The interface is controlled through a Matlab program. All the
measurement instruments communicate also with Matlab through a National Instruments USB-
GPIB cable. The two boards are shown in Fig. 5.3.
To capture spectral data, an Agilent E4446A spectrum analyzer was set to capture a 30 kHz
bandwidth with a resolution bandwidth of 10 Hz. The instrument was set to take two averages,
and the raw data was processed in Matlab with a peak-search routine to filter the tones from the
noise (since in an LCADC the spurious tones come in easily predictable locations) to provide both
SNR and SDR numbers. The power in each spectrum analyzer bin was normalized by the spectrum
analyzer’s RBW filter shape, to accurately estimate the total energy in the band of interest. It is still
93
VLOW
VHIGH
BIASZCD
D[7:0]
SAMP
RST
VIN
VCM
VREF+
VREF-
BIASTIME
AVDD
DVDD
AVDD
VSS
VSS
AVDD
VSS
VSS
AVDD
AVDD
VSS
VTL
VTH
CLK
VDDZCD
VDDODAC
VDDCDAC
VDDCAL
VDDTIME
VSSANA
VSSDIG
VDDSPI
VDDASYNC
AVDD
VSS
AVDD
VSS
DVDD
VSS
ANALOG INPUT
DIGITAL OUTPUT
10kΩ
5kΩ
5kΩ
10kΩ
200kΩ
200kΩ
1.5MΩ
1.5MΩ
10kΩ
2.2μF
2.2μF
2.2μF
47μF
47μF
47μF
47μF
10μF +
-
LT1880
Figure 5.2: Schematic of the LCADC portion of the testing board used to produce the presentedresults.
possible that a low-amplitude tone is confused for noise and vice versa, but ultimately the SNDR
data is correct since it is the ratio of the fundamental to everything else in the spectrum within the
band of interest. For power measurements, Agilent E34401A multimeters were spliced in-line with
the individual block supplies and for each data point the average of 4 sequential measurements was
taken.
94
Figure 5.3: Test board and IC interface board.
5.2 Results
The following results are all obtained after the calibration routine was run on the IC at startup.
Since the calibration routine is fully self contained, running the routine is just part of the chip
programming sequence through the SPI interface. Both the adaptive resolution and ZCD biasing
schemes were enabled (unless otherwise specified). The noise and distortion of the LCADC was
integrated over a 20 Hz–20 kHz bandwidth. All measurements were performed at room tempera-
ture, with the IC running from a 0.8 V regulated supply.
95
Figure 5.4: Test DAC reconstructing the LCADC output for a 1 kHz, small-amplitude input.
5.3 LCADC System Performance
A typical waveform plot of the LCADC output is shown in Fig. 5.4. A small amplitude input was
chosen to clearly show the quantization steps. Also visible is a small amount of glitching around
a few transitions. As explained in [35], a larger hysteresis window would give greater glitch im-
munity at the cost of poorer SNDR. The superposition of output over input was not meant to be
indicative, it just so happened that the gain of the buffer-LCADC-DAC-buffer path was approx-
imately 1/2. The sharp staircase-like output waveform shows that the output DAC bandwidth is
significantly higher than the 20 kHz signals it processes (as designed).
As motivated in the introduction of this work, this data converter is less suited for processing
continuous-energy signals such as sinusoids than it is for converting bursty, high peak-to-average
ratio signals. Figure 5.5 shows the time-domain output of the LCADC with a cardiac pulse signal
96
Figure 5.5: DAC reconstruction of LCADC output processing a cardiac pulse signal. At the signalpeaks and valleys the quantization step is 2.8 mV. During the high-slew portions of the signal thequantization step increases, to a maximum of 14 mV (5× minimum).
from an Agilent 33220A arbitrary waveform generator. As designed, the LCADC takes large
quantization steps during fast input transients, and takes smaller steps when the input is moving
slowly.
Two unprocessed spectra of -3dBFS inputs are shown in Figs. 5.6 and 5.7, with the peak of
the next highest tone (the second harmonic, which is present due to the single-ended architecture)
indicated by the waveform marker. In both cases, the locations of the spurious tones are at integer
multiples of the fundamental indicating that such tones are harmonics, as expected. A -3dBFS
input amplitude means that only 190 of the 256 possible digital codes are exercised by the input; the
reason the reduced amplitude was chosen is that for amplitudes greater than -3dBFS the distortion
97
Figure 5.6: Spectrum of a single tone, 300 Hz -3dBFS input.
drastically increases, as shown in the plot of SNDR versus amplitude in Fig. 5.8 for a fixed 1
kHz input. It is important to point out that in spite of this circuit impairment the SNDR of the
converter exceeds the theoretical limit for a conventional 8-bit ADC. Many tests were performed
to isolate the cause of the impairment but the results were inconclusive. The first test was to disable
both the adaptive resolution and ZCD biasing schemes, but the problem persisted. The digital data
from the LCADC was captured with a logic analyzer at 500 MS/s and examined for glitches or
saturation in the digital logic, but none were found. This suggests the problem is not in the digital
algorithms. The next theory was that one of the clamping switches, or perhaps one of the junction
diodes in the bootstrapped switch driver transistors (schematic in Fig. 4.22) begins to conduct at
high input amplitudes. To test this, the supply voltage was increased to 2 V, the hypothesis being
that if the problem is indeed a diode conducting, then that occurs at a fixed voltage relative to the
98
Figure 5.7: Spectrum of a single tone, 1 kHz -3dBFS input.
supply rail whereas the input full-scale is proportional to the supply, so with a larger input full-
scale the problem should occur at a higher (proportional) amplitude. Unfortunately, the amplitude
where distortion worsened tracked relative to the supply (full-scale input range) thus invalidating
that hypothesis. In a second test of the same theory, just the capacitor DAC reference voltages
were adjusted, but then distortion due to the switches not being fully on or off was encountered
(verified by simulations). The ZCDs were ruled out since by the time the signal arrives at the ZCD,
it has been level-shifted by the capacitor DAC so regardless of input magnitude the input remains
around VCM. The remaining culprit was the input bootstrap circuit (an intrinsic problem with the
design). A possible theory was that somehow the gate bootstrap voltage is not allowed to increase
far beyond the supply, so for inputs nearing the positive supply the switch on resistance increases.
This idea is partially supported by the observation that the large increase in distortion is caused by
99
−30 −25 −20 −15 −10 −5 0
20
25
30
35
40
45
50
55
60
Input Amplitude (dBFS)
(dB
)
SNDR Conventional 8b ADCSNDRSNRSDR
Figure 5.8: Signal fidelity as a function of input amplitude for a 1 kHz tone input.
a rise in the even harmonics. Unfortunately, this was not reproducible in simulation nor could a
suitable test be devised to confirm this theory. However, as the bootstrap circuit is one of the only
circuits in this design which is not a common or proven architecture1, it seems plausible. Since
the maximum useable input is -3 dB below the digital full-scale output, it would make sense to
re-define the full-scale input amplitude as that which exercises 190 out of 256 digital codes. It was
1Although it was based on [54], bootstrapping circuits are difficult to guarantee (and test reliability) so typically ahandful of proven designs are ported unmodified from one design to another, especially in industry.
100
102
103
104
5
10
15
20
25
30
35
40
45
50
55
60
Input Frequency (Hz)
(dB
)
SNDR Conventional 8b ADCSNDRSNRSDR
Figure 5.9: SNDR as a function of input frequency for a -3 dBFS input.
decided to leave the 0 dBFS reference as absolute with respect to the digital code output, so as to
be mindful of this performance loss.
The SNDR performance over frequency is shown in Fig. 5.9. The sharp jumps in the SDR
are due to transitions between resolutions, but once again the SNDR is generally higher than that
possible with a conventional 8-bit ADC. It was experimentally found that using only 3 different
resolutions (1×LSB, 3×LSB, and 5×LSB) gave the best performance/power ratio. While the loss
of the lower resolution settings required a faster loop response, the increase in power in the ZCDs
101
102
103
104
0
1
2
3
4
5
6
7
8
9
Input Frequency (Hz)
Pow
er (µ
W)
Total PowerDigitalTimingZCDCap DACCal DAC
Figure 5.10: Power consumption as a function of frequency for a -3 dBFS input tone.
was countered by a large savings in power in the digital controller. Increasing the ZCD power
slightly allowed its response time plus the digital logic delays to be under 300 ns. The delay line
was set to accommodate the most stringent TGRAN condition: for a full-scale, 20kHz input when
the LCADC resolution is 5×LSB, TGRAN = 310 ns.
As seen in Fig. 5.10, the digital logic is the dominant power consumer. The fact that more
adaptive resolution steps were implemented than actually used resulted in a slight efficiency loss,
due to having unused delay elements in the delay line and digital logic overhead. Without adaptive
102
resolution, the IC power consumption is 11 µW at a 4 kHz input frequency; this is compared to 6.5
µW with the adaptive algorithm enabled. Unfortunately comparison at higher input frequencies,
where the benefit of adaptive resolution would be even more pronounced, is impossible since the
IC is not able to sample signals above 4 kHz without the adaptive algorithm.
The power plot shows that the ZCD power barely changes. There are a few reasons for this.
One, the static component (the ZCD bias current) is already very low, as seen in Fig. 4.14. More-
over the variation in bias current needed is less than 1 µA. Also, the reduced static consumption at
higher frequencies is canceled by the increased dynamic component, with the bias currents switch-
ing more frequently as sample rate increases, which accounts for the dip then rise in ZCD power
as frequency increases.
A plot of the average sampling rate vs. input frequency for the LCADC in Fig. 5.11 shows
the operation of the AR algorithm. When the LCADC is set to fixed 1× LSB resolution, it hits
the maximum sampling rate (set by the delay line) within the band of interest, and at 3× LSB
resolution it is almost to the point of saturating right at the band edge. When the AR is enabled,
the sampling rate varies without ever saturating. At a maximum input frequency of 20 kHz the
average adaptive sample rate is a little over 1 MS/s. Without adaptive resolution the sample rate
for the same-amplitude input would be 8 MS/s, which means the adaptive resolution algorithm
achieves a maximum of 8× reduction in samples produced. The SNDR for the same setup in
Fig. 5.12 shows that once the sampling rate of the fixed resolutions saturates the distortion kills
performance, while with the AR enabled SNDR is maintained.
103
102
103
104
101
102
103
Input Frequency (Hz)
Ave
rage
Sam
ple
Rat
e (k
Hz)
LSB=1x, (8b)LSB=3x, (6.4b)LSB=5x, (5.7b)Adaptive Resolution
Figure 5.11: Average sampling rate in function of input frequency.
The energy per conversion defined by the classical formula
energy/conversion =Power
2ENOB fs(5.1)
is used2, where we define fs = 2 fBW as with oversampled converters. The ENOB is computed with
ENOB =SNDR−1.76
6.02(5.2)
2Arguably a different metric should be devised to highlight the unique strengths of the LCADC, but for a directcomparison the standard equation was used.
104
102
103
104
20
25
30
35
40
45
50
55
60
Input Frequency (Hz)
SN
DR
(dB
)
LSB=1x, (8b)LSB=3x, (6.4b)LSB=5x, (5.7b)Adaptive Resolution
Figure 5.12: SNDR when the AR is disabled, and enabled.
and fBW = 20 kHz. The metric is plotted over frequency in Fig. 5.13, achieving a best case of 210
fJ/conv with a 200 Hz input.
As discussed in Sec. 4.6, the input bootstrap switch injects extra charge back into the input
each time the clamp switch M2 turns on to disable the pass switch M1 (referring to Fig. 4.22). It
was assumed that the driving resistance of the source was low enough that the source could absorb
this charge, but it is interesting to see what happens when this is not true (and how much source
resistance is needed). Figure 5.14 plots SNR, SDR and SNDR vs. an additional resistance placed
105
102
103
104
100
200
300
400
500
600
700
800
900
Input Frequency (Hz)
(fJ/
conv
)
Figure 5.13: Energy per conversion as defined in (5.1) with respect to frequency.
at the output of the LT1880 input buffer driving the LCADC (board schematic shown in Fig. 5.2).
For a series resistance less than 1 kΩ, the performance is unaffected.
All the measurements provided in this section were performed on three dies, with practically
identical performance. SNDR measurements (identical to the measurement presented in Fig. 5.9)
were taken on 12 dies, and all were found to have identical performance except one, which suffered
a 2-3 dB performance deficit. A summary of the LCADC performance is given in Table 5.1. As
mentioned, the actual current in the bias resistors was not counted since the only reason large
106
102
103
104
42
44
46
48
50
52
54
56
58
60
62
Input Source Resistance (Ω)
(dB
)
SNDRSNRSDR
Figure 5.14: Signal fidelity with varying source resistance.
currents were used was to avoid measurement uncertainty due to unknown leakage through the
ESD diodes. Also, the power dissipated through the voltage divider setting the VCM node was
not counted. The reason is that after the measurements were taken a 2 MΩ potentiometer with a
bypass capacitor was used to set the VCM node and was found to have identical performance to the
20 kΩ resistor divider shown in Fig. 5.2. This is because the VCM node does not draw or source
static current (up to mismatched leakage currents) so even a large resistor divider with enough
decoupling will work properly.
107
Process ST 130 nm 1P6M. Options: LVT, high res. polyDie Area 1.7 (mm)2, active area 0.36 (mm)2
Supply Voltage 0.8 V analog, 0.8 V digital
Quiescent Power Leakage 0.6 µWBias 2.0 µW
Total Power(full scale input)
100 Hz 3.0 µW1 kHz 4.9 µW
10 kHz 7.4 µW
Sample Rate Minimum < 10 S/sMaximum 2 MS/s
Resolution Minimum 4 bitMaximum 8 bit
Tested Bandwidth 20 Hz to 20 kHzFull-Scale Input 0.54 V single-ended peak-peak
SNDR 47 dB to 54 dB (over frequency)SNR 54 dB to 58 dB (over frequency)
Intermodulation > 48 dB, two tones spaced 1kHz swept across BWJ/conv 210–880 fJ/conv, 210 fJ/conv at 200 Hz
Table 5.1: LCADC system performance summary. For the full-scale input definition, 0.54 V is thevoltage required to traverse 190 digital output codes. This range avoids the large-input-amplitudedistortion problem.
5.4 Calibration Algorithm
To quantify the performance of the calibration algorithm, the two bias potentiometers on the test
PCB were fixed, and 12 dies were calibrated. The post-calibration SNDR is plotted in Fig. 5.15.
The calibration routine provides four values to the LCADC: the offset and VLSB/2 values for the
upper and lower ZCDs. Perturbations around each calibrated point were generated and manually
provided to the LCADC. As shown, the calibration algorithm achieves in almost all cases the peak
SNDR, with sample 3 being the die which exhibited inferior performance overall. The SNDR is
108
1 2 3 4 5 6 7 8 9 10 11 12
30
35
40
45
50
Sample Number
SN
DR
(dB
)
Self−Calibration ValuesPerturbed Values
Figure 5.15: Calibrated SNDR of a dozen dies.
slightly lower than that shown in Figs. 5.8 and 5.9 because the ZCD bias current was fixed between
dies. If the ZCD bias current is tuned per die, an extra 2-3 dB is gained.
Because the calibration algorithm provides the ZCD offset, the offsets for 24 samples (2 ZCDs
per die) are given in Table 5.2. The measured σ = 5.9 mV agrees with the simulated mismatch.
109
Die # Upper ZCD Lower ZCD1 -9.16 0.1042 4.712 -8.433 10.49 -4.294 -1.95 -7.205 5.70 -3.246 -5.35 3.427 3.76 1.398 6.25 7.229 -2.07 -11.3810 3.76 4.6111 5.28 -3.6112 2.25 6.02
Table 5.2: ZCD offsets in mV. σ = 5.9 mV.
5.5 Delay Elements
A replica delay chain was placed on the die for testing purposes. The 11 delay elements are each
controlled by a 4-bit DAC. Each delay cell was found to have practically identical performance,
and its delay over the digital tuning range, for a range of analog bias currents and supply voltages,
is plotted in Fig. 5.16. Since the delay is determined by VT , the delay shows insensitivity toward
power supply voltage (but would vary with process and temperature). The periodic kinks in the
tuning range were present in all the delays, and are traced back to metal 2 routing over the bias
DACs, which is replicated in each DAC.
The jitter of the delay cell was measured by using the measure functions of an Agilent 54832D
sampling oscilloscope capturing at 2 GS/s to measure the time between the input and output edges
for one delay cell. The bias current was set to roughly 24 nA (due to the current dividers driving the
110
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
103
TIMEi[3:0]
Del
ay (
ns)
16 nA33 nA50 nA67 nA83 nA100 nA
Figure 5.16: Delay tuning range for a delay cell, vs. the 4-bit T IMEi[3:0] control value. Themultiple lines at each color are the delay for VDD = 0.8,1.0,1.2 V.
internal bias nodes, it is not possible to know the exact bias current to the cell) and 2000 samples
were taken. The jitter histogram in shown in Fig. 5.17.
Since the current into the timing VDD supply pin includes the power of the delay element DACs
as well as the delay elements and the asynchronous combinational logic, the power of a single delay
was calculated by measuring the change in the timing supply power versus the average sample rate
for a varying low input frequency (to ensure that all delay elements are triggered each sample) and
was found to be 43 fJ/delay. This number also includes the power of the combinational logic shown
111
1.01 1.015 1.02 1.025 1.03 1.035 1.04 1.045 1.05 1.0550
20
40
60
80
100
120
140
Delay (µs)
Occ
urra
nce
Figure 5.17: Delay jitter histogram, N = 2000, µ = 1.033 µs, σ = 4.1 ns.
in Fig. 4.6, but separating that is impossible. The delay element performance is summarized in
Table 5.3.
5.6 Oscillator
The calibration oscillator has a 4-bit tuning word OSCTUNE[3:0] to account for PVT variations.
The tuning curves for 12 dies and three supply voltages are plotted in Fig. 5.18.
112
Parameter This Work [43]Technology 130 nm 90 nm
Area 130 (µm)2 36 (µm)2
Energy 43 fJ/delay 50 fJ/delayRange 700 ns – 4 µs 5 ns – 1 us
Mismatch (1σ) 0.14 ns -Jitter 4.1 ns @ 1 µs (0.41%) 6.1 ns @ 1.82 µs (0.3%)
Table 5.3: Low-speed delay element comparison, results are from measurement except for simu-lated mismatch. The tuning range of this cell is reduced due to the digital 4-bit control. Its area isalso much larger due to the 4-bit bias DAC.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1
2
3
OSCTUNE[3:0]
Osc
illat
ion
Fre
quen
cy (
MH
z)
Figure 5.18: Calibration oscillator frequency vs. the oscillator tuning code OSCTUNE[3:0] for12 dies and VDD = 0.8,1.0,1.2 V.
Chapter 6
Conclusions and Suggestions for Future
Work
In this thesis, performance limitations and constraints on ultra-low power LCADCs were in-
vestigated. An 8-bit, 20kHz bandwidth LCADC implemented in 130 nm CMOS was used to
demonstrate ultra-low power operation of an LCADC. We now look at how this design compares
against existing state-of-the-art designs in both level-crossing (asynchronous) and conventional
(synchronous) ADC categories.
113
114
6.1 Performance Comparison
6.1.1 Level-Crossing ADCs
The measured performance in Table 5.1 surpasses previous state-of-the art LCADC designs in
Table 6.1, even without the use of digital post processing, and features the highest level of system
integration. Calibration as well as most of the bias generation is all contained in the IC, and unlike
?? the asynchronous digital controller is also on chip.
To be fair, the LCADC in [10] was just a small part of a much larger design, so direct compar-
ison of the LCADCs only does not do justice to the huge effort in producing the whole working
system. Works [27] and [19] both use high-order polynomial reconstruction with non-realtime
post-processing of the data. Because this IC was designed to achieve equivalent SNDR without
post-processing, the system was designed such that each block contributes equally to the overall
SNDR degradation. Using post-processing on the raw PCM data from this ADC only results in an
improvement of 4 dB in SNDR on average, as both the noise of the delay elements and distortion
from the input sampling switch are comparable to the quantization distortion.
6.1.2 Synchronous ADCs
In spite of the relatively good performance against its peers, this LCADC is still far surpassed
by a state-of-the-art SAR ADC design from two years prior [49] by a factor of 50 in overall en-
ergy/conversion. If we were to impose the condition on the input signal that it is only active
115
Work [27]† [8] [10] [19]† [This work]Year 2005 2006 2008 2011 2011Tech 130 nm 0.25 µm 90 nm 0.18 µm 130 nmArea 0.1 (mm)2 0.25 (mm)2 0.06 (mm)2 0.96 (mm)2 0.36 (mm)2
Resolution 4 bit 6 bit 8 bit 8 bit 8 bitAdaptive no yes
Postprocess yes no yes noBandwidth 160 kHz 55 kHz 10 kHz 1 kHz 20 kHz
SNDR 60 dB 53 dB 58 dB 52 dB 47-54 dBPower 180 µW 17 mW 50 µW 25 µW 2-8 µW
Energy/conv 560 fJ/conv 345 pJ/conv 3.14 pJ/conv 31 pJ/conv 200-850 fJ/conv
Table 6.1: Comparison of this work to published LCADCs. A † indicates that offline post-processing was used to achieve the SNDR numbers reported.
(non-zero) 1% of the time, to model a “bursty” signal, then the average power consumption of
the LCADC would drop to about 2.6µW. The synchronous SAR ADC on the other hand would
continue to consume the same amount of power regardless of input, and so the energy/conversion
advantage of [49] in this scenario drops to a factor of 40. Of course this comparison is not en-
tirely fair to the SAR ADC since many techniques to power cycle ADCs for signals with high
peak-to-average ratio have been developed as well.
Continuing the comparison between this work and the SAR ADC in [49], the SAR ADC is im-
plemented in 65 nm technology, two nodes more advanced. Were the LCADC also implemented in
such technology we can expect the digital power consumption to decrease by almost 4× (compar-
ing simulations of the asynchronous logic synthesized in both 130 nm and 65 nm CMOS) thanks
not only to the device scaling but also the availability of more advanced logic cells (separate fam-
ilies of HVT, low leakage, and low power). The size of the SAR array would also decrease by
116
around 40%, not only because of the reduced metal pitch but because a re-design with a layout
parasitic extractor would allow a more aggressive unit capacitor size. With a 4 fF unit cell instead
of the 20 fF cell used in this design, the power consumption of the capacitor DAC would drop
by half (the switch drivers of the capacitor bottom plates consume power which does not scale
equally with the capacitor size). With just the energy saved in the digital logic and capacitor DAC,
the overall power consumption of the IC would drop by almost 40%. Ultimately however these are
unfair speculations, as hindsight is 20/20 and if the designers of [49] were given the opportunity to
redesign the IC it too would perform better.
More generally, even though this LCADC offers favorable performance against other LCADCs,
for this circuit to be profitable in a complete system would require a system entirely without a
clock. Given the availability of a clock, either [49] or [51] would offer superior performance in
any practical situation. For example, when sampling at 100 kS/s [51] consumes 300 nW (half of
just the leakage power of this design), yet provides similar SNDR and offers 2.5× the effective
bandwidth. In the face of this comparison, it ultimately rests on the system designer to determine
the optimal balance between power consumption in the ADC/LCADC and the necessity of clock
generation circuitry, while being mindful of the expected types of inputs the system must process.
6.2 Suggestions for Future Work
In Chaps. 2 and 3 the zero-crossing detector (ZCD) was identified to be the main performance
bottleneck in achieving high-performance LCADCs. The architecture in 4 was built around this
117
premise, and as is visible from the power consumption breakdown in Fig. 5.10, the ZCD is no
longer the dominant power consumer. If anything, the presented architecture takes the opposite
extreme, in the sense that the ZCD is allowed significantly lower performance (and power con-
sumption) at the expense of digital correction algorithms, which dominate the power budget. This
architecture is optimized for a finer technology, where digital power consumption scales down
while analog bias currents remain roughly constant. It would therefore be interesting to see what
performance would be possible from this design in 65 nm CMOS.
It is clear from the rate of improvement among the designs presented in Table 6.1 that still
much is left to be discovered about LCADCs. Unlike SAR ADCs (the current FOM leaders
for Nyquist ADCs) where the state-of-the-art improves only incrementally year over year, this
LCADC is a substantial improvement over ones before, and the LCADC which follows this one
will surely leapfrog the presented performance. Nevertheless, the limitations on LCADCs (as a
subset of ZCD-based circuits) raises the question whether or not, for any given application, an
LCADC-based system will outperform a synchronous one. The answer to this question is appli-
cation specific; based on raw performance metrics developed for synchronous ADCs, the LCADC
is not yet competitive. However, the classic FOM metric in Eq. (5.1) does not take any signal
properties into account, while the presented LCADC does. As discussed, the relative performance
disparity decreases when high peak-average ratio signals are considered. Therefore, perhaps most
of the work to be done is in finding the right application for this type of data converter. Possible
applications may be found in areas such as biomedical electronics and sensor networks.
Bibliography
[1] P. Ellis, “Extension of phase plane analysis to quantized systems,” Automatic Control, IRETransactions on, vol. 4, no. 2, pp. 43 – 54, November 1959.
[2] R. Dorf, M. Farren, and C. Phillips, “Adaptive sampling frequency for sampled-data controlsystems,” Automatic Control, IRE Transactions on, vol. 7, no. 1, pp. 38 – 47, January 1962.
[3] R. Tomovic and G. Bekey, “Adaptive sampling based on amplitude sensitivity,” AutomaticControl, IEEE Transactions on, vol. 11, no. 2, pp. 282 – 284, April 1966.
[4] J. Mark and T. Todd, “A Nonuniform Sampling Approach to Data Compression,” Communi-cations, IEEE Transactions on, vol. 29, no. 1, pp. 24 – 32, January 1981.
[5] Y. Tsividis, “Continuous-time digital signal processing,” Electronics Letters, vol. 39, no. 21,pp. 1551 – 1552, October 2003.
[6] ——, “Digital signal processing in continuous time: a possibility for avoiding aliasing and re-ducing quantization error,” in Acoustics, Speech, and Signal Processing, 2004. Proceedings.(ICASSP ’04). IEEE International Conference on, vol. 2, May 2004, pp. 589–592.
[7] E. Allier, G. Sicard, L. Fesquet, and M. Renaudin, “A new class of asynchronous A/D con-verters based on time quantization,” in Asynchronous Circuits and Systems, 2003. Proceed-ings of the Ninth International Symposium on, May 2003, pp. 196 – 205.
[8] Y. Li, K. Shepard, and Y. Tsividis, “A Continuous-Time Programmable Digital FIR Filter,”Solid-State Circuits, IEEE Journal of, vol. 41, no. 11, pp. 2512 –2520, November 2006.
[9] Z. Zhao and A. Prodic, “Continuous-Time Digital Controller for High-Frequency DC-DCConverters,” Power Electronics, IEEE Transactions on, vol. 23, no. 2, pp. 564 –573, March2008.
[10] B. Schell and Y. Tsividis, “A Continuous-Time ADC/DSP/DAC System With No Clock andWith Activity-Dependent Power Dissipation,” Solid-State Circuits, IEEE Journal of, vol. 43,no. 11, pp. 2472 –2481, November 2008.
118
119
[11] J. Pangjun and S. Sapatnekar, “Low-power clock distribution using multiple voltages and re-duced swings,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 10,no. 3, pp. 309 –318, June 2002.
[12] Y. Tsividis, “Event-Driven Data Acquisition and Digital Signal Processing–A Tutorial,” Cir-cuits and Systems II: Express Briefs, IEEE Transactions on, vol. 57, no. 8, pp. 577 –581,August 2010.
[13] Y. Palaskas, Y. Tsividis, V. Prodanov, and V. Boccuzzi, “A ”divide and conquer” techniquefor implementing wide dynamic range continuous-time filters,” Solid-State Circuits, IEEEJournal of, vol. 39, no. 2, pp. 297 – 307, February 2004.
[14] M. Ozgun, Y. Tsividis, and G. Burra, “Dynamic power optimization of active filters withapplication to zero-IF receivers,” Solid-State Circuits, IEEE Journal of, vol. 41, no. 6, pp.1344 – 1352, June 2006.
[15] A. Yoshizawa and Y. Tsividis, “A Channel-Select Filter With Agile Blocker Detection andAdaptive Power Dissipation,” Solid-State Circuits, IEEE Journal of, vol. 42, no. 5, pp. 1090–1099, May 2007.
[16] A. Wang and A. Chandrakasan, “A 180-mV subthreshold FFT processor using a minimumenergy design methodology,” Solid-State Circuits, IEEE Journal of, vol. 40, no. 1, pp. 310 –319, January 2005.
[17] J. Howard, S. Dighe, S. Vangal, G. Ruhl, N. Borkar, S. Jain, V. Erraguntla, M. Konow,M. Riepen, M. Gries, G. Droege, T. Lund-Larsen, S. Steibl, S. Borkar, V. De, and R. VanDer Wijngaart, “A 48-Core IA-32 Processor in 45 nm CMOS Using On-Die Message-Passing and DVFS for Performance and Power Scaling,” Solid-State Circuits, IEEE Journalof, vol. 46, no. 1, pp. 173 –183, January 2011.
[18] Z. Yuhua, Q. Longhua, L. Qiang, Q. Peide, and Z. Lei, “A dynamic frequency scaling so-lution to dpm in embedded linux systems,” in Information Reuse Integration, 2009. IEEEInternational Conference on, August 2009, pp. 256 –261.
[19] M. Trakimas and S. Sonkusale, “An Adaptive Resolution Asynchronous ADC Architecturefor Data Compression in Energy Constrained Sensing Applications,” Circuits and Systems I:Regular Papers, IEEE Transactions on, vol. 58, no. 5, pp. 921 –934, May 2011.
[20] M. Neugebauer and K. Kabitzsch, “A new protocol for a low power sensor network,” inPerformance, Computing, and Communications, 2004 IEEE International Conference on,2004, pp. 393 – 399.
[21] K. Kozmin, “Data acquisition and signal conditioning for low power measurement systems,”Ph.D. dissertation, Lulea University of Technology, 2008.
120
[22] K. Guan and A. Singer, “Opportunistic sampling by level-crossing,” in Acoustics, Speech andSignal Processing, 2007. IEEE International Conference on, vol. 3, April 2007, pp. III–1513–III–1516.
[23] R. Plassche, CMOS Integrated Analog-to-Digital and Digital-to-Analog Converters. KluwerAcademic Publishers, 2003.
[24] M. Kurchuk, C. Weltin-Wu, D. Morche, and Y. Tsividis, “GHz-range continuous-time pro-grammable digital FIR with power dissipation that automatically adapts to signal activity,”in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE Interna-tional, February 2011, pp. 232 –234.
[25] Interface Integrated Circuits 1974, National Semiconductor, 1974.
[26] F. Akopyan, R. Manohar, and A. Apsel, “A level-crossing flash asynchronous analog-to-digital converter,” in Asynchronous Circuits and Systems, 2006. 12th IEEE InternationalSymposium on, March 2006, pp. 11 pp. –22.
[27] E. Allier, J. Goulier, G. Sicard, A. Dezzani, E. Andre, and M. Renaudin, “A 120nm lowpower asynchronous ADC,” in Low Power Electronics and Design, 2005. Proceedings of theInternational Symposium on, August 2005, pp. 60 – 65.
[28] J. McCreary and P. Gray, “All-MOS charge redistribution analog-to-digital conversion tech-niques. I,” Solid-State Circuits, IEEE Journal of, vol. 10, no. 6, pp. 371 – 379, December1975.
[29] LMV7219 Datasheet, National Semiconductor, 2004. [Online]. Available: http://www.national.com/ds/LM/LMV7219.pdf
[30] S.-K. Shin, Y.-S. You, S.-H. Lee, K.-H. Moon, J.-W. Kim, L. Brooks, and H.-S. Lee, “Afully-differential zero-crossing-based 1.2V 10b 26MS/s pipelined ADC in 65nm CMOS,” inVLSI Circuits. 2008 IEEE Symposium on, June 2008, pp. 218 –219.
[31] B. Hershberg, S. Weaver, and U.-K. Moon, “Design of a Split-CLS Pipelined ADC With FullSignal Swing Using an Accurate But Fractional Signal Swing Opamp,” Solid-State Circuits,IEEE Journal of, vol. 45, no. 12, pp. 2623 –2633, December 2010.
[32] P. Sotiriadis and A. Chandrakasan, “Low power bus coding techniques considering inter-wirecapacitances ,” in Custom Integrated Circuits Conference. Proceedings of the IEEE 2000,2000, pp. 507 –510.
[33] T. C. Sepke, “Comparator Design and Analysis for Comparator-Based Switched-CapacitorCircuits,” Ph.D. dissertation, Massachusetts Institute of Technology, September 2006.
121
[34] R. Corless, G. Gonnet, D. Hare, D. Jeffrey, and D. Knuth, “On the Lambert W Function,”Advances in Computational Mathematics, vol. 5, pp. 329 –359, 1996.
[35] B. Schell, “Continuous-Time Digital Signal Processors: Analysis and Implementation,”Ph.D. dissertation, Columbia University, May 2008.
[36] B. P. Lathi, Signal Processing and Linear Systems. University of Waterloo, 1998.
[37] W. Gardner, Cyclostationarity in Communications and Signal Processing. IEEE Press, 1994.
[38] P. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations. Springer,1991.
[39] M. Kurchuk, “Signal Encoding and Digital Signal Processing in Continuous Time,” Ph.D.dissertation, Columbia University, May 2011.
[40] A. Hajimiri and T. Lee, “A general theory of phase noise in electrical oscillators,” Solid-StateCircuits, IEEE Journal of, vol. 33, no. 2, pp. 179 –194, February 1998.
[41] M. Perrott, M. Trott, and C. Sodini, “A modeling approach for Σ∆ fractional-N frequencysynthesizers allowing straightforward noise analysis,” Solid-State Circuits, IEEE Journal of,vol. 37, no. 8, pp. 1028 – 1038, August 2002.
[42] M. Kurchuk and Y. Tsividis, “Signal-Dependent Variable-Resolution Clockless A/DConversion With Application to Continuous-Time Digital Signal Processing,” Circuits andSystems I: Regular Papers, IEEE Transactions on, vol. 57, no. 5, pp. 982 –991, May 2010.
[43] B. Schell and Y. Tsividis, “A Low Power Tunable Delay Element Suitable for AsynchronousDelays of Burst Information,” Solid-State Circuits, IEEE Journal of, vol. 43, no. 5, pp. 1227–1234, May 2008.
[44] M. Kurchuk and Y. Tsividis, “Energy-efficient asynchronous delay element with wide con-trollability,” in Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Sym-posium on, June 2010, pp. 3837 –3840.
[45] M. Bazes, “Two novel fully complementary self-biased CMOS differential amplifiers,” Solid-State Circuits, IEEE Journal of, vol. 26, no. 2, pp. 165 –168, February 1991.
[46] R. Aparicio and A. Hajimiri, “Capacity limits and matching properties of integrated capaci-tors,” Solid-State Circuits, IEEE Journal of, vol. 37, no. 3, pp. 384 –393, March 2002.
[47] Y.-Z. Lin, C.-C. Liu, G.-Y. Huang, Y.-T. Shyu, and S.-J. Chang, “A 9-bit 150-MS/s 1.53-mWsubranged SAR ADC in 90-nm CMOS,” in VLSI Circuits. 2010 IEEE Symposium on, June2010, pp. 243 –244.
122
[48] C.-C. Liu, S.-J. Chang, G.-Y. Huang, and Y.-Z. Lin, “A 10-bit 50-MS/s SAR ADC With aMonotonic Capacitor Switching Procedure,” Solid-State Circuits, IEEE Journal of, vol. 45,no. 4, pp. 731 –740, April 2010.
[49] M. van Elzakker, E. van Tuijl, P. Geraedts, D. Schinkel, E. Klumperink, and B. Nauta, “A10-bit Charge-Redistribution ADC Consuming 1.9 µW at 1 MS/s,” Solid-State Circuits, IEEEJournal of, vol. 45, no. 5, pp. 1007 –1015, May 2010.
[50] G. Promitzer, “12-bit low-power fully differential switched capacitor noncalibrating succes-sive approximation ADC with 1 MS/s,” Solid-State Circuits, IEEE Journal of, vol. 36, no. 7,pp. 1138 –1143, July 2001.
[51] P. Harpe, C. Zhou, Y. Bi, N. van der Meijs, X. Wang, K. Philips, G. Dolmans, and H. de Groot,“A 26 muW 8 bit 10 MS/s Asynchronous SAR ADC for Low Energy Radios,” Solid-StateCircuits, IEEE Journal of, vol. 46, no. 7, pp. 1585 –1595, July 2011.
[52] M. Pelgrom, “A 10-b 50-MHz CMOS D/A converter with 75-Ω buffer,” Solid-State Circuits,IEEE Journal of, vol. 25, no. 6, pp. 1347 –1352, December 1990.
[53] M. Dessouky and A. Kaiser, “Input switch configuration suitable for rail-to-rail operation ofswitched op amp circuits,” Electronics Letters, vol. 35, no. 1, pp. 8 –10, January 1999.
[54] A. Abo and P. Gray, “A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter,”Solid-State Circuits, IEEE Journal of, vol. 34, no. 5, pp. 599 –606, May 1999.
[55] G. Wegmann, E. Vittoz, and F. Rahali, “Charge injection in analog MOS switches,” Solid-State Circuits, IEEE Journal of, vol. 22, no. 6, pp. 1091 – 1097, December 1987.
Appendix A
Derivation of Distortion Equations
This appendix provides the supporting derivations for quantities used in chap. 3.
A.1 The Fourier Components cm,n
We first compute the Fourier coefficients cm,n: the indices indicate the nth Fourier coefficient of the
T Rm(t) signal. If the input is a full-scale frequency-normalized sine of the form x(t) = sin(2πt),
then the output of the mth level T Rm(t) within the [0,1) interval is
T Rm(t) =
12N−1 , t1 < t(mod1)< t2
− 12N−1 , t3 < t(mod1)< t4
0, otherwise.
(A.1)
123
124
where we have used the shorthand
t1 =arcsin(Tm)
2π
t2 =12− arcsin(Tm)
2π
t3 =12+
arcsin(Tm)
2π
t4 = 1− arcsin(Tm)
2π
(A.2)
From this, we can compute the Fourier coefficients cm,n as
cm,n =∫ 1
0T Rm(t)e−2π jntdt (A.3)
Since T Rm(t) is non-zero over two intervals, we can decompose the integral:
∫ 1
0T Rm(t)e−2π jntdt =
12N−1
[−∫ t2
t1e−2π jntdt−
∫ t4
t3e−2π jntdt
](A.4)
=1
2π jn2N−1
[e−2π jnt∣∣t4
t3− e−2π jnt∣∣t2
t1
](A.5)
=1
2π jn2N−1
[e−2π jne jnAm− e−π jne− jnAm− e−π jne jnAm + e− jnAm
](A.6)
= (1− (−1)n)e jnAm + e− jnAm
2π jn2N−1 (A.7)
125
where Am = arcsin(Tm). This can be further reduced to
cm,n =
2cos(narcsin(Tm))
π jn2N−1 , n ∈ [1,3,5 · · ·)
0, otherwise.(A.8)
A.2 Non-Uniform Delay Impact on Harmonic Distortion
If fn was the nth Fourier component of vxq(t) without ZCD delay dispersion (see Eq. (3.16)),
then denoting the nth Fourier coefficient of vxq(t) (with ZCD delay dispersion) as gn, then we can
express gn as:
gn = ∑m
e−2π jndmcm,n, (A.9)
where dm are the ZCD delays corresponding to the T Rm levels. This equation follows from Eq.
(3.23). The power of the nth harmonic with delay dispersion distortion is equal to the squared norm
of gn:
|gn|2 = ∑m
e−2π jndmcm,n∑m
e−2π jndmcm,n (A.10)
= ∑m
e−2π jndmcm,n ∑m
e2π jndmcm,n (A.11)
= ∑m|cm,n|2 + ∑
r 6=se−2π jn(dr−ds)cr,ncs,n (A.12)
For shorthand we define dr,s ≡ dr− ds. We observe that the terms of the second summation
can be grouped into pairs of the form e−2π jndr,scr,ncs,n + e2π jndr,scr,ncs,n, which is equivalent to
126
z+ z = 2R z, so
|gn|2 = ∑m|cm,n|2 + ∑
r 6=sR
e−2π jn(dr,s)cr,ncs,n
(A.13)
To make sure, if we consider the case with zero (or constant) delay through the ZCD, all the terms
dr,s are zero, hence all the cross terms reduce to
|gn|2 = ∑m|cm,n|2 + ∑
r 6=sR cr,ncs,n= | fn|2 (A.14)
the last equality can be seen by squaring Eq. (3.17).
To give this relevance, the spectrum of an LCADC has dominant distortion components close
to the fundamental (3rd, 5th, etc.) and very far out in frequency. Since the far ones are easily
filtered, we focus on the close-in spurs, which have a longer periodicity. Therefore, let us make the
assumption that the ZCD delay dispersion is much smaller than the full period of the input1, i.e.
max |dr,s| 2π, where 2π is the period of the frequency-normalized input. If we consider the nth
harmonic for small n, it would carry over that nmax |dr,s| 2π, so we can then approximate
e−2π jn(dr,s) = cos(2πndr,s)− j sin(2πndr,s) (A.15)
≈ 1−2π2n2d2
r,s− j2πndr,s (A.16)
1For LCADCs based on feedback DACs (i.e. not flash LCADC) the maximum delay around the entire loop, whichis also an upper bound on the absolute ZCD delay, must be a fraction of the input period, and usually even a fractionof the time granularity. Therefore, the assumption is valid.
127
When we substitute this back into Eq. (A.13), we get
|gn|2 = ∑m|cm,n|2 + ∑
r 6=sR e−2π jndr,scr,ncs,n (A.17)
= ∑m|cm,n|2 + ∑
r 6=s
[(1−2π
2n2d2r,s)
R cr,ncs,n+2πndr,sIcr,ncs,n]
(A.18)
= ∑m|cm,n|2 + ∑
r 6=sR cr,ncs,n−2π
2n2∑r 6=s
d2r,sR cr,ncs,n+2πn ∑
r 6=sdr,sIcr,ncs,n (A.19)
= | fn|2−2π2n2
∑r 6=s
d2r,sR cr,ncs,n+2πn ∑
r 6=sdr,sIcr,ncs,n (A.20)
Therefore, we can write
|gn|2 = | fn|2 + en, (A.21)
where
en = 2πn ∑r 6=s
dr,sIcr,ncs,n−2π2n2
∑r 6=s
d2r,sR cr,ncs,n. (A.22)
A.3 Input Slope at T Rm(t) Transitions
At a transition of the T Rm(t) signal, we know that the input voltage to the quantizer is ±Tm. There
are four transitions per period of the T Rm(t) signal, whose transition times are given in eqs. A.2.
Since the slope of a full-scale frequency-normalized sine input is
v′x(t) = 2πcos(2πt) (A.23)
128
if we plug in t1 · · · t4 from Eq. (A.1) we get
|v′m|= 2πcos(arcsin(Tm)) (A.24)
Where v′m is the derivative of the input at the transitions of the T Rm(t) signal. It is assumed that the
ZCD has a response time that varies only with the magnitude of input derivative, not input polarity.
Appendix B
Silicon Errata
The following is a list of the bugs so far discovered in the LCADC.
1. All power domains are isolated from each other with only one back-to-back diode, and there-
fore cannot be powered off independently (without drawing several mA).
2. Layout mismatch due to stress effects of metal routing over active areas in the current DACs
of the timing generator cause non-monotonicity in the delay tuning curves. The routing was
necessary to save layout area.
3. Even when SPI is inactive (chip select CS high) serial data out (SDO) is still low-impedance.
In the situation where the IC is on a shared SPI bus, an external tristate buffer is needed on
SDO to share the output line. This is not a problem when the IC communicates directly with
the SPI master.
129
130
4. Under certain circumstances, when the analog supply is ramped slowly (such as in the pres-
ence of large external decoupling and/or high supply series resistance) the input switch boot-
strap driver may not initialize properly. In normal operation, this disables the chip since the
input sampling capacitor is discharged and the ZCD never triggers a sample, so the input
bootstrap driver is never initialized. A solution is to run the calibration algorithm imme-
diately after power-on/reset, since the calibration routine force-initializes the high-voltage
generator. The other solution is to put an RC timeout on the reset pin as was done in the
testing board.
5. An undetermined source of nonlinearity (suspected to be in the bootstrapped input switch)
limits the usable range of DAC output codes to be around 190/256. This is discussed in detail
in Chap. 5.
6. Under rare conditions (if the calibration mode is enabled precisely when the asynchronous
logic is in certain state), the input bootstrapped switch may not disconnect the input from
the ADC during calibration, resulting in erroneous calibration values. It has been found that
this event occurs so rarely that simply running the calibration three times and taking the two
results most similar is sufficient.
7. A ground-referenced bias voltage is sent between power domains in the timing replica test
circuit. Therefore, any disturbances between the two grounds is coupled directly into the test
circuit. Under normal conditions this should not prevent proper operation, although it will
increase timing jitter in the test outputs. The main function of the test circuits is to collect
131
data for this thesis; this error is not present in the main timing generator and therefore does
not degrade ADC performance.
8. The SPI input pads do not have hysteresis and are thus prone to registering a ringing line as
erroneous clock edges. With proper termination (a shunt of 100 Ω and 220 pF was used for
a 1-m 100-Ω ribbon cable) the SPI bus will operate reliably at 12 Mb/s.