


Noise-Shaping Gradient Descent based Online Adaptation Algorithms for Digital Calibration of Analog Circuits

Shantanu Chakrabartty, Senior Member, IEEE, Ravi Shaga, Kenji Aono, Student Member, IEEE

Abstract—Analog circuits that are calibrated using digital-to-analog converters use a DSP-based algorithm for real-time adaptation and programming of system parameters. In this paper, we first show that this conventional framework for adaptation yields sub-optimal calibration properties due to artifacts introduced by quantization noise. We then propose a novel online stochastic optimization algorithm, called noise-shaping or Σ∆ gradient descent, that can shape the quantization noise out of the frequency regions spanning the parameter adaptation trajectories. As a result, the proposed algorithms demonstrate superior parameter-search properties compared to floating-point gradient methods and better convergence properties than conventional quantized gradient methods. In the second part of this paper, we apply the Σ∆ gradient descent algorithm to two examples of real-time digital calibration: (a) balancing and tracking of bias currents; and (b) frequency calibration of a band-pass Gm-C biquad filter biased in weak-inversion. For each of these examples, the circuits have been prototyped in a 0.5-µm CMOS process, and we demonstrate that the proposed algorithm is able to find the optimal solution even in the presence of spurious local minima introduced by the non-linear and non-monotonic response of the calibration DACs.

Index Terms—online learning, sigma-delta learning, noise-shaping, stochastic optimization, digitally assisted analog circuits, analog VLSI, quantization

I. INTRODUCTION

DIGITALLY calibrated analog circuits exploit the precision and reliability of digital signal processing techniques to compensate and calibrate for artifacts (mismatch or non-linearity) encountered in analog components [1]–[3]. Digital calibration (real-time or offline) has been successfully applied to the design of neuromorphic systems [5]–[8], to the design of precision analog-to-digital converters (ADCs) [4], and to the design of current references and memories [9], [10]. Nowhere is the effect of analog artifacts more pronounced than for ultra-low-power CMOS analog circuits where the MOSFET transistors are biased in weak-inversion [5], [11], [13]. As a result, precise and real-time calibration of the circuit requires an exhaustive search and adaptation over the system parameters. In digitally calibrated analog circuits, the system parameters are typically stored on on-chip digital-to-analog converters (DACs) which are programmed using a digital signal processor (DSP). While many adaptation strategies could be implemented on the DSP [1], the most

S. Chakrabartty and K. Aono are with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, 48824 USA. R. Shaga is currently with the Intel Corporation, OR, USA (e-mails: {shantanu,shagarav,aonokenj}@egr.msu.edu).

Fig. 1. Calibration of an analog circuit (system) using a digital adaptation loop.

simple and commonly used adaptation algorithm involves the use of online gradients [3]. This method is illustrated in Fig. 1, where the analog circuit to be calibrated is embedded within a digital adaptation loop and the system parameters θ are incrementally updated based on the gradient of an error or quality metric, g(θ) [1], [12], [14].

However, when the error gradients are measured directly or indirectly using an analog-to-digital converter (ADC) (see Fig. 1) and when the system parameters are programmed using non-ideal digital-to-analog converters (DACs), quantization noise and non-linearity are injected into the adaptation process. In the first part of this paper, we analyze the effect of quantization noise on the adaptation process and show that the conventional calibration approach leads to a sub-optimal trade-off between the precision of the solution and the speed of convergence of the algorithm. This effect is illustrated in Fig. 2, which plots a typical system adaptation trajectory in the time and spectral domains. Fig. 2(a) plots the system error (computed as the norm of the difference between the instantaneous system parameter θ and the target/ideal parameter θ*) with respect to time as the system continuously adapts. For any asymptotically consistent adaptation algorithm, the system error should converge to zero (absolutely or in probability) over time. In the spectral domain, the adaptation trajectory can be visualized as band-limited within a low-frequency region, as shown in Fig. 2(b). When the error gradient is directly quantized, the quantization noise affects the adaptation trajectory (shown in the time plot in Fig. 2(a))


Fig. 2. Effect of quantization noise on adaptation trajectories: in (a) time-domain; and in (b) spectral-domain.

by injecting noise into the adaptation frequency band (shown in the spectral plot, Fig. 2(b)). The injection of noise affects the precision and hence the speed of convergence.

However, the addition of quantization noise during adaptation can also be beneficial, because it makes the online algorithm robust to being trapped in local minima. This robustness is important when the calibration DACs used in Fig. 1 are non-ideal. DAC non-idealities arise from design constraints (such as small silicon area and sub-threshold biasing) that ensure the calibration circuitry does not consume too much power or area. This implies that the system error metric being optimized is a non-linear function of the system parameters. Thus, the magnitude of the quantization noise determines the trade-off between the speed/precision of convergence and the parameter-search properties of the gradient-based calibration algorithm.

In this paper, we propose a class of noise-shaping online optimization algorithms, called Σ∆ gradient-descent algorithms, that achieve a better trade-off between the speed/precision of convergence and the robustness to spurious local minima. The concept is illustrated in Fig. 2(b), where the algorithm shapes the quantization noise out of the adaptation frequency band. Note that this approach is, in principle, similar to conventional Σ∆ modulation [15]; however, its use in online optimization algorithms like gradient descent has not been reported. In particular, we demonstrate the effectiveness of the Σ∆ gradient-descent algorithm for digital calibration of analog circuits.

The paper is organized as follows: Section II compares the spectral properties of the conventional and the proposed noise-shaping optimization algorithms. Section III describes the application of the proposed algorithm to digital calibration of some example analog circuits, and Section IV concludes the paper with a discussion of future research directions. Before we describe the mathematical model of the proposed online learning algorithms, we define some of the mathematical notations that will be used throughout this paper.

A (bold capital letters): a matrix, with elements denoted by a_ij, i = 1, 2, ...; j = 1, 2, ...
x (normal lowercase letters): a scalar variable.
x (bold lowercase letters): a vector, with elements denoted by x_i, i = 1, 2, ...
||x||_p: the L_p norm of a vector, given by ||x||_p = (Σ_i |x_i|^p)^{1/p}.
A^T: the transpose of A.
x_n: a discrete-time sequence of vectors, where n denotes the time-index.
R^D: the D-dimensional space of real numbers.

II. MATHEMATICAL MODELING OF NOISE-SHAPED ONLINE OPTIMIZATION

We will denote the parameters of the analog system by a vector θ ∈ R^D. For example, the system parameters of a band-pass filter-bank could be a vector comprising the center frequencies or its quality factors. The objective of a calibration or a system optimization procedure is to estimate θ* such that

θ* = arg min_{θ ∈ C} L(θ)   (1)

where L : R^D → R is a system error function and C is the constraint space over which the minimization is performed. It is assumed that L(.) ≥ 0 and is continuous and differentiable w.r.t. θ. Also, the calibration algorithm has access to the system gradient function g(θ) ∈ R^D, given by

g(θ) = ∂L(θ)/∂θ.   (2)

A. Steepest Gradient-descent Algorithm

The simplest form of online calibration uses the steepest gradient-descent (GD) algorithm [16], [17], where the parameters θ are iteratively updated according to

θ_n = θ_{n−1} − ε_n g(θ_{n−1}).   (3)

Here n = 1, 2, ... denotes the iteration or time index and ε_n denotes a learning-rate parameter. It has been shown that the GD algorithm converges to a local minimum when the learning rate asymptotically satisfies Σ_{n=1}^∞ ε_n → ∞ and ε_n → 0 as n → ∞ [16]. In the literature, different forms of steepest GD algorithms have been proposed based on the choice of the error function L(.) and the trajectory of the learning-rate parameter. A popular variant that is only applicable to convex mean-square error functions is the family of least-mean-square (LMS) algorithms (e.g. the recursive LMS (RLMS) or time-varying LMS (TV-LMS) [18] algorithms), which are commonly used in the design of adaptive filters. Also, many GD algorithms start with a larger value of the learning-rate to speed up the


Fig. 3. System architecture showing the signal-flow of a quantized gradient descent (QGD) algorithm.

convergence, but then decrease the magnitude of the learning-rate to precisely converge to the local minimum. However, this asymptotic property is not useful when the objective is real-time tracking and calibration of system parameters. Also, when L(.) has multiple local minima, smaller values of ε_n increase the likelihood of the iteration in equation (3) being trapped in a local minimum. In our comparison we have used a fixed value of the learning rate, ε_n = ε, even though the procedure could potentially be generalized to include time-varying learning-rate parameters.
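As a concrete illustration (not from the paper), the fixed-rate update in equation (3) can be sketched in a few lines of Python; the quadratic loss and its gradient used here are hypothetical stand-ins for the system error function L(θ):

```python
import numpy as np

# Hypothetical system error L(theta) = 0.5*||theta - theta_star||^2,
# so its gradient is g(theta) = theta - theta_star.
theta_star = np.array([1.0, -2.0])

def g(theta):
    return theta - theta_star

eps = 0.1                      # fixed learning rate (eps_n = eps)
theta = np.zeros(2)            # initial parameter estimate
for n in range(200):           # iterate equation (3)
    theta = theta - eps * g(theta)

print(np.allclose(theta, theta_star, atol=1e-6))  # converges to theta*
```

With a fixed ε the error contracts by (1 − ε) per iteration for this loss, which is why a couple of hundred iterations suffice here.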

B. Quantized Gradient-descent Algorithms

In a real-world analog circuit or system, the system gradient g(.) is measured using an ADC, the GD adaptation is implemented using a DSP or a micro-controller, and DACs are used for generating the analog parameters. The signal-flow architecture is shown in Fig. 3, where g(.) is quantized before being processed by the GD-based adaptation algorithm. From an analytical point of view, the ADC and the DACs can be combined, and the output vector d_n can be modeled by a quantization noise vector q_n [19] added to the input of the ADC/DAC module, as shown in Fig. 3. The GD algorithm in equation (3) can accordingly be modified as:

θ_n = θ_{n−1} − ε_n [g(θ_{n−1}) + q_n].   (4)

The algorithm in equation (4) has a similar form to a noisy-GD algorithm [20], [21] and to sign-based adaptive filters [22], both of which have been extensively studied in the literature. When the quantization noise q_n admits certain statistical properties [23] (like zero-mean and bounded variance), it can be shown that θ_n is bounded and converges almost surely to a local minimum. Also, the addition of noise to the GD algorithm improves the search properties of the algorithm (to find a more optimal solution), as the stochastic nature of the algorithm facilitates escaping the neighborhood of a local minimum [20], [24], [25]. Note that, depending on the choice of learning parameter ε_n, the final value of θ will oscillate randomly about the optimal point θ* and will affect the precision of convergence. In the literature, two approaches have been proposed to alleviate this effect of noise and improve the precision of convergence. The first approach uses simulated annealing techniques [26], [27], where a larger value of the adaptation parameter ε is chosen initially and is successively damped, allowing the initial iterations to move θ out of the local minima neighborhood. As the number of iterations increases, the effect of the additive noise decreases as the adaptation parameter goes to zero. However, the choice of the noise floor level and annealing rate depends on the loss function being minimized, and in most cases the nature of the loss function, and hence the gradient, is not known a priori. Also, if the objective is real-time parameter estimation and tracking, asymptotic methods like annealing are not very useful because the magnitude of the noise and the rate of annealing have to be continuously adjusted. The second approach uses a post-filtering technique on the parameter sequence θ_n. This is shown in Fig. 3, where the post-processing filter P(z) (typically a low-pass filter) filters out the noise and produces smoothed-out estimates θ^E_n of the parameter. For real-time tracking and adaptation, the second approach is more practical and hence has been chosen for our analysis and implementation.
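A minimal sketch of the quantized update (4) with post-filtering, under illustrative assumptions (a uniform mid-tread quantizer standing in for the combined ADC/DAC, a quadratic loss, and a moving average standing in for P(z)):

```python
import numpy as np

theta_star = 1.0
step = 0.125                   # assumed quantizer step (4-bit-like resolution)

def quantize(x):
    # Uniform mid-tread quantizer modeling the combined ADC/DAC
    return step * np.round(x / step)

eps = 0.2
theta = 0.0
history = []
for n in range(2000):
    grad = theta - theta_star             # g(theta) for L = 0.5*(theta-theta*)^2
    theta = theta - eps * quantize(grad)  # equation (4): quantized gradient update
    history.append(theta)

# Post-filter P(z): a simple moving average standing in for the low-pass filter
theta_E = np.mean(history[-200:])
print(abs(theta_E - theta_star) < step)   # smoothed estimate lands near theta*
```

Note the quantizer's dead zone: once |g(θ)| < step/2 the update freezes, so the raw θ_n stalls within one quantizer step of θ*; this residual offset is exactly the precision limit that the noise-shaping algorithm of Section II-C addresses.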

We analyze the spectral properties of the quantized GD (QGD) algorithm in the proximity of the local minimum θ*, where the gradient function g(θ) can be approximated using a Taylor series expansion as:

g(θ) ≈ g(θ*) + G(θ*)^T (θ − θ*)   (5)

g(θ) ≈ G(θ*)^T (θ − θ*).   (6)

Here G(θ*) ∈ R^{D×D} is a positive definite Hessian matrix computed at the local minimum θ*, where g(θ*) = 0. Substituting equation (6) into equation (4),

θ_n = θ_{n−1} − ε_n [G(θ*)^T (θ_{n−1} − θ*) + q_n]   (7)

which can be rewritten as

θ_n − θ* = θ_{n−1} − θ* − ε_n [G(θ*)^T (θ_{n−1} − θ*) + q_n].   (8)

Denoting the error signal, the difference between the instantaneous parameter vector and the optimal parameter vector, as e_n = θ_n − θ*, equation (8) leads to

e_n = e_{n−1} − ε_n [G(θ*)^T e_{n−1} + q_n].   (9)

If ε_n = ε is kept constant, then equation (9) can be analyzed in the spectral domain using z-transforms, which leads to:

E(z) = −[I + z^{−1}(εG(θ*) − I)]^{−1} εQ(z).   (10)

To understand the spectral properties of equation (10), consider the case when θ ∈ R is a scalar parameter. Equation (10) can be reduced to

E(z) = −εQ(z) / (1 − z^{−1} + εG(θ*)z^{−1}).   (11)

At frequencies ω ≪ ω_s, where ω_s is the frequency of parameter updates, z^{−1} ≈ 1 − jω/ω_s, and hence the error transfer function can be written as

|E(ω)|/|Q(ω)| = (1/G(θ*)) [1 + ω²/ω_p²]^{−1/2}   (12)

where

ω_p = εω_s G(θ*) / (1 − εG(θ*)).   (13)
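The low-frequency approximation (12)-(13) can be sanity-checked numerically against the exact transfer function (11) evaluated on the unit circle; the parameter values below are illustrative, not from the paper:

```python
import numpy as np

# Exact error transfer function of equation (11) vs. the approximation (12)-(13)
eps, G, ws = 0.1, 1.0, 1.0              # illustrative values (assumed)
wp = eps * ws * G / (1 - eps * G)       # equation (13)

w = np.linspace(1e-4, 0.05 * ws, 200)   # frequencies well below ws
z_inv = np.exp(-1j * w / ws)            # z^-1 on the unit circle
H_exact = np.abs(-eps / (1 - z_inv + eps * G * z_inv))      # equation (11)
H_approx = (1 / G) / np.sqrt(1 + (w / wp)**2)               # equation (12)

print(np.max(np.abs(H_exact - H_approx) / H_approx) < 0.02)  # within ~2%
```

For ω up to 5% of ω_s the first-order expansion z^{−1} ≈ 1 − jω/ω_s keeps the approximation within roughly one percent of the exact magnitude response.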


If e^E_n = θ^E_n − θ* denotes the error between the estimated value of the system parameter θ^E_n and the actual parameter θ*, then the mean-square error (MSE) E_QGD is given by

E_QGD = lim_{N→∞} (1/N) Σ_{n=1}^{N} |e^E_n|².   (14)

Because the adaptation process is much slower than the rate at which the system parameters are updated, we can assume that the adaptation process is band-limited within [−ω_a, ω_a], with ω_a ≪ ω_s. Under this assumption, an ideal reconstruction filter P(z) in Fig. 3 would be a brick-wall filter given by

P(ω) = 1 for |ω| ≤ ω_a, and P(ω) = 0 for |ω| > ω_a.   (15)

Thus, the spectrum of the error e^E_n is related to E(ω) according to

E^E(ω) = P(ω)E(ω)   (16)

and equation (14) can be rewritten using Parseval's theorem as

E_QGD = ∫_{−∞}^{∞} |E^E(ω)|² dω   (17)
      = ∫_{−∞}^{∞} |P(ω)|²|E(ω)|² dω   (18)
      = ∫_{−ω_a}^{ω_a} |E(ω)|² dω.   (19)

Combining equations (19) and (12) leads to

E_QGD = ∫_{−ω_a}^{ω_a} (|Q(ω)|²/G²(θ*)) [1 + ω²/ω_p²]^{−1} dω.   (20)

To simplify the analysis, we will assume that the quantization noise exhibits a white-noise spectrum (a commonly used assumption [15], [19]) within the frequency band [−ω_s, ω_s]. If the combined resolution of the ADC and the DAC in Fig. 1 is assumed to be B bits, then the power spectral density of the quantization noise is given by |Q(ω)|² = 2^{−2B}/(12ω_s) [15], [19]. Note that the white-noise assumption is not entirely valid for a single-bit quantizer. In this case, one could employ a “dithering” technique where white noise is synthetically injected before the quantizer [15]. The total power within the adaptation band can then be found to be

E_QGD = (2^{−2B}ω_p / (6G²(θ*)ω_s)) tan^{−1}(ω_a/ω_p).   (21)
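The closed form (21) can be verified by integrating (20) numerically; the values of B, ε, ω_s and ω_a below are illustrative assumptions:

```python
import numpy as np

# Numerically integrate equation (20) and compare with the closed form (21).
B, G, eps = 4, 1.0, 0.1                    # illustrative values (assumed)
ws, wa = 1.0, 0.01
wp = eps * ws * G / (1 - eps * G)          # equation (13)
Q2 = 2.0**(-2 * B) / (12 * ws)             # white quantization-noise PSD

w = np.linspace(-wa, wa, 100001)
integrand = (Q2 / G**2) / (1 + (w / wp)**2)   # |E(w)|^2 via equation (12)
dw = w[1] - w[0]
# trapezoidal rule, written out to avoid version-specific numpy helpers
E_numeric = (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dw

E_closed = 2.0**(-2 * B) * wp / (6 * G**2 * ws) * np.arctan(wa / wp)  # (21)
print(abs(E_numeric - E_closed) / E_closed < 1e-6)
```

The agreement follows from the standard antiderivative ∫ dω/(1 + ω²/ω_p²) = ω_p tan^{−1}(ω/ω_p).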

C. First-order Noise Shaping Gradient Descent Algorithm

One of the approaches for reducing the effect of quantization noise is to use noise-shaping techniques to move the noise power out of the adaptation band. Noise-shaping principles have been used extensively in the design of Σ∆ analog-to-digital converters [15]. However, the method has not been applied to online optimization, with the exception of our previous work [28], [29], where Σ∆ modulation was obtained as a consequence of an online learning algorithm. An architecture implementing first-order noise-shaping for the Σ∆ gradient-descent (SGD) algorithm is shown in Fig. 4.

Fig. 4. System architecture showing the signal-flow of a Σ∆ gradient descent algorithm.

When compared to the previous approach shown in Fig. 3, a Σ∆ modulator is used instead of the conventional quantizer (ADC). Please note that since the system parameter θ is being continuously adapted in both cases (using the output from the quantizer), the comparison between the proposed and the conventional approach is reasonable. Also, the sampling-rate or parameter update-rate is the same for SGD and QGD, so the traditional Σ∆ oversampling argument is not directly applicable in the case of online learning. In fact, the novelty of the proposed Σ∆ gradient descent approach arises from the fact that the steepest gradient descent algorithm in (3) naturally performs low-pass filtering, and therefore SGD achieves better calibration/convergence compared to the conventional QGD method.

The updates for the parameter θ can be expressed as

θ_n = θ_{n−1} − ε [g(θ_{n−1}) + q_n − q_{n−1}].   (22)

Using the Taylor series approximation of g(θ_{n−1}) given by equation (6) in equation (22) leads to

θ_n = θ_{n−1} − ε [G(θ*)^T (θ_{n−1} − θ*) + q_n − q_{n−1}].   (23)
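A standard way to realize the differentiated noise term q_n − q_{n−1} of equation (22) is an error-feedback quantizer: the quantization error of each step is added to the next input, so the update sees a first-order-shaped noise. The sketch below uses an illustrative quantizer step and quadratic loss (assumptions, not the paper's circuit):

```python
import numpy as np

step = 2.0**(-4)                 # assumed quantizer step
def quantize(x):
    return step * np.round(x / step)

theta_star, eps = 1.0, 0.2
theta, err_prev = 0.0, 0.0
history = []
for n in range(2000):
    g = theta - theta_star       # gradient of L = 0.5*(theta - theta*)^2
    v = g + err_prev             # feed back previous quantization error
    d = quantize(v)
    err_prev = v - d             # current quantization error, used next step
    theta = theta - eps * d      # SGD update: effectively equation (22)
    history.append(theta)

theta_E = np.mean(history[-200:])  # crude stand-in for the post-filter P(z)
print(abs(theta_E - theta_star) < step / 2)
```

Unlike the plain quantized update, the error feedback removes the quantizer's dead-zone offset: the local average of d_n tracks the local average of g(θ), so the filtered estimate settles well within one quantizer step of θ*.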

Similar to the analysis for the QGD algorithm, the z-transform of the error e_n = θ_n − θ* can be obtained as

E(z) = −[I + z^{−1}(εG(θ*) − I)]^{−1} ε(1 − z^{−1})Q(z),   (24)

which for a one-dimensional system parameter can be expressed as

E(z) = −ε(1 − z^{−1})Q(z) / (1 − z^{−1} + εG(θ*)z^{−1}).   (25)

At frequencies ω ≪ ω_s, with ω_s being the frequency of parameter updates, z^{−1} ≈ 1 − jω/ω_s. The error transfer function at frequency ω is given by

|E(ω)|/|Q(ω)| = (ω/(G(θ*)ω_s)) [1 + ω²/ω_p²]^{−1/2}   (26)

where ω_p is given by equation (13). For the reconstruction filter P(z) given by equation (15), the MSE E_SGD between the estimated value of the system parameter θ^E_n and the actual parameter θ* is given by

E_SGD = ∫_{−ω_a}^{ω_a} (ω²|Q(ω)|²/(G²(θ*)ω_s²)) [1 + ω²/ω_p²]^{−1} dω.   (27)


The MSE reduces to

E_SGD = (2^{−2B}/(6G²(θ*))) (ω_p²/ω_s²) [ω_a/ω_s − (ω_p/ω_s) tan^{−1}(ω_a/ω_p)]   (28)

under the assumption that the quantization noise exhibits a white-noise spectrum.
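As with (21), the closed form (28) can be cross-checked by direct numerical integration of (27), with illustrative parameter values (assumed):

```python
import numpy as np

# Numerically integrate equation (27) and compare with the closed form (28).
B, G, eps = 4, 1.0, 0.1                    # illustrative values (assumed)
ws, wa = 1.0, 0.01
wp = eps * ws * G / (1 - eps * G)          # equation (13)
Q2 = 2.0**(-2 * B) / (12 * ws)             # white quantization-noise PSD

w = np.linspace(-wa, wa, 200001)
integrand = (w**2 * Q2 / (G**2 * ws**2)) / (1 + (w / wp)**2)  # |E|^2 via (26)
dw = w[1] - w[0]
E_numeric = (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dw

E_closed = (2.0**(-2 * B) / (6 * G**2)) * (wp / ws)**2 \
           * (wa / ws - (wp / ws) * np.arctan(wa / wp))        # equation (28)
print(abs(E_numeric - E_closed) / E_closed < 1e-5)
```

Here the relevant antiderivative is ∫ ω² dω/(1 + ω²/ω_p²) = ω_p²[ω − ω_p tan^{−1}(ω/ω_p)], which produces the bracketed term in (28).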

The relative reduction in MSE can be expressed as the ratio of equations (28) and (21):

E_SGD/E_QGD = (ω_p²/ω_s²) [ω_a/ω_p − tan^{−1}(ω_a/ω_p)] / tan^{−1}(ω_a/ω_p).   (29)

For the case ω_a ≪ ω_p, equation (29) can be approximated by

E_SGD/E_QGD = (1/3)(ω_a/ω_s)².   (30)

In equation (30) we have used the Taylor series approximation tan^{−1}(z) ≈ z − z³/3. Thus, the reduction in MSE within the adaptation band, R(dB) = 10 log₁₀(E_SGD/E_QGD), is given by

R(dB) = −4.7 + 20 log₁₀(ω_a/ω_s)   (31)

and is independent of the learning-rate parameter ε; it is only a function of the ratio of the sampling frequency ω_s to the adaptation bandwidth 2ω_a. This ratio can be viewed as an online-learning equivalent of the “oversampling ratio”, a metric commonly used to characterize the resolution of a Σ∆ analog-to-digital converter. Thus, for larger values of ε, the relative MSE R is independent of the learning rate and can be significantly reduced by increasing the sampling frequency ω_s.
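The approximation leading to (30)-(31) can be checked against the exact ratio (29) for ω_a ≪ ω_p; the values below are illustrative assumptions:

```python
import numpy as np

# Compare the exact MSE ratio (29) with the small-band approximation (30)-(31).
ws, eps, G = 1.0, 0.5, 1.0
wp = eps * ws * G / (1 - eps * G)     # equation (13): wp = 1.0 here
wa = 0.01 * ws                        # adaptation band, wa << wp

x = wa / wp
ratio_exact = (wp / ws)**2 * (x - np.arctan(x)) / np.arctan(x)   # equation (29)

R_exact = 10 * np.log10(ratio_exact)
R_approx = -4.7 + 20 * np.log10(wa / ws)                         # equation (31)
print(abs(R_exact - R_approx) < 0.1)   # agree to within ~0.1 dB
```

With ω_a/ω_s = 0.01, both expressions give roughly −45 dB of in-band noise reduction for SGD relative to QGD.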

For the case ω_a ≈ ω_p, equation (29) can be approximated by

E_SGD/E_QGD = ((4 − π)/π)(ω_p/ω_s)²,   (32)

which can be written in terms of the learning-rate parameter ε as

R(dB) ≈ −5.6 − 20 log₁₀[(1 − εG(θ*))/(εG(θ*))].   (33)

Thus, for smaller values of ε, the relative reduction in MSE is independent of the sampling frequency and is a monotonic function of ε. Note that the relative improvement in performance is not significant in regions where the cost function has plateaus, or in regions around local minima where G(θ*) ≈ 0. We now verify these results for a specific example.

Example I: Consider a time-dependent cost function L(θ; n) of a one-dimensional parameter θ, given by

L(θ; n) = (1/2)[sin(nω₀T_s) − θ]²   (34)

where sin(nω₀T_s) is a discrete-time sinusoidal signal at frequency ω₀, sampled every T_s seconds. Minimizing L(θ; n) with respect to θ will favor a θ which closely tracks the sinusoidal signal. The gradient of L estimated at θ_{n−1} is given by g(θ_{n−1}) = θ_{n−1} − sin(nω₀T_s), and the Hessian is

Fig. 5. Plot showing θ_n generated by the GD, QGD and SGD algorithms, along with the ideal θ*. The output of the GD algorithm is overlaid by the ideal trajectory.

Fig. 6. Plot showing θ^E_n generated by the GD, QGD and SGD algorithms when additive white noise (−6 dB) is used as a model for the quantization noise. The output of the GD algorithm is overlaid by the ideal trajectory.

given by G(θ) = 1. Thus, application of equation (12) leads to

|E(ω)|/|Q(ω)| = [1 + ω²/ω_p²]^{−1/2}   (35)

with

ω_p = εω_s/(1 − ε).   (36)
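A simulation in the spirit of Example I can be sketched as follows; the settings (a 4-bit-like uniform quantizer, ε = 0.5, a slow sinusoidal target, and a moving-average stand-in for P(z)) are illustrative assumptions, not the paper's exact experiment:

```python
import numpy as np

N, eps, step = 6000, 0.5, 2.0**(-3)
target = np.sin(2 * np.pi * 0.001 * np.arange(N))   # sin(n*w0*Ts)

def quantize(x):
    return step * np.round(x / step)    # uniform quantizer (ADC model)

th_qgd = th_sgd = 0.0
err_fb = 0.0                            # Sigma-Delta error-feedback state
e_qgd, e_sgd = [], []
for n in range(N):
    g = th_qgd - target[n]              # gradient of L(theta; n)
    th_qgd -= eps * quantize(g)         # QGD update, equation (4)
    e_qgd.append(th_qgd - target[n])

    g = th_sgd - target[n]
    v = g + err_fb                      # first-order noise shaping
    d = quantize(v)
    err_fb = v - d
    th_sgd -= eps * d                   # SGD update, equation (22)
    e_sgd.append(th_sgd - target[n])

def inband_mse(e, k=101):
    # Moving-average stand-in for the reconstruction filter P(z)
    sm = np.convolve(e, np.ones(k) / k, mode='valid')
    return np.mean(sm**2)

print(inband_mse(e_sgd) < inband_mse(e_qgd))   # noise shaping lowers in-band MSE
```

After low-pass filtering, the SGD trajectory tracks the sinusoid noticeably more precisely than QGD, consistent with the spectral analysis above.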

Fig. 5 compares the trajectory of the variable θ_n generated by the steepest gradient descent (GD), quantized gradient descent (QGD) and the proposed Σ∆ gradient descent (SGD) algorithms. The ideal or target parameter θ* is also shown in Fig. 5 for comparison. As expected, both the QGD and SGD algorithms generate a noisy representation of the target


Fig. 7. Magnitude spectrum of the error signal in the case of the ideal (GD), quantized (QGD) and Σ∆ gradient descent (SGD) algorithms for: (a) quantization noise level −30 dB (relative to the magnitude of the input signal); (b) a 1-bit quantizer with ε = 0.5; and (c) a 4-bit quantizer with ε = 0.5.

Fig. 8. MSE (in dB) vs. adaptation parameter ε for the quantized gradient descent (QGD) and 1st-order Σ∆ gradient descent (SGD) optimization algorithms with an n-bit quantizer, for (a) n = 1, (b) n = 2 and (c) n = 4.

parameter, whereas the traditional GD algorithm can perfectly track the ideal parameter trajectory. However, when the noisy sequence θ_n is filtered using the reconstruction filter P(z), one can see significant differences in the convergence properties exhibited by the QGD and SGD algorithms, as shown in Fig. 6. In Fig. 6, an 8th-order Butterworth low-pass filter has been used as the reconstruction filter P(z) to obtain θ^E_n. Note that the Butterworth filter “approximately” satisfies equation (15). Also, for this plot the quantization noise has been modeled using additive white noise with power −6 dB. The convergence plot can be divided into two regions: in the region labeled A, the algorithm converges to the target trajectory starting from an initialization condition; in the second region, the algorithm's trajectory tracks the target trajectory. Note that there is a phase (or processing) delay between the parameter sequences produced by all the algorithms and the target sequence. This delay is determined by the choice of the learning-rate parameter ε. It can be clearly seen from Fig. 6 that the convergence of the SGD algorithm is superior (in terms of precision) when compared to the QGD algorithm.

The effect of quantization noise and the benefits of noise-shaping in the SGD learning algorithm can be understood by analyzing the spectrum of the error signal E(ω). Fig. 7(a) shows the case when the quantization noise is modeled by additive white Gaussian noise at magnitude −30 dB. All mathematical operations for this simulation have been implemented using floating-point arithmetic. Fig. 7(b) shows the equivalent result when a 1-bit quantizer is used as the ADC, and Fig. 7(c) shows the case when a 4-bit quantizer is used. The results are shown only for the adaptation frequency band, ω ≪ ω_s. The results clearly show the noise-shaping effect of the SGD learning algorithm when compared to the QGD algorithm, which results in a lower residual error within the adaptation band. Note that when a 1-bit quantizer is used, the SGD algorithm exhibits “idle tones” in its spectral response, similar to the case of a single-bit Σ∆ modulator [15]. However, the idle tones are located at frequencies outside the adaptation band and hence cannot be observed in Fig. 7(b). Also note that even for the floating-point gradient descent (GD) algorithm, the residual error is non-zero, which is due to the phase delay between the target and the parameter sequence generated by the different algorithms (see Fig. 6). The MSE for each of the algorithms can be computed by integrating the power of the error signal E(ω) within the adaptation band according to equations (19) and (27).

Fig. 8(a)-(c) compares the MSE for the QGD and SGD algorithms, where the MSE has been computed according to MSE(dB) = 10 log₁₀ E. The results illustrate several interesting trends which can be explained based on the mathematical analysis given in Section II:

• As the value of the learning-rate parameter ε is increased, the MSE of the QGD algorithm increases. This is expected: near convergence, the magnitude of the oscillation (or noise) is proportional to ε. Mathematically, this is explained by equation (21), where ω_p increases as ε is increased. However, for the SGD algorithm, the relative MSE reduces with the increase in ε, as can be seen when equations (31) and (33) are compared. Conceptually, this phenomenon occurs due to two counter-balancing principles. Increasing the learning rate makes the learning process faster, which enables the algorithm to converge faster or track a time-varying system parameter. However,


[Figure: loss function vs. parameter θ, comparing the SGD and GD trajectories.]

Fig. 9. Convergence of the Σ∆ gradient descent algorithm to the global minimum.

Fig. 10. Micrograph of the prototype integrating the different circuit modules used to verify the proposed Σ∆ gradient-descent based calibration.

increase in the learning rate also increases the effect of quantization noise on the convergence to the local minima. In the case of the SGD algorithm, noise-shaping moves the quantization noise out of the adaptation band; hence, when the learning rate is increased, the deterioration due to the increase in precision error is smaller compared to the performance improvement due to the reduction in tracking error (or speed of convergence). Thus, a larger value of ε enhances the noise-shaping property of SGD and, as we will see later, also enhances the robustness of the SGD algorithm to spurious local minima.

• At lower values of ε, the difference between the MSE achieved by the QGD and SGD algorithms reduces. In fact, at higher ADC resolutions, the MSE of the QGD algorithm is marginally lower than that of the SGD algorithm. This effect is also explained by comparing the MSE equations (21) and (28), where the contribution due to the inverse tangent function remains bounded irrespective of the value of ε. However, note that reducing ε also increases the phase delay between the target parameter and the estimated parameter, as illustrated in Fig. 6.

The consequence of the results in Fig. 8(a)-(c) is that at a larger magnitude of the learning rate ε, the SGD algorithm has superior convergence properties compared to QGD. Moreover, a larger value of ε, along with the stochastic nature of the SGD updates, makes the algorithm less susceptible to getting trapped in local minima. This is illustrated in Fig. 9, where

we have considered an example of a non-linear cost function. The conventional floating-point GD gets trapped in a local minimum, whereas the SGD algorithm successfully converges to a more optimal minimum. Note that due to quantization noise, the QGD algorithm will also have similar search properties as SGD. Thus, the proposed SGD algorithm achieves a superior trade-off between precision (or speed of convergence) and robustness when compared against other online gradient-based techniques. Note that a larger magnitude of ε would also imply that the SGD algorithm could potentially escape from the global minimum. Therefore, the value of ε needs to be chosen appropriately based on a-priori knowledge about the nature of the cost function; however, the exact mathematical form of the cost function need not be known a-priori.
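The search behavior illustrated in Fig. 9 can be sketched as follows. The cost function below is a hypothetical stand-in (a quadratic bowl with a narrow Gaussian bump carving out a spurious local minimum), not the one used in the paper. The fixed ±ε steps produced by 1-bit Σ∆ modulation of the gradient let the SGD iterate step over the narrow basin that traps floating-point GD.

```python
import numpy as np

# Hypothetical non-convex cost: a quadratic bowl with global minimum at theta = 2,
# plus a narrow Gaussian bump creating a spurious local minimum near theta = 0.55.
h, w = 2.0, 0.15

def f(t):
    return (t - 2.0) ** 2 + h * np.exp(-((t - 0.8) / w) ** 2)

def g(t):  # analytic gradient of f
    return 2.0 * (t - 2.0) + h * np.exp(-((t - 0.8) / w) ** 2) * (-2.0 * (t - 0.8) / w ** 2)

# Floating-point gradient descent: small continuous steps, trapped in the local basin.
theta_gd = 0.0
for _ in range(500):
    theta_gd -= 0.02 * g(theta_gd)

# Sigma-delta gradient descent: the gradient is 1-bit quantized by a first-order
# sigma-delta loop, so every parameter update is a fixed step of +/- eps.
eps = 0.25
theta_sgd, v, d = 0.0, 0.0, 0.0
for _ in range(300):
    v += g(theta_sgd) - d          # integrate gradient minus previous 1-bit output
    d = 1.0 if v >= 0 else -1.0    # 1-bit quantizer
    theta_sgd -= eps * d           # fixed-magnitude parameter update

print(f"GD  settles at theta = {theta_gd:.3f}, f = {f(theta_gd):.3f}")
print(f"SGD settles near theta = {theta_sgd:.3f}, f = {f(theta_sgd):.3f}")
```

GD converges into the narrow spurious basin, while the Σ∆-driven iterate walks through it and ends in a small limit cycle around the global minimum.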

III. APPLICATIONS IN DIGITALLY CALIBRATED ANALOG CIRCUITS

In this section, we demonstrate the application of the SGD algorithm for digital calibration of analog circuits. The circuits have been prototyped in a 0.5-µm CMOS process, and the micrograph of the prototype is shown in Fig. 10. The circuit modules include a continuous-time first-order Σ∆ modulator, which implements the signal flow shown in Fig. 4, and a current-mode digital-to-analog converter (DAC) with a serial programming interface (SPI) for storage and adaptation of circuit parameters. The output of the modulator is sampled by an off-chip field-programmable gate array (FPGA), which implements the online adaptation according to Fig. 4. The FPGA then programs the on-chip current DAC. The prototype also integrates current sources and continuous-time Gm-C filters, which have been used in the examples discussed in this section.

A. Current tracking and current balancing

Our first setup, shown in Fig. 11(a), implements the tracking example described in Section II-C. The tracking signal is a current sink ISIG, and the system parameter is the current IDAC, which is programmed using a DAC as shown in Fig. 11(a). The schematic of the current DAC is shown in Fig. 11(c); it uses a MOS-based resistive divider that has been previously reported in the literature [30], [31]. For this example, the objective function L(IDAC) is given by

\mathcal{L}(I_{DAC}) = \frac{1}{2}\,(I_{SIG} - I_{DAC})^2 \qquad (37)

and the gradient with respect to IDAC is given by

g(I_{DAC}) = -\frac{\partial \mathcal{L}}{\partial I_{DAC}} = I_{SIG} - I_{DAC}. \qquad (38)

As shown in Fig. 11(a), the gradient is the input current to the first-order continuous-time Σ∆ modulator, whose schematic is shown in Fig. 11(b). The circuit-level description of the Σ∆ modulator has been reported elsewhere [32] and is omitted here for the sake of brevity.
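Under this arrangement, the FPGA only needs the 1-bit modulator output to increment or decrement the DAC code. A behavioral sketch of the loop is given below, with normalized currents and an ideal 6-bit DAC; the LSB step plays the role of ε, and all numeric values are hypothetical:

```python
I_SIG = 0.37          # normalized target current (hypothetical value)
LSB = 1.0 / 64.0      # ideal 6-bit DAC step; one LSB acts as the update step eps

code, v, d = 0, 0.0, 0.0
for _ in range(500):
    I_DAC = code * LSB
    grad = I_SIG - I_DAC                 # gradient of equation (38)
    v += grad - d                        # first-order sigma-delta integrator
    d = 1.0 if v >= 0 else -1.0          # 1-bit modulator output
    code += 1 if d > 0 else -1           # FPGA increments/decrements the DAC code
    code = min(max(code, 0), 63)         # clamp to the DAC range

I_DAC = code * LSB
print(f"converged DAC current = {I_DAC:.4f} (target {I_SIG})")
```

After the initial ramp, the code settles into a small limit cycle around the target, with the average modulator output (the average gradient) converging to zero.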

Fig. 12 shows the measured IDAC when the binary input to the DAC is varied. The results are shown for two different levels of the reference current IREF. The results show a


[Figure: (a) current-tracking setup with ISIG, IDAC, IREF and a first-order Σ∆ modulator; (b) Σ∆ modulator schematic with integrator and comparator; (c) current DAC with binary inputs b1...bN and reference current IREF.]

Fig. 11. Circuit illustrating the application of the first-order SGD algorithm for current tracking and balancing.

[Figure: normalized current Iout/Ibias vs. 6-bit DAC input, for Ibias = 5 µA and Ibias = 1.3 nA.]

Fig. 12. Non-linear response of the current DAC when biased in weak-inversion.

non-monotonic response, and the monotonicity of the DAC deteriorates when the reference current IREF is reduced. This behavior is well documented in the literature for this DAC topology [31] and is attributed to mismatch between the MOS transistors, which becomes amplified at low bias currents. The calibration DACs therefore provide 10 bits of programmability of the system parameters but exhibit only 6 bits of linearity. However, these artifacts are ideal for demonstrating the benefits of the Σ∆ gradient-descent algorithm. The measured cost function L(IDAC), shown in Fig. 13(a), exhibits multiple local minima. Fig. 13(b) shows the running average of the Σ∆ modulator output when the parameter is adapted according to the SGD algorithm. The result shows that after the initial search, the running average of the modulator output, or equivalently the average of the system gradient, converges to zero. Thus, the proposed architecture converges to a solution where the cost function approaches the global minimum (IDAC ≈ ISIG).
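The way DAC non-monotonicity creates spurious minima in L can be seen with a toy model. Below, a hypothetical 4-bit binary-weighted DAC is given hand-picked bit-weight errors (standing in for MOS mismatch, not measured values); the resulting transfer curve is non-monotonic, and the quadratic loss against a fixed target acquires more than one minimum.

```python
import numpy as np

# Hypothetical 4-bit binary-weighted DAC with deliberate bit-weight mismatch.
# Nominal weights would be 1, 2, 4, 8; the mismatched MSB (5.2 instead of 8)
# makes the code-7 -> code-8 transition decrease the output.
weights = np.array([1.2, 1.5, 5.2, 5.2])

codes = np.arange(16)
bits = (codes[:, None] >> np.arange(4)) & 1      # bit i of each code
dac_out = bits @ weights                         # DAC transfer curve

target = 5.2                                     # hypothetical target current
loss = 0.5 * (target - dac_out) ** 2             # loss of equation (37) over all codes

non_monotonic = bool(np.any(np.diff(dac_out) < 0))
print("DAC transfer non-monotonic:", non_monotonic)
print("codes reaching zero loss  :", codes[loss == 0.0])
```

Two separate codes reach the target exactly, separated by a barrier of non-zero loss: a gradient follower that quantizes codes can stall on the wrong side of the barrier, which is exactly the situation the Σ∆ gradient descent is designed to handle.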

B. Center-frequency Calibration in band-pass Gm-C Filters

The second example applies the SGD algorithm to calibrating the center frequency of continuous-time band-pass Gm-C filters, which have been commonly used for designing silicon cochleas [33]–[35]. Many of these applications require that the filters consume a minimal amount of power and, in many instances, have center frequencies as low as 100 Hz. This requires biasing the transistors in weak-inversion, which makes the design more susceptible to analog artifacts like mismatch and distortion [11]. One possible approach to calibrate for these analog artifacts is to incorporate programmable circuit elements (e.g., DACs) which can be tuned to obtain the desired performance metric.

Fig. 14(b) shows the circuit-level schematic of a standard Gm-C biquad filter stage, which consists of three linearized transconductors whose transconductance parameters are given by gm1, gm2 and gm3. The schematic of the transconductor is shown in Fig. 14(c) and was reported and described in [32]. The circuit uses an active source-degeneration technique based on the cross-coupled pMOS transistors M5–M8. Note that the input and mirror transistors in the schematic are cascoded (even though this is not explicitly shown in the schematic). Assuming that the transconductors are linear, the output of the band-pass biquad filter stage is given by:

V_{bp}(s) = \frac{G\,\frac{\omega_0}{Q}\,s}{s^2 + \frac{\omega_0}{Q}\,s + \omega_0^2}\, V_{in1}(s) \qquad (39)

where the center frequency ω0, the filter gain G, and the quality factor Q of the filter are given by

\omega_0 = \frac{\sqrt{g_{m2}\,g_{m3}}}{C} \qquad (40)

Q = \frac{g_{m3}}{g_{m1}} \qquad (41)

G = \frac{g_{m1}}{g_{m2}}. \qquad (42)

The transconductance parameters gm1–gm3 are determined by the bias currents of the transconductors, as shown in Fig. 14(c), which are programmed using the on-chip 10-bit current DACs (schematic shown in Fig. 11(c)). The magnitude response of the filter is measured by rectifying the filter


[Figure: (a) loss function vs. 10-bit DAC input; (b) running average of the Σ∆ modulator output vs. number of iterations.]

Fig. 13. Figure showing (a) the loss function (1/2)I_in^2 when the input to the 10-bit current DAC is varied; and (b) the running average of the Σ∆ modulator output. The adaptation rate for this experiment was ωs = 250 kHz.

[Figure: (a) calibration setup with rectifier and first-order Σ∆ modulator; (b) Gm-C biquad filter with transconductors gm1, gm2, gm3 and capacitors C; (c) transconductor schematic with transistors M1–M8.]

Fig. 14. Circuit illustrating the application of the Σ∆ gradient descent algorithm for center-frequency calibration of a Gm-C bandpass filter.

output, followed by digitization using a first-order continuous-time Σ∆ modulator, as shown in Fig. 14(a). Note that the input stage of the continuous-time Σ∆ modulator automatically performs low-pass filtering of the rectified output, thus eliminating the need for the envelope detector commonly used in the design of silicon cochleas. Fig. 15 shows the magnitude response measured using the set-up in Fig. 14. In this experiment, the gain G and the quality factor Q are kept constant while the center frequency ω0 is incrementally programmed by adjusting the transconductances gm1–gm3. The DAC reference current is set to 1 nA, which ensures that the MOS transistors are biased in the deep sub-threshold regime. The measured response shows that the center frequencies and the filter gains vary non-monotonically, which is primarily attributed to the non-linearity of the programming DACs (see Fig. 12). We now show that, in spite of the non-monotonic response of the filters, the proposed Σ∆ gradient descent algorithm can be used for robust calibration of the filter center frequencies.

To model the calibration procedure as an equivalent minimization problem, we exploit the fact that the biquad filter in Fig. 14(b) exhibits a low-pass characteristic with respect to the

input Vin2 of the transconductor gm3. Using superposition, the output Vout can be written as

V_{out}(s) = H_{bp}(s)\,V_{in1}(s) + H_{lp}(s)\,V_{in2}(s) \qquad (43)

where

H_{lp}(s) = \frac{G}{s^2 + \frac{\omega_0}{Q}\,s + \omega_0^2} \qquad (44)

and Hbp is the transfer function given in equation (39).

Fig. 16(a) shows the frequency response of the band-pass Hbp and low-pass Hlp stages of the biquad filter. If a tone of frequency ω0 is applied to the inputs Vin1 and Vin2, the magnitude of the output Vout reaches its minimum when the center frequency of the biquad matches ω0. This property can now be exploited to formulate an optimization problem whose solution is a filter whose center frequency is calibrated to the desired frequency ω0. The band-pass filter, however, has a zero at DC, which contributes an additional 90° phase shift in the transfer function (43). Therefore, a 90° phase difference is synthetically introduced between the inputs Vin1 and Vin2. Fig. 16(b) shows the signals Vin1 and Vin2 and the output of the filter according to equation (43).
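The minimization property can be checked numerically. The sketch below evaluates |Hbp(jω) − jHlp(jω)|, where the −90° shift on Vin2 is modeled by the factor −j and the low-pass numerator is normalized to Gω0² so that its DC gain equals G; both choices are assumptions of this sketch, not values taken from the paper. The magnitude of the superposed output dips at the programmed center frequency:

```python
import numpy as np

f0 = 1000.0                       # programmed center frequency (Hz), hypothetical
w0 = 2 * np.pi * f0
Q = G = 1.0

f = np.geomspace(100.0, 10000.0, 401)
s = 1j * 2 * np.pi * f
den = s**2 + (w0 / Q) * s + w0**2

Hbp = (G * w0 / Q) * s / den      # band-pass stage, equation (39)
Hlp = (G * w0**2) / den           # low-pass stage, DC gain normalized to G

# Vin2 lags Vin1 by 90 degrees => Vout/Vin1 = Hbp - 1j*Hlp per equation (43)
Hout = Hbp - 1j * Hlp
f_min = f[np.argmin(np.abs(Hout))]
print(f"|Vout| is minimized at {f_min:.1f} Hz (target {f0:.0f} Hz)")
```

At ω = ω0 the band-pass term equals G while the phase-shifted low-pass term equals −G, so the two contributions cancel exactly for Q = 1, reproducing the null that the Σ∆ modulator searches for.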


[Figure: magnitude (dB) vs. frequency (Hz), 1-5 kHz.]

Fig. 15. Non-monotonic response of the biquad filter when the transconductance parameters gm1, gm2 and gm3 are varied such that G = 1 and Q = 1.

TABLE I
MEASURED SPECIFICATIONS OF THE GM-C BANDPASS FILTER

Technology: 0.5 µm CMOS process
Size (including Σ∆ modulator): 1000 µm × 300 µm
Programmable Frequency Range: 10 Hz - 10 kHz
Filter Linear Range (at Q = 1): 240 mV
Power (bandpass filter): 300 pW (100 Hz) - 33 nW (10 kHz)

The continuous-time Σ∆ modulator computes the magnitude of the error signal and adjusts the value of gm2 while keeping the gain G and the quality factor Q constant. Note that in this optimization only gm2 is a free variable, whereas the other variables gm1 and gm3 are determined by the constraints. Fig. 16(c) shows the output of the modulator when the Σ∆ gradient-descent adaptation is applied. Indeed, the output of the modulator reaches a minimum at the desired center frequency ω0, in spite of the DAC and transconductor non-linearities. Note that the value of the minimum is non-zero, implying that the magnitude of the error signal cannot be reduced to zero. This limitation arises due to the finite programming resolution of the current DACs. We have used this procedure to program the center frequencies of a biquad filter according to a logarithmic or Mel-scale, which is desirable for many auditory front-ends [33], [36]. Fig. 17 shows the measured response of the calibrated filter, where the center frequencies have been successfully programmed to the desired values. Table I summarizes the measured specifications of the Gm-C filter and the calibration DAC, where the nanowatt power dissipation is attributed to the sub-threshold biasing of the MOSFET transistors.

IV. CONCLUSIONS AND DISCUSSIONS

All online gradient-descent algorithms use some form of first-order recursive low-pass filtering of the system gradient to

estimate the optimal set of parameters. In this paper, we show that if a Σ∆ modulation step is inserted before the filtering, the quantization noise introduced by the use of finite-resolution ADCs and DACs is shaped out of the frequency bands containing the adaptation trajectories. We show that the resulting class of SGD algorithms not only exhibits superior convergence properties compared to conventional QGD algorithms but also exhibits superior parameter search properties compared to floating-point gradient descent algorithms. This enables the SGD algorithm to avoid spurious local minima and search for more optimal values of the system parameters. This property of the SGD algorithm is beneficial, in particular, for the calibration of digitally assisted analog circuits that use DACs for digital storage of analog parameters. Due to the non-linearity and non-monotonicity of the DAC, the system error (quality) metric being optimized exhibits numerous spurious local minima. The hardware portion of this paper demonstrated the robustness of SGD-based calibration towards DAC non-linearities for two specific applications: (a) current balancing and offset cancellation; and (b) center-frequency calibration of biquad filters biased in weak-inversion. Even though we have used a first-order Σ∆ modulator in the proposed SGD algorithm, the framework can be extended to higher-order noise-shaping as well. However, in this case the first-order gradient updates given by equation (3) will be replaced by a higher-order recursive low-pass filter. Investigation into the effects of higher-order noise-shaping on the trade-off between the convergence and search properties of the online gradient-descent algorithm will be a topic of future research.

A limitation of the SGD approach is that the system gradients need to be measured directly, which is not always possible. One possible solution, which could be the basis of future research in this area, is that the system gradient could be approximated using perturbation methods such as simultaneous perturbation stochastic approximation (SPSA) [37] or finite-difference stochastic approximation (FDSA) [38], which only require direct measurement of the system error (quality) metric instead of its gradient. Another related area of research could be extending the formulation to higher-dimensional spaces, where the proposed SGD framework could lead to noise-shaping not only in the frequency domain but also in the spatial domain [20], [28].

REFERENCES

[1] M. Pastre and M. Kayal, Methodology for the Digital Calibration of Analog Circuits and Systems: with Case Studies, Springer, 1st ed., 2006.

[2] B. Murmann, “Digitally assisted analog circuits,” IEEE Micro, vol. 26, pp. 38–47, Mar. 2006.

[3] G. Cauwenberghs and M. Bayoumi, Learning on Silicon, Boston, MA: Kluwer Academic Publishers, 1999.

[4] B. Murmann and B. Boser, “A 12-bit 75-MS/s pipelined ADC using open-loop residue amplification,” IEEE J. Solid-State Circuits, vol. 38, no. 12, pp. 2040–2050, Dec. 2003.

[5] J. A. Lenero-Bardallo, T. Serrano-Gotarredona, and B. Linares-Barranco, “A Calibration Technique for Very Low Current and Compact Tunable Neuromorphic Cells: Application to 5-bit 20-nA DACs,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 6, pp. 522–526, Jun. 2008.

[6] R. J. Kier et al., “Design and implementation of multipattern generators in analog VLSI,” IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 1025–1038, Jul. 2006.


[Figure: (a) normalized magnitude vs. frequency for the band-pass and low-pass stages, with ωc marked; (b) oscilloscope trace; (c) Σ∆ output vs. time, with ω0 marked.]

Fig. 16. Frequency calibration of the biquad bandpass filter: (a) figure showing the band-pass and the low-pass filter responses along with the error signal; (b) oscilloscope figure showing the input signals Vin1 and Vin2 along with the error output from the biquad filter; and (c) output of the Σ∆ modulator, Vout, measured from the fabricated prototype.

[Figure: magnitude (dB) vs. frequency (Hz), ~100 Hz - 10 kHz.]

Fig. 17. Measured response of the biquad filter calibrated using the Σ∆ gradient descent algorithm to center frequencies chosen according to the Mel-scale [36].

[7] B. Linares-Barranco, T. Serrano-Gotarredona, and R. Serrano-Gotarredona, “Compact low-power calibration mini-DACs for neural massive arrays with programmable weights,” IEEE Trans. Neural Netw., vol. 14, no. 5, pp. 1207–1216, Sep. 2003.

[8] H. Juan, A. Murray, and W. Dongqing, “Adaptive Visual and Auditory Map Alignment in Barn Owl Superior Colliculus and Its Neuromorphic Implementation,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1486–1497, 2012.

[9] T. Delbruck, R. Berner, P. Lichtsteiner, and C. Dualibe, “32-bit Configurable bias current generator with sub-off-current capability,” Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS), 2010.

[10] D. W. Graham, E. Farquhar, B. Degnan, C. Gordon, and P. Hasler, “Indirect Programming of Floating-Gate Transistors,” IEEE Transactions on Circuits and Systems I, vol. 54, no. 5, pp. 951–93, 2007.

[11] P. R. Kinget, “Device mismatch and tradeoffs in the design of analog circuits,” IEEE J. Solid-State Circuits, vol. 40, no. 6, pp. 1212–1224, Jun. 2005.

[12] G. Cauwenberghs and G. C. Temes, “Adaptive Digital Correction of Analog Errors in MASH ADCs – Part I. Off-Line and Blind On-Line Calibration,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 7, pp. 621–628, Jul. 2000.

[13] G. Carvajal, M. Figueroa, D. Sbarbaro, and W. Valenzuela, “Analysis and Compensation of the Effects of Analog VLSI Arithmetic on the LMS Algorithm,” IEEE Transactions on Neural Networks, vol. 38, no. 7, pp. 1046–1060, 2011.

[14] M. Cohen and G. Cauwenberghs, “Floating-Gate Adaptation for Focal-Plane On-Line Nonuniformity Correction,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 48, no. 1, pp. 83–89, Jan. 2001.

[15] J. C. Candy and G. C. Temes, “Oversampled methods for A/D and D/A conversion,” in Oversampled Delta-Sigma Data Converters, Piscataway, NJ: IEEE Press, pp. 1–29, 1992.

[16] D. Bertsekas, Non-linear Programming, MA: Athena Scientific, 1995.

[17] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, 1998.

[18] S. S. Haykin, Least-Mean-Square Adaptive Filters, B. Widrow, Eds., Wiley, 2003.

[19] R. M. Gray, “Spectral analysis of quantization noise,” IEEE Trans. Inf. Theory, vol. 36, pp. 1220–1244, Nov. 1990.

[20] L. Bottou, “Stochastic Learning,” Advanced Lectures in Machine Learning, Springer, 2003, pp. 146–168.

[21] H. Robbins and S. Monro, “A stochastic approximation method,” Ann. Math. Statist., vol. 22, no. 3, pp. 400–407, 1951.

[22] H. Kushner, Introduction to Stochastic Control, New York: Holt, Rinehart and Winston, 1971.

[23] J. R. Blum, “Approximation methods which converge with probability one,” Ann. Math. Statist., vol. 25, pp. 382–386, 1954.

[24] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.

[25] S. Gelfand and S. K. Mitter, “Recursive Stochastic Algorithms for Global Optimization,” J. Control and Optim., vol. 29, pp. 999–1018, Sep. 1991.

[26] H. Fang, G. Gong, and M. Qian, “Annealing of Iterative Stochastic Schemes,” J. Control and Optim., vol. 35, pp. 1886–1907, 1997.

[27] S. Gelfand and S. K. Mitter, “Simulated annealing with noisy or imprecise energy measurements,” J. Optim. Theor. Appl., vol. 62, pp. 49–62, 1989.

[28] A. Gore and S. Chakrabartty, “Min-Max Optimization Framework for Designing Σ∆ Learners: Theory and Hardware,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 604–617, 2010.

[29] A. Fazel, A. Gore, and S. Chakrabartty, “Resolution Enhancement in Sigma-Delta Learners for Super-Resolution Source Separation,” IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1193–1204, 2010.

[30] K. Bult and J. G. M. Geelen, “An inherently linear and compact MOST-only current division technique,” IEEE J. Solid-State Circuits, vol. 27, pp. 1730–1735, Dec. 1992.

[31] B. Linares-Barranco and T. Serrano-Gotarredona, “On the design and characterization of femtoampere current-mode circuits,” IEEE Journal of Solid-State Circuits, vol. 38, no. 8, pp. 1353–1363, 2003.

[32] A. Gore, S. Chakrabartty, S. Pal, and E. C. Alocilja, “A Multichannel Femtoampere-Sensitivity Potentiostat Array for Biosensing Applications,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 53, no. 11, Nov. 2006.

[33] L. Watts et al., “Improved implementation of the silicon cochlea,” IEEE J. Solid-State Circuits, vol. 27, no. 5, pp. 692–700, May 1992.

[34] V. Chan, S. C. Liu, and A. van Schaik, “AER EAR: A matched silicon cochlea pair with address event representation interface,” IEEE Trans. Circuits Syst. I: Special Issue on Smart Sensors, vol. 54, no. 1, pp. 48–59, 2007.

[35] R. Sarpeshkar et al., “An Ultra-Low-Power Programmable Analog Bionic Ear Processor,” IEEE Trans. Biomed. Eng., vol. 52, no. 4, pp. 711–727, Apr. 2005.

[36] A. Fazel and S. Chakrabartty, “Statistical Pattern Recognition Techniques for Speaker Verification,” IEEE Circuits Syst. Mag., vol. 11, no. 2, pp. 62–81, 2011.

[37] J. C. Spall, “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation,” IEEE Trans. Automat. Control, vol. 37, pp. 332–341, 1992.

[38] J. Kiefer and J. Wolfowitz, “Stochastic estimation of the maximum of a regression function,” Ann. Math. Statist., vol. 23, pp. 462–466, 1952.

Shantanu Chakrabartty (S’99-M’04-SM’09) received his B.Tech degree from the Indian Institute of Technology, Delhi, in 1996, and M.S. and Ph.D. degrees in Electrical Engineering from Johns Hopkins University, Baltimore, MD, in 2002 and 2005, respectively.

He is currently an associate professor in the Department of Electrical and Computer Engineering at Michigan State University (MSU). From 1996 to 1999 he was with Qualcomm Incorporated, San Diego, and during 2002 he was a visiting researcher at The University of Tokyo.

Dr. Chakrabartty’s work covers different aspects of analog computing, in particular non-volatile circuits, and his current research interests include energy harvesting sensors and neuromorphic and hybrid circuits and systems. Dr. Chakrabartty was a Catalyst Foundation fellow from 1999 to 2004 and is a recipient of the National Science Foundation’s CAREER award, the University Teacher-Scholar Award from MSU, and the 2012 Technology of the Year Award from MSU Technologies. He is a senior member of the IEEE and is currently serving as an associate editor for the IEEE Transactions on Biomedical Circuits and Systems, an associate editor for the Advances in Artificial Neural Systems journal, and a review editor for the Frontiers in Neuromorphic Engineering journal.

Ravi K. Shaga received his B.Tech degree from the Indian Institute of Technology, Kharagpur, in 2007 and an M.S. degree in Electrical and Computer Engineering from Michigan State University, East Lansing, MI, in 2011.

He is currently with Intel Corp., Hillsboro, Oregon. His research interests include analog integrated circuits and high-speed digital circuits.

Kenji Aono (S’09) received both a B.S. in electrical engineering and a B.S. in computer engineering, with honors, from Michigan State University, East Lansing, MI, U.S.A., in 2011.

He was employed as a Technical Aide in the Adaptive Integrated Microsystems Laboratory after finishing his undergraduate work in early 2011. He is currently a Research Assistant at the same facility while enrolled at Michigan State University to complete work towards a master’s degree in electrical engineering.

Mr. Aono is an active officer for the Michigan Alpha chapter of Tau Beta Pi and the Gamma Zeta chapter of IEEE-Eta Kappa Nu. He is a recipient of a graduate fellowship from the Michigan Space Grant Consortium for the 2012-2013 year.