compressed level crossing sampling for ultra-low power iot



Compressed Level Crossing Sampling for Ultra-Low Power IoT Devices

Jun Zhou, Amir Tofighi Zavareh, Robin Gupta, Liang Liu, Zhongfeng Wang, Fellow, IEEE, Brian M. Sadler, Fellow, IEEE, Jose Silva-Martinez, Fellow, IEEE, and Sebastian Hoyos, Senior Member, IEEE

Abstract: Level crossing sampling (LCS) is a power-efficient analog-to-digital conversion scheme for spike-like signals that arise in many Internet of Things (IoT) enabled automotive and environmental monitoring applications. However, the LCS scheme requires a dedicated time-to-digital converter (TDC) with large dynamic range specifications. In this paper, we present a compressed LCS that exploits the signal sparsity in the time domain. At the compressed sampling stage, a continuous-time ternary encoding scheme converts the amplitude variations into a ternary timing signal that is captured in a digital random sampler. At the reconstruction stage, a low-complexity split-projection least squares (SPLS) signal reconstruction algorithm is presented. The SPLS splits random projections and utilizes a standard least squares approach that exploits the ternary-valued amplitude distribution. The SPLS algorithm is hardware friendly, can be run in parallel, and incorporates a low-cost k-term approximation scheme for matrix inversion. The SPLS hardware is analyzed, designed and implemented in FPGA, achieving the highest data throughput and power efficiency compared with prior art. Simulations of the proposed sampler in an automotive collision warning system demonstrate that the proposed compressed LCS can be very power efficient and robust to wireless interference, while achieving an approximately 8-fold data volume compression when compared with Nyquist sampling approaches.

Index Terms: level crossing sampling, compressive sensing, Internet of Things, least squares, sparse signal reconstruction.

I. INTRODUCTION

With the development of low-cost sensors and low-power wireless communication systems, hundreds of thousands of electronic devices are connected together for real-time and intelligent processing. This has promoted rapid growth of many Internet of Things (IoT) enabled automotive and environment monitoring applications, such as lane-keeping driver assistance, collision warning, automated parking assistance, smart home, connected health, etc. Although equipped with hundreds of sensor elements, many potential IoT applications have a large number of sensors that are event driven with spike-like signals, and a large portion of them remain idle most of the time [1], [2]. Conventional Nyquist analog-to-digital converters (ADCs) carry out uniform sampling, which results in a very high accumulated data rate and correspondingly very high power at the ADC interface and in digital post-processing.

Jun Zhou is with Broadband Communication, Broadcom Limited, CA 92619, U.S.A. (e-mail: [email protected]). Amir Zavareh, Robin Gupta, Jose Silva-Martinez, and Sebastian Hoyos are with Department of Electrical & Computer Engineering, Texas A&M University, TX 77843, U.S.A. (e-mail: {amirtofighi, robin7}@tamu.edu; {jsilva, hoyos}@ece.tamu.edu). Liang Liu is with Electronical and Information Technology Department, Lund University, Sweden (e-mail: [email protected]). Zhongfeng Wang is with School of Electronic Science and Engineering, Nanjing University, China (e-mail: [email protected]). Brian M. Sadler is with Army Research Laboratory, Adelphi, MD, U.S.A. (e-mail: [email protected]).

Fig. 1 (a) illustrates a band-limited analog signal and its discrete and quantized ADC output. This classic sampling approach is agnostic to the characteristics of input signals so long as the bandwidth requirement is satisfied. This motivates the study of a non-uniform sampler that exploits the signal sparsity in time for better power efficiency [3].

LCS has high efficiency in capturing signals that are sparse in the time domain [4]–[6]. LCS schemes usually consist of a level-crossing detector and a TDC. As shown in Fig. 1 (b), a level-crossing detector records the time instances when the signal crosses the predefined amplitude levels [4]. A dedicated TDC measures the delay between two time stamps. Typically operating asynchronously without a clock, a level-crossing detector is able to adaptively respond to the signal variations. The power consumption of the LCS scheme scales linearly with time occupancy, which offers significant power savings for time-sparse signals [7]–[10].

A drawback of LCS is that a large dynamic range TDC is required to handle quiescent periods when a long time interval between two successive level crossing events occurs, and time intervals may vary over orders of magnitude depending on the signal characteristics, e.g., from 1 ns to 1 ms [11]. A high-speed coarse counter with a dynamic range (DR) of $10^6$ requires 20-bit resolution, which may have false counts in an asynchronous design or require a very large decoding module for a one-to-one mapping in a synchronous design. Although a coarse counter allows a large DR, the hardware is sensitive to long-term clock jitter [11]. A ramp interpolator [12] yields sub-cycle resolution with a moderate clock rate via a stretching technique. However, this architecture is restricted to short time intervals only and has a dead time limit. A hybrid method combining a coarse counter and a ramp interpolator [11] can measure both short and long time intervals, but the inevitable finite switching time between different schemes as well as the associated hardware cost both motivate a more power efficient TDC design.

Recent developments in compressed LCS have produced a low-power compressed sampling scheme optimized for pulse-like signals [13]. The works of [14]–[17] studied the compressed LCS scheme in ultrasound imaging and radar with echoes in sparsely scattering environments driven by impulsive transmit signals. Leveraging a novel continuous-time ternary encoding (CT-TE) block [13], return echoes are converted to a ternary sparse timing signal. CT-TE encodes the relative level changes between two successive time stamps via ternary symbols, e.g., {+1, 0, −1}. As echoes are transient, the ternary timing signal generated by the CT-TE is also sparse. This enables sub-Nyquist rate sampling via compressive sensing (CS) [18] with a moderate DR. The works [19], [53], [54] explored the CT-TE in low-power biomedical applications.

Digital Object Identifier: 10.1109/TCSI.2017.2707481

1558-0806 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

In this paper, we present the hardware design of the compressed LCS. In compressed sampling, we study the circuit architecture and key modules in the CT-TE block. The relationship between the conversion resolution and signal slew rate is investigated. In signal reconstruction, a low-complexity noise-robust algorithm called SPLS is studied. SPLS splits random projections into a few independent sections and adopts an LS approach for recovery of ternary-valued timing signals. We present a fixed-point hardware design for the SPLS. A k-term approximation to matrix inversion is proposed to reduce hardware complexity, while still maintaining a bounded probability of signal recovery. The non-adaptive SPLS algorithm enables a parallel hardware architecture that is flexible in trading off circuit complexity and system throughput. The SPLS outperforms conventional CS signal reconstruction algorithms, such as orthogonal matching pursuit (OMP) [20] and iteratively reweighted least squares (IRLS) [21]. The SPLS hardware has been designed in RTL and tested in FPGA. Simulations of the proposed scheme in an automotive collision warning system confirm that the compressed LCS and the SPLS reconstruction algorithm can be robust to wireless interference, achieving about 8-fold data compression in a wireless or wireline transceiver when compared to Nyquist sampling approaches.

This paper is organized as follows. Section II gives a brief review of the CS technique. In Section III, the CT-TE hardware design is presented and compared to prior art. In Section IV, the SPLS hardware design is presented and a low-complexity k-term approximation to matrix inversion is discussed. Design results and comparisons with the state of the art are also included. Section V shows analysis and numerical results of the proposed CS system in an automotive collision warning application with in-band wireless interference. Section VI concludes the paper.

II. COMPRESSIVE SENSING BACKGROUND

The CS technique enables sub-Nyquist rate sampling of sparse signals at the analog-to-digital interface [18]. A signal $x \in \mathbb{R}^N$ is called $K$-sparse in a representation basis if $K \ll N$ of the coefficients are significant and the rest are zero. We define sparsity as the ratio $K/N$. Let $\Phi$ denote an $M \times N$ random sensing matrix, where $M \ll N$. The compressed measurements $y \in \mathbb{R}^M$ are generated by $y = \Phi x + w$, where $w$ is zero-mean Gaussian noise with distribution denoted by $\mathcal{N}(0, \sigma^2)$.

The columns of $\Phi$, denoted $\{\varphi_i\}_{i=1}^{N}$, are often referred to as atoms. The incoherent sampling principle requires the maximal absolute value of the correlation of any two different columns to be upper-bounded by a small value $\varepsilon$ [22], [23]:

$$\max_{i,j \in [1,N],\, i \neq j} \left| \langle \varphi_i, \varphi_j \rangle \right| < \varepsilon. \tag{1}$$
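As a rough illustration of the incoherence condition in Eq. (1), the sketch below measures the largest pairwise column correlation of a random ±1 sensing matrix. The helper names, the matrix size (64 × 16), the seed, and the normalization by column norms are assumptions made for this example only; Eq. (1) itself is stated on the raw inner product.

```python
import math
import random


def mutual_coherence(phi):
    """Largest absolute normalized correlation between two distinct columns."""
    rows, cols = len(phi), len(phi[0])
    columns = [[phi[r][c] for r in range(rows)] for c in range(cols)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    worst = 0.0
    for i in range(cols):
        for j in range(i + 1, cols):
            inner = sum(a * b for a, b in zip(columns[i], columns[j]))
            worst = max(worst, abs(inner) / (norms[i] * norms[j]))
    return worst


def random_sign_matrix(m, n, seed=0):
    """M x N matrix with entries drawn from {-1, +1} with equal probability."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]


# Hypothetical sizes, chosen only for illustration.
phi = random_sign_matrix(64, 16)
mu = mutual_coherence(phi)
```

A smaller value of `mu` corresponds to a smaller ε in Eq. (1); orthogonal columns give exactly zero.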

State-of-the-art mixed-signal CS front-end designs include a random demodulator (RD) [24]–[26], parallel segmented CS (PSCS) [27], and modulated wideband converter (MWC) [28]. For a detailed analysis on hardware limitations of these designs, we refer the reader to our previous works [13], [14].

Fig. 1. Illustration of an input band-limited analog signal and output discrete samples from different sampling devices: (a) uniform sampling ADC; (b) level crossing sampling scheme.

Given incoherent sampling, the signal recovery problem is

$$\min \|x\|_1 \quad \text{subject to} \quad y = \Phi x. \tag{2}$$

Many algorithms are available for solving (2). They may be

roughly categorized into two groups. The first group follows

basis pursuit using linear programming [29]. The second

uses a greedy method, such as orthogonal matching pursuit

(OMP) [20] and stage-wise OMP (StOMP) [30]. For a detailed

analysis on the limitations of these algorithms, please see [13],

[14].

III. CT-TE HARDWARE IMPLEMENTATION

The CT-TE scheme introduced in [13] delivers an efficient

asynchronous sampling for pulse-like signals that are sparse

in time. The CT-TE only fires upon level crossing. A Q-bit

DAC divides the dynamic range and provides a threshold pair

(Vth,L , Vth,H ) to two comparators that form an input tracking

window with one least significant bit (LSB) span.
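The tracking-window behavior described above can be sketched with a simple discrete-time behavioral model. This is only an illustration under stated assumptions: the real CT-TE is asynchronous and continuous-time, whereas the sketch steps through a uniformly sampled input, works in integer millivolts, and moves the window by exactly one LSB per event. The function name `ct_te_encode` is hypothetical.

```python
def ct_te_encode(samples_mv, lsb_mv, start_level_mv=0):
    """Emit +1 / -1 when the input leaves a one-LSB tracking window, else 0.

    The window is [low, low + lsb); each event shifts it by one LSB,
    mimicking the threshold pair (Vth_L, Vth_H) following the input.
    """
    low = start_level_mv
    symbols = []
    for v in samples_mv:
        if v >= low + lsb_mv:      # crossed the upper threshold
            symbols.append(1)
            low += lsb_mv
        elif v < low:              # crossed the lower threshold
            symbols.append(-1)
            low -= lsb_mv
        else:                      # input stays inside the window: no event
            symbols.append(0)
    return symbols


# A short pulse (values in mV) with a 10 mV LSB.
pulse = [0, 12, 25, 25, 12, 0]
ternary = ct_te_encode(pulse, lsb_mv=10)
```

For a quiescent input the output stays at zero, which is what makes the ternary timing signal sparse for pulse-like signals.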

Fig. 2 shows a hardware block diagram of the CT-TE

scheme, which consists of a pair of differential amplifiers,

comparators, a fixed width pulse generator, and a charge

pump. The differential amplifier takes a difference of the input

signal and threshold generated by the charge pump circuit. Its

intrinsic high gain characteristics increase sensitivity of the

comparator. The comparator consists of four cascaded CMOS

inverters. The comparison level is set in the first stage which

can be controlled by changing the voltage transfer curve via

transistor sizing. The threshold of an inverter based comparator

is given by Eq. (3),

$$V_{th} = \frac{V_{dd} - |V_{tp}| + V_{tn}\sqrt{K_n/K_p}}{1 + \sqrt{K_n/K_p}}, \tag{3}$$

where $V_{th}$ is the comparator threshold, $V_{dd}$ is the power supply rail voltage, $V_{tn}$ and $V_{tp}$ are the NMOS and PMOS device threshold voltages, respectively, and $K_n$ and $K_p$ are the products of device capacitance, mobility and dimensions for NMOS and PMOS, respectively.
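To make the dependence in Eq. (3) concrete, the following sketch evaluates the inverter switching threshold for given device parameters. The numeric values are hypothetical and are not taken from the design in this paper.

```python
import math


def inverter_threshold(vdd, vtn, vtp, kn_over_kp):
    """Switching threshold of a CMOS inverter per Eq. (3).

    vtp may be passed as a negative number; only its magnitude is used.
    """
    r = math.sqrt(kn_over_kp)
    return (vdd - abs(vtp) + vtn * r) / (1.0 + r)


# Hypothetical devices: matched thresholds and equal strength give Vdd / 2.
vth_balanced = inverter_threshold(vdd=1.0, vtn=0.3, vtp=-0.3, kn_over_kp=1.0)
# A stronger NMOS (larger Kn/Kp) pulls the threshold down.
vth_strong_n = inverter_threshold(vdd=1.0, vtn=0.3, vtp=-0.3, kn_over_kp=4.0)
```

This is the mechanism the text refers to when it says the comparison level is set by transistor sizing in the first stage.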


Three inverter stages are used for increased gain and logic level inversion so that the circuit behaves as an internally set comparator circuit. The key in the second stage design is that it must be the same as the first stage to maintain the same DC threshold. This helps keep the linearity balanced for the rising and falling intervals of high frequency input signals. The inverter based comparator is a simple design with very high speed (mainly controlled by the gate delay) compared to counterparts such as differential amplifier based high gain comparators. The inverter based comparator has two major disadvantages. First, the threshold Vth is sensitive to temperature variation. In Eq. (3), the mobility and transistor threshold voltage are both functions of temperature. Therefore, the comparator threshold is also a function of temperature. It needs to be calibrated at power up so that the impact of variations can be compensated. Second, the comparator is sensitive to power supply noise due to its single-ended design. A bypass capacitor is often employed to reduce high frequency current. The gate delay is controlled by device size. Since we do not want to put a large load on the differential amplifier stage, a minimum width is a good choice for the first inverter stage. Assuming a minimum width device is selected for the first inverter stage, the gate delay of each subsequent stage becomes independent of device width and is a function of device length and power supply, as given by Eq. (4). The total gate delay of the cascaded inverters in this design is approximately 45 ps.

$$\text{Propagation Delay} \propto \frac{L^2}{V_{dd}}. \tag{4}$$

A fixed width pulse is generated and fed to the charge pump to pump current into or out of the capacitor each time the comparator trips, which increases or decreases the threshold, respectively. The fixed width pulse generator has a set/reset flip-flop with a fixed delay in the feedback path as shown in Fig. 2. The delay element “D” in Fig. 2 is built from an array of series inverters. The inverters and charge pump can be calibrated with a combination of a delay locked loop (DLL) and a phase locked loop (PLL). First, the charge pump P/N mismatch can be calibrated by adjusting the pull-up/pull-down currents with a fixed width pulse generator and a PLL. This calibration does not need to run all the time; it can be done at power on or in the background, so the PLL power consumption is not counted. Second, the inverters (delay element) can be calibrated by adjusting the control voltage of the voltage controlled delay lines in a DLL. A replica based technique that provides ps-accuracy delay lines is illustrated in [31]. The idea is to tune each delay cell by a control voltage across it. Once the voltage is found and held, the inverters maintain the same delay. Similarly, the DLL can run either at start-up or in the background to obtain the calibration parameters. Therefore, the DLL power is not included in the total dissipated power.

The charge pump circuit is shown in Fig. 3. Switches MP3 and MN3 control the current flow to the charge-pump output. To mitigate switching errors propagating to the output node, these switches are on the source side of the current source devices MP1 and MN1. Dummy devices MP2, MP4, MN2, MN4 reduce both charge injection and clock-feedthrough noise, while MP5 and MN5 ensure a fast turn off of MP3 and MN3.

Fig. 2. Block diagram of CT-TE hardware implementation.

Fig. 3. Circuit schematic of charge pump design.

Fig. 4. The relationship between the maximum input slew rate versus the step size of charge pump output in the proposed CT-TE hardware design.

Fig. 5. Power consumption of the CT-TE hardware with different DAC resolution.

Fig. 6. Block diagram of digital random sampler (DRS) in the compressed LCS sampling scheme.

The charge pump switching frequency is a function of the charge pump current, the charging/discharging capacitor size, and the switch on/off pulse widths. A higher charge pump current results in a higher switching frequency, but the power consumption of the charge pump also increases. A small charging capacitor requires a short time to charge or discharge, but the integrated noise (kT/C) on the reference signal is larger. Given these constraints, we selected a capacitor size of 50 fF; a pulse width of 50 ps translates to an average current of 100 µA for a 10 mV change. Note that the front end differential amplifier is a macro model in our schematic simulation, used only to distinguish the input and the threshold voltages. The differential amplifier is the only macro model used in our simulation. The total delay in the signal path in simulation is 130 ps, including 45 ps in the inverters at the input, 35 ps of CLK-Q delay and 50 ps in the series inverters (delay element). This translates to an asynchronous data conversion rate of up to 7.7 GHz, which can be further increased by increasing the switching charge pump current at the cost of higher power consumption.
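The delay budget quoted above can be checked with a few lines of arithmetic. The sketch simply sums the three stated delay contributions and inverts the total; the dictionary keys are labels chosen for this example.

```python
# Signal-path delays quoted in the text, in picoseconds.
DELAYS_PS = {
    "input inverters": 45,
    "CLK-Q": 35,
    "delay element (series inverters)": 50,
}

total_ps = sum(DELAYS_PS.values())

# One conversion per loop delay: rate in GHz = 1000 / delay in ps.
max_rate_ghz = 1e3 / total_ps
```

The result, roughly 7.69 GHz, is consistent with the "up to 7.7 GHz" figure stated in the text.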

The step size of the charge pump output is controlled by the pull-up/pull-down current sources and the pulse width. A small step size provides high resolution to input variation, which minimizes the granular noise. However, a small step size in the charge pump also restricts the maximum input slew rate (SR) for a given maximum CT-TE conversion rate $f_s$. Defining $U$ as the input swing and assuming $Q$-bit quantization, we have

$$SR = \frac{U}{2^Q} \cdot f_s = \mathrm{LSB} \cdot f_s. \tag{5}$$

In this work, the CT-TE conversion rate is 4 GHz. Fig. 4 illustrates the step size of the charge pump output in our CT-TE design versus the maximum input slew rate. In Fig. 4, we test different step sizes of the charge pump output (equivalent to the LSB of the CT-TE), and sweep the input slew rate until the CT-TE loopback fails to track the input variation. We can see that the maximum slew rate allowed in the CT-TE design is linearly proportional to the step size of the charge pump output, which matches Eq. (5) very well.
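A small numeric sketch of Eq. (5) follows. Only the 4 GHz conversion rate comes from the text; the input swing of 1.024 V and the 10-bit setting are hypothetical values picked so that the LSB is 1 mV.

```python
def max_slew_rate(input_swing_v, q_bits, fs_hz):
    """Maximum trackable input slew rate in V/s, per Eq. (5): SR = LSB * fs."""
    lsb = input_swing_v / 2 ** q_bits
    return lsb * fs_hz


# Assumed swing U = 1.024 V and Q = 10 bits give LSB = 1 mV.
sr_10bit = max_slew_rate(1.024, 10, 4e9)   # about 4e6 V/s, i.e. 4 V/us
# One bit less doubles the LSB and therefore doubles the trackable slew rate.
sr_9bit = max_slew_rate(1.024, 9, 4e9)
```

The linear scaling with step size is the relationship that Fig. 4 is reported to confirm.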

Fig. 7. Block diagram of the SPLS signal reconstruction algorithm.

Table I compares the power consumption of this work with recent low-power LCS and successive approximation (SAR) ADC designs. We list the most important parameters in recent IoT related sensor front end designs, such as power consumption and data conversion rate. We define a figure of merit (FOM) as

$$\mathrm{FOM} = \frac{\text{Power}}{2^{\mathrm{ENOB}} \times 2 \times \mathrm{BW}}, \tag{6}$$

where ENOB stands for effective number of bits. The FOM indicates the amount of power dissipated for every single conversion to produce an output. A ramp with a slew rate of 1 V/µs and a DAC resolution of 10 bits was chosen, which leads to a conversion rate of 4 GHz. The ramp was chosen to exhibit the worst-case power performance of the circuit, because it excites the CT-TE at its maximum rate. Simulation results suggest our design can potentially achieve very low power consumption. The power scales with the DAC resolution. Fig. 5 shows that the higher the number of bits, the more transitions occur and consequently the more power is used. Table I also shows that the best known low-power SAR ADC reported an equivalent 10-bit 10 MSps performance, which is comparable to our scheme in terms of speed, resolution, and power consumption. We would like to point out that the proposed compressed LCS scheme has a sub-Nyquist rate in analog to digital conversion. This unique feature allows reduced data volume in the wireless or wireline transceiver. Section V illustrates an example in an automotive collision warning system. The reduced data volume means additional power savings can be expected when sending compressed data, at the cost of an extra signal reconstruction stage, which is implemented in a power-insensitive host.
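For readers who want to apply the figure of merit in Eq. (6) to their own numbers, a minimal helper is sketched below. The example operating point (1 mW, 8 effective bits, 1 MHz bandwidth) is entirely hypothetical and does not correspond to any entry of Table I.

```python
def figure_of_merit(power_w, enob, bandwidth_hz):
    """Energy per conversion step in joules, per Eq. (6)."""
    return power_w / (2 ** enob * 2 * bandwidth_hz)


# Hypothetical converter: 1 mW, ENOB = 8, BW = 1 MHz.
fom_j = figure_of_merit(1e-3, 8, 1e6)
fom_pj = fom_j * 1e12   # roughly 1.95 pJ per conversion step
```

A lower value means less energy spent per conversion, which is how the designs in the comparison are ranked.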

TABLE I
COMPARISON OF LOW-POWER LEVEL CROSSING SAMPLER (LCS) AND SUCCESSIVE APPROXIMATION (SAR) DESIGNS

A pulsed signal pair (Vup, Vdown) from the CT-TE hardware results in a sparse analog ternary output, enabling a time-delay measurement using the CS technique without explicitly counting the clock periods. The ternary state is input to a low complexity digital random sampler (DRS) (Fig. 6) with only one MUX and one accumulator for sub-Nyquist sampling. The DRS exhibits many advantages compared with RD, PSCS and MWC for time sparse signals, e.g., low complexity, ease of massive integration, excellent linearity, and digital technology scaling. In addition, the analog integrator that performs the inner product calculation inevitably suffers from saturation due to the central limit theorem [45], [46], which significantly degrades the signal reconstruction from compressed samples. Though the stability issue can be resolved by moving the pole to a higher frequency with a resistor divider in the feedback loop, this complicates signal reconstruction. However, the saturation issue does not arise in the proposed compressed LCS scheme, because the CT-TE converts the signal's amplitude variations into ternary symbols. The compressed samples are generated in the digital domain by the digital random sampler shown in Fig. 6, which is designed to be saturation-proof at accumulation with word length extension per stage. Therefore, the proposed scheme is different from the conventional compressive sensing front-end hardware in terms of saturation nonlinearity. As for thermal noise, “Vin” in Fig. 2 comes from a standard RF receiver. The RF design is out of the scope of this paper. We assume the out-of-band noise is suppressed by the RF chain, and in-band noise can be handled by the embedded Schmitt trigger design in threshold generation. There are more details in [13].
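The digital random sampler described above can be illustrated with a small software model. This is a sketch under assumptions rather than the circuit of Fig. 6: it assumes the random sequence takes values ±1 (as in Theorem 1), that the MUX selects between the ternary symbol and its negation, and that one accumulator output is produced per measurement. The function name and the seed are hypothetical.

```python
import random


def digital_random_sampler(ternary, m, seed=0):
    """Accumulate a ternary stream against M random +/-1 sequences.

    Returns the sensing matrix phi (M x N) and the M compressed samples y.
    """
    rng = random.Random(seed)
    n = len(ternary)
    phi = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    y = []
    for row in phi:
        acc = 0
        for sign, symbol in zip(row, ternary):
            if symbol == 0:
                continue                                  # idle input: nothing to add
            acc += symbol if sign > 0 else -symbol        # MUX picks +x or -x
        y.append(acc)                                     # integer accumulator output
    return phi, y


# A single +1 event at index 3 copies column 3 of phi into y.
phi_demo, y_demo = digital_random_sampler([0, 0, 0, 1, 0, 0], m=4)
```

Because the accumulator only adds or subtracts ones, its worst-case magnitude is bounded by the number of nonzero symbols, which is why a modest word-length extension is enough to avoid saturation.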

State-of-the-art TDC designs [47], [48] have difficulty supporting the 60 dB (from 1 ms to 1 ns) dynamic range of time delay between two successive pulses in many pulsed IoT applications. Recent work [47] reported a 0.4 mW TDC design fabricated in a 0.11 µm CMOS process. It supports an input signal bandwidth of up to 50 kHz only. Work [48] showed a 29.4 Msample/s 2.8 mW TDC design fabricated in a 65 nm CMOS process. However, the designs in [47] and [48] can only support a dynamic range of 40 dB and 35 dB, respectively, in time delay measurement, which does not meet the desired 60 dB performance.

IV. SPLS HARDWARE DESIGN

Denote $x \in \mathbb{R}^N$ to be the $K$-sparse ternary signal, with entries in $\{-1, 0, 1\}$, and $y \in \mathbb{R}^M$ to be the CS measurements. A window mask $\Omega$ of length $L$ slides along the columns of matrix $\Phi$. At each time increment, a subset of consecutive columns is selected, and $\Phi$ can be split into $\lceil N/L \rceil$ mutually exclusive pieces. For the $i$-th piece, $\Phi$ is represented by two matrices $\Phi^{(i)}_{\Omega}$ and $\Phi^{(i)}_{\bar{\Omega}}$, where $\bar{\Omega}$ is the complementary set of $\Omega$. Dropping $(i)$, $y$ is denoted as

$$y = \Phi_{\Omega} x_{\Omega} + \Phi_{\bar{\Omega}} x_{\bar{\Omega}} + w. \tag{7}$$

The SPLS algorithm was first introduced in [14] for signal reconstruction from compressed samples generated by the compressed LCS scheme. It uses conventional LS, followed by thresholding to yield a ternary signal reconstruction. The derivation of the SPLS statistics and optimal threshold for ternary detection was studied in [14]. The SPLS estimate of each column window is

$$\hat{x}_{\Omega} = \left( \Phi_{\Omega}^{T} \Phi_{\Omega} \right)^{-1} \Phi_{\Omega}^{T} y. \tag{8}$$

The thresholds $TH$ for $(1 - \alpha)$ hypothesis confidence are shown below, where $T_\alpha$ is a constant. Fig. 7 illustrates a block diagram of the proposed SPLS scheme, where multiple parallel LS estimators are employed for real-time signal reconstruction. We use standard theory to find the thresholds to be [14]

$$TH = \bar{x}_p \pm T_\alpha \cdot \frac{\mathrm{Std}(x_p)}{\sqrt{p}}, \tag{9}$$

where $\bar{x}_p$ is the mean of $p$ estimates of $x$, and $\Pr(-T_\alpha < t < T_\alpha) = 1 - \alpha$.

The SPLS yields an over-determined LS estimation in each random projection section. In addition, signal recovery of each section is statistically independent, which is a unique advantage compared with general compressive sensing reconstruction algorithms such as OMP [20] and IRLS [21]. These conventional CS signal reconstruction algorithms are iterative optimization routines that are not readily parallelizable. The independent recovery of each section allows a fully parallel design. Note that the recovered sections can be employed to prune the interference from nonzero elements outside of the projection window when processing the remaining sections [14].
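The per-window LS estimate of Eq. (8) followed by ternary thresholding can be sketched as follows. This is a simplified illustration, not the hardware algorithm of this paper: it uses a fixed decision threshold of 0.5 instead of the statistical threshold of Eq. (9), solves the normal equations with plain Gaussian elimination, and omits interference pruning. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def spls(phi, y, window, threshold=0.5):
    """Split the columns of phi into windows, run LS per window, then threshold."""
    m, n = len(phi), len(phi[0])
    x_hat = []
    for start in range(0, n, window):          # windows are independent
        cols = range(start, min(start + window, n))
        sub = [[phi[r][c] for c in cols] for r in range(m)]       # Phi_Omega
        size = len(cols)
        gram = [[sum(sub[r][i] * sub[r][j] for r in range(m))
                 for j in range(size)] for i in range(size)]      # Phi^T Phi
        rhs = [sum(sub[r][i] * y[r] for r in range(m)) for i in range(size)]
        est = solve(gram, rhs)                                    # Eq. (8)
        x_hat.extend(1 if v > threshold else -1 if v < -threshold else 0
                     for v in est)
    return x_hat


# Demo with an 8 x 8 Sylvester-Hadamard matrix: its orthogonal columns make the
# noise-free result exact. It is used only for illustration and is not a
# compressive (M << N) sensing matrix.
h = [[1]]
for _ in range(3):
    h = [row + row for row in h] + [row + [-v for v in row] for row in h]
x_true = [1, 0, 0, -1, 0, 0, 1, 0]
y_demo = [sum(a * b for a, b in zip(row, x_true)) for row in h]
x_hat = spls(h, y_demo, window=4)
```

Each loop iteration touches only its own window, so the windows could be dispatched to parallel LS estimators in the manner the text describes.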

In the SPLS scheme, the most complex hardware component is the LS estimator, which includes matrix multiplication and inversion, with matrix inversion being the most computationally intensive. Conventional VLSI designs for general Hermitian and positive-definite matrix inversion include QR decomposition, Cholesky decomposition, and others. For a problem size of $L$, Cholesky decomposition requires a total of $L$ divisions, $L(L^2 - 1)/3$ multiplications and $L(L^2 - 1)/3$ additions. The number of multiplications grows cubically with $L$.

For our case, note that the matrix $A = \Phi_{\Omega}^{T} \Phi_{\Omega}$ has very small off-diagonal elements due to the incoherent sampling kernel. Define $A = D + E$, where matrices $D$ and $E$ contain the diagonal and off-diagonal elements of $A$, respectively. Noting that $E^n \to 0$ when $n \to \infty$, the inverse of $A$ can be rewritten as

$$A^{-1} = (D + E)^{-1} = \left( I + D^{-1}E \right)^{-1} D^{-1} = \sum_{n=0}^{\infty} \left( -D^{-1}E \right)^{n} D^{-1}. \tag{10}$$

TABLE II
COMPUTATIONAL COMPLEXITY OF MATRIX INVERSION SCHEMES, FOR HERMITIAN AND POSITIVE-DEFINITE MATRIX INPUT WITH DIMENSION L-BY-L

Fig. 8. Mean squared error (MSE) of least squares estimate parameterized by the number of quantization bits Q at 10 dB signal to noise ratio (SNR).

Fig. 9. MSE of LS estimate versus SNR when using 10-bit word length in fixed-point representation.

Now consider the $k$-term approximation of the matrix inverse,

$$A_k^{-1} = \sum_{n=0}^{k-1}\left(-D^{-1}E\right)^n D^{-1}. \tag{11}$$

A 1-term approximation is $A_1^{-1} = D^{-1}$, while a 2-term approximation is $A_2^{-1} = D^{-1} - D^{-1}ED^{-1}$. In general, the residual error with a $k$-term approximation is

$$\Delta_k = \sum_{n=k}^{\infty}\left(-D^{-1}E\right)^n D^{-1} = \left(-D^{-1}E\right)^k\left(I + D^{-1}E\right)^{-1}D^{-1} = \left(-D^{-1}E\right)^k A^{-1}. \tag{12}$$

Therefore, the LS estimate $\hat{x}_k$ with a $k$-term matrix inverse approximation can be written as

$$\hat{x}_k = A_k^{-1}\tilde{\Phi}^T y = (I - \Delta_k A)A^{-1}\tilde{\Phi}^T y = (I - \Delta_k A)\,\hat{x}, \tag{13}$$

where $\hat{x} = A^{-1}\tilde{\Phi}^T y$ is the exact LS estimate. Equation (13) shows that the approximation error in the LS estimate depends on the residual error term $\Delta_k A$. The following theorem characterizes the probability that the residual error of a $k$-term approximation exceeds a given threshold $\alpha > 0$.

Theorem 1. Let $\tilde{\Phi} \in \mathbb{R}^{M \times L}$ come from a modified Bernoulli random process, taking values from $\{1, -1\}$ with equal probability. Then, for an integer $k > 0$ and any $\alpha > 0$, we have

$$\Pr\left(\|\Delta_k A\|_F^2 > \alpha\right) \le \frac{2L^2}{M^2\alpha^{2/k}}, \tag{14}$$

where $\|\cdot\|_F$ denotes the Frobenius norm.

The proof of Theorem 1 is given in the Appendix. Theorem 1 suggests that by increasing the number of samples $M$, we can constrain the probability of a large residual error caused by the finite-term approximation to be arbitrarily small: the bound decays as $1/M^2$. Since $L$ is less than $M$ by default in the SPLS scheme, the $k$-term approximation approach has a well-bounded residual error probability.
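The $k$-term approximation of Eq. (11) and the residual identity $\Delta_k A = (-D^{-1}E)^k$ implied by Eq. (12) can be checked numerically. The sketch below is illustrative only; it assumes an unnormalized ±1 Bernoulli matrix with hypothetical sizes $M = 2000$ and $L = 20$, and the helper name `k_term_inverse` is not from the paper.

```python
import numpy as np

def k_term_inverse(a, k):
    """Sum over n = 0..k-1 of (-D^{-1} E)^n D^{-1}, as in Eq. (11)."""
    d_inv = np.diag(1.0 / np.diag(a))
    e = a - np.diag(np.diag(a))      # off-diagonal part of A
    step = -d_inv @ e
    term = d_inv.copy()              # n = 0 term
    total = np.zeros_like(a)
    for _ in range(k):
        total += term
        term = step @ term           # next power of (-D^{-1} E), times D^{-1}
    return total

rng = np.random.default_rng(1)
m, l = 2000, 20                      # hypothetical sizes
phi = rng.choice([-1.0, 1.0], size=(m, l))
a = phi.T @ phi
a_inv = np.linalg.inv(a)
d_inv_e = np.diag(1.0 / np.diag(a)) @ (a - np.diag(np.diag(a)))

residual_norms = []
for k in (1, 2, 3):
    delta_k = a_inv - k_term_inverse(a, k)
    # Residual identity from Eq. (12): Delta_k A = (-D^{-1} E)^k
    assert np.allclose(delta_k @ a, np.linalg.matrix_power(-d_inv_e, k))
    residual_norms.append(np.linalg.norm(delta_k @ a, "fro") ** 2)
```

In this setup the squared Frobenius norm of the residual shrinks as $k$ grows, because the off-diagonal entries of $D^{-1}E$ are small when $M$ is much larger than $L$.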

The proposed $k$-term approximation to the matrix inverse has scalable hardware complexity depending on $k$. For example, a 2-term approximation requires only $L$ divisions and $2(L^2 - L)$ multiplications. Table II compares the computational complexity of different matrix inversion schemes. Specializing to our case, the proposed $k$-term approximation method offers considerable hardware savings over general inversion algorithms.
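To make the scaling concrete, the operation counts quoted in the text for Cholesky decomposition and for the 2-term approximation can be tabulated for a given problem size. The function names below are hypothetical, and the zero addition count for the 2-term case follows the statement later in this section that it needs no additions.

```python
def cholesky_ops(l):
    """Operation counts for Cholesky-based inversion of an L-by-L matrix."""
    cubic = l * (l * l - 1) // 3     # L(L^2 - 1) is always divisible by 3
    return {"div": l, "mul": cubic, "add": cubic}

def two_term_ops(l):
    """Operation counts for the 2-term approximation D^{-1} - D^{-1} E D^{-1}."""
    return {"div": l, "mul": 2 * (l * l - l), "add": 0}

chol = cholesky_ops(100)
approx = two_term_ops(100)
mul_ratio = chol["mul"] / approx["mul"]
```

For $L = 100$ this gives 333,300 multiplications for Cholesky against 19,800 for the 2-term approximation, a ratio of roughly 17.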

TABLE III
COMPARISON OF IMPLEMENTATION RESULTS OF THE PROPOSED SPLS AND OMP RECONSTRUCTION ALGORITHMS USING CMOS TECHNOLOGY

TABLE IV
COMPARISON OF IMPLEMENTATION RESULTS OF THE PROPOSED SPLS AND OMP RECONSTRUCTION ALGORITHMS USING XILINX FPGA

Fig. 10. Timing flow diagram of the proposed LS calculation in the proposed SPLS estimator hardware.

We now consider quantization effects in the proposed $k$-term matrix inversion in a stand-alone LS problem. Let $\tilde{\Phi}$ have dimension 10,000×100, with elements drawn from a standard normal distribution. We consider an LS approach to the linear problem $y = \tilde{\Phi}x + w$, where $x$ is a random ternary signal (not necessarily sparse), and $w$ is drawn from the Gaussian distribution $\mathcal{N}(0, \sigma^2)$.

Define the mean squared error (MSE) and signal-to-noise ratio (SNR) in Eqs. (15) and (16), respectively, as

$$\mathrm{MSE} = \frac{\|x - \hat{x}\|_2^2}{\|x\|_2^2}, \tag{15}$$

$$\mathrm{SNR} = 10\log_{10}\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\ \text{(dB)}. \tag{16}$$
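The two metrics in Eqs. (15) and (16) translate directly into code. The sketch below also sets up a reduced version of the stand-alone LS problem in floating point; the 2000×50 size (chosen for speed), the way signal and noise power are measured on the noiseless measurements, and all names are assumptions made for illustration.

```python
import numpy as np

def mse(x, x_hat):
    """Normalized mean squared error, Eq. (15)."""
    return np.sum((x - x_hat) ** 2) / np.sum(x ** 2)

def snr_db(p_signal, p_noise):
    """Signal-to-noise ratio in dB, Eq. (16)."""
    return 10.0 * np.log10(p_signal / p_noise)

rng = np.random.default_rng(2)
m, l, target_snr_db = 2000, 50, 10.0        # reduced from 10,000 x 100
phi = rng.standard_normal((m, l))
x = rng.choice([-1.0, 0.0, 1.0], size=l)    # random ternary signal
clean = phi @ x
p_signal = np.mean(clean ** 2)
p_noise = p_signal / 10 ** (target_snr_db / 10)
y = clean + rng.normal(0.0, np.sqrt(p_noise), size=m)

# Floating-point LS reference solution.
x_ls, *_ = np.linalg.lstsq(phi, y, rcond=None)
ls_mse = mse(x, x_ls)
```

Because the problem is heavily over-determined, the floating-point LS estimate has a small MSE even at 10dB SNR.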

Fig. 8 shows the MSE of the LS estimate parameterized by

finite word-length Q at 10dB SNR. Fixed-point and floating-

point Cholesky decomposition are compared. As shown in Fig.

8, the fixed-point Cholesky decomposition achieves floating

point performance when Q is larger than 13 at 10dB SNR. An

MSE floor appears in the k-term approximation method when

Q is greater than 10, because the finite term approximation in

Eq. (13) introduces a small residual error. A larger k results

in less residual error and hence smaller MSE. However, note

that the MSE improvement becomes marginal when k > 2.

Fig. 9 shows the MSE of the LS estimate versus SNR with a

10-bit word-length in fixed-point representation. At relatively

high SNR (SNR ≥ 15dB), 10-bit quantization is insufficient to

preserve MSE performance as both the approximation scheme

and Cholesky decomposition have MSE error floors. However,

at lower SNR (SNR ≤ 10dB), the 2-term approximation has

a similar MSE compared to Cholesky decomposition.

Based on Figs. 8 and 9, the 2-term approximation is a good

alternative to Cholesky decomposition at low to medium SNR,

with quadratic (rather than cubic) multiplication growth with

the problem size, and no additions. Consequently, we adopt the

10-bit 2-term matrix inversion approximation in the following.
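The finite word-length effect discussed above can be emulated in software by rounding values to a $Q$-bit grid. The paper does not specify its fixed-point format, so the signed uniform rounding, the `full_scale` parameter, and the function name below are assumptions.

```python
import numpy as np

def quantize_fixed_point(values, q_bits, full_scale=1.0):
    """Round values to a signed Q-bit uniform grid over [-full_scale, full_scale)."""
    step = 2.0 * full_scale / (2 ** q_bits)
    codes = np.round(np.asarray(values, dtype=float) / step)
    # Saturate at the most negative and most positive representable codes.
    codes = np.clip(codes, -(2 ** (q_bits - 1)), 2 ** (q_bits - 1) - 1)
    return codes * step

samples = np.linspace(-0.9, 0.9, 101)
quantized_10 = quantize_fixed_point(samples, q_bits=10)
max_error_10 = np.max(np.abs(quantized_10 - samples))
```

Within the unsaturated range the rounding error is at most half a quantization step, so each additional bit halves the worst-case error.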

The LS estimation can be decomposed into two parts: three matrix multiplications and a matrix inversion. The three matrix multiplications are $A = \tilde{\Phi}^T\tilde{\Phi}$, $B = \tilde{\Phi}^T y$ and $A^{-1}B$. Each matrix multiplication involves a collection of independent inner product calculations. To optimize hardware reuse, the matrix multiplications are carried out by one generic inner product module with time multiplexing. A finite state machine (FSM) is designed to select the correct input vectors and feed them to the inner product module. For example, the element $a_{i,j}$ in matrix $A$ is calculated from the $i$th row of $\tilde{\Phi}^T$ and the $j$th column of $\tilde{\Phi}$. Note that only the upper triangle of $A$ needs calculation due to matrix symmetry. Also, the diagonal elements of $A$ are all equal to 1 if $\tilde{\Phi}$ comes from a normalized Bernoulli random matrix. In this case, the first two matrices $A$ and $B$ can be computed together with a MUX at the input to minimize reconstruction time, and the 2-term approximation to the inversion of matrix $A$ can be simplified to element-wise operations. Fig. 10 is a timing flow diagram of the LS estimator. Because the calculations of $A = \tilde{\Phi}^T\tilde{\Phi}$ and $B = \tilde{\Phi}^T y$ are independent of each other, they are grouped in Phase 1, followed by $A^{-1}B$ in Phase 2.
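The two phases can be mimicked in software as follows. This is a sketch under the assumption that the entries are ±1/√M (a normalized Bernoulli matrix), so that $D = I$ and the 2-term inverse reduces to $A_2^{-1} = I - E = 2I - A$. The sizes and variable names are hypothetical, and the time multiplexing of the inner product module is not modeled.

```python
import numpy as np

rng = np.random.default_rng(3)
m, l = 512, 16                                  # hypothetical section size
phi = rng.choice([-1.0, 1.0], size=(m, l)) / np.sqrt(m)
x = rng.choice([-1.0, 0.0, 1.0], size=l)
y = phi @ x

# Phase 1: inner products. Only the upper triangle of A is computed and
# mirrored by symmetry; the diagonal is 1 for normalized +/-1 entries.
a = np.eye(l)
for i in range(l):
    for j in range(i + 1, l):
        a[i, j] = a[j, i] = phi[:, i] @ phi[:, j]
b = phi.T @ y

# Phase 2: 2-term inverse applied to B. With D = I no division is needed:
# A_2^{-1} B = (2I - A) B = 2B - A B.
x_hat_2term = 2.0 * b - a @ b
```

With a unit diagonal the $L$ divisions of the general 2-term approximation disappear, which is the simplification to element-wise operations mentioned above.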

The SPLS reconstruction algorithm was designed in Verilog for a 65nm CMOS technology. Table III compares the hardware results of the proposed SPLS scheme and existing designs using the OMP algorithm, where problem size describes the dimension of the matrix $\Phi$. Table III shows that the SPLS has the shortest reconstruction time due to the parallel architecture. In the simulated SPLS design, two parallel LS estimators were implemented. Additional parallelization can be readily implemented. Table III also shows that the SPLS has the smallest area compared with the methods from [32] and [33], due to the low-cost $k$-term matrix inversion approximation.

Power is estimated from Power Artist simulation. Table III

shows the SPLS scheme can potentially achieve one fifth the

power of the OMP matrix inversion design in [33]. As a further

comparison, FPGA-based implementations are listed in Table

IV, comparing the proposed SPLS design with schemes from

[32], [34] and [35]. The SPLS exhibits the best data throughput

performance with the least hardware cost.


V. RADAR SYSTEM DESIGN EXAMPLE

In this section, we evaluate the compressed LCS scheme and the SPLS signal reconstruction algorithm in an automotive collision warning system with in-band wireless interference. The compressed LCS scheme is not limited to automotive collision warning applications: it serves as a low-power analog-to-digital sampling scheme and can be applied to many other IoT-enabled applications that are driven by event-based spike-like signals, such as biomedical and environment monitoring. Analysis of these other IoT applications is not included in this paper.

The growing use of radar and communications motivates the

development of a system that allows co-existence of both radar

and wireless communications in the same spectrum allocation.

Conventional interference-robust receiver solutions include

parallel architectures such as analog filter banks to notch out

the undesired interference [36], and advanced spatial signal

processing using MIMO [37]. Both solutions face challenges

related to the power and complexity of the front-end hardware

due to the multiple parallel processing channels. Given that

a high-resolution radar requires broadband transmit signals,

and that the Shannon sampling theorem requires a sampling

rate of at least twice the signal’s bandwidth, an excessive data

volume appears at the outputs of conventional Nyquist analog-

to-digital converters (ADCs), that not only challenges the

integration of a large number of high-speed, high-resolution

ADCs, but also the high-speed data links for following digital

processing.

Recently, a few range finder designs based on impulse-radio ultra-wideband (IR-UWB) have been reported [51], [52]. The work in [51] uses a time-extension approach in quantization to relax the ADC high-speed constraint, but the sampling stage still runs at the Nyquist rate. In addition, leading edge detection is required in a multipath environment; otherwise, the actual echo from the direct path may be missed in the dead time. The work in [52] adopts a two-step TDC with scalable delay cells. The finite switching time between different delay cells causes a larger ranging error. Also, the power cost of this architecture is generally higher than that of other approaches. These limitations motivate a more power-efficient TDC design.

Some CS-based compressed sampling schemes have been reported for radar [38], [39]. However, the RD approach in [38] and the MWC approach in [39] are challenged by power and complexity, as discussed earlier. Moreover, these designs rely on random-basis compressed sensing and inevitably suffer an SNR loss proportional to the compression ratio.

Consider an automotive collision warning system with a maximum unambiguous range of 90 meters to support a safe braking and stopping distance at a speed of 100km/hour, which is roughly 80 meters. We choose a range resolution of 0.02m for urban application, requiring an 8GHz bandwidth linear frequency modulated (LFM) transmit pulse. We simulate the radar with a detection probability ($P_d$) of 0.9 and a false alarm probability ($P_{fa}$) of $10^{-6}$. Some key parameters of the simulated collision avoidance radar system are listed in Table V. A 6GHz carrier frequency places the radar pulse in the 2GHz∼10GHz spectrum, where wireless communication systems (WLAN and cellular) are also present as interference with an assumed power level from −50dBm to −70dBm.
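The link between the chosen bandwidth and the range resolution follows from the standard pulse-compression relation $\Delta R = c/(2B)$, and the maximum unambiguous range fixes the longest round-trip delay. The short sketch below evaluates both for the values quoted above; the function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s

def range_resolution(bandwidth_hz):
    """Range resolution of a pulse-compression radar: c / (2B)."""
    return C / (2.0 * bandwidth_hz)

def round_trip_time(range_m):
    """Two-way propagation delay to a target at the given range."""
    return 2.0 * range_m / C

resolution_m = range_resolution(8e9)   # about 0.019 m for an 8 GHz LFM pulse
max_delay_s = round_trip_time(90.0)    # about 0.6 microseconds at 90 m
```

The 8GHz bandwidth therefore yields a resolution just under the 0.02m target.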

A filter bank is employed to notch out wireless interference while providing multiple channels for parallel processing. The filter bank consists of 20 2nd-order filters that divide the signal bandwidth into 20 equally spaced sub-bands of 400MHz each. Fig. 11 illustrates the filter bank's spectrum response (left) and the output waveforms of each channel (right) when radar signals and wireless interference are simultaneously present. Fig. 11 shows that a tone interferer contaminates the 19th channel as well as its neighboring channels with a power much stronger than the radar signal, whereas channels far away from the interference, e.g., the 1st ∼ 17th channels, have high interference attenuation. When zooming in to the pulse area of one channel, the waveform looks similar to that shown in Fig. 1 (b). The level crossing points and the corresponding time instants are also marked.
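The channel plan can be written out explicitly. The sketch below assumes the 20 sub-bands are contiguous and start at the 2GHz lower edge of the radar band; the 9400MHz interferer frequency is a hypothetical value chosen so that it falls in the 19th channel, and the function names are not from the paper.

```python
def sub_band_edges_mhz(start_mhz=2000, stop_mhz=10000, channels=20):
    """Return (low, high) edges in MHz for equally spaced contiguous sub-bands."""
    width = (stop_mhz - start_mhz) // channels
    return [(start_mhz + i * width, start_mhz + (i + 1) * width)
            for i in range(channels)]

def channel_of(freq_mhz, edges):
    """Return the 1-based index of the sub-band containing freq_mhz, or None."""
    for index, (low, high) in enumerate(edges, start=1):
        if low <= freq_mhz < high:
            return index
    return None

edges = sub_band_edges_mhz()
interferer_channel = channel_of(9400, edges)   # hypothetical tone at 9.4 GHz
```

Under these assumptions the 19th channel spans 9.2GHz to 9.6GHz, and channels 1 through 17 lie at least 400MHz away from it.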

Fig. 12 shows the proposed radar receiver with emphasis

on the mixed-signal interface. z(t) and y[n] are the input

and output of the compressed LCS scheme, respectively. It is

worth noting that in some cases clutter returns from trees or

buildings can be strong and challenge desired target detection

and tracking. In such a case the return may become less

sparse, but clutter mitigation schemes such as the displaced

phase center antenna (DPCA) pulse canceller [40] and adaptive

DPCA can be employed.

We first evaluate the SPLS performance with synthetic signals, which allow the SPLS estimate to be examined in a standard, controllable setup before moving to the actual radar signal. We believe this part of the analysis is also applicable to other pulse-like signals, such as ultrasound, when they have similar signal properties after the CT-TE scheme. The ternary 3-bit CT-TE output model is adopted to examine the statistics of the SPLS estimator. We simulate the ternary timing signal with a sparsity of 0.2%. Fig. 13 compares the theoretical analysis of the bias and variance of the SPLS estimator with simulations at 0dB SNR. Simulation results illustrate the inverse relationship between the bias and variance of the SPLS estimator and the normalized sub-Nyquist sampling ratio (SSR, defined as the ratio of the actual sampling rate $f_s$ to the Nyquist rate $f_{Nyq}$): both decrease as the SSR increases. Fig. 13 indicates a small bias and variance even at low SNR, because the SPLS scheme converts each section recovery to a locally over-determined LS problem that is noise robust. Although interference from outside of the projection window affects the LS estimate, it is small compared with the unit amplitude of the ternary signal due to the use of an incoherent sensing matrix.

A clustering parameter (CP) is defined as the minimum

number of nonzero elements in a cluster. By controlling

CP, dense or scattered distribution ternary-valued signals are

emulated for the CT-TE, with different quantization bits in

the DAC. This simulates the density of the scatterers in the

scene. Setting the signal dimension to 10,000 with a sparsity

of 0.2% and SSR equal to 0.08, three normalized windowing

sizes, L = 0.04, 0.08 and 0.12, are used with 100,000 Monte

Carlo simulation trials.
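One possible way to emulate such clustered ternary signals is sketched below. The paper does not describe its generator, so the placement rule (one cluster per equal-width slot of the time axis, each holding at least CP nonzeros) and the name `clustered_ternary` are assumptions made for illustration.

```python
import numpy as np

def clustered_ternary(n, sparsity, cp, rng):
    """Ternary vector with round(n * sparsity) nonzeros in runs of at least cp."""
    k = round(n * sparsity)
    x = np.zeros(n, dtype=int)
    n_clusters = max(1, k // cp)
    sizes = [k // n_clusters] * n_clusters
    for i in range(k % n_clusters):
        sizes[i] += 1                      # spread the remainder over clusters
    slot = n // n_clusters                 # one cluster per slot avoids overlap
    for c, size in enumerate(sizes):
        start = c * slot + rng.integers(0, slot - size + 1)
        x[start:start + size] = rng.choice([-1, 1], size=size)
    return x

rng = np.random.default_rng(4)
n, sparsity, ssr = 10_000, 0.002, 0.08
x = clustered_ternary(n, sparsity, cp=5, rng=rng)
num_measurements = round(ssr * n)          # 800 compressed samples
```

With a signal dimension of 10,000 and 0.2% sparsity there are 20 nonzero elements, and an SSR of 0.08 corresponds to 800 measurements.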


Fig. 11. Response of a 20-channel analog filter bank when pulsed radar signals and interference from wireless communication are simultaneously present.

Fig. 12. Architecture of the proposed interference-robust receiver with emphasis on the proposed compressed LCS sampling scheme.

Fig. 13. Analysis and simulation bias and variance comparison for the proposed SPLS scheme at 0dB SNR.

Fig. 14. Error rate (ER) performance parameterized by normalized clustering parameters and normalized windowing sizes in the proposed SPLS scheme.

Error rate (ER) is defined as the number of symbol errors after ternary detection divided by the total number of received symbols. Fig. 14 shows the ER performance versus normalized CP, parameterized by different normalized windowing sizes in the split projection. ER improves slightly when CP increases in the noise-free and 5dB SNR cases. The improvement is due to the fact that a larger CP results in a more concentrated distribution of nonzero elements. This reduces the possibility of a large number of nonzero elements falling outside of a split projection window, and hence reduces the probability of large interference in the LS estimate. The ER also improves as the projection window size increases. OMP [20] is included for comparison. Overall, the CP has only a marginal influence on ER, as shown in Fig. 14. The ER remains small over a wide range of CP, which indicates the SPLS works well when the ternary signal is clustered in time, as can occur in radar with dense scattering.
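The ER definition can be expressed in a few lines. The sketch below uses a fixed detection threshold of ±0.5, which is the value this section applies to the conventional CS algorithms; the SPLS itself uses the thresholds of Eq. (9). The function names and sample values are hypothetical.

```python
def ternary_detect(values, threshold=0.5):
    """Map each real-valued estimate to the ternary symbol -1, 0 or +1."""
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in values]

def error_rate(sent, detected):
    """Number of symbol errors divided by the number of received symbols."""
    errors = sum(1 for s, d in zip(sent, detected) if s != d)
    return errors / len(sent)

sent = [0, 1, -1, 0, 0]
estimates = [0.1, 0.8, -0.4, -0.2, 0.6]
detected = ternary_detect(estimates)
er = error_rate(sent, detected)
```

In this toy case two of the five symbols are detected incorrectly (a missed −1 and a false +1), giving an ER of 0.4.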

The window size of the split projection has a relatively larger impact on ER compared with CP, especially in the noisy case. This is because both noise and interference affect the error rate. Consider each column of the measurement matrix as a signal representation atom. A larger $L$ provides a larger dictionary to interpret the compressed measurements during each section recovery, and reduces the possibility of fitting noise and interference that degrades the ER at lower SNR. A larger $L$ naturally increases the computational cost; recall that the computational complexity of the SPLS scheme is $O(NML)$. This provides a useful system tradeoff depending on the particular application.

The actual radar signal is now adopted to investigate some

system metrics in radar when the SPLS is used, for example,

error rate, probability of detection (Pd) and probability of

false alarm (Pfa). Fig. 15 compares ER of ternary timing sig-

nal recovery via state-of-the-art CS reconstruction algorithms

(basis pursuit [29], OMP [20], stage-wise OMP (StOMP) [30],

Bayesian CS [41] and smoothed-L0 [42]) at 0dB∼5dB SNR.

The ternary timing signal output from the CT-TE scheme has

an average sparsity of 0.38%, which suggests an SSR of 0.13

in the simulations, or an approximately 8-fold data volume

reduction compared with Nyquist-rate sampling. In Fig. 15, the

SPLS estimator has the best ER performance over the entire

SNR range. This important performance advantage comes

from the use of a “tall” matrix in each recovery that leads

to a locally over-determined LS problem for each section,

yielding good noise robustness. The optimal thresholds in (9)

are selected for the SPLS estimator. We set the threshold to

±0.5 for the conventional CS recovery algorithms because

these CS estimators yield an exact signal recovery with high

probability, so they are unbiased with high probability. It has

been demonstrated in [43], both theoretically and numerically,


that the basis pursuit de-noising (BPDN), the Dantzig selector,

and the OMP all have MSE proportional to the oracle estimator

MSE which coincides with the unbiased Cramer-Rao Lower

Bound (CRLB) for exactly K -sparse deterministic signals. The

oracle estimator performs optimal LS estimation by including

the prior knowledge of the sparse signal support [44]. How-

ever, we have to be cautious with the interpretation in the

present case as we exploit the prior of the signal’s ternary

structure.

Fig. 16 shows Probability of Detection (Pd) vs. Probability

of False Alarm (Pfa) parameterized by communication radar

signal to interference power ratio (SIR). The figure shows that

the proposed scheme can achieve a Pd of 0.9 and a Pfa less

than 0.04 with 8-fold data volume reduction compared with

the Nyquist-rate sampling.

As a generic sub-Nyquist ADC, the compressed LCS

scheme is applicable to both filter bank based interference-

robust single antenna radar and MIMO radars that have an

inherent multi-channel architecture. Some extensive studies on

MIMO radars, such as [45], reveal that Pfa can be improved

by two orders of magnitude in a 4-by-4 MIMO configuration.

The compressed LCS and the SPLS scheme would also reduce

the sampling data volume of such a system.

VI. CONCLUSION

We developed a sub-Nyquist sampling scheme for signals that are sparse in time, targeting low-power IoT applications, that combines a low-power compressed LCS scheme with a low-complexity SPLS algorithm. Many sensors envisioned for IoT applications are event driven with spike-like time signal outputs. Our design, analysis, and simulation study predicts that the FOM of the proposed LCS scheme is potentially very favorable with respect to existing designs. We also presented the SPLS sparse signal reconstruction algorithm, paired with the compressed LCS, that exploits the ternary-valued CT-TE output. The SPLS splits the random projections with an $l_2$-norm approach followed by threshold detection. To lower hardware cost, we presented a $k$-term approximation for matrix inversion in the LS estimator. The SPLS has fixed complexity and each section reconstruction is independent, which enables a fully parallel hardware architecture. The VLSI design study shows that the SPLS algorithm can be very power efficient, and could provide higher throughput than conventional CS sparse signal reconstruction algorithms such as OMP, by taking advantage of the sparse ternary sampling scheme.

APPENDIX

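Theorem 1. Let $\tilde{\Phi} \in \mathbb{R}^{M \times L}$ be i.i.d. real standard normally distributed. Then, for an integer $k > 0$ and any $\alpha > 0$, we have

$$\Pr\left(\|\Delta_k A\|_F^2 > \alpha\right) \le \frac{\sqrt{3}\,L^2(M+2)}{M^2\alpha^{1/k}}, \tag{14}$$

where $\|\cdot\|_F$ denotes the Frobenius norm.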

TABLE V
RADAR SYSTEM SPECIFICATIONS

Fig. 15. Error rate of the recovered signal after amplitude detection at a sub-Nyquist sampling ratio of 13% in an automotive collision warning system.

Fig. 16. Probability of Detection (Pd) vs. Probability of False Alarm (Pfa) parameterized by communication radar signal to interference power ratio (SIR).

Proof: We have

$$\mathbb{E}\left(\|\Delta_k A\|_F^2\right) = \mathbb{E}\left(\left\|\left(-D^{-1}E\right)^k\right\|_F^2\right) \le \mathbb{E}\left(\left\|D^{-1}E\right\|_F^{2k}\right). \tag{17}$$

The last inequality follows from the sub-multiplicative property of the matrix Frobenius norm. $D$ is a real diagonal matrix with diagonal elements

$$d_{ii} = \varphi_i^T\varphi_i, \tag{18}$$

and $E$ is a real matrix with each element given by

$$e_{ij} = \varphi_i^T\varphi_j, \quad i \ne j. \tag{19}$$

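Therefore, Eq. (17) can be rewritten as

$$\begin{aligned}
\mathbb{E}\left(\|\Delta_k A\|_F^2\right) &\le \mathbb{E}\left(\left\|D^{-1}E\right\|_F^{2k}\right) = \mathbb{E}\left(\sum_{i=1}^{L}\sum_{j=1}^{L}\left|\frac{e_{ij}}{d_{ii}}\right|^{2k}\right) = \sum_{i=1}^{L}\sum_{j=1}^{L}\mathbb{E}\left(\left|\frac{e_{ij}}{d_{ii}}\right|^{2k}\right) \\
&= \sum_{i=1}^{L}\sum_{j=1}^{L}\mathbb{E}\left(\left|\frac{\varphi_i^T\varphi_j}{\varphi_i^T\varphi_i}\right|^{2k}\right) \le \sum_{i=1}^{L}\sum_{j=1}^{L}\mathbb{E}\left(\frac{1}{\left|\varphi_i^T\varphi_i\right|^{2k}}\right)\mathbb{E}\left(\left|\varphi_i^T\varphi_j\right|^{2k}\right).
\end{aligned} \tag{20}$$

The last step follows from the Cauchy-Schwarz inequality.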

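If every entry of $\Phi$ is drawn from a modified Bernoulli random process taking the values 1 and −1 with equal probability, then

$$\mathbb{E}\left(\left|\varphi_i^T\varphi_i\right|^4\right) = M^4, \qquad \mathbb{E}\left(\left|\varphi_i^T\varphi_j\right|^4\right) = M^2 + (M-1)M = 2M^2 - M.$$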

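Hence, for a 2-term approximation, Eq. (20) becomes

$$\mathbb{E}\left(\|\Delta_2 A\|_F^2\right) \le \mathbb{E}\left(\left\|D^{-1}E\right\|_F^4\right) \le \sum_{i=1}^{L}\sum_{j=1}^{L}\mathbb{E}\left(\frac{1}{\left|\varphi_i^T\varphi_i\right|^4}\right)\mathbb{E}\left(\left|\varphi_i^T\varphi_j\right|^4\right) = \frac{2M^2 - M}{M^4}L^2 \le \frac{2L^2}{M^2}. \tag{21}$$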

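According to Markov's inequality, for any $\alpha > 0$, we have

$$\Pr\left(\|\Delta_k A\|_F^2 > \alpha\right) \le \Pr\left(\left\|D^{-1}E\right\|_F^{2k} > \alpha\right) = \Pr\left(\left\|D^{-1}E\right\|_F^4 > \alpha^{2/k}\right) \le \frac{\mathbb{E}\left(\left\|D^{-1}E\right\|_F^4\right)}{\alpha^{2/k}} \le \frac{2L^2}{M^2\alpha^{2/k}}. \tag{22}$$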

ACKNOWLEDGEMENTS

The authors acknowledge the contributions of the students,

faculty, and sponsors of the Analog and Mixed-Signal Center

at the Texas A&M University. This research project was

partially funded under the NSF Award Number 1547436.

REFERENCES

[1] J. A. Stankovic, “Research Directions for the Internet of Things,” IEEE Internet of Things Journal, vol. 1, no. 1, pp. 3–9, Feb. 2014.

[2] B. Martinez, M. Monton, I. Vilajosana, and J. Prades, “The Power of Models: Modeling Power Consumption for IoT Devices,” IEEE Sensors Journal, vol. 15, no. 10, pp. 5777–5789, Oct. 2015.

[3] F. Marvasti, “Nonuniform Sampling: Theory and Practice,” Springer, New York, United States, 2001.

[4] J. W. Mark and T. Todd, “A Nonuniform Sampling Approach to Data Compression,” IEEE Trans. on Communications, vol. 29, no. 1, pp. 24–32, Jan. 1981.

[5] Y. Tsividis, “Mixed-Domain Systems and Signal Processing Based on Input Decomposition,” IEEE Trans. on Circuits and Systems I, vol. 53, no. 10, pp. 2145–2156, Oct. 2006.

[6] Y. Tsividis, “Event-Driven Data Acquisition and Digital Signal Processing: A Tutorial,” IEEE Trans. on Circuits and Systems II, vol. 57, no. 8, pp. 577–581, Aug. 2010.

[7] A. Ogweno, P. Degenaar, V. Khomenko, and A. Yakovlev, “A fixed window Level Crossing ADC with activity dependent power dissipation,” 14th IEEE International New Circuits and Systems Conference (NEWCAS), 2016.

[8] W. Tang, A. Osman, D. Kim, B. Goldstein, C. Huang, B. Martini, V. A. Pieribone, E. Culurciello, “Continuous Time Level Crossing Sampling ADC for Bio-Potential Recording Systems,” IEEE Trans. on Circuits and Systems I, vol. 60, no. 6, pp. 1407–1418, Jan. 2013.

[9] L. Yongjia, D. Zhao, W. A. Serdijn, “A Sub-Microwatt Asynchronous Level-Crossing ADC for Biomedical Applications,” IEEE Transactions on Biomedical Circuits and Systems, vol. 7, no. 2, pp. 149–157, April 2013.

[10] B. Schell and Y. Tsividis, “A Continuous-Time ADC/DSP/DAC System With No Clock and With Activity-Dependent Power Dissipation,” IEEE Journal of Solid-State Circuits, vol. 43, no. 11, pp. 2472–2481, Nov. 2008.

[11] Józef Kalisz, “Review of methods for time interval measurements with picosecond resolution,” Metrologia, vol. 41, no. 1, 2004.

[12] G. D. Sasaki, R. C. Jensen, “Automatic Measurements with a High Performance Universal Counter,” Hewlett-Packard Journal (Hewlett-Packard), vol. 31, no. 9, pp. 21–31, Sep. 1980.

[13] J. Zhou, M. Ramirez, S. Palermo, and S. Hoyos, “Digital-Assisted Asynchronous Compressive Sensing Front-end,” IEEE J. on Emerging and Selected Topics in Cir. and Sys., vol. 2, no. 3, pp. 482–492, Sep. 2012.

[14] J. Zhou, S. Hoyos, and B. M. Sadler, “Asynchronous Compressed Beamformer for Portable Diagnostic Ultrasound Systems,” IEEE Trans. on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 61, no. 11, pp. 1791–1801, Nov. 2014.

[15] J. Zhou, Y. He, B. M. Sadler, and S. Hoyos, “Compressed Digital Beamformer with Asynchronous Sampling for Ultrasound Imaging,” IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing, May 2013.

[16] J. Zhou, S. Palermo, B. M. Sadler, and S. Hoyos, “Asynchronous Compressive Sensing in Radar Systems,” IEEE Texas Symposium on Wireless and Microwave Circuits and Systems, pp. 1–4, Apr. 2013.

[17] J. Zhou, S. Palermo, José S. Martínez, B. M. Sadler, and S. Hoyos, “Asynchronous Compressive Radar,” the 39th Annual Government Microcircuit Applications & Critical Technology Conference, Mar. 2014.

[18] E. J. Candès, M. B. Wakin, “An Introduction to Compressive Sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, Mar. 2008.

[19] A. Tofighi Zavareh, O. Barajas, M. Serafino, J. Jo, B. Applegate, B. Sadler, and S. Hoyos, “A Novel Continuous Time Ternary Encoding Based SS-OCT Calibration,” in IEEE Biomedical Circuits and Systems Conference, 2016.

[20] J. A. Tropp and A. C. Gilbert, “Signal Recovery from Random Measurements via Orthogonal Matching Pursuit,” IEEE Trans. Inform. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.

[21] B. Wohlberg and P. Rodriguez, “An iteratively reweighted norm algorithm for minimization of total variation functionals,” IEEE Signal Process. Letter, vol. 14, no. 12, pp. 948–951, Dec. 2007.

[22] E. Candès, J. Romberg and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.

[23] J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg and R. G. Baraniuk, “Beyond Nyquist: Efficient Sampling of Sparse Bandlimited Signals,” IEEE Trans. on Information Theory, vol. 56, no. 1, pp. 520–544, Jan. 2010.

[24] D. Gangopadhyay, E. Allstot, A. Dixon, K. Natarajan, S. Gupta, and D. Allstot, “Compressed Sensing Analog Front-End for Bio-Sensor Applications,” IEEE J. Solid-State Circuits, vol. 49, no. 2, pp. 426–438, Feb. 2014.

[25] F. Pareschi, P. Albertini, G. Frattini, M. Mangia, R. Rovatti and G. Setti, “Hardware-Algorithm Co-Design and Implementation of an Analog-to-Information Converter for Biosignals Based on Compressed Sensing,” IEEE Trans. Biomedical Circuits and Systems, vol. 10, no. 1, pp. 149–162, Feb. 2016.

[26] J. N. Laska, S. Kirolos, M. F. Duarte, T. S. Ragheb, R. G. Baraniuk, and Y. Massoud, “Theory and implementation of an analog-to-information converter using random demodulation,” in Proc. of IEEE Int. Symposium on Circuits and Systems, pp. 1959–1962, May 2007.


[27] Z. Yu, S. Hoyos, and B. M. Sadler, “Mixed-signal parallel compressed sensing and reception for cognitive radio,” in Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 3861–3864, Mar. 2008.

[28] Y. C. Eldar, P. Kuppinger and H. Bolcskei, “Compressed Sensing of Block-Sparse Signals: Uncertainty Relations and Efficient Recovery,” IEEE Trans. on Signal Processing, vol. 58, no. 6, pp. 3042–3054, 2010.

[29] W. Lu and N. Vaswani, “Modified Basis Pursuit Denoising for noisy compressive sensing with partially known support,” in Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 3926–3929, 2010.

[30] D. Donoho, Y. Tsaig, I. Drori, and J. Starck, “Sparse solution for underdetermined linear equations by stagewise orthogonal matching pursuit,” IEEE Trans. on Inform. Theory, vol. 58, no. 2, pp. 1094–1121, Feb. 2012.

[31] M. Kim, S. Choi, T. Seong, and J. Choi, “A Low-Jitter and Fractional-Resolution Injection-Locked Clock Multiplier Using a DLL-Based Real-Time PVT Calibrator With Replica-Delay Cells,” IEEE J. of Solid-State Circuits, vol. 51, no. 2, pp. 401–411, Feb. 2016.

[32] G. Huang and L. Wang, “High-speed Signal Reconstruction with Orthogonal Matching Pursuit via Matrix Inversion Bypass,” IEEE Workshop on Signal Processing Systems, pp. 191–196, Oct. 2012.

[33] F. Ren, W. Xu, and D. Markovic, “Scalable and parameterised VLSI architecture for efficient sparse approximation in FPGAs and SoCs,” Electronics Letters, vol. 49, no. 23, pp. 1440–1441, Nov. 2013.

[34] H. Rabah, A. Amira, B. K. Mohanty, S. Almaadeed, and P. K. Meher, “FPGA Implementation of Orthogonal Matching Pursuit for Compressive Sensing Reconstruction,” IEEE Trans. VLSI, vol. 23, no. 10, pp. 2209–2220, Oct. 2015.

[35] M. Xing, Q. Wang, G. Wang, and Z. Bao, “A Matched-Filter-Bank-Based 3-D Imaging Algorithm for Rapidly Spinning Targets,” IEEE Trans. on Geoscience and Remote Sensing, vol. 47, no. 7, pp. 2106–2113, 2009.

[36] D. Hai, B. Himed, “Interference Mitigation Processing for Spectrum-Sharing Between Radar and Wireless Communications Systems,” IEEE Trans. on Aerospace and Electronic Systems, vol. 49, no. 3, pp. 1911–1919, Jul. 2013.

[37] M. A. Herman and T. Strohmer, “High-resolution radar via compressed sensing,” IEEE Trans. on Signal Processing, vol. 57, no. 6, pp. 2275–2284, Feb. 2009.

[38] W. Bajwa, K. Gedalyahu, and Y. Eldar, “Identification of Parametric Underspread Linear Systems and Super-Resolution Radar,” IEEE Trans. on Signal Processing, vol. 59, no. 6, pp. 2548–2561, Feb. 2011.

[39] J. R. Guerci, Space-Time Adaptive Processing for Radar. Boston: Artech House, 2003.

[40] S. Ji, Y. Xue, and L. Carin, “Bayesian Compressive Sensing,” IEEE Trans. Inform. Theory, vol. 56, no. 6, pp. 2346–2356, Jun. 2008.

[41] H. Mohimani, M. B. Zadeh, and C. Jutten, “A fast approach for overcomplete sparse decomposition based on smoothed L0 norm,” IEEE Transactions on Signal Processing, vol. 57, no. 1, pp. 289–301, Jan. 2009.

[42] Z. Ben-Haim, Y. Eldar, and M. Elad, “Coherence-based performance guarantees for estimating a sparse vector under random noise,” IEEE Trans. on Signal Processing, vol. 58, no. 10, pp. 5030–5043, Oct. 2010.

[43] E. Candès and T. Tao, “The Dantzig selector: Statistical estimation when p is much larger than n,” The Annals of Statistics, vol. 35, no. 6, pp. 2313–2351, 2007.

[44] O’D. Nicholas and M. F. M. Jose, “On the Product of Independent Complex Gaussians,” IEEE Trans. on Signal Processing, vol. 60, no. 3, pp. 1050–1063, Mar. 2012.

[45] J. Haboba, M. Mangia, F. Pareschi, R. Rovatti, G. Setti, “A pragmatic look at some compressive sensing architectures with saturation and quantization,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 2, no. 3, pp. 443–459, Sept. 2012.

[46] D. Bellasi, L. Benini, “Energy-Efficiency Analysis of Analog and Digital Compressive Sensing in Wireless Sensors,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62, no. 11, pp. 2718–2729, Nov. 2015.

[47] C. K. Kwon, H. Kim, and J. Park, “A 0.4-mW, 4.7-ps Resolution Single-Loop ΔΣ TDC Using a Half-Delay Time Integrator,” IEEE Trans. on VLSI Systems, vol. 24, no. 3, pp. 1184–1188, Mar. 2016.

[48] K. O. Ragab, H. Mostafa, and A. Eladawy, “A Novel 10-Bit 2.8-mW TDC Design Using SAR With Continuous Disassembly Algorithm,” IEEE Trans. on Circuits and Systems II, vol. 63, no. 10, pp. 909–913, Oct. 2016.

[49] Y. J. Chen, K. H. Chang, and C. C. Hsieh, “A 2.02–5.16 fJ/Conversion Step 10 Bit Hybrid Coarse-Fine SAR ADC With Time-Domain Quantizer in 90 nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 51, no. 2, pp. 357–364, Feb. 2016.

[50] C. C. Liu, “27.4 A 0.35mW 12b 100MS/s SAR-assisted digital slope ADC in 28nm CMOS,” IEEE International Solid-State Circuits Conf. (ISSCC), Feb. 2016.

[51] H. G. Han, B. G. Yu, and T. W. Kim, “19.6 A 1.9mm-precision 20GS/S real-time sampling receiver using time-extension method for indoor localization,” IEEE International Solid-State Circuits Conf. (ISSCC), Feb. 2015.

[52] T. H. Jin, H. G. Han, and T. W. Kim, “Time-of-Arrival Measurement Using Adaptive CMOS IR-UWB Range Finder With Scalable Resolution,” IEEE Trans. on Circuits and Systems I, vol. 63, no. 10, pp. 1605–1615, Sept. 2016.

[53] A. T. Zavareh, O. Barajas and S. Hoyos, “An Efficient Estimation Algorithm for the Calibration of Low-Cost SS-OCT Systems,” IEEE 14th International Symposium on Biomedical Imaging (ISBI), 2017.

[54] O. Barajas, A. T. Zavareh, S. Hoyos, “Towards an On-chip Signal Processing Solution for the Online Calibration of SS-OCT Systems,” IEEE International Symposium on Circuits and Systems (ISCAS), 2017.

Jun Zhou (S’10) received the B.S. degree in electrical engineering in 2007 from Fudan University, Shanghai, China. He received dual M.S. degrees in electrical engineering from Fudan University, Shanghai, China, and the Royal Institute of Technology (KTH), Stockholm, Sweden, in 2010. He received the Ph.D. degree in electrical engineering from Texas A&M University, College Station, TX, USA, in 2014. He received a student best paper award at the IEEE International Symposium on Circuits and Systems (ISCAS). His main research interests are mixed-signal processing and low-power VLSI design. He is with Broadcom Limited, focusing on low-power, high-performance WPAN system design.

Amir Tofighi Zavareh was born in Tehran, Iran, in 1992. He received his B.S. degree from the Electronics Group of the Electrical Engineering Department at Sharif University of Technology, Tehran, Iran, in 2013. He was with Qualcomm as an interim engineering intern from August 2015 to December 2015. He was a recipient of the Graduate Teaching Fellowship with the Dwight Look College of Engineering at Texas A&M University in spring 2017. He is currently a Ph.D. candidate at Texas A&M University. His research interests include analog/mixed-signal circuit and system design and biomedical imaging.

Robin Gupta is now pursuing his Master of Science degree in Electrical Engineering at Texas A&M University with specialization in analog/mixed-signal and RF circuits and systems. He was with Texas Instruments, Bangalore, as an Analog Application Engineer for High Speed & RF Products for three years. His passion lies in exploring rudimentary inventions and developing refined techniques that can overcome the fundamental limitations of high-speed analog devices (data converters, PLLs, jitter cleaners, amplifiers, CDRs, TIAs, equalizers) and RF (LNA, mixer, PA) circuits and systems.

Liang Liu is an Assistant Professor in the Department of Electrical and Information Technology at Lund University, Sweden. He received his B.S. and Ph.D. degrees in the Department of Electronics Engineering (2005) and Micro-electronics (2010) from Fudan University, China. In 2010, he was with the Electrical, Computer and Systems Engineering Department, Rensselaer Polytechnic Institute (New York, USA) as a visiting researcher. He joined Lund University as a Post-doc in 2010. His research interests include wireless communication systems and digital integrated circuit design. Liang has served as a reviewer for over 30 IEEE journals, transactions and international conferences. Liang is a member of the Technical Committee for Circuits and Systems for Communications and VLSI Systems and Applications of the IEEE Circuits and Systems Society and a TC member of the Asia-Pacific Signal and Information Processing Association (APSIPA). He has also served as a technical program committee member of IEEE WCNC, ICCVE, ReConFig, and Norchip. Liang is a board member of the Swedish Chapter of the IEEE Joint Solid-State Circuits and Circuits and Systems Society.

Zhongfeng Wang received both B.E. and M.S. degrees from Tsinghua University, Beijing, China. He obtained the Ph.D. degree from the Department of Electrical and Computer Engineering at the University of Minnesota, Minneapolis, in 2000. He joined Nanjing University in 2016 as a Distinguished Professor through the state’s 1000-talent plan after serving Broadcom Corporation as a leading VLSI architect for nearly nine years. Prior to that, he was an Assistant Professor in the School of EECS at Oregon State University, Corvallis. Even earlier, he worked for National Semiconductor Corporation, Santa Clara, USA.

Dr. Wang is a world-recognized expert on VLSI for Signal Processing Systems. He has published over one hundred and fifty technical papers, edited one book (“VLSI”) and filed tens of U.S. patent applications and disclosures. He was the recipient of the IEEE Circuits and Systems Society VLSI Transactions Best Paper Award in 2007. In the current record (2007-present), he has had five papers ranked among the top twenty most downloaded manuscripts in IEEE Trans. on VLSI Systems. During his tenure at Broadcom, he contributed significantly to 10 Gbps and beyond high-speed networking products. Additionally, he has made critical contributions in designing FEC coding schemes for 100 Gbps and 400 Gbps Ethernet standards. So far, his technical proposals have been adopted by many international networking standards.

Since 2004, Dr. Wang has served as Associate Editor for the IEEE Trans. on Circuits and Systems-I (TCAS-I), TCAS-II, and IEEE Trans. on VLSI Systems for numerous terms. He is currently a Guest Editor for a special issue of IEEE Journal on Emerging and Selected Topics in Circuits and Systems. Furthermore, he has served as Technical Program Committee Member (or Co-Chair), Session (or Track) Chair, and Review Committee Member for tens of international conferences. In 2013, he served on the Best Paper Award selection committee for the IEEE Circuits and Systems Society. His current research interests are in the area of Digital Communications, Machine Learning, and Efficient VLSI Implementation. He has been a Fellow of the IEEE since 2016.

Brian M. Sadler (S’81-M’81-SM’02-F’07) received the B.S. and M.S. degrees from the University of Maryland, College Park, and the Ph.D. degree from the University of Virginia, Charlottesville, all in electrical engineering. He is a Fellow of the Army Research Laboratory (ARL) in Adelphi, MD. Dr. Sadler is an associate editor for the IEEE Transactions on Signal Processing and EURASIP Signal Processing, was an associate editor for IEEE Signal Processing Letters, and has been a guest editor for several journals including IEEE JSTSP, IEEE JSAC, the IEEE SP Magazine, and the International Journal of Robotics Research. He is a member of the IEEE Signal Processing Society Sensor Array and Multi-channel Technical Committee, and Co-Chair of the IEEE Robotics and Automation Society Technical Committee on Networked Robotics. He received Best Paper Awards from the Signal Processing Society in 2006 and 2010. His research interests include information science, networked and autonomous systems, sensing, and mixed-signal integrated circuit architectures.

Jose Silva-Martinez (SM’98–F’10) received the M.Sc. degree from Instituto Nacional de Astrofísica Optica y Electrónica (INAOE), Puebla, and the Ph.D. degree from Katholieke Universiteit Leuven, Leuven, Belgium, in 1981 and 1992, respectively. He is currently with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA. He has published more than 105 journal and 160 conference papers, 2 books and 12 book chapters, and 1 patent. His current research interests include the design and fabrication of integrated circuits for communication and biomedical applications. Dr. Silva-Martinez is serving as Editor-in-Chief of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART II: REGULAR PAPERS, and served as Conference Co-Chair of the IEEE International Midwest Symposium on Circuits and Systems (MWCAS) in 2014. He was co-author of the papers that received the RF-IC and MWCAS Best Student Paper Awards in 2005 and 2011, and a co-recipient of the European Solid-State Circuits Conference Best Paper Award in 1990.

Sebastian Hoyos received the B.S. degree in electrical engineering from Pontificia Universidad Javeriana (PUJ), Bogota, Colombia, in 2000, and the M.S. and Ph.D. degrees in electrical engineering from the University of Delaware, Newark, in 2002 and 2004, respectively. He was a Postdoctoral Researcher (2004-2006) with the Berkeley Wireless Research Center, Department of Electrical Engineering and Computer Sciences. He joined Texas A&M University, College Station, TX, in 2006, where he is currently an Associate Professor with the Department of Electrical and Computer Engineering. His research interests include telecommunication systems, digital signal processing, and analog and mixed-signal processing and circuit design.