arxiv:2002.02717v1 [cs.lg] 7 feb 2020

8
Unsupervised non-parametric change point detection in quasi-periodic signals Nikolay Shvetsov , Nazar Buzun , Dmitry V. Dylov Skolkovo Institute of Science and Technology Bolshoy blvd. 30/1, Moscow, 121205, Russia {nikolay.shvetsov, n.buzun, d.dylov}@skoltech.ru Abstract We propose a new unsupervised and non- parametric method to detect change points in in- tricate quasi-periodic signals. The detection relies on optimal transport theory combined with topo- logical analysis and the bootstrap procedure. The algorithm is designed to detect changes in virtually any harmonic or a partially harmonic signal and is verified on three different sources of physiological data streams. We successfully find abnormal or ir- regular cardiac cycles in the waveforms for the six of the most frequent types of clinical arrhythmias using a single algorithm. The validation and the ef- ficiency of the method are shown both on synthetic and on real time series. Our unsupervised approach reaches the level of performance of the supervised state-of-the-art techniques. We provide conceptual justification for the efficiency of the method and prove the convergence of the bootstrap procedure theoretically. 1 Introduction New analytical approaches to the quasi-periodic signals with irregular rhythms [25] – such as those encountered in the elec- trocardiograms (ECG) – are in major demand caused by the growth of the physiological monitoring market and by the inflating stock of consumer wearable solutions today. The abundance of unannotated time series data created by these modalities attracts notable theoretical effort in the search for the most robust change point detection (CPD) method, capa- ble of operating in a fast, a model-agnostic, and an unsuper- vised manner. Indispensable with cardiovascular diseases (the top cause of mortality and the major life threat in adults worldwide), ECG has become the most frequently used clinical modal- ity, attracting a multidisciplinary effort to detect conceivable markers of the heart problems in the recordings of the electri- cal function of the heart. Typical ECG is a one-dimensional time series measurement, close to a periodic signal, with each period consisting of three main parts: P wave, QRS complex, and T wave. Both the shapes and the temporal distribution of the PQRST waves carry important clinical information. Among many things, the ECG modality allows detecting dis- ruptions in the cardiac rhythm, being the major proxy for the doctors to diagnose heart arrhythmias [16]. Mathematically, cardiac arrhythmias correspond to some degree of broken periodicity in the ECG data stream, hav- ing much in common with the many other quasi-periodic sig- nals met in the nature [25]. The six of the most frequent types of clinical arrhythmias are atrial flutter, atrial fibrilla- tion, supraventricular tachycardia, premature atrial contrac- tion, and ventricular rhythms [18]. Each of these conditions is defined by a different set of morphologic and temporal char- acteristics in the PQRST complex in the ECG signals. In this work, we were motivated to develop a single, model- agnostic, unsupervised algorithm to detect all of such arrhyth- mias in a binary classification scenario, focusing on high de- tection specificity. Thanks to the non-parametric construction of the proposed change point statistic, we ensure zero mod- eling bias and applicability to a wide range of the incoming quasi-periodic signals, including the physiological ones. Be- sides ECG, we demonstrate the efficient application of the proposed algorithm to find abnormal rhythms in neuronal spiking streams and in periodic limb tremor data in patients with Parkinson’s disease. The formal problem statement is the following. Let X t be the quasi-periodic signal with a period T . One has to test the hypotheses H 0 : {X t IP f0(t/T ) , t [0,n]} H 1 : {∃τ * : X t IP f0(t/T ) ; X t IP f1(t) } t [0* ] and t [τ * ,n] (1) In the notation above IP represents a probability distribution, n is the dataset size, τ * is the change point time, f 0 (t/T ) and f 1 (t) are the functions parametrizing the distributions. Many groups have considered the problem of arrhythmia detection in the frameworks of modern machine learning methods [25, 16]. Such approaches as Decision Trees, Ran- dom Forests, Support Vector Machines (SVM), Naive Bayes, and the Convolutional Neural Networks (CNN) were shown to be efficient for different supervised tasks with the corre- sponding pros and cons [16]. Jun et.al. [19] proposed meth- ods of arrhythmia detection with the neural networks, allow- ing to achieve good values of recall. Further development included methods based on Genetic Algorithms [2] and lo- arXiv:2002.02717v1 [cs.LG] 7 Feb 2020

Upload: others

Post on 12-Dec-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: arXiv:2002.02717v1 [cs.LG] 7 Feb 2020

Unsupervised non-parametric change point detection in quasi-periodic signals

Nikolay Shvetsov , Nazar Buzun , Dmitry V. DylovSkolkovo Institute of Science and TechnologyBolshoy blvd. 30/1, Moscow, 121205, Russianikolay.shvetsov, n.buzun, [email protected]

Abstract

We propose a new unsupervised and non-parametric method to detect change points in in-tricate quasi-periodic signals. The detection relieson optimal transport theory combined with topo-logical analysis and the bootstrap procedure. Thealgorithm is designed to detect changes in virtuallyany harmonic or a partially harmonic signal and isverified on three different sources of physiologicaldata streams. We successfully find abnormal or ir-regular cardiac cycles in the waveforms for the sixof the most frequent types of clinical arrhythmiasusing a single algorithm. The validation and the ef-ficiency of the method are shown both on syntheticand on real time series. Our unsupervised approachreaches the level of performance of the supervisedstate-of-the-art techniques. We provide conceptualjustification for the efficiency of the method andprove the convergence of the bootstrap proceduretheoretically.

1 IntroductionNew analytical approaches to the quasi-periodic signals withirregular rhythms [25] – such as those encountered in the elec-trocardiograms (ECG) – are in major demand caused by thegrowth of the physiological monitoring market and by theinflating stock of consumer wearable solutions today. Theabundance of unannotated time series data created by thesemodalities attracts notable theoretical effort in the search forthe most robust change point detection (CPD) method, capa-ble of operating in a fast, a model-agnostic, and an unsuper-vised manner.

Indispensable with cardiovascular diseases (the top causeof mortality and the major life threat in adults worldwide),ECG has become the most frequently used clinical modal-ity, attracting a multidisciplinary effort to detect conceivablemarkers of the heart problems in the recordings of the electri-cal function of the heart. Typical ECG is a one-dimensionaltime series measurement, close to a periodic signal, with eachperiod consisting of three main parts: P wave, QRS complex,and T wave. Both the shapes and the temporal distributionof the PQRST waves carry important clinical information.

Among many things, the ECG modality allows detecting dis-ruptions in the cardiac rhythm, being the major proxy for thedoctors to diagnose heart arrhythmias [16].

Mathematically, cardiac arrhythmias correspond to somedegree of broken periodicity in the ECG data stream, hav-ing much in common with the many other quasi-periodic sig-nals met in the nature [25]. The six of the most frequenttypes of clinical arrhythmias are atrial flutter, atrial fibrilla-tion, supraventricular tachycardia, premature atrial contrac-tion, and ventricular rhythms [18]. Each of these conditions isdefined by a different set of morphologic and temporal char-acteristics in the PQRST complex in the ECG signals.

In this work, we were motivated to develop a single, model-agnostic, unsupervised algorithm to detect all of such arrhyth-mias in a binary classification scenario, focusing on high de-tection specificity. Thanks to the non-parametric constructionof the proposed change point statistic, we ensure zero mod-eling bias and applicability to a wide range of the incomingquasi-periodic signals, including the physiological ones. Be-sides ECG, we demonstrate the efficient application of theproposed algorithm to find abnormal rhythms in neuronalspiking streams and in periodic limb tremor data in patientswith Parkinson’s disease.

The formal problem statement is the following. Let Xt bethe quasi-periodic signal with a period T . One has to test thehypotheses

H0 : Xt ∼ IPf0(t/T ), ∀t ∈ [0, n]H1 : ∃τ∗ : Xt ∼ IPf0(t/T ) ;Xt ∼ IPf1(t)

t ∈ [0, τ∗] and t ∈ [τ∗, n]

(1)

In the notation above IP represents a probability distribution,n is the dataset size, τ∗ is the change point time, f0(t/T ) andf1(t) are the functions parametrizing the distributions.

Many groups have considered the problem of arrhythmiadetection in the frameworks of modern machine learningmethods [25, 16]. Such approaches as Decision Trees, Ran-dom Forests, Support Vector Machines (SVM), Naive Bayes,and the Convolutional Neural Networks (CNN) were shownto be efficient for different supervised tasks with the corre-sponding pros and cons [16]. Jun et.al. [19] proposed meth-ods of arrhythmia detection with the neural networks, allow-ing to achieve good values of recall. Further developmentincluded methods based on Genetic Algorithms [2] and lo-

arX

iv:2

002.

0271

7v1

[cs

.LG

] 7

Feb

202

0

Page 2: arXiv:2002.02717v1 [cs.LG] 7 Feb 2020

gistic regression [20], both of which are now very popularfor building the supervised classification models.

A limited number of unsupervised ECG analysis workshave also appeared. In [15], the authors describe unsuper-vised methods for feature extraction and clustering to preventthe false alarm in arrhythmia detection. In [11], the authorsdescribe application of k-means algorithms and a neural net-work to the ECG signal analysis. The application of topolog-ical data analysis (TDA) and Wasserstein metrics [31] for pe-riodical signals was discussed in works by Perea [27, 29, 28].Some selective algorithms for the offline detection of multi-ple change points in multivariate time series are presented in[33, 32] and the CPD via Gaussian processes in [4]. The workby Buzun and Avanesov [5] describes bootstrap applicationfor CPD for the time series. Techniques of Gaussian approx-imation for the OT task are described in different works byBuzun [6] and Chernozhukov [10, 8, 7].

The major difficulty in the statistical study of the problem(1) is twofold: the dependent data and the lack of a suitableparametric model for an intricate signal, such as ECG. To ad-dress these challenges, we propose a new pipeline shown inFigure 1. In the proposed algorithm, we resort to the op-timal transport (OT) approach that is capable of building anon-parametric change point statistic to test the hypotheses.We propose to apply the TDA/OT approach not to the originalsignal, but to a projection of the quasi-periodic function into aclosed curves space (the point cloud), allowing both the peri-odic and the morphologic components of the original signal’swaveform to be considered. Eventually, we estimate quan-tiles of the change point statistic with the bootstrap procedurein order to set a threshold under the null hypotheses assump-tion. In the theoretical section, we prove a theorem about theconvergence of the bootstrap distribution of the statistic to thereal distribution, setting a foundation stone for a plethora ofpossible future works on TDA/OT analysis on periodic sig-nals.

1st sliding window

Raw signal

2nd sliding window

Bootstrap

Wasserstein distancemax$ 𝑊&

& 𝜇() 𝑡 , 𝜇,) 𝑡

𝑡 ∈ [𝜏 − ℎ, 𝜏)

𝑡 ∈ [𝜏, 𝜏 + ℎ]

𝑡 ∈ [𝜏 − ℎ, 𝜏 + ℎ] Anomaly detection

Points cloud

Figure 1: Pipeline of the proposed algorithm, where τ – the secondsliding window center, 2h – the second sliding window size, W p

p –Wasserstein distance, µb

l (t), µbr(t) – Bootstrap measures in the left

and the right parts of the second sliding window.

2 Methodology2.1 The first window: calculate point cloudsThe first step of our approach is to map the original time seriesinto the point cloud. We use the method based on the slidingwindows with 1-dimensional persistence scoring described in[27]. The main idea is to present the original periodical signalas a closed curve, which will help us to apply the optimaltransport formalism to the quasi-periodic data. Define

SW (t) = [Xt, Xt+s, ..., Xt+Ms] (2)

The sliding window (SW ) makes an embedding of the signalXt at point t into IRM+1. Iterating through different values oft with a step ∆t one gets a collection of points called slidingwindow point cloud of Xt (Figure 2). A critical parameterfor this embedding is the first window-size (Ms). It is cho-sen to be equal to the duration of a single period in the signal(e.g., one PQRST cycle in the normal heart beat pattern). Inthe next step we apply Principal Component Analysis (PCA)for this point cloud in order to increase robustness and havea possibility to visualize the rhythm disturbance. Two exam-ples of the point clouds are shown in Figure 2.

0

-1

-1

0

1

0 5000 10000 15000Time

0 5000 10000 15000Time

-10

1X

1

-1

0Y

-1

0

1

Z

1

X0

-1 0

1

Y

-1

0

1

Z

Amplitu

deAm

plitu

de1

1

-1

Figure 2: Point clouds of normal heart rhythm (top) and Atrial Flut-ter (bottom), with x,y,z corresponding to the top tree PCA com-ponents. Colors help to visualize irregularities in the cloud.

2.2 The second window: get Wasserstein distancesIn order to find structural changes in the point cloud corre-sponding to the structural changes in the original time serieswe elaborate the method described in [5]. The main idea isthat at each time step the procedure extracts a data slice fromthe point cloud, splits it in two equal-size parts, and computesWasserstein distance between them. The size of the slidingwindow could be equal to several curve loops. The methodavoids the rise of values of the Wasserstein distances due tofluctuations and neglects noise-driven changes in the curves,effectively tracing only the meaningful structural changes inthe signal.

Wasserstein distance is defined on probability distributionpairs on some metric space [31]. By definition, the Wasser-stein distance of degree p between the probability measuresµ and ν is

W pp (µ, v) =

(inf

γ∈Π(µ,v)

∫M∗M

‖x− y‖pdγ(x, y))1/p

(3)

Lets introduce a change point statistic (the maximum distanceover the window positions):

T (2h) = maxτ

W pp

(µl(τ), µr(τ)

)(4)

µl(τ) =1

h

τ−1∑t=τ−h

δXt , µr(τ) =1

h

τ+h−1∑i=τ

δXt (5)

where δXtis the Dirac function at position Xt (i.e., a unit

mass concentrated at location Xt), τ is the central point of

Page 3: arXiv:2002.02717v1 [cs.LG] 7 Feb 2020

the sliding window of length 2h, implying that the data serieswithin the sliding window is (Xτ−h, . . . , Xτ+h−1).

We calculate the Wasserstein distance for each position ofthe sliding window and create a new time series to be used forshowing how the curves differ inside of the window. In prac-tice, one can calculate Wasserstein distances via the Sinkhornalgorithm using the Optimal Transport Library [12].

2.3 Moving blocks bootstrap for rhythm analysisIn this step, we compute Wasserstein distance and execute thebootstrap procedure:

T b(2h) = maxτ

W pp

(µbl (τ), µ

br(τ)

)(6)

µbl (τ) =1

h

τ−1∑t=τ−h

δXk(t), µbr(τ) =

1

h

τ+h−1∑i=τ

δXk(t)(7)

where the set k(t) is generated by the Moving Block Boot-strap (MBB), and where the data is split and shuffled inton blocks randomly. Naturally, we assume that the points lo-cated in the peaks of the plot of the Wasserstein distances cor-respond to the arrhythmia points on the original periodic sig-nal. MBB was formulated in separate works by Kunsch [23]and Lahiri [24] as new scheme to create pseudo-samples. Theusual bootstrap forms new samples taking only random obser-vations from the initial sample, whereas, the MBB performsthis procedure only within a row of the formed blocks. Weuse a weighted block structure of the MBB, which generatesrandom weights for each block and, importantly, preservesthe structure of the original time series.

After the MBB resampling, we create a list of change pointstatistic values (T b(2h)) and set the threshold with α confi-dence level corresponding to the border between the normalpoints and the points of arrhythmia (see Figure 3). It is as-sumed that quantiles of T b(2h) are close to the quantiles ofT (2h) (bootstrap consistency), which we justify in the theo-retical part below.

0 5000 10000 15000 20000 25000Time

0.0

0.5

1.0

Dis

tanc

e

SeparationWasserstein Distance

Wpp

Wbp

Figure 3: Values of Wasserstein distances (bootstrap) help detectabnormal rhythm in unsupervised manner.

2.4 Gaussian approximationConsider two point clouds of size h. They may belong tothe same distribution (null hypothesis) or to different distri-butions. Assume that samples in each point cloud are inde-pendent (when we use block-bootstrap we may assume thatblocks are independent). Bootstrap consistency requires thatthe distribution of the Wasserstein distances between thesepoint clouds should be close to the distribution of the resam-pled ones. One necessary technique in the Bootstrap consis-tency proof is the Gaussian approximation. It appears that the

Wasserstein distance between two point clouds under the nullhypothesis can be approximated by the maximum of someGaussian vector. In [31], this approximation is proved for thecase of discrete distributions as a limit theorem.Theorem 1. Let measures r, s are defined on a discrete setX = x1, . . . , xN and i.i.d. samples X1, . . . , Xh ∼ r andY1, . . . , Yh ∼ s. Define convex sets:

Φp =u ∈ RN : ux − ux′ ≤ dp (x, x′) , x, x′ ∈ X

(8)

Φ∗p(r, s) =

(u,v) ∈ RX × RX :

〈u, r〉+ 〈v, s〉 =W pp (r, s)

ux + vx′ ≤ dp (x, x′) , x, x′ ∈ X

(9)

Multinominal covariance matrix Σ(r) is

rx1

(1− rx1) −rx1

rx2· · · −rx1

rxN

−rx2rx1

rx2(1− rx2

) . . . −rx2rxN

.... . .

...−rxN

rx1−rxN

rx2· · · rxN

(1− rxN)

(10)

such that with Gaussian random vector Z ∼ N (0, Σ(r)) itholds for empirical measures rh and sh:1) One sample - Null hypothesis

n12pWp (rh, r)

d−→maxu∈Φp

uTZ

1p

(11)

2) One sample - Alternative

n12 (Wp (rh, s)−Wp(r, s))

d−→ (12)

1

pW 1−pp (r, s)

max

(u,v)∈Φ∗p(r,s)uTZ

3) Two samples - Null hypothesis. If r=s and h is approach-ing infinity such that h → ∞ and h → λ ∈ (0, 1), then:

(nm

n+m

) 12p

Wp (rh, sh)d−→maxu∈Φp

uTZ

1p

(13)

4) Two samples - Alternative With n and m approaching in-finity such that h→∞ and h→ λ ∈ [0, 1], then:(

h2

2h

) 12p

(Wp (rh, sh)−Wp(r, s))d−→ (14)

1

pW 1−pp (r, s)

max

(u,v)∈Φ∗p(r,s)

√λuTZ +

√1− λvTZs

Below, we will extend the result from [31] for continuous

case and estimate the resulting convergence rate. We willneed two lemmas for that.Lemma 1. [9][3] Let independent samples X1, . . . , Xh ∈IRp be centered random vectors. Their Gaussian counter-parts are Yi ∼ N (0,Var [Xi]). Denote their sum by

SX =1√h

h∑i=1

Xi, SY =1√h

h∑i=1

Yi

Page 4: arXiv:2002.02717v1 [cs.LG] 7 Feb 2020

Assume ∃b > 0 such that for all j ∈ 1 . . . p: IE(SXj )2 ≥ band ∃Gh ≥ 1 such that for k ∈ 1, 2

1

h

h∑i=1

IE[|Xij |2+k

]≤ G2+k

h (15)

IE

[exp

(|Xij |Gh

)]≤ 2 (16)

Then for a set A of hyper-rectangle form

supA

∣∣IP SX ∈ A− IP SY ∈ A∣∣ ≤ O(G2h log

7(ph)

h

)1/6

Lemma 2 (Anti-concentration). [10] Let F ⊂ L2(P ) be aseparable class of measurable functions and entropy of F befinite. Denote by G(f), f ∈ F a Gaussian random processwith zero mean and covariance depended on measure P :

IE[G(f)G(g)] =

∫f(x)g(x)dP (x) (17)

Suppose that there exist constants σ, σ > 0 such thatσ2 ≤ IEf2 ≤ σ2 for all f ∈ F . Then ∀ x and ∆ > 0

IP

(supf∈F

G(f) ∈ [x, x+∆]

)≤ CA∆ (18)

where

CA = O

(IE

[supf∈F

G(f)

]+√

1 ∨ log(σ/∆)

)(19)

Proof. Basing on V. Chernozhukov’s work [8], [7] maximumof a Gaussian vector Z has the following anti-concentration

IP

(max1≤j≤p

Zj ∈ [x, x+∆]

)≤ ∆O

(IE max

1≤j≤pZj +

√1 ∨ log(σ/∆)

) (20)

Make a finite ε-net f1, f2, . . . for F and set Zj equals tovalue of G in the center of j-th cell G(fj),such that ∀ j and ‖f − fj‖P ≤ ε, ε→ 0

IP (|Zj −G(f)| > δ)→ 0 (21)

and subsequently

IP

(max

j,‖f−fj‖P≤ε|Zj −G(f)| > δ

)→ 0 (22)

andmaxjZj

Pr−→ maxf

G(f) (23)

Note that convergence by probability yields convergence bydistribution, so

IP

(supfG(f) ∈ [x, x+∆]

)→ IP

(maxjZj ∈ [x, x+∆]

)(24)

andIE sup

fG(f)→ IEmax

jZj (25)

Remark 1. The original proof one may find in Lemma A.1from article [10]. We have used finite entropy assumption inthe previous Lemma because it ensures the existence of pro-cess G(f) according to Dudley’s criterion for sample conti-nuity of Gaussian processes.Denote

Φp = (u, v) : |u(x) + v(y)| ≤ ‖x− y‖p, ∀x, ∀y ∈ Rd

IP (ϕ1, ϕ2) = IP

(√h max

(u,v)∈Φp

〈u, φ1〉+ 〈v, φ2〉 > x

)IPε(ϕ1, ϕ2) = IP

(√h max

(u,v)∈ε-net(Φp)〈u, φ1〉+ 〈v, φ2〉 > x

)Theorem 2. Consider i.i.d. samples X = X1, . . . , Xhand Y = Y1, . . . , Yh with a bounded support space Ω ofdimension d. Exist Gaussian vectors Z1, Z2 ∈ N (0, Σψ) andgeneralized Fourier basis ψi∞i=1, such that

Σψ = IEψψT (X1) (26)

and the Wasserstein distance between the samples can be ap-proximated by the maximum of Gaussian process with the fol-lowing upper bound∣∣∣∣IP(√hW p

p (X,Y ) > x

)− IP

(ZT1 ψ,Z

T2 ψ)∣∣∣∣

≤ CAO(log h

h

) 16+7d/p

,

(27)

where CA can be written as

O

(IE max

(u,v)∈Φp

〈u, ZT1 ψ〉+ 〈v, ZT2 ψ〉+√1 ∨ log(σ/µ3)

)Remark 2. From the practical sense, resampling of Wasser-stein distance should entail data normalization in order torestrict Ω and should keep the power p close to the data di-mension d.Remark 3. In combination with Gaussian comparison [9]one may show the bootstrap consistency, i.e.

IP(ZT1 ψ,Z

T2 ψ)≈ IP

((Zb)T1 ψ, (Z

b)T2 ψ)

and subsequently

IP

(√hW p

p (X,Y ) > x

)≈ IP

((Zb)T1 ψ, (Z

b)T2 ψ)

From this also follows that T (2h) converges to T b(2h) bydistribution when h→∞.

Proof. The dual formulation of Wasserstein distance is

W pp (X,Y ) = max

(u,v)∈Φp

〈u, φX〉+ 〈v, φY 〉 (28)

φX(x) =1

h

h∑i=1

δ(x−Xi), φY (x) =1

h

h∑i=1

δ(x−Yi) (29)

Show how the covering number of Φp depends on the sup-port space of empirical measures Ω. Construct an ε-net on

Page 5: arXiv:2002.02717v1 [cs.LG] 7 Feb 2020

empirical measures. Its cardinality is hN(Ω,ε) since each ε-cell of Ω may contain from 0 to h points. For each measurespair (µε1, µ

ε2) from ε-net one may set in correspondence pair

(uε, vε) ∈ Φp such that (uε, vε) is constant inside each cellof Ω and

W pp (µ

ε1, µ

ε2) = 〈uε, µε1〉+ 〈vε, µε2〉 (30)

and subsequently for each arbitrary pair of empirical mea-sures (µ1, µ2) on Ω there is an element (uε, vε) ∈ Φp withproperty

〈uε, µ1〉+ 〈vε, µ2〉 = 〈uε, µε1〉+ 〈vε, µε2〉 (31)W pp (µ1, µ2)− 〈uε, µ1〉+ 〈vε, µ2〉 =W pp (µ1, µ2)−W p

p (µε1, µ

ε2) ≤ 2εp (32)

Decompose densities ϕX , ϕY in ψi(x) basis

〈u, φX〉+ 〈v, φY 〉 =⟨u,

(1

h

∑i

ψ(Xi)

)Tψ

⟩+

⟨v,

(1

h

∑i

ψ(Yi)

)Tψ

⟩(33)

In order to replace ψ(Xi) and ψ(Yi) by Gaussian vectorsand use anti-concentration one has to make an ε-net approx-imation of (u, v) functions. We have shown above that thecowering number of Φp may by restricted by O(h1/ε

d/p

). Soone may set

log pε =1

εd/plog(h) +O(1) (34)

determining the dimension of maximum function. On ε-netLemma 1 gives upper bound∣∣IPε(ϕX , ϕY )− IPε(ZT1 ψ,ZT2 ψ)∣∣ ≤ O(G2

h log7(pεh)

h

)1/6

(35)where

G2+kh = IE(u(X1) + v(Y1))

2+k ≤ IE‖X1 − Y1‖p(2+k)

To make a step from IPε to IP remind that functions u and vare ‖ · ‖p- Lipschitz and subsequently

maxx|u(x)− uε(x)| ≤ ε

and using Lipschitz property with Lemma 2 one gets

|IPε(ZT1 ψ,ZT2 ψ)− IP (ZT1 ψ,ZT2 ψ)| ≤ O(CAε)

Setting optimal

ε =

(1

CAh1/6

) 11+7d/6p

gives the initial statement.

3 ExperimentsIn the experimental section we will demonstrate the use ofthe Theorem 1 on various physiological measurements andon their synthetic models. Of main interest to us is the ECGsignal, but the data streams from a wearable sensor that hadrecorded limb tremor activity in a patient with Parkinson’sdisease will also be analysed.

3.1 Real ECG dataWe used the MIT-BIH arrhythmia dataset from the Phys-ioNet [26]. The MIT-BIH Arrhythmia Dataset contains 48half-hour excerpts of two-channel ambulatory ECG record-ings, studied by the BIH Arrhythmia Laboratory between1975 and 1979. 23 recordings were chosen at random froma set of 4000 24-hour ambulatory ECG recordings and in-clude most common arrhythmia types. The remaining 25recordings include less common but clinically significant ar-rhythmias. Each record contains two 30-min ECG lead signal(mostly MLII lead and lead V1/V2/V4/V5) sampling the dataat a frequency of 360Hz. Our algorithm proved to work with-out any data pre-processing or noise reduction and detectedall types of arrhythmia (see results in Table 1).

3.2 Artificial ECG dataTo visualize the mechanism of arrhythmia detection, and topopulate the arrhythmia classes equally, we also developedan auxiliary model to generate artificial ECG. This modelcan simulate the normal beat and produce different types ofarrhythmia at random time moments: Atrial Flutter, AtrialFibrillation, Supraventricular Tachycardia, Premature AtrialContraction, and Ventricular Rhythms – all according to theinitialization parameters of the model. The generation is pro-duced via the discrete wavelet transform in the form:

Tm,n =

∫ ∞−∞

x(t)ψm,n(t)dt (36)

where ψm,n is the orthonormal wavelet basis. In this work weuse Daubechies wavelet, generated with scipy library, usinghyperparameter different types of arrhythmia were created,also Gaussian noise N (µ, σ) was added to provide realisticECG data.

Artificial normal rhythm modelTo simulate normal heart beat we used its verbatim defini-tion from the medical textbooks [18]. Normal sinus rhythmis a periodical signal, with the heart rate ranging from 60 to100 bpm. The QRS complex is normal, the P wave alwaysexists before the QRS, the T wave is visible after the QRS.Wavelets with Gaussian noise result in smooth curves in thepoint cloud, emphasizing the clean normal cycling beat tra-jectory (See Figure 4).

0 200 400 600 800 1000Time

1

0

1

Ampl

itud

e

0 2X

101

Y

0 200 400 600 800 1000Time

101

Ampl

itud

e

10 0 10X

0.0

2.5

Y

Figure 4: Artificial ECG (top) and spiking neuronal activityfrom [22] (bottom) with corresponding point clouds.

We consider 5 of the most frequent types of arrhythmia and1 signal with an unknown random rhythm anomaly, each of

Page 6: arXiv:2002.02717v1 [cs.LG] 7 Feb 2020

them corresponding to some unique PQRST characteristics.Each type of arrhythmia was initiated at a random time mo-ment within a given synthetic time series stream. This wasdone to understand the performance of the bootstrap detec-tion on the ideal data, to test its stability to the noise (omittedfor brevity), and to learn the changes that appear in the pointcloud when particular features of a cardiac malady emerge inthe signal.1 Simulated waveforms of each cardiac arrhythmia(and the corresponding point clouds) were calculated usingbasic textbook in cardiology [18]; the results are presented inFigure 5.

1

0

1Atrial Fibrilation Atrial Flutter Supraventricular Tachycardia

0 500 1000 15001

0

1Premature Atrial Contraction

0 500 1000 1500

Ventricular Rythms

0 500 1000 1500

Random Arrhythmia

Time

Ampl

itud

e

0 2X

1

0

1

Y

1 0 1X

0

2

Y

0 2X

1

0

1

Y

1 0 1X

1

0

1

Y

0 2X

1

0

1

Y

1 0 1X

0

2

Y

Figure 5: Artificial ECG with arrhythmia. The time series data andthe corresponding point clouds are encoded with the same framecolor.

3.3 Comparison with state-of-the-artEach ECG series was split to parts of different size (40,000,80,000, and 120,000 points). If we take the indexes of thepoints, whose values are above the separation line calculatedin the bootstrap procedure, these points in the original ECGwill be the points with the arrhythmia. The PhysioNet datasethas the annotations accompanying the data; therefore, it ispossible to compare the predicted labels of the points withthe ground truth.

The parameters of the first sliding window have the follow-ing values Ms = 450, s = 1, ∆t = 2 (∆t is step of moving

1To the authors’ knowledge, this kind of ECG representation –the point could that automatically boosts the visibility of an abnor-mal rhythm – has never been suggested for the clinical use before.We speculate that it could be easily integrated into the physiologicsystems to accompany or, perhaps, even to substitute the conven-tional ECG running monitors. With time, doctors can get accus-tomed to looking on the cyclic clouds just like they have gotten usedto the waveforms of conventional ECG.

Table 1: Comparison of proposed approach with state-of-the-art.Definitions of sensitivity and specificity follow those in Ref. [19].

Method Sens% Spec% Supervision

1 92.0 ± 4.0 86.0 ± 6.0 3

1* 97.2 ± 4.1 96.2 ±3.1 3

2 [34, 21] 91.6 77.0 3

2* [34, 21] 88.9 84.1 3

3 [1] 92.0 80.6 3

3* [1] 85.8 88.9 3

4 [17] 70.0 98.0 45 [19] 99.6 97.8 2

6 [2] 84.4 99.7 2

7 [30] 75.9 77.7 2

8 [20] 97.0 63.0 2

9 [14] 98.1 85.0 2

3 Unsupervised 4 Semi-supervised 2 Supervised1: Bootstrap on real data, 1*: Bootstrap on artificial data2: Ruptures(PELT) on Wasserstein distance data2*: Ruptures(PELT) on Euclidean distance data3: BOCP on Wasserstein distance data3*: BOCP on Euclidean distance data4: SVM + PCA 5: 2D CNN 6: Echo State Network7: LD QRS- and time interval-based features8: LR 9: DT+Heart rate features

window), corresponding to the typical ECG sampling param-eters, such as those in the MIT-BIH dataset. The size of thesecond sliding window is equal to 4 curve loops, it meansthat the window separates the series into 2 parts with 2 curveloops in each. We chose the confidence level α=5%.

To gauge the performance of the algorithm, we use sensi-tivity and specificity of the prediction [16, 19]. To calculatethem we used a hold-out test set comprising the ECG signalswith the normal heart beat (160 parts) and the ECG with ar-rhythmias (192 parts). As a result, the specificity of 86%, andthe sensitivity of 92% were obtained. We have also calculatedthe same metrics for the artificial data, and for all types of ar-rhythmia (42 time series, with arrhythmia in different parts ofseries). The results are the following: sensitivity 97.2% with4.1% standard deviation; specificity 96.2% with 3.1% stan-dard deviation. Optimal choice of prediction threshold andthe size of the sliding windows define the trade-off betweenthe high recall and the low false positive rate.

Comparison of our algorithm against several other ap-proaches is shown in Table 1. We note that the pipeline inFigure 1 was meant to be as simple as possible, providing arobust statistical approach to predict abnormal rhythms in anunsupervised manner with high computational efficiency. En-hancing the pipeline by obvious combination with the deeplearning or the hybrid model-based analysis methods is be-yond the scope of this paper. Relevant to the clinical ap-probation, the method was tested (and correctly detected) onthe short-episode arrhythmia in the long-term monitoring datastream (Figure 6).

Page 7: arXiv:2002.02717v1 [cs.LG] 7 Feb 2020

Figure 6: Algorithm’s performance on a long-term monitoring dataand the real ECG time series. The detected arrhythmia is visibleboth in the Wasserstein graph and in the 3D point cloud.

Figure 7: Algorithm’s performance on a real Parkinson’s diseasetremor data. Wasserstein plot and the corresponding point cloud.

3.4 Other datasetsWe went beyond ECG, and tested our method on other quasi-periodic physiological signals, yielding the following met-rics: neuron spike activity changes[22] were detected withsensitivity of 94.6% and specificity of 88%, and a limb tremordata in a patient with intermittent episodes of increased symp-toms of parkinsonism[13](intermittent tremor) – with sensi-tivity of 92.3%, and specificity of 96%.

4 ConclusionWe presented a new unsupervised and non-parametric learn-ing algorithm for detection of arrhythmias and of otherrhythm anomalies in the raw data of quasi-periodic record-ings. The detection relies on optimal transport theory com-bined with topological analysis and the bootstrap procedure,with the convergence of the bootstrap procedure being proventheoretically. The simple pipeline provides a robust statisti-cal approach to predict abnormal rhythms in an unsupervisedmanner with high computational efficiency.

Despite already demonstrating the level of performance ofthe supervised algorithms, our approach is expected to per-form even better if combined with the deep learning methods(similarly to [16]), especially in a recurrent neural networkconfiguration. Another line of the future work can entail theextension of the algorithm for the multi-class classificationalso using the unsupervised bootstrap method on the Wasser-stein distances.

References[1] Ryan Prescott Adams and David J. C. MacKay. Bayesian

online changepoint detection. arXiv, 0710.3742, 2007.

[2] Miquel Alfaras, Miguel C. Soriano, and Silvia Ortın. Afast machine learning model for ecg-based heartbeat clas-sification and arrhythmia detection. Frontiers in Physics,7:103, 2019.

[3] Valeriy Avanesov and Nazar Buzun. Change-point detec-tion in high-dimensional covariance structure. ElectronicJournal of Statistics, 12(2):3254–3294, 2018.

[4] Valeriy Avanesov. Nonparametric change point detectionin regression. arXiv, 1903.02603, 2019.

[5] Nazar Buzun and Valeriy Avanesov. Bootstrap for changepoint detection. arXiv, 1710.07285, 2017.

[6] Nazar Buzun. Gaussian approximation for empiricalbarycenters. arXiv, 1904.00891, 2019.

[7] Victor Chernozhukov, Denis Chetverikov, and KengoKato. Comparison and anti-concentration bounds for max-ima of gaussian random vectors. arXiv, 1301.4807, 2013.

[8] Victor Chernozhukov, Denis Chetverikov, and KengoKato. Anti-concentration and honest, adaptive confidencebands. The Annals of Statistics, 42(5):1787–1818, Oct2014.

[9] Victor Chernozhukov, Denis Chetverikov, and KengoKato. Central limit theorems and bootstrap in high dimen-sions. arXiv, 1412.3661, 2014.

[10] Victor Chernozhukov, Denis Chetverikov, and KengoKato. Gaussian approximation of suprema of empiricalprocesses. The Annals of Statistics, 42(4):1564–1597, Aug2014.

[11] Gari D Clifford, Francisco Azuaje, Patrick McSharry,et al. Advanced methods and tools for ECG data analysis.Artech house Boston, 2006.

[12] Marco Cuturi. Sinkhorn distances: Lightspeed compu-tation of optimal transport. In Advances in neural infor-mation processing systems, pages 2292–2300, 2013.

[13] Andrey Somov et. al. Parkinson’s disease: analysis ofmisdiagnosed cases in clinical practice. submitted, 2019.

[14] J Faganeli and F Jager. Automatic classification of tran-sient ischaemic and transient non-ischaemic heart-rate re-lated ST segment deviation episodes in ambulatory ECGrecords. Physiological Measurement, 31(3):323–337, feb2010.

Page 8: arXiv:2002.02717v1 [cs.LG] 7 Feb 2020

[15] Behzad Ghazanfari, Fatemeh Afghah, Kayvan Najarian,Sajad Mousavi, Jonathan Gryak, and James Todd. An un-supervised feature learning approach to reduce false alarmrate in icus. arXiv, 1904.08495, 2019.

[16] Giulia Guidi and Manas Karandikar. Classification ofarrhythmia using ecg data. Lecture notes, 2014.

[17] Jing Hua, Hua Zhang, Jizhong Liu, Yilu Xu, and FuminGuo. Direct arrhythmia classification from compressiveecg signals in wearable health monitoring system. Jour-nal of Circuits, Systems and Computers, 27(06):1850088,2018.

[18] Jane Huff. ECG workout: Exercises in arrhythmia in-terpretation. Lippincott Williams & Wilkins, 2006.

[19] Tae Joon Jun, Hoang Minh Nguyen, Daeyoun Kang,Dohyeun Kim, Daeyoung Kim, and Young-Hak Kim. Ecgarrhythmia classification using a 2-d convolutional neuralnetwork. arXiv, 1804.06812, 2018.

[20] Hiroshi Kawazoe, Yukiko Nakano, Hidenori Ochi,Masahiko Takagi, Yusuke Hayashi, Yuko Uchimura, Take-hito Tokuyama, Yoshikazu Watanabe, Hiroya Matsumura,Shunsuke Tomomori, Akinori Sairaku, Kazuyoshi Sue-nari, Akinori Awazu, Yosuke Miwa, Kyoko Soejima,Kazuaki Chayama, and Yasuki Kihara. Risk stratificationof ventricular fibrillation in brugada syndrome using non-invasive scoring methods. Heart Rhythm, 13(10):1947 –1954, 2016. Focus Issue: Sudden Death.

[21] Rebecca Killick, Paul Fearnhead, and Idris A Eckley.Optimal detection of changepoints with a linear computa-tional cost. Journal of the American Statistical Associa-tion, 107(500):1590–1598, 2012.

[22] Dmitriy Krylov, Dmitry V. Dylov, and Michael Rosen-blum. Reinforcement learning for suppression of collec-tive activity in oscillatory ensembles. arXiv, 1909.12154,2019.

[23] Hans R Kunsch. The jackknife and the bootstrap forgeneral stationary observations. The annals of Statistics,pages 1217–1241, 1989.

[24] Soumendra Nath Lahiri. Resampling methods for de-pendent data. Springer Science & Business Media, 2013.

[25] Nicos Maglaveras, Telemachos Stamkopoulos, Kon-stantinos Diamantaras, Costas Pappas, and MichaelStrintzis. Ecg pattern recognition and classification usingnon-linear transformations and neural networks: A review.International Journal of Medical Informatics, 52(1):191 –208, 1998.

[26] George B Moody and Roger G Mark. The impact ofthe mit-bih arrhythmia database. IEEE Engineering inMedicine and Biology Magazine, 20(3):45–50, 2001.

[27] Jose Perea and John Harer. Sliding windows and per-sistence: An application of topological methods to signalanalysis. arXiv, 1307.6188, 2013.

[28] Jose A Perea, Anastasia Deckard, Steve B Haase, andJohn Harer. Sw1pers: Sliding windows and 1-persistencescoring; discovering periodicity in gene expression timeseries data. BMC bioinformatics, 16(1):257, 2015.

[29] Jose Perea. Topological time series analysis. NOTICESOF THE AMERICAN MATHEMATICAL SOCIETY, 66(5),2019.

[30] Philip de Chazal, M. O’Dwyer, and R. B. Reilly. Au-tomatic classification of heartbeats using ecg morphologyand heartbeat interval features. IEEE Transactions onBiomedical Engineering, 51(7):1196–1206, July 2004.

[31] Max Sommerfeld and Axel Munk. Inference for em-pirical wasserstein distances on finite spaces. arXiv,1610.03287, 2016.

[32] Raanju R Sundararajan and Mohsen Pourahmadi. Non-parametric change point detection in multivariate piece-wise stationary time series. Journal of NonparametricStatistics, 30(4):926–956, 2018.

[33] Charles Truong, Laurent Oudre, and Nicolas Vayatis.Selective review of offline change point detection meth-ods. arXiv, 1801.00718, 2018.

[34] Charles Truong, Laurent Oudre, and Nicolas Vayatis.Selective review of offline change point detection meth-ods. Signal Processing, 167, 2018.