talk slides imsct2016

Nonparametric smooth estimators for probability densityfunction for circular data

Yogendra P. Chaubey

Department of Mathematics and StatisticsConcordia University, Montreal, Canada H3G 1M8

E-mail:[email protected]

Talk to be presented at the International Conference onInterdisciplinary Mathematics, Statistics and

Computational Techniques (IMSCT2016-FIMXXV), ManipalUniversity, Jaipur (India), December 22-24, 2016

Yogendra Chaubey (Concordia University) Department of Mathematics & Statistics December 22-24, 2016 1 / 74

Abstract

In this talk we provide a short review for smooth estimation of density anddistribution functions for circular data. It has been shown that the usualkernel density estimator used for linear data may not be appropriate in thecontext of circular data. Fisher (1989: J. Structural Geology, 11, 775-778)presents an adaptation of the linear kernel estimator, however, betteralternatives are now available based on circular kernels; see e.g. Di Marzio,Panzera, and Taylor, 2009: Statistics & Probability Letters, 79(19),2066-2075. In this talk I use a simple approximation theory to motivatethe circular kernel density estimation and further explore the usefulness ofthe wrapped Cauchy kernel in this context. It is seen that the wrappedCauchy kernel appears as a natural candidate in connection to orthogonalseries density estimation on a unit circle. In the literature the use of vonMises circular kernel is investigated (see Taylor, 2008: ComputationalStatistics & Data Analysis, 52(7), 3493-3500), that requires numericalcomputation of Bessel function. On the other hand, the wrapped Cauchykernel is much simpler to use. This adds further weight to the considerablerole of the wrapped Cauchy distribution in circular statistics.


1 Introduction2 Motivation for the Circular Kernel Density Estimator3 Alternative Circular Density Estimators

Transformation Based Kernel Density EstimatorsApproximation Methods by Orthogonal Functions

Orthogonal Series of Cosine FunctionsFourier Series Expansion

Orthogonal Polynomials on Sub-intervalsChebyshev PolynomialsLegendre Polynomials

4 Density Estimator derived from Smooth Estimator of the Distribution FunctionInverse Steriographic Projection of Bernestein Polynomial Estimator

5 A Connection Between the Circular Kernel Density Estimator and the OrthogonalSeries

Some Preliminary Results from Complex AnalysisOrthogonal Series on Circle

6 ExamplesExample 1 - Turtle DirectionsExample 2 - Movements of Ants


Introduction

Given an i.i.d. d−dimensional random sample {X1, ..., Xn} from acontinuous DF F with density f, the Parzen-Rosenblatt kernel densityestimator is given by

f(x;h) ≡ n−1h−dn∑i=1

K

(x−Xi

h

). (1.1)

h is known as the window-width or band-width and K is called thekernel function. The band-width h typically tends to 0 as the samplesize n tends to infinity and K is typically a symmetric densityfunction centered around zero with unit variance.

The motivation for this estimator as originally put forward byRosenblatt (1956) comes from the approximation

f(x) ≈ Pr[X ∈ ∆x]

Vx. (1.2)


Introduction

∆x : A hypercube centered around x and Vx is the volume of theregion ∆x.

The region ∆x is moved around all the observations and theprobabilities are computed by the proportions, subsequently theapproximated values on the right hand side are averaged over all theobservations.

The generalization of this idea of using effectively an uniform kernelto symmetric probability distributions is in the background ofRosenblatt’s (1956) proposal.

The proposal was further, independently, studied by Parzen (1962)and popularized in many subsequent papers, and it is now well knownas Parzen-Rosenblatt kernel density estimator [see Prakasa Rao(1983, Ch. 3) or Silverman (1986)].


Introduction

An alternative motivation is provided recently in Chaubey and Sen(2002) using approximation theory, that provides the legitimacy ofasymmetric kernels while considering density estimation fornon-negative random variables.

Lemma 2.1 Consider a sequence {Φn(y, t)}∞n=1 of distribution functionsin Rd for every fixed t ∈ Rd, such that for Yn ∼ Φn(·, t)

(i) EYn = t(ii) vn(t) = max1≤i≤d V ar(Yin)→ 0 as n→∞, for every fixed t.(iii) Φn(y, t) is continuous in t.

Define for any bounded continuous multivariate function u(t)

un(t) =

∫Rd

u(x)dΦn(x, t). (2.11)

Then un(t)→ u(t) as n→∞ for t in any compact subset of Rd, theconvergence being uniform over any subset over which u(t) is uniformlycontinuous. Furthermore, if the function u(t) is monotone, theconvergence holds uniformly over entire Rd.


Introduction

In cases where multimodal and/or asymmetric models may beappropriate, semiparametric or nonparametric modelling may beconsidered more appropriate. [18] and [35] consideredsemi-parametric analysis based on mixture of circular normal and vonMises distributions and [23], [2], [19], and [29] have considered kerneldensity estimators for spherical data.In what follows we consider estimation of the density for circular data,i.e. an absolutely continuous (with respect to the Lebesgue measure)circular density f(θ), θ ∈ [−π, π], i.e f(θ) is 2π−periodic,

f(θ) ≥ 0 for θ ∈ R and

∫ π

−πf(θ)dθ = 1. (1.3)

Given a random sample {θ1, ...θn}for the above density, the kerneldensity estimator may be written as

f(x;h) =1

nh

n∑i=1

K

(θ − θih

)

). (1.4)


Introduction

Fisher (1989) proposed non-parametric density estimation for circulardata by adapting the linear kernel density estimator (1.1) with aquartic kernel [see also Fisher (1993), §2.2 (iv) where an improvementis suggested], defined on [−1, 1], that is given by

K(θ) =

{.9375(1− θ2)2 for − 1 ≤ θ ≤ 1;

0 otherwise.(1.5)

The factor .9375 insures that K(θ) is a density function. The datamust be transformed to the interval [−1, 1]. The values of thesmoothing constant h are proposed to be explored in the interval(.25h0, 1.5h0), h0 being given by

h0 =7

12 ζ

n15

, (1.6)

whereζ = 1/κ, (1.7)


Introduction

κ being the modified maximum Likelihood estimate of theconcentration coefficient of the von Mises circular distribution

KvM (θ) =1

2πI0(κ)exp{κ cos θ}, − π ≤ θ ≤ π, (1.8)

( See Eqs. (4.40) and (4.41) of Fisher (1993).)

Assuming that the sample values of θ are in the interval [0, 2π), thosevalues of θi will only contribute to the sum in (??) such that| θi−θh | < 1. We may remark here that in the algorithm described inFisher (1993), it is explicitly assumed that the data lies in the interval[−1, 1]. Further, in general this method of smoothing does notproduce a periodic estimator, a property that is essential for a circulardistribution. Fisher (1993) suggested to perform the smoothing byreplicating the data to 3 to 4 cycles and consider the part in theinterval [−π, π]. This problem is easily circumvented by using circularkernels.


Introduction

Di Marzio et al. (2011) considered nonparametric estimation ofToroidal Density using Toroidal Kernels defined on a d−dimensionalTorous Td = [−π, π]d, that applies for the present case with d = 1.The Toroidal kernels are defined as d-fold product

KC =

d∏s=1

Kκs

where C represents the set of smoothing parameters {κ1, ..., κd} and Kκ isreal-valued function defined on T = [−π, π] with the following properties:

T1. It admits an uniformly convergent Fourier series{1 + 2

∑∞j=1 γj(κ) cos(jθ)}/(2π), θ ∈ [−π, π], where γj(κ) is strictly

increasing.T2.

∫TKκ(θ)dθ = 1, and if Kκ takes negative values, there exists

0 < M <∞ such that for all κ > 0∫T|Kκ(θ)|dθ ≤M.

T3. For all 0 < δ < π, limκ→∞∫δ≤|θ|≤π |Kκ(θ)|dθ = 0Yogendra Chaubey (Concordia University) Department of Mathematics & Statistics December 22-24, 2016 10 / 74

Introduction

The candidates for such a function include many well-known familiesof circular distributions. The corresponding form of the kernel circulardensity estimator is given by

f(θ;κ) =1

n

n∑i=1

J (θ − θi;κ)) . (1.9)

where J is a circular density with center 0 and concentrationparameter κ.

Taylor (2008) considered the von Misses circular normal distributionwith concentration parameter κ for J, that gives the estimator for f as

fvM (θ; ρ) =1

n

n∑i=1

KvM (θ; θj , ρ), (1.10)

and discussed determination of the optimal data based choice for κ.


Introduction

In Section 2, I present a simple approximation theory motivation forconsidering the circular kernel density estimator given in (1.9).

Next I demonstrate that the wrapped Cauchy kernel presents itself asthe kernel of choice by considering an estimation problem on the unitcircle. We also show that this approach leads to orthogonal seriesdensity estimation, however no truncation of the series is required.

The wrapped Cauchy distribution with location parameter µ andconcentration parameter ρ is given by

KWC(θ;µ, ρ) =1

2π

1− ρ2

1 + ρ2 − 2ρ cos(θ − µ),−π ≤ θ < π, (1.11)

that becomes degenerate at θ = µ as ρ→ 1. The estimator of f(θ)based on the above kernel is given by

fWC(θ; ρ) =1

n

n∑i=1

KWC(θ; θj , ρ). (1.12)


Introduction

In Section 3, first we present some basic results from the literature onorthogonal polynomials on the unit circle and then introduce thestrategy of estimating f(θ) by estimating an expectation of a specificcomplex function, that in turn produces the non-parametric circularkernel density estimator in (1.12).

Section 4 shows that the circular kernel density estimator isequivalent to the orthogonal series estimation in a limiting sense.This equivalence establishes a kind of qualitative superiority of thekernel estimator over the orthogonal series estimator that requires theseries to be truncated, however the kernel estimator does not havesuch a restriction.

In Section 5, we present some alternative approaches based ontransformations and provide a critical assessment. The last sectionprovides some examples.


Motivation for the Circular Kernel Density Estimator

The starting point of the nonparametric density estimation is thetheorem given below from approximation theory (see [33]). Beforegiving the theorem we will need the following definition:

Definition

Let {Kn} ⊂ C∗ where C∗ denotes the set of periodic analytic functionswith a period 2π. We say that {Kn} is an approximate identity if

A. Kn(θ) ≥ 0 ∀ θ ∈ [−π, π];

B.∫ π−πKn(θ) = 1;

C. limn→∞max|θ|≥δKn(θ) = 0 for every δ > 0.



The definition above is motivated from the following theorem which issimilar to the one used in the theory of linear kernel estimation (see[37]). Also, note that we have replaced Kn of [33]) by 2πKn withoutchanging the result of the theorem.

Theorem

Let f ∈ C∗, {Kn} be approximate identity and for n = 1, 2, ... set

f∗(θ) =

∫ π

−πf(η)Kn(η − θ)dη. (2.1)

Then we havelimn→∞

supx∈[−π,π]

|f∗(θ)− f(θ)| = 0. (2.2)



Note that taking the sequence of concentration coefficients ρ ≡ ρnsuch that ρn → 1, the density function of the Wrapped Cauchy willsatisfy the conditions in the definition in place of Kn. In general Kn,appearing in the above theorem may be replaced by a sequence ofperiodic densities on [−π, π], that converge to a degeneratedistribution at θ = 0.

For a given random sample of θ1, ..., θN from the circular density f,the Monte-Carlo estimate of f∗ is given by

f(θ) =1

n

n∑i=1

Kn(θi − θ). (2.3)

The kernel given by the wrapped Cauchy density satisfies theassumptions in the above theorem that provides the estimatorproposed in (1.12).



This gives the motivation for considering circular kernels fornonparametric density estimation for circular data as proposed indiscussed in a more detailed by Marzio MD, et al. (2009). However,their development considers circular kernels of order r = 2 thatfurther requires∫ π

−πsinj(θ)Kn(θ)dθ = 0 for 0 < j < 2.

The circular kernel density estimator based on the wrapped Cauchyweights is given by

fWC(θ) =1

n

n∑i=1

fWC(θi − θ). (2.4)

that may be considered more convenient in contrast to the von Miseskernel due to the fact that it does not require computation of anintegral I0(κ).



The circular kernel estimators are also implicit in the estimatorproposed in Jammalmadaka and Sen (2001 , p.243) where the lineardistance θ − θi in the linear kernel density is replaced by angulardistance 1− cos(θ − θi) resulting in the estimator

fJS(θ) =1

nh

n∑i=1

kn

(1− cos(θi − θ)

h

). (2.5)

Remark

One may seek to see if the simple Cardioid distribution given by

fC(θ) =1

2π(1 + ρ cos(θ)), 0 ≤ θ < 2π (2.6)

will be a candidate as circular weight function. One may note that in theextreme case when ρ→ 0, it becomes an uniform distribution, so thequantity ρ is not really a concentration parameter.



Remark

There are various extensions of Cardioid distribution discussed by Abeet al. (2009, 2010, 2013), however they loose the simplicity of theoriginal Cardioid distribution.

These extensions involve replacing cos(θ − µ) bycos(θ − µ+ ν sin(θ − µ)). These are unimodal only for |ν| ≤ 1 andthus we may not get degenerate distribution as |ν| → 1.

For ν = 0 we get the original Cardioid and therefore even thisextension is of not much use. Also, it involves computation of Besselfunction, so it is no better than the von Mises distribution.


Remark

It can be, however, modified to a density

fEC(θ) =2m−1

π(2mm

) (1 + cos(θ))m, 0 ≤ θ < 2π (2.7)

that gets concentrated at zero as m→∞. This has been consideredby Egecioglu and Srinivasan (2000) in the form

w(x) = cm(x) =1

Amcos2m(x/2). (2.8)

where

Am =π

22m−1

(2m

m

). (2.9)

This can be seen to be of the form as in (2.7) by noting that

cos2(x/2) =1

2(1 + cosx). (2.10)



Remark

Marzio et al. (2009) point out the applicability of the uniform kernelon [− π

κ+1 ,πκ+1), Dirichlet and Fejer’s kernels given by

Dκ(θ) =sin{(κ+ 1

2)θ}2π sin(θ/2)

, Fκ(θ) =1

2π(κ+ 1)

[sin({κ+ 1}θ/2)

sin(θ/2)

]2,

(2.11)κ ∈ N.


Transformation Based Kernel Density Estimators

An alternative approach of adapting the linear kernel densityestimator (1.1) to circular data is to transform the interval [−π, π] to(−∞,∞). This may be achieved the stereographic projection, atechnique that has been used in the literature to introduce newfamilies of circular distributions (see e.g. Abe et. al. (2010)), that isgiven by the transformation θ 7→ x = tan( θ2) ∈ (−∞,∞). Let thekernel density estimator given by the data on x be denoted by g(x;h),then the corresponding f(θ) is given by

f(θ;h) =1

1 + cos(θ)g

(sin θ

1 + cos θ;h

). (3.1)

An attractive feature of the above procedure in contrast to Fisher’sadaptation of the linear method is that the latter method gives aperiodic estimator, however the former does not.


Approximation Methods by Orthogonal Functions

Here we will consider approximating continuous bounded functionsf(x) in a compact interval I = [a, b] ⊂ R. For a given nonnegativefunction w(x) defined on I, the L2 weighted norm of f(x) is definedas

‖f‖w2 =

∫ b

a|f(x)|2 w(x)dx. (3.2)

The space of such functions will be denoted by Lw2 . The generalmethod of approximation of functions f ∈ Lw2 involves the set ofbasis functions {ϕk(x)}∞0 and a non-negative weight function w(x)such that

< ϕk, ϕk′ >w=

∫ b

aϕk(x)ϕk′(x)w(x)dx =

{0 for k 6= k′

> 0 for k 6= k′(3.3)



Then for f ∈ Lw2 the partial sum

fN (x) =

N∑k=0

gkϕk(x), (3.4)

where

gk =

∫ b

af(x)ϕk(x)w(x)dx, (3.5)

is considered to be the ‘best’ approximation in terms of the fact thatthe coefficients gk are such that ak = gk minimise

‖f − fN‖w2 =

∫ b

a|f(x)|2 w(x)dx. (3.6)

The original idea is attributed to Cencov (1962) that considered thecosine basis

{ϕ0(x) = 1, ϕj(x) =√

2 cos(πjx), j = 1, 2, ...}and w(x) = 1.



In recent literature, many other type of basis functions includingtrigonometric, polynomial, spline, wavelet and others have beenconsidered. The reader may refer to Devroy and Gyorfi (1985),Efromvich (1999), Hart (1997), Walter (1994) for a discussion ofdifferent bases and their properties.Efromvich (2010) presents an extensive overview of density estimationby orthogonal series concentrated on the interval [0, 1]. As mentionedin Efromvich (2010) the choice of the basis function primarily dependson the support of the function. Thus for the densities on (−∞,∞), oron [0,∞), Hermite and Laguerre series are recommended; see Devroyeand Gyorfy (2001), Walter (1994), Hall (1980) and Walter (1977).For compact intervals, trigonometric (or Fourier) series arerecommended; discussion about these can be found in Cencov (1980),Devroy and Gyorfy (1985), Efromvich (1999), Hart (1997), Silverman(1986), Hall (1981), Tarter and Lock (1993).Classical orthogonal polynomials such as Chebyshev, Jacobi, Legendreand Gegnbauer are also popular; see Trefthen (2013)), Rudzkis andRadavicius (2005) and Buckland (1992).Wavelet bases are becoming increasingly popular, due to their abilityin visualizing local frequency fluctuations and discontinuities, eventhough their explicit form is not available.


Once the basis functions are chosen, the density f(x) for a randomsample {x1, ..., xn} may be estimated by

fN (x) =

N∑k=0

gkϕk(x), (3.7)

where

gk =1

n

n∑i=1

ϕk(xi). (3.8)

Efromvich (2010) discusses in detail various strategies of selecting N,albeit in a more general setting by considering the density estimatorsof the form

f(x) = f(x, {wk}) =

∞∑k=0

wkgkϕk(x) (3.9)

that includes the truncated estimator fJ as well as hard-thresholdingand block-thresholding estimators, commonly studied in the waveletliterature. However, this modification will not be pursued in furtherdiscussion.


Orthogonal Series of Cosine Functions

It can be shown that {1, cos(x), cos(2x), ..} = {cos(kx)}∞k=0 is anorthogonal system on [−π, π]. Assuming that for the circular data,the support of the density is the interval [−π, π], (else we transformthe sample θ1, ..., θn to x1, ..., xn The density estimator for thetransformed data is, therefore, given by

fOC(x) =1

2π+

N∑k=1

gk cos(kθ). (3.10)

where

gk =

nπ∑i=1

cos(kθi).

N is considered a smoothing parameter and may be determined usingthe cross-validation method described in Efromvich (2010).


Orthogonal Series of Cosine Functions

A common problem with truncation and cosine series estimator is thatit may not produce a true density. In order to alleviate this problemEfromvich (1999) considers L2 projection of f onto a class ofnon-negative densities given by

f(x) = max(0, f(x)− c), (3.11)

where c is chosen to make f a proper density.A 2π periodic function h(x) may be approximated by a truncatedFourier series as

f(x) ≈=1

2a0 +

N∑1

{ak cos(kx) + bk sin(kx)}, (3.12)

where,

ak =1

π

∫ π

−πf(x) cos(kx), k = 0, 1, 2, ...N ; (3.13)

bk =1

π

∫ π

−πf(x) sin(kx), k = 1, 2, ...N. (3.14)


Fourier Series

In the context of the circular data {θ1, ..., θn} on support [−π, π],a0 = 1

π , and the unknown coefficients can be estimated by

ak =1

nπ

n∑i=1

cos(kθi), bk =1

nπ

n∑i=1

sin(kθi); k = 1, 2.... (3.15)

Thus, the Fourier series density estimator is given by

fFS(θ) =1

2a0 +

N∑1

{ak cos(kθ) + bk sin(kθ)}. (3.16)


Chebyshev Polynomials

Orthogonal polynomials of the Chebyshev’s class on [−1, 1] can beconverted to orthogonal polynomials on a circle C = {z|‖z‖ = 1}through the transformation

x =1

2(z + z−1).

This has been quite popular in numerical approximation of functions(see for example Trefethen (2013), Chapter 3). the kth Chebyshevpolynomial can be defined by the real part of the function zk on theunit circle:

x =1

2(z + z−1) = cos θ, θ = cos−1 x, (3.17)

Tk(x) =1

2(zk + z−k) = cos(kθ). (3.18)

The following theorems justify the use of orthogonal polynomialestimators (see Rudin (1976)).

Theorem

If h is Lipschitz continuous on [−1, 1], it has a unique representation asChebyshev series,

h(x) =1

2a0 +

∞∑k=1

akTk(x), (3.19)

which is absolutely and uniformly convergent. The coefficients are given bythe formula

ak =2

π

∫ 1

−1

h(x)Tk(x)√(1− x2)

dx =1

π

∫ π

−πh(cos θ) cos(kθ)dθ, (3.20)

and for k = 0, by the same formula with the factor 2/π changed to 1/π.


Chebyshev Polynomials

Use of Chebyshev polynomials for density estimation for the circulardata requires the correspondence

f(θ) = h(x) : x = cos θ, θ ∈ (−π, π). (3.21)

The density estimator then becomes

fJ(θ) =1

2π+

1

π

N∑k=1

gk cos(kθ),−π ≤ θ ≤ π, (3.22)

where

gk =1

n

n∑i=1

cos(kθi); k = 1, 2, ...

that gives the same estimator as those considered by considering theorthogonal functions {cos(kx)}∞k=0 discussed earlier.l


Legendre Polynomials

The Chebyshev weight function is singular at the extremes of theinterval of support. Arbitrary power singularities may be assigned toeach extreme giving a general weight function

w(x) = (1− x)α(1 + x)β (3.23)

where α, β > 1 are parameters. The associated polynomials are

known as Jacobi polynomials, usually denoted as {P (α,βn }.

The special case α = β, gives orthogonal polynomials that are knownas as Gegenauer or ultraspherical polynomials and are subject of muchdiscussion in numerical analysis; see Koornwinder et al. (2010).

The most special case of all α = β = 0 gives a constant weightfunction and produces what are known as Legendre polynomialsdenoted by Pn(x), n = 0, 1, 2, .... that define a orthogonal system forthe interval [−1, 1].



They may be simply described as

P0(x) = 1, P1(x) = x (3.24)

and the recurrence relation(n+ 1)Pn+1(x) = (2n+ 1)xPn(x)− nPn−1(x).

An explicit representation may be given by the following formula:

Pn(x) = 2nn∑k=0

(n

k

)(n+k−12

n

)xk (3.25)



This avoids the possible numerical problem in computing thecoefficients due to singularity at the extremes. Hence this will be apreferred alternative to the Chebyshev polynomials. In this case thedensity estimator is given by and the density estimator in the originalscale is given by

fLP (θ) =1

2π+

1

π

N∑k=1

gkPk (cos θ) (3.26)

where

gk =1

n

n∑i=1

Pk (cos θi) . (3.27)


Derivative of DF Estimator

The justification of the circular kernel method for smoothing thedensity function may be taken over for finding a smooth estimator ofthe distribution function. Babu and Chaubey (2006) considerestimating the distributions defined on a hypercube, extending theunivariate Bernstein polynomials (Babu, Chaubey and Canty (2002),Vitale (1973)). Denoting the empirical distribution function of arandom sample of n− θ values the Bernstein polynomial estimator ofthe distribution function is defined as

Bm(x) =

m∑j=0

Fn

(j

m

)(m

j

)xj(1− x)m−j , x ∈ [0, 1]. (4.1)


Bernstein Polynomials

The derivative of Bm(x) is proposed at the Bernstein polynomialdensity estimator for x ∈ [0, 1] that is given by

B′m(x) =

m∑j=1

[Fn

(j

m

)− Fn

(j − 1

m

)]β(x; j,m− j + 1), x ∈ [0, 1],

(4.2)where β(x; a, b) is given by

β(x; a, b) =1

B(a, b)xa−1(1− x)b−1, (4.3)

and B(a, b) = (a+ b− 1)!/[(a− 1)!(b− 1)!].Transforming the interval [0, 2π] to [0, 1], to get the density estimatorof Θ/(2π) and transforming back we get the density estimator off(θ) as

fB(θ) =1

2π

m∑j=1

[Fn

(2πj

m

)− Fn

(2π(j − 1)

m

)]β

(θ

2π; j,m− j + 1

)(4.4)



Carnicero et al. (2010) note that this does not provide a periodicestimator of the density and propose a modification that requires indefining the distribution function from an origin ν, denoted by F (ν)

such that

F (ν)n (θ)

(2π

m

)= 1− F (ν)

n (θ)

(2π(m− 1)

m

).

In practice this requires estimating ν from the data; maximumlikelihood is recommended, however, this does not guarantee that theequation will be satisfied. In this case it is recommended to averagethe two values on both sides of the above and replace the first andthe last weight of the beta function by this value.

Below we show that aperiodic density estimator is obtained, if wefollow the idea behind Theorem 1.1 in approximating F (θ). We quotethe following result from Feller (1965§4.2) as used in Babu, Cantyand Chaubey (2002).



Theorem

Let u be any bounded and continuous function and Ψx,n, n = 1, 2, ... be afamily of distributions with mean µn(x) and variance vn(x) such thatµn(x)→ x and h2n(x)→ 0. Then

u∗(x) =

∫ ∞−∞

u(t)dΨx,n(t)→ u(x). (4.5)

The convergence is uniform in every subinterval in which hn(x)→ 0uniformly and u is uniformly continuous.

Replacing u(t) by F (t) we get an uniformly convergent approximationof F given by

F ∗(θ) =

∫ 2π

0F (η)dΨθ,n(η), 0 ≤ θ < 2π. (4.6)



Bernstein polynomial estimator is obtained by choosing Psix,ndefined on the support [0, 1] by attaching the binomial weight(mj

)xj(1− x)m−j to the point j/m; this makes µn = x and

h2n = x(1− x)/n and these satisfy the conditions in the abovetheorem.

However, if the support is [0, 2π), a prudent choice of Psiθ,n may bea circular distribution itself, e.g. von Mises distribution (see (??) )vM(θ, κn), with mean θ and concentration parameterκn → 0asn→∞. Note that we can write (5.5) as

F ∗(θ) = F (2π)Ψθ,n(2π)−∫ 2π

0Ψθ,n(η)f(η)dη

= 1−∫ 2π

0Ψθ,n(η)dF (η) (4.7)



Since F is unknown, in order to estimate F ∗, we use a plug-inestimate resulting into an smooth estimator of F given by

F (θ) = 1−∫ 2π

0Ψθ,n(η)dFn(η)

= 1− 1

n

n∑i=1

Ψθ,n(θi) (4.8)

Let

dn(θ; η) =dΨθ,n(η)

dθ(4.9)

then the proposed smooth estimator of f(θ) is given by

f(θ) = − 1

n

n∑i=1

dn(θ; θi). (4.10)



Considering circular distributions with mean µ and concentrationparameter kn → 0 as n→∞, let the density function correspondingto Ψθ,n correspond to a location family given by ψ(θ − µ;κ) that hasmean µ and concentration parameter κ, then

dn(θ; η) = −ψ(η − θ;κ) (4.11)

and the density estimator f(θ) becomes

f(θ) =1

n

n∑i=1

ψ(θi − θ;κ)). (4.12)

which is of the same form as the circular kernel density estimatorgiven in (2.7).



Candidates for symmetric densities concentrated at zero may beconstructed from stereographic projection of the interval [−1, 1], say,to the circle. For example, consider the beta-type distribution givenby Seshadri (1991) that is obtained from a symmetricbeta-distribution by the real Mobius transformation,

w(x;α, γ) =(1− α2)γ

B(γ, 12)

(1− x2)γ−1

(1− αx)2γ, − 1 < x < 1. (4.13)

Considering α = 0, we get a symmetric distribution over x ∈ [−1, 1].The stereographic projection of this interval over [0, 2π) is given byx = sin θ. This gives

ψ(η; θ, n) =1

2

1

B(n+12 , 12)

[cos

(η − θ

2

)]n, 0 ≤ η < 2π. (4.14)

This gives a circular symmetric (around η = π) distribution that getsmore and more concentrated as n→∞. This gives similar typeestimator as given by the circular kernel in (2.7).


ISP of Bernestein Polynomial Estimator

The interval [0, π/2] can be mapped to the interval [0, 1] by thetransformation

x 7→ θ = sin−1 x : θ ∈ [0, π/2] (4.15)

Hence the angles θ ∈ [0, 2π) may be converted to [0, 1] by thetransformation

x = sin(θ/4)

This transforms the Bernstein polynomial to a periodic function givenby

fBS(θ) =1

4cos

(θ

4

)B′m

(sin

(θ

4

))(4.16)

=1

4cos

(θ

4

) m∑j=1

[Gn

(j

m

)−Gn

(j − 1

m

)]wj(θ) (4.17)

=1

4cos

(θ

4

) m∑j=1

[Fn

(sin

j

4m

)− Fn

(sin

j − 1

4m

)]wj(θ),(4.18)

where

wj(θ) = β

(sin

(θ

4

); j,m− j + 1

)and Gn represents the empirical distribution function of thetransformed data.


A Connection Between the Circular Kernel DensityEstimator and the Orthogonal Series

Let D be the open unit disk, {z | |z| < 1}, in Z and let µ be acontinuous measure defined on the boundary ∂D, i.e. the circle{z | |z| = 1}. The point z ∈ D will be represented by z = reiθ forr ∈ [0, 1), θ ∈ [0, 2π) and i =

√−1.

A standard result in complex analysis involves the Poissonrepresentation that involves the real and complex Poisson kernels thatare defined as

Pr(θ, ϕ) =1− r2

1 + r2 − 2r cos(θ − ϕ)(5.1)

for θ, ϕ ∈ [0, 2π) and r ∈ [0, 1) and by

C(z, ω) =ω + z

ω − z(5.2)

for ω ∈ ∂D and z ∈ D.Yogendra Chaubey (Concordia University) Department of Mathematics & Statistics December 22-24, 2016 44 / 74


The connection between these kernels is given by the fact that

Pr(θ, ϕ) = Re C(reiθ, eiϕ) = (2π)fWC(θ;ϕ, ρ). (5.3)

The Poisson representation says that if g is analytic in aneighborhood of D with g(0) real, then for z ∈ D,

g(z) =

∫ (eiθ + z

eiθ − z

)Re(g(eiθ))

dθ

2π(5.4)

(see [42, p. 27]).

This representation leads to the result (see (ii) in §5 of [42]) that forLebesgue a.e. θ,

limr↑1

W (reiθ) ≡W (eiθ) (5.5)

exists and if dµ = w(θ) dθ2π + dµs with dµs singular,


A Connection Between the Circular Kernel DensityEstimatorand the Orthogonal Series

Thenw(θ) = ReW (eiθ), (5.6)

where

W (z) =

∫ (eiθ + z

eiθ − z

)dµ(θ). (5.7)



Our strategy for smooth estimation is the fact that for dµs = 0 wehave

f(θ) =1

2πlimr↑1

Re W (reiθ), (5.8)

where, now W (z) is defined as

W (z) =

∫ (eiθ + z

eiθ − z

)f(θ)dθ. (5.9)

We define the estimator of f(θ) motivated by considering anestimator of W (z), the identity (5.6) and (5.8), i.e.

fr(θ) =1

2πRe Wn(reiθ) (5.10)

where

Wn(reiθ) =1

N

N∑j=1

(eiθj + reiθ

eiθj − reiθ

), (5.11)

where r has to be chosen appropriately. Recognize that

Wn(reiθ) =1

n

n∑j=1

C(z, ωj), (5.12)

where ωj = eiθj .



Then using (5.4), we have

Re Wn(reiθ) =1

n

n∑j=1

Pr(θ, θj), (5.13)

and therefore

fr(θ) =1

(2π)n

n∑j=1

Pr(θ, θj)

=1

n

n∑j=1

fWC(θ; θj , r), (5.14)

that is of the same form as in (1.12).


Orthogonal Series on a Circle

We get the Fourier expansion of W (z) with respect to the basis{1, z, z2, ...} as

W (z) = 1 + 2

∞∑j=1

cjzj (5.15)

where

cj =

∫e−ijθf(θ)dθ,

is the jth trigonometric moment.

The series is truncated at some term N∗ so that the the error isnegligible.

However, we show below that estimating the trigonometric momentcn, n = 1, 2, ... as

cj =1

n

n∑k=1

e−ijθk ,

the estimator of W (z) is the same as given in the previous section.



This can be shown by writing

W (z) = 1 +2

n

n∑j=1

{∞∑k=1

e−ikθjzk}

= 1 +2

n

n∑j=1

{∞∑k=1

(ωjz)k};ωj = eiθj

= 1 +2

n

n∑j=1

(ωjz

1− ωjz

)

=2

n

n∑j=1

(1

2+

ωjz

1− ωjz

)

=1

n

n∑j=1

(1 + ωjz

1− ωjz

)=

1

n

n∑j=1

C(z, ωj),

which is the same as Wn(z) given in (5.12).



Remark:

This ensures that the orthogonal series estimator of the densitycoincides with the circular kernel estimator.

The determination of the smoothing constant may be handled basedon the cross validation method outlined in [46]

Note that the simplification used in the above formulae does not workfor r = 1. Even though, the limiting form of (5.15) is used to definean orthogonal series estimator as given by

fS(θ) =1

2π+

1

πn

n∑j=1

n∗∑k=1

cos k(θ − θj), (5.16)

where n∗ is chosen according to some criterion, for example tominimize the integrated squared error.



Thus the above discussion presents two contrasting situations: in onewe have to determine the number of terms in the series and in theother number of terms in the series is allowed to be infinite, however,we choose to evaluate Re W (eiθ) for some r close to 1 as anapproximation to Re W (eiθ).

Considering above discussions, we provide below some examples usingthe von Mises and wrapped Cauchy kernels and contrast them withusing the inverse stereographic kernels using the normal and logisticdistributions.

It is seen that wrapped Cauchy may provide estimators that are notas smooth as those provided by other methods. Further investigationof these methods is in progress.


Turtle Data

The following Table gives the measurements of the directions taken by76 turtles after treatment from Appendix B.3 in Fisher (1993). Thisdata has been analysed in Fisher (1989) and Prakasa Rao (1986.)8,38,50,64,83,98,204,257, 9,38,53,65,88,100,215,268,13,40,56,65,88,103,223,285, 13,44,57,68,88,106,226,319,14,45,58,70,90,113,237,343, 18,47,58,73,92,118,238,350,22,48,61,78,92,138,243, 27,48,63,78,93,153,244,30,48,64,78,95,153,250, 34,48,64,83,96,155,251


Turtle Data - WC Kernel:

Histogram and Circular Kernel Density

Turtle Data, Stefens(1969)Theta

Dens

ity

0 1 2 3 4 5 6 7

0.00.1

0.20.3

0.4 Rho=.6Rho=.7Rho=.8

Figure: 1. Histogram and Circular Kernel Density with wrapped Cauchy;ρ = .6, .7, .8.


Turtle Data - VM Kernel:

Histogram and Circular Kernel VM Density

Turtle Data, Stefens(1969)Theta

Dens

ity

0 1 2 3 4 5 6 7

0.00.1

0.20.3

0.4 Kappa=3Kappa=4Kappa=5

Figure: 2. Histogram and Circular Kernel Density with von Mises; κ = 3, 4, 5.


Turtle Data - ISC Normal Kernel:

Histogram and Circular Kernel ISCNorm Density

Turtle DataTheta

Dens

ity

0 1 2 3 4 5 6 7

0.00.1

0.20.3

0.4 sigma=.25sigma=.3sigma=.35

Figure: 3. Histogram and Circular Kernel Density with ISCNorm Kernel;σ = .25, .3, .35.


Turtle Data - ISC Logistic Kernel:

Histogram and Circular Kernel ISCLogistic Density

Turtle DataTheta

Dens

ity

0 1 2 3 4 5 6 7

0.00.1

0.20.3

0.4 sigma=.15sigma=.25sigma=.35

Figure: 4. Histogram and Circular Kernel Density with ISCLogistic Kernel;σ = .15, .25, .35.


Turtle Data

The following Table gives the measurements of the directions chosenby 100 ants in response to an evenly illuminated black target fromAppendix B.7 in Fisher (1993).330,180,160,200,180,190,10,160,110,140,290,160,200,180,160,210,220,180,270,40,60,280,190,120,210,220,180,120,180,300,200,180,250,200,190,200,210,150,200,80,200,170,180,210,180,60,170,300,180,210,180,190,30,130,230,260,90,190,140,200,280,180,200,30,50,110,160,220,360,170,220,140,180,210,150,180,180,160,150,200,190,150,200,200,210,220,170,70,160,210,180,150,350,230,180,170,200,190,170,190


Ants Data - WC Kernel:

Histogram and Circular Kernel WC Density

Ants DataTheta

Dens

ity

0 1 2 3 4 5 6 7

0.00.1

0.20.3

0.40.5 Rho=.81

Rho=.82Rho=.83

Figure: 5. Histogram and Circular Kernel Density with wrapped Cauchy;ρ = .81, .84, .86.


Ants Data - VM Kernel:

Histogram and Circular Kernel VM Density

Ants DataTheta

Dens

ity

0 1 2 3 4 5 6 7

0.00.1

0.20.3

0.40.5 Kappa=5

Kappa=6Kappa=7

Figure: 6. Histogram and Circular Kernel Density with von Mises Kernel;κ = 10, 20, 40.


Ants Data - ISC Normal Kernel:

Histogram and Circular Kernel ISCNorm Density

Ants DataTheta

Dens

ity

0 1 2 3 4 5 6 7

0.00.1

0.20.3

0.40.5 sigma=.2

sigma=.25sigma=.3

Figure: 7. Histogram and Circular Kernel Density with ISCNorm kernel;σ = .2, .25, .3.


Turtle Data - ISC Logistic Kernel:

Histogram and Circular Kernel ISCLogistic Density

Ants DataTheta

Dens

ity

0 1 2 3 4 5 6 7

0.00.1

0.20.3

0.40.5 sigma=.15

sigma=.25sigma=.35

Figure: 8. Histogram and Circular Kernel Density with ISCNorm kernel;σ = .15, .25, .35.


Conclusion:

The von Mises kernel seems to provide smoother plots as compared to WCbut they are qualitatively the same. The estimators obtained by IStransformation also produce similar results as to those given by the circularkernel estimators. Smoothing parameter may be selected using theproposal described in Taylor (2008). An enhanced strategy is toinvestigate a range of values around the value given by cross-validation.


References1 Abe, T. and Pewsey, A. (2011). Symmetric Circular Models Through

Duplication and Cosine Perturbation. Computational Statistics &Data Analysis, 55(12), 3271–3282.

2 Bai, Z.D., Rao, C.R., Zhao, L.C. (1988). Kernel Estimators ofDensity Function of Directional Data. Journal of MultivariateAnalysis, 27(1), 24–39.

3 Babu, G. J.; Canty, A.; Chaubey, Y. (2002). Application of Bernsteinpolynomials for smooth estimation of a distribution and densityfunction. J. Statist. Plann. Inference , 105, 377-392.

4 Babu, G. Jogesh, and Chaubey, Yogendra P. (2006). Smoothestimation of a distribution and density function on hypercube usingBernstein polynomials for dependent random vectors. Statistics andProbability Letters, 76, 959-969.

5 Buckland S.T. (1992). Fitting density functions with polynomials. J.R. Stat. Soc., A41, 6376.


References6 Carnicero, J.A., Wiper, M.P. and Ausın, M.C. (2010). Circular

Bernstein polynomial distributions. Working Paper 10-25, Statisticsand Econometrics Series 11, Departamento de Estadstica UniversidadCarlos III de Madrid. e-archivo.uc3m.es/bitstream/handle/

10016/8318/ws102511.pdf?sequence=1

7 Cencov, N.N. (1980). Evaluation of an unknown distribution densityfrom observations. Soviet Math. Dokl., 3, 15591562.

8 Cencov, N.N. (1980).Statistical Decision Rules and OptimumInference. New York: Springer-Verlag.

9 Chaubey, Yogendra P. (2016). Smooth Kernel Estimation of aCircular Density Function: A Connection to Orthogonal Polynomialson the Unit Circle. Preprint, https://arxiv.org/abs/1601.05053

10 Chaubey, Yogendra P.; Li, J.; Sen, A. and Sen, P.K. (2012). A newsmooth density estimator for non-negative random variables. Journalof the Indian Statistical Association, 50, 83-104.


e-archivo.uc3m.es/bitstream/handle/10016/8318/ws102511.pdf?sequence=1

e-archivo.uc3m.es/bitstream/handle/10016/8318/ws102511.pdf?sequence=1

https://arxiv.org/abs/1601.05053

References

11 Devroye, L. and Gyorfi, L. (1985). Nonparametric Density Estimation:The L1 View. New York: John Wiley & Sons

12 Di Marzio, M., Panzera, A., Taylor, C.C. (2009). Local PolynomialRegression for Circular Predictors. Statistics & Probability Letters,79(19), 2066–2075.

13 Di Marzio M, Panzera A, Taylor C. C. (2011). Kernel densityestimation on the torus. Journal of Statistical Planning & Inference,141, 2156-2173.

14 Efromvich, S. (1999). Nonparametric Curve Estimation: Methods,Theory, and Applications, New York: Springer.

15 Efromvich, S. (2010). Orthogonal series density estimation. WileyInterdisciplinary Rreviews, 2, 467-476.


References

16 Egecioglu, Omer and Srinivasan, Ashok (2000). Efficientnonparametric density estimation on the sphere with applications influid mechanics. Siam J. Sci. Comput. 22 152-176.

17 Feller, W. (1965). An Introduction to Probability Theory and itsApplications, Vol. II. New York: Wiley.

18 Fernandez-Duran, J.J. (2004). Circular distributions based onnonnegative trigonometric sums, Biometrics, 60, 499–503.

19 Fisher, N.I. (1989). Smoothing a sample of circular data, J.Structural Geology, 11, 775–778.

20 Fisher, N.I. (1993). Statistical Analysis of Circular Data. CambridgeUniversity Press, Cambridge.


References

21 Hall P. (1980). Estimating a density on the positive half line by themethod of orthogonal series. Ann. Inst. Stat. Math. 32, 351362.

22 Hall P. (1981). On trigonometric series estimates of densities.Ann.Stat. 9, 683685.

23 Hall P, Watson GP, Cabrera J (1987). Kernel Density Estimation forSpherical Data. Biometrika, 74(4), 751–762.

24 Hart JD. (1997). Nonparametric Smoothing and Lack-of-Fit Tests.New York: Springer.

25 Jammalamadaka, S. R., SenGupta, A. (2001). Topics in CircularStatistics. World Scientific, Singapore.


References26 Jones, M.C. and Pewsey, A (2012). Inverse Batschelet Distributions

for Circular Data. Biometrics, 68(1), 183–193.

27 Kato, S. and Jones, M.C. (2010). A family of distributions on thecircle with links to, and applications arising from, Mobiustransformation. J. Amer. Statist. Assoc. 105, 249-262.

28 Kato, S. and Jones, M. C. (2015). A tractable and interpretablefour-parameter family of unimodal distributions on the circle.Biometrika, 102(9), 181–190.

29 Klemela, J. (2000). Estimation of Densities and Derivatives ofDensities with Directional Data. Journal of Multivariate Analysis,73(1), 18–40.

30 Tom H. Koornwinder, T.H., Wong, R.S.C., Koekoek R., Swarttouw,R.F. (2010). Orthogonal polynomials. In NIST Handbook ofMathematical Functions, Eds: Olver, Frank W. J.; Lozier, Daniel M.;Boisvert, Ronald F.; Clark, Charles W., London: Cambridge UniversityPress.


References

31 Mardia, K.V. (1972). Statistics of Directional Data. Academic Press,New York.

32 Mardia, K.V. and Jupp, P. E. (2000). Directional Statistics. JohnWiley & Sons, New York, NY, USA.

33 Mhaskar, H.N. and Pai, D.V. (2000). Fundamentals of approximationTheory. Narosa Publishing House, New Delhi, India.

34 Minh, D. and Farnum, N. (2003). Using bilinear transformations toinduce probability distributions. Commun. Stat.-Theory Meth., 32,1–9.

35 Mooney A., Helms P.J. and Jolliffe, I.T. (2003). Fitting mixtures ofvon Mises distributions: a case study involving sudden infant deathsyndrome. Comput. Stat. Data Anal., 41, 505– 513.


References

36 Parzen, E. (1962). On Estimation of a Probability Density Functionand Mode. Ann. Math. Statist, 33:3, 1065-1076.

37 Prakasa Rao, B.L.S. (1983). Non Parametric Functional Estimation.Academic Press, Orlando, Florida.

38 Rudin, W. (1987). Real and Complex Analysis- Third Edition.McGraw Hill: New York.

39 Rudzkis R, Radavicius M. (2005). Adaptive estimation of distributiondensity in the basis of algebraic polynomials. Theory Probab Appl,49, 93109.

40 Rosenblatt, M. (1956). Remarks on some nonparametric estimates ofa density function. Ann. Math. Stat., 27:56, 832837


References

41 Seshadri, V. (1991). A family of distributions related to theMcCullagh family. Statistics and Probability Letters, 12, 373–378.

42 Simon, B. (2005). Orthogonal Polynomials on the Unit Circle, Part 1:Classical Theory. American Mathematical Society, Providence, RhodeIsland.

43 Shimizu, K. and Iida, K. (2002). Pearson type vii distributions onspheres. Commun. Stat.–Theory Meth., 31, 513–526.

44 Silverman B. W. (1986). Density Estimation for Statistics and DataAnalysis. London: Chapman & Hall.

45 Tarter, M.E. and Lock, M. D. (1993). Model-Free Curve Estimation.New York: Chapman & Hall.


References

46 Taylor, C.C. (2008). Automatic Bandwidth Selection for CircularDensity Estimation. Computational Statistics & Data Analysis, 52(7),3493–3500.

47 Trefthen, L. N. (2013). Approximation Theory and ApproximationPractice. SIAM: Philadelphia

48 Vitale, R.A. (1973). A Bernstein polynomial approach to densityestimation. Commun Stat, 2, 493-506.

49 Walter, G. G. (1977). Properties of Hermite series estimation ofprobability density. Ann. Stat., 5, 12581264.

50 Walter, G. G. (1994). Wavelets and other Orthogonal Systems withApplications. London: CRC Press.


Talk slides will be available on SlideShare:www.slideshare.net/ychaubey/talk-slides-msast2016-70014046

THANKS!!


www.slideshare.net/ychaubey/talk-slides-msast2016-70014046

talk slides imsct2016

Education