Geometric and combinatorial issues in data depth
Greg Aloupis, Université Libre de Bruxelles
What is data depth?
A quantitative measurement of how central a point is with respect to a data set.
Goals: to be able to rank data points, and to find the center of the data cloud.
Some geometric bivariate medians
Convex hull peeling (Tukey ’70s): ’85 Chazelle, O(n log n)
Halfspace median (Tukey ’74): ’01 Langerman-Steiger, O(n log^3 n); ’03 Chan, O(n log n), randomized
Oja median (Oja ’83): ’01 G.A.-Langerman-Soss-Toussaint, O(n log^3 n)
Simplicial median (Liu ’88): ’01 ALST, O(n^4)
Convex hull peeling
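Peeling can be sketched as repeatedly removing the convex hull and recording the layer on which each point disappears. The sketch below is a plain quadratic-style illustration, not Chazelle's O(n log n) algorithm. The function names are hypothetical, and it assumes distinct points with no three collinear.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Hull vertices via Andrew's monotone chain (strict turns only)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def peeling_depths(points: List[Point]) -> Dict[Point, int]:
    """Map each point to the index (1, 2, ...) of the layer that removes it."""
    remaining = set(points)
    depth: Dict[Point, int] = {}
    layer = 1
    while remaining:
        hull = convex_hull(list(remaining))
        for p in hull:
            depth[p] = layer
        remaining -= set(hull)
        layer += 1
    return depth


# Assumed toy data: two nested squares around a center point.
S = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (3, 1), (3, 3), (1, 3), (2, 2)]
depths = peeling_depths(S)
```

The points on the last non-empty layer (here the single center point) play the role of the peel median.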
Halfspace, simplicial and Oja depths of a point in a bivariate data set S
Each median is a point with max/min depth
(Tukey) halfspace depth: For every line through the query point, count the data points on each side (above/below). Return the minimum number counted over all lines.
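A minimal O(n^2) sketch of this definition, assuming the closed-halfplane convention (points on the line, or equal to the query point, are counted). The name `halfspace_depth` is hypothetical. The depth equals n minus the largest number of points that fit in an open halfplane bounded by a line through the query point, and that maximum is attained by a half-turn starting at the direction of some data point. With integer coordinates the arithmetic is exact.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def halfspace_depth(q: Point, S: List[Point]) -> int:
    """Minimum number of points of S in a closed halfplane with q on its boundary."""
    # Directions from q to every data point that differs from q.
    dirs = [(x - q[0], y - q[1]) for x, y in S if (x, y) != q]
    best_open = 0
    for dx, dy in dirs:
        count = 0
        for rx, ry in dirs:
            c = dx * ry - dy * rx  # cross product: sign gives the side
            # Count directions in the half-open half-turn [d, d + 180 degrees).
            if c > 0 or (c == 0 and dx * rx + dy * ry > 0):
                count += 1
        best_open = max(best_open, count)
    return len(S) - best_open


# Assumed toy data: the corners of a square.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
center_depth = halfspace_depth((1, 1), square)
corner_depth = halfspace_depth((0, 0), square)
outside_depth = halfspace_depth((5, 5), square)
```

This is quadratic; the O(n log n) methods sort the directions by angle instead of rescanning them.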
(Liu) simplicial depth: Count the closed triangles with vertices in S that contain the query point.
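A brute-force O(n^3) sketch of this count, checking every triple of data points. The function names are hypothetical, and the code assumes no three data points are collinear (a degenerate triangle would be miscounted).

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle (a, b, c)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_closed_triangle(q: Point, a: Point, b: Point, c: Point) -> bool:
    """True if q is inside or on the boundary of triangle (a, b, c)."""
    d1, d2, d3 = orient(a, b, q), orient(b, c, q), orient(c, a, q)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def simplicial_depth(q: Point, S: List[Point]) -> int:
    """Number of closed triangles with vertices in S that contain q."""
    return sum(in_closed_triangle(q, a, b, c) for a, b, c in combinations(S, 3))


# Assumed toy data: the corners of a square.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
center_depth = simplicial_depth((2, 2), square)
off_center_depth = simplicial_depth((1, 2), square)
outside_depth = simplicial_depth((5, 5), square)
```

The center lies on both diagonals, so it is on the boundary of all four triangles and gets depth 4 under the closed-triangle convention.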
Oja depth: Sum the areas of all triangles whose vertices are the query point and two data points s_i, s_j.
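A direct O(n^2) sketch of this sum over all pairs of data points. The name `oja_depth` is hypothetical. Unlike the other two depths, a smaller value means a more central point, so the Oja median minimizes it.

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def oja_depth(q: Point, S: List[Point]) -> float:
    """Sum of the areas of all triangles (q, s_i, s_j) with s_i, s_j in S."""
    total = 0.0
    for a, b in combinations(S, 2):
        # Half the absolute cross product of (a - q) and (b - q) is the area.
        total += abs((a[0] - q[0]) * (b[1] - q[1])
                     - (a[1] - q[1]) * (b[0] - q[0])) / 2
    return total


# Assumed toy data: a right triangle.
S = [(0, 0), (2, 0), (0, 2)]
at_vertex = oja_depth((0, 0), S)
far_away = oja_depth((2, 2), S)
```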
Time to compute the depth of a point:
O(n log n): Khuller-Mitchell ’89, Gil-Steiger-Wigderson ’92, Rousseeuw-Ruts ’96
Ω(n log n): G.A.-Cortes-Gomez-Soss-Toussaint ’01, Langerman-Steiger ’01, G.A.-McLeish ’05
Issue 1: What is the complexity of computing the depth k of a point if k is known to be small/large?
If the peel median has depth k > 1 then can we compute it faster? (GSW ’92)
!!! this just in: simplicial depth in O(n + n log(1 + k/n)), Elmasry-Elbassioni, CCCG last week
Is there a lower bound, sensitive to parameter k?
Something similar for halfspace depth? Current attempts for O(n log k)
Issue 2: (Improve) simplicial median computation
Remember that horrible n^4 result a few slides back
Easy observation
I: the set of line segments between pairs of points in S.
The simplicial median is on an intersection of two segments in I.
Outline of a method
Preprocessing: O(n^3) brute-force, actually O(n^2)
Count number of points above/below each segment.
Compute depth of all points.
For each segment, sort all intersections with other segments: O(n^2 log n).
Calculate depth of each intersection in O(1) time: O(n^2)
Overall O(n^4 log n)
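The observation that a simplicial median lies on an intersection of two segments can be checked on tiny inputs with a naive search. The sketch below enumerates all intersections (data points included, as intersections of segments sharing an endpoint) and evaluates each candidate by brute force. It is only an O(n^7) illustration, not the O(n^4 log n) method outlined above, which updates the depth in O(1) per intersection. All names are hypothetical, and the input is assumed to have integer coordinates with no three points collinear.

```python
from fractions import Fraction
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle (a, b, c)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def simplicial_depth(q, S):
    """Brute-force count of closed triangles of S that contain q."""
    depth = 0
    for a, b, c in combinations(S, 3):
        d = (orient(a, b, q), orient(b, c, q), orient(c, a, q))
        if not (min(d) < 0 and max(d) > 0):
            depth += 1
    return depth


def segment_intersection(a, b, c, d):
    """Exact intersection of closed segments ab and cd, or None."""
    rx, ry = b[0] - a[0], b[1] - a[1]
    sx, sy = d[0] - c[0], d[1] - c[1]
    denom = rx * sy - ry * sx
    if denom == 0:  # parallel segments: no single crossing point
        return None
    wx, wy = c[0] - a[0], c[1] - a[1]
    t = Fraction(wx * sy - wy * sx, denom)  # position along ab
    u = Fraction(wx * ry - wy * rx, denom)  # position along cd
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (a[0] + t * rx, a[1] + t * ry)
    return None


def simplicial_median_bruteforce(S):
    """Return (point, depth) maximizing simplicial depth over all candidates."""
    segments = list(combinations(S, 2))
    candidates = set()
    for (a, b), (c, d) in combinations(segments, 2):
        p = segment_intersection(a, b, c, d)
        if p is not None:
            candidates.add(p)
    return max(((p, simplicial_depth(p, S)) for p in candidates),
               key=lambda pair: pair[1])


# Assumed toy data: the corners of a square.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
median_point, median_depth = simplicial_median_bruteforce(square)
```

For the square, the only candidates are the four corners (depth 3 each) and the crossing of the diagonals (depth 4).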
Constant time to update the depth as we walk along a segment, from one intersection to the next
Instead of sorting intersection points and processing each segment alone, we can use topological sweep.
The time complexity becomes O(n^4) and the space used is O(n^2).
Can we improve this? i.e. find some structure in this depth function
Conjecture: A point of maximum simplicial depth can always be found on the intersection of two halving segments.
(weak) experiments have not contradicted this
Desirable properties of data depth functions
Affine invariance (at the very least)
Robustness: Outliers should not influence the center.
Monotonicity: Center should move in “same” direction as perturbations
monotonicity
Robustness to outliers
Breakdown point: fraction of data that must be moved/added so that the median is placed at infinity.
Oja median: was considered to be robust, but finally it was shown that the breakdown point can be near zero for certain configurations (planar case) (Niinimaa, Oja, Tableman ’90).
Simplicial median: don’t know. But the data point of maximum depth can be moved away with few corrupting points (GSW ’92) (planar case).
Halfspace median: great! … 1/(d+1) (Donoho, Gasko ’92)
Max breakdown = ½
In 1D, only the median is affine invariant, monotonic and has max breakdown.
Is there such an estimator in higher dimensions?
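A small 1D experiment makes the ½ bound concrete. The sketch below moves the k largest values of a sample far away and reports the smallest k for which an estimator leaves the range of the clean data. The helper name, the "far" value and the leave-the-range criterion are all assumptions chosen for illustration; this is the "moved" flavor of breakdown, not the "added" one.

```python
import statistics
from typing import Callable, List, Sequence


def smallest_breaking_count(estimator: Callable[[Sequence[float]], float],
                            data: List[float],
                            far: float = 10**9) -> int:
    """Smallest number of moved points that pushes the estimate out of range."""
    lo, hi = min(data), max(data)
    clean = sorted(data)
    n = len(clean)
    for k in range(1, n + 1):
        # Replace the k largest values by a far-away outlier.
        corrupted = clean[: n - k] + [far] * k
        if not lo <= estimator(corrupted) <= hi:
            return k
    return n


# Assumed toy data: the integers 0..10 (n = 11).
data = list(range(11))
mean_breaks_at = smallest_breaking_count(statistics.mean, data)
median_breaks_at = smallest_breaking_count(statistics.median, data)
```

On this sample the mean breaks after one moved point, while the median survives until 6 of the 11 points (just over half) have been moved.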
Issue 3: How does the breakdown point depend on the depth of the median?
Convex peeling: breakdown is … zero, unless depth is linear (GSW ’92)
Halfspace breakdown is higher (1/3) for centrosymmetric data distributions, where depth is roughly 1/2, instead of 1/(d+1)
So what can we say about other estimators?
For deepest point in plane
For deepest data point
Issue 4: Non-strategic breakdown
All work so far involved carefully placing outliers (erroneous or corrupt data), to move an estimator far away. (Is corrupt data really placed carefully in practice?)
What about:
• average outliers (random or evenly spaced placement)
• strong breakdown (should work regardless of direction at infinity)
• special-case outliers (axis-parallel, or radial extension, or ?)
Issue 5: Computing/analyzing other estimators
Projection outlyingness of a point (Donoho-Gasko ’92): take the max of the following, over all projections: |projected point − median| / (median deviation from median)
Find an algorithm for the least outlying point.
Gil-Steiger-Wigderson: superposition of unit vectors to data points = v(a_i); median is a (data) point with || v(a_i) R || < 1 ??? Computation in o(n^2)? Properties?
Zonoid depth, Delaunay depth …
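The projection outlyingness above can be approximated in the plane by sampling directions. The sketch below takes the maximum only over m evenly spaced directions, so it is a lower bound on the true supremum over all projections, not an exact algorithm. The function name and the parameter `m` are assumptions.

```python
import math
import statistics
from typing import List, Tuple

Point = Tuple[float, float]


def projection_outlyingness(q: Point, S: List[Point], m: int = 180) -> float:
    """Approximate max over directions of |proj(q) - median| / MAD."""
    worst = 0.0
    for i in range(m):
        angle = math.pi * i / m  # directions u and -u give the same ratio
        ux, uy = math.cos(angle), math.sin(angle)
        proj = [ux * x + uy * y for x, y in S]
        med = statistics.median(proj)
        # Median absolute deviation of the projected data from its median.
        mad = statistics.median(abs(p - med) for p in proj)
        dev = abs(ux * q[0] + uy * q[1] - med)
        if mad == 0:
            if dev > 0:
                return math.inf  # q is off a line holding most of the data
            continue
        worst = max(worst, dev / mad)
    return worst


# Assumed toy data: a 3 x 3 grid centered at the origin.
grid = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
central = projection_outlyingness((0, 0), grid)
remote = projection_outlyingness((10, 0), grid)
```

The least outlying point would then be a minimizer of this function over the plane; the sketch only evaluates it at given points.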
Issue 6: Points of high depth
A point w/ Tukey depth n/(d+1) is a centerpoint. Guaranteed to exist, by Helly’s thm. O(n) time computation (Jadhav-Mukhopadhyay ’94). Can be considered to be a median generalization.
¼ C(n,3) ≥ simplicial depth of the deepest point ≥ 2/9 C(n,3) (Boros-Furedi ’82) (in R^2, ignoring quadratic terms)
Can we compute a “high” depth point quickly?
Tverberg points in R^2 have depth 1/27 C(n,3) and can be computed in O(n) time. Anything better?
Is there a point with “high” Oja depth? (normalized)
Things I may have mentioned in the abstract but forgot to include here:
Is it faster to locate a deep point without computing its depth?
How many points have depth > k?
When do simplicial depth levels become disconnected?
Thank you