Topological Data Analysis: Visual Presentation of Multidimensional Data Sets
DESCRIPTION
Topological data analysis (TDA) is an unsupervised approach that may revolutionise the way data is mined and eventually drive a new generation of analytical tools. The idea behind TDA is to "measure" the shape of data and find a compressed combinatorial representation of that shape. As in ordinary topology, these combinatorial representations serve as compressed summaries of high-dimensional data sets that retain information about the geometric relationships between data points. TDA can also be used as a very powerful clustering technique. Edward will present a comparison between TDA and other dimension reduction algorithms such as PCA, LLE, Isomap, MDS, and Spectral Embedding.
TRANSCRIPT
Topological Data Analysis
Visual presentation of multidimensional data sets
Current vs. New: SQL vs. Topological Data Analysis
Topology
The Seven Bridges of Königsberg, a problem solved by Leonhard Euler (1736).
The study of qualitative properties of certain objects (topological spaces) that are invariant under a certain kind of transformation (continuous map), especially those properties that are invariant under a certain kind of equivalence (homeomorphism).
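Euler's solution of the Königsberg problem already has this qualitative flavour. It discards all distances and keeps only which land masses the bridges connect. A connected multigraph has a walk that crosses every edge exactly once if and only if zero or two of its vertices have odd degree. The sketch below encodes the seven bridges as a multigraph and applies that criterion. The labels A to D and the helper names are illustrative.

```python
from collections import Counter

# Seven bridges as edges between four land masses (labels are illustrative):
# A = central island, B = north bank, C = south bank, D = eastern land mass.
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def has_euler_walk(edges):
    """Euler's criterion, assuming the multigraph is connected: a walk using
    every edge exactly once exists iff 0 or 2 vertices have odd degree."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(degrees(BRIDGES))         # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(BRIDGES))  # False
```

All four land masses have odd degree, so no such walk exists.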
Topological Data Analysis Pipeline
[Figure: panels a and b]
a. First, approximate the unknown space X by a combinatorial structure K.
b. Then, compute topological invariants of K.
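The two pipeline steps can be sketched on toy data. Step a approximates the point cloud by the graph that joins points closer than a scale eps, which is the 1-skeleton of a Vietoris-Rips complex. Step b computes the simplest invariant, the number of connected components (beta_0), with union-find. The point set, the eps values and the function names are all assumptions made for illustration. At a fixed eps, these components coincide with single-linkage clusters cut at height eps, which is one way TDA acts as a clustering technique.

```python
import math
from itertools import combinations

def neighbourhood_graph(points, eps):
    """Step a (sketch): connect every pair of points at distance <= eps."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= eps]

def betti_0(n, edges):
    """Step b (sketch): count connected components with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

# Two well-separated blobs of three points each (toy data).
points = [(0, 0), (0.5, 0), (0, 0.5), (5, 5), (5.5, 5), (5, 5.5)]
for eps in (0.3, 1.0, 10.0):
    print(eps, betti_0(len(points), neighbourhood_graph(points, eps)))
# 0.3 6
# 1.0 2
# 10.0 1
```

Sweeping eps and watching how the invariant changes is the idea behind persistent homology.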
Combinatorial Representations: The Čech Complex
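A simplex belongs to the Čech complex at radius r when the closed balls of radius r around its vertices have a common point. Equivalently, the smallest ball enclosing the vertices has radius at most r. The sketch below implements this test for planar simplices of dimension at most 2. It is illustrative only, and the helper names are our own. An equilateral triangle of side 1 has circumradius 1/√3 ≈ 0.577, so at r = 0.55 its three edges are present but the triangle stays hollow.

```python
import math

def min_enclosing_radius(p, q, s):
    """Radius of the smallest closed disc containing three points in the plane."""
    a, b, c = math.dist(q, s), math.dist(p, s), math.dist(p, q)
    x, y, z = sorted((a, b, c))
    # Right, obtuse or degenerate triangle: the longest side is a diameter.
    if z * z >= x * x + y * y:
        return z / 2
    # Acute triangle: the smallest disc is the circumcircle, R = abc / (4 * area).
    area = abs((q[0] - p[0]) * (s[1] - p[1]) - (s[0] - p[0]) * (q[1] - p[1])) / 2
    return a * b * c / (4 * area)

def in_cech(simplex, r):
    """True iff the radius-r closed balls around the 1-3 planar points share a point."""
    if len(simplex) == 1:
        return True
    if len(simplex) == 2:
        return math.dist(*simplex) <= 2 * r
    if len(simplex) == 3:
        return min_enclosing_radius(*simplex) <= r
    raise ValueError("this sketch only handles simplices of dimension <= 2")

# Equilateral triangle with side 1.
tri = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
for r in (0.55, 0.6):
    all_edges = all(in_cech([tri[i], tri[j]], r) for i, j in ((0, 1), (0, 2), (1, 2)))
    print(r, all_edges, in_cech(tri, r))
# 0.55 True False
# 0.6 True True
```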
Combinatorial Representations: Alpha Complex, Vietoris-Rips Complex, Cubical Complex, Witness Complex
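Of the constructions listed, the Vietoris-Rips complex is the easiest to compute because it depends only on pairwise distances. A set of points spans a simplex when every pair of them is within eps. Conventions differ, and some authors use 2r for balls of radius r. The brute-force sketch below enumerates simplices up to dimension 2. It is meant for tiny inputs, and the function name is illustrative. For an equilateral triangle of side 1 at eps = 1.1, the Rips complex fills the triangle as soon as its edges appear. This happens even though the balls of radius 0.55 around the vertices have no common point, since the circumradius is 1/√3 ≈ 0.577. That gap is the standard difference between Rips and Čech.

```python
import math
from itertools import combinations

def vietoris_rips(points, eps, max_dim=2):
    """Vietoris-Rips complex up to dimension max_dim: a vertex set spans a
    simplex iff every pair of its points is at distance <= eps."""
    n = len(points)
    close = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps}
    simplices = [(i,) for i in range(n)]
    for size in range(2, max_dim + 2):
        for subset in combinations(range(n), size):
            if all(pair in close for pair in combinations(subset, 2)):
                simplices.append(subset)
    return simplices

tri = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(vietoris_rips(tri, 1.1))  # [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
print(vietoris_rips(tri, 0.9))  # [(0,), (1,), (2,)]
```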
Topological Invariants
A topological invariant is a map f that assigns the same object to homeomorphic spaces, that is: $X \cong Y \Rightarrow f(X) = f(Y)$.
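The Euler characteristic χ = V − E + F − … is a classic example of such an invariant. Homeomorphic spaces have the same χ, although equal χ does not imply homeomorphism. The sketch computes χ for two different triangulations of the sphere, the boundaries of a tetrahedron and of an octahedron, and for a triangulated circle. The helper names are illustrative.

```python
from itertools import combinations

def closure(top_simplices):
    """All faces of the given simplices, i.e. the simplicial complex they generate."""
    faces = set()
    for s in top_simplices:
        for k in range(1, len(s) + 1):
            faces.update(combinations(sorted(s), k))
    return faces

def euler_characteristic(top_simplices):
    """chi = sum over all simplices of (-1)^dimension = V - E + F - ..."""
    return sum((-1) ** (len(f) - 1) for f in closure(top_simplices))

# Boundary of a tetrahedron: its four triangular faces on vertices 0..3.
tetra = list(combinations(range(4), 3))
# Boundary of an octahedron: poles 0 and 5 joined to the square 1-2-3-4.
square = [(1, 2), (2, 3), (3, 4), (4, 1)]
octa = [(pole, a, b) for pole in (0, 5) for a, b in square]
# A hollow triangle, which is homeomorphic to a circle.
circle = [(0, 1), (1, 2), (0, 2)]

print(euler_characteristic(tetra), euler_characteristic(octa))  # 2 2
print(euler_characteristic(circle))                             # 0
```

The two triangulations of the sphere have different numbers of vertices, edges and faces, 4/6/4 and 6/12/8, yet both give χ = 2. The circle gives a different value, so it cannot be homeomorphic to the sphere.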
Homology: a machine that converts local data about a space into global algebraic structure.
Reference: Wikipedia, 2010.
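Concretely, homology turns the local incidence data, meaning which faces bound which simplices, into Betti numbers: β_k = dim C_k − rank ∂_k − rank ∂_{k+1}. β_0 counts components, β_1 counts independent loops and β_2 counts enclosed voids. The sketch below computes them with coefficients in GF(2). That choice is an assumption that sidesteps orientation signs. The helper names are illustrative.

```python
from itertools import combinations

def closure(top_simplices):
    """All faces of the given simplices, i.e. the simplicial complex they generate."""
    faces = set()
    for s in top_simplices:
        for k in range(1, len(s) + 1):
            faces.update(combinations(sorted(s), k))
    return faces

def rank_gf2(rows):
    """Rank over GF(2) of a 0/1 matrix whose rows are int bitmasks (Gaussian elimination)."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit serves as the pivot column
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers(top_simplices):
    """Betti numbers beta_0..beta_d over GF(2) of the complex generated by top_simplices."""
    faces = closure(top_simplices)
    max_dim = max(len(f) for f in faces) - 1
    by_dim = [sorted(f for f in faces if len(f) == d + 1) for d in range(max_dim + 1)]
    index = [{s: i for i, s in enumerate(level)} for level in by_dim]

    def boundary_rank(d):
        """Rank of the boundary map from d-simplices to (d-1)-simplices."""
        if d == 0 or d > max_dim:
            return 0
        rows = []
        for s in by_dim[d]:
            mask = 0
            for face in combinations(s, d):  # the d+1 faces of s
                mask |= 1 << index[d - 1][face]
            rows.append(mask)
        return rank_gf2(rows)

    return [len(by_dim[d]) - boundary_rank(d) - boundary_rank(d + 1)
            for d in range(max_dim + 1)]

print(betti_numbers([(0, 1), (1, 2), (0, 2)]))           # [1, 1]     hollow triangle (circle)
print(betti_numbers([(0, 1, 2)]))                        # [1, 0, 0]  filled triangle (disc)
print(betti_numbers(list(combinations(range(4), 3))))    # [1, 0, 1]  hollow tetrahedron (sphere)
```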
Morse Theory and Reeb Graph
Theorem: Suppose $h : X \to \mathbb{R}$ is a discrete Morse function on a simplicial complex X. Then X is homotopy equivalent to a CW complex with exactly one cell of dimension p for each critical simplex of dimension p.
Reference: Teng Ma, Zhuangzhi Wu, Pei Luo, Lu Feng. Reeb graph computation through spectral clustering, 2011.
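The Reeb graph of a function h on X collapses each connected component of each level set h⁻¹(c) to a point. A common way to approximate it from sampled data is the Mapper construction of Singh, Mémoli and Carlsson. It covers the range of h with overlapping intervals and clusters the points in each interval's preimage. Each cluster becomes a node, and two clusters that share a point are joined by an edge. The slides do not name Mapper. The sketch below is an illustration under that assumption, not necessarily the method behind the case studies, and the intervals, eps and helper names are hypothetical. For points on a circle filtered by height, the result is a single loop, which matches the Reeb graph of the height function on a circle.

```python
import math
from itertools import combinations

def clusters(points, members, eps):
    """Single-linkage clusters: connected components of the eps-graph on `members`."""
    seen, result = set(), []
    for start in members:
        if start in seen:
            continue
        seen.add(start)
        component, frontier = [start], [start]
        while frontier:
            i = frontier.pop()
            for j in members:
                if j not in seen and math.dist(points[i], points[j]) <= eps:
                    seen.add(j)
                    component.append(j)
                    frontier.append(j)
        result.append(frozenset(component))
    return result

def mapper(points, filter_values, intervals, eps):
    """Mapper-style graph: one node per cluster of each interval's preimage,
    and an edge wherever two clusters share a data point."""
    nodes = []
    for lo, hi in intervals:
        members = [i for i, v in enumerate(filter_values) if lo <= v <= hi]
        nodes.extend(clusters(points, members, eps))
    edges = sorted((a, b) for a, b in combinations(range(len(nodes)), 2)
                   if nodes[a] & nodes[b])
    return nodes, edges

# 40 points on the unit circle, filtered by height h(x, y) = y.
points = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40)) for k in range(40)]
heights = [y for _, y in points]
nodes, edges = mapper(points, heights, [(-1.0, -0.2), (-0.5, 0.5), (0.2, 1.0)], eps=0.3)
print(len(nodes), edges)  # 4 [(0, 1), (0, 2), (1, 3), (2, 3)]
```

The middle interval splits into a left arc and a right arc, which are two separate nodes. Both arcs connect to the bottom node and to the top node, so the graph has exactly one cycle.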
Case study: Demographics
Data shape: [220:45]
Case study: YT channel stats
Data shape: [1500:12]
Case study: Netflix dataset
Data shape: [17770:480189], about 8.5 billion elements
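The element count follows from the Netflix Prize dimensions of 17,770 movies by 480,189 users. The sketch below also estimates the memory a dense matrix would need, assuming one 4-byte (float32) value per entry. The slides do not say how the data was actually stored.

```python
movies, users = 17_770, 480_189
elements = movies * users
print(f"{elements:,}")            # 8,532,958,530

dense_gb = elements * 4 / 1e9     # assumption: one 4-byte float per entry
print(round(dense_gb, 1), "GB")   # 34.1 GB
```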
Case study: Netflix dataset
Music
Indian
Anime
French
Hong Kong
US Cartoons
Kids Movie
German
US Retro
Horror
Case study: Netflix comparison
PCA
Isomap
LLE
Spectral Embedding
LTSA
Hessian LLE
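The comparison panels use standard dimension-reduction methods, all of which are available in scikit-learn. The sketch below runs the same six methods on a synthetic swiss roll instead of the Netflix data. The data set and the parameters (n_neighbors=12, two output dimensions, a dense eigensolver for the LLE variants) are illustrative assumptions, not the configuration used for the slides.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding, SpectralEmbedding

X, _ = make_swiss_roll(n_samples=300, random_state=0)

def lle(method):
    # Standard LLE, LTSA and Hessian LLE share one estimator class in scikit-learn.
    return LocallyLinearEmbedding(n_neighbors=12, n_components=2,
                                  method=method, eigen_solver="dense")

methods = {
    "PCA": PCA(n_components=2),
    "Isomap": Isomap(n_neighbors=12, n_components=2),
    "LLE": lle("standard"),
    "Spectral Embedding": SpectralEmbedding(n_components=2, n_neighbors=12, random_state=0),
    "LTSA": lle("ltsa"),
    "Hessian LLE": lle("hessian"),
}

# Each estimator maps the 3-D swiss roll to a 2-D embedding.
embeddings = {name: est.fit_transform(X) for name, est in methods.items()}
for name, Y in embeddings.items():
    print(f"{name:>18}: {Y.shape}")
```

These methods all return point coordinates. TDA instead returns a graph or complex, so in the case studies above it is the structure of that graph, such as the genre clusters, that is compared with the embeddings.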
Case study: Netflix (music)
Case study: Netflix (kids movie)
Case study: Netflix (horror)
Questions?