Shape as Organizing Principle for Data

Post on 07-Aug-2015

Category:

Engineering

Shape as Organizing Principle for Data

MLConf, SF 2014

Anthony Bak, Principal Data Scientist

The Data Problem: Complexity

Solution: Topological Summaries

Shape as Organizing Principle

Reduce Bias, Discover Models

We want to discover the underlying structure of the data without bias.

TDA analyzes the data you have, not the data you want to have.

Generating Topological Summaries

Remember/Forget

Use multiple lenses/metrics to get the complete picture

Different lenses provide different summaries

Lenses: where do they come from?

• Statistics: Mean/Max/Min, Variance, n-Moment, Density, …

• Machine Learning: PCA/SVD, Autoencoders, Isomap/MDS/TSNE, …

• Geometry: Centrality, Curvature, Harmonic Cycles, …
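
To make the idea of a lens concrete: a lens assigns one number to every data point, and the summary is then organized by that number. The sketch below is illustrative only and uses the Python standard library. It implements one simple lens from each of two families named above (coordinate mean and a Gaussian density estimate from statistics, a max-distance centrality from geometry). The function names, the `bandwidth` parameter, and the toy points are assumptions made for this example, not part of the talk.

```python
import math

def mean_lens(points):
    """Statistics-style lens: mean of each point's own coordinates."""
    return [sum(p) / len(p) for p in points]

def density_lens(points, bandwidth=1.0):
    """Unnormalized Gaussian kernel density estimate at each data point."""
    values = []
    for p in points:
        total = sum(
            math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2))
            for q in points
        )
        values.append(total / len(points))
    return values

def centrality_lens(points):
    """Geometry-style lens: distance from each point to the farthest point."""
    return [max(math.dist(p, q) for q in points) for p in points]

# Toy data (hypothetical): a small square plus one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]

mean_values = mean_lens(points)
density_values = density_lens(points)
centrality_values = centrality_lens(points)
```

On this toy data the density lens is smallest at the isolated point, while the centrality lens is smallest at the corner of the square nearest to it: different lenses highlight different aspects of the same data.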

Why Topology?

Key Properties of TDA

• Deformation Invariance

• Compressed Representation

• Coordinate Freeness

Coordinate Invariance

1. Topology of shape doesn’t depend on the coordinates used to describe the shape

2. Different feature sets can describe the same phenomenon

3. While processing data, we frequently alter coordinates: scaling, rotating, whitening

You want to study properties of your data that are invariant under coordinate changes
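
A minimal sketch of the simplest case, assuming 2-D points and a rotation: the coordinates of every point change, but the pairwise distances do not, so any summary built only from distances is unaffected. The helper names and sample points are assumptions for illustration. Note that scaling and whitening do change distances; for those transformations the claim is about topological features, not raw distance values.

```python
import math

def rotate(points, angle):
    """Rotate 2-D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def pairwise_distances(points):
    """Full matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]

points = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (-1.0, 2.0)]
rotated = rotate(points, math.radians(37.0))

original_d = pairwise_distances(points)
rotated_d = pairwise_distances(rotated)

# Largest change in any pairwise distance (floating-point noise only).
max_difference = max(
    abs(a - b)
    for row_a, row_b in zip(original_d, rotated_d)
    for a, b in zip(row_a, row_b)
)

# The coordinates themselves did change.
coordinates_changed = any(
    math.dist(p, q) > 1e-6 for p, q in zip(points, rotated)
)
```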

Coordinate Invariance: Gene Expression

NKI

GSE230

Deformation Invariance

• Topological features don’t change when you stretch and distort the data

Advantage: Makes problems easier

• Noise resistance

• Less pre-processing of data

• Robust (stable) data

Compressed Representation

• Replace the metric space with a combinatorial summary: a simplicial complex.

• Data becomes easier to manage, search, and query while maintaining essential features.

• Leverages many known algorithms from graph theory, computational topology, computational geometry.
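
The slides do not spell out how the combinatorial summary is built. The sketch below assumes a Mapper-style construction: cover the range of a lens with overlapping intervals, cluster the points that fall in each interval, make each cluster a node, and join two nodes when they share points. It produces only the graph (the 1-dimensional part of the simplicial complex). The names `mapper_graph` and `connected_components` and the parameters `n_intervals`, `overlap`, and `eps` are hypothetical choices for this example.

```python
import math

def connected_components(indices, points, eps):
    """Single-linkage clustering: chain together points within eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper_graph(points, lens, n_intervals, overlap, eps):
    """Nodes are clusters within lens intervals; edges join clusters
    that share at least one point."""
    lo, hi = min(lens), max(lens)
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(connected_components(members, points, eps))
    edges = {(u, v)
             for u in range(len(nodes))
             for v in range(u + 1, len(nodes))
             if nodes[u] & nodes[v]}
    return nodes, edges

# Toy data (hypothetical): 40 points on a circle, lens = x-coordinate.
n = 40
points = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
          for i in range(n)]
lens = [p[0] for p in points]

nodes, edges = mapper_graph(points, lens, n_intervals=3, overlap=0.25, eps=0.2)
degree = [sum(1 for e in edges if u in e) for u in range(len(nodes))]
```

With these settings the 40 points compress to a graph with four nodes joined in a loop, so the summary is far smaller than the data yet still records that the data is circular.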

Baby Steps: PCA

PCA

Data Stories

Model Introspection

Predictive Maintenance

Customer Churn

Transaction Fraud

Data has shape, Shape has meaning.

http://www.ayasdi.com/company/careers/
