"quantum clustering - physics inspired clustering algorithm", sigalit bechler, researcher,...

Upload: dataconomy-media

Post on 09-Jan-2017

Page 1: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web

Business Proprietary & Confidential

SimilarWeb & Tel-Aviv University

On Quantum Clustering

Sigalit Bechler

December 1, 2014

Page 2:

Agenda

• SimilarWeb – a quick introduction

• Quantum Clustering

Page 3:

SimilarWeb

• Founded: 2007
• Funding: $65M
• Offices: 6
• Employees: 300

Some of our clients

Page 4:

What We Do

We Provide Digital Insights to the Entire World

60M WEBSITES DAILY
FOR EVERY WEBSITE:
• TRAFFIC ESTIMATION
• TRAFFIC SOURCES
• AUDIENCE
• INDUSTRY
• CONTENT

2M MOBILE APPS DAILY
FOR EVERY MOBILE APP:
• RATING
• ENGAGEMENT
• APP STORE DATA
• CATEGORY
• KEYWORDS

Page 5:

What We Do

We Provide Digital Insights to the Entire World

60M WEBSITES DAILY
FOR EVERY WEBSITE:
• TRAFFIC METRICS
• TRAFFIC SOURCES
• AUDIENCE
• INDUSTRY
• CONTENT

2M MOBILE APPS DAILY
FOR EVERY MOBILE APP:
• RATING
• ENGAGEMENT
• APP STORE
• CATEGORY
• KEYWORDS

INGEST: INTERNATIONAL PANEL, CRAWLING, ISP DATA, LEARNING SET
• 90K events/sec
• 4TB/day compressed

BATCH & ON DEMAND PROCESSING:
• 100TB i/o a day
• > 150 machines just in the processing cluster
• Statistical & machine learning algorithms

Page 6:

Quantum clustering

Prof. David Horn and Dr. Assaf Gottlieb, Phys. Rev. Lett. 88 (2002) 018702

Page 7:

Clustering - general overview

• Unsupervised learning problem: dealing with unlabeled data
• Goal: group together elements that are similar to each other in some sense
• We usually have an idea or a desire of what this “sense” should be
• Might discover new patterns

[Figure: example data tables (columns: label, feature1, feature2, feature3, feature4)]

Page 8:

Clustering - general overview

• The user identity is unknown
• Leaving it in for the example

[Figure: example data tables (columns: label, feature1, feature2, feature3, feature4), with each label shown as "?"]

Page 9:

Clustering - general overview

• Grouping by gender

[Figure: example data tables (columns: label, feature1, feature2, feature3, feature4)]

Page 10:

Clustering - general overview

• Grouping by fields of interest

[Figure: example data tables (columns: label, feature1, feature2, feature3, feature4)]

Page 11:

Quantum Clustering - Motivation

• Relatively easy clustering task

• Still need to set the number of clusters manually.

• Very complex clustering task.

• Unbiased analysis of X-ray absorption data

Page 12:

Quantum Clustering - Example

"Analyzing Big Data with Dynamic Quantum Clustering", M. Weinstein, F. Meirer, A. Hume, Ph. Sciau, G. Shaked, R. Hofstetter, E. Persi, A. Mehta, D. Horn, http://arxiv.org/abs/1310.2700

Page 13:

Why is it important?

• Information era: big data
• Massive collection of data
• Strong presence of outliers
• Unknown structures
• Non-trivial patterns

Quantum Clustering

Distributed computation technologies

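Page 14:

Quantum clustering - the potential trick

1. Turn the data points into Gaussians centered on the data points and sum them into a wave function:

$$\psi(\mathbf{x}) = \sum_i e^{-(\mathbf{x}-\mathbf{x}_i)^2 / 2\sigma^2}$$

2. Plug $\psi$ into the Schrödinger equation and solve for $V(\mathbf{x})$. Define the solution for $V$ as the potential transform:

$$\left(-\frac{\sigma^2}{2}\nabla^2 + V(\mathbf{x})\right)\psi = E\psi \quad\Rightarrow\quad V(\mathbf{x}) = E + \frac{\sigma^2}{2}\,\frac{\nabla^2\psi}{\psi}$$

• Single point → Gaussian → harmonic potential: $V(\mathbf{x}) = \frac{1}{2\sigma^2}(\mathbf{x}-\mathbf{x}_1)^2$, with $E = d/2$
• Multi-points: $V(\mathbf{x}) = E - \frac{d}{2} + \frac{1}{2\sigma^2\psi}\sum_i (\mathbf{x}-\mathbf{x}_i)^2\, e^{-(\mathbf{x}-\mathbf{x}_i)^2 / 2\sigma^2}$

3. Move each data point toward the nearest minimum of the potential surface $V$ using gradient descent.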
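The three steps can be sketched in NumPy as follows. This is a minimal illustration, not the speaker's implementation: it assumes the Gaussian-sum wave function and potential of the Horn and Gottlieb paper cited on Page 6, drops the additive constant E, and uses a finite-difference gradient for readability. The function names, the step size, and the toy data are all hypothetical.

```python
import numpy as np


def potential(x, data, sigma):
    """Potential V(x) up to the additive constant E.

    psi(x)   = sum_i exp(-|x - x_i|^2 / (2 sigma^2))
    V(x) - E = -d/2 + sum_i |x - x_i|^2 g_i / (2 sigma^2 psi)
    """
    sq_dist = np.sum((data - x) ** 2, axis=1)      # |x - x_i|^2 for every i
    gauss = np.exp(-sq_dist / (2.0 * sigma ** 2))  # one Gaussian per data point
    psi = np.sum(gauss)                            # wave function at x
    d = data.shape[1]
    return -d / 2.0 + np.sum(sq_dist * gauss) / (2.0 * sigma ** 2 * psi)


def numerical_gradient(x, data, sigma, h=1e-4):
    """Central finite-difference gradient of V at x (an illustrative choice)."""
    grad = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = h
        grad[k] = (potential(x + step, data, sigma)
                   - potential(x - step, data, sigma)) / (2.0 * h)
    return grad


def quantum_cluster(data, sigma=1.0, learning_rate=0.2, n_steps=200):
    """Slide copies of the data points downhill on V; V stays defined by `data`."""
    moved = data.astype(float).copy()
    for _ in range(n_steps):
        for i in range(len(moved)):
            moved[i] -= learning_rate * numerical_gradient(moved[i], data, sigma)
    return moved


# Toy data: two tight 2-D blobs (hypothetical example, not from the talk).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.2, size=(20, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.2, size=(20, 2))
data = np.vstack([blob_a, blob_b])

moved = quantum_cluster(data, sigma=1.0)
```

After the descent, the rows of `moved` that started in the same blob should sit at (nearly) the same location, one location per minimum of V. The width `sigma` is the only parameter that shapes the potential.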

Page 15:

Quantum clustering – reasoning

• Why does it make sense?
• The Laplacian term ($\nabla^2$): models the divergence effects away from the cluster center.
• $V(\mathbf{x})$: the effects that bind points from the same cluster together.
• We may say that we are looking for the minima of $V(\mathbf{x})$, since this is where the divergence effects are minimal (slow changes: small numerator; high density: large denominator):

$$V(\mathbf{x}) = E + \frac{\sigma^2}{2}\,\frac{\nabla^2\psi}{\psi}$$

• SVD may be performed prior to the clustering: $X = USV^T$, then perform QC on $U$ or $V$
• This resolves the problem that each feature has a different dimension type and scale.
• It enables dimension reduction to the components with the highest variance.
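The SVD preprocessing step can be sketched with NumPy as below. This is a minimal illustration under assumptions: the feature matrix `X`, the helper name `svd_reduce`, and the choice of two components are hypothetical, and the talk does not say whether the data is centered or scaled first.

```python
import numpy as np


def svd_reduce(X, n_components):
    """Return the first n_components columns of U (and singular values) from X = U S V^T."""
    # numpy returns the singular values in descending order, so the leading
    # columns of U correspond to the directions with the largest variance.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components], S[:n_components]


# Hypothetical feature matrix: rows are samples, columns have very different scales.
X = np.array([[170.0, 0.002, 3.0],
              [165.0, 0.004, 1.0],
              [180.0, 0.001, 2.0],
              [175.0, 0.003, 4.0]])

U_k, S_k = svd_reduce(X, n_components=2)
# Each row of U_k is one sample in the reduced space; clustering would run on these rows.
```

Because the columns of `U` are orthonormal, every reduced coordinate has the same overall scale regardless of the units of the original features.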

Page 16:

The Crabs Example (from Ripley’s textbook): 4 classes, 50 samples each, d=5

[Figure panels: "The data"; "3D plot of the potential"]

[Figure: A topographic map of the probability distribution for the crab data set with $\sigma = 1/\sqrt{2}$ using principal components 2 and 3. There exists only one maximum.]

[Figure: A topographic map of the potential for the crab data set with $\sigma = 1/\sqrt{2}$ using principal components 2 and 3. The four minima are denoted by crossed circles. The contours are set at values V=cE for c=0.2,…,1.]

Page 17:

Quantum clustering - summary

• Built-in capability to handle outliers (the divergence part): no need for additional parameters or processes, and no effect on the number of significant clusters

• A cluster may be a line or another shape, not necessarily a point in the feature space.

• The clusters are not defined by geometric or probability considerations alone

• No need to pre-define the number of clusters
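One simple way to see why the number of clusters need not be set in advance: once the points have been moved downhill on the potential, points that ended up at the same place share a cluster, and the count falls out of the data. The grouping rule below (a distance tolerance `tol`) and the function name are assumptions for illustration; the talk does not describe how the final assignment is made.

```python
import numpy as np


def label_converged_points(points, tol=1e-2):
    """Give the same label to points that ended up within `tol` of a known center."""
    labels = -np.ones(len(points), dtype=int)
    centers = []
    for i, p in enumerate(points):
        for label, c in enumerate(centers):
            if np.linalg.norm(p - c) < tol:
                labels[i] = label          # joins an existing cluster
                break
        else:
            centers.append(p)              # first point seen at a new location
            labels[i] = len(centers) - 1
    return labels, np.array(centers)


# Hypothetical converged positions: three points near (0, 0), two near (5, 5).
points = np.array([[0.0, 0.0],
                   [0.001, 0.0],
                   [5.0, 5.0],
                   [5.0, 5.002],
                   [0.0, 0.001]])

labels, centers = label_converged_points(points)
n_clusters = len(centers)
```

Here `n_clusters` is an output of the procedure, not an input to it.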

Page 18:

Quantum clustering

• An approximate quantum clustering variant exists that improves the time complexity.

• Sensitive to small variations in the data density, unlike approaches based on geometric considerations alone.

• Possible distributed calculation:
• Since all we need to calculate is $V$ and $\nabla V$ for every data point, these can be calculated for each point separately on a different machine

• Has performed exceptionally well at exposing hidden patterns in data from a wide range of fields: finance, on-line marketing, experimental physics, speech recognition, biological data.
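The distributed-calculation idea can be sketched on a single machine, with a thread pool standing in for separate machines. This is an illustrative assumption, not the setup used in the talk: the function names and the chunking scheme are hypothetical, the potential is the Gaussian-sum form from the Horn and Gottlieb paper with the constant E dropped, and every worker still needs read access to the full data set.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def potential_chunk(chunk, data, sigma):
    """V (up to the constant E) for every row of `chunk`, using the full data set."""
    diff = chunk[:, None, :] - data[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)             # shape (n_chunk, n_data)
    gauss = np.exp(-sq_dist / (2.0 * sigma ** 2))
    d = data.shape[1]
    weighted = np.sum(sq_dist * gauss, axis=1)
    return -d / 2.0 + weighted / (2.0 * sigma ** 2 * np.sum(gauss, axis=1))


def potential_parallel(points, data, sigma, n_workers=4):
    """Split the evaluation points into chunks and evaluate each chunk independently."""
    chunks = np.array_split(points, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda c: potential_chunk(c, data, sigma), chunks))
    return np.concatenate(parts)


# Hypothetical data: 100 random points in 3 dimensions.
rng = np.random.default_rng(1)
data = rng.normal(size=(100, 3))

v_parallel = potential_parallel(data, data, sigma=1.0)
v_serial = potential_chunk(data, data, sigma=1.0)
```

The chunks never exchange information, so the parallel and serial results agree; the same split would apply to the gradient of V.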

Page 19:

Quantum clustering

• Physics may provide an interesting perspective on questions that at first glance have no connection to physics.

• It has been done:
• In scale-space theory
• Sensitive to small variations in the data density
• In bioinformatics, for extracting protein structure
• And many more

Page 20:

Thank You!