
Page 1: MLconf NYC Corinna Cortes

Searching for Similar Items in Diverse Universes

Large-scale machine learning at Google

Corinna Cortes, Google Research

[email protected]

Page 2: MLconf NYC Corinna Cortes

Browse for Fashion

Page 3: MLconf NYC Corinna Cortes

Browse for Videos

Page 4: MLconf NYC Corinna Cortes

Outline

Metric setting
• Using similarities (efficiency)
• Learning similarities (quality)

Graph-based setting
• Generating similarities

Page 5: MLconf NYC Corinna Cortes

Image Browsing

Image browsing relies on some measure of similarity. Images are represented as feature vectors, and a simple similarity is the dot product:

$$\mathrm{Sim}(x_1, x_2) = \begin{pmatrix} x_{11} \\ x_{12} \\ \vdots \\ x_{1N} \end{pmatrix} \cdot \begin{pmatrix} x_{21} \\ x_{22} \\ \vdots \\ x_{2N} \end{pmatrix}$$
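A minimal sketch of this similarity in Python (NumPy assumed; the vectors are illustrative):

```python
import numpy as np

def sim(x1: np.ndarray, x2: np.ndarray) -> float:
    """Dot-product similarity between two image feature vectors."""
    return float(np.dot(x1, x2))

# Illustrative 4-dimensional feature vectors.
x1 = np.array([0.2, 0.1, 0.7, 0.0])
x2 = np.array([0.3, 0.0, 0.6, 0.1])
print(sim(x1, x2))  # 0.48
```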

Page 6: MLconf NYC Corinna Cortes

Clustering of Images

Compute all the pairwise distances between related images and form clusters, as sketched below:
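One hedged way to realize this step with SciPy (the feature matrix and the distance threshold are illustrative assumptions, not the production pipeline):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Rows are image feature vectors (illustrative random data).
X = np.random.rand(100, 64)

# All pairwise Euclidean distances, in condensed form.
D = pdist(X, metric="euclidean")

# Form clusters by cutting an agglomerative tree at a distance threshold.
Z = linkage(D, method="single")
labels = fcluster(Z, t=1.5, criterion="distance")
```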

Page 7: MLconf NYC Corinna Cortes

Similar Images - version 0.0

Demo

Page 8: MLconf NYC Corinna Cortes

Helsinki, some cluster

Page 9: MLconf NYC Corinna Cortes

“Machine Learning” comes up empty

Page 10: MLconf NYC Corinna Cortes

The Web is Huge

Image Swirl was precomputed and restricted to the top 20K queries, but people search on billions of different queries. We cannot precompute billions of distances; we need to do something smarter.

Page 11: MLconf NYC Corinna Cortes

Approximate Nearest Neighbors

Preprocessing:
• Represent images by short vectors, via kernel PCA;
• Grow a tree top-down based on ‘random’ projections, spilling a bit for robustness;
• Save the node ID(s) with each image.

At query time:
• propagate down the tree to find the node ID;
• retrieve the other images with the same ID;
• rank according to similarity with the short vector and other metadata.

A sketch of both stages follows.
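A minimal sketch of such a spill tree under stated assumptions (plain NumPy; the leaf size, spill fraction, and one random projection per node are illustrative choices, not the production system):

```python
import numpy as np

LEAF_SIZE = 50   # stop splitting below this many points (illustrative)
SPILL = 0.1      # fraction of the projection range shared by both children

def build(points, ids, tree=None, node=1):
    """Grow a random-projection spill tree top-down.

    tree maps node ID -> ("leaf", image ids) or ("split", direction, median);
    points near the splitting median spill into both children for robustness.
    """
    if tree is None:
        tree = {}
    if len(ids) <= LEAF_SIZE:
        tree[node] = ("leaf", list(ids))
        return tree
    direction = np.random.randn(points.shape[1])   # 'random' projection
    proj = points @ direction
    median = np.median(proj)
    margin = SPILL * (proj.max() - proj.min())
    left = proj <= median + margin
    right = proj >= median - margin
    if left.all() or right.all():                  # degenerate spill: split cleanly
        left, right = proj <= median, proj > median
    tree[node] = ("split", direction, median)
    build(points[left], ids[left], tree, 2 * node)
    build(points[right], ids[right], tree, 2 * node + 1)
    return tree

def query(tree, x, node=1):
    """Propagate a query vector down the tree and return candidate image ids."""
    while tree[node][0] == "split":
        _, direction, median = tree[node]
        node = 2 * node if x @ direction <= median else 2 * node + 1
    return tree[node][1]

# Usage: tree = build(X, np.arange(len(X))) with X the short kernel-PCA vectors;
# candidates = query(tree, x) are then ranked by similarity and other metadata.
```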

Page 12: MLconf NYC Corinna Cortes

Similar Images, Version 1.0

Live Search, Similar Images

Inspect the individual clusters:
• cluster 700
• cluster 700 + machine learning
• cluster 1000
• cluster 400

Page 13: MLconf NYC Corinna Cortes

Cluster 700

Page 14: MLconf NYC Corinna Cortes

Cluster 700 + Machine Learning

Page 15: MLconf NYC Corinna Cortes

Cluster 1000

Page 16: MLconf NYC Corinna Cortes

Cluster 1000 + Machine Learning

Page 17: MLconf NYC Corinna Cortes

Cluster 400

Page 18: MLconf NYC Corinna Cortes

Cluster 400 + Machine Learning

Page 19: MLconf NYC Corinna Cortes

Finding Similar Time Series

Page 20: MLconf NYC Corinna Cortes

Flu Trends

http://www.google.org/flutrends/us/#US

Page 21: MLconf NYC Corinna Cortes

Flu Trends

Page 22: MLconf NYC Corinna Cortes

Asymmetric Hashing, Indexing

Split each time series into N/k chunks. Represent each chunk with a set of 256 cluster centers computed with k-means, and save the coordinates of the centers as (ID, coordinates). Save each series as the set of IDs of its closest centers, its hash code. A sketch follows.
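A minimal sketch of this indexing step, assuming scikit-learn's KMeans (the sizes and the random database are illustrative placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

N, k, M = 512, 32, 10_000    # series length, chunk length, database size
num_chunks = N // k          # N/k chunks per series
num_centers = 256            # cluster centers per chunk position

# Database of M time series of length N (illustrative random data).
database = np.random.rand(M, N).astype(np.float32)

codebooks, codes = [], []
for j in range(num_chunks):
    chunk = database[:, j * k:(j + 1) * k]        # j-th chunk of every series
    km = KMeans(n_clusters=num_centers, n_init=4).fit(chunk)
    codebooks.append(km.cluster_centers_)         # (ID, coordinates)
    codes.append(km.labels_)                      # closest-center ID per series

# Hash code of series i: its closest-center ID in every chunk.
codes = np.stack(codes, axis=1)                   # shape (M, N/k)
```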

Page 23: MLconf NYC Corinna Cortes

Asymmetric Hashing, Searching

For a given input u, divide it into its N/k chunks u_j:
• Compute the N/k × 256 distances to all centers.
• Compute the distances to all hash codes:

$$d^2(u, v_i) = \sum_{j=1}^{N/k} d^2\big(u_j, c(v_{ij})\big)$$

Only M · N/k additions are needed. The “asymmetric” in “asymmetric hashing” refers to the fact that we hash the database vectors but not the search vector. A sketch follows.
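Continuing the indexing sketch above (codebooks, codes, and k are the illustrative names from that block):

```python
import numpy as np

def search(u, codebooks, codes, k, top=10):
    """Approximate nearest neighbors of query u via asymmetric hashing.

    u itself is never quantized: for each chunk we precompute its squared
    distances to all 256 centers, then score every database series by
    table lookups, i.e. M * N/k additions.
    """
    M, num_chunks = codes.shape
    # tables[j][c] = squared distance from u's j-th chunk to center c.
    tables = np.stack([
        ((codebooks[j] - u[j * k:(j + 1) * k]) ** 2).sum(axis=1)
        for j in range(num_chunks)
    ])
    # d^2(u, v_i) = sum over chunks j of tables[j][codes[i, j]].
    dists = tables[np.arange(num_chunks), codes].sum(axis=1)
    return np.argsort(dists)[:top]
```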

Page 24: MLconf NYC Corinna Cortes

Google Correlate

http://www.google.com/trends/correlate/

Page 25: MLconf NYC Corinna Cortes

Outline

Metric setting
• Using similarities (efficiency)
• Learning similarities (quality)

Graph-based setting
• Generating similarities

Page 26: MLconf NYC Corinna Cortes

Table Search

http://webdatacommons.org/webtables/
March 5, 2014: 11 billion HTML tables reduced to 147 million quasi-relational Web tables.

Page 27: MLconf NYC Corinna Cortes

Standard Learning with Kernels

The user is burdened with choosing an appropriate kernel.

Diagram: the user supplies m labeled points (x, y) together with a chosen kernel to the algorithm.

Page 28: MLconf NYC Corinna Cortes

Learning Kernels

Demands less commitment from the user: instead of a specific kernel, it only requires the definition of a family of kernels.

Diagram: the user supplies m points together with a family of kernels to the algorithm.

Page 29: MLconf NYC Corinna Cortes

Centered Alignment-Based LK (Learning Kernels)

Two stages: first select the kernel K by maximizing centered alignment, then pass K to a standard kernel algorithm to learn the hypothesis h.

Outperforms the uniform baseline and previous algorithms.

Centered alignment is key: it differs from the notion used by (Cristianini et al., 2001).

[C. Cortes, M. Mohri, and A. Rostamizadeh: Two-stage learning kernel methods, ICML 2010]

Page 30: MLconf NYC Corinna Cortes

Centered Alignment

Centered kernels:

$$K_c(x, x') = K(x, x') - \mathbb{E}_{x}\big[K(x, x')\big] - \mathbb{E}_{x'}\big[K(x, x')\big] + \mathbb{E}_{x,x'}\big[K(x, x')\big]$$

where the expectations are over pairs of points.

Centered alignment:

$$\rho(K, K') = \frac{\mathbb{E}[K_c\,K'_c]}{\sqrt{\mathbb{E}[K_c^2]\;\mathbb{E}[{K'_c}^2]}}$$

Choose the kernel to maximize alignment with the target kernel

$$K_Y(x_i, x_j) = y_i y_j.$$
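An empirical, matrix-form sketch of these quantities (K is any m × m kernel matrix and y the label vector; both are assumptions of the example):

```python
import numpy as np

def center(K):
    """Center a kernel matrix: Kc = H K H with H = I - (1/m) * ones."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H

def alignment(K1, K2):
    """Centered alignment <K1c, K2c>_F / (||K1c||_F * ||K2c||_F)."""
    K1c, K2c = center(K1), center(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

# Alignment of a candidate kernel matrix K with the target kernel yy^T:
# score = alignment(K, np.outer(y, y))
```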

Page 31: MLconf NYC Corinna Cortes

Alignment-Based Kernel Learning

Theoretical results:
• Concentration bound.
• Existence of good predictors.

Alignment algorithms:
• A simple, highly scalable algorithm;
• A quadratic-program-based algorithm.

Page 32: MLconf NYC Corinna Cortes

Table Search

http://research.google.com/tables
• Machine Learning Conferences
• Marathons USA

Research Tool in Docs

Page 33: MLconf NYC Corinna Cortes

Outline

Metric setting
• Using similarities (efficiency)
• Learning similarities (quality)

Graph-based setting
• Generating similarities

Page 34: MLconf NYC Corinna Cortes

Graph-based setting

Personalized PageRank (PPR):
• PPR is used to densify the graph;
• single-linkage clustering is used to find clusters in the graph.

Page 35: MLconf NYC Corinna Cortes

Example

Page 36: MLconf NYC Corinna Cortes

PPR Clustering

Personalized PageRank (PPR) of a node v with respect to a seed node u: the probability of visiting v in a random walk starting at u, where at each step:
• with probability 1 − α, we go to a neighbor uniformly at random;
• with probability α, we go back to u.

Resulting graph: single hops are enriched with multiple hops, yielding a denser graph. Threshold the graph at an appropriate level.

Page 37: MLconf NYC Corinna Cortes

Algorithm

2-pass MapReduce:
• Each push operation takes a single vertex u, moves an α fraction of the probability from r(u) onto p(u), and then spreads the remaining (1 − α) fraction within r, as if a single step of the lazy random walk were applied only to the vertex u.
• Initialization: p = 0 (the zero vector) and r = the indicator function at u.

A single-machine sketch of the push loop follows.
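A minimal sketch of push-based approximate PPR (sequential rather than MapReduce; the adjacency-list graph, alpha, and the tolerance eps are illustrative assumptions):

```python
from collections import defaultdict

def approximate_ppr(graph, seed, alpha=0.15, eps=1e-6):
    """Push-based approximate personalized PageRank from one seed.

    graph: dict mapping each vertex to the list of its neighbors.
    Returns p, an approximation of the PPR vector for the seed.
    """
    p = defaultdict(float)             # p starts as the zero vector
    r = defaultdict(float)
    r[seed] = 1.0                      # r starts as the indicator at the seed
    queue = [seed]
    while queue:
        u = queue.pop()
        ru, deg = r[u], len(graph[u])
        if ru < eps * deg:             # residual too small: nothing to push
            continue
        p[u] += alpha * ru             # move an alpha fraction onto p(u)
        r[u] = (1 - alpha) * ru / 2    # lazy walk: half stays at u ...
        share = (1 - alpha) * ru / (2 * deg)
        for v in graph[u]:             # ... half spreads to the neighbors
            r[v] += share
            if r[v] >= eps * len(graph[v]):
                queue.append(v)
        if r[u] >= eps * deg:
            queue.append(u)
    return dict(p)
```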

Page 38: MLconf NYC Corinna Cortes

“Single-Linkage” Clustering

Examples:
• Can be parallelized;
• Can be repeated hierarchically.

A minimal sketch follows.
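One hedged way to realize single-linkage clustering on a thresholded similarity graph, via union-find (the edge list and threshold are illustrative; this is the sequential core of what can be parallelized):

```python
def single_linkage(num_nodes, edges, threshold):
    """Merge every pair joined by an edge with similarity >= threshold.

    edges: iterable of (u, v, similarity) triples over nodes 0..num_nodes-1.
    Returns a list mapping each node to its cluster representative.
    """
    parent = list(range(num_nodes))

    def find(x):                       # find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, similarity in edges:
        if similarity >= threshold:    # single linkage: one strong edge merges
            parent[find(u)] = find(v)

    return [find(x) for x in range(num_nodes)]

# Illustrative: five nodes, three weighted edges.
print(single_linkage(5, [(0, 1, 0.9), (1, 2, 0.4), (3, 4, 0.8)], 0.5))
# -> nodes 0 and 1 share a representative, 3 and 4 share one, 2 stands alone.
```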

Page 39: MLconf NYC Corinna Cortes

Demo: Phileas, Landmark

The goal of this project is to develop a system that automatically recognizes tourist landmarks and popular locations in images and videos. Applications include automatic geo-tagging and annotation of photos and videos. Current clients of the service include Personal Photos Search and Google Goggles.

Map Explorer

Page 40: MLconf NYC Corinna Cortes

Summary

Challenging very large-scale problems:
• efficient and effective similarity measures and algorithms.

Still more to do:
• learning similarities for graph-based algorithms;
• similarity measures vs. discrepancy measures:
  • match time series on spikes;
  • match shoes with shoes, highlighting differences.