Making the Impossible Possible: Randomized Machine Learning Algorithms for Big Data Rong Jin Alibaba Group


Page 1: Making the Impossible Possible: Randomized Machine ...acml-conf.org/2015/pub/talks/acmltalk_rjin.pdf · Making the Impossible Possible: Randomized Machine Learning Algorithms for

Making the Impossible Possible: Randomized Machine Learning Algorithms for Big Data

Rong Jin

Alibaba Group

Page 2:

Big Data Challenge

• Data exists in the digital universe
  • 2012: 2.7 zettabytes (1 zettabyte = 10^21 bytes)
  • 2020: 40 zettabytes

• Huge amount of data generated on the Internet every minute
  • YouTube users upload 300 hours of video
  • Facebook users share 4 million pieces of content

Source: http://www.fiercebigdata.com/story/how-much-data-created-internet-every-minute/2015-08-14

Too much data to process

Page 3:

Big Data Challenge

High dimensional data

• E.g. millions of features have been used for image classification & online advertising

Page 4:

Why Data Size Matters?

Matrix completion

• Classification, clustering, recommender systems

• Performance is measured by recovery error

Page 5:

Why Data Size Matters?

• O(r n log^2(n)) observed entries: PERFECT recovery

• O(r n log(n)) observed entries: POOR recovery

[Figure: recovery error versus # observed entries, with marks at O(r n log(n)) and O(r n log^2(n)) and a label "Unknown"]

Page 6:

Why Learning from Big Data is Hard?

Even computing the data average is non-trivial

• Each matrix M_i is sparse, with size 1B x 1M

• The average matrix Z is much denser and too expensive to store

• Can we compute an approximate average Z’ without having to compute Z explicitly?

Page 7:

Why Learning from Big Data is Hard?

Turn matrix average into an optimization problem

Page 8:

Why Learning from Big Data is Hard?

Turn matrix average into an optimization problem

• Solved efficiently by stochastic gradient descent

• Intermediate sparse solutions, strong guarantee

Page 9:

Why Learning from Big Data is Hard?

• Training examples

• A convex loss

• A convex domain
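
The symbols on this slide were lost in extraction. A common way to write a learning problem with these three ingredients, given here as an assumption rather than the slide's exact formula, is empirical risk minimization over a convex domain. The names w, \Omega, \ell, (x_i, y_i) and N are placeholders:

```latex
\min_{w \in \Omega} \; \frac{1}{N} \sum_{i=1}^{N} \ell\bigl(w; x_i, y_i\bigr)
```

Here (x_i, y_i) are the training examples, \ell is a convex loss (the hinge loss is one example), and \Omega is a convex domain.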

Page 10:

Why Learning from Big Data is Hard?

Requires solving a large-scale optimization problem

• Too many data points (10^9)

• Very high dimensionality (10^8)

Page 11:

Randomized Algorithms for Big Data

Randomized algorithms are efficient

• for large-sized data sets
  • only need one pass over the entire data set

• for high dimensional data
  • reduce dimensionality by random projection

Page 12:

Randomized Algorithms for Big Data

Randomized algorithms are efficient

• for large-sized data sets
  • only need one pass over the entire data set

• for high dimensional data
  • reduce dimensionality by random projection

Randomized algorithms are effective

• Minimize the generalization error

Page 13:

Randomized Algorithms for Big Data

Limitations of randomized algorithms

• Random decisions are suboptimal and can be very poor

We will focus our discussion on Random Projection

Page 14:

Random Projection

Page 15:

Random Projection

• Project data into a random low dimensional space

Gaussian Random Matrix S
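
A minimal sketch of projecting a vector with a Gaussian random matrix S. The N(0, 1/m) scaling of the entries, the helper names, and the sizes d = 500 and m = 200 are assumptions made for illustration; with this scaling the squared norm is preserved in expectation.

```python
import math
import random


def gaussian_projection_matrix(m, d, rng):
    """m x d matrix with i.i.d. N(0, 1/m) entries, so E||Sx||^2 = ||x||^2."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]


def project(S, x):
    """Return the low dimensional vector S x."""
    return [sum(s_j * x_j for s_j, x_j in zip(row, x)) for row in S]


rng = random.Random(0)
d, m = 500, 200
S = gaussian_projection_matrix(m, d, rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
z = project(S, x)

# Ratio of squared norms after and before projection; close to 1 for large m.
ratio = sum(v * v for v in z) / sum(v * v for v in x)
print(len(z), round(ratio, 3))
```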

Page 16:

Random Projection

• Project data into a random low dimensional space

Page 17:

Random Projection

• Recover the solution in the high dimensional space

Gaussian Random Matrix S^T

Page 18:

Random Projection

• Good news: random projections are sufficient if the data is linearly separable with a margin

[Figure: Random Projection from the High Dimensional Space to the Low Dimensional Space, followed by Recovery]
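
A small sketch of the margin claim, under assumed synthetic data: unit-norm points separated by a unit-norm direction w with margin 0.8 are projected with a Gaussian matrix, and the projected direction S w is used as the classifier in the low dimensional space. The sizes, the margin, and the N(0, 1/m) scaling are illustrative choices.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(1)
d, m, n, margin = 200, 150, 30, 0.8

# Unit-norm separator and unit-norm points with y * <w, x> = margin.
w = [1.0] + [0.0] * (d - 1)
labels = [rng.choice([-1, 1]) for _ in range(n)]
points = []
for y in labels:
    noise = [rng.gauss(0.0, 1.0) for _ in range(d - 1)]
    scale = math.sqrt(1.0 - margin ** 2) / math.sqrt(dot(noise, noise))
    points.append([y * margin] + [scale * v for v in noise])

# Gaussian random projection with N(0, 1/m) entries.
S = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(d)] for _ in range(m)]


def project(x):
    return [dot(row, x) for row in S]


# Classify projected points with the projected separator.
w_low = project(w)
correct = sum(y * dot(project(x), w_low) > 0 for x, y in zip(points, labels))
accuracy = correct / n
print(accuracy)
```

Inner products are only approximately preserved, so the larger the margin relative to 1/sqrt(m), the more reliably the projected data stays separable.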

Page 19:

Random Projection

• The recovered solution is a poor approximation of the optimal solution
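
A numerical sketch of why mapping back with S^T is lossy when m is much smaller than d. With N(0, 1/m) entries (an assumed scaling), S^T S equals the identity only in expectation; for a single draw the expected squared relative error of S^T S x is (d + 1)/m. This illustrates the geometry only and is not the slide's exact statement, whose symbols were lost.

```python
import math
import random

rng = random.Random(2)
d, m = 400, 20

S = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(d)] for _ in range(m)]
x = [rng.gauss(0.0, 1.0) for _ in range(d)]

# Project down (z = S x), then map back up with the transpose (S^T z).
z = [sum(s * v for s, v in zip(row, x)) for row in S]
x_back = [sum(S[k][j] * z[k] for k in range(m)) for j in range(d)]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


rel_err = norm([a - b for a, b in zip(x_back, x)]) / norm(x)
print(round(rel_err, 2))
```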

Page 20:

Random Projection

• Impossibility theorem: for most random projections S, the primal solution cannot be recovered accurately

Page 21:

Random Projection

• Impossibility theorem: for most random projections S, the primal solution cannot be recovered accurately

Is it possible to overcome the limitation of random projection while still enjoying its simplicity?

Page 22:

Randomized Algorithms for Big Data

Limitations of randomized algorithms

• Random decisions are suboptimal and can be very poor

How can we overcome the fundamental limitations of randomized algorithms in ML?

Page 23:

Dual Random Projection

Random Projection

Page 24:

Dual Random Projection

Random Projection

Compute Dual Variables

Page 25:

Dual Random Projection

Random Projection

Compute Dual Variables

Dual Recovery
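
The three steps (random projection, compute dual variables, dual recovery) can be sketched as follows. The slide formulas were lost, so this sketch swaps in ridge regression, (1/2n) * sum_i (x_i.w - y_i)^2 + (lam/2) * ||w||^2, as an assumed concrete problem. For that loss the dual variable of example i is the residual alpha_i = x_i.w - y_i, and the optimality condition gives w = -(1/(lam * n)) * sum_i alpha_i * x_i. All function names, sizes, and the plain gradient-descent solver are hypothetical choices for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ridge_gd(X, y, lam, iters=80):
    """Gradient descent on (1/2n) sum_i (x_i.w - y_i)^2 + (lam/2) ||w||^2."""
    n, d = len(X), len(X[0])
    # Step 1/L, with L bounded by lam plus the average squared row norm.
    step = 1.0 / (lam + sum(dot(x, x) for x in X) / n)
    w = [0.0] * d
    for _ in range(iters):
        resid = [dot(x, w) - yi for x, yi in zip(X, y)]
        grad = [lam * w[j] + sum(r * x[j] for r, x in zip(resid, X)) / n
                for j in range(d)]
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w


def dual_random_projection(X, y, lam, m, rng):
    n, d = len(X), len(X[0])
    # Step 1: random projection with an m x d Gaussian matrix, N(0, 1/m) entries.
    S = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(d)] for _ in range(m)]
    X_low = [[dot(row, x) for row in S] for x in X]
    # Step 2: solve the low dimensional problem and read off the dual variables.
    z = ridge_gd(X_low, y, lam)
    alpha = [dot(x_low, z) - yi for x_low, yi in zip(X_low, y)]
    # Step 3: dual recovery, using the ORIGINAL high dimensional examples.
    w_dual = [-sum(a * x[j] for a, x in zip(alpha, X)) / (lam * n) for j in range(d)]
    # Naive recovery for comparison: map z back with S^T.
    w_naive = [sum(S[k][j] * z[k] for k in range(m)) for j in range(d)]
    return w_dual, w_naive


rng = random.Random(0)
n, d, r, m, lam = 30, 200, 2, 30, 1.0
# Synthetic rank-r data matrix X = A B (rows of X are the examples).
A = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
B = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)] for _ in range(r)]
X = [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(d)] for i in range(n)]
y = [rng.gauss(0.0, 1.0) for _ in range(n)]

w_star = ridge_gd(X, y, lam)  # reference solution of the full problem
w_dual, w_naive = dual_random_projection(X, y, lam, m, rng)


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


err_dual, err_naive = dist(w_dual, w_star), dist(w_naive, w_star)
print(round(err_dual, 4), round(err_naive, 4))
```

The only expensive high dimensional operation is the final weighted sum over examples; the optimization itself runs in m dimensions.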

Page 26:

Dual Random Projection

Recovery property

• If X can be well approximated by a rank-r matrix, then with high probability we have

Page 27:

Dual Random Projection

Recovery property

• If X can be well approximated by a rank-r matrix, then with high probability we have

Page 28:

Why Does Dual Random Projection Work?

• Although the primal solution can’t be recovered accurately via random projection, the dual variables can

• It is closely related to gradient descent

where

Page 29:

Iterative Dual Random Projection

Page 30:

Iterative Dual Random Projection

With high probability

where

Page 31:

Experiment with Synthetic Dataset

• N = 50,000, d = 20,000, r = 10

Page 32:

Experiment with RCV1 Dataset

• 800K documents, 40,000 features

Page 33:

Fine-Grained Visual Classification

• Fine-Grained Challenge 2013 (https://sites.google.com/site/fgcomp2013)

• Categories: aircraft, birds, dogs, shoes, cars

• Number of training images: 100K

Page 34:

Fine-Grained Visual Classification

• # Visual features: 134,016

• Our approach is based on metric learning

• Apply dual random projection to improve computational efficiency

Team                               Performance
Inria-Xerox                        77.1
CafeNet                            75.8
VisionMetric (Our method)          71.7
Symbiotic (University of Oxford)   71.6
CognitiveVision (MSR)              70.0
DPD_Berkeley (Berkeley)            69.2
MPG (University of Tokyo)          52.9
Infor_FG (CMU)                     16.0
InterfAIce (UIUC)                  4.5

Page 35:

Online Display Ads

Advertiser

• Market its products

User

• Find products/service

Platform

• Attract enough traffic

Page 36:

Online Display Ads

Advertiser

• Choose target audience by selecting appropriate tags

Platform

• Match users with ads through tags

Users

• Profile by tag assignments

• Assigned tags with the largest scores (greedy approach)

Tag1 Tag2 …… Tag n

Page 37:

Supply & Demand Mismatch

Advertisers

• Limited budget → limited supplies of tags

Platform

• Match users with ads through tags

Users

• Profile by tag assignments

• Assigned tags with the largest scores (greedy approach)

Tag1 Tag2 …… Tag n

[Figure: supply and demand per tag; the values 5000, 1000, 1000 and 5000 appear in the chart]

Page 38:

Supply and Demand Mismatch (I)

• Assume consumers a & b arrive in random order

• On average, 50% of the time b can’t find a matched ad

Advertiser/Consumer   budget   a     b
A                     1        1.1   1
B                     1        1     0

[Figure: matching of consumers a, b to ads A, B under the two arrival orders (b then a; a then b)]

Page 39:

Supply and Demand Mismatch (II)

• Alternative solution: remove a from the target audience list for ad A

• Both a and b will find their matched ad regardless of their order

Advertiser/Consumer   budget   a     b
A                     1        -     1
B                     1        1     0

[Figure: matching of consumers b, a to ads A, B]
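
The two mismatch examples can be checked by enumerating both arrival orders. This sketch assumes that a budget of 1 means one served ad, that a score of 0 (or a missing entry) means the consumer is not in the ad's target audience, and that each arriving consumer greedily takes the available ad with the largest score. The function names are hypothetical.

```python
from itertools import permutations


def greedy_match(order, scores, budgets):
    """Serve consumers in arrival order; each takes the ad with the largest
    positive score among ads that still have budget (None if no ad is left)."""
    remaining = dict(budgets)
    matched = {}
    for consumer in order:
        options = [(row.get(consumer, 0.0), ad) for ad, row in scores.items()
                   if remaining[ad] > 0 and row.get(consumer, 0.0) > 0]
        if options:
            _, ad = max(options)
            remaining[ad] -= 1
            matched[consumer] = ad
        else:
            matched[consumer] = None
    return matched


def unmatched_rate(consumer, scores, budgets):
    """Fraction of arrival orders in which `consumer` gets no ad."""
    orders = list(permutations(["a", "b"]))
    misses = sum(greedy_match(o, scores, budgets)[consumer] is None for o in orders)
    return misses / len(orders)


budgets = {"A": 1, "B": 1}
original = {"A": {"a": 1.1, "b": 1.0}, "B": {"a": 1.0, "b": 0.0}}
adjusted = {"A": {"b": 1.0}, "B": {"a": 1.0, "b": 0.0}}  # a removed from A's audience

print(unmatched_rate("b", original, budgets))  # 0.5
print(unmatched_rate("b", adjusted, budgets))  # 0.0
print(unmatched_rate("a", adjusted, budgets))  # 0.0
```

In the original table, a arriving first takes ad A (score 1.1) and leaves b with nothing; after the adjustment a can only take B, so both consumers are always served.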

Page 40:

Supply and Demand Mismatch in Alibaba

• Many targets with strong demand (i.e. consumers) but weak supply (i.e. advertisement budgets)

• Many targets with weak demand (i.e. consumers) but strong supply (i.e. advertisement budgets)

Page 41:

Minimize Mismatch: Global Optimization

• Find the best assignment of tags

1. maximize the revenue, and

2. minimize the supply and demand mismatch

• A gigantic optimization problem

• Billions of users and thousands of tags

• Need to find solutions in 2 hours

[Figure: assignment matrix A between users u1, u2, ……, un (10^9 users) and tags a1, a2, ……, am (10^5 tags)]

Page 42:

Minimize Mismatch: Global Optimization

• Apply dual random projection to efficiently find the solution for A

[Figure: assignment matrix A between users u1, u2, ……, un (10^9 users) and ads a1, a2, ……, am (10^4 ads); steps shown: Random Projection, then Obtain optimal solution & dual variables]

Page 43:

Implementation

• Implemented with Map-Reduce

Page 44:

Results in Online Display Ads

• Reduce the supply and demand mismatch

[Figure: supply and demand mismatch before optimization and after optimization]

Page 45:

What Is Next?

• Impossibility theorems exist in many randomized algorithms in ML
  • Passive learning
  • Active learning
  • Data clustering
  • Matrix completion
  • Differential privacy
  • Compressive sensing
  • Low rank matrix approximation
  • ……