Similarity-based Learning via Data Driven Embeddings∗
Purushottam Kar1 Prateek Jain2
1Indian Institute of Technology, Kanpur
2Microsoft Research India, Bengaluru
November 3, 2011
∗To appear in the proceedings of NIPS 2011
P. Kar and P. Jain (IITK/MSRI) · Similarity-based Learning · November 3, 2011
Outline
1 An Introduction to Learning
2 A Brief History of Learning with Similarities
3 Learning with Suitable Similarities
  Learning with a Suitable Similarity Function
  Learning with a Suitable Distance Function
4 Data-sensitive Notions of Suitability
  Learning with Data-sensitive Notions of Suitability
  Learning the Best Notion of Suitability
  Results
5 References
Learning
Digit Classification†
†MNIST database: http://yann.lecun.com/exdb/mnist/
Learning
Spam mail detection
Dear Junta,
The Hall-8 mess will be closed for the occasion of Diwali at lunch & dinner time. The breakfast will be served along with Lunch packets tomorrow (26th October, 2011).
Please collect your Lunch Packet. The mess would resume its normal working from 27th October.
A legitimate mail
Hello, I am resending my previous mail to you, I hope you do get it this time around and understand its content fully. I am contacting you briefly based on the Investment of Forty Five Million Dollars (US$ 45,000,000:00) in your country, as I presently have a client who is interested in investing in your country.
Sincerely Yours,
J. Costa
Most likely a spam mail
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMINAR SERIES
Departmental Colloquium
Title: Similarity-based Learning via Data Driven Embeddings
Speaker: Purushottam Kar
Affiliation: Ph.D. Scholar, CSE Dept., IIT Kanpur
To each his own ...
Learning
More formally ...
We are working over a domain X and wish to learn a target classifier over that domain, ℓ : X → {−1, +1}.
We are given training points S = {x₁, x₂, ..., xₙ} sampled from some distribution D over X, together with their true labels {ℓ(x₁), ..., ℓ(xₙ)}.
Our goal is to output a classifier ℓ̂ : X → {−1, +1} that agrees with the true labels on most of the distribution:

Pr_{x∼D}[ ℓ̂(x) ≠ ℓ(x) ] < ε
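As a toy illustration of this criterion, the misclassification probability under D can be estimated by sampling. This is a minimal sketch, not from the talk: the uniform distribution, the sign-based labeling, and all function names here are illustrative.

```python
import random

# Toy domain X = [-10, 10]; the target labels by sign, and the learned
# classifier deliberately errs on [-1, 0), so Pr[ell_hat(x) != ell(x)]
# equals the mass of that interval: 1/20 = 0.05.
def ell(x):
    return 1 if x >= 0 else -1

def ell_hat(x):
    return 1 if x >= -1 else -1

def estimate_error(classifier, target, sample, n=20000):
    """Monte Carlo estimate of Pr_{x ~ D}[classifier(x) != target(x)]."""
    xs = (sample() for _ in range(n))
    return sum(classifier(x) != target(x) for x in xs) / n

random.seed(0)
err = estimate_error(ell_hat, ell, lambda: random.uniform(-10.0, 10.0))
```

With 20000 samples the estimate concentrates near the true value 0.05, so such a classifier satisfies the criterion for any ε above roughly 0.05.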
Learning
Representing the data
Most learning algorithms (Perceptron, MRF, DBN, SVM, ...) like working with numeric data, i.e. X ⊂ R^d.
How do we make heterogeneous data (images, sound, web data) numeric?
SOLUTION 1: Force a numeric representation by embedding all the data into some Euclidean space R^d,

Φ : X → R^d

  ▸ Easy to do for images: (n × n) pixels ↦ R^{3n²} for RGB images
  ▸ Easier said than done for text, emails, web data (e.g. BoW for text)

SOLUTION 2: Work with some distance/similarity function over the data X
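A concrete sketch of SOLUTION 1 for the two cases mentioned, image pixels and bag-of-words text. The helper names (`rgb_embed`, `bow_embed`) and the tiny vocabulary are illustrative, not part of the talk.

```python
# Phi : X -> R^d written out as an explicit function, for images and text.

def rgb_embed(img):
    """Flatten an n x n RGB image (nested lists of (r, g, b) triples)
    into a vector in R^{3n^2}."""
    return [channel for row in img for pixel in row for channel in pixel]

def bow_embed(text, vocab):
    """Bag-of-words embedding: one occurrence count per vocabulary word."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]           # n = 2
image_vec = rgb_embed(img)                        # lands in R^{12}

text_vec = bow_embed("Buy now buy cheap", ["buy", "cheap", "seminar"])
```

The image case is mechanical, as the slide says; the text case already shows the design burden of SOLUTION 1, since the vector depends entirely on the chosen vocabulary.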
Learning with Similarities
Classical algorithms that learn with similarities
Let K be a similarity measure (or, w.l.o.g., a distance measure).
Nearest neighbor classification:

ℓ̂(x) = ℓ(NN(x)),    NN(x) = arg max_{x′∈S} K(x, x′)

Perceptron algorithm (X ⊂ R^d):

ℓ̂(x) = sgn(⟨w, x⟩) for some w ∈ R^d

Equivalently, with K(x, x′) = ⟨x, x′⟩ and w = Σ_{x′∈S} α(x′) ℓ(x′) x′,

ℓ̂(x) = sgn( Σ_{x′∈S} α(x′) K(x, x′) ℓ(x′) )

SVM allows the use of arbitrary positive semi-definite (PSD) kernels:

ℓ̂(x) = sgn( Σ_{x′∈S} α_SVM(x′) K(x, x′) ℓ(x′) )
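All three decision rules above combine K(x, x′) over the training set. A minimal sketch on a toy 2-D sample; the data points, the uniform weights α, and the function names are illustrative, with K taken as the plain inner product as in the Perceptron case.

```python
def K(x, xp):
    """Similarity via the standard inner product <x, x'>."""
    return sum(a * b for a, b in zip(x, xp))

S = [(1.0, 2.0), (-1.0, -1.0), (2.0, 0.5)]        # training points
label = {S[0]: 1, S[1]: -1, S[2]: 1}              # their true labels l(x')

def sgn(v):
    return 1 if v >= 0 else -1

def nn_predict(x):
    """l_hat(x) = l(NN(x)), where NN(x) = arg max_{x' in S} K(x, x')."""
    return label[max(S, key=lambda xp: K(x, xp))]

def weighted_predict(x, alpha):
    """l_hat(x) = sgn( sum_{x' in S} alpha(x') K(x, x') l(x') ):
    the dual Perceptron rule, and also the SVM rule with alpha = alpha_SVM."""
    return sgn(sum(alpha[xp] * K(x, xp) * label[xp] for xp in S))

alpha = {xp: 1.0 for xp in S}                     # illustrative weights
```

The point of the shared form is that only evaluations of K are needed; nothing in `nn_predict` or `weighted_predict` requires X to be a vector space, so K could be swapped for any similarity function over the raw data.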
Learning with Similarities
Learning with Similarities
A lot of work has gone into incorporating various similarity and distance measures into such frameworks [Pekalska and Duin, 2001, Weinberger and Saul, 2009].
A fair amount went into algorithms that, unlike SVMs, do not require PSD kernels [Goldfarb, 1984].
Some very nice work involves isometric embeddings into (pseudo-)Hilbert / Banach spaces [Gottlieb et al., 2010, von Luxburg and Bousquet, 2004, Haasdonk, 2005].
However, none of this addressed the suitability of the similarity/distance measure for the learning task.
Learning with Similarities
Suitable Similarities
Intuitively, a suitable similarity should give better classifier performance.
It is well known that the choice of kernel has a significant impact on SVM classifier performance.
In general, several domains have preferred notions of similarity (e.g. earth mover's distance for images).
Can formal notions of suitability lead to guaranteed performance?
  ▸ For SVMs, suitability is formalized in terms of the margin offered by the PSD kernel in its RKHS.
  ▸ Having a large margin does lead to generalization bounds [Shawe-Taylor et al., 1998, Balcan et al., 2006].
Can we do the same for non-PSD similarities?
P. Kar and P. Jain (IITK/MSRI) Similarity-based Learning November 3, 2011 9 / 29
![Page 42: Similarity-based Learning via Data Driven EmbeddingsSimilarity-based Learning via Data Driven Embeddings Purushottam Kar1 Prateek Jain2 1Indian Institute of Technology Kanpur 2Microsoft](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f0508097e708231d410eae9/html5/thumbnails/42.jpg)
Learning with Similarities
Suitable Similarities
A suitable similarity should intuitively give better classifierperformanceIt is very well known that the choice of the kernel has a significantimpact on SVM classifier performance
In general, several domains have preferred notions of similarity(e.g. earth mover’s distance for images)Can formal notions of suitability lead to guaranteed performance ?
I For SVMs, suitability is formalized in terms of the margin offered bythe PSD kernel in its RKHS
I Having large margin does lead to generalization bounds[Shawe-Taylor et al., 1998, Balcan et al., 2006]
Can we do the same for non-PSD similarities ?
P. Kar and P. Jain (IITK/MSRI) Similarity-based Learning November 3, 2011 9 / 29
![Page 43: Similarity-based Learning via Data Driven EmbeddingsSimilarity-based Learning via Data Driven Embeddings Purushottam Kar1 Prateek Jain2 1Indian Institute of Technology Kanpur 2Microsoft](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f0508097e708231d410eae9/html5/thumbnails/43.jpg)
Learning with Similarities
Suitable Similarities
A suitable similarity should intuitively give better classifierperformanceIt is very well known that the choice of the kernel has a significantimpact on SVM classifier performanceIn general, several domains have preferred notions of similarity(e.g. earth mover’s distance for images)
Can formal notions of suitability lead to guaranteed performance ?
I For SVMs, suitability is formalized in terms of the margin offered bythe PSD kernel in its RKHS
I Having large margin does lead to generalization bounds[Shawe-Taylor et al., 1998, Balcan et al., 2006]
Can we do the same for non-PSD similarities ?
P. Kar and P. Jain (IITK/MSRI) Similarity-based Learning November 3, 2011 9 / 29
![Page 44: Similarity-based Learning via Data Driven EmbeddingsSimilarity-based Learning via Data Driven Embeddings Purushottam Kar1 Prateek Jain2 1Indian Institute of Technology Kanpur 2Microsoft](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f0508097e708231d410eae9/html5/thumbnails/44.jpg)
Learning with Similarities
Suitable Similarities
A suitable similarity should intuitively give better classifierperformanceIt is very well known that the choice of the kernel has a significantimpact on SVM classifier performanceIn general, several domains have preferred notions of similarity(e.g. earth mover’s distance for images)Can formal notions of suitability lead to guaranteed performance ?
I For SVMs, suitability is formalized in terms of the margin offered bythe PSD kernel in its RKHS
I Having large margin does lead to generalization bounds[Shawe-Taylor et al., 1998, Balcan et al., 2006]
Can we do the same for non-PSD similarities ?
P. Kar and P. Jain (IITK/MSRI) Similarity-based Learning November 3, 2011 9 / 29
![Page 45: Similarity-based Learning via Data Driven EmbeddingsSimilarity-based Learning via Data Driven Embeddings Purushottam Kar1 Prateek Jain2 1Indian Institute of Technology Kanpur 2Microsoft](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f0508097e708231d410eae9/html5/thumbnails/45.jpg)
Learning with Similarities
Suitable Similarities
A suitable similarity should intuitively give better classifierperformanceIt is very well known that the choice of the kernel has a significantimpact on SVM classifier performanceIn general, several domains have preferred notions of similarity(e.g. earth mover’s distance for images)Can formal notions of suitability lead to guaranteed performance ?
I For SVMs, suitability is formalized in terms of the margin offered bythe PSD kernel in its RKHS
I Having large margin does lead to generalization bounds[Shawe-Taylor et al., 1998, Balcan et al., 2006]
Can we do the same for non-PSD similarities ?
P. Kar and P. Jain (IITK/MSRI) Similarity-based Learning November 3, 2011 9 / 29
![Page 46: Similarity-based Learning via Data Driven EmbeddingsSimilarity-based Learning via Data Driven Embeddings Purushottam Kar1 Prateek Jain2 1Indian Institute of Technology Kanpur 2Microsoft](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f0508097e708231d410eae9/html5/thumbnails/46.jpg)
Learning with Similarities
Suitable Similarities
A suitable similarity should intuitively give better classifierperformanceIt is very well known that the choice of the kernel has a significantimpact on SVM classifier performanceIn general, several domains have preferred notions of similarity(e.g. earth mover’s distance for images)Can formal notions of suitability lead to guaranteed performance ?
I For SVMs, suitability is formalized in terms of the margin offered bythe PSD kernel in its RKHS
I Having large margin does lead to generalization bounds[Shawe-Taylor et al., 1998, Balcan et al., 2006]
Can we do the same for non-PSD similarities ?
P. Kar and P. Jain (IITK/MSRI) Similarity-based Learning November 3, 2011 9 / 29
![Page 47: Similarity-based Learning via Data Driven EmbeddingsSimilarity-based Learning via Data Driven Embeddings Purushottam Kar1 Prateek Jain2 1Indian Institute of Technology Kanpur 2Microsoft](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f0508097e708231d410eae9/html5/thumbnails/47.jpg)
Learning with Similarities
Suitable Similarities
A suitable similarity should intuitively give better classifier performance
It is very well known that the choice of the kernel has a significant impact on SVM classifier performance
In general, several domains have preferred notions of similarity (e.g. earth mover's distance for images)
Can formal notions of suitability lead to guaranteed performance?
- For SVMs, suitability is formalized in terms of the margin offered by the PSD kernel in its RKHS
- Having a large margin does lead to generalization bounds [Shawe-Taylor et al., 1998, Balcan et al., 2006]
Can we do the same for non-PSD similarities?
Learning with Suitable Similarities
Outline
1 An Introduction to Learning
2 A Brief History of Learning with Similarities
3 Learning with Suitable Similarities
   Learning with a Suitable Similarity Function
   Learning with a Suitable Distance Function
4 Data-sensitive Notions of Suitability
   Learning with Data-sensitive Notions of Suitability
   Learning the Best Notion of Suitability
   Results
5 References
Learning with Suitable Similarities Learning with a Suitable Similarity Function
What is a good similarity function?

Intuitively, a good similarity function should at least respect the labeling of the domain
It should not assign small similarity to points with the same label and large similarity to distinctly labeled points

Definition ([Balcan and Blum, 2006])
A similarity $K : X \times X \to \mathbb{R}$ is said to be $(\varepsilon, \gamma)$-good for a classification problem if for some weighing function $w : X \to [-1, 1]$, at least a $(1 - \varepsilon)$ probability mass of examples $x \sim D$ satisfies
$$\mathbb{E}_{\substack{x' \sim D,\ \ell(x') = \ell(x) \\ x'' \sim D,\ \ell(x'') \neq \ell(x)}}\left[ w(x')\,K(x, x') - w(x'')\,K(x, x'') \right] \geq \gamma$$

In other words, according to the similarity function, most points, on average, are more similar to points of the same label
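This goodness criterion can be estimated on a finite sample. Below is a minimal Python sketch (our illustration, not code from the talk) that, given a choice of weighing function `w`, uses the sample itself as a proxy for the distribution D and returns the empirical fraction of points violating the margin — an estimate of ε at margin γ:

```python
import numpy as np

def empirical_goodness(X, y, K, w, gamma):
    """Estimate the fraction of sample points violating the
    (eps, gamma)-goodness margin of Balcan & Blum, using the
    sample as a proxy for the underlying distribution D."""
    n = len(y)
    margins = np.empty(n)
    for i in range(n):
        same = [j for j in range(n) if j != i and y[j] == y[i]]
        diff = [j for j in range(n) if y[j] != y[i]]
        # average weighted similarity to same-label vs. other-label points
        pos = np.mean([w(X[j]) * K(X[i], X[j]) for j in same])
        neg = np.mean([w(X[j]) * K(X[i], X[j]) for j in diff])
        margins[i] = pos - neg
    return np.mean(margins < gamma)  # empirical epsilon at margin gamma
```

For instance, with cosine similarity, `w ≡ 1`, and two well-separated clusters, every point clears a modest margin, so the estimate at γ = 0.3 comes out to 0.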
Learning with Suitable Similarities Learning with a Suitable Similarity Function
Learning with a good similarity function
Theorem ([Balcan and Blum, 2006])
Given an $(\varepsilon, \gamma)$-good similarity function, for any $\delta > 0$, given $n = \frac{16}{\gamma^2} \lg \frac{2}{\delta}$ labeled points $(x_i)_{i=1}^{n}$, the classifier $\hat{\ell}$ defined below has error at margin $\gamma/2$ no more than $\varepsilon + \delta$ with probability greater than $1 - \delta$:
$$\hat{\ell}(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} w(x_i)\,\ell(x_i)\,K(x, x_i) \right)$$

Notice that the classifier is very similar in form to the SVM and Perceptron classifiers
Consequently, one can use these algorithms to learn this classifier as well
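The classifier above is just a weighted vote of similarities to labeled landmark points. A minimal sketch (our illustration; function and argument names are ours):

```python
import numpy as np

def similarity_classifier(x, landmarks, labels, weights, K):
    """Balcan-Blum style classifier: sign of a weighted vote of
    similarities to labeled landmark points."""
    score = sum(w * l * K(x, xi)
                for xi, l, w in zip(landmarks, labels, weights))
    return 1 if score >= 0 else -1
```

With the dot product as `K` and unit weights, a point on the positive side of two opposite landmarks votes its way to the +1 label, mirroring the Perceptron's decision rule.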
Learning with Suitable Similarities Learning with a Suitable Distance Function
Outline
1 An Introduction to Learning
2 A Brief History of Learning with Similarities
3 Learning with Suitable Similarities
   Learning with a Suitable Similarity Function
   Learning with a Suitable Distance Function
4 Data-sensitive Notions of Suitability
   Learning with Data-sensitive Notions of Suitability
   Learning the Best Notion of Suitability
   Results
5 References
Learning with Suitable Similarities Learning with a Suitable Distance Function
What is a good distance function
Definition ([Wang et al., 2007])
A distance function $d : X \times X \to \mathbb{R}$ is said to be $(\varepsilon, \gamma, B)$-good for a classification problem if there exist two class conditional probability distributions $D^+$ and $D^-$ such that for all $x \in X$, $\frac{D^+(x)}{D(x)} < \sqrt{B}$ and $\frac{D^-(x)}{D(x)} < \sqrt{B}$, and at least a $(1 - \varepsilon)$ probability mass of examples $x \sim D$ satisfies
$$\Pr_{\substack{x' \sim D^+ \\ x'' \sim D^-}}\left[ \ell(x)\left( d(x, x') - d(x, x'') \right) < 0 \right] \geq \frac{1}{2} + \gamma$$

The definition expects the distance function to set dissimilarly labeled points farther off than similarly labeled points
Yet again, this yields a classifier with guaranteed generalization properties
Learning with Suitable Similarities Learning with a Suitable Distance Function
Learning with a good distance function
Theorem ([Wang et al., 2007])
Given an $(\varepsilon, \gamma, B)$-good distance function, for any $\delta > 0$, given $n = \frac{4B^2}{\gamma^2} \lg \frac{1}{\delta}$ pairs of positively and negatively labeled points $(x_i^+, x_i^-)_{i=1}^{n}$, the classifier $\hat{\ell}$ defined below has error at margin $\gamma/B$ no more than $\varepsilon + \delta$ with probability greater than $1 - \delta$:
$$\hat{\ell}(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \beta_i \operatorname{sgn}\left( d(x, x_i^+) - d(x, x_i^-) \right) \right), \qquad \sum_{i=1}^{n} \beta_i = 1,\ \beta_i \geq 0$$

This naturally lends itself to a boosting-like implementation
Each of the pairs yields a stump $\operatorname{sgn}\left( d(x, x_i^+) - d(x, x_i^-) \right)$
Data-sensitive Notions of Suitability
Outline
1 An Introduction to Learning
2 A Brief History of Learning with Similarities
3 Learning with Suitable Similarities
   Learning with a Suitable Similarity Function
   Learning with a Suitable Distance Function
4 Data-sensitive Notions of Suitability
   Learning with Data-sensitive Notions of Suitability
   Learning the Best Notion of Suitability
   Results
5 References
Data-sensitive Notions of Suitability
A unified notion of what is a good similarity/distance
Disparate as the last two models may seem, they are, in fact, quite related to each other
Motivated by this observation, we propose a notion of goodness that is data-sensitive
This notion allows us to tune the goodness notion itself, allowing for better classifiers
The resulting model subsumes the previous two models
Consequently, the model does not require separate treatment for similarity and distance functions either
Data-sensitive Notions of Suitability
What is a good similarity/distance function
Definition (K. and Jain, 2011)
A similarity function $K : X \times X \to \mathbb{R}$ is said to be $(\varepsilon, \gamma, B)$-good for a classification problem if for some antisymmetric transfer function $f : \mathbb{R} \to [-C_f, C_f]$ and some weighing function $w : X \times X \to [-B, B]$, at least a $(1 - \varepsilon)$ probability mass of examples $x \sim D$ satisfies
$$\mathbb{E}_{\substack{x' \sim D,\ \ell(x') = \ell(x) \\ x'' \sim D,\ \ell(x'') \neq \ell(x)}}\left[ w(x', x'')\, f\left( K(x, x') - K(x, x'') \right) \right] \geq 2 C_f \gamma$$
With appropriate setting of the weighing function and the transferfunction, the previous two models can be recovered.
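As a rough illustration of the role of the transfer function (our sketch, not the paper's formal reductions): a linear f keeps the criterion close to Balcan-Blum-style averages of similarity gaps, while f = sgn turns each gap into a comparison, close in spirit to the probability form of Wang et al. The helper below evaluates the empirical goodness margin for one point under a chosen f:

```python
import numpy as np

# Candidate antisymmetric transfer functions f : R -> [-C_f, C_f]
transfers = {
    "linear": lambda t: np.clip(t, -1.0, 1.0),  # average similarity gaps
    "sign":   lambda t: np.sign(t),             # probability-style comparisons
    "tanh":   lambda t: np.tanh(t),             # smooth interpolation
}

def unified_goodness_margin(x, same, diff, K, w, f):
    """Empirical value of E[w(x', x'') f(K(x, x') - K(x, x''))] for one
    point x, averaging over all same-label / different-label pairs."""
    vals = [w(xp, xn) * f(K(x, xp) - K(x, xn)) for xp in same for xn in diff]
    return float(np.mean(vals))
```

Swapping f changes which similarity functions count as "good", which is exactly the tuning knob the data-sensitive notion exposes.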
Data-sensitive Notions of Suitability Learning with Data-sensitive Notions of Suitability
Outline
1 An Introduction to Learning
2 A Brief History of Learning with Similarities
3 Learning with Suitable Similarities
   Learning with a Suitable Similarity Function
   Learning with a Suitable Distance Function
4 Data-sensitive Notions of Suitability
   Learning with Data-sensitive Notions of Suitability
   Learning the Best Notion of Suitability
   Results
5 References
Data-sensitive Notions of Suitability Learning with Data-sensitive Notions of Suitability
Learning with data-sensitive notions of suitability
The learning algorithm is not as simple as before since theguarantees we give hold only if the a good transfer function ischosen.Let us first see how, given a (good) transfer function, can we learna (good) classifier.We will later on plug in the routines to learn the transfer functionas well.
P. Kar and P. Jain (IITK/MSRI) Similarity-based Learning November 3, 2011 16 / 29
![Page 76: Similarity-based Learning via Data Driven EmbeddingsSimilarity-based Learning via Data Driven Embeddings Purushottam Kar1 Prateek Jain2 1Indian Institute of Technology Kanpur 2Microsoft](https://reader033.vdocuments.mx/reader033/viewer/2022053000/5f0508097e708231d410eae9/html5/thumbnails/76.jpg)
Data-sensitive Notions of Suitability Learning with Data-sensitive Notions of Suitability
Learning with data-sensitive notions of suitability
Algorithm 1 LEARN-DISSIM
Require: A similarity function K, landmark pairs L = (x_i^+, x_i^-)_{i=1}^n, a good transfer function f.
Ensure: A classifier ĥ : X → {−1, +1}
1: Define Φ_L : X → R^n as Φ_L : x ↦ ( f(K(x, x_i^+) − K(x, x_i^-)) )_{i=1}^n
2: Get a labeled training set T = {t_j}_{j=1}^{n′} ⊂ X sampled from D.
3: T′ ← {Φ_L(t_j)}_{j=1}^{n′} ⊂ R^n, the data set embedded in R^n.
4: Learn a linear hyperplane over R^n using T′: ℓ_lin ← LEARN-LINEAR(T′).
5: Define ĥ : X → {−1, +1} as ĥ : x ↦ ℓ_lin(Φ_L(x)).
6: return ĥ

LEARN-LINEAR may be any linear hyperplane learning algorithm, such as Perceptron or SVM.
The above procedure essentially creates a data-driven, problem-specific embedding of the domain X into a Euclidean space.
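The embedding and learning steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the similarity K, transfer function f, and toy data are assumptions, and a plain perceptron stands in for LEARN-LINEAR.

```python
import numpy as np

def embed(X, landmarks, K, f):
    """Phi_L: map each row x of X to ( f(K(x, x_i+) - K(x, x_i-)) ) over all landmark pairs."""
    return np.array([[f(K(x, xp) - K(x, xm)) for (xp, xm) in landmarks] for x in X])

def perceptron(Z, y, epochs=50):
    """A plain perceptron standing in for LEARN-LINEAR."""
    w = np.zeros(Z.shape[1])
    for _ in range(epochs):
        for z, yi in zip(Z, y):
            if yi * (w @ z) <= 0:   # misclassified (or on the boundary): update
                w += yi * z
    return w

# Toy instantiation: K and f below are illustrative choices, not the paper's.
K = lambda a, b: float(a @ b)       # inner-product similarity
f = np.tanh                         # an antisymmetric transfer function
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = np.sign(X[:, 0] + 0.1)          # labels in {-1, +1}
pos, neg = X[y > 0], X[y < 0]
landmarks = [(pos[i], neg[i]) for i in range(5)]

w = perceptron(embed(X, landmarks, K, f), y)
h = lambda x: 1 if w @ embed(np.array([x]), landmarks, K, f)[0] > 0 else -1
```

The final classifier ĥ is just a linear rule applied to the 5-dimensional landmarked embedding.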
The results given earlier guarantee small classification error at large margin.
However, this is not amenable to efficient algorithms, since hyperplane classification error is NP-hard to minimize [Garey and Johnson, 1979, Arora et al., 1997].
We instead provide our guarantees in terms of smooth Lipschitz losses, such as hinge loss and log loss, which can be minimized efficiently over large datasets.
Working with surrogate loss functions
Definition (K. and Jain, 2011)
A similarity function is said to be (ε, B)-good with respect to a loss function L : R → R⁺ if, for some transfer function f : R → R and some weighing function w : X × X → [−B, B], we have E_{x∼D}[L(G(x))] ≤ ε, where

G(x) = E_{x′∼D, ℓ(x′)=ℓ(x); x′′∼D, ℓ(x′′)≠ℓ(x)} [ w(x′, x′′) f(K(x, x′) − K(x, x′′)) ]

Theorem (K. and Jain, 2011)
If K is an (ε, B)-good similarity function with respect to a C_L-Lipschitz loss function L, then for any ε_1 > 0, with probability at least 1 − δ over the choice of d = (16 B² C_L² / ε_1²) ln(4B/(δ ε_1)) landmark pairs, the expected loss of the classifier ĥ(x) returned by LEARN-DISSIM with respect to L satisfies E_x[L(ĥ(x))] ≤ ε + ε_1.
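As a quick sanity check, the landmark bound d in the theorem can be evaluated numerically; the parameter values below are arbitrary illustrations, not values from the paper.

```python
import math

def landmark_pairs(B, C_L, eps1, delta):
    """d = (16 B^2 C_L^2 / eps1^2) * ln(4B / (delta * eps1)), rounded up to an integer."""
    return math.ceil((16 * B**2 * C_L**2 / eps1**2) * math.log(4 * B / (delta * eps1)))

# e.g. B = C_L = 1, eps1 = 0.1, delta = 0.05
d = landmark_pairs(B=1.0, C_L=1.0, eps1=0.1, delta=0.05)
```

Note the dependence on ε_1 is the dominant one: halving ε_1 roughly quadruples the number of landmark pairs required.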
Learning the transfer function
We give uniform convergence guarantees that enable standard ERM-based routines to recover the best transfer function from any compact class of antisymmetric functions.
This yields a nested learning problem, with the ERM-based transfer function learning algorithm calling the classifier learning algorithm as a subroutine.
For any transfer function f and arbitrary set of landmarks L, let L(f) = E_{x∼D}[L(G(x))], and let L(f, L) denote the generalization loss of the best classifier that uses the embedding Φ_L defined by the landmarks L.
The earlier result shows that, for a fixed f and a large enough random L, L(f, L) ≤ L(f) + ε_1.
Theorem (K. and Jain, 2011)
Let F be a compact class of transfer functions with respect to the infinity norm and let ε_1, δ > 0. Let N(F, r) be the size of the smallest ε-net over F with respect to the infinity norm at scale r = ε_1/(4 C_L B). Taking n = (64 B² C_L² / ε_1²) ln(16B · N(F, r)/(δ ε_1)) random landmark pairs, we have, with probability greater than 1 − δ,

sup_{f∈F} |L(f, L) − L(f)| ≤ ε_1

Algorithm 2 FTUNE
Require: A family of transfer functions F, a similarity function K, a loss function L.
Ensure: An optimal transfer function f* ∈ F.
1: Select d landmark pairs L.
2: for all f ∈ F do
3:   w_f ← LEARN-DISSIM(K, L, f), L_f ← L(f, L)
4: end for
5: f* ← arg min_{f∈F} L_f
6: return f*
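FTUNE can be sketched for a finite family of transfer functions as follows. This is a rough illustration under several assumptions: least squares stands in for LEARN-DISSIM's linear learner, the hinge loss on the training set stands in for the loss L(f, L) (the paper evaluates losses properly, e.g. via validation), and K, the family, and the data are toy choices.

```python
import numpy as np

def embed(X, landmarks, K, f):
    """Phi_L under transfer function f (see LEARN-DISSIM)."""
    return np.array([[f(K(x, xp) - K(x, xm)) for (xp, xm) in landmarks] for x in X])

def ftune(X, y, landmarks, K, F):
    """Return the transfer function in the finite family F with the smallest loss."""
    def score(f):
        Z = embed(X, landmarks, K, f)
        w, *_ = np.linalg.lstsq(Z, y, rcond=None)       # train a linear classifier
        return np.maximum(0, 1 - y * (Z @ w)).mean()    # hinge loss of that classifier
    return min(F, key=score)

# toy run over a small family of antisymmetric transfer functions (illustrative)
K = lambda a, b: float(a @ b)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = np.sign(X[:, 0] + 0.05)
landmarks = [(X[y > 0][i], X[y < 0][i]) for i in range(4)]
family = [np.tanh, lambda t: t, lambda t: np.sign(t)]
best = ftune(X, y, landmarks, K, family)
```

Each candidate f triggers a full classifier-learning run, which is exactly the nested structure described above.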
Intelligent choice of landmark points
If the landmarks are clumped together, then all points receive a similar embedding and linear separation becomes impossible.
Thus we promote diversity among the landmarks as a heuristic on small datasets.
On large datasets, FTUNE itself is able to recover the best transfer function, as it does not over-fit.

Algorithm 3 DSELECT
Require: A training set T.
Ensure: A set of n landmark pairs.
1: S ← RANDOM-ELEMENT(T), L ← ∅
2: for j = 2 to n do
3:   z ← arg min_{x∈T} Σ_{x′∈S} K(x, x′)
4:   S ← S ∪ {z}, T ← T \ {z}
5: end for
6: for j = 1 to n do
7:   Sample z_1, z_2 from S with replacement s.t. ℓ(z_1) = 1, ℓ(z_2) = −1
8:   L ← L ∪ {(z_1, z_2)}
9: end for
10: return L
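DSELECT can be sketched as follows; the greedy step and the pairing step mirror the pseudocode, while the similarity function and data in the toy run are illustrative assumptions.

```python
import numpy as np

def dselect(points, labels, K, n, seed=0):
    """Sketch of DSELECT: greedily pick n mutually dissimilar points, then sample
    n (positive, negative) landmark pairs from them with replacement.
    Assumes both classes appear among the selected points."""
    rng = np.random.default_rng(seed)
    pool = list(range(len(points)))
    S = [pool.pop(rng.integers(len(pool)))]              # RANDOM-ELEMENT(T)
    for _ in range(n - 1):
        # next landmark: least similar, in total, to those already chosen
        z = min(pool, key=lambda i: sum(K(points[i], points[j]) for j in S))
        S.append(z)
        pool.remove(z)
    pos = [i for i in S if labels[i] == 1]
    neg = [i for i in S if labels[i] == -1]
    return [(points[rng.choice(pos)], points[rng.choice(neg)]) for _ in range(n)]

# toy run: orthonormal points with alternating labels (illustrative data)
pts, lbls = np.eye(6), [1, -1, 1, -1, 1, -1]
pairs = dselect(pts, lbls, lambda a, b: float(a @ b), n=4)
```

The greedy `min` step is what spreads the landmarks out: each new point is the one with the smallest total similarity to the set chosen so far.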
Results
[Figure: accuracy vs. number of landmarks (50–300) on AmazonBinary, Amazon47, Mirex07, and FaceRec, comparing FTUNE+D, FTUNE, BBS+D, BBS, and DBOOST.]
[Figure: accuracy vs. number of landmarks (50–300) on Isolet, Letters, Pen-digits, and Opt-digits, comparing FTUNE (Single), FTUNE (Multiple), BBS, and DBOOST.]
Discussion
BBS performs reasonably well for small landmarking sizes, while DBOOST performs well for large landmarking sizes.
In contrast, our method consistently outperforms the existing methods in both scenarios.
Since FTUNE selects its output by way of validation, it is susceptible to over-fitting on small datasets.
In these cases, DSELECT (intuitively) removes redundancies in the landmark points, allowing FTUNE to recover the best transfer function.
Thanks
Preprint available at http://www.cse.iitk.ac.in/users/purushot/
References I
Arora, S., Babai, L., Stern, J., and Sweedyk, Z. (1997). The Hardness of Approximate Optima in Lattices, Codes, and Systems of Linear Equations. Journal of Computer and System Sciences, 54(2):317–331.

Balcan, M.-F. and Blum, A. (2006). On a Theory of Learning with Similarity Functions. In International Conference on Machine Learning, pages 73–80.

Balcan, M.-F., Blum, A., and Vempala, S. (2006). Kernels as Features: On Kernels, Margins, and Low-dimensional Mappings. Machine Learning, 65(1):79–94.

Garey, M. R. and Johnson, D. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco.

Goldfarb, L. (1984). A Unified Approach to Pattern Recognition. Pattern Recognition, 17(5):575–582.
References II
Gottlieb, L.-A., Kontorovich, A. L., and Krauthgamer, R. (2010). Efficient Classification for Metric Data. In Annual Conference on Computational Learning Theory.

Haasdonk, B. (2005). Feature Space Interpretation of SVMs with Indefinite Kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4):482–492.

Kar, P. and Jain, P. (2011). Similarity-based Learning via Data Driven Embeddings. In 25th Annual Conference on Neural Information Processing Systems. To appear.

Pekalska, E. and Duin, R. P. W. (2001). On Combining Dissimilarity Representations. In Multiple Classifier Systems, pages 359–368.

Shawe-Taylor, J., Bartlett, P. L., Williamson, R. C., and Anthony, M. (1998). Structural Risk Minimization Over Data-Dependent Hierarchies. IEEE Transactions on Information Theory, 44(5):1926–1940.
References III
von Luxburg, U. and Bousquet, O. (2004). Distance-Based Classification with Lipschitz Functions. Journal of Machine Learning Research, 5:669–695.

Wang, L., Yang, C., and Feng, J. (2007). On Learning with Dissimilarity Functions. In International Conference on Machine Learning, pages 991–998.

Weinberger, K. Q. and Saul, L. K. (2009). Distance Metric Learning for Large Margin Nearest Neighbor Classification. Journal of Machine Learning Research, 10:207–244.