Consistent Phrase Relevance Measures

Scott Wen-tau Yih & Chris Meek, Microsoft Research

Page 1: Consistent Phrase Relevance Measures

Consistent Phrase Relevance Measures

Scott Wen-tau Yih & Chris Meek

Microsoft Research

Page 2: Consistent Phrase Relevance Measures

Why Measure Phrase Relevance?

Keyword-driven Online Advertising

Sponsored Search: ads with bid keywords that match the query

Contextual Advertising (keyword-based): ads with bid keywords that are relevant to the content

Delivering relevant ads leads to problems related to phrase relevance measures.

Page 3: Consistent Phrase Relevance Measures

Sponsored Search

Query: flight to kyoto

Are these ads relevant to the query?

Page 4: Consistent Phrase Relevance Measures

Contextual Advertising

How relevant are the keywords behind the ads?

Page 5: Consistent Phrase Relevance Measures

Problem – Phrase Relevance Measures

Given a document d and a phrase ph, we want to measure whether ph is relevant to d (e.g., p(ph|d))

Applications – judging ad relevance

Sponsored search (query vs. ad landing page): ad relevance verification, i.e., whether a keyword/query is relevant to the page

Contextual advertising (page vs. bid keyword): external keyword verification, i.e., whether the new keyword is relevant to the content page

Page 6: Consistent Phrase Relevance Measures

Keyword Extraction for In-doc Phrases

For in-document phrases, we can use a keyword extractor (KEX) directly [Yih et al. WWW-06]

Machine learning model learned by logistic regression

Uses more than 10 categories of features, e.g., position, format, hyperlink, etc.

Example document: "Digital Camera Review. The new flagship of Canon’s S-series, PowerShot S80 digital camera, incorporates 8 megapixels for shooting still images and a movie mode that records an impressive 1024 x 768 pixels."

Example document: "TrueCredit. Get immediate access to your complete credit report from 3 credit bureaus. Just $14.95 per month, including $25K ID Theft insurance. Contact TransUnion for more detail…"

KEX output for the TrueCredit document:

truecredit 0.879
transunion 0.705
credit bureaus 0.637
id theft 0.138

What if the phrase is NOT in the document?

Page 7: Consistent Phrase Relevance Measures

Challenges of Handling Out-of-doc Phrases

Given a document d and a phrase ph that is not in d, estimate the probability that ph is relevant to d

Document: "TrueCredit. Get immediate access to your complete credit report from 3 credit bureaus. Just $14.95 per month, including $25K ID Theft insurance. Contact TransUnion for more detail…"

In-doc phrases (KEX probabilities):

truecredit 0.879
transunion 0.705
credit bureaus 0.637
id theft 0.138

Out-of-doc phrases (probabilities unknown):

credit bureau report ?
credit report services ?
equifax credit bureau ?
equifax credit report ?
exquifax ?
equfax ?
trans union canada ?

Page 8: Consistent Phrase Relevance Measures

Challenges of Handling Out-of-doc Phrases

Given a document d and a phrase ph that is not in d, estimate the probability that ph is relevant to d

Challenges

How do we measure it? Out-of-doc phrases lack the contextual information that in-doc phrases have

The estimates should be consistent with the probabilities of in-doc phrases; some method may be needed to calibrate the probabilities

Page 9: Consistent Phrase Relevance Measures

Two Approaches

Calibrated cosine similarity methods: treat in-doc and out-of-doc phrases equally; map cosine similarity scores to probabilities

Regression methods based on semantic kernels: given robust in-doc phrase relevance measures, predict out-of-doc phrase relevance using the similarity between the target phrase and in-doc phrases

Regression methods achieve better empirical results

Page 10: Consistent Phrase Relevance Measures

Outline

Introduction

Relevance measures using cosine similarity

Out-of-doc phrase relevance measure using Gaussian process regression

Experiments

Conclusions

Page 11: Consistent Phrase Relevance Measures

Similarity-based Measures

Step 1: Estimate sim(d,ph) → R

Represent d as a sparse word vector: the words in document d, associated with weights

Vec(d) = {‘truecredit’,0.9; ‘transunion’,0.7; ‘access’,0.1; … }

Represent ph as a sparse word vector via query expansion: issue ph as a query to a search engine and let the result page be document d’

Vec(ph) ← Vec(d’)

sim(d,ph) = cosine(Vec(d),Vec(ph))

Choices of term-weighting schemes: bag of words (SimBin), TFIDF (SimTFIDF), keyword extraction (SimKEX)
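The cosine step can be sketched in a few lines of Python. This is only an illustration: the sparse vectors are stored as plain dictionaries, the weights in `vec_ph` are made up (a real Vec(ph) would come from the search-result page used for query expansion), and the function name `cosine` is an assumption rather than anything defined in the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    # Iterate over the smaller dict so the dot product touches fewer terms.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)

# Vec(d) from the slide (truncated to the three terms shown there).
vec_d = {"truecredit": 0.9, "transunion": 0.7, "access": 0.1}
# Hypothetical Vec(ph): weights invented for illustration only.
vec_ph = {"transunion": 0.8, "credit": 0.6}

sim = cosine(vec_d, vec_ph)
```

Swapping the term-weighting scheme (binary, TFIDF, or KEX scores) only changes the weights stored in the dictionaries; the cosine computation itself stays the same.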

Page 12: Consistent Phrase Relevance Measures

Map Similarity Scores to Probabilities

Step 2: Map sim(d,ph) to prob(ph|d)

Via a sigmoid function where the weights α and β are pre-learned [Platt ’00]

$$f = \log \frac{\mathrm{sim}(d,ph)}{1 - \mathrm{sim}(d,ph)}$$

$$\mathrm{prob}(ph|d) = \frac{1}{1 + \exp(\alpha f + \beta)}$$

The sigmoid function can be used to combine multiple relevance scores

SimCombine: Combine SimBin, SimTFIDF & SimKEX

$$\mathrm{prob}(ph|d) = \frac{1}{1 + \exp\left(\sum_{i=1}^{m} \alpha_i f_i + \beta\right)}$$
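A minimal sketch of this calibration step, assuming the Platt-style form prob(ph|d) = 1 / (1 + exp(Σ α_i f_i + β)) with f_i the log-odds of the i-th similarity score. The weight values below are placeholders chosen for illustration; in practice they would be learned from labeled data. The clipping constant `eps` is also an assumption, added only to keep the logarithm finite.

```python
import math

def log_odds(sim, eps=1e-6):
    """f = log(sim / (1 - sim)), with sim clipped away from 0 and 1."""
    s = min(max(sim, eps), 1.0 - eps)
    return math.log(s / (1.0 - s))

def calibrated_prob(sims, alphas, beta):
    """Map one or more similarity scores to a probability with a sigmoid.

    With a single score this is the per-measure mapping; with several
    scores it is the combined (SimCombine-style) mapping.
    """
    z = sum(a * log_odds(s) for a, s in zip(alphas, sims)) + beta
    return 1.0 / (1.0 + math.exp(z))

# Single measure. With alpha = -1 and beta = 0 the mapping is the identity,
# which is a convenient sanity check.
p_single = calibrated_prob([0.3], alphas=[-1.0], beta=0.0)

# Three measures (e.g., binary, TFIDF, and KEX weighting).
# The scores and weights here are hypothetical.
p_combined = calibrated_prob([0.2, 0.5, 0.7], alphas=[-0.5, -1.0, -1.5], beta=0.1)
```

In this form a negative α makes the probability increase with the similarity score, which is the direction one would expect a learned weight to take.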

Page 13: Consistent Phrase Relevance Measures

Outline

Introduction

Relevance measures using cosine similarity

Out-of-doc phrase relevance measure using Gaussian process regression

Experiments

Conclusions

Page 14: Consistent Phrase Relevance Measures

Regression-based Measures: Intuition

Relevant in-doc phrases: TrueCredit, TransUnion

Out-of-doc phrases: credit bureau report vs. Olympics

Which out-of-doc phrase is more relevant?

Document: "TrueCredit. Get immediate access to your complete credit report from 3 credit bureaus. Just $14.95 per month, including $25K ID Theft insurance. Contact TransUnion…"

Page 15: Consistent Phrase Relevance Measures

Regression-based Measures: Procedure

Step 1: Estimate probabilities of in-doc phrases

KEX(d) = {(‘truecredit’,0.88), (‘transunion’,0.71), (‘credit bureaus’,0.64), (‘id theft’,0.14)}

Step 2: Represent each phrase as a TFIDF vector via query expansion

x1 = Vec(‘truecredit’), y1 = 0.88; x2 = Vec(‘transunion’), y2 = 0.71

x3 = Vec(‘credit bureaus’), y3 = 0.64; x4 = Vec(‘id theft’), y4 = 0.14

Step 3: Represent the target phrase ph as a vector

x = Vec(ph), y = ?

Step 4: Use a regression model to predict y

Input: (x1, y1), …, (xn, yn) and x

Output: y

Page 16: Consistent Phrase Relevance Measures

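Gaussian Process Regression (GPR)

We don’t specify the functional form of the regression model

Instead, we only need to specify the “kernel function” k(x1, x2): linear kernel, polynomial kernel, RBF kernel, etc.

Conceptually, the kernel function tells how similar x1 & x2 are

Changing the kernel function changes the regression function, e.g., linear kernel → Bayesian linear regression

Inputs to GPR: the training pairs (x1,y1), (x2,y2), …, (xn,yn); the target vector x; and a kernel function, e.g., k(xi,xj) = xi·xj

Output of GPR:

$$y = \mathbf{k}^{\mathsf{T}} (K + \sigma_n^2 I)^{-1} \mathbf{y}$$

where K is the n×n matrix with entries K_ij = k(xi,xj), **k** = (k(x,x1), …, k(x,xn)), **y** = (y1, …, yn), and σ_n² is the noise variance

O(N^3) from matrix inversion, where N ≤ 20 typically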
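The GPR prediction can be sketched in plain Python using the standard predictive mean k^T (K + σ² I)^{-1} y with a linear kernel. Everything concrete below is an assumption made for illustration: the expanded vectors in `xs` are invented (real ones would come from query expansion), the noise variance of 0.1 is arbitrary, and the helper names `linear_kernel`, `solve`, and `gpr_predict` are hypothetical. A small Gaussian elimination routine stands in for a linear-algebra library, which is adequate for N ≤ 20.

```python
def linear_kernel(u, v):
    """k(u, v) = u . v for sparse {term: weight} vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(term, 0.0) for term, w in u.items())

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def gpr_predict(xs, ys, x, kernel=linear_kernel, noise_var=0.1):
    """Predictive mean k^T (K + noise_var * I)^{-1} y for target vector x."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    weights = solve(K, ys)               # (K + noise_var * I)^{-1} y
    k = [kernel(x, xi) for xi in xs]     # similarity of x to each in-doc phrase
    return sum(ki * wi for ki, wi in zip(k, weights))

# Hypothetical expanded vectors for the four in-doc phrases; the targets
# are the KEX probabilities from the procedure slide.
xs = [
    {"credit": 0.8, "report": 0.5, "truecredit": 0.3},
    {"credit": 0.6, "transunion": 0.7, "bureau": 0.4},
    {"credit": 0.7, "bureau": 0.7},
    {"identity": 0.8, "theft": 0.6},
]
ys = [0.88, 0.71, 0.64, 0.14]

related = gpr_predict(xs, ys, {"credit": 0.7, "bureau": 0.5, "report": 0.5})
unrelated = gpr_predict(xs, ys, {"olympics": 0.9, "games": 0.4})
```

A target phrase that shares no terms with any in-doc phrase has an all-zero kernel vector, so its predicted relevance is zero, which matches the intuition that an unrelated phrase should score low. Note that the raw predictive mean is not constrained to [0, 1]; this sketch does not clamp it.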

Page 17: Consistent Phrase Relevance Measures

Outline

Introduction

Relevance measures using cosine similarity

Out-of-doc phrase relevance measure using Gaussian process regression

Experiments

Conclusions

Page 18: Consistent Phrase Relevance Measures

Data

From sponsored search ad-click logs (3-month period in 2007)

Randomly selected 867 English ad landing pages

Each page is associated with the original query and ~10 related keywords (from internal query suggestion algorithms)

Labeled 9,319 document-keyword pairs: 4,381 (47%) relevant; 4,938 (53%) irrelevant

Most keywords (81.9%) are out-of-document

10-fold cross-validation when learning is used

Page 19: Consistent Phrase Relevance Measures

Evaluation Metrics

Accuracy: quality of binary classification; false positives and false negatives are treated equally

AUC (Area Under the ROC Curve): quality of ranking; equivalent to pair-wise accuracy

Cross Entropy: quality of probability estimates

-log2[p(ph|d)] if ph is labeled relevant to d

-log2[1-p(ph|d)] if ph is labeled irrelevant to d
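The three metrics can be sketched as follows. Two details are assumptions not stated on the slide: accuracy uses a 0.5 decision threshold, and cross entropy is averaged over all labeled pairs. AUC is computed directly as pair-wise accuracy (ties counted as half), which is quadratic in the number of examples but easy to verify by hand.

```python
import math

def accuracy(probs, labels, threshold=0.5):
    """Fraction of pairs whose thresholded prediction matches the label."""
    correct = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return correct / len(labels)

def auc(probs, labels):
    """Pair-wise accuracy: how often a relevant pair outscores an irrelevant one."""
    pos = [p for p, y in zip(probs, labels) if y]
    neg = [p for p, y in zip(probs, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_entropy(probs, labels, eps=1e-12):
    """Average of -log2 p for relevant pairs and -log2 (1 - p) for irrelevant ones."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep the logarithm finite
        total += -math.log2(p) if y else -math.log2(1.0 - p)
    return total / len(labels)

# Toy predictions and labels (1 = relevant, 0 = irrelevant), for illustration.
probs = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]

acc = accuracy(probs, labels)        # 2 of 4 correct at threshold 0.5
rank_quality = auc(probs, labels)    # 3 of 4 relevant/irrelevant pairs ordered correctly
bits = cross_entropy(probs, labels)
```

Under this averaged definition, a method that always outputs 0.5 has a cross entropy of exactly 1 bit, which gives a reference point for reading the reported values.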

Page 20: Consistent Phrase Relevance Measures

Accuracy

[Bar chart comparing five methods (method labels not recoverable); higher is better. Accuracy values as extracted: 0.651, 0.663, 0.654, 0.681, 0.704]

Page 21: Consistent Phrase Relevance Measures

AUC Scores

[Bar chart comparing five methods (method labels not recoverable); higher is better. AUC values as extracted: 0.702, 0.726, 0.726, 0.752, 0.773]

Page 22: Consistent Phrase Relevance Measures

Cross Entropy

[Bar chart comparing five methods (method labels not recoverable); lower is better. Cross entropy values as extracted: 0.939, 0.887, 0.882, 0.864, 0.835]

Page 23: Consistent Phrase Relevance Measures

Conclusions (1/2)

Phrase relevance measurement is a crucial task for online advertising

Our solution: similarity- and regression-based methods that give consistent probabilities for out-of-doc phrases

Similarity-based methods: simple and straightforward; the combined approach can lead to decent performance

Regression-based methods: achieved the best results in our experiments; quality depends on the in-doc relevance estimates and the kernel

Page 24: Consistent Phrase Relevance Measures

Conclusions (2/2)

Future Work – More machine learning techniques

SimCombine: an ML method using basic similarity measures as features

Explore more features (e.g., query frequency, page quality)

Other machine learning models

Gaussian process regression: learning a better kernel function

Kernel meta-training [Platt et al. NIPS-14]

Maximum likelihood training