Consistent Phrase Relevance Measures
Scott Wen-tau Yih & Chris Meek
Microsoft Research
Why Measure Phrase Relevance?
Keyword-driven Online Advertising
Sponsored Search
  Ads with bid keywords that match the query
Contextual Advertising (keyword-based)
  Ads with bid keywords that are relevant to the content
Delivering relevant ads raises the problem of measuring phrase relevance.
Sponsored Search
Query: flight to kyoto
Are these ads relevant to the query?
Contextual Advertising
How relevant are the keywords behind the ads?
Problem – Phrase Relevance Measures
Given a document d and a phrase ph, we want to measure whether ph is relevant to d (e.g., p(ph|d))
Applications – judging ad relevance
Sponsored search (query vs. ad landing page)
  Ad relevance verification: whether a keyword/query is relevant to the page
Contextual advertising (page vs. bid keyword)
  External keyword verification: whether the new keyword is relevant to the content page
Keyword Extraction for In-doc Phrases
For in-document phrases, we can use the keyword extractor (KEX) directly [Yih et al. WWW-06]
  Machine learning model learned by logistic regression
  Uses more than 10 categories of features, e.g., position, format, hyperlink, etc.
Digital Camera Review
The new flagship of Canon’s S-series, PowerShot S80 digital camera, incorporates 8 megapixels for shooting still images and a movie mode that records an impressive 1024 x 768 pixels.
KEX
truecredit 0.879
transunion 0.705
credit bureaus 0.637
id theft 0.138
…
TrueCredit
Get immediate access to your complete credit report from 3 credit bureaus. Just $14.95 per month, including $25K ID Theft insurance. Contact TransUnion for more detail…
What if the phrase is NOT in the document?
Challenges of Handling Out-of-doc Phrases
Given a document d and a phrase ph that is not in d
  Estimate the probability that ph is relevant to d
truecredit 0.879
transunion 0.705
credit bureaus 0.637
id theft 0.138
…
TrueCredit
Get immediate access to your complete credit report from 3 credit bureaus. Just $14.95 per month, including $25K ID Theft insurance. Contact TransUnion for more detail…
credit bureau report ?
credit report services ?
equifax credit bureau ?
equifax credit report ?
exquifax ?
equfax ?
trans union canada ?
…
Challenges of Handling Out-of-doc Phrases
Given a document d and a phrase ph that is not in d
  Estimate the probability that ph is relevant to d
Challenges
  How do we measure it?
    Out-of-doc phrases lack the contextual information that in-doc phrases have
  Estimates must be consistent with the probabilities of in-doc phrases
    May need some methods to calibrate probabilities
Two Approaches
Calibrated cosine similarity methods
  Treat in-doc and out-of-doc phrases equally
  Map cosine similarity scores to probabilities
Regression methods based on semantic kernels
  Given robust in-doc phrase relevance measures
  Predict out-of-doc phrase relevance using similarity between the target phrase and in-doc phrases
Regression methods achieve better empirical results
Outline
Introduction
Relevance measures using cosine similarity
Out-of-doc phrase relevance measure using Gaussian process regression
Experiments
Conclusions
Similarity-based Measures
Step 1: Estimate sim(d,ph) → R
Represent d as a sparse word vector
  Words in document d, associated with weights
  Vec(d) = {‘truecredit’,0.9; ‘transunion’,0.7; ‘access’,0.1; … }
Represent ph as a sparse word vector via query expansion
  Issue ph as a query to a search engine; let the result page be document d’
  Vec(ph) ← Vec(d’)
sim(d,ph) = cosine(Vec(d),Vec(ph))
Choices of term-weighting schemes
  Bag of words (SimBin), TFIDF (SimTFIDF)
  Keyword Extraction (SimKEX)
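As a rough illustration of Step 1, the sketch below computes the cosine similarity of two sparse word vectors stored as Python dicts. The weights in `vec_d` come from the slide's example; the contents of `vec_ph` are hypothetical stand-ins for what a query-expansion result page might yield, and the function name `cosine` is only illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    # Walk the shorter vector when accumulating the dot product.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

# Vec(d): weights from the slide's example document.
vec_d = {"truecredit": 0.9, "transunion": 0.7, "access": 0.1}
# Vec(ph): hypothetical weights standing in for a query-expansion result page.
vec_ph = {"credit": 0.8, "transunion": 0.5, "report": 0.4}

sim = cosine(vec_d, vec_ph)
print(f"sim(d, ph) = {sim:.3f}")
```

Swapping the term-weighting scheme (binary, TFIDF, or KEX scores) only changes how the dict values are filled in; the similarity computation stays the same.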
Map Similarity Scores to Probabilities
Step 2: Map sim(d,ph) to prob(ph|d)
Via a sigmoid function where the weights are pre-learned [Platt ’00]
  $f = \log \frac{sim(d,ph)}{1 - sim(d,ph)}$
  $prob(ph|d) = \frac{1}{1 + \exp(\alpha f + \beta)}$
The sigmoid function can be used to combine multiple relevance scores
  SimCombine: Combine SimBin, SimTFIDF & SimKEX
  $prob(ph|d) = \frac{1}{1 + \exp\left(\sum_{i=1}^{m} \alpha_i f_i + \beta\right)}$
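A minimal sketch of Step 2, assuming the Platt-style form prob = 1 / (1 + exp(Σ αᵢ·fᵢ + β)) with each fᵢ the log-odds of a similarity score. The weight values, the clipping constant `eps`, and the function names are assumptions for illustration; in practice the weights would be learned from labeled data.

```python
import math

def log_odds(sim, eps=1e-6):
    """f = log(sim / (1 - sim)), with sim clipped away from 0 and 1 (eps is an assumption)."""
    s = min(max(sim, eps), 1.0 - eps)
    return math.log(s / (1.0 - s))

def sigmoid_prob(features, alphas, beta):
    """Platt-style mapping: 1 / (1 + exp(sum_i alpha_i * f_i + beta))."""
    z = sum(a * f for a, f in zip(alphas, features)) + beta
    return 1.0 / (1.0 + math.exp(z))

# Single score: with alpha = -1 and beta = 0 the mapping returns sim unchanged,
# which makes a convenient sanity check.
p_single = sigmoid_prob([log_odds(0.3)], alphas=[-1.0], beta=0.0)

# Combined score in the spirit of SimCombine; the three similarity values
# and all weights below are made up for illustration.
sims = {"SimBin": 0.20, "SimTFIDF": 0.35, "SimKEX": 0.50}
features = [log_odds(s) for s in sims.values()]
p_combined = sigmoid_prob(features, alphas=[-0.3, -0.5, -0.8], beta=0.1)

print(f"single: {p_single:.3f}, combined: {p_combined:.3f}")
```

With this sign convention, negative weights make the probability increase with the similarity score.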
Outline
Introduction
Relevance measures using cosine similarity
Out-of-doc phrase relevance measure using Gaussian process regression
Experiments
Conclusions
Regression-based Measures: Intuition
Relevant in-doc phrases: TrueCredit, TransUnion
Out-of-doc phrases: credit bureau report vs. Olympics
Which out-of-doc phrase is more relevant?
TrueCredit
Get immediate access to your complete credit report from 3 credit bureaus. Just $14.95 per month, including $25K ID Theft insurance. Contact TransUnion…
Regression-based Measures: Procedure
Step 1: Estimate probabilities of in-doc phrases
  KEX(d) = {(‘truecredit’,0.88),(‘transunion’,0.71), (‘credit bureaus’,0.64), (‘id theft’,0.14)}
Step 2: Represent each phrase as a TFIDF vector via query expansion
  x1=Vec(‘truecredit’), y1=0.88; x2=Vec(‘transunion’), y2=0.71
  x3=Vec(‘credit bureaus’), y3=0.64; x4=Vec(‘id theft’), y4=0.14
Step 3: Represent the target phrase ph as a vector
  x = Vec(ph), y = ?
Step 4: Use a regression model to predict y
  Input: (x1, y1), …, (xn, yn) and x
  Output: y
Gaussian Process Regression (GPR)
We don’t specify the functional form of the regression model
Instead, we only need to specify the “kernel function”
  k(x1, x2): linear kernel, polynomial kernel, RBF kernel, etc.
Conceptually, the kernel function tells how similar x1 & x2 are
Changing the kernel function changes the regression function
  Linear kernel → Bayesian linear regression
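GPR
Input: (x1,y1), (x2,y2), …, (xn,yn), the target x, and a kernel function, e.g., k(xi,xj) = xi·xj
Output: $\hat{y} = \mathbf{k}^{\mathsf{T}} (K + \sigma_n^2 I)^{-1} \mathbf{y}$
  where K is the n×n kernel matrix over the in-doc phrases, k is the vector of kernel values k(xi, x), and σ_n² is the noise variance
O(N^3) from matrix inversion, where N ≤ 20 typically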
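The prediction step can be sketched in a few lines of pure Python, using the standard GP predictive mean ŷ = kᵀ(K + σₙ²I)⁻¹y with a linear kernel. The phrase vectors, the noise variance of 0.1, and all function names below are hypothetical; real vectors would come from query expansion, and the y values from KEX.

```python
def linear_kernel(u, v):
    """Dot product of two sparse {term: weight} vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def gpr_predict(xs, ys, x, kernel=linear_kernel, noise_var=0.1):
    """Predictive mean k^T (K + noise_var * I)^-1 y for a target vector x."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    weights = solve(K, ys)               # (K + noise_var * I)^-1 y
    k_star = [kernel(xi, x) for xi in xs]
    return sum(w * k for w, k in zip(weights, k_star))

# Hypothetical in-doc phrase vectors with relevance scores as targets.
xs = [{"credit": 0.8, "report": 0.6},
      {"credit": 0.6, "bureau": 0.8},
      {"identity": 0.6, "theft": 0.8}]
ys = [0.88, 0.64, 0.14]

pred_credit = gpr_predict(xs, ys, {"credit": 0.6, "bureau": 0.5, "report": 0.6})
pred_olympics = gpr_predict(xs, ys, {"olympics": 1.0})
print(f"credit bureau report: {pred_credit:.3f}, olympics: {pred_olympics:.3f}")
```

A phrase that shares no terms with any in-doc phrase gets a prediction of zero under the linear kernel, which matches the intuition on the earlier slide. Note that the predictive mean is not constrained to lie in [0, 1].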
Outline
Introduction
Relevance measures using cosine similarity
Out-of-doc phrase relevance measure using Gaussian process regression
Experiments
Conclusions
Data
From sponsored search ad-click logs (3-month period in 2007)
Randomly select 867 English ad landing pages
  Each page is associated with the original query and ~10 related keywords (from internal query suggestion algorithms)
Labeled 9,319 document-keyword pairs
  4,381 (47%) relevant; 4,938 (53%) irrelevant
  Most keywords (81.9%) are out-of-document
10-fold cross-validation when learning is used
Evaluation Metrics
Accuracy
  Quality of binary classification
  False positive and false negative are treated equally
AUC (Area Under the ROC curve)
  Quality of ranking
  Equivalent to pair-wise accuracy
Cross Entropy
  Quality of probability estimations
  -log2[p(ph|d)] if ph is labeled relevant to d
  -log2[1-p(ph|d)] if ph is labeled irrelevant to d
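The three metrics can be sketched as follows. The 0.5 decision threshold, the averaging of the per-pair cross-entropy loss, the clipping constant `eps`, and the toy predictions are all assumptions made for illustration; the slides do not specify them.

```python
import math

def accuracy(probs, labels, threshold=0.5):
    """Fraction of pairs whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)

def auc(probs, labels):
    """AUC as pair-wise accuracy: the share of (relevant, irrelevant) pairs
    ranked in the right order, counting ties as half."""
    pos = [p for p, y in zip(probs, labels) if y]
    neg = [p for p, y in zip(probs, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_entropy(probs, labels, eps=1e-12):
    """Mean of -log2 p for relevant pairs and -log2 (1 - p) for irrelevant ones."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -math.log2(p) if y else -math.log2(1.0 - p)
    return total / len(labels)

# Toy predictions and labels (1 = relevant, 0 = irrelevant).
probs = [0.9, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0]
print(accuracy(probs, labels), auc(probs, labels), cross_entropy(probs, labels))
```

Accuracy and AUC only depend on how the scores threshold or rank, while cross entropy also penalizes poorly calibrated probabilities, which is why it is the metric most directly tied to consistency.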
Accuracy (higher is better)
[Bar chart, methods 1–5: 0.651, 0.663, 0.654, 0.681, 0.704]
AUC Scores (higher is better)
[Bar chart, methods 1–5: 0.702, 0.726, 0.726, 0.752, 0.773]
Cross Entropy (lower is better)
[Bar chart, methods 1–5: 0.939, 0.887, 0.882, 0.864, 0.835]
Conclusions (1/2)
Measuring phrase relevance is a crucial task for online advertising
Our solution: similarity- and regression-based methods
  Consistent probabilities for out-of-doc phrases
Similarity-based methods
  Simple and straightforward
  The combined approach can lead to decent performance
Regression-based methods
  Achieved the best results in our experiments
  Quality depends on the in-doc relevance estimates & kernel
Conclusions (2/2)
Future Work – More machine learning techniques
SimCombine
  An ML method using basic similarity measures as features
  Explore more features (e.g., query frequency, page quality)
  Other machine learning models
Gaussian process regression
  Learning a better kernel function
    Kernel meta-training [Platt et al. NIPS-14]
    Maximum likelihood training