When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen

Upload: audra-caldwell

Post on 12-Jan-2016


Page 1: When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen

When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics

Meital Aizen

Page 2

Outline:

o Introduction
  • General idea
  • Related work

o Hilltop algorithm
  • Overview
  • Algorithm phases

o Expert documents
  • Detecting host affiliation
  • Selecting experts
  • Indexing the experts

o Query processing
  • Computing expert score
  • Computing target score

o Evaluation

o Conclusions

Page 3

General Idea

Propose a ranking scheme for popular topics that places the most authoritative pages on the query topic at the top of the ranking.

Page 4

Introduction

Queries on popular topics tend to produce a large result set.

This set is hard to rank based on content alone.

Content analysis cannot distinguish between authoritative and non-authoritative pages. Hence, other sources of information are used to rank results.

Page 5

Related Work

Approaches that have been taken in the past to improve the authoritativeness of ranked results:

o Ranking Based on Human Classification
o Ranking Based on Usage Information
o Ranking Based on Connectivity

Page 6

Ranking Based on Human Classification

Human editors have been used by companies (such as Yahoo!) to manually associate a set of categories and keywords with a subset of documents on the web.

Disadvantages:
o Slow, and can only be applied to a small number of pages.
o Keywords and classifications are inadequate or incomplete.

Page 7

Ranking Based on Usage Information

Some services collect information on:
o Queries users submit to search services.
o Pages they look at subsequently, and the time spent on each page.

This information is used to return the pages that most users visit after issuing the given query.

Disadvantages:
o A large amount of data needs to be collected for each query; thus, the set of queries that can be handled is small.
o Open to spamming.

Page 8

Ranking Based on Connectivity

Analyzing the hyperlinks between pages on the web, on the assumption that:
a) Pages on the topic link to each other.
b) Authoritative pages tend to point to other authoritative pages.

Two kinds of algorithms:
o PageRank
o Topic Distillation

Page 9

PageRank

An algorithm that ranks pages based on assumption (b).

Computes a query-independent authority score for every page on the web and uses this score to rank the result set.

Can’t distinguish between pages that are authoritative in general and pages that are authoritative on the query topic.
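To make the contrast concrete, here is a minimal power-iteration sketch of a query-independent score in the spirit of PageRank. The damping factor of 0.85, the fixed iteration count, the uniform redistribution from pages without out-links, and the toy graph are illustrative assumptions, not details taken from these slides. The score depends only on the link graph and never on the query, which is the limitation noted above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its score uniformly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Toy web graph (hypothetical): "a" and "b" both link to "c".
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
print(max(scores, key=scores.get))  # c
```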

Page 10

Topic Distillation

Computes a query-specific subgraph of web pages.

Computes an authority score for every page in the subgraph.

A preliminary ranking for the query is done with content analysis. The top ranked result pages for the query are selected. This creates a selected set.

Some of the pages within one or two links from the selected set are also added to the selected set if they are on the query topic.
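A minimal Python sketch of how such a selected set might be grown from a content-ranked list. The `is_on_topic` predicate, the `top_n` cutoff and the link data are hypothetical stand-ins (the slides do not say how topicality is judged); for simplicity the sketch only follows outgoing links and only expands through pages already accepted as on topic.

```python
from collections import deque


def build_selected_set(ranked_results, links, is_on_topic, top_n=3, max_hops=2):
    """Seed with the top content-ranked pages, then add on-topic pages
    reachable within max_hops outgoing links of the seed set."""
    selected = set(ranked_results[:top_n])
    frontier = deque((page, 0) for page in selected)
    while frontier:
        page, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in links.get(page, []):
            if neighbor not in selected and is_on_topic(neighbor):
                selected.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return selected


results = ["p1", "p2", "p3", "p4"]  # hypothetical content ranking
web = {"p1": ["p5"], "p5": ["p6"], "p6": ["p7"], "p2": ["spam"]}
selected = build_selected_set(results, web, lambda page: page != "spam")
print(sorted(selected))  # ['p1', 'p2', 'p3', 'p5', 'p6']
```

Here p7 is left out because it is three links away from the seed set, and "spam" is rejected by the topicality check.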

Page 11

Hilltop Algorithm Overview

Page 12

How Does It Work?

“Expert Documents”: a subset of pages on the web identified as directories of links to non-affiliated sources on specific topics.

Results are ranked based on the match between the query and relevant descriptive text for hyperlinks on expert pages pointing to a given result page.

Page 13

o List of experts: Compute the list of the most relevant experts on the query topic.

o Identify target pages: Identify relevant links in the expert set and follow them to get target pages.

o Rank targets: Rank according to the number and relevance of non-affiliated experts.

Page 14

Hilltop Algorithm Phases

Page 15

Expert Lookup

What is an expert page?

A page that is about a certain topic and has links to many non-affiliated pages on that topic.

• Two pages are non-affiliated if they are by authors from non-affiliated organizations.

Page 16

A subset of the pages crawled by a search engine is identified as experts.

These pages are indexed in a special inverted index.

Given an input query, a lookup is done on the expert index to find and rank matching expert pages.

Page 17

Target Ranking

o Given the top ranked matching expert-pages and associated match information, we select links that we know to have all the query terms associated with them.

o With further connectivity analysis on the selected links, we identify a subset of their targets as the top-ranked pages on the query topic.

o The targets are rated by a ranking score which is computed by combining the scores of the experts pointing to the target.

A page is an authority on the query topic if some of the best experts on that query topic point to it.

Page 18

Expert Documents

Page 19

Expert Documents

What makes a page an expert?

An expert page needs to be objective and diverse. Its links should be unbiased and point to numerous non-affiliated pages on the subject.

Page 20

Selecting the Experts

Process a search engine’s database of pages and select a subset considered to be good sources of links on specific topics.

Expert pages: pages with an out-degree greater than a threshold k whose links point to at least k distinct non-affiliated hosts.
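A sketch of the expert test in Python. The `affiliation_id` parameter is a hypothetical hook standing in for the host-affiliation lookup described in the next slides; its identity default treats every distinct host as non-affiliated. Absolute URLs are assumed, and the threshold k = 5 in the example is an illustrative choice.

```python
from urllib.parse import urlparse


def is_expert(out_links, k, affiliation_id=lambda host: host):
    """True if the page has out-degree greater than k and its links reach at
    least k distinct hosts that are mutually non-affiliated.

    affiliation_id maps a host to its affiliation-set identifier (hypothetical
    hook; the identity default treats every host as non-affiliated)."""
    if len(out_links) <= k:
        return False
    groups = {affiliation_id(urlparse(url).hostname) for url in out_links}
    return len(groups) >= k


links = [f"http://site{i}.example/" for i in range(6)]  # made-up URLs
print(is_expert(links, k=5))      # True
print(is_expert(links[:5], k=5))  # False: out-degree 5 is not greater than k
```

Passing an `affiliation_id` that collapses related hosts into one group is what prevents a page linking only within one organization from qualifying.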

Page 21

Detecting Host Affiliation

Two hosts are defined as affiliated if one or both of the following is true:

o They share the same first 3 octets of the IP address.

o The rightmost non-generic token in the hostname is the same.

For example, www.ibm.com and www.ibm.co.uk are affiliated: in both hostnames the rightmost non-generic token is "ibm".
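The two affiliation rules can be sketched as follows. The set of generic tokens is an assumption made for illustration (a real system would need a far more complete list of top-level and second-level domain tokens), and the IP addresses in the example are made-up documentation addresses, not the hosts' real ones.

```python
import ipaddress

# Assumption: a small illustrative list of "generic" hostname tokens.
GENERIC_TOKENS = {"com", "org", "net", "edu", "gov", "co", "ac", "uk"}


def rightmost_non_generic_token(hostname):
    """Return the rightmost dot-separated token that is not generic, or None."""
    for token in reversed(hostname.lower().split(".")):
        if token not in GENERIC_TOKENS:
            return token
    return None


def same_ip_prefix(ip_a, ip_b):
    """True if two IPv4 addresses share their first three octets (same /24)."""
    network = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in network


def affiliated(host_a, ip_a, host_b, ip_b):
    """Affiliated if the IP prefixes match or the non-generic tokens match."""
    token_a = rightmost_non_generic_token(host_a)
    same_token = token_a is not None and token_a == rightmost_non_generic_token(host_b)
    return same_ip_prefix(ip_a, ip_b) or same_token


print(affiliated("www.ibm.com", "192.0.2.10", "www.ibm.co.uk", "198.51.100.7"))     # True
print(affiliated("www.ibm.com", "192.0.2.10", "www.stanford.edu", "192.0.2.99"))    # True
print(affiliated("www.ibm.com", "192.0.2.10", "www.stanford.edu", "203.0.113.5"))   # False
```

The first pair matches on the token "ibm", the second on the shared 192.0.2 prefix, and the third on neither rule.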

Page 22

• Using a union-find algorithm, we group hosts that either share the same rightmost non-generic suffix or have an IP address in common into sets.

Page 23

• Every set is given a unique identifier.

• The host-affiliation lookup maps every host to its set identifier, or to itself if the host belongs to no set.

• If the lookup maps two hosts to the same value then they are affiliated; otherwise they are non-affiliated.
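A minimal union-find sketch of this grouping step. Each key function (hypothetical here) extracts one grouping criterion from a host, such as its rightmost non-generic token or its IP prefix; hosts sharing any key end up in the same set, and the set's representative host serves as its identifier. Hosts that never join a set map to themselves, as described above. The host-to-IP table is invented, and `token_key` is a crude stand-in for real token extraction.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def build_affiliation_lookup(hosts, key_functions):
    """Return {host: set identifier}; hosts sharing any key are grouped."""
    uf = UnionFind()
    for key_fn in key_functions:
        first_host_for_key = {}
        for host in hosts:
            key = key_fn(host)
            if key is None:
                continue
            if key in first_host_for_key:
                uf.union(first_host_for_key[key], host)
            else:
                first_host_for_key[key] = host
    return {host: uf.find(host) for host in hosts}


ips = {"www.ibm.com": "192.0.2.10", "www.ibm.co.uk": "198.51.100.7",
       "research.example.org": "198.51.100.42", "www.stanford.edu": "203.0.113.5"}


def token_key(host):
    return host.split(".")[1]  # crude stand-in: the second hostname label


def prefix_key(host):
    return ips[host].rsplit(".", 1)[0]  # first three octets


lookup = build_affiliation_lookup(list(ips), [token_key, prefix_key])
print(lookup["www.ibm.com"] == lookup["research.example.org"])  # True
print(lookup["www.stanford.edu"])  # www.stanford.edu
```

Because union-find merges transitively, www.ibm.com and research.example.org end up affiliated even though neither rule links them directly: each is tied to www.ibm.co.uk by a different rule.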


Page 24

Indexing the Experts

To locate expert pages that match user queries, create an inverted index that maps keywords to the experts on which they occur.

Index the text contained within the “key phrases” of the expert (the title, headings and URL anchor text within the expert page).

The inverted index is organized as a list of match positions within experts. Each match position corresponds to an occurrence of a certain keyword within a key phrase of a certain expert page.

For every expert, we maintain the list of URLs within it and for each URL we maintain the identifiers of the key phrases that qualify it.
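A sketch of the expert index as Python data structures. The `KeyPhrase` and `Expert` types and the (expert, phrase, offset) match-position layout are assumptions for illustration; in particular, which key phrases qualify which URL is passed in by the caller rather than derived from the page structure. `matching_experts` shows one possible lookup that keeps experts on which every query term occurs.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KeyPhrase:
    phrase_id: int
    kind: str  # "title", "heading" or "anchor"
    text: str


@dataclass
class Expert:
    expert_id: int
    phrases: list
    # URL -> ids of the key phrases that qualify it (supplied by the caller).
    url_qualifiers: dict = field(default_factory=dict)


def build_expert_index(experts):
    """Map each keyword to its match positions (expert_id, phrase_id, offset)."""
    index = defaultdict(list)
    for expert in experts:
        for phrase in expert.phrases:
            for offset, word in enumerate(phrase.text.lower().split()):
                index[word].append((expert.expert_id, phrase.phrase_id, offset))
    return index


def matching_experts(index, query_terms):
    """Experts on which every query term occurs in some key phrase."""
    sets = [{e for e, _, _ in index.get(t.lower(), [])} for t in query_terms]
    return set.intersection(*sets) if sets else set()


expert = Expert(1, [KeyPhrase(0, "title", "Jaguar owners clubs"),
                    KeyPhrase(1, "anchor", "Jaguar cars")],
                {"http://cars.example/": [0, 1]})
index = build_expert_index([expert])
print(index["jaguar"])  # [(1, 0, 0), (1, 1, 0)]
```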

Page 25

Query Processing

Page 26

Query Processing

In response to a user query, determine a list of N experts that are the most relevant for that query.

Rank results by selectively following the relevant links from these experts and assigning an authority score to each page.

Page 27

Computing the Expert Score

We compute the score of an expert as a 3-tuple of the form (S0, S1, S2) and combine its components into a single value:

$$\mathrm{Expert\_Score} = 2^{32} \cdot S_0 + 2^{16} \cdot S_1 + S_2$$

$$S_i = \sum_{p \in P_i} \mathrm{LevelScore}(p) \cdot \mathrm{FullnessFactor}(p, q)$$

where k is the number of terms in the input query q, p is a key phrase, and P_i is the set of key phrases that contain precisely k − i of the query terms.

Page 28

LevelScore(p): a score assigned to the phrase based on the type of phrase it is (a LevelScore of 16 for title phrases, 6 for headings and 1 for anchor text).

FullnessFactor(p, q): a measure of the number of terms in p covered by the terms in q.

m: the number of terms in p that are not in q
plen: the length of p

$$\mathrm{FullnessFactor}(p, q) = \begin{cases} 1 & \text{if } m \le 2 \\ 1 - \dfrac{m - 2}{\mathit{plen}} & \text{if } m > 2 \end{cases}$$
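The formulas above can be sketched in Python as follows. Key phrases are assumed to be given as (kind, text) pairs with whitespace tokenization, and phrases matching none of the query terms are skipped (an assumption: the slides do not say how such phrases are treated).

```python
# LevelScore values as given on the slide.
LEVEL_SCORE = {"title": 16, "heading": 6, "anchor": 1}


def fullness_factor(phrase_terms, query_terms):
    """1 if at most two phrase terms are missing from the query, else 1 - (m - 2)/plen."""
    plen = len(phrase_terms)
    m = sum(1 for term in phrase_terms if term not in query_terms)
    return 1.0 if m <= 2 else 1.0 - (m - 2) / plen


def expert_score(phrases, query_terms):
    """Score one expert page from its (kind, text) key phrases.

    Returns ((S0, S1, S2), Expert_Score)."""
    query = {t.lower() for t in query_terms}
    k = len(query)
    s = [0.0, 0.0, 0.0]
    for kind, text in phrases:
        terms = text.lower().split()
        matched = len(query.intersection(terms))
        i = k - matched  # the phrase contains exactly k - i query terms
        # Assumption: phrases matching no query term contribute nothing.
        if matched > 0 and i <= 2:
            s[i] += LEVEL_SCORE[kind] * fullness_factor(terms, query)
    return tuple(s), 2**32 * s[0] + 2**16 * s[1] + s[2]


phrases = [("title", "Jaguar car clubs"), ("anchor", "jaguar")]
(s0, s1, s2), score = expert_score(phrases, ["jaguar", "car"])
print((s0, s1, s2))  # (16.0, 1.0, 0.0)
```

The powers of two make the combined score behave like a lexicographic comparison of (S0, S1, S2): phrases matching all k terms dominate, as long as the individual sums stay well below 2^16.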

Page 29

Computing the Target Score

Targets: pages pointed to by the top N experts.

Select top ranked documents from this set of targets. The list of targets is ranked by Target_Score.

Target must be pointed to by at least 2 experts on hosts that are mutually non-affiliated and are not affiliated to the target.

Page 30

The score of a target T is computed in three steps:

First Step:

For every expert E that points to target T, we draw a directed edge (E, T).

For each query keyword w, occ(w, T) is the number of distinct key phrases in E that contain w and qualify the edge (E, T).

Edge_Score(E, T) for the edge (E, T) is computed as follows:
• If occ(w, T) = 0 for any w, then Edge_Score(E, T) = 0.
• Otherwise, Edge_Score(E, T) is the expert's score multiplied by the total number of qualifying occurrences over all query keywords:

$$\mathrm{Edge\_Score}(E, T) = \mathrm{Expert\_Score}(E) \cdot \sum_{w} \mathrm{occ}(w, T)$$

Page 31

Second Step:

Check for affiliations between expert pages that point to the same target. If two affiliated experts have edges to the same target T, discard the edge with the lower Edge_Score of the two.

Third Step:

To compute the Target_Score of a target, sum the Edge_Score of all remaining edges pointing to it.
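The three steps can be sketched in Python. Each edge is assumed to arrive as an (expert, target, expert_score, occ) tuple with occ(w, T) already counted, and `group_of` stands in for the host-affiliation lookup (it should map an unaffiliated host to itself). Step 2 is generalized from pairs to groups by keeping only the best edge per affiliation group, and the 2-expert requirement from the previous slide is applied at the end; these are modeling choices, not details confirmed by the slides. The hosts and scores in the example are invented.

```python
from collections import defaultdict


def edge_score(expert_score, occ, query_terms):
    """Step 1: zero unless every query term occurs in a qualifying key phrase."""
    if any(occ.get(w, 0) == 0 for w in query_terms):
        return 0.0
    return expert_score * sum(occ[w] for w in query_terms)


def target_scores(edges, query_terms, group_of, min_experts=2):
    """Return {target: Target_Score} for targets backed by enough
    mutually non-affiliated experts."""
    best = defaultdict(dict)  # target -> {affiliation group: best edge score}
    for expert, target, score, occ in edges:
        group = group_of(expert)
        if group == group_of(target):
            continue  # an expert affiliated with the target does not count
        s = edge_score(score, occ, query_terms)
        # Step 2: among affiliated experts, keep only the highest-scoring edge.
        if s > best[target].get(group, 0.0):
            best[target][group] = s
    # Step 3: sum the surviving edges per target.
    return {t: sum(g.values()) for t, g in best.items() if len(g) >= min_experts}


groups = {"dir.example.org": "example", "links.example.org": "example",
          "web.other.net": "other", "www.jaguar.com": "jaguar"}
edges = [
    ("dir.example.org", "www.jaguar.com", 10.0, {"jaguar": 1}),
    ("links.example.org", "www.jaguar.com", 4.0, {"jaguar": 2}),
    ("web.other.net", "www.jaguar.com", 3.0, {"jaguar": 1}),
]
print(target_scores(edges, ["jaguar"], groups.get))  # {'www.jaguar.com': 13.0}
```

The two example.org experts are affiliated, so only the better of their edges (10.0 versus 8.0) survives, and the target collects 10.0 + 3.0.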

Page 32

Page 33

Evaluation

Page 34

Evaluation

Two user studies were conducted in August 1999 in order to estimate recall and precision. Both experiments involved three commercial search engines for comparison: AltaVista, DirectHit and Google (marked as E1, E2 and E3 to avoid controversy).

Page 35

Locating Specific Popular Targets

Seven volunteers were asked to suggest the home pages of ten organizations of their choice.

Some of the queries reproduced:

Page 36

The same query was sent to the commercial search engines and to Hilltop.

Every time the home page was found within the first ten results, its rank was recorded.

Page 37

Average recall at rank k is the probability of finding the desired home page within the first k results.
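Average recall at rank k can be computed as below, assuming each query's outcome is recorded as the 1-based rank of the home page, or None when it did not appear in the first ten results. The outcome list is invented for illustration.

```python
def average_recall_at_k(found_ranks, k):
    """Fraction of queries whose desired home page appeared within the first k results.

    found_ranks: one entry per query, the 1-based rank or None if not found."""
    hits = sum(1 for rank in found_ranks if rank is not None and rank <= k)
    return hits / len(found_ranks)


ranks = [1, 1, 3, None, 2]  # hypothetical outcomes for five queries
print(average_recall_at_k(ranks, 1))  # 0.4
print(average_recall_at_k(ranks, 5))  # 0.8
```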

Page 38

Gathering Relevant Pages

The volunteers were asked to think of broad or popular topics and formulate queries.

The 25 queries that were collected:

Page 39

Each query was submitted to all four search engines (the three commercial engines and Hilltop), and the top 10 results were collected from each, recording the URL, rank and engine that found it.

For each query, a list of the unique URLs in the union of the results from all engines was generated.

The list was presented in random order to a judge, who rated each page for relevance to the given query on a binary scale.

The ratings were combined with the information about source and rank and the average precision was computed at rank k (for k = 1, 5, and 10).
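A sketch of precision at rank k averaged over queries, given binary relevance judgments. The per-query data is invented, and a result list shorter than k counts its missing slots as non-relevant (an assumption). This averaged precision@k is not the same as the IR metric usually called "average precision".

```python
def precision_at_k(ranked_urls, relevant, k):
    """Fraction of the top-k results judged relevant; missing slots count as non-relevant."""
    return sum(1 for url in ranked_urls[:k] if url in relevant) / k


def mean_precision_at_k(runs, k):
    """Average precision@k over queries; runs is a list of (ranked_urls, relevant_set)."""
    return sum(precision_at_k(urls, rel, k) for urls, rel in runs) / len(runs)


runs = [
    (["u1", "u2", "u3", "u4", "u5"], {"u1", "u3"}),  # hypothetical judgments
    (["v1", "v2", "v3", "v4", "v5"], {"v1", "v2", "v3", "v4", "v5"}),
]
for k in (1, 5):
    print(k, mean_precision_at_k(runs, k))
```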

Page 40

These results indicate that for broad subjects Hilltop returns a large percentage of highly relevant pages among the ten best-ranked pages.

Page 41

Conclusions

Given a query, Hilltop generates a list of target pages which are likely to be very authoritative pages on the topic of the query.

In computing the usefulness of a target page, we only consider links originating from expert pages, which are directories of links pointing to many non-affiliated sites.

Page 42

In computing the level of relevance, we require a match between the query and the text on the expert page which qualifies the hyperlink being considered.

For further accuracy, we require that at least 2 non-affiliated experts point to the returned page, with relevant qualifying text describing their linkage.

Hilltop delivers a high level of relevance given broad queries and performs comparably to the best of the commercial search engines tested.

Page 43