A review on “Answering Relationship Queries on the Web”, Bhushan Pendharkar, ASU ID 993934582
Problem statement
Existing search engines excel at keyword matching and document ranking but cannot answer relationship queries.
The paper focuses on finding the relationship between two entities given as queries: it retrieves the top-ranked Web pages for each query and matches them to form a list of Web page pairs.
Connecting terms are used to determine the relationship and to rank the Web page pairs.
Given two entities E1 and E2, a Web search engine displays top pages that do not show any relationship between E1 and E2.
The paper attempts to overcome this shortcoming by providing a system and interface for relationship queries.
The proposed system depends on the Google search engine.
Solution Proposed
The proposed system accepts two entities as queries through its interface.
The top-ranked pages for each entity, E1 and E2, are retrieved separately from a search engine such as Google.
These pages (documents) are preprocessed: HTML tags are eliminated, words are stemmed (Porter stemmer), stop words are removed, and irrelevant words are eliminated (noise removal).
A term weight is calculated for each common term ‘t’ that shows a relationship between P1 and P2 (P1 is a result of query E1, P2 of E2).
Connecting terms: the common terms with the higher term weights.
Cosine similarity (Okapi method) is used to calculate the similarity between P1 and P2, with P1 and P2 taking the place of ‘document’ and ‘query’ respectively.
The Web page pairs are sorted in descending order of similarity (or weights) and displayed along with the connecting terms for each pair.
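The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the pages are passed in as HTML strings instead of being fetched from Google, stemming is left out, a plain TF-IDF weight stands in for the Okapi weighting, and the score of a connecting term is assumed to be the product of its weights in the two pages. All function names and the stop-word list are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import product

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "at", "for", "in", "is", "of", "on", "the", "to"}


def preprocess(html):
    """Strip HTML tags, lowercase, tokenize, and drop stop words (no stemming)."""
    text = re.sub(r"<[^>]+>", " ", html)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # log(1 + n/df) keeps weights positive even for terms in every page.
        vectors.append({t: tf[t] * math.log(1 + n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_pairs(pages_e1, pages_e2, top_terms=3):
    """Score every (P1, P2) pair and list its highest-weighted common terms."""
    docs1 = [preprocess(p) for p in pages_e1]
    docs2 = [preprocess(p) for p in pages_e2]
    vecs = tfidf_vectors(docs1 + docs2)
    vecs1, vecs2 = vecs[:len(docs1)], vecs[len(docs1):]
    results = []
    for (i, u), (j, v) in product(enumerate(vecs1), enumerate(vecs2)):
        # Assumed connecting-term score: product of the term's two weights.
        shared = {t: u[t] * v[t] for t in u if t in v}
        connecting = sorted(shared, key=lambda t: (-shared[t], t))[:top_terms]
        results.append((cosine(u, v), i, j, connecting))
    results.sort(key=lambda r: -r[0])  # descending similarity
    return results


# Made-up pages standing in for the top results of queries E1 and E2.
pages_e1 = ["<p>Alice founded Acme Robotics in Phoenix</p>", "<p>Alice enjoys hiking</p>"]
pages_e2 = ["<p>Bob is an engineer at Acme Robotics</p>", "<p>Bob plays chess</p>"]
for score, i, j, terms in rank_pairs(pages_e1, pages_e2):
    print(f"P1[{i}] / P2[{j}]  similarity={score:.3f}  connecting terms={terms}")
```

Each result tuple holds the similarity, the indices of the two pages, and the connecting terms, so the pair that shares the most distinctive vocabulary is listed first.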
Criticism of the solution
Assumption: the top-ranked pages for E1 and the top-ranked pages for E2 do not contain any relationship between E1 and E2. No ground truth is provided to support this; the opposite might be true.
The overview of the relationship between entities E1 and E2 is given as a random term ‘Ec’, and no explanation of ‘Ec’ is provided.
The system does little processing of its own and depends heavily on Google results. If Google's results are imperfect or incorrect (rarely!), the system fails. The paper explicitly mentions “changes in results” if Google results vary.
The standard Porter stemmer is used, and it is not perfect: “ignition” is stemmed to “ignit” and “Monday” to “Mondai”.
The paper concludes with an unnecessary explanation of how the results change when the steps of the proposed approach are eliminated one at a time, although all steps are necessary for the proper implementation of the system.
Relevance to IRM
Significant relevance to the topics taught in the course.
The crux of the paper is the similarity calculation between Web page pairs (P1, P2); cosine similarity is used for this.
The concept of TF-IDF is used to determine the term weights for terms present in the documents P1 and P2.
Stemming is used to obtain root words.
Ranking is done on the basis of the similarity values of the Web page pairs.
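For reference, the two course concepts named here can be written out in their textbook form. This is the standard TF-IDF weight and cosine similarity as usually taught, not necessarily the exact Okapi-based variant used in the paper; the symbols N (number of pages in the collection), tf (term frequency) and df (document frequency) are introduced here for illustration.

```latex
% TF-IDF weight of term t in page P
w_{t,P} = \mathrm{tf}_{t,P} \cdot \log\frac{N}{\mathrm{df}_t}

% Cosine similarity between the two pages of a pair
\mathrm{sim}(P_1, P_2) =
  \frac{\sum_{t} w_{t,P_1}\, w_{t,P_2}}
       {\sqrt{\sum_{t} w_{t,P_1}^{2}}\;\sqrt{\sum_{t} w_{t,P_2}^{2}}}
```

Only terms that occur in both P1 and P2 contribute to the numerator, which is why the highest-weighted common terms are natural candidates for connecting terms.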