effective xml keyword search with relevance oriented ranking paper by: zhifeng bao, tok wang ling,...
Post on 21-Dec-2015
![Page 1: Effective XML Keyword Search with Relevance Oriented Ranking Paper by: Zhifeng Bao, Tok Wang Ling, Bo Chen, Jiaheng Lu Presented by: Ilanit Goldshtein](https://reader036.vdocuments.mx/reader036/viewer/2022062714/56649d695503460f94a47574/html5/thumbnails/1.jpg)
Effective XML Keyword Search with Relevance Oriented Ranking
Paper by: Zhifeng Bao, Tok Wang Ling, Bo Chen, Jiaheng Lu
Presented by: Ilanit Goldshtein
Seminar in Databases, Winter 2009
Introduction
• XML keyword search
– Enables users to access information in XML databases
– XML data is modeled as a rooted, labeled tree
– Inspired by IR (Information Retrieval) style keyword search on the web, which is designed mostly for text databases.
Why do we need keyword search?
• The extreme success of web search engines makes keyword search the most popular search model for ordinary users.
• As XML is becoming a standard in data representation, it is desirable to support keyword search in XML databases without requiring knowledge of complex query languages.
• Recent research efforts focus on:
– Efficiency
– Effectiveness
Effectiveness
• Capture the user’s search intention
– Identify the target that the user intends to search for
– Infer the predicate constraint that the user intends to search via
• Result ranking
– Rank the query results according to their objective relevance to the user’s search intention
Today’s approach
• One widely adopted approach so far is to find the smallest lowest common ancestors (SLCA) of all keywords of a query.
• Each SLCA result of a keyword query is a node whose sub-tree contains all query keywords, while none of its proper sub-trees contains all of them.
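
The SLCA definition can be sketched in a few lines of Python. This is a brute-force illustration only, not the algorithm of any SLCA system: the helper names (`keywords_in_subtree`, `slca`), the toy document and the choice to match keywords case-insensitively against both tag names and text values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def keywords_in_subtree(node, cache):
    """Collect the lowercase words (tag names and text) found in node's subtree."""
    words = {node.tag.lower()}
    if node.text:
        words.update(node.text.lower().split())
    for child in node:
        words |= keywords_in_subtree(child, cache)
    cache[node] = words
    return words

def slca(root, query):
    """Return nodes whose subtree contains every keyword while no child subtree does."""
    cache = {}
    keywords_in_subtree(root, cache)
    wanted = {k.lower() for k in query}
    results = []
    for node in root.iter():  # document order
        contains_all = wanted <= cache[node]
        child_contains_all = any(wanted <= cache[c] for c in node)
        if contains_all and not child_contains_all:
            results.append(node)
    return results

# Toy fragment with two customers (an assumed, simplified document).
DOC = """
<storeDB>
  <customers>
    <customer><ID>C3</ID><name>Art Smith</name>
      <interests><interest>rock music</interest></interests></customer>
    <customer><ID>C4</ID><name>Rock Davis</name>
      <interests><interest>art</interest></interests></customer>
  </customers>
</storeDB>
""".strip()

root = ET.fromstring(DOC)
hits = slca(root, ["customer", "interest", "art"])
print([h.find("ID").text for h in hits])
```

Checking only the direct children is enough: if any deeper descendant contained all keywords, the child on the path to it would contain them too. On this toy fragment both customers are returned, since “Art” in the name of C3 matches as well as the interest “art” of C4, and SLCA gives no way to rank one above the other.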
Today’s approach – SLCA
• For example, the keyword query: “customer, interest, art”
[Figure: example XML tree storeDB. The root storeDB has the children customers and books. Each customer has an ID, a name and interests (a group of interest nodes); customers may also have an address (street, city), a contact (no.) and purchases (purchase). Each book has an ID, a title, authors (a group of author nodes) and a publisher (name).
Customers: C1 “Mary Smith” (interest “fashion”, street “Art Street”); C2 “John Martin” (interest “street art”); C3 “Art Smith” (interest “rock music”); C4 “Rock Davis” (interest “art”).
Books: B1 with title “Art of Customer Interest Care”, and B2. Author values shown: “Daniel Jones”, “John Williams”, “Edward Martin”, “Sophia Jones”; publisher name “Oxford”.]
• Wanted result: C4
Today’s approach – SLCA (cont)
• SLCA returns 5 results: the title of book B1 and the customer nodes with IDs C1 to C4.
• Since SLCA cannot capture the search intention well, all 5 SLCA results are returned without any ranking applied.
Problems in SLCA
• Doesn’t address the user’s search intention adequately:
– Meaningfulness of query result
– Keyword ambiguity problems
1. A keyword can appear both as an XML node type and as the text value of some other nodes
2. A keyword can appear in the text values of different XML node types and carry different meanings
Meaningfulness
• Keyword query “rock music”
– Search intention: find customers interested in “rock music”; the wanted result is C3
– SLCA returns: the interest node of C3
Keyword Ambiguity (cont)
• Keyword query “customer, art”
– “art” can be the value of an interest node (C2, C4), a name node (C3), the street node of a customer (C1), or the title node of a book (B1)
– “customer” can be the tag name of a customer node, or (part of) the value of the title of book B1
Objectives & Challenges
• Challenges
I. How to decide which sub-tree(s) with appropriate node types can capture the information the user desires
II. How to return sub-trees of an appropriate size (i.e. containing enough but not overwhelming information)
III. How to rank those sub-trees by their relevance
• Address all the problems as a single problem
– Search intention identification
– Query result retrieval
– Result ranking
– Extend the original TF*IDF from text databases to XML databases, while capturing the hierarchical structure of XML data
TF*IDF
• TF*IDF (Term Frequency * Inverse Document Frequency) similarity is one of the most widely used approaches to measure the relevance of keywords and documents in keyword search over flat documents
• The main idea of TF*IDF is summarized in the following three rules.
TF*IDF (cont)
• Rule 1: A keyword appearing in many documents should not be regarded as more important than a keyword appearing in only a few. (IDF)
• Rule 2: A document with more occurrences of a query keyword should not be regarded as less important for that keyword than a document with fewer occurrences. (TF)
• Rule 3: A normalization factor is needed to balance long and short documents
– Rule 2 discriminates against short documents, which have less chance of containing many occurrences of a keyword.
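
The three rules can be made concrete with a small flat-document scorer. This is a minimal sketch of one common TF*IDF variant, not the exact weighting of any particular system: the function name `tfidf_score`, the logarithmic forms of the TF and IDF weights and the toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf_score(query, doc, corpus):
    """Score one tokenized document against a keyword query."""
    n_docs = len(corpus)
    tf = Counter(doc)
    score = 0.0
    for k in set(query):
        if tf[k] == 0:
            continue
        df = sum(1 for d in corpus if k in d)
        idf = math.log(1 + n_docs / (1 + df))   # Rule 1: rarer keywords weigh more
        score += (1 + math.log(tf[k])) * idf    # Rule 2: more occurrences never hurt
    # Rule 3: normalize by a length-dependent factor so long documents do not win by size.
    norm = math.sqrt(sum((1 + math.log(c)) ** 2 for c in tf.values()))
    return score / norm if norm else 0.0

corpus = [
    ["art", "street", "art"],
    ["rock", "music"],
    ["art", "of", "customer", "care", "and", "more", "words"],
]
scores = [tfidf_score(["art"], d, corpus) for d in corpus]
print(scores)
```

The short document that mentions “art” twice scores above the long document that mentions it once, and the document without the keyword scores zero.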
Challenges
Difficulty in applying TF*IDF to XML:
• An XML database carries semantic information, while a text database contains pure text. XML TF*IDF must be aware of the underlying semantics.
• All contents of XML data are stored in leaf nodes only
• The normalization factor is not simply the size of the sub-tree
o The structure of sub-trees may also affect the ranks
The solution
- Extend IR-style keyword search techniques (like TF*IDF) from text databases to XML databases, in order to capture the hierarchical structure of XML documents
• by analyzing the statistics of the underlying XML data
The solution (cont) – Major Contributions
1. Identify the user’s desired search-for node and search-via node(s) in a heuristic way
– Define XML TF (term frequency) and XML DF (document frequency)
– Confidence formulas for search-for/search-via candidates
2. Define XML TF*IDF similarity
– Propose 3 guidelines specifically for XML keyword search
– Take keyword ambiguity problems into account
3. Design a keyword search engine, XReal
Data Model
• Node type – the node type of a node n is the prefix path from the root to n
• Two nodes are of the same node type if they share the same prefix path
– Example: the name nodes under /storeDB/customers/customer/name and /storeDB/books/book/publisher/name are of different node types
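
As a small illustration of node types as prefix paths, the sketch below labels every element of a toy document with its path from the root. The function name `node_types` and the toy document are assumptions for the example.

```python
import xml.etree.ElementTree as ET

def node_types(root):
    """Map each element to its node type, i.e. the tag path from the root to it."""
    types = {}

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        types[node] = path
        for child in node:
            walk(child, path)

    walk(root, "")
    return types

DOC = """
<storeDB>
  <customers><customer><name>Mary Smith</name></customer></customers>
  <books><book><publisher><name>Oxford</name></publisher></book></books>
</storeDB>
""".strip()

root = ET.fromstring(DOC)
types = node_types(root)
for node, path in types.items():
    if node.tag == "name":
        print(path, "->", node.text)
```

Both elements carry the tag name, yet they end up with different node types because their prefix paths differ.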
Data Model (cont)
• Value node – the text value contained in a leaf node
• Structural node – an XML node labeled with a tag name
– Single-valued node type, multi-valued node type
– Grouping type – all its children are of the same multi-valued type
XML TF and IDF
• XML TF (term frequency) $f_{a,k}$
– The number of occurrences of a keyword k in a given value node a in the XML database.
• XML DF (document frequency) $f_k^T$
– The number of T-typed nodes that contain keyword k in their sub-trees in the XML database.
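
Both statistics can be gathered in one pass over the tree. The sketch below is an illustration under assumptions: the helper names (`words_of`, `xml_tf`, `xml_df`) and the toy document are made up, words are matched case-insensitively, and a tag name equal to the keyword counts as a match.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def words_of(node):
    """Lowercase words in a node's own text value (empty for structural nodes)."""
    return (node.text or "").lower().split()

def xml_tf(value_node, keyword):
    """f_{a,k}: occurrences of keyword k in the text of value node a."""
    return words_of(value_node).count(keyword.lower())

def xml_df(root, keyword):
    """f_k^T per node type T: number of T-typed nodes whose subtree contains k."""
    k = keyword.lower()
    df = defaultdict(int)

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        found = node.tag.lower() == k or k in words_of(node)
        for child in node:
            # Always recurse first so every child is visited and counted.
            found = walk(child, path) or found
        if found:
            df[path] += 1
        return found

    walk(root, "")
    return dict(df)

DOC = """
<storeDB><customers>
  <customer><ID>C1</ID><name>Mary Smith</name>
    <address><street>Art Street</street></address>
    <interests><interest>fashion</interest></interests></customer>
  <customer><ID>C3</ID><name>Art Smith</name>
    <interests><interest>rock music</interest></interests></customer>
  <customer><ID>C4</ID><name>Rock Davis</name>
    <interests><interest>art</interest></interests></customer>
</customers></storeDB>
""".strip()

root = ET.fromstring(DOC)
df = xml_df(root, "art")
print(df["/storeDB/customers/customer"], df["/storeDB/customers"])
```

Here “art” occurs under all three customer nodes (as a street, a name and an interest), while the single customers node is counted once.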
Infer the desired search-for node
• Guidelines: A node type T is considered a desired search-for node if:
1. T is intuitively related to every query keyword
2. XML nodes of type T are informative enough to contain enough relevant information
3. XML nodes of type T are not overwhelming, i.e. do not contain too much irrelevant information
• And so, the confidence of a node type T to be the desired search-for node type is:
$$C_{for}(T, q) = \log_e\Big(1 + \prod_{k \in q} f_k^T\Big) \cdot r^{depth(T)}$$
The desired search-for node (cont)
• $C_{for}(T, q)$ is the confidence of T as the search-for node with respect to query q.
• A product instead of a sum is used to follow the 1st guideline
• The exponential part is designed to follow the 2nd guideline
• The log part is designed to follow the 3rd guideline
• r is a reduction factor with range (0,1], usually chosen to be 0.8 for good performance.
• Given the confidence of each node type being the desired type, the one with the highest confidence is chosen as the desired search-for node.
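
A quick numeric sketch of the search-for confidence, log_e(1 + product of f_k^T over the query keywords) times r^depth(T). The function name `confidence_for`, the statistics in `candidates` and the convention that the root has depth 1 are assumptions made up for illustration, not values from the paper.

```python
import math

def confidence_for(df_by_keyword, depth, r=0.8):
    """C_for(T, q) = ln(1 + prod_k f_k^T) * r ** depth(T)."""
    product = math.prod(df_by_keyword)
    return math.log(1 + product) * r ** depth

# Hypothetical statistics for the query "customer interest art":
# node type -> ([f_customer^T, f_interest^T, f_art^T], depth(T))
candidates = {
    "customers": ([1, 1, 1], 2),
    "customer": ([4, 4, 4], 3),
    "interest": ([0, 4, 2], 5),
}
scores = {t: confidence_for(f, d) for t, (f, d) in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because a product is used, a node type that misses even one keyword (interest here) drops to zero confidence, while the depth factor penalizes deeply nested types.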
The desired search-for node (cont)
• Example: Given a query “customer interest art”, node type customer usually has high confidence as the desired node type to search for, because the values of the three statistics $f_{\text{“customer”}}^{customer}$, $f_{\text{“interest”}}^{customer}$ and $f_{\text{“art”}}^{customer}$ (i.e. the number of sub-trees rooted at customer nodes that contain “customer”, “interest” and “art”, respectively, in either nested text values or tags) are usually greater than 1. In contrast, node type customers doesn’t have high confidence, since all three statistics equal 1.
Infer the desired search-via nodes
• Infer the structural nodes to search via
– A structural node n is a good candidate if it is related to as many (but not necessarily all) keywords as possible
– The confidence of a node type T to be a desired type to search via is:
$$C_{via}(T, q) = \log_e\Big(1 + \sum_{k \in q} f_k^T\Big)$$
• The search-via node type is normally not unique
• Infer individual value nodes to search via
– Statistics alone are not adequate to infer the likelihood of a value node being (part of) a search-via node
– Capture keyword co-occurrence
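
A minimal sketch of the search-via confidence for node types, assuming the formula sums the statistics f_k^T over the query keywords inside the logarithm (the operator is not legible in the slide text, so the sum is an assumption based on the “as many but not necessarily all keywords” wording). The function name and the statistics are hypothetical.

```python
import math

def confidence_via(df_by_keyword):
    """C_via(T, q) = ln(1 + sum_k f_k^T), under the stated assumption of a sum."""
    return math.log(1 + sum(df_by_keyword))

# Hypothetical per-keyword statistics f_k^T for three node types.
stats = {
    "interest": [0, 4, 2],
    "name": [0, 0, 1],
    "city": [0, 0, 0],
}
via = {t: confidence_via(f) for t, f in stats.items()}
print(sorted(via, key=via.get, reverse=True))
```

Unlike a product, the sum keeps a node type in play when it matches only some of the keywords, which is why several node types can qualify as search-via candidates at once.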
• E.g. Q = “customer, name, rock, interest, art”
– Easy to find that name and interest have high confidence to be the search-via nodes
– But hard to know whether rock is a value of name or of interest, and whether art is a value of interest or of name
– How to distinguish customer C4 from C3?
– Capture keyword co-occurrence
Capture keyword co-occurrence (cont)
• Proximity factors for a value node v containing keyword k, with respect to a node type k_t
– In-query distance $Dist_q(q, v, k, k_t)$
• Distance between keyword k and node type k_t in query q
• Favors: k_t appears before k
– Structural distance $Dist_s(q, v, k, k_t)$
• Depth distance between v and the nearest k_t-typed ancestor node of v
– Value-type distance $Dist(q, v, k, k_t)$
• Max of the above two
• The confidence of a value node v as the node to search via, w.r.t. a keyword k appearing in both query q and v:
$$C_{via}(q, v, k) = 1 + \sum_{k_t \in q \cap ancType(v)} \frac{1}{Dist(q, v, k, k_t)}$$
Principles of XML keyword search
• Principle 1
– When searching for D-typed nodes via a single-valued type V, ideally only the values and structures nested in V-typed nodes can affect the relevance, regardless of the size of other typed nodes nested in D-typed nodes.
• However, TF*IDF similarity in IR normalizes the relevance score of each document w.r.t. its size
• Principle 2
– When searching for nodes of type D via a multi-valued type V’, the relevance of a D-typed node which contains a query-relevant V’-typed node should not be affected (i.e. normalized) too much by other query-irrelevant V’-typed nodes.
Principles of XML keyword search (cont)
• Example for Principle 2
• This principle addresses keyword Ambiguity 2
• For the query “art”, C4 should not be less relevant than C1
Principles of XML keyword search (cont)
• Principles 1 and 2
– Especially useful for interpreting a pure keyword query: find the search-via node correctly
• Principle 3
– The order of keywords in a query is important to indicate the search intention
• Incorporate the search-via confidence $C_{via}$ defined before
XML TF*IDF Similarity
• To calculate the similarity between the search-for node and the query q
– Base case: similarity between value node a and q
• Apply the original TF*IDF directly, since a contains keywords only, without any structure
– Recursive case: similarity between structural node n and q
• Based on the similarities of its children c and the confidence level of c as the node type to search via
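XML TF*IDF Similarity (cont.)
• Base case: similarity between query q and value node a of type T_a
$$similarity(q, a) = \frac{\sum_{k \in q \cap a} W_{q,k}^{T_a} \cdot W_{a,k}}{W_q^{T_a} \cdot W_a}$$
– $W_{q,k}^{T_a}$ is the IDF part, $W_{a,k}$ is the TF part, and the denominator is the normalization factor
$$W_{q,k}^{T_a} = C_{via}(q, a, k) \cdot \ln\Big(1 + \frac{N_{T_a}}{1 + f_k^{T_a}}\Big)$$
$$W_{a,k} = 1 + \ln(f_{a,k}), \qquad W_a = \sqrt{\sum_{k \in a} W_{a,k}^2}$$
• Intuition: An internal node n is more relevant to q if n has more query-relevant children, all else being equal.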
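
The base case can be tried out numerically. The sketch below is an illustration under assumptions rather than the XReal implementation: N is taken to be the number of nodes of the value node’s type, the query-side normalization W_q is assumed to be the L2 norm of the query weights (the slide text does not spell it out), the search-via confidence defaults to 1 when none is supplied, and all names and numbers are hypothetical.

```python
import math
from collections import Counter

def similarity_value_node(query, node_words, n_type, df_type, c_via=None):
    """Base-case similarity between a query and a value node a of type T_a."""
    tf = Counter(node_words)
    c_via = c_via or {}
    # Query-side weights W_{q,k}^{T_a}: search-via confidence times an IDF-like term.
    w_q = {
        k: c_via.get(k, 1.0) * math.log(1 + n_type / (1 + df_type.get(k, 0)))
        for k in set(query)
    }
    # Node-side weights W_{a,k} = 1 + ln(f_{a,k}).
    w_a = {k: 1 + math.log(c) for k, c in tf.items()}
    dot = sum(w_q[k] * w_a[k] for k in w_q if k in w_a)
    norm_q = math.sqrt(sum(w * w for w in w_q.values()))  # assumed form of W_q^{T_a}
    norm_a = math.sqrt(sum(w * w for w in w_a.values()))  # W_a
    return dot / (norm_q * norm_a) if norm_q and norm_a else 0.0

# Hypothetical statistics: 4 interest nodes, 2 of which contain "art".
n_interest, df_interest = 4, {"art": 2}
s_art = similarity_value_node(["art"], ["art"], n_interest, df_interest)
s_street_art = similarity_value_node(["art"], ["street", "art"], n_interest, df_interest)
s_rock = similarity_value_node(["art"], ["rock", "music"], n_interest, df_interest)
print(s_art, s_street_art, s_rock)
```

With these numbers the value “art” matches the query exactly, “street art” is diluted by the extra word through the normalization factor, and “rock music” does not match at all.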
Flowchart of answering a query
1. Identify user search intention
– Compute the confidence of all possible candidate node types and choose the desired search-for node type T_for
2. Relevance-oriented ranking
– Compute XML TF*IDF similarity in a bottom-up approach, from value nodes containing keywords up to nodes of type T_for
– Return a ranked list of sub-trees rooted at nodes of type T_for
• If more than one search-for node type has comparable confidence, a ranked list for each search-for node type is returned
Experimental Result
• Data sets
– DBLP, XMark, WSU, eBay
• Comparison
– Compare XReal, a search engine prototype that implements the proposed techniques, with SLCA and with XSeek, an existing semantic XML keyword search engine.
Search Effectiveness
• Accuracy in inferring the search-for node
– Conducted by user survey
– Tested queries contain at least one of the two ambiguity problems (examples are on the next slide)
• Conclusion: XReal works well, especially when the search-for node is not given explicitly in the query
Search Effectiveness (cont)
Search Effectiveness (cont)
• Result effectiveness
– Observations:
• XReal achieves higher precision than SLCA and XSeek for queries that contain ambiguities
• XReal performs as well as XSeek when queries have no ambiguity in the XML data
• XReal: top-100 precision is higher than overall precision
• F-measure also shows good overall effectiveness of both XReal and XSeek
Thank You!