Querying for Information Integration: How to go from an Imprecise Intent
to a Precise Query?
Aditya Telang, Sharma Chakravarthy, Chengkai Li
Motivation
• “Retrieve castles near London that are reachable by train in less than 2 hours”
• “Find 3-bedroom houses in Houston within 2 miles of a school and within 5 miles of a highway and priced under $250,000”
• “Retrieve French restaurants within 1 mile of IMAX Theater in Dallas, Texas”
• …
Current Scenario
• “Retrieve castles near London that are reachable by train in less than 2 hours”
London Train schedules
Trains from London
Castles Near London
– Decision-making process: manually combine results to arrive at a decision
Ideal Scenario
Information Integration System
Intent: Retrieve castles near London that are reachable by train in less than 2 hours
Actual Results for the intent
The InfoMosaic Approach
Query Specification
• Query: “Bank within 1 mile of ‘University of Texas, Arlington’”
• Query: “Castle within 2 hours by train from London”
How to specify a query?
• Search Method (e.g., Google)
– Just needs to search for the ‘keyword’ in a set of documents
– Get a list of documents and post-process (rank, cluster, classify, etc.)
– In an integration scenario, this doesn’t work
  • ‘bank’ – D1
  • ‘University of Texas, Arlington’ – D2
  • 1 (out of 1 mile) is ignored
  • Intersecting the returned documents will not generate the desired results
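The failure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corpus, the document IDs, and the function name `keyword_search` are invented here and are not part of InfoMosaic. The search returns documents that mention both phrases, and the distance constraint "within 1 mile" has no place in it at all.

```python
# Hypothetical toy corpus; IDs and text are invented for illustration only.
DOCS = {
    "D1": "First National bank branch locations and opening hours",
    "D2": "University of Texas, Arlington campus map and parking",
    "D3": "Student bank accounts promoted at University of Texas, Arlington",
}

def keyword_search(phrases):
    """Return the IDs of documents whose text contains every phrase (case-insensitive)."""
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if all(phrase.lower() in text.lower() for phrase in phrases)
    )

# The spatial condition "within 1 mile" cannot be expressed as a keyword, so it is dropped.
results = keyword_search(["bank", "University of Texas, Arlington"])
print(results)
```

The only hit is a page that happens to mention both phrases; nothing in the result says which banks lie within one mile of the campus.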
How to specify a query?
• Database Method (e.g., SQL)
– Too rigid
– Need to know the database (or source) and its corresponding attributes
  • SELECT T1.a1, T2.a2 FROM T1, T2 WHERE …
– The Web is not organized as a database, hence an exact mapping between sources and attributes is neither feasible nor available
How to specify a query?
• Natural Language
– Ideal mechanism
– Inherently hard considering the ambiguities of natural language
  • school – institution for education; group of fish
– Mechanisms such as Question-Answering frameworks focus on sophisticated language models built for specific domains independently
– Incorrect assumption in an integration scenario
Query Specification
• Query – “castle near London”
List of documents retrieved from Web containing text – “castle near London”
Relation containing tuples
SELECT castle.name, …
FROM castle_DB
WHERE castle.location = ‘London’
Query Specification
• Query – “castle near London”
Information Integration
No idea about source, schema, attributes, etc.
No idea about how to pose a query
No idea about user intent – Castle := building, move in chess, … ?
SELECT castle.* WHERE castle.place = “London”
Proposed Approach
• Approach: refine-as-you-input
• Approach: verify-after-input
[Figure: amount of input (succinct keyword to verbose natural language or SQL) versus number of interactions (1 to 4), locating refine-as-you-input, verify-after-input, and the ideal formulation]
Approach 1: Refine-as-you-input
• Based on the most popular paradigm of querying used today – keyword search
– Input: set of keywords/concepts (e.g., castle, train, …)
– Output: set of one or more precise structured queries
– Challenge:
  • Keyword resolution – entity, attribute, value?
  • Generating a query from minimal information
– Problem:
  • Could result in too many non-relevant queries
– Positive:
  • Paradigm accepted by the Web, IR and even DB communities!
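The keyword-resolution challenge and the risk of too many candidate queries can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `CATALOG` of possible interpretations, the table and attribute names, and the function `candidate_queries` are hypothetical and do not describe the actual InfoMosaic implementation. Each keyword may resolve to an entity or to a value of some attribute, and every combination of interpretations yields one candidate structured query.

```python
from itertools import product

# Hypothetical catalog of interpretations: (role, table, attribute). Invented for illustration.
CATALOG = {
    "castle": [("entity", "Castle", None), ("value", "ChessMove", "name")],
    "london": [("value", "Castle", "location"), ("value", "Train", "origin")],
}

def candidate_queries(keywords):
    """Render one SQL-like query string per combination of keyword interpretations."""
    options = []
    for kw in keywords:
        interpretations = CATALOG.get(kw.lower(), [])
        if interpretations:  # keywords with no known interpretation are skipped
            options.append([(kw, *interp) for interp in interpretations])
    if not options:
        return []
    queries = []
    for combo in product(*options):
        tables, conditions = [], []
        for kw, role, table, attribute in combo:
            if table not in tables:
                tables.append(table)
            if role == "value":
                conditions.append(f"{table}.{attribute} = '{kw}'")
        query = "SELECT * FROM " + ", ".join(tables)
        if conditions:
            query += " WHERE " + " AND ".join(conditions)
        queries.append(query)
    return queries

queries = candidate_queries(["castle", "London"])
for q in queries:
    print(q)
```

Two keywords with two interpretations each already produce four candidates, only one of which matches the intent "castle near London". This multiplication is why user interaction is needed to prune the candidates as the input is refined.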
Approach 1: Refine-as-you-input
User Interaction
Approach 2: Verify-after-input
• Based on a rigorous method of formulating queries – similar to SQL
– Input: user-filled template
– Output: single precise query
– Problem:
  • Users don’t like filling in too many details
  • Coming up with a unique template across domains
– Positive:
  • Less ambiguous
  • Reduced number of user interactions
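A template-based input could look roughly like the sketch below. This is an assumption-laden illustration, not the authors' template: the class `QueryTemplate`, its fields, and the attribute names `reachable_from` and `train_travel_hours` are all hypothetical. The point is that a filled template is checked once after input and then maps to exactly one query.

```python
from dataclasses import dataclass, field

@dataclass
class QueryTemplate:
    """Hypothetical user-filled template: one entity, output attributes, and conditions."""
    entity: str = ""
    outputs: list = field(default_factory=list)
    conditions: list = field(default_factory=list)  # (attribute, operator, value) triples

    def missing_fields(self):
        """List the required slots the user left empty (the 'verify' step)."""
        missing = []
        if not self.entity:
            missing.append("entity")
        if not self.outputs:
            missing.append("outputs")
        return missing

    def to_query(self):
        """Render the single query this template denotes; refuse incomplete templates."""
        missing = self.missing_fields()
        if missing:
            raise ValueError("template incomplete: " + ", ".join(missing))
        select = ", ".join(f"{self.entity}.{a}" for a in self.outputs)
        query = f"SELECT {select} FROM {self.entity}"
        if self.conditions:
            rendered = []
            for attribute, operator, value in self.conditions:
                literal = f"'{value}'" if isinstance(value, str) else str(value)
                rendered.append(f"{self.entity}.{attribute} {operator} {literal}")
            query += " WHERE " + " AND ".join(rendered)
        return query

template = QueryTemplate(
    entity="Castle",
    outputs=["name"],
    conditions=[("reachable_from", "=", "London"), ("train_travel_hours", "<", 2)],
)
query = template.to_query()
print(query)
```

The trade-off named on the slide is visible here: the user has to supply every slot, but there is no ambiguity left to resolve and no list of candidate queries to choose from.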
Approach 2: Verify-after-input
Evaluation Plan
• Testing the approaches on an RDBMS where the schema and output are known
• Actual user studies
Related Work
Future Work
• Perform extensive experiments to prove the validity of the proposed approaches
• Address other issues in information integration
• Current focus – Ranking [Telang:DBRank’07]
Thank You !