Presented by: Sandeep Chittal Minimum-Effort Driven Dynamic Faceted Search in Structured Databases Authors: Senjuti Basu Roy, Haidong Wang, Gautam Das, Ullas Nambiar, Mukesh Mohania

Post on 18-Jan-2016


Presented by: Sandeep Chittal

Minimum-Effort Driven

Dynamic Faceted Search

in Structured Databases

Authors:

Senjuti Basu Roy, Haidong Wang, Gautam Das, Ullas Nambiar, Mukesh Mohania

Agenda

• Introduction
• Faceted Search as an alternative to Ranked Retrieval
• Faceted Search in conjunction with Ranking Functions
• Evaluation
• Related Work
• Conclusion

Faceted Search

• Paradigm:
– Suggest facets to drill down into the database such that the cost of navigation is minimized.

• Facet selection based on:
– Ability to rapidly drill down to the most promising tuples.
– Ability of the user to provide desired values for the facets.

• Proposed dynamic technique:
– Ask the user questions on different facets
– Dynamically fetch the next most promising set of facets based on the user's response
– Repeat the above steps
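The ask / narrow / repeat loop on this slide can be sketched as below. This is a minimal illustration, not the paper's implementation: the table `CARS`, the helper names, and the choice to pick the facet that leaves the fewest same-valued tuple pairs are all assumptions made for the example.

```python
from collections import Counter

def indistinguishable_pairs(rows, facet):
    """Count tuple pairs that share the same value on `facet`."""
    counts = Counter(row[facet] for row in rows)
    return sum(c * (c - 1) // 2 for c in counts.values())

def faceted_search(rows, facets, answer):
    """Ask about one facet at a time until at most one tuple remains.

    `answer(facet)` returns the user's value for that facet, or None
    when the user cannot answer.
    """
    remaining, unused, asked = list(rows), list(facets), []
    while len(remaining) > 1 and unused:
        # Assumed selection rule: fewest indistinguishable pairs left.
        facet = min(unused, key=lambda f: indistinguishable_pairs(remaining, f))
        unused.remove(facet)
        asked.append(facet)
        value = answer(facet)
        if value is not None:
            remaining = [r for r in remaining if r[facet] == value]
    return remaining, asked

# Hypothetical toy table (not taken from the slides).
CARS = [
    {"make": "Honda", "color": "red", "doors": 4},
    {"make": "Honda", "color": "blue", "doors": 4},
    {"make": "Ford", "color": "red", "doors": 2},
    {"make": "Toyota", "color": "red", "doors": 4},
]

target = CARS[1]  # simulate a user who knows every value of this tuple
found, asked = faceted_search(CARS, ["make", "color", "doors"], target.get)
print(found, asked)
```

Here the simulated user is asked about `make` first and `color` second, after which a single tuple remains.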

Primary problem

• Facilitate effective search for data records within vast data warehouses.

• Sub-problems:
– Non-unique identifier for the relevant tuple
– Partial information
Examples: Bank scenario and Car Buyer scenario

• Necessity of an effective search procedure.

Approaches for tuple search

• Ranked Retrieval from databases
– Rank and retrieve the top-k most relevant tuples that satisfy given conditions.
Example: Selection of a car

• Faceted search in databases
– Drill down to the tuple via different facets of the dataset.
Example: Search for a picture of the Great Wall of China

Main Goal of paper

• Explore opportunities of adapting principles of the faceted search paradigm for tuple search in structured databases.

• Motivation:
– Structured databases are associated with rich meta-data (tables, attributes, dimensions, domain ranges, …).

• Challenge:
– To determine the best-suited attributes for enabling a faceted search interface from the abundance of meta-data.

Broad Problem Areas

1) Faceted Search as an alternative to Ranked Retrieval

– No tuple relevance and ranking function available
– Develop a dialog with the user to judiciously select the next facets dynamically.
– Metric of effort:
• Expected no. of queries the user has to answer to reach the tuples of interest.
– Idea proposed:
• Cost model for fast tuple search assuming that attributes are associated with uncertainties

Broad Problem Areas

2) Faceted Search that leverages Ranking Functions

– Question:
• Can faceted search work in conjunction with ranking functions?
– Complications:
• Ranking functions impose skew over user preferences.
• Reevaluation of the ranking function is necessary as the faceted search progresses.
– Benefits:
• Focused retrieval as well as drill-down flexibility.

Main Contributions

• Initiate research into the problem of automated facet discovery for enabling minimal-effort browsing of tuples in structured databases.

• Extend methods to work in conjunction with ranking functions for tuples.

FS as an Alternative to Ranked Retrieval

• Notations:
– Let D be a relational table
– Tuple set, D = {t1, t2, . . . , tn}
– Attribute set, A = {A1, A2, . . . , Am}
– Each Ai with domain Domi

• Task:
– To build a decision tree that distinguishes each tuple by testing attribute values
• Node = an attribute Ai
• Edge = a value from Domi of Ai

FS as an Alternative to Ranked Retrieval

• Cost of Tree: Average tree height = $\frac{1}{n} \sum_{i=1}^{n} ht(t_i)$, where ht(ti) = height of leaf ti

• Approach:
– Make the attribute that distinguishes the max no. of pairs of tuples the root of the tree.

• Intuition:
– Select the root that minimizes the no. of indistinguishable pairs of tuples.

• Function formulated: $Indg(A_l, D) = \sum_{x \in Dom_l} |D_x| (|D_x| - 1) / 2$, where D_x is the set of tuples in D whose value for Al is x.

Indg(Actor) = (2)(2-1)/2 + (1)(1-1)/2 + (1)(1-1)/2 = 1
Indg(Genre) = Indg(Color) = (3)(3-1)/2 + (1)(1-1)/2 = 3
Thus, Actor should be root.
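The pair-counting arithmetic on this slide can be reproduced with a few lines of code. The sketch below is only illustrative: the table `MOVIES` is hypothetical (the slide's own example table is not in the transcript) and was chosen so that the group sizes match the slide's numbers (Actor: 2, 1, 1; Genre and Color: 3, 1). The function name `indg` mirrors the slide's Indg.

```python
from collections import Counter

# Hypothetical 4-tuple table; values are placeholders, not the slide's data.
MOVIES = [
    {"Actor": "X", "Genre": "G1", "Color": "color"},
    {"Actor": "X", "Genre": "G2", "Color": "color"},
    {"Actor": "Y", "Genre": "G1", "Color": "color"},
    {"Actor": "Z", "Genre": "G1", "Color": "b/w"},
]

def indg(rows, attribute):
    """Number of tuple pairs that `attribute` cannot tell apart."""
    counts = Counter(row[attribute] for row in rows)
    # A group of c tuples sharing a value contributes c * (c - 1) / 2 pairs.
    return sum(c * (c - 1) // 2 for c in counts.values())

scores = {a: indg(MOVIES, a) for a in ("Actor", "Genre", "Color")}
root = min(scores, key=scores.get)  # fewest indistinguishable pairs wins
print(scores, root)
```

On this table the scores come out as 1 for Actor and 3 for Genre and Color, so Actor is picked as the root, in line with the slide.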

[Figure: two decision trees for the example, labeled "Another decision tree" and "Optimal decision tree"]

Cost of Optimal tree = (2+2+1+1)/4 = 1.5
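The cost figure above is the average leaf depth of the tree. As a rough sketch, the code below greedily builds a full tree (always splitting on the attribute with the fewest indistinguishable pairs) and averages the leaf depths. The table `MOVIES` and all function names are assumptions for illustration; the slide's actual trees are not in the transcript.

```python
from collections import Counter, defaultdict

# Hypothetical table with the same group sizes as the slide's arithmetic.
MOVIES = [
    {"Actor": "X", "Genre": "G1", "Color": "color"},
    {"Actor": "X", "Genre": "G2", "Color": "color"},
    {"Actor": "Y", "Genre": "G1", "Color": "color"},
    {"Actor": "Z", "Genre": "G1", "Color": "b/w"},
]

def indg(rows, attribute):
    """Number of tuple pairs that `attribute` cannot tell apart."""
    counts = Counter(row[attribute] for row in rows)
    return sum(c * (c - 1) // 2 for c in counts.values())

def leaf_heights(rows, attributes, depth=0):
    """Greedily split `rows` and return the depth at which each tuple ends up."""
    if len(rows) <= 1 or not attributes:
        return [depth] * len(rows)
    best = min(attributes, key=lambda a: indg(rows, a))
    rest = [a for a in attributes if a != best]
    groups = defaultdict(list)
    for row in rows:
        groups[row[best]].append(row)  # one edge per attribute value
    heights = []
    for group in groups.values():
        heights.extend(leaf_heights(group, rest, depth + 1))
    return heights

heights = leaf_heights(MOVIES, ["Actor", "Genre", "Color"])
cost = sum(heights) / len(heights)
print(sorted(heights), cost)
```

For this hypothetical table the greedy tree puts two tuples at depth 1 and two at depth 2, giving an average height of 1.5.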

Other Attribute Selection Procedures

• Information gain heuristic produces different trees than the approach of minimizing indistinguishable pairs of tuples.

• Select facet with largest information gain.
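For comparison, an information-gain score can be sketched as follows. This assumes each tuple is treated as its own class, so the entropy before a split is log2(n) and a group of size c still carries log2(c) bits afterwards; that framing, the table `MOVIES`, and the function name are assumptions for illustration rather than details given in the slides.

```python
import math
from collections import Counter

# Hypothetical table, reused only to have concrete numbers.
MOVIES = [
    {"Actor": "X", "Genre": "G1", "Color": "color"},
    {"Actor": "X", "Genre": "G2", "Color": "color"},
    {"Actor": "Y", "Genre": "G1", "Color": "color"},
    {"Actor": "Z", "Genre": "G1", "Color": "b/w"},
]

def information_gain(rows, attribute):
    """Entropy reduction from splitting on `attribute`, one class per tuple."""
    n = len(rows)
    counts = Counter(row[attribute] for row in rows)
    remaining = sum(c / n * math.log2(c) for c in counts.values())
    return math.log2(n) - remaining

gains = {a: information_gain(MOVIES, a) for a in ("Actor", "Genre", "Color")}
best = max(gains, key=gains.get)
print(gains, best)
```

On this small table both criteria happen to pick Actor; the slide's point is that the two rankings need not coincide in general.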

Difference between prior algorithms and this paper’s algorithm

• Prior algorithms:
– Designed for the classification problem
– Maximize classification accuracy
– Avoid over-fitting

• This paper’s algorithm:
– Build full decision trees
– Minimize average root-to-leaf path lengths

Comparing against PCA

• PCA was developed for dimensionality reduction in numerical datasets.
• Our case: reduce from 3 to 2 attributes.
– Cost to retain (Genre, Color) = 2
– Cost to retain (Actor, Genre) = 1

• Retain ones with smallest modes.

Modeling Uncertainty in User knowledge

Decision tree with uncertainty models

Single Facet Based Search Algorithm

• An obscure attribute that has little chance of being answered correctly by most users, but is otherwise very effective in distinguishing tuples, will be overlooked in favor of other attributes in the decision tree construction.
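One way to see why an obscure attribute loses out is to weight the pair count by the chance that the user can answer. The sketch below assumes a simple model that is not spelled out in the transcript: with probability p the user answers and the usual pair count applies, otherwise no pair is separated. The table `ROWS`, the probabilities in `P_ANSWER`, and the function names are all hypothetical.

```python
from collections import Counter

def indg(rows, attribute):
    """Number of tuple pairs that `attribute` cannot tell apart."""
    counts = Counter(row[attribute] for row in rows)
    return sum(c * (c - 1) // 2 for c in counts.values())

def expected_indg(rows, attribute, p_answer):
    """Expected indistinguishable pairs when the user answers with prob. p_answer."""
    n = len(rows)
    all_pairs = n * (n - 1) // 2  # nothing is separated if the user cannot answer
    return (1 - p_answer) * all_pairs + p_answer * indg(rows, attribute)

# Hypothetical table: Serial is unique per tuple but rarely known to users.
ROWS = [
    {"Serial": "s1", "Genre": "G1"},
    {"Serial": "s2", "Genre": "G1"},
    {"Serial": "s3", "Genre": "G1"},
    {"Serial": "s4", "Genre": "G2"},
]
P_ANSWER = {"Serial": 0.1, "Genre": 0.9}  # assumed answer probabilities

expected = {a: expected_indg(ROWS, a, P_ANSWER[a]) for a in P_ANSWER}
best = min(expected, key=expected.get)
print(expected, best)
```

Serial separates every pair when answered, yet its expected score (about 5.4) is worse than Genre's (about 3.3), so Genre is chosen.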

Designing a Fixed k-Facets Interface

• k-Facet Selection
• Why extend?
• Problem:
– Given: database D, number k, uncertainties pi for attributes Ai
– Select k attributes such that the expected no. of tuples that can be distinguished is maximized.

• Overall Idea:
– Given: set A’ of k’ attributes
– Select the next attribute Al such that the expected number of pairs of tuples that cannot be distinguished by A’ ∪ {Al} is minimized
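The greedy idea above can be sketched as follows. The expectation is computed under an assumption that is not stated in the slides: the user answers each attribute independently with probability p, so a pair stays indistinguishable only if every chosen attribute on which it differs goes unanswered. The table `MOVIES`, the probability values, and the function names are hypothetical.

```python
from itertools import combinations

def expected_indistinguishable(rows, chosen, p):
    """Expected number of tuple pairs the chosen attributes fail to separate."""
    total = 0.0
    for a, b in combinations(rows, 2):
        prob = 1.0
        for attr in chosen:
            if a[attr] != b[attr]:
                prob *= 1.0 - p[attr]  # this attribute separates the pair only if answered
        total += prob
    return total

def select_k_facets(rows, attributes, p, k):
    """Greedily add the attribute that minimizes the expected unseparated pairs."""
    chosen, candidates = [], list(attributes)
    while len(chosen) < k and candidates:
        best = min(
            candidates,
            key=lambda c: expected_indistinguishable(rows, chosen + [c], p),
        )
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Hypothetical table and answer probabilities.
MOVIES = [
    {"Actor": "X", "Genre": "G1", "Color": "color"},
    {"Actor": "X", "Genre": "G2", "Color": "color"},
    {"Actor": "Y", "Genre": "G1", "Color": "color"},
    {"Actor": "Z", "Genre": "G1", "Color": "b/w"},
]
ATTRS = ["Actor", "Genre", "Color"]

certain = select_k_facets(MOVIES, ATTRS, {"Actor": 1.0, "Genre": 1.0, "Color": 1.0}, 2)
unsure = select_k_facets(MOVIES, ATTRS, {"Actor": 0.1, "Genre": 0.9, "Color": 0.8}, 2)
print(certain, unsure)
```

When every attribute is always answerable, the sketch picks Actor then Genre; when Actor is rarely answerable, it falls back to Genre and Color.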

Implementation techniques

• If database is static
– Pre-compute decision trees

• If database is dynamic
– Construct a partial tree with a few look-ahead nodes

• If database is highly dynamic (constant updates)
– Persist with the decision tree created at the start
– Fresh construction deferred to reasonable intervals.

FS in Conjunction with Ranking Functions

• Ranking functions help to focus retrieval according to user selection

• Cost of Decision Tree:

• Facet Selection Algorithm:

• Comparison against other Attribute Selection Procedures

Evaluation

• Cost
– Average no. of user interactions (facets selected) before the tuple is identified

• Time
– Complexity of the node creation step of tree building

• Comparison of selection techniques
– Existing techniques with the paper’s approach.

Cost decreases with higher attribute probability

Cost increases with increasing database size

Average node creation time increases with increase in database size / width

The paper’s algorithm performs better than the existing approach.

Average node creation time increases with increase in database size

Difference of approach from prior work

• Considers uncertainty models (inability of user to answer certain attributes)

• Decision-tree based and depends on user interaction

• Algorithms can work in conjunction with available ranking functions

Conclusion

• Tackled the problem of building faceted search interfaces over enterprise data warehouses to provide a minimal-effort tuple navigation solution

• Selection of facets based on:
– Ability to rapidly drill down to the most promising tuples
– Ability of the user to provide desired values for the facet

• Provided solutions that can consider the bias over tuples introduced by a ranking function

• Future work:
– Techniques to work with multi-table databases
– Faceted interfaces that span both structured and unstructured data sources

Thank you!!!