Query Optimization in Cooperation with an Ontological Reasoning Service

Hui Shi, Kurt Maly, and Steven Zeil

Service Computation 2013, Valencia, Spain

Contact: [email protected]

Outline
• Problem
  – What are we reasoning about?
  – What are the challenges?
• Background
  – Knowledge base using ontologies
  – Inference strategies
  – Query optimization methods
  – Benchmarks
• Dynamic Query Optimization with an Interposed Reasoner
  – Greedy Ordering
  – Deferral of joins
• Evaluation
  – Comparison against Jena
• Conclusions

Problem

Efficiency of reasoning in the face of large scale and frequent change, within a question/answer system over the Semantic Web

• Issues
  – Optimization of queries (conjunctions of individual clauses) over databases is well understood
  – Interposing a reasoner introduces uncertainty about the size of the solution space associated with resolving each individual clause
  – How can queries be optimized in the presence of such uncertainty?

Background
• Existing semantic application
  – Question/answer system over a domain of researchers, publications, and subjects
• Knowledge base (KB)
  – Ontologies
  – Representation formalism: Description Logic (DL)
• Inference methods for first-order logic
  – Materialization and forward chaining
    • Pre-computes inferred truths, starting from the known data
    • Suited to frequent computation of answers over relatively static data
    • Examples: OWLIM, Oracle
  – Query rewriting and backward chaining
    • Expands the queries, starting from the goals
    • Suited to efficient computation of answers over dynamic data with infrequent queries
    • Example: Virtuoso
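The contrast between the two inference strategies can be sketched in a few lines of Python. This is a toy illustration, not how OWLIM, Oracle, or Virtuoso work internally: the triples and class names are hypothetical, and only one RDFS-style rule is used (if ?x type ?C and ?C subClassOf ?D, then ?x type ?D). Forward chaining materializes every consequence up front; backward chaining starts from a goal and looks only for the facts needed to prove it.

```python
# Toy knowledge base of (subject, predicate, object) triples (hypothetical data).
KB = {
    ("alice", "type", "GraduateStudent"),
    ("GraduateStudent", "subClassOf", "Student"),
    ("Student", "subClassOf", "Person"),
}

def forward_chain(triples):
    """Materialization: apply the type/subClassOf rule until no new triple appears."""
    facts = set(triples)
    while True:
        new = {
            (x, "type", d)
            for (x, p1, c) in facts if p1 == "type"
            for (c2, p2, d) in facts if p2 == "subClassOf" and c2 == c
        }
        if new <= facts:              # fixpoint: nothing new can be inferred
            return facts
        facts |= new

def backward_chain(triples, x, cls, seen=None):
    """Goal-directed proof of (x type cls): check stored facts, then apply the rule in reverse."""
    seen = set() if seen is None else seen
    if (x, "type", cls) in triples:
        return True
    if cls in seen:                   # guard against cyclic subClassOf chains
        return False
    seen.add(cls)
    # (x type cls) holds if (x type sub) holds for some sub with (sub subClassOf cls).
    return any(
        backward_chain(triples, x, sub, seen)
        for (sub, p, sup) in triples if p == "subClassOf" and sup == cls
    )

materialized = forward_chain(KB)
print(("alice", "type", "Person") in materialized)   # True
print(backward_chain(KB, "alice", "Person"))          # True
```

The forward version pays its cost once and must redo work when the data change; the backward version pays per query but always sees the current data, which is the trade-off the bullets above describe.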

Background
• In conventional database management systems, query optimization
  – examines multiple query plans and
  – selects the one that minimizes the time to answer a query
• In the Semantic Web, SPARQL query optimization is typically based on
  – selectivity estimation
  – graph optimization
  – cost models
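The "examine several plans, keep the cheapest" step can be illustrated with an exhaustive search over join orders. In this sketch the cost of a plan is the total size of its intermediate results, and `estimate_join_size` is a hypothetical callback standing in for a real selectivity or cost model. It deliberately ignores which clauses share variables, something real optimizers must account for, and its enumeration grows factorially with the number of clauses.

```python
from itertools import permutations

def cheapest_plan(clauses, estimate_join_size):
    """Enumerate every join order and return the one with the smallest total intermediate size.

    clauses: list of (name, estimated_size) pairs.
    estimate_join_size: hypothetical cost-model callback, (left_size, right_size) -> result size.
    """
    best_order, best_cost = None, float("inf")
    for order in permutations(clauses):
        size = order[0][1]                 # the first clause is read as-is
        cost = size
        for _, clause_size in order[1:]:
            size = estimate_join_size(size, clause_size)
            cost += size                   # accumulate every intermediate result size
        if cost < best_cost:
            best_order, best_cost = order, cost
    return [name for name, _ in best_order], best_cost

# Hypothetical model: each join keeps 1% of the cross product.
plan, cost = cheapest_plan([("a", 100), ("b", 10), ("c", 1000)], lambda l, r: l * r // 100)
print(plan, cost)   # ['b', 'a', 'c'] 120
```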

Background
• Benchmarks evaluate and compare the performance of different reasoning systems

– The Lehigh University Benchmark (LUBM)

– The University Ontology Benchmark (UOBM)

Approach

Dynamic Optimization with an Interposed Reasoner

• Greedy ordering of the proofs of the individual clauses, according to the estimated size of each proof's result

• Deferring joins of results from individual clauses where such joins are likely to result in excessive combinatorial growth of the intermediate solution

Greedy Ordering - Example
• Suppose the knowledge base contains 10,000 students, 500 courses, 50 faculty members, and 10 departments, and the query pattern is (?S takesCourse ?C): "What courses do students take?"
• The response size is estimated by
  – exploiting the fact that each pattern represents the application of a predicate with known domain and range types, and
  – accumulating statistics on typical response sizes for previously encountered patterns involving that predicate
• For this example, an estimate might be 100,000 (10,000 × 10) if the average number of courses a student has taken is ten, although the number of possible (student, course) pairs is 5,000,000 (10,000 × 500).
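One way to combine the two estimation sources is sketched below: use the average of previously observed response sizes for a predicate when any exist, and otherwise fall back to the |domain| × |range| bound. The class and method names are hypothetical, and the sketch only covers patterns whose subject and object are both variables.

```python
from collections import defaultdict

class ResponseSizeEstimator:
    """Hypothetical estimator: observed statistics first, domain x range bound otherwise."""

    def __init__(self, class_sizes, signatures):
        self.class_sizes = class_sizes        # class name -> number of instances
        self.signatures = signatures          # predicate -> (domain class, range class)
        self.history = defaultdict(list)      # predicate -> response sizes seen so far

    def record(self, predicate, observed_size):
        """Accumulate statistics after a pattern with this predicate has been answered."""
        self.history[predicate].append(observed_size)

    def estimate(self, predicate):
        observed = self.history[predicate]
        if observed:                          # average of previously encountered responses
            return sum(observed) // len(observed)
        domain, rng = self.signatures[predicate]
        return self.class_sizes[domain] * self.class_sizes[rng]   # every possible pair

est = ResponseSizeEstimator({"Student": 10_000, "Course": 500},
                            {"takesCourse": ("Student", "Course")})
print(est.estimate("takesCourse"))   # 5000000: no statistics yet
est.record("takesCourse", 100_000)   # e.g. an average of ten courses per student
print(est.estimate("takesCourse"))   # 100000
```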

Greedy Ordering - Example
• Let the query be: "List all cases where any student took two courses from a specific faculty member."
• This query can be represented as the sequence of patterns in the following table:

Clause # | QueryPattern | Query Response | Response Size
1 | ?S1 takesCourse ?C1 | {(?S1=>si, ?C1=>ci)}i=1..100,000 | 100,000
2 | ?S1 takesCourse ?C2 | {(?S1=>sj, ?C2=>cj)}j=1..100,000 | 100,000
3 | ?C1 taughtBy fac1 | {(?C1=>cj)}j=1..3 | 3
4 | ?C2 taughtBy fac1 | {(?C2=>cj)}j=1..3 | 3

Greedy Ordering - Example
• Storage requirement for a join:
  – input size + input size + result size
• Processing complexity (using hashing to represent one set, then a linear pass over the other set):
  – max(result size, input size)
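A minimal sketch of such a hash join over variable bindings, assuming each QueryResponse is a list of Python dicts that all bind the same variables (this representation is an assumption, not taken from the slides). It indexes one input, then makes a single pass over the other; when the inputs share no variables it degenerates into a cross product, which is where combinatorial growth comes from.

```python
def hash_join(left, right):
    """Natural join of two lists of binding dicts on their shared variables.

    Builds a hash index on `left`, then makes one linear pass over `right`.
    With no shared variables, every key is () and the result is a cross product.
    """
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index = {}
    for row in left:
        index.setdefault(tuple(row[v] for v in shared), []).append(row)
    result = []
    for row in right:
        for match in index.get(tuple(row[v] for v in shared), []):
            result.append({**match, **row})   # merge the two compatible bindings
    return result

takes = [{"?S1": "s1", "?C1": "c1"}, {"?S1": "s2", "?C1": "c2"}]
taught_by_fac1 = [{"?C1": "c1"}]
print(hash_join(takes, taught_by_fac1))                              # [{'?S1': 's1', '?C1': 'c1'}]
print(len(hash_join(takes, [{"?F1": "f1"}, {"?F1": "f2"}])))         # 4: no shared variable
```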

Greedy Ordering - Example
• Assume first that the patterns are processed in the order given.
• The worst case (storage size) is the join of clause 2, where joining two sets of size 100,000 yields 1,000,000 tuples.

Clause Being Joined | Clause Evaluation | Resulting SolutionSpace | SolutionSpace Size
(initial) | | [ ] | 0
1 | {(?S1=>si, ?C1=>ci)}i=1..100,000 | [{(?S1=>si, ?C1=>ci)}i=1..100,000] | 100,000
2 | {(?S1=>sj, ?C2=>cj)}j=1..100,000 | [{(?S1=>si, ?C1=>ci, ?C2=>ci)}i=1..1,000,000] (based on an average of 10 courses/student) | 1,000,000
3 | {(?C1=>cj)}j=1..3 | [{(?S1=>si, ?C1=>ci, ?C2=>ci)}i=1..900] (joining this clause discards courses taught by other faculty) | 900
4 | {(?C2=>cj)}j=1..3 | [{(?S1=>si, ?C1=>ci, ?C2=>ci)}i=1..60] | 60

Greedy Ordering
• Now assume that the same patterns are processed in ascending order of estimated size, as shown in the following table.
• The worst case (storage size) is the final addition of clause 2, when a set of size 100,000 is joined with a set of size 270.

Clause Being Joined | Clause Evaluation | Resulting SolutionSpace | SolutionSpace Size
(initial) | | [ ] | 0
3 | {(?C1=>cj)}j=1..3 | [{(?C1=>ci)}i=1..3] | 3
4 | {(?C2=>cj)}j=1..3 | [{(?C1=>ci, ?C2=>cj)}i=1..3, j=1..3] | 9
1 | {(?S1=>si, ?C1=>ci)}i=1..100,000 | [{(?S1=>si, ?C1=>ci, ?C2=>c'i)}i=1..270] | 270
2 | {(?S1=>sj, ?C2=>cj)}j=1..100,000 | [{(?S1=>si, ?C1=>ci, ?C2=>ci)}i=1..60] | 60
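The two traces can be compared with a few lines of arithmetic. The sketch below applies the storage formula (input + input + result) and the processing estimate (max of result and input sizes) to each step, using the sizes from the two tables. The helper name `step_costs` is hypothetical, and the join of clauses 3 and 4, which share no variable, is counted as a 3 × 3 cross product of 9 tuples.

```python
def step_costs(trace):
    """Per-step (storage, processing) for a trace of immediate joins.

    trace: (size of clause being joined, resulting solution-space size) pairs,
    starting from an empty solution space.
    """
    costs = []
    current = 0                                             # solution-space size before this step
    for clause_size, result_size in trace:
        storage = current + clause_size + result_size       # input + input + result
        processing = max(result_size, current, clause_size) # hash one side, scan the other
        costs.append((storage, processing))
        current = result_size
    return costs

given_order = [(100_000, 100_000), (100_000, 1_000_000), (3, 900), (3, 60)]   # clauses 1, 2, 3, 4
greedy_order = [(3, 3), (3, 9), (100_000, 270), (100_000, 60)]                # clauses 3, 4, 1, 2

for name, trace in [("given", given_order), ("greedy", greedy_order)]:
    print(name, max(storage for storage, _ in step_costs(trace)))
# given 1200000
# greedy 100330
```

In both traces the worst step is the one that adds clause 2, as stated above, but greedy ordering cuts its storage by more than a factor of ten.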

Deferring joins - Example

• Suppose we are processing the query "What mathematics courses are taken by computer science majors?", and assume that
  – the Math department teaches 150 different courses,
  – there are 1,000 students in the CS department, and
  – there are 500 faculty overall, 50 of them in Math
• The query is represented as the following sequence of QueryPatterns:

Clause | QueryPattern | Query Response | Response Size
1 | (?S1 takesCourse ?C1) | {(?S1=>sj, ?C1=>cj)}j=1..100,000 | 100,000
2 | (?S1 memberOf CSDept) | {(?S1=>sj)}j=1..1,000 | 1,000
3 | (?C1 taughtBy ?F1) | {(?C1=>cj, ?F1=>fj)}j=1..1,500 | 1,500
4 | (?F1 worksFor MathDept) | {(?F1=>fi)}i=1..50 | 50

Deferring joins - Example
• Assume
  – the greedy ordering advocated earlier, and
  – that all joins are performed immediately
• The worst step in this trace is the final join, between sets of size 100,000 and 150,000.

Clause Being Joined | Clause Evaluation | Resulting SolutionSpace | SolutionSpace Size
(initial) | | [ ] | 0
4 | {(?F1=>fi)}i=1..50 | [{(?F1=>fi)}i=1..50] | 50
2 | {(?S1=>sj)}j=1..1,000 | [{(?F1=>fi, ?S1=>si)}i=1..50,000] | 50,000
3 | {(?C1=>cj, ?F1=>fj)}j=1..1,500 | [{(?F1=>fi, ?S1=>si, ?C1=>ci)}i=1..150,000] | 150,000
1 | {(?S1=>sj, ?C1=>cj)}j=1..100,000 | [{(?F1=>fi, ?S1=>si, ?C1=>ci)}i=1..1,000] | 1,000

Deferring joins - Example
• Carry out a join immediately only if the input QueryResponses share at least one variable; otherwise, defer the join
• Replace the input QueryResponse sets in the solution space with the result of the join
• The worst join performed would then be between sets of size 100,000 and 150, a considerable improvement over the non-deferred case.

Clause Being Joined | Query Response | Resulting SolutionSpace | SolutionSpace Size
(initial) | | [ ] | 0
4 | {(?F1=>fi)}i=1..50 | [{(?F1=>fi)}i=1..50] | 50
2 | {(?S1=>sj)}j=1..1,000 | [{(?F1=>fi)}i=1..50, {(?S1=>sj)}j=1..1,000] | 50 + 1,000
3 | {(?C1=>cj, ?F1=>fj)}j=1..1,500 | [{(?F1=>fi, ?C1=>ci)}i=1..150, {(?S1=>sj)}j=1..1,000] | 150 + 1,000
1 | {(?S1=>sj, ?C1=>cj)}j=1..100,000 | [{(?F1=>fi, ?S1=>si, ?C1=>ci)}i=1..1,000] | 1,000
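The deferral rule can be sketched as a solution space that holds several variable-disjoint partial results instead of a single table. In this minimal Python sketch, `SolutionSpace`, `restrict_to`, and `final_join` are hypothetical names modeled on the pseudocode's `restrictTo` and `finalJoin`; bindings are dicts, and the data are a tiny made-up version of the Math/CS example, processed in the same clause order 4, 2, 3, 1.

```python
def hash_join(left, right):
    """Natural join of two lists of binding dicts on their shared variables."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index = {}
    for row in left:
        index.setdefault(tuple(row[v] for v in shared), []).append(row)
    return [{**m, **row} for row in right for m in index.get(tuple(row[v] for v in shared), [])]

class SolutionSpace:
    """Holds variable-disjoint partial results; joins between them are deferred."""

    def __init__(self):
        self.components = []

    def restrict_to(self, response):
        variables = set(response[0]) if response else set()
        merged, kept = response, []
        for comp in self.components:
            if comp and variables & set(comp[0]):   # shares a variable: join immediately
                merged = hash_join(comp, merged)
            else:                                   # no shared variable: defer the join
                kept.append(comp)
        self.components = kept + [merged]

    def final_join(self):
        result = [{}]
        for comp in self.components:               # remaining joins are cross products
            result = hash_join(result, comp)
        return result

# Tiny made-up data in the shape of clauses 4, 2, 3, 1.
works_for_math = [{"?F1": "f1"}, {"?F1": "f2"}]
member_of_cs = [{"?S1": "s1"}, {"?S1": "s2"}, {"?S1": "s3"}]
taught_by = [{"?C1": "m1", "?F1": "f1"}, {"?C1": "m2", "?F1": "f2"}, {"?C1": "h1", "?F1": "f9"}]
takes_course = [{"?S1": "s1", "?C1": "m1"}, {"?S1": "s2", "?C1": "h1"}, {"?S1": "s4", "?C1": "m2"}]

space = SolutionSpace()
space.restrict_to(works_for_math)
space.restrict_to(member_of_cs)      # shares no variable with ?F1: kept as a separate component
print(len(space.components))         # 2
space.restrict_to(taught_by)         # joins with the ?F1 component only
space.restrict_to(takes_course)      # joins both components, via ?S1 and ?C1
answer = space.final_join()
print(answer)                        # one binding: s1 takes m1, taught by f1
```

Because the student set is never combined with the faculty set until a clause links them, the cross product that dominated the non-deferred trace never materializes.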

Evaluation
• We compare our algorithm against Jena, an in-memory, backward-chaining reasoner with limited ability to handle some OWL semantic rules; hence Jena was used with RDFS semantics only
• Using LUBM benchmarks representing a knowledge base
  – describing a single university: ~100,000 triples
  – describing 10 universities: ~1,000,000 triples
• Using a set of 14 queries taken from LUBM, requiring reasoning over rules associated with either
  – both RDFS and OWL semantics, or
  – the RDFS rules alone
• The comparison metric is response time
• Response size is used as
  – a sanity check on the correctness of results
  – an indicator of the complexity of reasoning

Evaluation
• Comparison against Jena with backward chaining

LUBM: 1U = 1 University (100,839 triples); 10U = 10 Universities (1,272,871 triples)

Query | answerAQuery time (1U) | result size (1U) | Jena Backwd time (1U) | result size (1U) | answerAQuery time (10U) | result size (10U) | Jena Backwd time (10U) | result size (10U)
Query 1 | 0.20 | 4 | 0.32 | 4 | 0.43 | 4 | 0.86 | 4
Query 2 | 0.50 | 0 | 130 | 0 | 2.1 | 28 | n/a | n/a
Query 3 | 0.026 | 6 | 0.038 | 6 | 0.031 | 6 | 1.5 | 6
Query 4 | 0.52 | 34 | 0.021 | 34 | 1.1 | 34 | 0.41 | 34
Query 5 | 0.098 | 719 | 0.19 | 678 | 0.042 | 719 | 1.0 | 678
Query 6 | 0.43 | 7,790 | 0.49 | 6,463 | 1.9 | 99,566 | 3.2 | 82,507
Query 7 | 0.29 | 67 | 45 | 61 | 2.2 | 67 | 8,100 | 61
Query 8 | 0.77 | 7,790 | 0.91 | 6,463 | 3.7 | 7,790 | 52 | 6,463
Query 9 | 0.36 | 208 | n/a | n/a | 2.5 | 2,540 | n/a | n/a
Query 10 | 0.18 | 4 | 0.54 | 0 | 1.8 | 4 | 1.4 | 0
Query 11 | 0.24 | 224 | 0.011 | 0 | 0.18 | 224 | 0.032 | 0
Query 12 | 0.23 | 15 | 0.0020 | 0 | 0.33 | 15 | 0.016 | 0
Query 13 | 0.025 | 1 | 0.37 | 0 | 0.21 | 33 | 0.89 | 0
Query 14 | 0.024 | 5,916 | 0.58 | 5,916 | 0.18 | 75,547 | 2.6 | 75,547

Evaluation

• Our algorithm is generally faster than Jena, sometimes by several orders of magnitude.

• Exceptions:
  – queries with very small result sets, or
  – queries 10-13, which rely on OWL semantics and so could not be answered correctly by Jena
• In two queries (2 and 9), Jena timed out.
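The "orders of magnitude" claim can be checked against the backward-chaining table with a few lines of Python. The sketch divides Jena's response time by answerAQuery's for the 10-university runs; the choice of queries is ours, leaving out the two timeouts and the OWL-dependent queries 10-13. A ratio above 1 means answerAQuery is faster; Query 7 comes out at about 3,700 (8,100 / 2.2), while Query 4 falls below 1.

```python
import math

# (answerAQuery time, Jena backward-chaining time) on the 10-university data set, copied from the
# table. Queries 2 and 9 (Jena timed out) and 10-13 (Jena lacks the OWL rules) are left out.
TIMES_10U = {
    1: (0.43, 0.86), 3: (0.031, 1.5), 4: (1.1, 0.41), 5: (0.042, 1.0),
    6: (1.9, 3.2), 7: (2.2, 8_100), 8: (3.7, 52), 14: (0.18, 2.6),
}

speedups = {q: jena / ours for q, (ours, jena) in TIMES_10U.items()}
for q, s in speedups.items():
    # log10 of the ratio: orders of magnitude by which answerAQuery is faster (negative: slower)
    print(f"Query {q}: Jena/answerAQuery = {s:,.2f} (log10 {math.log10(s):+.2f})")
```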

Evaluation
• Comparison against Jena with the hybrid reasoner

LUBM: 1U = 1 University (100,839 triples); 10U = 10 Universities (1,272,871 triples)

Query | answerAQuery time (1U) | result size (1U) | Jena Hybrid time (1U) | result size (1U) | answerAQuery time (10U) | result size (10U) | Jena Hybrid time (10U) | result size (10U)
Query 1 | 0.20 | 4 | 0.37 | 4 | 0.43 | 4 | 0.93 | 4
Query 2 | 0.50 | 0 | 1,400 | 0 | 2.1 | 28 | n/a | n/a
Query 3 | 0.026 | 6 | 0.050 | 6 | 0.031 | 6 | 1.5 | 6
Query 4 | 0.52 | 34 | 0.025 | 34 | 1.1 | 34 | 0.55 | 34
Query 5 | 0.098 | 719 | 0.029 | 719 | 0.042 | 719 | 2.7 | 719
Query 6 | 0.43 | 7,790 | 0.43 | 6,463 | 1.9 | 99,566 | 3.7 | 82,507
Query 7 | 0.29 | 67 | 38 | 61 | 2.2 | 67 | n/a | n/a
Query 8 | 0.77 | 7,790 | 2.3 | 6,463 | 3.7 | 7,790 | n/a | n/a
Query 9 | 0.36 | 208 | n/a | n/a | 2.5 | 2,540 | n/a | n/a
Query 10 | 0.18 | 4 | 0.62 | 0 | 1.8 | 4 | 1.6 | 0
Query 11 | 0.24 | 224 | 0.0010 | 0 | 0.18 | 224 | 0.08 | 0
Query 12 | 0.23 | 15 | 0.0010 | 0 | 0.33 | 15 | 0.016 | 0
Query 13 | 0.025 | 1 | 0.62 | 0 | 0.21 | 33 | 1.2 | 0
Query 14 | 0.024 | 5,916 | 0.72 | 5,916 | 0.18 | 75,547 | 2.5 | 75,547

Evaluation
• "Jena Hybrid" means that Jena materializes some of the rules, so
  – it starts with a longer list of tuples, and
  – avoiding combinatorial explosion through deferral becomes even more important
• The times here tend to be somewhat closer, but Jena has even more difficulty returning any answer at all on the larger benchmark.

Conclusions
• We reported on our efforts to use backward-chaining reasoners to accommodate a changing knowledge base.
• We developed a query-optimization algorithm that works with a reasoner interposed between the knowledge base and the query interpreter.
• We compared our implementation with traditional backward-chaining reasoners and found that our implementation
  – could handle much larger knowledge bases
  – with more complete rule sets (OWL Horst), and
  – is faster

Future Work

• We will address scaling the knowledge base to the sizes that forward-chaining reasoners can handle.

• We will work on a hybrid reasoner that combines the best of forward chaining and backward chaining.

Answering a query

QueryResponse answerAQuery(query: Query) {
    // Set up initial SolutionSpace
    SolutionSpace solutionSpace = empty;
    // Repeatedly reduce SolutionSpace by applying
    // the most restrictive pattern
    while (unexplored patterns remain in the query) {
        computeEstimatesOfResponseSize(unexplored patterns);
        QueryPattern p = unexplored pattern with smallest estimate;
        // Restrict SolutionSpace via
        // exploration of p
        QueryResponse answerToP = BackwardChain(p);
        solutionSpace.restrictTo(answerToP);
    }
    return solutionSpace.finalJoin();
}
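The pseudocode can be turned into a small runnable Python sketch. Everything beyond the loop structure is an assumption for illustration: `match` is a hypothetical stand-in for `BackwardChain` that does plain pattern matching with no inference, the size estimate simply counts matches, and the solution space is a list of variable-disjoint components so that `restrictTo` can defer joins as in the deferral example.

```python
def hash_join(left, right):
    """Natural join of two lists of binding dicts on their shared variables."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index = {}
    for row in left:
        index.setdefault(tuple(row[v] for v in shared), []).append(row)
    return [{**m, **row} for row in right for m in index.get(tuple(row[v] for v in shared), [])]

def match(triples, pattern):
    """Hypothetical stand-in for BackwardChain(p): plain pattern matching, no inference."""
    answers = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break                      # same variable bound to two different values
            elif term != value:
                break                          # constant does not match
        else:
            answers.append(binding)
    return answers

def answer_a_query(query, estimate, backward_chain):
    space = []                                 # SolutionSpace: (variables, rows), variable-disjoint
    unexplored = list(query)
    while unexplored:
        # Greedy step: re-estimate and take the pattern with the smallest estimate.
        p = min(unexplored, key=estimate)
        unexplored.remove(p)
        variables = {term for term in p if term.startswith("?")}
        rows = backward_chain(p)
        # restrictTo: join with components that share a variable, defer the others.
        kept = []
        for comp_vars, comp_rows in space:
            if comp_vars & variables:
                variables, rows = variables | comp_vars, hash_join(comp_rows, rows)
            else:
                kept.append((comp_vars, comp_rows))
        space = kept + [(variables, rows)]
    # finalJoin: the remaining components share no variables, so combine them by cross product.
    result = [{}]
    for _, comp_rows in space:
        result = hash_join(result, comp_rows)
    return result

TRIPLES = [  # tiny made-up knowledge base
    ("f1", "worksFor", "MathDept"), ("f2", "worksFor", "MathDept"), ("f9", "worksFor", "HistDept"),
    ("s1", "memberOf", "CSDept"), ("s2", "memberOf", "CSDept"), ("s4", "memberOf", "MathDept"),
    ("m1", "taughtBy", "f1"), ("m2", "taughtBy", "f2"), ("h1", "taughtBy", "f9"),
    ("s1", "takesCourse", "m1"), ("s2", "takesCourse", "h1"), ("s4", "takesCourse", "m2"),
]
QUERY = [("?S1", "takesCourse", "?C1"), ("?S1", "memberOf", "CSDept"),
         ("?C1", "taughtBy", "?F1"), ("?F1", "worksFor", "MathDept")]

answer = answer_a_query(QUERY, lambda p: len(match(TRIPLES, p)), lambda p: match(TRIPLES, p))
print(answer)   # one binding: CS student s1 takes Math course m1, taught by f1
```

A real implementation would replace `match` with calls to the reasoning service and the exact-count estimate with the statistics-based estimator, which is where the uncertainty described in the Problem slide enters.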