Cooperative Answering Systems in Big Data BIG DATA – 2014, Chasseneuil, France Géraud FOKOU, Stéphane JEAN, Allel HADJALI LIAS/ENSMA-University of Poitiers, FRANCE



BIG DATA, 19-21 November 2014, Chasseneuil, FRANCE

BIG DATA CONTEXT

Increase in data production
o Sensor data, e-business, social networks

Diversification of data structure
o Unstructured, semi-structured, and structured data

Distribution of data across multiple, distinct data sources


BIG DATA RETRIEVING

From 4-V to 5-V in Big Data: Visualisation

o Retrieving, querying Big Data

Objectives
o Efficiency: speed of processing
o Effectiveness: quality of answers

Plethoric Answers Problem: big data, big answer set

Empty Answers Problem: big data, empty answer set


CONTEXT AND PROBLEM

Context

Structure: semantic data
• Data format: RDFS, OWL, N3,…
• Physical storage representation: triple (vertical), horizontal, and binary
• Query language: SQL, SPARQL, and hybrid languages
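The three storage representations named above can be pictured with a small sketch. This is only an illustration on toy data: the movie facts, the variable names, and the choice of one row per subject for the horizontal layout are assumptions, not something taken from the slides.

```python
# Toy facts (hypothetical data) stored in three different layouts.
triples = [
    ("m1", "rdf:type", "Drama"),
    ("m1", "mo:Title", "Heat"),
    ("m2", "rdf:type", "Comedy"),
    ("m2", "mo:Title", "Big"),
]

# Vertical (triple table): one row per (subject, predicate, object).
vertical = list(triples)

# Binary: one two-column (subject, object) table per predicate.
binary = {}
for s, p, o in triples:
    binary.setdefault(p, []).append((s, o))

# Horizontal: one row per subject, one column per predicate
# (assumes each subject has at most one value per predicate).
horizontal = {}
for s, p, o in triples:
    horizontal.setdefault(s, {})[p] = o
```

The vertical layout is the most generic, while the binary and horizontal layouts trade that generality for fewer joins on common access patterns.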

Problem

Empty answer set: return alternative answers

L1: Lack of relaxation control → O1: Definition of relaxation operators with control parameters

L2: Instance-independent ranking → O2: Our ranking function depends on both instances and queries

L3: Integration in query language → O3: A SPARQL extension implemented on top of Jena


CONTRIBUTIONS


Contributions: Relaxation Operators

Based on relations between data
• Order relation (order in the integer set)
• Conceptual relation (generalization)

Similarity between queries
• Based on value distance
• Based on conceptual/structural distance

(Distance-Based) [Huang08]

(Content-Based) [Jean13]

Operators proposed
• Relaxation clause: APPROX(OP, TopK)
• Predicate relaxation: PRED(Q, Prop, epsilon)
• Generalization: GEN(Q, C, level)
• Substitution: SIB(Q, C, [C1, C2,…, Cn])
• Aggregation of operators: AND

Select ?Title
Where { (?movie rdf:type Drama).
        (?movie mo:Title ?Title).
        (?movie mo:start 4) }
APPROX { GEN (Drama, 1) AND PRED (Start, δ) }
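To make the effect of the two operators in the query above concrete, here is a minimal Python sketch. It is not the Jena-based implementation mentioned in the slides: the `SUPERCLASS` hierarchy, the tuple encoding of triple patterns, and the function names `gen` and `pred` are all assumptions made for illustration, and δ is set to 1.

```python
# Hypothetical class hierarchy: child -> parent.
SUPERCLASS = {"Drama": "Movie", "Comedy": "Movie", "Movie": "Work"}

def gen(query, cls, level):
    """Replace class `cls` by its ancestor `level` steps up the hierarchy."""
    target = cls
    for _ in range(level):
        target = SUPERCLASS.get(target, target)  # stop at the root
    return [(s, p, target) if p == "rdf:type" and o == cls else (s, p, o)
            for s, p, o in query]

def pred(query, prop, epsilon):
    """Turn the equality test on `prop` into the interval [v - eps, v + eps]."""
    return [(s, p, (o - epsilon, o + epsilon)) if p == prop else (s, p, o)
            for s, p, o in query]

# The example query, encoded as a list of triple patterns.
query = [("?movie", "rdf:type", "Drama"),
         ("?movie", "mo:Title", "?Title"),
         ("?movie", "mo:start", 4)]

# GEN(Drama, 1) AND PRED(start, delta) with delta = 1 (assumed value).
relaxed = pred(gen(query, "Drama", 1), "mo:start", 1)
```

Under these assumptions the relaxed query asks for any `Movie` whose `mo:start` value lies between 3 and 5, which is a strictly broader condition than the original one.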


Contributions: Data Distance

(Distance-Based) [Huang08]

(Content-Based) [Jean13]: the similarity is defined from the information content of a class and from the nearest common ancestor of the two classes being compared (least common ancestor)

The distance is defined between a database instance tuple and Q’, where Q’ is a relaxed variant of Q

Levenshtein distance: an edit distance that measures the similarity between two strings

Ranking relaxed queries and alternative answers
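The Levenshtein distance mentioned above can be computed with the standard dynamic-programming recurrence. The sketch below is a generic textbook version, not code from the presented system; the function name is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning `a` into `b`."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.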


Contributions: Relaxation Strategies

Using MFS (Minimal Failing Subqueries)
• Query as a conjunction of criteria
• Find all the minimal conjunctions of criteria that return an empty answer set

Interactive relaxation
• User-based strategy
• Return advice for refining the query, or the most similar answers
• Submit the refined queries

Using XSS (maXimal Succeeding Subqueries)
• Query as a conjunction of criteria
• Find all the maximal conjunctions of criteria that return a non-empty answer set
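The MFS and XSS definitions can be illustrated with a brute-force sketch that enumerates every subset of criteria. It is exponential in the number of criteria and only meant to show the definitions at work; the toy `DATA`, the `CRITERIA` names, and the helper functions are assumptions, not the algorithm used by the authors.

```python
from itertools import combinations

# Hypothetical in-memory data set.
DATA = [
    {"genre": "Drama", "year": 1999, "stars": 3},
    {"genre": "Comedy", "year": 2005, "stars": 4},
]

# The failing query is the conjunction of these three criteria.
CRITERIA = {
    "drama": lambda r: r["genre"] == "Drama",
    "recent": lambda r: r["year"] >= 2000,
    "four_stars": lambda r: r["stars"] == 4,
}

def fails(names):
    """True if no row satisfies every criterion in `names`."""
    return not any(all(CRITERIA[n](r) for n in names) for r in DATA)

def mfs_and_xss(names):
    """Enumerate all non-empty subqueries and keep the minimal failing
    and maximal succeeding ones (with respect to set inclusion)."""
    subsets = [frozenset(c) for k in range(1, len(names) + 1)
               for c in combinations(names, k)]
    failing = [s for s in subsets if fails(s)]
    succeeding = [s for s in subsets if not fails(s)]
    mfs = [s for s in failing if not any(t < s for t in failing)]
    xss = [s for s in succeeding if not any(s < t for t in succeeding)]
    return mfs, xss

mfs, xss = mfs_and_xss(list(CRITERIA))
```

On this toy data the MFS are {drama, recent} and {drama, four_stars}, which explains why the full query fails, and the XSS are {drama} and {recent, four_stars}, which are the largest parts of the query that can still be answered.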

Automatic relaxation
• Based on similarity and distance
• Find the relaxed queries most similar to the original query
• Find the answers nearest to the abstract model of the wanted answer


Perspectives

Performance

Optimization of the relaxation process by using the database statistics to find the optimal step of relaxation: Selectivity

Multiple-query optimization by using the similarity between the original query and the relaxed queries

Optimization of the relaxation process to quickly find a set of alternative answers

User-aware relaxation process

Leveraging user profiles/preferences to customize the relaxation process


Publications and References

Fokou G., Un Framework pour la relaxation des requêtes dans les bases de données du Web Sémantique, Actes du VIIème Forum Jeunes Chercheurs, XXXIIème Congrès INFORSID 2014 (FJC-INFORSID 2014).

Fokou G., Jean S., Hadjali A., Endowing Semantic Query Languages with Advanced Relaxation Capabilities, Proceedings of the 21st International Symposium on Methodologies for Intelligent Systems (ISMIS 2014), 2014.

Jean S., Hadjali A., Ammar M., Towards a Cooperative Query Language for Semantic Web Database Queries, On the Move to Meaningful Internet Systems: OTM 2013 Conferences, Springer Berlin Heidelberg, September.

Corby O., Dieng-Kuntz R., Faron-Zucker C., Gandon F. L., Searching the Semantic Web: Approximate Query Processing Based on Ontologies, IEEE Intelligent Systems, 2006.

Godfrey P., Minimization in cooperative response to failing database queries, IJCIS, 1997.

Hogan A., Mellotte M., Powell G., Stampouli D., Towards Fuzzy Query-Relaxation for RDF, ESWC’12, 2012.

Huang H., Liu C., Zhou X., Approximating query answering on RDF databases, World Wide Web, January 2012.

Hurtado C. A., Poulovassilis A., Wood P. T., Query Relaxation in RDF, JODS, 2008.

Poulovassilis A., Wood P. T., Combining Approximation and Relaxation in Semantic Web Path Queries, Proceedings of the 9th International Semantic Web Conference (ISWC’10), 2010.

Islam M. S., Liu C., Zhou R., On Modeling Query Refinement by Capturing User Intent Through Feedback, Proceedings of the Twenty-Third Australasian Database Conference - Volume 124, ADC ’12, Australian Computer Society, Inc., Darlinghurst, Australia, 2012.

Jannach D., Finding Preferred Query Relaxations in Content-Based Recommenders, Intelligent Techniques and Tools for Novel System Architectures, vol. 109, Springer Berlin, Heidelberg, p. 81-97, September.


Thank you for your attention …

Web site : http://www.lias-lab.fr