Cooperative Answering Systems in Big Data
BIG DATA – 2014, Chasseneuil, France
Géraud FOKOU, Stéphane JEAN, Allel HADJALI
LIAS/ENSMA-University of Poitiers, FRANCE
BIG DATA, 19-21 November 2014, Chasseneuil, FRANCE
BIG DATA CONTEXT
Increase of Data Production
o Sensor Data, E-Business, Social Networks
Diversification of Data Structure
o Unstructured, semi-structured, structured data
Distribution of data across multiple, distinct data sources
BIG DATA RETRIEVING
From 4-V to 5-V in Big Data: Visualisation
o Retrieving, querying Big Data
Objectives
Efficiency: Speed of Process
Effectiveness: Answer Quality
Plethoric Answers Problem: Big data, big answer sets
Empty Answers Problem: Big data, empty answer sets
CONTEXT AND PROBLEM STATEMENT
Context
Data Structure: Semantic Data
• Data format: RDFS, OWL, N3,…
• Physical storage representation: triple (vertical), horizontal, and binary
• Query language: SQL, SPARQL, and hybrid languages
Problem Statement
Empty Answer Sets: Return Alternative Answers
L1 : Lack of relaxation control → O1 : Definition of relaxation operators with control parameters
L2 : Instance-independent ranking → O2 : Our ranking function depends both on instances and queries
L3 : Integration in query language → O3 : A SPARQL extension implemented on top of Jena
Contributions: Relaxation Operators
Based on Relations between Data
• Order relation (order in the integer set)
• Conceptual relation (generalization)
Similarity between queries
• Based on value distance
• Based on conceptual/structural distance
(Distance-Based) [Huang08]
(Content-Based) [Jean13]
Proposed Operators
• Relaxation clause: APPROX(OP, TopK)
• Predicate relaxation: PRED(Q, Prop, epsilon)
• Generalization: GEN(Q, C, level)
• Sibling substitution: SIB(Q, C, [C1, C2,…, Cn])
• Combination of operators: AND
Select ?Title
Where {(?movie rdf:type Drama).
(?movie mo:Title ?Title).
(?movie mo:start 4)}
APPROX { GEN(Drama, 1) AND PRED(Start, δ) }
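The effect of the GEN and PRED operators in the query above can be illustrated with a minimal Python sketch. It is not the authors' Jena-based implementation: the class hierarchy, the movie records, and the helper names (`gen`, `pred`, `is_a`, `run`) are all hypothetical, and the value 2 stands in for δ.

```python
# Hypothetical toy data: a class hierarchy and a few movie instances.
SUPERCLASS = {"Drama": "Movie", "Comedy": "Movie", "Movie": "Work"}

MOVIES = [
    {"title": "A", "type": "Drama", "start": 6},
    {"title": "B", "type": "Comedy", "start": 4},
    {"title": "C", "type": "Comedy", "start": 9},
]


def gen(cls, level):
    """GEN-like step (assumed semantics): climb `level` steps up the hierarchy."""
    for _ in range(level):
        cls = SUPERCLASS.get(cls, cls)  # stay put once the root is reached
    return cls


def pred(value, epsilon):
    """PRED-like step (assumed semantics): widen `= value` to an interval."""
    return (value - epsilon, value + epsilon)


def is_a(cls, target):
    """True if `cls` is `target` or one of its descendants."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def run(cls, start_range):
    """Evaluate the toy query: titles of movies of class `cls` in the range."""
    low, high = start_range
    return [
        m["title"]
        for m in MOVIES
        if is_a(m["type"], cls) and low <= m["start"] <= high
    ]


original = run("Drama", (4, 4))            # strict query: no Drama starts at 4
relaxed = run(gen("Drama", 1), pred(4, 2))  # Drama -> Movie, start in [2, 6]
print(original, relaxed)
```

On this toy data the strict query returns nothing, while the relaxed query returns movies A and B, which is the behaviour the APPROX clause is meant to provide.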
Contributions: Data Distance
(Distance-Based) [Huang08]
(Content-Based) [Jean13]: similarity defined from the information content of a class and the nearest common ancestor (least common ancestor) of the two classes being compared
Answer ranking: defined over a database instance tuple and Q’, where Q’ is a relaxed variant of Q
Levenshtein distance: an edit distance measuring the similarity between two strings
Ranking relaxed queries and alternative answers
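As an illustration of the string distance mentioned above, here is a minimal Python sketch of the standard Levenshtein edit distance, followed by a hypothetical helper (`rank_by_distance`, not from the slides) that orders candidate values by their distance to the value the user asked for.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute (or keep) the character
                )
            )
        previous = current
    return previous[-1]


def rank_by_distance(wanted, candidates):
    """Hypothetical ranking helper: closest strings first."""
    return sorted(candidates, key=lambda c: levenshtein(wanted, c))


print(levenshtein("kitten", "sitting"))
print(rank_by_distance("Drama", ["Comedy", "Dramatic", "Dramas"]))
```

The same pattern, sorting alternatives by a distance to the original request, applies to ranking relaxed queries once a query-level similarity is chosen.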
Contributions: Relaxation Strategies
Using MFS (Minimal Failing Subqueries)
• Query as a conjunction of criteria
• Find all the minimal conjunctions of criteria that return an empty answer set
Interactive Relaxation
• User-based strategy
• Return advice for refining the query, or the most similar answers
• Ask the user for the refined query
Using XSS (maXimal Succeeding Subqueries)
• Query as a conjunction of criteria
• Find all the maximal conjunctions of criteria that do not return an empty answer set
Automatic Relaxation
• Based on similarity and distance
• Find the relaxed queries most similar to the original query
• Find the answers nearest to the abstract model of the wanted answer
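The MFS and XSS notions above can be sketched with a brute-force enumeration in Python. This is only an illustration under stated assumptions: the rows, the criteria names, and the helper functions are hypothetical, and enumerating every subset is exponential, so it is not a practical strategy for large queries.

```python
from itertools import combinations

# Hypothetical data and criteria; the query is the conjunction of all criteria.
ROWS = [
    {"type": "Drama", "year": 1999, "rating": 6},
    {"type": "Comedy", "year": 2010, "rating": 9},
]

CRITERIA = {
    "drama": lambda r: r["type"] == "Drama",
    "recent": lambda r: r["year"] >= 2005,
    "top": lambda r: r["rating"] >= 8,
}


def succeeds(names):
    """A subquery succeeds if at least one row satisfies all its criteria."""
    return any(all(CRITERIA[n](row) for n in names) for row in ROWS)


def mfs_and_xss(names):
    """Enumerate all non-empty subqueries and keep the minimal failing
    and the maximal succeeding ones."""
    subsets = [
        frozenset(c)
        for k in range(1, len(names) + 1)
        for c in combinations(names, k)
    ]
    failing = [s for s in subsets if not succeeds(s)]
    succeeding = [s for s in subsets if succeeds(s)]
    # Minimal failing: no failing proper subset.
    mfs = [s for s in failing if not any(o < s for o in failing)]
    # Maximal succeeding: no succeeding proper superset.
    xss = [s for s in succeeding if not any(s < o for o in succeeding)]
    return mfs, xss


mfs, xss = mfs_and_xss(list(CRITERIA))
print("MFS:", [sorted(s) for s in mfs])
print("XSS:", [sorted(s) for s in xss])
```

Here the full query fails, the MFS show which pairs of criteria conflict, and each XSS is a candidate relaxed query that is guaranteed to return answers.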
Perspectives
Performance
Optimization of the relaxation process by using the database statistics to find the optimal step of relaxation: Selectivity
Multiple-query optimization by using the similarity between the original query and the relaxed queries
Optimization of the relaxation process to quickly find a set of alternative answers
User-aware relaxation process
Leveraging user profiles/preferences to customize the relaxation process
Publications and References
Géraud FOKOU, Un Framework pour la relaxation des requêtes dans les bases de données du Web Sémantique, Actes VIIème Forum Jeunes Chercheurs, XXXIIème Congrès INFORSID 2014 (FJC-INFORSID 2014)
Géraud FOKOU, Stéphane JEAN, Allel HADJALI, Endowing Semantic Query Languages with Advanced Relaxation Capabilities, Proceedings of the 21st International Symposium on Methodologies for Intelligent Systems (ISMIS 2014), 2014
Stéphane JEAN, Allel HADJALI, Ammar M., Towards a Cooperative Query Language for Semantic Web Database Queries, On the Move to Meaningful Internet Systems: OTM 2013 Conferences, Springer Berlin Heidelberg, September
Corby O., Dieng-Kuntz R., Faron-Zucker C., Gandon F. L., Searching the Semantic Web: Approximate Query Processing Based on Ontologies, IEEE Intelligent Systems, 2006.
Godfrey P., Minimization in cooperative response to failing database queries, IJCIS, 1997.
Hogan A., Mellotte M., Powell G., Stampouli D., Towards Fuzzy Query-Relaxation for RDF, ESWC’12, 2012.
Huang H., Liu C., Zhou X., Approximating query answering on RDF databases, Journal of World Wide Web, 2012.
Hurtado C. A., Poulovassilis A., Wood P. T., Query Relaxation in RDF, JODS, 2008.
Poulovassilis A., Wood P. T., Combining Approximation and Relaxation in Semantic Web Path Queries, Proceedings of the 9th International Semantic Web Conference (ISWC’10), 2010.
Islam M. S., Liu C., Zhou R., On Modeling Query Refinement by Capturing User Intent Through Feedback, Proceedings of the Twenty-Third Australasian Database Conference - Volume 124, ADC ’12, Australian Computer Society, Inc., Darlinghurst, Australia, 2012.
Jannach D., Finding Preferred Query Relaxations in Content-Based Recommenders, Intelligent Techniques and Tools for Novel System Architectures, vol. 109, Springer Berlin, Heidelberg, p. 81-97, September.