musyop: towards a query optimization for heterogeneous distributed database system in energy data...

18
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management Zhan Liu, Fabian Cretton, Anne Le Calvé, Nicole Glassey, Alexandre Cotting, Fabrice Chapuis Institute of Business Information Systems University of Applied Sciences and Arts Western Switzerland HES-SO Valais, Sierre, Switzerland

Upload: university-of-applied-sciences-western-switzerland

Post on 10-May-2015

335 views

Category:

Technology


0 download

DESCRIPTION

The integration of data from multiple distributed and heterogeneous sources has long been an important issue in information system research. In this study, we considered the query access and its optimization in such an integration scenario in the context of energy management by using SPARQL. Specifically, we provided a federated approach - a mediator server - that allows users to query access to multiple heterogeneous data sources, including four typical types of databases in energy data resources: relational database Triplestore, NoSQL database, and XML. A MUSYOP architecture based on this approach is then presented and our solution can realize the process data acquisition and integration without the need to rewrite or transform the local data into a unified data.

TRANSCRIPT

Page 1: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

MUSYOP: Towards a Query Optimization for

Heterogeneous Distributed Database System

in Energy Data Management

Zhan Liu, Fabian Cretton, Anne Le Calvé, Nicole Glassey,

Alexandre Cotting, Fabrice Chapuis

Institute of Business Information Systems

University of Applied Sciences and Arts Western Switzerland

HES-SO Valais, Sierre, Switzerland

Page 2: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

OVERVIEW

• Who Are We ?

• Introduction

• Objectives

• Research methodologies

• Architecture of Heterogeneous Distributed Database System

• Implementation and Use Cases

• Conclusion and Future Research

Page 3: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

WHO ARE WE ?

Page 4: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

WHO ARE WE ?

• 72 staff members

– 20 Professors

– 14 Research scientists

– 29 Research assistants

– 9 Administrative collaborators

and interns

• 92 Scientific publications

• 291 Projects (2013)

• CHF 8,85 millions (Turnover

in 2013)

Page 5: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

CORE COMPETENCIES

eHealth eEnergy

BPM

Cloud Computing

Data Intelligence

Ergonomy, Usability

Intelligent Agents

Internet of Things

Medical Information

Processing

Semantic Web

Software Engineering

eGov ERP eServices

Page 6: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

INTRODUCTION

• Databases in the context of an energy database management system

(EDMS) are non-integrated, distributed and heterogeneous

• As a result, information integration is becoming increasing important

• BUT, It consumes “a great deal of time and money”

Page 7: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

OBJECTIVES

• To propose a uniform approach by using Semantic Web

technologies for SPARQL querying a heterogeneous distributed

database system named MUSYOP for energy data management

• To provide transparent query access to multiple heterogeneous data

sources

• Query optimization to speed-up search executions

Page 8: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

RESEARCH METHODOLOGIES

• (1) Warehouse approach (centralized system)

High efficiency and the capability of extracting deeper information for decision

making

Must set up “data cleansing” and “data standardization” before actual use

Could take months of planning

• (2) Federated approach (decentralized system)

High adaptability to frequent changes of data sources

support of large numbers of data sources and data sources with high heterogeneity

• MUSYOP focused on the federated approach

Page 9: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

ARCHITECTURE OF HETEROGENEOUS

DISTRIBUTED DATABASE SYSTEM

Page 10: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

COMPONENTS OF ARCHITECTURE (1)

• Distributed Query Decomposer

– Generates a number of transaction to match remote data sources

– Assembles transaction results and returns to user

• Query Optimizer

– Data source optimization: data source selection and building sub-queries

– Join order optimization: determine the numbers of intermediate return results

• Distributed Transaction Coordinator

– Detects and handles persistent records and manages the communications with

databases

Distributed Query

Decomposer

Query Optimizer

Distributed Transaction Coordinator

3

Distributed transactions

Optimal sub-queries

4Query Parser 2

Parsed query

Page 11: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

COMPONENTS OF ARCHITECTURE (2)

• D2RQ

– Query relational database using SPARQL

– Custom D2RQ mapping file for describing the relation between an ontology and an

relational data model

• SPARQL Endpoint

– Query the knowledge bases, such as TripleStore via SPARQL protocole services

• SPARQL2XQUERY

– Framework provides a generic method for SPARQL to XQuery translation

• ALLEGROGRAPH

– Query MongoDB databases using SPARQL

– Manual transform datasets from MongoDB to TripleStore

– SPARQLverse

D2RQSPARQL

ENDPOINTSPARQL2XQUERY ALLEGROGRAPH

Page 12: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

ARCHITECTURE IMPLEMENTATION

User’s notification settings

Frequent data of consumption

Daily, monthly and yearly consumption data and user alert management data

Building information, cities location information and weather information

D2RQ

Local Schema DBMS

Local Query Convertor

Local Schema DBMS

MYSQL

Distributed Database Server

SPARQLENDPOINT

Local Schema DBMS

Local Query Convertor

Local Schema DBMS

MYSQL

SPARQL2XQUERY

Local Schema DBMS

Local Query Convertor

Local Schema DBMS

MYSQL

ALLEGROGRAPH

Local Schema DBMS

Relational DB

Triple Store XML NOSQL

Network

6 6 6 67 77 7Local query

ResultLocal query

ResultLocal query

ResultLocal query

Result

Network Network

Distributed Database ServerDistributed Database Server Distributed Database Server

Matching

Link

Aggregation

Page 13: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

ENERGY USE CASE

Page 14: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

SPARQL QUERY PREFIX db: <http://localhost:2020/resource/>

PREFIX vocab: <http://localhost:2020/resource/vocab/>

prefix intbatXtra: <http://websemantique.ch/onto/intBatXtra#>

prefix musyopOnto: <http://www.websematique.ch/voc/musyop#>

prefix dbp-ont: <http://dbpedia.org/ontology/>

prefix xsd: <http://www.w3.org/2001/XMLSchema#>

select ?userID ?buildingID ?buildingLabel ?cpID ?cons1 ?cons2 ?consDiff

?city ?elevation

{

SERVICE <http://localhost:2020/sparql> {

?config_port a vocab:config_ports;

vocab:config_ports_affectation "Gaz";

vocab:config_ports_batiment_id ?buildingID ;

vocab:config_ports_config_port_id ?cpID.

?donneeMens2011 vocab:donnees_mensuelles_config_port ?config_port;

vocab:donnees_mensuelles_timestamp "2011-01-

01T00:00:00"^^xsd:dateTime;

vocab:donnees_mensuelles_cons ?cons1.

?donneeMens2012 vocab:donnees_mensuelles_config_port ?config_port;

vocab:donnees_mensuelles_timestamp "2012-01-

01T00:00:00"^^xsd:dateTime;

vocab:donnees_mensuelles_cons ?cons2.

?buildingUser rdf:type vocab:building_user;

vocab:building_user_building_id ?buildingID;

vocab:building_user_utilisateur_id ?userID;

}

?build a intbatXtra:Object;

musyopOnto:hasID ?buildingID ;

rdfs:label ?buildingLabel;

musyopOnto:locatedIn ?city.

?city dbp-ont:elevation ?elevation.

Filter(?elevation < ?paramElevationMax)

BIND((xsd:double(?cons2)/xsd:double(?cons1)) as ?consDiff).

Filter(?consDiff > ?paramConsDiffMax)

}

ORDER BY ?buildingID ?userID

limit 100

VALUES (?userID) { USERID_VALUES }

Page 15: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

GRAPHICAL USER INTERFACES

Page 16: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

GRAPHICAL USER INTERFACES

Page 17: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

GRAPHICAL USER INTERFACES

Page 18: MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

CONCLUSION AND FUTURE RESEARCH

• Presented an architecture of heterogeneous distributed database

system for energy data management

• Integration of a mediator server to access and coordinate with four

different databases: relational database, NOSQL, Triple Store and

XML

• Proposed an approach for query optimization based on our

architecture

• Future work required on the evaluation our system with a large

amount of energy data in Switzerland