RDF Validation in a Linked Data world: a vision beyond structural and value range validation
Miguel Esteban Gutiérrez, Raúl García Castro, Nandana Mihindukulasooriya
RDF Validation Workshop, September 10th-11th, 2013
Center for Open Middleware

Data validation is a vital step for ensuring the quality of data, and expressive languages for doing so, together with their related tools, are essential for a data model to be adopted by industry. Many data representation and storage technologies, such as relational databases or XML, use expressive schema languages for defining the structure of and the constraints on data, making it possible to keep the quality and consistency of the data intact. In the context of semantic and Linked Data technologies, which are built upon the Open World Assumption and the Non-unique Name Assumption, data validation becomes a challenge, as the languages currently used to describe these constraints (i.e., RDF Schema and OWL) are better suited to inferring data than to validating it. There is a clear need for more expressive languages with which to define rules for validating RDF data. However, taking a wider view of the different use cases in which RDF data is used, and considering the applications that consume RDF data as Linked Data, reveals requirements and concerns that go beyond structural validation and data value range validation. In this paper, we identify the different requirements and factors that need to be taken into account and discussed in the context of data validation in applications that publish and consume Linked Data. These factors are grouped into three main categories: data source factors, procedure factors, and context factors.
We believe that having this broader view will help to identify the concrete requirements for data validation, especially in the context of Linked Data.

Linked Data & the ALM iStack Project

Objective: to foster the adoption of Linked Data technologies as the means for facilitating application integration in enterprise-grade environments in the ALM (Application Lifecycle Management) domain
Challenge: to provide the means for ensuring that the data exchanged between the applications of the enterprise portfolio is consistent and valid whilst keeping the integrity of the data in each of these applications
The ideas that I will be presenting today have their origin in the work we are carrying out in the context of the ALM iStack project, in collaboration with Bank of Santander. The objective of the project is to foster the adoption of Linked Data as an enabling technology for the integration of ALM applications in enterprise-grade environments. One of the challenges that we have found during this first year of work is the need to ensure that the data exchanged between the enterprise applications is consistent and valid while also keeping the integrity of the data in each of these applications.
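The validation this challenge calls for is essentially a closed-world check over the data an application actually receives, which RDFS/OWL inference alone will not provide: under the Open World Assumption a missing property value is merely unknown, not invalid. A minimal illustrative sketch in plain Python over triples as tuples (the property names echo the example data used later in the slides; `check_min_cardinality` is our own hypothetical helper, not project code):

```python
# Triples received from a data source, as (subject, predicate, object) tuples.
# Under OWL semantics a missing value is merely "unknown"; for application-side
# validation we treat the received graph as the complete description.
triples = {
    ("products:prod3231", "rdf:type", "ai:Product"),
    ("products:prod3231", "ai:hasWorkingGroup", "wg:wg44"),
}

def check_min_cardinality(graph, subject, predicate, minimum=1):
    """Closed-world check: the subject must carry at least `minimum`
    values for `predicate` in the received data."""
    count = sum(1 for s, p, _ in graph if s == subject and p == predicate)
    return count >= minimum

# A Product is expected to have a dc:title; inference would never flag the
# missing title, but a closed-world validator reports the violation.
assert check_min_cardinality(triples, "products:prod3231", "ai:hasWorkingGroup")
assert not check_min_cardinality(triples, "products:prod3231", "dc:title")
```

A real validator would operate over a parsed RDF graph and a declarative rule language rather than hard-coded checks; the point is only that validation must treat the received graph as complete, unlike inference.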
Use case

Simplified Corrective Maintenance Process
http://sites.google.com/a/centeropenmiddleware.com/alm-istack/deliverables/R1.3_ALM_iStack_Proof-of-Concept.pdf
As part of the project we are building a proof of concept that covers a simplified version of the corrective maintenance process used in the Middleware Laboratories of Bank of Santander, which consists of the stages shown in the slide.

Use case scenario

Change Management, Requirement Management, Configuration Management, Quality Management, Asset Management
Data model, Remote API
The enactment of this process involves several ALM applications devoted to different ALM areas, namely: change management, asset management, configuration management, requirement management, and quality management. In addition, a non-ALM application for managing organizational information is also used. As shown, each of these applications has its own data model, remote API, and user interface.

Linked Data Application Maturity Model

Linked Data Enabled Application: exposes all or part of its data following the Linked Data principles; the data exposed is sound and complete from the application perspective
Linked Data Capable Application: consumes data published following the Linked Data principles
Linked Data Aware Application: a Linked Data Enabled and Linked Data Capable application, capable of integrating its own data with other Linked Data; requires an RDF validation process

When developing the proof of concept, we could differentiate three types of applications with respect to their exposure to Linked Data technologies. First we have Linked Data Enabled Applications, which are those that expose data following the Linked Data principles, where this data is sound and complete from the application perspective. On the other hand, Linked Data Capable Applications are those that can consume data published following the Linked Data principles; that is, they crawl the Web looking for additional resources according to their business needs. Finally, Linked Data Aware Applications are those Linked Data Enabled and Linked Data Capable applications capable of integrating their own data with other Linked Data. The main difficulty in the development of Linked Data Capable and Linked Data Aware applications is that of validating the data exchanged with other parties during the enactment of their business processes.

Designing the RDF validation process

Data source factors: behavioral aspects, structural aspects
Procedure factors: data aspects, temporal aspects
Context factors: operational aspects

The implementation of a validation process requires considering the different factors that impact the process in this scenario. First we have the data source factors: it is necessary to have a clear understanding of the characteristics of the data sources that will be used, as they will determine the expectations the application will have with regard to the content provided, as well as how it is provided. In addition, we have to think in advance about the procedure itself; thus, it is necessary to take into account aspects related to the data that will be validated, as well as the temporal side of the procedure. Finally, it is also necessary to understand the context in which the process is to be applied. When talking about applications, the context is the operation within which validation is required.

Data source factors (I)

Dynamics
Static data (e.g., periodic bug reports): one-time validation (e.g., validation caching)
Variable data (i.e., live data):
  Timely updated data (e.g., statistical bug reports): periodic validation (e.g., validation caching + validation triggering)
  Randomly updated data (e.g., a particular bug): per-operation validation
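The three kinds of dynamics above suggest three caching policies: validate static data once, revalidate timely updated data when its known update period expires, and revalidate randomly updated data on every operation. A minimal stdlib sketch of such a validation cache, assuming the consumer knows each resource's dynamics (the class and constant names are illustrative, not from the slides):

```python
import time

# Dynamics of a data source, as described in the slide above.
STATIC, PERIODIC, RANDOM = "static", "periodic", "random"

class ValidationCache:
    """Caches validation verdicts per resource URL, reusing them
    according to the dynamics of the data source."""

    def __init__(self, validate, clock=time.monotonic):
        self._validate = validate   # callable: resource URL -> bool verdict
        self._clock = clock
        self._results = {}          # url -> (verdict, timestamp)

    def is_valid(self, url, dynamics, period=None):
        now = self._clock()
        cached = self._results.get(url)
        if dynamics == STATIC and cached is not None:
            return cached[0]        # one-time validation: reuse forever
        if dynamics == PERIODIC and cached is not None and now - cached[1] < period:
            return cached[0]        # still within the known update period
        verdict = self._validate(url)   # RANDOM data always reaches this point
        self._results[url] = (verdict, now)
        return verdict
```

A validation-triggering mechanism, as mentioned on the slide, could additionally refresh cached entries proactively when the update period elapses, instead of lazily on access as in this sketch.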
The first data source factor is the dynamics of the data source. For instance, the data source may be serving static data, that is, data that once published is never changed, such as a periodic bug report. When consuming this data the application only needs to validate it once, so we could have some kind of validation cache that saves us from repeating the process. But the data source can also serve variable data, i.e., live data. This data can be timely updated (e.g., statistical bug reports); in this case, the data has to be validated periodically. Knowing the period, we could have some kind of validation-triggering mechanism to update our validation cache and keep our data consistent. Finally, we have randomly updated data, e.g., a particular bug that is updated by developers as required. In this scenario the validation has to be carried out on each operation.

Data source factors (II)

Publication strategy
Disallow inline resource definition
Allow inline resource definition: resource aggregation pattern, resource composition pattern

@prefix dc: .
@prefix foaf: .
@prefix ai: .
@prefix products: .
@prefix comp: .
@prefix roles: .
@prefix wg: .
@prefix persons: .

GET http://www.example.org/oms/ldp/products/prod3231

Disallow inline resource definition:

products:prod3231 a ai:Product;
    dc:title "SEALS Platform"@en;
    ai:isInvolvedIn roles:role1002;
    ai:hasWorkingGroup wg:wg44.

Resource aggregation pattern:

products:prod3231 a ai:Product;
    dc:title "SEALS Platform"@en;
    ai:isInvolvedIn roles:role1002;
    ai:hasWorkingGroup wg:wg44.

wg:wg44 a foaf:Group;
    ai:belongsToWorkArea ai:workAreaDevelopment;
    ai:member person:pr82;
    ai:worksInProduct products:prod3231.

roles:role1002 a ai:ProductRole;
    ai:involves ai:roleMaintainer;
    ai:involves products:prod3231;
    ai:involves person:pr82.

Resource composition pattern:

products:prod3231 a ai:Product;
    dc:title "SEALS Platform"@en;
    ai:isInvolvedIn comp:mpr;
    ai:hasWorkingGroup comp:wg.

comp:wg a foaf:Group;
    ai:belongsToWorkArea ai:workAreaDevelopment;
    ai:member person:pr82;
    ai:worksInProduct products:prod3231.

comp:mpr a ai:ProductRole;
    ai:involves ai:roleMaintainer;
    ai:involves products:prod3231;
    ai:involves person:pr82.

Another factor is related to the publication strategy used by the data source for exposing its data. The data source may disallow inlining resource definitions; that is, the contents available via a given URL only refer to the resource identified by that URL and no others (in other words, all the triples have that URL as their subject, or describe blank nodes related to the subject resource). However, the data source may also allow inlining resource definitions, supporting in this way resource aggregation and composition patterns. On the one hand, in the case of the resource aggregation pattern, the contents also include full or partial definitions of other individuals identified with their own URLs, which may save the application some round trips. On the other hand, in the case of the resource composition pattern, the contents also include the full definition of the composite resources that are part of the main resource, where each of them is identified with a