Intersection Schemas as a Dataspace Integration Technique, 8/21/2014, Richard Brownlow, Alex...

Post on 31-Mar-2015


Intersection Schemas as a Dataspace Integration Technique

04/11/2023

Richard Brownlow, Alex Poulovassilis


Contribution

• A new methodology for lightweight data integration in an incremental pay-as-you-go environment, based on the concept of “Intersection Schemas” and utilising bidirectional transformations at the schema level.

• Improve on existing workflows for data integration, to increase the productivity of the incremental Data Integration process.

• Development of a demonstrator and user interface to aid the data integrator


Intersection Schemas

• Implements a framework for incremental data integration, as a component within the existing AutoMed data integration framework.

• Introduces a new “pay-as-you-go” technique of Intersection Schemas. This allows the integrator to incrementally identify intersections between schemas, and integrate them into the Global Schema.
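As a rough illustration of what identifying an intersection between two schemas could look like, the Python sketch below models a schema as a named set of object names and builds an intersection schema from user-supplied correspondences. This is a toy model, not the AutoMed API: the `Schema` class, the `make_intersection` function, and all schema object names are hypothetical; only the source names Pedro and gpmDB come from the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    name: str
    objects: frozenset  # names of schema objects (hypothetical examples below)


def make_intersection(left, right, mappings, name):
    """Build an intersection schema from user-identified correspondences.

    `mappings` maps each new intersection object to the pair
    (object in `left`, object in `right`) that it unifies.
    """
    for target, (l_obj, r_obj) in mappings.items():
        if l_obj not in left.objects:
            raise KeyError(f"{l_obj!r} is not in schema {left.name}")
        if r_obj not in right.objects:
            raise KeyError(f"{r_obj!r} is not in schema {right.name}")
    # The intersection schema contains one object per correspondence.
    return Schema(name, frozenset(mappings))


# Hypothetical object names, chosen only for illustration.
pedro = Schema("Pedro", frozenset({"protein", "peptidehit", "experiment"}))
gpmdb = Schema("gpmDB", frozenset({"proseq", "peptide", "path"}))

mappings = {
    "Protein": ("protein", "proseq"),
    "PeptideHit": ("peptidehit", "peptide"),
}
inter = make_intersection(pedro, gpmdb, mappings, "I1")
print(sorted(inter.objects))
```

Objects that appear in no correspondence (here `experiment` and `path`) stay only in their source schemas, which is what lets the integrator defer them to a later iteration.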


AutoMed Architecture


Data Integration via Union-compatible Schemas


Intersection Schema


Integrated Intersection and Extensional Schemas


Global Schema derived from Intersection and Extensional Schemas


Case Study: iSpider

• Proteomics data from three different data sources
• Mappings defined by domain experts
• Mappings constitute the domain knowledge


Illustrative Use Case

Based on iSpider datasets

o Three data sources:
  • gpmDB
  • Pedro
  • Pepseeker


Illustrative Use Case: GUI


Workflow

1. Identify the extensional schemas representing the set of data sources that are to be integrated.
2. Initially a federated schema is created from the schemas identified in Step 1.
3. Inspect the schemas identified in Step 1 and select two of them from which to derive an intersection schema.
4. Identify mappings between these two schemas and create an intersection schema.
5. A new Global Schema is created automatically by our tool from the Intersection Schema and the extensional schemas. The user may optionally elect for any redundant objects in the new Global Schema to be dropped.
6. The user may test the Intersection Schema or Global Schema at this stage by running queries on it.
7. Repeat Steps 3 to 6 for each integration iteration.
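The iterative shape of this workflow can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the tool's implementation: schemas are plain sets of names, the functions `federate` and `integrate` are hypothetical, the object names are invented for illustration, and query testing (Step 6) is omitted.

```python
def federate(sources):
    """Step 2: the federated schema holds every source object, prefixed by source."""
    return {f"{src}.{obj}" for src, objs in sources.items() for obj in objs}


def integrate(global_schema, left, right, mappings, drop_redundant=False):
    """Steps 4-5: add the intersection objects to the global schema.

    `mappings` maps a new intersection object name to the pair
    (object in source `left`, object in source `right`) that it unifies.
    """
    new_global = set(global_schema) | set(mappings)
    if drop_redundant:
        # Optionally drop source objects now covered by an intersection object.
        for l_obj, r_obj in mappings.values():
            new_global -= {f"{left}.{l_obj}", f"{right}.{r_obj}"}
    return new_global


# Step 1: extensional schemas (object names are hypothetical).
sources = {
    "Pedro": {"protein", "peptidehit"},
    "gpmDB": {"proseq", "peptide"},
    "Pepseeker": {"proteinhit", "peptidehit"},
}

# One entry per integration iteration: the two schemas chosen in Step 3
# and the mappings identified in Step 4.
iterations = [
    ("Pedro", "gpmDB", {"Protein": ("protein", "proseq")}),
    ("gpmDB", "Pepseeker", {"Peptide": ("peptide", "peptidehit")}),
]

global_schema = federate(sources)
for left, right, mappings in iterations:  # Step 7: repeat Steps 3 to 6
    global_schema = integrate(global_schema, left, right, mappings,
                              drop_redundant=True)
    print(len(global_schema), sorted(global_schema))
```

In this model the federated schema is usable from the start, and each iteration only replaces the objects the integrator has chosen to map, which is the pay-as-you-go property the workflow aims for.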


Evaluation

• Comparison of the Intersection Schema methodology versus a “classical” ladder-based integration methodology:
  • For ladder-based integration: 95 manually defined transformations
  • For Intersection Schema based integration: 26 manually defined transformations
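To put the two counts side by side, the short calculation below derives the absolute and relative reduction. The variable names are illustrative; the two input figures are the ones reported above.

```python
ladder = 95        # manually defined transformations, ladder-based integration
intersection = 26  # manually defined transformations, Intersection Schema based

saved = ladder - intersection
reduction_pct = 100 * saved / ladder
ratio = ladder / intersection

print(f"{saved} fewer transformations "
      f"({reduction_pct:.1f}% reduction, ratio {ratio:.2f} to 1)")
```

That is 69 fewer manually defined transformations, a reduction of roughly 73%.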


Conclusions

We have demonstrated the technique on a real-world data integration scenario and have seen that the number of user-defined steps required to perform the integration is significantly reduced compared to the original data integration methodology used by the domain experts on that project.

We have shown how the AutoMed toolkit and bidirectional schema transformations can be used to underpin a new light-weight data integration technique within an incremental pay-as-you-go data integration process.


Future Work

• Extending the methodology so that intersections can be created between any number of source schemas at each iteration of the process, rather than just two as at present.

• Detailed user evaluations.


Any Questions?


Appendix

Example iSpider transformations from original project.
