![Page 1: Semi-automatic Generation of Data-Intensive APIs · Semi-automatic Generation of Data-Intensive APIs Shumet Tadesse Supervisors: Prof. Oscar Romero, Prof. Cristina Gomez and Prof](https://reader036.vdocuments.mx/reader036/viewer/2022081611/5f0946557e708231d4260ab8/html5/thumbnails/1.jpg)
Semi-automatic Generation of Data-Intensive APIs
Shumet Tadesse
Supervisors: Prof. Oscar Romero, Prof. Cristina Gomez and Prof. Katja Hose
eBISS 2019 - Berlin, 5th July 2019
Outline
▪ Context
• Data-intensive APIs
• Challenges
• Proposed solution
▪ First Step: ARDI
• Background
• Our approach
• ARDI Architecture
▪ Next Steps
▪ Publications
Context: Data-intensive APIs
• Social networks (such as Twitter and Facebook) rely on APIs to expose their internal data sources
• An API is a set of rules, protocols, and tools that enables interactions between applications
• At the same time, businesses build APIs for their customers or for internal use
[Figure labels: Data-intensive API; Backend Systems]
Context: Challenges
▪ However, building data-intensive APIs is time-consuming and burdensome
▪ Data-intensive APIs have traditionally been created manually
▪ Building them can be reduced to the data integration problem
• it requires dealing with highly heterogeneous data sources
Challenges (2)
[Figure labels: Various Data Sources; Manual Tasks (Generating canonical representation; Refactoring, Alignment; Merging, Mapping); Integration Schema]
▪ Data integration is a means to an end
• expressing each data source in terms of a canonical data model
• creating a single unified view of the sources, and
• mapping the data sources to the target schema
Proposed Solution
▪ There is a need for systems that automate, as much as possible, the cumbersome and time-consuming task of integrating heterogeneous data
Golshan, B., Halevy, A., Mihaila, G., Tan, W.C.: Data integration: After the teenage years. In: SIGMOD-SIGACT-SIGAI. pp. 101–106. ACM (2017)
[Figure labels: Manual Tasks (Generating canonical representation; Refactoring, Alignment; Merging, Mapping); Semi-automate]
ARDI: Automatic Generation of RDFS Models from Heterogeneous Data Sources for Data Integration
Background
▪ Source data typically come in schemaless formats such as XML or JSON
• For schemaless data formats there is typically no metadata available
▪ Semantic modeling languages have become a key technology for data standardization and conceptualization
▪ The Semantic Web community has overlooked the need to generate schema information from data sources automatically
Background (1)
▪ Approaches for moving data sources to the Semantic Web
• instance-level: generate a semantic representation of the data (instances)
• schema-level: translate schema information
▪ Schema-level approaches, however,
• do not guarantee that the schemas they produce comply with a meta-model,
• do not fully cover all schema elements that we may find in semi-structured data models (e.g., arrays in JSON), and
• ignore the RDFS meta-model
▪ We follow a meta-modeling approach
Our Approach: Why meta-modeling?
▪The capability of supporting different abstraction levels
▪Helps to maximize the extent to which data can be integrated by separately expressing schema information and the data itself
▪Ensures interoperability
▪ From a technical point of view:
• helps to minimize development time and
• maximize efficiency and productivity
Chang, D.T., Kendall, E.: Metamodels for RDF Schema and OWL. In: MDSW 2004, Monterey, USA (2004)
Our Approach: RDFS as a canonical data model
▪ Expressive
▪ Flexible
▪ Non-explicit knowledge can be inferred from explicitly asserted knowledge
▪ Allows meta-modelling
Our Approach: RDFS Metamodeling
[Figure: three-layer RDFS example]
M2: Meta-model layer: RDFSResource, RDFSClass, RDFProperty
M1: Model layer: SummerSchool, takesPlace, Country
M0: Data layer: eBISS2019, takesPlace, Germany
Edge labels: t = rdf:type, s = rdfs:subClassOf, d = rdfs:domain, r = rdfs:range
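The layering on this slide can be sketched with plain triples. The figure's exact edges did not survive extraction, so the statements below are an assumption based on the usual RDFS reading of the labels (SummerSchool and Country as classes, takesPlace as a property between them); `layer_of` and `conforms` are hypothetical helper names, not part of ARDI.

```python
# Minimal sketch: the three layers as (subject, predicate, object) triples.
# The edges are assumed from the slide's labels, not copied from the figure.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

# M2 elements used for typing in this sketch.
META_MODEL = {"rdfs:Class", "rdf:Property"}

triples = {
    # M1: model elements, typed by meta-model elements
    ("SummerSchool", RDF_TYPE, "rdfs:Class"),
    ("Country", RDF_TYPE, "rdfs:Class"),
    ("takesPlace", RDF_TYPE, "rdf:Property"),
    ("takesPlace", RDFS_DOMAIN, "SummerSchool"),
    ("takesPlace", RDFS_RANGE, "Country"),
    # M0: data, typed by model elements
    ("eBISS2019", RDF_TYPE, "SummerSchool"),
    ("Germany", RDF_TYPE, "Country"),
    ("eBISS2019", "takesPlace", "Germany"),
}


def types_of(node):
    """Return every object of an rdf:type statement about node."""
    return {o for s, p, o in triples if s == node and p == RDF_TYPE}


def layer_of(node):
    """Place a node in M2, M1 or M0 according to what it is typed by."""
    if node in META_MODEL:
        return "M2"
    types = types_of(node)
    if types & META_MODEL:
        return "M1"
    return "M0" if types else "unknown"


def conforms(subject, prop, obj):
    """Check a data statement against the domain and range declared for prop.

    A property with no declared domain or range is accepted vacuously.
    """
    domains = {o for s, p, o in triples if s == prop and p == RDFS_DOMAIN}
    ranges = {o for s, p, o in triples if s == prop and p == RDFS_RANGE}
    return domains <= types_of(subject) and ranges <= types_of(obj)


print(layer_of("SummerSchool"), layer_of("eBISS2019"))
print(conforms("eBISS2019", "takesPlace", "Germany"))
```

Keeping schema statements (M1) and data statements (M0) in the same triple set, distinguished only by what they are typed by, is what lets the same check run at either level.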
ARDI Workflow
• Extract representations of the sources conformant to the source meta-schema
• Translate to the target schema conformant to the target meta-schema
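A minimal sketch of the first step, extraction, for a JSON source. It assumes a simple source meta-schema with three kinds of element (primitive, object, array); the function name `describe`, the dictionary layout, and the sample station document are hypothetical and are not taken from ARDI. Only the attribute name `type` appears in the slides.

```python
import json

# Hypothetical station document used only for illustration.
STATION_JSON = """
{"type": "metro",
 "location": {"lat": 41.38, "lon": 2.17},
 "lines": ["L1", "L3"]}
"""


def describe(name, value):
    """Describe one JSON value in terms of an assumed source meta-schema."""
    if isinstance(value, dict):
        return {
            "name": name,
            "kind": "object",
            "attributes": [describe(k, v) for k, v in value.items()],
        }
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element is representative.
        element = describe(name, value[0]) if value else None
        return {"name": name, "kind": "array", "element": element}
    # bool is tested before numbers because bool is a subclass of int in Python.
    if isinstance(value, bool):
        primitive = "boolean"
    elif isinstance(value, (int, float)):
        primitive = "number"
    elif value is None:
        primitive = "null"
    else:
        primitive = "string"
    return {"name": name, "kind": "primitive", "type": primitive}


schema = describe("stations", json.loads(STATION_JSON))
for attribute in schema["attributes"]:
    print(attribute["name"], attribute["kind"])
```

The result is a description of the source's structure only; no instance values are carried over, which matches the schema-level focus of the workflow.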
Running Example
▪ Stations: attributes of primitive type, a reference to an object class, and an array
Extraction of Schema
Translation of Schema
R1: sc:stations a rdfs:Class .
R2: sc:stations/type a rdf:Property ; rdfs:domain sc:stations .
R3: sc:stations/type rdfs:range xsd:string .
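The statements on this slide can be reproduced with a short translation sketch. It assumes the extracted schema is available as a nested dictionary (a hypothetical shape), that a JSON string maps to `xsd:string` as on the slide, and that numbers and booleans map to `xsd:double` and `xsd:boolean` (an assumption). Which statement each rule label refers to is also assumed. Object references and arrays are skipped because the slide shows no rule for them.

```python
# Assumed mapping from JSON primitive types to XSD datatypes.
XSD_TYPES = {
    "string": "xsd:string",
    "number": "xsd:double",
    "boolean": "xsd:boolean",
}


def translate(schema, prefix="sc"):
    """Emit RDFS statements for an object and its primitive attributes."""
    cls = f"{prefix}:{schema['name']}"
    # R1 (assumed label): the object becomes a class.
    statements = [f"{cls} a rdfs:Class ."]
    for attribute in schema["attributes"]:
        if attribute["kind"] != "primitive":
            continue  # no rule shown on the slide for objects or arrays
        prop = f"{cls}/{attribute['name']}"
        # R2 (assumed label): the attribute becomes a property of that class.
        statements.append(f"{prop} a rdf:Property ; rdfs:domain {cls} .")
        # R3 (assumed label): the primitive type becomes the property's range.
        statements.append(f"{prop} rdfs:range {XSD_TYPES[attribute['type']]} .")
    return statements


# Hypothetical extracted schema for the running example.
stations = {
    "name": "stations",
    "kind": "object",
    "attributes": [{"name": "type", "kind": "primitive", "type": "string"}],
}

for line in translate(stations):
    print(line)
```

The identifiers follow the slide's notation (`sc:stations/type`); a real Turtle serialization would need full IRIs or escaped local names for the slash.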
Production Rules
▪ Define the translation from the schema of the source data to an equivalent RDFS representation
▪ Formalized in first-order logic
▪ Represented as logical axioms with a left-hand side (LHS) and a right-hand side (RHS)
• if the LHS holds, the RHS must hold too
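As an illustration of this LHS-implies-RHS shape, rules for a JSON object with primitive attributes might be written as below. The predicate names (Object, hasAttribute, hasPrimitiveType, and so on) are hypothetical and chosen for readability; they are not the predicates used in ARDI.

```latex
\begin{align*}
&\forall o\,\bigl(\mathit{Object}(o) \rightarrow \mathit{Class}(o)\bigr)\\
&\forall o, a\,\bigl(\mathit{Object}(o) \land \mathit{hasAttribute}(o, a)
    \rightarrow \mathit{Property}(a) \land \mathit{domain}(a, o)\bigr)\\
&\forall a, t\,\bigl(\mathit{Property}(a) \land \mathit{hasPrimitiveType}(a, t)
    \rightarrow \mathit{range}(a, t)\bigr)
\end{align*}
```

Each line reads left to right: whenever the source-side facts on the left are present in the extracted schema, the RDFS-side facts on the right must be asserted in the target schema.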
Prototype Instantiation
[Figure labels: stations.json; RDFSClass, RDFSProperty, RDFSDomain, RDFSRange, RDFSDatatype]
Next Steps
▪ Refactoring the automatically extracted source representations
• they resemble the physical structure of the underlying data sources
• a richer representation of domain concepts and relationships is required for later integration
▪ Alignment
▪ Integrating and querying the source representations
• Merging
• Mapping
[Figure labels: Generating RDFS models; Refactoring, Alignment; Merging, Mapping]
Publications
Submitted:
Shumet Tadesse, Cristina Gomez, Oscar Romero, Katja Hose: "ARDI: Automatic Generation of RDFS Models from Heterogeneous Data Sources", IEEE EDOC 2019
Planned:
Conference Paper II: Enhancing Data Integration by Refactoring Automatically Extracted Ontologies
•Authors: Shumet Tadesse, Cristina Gomez, Oscar Romero, Katja Hose
•Outlet: The International Conference on Extending Database Technology (EDBT), October 2019
Journal Paper: Automatically Generating data-intensive APIs
•Authors: Shumet Tadesse, Cristina Gomez, Oscar Romero, Katja Hose
•Outlet: Journal of Systems and Software (JSS), December 2019
Conference Paper III: Supporting the Automation of the Whole Data Integration Life-Cycle
•Authors: Shumet Tadesse, Cristina Gomez, Oscar Romero, Katja Hose
•Outlet: The International Semantic Web Conference (ISWC), April 2020
Demo Paper: Integrating Heterogeneous Data Sources for the Generation of data-intensive APIs
•Authors: Shumet Tadesse Nigatu, Cristina Gomez, Oscar Romero, Katja Hose
•Outlet: Conference on Advanced Information Systems Engineering (CAiSE), November 2020
Thank You!