1
Kumar Madurai
October 21, 2013
Knowledge Engineering Using Linked Data in an Enterprise
2
Knowledge Engineering – Problem Context
• 15 different product definitions
• 17 different application systems creating design, regulatory, and production data
• Hundreds of change requests monthly
• Tribal knowledge about product design and process design not captured anywhere
3
Opportunities for Improvement
• How do we reduce the complexity?
• How do we promote consistency?
• How do we foster collaborative sharing of data and knowledge?
• How do we get an integrated view of process and product data across the product life cycle?
4
Knowledge Engineering Semantic Data Framework
[Diagram: layers of the framework, from sources to consumers]
• Original Data Sources: Database1 through Database4, grouped into Validated Systems and Non-Validated Systems
• Staged Data in Oracle: Database1, Database2, Database3, Database4
• Mapping Rules using D2RQ
• Ontology Semantic Layer in Oracle: Model1 through Model4, each holding RDF Triples
• Linked Concepts and Properties in Domain Views
• Applications Consuming Domain Views
5
Relational Data Source - Simple Example
Equipment Database
Equipment_Id  Inspector_Id  Inspection_Date
E02363        103546        01/10/2013
Employee Database
Employee_Id   Employee_Name  Network_Id
103546        Joe Mathis     jmathis
Document Database
Document_Id   Author_Id  Creation_Date
D8946         jmathis    06/15/2012
Query: Give me the document(s) authored by the inspector of equipment E02363
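To make the difficulty of this query concrete, here is a minimal Python sketch that holds the three tables as in-memory rows and walks the joins by hand. The in-memory lists and the function name `documents_by_inspector` are assumptions for illustration, and the employee row is assumed to carry the same ID as the inspector (103546), as on the later slides.

```python
# Rows copied from the three example tables; lists stand in for the databases.
equipment = [{"Equipment_Id": "E02363", "Inspector_Id": "103546",
              "Inspection_Date": "01/10/2013"}]
employees = [{"Employee_Id": "103546", "Employee_Name": "Joe Mathis",
              "Network_Id": "jmathis"}]
documents = [{"Document_Id": "D8946", "Author_Id": "jmathis",
              "Creation_Date": "06/15/2012"}]


def documents_by_inspector(equipment_id):
    """Return IDs of documents authored by whoever inspected the equipment."""
    inspector_ids = {e["Inspector_Id"] for e in equipment
                     if e["Equipment_Id"] == equipment_id}
    # Equipment refers to people by employee ID, documents by network ID,
    # so the employee table is needed to translate between the two keys.
    network_ids = {p["Network_Id"] for p in employees
                   if p["Employee_Id"] in inspector_ids}
    return sorted(d["Document_Id"] for d in documents
                  if d["Author_Id"] in network_ids)


print(documents_by_inspector("E02363"))
```

The point of the sketch is the middle step: the person is identified by two different keys in two systems, which is the gap the semantic layer is meant to close.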
6
Ontology Model for Relational Example
Classes: m:Equipment, m:Document, m:Employee, m:Person
m:equipmentID  rdfs:domain m:Equipment  rdfs:range xsd:string
m:inspectedBy  rdfs:domain m:Equipment  rdfs:range m:Person
m:documentID   rdfs:domain m:Document   rdfs:range xsd:string
m:authoredBy   rdfs:domain m:Document   rdfs:range m:Person
m:employeeID   rdfs:domain m:Employee   rdfs:range xsd:string
m:networkID    rdfs:domain m:Employee   rdfs:range xsd:string
m:Person rdfs:subClassOf m:Employee
7
Semantic Data for Relational Example
:Equipment_E02363  rdf:type         m:Equipment
:Equipment_E02363  :equipmentID     "E02363"
:Equipment_E02363  :inspectedBy     :Person_103546
:Equipment_E02363  :inspectionDate  "01/10/2013"
:Document_D8946    rdf:type         m:Document
:Document_D8946    :documentID      "D8946"
:Document_D8946    :authoredBy      :Person_jmathis
:Document_D8946    :creationDate    "06/15/2012"
:Person_103546     rdf:type         m:Employee
:Person_103546     :employeeID      "103546"
:Person_103546     :employeeName    "Joe Mathis"
:Person_103546     :networkID       "jmathis"
Inference Rule Example
CONSTRUCT { ?emp2 rdf:type m:Employee .
            ?emp2 owl:sameAs ?emp1 }
WHERE { ?emp1 rdf:type m:Employee .
        ?emp1 m:networkID ?netID .
        BIND (URI(CONCAT("http://KE/Data/SEM#", "Person_", ?netID)) AS ?emp2) }
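Graph after applying the rule:
:Person_103546   rdf:type       m:Employee
:Person_103546   :networkID     "jmathis"
:Person_103546   :employeeName  "Joe Mathis"
:Person_jmathis  rdf:type       m:Employee
:Person_jmathis  owl:sameAs     :Person_103546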
With OWL Inferencing:
SELECT ?name
WHERE { :Document_D8946 :authoredBy ?author .
        ?author :employeeName ?name }
Results in:
name
Joe Mathis
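The effect of the rule can be imitated in a few lines of plain Python. This is only a sketch under simplifying assumptions: triples are tuples of strings, all properties are written with a single m: prefix, and owl:sameAs is followed in one direction only, which is much weaker than real OWL inferencing. The function names `apply_rule` and `author_names` are hypothetical.

```python
BASE = "http://KE/Data/SEM#"  # namespace taken from the CONSTRUCT rule

# Triples as (subject, predicate, object) tuples of plain strings.
triples = {
    (BASE + "Person_103546", "rdf:type", "m:Employee"),
    (BASE + "Person_103546", "m:networkID", "jmathis"),
    (BASE + "Person_103546", "m:employeeName", "Joe Mathis"),
    (BASE + "Document_D8946", "m:authoredBy", BASE + "Person_jmathis"),
}


def apply_rule(graph):
    """Mimic the rule: mint a network-ID URI per employee and link it back."""
    employees = {s for s, p, o in graph
                 if p == "rdf:type" and o == "m:Employee"}
    inferred = set()
    for s, p, o in graph:
        if p == "m:networkID" and s in employees:
            alias = BASE + "Person_" + o
            inferred.add((alias, "rdf:type", "m:Employee"))
            inferred.add((alias, "owl:sameAs", s))
    return inferred


def author_names(graph, document):
    """Follow authoredBy, then one owl:sameAs hop, to reach employeeName."""
    authors = {o for s, p, o in graph
               if s == document and p == "m:authoredBy"}
    same = authors | {o for s, p, o in graph
                      if p == "owl:sameAs" and s in authors}
    return sorted(o for s, p, o in graph
                  if p == "m:employeeName" and s in same)


graph = triples | apply_rule(triples)
print(author_names(graph, BASE + "Document_D8946"))
```

Without the inferred triples the lookup finds no name, because the document points at the network-ID URI while the name hangs off the employee-ID URI.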
8
A Manufacturing Example
Manufacturing Database
Product Table
Prod_Id  Prod_Name  Mfg_Line
P1       Splash     Line #1
P2       Trident    Line #4
BOM Table
Prod_Id  Matl_Used       Qty_Used
P1       Pink Colorant   20
P1       Silver Wrapper  1
P1       Melon Flavor    15
P2       Red Colorant    13
P2       Silver Wrapper  1
P2       Cherry Flavor   10
Purchasing Database
Raw Material Table
Matl_Name       Supplier_Id
Colorant-Pink   S1
Colorant-Red    S1
Flavor-Melon    S2
Flavor-Cherry   S3
Wrapper-Silver  S4
Supplier Table
Supplier_Id  Supplier_Name
S1           Acme
S2           Foods-R-Us
S3           Yummy
S4           Lotus Inc.
Query: Give me the suppliers of raw materials for products made on Line #4
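A short Python sketch shows why this query cannot be answered by joining the two databases directly. The rows are copied from the tables; the in-memory lists and the function names are assumptions for illustration. The naive join compares material names as exact strings, and the two systems spell them differently ("Pink Colorant" versus "Colorant-Pink").

```python
# Manufacturing database
products = [("P1", "Splash", "Line #1"), ("P2", "Trident", "Line #4")]
bom = [("P1", "Pink Colorant", 20), ("P1", "Silver Wrapper", 1),
       ("P1", "Melon Flavor", 15), ("P2", "Red Colorant", 13),
       ("P2", "Silver Wrapper", 1), ("P2", "Cherry Flavor", 10)]

# Purchasing database
raw_materials = [("Colorant-Pink", "S1"), ("Colorant-Red", "S1"),
                 ("Flavor-Melon", "S2"), ("Flavor-Cherry", "S3"),
                 ("Wrapper-Silver", "S4")]


def materials_on_line(line):
    """Material names, as spelled in the BOM table, for products on a line."""
    prod_ids = {pid for pid, _name, mfg_line in products if mfg_line == line}
    return sorted({matl for pid, matl, _qty in bom if pid in prod_ids})


def naive_supplier_ids(line):
    """Exact-string join on material name across the two databases."""
    wanted = set(materials_on_line(line))
    return sorted({sid for matl, sid in raw_materials if matl in wanted})


print(materials_on_line("Line #4"))
print(naive_supplier_ids("Line #4"))
```

The second call comes back empty even though every material has a supplier, because no BOM name matches a purchasing name character for character.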
9
Ontology Model for Manufacturing Database
Classes: m:Product, m:ManufacturingLine, m:BillOfMaterial, m:RawMaterial
m:productID             rdfs:domain m:Product         rdfs:range xsd:string
m:productName           rdfs:domain m:Product         rdfs:range xsd:string
m:hasManufacturingLine  rdfs:domain m:Product         rdfs:range m:ManufacturingLine
m:hasProduct            rdfs:domain m:BillOfMaterial  rdfs:range m:Product
m:hasRawMaterial        rdfs:domain m:BillOfMaterial  rdfs:range m:RawMaterial
m:qtyUsed               rdfs:domain m:BillOfMaterial  rdfs:range xsd:float
m:unitOfMeasure         rdfs:domain (not recoverable from the slide)  rdfs:range xsd:string
10
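Ontology Model for Purchasing Database
Classes: p:RawMaterial, p:MaterialType, p:Supplier
p:materialName     rdfs:domain p:RawMaterial  rdfs:range xsd:string
p:hasMaterialType  rdfs:domain p:RawMaterial  rdfs:range p:MaterialType
p:hasSupplier      rdfs:domain p:RawMaterial  rdfs:range p:Supplier
p:supplierID       rdfs:domain p:Supplier     rdfs:range xsd:string
p:supplierName     rdfs:domain p:Supplier     rdfs:range xsd:string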
11
RDF Triples from D2RQ Mapping - Manufacturing Database
m:Product_P1 rdf:type m:Product
m:Product_P1 m:productID "P1"
m:Product_P1 m:productName "Splash"
m:Product_P1 m:hasManufacturingLine m:Line_1
m:Product_P2 rdf:type m:Product
m:Product_P2 m:productID "P2"
m:Product_P2 m:productName "Trident"
m:Product_P2 m:hasManufacturingLine m:Line_4
m:Line_1 rdf:type m:ManufacturingLine
m:Line_1 rdfs:label "Line #1"
m:Line_4 rdf:type m:ManufacturingLine
m:Line_4 rdfs:label "Line #4"
m:Bom_P1_Pink_Colorant rdf:type m:BillOfMaterial
m:Bom_P1_Pink_Colorant m:hasProduct m:Product_P1
m:Bom_P1_Pink_Colorant m:hasRawMaterial m:Material_Pink_Colorant
m:Bom_P1_Pink_Colorant m:qtyUsed 20
m:Bom_P1_Silver_Wrapper rdf:type m:BillOfMaterial
m:Bom_P1_Silver_Wrapper m:hasProduct m:Product_P1
m:Bom_P1_Silver_Wrapper m:hasRawMaterial m:Material_Silver_Wrapper
m:Bom_P1_Silver_Wrapper m:qtyUsed 1
………………………..
Question: How do we update the unitOfMeasure property?
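The product triples on this slide follow a regular pattern: one subject URI per row, built from the primary key, plus one triple per column. The Python sketch below imitates that pattern for the Product Table. It is not D2RQ and does not use its mapping language; the function name `product_triples` is hypothetical, and the property names follow the spelling on the ontology slide (m:productID, m:productName).

```python
def product_triples(row):
    """Turn one Product Table row into triples shaped like those on the slide."""
    subject = "m:Product_" + row["Prod_Id"]
    # "Line #4" becomes the resource m:Line_4, matching the slide's naming.
    line = "m:Line_" + row["Mfg_Line"].replace("Line #", "")
    return [
        (subject, "rdf:type", "m:Product"),
        (subject, "m:productID", row["Prod_Id"]),
        (subject, "m:productName", row["Prod_Name"]),
        (subject, "m:hasManufacturingLine", line),
    ]


for triple in product_triples(
        {"Prod_Id": "P2", "Prod_Name": "Trident", "Mfg_Line": "Line #4"}):
    print(triple)
```

A real mapping would declare the same URI pattern and column-to-property bridges once, in the mapping file, rather than in procedural code.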
12
RDF Triples from D2RQ Mapping - Purchasing Database
p:Material_Colorant_Pink rdf:type p:RawMaterial
p:Material_Colorant_Pink p:materialName "Colorant-Pink"
p:Material_Colorant_Pink p:hasMaterialType p:MaterialType_Colorant
p:Material_Colorant_Pink p:hasSupplier p:Supplier_S1
p:Material_Wrapper_Silver rdf:type p:RawMaterial
p:Material_Wrapper_Silver p:materialName "Wrapper-Silver"
p:Material_Wrapper_Silver p:hasMaterialType p:MaterialType_Wrapper
p:Material_Wrapper_Silver p:hasSupplier p:Supplier_S4
…………………………….
p:Supplier_S1 rdf:type p:Supplier
p:Supplier_S1 p:supplierID "S1"
p:Supplier_S1 p:supplierName "Acme"
p:Supplier_S4 rdf:type p:Supplier
p:Supplier_S4 p:supplierID "S4"
p:Supplier_S4 p:supplierName "Lotus Inc."
……………………………
p:MaterialType_Colorant rdf:type p:MaterialType
p:MaterialType_Colorant rdfs:label "Colorant"
p:MaterialType_Wrapper rdf:type p:MaterialType
p:MaterialType_Wrapper rdfs:label "Wrapper"
…………………………….
Question: How do we link the materials from Purchasing to Manufacturing?
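The slide leaves this question open. One possible approach, offered here only as a sketch and not as the presenter's method, is to generate owl:sameAs links between the two vocabularies by comparing the words in each material's local name regardless of order. The helper names `name_key` and `same_as_links` are hypothetical, and name matching of this kind is fragile; it works here only because both systems use the same words.

```python
def name_key(uri):
    """Order-insensitive key from a local name such as Material_Pink_Colorant."""
    local = uri.split(":", 1)[1]      # drop the m: or p: prefix
    words = local.split("_")[1:]      # drop the leading "Material"
    return frozenset(w.lower() for w in words)


def same_as_links(mfg_uris, pur_uris):
    """Pair each manufacturing material with the purchasing one sharing its key."""
    by_key = {name_key(u): u for u in pur_uris}
    return sorted((m, "owl:sameAs", by_key[name_key(m)])
                  for m in mfg_uris if name_key(m) in by_key)


# URIs as they appear in the manufacturing and purchasing triples.
mfg = ["m:Material_Pink_Colorant", "m:Material_Silver_Wrapper"]
pur = ["p:Material_Colorant_Pink", "p:Material_Wrapper_Silver"]

for triple in same_as_links(mfg, pur):
    print(triple)
```

Once such links are loaded, a query can start from a product's bill of material in the m: model and reach p:hasSupplier in the purchasing model through the linked instance.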
14
Data Traceability is Critical
• Ability to link any item used in an application to the exact data source all the way downstream – use of ‘hasDataSource’ property for every instance created
• Linking of data occurs at two levels: across the product genealogy (horizontal and business driven) and across the system layers (vertical and technology driven)
• Semantic relationships between concepts should be defined properly and maintained to reflect changes in underlying source systems
• Important to keep non-validated data in their own models (semantic graphs) especially in a regulated environment
• Specific verification / validation steps to be performed when new applications are brought on board using the semantic layer
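The first point, a 'hasDataSource' property on every instance created, can be sketched as a small load-time step plus a check. This is an illustration under assumptions: the m: prefix on hasDataSource, the source identifier "src:EquipmentDatabase", and the function names `with_provenance` and `untraceable` are all hypothetical.

```python
def with_provenance(triples, source):
    """Add one hasDataSource triple per distinct subject in the batch."""
    subjects = sorted({s for s, _p, _o in triples})
    return list(triples) + [(s, "m:hasDataSource", source) for s in subjects]


def untraceable(triples):
    """Subjects lacking a hasDataSource triple; should be empty after loading."""
    subjects = {s for s, _p, _o in triples}
    traced = {s for s, p, _o in triples if p == "m:hasDataSource"}
    return sorted(subjects - traced)


batch = [
    ("m:Equipment_E02363", "rdf:type", "m:Equipment"),
    ("m:Equipment_E02363", "m:equipmentID", "E02363"),
]
loaded = with_provenance(batch, "src:EquipmentDatabase")
print(untraceable(batch))   # subjects missing provenance before the step
print(untraceable(loaded))  # subjects missing provenance after the step
```

A check like `untraceable` could serve as one of the verification steps run when a new application is brought onto the semantic layer.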
15
Conclusion / Takeaways
• Ontological modeling of enterprise data stored in conventional databases is the first and crucial step
• Augmenting the model with rules adds more power to the inferencing capabilities of the model
• Annotation properties (rdfs:label, rdfs:comment, rdfs:seeAlso, etc.) can also be used to add semantic meaning to the data
• D2RQ provides a flexible mapping language and a set of tools to enable the conversion of relational data to RDF triples
• Judicious use of owl:sameAs helps in linking instances that are the same but from different sources
• Critical to ensure data traceability (also called data provenance), which has to be planned for in the model and when data is loaded into the database