Data Model vs. Ontology Dr. Tatiana Malyuta Associate Professor, CUNY Consultant for DoD Dr. Barry Smith UB, NCOR

Data Model – Purpose
• To provide a consistent and efficiently functioning data store for one or more particular business applications
  – Represents specific business concepts in a way that determines the organization of data in the store
  – Commonly used representations are relational and graph; they are supported by data management technologies, e.g. relational: Oracle and MySQL; graph: Neo4j and RDF/OWL stores
• Efficiency requires
  – Application-specific representations
  – Storing only the data needed by the application
• An objective (shared) representation of the domain is not the purpose: there are multiple data models for the same domain to accommodate different business applications
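To make "multiple data models for the same domain" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two applications, their table layouts and column names are hypothetical, chosen only to echo the Person/Skill example used later in the slides. Both schemas describe persons and skills, yet each is shaped by its own application and neither is shared with the other.

```python
import sqlite3

# Hypothetical HR application: one row per person, one skill per person.
HR_SCHEMA = """
CREATE TABLE Person (
    PersonID INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL,
    Skill    TEXT
);
"""

# Hypothetical staffing application: split names, and a separate table
# so that a person can have several skills.
STAFFING_SCHEMA = """
CREATE TABLE Person (
    PersonID  INTEGER PRIMARY KEY,
    FirstName TEXT NOT NULL,
    LastName  TEXT NOT NULL
);
CREATE TABLE PersonSkill (
    PersonID INTEGER,
    Skill    TEXT NOT NULL
);
"""

def build_store(schema: str) -> sqlite3.Connection:
    """Create an in-memory store organized according to one data model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    return conn

def columns(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

hr = build_store(HR_SCHEMA)
staffing = build_store(STAFFING_SCHEMA)
print(columns(hr, "Person"))        # ['PersonID', 'Name', 'Skill']
print(columns(staffing, "Person"))  # ['PersonID', 'FirstName', 'LastName']
```

Each schema is efficient for its own access patterns, which is exactly why the two cannot simply be merged later.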

Data Silos
• Numerous partial, idiosyncratic representations of the domain in data models, and numerous versions of data in data stores
• No reusability
• No single version of truth

[Figure: separate data stores for Accounts Receivable, Accounts Payable and Budget]

Ontology – Purpose
• Objectivity of the representation of reality
• The commonly used representation is a graph, supported by RDF-based semantic technologies
• An objective (shared) representation of the domain: one authoritative ontology for a domain of reality, meant for reuse
• Storing vast volumes of data is not the purpose
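The graph representation can be sketched without any RDF library: every statement is a (subject, predicate, object) triple, the statement shape RDF uses. This is a minimal illustration only; the term names and the "subClassOf" and "label" predicates are stand-ins for what a real RDF store would express with IRIs such as rdfs:subClassOf. The graph holds statements about types, not bulk data, which matches the last bullet above.

```python
# A hypothetical fragment of a skill ontology as (subject, predicate, object)
# triples. Plain strings stand in for the IRIs a real RDF store would use.
GRAPH = {
    ("ComputerSkill", "subClassOf", "Skill"),
    ("ProgrammingSkill", "subClassOf", "ComputerSkill"),
    ("NetworkSkill", "subClassOf", "ComputerSkill"),
    ("JavaSkill", "subClassOf", "ProgrammingSkill"),
    ("JavaSkill", "label", "Java skill"),
}

def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Direct subtypes of ComputerSkill.
print([t[0] for t in match(GRAPH, p="subClassOf", o="ComputerSkill")])
# ['NetworkSkill', 'ProgrammingSkill']

# Everything stated about JavaSkill.
print(match(GRAPH, s="JavaSkill"))
# [('JavaSkill', 'label', 'Java skill'), ('JavaSkill', 'subClassOf', 'ProgrammingSkill')]
```

New statements are added as further triples, with no table layout to change.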

Financial Ontology
• A single domain ontology (or a collection of ontologies)
• To be reused in different applications
• A single version of truth (as we know it today)

Note: we discuss ontologies built in accordance with the methodology and architecture pioneered by Dr. Smith.

Comparison
• Although some technologies support a particular paradigm best, technologies are not the defining factor in distinguishing between a data model and an ontology
• We compare paradigms, not technologies

[Figure: an ontology and a data model for the same domain. The ontology side shows types such as Skill, Computer Skill, Network Skill, Programming Skill, Java Skill, Person Name, First Name, Middle Name, Last Name and Nick Name; the data model side shows tables such as Person and PersonSkill with columns such as Name, First Name, Last Name and Skill.]

Data Model – Types
• Types are general or repeatable entities capable of being instantiated by indefinitely many particulars
• Data model types and instances are abstractions embodying efficient ways of describing the data about reality that an application needs (efficient both for reasoning and for storage)
  – Different abstractions depending on the business need

The data model term ‘person’ is used to define an efficient storage solution for the data about persons needed by a particular application.

Ontology – Types
• Ontology types and instances are on the side of reality
• An ontology must provide one term, and one definition, for each salient type of entity in each domain of interest

The ontology term ‘person’, when used to represent data about persons, is designed to establish a link between those data and persons in reality.

Data Model – Organization
• An arbitrary combination of selected types suited for efficient data processing
• The data model view of reality is flat and rigid

One of the models needs to be changed to accommodate multiple skills per person. Such changes require significant effort, because data representation languages are relatively rigid and the physical data store must be rearranged.
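The effort involved can be seen in a small SQLite migration, sketched here with Python's sqlite3 module. The table and column names are hypothetical: a Person table that allows one skill per person is split so that a person can hold several skills. Even in this toy case the change means creating a new table, copying data, and rebuilding the original table; in a production store every query and application built on the old layout would also have to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical original data model: exactly one Skill value per person.
conn.executescript("""
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, Name TEXT, Skill TEXT);
INSERT INTO Person VALUES (1, 'John', 'Java'), (2, 'Mary', 'C++');
""")

# Allowing several skills per person means re-arranging the physical store:
# move skills into a new table, then rebuild Person without the Skill column.
conn.executescript("""
CREATE TABLE PersonSkill (PersonID INTEGER, Skill TEXT);
INSERT INTO PersonSkill SELECT PersonID, Skill FROM Person WHERE Skill IS NOT NULL;
CREATE TABLE Person_new (PersonID INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Person_new SELECT PersonID, Name FROM Person;
DROP TABLE Person;
ALTER TABLE Person_new RENAME TO Person;
""")

# Only after the migration can John be given a second skill.
conn.execute("INSERT INTO PersonSkill VALUES (1, 'SQL')")

rows = conn.execute(
    "SELECT p.Name, s.Skill FROM Person AS p JOIN PersonSkill AS s "
    "ON p.PersonID = s.PersonID ORDER BY p.Name, s.Skill"
).fetchall()
print(rows)  # [('John', 'Java'), ('John', 'SQL'), ('Mary', 'C++')]
```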

Ontology – Organization
• Each type appears only once in the ontology hierarchy
• The ontology view of reality is synoptic: it represents, in non-redundant fashion, an entire hierarchy of types at different levels of generality. Each term is associated in an intelligible way with its subsuming and subsumed terms (and thus with its ancestor and descendant types) in the hierarchy of more and less general types
• The representation is more flexible: changes are easier to make and less disruptive
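A minimal sketch of these organizational properties, assuming a simple single-parent hierarchy stored as a Python dict (a simplification; the type names are hypothetical). Because dict keys are unique, each type appears exactly once; ancestors and descendants are computed from the hierarchy rather than stored; and adding a new type is a single local change that leaves every existing type and link untouched, in contrast to the migration a data model needs.

```python
# Hypothetical skill hierarchy: each type appears exactly once, as a key
# mapped to its single parent (None for the root).
PARENT = {
    "Skill": None,
    "ComputerSkill": "Skill",
    "NetworkSkill": "ComputerSkill",
    "ProgrammingSkill": "ComputerSkill",
    "JavaSkill": "ProgrammingSkill",
}

def ancestors(t):
    """More general types that subsume t, nearest first."""
    out = []
    while PARENT[t] is not None:
        t = PARENT[t]
        out.append(t)
    return out

def descendants(t):
    """All types subsumed by t, at every level below it."""
    result = []
    for child in [c for c, p in PARENT.items() if p == t]:
        result.append(child)
        result.extend(descendants(child))
    return result

def add_type(name, parent):
    """Extend the hierarchy; existing types and their links are untouched."""
    if name in PARENT:
        raise ValueError(f"{name} already exists")  # one type, one place
    if parent not in PARENT:
        raise KeyError(parent)
    PARENT[name] = parent

add_type("PythonSkill", "ProgrammingSkill")
print(ancestors("PythonSkill"))  # ['ProgrammingSkill', 'ComputerSkill', 'Skill']
print(sorted(descendants("ComputerSkill")))
# ['JavaSkill', 'NetworkSkill', 'ProgrammingSkill', 'PythonSkill']
```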

Questions?

Data Model vs. Ontology – Types and Individuals

Person
Name | Skill
John | Computer Skill
Mary | Sewing Skill

Ontology hierarchy: Skill > Computer Skill > Programming Skill > {Java, C++}

Person
Name | Skill
John | Java
Mary | C++

Data Model – Labels
• Are less important, because databases are not directly exposed to users: they are presented via an application that exposes the database content using the specific vocabulary of a narrow community of users
• Can be anything, e.g. ‘PN’, ‘PName’, ‘PersName’, ‘PersonN’, etc. for the person name
• The meaning of a label is often derived from context (e.g. Name for both the name of the Person and the name of the Skill in one of the examples)
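The contrast between context-dependent data model labels and unambiguous ontology labels can be sketched as a lookup keyed by (table, column). All table and column names below are hypothetical, echoing the label variants listed above. The bare label "Name" maps to two different ontology terms depending on its table, while three differently spelled columns all map to the same term, PersonName.

```python
# Hypothetical physical columns from several data models.
COLUMNS = [
    ("Person", "Name"),
    ("Skill", "Name"),
    ("Staff", "PersName"),
    ("Emp", "PN"),
]

# Hypothetical mapping of each (table, column) pair to one
# context-independent ontology label.
EXPLICATION = {
    ("Person", "Name"): "PersonName",
    ("Skill", "Name"): "SkillName",
    ("Staff", "PersName"): "PersonName",
    ("Emp", "PN"): "PersonName",
}

def ontology_label(table, column):
    """The ontology label needs no context once the mapping is made."""
    return EXPLICATION[(table, column)]

person_name_columns = [c for c in COLUMNS if ontology_label(*c) == "PersonName"]
print(person_name_columns)  # [('Person', 'Name'), ('Staff', 'PersName'), ('Emp', 'PN')]
```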

Ontology – Labels
• Are exposed to users
• Are nouns and noun phrases from natural language, and each type has a unique name that designates it unambiguously regardless of the context in which it is used, e.g. PersonName, SkillName

Closed and Open World Assumptions (impact of technologies)

• Database reasoning is confined to search based on the closed world assumption. If we do not find something in the database, it is taken not to exist in the world defined by the database.

• Ontologies are based on the idea that we can never describe entities in the real world completely. This means that, from the absence of a particular term ‘A’ in an ontology, we cannot infer that As do not exist. It also means that ontologies are constructed in a way that allows easy addition of new types and relations.
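The difference between the two assumptions shows up in what a query may answer when a fact is missing. In this minimal sketch (the facts and function names are hypothetical), the closed-world query turns absence into "false", while the open-world query answers "unknown" unless falsity has been stated explicitly.

```python
from enum import Enum

class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"

# Hypothetical recorded facts: (person, skill) pairs.
FACTS = {("John", "Java"), ("Mary", "C++")}

def closed_world(person, skill):
    """Database-style: anything not recorded is taken to be false."""
    return Answer.TRUE if (person, skill) in FACTS else Answer.FALSE

def open_world(person, skill, known_false=frozenset()):
    """Absence of a fact licenses no conclusion unless falsity is stated."""
    if (person, skill) in FACTS:
        return Answer.TRUE
    if (person, skill) in known_false:
        return Answer.FALSE
    return Answer.UNKNOWN

print(closed_world("John", "SQL"))  # Answer.FALSE
print(open_world("John", "SQL"))    # Answer.UNKNOWN
```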

Life Span

• Data models are created in ad hoc ways to capture a targeted selection of features; a data model is usually not reused, which results in numerous data silos for a domain

• Ontologies will grow and expand as new knowledge is gained over time

Summary of Comparison

Dimension of comparison | Traditional data model | Ontologies
Closeness to reality | Variable, application-specific | Reality is always the prime focus
Conceptualization of the domain | Plain and partial (always at the level of detail needed for a particular implementation) | Hierarchical, simultaneously describing the same domain at different levels of detail
Vocabulary | Application-specific, not intended for sharing | Application-independent, intended to support sharing and reuse
Structures or organization of types | Groupings of types to accommodate data access patterns | Taxonomies (type hierarchies) always used to describe/classify the domain
Combinability | Can rarely be combined; even when possible, this typically requires significant manual effort | If the ontology-building methodology is followed, the results will be combinable automatically
Flexibility | Rigid; changes normally require significant effort | Flexible; changes can normally be made very easily

Semantic Enhancement of Data Models by Ontology

• Semantic Enhancement (SE) is realized with the help of ontologies that are used to explicate data models and annotate data instances
  – The vocabulary of the ontologies used for explications and annotations provides agile horizontal integration
  – Ontologies, by virtue of their nature and organization, provide semantic enhancement of data

PersonID | Name | Description
111 | Java | Programming
222 | SQL | Database

[Figure: ontology terms Skill, ComputerSkill, ProgrammingSkill, SQL, Java, C++, Education and TechnicalEducation shown alongside the table]
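Explication and annotation can be sketched as mappings kept outside the data. In this minimal illustration, the rows reproduce the table above, while the table name "PersonSkillTable" and both mappings are hypothetical. Explication says what a database field denotes (the Name column holds skills); annotation says which ontology term each stored value instantiates. The final assertion checks that the stored rows are unchanged.

```python
import copy

# Rows as stored in the source table from the slide.
ROWS = [
    {"PersonID": 111, "Name": "Java", "Description": "Programming"},
    {"PersonID": 222, "Name": "SQL", "Description": "Database"},
]
SNAPSHOT = copy.deepcopy(ROWS)

# Explication: what a database field denotes, as an ontology term.
# "PersonSkillTable" is a hypothetical placeholder table name.
EXPLICATION = {("PersonSkillTable", "Name"): "Skill"}

# Annotation: which ontology term each stored value instantiates.
# Here the terms happen to share the stored strings; in general they need not.
ANNOTATION = {"Java": "Java", "SQL": "SQL"}

def annotated(rows, table, column):
    """Pair each row's value with the field's type and the value's term."""
    field_type = EXPLICATION[(table, column)]
    return [(r["PersonID"], field_type, ANNOTATION.get(r[column])) for r in rows]

print(annotated(ROWS, "PersonSkillTable", "Name"))
# [(111, 'Skill', 'Java'), (222, 'Skill', 'SQL')]
assert ROWS == SNAPSHOT  # the enhancement left the stored data untouched
```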

The Meaning of ‘Enhancement’
• Semantic enhancement/enrichment of data = arm’s-length approach (no change to data)
  – through simple explication we associate an entire knowledge system with a database field
  – enables analytics to process data, e.g. about computer skills, “vertically” along the Skill hierarchy, as well as “horizontally” via relations between Skill and Education
  – and further: while the data in the database do not change, their analysis can become richer and richer as our understanding of reality changes

• For this richness to be leveraged by different communities, persons, and applications, the ontology needs to have the properties mentioned above and be constructed in accordance with the principles of SE (see References)
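The "vertical" and "horizontal" analyses can be sketched over annotated data that is never modified. Everything here is a hypothetical fragment: the is_a links follow the Skill and Education terms shown in the earlier figure, the placement of SQL under ProgrammingSkill is assumed from the figure's grouping, and the relation name ACQUIRED_THROUGH is invented to stand for "relations between Skill and Education".

```python
# Hypothetical ontology fragment: is_a links for skills and education.
IS_A = {
    "Java": "ProgrammingSkill",
    "C++": "ProgrammingSkill",
    "SQL": "ProgrammingSkill",  # placement assumed for illustration
    "ProgrammingSkill": "ComputerSkill",
    "ComputerSkill": "Skill",
    "TechnicalEducation": "Education",
}
# Hypothetical Skill-to-Education relation.
ACQUIRED_THROUGH = {"ComputerSkill": "TechnicalEducation"}

# Annotated data: PersonID -> ontology term of the recorded skill.
PERSON_SKILL = {111: "Java", 222: "SQL"}

def is_a(term, ancestor):
    """Vertical: walk up the hierarchy from term."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False

def related_education(term):
    """Horizontal: follow the relation from the nearest type that has one."""
    while term is not None:
        if term in ACQUIRED_THROUGH:
            return ACQUIRED_THROUGH[term]
        term = IS_A.get(term)
    return None

computer_skilled = sorted(p for p, s in PERSON_SKILL.items() if is_a(s, "ComputerSkill"))
print(computer_skilled)           # [111, 222]
print(related_education("Java"))  # TechnicalEducation
```

Extending IS_A or ACQUIRED_THROUGH enriches these answers while PERSON_SKILL stays exactly as it was.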

SE and Data Integration
• Traditional integration approaches involve the creation of a new model used in
  – A new physical store (data warehouse)
    • Expensive, resource- and time-consuming
    • Another data store: rigid (a potential data silo), not interoperable with other stores
  – Querying the data sources via it
    • Fragile

• Both entail loss and/or distortion of data and semantics, and provide only ‘local’ integration (they do not lead to interoperability with other sources)

• SE of a store
  – Does not require data reorganization or the creation of another store
  – Changes to it are non-intrusive
  – Leads to integration of the store with other stores enhanced previously or in the future
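A minimal sketch of integration through SE, with two hypothetical stores whose data models differ in field names and value codes. Each store is enhanced separately and at arm's length: a small record says which field identifies the person, which field holds the skill, and which ontology term each stored code denotes. A query phrased in the shared ontology vocabulary then runs across both stores with no warehouse and no change to either store. The field names, codes and the ENHANCEMENTS layout are all assumptions made for illustration.

```python
# Two hypothetical stores with different data models for the same domain.
STORES = {
    "A": [{"PersonID": 111, "Name": "Java"}],
    "B": [{"emp_no": "E-7", "skill_cd": "JV"}, {"emp_no": "E-9", "skill_cd": "NET"}],
}

# Each store's enhancement: its person field, its skill field, and the
# ontology term each stored code denotes. Stores stay untouched.
ENHANCEMENTS = {
    "A": {"person": "PersonID", "skill": "Name", "terms": {"Java": "Java"}},
    "B": {"person": "emp_no", "skill": "skill_cd",
          "terms": {"JV": "Java", "NET": "NetworkSkill"}},
}

def who_has(term):
    """Query every enhanced store through the shared ontology vocabulary."""
    hits = []
    for name, rows in STORES.items():
        enh = ENHANCEMENTS[name]
        for row in rows:
            if enh["terms"].get(row[enh["skill"]]) == term:
                hits.append((name, row[enh["person"]]))
    return hits

print(who_has("Java"))  # [('A', 111), ('B', 'E-7')]
```

Adding a store later only requires a new entry in STORES and ENHANCEMENTS; existing stores and their enhancements are unaffected.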

References
• Barry Smith et al., “IAO-Intel: An Ontology of Information Artifacts in the Intelligence Domain”, STIDS Conference, 2013.
• Barry Smith, Tatiana Malyuta, William S. Mandrick, Chia Fu, Kesny Parent, Milan Patel, “Horizontal Integration of Warfighter Intelligence Data: A Shared Semantic Resource for the Intelligence Community”, STIDS Conference, 2012.
• Barry Smith, Tatiana Malyuta, David Salmen, William Mandrick, Kesny Parent, Shouvik Bardhan, Jamie Johnson, “Ontology for the Intelligence Analyst”, Crosstalk: The Journal of Defense Software Engineering, 2012.
• David Salmen, Tatiana Malyuta, Alan Hansen, Shaun Cronen, Barry Smith, “Integration of Intelligence Data through Semantic Enhancement”, STIDS Conference, 2011.

Questions?