Semantic annotation on the SONet and Semtools projects: Challenges for broad multidisciplinary exchange of observational data Mark Schildhauer, NCEAS/UCSB TDWG meeting, Woods Hole Observations Activity Group Sep. 29, 2010


Page 1

Semantic annotation on the SONet and Semtools projects: Challenges for broad multidisciplinary exchange of observational data

Mark Schildhauer, NCEAS/UCSB
TDWG meeting, Woods Hole
Observations Activity Group
Sep. 29, 2010

Page 2

Nature of scientific data sets

• Scientific data often in tables
• Tables consist of rows (records) and columns (attributes)
• The association of specific columns together (tuple) in a scientific data set is often a non-normalized (materialized) view, with special meaning/use for researcher
• Individual cells contain values that are measurements of a characteristic of some thing

Page 3

SONet/Semtools Semantic Approach

• Data -> metadata -> annotations -> ontologies
• Ontology: formal knowledge representation in OWL-DL
  – Hierarchical structure of concepts
  – Relationships can link concepts
• Annotations link EML metadata elements to concepts in the ontology through the Observation Ontology
• EML metadata describe data and its structures

Page 4

Linking data values to concepts

• Extensible Observation Ontology (OBOE)
• OBOE provides a high-level abstraction of scientific observations and measurements
• Enables data (or metadata) structures to be linked to domain-specific ontology concepts
• Can inter-relate values in a tuple
• Provides clarification of semantics of data set as a whole, not just “independent” values

Page 5

Concepts of Semantic Search

• Annotations give metadata attributes semantic meaning w.r.t. an ontology

• Enable structured search against annotations to increase precision

• Enable ontological term expansion to increase recall

• Precisely define a measured characteristic and the standard used to measure it via OBOE
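The term-expansion idea can be illustrated with a small Python sketch. It is not the SONet/Semtools implementation: the toy subclass hierarchy, the document ids, and the function names `expand_term` and `semantic_search` are all hypothetical, and a real system would obtain the hierarchy from an OWL reasoner rather than a dictionary.

```python
from collections import deque

# Hypothetical toy ontology: child concept -> parent concept (subclass-of).
SUBCLASS_OF = {
    "Picea rubens": "Picea",
    "Picea": "Tree",
    "Abies balsamea": "Abies",
    "Abies": "Tree",
}

# Hypothetical annotation index: concept -> ids of annotated metadata documents.
ANNOTATED_DOCS = {
    "Picea rubens": {"doc-1"},
    "Abies balsamea": {"doc-2"},
    "Tree": {"doc-3"},
}


def expand_term(term, subclass_of):
    """Return the term plus all of its transitive subclasses."""
    children = {}
    for child, parent in subclass_of.items():
        children.setdefault(parent, []).append(child)
    expanded, queue = {term}, deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in expanded:
                expanded.add(child)
                queue.append(child)
    return expanded


def semantic_search(term, subclass_of, index):
    """Match documents annotated with the term or with any narrower concept."""
    hits = set()
    for concept in expand_term(term, subclass_of):
        hits |= index.get(concept, set())
    return hits
```

Searching for the broad concept "Tree" also returns documents annotated only with narrower concepts, which is how expansion raises recall; matching against annotation concepts rather than free text is what keeps precision high.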

Page 6

Logical Architecture

Page 7

Annotations

• XML schema defines annotation properties
• Namespaces to identify sources of terms
• Search performed against annotations, not the metadata itself
• Returns metadata documents that are linked to the annotation
• Reasoning (term expansion, consistency, etc.) through domain ontology

Page 8

XML Links

Page 9

KNB metadata catalog

• Stores EML (XML) and raw data objects
• Extended to store Ontologies, domain and OBOE (OWL-DLs serialized in XML)
• Extended to store Annotations (XML)
• Jena to facilitate querying ontologies
• Pellet to reason (consistency of ontologies; class subsumption)

Page 10

Metacat Implementation

Page 11

OBOE Conceptual Model (OWL-DL)

[Diagram: classes Observation, Measurement, Entity, Characteristic, Value, Standard, Context, Relationship; properties ofEntity, hasMeasurement, hasContext, hasContextObservation, hasContextRelationship, ofCharacteristic, hasValue, usesStandard; association ends carry cardinalities 0..* and 1..1.]
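The conceptual model can be approximated with a few Python dataclasses. This is only a sketch: the pairing of each property with a class follows a plain reading of the property names (for example, `ofEntity` on Observation), the cardinalities are not enforced, and the example instances at the end are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    type_name: str                      # e.g. "Tree"


@dataclass
class Measurement:
    characteristic: str                 # ofCharacteristic
    standard: str                       # usesStandard
    value: object                       # hasValue


@dataclass
class Context:
    relationship: str                   # hasContextRelationship
    observation: "Observation"          # hasContextObservation


@dataclass
class Observation:
    entity: Entity                                                  # ofEntity
    measurements: List[Measurement] = field(default_factory=list)  # hasMeasurement
    contexts: List[Context] = field(default_factory=list)          # hasContext


# Illustrative instances: a tree measured within a given year.
year_obs = Observation(Entity("TemporalRange"), [Measurement("Year", "DateTime", 2007)])
tree_obs = Observation(
    Entity("Tree"),
    [Measurement("DBH", "Centimeter", 35.8)],
    [Context("Within", year_obs)],
)
```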

Page 12

Annotation Examples (12/18/2009)

[Diagram: boxes Annotation, Dataset, OBOE Model (individuals/triples), OBOE Concepts; arrows labeled Define (view def.), Materialize, instantiates, uses terms from, observation-based representation of, Query*.]

* Conceptually, we want to query datasets via annotations

Page 13

Annotation Examples

<observation label="o1">
  <entity id="TemporalRange"/>
  <measurement label="m1">
    <characteristic id="Year"/>
    <standard id="DateTime"/>
  </measurement>
</observation>
<observation label="o2">
  <entity id="Tree"/>
  <measurement label="m2" precision="0.1">
    <characteristic id="DBH"/>
    <standard id="Centimeter"/>
  </measurement>
  <measurement label="m3">
    <characteristic id="TaxonomicTypeName"/>
    <standard id="ITIS"/>
  </measurement>
  <measurement label="m4">
    <characteristic id="EntityName"/>
    <standard id="LocalTreeNames"/>
  </measurement>
  <context observation="o1">
    <relationship id="Within"/>
  </context>
</observation>
<map attribute="yr" measurement="m1"/>
<map attribute="diam" measurement="m2" if="diam ge 0"/>
<map attribute="spec" measurement="m4"/>
<map attribute="spp" measurement="m3" value="Picea rubens" if="spp eq 'piru'"/>
<map attribute="spp" measurement="m3" value="Abies balsamea" if="spp eq 'abba'"/>

Annotation Syntax

observation "o1"
  entity "TemporalRange"
  measurement "m1"
    characteristic "Year"
    standard "DateTime"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3"
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  measurement "m4"
    characteristic "EntityName"
    standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "diam" to "m2" if diam > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"

* Code exists to read/write annotations using this XML format
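As a rough illustration of reading the XML form, the sketch below parses an abbreviated copy of the fragment with Python's standard `xml.etree.ElementTree`. It is not the projects' reader code: the enclosing `<annotation>` root element is an assumption added only so the fragment is well-formed, and `read_annotation` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Abbreviated fragment from the slide, wrapped in a hypothetical <annotation> root.
XML = """
<annotation>
  <observation label="o1">
    <entity id="TemporalRange"/>
    <measurement label="m1">
      <characteristic id="Year"/>
      <standard id="DateTime"/>
    </measurement>
  </observation>
  <observation label="o2">
    <entity id="Tree"/>
    <measurement label="m2" precision="0.1">
      <characteristic id="DBH"/>
      <standard id="Centimeter"/>
    </measurement>
    <context observation="o1">
      <relationship id="Within"/>
    </context>
  </observation>
  <map attribute="yr" measurement="m1"/>
  <map attribute="diam" measurement="m2" if="diam ge 0"/>
</annotation>
"""


def read_annotation(xml_text):
    """Return (observations keyed by label, list of column mappings)."""
    root = ET.fromstring(xml_text.strip())
    observations = {}
    for obs in root.findall("observation"):
        observations[obs.get("label")] = {
            "entity": obs.find("entity").get("id"),
            "measurements": {
                m.get("label"): (
                    m.find("characteristic").get("id"),
                    m.find("standard").get("id"),
                )
                for m in obs.findall("measurement")
            },
            "context": [
                (c.get("observation"), c.find("relationship").get("id"))
                for c in obs.findall("context")
            ],
        }
    # Each mapping: (data column, measurement label, optional condition).
    mappings = [
        (m.get("attribute"), m.get("measurement"), m.get("if"))
        for m in root.findall("map")
    ]
    return observations, mappings
```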

Page 14

Annotation Examples

Dataset

yr    spec  spp   dbh
2007  1     piru  35.8
2007  1     piru  36.2
2008  2     abba  33.2

Annotation

observation "o1"
  entity "TemporalRange"
  measurement "m1"
    characteristic "Year"
    standard "DateTime"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3"
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  measurement "m4"
    characteristic "EntityName"
    standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "dbh" to "m2" if dbh > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"

• Basic idea: go row-by-row through dataset, generating individuals/triples
• “external” terms should have namespacing prefix URI

[Diagram: instances generated for the three rows. Each row yields an :Obs of a :TemporalRange with a :Meas (:Year, :DateTime; values 2007, 2007, 2008) and an :Obs of a :Tree with three :Meas: (:EntN, :LocTN.; 1, 1, 2), (:TaxN, :ITIS; Picea., Picea., Abie.) and (:DBH, :Centim.; 35.8, 36.2, 33.2). Each Tree observation is linked to its TemporalRange observation by hasContext.]
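The row-by-row idea can be sketched in Python as follows. This is a simplified illustration, not the projects' materialization algorithm: the dictionary encoding of the annotation, the instance-naming scheme (`o1_row1`, `m2_row1`), and the function names are assumptions, and `hasContext` is drawn directly between observations, leaving out the intermediate Context and Relationship nodes.

```python
ROWS = [
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 35.8},
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 36.2},
    {"yr": 2008, "spec": 2, "spp": "abba", "dbh": 33.2},
]

# The slide's annotation, re-expressed as plain Python structures (assumed encoding).
OBSERVATIONS = {
    "o1": {"entity": "TemporalRange", "measurements": ["m1"], "context": None},
    "o2": {"entity": "Tree", "measurements": ["m2", "m3", "m4"], "context": "o1"},
}
MEASUREMENTS = {
    "m1": ("Year", "DateTime"),
    "m2": ("DBH", "Centimeter"),
    "m3": ("TaxonomicTypeName", "ITIS"),
    "m4": ("EntityName", "LocalTreeNames"),
}
SPP_VALUES = {"piru": "Picea rubens", "abba": "Abies balsamea"}


def map_row(row):
    """Apply the map rules to one row: measurement label -> value."""
    values = {"m1": row["yr"], "m4": row["spec"]}
    if row["dbh"] > 0:                  # map "dbh" to "m2" if dbh > 0
        values["m2"] = row["dbh"]
    if row["spp"] in SPP_VALUES:        # conditional value substitution for "spp"
        values["m3"] = SPP_VALUES[row["spp"]]
    return values


def materialize(rows):
    """Emit (subject, predicate, object) triples, with fresh instances for every row."""
    triples = []
    for i, row in enumerate(rows, start=1):
        values = map_row(row)
        for label, obs in OBSERVATIONS.items():
            obs_id = f"{label}_row{i}"
            triples.append((obs_id, "ofEntity", f"{obs['entity']}_row{i}"))
            for m in obs["measurements"]:
                if m in values:
                    characteristic, standard = MEASUREMENTS[m]
                    meas_id = f"{m}_row{i}"
                    triples += [
                        (obs_id, "hasMeasurement", meas_id),
                        (meas_id, "ofCharacteristic", characteristic),
                        (meas_id, "usesStandard", standard),
                        (meas_id, "hasValue", values[m]),
                    ]
            if obs["context"]:
                # Simplification: link straight to the context observation of the same row.
                triples.append((obs_id, "hasContext", f"{obs['context']}_row{i}"))
    return triples
```

Because nothing here says when two rows describe the same thing, the sketch creates six observation instances and six entity instances for three rows.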

Page 15

Annotation Examples

Dataset

yr    spec  spp   dbh
2007  1     piru  35.8
2008  1     piru  36.2
2008  2     abba  33.2

Annotation

observation "o1"
  entity "TemporalRange"
  measurement "m1"
    characteristic "Year"
    standard "DateTime"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3"
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  measurement "m4"
    characteristic "EntityName"
    standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "dbh" to "m2" if dbh > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"

• Same Trees!! (both have name = 1)
• Same Year and year observation!!

[Diagram: instances generated for the three rows. Each row yields an :Obs of a :TemporalRange with a :Meas (:Year, :DateTime; values 2007, 2007, 2008) and an :Obs of a :Tree with three :Meas: (:EntN, :LocTN.; 1, 1, 2), (:TaxN, :ITIS; Picea., Picea., Abie.) and (:DBH, :Centim.; 35.8, 36.2, 33.2). Each Tree observation is linked to its TemporalRange observation by hasContext.]

Page 16

Annotation Examples

Dataset

yr    spec  spp   dbh
2007  1     piru  35.8
2008  1     piru  36.2
2008  2     abba  33.2

Annotation

observation "o1" distinct yes
  entity "TemporalRange"
  measurement "m1" key yes
    characteristic "Year"
    standard "DateTime"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3"
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  measurement "m4" key yes
    characteristic "EntityName"
    standard "LocalTreeNames"
  context observation "o1" relationship "Within"
map "yr" to "m1"
map "dbh" to "m2" if dbh > 0
map "spec" to "m4"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"

Every observation has an implicit “distinct” attribute (set to “no”)

… and every measurement has an implicit “key” attribute (set to “no”)

[Diagram: two :Obs of :TemporalRange (:Year, :DateTime; 2007 and 2008) and three :Obs of :Tree (:EntN, :LocTN.; 1, 1, 2 / :TaxN, :ITIS; Picea., Picea., Abie. / :DBH, :Centim.; 35.8, 36.2, 33.2), with two :Tree and two :TemporalRange entity instances. Tree observations are linked to TemporalRange observations by hasContext.]

Page 17

Annotation Examples

• Observation measurement keys
  – Like a primary key constraint
  – States that observation instances with the same measurement key values are of the same entity instance
  – Does not imply the same observation instance, unless the observation is declared distinct
  – All key measurements of an observation together form the primary key
• Distinct observations
  – Only applies if at least one key measurement is defined
  – States that observation instances with the same entity instance are of the same observation instance
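The key and distinct rules can be tried out with a short Python sketch. It uses a three-row tree table (years 2007, 2008, 2008; tree names 1, 1, 2) and assumes the year observation is declared distinct with Year as its key, while the tree observation has the tree name as its key and is not distinct. The encoding (`key_columns`, `distinct`) and the instance labels are assumptions made for illustration.

```python
ROWS = [
    {"yr": 2007, "spec": 1, "spp": "piru", "dbh": 35.8},
    {"yr": 2008, "spec": 1, "spp": "piru", "dbh": 36.2},
    {"yr": 2008, "spec": 2, "spp": "abba", "dbh": 33.2},
]

# Assumed encoding: key columns and distinct flag per observation type.
OBS_TYPES = {
    "o1": {"entity": "TemporalRange", "key_columns": ["yr"], "distinct": True},
    "o2": {"entity": "Tree", "key_columns": ["spec"], "distinct": False},
}


def materialize(rows, obs_types):
    """Return {label: (entity instance per row, observation instance per row)}."""
    result = {}
    for label, spec in obs_types.items():
        entity_by_key, obs_by_entity = {}, {}
        entities, observations = [], []
        for i, row in enumerate(rows, start=1):
            key = tuple(row[c] for c in spec["key_columns"])
            if key:
                # Same key values: same entity instance.
                entity = entity_by_key.setdefault(key, f"{spec['entity']}{key}")
            else:
                # No key measurement: a fresh entity instance for every row.
                entity = f"{spec['entity']}(row {i})"
            if spec["distinct"] and key:
                # Distinct: same entity instance implies same observation instance.
                obs = obs_by_entity.setdefault(entity, f"{label}[{entity}]")
            else:
                obs = f"{label}[row {i}]"
            entities.append(entity)
            observations.append(obs)
        result[label] = (entities, observations)
    return result
```

With these settings the three rows produce two TemporalRange entities and two year observations, but two Tree entities with three tree observations, since tree 1 is observed twice.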

Page 18

Annotation Examples

Dataset

plt  spp   dbh
A    piru  35.8
A    piru  36.2
B    piru  33.2

Annotation

observation "o1" distinct yes
  entity "Plot"
  measurement "m1" key yes
    characteristic "EntityName"
    standard "Nominal"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3" key yes
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  context observation "o1" relationship "Within"
map "plt" to "m1"
map "dbh" to "m2"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"

Here we don’t have unique ids for trees

But, assume each spp name within a plot uniquely identifies a tree …

i.e., at most one tree of a particular type was measured (possibly multiple times) in each plot

[Diagram: two :Obs of :Plot (:EntN, :Nominal; A and B) and three :Obs of :Tree, each with :Meas (:TaxN, :ITIS; Picea.) and :Meas (:DBH, :Centim.; 35.8, 36.2, 33.2); one :Tree and two :Plot entity instances. Tree observations are linked to Plot observations by hasContext.]

Page 19

Annotation Examples

Dataset

plt  spp   dbh
A    piru  35.8
A    piru  36.2
B    piru  33.2

Annotation

observation "o1" distinct yes
  entity "Plot"
  measurement "m1" key yes
    characteristic "EntityName"
    standard "Nominal"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3" key yes
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  context observation "o1" relationship "Within"
map "plt" to "m1"
map "dbh" to "m2"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"

[Diagram: two :Obs of :Plot (:EntN, :Nominal; A and B) and three :Obs of :Tree, each with :Meas (:TaxN, :ITIS; Picea.) and :Meas (:DBH, :Centim.; 35.8, 36.2, 33.2); one :Tree and two :Plot entity instances. Tree observations are linked to Plot observations by hasContext.]

• The Tree entity instance should depend on the plot it is in!!! (context)

Page 20

Annotation Examples

Dataset

plt  spp   dbh
A    piru  35.8
A    piru  36.2
B    piru  33.2

Annotation

observation "o1" distinct yes
  entity "Plot"
  measurement "m1" key yes
    characteristic "EntityName"
    standard "Nominal"
observation "o2"
  entity "Tree"
  measurement "m2" precision: "0.1"
    characteristic "DBH"
    standard "Centimeter"
  measurement "m3" key yes
    characteristic "TaxonomicTypeName"
    standard "ITIS"
  context identifying yes observation "o1" relationship "Within"
map "plt" to "m1"
map "dbh" to "m2"
map "spp" to "m3" if spp == "piru" value="Picea rubens"
map "spp" to "m3" if spp == "abba" value="Abies balsamea"

Every context relationship has an “identifying” qualifier (set to “no”)

Uniqueness within context observation

Similar to a weak-entity constraint (ER)

[Diagram: two :Obs of :Plot (:EntN, :Nominal; A and B) and three :Obs of :Tree, each with :Meas (:TaxN, :ITIS; Picea.) and :Meas (:DBH, :Centim.; 35.8, 36.2, 33.2); two :Tree and two :Plot entity instances. Tree observations are linked to Plot observations by hasContext.]
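The effect of an identifying context can be shown with a minimal Python sketch, assuming the species name is the Tree key and the plot name is the Plot key. The function name `tree_entities` and the `Tree#n` labels are hypothetical; the point is only that an identifying context adds the context entity to the key, much as a weak entity borrows its owner's key in ER modeling.

```python
ROWS = [
    {"plt": "A", "spp": "piru", "dbh": 35.8},
    {"plt": "A", "spp": "piru", "dbh": 36.2},
    {"plt": "B", "spp": "piru", "dbh": 33.2},
]


def tree_entities(rows, identifying):
    """Assign a Tree entity instance to each row.

    The Tree key is the species name. When the context is identifying,
    the Plot (keyed by plot name) becomes part of the Tree key as well.
    """
    entity_by_key = {}
    assigned = []
    for row in rows:
        key = (row["spp"],)
        if identifying:
            key = (row["plt"],) + key
        assigned.append(entity_by_key.setdefault(key, f"Tree#{len(entity_by_key) + 1}"))
    return assigned
```

Without the qualifier all three rows collapse into one tree; with `identifying=True` the rows for plot A share a tree and the row for plot B gets its own.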

Page 21

Annotation Examples

Representing instances …

• Annotation(AnnotId, Resource)
• Observation(ObsId, AnnotId, EntId)
• Measurement(MeasId, ObsId, MeasType, Value)
• Context(ObsId1, ObsId2, Rel)
• Relationship(RelId, RelType)
• Entity(EntId, EntType)

This could be queried itself and/or mapped to triples

Note that ObsIds are unique across annotations
The two ObsIds in a Context row must belong to the same annotation

* Simple relational schema for OBOE models (individuals/triples)
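To show how such a schema could be queried directly, here is a sketch using Python's built-in `sqlite3` module. Only the table and column names come from the slide; the column types, the key constraints, the reading of `Context.ObsId1` as the observation that has `ObsId2` as its context, the use of `MeasType` for the characteristic name, and the sample rows (including the resource name `trees.csv`) are all assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Annotation   (AnnotId INTEGER PRIMARY KEY, Resource TEXT);
CREATE TABLE Entity       (EntId INTEGER PRIMARY KEY, EntType TEXT);
CREATE TABLE Observation  (ObsId INTEGER PRIMARY KEY,
                           AnnotId INTEGER REFERENCES Annotation(AnnotId),
                           EntId INTEGER REFERENCES Entity(EntId));
CREATE TABLE Measurement  (MeasId INTEGER PRIMARY KEY,
                           ObsId INTEGER REFERENCES Observation(ObsId),
                           MeasType TEXT, Value TEXT);
CREATE TABLE Relationship (RelId INTEGER PRIMARY KEY, RelType TEXT);
CREATE TABLE Context      (ObsId1 INTEGER REFERENCES Observation(ObsId),
                           ObsId2 INTEGER REFERENCES Observation(ObsId),
                           Rel INTEGER REFERENCES Relationship(RelId));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample data: one tree observation within one year observation.
conn.executescript("""
INSERT INTO Annotation VALUES (1, 'trees.csv');
INSERT INTO Entity VALUES (1, 'TemporalRange'), (2, 'Tree');
INSERT INTO Observation VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO Measurement VALUES (1, 1, 'Year', '2007'), (2, 2, 'DBH', '35.8');
INSERT INTO Relationship VALUES (1, 'Within');
INSERT INTO Context VALUES (2, 1, 1);
""")

# DBH values of Tree observations whose context observation has Year = 2007.
rows = conn.execute("""
SELECT m.Value
FROM Measurement m
JOIN Observation o ON o.ObsId = m.ObsId
JOIN Entity e ON e.EntId = o.EntId
JOIN Context c ON c.ObsId1 = o.ObsId
JOIN Measurement y ON y.ObsId = c.ObsId2
WHERE e.EntType = 'Tree' AND m.MeasType = 'DBH'
  AND y.MeasType = 'Year' AND y.Value = '2007'
""").fetchall()
```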

Page 22

Ongoing Activities

• Developing compatible domain ontologies (design patterns for use with observation ontology)
• Scalability of materialization algorithm from annotations (data result sets)
• Testing and developing capabilities motivated by Use Cases (coastal ecosystems and plant traits)
• SONet and JWG-ODMS continue to meet and discuss

Page 23

Acknowledgements: Shawn Bowers, Huiping Cao, SEEK KR/SMS working group, and all members of SONet and Semtools projects

Thanks also to Chad Berkeley and Ben Leinfelder, project software engineers

Work supported by National Science Foundation awards 0225674, 0225676, 0743429, 0733849, 0753144, 0630033