SCEC Ontology Development. Tom Russ, Hans Chalupsky, Stefan Decker, Yolanda Gil, Jihie Kim, Varun Ratnakar. University of Southern California, Information Sciences Institute (USC/ISI)

Post on 20-Dec-2015


Page 1:

SCEC Ontology Development

Tom Russ

Hans Chalupsky, Stefan Decker, Yolanda Gil, Jihie Kim, Varun Ratnakar

University of Southern California

Information Sciences Institute

Page 2:

Outline

Background

• SCEC Goals

• Ontology Basics

• Semantic Interoperability Examples

• Weather

• Seismology

• Building Computational Pathways

Ontology Development

• SCEC Ontology Development

• Gene Ontology Development

• Fundamental Ontologies?

Big Questions

Page 3:

Goals: SCEC/IT Project

Page 4:

What is an Ontology?

An Ontology is a framework for representing shared conceptualizations of knowledge

An Ontology provides:

• Definitions for objects and relations in the domain

• Shared vocabulary and common structure for modeling domain knowledge

• Domain model/theory that captures common knowledge about the domain

Page 5:

Semantic Interoperability Story

SCEC Java code for Community Velocity Model

• Inputs: longitude and latitude

• Output: Vs30 (m/s)

Connection technology: Java serialization

• In other words: Ship the bits for two double precision floating point values through a network connection

• Make sure you send longitude first!

– Non-standard convention for geography

– Probably based on X-Y convention instead

Better: More structured input

• Latitude=34.15 Longitude=-117.58

• Explicit identification of parameters
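
The difference between the two interfaces can be sketched in a few lines of Python. This is an illustration only, not SCEC code: the `GeoPoint` class and the `parse_point` helper are hypothetical names, and the range checks are an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPoint:
    """A location with explicitly named, range-checked coordinates (degrees)."""
    latitude: float
    longitude: float

    def __post_init__(self):
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


def parse_point(text: str) -> GeoPoint:
    """Parse 'Latitude=34.15 Longitude=-117.58'; field order does not matter."""
    fields = {}
    for token in text.split():
        name, _, value = token.partition("=")
        fields[name.lower()] = float(value)
    return GeoPoint(latitude=fields["latitude"], longitude=fields["longitude"])


# Positional pair in (longitude, latitude) order, as the serialized interface expects.
positional = (-117.58, 34.15)

# A caller who assumes (latitude, longitude) order misreads the pair.
# The mistake is caught here only because -117.58 is not a valid latitude.
try:
    GeoPoint(latitude=positional[0], longitude=positional[1])
    swapped_detected = False
except ValueError:
    swapped_detected = True

# Named fields give the same point whichever way round they are written.
a = parse_point("Latitude=34.15 Longitude=-117.58")
b = parse_point("Longitude=-117.58 Latitude=34.15")
```

With named fields both orderings yield the same point. With a bare pair, a swap in which both values lie between -90 and 90 would pass the range check silently, which is the risk the slide points at.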

Page 6:

Ontologizing a Domain such as “Weather”

Page 7:

Identify Relevant Domain Concepts

Conditions for Joint Tasks (from: CJCSM 3500.04A 9/13/96, p. 3-11.)

C1.0 PHYSICAL ENVIRONMENT
  C1.1 LAND
    C1.1.1 Terrain
      C1.1.1.1 Terrain Relief
      C1.1.1.2 Terrain Elevation
      C1.1.1.3 Terrain Slope
      C1.1.1.4 Terrain Firmness
      C1.1.1.5 Terrain Traction
      C1.1.1.6 Vegetation
      C1.1.1.7 Terrain Relief Features
    C1.1.2 Geological Features
      C1.1.2.1 Geological Activity
      C1.1.2.2 Magnetic Variation
      C1.1.2.3 Subsurface Water
    C1.1.3 Synthetic Terrain Features
      C1.1.3.1 Urbanization
      C1.1.3.2 Significant Civil Structures
      C1.1.3.3 Synthetic Terrain Contrast
      C1.1.3.4 Obstacles to Movement
      C1.1.3.5 Route Availability
    C1.1.4 Landlocked Waters
      C1.1.4.1 Landlocked Waters Depth
      C1.1.4.2 Landlocked Waters Currents
      C1.1.4.3 Landlocked Waters Width
      C1.1.4.4 Landlocked Waters Bottom
      C1.1.4.5 Landlocked Waters Shore Gradient
  C1.2 SEA
    C1.2.1 Ocean Waters
      C1.2.1.1 Ocean Depth
      C1.2.1.2 Ocean Currents
      C1.2.1.3 Sea State
      C1.2.1.4 Ocean Temperature
      C1.2.1.5 Saline Content
      C1.2.1.6 Ocean Features
      C1.2.1.7 Sea Room
      C1.2.1.8 Ocean Acoustics
      C1.2.1.9 Ocean Bioluminescence
      C1.2.1.10 Ocean Ice
      C1.2.1.11 Ocean Ice Thickness
      C1.2.1.12 Ocean Ambient Noise
    C1.2.2 Ocean Bottom
      C1.2.2.1 Sea Bottom Contours
      C1.2.2.2 Sea Bottom Composition
    C1.2.3 Harbor Capacity
      C1.2.3.1 Harbor Shelter
      C1.2.3.2 Harbor Depth
      C1.2.3.3 Harbor Currents
    C1.2.4 Littoral Characteristics
      C1.2.4.1 Littoral Gradient
      C1.2.4.2 Littoral Composition
      C1.2.4.3 Littoral Terrain Features
      C1.2.4.4 Littoral Tides
      C1.2.4.5 Littoral Currents
    C1.2.5 Riverine Environment
      C1.2.5.1 Riverine Navigability
      C1.2.5.2 Riverine Tidal Turbulence
      C1.2.5.3 Riverine Current
      C1.2.5.4 Riverine Bank Gradient
    C1.2.6 Shipping Presence
      C1.2.6.1 Shipping Density
      C1.2.6.2 Shipping Type
      C1.2.6.3 Shipping Identifiability
  C1.3 AIR
    C1.3.1 Climate
      C1.3.1.1 Season
      C1.3.1.2 Weather Systems
      C1.3.1.3 Weather
        C1.3.1.3.1 Air Temperature
        C1.3.1.3.2 Barometric Pressure
        C1.3.1.3.3 Surface Wind Velocity
          C1.3.1.3.3.1 Low Altitude Wind Velocity
          C1.3.1.3.3.2 Medium Altitude Wind Velocity
          C1.3.1.3.3.3 High Altitude Wind Velocity
        C1.3.1.3.4 Wind Direction
        C1.3.1.3.5 Humidity
        C1.3.1.3.6 Precipitation
        C1.3.1.3.7 Altitude
    C1.3.2 Visibility
      C1.3.2.1 Light
      C1.3.2.2 Obscurants
    C1.3.3 Atmospheric Weapon Effects
      C1.3.3.1 Nuclear Effects
        C1.3.3.1.1 Nuclear Blast/Thermal Effects
        C1.3.3.1.2 Nuclear Radiation Effects
      C1.3.3.2 Chemical Effects
      C1.3.3.3 Biological Effects
      C1.3.3.4 Electromagnetic Effects
    C1.3.4 Airspace Availability
  C1.4 SPACE
    C1.4.1 Objects in Space
      C1.4.1.1 Orbit Density
      C1.4.1.2 Orbit Type
    C1.4.2 Solar and Geomagnetic Activity
    C1.4.3 High Energy Particles

Page 8:

Weather Specification in English (from: CJCSM 3500.04A 9/13/96, p. 3-11.)

C 1.3.1.3 Weather

• Definition: current weather (next 24 hours).

• Descriptors: clear, partly cloudy, overcast, precipitating, stormy

C 1.3.1.3.1 Air Temperature

• Definition: atmospheric temperature at ground level

• Descriptors: Hot (> 85° F), Temperate (40° to 85° F), Cold (10° to 39° F), Very Cold (< 10° F)

Page 9:

Formalizing Domain Concepts

A knowledge-based system about “Weather” must know things like these:

• Terms

– hot, humid, windy ...

• Definitions

– cold = (10° to 39° F)

• Relationships

– cold and windy may overlap

– cold and hot are disjoint

– cold and very cold are disjoint!

• Rules

– IF heavy rain lasts 2 days THEN muddy terrain and excessive runoff (probability .9)
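
As a rough illustration of what "knowing" these things could look like in code, the sketch below encodes the air temperature descriptors from the English specification as intervals, checks the disjointness relationships, and writes the rain rule as a function. It is a Python sketch, not the PowerLoom formalization used in the project. The names `Interval`, `DESCRIPTORS`, `describe` and `heavy_rain_rule` are hypothetical, and two readings are assumptions: "40° to 85°" and "10° to 39°" are treated as including their end points, and "lasts 2 days" is read as "at least 2 days".

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A temperature range in degrees Fahrenheit."""
    low: float
    high: float
    low_closed: bool = True
    high_closed: bool = True

    def contains(self, t: float) -> bool:
        above = t >= self.low if self.low_closed else t > self.low
        below = t <= self.high if self.high_closed else t < self.high
        return above and below

    def _entirely_below(self, other: "Interval") -> bool:
        if self.high < other.low:
            return True
        # Ranges that touch at one value share it only if both ends are closed.
        return self.high == other.low and not (self.high_closed and other.low_closed)

    def disjoint(self, other: "Interval") -> bool:
        """True when no temperature can belong to both ranges."""
        return self._entirely_below(other) or other._entirely_below(self)


# Definitions: descriptor -> range, copied from the English specification.
DESCRIPTORS = {
    "hot": Interval(85, math.inf, low_closed=False, high_closed=False),
    "temperate": Interval(40, 85),
    "cold": Interval(10, 39),
    "very cold": Interval(-math.inf, 10, low_closed=False, high_closed=False),
}


def describe(t: float) -> list:
    """Return every descriptor whose range contains temperature t."""
    return [name for name, rng in DESCRIPTORS.items() if rng.contains(t)]


def heavy_rain_rule(heavy_rain_days: int):
    """IF heavy rain lasts 2 days THEN muddy terrain and excessive runoff (p = 0.9)."""
    if heavy_rain_days >= 2:  # assumption: "lasts 2 days" means at least 2
        return (("muddy terrain", "excessive runoff"), 0.9)
    return None
```

Under these assumptions `DESCRIPTORS["cold"].disjoint(DESCRIPTORS["very cold"])` holds, while `describe(39.5)` returns an empty list: taken literally, the descriptors leave readings between 39° and 40° unclassified. Spotting that kind of gap is one benefit of writing the definitions down formally.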

Page 10:

Earthquake Hazard Analog

NEHRP Soil Types

Soil Type | Description | Vs (m/s) | Rock Types
A | Hard Rock | > 1500 | Unweathered igneous intrusive
B | Rock | 760 - 1500 / 750 - 1500 | Volcanics, most Mesozoic bedrock, some Franciscan bedrock
C | Soft Rock | 360 - 760 / 350 - 750 | Some Quaternary and Tertiary sands, sandstones and mudstones. Some Franciscan melange & serpentinite
D | Stiff Soil | 180 - 360 / 200 - 350 | Some Quaternary muds, sands, gravels, silts and mud
E | Soft Soil | < 180 / < 200 | Water-saturated mud and artificial fill
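
The soil-type table plays the same role for seismology that the temperature descriptors play for weather: it maps a measured quantity onto named classes. A minimal Python sketch of that mapping follows. It uses the first velocity range listed for each soil type (boundaries at 180, 360, 760 and 1500 m/s). The function name `nehrp_site_class` is hypothetical, and assigning a shared boundary value such as 760 m/s to the stiffer class is an assumption, since the table lists the same end point in two rows.

```python
def nehrp_site_class(vs30: float) -> str:
    """Map a shear-wave velocity Vs30 (m/s) to a NEHRP soil type letter."""
    if vs30 <= 0:
        raise ValueError(f"Vs30 must be positive, got {vs30}")
    if vs30 > 1500:       # A: Hard Rock, "> 1500"
        return "A"
    if vs30 >= 760:       # B: Rock, 760 - 1500 (assumed to include 760)
        return "B"
    if vs30 >= 360:       # C: Soft Rock, 360 - 760 (assumed to include 360)
        return "C"
    if vs30 >= 180:       # D: Stiff Soil, 180 - 360 (assumed to include 180)
        return "D"
    return "E"            # E: Soft Soil, "< 180"
```

A component that expects a soil type rather than a velocity could apply a conversion of this kind to the Vs30 output of the Community Velocity Model.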

Page 11:

Hypocenter vs. Epicenter

The epicenter is the point on the surface directly above the hypocenter.

“Directly above”, more formally:

• The latitude and longitude of the epicenter and hypocenter are the same.

• The epicenter depth is zero.

PowerLoom:

(deffunction source-hypocenter ((?s earthquake-source)) :-> (?h location)
  :documentation "The 3D point where the rupture started.")

(deffunction source-epicenter ((?s earthquake-source)) :-> (?e location)
  :documentation "The point on the earth's surface directly above the hypocenter"
  :axioms (=> (earthquake-source ?s)
              (and (= (latitude-of (source-hypocenter ?s))
                      (latitude-of (source-epicenter ?s)))
                   (= (longitude-of (source-hypocenter ?s))
                      (longitude-of (source-epicenter ?s)))
                   (= (depth-of (source-epicenter ?s))
                      (units 0 "m")))))

Page 12:

PowerLoom

Knowledge representation & reasoning system

Uses definitions specified in a formal logic

• First order predicate calculus

• Expressive: We can say what we need to

Inference via logical deductions

Support for units and dimensions

Browsing tool: Ontosaurus

Page 13:

Ontosaurus

Diagrams and images aid domain familiarization

Display of formal information and rules

Navigation Tools and Control Panel

Domain facts.

Textual documentation

Page 14:

Graphical View: Fault Hierarchy

Page 15:

Plan: Building Computational Pathways

Simple scenario to illustrate how a user would define computational pathways

Behind the scenes, DOCKER uses descriptions of components, their I/O requirements and their constraints to:

• detect errors in user’s input

• suggest additional steps needed to make the pathway work

• make educated guesses about how components selected by the user may be connected to one another
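
To make the first two bullets concrete, here is a small Python sketch of checking a pathway against component descriptions. It is not DOCKER's implementation, which these slides do not describe. The `Component` class and the functions `missing_inputs` and `suggest_producers` are hypothetical, and only three of the deck's components are modeled, using the input and output names shown elsewhere in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A pathway step described only by the data types it consumes and produces."""
    name: str
    inputs: frozenset
    outputs: frozenset


GEOCODER = Component("Geocoder", frozenset({"Address"}), frozenset({"Lat/long"}))
CVM = Component("Community Velocity Model", frozenset({"Lat/long"}), frozenset({"Vs30"}))
ATTENUATION = Component(
    "Attenuation Relationship (Field-2000)",
    frozenset({"Fault-type", "Magnitude", "Vs30", "Distance"}),
    frozenset({"PGA"}),
)
LIBRARY = [GEOCODER, CVM, ATTENUATION]


def missing_inputs(pathway, available):
    """Walk the pathway in order and report, per component, the inputs nobody supplies."""
    known = set(available)
    problems = {}
    for comp in pathway:
        lacking = comp.inputs - known
        if lacking:
            problems[comp.name] = set(lacking)
        known |= comp.outputs  # later steps may use this component's outputs
    return problems


def suggest_producers(data_type, library):
    """Name the library components that could supply a missing data type."""
    return [c.name for c in library if data_type in c.outputs]


# The user starts from an address and picks two components, forgetting the geocoder.
user_data = {"Address", "Fault-type", "Magnitude", "Distance"}
errors = missing_inputs([CVM, ATTENUATION], user_data)
fix = suggest_producers("Lat/long", LIBRARY)
repaired = missing_inputs([GEOCODER, CVM, ATTENUATION], user_data)
```

Here `errors` reports that the Community Velocity Model lacks `Lat/long`, `fix` names the Geocoder as a step that could supply it, and `repaired` is empty once that step is added.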

Page 16:

Compute PGA for an Address Using These Components

[Diagram of the available components and their data ports]

• Earthquake Forecast Model (USGS-02): Fault-type, Magnitude, Lat/long, Time Span

• Geocoder: Address, Lat/long

• Community Velocity Model: Lat/long, Vs30

• Distance Computation: Lat/long1, Lat/long2, Distance

• Attenuation Relationship (Field-2000): Fault-type, Magnitude, Vs30, Distance, PGA

• Attenuation Relationship (Campbell-02): Fault-type, Magnitude, Site Type, Distance, PGA

Page 17:

Some Data Paths Connect Easily

[Diagram: Earthquake Forecast Model (USGS-02), Geocoder, Community Velocity Model, Distance Computation and Attenuation Relationship (Field-2000), shown with their data ports]

Page 18:

Others Require Transformation

[Diagram: Earthquake Forecast Model (USGS-02), Geocoder, Community Velocity Model, Distance Computation and Attenuation Relationship (Field-2000), shown with their data ports]

Page 19:

Developing Ontologies

Page 20:

SCEC Ontology Development

Task-driven

• Particular application

• Modeled on domain inferences & reasoning

Small team of Computer Scientists

• Seismology - Tom Russ

• Models - Jihie Kim, Varun Ratnakar, Tom Russ

Small group of Domain Experts

• Ned Field and Tom Jordan

Future

• Development and curation by domain experts

• Requires methodology

• Requires tools

Page 21:

Capture Inference in Ontology

Ned Field’s markup of fault parameter data

Computation and checking of properties

Definitions of Terms

Page 22:

The Gene Ontology (GO)

Had a successful jumpstart

Done by biologists, not knowledge engineers

Developed by a wide, distributed community

Focused on specific aspects of genomics

• FlyBase, yeast, mouse

Used 24/7 from day 1

Accepted widely by the community

Extended based on use requirements of a wide community

Quite large (30-40K terms)

Page 23:

Jumpstart of GO: Key Decisions (1)

Limited scope

• Limit domain, though it could have included many, many more areas

– Did not let anyone else in until they got somewhere

– Added new groups incrementally (10)

• 3 related areas open (no licenses), use open standards

Involve the community

Had to develop own software

• Control over own code

• KISS: keep it simple, stupid

– E.g., only two relations

– Transitivity
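
Keeping to a small number of transitive relations makes the basic inference very simple: following one relation upward from a term reaches everything it implies. The Python sketch below shows this for a toy graph. The slide does not name the two relations, so the labels `is-a` and `part-of`, the example terms and the function `ancestors` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy ontology as (child, relation, parent) triples.
EDGES = [
    ("nucleolus", "part-of", "nucleus"),
    ("nucleus", "part-of", "cell"),
    ("DNA helicase", "is-a", "helicase"),
    ("helicase", "is-a", "enzyme"),
]


def ancestors(term, relation, edges):
    """Return every term reachable from `term` by following `relation` transitively."""
    parents = defaultdict(set)
    for child, rel, parent in edges:
        if rel == relation:
            parents[child].add(parent)

    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

For example, `ancestors("nucleolus", "part-of", EDGES)` contains both `nucleus` and `cell`, although only the first link is stated directly. The sketch follows one relation at a time and does not attempt inferences that combine the two.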

Page 24:

Key Decisions (2)

Use it from the beginning

• If you wait to have the ontology finished before using it, you will never get there

• Errors would only be discovered through use

• Set things up so that you are OK when you have to fix those errors (entire chunks of ontology had to be entirely redone)

• Minimized the impact of changes: most changes are to relations, which in practice does not affect the annotations

Face-to-face meetings 3-4 times a year

Satisfied a need for DB users that wanted to ask complex queries (1 query to all DBs)

Establish migration path

Page 25:

Key Decisions (3)

Requests are resolved either:

• Immediately

• Over email, if closure can be reached within 2-3 days

– No voting, only consensus

• On the agenda for the next meeting

Attribution was important

• Learned that from FlyBase

• Both GO content and annotations are annotated with attribution

Unique identifiers within GO

• The term can change as a lexical string, but with no change in meaning there is no change in identifier

• If the definition changes but the GO string does not, the id changes

• Small number of relations
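
The identifier policy above (an identifier follows the meaning, not the spelling) can be sketched as a small registry. This is an illustrative Python sketch, not GO's software: the class `TermRegistry`, its methods and the `T:0000001` identifier format are hypothetical. The slides do not say what happens to the old identifier after a redefinition, so the sketch simply leaves the old record in place.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TermRegistry:
    """Stores terms as identifier -> (name, definition)."""
    terms: dict = field(default_factory=dict)
    _next: count = field(default_factory=lambda: count(1))

    def add(self, name: str, definition: str) -> str:
        term_id = f"T:{next(self._next):07d}"
        self.terms[term_id] = (name, definition)
        return term_id

    def rename(self, term_id: str, new_name: str) -> str:
        """Change only the lexical string: same meaning, same identifier."""
        _, definition = self.terms[term_id]
        self.terms[term_id] = (new_name, definition)
        return term_id

    def redefine(self, term_id: str, new_definition: str) -> str:
        """Change the meaning: the name is kept but a new identifier is issued."""
        name, _ = self.terms[term_id]
        return self.add(name, new_definition)


registry = TermRegistry()
first = registry.add("fault slip", "original definition")
same = registry.rename(first, "fault-slip")
second = registry.redefine(first, "revised definition")
```

After these calls `same` equals `first`, while `second` is a different identifier whose record carries the current name with the revised definition. Annotations that point at `first` therefore keep the meaning they were made against.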

Page 26:

Fundamental Ontologies

What is out there? Not much.

• Ontolingua (Stanford University) has a number of small component ontologies

– Designed as components

– Not tied to applications

• DAML is working on fundamental physics ontologies (Jerry Hobbs, SRI International, ISI, Ken Forbus, others)

– Time

– Space

• We would like input from GEON!

Page 27:

Some BIG Questions (from Gene Ontology Workshop)

How do you get started?

How to ensure the community will accept it (use it)?

How do you (can you?) represent alternative views?

What is the process to contribute to it?

What is the process to make changes to it?

What happens when there is an update?

How is it implemented? What tools?

How is it managed?

Who does what, when, where, why?