Spire News. Joel Sachs, [email protected]. Spire: Semantic Prototypes in Ecoinformatics. UMBC Ebiquity...
Spire: Semantic Prototypes in Ecoinformatics
UMBC Ebiquity
UMD MIND SWAP
NASA GSFC
RMBL PEaCE
UC Davis ICE
NBII
Semantic Web Tools
Agents
Information Retrieval
Invasive Species Forecasting System
Remote Sensing Data
Food Webs
Semantic CAIN
Ontology Development
Dissemination
Prototype applications
Infrastructure
Ontology of Ecological Interaction
Overview of Talk
• What (and why) is the semantic web?
– History
– The tragic legacy of ontologies
– Hope for the future
• Some Spire achievements
– Elvis, Ethan, Swoogle, Tripleshop, RDF123
• Semantic Eco-blogging
– Spotter, Splickr, Fieldmarking
– Bioblitzes
• Linked Data
– Why? How?
• A tiny data browsing demo
Semantic Web?
• The Semantic Web arose out of a confluence of 3 communities.
– Hypertext; AI; Electronic publishing
• The AI component achieved early dominance.
– Knowledge representation; Ontologies; First order logic, etc.
• This was exciting for some, and confounding for others.
The next 3 slides are from “The Suggested Upper Merged Ontology (SUMO) at Age 7: Progress and Promise”, by Adam Pease
High Level Distinctions
The first fundamental distinction is that between ‘Physical’ (things which have a position in space/time) and ‘Abstract’ (things which don’t)
Entity
Physical Abstract
High Level Distinctions
Partition of ‘Physical’ into ‘Objects’ and ‘Processes’
Physical
Object Process
Processes
DualObjectProcess: Substituting, Transaction, Comparing, Attaching, Detaching, Combining, Separating
InternalChange: BiologicalProcess, QuantityChange, Damaging, ChemicalProcess, SurfaceChange, Creation, StateChange, ShapeChange
IntentionalProcess: IntentionalPsychologicalProcess, RecreationOrExercise, OrganizationalProcess, Guiding, Keeping, Maintaining, Repairing, Poking, ContentDevelopment, Making, Searching, SocialInteraction, Maneuver
Motion: BodyMotion, DirectionChange, Transfer, Transportation, Radiating
Interoperability through Simplicity
Spire So far: Ontologies
• “The Big Experiment”
– A collection of linked ontologies enabling highly detailed descriptions of ecological interaction.
– Supports WoW - Webs on the Web
• SpireEcoConcepts
– Medium size. Used for expressing trophic links and related information, including bibliographic info on studies.
• ETHAN
– Evolutionary trees and natural history.
– Huge.
• Observation ontology
– For semantic eco-blogging.
– Tiny.
• Invasives ontology
– Lightweight and extensible in the most trivial of manners.
ETHAN Engineering
• The semantics behind an arbitrary relation can often be expressed using the rdfs:subClassOf relation, as opposed to defining a new rdf:Property. Doing so has a number of benefits:
– It seems to be more computationally efficient. (We have no hard evidence for this, yet.)
– It makes it easy to introduce a new concept, especially in a distributed manner. (See our discussion of conservation information below.)
– It leads to fewer disagreements among scientists and, therefore, a greater chance of ontology adoption. (We have anecdotal evidence for this.)
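To make the subclass idea concrete, here is a minimal sketch in which a conservation status is modelled as a class that a taxon is declared a subclass of, rather than as the value of a dedicated property. Every name in it (`ex:SomeSpecies`, `ex:ThreatenedTaxon`, and so on) is hypothetical and not taken from ETHAN, and triples are plain Python tuples rather than real RDF.

```python
SUBCLASS = "rdfs:subClassOf"

# Hypothetical triples, invented for illustration (not from ETHAN).
TRIPLES = {
    ("ex:SomeSpecies", SUBCLASS, "ex:SomeGenus"),
    ("ex:SomeGenus", SUBCLASS, "ex:SomeFamily"),
    # A second publisher can add a new concept with one more subclass
    # statement, without defining or agreeing on any new property.
    ("ex:SomeSpecies", SUBCLASS, "ex:ThreatenedTaxon"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == SUBCLASS and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


print(sorted(superclasses("ex:SomeSpecies", TRIPLES)))
```

A single transitive walk answers both "what is its taxonomy?" and "what is its status?", which is one way to read the claim about introducing concepts in a distributed manner.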
A Brief Tour of Some Relevant Ontologies
• http://spire.umbc.edu/ontologies/InvasivesOntology.owl
• http://spire.umbc.edu/ontologies/lists/
• http://spire.umbc.edu/ontologies/lists/USFWSInjuriousAnimals.owl
• http://spire.umbc.edu/ontologies/lists/Cal-IPC.owl
Spire So far …
• ELVIS
– A suite of tools motivated by the belief that food web structure plays a role in determining the success or failure of potential species invasions.
– Species List Constructor.
  • Give a location, get a species list.
– Food Web Constructor.
  • Give a species list, get a food web.
– Evidence Provider.
  • Drill down on a predicted trophic link, and see evidence for and against the existence of that link.
  • This illustrates our general attitude of moving away from “answer providers” to “evidence providers”.
Bacteria
Microprotozoa
Amphithoe longimana
Caprella penantis
Cymadusa compta
Lembos rectangularis
Batea catharinensis
Ostracoda
Melanitta
Tadorna tadorna
ELVIS: Ecosystem Localization, Visualization, and Information System
Oreochromis niloticus (Nile tilapia)
??
. . .
Species list constructor
Food web constructor
Food Web Constructor
Predict food web links using database and taxonomic reasoning.
In a new estuary, Nile tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.
Food Web Constructor generates possible links
Evidence provider gives details
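One way to picture "database and taxonomic reasoning" is sketched below. The slides do not spell out the ELVIS algorithm, so this is an assumption: a link between two species is proposed when the database records a link between their taxonomic relatives, and the supporting records are returned as evidence rather than a bare yes/no answer. All taxa and records are invented placeholders.

```python
# Toy data, invented for illustration only.
PARENT = {  # taxon -> parent taxon
    "species_a1": "genus_a", "species_a2": "genus_a",
    "species_b1": "genus_b", "species_b2": "genus_b",
}
KNOWN_LINKS = {("species_a1", "species_b1")}  # recorded (predator, prey) pairs


def lineage(taxon):
    """Return the taxon followed by all of its ancestors."""
    chain = [taxon]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def evidence_for(predator, prey):
    """List recorded links whose predator and prey each share a taxon
    with the queried pair (a direct record matches trivially)."""
    pred_line, prey_line = set(lineage(predator)), set(lineage(prey))
    return [
        (p, q) for (p, q) in sorted(KNOWN_LINKS)
        if pred_line & set(lineage(p)) and prey_line & set(lineage(q))
    ]


def food_web(species):
    """Map each candidate (predator, prey) pair to its supporting records."""
    web = {}
    for a in species:
        for b in species:
            if a != b:
                records = evidence_for(a, b)
                if records:
                    web[(a, b)] = records
    return web


print(food_web(["species_a2", "species_b2"]))
```

Keeping the supporting records attached to each predicted link is what lets a user drill down and judge the prediction for themselves.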
So far: Integration
• Swoogle
– Google for the semantic web.
– Crawls and indexes RDF documents.
– Computes metadata, including “ontoRank”.
• Tripleshop
– A SPARQL query engine.
  • Leave out the FROM clause.
  • Data comes from Swoogle.
– Semi-automatic dataset constructor
– Our main platform for integration
Google has made us smarter
But what about our agents?
tell
register
Agents still have a very minimal understanding of text and images.
By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size.
80 ontologies were found that had these three terms
Let’s look at this one
Basic Metadata
hasDateDiscovered: 2005-01-17
hasDatePing: 2006-03-21
hasPingState: PingModified
type: SemanticWebDocument
isEmbedded: false
hasGrammar: RDFXML
hasParseState: ParseSuccess
hasDateLastmodified: 2005-04-29
hasDateCache: 2006-03-21
hasEncoding: ISO-8859-1
hasLength: 18K
hasCntTriple: 311.00
hasOntoRatio: 0.98
hasCntSwt: 94.00
hasCntSwtDef: 72.00
hasCntInstance: 8.00
These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace.
All of this is available in RDF form for the agents among us.
Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.
10K terms associated with “person”! Ordered by use.
Let’s look at foaf:Person’s metadata
UMBC Triple Shop
• http://sparql.cs.umbc.edu/tripleshop2
• Online SPARQL RDF query processing based on HP’s Jena and Joseki, with several interesting features
• Selectable level of inference over model
• Automatically finds SWDs for given queries using Swoogle backend database
– Provides dataset creation wizard
– Dataset can be stored on our server or downloaded
– Tag, share and search over saved datasets
Who knows Anupam Joshi?
Show me their names, email address and pictures
The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles
No FROM clause!
Constraints on where the data comes from
Swoogle found 292 RDF data files that appear relevant to answering our query
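The query shown on the slide is not reproduced in this transcript, so the sketch below is a plausible reconstruction using standard FOAF terms, not the original. It also illustrates the "no FROM clause" idea: the query is written without a dataset, and FROM clauses are added once candidate documents have been found. The helper `with_dataset` and the example URLs are hypothetical and are not part of Tripleshop.

```python
# A SPARQL query with no FROM clause: it says what is wanted, not where from.
QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name ?email ?picture
WHERE {
  ?target foaf:name "Anupam Joshi" .
  ?person foaf:knows ?target ;
          foaf:name  ?name .
  OPTIONAL { ?person foaf:mbox ?email }
  OPTIONAL { ?person foaf:depiction ?picture }
}
"""


def with_dataset(query, urls):
    """Insert one FROM clause per URL just before the first WHERE."""
    head, keyword, tail = query.partition("WHERE")
    if not keyword:
        raise ValueError("query has no WHERE clause")
    from_lines = "".join(f"FROM <{url}>\n" for url in urls)
    return head + from_lines + keyword + tail


# Hypothetical search results standing in for documents found by a crawler.
found = ["http://example.org/foaf/a.rdf", "http://example.org/foaf/b.rdf"]
print(with_dataset(QUERY, found))
```

Separating the query from its dataset is what allows the dataset to be saved, tagged, and reused with other queries.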
Let’s save the dataset before we use it
And tag it so we and others can find it more easily.
He has many friends!
Semantic Eco-Blogging: Some Background
1/3 of all new web content is user generated
• Scientific data is increasingly a part of Web 2.0/3.0
• How easy can we make semantic annotation?
Climate change drives ecological change
• Alters species distribution
Wuethrich, B. How Climate Change Alters Rhythms of the Wild. Science 287 (5454), 793 (4 February 2000).
• Drives evolution
Bradshaw, W. E., and Holzapfel, C. M. 2001. Genetic shift in photoperiodic response correlated with global warming. Proc. Nat. Acad. Sci. USA 98:14509-14511.
Semantic Eco-blogging
• Eco-blogs are popping up all over the place.
– Bloggers are both amateur nature-lovers and working biologists.
• “On April 24 in Washington DC, I saw a leopard slug. Here’s a picture.”
• These observations are, potentially, an important part of the ecological record.
– “What was the earliest sighting of a robin hatching?”
– “What was the Northernmost sighting of the Asian Longhorn Beetle?”
– Etc.
• System concept: global human sensor net.
• SPOTTER
– A Firefox plugin for creating OWL from field observations.
– Spotter map lets you see all “spots”.
– Being tested at http://ebiquity.umbc.edu/fieldmarking/ and other blogs near you.
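To show why structured observations matter, the sketch below turns blog-style sightings into records and answers the two kinds of question quoted above. It does not reflect Spotter's actual OWL output: the `Observation` fields are assumptions, and the records (including the years and coordinates) are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    taxon: str
    seen_on: date
    latitude: float   # decimal degrees, north positive
    longitude: float  # decimal degrees, east positive
    note: str = ""


# Invented records; only the first loosely mirrors the slide's example.
OBSERVATIONS = [
    Observation("Limax maximus", date(2007, 4, 24), 38.9, -77.0, "leopard slug, with picture"),
    Observation("Limax maximus", date(2007, 3, 30), 43.7, -79.4),
    Observation("Turdus migratorius", date(2007, 5, 2), 57.05, -135.33),
]


def sightings(observations, taxon):
    """All observations of the given taxon."""
    return [o for o in observations if o.taxon == taxon]


def earliest(observations, taxon):
    """Earliest sighting of the taxon, or None if it was never seen."""
    return min(sightings(observations, taxon), key=lambda o: o.seen_on, default=None)


def northernmost(observations, taxon):
    """Sighting with the greatest latitude, or None if never seen."""
    return max(sightings(observations, taxon), key=lambda o: o.latitude, default=None)


print(earliest(OBSERVATIONS, "Limax maximus"))
print(northernmost(OBSERVATIONS, "Limax maximus"))
```

Once the taxon, date, and location are explicit fields rather than prose, such questions reduce to simple filters over the pooled records.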
You can download Spotter at http://spire.umbc.edu/spotter
Try it out, and then view your observations on the Spotter map:
http://spire.umbc.edu/spotter/spotterMap.php
The Blogger Bioblitz
• Bioblitz: a 24 hour inventory of all living things in a given area.
– Dual aims of establishing degree of biodiversity and popularizing science.
• The recent Blogger bioblitz.
– 17 bloggers from Sitka, Alaska; Greece; Toronto; Santa Cruz; DC; etc.
• 1200 observations.
• Tripleshop was able, by combining the observations with background data, to respond to a number of ad-hoc queries.
– E.g. “Show all observations of species listed as being either invasive or injurious.” resulted in 47 hits.
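The ad-hoc query above is essentially a join between observations and background species lists. A minimal sketch of that join follows; it uses plain Python collections instead of SPARQL over RDF, and the observers, taxa, and list contents are placeholders, not bioblitz data.

```python
# Placeholder data, invented for illustration.
OBSERVATIONS = [
    ("blogger_1", "taxon_a"),
    ("blogger_2", "taxon_b"),
    ("blogger_3", "taxon_c"),
]
BACKGROUND_LISTS = {
    "invasive": {"taxon_a", "taxon_c"},
    "injurious": {"taxon_c"},
}


def listed_observations(observations, lists, wanted=("invasive", "injurious")):
    """Return (observer, taxon, matching list names) for every observation
    whose taxon appears on at least one of the wanted lists."""
    hits = []
    for observer, taxon in observations:
        matched = sorted(name for name in wanted if taxon in lists.get(name, set()))
        if matched:
            hits.append((observer, taxon, matched))
    return hits


for hit in listed_observations(OBSERVATIONS, BACKGROUND_LISTS):
    print(hit)
```

The observers never had to know about the lists; the connection is made afterwards through the shared taxon identifier.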
Splickr
• Flickr has been handling geotagged pictures since August 2006.
• Roughly 30 million geotagged photos in the first year.
– 2.1 million so far this month.
• Splickr is a Flickr/Yahoo maps mashup that makes it easy to find pictures of particular species in a given area.
– All data gets represented in OWL.
RDF123
• A flexible and graphical means to map from spreadsheets to RDF
• The mapping is stored as an OWL file
• An RDF123 webservice takes a Google spreadsheet and a map as input, and outputs RDF.
• So you can do all your work, collaboratively, in the spreadsheet, and you never have to export to RDF!
Taxonomy for biologists is a little bit tricky. Columns A-F (Phylum, Class, Order, Family, Genus, Species) follow a rule:
i. If there is a value for Column F (Species), then the values of Columns E (Genus) and F should be joined with an underscore, and mapped to ob#hasTaxon.
ii. If there is no value for Column F, then the rightmost column, amongst columns A-E, that has a value gets mapped to ob#hasTaxon.
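The two-part rule can be written out as a small function. This is a sketch of the rule as stated, not RDF123's own mapping language; the function name `taxon_for_row` and the dictionary row format are assumptions made for the example.

```python
# Columns A-F of the spreadsheet, in order.
RANKS = ["Phylum", "Class", "Order", "Family", "Genus", "Species"]


def taxon_for_row(row):
    """Return the value to map to ob#hasTaxon for one spreadsheet row.

    row is a dict from column name to cell value; missing or blank
    cells are treated as having no value.
    """
    cells = [str(row.get(rank) or "").strip() for rank in RANKS]
    genus, species = cells[4], cells[5]
    if species:
        # Rule i: Species present, so join Genus and Species with "_".
        return f"{genus}_{species}"
    # Rule ii: otherwise take the rightmost filled column among A-E.
    for value in reversed(cells[:5]):
        if value:
            return value
    return None  # nothing filled in; the stated rule does not cover this case


print(taxon_for_row({"Genus": "Limax", "Species": "maximus"}))
print(taxon_for_row({"Phylum": "Mollusca", "Family": "Limacidae"}))
```

The rule as stated does not say what to do when Species is filled but Genus is blank; the sketch simply applies rule i as written in that case.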
Eco-Blogging: Next steps
• Make every bioblitz a blogger bioblitz
– Use RDF123
– Rock Creek, MD and LA county coming up
• Drop-down invasives lists in Splickr
– E.g. find all photos in Europe of species on the “Worst Invaders of Europe” list
• Mining other sources
– E.g. birdwatcher listservs
• Making semantic eco-blogging easier
– We will continue to work with children.
• Aggressively pursue a Linked Data approach.
A Few Words on Linked Data
• “Linked Data on the Web” is a collection of best practices for publishing data on the semantic web.
– Distinguishing between information and non-information resources.
– 303 redirects and content negotiation.
– HTTP URIs for everything on Earth.
– owl:sameAs
• It is also, to an extent, a rebranding of the semantic web.
– Much more emphasis on links amongst datasets.
– Much less emphasis on formal semantics.
• Linked data can be browsed, in much the same way we browse the traditional web.
– So we can find data either by searching for it (with Swoogle/Tripleshop) or by surfing our way to it.
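The 303-redirect practice can be sketched as a pure function: a request for a non-information resource (the thing itself) is answered with "303 See Other", pointing at a document about the thing, chosen by the client's Accept header. The `/resource/`, `/data/`, and `/page/` path layout is an assumed convention for this example, and the Accept parsing is deliberately simplified (q-values are ignored).

```python
RDF_TYPE = "application/rdf+xml"


def prefers_rdf(accept):
    """True if the Accept header lists RDF/XML (q-values are ignored)."""
    media_types = [part.split(";")[0].strip() for part in accept.split(",")]
    return RDF_TYPE in media_types


def respond(path, accept=""):
    """Return (status, headers) for a GET request.

    /resource/X names a thing and cannot be served directly, so the
    client is redirected to a document describing it.
    """
    if path.startswith("/resource/"):
        name = path[len("/resource/"):]
        target = f"/data/{name}" if prefers_rdf(accept) else f"/page/{name}"
        return 303, {"Location": target, "Vary": "Accept"}
    if path.startswith(("/data/", "/page/")):
        return 200, {}  # an information resource: serve the document itself
    return 404, {}


print(respond("/resource/Limax_maximus", "application/rdf+xml"))
print(respond("/resource/Limax_maximus", "text/html"))
```

A browser and a software agent can thus follow the same URI and each arrive at a representation they can use, which is what makes the data browsable.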
Some Context
• Before search engines, we found things on the web by browsing.
• Browsing still has its charms.
– And benefits.
• On the semantic web:
– One way to build a dataset: Swoogle/Tripleshop
– Another: data browsing …
• A “thing-centric” approach.
Other Thoughts and Deeds
• Web 2.0/3.0 is designed for accommodating a multiplicity of perspectives and worldviews.
– Neutrality not required
• Spotter as a general purpose annotation tool?
• Experiment in integrating water quality and invasive species occurrence data.
– EPA, USGS, GBIF, EEA(?)
– SODA
• Pacific Rim data
• New ELVIS: Extinction patterns in Sierra Nevada lakes.
– Invasive trout are causing local extinctions.
– We can compare with model predictions made by our PEaCE lab partners.
GBIF Scenarios
Check out the 3 climate change scenarios (land use, health, and agriculture) from the presentation by Hannu Saarenmaa and Jeremy Kerr at
http://circa.gbif.net/Public/irc/gbif/ict/library?l=/presentations/gbif_scenarios_ppt/_EN_6.0_&a=d
8 Step Scenario Development Process
i. Decide on selected species.
ii. Set criteria for data. (Spans 30 years, georeferenced, etc.)
iii. Investigate data availability. (GBIF, GAP, etc.)
iv. Improve quality and access to data.
v. Choose modeling approach. (E.g. Ecological Niche Modeling with Open Modeller Framework.)
vi. Acquire and transform climate change and environment data.
vii. Execute models.
viii. Present the results.
Could we build a toolkit to ease the “data” steps, i.e. steps ii, iii, iv, and vi?
Acknowledgements
Cynthia Parr
Andriy Parafiynyk
Lushan Han
Rong Pan
Li Ding
David Wang
Tim Finin
NSF
NBII
Some References
For a walk-through of Spotter, Tripleshop, Elvis, or our other tools, email [email protected]
Two relevant papers from our research group:
Adding Semantics to Social Websites for Citizen Science http://ebiquity.umbc.edu/paper/html/id/365/Adding-Semantics-to-Social-Websites-for-Citizen-Science
Using the Semantic Web to Support Ecoinformatics, http://ebiquity.umbc.edu/paper/html/id/319/Using-the-Semantic-Web-to-Support-Ecoinformatics
An introduction to linked data: How to Publish Linked Data on the Web, http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/