phyloinformatics and the semantic web

Phyloinformatics and the Semantic Web Rutger Vos

Upload: rutger-vos

Post on 18-Dec-2014

DESCRIPTION

Departmental seminar to the University of Bath, 21 March 2011.

TRANSCRIPT

Page 1: Phyloinformatics and the Semantic Web

Phyloinformatics and the Semantic Web

Rutger Vos

Page 2: Phyloinformatics and the Semantic Web

Outline

• What is phyloinformatics and why should you care?

• How we got here and where we are now

• How the semantic web can help

• Projects that apply the semantic web to phyloinformatics

• Examples of linked data

• Where to next

Page 3: Phyloinformatics and the Semantic Web

What is Phyloinformatics?

Phylogenetics: “The systematic study of organism relationships based on evolutionary similarities and differences.”

Informatics: “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”

Page 4: Phyloinformatics and the Semantic Web

Why should you care?

Firstly, “Nothing in evolution makes sense except in the light of phylogeny”

Surely, “gathering, manipulating, storing, retrieving and classifying” such information is worthwhile?

But if that doesn’t convince you…

Page 5: Phyloinformatics and the Semantic Web

As a consumer of phylogenetic data

The “New Biology” is coming: “Major advances will take place via integration and synthesis, rather than decomposition and reduction” (Committee on a New Biology for the 21st Century, 2009)

Presumably, this will involve retrieving and classifying.

Page 6: Phyloinformatics and the Semantic Web

As a consumer of phylogenetic data

Or maybe for you phylogeny is simply a nuisance:
– Functional prediction
– Comparative analysis
– Ortholog finding
– Etc.

But it would still be nice to have that out of the way painlessly…

Page 7: Phyloinformatics and the Semantic Web

As a producer of phylogenetic data

• Many journals require proper storage of data described in a manuscript.

• Funding agencies require dissemination and sharing of research results.

Page 8: Phyloinformatics and the Semantic Web

The Past

• Everything was closed:
– Idiosyncratic, private data
– “pay-walls”
– Closed source software

No accessible publishing medium

Page 9: Phyloinformatics and the Semantic Web

The Present

Science is opening up:
– Open data
– Open access publishing
– Open source software

Publishing is now accessible to everyone, online

Page 10: Phyloinformatics and the Semantic Web

Our current nightmare

Documents, documents everywhere

Page 11: Phyloinformatics and the Semantic Web

The current web makes sense to us

Page 12: Phyloinformatics and the Semantic Web

But not to a machine

Page 13: Phyloinformatics and the Semantic Web

What was informatics again?

“The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”

Page 14: Phyloinformatics and the Semantic Web
Page 15: Phyloinformatics and the Semantic Web

This is too hard

• O. R. P. Bininda-Emonds, M. Cardillo, K. E. Jones, R. D. E. MacPhee, R. M. D. Beck, R. Grenyer, S. A. Price, R. A. Vos, J. L. Gittleman and A. Purvis, 2007. The delayed rise of present-day mammals. Nature 446: 507-512.

Page 16: Phyloinformatics and the Semantic Web

Let’s delegate that

Page 17: Phyloinformatics and the Semantic Web

Instead of linked documents

Page 18: Phyloinformatics and the Semantic Web

A web of linked concepts

Page 19: Phyloinformatics and the Semantic Web

Concepts connected by statements

Page 20: Phyloinformatics and the Semantic Web

Concepts are defined in ontologies

“An ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.”

Page 21: Phyloinformatics and the Semantic Web

Expressing concepts in data syntax

Page 22: Phyloinformatics and the Semantic Web

Concepts are linked

Linked by statements called “triples”

A triple is a statement: subject, predicate, object

Any part of a triple may have to be uniquely identifiable. For this we use URLs.
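
As a minimal sketch of this idea in plain Python (not any particular RDF library), a triple can be modelled as a three-field record whose subject and predicate are URLs. The `Triple` type and the `is_url` helper are illustrative assumptions, and the example.org addresses are placeholders.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """One statement: subject --predicate--> object."""
    subject: str                # URL identifying the thing described
    predicate: str              # URL identifying the relationship
    obj: Union[str, float]      # another URL, or a literal value


def is_url(value) -> bool:
    """Crude check that a value is an http(s) URL rather than a literal."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))


statement = Triple(
    subject="http://example.org/data/tree1",
    predicate="http://example.org/terms/hasLikelihood",
    obj=2342.323,
)

# Subject and predicate are globally identifiable; the object here is a literal.
print(is_url(statement.subject), is_url(statement.predicate), is_url(statement.obj))
```

Keeping the object as either a URL or a literal reflects that a statement can point at another concept or simply carry a value.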

Page 23: Phyloinformatics and the Semantic Web

An applied example

Triple 1
Subject: <http://example.org/data/tree1>
Predicate: <http://example.org/terms/hasLikelihood>
Object: 2342.323
i.e. -lnL(tree1) = 2342.323

Triple 2
Subject: <http://example.org/data/tree2>
Predicate: <http://example.org/terms/hasLikelihood>
Object: 2341.184
i.e. -lnL(tree2) = 2341.184

Page 24: Phyloinformatics and the Semantic Web

What’s the better tree?

• The ontology defines what a likelihood is and how to compare negative log likelihoods.

• Hence, automated reasoning can conclude that tree2 is the better tree.
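
The reasoning step can be imitated in a few lines of Python. This is only a sketch: a real reasoner would derive the comparison rule from the ontology, whereas here the rule (a smaller negative log likelihood means a higher likelihood) is hard-coded in a hypothetical `better_tree` function.

```python
HAS_LIKELIHOOD = "http://example.org/terms/hasLikelihood"

# (subject, predicate, object) statements; the objects are
# negative log likelihoods, -lnL.
triples = [
    ("http://example.org/data/tree1", HAS_LIKELIHOOD, 2342.323),
    ("http://example.org/data/tree2", HAS_LIKELIHOOD, 2341.184),
]


def better_tree(statements):
    """Return the subject with the smallest -lnL, i.e. the highest likelihood."""
    scored = [(s, o) for s, p, o in statements if p == HAS_LIKELIHOOD]
    if not scored:
        raise ValueError("no hasLikelihood statements found")
    return min(scored, key=lambda pair: pair[1])[0]


best = better_tree(triples)
print(best)
```

Because the values are negative log likelihoods, the minimum is the right choice; with plain log likelihoods the comparison would be reversed.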

Page 25: Phyloinformatics and the Semantic Web

URLs for phylogenetics

PhyloWS doesn’t just provide an anchor to identify phylogenetic data, it also enables searching and retrieval.

Page 26: Phyloinformatics and the Semantic Web

The EvoInfo “stack”

Page 27: Phyloinformatics and the Semantic Web

TreeBASE

Page 28: Phyloinformatics and the Semantic Web

External links

Taxon

Taxon variant

Study

Page 29: Phyloinformatics and the Semantic Web

A simple example

TreeBASE maps to uBio using skos:closeMatch...

…and uBio to ToL using gla:mapping
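
A small Python sketch of how such mappings can be chained. The identifiers below are made-up placeholders (example.org URLs), not real TreeBASE, uBio or ToL records, and the `follow` helper is hypothetical; only the two predicate names come from the slide.

```python
# Hypothetical statements: (subject, predicate, object).
triples = [
    ("http://example.org/treebase/taxon/1", "skos:closeMatch",
     "http://example.org/ubio/namebank/2"),
    ("http://example.org/ubio/namebank/2", "gla:mapping",
     "http://example.org/tol/node/3"),
]


def follow(start, predicates, statements):
    """Walk from `start` along the given predicates, in order.

    Returns the node reached, or None if a link is missing.
    """
    current = start
    for predicate in predicates:
        targets = [o for s, p, o in statements if s == current and p == predicate]
        if not targets:
            return None
        current = targets[0]  # take the first match; real data may have several
    return current


tol_node = follow(
    "http://example.org/treebase/taxon/1",
    ["skos:closeMatch", "gla:mapping"],
    triples,
)
print(tol_node)
```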

Page 30: Phyloinformatics and the Semantic Web

Another Example, UniProt sequences

TreeBASE stores NCBI taxonomy identifiers

Standard tools can rewrite these linkout URLs

Result is a corresponding list of UniProt records
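
A rough sketch of such a rewrite in Python, assuming that the linkout is an NCBI Taxonomy browser URL carrying the identifier in an `id` query parameter and that UniProt exposes taxa under `/taxonomy/<id>`. Both URL patterns and the `ncbi_to_uniprot` helper are assumptions made for illustration, not something the slides specify.

```python
from urllib.parse import urlparse, parse_qs

# Assumed target pattern; adjust to whatever the receiving service expects.
UNIPROT_TAXON_URL = "https://www.uniprot.org/taxonomy/{taxon_id}"


def ncbi_to_uniprot(linkout_url: str) -> str:
    """Rewrite an NCBI Taxonomy linkout URL to a UniProt taxonomy URL."""
    query = parse_qs(urlparse(linkout_url).query)
    ids = query.get("id", [])
    if len(ids) != 1 or not ids[0].isdigit():
        raise ValueError(f"no single numeric taxon id in {linkout_url!r}")
    return UNIPROT_TAXON_URL.format(taxon_id=ids[0])


# 9606 is the NCBI taxonomy identifier for Homo sapiens.
example = "http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606"
print(ncbi_to_uniprot(example))
```

The point is that a shared identifier makes the rewrite a pure string operation; no lookup table is needed.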

Page 31: Phyloinformatics and the Semantic Web

Another Example, Geocoding

TreeBASE uses DarwinCore for lat/lon annotations

Page 32: Phyloinformatics and the Semantic Web

Many online data repositories

Page 33: Phyloinformatics and the Semantic Web

Challenges

• Fragile: many services offline in Japan

• Data gets bigger and bigger

• Many concepts not yet in ontologies

• Many data still “locked in” in publications

Page 34: Phyloinformatics and the Semantic Web

The Future

Page 35: Phyloinformatics and the Semantic Web

The cloud

• Software will be run on a number of “virtual” platforms (Amazon, Google apps, Yahoo)

• Data will be stored in the cloud (Big Table, FreeBase)

Page 36: Phyloinformatics and the Semantic Web

Interpreting locked in knowledge

• Text and images meant for humans are being processed by machines. Examples:
– Taxon name mining (BHL)
– Gene name and function mining
– Tree figure processing
– Automated annotation

Page 37: Phyloinformatics and the Semantic Web

Summary

• Phyloinformatics is moving from closed to open to linked data

• Concepts and syntax are increasingly formalized and machine readable

• Automated queries across integrated resources will enable synthetic research

• Still lots to do to deploy these technologies and unlock legacy data

Page 38: Phyloinformatics and the Semantic Web

Acknowledgements

Thank you for your attention!

Also, many thanks to:

The Pagel lab at UoR

The EvoInfo group
Val Tannen
Wayne Maddison
William Piel
Hilmar Lapp
Arlin Stoltzfus