Introduction to eol.org for scientists

Post on 01-Sep-2014


DESCRIPTION

A talk given at the Semantic Reasoning workshop held at the National Museum of Natural History September 6, 2012. The audience included computer scientists and biological scientists interested in using EOL for their research.

TRANSCRIPT

@cydparr @eol

Cynthia Parr
Semantic reasoning workshop
Washington, DC, 6-7 September 2012

Introduction to eol.org

Whirlwind tour

• What kind of information we have
• How we assemble that information
• How machines and people interact with EOL
• Next steps

More than 1.1 million taxon pages with content from more than 200 providers and thousands of individuals

5 million content objects

Details tab

Leafy Seadragon example

Total of 1,344,711 images, 9,586 videos, and 28,569 sounds

Maps

Literature

EOL has Global Partners and is internationalized

China

Australia

Dutch

South Africa

Costa Rica

Mexico

Egypt

India

Colombia

Peru

Taiwan

Norway

USA

EOL summarizes knowledge

Erosaria caputserpentis
Serpent's Head Cowrie

Depth range based on 51 specimens in 2 taxa.
Water temperature and chemistry ranges based on 40 samples.

Environmental ranges
Depth range (m): -5 - 67
Temperature range (°C): 23.011 - 28.496
Nitrate (umol/L): 0.048 - 0.923
Salinity (PPS): 33.821 - 35.837
Oxygen (ml/L): 4.349 - 4.825
Phosphate (umol/L): 0.088 - 0.228
Silicate (umol/L): 0.983 - 4.026

From Moorea Biocode

From GBIF

From OBIS

Erosaria caputserpentis
Serpent's Head Cowrie

Salinity envelope (n=40)

From OBIS

Cynthia Parr
Species Pages Group

Global Content Summit, 17-19 Jan 2011

Richness scores

http://eol.org/pages/704102

Whirlwind tour

• What kind of information we have
• How we assemble that information
  – Big picture
  – Subject semantics
  – Names infrastructure
  – Curation
  – Richness score
• How machines and people interact with EOL
• Next steps

EOL aggregates and curates

Scientific databases, including BHL, GBIF, ALA, INBio, COL, Scratchpads, LifeDesks

Scientific journals

Aggregate

eol.org

Curate

Comment, Rate, Collect

Quality control

EOL v2

Plinian Core

DwC description

SPM infoitem

using Darwin Core Archive flat files as transport mechanism

Sharing process adds semantics to content objects

[Bar chart: number of text objects (axis from 0 to 800,000) by subject of text object. Subjects shown: Distribution, MolecularBiology, Multiple topics, TypeInformation, Habitat, ConservationStatus, Threats, Morphology, Conservation, Management, Trends, Size, Associations, Uses, TrophicStrategy, Cyclicity & Life Cycle, PopulationBiology, Reproduction, Migration, Taxonomy, LifeExpectancy, Identification, Behaviour, Ecology, Diseases]

Content objects are associated with taxon names

Wikimedia Commons: Physeter macrocephalus

(note we actually have over 3.3 million named pages)

Names from different providers are matched

Animal Diversity Web .... Physeter catodon Linnaeus, 1758
ARKive .................. Physeter macrocephalus Linné
BioPix .................. Physeter macrocephalus L.
INBio ................... Physeter catodon
IUCN .................... Physeter Macrocephalus
ITIS .................... Physeter macrocephalus Linnaeus, 1758
MarLIN .................. Physeter macrocephalus Linné
NCBI .................... Physeter Catodon
Species 2000 ............ Physeter macrocephalus Linnaeus, 1758
Taxon Concept ........... Physeter australasianus Desmoulins, 1822
Wikimedia Commons ....... Physeter macrocephalus
WORMS ................... Physeter macrocephalus Linnaeus 1758

Physeter macrocephalus
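
The matching step on this slide can be illustrated with a small Python sketch. It is not EOL's names infrastructure: it assumes every string starts with a genus and a species epithet, and it uses a hand-written, hypothetical synonym table (SYNONYMS) to fold Physeter catodon and Physeter australasianus into Physeter macrocephalus, as the slide's grouping implies.

```python
from collections import defaultdict

# Provider name strings copied from the slide.
PROVIDER_NAMES = {
    "Animal Diversity Web": "Physeter catodon Linnaeus, 1758",
    "ARKive": "Physeter macrocephalus Linné",
    "BioPix": "Physeter macrocephalus L.",
    "INBio": "Physeter catodon",
    "IUCN": "Physeter Macrocephalus",
    "ITIS": "Physeter macrocephalus Linnaeus, 1758",
    "MarLIN": "Physeter macrocephalus Linné",
    "NCBI": "Physeter Catodon",
    "Species 2000": "Physeter macrocephalus Linnaeus, 1758",
    "Taxon Concept": "Physeter australasianus Desmoulins, 1822",
    "Wikimedia Commons": "Physeter macrocephalus",
    "WORMS": "Physeter macrocephalus Linnaeus 1758",
}

# Hypothetical synonym table for illustration only; a real system would
# draw synonymy from curated taxonomic sources.
SYNONYMS = {
    "Physeter catodon": "Physeter macrocephalus",
    "Physeter australasianus": "Physeter macrocephalus",
}


def canonical_binomial(name):
    """Keep genus and epithet, drop authorship, and normalise case."""
    genus, epithet = name.split()[:2]
    return f"{genus.capitalize()} {epithet.lower()}"


def group_by_taxon(provider_names, synonyms):
    """Group providers under the accepted name their string resolves to."""
    groups = defaultdict(list)
    for provider, raw_name in provider_names.items():
        canonical = canonical_binomial(raw_name)
        groups[synonyms.get(canonical, canonical)].append(provider)
    return dict(groups)


groups = group_by_taxon(PROVIDER_NAMES, SYNONYMS)
for accepted_name, providers in groups.items():
    print(accepted_name, len(providers))
```

Case and authorship differences ("Physeter Macrocephalus", "Linné", "L.") disappear in the canonical form, but the synonyms only merge because of the lookup table, which is why several names for one taxon remain a curation problem rather than a string-matching one.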

Taxon concept pages: multiple hierarchies on Names tab

Problem: one taxon may have several names


Problem: the same name may apply to more than one taxon

EOL curation

• Trust or untrust taxon associations
• Add new taxon association
• Set preferred hierarchies
• Set preferred common names
• Leave comments

Coming: Taxonomic concept curation

EOL is not Wikipedia

…though we have more than 212,000 Wikipedia articles and 115,000 Wikimedia images

Can’t currently edit within text objects

Whirlwind tour

• What kind of information we have
• How we assemble that information
• How machines and people interact with EOL
  – API
  – Third party apps
  – Collections and communities
• Next steps

EOL enables machine interaction

Curate

Comment, Rate, Collect

eol.org

Aggregate

API

Third party apps

Third party applications eol.org/api

People interact with EOL content & each other

Collections

Communities

Studies currently underway with University of Maryland

• Cross-cultural study on motivation to engage in citizen science – Dana Rotman

• Interaction among scientists and non-scientists on EOL’s social network – Jae-wook Ahn

• Website traffic analysis to aid conservation communication – Yurong He and Bill Fagan

Whirlwind tour

• What kind of information we have
• How we assemble that information
• How machines and people interact with EOL
• Next steps

Using EOL collections to get computable data

Step 1: Search on EOL for organisms with characteristics of interest. Add each one to an EOL collection.
Step 2: Write a program using EOL API methods to retrieve the external database identifiers for the species in that collection.
Step 3: Extend your program with code to retrieve data using external database APIs.
Step 4: Analyze, rinse, repeat.
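
Step 2 might look roughly like the Python sketch below. Only the eol.org/api base address comes from the talk. The URL patterns (collections/1.0, pages/1.0) and the JSON field names (collection_items, object_type, object_id, taxonConcepts, nameAccordingTo, sourceIdentifier) are assumptions that should be checked against the documentation at eol.org/api before use. The download function is passed in as a parameter so the logic can be tried with canned data.

```python
import json
from urllib.request import urlopen

API_ROOT = "http://eol.org/api"  # base address from the talk


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def collection_page_ids(collection_id, fetch=fetch_json):
    """List the taxon page IDs in one collection.

    The URL pattern and field names here are assumptions.
    """
    data = fetch(f"{API_ROOT}/collections/1.0/{collection_id}.json")
    return [
        item["object_id"]
        for item in data.get("collection_items", [])
        if item.get("object_type") == "TaxonConcept"
    ]


def external_identifiers(page_id, fetch=fetch_json):
    """Map each provider to its own identifier for one taxon page.

    The URL pattern and field names here are assumptions.
    """
    data = fetch(f"{API_ROOT}/pages/1.0/{page_id}.json")
    return {
        entry["nameAccordingTo"]: entry["sourceIdentifier"]
        for entry in data.get("taxonConcepts", [])
        if "nameAccordingTo" in entry and "sourceIdentifier" in entry
    }


def identifiers_for_collection(collection_id, fetch=fetch_json):
    """Collect external identifiers for every taxon page in a collection."""
    return {
        page_id: external_identifiers(page_id, fetch)
        for page_id in collection_page_ids(collection_id, fetch)
    }
```

The identifiers returned per provider are what Step 3 would hand to each external database's own API.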

From Arthur Chapman

Crowd-sourcing for computable data

Lovell and Libby Langstroth, Calphotos

Foodwebs.org

Efforts underway

Phylogenetic trees: Collaboration with Open Tree of Life project for draft tree

Computable data challenge
http://eol.org/info/data_challenge
Rod Page’s Bionames project
Alexandria Archive Institute

Devries and Thessen using DBpedia Spotlight to extract associations among taxa and add them to the Linked Open Data cloud

Sloan 2 project: Marine computable data

TraitBank ABI proposal

Research wishes

• Collecting nominations for research ideas where EOL can help:

http://eol.org/info/wishes_for_research

DUE 15 SEPTEMBER

• Will follow with Rubenstein Fellows call for proposals

Thanks to

Our funders
John D. and Catherine T. MacArthur Foundation
Alfred P. Sloan Foundation
Smithsonian Institution
Marine Biological Laboratory
Harvard University
David Rubenstein and other funders and donors

All our content providers and global partners

Volunteer curators and individual contributors via Flickr, Wikimedia, and members of EOL

Summary of EOL page richness

Overall
• 950,000 have content
• 2% are rich
• ~22% have only links to literature

Hot List
• 30% of 75K are rich
• Average richness = ~30

Red Hot List
• 56% of 3K are rich
• Average richness = 43

Long Tail in databases contributing to EOL

[Two charts: number of taxa for which content is contributed to EOL, plotted against partners in order of # taxa contributed to EOL. The first uses a linear axis (0 to 600,000); the second shows the same data viewed on log scale (1 to 1,000,000).]

Taxon page richness algorithm

Richness = a (Breadth) + b (Depth) + c (Diversity)

Breadth: Images, topics of text objects, references, maps, videos, sounds, conservation status

Depth: # words per text object, # words total

Diversity: Sources (partners)

Weights: 60%, 30%, 10%

Score range: 0 – 100, threshold 40
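
The weighted sum can be sketched in Python as follows. The slide gives only the three components, the weights, the 0 to 100 range and the threshold of 40. This sketch assumes the weights pair with Breadth, Depth and Diversity in the order listed, that each component has already been normalised to the range 0 to 1, and that a score equal to the threshold counts as rich. How EOL derives each component from the raw counts is not shown in the talk and is not modelled here.

```python
def richness_score(breadth, depth, diversity, weights=(0.6, 0.3, 0.1)):
    """Combine three component scores into a 0-100 richness score.

    Each component is assumed to be pre-normalised to the range 0..1.
    """
    for value in (breadth, depth, diversity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("component scores must lie between 0 and 1")
    a, b, c = weights
    return 100.0 * (a * breadth + b * depth + c * diversity)


def is_rich(score, threshold=40.0):
    """Treat a page as rich when its score reaches the threshold (assumed >=)."""
    return score >= threshold


# Hypothetical page: moderate breadth, little text, a single source.
example = richness_score(breadth=0.5, depth=0.2, diversity=0.1)
print(round(example, 1), is_rich(example))
```

With these weights, a page cannot reach the threshold on depth and diversity alone, since together they contribute at most 40 points only when both are at their maximum.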
