Building the Hymenoptera Anatomy Ontology through exploration of the Journal of Hymenoptera Research

Katja Seltmann, Matthew Bertone, Matthew J. Yoder, István Mikó, Elizabeth Macleod, Andrew Ernst, Andrew R. Deans

Upload: katja-c-seltmann

Post on 29-Jun-2015


DESCRIPTION

The Hymenoptera Anatomy Ontology (HAO) project aims to capture the complex lexica used to describe Hymenoptera anatomy. Our core data are extracted from the published literature, particularly descriptions of new taxa. We reviewed the Journal of Hymenoptera Research (JHR) to extract new labels and ontological classes, assessed the completeness of the present version of the HAO, and reflected on trends in community language. Three hundred fifty-three (353) JHR articles, accessed through the Biodiversity Heritage Library, were parsed and vetted against the present ontology. We collected 2121 new labels during this process, including about 650 adjectives used to qualify morphological features. The process also revealed language trends: how often each anatomical label occurs in the literature, which may reflect the character systems and qualifiers we most often use to describe novel taxa. Finally, we review the novel software used for text extraction and outline possible improvements and useful tools that resulted from this effort.

TRANSCRIPT

Page 1: Building the Hymenoptera Anatomy Ontology through exploration of the Journal of Hymenoptera Research


Page 2

Volumes: 1-16, Years: 1992-2007

The opportunity…

1. Database (infrastructure)

2. Terms used in Hymenoptera morphology

3. JHR Volumes 1-16 are online and have been processed with optical character recognition (OCR) software through the Biodiversity Heritage Library

Page 3

We were wondering…

1. Can we find new terms for the HAO by text extraction?

2. Can we see patterns in how we, as a community, describe things? Is it really true that terminology follows phylogeny?

Page 4

How we captured terms from the Journal of Hymenoptera Research…

1. Download articles from the Biodiversity Heritage Library (http://www.biodiversitylibrary.org)

2. Put the text in a database (MX)

3. Match the article text against the words we know are terms (also cataloged in the same database)

4. Add new terms based on what is NOT matched

5. People, not software, made the final decisions

353 articles
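The matching and candidate steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MX implementation: the function names, the regular-expression tokenizer, and the tiny stop-word set are all hypothetical, and the sketch only handles single-word labels (a multi-word label such as "hind wing" would need phrase matching).

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real run would need a much longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "in", "to", "is"}


def tokenize(text):
    """Lowercase OCR text and split it into word tokens (hyphenated words kept whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def match_article(text, known_terms):
    """Split one article's word counts into known labels and unmatched words."""
    counts = Counter(tokenize(text))
    matched = {w: n for w, n in counts.items() if w in known_terms}
    unmatched = {w: n for w, n in counts.items()
                 if w not in known_terms and w not in STOP_WORDS}
    return matched, unmatched


def review_queue(unmatched, min_count=1):
    """Order unmatched words for human review: most frequent first, then alphabetically."""
    kept = [(w, n) for w, n in unmatched.items() if n >= min_count]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    known = {"carina", "propodeum", "mesosoma"}
    text = "Propodeum with a median carina; carina complete. Mesosoma smooth."
    matched, unmatched = match_article(text, known)
    print(matched)                  # {'propodeum': 1, 'carina': 2, 'mesosoma': 1}
    print(review_queue(unmatched))  # [('complete', 1), ('median', 1), ('smooth', 1)]
```

The code only proposes candidates; in the workflow above, the review queue is the point where people decide which unmatched words become new terms.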

Page 5

From 353 articles (as of June 1, 2010):

2121 morphological terms

643 qualitative terms

2065 terms from JHR are not defined as concepts in the HAO. Floating without definition!

Page 6
Page 7

carina (3638, 160) wing (3297, 194) setae (3294, 171) vein (2891, 141) cell (2855, 202) seta (2545, 55) eye (2438, 186) segment (2415, 159) tergum (2381, 137) hind (2209, 172) larva (1751, 113) propodeum (1617, 184)

tooth (1604, 110) punctures (1490, 96) clypeus (1482, 175) segments (1422, 159) flagellomere (1392, 91) tergite (1371, 87) mandible (1369, 143) antenna (1365, 164) body (1359, 244) region (1289, 214) tibia (1261, 129) leg (1244, 101) ovipositor (1230, 127) ocellus (1218, 116)

larvae (1214, 161) scutellum (1201, 159) line (1166, 147) lobe (1160, 133) mesosoma (1137, 159) longitudinal (1131, 161) scape (1127, 133) legs (1072, 202) carinae (1014, 118) pronotum (1011, 162) terga (1002, 122) forewing (988, 132) antennal (966, 168) metasoma (960, 168)
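The slide does not label the number pairs. We assume they are (total occurrences, number of articles containing the term), a reading consistent with the second number never exceeding the 353 articles. Under that assumption, a tally like the one above could be produced with a sketch such as the following, where each article has already been reduced to a list of word tokens; the function names and input shapes are hypothetical.

```python
from collections import Counter


def term_frequencies(articles, terms):
    """Return {term: (total occurrences, number of articles containing it)}.

    `articles` is a list of token lists, one per article, and `terms` is the
    set of labels to count; both shapes are assumptions for this sketch.
    """
    total = Counter()
    doc_freq = Counter()
    for tokens in articles:
        counts = Counter(t for t in tokens if t in terms)
        total.update(counts)            # add every occurrence in this article
        doc_freq.update(counts.keys())  # add 1 for each term this article uses
    return {t: (total[t], doc_freq[t]) for t in total}


def format_ranked(freqs):
    """Render 'term (occurrences, articles)' entries, most frequent first."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1][0], item[0]))
    return " ".join(f"{t} ({n}, {d})" for t, (n, d) in ranked)


if __name__ == "__main__":
    articles = [["carina", "carina", "wing"], ["carina", "eye"], ["wing", "body"]]
    freqs = term_frequencies(articles, {"carina", "wing", "eye"})
    print(format_ranked(freqs))  # carina (3, 2) wing (2, 2) eye (1, 1)
```

Keeping both counts matters under this reading: a term used heavily in a few articles, such as seta (2545, 55), looks very different from one spread across many, such as body (1359, 244).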

Page 8

Page 9

Qualifying terms: spatial, adjectives, comparative

Page 10

posterior (2694, 216) dorsal (2654, 216) anterior (2475, 221) slightly (2247, 227) small (2048, 284) short (1930, 249) apex (1894, 192) smooth (1817, 174) large (1629, 266) distinct (1487, 201)

transverse (1486, 173) similar (1476, 276) base (1471, 200) broad (1394, 178) half (1357, 207) separated (1217, 182) single (1097, 243) rounded (1037, 158) dorsally (1017, 146) nearly (990, 185) shiny (980, 83)

inner (950, 158) shorter (938, 177) few (874, 239) elongate (859, 147) lower (834, 188)

Page 11

Look at the data a different way…

1. Terminals are taxa discussed in articles
• Use only articles that have the phrase "description of" in the title
• Holes: Ichneumonoidea (49), Chalcidoidea (38), Vespoidea (36), Apoidea (36), Symphyta (9), Cynipoidea (7), Chrysidoidea (4), Stephanidae (1), Mymarommatidae (1)

2. Characters: presence or absence of a term
• Use only terms that occurred in more than one article

3. Created a matrix excluding spatial and qualifying words
• 1162 terms, 181 terminals

4. TNT analysis
• xmult /level 7 replications 5 hits 5
• nelsen
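Steps 1 to 3 can be sketched as follows. Everything here is an assumption for illustration: the function names are hypothetical, each terminal is represented by the set of terms found in its "description of" article(s), spatial and qualifying words are removed through a caller-supplied exclusion set, and the "more than one article" rule is approximated by counting terminals (the two coincide if each terminal comes from a single article). The output follows what we understand to be TNT's basic xread layout (title, character and taxon counts, one row per taxon, closing semicolon); check it against the TNT documentation before use. The search itself (xmult, nelsen) would then be run inside TNT.

```python
from collections import Counter


def is_description(title):
    """Step 1 filter: keep articles whose title contains 'description of'."""
    return "description of" in title.lower()


def build_matrix(term_sets, excluded=frozenset()):
    """Steps 2-3: presence/absence characters from {terminal: set of terms}.

    Terms in `excluded` (spatial and qualifying words) are dropped, and only
    terms recorded for more than one terminal become characters (a stand-in
    for the "more than one article" rule, assumed here).
    """
    doc_freq = Counter(
        t for terms in term_sets.values() for t in terms if t not in excluded
    )
    characters = sorted(t for t, n in doc_freq.items() if n > 1)
    rows = {
        taxon: "".join("1" if c in terms else "0" for c in characters)
        for taxon, terms in term_sets.items()
    }
    return characters, rows


def to_xread(characters, rows, title="HAO term presence"):
    """Write the matrix as a TNT-style xread block (spaces in names become '_')."""
    lines = ["xread", f"'{title}'", f"{len(characters)} {len(rows)}"]
    for taxon, row in rows.items():
        lines.append(f"{taxon.replace(' ', '_')} {row}")
    lines.append(";")
    return "\n".join(lines)


if __name__ == "__main__":
    term_sets = {
        "Taxon A": {"carina", "wing", "smooth"},
        "Taxon B": {"carina", "eye"},
        "Taxon C": {"wing", "eye", "propodeum"},
    }
    chars, rows = build_matrix(term_sets, excluded={"smooth"})
    print(to_xread(chars, rows))
```

In the toy example, "smooth" is excluded as a qualifier and "propodeum" is dropped because only one terminal uses it, leaving three characters. On the real data, the slide reports a matrix of 1162 terms by 181 terminals.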

Page 12

http://tiny.cc/p0aan

Page 13

Page 14

Page 15

Page 16

Page 17

Page 18

Petiole: http://tiny.cc/p0aan

Page 19

What does this mean for ISH…

1. The next session addresses this: moving to an open-access journal

2. Things we can do in our publications (in the form of annotations) to make data synthesis easier and reduce the need to repeat work

Page 20
Page 21

Acknowledgments

Funding: Advances in Biological Informatics (NSF DBI-0850223), NESCent (NSF EF-0423641), Morphbank (NSF DBI-0446224), HymAToL (NSF EF-0337220), PEET: Monographic research on parasitic Hymenoptera (NSF DEB-0328922)

Intellect and enthusiasm: Biodiversity Heritage Library, Rick Prelinger, International Society of Hymenopterists, NESCent, other ontology projects, Deans Lab (Barb Sharanowski, Trish Mullins, Bob Blinn, Rinchhuanawma, Lydia Abernethy)

http://tiny.cc/[email protected]