Implementation of TaxPub, a JATS extension for domain-specific markup in taxonomy: the experience of a biodiversity publisher
Lyubomir Penev, Terry Catapano, Donat Agosti, Teodor Georgiev,
Guido Sautter, Pavel Stoev
JATS-Con, 16 - 17 Oct 2012
Plazi
This presentation will focus on:
Implementation of TaxPub, an extension to the general NLM JATS DTD for taxonomy publishing
Semantic tagging of and enhancements to published texts
Dissemination of published information to aggregators
Current and future development of TaxPub
Quick facts about Plazi
Plazi founded in 2008: Swiss-based NGO with members in Switzerland, Germany, the US and Iran
Plazi is a research-based think tank with the mission to promote the idea of open access to scientific content
Plazi has four pillars: legal advice, technical solutions (e.g., TaxPub), maintenance of a treatment repository, advocacy
Plazi GmbH founded in 2012 as a service SME owned by Plazi to provide document conversion services and consultation
Funding from public donors (e.g., the EU) and private sources
Clients are global
Context
Conservation: global biodiversity crisis. Increasing loss of species, but no tools to measure and document it
Science: ca. 1.8M species described, ca. 8M expected
Scientific publications
ca. 17,000 species described per annum; ca. 100,000 redescriptions per annum -> rich content
highly fragmented with over 2,500 journals and books involved -> difficult access
Solution: Open Access and semantically enhanced publications allow immediate registration of new taxa and dissemination of content -> TaxPub JATS/DTD
TaxPub
Lightweight extension of Blue DTD
Described at JATS-Con 2010: “TaxPub: An Extension of the NLM/NCBI Journal Publishing DTD for Taxonomic Descriptions” (http://www.ncbi.nlm.nih.gov/books/NBK47081/)
Treatments (i.e., species descriptions): <tp:taxon-treatment>, <tp:nomenclature>, <tp:treatment-sec>
Domain-specific content
<taxon-name>: taxonomic names
<materials-citation>: references to specimens
<descriptive-statement>: descriptions of morphological features
<tp:taxon-treatment>
  <tp:nomenclature>
    <tp:taxon-name>
      <tp:taxon-name-part taxon-name-part-type="genus">Platyscelio</tp:taxon-name-part>
      <tp:taxon-name-part taxon-name-part-type="species">mzantsi</tp:taxon-name-part>
      <object-id>urn:lsid:zoobank.org:act:D084EF48-4736-444F-916F-2C8CDE23E29B</object-id>
      <object-id>urn:lsid:biosci.ohio-state.edu:osuc_concepts:242617</object-id>
    </tp:taxon-name>
    <tp:taxon-authority>Taekul &amp; Johnson</tp:taxon-authority>
    <tp:taxon-status>sp. n.</tp:taxon-status>
  </tp:nomenclature>
  <tp:treatment-sec sec-type="materials_examined">...
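A minimal sketch of how a consumer might read the nomenclature block of a treatment, using Python's standard `xml.etree.ElementTree`. The namespace URI bound to the `tp` prefix is an assumption (the slide shows only the prefix), and `read_nomenclature` is a hypothetical helper name, not part of TaxPub or any Plazi or Pensoft tool.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI for the "tp" prefix; the slide shows only the prefix.
TP_NS = "http://www.plazi.org/taxpub"
NS = {"tp": TP_NS}

# The nomenclature fragment from the slide, closed so that it parses on its own.
SAMPLE = f"""
<tp:taxon-treatment xmlns:tp="{TP_NS}">
  <tp:nomenclature>
    <tp:taxon-name>
      <tp:taxon-name-part taxon-name-part-type="genus">Platyscelio</tp:taxon-name-part>
      <tp:taxon-name-part taxon-name-part-type="species">mzantsi</tp:taxon-name-part>
      <object-id>urn:lsid:zoobank.org:act:D084EF48-4736-444F-916F-2C8CDE23E29B</object-id>
      <object-id>urn:lsid:biosci.ohio-state.edu:osuc_concepts:242617</object-id>
    </tp:taxon-name>
    <tp:taxon-authority>Taekul &amp; Johnson</tp:taxon-authority>
    <tp:taxon-status>sp. n.</tp:taxon-status>
  </tp:nomenclature>
</tp:taxon-treatment>
"""


def read_nomenclature(xml_text):
    """Return name, authority, status and identifiers of one treatment."""
    root = ET.fromstring(xml_text.strip())
    nomenclature = root.find("tp:nomenclature", NS)
    taxon_name = nomenclature.find("tp:taxon-name", NS)

    # Map each name part to its declared type (genus, species, ...).
    parts = {
        part.get("taxon-name-part-type"): (part.text or "").strip()
        for part in taxon_name.findall("tp:taxon-name-part", NS)
    }
    return {
        "name": " ".join(parts[k] for k in ("genus", "species") if k in parts),
        "authority": nomenclature.findtext(
            "tp:taxon-authority", default="", namespaces=NS).strip(),
        "status": nomenclature.findtext(
            "tp:taxon-status", default="", namespaces=NS).strip(),
        # <object-id> is a plain JATS element, so it carries no tp: prefix.
        "lsids": [(o.text or "").strip() for o in taxon_name.findall("object-id")],
    }


if __name__ == "__main__":
    print(read_nomenclature(SAMPLE))
```

Because the name parts, authority, status and LSIDs are separate elements, a registry or aggregator can pick up exactly the fields it needs without parsing free text.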
<tp:treatment-sec sec-type="materials_examined">
  <p>
    <tp:material-citation>
      <tp:type-status>Holotype</tp:type-status> worker.
      <tp:taxon-type-location>King Saud Museum of Arthropods (KSMA), College of Food and Agriculture Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia.</tp:taxon-type-location>
      <tp:collecting-event>
        <tp:collecting-location>SAUDI ARABIA, Al Bahah province, Amadan forest, Al Mandaq governorate, </tp:collecting-location>
        <named-content content-type="dwc:verbatimCoordinates">20°12'N, 41°13'E</named-content>, 1881 m.a.s.l. 19.V.2010 (M. R. Sharaf &amp; A. S. Aldawood Leg.);
      </tp:collecting-event>
    </tp:material-citation>
  </p>
</tp:treatment-sec>
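The material citation above can be turned into an occurrence-style record. The sketch below is one possible reading, not the export actually used by Plazi or Pensoft. The `tp` namespace URI, the helper names `to_decimal` and `material_citations`, and the simple degrees-and-minutes pattern are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: namespace URI for the "tp" prefix; the slide shows only the prefix.
TP_NS = "http://www.plazi.org/taxpub"
NS = {"tp": TP_NS}

# Shortened version of the slide's material citation.
SAMPLE = f"""<tp:treatment-sec xmlns:tp="{TP_NS}" sec-type="materials_examined">
  <p>
    <tp:material-citation>
      <tp:type-status>Holotype</tp:type-status> worker.
      <tp:collecting-event>
        <tp:collecting-location>SAUDI ARABIA, Al Bahah province, Amadan forest, Al Mandaq governorate, </tp:collecting-location>
        <named-content content-type="dwc:verbatimCoordinates">20°12'N, 41°13'E</named-content>
      </tp:collecting-event>
    </tp:material-citation>
  </p>
</tp:treatment-sec>"""

# Handles only the degrees-and-minutes form seen in the sample.
COORD = re.compile(r"(\d+)°(\d+)'([NSEW])")


def to_decimal(verbatim):
    """Convert text such as 20°12'N, 41°13'E to (lat, lon), or None."""
    values = {}
    for degrees, minutes, hemisphere in COORD.findall(verbatim):
        value = int(degrees) + int(minutes) / 60
        if hemisphere in "SW":
            value = -value
        values["lat" if hemisphere in "NS" else "lon"] = round(value, 4)
    if "lat" in values and "lon" in values:
        return values["lat"], values["lon"]
    return None


def material_citations(xml_text):
    """Yield one dict per <tp:material-citation> in the fragment."""
    root = ET.fromstring(xml_text)
    records = []
    for citation in root.iter(f"{{{TP_NS}}}material-citation"):
        verbatim = ""
        for content in citation.iter("named-content"):
            if content.get("content-type") == "dwc:verbatimCoordinates":
                verbatim = (content.text or "").strip()
        location = citation.findtext(
            ".//tp:collecting-location", default="", namespaces=NS)
        records.append({
            "type_status": citation.findtext(
                "tp:type-status", default="", namespaces=NS).strip(),
            "location": location.strip().rstrip(","),
            "verbatim_coordinates": verbatim,
            "decimal": to_decimal(verbatim) if verbatim else None,
        })
    return records


if __name__ == "__main__":
    for record in material_citations(SAMPLE):
        print(record)
```

Keeping the verbatim string next to the derived decimal pair lets a downstream aggregator re-check the conversion against what was printed.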
TaxPub: Recent and Future Developments
Largely stable
<x> Greenfication
Interest from journals:
European Journal of Taxonomy
Zootaxa (via EOL)
Markup of morphological descriptions
<p>Spreading shrub; stems erect,
  <Categorical uri="http://ontology.org/plant/stem-color"><State uri="http://ontology.org/plant/greenish">greenish</State></Categorical>.
  Leaves deciduous early in summer (particularly when infected with Diseasomyces), oblong, apex obtuse, glabrous or weakly hirsute;
  stipules sharply pointed,
  <Quantitative uri="http://ontology.org/plant/stipule-width"><value value="3.2">3,2mm</value></Quantitative> wide,
  <Categorical uri="http://ontology.org/plant/stipule-color"><State uri="http://ontology.org/plant/black">black</State> or <State uri="http://ontology.org/plant/brown">darkish brown,</State></Categorical>
  extremely rarely yellow, often shallowly joined around the node; spines stout.</p>
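To show what this kind of markup buys, here is a small sketch that pulls the tagged characters out of such a paragraph as structured data. The element names (`Categorical`, `State`, `Quantitative`, `value`) and the `ontology.org` URIs are taken from the example as shown; `extract_traits` is a hypothetical helper, and the sample is a shortened copy of the slide text.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the description paragraph from the slide.
SAMPLE = """<p>Spreading shrub; stems erect,
<Categorical uri="http://ontology.org/plant/stem-color"><State uri="http://ontology.org/plant/greenish">greenish</State></Categorical>.
Stipules sharply pointed,
<Quantitative uri="http://ontology.org/plant/stipule-width"><value value="3.2">3,2mm</value></Quantitative> wide,
<Categorical uri="http://ontology.org/plant/stipule-color"><State uri="http://ontology.org/plant/black">black</State> or <State uri="http://ontology.org/plant/brown">darkish brown,</State></Categorical>
extremely rarely yellow; spines stout.</p>"""


def extract_traits(xml_text):
    """Return the tagged characters of a description in document order."""
    root = ET.fromstring(xml_text)
    traits = []
    for element in root.iter():
        if element.tag == "Categorical":
            traits.append({
                "character": element.get("uri"),
                "kind": "categorical",
                # One URI per tagged state; untagged text ("rarely yellow") is skipped.
                "states": [state.get("uri") for state in element.findall("State")],
            })
        elif element.tag == "Quantitative":
            value = element.find("value")
            traits.append({
                "character": element.get("uri"),
                "kind": "quantitative",
                # The attribute holds the normalized number; the element text
                # keeps the verbatim form printed in the article ("3,2mm").
                "value": float(value.get("value")),
                "verbatim": (value.text or "").strip(),
            })
    return traits


if __name__ == "__main__":
    for trait in extract_traits(SAMPLE):
        print(trait)
```

Note that only tagged statements are recovered: "extremely rarely yellow" stays plain text in the example, so it does not appear in the output.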
TaxPub: Challenges
Maintenance
SourceForge
Volunteer effort, little time, no funding…
Supported by Plazi
Documentation
Comments with ad hoc markup in extension files
Converted to HTML by NCBI tool
Maintained at Species-ID wiki
Quick facts about Pensoft & ZooKeys
Pensoft founded in 1992: more than 700 books published; two offices in Sofia and Moscow; 16 employees
ZooKeys launched in July 2008 as the first mandatory Open Access journal in taxonomy; 205 issues, 20,000 pages IN FOUR YEARS
All new taxa registered in ZooBank and supplied to EOL, Plazi and the wiki Species-ID
CrossRef member, ISI and Scopus covered, indexed in Zoological Record, DOAJ, CABI Abstracts, Google Scholar; archived in PubMedCentral and CLOCKSS
Pensoft Journal System – XML-based online editorial system; publishing services offered to society and institutional journals
ZooKeys growth
Unified marked-up final output
Taxon treatments, keys, images, localities
The XML landscape for legacy and prospective taxonomic literature
[Diagram; recoverable labels follow]
PROSPECTIVE PUBLISHING: automated submission; peer review; TaxPub XML schema; Pensoft mark-up tool
HISTORICAL LITERATURE: TaxonX and taXMLit schemas; Plazi's GoldenGATE editor
Marked-up publications: PDF, HTML and XML
Archiving: electronic archives; data centers
Wiki: Species-ID, Wikispecies, Wikipedia
Indexing: IPNI, ZooBank, MycoBank, GNA
Aggregators: EOL, GBIF
Content management systems & repositories (e.g., EOL, GBIF, Scratchpads)
END USERS
Four stages of the XML-based editorial workflow
SUBMISSION: XML-tagged or non-tagged manuscripts?
PEER-REVIEW/EDITORIAL PROCESS: The technical challenges of the XML mark up
PUBLICATION: Different publishing formats, and to whom are they addressed?
DISSEMINATION: How to maximize the distribution of published information
But why mark up? Is it really needed? Who will be using it?
Descriptions
Images
Occurrences
Nomenclature
Literature
Plazi
What does XML give readers that the usual PDF does not?
Semantic enhancements to published texts
Archiving in PubMedCentral
Automated export of species descriptions to Encyclopedia of Life (EOL)
XML MARK UP
Automated harvesting and deposition of taxon treatments in Plazi
Export of content to the Wiki environment
Species descriptions on Wikispecies and Wikimedia Commons
The Future of TaxPub and its implementations
More Semantic Web enhancements!
Pensoft Writing Tool (PWT) – a collaborative article writing platform
Community-based and open peer review process
Biodiversity Data Journal will publish any kind of “small data”: checklists, nomenclatural acts, taxon treatments
The collaborative article authoring tool
Why is the Biodiversity Data Journal needed?
Primary data
Drawings: Slavena Peneva
Publishing and sharing of primary data
RE-USE of CONTENT
Biodiversity Data Journal
All data matter: NO lower or upper limit of manuscript size!
ALL within a single online collaborative platform, including the writing of the manuscript!
Collaborative article authoring tool
Community peer review with “open” and “public” options, on top of conventional peer review
Online editorial process and version control
Standard-compliant (Darwin Core, Dublin Core, NLM JATS, etc.)
Pre-defined biological Code-compliant article templates
Life cycle of data published in the BDJ
[Diagram; recoverable labels follow]
BIODIVERSITY MANUSCRIPT: occurrence data, genome data, image galleries, morphometric data, environmental data, phylogenetic data, any other data
XML MARK UP: structured text (data!)
ARTICLES, occurrence data, taxon names, taxon treatments, bibliographies
Plazi, BHL, Wiki, COL
The lessons learned
The main difficulties are caused by:
The specificity of the domain (e.g., taxon names, synonyms, instability of nomenclature, lack of global LSID infrastructure, etc.)
Markup of occurrence data (certainly a great challenge)
Cost efficiency of the markup process
Sociological barriers: the majority of authors are not willing to change their writing habits; most are still not aware of the tremendous advantages of Web 2.0 technologies
Most small taxonomy publishers (and some bigger ones) have no experience in XML-based editorial workflows, or they simply can’t afford them
“Semi-automatically generated semantic, enhanced e-publications are the only way to describe the missing 10 M species, and to deal with an increasing flood of data.”
Donat Agosti
It is not easy, but... it is exciting... however, possible only through Open Access!