TaxPub: An Extension of JATS for Taxonomic Descriptions

Terry Catapano, 2010-11-02


Taxonomic Descriptions

• “Treatment”
• Discussion of the features/distribution of a related group of organisms, “taxon”
• Formal conventions
• ICZN, ICBN, etc...
• Frequently parts of publications
• Cited as discrete objects
• 200+ year history


Linnaeus, Systema Naturae, 10th Edition, 1767-1770


Taekul, C., N. F. Johnson, L. Masner, A. Polaszek and Rajmohana K. 2010. World species of the genus Platyscelio Kieffer (Hymenoptera, Platygastridae). ZooKeys 50: 97-126.


Treatment Components

• Nomenclature
  o Name
  o Authority
  o Status, etc…
• Description
• Materials Examined
  o Specimens, Collection, Deposit
• Diagnosis, Distribution, Etymology, Key, etc…


Background: TaxonX

• NSF/DFG Funded Project
• Extraction of species data from taxonomic literature of Ants
• TaxonX schema for markup of corpus
• c. 500 publications; c. 11,000 treatments
• Development continued by Plazi

Plazi

• Independent Not-for-Profit Association
• Based in Switzerland
• Members from varied domains
• Pro Bono
• Open Access to Scientific Literature
  o Legal "...[T]axonomic treatments as well as the metadata of the publications – are in the public domain and can therefore be used for further scientific research without any restriction, whether or not contained in copyrighted publications."

Agosti D, Egloff W (2009). "Taxonomic information exchange and copyright: the Plazi approach". BMC Research Notes 2:53. doi:10.1186/1756-0500-2-53.

• Open Data: Technical Activities
  o GoldenGATE Markup Editor
  o Treatment Repository: Literature of Ants
  o Treatments provided to Encyclopedia of Life (EOL)
  o Collaborations and Participation:
    – Journals: ZooKeys, Zootaxa
    – "Fine-Grained Markup of Descriptive Data for Knowledge Applications in Biodiversity Domains", Hong Cui, U. of Arizona PI (NSF)
    – "The Hymenoptera Ontology: Part of a Transformation in Systematics and Genome Sciences", Andrew Deans, N.C. State PI (NSF)
    – Global Biodiversity Information Facility (GBIF)
  o Implemented TAPIR
  o Implemented Species Profile Model (SPM)
  o Report on Knowledge Organization Systems
    – TaxonX & TaxPub

TaxPub


Legacy Literature: Challenges

• Text accuracy
• Formal/Editorial Variety
• Condensed Information
• Loose schema, higher costs of application


New Literature: Rationale

Matt Yoder et al., Development of the Hymenoptera Anatomy Ontology: Implications for Systematics and Literature Mark-up


TaxPub

• Extension of Publishing (“Blue”) DTD
• Parsimony: largely rely on base DTD
• “tp:” namespace
• Available throughout
  o <tp:taxon-name>: scientific names
  o <tp:descriptive-statement>: morphology
  o <tp:materials-citation>: specimens; gene sequences
• Within <body>
  o <tp:treatment> + subelements


"Common" TaxPub Elements


<tp:taxon-name>

 

            <p>A further undescribed <tp:taxon-name rank="genus">Nixonia</tp:taxon-name> species related to <tp:taxon-name rank="species">N. lamorali</tp:taxon-name> emerged from processing of samples collected in Kogelberg Biosphere Reserve (50km east of Cape Town). This species may usurp <tp:taxon-name rank="species">N. gigas</tp:taxon-name>...</p>
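Because <tp:taxon-name> may appear anywhere in running text, a consumer can harvest every name mention with a single traversal. The following is a minimal sketch, not part of the original slides. The namespace URI bound to "tp:", the wrapping <body> element, and the function name are assumptions made for illustration, and the paragraph is abridged.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "tp:" prefix; the slides do not state it.
TP_NS = "http://www.plazi.org/taxpub"

# Abridged version of the slide's paragraph, wrapped in a hypothetical root
# element that declares the prefix so the fragment parses on its own.
SAMPLE = f"""<body xmlns:tp="{TP_NS}">
<p>A further undescribed <tp:taxon-name rank="genus">Nixonia</tp:taxon-name> species
related to <tp:taxon-name rank="species">N. lamorali</tp:taxon-name> emerged from
processing of samples. This species may usurp
<tp:taxon-name rank="species">N. gigas</tp:taxon-name>...</p>
</body>"""


def taxon_names(xml_text):
    """Return (rank, name) pairs for every tp:taxon-name, in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for el in root.iter(f"{{{TP_NS}}}taxon-name"):
        # itertext() also collects text inside child elements such as name parts.
        name = " ".join("".join(el.itertext()).split())
        pairs.append((el.get("rank"), name))
    return pairs


if __name__ == "__main__":
    for rank, name in taxon_names(SAMPLE):
        print(rank, name)
```

The abbreviated forms ("N. lamorali") are returned exactly as printed; resolving them to full names needs the @reg attribute covered on the next slide.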


<tp:taxon-name>, con't

• @reg: regularized form of name
• object-id: identifier(s) for name
  o semantics of xlink attrs?
• @*-part-type: semantics for name components
  o string
  o use URIs: here terms from Darwin Core vocabulary (http://rs.tdwg.org/dwc/terms/)

<tp:taxon-name rank="species" reg="Nixonia lamorali">
    <object-id object-id-type="LSID" xlink:href="urn:lsid:biosci.ohio-state.edu:osuc_concepts:184923"/>
    <tp:taxon-name-part taxon-name-part-type="dwc:genus" reg="Nixonia">N.</tp:taxon-name-part>
    <tp:taxon-name-part taxon-name-part-type="dwc:specificEpithet">lamorali</tp:taxon-name-part>
</tp:taxon-name>
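One use of @reg is to recover the full name from an abbreviated mention. The sketch below is an illustration only, assuming the namespace URI shown for "tp:" and the standard XLink namespace. The function names are hypothetical. It prefers a part's @reg value over its printed text and collects the identifiers attached to the name.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"      # assumed URI for the "tp:" prefix
XLINK_NS = "http://www.w3.org/1999/xlink"  # standard XLink namespace

SAMPLE = f"""<tp:taxon-name xmlns:tp="{TP_NS}" xmlns:xlink="{XLINK_NS}"
    rank="species" reg="Nixonia lamorali">
  <object-id object-id-type="LSID"
      xlink:href="urn:lsid:biosci.ohio-state.edu:osuc_concepts:184923"/>
  <tp:taxon-name-part taxon-name-part-type="dwc:genus" reg="Nixonia">N.</tp:taxon-name-part>
  <tp:taxon-name-part taxon-name-part-type="dwc:specificEpithet">lamorali</tp:taxon-name-part>
</tp:taxon-name>"""


def regularized_parts(name_el):
    """Map each part type to its regularized value (@reg, else the printed text)."""
    parts = {}
    for part in name_el.findall(f"{{{TP_NS}}}taxon-name-part"):
        value = part.get("reg") or (part.text or "").strip()
        parts[part.get("taxon-name-part-type")] = value
    return parts


def identifiers(name_el):
    """Return the xlink:href of every object-id directly inside the name."""
    return [o.get(f"{{{XLINK_NS}}}href") for o in name_el.findall("object-id")]


if __name__ == "__main__":
    name = ET.fromstring(SAMPLE)
    print(regularized_parts(name))
    print(identifiers(name))
```

In this sample the parts joined with a space reproduce the @reg value on the enclosing element, which gives a cheap consistency check.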


<tp:descriptive-statement>

• Relatively undeveloped
• Modeling of descriptions challenging
  o complex, if formal, natural language
• Segment text
  o <tp:descriptive-statement>
• Delineate components
  o <tp:descriptive-statement-part>
• Normalize/Annotate
  o <tp:descriptive-statement-part>


<tp:descriptive-statement>

... <tp:descriptive-statement>Length 7.0 mm</tp:descriptive-statement>; <tp:descriptive-statement>completely black</tp:descriptive-statement>, <tp:descriptive-statement>tarsi lighter</tp:descriptive-statement> (figs. 2A, B); <tp:descriptive-statement> wings infuscate throughout, brownish</tp:descriptive-statement>...

...<tp:descriptive-statement><tp:descriptive-statement-part descriptive-statement-part-type="character"><object-id xlink:href="HAO:0000992"/>tarsi</tp:descriptive-statement-part><tp:descriptive-statement-part descriptive-statement-part-type="state">lighter</tp:descriptive-statement-part></tp:descriptive-statement>...
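Once statements are split into typed parts, character/state pairs can be pulled out as rows for later analysis. This is a minimal sketch under stated assumptions: the namespace URI for "tp:" is assumed, the wrapping <p> is hypothetical, and the row layout is an illustrative choice rather than anything TaxPub defines.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"      # assumed URI for the "tp:" prefix
XLINK_NS = "http://www.w3.org/1999/xlink"  # standard XLink namespace

SAMPLE = f"""<p xmlns:tp="{TP_NS}" xmlns:xlink="{XLINK_NS}">
<tp:descriptive-statement>
  <tp:descriptive-statement-part descriptive-statement-part-type="character">
    <object-id xlink:href="HAO:0000992"/>tarsi</tp:descriptive-statement-part>
  <tp:descriptive-statement-part descriptive-statement-part-type="state">lighter</tp:descriptive-statement-part>
</tp:descriptive-statement>
</p>"""


def character_states(xml_text):
    """Return one dict per descriptive statement: character, state, ontology term."""
    root = ET.fromstring(xml_text)
    rows = []
    for stmt in root.iter(f"{{{TP_NS}}}descriptive-statement"):
        row = {"character": None, "state": None, "term_id": None}
        for part in stmt.findall(f"{{{TP_NS}}}descriptive-statement-part"):
            kind = part.get("descriptive-statement-part-type")
            row[kind] = "".join(part.itertext()).strip()
            oid = part.find("object-id")
            if oid is not None:
                # The identifier is kept as written in the markup.
                row["term_id"] = oid.get(f"{{{XLINK_NS}}}href")
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(character_states(SAMPLE))
```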



<tp:materials-citation>

• <tp:collecting-event>: how, when collected
  o <tp:collecting-location>: where collected
• <object-id>: current location


<tp:materials-citation>, con't

<tp:material-citation>
    <named-content content-type="dwc:individualCount">1</named-content>
    <named-content content-type="dwc:sex">male</named-content>,
    <tp:collecting-event>
        <tp:collecting-location>
            <tp:location location-type="dwc:country">South Africa</tp:location>
            <tp:location location-type="dwc:stateProvince">Western Cape</tp:location>
            <tp:location location-type="dwc:locality">Langberg Farm, (3 km 270° W Langebaanweg)</tp:location>
            <tp:location location-type="dwc:verbatimCoordinates">32°58.461&#8217;S 18°07.344&#8217;E</tp:location>
        </tp:collecting-location>
        <named-content content-type="dwc:verbatimDate">12&#8211;19 Mar 2003</named-content>
    </tp:collecting-event>,
    <named-content content-type="dwc:recordedBy">S. van Noort</named-content>,
    <named-content content-type="dwc:samplingProtocol">Malaise trap, LW02-N2-M175</named-content>,
    <named-content content-type="dwc:locationRemarks">Sand Plain Fynbos</named-content>,
    <object-id content-type="dwc:collectionCode">SAM-HYM-P030184</object-id>,
    <object-id content-type="dwc:catalogNumber">OSUC 256954</object-id>), (<object-id content-type="dwc:institutionCode">SAMC</object-id>)
</tp:material-citation>

• tp:location:
  o @location-type: URI (Darwin Core) string
• named-content: all other components
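Since every component carries a Darwin Core term in @location-type or @content-type, a material citation can be flattened into a specimen record with very little code. The sketch below is illustrative only. It assumes the namespace URI for "tp:", uses an abridged copy of the slide's citation, and treats the "dwc:" prefix as a plain string convention.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"  # assumed URI for the "tp:" prefix

# Abridged from the slide's example.
SAMPLE = f"""<tp:material-citation xmlns:tp="{TP_NS}">
<named-content content-type="dwc:individualCount">1</named-content>
<named-content content-type="dwc:sex">male</named-content>,
<tp:collecting-event>
  <tp:collecting-location>
    <tp:location location-type="dwc:country">South Africa</tp:location>
    <tp:location location-type="dwc:stateProvince">Western Cape</tp:location>
  </tp:collecting-location>
  <named-content content-type="dwc:verbatimDate">12&#8211;19 Mar 2003</named-content>
</tp:collecting-event>,
<named-content content-type="dwc:recordedBy">S. van Noort</named-content>,
<object-id content-type="dwc:catalogNumber">OSUC 256954</object-id>
</tp:material-citation>"""


def darwin_core_record(citation_el):
    """Collect every element typed with a dwc: term into a flat dict."""
    record = {}
    for el in citation_el.iter():
        term = el.get("location-type") or el.get("content-type")
        if term and term.startswith("dwc:"):
            # Strip the prefix so keys match Darwin Core term names.
            record[term[len("dwc:"):]] = "".join(el.itertext()).strip()
    return record


if __name__ == "__main__":
    print(darwin_core_record(ET.fromstring(SAMPLE)))
```

Punctuation between components (the commas in the printed citation) sits in element tails and is ignored, so the record holds only the typed values.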


tp:treatment and Sub-Elements


<tp:treatment>

• <tp:treatment-meta>
  o bibliographic metadata for treatments
  o standalone treatments
• <tp:nomenclature>: required
  o <tp:taxon-name>: required
  o other elements...
• <tp:treatment-sec>
  o @sec-type
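The two "required" rules above are normally enforced by DTD validation. As a rough sketch for pipelines that do not run a validating parser, the same check can be written by hand. The namespace URI and the function name are assumptions. The root element follows the <tp:taxon-treatment> spelling used in the deck's XML sample, but the function accepts any element.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"  # assumed URI for the "tp:" prefix

GOOD = f"""<tp:taxon-treatment xmlns:tp="{TP_NS}">
  <tp:nomenclature>
    <tp:taxon-name rank="species">Nixonia masneri</tp:taxon-name>
  </tp:nomenclature>
</tp:taxon-treatment>"""

# Hypothetical invalid treatment: nomenclature without a taxon name.
BAD = f"""<tp:taxon-treatment xmlns:tp="{TP_NS}">
  <tp:nomenclature>
    <tp:taxon-status>sp. n.</tp:taxon-status>
  </tp:nomenclature>
</tp:taxon-treatment>"""


def missing_required(treatment_el):
    """Return a list of problems; an empty list means both requirements hold."""
    problems = []
    nomenclature = treatment_el.find(f"{{{TP_NS}}}nomenclature")
    if nomenclature is None:
        problems.append("missing tp:nomenclature")
    elif nomenclature.find(f"{{{TP_NS}}}taxon-name") is None:
        problems.append("tp:nomenclature lacks tp:taxon-name")
    return problems


if __name__ == "__main__":
    print(missing_required(ET.fromstring(GOOD)))
    print(missing_required(ET.fromstring(BAD)))
```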


<tp:nomenclature>

<tp:taxon-treatment>
    <tp:nomenclature>
        <tp:taxon-name rank="dwc:species" auth-code="iczn">
            <tp:taxon-name-part taxon-name-part-type="dwc:genus">Nixonia</tp:taxon-name-part>
            <tp:taxon-name-part taxon-name-part-type="dwc:specificEpithet">masneri</tp:taxon-name-part>
            <object-id xlink:href="urn:lsid:zoobank.org:act:51495B19-AA60-4560-AAC6-2EED4110C0ED"/>
        </tp:taxon-name>
        <tp:taxon-authority>van Noort &amp; Johnson</tp:taxon-authority>
        <tp:taxon-status>sp. n.</tp:taxon-status>
        <xref ref-type="fig" rid="f1">Figures 1A&#8211;F</xref>
    </tp:nomenclature>
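Because <tp:nomenclature> has its own content model, the name, authority, status, and identifier can be read without any text parsing. The following sketch is an illustration, not the author's tooling. It assumes the namespace URI for "tp:", and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"      # assumed URI for the "tp:" prefix
XLINK_NS = "http://www.w3.org/1999/xlink"  # standard XLink namespace

SAMPLE = f"""<tp:nomenclature xmlns:tp="{TP_NS}" xmlns:xlink="{XLINK_NS}">
  <tp:taxon-name rank="dwc:species" auth-code="iczn">
    <tp:taxon-name-part taxon-name-part-type="dwc:genus">Nixonia</tp:taxon-name-part>
    <tp:taxon-name-part taxon-name-part-type="dwc:specificEpithet">masneri</tp:taxon-name-part>
    <object-id xlink:href="urn:lsid:zoobank.org:act:51495B19-AA60-4560-AAC6-2EED4110C0ED"/>
  </tp:taxon-name>
  <tp:taxon-authority>van Noort &amp; Johnson</tp:taxon-authority>
  <tp:taxon-status>sp. n.</tp:taxon-status>
</tp:nomenclature>"""


def nomenclature_summary(nom_el):
    """Summarize a tp:nomenclature element as a flat dict."""
    name_el = nom_el.find(f"{{{TP_NS}}}taxon-name")
    parts = [
        (p.text or "").strip()
        for p in name_el.findall(f"{{{TP_NS}}}taxon-name-part")
    ]
    oid = name_el.find("object-id")
    return {
        "name": " ".join(parts),
        "authority": nom_el.findtext(f"{{{TP_NS}}}taxon-authority", default="").strip(),
        "status": nom_el.findtext(f"{{{TP_NS}}}taxon-status", default="").strip(),
        "lsid": oid.get(f"{{{XLINK_NS}}}href") if oid is not None else None,
    }


if __name__ == "__main__":
    print(nomenclature_summary(ET.fromstring(SAMPLE)))
```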


<tp:nomenclature-citation>

    <tp:nomenclature-citation-list>
        <tp:nomenclature-citation>
            <tp:taxon-name>Nixonia</tp:taxon-name><xref rid="B7">Masner, 1958, 101</xref>
            <comment>Original description. Type: <tp:taxon-name>Nixonia pretiosa</tp:taxon-name> Masner, by monotypy and original designation. For subsequent taxonomic literature see <xref rid="B4">Johnson (1992)</xref> or The Genera of <tp:taxon-name>Platygastroidea</tp:taxon-name> of the World (<ext-link xlink:href="http://purl.oclc.org/NET/hymenoptera/platygastroidea">http://purl.oclc.org/NET/hymenoptera/platygastroidea</ext-link>).</comment>
        </tp:nomenclature-citation>
    </tp:nomenclature-citation-list>
</tp:nomenclature>


<tp:treatment-sec>

 

<tp:treatment-sec>, con't

<tp:treatment-sec sec-type="Materials Examined">
    <title>Type material</title>
    <p><tp:material-citation><tp:type-status>Holotype</tp:type-status>...
</tp:treatment-sec>
<tp:treatment-sec sec-type="Diagnosis">
    <title>Diagnosis</title>
    <p> Most similar to ... </p>
</tp:treatment-sec>
<tp:treatment-sec sec-type="Etymology">
    <title>Etymology</title>
    <p> Named in honour of Lubomír Masner, ...</p>
</tp:treatment-sec>
<tp:treatment-sec sec-type="Distribution">
    <title>Distribution and habitat association</title>
    <p> Currently only known from two widely spaced localities.... </p>
</tp:treatment-sec>
<tp:treatment-sec sec-type="Description">
    <title>Description</title>...
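The @sec-type attribute is what lets software address a section regardless of the title the authors printed. The sketch below indexes treatment sections by @sec-type. It is illustrative only: the namespace URI is assumed, the enclosing <tp:taxon-treatment> follows the deck's earlier sample, and the elided paragraphs are kept as elided.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"  # assumed URI for the "tp:" prefix

SAMPLE = f"""<tp:taxon-treatment xmlns:tp="{TP_NS}">
  <tp:treatment-sec sec-type="Diagnosis">
    <title>Diagnosis</title>
    <p> Most similar to ... </p>
  </tp:treatment-sec>
  <tp:treatment-sec sec-type="Etymology">
    <title>Etymology</title>
    <p> Named in honour of Lubomír Masner, ...</p>
  </tp:treatment-sec>
  <tp:treatment-sec sec-type="Distribution">
    <title>Distribution and habitat association</title>
    <p> Currently only known from two widely spaced localities.... </p>
  </tp:treatment-sec>
</tp:taxon-treatment>"""


def sections_by_type(treatment_el):
    """Index treatment sections by @sec-type, keeping title and paragraph text."""
    sections = {}
    for sec in treatment_el.findall(f"{{{TP_NS}}}treatment-sec"):
        paragraphs = [
            " ".join("".join(p.itertext()).split())  # collapse whitespace
            for p in sec.findall("p")
        ]
        sections[sec.get("sec-type")] = {
            "title": sec.findtext("title", default=""),
            "paragraphs": paragraphs,
        }
    return sections


if __name__ == "__main__":
    for sec_type, content in sections_by_type(ET.fromstring(SAMPLE)).items():
        print(sec_type, "->", content["title"])
```

Note that the "Distribution" section is found by its type even though its printed title is longer.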


Keys

• Identify subordinate taxa within higher taxon (e.g., species in genus)

• No model in TaxPub

• Use existing JATS table model

• Use <ext-link> or <related-object>


Keys, con't

<tp:treatment-sec sec-type="Key">
    <title>Key to species of Nixonia</title>
    <p>Online interactive key...</p>
    <table-wrap>
        <table>
            <tbody>
                <tr content-type="lead">
                    <td><target id="key1">1</target></td>
                    <td>Third antennal segment shorter than, or subequal to, second antennal segment</td>
                    <td><xref>2</xref></td>
                </tr>
                <tr content-type="graphic">
                    <td><graphic xlink:href=""/></td>
                </tr>
            </tbody>
        </table>
    </table-wrap>
</tp:treatment-sec>
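With keys carried in the JATS table model, the only hook for software is the @content-type on each row. The sketch below reads the "lead" rows of such a table into a list of couplet steps. It is illustrative only: the namespace URI is assumed, the three-cell layout (number, statement, destination) is inferred from the single row on the slide, and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"  # assumed URI for the "tp:" prefix

SAMPLE = f"""<tp:treatment-sec xmlns:tp="{TP_NS}" sec-type="Key">
  <title>Key to species of Nixonia</title>
  <table-wrap><table><tbody>
    <tr content-type="lead">
      <td><target id="key1">1</target></td>
      <td>Third antennal segment shorter than, or subequal to, second antennal segment</td>
      <td><xref>2</xref></td>
    </tr>
    <tr content-type="graphic"><td><graphic/></td></tr>
  </tbody></table></table-wrap>
</tp:treatment-sec>"""


def key_leads(sec_el):
    """Return the lead rows of a key as dicts; rows of other types are skipped."""
    leads = []
    for row in sec_el.iter("tr"):
        if row.get("content-type") != "lead":
            continue
        cells = ["".join(td.itertext()).strip() for td in row.findall("td")]
        if len(cells) == 3:  # assumed layout: number, statement, destination
            number, statement, goes_to = cells
            leads.append({"lead": number, "statement": statement, "next": goes_to})
    return leads


if __name__ == "__main__":
    print(key_leads(ET.fromstring(SAMPLE)))
```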


Test Implementations

• “Data-driven” publication
  – OSU Virtual Systematics Lab
  – Database morphological data
  – Export taxon descriptions as TaxPub
• ZooKeys
  – ZooKeys 50
  – Archived by PubMed Central


Status and Future

• SourceForge project

– http://sourceforge.net/projects/taxpub

• Subversion

• Updated Documentation, examples, tools (conversion and profiling)

• Next release December 2010

• Call for comment December 2010

• Version 1: March 2011

• Expand Zoological focus

• Morphology markup

• Vocabularies for type attributes, etc...

• Continued modeling, maintenance infrastructure, hand off...

• Data-driven treatment publication


Reflections, Self-Criticisms, Doubts


Problems, Issues

• “Treatments”

– Undefined

– Conventional, but not Regular

• Zoological focus to date

• Prospective/Retrospective blurry

• Data/Publication

– Scenarios? (XHTML + RDFa, ePub, extraction of data for analysis)

– Inline vs. Linked

– Metadata and Packaging

• Page breaks

– Code requirements

– Citation practices


DTD

• Perceived as “old-fashioned”, “superseded”

• Unfamiliar

• Complex

• Technical Limitations

– Datatypes: (really an issue for taxonomic pubs?)

– Namespaces: (e.g., Keys; existing schemas; embed?)

– Tools, libraries: (processing preferences)

• Embedded XML documentation


Super Set Customization

• Necessary?

• “Structural” elements: <tp:treatment>

– <sec> + @sec-type adequate?

• <tp:nomenclature> has own content model

• Restrictions to enable lower costs of creation/application

• ZooKeys: too restrictive (PCDATA)

• <tp:nomenclature-citation-list>, hard to model in generic JATS

• Otherwise semantic sugar

• <named-content> adequate?

• TaxPub mostly isomorphic with Blue (e.g., ZooKeys > PMC)

• So...why?

• Schema Validation

• Applications (not yet)

• Convenience

• Social/Market value

• Reifies; focuses efforts


Profiling

• Customization is not just Extension files

– Documentation on use of Extension

– Documentation on use of Blue DTD

– Samples

– Tools

• Semantic and Structural Layers

• Use or develop vocabularies for type attributes

– e.g., Darwin Core

– Model and Publish own

– Enumerate in DTD, Schematron

• Express usage rules

– Subset

– Schematron
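A usage rule of the kind a Schematron assertion would express can be prototyped in a few lines while the vocabularies are still being settled. The sketch below flags type-attribute values that fall outside an enumerated list. It is written in Python rather than Schematron purely for illustration. The namespace URI is assumed, and the allowed list and the offending value "epithet" are hypothetical.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"  # assumed URI for the "tp:" prefix

# Hypothetical enumeration; a real profile would publish its own list.
ALLOWED_PART_TYPES = {"dwc:genus", "dwc:specificEpithet"}

# The second part uses a made-up value to trigger the rule.
SAMPLE = f"""<p xmlns:tp="{TP_NS}">
<tp:taxon-name rank="species">
  <tp:taxon-name-part taxon-name-part-type="dwc:genus">Nixonia</tp:taxon-name-part>
  <tp:taxon-name-part taxon-name-part-type="epithet">masneri</tp:taxon-name-part>
</tp:taxon-name>
</p>"""


def unknown_part_types(root, allowed=ALLOWED_PART_TYPES):
    """Return (value, text) for every name part whose type is not enumerated."""
    bad = []
    for part in root.iter(f"{{{TP_NS}}}taxon-name-part"):
        value = part.get("taxon-name-part-type")
        if value not in allowed:
            bad.append((value, (part.text or "").strip()))
    return bad


if __name__ == "__main__":
    for value, text in unknown_part_types(ET.fromstring(SAMPLE)):
        print(f"unrecognized taxon-name-part-type {value!r} on {text!r}")
```

The same list could later be moved into the DTD as an enumerated attribute type or into a Schematron pattern, as the bullets above suggest.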