BioDBCore: Current Status and Next Developments
Presented at the BioHackathon 2013, Tokyo, Japan
Pascale Gaudet
Chair, International Society for Biocuration
Scientific Manager, neXtProt, SIB Swiss Institute of Bioinformatics
International Society for Biocuration: Mission statement
• Define and promote the work of biocurators
• Foster connections with user communities to ensure that databases and accompanying tools meet specific user needs
• Promote communication and exchanges between curators (meetings, workshops)
• Encourage best practices by providing documentation on standards and annotation procedures
The need
• Databases: improve integration of data from published papers
• Journals: link to database objects
• Researchers: identify resources
• Grant submitters: enforce data sharing plans
Goals
1) Gather the information required to provide a general overview of the database landscape and to compare the various resources
2) Encourage consistency and interoperability
3) Promote the use of standards
4) Provide guidance for users
5) Maximize the collective impact of the resources
BioDBCore group organization
• Led by Pascale Gaudet (ISB/SIB) and Philippe Rocca-Serra (BioSharing)
• Guidelines proposed in a 2011 paper
• Implemented in the 2012 NAR database issue
Use cases
• Show all resources of type database that use the MIMARKS guidelines
• Show all resources in which John Smith is involved
• Show all resources for mouse phenotypes
• Where can I submit my data?
And also:
• Guidance for grants' data sharing policies
• Improving integration of data from papers into databases
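The first four use cases are structured queries over a collection of resource descriptions. The sketch below shows that idea under loudly stated assumptions: the `Record` class, its field names, the `find` helper and the example entries are all hypothetical and invented for illustration; they are not the BioDBCore data model or any BioSharing API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """Hypothetical, simplified registry record (not the BioDBCore schema)."""
    name: str
    resource_type: str
    standards: set = field(default_factory=set)
    people: set = field(default_factory=set)
    scope: set = field(default_factory=set)
    accepts_submissions: bool = False


def find(records, *, resource_type: Optional[str] = None, standard: Optional[str] = None,
         person: Optional[str] = None, scope: Optional[str] = None,
         accepts_submissions: Optional[bool] = None):
    """Return the names of records matching every criterion that is not None."""
    hits = []
    for r in records:
        if resource_type is not None and r.resource_type != resource_type:
            continue
        if standard is not None and standard not in r.standards:
            continue
        if person is not None and person not in r.people:
            continue
        if scope is not None and scope not in r.scope:
            continue
        if accepts_submissions is not None and r.accepts_submissions != accepts_submissions:
            continue
        hits.append(r.name)
    return hits


# Hypothetical example records, invented for illustration only.
EXAMPLE_RECORDS = [
    Record("ExampleDB-A", "database", {"MIMARKS"}, {"John Smith"}, {"marker gene sequences"}, True),
    Record("ExampleDB-B", "database", {"MIAME"}, {"Jane Doe"}, {"mouse phenotypes"}, True),
    Record("ExampleTool-C", "tool", {"MIMARKS"}, {"John Smith"}, set(), False),
]

if __name__ == "__main__":
    print(find(EXAMPLE_RECORDS, resource_type="database", standard="MIMARKS"))  # ['ExampleDB-A']
    print(find(EXAMPLE_RECORDS, person="John Smith"))  # ['ExampleDB-A', 'ExampleTool-C']
    print(find(EXAMPLE_RECORDS, scope="mouse phenotypes", accepts_submissions=True))  # ['ExampleDB-B']
```

Each use case becomes a single call; "Where can I submit my data?" is modelled here as a scope filter combined with a hypothetical `accepts_submissions` flag.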
Collaborative philosophy
• Many groups and resources have been providing registries and lists of databases
• These are often unfunded and not maintained
• BioDBCore seeks to work with all interested parties to provide a more permanent solution for describing databases
BioDBCore: Participating groups
• BioDB100
• BioSharing
• BioCatalogue
• Bioinformatics Links Directory
• Biositemaps
• CASIMIR
• MIBBI
• MIRIAM
• Model Organism Databases
• NIF registry
• … and your group!
BioDBCore descriptors
1. Database name
2. Main resource URL
3. Contact information (e-mail; postal mail)
4. Date resource established (year)
5. Conditions of use (free, or type of license)
6. Scope: data types captured, curation policy, standards used
7. Standards: MIs, data formats, terminologies
8. Taxonomic coverage
9. Data accessibility/output options
10. Data release frequency
11. Versioning policy and access to historical files
12. Documentation available
13. User support options
14. Data submission policy
15. Relevant publications
16. Resource's Wikipedia URL
17. Tools available
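The 17 descriptors map naturally onto a typed record with one field per descriptor. A minimal sketch, assuming Python 3.7+; the class name `BioDBCoreRecord`, the field names and the `filled_descriptors` helper are our own illustrative shorthand, not an official BioDBCore schema.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class BioDBCoreRecord:
    """Hypothetical record: one field per BioDBCore descriptor."""
    name: str                                        # 1. Database name
    url: str                                         # 2. Main resource URL
    contact: str = ""                                # 3. Contact information
    year_established: Optional[int] = None           # 4. Date resource established
    conditions_of_use: str = ""                      # 5. Free, or type of license
    scope: str = ""                                  # 6. Data types, curation policy, standards used
    standards: List[str] = field(default_factory=list)           # 7. MIs, formats, terminologies
    taxonomic_coverage: List[int] = field(default_factory=list)  # 8. NCBI Taxids
    access_options: List[str] = field(default_factory=list)      # 9. Accessibility/output options
    release_frequency: str = ""                      # 10. Data release frequency
    versioning_policy: str = ""                      # 11. Versioning and historical files
    documentation: str = ""                          # 12. Documentation available
    user_support: List[str] = field(default_factory=list)        # 13. User support options
    submission_policy: str = ""                      # 14. Data submission policy
    publications: List[str] = field(default_factory=list)        # 15. Relevant publications
    wikipedia_url: str = ""                          # 16. Resource's Wikipedia URL
    tools: List[str] = field(default_factory=list)               # 17. Tools available


def filled_descriptors(record: BioDBCoreRecord) -> List[str]:
    """Return the names of descriptors that carry a non-empty value, in descriptor order."""
    return [f.name for f in fields(record) if getattr(record, f.name) not in ("", None, [])]


if __name__ == "__main__":
    r = BioDBCoreRecord("dictyBase", "http://dictybase.org",
                        year_established=2003, taxonomic_coverage=[44689])
    print(filled_descriptors(r))  # ['name', 'url', 'year_established', 'taxonomic_coverage']
```

Only the name and URL are mandatory in this sketch; every other descriptor defaults to empty, which makes it easy to see how complete a record is.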
Database name: dictyBase
Main resource URL: http://dictybase.org
Contact information: [email protected]
Date resource established (year): 2003
Conditions of use: Free
Scope (data types captured): genome sequence; gene models including CDS and predicted proteins; phenotypes; Gene Ontology annotations; functional annotation (gene product names); gene nomenclature; strains; plasmids; free-text descriptions; domains (via InterPro); orthologs (via OrthoMCL and InParanoid); protein subcellular location (via Swiss-Prot); protein existence (via Swiss-Prot); citations; researchers database
Scope (curation policy): manual curation
Standards (terminologies): Gene Ontology, Dicty Anatomy Ontology, Dicty Gene Nomenclature
Standards (data formats): FASTA, OBO, GAF, GFF3 (standard)
Taxonomic coverage (NCBI Taxid): D. discoideum (44689), including all strains [PRIMARY]; also some genome/EST/gene model information for D. purpureum (5786), and gene model sequences for P. pallidum (13642) and D. fasciculatum (261658)
Data accessibility/output options: HTML, text, database reports
Data release frequency: curators work on the 'live' database; data dumps are weekly (sequences) or monthly (other data)
Versioning policy/access to historical files: no versioning, but historical files are accessible
Documentation available: http://dictybase.org/FAQ/HelpFilesIndex.html
User support options: documents, email, web form
Data submission policy: data from the published literature; some high-throughput (HTP) data corresponding to published analyses is incorporated
Relevant publications: PMID: 18974179, PMID: 14681427
Resource's Wikipedia URL: http://en.wikipedia.org/wiki/DictyBase
Tools available: BLAST, BioMart, Generic Genome Browser, Textpresso, MetaCyc (dictyCyc)
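Because each descriptor in the dictyBase entry is a simple key/value pair, the entry can be exchanged as plain JSON. A sketch using only the standard library; it holds a subset of the values shown above, and the key names, the nested taxonomic-coverage structure and the `primary_taxids` helper are illustrative assumptions, not the format BioDBCore or BioSharing actually uses.

```python
import json

# Subset of the dictyBase entry above; key names and nesting are hypothetical.
DICTYBASE = {
    "name": "dictyBase",
    "url": "http://dictybase.org",
    "year_established": 2003,
    "conditions_of_use": "Free",
    "curation_policy": "manual curation",
    "data_formats": ["FASTA", "OBO", "GAF", "GFF3"],
    "taxonomic_coverage": [
        {"species": "D. discoideum", "ncbi_taxid": 44689, "primary": True},
        {"species": "D. purpureum", "ncbi_taxid": 5786, "primary": False},
        {"species": "P. pallidum", "ncbi_taxid": 13642, "primary": False},
        {"species": "D. fasciculatum", "ncbi_taxid": 261658, "primary": False},
    ],
    "publications": ["PMID:18974179", "PMID:14681427"],
}


def primary_taxids(entry):
    """Return the NCBI Taxids flagged as the resource's primary organisms."""
    return [t["ncbi_taxid"] for t in entry["taxonomic_coverage"] if t["primary"]]


if __name__ == "__main__":
    print(json.dumps(DICTYBASE, indent=2))
    print(primary_taxids(DICTYBASE))  # [44689]
```

Storing taxonomic coverage as NCBI Taxids rather than free text is what makes a query such as "all resources covering taxon 44689" straightforward.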
Implementation of BioDBCore at BioSharing (many thanks to Philippe Rocca-Serra!)
BioDBcore announcement
Published in the 2011 Nucleic Acids Research database issue and in the journal DATABASE
Implementation plan
• Goal: make BioDBCore data public and linked
• Community-aware approach: reuse existing work
• Current data model: RDF, based on categories from Biositemaps, MIRIAM, NIF, Dublin Core and Darwin Core
• Defined extension mechanisms
www.biodbcore.org
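The implementation plan describes an RDF model built on existing vocabularies. The sketch below serializes a few descriptors as N-Triples, using the Dublin Core terms `title` and `created` and the Darwin Core term `taxonID`. The choice of properties and the `http://example.org/biodbcore#` namespace are assumptions for illustration, not the actual BioDBCore vocabulary; it uses only the standard library so it runs as-is, whereas a production system would more likely rely on an RDF library.

```python
DCTERMS = "http://purl.org/dc/terms/"
DWC = "http://rs.tdwg.org/dwc/terms/"
BDC = "http://example.org/biodbcore#"  # hypothetical namespace for illustration
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"


def literal(value, datatype=None):
    """Render an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def triple(subject, predicate, obj):
    """Render one N-Triples statement; subject and predicate are IRIs."""
    return f"<{subject}> <{predicate}> {obj} ."


def to_ntriples(url, name, year, taxids, formats):
    """Describe one resource, identified by its main URL, as N-Triples text."""
    lines = [
        triple(url, DCTERMS + "title", literal(name)),
        triple(url, DCTERMS + "created", literal(year, XSD_GYEAR)),
    ]
    lines += [triple(url, DWC + "taxonID", literal(t)) for t in taxids]
    lines += [triple(url, BDC + "dataFormat", literal(f)) for f in formats]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_ntriples("http://dictybase.org", "dictyBase", 2003, [44689], ["FASTA", "GFF3"]))
```

N-Triples was chosen here because it is line-based and trivial to emit and merge, which suits exchanging records among collaborating groups.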
Example BioDBCore entry (two slides)
Creating, editing and maintaining entries
• Until now, records have been created manually from data provided by NAR on publication of its Database issue and by the Life Sciences Registry (Michel Dumontier and Nick Juty)
  - These mostly arrive as .xls files that must be entered by hand
  - Close to 200 records have been entered out of the more than 2,000 obtained
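Manual entry from spreadsheets is the bottleneck described above. A minimal sketch of an automated import, assuming the .xls sheets are first exported to CSV (the standard `csv` module cannot read .xls directly) and that the column headers match the descriptor names; the `COLUMN_MAP` headers and the `rows_to_records` helper are hypothetical.

```python
import csv
import io

# Hypothetical mapping from spreadsheet column headers to record keys.
COLUMN_MAP = {
    "Database name": "name",
    "Main resource URL": "url",
    "Contact information": "contact",
    "Date resource established (year)": "year_established",
}


def rows_to_records(csv_text):
    """Convert CSV rows into record dicts; return (records, rejected row numbers)."""
    records, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so data rows start at 2 (assumes no multi-line cells).
    for row_no, row in enumerate(reader, start=2):
        record = {key: (row.get(col) or "").strip() for col, key in COLUMN_MAP.items()}
        if not record["name"] or not record["url"]:
            rejected.append(row_no)  # flag for manual review instead of guessing
            continue
        year = record["year_established"]
        record["year_established"] = int(year) if year.isdigit() else None
        records.append(record)
    return records, rejected


if __name__ == "__main__":
    sample = ("Database name,Main resource URL,Contact information,Date resource established (year)\n"
              "dictyBase,http://dictybase.org,,2003\n"
              ",http://example.org,,\n")
    print(rows_to_records(sample))
```

Rows missing a name or URL are reported rather than dropped silently, so a curator only needs to look at the exceptions.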
Beyond maintenance at BioSharing
Ideally, database providers would keep their own BioDBCore records up to date
• Claim ownership
  - A database provider can now (in theory) maintain its own BioDBCore record
Encouraging best practices
• DATABASE and Nucleic Acids Research journals: the editors-in-chief request BioDBCore information from submitters
• ISB seal of approval
• BioDB100, launched at InCoB 2011: examples of 100 well-annotated databases
What's next?
• Continue to extend participating groups and journals
• Refine scope
• Integrate semantic support
• Develop querying system
• Implement validation tests
• Set up mechanisms for exchange of data among collaborating groups (in BioDBCore RDF format or other formats)
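Validation tests could start with simple structural checks on each record. A sketch, assuming records are plain dicts with hypothetical keys; the `REQUIRED` set and the 1950 lower bound for the establishment year are arbitrary illustrative choices, not BioDBCore rules.

```python
import datetime
from urllib.parse import urlparse

REQUIRED = ("name", "url", "contact", "conditions_of_use")  # hypothetical minimum set


def validate(record):
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    for key in REQUIRED:
        if not record.get(key):
            problems.append(f"missing descriptor: {key}")
    url = record.get("url")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"invalid URL: {url}")
    year = record.get("year_established")
    if year is not None:
        this_year = datetime.date.today().year
        if not isinstance(year, int) or not 1950 <= year <= this_year:  # 1950: arbitrary bound
            problems.append(f"implausible year: {year}")
    for taxid in record.get("taxonomic_coverage", []):
        if not isinstance(taxid, int) or taxid <= 0:
            problems.append(f"invalid NCBI taxid: {taxid}")
    return problems


if __name__ == "__main__":
    print(validate({"name": "X", "url": "dictybase.org", "year_established": 3000}))
```

Returning a list of problems rather than raising on the first one lets a curator fix a record in a single pass.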
Identifying or developing semantic support
• Policies and guidelines: BioSharing
• Publications and taxon information: identifiers.org
• Authors: ORCID (organizations to be implemented as well)
• Keywords/database scope: NIF when possible
Identifying existing resources is preferable to developing new ones!
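Reusing identifier schemes mostly means storing identifiers in a resolvable, checkable form. The sketch below builds identifiers.org URIs for taxa and publications, assuming the compact `prefix:accession` URL pattern with the `taxonomy` and `pubmed` prefixes, and verifies the ORCID check character, which ORCID specifies as ISO 7064 MOD 11-2. The helper names are our own.

```python
import re

ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def identifiers_org_uri(prefix, accession):
    """Build an identifiers.org URI from a compact identifier (assumed URL pattern)."""
    return f"https://identifiers.org/{prefix}:{accession}"


def orcid_check_digit(base_digits):
    """ISO 7064 MOD 11-2 check character over the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if the iD has the hyphenated 16-character form and a correct check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


if __name__ == "__main__":
    print(identifiers_org_uri("taxonomy", 44689))   # https://identifiers.org/taxonomy:44689
    print(identifiers_org_uri("pubmed", 18974179))  # https://identifiers.org/pubmed:18974179
    print(is_valid_orcid("0000-0002-1825-0097"))    # True (ORCID's documented example iD)
```

The checksum catches most single-digit typos in author identifiers before they reach a record.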
For BioHackathon 2013
• Evaluate the need for BioDBCore in today's landscape of metadatabase resources
• Evaluate further collaboration opportunities
• Set up a better system for creating and maintaining BioDBCore records
• Identify/develop ontologies pertinent to BioDBCore
Acknowledgements
• Philippe Rocca-Serra, Susanna-Assunta Sansone, Eamonn Maguire, Alejandra Gonzalez Beltran
• International Society for Biocuration
• Michael Galperin, David Landsman, Francis Ouellette
• Oxford University Press
• Collaborators