The Diversity of Biomedical Data, Databases and Standards (Research Data Alliance (RDA) 8th Plenary)
The Diversity of Biomedical Data, Databases and Standards
Peter McQuilton, BioSharing Content Lead
https://www.biosharing.org
@biosharing
IG Elixir Bridging Force, WG Biosharing Registry, WG Data Type Registries, WG Metadata Standards Catalog
International Data Week, RDA, Denver, 15th September, 2016
A growth in data, a growth in databases, a growth in standards
Number of databases in the NAR database issue, up to 2015 (from @AlexBateman1)
• Data/content standards:
  • Structure, enrich and report the description of the datasets and the experimental context under which they were produced
  • Facilitate the discovery, sharing, understanding and reuse of datasets
  • Ensure all digital research outputs are Findable, Accessible, Interoperable and Reusable (FAIR)
Data has to be structured for sharing – we need standards
Content standards – enablers
Guidelines: Minimum information reporting requirements, checklists
o Report the same core, essential information
o e.g. MIAME guidelines
Terminologies: Controlled vocabularies, taxonomies, thesauri, ontologies etc.
o Use the same word and refer to the same ‘thing’
o e.g. Gene Ontology
Formats: Conceptual model, conceptual schema, exchange formats etc.
o Allow data to flow from one system to another
o e.g. FASTA
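To illustrate how a shared exchange format lets data flow from one system to another, here is a minimal sketch of a FASTA reader. It assumes the common convention that each record starts with a ">" header line followed by one or more sequence lines. The function name `parse_fasta` and the sample records are hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier (first token of the '>' header) to its sequence."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the identifier is the first whitespace-delimited token.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[current_id] += line
    return records


# Hypothetical sample data, not taken from any real database.
sample = """>seq1 example record
ACGTAC
GTTA
>seq2
TTGACA
"""
parsed = parse_fasta(sample.splitlines())
```

Any tool that agrees on this layout can read the same file, which is the point of a format standard: the producer and the consumer need no knowledge of each other.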
de jure: standard organizations
de facto: grass-roots groups
Nanotechnology Working Group
Over 700 content standards in biomedical sciences
Guidelines: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
Formats: MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …
Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
Technologically-focused content standards
Biologically-focused content standards
Even if common features exist, e.g.:
- description of source biomaterial
- experimental design components
these are inconsistently duplicated
Technologically-focused: transcriptomics, proteomics, metabolomics (Arrays, Scanning, Columns, Gels, MS, FTIR, NMR)
Biologically-focused: plant biology, epidemiology, microbiology
Diversity in Standards
What is BioSharing?
A web-based, curated and searchable portal that monitors the development and evolution of standards, their use in databases and the adoption of both in data policies, to inform and educate the user community.
Standards are digital objects too and we make them FAIR
Data policies by funders, journals and other organizations (>100)
Databases, tools and services (>1000)
Content standards (>700)
Complex and evolving landscape
Formats Terminologies Guidelines
Working with and for the community
NCBI Taxon
~1400 tags
Some hierarchy
Synonyms
4 axes:
- Process
- Material
- Datatype
- Property
What data do we capture?
Collections group together one or more types of resource by domain, project or organization.
Recommendations are a core set of resources that are selected and recommended by a funder or journal data policy.
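The two groupings described above can be sketched as a small data model. This is an illustration only: the class names, field names and the example entries `ExampleDB` and `Example Journal` are hypothetical and do not reflect BioSharing's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Record:
    name: str
    kind: str  # assumed values: "standard", "database" or "policy"


@dataclass
class Collection:
    """Groups one or more types of resource by domain, project or organization."""
    name: str
    grouped_by: str  # "domain", "project" or "organization"
    records: List[Record] = field(default_factory=list)

    def resource_types(self) -> List[str]:
        # Distinct resource types present in this grouping, sorted for stable output.
        return sorted({r.kind for r in self.records})


@dataclass
class Recommendation(Collection):
    """A core set of resources selected by a funder or journal data policy."""
    recommended_by: str = ""


# Hypothetical example; the journal and database names are placeholders.
journal_set = Recommendation(
    name="Example journal recommendations",
    grouped_by="organization",
    records=[
        Record("MIAME", "standard"),
        Record("Gene Ontology", "standard"),
        Record("ExampleDB", "database"),
    ],
    recommended_by="Example Journal",
)
types = journal_set.resource_types()
```

Modelling a Recommendation as a Collection with a named recommender is one possible design choice: it keeps both groupings queryable in the same way while recording which policy made the selection.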
Grouping records for different use cases
“BioSharing and its interactive browser will allow us to discover which databases and standards are not currently included in our author guidelines, enabling us to regularly monitor and refine our policies as appropriate, in support of our mission to help our authors enhance the reproducibility of their work.” – Holly Murray, F1000Research
Advisory Board
Operational Team