GEN2PHEN GAM8 meeting Leiden - Identifiers for LSDBs


Uploaded by gudmundur-thorisson on 17 Dec 2014


Page 1

G. A. Thorisson, A. J. Webb, R. Dalgleish ULEIC / J. Muilu FIMM

GEN2PHEN 8th General Assembly Meeting, Leiden, Jan 24-25 2012

Identification of G2P databases - challenges and proposal for a solution


-- Overview --

✴ Identification difficulties - the Knowledge Centre perspective
   – or, why we need persistent identifiers for database resources
✴ Proposal to collaborate with the BioDBCore initiative
   – standardizing registration & description of bio-databases

This work is published under the Creative Commons Attribution license (CC BY: http://creativecommons.org/licenses/by/3.0/) which means that it can be freely copied, redistributed and adapted, as long as proper attribution is given.

Gudmundur A. Thorisson <[email protected]> ULEIC
Adam J. Webb <[email protected]> ULEIC
Raymond Dalgleish <[email protected]> ULEIC
Juha Muilu <[email protected]> FIMM

Friday, 27 January 12

Page 2

[Diagram: Linking resources. Labels: variants c.103C>T, c.321G>T, c.301C>T, c.465A>G, c.555G>T; DB maintainer; Submitter; Person; Resource; Databases; External records / annotations.]

Page 3

URLs are unstable

• Domain names / subdomains can change
  – hgvbaseg2p.org -> gwascentral.org
  – server1.example.com -> server2.example.com
• Paths can change
  – e.g. /LOVD2/ changes to /LOVD3/
• LSDB genes can move
  – e.g. gene ADAM19 moves from one LOVD install to another
• Databases can merge
  – e.g. gene ADAM19 entries on two different installs are reconciled into a single install

http://subdomain.example.com/path/to/resource
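Each part of the locator above is a separate point of failure: if any one of them changes, a stored link breaks. A minimal Python sketch using the standard library's `urllib.parse.urlsplit` to split a URL into those parts and report which ones differ after a move. The function names and the LOVD-style example paths are hypothetical, and the rule "registered domain = last two host labels" is a naive assumption (a host under `le.ac.uk`, for instance, would be mis-split).

```python
from urllib.parse import urlsplit


def url_components(url: str) -> dict[str, str]:
    """Split a URL into the parts that, per the slide, can each change over time."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    return {
        # Naive assumption: the registered domain is the last two host labels.
        "subdomain": ".".join(labels[:-2]),
        "domain": ".".join(labels[-2:]),
        "path": parts.path,
    }


def changed_components(old_url: str, new_url: str) -> list[str]:
    """Names of the components that differ between an old and a new location."""
    old, new = url_components(old_url), url_components(new_url)
    return [name for name in old if old[name] != new[name]]


print(url_components("http://subdomain.example.com/path/to/resource"))
# {'subdomain': 'subdomain', 'domain': 'example.com', 'path': '/path/to/resource'}

# Hypothetical move: a gene page goes to a new server and a new LOVD version path.
print(changed_components("http://server1.example.com/LOVD2/genes/ADAM19",
                         "http://server2.example.com/LOVD3/genes/ADAM19"))
# ['subdomain', 'path']
```

Any non-empty result from `changed_components` means every link that stored the old URL is now stale, which is the case for identifiers that are independent of location.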

Page 4

• Gene name not suitable
  – more than one database can exist for a given gene
• gene.lovd.nl -> returns list of databases (or redirects if only 1 is known)
  – 1 to many
• lovd.nl/gene -> redirects to *one* database
  – 1 to 1, but many resources do not receive identifiers
• These are locators, not identifiers
• Non-gene-based resources
• Ideally the identifier should also operate as the locator (like DOIs via a DOI resolution service)
  – http://dx.doi.org/10.19192 resolves DOI 10.19192

IDENTIFIER -> DATA RESOURCE (1:1)
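The contrast between a gene-name locator (1 to many) and a persistent identifier (1:1) can be sketched in a few lines of Python. Everything here is hypothetical: the install URLs, the `LSDB:0001`-style identifiers and the in-memory tables stand in for whatever registry a real resolution service (as with DOIs) would maintain.

```python
from typing import Union

# Hypothetical gene -> LSDB install table: one gene name may map to several databases.
GENE_INSTALLS = {
    "ADAM19": [
        "http://install-a.example.com/LOVD2/ADAM19",
        "http://install-b.example.com/LOVD2/ADAM19",
    ],
    "COL5A1": ["http://install-a.example.com/LOVD2/COL5A1"],
}

# Hypothetical identifier -> current location table: exactly one location each (1:1).
IDENTIFIER_LOCATIONS = {
    "LSDB:0001": "http://install-a.example.com/LOVD2/ADAM19",
    "LSDB:0002": "http://install-b.example.com/LOVD2/ADAM19",
}


def lookup_gene(gene: str) -> tuple[str, Union[str, list[str]]]:
    """Locator-style lookup by gene name: redirect if one database is known, else a list."""
    urls = GENE_INSTALLS.get(gene, [])
    if len(urls) == 1:
        return ("redirect", urls[0])
    return ("list", urls)


def resolve(identifier: str) -> str:
    """Identifier-style lookup: always exactly one location (KeyError if unregistered)."""
    return IDENTIFIER_LOCATIONS[identifier]


def move(identifier: str, new_url: str) -> None:
    """A database moves: only the registry entry changes, the identifier does not."""
    IDENTIFIER_LOCATIONS[identifier] = new_url


print(lookup_gene("ADAM19")[0])  # list
print(lookup_gene("COL5A1")[0])  # redirect
move("LSDB:0001", "http://install-c.example.com/LOVD3/ADAM19")
print(resolve("LSDB:0001"))  # http://install-c.example.com/LOVD3/ADAM19
```

The gene-name lookup can only ever point at a list or guess one entry, whereas anyone citing `LSDB:0001` keeps a working link after the move because the identifier acts as the locator via the resolver.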

Page 5

Proposal to collaborate with BioDBCore

• BioDBCore aims
  – annotation - organize the bio-database ‘resourceome’
  – discovery - e.g. which protein sequence databases are available?
• Who’s behind it?
  – International Society for Biocuration
  – Resource catalogues: Bioinformatics Links, BioSiteMaps, NAR Database Issue etc.
  – Working group includes reps from NAR and DATABASE journals, MIBBI, model organism databases, CASIMIR mouse informatics consortium, others
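The "discovery" aim boils down to querying structured descriptions of databases rather than browsing lists. A minimal Python sketch, assuming a hypothetical record type whose fields (`name`, `main_url`, `data_types`) are an illustrative subset and not the actual BioDBCore attribute list. Apart from the Ehlers-Danlos database mentioned elsewhere in this talk, the records are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseRecord:
    """Hypothetical subset of the metadata a catalogue might hold for one database."""
    name: str
    main_url: str
    data_types: set[str] = field(default_factory=set)


def find_by_data_type(records: list[DatabaseRecord], data_type: str) -> list[str]:
    """Discovery query: sorted names of databases that list the given data type."""
    return sorted(r.name for r in records if data_type in r.data_types)


RECORDS = [
    DatabaseRecord("Ehlers-Danlos Syndrome Variant Database",
                   "https://eds.gene.le.ac.uk", {"sequence variants"}),
    # Placeholder records, invented for illustration only.
    DatabaseRecord("Example Protein DB A", "http://prot-a.example.org",
                   {"protein sequences"}),
    DatabaseRecord("Example Protein DB B", "http://prot-b.example.org",
                   {"protein sequences", "protein structures"}),
]

print(find_by_data_type(RECORDS, "protein sequences"))
# ['Example Protein DB A', 'Example Protein DB B']
```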

Page 6

Page 7

Persistent resource identifiers in BioDBCore

• They plan to use the MIRIAM Registry / identifier resolution service
  – unique, persistent and unambiguous identification of various kinds of concepts
  – e.g. http://identifiers.org/ec-code/1.1.1.1
  – e.g. http://identifiers.org/pubmed/16333295
  – e.g. http://identifiers.org/doi/10.1038/nbt1156
• Decouples identification from location
• Many resources are already registered with MIRIAM
• Operated by EBI <-- long-term sustainability prospect
• Adoption by players in the life-sciences (LS) Semantic Web community
  – URIs for identifying entities in biological information represented in RDF
  – http://lsrn.org, Shared Names, Bio2RDF, others
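The example URIs share one pattern, resolver host / collection name / local identifier, and that pattern is what decouples identification from location: only the resolver needs to know where a record currently lives. A minimal Python sketch of building and splitting URIs in the path style shown on the slide; it does not contact the resolver, the function names are hypothetical, and the live service's syntax may have changed since 2012.

```python
from urllib.parse import urlsplit

RESOLVER_HOST = "identifiers.org"


def make_uri(collection: str, local_id: str) -> str:
    """Build a resolver URI in the path style shown on the slide."""
    return f"http://{RESOLVER_HOST}/{collection}/{local_id}"


def parse_uri(uri: str) -> tuple[str, str]:
    """Split a resolver URI into (collection, local identifier)."""
    parts = urlsplit(uri)
    if parts.hostname != RESOLVER_HOST:
        raise ValueError(f"not an {RESOLVER_HOST} URI: {uri}")
    # Split on the first "/" only, because local IDs such as DOIs contain slashes.
    collection, _, local_id = parts.path.lstrip("/").partition("/")
    if not collection or not local_id:
        raise ValueError(f"missing collection or local identifier: {uri}")
    return collection, local_id


for uri in ("http://identifiers.org/ec-code/1.1.1.1",
            "http://identifiers.org/pubmed/16333295",
            "http://identifiers.org/doi/10.1038/nbt1156"):
    print(parse_uri(uri))
# ('ec-code', '1.1.1.1')
# ('pubmed', '16333295')
# ('doi', '10.1038/nbt1156')
```

Splitting only on the first slash is the key design choice: a DOI such as `10.1038/nbt1156` must survive as a single local identifier.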

Page 8

How might this work?

• Using database URIs - plausible scenario
  – Persistent canonical URI: http://identifiers.org/biodbcore/10235900
  – Click URL, browser redirects to http://biodbcore.org/resource/10235900
  – BioDBCore metadata record for the database (akin to the “landing page” on an online journal site)
• BioDBCore “landing page” presents database metadata
  – Information *about* the “thing”
  – Name: Ehlers-Danlos Syndrome Variant Database
  – Main resource URL: https://eds.gene.le.ac.uk <-- the “thing” itself
  – [scope, data standards, other metadata]
• Location of database = the “thing” itself
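The scenario separates three things: the persistent identifier, a landing page that describes the database, and the database's actual location. A minimal Python sketch of that chain, assuming in-memory tables in place of real HTTP redirects and a real metadata store; the table and function names are hypothetical, and only the URLs and the database name come from the slide.

```python
# Hypothetical redirect table standing in for HTTP redirects between services.
REDIRECTS = {
    "http://identifiers.org/biodbcore/10235900":
        "http://biodbcore.org/resource/10235900",
}

# Hypothetical landing-page metadata, keyed by landing-page URL.
LANDING_PAGES = {
    "http://biodbcore.org/resource/10235900": {
        "name": "Ehlers-Danlos Syndrome Variant Database",
        "main_url": "https://eds.gene.le.ac.uk",
    },
}


def follow_redirects(uri: str, max_hops: int = 5) -> str:
    """Follow at most max_hops redirects and return the final URL."""
    hops = 0
    while uri in REDIRECTS:
        if hops == max_hops:
            raise RuntimeError("too many redirects")
        uri = REDIRECTS[uri]
        hops += 1
    return uri


def locate_database(canonical_uri: str) -> str:
    """Canonical URI -> landing page (metadata about the thing) -> the thing itself."""
    landing = follow_redirects(canonical_uri)
    return LANDING_PAGES[landing]["main_url"]


print(locate_database("http://identifiers.org/biodbcore/10235900"))
# https://eds.gene.le.ac.uk
```

If the database later moves, only `main_url` in its metadata record needs updating; the canonical URI that people cite stays the same.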

Page 9

Mutual benefits

• To GEN2PHEN / G2P community
  – Identification - slot into the resource identifier scheme for bio-databases globally, and build more detailed catalogues & annotation systems around this
  – Discovery - finding relevant LSDBs and other G2P resources via a range of search/query tools outside the KC or LSDB lists
  – BioDBCore could possibly evolve into a sort of live “database publishing platform”, instead of the static “snapshot” of a conventional paper
• To BioDBCore initiative
  – Acquire an entire category’s worth of metadata records & link to community
  – Extra pairs of eyes on what they’re doing, alternative perspective
  – Potential for further collaboration on contribution-tracking tools & ORCID integration

Page 10

Open questions, known unknowns etc.

• BioDBCore is quite new; many things remain in flux
  – e.g. the MIRIAM / identifiers.org technical details are vague
• DOIs for BioDBCore records - register database DOIs for fuller integration into the publishing process?
• How will this work with existing LSDB lists?

Page 11

G. A. Thorisson, ULEIC


Acknowledgements

GEN2PHEN Consortium

http://www.gen2phen.org/about-gen2phen/partners

Prof Anthony J. Brookes Bioinformatics Group, Leicester


This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.

Contact me!

<[email protected]> | <[email protected]>
http://www.linkedin.com/in/mummi
http://www.twitter.com/gthorisson
http://www.gthorisson.name

Published under the CC BY license (http://creativecommons.org/licenses/by/3.0/)
