Drug Discovery, ELRIG, 2012
Posted on 27-Jan-2015
Alejandra González-Beltrán, PhD
Senior Software Engineer, ISA Team, University of Oxford e-Research Centre, Oxford, UK
Drug Discovery 2012, Manchester, UK, September 6-7
Community standards for reproducible and reusable research: fundamentals and challenges
Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-155 (2009) doi:10.1038/ng.295
Roadmap
Reproducible & Reusable Bioscience Research
Well-annotated & Structured Data
reasoning, analysis, exchange, integration, visualization, browsing, retrieval
Principles & Challenges
Community Standards, Software Tools
Source of the figure: EBI website
Bioscience is multi-domain…
§ Interdisciplinary and integrative in character
• need to deal with new and existing datasets
• deal with a variety of data types
tox/pharma, env, health, agro
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
From reusable data to reproducible research
To make datasets comprehensible and interoperable, so that they can underpin future investigations, we need common ways to report and share the experimental details and the associated results.
Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs.
Community Standards
Different communities, different norms and standards, e.g.:
• report the same core, essential information
• use the same term to refer to the same 'thing'
• allow data to flow from one system to another
Challenges: lack of interaction and coordination, duplication of effort, fragmentation and uneven coverage, all of which hinder interoperability.
GIATE: Guidelines for Information About Therapy Experiments
Therapeutic Investigation: Generic Model; Molecular, Cellular, Animal and Clinical Models
Growing number of bioscience reporting standards
Guidelines: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT, GIATE
Formats: MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, mzML, SBRML, SED-ML…, GelML, ISA-Tab, CML, MITAB
Terminologies: VO, AAO, ChEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO
130+ | Estimated 150+ (Source: MIBBI, EQUATOR) | 303+ (Source: BioPortal)
Databases, annotation and curation tools
But… what do we know about them, and how are they related?
• Which ones are mature enough for me to use or recommend?
• I work on plants; are these just for biomedical applications?
• What are the criteria to evaluate their status and value?
• How can I get involved to propose extensions or modifications?
• Which tools and databases implement which standards?
• I use high-throughput sequencing technologies; which ones are relevant to me?
• Which formats support specific minimum information guidelines?
A coherent, curated and searchable catalogue of data sharing resources
• Bioscience standards and associated data-sharing policies, publications, tools and databases
• Assessment criteria for usability and popularity of standards
• Relationships among standards
• Encouragement for communication & interaction among groups
• Promoting interoperability & informed decisions about standards
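To make the idea of a catalogue with relationships among standards more concrete, here is a minimal Python sketch of how such records could be filtered to answer questions like "which formats support a given minimum information guideline?". The `CatalogueEntry` class, the `find` function and the relation name "supports" are hypothetical, and the entries are illustrative only; they are not taken from BioSharing or any other registry.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One record in a hypothetical catalogue of data-sharing resources."""
    name: str
    kind: str  # e.g. "guideline", "format", "terminology", "tool", "database"
    domains: set[str] = field(default_factory=set)
    # Relationship name -> names of related entries, e.g. {"supports": {"MIAME"}}
    related: dict[str, set[str]] = field(default_factory=dict)


def find(entries: list[CatalogueEntry], kind: str | None = None,
         domain: str | None = None, relation: str | None = None,
         target: str | None = None) -> list[str]:
    """Return the names of entries that match every filter that is given."""
    matches = []
    for entry in entries:
        if kind is not None and entry.kind != kind:
            continue
        if domain is not None and domain not in entry.domains:
            continue
        if relation is not None and target not in entry.related.get(relation, set()):
            continue
        matches.append(entry.name)
    return matches


# Illustrative entries only; not copied from any real registry.
catalogue = [
    CatalogueEntry("MIAME", "guideline", {"transcriptomics"}),
    CatalogueEntry("MAGE-Tab", "format", {"transcriptomics"}, {"supports": {"MIAME"}}),
    CatalogueEntry("OBI", "terminology", {"transcriptomics", "proteomics"}),
]

print(find(catalogue, kind="format", relation="supports", target="MIAME"))  # ['MAGE-Tab']
```

Keeping relationships as named links between entries, rather than as free-text notes, is what lets the same records answer several of the questions listed above with one generic query.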
Standards compliance is challenging…
Is it possible to achieve a common, structured representation of diverse bioscience experiments that:
• transcends individual bioscience domains, but also
• follows the appropriate community norms and standards?
Structured description of datasets
§ Capture all salient features of the experimental workflow
§ Make annotation explicit and discoverable
§ Structure the descriptions for consistency, tracking independent and dependent variables and using resolvable identifiers and cross-references
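As a rough illustration of separating independent variables from dependent variables and attaching resolvable identifiers, the Python sketch below models a single sample description. The class and field names are hypothetical and not part of ISA or any standard; the OBO PURL pattern used to resolve identifiers is an assumption that holds for OBO-style terms such as NCBITaxon, but other vocabularies would need their own resolver.

```python
from __future__ import annotations

from dataclasses import dataclass, field

OBO_PURL = "http://purl.obolibrary.org/obo/"  # assumed resolver for OBO-style identifiers


@dataclass(frozen=True)
class OntologyTerm:
    """A human-readable label paired with a compact identifier (CURIE)."""
    label: str
    curie: str  # e.g. "NCBITaxon:9606"

    def iri(self) -> str:
        # Assumes the OBO convention PREFIX:LOCAL -> <purl>PREFIX_LOCAL
        prefix, _, local = self.curie.partition(":")
        return f"{OBO_PURL}{prefix}_{local}"


@dataclass
class SampleDescription:
    """Hypothetical record separating what was varied from what was measured."""
    sample_name: str
    independent_variables: dict[str, str | OntologyTerm] = field(default_factory=dict)
    dependent_variables: dict[str, float] = field(default_factory=dict)
    cross_references: list[str] = field(default_factory=list)  # e.g. repository accessions

    def unannotated_factors(self) -> list[str]:
        """Names of independent variables still recorded as free text."""
        return [name for name, value in self.independent_variables.items()
                if not isinstance(value, OntologyTerm)]


sample = SampleDescription(
    sample_name="sample1",
    independent_variables={
        "organism": OntologyTerm("Homo sapiens", "NCBITaxon:9606"),
        "dose": "10 mg/kg",  # free text: no resolvable identifier yet
    },
    dependent_variables={"measured_value": 2.5},
)
print(sample.unannotated_factors())  # ['dose']
print(sample.independent_variables["organism"].iri())
```

A check like `unannotated_factors` is one simple way to make missing annotation visible before a dataset is shared.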
§ We must strike a balance between sufficiency and practicability:
• depth and breadth of information
• burden to produce and maintain the information
Not too much, not too little, just 'right'
Metadata tracking framework, designed to support the use of several standard checklists and terminologies, and conversion to (a growing number of) other metadata formats used by public repositories, e.g. MAGE-Tab, Pride-xml, SRA-xml and SOFT
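To show what a tabular metadata tracking format involves in practice, here is a minimal Python sketch that reads a tab-separated study table in the style of ISA-Tab, where an annotated column such as `Characteristics[organism]` is followed by `Term Source REF` and `Term Accession Number` columns. This is a simplified assumption about the layout for illustration, not the ISA tools' own parser; the function name `parse_study_table` is hypothetical.

```python
from __future__ import annotations

import csv
import io

QUALIFIERS = ["Term Source REF", "Term Accession Number"]


def parse_study_table(text: str) -> list[dict[str, dict[str, str]]]:
    """Parse a tab-separated, ISA-Tab-like study table into annotated rows.

    Simplifying assumption: a column may be immediately followed by
    'Term Source REF' and 'Term Accession Number' columns that qualify
    its value with an ontology term.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, records = rows[0], rows[1:]
    parsed = []
    for record in records:
        if not record:
            continue  # skip blank lines
        row: dict[str, dict[str, str]] = {}
        for i, name in enumerate(header):
            if name in QUALIFIERS:
                continue  # consumed together with the column to their left
            entry = {"value": record[i]}
            # Qualifier headers repeat, which is why csv.DictReader is not used
            if header[i + 1:i + 3] == QUALIFIERS:
                entry["term_source"] = record[i + 1]
                entry["accession"] = record[i + 2]
            # Note: other repeated headers (e.g. several Protocol REF columns)
            # would overwrite each other in this sketch.
            row[name] = entry
        parsed.append(row)
    return parsed


study = (
    "Source Name\tCharacteristics[organism]\tTerm Source REF\t"
    "Term Accession Number\tSample Name\n"
    "source1\tHomo sapiens\tNCBITAXON\t"
    "http://purl.obolibrary.org/obo/NCBITaxon_9606\tsample1\n"
)
for parsed_row in parse_study_table(study):
    print(parsed_row["Characteristics[organism]"])
```

Once values and their ontology qualifiers are held together like this, writing them out in another repository format becomes a mapping exercise rather than a re-annotation exercise.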
user community
ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level (Rocca-Serra et al., 2010)
empowering researchers to use standards
Ontology Search and Tagging in Google Spreadsheets
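The core of tagging spreadsheet cells with ontology terms can be sketched as a lookup from labels and synonyms to term identifiers. The Python example below assumes a tiny in-memory index; a real tool would query an ontology search service instead, and the function names here are hypothetical. The identifiers used (NCBITaxon:9606 for Homo sapiens, GO:0006954 for inflammatory response) are included only as examples.

```python
from __future__ import annotations


def build_lookup(index: dict[str, list[str]]) -> dict[str, str]:
    """Map each case-folded label or synonym to its term identifier."""
    lookup = {}
    for curie, labels in index.items():
        for label in labels:
            lookup[label.casefold()] = curie
    return lookup


def tag_cells(cells: list[str], index: dict[str, list[str]]) -> list[tuple[str, str | None]]:
    """Pair each spreadsheet cell with a matching term identifier, or None."""
    lookup = build_lookup(index)
    return [(cell, lookup.get(cell.strip().casefold())) for cell in cells]


# Tiny illustrative index; not a copy of any ontology service's data.
TERMS = {
    "NCBITaxon:9606": ["Homo sapiens", "human"],
    "GO:0006954": ["inflammatory response"],
}

print(tag_cells(["Human", "inflammatory response ", "mouse"], TERMS))
# [('Human', 'NCBITaxon:9606'), ('inflammatory response ', 'GO:0006954'), ('mouse', None)]
```

Exact matching after trimming and case-folding is deliberately simple; unmatched cells (like "mouse" here) are left for a curator rather than guessed.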
ISA infrastructure & linked data
• Work in progress to convert to RDF/OWL to connect to the growing Linked Data universe (RDF = Resource Description Framework, OWL = Web Ontology Language)
• Collaborations with ToxBank & W3C HCLSIG
<subject, predicate, object>
<lipoprotein> <participates_in> <inflammatory response>
<PRO:212342352> <BFO_0000056> <GO:0006954>
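To connect the triple notation to actual Linked Data, the identifiers need to become full IRIs. The Python sketch below expands compact identifiers and writes N-Triples lines. It assumes every prefix follows the OBO PURL pattern (`PREFIX:LOCAL` becomes `http://purl.obolibrary.org/obo/PREFIX_LOCAL`); real data would need an explicit prefix map, and the identifiers are reproduced from the example triple as given.

```python
from __future__ import annotations

OBO_PURL = "http://purl.obolibrary.org/obo/"


def expand(identifier: str, base: str = OBO_PURL) -> str:
    """Turn 'GO:0006954' or 'BFO_0000056' into a full IRI under `base`.

    Assumes every prefix follows the OBO PURL pattern; this is a
    simplification, not a general CURIE resolver.
    """
    prefix, sep, local = identifier.partition(":")
    if sep:
        return f"{base}{prefix}_{local}"
    return base + identifier  # already in PREFIX_LOCAL form


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialise (subject, predicate, object) identifiers as N-Triples lines."""
    return "\n".join(
        f"<{expand(s)}> <{expand(p)}> <{expand(o)}> ." for s, p, o in triples
    )


print(to_ntriples([("PRO:212342352", "BFO_0000056", "GO:0006954")]))
```

For production use, an RDF library with a proper namespace manager would be the more robust choice; the sketch only shows the shape of the transformation.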
Increasing levels of structure…
Notes in lab books (information for humans)
Spreadsheets and tables (the compromise)
Facts as RDF statements (information for machines)
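The step from a spreadsheet row to machine-readable facts can be sketched as mapping selected columns to predicates. In the Python example below, the column-to-property mapping and the "ex:" namespace are hypothetical placeholders, not a real vocabulary, and `row_to_triples` is an illustrative name rather than part of any ISA tool.

```python
from __future__ import annotations


def row_to_triples(row: dict[str, str], subject_column: str,
                   predicate_for_column: dict[str, str]) -> list[tuple[str, str, str]]:
    """Turn one spreadsheet row into (subject, predicate, object) statements."""
    subject = row[subject_column]
    triples = []
    for column, predicate in predicate_for_column.items():
        value = row.get(column, "").strip()
        if value:  # an empty cell yields no statement rather than a blank fact
            triples.append((subject, predicate, value))
    return triples


# Hypothetical row and column-to-property mapping ("ex:" is a made-up namespace).
row = {"Sample Name": "ex:sample1", "Organism": "NCBITaxon:9606", "Dose": ""}
mapping = {"Organism": "ex:in_taxon", "Dose": "ex:has_dose"}
print(row_to_triples(row, "Sample Name", mapping))
# [('ex:sample1', 'ex:in_taxon', 'NCBITaxon:9606')]
```

The spreadsheet stays the human-friendly compromise; the mapping is the only extra artefact needed to move its content up to the "information for machines" level.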
A growing ecosystem of over 30 public and internal resources uses the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
The framework is also used by communities working to build a library of cellular signatures.
We aim to achieve a common representation of experimental content that transcends individual bioscience domains.
Sansone et al., Towards interoperable bioscience data. Nature Genetics 44, 121-126 (2012) doi:10.1038/ng.1054
Nanotechnology Informatics Working Group
Some of the internal projects:
Some of the public groups/resources:
Stem Cell Commons
Implementation at Harvard (ISA)
http://discovery.hsci.harvard.edu/
Implementation at the EBI
http://www.ebi.ac.uk/metabolights
Lack of coordination, fragmentation and uneven coverage
Standards-compliant data sharing is demanding and time-consuming
GIATE Guidelines
Terminologies
Formats
@isatools @biosharing isa-tools.org isacommons.org biosharing.org