How Philosophy of Science Can Help Biomedical Research
Barry Smith
http://ontology.buffalo.edu/smith
How to Do Biology across the Genome?
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV
sequence of X chromosome in baker’s yeast
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE
Stelzl et al., Cell, 2005
network of gene interactions in E. coli http://moebio.com/santiago/gnom/ingles.html
what cellular component?
what molecular function?
what biological process?
The Idea of Common Controlled Vocabularies
MouseEcotope, GlyProt, DiabetInGene, GluChem
sphingolipid transporter activity
Holliday junction helicase complex
male courtship behavior, orientation prior to leg tapping and wing vibration
Gene Ontology
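As a rough illustration of why a common controlled vocabulary helps, the sketch below lets the databases named on the slides annotate their own records with terms drawn from one shared list. The record identifiers, the `Annotation` type and the helper names are hypothetical; only the three term labels and their GO aspects are taken from the Gene Ontology.

```python
from collections import defaultdict
from dataclasses import dataclass

# Shared vocabulary: term label -> GO aspect. The three labels are the GO
# terms shown on the slides; a real vocabulary would key on term identifiers.
VOCABULARY = {
    "sphingolipid transporter activity": "molecular function",
    "Holliday junction helicase complex": "cellular component",
    "male courtship behavior, orientation prior to leg tapping and wing vibration":
        "biological process",
}


@dataclass(frozen=True)
class Annotation:
    database: str  # resource that made the annotation
    record: str    # hypothetical record identifier inside that resource
    term: str      # must be a label from VOCABULARY


def validate(annotation):
    """Reject free-text terms that are not part of the shared vocabulary."""
    if annotation.term not in VOCABULARY:
        raise ValueError(f"not a controlled term: {annotation.term!r}")
    return annotation


def index_by_term(annotations):
    """Group records from all databases under the term they share."""
    index = defaultdict(list)
    for a in annotations:
        validate(a)
        index[a.term].append((a.database, a.record))
    return dict(index)


# Hypothetical records, for illustration only.
annotations = [
    Annotation("MouseEcotope", "rec-1", "sphingolipid transporter activity"),
    Annotation("GlyProt", "rec-7", "sphingolipid transporter activity"),
    Annotation("DiabetInGene", "rec-3", "Holliday junction helicase complex"),
]
index = index_by_term(annotations)
```

Because every database uses the same label for the same kind of entity, a single lookup in `index` returns records from all of them; a free-text label such as "lipid mover" would be rejected by `validate` instead of silently creating a new, unlinked category.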
Benefits of GO
1. based in biological science
2. links data to biological reality
3. links people to software
4. links data together
• across species (human, mouse, yeast, fly ...)
• across granularities (molecule, cell, organ, organism, population)
The goal
all biological (biomedical) research data should cumulate to form a single, algorithmically processible whole
http://obofoundry.org
Ontologies already being applied to achieve this goal
Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers
GO tells you what counts as standard functional information for these genes
By tracking deviations from this standard, 189 genes could be identified as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention.
Science. 2006 Oct 13;314(5797):268-74.
Towards Empirical Philosophy
• processualist vs. 3-dimensionalist
• reductionist vs. non-reductionist
• realist vs. nominalist
If ontologies based on different philosophical principles are tested for their utility in support of scientific research, which types of ontologies will prove most useful?
Some sample ontologies
Cell Ontology (CL)
Foundational Model of Anatomy (FMA)
Environment Ontology (EnvO)
Gene Ontology (GO)
Infectious Disease Ontology
Phenotypic Quality Ontology (PaTO)
Protein Ontology (PRO)
RNA Ontology (RnaO)
Sequence Ontology (SO)
The problem
High-throughput experimental data are meaningless unless the researcher is given detailed information about how they were obtained
To make experimental data computationally accessible we need ontologies to describe the data
(1) from the point of view of their relation to reality
(2) from the point of view of their relation to experiments
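A minimal sketch of what describing data from both points of view could look like in practice: each measurement carries one annotation saying what in reality it is about and another saying how it was obtained. All class names, field names and values below are hypothetical and are not taken from any of the ontologies discussed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExperimentContext:
    """How a value was obtained (its relation to experiments)."""
    protocol: str
    instrument: str
    investigation: str


@dataclass
class Measurement:
    """A data point annotated from both points of view."""
    value: float
    unit: str
    about: str  # ontology term for what the value describes (relation to reality)
    context: Optional[ExperimentContext] = None


def is_interpretable(m):
    """A value counts as usable only if both kinds of annotation are present."""
    return bool(m.about) and m.context is not None


# Hypothetical examples: the first has no provenance, the second has both.
bare = Measurement(2.4, "fold change", about="sphingolipid transporter activity")
full = Measurement(
    2.4,
    "fold change",
    about="sphingolipid transporter activity",
    context=ExperimentContext("protocol-12", "array-scanner-A", "study-001"),
)
```

Here `is_interpretable(bare)` is false and `is_interpretable(full)` is true, which mirrors the point of the slides: a number without information about its origin cannot be compared or pooled with other data.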
Three solutions
The MGED Ontology
OBI: The Ontology for Biomedical Investigations
EXPO: The Experiment Ontology
MGED (Microarray Gene Expression Data) Ontology
Individual =def. name of the individual organism from which the biomaterial was derived
Experiment =def. The complete set of bioassays and their descriptions performed as an experiment for a common purpose. ... An experiment will be often equivalent to a publication.
Chromosome =def. An abstraction used for annotation
Chromosome =def. A biological sequence that can be placed on an array
OBI
The Ontology for Biomedical Investigations
with thanks to Trish Whetzel and Richard Scheuermann
Purpose of OBI
To provide a resource for the unambiguous description of the components of biomedical investigations, such as the design, protocols and instrumentation, material, data, and the types of analysis and statistical tools applied to the data
NOT designed to model biology
Hypothesis
That it is possible to create ontology resources of genuine utility by drawing on logical and philosophical principles, e.g. those pertaining to consistency of definitions and avoidance of use-mention confusions.
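The two principles named in the hypothesis can be turned into simple automated checks. The sketch below runs them over the MGED definitions quoted earlier: one check finds terms that carry more than one definition, the other flags definitions that describe a name rather than the thing named. The function names and the `MENTION_PREFIXES` heuristic are assumptions made for illustration; they are not part of OBI or MGED tooling.

```python
from collections import defaultdict

# (term, definition) pairs as quoted from the MGED Ontology on the slides.
DEFINITIONS = [
    ("Individual",
     "name of the individual organism from which the biomaterial was derived"),
    ("Chromosome", "An abstraction used for annotation"),
    ("Chromosome", "A biological sequence that can be placed on an array"),
]

# Hypothetical heuristic: a definition opening with one of these phrases
# defines a label (mention) instead of the entity itself (use).
MENTION_PREFIXES = ("name of", "term for", "label for")


def conflicting_definitions(pairs):
    """Return the terms that have more than one distinct definition."""
    by_term = defaultdict(set)
    for term, definition in pairs:
        by_term[term].add(definition.strip().lower())
    return {t: sorted(d) for t, d in by_term.items() if len(d) > 1}


def possible_use_mention(pairs):
    """Return the terms whose definition starts like the definition of a name."""
    return sorted(
        {t for t, d in pairs if d.strip().lower().startswith(MENTION_PREFIXES)}
    )


conflicts = conflicting_definitions(DEFINITIONS)
suspects = possible_use_mention(DEFINITIONS)
```

On this input `conflicts` contains only `Chromosome` and `suspects` contains only `Individual`. A prefix test is deliberately crude; it is meant to show that such principles are checkable, not to replace review by a curator.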
OBI Collaborating Communities
Crop sciences Generation Challenge Programme (GCP)
Environmental genomics MGED RSBI Group, www.mged.org/Workgroups/rsbi
Genomic Standards Consortium (GSC), www.genomics.ceh.ac.uk/genomecatalogue
HUPO Proteomics Standards Initiative (PSI), psidev.sourceforge.net
Immunology Database and Analysis Portal, www.immport.org
Immune Epitope Database and Analysis Resource (IEDB), http://www.immuneepitope.org/home.do
International Society for Analytical Cytology, http://www.isac-net.org/
Metabolomics Standards Initiative (MSI)
Neurogenetics, Biomedical Informatics Research Network (BIRN)
Nutrigenomics MGED RSBI Group, www.mged.org/Workgroups/rsbi
Polymorphism
Toxicogenomics MGED RSBI Group, www.mged.org/Workgroups/rsbi
Transcriptomics MGED Ontology Group
OBI – Tools and Documentation
Open source, standards-compliant and version-managed
• Web Ontology Language (OWL) using Protégé editor
• OBI.owl files are available from the OBI SVN Repository
The Problem of Clinical Investigations
Regulatory bodies such as the FDA need to assess the evidentiary value of enormous volumes of data collected e.g. in trials on specific drug formulations
For this, they need to impose standardization of terminologies used to express these data, e.g. as developed by the Clinical Data Interchange Standards Consortium (CDISC)
Clinical Investigations terminologies
“Study Design”
Descriptive research
– Case study – description of one or more patients
– Developmental research – description of pattern of change over time
– Qualitative research – gathering data through interview or observation
Exploratory research
– Secondary analysis – exploring new relationships in old data
– Historical research – reconstructing the past through an assessment of archives or other records
Experimental research
– Randomized clinical trial
– Meta-analysis – statistically combining findings from several different studies to obtain a summary analysis
“Population”
Recruited population
– Randomized population
– Eligible population
– Screened population
– Premature termination population
Excluded population
– Excluded post-randomization population
– Not-eligible-population
Analyzed population
– Study arm population
– Crossover population
– Subgroup population
– Intent-to-treat population – based on randomization
Overview of OCI
Meta-analysis (CDISC)
Quality assurance (CDISC)
Quality control (CDISC)
Baseline assessment (CDISC)
Validation (CDISC)
Coding (MUSC)
Permuted block randomization (MUSC)
Secondary-study-protocol (RCT)
Intervention-step (RCT)
Blinding-method (RCT)
Study design
Development plan (CDISC)
Standard operating procedures (CDISC)
Statistical analysis plan (CDISC)
Negative findings (MUSC)
Positive findings (MUSC)
Primary-outcome (RCT)
Secondary-outcome (RCT)
EXPO
The Ontology of Experiments
L. Soldatova, R. King
Department of Computer Science
The University of Wales, Aberystwyth
EXPO: Experiment Ontology
experimental actions part_of experimental design
subject of experiment part_of experimental design
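The part_of assertions can be stored as pairs and queried transitively, which is one of the simplest forms of reasoning an ontology supports. In the sketch below the first two pairs are the ones from the slide; the third pair and the function name `wholes_of` are hypothetical and serve only to show a chain of more than one step.

```python
# (part, whole) pairs for the part_of relation.
PART_OF = {
    ("experimental actions", "experimental design"),
    ("subject of experiment", "experimental design"),
    # Hypothetical extra assertion, added only to demonstrate transitivity.
    ("experimental design", "scientific experiment"),
}


def wholes_of(entity, relation=PART_OF):
    """Return every whole that `entity` is part of, following part_of transitively."""
    found = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for part, whole in relation:
            # The `found` check also keeps the loop finite if the data has a cycle.
            if part == current and whole not in found:
                found.add(whole)
                frontier.append(whole)
    return found


ancestors = wholes_of("experimental actions")
```

With the three pairs above, `ancestors` holds both `experimental design` and the hypothetical `scientific experiment`, even though only the first link was asserted directly for experimental actions.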
Role of Philosophy of Science
Towards Empirical Philosophy of Science
• rational statistical models of induction
• case-based / domain-based reasoning
• falsifiabilism
• Humeanism vs. laws
• logical, relative frequency, Bayesian, objective (chance) and epistemic theories of probability
These generate different ontologies of scientific evidence
– which one is correct?
Environment Ontology +
Phenotypic Quality Ontology +
Ontology for Personalized and Community Medicine
‘Racial’ Phenotypes: Social, Phylogenetic, Essentialistic ...
Ontology for Personalized and Community Medicine
to support studies of differential effects on health
1. of environmental qualities of different neighborhoods, and
2. of different community behavior phenotypes