The Changing Nature of Biomedical Research: Semantic e-Science
DESCRIPTION
Keynote talk at the KR4HC workshop at Artificial Intelligence in Medicine Europe, Verona, 2009
TRANSCRIPT
Robert Stevens
BioHealth Informatics Group
University of Manchester
Introduction
• (Modern bio-molecular) Science
• E-Science
• Semantics and science
• Semantic e-Science
Ernest Rutherford
“All science is either physics or stamp collecting”
Image: http://en.wikipedia.org/wiki/File:Ernest_Rutherford2.jpg
Mathematical Sciences
Laws in Biology
Charles Darwin
Image: http://en.wikipedia.org/wiki/File:Charles_Darwin_01.jpg
On the Origin of Species, 1859
Central Dogma
Image: http://cellbio.utmb.edu/CELLBIO/DNA-RNA.jpg
Classic and Modern Biology
Genotype Phenotype
Modern biology
Classic biology
Speed of sequencing
• First human genome
– 10+ years to produce
– Cost $500 million
– Huge international effort
• Now done in 10 weeks
– (for $399)
– http://tinyurl.com/genomecost
– http://www.23andme.com
1000+ databases
• according to Nucleic Acids Research
PubMed: 2 papers per minute
• ~700,000 individual papers
• Grows at 2 papers per minute
(see http://blogs.bbsrc.ac.uk for details)
Biology now has lots of facts
Lots of catalogues
Genome
Proteome
Transcriptome
Interactome
Metabolome
PHENOME
Creating Woods, not Trees
Genes
Proteins
Pathways
Interactions
Literature
Complex Machines
Virtual Organism
… from biological facts, we build a system that models a real organism
Networks of Chemicals
Image: http://genome-www.stanford.edu/rap_sir/images/Web_FigF_RAP1_glycolysis.gif
Systems within Systems
Image: http://www.ehponline.org/members/2007/10373/fig1.jpg
UniProt: a protein database?
Bioinformatics Experiments are Data pipelines
Resources/Services
Investigate the evolutionary relationships between proteins
Protein sequences
Multiple sequence alignment
Query
[Peter Li]
My data
My tool
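The pipeline on this slide (query, then fetch protein sequences, then align them) can be sketched as a chain of stages, each consuming the previous stage's output. This is a minimal Python illustration, not the tooling used in the talk: the keyword index, accessions and sequences are hypothetical in-memory stand-ins for remote resources, and an all-against-all percent-identity comparison of pre-aligned, equal-length sequences stands in for a real multiple sequence alignment service.

```python
from functools import reduce
from typing import Callable

# Hypothetical "remote" resources, modelled as in-memory dictionaries.
KEYWORD_INDEX = {"kinase": ["P001", "P002", "P003"]}
SEQUENCE_DB = {
    "P001": "MKVLAAGIVG",
    "P002": "MKVLSAGIVG",
    "P003": "MRVLSAGLVG",
}


def query(keyword: str) -> list[str]:
    """Stage 1: look up accession identifiers for a keyword."""
    return KEYWORD_INDEX.get(keyword, [])


def fetch_sequences(accessions: list[str]) -> dict[str, str]:
    """Stage 2: retrieve the sequence for each accession."""
    return {acc: SEQUENCE_DB[acc] for acc in accessions}


def percent_identity(a: str, b: str) -> float:
    """Identity of two equal-length, already-aligned sequences."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def compare_all(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Stage 3: all-against-all identity (a stand-in for a real MSA tool)."""
    ids = sorted(seqs)
    return {
        (a, b): percent_identity(seqs[a], seqs[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    }


def pipeline(*stages: Callable) -> Callable:
    """Chain stages so that each one consumes the previous stage's output."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


run = pipeline(query, fetch_sequences, compare_all)
result = run("kinase")
for pair, identity in result.items():
    print(pair, f"{identity:.0f}%")
```

Swapping any stage for a call to a real service leaves the rest of the chain unchanged, which is the property workflow systems build on.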
Linking together data resources
Hypo science – the routine for the many
Hyper science – big projects, big science
The In Silico Experiment
• We can mine these data for possible hypotheses
• “what are the genes that are involved in some disease phenotype?”
• Correlate genes in quantitative trait loci (QTL) with genes differentially regulated in microarray experiments, via pathways; query the literature with these genes, pathways and phenotype; …
• The resulting facts form a hypothesis: a co-ordinated set of SNPs increases cholesterol biosynthesis in macrophages while delaying apoptosis of these cells; increased superoxide production aids tolerance to trypanosomiasis in cattle
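The correlation step described above can be sketched as set operations: keep the pathways that contain both a gene from the QTL region and a differentially expressed gene, then use those pathways, genes and the phenotype as literature search terms. The gene names, pathway memberships and phenotype string below are hypothetical placeholders, not data from the trypanosomiasis study.

```python
# Genes lying inside a (hypothetical) quantitative trait locus region.
qtl_genes = {"geneA", "geneB", "geneC", "geneD"}

# Genes differentially expressed between resistant and susceptible strains
# in a (hypothetical) microarray experiment.
de_genes = {"geneB", "geneD", "geneE"}

# Hypothetical pathway membership, as a pathway database might report it.
pathway_members = {
    "pathway_1": {"geneA", "geneB"},
    "pathway_2": {"geneD", "geneE", "geneF"},
    "pathway_3": {"geneF"},
}


def candidate_pathways(qtl, de, pathways):
    """Keep pathways containing a QTL gene AND a differentially expressed gene."""
    hits = {}
    for name, members in pathways.items():
        in_qtl = members & qtl
        in_de = members & de
        if in_qtl and in_de:
            hits[name] = {"qtl": sorted(in_qtl), "expressed": sorted(in_de)}
    return hits


hits = candidate_pathways(qtl_genes, de_genes, pathway_members)

# Terms for a follow-up literature query: pathways, QTL genes, phenotype.
phenotype = "trypanosomiasis tolerance"  # hypothetical phenotype term
literature_terms = (
    sorted(hits)
    + sorted({gene for hit in hits.values() for gene in hit["qtl"]})
    + [phenotype]
)
```

Nothing is filtered out before the pathway step, which mirrors the slide's point about avoiding premature filtering; the biologist then judges whether the surviving candidates make sense.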
How bioinformatics was done
Integrating data sets
• Slave labour
• Collections of scripts
• Warehouses
• Applications
– Galaxy
– Gaggle
– Integr8
– Ensembl
– …
• Workflows!
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta
Workflows: E. Science laboris
• Data preparation and analysis pipelines
• Data preparation pipelines
• Data integration pipelines
• Data analysis pipelines
• Data annotation pipelines
• Warehouse population refreshing
• Data and text mining
• Knowledge extraction
• Parameter sweeps over simulations/computations
• Model building and verification
• Knowledge management and model population
• Hypothesis generation and modelling
• A workflow is a specification.
• A workflow management system (WfMS) is the machinery for coordinating the execution of (scientific) services and linking together (scientific) resources.
• It handles cross-cutting concerns such as error handling, service invocation, data movement, data streaming, data provenance tracking, process auditing, execution monitoring, security and access, and so on.
• Agile software development
Workflows: E. Science laboris
Enactment Engine
My data
My tool
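To illustrate the split between a workflow as a specification and the machinery that runs it, here is a minimal Python sketch (Python 3.9+ for `graphlib`). The workflow is plain data naming each step's service and its upstream steps; a hypothetical `enact` engine orders the steps, retries a failing service and keeps a simple provenance log. The step names and the deliberately unreliable `flaky_max` service are assumptions for illustration, not part of any particular WfMS.

```python
from graphlib import TopologicalSorter

calls = {"maximum": 0}


def flaky_max(data):
    """Hypothetical unreliable remote service: times out on its first call."""
    calls["maximum"] += 1
    if calls["maximum"] == 1:
        raise ConnectionError("service timed out")
    return max(data)


# The specification is plain data: which service each step calls and which
# earlier steps feed it. It says nothing about ordering, retries or logging.
WORKFLOW = {
    "fetch": {"service": lambda: [3, 1, 2], "after": []},
    "sort": {"service": sorted, "after": ["fetch"]},
    "total": {"service": sum, "after": ["fetch"]},
    "maximum": {"service": flaky_max, "after": ["fetch"]},
}


def enact(spec, retries=2):
    """Run steps in dependency order, retrying failures and logging provenance."""
    graph = {name: set(step["after"]) for name, step in spec.items()}
    results, provenance = {}, []
    for name in TopologicalSorter(graph).static_order():
        step = spec[name]
        inputs = [results[dep] for dep in step["after"]]
        for attempt in range(1, retries + 2):  # first try plus `retries` retries
            try:
                results[name] = step["service"](*inputs)
            except Exception as err:  # error handling lives in the engine
                provenance.append((name, attempt, f"error: {err}"))
                if attempt == retries + 1:
                    raise
            else:
                provenance.append((name, attempt, "ok"))
                break
    return results, provenance


results, provenance = enact(WORKFLOW)
```

The same `WORKFLOW` dictionary could be handed to a different engine (remote, parallel, more thorough logging) without being rewritten, which is the sense in which the workflow is only a specification.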
Workflow Execution Engine
Workflow execution engine:
• Local desktop and remote server
• Implicit iteration over large data collections
• Nested workflows
• Automated data flow
• Event history log and data provenance tracking
• Within-workflow programming
• Extensibility points for plug-ins
Graphical workbench:
• For professionals
• Plug-in architecture
• Incorporate new services without coding; use services as they are
• Access to local and remote resources and analysis tools
Re-Design
Rewritten
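Implicit iteration, one of the engine features listed above, means a service written for a single input is mapped over a list (or a list of lists) by the engine rather than by the user. A minimal sketch, assuming a hypothetical single-sequence service `gc_content` and a hypothetical `invoke` helper that compares the nesting depth of the data with the depth the service expects:

```python
def gc_content(sequence: str) -> float:
    """A service that handles exactly one DNA sequence."""
    s = sequence.upper()
    return (s.count("G") + s.count("C")) / len(s)


def depth(value) -> int:
    """Nesting depth: 0 for a single item, 1 for a list, 2 for a list of lists."""
    d = 0
    while isinstance(value, list):
        if not value:
            return d + 1
        value = value[0]
        d += 1
    return d


def invoke(service, value, expected_depth=0):
    """Call the service directly if the depth matches, otherwise iterate implicitly."""
    if depth(value) <= expected_depth:
        return service(value)
    return [invoke(service, item, expected_depth) for item in value]


single = invoke(gc_content, "GGCA")
batch = invoke(gc_content, ["GGCA", "ATAT"])
nested = invoke(gc_content, [["GC"], ["AT", "GCGC"]])
```

Here `invoke(gc_content, ["GGCA", "ATAT"])` returns `[0.75, 0.0]` without the service knowing anything about lists; the result keeps the shape of the input collection.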
• Comparing resistant vs. susceptible strains – Microarrays
• Mapping quantitative traits – Classical genetics QTL
• Integrated Microarray data, genomic sequences, pathway data, literature mining.
Trypanosomiasis Study
Paul Fisher et al., Nucleic Acids Research, 2007, 35(16)
Genotype to Pathway
Created by Paul Fisher
Pathway to Phenotype
Created by Paul Fisher
• Eliminated user bias and premature filtering
• The scale and complexity of data and literature.
• Systematic data analysis
• Data analysis provenance
• Manageable amount of output data for biologists to interpret and verify
• Data driven science
“Looking where others hadn’t”
“make sense of this data” -> “does this make sense?”
http://www.youtube.com/watch?v=Y6_Kz5L010g
Transferring Characteristics
Uncharacterised protein
Tra1 La2 La3
High similarity: transfer characteristics
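Transferring characteristics by similarity can be sketched as: find the most similar characterised protein and copy its annotations only if the similarity clears a threshold. In this sketch the sequences, annotations and the 40% threshold are hypothetical, and percent identity over pre-aligned, equal-length sequences stands in for a real similarity search such as a BLAST hit.

```python
# Hypothetical characterised proteins: identifier -> (sequence, annotations).
characterised = {
    "protA": ("MKVLAAGIVG", {"kinase activity"}),
    "protB": ("MRTLSQGLWE", {"DNA binding"}),
}


def percent_identity(a, b):
    """Identity of two equal-length, pre-aligned sequences."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def transfer_annotation(query, known, threshold=40.0):
    """Return (best hit, copy of its annotations) if similar enough, else (None, empty set)."""
    best_id, best_score = None, 0.0
    for prot_id, (seq, _) in known.items():
        score = percent_identity(query, seq)
        if score > best_score:
            best_id, best_score = prot_id, score
    if best_id is None or best_score < threshold:
        return None, set()
    return best_id, set(known[best_id][1])


hit, terms = transfer_annotation("MKVLSAGIVG", characterised)
```

The query is 90% identical to `protA`, so it inherits "kinase activity". A transferred annotation is only as good as the entry it came from, which is why the characteristics are better treated as hypotheses than as facts.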
… A Fact-Based Discipline
• Rather than laws captured in mathematics…
• We have lots of facts: the discipline’s knowledge
• Rather than “calculating” what a protein does, we investigate and write it down
• Equivalent to writing down the trajectories of all thrown objects and not doing ballistics!
• To do biology one needs “the knowledge”
Heterogeneity
• 28 formats for representing a biological sequence
• Though only one way to represent the bases or amino acids themselves…
• Different words, same concept
• Different concepts, same words
• Different and implicit data schemas
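To make the formatting problem concrete: the same bases can arrive as a FASTA record or as numbered GenBank-style lines, like the fragment on the "How bioinformatics was done" slide. This sketch reduces just those two layouts to one plain sequence string; the function names are hypothetical, and in practice a library such as Biopython's `Bio.SeqIO` covers many more formats.

```python
import re


def parse_fasta(text: str) -> dict[str, str]:
    """Map the first word of each '>' header to its concatenated, upper-cased sequence."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif line and name is not None:
            records[name] += line.upper()
    return records


def parse_origin_block(text: str) -> str:
    """Read GenBank-style ORIGIN lines: drop position numbers, spaces and the '//' terminator."""
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("ORIGIN", "//")):
            continue
        parts.append(re.sub(r"[^A-Za-z]", "", line))  # keep only the residue letters
    return "".join(parts).upper()


fasta = """>fragment
ACATTTCTACCAACAGTGGA
"""
origin = """ORIGIN
    12181 acatttctac caacagtgga
//"""

# Two syntaxes, one underlying sequence.
assert parse_fasta(fasta)["fragment"] == parse_origin_block(origin)
```

Reconciling the syntax is the easy part; the "different words, same concept" problem remains after parsing.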
An Identity Crisis
• Database entries have identifiers unique within their database
• The type of entity described in an entry doesn’t have an identifier
• Different entries about the same type talk about it differently
• How do we know when an entry in one DB talks about the same thing as another entry in another DB?
• That’s the skill of a bioinformatician
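One way to make part of that skill explicit is to record each cross-reference between entries as a link and ask whether two entries are connected, transitively, using a union-find structure. The database prefixes and accessions below are hypothetical, and the sketch assumes every cross-reference really asserts identity, which in practice is exactly what needs checking (an entry may cross-reference a related gene rather than the same protein).

```python
class EntityResolver:
    """Hypothetical helper: groups database entries linked by cross-references."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative entry of x's group (path halving)."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def link(self, a, b):
        """Record a cross-reference asserting that a and b describe the same entity."""
        self.parent[self.find(a)] = self.find(b)

    def same(self, a, b):
        """True if a chain of cross-references connects a and b."""
        return self.find(a) == self.find(b)


resolver = EntityResolver()
resolver.link("dbA:1001", "dbB:X55")  # dbA entry cross-references dbB
resolver.link("dbB:X55", "dbC:777")   # dbB entry cross-references dbC
```

Here `dbA:1001` and `dbC:777` are never linked directly, yet `resolver.same("dbA:1001", "dbC:777")` is `True` because the chain passes through `dbB:X55`.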
Categories and Category Labels
GO:0000368
U2-type nuclear mRNA 5' splice site recognition
spliceosomal E complex formation
spliceosomal E complex biosynthesis
spliceosomal CC complex formation
U2-type nuclear mRNA 5'-splice site recognition
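Resolving the different labels of one category to its identifier can be sketched as a lookup over normalised labels. The labels are those shown above for GO:0000368; the normalisation rules (case-folding, treating hyphens as spaces, collapsing whitespace) are assumptions for illustration, and rules this loose could conflate genuinely distinct labels in a full ontology.

```python
import re

# Labels attached to one category identifier, as listed on the slide.
SYNONYMS = {
    "GO:0000368": [
        "U2-type nuclear mRNA 5'-splice site recognition",
        "U2-type nuclear mRNA 5' splice site recognition",
        "spliceosomal E complex formation",
        "spliceosomal E complex biosynthesis",
        "spliceosomal CC complex formation",
    ]
}


def normalise(label: str) -> str:
    """Case-fold, treat hyphens as spaces and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", label.replace("-", " ")).strip().casefold()


# Index from normalised label to identifier.
LABEL_INDEX = {
    normalise(label): go_id
    for go_id, labels in SYNONYMS.items()
    for label in labels
}


def resolve(label):
    """Return the identifier for a label, or None if it is unknown."""
    return LABEL_INDEX.get(normalise(label))
```

The identifier, not any one label, is what databases should store; labels then become presentation.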
The Role of Knowledge
• A lot of facts
• Perhaps organised into a system
• No equivalent of “laws of mechanics” – we can’t do this biology with mathematics
• Or at least not without knowing what the numbers mean...
• This is why we’ve been using ontologies!
Uses of Ontology in Bioinformatics
Post-Genomic Biology
• Fly, mouse, yeast, worm all have their own terminologies
• I want to compare genomes
• How?
• The genomic sequence is easily dealt with computationally and comparisons are easy
• This is not true of the annotations or knowledge of those sequences
• Need a common understanding
Annotation of Data
• Big effort to create controlled vocabularies using ontologies
• A huge annotation effort – describe the entities in databases with terms from ontologies
• The Gene Ontology (http://www.geneontology.org)
• The Open Biomedical Ontologies Consortium
GO in Analysis
• Microarray analysis was one of the original motivating applications of GO
• Clusters of co-modulated genes tend to group around the functional attributes of their proteins
• GO is also used in, for example, semantic similarity and text analysis
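Semantic similarity can be illustrated with one simple measure: the Jaccard overlap of two terms' ancestor sets in an is_a hierarchy. The mini-ontology below is hypothetical and only GO-like; measures commonly applied to GO, such as Resnik's, additionally weight shared ancestors by information content.

```python
# Hypothetical is_a hierarchy: child -> parents (a DAG; a term may have several parents).
IS_A = {
    "root": [],
    "metabolic process": ["root"],
    "biosynthetic process": ["metabolic process"],
    "lipid metabolic process": ["metabolic process"],
    "lipid biosynthetic process": ["biosynthetic process", "lipid metabolic process"],
    "cell death": ["root"],
}


def ancestors(term):
    """The term itself plus everything reachable by following is_a links upward."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A[current])
    return seen


def similarity(a, b):
    """Jaccard index of the two ancestor sets: 1.0 identical, near 0.0 unrelated."""
    sa, sb = ancestors(a), ancestors(b)
    return len(sa & sb) / len(sa | sb)
```

"lipid biosynthetic process" and "biosynthetic process" share 3 of 5 ancestors, giving 0.6, while "lipid biosynthetic process" and "cell death" share only the root, giving 1/6. Two gene products can then be compared through the terms they are annotated with, rather than through their sequences.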
BioCatalogue content (screenshot)
Shield users and applications from service interoperability and incompatibility plumbing.
Turn your app into a service
Service providers
Not only web services
How a bioinformatician assumes stuff should work
Pettifer, University of Manchester
inside
A collection of interactive tools for analysing protein sequence and structure
http://utopia.cs.manchester.ac.uk/
Semantic Descriptions of All
• Not just bio-entities in data
• The laboratory experiments by which they were generated
• The protocols for their analysis
• The services for their analysis
Semantic Integration
• Shared identifiers mean integration and interoperation
• Most workflows are hobbled by syntactic and semantic heterogeneity
• Syntactic integration (Bio2RDF)
• Semantic integration via ontologies and naming schemes
• Enables better e-Science through semantic science
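The point that shared identifiers mean integration can be sketched with RDF-style (subject, predicate, object) triples: when two sources name the same gene identically, merging them is a set union and a cross-source question becomes a join. The identifiers, predicates and sources here are hypothetical, and plain Python sets stand in for an RDF store and a SPARQL query.

```python
# Two hypothetical sources that happen to use the same gene identifiers.
protein_source = {
    ("protein:P1", "encoded_by", "gene:G1"),
    ("protein:P2", "encoded_by", "gene:G3"),
}
pathway_source = {
    ("gene:G1", "participates_in", "pathway:PW1"),
    ("gene:G2", "participates_in", "pathway:PW2"),
}

graph = protein_source | pathway_source  # merging is just set union


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the merged graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def pathways_for_protein(protein):
    """Join protein -> gene -> pathway across the two sources."""
    return {
        pathway
        for gene in objects(protein, "encoded_by")
        for pathway in objects(gene, "participates_in")
    }
```

`pathways_for_protein("protein:P1")` returns `{"pathway:PW1"}` only because both sources say `gene:G1`; had one source written the gene differently, the join would silently return nothing, which is the heterogeneity problem in miniature.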
Fact Management
• When “stamp collecting” we’re collecting facts
• Biology is a fact management activity
• Knowing what these facts mean is very important
• Science is performed on data, and the semantics of data enable us to do science
• Semantic e-Science
Summary
• The nature of modern biology gives it interesting knowledge (fact) management issues
• It is a knowledge-based discipline
• Not unique, but often extreme
• Ontologies seen as one component in management (but not a panacea)
• E-Science gives infrastructure for management; semantics enable analysis
• Actually, very light use of semantics
More Acknowledgements
• Phil Lord
• Simon Jupp
• Carole Goble