knowledge management in a knowledge based discipline

Post on 22-Nov-2014

Category:

Science

DESCRIPTION

invited talk, Aston Business School, Aston University 2009

TRANSCRIPT

Knowledge Management in a Knowledge Based Discipline

Robert Stevens

BioHealth Informatics Group

University of Manchester

Robert.Stevens@manchester.ac.uk

Introduction

• How do we do (molecular) biology

• Managing stamp albums

• A knowledge based discipline

• Representing knowledge computationally

• Ontologies that define what entities are in the domain

• Describing biological knowledge ontologically

• Using ontologies and is it enough?

Ernest Rutherford

“All science is either physics or stamp collecting”

Image: http://en.wikipedia.org/wiki/File:Ernest_Rutherford2.jpg

Mathematical Sciences

Laws in Biology

Charles Darwin

Image: http://en.wikipedia.org/wiki/File:Charles_Darwin_01.jpg

On The Origin of Species - 1859

Classic and Modern Biology

Genotype Phenotype

Modern biology

Classic biology

Central Dogma

Image: http://cellbio.utmb.edu/CELLBIO/DNA-RNA.jpg

Speed of sequencing

• First human genome

– 10+ years to produce

– Cost $500 million

– Huge international effort

• Now done in 10 weeks

– (for $399)

– http://tinyurl.com/genomecost

– http://www.23andme.com

1000+ databases

• according to Nucleic Acids Research

PubMed: 2 papers per minute

• ~700,000 individual papers

• Grows at 2 papers per minute (see http://blogs.bbsrc.ac.uk for details)

UniProt: a protein database?

What is Knowledge?

• Knowledge – all information and an understanding to carry out tasks and to infer new information

• Information -- data equipped with meaning

• Data -- un-interpreted signals that reach our senses

Michael Ashburner, Professor, University of Cambridge, UK, ISMB

Name, Job, Institution, Country, Conf

man; academic, senior; ancient university, 5 rated; European; important figure in biology
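The step from data to information to knowledge in the example above can be sketched in a few lines of Python. This is only an illustration: the values and field names come from the slide, but pairing each inference with a particular field is an assumption based on the slide's layout, and the `RULES` table and `infer` function are hypothetical names introduced here.

```python
# Data: un-interpreted values, as listed on the slide.
data = ("Michael Ashburner", "Professor", "University of Cambridge", "UK", "ISMB")

# Information: the same values equipped with meaning (field names from the slide).
fields = ("Name", "Job", "Institution", "Country", "Conf")
information = dict(zip(fields, data))

# Knowledge: what can be inferred from the information.
# Assumption: each inference is tied to one (field, value) pair.
RULES = {
    ("Job", "Professor"): "academic, senior",
    ("Institution", "University of Cambridge"): "ancient university, 5 rated",
    ("Country", "UK"): "European",
}


def infer(info):
    """Return the inferences whose (field, value) condition holds in info."""
    return [fact for (field, value), fact in RULES.items() if info.get(field) == value]


knowledge = infer(information)
print(information)
print(knowledge)
```

The bare tuple says nothing on its own; the field names turn it into information, and the rules stand in for the background understanding a reader brings to it.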

BIOLOGY

A Knowledge Based Discipline

• Rather than laws captured in mathematics…

• We have lots of facts: the discipline’s knowledge

• Rather than “calculating” what a protein does, we investigate and write it down

• Equivalent to writing down the trajectories of all thrown objects and not doing ballistics!

• To do biology one needs “the knowledge”

Heterogeneity

• 28 ways to format the representations of a biological sequence

• Though one way to represent the bases or amino acids…

• Different words same concept

• Different concepts same words

• Different and implicit data schema

Categories and Category Labels

GO:0000368

U2-type nuclear mRNA 5' splice site recognition

spliceosomal E complex formation

spliceosomal E complex biosynthesis

spliceosomal CC complex formation

U2-type nuclear mRNA 5'-splice site recognition
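One category with several labels, as above, can be sketched as a lookup from label to identifier. The identifier and labels are taken from the slide; the `normalise` and `lookup` helpers are hypothetical, and real tools would read labels and synonyms from the ontology file instead of a hard-coded list.

```python
GO_ID = "GO:0000368"

# Labels listed on the slide for this one category.
LABELS = [
    "U2-type nuclear mRNA 5' splice site recognition",
    "spliceosomal E complex formation",
    "spliceosomal E complex biosynthesis",
    "spliceosomal CC complex formation",
    "U2-type nuclear mRNA 5'-splice site recognition",
]


def normalise(label):
    """Lower-case and treat hyphens as spaces so near-identical labels match."""
    return " ".join(label.lower().replace("-", " ").split())


# Many labels, one identifier: the identifier names the category.
LABEL_TO_ID = {normalise(label): GO_ID for label in LABELS}


def lookup(label):
    """Return the category identifier for a label, or None if it is unknown."""
    return LABEL_TO_ID.get(normalise(label))


print(lookup("Spliceosomal E complex formation"))
```

The point of the sketch is that software compares identifiers, not labels, so different words for the same concept stop being a problem once each label resolves to the same category.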

An Identity Crisis

• Database entries have identifiers unique within their database

• The type of entity described in an entry doesn’t have an identifier

• Different entries about the same type talk about it differently

• How do we know when an entry in one DB talks about the same thing as another entry in another DB?

• That’s the skill of a bioinformatician
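A minimal sketch of why an identifier for the type of entity helps: if entries in different databases carry the same type identifier, matching them becomes a lookup. Everything here is hypothetical, including the database names, the local identifiers, and the `TYPE:` identifiers; the slides do not describe any particular scheme.

```python
from collections import defaultdict

# Hypothetical entries: (database, local identifier, shared type identifier).
# Local identifiers are unique only within their own database.
entries = [
    ("db_a", "A0001", "TYPE:1"),
    ("db_b", "X-42", "TYPE:1"),
    ("db_b", "X-43", "TYPE:2"),
]


def group_by_type(entries):
    """Group database entries by the type of entity they describe."""
    groups = defaultdict(list)
    for database, local_id, type_id in entries:
        groups[type_id].append((database, local_id))
    return dict(groups)


groups = group_by_type(entries)
print(groups["TYPE:1"])
```

Without the third column, deciding that `A0001` and `X-42` describe the same thing relies on a person reading both entries.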

Why: Society of Biologists

• Doing particle physics necessarily requires central organisation

• One central place to generate data

• A communitarian attitude

• It is still possible to do biology in the “garden shed”

• Historically less need to organise

• Hence…

Navigating the Web of Knowledge in Bioinformatics

Biology is Special

• Large quantities of data: No it doesn’t

• Complex data: Yes it does

• Volatile data: Types of data and what is recorded change rapidly

• Nothing that special about biology

• …except that it has all the problems and often to a large degree

Lots of catalogues

Genome

Proteome

Transcriptome

Interactome

Metabolome

PHENOME

Biology now has lots of facts

Creating Woods, not Trees

Genes

Proteins

Pathways

Interactions

Literature

Complex Machines

Virtual Organism

…. from biological facts, we make a system that is some model of a real organism

Networks of Chemicals

Image: http://genome-www.stanford.edu/rap_sir/images/Web_FigF_RAP1_glycolysis.gif

Systems within Systems

Image: http://www.ehponline.org/members/2007/10373/fig1.jpg

A Biologist’s Skills

• By the time a biologist has finished a Ph.D. he/she is about ready for action

• They have a comprehensive knowledge of the facts of a (narrow) domain

• He/she also knows how to do experimentation in that domain

• There are so many facts, it is difficult to move outside one’s sub-discipline

• Yet in a systems view such movement is mandatory

The Role of Knowledge

• A lot of facts

• Perhaps organised into a system

• No equivalent of “laws of mechanics” – we can’t do this biology with mathematics

• Or at least not without knowing what the numbers mean...

• This is why we’ve been using ontologies!

What is an Ontology?

• A description of that which exists (in our data)

• What it means to be a member of a category

• What categories of things exist and how do I recognise that a particular object is a member of a given category

Uses of Ontology in Bioinformatics

Why develop an ontology?

• To make domain assumptions explicit

– Easier to change domain assumptions

– Easier to understand and update legacy data

• To separate domain knowledge from operational knowledge

– Re-use domain and operational knowledge separately

• A community reference for applications

• To share a consistent understanding of what information means.

History of Bio-ontologies

Timeline figure (years shown: 1992, 1996, 1998, 2002, 2005, 2006). Events shown: TAMBIS; MGED; 1st Bio-ontologies meeting; Gene Ontology starts.

Controlled Vocabulary

• An Ontology isn’t a controlled vocabulary, but can be used to deliver one

• By agreeing upon the categories in a domain and agreeing upon their labels we are controlling vocabulary

• Addresses one major problem in biology

• Also forces examination of definitions

• Makes domain assumptions explicit

Transferring Characteristics

Uncharacterised protein

Tra1 La2 La3

High similarity transfer characteristics
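The idea of transferring characteristics from a characterised protein to a highly similar uncharacterised one can be sketched as below. This is a toy under stated assumptions: the sequences, annotations, and the 0.8 threshold are invented for illustration, and `difflib.SequenceMatcher` is a generic string-similarity measure standing in for the sequence alignment tools (for example BLAST) that are used in practice.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Crude similarity in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()


def transfer_annotation(query, characterised, threshold=0.8):
    """Return the annotation of the most similar known protein, or None
    if no known protein reaches the similarity threshold."""
    best_annotation, best_score = None, 0.0
    for sequence, annotation in characterised.values():
        score = similarity(query, sequence)
        if score > best_score:
            best_annotation, best_score = annotation, score
    return best_annotation if best_score >= threshold else None


# Hypothetical characterised proteins: name -> (sequence, annotation).
known = {
    "protein_1": ("MKTAYIAKQR", "kinase (hypothetical)"),
    "protein_2": ("WWPLLHHEED", "transporter (hypothetical)"),
}

print(transfer_annotation("MKTAYIAKQL", known))
print(transfer_annotation("GGGGGGGGGG", known))
```

The threshold is the important design choice: set too low, characteristics are transferred between unrelated proteins; set too high, few uncharacterised proteins receive any annotation.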

Post-Genomic Biology

• Fly, mouse, yeast, worm all have their own terminologies

• I want to compare genomes

• How?

• The genomic sequence is easily dealt with computationally and comparisons are easy

• This is not true of the annotations or knowledge of those sequences

• Need a common understanding

Annotation of Data

• Big effort to create controlled vocabularies using ontologies

• A huge annotation effort – describe the entities in DB with terms from ontologies

• The Gene Ontology (http://www.geneontology.org)

• The Open Biomedical Ontologies Consortium

Genotype Phenotype

Sequence

Proteins

Gene products Transcript

Pathways

Cell type

BRENDA tissue / enzyme source

Development

Anatomy

Phenotype

Plasmodium life cycle

- Sequence types and features
- Genetic Context

- Molecule role
- Molecular Function
- Biological process
- Cellular component

- Protein covalent bond
- Protein domain
- UniProt taxonomy

- Pathway ontology
- Event (INOH pathway ontology)
- Systems Biology
- Protein-protein interaction

- Arabidopsis development
- Cereal plant development
- Plant growth and developmental stage
- C. elegans development
- Drosophila development (FBdv)
- Human developmental anatomy, abstract version
- Human developmental anatomy, timed version

- Mosquito gross anatomy
- Mouse adult gross anatomy
- Mouse gross anatomy and development
- C. elegans gross anatomy
- Arabidopsis gross anatomy
- Cereal plant gross anatomy
- Drosophila gross anatomy
- Dictyostelium discoideum anatomy
- Fungal gross anatomy (FAO)
- Plant structure
- Maize gross anatomy
- Medaka fish anatomy and development
- Zebrafish anatomy and development

- NCI Thesaurus
- Mouse pathology
- Human disease
- Cereal plant trait
- PATO (attribute and value)
- Mammalian phenotype
- Habronattus courtship
- Loggerhead nesting
- Animal natural history and life history

eVOC (Expressed Sequence Annotation for Humans)

The Sequence Ontology

(http://obo.sf.net)

GO in Analysis

• Microarray analysis was one of the original visions for GO

• Modulated genes cluster around the functional attributes of their proteins

• GO is also used in, for example, semantic similarity, text analysis, etc.
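As a rough illustration of using annotations to interpret a cluster of modulated genes, the sketch below counts which terms the genes in a cluster share. The gene names, the `TERM:` labels, and the cluster are all hypothetical. A real analysis would use actual GO annotations and a statistical test against a background set, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical gene -> annotation terms.
annotations = {
    "geneA": {"TERM:splicing", "TERM:rna_binding"},
    "geneB": {"TERM:splicing"},
    "geneC": {"TERM:splicing", "TERM:transport"},
    "geneD": {"TERM:transport"},
}


def term_counts(cluster, annotations):
    """Count how many genes in the cluster carry each annotation term."""
    counts = Counter()
    for gene in cluster:
        counts.update(annotations.get(gene, set()))
    return counts


# Hypothetical cluster of genes modulated together in an experiment.
cluster = ["geneA", "geneB", "geneC"]
counts = term_counts(cluster, annotations)
print(counts.most_common(1))
```

Because every gene is annotated from the same vocabulary, a shared term in a cluster is directly comparable across genes, which free-text descriptions would not allow.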

Fact Management

• When “stamp collecting” we’re collecting facts

• Biology is a fact management activity

• Knowing what these facts mean is very important

• Science is performed on data and the semantics of data enable us to do science

• Semantic e-Science

Summary

• The nature of modern biology gives it interesting knowledge (fact) management issues

• It is a knowledge based discipline

• Not unique, but often extreme

• Ontologies seen as one component in management (but not a panacea)

Acknowledgements

• All these people provided slides and input:

• Duncan Hull

• Simon Jupp

• Phil Lord

• Carole Goble

Genotype to Pathway

Created by Paul Fisher

Pathway to Phenotype

Created by Paul Fisher

Ontology Space

(Axiomatic) Richness

Usage

Representation

Metadata toilet

• Everyone wants to use good metadata but few people want to spend time curating and cleaning metadata

– Like a clean toilet

Biologists Wake up to Standards
