Data provenance in biomedical discovery Donald Dunbar Queen’s Medical Research Institute University of Edinburgh Workshop on Principles of Provenance in Databases May 21st 2008

TRANSCRIPT

Page 1

Data provenance in biomedical discovery

Donald Dunbar

Queen’s Medical Research Institute

University of Edinburgh

Workshop on Principles of Provenance in Databases

May 21st 2008

Page 2

Background

biomedical research

basic & clinical science

animal, cell models, patients

genes, proteins, pathways

data analysis & mining

publication

Page 3

Biomedical discovery

• Looking for contribution to
  – human health and disease

• In-house experiments
  – data workflows
  – knowledge capture

• Use public databases
  – many data types
  – integration is a problem

Page 4

Databases we use

sequence

structure

function

expression

domain specific

Page 5

Data workflows

experiment 2

spreadsheet

raw data

calculations

publication

database

processed data

experiment 1

database

Page 6

Data workflows

copy and paste

open from file

‘algorithm’

copy and paste

save to file

IN

OUT

BUT:

web services

automated tools & databases

bioinformatics workflows

Page 7

Bioinformatics workflows

Page 8

Is our field changing?

databases

experiments

knowledge

knowledgebase

Page 9

Knowledge capture

Page 10

Knowledge capture

Page 11

What provenance do we need?

Example: Gene expression in a transgenic animal

gene annotation

gene expression measurements

public databases

output from machine

processing

integration

where, when

which identifiers

how

when, what, how

data mining

what and how did we select genes

……
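The questions on this slide (where, when, which identifiers, how) could be stored as a small record attached to each dataset. The following is a minimal sketch in Python, not a scheme described in the talk: the class name `ProvenanceRecord`, its field names, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProvenanceRecord:
    """One provenance entry for a dataset (hypothetical schema)."""
    what: str    # the data item, e.g. "gene annotation"
    where: str   # origin, e.g. a public database or a measuring instrument
    when: date   # date the data were retrieved or measured
    how: str     # method or tool settings used to obtain or process the data
    identifiers: list = field(default_factory=list)  # identifier scheme(s) used


def describe(record: ProvenanceRecord) -> str:
    """Return a one-line, human-readable summary of a record."""
    ids = ", ".join(record.identifiers) or "none recorded"
    return (
        f"{record.what}: from {record.where} on {record.when.isoformat()} "
        f"via {record.how} (identifiers: {ids})"
    )


# Illustrative values only; the source name and method are placeholders.
annotation = ProvenanceRecord(
    what="gene annotation",
    where="public database (placeholder)",
    when=date(2008, 5, 21),
    how="bulk download",
    identifiers=["gene symbol"],
)
print(describe(annotation))
```

A second record of the same shape could describe the expression measurements (machine output), so that the integration step can cite both of its inputs.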

Page 12

What provenance do we need?

Example: Curated protein database

expert data

database links

curator input

archive

contributor, date

verify, add, delete, modify

source, identifiers, dates

Curated database

versions, dates

development

schema & interface changes
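The curator actions listed on this slide (verify, add, delete, modify), together with contributor and date, suggest an append-only curation log. The sketch below is one possible shape for such a log and is not taken from the talk: `CurationEvent`, `CurationLog`, the entry identifier, and the curator names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# The four curator actions named on the slide.
ALLOWED_ACTIONS = {"verify", "add", "delete", "modify"}


@dataclass(frozen=True)
class CurationEvent:
    entry_id: str     # hypothetical identifier of the curated entry
    action: str       # one of ALLOWED_ACTIONS
    contributor: str  # who made the change
    when: date        # when the change was made
    source: str = ""  # where the supporting data came from, if known


class CurationLog:
    """Append-only list of curation events."""

    def __init__(self):
        self._events = []

    def record(self, event: CurationEvent) -> None:
        # Reject actions outside the agreed vocabulary.
        if event.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown curation action: {event.action!r}")
        self._events.append(event)

    def history(self, entry_id: str) -> list:
        """Return all events for one entry, in the order they were recorded."""
        return [e for e in self._events if e.entry_id == entry_id]


log = CurationLog()
log.record(CurationEvent("protein-001", "add", "curator A", date(2008, 5, 1), "expert data"))
log.record(CurationEvent("protein-001", "verify", "curator B", date(2008, 5, 21)))
print([e.action for e in log.history("protein-001")])
```

Because events are never edited or removed, the log itself can be archived alongside each database version.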

Page 13

What do we do now (for provenance)?

• We trust the main data providers a lot!
  – a pragmatic approach

• We use tools and note the settings
  – rarely fully

• We put extra fields in our databases
  – source, modify date

• We deposit our data in public repositories
  – but only when we need to
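The "extra fields" approach mentioned above (source, modify date) can be illustrated with an in-memory SQLite table. This is a sketch under assumptions: the table name `gene_annotation`, its columns, and the inserted values are invented for the example and do not describe the speaker's actual databases.

```python
import sqlite3

# In-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Two extra provenance columns sit next to the data itself:
# "source" (where the row came from) and "modified" (when it was last changed).
conn.execute(
    """
    CREATE TABLE gene_annotation (
        gene_id     TEXT PRIMARY KEY,
        description TEXT,
        source      TEXT NOT NULL,
        modified    TEXT NOT NULL DEFAULT (date('now'))
    )
    """
)

# The modify date is filled in by the database default.
conn.execute(
    "INSERT INTO gene_annotation (gene_id, description, source) VALUES (?, ?, ?)",
    ("gene-001", "example description", "public database (placeholder)"),
)

row = conn.execute(
    "SELECT gene_id, source, modified FROM gene_annotation"
).fetchone()
print(row)
```

Fields like these record only the latest state of a row; they do not keep a history of earlier values, which is one reason this kind of provenance is rudimentary.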

Page 14

What might we do next?

• Use workflow tools like Taverna
  – capture workflow provenance

• Build provenance tool & database
  – widely applicable

• Make provenance more visible to biologists
  – so they value and use it
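To give a feel for what capturing workflow provenance involves, here is a small Python sketch that records the step name, settings, and input/output fingerprints each time a processing step runs. It is not Taverna's mechanism or API; the decorator `track`, the log structure, and the example step `filter_genes` are hypothetical.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

PROVENANCE_LOG = []  # one dictionary per executed step


def _digest(value) -> str:
    """Short fingerprint of a JSON-serialisable value."""
    text = json.dumps(value, sort_keys=True, default=str)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def track(step):
    """Wrap a workflow step so that each call is logged with its settings."""

    @functools.wraps(step)
    def wrapper(data, **settings):
        result = step(data, **settings)
        PROVENANCE_LOG.append(
            {
                "step": step.__name__,
                "settings": settings,
                "input_digest": _digest(data),
                "output_digest": _digest(result),
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
        )
        return result

    return wrapper


@track
def filter_genes(expression, threshold=1.0):
    """Keep genes whose expression value is at least the threshold."""
    return {gene: value for gene, value in expression.items() if value >= threshold}


selected = filter_genes({"geneA": 2.5, "geneB": 0.3}, threshold=1.0)
print(selected)
print(PROVENANCE_LOG[0]["step"], PROVENANCE_LOG[0]["settings"])
```

The point of the design is that settings are captured automatically at the moment a step runs, rather than noted by hand afterwards. Only settings passed explicitly as keyword arguments are logged here, so defaults left unstated would not appear in the record.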

Page 15

Conclusions

• In biology we don’t do provenance well (yet)

• We use databases and manual workflows

• We implement rudimentary provenance

• We should build useful provenance tools

• We need to make provenance visible