Value-adding, Access, and Use: Biological Databases as a Case Study

Upload: nevin

Post on 18-Mar-2016


TRANSCRIPT

Page 1: Value-adding, Access, and Use: Biological Databases as a Case Study

Value-adding, Access, and Use: Biological Databases as a Case Study

Page 2: Value-adding, Access, and Use: Biological Databases as a Case Study

Genes…..

Page 3: Value-adding, Access, and Use: Biological Databases as a Case Study

…….make proteins

Page 4: Value-adding, Access, and Use: Biological Databases as a Case Study

Proteins form complex 3D structures

Page 5: Value-adding, Access, and Use: Biological Databases as a Case Study

Molecules interact

Page 6: Value-adding, Access, and Use: Biological Databases as a Case Study

the right molecules need to be present at the right time

Page 7: Value-adding, Access, and Use: Biological Databases as a Case Study
Page 8: Value-adding, Access, and Use: Biological Databases as a Case Study

EMBL-Bank: DNA sequences

Page 9: Value-adding, Access, and Use: Biological Databases as a Case Study

EMBL-Bank: DNA sequences

SWISS-PROT + TrEMBL

InterPro

Page 10: Value-adding, Access, and Use: Biological Databases as a Case Study

EMBL-Bank: DNA sequences

SWISS-PROT + TrEMBL

InterPro

EnsEMBL: Metazoan Genome Gene Annotation

Page 11: Value-adding, Access, and Use: Biological Databases as a Case Study

EMBL-Bank: DNA sequences

SWISS-PROT + TrEMBL

InterPro

EnsEMBL: Metazoan Genome Gene Annotation

Array-Express: Microarray Expression Data

Page 12: Value-adding, Access, and Use: Biological Databases as a Case Study

EMBL-Bank: DNA sequences

SWISS-PROT + TrEMBL

InterPro

EnsEMBL: Metazoan Genome Gene Annotation

Array-Express: Microarray Expression Data

Page 13: Value-adding, Access, and Use: Biological Databases as a Case Study

EMBL-Bank: DNA sequences

SWISS-PROT + TrEMBL

InterPro

EnsEMBL: Metazoan Genome Gene Annotation

Array-Express: Microarray Expression Data

EMSD: Macromolecular Structure Data

Page 14: Value-adding, Access, and Use: Biological Databases as a Case Study

EMBL-Bank: DNA sequences

SWISS-PROT + TrEMBL

InterPro

EnsEMBL: Metazoan Genome Gene Annotation

Array-Express: Microarray Expression Data

EMSD: Macromolecular Structure Data

Page 15: Value-adding, Access, and Use: Biological Databases as a Case Study

EnsEMBL

EMBL-Bank: DNA sequences

Array-Express: Microarray Expression Data

IntAct: Protein-Protein Interaction Data

SWISS-PROT + TrEMBL

InterPro

EMSD: Macromolecular Structure Data

Page 16: Value-adding, Access, and Use: Biological Databases as a Case Study
Page 17: Value-adding, Access, and Use: Biological Databases as a Case Study
Page 18: Value-adding, Access, and Use: Biological Databases as a Case Study

Integr8

Page 19: Value-adding, Access, and Use: Biological Databases as a Case Study

EnsEMBL

EMBL-Bank: DNA sequences

Array-Express: Microarray Expression Data

IntAct: Protein-Protein Interaction Data

SWISS-PROT + TrEMBL

InterPro

EMSD: Macromolecular Structure Data

Page 20: Value-adding, Access, and Use: Biological Databases as a Case Study

EMBL-Bank: DNA sequences

IntAct: Protein-Protein Interaction Data

SWISS-PROT + TrEMBL

InterPro

Page 21: Value-adding, Access, and Use: Biological Databases as a Case Study

Running a database project

[Diagram labels: Service Tools; End Users; Service DB; Submitters; Add value (computation); Add value (review etc.); Q/C etc.; Database design; Releases & Updates; Genomes, Genes, Patents, Updates]

Page 22: Value-adding, Access, and Use: Biological Databases as a Case Study

Running a database project

[Diagram labels: Service Tools; End Users; Service DB; Submitters; Add value (computation); Add value (review etc.); Q/C etc.; Database design; Releases & Updates; Genomes, Genes, Patents, Updates; Production DB]

Page 23: Value-adding, Access, and Use: Biological Databases as a Case Study

Running a database project

[Diagram labels: Service Tools; End Users; Service DB; Submitters; Submission tools; Add value (computation); Add value (review etc.); Q/C etc.; Database design; Releases & Updates; Genomes, Genes, Patents, Updates; Production DB]

Page 24: Value-adding, Access, and Use: Biological Databases as a Case Study

Running a database project

[Diagram labels: Service Tools; End Users; Service DB; Production DB; Submitters; Submission tools; Add value (computation); Add value (review etc.); Q/C etc.; Database design; Releases & Updates; Genomes, Genes, Patents, Updates]

Page 25: Value-adding, Access, and Use: Biological Databases as a Case Study

Running a database project

[Diagram labels: Service Tools; End Users; Service DB; Production DB; Submitters; Submission tools; Add value (computation); Add value (review etc.); Q/C etc.; Database design; Releases & Updates; Genomes, Genes, Patents, Updates]

Page 26: Value-adding, Access, and Use: Biological Databases as a Case Study

Running a database project

[Diagram labels: Service Tools; End Users; Service DB; Production DB; Submitters; Submission tools; Add value (computation); Add value (review etc.); Q/C etc.; Database design; Releases & Updates; Genomes, Genes, Patents, Updates]

Page 27: Value-adding, Access, and Use: Biological Databases as a Case Study

Running a database project

[Diagram labels: Data Distrib.; Service Tools; End Users; Service DB; Production DB; Submitters; Submission tools; Add value (computation); Add value (review etc.); Q/C etc.; Database design; Releases & Updates; Genomes, Genes, Patents, Updates]

Page 28: Value-adding, Access, and Use: Biological Databases as a Case Study

Running a database project

[Diagram labels: Data Distrib.; Service Tools; End Users; Service DB; Production DB; Submitters; Submission tools; Add value (review etc.); Data exchange; Other archives; Q/C etc.; Database design; Releases & Updates; Genomes, Genes, Patents, Updates]

Page 29: Value-adding, Access, and Use: Biological Databases as a Case Study

Running a database project

[Diagram labels: Data Distrib.; Service Tools; End Users; Service DB; Production DB; Development DB; Submitters; Submission tools; Add value (review etc.); Data exchange; Other archives; Q/C etc.; Database design; Releases & Updates; Genomes, Genes, Patents, Updates]

Page 30: Value-adding, Access, and Use: Biological Databases as a Case Study

Running a database project

[Diagram labels: Data Distrib.; Service Tools; End Users; Service DB; Production DB; Development DB; Submitters; Submission tools; Add value (computation); Add value (review etc.); Data exchange; Other archives; Q/C etc.; Database design; Releases & Updates; Genomes, Genes, Patents, Updates]

Page 31: Value-adding, Access, and Use: Biological Databases as a Case Study

EMBL nucleotide sequence database

Page 32: Value-adding, Access, and Use: Biological Databases as a Case Study

Dataflow

Page 33: Value-adding, Access, and Use: Biological Databases as a Case Study

EMBL Flat File

ID   SLD746     standard; DNA; PRO; 477 BP.
XX
AC   D83746;
XX
NI   g1772347
XX
DT   18-JAN-1997 (Rel. 50, Created)
DT   17-FEB-1997 (Rel. 50, Last updated, Version 2)
XX
DE   Streptomyces lividans DNA for ribosomal protein S12, complete cds.
XX
KW   ribosomal protein S12.
XX
OS   Streptomyces lividans
OC   Eubacteria; Firmicutes; Actinomycetes; Streptomycetes;
OC   Streptomycetaceae; Streptomyces.
XX
RN   [1]
RP   1-477
RA   Shima J.;
RT   ;
RL   Submitted (06-MAR-1996) to the EMBL/GenBank/DDBJ databases.
RL   Jun Shima, National Food Research Institute; Kannondai 2-1-2, Tsukuba,
RL   Ibaraki 305, Japan (E-mail:[email protected], Tel:0298-38-8124,
RL   Fax:0298-38-7996)
XX
DR   SPTREMBL; P97222; P97222.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..477
FT                   /organism="Streptomyces lividans"
FT                   /strain="TK21"
FT   CDS             28..399
FT                   /db_xref="PID:g1772348"
FT                   /db_xref="SPTREMBL:P97222"
FT                   /product="ribosomal protein S12"
FT                   /translation="MPTIQQLVRKGRQDKVEKNKTPALEGSPQRRGVCTRVFTTTPKKP
FT                   NSALRKVARVRLTSGIEVTAYIPGEGHNLQEHSIVLVRGGRVKDLPGVRYKIIRGSLDT
FT                   QGVKNRKQARSRYGAKKEK"
FT   mutation        289
FT                   /replace="g"
FT                   /phenotype="streptomycin resistant mutant TK24"
XX
SQ   Sequence 477 BP; 99 A; 153 C; 152 G; 73 T; 0 other;
     ATTCGGCACA CAGAAACCGG AGAAGTAGTG CCTACGATCC AGCAGCTGGT CCGGAAGGGC        60
     CGGCAGGACA AGGTCGAGAA GAACAAGACG CCCGCACTCG AGGGTTCGCC CCAGCGCCGT       120
     GGCGTCTGCA CGCGTGTGTT CACGACCACC CCGAAGAAGC CGAACTCGGC CCTGCGTAAG       180
     GTCGCGCGTG TGCGTCTGAC CAGTGGGATC GAGGTCACCG CTTACATTCC GGGTGAGGGG       240
     CACAACCTGC AGGAGCACTC CATCGTGCTC GTGCGCGGCG GCCGTGTGAA GGACCTGCCG       300
     GGTGTTCGCT ACAAGATCAT CCGCGGTTCG CTTGACACCC AGGGTGTGAA GAACCGCAAG       360
     CAGGCCCGCA GCCGCTACGG CGCCAAGAAG GAGAAGTAAG AATGCCTCGT AAGGGCCCCG       420
     CCCCGAAGCG CCCGGTCATC ATCGACCCGG TCTACGGTTC TCCTCTGGTG ACCTCCC          477
//
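The flat-file entry on this slide is line-oriented: each line starts with a two-letter code (ID, AC, DE, FT, SQ, ...), XX lines are spacers, and // ends the entry. A minimal sketch of reading one such entry might look like the following. The function name parse_embl_entry and the returned structure are illustrative assumptions, not the EBI's own parser, and the sketch does not interpret feature-table (FT) continuation lines.

```python
from collections import defaultdict


def parse_embl_entry(text):
    """Group the lines of one EMBL flat-file entry by two-letter line code.

    Returns (fields, sequence): fields maps a line code to the list of
    line bodies seen for it; sequence is the concatenated, upper-cased
    sequence data that follows the SQ line.
    """
    fields = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):        # entry terminator
            break
        if in_sequence:
            # Sequence lines: blocks of bases, then a running base count.
            sequence_parts.extend(tok for tok in line.split() if tok.isalpha())
            continue
        code, _, rest = line.partition(" ")
        if code == "XX" or not code:     # spacer line, blank or indented line
            continue
        fields[code].append(rest.strip())
        if code == "SQ":                 # sequence data starts on the next line
            in_sequence = True
    return dict(fields), "".join(sequence_parts).upper()
```

Applied to the entry shown on the slide, fields["AC"] would hold the accession line and the returned sequence length could be compared with the base count declared on the SQ line as a simple consistency check.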

Page 34: Value-adding, Access, and Use: Biological Databases as a Case Study

EMBL Relational Schema

VARIATIONFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o FREQUENCY o REPLACE * USERSTAMP * TIMESTAMP

UNCLASSIFIEDFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o USERSTAMP o TIMESTAMP

TRANSLATIONEXCEPTION @ PRDB1 (#)
* AMINOACID * FEATID * TRANSXEND# * TRANSXID * TRANSXSTART o BIOSEQID * USERSTAMP * TIMESTAMP

TRANSCRIPTFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o ENUMBER o FIVE_CONS o PSEUDO o READINGFRAME o THREE_CONS * USERSTAMP * TIMESTAMP

THESIS @ PRDB1 (#)
* INSTITUTE## * PUBID o ADVISOR o DEGREE * USERSTAMP * TIMESTAMP

SUBMISSIONREF @ PRDB1 (#)
# * PUBID o DS# o MEDIUM * USERSTAMP * TIMESTAMP

SOURCEFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o ORGANISM o CHROMOSOME o FREQUENCY o HAPLOTYPE o NUCSOURCE o ORGANELLE o SEQUENCED o SEX o VIRION * USERSTAMP * TIMESTAMP

SIGNALFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o DIRECTION o PSEUDO * USERSTAMP * TIMESTAMP

SEQFEATURE_NOTE @ PRDB1
* FEATID * LINE# * NOTE# * TEXT o USERSTAMP o TIMESTAMP

SEQFEATURE @ PRDB1 (#)
* BIOSEQID# * FEATID * FTYPE# * LOCATION * ORDER_IN o EVIDENCE o FEAT_LABEL o INCOMPLETE o SUBMITTOR * USERSTAMP * TIMESTAMP

RPTUNIT @ PRDB1
* FEATID o RPTID o LABEL o RPTEND o RPTSTART * USERSTAMP * TIMESTAMP

RNAFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o ANTICODONAA o ANTICODONEND o ANTICODONSTART o PSEUDO * USERSTAMP * TIMESTAMP

PUB_XREF @ PRDB1
* PUBID * DBCODE * PRIMARYID

PUBLICATION @ PRDB1 (#)
# * PUBID * PUBLANG * PUBSTATUS * PUBTYPE o PUBDATE o TITLE * USERSTAMP * TIMESTAMP

PUBAUTHOR @ PRDB1
* ORDERIN * PERSON * PUBID o EDITORFLAG * USERSTAMP * TIMESTAMP

PROTEINSEQ @ PRDB1 (#)
# * SEQID o DERIVED o MOLWEIGHT * USERSTAMP * TIMESTAMP

PROTEINCODINGFEATURE @ PRDB1 (#)
# * FEATID * FKEY# * PROTEINSEQID o ENUMBER o PSEUDO o READINGFRAME o TRANSL_TABLE_ID * USERSTAMP * TIMESTAMP

PHYSICALSEQ @ PRDB1 (#)
# * PHYSEQID * SEQTEXT * USERSTAMP * TIMESTAMP

PERSONALCOMM @ PRDB1
* ADDRESS * PUBID * RECIPIENT o PHONEID * USERSTAMP * TIMESTAMP

PERSON @ PRDB1 (#)
# * PERSONID * SURNAME o FIRSTNAME o MIDINITIALS * USERSTAMP * TIMESTAMP

PATENT_BIOSEQ @ PRDB1 (#)
* ORDERIN# * PUBID# * SEQID * USERSTAMP * TIMESTAMP

PATENTPRIORITY @ PRDB1
* PRIORITY_DATE * PRIORITY_NO * PRIORITY_OFFICE * PRIORITY_ORDER * PUBID * USERSTAMP * TIMESTAMP

PATENTCLASS @ PRDB1
* CLASS * CLASS_ORDER * PUBID * USERSTAMP * TIMESTAMP

PATENTAPPLICANT @ PRDB1
* APPNAME * ORDERIN * PUBID * USERSTAMP * TIMESTAMP

PATENTABSTRACT @ PRDB1 (#)
* ABSTRACT# * PUBID * USERSTAMP * TIMESTAMP

PATENT @ PRDB1 (#)
* DOCNUM * DOCOFFICE * DOCTYPE# * PUBID o APPDATE o APPNUM o APPOFFICE * USERSTAMP * TIMESTAMP

NUCSTRUCTUREFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o ENUMBER o FREQUENCY o MODBASE o ORGANISM o RPTFAMILY * USERSTAMP * TIMESTAMP

NUCSEQ @ PRDB1 (#)
* A_COUNT * C_COUNT * G_COUNT * OTHER_COUNT# * SEQID * T_COUNT o STRAND o TOPOLOGY * USERSTAMP * TIMESTAMP

NTX_TAX_NODE @ PRDB1 (#)
# * TAX_ID o PARENT_ID o RANK o EMBL_CODE o DIV_ID * INHERIT_DIV_ID o GC_ID o INHERIT_GC_ID o MGC_ID o INHERIT_MGC_ID * HIDDEN o NO_SEQUENCE o REMARK

NTX_SYNONYM @ PRDB1
* TAX_ID * NAME_TXT o UNIQUE_NAME o NAME_CLASS * UPPER_NAME_TXT

NTX_RANK @ PRDB1 (#)
o RANK_ID# * RANK_TXT

NTX_CLASS @ PRDB1 (#)
o CLASS_CODE# * CLASS_TEXT o PRIORITY

LOCATION_TREE @ PRDB1 (#)
# * LOCNODEID * ORDER_IN o COMPLEMENT o END_FSIZE o END_FTYPE o GAP_FSIZE o GAP_FTYPE o LITERAL o OPERATOR o PARENTID o REPLACE o REPL_STRING o SEG_END o SEG_GAP o SEG_START * SEQID o START_FSIZE o START_FTYPE * USERSTAMP * TIMESTAMP

KEYWORD_SYNONYM @ PRDB1 (#)
# * KEYWORDID1# * KEYWORDID2

KEYWORD @ PRDB1 (#)
* COMPRESSED_KW * KEYWORD# * KEYWORDID o DBCODE o DESCRIPTION * USERSTAMP * TIMESTAMP

JOURNALARTICLE @ PRDB1 (#)
* FIRSTPAGE * ISSN# * PUBID * VOLUME o ARTICLETYPE o ISSUE o LASTPAGE o ORDER_ON_PAGE o SUPPLEMENT * USERSTAMP * TIMESTAMP

INSTITUTE @ PRDB1 (#)
# * INSTITUTE# * INSTITUTE_NAME * USERSTAMP * TIMESTAMP

IMMUNOFEATURE @ PRDB1 (#)
# * FEATID * FKEY# o CHIMERIC o PSEUDO o READINGFRAME o TRANSL_TABLE_ID * USERSTAMP * TIMESTAMP

GENE @ PRDB1 (#)
# * GENEID * GENENAME o DBCODE o EXTDBID o ORGANISM o PATHOLOGY o PRODUCT * USERSTAMP * TIMESTAMP

FEATURE_RELATIONSHIP @ PRDB1 (#)
# * FEATID1# * FEATID2# * RELATION * USERSTAMP * TIMESTAMP

FEATURE_QUALIFIERS @ PRDB1 (#)
# * FEATID# * ORDER_ON * QUAL# o TEXT o USERSTAMP o TIMESTAMP

ERROR_QUALIFIERS @ PRDB1 (#)
# * FEATID# * ORDER_ON

DBENTRY_KEYWORD @ PRDB1
* KEYWORDID * DBENTRYID o USERSTAMP o TIMESTAMP

DBENTRY_DESCR @ PRDB1 (#)
# * DBENTRYID# * LINE# * TEXT o USERSTAMP o TIMESTAMP

DBENTRY_COMMENT @ PRDB1 (#)
# * COMMENTID# * DBENTRYID * USERSTAMP * TIMESTAMP

DBENTRY @ PRDB1 (#)
* BIOSEQID# * DBENTRYID * ENTRY_NAME * ENTRY_STATUS * PRIMARYACC# * VERSION# o ANN_DATE o CLEAN_LISTING o CONFIDENTIAL o DBCODE o EXT_DATE o EXT_VER o FIRST_CREATED o FIRST_PUBLIC o HOLD_DATE o MISSING_PAPER o PROJECT# o SUBMIT_TOOL o WAIT_FOR_PAPER * USERSTAMP * TIMESTAMP o FFDATE

DATABASE_XREF @ PRDB1
* EBI_DB * ACC# o NID_TEXT o PID_TEXT * EXT_DB * PRIMARYID o SECONDARYID

COMMENT_TEXT @ PRDB1 (#)
# * COMMENTID# * LINE# * TOPICTYPE o PRIVATE o TEXT...

CODONEXCEPTION @ PRDB1 (#)
* AMINOACID * CODONSEQ# * CODONXID * FEATID o USERSTAMP o TIMESTAMP

CITATIONSEQFEATURE @ PRDB1 (#)
# * FEATID# * PUBID# * SEQID * USERSTAMP * TIMESTAMP

CITATIONBIOSEQ @ PRDB1 (#)
* ORDERIN# * PUBID# * SEQID o CITCOMMENT o FULLSEQ o LOCNODEID o LOCTYPE * USERSTAMP * TIMESTAMP

BOOK @ PRDB1 (#)
* BOOKTITLE * FIRSTPAGE * LASTPAGE# * PUBID * PUBLISHER o EDITION o ISBN o PUBPLACE o SERIES o VOLUME * USERSTAMP * TIMESTAMP

BIOSEQ @ PRDB1 (#)
* BIOSEQTYPE * CHKSUM * MOLECULETYPE# * SEQID * SEQLEN o DDBJSID o EBISID o LOGSEQ o NCBIGI o PHYSEQ * USERSTAMP * TIMESTAMP

ACCPAIR @ PRDB1 (#)
# * PRIMARY# * SECONDARY * USERSTAMP * TIMESTAMP

ACCEPTED @ PRDB1 (#)
* ISSN# * PUBID o ARTICLETYPE o FIRSTPAGE o ISSUE o LASTPAGE o ORDER_ON_PAGE o SUPPLEMENT o VOLUME * USERSTAMP * TIMESTAMP

FK_ACC
FK_ACCPAIR_127
FK_BIOSEQ_4
FK_BIOSEQ_5
FK_BOOK_59
FK_CITATIONBIOSEQ_60
FK_CITATIONBIOSEQ_61
FK_CITATIONSEQFEATURE_10
FK_CITATIONSEQFEATURE_92
FK_CODONEXCEPTION_50
FK_DBENTRY_25
FK_DBENTRY_COMMENT_100
FK_DBENTRY_DESCR_117
FK_DBENTRY_KEYWORD_39
FK_DBENTRY_KEYWORD_40
FK_ERROR_QUALIFIERS_122
FK_FEATURE_QUALIFIERS_121
FK_FEATURE_RELATIONSHIP_28
FK_FEATURE_RELATIONSHIP_29
FK_GENE_64
FK_IMMUNOFEATURE_7
FK_JOURNALARTICLE_66
FK_LOCATION_TREE_68
FK_LOCATION_TREE_72
FK_NAME_CLASS
FK_NTX_SYNONYM_46
FK_NUCSEQ_76
FK_NUCSTRUCTUREFEATURE_55
FK_NUCSTRUCTUREFEATURE_57
FK_PATENTABSTRACT_1
FK_PATENTAPPLICANT_2
FK_PATENTCLASS_98
FK_PATENTPRIORITY_97
FK_PATENT_77
FK_PATENT_BIOSEQ_107
FK_PATENT_BIOSEQ_108
FK_PERSONALCOMM_3
FK_PROTEINCODINGFEATURE_12
FK_PROTEINCODINGFEATURE_13
FK_PROTEINSEQ_80
FK_PROTEINSEQ_81
FK_PUBAUTHOR_82
FK_PUBAUTHOR_83
FK_PUB_XREF_1
FK_RANK
FK_RNAFEATURE_15
FK_RPTUNIT_109
FK_SEQFEATURE_35
FK_SEQFEATURE_36
FK_SEQFEATURE_NOTE_78
FK_SIGNALFEATURE_18
FK_SOURCEFEATURE_20
FK_SOURCEFEATURE_21
FK_SUBMISSIONREF_119
FK_THESIS_106
FK_THESIS_84
FK_THESIS_85
FK_TRANSCRIPTFEATURE_52
FK_TRANSLATIONEXCEPTION_110
FK_TRANSLATIONEXCEPTION_51
FK_UNCLASSIFIEDFEATURE_118
FK_UNPUBLISHED_114
FK_VARIATIONFEATURE_48

Location Info

Feature Info

Taxonomy Info

Reference Info

Sequence Info
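To make the relational layout on this slide more concrete, here is a small sketch that recreates three of the listed tables (DBENTRY, KEYWORD, DBENTRY_KEYWORD) in an in-memory SQLite database and looks up the keywords attached to an entry. Only a few columns are kept, PRIMARYACC# is renamed primaryacc, and the column types and foreign-key directions are assumptions inferred from the table and constraint names, not taken from the production Oracle schema. The helper keywords_for is hypothetical.

```python
import sqlite3

# In-memory database holding a reduced, assumed version of three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dbentry (
    dbentryid  INTEGER PRIMARY KEY,
    primaryacc TEXT NOT NULL,      -- PRIMARYACC# on the slide
    entry_name TEXT NOT NULL
);
CREATE TABLE keyword (
    keywordid INTEGER PRIMARY KEY,
    keyword   TEXT NOT NULL
);
CREATE TABLE dbentry_keyword (     -- link table: one row per entry/keyword pair
    keywordid INTEGER NOT NULL REFERENCES keyword(keywordid),
    dbentryid INTEGER NOT NULL REFERENCES dbentry(dbentryid)
);
""")

# Sample rows taken from the flat-file entry D83746 shown earlier.
conn.execute("INSERT INTO dbentry VALUES (1, 'D83746', 'SLD746')")
conn.execute("INSERT INTO keyword VALUES (10, 'ribosomal protein S12')")
conn.execute("INSERT INTO dbentry_keyword VALUES (10, 1)")


def keywords_for(connection, accession):
    """Return the keywords linked to the entry with this primary accession."""
    rows = connection.execute(
        """SELECT k.keyword
           FROM dbentry e
           JOIN dbentry_keyword ek ON ek.dbentryid = e.dbentryid
           JOIN keyword k ON k.keywordid = ek.keywordid
           WHERE e.primaryacc = ?
           ORDER BY k.keyword""",
        (accession,),
    ).fetchall()
    return [row[0] for row in rows]
```

The point of the sketch is the shape of the design: the KW line of a flat-file entry becomes rows in a link table, so a keyword is stored once and shared by every entry that uses it.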

Page 35: Value-adding, Access, and Use: Biological Databases as a Case Study

Data Access and Use

•Network services

•Sequence Retrieval System (SRS): integrating and linking the main nucleotide and protein databases plus many specialized databases

•Database releases are produced quarterly, via FTP (incl. mirror sites) and CD-ROM

•Daily and cumulative updates via FTP

•Sequence search servers

Page 36: Value-adding, Access, and Use: Biological Databases as a Case Study

April 2003: TrEMBL 23.4 + SWISS-PROT 41.2

•829,111 TrEMBL entries

•123,721 SWISS-PROT entries

•Weekly production of a non-redundant and comprehensive protein sequence database consisting of SWISS-PROT, TrEMBL, and TrEMBLnew: ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb/
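The slide does not say how the weekly non-redundant set is built. As a rough illustration only, the sketch below merges several databases and keeps one representative per distinct sequence, assuming that "redundant" means an exactly identical sequence and that databases listed first take priority. The function name build_nonredundant and the (accession, sequence) input shape are hypothetical.

```python
def build_nonredundant(*databases):
    """Merge databases, keeping the first entry seen for each distinct sequence.

    Each database is a (name, entries) pair, where entries is an iterable
    of (accession, sequence) tuples. Databases listed first win ties.
    Returns a dict mapping sequence -> (database name, accession).
    """
    representatives = {}
    for db_name, entries in databases:
        for accession, sequence in entries:
            key = sequence.upper()          # compare sequences case-insensitively
            if key not in representatives:
                representatives[key] = (db_name, accession)
    return representatives
```

Called as build_nonredundant(("SWISS-PROT", sp), ("TrEMBL", tr), ("TrEMBLnew", trnew)), a sequence present in both SWISS-PROT and TrEMBL would be represented once, by its SWISS-PROT entry.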

Page 37: Value-adding, Access, and Use: Biological Databases as a Case Study

Goals

•High level of annotation

•Minimal redundancy

•High level of integration with other databases

•Complete and up-to-date

•Availability

Page 38: Value-adding, Access, and Use: Biological Databases as a Case Study

Growth of TrEMBL and SWISS-PROT

[Chart: entries in thousands (y-axis, 0 to 900) by publication date (x-axis, Nov-96 to May-02); series: SWISS-PROT, TrEMBL]

Page 39: Value-adding, Access, and Use: Biological Databases as a Case Study

Automatic annotation of TrEMBL

•Data-mining to extract conditions from InterPro

•Extract SWISS-PROT reference entries fulfilling the conditions

•Extract common annotation

•Store conditions and common annotation in RuleBase

•Group TrEMBL by conditions

•Add common annotation to TrEMBL

[Diagram labels: TrEMBL; InterPro; RuleBase; SWISS-PROT]
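The annotation steps listed on this slide can be sketched in a few lines. This is a toy illustration under stated assumptions, not the actual RuleBase implementation: a "condition" is modelled as a set of InterPro identifiers an entry must carry, the data-mining step that produces the conditions is taken as given, "common annotation" is the intersection of the annotation of all matching SWISS-PROT entries, and the function names and dictionary keys are hypothetical.

```python
def build_rulebase(conditions, swissprot):
    """Map each condition to the annotation shared by all matching references.

    conditions: iterable of frozensets of InterPro ids.
    swissprot:  list of dicts with keys "interpro" (set) and "annotation" (set).
    """
    rulebase = {}
    for condition in conditions:
        # Reference entries fulfilling the condition (all ids present).
        references = [e for e in swissprot if condition <= e["interpro"]]
        if not references:
            continue
        # Annotation common to every reference entry.
        common = set.intersection(*(set(e["annotation"]) for e in references))
        if common:
            rulebase[condition] = common
    return rulebase


def annotate_trembl(trembl, rulebase):
    """Return, per TrEMBL accession, the annotation added by matching rules.

    trembl: list of dicts with keys "acc" (str) and "interpro" (set).
    """
    annotated = {}
    for entry in trembl:
        added = set()
        for condition, common in rulebase.items():
            if condition <= entry["interpro"]:
                added |= common
        annotated[entry["acc"]] = added
    return annotated
```

Taking only the intersection is the conservative choice: a TrEMBL entry receives a piece of annotation only if every reviewed SWISS-PROT entry meeting the same condition already carries it.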

Page 40: Value-adding, Access, and Use: Biological Databases as a Case Study

Cross-references (SWISS-PROT/TrEMBL)

•Domains, functional sites, protein families: InterPro, PROSITE, Pfam, PRINTS, ProDom, SMART

•Nucleotide sequence db: EMBL, GenBank, DDBJ

•3D/Structural dbs: HSSP, PDB

•Organism-spec. dbs: DictyDb, EcoGene, FlyBase, HIV, Leproma, MaizeDB, MGD, MypuList, SGD, StyGene, SubtiList, TIGR, TubercuList, WormPep, YEPD, Zfin

•Protein-specific dbs: GCRDb, MEROPS, REBASE, TRANSFAC

•2D-gel protein dbs: SWISS-2DPAGE, ANU-2DPAGE, COMPLUYEAST-2DPAGE, ECO2DBASE, HSC-2DPAGE, Aarhus and Ghent, MAIZE-2DPAGE, PHCI-2DPAGE, PMMA-2DPAGE, Siena-2DPAGE

•Human diseases: MIM

•PTM: CarbBank, GlycoSuiteDB

Page 41: Value-adding, Access, and Use: Biological Databases as a Case Study

[Diagram labels: TrEMBL; UniProt Archive; EnsEMBL; PDB; Patent Data; DDBJ/EMBL/GenBank; PIR; UniProt Knowledgebase: TrEMBL + SWISS-PROT; UniProt NREF100; UniProt NREF90; UniProt NREF50; SWISS-PROT; Other Data…; Classification; Automated Annotation; Literature Based Annotation; RefSeq]

Page 42: Value-adding, Access, and Use: Biological Databases as a Case Study

Funding

•EMBL

•European Commission

•NIH

•Industrial licenses

•MRC

•IUPHAR

Page 43: Value-adding, Access, and Use: Biological Databases as a Case Study
Page 44: Value-adding, Access, and Use: Biological Databases as a Case Study

SWISS-PROT, TrEMBL, InterPro, etc, at EBI and SIB

•Group leaders: Rolf Apweiler, Amos Bairoch

•Co-ordinators: Wolfgang Fleischmann, Henning Hermjakob, Michele Magrane, Maria-Jesus Martin, Nicola Mulder, Claire O’Donovan, Manuela Pruess

•Annotators/curators: Philippe Aldebert, Andrea Auchincloss, Kirsty Bates, Marie-Claude Blatter Garin, Brigitte Boeckmann, Silvia Braconi Quintaj, Paul Browne, Evelyn Camon, Danielle Coral, Elisabeth Coudert, Tania de Oliveria Lima, Kirill Degtyarenko, Sylvie Dethiollaz, Ann Estreicher, Livia Famiglietti, Nathalie Farriol-Mathis, Stephanie Federico, Serenella Ferro, Gill Fraser, Raffaella Gatto, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Janet James, Florence Jungo, Vivien Junker, Youla Karavidopoulou, Maria Krestyaninova, Kati Laiho, Minna Lehvaslaiho, Karine Michoud, Virginie Mittard, Madelaine Moinat, Sandra Orchard, Sandrine Pilbout, Sylvain Poux, Sorogini Reynaud, Catherine Rivoire, Bernd Röchert, Michel Schneider, Christian Sigrist, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Sandra van den Broek, Bob Vaughan, Eleanor Whitfield

•Programmers: Daniel Barrell, David Binns, Michael Darsow, Ujjwal Das, Eduardo de Castro, Alexander Fedotov, Astrid Fleischmann, Elisabeth Gasteiger, Alain Gateau, Andre Hackmann, Ivan Ivanyi, Eric Jain, Alexander Kanapin, Paul Kersey, Ernst Kretschmann, Corinne Lachaize, Chris Lewington, Xavier Martin, John Maslen, Peter McLaren, Rupinder Singh Mazara, Lorna Morris, John O’Rourke, Isabelle Phan, Astrid Rakow, Kai Runte, Florence Servant, Allyson Williams, Dan Wu

•Research staff: Kristian Axelsen, Pierre-Alain Binz, Nicolas Hulo, Anne-Lise Veuthey

•Clerical/secretarial assistance: Veronique Mangold, Claudia Sapsezian, Margaret Shore-Nye, Veronique Verbegue

•Students: Pavel Dobrokhotov, Alexandre Gattiker, various MCF, etc