biological databases an introduction
DESCRIPTION
Biological databases: an introduction. By Dr. Erik Bongcam-Rudloff, LCB-UU/SLU, ILRI 2007. Biological Databases: Sequence Databases, Genome Databases, Structure Databases.
![Page 1: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/1.jpg)
Biological databases: an introduction
By Dr. Erik Bongcam-Rudloff
LCB-UU/SLU
ILRI 2007
![Page 2: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/2.jpg)
Biological Databases
Sequence Databases
Genome Databases
Structure Databases
![Page 3: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/3.jpg)
Sequence Databases
The sequence databases are the oldest type of biological database, and also the most widely used.
![Page 4: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/4.jpg)
Sequence Databases
Nucleotide: ATGC
Protein: MERITSAPLG
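The two alphabets on this slide can be told apart with a simple check. The following is a minimal sketch, not a robust classifier: the function name `guess_sequence_type` is hypothetical, and it assumes plain base letters (A, C, G, T, U) with no ambiguity codes. A short peptide made only of those letters would be misclassified as nucleotide.

```python
# Assumption: only the unambiguous DNA/RNA base letters count as "nucleotide".
NUCLEOTIDE_LETTERS = set("ACGTU")


def guess_sequence_type(sequence: str) -> str:
    """Return 'nucleotide' if every letter is a base letter, else 'protein'."""
    letters = set(sequence.upper())
    if not letters:
        raise ValueError("empty sequence")
    return "nucleotide" if letters <= NUCLEOTIDE_LETTERS else "protein"


print(guess_sequence_type("ATGC"))        # nucleotide
print(guess_sequence_type("MERITSAPLG"))  # protein
```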
![Page 5: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/5.jpg)
The nucleotide sequence repositories
There are three main repositories for nucleotide sequences: EMBL, GenBank, and DDBJ.
All of these should in theory contain "all" known public DNA or RNA sequences.
These repositories collaborate, so that any data submitted to one of the databases is redistributed to the others.
![Page 6: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/6.jpg)
The three databases are the only databases that can issue sequence accession numbers.
Accession numbers are unique identifiers which permanently identify sequences in the databases.
These accession numbers are required by many biological journals before manuscripts are accepted.
![Page 7: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/7.jpg)
It should be noted that during the last decade several commercial companies have engaged in sequencing ESTs and genomes that they have not made public.
![Page 8: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/8.jpg)
EST databases
Expressed sequence tags (ESTs) are short sequences from expressed mRNAs.
The basic idea is to get a handle on the parts of the genome that are expressed as mRNA (often called the transcriptome).
ESTs are generated by end-sequencing clones from cDNA libraries from different sources.
![Page 9: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/9.jpg)
EST cluster databases
UniGene
UniGene is a database at NCBI that contains clusters (UniGene clusters) of sequences that represent unique genes. These clusters are made automatically by partitioning GenBank sequences into a non-redundant set of gene-oriented clusters.
![Page 10: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/10.jpg)
Ideal minimal content of a « sequence » db
Sequences !!
Accession number (AC)
References
Taxonomic data
ANNOTATION/CURATION
Keywords
Cross-references
Documentation
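One way to picture this checklist is as a record type with one field per item. The sketch below is purely illustrative: the class name `SequenceEntry`, the field types, and the `missing_fields` helper are assumptions made for this example and do not describe how any real database stores its entries.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceEntry:
    """Hypothetical record mirroring the 'ideal minimal content' list."""
    accession: str                      # Accession number (AC)
    sequence: str                       # Sequences !!
    references: list[str] = field(default_factory=list)
    taxonomy: list[str] = field(default_factory=list)
    annotation: dict[str, str] = field(default_factory=dict)
    keywords: list[str] = field(default_factory=list)
    cross_references: dict[str, str] = field(default_factory=dict)
    documentation: str = ""

    def missing_fields(self) -> list[str]:
        """Names of the descriptive blocks that are still empty."""
        names = ["references", "taxonomy", "annotation",
                 "keywords", "cross_references", "documentation"]
        return [name for name in names if not getattr(self, name)]


# Truncated sequence, for illustration only.
entry = SequenceEntry(accession="P01588", sequence="MGVHECPAWL",
                      taxonomy=["Homo sapiens"])
print(entry.missing_fields())
```

An entry holding only a sequence and an accession number is valid here but reports every descriptive block as missing. Filling those blocks is the work that annotation and curation add.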
![Page 11: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/11.jpg)
Example: Swiss-Prot entry
sequence
Accession number
Entry name
![Page 12: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/12.jpg)
Protein name
Gene name
Taxonomy
![Page 13: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/13.jpg)
References
![Page 14: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/14.jpg)
Comments
![Page 15: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/15.jpg)
Cross-references
![Page 16: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/16.jpg)
Keywords
![Page 17: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/17.jpg)
Feature table (sequence description)
![Page 18: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/18.jpg)
Sequence database: example
…a SWISS-PROT entry, in FASTA format:
>sp|P01588|EPO_HUMAN ERYTHROPOIETIN PRECURSOR - Homo sapiens (Human).
MGVHECPAWLWLLLSLLSLPLGLPVLGAPPRLICDSRVLERYLLEAKEAE
NITTGCAEHCSLNENITVPDTKVNFYAWKRMEVGQQAVEVWQGLALLSEA
VLRGQALLVNSSQPWEPLQLHVDKAVSGLRSLTTLLRALGAQKEAISPPD
AASAAPLRTITADTFRKLFRVYSNFLRGKLKLYTGEACRTGDR
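A FASTA record like the one above is easy to read programmatically. The following is a minimal sketch of such a reader; the function name `parse_fasta` and the dictionary keys are assumptions made for this example. It assumes the `db|accession|entry_name` identifier layout seen in the header above and simply skips that split when the identifier has a different shape.

```python
def parse_fasta(text: str) -> list[dict]:
    """Parse FASTA text into a list of records (header fields plus sequence)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: identifier up to the first space, free text afterwards.
            identifier, _, description = line[1:].partition(" ")
            record = {"id": identifier, "description": description, "sequence": ""}
            parts = identifier.split("|")
            if len(parts) == 3:  # e.g. sp|P01588|EPO_HUMAN
                record["database"], record["accession"], record["entry_name"] = parts
            records.append(record)
        elif records:
            # Sequence lines are concatenated onto the most recent record.
            records[-1]["sequence"] += line
    return records


fasta_text = """>sp|P01588|EPO_HUMAN ERYTHROPOIETIN PRECURSOR - Homo sapiens (Human).
MGVHECPAWLWLLLSLLSLPLGLPVLGAPPRLICDSRVLERYLLEAKEAE
NITTGCAEHCSLNENITVPDTKVNFYAWKRMEVGQQAVEVWQGLALLSEA
VLRGQALLVNSSQPWEPLQLHVDKAVSGLRSLTTLLRALGAQKEAISPPD
AASAAPLRTITADTFRKLFRVYSNFLRGKLKLYTGEACRTGDR
"""

entry = parse_fasta(fasta_text)[0]
print(entry["accession"], entry["entry_name"])  # P01588 EPO_HUMAN
print(len(entry["sequence"]))
```

For real work, an established parser such as the one in Biopython is a safer choice than a hand-written reader.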
![Page 19: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/19.jpg)
SWISS-PROT knowledgebase
Created by Amos Bairoch in 1986
Collaboration between the SIB (CH) and EBI (UK)
Annotated (manually), non-redundant, cross-referenced, documented protein sequence database.
~122’000 sequences from more than 7’700 different species; 192’000 references (publications); 958’000 cross-references (databases); ~400 Mb of annotations.
Weekly releases; available from more than 50 servers across the world, the main source being ExPASy
![Page 20: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/20.jpg)
SWISS-PROT: species
7’700 different species
20 species represent about 42% of all sequences in the database
5’000 species are only represented by one to three sequences. In most cases, these are sequences which were obtained in the context of a phylogenetic study.
![Page 21: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/21.jpg)
SWISS-PROT
Domains, functional sites, protein families: PROSITE, InterPro, Pfam, PRINTS, SMART, Mendel-GFDb
Nucleotide sequence db: EMBL, GenBank, DDBJ
2D and 3D structural dbs: HSSP, PDB
Organism-spec. dbs: DictyDb, EcoGene, FlyBase, HIV, MaizeDB, MGD, SGD, StyGene, SubtiList, TIGR, TubercuList, WormPep, Zebrafish
Protein-specific dbs: GCRDb, MEROPS, REBASE, TRANSFAC
2D-gel protein databases: SWISS-2DPAGE, ECO2DBASE, HSC-2DPAGE, Aarhus and Ghent, MAIZE-2DPAGE
Human diseases: MIM
PTM: CarbBank, GlycoSuiteDB
![Page 22: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/22.jpg)
Annotations
Function(s)
Post-translational modifications (PTM)
Domains
Quaternary structure
Similarities
Diseases, mutagenesis
Conflicts, variants
Cross-references
…
![Page 23: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/23.jpg)
Annotation schema
Amos Bairoch
Head annotator 1, Head annotator 2, … Head annotator n
Annotators
Experts
SwissProt
![Page 24: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/24.jpg)
Code    Content                       Occurrence in an entry
------  ----------------------------  ---------------------------
ID      Identification                One; starts the entry
AC      Accession number(s)           One or more
DT      Date                          Three times
DE      Description                   One or more
GN      Gene name(s)                  Optional
OS      Organism species              One or more
OG      Organelle                     Optional
OC      Organism classification       One or more
OX      Taxonomy cross-references     One or more
RN      Reference number              One or more
RP      Reference position            One or more
RC      Reference comment(s)          Optional
RX      Reference cross-reference(s)  Optional
RA      Reference authors             One or more
RT      Reference title               Optional
RL      Reference location            One or more
CC      Comments or notes             Optional
DR      Database cross-references     Optional
KW      Keywords                      Optional
FT      Feature table data            Optional
SQ      Sequence header               One
        Amino Acid Sequence           One or more
//      Termination line              One; ends the entry
Manual annotation
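The two-letter line codes make a flat-file entry straightforward to split into fields. The sketch below groups the lines of one entry by code. It assumes the usual layout of a two-character code followed by three spaces, with sequence lines carrying a blank code. The function name `group_lines_by_code` is hypothetical, and the sample entry is heavily abbreviated for illustration rather than being a complete record.

```python
from collections import defaultdict


def group_lines_by_code(entry_text: str) -> dict[str, list[str]]:
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):  # termination line: the entry ends here
            break
        code, content = line[:2], line[5:].rstrip()
        if code.strip() == "":
            # Blank code: amino acid sequence data, printed in spaced blocks.
            fields["sequence"].append(content.replace(" ", ""))
        else:
            fields[code].append(content)
    return dict(fields)


# Abbreviated, illustrative entry (most line types omitted).
sample_entry = "\n".join([
    "ID   EPO_HUMAN",
    "AC   P01588;",
    "DE   ERYTHROPOIETIN PRECURSOR.",
    "OS   Homo sapiens (Human).",
    "     MGVHECPAWL WLLLSLLSLP LGLPVLGAPP",
    "//",
])

fields = group_lines_by_code(sample_entry)
print(fields["AC"])        # ['P01588;']
print(fields["sequence"])  # ['MGVHECPAWLWLLLSLLSLPLGLPVLGAPP']
```

Keeping each code's lines in a list reflects the "One or more" occurrences in the table, since several line types can repeat within a single entry.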
![Page 25: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/25.jpg)
TrEMBL (Translated EMBL)
TrEMBL: created in 1996;
Computer-annotated supplement to SWISS-PROT, as it is impossible to cope with the flow of data…
Well-structured SWISS-PROT-like resource
Derived from automated EMBL CDS translation (maintained at the EBI (UK))
TrEMBL is automatically generated and annotated using software tools (incompatible with SWISS-PROT in terms of quality)
TrEMBL contains everything that is not yet in SWISS-PROT
Yerk!! But there is no choice, and these software tools are becoming quite good!
![Page 26: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/26.jpg)
The simplified story of a Sprot entry
cDNAs, genomes, …
EMBLnew, EMBL
CDS
TrEMBLnew, TrEMBL
« Automatic »
• Redundancy check (merge)
• InterPro (family attribution)
• Annotation
SWISS-PROT
« Manual »
• Redundancy (merge, conflicts)
• Annotation
• Sprot tools (macros…)
• Sprot documentation
• Medline
• Databases (MIM, MGD….)
• Brain storming
Once in Sprot, the entry is no longer in TrEMBL, but is still in EMBL (archive)
TrEMBL: example
Original TrEMBL entry that has been integrated into the SWISS-PROT EPO_HUMAN entry and is therefore no longer found in TrEMBL.
![Page 28: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/28.jpg)
Some protein motif databases
PROSITE - Regular expressions built from SWISS-PROT
PRINTS - Aligned motif consensus built from OWL
• (http://bioinf.man.ac.uk/dbbrowser/PRINTS/PRINTS.html)
BLOCKS - PRINTS-like, generated from PROSITE families
• (http://www.blocks.fhcrc.org/)
IDENTIFY - Fuzzy regular expressions derived from PROSITE
Pfam - Hidden Markov Models built from SWISS-PROT
• (http://www.sanger.ac.uk/Software/Pfam)
Profiles - Weight matrix profiles built from SWISS-PROT
InterPro - All of the above (almost)
• (http://www.ebi.ac.uk/InterPro)
![Page 29: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/29.jpg)
A domain database synchronised with SWISS-PROT
![Page 30: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/30.jpg)
History
Founded by Amos Bairoch
1988 First release in the PC/Gene software
1990 Synchronisation with Swiss-Prot
1994 Integration of « profiles »
1999 PROSITE joins InterPro
January 2003 Current release 17.32
The database
![Page 31: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/31.jpg)
Database content
Official Release
  ~1330 Patterns        PSxxxxx    PATTERN
  ~252  Profiles        PSxxxxx    MATRIX
  4     Rules           PSxxxxx    RULE
  ~1156 Documentations  PDOCxxxxx
Pre-Release
  ~150  Profiles        PSxxxxx    MATRIX
  ~100  Documentations  QDOCxxxxx
![Page 32: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/32.jpg)
Prosite (pattern): example
![Page 33: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/33.jpg)
Prosite (pattern): example
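As an illustration of how a PROSITE pattern can be applied to a sequence, the sketch below translates the basic pattern syntax into a Python regular expression and scans a sequence with it. This is a simplified converter, not PROSITE's own scanning software: the function names are hypothetical, and it handles only the common elements (`x`, `[..]`, `{..}`, repeat counts, and the `<` / `>` anchors outside brackets). The N-glycosylation pattern `N-{P}-[ST]-{P}` is quoted from PROSITE itself rather than from these slides, and the test fragment is taken from the EPO_HUMAN sequence shown earlier.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}   -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x = any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched text), overlaps included."""
    # A lookahead lets matches that overlap each other all be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))  # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}.", "NITTGCAEHCSLNENITVPDTKVNFYAWKR"))
# [(1, 'NITT'), (15, 'NITV')]
```

A short pattern like this one matches by chance in many proteins, which is one reason pattern hits are normally read alongside the accompanying documentation entry.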
![Page 34: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/34.jpg)
Database content: documentation
![Page 35: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/35.jpg)
Other protein domain/family db
PROSITE      Patterns / Profiles
ProDom       Aligned motifs (PSI-BLAST) (Pfam B)
PRINTS       Aligned motifs
Pfam         HMM (Hidden Markov Models)
SMART        HMM
TIGRfam      HMM
DOMO         Aligned motifs
BLOCKS       Aligned motifs (PSI-BLAST)
CDD (CDART)  PSI-BLAST (PSSM) of Pfam and SMART
InterPro
![Page 36: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/36.jpg)
InterPro: www.ebi.ac.uk/interpro
![Page 37: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/37.jpg)
InterPro example
![Page 38: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/38.jpg)
InterPro example
![Page 39: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/39.jpg)
InterPro graphic example
![Page 40: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/40.jpg)
Genomic Databases
Genome databases differ from sequence databases in that the data they contain are much more diverse.
The idea behind a genome database is to organize all information on an organism (or as much as possible).
In many cases they stem from the need for a centralized resource for a particular genome project, but they are also important resources for the wider research community.
![Page 41: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/41.jpg)
Genomic Databases
Ensembl Genome Browser NCBI
![Page 42: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/42.jpg)
Structure Databases
PDB
SCOP
![Page 43: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/43.jpg)
PDB
The Protein Data Bank (PDB) was established at Brookhaven National Laboratory (BNL) (1) in 1971 as an archive for biological macromolecular crystal structures.
The three-dimensional structures in PDB are primarily derived from experimental data obtained by X-ray crystallography and NMR.
![Page 44: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/44.jpg)
SCOP
The SCOP database groups different protein structures according to their evolutionary relationships. The evolutionary relationships of all known protein structures have been determined by manual inspection and automated methods.
The goal of SCOP is to provide detailed information about the close relatives of a protein and to provide an evolution-based protein classification resource.
![Page 45: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/45.jpg)
UniProt: United Protein database
SWISS-PROT + TrEMBL + PIR = UniProt
Born in October 2002
NIH pledges cash for global protein database
The United States is turning to European bioinformatics facilities to help it meet its researchers' future needs for databases of protein sequences.
European institutions are set to be the main recipients of a $15-million, three-year grant from the US National Institutes of Health (NIH), to set up a global database of information on protein sequence and function known as the United Protein Databases, or UniProt (Nature, 419, 101 (2002)).
![Page 46: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/46.jpg)
Some examples of integrated biological database resources are:
SRS (Sequence Retrieval System)
Entrez Browser (at NCBI)
ExPASy (home of SwissProt)
Ensembl (Open Source based system)
Human Genome Browser (Jim Kent's creation)
![Page 47: Biological databases an introduction](https://reader035.vdocuments.mx/reader035/viewer/2022062314/568146ba550346895db3e9ba/html5/thumbnails/47.jpg)
THANKS
Laurent Falquet, SIB and EMBnet-CH, for slides and information on SwissProt and Prosite