![Page 1: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/1.jpg)
Overview of current biological databases
Qi Sun
Computational Biology Service Unit
Cornell University
![Page 2: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/2.jpg)
![Page 3: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/3.jpg)
Web Server Database Server
SOAP
HTTP
FTP
SQL
Platforms for Bioinformatics
![Page 4: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/4.jpg)
Open source: Linux, Apache, MySQL, Perl/Python/PHP
Microsoft: Windows, ASP.NET, SQL Server, C#
Platforms for Bioinformatics
![Page 5: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/5.jpg)
Archival database (GenBank, GenPept)
vs
Computer algorithm generated database (Unigene)
vs
Manually curated database (RefSeq)
Public Database - 1
NCBI Sequence Data Model
![Page 6: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/6.jpg)
The NCBI Data Model
GenBank: a DNA-centered database
![Page 7: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/7.jpg)
Identifier:
1. LOCUS (obsolete)
2. Accession (version)
3. GI
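As a small illustration of the "Accession (version)" identifier, the sketch below splits an identifier such as `NM_123456.2` into its accession and version parts. The regular expression is an assumption made for this example (uppercase letters or underscores followed by digits, with an optional `.version` suffix); it is not an official NCBI grammar, and the function name `split_accession` is hypothetical.

```python
import re

# Assumed shape: uppercase letters/underscores, then digits, then an optional ".N" version.
ACCESSION_RE = re.compile(r"^([A-Z_]+\d+)(?:\.(\d+))?$")


def split_accession(identifier):
    """Return (accession, version); version is None when no suffix is present."""
    match = ACCESSION_RE.match(identifier.strip())
    if match is None:
        raise ValueError(f"not an accession-like identifier: {identifier!r}")
    accession, version = match.group(1), match.group(2)
    return accession, (int(version) if version is not None else None)


if __name__ == "__main__":
    print(split_accession("NM_123456.2"))  # ('NM_123456', 2)
    print(split_accession("AP33493"))      # ('AP33493', None)
```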
![Page 8: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/8.jpg)
Features
![Page 9: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/9.jpg)
GenPept: a protein-centered database
![Page 10: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/10.jpg)
FTP sites:
GenBank: ftp://ftp.ncbi.nih.gov/genbank/
GenPept: ftp://ftp.ncifcrf.gov/pub/genpept/
![Page 11: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/11.jpg)
Problems with GenBank and GenPept
• They do not distinguish between sequence categories.
• There is a lot of redundancy.
• The same gene can be deposited into the database many times under different names.
• Different versions of the same gene can be submitted many times under different accession numbers.
• The features of a GenBank record can be chaotic.
![Page 12: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/12.jpg)
Archival database (GenBank, GenPept)
vs
Computer algorithm generated database (Unigene)
vs
Curated database (RefSeq, Locuslink ...)
Public Database - 1
NCBI Sequence Databases
![Page 13: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/13.jpg)
UniGene: a non-redundant set of gene-oriented clusters
GenBank mRNAs
GenBank genomic CDSs
dbEST ESTs
Unigene
![Page 14: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/14.jpg)
Hs for human
Mm for mouse
Rn for rat
Bt for cow
Dr for zebrafish
Dm for fruit fly
Aga for mosquito
Xl for frog
At for cress
Hv for barley
Os for rice
Ta for wheat
Zm for maize
Unigene identifier
Examples:
Mm.213407
Hs.13303
At.138
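A UniGene identifier combines a species prefix with a cluster number, separated by a dot. The sketch below decodes the examples above using only the prefixes listed on this slide; the function name `parse_unigene_id` and the decision to reject unknown prefixes are choices made for this illustration.

```python
# Species prefixes exactly as listed on the slide.
UNIGENE_SPECIES = {
    "Hs": "human", "Mm": "mouse", "Rn": "rat", "Bt": "cow",
    "Dr": "zebrafish", "Dm": "fruit fly", "Aga": "mosquito",
    "Xl": "frog", "At": "cress", "Hv": "barley", "Os": "rice",
    "Ta": "wheat", "Zm": "maize",
}


def parse_unigene_id(identifier):
    """Split 'Prefix.Number' into (species, cluster_number)."""
    prefix, sep, number = identifier.strip().partition(".")
    if not sep or not number.isdigit():
        raise ValueError(f"malformed UniGene identifier: {identifier!r}")
    if prefix not in UNIGENE_SPECIES:
        raise ValueError(f"unknown species prefix: {prefix!r}")
    return UNIGENE_SPECIES[prefix], int(number)


if __name__ == "__main__":
    for uid in ("Mm.213407", "Hs.13303", "At.138"):
        print(uid, parse_unigene_id(uid))
```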
![Page 15: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/15.jpg)
Archival database (GenBank, GenPept)
vs
Computer generated database (Unigene)
vs
Curated database (RefSeq, Gene ...)
NCBI Sequence Databases
Public Database - 1
![Page 16: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/16.jpg)
NCBI human genome annotation pipeline
RefSeq incorporates predicted transcript and protein sequences, experimentally identified mRNA sequences, and EST sequences.
![Page 17: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/17.jpg)
Refseq Accession Numbers:
NT_123456 constructed genomic contigs
NM_123456 mRNAs
NP_123456 proteins
NC_123456 chromosomes
XM_123456 predicted mRNA
XP_123456 predicted protein
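The two-letter prefix tells you what kind of molecule a RefSeq record describes. A minimal sketch of that lookup, covering only the six prefixes shown on this slide (other prefixes are reported as "unknown"); the function name `refseq_molecule_type` is hypothetical:

```python
# Prefix meanings as given on the slide; other prefixes are not covered here.
REFSEQ_PREFIXES = {
    "NT": "constructed genomic contig",
    "NM": "mRNA",
    "NP": "protein",
    "NC": "chromosome",
    "XM": "predicted mRNA",
    "XP": "predicted protein",
}


def refseq_molecule_type(accession):
    """Map a RefSeq accession such as 'NM_123456' to its molecule type."""
    prefix, sep, _ = accession.strip().partition("_")
    if not sep:
        raise ValueError(f"no RefSeq-style prefix in {accession!r}")
    return REFSEQ_PREFIXES.get(prefix, "unknown")


if __name__ == "__main__":
    print(refseq_molecule_type("NM_123456"))  # mRNA
    print(refseq_molecule_type("XP_123456"))  # predicted protein
```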
![Page 18: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/18.jpg)
Genome sequence available
RefSeq acc: NP_123456, et al.
EST sequence available
UniGene acc: Hs.13303, et al.
GenBank acc: AP33493, et al.
RefSeq? UniGene? GenBank?
![Page 19: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/19.jpg)
Go to the web
![Page 20: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/20.jpg)
Files that you can download from the NCBI gene database
gene_info
gene2refseq
gene2go
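These files are tab-delimited text, so they are easy to process with a short script. The sketch below groups GO IDs by gene from gene2go-style input. The column names in the header (`GeneID`, `GO_ID`, and the rest) are an assumption about the file layout and should be checked against the file you download; the sample rows are made-up placeholders, not real annotations.

```python
import csv
import io
from collections import defaultdict

# Placeholder data in an ASSUMED gene2go layout; the rows are illustrative only.
SAMPLE = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "0\t1\tGO:0003674\tND\t-\tmolecular_function\t-\tFunction\n"
    "0\t1\tGO:0008150\tND\t-\tbiological_process\t-\tProcess\n"
    "0\t2\tGO:0005575\tND\t-\tcellular_component\t-\tComponent\n"
)


def go_terms_by_gene(handle):
    """Read tab-delimited rows with a header line; return {GeneID: set of GO IDs}."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    terms = defaultdict(set)
    for row in reader:
        terms[row["GeneID"]].add(row["GO_ID"])
    return dict(terms)


if __name__ == "__main__":
    for gene_id, go_ids in go_terms_by_gene(io.StringIO(SAMPLE)).items():
        print(gene_id, sorted(go_ids))
```

For a real download, the same function could be given an open file handle instead of the in-memory sample.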
![Page 21: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/21.jpg)
NCBI Search engine
Entrez
• Boolean operators “AND”, “OR”, “NOT”
• Entrez tags
• Using limits
• MeSH terms
Batch Entrez
search by accession list
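To make the Boolean operators and field tags concrete, here is a minimal sketch that assembles an Entrez query string. It only builds text and does not contact NCBI. The helper names `entrez_term` and `combine` are hypothetical, and the gene symbol and tags in the example are chosen purely for illustration.

```python
def entrez_term(value, tag=None):
    """Format one search term, quoting multi-word values and appending a [tag]."""
    quoted = f'"{value}"' if " " in value else value
    return f"{quoted}[{tag}]" if tag else quoted


def combine(operator, *terms):
    """Join terms with an uppercase Boolean operator inside parentheses."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator!r}")
    if len(terms) < 2:
        raise ValueError("need at least two terms to combine")
    return "(" + f" {operator} ".join(terms) + ")"


if __name__ == "__main__":
    query = combine(
        "AND",
        entrez_term("Homo sapiens", "Organism"),
        entrez_term("SMN1", "Gene Name"),
    )
    print(query)  # ("Homo sapiens"[Organism] AND SMN1[Gene Name])
```

The resulting string can be pasted into the Entrez search box.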
![Page 22: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/22.jpg)
Other Sequence Databases:
Genomic DNA: Ensembl genome annotation database (http://www.ensembl.org; HTTP, FTP, MySQL interface)
Protein: UniProt (http://www.pir.uniprot.org/)
![Page 23: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/23.jpg)
KEGG database: go to the web
![Page 24: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/24.jpg)
Public Database - 2
GO: Gene Ontology
1. Molecular Function
2. Biological Process
3. Cellular Component
http://www.geneontology.org
![Page 25: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/25.jpg)
Public Database - 2
![Page 26: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/26.jpg)
Public Database - 2
Molecular Function 3674
Biological Process 8150
Cellular Component 5575
GO 3673
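The numbers on this slide are GO term numbers; in files and on the web they are normally written as `GO:` followed by seven zero-padded digits (for example, 3674 becomes `GO:0003674`). A minimal sketch of converting between the two forms, with hypothetical helper names:

```python
def format_go_id(number):
    """Turn a GO term number such as 3674 into the string 'GO:0003674'."""
    if not 0 < number < 10**7:
        raise ValueError(f"GO number out of range: {number}")
    return f"GO:{number:07d}"


def parse_go_id(go_id):
    """Turn 'GO:0003674' back into the integer 3674."""
    prefix, sep, digits = go_id.strip().partition(":")
    if prefix != "GO" or not sep or len(digits) != 7 or not digits.isdigit():
        raise ValueError(f"malformed GO identifier: {go_id!r}")
    return int(digits)


if __name__ == "__main__":
    for name, number in [("Molecular Function", 3674),
                         ("Biological Process", 8150),
                         ("Cellular Component", 5575)]:
        print(name, format_go_id(number))
```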
![Page 27: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/27.jpg)
GO Example 1:
Biological Process
![Page 28: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/28.jpg)
GO Example 2:
Molecular Function
![Page 29: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/29.jpg)
Smn: survival motor neuron
Gene ID: 39844
Gene Ontology Annotation
![Page 30: Computational Biology Service Unit Cornell University](https://reader030.vdocuments.mx/reader030/viewer/2022020706/61fca0a19d50e757a521d35d/html5/thumbnails/30.jpg)
Public Database - 4
Species Specific Databases
• Arabidopsis – TAIR
• Yeast – SGD
• Fly – FlyBase
• Worm – WormBase
• Mouse – MGD