![Page 1: Information Resources for Bioinformatics 1 MARC: Developing Bioinformatics Programs July, 2008 Alex Ropelewski ropelews@psc.edu Hugh Nicholas nicholas@psc.edu](https://reader033.vdocuments.mx/reader033/viewer/2022051620/56649e625503460f94b5d780/html5/thumbnails/1.jpg)
Information Resources for Bioinformatics
MARC: Developing Bioinformatics Programs
July, 2008
Alex Ropelewski ropelews@psc.edu
Hugh Nicholas nicholas@psc.edu
These materials were developed with funding from the US National Institutes of Health grant #2T36 GM008789 to the Pittsburgh Supercomputing Center
What is an Information Resource?
A compilation of prior experimental knowledge about biologically relevant molecules, stored in a computer system.
The power of bioinformatics lies in the ability to leverage this prior experimental knowledge and apply it to new biological problems.
To search prior experimental knowledge effectively, it must be organized in a way that makes sense from both a computer-science perspective and a biological point of view.
How is Information Organized?
From a computer-science perspective, data can be organized and stored in several ways: in a relational database, in a flat file, or in a networked (hyperlinked) model.
From a biologist's perspective, data can also be organized in several different ways: by sequence, structure, family/domain, species, taxonomy, function/pathway, disease/variation, publication (journal), and in many other ways.
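The flat-file and relational views above can be contrasted in a few lines of Python. This is a minimal sketch, assuming a hypothetical tab-separated flat file with one record per line (accession, species, sequence); it loads the records into an in-memory relational table with the standard-library `sqlite3` module and then asks a question by species in SQL. The accessions, sequences and table layout are placeholders, not the schema of any real sequence database.

```python
import sqlite3

# Hypothetical flat-file content: one record per line, tab-separated
# (accession, species, sequence). Placeholders only, not real entries.
FLAT_FILE = """\
SEQ0001\tHomo sapiens\tMKTAYIAKQR
SEQ0002\tMus musculus\tMKTAYIAKQL
SEQ0003\tHomo sapiens\tMSDNELQKLA
"""

def load_flat_file(text):
    """Parse the tab-separated flat file into (accession, species, sequence) tuples."""
    records = []
    for line in text.splitlines():
        if line.strip():
            accession, species, sequence = line.split("\t")
            records.append((accession, species, sequence))
    return records

def build_database(records):
    """Store the records in an in-memory relational table keyed by accession."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences (accession TEXT PRIMARY KEY, species TEXT, sequence TEXT)"
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)
    return conn

def accessions_for_species(conn, species):
    """Relational query: every accession recorded for one species, sorted."""
    rows = conn.execute(
        "SELECT accession FROM sequences WHERE species = ? ORDER BY accession",
        (species,),
    )
    return [row[0] for row in rows]

if __name__ == "__main__":
    db = build_database(load_flat_file(FLAT_FILE))
    print(accessions_for_species(db, "Homo sapiens"))
```

The flat file is easy to write and exchange; the relational table lets the same question be asked with a declarative query instead of hand-written scanning code.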
Sequence Data Libraries
Organized according to sequence. When people talk about “searching sequence databases,” these are the libraries they are searching.
The main sources for sequence libraries are direct submissions from individual researchers, genome sequencing projects, patent applications and other public resources.
GenBank, EMBL, and the DNA Data Bank of Japan (DDBJ) are examples of annotated collections of publicly available DNA sequences.
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data.
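Sequence libraries such as GenBank and UniProt can be downloaded as flat files, and FASTA is one of the most common formats: a header line starting with `>` followed by one or more lines of sequence. The sketch below is a minimal FASTA reader in plain Python; the two records are hypothetical, and in practice an established parser such as Biopython's `SeqIO` would usually be preferred.

```python
from typing import Iterable, Iterator, Tuple

def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the '>' marker
            chunks = []        # sequence lines seen before any header are discarded here
        else:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)

# Hypothetical two-record file; identifiers and sequences are placeholders.
EXAMPLE = """>seqA example protein
MKTAYIAKQR
QISFVKSHFS
>seqB example protein
MSDNELQKLA
"""

if __name__ == "__main__":
    for header, seq in read_fasta(EXAMPLE.splitlines()):
        print(header.split()[0], len(seq))
```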
Structural Data Libraries
Contain information about the three-dimensional structure of molecules. The main source of structural data is direct submission by researchers. Structures can be determined by a variety of experimental techniques, including X-ray crystallography, NMR spectroscopy, electron microscopy (EM), and other methods such as electron diffraction and fiber diffraction.
The Protein Data Bank (PDB) and the Cambridge Structural Database (CSD) are two well-known repositories of structural information.
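Structural libraries such as the Protein Data Bank distribute coordinates in, among other formats, the legacy fixed-column PDB format, where each ATOM record stores an atom's name, residue, chain and x, y, z coordinates at fixed character positions. The sketch below reads alpha-carbon (CA) atoms from such records and measures the distance between two residues. It handles only ATOM lines and ignores alternate locations, insertion codes and the newer mmCIF format; `make_ca_record` is a hypothetical helper that builds made-up records for the example.

```python
import math

def parse_atom(line):
    """Read the fixed-column fields of one PDB ATOM record (1-based columns in comments)."""
    return {
        "name": line[12:16].strip(),      # atom name, columns 13-16
        "res_name": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],                # chain identifier, column 22
        "res_seq": int(line[22:26]),      # residue sequence number, columns 23-26
        "xyz": (float(line[30:38]),       # x, columns 31-38
                float(line[38:46]),       # y, columns 39-46
                float(line[46:54])),      # z, columns 47-54
    }

def ca_atoms(lines):
    """Map (chain, residue number) to the coordinates of each alpha carbon."""
    cas = {}
    for line in lines:
        if line.startswith("ATOM"):
            atom = parse_atom(line)
            if atom["name"] == "CA":
                cas[(atom["chain"], atom["res_seq"])] = atom["xyz"]
    return cas

def make_ca_record(serial, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: build a minimal CA ATOM line (occupancy 1.00, B-factor 20.00)."""
    return (f"ATOM  {serial:>5}  CA  {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{20.0:6.2f}")

if __name__ == "__main__":
    records = [make_ca_record(1, "GLY", "A", 1, 0.0, 0.0, 0.0),
               make_ca_record(2, "ALA", "A", 2, 3.8, 0.0, 0.0)]
    cas = ca_atoms(records)
    print(math.dist(cas[("A", 1)], cas[("A", 2)]))  # Euclidean CA-CA distance
```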
Family and Domain Libraries
Typically built from sets of related sequences; they contain information about the residues that are essential to the structure or function of those sequences.
Used to:
Generate the hypothesis that a query sequence has the same structure/function as the matching group of sequences.
Quickly identify a good group of sequences known to share a biological relationship.
Some examples: Pfam, PROSITE, BLOCKS, PRINTS
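PROSITE describes many families with consensus patterns written in its own syntax: `x` for any residue, `[ST]` for either listed residue, `{P}` for anything except proline, `x(2,4)` for a repeat, and `<`/`>` for the N- and C-terminus. A minimal sketch, assuming only that subset of the syntax, translates such a pattern into a Python regular expression and scans a sequence. The N-glycosylation pattern `N-{P}-[ST]-{P}` serves as the example; the query sequence is made up, and forms such as `>` inside brackets are not handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a subset of PROSITE pattern syntax into a Python regex string."""
    regex_parts = []
    for element in pattern.rstrip(".").split("-"):
        start_anchor = element.startswith("<")   # '<' marks the N-terminus
        end_anchor = element.endswith(">")       # '>' marks the C-terminus
        element = element.lstrip("<").rstrip(">")
        # Optional repetition such as x(2) or x(2,4)
        repeat = ""
        match = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if match:
            element, low, high = match.groups()
            repeat = "{" + low + ("," + high if high else "") + "}"
        if element == "x":
            core = "."                           # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"    # any residue except those listed
        else:
            core = element                       # single residue or [..] class, already valid regex
        regex_parts.append(("^" if start_anchor else "") + core + repeat
                           + ("$" if end_anchor else ""))
    return "".join(regex_parts)

def find_sites(pattern, sequence):
    """Return 0-based start positions of non-overlapping matches."""
    return [m.start() for m in re.finditer(prosite_to_regex(pattern), sequence)]

if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_sites("N-{P}-[ST]-{P}.", "AANGSAANPSA"))
```

Because `re.finditer` reports non-overlapping matches, overlapping sites would need a lookahead-based search instead.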
Species Libraries
The goal is to collect and organize a variety of information about the genome of a particular species.
Usually each species has its own portal for accessing information such as genome-scale datasets for that species.
Examples: EuPathDB, the Eukaryotic Pathogens Database (Cryptosporidium, Giardia, Plasmodium, Toxoplasma and Trichomonas); the Saccharomyces Genome Database; the Rat Genome Database; the Candida Genome Database
Taxonomy Libraries
Taxonomy is the science of naming and classifying organisms.
Taxonomy is organized as a tree structure that represents taxonomic lineage. Leaves at the bottom level represent species or subspecies; nodes higher in the tree represent higher ranks such as phylum, order and family.
Examples: NEWT, NCBI Taxonomy
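The tree structure can be sketched with a simple child-to-parent map: walking parent links from a leaf yields its lineage, and the first shared node of two lineages is their lowest common ancestor. This is a minimal sketch, assuming a hand-built toy tree that uses names only (no NCBI Taxonomy identifiers) and omits many intermediate ranks; it is not how NEWT or NCBI Taxonomy store their data.

```python
# Toy child -> parent map; many intermediate ranks are omitted for brevity.
PARENT = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Pan troglodytes": "Pan",
    "Pan": "Hominidae",
    "Hominidae": "Primates",
    "Mus musculus": "Mus",
    "Mus": "Muridae",
    "Muridae": "Rodentia",
    "Primates": "Mammalia",
    "Rodentia": "Mammalia",
    "Mammalia": "Chordata",
    "Chordata": "Metazoa",  # root of this toy tree
}

def lineage(taxon):
    """Walk parent links from a node up to the root; returns [node, ..., root]."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lowest_common_ancestor(a, b):
    """First node on a's lineage that also lies on b's lineage (None if disjoint)."""
    ancestors_b = set(lineage(b))
    for node in lineage(a):
        if node in ancestors_b:
            return node
    return None

if __name__ == "__main__":
    print(" > ".join(lineage("Homo sapiens")))
    print(lowest_common_ancestor("Homo sapiens", "Mus musculus"))
```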
Taxonomy Libraries - NEWT
NCBI Taxonomy Browser
Function/Pathway
Collections of pathway maps representing our knowledge of the molecular interaction and reaction networks for: metabolism, genetic information processing, environmental information processing, cellular processes, human diseases, and drug development.
Examples: KEGG Pathway Database, NCI-Nature Pathway Interaction Database
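A pathway map is naturally a directed graph of compounds connected by reactions, so questions such as "how many steps separate two compounds?" become graph searches. The sketch below, assuming a simplified hand-written fragment of the early steps of glycolysis (enzymes noted in comments; cofactors and reversibility omitted), uses breadth-first search to find the route with the fewest reaction steps. It does not use KEGG's data or file formats.

```python
from collections import deque

# Simplified substrate -> products edges for early glycolysis.
REACTIONS = {
    "glucose": ["glucose-6-phosphate"],                        # hexokinase
    "glucose-6-phosphate": ["fructose-6-phosphate"],           # phosphoglucose isomerase
    "fructose-6-phosphate": ["fructose-1,6-bisphosphate"],     # phosphofructokinase-1
    "fructose-1,6-bisphosphate": ["dihydroxyacetone phosphate",
                                  "glyceraldehyde-3-phosphate"],  # aldolase
    "dihydroxyacetone phosphate": ["glyceraldehyde-3-phosphate"],  # triose phosphate isomerase
}

def shortest_route(start, goal, graph):
    """Breadth-first search for the route with the fewest reaction steps, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            route = []
            while node is not None:  # follow back-pointers to the start
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

if __name__ == "__main__":
    print(" -> ".join(shortest_route("glucose", "glyceraldehyde-3-phosphate", REACTIONS)))
```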
Disease/Variation
Catalogs of genetic variation, both within populations and among populations in different parts of the world, as well as of genetic disorders and other diseases.
Examples:
OMIM (Online Mendelian Inheritance in Man) focuses primarily on inherited, or heritable, genetic diseases in humans.
HapMap is a catalog of common genetic variants that occur in humans.
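Variation catalogs typically summarize each variant by allele frequencies, and whether a variant counts as "common" depends on its minor allele frequency. A minimal sketch, assuming a single biallelic site and hypothetical genotype counts, computes the allele frequencies p and q, the minor allele frequency, and the genotype counts expected under Hardy-Weinberg equilibrium (N p², 2 N p q, N q²).

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Frequencies (p of allele A, q of allele a) from genotype counts at one biallelic site."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # each diploid individual carries two alleles
    p = (2 * n_AA + n_Aa) / total_alleles
    return p, 1.0 - p

def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of the rarer of the two alleles."""
    return min(allele_frequencies(n_AA, n_Aa, n_aa))

def hardy_weinberg_expected(n_AA, n_Aa, n_aa):
    """Expected genotype counts under Hardy-Weinberg equilibrium: N p^2, 2 N p q, N q^2."""
    n = n_AA + n_Aa + n_aa
    p, q = allele_frequencies(n_AA, n_Aa, n_aa)
    return n * p * p, 2 * n * p * q, n * q * q

if __name__ == "__main__":
    counts = (50, 40, 10)  # hypothetical genotype counts for AA, Aa, aa
    print(allele_frequencies(*counts))
    print(hardy_weinberg_expected(*counts))
```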
Journal
The U.S. National Library of Medicine's PubMed is the premier resource for scientific literature relevant to the biomedical sciences. It includes over 18 million citations from MEDLINE and other life science journals, for articles dating back to the 1950s (as of 2008). PubMed includes links to full-text articles and other related resources.
Common uses of PubMed:
Find journal articles that describe the structure, function or evolution of sequences you are interested in.
Find out whether anyone has already done the work you are proposing.
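PubMed can also be searched programmatically through NCBI's E-utilities; the ESearch service takes a database name (`db`) and a query (`term`) and returns matching record IDs. The sketch below only builds the request URL with the standard library, so it runs offline; the query string is illustrative, and fetching the URL (for example with `urllib.request.urlopen`) needs network access and should follow NCBI's E-utilities usage guidelines. Other Entrez databases, such as `nuccore` or `omim`, can be passed as `db`.

```python
from urllib.parse import urlencode

# Base URL of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(term, db="pubmed", retmax=20):
    """Build an ESearch request URL; fetching it returns IDs of records matching `term` in `db`."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)

if __name__ == "__main__":
    # Only prints the URL; no request is sent.
    print(esearch_url("hemoglobin[Title] AND evolution"))
```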
Integrated Information Resources
Integrated resources typically combine relational databases with hyperlinks to databases maintained by others, providing more information than any single data source can.
Many examples:
NCBI Entrez: NCBI's cross-database search tool.
iProClass: proteins with links to over 90 biological databases, including databases for protein families, functions and pathways, interactions, structures and structural classifications, genes and genomes, ontologies, literature, and taxonomy.
InterPro: an integrated resource of protein domains and functional sites.
NCBI Entrez Data Integration
NCBI Entrez
NCBI Entrez Results
NCBI Entrez PubMed Results
NCBI Entrez OMIM Results
NCBI Entrez Core Nucleotide Results
NCBI Entrez Saving Sequences
NCBI Sequence Identifiers
Accession number: a unique identifier given to a sequence when it is submitted to one of the DNA repositories (GenBank, EMBL, DDBJ). These identifiers follow an accession.version format: updates increment the version, while the accession remains constant.
GI: GenInfo Identifier. If a sequence changes, it is assigned a new GI number. A separate GI number is also assigned to each protein translation.
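The accession.version convention is easy to handle in code: split on the last dot, keep the accession as the stable key, and compare versions to tell which record is more recent. A minimal sketch, assuming identifiers that are exactly in accession.version form; the identifiers used below are placeholders, not references to specific records.

```python
from typing import NamedTuple

class VersionedAccession(NamedTuple):
    accession: str  # stays constant across updates
    version: int    # incremented each time the sequence is updated

def parse_accession(identifier):
    """Split an 'accession.version' identifier, e.g. 'AB000001.2' -> ('AB000001', 2)."""
    accession, sep, version = identifier.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not in accession.version form: {identifier!r}")
    return VersionedAccession(accession, int(version))

def is_newer(a, b):
    """True if a and b share an accession and a has the higher version number."""
    pa, pb = parse_accession(a), parse_accession(b)
    return pa.accession == pb.accession and pa.version > pb.version

if __name__ == "__main__":
    print(parse_accession("AB000001.2"))
    print(is_newer("AB000001.3", "AB000001.2"))
```

Splitting on the last dot with `rpartition` also keeps RefSeq-style accessions with an underscore (such as `NM_000001.10`) intact.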
iProClass Protein Knowledgebase
Protein-centric, with links to over 90 biological data libraries.
The goal is to provide a comprehensive picture of protein properties that may lead to functional inference for previously uncharacterized "hypothetical" proteins and protein groups.
Uses both data warehousing in relational databases and hypertext links to outside data sources.
iProClass Integration
iProClass Search Form
iProClass Results
iProClass SuperFamily Summary
iProClass PDB Structure 1a27
iProClass Domain Architecture
PIRSF Family Hierarchy
iProClass Taxonomy Nodes
iProClass Enzyme Function: KEGG
iProClass Pathway: KEGG
iProClass: Saving Sequences
Check the entries to keep, then save them in the selected format.
InterPro
An integrated resource of protein families, domains, repeats and sites from member databases (PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs).
Member databases represent features in different ways: some use hidden Markov models, some use position-specific scoring matrices, and some use ambiguous consensus patterns.
InterPro offers an easy way to search several libraries at once with a single query.
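Of the representations listed above, a position-specific scoring matrix (PSSM) is the simplest to sketch: each motif position has its own score for each residue, and a sequence window's score is the sum over positions. The sketch below slides a toy matrix along a sequence and reports the best-scoring window. The matrix values are invented for illustration (real matrices are typically log-odds scores estimated from aligned family members), and hidden Markov models are not covered here.

```python
# Toy PSSM: one dict per motif position, mapping residue -> score.
# Values are invented for illustration only.
TOY_PSSM = [
    {"C": 3.0},
    {"G": 2.0, "A": 1.0},
    {"H": 4.0},
]

def score_window(pssm, window, missing=-4.0):
    """Sum per-position scores; residues absent from a column score `missing`."""
    return sum(column.get(residue, missing) for column, residue in zip(pssm, window))

def best_hit(pssm, sequence, missing=-4.0):
    """Slide the matrix along the sequence; return (start, score) of the best window, or None."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = score_window(pssm, sequence[start:start + width], missing)
        if best is None or score > best[1]:
            best = (start, score)
    return best

if __name__ == "__main__":
    print(best_hit(TOY_PSSM, "AACGHAA"))
```

Unlike a consensus pattern, which either matches or not, a PSSM gives every window a graded score, so near-misses can still be ranked.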
InterPro – Searching with InterProScan
InterPro - InterProScan Results
Didn’t find what you are looking for?
Don’t despair: there are many additional information resources for bioinformatics on the web. Try Google, or look at the Nucleic Acids Research Database Special Issue: http://www.oxfordjournals.org/nar/database/search/
Finding Information Resources