NCBI FieldGuide, September 29, 2004, ICGEB. NCBI Molecular Biology Resources: A Field Guide, Part 1



September 29, 2004 ICGEB

NCBI Molecular Biology Resources

A Field Guide, Part 1


Types of Databases

• Primary Databases
  – Original submissions by experimentalists
  – Database staff review and may organize the data, but we don’t add or modify information
  – Records are “owned” and updated by their authors
  – Examples: GenBank, SNP, GEO

• Derivative Databases
  – Human-curated (compilation and correction of data)
    Examples: Gene (LocusLink), Structure & Literature databases
  – Computationally derived
    Example: UniGene
  – Combination
    Examples: RefSeq, Genome Assembly, Domain databases


NCBI’s Derivative Sequence Database

genomes transcripts proteins

GenBank


– Forming the “best representative” sequence
– Standardizing nomenclature and record structure
– Adding annotation (references, sequence features)

RELEASE 6 IS NOW AVAILABLE ON THE FTP SITE!


RefSeq Curation Processes

Curated genomic DNA (NC, NT, NW)

Curated model mRNA (XM, XR)

Curated mRNA (NM, NR)

Model protein (XP)

Protein (NP)

Scanning...
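Because each RefSeq record class has its own accession prefix, the record type can be read straight off an accession. The sketch below covers only the prefixes named on this slide; the function name `classify_refseq` is hypothetical, and the "non-coding RNA" labels for NR and XR are a gloss added here rather than wording from the slide.

```python
import re
from typing import Optional, Tuple

# Prefix -> record type, following the groupings on the RefSeq curation slide.
# The "non-coding RNA" wording for NR/XR is an added gloss.
REFSEQ_PREFIXES = {
    "NC": "curated genomic DNA",
    "NT": "curated genomic DNA",
    "NW": "curated genomic DNA",
    "NM": "curated mRNA",
    "NR": "curated non-coding RNA",
    "XM": "model mRNA",
    "XR": "model non-coding RNA",
    "XP": "model protein",
    "NP": "protein",
}

# Two capital letters, an underscore, digits, and an optional ".version" suffix.
_REFSEQ_ACCESSION = re.compile(r"([A-Z]{2})_(\d+)(?:\.(\d+))?")


def classify_refseq(accession: str) -> Tuple[str, str, Optional[int]]:
    """Return (prefix, record type, version or None) for a RefSeq accession."""
    match = _REFSEQ_ACCESSION.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix, _number, version = match.groups()
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"prefix {prefix!r} is not covered by this table")
    return prefix, REFSEQ_PREFIXES[prefix], int(version) if version else None


print(classify_refseq("NC_000913.1"))  # ('NC', 'curated genomic DNA', 1)
print(classify_refseq("NP_418591.1"))  # ('NP', 'protein', 1)
```

Accessions with prefixes outside this slide's table raise `ValueError` instead of being guessed at.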


LOCUS       NC_000913            4639221 bp    DNA     circular BCT 30-JUL-2003
DEFINITION  Escherichia coli K12, complete genome.
ACCESSION   NC_000913
VERSION     NC_000913.1  GI:16127994
KEYWORDS    .
SOURCE      Escherichia coli K12.
  ORGANISM  Escherichia coli K12
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Escherichia.
REFERENCE   1  (bases 1 to 4639221)
  AUTHORS   Blattner,F.R., Plunkett,G. III, Bloch,C.A., Perna,N.T., Burland,V.,
            Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F.,
            Gregor,J., Davis,N.W., Kirkpatrick,H.A., Goeden,M.A., Rose,D.J.,
            Mau,R. and Shao,Y.
  TITLE     The complete genome sequence of Escherichia coli K12.
  JOURNAL   Science 277 (5331), 1453-1474 (1997)
  MEDLINE   97426617
   PUBMED   9278503
REFERENCE   2  (bases 1 to 4639221)
  AUTHORS   Blattner,F.R.
  TITLE     Direct submission
  JOURNAL   Submitted (16-JAN-1997) Guy Plunkett III, Laboratory of Genetics,
            University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
            E-mail [email protected] Phone: 608-262-2543 Fax:

RefSeq Chromosomes: NC_

     gene            3954631..3956478
                     /gene="mutL"
                     /locus_tag="b4170"
                     /note="synonym: mut-25"
     CDS             3954631..3956478
                     /gene="mutL"
                     /locus_tag="b4170"
                     /function="methyl-directed mismatch repair"
                     /codon_start=1
                     /transl_table=11
                     /product="MutL"
                     /protein_id="NP_418591.1"
                     /db_xref="GI:16131992"
                     /translation="MPIQVLPPQLANQIAAGEVVERPASVVKELVENSLDAGATRIDI
                     DIERGGAKLIRIRDNGCGIKKDELALALARHATSKIASLDDLEAIISLGFRGEALASI
                     SSVSRLTLTSRTAEQQEAWQAYAEGRDMNVTVKPAAHPVGTTLEVLDLFYNTPARRKF
                     LRTEKTEFNHIDEIIRRIALARFDVTINLSHNGKIVRQYRAVPEGGQKERRLGAICGT
                     AFLEQALAIEWQHGDLTLRGWVADPNHTTPALAEIQYCYVNGRMMRDRLINHAIRQAC
                     EDKLGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQSRLVHDFIYQGVLSVLQQQL
                     ETPLPLDDEPQPAPRSIPENRVAAGRNHFAEPAAREPVAPRYTPAPASGSRPAAPWPN
                     AQPGYQKQQGEVYRQLLQTPAPMQKLKAPEPQEPALAANSQSFGRVLTIVHSDCALLE
                     RDGNISLLSLPVAERWLRQAQLTPGEAPVCAQPLLIPLRLKVSAEEKSALEKAQSALA
                     ELGIDFQSDAQHVTIRAVPLPLRQQNLQILIPELIGYLAKQSVFEPGNIAQWIARNLM
                     SEHAQWSMAQAITLLADVERLCPQLVKTPPGGLLQSVDLHPAIKALKDE"

Annotation of gene, CDS, and other features

BASE COUNT   978672 a  1011074 c   997153 g   974742 t
ORIGIN
        1 cgtcttcatt gtcagacagc agaatttgta cgcgctgttc ggcttgttgt aatttggcct
       61 gcccctgacg tgccagctgc acgccgcgtt cgaactcgtt cagcgcctct tccagcggca
      121 ggtcgccact ttccagacgg gttacaatct gttccagctc gctcagcgcc ttttcaaagc
      181 tggcgggcgc ctcatttttc ttcggcataa tgaatgtctg actctcaata tttttcgccc
      241 cgtcatggta acggactcag ggcaaatagc aaataacgcg caatggtaag gtgatgtgca
      301 cagcaaagcg atgttagtgg tatacttccg cgcctggatg cagccgcagg tgtgggctgc
      361 tgtatttttc cctatacaag tcgcttaagg cttgccaacg aaccattgcc gccatgaagt
      421 ttatcattaa attgttcccg gaaatcacca tcaaaagcca atctgtgcgc ttgcgcttta
      481 taaaaatcct taccgggaac attcgtaacg ttttaaagca ctatgatgag acgctcgctg
      541 tcgtccgcca ctgggataac atcgaagttc gcgcaaaaga tgaaaaccag cgtctggcta
      601 ttcgcgacgc tctgacccgt attccgggta tccaccatat tctcgaagtc gaagacgtgc
      661 cgtttaccga catgcacgat attttcgaga aagcgttggt tcagtatcgc gatcagctgg
      721 aaggcaaaac cttctgcgta cgcgtgaagc gccgtggcaa acatgatttt agctcgattg
      781 atgtggaacg ttacgtcggc ggcggtttaa atcagcatat tgaatccgcg cgcgtgaagc
      841 tgaccaatcc ggatgtgact gtccatctgg aagtggaaga cgatcgtctc ctgctgatta
      901 aaggccgcta cgaaggtatt ggcggtttcc cgatcggcac ccaggaagat gtgctgtcgc
      961 tcatttccgg tggtttcgac tccggtgttt ccagttatat gttgatgcgt cgcggctgcc

Genome sequence
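In the ORIGIN section of a flatfile like the one above, each line starts with the position of its first base, followed by blocks of ten bases. A minimal sketch of turning those lines back into a single sequence and tallying it the way the BASE COUNT line does; the function names are hypothetical, and the input is assumed to be well-formed GenBank-style text.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_origin(lines: Iterable[str]) -> str:
    """Join the ORIGIN section of a GenBank flatfile into one lowercase sequence.

    Each data line starts with the 1-based position of its first base, followed
    by blocks of bases separated by spaces. The ORIGIN keyword line, blank lines,
    and the '//' terminator are skipped.
    """
    chunks = []
    length = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("ORIGIN") or line == "//":
            continue
        position, _, bases = line.partition(" ")
        # Sanity check: the leading number should be one past the bases read so far.
        if int(position) != length + 1:
            raise ValueError(f"line starts at {position}, expected {length + 1}")
        block = bases.replace(" ", "").lower()
        chunks.append(block)
        length += len(block)
    return "".join(chunks)


def base_count(sequence: str) -> Dict[str, int]:
    """Tally a, c, g, t in the order used by the BASE COUNT line."""
    counts = Counter(sequence)
    return {base: counts[base] for base in "acgt"}


# The first two ORIGIN lines of the record above.
origin = [
    "ORIGIN",
    "        1 cgtcttcatt gtcagacagc agaatttgta cgcgctgttc ggcttgttgt aatttggcct",
    "       61 gcccctgacg tgccagctgc acgccgcgtt cgaactcgtt cagcgcctct tccagcggca",
    "//",
]
seq = parse_origin(origin)
print(len(seq))  # 120
print(base_count(seq))
```

The position check catches dropped or duplicated lines, which are a common side effect of copying sequence text out of slides or web pages.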


RefSeq: NCBI’s Derivative Sequence Database

RefSeq Benefits

• Non-redundant
• Explicitly linked nucleotide and protein sequences
• Updated to reflect current sequence data and biology
• Validated by hand
• Format consistency
• Distinct accession series
• Stewardship by NCBI staff and collaborators

ftp://ftp.ncbi.nih.gov/refseq/release


Genes: The Gene Summary Database

Summary pages of curated information about genetic loci for organisms in the RefSeq project.

► Graphics
► Gene information
► Bibliography (PubMed links)
► General gene information
► NCBI Reference Sequences
► Related sequences
► Additional Links

Announcing!


Entrez Gene
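Entrez Gene can also be searched from a script. The sketch below assumes today's NCBI E-utilities service (the `esearch.fcgi` endpoint under https://eutils.ncbi.nlm.nih.gov/entrez/eutils/), which these 2004 slides do not describe. The helper name `esearch_url` is hypothetical, and the `[sym]` and `[orgn]` field tags are assumed to be the Gene database's symbol and organism fields. The code only builds the query URL and never contacts NCBI.

```python
from urllib.parse import urlencode

# Assumption: present-day E-utilities base URL (not taken from these slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not fetch) an ESearch URL for an Entrez database."""
    # urlencode percent-escapes the brackets and turns spaces into '+'.
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}esearch.fcgi?{query}"


url = esearch_url("gene", "PRNP[sym] AND human[orgn]")
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) should return an XML list of matching Gene IDs. Check NCBI's current E-utilities usage guidelines before sending requests.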


NM/NP Records in Entrez Gene


UniGene

• Records are clusters of mRNAs and ESTs; ideally, each cluster represents a single gene

• Records are created automatically by a modified BLAST algorithm

• UniGene provides a means to identify an EST or unannotated mRNA

Clustering Expressed Sequences
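The idea of grouping overlapping expressed sequences can be illustrated with single-linkage clustering: if A overlaps B and B overlaps C, all three land in one cluster even when A and C share nothing. This is a toy sketch, not UniGene's procedure. Here "overlap" is replaced by a simple shared exact k-mer test instead of the modified BLAST comparison the slide mentions, and the function names, `k`, and the example reads are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def shares_kmer(a: str, b: str, k: int) -> bool:
    """True if a and b contain at least one identical substring of length k."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))


def single_linkage_clusters(seqs: Dict[str, str], k: int = 8) -> List[Set[str]]:
    """Group sequences so that any two linked by a shared k-mer end up together."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Walk to the cluster root, halving the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # All-against-all comparison: fine for a toy, far too slow for real EST sets.
    for a, b in combinations(seqs, 2):
        if shares_kmer(seqs[a], seqs[b], k):
            parent[find(a)] = find(b)

    clusters: Dict[str, Set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Hypothetical toy reads: est1 and est2 each overlap mrna1 but not each other.
reads = {
    "mrna1": "ATGGCGTACGTTAGCCTAGGCTAA",
    "est1": "GTACGTTAGCCTAGG",
    "est2": "CCTAGGCTAATTTGGG",
    "est3": "TTTTCCCCGGGGAAAA",
}
print(sorted(sorted(c) for c in single_linkage_clusters(reads)))
# [['est1', 'est2', 'mrna1'], ['est3']]
```

The transitive joining of est1 and est2 through mrna1 is exactly what lets a cluster represent a whole transcript. The same property is also why chimeric or repeat-containing reads can wrongly merge two genes into one cluster.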

A Cluster of ESTs:

Arabidopsis serine protease (query), with 5’ EST hits and 3’ EST hits

Sequence & Expression


UniGene Collections as of July 2004

Chordata
– Mammalia: Bos taurus (cow), Canis familiaris (dog), Homo sapiens (human), Mus musculus (mouse), Ovis aries (sheep), Rattus norvegicus (rat), Sus scrofa (pig)
– Aves: Gallus gallus (chicken)
– Amphibia: Xenopus laevis (African clawed frog), Xenopus tropicalis (western clawed frog)
– Actinopterygii: Danio rerio (zebrafish), Oncorhynchus mykiss (rainbow trout), Oryzias latipes (Japanese rice fish), Salmo salar (salmon)
– Ascidiacea: Ciona intestinalis (sea squirt)

Arthropoda
– Insecta: Anopheles gambiae (malaria mosquito), Apis mellifera (honeybee), Drosophila melanogaster (fruit fly), Bombyx mori (silkworm)

Echinodermata
– Echinoidea: Strongylocentrotus purpuratus

Nematoda
– Chromadorea: Caenorhabditis elegans

Platyhelminthes
– Trematoda: Schistosoma mansoni

Mycetozoa
– Dictyosteliida: Dictyostelium discoideum (slime mold)

Apicomplexa
– Coccidia: Toxoplasma gondii

Chlorophyta
– Chlorophyceae: Chlamydomonas reinhardtii

Embryophyta
– Cycadopsida: Pinus taeda (loblolly pine)
– Bryopsida: Physcomitrella patens
– Eudicotyledons: Arabidopsis thaliana (thale cress), Glycine max (soybean), Helianthus annuus (sunflower), Lactuca sativa (lettuce), Lotus corniculatus (lotus flower), Lycopersicon esculentum (tomato), Malus x domestica (apple), Medicago truncatula (barrel medic), Populus tremula/tremuloides (poplar), Solanum tuberosum (potato), Vitis vinifera (wine grape)
– Liliopsida: Hordeum vulgare (barley), Oryza sativa (rice), Saccharum officinarum (noble cane), Sorghum bicolor (sorghum), Triticum aestivum (bread wheat), Zea mays (corn)


Finding UniGene Clusters

by link

by Entrez search


UniGene Cluster for PRNP

Complete Genomes

as of June 2004

Organelles:

– Mitochondria (558)

– Plastids (40)

– Plasmids (626)

– Nucleomorphs (3)

Viruses (1923)

Archaebacteria (44)

Eubacteria (176)

Eukaryotes (61)


Simple Genomes

• Full chromosomal sequences are provided

• Genes are annotated

• The annotation can be shown graphically and linked to sequence records


mutL
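The mutL CDS in the E. coli record shown earlier is annotated at 3954631..3956478. GenBank locations are 1-based with both ends inclusive, so the span is 3956478 − 3954631 + 1 = 1848 nt, or 616 codons. Dropping the stop codon leaves 615 amino acids, which matches the length of the /translation string in that record. A minimal sketch of the arithmetic, assuming a single-span, forward-strand CDS (no `complement()` or `join()`); the helper names are hypothetical.

```python
def location_to_slice(start: int, end: int) -> slice:
    """Map a GenBank location start..end (1-based, both ends inclusive) to a 0-based Python slice."""
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    return slice(start - 1, end)


def cds_protein_length(start: int, end: int, includes_stop: bool = True) -> int:
    """Amino-acid count for a single-span, forward-strand CDS at start..end."""
    nucleotides = end - start + 1
    if nucleotides % 3:
        raise ValueError(f"CDS length {nucleotides} is not a multiple of 3")
    codons = nucleotides // 3
    # The final codon of a complete CDS is a stop codon, which is not translated.
    return codons - 1 if includes_stop else codons


mutl = location_to_slice(3954631, 3956478)
print(mutl)                                  # slice(3954630, 3956478, None)
print(mutl.stop - mutl.start)                # 1848
print(cds_protein_length(3954631, 3956478))  # 615
```

Given the full genome as a Python string (a hypothetical `genome` variable), `genome[mutl]` would yield the mutL coding sequence. Getting the off-by-one right here is the usual stumbling block when moving between GenBank coordinates and 0-based code.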


Complex Genomes

• Sequences are provided complete, or we help assemble them

• Heavy annotation: genes, transcript regions & ORFs, sequence variations & markers, clones, ESTs, etc.

• The annotation can be shown graphically and linked to other databases using the Map Viewer


Viewing Complex Genomes

• Map Viewer Home Page
  – Shows all supported organisms
  – Provides links to genomic BLAST

• Genome Overview Page
  – Provides links to individual chromosomes
  – Shows hits on a genome graphically

• Chromosome Viewing Page
  – Allows interactive views of annotation details
  – Provides numerous maps unique to each genome

NCBI Map Viewer

Map Viewer Home Page


Genome Overview Page

Genomic BLAST

Species-specific help!

Search the maps


PRNP

Search For Human PRNP

Human PRNP on Genome View


Chromosome Viewing Page

Master map with exploded content

Genes

UniGene

Clone

Add or remove maps

Zooming controls

Map Summary

Zooming in…

Left click


Map Viewer Analysis Tools

Link to OMIM

Link to Protein

Evidence Viewer

HomoloGene

Sequence Viewer

Download Sequence

ModelMaker

HomoloGene

Homology Comparisons on Map Viewer


Intermission