Generic Database

Upload: noel-matthew-douglas

Post on 29-Jan-2016


Page 1: Generic Database. What should a genome database do? Search Browse Collect Download results Multiple format Genome Browser Information Genomic Proteomic

Generic Database


What should a genome database do?

Search, browse, collect

Download results in multiple formats

Genome browser

Information: genomic, proteomic, literature

Interact with other databases

Generic

Usable by everyone


GeneDB – An Overview

Aim – To provide a database to house the data from the many sequencing projects that the Sanger Institute has been involved in. The database had to be:

Generic, flexible enough to handle sequence from diverse organisms

Curatable, capable of being manually edited by annotators and curators

Intuitive and user friendly

Capable of housing new data types, easily expandable

Searchable, allow users complete flexibility in searching, selecting and downloading whatever information they want

Interactive, community feedback


GeneDB November 2004 - Datasets (www.genedb.org)

Total number of organisms: 26

Number of protozoa: 12

Kinetoplastids

Species                    Genome size (kb)   Status               Curated
Leishmania major           33600              In finishing         Yes
Leishmania infantum        33600              280k reads, 5X       Yes
Trypanosoma b. brucei      35000              In finishing         Yes
Trypanosoma vivax          30000              300k reads, ~6X      Yes
Trypanosoma cruzi          ~41000             In finishing, 19X    No?
Leishmania braziliensis    ~33600             361k reads, 5X       Yes
Trypanosoma congolense     ~30000             262k reads, ~5X      Yes
Trypanosoma b. gambiense   ~30000             188k reads, ~5X      Yes


www.genedb.org


a) Basic information – on the selected gene

b) Location – The chromosome number, coordinates, gene length and a graphical map

c) Curated and/or automatic annotation

d) Predicted peptide properties – statistics on the predicted protein, known or predicted domains and motifs


e) Gene Ontology – annotation using the GO controlled vocabulary

f) Database cross references – linked to other public databases

g) Curated orthologs – database links to manually selected orthologous genes

h) Similarity information and the respective database links

i) Swiss-Prot annotations – for this protein, and keywords

j) Contact – feedback forms for curators and technical queries


Orthologs and Paralogs in GeneDB

Tri-tryp orthologs: predicted by clustering and reciprocal BLAST

Paralogs or families: predicted using BLASTP and TribeMCL, with 4 BLAST E-value cutoffs

TribeMCL: Enright A.J., Van Dongen S., Ouzounis C.A. Nucleic Acids Res. 30(7):1575-1584 (2002)
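
The reciprocal BLAST step mentioned above can be illustrated with a minimal sketch. It covers only the reciprocal-best-hit idea, not the clustering step and not GeneDB's actual pipeline. The function names, the (query, subject, bit score) tuple layout and the gene IDs are all hypothetical, chosen for illustration.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples
    (an assumed layout, e.g. parsed from tabular BLAST output).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical gene IDs: genome A searched against genome B, and the reverse.
a_vs_b = [("Lm1", "Tb1", 250.0), ("Lm1", "Tb2", 90.0),
          ("Lm2", "Tb2", 180.0), ("Lm3", "Tb1", 120.0)]
b_vs_a = [("Tb1", "Lm1", 245.0), ("Tb2", "Lm2", 175.0), ("Tb2", "Lm1", 88.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, Lm3 is not paired because its best hit, Tb1, has Lm1 as its own best hit.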


Help


(http://godatabase.org/cgi-bin/go.cgi?query=GO%3A0006166)


Sequence viewer and annotation tool


How to access data:

• keyword searching

• sequence searching / motif search

• complex querying

• browsable catalogues (product, domain)

• browsable contig/chromosome maps

• GO (Gene Ontology) – AmiGO

• across species


Searching GeneDB

Simple Query

Sequence search/analysis

Browse Catalogues


Chromosome/contig maps


OMNIBLAST

Searches multiple datasets over multiple organisms, using more than one BLAST algorithm if appropriate

Produces an intermediate results page listing a summary of the top 5 hits of all searches

If a protein sequence is used, also displays the predicted Pfam protein families found

The full BLAST search result can be accessed from the intermediate page
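
The intermediate results page described for OMNIBLAST, a summary of the top 5 hits per search, could be sketched roughly as follows. This is not OMNIBLAST's code: the `Hit` type, the search labels and the choice of ranking by bit score are assumptions made for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str     # ID of the matched sequence (hypothetical)
    evalue: float    # BLAST E-value
    bitscore: float  # BLAST bit score


def summarize_top_hits(searches, n=5):
    """For each labelled search, keep the n highest-scoring hits.

    `searches` maps a label such as "organism/dataset/program"
    (an assumed naming scheme) to a list of Hit records.
    """
    return {
        label: sorted(hits, key=lambda h: h.bitscore, reverse=True)[:n]
        for label, hits in searches.items()
    }


# Toy input: one search with seven hits, one with two.
searches = {
    "orgA/proteins/blastp": [Hit(f"A{i}", 10.0 ** -i, 20.0 * i) for i in range(1, 8)],
    "orgB/contigs/tblastn": [Hit("B1", 1e-30, 120.0), Hit("B2", 1e-5, 45.0)],
}

summary = summarize_top_hits(searches)
for label, hits in summary.items():
    print(label, [h.subject for h in hits])
```

Keeping the full hit lists alongside the summary would let each summary row link back to the complete BLAST result, as the slide describes.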


Complex querying


Complex querying with boolean search tool
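
One way to picture a boolean search tool is as set operations over the gene lists returned by simpler queries. The sketch below assumes that model. The operator names, the gene IDs and the `history` list are hypothetical and are not taken from GeneDB.

```python
def combine(left, op, right):
    """Combine two sets of gene IDs with a boolean operator."""
    if op == "AND":
        return left & right   # genes present in both result lists
    if op == "OR":
        return left | right   # genes present in either result list
    if op == "NOT":
        return left - right   # genes in the first list but not the second
    raise ValueError(f"unknown operator: {op}")


# Hypothetical results of three simple queries.
by_product = {"gene1", "gene2", "gene3"}   # e.g. product contains "transporter"
by_domain = {"gene2", "gene3", "gene4"}    # e.g. has a given protein domain
pseudogenes = {"gene3"}

history = []  # each step is recorded so earlier results can be reused or downloaded

step1 = combine(by_product, "AND", by_domain)
history.append(("by_product AND by_domain", step1))

step2 = combine(step1, "NOT", pseudogenes)
history.append(("step1 NOT pseudogenes", step2))

print(sorted(step2))
```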


Cross species search for nucleoside transporter

By name or ID

By product

By protein domain


AmiGO – local Gene Ontology (GO) browser


Proteomics Tool

Select the dataset

Select restriction enzyme

Enter peptide mass data
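
A tool of this kind is commonly built on peptide mass fingerprinting: proteins are digested in silico and the resulting peptide masses are compared with the observed ones. The sketch below assumes that approach. It also assumes the enzyme choice is a cleavage rule (trypsin here: cut after K or R unless followed by P). The function names and tolerance are hypothetical, and none of this is GeneDB's implementation. It needs Python 3.7 or later, because it splits on a zero-width pattern.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def digest_trypsin(protein):
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(MONO[residue] for residue in peptide) + WATER


def match_masses(observed, protein, tolerance=0.01):
    """Return digested peptides whose mass is within tolerance of an observed mass."""
    return [
        peptide
        for peptide in digest_trypsin(protein)
        if any(abs(peptide_mass(peptide) - mass) <= tolerance for mass in observed)
    ]


print(digest_trypsin("MAGKRPLKSER"))
print(match_masses([203.13, 400.0], "GKAAR"))
```

A real search would repeat this over every protein in the selected dataset and rank proteins by the number of matched peptides.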


Protein motif search


Data downloads

Any search result that gives a list

History of any boolean queries


Contiguous sequence

Generate download list by adding to gene basket


Leishmania major stats

Trypanosoma brucei stats


Gene Naming


More information

• GeneDB reference guide

• Papers: Trends in Parasitology, 2002, 18(10):465-67; January 2004 issue of Nucleic Acids Research

• Feedback forms for technical and biological queries

• http://www.genedb.org/
