Annotating Metagenomes Using the SEED
Rob Edwards
Department of Computer Sciences, San Diego State University
Mathematics and Computer Sciences Division, Argonne National Laboratory
NSF/EU Cyberinfrastructure Meeting, Washington, DC.
www.nmpdr.org www.theseed.org
How much has been sequenced?
[Figure: number of known sequences by year. Marked points: first bacterial genome; 100 bacterial genomes; 1,000 bacterial genomes; environmental sequencing.]
How much will be sequenced?
[Figure labels: everybody in San Diego; everybody in USA; all cultured Bacteria; 100 people; one genome from every species; most major microbial environments.]
What do we want from annotations?
• Consistent
• Accurate
• Available
• Reliable
Consistent
The Importance of Consistency
• Consistency: same genes connected to same functional role
• Enables communication
• Required for most comparative genomics assays
hisA
FIG function:
Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC 5.3.1.16)
Other functions in RefSeq:
• phosphoribosylformimino-5-aminoimidazole carboxamide
• phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase
• phosphoribosylformimino-5-aminoimidazole carboxamide ribotide...
• 1-(5-phosphoribosyl)-5-[(5- phosphoribosylamino)methylideneamino] imidazole-4-carboxamide isomerase
• N-(5-phospho-L-ribosyl-formimino)-5-amino-1-(5- phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1-(5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'- phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
• N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4- imidazolecarboxamide isomerase
• Phosphoribosyl isomerase A [1-[5-phosphoribosyl]-5-[[5-phosphoribosylamino]methylideneamino] imidazole-4-carboxamide isomerase]
Measuring Consistency
• Define a set of protein families such that each family contains genes playing the same function
• Attach functional roles to protein families
• Measure the consistency of the annotations made to genes within each family
1. "Consistency" is the odds that two proteins from the same family have the same function
2. Evaluate both families and functions.
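One way to make the measure above concrete, offered as a minimal sketch rather than the SEED implementation: read "odds" as the probability that two proteins drawn from the same family carry the same function string, and compute it as the fraction of unordered protein pairs that agree. The function names `family_consistency` and `database_consistency`, the exact string comparison, and the pooling of pairs across families are all assumptions made for illustration.

```python
from collections import Counter
from math import comb


def family_consistency(functions):
    """Fraction of unordered protein pairs in one family that share a function.

    `functions` is a list with one annotation string per protein.
    Returns None when the family has fewer than two proteins.
    """
    n = len(functions)
    if n < 2:
        return None
    # Each function assigned to c proteins contributes C(c, 2) agreeing pairs.
    same_pairs = sum(comb(c, 2) for c in Counter(functions).values())
    return same_pairs / comb(n, 2)


def database_consistency(families):
    """Pool agreeing pairs over all families (one possible aggregate; an assumption).

    `families` maps a family name to its list of annotation strings.
    """
    same = total = 0
    for functions in families.values():
        same += sum(comb(c, 2) for c in Counter(functions).values())
        total += comb(len(functions), 2)
    return same / total if total else None


# Hypothetical toy data: three proteins agree, one uses a different name.
print(family_consistency(["A", "A", "A", "B"]))  # 0.5
```

Because the comparison is on exact strings, spelling variants of one function (as in the hisA example) count as disagreements, which is the effect the slide is describing.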
Consistency among databases
Accurate
How to measure accuracy
• If everything were called “hypothetical protein”, the database would be 100% consistent
• Need to measure accuracy (specificity) as well as consistency
• Sample 100 proteins at random from “curated” set (i.e. that are believed to be correct)
• Manually inspect annotations to score correctness
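The sampling procedure above can be sketched as follows. This is an illustration under stated assumptions, not the procedure's actual tooling: `sample_for_review` and `accuracy` are hypothetical helper names, protein identifiers are plain strings, and the manual inspection step is represented by a dictionary of True/False scores that a curator would fill in.

```python
import random


def sample_for_review(curated_ids, k=100, seed=None):
    """Draw up to k distinct protein ids at random from the curated set."""
    ids = list(curated_ids)
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(ids, min(k, len(ids)))


def accuracy(scores):
    """Fraction of reviewed annotations judged correct.

    `scores` maps protein id -> True (correct) or False (incorrect),
    as recorded by a human reviewer.
    """
    if not scores:
        raise ValueError("no reviewed annotations to score")
    return sum(scores.values()) / len(scores)


# Hypothetical usage: 500 curated proteins, 100 sampled for inspection.
curated = [f"protein_{i}" for i in range(500)]
to_review = sample_for_review(curated, k=100, seed=1)
print(len(to_review))  # 100
```

Sampling without replacement keeps each protein from being scored twice; the accuracy value is only as good as the manual scores supplied to it.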
Available
http://metagenomics.theseed.org
• Free service
• User registration/log in
• Free to upload sequences in several formats
• Automatically annotates sequences
• Download in several formats
Complete genomes too: http://www.nmpdr.org/anno-server
Soon to come: plasmids, phages, other short genomes
Metagenome Metabolic Reconstruction
Metabolic potential in environments
Phylogenomics
Comparing Metagenomes to Genomes (or other metagenomes!)
Reliable (Believable)
Metabolic potential in environments
[Figure: axes CDA 60.2% and CDA 21.7%. Functional categories: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA. Environments: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater.]
From sequences to environments
What do we want from annotations?
• Consistent
• Accurate
• Available
• Reliable
When do we want it?
NOW
Acknowledgements
Environmental Genomics
• Forest Rohwer
• Rohwer lab members
• All the labs that provided sequence
Metagenomics Annotation Server
• Rick Stevens
• Daniel Paarman
• Folker Meyer
• Bob Olsen
Statistics
• Liz Dinsdale
• Dana Hall
• Beltran Rodriguez-Brito
FIG
• Ross Overbeek
• Veronika Vonstein
• Annotators
Subsystems make up metabolism
Wikipedia, Metabolism: http://en.wikipedia.org/wiki/Portal:Metabolism