![Page 1: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/1.jpg)
Challenges for metagenomic data analysis and lessons from viral metagenomes
[What would you do if sequencing were free?]
Rob Edwards
San Diego State University
Fellowship for Interpretation of Genomes
The Burnham Institute for Medical Research
![Page 2: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/2.jpg)
Outline
• How and why we sequence environments
• Viral metagenomics
– Marine stories
– Human stories
• Pyrosequencing
– Mine story
• Is there a Future?
![Page 3: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/3.jpg)
Why Metagenomics?
• What is there?
• How many are there?
• What are they doing?
![Page 4: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/4.jpg)
How do you sequence the environment?
• Extract DNA
![Page 5: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/5.jpg)
![Page 6: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/6.jpg)
![Page 7: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/7.jpg)
CsCl step gradient
Layers: 1.1 g ml$^{-1}$, 1.35 g ml$^{-1}$, 1.5 g ml$^{-1}$, 1.7 g ml$^{-1}$
![Page 8: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/8.jpg)
CsCl step gradient
![Page 9: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/9.jpg)
How do you sequence the environment?
• Extract DNA
• Create library
![Page 10: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/10.jpg)
Hydroshear
Blunt-ending
Addition of Linkers
Amplification of Fragments
Linker-Amplified Shotgun Libraries (LASLs)
This method produces high coverage libraries of over 1 million clones from as little as 1 ng DNA
Soil Extraction Kit
David Mead -
Breitbart (2002) PNAS
![Page 11: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/11.jpg)
How do you sequence the environment?
• Extract DNA
• Create library
• Sequence fragments
![Page 12: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/12.jpg)
Outline
• How and why we sequence environments
• Viral metagenomics
– Marine stories
– Human stories
• Pyrosequencing
– Mine story
• Is there a Future?
![Page 13: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/13.jpg)
Why Phages?
• Phages are viruses that infect bacteria
– 10:1 ratio of phages:bacteria
– $10^{31}$ phages on the planet
• Specific interactions (probably)
– one virus : one host
• Small genome size
– Higher coverage
• Horizontal gene transfer
– $10^{25}$-$10^{28}$ bp DNA per year in the oceans
![Page 14: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/14.jpg)
![Page 15: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/15.jpg)
Uncultured Viruses
200 liters water / 5-500 g fresh fecal matter
DNA/RNA LASL
Sequence
Epifluorescent Microscopy
Concentrate and purify viruses
Extract nucleic acids
![Page 16: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/16.jpg)
Bioinformatics
• BLAST against NR
– blastx, tblastn, tblastx
• BLAST against boutique databases
– Complete phage genomes, ACLAME, other libraries, 16S
• Parsing to present data in a useful format
![Page 17: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/17.jpg)
BLAST and Parsing
• http://phage.sdsu.edu/blast
• Submit BLAST to local and remote databases
– Local (as fast as possible)
– NCBI (one search every 3 seconds)
• Many concurrent searches
– One search versus 1,000 searches
• Parse data into tables
– Access to taxonomy etc.
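The "one search every 3 seconds" limit for remote searches can be enforced with a small throttle. The following is a minimal sketch, not the code behind phage.sdsu.edu: the `Throttle` class and its parameters are hypothetical names chosen for illustration, and the actual submission call is left to the caller.

```python
import time


class Throttle:
    """Block until at least `min_interval` seconds have passed since the last call.

    Hypothetical helper for illustration; clock and sleep are injectable
    so the behavior can be checked without really waiting.
    """

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted call

    def wait(self):
        if self._last is not None:
            remaining = self._min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

A caller would invoke `throttle.wait()` immediately before each remote submission; local searches skip the throttle and run as fast as possible.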
![Page 18: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/18.jpg)
Most Viral Genes are Unknown
Known: 22%
Unknown: 78%
TBLAST (E<0.001), 3,093 sequences
Breitbart (2002) PNAS; Rohwer (2003) Cell
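A known/unknown split like this one can be computed from tabular BLAST results. The sketch below assumes the 12-column tab-separated layout written by BLAST+ with `-outfmt 6`, where the query ID is column 1 and the E-value is column 11; that format is an assumption for illustration and is not necessarily what the original analysis used. The function name `known_fraction` is hypothetical.

```python
import csv


def known_fraction(tabular_hits, query_ids, max_evalue=1e-3):
    """Fraction of queries with at least one hit below the E-value cutoff.

    tabular_hits: iterable of tab-separated lines (assumed 12-column layout).
    query_ids:    every sequence that was searched, including those with no hits.
    """
    queries = set(query_ids)
    known = set()
    for row in csv.reader(tabular_hits, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if float(row[10]) < max_evalue:  # column 11 is the E-value
            known.add(row[0])
    return len(known & queries) / len(queries)
```

Queries with no significant hit count as "unknown", which is why the full list of query IDs is passed in separately from the hits.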
![Page 19: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/19.jpg)
60 billion base pairs
60 million sequences
GenBank has more than doubled since 2001 …
![Page 20: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/20.jpg)
GenBank has more than doubled since 2001 …
but the fraction of unknowns remains constant
Edwards (2005) Nature Rev. Microbiol.
![Page 21: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/21.jpg)
All of the new genes in the databases are
coming from environmental sequences
![Page 22: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/22.jpg)
Outline
• How and why we sequence environments
• Viral metagenomics
– Marine stories
– Human stories
• Pyrosequencing
– Mine story
• Is there a Future?
![Page 23: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/23.jpg)
Human-associated viruses
• More bacteria than somatic cells by at least an order of magnitude
• More phages than bacteria by an order of magnitude
• Sample the bacteria in the intestine by sampling their phage
![Page 24: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/24.jpg)
Most Viral DNA Sequences in Adult Human Feces are Unknown Phages
Known: 40%
Unknown: 60%
Phages: 94%
Eukaryotic viruses: 6%
TBLAST (E<0.001), 532 sequences
Breitbart (2003) J. Bacteriol.
![Page 25: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/25.jpg)
No bacteria or viruses in 1st fecal sample
Abundant bacterial and viral communities by 1 week of age
Adults Versus Babies
>$10^{8}$ VLP/g feces
![Page 26: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/26.jpg)
Baby Feces Viruses
• Most sequences are unknown (≈70%)
• Similarities to phages from Lactococcus, Lactobacillus, Listeria, Streptococcus, and other Gram positive hosts
• From microarray studies, sequences are stable in the baby over a 3 month period
• Same types of phage as present in adult feces
– one identical sequence to an unrelated adult!
![Page 27: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/27.jpg)
DNA viruses in feces are phages.
Feces ≠ intestines.
RNA viruses?
![Page 28: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/28.jpg)
Most Human RNA Viruses are Known
Known: 92%
Unknown: 8%
TBLAST (E<0.001), ≈36,000 sequences
Pepper Mild Mottle Virus: 65%
Other plant viruses: 9%
Other: 26%
Zhang (2006) PLoS Biology
![Page 29: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/29.jpg)
Pepper Mild Mottle Virus (PMMV)
• ssRNA virus; ≈6 kb genome
• Related to Tobacco Mosaic Virus
• Infects members of Capsicum family
• Widely distributed – spread through seeds
• Fruits are small, malformed, mottled
• Rod-shaped virions
TOBACCO MOSAIC VIRUS http://www.rothamsted.bbsrc.ac.uk/ppi/links/pplinks/virusems/
Viral particles in fecal sample
![Page 30: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/30.jpg)
PMMV is common in Human Feces
Fecal samples
Extract total RNA
RT-PCR for PMMV
[Gel lanes: S1 S2 S3 S4 S5 S6 S7 S8 S9 PMMV]
San Diego: 78% of people are positive
Singapore: 67% of people are positive
10-50 fold increase in feces compared to food
$10^{6}$-$10^{9}$ PMMV copies per gram dry weight of feces
![Page 31: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/31.jpg)
[Figure lane labels: Indian curry, Pork noodle, Red chili, Chicken rice, Chinese food, Hong Kong chili sauce, Hong Kong green chili, Vegetarian chili]
Which Foods Contain PMMV?
Chili powder
Chili sauces
NOT FOUND IN FRESH PEPPERS
![Page 32: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/32.jpg)
Thesunmachine.net
http://www.sweatnspice.com
Koch’s Postulates
![Page 33: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/33.jpg)
Human microbial metagenome is more
important than human genome
![Page 34: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/34.jpg)
Outline
• How and why we sequence environments
• Viral metagenomics
– Marine stories
– Human stories
• Pyrosequencing
– Mine story
• Is there a Future?
![Page 35: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/35.jpg)
How do you sequence the environment?
• Extract DNA
• Create library
• Sequence fragments
Everything so far from 40,000 sequences
This is so 2004
![Page 36: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/36.jpg)
How do you sequence the environment?
• Extract DNA
• Pyrosequence
![Page 37: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/37.jpg)
454 Pyrosequencing
• DNA extraction from environment (SDSU)
• Whole genome amplification (SDSU)
• Emulsion-based PCR (454 Inc.)
• Luciferase-based sequencing (454 Inc.)
Margulies (2005) Nature
![Page 38: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/38.jpg)
454 Sequence Data (Only from Rohwer Lab)
• 21 libraries
– 10 microbial, 11 phage
• 597,340,328 bp total
– 20% of the human genome
– 50% of all complete and partial microbial genomes
• 5,769,035 sequences
– Average 274,716 per library
• Average read length 103.5 bp
– Average read length has not increased in 7 months
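The per-library and per-read averages follow directly from the totals on this slide. A short check, using only the numbers given above (the variable names are illustrative):

```python
total_bp = 597_340_328
total_reads = 5_769_035
libraries = 21

# Average number of reads per library, rounded to the nearest read.
reads_per_library = round(total_reads / libraries)

# Average read length in bp, rounded to one decimal place.
mean_read_length = round(total_bp / total_reads, 1)

print(reads_per_library, mean_read_length)
```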
![Page 39: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/39.jpg)
Growth of sequence data
6 million reads
600 million bp
![Page 40: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/40.jpg)
Cost of sequencing
• One reaction = $10,000
• One reaction = 250,000 reads
• 250 reads = $10
• 1 read = 4¢
• 1 read = 100 bp
• 1 bp = 0.04¢ ($400 per 1x coverage of 1,000,000 bp)
• Sanger sequencing ca. $1/rxn, 0.2¢/bp
– real cost ca. $5/rxn, 1¢/bp
454 sequencing does not require cloning, arraying, etc.
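The chain of unit costs can be reproduced from the three starting figures on the slide (cost per reaction, reads per reaction, bp per read). This is only a worked restatement of the arithmetic; the variable names are illustrative.

```python
reaction_cost_usd = 10_000
reads_per_reaction = 250_000
bp_per_read = 100

# Cost of a single read, in cents.
cents_per_read = reaction_cost_usd * 100 / reads_per_reaction

# Cost of a single base pair, in cents.
cents_per_bp = cents_per_read / bp_per_read

# Cost in dollars to sequence 1,000,000 bp once (1x).
usd_per_megabase = cents_per_bp * 1_000_000 / 100

print(cents_per_read, cents_per_bp, usd_per_megabase)
```

On these figures, 454 sequencing at 0.04¢/bp is about 5 times cheaper per base than the quoted Sanger price of 0.2¢/bp, and about 25 times cheaper than the "real cost" of 1¢/bp.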
![Page 41: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/41.jpg)
Bioinformatics
• 597,340,328 bp total
• 5,769,035 sequences
• 7 months
• Existing tools are not sufficient
![Page 42: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/42.jpg)
Current Pipeline
• Dereplicate
• BLAST against
– 16S
– Complete phage
– nr (SEED)
– subsystems
http://phage.sdsu.edu/~rob/Pyrosequencing/
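The slide does not say how reads are dereplicated. As a minimal sketch, assuming that "dereplicate" means dropping reads whose sequence is an exact, case-insensitive duplicate of an earlier read, the step could look like this (the function name and record layout are hypothetical):

```python
def dereplicate(records):
    """Keep the first read seen for each distinct sequence.

    records: iterable of (read_id, sequence) pairs.
    Returns a list of (read_id, sequence) with sequences upper-cased.
    Assumption: only exact duplicates are removed; near-identical reads
    are kept.
    """
    first_seen = {}
    for read_id, sequence in records:
        # setdefault keeps the ID of the first read with this sequence.
        first_seen.setdefault(sequence.upper(), read_id)
    return [(read_id, sequence) for sequence, read_id in first_seen.items()]
```

Running this before BLAST reduces the number of searches without changing which sequences are represented.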
![Page 43: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/43.jpg)
Sequencing is cheap and easy.
Bioinformatics is neither.
![Page 44: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/44.jpg)
Outline
• How and why we sequence environments
• Viral metagenomics
– Marine stories
– Human stories
• Pyrosequencing
– Mine story
• Is there a Future?
![Page 45: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/45.jpg)
The Soudan Mine, Minnesota
Red Stuff: Oxidized
Black Stuff: Reduced
![Page 46: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/46.jpg)
Red and Black Samples Are Different
Cloned and 454 sequenced 16S are indistinguishable
[Figure labels: Black stuff, Red, Cloned Red]
![Page 47: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/47.jpg)
Annotation of metagenomes by subsystems
A subsystem is a group of genes that work together
– Metabolism
– Pathway
– Cellular structures
– Anything an annotator thinks is interesting
![Page 48: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/48.jpg)
There are different amounts of metabolism in each environment
![Page 49: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/49.jpg)
There are different amounts of substrates in each environment
[Figure labels: Black Stuff, Red Stuff]
![Page 50: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/50.jpg)
But are the differences significant?
• Sample 10,000 proteins from site 1
• Count frequency of each subsystem
• Repeat 20,000 times
• Repeat for sample 2
• Combine both samples
• Sample 10,000 proteins 20,000 times
• Build 95% CI
• Compare medians from sites 1 and 2 with 95% CI
Rodriguez-Brito (2006). In Review
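The resampling steps above can be sketched in code. This is one possible reading of the slide, not the published method: it assumes sampling with replacement, a percentile-based 95% interval built from the pooled sample, and that a subsystem counts as significantly different when either site's median count falls outside that interval. All function names are hypothetical, and each protein is represented simply by the name of its subsystem.

```python
import random
from collections import Counter
from statistics import median


def resample_counts(proteins, subsystems, sample_size, n_repeats, rng):
    """Repeatedly draw `sample_size` proteins and count each subsystem per draw."""
    per_subsystem = {s: [] for s in subsystems}
    for _ in range(n_repeats):
        drawn = Counter(rng.choices(proteins, k=sample_size))
        for s in subsystems:
            per_subsystem[s].append(drawn[s])  # Counter gives 0 if absent
    return per_subsystem


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval; the 95% bounds are the default."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


def compare_sites(site1, site2, sample_size=10_000, n_repeats=20_000, seed=0):
    """Flag subsystems whose per-site median lies outside the pooled 95% CI."""
    rng = random.Random(seed)
    subsystems = sorted(set(site1) | set(site2))
    counts1 = resample_counts(site1, subsystems, sample_size, n_repeats, rng)
    counts2 = resample_counts(site2, subsystems, sample_size, n_repeats, rng)
    pooled = resample_counts(site1 + site2, subsystems, sample_size, n_repeats, rng)
    results = {}
    for s in subsystems:
        low, high = percentile_interval(pooled[s])
        m1, m2 = median(counts1[s]), median(counts2[s])
        results[s] = {
            "median_site1": m1,
            "median_site2": m2,
            "pooled_ci": (low, high),
            "significant": not (low <= m1 <= high) or not (low <= m2 <= high),
        }
    return results
```

The defaults mirror the 10,000 proteins and 20,000 repeats on the slide; in pure Python those settings are slow, so smaller values are practical for a quick check. Note that pooling by concatenation weights each site by its number of proteins.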
![Page 51: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/51.jpg)
Examples of significantly different subsystems
Red Stuff
• Arg, Trp, His
• Ubiquinone
• FA oxidation
• Chemotaxis, Flagella
• Methylglyoxal metabolism
Black Stuff
• Ile, Leu, Val
• Siderophores
• Glycerolipids
• NiFe hydrogenase
• Phenylpropionate degradation
![Page 52: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/52.jpg)
Subsystem differences & metabolism
Iron acquisition: Black Stuff
• Siderophore enterobactin biosynthesis
• Ferric enterobactin transport
• ABC transporter ferrichrome
• ABC transporter heme
Black stuff: ferrous iron ($\mathrm{Fe^{2+}}$, ferroan [$\mathrm{(Mg,Fe)_6(Si,Al)_4O_{10}(OH)_8}$])
Red stuff: ferric iron (goethite [$\mathrm{FeO(OH)}$])
![Page 53: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/53.jpg)
Nitrification differentiates the samples
Edwards (2006) In review
![Page 54: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/54.jpg)
Not all biochemistry happens in a single organism
Anaerobic methane oxidation
Boetius et al. Nature, 2000.
$\mathrm{CH_4 + SO_4^{2-} \rightarrow HCO_3^- + HS^- + H_2O}$
Archaea: $\mathrm{CH_4 + H_2O \rightarrow HCO_3^- + OH + H_2 \rightarrow CO_2 + H_2O}$
Bacteria: $\mathrm{SO_4^{2-} + H_2O \rightarrow HS^- + OH + 2\,O_2}$
![Page 55: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/55.jpg)
The challenge is explaining the differences between samples
Red Sample
• Arg, Trp, His
• Ubiquinone
• FA oxidation
• Chemotaxis, Flagella
• Methylglyoxal metabolism
Black Sample
• Ile, Leu, Val
• Siderophores
• Glycerolipids
• NiFe hydrogenase
• Phenylpropionate degradation
![Page 56: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/56.jpg)
We are moving away from "one organism, one reaction" and towards studying the biochemistry of whole environments
Bacteria don’t live alone
![Page 57: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/57.jpg)
Summary
From 454 sequence:
– Identify microbial composition
– Identify metabolic function
– Identify statistically significant differences in metabolism
– Who, what, why of microbial ecology
![Page 58: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/58.jpg)
Sampling Sites
• Marine
– Near-shore water (~100 samples)
– Off-shore water (~50 samples)
– Near- and off-shore sediments
• Metazoan-associated
– Corals
– Fish
– Human blood
– Human stool
• Terrestrial/Soil
– Amazon rainforest
– Konza prairie
– Joshua Tree desert
– Singapore air
• Freshwater
– Aquifer
– Glacial lake
• Extreme
– Hot springs (84 °C; 78 °C)
– Soda lake (pH 13)
– Solar saltern (>35% salt)
![Page 59: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/59.jpg)
SDSU: Forest Rohwer, Mya Breitbart, Beltran Rodriguez-Brito
Rohwer Lab: Linda Wegley, Florent Angly, Matt Haynes
Also at SDSU: Anca Segall, Willow Segall, Stanley Maloy
Math Guys @ SDSU: Peter Salamon, Joe Mahaffy, James Nulton, Ben Felts, David Bangor, Steve Rayhawk, Jennifer Mueller
MIT: Ed DeLong
NSF: Biotic Surveys and Inventories; Biological Oceanography; Biocomplexity
FIG: Veronika Vonstein, Ross Overbeek, Annotators
Genome Institute of Singapore: Zhang Tao, Charlie Lee, Chia Lin Wei, Yijun Ruan
![Page 60: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/60.jpg)
![Page 61: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/61.jpg)
Viral Community Structure
• Contigs assembled from fragments with >= 98% identity over 20 bp are a resampling of a single phage genome
• Contig spectrum is the number of contigs that have one sequence, the number that have two sequences, and so on
• Use both analytical and Monte-Carlo simulations to predict community structure from contig spectrum
The Math Guys (2006) In preparation
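The definition of a contig spectrum translates directly into a counting step. A minimal sketch, assuming the assembler output has already been reduced to the number of reads in each contig (unassembled reads count as contigs of size 1); the function name is illustrative:

```python
from collections import Counter


def contig_spectrum(reads_per_contig):
    """Return [c1, c2, ...] where ck is the number of contigs made of k reads."""
    if not reads_per_contig:
        return []
    counts = Counter(reads_per_contig)
    return [counts.get(k, 0) for k in range(1, max(counts) + 1)]
```

For example, three singletons, two 2-read contigs, and one 4-read contig give the spectrum `[3, 2, 0, 1]`.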
![Page 62: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/62.jpg)
[Figure: rank-abundance curve. x-axis: Species Rank (0-50); y-axis: Abundance of the species (%) (0.00-2.00)]
Determine the actual contig spectrum of the sample
Predict a contig spectrum using a species abundance model
Compute the error between the actual and predicted
Adjust the parameters in the species abundance model to minimize errors
Continue this procedure until we obtain the smallest error
Find the smallest error, a global minimum
[Figure: Error plotted against Model parameters]
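The fitting loop above can be illustrated with a small Monte-Carlo sketch. Everything specific here is an assumption made for illustration and not taken from the talk: a power-law rank-abundance model with a single exponent, a fixed genome length of 50,000 bp, 650 bp reads, two reads joining a contig when they overlap by at least 20 bp, a squared-error measure, and a plain grid search over candidate exponents (which finds the best candidate on the grid, not a guaranteed global minimum).

```python
import random
from collections import Counter


def power_law_abundances(n_genotypes, exponent):
    """Relative abundance of rank r is proportional to r ** -exponent."""
    weights = [rank ** -exponent for rank in range(1, n_genotypes + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_contig_spectrum(abundances, n_reads, genome_len=50_000,
                             read_len=650, min_overlap=20, rng=None):
    """Place reads at random on genomes and count contigs by number of reads."""
    rng = rng or random.Random()
    starts = {}
    for g in rng.choices(range(len(abundances)), weights=abundances, k=n_reads):
        starts.setdefault(g, []).append(rng.randrange(genome_len - read_len + 1))
    spectrum = Counter()
    for positions in starts.values():
        positions.sort()
        size = 1
        for prev, cur in zip(positions, positions[1:]):
            if cur - prev <= read_len - min_overlap:
                size += 1  # enough overlap: same contig
            else:
                spectrum[size] += 1
                size = 1
        spectrum[size] += 1  # close the last contig on this genome
    return spectrum


def spectrum_error(observed, predicted):
    """Sum of squared differences between two contig spectra."""
    sizes = set(observed) | set(predicted)
    return sum((observed.get(k, 0) - predicted.get(k, 0)) ** 2 for k in sizes)


def fit_exponent(observed, n_reads, n_genotypes, candidates, n_sims=20, seed=0):
    """Return (best_exponent, error) over the candidate exponents."""
    rng = random.Random(seed)
    best = None
    for exponent in candidates:
        abundances = power_law_abundances(n_genotypes, exponent)
        mean_spectrum = Counter()
        for _ in range(n_sims):
            sim = simulate_contig_spectrum(abundances, n_reads, rng=rng)
            for size, count in sim.items():
                mean_spectrum[size] += count / n_sims
        error = spectrum_error(observed, mean_spectrum)
        if best is None or error < best[1]:
            best = (exponent, error)
    return best
```

Here `observed` is a mapping from contig size to number of contigs. A real analysis would also fit the number of genotypes and compare several abundance models, which this sketch leaves out.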
![Page 63: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/63.jpg)
Viral Communities are Extremely Diverse
Lots of rare viral genotypes
[Figure panels: Fecal, Seawater, Marine Sediments]
![Page 64: Rob Edwards San Diego State University Fellowship for Interpretation of Genomes](https://reader035.vdocuments.mx/reader035/viewer/2022062409/568145fd550346895db30b11/html5/thumbnails/64.jpg)
[Figure: Shannon-Wiener Index (axis 0-10) by community: Cropland Earthworms, Fossil Corals, Forest Amphibians, River Bacteria, Sediment Viruses, Seawater Viruses, Seawater Viruses, Fecal Viruses, Bacteria on Corals, Agriculture Soil Bacteria, Soil Nematodes, Rainforest Spiders, Amazon Fish, Rainforest Birds, Forest Mammals, Temperate Forest Beetles]
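For reference, the Shannon-Wiener index of a community is H' = -Σ pᵢ ln pᵢ, where pᵢ is the relative abundance of genotype i. A minimal sketch follows; the use of the natural logarithm is an assumption, since the slide does not state the base.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener index H' = -sum(p * ln p) over non-zero abundances."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

A community of N equally abundant genotypes has H' = ln N, so many rare genotypes at similar abundance push the index up, while a community dominated by a few genotypes scores low.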