research article virus-host and crispr dynamics in archaea

13
Hindawi Publishing Corporation Archaea Volume 2013, Article ID 370871, 12 pages http://dx.doi.org/10.1155/2013/370871 Research Article Virus-Host and CRISPR Dynamics in Archaea-Dominated Hypersaline Lake Tyrrell, Victoria, Australia Joanne B. Emerson, 1,2 Karen Andrade, 3 Brian C. Thomas, 1 Anders Norman, 1,4 Eric E. Allen, 5,6 Karla B. Heidelberg, 7 and Jillian F. Banfield 1,3 1 Department of Earth and Planetary Science, University of California, Berkeley, 307 McCone Hall, Berkeley, CA 94720-4767, USA 2 Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, CO, USA 3 Department of Environmental Science, Policy, and Management, University of California, Berkeley, 54 Mulford Hall, Berkeley, CA 94720, USA 4 Department of Biology, University of Copenhagen, Copenhagen, Denmark 5 Marine Biology Research Division, Scripps Institution of Oceanography, La Jolla, CA, USA 6 Division of Biological Sciences, University of California, San Diego, La Jolla, CA 92093-0202, USA 7 Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA Correspondence should be addressed to Joanne B. Emerson; [email protected] Received 7 December 2012; Revised 17 May 2013; Accepted 27 May 2013 Academic Editor: Yoshizumi Ishino Copyright © 2013 Joanne B. Emerson et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. e study of natural archaeal assemblages requires community context, namely, a concurrent assessment of the dynamics of archaeal, bacterial, and viral populations. Here, we use filter size-resolved metagenomic analyses to report the dynamics of 101 archaeal and bacterial OTUs and 140 viral populations across 17 samples collected over different timescales from 2007–2010 from Australian hypersaline Lake Tyrrell (LT). All samples were dominated by Archaea (75–95%). Archaeal, bacterial, and viral populations were found to be dynamic on timescales of months to years, and different viral assemblages were present in planktonic, relative to host-associated (active and provirus) size fractions. Analyses of clustered regularly interspaced short palindromic repeat (CRISPR) regions indicate that both rare and abundant viruses were targeted, primarily by lower abundance hosts. Although very few spacers had hits to the NCBI nr database or to the 140 LT viral populations, 21% had hits to unassembled LT viral concentrate reads. is suggests local adaptation to LT-specific viruses and/or undersampling of haloviral assemblages in public databases, along with successful CRISPR-mediated maintenance of viral populations at abundances low enough to preclude genomic assembly. is is the first metagenomic report evaluating widespread archaeal dynamics at the population level on short timescales in a hypersaline system. 1. Introduction As the most abundant and ubiquitous biological entities, viruses influence host mortality and community structure, food web dynamics, and geochemical cycles [1, 2]. In order to better characterize the potential influence that viruses have on archaeal evolution and ecology, it is important to under- stand the coupled dynamics of viruses and their archaeal hosts in natural systems. Although previous studies have demonstrated dynamics in virus-host populations, most of these studies have focused on bacterial hosts, oſten restricted to targeted groups of virus-host pairs, and little is known about archaeal virus-host dynamics in natural systems. Community-scale virus-host analyses have oſten been based on low-resolution measurements of the whole com- munity, relying on techniques such as denaturing gradient gel electrophoresis (DGGE), pulsed-field gel electrophoresis (PFGE), and microscopic counts (e.g., [35]). One excep- tion is a study that examined viral and microbial dynam- ics through single read-based metagenomic analyses in

Upload: others

Post on 16-Oct-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Hindawi Publishing CorporationArchaeaVolume 2013 Article ID 370871 12 pageshttpdxdoiorg1011552013370871

Research ArticleVirus-Host and CRISPR Dynamics in Archaea-DominatedHypersaline Lake Tyrrell Victoria Australia

Joanne B Emerson12 Karen Andrade3 Brian C Thomas1 Anders Norman14

Eric E Allen56 Karla B Heidelberg7 and Jillian F Banfield13

1 Department of Earth and Planetary Science University of California Berkeley 307 McCone Hall Berkeley CA 94720-4767 USA2Cooperative Institute for Research in Environmental Sciences University of Colorado Boulder CO USA3Department of Environmental Science Policy and Management University of California Berkeley 54 Mulford HallBerkeley CA 94720 USA

4Department of Biology University of Copenhagen Copenhagen Denmark5Marine Biology Research Division Scripps Institution of Oceanography La Jolla CA USA6Division of Biological Sciences University of California San Diego La Jolla CA 92093-0202 USA7Department of Biological Sciences University of Southern California Los Angeles CA 90089 USA

Correspondence should be addressed to Joanne B Emerson jemersonberkeleyedu

Received 7 December 2012 Revised 17 May 2013 Accepted 27 May 2013

Academic Editor Yoshizumi Ishino

Copyright copy 2013 Joanne B Emerson et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

The study of natural archaeal assemblages requires community context namely a concurrent assessment of the dynamics ofarchaeal bacterial and viral populations Here we use filter size-resolved metagenomic analyses to report the dynamics of 101archaeal and bacterial OTUs and 140 viral populations across 17 samples collected over different timescales from 2007ndash2010from Australian hypersaline Lake Tyrrell (LT) All samples were dominated by Archaea (75ndash95) Archaeal bacterial and viralpopulations were found to be dynamic on timescales of months to years and different viral assemblages were present in planktonicrelative to host-associated (active and provirus) size fractions Analyses of clustered regularly interspaced short palindromic repeat(CRISPR) regions indicate that both rare and abundant viruses were targeted primarily by lower abundance hosts Although veryfew spacers had hits to the NCBI nr database or to the 140 LT viral populations 21 had hits to unassembled LT viral concentratereadsThis suggests local adaptation to LT-specific viruses andor undersampling of haloviral assemblages in public databases alongwith successful CRISPR-mediated maintenance of viral populations at abundances low enough to preclude genomic assemblyThisis the firstmetagenomic report evaluating widespread archaeal dynamics at the population level on short timescales in a hypersalinesystem

1 Introduction

As the most abundant and ubiquitous biological entitiesviruses influence host mortality and community structurefood web dynamics and geochemical cycles [1 2] In orderto better characterize the potential influence that viruses haveon archaeal evolution and ecology it is important to under-stand the coupled dynamics of viruses and their archaealhosts in natural systems Although previous studies havedemonstrated dynamics in virus-host populations most of

these studies have focused on bacterial hosts often restrictedto targeted groups of virus-host pairs and little is knownabout archaeal virus-host dynamics in natural systems

Community-scale virus-host analyses have often beenbased on low-resolution measurements of the whole com-munity relying on techniques such as denaturing gradientgel electrophoresis (DGGE) pulsed-field gel electrophoresis(PFGE) and microscopic counts (eg [3ndash5]) One excep-tion is a study that examined viral and microbial dynam-ics through single read-based metagenomic analyses in

2 Archaea

four aquatic environments including an archaea-dominatedhypersaline crystallizer pond [6] In that work it was pro-posed that microorganisms and viruses persisted over time atthe level of individual taxa (species) but were highly dynamicat the genotype (strain) level However in a reanalysis of someof those data by our group using metagenomic assembly weconcluded that viruses were actually dynamic at the popula-tion (taxon) level in that system [7] This result suggests thatfurther analyses are necessary to determine whether archaealpopulations tend to be dynamic or stable in hypersalinesystems on short timescales

Of the relatively few metagenomic analyses of virus-hostdynamics that have been reported several have consideredthe clustered regularly interspaced short palindromic repeat(CRISPR) system which provides an opportunity to studyhostsrsquo responses to viral predation and to link viruses to hosts[8ndash12] The CRISPR system is a genomic region in nearly allarchaea and some bacteria and CRISPRs (at least in all sys-tems that have been biochemically characterized to date) havebeen shown to confer adaptive immunity to viruses andorother mobile genetic elements through nucleotide sequenceidentity between the host CRISPR system and invadingnucleic acids [13 14] The hallmarks of CRISPR regions arespacers generally derived from foreign nucleic acids includ-ing plasmid and viral DNA and short palindromic repeatsequences between each spacer (reviewed in [15 16]) Dif-ferent strains of the same species can have highly divergentCRISPR regions (eg [17]) A highly genomically resolvedstudy of virus-host dynamics in an archaea-dominated acid-mine drainage system demonstrated that only the mostrecently acquiredCRISPR spacersmatched coexisting virusesand showed that viruses rapidly recombined to evadeCRISPRtargeting indicating that community stability was achievedby the rapid coevolution of host resistance to viruses andviral resistance to the host CRISPR system [9] ArchaealCRISPR dynamics have also been investigated in Sulfolobusislandicus populations [18 19] indicating clear biogeographyof viral populations and adaptation of CRISPR sequences tolocal viral populations Whether similar dynamics occur inarchaea-dominated hypersaline systems is not well under-stood

Previously our group tracked the dynamics of 35 viralpopulations in eight viral concentrates (representing the30 kDandash01120583m size fraction) collected during three summersfrom archaea-dominated hypersaline Lake Tyrrell (LT)Victoria Australia [7] We demonstrated that viruses in theLT system were generally stable on the timescale of days anddynamic over years In this study we sought to expand ouranalyses to include LT viruses from metagenomic librariesgenerated from 01 08 and 30 120583m filters potentiallyincluding proviruses viruses larger than 01 120583m activelyinfecting viruses and viruses otherwise retained on the filtersThis allowed us to increase the temporal and spatial scopeof our study to 17 samples including four winter samplesfrom which viral concentrate DNA was not sequencedTo give context to the current and previous viral analysesfrom LT and to test the prevailing theory that microbialtaxa are stable at the species level in archaea-dominatedhypersaline systems (presented in [6]) in this study we also

characterized archaeal and bacterial dynamics through 16SrRNA gene analyses and we used CRISPR analyses to assistin the interpretation of the results

2 Materials and Methods

21 Sample Collection and Preparation Sample collectionDNA extraction and sequencing methods have beendescribed previously [7 20 21] Briefly 10 L surface watersamples were collected from LT and sequentially filteredthrough 20 30 08 and 01 120583m filters Post-01120583m filtrateswere concentrated through tangential flow filtration andretained for viral DNA extraction Viral concentrates and01 08 and 30 120583m filters were retained for each sampleand sequencing was undertaken for different size rangesdepending on the sample (Table 1) Sample names includethe month (J for January A for August) year site (A or Bsim300m apart) and time point if the sample was part of adays-scale time series (eg t1 t2 etc) Where necessary thesize fraction is also indicated in the sample name (30 08or 01 for filter size in 120583m or VC for viral concentrate) SitesA and B are isolated pools in the summer (January samples)but continuous with the lake in the winter (August samples)GPS coordinates for sites A and B are 35∘ 191015840 09610158401015840 S 142∘ 47101584059710158401015840 E and 35∘ 191015840 187110158401015840 S 142∘ 481015840 42310158401015840 E respectively

22 Recovery of 140 Viral Contigs gt10 kb and Detection ofViruses in Each Sample In addition to the 35 LT viral andvirus-like (meaning virus or plasmid) populations that wedescribed previously [7] we incorporated all contigs gt10 kbfrom a new IDBA UD [22] assembly of the six Illumina-sequenced viral concentrate samples (default parameters)We also sought to include as many assembled viral sequencesas possible from libraries from larger size-fraction filtersTo do that we first attempted to assemble reads from allIllumina-sequenced filter samples and reads from at leastone 08 120583m filter per sample (regardless of sequencing type)using IDBA UD with default parameters [22] for Illumina-sequenced samples and gsAssembler [23]with default param-eters for 454-sequenced samples The assemblies were gen-erally fragmentary and as such no viral contigs larger than10 kb were identified from most assemblies However werecovered viral contigs gt10 kb from metagenomic assembliesof all 01120583m filters from January 2010 identified throughannotation that could be confidently assigned to viruses orproviruses (eg contigs that included viral capsid proteinstail proteins andor terminases) Annotation parameterswere the same as those described previously [7] We usedBLASTn to identify duplicate viral sequences across assem-blies and samples and we removed any contigs that sharedgt2 kb at gt95nt identity (all contigs that were removedactually shared ge99nt identity because no contigs sharedgt2 kb at 88ndash98nt identity)The remaining 140 viruses shareup to 2 kb at up to 87nt identity with some smaller sharedregions at higher identity

To determine whether or not a given virus was presentin a given sample we used the 140 viral contig sequencesgt10 kb as references for fragment recruitment (ie recruit-ment of Illumina sequencing reads or equivalent 100 bp read

Archaea 3

Table 1 Sequencing and sample information all filter sizes and viral concentrates

Sample Date Time 119879 (∘C) pH TDS (wt) Filter size Sequencing technology Library type(s) Reads

J2007At1 Jan 23 2007 1500 22 72 3101 Sanger 8ndash10 kb 65056608 Sanger Illumina fosmid 8ndash10 kb and 100 cycles SR 22192398VC Illumina 100 cycles PE 2436330

J2007At2 Jan 25 2007 1500 28 71 3101 Sanger 8ndash10 kb 90514208 Sanger fosmid 8ndash10 kb 832880VC Illumina 100 cycles PE 7330099

A2007At1 Aug 7 2007 1400 24 nm 2501 454 SR 555898208 454 SR 53335323 Illumina 100 cycles SR 1297810

A2007At2 Aug 9 2007 1030 23 nm 2501 Sanger 8ndash10 kb 1260976608 Sanger 8ndash10 kb 7463943 Illumina 100 cycles SR 3949427

A2008At1 Aug 11 2008 1100 12 72 25 3 Illumina 100 cycles SR 15786056A2008At2 Aug 12 2008 1045 11 nm 25 08 454 SR and PE 10359280

J2009At1 Jan 3 2009 1145 20 7 28 01 454 SR and PE 630328308 454 SR 5920276

J2009At2 Jan 7 2009 1500 27 69 27 01 454 SR 551994608 454 SR 6181544

J2009Bt1 Jan 5 2009 721 18 69 24 08 454 Illumina SR 100 cycles SR 6844436J2009Bt2 Jan 5 2009 1237 30 71 26 08 454 SR 7372159J2009Bt3 Jan 5 2009 1800 36 7 27 08 454 SR 7546428J2009Blowast Jan 5 2009 VC Illumina 100 cycles PE 19567468

J2010Bt1 Jan 7 2010 745 20 72 32 01 Illumina 100 cycles PE 13213244VC 454 SR 2373021

J2010Bt2 Jan 7 2010 2000 32 73 3601 Illumina 100 cycles PE 273006343 Illumina 100 cycles PE 23375315VC Illumina 100 cycles PE 3312787

J2010Bt3 Jan 8 2010 800 21 72 3401 Illumina 100 cycles PE 3828796808 Illumina 100 cycles PE 13808599VC 454 SR 2243916

J2010Bt35lowastlowast Jan 9 2010 1615 45 71 27 01 Illumina 100 cycles PE 21747692

J2010Bt4lowastlowast Jan 10 2010 1250 33 72 32 3 Illumina 100 cycles PE 15465664VC Illumina 100 cycles PE 9610233

J2010A Jan 10 2010 1250 37 71 3501 Illumina 100 cycles PE 525203283 Illumina 100 cycles PE 6854358VC Illumina 100 cycles PE 9268384

VC viral concentrateNm not measuredPE paired-end sequencingSR single-read sequencinglowastVC from J2009B is a pool of DNA from three viral concentrates collected throughout a single daylowastlowastTomaintain consistent sample naming with previous publications we are retaining sample names that were based on consecutive viral concentrate samplesSample 35 was actually collected fourth in the series and sample 4 was fifth

fragments from other sequencing technologies as describedin [7]) using the Burrows-Wheeler Aligner (bwa) withdefault parameters [24] We required at least 1x read coverageacross at least 50 of a given reference contig sequence fordetection For hierarchical clustering (Pearson correlationaverage linkage clustering) the number of reads that mappedto a given viral contig sequence in a given sample was

normalized by the length of the viral sequence and thenumber of reads in the sample as described previously [7]

23 Generation of 16S rRNA Gene Data and Calculations ofHost Relative Abundance To generate a reference databaseof 16S rRNA gene sequences we used the EMIRGE algo-rithm [25] to reconstruct near full-length 16S rRNA genes

4 Archaea

from Illumina metagenomic data All 01 and 08 120583m filtersfrom which DNA was Illumina sequenced and from whichviral concentrate DNA was also sequenced from the samesample underwent EMIRGE analysis in order to generate areference 16S rRNA gene database for the LT system Thefollowing filter sample metagenomes were EMIRGE ana-lyzed 2007At1 (08120583m) 2010Bt3 (08 120583m) 2010Bt1 (01 120583m)2010Bt2 (01120583m) 2010Bt3 (01 120583m) and 2010A (01 120583m) Weclustered all EMIRGE-generated 16S rRNA gene sequencesat 97 nt identity using UCLUST [26] resulting in adatabase of 101 16S rRNA gene sequences Taxonomy wasassigned to theseOTUs using the SILVA Incremental Aligner(SINA) [27] on the SILVA website [28 29] Using bwa withdefault parameters [24] wemappedmetagenomic reads (splitinto 100 bp lengths to limit biases associated with differentsequencing technologies as described above and in [7]) fromall filter samples to the reference database of 101 OTUsin order to generate relative abundance estimates for eachOTU across samples For a given OTU to be detected in agiven library we required ge1x coverage across ge70 of thelength of the EMIRGE-generated 16S rRNA gene sequenceIn order to account for differences in sequencing throughputwe used relative abundances that is we estimated the percentabundance of each OTU in each sample as the number ofreads that mapped to that OTU divided by the total numberof reads that mapped to any OTU in that sample times100 Those relative abundances were used for hierarchicalclustering (Pearson correlation average linkage clustering)and appear in Table S1 (see Supplementary Material availableonline at httpdxdoiorg1011552013370871)

These sequences (140 viral contigs gt10 kb and 101 16SrRNA gene sequences) have been submitted to NCBI underBioProject accession no PRJNA81851

24 Correlation with CRISPR Spacers We used Crass [30]to identify clustered regularly interspaced short palindromicrepeat (CRISPR) repeat and spacer sequences in each sampleFor this analysis we considered all sequenced filter sizesfor a given water sample together (ie 01 08 and 30 120583mfilters) Using BLASTn with an 119864-value cutoff of 1119890 minus 10 weassessed the number of CRISPR spacers from each samplethat matched (1) any of the 140 viral contig sequences gt10 kb(2) reads from viral concentrates collected from the samesample (applicable only to the eight samples from whichviral concentrates were sequenced) and (3) reads from viralconcentrates collected from other samples

3 Results and Discussion

31 Relative Abundances of Viral Populations across Size Frac-tions Contigs gt10 kb from 105 new viruses were reconstru-cted increasing the number of genomically characterizedviruses from Lake Tyrrell (LT) Victoria Australia from 35to 140 The 140 contigs including seven previously reportedcomplete genomes range in size from 10050 to 93283 bpWeanalyzed the relative abundances of these 140 viral genotypesin metagenomic libraries from viral concentrates and filtersize fractions (01ndash08 08ndash30 and 30ndash20 120583m) across timeand between locations in LT As few as 42 and as many as 116

viruses were detected in a given sample (any size fraction)and more viruses were detected in samples from which viralconcentrates were sequencedThough we acknowledge that acomparison of 16S rRNA genemicrobial OTUs to gt10 kb viralcontig OTUs is an imperfect proxy for comparing the diver-sity of these groups we find approximately 10 viral popula-tions in the viral concentrate size fraction per host OTU (anyfilter size fraction) in most samples suggesting that plank-tonic virus diversity and host diversity scale approximatelywith abundance which has previously been established to beapproximately 10 1 in most environments (eg [31])

In an attempt to distinguish among free (planktonic)viruses physically trapped on filters and host-associatedviruses (ie active viruses andor proviruses) we assumedthat the 08 and 30 120583m filter pore sizes would be too largeto retain significant numbers of viral particles and shouldpredominantly include host-associated viruses We assumedthat the 01120583m filters could retain both host-associated andplanktonic viruses and that the viral concentrates (30 kDandash01 120583m size fraction) should generally exclude host cells andbe dominated by planktonic viruses Therefore we consid-ered viral detection in libraries from three filter size groupsseparately (1) viral concentrates (2) 01 120583m filters and (3)08 andor 30 120583m filters For five samples from which at leastone library from each of those groups was sequenced thenumber of viruses detected only in the viral concentrateswas always higher than the number detected only on filters(Figure 1(a))That trend is robust to the addition of twomoresamples from which libraries from viral concentrates andat least one 08 andor 30 120583m filter sample were sequenced(Figure 1(b)) Although we detected more unique viruses inthe viral concentrates the ratio of total viruses detected on 08or 30 120583m filters relative to total viruses detected in the viralconcentrates tended to be approximately equal (Figure 1(c))meaning that the richness of viruses in viral concentratestended to be similar to viral richness in the host-associatedsize fractions

We used hierarchical clustering (Figure 2(a)) to deter-mine whether patterns in the relative abundances of the 140viruses (ie the 140 viral contigs gt10 kb) could be observedaccording to season filter size andor sample site No uni-versal patterns were observed but there was some clusteringaccording to filter type andor for samples collected fromthe same site during the same week Specifically all fourviral concentrate samples from January 2010 site B clusteredtogether and they clustered more closely with all of the01 120583m filters from the same time series than with other viralconcentrate samples Two additional viral concentrate sam-ples collected during the same season but at different sitesand years (J2007At1 and J2009B) complete a larger clusterfor that group indicating similarity among planktonic viralfractions relative to viruses on larger filter sizes Interestinglythe remaining two viral concentrate samples (the only samplecollected in January 2010 from site A and another samplefrom site A collected in January 2007) clustered separatelyfrom each other and from the rest of the dataset Overall thissuggests that viral concentrates represent a different part ofthe viral community than is sampled by other size fractions

Archaea 5

0

10

20

30

40

50

VC onlyVC (plus 01)

08 andor 3 only01 only

Viru

ses d

etec

ted

J200

7At1

J200

7At2

J201

0Bt2

J201

0Bt3

J201

0A(a)

0

10

20

30

40

50

VC onlyVC (plus 01)

08 andor 3 only

Viru

ses d

etec

ted

J200

7At1

J200

7At2

J200

9B

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

05

1

15

2

J200

7At1

J200

7At2

J200

9B

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

Det

ecte

d vi

rus r

atio

(fil

ter

VC

)

(c)

Figure 1

particularly fractions gt08120583m Interestingly this is despitethe similarity in richness described above (ie the numberof viral OTUs remains relatively similar across samples butthe composition differs in viral concentrates relative to host-associated size fractions)

Although the most obvious trend in the viral assemblagehierarchical clustering analysis was the separation of viralconcentrates from other filter sizes described above someclustering was observed for 01 08 and 30 120583m filter sizefractions Specifically all four filter samples collected fromJanuary 2009 site A cluster together as do all of the 08 and30 120583m filters from January 2010 site B suggesting stability ofthe active viral andor proviral assemblages over four daysin both cases Although the 01 and 08 120583m filter samplescollected in August 2007 from site A time 1 cluster togetherthey are separate from filters of the same size collected twodays later suggesting a shift in the active viral assemblageover days or possibly the induction of proviruses on thattimescale Together these data suggest that although someturnover in active viruses andor proviruses was observedmost active viruses andor proviruses were stable within thearchaea-dominated LT system over days

32 Archaeal and Bacterial OTUs We also characterized therelative abundances of potential host OTUs across time andbetween sites in the LT system Of 101 total archaeal andbacterial 16S rRNA gene OTUs at 97 nt identity 29 weredetected at 5 abundance or higher on any filter in anysample For easier visualization Figure 3 shows only those29 OTUs and it includes only samples from which at leastfive of those OTUs were detected

In the filter size-resolved plots (Figures 3(a)ndash3(c)) it isclear that even the most abundant archaeal groups changesignificantly over time and space in terms of both pres-enceabsence and relative abundance For example the rela-tive proportions ofHalorubrum-like andHaloquadratum-likeOTUs tend to differ significantly across samples from almostexclusively Halorubrum-like organisms (eg in the August2007 time series) to almost exclusively Haloquadratum-likeorganisms (eg in the January 2009 site A time seriesand in January 2010 site A) Some changes in the relativeabundances of these groups can even be observed overhours on 08120583m filters from the January 2009 site B timeseries (Figure 3(b)) A significant change in temperature wasmeasured between the first and second samples of the January

6 Archaea

0 10000 500000

J2010

AVC

J2007

At2

VCA2007

At208

J2007

At201

A2008

At208

J2009

Bt108

J2009

Bt208

J2009

Bt308

A2008

At13

J2009

At101

J2009

At201

J2009

At108

J2009

At208

J2010

A01

J2007

At101

J2007

At208

J2007

At108

J2010

A3

A2007

At101

A2007

At108

A2007

At13

A2007

At23

J2010

Bt308

J2010

Bt23

J2010

Bt43

J2007

At1

VCJ2009

BVC

J2010

Bt201

J2010

Bt35

01

J2010

Bt101

J2010

Bt301

J2010

Bt3

VCJ2010

Bt1

VCJ2010

Bt2

VCJ2010

Bt4

VC

(a)

0 2 20

A2007

At13

III

A2007

At201

Sang

A2007

At101

454

A2007

At108

454

A2007

At208

Sang

A2007

At23

III

J2007

At101

Sang

J2007

At201

Sang

J2010

A01

III

J2010

Bt201

III

J2010

Bt35

01

III

J2010

Bt101

III

J2010

Bt301

III

J2007

At208

Sang

A2008

At208

454

J2009

Bt108

454

III

J2009

Bt208

454

J2009

Bt308

454

J2009

At201

454

J2009

At101

454

J2009

At108

454

J2009

At208

454

J2007

At108

Sang

III

A2008

At13

III

J2010

A3

III

J2010

Bt308

III

J2010

Bt23

III

J2010

Bt43

III

(b)

Figure 2

2009 site B time series (Table 1) and we hypothesize thatthis shift in community structure may mark a response tothe temperature change With the exception of the January2010 site B time series the diversity and relative abundanceof Haloquadratum-like and Halorubrum-like organisms aregenerally anticorrelated suggesting that these archaea maycompete for a similar niche in the LT system The obser-vation of a relatively low diversity and abundance of Halo-quadratum-like organisms in some samples (ie A2007At1-t2 A2008At2 and J2009Bt1) suggests that Haloquadratumspecies are more dynamic in hypersaline systems than hasbeen previously appreciated [32]

In addition to trends within organism types specificOTUs also exhibited interesting dynamics For exampleOTUHaloquadratum walsbyi 2 which belongs to a species gener-ally considered to be among the most abundant organisms inhypersaline lakes [32] is at relatively low abundance or notdetected on 08 and 30 120583m filters from August 2007 site Aand January 2010 site B though it is at high abundance inthat size fraction at site A in January 2009 and January 2010Interestingly that OTU also appears dynamic on a days scalein August 2008 at site A appearing at high abundance on the30 120583m filter at time 1 but not detected on the 08 120583m filterat time 2 Three OTUs including two related to Salinibacter

weremore abundant in theAugust 2007 siteA time series (allfilter sizes) than in any other sampleTheNanohaloarchaeonCandidatus Nanosalina [21] was detected at reasonably highabundance on all 01120583m filters from January 2010 sites A andB but was less abundant at other sites and times (Figure 3(a))Consistent with the small cell size for that organism itand the other abundant Nanohaloarchaeal OTU CandidatusNanosalinarumwere found at low abundance or not detectedon filters larger than 01120583m

Overall analyses of the 29 most abundant archaealand bacterial OTUs indicate dynamics at the populationlevel across time and space particularly on months-to-yearstimescales with some dynamics indicated over hours todays These results indicate dynamics in the most abundantarchaeal and bacterial populations at the taxon level incontrast to a previous study which predicted that the mostabundant microbial taxa were stable over timescales ofweeks to one month in a hypersaline crystallizer pond nearSan Diego (SD) CA USA [6] That study was based ontaxonomic affiliations predicted from sim100 bp metagenomicreads It is possible that differences in sampling timescalesgeochemistry andor community composition could resultin true differences in microbial dynamics between thesetwo systems Also LT microbial eukaryotic communities are

Archaea 7

0 10 20 30 40 50 60 70 80 90 100

A2007At1

J2009At1

J2009At2

J2010A

J2010Bt1

J2010Bt2

J2010Bt35

J2010Bt3

01 120583

m fi

lter s

ampl

e

()

(a)

0 10 20 30 40 50 60 70 80 90 100

A2007At1

A2008At2

J2007At1

J2009At1

J2009At2

J2009Bt1

J2009Bt2

J2009Bt3

J2010Bt3

08 120583

m fi

lter s

ampl

e

()

(b)

0 10 20 30 40 50 60 70 80 90 100

A2007At2

A2008At1

J2010A

J2010Bt2

J2010Bt4

()

Halorubrum 1Halorubrum 2Halorubrum 3Halorubrum 4Halorubrum 5Halorubrum 6Haloquadratum 1Halorubrum 7Haloquadratum 2Haloquadratum 1walsbyiHaloquadratum 2walsbyiHaloquadratum 3Haloquadratum 4Haloquadratum 5Haloquadratum 3walsbyi

Haloquadratum 6Candidatus NanosalinaUnclassified HalobacteriaceaeHalorhabdus 1Halorhabdus 2Halorhabdus 3Halorhabdus 4HalorientalisHalogranumHalobellusCandidatus NanosalinarumSalinibacter 1Salinibacter 2Unclassified Sphingobacteriales

3 120583m

filte

r sam

ple

()

(c)

Figure 3

8 Archaea

dominated by a predatory Colpodella sp and it is unclearwhat effect top-down grazing may have on shifts in bacterialand archaeal community structure [20] However the SDstudy also suggested stability of viral populations in thatsystem and we previously demonstrated through a reanalysisof the SD data that the most abundant viral populationsfrom that study were in fact dynamic [8] so it is possiblethat different sequencing and analytical methods would haverevealed dynamics in archaeal and bacterial populations atthe SD site as well Unfortunately a reanalysis of the existingSD data using the methods in this study is not possibledue to the incompatibility of 454 sequencing reads withthe EMIRGE algorithm which is designed to reconstructnear-complete 16S rRNA genes from paired-end Illuminametagenomic data

Based on the relative abundances of the 101 archaeal andbacterial OTUs (including lower abundance LT organisms)we used hierarchical clustering to determine whether simi-larities in overall archaeal and bacterial community structurewould group LT samples according to season sample typeor filter size (Figure 2(b)) In general archaeal and bacterialcommunities sampled from the same location over daysclustered more closely than did viral assemblages from thesame samples (see above and Figure 2(a)) indicating greaterstability of host populations than viral populations over daysThe only samples for which no within-time series (ie daysscale) clustering was observed for host communities weresamples from August 2008 site A For that time seriesDNA from different filter sizes was sequenced from eachsample and different sequencing technologieswere used (454and Illumina) so it is possible that similarities between thesamples exist but were masked by different methodologiesHowever interestingly all samples and filter sizes from theAugust 2007 site A time series clustered together includingDNA from 01 08 and 30 120583m size fractions sequenced by allthree technologies (Sanger 454 and Illumina)This indicatesa distinct community at site A in August 2007 which wasstable over days and robust to methodological differences inlibrary construction and sequencing

In some cases the archaeal and bacterial communitieson the 01 120583m filters clustered together and separately fromother filters from the same time point reflecting enrichmentfor Nanohaloarchaea Specifically both 01120583m filters fromJanuary 2007 site A cluster together and they belong toa larger cluster that includes all 01120583m filters from January2010 site B The prevalence of Nanohaloarchaea in thesesamples but not in others is evident in Figure 3(a) which isbased on the 29 most abundant OTUs described above Inall other cases the 01120583m filters clustered with larger filtersfrom the same time series For the January 2009 site B single-day time series from which only DNA from 08120583m filters(and a viral concentrate) was sequenced the morning sampleclusters separately from the afternoon and evening sampleswhich cluster together indicating a shift in communitystructure over a single day This is consistent with the shift inrelative abundance of Halorubrum-like and Haloquadratum-like organisms from that time series (Figure 3) and may berelated to a temperature shift as suggested above

Interestingly a seasonal trend was not indicated in eitherthe analysis of the 29most abundant OTUs or in the 101 OTUhierarchical clustering analysis (seasonal trends in viral pop-ulations would be difficult to infer as no viral concentrateswere sequenced from winter samples) Samples from site Ain August 2007 have quite different archaeal and bacterialcommunity structures from samples at the same site inAugust 2008 and January samples from site A are also fairlydistinct year to year Although some samples from site Bcollected in January 2009 and January 2010 might appear tobe the exception (Figure 3(b)) there was as much of a shift incommunity structure over hours in January 2009 at site B asthere was across samples collected over years

As with any metagenomic study we cannot say forcertain to what extent abundance in metagenomic librariesreflects true abundances However we have taken care toavoid biases that are often associated with sequencing-basedcommunity analyses Specifically by focusing on 16S rRNAgene sequences from metagenomic data we have avoidedPCRamplification biases in our estimates of archaeal and bac-terial dynamics Similarly we were able to generate enoughviral concentrate metagenomic DNA for sequencing withoutmultiple-displacement amplification (MDA)which is knownto generate significant biases especially in viralmetagenomes(discussed in [33 34])

33 CRISPR Analyses Using the Crass algorithm [30] weidentified 549 unique repeats and 8095 unique spacersequences from clustered regularly interspaced short palin-dromic repeat (CRISPR) regions in the LT dataset Apartfrom a single sample from January 2009 site A the onlylibraries with spacers that matched any of the 140 completeand near-complete viral genomes were those from samplesfrom which viral concentrates were also sequenced (Table 2)This may indicate that different planktonic viral assemblagesexisted in LT at time points from which viral concentrateswere not sequenced which would be consistent with theviral population dynamics observed across the sequencedviral concentrates [7] Notably no spacers from LT Augustmetagenomesmatched any of the 140 viruses consistent witha possible seasonal shift in viral community structure (noviral concentrates were sequenced from August samples) ashas been observed in marine systems (eg [35]) However aconcomitant seasonal shift in host community structure wasnot supported so it is also possible that the August samplescould harbor different planktonic viral assemblages each year

The only spacer matches to any of the seven previouslydescribed complete LT viral and virus-like genomes [7] wereto LTV2 (76716 bp) and LTVLE3 (71341 bp) in libraries fromthree samples from the January 2010 site B time series(Table 2) Of the seven viral concentrate genomes those twoachieved the highest abundance by approximately one orderof magnitude [7] and they were at their most abundant in thetime series fromwhich spacers targeting themwere detectedThis indicates that LT CRISPRs can actively target abundantcoexisting viruses consistent with previous observations ofCRISPR sampling of coexisting viruses in an acid-minedrainage system [9 11] However it should be noted that thevast majority of spacers do not match any of the 140 viral

Archaea 9

Table 2 CRISPR analyses by sample

Sample Unique repeats Unique spacers Hits to 140 viruses Viral contig match(es) Predicted hostJ2007At1 67 681 1 scaffold 16J2007At2 82 633 1 Contig999004A2007At1 30 141 0A2007At2 64 554 0A2008At1 20 275 0A2008At2 23 121 0J2009At1 41 163 1 scaffold 117J2009At2 40 173 0J2009B 29 114 0J2010Bt1 22 501 0

J2010Bt2 79 1798 4

LTV2LTV2

LTVLE3Contig999004

J2010Bt3 58 1171 8

LTVLE3Contig1100059 Natronomonas-likeContig998975scaffold 55scaffold 29LTVLE3LTVLE3 NanohaloarchaeaLTVLE3 Nanohaloarchaea

J2010Bt35 60 930 4

LTV2LTVLE3 Nanohaloarchaea

Contig999004Contig998975

J2010Bt4 43 853 1 Contig999004J2010A 20 340 0

contigs gt 10 kb which we assume to be among the mostabundant viruses because they assembled significantly Thissuggests thatmost CRISPRs target rare viruses An alternativeexplanation might be a preponderance of inactive vestigeCRISPR regions but if that were the case then we wouldexpect to see the same CRISPR spacers duplicated acrosssamples (ie CRISPR regions would be clonally inheritedover multiple generations as opposed to actively integratingnew spacers on subgenerational timescales) Given that only353 spacer sequences were duplicated across samples relativeto 8095 unique spacer sequences in the dataset we infereither that most CRISPR regions were highly dynamic andorthat we detected different CRISPR regions over time due toshifts in host community structure Regardless there is a highdiversity of spacers in the LT system consistent with the highdiversity of viruses

Of the CRISPR regions that had matches to the 140LT viral contigs only five had repeats with matches toassemblies from 01 08 andor 30 120583m filter sequences (amatch would be indicative of the host organism) and nonematched any of the 12 complete and near-complete bacterialand archaeal genomes assembled from the LT January 2007site A time series [36] Three matches to the January 2010

site B assemblies (Andrade et al unpublished data) wereto contigs that could only be identified at the order levelof Halobacteriales based on best BLAST hits throughoutthe contigs and one repeat matched a contig predicted tobelong to a Natronomonas-like organism (Table 2) Themost interesting CRISPR repeat sequence match was toa contig from a likely Nanohaloarchaeon based on thetaxonomy of best BLAST hits throughout the contig A spacerassociated with that repeat matched LTVLE3 a virus orplasmid described in [7] strongly suggesting that LTVLE3 is avirus or plasmid of the Nanohaloarchaea Interestingly whileLTVLE3 was only at high abundance in viral concentratesfrom the January 2010 site B four-day time series fromwhich the only spacers targeting it were identified the mostabundant Nanohaloarchaeon Candidatus Nanosalina [21]was detected in most LT samples and is as abundant atsite A as it is at site B (isolated pools 300m apart) in 2010(Figure 3(a)) This suggests that the Nanohaloarchaea at siteB may have been adapted locally to the presence of abundantLTVLE3 at site B in January 2010

BLAST searches of all 8095 LT CRISPR spacers and 549repeat sequences revealed essentially no hits to the NCBI nrand environmental databases and no hits were detected for

10 Archaea

0

250

500

750

1000

1250

1500

1750

2000

J2007A timeseries

J2010B timeseries

All VC timeseries

TotalSingle sample or seriesTwo samples or series

Three samples or seriesFour samples or series

Spac

er h

its to

VCs

(a)

0

2

4

6

8

10

12

14

16

18

Hits to same sampleMin hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

J200

7At1

J200

7At2

J200

9B

J201

0Bt1

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

2

4

6

8

10

12

14

16

18

20

Min hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

A20

07At

1

A20

07At

2

A20

08At

1

A20

08At

2

J200

9At1

J200

9At2

J201

0Bt3

5

(c)

Figure 4

any of the repeat sequencesThree spacers had 100matchesHomo sapiens indicating that they are likely either to beerroneous spacers or representative of mobile genetic ele-ments that are not present in public databases Overall thesedata demonstrate that the LT CRISPR spacers target mobilegenetic elements that have not been previously identifiedsuggesting that LT archaea may be highly adapted to localviral predation though we cannot rule out the possibility of

public database bias (ie perhaps LT CRISPRs could targetviruses from other locations that are not represented inpublic databases) In support of local adaptation despite therelatively small number of spacer matches to the 140 LT viralcontigs gt10 kb 1729 spacers (21) had matches to unassem-bled LT viral concentrate reads This may indicate selectionfor CRISPR spacers that target locally adapted potentiallycoexisting viruses consistent with previously observed local

Archaea 11

CRISPR adaptation to coexisting viruses in an acid-minedrainage system and in geographically distinct sludge biore-actors [8 9] Because these matches are to unassembledreads we infer that most targeted LT viruses are at relativelylow abundance at least at the times and locations sampledby this study (with the exception of LTV2 and LTVLE3as described above) This may indicate successful CRISPR-mediated maintenance of viral populations at abundanceslow enough to preclude genomic assembly

In terms of the timescales on which CRISPR spacersare retained and match coexisting low-abundance virusesin the LT system we found several lines of evidence thatsupport the stability of many CRISPR spacers and low-abundance viruses over all timescales sampled by this study(up to three years) LT spacers tended to match reads frommultiple viral concentrate metagenomes collected over thecourse of days almost as often as (January 2007 site A) ormore often than (January 2010 site B) they matched readsfrom a single viral concentrate metagenome from the sametime series (Figure 4(a)) This suggests that many spacersand their targeted low-abundance viruses are stable overdays On the timescale of years spacers and their targetedcoexisting viruses were found most often across more thanone of the four sites and times sampled by viral concentrates(January 2007 site A January 2009 site B January 2010site A January 2010 site B) Similarly in all eight samplesfrom which viral concentrates were sequenced nearly asmany (and often more) spacers had hits to viral concentratereads from other samples as matched viral concentrate readsfrom the same sample (Figure 4(b)) A similar number ofmatches to the eight viral concentrates were observed forsamples from which viral concentrates were not sequenced(Figure 4(c)) Overall these results demonstrate that CRISPRregions generally retain spacer targets against low-abundanceviruses on timescales of 1ndash3 years We also infer that low-abundance viruses are likely more stable in the LT systemthan their highly dynamic higher abundance counterparts

4 Conclusions

Both virus (140 contigs gt10 kb) and host (101 16S rRNA geneOTUs) assemblage structures were highly dynamic over timeand space in the archaea-dominated LT system particularlyover months to years However archaeal and bacterial popu-lations were generally more stable than viral populations andlower abundance viruses were inferred to bemore stable thanabundant viruses Filter size-resolved analyses of viral popu-lations revealed different viral assemblages in the planktonicand host-associated (ie active and provirus) fractions sug-gesting that a higher diversity of viruses may have the poten-tial to infect than are actively infecting at any given time Con-sistent with that hypothesis CRISPR analyses revealed per-sistent targeting of lower abundance viral populations alongwith a high diversity of spacer sequences However interest-ingly the most abundant viruses (sim2 of the viral commu-nity relative to le01 formost viruses [7]) were also targetedby CRISPRs indicating that archaeal hosts strike a balancebetweenprotection against persistent low-abundance viruses

and those viruses potentially abundant enough to effect catas-trophic changes in host community structure

Acknowledgments

Funding for this work was provided by the National ScienceFoundation Award 0626526 and DE-FG02-07ER64505 fromthe Department of Energy Thanks to Cheetham Salt Works(Victoria Australia) for permission to collect samples JohnMoreau Jochen Brocks and Mike Dyall-Smith for assistancein the field Matt Lewis and the J Craig Venter Institute(JCVI) for library construction and sequencing ShannonWilliamson and Doug Fadrosh (JCVI) for training JoanneB Emerson in virus-related laboratory techniques ConnorSkennerton and Gene Tyson for early access to and assis-tance with the Crass program Christine Sun for helpfuldiscussions and two anonymous reviewers for constructivecomments that improved the paper

References

[1] F Rohwer D Prangishvili and D Lindell ldquoRoles of viruses inthe environmentrdquoEnvironmentalMicrobiology vol 11 no 11 pp2771ndash2774 2009

[2] C A Suttle ldquoMarine virusesmdashmajor players in the global eco-systemrdquoNature ReviewsMicrobiology vol 5 no 10 pp 801ndash8122007

[3] K Porter B E Russ and M L Dyall-Smith ldquoVirus-host inter-actions in salt lakesrdquo Current Opinion in Microbiology vol 10no 4 pp 418ndash424 2007

[4] R-A Sandaa and A Larsen ldquoSeasonal variations in virus-host populations in Norwegian coastal waters focusing on thecyanophage community infecting marine Synechococcus spprdquoApplied andEnvironmentalMicrobiology vol 72 no 7 pp 4610ndash4618 2006

[5] S M Short and C A Suttle ldquoTemporal dynamics of naturalcommunities of marine algal viruses and eukaryotesrdquo AquaticMicrobial Ecology vol 32 no 2 pp 107ndash119 2003

[6] B Rodriguez-Brito L Li L Wegley et al ldquoViral and microbialcommunity dynamics in four aquatic environmentsrdquo ISMEJournal vol 4 no 6 pp 739ndash751 2010

[7] J B Emerson B C Thomas K Andrade E E Allen K BHeidelberg and J F Banfield ldquoDynamic viral populations inhypersaline systems as revealed by metagenomic assemblyrdquoApplied and Environmental Microbiology vol 78 pp 6309ndash6320 2012

[8] V Kunin S He F Warnecke et al ldquoA bacterial metapopulationadapts locally to phage predation despite global dispersalrdquoGenome Research vol 18 no 2 pp 293ndash297 2008

[9] A F Andersson and J F Banfield ldquoVirus population dynamicsand acquired virus resistance in natural microbial communi-tiesrdquo Science vol 320 no 5879 pp 1047ndash1050 2008

[10] R E Anderson W J Brazelton and J A Baross ldquoUsing CRIS-PRs as ametagenomic tool to identify microbial hosts of adiffuse flow hydrothermal vent viral assemblagerdquo FEMS Micro-biology Ecology vol 77 no 1 pp 120ndash133 2011

[11] GW Tyson and J F Banfield ldquoRapidly evolving CRISPRs imp-licated in acquired resistance of microorganisms to virusesrdquoEnvironmental Microbiology vol 10 no 1 pp 200ndash207 2008

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

2 Archaea

four aquatic environments including an archaea-dominatedhypersaline crystallizer pond [6] In that work it was pro-posed that microorganisms and viruses persisted over time atthe level of individual taxa (species) but were highly dynamicat the genotype (strain) level However in a reanalysis of someof those data by our group using metagenomic assembly weconcluded that viruses were actually dynamic at the popula-tion (taxon) level in that system [7] This result suggests thatfurther analyses are necessary to determine whether archaealpopulations tend to be dynamic or stable in hypersalinesystems on short timescales

Of the relatively few metagenomic analyses of virus-hostdynamics that have been reported several have consideredthe clustered regularly interspaced short palindromic repeat(CRISPR) system which provides an opportunity to studyhostsrsquo responses to viral predation and to link viruses to hosts[8ndash12] The CRISPR system is a genomic region in nearly allarchaea and some bacteria and CRISPRs (at least in all sys-tems that have been biochemically characterized to date) havebeen shown to confer adaptive immunity to viruses andorother mobile genetic elements through nucleotide sequenceidentity between the host CRISPR system and invadingnucleic acids [13 14] The hallmarks of CRISPR regions arespacers generally derived from foreign nucleic acids includ-ing plasmid and viral DNA and short palindromic repeatsequences between each spacer (reviewed in [15 16]) Dif-ferent strains of the same species can have highly divergentCRISPR regions (eg [17]) A highly genomically resolvedstudy of virus-host dynamics in an archaea-dominated acid-mine drainage system demonstrated that only the mostrecently acquiredCRISPR spacersmatched coexisting virusesand showed that viruses rapidly recombined to evadeCRISPRtargeting indicating that community stability was achievedby the rapid coevolution of host resistance to viruses andviral resistance to the host CRISPR system [9] ArchaealCRISPR dynamics have also been investigated in Sulfolobusislandicus populations [18 19] indicating clear biogeographyof viral populations and adaptation of CRISPR sequences tolocal viral populations Whether similar dynamics occur inarchaea-dominated hypersaline systems is not well under-stood

Previously our group tracked the dynamics of 35 viralpopulations in eight viral concentrates (representing the30 kDandash01120583m size fraction) collected during three summersfrom archaea-dominated hypersaline Lake Tyrrell (LT)Victoria Australia [7] We demonstrated that viruses in theLT system were generally stable on the timescale of days anddynamic over years In this study we sought to expand ouranalyses to include LT viruses from metagenomic librariesgenerated from 01 08 and 30 120583m filters potentiallyincluding proviruses viruses larger than 01 120583m activelyinfecting viruses and viruses otherwise retained on the filtersThis allowed us to increase the temporal and spatial scopeof our study to 17 samples including four winter samplesfrom which viral concentrate DNA was not sequencedTo give context to the current and previous viral analysesfrom LT and to test the prevailing theory that microbialtaxa are stable at the species level in archaea-dominatedhypersaline systems (presented in [6]) in this study we also

characterized archaeal and bacterial dynamics through 16SrRNA gene analyses and we used CRISPR analyses to assistin the interpretation of the results

2 Materials and Methods

21 Sample Collection and Preparation Sample collectionDNA extraction and sequencing methods have beendescribed previously [7 20 21] Briefly 10 L surface watersamples were collected from LT and sequentially filteredthrough 20 30 08 and 01 120583m filters Post-01120583m filtrateswere concentrated through tangential flow filtration andretained for viral DNA extraction Viral concentrates and01 08 and 30 120583m filters were retained for each sampleand sequencing was undertaken for different size rangesdepending on the sample (Table 1) Sample names includethe month (J for January A for August) year site (A or Bsim300m apart) and time point if the sample was part of adays-scale time series (eg t1 t2 etc) Where necessary thesize fraction is also indicated in the sample name (30 08or 01 for filter size in 120583m or VC for viral concentrate) SitesA and B are isolated pools in the summer (January samples)but continuous with the lake in the winter (August samples)GPS coordinates for sites A and B are 35∘ 191015840 09610158401015840 S 142∘ 47101584059710158401015840 E and 35∘ 191015840 187110158401015840 S 142∘ 481015840 42310158401015840 E respectively

22 Recovery of 140 Viral Contigs gt10 kb and Detection ofViruses in Each Sample In addition to the 35 LT viral andvirus-like (meaning virus or plasmid) populations that wedescribed previously [7] we incorporated all contigs gt10 kbfrom a new IDBA UD [22] assembly of the six Illumina-sequenced viral concentrate samples (default parameters)We also sought to include as many assembled viral sequencesas possible from libraries from larger size-fraction filtersTo do that we first attempted to assemble reads from allIllumina-sequenced filter samples and reads from at leastone 08 120583m filter per sample (regardless of sequencing type)using IDBA UD with default parameters [22] for Illumina-sequenced samples and gsAssembler [23]with default param-eters for 454-sequenced samples The assemblies were gen-erally fragmentary and as such no viral contigs larger than10 kb were identified from most assemblies However werecovered viral contigs gt10 kb from metagenomic assembliesof all 01120583m filters from January 2010 identified throughannotation that could be confidently assigned to viruses orproviruses (eg contigs that included viral capsid proteinstail proteins andor terminases) Annotation parameterswere the same as those described previously [7] We usedBLASTn to identify duplicate viral sequences across assem-blies and samples and we removed any contigs that sharedgt2 kb at gt95nt identity (all contigs that were removedactually shared ge99nt identity because no contigs sharedgt2 kb at 88ndash98nt identity)The remaining 140 viruses shareup to 2 kb at up to 87nt identity with some smaller sharedregions at higher identity

To determine whether or not a given virus was presentin a given sample we used the 140 viral contig sequencesgt10 kb as references for fragment recruitment (ie recruit-ment of Illumina sequencing reads or equivalent 100 bp read

Archaea 3

Table 1 Sequencing and sample information all filter sizes and viral concentrates

Sample Date Time 119879 (∘C) pH TDS (wt) Filter size Sequencing technology Library type(s) Reads

J2007At1 Jan 23 2007 1500 22 72 3101 Sanger 8ndash10 kb 65056608 Sanger Illumina fosmid 8ndash10 kb and 100 cycles SR 22192398VC Illumina 100 cycles PE 2436330

J2007At2 Jan 25 2007 1500 28 71 3101 Sanger 8ndash10 kb 90514208 Sanger fosmid 8ndash10 kb 832880VC Illumina 100 cycles PE 7330099

A2007At1 Aug 7 2007 1400 24 nm 2501 454 SR 555898208 454 SR 53335323 Illumina 100 cycles SR 1297810

A2007At2 Aug 9 2007 1030 23 nm 2501 Sanger 8ndash10 kb 1260976608 Sanger 8ndash10 kb 7463943 Illumina 100 cycles SR 3949427

A2008At1 Aug 11 2008 1100 12 72 25 3 Illumina 100 cycles SR 15786056A2008At2 Aug 12 2008 1045 11 nm 25 08 454 SR and PE 10359280

J2009At1 Jan 3 2009 1145 20 7 28 01 454 SR and PE 630328308 454 SR 5920276

J2009At2 Jan 7 2009 1500 27 69 27 01 454 SR 551994608 454 SR 6181544

J2009Bt1 Jan 5 2009 721 18 69 24 08 454 Illumina SR 100 cycles SR 6844436J2009Bt2 Jan 5 2009 1237 30 71 26 08 454 SR 7372159J2009Bt3 Jan 5 2009 1800 36 7 27 08 454 SR 7546428J2009Blowast Jan 5 2009 VC Illumina 100 cycles PE 19567468

J2010Bt1 Jan 7 2010 745 20 72 32 01 Illumina 100 cycles PE 13213244VC 454 SR 2373021

J2010Bt2 Jan 7 2010 2000 32 73 3601 Illumina 100 cycles PE 273006343 Illumina 100 cycles PE 23375315VC Illumina 100 cycles PE 3312787

J2010Bt3 Jan 8 2010 800 21 72 3401 Illumina 100 cycles PE 3828796808 Illumina 100 cycles PE 13808599VC 454 SR 2243916

J2010Bt35lowastlowast Jan 9 2010 1615 45 71 27 01 Illumina 100 cycles PE 21747692

J2010Bt4lowastlowast Jan 10 2010 1250 33 72 32 3 Illumina 100 cycles PE 15465664VC Illumina 100 cycles PE 9610233

J2010A Jan 10 2010 1250 37 71 3501 Illumina 100 cycles PE 525203283 Illumina 100 cycles PE 6854358VC Illumina 100 cycles PE 9268384

VC viral concentrateNm not measuredPE paired-end sequencingSR single-read sequencinglowastVC from J2009B is a pool of DNA from three viral concentrates collected throughout a single daylowastlowastTomaintain consistent sample naming with previous publications we are retaining sample names that were based on consecutive viral concentrate samplesSample 35 was actually collected fourth in the series and sample 4 was fifth

fragments from other sequencing technologies as describedin [7]) using the Burrows-Wheeler Aligner (bwa) withdefault parameters [24] We required at least 1x read coverageacross at least 50 of a given reference contig sequence fordetection For hierarchical clustering (Pearson correlationaverage linkage clustering) the number of reads that mappedto a given viral contig sequence in a given sample was

normalized by the length of the viral sequence and thenumber of reads in the sample as described previously [7]

23 Generation of 16S rRNA Gene Data and Calculations ofHost Relative Abundance To generate a reference databaseof 16S rRNA gene sequences we used the EMIRGE algo-rithm [25] to reconstruct near full-length 16S rRNA genes

4 Archaea

from Illumina metagenomic data All 01 and 08 120583m filtersfrom which DNA was Illumina sequenced and from whichviral concentrate DNA was also sequenced from the samesample underwent EMIRGE analysis in order to generate areference 16S rRNA gene database for the LT system Thefollowing filter sample metagenomes were EMIRGE ana-lyzed 2007At1 (08120583m) 2010Bt3 (08 120583m) 2010Bt1 (01 120583m)2010Bt2 (01120583m) 2010Bt3 (01 120583m) and 2010A (01 120583m) Weclustered all EMIRGE-generated 16S rRNA gene sequencesat 97 nt identity using UCLUST [26] resulting in adatabase of 101 16S rRNA gene sequences Taxonomy wasassigned to theseOTUs using the SILVA Incremental Aligner(SINA) [27] on the SILVA website [28 29] Using bwa withdefault parameters [24] wemappedmetagenomic reads (splitinto 100 bp lengths to limit biases associated with differentsequencing technologies as described above and in [7]) fromall filter samples to the reference database of 101 OTUsin order to generate relative abundance estimates for eachOTU across samples For a given OTU to be detected in agiven library we required ge1x coverage across ge70 of thelength of the EMIRGE-generated 16S rRNA gene sequenceIn order to account for differences in sequencing throughputwe used relative abundances that is we estimated the percentabundance of each OTU in each sample as the number ofreads that mapped to that OTU divided by the total numberof reads that mapped to any OTU in that sample times100 Those relative abundances were used for hierarchicalclustering (Pearson correlation average linkage clustering)and appear in Table S1 (see Supplementary Material availableonline at httpdxdoiorg1011552013370871)

These sequences (140 viral contigs gt10 kb and 101 16SrRNA gene sequences) have been submitted to NCBI underBioProject accession no PRJNA81851

24 Correlation with CRISPR Spacers We used Crass [30]to identify clustered regularly interspaced short palindromicrepeat (CRISPR) repeat and spacer sequences in each sampleFor this analysis we considered all sequenced filter sizesfor a given water sample together (ie 01 08 and 30 120583mfilters) Using BLASTn with an 119864-value cutoff of 1119890 minus 10 weassessed the number of CRISPR spacers from each samplethat matched (1) any of the 140 viral contig sequences gt10 kb(2) reads from viral concentrates collected from the samesample (applicable only to the eight samples from whichviral concentrates were sequenced) and (3) reads from viralconcentrates collected from other samples

3 Results and Discussion

31 Relative Abundances of Viral Populations across Size Frac-tions Contigs gt10 kb from 105 new viruses were reconstru-cted increasing the number of genomically characterizedviruses from Lake Tyrrell (LT) Victoria Australia from 35to 140 The 140 contigs including seven previously reportedcomplete genomes range in size from 10050 to 93283 bpWeanalyzed the relative abundances of these 140 viral genotypesin metagenomic libraries from viral concentrates and filtersize fractions (01ndash08 08ndash30 and 30ndash20 120583m) across timeand between locations in LT As few as 42 and as many as 116

viruses were detected in a given sample (any size fraction)and more viruses were detected in samples from which viralconcentrates were sequencedThough we acknowledge that acomparison of 16S rRNA genemicrobial OTUs to gt10 kb viralcontig OTUs is an imperfect proxy for comparing the diver-sity of these groups we find approximately 10 viral popula-tions in the viral concentrate size fraction per host OTU (anyfilter size fraction) in most samples suggesting that plank-tonic virus diversity and host diversity scale approximatelywith abundance which has previously been established to beapproximately 10 1 in most environments (eg [31])

In an attempt to distinguish among free (planktonic)viruses physically trapped on filters and host-associatedviruses (ie active viruses andor proviruses) we assumedthat the 08 and 30 120583m filter pore sizes would be too largeto retain significant numbers of viral particles and shouldpredominantly include host-associated viruses We assumedthat the 01120583m filters could retain both host-associated andplanktonic viruses and that the viral concentrates (30 kDandash01 120583m size fraction) should generally exclude host cells andbe dominated by planktonic viruses Therefore we consid-ered viral detection in libraries from three filter size groupsseparately (1) viral concentrates (2) 01 120583m filters and (3)08 andor 30 120583m filters For five samples from which at leastone library from each of those groups was sequenced thenumber of viruses detected only in the viral concentrateswas always higher than the number detected only on filters(Figure 1(a))That trend is robust to the addition of twomoresamples from which libraries from viral concentrates andat least one 08 andor 30 120583m filter sample were sequenced(Figure 1(b)) Although we detected more unique viruses inthe viral concentrates the ratio of total viruses detected on 08or 30 120583m filters relative to total viruses detected in the viralconcentrates tended to be approximately equal (Figure 1(c))meaning that the richness of viruses in viral concentratestended to be similar to viral richness in the host-associatedsize fractions

We used hierarchical clustering (Figure 2(a)) to deter-mine whether patterns in the relative abundances of the 140viruses (ie the 140 viral contigs gt10 kb) could be observedaccording to season filter size andor sample site No uni-versal patterns were observed but there was some clusteringaccording to filter type andor for samples collected fromthe same site during the same week Specifically all fourviral concentrate samples from January 2010 site B clusteredtogether and they clustered more closely with all of the01 120583m filters from the same time series than with other viralconcentrate samples Two additional viral concentrate sam-ples collected during the same season but at different sitesand years (J2007At1 and J2009B) complete a larger clusterfor that group indicating similarity among planktonic viralfractions relative to viruses on larger filter sizes Interestinglythe remaining two viral concentrate samples (the only samplecollected in January 2010 from site A and another samplefrom site A collected in January 2007) clustered separatelyfrom each other and from the rest of the dataset Overall thissuggests that viral concentrates represent a different part ofthe viral community than is sampled by other size fractions

Archaea 5

0

10

20

30

40

50

VC onlyVC (plus 01)

08 andor 3 only01 only

Viru

ses d

etec

ted

J200

7At1

J200

7At2

J201

0Bt2

J201

0Bt3

J201

0A(a)

0

10

20

30

40

50

VC onlyVC (plus 01)

08 andor 3 only

Viru

ses d

etec

ted

J200

7At1

J200

7At2

J200

9B

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

05

1

15

2

J200

7At1

J200

7At2

J200

9B

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

Det

ecte

d vi

rus r

atio

(fil

ter

VC

)

(c)

Figure 1

particularly fractions gt08120583m Interestingly this is despitethe similarity in richness described above (ie the numberof viral OTUs remains relatively similar across samples butthe composition differs in viral concentrates relative to host-associated size fractions)

Although the most obvious trend in the viral assemblagehierarchical clustering analysis was the separation of viralconcentrates from other filter sizes described above someclustering was observed for 01 08 and 30 120583m filter sizefractions Specifically all four filter samples collected fromJanuary 2009 site A cluster together as do all of the 08 and30 120583m filters from January 2010 site B suggesting stability ofthe active viral andor proviral assemblages over four daysin both cases Although the 01 and 08 120583m filter samplescollected in August 2007 from site A time 1 cluster togetherthey are separate from filters of the same size collected twodays later suggesting a shift in the active viral assemblageover days or possibly the induction of proviruses on thattimescale Together these data suggest that although someturnover in active viruses andor proviruses was observedmost active viruses andor proviruses were stable within thearchaea-dominated LT system over days

32 Archaeal and Bacterial OTUs We also characterized therelative abundances of potential host OTUs across time andbetween sites in the LT system Of 101 total archaeal andbacterial 16S rRNA gene OTUs at 97 nt identity 29 weredetected at 5 abundance or higher on any filter in anysample For easier visualization Figure 3 shows only those29 OTUs and it includes only samples from which at leastfive of those OTUs were detected

In the filter size-resolved plots (Figures 3(a)ndash3(c)) it isclear that even the most abundant archaeal groups changesignificantly over time and space in terms of both pres-enceabsence and relative abundance For example the rela-tive proportions ofHalorubrum-like andHaloquadratum-likeOTUs tend to differ significantly across samples from almostexclusively Halorubrum-like organisms (eg in the August2007 time series) to almost exclusively Haloquadratum-likeorganisms (eg in the January 2009 site A time seriesand in January 2010 site A) Some changes in the relativeabundances of these groups can even be observed overhours on 08120583m filters from the January 2009 site B timeseries (Figure 3(b)) A significant change in temperature wasmeasured between the first and second samples of the January

6 Archaea

0 10000 500000

J2010

AVC

J2007

At2

VCA2007

At208

J2007

At201

A2008

At208

J2009

Bt108

J2009

Bt208

J2009

Bt308

A2008

At13

J2009

At101

J2009

At201

J2009

At108

J2009

At208

J2010

A01

J2007

At101

J2007

At208

J2007

At108

J2010

A3

A2007

At101

A2007

At108

A2007

At13

A2007

At23

J2010

Bt308

J2010

Bt23

J2010

Bt43

J2007

At1

VCJ2009

BVC

J2010

Bt201

J2010

Bt35

01

J2010

Bt101

J2010

Bt301

J2010

Bt3

VCJ2010

Bt1

VCJ2010

Bt2

VCJ2010

Bt4

VC

(a)

0 2 20

A2007

At13

III

A2007

At201

Sang

A2007

At101

454

A2007

At108

454

A2007

At208

Sang

A2007

At23

III

J2007

At101

Sang

J2007

At201

Sang

J2010

A01

III

J2010

Bt201

III

J2010

Bt35

01

III

J2010

Bt101

III

J2010

Bt301

III

J2007

At208

Sang

A2008

At208

454

J2009

Bt108

454

III

J2009

Bt208

454

J2009

Bt308

454

J2009

At201

454

J2009

At101

454

J2009

At108

454

J2009

At208

454

J2007

At108

Sang

III

A2008

At13

III

J2010

A3

III

J2010

Bt308

III

J2010

Bt23

III

J2010

Bt43

III

(b)

Figure 2

2009 site B time series (Table 1) and we hypothesize thatthis shift in community structure may mark a response tothe temperature change With the exception of the January2010 site B time series the diversity and relative abundanceof Haloquadratum-like and Halorubrum-like organisms aregenerally anticorrelated suggesting that these archaea maycompete for a similar niche in the LT system The obser-vation of a relatively low diversity and abundance of Halo-quadratum-like organisms in some samples (ie A2007At1-t2 A2008At2 and J2009Bt1) suggests that Haloquadratumspecies are more dynamic in hypersaline systems than hasbeen previously appreciated [32]

In addition to trends within organism types specificOTUs also exhibited interesting dynamics For exampleOTUHaloquadratum walsbyi 2 which belongs to a species gener-ally considered to be among the most abundant organisms inhypersaline lakes [32] is at relatively low abundance or notdetected on 08 and 30 120583m filters from August 2007 site Aand January 2010 site B though it is at high abundance inthat size fraction at site A in January 2009 and January 2010Interestingly that OTU also appears dynamic on a days scalein August 2008 at site A appearing at high abundance on the30 120583m filter at time 1 but not detected on the 08 120583m filterat time 2 Three OTUs including two related to Salinibacter

weremore abundant in theAugust 2007 siteA time series (allfilter sizes) than in any other sampleTheNanohaloarchaeonCandidatus Nanosalina [21] was detected at reasonably highabundance on all 01120583m filters from January 2010 sites A andB but was less abundant at other sites and times (Figure 3(a))Consistent with the small cell size for that organism itand the other abundant Nanohaloarchaeal OTU CandidatusNanosalinarumwere found at low abundance or not detectedon filters larger than 01120583m

Overall analyses of the 29 most abundant archaealand bacterial OTUs indicate dynamics at the populationlevel across time and space particularly on months-to-yearstimescales with some dynamics indicated over hours todays These results indicate dynamics in the most abundantarchaeal and bacterial populations at the taxon level incontrast to a previous study which predicted that the mostabundant microbial taxa were stable over timescales ofweeks to one month in a hypersaline crystallizer pond nearSan Diego (SD) CA USA [6] That study was based ontaxonomic affiliations predicted from sim100 bp metagenomicreads It is possible that differences in sampling timescalesgeochemistry andor community composition could resultin true differences in microbial dynamics between thesetwo systems Also LT microbial eukaryotic communities are

Archaea 7

0 10 20 30 40 50 60 70 80 90 100

A2007At1

J2009At1

J2009At2

J2010A

J2010Bt1

J2010Bt2

J2010Bt35

J2010Bt3

01 120583

m fi

lter s

ampl

e

()

(a)

0 10 20 30 40 50 60 70 80 90 100

A2007At1

A2008At2

J2007At1

J2009At1

J2009At2

J2009Bt1

J2009Bt2

J2009Bt3

J2010Bt3

08 120583

m fi

lter s

ampl

e

()

(b)

0 10 20 30 40 50 60 70 80 90 100

A2007At2

A2008At1

J2010A

J2010Bt2

J2010Bt4

()

Halorubrum 1Halorubrum 2Halorubrum 3Halorubrum 4Halorubrum 5Halorubrum 6Haloquadratum 1Halorubrum 7Haloquadratum 2Haloquadratum 1walsbyiHaloquadratum 2walsbyiHaloquadratum 3Haloquadratum 4Haloquadratum 5Haloquadratum 3walsbyi

Haloquadratum 6Candidatus NanosalinaUnclassified HalobacteriaceaeHalorhabdus 1Halorhabdus 2Halorhabdus 3Halorhabdus 4HalorientalisHalogranumHalobellusCandidatus NanosalinarumSalinibacter 1Salinibacter 2Unclassified Sphingobacteriales

3 120583m

filte

r sam

ple

()

(c)

Figure 3

8 Archaea

dominated by a predatory Colpodella sp and it is unclearwhat effect top-down grazing may have on shifts in bacterialand archaeal community structure [20] However the SDstudy also suggested stability of viral populations in thatsystem and we previously demonstrated through a reanalysisof the SD data that the most abundant viral populationsfrom that study were in fact dynamic [8] so it is possiblethat different sequencing and analytical methods would haverevealed dynamics in archaeal and bacterial populations atthe SD site as well Unfortunately a reanalysis of the existingSD data using the methods in this study is not possibledue to the incompatibility of 454 sequencing reads withthe EMIRGE algorithm which is designed to reconstructnear-complete 16S rRNA genes from paired-end Illuminametagenomic data

Based on the relative abundances of the 101 archaeal andbacterial OTUs (including lower abundance LT organisms)we used hierarchical clustering to determine whether simi-larities in overall archaeal and bacterial community structurewould group LT samples according to season sample typeor filter size (Figure 2(b)) In general archaeal and bacterialcommunities sampled from the same location over daysclustered more closely than did viral assemblages from thesame samples (see above and Figure 2(a)) indicating greaterstability of host populations than viral populations over daysThe only samples for which no within-time series (ie daysscale) clustering was observed for host communities weresamples from August 2008 site A For that time seriesDNA from different filter sizes was sequenced from eachsample and different sequencing technologieswere used (454and Illumina) so it is possible that similarities between thesamples exist but were masked by different methodologiesHowever interestingly all samples and filter sizes from theAugust 2007 site A time series clustered together includingDNA from 01 08 and 30 120583m size fractions sequenced by allthree technologies (Sanger 454 and Illumina)This indicatesa distinct community at site A in August 2007 which wasstable over days and robust to methodological differences inlibrary construction and sequencing

In some cases the archaeal and bacterial communitieson the 01 120583m filters clustered together and separately fromother filters from the same time point reflecting enrichmentfor Nanohaloarchaea Specifically both 01120583m filters fromJanuary 2007 site A cluster together and they belong toa larger cluster that includes all 01120583m filters from January2010 site B The prevalence of Nanohaloarchaea in thesesamples but not in others is evident in Figure 3(a) which isbased on the 29 most abundant OTUs described above Inall other cases the 01120583m filters clustered with larger filtersfrom the same time series For the January 2009 site B single-day time series from which only DNA from 08120583m filters(and a viral concentrate) was sequenced the morning sampleclusters separately from the afternoon and evening sampleswhich cluster together indicating a shift in communitystructure over a single day This is consistent with the shift inrelative abundance of Halorubrum-like and Haloquadratum-like organisms from that time series (Figure 3) and may berelated to a temperature shift as suggested above

Interestingly a seasonal trend was not indicated in eitherthe analysis of the 29most abundant OTUs or in the 101 OTUhierarchical clustering analysis (seasonal trends in viral pop-ulations would be difficult to infer as no viral concentrateswere sequenced from winter samples) Samples from site Ain August 2007 have quite different archaeal and bacterialcommunity structures from samples at the same site inAugust 2008 and January samples from site A are also fairlydistinct year to year Although some samples from site Bcollected in January 2009 and January 2010 might appear tobe the exception (Figure 3(b)) there was as much of a shift incommunity structure over hours in January 2009 at site B asthere was across samples collected over years

As with any metagenomic study we cannot say forcertain to what extent abundance in metagenomic librariesreflects true abundances However we have taken care toavoid biases that are often associated with sequencing-basedcommunity analyses Specifically by focusing on 16S rRNAgene sequences from metagenomic data we have avoidedPCRamplification biases in our estimates of archaeal and bac-terial dynamics Similarly we were able to generate enoughviral concentrate metagenomic DNA for sequencing withoutmultiple-displacement amplification (MDA)which is knownto generate significant biases especially in viralmetagenomes(discussed in [33 34])

33 CRISPR Analyses Using the Crass algorithm [30] weidentified 549 unique repeats and 8095 unique spacersequences from clustered regularly interspaced short palin-dromic repeat (CRISPR) regions in the LT dataset Apartfrom a single sample from January 2009 site A the onlylibraries with spacers that matched any of the 140 completeand near-complete viral genomes were those from samplesfrom which viral concentrates were also sequenced (Table 2)This may indicate that different planktonic viral assemblagesexisted in LT at time points from which viral concentrateswere not sequenced which would be consistent with theviral population dynamics observed across the sequencedviral concentrates [7] Notably no spacers from LT Augustmetagenomesmatched any of the 140 viruses consistent witha possible seasonal shift in viral community structure (noviral concentrates were sequenced from August samples) ashas been observed in marine systems (eg [35]) However aconcomitant seasonal shift in host community structure wasnot supported so it is also possible that the August samplescould harbor different planktonic viral assemblages each year

The only spacer matches to any of the seven previouslydescribed complete LT viral and virus-like genomes [7] wereto LTV2 (76716 bp) and LTVLE3 (71341 bp) in libraries fromthree samples from the January 2010 site B time series(Table 2) Of the seven viral concentrate genomes those twoachieved the highest abundance by approximately one orderof magnitude [7] and they were at their most abundant in thetime series fromwhich spacers targeting themwere detectedThis indicates that LT CRISPRs can actively target abundantcoexisting viruses consistent with previous observations ofCRISPR sampling of coexisting viruses in an acid-minedrainage system [9 11] However it should be noted that thevast majority of spacers do not match any of the 140 viral

Archaea 9

Table 2 CRISPR analyses by sample

Sample Unique repeats Unique spacers Hits to 140 viruses Viral contig match(es) Predicted hostJ2007At1 67 681 1 scaffold 16J2007At2 82 633 1 Contig999004A2007At1 30 141 0A2007At2 64 554 0A2008At1 20 275 0A2008At2 23 121 0J2009At1 41 163 1 scaffold 117J2009At2 40 173 0J2009B 29 114 0J2010Bt1 22 501 0

J2010Bt2 79 1798 4

LTV2LTV2

LTVLE3Contig999004

J2010Bt3 58 1171 8

LTVLE3Contig1100059 Natronomonas-likeContig998975scaffold 55scaffold 29LTVLE3LTVLE3 NanohaloarchaeaLTVLE3 Nanohaloarchaea

J2010Bt35 60 930 4

LTV2LTVLE3 Nanohaloarchaea

Contig999004Contig998975

J2010Bt4 43 853 1 Contig999004J2010A 20 340 0

contigs gt 10 kb which we assume to be among the mostabundant viruses because they assembled significantly Thissuggests thatmost CRISPRs target rare viruses An alternativeexplanation might be a preponderance of inactive vestigeCRISPR regions but if that were the case then we wouldexpect to see the same CRISPR spacers duplicated acrosssamples (ie CRISPR regions would be clonally inheritedover multiple generations as opposed to actively integratingnew spacers on subgenerational timescales) Given that only353 spacer sequences were duplicated across samples relativeto 8095 unique spacer sequences in the dataset we infereither that most CRISPR regions were highly dynamic andorthat we detected different CRISPR regions over time due toshifts in host community structure Regardless there is a highdiversity of spacers in the LT system consistent with the highdiversity of viruses

Of the CRISPR regions that had matches to the 140LT viral contigs only five had repeats with matches toassemblies from 01 08 andor 30 120583m filter sequences (amatch would be indicative of the host organism) and nonematched any of the 12 complete and near-complete bacterialand archaeal genomes assembled from the LT January 2007site A time series [36] Three matches to the January 2010

site B assemblies (Andrade et al unpublished data) wereto contigs that could only be identified at the order levelof Halobacteriales based on best BLAST hits throughoutthe contigs and one repeat matched a contig predicted tobelong to a Natronomonas-like organism (Table 2) Themost interesting CRISPR repeat sequence match was toa contig from a likely Nanohaloarchaeon based on thetaxonomy of best BLAST hits throughout the contig A spacerassociated with that repeat matched LTVLE3 a virus orplasmid described in [7] strongly suggesting that LTVLE3 is avirus or plasmid of the Nanohaloarchaea Interestingly whileLTVLE3 was only at high abundance in viral concentratesfrom the January 2010 site B four-day time series fromwhich the only spacers targeting it were identified the mostabundant Nanohaloarchaeon Candidatus Nanosalina [21]was detected in most LT samples and is as abundant atsite A as it is at site B (isolated pools 300m apart) in 2010(Figure 3(a)) This suggests that the Nanohaloarchaea at siteB may have been adapted locally to the presence of abundantLTVLE3 at site B in January 2010

BLAST searches of all 8095 LT CRISPR spacers and 549repeat sequences revealed essentially no hits to the NCBI nrand environmental databases and no hits were detected for

10 Archaea

0

250

500

750

1000

1250

1500

1750

2000

J2007A timeseries

J2010B timeseries

All VC timeseries

TotalSingle sample or seriesTwo samples or series

Three samples or seriesFour samples or series

Spac

er h

its to

VCs

(a)

0

2

4

6

8

10

12

14

16

18

Hits to same sampleMin hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

J200

7At1

J200

7At2

J200

9B

J201

0Bt1

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

2

4

6

8

10

12

14

16

18

20

Min hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

A20

07At

1

A20

07At

2

A20

08At

1

A20

08At

2

J200

9At1

J200

9At2

J201

0Bt3

5

(c)

Figure 4

any of the repeat sequencesThree spacers had 100matchesHomo sapiens indicating that they are likely either to beerroneous spacers or representative of mobile genetic ele-ments that are not present in public databases Overall thesedata demonstrate that the LT CRISPR spacers target mobilegenetic elements that have not been previously identifiedsuggesting that LT archaea may be highly adapted to localviral predation though we cannot rule out the possibility of

public database bias (ie perhaps LT CRISPRs could targetviruses from other locations that are not represented inpublic databases) In support of local adaptation despite therelatively small number of spacer matches to the 140 LT viralcontigs gt10 kb 1729 spacers (21) had matches to unassem-bled LT viral concentrate reads This may indicate selectionfor CRISPR spacers that target locally adapted potentiallycoexisting viruses consistent with previously observed local

Archaea 11

CRISPR adaptation to coexisting viruses in an acid-minedrainage system and in geographically distinct sludge biore-actors [8 9] Because these matches are to unassembledreads we infer that most targeted LT viruses are at relativelylow abundance at least at the times and locations sampledby this study (with the exception of LTV2 and LTVLE3as described above) This may indicate successful CRISPR-mediated maintenance of viral populations at abundanceslow enough to preclude genomic assembly

In terms of the timescales on which CRISPR spacersare retained and match coexisting low-abundance virusesin the LT system we found several lines of evidence thatsupport the stability of many CRISPR spacers and low-abundance viruses over all timescales sampled by this study(up to three years) LT spacers tended to match reads frommultiple viral concentrate metagenomes collected over thecourse of days almost as often as (January 2007 site A) ormore often than (January 2010 site B) they matched readsfrom a single viral concentrate metagenome from the sametime series (Figure 4(a)) This suggests that many spacersand their targeted low-abundance viruses are stable overdays On the timescale of years spacers and their targetedcoexisting viruses were found most often across more thanone of the four sites and times sampled by viral concentrates(January 2007 site A January 2009 site B January 2010site A January 2010 site B) Similarly in all eight samplesfrom which viral concentrates were sequenced nearly asmany (and often more) spacers had hits to viral concentratereads from other samples as matched viral concentrate readsfrom the same sample (Figure 4(b)) A similar number ofmatches to the eight viral concentrates were observed forsamples from which viral concentrates were not sequenced(Figure 4(c)) Overall these results demonstrate that CRISPRregions generally retain spacer targets against low-abundanceviruses on timescales of 1ndash3 years We also infer that low-abundance viruses are likely more stable in the LT systemthan their highly dynamic higher abundance counterparts

4 Conclusions

Both virus (140 contigs gt10 kb) and host (101 16S rRNA geneOTUs) assemblage structures were highly dynamic over timeand space in the archaea-dominated LT system particularlyover months to years However archaeal and bacterial popu-lations were generally more stable than viral populations andlower abundance viruses were inferred to bemore stable thanabundant viruses Filter size-resolved analyses of viral popu-lations revealed different viral assemblages in the planktonicand host-associated (ie active and provirus) fractions sug-gesting that a higher diversity of viruses may have the poten-tial to infect than are actively infecting at any given time Con-sistent with that hypothesis CRISPR analyses revealed per-sistent targeting of lower abundance viral populations alongwith a high diversity of spacer sequences However interest-ingly the most abundant viruses (sim2 of the viral commu-nity relative to le01 formost viruses [7]) were also targetedby CRISPRs indicating that archaeal hosts strike a balancebetweenprotection against persistent low-abundance viruses

and those viruses potentially abundant enough to effect catas-trophic changes in host community structure

Acknowledgments

Funding for this work was provided by the National ScienceFoundation Award 0626526 and DE-FG02-07ER64505 fromthe Department of Energy Thanks to Cheetham Salt Works(Victoria Australia) for permission to collect samples JohnMoreau Jochen Brocks and Mike Dyall-Smith for assistancein the field Matt Lewis and the J Craig Venter Institute(JCVI) for library construction and sequencing ShannonWilliamson and Doug Fadrosh (JCVI) for training JoanneB Emerson in virus-related laboratory techniques ConnorSkennerton and Gene Tyson for early access to and assis-tance with the Crass program Christine Sun for helpfuldiscussions and two anonymous reviewers for constructivecomments that improved the paper

References

[1] F Rohwer D Prangishvili and D Lindell ldquoRoles of viruses inthe environmentrdquoEnvironmentalMicrobiology vol 11 no 11 pp2771ndash2774 2009

[2] C A Suttle ldquoMarine virusesmdashmajor players in the global eco-systemrdquoNature ReviewsMicrobiology vol 5 no 10 pp 801ndash8122007

[3] K Porter B E Russ and M L Dyall-Smith ldquoVirus-host inter-actions in salt lakesrdquo Current Opinion in Microbiology vol 10no 4 pp 418ndash424 2007

[4] R-A Sandaa and A Larsen ldquoSeasonal variations in virus-host populations in Norwegian coastal waters focusing on thecyanophage community infecting marine Synechococcus spprdquoApplied andEnvironmentalMicrobiology vol 72 no 7 pp 4610ndash4618 2006

[5] S M Short and C A Suttle ldquoTemporal dynamics of naturalcommunities of marine algal viruses and eukaryotesrdquo AquaticMicrobial Ecology vol 32 no 2 pp 107ndash119 2003

[6] B Rodriguez-Brito L Li L Wegley et al ldquoViral and microbialcommunity dynamics in four aquatic environmentsrdquo ISMEJournal vol 4 no 6 pp 739ndash751 2010

[7] J B Emerson B C Thomas K Andrade E E Allen K BHeidelberg and J F Banfield ldquoDynamic viral populations inhypersaline systems as revealed by metagenomic assemblyrdquoApplied and Environmental Microbiology vol 78 pp 6309ndash6320 2012

[8] V Kunin S He F Warnecke et al ldquoA bacterial metapopulationadapts locally to phage predation despite global dispersalrdquoGenome Research vol 18 no 2 pp 293ndash297 2008

[9] A F Andersson and J F Banfield ldquoVirus population dynamicsand acquired virus resistance in natural microbial communi-tiesrdquo Science vol 320 no 5879 pp 1047ndash1050 2008

[10] R E Anderson W J Brazelton and J A Baross ldquoUsing CRIS-PRs as ametagenomic tool to identify microbial hosts of adiffuse flow hydrothermal vent viral assemblagerdquo FEMS Micro-biology Ecology vol 77 no 1 pp 120ndash133 2011

[11] GW Tyson and J F Banfield ldquoRapidly evolving CRISPRs imp-licated in acquired resistance of microorganisms to virusesrdquoEnvironmental Microbiology vol 10 no 1 pp 200ndash207 2008

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Archaea 3

Table 1 Sequencing and sample information all filter sizes and viral concentrates

Sample Date Time 119879 (∘C) pH TDS (wt) Filter size Sequencing technology Library type(s) Reads

J2007At1 Jan 23 2007 1500 22 72 3101 Sanger 8ndash10 kb 65056608 Sanger Illumina fosmid 8ndash10 kb and 100 cycles SR 22192398VC Illumina 100 cycles PE 2436330

J2007At2 Jan 25 2007 1500 28 71 3101 Sanger 8ndash10 kb 90514208 Sanger fosmid 8ndash10 kb 832880VC Illumina 100 cycles PE 7330099

A2007At1 Aug 7 2007 1400 24 nm 2501 454 SR 555898208 454 SR 53335323 Illumina 100 cycles SR 1297810

A2007At2 Aug 9 2007 1030 23 nm 2501 Sanger 8ndash10 kb 1260976608 Sanger 8ndash10 kb 7463943 Illumina 100 cycles SR 3949427

A2008At1 Aug 11 2008 1100 12 72 25 3 Illumina 100 cycles SR 15786056A2008At2 Aug 12 2008 1045 11 nm 25 08 454 SR and PE 10359280

J2009At1 Jan 3 2009 1145 20 7 28 01 454 SR and PE 630328308 454 SR 5920276

J2009At2 Jan 7 2009 1500 27 69 27 01 454 SR 551994608 454 SR 6181544

J2009Bt1 Jan 5 2009 721 18 69 24 08 454 Illumina SR 100 cycles SR 6844436J2009Bt2 Jan 5 2009 1237 30 71 26 08 454 SR 7372159J2009Bt3 Jan 5 2009 1800 36 7 27 08 454 SR 7546428J2009Blowast Jan 5 2009 VC Illumina 100 cycles PE 19567468

J2010Bt1 Jan 7 2010 745 20 72 32 01 Illumina 100 cycles PE 13213244VC 454 SR 2373021

J2010Bt2 Jan 7 2010 2000 32 73 3601 Illumina 100 cycles PE 273006343 Illumina 100 cycles PE 23375315VC Illumina 100 cycles PE 3312787

J2010Bt3 Jan 8 2010 800 21 72 3401 Illumina 100 cycles PE 3828796808 Illumina 100 cycles PE 13808599VC 454 SR 2243916

J2010Bt35lowastlowast Jan 9 2010 1615 45 71 27 01 Illumina 100 cycles PE 21747692

J2010Bt4lowastlowast Jan 10 2010 1250 33 72 32 3 Illumina 100 cycles PE 15465664VC Illumina 100 cycles PE 9610233

J2010A Jan 10 2010 1250 37 71 3501 Illumina 100 cycles PE 525203283 Illumina 100 cycles PE 6854358VC Illumina 100 cycles PE 9268384

VC viral concentrateNm not measuredPE paired-end sequencingSR single-read sequencinglowastVC from J2009B is a pool of DNA from three viral concentrates collected throughout a single daylowastlowastTomaintain consistent sample naming with previous publications we are retaining sample names that were based on consecutive viral concentrate samplesSample 35 was actually collected fourth in the series and sample 4 was fifth

fragments from other sequencing technologies as describedin [7]) using the Burrows-Wheeler Aligner (bwa) withdefault parameters [24] We required at least 1x read coverageacross at least 50 of a given reference contig sequence fordetection For hierarchical clustering (Pearson correlationaverage linkage clustering) the number of reads that mappedto a given viral contig sequence in a given sample was

normalized by the length of the viral sequence and thenumber of reads in the sample as described previously [7]

23 Generation of 16S rRNA Gene Data and Calculations ofHost Relative Abundance To generate a reference databaseof 16S rRNA gene sequences we used the EMIRGE algo-rithm [25] to reconstruct near full-length 16S rRNA genes

4 Archaea

from Illumina metagenomic data All 01 and 08 120583m filtersfrom which DNA was Illumina sequenced and from whichviral concentrate DNA was also sequenced from the samesample underwent EMIRGE analysis in order to generate areference 16S rRNA gene database for the LT system Thefollowing filter sample metagenomes were EMIRGE ana-lyzed 2007At1 (08120583m) 2010Bt3 (08 120583m) 2010Bt1 (01 120583m)2010Bt2 (01120583m) 2010Bt3 (01 120583m) and 2010A (01 120583m) Weclustered all EMIRGE-generated 16S rRNA gene sequencesat 97 nt identity using UCLUST [26] resulting in adatabase of 101 16S rRNA gene sequences Taxonomy wasassigned to theseOTUs using the SILVA Incremental Aligner(SINA) [27] on the SILVA website [28 29] Using bwa withdefault parameters [24] wemappedmetagenomic reads (splitinto 100 bp lengths to limit biases associated with differentsequencing technologies as described above and in [7]) fromall filter samples to the reference database of 101 OTUsin order to generate relative abundance estimates for eachOTU across samples For a given OTU to be detected in agiven library we required ge1x coverage across ge70 of thelength of the EMIRGE-generated 16S rRNA gene sequenceIn order to account for differences in sequencing throughputwe used relative abundances that is we estimated the percentabundance of each OTU in each sample as the number ofreads that mapped to that OTU divided by the total numberof reads that mapped to any OTU in that sample times100 Those relative abundances were used for hierarchicalclustering (Pearson correlation average linkage clustering)and appear in Table S1 (see Supplementary Material availableonline at httpdxdoiorg1011552013370871)

These sequences (140 viral contigs gt10 kb and 101 16SrRNA gene sequences) have been submitted to NCBI underBioProject accession no PRJNA81851

24 Correlation with CRISPR Spacers We used Crass [30]to identify clustered regularly interspaced short palindromicrepeat (CRISPR) repeat and spacer sequences in each sampleFor this analysis we considered all sequenced filter sizesfor a given water sample together (ie 01 08 and 30 120583mfilters) Using BLASTn with an 119864-value cutoff of 1119890 minus 10 weassessed the number of CRISPR spacers from each samplethat matched (1) any of the 140 viral contig sequences gt10 kb(2) reads from viral concentrates collected from the samesample (applicable only to the eight samples from whichviral concentrates were sequenced) and (3) reads from viralconcentrates collected from other samples

3 Results and Discussion

31 Relative Abundances of Viral Populations across Size Frac-tions Contigs gt10 kb from 105 new viruses were reconstru-cted increasing the number of genomically characterizedviruses from Lake Tyrrell (LT) Victoria Australia from 35to 140 The 140 contigs including seven previously reportedcomplete genomes range in size from 10050 to 93283 bpWeanalyzed the relative abundances of these 140 viral genotypesin metagenomic libraries from viral concentrates and filtersize fractions (01ndash08 08ndash30 and 30ndash20 120583m) across timeand between locations in LT As few as 42 and as many as 116

viruses were detected in a given sample (any size fraction)and more viruses were detected in samples from which viralconcentrates were sequencedThough we acknowledge that acomparison of 16S rRNA genemicrobial OTUs to gt10 kb viralcontig OTUs is an imperfect proxy for comparing the diver-sity of these groups we find approximately 10 viral popula-tions in the viral concentrate size fraction per host OTU (anyfilter size fraction) in most samples suggesting that plank-tonic virus diversity and host diversity scale approximatelywith abundance which has previously been established to beapproximately 10 1 in most environments (eg [31])

In an attempt to distinguish among free (planktonic)viruses physically trapped on filters and host-associatedviruses (ie active viruses andor proviruses) we assumedthat the 08 and 30 120583m filter pore sizes would be too largeto retain significant numbers of viral particles and shouldpredominantly include host-associated viruses We assumedthat the 01120583m filters could retain both host-associated andplanktonic viruses and that the viral concentrates (30 kDandash01 120583m size fraction) should generally exclude host cells andbe dominated by planktonic viruses Therefore we consid-ered viral detection in libraries from three filter size groupsseparately (1) viral concentrates (2) 01 120583m filters and (3)08 andor 30 120583m filters For five samples from which at leastone library from each of those groups was sequenced thenumber of viruses detected only in the viral concentrateswas always higher than the number detected only on filters(Figure 1(a))That trend is robust to the addition of twomoresamples from which libraries from viral concentrates andat least one 08 andor 30 120583m filter sample were sequenced(Figure 1(b)) Although we detected more unique viruses inthe viral concentrates the ratio of total viruses detected on 08or 30 120583m filters relative to total viruses detected in the viralconcentrates tended to be approximately equal (Figure 1(c))meaning that the richness of viruses in viral concentratestended to be similar to viral richness in the host-associatedsize fractions

We used hierarchical clustering (Figure 2(a)) to deter-mine whether patterns in the relative abundances of the 140viruses (ie the 140 viral contigs gt10 kb) could be observedaccording to season filter size andor sample site No uni-versal patterns were observed but there was some clusteringaccording to filter type andor for samples collected fromthe same site during the same week Specifically all fourviral concentrate samples from January 2010 site B clusteredtogether and they clustered more closely with all of the01 120583m filters from the same time series than with other viralconcentrate samples Two additional viral concentrate sam-ples collected during the same season but at different sitesand years (J2007At1 and J2009B) complete a larger clusterfor that group indicating similarity among planktonic viralfractions relative to viruses on larger filter sizes Interestinglythe remaining two viral concentrate samples (the only samplecollected in January 2010 from site A and another samplefrom site A collected in January 2007) clustered separatelyfrom each other and from the rest of the dataset Overall thissuggests that viral concentrates represent a different part ofthe viral community than is sampled by other size fractions

Archaea 5

0

10

20

30

40

50

VC onlyVC (plus 01)

08 andor 3 only01 only

Viru

ses d

etec

ted

J200

7At1

J200

7At2

J201

0Bt2

J201

0Bt3

J201

0A(a)

0

10

20

30

40

50

VC onlyVC (plus 01)

08 andor 3 only

Viru

ses d

etec

ted

J200

7At1

J200

7At2

J200

9B

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

05

1

15

2

J200

7At1

J200

7At2

J200

9B

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

Det

ecte

d vi

rus r

atio

(fil

ter

VC

)

(c)

Figure 1

particularly fractions gt08120583m Interestingly this is despitethe similarity in richness described above (ie the numberof viral OTUs remains relatively similar across samples butthe composition differs in viral concentrates relative to host-associated size fractions)

Although the most obvious trend in the viral assemblagehierarchical clustering analysis was the separation of viralconcentrates from other filter sizes described above someclustering was observed for 01 08 and 30 120583m filter sizefractions Specifically all four filter samples collected fromJanuary 2009 site A cluster together as do all of the 08 and30 120583m filters from January 2010 site B suggesting stability ofthe active viral andor proviral assemblages over four daysin both cases Although the 01 and 08 120583m filter samplescollected in August 2007 from site A time 1 cluster togetherthey are separate from filters of the same size collected twodays later suggesting a shift in the active viral assemblageover days or possibly the induction of proviruses on thattimescale Together these data suggest that although someturnover in active viruses andor proviruses was observedmost active viruses andor proviruses were stable within thearchaea-dominated LT system over days

32 Archaeal and Bacterial OTUs We also characterized therelative abundances of potential host OTUs across time andbetween sites in the LT system Of 101 total archaeal andbacterial 16S rRNA gene OTUs at 97 nt identity 29 weredetected at 5 abundance or higher on any filter in anysample For easier visualization Figure 3 shows only those29 OTUs and it includes only samples from which at leastfive of those OTUs were detected

In the filter size-resolved plots (Figures 3(a)ndash3(c)) it isclear that even the most abundant archaeal groups changesignificantly over time and space in terms of both pres-enceabsence and relative abundance For example the rela-tive proportions ofHalorubrum-like andHaloquadratum-likeOTUs tend to differ significantly across samples from almostexclusively Halorubrum-like organisms (eg in the August2007 time series) to almost exclusively Haloquadratum-likeorganisms (eg in the January 2009 site A time seriesand in January 2010 site A) Some changes in the relativeabundances of these groups can even be observed overhours on 08120583m filters from the January 2009 site B timeseries (Figure 3(b)) A significant change in temperature wasmeasured between the first and second samples of the January

6 Archaea

0 10000 500000

J2010

AVC

J2007

At2

VCA2007

At208

J2007

At201

A2008

At208

J2009

Bt108

J2009

Bt208

J2009

Bt308

A2008

At13

J2009

At101

J2009

At201

J2009

At108

J2009

At208

J2010

A01

J2007

At101

J2007

At208

J2007

At108

J2010

A3

A2007

At101

A2007

At108

A2007

At13

A2007

At23

J2010

Bt308

J2010

Bt23

J2010

Bt43

J2007

At1

VCJ2009

BVC

J2010

Bt201

J2010

Bt35

01

J2010

Bt101

J2010

Bt301

J2010

Bt3

VCJ2010

Bt1

VCJ2010

Bt2

VCJ2010

Bt4

VC

(a)

0 2 20

A2007

At13

III

A2007

At201

Sang

A2007

At101

454

A2007

At108

454

A2007

At208

Sang

A2007

At23

III

J2007

At101

Sang

J2007

At201

Sang

J2010

A01

III

J2010

Bt201

III

J2010

Bt35

01

III

J2010

Bt101

III

J2010

Bt301

III

J2007

At208

Sang

A2008

At208

454

J2009

Bt108

454

III

J2009

Bt208

454

J2009

Bt308

454

J2009

At201

454

J2009

At101

454

J2009

At108

454

J2009

At208

454

J2007

At108

Sang

III

A2008

At13

III

J2010

A3

III

J2010

Bt308

III

J2010

Bt23

III

J2010

Bt43

III

(b)

Figure 2

2009 site B time series (Table 1) and we hypothesize thatthis shift in community structure may mark a response tothe temperature change With the exception of the January2010 site B time series the diversity and relative abundanceof Haloquadratum-like and Halorubrum-like organisms aregenerally anticorrelated suggesting that these archaea maycompete for a similar niche in the LT system The obser-vation of a relatively low diversity and abundance of Halo-quadratum-like organisms in some samples (ie A2007At1-t2 A2008At2 and J2009Bt1) suggests that Haloquadratumspecies are more dynamic in hypersaline systems than hasbeen previously appreciated [32]

In addition to trends within organism types specificOTUs also exhibited interesting dynamics For exampleOTUHaloquadratum walsbyi 2 which belongs to a species gener-ally considered to be among the most abundant organisms inhypersaline lakes [32] is at relatively low abundance or notdetected on 08 and 30 120583m filters from August 2007 site Aand January 2010 site B though it is at high abundance inthat size fraction at site A in January 2009 and January 2010Interestingly that OTU also appears dynamic on a days scalein August 2008 at site A appearing at high abundance on the30 120583m filter at time 1 but not detected on the 08 120583m filterat time 2 Three OTUs including two related to Salinibacter

weremore abundant in theAugust 2007 siteA time series (allfilter sizes) than in any other sampleTheNanohaloarchaeonCandidatus Nanosalina [21] was detected at reasonably highabundance on all 01120583m filters from January 2010 sites A andB but was less abundant at other sites and times (Figure 3(a))Consistent with the small cell size for that organism itand the other abundant Nanohaloarchaeal OTU CandidatusNanosalinarumwere found at low abundance or not detectedon filters larger than 01120583m

Overall analyses of the 29 most abundant archaealand bacterial OTUs indicate dynamics at the populationlevel across time and space particularly on months-to-yearstimescales with some dynamics indicated over hours todays These results indicate dynamics in the most abundantarchaeal and bacterial populations at the taxon level incontrast to a previous study which predicted that the mostabundant microbial taxa were stable over timescales ofweeks to one month in a hypersaline crystallizer pond nearSan Diego (SD) CA USA [6] That study was based ontaxonomic affiliations predicted from sim100 bp metagenomicreads It is possible that differences in sampling timescalesgeochemistry andor community composition could resultin true differences in microbial dynamics between thesetwo systems Also LT microbial eukaryotic communities are

Archaea 7

0 10 20 30 40 50 60 70 80 90 100

A2007At1

J2009At1

J2009At2

J2010A

J2010Bt1

J2010Bt2

J2010Bt35

J2010Bt3

01 120583

m fi

lter s

ampl

e

()

(a)

0 10 20 30 40 50 60 70 80 90 100

A2007At1

A2008At2

J2007At1

J2009At1

J2009At2

J2009Bt1

J2009Bt2

J2009Bt3

J2010Bt3

08 120583

m fi

lter s

ampl

e

()

(b)

0 10 20 30 40 50 60 70 80 90 100

A2007At2

A2008At1

J2010A

J2010Bt2

J2010Bt4

()

Halorubrum 1Halorubrum 2Halorubrum 3Halorubrum 4Halorubrum 5Halorubrum 6Haloquadratum 1Halorubrum 7Haloquadratum 2Haloquadratum 1walsbyiHaloquadratum 2walsbyiHaloquadratum 3Haloquadratum 4Haloquadratum 5Haloquadratum 3walsbyi

Haloquadratum 6Candidatus NanosalinaUnclassified HalobacteriaceaeHalorhabdus 1Halorhabdus 2Halorhabdus 3Halorhabdus 4HalorientalisHalogranumHalobellusCandidatus NanosalinarumSalinibacter 1Salinibacter 2Unclassified Sphingobacteriales

3 120583m

filte

r sam

ple

()

(c)

Figure 3

8 Archaea

dominated by a predatory Colpodella sp and it is unclearwhat effect top-down grazing may have on shifts in bacterialand archaeal community structure [20] However the SDstudy also suggested stability of viral populations in thatsystem and we previously demonstrated through a reanalysisof the SD data that the most abundant viral populationsfrom that study were in fact dynamic [8] so it is possiblethat different sequencing and analytical methods would haverevealed dynamics in archaeal and bacterial populations atthe SD site as well Unfortunately a reanalysis of the existingSD data using the methods in this study is not possibledue to the incompatibility of 454 sequencing reads withthe EMIRGE algorithm which is designed to reconstructnear-complete 16S rRNA genes from paired-end Illuminametagenomic data

Based on the relative abundances of the 101 archaeal andbacterial OTUs (including lower abundance LT organisms)we used hierarchical clustering to determine whether simi-larities in overall archaeal and bacterial community structurewould group LT samples according to season sample typeor filter size (Figure 2(b)) In general archaeal and bacterialcommunities sampled from the same location over daysclustered more closely than did viral assemblages from thesame samples (see above and Figure 2(a)) indicating greaterstability of host populations than viral populations over daysThe only samples for which no within-time series (ie daysscale) clustering was observed for host communities weresamples from August 2008 site A For that time seriesDNA from different filter sizes was sequenced from eachsample and different sequencing technologieswere used (454and Illumina) so it is possible that similarities between thesamples exist but were masked by different methodologiesHowever interestingly all samples and filter sizes from theAugust 2007 site A time series clustered together includingDNA from 01 08 and 30 120583m size fractions sequenced by allthree technologies (Sanger 454 and Illumina)This indicatesa distinct community at site A in August 2007 which wasstable over days and robust to methodological differences inlibrary construction and sequencing

In some cases the archaeal and bacterial communitieson the 01 120583m filters clustered together and separately fromother filters from the same time point reflecting enrichmentfor Nanohaloarchaea Specifically both 01120583m filters fromJanuary 2007 site A cluster together and they belong toa larger cluster that includes all 01120583m filters from January2010 site B The prevalence of Nanohaloarchaea in thesesamples but not in others is evident in Figure 3(a) which isbased on the 29 most abundant OTUs described above Inall other cases the 01120583m filters clustered with larger filtersfrom the same time series For the January 2009 site B single-day time series from which only DNA from 08120583m filters(and a viral concentrate) was sequenced the morning sampleclusters separately from the afternoon and evening sampleswhich cluster together indicating a shift in communitystructure over a single day This is consistent with the shift inrelative abundance of Halorubrum-like and Haloquadratum-like organisms from that time series (Figure 3) and may berelated to a temperature shift as suggested above

Interestingly a seasonal trend was not indicated in eitherthe analysis of the 29most abundant OTUs or in the 101 OTUhierarchical clustering analysis (seasonal trends in viral pop-ulations would be difficult to infer as no viral concentrateswere sequenced from winter samples) Samples from site Ain August 2007 have quite different archaeal and bacterialcommunity structures from samples at the same site inAugust 2008 and January samples from site A are also fairlydistinct year to year Although some samples from site Bcollected in January 2009 and January 2010 might appear tobe the exception (Figure 3(b)) there was as much of a shift incommunity structure over hours in January 2009 at site B asthere was across samples collected over years

As with any metagenomic study we cannot say forcertain to what extent abundance in metagenomic librariesreflects true abundances However we have taken care toavoid biases that are often associated with sequencing-basedcommunity analyses Specifically by focusing on 16S rRNAgene sequences from metagenomic data we have avoidedPCRamplification biases in our estimates of archaeal and bac-terial dynamics Similarly we were able to generate enoughviral concentrate metagenomic DNA for sequencing withoutmultiple-displacement amplification (MDA)which is knownto generate significant biases especially in viralmetagenomes(discussed in [33 34])

33 CRISPR Analyses Using the Crass algorithm [30] weidentified 549 unique repeats and 8095 unique spacersequences from clustered regularly interspaced short palin-dromic repeat (CRISPR) regions in the LT dataset Apartfrom a single sample from January 2009 site A the onlylibraries with spacers that matched any of the 140 completeand near-complete viral genomes were those from samplesfrom which viral concentrates were also sequenced (Table 2)This may indicate that different planktonic viral assemblagesexisted in LT at time points from which viral concentrateswere not sequenced which would be consistent with theviral population dynamics observed across the sequencedviral concentrates [7] Notably no spacers from LT Augustmetagenomesmatched any of the 140 viruses consistent witha possible seasonal shift in viral community structure (noviral concentrates were sequenced from August samples) ashas been observed in marine systems (eg [35]) However aconcomitant seasonal shift in host community structure wasnot supported so it is also possible that the August samplescould harbor different planktonic viral assemblages each year

The only spacer matches to any of the seven previouslydescribed complete LT viral and virus-like genomes [7] wereto LTV2 (76716 bp) and LTVLE3 (71341 bp) in libraries fromthree samples from the January 2010 site B time series(Table 2) Of the seven viral concentrate genomes those twoachieved the highest abundance by approximately one orderof magnitude [7] and they were at their most abundant in thetime series fromwhich spacers targeting themwere detectedThis indicates that LT CRISPRs can actively target abundantcoexisting viruses consistent with previous observations ofCRISPR sampling of coexisting viruses in an acid-minedrainage system [9 11] However it should be noted that thevast majority of spacers do not match any of the 140 viral

Archaea 9

Table 2 CRISPR analyses by sample

Sample Unique repeats Unique spacers Hits to 140 viruses Viral contig match(es) Predicted hostJ2007At1 67 681 1 scaffold 16J2007At2 82 633 1 Contig999004A2007At1 30 141 0A2007At2 64 554 0A2008At1 20 275 0A2008At2 23 121 0J2009At1 41 163 1 scaffold 117J2009At2 40 173 0J2009B 29 114 0J2010Bt1 22 501 0

J2010Bt2 79 1798 4

LTV2LTV2

LTVLE3Contig999004

J2010Bt3 58 1171 8

LTVLE3Contig1100059 Natronomonas-likeContig998975scaffold 55scaffold 29LTVLE3LTVLE3 NanohaloarchaeaLTVLE3 Nanohaloarchaea

J2010Bt35 60 930 4

LTV2LTVLE3 Nanohaloarchaea

Contig999004Contig998975

J2010Bt4 43 853 1 Contig999004J2010A 20 340 0

contigs gt 10 kb which we assume to be among the mostabundant viruses because they assembled significantly Thissuggests thatmost CRISPRs target rare viruses An alternativeexplanation might be a preponderance of inactive vestigeCRISPR regions but if that were the case then we wouldexpect to see the same CRISPR spacers duplicated acrosssamples (ie CRISPR regions would be clonally inheritedover multiple generations as opposed to actively integratingnew spacers on subgenerational timescales) Given that only353 spacer sequences were duplicated across samples relativeto 8095 unique spacer sequences in the dataset we infereither that most CRISPR regions were highly dynamic andorthat we detected different CRISPR regions over time due toshifts in host community structure Regardless there is a highdiversity of spacers in the LT system consistent with the highdiversity of viruses

Of the CRISPR regions that had matches to the 140LT viral contigs only five had repeats with matches toassemblies from 01 08 andor 30 120583m filter sequences (amatch would be indicative of the host organism) and nonematched any of the 12 complete and near-complete bacterialand archaeal genomes assembled from the LT January 2007site A time series [36] Three matches to the January 2010

site B assemblies (Andrade et al unpublished data) wereto contigs that could only be identified at the order levelof Halobacteriales based on best BLAST hits throughoutthe contigs and one repeat matched a contig predicted tobelong to a Natronomonas-like organism (Table 2) Themost interesting CRISPR repeat sequence match was toa contig from a likely Nanohaloarchaeon based on thetaxonomy of best BLAST hits throughout the contig A spacerassociated with that repeat matched LTVLE3 a virus orplasmid described in [7] strongly suggesting that LTVLE3 is avirus or plasmid of the Nanohaloarchaea Interestingly whileLTVLE3 was only at high abundance in viral concentratesfrom the January 2010 site B four-day time series fromwhich the only spacers targeting it were identified the mostabundant Nanohaloarchaeon Candidatus Nanosalina [21]was detected in most LT samples and is as abundant atsite A as it is at site B (isolated pools 300m apart) in 2010(Figure 3(a)) This suggests that the Nanohaloarchaea at siteB may have been adapted locally to the presence of abundantLTVLE3 at site B in January 2010

BLAST searches of all 8095 LT CRISPR spacers and 549repeat sequences revealed essentially no hits to the NCBI nrand environmental databases and no hits were detected for

10 Archaea

0

250

500

750

1000

1250

1500

1750

2000

J2007A timeseries

J2010B timeseries

All VC timeseries

TotalSingle sample or seriesTwo samples or series

Three samples or seriesFour samples or series

Spac

er h

its to

VCs

(a)

0

2

4

6

8

10

12

14

16

18

Hits to same sampleMin hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

J200

7At1

J200

7At2

J200

9B

J201

0Bt1

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

2

4

6

8

10

12

14

16

18

20

Min hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

A20

07At

1

A20

07At

2

A20

08At

1

A20

08At

2

J200

9At1

J200

9At2

J201

0Bt3

5

(c)

Figure 4

any of the repeat sequencesThree spacers had 100matchesHomo sapiens indicating that they are likely either to beerroneous spacers or representative of mobile genetic ele-ments that are not present in public databases Overall thesedata demonstrate that the LT CRISPR spacers target mobilegenetic elements that have not been previously identifiedsuggesting that LT archaea may be highly adapted to localviral predation though we cannot rule out the possibility of

public database bias (ie perhaps LT CRISPRs could targetviruses from other locations that are not represented inpublic databases) In support of local adaptation despite therelatively small number of spacer matches to the 140 LT viralcontigs gt10 kb 1729 spacers (21) had matches to unassem-bled LT viral concentrate reads This may indicate selectionfor CRISPR spacers that target locally adapted potentiallycoexisting viruses consistent with previously observed local

Archaea 11

CRISPR adaptation to coexisting viruses in an acid-minedrainage system and in geographically distinct sludge biore-actors [8 9] Because these matches are to unassembledreads we infer that most targeted LT viruses are at relativelylow abundance at least at the times and locations sampledby this study (with the exception of LTV2 and LTVLE3as described above) This may indicate successful CRISPR-mediated maintenance of viral populations at abundanceslow enough to preclude genomic assembly

In terms of the timescales on which CRISPR spacersare retained and match coexisting low-abundance virusesin the LT system we found several lines of evidence thatsupport the stability of many CRISPR spacers and low-abundance viruses over all timescales sampled by this study(up to three years) LT spacers tended to match reads frommultiple viral concentrate metagenomes collected over thecourse of days almost as often as (January 2007 site A) ormore often than (January 2010 site B) they matched readsfrom a single viral concentrate metagenome from the sametime series (Figure 4(a)) This suggests that many spacersand their targeted low-abundance viruses are stable overdays On the timescale of years spacers and their targetedcoexisting viruses were found most often across more thanone of the four sites and times sampled by viral concentrates(January 2007 site A January 2009 site B January 2010site A January 2010 site B) Similarly in all eight samplesfrom which viral concentrates were sequenced nearly asmany (and often more) spacers had hits to viral concentratereads from other samples as matched viral concentrate readsfrom the same sample (Figure 4(b)) A similar number ofmatches to the eight viral concentrates were observed forsamples from which viral concentrates were not sequenced(Figure 4(c)) Overall these results demonstrate that CRISPRregions generally retain spacer targets against low-abundanceviruses on timescales of 1ndash3 years We also infer that low-abundance viruses are likely more stable in the LT systemthan their highly dynamic higher abundance counterparts

4 Conclusions

Both virus (140 contigs gt10 kb) and host (101 16S rRNA geneOTUs) assemblage structures were highly dynamic over timeand space in the archaea-dominated LT system particularlyover months to years However archaeal and bacterial popu-lations were generally more stable than viral populations andlower abundance viruses were inferred to bemore stable thanabundant viruses Filter size-resolved analyses of viral popu-lations revealed different viral assemblages in the planktonicand host-associated (ie active and provirus) fractions sug-gesting that a higher diversity of viruses may have the poten-tial to infect than are actively infecting at any given time Con-sistent with that hypothesis CRISPR analyses revealed per-sistent targeting of lower abundance viral populations alongwith a high diversity of spacer sequences However interest-ingly the most abundant viruses (sim2 of the viral commu-nity relative to le01 formost viruses [7]) were also targetedby CRISPRs indicating that archaeal hosts strike a balancebetweenprotection against persistent low-abundance viruses

and those viruses potentially abundant enough to effect catas-trophic changes in host community structure

Acknowledgments

Funding for this work was provided by the National ScienceFoundation Award 0626526 and DE-FG02-07ER64505 fromthe Department of Energy Thanks to Cheetham Salt Works(Victoria Australia) for permission to collect samples JohnMoreau Jochen Brocks and Mike Dyall-Smith for assistancein the field Matt Lewis and the J Craig Venter Institute(JCVI) for library construction and sequencing ShannonWilliamson and Doug Fadrosh (JCVI) for training JoanneB Emerson in virus-related laboratory techniques ConnorSkennerton and Gene Tyson for early access to and assis-tance with the Crass program Christine Sun for helpfuldiscussions and two anonymous reviewers for constructivecomments that improved the paper

References

[1] F Rohwer D Prangishvili and D Lindell ldquoRoles of viruses inthe environmentrdquoEnvironmentalMicrobiology vol 11 no 11 pp2771ndash2774 2009

[2] C A Suttle ldquoMarine virusesmdashmajor players in the global eco-systemrdquoNature ReviewsMicrobiology vol 5 no 10 pp 801ndash8122007

[3] K Porter B E Russ and M L Dyall-Smith ldquoVirus-host inter-actions in salt lakesrdquo Current Opinion in Microbiology vol 10no 4 pp 418ndash424 2007

[4] R-A Sandaa and A Larsen ldquoSeasonal variations in virus-host populations in Norwegian coastal waters focusing on thecyanophage community infecting marine Synechococcus spprdquoApplied andEnvironmentalMicrobiology vol 72 no 7 pp 4610ndash4618 2006

[5] S M Short and C A Suttle ldquoTemporal dynamics of naturalcommunities of marine algal viruses and eukaryotesrdquo AquaticMicrobial Ecology vol 32 no 2 pp 107ndash119 2003

[6] B Rodriguez-Brito L Li L Wegley et al ldquoViral and microbialcommunity dynamics in four aquatic environmentsrdquo ISMEJournal vol 4 no 6 pp 739ndash751 2010

[7] J B Emerson B C Thomas K Andrade E E Allen K BHeidelberg and J F Banfield ldquoDynamic viral populations inhypersaline systems as revealed by metagenomic assemblyrdquoApplied and Environmental Microbiology vol 78 pp 6309ndash6320 2012

[8] V Kunin S He F Warnecke et al ldquoA bacterial metapopulationadapts locally to phage predation despite global dispersalrdquoGenome Research vol 18 no 2 pp 293ndash297 2008

[9] A F Andersson and J F Banfield ldquoVirus population dynamicsand acquired virus resistance in natural microbial communi-tiesrdquo Science vol 320 no 5879 pp 1047ndash1050 2008

[10] R E Anderson W J Brazelton and J A Baross ldquoUsing CRIS-PRs as ametagenomic tool to identify microbial hosts of adiffuse flow hydrothermal vent viral assemblagerdquo FEMS Micro-biology Ecology vol 77 no 1 pp 120ndash133 2011

[11] GW Tyson and J F Banfield ldquoRapidly evolving CRISPRs imp-licated in acquired resistance of microorganisms to virusesrdquoEnvironmental Microbiology vol 10 no 1 pp 200ndash207 2008

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

4 Archaea

from Illumina metagenomic data All 01 and 08 120583m filtersfrom which DNA was Illumina sequenced and from whichviral concentrate DNA was also sequenced from the samesample underwent EMIRGE analysis in order to generate areference 16S rRNA gene database for the LT system Thefollowing filter sample metagenomes were EMIRGE ana-lyzed 2007At1 (08120583m) 2010Bt3 (08 120583m) 2010Bt1 (01 120583m)2010Bt2 (01120583m) 2010Bt3 (01 120583m) and 2010A (01 120583m) Weclustered all EMIRGE-generated 16S rRNA gene sequencesat 97 nt identity using UCLUST [26] resulting in adatabase of 101 16S rRNA gene sequences Taxonomy wasassigned to theseOTUs using the SILVA Incremental Aligner(SINA) [27] on the SILVA website [28 29] Using bwa withdefault parameters [24] wemappedmetagenomic reads (splitinto 100 bp lengths to limit biases associated with differentsequencing technologies as described above and in [7]) fromall filter samples to the reference database of 101 OTUsin order to generate relative abundance estimates for eachOTU across samples For a given OTU to be detected in agiven library we required ge1x coverage across ge70 of thelength of the EMIRGE-generated 16S rRNA gene sequenceIn order to account for differences in sequencing throughputwe used relative abundances that is we estimated the percentabundance of each OTU in each sample as the number ofreads that mapped to that OTU divided by the total numberof reads that mapped to any OTU in that sample times100 Those relative abundances were used for hierarchicalclustering (Pearson correlation average linkage clustering)and appear in Table S1 (see Supplementary Material availableonline at httpdxdoiorg1011552013370871)

These sequences (140 viral contigs gt10 kb and 101 16SrRNA gene sequences) have been submitted to NCBI underBioProject accession no PRJNA81851

24 Correlation with CRISPR Spacers We used Crass [30]to identify clustered regularly interspaced short palindromicrepeat (CRISPR) repeat and spacer sequences in each sampleFor this analysis we considered all sequenced filter sizesfor a given water sample together (ie 01 08 and 30 120583mfilters) Using BLASTn with an 119864-value cutoff of 1119890 minus 10 weassessed the number of CRISPR spacers from each samplethat matched (1) any of the 140 viral contig sequences gt10 kb(2) reads from viral concentrates collected from the samesample (applicable only to the eight samples from whichviral concentrates were sequenced) and (3) reads from viralconcentrates collected from other samples

3 Results and Discussion

31 Relative Abundances of Viral Populations across Size Frac-tions Contigs gt10 kb from 105 new viruses were reconstru-cted increasing the number of genomically characterizedviruses from Lake Tyrrell (LT) Victoria Australia from 35to 140 The 140 contigs including seven previously reportedcomplete genomes range in size from 10050 to 93283 bpWeanalyzed the relative abundances of these 140 viral genotypesin metagenomic libraries from viral concentrates and filtersize fractions (01ndash08 08ndash30 and 30ndash20 120583m) across timeand between locations in LT As few as 42 and as many as 116

viruses were detected in a given sample (any size fraction)and more viruses were detected in samples from which viralconcentrates were sequencedThough we acknowledge that acomparison of 16S rRNA genemicrobial OTUs to gt10 kb viralcontig OTUs is an imperfect proxy for comparing the diver-sity of these groups we find approximately 10 viral popula-tions in the viral concentrate size fraction per host OTU (anyfilter size fraction) in most samples suggesting that plank-tonic virus diversity and host diversity scale approximatelywith abundance which has previously been established to beapproximately 10 1 in most environments (eg [31])

In an attempt to distinguish among free (planktonic)viruses physically trapped on filters and host-associatedviruses (ie active viruses andor proviruses) we assumedthat the 08 and 30 120583m filter pore sizes would be too largeto retain significant numbers of viral particles and shouldpredominantly include host-associated viruses We assumedthat the 01120583m filters could retain both host-associated andplanktonic viruses and that the viral concentrates (30 kDandash01 120583m size fraction) should generally exclude host cells andbe dominated by planktonic viruses Therefore we consid-ered viral detection in libraries from three filter size groupsseparately (1) viral concentrates (2) 01 120583m filters and (3)08 andor 30 120583m filters For five samples from which at leastone library from each of those groups was sequenced thenumber of viruses detected only in the viral concentrateswas always higher than the number detected only on filters(Figure 1(a))That trend is robust to the addition of twomoresamples from which libraries from viral concentrates andat least one 08 andor 30 120583m filter sample were sequenced(Figure 1(b)) Although we detected more unique viruses inthe viral concentrates the ratio of total viruses detected on 08or 30 120583m filters relative to total viruses detected in the viralconcentrates tended to be approximately equal (Figure 1(c))meaning that the richness of viruses in viral concentratestended to be similar to viral richness in the host-associatedsize fractions

We used hierarchical clustering (Figure 2(a)) to deter-mine whether patterns in the relative abundances of the 140viruses (ie the 140 viral contigs gt10 kb) could be observedaccording to season filter size andor sample site No uni-versal patterns were observed but there was some clusteringaccording to filter type andor for samples collected fromthe same site during the same week Specifically all fourviral concentrate samples from January 2010 site B clusteredtogether and they clustered more closely with all of the01 120583m filters from the same time series than with other viralconcentrate samples Two additional viral concentrate sam-ples collected during the same season but at different sitesand years (J2007At1 and J2009B) complete a larger clusterfor that group indicating similarity among planktonic viralfractions relative to viruses on larger filter sizes Interestinglythe remaining two viral concentrate samples (the only samplecollected in January 2010 from site A and another samplefrom site A collected in January 2007) clustered separatelyfrom each other and from the rest of the dataset Overall thissuggests that viral concentrates represent a different part ofthe viral community than is sampled by other size fractions

Archaea 5

0

10

20

30

40

50

VC onlyVC (plus 01)

08 andor 3 only01 only

Viru

ses d

etec

ted

J200

7At1

J200

7At2

J201

0Bt2

J201

0Bt3

J201

0A(a)

0

10

20

30

40

50

VC onlyVC (plus 01)

08 andor 3 only

Viru

ses d

etec

ted

J200

7At1

J200

7At2

J200

9B

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

05

1

15

2

J200

7At1

J200

7At2

J200

9B

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

Det

ecte

d vi

rus r

atio

(fil

ter

VC

)

(c)

Figure 1

particularly fractions gt08120583m Interestingly this is despitethe similarity in richness described above (ie the numberof viral OTUs remains relatively similar across samples butthe composition differs in viral concentrates relative to host-associated size fractions)

Although the most obvious trend in the viral assemblagehierarchical clustering analysis was the separation of viralconcentrates from other filter sizes described above someclustering was observed for 01 08 and 30 120583m filter sizefractions Specifically all four filter samples collected fromJanuary 2009 site A cluster together as do all of the 08 and30 120583m filters from January 2010 site B suggesting stability ofthe active viral andor proviral assemblages over four daysin both cases Although the 01 and 08 120583m filter samplescollected in August 2007 from site A time 1 cluster togetherthey are separate from filters of the same size collected twodays later suggesting a shift in the active viral assemblageover days or possibly the induction of proviruses on thattimescale Together these data suggest that although someturnover in active viruses andor proviruses was observedmost active viruses andor proviruses were stable within thearchaea-dominated LT system over days

32 Archaeal and Bacterial OTUs We also characterized therelative abundances of potential host OTUs across time andbetween sites in the LT system Of 101 total archaeal andbacterial 16S rRNA gene OTUs at 97 nt identity 29 weredetected at 5 abundance or higher on any filter in anysample For easier visualization Figure 3 shows only those29 OTUs and it includes only samples from which at leastfive of those OTUs were detected

In the filter size-resolved plots (Figures 3(a)ndash3(c)) it isclear that even the most abundant archaeal groups changesignificantly over time and space in terms of both pres-enceabsence and relative abundance For example the rela-tive proportions ofHalorubrum-like andHaloquadratum-likeOTUs tend to differ significantly across samples from almostexclusively Halorubrum-like organisms (eg in the August2007 time series) to almost exclusively Haloquadratum-likeorganisms (eg in the January 2009 site A time seriesand in January 2010 site A) Some changes in the relativeabundances of these groups can even be observed overhours on 08120583m filters from the January 2009 site B timeseries (Figure 3(b)) A significant change in temperature wasmeasured between the first and second samples of the January

6 Archaea

0 10000 500000

J2010

AVC

J2007

At2

VCA2007

At208

J2007

At201

A2008

At208

J2009

Bt108

J2009

Bt208

J2009

Bt308

A2008

At13

J2009

At101

J2009

At201

J2009

At108

J2009

At208

J2010

A01

J2007

At101

J2007

At208

J2007

At108

J2010

A3

A2007

At101

A2007

At108

A2007

At13

A2007

At23

J2010

Bt308

J2010

Bt23

J2010

Bt43

J2007

At1

VCJ2009

BVC

J2010

Bt201

J2010

Bt35

01

J2010

Bt101

J2010

Bt301

J2010

Bt3

VCJ2010

Bt1

VCJ2010

Bt2

VCJ2010

Bt4

VC

(a)

0 2 20

A2007

At13

III

A2007

At201

Sang

A2007

At101

454

A2007

At108

454

A2007

At208

Sang

A2007

At23

III

J2007

At101

Sang

J2007

At201

Sang

J2010

A01

III

J2010

Bt201

III

J2010

Bt35

01

III

J2010

Bt101

III

J2010

Bt301

III

J2007

At208

Sang

A2008

At208

454

J2009

Bt108

454

III

J2009

Bt208

454

J2009

Bt308

454

J2009

At201

454

J2009

At101

454

J2009

At108

454

J2009

At208

454

J2007

At108

Sang

III

A2008

At13

III

J2010

A3

III

J2010

Bt308

III

J2010

Bt23

III

J2010

Bt43

III

(b)

Figure 2

2009 site B time series (Table 1) and we hypothesize thatthis shift in community structure may mark a response tothe temperature change With the exception of the January2010 site B time series the diversity and relative abundanceof Haloquadratum-like and Halorubrum-like organisms aregenerally anticorrelated suggesting that these archaea maycompete for a similar niche in the LT system The obser-vation of a relatively low diversity and abundance of Halo-quadratum-like organisms in some samples (ie A2007At1-t2 A2008At2 and J2009Bt1) suggests that Haloquadratumspecies are more dynamic in hypersaline systems than hasbeen previously appreciated [32]

In addition to trends within organism types specificOTUs also exhibited interesting dynamics For exampleOTUHaloquadratum walsbyi 2 which belongs to a species gener-ally considered to be among the most abundant organisms inhypersaline lakes [32] is at relatively low abundance or notdetected on 08 and 30 120583m filters from August 2007 site Aand January 2010 site B though it is at high abundance inthat size fraction at site A in January 2009 and January 2010Interestingly that OTU also appears dynamic on a days scalein August 2008 at site A appearing at high abundance on the30 120583m filter at time 1 but not detected on the 08 120583m filterat time 2 Three OTUs including two related to Salinibacter

weremore abundant in theAugust 2007 siteA time series (allfilter sizes) than in any other sampleTheNanohaloarchaeonCandidatus Nanosalina [21] was detected at reasonably highabundance on all 01120583m filters from January 2010 sites A andB but was less abundant at other sites and times (Figure 3(a))Consistent with the small cell size for that organism itand the other abundant Nanohaloarchaeal OTU CandidatusNanosalinarumwere found at low abundance or not detectedon filters larger than 01120583m

Overall analyses of the 29 most abundant archaealand bacterial OTUs indicate dynamics at the populationlevel across time and space particularly on months-to-yearstimescales with some dynamics indicated over hours todays These results indicate dynamics in the most abundantarchaeal and bacterial populations at the taxon level incontrast to a previous study which predicted that the mostabundant microbial taxa were stable over timescales ofweeks to one month in a hypersaline crystallizer pond nearSan Diego (SD) CA USA [6] That study was based ontaxonomic affiliations predicted from sim100 bp metagenomicreads It is possible that differences in sampling timescalesgeochemistry andor community composition could resultin true differences in microbial dynamics between thesetwo systems Also LT microbial eukaryotic communities are

Archaea 7

0 10 20 30 40 50 60 70 80 90 100

A2007At1

J2009At1

J2009At2

J2010A

J2010Bt1

J2010Bt2

J2010Bt35

J2010Bt3

01 120583

m fi

lter s

ampl

e

()

(a)

0 10 20 30 40 50 60 70 80 90 100

A2007At1

A2008At2

J2007At1

J2009At1

J2009At2

J2009Bt1

J2009Bt2

J2009Bt3

J2010Bt3

08 120583

m fi

lter s

ampl

e

()

(b)

0 10 20 30 40 50 60 70 80 90 100

A2007At2

A2008At1

J2010A

J2010Bt2

J2010Bt4

()

Halorubrum 1Halorubrum 2Halorubrum 3Halorubrum 4Halorubrum 5Halorubrum 6Haloquadratum 1Halorubrum 7Haloquadratum 2Haloquadratum 1walsbyiHaloquadratum 2walsbyiHaloquadratum 3Haloquadratum 4Haloquadratum 5Haloquadratum 3walsbyi

Haloquadratum 6Candidatus NanosalinaUnclassified HalobacteriaceaeHalorhabdus 1Halorhabdus 2Halorhabdus 3Halorhabdus 4HalorientalisHalogranumHalobellusCandidatus NanosalinarumSalinibacter 1Salinibacter 2Unclassified Sphingobacteriales

3 120583m

filte

r sam

ple

()

(c)

Figure 3

8 Archaea

dominated by a predatory Colpodella sp and it is unclearwhat effect top-down grazing may have on shifts in bacterialand archaeal community structure [20] However the SDstudy also suggested stability of viral populations in thatsystem and we previously demonstrated through a reanalysisof the SD data that the most abundant viral populationsfrom that study were in fact dynamic [8] so it is possiblethat different sequencing and analytical methods would haverevealed dynamics in archaeal and bacterial populations atthe SD site as well Unfortunately a reanalysis of the existingSD data using the methods in this study is not possibledue to the incompatibility of 454 sequencing reads withthe EMIRGE algorithm which is designed to reconstructnear-complete 16S rRNA genes from paired-end Illuminametagenomic data

Based on the relative abundances of the 101 archaeal andbacterial OTUs (including lower abundance LT organisms)we used hierarchical clustering to determine whether simi-larities in overall archaeal and bacterial community structurewould group LT samples according to season sample typeor filter size (Figure 2(b)) In general archaeal and bacterialcommunities sampled from the same location over daysclustered more closely than did viral assemblages from thesame samples (see above and Figure 2(a)) indicating greaterstability of host populations than viral populations over daysThe only samples for which no within-time series (ie daysscale) clustering was observed for host communities weresamples from August 2008 site A For that time seriesDNA from different filter sizes was sequenced from eachsample and different sequencing technologieswere used (454and Illumina) so it is possible that similarities between thesamples exist but were masked by different methodologiesHowever interestingly all samples and filter sizes from theAugust 2007 site A time series clustered together includingDNA from 01 08 and 30 120583m size fractions sequenced by allthree technologies (Sanger 454 and Illumina)This indicatesa distinct community at site A in August 2007 which wasstable over days and robust to methodological differences inlibrary construction and sequencing

In some cases the archaeal and bacterial communitieson the 01 120583m filters clustered together and separately fromother filters from the same time point reflecting enrichmentfor Nanohaloarchaea Specifically both 01120583m filters fromJanuary 2007 site A cluster together and they belong toa larger cluster that includes all 01120583m filters from January2010 site B The prevalence of Nanohaloarchaea in thesesamples but not in others is evident in Figure 3(a) which isbased on the 29 most abundant OTUs described above Inall other cases the 01120583m filters clustered with larger filtersfrom the same time series For the January 2009 site B single-day time series from which only DNA from 08120583m filters(and a viral concentrate) was sequenced the morning sampleclusters separately from the afternoon and evening sampleswhich cluster together indicating a shift in communitystructure over a single day This is consistent with the shift inrelative abundance of Halorubrum-like and Haloquadratum-like organisms from that time series (Figure 3) and may berelated to a temperature shift as suggested above

Interestingly a seasonal trend was not indicated in eitherthe analysis of the 29most abundant OTUs or in the 101 OTUhierarchical clustering analysis (seasonal trends in viral pop-ulations would be difficult to infer as no viral concentrateswere sequenced from winter samples) Samples from site Ain August 2007 have quite different archaeal and bacterialcommunity structures from samples at the same site inAugust 2008 and January samples from site A are also fairlydistinct year to year Although some samples from site Bcollected in January 2009 and January 2010 might appear tobe the exception (Figure 3(b)) there was as much of a shift incommunity structure over hours in January 2009 at site B asthere was across samples collected over years

As with any metagenomic study we cannot say forcertain to what extent abundance in metagenomic librariesreflects true abundances However we have taken care toavoid biases that are often associated with sequencing-basedcommunity analyses Specifically by focusing on 16S rRNAgene sequences from metagenomic data we have avoidedPCRamplification biases in our estimates of archaeal and bac-terial dynamics Similarly we were able to generate enoughviral concentrate metagenomic DNA for sequencing withoutmultiple-displacement amplification (MDA)which is knownto generate significant biases especially in viralmetagenomes(discussed in [33 34])

33 CRISPR Analyses Using the Crass algorithm [30] weidentified 549 unique repeats and 8095 unique spacersequences from clustered regularly interspaced short palin-dromic repeat (CRISPR) regions in the LT dataset Apartfrom a single sample from January 2009 site A the onlylibraries with spacers that matched any of the 140 completeand near-complete viral genomes were those from samplesfrom which viral concentrates were also sequenced (Table 2)This may indicate that different planktonic viral assemblagesexisted in LT at time points from which viral concentrateswere not sequenced which would be consistent with theviral population dynamics observed across the sequencedviral concentrates [7] Notably no spacers from LT Augustmetagenomesmatched any of the 140 viruses consistent witha possible seasonal shift in viral community structure (noviral concentrates were sequenced from August samples) ashas been observed in marine systems (eg [35]) However aconcomitant seasonal shift in host community structure wasnot supported so it is also possible that the August samplescould harbor different planktonic viral assemblages each year

The only spacer matches to any of the seven previouslydescribed complete LT viral and virus-like genomes [7] wereto LTV2 (76716 bp) and LTVLE3 (71341 bp) in libraries fromthree samples from the January 2010 site B time series(Table 2) Of the seven viral concentrate genomes those twoachieved the highest abundance by approximately one orderof magnitude [7] and they were at their most abundant in thetime series fromwhich spacers targeting themwere detectedThis indicates that LT CRISPRs can actively target abundantcoexisting viruses consistent with previous observations ofCRISPR sampling of coexisting viruses in an acid-minedrainage system [9 11] However it should be noted that thevast majority of spacers do not match any of the 140 viral

Archaea 9

Table 2 CRISPR analyses by sample

Sample Unique repeats Unique spacers Hits to 140 viruses Viral contig match(es) Predicted hostJ2007At1 67 681 1 scaffold 16J2007At2 82 633 1 Contig999004A2007At1 30 141 0A2007At2 64 554 0A2008At1 20 275 0A2008At2 23 121 0J2009At1 41 163 1 scaffold 117J2009At2 40 173 0J2009B 29 114 0J2010Bt1 22 501 0

J2010Bt2 79 1798 4

LTV2LTV2

LTVLE3Contig999004

J2010Bt3 58 1171 8

LTVLE3Contig1100059 Natronomonas-likeContig998975scaffold 55scaffold 29LTVLE3LTVLE3 NanohaloarchaeaLTVLE3 Nanohaloarchaea

J2010Bt35 60 930 4

LTV2LTVLE3 Nanohaloarchaea

Contig999004Contig998975

J2010Bt4 43 853 1 Contig999004J2010A 20 340 0

contigs gt 10 kb which we assume to be among the mostabundant viruses because they assembled significantly Thissuggests thatmost CRISPRs target rare viruses An alternativeexplanation might be a preponderance of inactive vestigeCRISPR regions but if that were the case then we wouldexpect to see the same CRISPR spacers duplicated acrosssamples (ie CRISPR regions would be clonally inheritedover multiple generations as opposed to actively integratingnew spacers on subgenerational timescales) Given that only353 spacer sequences were duplicated across samples relativeto 8095 unique spacer sequences in the dataset we infereither that most CRISPR regions were highly dynamic andorthat we detected different CRISPR regions over time due toshifts in host community structure Regardless there is a highdiversity of spacers in the LT system consistent with the highdiversity of viruses

Of the CRISPR regions that had matches to the 140LT viral contigs only five had repeats with matches toassemblies from 01 08 andor 30 120583m filter sequences (amatch would be indicative of the host organism) and nonematched any of the 12 complete and near-complete bacterialand archaeal genomes assembled from the LT January 2007site A time series [36] Three matches to the January 2010

site B assemblies (Andrade et al unpublished data) wereto contigs that could only be identified at the order levelof Halobacteriales based on best BLAST hits throughoutthe contigs and one repeat matched a contig predicted tobelong to a Natronomonas-like organism (Table 2) Themost interesting CRISPR repeat sequence match was toa contig from a likely Nanohaloarchaeon based on thetaxonomy of best BLAST hits throughout the contig A spacerassociated with that repeat matched LTVLE3 a virus orplasmid described in [7] strongly suggesting that LTVLE3 is avirus or plasmid of the Nanohaloarchaea Interestingly whileLTVLE3 was only at high abundance in viral concentratesfrom the January 2010 site B four-day time series fromwhich the only spacers targeting it were identified the mostabundant Nanohaloarchaeon Candidatus Nanosalina [21]was detected in most LT samples and is as abundant atsite A as it is at site B (isolated pools 300m apart) in 2010(Figure 3(a)) This suggests that the Nanohaloarchaea at siteB may have been adapted locally to the presence of abundantLTVLE3 at site B in January 2010

BLAST searches of all 8095 LT CRISPR spacers and 549repeat sequences revealed essentially no hits to the NCBI nrand environmental databases and no hits were detected for

10 Archaea

0

250

500

750

1000

1250

1500

1750

2000

J2007A timeseries

J2010B timeseries

All VC timeseries

TotalSingle sample or seriesTwo samples or series

Three samples or seriesFour samples or series

Spac

er h

its to

VCs

(a)

0

2

4

6

8

10

12

14

16

18

Hits to same sampleMin hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

J200

7At1

J200

7At2

J200

9B

J201

0Bt1

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

2

4

6

8

10

12

14

16

18

20

Min hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

A20

07At

1

A20

07At

2

A20

08At

1

A20

08At

2

J200

9At1

J200

9At2

J201

0Bt3

5

(c)

Figure 4

any of the repeat sequencesThree spacers had 100matchesHomo sapiens indicating that they are likely either to beerroneous spacers or representative of mobile genetic ele-ments that are not present in public databases Overall thesedata demonstrate that the LT CRISPR spacers target mobilegenetic elements that have not been previously identifiedsuggesting that LT archaea may be highly adapted to localviral predation though we cannot rule out the possibility of

public database bias (ie perhaps LT CRISPRs could targetviruses from other locations that are not represented inpublic databases) In support of local adaptation despite therelatively small number of spacer matches to the 140 LT viralcontigs gt10 kb 1729 spacers (21) had matches to unassem-bled LT viral concentrate reads This may indicate selectionfor CRISPR spacers that target locally adapted potentiallycoexisting viruses consistent with previously observed local

Archaea 11

CRISPR adaptation to coexisting viruses in an acid-minedrainage system and in geographically distinct sludge biore-actors [8 9] Because these matches are to unassembledreads we infer that most targeted LT viruses are at relativelylow abundance at least at the times and locations sampledby this study (with the exception of LTV2 and LTVLE3as described above) This may indicate successful CRISPR-mediated maintenance of viral populations at abundanceslow enough to preclude genomic assembly

In terms of the timescales on which CRISPR spacersare retained and match coexisting low-abundance virusesin the LT system we found several lines of evidence thatsupport the stability of many CRISPR spacers and low-abundance viruses over all timescales sampled by this study(up to three years) LT spacers tended to match reads frommultiple viral concentrate metagenomes collected over thecourse of days almost as often as (January 2007 site A) ormore often than (January 2010 site B) they matched readsfrom a single viral concentrate metagenome from the sametime series (Figure 4(a)) This suggests that many spacersand their targeted low-abundance viruses are stable overdays On the timescale of years spacers and their targetedcoexisting viruses were found most often across more thanone of the four sites and times sampled by viral concentrates(January 2007 site A January 2009 site B January 2010site A January 2010 site B) Similarly in all eight samplesfrom which viral concentrates were sequenced nearly asmany (and often more) spacers had hits to viral concentratereads from other samples as matched viral concentrate readsfrom the same sample (Figure 4(b)) A similar number ofmatches to the eight viral concentrates were observed forsamples from which viral concentrates were not sequenced(Figure 4(c)) Overall these results demonstrate that CRISPRregions generally retain spacer targets against low-abundanceviruses on timescales of 1ndash3 years We also infer that low-abundance viruses are likely more stable in the LT systemthan their highly dynamic higher abundance counterparts

4 Conclusions

Both virus (140 contigs gt10 kb) and host (101 16S rRNA geneOTUs) assemblage structures were highly dynamic over timeand space in the archaea-dominated LT system particularlyover months to years However archaeal and bacterial popu-lations were generally more stable than viral populations andlower abundance viruses were inferred to bemore stable thanabundant viruses Filter size-resolved analyses of viral popu-lations revealed different viral assemblages in the planktonicand host-associated (ie active and provirus) fractions sug-gesting that a higher diversity of viruses may have the poten-tial to infect than are actively infecting at any given time Con-sistent with that hypothesis CRISPR analyses revealed per-sistent targeting of lower abundance viral populations alongwith a high diversity of spacer sequences However interest-ingly the most abundant viruses (sim2 of the viral commu-nity relative to le01 formost viruses [7]) were also targetedby CRISPRs indicating that archaeal hosts strike a balancebetweenprotection against persistent low-abundance viruses

and those viruses potentially abundant enough to effect catas-trophic changes in host community structure

Acknowledgments

Funding for this work was provided by the National ScienceFoundation Award 0626526 and DE-FG02-07ER64505 fromthe Department of Energy Thanks to Cheetham Salt Works(Victoria Australia) for permission to collect samples JohnMoreau Jochen Brocks and Mike Dyall-Smith for assistancein the field Matt Lewis and the J Craig Venter Institute(JCVI) for library construction and sequencing ShannonWilliamson and Doug Fadrosh (JCVI) for training JoanneB Emerson in virus-related laboratory techniques ConnorSkennerton and Gene Tyson for early access to and assis-tance with the Crass program Christine Sun for helpfuldiscussions and two anonymous reviewers for constructivecomments that improved the paper

References

[1] F Rohwer D Prangishvili and D Lindell ldquoRoles of viruses inthe environmentrdquoEnvironmentalMicrobiology vol 11 no 11 pp2771ndash2774 2009

[2] C A Suttle ldquoMarine virusesmdashmajor players in the global eco-systemrdquoNature ReviewsMicrobiology vol 5 no 10 pp 801ndash8122007

[3] K Porter B E Russ and M L Dyall-Smith ldquoVirus-host inter-actions in salt lakesrdquo Current Opinion in Microbiology vol 10no 4 pp 418ndash424 2007

[4] R-A Sandaa and A Larsen ldquoSeasonal variations in virus-host populations in Norwegian coastal waters focusing on thecyanophage community infecting marine Synechococcus spprdquoApplied andEnvironmentalMicrobiology vol 72 no 7 pp 4610ndash4618 2006

[5] S M Short and C A Suttle ldquoTemporal dynamics of naturalcommunities of marine algal viruses and eukaryotesrdquo AquaticMicrobial Ecology vol 32 no 2 pp 107ndash119 2003

[6] B Rodriguez-Brito L Li L Wegley et al ldquoViral and microbialcommunity dynamics in four aquatic environmentsrdquo ISMEJournal vol 4 no 6 pp 739ndash751 2010

[7] J B Emerson B C Thomas K Andrade E E Allen K BHeidelberg and J F Banfield ldquoDynamic viral populations inhypersaline systems as revealed by metagenomic assemblyrdquoApplied and Environmental Microbiology vol 78 pp 6309ndash6320 2012

[8] V Kunin S He F Warnecke et al ldquoA bacterial metapopulationadapts locally to phage predation despite global dispersalrdquoGenome Research vol 18 no 2 pp 293ndash297 2008

[9] A F Andersson and J F Banfield ldquoVirus population dynamicsand acquired virus resistance in natural microbial communi-tiesrdquo Science vol 320 no 5879 pp 1047ndash1050 2008

[10] R E Anderson W J Brazelton and J A Baross ldquoUsing CRIS-PRs as ametagenomic tool to identify microbial hosts of adiffuse flow hydrothermal vent viral assemblagerdquo FEMS Micro-biology Ecology vol 77 no 1 pp 120ndash133 2011

[11] GW Tyson and J F Banfield ldquoRapidly evolving CRISPRs imp-licated in acquired resistance of microorganisms to virusesrdquoEnvironmental Microbiology vol 10 no 1 pp 200ndash207 2008

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Archaea 5

0

10

20

30

40

50

VC onlyVC (plus 01)

08 andor 3 only01 only

Viru

ses d

etec

ted

J200

7At1

J200

7At2

J201

0Bt2

J201

0Bt3

J201

0A(a)

0

10

20

30

40

50

VC onlyVC (plus 01)

08 andor 3 only

Viru

ses d

etec

ted

J200

7At1

J200

7At2

J200

9B

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

05

1

15

2

J200

7At1

J200

7At2

J200

9B

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

Det

ecte

d vi

rus r

atio

(fil

ter

VC

)

(c)

Figure 1

particularly fractions gt08120583m Interestingly this is despitethe similarity in richness described above (ie the numberof viral OTUs remains relatively similar across samples butthe composition differs in viral concentrates relative to host-associated size fractions)

Although the most obvious trend in the viral assemblagehierarchical clustering analysis was the separation of viralconcentrates from other filter sizes described above someclustering was observed for 01 08 and 30 120583m filter sizefractions Specifically all four filter samples collected fromJanuary 2009 site A cluster together as do all of the 08 and30 120583m filters from January 2010 site B suggesting stability ofthe active viral andor proviral assemblages over four daysin both cases Although the 01 and 08 120583m filter samplescollected in August 2007 from site A time 1 cluster togetherthey are separate from filters of the same size collected twodays later suggesting a shift in the active viral assemblageover days or possibly the induction of proviruses on thattimescale Together these data suggest that although someturnover in active viruses andor proviruses was observedmost active viruses andor proviruses were stable within thearchaea-dominated LT system over days

32 Archaeal and Bacterial OTUs We also characterized therelative abundances of potential host OTUs across time andbetween sites in the LT system Of 101 total archaeal andbacterial 16S rRNA gene OTUs at 97 nt identity 29 weredetected at 5 abundance or higher on any filter in anysample For easier visualization Figure 3 shows only those29 OTUs and it includes only samples from which at leastfive of those OTUs were detected

In the filter size-resolved plots (Figures 3(a)ndash3(c)) it isclear that even the most abundant archaeal groups changesignificantly over time and space in terms of both pres-enceabsence and relative abundance For example the rela-tive proportions ofHalorubrum-like andHaloquadratum-likeOTUs tend to differ significantly across samples from almostexclusively Halorubrum-like organisms (eg in the August2007 time series) to almost exclusively Haloquadratum-likeorganisms (eg in the January 2009 site A time seriesand in January 2010 site A) Some changes in the relativeabundances of these groups can even be observed overhours on 08120583m filters from the January 2009 site B timeseries (Figure 3(b)) A significant change in temperature wasmeasured between the first and second samples of the January

6 Archaea

0 10000 500000

J2010

AVC

J2007

At2

VCA2007

At208

J2007

At201

A2008

At208

J2009

Bt108

J2009

Bt208

J2009

Bt308

A2008

At13

J2009

At101

J2009

At201

J2009

At108

J2009

At208

J2010

A01

J2007

At101

J2007

At208

J2007

At108

J2010

A3

A2007

At101

A2007

At108

A2007

At13

A2007

At23

J2010

Bt308

J2010

Bt23

J2010

Bt43

J2007

At1

VCJ2009

BVC

J2010

Bt201

J2010

Bt35

01

J2010

Bt101

J2010

Bt301

J2010

Bt3

VCJ2010

Bt1

VCJ2010

Bt2

VCJ2010

Bt4

VC

(a)

0 2 20

A2007

At13

III

A2007

At201

Sang

A2007

At101

454

A2007

At108

454

A2007

At208

Sang

A2007

At23

III

J2007

At101

Sang

J2007

At201

Sang

J2010

A01

III

J2010

Bt201

III

J2010

Bt35

01

III

J2010

Bt101

III

J2010

Bt301

III

J2007

At208

Sang

A2008

At208

454

J2009

Bt108

454

III

J2009

Bt208

454

J2009

Bt308

454

J2009

At201

454

J2009

At101

454

J2009

At108

454

J2009

At208

454

J2007

At108

Sang

III

A2008

At13

III

J2010

A3

III

J2010

Bt308

III

J2010

Bt23

III

J2010

Bt43

III

(b)

Figure 2

2009 site B time series (Table 1) and we hypothesize thatthis shift in community structure may mark a response tothe temperature change With the exception of the January2010 site B time series the diversity and relative abundanceof Haloquadratum-like and Halorubrum-like organisms aregenerally anticorrelated suggesting that these archaea maycompete for a similar niche in the LT system The obser-vation of a relatively low diversity and abundance of Halo-quadratum-like organisms in some samples (ie A2007At1-t2 A2008At2 and J2009Bt1) suggests that Haloquadratumspecies are more dynamic in hypersaline systems than hasbeen previously appreciated [32]

In addition to trends within organism types specificOTUs also exhibited interesting dynamics For exampleOTUHaloquadratum walsbyi 2 which belongs to a species gener-ally considered to be among the most abundant organisms inhypersaline lakes [32] is at relatively low abundance or notdetected on 08 and 30 120583m filters from August 2007 site Aand January 2010 site B though it is at high abundance inthat size fraction at site A in January 2009 and January 2010Interestingly that OTU also appears dynamic on a days scalein August 2008 at site A appearing at high abundance on the30 120583m filter at time 1 but not detected on the 08 120583m filterat time 2 Three OTUs including two related to Salinibacter

weremore abundant in theAugust 2007 siteA time series (allfilter sizes) than in any other sampleTheNanohaloarchaeonCandidatus Nanosalina [21] was detected at reasonably highabundance on all 01120583m filters from January 2010 sites A andB but was less abundant at other sites and times (Figure 3(a))Consistent with the small cell size for that organism itand the other abundant Nanohaloarchaeal OTU CandidatusNanosalinarumwere found at low abundance or not detectedon filters larger than 01120583m

Overall analyses of the 29 most abundant archaealand bacterial OTUs indicate dynamics at the populationlevel across time and space particularly on months-to-yearstimescales with some dynamics indicated over hours todays These results indicate dynamics in the most abundantarchaeal and bacterial populations at the taxon level incontrast to a previous study which predicted that the mostabundant microbial taxa were stable over timescales ofweeks to one month in a hypersaline crystallizer pond nearSan Diego (SD) CA USA [6] That study was based ontaxonomic affiliations predicted from sim100 bp metagenomicreads It is possible that differences in sampling timescalesgeochemistry andor community composition could resultin true differences in microbial dynamics between thesetwo systems Also LT microbial eukaryotic communities are

Archaea 7

0 10 20 30 40 50 60 70 80 90 100

A2007At1

J2009At1

J2009At2

J2010A

J2010Bt1

J2010Bt2

J2010Bt35

J2010Bt3

01 120583

m fi

lter s

ampl

e

()

(a)

0 10 20 30 40 50 60 70 80 90 100

A2007At1

A2008At2

J2007At1

J2009At1

J2009At2

J2009Bt1

J2009Bt2

J2009Bt3

J2010Bt3

08 120583

m fi

lter s

ampl

e

()

(b)

0 10 20 30 40 50 60 70 80 90 100

A2007At2

A2008At1

J2010A

J2010Bt2

J2010Bt4

()

Halorubrum 1Halorubrum 2Halorubrum 3Halorubrum 4Halorubrum 5Halorubrum 6Haloquadratum 1Halorubrum 7Haloquadratum 2Haloquadratum 1walsbyiHaloquadratum 2walsbyiHaloquadratum 3Haloquadratum 4Haloquadratum 5Haloquadratum 3walsbyi

Haloquadratum 6Candidatus NanosalinaUnclassified HalobacteriaceaeHalorhabdus 1Halorhabdus 2Halorhabdus 3Halorhabdus 4HalorientalisHalogranumHalobellusCandidatus NanosalinarumSalinibacter 1Salinibacter 2Unclassified Sphingobacteriales

3 120583m

filte

r sam

ple

()

(c)

Figure 3

8 Archaea

dominated by a predatory Colpodella sp and it is unclearwhat effect top-down grazing may have on shifts in bacterialand archaeal community structure [20] However the SDstudy also suggested stability of viral populations in thatsystem and we previously demonstrated through a reanalysisof the SD data that the most abundant viral populationsfrom that study were in fact dynamic [8] so it is possiblethat different sequencing and analytical methods would haverevealed dynamics in archaeal and bacterial populations atthe SD site as well Unfortunately a reanalysis of the existingSD data using the methods in this study is not possibledue to the incompatibility of 454 sequencing reads withthe EMIRGE algorithm which is designed to reconstructnear-complete 16S rRNA genes from paired-end Illuminametagenomic data

Based on the relative abundances of the 101 archaeal andbacterial OTUs (including lower abundance LT organisms)we used hierarchical clustering to determine whether simi-larities in overall archaeal and bacterial community structurewould group LT samples according to season sample typeor filter size (Figure 2(b)) In general archaeal and bacterialcommunities sampled from the same location over daysclustered more closely than did viral assemblages from thesame samples (see above and Figure 2(a)) indicating greaterstability of host populations than viral populations over daysThe only samples for which no within-time series (ie daysscale) clustering was observed for host communities weresamples from August 2008 site A For that time seriesDNA from different filter sizes was sequenced from eachsample and different sequencing technologieswere used (454and Illumina) so it is possible that similarities between thesamples exist but were masked by different methodologiesHowever interestingly all samples and filter sizes from theAugust 2007 site A time series clustered together includingDNA from 01 08 and 30 120583m size fractions sequenced by allthree technologies (Sanger 454 and Illumina)This indicatesa distinct community at site A in August 2007 which wasstable over days and robust to methodological differences inlibrary construction and sequencing

In some cases the archaeal and bacterial communitieson the 01 120583m filters clustered together and separately fromother filters from the same time point reflecting enrichmentfor Nanohaloarchaea Specifically both 01120583m filters fromJanuary 2007 site A cluster together and they belong toa larger cluster that includes all 01120583m filters from January2010 site B The prevalence of Nanohaloarchaea in thesesamples but not in others is evident in Figure 3(a) which isbased on the 29 most abundant OTUs described above Inall other cases the 01120583m filters clustered with larger filtersfrom the same time series For the January 2009 site B single-day time series from which only DNA from 08120583m filters(and a viral concentrate) was sequenced the morning sampleclusters separately from the afternoon and evening sampleswhich cluster together indicating a shift in communitystructure over a single day This is consistent with the shift inrelative abundance of Halorubrum-like and Haloquadratum-like organisms from that time series (Figure 3) and may berelated to a temperature shift as suggested above

Interestingly a seasonal trend was not indicated in eitherthe analysis of the 29most abundant OTUs or in the 101 OTUhierarchical clustering analysis (seasonal trends in viral pop-ulations would be difficult to infer as no viral concentrateswere sequenced from winter samples) Samples from site Ain August 2007 have quite different archaeal and bacterialcommunity structures from samples at the same site inAugust 2008 and January samples from site A are also fairlydistinct year to year Although some samples from site Bcollected in January 2009 and January 2010 might appear tobe the exception (Figure 3(b)) there was as much of a shift incommunity structure over hours in January 2009 at site B asthere was across samples collected over years

As with any metagenomic study we cannot say forcertain to what extent abundance in metagenomic librariesreflects true abundances However we have taken care toavoid biases that are often associated with sequencing-basedcommunity analyses Specifically by focusing on 16S rRNAgene sequences from metagenomic data we have avoidedPCRamplification biases in our estimates of archaeal and bac-terial dynamics Similarly we were able to generate enoughviral concentrate metagenomic DNA for sequencing withoutmultiple-displacement amplification (MDA)which is knownto generate significant biases especially in viralmetagenomes(discussed in [33 34])

33 CRISPR Analyses Using the Crass algorithm [30] weidentified 549 unique repeats and 8095 unique spacersequences from clustered regularly interspaced short palin-dromic repeat (CRISPR) regions in the LT dataset Apartfrom a single sample from January 2009 site A the onlylibraries with spacers that matched any of the 140 completeand near-complete viral genomes were those from samplesfrom which viral concentrates were also sequenced (Table 2)This may indicate that different planktonic viral assemblagesexisted in LT at time points from which viral concentrateswere not sequenced which would be consistent with theviral population dynamics observed across the sequencedviral concentrates [7] Notably no spacers from LT Augustmetagenomesmatched any of the 140 viruses consistent witha possible seasonal shift in viral community structure (noviral concentrates were sequenced from August samples) ashas been observed in marine systems (eg [35]) However aconcomitant seasonal shift in host community structure wasnot supported so it is also possible that the August samplescould harbor different planktonic viral assemblages each year

The only spacer matches to any of the seven previouslydescribed complete LT viral and virus-like genomes [7] wereto LTV2 (76716 bp) and LTVLE3 (71341 bp) in libraries fromthree samples from the January 2010 site B time series(Table 2) Of the seven viral concentrate genomes those twoachieved the highest abundance by approximately one orderof magnitude [7] and they were at their most abundant in thetime series fromwhich spacers targeting themwere detectedThis indicates that LT CRISPRs can actively target abundantcoexisting viruses consistent with previous observations ofCRISPR sampling of coexisting viruses in an acid-minedrainage system [9 11] However it should be noted that thevast majority of spacers do not match any of the 140 viral

Archaea 9

Table 2 CRISPR analyses by sample

Sample Unique repeats Unique spacers Hits to 140 viruses Viral contig match(es) Predicted hostJ2007At1 67 681 1 scaffold 16J2007At2 82 633 1 Contig999004A2007At1 30 141 0A2007At2 64 554 0A2008At1 20 275 0A2008At2 23 121 0J2009At1 41 163 1 scaffold 117J2009At2 40 173 0J2009B 29 114 0J2010Bt1 22 501 0

J2010Bt2 79 1798 4

LTV2LTV2

LTVLE3Contig999004

J2010Bt3 58 1171 8

LTVLE3Contig1100059 Natronomonas-likeContig998975scaffold 55scaffold 29LTVLE3LTVLE3 NanohaloarchaeaLTVLE3 Nanohaloarchaea

J2010Bt35 60 930 4

LTV2LTVLE3 Nanohaloarchaea

Contig999004Contig998975

J2010Bt4 43 853 1 Contig999004J2010A 20 340 0

contigs gt 10 kb which we assume to be among the mostabundant viruses because they assembled significantly Thissuggests thatmost CRISPRs target rare viruses An alternativeexplanation might be a preponderance of inactive vestigeCRISPR regions but if that were the case then we wouldexpect to see the same CRISPR spacers duplicated acrosssamples (ie CRISPR regions would be clonally inheritedover multiple generations as opposed to actively integratingnew spacers on subgenerational timescales) Given that only353 spacer sequences were duplicated across samples relativeto 8095 unique spacer sequences in the dataset we infereither that most CRISPR regions were highly dynamic andorthat we detected different CRISPR regions over time due toshifts in host community structure Regardless there is a highdiversity of spacers in the LT system consistent with the highdiversity of viruses

Of the CRISPR regions that had matches to the 140LT viral contigs only five had repeats with matches toassemblies from 01 08 andor 30 120583m filter sequences (amatch would be indicative of the host organism) and nonematched any of the 12 complete and near-complete bacterialand archaeal genomes assembled from the LT January 2007site A time series [36] Three matches to the January 2010

site B assemblies (Andrade et al unpublished data) wereto contigs that could only be identified at the order levelof Halobacteriales based on best BLAST hits throughoutthe contigs and one repeat matched a contig predicted tobelong to a Natronomonas-like organism (Table 2) Themost interesting CRISPR repeat sequence match was toa contig from a likely Nanohaloarchaeon based on thetaxonomy of best BLAST hits throughout the contig A spacerassociated with that repeat matched LTVLE3 a virus orplasmid described in [7] strongly suggesting that LTVLE3 is avirus or plasmid of the Nanohaloarchaea Interestingly whileLTVLE3 was only at high abundance in viral concentratesfrom the January 2010 site B four-day time series fromwhich the only spacers targeting it were identified the mostabundant Nanohaloarchaeon Candidatus Nanosalina [21]was detected in most LT samples and is as abundant atsite A as it is at site B (isolated pools 300m apart) in 2010(Figure 3(a)) This suggests that the Nanohaloarchaea at siteB may have been adapted locally to the presence of abundantLTVLE3 at site B in January 2010

BLAST searches of all 8095 LT CRISPR spacers and 549repeat sequences revealed essentially no hits to the NCBI nrand environmental databases and no hits were detected for

10 Archaea

0

250

500

750

1000

1250

1500

1750

2000

J2007A timeseries

J2010B timeseries

All VC timeseries

TotalSingle sample or seriesTwo samples or series

Three samples or seriesFour samples or series

Spac

er h

its to

VCs

(a)

0

2

4

6

8

10

12

14

16

18

Hits to same sampleMin hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

J200

7At1

J200

7At2

J200

9B

J201

0Bt1

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

2

4

6

8

10

12

14

16

18

20

Min hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

A20

07At

1

A20

07At

2

A20

08At

1

A20

08At

2

J200

9At1

J200

9At2

J201

0Bt3

5

(c)

Figure 4

any of the repeat sequencesThree spacers had 100matchesHomo sapiens indicating that they are likely either to beerroneous spacers or representative of mobile genetic ele-ments that are not present in public databases Overall thesedata demonstrate that the LT CRISPR spacers target mobilegenetic elements that have not been previously identifiedsuggesting that LT archaea may be highly adapted to localviral predation though we cannot rule out the possibility of

public database bias (ie perhaps LT CRISPRs could targetviruses from other locations that are not represented inpublic databases) In support of local adaptation despite therelatively small number of spacer matches to the 140 LT viralcontigs gt10 kb 1729 spacers (21) had matches to unassem-bled LT viral concentrate reads This may indicate selectionfor CRISPR spacers that target locally adapted potentiallycoexisting viruses consistent with previously observed local

Archaea 11

CRISPR adaptation to coexisting viruses in an acid-minedrainage system and in geographically distinct sludge biore-actors [8 9] Because these matches are to unassembledreads we infer that most targeted LT viruses are at relativelylow abundance at least at the times and locations sampledby this study (with the exception of LTV2 and LTVLE3as described above) This may indicate successful CRISPR-mediated maintenance of viral populations at abundanceslow enough to preclude genomic assembly

In terms of the timescales on which CRISPR spacersare retained and match coexisting low-abundance virusesin the LT system we found several lines of evidence thatsupport the stability of many CRISPR spacers and low-abundance viruses over all timescales sampled by this study(up to three years) LT spacers tended to match reads frommultiple viral concentrate metagenomes collected over thecourse of days almost as often as (January 2007 site A) ormore often than (January 2010 site B) they matched readsfrom a single viral concentrate metagenome from the sametime series (Figure 4(a)) This suggests that many spacersand their targeted low-abundance viruses are stable overdays On the timescale of years spacers and their targetedcoexisting viruses were found most often across more thanone of the four sites and times sampled by viral concentrates(January 2007 site A January 2009 site B January 2010site A January 2010 site B) Similarly in all eight samplesfrom which viral concentrates were sequenced nearly asmany (and often more) spacers had hits to viral concentratereads from other samples as matched viral concentrate readsfrom the same sample (Figure 4(b)) A similar number ofmatches to the eight viral concentrates were observed forsamples from which viral concentrates were not sequenced(Figure 4(c)) Overall these results demonstrate that CRISPRregions generally retain spacer targets against low-abundanceviruses on timescales of 1ndash3 years We also infer that low-abundance viruses are likely more stable in the LT systemthan their highly dynamic higher abundance counterparts

4 Conclusions

Both virus (140 contigs gt10 kb) and host (101 16S rRNA geneOTUs) assemblage structures were highly dynamic over timeand space in the archaea-dominated LT system particularlyover months to years However archaeal and bacterial popu-lations were generally more stable than viral populations andlower abundance viruses were inferred to bemore stable thanabundant viruses Filter size-resolved analyses of viral popu-lations revealed different viral assemblages in the planktonicand host-associated (ie active and provirus) fractions sug-gesting that a higher diversity of viruses may have the poten-tial to infect than are actively infecting at any given time Con-sistent with that hypothesis CRISPR analyses revealed per-sistent targeting of lower abundance viral populations alongwith a high diversity of spacer sequences However interest-ingly the most abundant viruses (sim2 of the viral commu-nity relative to le01 formost viruses [7]) were also targetedby CRISPRs indicating that archaeal hosts strike a balancebetweenprotection against persistent low-abundance viruses

and those viruses potentially abundant enough to effect catas-trophic changes in host community structure

Acknowledgments

Funding for this work was provided by the National ScienceFoundation Award 0626526 and DE-FG02-07ER64505 fromthe Department of Energy Thanks to Cheetham Salt Works(Victoria Australia) for permission to collect samples JohnMoreau Jochen Brocks and Mike Dyall-Smith for assistancein the field Matt Lewis and the J Craig Venter Institute(JCVI) for library construction and sequencing ShannonWilliamson and Doug Fadrosh (JCVI) for training JoanneB Emerson in virus-related laboratory techniques ConnorSkennerton and Gene Tyson for early access to and assis-tance with the Crass program Christine Sun for helpfuldiscussions and two anonymous reviewers for constructivecomments that improved the paper

References

[1] F Rohwer D Prangishvili and D Lindell ldquoRoles of viruses inthe environmentrdquoEnvironmentalMicrobiology vol 11 no 11 pp2771ndash2774 2009

[2] C A Suttle ldquoMarine virusesmdashmajor players in the global eco-systemrdquoNature ReviewsMicrobiology vol 5 no 10 pp 801ndash8122007

[3] K Porter B E Russ and M L Dyall-Smith ldquoVirus-host inter-actions in salt lakesrdquo Current Opinion in Microbiology vol 10no 4 pp 418ndash424 2007

[4] R-A Sandaa and A Larsen ldquoSeasonal variations in virus-host populations in Norwegian coastal waters focusing on thecyanophage community infecting marine Synechococcus spprdquoApplied andEnvironmentalMicrobiology vol 72 no 7 pp 4610ndash4618 2006

[5] S M Short and C A Suttle ldquoTemporal dynamics of naturalcommunities of marine algal viruses and eukaryotesrdquo AquaticMicrobial Ecology vol 32 no 2 pp 107ndash119 2003

[6] B Rodriguez-Brito L Li L Wegley et al ldquoViral and microbialcommunity dynamics in four aquatic environmentsrdquo ISMEJournal vol 4 no 6 pp 739ndash751 2010

[7] J B Emerson B C Thomas K Andrade E E Allen K BHeidelberg and J F Banfield ldquoDynamic viral populations inhypersaline systems as revealed by metagenomic assemblyrdquoApplied and Environmental Microbiology vol 78 pp 6309ndash6320 2012

[8] V Kunin S He F Warnecke et al ldquoA bacterial metapopulationadapts locally to phage predation despite global dispersalrdquoGenome Research vol 18 no 2 pp 293ndash297 2008

[9] A F Andersson and J F Banfield ldquoVirus population dynamicsand acquired virus resistance in natural microbial communi-tiesrdquo Science vol 320 no 5879 pp 1047ndash1050 2008

[10] R E Anderson W J Brazelton and J A Baross ldquoUsing CRIS-PRs as ametagenomic tool to identify microbial hosts of adiffuse flow hydrothermal vent viral assemblagerdquo FEMS Micro-biology Ecology vol 77 no 1 pp 120ndash133 2011

[11] GW Tyson and J F Banfield ldquoRapidly evolving CRISPRs imp-licated in acquired resistance of microorganisms to virusesrdquoEnvironmental Microbiology vol 10 no 1 pp 200ndash207 2008

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

6 Archaea

0 10000 500000

J2010

AVC

J2007

At2

VCA2007

At208

J2007

At201

A2008

At208

J2009

Bt108

J2009

Bt208

J2009

Bt308

A2008

At13

J2009

At101

J2009

At201

J2009

At108

J2009

At208

J2010

A01

J2007

At101

J2007

At208

J2007

At108

J2010

A3

A2007

At101

A2007

At108

A2007

At13

A2007

At23

J2010

Bt308

J2010

Bt23

J2010

Bt43

J2007

At1

VCJ2009

BVC

J2010

Bt201

J2010

Bt35

01

J2010

Bt101

J2010

Bt301

J2010

Bt3

VCJ2010

Bt1

VCJ2010

Bt2

VCJ2010

Bt4

VC

(a)

0 2 20

A2007

At13

III

A2007

At201

Sang

A2007

At101

454

A2007

At108

454

A2007

At208

Sang

A2007

At23

III

J2007

At101

Sang

J2007

At201

Sang

J2010

A01

III

J2010

Bt201

III

J2010

Bt35

01

III

J2010

Bt101

III

J2010

Bt301

III

J2007

At208

Sang

A2008

At208

454

J2009

Bt108

454

III

J2009

Bt208

454

J2009

Bt308

454

J2009

At201

454

J2009

At101

454

J2009

At108

454

J2009

At208

454

J2007

At108

Sang

III

A2008

At13

III

J2010

A3

III

J2010

Bt308

III

J2010

Bt23

III

J2010

Bt43

III

(b)

Figure 2

2009 site B time series (Table 1) and we hypothesize thatthis shift in community structure may mark a response tothe temperature change With the exception of the January2010 site B time series the diversity and relative abundanceof Haloquadratum-like and Halorubrum-like organisms aregenerally anticorrelated suggesting that these archaea maycompete for a similar niche in the LT system The obser-vation of a relatively low diversity and abundance of Halo-quadratum-like organisms in some samples (ie A2007At1-t2 A2008At2 and J2009Bt1) suggests that Haloquadratumspecies are more dynamic in hypersaline systems than hasbeen previously appreciated [32]

In addition to trends within organism types specificOTUs also exhibited interesting dynamics For exampleOTUHaloquadratum walsbyi 2 which belongs to a species gener-ally considered to be among the most abundant organisms inhypersaline lakes [32] is at relatively low abundance or notdetected on 08 and 30 120583m filters from August 2007 site Aand January 2010 site B though it is at high abundance inthat size fraction at site A in January 2009 and January 2010Interestingly that OTU also appears dynamic on a days scalein August 2008 at site A appearing at high abundance on the30 120583m filter at time 1 but not detected on the 08 120583m filterat time 2 Three OTUs including two related to Salinibacter

weremore abundant in theAugust 2007 siteA time series (allfilter sizes) than in any other sampleTheNanohaloarchaeonCandidatus Nanosalina [21] was detected at reasonably highabundance on all 01120583m filters from January 2010 sites A andB but was less abundant at other sites and times (Figure 3(a))Consistent with the small cell size for that organism itand the other abundant Nanohaloarchaeal OTU CandidatusNanosalinarumwere found at low abundance or not detectedon filters larger than 01120583m

Overall analyses of the 29 most abundant archaealand bacterial OTUs indicate dynamics at the populationlevel across time and space particularly on months-to-yearstimescales with some dynamics indicated over hours todays These results indicate dynamics in the most abundantarchaeal and bacterial populations at the taxon level incontrast to a previous study which predicted that the mostabundant microbial taxa were stable over timescales ofweeks to one month in a hypersaline crystallizer pond nearSan Diego (SD) CA USA [6] That study was based ontaxonomic affiliations predicted from sim100 bp metagenomicreads It is possible that differences in sampling timescalesgeochemistry andor community composition could resultin true differences in microbial dynamics between thesetwo systems Also LT microbial eukaryotic communities are

Archaea 7

0 10 20 30 40 50 60 70 80 90 100

A2007At1

J2009At1

J2009At2

J2010A

J2010Bt1

J2010Bt2

J2010Bt35

J2010Bt3

01 120583

m fi

lter s

ampl

e

()

(a)

0 10 20 30 40 50 60 70 80 90 100

A2007At1

A2008At2

J2007At1

J2009At1

J2009At2

J2009Bt1

J2009Bt2

J2009Bt3

J2010Bt3

08 120583

m fi

lter s

ampl

e

()

(b)

0 10 20 30 40 50 60 70 80 90 100

A2007At2

A2008At1

J2010A

J2010Bt2

J2010Bt4

()

Halorubrum 1Halorubrum 2Halorubrum 3Halorubrum 4Halorubrum 5Halorubrum 6Haloquadratum 1Halorubrum 7Haloquadratum 2Haloquadratum 1walsbyiHaloquadratum 2walsbyiHaloquadratum 3Haloquadratum 4Haloquadratum 5Haloquadratum 3walsbyi

Haloquadratum 6Candidatus NanosalinaUnclassified HalobacteriaceaeHalorhabdus 1Halorhabdus 2Halorhabdus 3Halorhabdus 4HalorientalisHalogranumHalobellusCandidatus NanosalinarumSalinibacter 1Salinibacter 2Unclassified Sphingobacteriales

3 120583m

filte

r sam

ple

()

(c)

Figure 3

8 Archaea

dominated by a predatory Colpodella sp and it is unclearwhat effect top-down grazing may have on shifts in bacterialand archaeal community structure [20] However the SDstudy also suggested stability of viral populations in thatsystem and we previously demonstrated through a reanalysisof the SD data that the most abundant viral populationsfrom that study were in fact dynamic [8] so it is possiblethat different sequencing and analytical methods would haverevealed dynamics in archaeal and bacterial populations atthe SD site as well Unfortunately a reanalysis of the existingSD data using the methods in this study is not possibledue to the incompatibility of 454 sequencing reads withthe EMIRGE algorithm which is designed to reconstructnear-complete 16S rRNA genes from paired-end Illuminametagenomic data

Based on the relative abundances of the 101 archaeal andbacterial OTUs (including lower abundance LT organisms)we used hierarchical clustering to determine whether simi-larities in overall archaeal and bacterial community structurewould group LT samples according to season sample typeor filter size (Figure 2(b)) In general archaeal and bacterialcommunities sampled from the same location over daysclustered more closely than did viral assemblages from thesame samples (see above and Figure 2(a)) indicating greaterstability of host populations than viral populations over daysThe only samples for which no within-time series (ie daysscale) clustering was observed for host communities weresamples from August 2008 site A For that time seriesDNA from different filter sizes was sequenced from eachsample and different sequencing technologieswere used (454and Illumina) so it is possible that similarities between thesamples exist but were masked by different methodologiesHowever interestingly all samples and filter sizes from theAugust 2007 site A time series clustered together includingDNA from 01 08 and 30 120583m size fractions sequenced by allthree technologies (Sanger 454 and Illumina)This indicatesa distinct community at site A in August 2007 which wasstable over days and robust to methodological differences inlibrary construction and sequencing

In some cases the archaeal and bacterial communitieson the 01 120583m filters clustered together and separately fromother filters from the same time point reflecting enrichmentfor Nanohaloarchaea Specifically both 01120583m filters fromJanuary 2007 site A cluster together and they belong toa larger cluster that includes all 01120583m filters from January2010 site B The prevalence of Nanohaloarchaea in thesesamples but not in others is evident in Figure 3(a) which isbased on the 29 most abundant OTUs described above Inall other cases the 01120583m filters clustered with larger filtersfrom the same time series For the January 2009 site B single-day time series from which only DNA from 08120583m filters(and a viral concentrate) was sequenced the morning sampleclusters separately from the afternoon and evening sampleswhich cluster together indicating a shift in communitystructure over a single day This is consistent with the shift inrelative abundance of Halorubrum-like and Haloquadratum-like organisms from that time series (Figure 3) and may berelated to a temperature shift as suggested above

Interestingly a seasonal trend was not indicated in eitherthe analysis of the 29most abundant OTUs or in the 101 OTUhierarchical clustering analysis (seasonal trends in viral pop-ulations would be difficult to infer as no viral concentrateswere sequenced from winter samples) Samples from site Ain August 2007 have quite different archaeal and bacterialcommunity structures from samples at the same site inAugust 2008 and January samples from site A are also fairlydistinct year to year Although some samples from site Bcollected in January 2009 and January 2010 might appear tobe the exception (Figure 3(b)) there was as much of a shift incommunity structure over hours in January 2009 at site B asthere was across samples collected over years

As with any metagenomic study we cannot say forcertain to what extent abundance in metagenomic librariesreflects true abundances However we have taken care toavoid biases that are often associated with sequencing-basedcommunity analyses Specifically by focusing on 16S rRNAgene sequences from metagenomic data we have avoidedPCRamplification biases in our estimates of archaeal and bac-terial dynamics Similarly we were able to generate enoughviral concentrate metagenomic DNA for sequencing withoutmultiple-displacement amplification (MDA)which is knownto generate significant biases especially in viralmetagenomes(discussed in [33 34])

33 CRISPR Analyses Using the Crass algorithm [30] weidentified 549 unique repeats and 8095 unique spacersequences from clustered regularly interspaced short palin-dromic repeat (CRISPR) regions in the LT dataset Apartfrom a single sample from January 2009 site A the onlylibraries with spacers that matched any of the 140 completeand near-complete viral genomes were those from samplesfrom which viral concentrates were also sequenced (Table 2)This may indicate that different planktonic viral assemblagesexisted in LT at time points from which viral concentrateswere not sequenced which would be consistent with theviral population dynamics observed across the sequencedviral concentrates [7] Notably no spacers from LT Augustmetagenomesmatched any of the 140 viruses consistent witha possible seasonal shift in viral community structure (noviral concentrates were sequenced from August samples) ashas been observed in marine systems (eg [35]) However aconcomitant seasonal shift in host community structure wasnot supported so it is also possible that the August samplescould harbor different planktonic viral assemblages each year

The only spacer matches to any of the seven previouslydescribed complete LT viral and virus-like genomes [7] wereto LTV2 (76716 bp) and LTVLE3 (71341 bp) in libraries fromthree samples from the January 2010 site B time series(Table 2) Of the seven viral concentrate genomes those twoachieved the highest abundance by approximately one orderof magnitude [7] and they were at their most abundant in thetime series fromwhich spacers targeting themwere detectedThis indicates that LT CRISPRs can actively target abundantcoexisting viruses consistent with previous observations ofCRISPR sampling of coexisting viruses in an acid-minedrainage system [9 11] However it should be noted that thevast majority of spacers do not match any of the 140 viral

Archaea 9

Table 2 CRISPR analyses by sample

Sample Unique repeats Unique spacers Hits to 140 viruses Viral contig match(es) Predicted hostJ2007At1 67 681 1 scaffold 16J2007At2 82 633 1 Contig999004A2007At1 30 141 0A2007At2 64 554 0A2008At1 20 275 0A2008At2 23 121 0J2009At1 41 163 1 scaffold 117J2009At2 40 173 0J2009B 29 114 0J2010Bt1 22 501 0

J2010Bt2 79 1798 4

LTV2LTV2

LTVLE3Contig999004

J2010Bt3 58 1171 8

LTVLE3Contig1100059 Natronomonas-likeContig998975scaffold 55scaffold 29LTVLE3LTVLE3 NanohaloarchaeaLTVLE3 Nanohaloarchaea

J2010Bt35 60 930 4

LTV2LTVLE3 Nanohaloarchaea

Contig999004Contig998975

J2010Bt4 43 853 1 Contig999004J2010A 20 340 0

contigs gt 10 kb which we assume to be among the mostabundant viruses because they assembled significantly Thissuggests thatmost CRISPRs target rare viruses An alternativeexplanation might be a preponderance of inactive vestigeCRISPR regions but if that were the case then we wouldexpect to see the same CRISPR spacers duplicated acrosssamples (ie CRISPR regions would be clonally inheritedover multiple generations as opposed to actively integratingnew spacers on subgenerational timescales) Given that only353 spacer sequences were duplicated across samples relativeto 8095 unique spacer sequences in the dataset we infereither that most CRISPR regions were highly dynamic andorthat we detected different CRISPR regions over time due toshifts in host community structure Regardless there is a highdiversity of spacers in the LT system consistent with the highdiversity of viruses

Of the CRISPR regions that had matches to the 140LT viral contigs only five had repeats with matches toassemblies from 01 08 andor 30 120583m filter sequences (amatch would be indicative of the host organism) and nonematched any of the 12 complete and near-complete bacterialand archaeal genomes assembled from the LT January 2007site A time series [36] Three matches to the January 2010

site B assemblies (Andrade et al unpublished data) wereto contigs that could only be identified at the order levelof Halobacteriales based on best BLAST hits throughoutthe contigs and one repeat matched a contig predicted tobelong to a Natronomonas-like organism (Table 2) Themost interesting CRISPR repeat sequence match was toa contig from a likely Nanohaloarchaeon based on thetaxonomy of best BLAST hits throughout the contig A spacerassociated with that repeat matched LTVLE3 a virus orplasmid described in [7] strongly suggesting that LTVLE3 is avirus or plasmid of the Nanohaloarchaea Interestingly whileLTVLE3 was only at high abundance in viral concentratesfrom the January 2010 site B four-day time series fromwhich the only spacers targeting it were identified the mostabundant Nanohaloarchaeon Candidatus Nanosalina [21]was detected in most LT samples and is as abundant atsite A as it is at site B (isolated pools 300m apart) in 2010(Figure 3(a)) This suggests that the Nanohaloarchaea at siteB may have been adapted locally to the presence of abundantLTVLE3 at site B in January 2010

BLAST searches of all 8095 LT CRISPR spacers and 549repeat sequences revealed essentially no hits to the NCBI nrand environmental databases and no hits were detected for

10 Archaea

0

250

500

750

1000

1250

1500

1750

2000

J2007A timeseries

J2010B timeseries

All VC timeseries

TotalSingle sample or seriesTwo samples or series

Three samples or seriesFour samples or series

Spac

er h

its to

VCs

(a)

0

2

4

6

8

10

12

14

16

18

Hits to same sampleMin hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

J200

7At1

J200

7At2

J200

9B

J201

0Bt1

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

2

4

6

8

10

12

14

16

18

20

Min hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

A20

07At

1

A20

07At

2

A20

08At

1

A20

08At

2

J200

9At1

J200

9At2

J201

0Bt3

5

(c)

Figure 4

any of the repeat sequencesThree spacers had 100matchesHomo sapiens indicating that they are likely either to beerroneous spacers or representative of mobile genetic ele-ments that are not present in public databases Overall thesedata demonstrate that the LT CRISPR spacers target mobilegenetic elements that have not been previously identifiedsuggesting that LT archaea may be highly adapted to localviral predation though we cannot rule out the possibility of

public database bias (ie perhaps LT CRISPRs could targetviruses from other locations that are not represented inpublic databases) In support of local adaptation despite therelatively small number of spacer matches to the 140 LT viralcontigs gt10 kb 1729 spacers (21) had matches to unassem-bled LT viral concentrate reads This may indicate selectionfor CRISPR spacers that target locally adapted potentiallycoexisting viruses consistent with previously observed local

Archaea 11

CRISPR adaptation to coexisting viruses in an acid-minedrainage system and in geographically distinct sludge biore-actors [8 9] Because these matches are to unassembledreads we infer that most targeted LT viruses are at relativelylow abundance at least at the times and locations sampledby this study (with the exception of LTV2 and LTVLE3as described above) This may indicate successful CRISPR-mediated maintenance of viral populations at abundanceslow enough to preclude genomic assembly

In terms of the timescales on which CRISPR spacersare retained and match coexisting low-abundance virusesin the LT system we found several lines of evidence thatsupport the stability of many CRISPR spacers and low-abundance viruses over all timescales sampled by this study(up to three years) LT spacers tended to match reads frommultiple viral concentrate metagenomes collected over thecourse of days almost as often as (January 2007 site A) ormore often than (January 2010 site B) they matched readsfrom a single viral concentrate metagenome from the sametime series (Figure 4(a)) This suggests that many spacersand their targeted low-abundance viruses are stable overdays On the timescale of years spacers and their targetedcoexisting viruses were found most often across more thanone of the four sites and times sampled by viral concentrates(January 2007 site A January 2009 site B January 2010site A January 2010 site B) Similarly in all eight samplesfrom which viral concentrates were sequenced nearly asmany (and often more) spacers had hits to viral concentratereads from other samples as matched viral concentrate readsfrom the same sample (Figure 4(b)) A similar number ofmatches to the eight viral concentrates were observed forsamples from which viral concentrates were not sequenced(Figure 4(c)) Overall these results demonstrate that CRISPRregions generally retain spacer targets against low-abundanceviruses on timescales of 1ndash3 years We also infer that low-abundance viruses are likely more stable in the LT systemthan their highly dynamic higher abundance counterparts

4 Conclusions

Both virus (140 contigs gt10 kb) and host (101 16S rRNA geneOTUs) assemblage structures were highly dynamic over timeand space in the archaea-dominated LT system particularlyover months to years However archaeal and bacterial popu-lations were generally more stable than viral populations andlower abundance viruses were inferred to bemore stable thanabundant viruses Filter size-resolved analyses of viral popu-lations revealed different viral assemblages in the planktonicand host-associated (ie active and provirus) fractions sug-gesting that a higher diversity of viruses may have the poten-tial to infect than are actively infecting at any given time Con-sistent with that hypothesis CRISPR analyses revealed per-sistent targeting of lower abundance viral populations alongwith a high diversity of spacer sequences However interest-ingly the most abundant viruses (sim2 of the viral commu-nity relative to le01 formost viruses [7]) were also targetedby CRISPRs indicating that archaeal hosts strike a balancebetweenprotection against persistent low-abundance viruses

and those viruses potentially abundant enough to effect catas-trophic changes in host community structure

Acknowledgments

Funding for this work was provided by the National ScienceFoundation Award 0626526 and DE-FG02-07ER64505 fromthe Department of Energy Thanks to Cheetham Salt Works(Victoria Australia) for permission to collect samples JohnMoreau Jochen Brocks and Mike Dyall-Smith for assistancein the field Matt Lewis and the J Craig Venter Institute(JCVI) for library construction and sequencing ShannonWilliamson and Doug Fadrosh (JCVI) for training JoanneB Emerson in virus-related laboratory techniques ConnorSkennerton and Gene Tyson for early access to and assis-tance with the Crass program Christine Sun for helpfuldiscussions and two anonymous reviewers for constructivecomments that improved the paper

References

[1] F Rohwer D Prangishvili and D Lindell ldquoRoles of viruses inthe environmentrdquoEnvironmentalMicrobiology vol 11 no 11 pp2771ndash2774 2009

[2] C A Suttle ldquoMarine virusesmdashmajor players in the global eco-systemrdquoNature ReviewsMicrobiology vol 5 no 10 pp 801ndash8122007

[3] K Porter B E Russ and M L Dyall-Smith ldquoVirus-host inter-actions in salt lakesrdquo Current Opinion in Microbiology vol 10no 4 pp 418ndash424 2007

[4] R-A Sandaa and A Larsen ldquoSeasonal variations in virus-host populations in Norwegian coastal waters focusing on thecyanophage community infecting marine Synechococcus spprdquoApplied andEnvironmentalMicrobiology vol 72 no 7 pp 4610ndash4618 2006

[5] S M Short and C A Suttle ldquoTemporal dynamics of naturalcommunities of marine algal viruses and eukaryotesrdquo AquaticMicrobial Ecology vol 32 no 2 pp 107ndash119 2003

[6] B Rodriguez-Brito L Li L Wegley et al ldquoViral and microbialcommunity dynamics in four aquatic environmentsrdquo ISMEJournal vol 4 no 6 pp 739ndash751 2010

[7] J B Emerson B C Thomas K Andrade E E Allen K BHeidelberg and J F Banfield ldquoDynamic viral populations inhypersaline systems as revealed by metagenomic assemblyrdquoApplied and Environmental Microbiology vol 78 pp 6309ndash6320 2012

[8] V Kunin S He F Warnecke et al ldquoA bacterial metapopulationadapts locally to phage predation despite global dispersalrdquoGenome Research vol 18 no 2 pp 293ndash297 2008

[9] A F Andersson and J F Banfield ldquoVirus population dynamicsand acquired virus resistance in natural microbial communi-tiesrdquo Science vol 320 no 5879 pp 1047ndash1050 2008

[10] R E Anderson W J Brazelton and J A Baross ldquoUsing CRIS-PRs as ametagenomic tool to identify microbial hosts of adiffuse flow hydrothermal vent viral assemblagerdquo FEMS Micro-biology Ecology vol 77 no 1 pp 120ndash133 2011

[11] GW Tyson and J F Banfield ldquoRapidly evolving CRISPRs imp-licated in acquired resistance of microorganisms to virusesrdquoEnvironmental Microbiology vol 10 no 1 pp 200ndash207 2008

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Archaea 7

0 10 20 30 40 50 60 70 80 90 100

A2007At1

J2009At1

J2009At2

J2010A

J2010Bt1

J2010Bt2

J2010Bt35

J2010Bt3

01 120583

m fi

lter s

ampl

e

()

(a)

0 10 20 30 40 50 60 70 80 90 100

A2007At1

A2008At2

J2007At1

J2009At1

J2009At2

J2009Bt1

J2009Bt2

J2009Bt3

J2010Bt3

08 120583

m fi

lter s

ampl

e

()

(b)

0 10 20 30 40 50 60 70 80 90 100

A2007At2

A2008At1

J2010A

J2010Bt2

J2010Bt4

()

Halorubrum 1Halorubrum 2Halorubrum 3Halorubrum 4Halorubrum 5Halorubrum 6Haloquadratum 1Halorubrum 7Haloquadratum 2Haloquadratum 1walsbyiHaloquadratum 2walsbyiHaloquadratum 3Haloquadratum 4Haloquadratum 5Haloquadratum 3walsbyi

Haloquadratum 6Candidatus NanosalinaUnclassified HalobacteriaceaeHalorhabdus 1Halorhabdus 2Halorhabdus 3Halorhabdus 4HalorientalisHalogranumHalobellusCandidatus NanosalinarumSalinibacter 1Salinibacter 2Unclassified Sphingobacteriales

3 120583m

filte

r sam

ple

()

(c)

Figure 3

8 Archaea

dominated by a predatory Colpodella sp and it is unclearwhat effect top-down grazing may have on shifts in bacterialand archaeal community structure [20] However the SDstudy also suggested stability of viral populations in thatsystem and we previously demonstrated through a reanalysisof the SD data that the most abundant viral populationsfrom that study were in fact dynamic [8] so it is possiblethat different sequencing and analytical methods would haverevealed dynamics in archaeal and bacterial populations atthe SD site as well Unfortunately a reanalysis of the existingSD data using the methods in this study is not possibledue to the incompatibility of 454 sequencing reads withthe EMIRGE algorithm which is designed to reconstructnear-complete 16S rRNA genes from paired-end Illuminametagenomic data

Based on the relative abundances of the 101 archaeal andbacterial OTUs (including lower abundance LT organisms)we used hierarchical clustering to determine whether simi-larities in overall archaeal and bacterial community structurewould group LT samples according to season sample typeor filter size (Figure 2(b)) In general archaeal and bacterialcommunities sampled from the same location over daysclustered more closely than did viral assemblages from thesame samples (see above and Figure 2(a)) indicating greaterstability of host populations than viral populations over daysThe only samples for which no within-time series (ie daysscale) clustering was observed for host communities weresamples from August 2008 site A For that time seriesDNA from different filter sizes was sequenced from eachsample and different sequencing technologieswere used (454and Illumina) so it is possible that similarities between thesamples exist but were masked by different methodologiesHowever interestingly all samples and filter sizes from theAugust 2007 site A time series clustered together includingDNA from 01 08 and 30 120583m size fractions sequenced by allthree technologies (Sanger 454 and Illumina)This indicatesa distinct community at site A in August 2007 which wasstable over days and robust to methodological differences inlibrary construction and sequencing

In some cases the archaeal and bacterial communitieson the 01 120583m filters clustered together and separately fromother filters from the same time point reflecting enrichmentfor Nanohaloarchaea Specifically both 01120583m filters fromJanuary 2007 site A cluster together and they belong toa larger cluster that includes all 01120583m filters from January2010 site B The prevalence of Nanohaloarchaea in thesesamples but not in others is evident in Figure 3(a) which isbased on the 29 most abundant OTUs described above Inall other cases the 01120583m filters clustered with larger filtersfrom the same time series For the January 2009 site B single-day time series from which only DNA from 08120583m filters(and a viral concentrate) was sequenced the morning sampleclusters separately from the afternoon and evening sampleswhich cluster together indicating a shift in communitystructure over a single day This is consistent with the shift inrelative abundance of Halorubrum-like and Haloquadratum-like organisms from that time series (Figure 3) and may berelated to a temperature shift as suggested above

Interestingly a seasonal trend was not indicated in eitherthe analysis of the 29most abundant OTUs or in the 101 OTUhierarchical clustering analysis (seasonal trends in viral pop-ulations would be difficult to infer as no viral concentrateswere sequenced from winter samples) Samples from site Ain August 2007 have quite different archaeal and bacterialcommunity structures from samples at the same site inAugust 2008 and January samples from site A are also fairlydistinct year to year Although some samples from site Bcollected in January 2009 and January 2010 might appear tobe the exception (Figure 3(b)) there was as much of a shift incommunity structure over hours in January 2009 at site B asthere was across samples collected over years

As with any metagenomic study we cannot say forcertain to what extent abundance in metagenomic librariesreflects true abundances However we have taken care toavoid biases that are often associated with sequencing-basedcommunity analyses Specifically by focusing on 16S rRNAgene sequences from metagenomic data we have avoidedPCRamplification biases in our estimates of archaeal and bac-terial dynamics Similarly we were able to generate enoughviral concentrate metagenomic DNA for sequencing withoutmultiple-displacement amplification (MDA)which is knownto generate significant biases especially in viralmetagenomes(discussed in [33 34])

33 CRISPR Analyses Using the Crass algorithm [30] weidentified 549 unique repeats and 8095 unique spacersequences from clustered regularly interspaced short palin-dromic repeat (CRISPR) regions in the LT dataset Apartfrom a single sample from January 2009 site A the onlylibraries with spacers that matched any of the 140 completeand near-complete viral genomes were those from samplesfrom which viral concentrates were also sequenced (Table 2)This may indicate that different planktonic viral assemblagesexisted in LT at time points from which viral concentrateswere not sequenced which would be consistent with theviral population dynamics observed across the sequencedviral concentrates [7] Notably no spacers from LT Augustmetagenomesmatched any of the 140 viruses consistent witha possible seasonal shift in viral community structure (noviral concentrates were sequenced from August samples) ashas been observed in marine systems (eg [35]) However aconcomitant seasonal shift in host community structure wasnot supported so it is also possible that the August samplescould harbor different planktonic viral assemblages each year

The only spacer matches to any of the seven previouslydescribed complete LT viral and virus-like genomes [7] wereto LTV2 (76716 bp) and LTVLE3 (71341 bp) in libraries fromthree samples from the January 2010 site B time series(Table 2) Of the seven viral concentrate genomes those twoachieved the highest abundance by approximately one orderof magnitude [7] and they were at their most abundant in thetime series fromwhich spacers targeting themwere detectedThis indicates that LT CRISPRs can actively target abundantcoexisting viruses consistent with previous observations ofCRISPR sampling of coexisting viruses in an acid-minedrainage system [9 11] However it should be noted that thevast majority of spacers do not match any of the 140 viral

Archaea 9

Table 2 CRISPR analyses by sample

Sample Unique repeats Unique spacers Hits to 140 viruses Viral contig match(es) Predicted hostJ2007At1 67 681 1 scaffold 16J2007At2 82 633 1 Contig999004A2007At1 30 141 0A2007At2 64 554 0A2008At1 20 275 0A2008At2 23 121 0J2009At1 41 163 1 scaffold 117J2009At2 40 173 0J2009B 29 114 0J2010Bt1 22 501 0

J2010Bt2 79 1798 4

LTV2LTV2

LTVLE3Contig999004

J2010Bt3 58 1171 8

LTVLE3Contig1100059 Natronomonas-likeContig998975scaffold 55scaffold 29LTVLE3LTVLE3 NanohaloarchaeaLTVLE3 Nanohaloarchaea

J2010Bt35 60 930 4

LTV2LTVLE3 Nanohaloarchaea

Contig999004Contig998975

J2010Bt4 43 853 1 Contig999004J2010A 20 340 0

contigs gt 10 kb which we assume to be among the mostabundant viruses because they assembled significantly Thissuggests thatmost CRISPRs target rare viruses An alternativeexplanation might be a preponderance of inactive vestigeCRISPR regions but if that were the case then we wouldexpect to see the same CRISPR spacers duplicated acrosssamples (ie CRISPR regions would be clonally inheritedover multiple generations as opposed to actively integratingnew spacers on subgenerational timescales) Given that only353 spacer sequences were duplicated across samples relativeto 8095 unique spacer sequences in the dataset we infereither that most CRISPR regions were highly dynamic andorthat we detected different CRISPR regions over time due toshifts in host community structure Regardless there is a highdiversity of spacers in the LT system consistent with the highdiversity of viruses

Of the CRISPR regions that had matches to the 140LT viral contigs only five had repeats with matches toassemblies from 01 08 andor 30 120583m filter sequences (amatch would be indicative of the host organism) and nonematched any of the 12 complete and near-complete bacterialand archaeal genomes assembled from the LT January 2007site A time series [36] Three matches to the January 2010

site B assemblies (Andrade et al unpublished data) wereto contigs that could only be identified at the order levelof Halobacteriales based on best BLAST hits throughoutthe contigs and one repeat matched a contig predicted tobelong to a Natronomonas-like organism (Table 2) Themost interesting CRISPR repeat sequence match was toa contig from a likely Nanohaloarchaeon based on thetaxonomy of best BLAST hits throughout the contig A spacerassociated with that repeat matched LTVLE3 a virus orplasmid described in [7] strongly suggesting that LTVLE3 is avirus or plasmid of the Nanohaloarchaea Interestingly whileLTVLE3 was only at high abundance in viral concentratesfrom the January 2010 site B four-day time series fromwhich the only spacers targeting it were identified the mostabundant Nanohaloarchaeon Candidatus Nanosalina [21]was detected in most LT samples and is as abundant atsite A as it is at site B (isolated pools 300m apart) in 2010(Figure 3(a)) This suggests that the Nanohaloarchaea at siteB may have been adapted locally to the presence of abundantLTVLE3 at site B in January 2010

BLAST searches of all 8095 LT CRISPR spacers and 549repeat sequences revealed essentially no hits to the NCBI nrand environmental databases and no hits were detected for

10 Archaea

0

250

500

750

1000

1250

1500

1750

2000

J2007A timeseries

J2010B timeseries

All VC timeseries

TotalSingle sample or seriesTwo samples or series

Three samples or seriesFour samples or series

Spac

er h

its to

VCs

(a)

0

2

4

6

8

10

12

14

16

18

Hits to same sampleMin hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

J200

7At1

J200

7At2

J200

9B

J201

0Bt1

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

2

4

6

8

10

12

14

16

18

20

Min hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

A20

07At

1

A20

07At

2

A20

08At

1

A20

08At

2

J200

9At1

J200

9At2

J201

0Bt3

5

(c)

Figure 4

any of the repeat sequencesThree spacers had 100matchesHomo sapiens indicating that they are likely either to beerroneous spacers or representative of mobile genetic ele-ments that are not present in public databases Overall thesedata demonstrate that the LT CRISPR spacers target mobilegenetic elements that have not been previously identifiedsuggesting that LT archaea may be highly adapted to localviral predation though we cannot rule out the possibility of

public database bias (ie perhaps LT CRISPRs could targetviruses from other locations that are not represented inpublic databases) In support of local adaptation despite therelatively small number of spacer matches to the 140 LT viralcontigs gt10 kb 1729 spacers (21) had matches to unassem-bled LT viral concentrate reads This may indicate selectionfor CRISPR spacers that target locally adapted potentiallycoexisting viruses consistent with previously observed local

Archaea 11

CRISPR adaptation to coexisting viruses in an acid-minedrainage system and in geographically distinct sludge biore-actors [8 9] Because these matches are to unassembledreads we infer that most targeted LT viruses are at relativelylow abundance at least at the times and locations sampledby this study (with the exception of LTV2 and LTVLE3as described above) This may indicate successful CRISPR-mediated maintenance of viral populations at abundanceslow enough to preclude genomic assembly

In terms of the timescales on which CRISPR spacersare retained and match coexisting low-abundance virusesin the LT system we found several lines of evidence thatsupport the stability of many CRISPR spacers and low-abundance viruses over all timescales sampled by this study(up to three years) LT spacers tended to match reads frommultiple viral concentrate metagenomes collected over thecourse of days almost as often as (January 2007 site A) ormore often than (January 2010 site B) they matched readsfrom a single viral concentrate metagenome from the sametime series (Figure 4(a)) This suggests that many spacersand their targeted low-abundance viruses are stable overdays On the timescale of years spacers and their targetedcoexisting viruses were found most often across more thanone of the four sites and times sampled by viral concentrates(January 2007 site A January 2009 site B January 2010site A January 2010 site B) Similarly in all eight samplesfrom which viral concentrates were sequenced nearly asmany (and often more) spacers had hits to viral concentratereads from other samples as matched viral concentrate readsfrom the same sample (Figure 4(b)) A similar number ofmatches to the eight viral concentrates were observed forsamples from which viral concentrates were not sequenced(Figure 4(c)) Overall these results demonstrate that CRISPRregions generally retain spacer targets against low-abundanceviruses on timescales of 1ndash3 years We also infer that low-abundance viruses are likely more stable in the LT systemthan their highly dynamic higher abundance counterparts

4 Conclusions

Both virus (140 contigs gt10 kb) and host (101 16S rRNA geneOTUs) assemblage structures were highly dynamic over timeand space in the archaea-dominated LT system particularlyover months to years However archaeal and bacterial popu-lations were generally more stable than viral populations andlower abundance viruses were inferred to bemore stable thanabundant viruses Filter size-resolved analyses of viral popu-lations revealed different viral assemblages in the planktonicand host-associated (ie active and provirus) fractions sug-gesting that a higher diversity of viruses may have the poten-tial to infect than are actively infecting at any given time Con-sistent with that hypothesis CRISPR analyses revealed per-sistent targeting of lower abundance viral populations alongwith a high diversity of spacer sequences However interest-ingly the most abundant viruses (sim2 of the viral commu-nity relative to le01 formost viruses [7]) were also targetedby CRISPRs indicating that archaeal hosts strike a balancebetweenprotection against persistent low-abundance viruses

and those viruses potentially abundant enough to effect catas-trophic changes in host community structure

Acknowledgments

Funding for this work was provided by the National ScienceFoundation Award 0626526 and DE-FG02-07ER64505 fromthe Department of Energy Thanks to Cheetham Salt Works(Victoria Australia) for permission to collect samples JohnMoreau Jochen Brocks and Mike Dyall-Smith for assistancein the field Matt Lewis and the J Craig Venter Institute(JCVI) for library construction and sequencing ShannonWilliamson and Doug Fadrosh (JCVI) for training JoanneB Emerson in virus-related laboratory techniques ConnorSkennerton and Gene Tyson for early access to and assis-tance with the Crass program Christine Sun for helpfuldiscussions and two anonymous reviewers for constructivecomments that improved the paper

References

[1] F Rohwer D Prangishvili and D Lindell ldquoRoles of viruses inthe environmentrdquoEnvironmentalMicrobiology vol 11 no 11 pp2771ndash2774 2009

[2] C A Suttle ldquoMarine virusesmdashmajor players in the global eco-systemrdquoNature ReviewsMicrobiology vol 5 no 10 pp 801ndash8122007

[3] K Porter B E Russ and M L Dyall-Smith ldquoVirus-host inter-actions in salt lakesrdquo Current Opinion in Microbiology vol 10no 4 pp 418ndash424 2007

[4] R-A Sandaa and A Larsen ldquoSeasonal variations in virus-host populations in Norwegian coastal waters focusing on thecyanophage community infecting marine Synechococcus spprdquoApplied andEnvironmentalMicrobiology vol 72 no 7 pp 4610ndash4618 2006

[5] S M Short and C A Suttle ldquoTemporal dynamics of naturalcommunities of marine algal viruses and eukaryotesrdquo AquaticMicrobial Ecology vol 32 no 2 pp 107ndash119 2003

[6] B Rodriguez-Brito L Li L Wegley et al ldquoViral and microbialcommunity dynamics in four aquatic environmentsrdquo ISMEJournal vol 4 no 6 pp 739ndash751 2010

[7] J B Emerson B C Thomas K Andrade E E Allen K BHeidelberg and J F Banfield ldquoDynamic viral populations inhypersaline systems as revealed by metagenomic assemblyrdquoApplied and Environmental Microbiology vol 78 pp 6309ndash6320 2012

[8] V Kunin S He F Warnecke et al ldquoA bacterial metapopulationadapts locally to phage predation despite global dispersalrdquoGenome Research vol 18 no 2 pp 293ndash297 2008

[9] A F Andersson and J F Banfield ldquoVirus population dynamicsand acquired virus resistance in natural microbial communi-tiesrdquo Science vol 320 no 5879 pp 1047ndash1050 2008

[10] R E Anderson W J Brazelton and J A Baross ldquoUsing CRIS-PRs as ametagenomic tool to identify microbial hosts of adiffuse flow hydrothermal vent viral assemblagerdquo FEMS Micro-biology Ecology vol 77 no 1 pp 120ndash133 2011

[11] GW Tyson and J F Banfield ldquoRapidly evolving CRISPRs imp-licated in acquired resistance of microorganisms to virusesrdquoEnvironmental Microbiology vol 10 no 1 pp 200ndash207 2008

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

8 Archaea

dominated by a predatory Colpodella sp and it is unclearwhat effect top-down grazing may have on shifts in bacterialand archaeal community structure [20] However the SDstudy also suggested stability of viral populations in thatsystem and we previously demonstrated through a reanalysisof the SD data that the most abundant viral populationsfrom that study were in fact dynamic [8] so it is possiblethat different sequencing and analytical methods would haverevealed dynamics in archaeal and bacterial populations atthe SD site as well Unfortunately a reanalysis of the existingSD data using the methods in this study is not possibledue to the incompatibility of 454 sequencing reads withthe EMIRGE algorithm which is designed to reconstructnear-complete 16S rRNA genes from paired-end Illuminametagenomic data

Based on the relative abundances of the 101 archaeal andbacterial OTUs (including lower abundance LT organisms)we used hierarchical clustering to determine whether simi-larities in overall archaeal and bacterial community structurewould group LT samples according to season sample typeor filter size (Figure 2(b)) In general archaeal and bacterialcommunities sampled from the same location over daysclustered more closely than did viral assemblages from thesame samples (see above and Figure 2(a)) indicating greaterstability of host populations than viral populations over daysThe only samples for which no within-time series (ie daysscale) clustering was observed for host communities weresamples from August 2008 site A For that time seriesDNA from different filter sizes was sequenced from eachsample and different sequencing technologieswere used (454and Illumina) so it is possible that similarities between thesamples exist but were masked by different methodologiesHowever interestingly all samples and filter sizes from theAugust 2007 site A time series clustered together includingDNA from 01 08 and 30 120583m size fractions sequenced by allthree technologies (Sanger 454 and Illumina)This indicatesa distinct community at site A in August 2007 which wasstable over days and robust to methodological differences inlibrary construction and sequencing

In some cases the archaeal and bacterial communitieson the 01 120583m filters clustered together and separately fromother filters from the same time point reflecting enrichmentfor Nanohaloarchaea Specifically both 01120583m filters fromJanuary 2007 site A cluster together and they belong toa larger cluster that includes all 01120583m filters from January2010 site B The prevalence of Nanohaloarchaea in thesesamples but not in others is evident in Figure 3(a) which isbased on the 29 most abundant OTUs described above Inall other cases the 01120583m filters clustered with larger filtersfrom the same time series For the January 2009 site B single-day time series from which only DNA from 08120583m filters(and a viral concentrate) was sequenced the morning sampleclusters separately from the afternoon and evening sampleswhich cluster together indicating a shift in communitystructure over a single day This is consistent with the shift inrelative abundance of Halorubrum-like and Haloquadratum-like organisms from that time series (Figure 3) and may berelated to a temperature shift as suggested above

Interestingly a seasonal trend was not indicated in eitherthe analysis of the 29most abundant OTUs or in the 101 OTUhierarchical clustering analysis (seasonal trends in viral pop-ulations would be difficult to infer as no viral concentrateswere sequenced from winter samples) Samples from site Ain August 2007 have quite different archaeal and bacterialcommunity structures from samples at the same site inAugust 2008 and January samples from site A are also fairlydistinct year to year Although some samples from site Bcollected in January 2009 and January 2010 might appear tobe the exception (Figure 3(b)) there was as much of a shift incommunity structure over hours in January 2009 at site B asthere was across samples collected over years

As with any metagenomic study we cannot say forcertain to what extent abundance in metagenomic librariesreflects true abundances However we have taken care toavoid biases that are often associated with sequencing-basedcommunity analyses Specifically by focusing on 16S rRNAgene sequences from metagenomic data we have avoidedPCRamplification biases in our estimates of archaeal and bac-terial dynamics Similarly we were able to generate enoughviral concentrate metagenomic DNA for sequencing withoutmultiple-displacement amplification (MDA)which is knownto generate significant biases especially in viralmetagenomes(discussed in [33 34])

33 CRISPR Analyses Using the Crass algorithm [30] weidentified 549 unique repeats and 8095 unique spacersequences from clustered regularly interspaced short palin-dromic repeat (CRISPR) regions in the LT dataset Apartfrom a single sample from January 2009 site A the onlylibraries with spacers that matched any of the 140 completeand near-complete viral genomes were those from samplesfrom which viral concentrates were also sequenced (Table 2)This may indicate that different planktonic viral assemblagesexisted in LT at time points from which viral concentrateswere not sequenced which would be consistent with theviral population dynamics observed across the sequencedviral concentrates [7] Notably no spacers from LT Augustmetagenomesmatched any of the 140 viruses consistent witha possible seasonal shift in viral community structure (noviral concentrates were sequenced from August samples) ashas been observed in marine systems (eg [35]) However aconcomitant seasonal shift in host community structure wasnot supported so it is also possible that the August samplescould harbor different planktonic viral assemblages each year

The only spacer matches to any of the seven previouslydescribed complete LT viral and virus-like genomes [7] wereto LTV2 (76716 bp) and LTVLE3 (71341 bp) in libraries fromthree samples from the January 2010 site B time series(Table 2) Of the seven viral concentrate genomes those twoachieved the highest abundance by approximately one orderof magnitude [7] and they were at their most abundant in thetime series fromwhich spacers targeting themwere detectedThis indicates that LT CRISPRs can actively target abundantcoexisting viruses consistent with previous observations ofCRISPR sampling of coexisting viruses in an acid-minedrainage system [9 11] However it should be noted that thevast majority of spacers do not match any of the 140 viral

Archaea 9

Table 2 CRISPR analyses by sample

Sample Unique repeats Unique spacers Hits to 140 viruses Viral contig match(es) Predicted hostJ2007At1 67 681 1 scaffold 16J2007At2 82 633 1 Contig999004A2007At1 30 141 0A2007At2 64 554 0A2008At1 20 275 0A2008At2 23 121 0J2009At1 41 163 1 scaffold 117J2009At2 40 173 0J2009B 29 114 0J2010Bt1 22 501 0

J2010Bt2 79 1798 4

LTV2LTV2

LTVLE3Contig999004

J2010Bt3 58 1171 8

LTVLE3Contig1100059 Natronomonas-likeContig998975scaffold 55scaffold 29LTVLE3LTVLE3 NanohaloarchaeaLTVLE3 Nanohaloarchaea

J2010Bt35 60 930 4

LTV2LTVLE3 Nanohaloarchaea

Contig999004Contig998975

J2010Bt4 43 853 1 Contig999004J2010A 20 340 0

contigs gt 10 kb which we assume to be among the mostabundant viruses because they assembled significantly Thissuggests thatmost CRISPRs target rare viruses An alternativeexplanation might be a preponderance of inactive vestigeCRISPR regions but if that were the case then we wouldexpect to see the same CRISPR spacers duplicated acrosssamples (ie CRISPR regions would be clonally inheritedover multiple generations as opposed to actively integratingnew spacers on subgenerational timescales) Given that only353 spacer sequences were duplicated across samples relativeto 8095 unique spacer sequences in the dataset we infereither that most CRISPR regions were highly dynamic andorthat we detected different CRISPR regions over time due toshifts in host community structure Regardless there is a highdiversity of spacers in the LT system consistent with the highdiversity of viruses

Of the CRISPR regions that had matches to the 140LT viral contigs only five had repeats with matches toassemblies from 01 08 andor 30 120583m filter sequences (amatch would be indicative of the host organism) and nonematched any of the 12 complete and near-complete bacterialand archaeal genomes assembled from the LT January 2007site A time series [36] Three matches to the January 2010

site B assemblies (Andrade et al unpublished data) wereto contigs that could only be identified at the order levelof Halobacteriales based on best BLAST hits throughoutthe contigs and one repeat matched a contig predicted tobelong to a Natronomonas-like organism (Table 2) Themost interesting CRISPR repeat sequence match was toa contig from a likely Nanohaloarchaeon based on thetaxonomy of best BLAST hits throughout the contig A spacerassociated with that repeat matched LTVLE3 a virus orplasmid described in [7] strongly suggesting that LTVLE3 is avirus or plasmid of the Nanohaloarchaea Interestingly whileLTVLE3 was only at high abundance in viral concentratesfrom the January 2010 site B four-day time series fromwhich the only spacers targeting it were identified the mostabundant Nanohaloarchaeon Candidatus Nanosalina [21]was detected in most LT samples and is as abundant atsite A as it is at site B (isolated pools 300m apart) in 2010(Figure 3(a)) This suggests that the Nanohaloarchaea at siteB may have been adapted locally to the presence of abundantLTVLE3 at site B in January 2010

BLAST searches of all 8095 LT CRISPR spacers and 549repeat sequences revealed essentially no hits to the NCBI nrand environmental databases and no hits were detected for

10 Archaea

0

250

500

750

1000

1250

1500

1750

2000

J2007A timeseries

J2010B timeseries

All VC timeseries

TotalSingle sample or seriesTwo samples or series

Three samples or seriesFour samples or series

Spac

er h

its to

VCs

(a)

0

2

4

6

8

10

12

14

16

18

Hits to same sampleMin hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

J200

7At1

J200

7At2

J200

9B

J201

0Bt1

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

2

4

6

8

10

12

14

16

18

20

Min hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

A20

07At

1

A20

07At

2

A20

08At

1

A20

08At

2

J200

9At1

J200

9At2

J201

0Bt3

5

(c)

Figure 4

any of the repeat sequencesThree spacers had 100matchesHomo sapiens indicating that they are likely either to beerroneous spacers or representative of mobile genetic ele-ments that are not present in public databases Overall thesedata demonstrate that the LT CRISPR spacers target mobilegenetic elements that have not been previously identifiedsuggesting that LT archaea may be highly adapted to localviral predation though we cannot rule out the possibility of

public database bias (ie perhaps LT CRISPRs could targetviruses from other locations that are not represented inpublic databases) In support of local adaptation despite therelatively small number of spacer matches to the 140 LT viralcontigs gt10 kb 1729 spacers (21) had matches to unassem-bled LT viral concentrate reads This may indicate selectionfor CRISPR spacers that target locally adapted potentiallycoexisting viruses consistent with previously observed local

Archaea 11

CRISPR adaptation to coexisting viruses in an acid-minedrainage system and in geographically distinct sludge biore-actors [8 9] Because these matches are to unassembledreads we infer that most targeted LT viruses are at relativelylow abundance at least at the times and locations sampledby this study (with the exception of LTV2 and LTVLE3as described above) This may indicate successful CRISPR-mediated maintenance of viral populations at abundanceslow enough to preclude genomic assembly

In terms of the timescales on which CRISPR spacersare retained and match coexisting low-abundance virusesin the LT system we found several lines of evidence thatsupport the stability of many CRISPR spacers and low-abundance viruses over all timescales sampled by this study(up to three years) LT spacers tended to match reads frommultiple viral concentrate metagenomes collected over thecourse of days almost as often as (January 2007 site A) ormore often than (January 2010 site B) they matched readsfrom a single viral concentrate metagenome from the sametime series (Figure 4(a)) This suggests that many spacersand their targeted low-abundance viruses are stable overdays On the timescale of years spacers and their targetedcoexisting viruses were found most often across more thanone of the four sites and times sampled by viral concentrates(January 2007 site A January 2009 site B January 2010site A January 2010 site B) Similarly in all eight samplesfrom which viral concentrates were sequenced nearly asmany (and often more) spacers had hits to viral concentratereads from other samples as matched viral concentrate readsfrom the same sample (Figure 4(b)) A similar number ofmatches to the eight viral concentrates were observed forsamples from which viral concentrates were not sequenced(Figure 4(c)) Overall these results demonstrate that CRISPRregions generally retain spacer targets against low-abundanceviruses on timescales of 1ndash3 years We also infer that low-abundance viruses are likely more stable in the LT systemthan their highly dynamic higher abundance counterparts

4 Conclusions

Both virus (140 contigs gt10 kb) and host (101 16S rRNA geneOTUs) assemblage structures were highly dynamic over timeand space in the archaea-dominated LT system particularlyover months to years However archaeal and bacterial popu-lations were generally more stable than viral populations andlower abundance viruses were inferred to bemore stable thanabundant viruses Filter size-resolved analyses of viral popu-lations revealed different viral assemblages in the planktonicand host-associated (ie active and provirus) fractions sug-gesting that a higher diversity of viruses may have the poten-tial to infect than are actively infecting at any given time Con-sistent with that hypothesis CRISPR analyses revealed per-sistent targeting of lower abundance viral populations alongwith a high diversity of spacer sequences However interest-ingly the most abundant viruses (sim2 of the viral commu-nity relative to le01 formost viruses [7]) were also targetedby CRISPRs indicating that archaeal hosts strike a balancebetweenprotection against persistent low-abundance viruses

and those viruses potentially abundant enough to effect catas-trophic changes in host community structure

Acknowledgments

Funding for this work was provided by the National ScienceFoundation Award 0626526 and DE-FG02-07ER64505 fromthe Department of Energy Thanks to Cheetham Salt Works(Victoria Australia) for permission to collect samples JohnMoreau Jochen Brocks and Mike Dyall-Smith for assistancein the field Matt Lewis and the J Craig Venter Institute(JCVI) for library construction and sequencing ShannonWilliamson and Doug Fadrosh (JCVI) for training JoanneB Emerson in virus-related laboratory techniques ConnorSkennerton and Gene Tyson for early access to and assis-tance with the Crass program Christine Sun for helpfuldiscussions and two anonymous reviewers for constructivecomments that improved the paper

References

[1] F Rohwer D Prangishvili and D Lindell ldquoRoles of viruses inthe environmentrdquoEnvironmentalMicrobiology vol 11 no 11 pp2771ndash2774 2009

[2] C A Suttle ldquoMarine virusesmdashmajor players in the global eco-systemrdquoNature ReviewsMicrobiology vol 5 no 10 pp 801ndash8122007

[3] K Porter B E Russ and M L Dyall-Smith ldquoVirus-host inter-actions in salt lakesrdquo Current Opinion in Microbiology vol 10no 4 pp 418ndash424 2007

[4] R-A Sandaa and A Larsen ldquoSeasonal variations in virus-host populations in Norwegian coastal waters focusing on thecyanophage community infecting marine Synechococcus spprdquoApplied andEnvironmentalMicrobiology vol 72 no 7 pp 4610ndash4618 2006

[5] S M Short and C A Suttle ldquoTemporal dynamics of naturalcommunities of marine algal viruses and eukaryotesrdquo AquaticMicrobial Ecology vol 32 no 2 pp 107ndash119 2003

[6] B Rodriguez-Brito L Li L Wegley et al ldquoViral and microbialcommunity dynamics in four aquatic environmentsrdquo ISMEJournal vol 4 no 6 pp 739ndash751 2010

[7] J B Emerson B C Thomas K Andrade E E Allen K BHeidelberg and J F Banfield ldquoDynamic viral populations inhypersaline systems as revealed by metagenomic assemblyrdquoApplied and Environmental Microbiology vol 78 pp 6309ndash6320 2012

[8] V Kunin S He F Warnecke et al ldquoA bacterial metapopulationadapts locally to phage predation despite global dispersalrdquoGenome Research vol 18 no 2 pp 293ndash297 2008

[9] A F Andersson and J F Banfield ldquoVirus population dynamicsand acquired virus resistance in natural microbial communi-tiesrdquo Science vol 320 no 5879 pp 1047ndash1050 2008

[10] R E Anderson W J Brazelton and J A Baross ldquoUsing CRIS-PRs as ametagenomic tool to identify microbial hosts of adiffuse flow hydrothermal vent viral assemblagerdquo FEMS Micro-biology Ecology vol 77 no 1 pp 120ndash133 2011

[11] GW Tyson and J F Banfield ldquoRapidly evolving CRISPRs imp-licated in acquired resistance of microorganisms to virusesrdquoEnvironmental Microbiology vol 10 no 1 pp 200ndash207 2008

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Archaea 9

Table 2 CRISPR analyses by sample

Sample Unique repeats Unique spacers Hits to 140 viruses Viral contig match(es) Predicted hostJ2007At1 67 681 1 scaffold 16J2007At2 82 633 1 Contig999004A2007At1 30 141 0A2007At2 64 554 0A2008At1 20 275 0A2008At2 23 121 0J2009At1 41 163 1 scaffold 117J2009At2 40 173 0J2009B 29 114 0J2010Bt1 22 501 0

J2010Bt2 79 1798 4

LTV2LTV2

LTVLE3Contig999004

J2010Bt3 58 1171 8

LTVLE3Contig1100059 Natronomonas-likeContig998975scaffold 55scaffold 29LTVLE3LTVLE3 NanohaloarchaeaLTVLE3 Nanohaloarchaea

J2010Bt35 60 930 4

LTV2LTVLE3 Nanohaloarchaea

Contig999004Contig998975

J2010Bt4 43 853 1 Contig999004J2010A 20 340 0

contigs gt 10 kb which we assume to be among the mostabundant viruses because they assembled significantly Thissuggests thatmost CRISPRs target rare viruses An alternativeexplanation might be a preponderance of inactive vestigeCRISPR regions but if that were the case then we wouldexpect to see the same CRISPR spacers duplicated acrosssamples (ie CRISPR regions would be clonally inheritedover multiple generations as opposed to actively integratingnew spacers on subgenerational timescales) Given that only353 spacer sequences were duplicated across samples relativeto 8095 unique spacer sequences in the dataset we infereither that most CRISPR regions were highly dynamic andorthat we detected different CRISPR regions over time due toshifts in host community structure Regardless there is a highdiversity of spacers in the LT system consistent with the highdiversity of viruses

Of the CRISPR regions that had matches to the 140LT viral contigs only five had repeats with matches toassemblies from 01 08 andor 30 120583m filter sequences (amatch would be indicative of the host organism) and nonematched any of the 12 complete and near-complete bacterialand archaeal genomes assembled from the LT January 2007site A time series [36] Three matches to the January 2010

site B assemblies (Andrade et al unpublished data) wereto contigs that could only be identified at the order levelof Halobacteriales based on best BLAST hits throughoutthe contigs and one repeat matched a contig predicted tobelong to a Natronomonas-like organism (Table 2) Themost interesting CRISPR repeat sequence match was toa contig from a likely Nanohaloarchaeon based on thetaxonomy of best BLAST hits throughout the contig A spacerassociated with that repeat matched LTVLE3 a virus orplasmid described in [7] strongly suggesting that LTVLE3 is avirus or plasmid of the Nanohaloarchaea Interestingly whileLTVLE3 was only at high abundance in viral concentratesfrom the January 2010 site B four-day time series fromwhich the only spacers targeting it were identified the mostabundant Nanohaloarchaeon Candidatus Nanosalina [21]was detected in most LT samples and is as abundant atsite A as it is at site B (isolated pools 300m apart) in 2010(Figure 3(a)) This suggests that the Nanohaloarchaea at siteB may have been adapted locally to the presence of abundantLTVLE3 at site B in January 2010

BLAST searches of all 8095 LT CRISPR spacers and 549repeat sequences revealed essentially no hits to the NCBI nrand environmental databases and no hits were detected for

10 Archaea

0

250

500

750

1000

1250

1500

1750

2000

J2007A timeseries

J2010B timeseries

All VC timeseries

TotalSingle sample or seriesTwo samples or series

Three samples or seriesFour samples or series

Spac

er h

its to

VCs

(a)

0

2

4

6

8

10

12

14

16

18

Hits to same sampleMin hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

J200

7At1

J200

7At2

J200

9B

J201

0Bt1

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

2

4

6

8

10

12

14

16

18

20

Min hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

A20

07At

1

A20

07At

2

A20

08At

1

A20

08At

2

J200

9At1

J200

9At2

J201

0Bt3

5

(c)

Figure 4

any of the repeat sequencesThree spacers had 100matchesHomo sapiens indicating that they are likely either to beerroneous spacers or representative of mobile genetic ele-ments that are not present in public databases Overall thesedata demonstrate that the LT CRISPR spacers target mobilegenetic elements that have not been previously identifiedsuggesting that LT archaea may be highly adapted to localviral predation though we cannot rule out the possibility of

public database bias (ie perhaps LT CRISPRs could targetviruses from other locations that are not represented inpublic databases) In support of local adaptation despite therelatively small number of spacer matches to the 140 LT viralcontigs gt10 kb 1729 spacers (21) had matches to unassem-bled LT viral concentrate reads This may indicate selectionfor CRISPR spacers that target locally adapted potentiallycoexisting viruses consistent with previously observed local

Archaea 11

CRISPR adaptation to coexisting viruses in an acid-minedrainage system and in geographically distinct sludge biore-actors [8 9] Because these matches are to unassembledreads we infer that most targeted LT viruses are at relativelylow abundance at least at the times and locations sampledby this study (with the exception of LTV2 and LTVLE3as described above) This may indicate successful CRISPR-mediated maintenance of viral populations at abundanceslow enough to preclude genomic assembly

In terms of the timescales on which CRISPR spacersare retained and match coexisting low-abundance virusesin the LT system we found several lines of evidence thatsupport the stability of many CRISPR spacers and low-abundance viruses over all timescales sampled by this study(up to three years) LT spacers tended to match reads frommultiple viral concentrate metagenomes collected over thecourse of days almost as often as (January 2007 site A) ormore often than (January 2010 site B) they matched readsfrom a single viral concentrate metagenome from the sametime series (Figure 4(a)) This suggests that many spacersand their targeted low-abundance viruses are stable overdays On the timescale of years spacers and their targetedcoexisting viruses were found most often across more thanone of the four sites and times sampled by viral concentrates(January 2007 site A January 2009 site B January 2010site A January 2010 site B) Similarly in all eight samplesfrom which viral concentrates were sequenced nearly asmany (and often more) spacers had hits to viral concentratereads from other samples as matched viral concentrate readsfrom the same sample (Figure 4(b)) A similar number ofmatches to the eight viral concentrates were observed forsamples from which viral concentrates were not sequenced(Figure 4(c)) Overall these results demonstrate that CRISPRregions generally retain spacer targets against low-abundanceviruses on timescales of 1ndash3 years We also infer that low-abundance viruses are likely more stable in the LT systemthan their highly dynamic higher abundance counterparts

4 Conclusions

Both virus (140 contigs gt10 kb) and host (101 16S rRNA geneOTUs) assemblage structures were highly dynamic over timeand space in the archaea-dominated LT system particularlyover months to years However archaeal and bacterial popu-lations were generally more stable than viral populations andlower abundance viruses were inferred to bemore stable thanabundant viruses Filter size-resolved analyses of viral popu-lations revealed different viral assemblages in the planktonicand host-associated (ie active and provirus) fractions sug-gesting that a higher diversity of viruses may have the poten-tial to infect than are actively infecting at any given time Con-sistent with that hypothesis CRISPR analyses revealed per-sistent targeting of lower abundance viral populations alongwith a high diversity of spacer sequences However interest-ingly the most abundant viruses (sim2 of the viral commu-nity relative to le01 formost viruses [7]) were also targetedby CRISPRs indicating that archaeal hosts strike a balancebetweenprotection against persistent low-abundance viruses

and those viruses potentially abundant enough to effect catas-trophic changes in host community structure

Acknowledgments

Funding for this work was provided by the National ScienceFoundation Award 0626526 and DE-FG02-07ER64505 fromthe Department of Energy Thanks to Cheetham Salt Works(Victoria Australia) for permission to collect samples JohnMoreau Jochen Brocks and Mike Dyall-Smith for assistancein the field Matt Lewis and the J Craig Venter Institute(JCVI) for library construction and sequencing ShannonWilliamson and Doug Fadrosh (JCVI) for training JoanneB Emerson in virus-related laboratory techniques ConnorSkennerton and Gene Tyson for early access to and assis-tance with the Crass program Christine Sun for helpfuldiscussions and two anonymous reviewers for constructivecomments that improved the paper

References

[1] F Rohwer D Prangishvili and D Lindell ldquoRoles of viruses inthe environmentrdquoEnvironmentalMicrobiology vol 11 no 11 pp2771ndash2774 2009

[2] C A Suttle ldquoMarine virusesmdashmajor players in the global eco-systemrdquoNature ReviewsMicrobiology vol 5 no 10 pp 801ndash8122007

[3] K Porter B E Russ and M L Dyall-Smith ldquoVirus-host inter-actions in salt lakesrdquo Current Opinion in Microbiology vol 10no 4 pp 418ndash424 2007

[4] R-A Sandaa and A Larsen ldquoSeasonal variations in virus-host populations in Norwegian coastal waters focusing on thecyanophage community infecting marine Synechococcus spprdquoApplied andEnvironmentalMicrobiology vol 72 no 7 pp 4610ndash4618 2006

[5] S M Short and C A Suttle ldquoTemporal dynamics of naturalcommunities of marine algal viruses and eukaryotesrdquo AquaticMicrobial Ecology vol 32 no 2 pp 107ndash119 2003

[6] B Rodriguez-Brito L Li L Wegley et al ldquoViral and microbialcommunity dynamics in four aquatic environmentsrdquo ISMEJournal vol 4 no 6 pp 739ndash751 2010

[7] J B Emerson B C Thomas K Andrade E E Allen K BHeidelberg and J F Banfield ldquoDynamic viral populations inhypersaline systems as revealed by metagenomic assemblyrdquoApplied and Environmental Microbiology vol 78 pp 6309ndash6320 2012

[8] V Kunin S He F Warnecke et al ldquoA bacterial metapopulationadapts locally to phage predation despite global dispersalrdquoGenome Research vol 18 no 2 pp 293ndash297 2008

[9] A F Andersson and J F Banfield ldquoVirus population dynamicsand acquired virus resistance in natural microbial communi-tiesrdquo Science vol 320 no 5879 pp 1047ndash1050 2008

[10] R E Anderson W J Brazelton and J A Baross ldquoUsing CRIS-PRs as ametagenomic tool to identify microbial hosts of adiffuse flow hydrothermal vent viral assemblagerdquo FEMS Micro-biology Ecology vol 77 no 1 pp 120ndash133 2011

[11] GW Tyson and J F Banfield ldquoRapidly evolving CRISPRs imp-licated in acquired resistance of microorganisms to virusesrdquoEnvironmental Microbiology vol 10 no 1 pp 200ndash207 2008

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

10 Archaea

0

250

500

750

1000

1250

1500

1750

2000

J2007A timeseries

J2010B timeseries

All VC timeseries

TotalSingle sample or seriesTwo samples or series

Three samples or seriesFour samples or series

Spac

er h

its to

VCs

(a)

0

2

4

6

8

10

12

14

16

18

Hits to same sampleMin hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

J200

7At1

J200

7At2

J200

9B

J201

0Bt1

J201

0Bt2

J201

0Bt3

J201

0Bt4

J201

0A

(b)

0

2

4

6

8

10

12

14

16

18

20

Min hits to other samplesMax hits to other samples

Spac

er h

its to

VCs

()

A20

07At

1

A20

07At

2

A20

08At

1

A20

08At

2

J200

9At1

J200

9At2

J201

0Bt3

5

(c)

Figure 4

any of the repeat sequencesThree spacers had 100matchesHomo sapiens indicating that they are likely either to beerroneous spacers or representative of mobile genetic ele-ments that are not present in public databases Overall thesedata demonstrate that the LT CRISPR spacers target mobilegenetic elements that have not been previously identifiedsuggesting that LT archaea may be highly adapted to localviral predation though we cannot rule out the possibility of

public database bias (ie perhaps LT CRISPRs could targetviruses from other locations that are not represented inpublic databases) In support of local adaptation despite therelatively small number of spacer matches to the 140 LT viralcontigs gt10 kb 1729 spacers (21) had matches to unassem-bled LT viral concentrate reads This may indicate selectionfor CRISPR spacers that target locally adapted potentiallycoexisting viruses consistent with previously observed local

Archaea 11

CRISPR adaptation to coexisting viruses in an acid-minedrainage system and in geographically distinct sludge biore-actors [8 9] Because these matches are to unassembledreads we infer that most targeted LT viruses are at relativelylow abundance at least at the times and locations sampledby this study (with the exception of LTV2 and LTVLE3as described above) This may indicate successful CRISPR-mediated maintenance of viral populations at abundanceslow enough to preclude genomic assembly

In terms of the timescales on which CRISPR spacersare retained and match coexisting low-abundance virusesin the LT system we found several lines of evidence thatsupport the stability of many CRISPR spacers and low-abundance viruses over all timescales sampled by this study(up to three years) LT spacers tended to match reads frommultiple viral concentrate metagenomes collected over thecourse of days almost as often as (January 2007 site A) ormore often than (January 2010 site B) they matched readsfrom a single viral concentrate metagenome from the sametime series (Figure 4(a)) This suggests that many spacersand their targeted low-abundance viruses are stable overdays On the timescale of years spacers and their targetedcoexisting viruses were found most often across more thanone of the four sites and times sampled by viral concentrates(January 2007 site A January 2009 site B January 2010site A January 2010 site B) Similarly in all eight samplesfrom which viral concentrates were sequenced nearly asmany (and often more) spacers had hits to viral concentratereads from other samples as matched viral concentrate readsfrom the same sample (Figure 4(b)) A similar number ofmatches to the eight viral concentrates were observed forsamples from which viral concentrates were not sequenced(Figure 4(c)) Overall these results demonstrate that CRISPRregions generally retain spacer targets against low-abundanceviruses on timescales of 1ndash3 years We also infer that low-abundance viruses are likely more stable in the LT systemthan their highly dynamic higher abundance counterparts

4 Conclusions

Both virus (140 contigs gt10 kb) and host (101 16S rRNA geneOTUs) assemblage structures were highly dynamic over timeand space in the archaea-dominated LT system particularlyover months to years However archaeal and bacterial popu-lations were generally more stable than viral populations andlower abundance viruses were inferred to bemore stable thanabundant viruses Filter size-resolved analyses of viral popu-lations revealed different viral assemblages in the planktonicand host-associated (ie active and provirus) fractions sug-gesting that a higher diversity of viruses may have the poten-tial to infect than are actively infecting at any given time Con-sistent with that hypothesis CRISPR analyses revealed per-sistent targeting of lower abundance viral populations alongwith a high diversity of spacer sequences However interest-ingly the most abundant viruses (sim2 of the viral commu-nity relative to le01 formost viruses [7]) were also targetedby CRISPRs indicating that archaeal hosts strike a balancebetweenprotection against persistent low-abundance viruses

and those viruses potentially abundant enough to effect catas-trophic changes in host community structure

Acknowledgments

Funding for this work was provided by the National ScienceFoundation Award 0626526 and DE-FG02-07ER64505 fromthe Department of Energy Thanks to Cheetham Salt Works(Victoria Australia) for permission to collect samples JohnMoreau Jochen Brocks and Mike Dyall-Smith for assistancein the field Matt Lewis and the J Craig Venter Institute(JCVI) for library construction and sequencing ShannonWilliamson and Doug Fadrosh (JCVI) for training JoanneB Emerson in virus-related laboratory techniques ConnorSkennerton and Gene Tyson for early access to and assis-tance with the Crass program Christine Sun for helpfuldiscussions and two anonymous reviewers for constructivecomments that improved the paper

References

[1] F Rohwer D Prangishvili and D Lindell ldquoRoles of viruses inthe environmentrdquoEnvironmentalMicrobiology vol 11 no 11 pp2771ndash2774 2009

[2] C A Suttle ldquoMarine virusesmdashmajor players in the global eco-systemrdquoNature ReviewsMicrobiology vol 5 no 10 pp 801ndash8122007

[3] K Porter B E Russ and M L Dyall-Smith ldquoVirus-host inter-actions in salt lakesrdquo Current Opinion in Microbiology vol 10no 4 pp 418ndash424 2007

[4] R-A Sandaa and A Larsen ldquoSeasonal variations in virus-host populations in Norwegian coastal waters focusing on thecyanophage community infecting marine Synechococcus spprdquoApplied andEnvironmentalMicrobiology vol 72 no 7 pp 4610ndash4618 2006

[5] S M Short and C A Suttle ldquoTemporal dynamics of naturalcommunities of marine algal viruses and eukaryotesrdquo AquaticMicrobial Ecology vol 32 no 2 pp 107ndash119 2003

[6] B Rodriguez-Brito L Li L Wegley et al ldquoViral and microbialcommunity dynamics in four aquatic environmentsrdquo ISMEJournal vol 4 no 6 pp 739ndash751 2010

[7] J B Emerson B C Thomas K Andrade E E Allen K BHeidelberg and J F Banfield ldquoDynamic viral populations inhypersaline systems as revealed by metagenomic assemblyrdquoApplied and Environmental Microbiology vol 78 pp 6309ndash6320 2012

[8] V Kunin S He F Warnecke et al ldquoA bacterial metapopulationadapts locally to phage predation despite global dispersalrdquoGenome Research vol 18 no 2 pp 293ndash297 2008

[9] A F Andersson and J F Banfield ldquoVirus population dynamicsand acquired virus resistance in natural microbial communi-tiesrdquo Science vol 320 no 5879 pp 1047ndash1050 2008

[10] R E Anderson W J Brazelton and J A Baross ldquoUsing CRIS-PRs as ametagenomic tool to identify microbial hosts of adiffuse flow hydrothermal vent viral assemblagerdquo FEMS Micro-biology Ecology vol 77 no 1 pp 120ndash133 2011

[11] GW Tyson and J F Banfield ldquoRapidly evolving CRISPRs imp-licated in acquired resistance of microorganisms to virusesrdquoEnvironmental Microbiology vol 10 no 1 pp 200ndash207 2008

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Archaea 11

CRISPR adaptation to coexisting viruses in an acid-minedrainage system and in geographically distinct sludge biore-actors [8 9] Because these matches are to unassembledreads we infer that most targeted LT viruses are at relativelylow abundance at least at the times and locations sampledby this study (with the exception of LTV2 and LTVLE3as described above) This may indicate successful CRISPR-mediated maintenance of viral populations at abundanceslow enough to preclude genomic assembly

In terms of the timescales on which CRISPR spacersare retained and match coexisting low-abundance virusesin the LT system we found several lines of evidence thatsupport the stability of many CRISPR spacers and low-abundance viruses over all timescales sampled by this study(up to three years) LT spacers tended to match reads frommultiple viral concentrate metagenomes collected over thecourse of days almost as often as (January 2007 site A) ormore often than (January 2010 site B) they matched readsfrom a single viral concentrate metagenome from the sametime series (Figure 4(a)) This suggests that many spacersand their targeted low-abundance viruses are stable overdays On the timescale of years spacers and their targetedcoexisting viruses were found most often across more thanone of the four sites and times sampled by viral concentrates(January 2007 site A January 2009 site B January 2010site A January 2010 site B) Similarly in all eight samplesfrom which viral concentrates were sequenced nearly asmany (and often more) spacers had hits to viral concentratereads from other samples as matched viral concentrate readsfrom the same sample (Figure 4(b)) A similar number ofmatches to the eight viral concentrates were observed forsamples from which viral concentrates were not sequenced(Figure 4(c)) Overall these results demonstrate that CRISPRregions generally retain spacer targets against low-abundanceviruses on timescales of 1ndash3 years We also infer that low-abundance viruses are likely more stable in the LT systemthan their highly dynamic higher abundance counterparts

4 Conclusions

Both virus (140 contigs gt10 kb) and host (101 16S rRNA geneOTUs) assemblage structures were highly dynamic over timeand space in the archaea-dominated LT system particularlyover months to years However archaeal and bacterial popu-lations were generally more stable than viral populations andlower abundance viruses were inferred to bemore stable thanabundant viruses Filter size-resolved analyses of viral popu-lations revealed different viral assemblages in the planktonicand host-associated (ie active and provirus) fractions sug-gesting that a higher diversity of viruses may have the poten-tial to infect than are actively infecting at any given time Con-sistent with that hypothesis CRISPR analyses revealed per-sistent targeting of lower abundance viral populations alongwith a high diversity of spacer sequences However interest-ingly the most abundant viruses (sim2 of the viral commu-nity relative to le01 formost viruses [7]) were also targetedby CRISPRs indicating that archaeal hosts strike a balancebetweenprotection against persistent low-abundance viruses

and those viruses potentially abundant enough to effect catas-trophic changes in host community structure

Acknowledgments

Funding for this work was provided by the National ScienceFoundation Award 0626526 and DE-FG02-07ER64505 fromthe Department of Energy Thanks to Cheetham Salt Works(Victoria Australia) for permission to collect samples JohnMoreau Jochen Brocks and Mike Dyall-Smith for assistancein the field Matt Lewis and the J Craig Venter Institute(JCVI) for library construction and sequencing ShannonWilliamson and Doug Fadrosh (JCVI) for training JoanneB Emerson in virus-related laboratory techniques ConnorSkennerton and Gene Tyson for early access to and assis-tance with the Crass program Christine Sun for helpfuldiscussions and two anonymous reviewers for constructivecomments that improved the paper

References

[1] F Rohwer D Prangishvili and D Lindell ldquoRoles of viruses inthe environmentrdquoEnvironmentalMicrobiology vol 11 no 11 pp2771ndash2774 2009

[2] C A Suttle ldquoMarine virusesmdashmajor players in the global eco-systemrdquoNature ReviewsMicrobiology vol 5 no 10 pp 801ndash8122007

[3] K Porter B E Russ and M L Dyall-Smith ldquoVirus-host inter-actions in salt lakesrdquo Current Opinion in Microbiology vol 10no 4 pp 418ndash424 2007

[4] R-A Sandaa and A Larsen ldquoSeasonal variations in virus-host populations in Norwegian coastal waters focusing on thecyanophage community infecting marine Synechococcus spprdquoApplied andEnvironmentalMicrobiology vol 72 no 7 pp 4610ndash4618 2006

[5] S M Short and C A Suttle ldquoTemporal dynamics of naturalcommunities of marine algal viruses and eukaryotesrdquo AquaticMicrobial Ecology vol 32 no 2 pp 107ndash119 2003

[6] B Rodriguez-Brito L Li L Wegley et al ldquoViral and microbialcommunity dynamics in four aquatic environmentsrdquo ISMEJournal vol 4 no 6 pp 739ndash751 2010

[7] J B Emerson B C Thomas K Andrade E E Allen K BHeidelberg and J F Banfield ldquoDynamic viral populations inhypersaline systems as revealed by metagenomic assemblyrdquoApplied and Environmental Microbiology vol 78 pp 6309ndash6320 2012

[8] V Kunin S He F Warnecke et al ldquoA bacterial metapopulationadapts locally to phage predation despite global dispersalrdquoGenome Research vol 18 no 2 pp 293ndash297 2008

[9] A F Andersson and J F Banfield ldquoVirus population dynamicsand acquired virus resistance in natural microbial communi-tiesrdquo Science vol 320 no 5879 pp 1047ndash1050 2008

[10] R E Anderson W J Brazelton and J A Baross ldquoUsing CRIS-PRs as ametagenomic tool to identify microbial hosts of adiffuse flow hydrothermal vent viral assemblagerdquo FEMS Micro-biology Ecology vol 77 no 1 pp 120ndash133 2011

[11] GW Tyson and J F Banfield ldquoRapidly evolving CRISPRs imp-licated in acquired resistance of microorganisms to virusesrdquoEnvironmental Microbiology vol 10 no 1 pp 200ndash207 2008

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

12 Archaea

[12] J F Heidelberg W C Nelson T Schoenfeld and D BhayaldquoGerm warfare in a microbial mat community CRISPRs pro-vide insights into the co-evolution of host and viral genomesrdquoPLoS ONE vol 4 no 1 Article ID e4169 2009

[13] R Barrangou C Fremaux H Deveau et al ldquoCRISPR providesacquired resistance against viruses in prokaryotesrdquo Science vol315 no 5819 pp 1709ndash1712 2007

[14] F J M Mojica C Dıez-Villasenor J Garcıa-Martınez and ESoria ldquoIntervening sequences of regularly spaced prokaryoticrepeats derive from foreign genetic elementsrdquo Journal of Molec-ular Evolution vol 60 no 2 pp 174ndash182 2005

[15] R Sorek V Kunin and P Hugenholtz ldquoCRISPRmdasha widespreadsystem that provides acquired resistance against phages inbacteria and archaeardquo Nature Reviews Microbiology vol 6 no3 pp 181ndash186 2008

[16] BWiedenheft S H Sternberg and J A Doudna ldquoRNA-guidedgenetic silencing systems in bacteria and archaeardquo Nature vol482 no 7385 pp 331ndash338 2012

[17] R T DeBoy E F Mongodin J B Emerson and K E NelsonldquoChromosome evolution in the Thermotogales large-scaleinversions and strain diversification of CRISPR sequencesrdquoJournal of Bacteriology vol 188 no 7 pp 2364ndash2374 2006

[18] N L Held A Herrera H C Quiroz and R J WhitakerldquoCRISPR associated diversity within a population of Sulfolobusislandicusrdquo PLoS ONE vol 5 no 9 Article ID e12988 2010

[19] N L Held and R J Whitaker ldquoViral biogeography revealedby signatures in Sulfolobus islandicus genomesrdquo EnvironmentalMicrobiology vol 11 no 2 pp 457ndash466 2009

[20] K B Heidelberg W C Nelson J B Holm N Eisenkolb KAndrade and J B Emerson ldquoCharacterization of eukaryoticmicrobial diversity in hypersaline Lake Tyrrell AustraliardquoFrontiers in Microbiology vol 4 Article ID 115 2013

[21] P Narasingarao S Podell J A Ugalde et al ldquoDe novo metage-nomic assembly reveals abundant novel major lineage of Arch-aea in hypersaline microbial communitiesrdquo ISME Journal vol6 no 1 pp 81ndash93 2012

[22] Y Peng H C M Leung S M Yiu and F Y L Chin ldquoIDBA-UD a de novo assembler for single-cell andmetagenomic sequ-encing data with highly uneven depthrdquo Bioinformatics vol 28no 11 pp 1420ndash1428 2012

[23] M Margulies M Egholm W E Altman et al ldquoGenome sequ-encing in microfabricated high-density picolitre reactorsrdquoNature vol 437 pp 376ndash380 2005

[24] H Li and R Durbin ldquoFast and accurate short read alignmentwith Burrows-Wheeler transformrdquo Bioinformatics vol 25 no14 pp 1754ndash1760 2009

[25] C S Miller B J Baker B C Thomas S W Singer and J FBanfield ldquoEMIRGE reconstruction of full-length ribosomalgenes from microbial community short read sequencing datardquoGenome Biology vol 12 no 5 article R44 2011

[26] R C Edgar ldquoSearch and clustering orders of magnitude fasterthan BLASTrdquo Bioinformatics vol 26 no 19 pp 2460ndash24612010

[27] E Pruesse J Peplies and F O Glockner ldquoSINA accurate high-throughput multiple sequence alignment of ribosomal RNAgenesrdquo Bioinformatics vol 28 pp 1823ndash1829 2012

[28] E Pruesse C Quast K Knittel et al ldquoSILVA a comprehensiveonline resource for quality checked and aligned ribosomal RNAsequence data compatible with ARBrdquo Nucleic Acids Researchvol 35 no 21 pp 7188ndash7196 2007

[29] C Quast E Pruesse P Yilmaz et al ldquoThe SILVA ribosomalRNA gene database project improved data processing andweb-based toolsrdquo Nucleic Acids Research vol 41 pp D590ndashD5962013

[30] C T SkennertonM Imelfort andGW Tyson ldquoCrass identifi-cation and reconstruction of CRISPR from unassembled meta-genomic datardquo Nucleic Acids Research vol 41 no 10 p e1052013

[31] R Danovaro C Corinaldesi A DellrsquoAnno et al ldquoMarineviruses and global climate changerdquo FEMSMicrobiology Reviewsvol 35 no 6 pp 993ndash1034 2011

[32] M L Dyall-Smith F Pfeiffer K Klee et al ldquoHaloquadratumwalsbyi limited diversity in a global pondrdquo PLoS ONE vol 6no 6 Article ID e20968 2011

[33] M B Duhaime and M B Sullivan ldquoOcean viruses rigorouslyevaluating themetagenomic sample-to-sequence pipelinerdquoVir-ology vol 434 pp 181ndash186 2012

[34] M B Duhaime L Deng B T Poulos and M B SullivanldquoTowards quantitative metagenomics of wild viruses and otherultra-low concentration DNA samples a rigorous assessmentand optimization of the linker amplification methodrdquo Environ-mental Microbiology vol 14 pp 2526ndash2537 2012

[35] R J Parsons M Breitbart M W Lomas and C A CarlsonldquoOcean time-series reveals recurring seasonal patterns of viri-oplankton dynamics in the northwestern Sargasso Seardquo ISMEJournal vol 6 no 2 pp 273ndash284 2012

[36] S Podell J A Ugalde P Narasingarao J F Banfield K BHeidelberg and E E Allen ldquoAssembly-driven community gen-omics of a hypersaline microbial ecosystemrdquo PLoS ONE vol 8Article ID e61692 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology