sabree rondon handelsman metagenomics

Upload: chaitanya-sampat

Post on 10-Oct-2015

27 views

Category:

Documents


0 download

DESCRIPTION

great book

TRANSCRIPT

  • rsit

    16S rRNA gene Highly conserved RNA involved in

    environmental gene tag Short sequence obtained

    from metagenomic sequencing.

    BAC Bacterial artificial chromosome

    CTAB hexadecyltrimethylammonium bromide

    METREX Metabolite-regulated expression

    SIGEX Substrate-induced gene expression

    Prokaryotes are the most physiologically diverse and

    th llex

    rde tbe -od tca issl nto spr dsp gmicrobial communities as entire units, without cultivatingindividual members. Metagenomics entails extraction ofallies. But the variations that we know are only the tipof the microbial iceberg. The vast majority of microor-ganisms have not been cultivated in the laboratory, andalmost all of our knowledge of microbial life is based onorganisms raised in pure culture. The variety of the rest of

    be cultured to create metagenomic libraries, and theselibraries are then subjected to analysis based on DNAsequence or on functions conferred on the surrogate hostby the metagenomic DNA. Although this field of microbiol-ogy is quite young, discoveries have already been made thatmetabolically versatile organisms on our planet. Bacteriavary in the ways that they forage for food, transduceenergy, contend with competitors, and associate with622DNA from a community so that all of the genomes oforganisms in the community are pooled. These genomesare usually fragmented and cloned into an organism that canDefining Statement

    Metagenomics is the study of the collective genomes ofthe members of a microbial community. It involves clon-ing and analyzing the genomes without culturing theorganisms in the community, thereby offering the oppor-tunity to describe the planets diverse microbialinhabitants, many of which cannot yet be cultured.

    Introductione uncultured microbial world is staggering and wipand our view of what is possible in biology.The challenge that has frustrated microbiologists focades is how to access the microorganisms that cannocultured in the laboratory. Many clever cultivation meths have been devised to expand the range of organisms than be cultured, but knowledge of the uncultured worldim, so it is difficult to use a process based on rational desigcoax many of these organisms into culture. Metagenomicovides an additional set of tools to study uncultureecies. This new field offers an approach to studyinEGTs environmental gene tagspolypeptide synthesis that is commonly used for making

    phylogenetic inferences.

    BAC Bacterial artificial chromosome.

    contig A set of overlapping sequences derived from a

    single template.

    cosmids Vectors containing cos sites for phage

    packaging that are used to clone large DNA fragments.

    environmental DNA Genetic material extracted

    directly from a microbial community.MetagenomicsZ L Sabree, M R Rondon, and J Handelsman, Unive

    2009 Elsevier Inc. All rights reserved.

    Defining Statement

    Introduction

    Building Metagenomic Libraries

    Glossaryy of Wisconsin-Madison, Madison, WI, USA

    Sequence-Based Metagenomic Analysis

    Function-Based Metagenomic Analysis

    Further Reading

    fosmid Large-insert cloning vector based on the

    F-plasmid of Escherichia coli.

    genome The genetic complement of an organism.

    metaproteome The collective protein complements of

    a microbial community.

    microbiome The collection of microorganisms residing

    in a particular habitat.

    mini scaffold A single paired-end read.

    read The output of an individual DNA sequencing

    reaction.

    scaffold Contigs linked by overlapping regions.Abbreviations

  • to obtain enough DNA to build libraries. Contaminating

    For example, a FastDNA Spin (Qbiogene) preparation

    320 times more clones compared to libraries constructed

    Genetics, Genomics | Metagenomics 623followed by a hexadecyltrimethylammonium bromide(CTAB) extraction yields high quality DNA. Due to shearchemicals and enzymes often remain in the water, making itrelatively easy to isolate DNA without abundant contami-nants. In contrast, inorganic soil components, such asnegatively and positively charged clay particles, and bio-chemical contaminants, such as humic acids and DNases,make DNA extraction from soils, and subsequent manipula-tion, challenging. The process for removing contaminantsdetermines both the clonability and the size of the DNAbecause many of the processes that effectively remove con-taminants that inhibit cloning also shear the DNA. Physicaldisassociation of microbes from the semisolid matrix, typi-cally termed cell separation, can yield a cell pellet fromwhich DNA, especially high molecular weight (>20 kb)DNA, can be obtained. Immobilization of cells in an agarosematrix further reduces DNA shear forces and, followingelectrophoresis, facilitates separation of high molecularweight DNA from humic acids and DNases. Various com-mercial kits can be used to extract DNA from soil and othersemisolid matrices. Applying multiple extraction methodsto a DNA sample can yield minimally contaminated DNA.challenge existing paradigms and made substantial contribu-tions to biologists quest to piece together the puzzle of life.

    Building Metagenomic Libraries

    DNA has been isolated from microbial communities inha-biting diverse environments. Early metagenomic projectsfocused on soil and sea water because of the richness ofmicrobial species (e.g., 500040 000 species/g soil) as wellas the abundance of biocatalysts and natural productsknown to be in these environments from culture-basedstudies. While soil has been most sampled for metagenomiclibraries, aquatic sediments, biofilms, and industrial efflu-ents have also been successfully tapped, often because oftheir unique physicochemistries, for various biologicalactivities. Metagenomic libraries constructed from DNAextracted from animal-associated microbial communitieshave also been the source of a number of novel biocata-lysts (i.e., hydrolases, laccases, and xylanases), antibioticresistance genes, and inter/intraspecies communicationmolecules. See Figure 1 for an overview of metagenomiclibrary construction and screening.

    Preparing Metagenomic DNA

    The physical and chemical structure of each microbialcommunity affects the quality, size, and amount of microbialDNA that can be extracted. Accessing planktonic commu-nities requires equipment that is capable of handling largevolumes of water to concentrate sufficient microbial biomassin fosmids (3040 kb inserts) or BACs (up to 200 kb inserts)to obtain comparable coverage of the same microbialcommunity.

    Community coverage is possible only in relativelysimple microbial communities like the acid mine drai-nage, which contains only about five members. Thismetagenomic libraries harboring inserts smaller than 10 kb.Conversely, to obtain targets encoded by multiple genes,large DNA fragments (>20 kb) must be cloned into fosmids,cosmids, or bacterial artificial chromosomes (BACs), all ofwhich can stably maintain large DNA fragments. Two vec-tors, namely pCC1FOS and pWE15, have been used forcloning large DNA fragments from various microbial com-munities. The pCC1FOS vector has the advantage that,when in the appropriate host (e.g. E. coli Epi300), its copynumber can be controlled by addition of arabinose in themedium to increase DNA yield. Microbial sensing signals,antibiotic resistance determinants, antibiosis, pigment pro-duction, and eukaryotic growth modulating factors havebeen identified from metagenomic libraries constructedwith pCC1FOS. Additionally, the presence of considerableflanking DNA on fosmid or BAC clone inserts facilitatesphylogenetic inference about the source of the fragment.

    Community Complexity and MetagenomicLibrary Structure

    The determination of target insert size, cloning vector, andminimum number of library clones is governed by thetype of genes that are sought and the complexity of themicrobial community. Shotgun sequencing is usuallyconducted on small-insert clones, whereas successful func-tional studies can be performed on small- and large-insertclones. Small-insert metagenomic libraries constructed inplasmids that stably maintain up to 10 kb of DNA requireforces, low molecular weight DNA is typically isolated, butthe physically vigorous nature of some of these methods canfacilitate lysis of encapsulated bacteria, spores, and othermicrobial structures that are resistant to more gentle lysismethods, thereby providing access to a greater, more diverseproportion of the microbial community for cloning.

    Cloning Vectors and Metagenomic LibraryStructure

    The choice of cloning vector and strategy largely reflectsthe desired library structure (i.e., insert size and number ofclones) and target activities sought. To obtain a functionencoded by a single gene, small DNA fragments (

  • 624 Genetics, Genomics | MetagenomicsDNAextractioncommunity was sampled and sequenced deeply with ahigh-density, large-insert metagenomic library, making itpossible to reconstruct the genome of one member. Incontrast, the metagenome of a very complex communitysuch as that in soil can only be sampled, not exhaustivelysequenced. With todays technology, a community can besequenced to closure more quickly and cheaply with

    He

    Heterologeneexpress

    Translation

    Secretion

    Function-based metagenomics

    Screen foractive enzymes

    Selection forgrowth

    Screen for growthinhibition

    Transcription

    Figure 1 Metagenomics. Metagenomics is the study of the collectdirectly from the community, cloned into a surrogate host, and then s

    interest. Many microbial communities have been tapped for metagen

    two approaches can be taken to access the genomic information. Fu

    express the recombinant DNA in either screens for active enzymes osuppressive conditions (e.g., nutrient deficiency or presence of an an

    randomly sequenced using vector-based primers or a specific gene

    hybridize to arrayed metagenomic clones.Cloning vectorHost bacteriasmall- than large-insert clones, but future technologydevelopment may change this. In contrast, in activity-based analysis, large inserts are preferable because theprobability that the target activity is encoded by anyone clone is positively correlated with the size of theinsert, and if the activity is encoded by a cluster ofgenes, they are more likely to be captured in a large insert.

    terogeneousDNA

    DNA cloning andtransformation

    gous

    ion

    Metagenomic librar

    Sequence-based metagenomics

    Vector-primedend sequencing

    Probe librarywith specific oligos

    ive genomes of the members of a community. DNA is extractedtudied by sequencing or screening for expression of activities of

    omic analyses. Following construction of a metagenomic library,

    nctional metagenomics requires that the host bacterium can

    r antibiotic production or selections for growth under growth-tibiotic). In sequence-based metagenomics, cloned DNA is

    is sought using complimentary oligonucleotides (oligos) to

  • important feature of activity-based analysis of metagenomiclibraries. Libraries have been screened in E. coli with many

    Library storage conditions should preserve clone viabilityidentify clones containing DNA from planktonic marine

    Genetics, Genomics | Metagenomics 625as well as the original diversity of the library. E. coli-basedmetagenomic libraries can be prepared in liquid culturesupplemented with vector-selective antibiotics and1015% glycerol and stored frozen in pooled aliquots.Pooled libraries are revived in fresh media supplementedwith antibiotics, and brief incubation (12 h) at 3037 Con a rotary shaker is sufficient to initiate growth.Extended incubation could hinder recovery of the fullmetagenomic library due to overgrowth of some clonesto the exclusion of others. The revived library is thensuitable for screening or DNA preparation.

    Sequence-Based Metagenomic Analysis

    Sequence-based metagenomics is used to collect genomicinformation from microbes without culturing them. Incontrast to functional screening, this approach relies onsequence analysis to provide the basis for predictionsabout function. Massive datasets are now catalogued inthe Environmental Genomic Sequence database, andeach sequencing project is more informative than thelast because of the accumulated data from diverse envir-onments. As patterns emerge in the environmentalsuccessful outcomes, but the barriers to expression of genesfrom organisms that are distant from E. coli have led toscreening of some libraries in other hosts. A number of clonescontaining interesting activities have been detected inonly one host species and not in others. This suggeststhat the host background affects the expression of thegenetic potential of the metagenomic library, andtherefore the suitability of a particular host for thetargeted screen must be considered. Sequence-based ana-lysis of metagenomic libraries is conducted entirely inE. coli.

    Storing Metagenomic LibrariesSelecting and Transforming a Host Organism

    Development of microorganisms to host metagenomiclibraries has trended toward well-characterized and easilycultivated bacteria, with most work being conducted inE. coli. E. coli offers many useful tools for metagenomics,such as strains that harbor mutations that reduce recombina-tion (recA) and DNA degradation (endA) and facilitate blue/white recombinant (lacZ) screening. Electroporation is theprimary and most efficient method for introducing metage-nomic DNA, especially large-insert libraries, into E. coli.Additionally, some vectors can be transferred from E. coli toother bacterial species by conjugation. This has been anArchaea. Previously, 16S rRNA gene sequences had beenrecovered from several environmental samples, suggest-ing the presence of Archaea in nonextreme environmentssuch as soil and open water, yet no cultured representa-tives of these clades were known. Stein and colleaguestook a direct cloning approach to isolate genomic DNAfrom these organisms rather than attempting to culturethem. They filtered seawater and prepared a fosmidlibrary from the collected organisms. The library wasprobed to find small-subunit rRNA genes from archaealspecies. One clone was found and completely sequenced.Several putative protein-encoding genes were identified,including some that had not yet been identified in Archaea.

    Similarly, Beja` and colleagues constructed several BAClibraries from marine samples. These libraries were alsoscreened by first probing for the presence of 16S rRNAgenes. One 130-kb BAC clone was sequenced and found tocontain a 16S rRNA gene from an uncultivated gamma-proteobacterium. Surprisingly, this clone contained a geneencoding a protein with similarity to rhodopsins, whichare light-driven proton pumps found thus far only inArchaea and Eukarya. This new type of rhodopsin, calledproteorhodopsin, suggests that bacteria may also use theseproteins for phototrophy. This discovery highlights thesequences, sequence-based methods will be increasinglymore informative about microbial communities.

    Some studies use a gene of interest or anchor toidentify metagenomic clones of interest for further ana-lysis. A metagenomic library is constructed and screenedusing PCR to amplify the anchor. Anchors are often aribosomal RNA gene, but can also be a metabolic gene(e.g., a polyketide synthase). The clones that contain theanchor are then sequenced or further analyzed to provideinformation about the genomic context of the anchor.Thus, researchers can quickly focus on a clone of interest.For example, a marine picoplankton metagenomic librarywas screened by hybridization with a 16S rRNA geneprobe and the 16S rRNA genes in the positive cloneswere then sequenced to provide a picture of the diversityof the community members.

    Recently, the tremendous advances in DNA sequen-cing technology have made it feasible to sequence largelibraries without preselection for clones containing aparticular anchor. This has led to the accumulationof massive amounts of sequence data from unculturedmicrobes in several environments. Here we discuss afew examples of sequence-based metagenomic projects.These projects are also detailed in Table 1.

    Anchor-Based Sequencing

    Some early projects used rRNA or other genes as anchorsto identify useful and informative clones from metage-nomic libraries. For example, this technique was used to

  • Table 1 Sequence-based metagenomics

    Year Environment

    Totalamountsequenced Number of reads Vector Assembly Comments Reference

    Anchor-based projects1996 Ocean 10 kbp 36 pFOS1 Not attempted Shotgun sequence

    clones form 38.5-kb

    cosmid insert

    Stein et al.

    Journal of

    Bacteriology178: 591599

    2000 Ocean 128 kbp Not reported pIndigoBAC536 Entire BAC insert

    assembled

    Found

    proteorhodopsin

    gene

    Beja` et al.

    Science

    289: 190219062002 Paederus

    beetles

    110 kbp Not reported pWEB 110 kbp assembled

    from several

    overlapping cosmids

    Found putative

    pederin cluster is

    probably bacterial inorigin

    Piel

    PNAS

    99:1400214007

    Viral Metagenomics2002 Ocean ND 1934 pSMART Contigs assembled at

    low stringency

    Used specific

    techniquesto enrich for viral

    DNA

    Breitbart et al.

    PNAS99:1425014255

    Community Sequencing projects2004 Acid mine

    drainage76.2 Mbp 103462 pUC18 85% of reads in

    scaffold 2 kb or

    longer

    Combined length of

    1183 scaffolds is10.82 Mbp

    Assembled two near-complete and three

    partial genomes

    Tyson et al.Nature

    428: 3743

    2004 Sargasso Sea 1360 Mbp

    265 Mbp

    1660 000

    325561

    pBR322

    derivative

    64398 scaffolds

    217 015 miniscaffolds215 038 singletons

    (data for Weatherbird

    only)

    Larger Weatherbird

    Sample yieldedassembly, smaller

    Sorcerer II did not

    Venter et al.

    Science304: 6674

    2005 Minnesota soilWhale-fall

    100 Mbp3 25Mbp

    149085Not reported

    Lambda ZAP Analyzed withoutassembly for

    environmental gene

    tags and

    metaproteome

    Assembly not possiblefor soil sample

    due to complexity

    Tringe et al.Science

    308: 554557

    2006 Human gut 78 Mbp 139521 pHOS2 14572 scaffolds for

    33 753108 bp

    40% of reads notassembled for an

    additional 44 Mbp

    0.7X coverage of

    B. longum genome

    and 3.5X coverageof M smithii genome

    Gill et al.

    Science

    312: 13551359

  • Genetics, Genomics | Metagenomics 627undersampled.

    Community Metagenomics

    The development of increasingly fast, accurate, and inex-pensive sequencing technologies, coupled with significantimprovements in bioinformatics, has made it feasible toconduct large-scale sequencing of DNA from multispe-cies communities. This development will advance ourunderstanding of microbial diversity in nature. Freedfrom the constraint of cultivating microbes to accesstheir genomes, researchers have accumulated vast quan-tities of microbial genomic information that could nothave been gathered even a few years ago. Althoughwhole genomes cannot currently be reassembled fromantitumor compound.

    Viral Metagenomics

    The world is thought to contain 1032 uncharacterizedviruses, and characterizing them by metagenomics may bemore efficient and informative than finding a suitable hostfor each of them. Moreover, since most of their hosts havelikely not been cultured, metagenomics represents the onlyway to access this viral diversity. Cloning viral genomesfrom an environmental sample faces additional challengesthat are not necessarily barriers in metagenomics based oncellular organisms. For example, the viral DNA must beseparated from abundant free DNA in the environment,including organismal DNA; viral genes that kill the hostmust be inactivated or they will prevent the cloning of theviral DNA; chemically modified viral DNA can be unclon-able; and ssDNA and RNA viral genomes are not amenableto traditional cloning. Generally, researchers have purifiedintact virus particles by physical separation and thenextracted the DNA from those particles. In one example,Breitbart and colleagues used a linker-amplified shotgunlibrary method to analyze a viral metagenomes from thePacific Ocean. They found that most viral sequences recov-ered were not highly similar to known viral sequences,suggesting that the global viral metagenome is stillexpansive potential and novelty of bacterial genes in theenvironment and further supports the hypothesis thatmuch of microbial diversity remains undescribed.

    Genes besides 16S rRNA genes can also be used asanchors. To identify the source of production of the anti-tumor compound pederin, Piel constructed a cosmid libraryfrom pederin-producing Paederus beetles and screenedthe library for pederin synthesis genes. A locus encodingthe putative pederin synthesis genes was identified, and thelocus seems to be bacterial in origin, suggesting symbioticbacteria within Paederus beetles are the true producers ofpederin. This analysis shows that metagenomics can be usedto isolate genes of potential medical interest and confirmsthat it is the bacteria, not their insect hosts, that produce thisshotgun sequencing of complex communities containingdozens, hundreds, or thousands of species, rapid advancesin technology development make it likely that such featsare not far off.

    The first environment to be the subject of a compre-hensive sequence-based metagenomic effort was the acidmine drainage environment at Iron Mountain, California.Acid mine drainage results when pyrite dissolution facil-itates microbial iron oxidation, which is accompanied byacid production. The simplicity of the acid mine drainagemicrobial community led to the assembly of five nearlycomplete or partial genomes without first separation ofcells or cultivation of community members.

    Approximately 76 million base pairs (Mbp) weresequenced from small-insert libraries from the acid minedrainage biofilm. The community consisted of three bacter-ial and three archaeal lineages, thus raising the possibility ofgenome reassembly for many if not all of the members. Toachieve this, the sequences were first assembled into scaf-folds. Then the scaffolds were sorted into bins based ontheir GC content. Binning is the first step in assigning thescaffolds to a unique genome. The scaffolds in each bin werethen sorted based on the degree of coverage, assuming that amore abundant member of the community would contributemore DNA to the library, and therefore be more highlyrepresented in the sequence data and that all genes withinone organisms genome should be similarly represented.Thus, the highGC bin was separated into a 10X coveragegenome and a 3X coverage genome. These two genomeswere found to represent a Leptospirillum group II genome anda Leptospirillum group III genome, respectively. A nearlycomplete, 2.23-Mb genome of the group II genome wasassembled. Similarly, the 10X coverage genome in the lowGC bin represented a nearly complete genome of thearchaeon Ferroplasma type II. Partial genomes were alsoidentified for a Ferroplasma type I strain and a G-plasmastrain.

    In addition to providing a model for genome reassem-bly, the acid mine drainage study led to tremendousinsight into the habitat and physiology of the communitymembers. The representation of various functions wasrecorded, and a chemical model for the flow of energy,nutrients, and electrons was developed and tested.

    Another pivotal study in sequence-based metagenomicsinvolved the large-scale sequencing of libraries constructedfrom the DNA of microbes living in the Sargasso Sea. Morethan one billion base pairs of DNA were sequenced, repre-senting approximately 1800 genome equivalents. Cloneswere end-sequenced to provide paired-end reads andassembled into scaffolds, mini scaffolds (with a singlepaired-end read), or left as single reads. Again, the sequenceswere sorted into bins, based on depth of coverage, oligonu-cleotide frequencies, and similarity to known genomes.From these data, several nearly complete genomes wereassembled, as well as ten megaplasmids. Sequence analysis

  • nocellulosic digester. Four clones expressing cellulolyticactivity were identified, all of which had activity optima

    628 Genetics, Genomics | Metagenomicspredicted that 1 214 207 novel proteins were encoded in theenvironmental DNA. The analysis highlighted the presenceof related strains of a given species in the sample, raising thecomplicating factor of strain heterogeneity in metagenomeanalysis.

    When the community is sufficiently complex to pre-vent reassembly of genomes, other strategies must beundertaken to find patterns. For example, one studycompared four metagenomic libraries, one from soil andthree from oceanic whale-fall samples (decomposingwhale carcasses on the ocean floor), all of which weretoo complex for genome reassembly or even constructionof contigs or scaffolds. Using paired-end sequences fromsmall-insert libraries, environmental gene tags (EGTs)were identified, 90% of which contained predicted genes,thus allowing for a global predicted metaproteome ana-lysis without assembly of the DNA into larger scaffolds.This approach may be useful to find patterns in predictedprotein functions and metabolic properties in many dif-ferent environments.

    In a powerful study, sequence-based metagenomics wasused to investigate the microbiome of the human ecosys-tem. Since many of the microorganisms living in thehuman gut have not yet been cultured, metagenomicsallows a direct approach to this microbial community.The data revealed nearly complete genomes of strains ofBifidobacterium longum and Methanobrevibacter smithii, anarchaeon frequently found in the gut ecosystem. Gutmicrobes contribute many functions to human metabolism,including glycan degradation and fatty acid synthesis, and anumber of these functions were identified and their diver-sity assessed by metagenomics. Scale-up of this approachcould yield near-complete genomes of many of the impor-tant microbes in our own gut. Comparative studies amongindividuals, during infant development and after antibioticingestion, and among people consuming different diets arebeginning to reveal both the idiosyncratic nature of eachpersons gut microbiota, which may provide a signature asunique as a fingerprint, and the microbial motifs that definehealth and disease.

    Sequence-based metagenomics has the potential torevolutionize our understanding of microbial diversityand function on earth. However, even these initial studieshave raised several questions of methods and technology.It is apparent that further advances in bioinformatics areneeded to handle the vast quantities of data derived fromthese projects. In addition, the species richness of thecommunities, coupled with the genetic complexity ofpopulations of each species, necessitates sequencingextremely large libraries to approach complete coverage.Finally, uneven species distribution, leading to the over-representation of abundant genomes in libraries, makesthe desired library size even larger in order to capturerare species. The acceleration of sequencing, techniquesthat remove the most abundant sequences, andat pH 67 and 6065 C conditions similar to those inthe digester.BAC libraries containing 28 000 clones with an averageinsert size of 43 kb. The frequency of finding active clonesin these libraries ranged from 1:456 to 1:3648, which issimilar to the results from other metagenomic surveys forbiocatalysts, thereby highlighting the need for robustassays for functional analysis.

    Exploiting Environmental PhysicochemicalConditions for Biocatalysts Discovery

    One approach intended to increase the likelihood of find-ing certain activities is to build metagenomic librariesfrom environments that are enriched for bacteria withthe desired function. For example, a search for cellulasesfocused on the liquor of an anaerobic, thermophilic, lig-turing and to survey a communitys functions (Table 2).Function-based metagenomics, unlike sequence-drivenapproaches, does not require that genes have homologyto genes of known function, and it offers the opportunityto add functional information to the nucleic acid andprotein databases.

    Screening Metagenomic Libraries for NovelEnzymes

    Microorganisms have always been a prime source ofindustrial and biotechnological innovations, but untilrecently applications have been derived from culturedorganisms. Metagenomics presents the possibility of dis-covering novel biocatalysts from microbial communitiesthat either confounded cultivation or failed to yield newculture isolates upon repeated attempts. Assays that havehistorically been used to identify enzymes (e.g., amylases,cellulases, chitinases, and lipases) in cultured isolates havebeen applied successfully to functional metagenomics.Function-based metagenomic analysis of Wisconsin agri-cultural soil yielded 41 clones having either antibiotic,lipase, DNase, amylase, or hemolytic activities amongcomputational tools will enhance sequence-based meta-genomic analysis.

    Function-Based Metagenomic Analysis

    Functional metagenomics involves identification of clonesthat express activities conferred by the metagenomicDNA. Sequence-based metagenomics has revealed phy-siological and ecological capacity that extends wellbeyond that of the culturable minority. Activity-basedmetagenomics provides an opportunity to circumvent cul-

  • Table 2 Functional metagenomics surveys

    Target gene Source Host strain Cloning vectorInsert size(kb)

    Number ofclones

    Activeclones References

    Extractionmethod

    Cellulases Feedstock E. coli N.R. N.R. N.R. 4 Healy et al.Applied Microbiology and

    Biotechnology

    43: 667674

    Direct

    Biocatalysts andAntimicrobials

    Soil E. coliDH10B

    pBeloBAC11 27/44.5 3646/ 24 546 41 Rondon et al.Applied and Environmental

    Microbiology

    66: 25412547

    Direct

    Tetracyclineresistance

    determinants

    Oral cavity E. coliTOP10

    pTOPO-XL 0.83 450 1 Diaz-Torres et al.Antimicrobial Agents and

    Chemotherapy

    47: 14301432

    Direct

    Xylanases Insect E. coli Lambda ZAP 36 1000000 4 Brennan et al.

    Applied and Environmental

    Microbiology

    70: 36093617

    Direct

    Amidases Soil E. coli

    TOP10

    pZerO-2 5.2 8000 / 25 000 5 Gabor et al.

    Environmental Microbiology

    6: 948958

    Direct/

    enrichment

    Aminoglycosideresistance

    determinants

    Soil E. coliDH10B

    pCF430/pJN105 1.965 1186200 10 Riesenfeld et al.Environmental Microbiology

    6: 981989

    Direct

    Signal molecules Soil E. coliEpi300

    pCC1FOS/pSuperBAC/

    pCC1BAC

    1190 180 000 3 Williamson et al.Applied and Environmental

    Microbiology

    71: 63356344

    Direct

    Catabolic enzymes Aquatic E. coliJM109

    p18GFP 7 150 000 35 Uchiyama et al.Nature Biotechnology

    23: 8893

    Direct

    Antibiotic

    desistancedeterminants

    Oral cavity E. coli

    TOP10/Epi300

    TOPO-XL/

    pCC1FOS

    0.83/40 1260/600 90/14 Diaz-Torres et al.FEMS Microbiology Letters258: 257262

    Direct

    Signal molecules Insect E. coli

    DH10B

    pBluescript II

    KS ()3.3 800 000 1 Guan et al.

    Applied and EnvironmentalMicrobiology

    73: 36693676

    Direct

  • metagenomic field. Two lessons are evident. First, the

    630 Genetics, Genomics | Metagenomicshydrolyzing the amide compounds. According to DGGEanalyses, enrichment cultures showed 6477% lowerbacterial diversity than in the original sample before enrich-ment, suggesting that the enrichment conditions enhancedgrowth of those bacteria that could utilize the amides as anitrogen source and limited growth of the rest of the com-munity, thereby reducing the diversity of the community.Four amidase-positive clones were identified from metage-nomic libraries constructed from enrichment cultures andall had low to moderate homology to known enzymes. Twoamidase-positive clones had low homology to hypotheticalproteins. Following extensive amide substrate profiling, asingle clone (pS2) was found to catalyze the synthesis ofpenicillin G from 6-aminopenicillanic acid and phenylace-tamide. The pS2-encoded enzyme facilitated accumulationof twofold higher maximum level of penicillin G than E. colipenicillin amidase and performed better in amoxicillin andampicillin production experiments. These analyses showthat activity-based screening of metagenomic libraries canAnother study was predicated on the finding thatwood- and plant-eating insects are rich sources ofenzymes involved in degradation of complex carbonpolymers such as cellulose and xylan. These polysacchar-ides are the primary nutrient source for both the insectand the microbial community it harbors, and thereforeshould enrich for species that produce glycosyl hydro-lases. Small-insert libraries constructed from termite andlepidopteran gut microbial DNA (1 106 clones) con-tained four clones with xylanase activity. These clonesharbored genes that encoded xylanase catalytic domainswith low sequence similarity (3340%) to other glycosylhydrolases. Not surprisingly, the enzymes most closelyrelated to three of the four xylanases identified in thisstudy were found elsewhere in gut-associated microbes.Prior knowledge of the habitat, based on either culture-based studies or metagenomic sequence analysis, facili-tates shrewd choices that match habitat with the functionthat is sought.

    Artificial Enrichment for Biocatalyst Discovery

    Just as the environment can compel microbial communitiesto retain members with specific biochemical abilities, micro-bial communities can be actively manipulated prior tometagenomic library construction to enrich for desiredactivities. One study focused on the discovery of amidasesthat convert D-phenylglycine amide derivatives into keyintermediates for the production of semisynthetic -lactamantibiotics. Soil was added to minimal medium thathad either D-phenylglycine amide or a mixture of variousamides as the sole nitrogen source. Libraries wereconstructed in a leucine-auxotrophic E. coli strainand screened on the medium containing onlyphenylacetyl-L-leucine or D-phenylglycine-L-leucine, eitherof which would select for the growth of clones capable ofuncultured world contains some genes that are quite famil-iar based on studies of cultured organisms as well as moreexotic ones that represent new classes of genes or proteinsyield unique biocatalysts of ecological relevance and bio-technological importance, and some of these may besuperior to those found in cultured organisms.

    Rapid Discovery of Novel Antibiotic ResistanceGenes

    Genes that confer antibiotic resistance to bacteria are ofgreat public health, pharmacological, and biologicalimportance. Lateral gene transfer and broad antibioticusage have resulted in wide distribution of antibioticresistance genes in microbial communities. As a result,many antibiotic treatments are rendered less effective ortotally ineffective against pathogens (e.g., methicillin-resistant Staphylococcus aureus). Characterization of theresistance genes in uncultured organisms may identifyresistance determinants that will appear in clinical set-tings in the future. Characterization of resistance inenvironments that have not been influenced directly byhuman use of antibiotics could point to the origins ofcertain resistance determinants. A comprehensive under-standing of resistance mechanisms in culturable anduncultured bacteria will improve drug design, by provid-ing clues to how resistance can be combated. Additionally,antibiotic resistance and synthesis genes usually clustertogether in microbial genomes. Therefore, antibioticresistance genes can provide a signpost for biosyntheticpathways residing in nearby DNA.

    Antibiotic resistance provides technical advantages overmost other characteristics studied with metagenomics.Because the frequency of active clones is low, there is asubstantive advantage to using a selection in which onlythe desired clones survive, rather than a screen, in which allclones need to be addressed to determine whether theyexpress the desired function. When antibiotic-resistantclones are the target, metagenomic libraries are culturedin the medium containing the appropriate antibiotic andonly those clones that contain and express cognate resis-tance genes grow. Using this strategy, a number of studieshave identified clones that carry resistance to antibioticssuch as tetracycline or aminoglycosides such as kanamycin,tobramycin, and amikasin. In one study, 1 186 200 clonesfrom small- and large-insert soil metagenomic librarieswere selected for aminoglycoside resistance and ten sur-viving clones were identified. Some of these clones carryresistance genes that are similar to known aminoglycosideresistance genes, some carry resistance genes that are dis-tantly related to known resistance genes, and someresistance genes do not have detectable similarity to anyknown resistance determinants. The results of the antibio-tic resistance studies provide a model for the rest of the

  • for a known function. Second, selections provide the powerto rapidly identify clones of interest, circumventing thetedious screening step that forms the basis and bottleneck of many function-driven metagenomic studies.

    Intracellular Functional Screens

    The daunting challenge of screening massive librarieswith millions of members has been met with a noveltype of screen, in addition to the selections described inthe previous section. In an effort to devise screens that areboth rapid and sensitive, the concept of intracellularscreens emerged in which the detector for the desiredactivity resides in the same cell as the metagenomic DNA.The first two intracellular screens, metabolite-regulatedexpression (METREX) and substrate-induced geneexpression (SIGEX) (Figure 2), meet the criteria ofspeed and sensitivity and provide prototypes for sensitivescreens that are capable of reporting poorly expressedgene products and can utilize technologies (e.g., fluores-

    metagenomic libraries were subjected to the METREX

    screen; 11 clones that stimulated Gfp expression were

    identified. Additionally, two clones inhibited Gfp expres-

    sion in the presence of 80 nmol l1 N-(3-oxohexanoyl)-L-HSL, a typical quorum-sensing signal molecule isolated

    from cultured organisms, which binds to LuxR and stimu-

    lates transcription from the luxI promoter. Interestingly,only one of the Gfp-stimulating clones could be detected

    in an overlay-based Gfp-reporter screen, indicating that the

    intracellular screen detects clones that would be lost in a

    standard screen for quorum-sensing signal compounds in

    which the active molecule needs to diffuse out of the

    producing cell and into the cell containing the reporter.

    Most of these clones had no sequence similarity to genes

    known to direct production of quorum-sensing stimulating

    or inhibiting compounds.SIGEX facilitates the rapid identification of promoters

    that drive transcription in the presence of a particular

    catabolite (Figure 2(b)). This can be used to identify

    pathways that are regulated by the metabolite.

    (

    Figure 2 Intracellular screens for metagenomic libraries. (a) Metab, w

    luxI

    -trato

    Genetics, Genomics | Metagenomics 631gene products (red) stimulate dimerization of LuxR proteins (blue)

    produce light (green). Gene products interacting directly with the

    expression (SIGEX): Metagenomic DNA is cloned into a promoterthe cloning site. Promoters in the metagenomic DNA that respondcence-activated cell sorting) that rapidly screen millionsof clones.

    METREX screening detects molecules that mimicquorum-sensing signal molecules (Figure 2(a)). Thesecompounds stimulate transcription, regulated by the LuxRprotein and initiated from the luxI promoter, which is linkedto a reporter, such as the gene encoding green fluorescentprotein. The typical quorum-sensing inducers are acylatedhomoserine lactones, but metagenomics has revealed otherclasses of molecules that can function similarly. Since themetagenomic DNA and the broad-specificity reporter plas-mid are both present within the host cell, even lowconcentrations of poorly expressed metagenomic gene pro-ducts can be detected. In one study, 180 000 clones from soil

    Reporter plasmid

    Metagenomic clone l u x R

    g f p

    Gfp

    (a)

    P luxlDegradative pathways are often regulated by the com-

    pound that is degraded, so SIGEX is of interest for rapid

    identification of new gene clusters for bioremediation. Of

    152 000 clones from a metagenomic library derived from

    an aquatic microbial community, 33 clones were induced

    by benzoate and two by naphthalene. A wide variety of

    genes were resolved in these screens, reflecting not only

    the sensitivity of the assay but also the broad impact of

    aromatic hydrocarbons on bacterial community gene

    expression.Further developments in high-throughput screening

    will enhance discovery in function-driven metagenomics

    just as advancements in sequencing and bioinformatics will

    accelerate discovery in sequence-driven metagenomics.

    b)

    Reporter Plasmid

    Metagenomic DNA

    Gfp

    g f p

    g f p

    Reporter Plasmid

    olite-regulated expression (METREX): diffusible metagenomic

    hich in turn induce expression of Gfp from the luxI promoter to

    promoter could also be detected. (b) Substrate-induced gene

    p vector containing a promoterless Gfp-reporter downstream ofspecific catabolites induce expression of Gfp.

  • See also: DNA Sequencing and Genomics; Ecology,

    Microbial; Genome Sequence Databases: Genomic,

    Construction of Libraries; Phylogenetic Methods;

    Recombinant DNA, Basic Procedures

    Further Reading

    Casas V, and Rohwer F (2007) Phage metagenomics. In Hughes,KT and Maloy, SR (eds.) Methods in Enzymology 421: Advancedbacterial genetics: use of transposons and phage for genomicengineering pp. 25968. Amsterdam: Elsevier.

    Daniel R (2005) The metagenomics of soil. Nature Reviews Microbiology3: 470478.

    DeLong EF (2005) Microbial community genomics in the ocean. NatureReviews Microbiology 3: 459469.

    Frank DN and Pace NR (2008) Gastrointestinal microbiology entersthe metagenomics era. Current Opinion in Gastroenterology24: 410.

    Gillespie DE, Rondon MR, Williamson LL, and Handelsman J (2005)Metagenomic libraries from uncultured microorganisms.In: Osborn AM and Smith CJ (eds.) Molecular microbial ecology,pp. 261279. London: Taylor and Francis.

    Handelsman J (2004) Metagenomics: application of genomics touncultured microorganisms. Microbiology and Molecular BiologyReviews 68: 669685.

    Hugenholtz P, Goebel BM, and Pace NR (1998) Impact of culture-independent studies on the emerging phylogenetic view of bacterialdiversity. Journal of Bacteriology 180: 47654774.

    Langer M, Gabor EM, Liebeton K, et al. (2006) Metagenomics: aninexhaustible access to natures diversity. Biotechnology Journal1: 815821.

    Lefevre F, Robe P, Jarrin C, et al. (2008) Drugs from hidden bugs: theirdiscovery via untapped resources. Research in Microbiology159: 153161.

    Li X and Qin L (2005) Metagenomics-based drug discovery and marinemicrobial diversity. Trends in Biotechnology 23: 539543.

    Rappe MS and Giovannoni SJ (2003) The uncultured microbial majority.Annual Review of Microbiology 57: 369394.

    Riesenfeld CS, Schloss PD, and Handelsman J (2004) Metagenomics:genomic analysis of microbial communities. Annual Review ofGenetics 38: 525552.

    Rondon MR, Goodman RM, and Handelsman J (1999) The Earthsbounty: assessing and accessing soil microbial diversity. Trends inBiotechnology 17: 403409.

    Tringe SG and Rubin EM (2005) Metagenomics: DNA sequencing ofenvironmental samples. Nature Reviews Genetics 6: 805814.

    Ward N (2006) New directions and interactions in metagenomicsresearch. FEMS Microbiology Ecology 55: 331338.

    632 Genetics, Genomics | Metagenomics

    MetagenomicsGlossaryAbbreviationsDefining StatementIntroductionBuilding Metagenomic LibrariesPreparing Metagenomic DNACloning Vectors and Metagenomic Library StructureCommunity Complexity and Metagenomic Library StructureSelecting and Transforming a Host OrganismStoring Metagenomic Libraries

    Sequence-Based Metagenomic AnalysisAnchor-Based SequencingViral MetagenomicsCommunity Metagenomics

    Function-Based Metagenomic AnalysisScreening Metagenomic Libraries for Novel EnzymesExploiting Environmental Physicochemical Conditions for Biocatalysts DiscoveryArtificial Enrichment for Biocatalyst DiscoveryRapid Discovery of Novel Antibiotic Resistance GenesIntracellular Functional Screens

    See alsoFurther Reading