Microbial Phylogenomics (EVE161) Class 14: Metagenomics

EVE 161: Microbial Phylogenomics. Era IV: Genomes from the Environment. UC Davis, Winter 2016. Instructor: Jonathan Eisen

Upload: jonathan-eisen

Post on 16-Jan-2017


Page 1: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

EVE 161:Microbial Phylogenomics

Era IV: Genomes from the Environment

UC Davis, Winter 2016 Instructor: Jonathan Eisen


Page 2: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Quiz

Choose 2 of these 4 questions to answer.

• 1. Explain how phylogenetic analysis was used in these papers

• 2. Give an example of how the authors tried to predict the function of this new “proteorhodopsin” protein.

• 3. Give an example of how the authors experimentally studied the function of this new “proteorhodopsin.”

• 4. Why did they call this new protein “proteorhodopsin”?

Page 3: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Metagenomics History

• Term first used in a 1998 paper by Handelsman et al.: "Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products." Chem Biol. 1998 Oct;5(10):R245-9.

• Collective genomes of microbes in the soil termed the “soil metagenome”

• A good review is "Metagenomics: Application of Genomics to Uncultured Microorganisms" by Handelsman, Microbiol Mol Biol Rev. 2004 December; 68(4): 669–685.

Page 4: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Papers for Today

Oded Béjà et al., Bacterial Rhodopsin: Evidence for a New Type of Phototrophy in the Sea. Science 289, 1902 (2000); DOI: 10.1126/science.289.5486.1902



Proteorhodopsin phototrophy in the ocean

Oded Béjà*†, Elena N. Spudich†‡, John L. Spudich‡, Marion Leclerc* & Edward F. DeLong*

* Monterey Bay Aquarium Research Institute, Moss Landing, California 95039, USA
‡ Department of Microbiology and Molecular Genetics, The University of Texas Medical School, Houston, Texas 77030, USA
† These authors contributed equally to this work

Proteorhodopsin, a retinal-containing integral membrane protein that functions as a light-driven proton pump, was discovered in the genome of an uncultivated marine bacterium (1); however, the prevalence, expression and genetic variability of this protein in native marine microbial populations remain unknown. Here we report that photoactive proteorhodopsin is present in oceanic surface waters. We also provide evidence of an extensive family of globally distributed proteorhodopsin variants. The protein pigments comprising this rhodopsin family seem to be spectrally tuned to different habitats, absorbing light at different wavelengths in accordance with light available in the environment. Together, our data suggest that proteorhodopsin-based phototrophy is a globally significant oceanic microbial process.

Bacteriorhodopsin, a retinal-containing membrane protein that functions as a light-driven proton pump, was discovered nearly three decades ago in the halophile Halobacterium salinarum (2). The 'purple membranes' of this archaeon contain a high concentration of bacteriorhodopsin molecules, packed in an ordered two-dimensional crystalline array (3). On absorption of light, bacteriorhodopsin undergoes a series of conformational shifts (a photocycle), causing a proton to be transported across the membrane. The resulting electrochemical membrane potential drives ATP synthesis, through an H+-ATPase (4). Similar rhodopsin-mediated, light-driven proton pumping, formerly thought to exist only in halophilic archaea, has been discovered in an uncultivated marine bacterium (1) of the 'SAR86' phylogenetic group (5). This finding suggested that a previously unrecognized phototrophic pathway may occur in the ocean's photic zone; however, all earlier data are based solely on recombinant DNA and in vitro biochemical analyses, and this phenomenon has not yet been observed in the sea.

To test whether rhodopsin-like molecules form photoactive proteins in native marine bacteria, we analysed membrane preparations of bacterioplankton collected directly from Monterey Bay surface waters. Using laser flash-photolysis techniques (6), we searched for spectroscopic evidence of proteorhodopsin-like photochemical activity in these bacterioplankton membrane preparations. We observed transient flash-induced absorption changes attributable to a photochemical reaction cycle of 15 ms (Fig. 1), which is characteristic of retinylidene ion pumps (7) and similar to the photocycle seen with Escherichia coli membranes expressing recombinant proteorhodopsin (1). Furthermore, 5 ms after the flash the absorption difference spectrum of the environmental sample exhibited the same shape and same isosbestic point for the light and dark state as proteorhodopsin expressed in E. coli membranes. This shows that photochemical reactions in native bacterioplankton cell membranes generate a red-shifted, long-lived intermediate spectrally similar to that of recombinantly expressed proteorhodopsin in E. coli. The flash-photolysis data provide direct physical evidence for the existence of proteorhodopsin-like transporters and endogenous retinal molecules in the microbial fraction of these coastal surface waters.

The photocycling pigment was bleached with hydroxylamine and, after hydroxylamine removal, photoactivity was restored by adding all-trans retinal (Fig. 2), showing that the photosignals do indeed derive from retinylidene pigmentation. We conclude that

[Figure 1 plots. Top panel: absorbance change versus time (ms, 0 to 80), with traces at 580 nm, 500 nm and 400 nm. Bottom panel: absorbance versus wavelength (nm, 400 to 700), for Monterey Bay environmental membranes and for proteorhodopsin in E. coli membranes. The y-axes are labelled ΔAbsorbance (×10^-3) and Absorbance (×10^-3).]

Figure 1 Laser flash-induced absorbance changes in suspensions of membranes prepared from the prokaryotic fraction of Monterey Bay surface waters. Top, membrane absorption was monitored at the indicated wavelengths and the flash was at time 0 at 532 nm. Bottom, absorption difference spectrum at 5 ms after the flash for the environmental sample (black) and for E. coli-expressed proteorhodopsin (red).

© 2001 Macmillan Magazines Ltd

http://www.sciencemag.org/content/289/5486/1902.full

http://www.nature.com/nature/journal/v411/n6839/abs/411786a0.html

Page 5: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Papers for Today

Oded Béjà et al., Bacterial Rhodopsin: Evidence for a New Type of Phototrophy in the Sea. Science 289, 1902 (2000); DOI: 10.1126/science.289.5486.1902


http://www.sciencemag.org/content/289/5486/1902.full

Page 6: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Extremely halophilic archaea contain retinal-binding integral membrane proteins called bacteriorhodopsins that function as light-driven proton pumps. So far, bacteriorhodopsins capable of generating a chemiosmotic membrane potential in response to light have been demonstrated only in halophilic archaea. We describe here a type of rhodopsin derived from bacteria that was discovered through genomic analyses of naturally occurring marine bacterioplankton. The bacterial rhodopsin was encoded in the genome of an uncultivated γ-proteobacterium and shared highest amino acid sequence similarity with archaeal rhodopsins. The protein was functionally expressed in Escherichia coli and bound retinal to form an active, light-driven proton pump. The new rhodopsin exhibited a photochemical reaction cycle with intermediates and kinetics characteristic of archaeal proton-pumping rhodopsins. Our results demonstrate that archaeal-like rhodopsins are broadly distributed among different taxa, including members of the domain Bacteria. Our data also indicate that a previously unsuspected mode of bacterially mediated light-driven energy generation may commonly occur in oceanic surface waters worldwide.

Page 7: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 8: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

SAR86

tion with multiple sequence alignments, indicates that the majority of active site residues are well conserved between proteorhodopsin and archaeal bacteriorhodopsins (15).

A phylogenetic comparison with archaeal rhodopsins placed proteorhodopsin on an independent long branch, with moderate statistical support for an affiliation with sensory rhodopsins (16) (Fig. 1B). The finding of archaeal-like rhodopsins in organisms as diverse as marine proteobacteria and eukarya (6) suggests a potential role for lateral gene transfer in their dissemination. Available genome sequence data are insufficient to identify the evolutionary origins of the proteorhodopsin genes. The environments from which the archaeal and bacterial rhodopsins originate are, however, strikingly different. Proteorhodopsin is of marine origin, whereas the archaeal rhodopsins of extreme halophiles experience salinity 4 to 10 times greater than that in the sea (14).

Functional analysis. To determine whether proteorhodopsin binds retinal, we expressed the protein in Escherichia coli (17). After 3 hours of induction in the presence of retinal, cells expressing the protein acquired a reddish pigmentation (Fig. 3A). When retinal was added to the membranes of cells expressing the proteorhodopsin apoprotein, an absorbance peak at 520 nm was observed after 10 min of incubation (Fig. 3B). On further incubation, the peak at 520 nm increased and had a ~100-nm half-bandwidth. The 520-nm pigment was generated only in membranes containing proteorhodopsin apoprotein, and only in the presence of retinal, and its ~100-nm half-bandwidth is typical of retinylidene protein absorption spectra found in other rhodopsins. The red-shifted λmax of retinal (λmax = 370 nm in the free state) is indicative of a protonated Schiff base linkage of the retinal, presumably to the lysine residue in helix G (18).

Light-mediated proton translocation was determined by measuring pH changes in a cell suspension exposed to light. Net outward transport of protons was observed solely in proteorhodopsin-containing E. coli cells and only in the presence of retinal and light (Fig. 4A). Light-induced acidification of the medium was completely abolished by the presence of a 10 μM concentration of the protonophore carbonyl cyanide m-chlorophenylhydrazone (19). Illumination generated a membrane electrical potential in proteorhodopsin-containing right-side-out membrane vesicles, in the presence of retinal, reaching –90 mV 2 min after light onset (20) (Fig. 4B). These data indicate that proteorhodopsin translocates protons and is capable of generating membrane potential in a physiologically relevant range. Because these activities were observed in E. coli membranes containing overexpressed protein, the levels of proteorhodopsin activity in its native state remain to be determined. The ability of proteorhodopsin to generate a physiologically significant membrane potential, however, even when heterologously expressed in nonnative membranes, is consistent with a postulated proton-pumping function for proteorhodopsin.
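To put the –90 mV figure in energetic terms, here is a minimal sketch (not from the paper) that converts a membrane electrical potential into the electrical work per mole of protons using ΔG = zFΔψ. It assumes only the electrical component matters, ignoring any ΔpH contribution to the proton motive force; the function name is hypothetical.

```python
FARADAY_C_PER_MOL = 96485.332  # Faraday constant, coulombs per mole of charge


def electrical_energy_kj_per_mol(delta_psi_mv: float, charge: int = 1) -> float:
    """Magnitude of the electrical free energy (kJ/mol) for moving ions of the
    given charge across a membrane potential given in millivolts.

    Assumption: only the electrical term z*F*delta_psi is considered; the
    chemical (delta pH) term of the proton motive force is ignored.
    """
    delta_psi_volts = delta_psi_mv / 1000.0
    joules_per_mol = abs(charge * FARADAY_C_PER_MOL * delta_psi_volts)
    return joules_per_mol / 1000.0


if __name__ == "__main__":
    # Potential reported for proteorhodopsin-containing vesicles (Fig. 4B).
    print(round(electrical_energy_kj_per_mol(-90.0), 2), "kJ/mol")
```

Under these assumptions a 90 mV potential corresponds to a little under 9 kJ per mole of protons, so several protons would have to cross per ATP made.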

Archaeal bacteriorhodopsin, and to a lesser extent sensory rhodopsins (21), can both mediate light-driven proton-pumping activity. However, sensory rhodopsins are generally cotranscribed with genes encoding their own transducer of light stimuli [for example, Htr (22, 23)]. Although sequence analysis of proteorhodopsin shows moderate statistical support for a specific relationship with sensory rhodopsins, there is no gene for an Htr-like regulator adjacent to the proteorhodopsin gene. The absence of an Htr-like gene in close proximity to the proteorhodopsin gene suggests that proteorhodopsin may function primarily as a light-driven proton pump. It is possible, however, that such a regulator might be encoded elsewhere in the proteobacterial genome.

To further verify a proton-pumping function for proteorhodopsin, we characterized the kinetics of its photochemical reaction cycle. The transport rhodopsins (bacteriorhodopsins and halorhodopsins) are characterized by cyclic photochemical reaction se-

Fig. 1. (A) Phylogenetic tree of bacterial 16S rRNA gene sequences, including that encoded on the 130-kb bacterioplankton BAC clone (EBAC31A08) (16). (B) Phylogenetic analysis of proteorhodopsin with archaeal (BR, HR, and SR prefixes) and Neurospora crassa (NOP1 prefix) rhodopsins (16). Nomenclature: Name_Species.abbreviation_Genbank.gi (HR, halorhodopsin; SR, sensory rhodopsin; BR, bacteriorhodopsin). Halsod, Halorubrum sodomense; Halhal, Halobacterium salinarum (halobium); Halval, Haloarcula vallismortis; Natpha, Natronomonas pharaonis; Halsp, Halobacterium sp; Neucra, Neurospora crassa.


www.sciencemag.org SCIENCE VOL 289 15 SEPTEMBER 2000 1903


Page 9: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

The 16S ribosomal RNA neighbor-joining tree was constructed on the basis of 1256 homologous positions by the “neighbor” program of the PHYLIP package [J. Felsenstein, Methods Enzymol. 266, 418 (1996)]. DNA distances were calculated with Kimura’s 2-parameter method.
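As an illustration of the distance step named above, here is a minimal sketch of the Kimura 2-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the observed fractions of transitions and transversions. This is not the PHYLIP implementation; the function name and the choice to skip gapped or ambiguous positions are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Assumption: positions where either sequence has a gap or an ambiguity
    code are skipped rather than modelled. RNA 'U' is treated as 'T'.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    compared = transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated).
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))
```

A matrix of such pairwise distances is the kind of input a neighbor-joining program takes.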

Page 10: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Cloning of proteorhodopsin. Sequence analysis of a 130-kb genomic fragment that encoded the ribosomal RNA (rRNA) operon from an uncultivated member of the marine γ-Proteobacteria (that is, the “SAR86” group) (8, 9) (Fig. 1A) also revealed an open reading frame (ORF) encoding a putative rhodopsin (referred to here as proteorhodopsin) (10).
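For readers unfamiliar with how an ORF is spotted in a sequenced fragment, here is a minimal sketch of a six-frame ORF scan. It is not the method used in the paper: it assumes ATG start codons, the standard stop codons, and an arbitrary minimum length, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Return (strand, start, end, n_codons) for each ATG-to-stop ORF.

    Coordinates are 0-based, end-exclusive, and for the '-' strand they refer
    to the reverse-complemented sequence. n_codons excludes the stop codon.
    Assumption: only ATG starts and TAA/TAG/TGA stops are recognised.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None
    return orfs
```

Candidate ORFs found this way would then be compared against protein databases to assign a putative function, as was done for the rhodopsin-like ORF.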

Page 11: Microbial Phylogenomics (EVE161) Class 14: Metagenomics


Page 12: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 13: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 14: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Marine Microbe Background

• rRNA PCR studies of marine microbes have been extensive

• Comparative analysis had revealed many lineages, some very novel, some less so, that were dominant in many, if not all, open ocean samples

• Lineages given names based on specific clones: e.g., SAR11, SAR86, etc.


Page 15: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

© 1990 Nature Publishing Group

http://www.nature.com/nature/journal/v345/n6270/abs/345060a0.html

Page 16: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

© 1990 Nature Publishing Group


Page 17: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

© 1990 Nature Publishing Group

Page 18: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

APPLIED AND ENVIRONMENTAL MICROBIOLOGY, June 1991, p. 1707-1713, Vol. 57, No. 6

0099-2240/91/061707-07$02.00/0 Copyright © 1991, American Society for Microbiology

Phylogenetic Analysis of a Natural Marine Bacterioplankton Population by rRNA Gene Cloning and Sequencing

THERESA B. BRITSCHGI AND STEPHEN J. GIOVANNONI*

Department of Microbiology, Oregon State University, Corvallis, Oregon 97331

Received 5 February 1991/Accepted 25 March 1991

The identification of the prokaryotic species which constitute marine bacterioplankton communities has been a long-standing problem in marine microbiology. To address this question, we used the polymerase chain reaction to construct and analyze a library of 51 small-subunit (16S) rRNA genes cloned from Sargasso Sea bacterioplankton genomic DNA. Oligonucleotides complementary to conserved regions in the 16S rDNAs of eubacteria were used to direct the synthesis of polymerase chain reaction products, which were then cloned by blunt-end ligation into the phagemid vector pBluescript. Restriction fragment length polymorphisms and hybridizations to oligonucleotide probes for the SAR11 and marine Synechococcus phylogenetic groups indicated the presence of at least seven classes of genes. The sequences of five unique rDNAs were determined completely. In addition to 16S rRNA genes from the marine Synechococcus cluster and the previously identified but uncultivated microbial group, the SAR11 cluster [S. J. Giovannoni, T. B. Britschgi, C. L. Moyer, and K. G. Field, Nature (London) 345:60-63], two new gene classes were observed. Phylogenetic comparisons indicated that these belonged to unknown species of α- and γ-proteobacteria. The data confirm the earlier conclusion that a majority of planktonic bacteria are new species previously unrecognized by bacteriologists.

Within the past decade, it has become evident that bacterioplankton contribute significantly to biomass and biogeochemical activity in planktonic systems (4, 36). Until recently, progress in identifying the microbial species which constitute these communities was slow, because the majority of the organisms present (as measured by direct counts) cannot be recovered in cultures (5, 14, 16). The discovery a decade ago that unicellular cyanobacteria are widely distributed in the open ocean (34) and the more recent observation of abundant marine prochlorophytes (3) have underscored the uncertainty of our knowledge about bacterioplankton community structure.

Molecular approaches are now providing genetic markers for the dominant bacterial species in natural microbial populations (6, 21, 33). The immediate objectives of these studies are twofold: (i) to identify species, both known and novel, by reference to sequence data bases; and (ii) to construct species-specific probes which can be used in quantitative ecological studies (7, 29).

Previously, we reported rRNA genetic markers and quantitative hybridization data which demonstrated that a novel lineage belonging to the α-proteobacteria was abundant in the Sargasso Sea; we also identified novel lineages which were very closely related to cultivated marine Synechococcus spp. (6). In a study of a hot-spring ecosystem, Ward and coworkers examined cloned fragments of 16S rRNA cDNAs and found numerous novel lineages but no matches to genes from cultivated hot-spring bacteria (33). These results suggested that natural ecosystems in general may include species which are unknown to microbiologists.

Although any gene may be used as a genetic marker, rRNA genes offer distinct advantages. The extensive use of 16S rRNAs for studies of microbial systematics and evolution has resulted in large computer data bases, such as the RNA Data Base Project, which encompass the phylogenetic diversity found within culture collections. rRNA genes are highly conserved and therefore can be used to examine distant phylogenetic relationships with accuracy. The presence of large numbers of ribosomes within cells and the regulation of their biosyntheses in proportion to cellular growth rates make these molecules ideally suited for ecological studies with nucleic acid probes.

Here we describe the partial analysis of a 16S rDNA library prepared from photic zone Sargasso Sea bacterioplankton using a general approach for cloning eubacterial 16S rDNAs. The Sargasso Sea, a central oceanic gyre, typifies the oligotrophic conditions of the open ocean, providing an example of a microbial community adapted to extremely low-nutrient conditions. The results extend our previous observations of high genetic diversity among closely related lineages within this microbial community and provide rDNA genetic markers for two novel eubacterial lineages.

* Corresponding author.

MATERIALS AND METHODS

Collection and amplification of rDNA. A surface (2-m) bacterioplankton population was collected from hydrostation S (32° 4'N, 64° 23'W) in the Sargasso Sea as previously described (8). The polymerase chain reaction (24) was used to produce double-stranded rDNA from the mixed-microbial-population genomic DNA prepared from the plankton sample. The amplification primers were complementary to conserved regions in the 5' and 3' regions of the 16S rRNA genes. The amplification primers were the universal 1406R (ACGGGCGGTGTGTRC) and eubacterial 68F (TNANACATGCAAGTCGAKCG) (17). The reaction conditions were as follows: 1 μg of template DNA, 10 μl of 10× reaction buffer (500 mM KCl, 100 mM Tris HCl [pH 9.0 at 25°C], 15 mM MgCl2, 0.1% gelatin, 1% Triton X-100), 1 U of Taq DNA polymerase (Promega Biological Research Products, Madison, Wis.), 1 μM (each) primer, 200 μM (each) dATP, dCTP, dGTP, and dTTP in a 100-μl total volume; 1 min at 94°C, 1 min at 60°C, 4 min at 72°C; 30 cycles. Polymerase chain
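The two primers quoted above contain IUPAC degenerate bases (R, N, K), so each "primer" is really a pool of oligonucleotides. A minimal sketch, assuming the standard IUPAC nucleotide codes, of how such a primer expands into its concrete sequences; the function name is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer: str) -> list:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, primer in (("1406R", "ACGGGCGGTGTGTRC"),
                         ("68F", "TNANACATGCAAGTCGAKCG")):
        print(name, len(expand_primer(primer)), "variants")
```

The degeneracy is what lets one primer pair anneal to 16S rRNA genes from many different lineages in a mixed sample.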


[Figure: phylogenetic tree. Tip labels recoverable from the page: Escherichia coli, Vibrio harveyi, Vibrio anguillarum, Oceanospirillum linum, SAR92, Caulobacter crescentus, SAR83, Erythrobacter OCh 114, Hyphomicrobium vulgare, Rhodopseudomonas marina, Rochalimaea quintana, Agrobacterium tumefaciens, SAR1, SAR95, SAR11, Liverwort CP, Anacystis nidulans, Prochlorothrix, SAR6, SAR7, SAR100, SAR139. Scale: 0 to 0.15 in steps of 0.05.]

FIG. 4. Phylogenetic tree showing relationships of the rDNA clones from the Sargasso Sea to representative, cultivated species (2, 6, 18, 30, 31, 32, 35, 40). Positions of uncertain homology in regions containing insertions and deletions were omitted from the analysis. Evolutionary distances were calculated by the method of Jukes and Cantor (15), which corrects for the effects of superimposed mutations. The phylogenetic tree was determined by a distance matrix method (20). The tree was rooted with the sequence of Bacillus subtilis (38). Sequence data not referenced were provided by C. R. Woese and R. Rossen.
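The Jukes and Cantor correction mentioned in the caption can be sketched as follows: with p the observed fraction of differing sites, d = -3/4 ln(1 - 4p/3). This is a minimal illustration, not the authors' code; the function name and the decision to skip gapped or ambiguous positions are assumptions.

```python
import math


def jukes_cantor_distance(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance between two aligned nucleotide sequences.

    Assumption: positions with gaps or ambiguity codes in either sequence
    are skipped. Both DNA (T) and RNA (U) letters are accepted.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGTU" or b not in "ACGTU":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as the raw mismatch fraction, which is the sense in which it accounts for superimposed mutations.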

phototroph phyla were conserved in the corresponding genes, which could also be identified by overall primary sequence similarities. The oxygenic phototroph genes cloned from the bacterioplankton population were similar to the 16S rRNA genes of cultured marine Synechococcus spp. isolated from the same oceanic region (similarity, >0.96; 33). Moreover, the cloned oxygenic phototroph genes contained all of the eubacterial and cyanobacterial signature nucleotides described by Woese (37). The differences among the genes were localized in hypervariable regions of the molecule and were confirmed by compensatory base changes across helices.

When multiple genes of a given type were sequenced, as with the SAR11 and oxygenic phototroph groups, a pattern of sequence variation which indicated the presence of clusters of very closely related genes emerged. We reported this phenomenon earlier and further confirm it with the additional sequences presented here (SAR95, SAR100, and SAR139). Three approaches indicated that the sources of this variation were biological, rather than artifactual. First, in all cases compensatory changes were found in bases paired across helices. Second, identical primary sequences and variations were observed when SAR11 cluster genes amplified from natural population Sargasso Sea DNA with specific primers (SAR11 F and 1406 R) were sequenced directly (unpublished data). Third, the distributions of the variations within the sequences were inconsistent with the explanation that chimeric gene artifacts had occurred. For example, there were no genes containing ambiguous combinations of signature nucleotides, tracts of unpaired bases in helical regions, or mosaic patterns of similarity in dot plots. A noteworthy exception occurred in the SAR11 cluster, in which all members exhibited an inversion on the order of a very highly conserved base pair between positions 570 and 880 (G-C to C-G). We note with caution that some classes of chimeric gene artifacts, particularly those involving multiple exchanges among very closely related genes, would be impossible to detect.

There are several alternative explanations for the variability within the sequence clusters. Microheterogeneities in 16S rDNA gene families may account for some of the variability, particularly among very closely related genes (e.g., SAR1 and SAR95; SAR100 and SAR139). There currently are no documented cases of microheterogeneities in 16S rDNA gene families, although many investigators informally report their occurrence. It is likely that some of the sequence variability among the more distantly related sequences is a consequence of the divergence and speciation of cellular lineages. Wood and Townsend reported high genetic diversity among marine Synechococcus isolates on the basis of RFLP data from four loci (39). The clone model of bacterial population structure, which states that natural populations are mixtures of independent clonal cell lines (26), predicts that genetic diversity will be high in bacterial populations with large effective population sizes. We estimate that the actual marine Synechococcus population size is ca. 10^26. Viral predation may also play a role by promoting the divergence of lineages through disruptive selection (22).

The two novel eubacterial 16S rRNA lineages described here, SAR83 and SAR92, could both be identified as proteobacteria. SAR92, a member of the γ-proteobacteria, formed no closer specific relationships. Since the γ-proteobacteria are diverse in metabolic capacity, an ecological role for SAR92 could not be deduced from the observed phylogenetic relationships.

The specific phylogenetic relationship between SAR83 and the marine species E. longus OCh114 suggests the intriguing possibility that these organisms might also share


Page 19: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

JOURNAL OF BACTERIOLOGY, July 1991, p. 4371-4378, Vol. 173, No. 14
0021-9193/91/144371-08$02.00/0
Copyright © 1991, American Society for Microbiology

Analysis of a Marine Picoplankton Community by 16S rRNA Gene Cloning and Sequencing

THOMAS M. SCHMIDT,† EDWARD F. DELONG,‡ AND NORMAN R. PACE*

Department of Biology and Institute for Molecular and Cellular Biology, Indiana University, Bloomington, Indiana 47405

Received 7 January 1991/Accepted 13 May 1991

The phylogenetic diversity of an oligotrophic marine picoplankton community was examined by analyzing the sequences of cloned ribosomal genes. This strategy does not rely on cultivation of the resident microorganisms. Bulk genomic DNA was isolated from picoplankton collected in the north central Pacific Ocean by tangential flow filtration. The mixed-population DNA was fragmented, size fractionated, and cloned into bacteriophage lambda. Thirty-eight clones containing 16S rRNA genes were identified in a screen of 3.2 × 10^4 recombinant phage, and portions of the rRNA gene were amplified by polymerase chain reaction and sequenced. The resulting sequences were used to establish the identities of the picoplankton by comparison with an established data base of rRNA sequences. Fifteen unique eubacterial sequences were obtained, including four from cyanobacteria and eleven from proteobacteria. A single eucaryote related to dinoflagellates was identified; no archaebacterial sequences were detected. The cyanobacterial sequences are all closely related to sequences from cultivated marine Synechococcus strains and with cyanobacterial sequences obtained from the Atlantic Ocean (Sargasso Sea). Several sequences were related to common marine isolates of the γ subdivision of proteobacteria. In addition to sequences closely related to those of described bacteria, sequences were obtained from two phylogenetic groups of organisms that are not closely related to any known rRNA sequences from cultivated organisms. Both of these novel phylogenetic clusters are proteobacteria, one group within the α subdivision and the other distinct from known proteobacterial subdivisions. The rRNA sequences of the α-related group are nearly identical to those of some Sargasso Sea picoplankton, suggesting a global distribution of these organisms.
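As a quick worked check of the screening numbers in the abstract above, the following Python sketch computes the fraction of recombinant phage that carried a 16S rRNA gene. The function name `hit_rate` is hypothetical, introduced only for this illustration; the two input numbers are taken from the abstract.

```python
def hit_rate(positive_clones: int, clones_screened: float) -> float:
    """Fraction of screened clones that were positive in the screen."""
    if clones_screened <= 0:
        raise ValueError("clones_screened must be positive")
    return positive_clones / clones_screened


# Numbers from the abstract: 38 rRNA-gene clones among 3.2 x 10^4 phage.
rate = hit_rate(38, 3.2e4)
print(f"hit rate: {rate:.3%}")                 # roughly 0.12 percent
print(f"about 1 clone in {round(1 / rate)}")   # roughly 1 in 840
```

A yield of roughly one positive clone per several hundred phage is consistent with the authors' framing: direct cloning avoids an amplification step before cloning, at the cost of a much larger screen.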

Marine picoplankton, organisms between 0.2 and 2 µm in diameter (18), are thought to play a significant role in global mineral cycles (2), yet little is known about the organismal composition of picoplankton communities. Standard microbiological techniques that involve the study of pure cultures of microorganisms provide a glimpse at the diversity of picoplankton, but most viable picoplankton resist cultivation (8, 10). The inability to cultivate organisms seen in the environment is a common bane of microbial ecology (1).

As an alternative to reliance on cultivation, molecular approaches based on phylogenetic analyses of rRNA sequences have been used to determine the species composition of microbial communities (14). The sequences of rRNAs (or their genes) from naturally occurring organisms are compared with known rRNA sequences by using techniques of molecular phylogeny. Some properties of an otherwise unknown organism can be inferred on the basis of the properties of its known relatives because representatives of particular phylogenetic groups are expected to share properties common to that group. The same sequence variations that are the basis of the phylogenetic analysis can be used to identify and quantify organisms in the environment by hybridization with organism-specific probes.

Approaches that have been used to obtain rRNA sequences, and thereby identify microorganisms in natural samples without the requirement of laboratory cultivation, include direct sequencing of extracted 5S rRNAs (19, 20), analysis of cDNA libraries of 16S rRNAs (21, 24), and analysis of cloned 16S rRNA genes obtained by amplification using the polymerase chain reaction (PCR) (3). Each of these approaches potentially imposes a selection on the sequences that are analyzed. Minor constituents of communities may not be detected using the 5S rRNA because only abundant rRNAs can be analyzed. Methods that copy naturally occurring sequences in vitro before cloning potentially select for sequences that interact particularly favorably with primers or polymerases.

In this study we have characterized 16S rRNA sequences from a Pacific Ocean picoplankton population by using methods chosen to minimize selection of particular sequences. DNA from the mixed population was cloned directly into phage λ, and rRNA gene-containing clones were identified subsequently. The methods used in the analysis are applicable to other natural microbial communities. The results are correlated with sequences from cultivated organisms and with 16S rRNA gene sequences selected by PCR from Atlantic picoplankton (3).

* Corresponding author.
† Present address: Department of Microbiology, Miami University, Oxford, OH 45056.
‡ Present address: Woods Hole Oceanographic Institution, Woods Hole, MA 02543.

MATERIALS AND METHODS

Collection and microscopy of picoplankton. Picoplankton were collected in the north central Pacific Ocean at the ALOHA Global Ocean Flux Study site (22°45'N, 158°00'W) from aboard the R/V Moana Wave. As previously detailed (5), seawater was pumped onboard through a 10-µm-pore-size Nytex filter and concentrated by tangential flow filtration using 10 ft^2 (ca. 6,450 cm^2) of 0.1-µm-pore-size fluorocarbon membrane, and cells were pelleted from the resulting concentrate by centrifugation and frozen. Sampling began on


Page 20: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

INSIGHT REVIEW NATURE|Vol 437|15 September 2005


community. As discussed below, some opportunistic strains that exploit transient conditions may fall into this category.

The interpretation of the Sargasso Sea environmental sequence data is already inspiring debate [16]. 16S rRNA gene sequence data from the Sargasso Sea WGS data set are shown in Fig. 2 without the sequences of Burkholderia and Shewanella, which are rare in the Sargasso Sea ecosystem and have been questioned as possible contaminants [16]. Naming microbial plankton by clade, as shown in Fig. 2, is a convention used by most oceanographers that is based on evolutionary principles. With few exceptions, the Sargasso Sea data fall into previously named microbial plankton clades. However, Venter has emphasized the new diversity shown by the data, concluding that 1,800 ‘genomic’ species of bacteria and 145 new ‘phylotypes’ inhabited the samples recovered from the Sargasso Sea [18]. To reach this conclusion he applied a rule-of-thumb which assumes that 16S rRNA gene sequences that are less than 97% similar originate from different species. As we discuss further below, the origins of 16S rRNA sequence diversity within the named microbial plankton clades is a hot issue. But, however they are interpreted, the high reliability of raw WGS sequence data will be very useful for understanding the mechanisms of microbial evolution in the oceans.
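The 97% rule of thumb described above can be illustrated with a small Python sketch. It assumes the 16S rRNA sequences are already aligned to a common length, and it uses a simple greedy, seed-based grouping chosen here purely for illustration; this is not the procedure used in the Sargasso Sea analysis, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def group_by_threshold(aligned: dict, threshold: float = 97.0) -> list:
    """Greedy grouping: a sequence joins the first group whose seed (first
    member) it matches at >= threshold percent identity; otherwise it
    starts a new group."""
    groups = []
    for name, seq in aligned.items():
        for group in groups:
            if percent_identity(seq, aligned[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```

Because each sequence is compared only to a group's seed, the result depends on input order. That sensitivity is one concrete reason a single similarity cutoff is a blunt tool for delimiting microbial species.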

Although most of the major microbial plankton clades have cosmopolitan distributions, new marine microbial plankton clades continue to emerge from studies that focus on unique hydrographic features. For example, the new Archaea Crenarchaeota and Euryarchaeota were discovered at the brine–seawater interface of the Shaban Deep, in the Red Sea [19].

Patterns in time and space

Molecular biology has filled in some of the blanks about the natural history of marine microbial plankton. As genetic markers became available for ecological studies, it soon emerged that some of the dominant microbial plankton clades are vertically stratified (Fig. 1). Early indications of these patterns came from the distributions of rRNA gene clones among libraries collected from different depths [20–23]. Although the study of microbial community stratification is far from complete, in many cases (marine unicellular cyanobacteria, SAR11, SAR202, SAR406, SAR324, group I marine Archaea) the vertical stratification of populations has been confirmed by alternative experimental approaches [21–27]. The obvious interpretation is that many of these groups are specialized to exploit vertical patterns in physical, chemical and biological factors. A clear example is the unicellular marine cyanobacteria. As obligate phototrophs, these cyanobacteria are confined to the photic zone. A similar pattern is found in the SAR86 clade of γ-Proteobacteria. Proteorhodopsin genes have been found in fragments of SAR86 genomes, suggesting that this clade has the potential for phototrophic metabolism.

Many of the enigmatic microbial groups for which no metabolic strategy has been identified are also stratified. The boundary between the photic zone and the dark upper mesopelagic is particularly striking: below the photic zone the abundance of picophytoplankton and SAR86 declines sharply, and marine group I Archaea, SAR202, SAR406 and SAR324 all assume a prevalent status [21–24, 26, 27]. The implications of these observations are clear: the upper mesopelagic community is almost certainly specialized to harvest resources descending from the photic zone. However, with the exception of the marine group I Archaea, very little specific information is available about the individual activities of the upper-mesopelagic groups.

There are also significant differences between coastal and ocean gyre microbial plankton populations (Fig. 1; ref. 28). Typically, continental shelves are far more productive than ocean gyres because physical processes such as upwelling and mixing bring nutrients to the surface. As a result eukaryotic phytoplankton make up a larger fraction of the biomass in coastal seas, and species differ between coastal and ocean populations. Most of the bacterial groups found in gyres also occur in large numbers in coastal seas, but a number of microbial plankton clades, particularly members of the β-Proteobacteria, have coastal ecotypes or appear to be predominantly confined to coastal seas [28].

One of the most enigmatic microbial groups in the ocean is the marine group I Archaea. Tantalizing geochemical evidence suggests that these organisms are chemoautotrophs [29]. In the 1990s, DeLong and Fuhrman established that archaea are widely distributed and numerically significant in the marine water column [11, 20, 30]. The marine group I Archaea are Crenarchaeotes. They predominantly occur in the mesopelagic, but are found at the surface in the cold waters of the southern ocean during the winter. Fluorescence in situ hybridization technology was used to demonstrate that marine group I Archaea populations comprise about 40% of the mesopelagic microbial community over vast expanses of the ocean, making them one of the most abundant organisms on the planet [24]. All of the marine Archaea remain uncultured.

New data about microbial distributions has provided tantalizing hints about geochemical activity, but most progress on this question

Figure 1 | Schematic illustration of the phylogeny of the major plankton clades. Black letters indicate microbial groups that seem to be ubiquitous in seawater. Gold indicates groups found in the photic zone. Blue indicates groups confined to the mesopelagic and surface waters during polar winters. Green indicates microbial groups associated with coastal ocean ecosystems.

[Figure 1 labels; colour coding is not preserved in this transcript]

Archaea
• Crenarchaeota: Group I Archaea
• Euryarchaeota: Group II Archaea, Group III Archaea, Group IV Archaea

Bacteria
• α-Proteobacteria: * SAR11 - Pelagibacter ubique, * Roseobacter clade, OCS116
• β-Proteobacteria: * OM43
• γ-Proteobacteria: SAR86, * OMG Clade, * Vibrionaceae, * Pseudoalteromonas, * Marinomonas, * Halomonadaceae, * Colwellia, * Oceanospirillum
• δ-Proteobacteria
• Cyanobacteria: * Marine Cluster A (Synechococcus), * Prochlorococcus sp.
• Lentisphaerae: * Lentisphaera araneosa
• Bacteroidetes
• Marine Actinobacteria
• Fibrobacter: SAR406
• Planctobacteria
• Chloroflexi: SAR202


© 2005 Nature Publishing Group

NATURE|Vol 437|15 September 2005|doi:10.1038/nature04158 INSIGHT REVIEW


Molecular diversity and ecology of microbial plankton

Stephen J. Giovannoni^1 & Ulrich Stingl^1

The history of microbial evolution in the oceans is probably as old as the history of life itself. In contrast to terrestrial ecosystems, microorganisms are the main form of biomass in the oceans, and form some of the largest populations on the planet. Theory predicts that selection should act more efficiently in large populations. But whether microbial plankton populations harbour organisms that are models of adaptive sophistication remains to be seen. Genome sequence data are piling up, but most of the key microbial plankton clades have no cultivated representatives, and information about their ecological activities is sparse.

cultivation of key organisms, metagenomics and ongoing biogeochemical studies. It seems very likely that the biology of the dominant microbial plankton groups will be unravelled in the years ahead.

Here we review current knowledge about marine bacterial and archaeal diversity, as inferred from phylogenies of genes recovered from the ocean water column, and consider the implications of microbial diversity for understanding the ecology of the oceans. Although we leave protists out of the discussion, many of the same issues apply to them. Some of the studies we refer to extend to the abyssal ocean, but we focus principally on the surface layer (0–300 m), the region of highest biological activity.

Phylogenetic diversity in the ocean

Small-subunit ribosomal RNA (rRNA) genes have become universal phylogenetic markers and are the main criteria by which microbial plankton groups are identified and named [9]. Most of the marine microbial groups were first identified by sequencing rRNA genes cloned from seawater [10–14], and remain uncultured today. Soon after the first reports came in, it became apparent that less than 20 microbial clades accounted for most of the genes recovered [15]. Figure 1 is a schematic illustration of the phylogeny of these major plankton clades. The taxon names marked with asterisks represent groups for which cultured isolates are available.

The recent large-scale shotgun sequencing of seawater DNA is providing much higher resolution 16S rRNA gene phylogenies and biogeographical distributions for marine microbial plankton. Although the main purpose of Venter’s Sorcerer II expedition is to gather whole-genome shotgun sequence (WGS) data from planktonic microorganisms [16], thousands of water-column rRNA genes are part of the by-catch. The first set of collections, from the Sargasso Sea, have yielded 1,184 16S rRNA gene fragments. These data are shown in Fig. 2, organized by clade structure. Such data are a rich scientific resource for two reasons. First, they are not tainted by polymerase chain reaction (PCR) artefacts; PCR artefacts rarely interfere with the correct placement of genes in phylogenetic categories, but they are a major problem for reconstructing evolutionary patterns at the population level [17]. Second, the enormous number of genes provided by the Sorcerer II expedition is revealing the distribution patterns and abundance of microbial groups that compose only a small fraction of the

Certain characteristics of the ocean environment (the prevailing low-nutrient state of the ocean surface, in particular) mean it is sometimes regarded as an extreme ecosystem. Fixed forms of nitrogen, phosphorus and iron are often at very low or undetectable levels in the ocean’s circulatory gyres, which occur in about 70% of the oceans [1]. Photosynthesis is the main source of metabolic energy and the basis of the food chain; ocean phytoplankton account for nearly 50% of global carbon fixation, and half of the carbon fixed into organic matter is rapidly respired by heterotrophic microorganisms. Most cells are freely suspended in the mainly oxic water column, but some attach to aggregates. In general, these cells survive either by photosynthesizing or by oxidizing dissolved organic matter (DOM) or inorganic compounds, using oxygen as an electron acceptor.

Microbial cell concentrations are typically about 10^5 cells ml^-1 in the ocean surface layer (0–300 m); thymidine uptake into microbial DNA indicates average growth rates of about 0.15 divisions per day (ref. 2). Efficient nutrient recycling, in which there is intense competition for scarce resources, sustains this growth, with predation by viruses and protozoa keeping populations in check and driving high turnover rates [3]. Despite this competition, steady-state dissolved organic carbon (DOC) concentrations are many times higher than carbon sequestered in living microbial biomass [4]. However, the average age of the DOC pool in the deep ocean, of about 5,000 years [5] (determined by isotopic dating), suggests that much of the DOM is refractory to degradation. Although DOM is a huge resource, rivalling atmospheric CO2 as a carbon pool [6], chemists have been thwarted by the complexity of DOM and have characterized it only in broad terms [7].
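To put the quoted growth rate in more familiar terms, the short Python sketch below converts "divisions per day" into a doubling time and a fold increase. It assumes simple unchecked exponential growth, which the paragraph above says does not hold in the ocean because predation keeps populations in check. The function names are hypothetical.

```python
def doubling_time_days(divisions_per_day: float) -> float:
    """Days needed for one division at the given average rate."""
    if divisions_per_day <= 0:
        raise ValueError("divisions_per_day must be positive")
    return 1.0 / divisions_per_day


def fold_increase(divisions_per_day: float, days: float) -> float:
    """Fold change after `days` of unchecked growth: 2 ** (rate * days)."""
    return 2.0 ** (divisions_per_day * days)


rate = 0.15  # divisions per day, the value quoted in the text
print(f"doubling time: {doubling_time_days(rate):.1f} days")
print(f"fold increase after 20 days: {fold_increase(rate, 20):.1f}")
```

At 0.15 divisions per day a cell divides about once every 6.7 days. A roughly constant standing stock near 10^5 cells per ml therefore implies that losses to grazing and viral lysis remove cells at about the same rate.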

The paragraphs above capture prominent features of the ocean environment, but leave out the complex patterns of physical, chemical and biological variation that drive the evolution and diversification of microorganisms. For example, members of the genus Vibrio, which include some of the most common planktonic bacteria that can be isolated on nutrient agar plates, readily grow anaerobically by fermentation. The life cycles of some Vibrio species have been shown to include anoxic stages in association with animal hosts, but the broad picture of their ecology in the oceans has barely been characterized [8]. The story is similar for most of the microbial groups described below: the phylogenetic map is detailed, but the ecological panorama is thinly sketched. New information is rapidly flowing into the field from the

^1 Department of Microbiology, Oregon State University, Corvallis, Oregon 97331, USA.


Page 21: Microbial Phylogenomics (EVE161) Class 14: Metagenomics


near the tips of branches and could be attributed to neutral mutations accumulating in clonal populations. Thompson et al. went a step further by examining sequence divergence and genome variability in a set of 232 Vibrio splendidus isolates taken from the same coastal location at different times [35]. The isolates differed by less than 1% in rRNA gene sequences, but showed extensive variation in genome size and allelic diversity. These results could explain why Venter’s group found marine microbial genomes difficult to assemble from shotgun sequence data.

However, Kimura predicted that selection has more opportunity to act on small changes in fitness as population size increases, and therefore very large, stable populations should be more highly perfected by selection [36]. More specifically, Kimura coined the term ‘effective population size’ to refer to the minimal size reached by a population that is undergoing fluctuations. For these marine microbes, it may be that where large populations have not been through recent episodes of purifying selection, they are able to maintain very large reservoirs of neutral genetic variation. If this hypothesis is correct, then, at least within ecotypes of microbial plankton, one would expect to find a core set of genes conferring relatively conserved phenotype.

The SAR11 clade provided the earliest demonstration that the sub-clades of environmental gene clusters could be ecotypes [26]. Probing rRNA revealed the presence of a surface (IA) and a deep sub-clade (II), but failed to identify the niche of a third sub-clade (IB; ref. 26). More recently, the niche of SAR11 subclade IB emerged in a study of the transition between spring-bloom and summer-stratified conditions in the western Sargasso Sea [27]. Nonmetric multidimensional scaling revealed that the IB subclade occurs throughout the water column in the spring, apparently giving way to the more specialized IA and II subclades when the water column becomes thermally stratified.

The ecotype concept continues to expand with the recognition that many microbial groups can be subdivided according to their distributions in the water column. The unicellular cyanobacteria are by far the best example. Two ecotypes of Prochlorococcus can readily be differentiated by their chlorophyll b/chlorophyll a ratios: a high-light-adapted (high-b/a) lineage, and a low-light-adapted (low-b/a) lineage. Phylogenetic evidence from internal transcribed spacers (ITS) suggests that the high-b/a strains can be differentiated into four genetically distinct lineages. The ITS-based phylogenies indicate that Marine Cluster A Synechococcus can be subdivided into six clades, three of which can be associated with adaptively important phenotypic characteristics (motility, chromatic adaptation, and lack of phycourobilin) [37, 38]. So far, the genome sequences available have provided ample support for the hypothesis that these ecotypes differ in characteristics that affect their ability to compete. Notably, the low-b/a strain SS120 has a much smaller genome than the others and can use only ammonium and amino acids for nitrogen sources [39]. At the other extreme, Synechococcus WH8102 can use ammonium, urea, nitrite, nitrate, cyanate, peptides and amino acids as sources of nitrogen. It is interesting to note that Marine Cluster A Synechococcus populations seem to prosper during periods of upwelling and vertical mixing, whereby nutrients are supplied but also cause chaotic, transitional conditions. Thus, as observed in the SAR11 clade, there seem to be seasonal specialists and stratification specialists in the marine unicellular cyanobacteria.

The observation that the major microbial plankton clades have diverged into ecotypes is powerful evidence that selection is creating functionally and genetically unique entities, despite the confounding influence of neutral variation, which causes relatively marked divergence in genome sequences. Although the unicellular marine cyanobacteria are a good model for what the future may hold, the debate about diversity is far from over.

Old paradigms challenged by new forms of phototrophy

The new millennium arrived in tandem with discoveries of new forms of phototrophy in the ocean surface, which in turn fundamentally changed perspectives on microbial food webs. Béjà et al. reported the

has come either from cultures or from approaches designed to yield information from experiments performed on native populations. Fluorescence-activated cell-sorting and in situ hybridization have both been used to separate populations and measure their uptake of radioactive substrates [31–33]. Testing hypotheses originating from genome sequences, oceanographers were surprised to find that the unicellular marine bacteria, particularly Prochlorococcus, can assimilate free amino acids; it had previously been thought that they rely solely on inorganic nitrogen [32].

The species question

The question of how to name microbial plankton species is not a trivial matter. For oceanographers the issue is: where should the lines be drawn so that organisms with different properties relevant to geochemistry are given unique names? From an evolutionary perspective, the question might be phrased differently: how does one demarcate cell populations that use the same resources and possess the same suites of adaptations inherited from a common ancestor? Confusion arises from the fact that there is no general agreement about the definition of a microbial ‘species’. The ‘97% rule’ is simple to apply but does not take into account the complex structure of microbial clades. For example, the unicellular marine cyanobacteria form a shallow clade that would constitute a single species by the 97% rule, but all agree that this clade contains several species with distinct phenotypes. Clades such as SAR86 and SAR11 are far more diverse, but can clearly be divided into subclades. One theory is that some of these ‘bushy’ subclades are ecotypes: populations with shared characters and unique niches [34].

Acinas and co-workers studied clade structure by using clone libraries prepared using PCR methods that reduced sequence artefacts [17]. They concluded that most sequence variation was clustered

[Figure 2: bar chart. Y axis: % of 16S rRNA sequences (scale 0 to 35). X axis: phylogenetic clade. Clade labels: SAR11 subgroups Ia + Ib; SAR86 (γ-Proteobacteria); SAR11 subgroup II; Marine Picophytoplankton; Uncultured α-Proteobacteria Clade; SAR406 (Fibrobacter); Bacteroidetes; SAR324 (δ-Proteobacteria); Marine Actinobacteria; Alteromonas/Pseudoalteromonas; SAR116 (α-Proteobacteria); Rheinheimera; Roseobacter clade; SAR202 (Chloroflexi).]

Figure 2 | 16S rRNA genes from the Sargasso Sea metagenome data set, organized by clades. The clades are shown in rank order according to gene abundance. To create the figure, 16S gene fragments were recovered from the data set by the BLAST program using 14 full sequences of different prokaryotic phyla as query. The resulting 934 sequences were phylogenetically analysed using the program package ARB. Venter et al. correctly reported more 16S rRNA gene fragments (1,184) because their analysis included smaller fragments that are excluded from the set of 934 sequences used in the analysis shown here. Genes belonging to the genera Burkholderia and Shewanella were omitted from the analysis because of suggestions that they are contaminants [72].


Page 22: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

DeLong Lab

• Studying SAR86 and other marine plankton

• Note: published one of the first genomic studies of uncultured microbes, in 1996

JOURNAL OF BACTERIOLOGY, Feb. 1996, p. 591–599, Vol. 178, No. 3
0021-9193/96/$04.00+0
Copyright © 1996, American Society for Microbiology

Characterization of Uncultivated Prokaryotes: Isolation and Analysis of a 40-Kilobase-Pair Genome Fragment from a Planktonic Marine Archaeon

JEFFEREY L. STEIN,^1* TERENCE L. MARSH,^2 KE YING WU,^3 HIROAKI SHIZUYA,^4 AND EDWARD F. DELONG^3*

Recombinant BioCatalysis, Inc., La Jolla, California 92037^1; Microbiology Department, University of Illinois, Urbana, Illinois 61801^2; Department of Ecology, Evolution and Marine Biology, University of California, Santa Barbara, California 93106^3; and Division of Biology, California Institute of Technology, Pasadena, California 91125^4

Received 14 July 1995/Accepted 14 November 1995

One potential approach for characterizing uncultivated prokaryotes from natural assemblages involves genomic analysis of DNA fragments retrieved directly from naturally occurring microbial biomass. In this study, we sought to isolate large genomic fragments from a widely distributed and relatively abundant but as yet uncultivated group of prokaryotes, the planktonic marine Archaea. A fosmid DNA library was prepared from a marine picoplankton assemblage collected at a depth of 200 m in the eastern North Pacific. We identified a 38.5-kbp recombinant fosmid clone which contained an archaeal small subunit ribosomal DNA gene. Phylogenetic analyses of the small subunit rRNA sequence demonstrated its close relationship to that of previously described planktonic archaea, which form a coherent group rooted deeply within the Crenarchaeota branch of the domain Archaea. Random shotgun sequencing of subcloned fragments of the archaeal fosmid clone revealed several genes which bore highest similarity to archaeal homologs, including large subunit ribosomal DNA and translation elongation factor 2 (EF2). Analyses of the inferred amino acid sequence of archaeoplankton EF2 supported its affiliation with the Crenarchaeote subdivision of Archaea. Two gene fragments encoding proteins not previously found in Archaea were also identified: RNA helicase, responsible for the ATP-dependent alteration of RNA secondary structure, and glutamate semialdehyde aminotransferase, an enzyme involved in initial steps of heme biosynthesis. In total, our results indicate that genomic analysis of large DNA fragments retrieved from mixed microbial assemblages can provide useful perspective on the physiological potential of abundant but as yet uncultivated prokaryotes.

Characterization of complex microbial communities by isolation and analysis of phylogenetically informative gene sequences has been an exciting development in microbiology (27, 32). Studies using molecular phylogenetic approaches based on small subunit (ssu) rRNA sequence analyses have resulted in new estimates of the phylogenetic diversity contained within naturally occurring microbial assemblages. Entirely new phylogenetic lineages, which may sometimes represent major constituents of natural microbial communities, have been revealed by using such approaches (3, 4, 7, 13, 16, 22, 29). Although results of these studies have contributed substantially to assessments of naturally occurring microbial diversity, their utility as predictors of the physiological attributes of newly described phylotypes has been more limited. This is partly because many phylogenetically coherent prokaryote lineages, for example, the proteobacteria, often encompass a bewildering array of physiological and metabolic diversity. The problem is further complicated by the fact that many of the newly discovered phylotypes revealed by molecular phylogenetic analysis are proving difficult to enrich and isolate in pure culture. These limitations suggest the need for alternative approaches to characterize the physiological and metabolic potential of as yet uncultivated microorganisms, which to date have been characterized solely by analyses of PCR-amplified rRNA gene fragments, clonally recovered from mixed assemblage nucleic acids.

Recent work has demonstrated that prokaryotes of the domain Archaea (35, 37) are more phylogenetically diverse and ecologically widespread than has been previously suspected (3, 7, 13, 24). Marine planktonic archaea have now been shown to occur in relatively high abundance in a variety of marine planktonic environments. For instance, in surface waters of Antarctica during late winter, as well as in subsurface waters of coastal temperate regions, planktonic marine archaeal rRNA can account for a significant fraction (5 to 35%) of the total picoplankton rRNA (7, 8). These and other data indicate that archaeoplankton are widespread, abundant components of marine picoplankton and presumably are important participants in microbially mediated cycling of energy and matter in the sea. Specific relatives of the crenarchaeote-affiliated marine archaea have also been detected in soil microbial assemblages indicating their further diversification (33a).

Although the genotypic and phenotypic properties of marine pelagic archaea are unknown at present, their abundance and ecological distribution suggest that they may represent entirely new phenotypic groups within the domain Archaea. Available data on these microorganisms consist solely of quantitative estimates of their abundance and distribution derived from ssu rRNA hybridization experiments (7, 8) or of the recovery and sequence analysis of PCR-amplified rRNA gene fragments (7,

* Corresponding author. Mailing address for Jefferey L. Stein: Recombinant BioCatalysis, Inc., 505 Coast Blvd. South, La Jolla, CA 92037. Electronic mail address: [email protected]. Mailing address for Edward F. DeLong: Department of Ecology, Evolution and Marine Biology, University of California, Santa Barbara, CA 93106. Electronic mail address: [email protected].


Page 23: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

DeLong Lab

Restriction mapping. Large genomic fragments isolated from fosmid clones were mapped by partial and double digestion with various restriction endonucleases. When the subclone sizes exceeded 10 kb, the F-factor-based vector pBAC108L (30) was used to accommodate the fosmid subfragments. Partial digestions were performed by adding 2.5 U of restriction enzyme to 1 µg of NotI-digested clone DNA in a 30-µl reaction mixture. The reaction mixture was incubated at 37°C, and 10-µl aliquots were removed at 10, 40, and 60 min. Restriction digestions were terminated by adding 1 µl of 0.5 M EDTA to the reaction mixtures and placing the tubes on ice. The partially digested DNA was separated by pulsed-field gel electrophoresis as described above except using a 1- to 3-s ramped switch time at 100 V for 16 h. The sizes of the separated fragments were determined relative to those of known standards. The distances of the restriction sites relative to the terminal T7 and SP6 promoter sites on the excised cassette were determined by end labeling 10 pmol of T7- or SP6-specific oligonucleotides with [γ-32P]ATP (7,000 Ci/mmol) and hybridizing with Southern blots of the gels.

Southern blots of agarose gels containing fosmid and pBAC clones digested with two or more restriction enzymes were probed with labeled T7 and SP6 oligonucleotides as well as random-prime-labeled subclones and PCR fragments carrying gene sequences identified from the shotgun sequencing described above. This information was correlated with the size estimates from the partial digestions to generate physical and genetic maps of the fosmids and their subclones.
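The logic of mapping by end-labeled partial digestion can be sketched in a few lines of Python. Only fragments that still carry the probed end (T7 or SP6) show up on the blot, so each detected band length is the distance from that end to one restriction site. This is a minimal illustration, not the authors' procedure: the function name and the band sizes are hypothetical, and only the 38.5-kbp insert length is taken from the paper.

```python
def map_sites_from_labeled_end(band_sizes_kb, insert_length_kb):
    """Infer restriction-site positions from an end-probed partial digest.

    Each band shorter than the full insert marks one site, measured from
    the probed end. The full-length band is the uncut insert.
    """
    sites = sorted(s for s in set(band_sizes_kb) if 0 < s < insert_length_kb)
    boundaries = [0.0] + sites + [insert_length_kb]
    # Differences between neighbouring sites are the complete-digest fragments.
    fragments = [round(b - a, 3) for a, b in zip(boundaries, boundaries[1:])]
    return sites, fragments


# Hypothetical band sizes (kb) read off a blot probed with the T7 end.
sites, fragments = map_sites_from_labeled_end([38.5, 4.0, 11.5, 27.0], 38.5)
print("site positions from probed end (kb):", sites)
print("complete-digest fragment sizes (kb):", fragments)
```

Probing the same blot from the opposite end should give the mirror-image set of distances, which is a useful consistency check on the map.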

Phylogenetic analysis. Sequence alignment and DeSoete distance (9) analyses were performed on a Sun Sparc 10 workstation using GDE 2.2 and Treetool 1.0, obtained from the ribosomal database project (RDP) (23). DeSoete least squares distance analyses (9) were performed by using pairwise evolutionary distances, calculated by using the correction of Olsen to account for empirical base frequencies (34). Reference sequences were obtained from the RDP, version 4.0 (23). Maximum likelihood analyses (10) of ssu rRNA sequences were performed by using fastDNAml 1.0 (25), obtained from the RDP. For distance analyses of the inferred amino acid sequence of EF2, evolutionary distances were estimated by using the Phylip program (12) Protdist, and tree topology was inferred by the Fitch-Margoliash method, using random taxon addition and global branch swapping. For maximum parsimony analyses of protein sequences, the Phylip program Protpars was used with random taxon addition and ordinary parsimony options.

Nucleotide sequence accession numbers. Partial sequences reported in Table 1 have been submitted to GenBank under the following accession numbers: U40238, U40239, U40240, U40241, U40242, U40243, U40244, and U40245. The nucleotide sequences encoding ssu rRNA and EF2 have been submitted to GenBank under accession numbers U39635 and U41261.

RESULTS

Figure 1 shows an overview of the procedures used to construct an environmental library from the mixed picoplankton sample. Our goal was to construct a stable, large insert DNA library representing picoplankton genomic DNA, in order to gain information about the genetic and physiological potential of one constituent group in this community, the planktonic marine Crenarchaeota. Agarose plugs containing high-molecular-weight picoplankton DNA were prepared by concentrating cells from 30 liters of seawater, using hollow fiber filtration. These agarose plugs, representing picoplankton collected from a variety of sites and depths in the eastern North Pacific, were screened for the presence of archaebacteria by using both eubacterium-biased (to test for positive amplification) and archaeon-biased rDNA primers. PCR amplification results from several of the agarose plugs (data not shown) indicated the presence of significant amounts of archaeal DNA. Quantitative hybridization experiments using rRNA extracted from one sample, collected at a depth of 200 m off the Oregon coast, indicated that planktonic archaea in this assemblage comprised approximately 4.7% of the total picoplankton biomass (this sample corresponds to "PAC1"-200 m in Table 1 of reference 8). Results from archaeon-biased rDNA PCR amplification performed on agarose plug lysates confirmed the presence of relatively large amounts of archaeal DNA in this sample. Agarose plugs prepared from this picoplankton sample were chosen for subsequent fosmid library preparation. Each 1-ml agarose plug from this site contained approximately 7.5 × 10^9 cells; therefore, approximately 5.4 × 10^8 cells were present in the 72-µl slice used in the preparation of the partially digested DNA.
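The cell count for the slice follows from simple proportional scaling, which the short Python check below reproduces. It assumes cells are spread evenly through the plug, which the text implies but does not state; the function and constant names are illustrative only.

```python
CELLS_PER_PLUG = 7.5e9     # cells in one 1-ml agarose plug (from the text)
PLUG_VOLUME_UL = 1000.0    # 1 ml expressed in microliters
SLICE_VOLUME_UL = 72.0     # slice used for the partial digestion


def cells_in_slice(cells_per_plug, plug_volume_ul, slice_volume_ul):
    """Scale the plug's cell count by the fraction of the plug that was used."""
    return cells_per_plug * slice_volume_ul / plug_volume_ul


cells = cells_in_slice(CELLS_PER_PLUG, PLUG_VOLUME_UL, SLICE_VOLUME_UL)
print(f"{cells:.1e} cells in the slice")  # 5.4e+08, matching the text
```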

Recombinant fosmids, each containing ca. 40 kb of picoplankton DNA insert, yielded a library of 3,552 fosmid clones, containing approximately 1.4 × 10^8 bp of cloned DNA. All of the clones examined contained inserts ranging in size from 38 to 42 kbp (Fig. 2). Both the multiplex PCR (Fig. 3) and the hybridization experiments suggested that well B7 on microtiter

FIG. 1. Flowchart depicting the construction and screening of an environmental library from a mixed picoplankton sample. MW, molecular weight; PFGE, pulsed-field gel electrophoresis.

FIG. 2. Pulsed-field gel showing the separation of selected fosmid clones digested with NotI and BamHI. The pFOS1 vector band is at 7.2 kbp. The top two bands of clone 4B7 are doublets.

VOL. 178, 1996 GENOMIC FRAGMENTS FROM PLANKTONIC MARINE ARCHAEA 593

Page 24: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

DeLong Lab

dish 4 contained a clone encoding a 16S rDNA gene specific to the archaea. This clone, designated 4B7, was selected for more detailed examination. Using the cloned 4B7 fragment as a probe, we did not detect other cloned fragments in the fosmid library that contained overlapping regions (Fig. 4). No other archaeal ssu rRNA genes were detected in the library.

Sequence analysis and identification of protein- and rRNA-encoding genes. Fragments of archaeal DNA contained in the 4B7 insert were subcloned by digesting the fosmid with either EcoRI, EcoRI plus BamHI, or SpeI. The resulting restriction fragments were recovered in Bluescript vector (Stratagene), and the distal 200 to 300 nucleotides of each subclone were sequenced by using M13 forward and reverse sequencing primers. Of a total of 18 subclones analyzed in this fashion, 6 (33%) contained nucleotide sequences with significant identity to previously characterized genes archived in the National Center for Biotechnology Information nucleic acid or protein nonredundant database. Table 1 shows the rank order similarity of subclone sequences which share significant similarity to known sequences, based on Poisson probabilities of random homology [P(n)] (1, 17). Putative genes contained on fosmid 4B7 identified in this fashion include EF2, glutamate 1-semialdehyde aminotransferase (GSAT), RNA helicase, DnaJ, ssu rRNA, and large subunit (lsu) rRNA. Three of these nucleotide sequences (EF2, lsu rRNA, and ssu rRNA) are most similar to known archaeal homologs. Relatively high P(n) values obtained with the DnaJ (subclone 22K-R) and lsu rRNA (Spe14-R) sequences are due to the fact that only a small portion of each of these subcloned fragments encodes the indicated homolog.

Gene organization. The results of the sequence analyses of subclones containing the 16S-23S rRNA operon, GSAT, EF2, and RNA helicase were used to confirm the locations of restriction sites determined by partial and double digestions. The sequenced genes mapped to the distal BamHI-NotI fragments of the 38.5-kbp insert in clone 4B7 (Fig. 5). The GSAT gene and the 16S-23S operon reside on one of these fragments and are transcribed in the same direction. We have not found evidence for a 5S rRNA gene encoded on the 4B7 clone. The genes encoding RNA helicase and EF2 were on the distal BamHI-NotI fragment. The distal location of the archaeoplankton EF2 gene relative to the rRNA operon on clone 4B7, as well as the striking similarities of fosmid-encoded EF2 and ssu and lsu rRNA genes to archaeal homologs (Table 1), provides evidence that the 4B7 fosmid insert represents a contiguous genomic fragment from a planktonic marine archaeon.

Phylogenetic analysis of fosmid-encoded ssu rRNA and EF2. Southern blot hybridization using probes produced from sub-

FIG. 3. Multiplex PCR analysis of the fosmid bacterioplankton DNA library. (A and B) Agarose gel electrophoreses of fosmid minipreparations pooled from each microtiter dish in the library (dishes 1 to 37) and amplified with archaeon-biased ssu rRNA-specific primers. Lane M, 1-kb molecular size marker (Bethesda Research Laboratories); lane +, positive control containing archaeal genomic DNA (Haloferax volcanii); lane −, negative control containing eubacterial genomic DNA (Shewanella putrefaciens). (C) Agarose gel electrophoresis of archaeon-biased ssu rRNA PCR amplifications of fosmid clones pooled from columns (1 to 12) and rows (A to H) of microtiter dish 4 (positive reaction in panel A). Positive reactions were detected in microtiter dish 4, row B, column 7 (clone 4B7).
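The pooled screening strategy in Fig. 3 can be sketched as follows: pool by dish first, then pool the rows and columns of the positive dish, and read the clone off the row/column intersection. This is an illustration of the idea rather than the authors' protocol. The function names are hypothetical; the 37 dishes and the 8 × 12 plate layout come from the caption.

```python
from itertools import product

ROWS = "ABCDEFGH"          # 8 rows per microtiter dish
COLUMNS = range(1, 13)     # 12 columns per microtiter dish


def candidate_clones(dish, positive_rows, positive_columns):
    """Every row/column intersection in a positive dish is a candidate clone."""
    return [f"{dish}{row}{col}"
            for row, col in product(positive_rows, positive_columns)]


def reactions_needed(n_dishes, n_positive_dishes=1):
    """Compare pooled screening with testing every clone individually."""
    pooled = n_dishes + n_positive_dishes * (len(ROWS) + len(COLUMNS))
    individual = n_dishes * len(ROWS) * len(COLUMNS)
    return pooled, individual


print(candidate_clones(4, "B", [7]))   # ['4B7']
print(reactions_needed(37))            # (57, 3552)
```

With a single positive row and column the answer is unambiguous. If several rows and columns light up, the intersections are only candidates and would need individual confirmation.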

FIG. 4. High-density filter replica of 2,304 fosmid clones containing approximately 92 million bp of DNA cloned from the mixed picoplankton community. The filter was probed with the labeled insert from clone 4B7 (dark spot). The lack of other hybridizing clones suggests that contigs of 4B7 are absent from this portion of the library. Similar experiments with the remainder of the library yielded similar results.
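The amounts of cloned DNA quoted for the filter (2,304 clones) and for the whole library (3,552 clones) are both consistent with the stated average insert of about 40 kb. A quick Python check, using 40,000 bp as a round-number assumption for the mean insert size:

```python
MEAN_INSERT_BP = 40_000  # "ca. 40 kb" per fosmid; a rounded assumption


def cloned_bp(n_clones, mean_insert_bp=MEAN_INSERT_BP):
    """Total cloned DNA, assuming every clone carries an average-sized insert."""
    return n_clones * mean_insert_bp


filter_bp = cloned_bp(2304)    # clones on the high-density filter
library_bp = cloned_bp(3552)   # clones in the full library

print(f"filter:  {filter_bp / 1e6:.1f} million bp")   # about 92 million bp
print(f"library: {library_bp:.2e} bp")                # about 1.4 x 10^8 bp
```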

594 STEIN ET AL. J. BACTERIOL.

Page 25: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

SAR86 has Proteorhodopsin in Its Genome

tion with multiple sequence alignments, indicates that the majority of active site residues are well conserved between proteorhodopsin and archaeal bacteriorhodopsins (15).

A phylogenetic comparison with archaeal rhodopsins placed proteorhodopsin on an independent long branch, with moderate statistical support for an affiliation with sensory rhodopsins (16) (Fig. 1B). The finding of archaeal-like rhodopsins in organisms as diverse as marine proteobacteria and eukarya (6) suggests a potential role for lateral gene transfer in their dissemination. Available genome sequence data are insufficient to identify the evolutionary origins of the proteorhodopsin genes. The environments from which the archaeal and bacterial rhodopsins originate are, however, strikingly different. Proteorhodopsin is of marine origin, whereas the archaeal rhodopsins of extreme halophiles experience salinity 4 to 10 times greater than that in the sea (14).

Functional analysis. To determine whether proteorhodopsin binds retinal, we expressed the protein in Escherichia coli (17). After 3 hours of induction in the presence of retinal, cells expressing the protein acquired a reddish pigmentation (Fig. 3A). When retinal was added to the membranes of cells expressing the proteorhodopsin apoprotein, an absorbance peak at 520 nm was observed after 10 min of incubation (Fig. 3B). On further incubation, the peak at 520 nm increased and had a ~100-nm half-bandwidth. The 520-nm pigment was generated only in membranes containing proteorhodopsin apoprotein, and only in the presence of retinal, and its ~100-nm half-bandwidth is typical of retinylidene protein absorption spectra found in other rhodopsins. The red-shifted λmax of retinal (λmax = 370 nm in the free state) is indicative of a protonated Schiff base linkage of the retinal, presumably to the lysine residue in helix G (18).

Light-mediated proton translocation was determined by measuring pH changes in a cell suspension exposed to light. Net outward transport of protons was observed solely in proteorhodopsin-containing E. coli cells and only in the presence of retinal and light (Fig. 4A). Light-induced acidification of the medium was completely abolished by the presence of a 10 µM concentration of the protonophore carbonyl cyanide m-chlorophenylhydrazone (19). Illumination generated a membrane electrical potential in proteorhodopsin-containing right-side-out membrane vesicles, in the presence of retinal, reaching −90 mV 2 min after light onset (20) (Fig. 4B). These data indicate that proteorhodopsin translocates protons and is capable of generating membrane potential in a physiologically relevant range. Because these activities were observed in E. coli membranes containing overexpressed protein, the levels of proteorhodopsin activity in its native state remain to be determined. The ability of proteorhodopsin to generate a physiologically significant membrane potential, however, even when heterologously expressed in nonnative membranes, is consistent with a postulated proton-pumping function for proteorhodopsin.

Archaeal bacteriorhodopsin, and to a lesser extent sensory rhodopsins (21), can both mediate light-driven proton-pumping activity. However, sensory rhodopsins are generally cotranscribed with genes encoding their own transducer of light stimuli [for example, Htr (22, 23)]. Although sequence analysis of proteorhodopsin shows moderate statistical support for a specific relationship with sensory rhodopsins, there is no gene for an Htr-like regulator adjacent to the proteorhodopsin gene. The absence of an Htr-like gene in close proximity to the proteorhodopsin gene suggests that proteorhodopsin may function primarily as a light-driven proton pump. It is possible, however, that such a regulator might be encoded elsewhere in the proteobacterial genome.

To further verify a proton-pumping function for proteorhodopsin, we characterized the kinetics of its photochemical reaction cycle. The transport rhodopsins (bacteriorhodopsins and halorhodopsins) are characterized by cyclic photochemical reaction se-

Fig. 1. (A) Phylogenetic tree of bacterial 16S rRNA gene sequences, including that encoded on the 130-kb bacterioplankton BAC clone (EBAC31A08) (16). (B) Phylogenetic analysis of proteorhodopsin with archaeal (BR, HR, and SR prefixes) and Neurospora crassa (NOP1 prefix) rhodopsins (16). Nomenclature: Name_Species.abbreviation_Genbank.gi (HR, halorhodopsin; SR, sensory rhodopsin; BR, bacteriorhodopsin). Halsod, Halorubrum sodomense; Halhal, Halobacterium salinarum (halobium); Halval, Haloarcula vallismortis; Natpha, Natronomonas pharaonis; Halsp, Halobacterium sp.; Neucra, Neurospora crassa.

RESEARCH ARTICLES

www.sciencemag.org SCIENCE VOL 289 15 SEPTEMBER 2000 1903

Page 26: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

The proteorhodopsin tree was constructed on the basis of the archaeal rhodopsin alignment used by Mukohata et al. [Y. Mukohata, K. Ihara, T. Tamura, Y. Sugiyama, J. Biochem. (Tokyo) 125, 649 (1999)] and the neighbor-joining method as implemented in the neighbor program of the PHYLIP package. The numbers at forks indicate the percent of bootstrap replications (out of 1000) in which the given branching was observed. The least-square method (the Kitsch program of PHYLIP) and the maximum likelihood method (the Puzzle program) produced trees with essentially identical topologies, whereas the protein parsimony method (the Protpars program of PHYLIP) placed proteorhodopsin within the sensory rhodopsin cluster. The Yro/Hsp30 subfamily sequences (6) were omitted from the phylogenetic analysis to avoid the effect of long branch attraction.
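The bootstrap percentages mentioned above come from resampling alignment columns with replacement and asking how often a grouping reappears. The Python sketch below shows only that resampling idea. It is not PHYLIP's neighbor program: the "tree" is reduced to asking whether two sequences are the closest pair by uncorrected p-distance, and the toy alignment and all function names are invented for illustration.

```python
import random


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one pseudoreplicate)."""
    n = len(next(iter(alignment.values())))
    cols = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def are_closest_pair(alignment, a, b):
    """True if a and b have the smallest pairwise distance (ties count)."""
    names = list(alignment)
    pairs = [(p_distance(alignment[x], alignment[y]), x, y)
             for i, x in enumerate(names) for y in names[i + 1:]]
    best = min(d for d, _, _ in pairs)
    return any(d == best and {x, y} == {a, b} for d, x, y in pairs)


def bootstrap_support(alignment, a, b, replicates=1000, seed=0):
    """Percent of pseudoreplicates in which a and b remain the closest pair."""
    rng = random.Random(seed)
    hits = sum(are_closest_pair(bootstrap_columns(alignment, rng), a, b)
               for _ in range(replicates))
    return 100.0 * hits / replicates


# Toy alignment, invented for this example only.
toy = {"seqA": "ACGTACGTAC", "seqB": "ACGTACGTAA",
       "seqC": "TCGAACGGTC", "seqD": "TCGATCGGTT"}
support = bootstrap_support(toy, "seqA", "seqB")
print(f"seqA+seqB recovered in {support:.1f}% of replicates")
```

A real analysis would rebuild a full tree for every pseudoreplicate and count how often each fork is recovered.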

Page 27: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 28: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 29: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 30: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 31: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 32: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 33: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

1 ttgttatatc agtaatggct attgctccaa taacttaata ctaatatata attagtttat 61 gaataaattt tatatatttg ggttattgtt ttttacacta aatgcatttt cttgctcaga 121 tcttctagat acagacatga gagttcttga ttccgctgag tcaagaaacc tttgcgagtt 181 tgaaggaaaa gctttactag ttgtgaatgt tgcaagtaga tgtggttaca cttatcaata 241 tgctggcctt caaaagttat atgaaagtta taaagatgaa gattttctag taattgggat 301 cccatctaga gattttcttc aagaatactc tgatgaaagc gatgttgcag aattttgttc 361 tacagaatac ggtgttgaat ttcctatgtt ctcaactgct aaagtcaaag gaaaaaaagc 421 acacccattt tataaaaaac ttattgcaga atcaggtttt actccctcat ggaactttaa 481 taaatactta atctcaaaag agggcaaggt tgtatccaca tatggatcaa aggtaaagcc 541 tgattcaaaa gagcttatat cagctataga aggcttgctg taaaattatt acttagaaac 601 taatacagtt ttaggcttgt ttgctgcaaa tattccatta tctacaactc caggaatatt 661 attaatcaaa gcttccattt cagtggggtt tgaaatatcc atattagaga tatctaaaat 721 gtgattacct tggtctgtta taaatccagt tctatatgtg ggtattccac cgatcgagat 781 tatttttctt gcaacaaggc tcctactttc aggtatcacc tctataggca gtggaaaagc 841 tcccaaaaga ttaaccatct ttgactgatc aactatacat ataaactcgt tagaggcaga 901 agcaactatc ttttctctag tatgtgcgcc accaccacct ttaataagac aattttcagg 961 agacacctca tctgcaccat ctatgtaata agctatatca actacatcat taaggctaaa 1021 gacctctatc ccattttcat ttaataattt tgatgaagca tctgaactag aaacagctcc 1081 agcaaatttg tgcctatgct cctttagttc ttctataaaa aaattaactg ttgagccggt 1141 tccaatacct aaaatcatct caggatgaag attatttttg atatattcta tagcttgttt 1201 agcaacattt atctttgagc cactcataga gttataatac aagaaaatat aggtagttaa 1261 ttattttgag actaaaaatt aaaaaaacag gttcttttaa gaattcccag aagtacctaa 1321 agcttattcc taatgcatct gtttatgacg tagcattaaa atcacctata acatttgctc 1381 taaatatttc ttcaaagctg gggaataaag ttttcctaaa aagagaggat ctgcaaccta 1441 tattttcttt taaaaacaga ggagcgtata acaagattgt aaatttatcc gatgccgaaa 1501 agaagagggg ggttattgct gcatcagcag gaaatcatgc tcaaggggta gccagtgcat 1561 gtaagaaatt aaaaattaat tgcttgatag ttatgccaat aacaactcca gaaataaaaa 1621 taaaagatgt aaaaagattt ggagccaaaa tactccaaca tggggacaac gtagatgcag 1681 cattaaaaga ggcactgttt 
attgcaaaga aaaaaaaatt gtcttttgtt catccttttg 1741 acgaccctct aacaattgct ggccaaggga ctataggaca agaaattctt gaagataaaa 1801 ataattttga tgttgtcttt gttccggtgg gaggaggagg tattctagct ggtgtatctg 1861 cctggatagc acagaataat aagaaaataa aaattgttgg tgttgaggtt gaggattccg 1921 cttgtcttgc tgaggccgta aaagctaata aaagagttat tttaaaagaa gtgggcctct 1981 ttgctgatgg ggtggcagta tcaagggttg gaaaaaataa ttttgatgtt attaaagagt 2041 gcgtagatga agtcattaca gttagcgttg atgaggtctg caccgctgta aaagatatct 2101 ttgaagatac aagggttcta tcagaacctg ctggggcatt agcacttgca gggttaaaag 2161 cctacgcaag gaaagttaaa aataaaaaac ttattgctat aagttctggc gctaatgtaa 2221 atttccaaag acttaatttt attgttgagc gatcagagat tggtgaaaat agagaaaaaa 2281 tattaagtat caaaatccca gagatacctg gaagttttct taagctttca aggatgtttg 2341 gcagctctca agttacagag tttaactaca ggaaatctag cttaagcgat gcatatgttt 2401 tagttggtgt tagaactaaa actgaaaaat catttgaaat cttaaagtcc aaattaaaaa 2461 aagcaggctt cacctttagc gactttactc gaaatgaaat atccaatgat catctgaggc 2521 atatggttgg tggcagaaat agtgactcag gctctcataa caatgaaaga atatttaggg 2581 gagagtttcc tgagaagccg ggcgcgctgt taaattttct agagaaattt ggaaataaat 2641 ggmatatttc cttatttcat tacaggaacc taggttcagc ttttggaaag atattaattg 2701 gcatcgagag taaggataaa gacaagctaa taaatcattt aaataagtca ggnactattt 2761 ttacagaaga aacctctaac aaggcataca aagatttttt aaaatgaaag gttaatactt 2821 taatctaaat ttaattgaaa aaagctcatc gctagggttt tcccacggct ctttgaacaa 2881 ctcggattga gatctatcat cctcctcgtc gtaaattctc ccacctttag aatagaccaa 2941 aaatagatat gacaaaggag cgagctcata tttatatcta atttggaacg aagctacgcc 3001 agtattaaat tcattaacta tattatttcc tttataaagg tatccattag catctgaaaa 3061 aatactaata ggattttttg ctttcaaagc aacaaattga ctcttaagtc tgatctcatg 3121 tttattattt ttaaaccaat ttagatcaaa agataaggta tcttgtcttg aatcatatga 3181 ggcaagatta ttattatcct gccatatcag ccattcattt tcctttctta ttctatattg 3241 cgcattgatt cttaagttat catttggaaa tattgaacct gctatcttgt aaaattttct 3301 accataccca ttcgaatccc accgattatc tttctctccc ttaaagaagc taactctcca 3361 gtcatatgtc cagaatgaat agttctttgc 
ctcaaagtct gctgtaatac ctattcttct 3421 ttttgatttg ataaaggggt aggcttcatt ttttcttgtg atagttgtat ttttcccaga 3481 agatctaaag ttaaaatcta attgaaattt agagttgtcc ttaaaactaa aagaattttt 3541 ttgatcgatg cctattggat ttgaattacc agctgtgtca gcatcataat ttagatcaat 3601 tccataatct atttgtttta atatcgagct attatcaaat tcatttattt ttcgatttcc 3661 tccaatacca gcatgaatcc agtctcttct ttgcagataa ccaaagtcat ttaactcgaa 3721 gtcgtcttca aaataaagaa ggcttccact tatgttagat agtttatttg gaagatatgt 3781 aaactgagtc ctatacccaa gcccattttt accatctttt tctgaggcta acagatctga 3841 atatgtaatt aatttttttg aacgaatatt gatgtaatca ataacattga ctgttgatga 3901 ttcgcctgtc atttcattct caacattcgt taccatgaag ccaagcgttt tatttccaag 3961 ctttgttcga gatcgaagag cataatagtc tcttccaact gaaaaggctt catcagcctc 4021 acttgctaca aatactccaa attcattatt attacttttt tgagtaagtc ttaatgcaaa 4081 atcaatatca gaatagtttt ttttagctgc ctcgcaacct tcttcattac tttcttctga 4141 gcaattatag ctgggggcag ctccaatcct ccgcgtattt ataaccgagt atctatcata 4201 attactaata tcaaatagtg attggttttc attgaaaaat gctctttttt ctgagtaaaa 4261 agtttcttga gcagaaaagt taataaccac atcatcactc tcagcttgtc cgaaatctgg 4321 attaatagct aaatttattt gacgaccttt tccagtgcta taaaagattt cagccccaat 4381 atctgaccct tcttggtttg taactgaatt tttatttgaa gatatatatg gaaaaaaagt 4441 aagctttgat tttgtatagt tttgtatttc taagctatct aactcttgaa agtagtcatt 4501 tctactagcc attgttccgg cactgctaac ccatgactca ttttcggcac tataacgtaa 4561 tgcggtgtaa ttaatttttc ttatatcacc atcaggctgt ttcattaacg ttacatccca 4621 aggaataaaa aactcagaga cccaataccc atcaaatttt tgtgtttttg caatccaatc 4681 tccatcccag tctgtcttaa agtctcctgc ttgcgttttt atggcatcga aaagcgagtt 4741 cccaagattt atagcaagaa tgaaagcttt gttaccatca ccatcaaagt ctatatttat 4801 agagttttta tcgctaagtg aatttatttg atctctaagc gtcttcctgg agaacataga 4861 atcattactt tgaaaattct taaatccaac atatatacca tccttatttg agaaaattaa 4921 agccgttgta agaagttcat ttttcttaag agtaaaagga gatgtttcat aaaaatctgt 4981 aatttcaaat gcattattcc actcaggctc atcaagagag ccatcaataa caattgagtt 5041 tgaccaaatc agcacagagg taagtaaaag tgataatgag 
gctaaagtta ttttcataga 5101 tagatttaat tcagagtatt ttaatctatg aagagtagga aatcatccga tcattctcag 5161 aaaaaacata atcatattta aggtcatgtt tttctccaaa atgcatatta caaatctggt 5221 attgataaca cagtccaacc aataaaggtc ttgatacctc ctcgtttata gagccaataa 5281 ttctatcgaa atatccagag ccatagccaa gcctgtatcc atttaaatca actcctgtca 5341 taggaataaa cattaaatca atttcattta tgttgacata atcttcactt ttaacctctt
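The sequence above is in GenBank ORIGIN layout: a running coordinate followed by blocks of ten lowercase bases. A small Python sketch for turning that layout back into a plain sequence string is shown below. The function names are illustrative, and the example input is just the first 70 bases copied from the slide.

```python
import re


def sequence_from_origin(text):
    """Drop coordinates and whitespace, keeping only the base letters."""
    return "".join(re.findall(r"[a-z]+", text.lower()))


def gc_fraction(seq):
    """GC content over unambiguous bases (codes such as n or m are skipped)."""
    counted = [base for base in seq if base in "acgt"]
    return sum(base in "gc" for base in counted) / len(counted)


origin_text = ("1 ttgttatatc agtaatggct attgctccaa taacttaata "
               "ctaatatata attagtttat 61 gaataaattt")
seq = sequence_from_origin(origin_text)
print(len(seq), "bases; GC fraction", round(gc_fraction(seq), 3))
```

Ambiguity codes are kept in the sequence string and only excluded from the GC calculation, which matters here because the record contains a few of them (for example `m` and `n`).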

Page 34: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

cat /Users/jeisen/Desktop/sequence.gb.txt | grep product

/product="predicted threonine dehydratase"
/product="predicted secreted protein"
/product="predicted 5-formyltetrahydrofolate cyclo-ligase"
/product="predicted Xaa-Pro aminopeptidase"
/product="predicted quinone biosynthesis monooxygenase"
/product="predicted 50S ribosomal protein L28"
/product="predicted deoxyuridine 5'-triphosphate
/product="predicted primosomal protein N' (replication
/product="predicted N-acetylmuramoyl-L-alanine amidase"
/product="predicted kinase of the phosphomethylpyrimidine
/product="predicted kinase of P-loop ATPase superfamily"
/product="predicted eukaryote-type paraoxonase"
/product="predicted alpha/beta hydrolase"
/product="predicted MutT superfamily hydrolase"
/product="predicted metallobeta lactamase fold protein"
/product="predicted acetyl-coenzyme A synthetase"
/product="predicted deoxypurine kinase"
/product="predicted poly(A) polymerase"
/product="predicted DskA family protein with conserved
/product="predicted ribokinase family sugar kinase"
/product="predicted glutamate-1-semialdehyde
/product="predicted enzyme with conserved cysteine"
/product="predicted metallopeptidase of the G-G peptidase
/product="predicted proline dehydrogenase"
/product="predicted tyrosyl-tRNA synthetase"
/product="predicted prolyl endopeptidase"
/product="tRNA-Ile"
/product="predicted unknown protein"
/product="predicted cation efflux pump"
/product="predicted TonB-dependent receptor protein"
/product="predicted conserved membrane protein"
/product="predicted LicA, choline kinase involved in LPS
/product="predicted NAD-dependent formate dehydrogenase"
/product="predicted 2-nitropropane dioxygenase"
/product="predicted oxidoreductase"
/product="predicted CsgA, Rossman fold oxidoreductase"
/product="predicted acid--CoA ligase fadD13"

/product="proteorhodopsin"
/product="predicted phosphopantetheine
/product="predicted formamidopyrimidine-DNA glycosylase"
/product="predicted outer membrane protein TolC"
/product="predicted topoisomerase IV subunit B"
/product="predicted topoisomerase IV subunit A"
/product="predicted succinylornithine aminotransferase"
/product="predicted 6-phosphofructokinase"
/product="predicted
/product="predicted ORF"
/product="predicted penicillin-binding protein 2"
/product="predicted membrane-bound lytic transglycosylase"
/product="predicted penicillin-binding protein 6
/product="predicted D-alanine aminotransferase"
/product="predicted DNA polymerase III, delta subunit"
/product="predicted leucyl-tRNA synthetase (leuS)"
/product="predicted acetyl-CoA carboxylase, biotin
/product="predicted esterase of alpha/beta hydrolase fold"
/product="predicted ABC transporter ATPase subunit"
/product="predicted dihydroxyacid dehydratase"
/product="predicted porphobilinogen synthase"
/product="predicted transcription termination factor Rho"
/product="predicted porphobilinogen deaminase"
/product="predicted transporter (ACRA/ACRE type)"
/product="predicted cation efflux system (AcrB/AcrD/AcrF
/product="predicted secreted lipoprotein"
/product="predicted ribosomal large subunit pseudouridine
/product="predicted acetolactate synthase III large chain"
/product="predicted ketol-acid reductoisomerase"
/product="16S ribosomal RNA"
/product="23S ribosomal RNA"
/product="predicted YacE family of P-loop kinases"
/product="predicted preprotein translocase secA subunit"
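Several entries in the grep output are cut off because a GenBank qualifier can wrap onto following lines, and grep only prints the line that contains the word "product". A minimal Python sketch that joins wrapped values is shown below. The record text is a made-up example in GenBank feature-table layout, and the gene name and the second product name are hypothetical.

```python
def extract_products(genbank_text):
    """Collect /product values, joining values that wrap across lines."""
    products = []
    current = None                      # value being assembled, if any
    for raw in genbank_text.splitlines():
        line = raw.strip()
        if current is None:
            if not line.startswith('/product="'):
                continue
            current = line[len('/product="'):]
        else:
            current += " " + line       # continuation of a wrapped value
        if current.endswith('"'):       # closing quote ends the qualifier
            products.append(current[:-1])
            current = None
    return products


# Hypothetical record fragment, for illustration only.
sample = '''
     CDS             10..400
                     /gene="exA"
                     /product="proteorhodopsin"
     CDS             500..900
                     /product="hypothetical example protein with a
                     wrapped name"
'''
for name in extract_products(sample):
    print(name)
```

To run this on a real record, read the file into a string first, for example with `open(path).read()`. The sketch does not handle doubled quotes inside a value.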

Page 35: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 36: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

The inferred amino acid sequence of the proteorhodopsin showed statistically significant similarity to archaeal rhodopsins (11).

Page 37: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Page 38: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

SAR86 has Proteorhodopsin in Its Genome

Page 39: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Halophiles …

Page 40: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Bacteriorhodopsin and its relatives

Figure 3. Phylogenetic tree based on the amino acid sequences of 25 archaeal rhodopsins. (a) NJ-tree. The numbers at each node are clustering probabilities generated by bootstrap resampling 1000 times. D1 and D2 represent gene duplication points. The four shaded rectangles indicate the speciation dates when halobacteria speciation occurred at the genus level. (b) ML-tree. Log likelihood value for ML-tree was −6579.02 (best score) and that for topology of the NJ-tree was −6583.43. The stippled bars indicate the 95% confidence limits. Both trees were tentatively rooted at the mid-point of the longest distance, although true root positions are unknown.

From Ihara et al. 1999

Page 41: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

What Does It Do?

Page 42: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Proteorhodopsin Predicted Secondary Structure

quences (photocycles) that are typically <20 ms, whereas sensory rhodopsins are slow-cycling pigments with photocycle half-times >300 ms (3). This large kinetic difference is functionally important, because a rapid photocycling rate is advantageous for efficient ion pumping, whereas a slower cycle provides more efficient light detection because signaling states persist for longer times. To assess the photochemical reactivity of proteorhodopsin and its kinetics, we subjected membranes containing the pigment to a 532-nm laser flash and analyzed flash-induced absorption changes in the 50-µs to 10-s time window. We observed transient flash-induced absorption changes in the early times in this range (Fig. 5). Transient depletion occurred near the absorption maximum of the pigment (500-nm trace, Fig. 5, top panel), and transient absorption increase was detected at 400 nm and 590 nm, indicating a functional photocyclic reaction pathway. The absorption difference spectrum shows that within 0.5 ms, an intermediate with maximal absorption near 400 nm is produced (Fig. 5, bottom panel), which is typical of unprotonated Schiff base forms (M intermediates) of retinylidene pigments. The 5-ms minus 0.5-ms difference spectrum shows that after M decay, an intermediate species that is red-shifted from the unphotolyzed 520-nm state appears, which is analogous to the final intermediate (O) in bacteriorhodopsin. The decay of proteorhodopsin O is the rate-limiting step in the photocycle and is fit well by a single exponential process of 15 ms, with an upward baseline shift of 13% of the initial amplitude. A possible explanation is heterogeneity in the proteorhodopsin population, with 87% of the molecules exhibiting a 15-ms photocycle and 13% exhibiting a slower recovery. An alternative explanation is that photocycle complexity such as branching produces a biphasic O decay. Consistent with this alternative, the O recovery is fit equally well as a two-exponential process with a fast component, with a 9-ms half-time (61% of the total amplitude) and a slow component with a 45-ms half-time (39% of the total amplitude). In either case, the rapid photocycle rate, which is a distinguishing characteristic of ion pumps, provides additional strong evidence that proteorhodopsin functions as a transporter rather than as a sensory rhodopsin.
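The two kinetic descriptions of O decay quoted above can be compared numerically. The following is a minimal Python sketch, not the authors' fitting procedure. It assumes that the "single exponential process of 15 ms" refers to a time constant (the excerpt does not say whether 15 ms is a time constant or a half-time), and the function names are hypothetical.

```python
import math


def single_exp_with_baseline(t_ms, tau_ms=15.0, baseline=0.13):
    """Normalized O signal: 87% decays with one time constant, 13% persists.

    ASSUMPTION: 15 ms is treated as a time constant, not a half-time.
    """
    return (1.0 - baseline) * math.exp(-t_ms / tau_ms) + baseline


def two_exp(t_ms, half_fast_ms=9.0, frac_fast=0.61,
            half_slow_ms=45.0, frac_slow=0.39):
    """Normalized O signal as a sum of two exponentials given by half-times."""
    k_fast = math.log(2) / half_fast_ms   # rate constant from half-time
    k_slow = math.log(2) / half_slow_ms
    return (frac_fast * math.exp(-k_fast * t_ms)
            + frac_slow * math.exp(-k_slow * t_ms))


if __name__ == "__main__":
    # Tabulate both models over the first 80 ms after the flash.
    for t in (0, 5, 10, 20, 40, 80):
        print(f"{t:3d} ms  single={single_exp_with_baseline(t):.3f}  "
              f"double={two_exp(t):.3f}")
```

Both models start at 1.0 by construction. They differ most at long times, where the single-exponential form levels off at the 13% baseline while the two-exponential form keeps decaying.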

Implications. The γ-proteobacteria that harbor the proteorhodopsin are widely distributed in the marine environment. These bacteria have been frequently detected in culture-independent surveys (24) in coastal and oceanic regions of the Atlantic and Pacific Oceans, as well as in the Mediterranean Sea (8, 25–29). In addition to its widespread distribution, preliminary data also suggest that this γ-proteobacterial group is abundant (30,

Fig. 2. Secondary structure of proteorhodopsin. Single-letter amino acid codes are used (33), and the numbering is as in bacteriorhodopsin. Predicted retinal binding pocket residues are marked in red.

Fig. 3. (A) Proteorhodopsin-expressing E. coli cell suspension (+) compared to control cells (-), both with all-trans retinal. (B) Absorption spectra of retinal-reconstituted proteorhodopsin in E. coli membranes (17). A time series of spectra is shown for reconstituted proteorhodopsin membranes (red) and a negative control (black). Time points for spectra after retinal addition, progressing from low to high absorbance values, are 10, 20, 30, and 40 min.

Fig. 4. (A) Light-driven transport of protons by a proteorhodopsin-expressing E. coli cell suspension. The beginning and cessation of illumination (with yellow light >485 nm) is indicated by arrows labeled ON and OFF, respectively. The cells were suspended in 10 mM NaCl, 10 mM MgSO4·7H2O, and 100 µM CaCl2. (B) Transport of 3H+-labeled tetraphenylphosphonium ([3H+]TPP) in E. coli right-side-out vesicles containing expressed proteorhodopsin, reconstituted with (squares) or without (circles) 10 µM retinal in the presence of light (open symbols) or in the dark (solid symbols) (20).

RESEARCH ARTICLES

1904 15 SEPTEMBER 2000 VOL 289 SCIENCE www.sciencemag.org


Page 43: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

The amino acid residues that form a retinal binding pocket in archaeal rhodopsins are also highly conserved in proteorhodopsin (Fig. 2). In particular, the critical lysine residue in helix G, which forms the Schiff base linkage with retinal in archaeal rhodopsins, is present in proteorhodopsin. Analysis of a structural model of proteorhodopsin (14), in conjunction with multiple sequence alignments, indicates that the majority of active site residues are well conserved between proteorhodopsin and archaeal bacteriorhodopsins (15).

Page 44: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 45: Microbial Phylogenomics (EVE161) Class 14: Metagenomics


Page 46: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 47: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Proteorhodopsin in E. coli


Page 48: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Fig. 3. (A) Proteorhodopsin-expressing E. coli cell suspension (+) compared to control cells (-), both with all-trans retinal. (B) Absorption spectra of retinal-reconstituted proteorhodopsin in E. coli membranes (17). A time series of spectra is shown for reconstituted proteorhodopsin membranes (red) and a negative control (black). Time points for spectra after retinal addition, progressing from low to high absorbance values, are 10, 20, 30, and 40 min.

Page 49: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Proteorhodopsin was amplified from the 130-kb bacterioplankton BAC clone 31A08 by polymerase chain reaction (PCR), using the primers 5′-ACCATGGGTAAATTATTACTGATATTAGG-3′ and 5′-AGCATTAGAAGATTCTTTAACAGC-3′. The amplified fragment was cloned with the pBAD TOPO TA Cloning Kit (Invitrogen). The protein was cloned with its native signal sequence and included an addition of the V5 epitope and a polyhistidine tail in the COOH-terminus. The same PCR product in the opposite orientation was used as a negative control. The protein was expressed in the E. coli outer membrane protease-deficient strain UT5600 [M. E. Elish, J. R. Pierce, C. F. Earhart, J. Gen. Microbiol. 134, 1355 (1988)] and induced with 0.2% arabinose for 3 hours. Membranes were prepared according to Shimono et al. [K. Shimono, M. Iwamoto, M. Sumi, N. Kamo, FEBS Lett. 420, 54 (1997)] and resuspended in 50 mM tris-Cl (pH 8.0) and 5 mM MgCl2. The absorbance spectrum was measured according to Bieszke et al. (7) in the presence of 10 µM all-trans retinal.
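As a quick sanity check on the two primer sequences quoted above (with the extraction spaces and line-break hyphens removed), a short Python sketch can report their lengths, GC content, and reverse complements. The helper names are hypothetical, and nothing here reflects the authors' primer-design procedure.

```python
# Primer sequences as quoted in the methods note, written 5'->3'.
FORWARD = "ACCATGGGTAAATTATTACTGATATTAGG"
REVERSE = "AGCATTAGAAGATTCTTTAACAGC"

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (upper-case A/C/G/T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(primer), "nt",
              f"GC={gc_fraction(primer):.2f}",
              "revcomp=" + reverse_complement(primer))
```

The forward primer contains an ATG at its fourth through sixth bases, which fits the statement that the protein was cloned with its native signal sequence.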

Page 50: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Proteorhodopsin function


Page 51: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

20. Right-side-out membrane vesicles were prepared according to Kaback [H. R. Kaback, Methods Enzymol. 22, 99 (1971)]. Membrane electrical potential was measured with the lipophilic cation TPP by means of rapid filtration [E. Prossnitz, A. Gee, G. F. Ames, J. Biol. Chem. 264, 5006 (1989)] and was calculated according to Robertson et al. [D. E. Robertson, G. J. Kaczorowski, M. L. Garcia, H. R. Kaback, Biochemistry 19, 5692 (1980)]. No energy sources other than light were added to the vesicles, and the experiments were conducted at room temperature.
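The note above says only that the membrane potential was calculated from TPP distribution according to Robertson et al. A common way to convert the accumulation ratio of a lipophilic monovalent cation into a potential is the Nernst relation, sketched below in Python. This is an illustration under stated assumptions, not the cited procedure: the function name is hypothetical, room temperature is taken as 25 °C, and any corrections for probe binding are ignored.

```python
import math

R = 8.314462618      # gas constant, J mol^-1 K^-1
F = 96485.33212      # Faraday constant, C mol^-1


def membrane_potential_mv(tpp_in, tpp_out, temp_c=25.0):
    """Nernst potential (inside relative to outside) in millivolts.

    tpp_in / tpp_out are the internal and external concentrations of a
    monovalent cation such as TPP+, in the same units.
    ASSUMPTION: 25 C stands in for "room temperature".
    """
    temp_k = temp_c + 273.15
    return -(R * temp_k / F) * math.log(tpp_in / tpp_out) * 1000.0


if __name__ == "__main__":
    # A tenfold accumulation of the cation inside the vesicles.
    print(f"{membrane_potential_mv(10.0, 1.0):.1f} mV")
```

Under these assumptions each tenfold accumulation of TPP inside the vesicles corresponds to roughly 59 mV, interior negative.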

Page 52: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Figure 5. Laser flash-induced absorbance changes in suspensions of E. coli membranes containing proteorhodopsin. A 532-nm pulse (6 ns duration, 40 mJ) was delivered at time 0, and absorption changes were monitored at various wavelengths in the visible range in a lab-constructed flash photolysis system as described (34). Sixty-four transients were collected for each wavelength. (A) Transients at the three wavelengths exhibiting maximal amplitudes. (B) Absorption difference spectra calculated from amplitudes at 0.5 ms (blue) and between 0.5 ms and 5.0 ms (red).

http://www.sciencemag.org/content/289/5486/1902/F5.expansion.html

Page 53: Microbial Phylogenomics (EVE161) Class 14: Metagenomics
Page 54: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Papers for Today

letters to nature

786 NATURE | VOL 411 | 14 JUNE 2001 | www.nature.com


Proteorhodopsin phototrophy in the ocean

Oded Béjà*†, Elena N. Spudich†‡, John L. Spudich‡, Marion Leclerc* & Edward F. DeLong*

* Monterey Bay Aquarium Research Institute, Moss Landing, California 95039, USA
‡ Department of Microbiology and Molecular Genetics, The University of Texas Medical School, Houston, Texas 77030, USA
† These authors contributed equally to this work

Proteorhodopsin¹, a retinal-containing integral membrane protein that functions as a light-driven proton pump, was discovered in the genome of an uncultivated marine bacterium; however, the prevalence, expression and genetic variability of this protein in native marine microbial populations remain unknown. Here we report that photoactive proteorhodopsin is present in oceanic surface waters. We also provide evidence of an extensive family of globally distributed proteorhodopsin variants. The protein pigments comprising this rhodopsin family seem to be spectrally tuned to different habitats, absorbing light at different wavelengths in accordance with light available in the environment. Together, our data suggest that proteorhodopsin-based phototrophy is a globally significant oceanic microbial process.

Bacteriorhodopsin, a retinal-containing membrane protein that functions as a light-driven proton pump, was discovered nearly three decades ago in the halophile Halobacterium salinarum². The 'purple membranes' of this archaeon contain a high concentration of bacteriorhodopsin molecules, packed in an ordered two-dimensional crystalline array³. On absorption of light, bacteriorhodopsin undergoes a series of conformational shifts (a photocycle), causing a proton to be transported across the membrane. The resulting electrochemical membrane potential drives ATP synthesis, through an H⁺-ATPase⁴. Similar rhodopsin-mediated, light-driven proton pumping, formerly thought to exist only in halophilic archaea, has been discovered in an uncultivated marine bacterium¹ of the 'SAR86' phylogenetic group⁵. This finding suggested that a previously unrecognized phototrophic pathway may occur in the ocean's photic zone; however, all earlier data are based solely on recombinant DNA and in vitro biochemical analyses, and this phenomenon has not yet been observed in the sea.

To test whether rhodopsin-like molecules form photoactive proteins in native marine bacteria, we analysed membrane preparations of bacterioplankton collected directly from Monterey Bay surface waters. Using laser flash-photolysis techniques⁶, we searched for spectroscopic evidence of proteorhodopsin-like photochemical activity in these bacterioplankton membrane preparations. We observed transient flash-induced absorption changes attributable to a photochemical reaction cycle of 15 ms (Fig. 1), which is characteristic of retinylidene ion pumps⁷ and similar to the photocycle seen with Escherichia coli membranes expressing recombinant proteorhodopsin¹. Furthermore, 5 ms after the flash the absorption difference spectrum of the environmental sample exhibited the same shape and same isosbestic point for the light and dark state as proteorhodopsin expressed in E. coli membranes. This shows that photochemical reactions in native bacterioplankton cell membranes generate a red-shifted, long-lived intermediate spectrally similar to that of recombinantly expressed proteorhodopsin in E. coli. The flash-photolysis data provide direct physical evidence for the existence of proteorhodopsin-like transporters and endogenous retinal molecules in the microbial fraction of these coastal surface waters.

The photocycling pigment was bleached with hydroxylamine and, after hydroxylamine removal, photoactivity was restored by adding all-trans retinal (Fig. 2), showing that the photosignals do indeed derive from retinylidene pigmentation. We conclude that

Figure 1 Laser flash-induced absorbance changes in suspensions of membranes prepared from the prokaryotic fraction of Monterey Bay surface waters. Top, membrane absorption was monitored at the indicated wavelengths and the flash was at time 0 at 532 nm. Bottom, absorption difference spectrum at 5 ms after the flash for the environmental sample (black) and for E. coli-expressed proteorhodopsin (red).

[Figure 1 panels. Top: ΔAbsorbance (×10⁻³) versus Time (ms), 0 to 80 ms, with traces at 580 nm, 500 nm and 400 nm. Bottom: ΔAbsorbance (×10⁻³) versus Wavelength (nm), 400 to 700 nm, with series "Monterey Bay environmental membranes" and "Proteorhodopsin in E. coli membranes".]

© 2001 Macmillan Magazines Ltd

http://www.nature.com/nature/journal/v411/n6839/abs/411786a0.html

Page 55: Microbial Phylogenomics (EVE161) Class 14: Metagenomics


Page 56: Microbial Phylogenomics (EVE161) Class 14: Metagenomics


proteorhodopsin molecules are present in the membranes of native marine bacterioplankton.

The flash-photolysis data permit estimation of the cellular concentration of proteorhodopsin. Assuming (1) the flash yield of the environmental pigment is similar to that of proteorhodopsin expressed in E. coli (that is, 10.5% absorption change at 500 nm per absorption unit of pigment at 527 nm), we calculate 0.033 absorption units of proteorhodopsin at 527 nm (0.35 µg of proteorhodopsin protein per litre of sea water). Flash spectroscopy was necessary to detect the proteorhodopsin pigments because there is much greater absorption in the visible range by other pigments in the sample, which show peaks of 1.6, 1.3, 0.52 and 0.73 absorption units, at 437, 468, 647 and 673 nm, respectively. Assuming also (2) a molar absorption coefficient of 50,000 M⁻¹ cm⁻¹ at the absorption maximum, (3) fluorescence in situ hybridization counts of a total of 5.6 × 10¹⁰ SAR86 cells in the concentrated sample (~10% of the total bacteria; see Methods), (4) 50% recovery of membranes from the cells, and (5) that the uncultivated SAR86 group is the principal bacterioplankton group using these pigments, we calculate that there are 2.4 × 10⁴ proteorhodopsin molecules per SAR86 cell.

This value is on the order of the concentration of bacteriorhodopsin in a H. salinarum cell, in which substantial portions of the membrane surface area of the cell consist of bacteriorhodopsin in a tightly packed crystalline array⁸. For comparison, 2.4 × 10⁴ bacteriorhodopsin molecules densely packed in the purple-membrane lattice would cover a 0.6-µm diameter, flat circular area of membrane, a significant portion of the surface of a single cell³. This number of molecules is sufficient to produce substantial amounts of ATP under illumination⁹. Therefore, the high density of proteorhodopsin in the SAR86 membrane indicated by our calculations strongly suggests that this protein has a significant role in the physiology of these bacteria in situ.

To explore the possible existence of other proteorhodopsins, we screened the same mixed-population bacterial artificial chromosome (BAC) library¹⁰ in which proteorhodopsin was initially discovered, with non-degenerate polymerase chain reaction (PCR) primers¹. Several additional proteorhodopsin-containing BAC clones were found in the library. These proteorhodopsins were similar, but did differ in their amino-acid sequences when compared with the original proteorhodopsin (for example, see clones 31A8, 64A5 and 40E8; Figs 3 and 4). Using the same non-degenerate primers, we could also amplify by PCR proteorhodopsin genes from bacterioplankton DNA extracts, including those from Monterey Bay (MB clones; Fig. 3), the Southern Ocean (Palmer station; PAL clones) and waters of the central North Pacific Ocean (Hawaii Ocean Time series station¹¹; HOT clones).

We detected 15 different variants of proteorhodopsin in the PCR-generated Monterey Bay proteorhodopsin gene library, falling into three clusters (Fig. 3) that share at least 97% identity over 248 amino acids (Fig. 4) and 93% identity at the DNA level. Two proteorhodopsin genes from Monterey, 40E8 and 64A5, were expressed in E. coli and produced absorption spectra very similar to the original proteorhodopsin¹ (clone 31A8) isolated from the same waters (data not shown).

Remarkably, all the proteorhodopsin genes that were amplified by PCR from Antarctic marine bacterioplankton were different from those of Monterey Bay (Fig. 3), sharing a maximum of 78% identity over 248 amino acids with the Monterey clade (Fig. 4). The changes in amino-acid sequences were not restricted to the hydrophilic

[Figure 2 plot: laser-flash traces in three panels (untreated membranes, hydroxylamine-treated membranes, retinal-reconstituted membranes); scale bars 10⁻³ AU and 10⁻¹ s.]

Figure 2 Laser flash-induced transients at 500 nm of a Monterey Bay bacterioplankton membrane preparation. Top, before addition of hydroxylamine; middle, after 0.2 M hydroxylamine treatment at pH 7.0, 18 °C, with 500-nm illumination for 30 min; bottom, after centrifuging twice with resuspension in 100 mM phosphate buffer, pH 7.0, followed by addition of 5 μM all-trans retinal and incubation for 1 h.
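As a quick arithmetic check on the per-cell estimate quoted in the text, the sketch below reworks two numbers using only values stated there: 2.4 × 10⁴ molecules per cell, 5.6 × 10¹⁰ SAR86 cells, 50% membrane recovery and a 0.6-μm circular patch. It is not the authors' calculation, which also depends on the measured flash signal and the sample volume; the constant and function names are illustrative.

```python
import math

# Values quoted in the text; the names are illustrative, not from the paper.
MOLECULES_PER_CELL = 2.4e4   # proteorhodopsin molecules per SAR86 cell
CELLS_IN_SAMPLE = 5.6e10     # SAR86 cells counted by FISH in the concentrate
MEMBRANE_RECOVERY = 0.5      # fraction of membranes recovered from the cells
PATCH_DIAMETER_UM = 0.6      # flat circular patch covered by 2.4e4 molecules
AVOGADRO = 6.022e23          # molecules per mole


def recovered_molecules() -> float:
    """Pigment molecules expected in the recovered membrane preparation."""
    return MOLECULES_PER_CELL * CELLS_IN_SAMPLE * MEMBRANE_RECOVERY


def area_per_molecule_nm2() -> float:
    """Membrane area per molecule implied by the 0.6-um patch comparison."""
    radius_nm = PATCH_DIAMETER_UM * 1000.0 / 2.0
    return math.pi * radius_nm ** 2 / MOLECULES_PER_CELL


if __name__ == "__main__":
    n = recovered_molecules()
    print(f"molecules in preparation: {n:.3g} ({n / AVOGADRO * 1e9:.2f} nmol)")
    print(f"implied area per molecule: {area_per_molecule_nm2():.1f} nm^2")
```

Under these stated values the preparation holds on the order of a nanomole of pigment, and the patch comparison implies roughly 12 nm² of membrane per molecule.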

[Figure 3 tree: neighbour-joining tree of proteorhodopsin clones; scale bar 0.01. Clade labels: "Monterey Bay and shallow HOT" and "Antarctica and deep HOT". Clone labels as extracted (branch order not recoverable): MB 0m2, MB 0m1, MB 20m2, MB 20m5, MB 40m12, MB 100m9, HOT 75m3, HOT 75m8, HOT 75m1, PAL B1, PAL B5, PAL B6, PAL E7, PAL E1, PAL B7, PAL B2, PAL B8, MB 100m10, MB 100m5, MB 100m7, MB 20m12, MB 40m1, MB 40m5, BAC 40E8, BAC 31A8, PAL E6, BAC 64A5, HOT 0m1, HOT 75m4.]

Figure 3 Phylogenetic analysis of the inferred amino-acid sequence of cloned proteorhodopsin genes. Distance analysis of 220 positions was used to calculate the tree by neighbour-joining using the PaupSearch program of the Wisconsin Package version 10.0 (Genetics Computer Group; Madison, Wisconsin). H. salinarum bacteriorhodopsin was used as an outgroup, and is not shown. Scale bar represents number of substitutions per site. Bold names indicate the proteorhodopsins that were spectrally characterized in this study.
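The identity figures quoted in the text (for example 97% or 78% identity over 248 amino acids) and the distance analysis named in the Figure 3 caption both start from pairwise comparisons of aligned sequences. The following is a minimal sketch of that step, assuming gapped columns are simply skipped; it is not the PaupSearch procedure, and the three toy sequences and their names are hypothetical.

```python
from itertools import combinations

GAP = "-"


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical residues over columns where neither sequence is gapped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    identical = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == GAP or res_b == GAP:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected distance: fraction of compared columns that differ."""
    return 1.0 - percent_identity(seq_a, seq_b) / 100.0


def distance_table(alignment):
    """Pairwise p-distances keyed by (name_a, name_b), names in sorted order."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not sequences from the paper.
    toy_alignment = {
        "clone_A": "MKLLLILGSV",
        "clone_B": "MKLLLILGSA",
        "clone_C": "MKL-LMLGAV",
    }
    for pair, dist in distance_table(toy_alignment).items():
        print(pair, round(dist, 3))
```

A table like this is the input a distance-based tree method such as neighbour-joining would take; real analyses usually apply a substitution-model correction rather than the raw p-distance.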

© 2001 Macmillan Magazines Ltd

Page 57: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

letters to nature

NATURE | VOL 411 | 14 JUNE 2001 | www.nature.com 787


Page 58: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

letters to nature

788 NATURE | VOL 411 | 14 JUNE 2001 | www.nature.com

loops but were spread over the entire protein, including changes near the retinal-binding domain (Fig. 4). Also, there was an insertion of one amino acid in the Antarctic proteorhodopsins, relative to those of the Monterey clade (Fig. 4).

We expressed the palE6 proteorhodopsin gene from Antarctica in E. coli cells. Addition of retinal to membranes of cells containing Antarctic proteorhodopsin apoprotein produced an absorbance peak at 490 nm (Fig. 5), a blue shift of 37 nm from the 527-nm peak observed for the Monterey Bay proteorhodopsins. The Antarctic proteorhodopsin spectrum exhibited vibrational fine structure, as observed previously in retinylidene pigments for archaeal sensory rhodopsin II, which has a very similar blue-shifted (487-nm) absorption maximum (ref. 12). The nearly identical shapes of the structured spectra suggest similar mechanisms of wavelength regulation in the bacterial and archaeal rhodopsins. Furthermore, palE6 proteorhodopsin functions similarly to its Monterey Bay homologues (ref. 1), exhibiting light-mediated transport of protons in right-side-out E. coli vesicles (E. N. Spudich et al., manuscript in preparation).

We screened a fosmid library from Antarctica (E. F. DeLong, unpublished data) with proteorhodopsin primers, and sequenced one clone containing the gene. Preliminary sequence analyses based on flanking gene order and identity indicate that, despite differences in amino-acid sequence and absorption spectra, the Antarctic proteorhodopsin is derived from a bacterium highly related to the proteorhodopsin-containing SAR86 bacteria of Monterey Bay (ref. 10) (O. Béjà, unpublished data).

Thirty-two proteorhodopsin genes from the HOT station were cloned from surface (5-m) and 75-m samples. Most proteorhodopsin clones (80%) from the HOT surface waters belonged to the Monterey clade and were identical (on the amino-acid level) to the proteorhodopsin gene from BAC 40E8. In contrast, most of the clones from the 75-m sample (90%) fell within the Antarctic clade (Fig. 3). Two proteorhodopsin genes from the HOT station, HOT 0m1 and HOT 75m4, were expressed in E. coli and their absorption spectra were examined. As expected from their primary sequences, the HOT proteorhodopsin clustering with the Antarctic clade gave a blue (490-nm) absorption maximum, whereas the proteorhodopsin clustering with the Monterey clade gave a green (527-nm) absorption maximum (Fig. 5).

In the oligotrophic waters of the central North Pacific gyre, most of the light energy is in the blue range, with maximal intensity near 475 nm (ref. 13). This energy peak is maintained over depth, whereas the total energy decreases. At the surface, the energy peak is very broad with a half bandwidth between 400 and 650 nm. In deeper water below 50 m, the peak narrows and the half bandwidth is restricted to between 450 and 500 nm (Fig. 5). Considering the different wavelengths absorbed by members of the two different proteorhodopsin clades (527 nm versus 490 nm; Fig. 5), it seems that the blue-shifted proteorhodopsin variants are better adapted to the light available in their environment. Notably, rod and cone visual pigment rhodopsins from closely related fish species in Lake Baikal (ref. 14) also exhibit a blue shift in deeper dwelling species. In addition, previous studies of cultivated Prochlorococcus (refs 15, 16) and Synechococcus (ref. 17), as well as natural cyanobacterial communities (ref. 18), also indicate the presence of surface- and deep-water groups adapted to different depths.

Our data now show the presence of co-existing surface- and deep-water clades of proteorhodopsin-based phototrophs, whose energy-generating pigments are spectrally tuned to either shallow or deeper water light fields. Presumably, this provides subsets of proteorhodopsin genetic variants in mixed populations with a selective advantage at different points along the depth-dependent light

Figure 4 Multiple alignment of proteorhodopsin amino-acid sequences. The secondary structure shown (boxes for transmembrane helices) is derived from hydropathy plots. Residues predicted to form the retinal-binding pocket are marked in red.

[Figure 5 plots: top, absorbance (0.01 to 0.06) against wavelength (350 to 650 nm) for Monterey Bay, HOT 0 m, HOT 75 m and Antarctica; bottom, relative irradiance (0 to 1.0) against wavelength (nm), with depth labels 5 m and 75 m.]

Figure 5 Absorption spectra of retinal-reconstituted proteorhodopsins in E. coli membranes. All-trans retinal (2.5 μM) was added to membrane suspensions in 100 mM phosphate buffer, pH 7.0, and absorption spectra were recorded. Top, four spectra for palE6 (Antarctica), HOT 75m4, HOT 0m1, and BAC 31A8 (Monterey Bay) at 1 h after retinal addition. Bottom, downwelling irradiance from HOT station measured at six wavelengths (412, 443, 490, 510, 555 and 665 nm) and at two depths, for the same depths and date that the HOT samples were collected (0 and 75 m). Irradiance is plotted relative to irradiance at 490 nm.
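The spectral-tuning argument in the text can be restated as a simple comparison between each clade's absorption maximum and the half-bandwidth of available light at each depth. The sketch below uses only the wavelengths quoted in the text (527 nm and 490 nm; 400 to 650 nm at the surface; 450 to 500 nm below 50 m). It is a crude in-or-out test rather than a model of photon capture, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LightBand:
    """Half-bandwidth of downwelling light, as quoted in the text."""
    label: str
    low_nm: float
    high_nm: float

    def contains(self, wavelength_nm: float) -> bool:
        return self.low_nm <= wavelength_nm <= self.high_nm


# Absorption maxima reported for the two proteorhodopsin clades.
ABSORPTION_MAX_NM = {"Monterey clade": 527.0, "Antarctic clade": 490.0}

# Half-bandwidths reported for the central North Pacific gyre.
LIGHT_BANDS = (
    LightBand("surface", 400.0, 650.0),
    LightBand("below 50 m", 450.0, 500.0),
)


def blue_shift_nm() -> float:
    """Shift of the Antarctic-clade peak relative to the Monterey-clade peak."""
    return ABSORPTION_MAX_NM["Monterey clade"] - ABSORPTION_MAX_NM["Antarctic clade"]


def peak_within_band() -> dict:
    """Map (clade, band label) to whether the clade's peak lies inside the band."""
    return {
        (clade, band.label): band.contains(peak)
        for clade, peak in ABSORPTION_MAX_NM.items()
        for band in LIGHT_BANDS
    }


if __name__ == "__main__":
    print(f"blue shift: {blue_shift_nm():.0f} nm")
    for (clade, band), inside in peak_within_band().items():
        print(f"{clade:15s} peak inside {band} band: {inside}")
```

With these numbers both peaks sit inside the broad surface band, but only the 490-nm peak falls inside the narrower band below 50 m, which mirrors the depth distribution of the HOT clones described in the text.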

© 2001 Macmillan Magazines Ltd


Page 59: Microbial Phylogenomics (EVE161) Class 14: Metagenomics


Page 60: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

More on Large Insert Metagenomics

letters to nature

630 NATURE | VOL 415 | 7 FEBRUARY 2002 | www.nature.com


Unsuspected diversity among marine aerobic anoxygenic phototrophs

Oded Béjà*†, Marcelino T. Suzuki*, John F. Heidelberg‡, William C. Nelson‡, Christina M. Preston*, Tohru Hamada§†, Jonathan A. Eisen‡, Claire M. Fraser‡ & Edward F. DeLong*

* Monterey Bay Aquarium Research Institute, Moss Landing, California 95039-0628, USA
‡ The Institute for Genomic Research, Rockville, Maryland 20850, USA
§ Marine Biotechnology Institute, Kamaishi Laboratories, Kamaishi City, Iwate 026-0001, Japan

Aerobic, anoxygenic, phototrophic bacteria containing bacteriochlorophyll a (Bchla) require oxygen for both growth and Bchla synthesis (refs 1–6). Recent reports suggest that these bacteria are widely distributed in marine plankton, and that they may account for up to 5% of surface ocean photosynthetic electron transport (ref. 7) and 11% of the total microbial community (ref. 8). Known planktonic anoxygenic phototrophs belong to only a few restricted groups within the Proteobacteria α-subclass. Here we report genomic analyses of the photosynthetic gene content and operon organization in naturally occurring marine bacteria. These photosynthetic gene clusters included some that most closely resembled those of Proteobacteria from the β-subclass, which have never before been observed in marine environments. Furthermore, these photosynthetic genes were broadly distributed in marine plankton, and actively expressed in neritic bacterioplankton assemblages, indicating that the newly identified phototrophs were photosynthetically competent. Our data demonstrate that planktonic bacterial assemblages are not simply composed of one uniform, widespread class of anoxygenic phototrophs, as previously proposed (ref. 8); rather, these assemblages contain multiple, distantly related, photosynthetically active bacterial groups, including some unrelated to known and cultivated types.

Most of the genes required for the formation of bacteriochlorophyll-containing photosystems in aerobic, anoxygenic, phototrophic (AAP) bacteria are clustered in a contiguous, 45-kilobase (kb) chromosomal region (superoperon) (ref. 6). These include bch and crt genes coding for the enzymes of the bacteriochlorophyll and carotenoid biosynthetic pathways, and the puf genes coding for the subunits of the light-harvesting complex (pufB and pufA) and the reaction centre complex (pufL and pufM). To better describe the nature and diversity of planktonic, anoxygenic, photosynthetic bacteria, we screened a surface-water bacterial artificial chromo-

† Present addresses: Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel (O.B.); Nara Institute of Science and Technology, Takayama, Ikoma 630-0101, Japan (T.H.).

[Figure 1 trees: two neighbour-joining trees, each with scale bar 0.1 and vertical bars marking the α-1, α-2, α-3, α-4, β and γ groups. Bootstrap values are omitted here because their branch positions are not recoverable from the extraction.

Panel a (pufM), labels as extracted: Erythrobacter litoralis*, MBIC3019*, Erythrobacter longus*, Erythromicrobium ramosum, cDNA 0m1, cDNA 20m21, Roseobacter denitrificans*, Roseobacter litoralis*, Rhodovulum sulfidophilum*, BAC 39B11, BAC 29C02, cDNA 20m10, BAC 52B02, BAC 65D09, BAC 24D02, Rhodoferax fermentans, Allochromatium vinosum, Roseateles depolymerans, Rubrivivax gelatinosus, cDNA 0m20, Rhodospirillum rubrum, Thiocystis gelatinosa, Rhodomicrobium vannielii, Rhodocyclus tenuis, Rhodopseudomonas palustris, Chloroflexus aurantiacus, Sphingomonas natatoria, Rhodobacter sphaeroides, Rhodobacter capsulatus, BAC 56B12, BAC 60D04, BAC 30G07, R2A84*, R2A163*, cDNA 20m22, cDNA 0m13, cDNA 20m8, R2A62*, cDNA 20m11, env20m1, env20m5, env0m2, MBIC3951*, env0m1, envHOT1, envHOT2, envHOT3.

Panel b (rRNA), labels as extracted: Roseobacter denitrificans*, Roseobacter litoralis*, MBIC3951*, R2A84*, R2A62*, Rhodovulum sulfidophilum*, Rhodobacter sphaeroides, Rhodobacter capsulatus, Erythromicrobium ramosum, Erythrobacter litoralis*, MBIC3019*, Sphingomonas natatoria, Erythrobacter longus*, Rhodomicrobium vannielii, Rhodopseudomonas palustris, Rhodospirillum rubrum, Roseateles depolymerans, Rubrivivax gelatinosus, Rhodoferax fermentans, Rhodocyclus tenuis, Allochromatium vinosum, Thiocystis gelatinosa, Chloroflexus aurantiacus.]

Figure 1 Phylogenetic relationships of pufM gene (a) and rRNA (b) sequences of AAP bacteria. a, b, Evolutionary distances for the pufM genes (a) were determined from an alignment of 600 nucleotide positions, and for rRNA genes (b) from an alignment of 860 nucleotide sequence positions. Evolutionary relationships were determined by neighbour-joining analysis (see Methods). The green non-sulphur bacterium Chloroflexus aurantiacus was used as an outgroup. pufM genes that were amplified by PCR in this study are indicated by the env prefix, with 'm' indicating Monterey, and HOT indicating Hawaii ocean time series. Cultivated aerobes are marked in light blue, bacteria cultured from sea water are marked with an asterisk, and environmental cDNAs are marked in red. Photosynthetic α-, β- and γ-proteobacterial groups are indicated by the vertical bars to the right of the tree. Bootstrap values greater than 50% are indicated above the branches. The scale bar represents number of substitutions per site.
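The caption states that evolutionary relationships were determined by neighbour-joining. For readers who have not seen the algorithm, the following is a minimal textbook sketch of neighbour-joining (after Saitou and Nei), not the software or settings the authors used. The internal node labels (`node0`, `node1`, ...) and the five-taxon distance matrix are hypothetical; the matrix is a standard additive example rather than data from the paper.

```python
from itertools import combinations


def matrix_to_dict(names, rows):
    """Convert a square distance matrix into a dict keyed by frozenset pairs."""
    return {
        frozenset((names[i], names[j])): float(rows[i][j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }


def neighbour_joining(names, dist):
    """Return the unrooted tree as a list of (node, neighbour, branch_length).

    Assumes at least three taxa and that no taxon is named like 'node0'.
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        total = {i: sum(d[frozenset((i, j))] for j in nodes if j != i) for i in nodes}
        # Join the pair that minimises the Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]],
        )
        d_ab = d[frozenset((a, b))]
        len_a = 0.5 * d_ab + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d_ab - len_a
        new = f"node{next_id}"
        next_id += 1
        edges.append((a, new, len_a))
        edges.append((b, new, len_b))
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - d_ab
                )
        nodes = [k for k in nodes if k not in (a, b)] + [new]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]  # hypothetical taxa
    rows = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    for node, neighbour, length in neighbour_joining(taxa, matrix_to_dict(taxa, rows)):
        print(f"{node} -- {neighbour}: {length:g}")
```

Ties in the Q criterion are broken by input order here, so different implementations can join nodes in a different sequence while still recovering the same tree for additive distances. Bootstrap support, as reported in the figure, would come from repeating the analysis on resampled alignment columns, which this sketch does not do.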

© 2002 Macmillan Magazines Ltd

http://www.nature.com/nature/journal/v415/n6872/full/415630a.html

Page 61: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

!"##"$% #& '(#)$"

*+, !"#$%& ' ()* +,- ' . /&0%$"%1 2332 ' 44456789:;5<=>

2?5 @=9AB8=6C %5 "5 D @7<EF;:C G5 *5 &>HIIH=6I =J <7:K=6 J:=> J=:;I8:L 76M F76MN9I; <B76A; H6 8:=OH<7F

"IH75 !"#$% &'()*+ ,-#"% !" +P,Q+?2 R,???S5

T35 U79OOHC V5 &5C WH;FHE7H6;6C U5 D U99I;F7C U5 0H=>7II 76M <7:K=6 K9MA;8 =J &9:=O;76 J=:;I8IC ,?., 8=

,??35 ./-+)/+ #!$" .3Q.+ R,??2S5

%&''()*)+,-./ 0+12.*-,32+ 7<<=>O76H;I 8B; O7O;: =6 0(123+XI 4;KIH8;RB88OYZZ44456789:;5<=>S5

!"#$%&'()*(+($,-

[; 8B76E 05 \8;OB;6I J=: <=>>;68I 76M I9AA;I8H=6I =6 ;7:FH;: ];:IH=6I =J 8B; >769I<:HO85#BHI 4=:E 47I I9OO=:8;M KL 8B; !\/C !)"" 76M 8B; ^68;:678H=67F _;=IOB;:; 0H=IOB;:;V:=A:7>Z_F=K7F "67FLIHIC ^68;:O:;878H=6C 76M W=M;FH6A V:=`;<85 \5/5 76M G5\5 4;:;I9OO=:8;M KL !)""XI )Ja<; =J _F=K7F V:=A:7>I J=: 8B; b7:K=6 W=M;FH6A b=6I=:8H9>5

b=::;IO=6M;6<; 76M :;c9;I8I J=: >78;:H7FI IB=9FM K; 7MM:;II;M 8= "5\5d5R;N>7HFY M;66H6Ae78>=I5<=F=I878;5;M9S5

55555555555555555555555555555555555555555555555555555555555555555!"#$#%&'(&) )*+&,#*(- ./0"1/.,*"&.&,02*' ."03-1&"*' %40(0(,0%4#5)&) 6&7 8.9fg: ;.,'&<*"0 => ?$@$A*f: B04" C> D&*)&<2&,1h:E*<<*./ F> G&<#0"h: F4,*#(*". ;> H,&#(0"f: =04,$ D./.).ig:B0".(4." I> J*#&"h: F<.*,& ;> C,.#&,h K J)L.,) C> M&N0"1f

f4#)1+3+5 ,(5 672(3-28 9+:+(3/' ;):1-121+< 4#:: =()>-)*<&("-?#3)-( @ABC@DBEFG< H.6hI'+ ;):1-121+ ?#3 !+)#8-/ 9+:+(3/'< 9#/JK-""+< 4(35"()> FBGAB< H.6i4(3-)+ ,-#1+/')#"#*5 ;):1-121+< L(8(-:'- =($#3(1#3-+:< L(8(-:'- &-15<;M(1+ BFEDBBBN< O(P()

4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444

5).2637" -+28/9)+37" ':2,2,.2':37 6-7,).3- 72+,-3+3+9 6-7,).32;7:(2.2':/(( ! <=7:(!> .)?&3.) 28/9)+ 12. 62,: 9.2@,: -+A =7:(!B/+,:)B3BCD$4 E)7)+, .)'2.,B B&99)B, ,:-, ,:)B) 6-7,).3- -.) @3A)(/A3B,.36&,)A 3+ *-.3+) '(-+F,2+" -+A ,:-, ,:)/ *-/ -772&+, 12. &',2 !G 21 B&.1-7) 27)-+ ':2,2B/+,:),37 )()7,.2+ ,.-+B'2.,H -+ACCG 21 ,:) ,2,-( *37.263-( 72**&+3,/I4 J+2@+ '(-+F,2+37-+28/9)+37 ':2,2,.2':B 6)(2+9 ,2 2+(/ - 1)@ .)B,.37,)A 9.2&'B@3,:3+ ,:) K.2,)26-7,).3- !;B&67(-BB4 L).) @) .)'2., 9)+2*37-+-(/B)B 21 ,:) ':2,2B/+,:),37 9)+) 72+,)+, -+A 2').2+ 2.9-+3M-;,32+ 3+ +-,&.-((/ 277&..3+9 *-.3+) 6-7,).3-4 N:)B) ':2,2B/+,:),379)+) 7(&B,).B 3+7(&A)A B2*) ,:-, *2B, 7(2B)(/ .)B)*6()A ,:2B) 21K.2,)26-7,).3- 1.2* ,:)";B&67(-BB" @:37: :-O) +)O). 6)12.) 6))+26B).O)A 3+ *-.3+) )+O3.2+*)+,B4 P&.,:).*2.)" ,:)B) ':2,2B/+;,:),37 9)+)B @).) 6.2-A(/ A3B,.36&,)A 3+ *-.3+) '(-+F,2+" -+A-7,3O)(/ )8'.)BB)A 3+ +).3,37 6-7,).32'(-+F,2+ -BB)*6(-9)B" 3+A3;7-,3+9 ,:-, ,:) +)@(/ 3A)+,3Q)A ':2,2,.2':B @).) ':2,2B/+,:),3;7-((/ 72*'),)+,4 R&. A-,- A)*2+B,.-,) ,:-, '(-+F,2+37 6-7,).3-(-BB)*6(-9)B -.) +2, B3*'(/ 72*'2B)A 21 2+) &+312.*" @3A)B'.)-A7(-BB 21 -+28/9)+37 ':2,2,.2':B" -B '.)O32&B(/ '.2'2B)AIS .-,:).",:)B) -BB)*6(-9)B 72+,-3+ *&(,3'()" A3B,-+,(/ .)(-,)A" ':2,2;B/+,:),37-((/ -7,3O) 6-7,).3-( 9.2&'B" 3+7(&A3+9 B2*) &+.)(-,)A,2 F+2@+ -+A 7&(,3O-,)A ,/')B4

W=I8 =J 8B; A;6;I :;c9H:;M J=: 8B; J=:>78H=6 =J K7<8;:H=<BF=:=NOBLFFN<=687H6H6A OB=8=ILI8;>I H6 7;:=KH<C 76=jLA;6H<C OB=8=N8:=OBH< R""VS K7<8;:H7 7:; <F9I8;:;M H6 7 <=68HA9=9IC +-NEHF=K7I;REKS <B:=>=I=>7F :;AH=6 RI9O;:=O;:=6Sk5 #B;I; H6<F9M; $/' 76M /31A;6;I <=MH6A J=: 8B; ;6lL>;I =J 8B; K7<8;:H=<BF=:=OBLFF 76M<7:=8;6=HM KH=IL68B;8H< O78B47LIC 76M 8B; P2? A;6;I <=MH6A J=:8B; I9K96H8I =J 8B; FHAB8NB7:];I8H6A <=>OF;j RP2?, 76M P2?6S 76M8B; :;7<8H=6 <;68:; <=>OF;j RP2?= 76M P2?4S5 #= K;88;: M;I<:HK; 8B;6789:; 76M MH];:IH8L =J OF76E8=6H<C 76=jLA;6H<C OB=8=IL68B;8H<K7<8;:H7C 4; I<:;;6;M 7 I9:J7<;N478;: K7<8;:H7F 7:8Ha<H7F <B:=>=N

g V:;I;68 7MM:;II;IY d;O7:8>;68 =J 0H=F=ALC #;<B6H=6N^I:7;F ^6I8H898; =J #;<B6=F=ALC @7HJ7 T2333C ^I:7;FR)505Sm !7:7 ^6I8H898; =J \<H;6<; 76M #;<B6=F=ALC #7E7L7>7C ^E=>7 kT3N3,3,C G7O76 R#5@5S5

Erythrobacter litoralis*MBIC3019*

Erythrobacter longus*Erythromicrobium ramosum

cDNA 0m1cDNA 20m21

Roseobacter denitrificans*Roseobacter litoralis*

Rhodovulum sulfidophilum*

100

BAC 39B11BAC 29C02

cDNA 20m10

BAC 52B02

BAC 65D09BAC 24D02

99

100

10054

Rhodoferax fermentans

Allochromatium vinosum

Roseateles depolymeransRubrivivax gelatinosus

cDNA 0m20

Rhodospirillum rubrum56

96

52

67

Thiocystisge latinosa

Rhodomicrobium vannieliiRhodocyclus tenuis

Rhodopseudomonas palustris

Chloroflexus aurantiacus

Sphingomonas natatoria

Rhodobacter sphaeroidesRhodobacter capsulatus

BAC 56B12a

b

BAC 60D04BAC 30G07

R2A84*

R2A163*

cDNA 20m22

cDNA 0m13

cDNA 20m8

R2A62*

cDNA 20m11

0.1

98

72

99

100

76

61

73

100

82

96

β

γ

α-3

α-4

α–1α-2

env20m1env20m5

env0m2

MBIC3951*

env0m1envHOT1

envHOT2envHOT3

100

99100

100

100

Roseobacter denitrificans*

Roseobacter litoralis*

MBIC3951*

R2A84*R2A62*

Rhodovulum sulfidophilum*Rhodobacter sphaeroidesRhodobacter capsulatus

Erythromicrobium ramosum

Erythrobacter litoralis*

MBIC3019*Sphingomonas natatoria

Erythrobacter longus*

Rhodomicrobium vannieliiRhodopseudomonas palustris

Rhodospirillum rubrum

Roseateles depolymeransRubrivivax gelatinosus

Rhodoferax fermentansRhodocyclus tenuis

Allochromatium vinosumThiocystis gelatinosa

Chloroflexus aurantiacus

0.1

α-3

α-4

α-2

α-1

β

γ

100

100

100

100

100

100

69

98

6165

100

82

95

69

100

5291

100

95

./*01( 2 !"#$%&'(')*+ ,'$-)*%(."*/. %0 !"#$ &'(' 132 -(3 ,456 142 .'78'(+'. %0 66!9-+)',*-: 3; 4; <=%$8)*%(-,# 3*.)-(+'. 0%, )"' !"#$ &'('. 132 >',' 3')',?*('3 0,%? -(

-$*&(?'() %0 @AA (8+$'%)*3' /%.*)*%(.; -(3 0%, ,456 &'('. 142 0,%? -( -$*&(?'() %0 B@A

(8+$'%)*3' .'78'(+' /%.*)*%(.: <=%$8)*%(-,# ,'$-)*%(."*/. >',' 3')',?*('3 9# ('*&"9%8,C

D%*(*(& -(-$#.*. 1.'' E')"%3.2: F"' &,''( (%(C.8$/"8, 9-+)',*8? %&'()(*+,"-

.")./01.2"- >-. 8.'3 -. -( %8)&,%8/: !"#$ &'('. )"-) >',' -?/$*G'3 9# !H4 *( )"*.

.)83# -,' *(3*+-)'3 9# )"' '(= /,'GI; >*)" J?K *(3*+-)*(& E%()','#; -(3 LMF *(3*+-)*(&

L->-** %+'-( )*?' .',*'.: H8$)*=-)'3 -',%9'. -,' ?-,N'3 *( $*&") 9$8'; 9-+)',*- +8$)8,'3

0,%? .'- >-)', -,' ?-,N'3 >*)" -( -.)',*.N; -(3 '(=*,%(?'()-$ +O56. -,' ?-,N'3 *( ,'3:

!"%)%.#()"')*+ !C; "C -(3 #C/,%)'%9-+)',*-$ &,%8/. -,' *(3*+-)'3 9# )"' =',)*+-$ 9-,. )%

)"' ,*&") %0 )"' ),'': P%%).),-/ =-$8'. &,'-)', )"-( QAR -,' *(3*+-)'3 -9%=' )"' 9,-(+"'.:

F"' .+-$' 9-, ,'/,'.'(). (8?9', %0 .89.)*)8)*%(. /', .*)':

© 2002 Macmillan Magazines Ltd

!"##"$% #& '(#)$"

*+, !"#$%& ' ()* +,- ' . /&0%$"%1 2332 ' 44456789:;5<=>

2?5 @=9AB8=6C %5 "5 D @7<EF;:C G5 *5 &>HIIH=6I =J <7:K=6 J:=> J=:;I8:L 76M F76MN9I; <B76A; H6 8:=OH<7F

"IH75 !"#$% &'()*+ ,-#"% !" +P,Q+?2 R,???S5

T35 U79OOHC V5 &5C WH;FHE7H6;6C U5 D U99I;F7C U5 0H=>7II 76M <7:K=6 K9MA;8 =J &9:=O;76 J=:;I8IC ,?., 8=

,??35 ./-+)/+ #!$" .3Q.+ R,??2S5

%&''()*)+,-./ 0+12.*-,32+ 7<<=>O76H;I 8B; O7O;: =6 0(123+XI 4;KIH8;RB88OYZZ44456789:;5<=>S5

!"#$%&'()*(+($,-

[; 8B76E 05 \8;OB;6I J=: <=>>;68I 76M I9AA;I8H=6I =6 ;7:FH;: ];:IH=6I =J 8B; >769I<:HO85#BHI 4=:E 47I I9OO=:8;M KL 8B; !\/C !)"" 76M 8B; ^68;:678H=67F _;=IOB;:; 0H=IOB;:;V:=A:7>Z_F=K7F "67FLIHIC ^68;:O:;878H=6C 76M W=M;FH6A V:=`;<85 \5/5 76M G5\5 4;:;I9OO=:8;M KL !)""XI )Ja<; =J _F=K7F V:=A:7>I J=: 8B; b7:K=6 W=M;FH6A b=6I=:8H9>5

b=::;IO=6M;6<; 76M :;c9;I8I J=: >78;:H7FI IB=9FM K; 7MM:;II;M 8= "5\5d5R;N>7HFY M;66H6Ae78>=I5<=F=I878;5;M9S5

55555555555555555555555555555555555555555555555555555555555555555!"#$#%&'(&) )*+&,#*(- ./0"1/.,*"&.&,02*' ."03-1&"*' %40(0(,0%4#5)&) 6&7 8.9fg: ;.,'&<*"0 => ?$@$A*f: B04" C> D&*)&<2&,1h:E*<<*./ F> G&<#0"h: F4,*#(*". ;> H,&#(0"f: =04,$ D./.).ig:B0".(4." I> J*#&"h: F<.*,& ;> C,.#&,h K J)L.,) C> M&N0"1f

f4#)1+3+5 ,(5 672(3-28 9+:+(3/' ;):1-121+< 4#:: =()>-)*<&("-?#3)-( @ABC@DBEFG< H.6hI'+ ;):1-121+ ?#3 !+)#8-/ 9+:+(3/'< 9#/JK-""+< 4(35"()> FBGAB< H.6i4(3-)+ ,-#1+/')#"#*5 ;):1-121+< L(8(-:'- =($#3(1#3-+:< L(8(-:'- &-15<;M(1+ BFEDBBBN< O(P()

4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444

5).2637" -+28/9)+37" ':2,2,.2':37 6-7,).3- 72+,-3+3+9 6-7,).32;7:(2.2':/(( ! <=7:(!> .)?&3.) 28/9)+ 12. 62,: 9.2@,: -+A =7:(!B/+,:)B3BCD$4 E)7)+, .)'2.,B B&99)B, ,:-, ,:)B) 6-7,).3- -.) @3A)(/A3B,.36&,)A 3+ *-.3+) '(-+F,2+" -+A ,:-, ,:)/ *-/ -772&+, 12. &',2 !G 21 B&.1-7) 27)-+ ':2,2B/+,:),37 )()7,.2+ ,.-+B'2.,H -+ACCG 21 ,:) ,2,-( *37.263-( 72**&+3,/I4 J+2@+ '(-+F,2+37-+28/9)+37 ':2,2,.2':B 6)(2+9 ,2 2+(/ - 1)@ .)B,.37,)A 9.2&'B@3,:3+ ,:) K.2,)26-7,).3- !;B&67(-BB4 L).) @) .)'2., 9)+2*37-+-(/B)B 21 ,:) ':2,2B/+,:),37 9)+) 72+,)+, -+A 2').2+ 2.9-+3M-;,32+ 3+ +-,&.-((/ 277&..3+9 *-.3+) 6-7,).3-4 N:)B) ':2,2B/+,:),379)+) 7(&B,).B 3+7(&A)A B2*) ,:-, *2B, 7(2B)(/ .)B)*6()A ,:2B) 21K.2,)26-7,).3- 1.2* ,:)";B&67(-BB" @:37: :-O) +)O). 6)12.) 6))+26B).O)A 3+ *-.3+) )+O3.2+*)+,B4 P&.,:).*2.)" ,:)B) ':2,2B/+;,:),37 9)+)B @).) 6.2-A(/ A3B,.36&,)A 3+ *-.3+) '(-+F,2+" -+A-7,3O)(/ )8'.)BB)A 3+ +).3,37 6-7,).32'(-+F,2+ -BB)*6(-9)B" 3+A3;7-,3+9 ,:-, ,:) +)@(/ 3A)+,3Q)A ':2,2,.2':B @).) ':2,2B/+,:),3;7-((/ 72*'),)+,4 R&. A-,- A)*2+B,.-,) ,:-, '(-+F,2+37 6-7,).3-(-BB)*6(-9)B -.) +2, B3*'(/ 72*'2B)A 21 2+) &+312.*" @3A)B'.)-A7(-BB 21 -+28/9)+37 ':2,2,.2':B" -B '.)O32&B(/ '.2'2B)AIS .-,:).",:)B) -BB)*6(-9)B 72+,-3+ *&(,3'()" A3B,-+,(/ .)(-,)A" ':2,2;B/+,:),37-((/ -7,3O) 6-7,).3-( 9.2&'B" 3+7(&A3+9 B2*) &+.)(-,)A,2 F+2@+ -+A 7&(,3O-,)A ,/')B4

Most of the genes required for the formation of bacteriochlorophyll-containing photosystems in aerobic, anoxygenic, phototrophic (AAP) bacteria are clustered in a contiguous, 45-kilobase (kb) chromosomal region (superoperon)6. These include bch and crt genes coding for the enzymes of the bacteriochlorophyll and carotenoid biosynthetic pathways, and the puf genes coding for the subunits of the light-harvesting complex (pufB and pufA) and the reaction centre complex (pufL and pufM). To better describe the nature and diversity of planktonic, anoxygenic, photosynthetic bacteria, we screened a surface-water bacterial artificial chromo-

† Present addresses: Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel (O.B.); Nara Institute of Science and Technology, Takayama, Ikoma 630-0101, Japan (T.H.).

[Figure 1 trees (panel a, pufM; panel b, rRNA). Tip labels include cultured α-1, α-2, α-3, α-4, β- and γ-proteobacteria (Erythrobacter, Erythromicrobium, Roseobacter, Rhodovulum, Rhodobacter, Rhodoferax, Rubrivivax, Roseateles, Rhodospirillum, Rhodomicrobium, Rhodocyclus, Rhodopseudomonas, Sphingomonas, Allochromatium and Thiocystis species), isolates MBIC3019, MBIC3951, R2A62, R2A84 and R2A163, BAC clones 24D02, 29C02, 30G07, 39B11, 52B02, 56B12, 60D04 and 65D09, and environmental cDNA, env and envHOT clones, with Chloroflexus aurantiacus as outgroup. Scale bars, 0.1.]

Figure 1 Phylogenetic relationships of pufM gene (a) and rRNA (b) sequences of AAP bacteria.

© 2002 Macmillan Magazines Ltd

rRNA PUF

a, b, Evolutionary distances for the pufM genes (a) were determined from an alignment of 600 nucleotide positions, and for rRNA genes (b) from an alignment of 860 nucleotide sequence positions. Evolutionary relationships were determined by neighbour-joining analysis (see Methods). The green non-sulphur bacterium Chloroflexus aurantiacus was used as an outgroup. pufM genes that were amplified by PCR in this study are indicated by the env prefix, with 'm' indicating Monterey, and HOT indicating Hawaii ocean time series. Cultivated aerobes are marked in light blue, bacteria cultured from sea water are marked with an asterisk, and environmental cDNAs are marked in red. Photosynthetic α-, β- and γ-proteobacterial groups are indicated by the vertical bars to the right of the tree. Bootstrap values greater than 50% are indicated above the branches. The scale bar represents number of substitutions per site.
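The legend above describes trees built from evolutionary distances over aligned nucleotide positions, followed by neighbour-joining. As a minimal sketch of the distance step only, the Python below computes a pairwise distance matrix from an alignment. The Jukes-Cantor correction, the function names and the toy sequences are assumptions made for illustration: the legend does not say which distance model the authors used, and the sequences are not real pufM data.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping positions with a gap or N."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # ignore columns that cannot be compared
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor (JC69) correction: expected substitutions per site."""
    if p >= 0.75:
        return math.inf  # saturated; the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Symmetric dict of corrected distances keyed by (name, name)."""
    names = sorted(alignment)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = jukes_cantor(p_distance(alignment[a], alignment[b]))
        dist[(a, b)] = d
        dist[(b, a)] = d
    return dist


# Hypothetical toy alignment (20 columns), not taken from the paper.
toy_alignment = {
    "clone_A": "ATGGCTCTGCTCAGCTTCGA",
    "clone_B": "ATGGCTCTGCTGAGCTTCGA",
    "outgroup": "ATGACTTTGCTGAGTTTTGA",
}
matrix = distance_matrix(toy_alignment)
for pair in [("clone_A", "clone_B"), ("clone_A", "outgroup")]:
    print(pair, round(matrix[pair], 4))
```

A neighbour-joining implementation would take a matrix like this as input and repeatedly join the pair of taxa that minimises the total branch length; that step is left out here to keep the sketch short.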

Page 62: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

letters to nature

NATURE | VOL 415 | 7 FEBRUARY 2002 | www.nature.com 631

some (BAC) library9 prepared from surface-water marine bacterioplankton, for clones containing pufL and pufM10,11 genes. Several puf-containing BAC clones were identified in screens using primers10 designed to amplify almost the entire pufL and pufM region (1520–1580 nucleotides). In addition, several coastal α-proteobacterial strains previously isolated off the Oregon coast were also screened (R2A strains)12. The pufM phylogenetic tree (Fig. 1a) encompassed two principal clades, one containing α-3 and α-4 proteobacteria and one containing α-1, α-2, β- and γ-proteobacteria representatives, in agreement with an earlier study10. The BAC clones fell into three groups on the basis of pufM gene phylogeny (Fig. 1a). One group (BAC clones 30G07, 56B12 and 60D04) was most closely related to pufM from α-proteobacteria isolates R2A62 and R2A84 (ref. 12) (Roseobacter-like, Fig. 1b), and was placed in the group containing α-3 Proteobacteria. The other two groups (29C02, 39B11 and 24D02, 52B02, 65D09) branched together, and were most similar to the freshwater β-proteobacterium Rhodoferax fermentans (Fig. 1a, b). puf genes were also amplified by polymerase chain reaction (PCR) from DNA extracts of a mixed bacterioplankton assemblage, sampled at the same location as where the BAC library originated: Monterey Bay, California. These Monterey Bay pufM sequences clustered with BAC clones 30G07, 56B12 and 60D04, the α-proteobacteria isolates R2A62 and R2A84 (env0m2, env20m1 and env20m5), and the group composed of Roseobacter isolates (env0m1). The phylogenetic relationships of pufM genes were generally consistent with those determined from ribosomal RNA gene sequences, with the exception of the pufM cluster containing the α-1, α-2, β- and γ-proteobacteria (Fig. 1b).

To identify groups actively expressing photosynthetic genes in natural populations, we used PCR with reverse transcription (RT-PCR) to identify photosynthetic-operon messenger RNAs from the same environment. The pufL/M primer set failed to amplify any complementary DNA. However, a different primer targeting a smaller fragment (156 nucleotides) of the pufM gene revealed five different groups of pufM in a Monterey Bay pufM cDNA gene library (cDNAs in Fig. 1). One cDNA group clustered with the aerobic marine phototroph Roseobacter denitrificans, another with strain R2A84, and one was most similar to the pufM sequence from a γ-proteobacterium, Thiocystis gelatinosa. Two pufM cDNA groups clustered with the different BAC clones, implying that these BAC inserts originate from bacteria that were actively expressing photosynthetic genes.

To better characterize the bacteria from which these photosynthetic operons originated, the BAC inserts (29C02, 41 kb; 60D04, 103 kb; 65D09, 87 kb) were fully sequenced and the photosynthetic operons compared with those of cultured proteobacteria. The operon organization of BAC clone 65D09 (and 29C02; data not shown), which falls inside the α-1/α-2/β/γ pufM cluster, is considerably different from the operon organization of any previously reported photosynthetic organism (Fig. 2). Clones 65D09 and 29C02 are similar in their photosynthetic operon organization (data not shown), and when compared with both α- or β-proteobacterial (Rhodobacter sphaeroides13 and Rubrivivax gelatinosus14, respectively) operon organization, more closely resemble the β-proteobacterial operon (Fig. 2). BAC clone 60D04, which is related to α-3 proteobacteria (on the basis of pufM sequences), also differs significantly in gene arrangement from any other photosynthetic operon organization reported to date. It most closely resembles the photosynthetic operons from Rhodobacter capsulatus8 and R. sphaeroides13,15, in both organization and gene content (it contains the bchJEI and crtA genes, which are missing in R. gelatinosus14 and BAC clones 29C02 and 65D09). The superoperonal gene arrangement, crtEF–bchCXYZ–puf and bchFNBHLM–lhaA–puhA13–17, found in R. sphaeroides, R. capsulatus and R. gelatinosus are conserved among the naturally occurring planktonic bacterial genomes (Fig. 2), suggesting that the photosynthetic apparatus in the bacteria from which the BACs originated are functional.

[Figure 2 gene maps: order of the crt, bch, puf, hem, puhA, lhaA, ppsR, tspO, cycA and idi genes and flanking ORFs in Rubrivivax gelatinosus, Rhodobacter sphaeroides, BAC 65D09 and BAC 60D04.]

Figure 2 Schematic comparison of photosynthetic operons from R. gelatinosus (β-proteobacteria), R. sphaeroides (α-proteobacteria) and uncultured environmental BACs. ORF abbreviations use the nomenclature defined in refs 13, 14 and 24. Predicted ORFs are coloured according to biological category: green, bacteriochlorophyll biosynthesis genes; orange, carotenoid biosynthesis genes; red, light-harvesting and reaction centre genes; and blue, cytochrome c2. White boxes indicate non-photosynthetic and hypothetical proteins with no known function. Homologous regions and genes are connected by shaded vertical areas and lines, respectively.


Page 63: Microbial Phylogenomics (EVE161) Class 14: Metagenomics


We compared several photosynthetic genes found on the oceanic bacteriochlorophyll superoperons with characterized homologues from cultured photosynthetic bacteria. The relationships among Bchla biosynthetic proteins BchB (a subunit of light-independent protochlorophyllide reductase) and BchH (magnesium chelatase) were determined18 (Fig. 3). Similar to pufM relationships, BchB and BchH proteins from BAC clone 60D04 were most similar to homologues from R. capsulatus and R. sphaeroides (Fig. 3). Bchla biosynthetic proteins from BAC clones 29C02 and 65D09 were most closely related to those of R. gelatinosus (no sequences are yet available from a photosynthetic γ-proteobacterium) and Rhodopseudomonas palustris, again in agreement with the phylogenetic relationships of the pufM sequences (Fig. 1).

Recent analyses suggest possible horizontal transfer of the photosynthetic gene cluster in purple bacteria14, and this possibility complicates definitive identification of the organismal origins of these operons based solely on photosynthetic gene analysis. However, gene assignment of open reading frames (ORFs) revealed that more than 75% of ORFs outside the photosynthetic superoperon on BAC 65D09 are most similar to proteins from γ-proteobacteria, whereas ORFs from BAC clone 60D04 are most similar to those of α-proteobacteria. The phylogenetic assignments on the basis of puf gene similarities and arrangement are therefore consistent with the chromosomal context external to the respective photosynthetic superoperons in the BAC clones analysed.

In this study, no sequences recovered in Monterey Bay waters were similar to those of Erythrobacter species, one of the more commonly cultured Bchla-containing bacteria recovered from the open ocean8. We therefore used the same pufM primers on bacterioplankton DNA extracts from waters of the central North Pacific Ocean (Hawaii ocean time series station19; envHOT clones in Fig. 1). One HOT environmental pufM group (represented by envHOT1 clone) clustered with sequences from α-3 proteobacteria isolates, and another group (envHOT2 and envHOT3) clustered with the BAC sequences related to the freshwater β-proteobacterium, R. fermentans. The results suggest that similar groups involved in oceanic aerobic anoxygenic photosynthesis are found in surface waters from both neritic and oceanic systems. These groups do not appear to be similar (at least with respect to their photosynthetic operon) to cultivated Erythrobacter species.

Recently, it has been suggested that cultivated Erythrobacter species (α-4 subclass of proteobacteria) may represent the predominant AAPs in the upper ocean8,20. Surprisingly, we were not able to retrieve photosynthetic operon genes belonging to this group in any of the samples analysed. Furthermore, very few sequences related to Erythrobacter species have been reported in 16S rDNA clone libraries constructed from marine plankton DNA, also suggesting that this group may not represent the predominant AAPs in the upper ocean. Some of the representative AAPs we found were related to cultured marine bacteria (Roseobacter and Roseobacter-like bacteria within the α-3 subclass of proteobacteria). Members of this group have also been frequently retrieved in 16S rRNA clone libraries and represent a large proportion of bacterioplankton rDNAs in coastal waters21. However, other groups (found in both BAC and cDNA libraries) were only distantly related to known anoxygenic phototrophs, and have never before been observed in marine plankton. This implies that in addition to Erythrobacter and Roseobacter species, other yet-to-be-cultivated bacteria (most likely related to β- or γ-proteobacteria) actively participate in oceanic aerobic, bacteriochlorophyll-based photosynthesis. The discovery of new marine AAP bacteria through culture-independent genomic analyses emphasizes the complementary nature of culture-based and cultivation-independent approaches, which taken together provide a much more comprehensive perspective than either does alone. Our new observations should help direct current efforts aimed at characterizing those microbes responsible for oceanic, bacteriochlorophyll-mediated photosynthesis, a newly recognized7,8 but poorly understood process in marine plankton.

Methods

BAC library and environmental cDNA preparation

The surface-water BAC library construction has been described previously9 and was prepared from sea water from station M2 (located approximately 45 km offshore of Moss Landing, California) pre-filtered through a GF/A glass fibre filter (approximate particle size less than 1.6 µm) to remove cell aggregates and larger eukaryotic phytoplankton cells. Sea water from the HOT station (22.4° N, 158.0° W) was collected on 17 March 1998. For preparation of environmental cDNA, sea water was collected from 0 m and 20 m on 26 April 2000 at station 4B off Moss Landing, California (36.77° N, 122.02° W) aboard the RV Western Flyer, and picoplankton were collected onto Sterivex Cartridges (Millipore) and

[Figure 3 trees: a, BchB; b, BchH. Tip labels: BAC 29C02, BAC 60D04, BAC 65D09, Acidiphilium rubrum, Rhodobacter capsulatus, Rhodobacter sphaeroides, Rhodopseudomonas palustris, Rubrivivax gelatinosus, Chloroflexus aurantiacus, Chlorobium tepidum and Heliobacillus mobilis. Scale bars, 0.1.]

Figure 3 Phylogenetic analyses of BchB and BchH proteins. a, Phylogenetic tree for the BchB protein. b, Phylogenetic tree for the BchH protein. The BchH sequences from Chlorobium vibrioforme25 and BchH2 and BchH3 from C. tepidum were omitted from the tree because these genes potentially encode an enzyme for bacteriochlorophyll c biosynthesis and are probably of distinct origin (J. Xiong, personal communication). Bootstrap values (neighbour-joining/parsimony method) greater than 50% are indicated next to the branches. The scale bar represents number of substitutions per site. The position of Acidiphilium rubrum (bold branch) was not well resolved by both methods.


Page 64: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

letters to nature

788 NATURE | VOL 411 | 14 JUNE 2001 | www.nature.com

loops but were spread over the entire protein, including changes near the retinal-binding domain (Fig. 4). Also, there was an insertion of one amino acid in the Antarctic proteorhodopsins, relative to those of the Monterey clade (Fig. 4).

We expressed the palE6 proteorhodopsin gene from Antarctica in E. coli cells. Addition of retinal to membranes of cells containing Antarctic proteorhodopsin apoprotein produced an absorbance peak at 490 nm (Fig. 5), a blue shift of 37 nm from the 527-nm peak observed for the Monterey Bay proteorhodopsins. The Antarctic proteorhodopsin spectrum exhibited vibrational fine structure, as observed previously in retinylidene pigments for archaeal sensory rhodopsin II, which has a very similar blue-shifted (487-nm) absorption maximum12. The nearly identical shapes of the structured spectra suggest similar mechanisms of wavelength regulation in the bacterial and archaeal rhodopsins. Furthermore, palE6 proteorhodopsin functions similarly to its Monterey Bay homologues1, exhibiting light-mediated transport of protons in right-side-out E. coli vesicles (E. N. Spudich et al., manuscript in preparation).

We screened a fosmid library from Antarctica (E. F. DeLong, unpublished data) with proteorhodopsin primers, and sequenced one clone containing the gene. Preliminary sequence analyses based on flanking gene order and identity indicate that, despite differences in amino-acid sequence and absorption spectra, the Antarctic proteorhodopsin is derived from a bacterium highly related to the proteorhodopsin-containing SAR86 bacteria of Monterey Bay10 (O. Béjà, unpublished data).

Thirty-two proteorhodopsin genes from the HOT station were cloned from surface (5-m) and 75-m samples. Most proteorhodopsin clones (80%) from the HOT surface waters belonged to the Monterey clade and were identical (on the amino-acid level) to the proteorhodopsin gene from BAC 40E8. In contrast, most of the clones from the 75-m sample (90%) fell within the Antarctic clade (Fig. 3). Two proteorhodopsin genes from the HOT station, HOT 0m1 and HOT 75m4, were expressed in E. coli and their absorption spectra were examined. As expected from their primary sequences, the HOT proteorhodopsin clustering with the Antarctic clade gave a blue (490-nm) absorption maximum, whereas the proteorhodopsin clustering with the Monterey clade gave a green (527-nm) absorption maximum (Fig. 5).

In the oligotrophic waters of the central North Pacific gyre, most of the light energy is in the blue range, with maximal intensity near 475 nm (ref. 13). This energy peak is maintained over depth, whereas the total energy decreases. At the surface, the energy peak is very broad with a half bandwidth between 400 and 650 nm. In deeper water below 50 m, the peak narrows and the half bandwidth is restricted to between 450 and 500 nm (Fig. 5). Considering the different wavelengths absorbed by members of the two different proteorhodopsin clades (527 nm versus 490 nm; Fig. 5), it seems that the blue-shifted proteorhodopsin variants are better adapted to the light available in their environment. Notably, rod and cone visual pigment rhodopsins from closely related fish species in Lake Baikal14 also exhibit a blue shift in deeper dwelling species. In addition, previous studies of cultivated Prochlorococcus15,16 and Synechococcus17, as well as natural cyanobacterial communities18, also indicate the presence of surface- and deep-water groups adapted to different depths.

Our data now show the presence of co-existing surface- and deep-water clades of proteorhodopsin-based phototrophs, whose energy-generating pigments are spectrally tuned to either shallow or deeper-water light fields. Presumably, this provides subsets of proteorhodopsin genetic variants in mixed populations with a selective advantage at different points along the depth-dependent light

Figure 4 Multiple alignment of proteorhodopsin amino-acid sequences. The secondary structure shown (boxes for transmembrane helices) is derived from hydropathy plots. Residues predicted to form the retinal-binding pocket are marked in red.

[Figure 5 plot: absorbance (top panel; curves labelled Monterey Bay, HOT 0 m, HOT 75 m and Antarctica) and relative irradiance (bottom panel; curves labelled 5 m and 75 m) against wavelength (nm), 350 to 650 nm.]

Figure 5 Absorption spectra of retinal-reconstituted proteorhodopsins in E. coli membranes. All-trans retinal (2.5 µM) was added to membrane suspensions in 100 mM phosphate buffer, pH 7.0, and absorption spectra were recorded. Top, four spectra for palE6 (Antarctica), HOT 75m4, HOT 0m1, and BAC 31A8 (Monterey Bay) at 1 h after retinal addition. Bottom, downwelling irradiance from HOT station measured at six wavelengths (412, 443, 490, 510, 555 and 665 nm) and at two depths, for the same depths and date that the HOT samples were collected (0 and 75 m). Irradiance is plotted relative to irradiance at 490 nm.

© 2001 Macmillan Magazines Ltd

Page 65: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Community structure and metabolism through reconstruction of microbial genomes from the environment

Gene W. Tyson1, Jarrod Chapman3,4, Philip Hugenholtz1, Eric E. Allen1, Rachna J. Ram1, Paul M. Richardson4, Victor V. Solovyev4, Edward M. Rubin4, Daniel S. Rokhsar3,4 & Jillian F. Banfield1,2

1Department of Environmental Science, Policy and Management, 2Department of Earth and Planetary Sciences, and 3Department of Physics, University of California, Berkeley, California 94720, USA
4Joint Genome Institute, Walnut Creek, California 94598, USA


Microbial communities are vital in the functioning of all ecosystems; however, most microorganisms are uncultivated, and their roles in natural systems are unclear. Here, using random shotgun sequencing of DNA from a natural acidophilic biofilm, we report reconstruction of near-complete genomes of Leptospirillum group II and Ferroplasma type II, and partial recovery of three other genomes. This was possible because the biofilm was dominated by a small number of species populations and the frequency of genomic rearrangements and gene insertions or deletions was relatively low. Because each sequence read came from a different individual, we could determine that single-nucleotide polymorphisms are the predominant form of heterogeneity at the strain level. The Leptospirillum group II genome had remarkably few nucleotide polymorphisms, despite the existence of low-abundance variants. The Ferroplasma type II genome seems to be a composite from three ancestral strains that have undergone homologous recombination to form a large population of mosaic genomes. Analysis of the gene complement for each organism revealed the pathways for carbon and nitrogen fixation and energy generation, and provided insights into survival strategies in an extreme environment.

The study of microbial evolution and ecology has been revolutionized by DNA sequencing and analysis1–3. However, isolates have been the main source of sequence data, and only a small fraction of microorganisms have been cultivated4–6. Consequently, focus has shifted towards the analysis of uncultivated microorganisms via cloning of conserved genes5 and genome fragments directly from the environment7–9. To date, only a small fraction of genes have been recovered from individual environments, limiting the analysis of microbial communities as networks characterized by symbioses, competition and partitioning of community-essential roles. Comprehensive genomic data would resolve organism-specific pathways and provide insights into population structure, speciation and evolution. So far, sequencing of whole communities has not been practical because most communities comprise hundreds to thousands of species10.

Acid mine drainage (AMD) is a worldwide environmental problem that arises largely from microbial activity11. Here, we focused on a low-complexity AMD microbial biofilm growing hundreds of feet underground within a pyrite (FeS2) ore body12–15. This represents a self-contained biogeochemical system characterized by tight coupling between microbial iron oxidation and acidification due to pyrite dissolution11,16,17. Random shotgun sequencing of DNA from entire microbial communities is one approach for the recovery of the gene complement of uncultivated organisms, and for determining the degree of variability within populations at the genome level. We used random shotgun sequencing of the biofilm to obtain the first reconstruction of multiple genomes directly from a natural sample. The results provide novel insights into community structure, and reveal the strategies that underpin microbial activity in this environment.

Initial characterization of the biofilm

Biofilms growing on the surface of flowing AMD in the five-way region of the Richmond mine at Iron Mountain, California12, were sampled in March 2000. Screening using group-specific18 fluorescence in situ hybridization (FISH) revealed that all biofilms contained mixtures of bacteria (Leptospirillum, Sulfobacillus and, in a few cases, Acidimicrobium) and archaea (Ferroplasma and other members of the Thermoplasmatales). The genome of one of these archaea, Ferroplasma acidarmanus fer1, isolated from the Richmond mine, has been sequenced previously (http://www.jgi.doe.gov/JGI_microbial/html/ferroplasma/ferro_homepage.html).

A pink biofilm (Fig. 1a) typical of AMD communities was selected for detailed genomic characterization (see Supplementary Information). The biofilm was dominated by Leptospirillum species and contained F. acidarmanus at a relatively low abundance (Fig. 1b, c). This biofilm was growing in pH 0.83, 42 °C, 317 mM Fe, 14 mM Zn, 4 mM Cu and 2 mM As solution, and was collected from a surface area of approximately 0.05 m2.

A 16S ribosomal RNA gene clone library was constructed from DNA extracted from the pink biofilm, and 384 clones were end-sequenced (see Supplementary Information). Results indicated the presence of three bacterial and three archaeal lineages. The most abundant clones are close relatives of L. ferriphilum19 and belong to Leptospirillum group II (ref. 13). Although 94% of the Leptospirillum group II clones were identical, 17 minor variants were detected with up to 1.2% 16S rRNA gene-sequence divergence from the dominant type. Tightly defined groups (up to 1% sequence divergence) related to Leptospirillum group III (ref. 13), Sulfobacillus, Ferroplasma (some identical to fer1), ‘A-plasma’15 and ‘G-plasma’15 were also detected. Leptospirillum group III, G-plasma and A-plasma have only recently been detected in culture-independent molecular surveys. FISH-based quantification (Fig. 1c; see also Supplementary Information) confirmed the dominance of Leptospirillum group II in the biofilm.

Community genome sequencing and assembly

In conventional shotgun sequencing projects of microbial isolates, all shotgun fragments are derived from clones of the same genome. When using the shotgun sequencing approach on genomes from an

articles

NATURE | doi:10.1038/nature02340 | www.nature.com/nature 1
© 2004 Nature Publishing Group


Environmental Genome Shotgun Sequencing of the Sargasso Sea

J. Craig Venter,1* Karin Remington,1 John F. Heidelberg,3 Aaron L. Halpern,2 Doug Rusch,2 Jonathan A. Eisen,3 Dongying Wu,3 Ian Paulsen,3 Karen E. Nelson,3 William Nelson,3 Derrick E. Fouts,3 Samuel Levy,2 Anthony H. Knap,6 Michael W. Lomas,6 Ken Nealson,5 Owen White,3 Jeremy Peterson,3 Jeff Hoffman,1 Rachel Parsons,6 Holly Baden-Tillson,1 Cynthia Pfannkoch,1 Yu-Hui Rogers,4 Hamilton O. Smith1

We have applied “whole-genome shotgun sequencing” to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors. Variation in species present and stoichiometry suggests substantial oceanic microbial diversity.

Microorganisms are responsible for most of the biogeochemical cycles that shape the environment of Earth and its oceans. Yet, these organisms are the least well understood on Earth, as the ability to study and understand the metabolic potential of microorganisms has been hampered by the inability to generate pure cultures. Recent studies have begun to explore environmental bacteria in a culture-independent manner by isolating DNA from environmental samples and transforming it into large insert clones. For example, a previously unknown light-driven proton pump, proteorhodopsin, was discovered within a bacterial artificial chromosome (BAC) from the genome of a SAR86 ribotype (1), and soil microbial DNA libraries have been constructed and screened for specific activities (2).

Here we have applied whole-genome shotgun sequencing to environmental-pooled DNA samples to test whether new genomic approaches can be effectively applied to gene and species discovery and to overall environmental characterization. To help ensure a tractable pilot study, we sampled in the Sargasso Sea, a nutrient-limited, open ocean environment. Further, we concentrated on the genetic material captured on filters sized to isolate primarily microbial inhabitants of the environment, leaving detailed analysis of dissolved DNA and viral particles on one end of the size spectrum and eukaryotic inhabitants on the other, for subsequent studies.

The Sargasso Sea. The northwest Sargasso Sea, at the Bermuda Atlantic Time-series Study site (BATS), is one of the best-studied and arguably most well-characterized regions of the global ocean. The Gulf Stream represents the western and northern boundaries of this region and provides a strong physical boundary, separating the low nutrient, oligotrophic open ocean from the more nutrient-rich waters of the U.S. continental shelf. The Sargasso Sea has been intensively studied as part of the 50-year time series of ocean physics and biogeochemistry (3, 4) and provides an opportunity for interpretation of environmental genomic data in an oceanographic context. In this region, formation of subtropical mode water occurs each winter as the passage of cold fronts across the region erodes the seasonal thermocline and causes convective mixing, resulting in mixed layers of 150 to 300 m depth. The introduction of nutrient-rich deep water, following the breakdown of seasonal thermoclines into the brightly lit surface waters, leads to the blooming of single cell phytoplankton, including two cyanobacteria species, Synechococcus and Prochlorococcus, that numerically dominate the photosynthetic biomass in the Sargasso Sea.

Surface water samples (170 to 200 liters) were collected aboard the RV Weatherbird II from three sites off the coast of Bermuda in February 2003. Additional samples were collected aboard the SV Sorcerer II from “Hydrostation S” in May 2003. Sample site locations are indicated on Fig. 1 and described in table S1; sampling protocols were fine-tuned from one expedition to the next (5). Genomic DNA was extracted from filters of 0.1 to 3.0 µm, and genomic libraries with insert sizes ranging from 2 to 6 kb were made as described (5). The prepared plasmid clones were sequenced from both ends to provide paired-end reads at the J. Craig Venter Science Foundation Joint Technology Center on ABI 3730XL DNA sequencers (Applied Biosystems, Foster City, CA). Whole-genome random shotgun sequencing of the Weatherbird II samples (table S1, samples 1 to 4) produced 1.66 million reads averaging 818 bp in length, for a total of approximately 1.36 Gbp of microbial DNA sequence. An additional 325,561 sequences were generated from the Sorcerer II samples (table S1, samples 5 to 7), yielding approximately 265 Mbp of DNA sequence.

Environmental genome shotgun assembly. Whole-genome shotgun sequencing projects have traditionally been applied to identify the genome sequence(s) from one particular organism, whereas the approach taken here is intended to capture representative sequence from many diverse organisms simultaneously. Variation in genome size and relative abundance determines the depth of coverage of any particular organism in the sample at a given level of sequencing and has strong implications for both the application of assembly algorithms and for the metrics used in evaluating the resulting assembly. Although we would expect abundant species to be deeply covered and well assembled, species of lower abundance may be represented by only a few sequences. For a single genome analysis, assembly coverage depth in unique regions should approximate a Poisson distribution. The mean of this distribution can be estimated from the observed data, looking at the depth of coverage of contigs generated before any scaffolding. The assembler used in this study, the Celera Assembler (6), uses this value to heuristically identify clearly unique regions to form the backbone of the final assembly within the scaffolding phase. However, when the starting material consists of a mixture of genomes of varying abundance, a threshold estimated in this way would classify samples from the most abundant organism(s) as repetitive, due to their greater-than-average depth of coverage, paradoxically leaving the most abundant organisms poorly assembled. We therefore used manual curation of an initial

¹The Institute for Biological Energy Alternatives, ²The Center for the Advancement of Genomics, 1901 Research Boulevard, Rockville, MD 20850, USA. ³The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA. ⁴The J. Craig Venter Science Foundation Joint Technology Center, 5 Research Place, Rockville, MD 20850, USA. ⁵University of Southern California, 223 Science Hall, Los Angeles, CA 90089–0740, USA. ⁶Bermuda Biological Station for Research, Inc., 17 Biological Lane, St George GE 01, Bermuda.

*To whom correspondence should be addressed. E-mail: [email protected]

RESEARCH ARTICLE

2 APRIL 2004 VOL 304 SCIENCE www.sciencemag.org 66

Page 66: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Shotgun metagenomics

shotgun sequence


Page 67: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Acid Mine Drainage 2004

environmental sample, however, variation within each species population might complicate assembly. If intraspecies variation is dominated by limited local polymorphism or homologous recombination, it should be possible to define a composite genome for each species population. Conversely, if the genomic heterogeneity within a species is dominated by large rearrangements, deletions, or insertions, it may be impossible to define composite genomes for species populations from natural communities.

A small insert plasmid library (average insert size 3.2 kilobases (kb)) was constructed from the biofilm DNA for random shotgun sequencing (see Supplementary Information). A total of 76.2 million base pairs (bp) of DNA sequence was generated from 103,462 high-quality reads (averaging 737 bp per read). Analysis of raw shotgun data (Supplementary Figs S1–5) indicated the presence of both bacterial and archaeal genomes at sequence coverages of up to 10×, which would be sufficient to produce a high-quality assembly from a conventional microbial genome project (refs 20, 21). The shotgun data set was assembled with JAZZ, a whole-genome shotgun assembler (ref. 22). Anticipating polymorphisms, we permitted alignment discrepancies beyond those expected from sequencing error if they were consistent with end-pairing constraints. Over 85% of the shotgun reads were assembled into scaffolds longer than 2 kb (a scaffold is a reconstructed genomic region that may contain gaps of a known size range). The combined length of the 1,183 scaffolds is 10.83 megabases (Mb). The assembly is internally self consistent, with 97.2% of end pairs from the same clone assembled with the appropriate orientation and separation, as expected for a low rate of mispairing error (tracking and chimaeric clones).

The first step in assignment of scaffolds to organism types was to separate the scaffolds by average G+C content. These were subsequently subdivided using read depth (coverage). Dinucleotide frequencies did not allow for further subdivision. Notably, separation of scaffolds into low G+C (<43.5%; Supplementary Fig. S3a) and high G+C (≥43.5%) content ‘bins’ was not significantly compromised by local heterogeneities in G+C content because the scaffolds were binned after assembly. As the scaffolds are typically tens of kilobases long, local fluctuations in G+C content are averaged over the length of each scaffold, allowing, in most cases (>99%), clear assignment to bins of high or low G+C content.
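The two-step binning described above (G+C content first, then read depth) can be illustrated with a minimal Python sketch. The 43.5% G+C threshold comes from the text; the `Scaffold` class, the function names, and the depth cut-off of 6× used to separate the roughly 3× and roughly 10× groups are assumptions made for illustration, not values or code from the paper.

```python
from dataclasses import dataclass

GC_THRESHOLD = 43.5  # percent G+C separating low and high bins (from the text)
DEPTH_SPLIT = 6.0    # assumed cut-off between the ~3x and ~10x coverage groups


@dataclass
class Scaffold:
    name: str
    sequence: str
    mean_depth: float  # average read depth over the scaffold


def gc_percent(sequence: str) -> float:
    """Percent G+C computed over the unambiguous (A, C, G, T) bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def assign_bin(scaffold: Scaffold) -> str:
    """Label a scaffold by G+C class first, then by coverage class."""
    gc_label = "high_GC" if gc_percent(scaffold.sequence) >= GC_THRESHOLD else "low_GC"
    depth_label = "10x" if scaffold.mean_depth >= DEPTH_SPLIT else "3x"
    return f"{gc_label}_{depth_label}"
```

Because G+C is averaged over a whole scaffold, a long scaffold is less sensitive to local fluctuations than individual reads would be, which is the reason the text gives for binning after assembly.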

The high G+C scaffolds at approximately 10× coverage (70 scaffolds up to 137 kb in length, totalling 2.23 Mb) were identified by the presence of a single 16S rRNA gene as belonging to the genome of a Leptospirillum group II species. The average G+C content (55.8%) is comparable to the G+C content (54.9–58%) of L. ferriphilum (ref. 19). The total high G+C scaffold length is close to the estimated genome size of Leptospirillum ferrooxidans (ref. 23) (1.9 Mb). This suggests that essentially the entire Leptospirillum group II genome was recovered from the community DNA.

The low G+C scaffolds at approximately 10× coverage were assembled into 59 scaffolds of up to 138 kb in length, totalling 1.82 Mb. The single 16S rRNA gene identified in these scaffolds was 99% identical to that of the fer1 isolate; however, alignment of the scaffolds to the fer1 genome revealed an average of 22% divergence at the nucleotide level (Supplementary Fig. S6). The total scaffold length is close to the genome size of fer1 (1.9 Mb; Allen et al., unpublished data), and local gene order and content are highly conserved (Supplementary Fig. S7). Therefore, these 59 scaffolds represent a nearly complete genome of a previously unknown, uncultured Ferroplasma species distinct from fer1. We designate this as Ferroplasma type II. The dominance of this organism type was unexpected before the genomic analysis.

We assigned the roughly 3× coverage, high G+C scaffolds to Leptospirillum group III on the basis of rRNA markers (474 scaffolds up to 31 kb, totalling 2.66 Mb). Comparison of these scaffolds with those assigned to Leptospirillum group II indicates significant sequence divergence and only locally conserved gene order, confirming that the scaffolds belong to a relatively distant relative of Leptospirillum group II. A partial 16S rRNA gene sequence from Sulfobacillus thermosulfidooxidans was identified in the unassembled reads, suggesting very low coverage of this organism. If any Sulfobacillus scaffolds >2 kb were assembled, they would be grouped with the Leptospirillum group III scaffolds.

We compared the 3× coverage, low G+C scaffolds (580 scaffolds, 4.12 Mb) to the fer1 genome in order to assign them to organism types (Supplementary Fig. S6). Scaffolds with ≥96% nucleotide identity to fer1 were assigned to an environmental Ferroplasma type I genome (170 scaffolds up to 47 kb in length and comprising 1.48 Mb of sequence). The remaining low-coverage, low G+C scaffolds are tentatively assigned to G-plasma. The largest scaffold in this bin (62 kb) contains the G-plasma 16S rRNA gene. The 410 scaffolds assigned to G-plasma comprise 2.65 Mb of sequence. A partial 16S rRNA gene sequence from A-plasma was identified in the unassembled reads, suggesting low coverage of this organism. Any scaffolds from A-plasma >2 kb would be included in the G-plasma bin. Although eukaryotes are present in the AMD system, they were in low abundance in the biofilm studied. So far, no scaffolds from eukaryotes have been detected.

As independent evidence that the Leptospirillum group II and Ferroplasma type II genomes are nearly complete, we located a full complement of transfer RNA synthetases in each genome data set. An almost complete set of these genes was also recovered from Leptospirillum group III. The G-plasma bin contains more than a full set of tRNA synthetases, consistent with inclusion of some A-plasma scaffolds. In addition, we established that the Leptospirillum group II, Leptospirillum group III, Ferroplasma type I, Ferroplasma type II and G-plasma bins contained only one set of rRNA genes.
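The completeness argument above amounts to checking each bin for one synthetase per amino acid and treating extra copies as a hint of contamination from a second genome. The sketch below is a hypothetical illustration: the function name, the input format (a list of amino acid codes, one per annotated synthetase gene), and the assumption that all 20 synthetases are expected are choices made here, not the authors' procedure. Some organisms naturally lack certain synthetases, so a real check would need organism-specific expectations.

```python
from collections import Counter

# Three-letter codes for the 20 standard amino acids (assumed expected set).
EXPECTED_SYNTHETASES = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}


def synthetase_report(found: list[str]) -> dict:
    """Summarize which synthetases are missing or present more than once."""
    counts = Counter(found)
    missing = sorted(EXPECTED_SYNTHETASES - set(counts))
    duplicated = sorted(aa for aa, n in counts.items() if n > 1)
    return {
        "missing": missing,        # suggests an incomplete genome
        "duplicated": duplicated,  # may suggest a mixed bin
        "complete": not missing,
    }
```

A bin with no missing entries but several duplicated ones would resemble the G-plasma case described in the text, where more than a full set was found.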

Figure 1 The pink biofilm. a, Photograph of the biofilm in the Richmond mine (hand included for scale). b, FISH image of a. Probes targeting bacteria (EUBmix; fluorescein isothiocyanate (green)) and archaea (ARC915; Cy5 (blue)) were used in combination with a probe targeting the Leptospirillum genus (LF655; Cy3 (red)). Overlap of red and green (yellow) indicates Leptospirillum cells and shows the dominance of Leptospirillum. c, Relative microbial abundances determined using quantitative FISH counts.

articles

NATURE | doi:10.1038/nature02340 | www.nature.com/nature 2 © 2004 Nature Publishing Group

Page 68: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

inputs of fixed carbon or nitrogen from external sources. As with Leptospirillum group I, both Leptospirillum group II and III have the genes needed to fix carbon by means of the Calvin–Benson–Bassham cycle (using type II ribulose 1,5-bisphosphate carboxylase–oxygenase). All genomes recovered from the AMD system contain formate hydrogenlyase complexes. These, in combination with carbon monoxide dehydrogenase, may be used for carbon fixation via the reductive acetyl coenzyme A (acetyl-CoA) pathway by some, or all, organisms. Given the large number of ABC-type sugar and amino acid transporters encoded in the Ferroplasma type

Figure 4 Cell metabolic cartoons constructed from the annotation of 2,180 ORFs identified in the Leptospirillum group II genome (63% with putative assigned function) and 1,931 ORFs in the Ferroplasma type II genome (58% with assigned function). The cell cartoons are shown within a biofilm that is attached to the surface of an acid mine drainage stream (viewed in cross-section). Tight coupling between ferrous iron oxidation, pyrite dissolution and acid generation is indicated. Rubisco, ribulose 1,5-bisphosphate carboxylase–oxygenase. THF, tetrahydrofolate.


Page 69: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Environmental Genome Shotgun Sequencing of the Sargasso Sea

J. Craig Venter,¹* Karin Remington,¹ John F. Heidelberg,³ Aaron L. Halpern,² Doug Rusch,² Jonathan A. Eisen,³ Dongying Wu,³ Ian Paulsen,³ Karen E. Nelson,³ William Nelson,³ Derrick E. Fouts,³ Samuel Levy,² Anthony H. Knap,⁶ Michael W. Lomas,⁶ Ken Nealson,⁵ Owen White,³ Jeremy Peterson,³ Jeff Hoffman,¹ Rachel Parsons,⁶ Holly Baden-Tillson,¹ Cynthia Pfannkoch,¹ Yu-Hui Rogers,⁴ Hamilton O. Smith¹

We have applied “whole-genome shotgun sequencing” to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These data are estimated to derive from at least 1800 genomic species based on sequence relatedness, including 148 previously unknown bacterial phylotypes. We have identified over 1.2 million previously unknown genes represented in these samples, including more than 782 new rhodopsin-like photoreceptors. Variation in species present and stoichiometry suggests substantial oceanic microbial diversity.

Microorganisms are responsible for most of the biogeochemical cycles that shape the environment of Earth and its oceans. Yet, these organisms are the least well understood on Earth, as the ability to study and understand the metabolic potential of microorganisms has been hampered by the inability to generate pure cultures. Recent studies have begun to explore environmental bacteria in a culture-independent manner by isolating DNA from environmental samples and transforming it into large insert clones. For example, a previously unknown light-driven proton pump, proteorhodopsin, was discovered within a bacterial artificial chromosome (BAC) from the genome of a SAR86 ribotype (1), and soil microbial DNA libraries have been constructed and screened for specific activities (2).

Here we have applied whole-genome shotgun sequencing to environmental-pooled DNA samples to test whether new genomic approaches can be effectively applied to gene and species discovery and to overall environmental characterization. To help ensure a tractable pilot study, we sampled in the Sargasso Sea, a nutrient-limited, open ocean environment. Further, we concentrated on the genetic material captured on filters sized to isolate primarily microbial inhabitants of the environment, leaving detailed analysis of dissolved DNA and viral particles on one end of the size spectrum and eukaryotic inhabitants on the other, for subsequent studies.

The Sargasso Sea. The northwest Sargasso Sea, at the Bermuda Atlantic Time-series Study site (BATS), is one of the best-studied and arguably most well-characterized regions of the global ocean. The Gulf Stream represents the western and northern boundaries of this region and provides a strong physical boundary, separating the low nutrient, oligotrophic open ocean from the more nutrient-rich waters of the U.S. continental shelf. The Sargasso Sea has been intensively studied as part of the 50-year time series of ocean physics and biogeochemistry (3, 4) and provides an opportunity for interpretation of environmental genomic data in an oceanographic context. In this region, formation of subtropical mode water occurs each winter as the passage of cold fronts across the region erodes the seasonal thermocline and causes convective mixing, resulting in mixed layers of 150 to 300 m depth. The introduction of nutrient-rich deep water, following the breakdown of seasonal thermoclines into the brightly lit surface waters, leads to the blooming of single cell phytoplankton, including two cyanobacteria species, Synechococcus and Prochlorococcus, that numerically dominate the photosynthetic biomass in the Sargasso Sea.

Surface water samples (170 to 200 liters) were collected aboard the RV Weatherbird II from three sites off the coast of Bermuda in February 2003. Additional samples were collected aboard the SV Sorcerer II from “Hydrostation S” in May 2003. Sample site locations are indicated on Fig. 1 and described in table S1; sampling protocols were fine-tuned from one expedition to the next (5). Genomic DNA was extracted from filters of 0.1 to 3.0 µm, and genomic libraries with insert sizes ranging from 2 to 6 kb were made as described (5). The prepared plasmid clones were sequenced from both ends to provide paired-end reads at the J. Craig Venter Science Foundation Joint Technology Center on ABI 3730XL DNA sequencers (Applied Biosystems, Foster City, CA). Whole-genome random shotgun sequencing of the Weatherbird II samples (table S1, samples 1 to 4) produced 1.66 million reads averaging 818 bp in length, for a total of approximately 1.36 Gbp of microbial DNA sequence. An additional 325,561 sequences were generated from the Sorcerer II samples (table S1, samples 5 to 7), yielding approximately 265 Mbp of DNA sequence.

Environmental genome shotgun assembly. Whole-genome shotgun sequencing projects have traditionally been applied to identify the genome sequence(s) from one particular organism, whereas the approach taken here is intended to capture representative sequence from many diverse organisms simultaneously. Variation in genome size and relative abundance determines the depth of coverage of any particular organism in the sample at a given level of sequencing and has strong implications for both the application of assembly algorithms and for the metrics used in evaluating the resulting assembly. Although we would expect abundant species to be deeply covered and well assembled, species of lower abundance may be represented by only a few sequences. For a single genome analysis, assembly coverage depth in unique regions should approximate a Poisson distribution. The mean of this distribution can be estimated from the observed data, looking at the depth of coverage of contigs generated before any scaffolding. The assembler used in this study, the Celera Assembler (6), uses this value to heuristically identify clearly unique regions to form the backbone of the final assembly within the scaffolding phase. However, when the starting material consists of a mixture of genomes of varying abundance, a threshold estimated in this way would classify samples from the most abundant organism(s) as repetitive, due to their greater-than-average depth of coverage, paradoxically leaving the most abundant organisms poorly assembled. We therefore used manual curation of an initial


http://www.sciencemag.org/content/304/5667/66

Page 70: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Sargasso Sea

assembly to identify a set of large, deeply assembling nonrepetitive contigs. This was used to set the expected coverage in unique regions (to 23×) for a final run of the assembler. This allowed the deep contigs to be treated as unique sequence when they would otherwise be labeled as repetitive. We evaluated our final assembly results in a tiered fashion, looking at well-sampled genomic regions separately from those barely sampled at our current level of sequencing.
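Why the expected unique-region coverage matters can be shown with a simplified Poisson tail test. This sketch is not the Celera Assembler's actual statistic; it is a stand-in that flags a contig as "repetitive" when its depth is improbably high for a Poisson distribution with the assumed mean. The function names and the `alpha` cut-off are assumptions for illustration; only the 23× value comes from the text.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed by summing the pmf below k."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # pmf at 0
    cdf = term
    for i in range(1, k):
        term *= lam / i    # pmf at i from pmf at i - 1
        cdf += term
    return max(0.0, 1.0 - cdf)


def looks_repetitive(depth: int, expected_unique_depth: float,
                     alpha: float = 1e-3) -> bool:
    """Flag a contig whose depth is implausibly high for unique sequence."""
    return poisson_sf(depth, expected_unique_depth) < alpha
```

Under these assumptions, a contig at 23× depth is flagged when the expected depth is estimated from a low-coverage mixture (for example 3×), but not when the expected depth is set to 23×, which mirrors the adjustment described in the text.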

The 1.66 million sequences from the Weatherbird II samples (table S1; samples 1 to 4; stations 3, 11, and 13), were pooled and assembled to provide a single master assembly for comparative purposes. The assembly generated 64,398 scaffolds ranging in size from 826 bp to 2.1 Mbp, containing 256 Mbp of unique sequence and spanning 400 Mbp. After assembly, there remained 217,015 paired-end reads, or “mini-scaffolds,” spanning 820.7 Mbp as well as an additional 215,038 unassembled singleton reads covering 169.9 Mbp (table S2, column 1). The Sorcerer II samples provided almost no assembly, so we consider for these samples only the 153,458 mini-scaffolds, spanning 518.4 Mbp, and the remaining 18,692 singleton reads (table S2, column 2). In total, 1.045 Gbp of nonredundant sequence was generated. The lack of overlapping reads within the unassembled set indicates that lack of additional assembly was not due to algorithmic limitations but to the relatively limited depth of sequencing coverage given the level of diversity within the sample.

The whole-genome shotgun (WGS) assembly has been deposited at DDBJ/EMBL/GenBank under the project accession AACY00000000, and all traces have been deposited in a corresponding TraceDB trace archive. The version described in this paper is the first version, AACY01000000. Unlike a conventional WGS entry, we have deposited not just contigs and scaffolds but the unassembled paired singletons and individual singletons in order to accurately reflect the diversity in the sample and allow searches across the entire sample within a single database.

Genomes and large assemblies. Our analysis first focused on the well-sampled genomes by characterizing scaffolds with at least 3× coverage depth. There were 333 scaffolds comprising 2226 contigs and spanning 30.9 Mbp that met this criterion (table S3), accounting for roughly 410,000 reads, or 25% of the pooled assembly data set. From this set of well-sampled material, we were able to cluster and classify assemblies by organism; from the rare species in our sample, we used sequence similarity based methods together with computational gene finding to obtain both qualitative and quantitative estimates of genomic and functional diversity within this particular marine environment.

We employed several criteria to sort the major assembly pieces into tentative organism “bins”; these include depth of coverage, oligonucleotide frequencies (7), and similarity to previously sequenced genomes (5). With these techniques, the majority of sequence assigned to the most abundant species (16.5 Mbp of the 30.9 Mb in the main scaffolds) could be separated based on several corroborating indicators. In particular, we identified a distinct group of scaffolds representing an abundant population clearly related to Burkholderia (fig. S2) and two groups of scaffolds representing two distinct strains closely related to the published Shewanella oneidensis genome (8) (fig. S3). There is a group of scaffolds assembling at over 6× coverage that appears to represent the genome of a SAR86 (table S3). Scaffold sets representing a conglomerate of Prochlorococcus strains (Fig. 2), as well as an uncultured marine archaeon, were also identified (table S3; Fig. 3). Additionally, 10 putative mega plasmids were found in the main scaffold set, covered at depths ranging from 4× to 36× (indicated with shading in table S3 with nine depicted in

Fig. 1. MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003. The station locations are overlain with their respective identifications. Note the elevated levels of chlorophyll (green color shades) around station 3, which are not present around stations 11 and 13.

Fig. 2. Gene conservation among closely related Prochlorococcus. The outermost concentric circle of the diagram depicts the completed genomic sequence of Prochlorococcus marinus MED4 (11). Fragments from environmental sequencing were compared to this completed Prochlorococcus genome and are shown in the inner concentric circles and were given boxed outlines. Genes for the outermost circle have been assigned pseudospectrum colors based on the position of those genes along the chromosome, where genes nearer to the start of the genome are colored in red, and genes nearer to the end of the genome are colored in blue. Fragments from environmental sequencing were subjected to an analysis that identifies conserved gene order between those fragments and the completed Prochlorococcus MED4 genome. Genes on the environmental genome segments that exhibited conserved gene order are colored with the same color assignments as the Prochlorococcus MED4 chromosome. Colored regions on the environmental segments exhibiting color differences from the adjacent outermost concentric circle are the result of conserved gene order with other MED4 regions and probably represent chromosomal rearrangements. Genes that did not exhibit conserved gene order are colored in black.


http://www.sciencemag.org/content/304/5667/66

Page 71: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

identified and curated genes. With the vast majority of the Sargasso sequence in short (less than 10 kb), unassociated scaffolds and singletons from hundreds of different organisms, it is impractical to apply this approach. Instead, we developed an evidence-based gene finder (5). Briefly, evidence in the form of protein alignments to sequences in the bacterial portion of the nonredundant amino acid (nraa) data set (13) was used to determine the most likely coding frame. Likewise, approximate start and stop positions were determined from the bounding coordinates of the alignments and refined to identify specific start and stop codons. This approach identified 1,214,207 genes covering over 700 MB of the total data set. This represents approximately an order of magnitude more sequences than currently archived in the curated SwissProt database (14), which contains 137,885 sequence entries at the time of writing; roughly the same number of sequences as have been deposited into the uncurated REM-TrEMBL database (14) since its inception in 1996. After excluding all intervals covered by previously identified genes, additional hypothetical genes were identified on the basis of the presence of conserved open reading frames (5). A total of 69,901 novel genes belonging to 15,601 single link clusters were identified. The predicted genes were categorized

Fig. 5. Prochlorococcus-related scaffold 2223290 illustrates the assembly of a broad community of closely related organisms, distinctly nonpunctate in nature. The image represents (A) global structure of Scaffold 2223290 with respect to assembly and (B) a sample of the multiple sequence alignment. Blue segments, contigs; green segments, fragments; and yellow segments, stages of the assembly of fragments into the resulting contigs. The yellow bars indicate that fragments were initially assembled in several different pieces, which in places collapsed to form the final contig structure. The multiple sequence alignment for this region shows a homogenous blend of haplotypes, none with sufficient depth of coverage to provide a separate assembly.

Table 1. Gene count breakdown by TIGR role category. Gene set includes those found on assemblies from samples 1 to 4 and fragment reads from samples 5 to 7. A more detailed table, separating Weatherbird II samples from the Sorcerer II samples is presented in the SOM (table S4). Note that there are 28,023 genes which were classified in more than one role category.

TIGR role category | Total genes
Amino acid biosynthesis | 37,118
Biosynthesis of cofactors, prosthetic groups, and carriers | 25,905
Cell envelope | 27,883
Cellular processes | 17,260
Central intermediary metabolism | 13,639
DNA metabolism | 25,346
Energy metabolism | 69,718
Fatty acid and phospholipid metabolism | 18,558
Mobile and extrachromosomal element functions | 1,061
Protein fate | 28,768
Protein synthesis | 48,012
Purines, pyrimidines, nucleosides, and nucleotides | 19,912
Regulatory functions | 8,392
Signal transduction | 4,817
Transcription | 12,756
Transport and binding proteins | 49,185
Unknown function | 38,067
Miscellaneous | 1,864
Conserved hypothetical | 794,061
Total number of roles assigned | 1,242,230
Total number of genes | 1,214,207
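As a quick way to read Table 1, the short Python sketch below computes each selected category's share of all role assignments and checks how the two printed totals relate. The dictionary holds only a few rows copied from the table; the variable and function names are illustrative, and the reconciliation assumes each of the 28,023 multi-category genes was counted in exactly two categories, which the caption does not state.

```python
# Selected rows copied from Table 1 (counts of role assignments).
role_counts = {
    "Conserved hypothetical": 794_061,
    "Energy metabolism": 69_718,
    "Transport and binding proteins": 49_185,
    "Protein synthesis": 48_012,
}

TOTAL_ROLES = 1_242_230        # "Total number of roles assigned"
TOTAL_GENES = 1_214_207        # "Total number of genes"
MULTI_CATEGORY_GENES = 28_023  # genes classified in more than one category


def share_of_roles(category: str) -> float:
    """Percentage of all role assignments that fall in one category."""
    return 100.0 * role_counts[category] / TOTAL_ROLES


# If every multi-category gene is counted exactly twice (an assumption),
# subtracting them from the role total should give the gene total.
implied_genes = TOTAL_ROLES - MULTI_CATEGORY_GENES

for name in role_counts:
    print(f"{name}: {share_of_roles(name):.1f}% of role assignments")
```

Under that assumption the printed totals are mutually consistent, and conserved hypothetical genes account for nearly two thirds of all role assignments.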


Fig. 4). Other organisms were not so readily separated, presumably reflecting some combination of shorter assemblies with less “taxonomic signal,” less distinctive sequence, and greater divergence from previously sequenced genomes (9).

Discrete species versus a population continuum. The most deeply covered of the scaffolds (21 scaffolds with over 14× coverage and 9.35 Mb of sequence), contain just over 1 single nucleotide polymorphism (SNP) per 10,000 base pairs, strongly supporting the presence of discrete species within the sample. In the remaining main scaffolds (table S3), the SNP rate ranges from 0 to 26 per 1000 bp, with a length-weighted average of 3.6 per 1000 bp. We closely examined the multiple sequence alignments of the contigs with high SNP rates and were able to classify these into two fairly distinct classes: regions where several closely related haplotypes have been collapsed, increasing the depth of coverage accordingly (10), and regions that appear to be a relatively homogenous blend of discrepancies from the consensus without any apparent separation into haplotypes, such as the Prochlorococcus scaffold region (Fig. 5). Indeed, the Prochlorococcus scaffolds display considerable heterogeneity not only at the nucleotide sequence level (Fig. 5) but also at the genomic level, where multiple scaffolds align with the same region of the MED4 (11) genome but differ due to gene or genomic island insertion, deletion, rearrangement events. This observation is consistent with previous findings (12). For instance, scaffolds 2221918 and 2223700 share gene synteny with each other and MED4 but differ by the insertion of 15 genes of probable phage origin, likely representing an integrated bacteriophage. These genomic differences are displayed graphically in Fig. 2, where it is evident that up to four conflicting scaffolds can align with the same region of the MED4 genome. More than 85% of the Prochlorococcus MED4 genome can be aligned with Sargasso Sea scaffolds greater than 10 kb; however, there appear to be a couple of regions of MED4 that are not represented in the 10-kb scaffolds (Fig. 2).
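The "length-weighted average" SNP rate quoted above differs from a plain mean of per-scaffold rates because long scaffolds contribute more. A minimal Python sketch follows; the function names and the two example scaffolds are hypothetical and are not data from the paper.

```python
def snp_rate_per_kb(snp_count: int, length_bp: int) -> float:
    """SNPs per 1000 bp for a single scaffold."""
    return 1000.0 * snp_count / length_bp


def length_weighted_rate(scaffolds: list[tuple[int, int]]) -> float:
    """Length-weighted mean SNP rate per 1000 bp.

    Each scaffold is given as (length_bp, snp_count). Weighting each
    scaffold's rate by its length is the same as dividing the total SNP
    count by the total length.
    """
    total_length = sum(length for length, _ in scaffolds)
    total_snps = sum(snps for _, snps in scaffolds)
    return 1000.0 * total_snps / total_length


# Hypothetical example: a long scaffold with no SNPs and a short one at 26/kb.
example = [(10_000, 0), (2_000, 52)]
unweighted = sum(snp_rate_per_kb(s, l) for l, s in example) / len(example)
weighted = length_weighted_rate(example)
```

In this made-up example the unweighted mean is 13 per 1000 bp while the length-weighted value is about 4.3, which shows how a few short, highly polymorphic scaffolds can coexist with a low overall rate.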
The larger of these two regions (PMM1187 to PMM1277) consists primarily of a gene cluster coding for surface polysaccharide biosynthesis, which may represent a MED4-specific polysaccharide absent or highly diverged in our Sargasso Sea Prochlorococcus bacteria. The heterogeneity of the Prochlorococcus scaffolds suggests that the scaffolds are not derived from a single discrete strain, but instead probably represent a conglomerate assembled from a population of closely related Prochlorococcus biotypes.

The gene complement of the Sargasso. The heterogeneity of the Sargasso sequences complicates the identification of microbial genes. The typical approach for microbial annotation, model-based gene finding, relies entirely on training with a subset of manually

Fig. 3. Comparison of Sargasso Sea scaffolds to Crenarchaeal clone 4B7. Predicted proteins from 4B7 and the scaffolds showing significant homology to 4B7 by tBLASTx are arrayed in positional order along the x and y axes. Colored boxes represent BLASTp matches scoring at least 25% similarity and with an e value of better than 1e-5. Black vertical and horizontal lines delineate scaffold borders.

Fig. 4. Circular diagrams of nine complete megaplasmids. Genes encoded in the forward direction are shown in the outer concentric circle; reverse coding genes are shown in the inner concentric circle. The genes have been given role category assignment and colored accordingly: amino acid biosynthesis, violet; biosynthesis of cofactors, prosthetic groups, and carriers, light blue; cell envelope, light green; cellular processes, red; central intermediary metabolism, brown; DNA metabolism, gold; energy metabolism, light gray; fatty acid and phospholipid metabolism, magenta; protein fate and protein synthesis, pink; purines, pyrimidines, nucleosides, and nucleotides, orange; regulatory functions and signal transduction, olive; transcription, dark green; transport and binding proteins, blue-green; genes with no known homology to other proteins and genes with homology to genes with no known function, white; genes of unknown function, gray. Tick marks are placed on 10-kb intervals.


assembly to identify a set of large, deeply as-sembling nonrepetitive contigs. This was used toset the expected coverage in unique regions (to23!) for a final run of the assembler. This al-lowed the deep contigs to be treated as uniquesequence when they would otherwise be labeledas repetitive. We evaluated our final assemblyresults in a tiered fashion, looking at well-sampledgenomic regions separately from those barelysampled at our current level of sequencing.

The 1.66 million sequences from theWeatherbird II samples (table S1; samples 1 to4; stations 3, 11, and 13), were pooled andassembled to provide a single master assemblyfor comparative purposes. The assembly gener-ated 64,398 scaffolds ranging in size from 826bp to 2.1 Mbp, containing 256 Mbp of uniquesequence and spanning 400 Mbp. After assem-bly, there remained 217,015 paired-end reads,or “mini-scaffolds,” spanning 820.7 Mbp aswell as an additional 215,038 unassembled sin-gleton reads covering 169.9 Mbp (table S2,column 1). The Sorcerer II samples providedalmost no assembly, so we consider for thesesamples only the 153,458 mini-scaffolds, span-ning 518.4 Mbp, and the remaining 18,692singleton reads (table S2, column 2). In total,1.045 Gbp of nonredundant sequence was gen-erated. The lack of overlapping reads within theunassembled set indicates that lack of addition-al assembly was not due to algorithmic limita-tions but to the relatively limited depth of se-quencing coverage given the level of diversitywithin the sample.

The whole-genome shotgun (WGS) assembly has been deposited at DDBJ/EMBL/GenBank under the project accession AACY00000000, and all traces have been deposited in a corresponding TraceDB trace archive. The version described in this paper is the first version, AACY01000000. Unlike a conventional WGS entry, we have deposited not just contigs and scaffolds but the unassembled paired singletons and individual singletons in order to accurately reflect the diversity in the sample and allow searches across the entire sample within a single database.

Genomes and large assemblies. Our analysis first focused on the well-sampled genomes by characterizing scaffolds with at least 3× coverage depth. There were 333 scaffolds comprising 2226 contigs and spanning 30.9 Mbp that met this criterion (table S3), accounting for roughly 410,000 reads, or 25% of the pooled assembly data set. From this set of well-sampled material, we were able to cluster and classify assemblies by organism; from the rare species in our sample, we used sequence similarity based methods together with computational gene finding to obtain both qualitative and quantitative estimates of genomic and functional diversity within this particular marine environment.
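The "at least 3× coverage depth" filter described above can be sketched in a few lines. This is a minimal illustration, not the assembler's actual procedure: it assumes depth is simply total read bases divided by scaffold length, and the `Scaffold` class, field names and toy values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str        # hypothetical identifier
    length_bp: int   # scaffold length in base pairs
    read_bases: int  # total bases of all reads placed on the scaffold


def coverage_depth(scaffold: Scaffold) -> float:
    """Average depth: sequenced bases divided by scaffold length (assumed definition)."""
    return scaffold.read_bases / scaffold.length_bp


def well_sampled(scaffolds, min_depth=3.0):
    """Keep scaffolds at or above the depth threshold (3x in the text)."""
    return [s for s in scaffolds if coverage_depth(s) >= min_depth]


# Toy input, not data from the paper.
toy = [
    Scaffold("scaffold_A", length_bp=100_000, read_bases=600_000),  # 6x
    Scaffold("scaffold_B", length_bp=50_000, read_bases=60_000),    # 1.2x
]
kept = [s.name for s in well_sampled(toy)]
```

With these toy values only `scaffold_A` passes the filter. Real pipelines usually derive depth from read alignments rather than from a single per-scaffold total.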

We employed several criteria to sort the major assembly pieces into tentative organism “bins”; these include depth of coverage, oligonucleotide frequencies (7), and similarity to previously sequenced genomes (5). With these techniques, the majority of sequence assigned to the most abundant species (16.5 Mbp of the 30.9 Mbp in the main scaffolds) could be separated based on several corroborating indicators. In particular, we identified a distinct group of scaffolds representing an abundant population clearly related to Burkholderia (fig. S2) and two groups of scaffolds representing two distinct strains closely related to the published Shewanella oneidensis genome (8) (fig. S3). There is a group of scaffolds assembling at over 6× coverage that appears to represent the genome of a SAR86 (table S3). Scaffold sets representing a conglomerate of Prochlorococcus strains (Fig. 2), as well as an uncultured marine archaeon, were also identified (table S3; Fig. 3). Additionally, 10 putative mega plasmids were found in the main scaffold set, covered at depths ranging from 4× to 36× (indicated with shading in table S3 with nine depicted in

Fig. 1. MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003. The station locations are overlain with their respective identifications. Note the elevated levels of chlorophyll (green color shades) around station 3, which are not present around stations 11 and 13.

Fig. 2. Gene conservation among closely related Prochlorococcus. The outermost concentric circle of the diagram depicts the completed genomic sequence of Prochlorococcus marinus MED4 (11). Fragments from environmental sequencing were compared to this completed Prochlorococcus genome and are shown in the inner concentric circles and were given boxed outlines. Genes for the outermost circle have been assigned pseudospectrum colors based on the position of those genes along the chromosome, where genes nearer to the start of the genome are colored in red, and genes nearer to the end of the genome are colored in blue. Fragments from environmental sequencing were subjected to an analysis that identifies conserved gene order between those fragments and the completed Prochlorococcus MED4 genome. Genes on the environmental genome segments that exhibited conserved gene order are colored with the same color assignments as the Prochlorococcus MED4 chromosome. Colored regions on the environmental segments exhibiting color differences from the adjacent outermost concentric circle are the result of conserved gene order with other MED4 regions and probably represent chromosomal rearrangements. Genes that did not exhibit conserved gene order are colored in black.

http://www.sciencemag.org/content/304/5667/66

Page 72: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

rRNA phylotyping from metagenomics


http://www.sciencemag.org/content/304/5667/66

Page 73: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Shotgun Sequencing Allows Alternative Anchors (e.g., RecA)


http://www.sciencemag.org/content/304/5667/66

Page 74: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

using the curated TIGR role categories (5). A breakdown of predicted genes by category is given in Table 1.

The samples analyzed here represent only specific size fractions of the sampled environment, dictated by the pore size of the collection filters. By our selection of filter pore sizes, we deliberately focused this initial study on the identification and analysis of microbial organisms. However, we did examine the data for the presence of eukaryotic content as well. Although the bulk of known protists are 10 μm and larger, there are some known in the range of 1 to 1.5 μm in diameter [for example, Ostreococcus tauri (15) and the Bolidomonas species (16)], and such organisms could potentially work their way through a 0.80 μm prefilter. An initial screening for 18S ribosomal RNA (rRNA), a commonly used eukaryotic marker, identified 69 18S rRNA genes, with 63 of these on singletons and the remaining 6 on very small, low-coverage assemblies. These 18S rRNAs are similar to uncultured marine eukaryotes and are indicative of a eukaryotic presence but inconclusive on their own. Because bacterial DNA contains a much greater density of genes than eukaryotic DNA, the relative proportion of gene content can be used as another indicator to distinguish eukaryotic material in our sample. An inverse relation was observed between the pore size of the prefilters and collection filters and the fraction of sequence coding for genes (table S5). This relation, together with the presence of 18S rRNA genes in the samples, is strong evidence that eukaryotic material was indeed captured.

Diversity and species richness. Most phylogenetic surveys of uncultured organisms have been based on studies of rRNA genes using polymerase chain reaction (PCR) with primers for highly conserved positions in those genes. More than 60,000 small subunit rRNA sequences from a wide diversity of prokaryotic taxa have been reported (17). However, PCR-based studies are inherently biased, because not all rRNA genes amplify with the same “universal” primers. Within our shotgun sequence data and assemblies, we identified 1164 distinct small subunit rRNA genes or fragments of genes in the Weatherbird II assemblies and another 248 within the Sorcerer II reads (5). Using a 97% sequence similarity cutoff to distinguish unique phylotypes, we identified 148 previously unknown phylotypes in our sample when compared against the RDP II database (17). With a 99% similarity cutoff, this number increases to 643. Though sequence similarity is not necessarily an accurate predictor of functional conservation and sequence divergence does not universally correlate with the biological notion of “species,” defining species (also known as phylotypes) by sequence similarity within the rRNA genes is the accepted standard in studies of uncultured microbes. All sampled rRNAs were then assigned to taxonomic groups using an automated rRNA classification program (5). Our samples are dominated by rRNA genes from Proteobacteria (primarily members of the α, β, and γ subgroups) with moderate contributions from Firmicutes (low-GC Gram positive), Cyanobacteria, and species in the CFB phyla (Cytophaga, Flavobacterium, and Bacteroides) (fig. S4A; Fig. 6). The patterns we see are similar in broad outline to those observed by rRNA PCR studies from the Sargasso Sea (18), but with some quantitative differences that reflect either biases in PCR studies or differences in the species found in our sample versus those in other studies.
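To make the effect of the similarity cutoff concrete, here is a minimal sketch of grouping sequences into phylotypes at a chosen percent identity. It is not the authors' pipeline: it assumes the sequences are already aligned to equal length, uses a simple greedy first-match rule, and the function names and toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def greedy_phylotypes(aligned: dict, cutoff: float) -> dict:
    """Assign each sequence to the first representative it matches at >= cutoff.

    A sequence that matches no existing representative founds a new phylotype.
    """
    representatives = []  # list of (name, sequence) in order of discovery
    assignment = {}
    for name, seq in aligned.items():
        for rep_name, rep_seq in representatives:
            if percent_identity(seq, rep_seq) >= cutoff:
                assignment[name] = rep_name
                break
        else:
            representatives.append((name, seq))
            assignment[name] = name
    return assignment


# Toy aligned sequences (100 columns each), not real rRNA data.
toy = {
    "seq1": "A" * 100,
    "seq2": "A" * 98 + "CC",       # 98% identical to seq1
    "seq3": "A" * 90 + "C" * 10,   # 90% identical to seq1
}
n_at_97 = len(set(greedy_phylotypes(toy, 97.0).values()))
n_at_99 = len(set(greedy_phylotypes(toy, 99.0).values()))
```

On the toy input a 97% cutoff merges `seq1` and `seq2` into one phylotype, while a 99% cutoff keeps all three apart. This mirrors why the count of previously unknown phylotypes in the text rises when the cutoff is tightened.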

An additional disadvantage associated with relying on rRNA for estimates of species diversity and abundance is the varying number of copies of rRNA genes between taxa (more than an order of magnitude among prokaryotes) (19). Therefore, we constructed phylogenetic trees (fig. S4, B to E) using other represented phylogenetic markers found in our data set [RecA/RadA, heat shock protein 70 (HSP70), elongation factor Tu (EF-Tu), and elongation factor G (EF-G)]. Each marker gene interval in our data set (with a minimum length of 75 amino acids) was assigned to a putative taxonomic group using the phylogenetic analysis described for rRNA. For example, our data set contains over 600 recA homologs from throughout the bacterial phylogeny, including representatives of Proteobacteria, low- and high-GC Gram positives, Cyanobacteria, green sulfur and green nonsulfur bacteria, and other groups. Assignment to phylogenetic groups shows a broad consensus among the different phylogenetic markers. For most taxa, the rRNA-based proportion is the highest or lowest in comparison to the other markers. We believe this is due to the large amount of variation in copy number of rRNA genes between species. For example, the rRNA-based estimate of the proportion of γ-Proteobacteria is the highest, while the estimate for cyanobacteria is the lowest, which is consistent with the reports that members of the γ-Proteobacteria frequently have more than five rRNA operon copies, whereas cyanobacteria frequently have fewer than three (19).
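The copy-number bias can be illustrated with a small calculation. The paper addresses the bias by comparing several marker genes, not by the correction below; this sketch only shows why raw rRNA gene counts can overstate taxa with many operon copies. The counts and copy numbers are invented for illustration.

```python
def proportions(counts: dict) -> dict:
    """Convert raw counts to fractions of the total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def copy_number_corrected(rrna_gene_counts: dict, copies_per_genome: dict) -> dict:
    """Estimate genome-level proportions by dividing gene counts by copies per genome."""
    genomes = {
        taxon: rrna_gene_counts[taxon] / copies_per_genome[taxon]
        for taxon in rrna_gene_counts
    }
    return proportions(genomes)


# Hypothetical values: a taxon with 5 rRNA copies and one with 2 copies.
toy_counts = {"gamma-Proteobacteria": 50, "Cyanobacteria": 20}
toy_copies = {"gamma-Proteobacteria": 5, "Cyanobacteria": 2}

raw = proportions(toy_counts)
corrected = copy_number_corrected(toy_counts, toy_copies)
```

Here both taxa contribute ten genomes each, so the corrected proportions are equal even though the raw rRNA counts suggest a roughly 71% to 29% split.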

Just as phylogenetic classification is strengthened by a more comprehensive marker set, so too is the estimation of species richness. In this analysis, we define “genomic” species as a clustering of assemblies or unassembled reads more than 94% identical on the nucleotide level. This cutoff, adjusted for the protein-coding marker genes, is roughly comparable to the 97% cutoff traditionally used for rRNA. Thus

Fig. 6. Phylogenetic diversity of Sargasso Sea sequences using multiple phylogenetic markers. The relative contribution of organisms from different major phylogenetic groups (phylotypes) was measured using multiple phylogenetic markers that have been used previously in phylogenetic studies of prokaryotes: 16S rRNA, RecA, EF-Tu, EF-G, HSP70, and RNA polymerase B (RpoB). The relative proportion of different phylotypes for each sequence (weighted by the depth of coverage of the contigs from which those sequences came) is shown. The phylotype distribution was determined as follows: (i) Sequences in the Sargasso data set corresponding to each of these genes were identified using HMM and BLAST searches. (ii) Phylogenetic analysis was performed for each phylogenetic marker identified in the Sargasso data separately compared with all members of that gene family in all complete genome sequences (only complete genomes were used to control for the differential sampling of these markers in GenBank). (iii) The phylogenetic affinity of each sequence was assigned based on the classification of the nearest neighbor in the phylogenetic tree.
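The coverage weighting mentioned in the caption can be sketched as follows. This assumes each marker sequence already has a phylotype label (step iii) and a depth for its source contig. The function name and the toy hits are hypothetical, and the paper's exact weighting scheme may differ.

```python
from collections import defaultdict


def weighted_phylotype_proportions(hits):
    """Sum contig depth per phylotype and normalise to fractions.

    `hits` is an iterable of (phylotype, depth_of_coverage) pairs, one per
    marker sequence found in the data set.
    """
    totals = defaultdict(float)
    for phylotype, depth in hits:
        totals[phylotype] += depth
    grand_total = sum(totals.values())
    return {phylotype: depth / grand_total for phylotype, depth in totals.items()}


# Toy hits for a single marker gene, not values from the paper.
toy_hits = [
    ("Alphaproteobacteria", 6.0),  # marker on a deeply covered contig
    ("Cyanobacteria", 3.0),
    ("Alphaproteobacteria", 1.0),  # marker on a singleton read
]
result = weighted_phylotype_proportions(toy_hits)
```

Weighting by depth means a marker found on a 6× contig counts six times as much as one found on a singleton. This approximates relative organism abundance instead of the number of distinct sequences.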


http://www.sciencemag.org/content/304/5667/66

Page 75: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Functional Inference from Metagenomics

• Can work well for individual genes


Page 76: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Functional Diversity of Proteorhodopsins?


http://www.sciencemag.org/content/304/5667/66

Page 77: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

ARTICLE

NATURE COMMUNICATIONS | 2:183 | DOI: 10.1038/ncomms1188 | www.nature.com/naturecommunications

© 2011 Macmillan Publishers Limited. All rights reserved.

Received 3 Nov 2010 | Accepted 11 Jan 2011 | Published 8 Feb 2011 DOI: 10.1038/ncomms1188

Proteorhodopsins are light-driven proton pumps involved in widespread phototrophy. Discovered in marine proteobacteria just 10 years ago, proteorhodopsins are now known to have been spread by lateral gene transfer across diverse prokaryotes, but are curiously absent from eukaryotes. In this study, we show that proteorhodopsins have been acquired by horizontal gene transfer from bacteria at least twice independently in dinoflagellate protists. We find that in the marine predator Oxyrrhis marina, proteorhodopsin is indeed the most abundantly expressed nuclear gene and its product localizes to discrete cytoplasmic structures suggestive of the endomembrane system. To date, photosystems I and II have been the only known mechanism for transducing solar energy in eukaryotes; however, it now appears that some abundant zooplankton use this alternative pathway to harness light to power biological functions.

1 Canadian Institute for Advanced Research, Botany Department, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z4. †Present addresses: Canadian Institute for Advanced Research, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada B3H 1X5 (C.H.S); Institute of Medicine, Haukeland University Hospital, University of Bergen, Bergen N-5021, Norway (L.B.). Correspondence and requests for materials should be addressed to P.J.K. (email: [email protected]).

A bacterial proteorhodopsin proton pump in marine eukaryotes

Claudio H. Slamovits1,†, Noriko Okamoto1, Lena Burri1,†, Erick R. James1 & Patrick J. Keeling1

http://www.nature.com/ncomms/journal/v2/n2/abs/ncomms1188.html

Page 78: Microbial Phylogenomics (EVE161) Class 14: Metagenomics


[Figure 1 image: maximum likelihood tree of microbial (type I) rhodopsins. Labelled groups: Fungal (H+ / ?); Halorhodopsins (H+ / Cl−); Algal opsins (sensory); Proteorhodopsins (H+). Highlighted clades: Dinoflagellate 1, Dinoflagellate 2, Dinoflagellate 3.]

Figure 1 | Phylogenetic distribution of dinoflagellate rhodopsins. Protein sequences of 96 rhodopsins encompassing the known diversity of microbial (type I) rhodopsins from the three domains of life (ref. 10) were used to generate a maximum likelihood phylogenetic tree (see Supplementary Table S1 for accession numbers). Grey boxes distinguish the recognized groupings of type I rhodopsin, and the prevalent function in each group is shown: proton pumps (H+), sensory, chloride pumps (Cl−) and unknown (?). Numbers indicate bootstrap support when ≥50% (over 300 replicates). Black boxes highlight dinoflagellate genes: Dinoflagellate 1 is a large group of proteorhodopsins present in diverse dinoflagellates; Dinoflagellate 2 includes two K. micrum proteorhodopsin genes of independent origin; Dinoflagellate 3 includes two O. marina genes related to algal sensory rhodopsins, probably of endosymbiotic origin.

Page 79: Microbial Phylogenomics (EVE161) Class 14: Metagenomics


[Figure 2 image: (a) amino-acid alignment of SAR-86 31A08, Karlodinium micrum, Oxyrrhis 27, Oxyrrhis 5001, Alexandrium, Pyrocystis lunula, ‘Amphioxus’, Oxyrrhis 2197 and XR Salinibacter ruber, with helices A to G marked; (b) predicted secondary structure across the membrane (exterior/interior), with residues 101/97, 109/105, 112/108 and 237/231 labelled.]

Figure 2 | Structural and comparative analysis of a dinoflagellate proteorhodopsin. (a) Amino-acid alignment of various rhodopsins from bacteria and dinoflagellates: SAR86-31A08 is a functionally characterized proteorhodopsin (ref. 1); K. micrum is a ‘Dinoflagellate group 2’ in Figure 1; Oxyrrhis 2197 represents ‘Dinoflagellate 3’; XR S. ruber is a Xanthorhodopsin; the remaining sequences belong to ‘Dinoflagellate group 1’. Numbers on the right indicate residue number. Black rectangles show predicted transmembrane segments. Functional sites (coloured triangles): blue, proton acceptor and donor; green, spectral tuning (L109/105 for green and Q for blue); red, lysine linked to the cofactor retinal. Epitope for antibody preparation is indicated with an orange rectangle. (b) Secondary structure of the O. marina proteorhodopsin predicted using PHD (http://www.predictprotein.org/). Single-letter amino-acid codes are used, and numbers correspond to the positions in O. marina and 31A08 (ref. 1), respectively. Functional residues are highlighted as follows. Blue: proton acceptor (D101/97) and proton donor (E112/108); green: spectral tuning; red: retinal pocket; red filled: lysine linked to retinal; orange: epitope for antibody.
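The spectral tuning rule in the caption (L at the tuning position for green, Q for blue) lends itself to a small lookup. This is a sketch under stated assumptions: the caller supplies a zero-based alignment column for the tuning site, and the toy sequences are invented fragments, not the proteins in the figure.

```python
# Mapping taken from the caption: leucine -> green-absorbing, glutamine -> blue-absorbing.
TUNING_RESIDUE_TO_COLOUR = {"L": "green", "Q": "blue"}


def predicted_tuning(aligned_seq: str, tuning_column: int) -> str:
    """Return the predicted absorption colour from the residue at the tuning column.

    `tuning_column` is a zero-based index into an alignment (an assumption of
    this sketch); any residue other than L or Q is reported as "unknown".
    """
    residue = aligned_seq[tuning_column]
    return TUNING_RESIDUE_TO_COLOUR.get(residue, "unknown")


# Toy aligned fragments with the tuning site placed at column 4.
toy_alignment = {
    "surface_variant": "GYPGLIQD",
    "deep_variant": "GYPGQIQD",
    "gapped_variant": "GYPG-IQD",
}
calls = {name: predicted_tuning(seq, 4) for name, seq in toy_alignment.items()}
```

A prediction like this is only a hypothesis about function. The papers discussed here follow such sequence-based predictions with experiments.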

Page 80: Microbial Phylogenomics (EVE161) Class 14: Metagenomics


Rhodopsin function requires the prosthetic group retinal, suggesting that retinal biosynthesis should also be present in O. marina. Retinal is one end product of the carotenoid pathway and is formed by splitting β-carotene by a carotenoid dioxygenase. Searching the O. marina ESTs revealed the presence of transcripts encoding a protein with high similarity to carotenoid dioxygenase. We conducted a phylogenetic analysis with diverse carotenoid dioxygenase protein sequences, and the result suggests that O. marina acquired this step of retinal biosynthesis, either horizontally from bacteria or vertically from an ancestral plastid (Supplementary Fig. S1). Unfortunately, the enzyme is not sufficiently well conserved for phylogenetic reconstruction to be more precise.

Subcellular localization of rhodopsin in O. marina. To determine the subcellular location of the Group 1 proteorhodopsin in O. marina, an antibody was developed against the 14 C-terminal residues shared by several highly expressed copies of the gene (Fig. 2). We tested the antibody specificity by western blotting against total O. marina protein, in which it recognized a single, highly abundant band of the predicted size (Fig. 3a). In immunofluorescent subcellular localization, proteorhodopsin was distributed unevenly within the cytosol, indicating some form of cytosolic compartmentalization (Fig. 3b). In many but not all cells, a single distinctive ring-like structure was observed (Fig. 3b, Supplementary Movies 1–4). There was no evidence of localization to the plasma membrane, and co-staining with 4′,6-diamidino-2-phenylindole and MitoTracker showed no evidence of nuclear or mitochondrial localization of proteorhodopsin (Fig. 3b). The ancestor of O. marina possessed a plastid (ref. 18), and relict plastid-targeted protein-coding genes have been found in its nuclear genome (ref. 15); however, these and other dinoflagellate plastid-targeted proteins encode a distinctive N-terminal targeting peptide that includes a secretory signal peptide and a distinctive transit peptide (ref. 19). There is no evidence of such a peptide on any O. marina rhodopsin gene, arguing against a location in the hypothetical plastid. Overall, the pattern of proteorhodopsin localization is most consistent with compartmentalization in some fraction of the endomembrane system. We looked for sequences consistent with the characteristics of signal peptides, but the results were inconclusive: some algorithms fail to predict a signal peptide of any sort, whereas others predict a secretory pathway with varying degrees of confidence. If a typical signal peptide is absent, the protein would have to be inserted into the endomembrane from the cytoplasm rather than through the lumen of the endoplasmic reticulum.

Discussion

The abundance and diversity of rhodopsins in dinoflagellates is unexpected, given the relatively restricted distribution previously known in eukaryotes, suggesting that rhodopsin perhaps has a variety of sensory or regulatory roles in this lineage. However, the primary structure, localization pattern and expression levels of the Group 1 proteorhodopsin in O. marina all suggest that this protein is a proton pump, highlighting the question of its possible function. Regardless of what this function might be, it must be non-essential in the short term, as it is in prokaryotes, because O. marina can be grown in total darkness. Moreover, the membrane complexity of eukaryotes compounds the number of possibilities significantly, although a couple of more plausible explanations can be singled out. On one hand, the protein might operate very much like its prokaryotic counterparts: if proteorhodopsin is inserted into the same endomembrane compartment as is the vacuolar ATPase (V-ATPase), then its activity would generate a proton gradient that could drive the V-ATPase in reverse, generating energy from light. On the other hand, in a voracious predator such as O. marina, one of the expected functions of V-ATPase is to acidify digestive vacuoles at the expense

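[Figure 3 image: (a) western blot with size markers at 260, 60, 40, 30, 20 and 15 kDa; (b) micrograph panels labelled DIC, Rhodopsin, MitoTracker, DAPI and Merge.]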

Figure 3 | Cellular localization of proteorhodopsin in O. marina cells. (a) Western blot of total O. marina protein probed with an antibody raised against the C-terminal peptide of the proteorhodopsin OM27 from O. marina. Expected protein size is 28 kDa. (b) Localization of proteorhodopsin in O. marina cells using immunofluorescence assay with the same antibody. Antibody signal forms small irregular and ring-like structures independent of mitochondria in O. marina. Three independent cells are shown, each showing (left to right) differential interference contrast (DIC) light micrograph, anti-OM27 proteorhodopsin, MitoTracker staining, Hoechst 33258 staining for DNA and a merge of all four. White bar = 10 μm. See Supplementary Movies 1–4 for a 360° video rendering. DAPI, 4′,6-diamidino-2-phenylindole.

Page 81: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Primer

The Light-Driven Proton Pump Proteorhodopsin Enhances Bacterial Survival during Tough Times

Edward F. DeLong1,2*, Oded Beja3*

1 Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America, 2 Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America, 3 Faculty of Biology, Technion - Israel Institute of Technology, Technion City, Haifa, Israel

Some microorganisms contain proteins that can interact with light and convert it into energy for growth and survival, or into sensory information that guides cells towards or away from light. The simplest energy-harvesting photoproteins are the rhodopsins, which consist of a single, membrane-embedded protein covalently bound to the chromophore retinal (a light-sensitive pigment) [1]. One class of archaeal photoproteins (called bacteriorhodopsin) was shown to function as a light-driven proton pump, generating biochemical energy from light [2,3]. For many years, these light-driven proton pumps were thought to be found only in relatively obscure Archaea living in high salinity.

Ten years ago, a new type of microbial rhodopsin (proteorhodopsin) was discovered in marine planktonic bacterial assemblages [4]. This proteorhodopsin was co-localized on a large genome fragment containing the small subunit ribosomal RNA gene, which identified its genomic source: an uncultured gammaproteobacterium (of the SAR86 bacterial lineage). Further work showed that proteorhodopsin could be expressed in Escherichia coli, producing light-driven proton pumping, with photocycle kinetics similar to archaeal bacteriorhodopsins [4]. Soon, proteorhodopsins were detected in other populations of planktonic marine bacteria. These photoproteins were specifically “tuned” to absorb the wavelengths of light found in the surrounding environment: green light in surface waters and blue light at greater depths in the water column [5].

Work over the past decade shows that proteorhodopsins and related light-driven proton pumps are not an exception, but rather the rule for microbial inhabitants of sunny environments like the upper ocean (see Table 1 and Box 1). Based on genomic survey data, proteorhodopsins occur in an estimated 13% to 80% of marine bacteria and archaea in oceanic surface waters [7,8]. Given that microbial cell densities can approach one billion microbes per litre, the potential influence of proteorhodopsin-based light-driven energy flux in ocean ecosystems is significant but still difficult to quantify directly. Proteorhodopsins add substantially to the list of phototrophs (organisms that can use light as an energy source) known to inhabit ocean surface waters, such as oxygenic and anoxygenic chlorophyll-based phototrophs [6,7].
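A rough back-of-the-envelope calculation shows what these figures imply per litre of surface seawater. It takes the quoted density of one billion cells per litre as an upper bound and applies the 13% to 80% range directly. The variable and function names are illustrative only.

```python
def pr_cells_per_litre(total_cells_per_litre: float, pr_fraction: float) -> float:
    """Cells per litre expected to carry proteorhodopsin for a given fraction."""
    return total_cells_per_litre * pr_fraction


# "Can approach one billion microbes per litre" is treated here as an upper bound.
TOTAL_CELLS_PER_LITRE = 1e9

low_estimate = pr_cells_per_litre(TOTAL_CELLS_PER_LITRE, 0.13)   # about 1.3e8
high_estimate = pr_cells_per_litre(TOTAL_CELLS_PER_LITRE, 0.80)  # about 8e8
```

Even at the low end this is on the order of a hundred million proteorhodopsin-containing cells per litre, which is why the energy flux is described as potentially significant.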

The proteorhodopsin proteins retain their native structure and function in E. coli membranes, an ability not shared by their close homologues, bacteriorhodopsins. Therefore, proteorhodopsins expressed in E. coli are very useful for probing rhodopsin structure and function [7,18–21]. The relative ease of working with proteorhodopsins in E. coli helped to dissect their light-dependent proton pump activity using definitive biophysical assays [19,22–25] and their potential role in phototrophy [17,18] (see Figure 2 for an artist’s rendition of the fundamental arrangement of proteorhodopsin in the cell membrane). However, since all this work relied on the heterologous E. coli system, the specific physiological roles and adaptive strategies of native marine bacteria that contain proteorhodopsin still need to be better described.

Cultivation-independent genomic surveys (e.g., “metagenomics”) revealed proteorhodopsin presence and diversity, and heterologous expression in E. coli demonstrated many of its functional properties. Access to cultivable marine bacteria that contain proteorhodopsin, however, would be very useful to further characterize its native function in diverse physiological contexts.

Whole-genome sequencing then came to the rescue. The Gordon and Betty Moore Foundation (GBMF) Microbial Genome Sequencing Project (http://www.moore.org/microgenome) unexpectedly revealed that many culturable marine bacteria submitted for sequencing (including Pelagibacter spp., Vibrio spp., and Flavobacteria isolates) in fact possessed proteorhodopsin genes (Table 1). What can these proteorhodopsin-containing isolates tell us? Experiments with the proteorhodopsin-containing isolate ‘Cand. P. ubique’ (a member of the SAR11 bacterial lineage, the most abundant bacterial group in the ocean [19,20]) showed no significant light enhancement of growth rate or yield [11]. Later, however, Gomez-Consarnau and colleagues [13] showed light-dependent growth at low-carbon concentrations in marine flavobacteria. However, since these flavobacteria are difficult to manipulate genetically, a direct relationship between the proteorhodopsin protein and light-dependent growth could not be definitively shown.

Now, a new study by the same research group led by Jarone Pinhassi, published in this issue of PLoS Biology [16], definitively demonstrates one role for proteorhodopsin in a light-dependent adaptive strategy. In a series of experiments, they showed that marine Vibrio cells survived starvation much better in the light than in the dark. Since vibrios are amenable to genetic manipulation, Gomez-Consarnau et al. could construct a strain in which the proteorhodopsin gene was deleted. In the absence of proteorhodopsin, light-dependent starvation survival was abolished, and then restored when proteorhodopsin was supplied in trans.

Gomez-Consarnau et al. have conclusively demonstrated for the first time at least one specific physiological role for proteorhodopsin in a native marine bacterium [16]. Considering the abundance of proteorhodopsins (estimated to be present in 80% of marine bacteria in some waters [7]) and the possibility of lateral gene transfer [12], the potential influence of even this one adaptive

Citation: DeLong EF, Beja O (2010) The Light-Driven Proton Pump Proteorhodopsin Enhances Bacterial Survival during Tough Times. PLoS Biol 8(4): e1000359. doi:10.1371/journal.pbio.1000359

Published April 27, 2010

Copyright: © 2010 DeLong, Beja. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: No specific funding was received for this article.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected] (EFD); [email protected] (OB)

PLoS Biology | www.plosbiology.org 1 April 2010 | Volume 8 | Issue 4 | e1000359

http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000359#pbio-1000359-g002

Page 82: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Table 1. Marine bacterial isolates and genome fragments containing proteorhodopsins.

Organism Strain General Group Reference

Genomes

Methylophilales HTCC2181 Betaproteobacteria GBMF

Rhodobacterales sp. HTCC2255 Alphaproteobacteria GBMF

Vibrio angustum S14 Gammaproteobacteria GBMF

Photobacterium SKA34 Gammaproteobacteria GBMF

Vibrio harveyi ATCC BAA-1116 Gammaproteobacteria GenBank # CP000789

Marine gamma HTCC2143 Gammaproteobacteria GBMF

Marine gamma HTCC2207 Gammaproteobacteria GBMF

Cand. P. ubique HTCC1002 Alphaproteobacteria GBMF

Cand. P. ubique HTCC1062 Alphaproteobacteria [26]

Rhodospirillales BAL199 Alphaproteobacteria GBMF

Marinobacter ELB17 Gammaproteobacteria GBMF

Vibrio campbelli AND4 Gammaproteobacteria GBMF

Vibrio angustum S14 Gammaproteobacteria GBMF

Dokdonia donghaensis MED134 Flavobacteria GBMF

Polaribacter dokdonensis MED152 Flavobacteria GBMF

Psychroflexus ATCC700755 Flavobacteria GBMF

Polaribacter irgensii 23-P Flavobacteria GBMF

Flavobacteria bacterium BAL38 Flavobacteria GBMF

BACs and fosmids

HF10_05C07 Proteobacteria [24]

HF10_45G01 Proteobacteria [24]

HF130_81H07 Gammaproteobacteria [24]

EB0_39F01 Alphaproteobacteria [24]

EB0_39H12 Proteobacteria [24]

EB80_69G07 Alphaproteobacteria [24]

EB80_02D08 Gammaproteobacteria [24]

EB0_35D03 Proteobacteria [24]

EB0_49D07 Proteobacteria [24]

EBO_50A10 Gammaproteobacteria [24]

EB0_55B11f Alphaproteobacteria [24]

EBO_41B09 Betaproteobacteria [24]

HF10_19P19 Proteobacteria [17]

HF10_25F10 Proteobacteria [17]

HF10_49E08 Planctomycetes [24]

HF10_12C08 Alphaproteobacteria [24]

HF10_29C11 Euryarchaea [24]

MED13K09 unknown [10]

MED18B02 unknown [10]

MED35C06 unknown [10]

MED42A11 unknown [10]

MED46A06 unknown [10]

MED49C08 unknown [10]

MED66A03 unknown [10]

MED82F10 unknown [10]

MED86H08 unknown [10]

RED17H08 unknown [10]

RED22E04 unknown [10]

eBACHOT4E07 Gammaproteobacteria [25]

EBAC20E09 Gammaproteobacteria [25]


strategy on marine bacterioplankton could be substantial given the “feast or famine” existence experienced by many of these microbes. The Vibrio/proteorhodopsin model system is likely to reveal further secrets on the nature and function of proteorhodopsin photosystems in bacteria that are usually (but erroneously) considered as strict heterotrophs not capable of utilizing light at all.

While this study [16] adds an important new result, it certainly does not solve the whole puzzle of proteorhodopsin photophysiology. Considering the staggering variety of genetic, physiological and environmental contexts in which proteorhodopsin and related photoproteins are found, a great variety of light-dependent adaptive strategies are likely to occur in the natural microbial world. For example, in 2005, a new type of bacterial rhodopsin (xanthorhodopsin) was discovered in the salt-loving bacterium Salinibacter ruber [28]. Xanthorhodopsin is a proton-pumping retinal protein/carotenoid complex in which the carotenoid

Box 1. A Decade of Proteorhodopsin Milestones

2000: First proteorhodopsin gene found in uncultured SAR86 using metagenomics; proteorhodopsin light-driven proton pump activity confirmed in heterologous E. coli cells [4].

2001: Proteorhodopsin presence confirmed directly in the ocean using laser flash photolysis [5].

2003: Proteorhodopsin genes also found in other bacterial groups [8].

2004: Enormous diversity of proteorhodopsin genes found in the Sargasso Sea using metagenomics [9].

2005: Retinal biosynthesis pathways found in metagenomic data and confirmed using E. coli cells [10]. Proteorhodopsin genes are found in ‘Candidatus Pelagibacter ubique’ (SAR11), the most abundant bacterium on earth; environmental SAR11 proteorhodopsin presence confirmed using metaproteomics [11].

2006: Proteorhodopsin genes found in uncultured marine Archaea [12].

2007: First indication of proteorhodopsin light-dependent growth in cultured Flavobacteria [13] (see Figure 1 for colony morphologies and pigmentation).

2008: Proteorhodopsin genes found in non-marine environments [14,15].

2010: Proteorhodopsin phototrophy directly confirmed using a genetic system in marine Vibrio sp. [16]

Table 1. Cont.

Organism Strain General Group Reference

HOT2C01 unknown [8]

EBAC31A08 Gammaproteobacteria [4]

ANT32C12 unknown [8]

HF70_39H11_ArchHighGC unknown [12]

HF10_3D09_mediumGC unknown [12]

HF70_19B12_highGC unknown [12]

HF70_59C08 unknown [12]

Marine microbial isolates and large genome fragments from the environment GBMF, microbial genomes sequenced as part of the Gordon and Betty Moore Foundation microbial genome sequencing project (http://www.moore.org/microgenome), found to encode proteorhodopsin genes. The list includes whole genome sequences from a wide array of cultivated marine microorganisms (Genomes), as well as cloned large DNA fragments (BACs and fosmids) recovered directly from the environment. doi:10.1371/journal.pbio.1000359.t001

Figure 1. Various colony morphologies and coloration of different proteorhodopsin-containing bacteria used to study proteorhodopsin phototrophy. From top to bottom, the flavobacterium Polaribacter dokdonensis strain MED152 used to show proteorhodopsin light-stimulated growth [13]; the flavobacterium Dokdonia donghaensis strain MED134 used to show proteorhodopsin light-stimulated CO2-fixation [23]; and Vibrio strain AND4 used to show proteorhodopsin phototrophy [16]; note the lack of detectable pigments in Vibrio strain AND4. However, when these vibrio cells are pelleted, they do show a pale reddish color, which results from the presence of proteorhodopsin pigments in their membranes. Photos are courtesy of Jarone Pinhassi. doi:10.1371/journal.pbio.1000359.g001


Page 83: Microbial Phylogenomics (EVE161) Class 14: Metagenomics


Page 84: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

Pretty proteorhodopsin

Page 85: Microbial Phylogenomics (EVE161) Class 14: Metagenomics

salinixanthin functions as a light-harvesting antenna, transferring energy to the rhodopsin/retinal complex [21]. It was recently suggested that the ability of rhodopsins to bind salinixanthin depends on a single glycine amino acid [30], suggesting that other recently identified retinal proteins (e.g., proteorhodopsins) might also interact with carotenoid antennas, since some possess the identical homologous glycine residue.

What questions remain to be tackled for the second decade of research on proteorhodopsin photophysiology? Maintenance of energy charge during respiratory stress or starvation, the most likely physiological mechanism explaining the results of Gomez-Consarnau et al. [16], is just one example of a life history strategy benefitting from proteorhodopsin. As Martinez and colleagues pointed out [17], in different physiological, ecological, phylogenetic, and genomic contexts, proteorhodopsin activity may benefit microbes in a variety of ways. Besides producing ATP from the light-generated proton gradient, flagellar motility and active transport of solutes into or out of the cell can make use of the proton motive force generated by proteorhodopsins [17]. Heterotrophs adapted to either high or low nutrient concentrations are known to contain and express proteorhodopsins. Whether high versus low nutrient adapted bacteria exhibit lifestyle-specific patterns of proteorhodopsin photophysiology remains to be determined. Already it seems clear that two different high-nutrient-utilizing bacteria containing proteorhodopsin (vibrios and flavobacteria) exhibit fundamentally different light-dependent growth strategies [13,16]. Understanding the diversity of interactions among proteorhodopsin-containing organisms in natural communities represents yet another layer of complexity [22]. Finally, obtaining quantitative estimates of the total contribution of proteorhodopsin photosystems to the overall energy flux in microbial food webs is a worthy goal, but extremely challenging. For chlorophyll-based oxygenic photosynthesizers, fluorescence-based assays, carbon dioxide uptake experiments, and oxygen evolution measurements to constrain energy garnered from sunlight are readily available. In contrast, light-dependent activity assays are neither simple nor straightforwardly interpretable for proteorhodopsin-containing microorganisms. The dizzying array of phylogenetic and physiological contexts in which proteorhodopsins are found (Table 1) also confounds any simple, universal approaches for quantifying their energetic contributions in situ. Nevertheless, the future is bright for both basic understanding as well as technological applications of proteorhodopsins and the microbes that contain them. The increasing availability of cultivable and readily manipulated model systems, along with increasingly sophisticated in situ studies in the environment, promises to shed further light on the structure, function, and ecological significance of these ubiquitous and fascinating photoproteins.

Acknowledgments

We wish to thank John Spudich for commenting on the manuscript and Jay McCarren for help in preparing Table 1.

Figure 2. An artist’s rendition of the fundamental arrangement of proteorhodopsin in the cell membrane. Left panel: a cartoon (not to scale) of planktonic bacteria in the ocean water column. Right panel: a simple view of one potential proteorhodopsin energy circuit. (1) Proteorhodopsin: uses light energy to translocate protons across the cell membrane. (2) Extracellular protons: the excess extracellular protons create a proton motive force that can energetically drive flagellar motility, transport processes, or ATP synthesis in the cell. (3) Proton-translocating ATPase: a multi-protein membrane-bound complex that can utilize the proton motive force to synthesize (5) adenosine triphosphate (ATP, a central high-energy biochemical intermediate for the cell) from (4) adenosine diphosphate (ADP, a lower-energy biochemical intermediate). Illustration by Kirsten Carlson, © MBARI 2001. doi:10.1371/journal.pbio.1000359.g002
