Evolution of transcription factors from selfish elements: The tale of Rcs1, a global regulator of cell size in yeast
M. Madan Babu
MRC Laboratory of Molecular Biology, Cambridge
Overview of research
Evolution of biological systems
Evolution of transcriptional networks Evolution of networks within and across genomes
Nature Genetics (2004) J Mol Biol (2006a)
Evolution of transcription factors
Nuc. Acids. Res (2003)
Discovery of novel DNA binding proteins
Data integration, function prediction and classification
Nuc. Acids. Res (2005) Cell Cycle (2006)
Discovery of transcription factors in Plasmodium
Evolution of global regulatory hubs
Structure and dynamics of transcriptional networks
Structure and function of biological systems Uncovering a distributed architecture in networks
Methods to study network dynamics
J Mol Biol (2006b) J Mol Biol (2006c) Nature (2004)
A fundamental developmental process we are interested in understanding is the regulation of cell size
Rcs1: DNA binding domain not known
Reasons why we became interested in Rcs1
What is the DNA binding domain in Rcs1?
Transcriptional regulatory network in yeast
Number of target genes regulated (Aft2p and Rcs1p): 123, 41, 314
Sub-network of Rcs1 and Aft2
How did Rcs1 and its paralog Aft2, which are two global regulators, evolve?
Rcs1: regulator of cell size
Micrographs and data from SCMD
We find that the following parameters, which were used to define cell size, were at least 2 standard deviations (2σ) from the mean values of the wild type:
Roundness of mother cell: 1.29, 1.20
Mother cell size: 874, 760
Contour length of mother cell: 108, 100
Long axis length of mother cell: 36, 33
Short axis length of mother cell: 30, 27
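The 2 standard deviation criterion can be sketched in a few lines of Python. This is a minimal illustration, not the SCMD pipeline: the function name `flag_outlier_parameters`, the parameter names, and all measurements below are hypothetical, and the sample standard deviation is an assumed choice.

```python
from statistics import mean, stdev

def flag_outlier_parameters(wild_type, mutant, k=2.0):
    """Return the z-score of every parameter whose mutant value lies
    at least k wild-type standard deviations from the wild-type mean."""
    flagged = {}
    for name, samples in wild_type.items():
        mu = mean(samples)
        sd = stdev(samples)  # sample standard deviation (assumed choice)
        z = (mutant[name] - mu) / sd
        if abs(z) >= k:
            flagged[name] = round(z, 2)
    return flagged

# Hypothetical replicate measurements, NOT taken from SCMD.
wild_type = {
    "size": [100, 102, 98, 101, 99],
    "short_axis": [30, 31, 29, 30, 30],
}
mutant = {"size": 110, "short_axis": 30.5}

flagged = flag_outlier_parameters(wild_type, mutant)
print(flagged)  # only "size" exceeds the 2-sigma cutoff here
```

With these made-up numbers only `size` is flagged; `short_axis` stays within one standard deviation of the wild-type mean.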
S. cerevisiae - wild type; S. cerevisiae - Rcs1 mutant
Mutant cells are twice the size of the parental strain
The critical size for budding in the mutant is similarly increased
Rcs1 binds specific DNA sequences
Outline: Data integration to infer function & evolution
Sequence analysis to identify members and distant homologs
Structural analysis to infer function and distant homologs
Cladistic analysis to group proteins into families and infer relationship
Domain context analysis to infer function of individual members
Comparative genomics and phylogenetic analysis to infer evolution of the family
Expression data and network analysis to infer spatio-temporal behaviour
Relationship to WRKY DNA binding domain – Sequence analysis I
Rcs1 + non-redundant database:
Lineage-specific expansion in several fungi; also seen in lower eukaryotes
Candida albicans (ascomycete), Yarrowia lipolytica (ascomycete), Ustilago maydis (basidiomycete), Cryptococcus sp. (basidiomycete), E. cuniculi (microsporidia)
Giardia lamblia (diplomonad), Dictyostelium discoideum, Entamoeba histolytica
Profiles + HMM of this region + non-redundant database:
WRKY domain (Arabidopsis), FAR-1 type transposase (Medicago truncatula)
Globular region maps to WRKY DNA-binding domain
WRKY DNA-binding domain from Arabidopsis (WRKY4) + non-redundant database & PDB:
Rcs1 (S. cerevisiae), Gcm1 (mouse), PEB-1 (C. elegans)
WRKY maps to the same globular region, Gcm1 & FLYWCH
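The idea behind a profile search can be illustrated with a toy position-specific scoring matrix (PSSM). This is only a sketch of the general technique, not the search tools used in the talk: the functions `build_pssm` and `best_window`, the uniform background, the pseudocount, and the toy alignment and query are all assumptions made for illustration. The seed is built around the WRKYGQK motif that gives the WRKY domain its name.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Log-odds score per column and residue, against a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def best_window(pssm, sequence):
    """Slide the profile along the sequence; return (best score, start index)."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        best = max(best, (score, start))
    return best

# Toy seed alignment (hypothetical variants around the WRKYGQK motif).
alignment = ["WRKYGQK", "WRKYGKK", "WKKYGQK", "WRKYGEK"]
pssm = build_pssm(alignment)

# Hypothetical query sequence containing the motif at index 4.
score, start = best_window(pssm, "MSDDWRKYGQKAAS")
print(score, start)
```

Because the profile scores each column separately, a sequence that matches the conserved positions can still score well when variable positions differ, which is why profiles detect more distant homologs than a single query sequence.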
Confirmation of relationship to WRKY DBD – Sequence analysis II
Homologs of the conserved globular domain constitute a novel family of the WRKY DNA-binding domain
Multiple sequence alignment of all globular domains, with secondary structure predicted using JPRED/PHD
The order of secondary structure elements (strands S1, S2, S3, S4) is similar to that of the WRKY DNA-binding domain and of the GCM1 protein from mouse
Characterization of the globular domain – structural analysis I
Predicted SS of Rcs1 DBD (strands S1 to S4) compared with SS of WRKY4: A. thaliana transcription factor (WRKY4: 1wj2: NMR structure)
Predicted SS of Rcs1 DBD (strands S1 to S4) compared with SS of GCM1: Mus musculus Glial Cell Missing-1 (GCM-1: 1odh: X-ray structure)
Both WRKY and GCM1 have a similar network of stabilizing interactions
Template structure
Characterization of the globular domain – structural analysis II
In the four strands (S1 to S4) of the GCM1-WRKY domain, 4 residues involved in metal coordination and 10 residues involved in key stabilizing hydrophobic interactions that determine the path of the backbone show a strong pattern of conservation.
The core fold of the Rcs1 DBD is therefore expected to be similar to that of the WRKY-GCM1 domain, and it may bind DNA in a similar way
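One common way to quantify a pattern of conservation across an alignment is a per-column Shannon entropy score. The sketch below is an assumption about how such a score could be computed, not the method used in the talk; the function name `column_conservation` and the four-sequence toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_conservation(alignment):
    """Score each alignment column from 0 (maximally variable) to 1 (invariant),
    using 1 - H / log2(20), where H is the Shannon entropy of the column."""
    max_entropy = math.log2(20)
    scores = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        if not residues:          # all-gap column carries no information
            scores.append(0.0)
            continue
        n = len(residues)
        entropy = -sum((c / n) * math.log2(c / n)
                       for c in Counter(residues).values())
        scores.append(1.0 - entropy / max_entropy)
    return scores

# Hypothetical alignment: columns 0 and 2 mimic invariant metal ligands (C, H).
alignment = ["CAHL", "CGHV", "CSHI", "CTHL"]
scores = column_conservation(alignment)
conserved = [i for i, s in enumerate(scores) if s >= 0.999]
print(scores, conserved)
```

In this toy case the invariant C and H columns score 1.0, while the variable columns score lower, mirroring how metal-coordinating and core hydrophobic positions stand out in a real alignment.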
Classification of WRKY-GCM1 superfamily – Cladistic analysis I
Template structure: four strands (S1 S2 S3 S4) with a Zn2+ site formed by C, C, H, H
> 4500 proteins from over 450 genomes
Sequence features of the five versions:
Classical WRKY (C), e.g. WRKY4: WRKY motif in S1; short loop between S2 & S3
HxC containing version (HxC): HxC instead of HxH; N-terminal helix; short insert between S2 & S3
Insert containing version (I), e.g. Rcs1, Far1: N-terminal helix; conserved W in S4; large insert between S2 & S3
FLYWCH domain (F), e.g. Mdg: conserved W in S2
GCM domain (G), e.g. Gcm1: insertion of Zn ribbon between S2 and S3
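Grouping by shared sequence features can be made concrete with a small character matrix. The sketch below encodes only the features listed on this slide for the five versions and reports which pairs share a feature. It is a toy illustration, not a real cladistic analysis, and the function name `shared_characters` and the exact wording of the character labels are assumptions.

```python
from itertools import combinations

# Characters per version, as listed on the slide (labels paraphrased).
characters = {
    "C": {"WRKY motif in S1", "short loop between S2 and S3"},
    "HxC": {"HxC instead of HxH", "N-terminal helix",
            "short insert between S2 and S3"},
    "I": {"N-terminal helix", "conserved W in S4",
          "large insert between S2 and S3"},
    "F": {"conserved W in S2"},
    "G": {"Zn ribbon inserted between S2 and S3"},
}

def shared_characters(characters):
    """Return every pair of versions that shares at least one character."""
    shared = {}
    for a, b in combinations(sorted(characters), 2):
        common = characters[a] & characters[b]
        if common:
            shared[(a, b)] = common
    return shared

shared = shared_characters(characters)
print(shared)
```

On these listed features alone, only the HxC and insert containing versions share a character (the N-terminal helix). A real analysis would use many more characters and a tree-building method.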
Domain context for the different families – Domain network analysis I
Classical WRKY (C), e.g. WRKY4: tandem; standalone; with Zn cluster
Insert containing version (I), e.g. Rcs1, Far1: tandem; standalone; with MULE Tpase; with OTU protease; with SMBD; with Zn knuckle
HxC containing version (HxC), e.g. 101.t00020, At2g23500: with MULE Tpase; mobile element; standalone
FLYWCH domain (F), e.g. Mod (mdg): with BED finger; standalone; with POZ
GCM domain (G), e.g. Gcm1: standalone
WRKY is seen both in transcription factors and transposases
Phyletic distribution – Comparative genome analysis I
Figure: phyletic distribution of the G, C, HxC, I and F versions (annotated TF only, TF only, TF + TP) across human, fly, worm, fungi, plants, Entamoeba and slime mould, grouped as plants, lower eukaryotes, fungi and higher eukaryotes
GCM1 and FLYWCH versions evolved from an insert containing version that is a transposase
Classical version of the WRKY evolved from an insert containing version that is a transposase
HxC and insert containing versions are seen both as transcription factors and as transposases only in fungi, e.g. Rcs1
Domain context and phyletic analysis suggests that transcription factors could have evolved from transposases
Key: transcription factor; transposase
Comparative genomics using >30 different fungal genomes provides convincing evidence
Evolutionary relationship of the insert containing WRKY domains
TFs have evolved from TPs in multiple instances within fungi
Rcs1, Aft2: a recent duplication event within Saccharomycetales has resulted in two hubs
Independent duplication in Candida
MULE transposase + Insert-WRKY: MULE selfish elements in Yarrowia are seen as standalone ORFs & can regulate their own expression
Insert-WRKY: subsequently recruited as transcription factors by the host
Functional transition in evolution captured by genomic studies
Transposases have been recruited to become developmentally important
global regulatory proteins in all the three eukaryotic kingdoms of life
WRKY domain is seen in developmentally important proteins
Plants: classical type WRKY domains have expanded in plants and are expressed in a tissue-specific manner across all developmental stages (root, stem, leaf, apex, flower, floral organs, seeds)
Fungi: insert containing WRKY domains have been recruited to be regulators of cell size and morphology in yeast
Animals: GCM1 and FLYWCH type WRKY domains have been recruited in the differentiation of stem cells
Conclusion
Integration of different types of publicly available experimental data allowed us to identify that Rcs1 and several other developmentally important proteins in different lineages contain a WRKY-type DNA binding domain
Data types: sequence; structure; expression; interaction; cladistics & phylogenetics
Data integration allowed us to elucidate that developmentally important transcription factors in the different lineages have evolved from transposases
Acknowledgements
S Balaji
Lakshminarayan Iyer
National Center for Biotechnology Information, National Institutes of Health
L Aravind
Structural equivalences of WRKY-GCM1 domain proteins with Bed and Zn finger
Figure: topology diagrams (strands S1 to S4, helix H1, Zn2+ ligands C and H) for WRKY (1wj2), GCM-type WRKY (1odh), Bed-finger (2ct5) and classical Zn-finger (1m36)
Rcs1 (381 genes) regulates genes involved in:
Metal ion transport, specifically iron; siderophore transport; Cu ion homeostasis
Vacuolar protein catabolism
Intracellular transport; vesicle mediated transport; Golgi vesicle transport; membrane fusion; secretory pathway
Aft2 (171 genes) regulates genes involved in:
Metal ion transport, again specifically iron; iron homeostasis; Cu ion homeostasis
Vacuolar protein catabolism
Co-factor synthesis; vitamin B6 biosynthesis; pyridoxine metabolism; thiamin biosynthesis
Common targets (41 genes) include genes involved in:
Metal ion transport, again specifically iron; iron homeostasis; Cu ion homeostasis
Vacuolar protein catabolism
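Whether 41 common targets is more than chance would give can be checked with a hypergeometric test on the gene counts above (381 for Rcs1, 171 for Aft2). The sketch below is an illustration, not the analysis from the talk: the function name `overlap_pvalue` is hypothetical and the background of 6000 yeast genes is an assumed round number.

```python
from math import comb

def overlap_pvalue(population, set_a, set_b, observed):
    """P(overlap >= observed) when set_b genes are drawn at random from a
    population in which set_a genes are marked (hypergeometric upper tail)."""
    total = comb(population, set_b)
    upper = min(set_a, set_b)
    tail = sum(comb(set_a, k) * comb(population - set_a, set_b - k)
               for k in range(observed, upper + 1))
    return tail / total  # exact big-integer ratio converted to float

GENOME_SIZE = 6000  # ASSUMPTION: approximate number of yeast genes
rcs1_targets, aft2_targets, common = 381, 171, 41  # counts from the slide

expected = rcs1_targets * aft2_targets / GENOME_SIZE
p_value = overlap_pvalue(GENOME_SIZE, rcs1_targets, aft2_targets, common)
print(f"expected overlap: {expected:.1f}, observed: {common}, p = {p_value:.2e}")
```

Under this assumed background, roughly 11 shared targets would be expected by chance, so an overlap of 41 is a strong enrichment. The exact p-value depends on the background size chosen.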
Domain architectures of WRKY-GCM1 domain proteins
Domains shown: WRKY domain; GCM-type WRKY; FLYWCH-type WRKY; MudR transposase; plant-specific Zn-cluster; SWIM domain; POZ; zinc knuckle; BED finger; plant-specific N-all-beta; TIR domain; LRR; DUF1723; STAND ATPase
Ciliates
Apicomplexa
Giardia lamblia: GLP_79_64671_67418_Glam_71077115, GLP_9_36401_35940_Glam_71071693
Fungi: mutA_Ylip_49523824, AFT2_Scer_6325054
Encephalitozoon cuniculi: ECU05_0180_Ecun_19173554
Dictyostelium discoideum: dd_03024_Ddis_28829829
Entamoeba histolytica: 101.t00020_Ehis_67474280
Animals (Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens): C26E6.2_Cele_32565510, T24C4.2_Cele_17555262, mod(mdg4)_Dmel_24648712, LOC411361_Amel_66547010, CG13845_Dmel_24649011, gcm_Dmel_17137116, LOC374920_Hsap_27694337, C20orf164_Hsap_13929452, KIAA1552_Hsap_10047169, hGCMa_Hsap_1769820
Plants: NtEIG-D48_Ntab_10798760, TTR1_Atha_30694675, WRKY41_Osat_46394336, WRKY58_Atha_22330782, At2g34830_Atha_27754312, FAR1_Atha_18414374, AT4g19990_Atha_7268794, LOC_Os11g31760_Osat_77551147, At2g23500_Atha_3242713
Expression profiles of WRKY-GCM1 domain proteins in Arabidopsis
60 WRKY domain containing proteins + 15 Far1-type proteins + 40 HxC type WRKY domain proteins + 5 WRKY domain proteins with TIR/LRR
a. Gene expression profiles for the developmental stages in Arabidopsis thaliana (root, stem, leaf, apex, flower, floral organs, seeds): WRKY proteins show tissue-specific expression
b. Gene expression profiles for the light exposure conditions in Arabidopsis thaliana (darkness, continuous light, pulse light): WRKY proteins show light-specific expression
Transcriptional network involving Aft2p and Rcs1p
Number of target genes regulated (Aft2p and Rcs1p): 123, 41, 314
Relationship between Rcs1p and Aft2p homologs
UM03656.1 Umay 71019145
CAGL0H03487G CGLA 49526254
CAGL0G09042G CGLA 49526062
CaO19.2272 Calb 68482460
DEHA0F25124g Dhan 50425555
KLLA0D03256g Klac 50306475
AFL087C AGOS 44984319
ORFP Sklu Contig1830.2 kluyveri
Kwal 24045 waltii
ORFP Skud Contig2057.12 kudriavzeii
ORFP Scas Contig720.21 castelli
RCS1 SCER 51830313
ORFP 7853 mikatae
ORFP 8601 paradoxus
ORFP 21513 mikatae
ORFP Scas Contig690.14 castelli
ORFP 22109 paradoxus
AFT2 SCER 6325054
ORFP Skud Contig1659.3 kudriavzeii
Multiple independent evolution of TFs from Transposons
Groups: Animals; Plants; Entamoeba; Fungi (Rcs1/Aft2p cluster; Rbf1 cluster)
AAL026Wp Agos 44980144; UM03656.1 Umay 71019145; CHGG 06963 CGLO 88178242; CHGG 06785 CGLO 88182698; CHGG 09478 CGLO 88177996; CHGG 00175 CGLO 88184472; CHGG 10902 CGLO 88175616; FG05699.1 Gzea 46122643; NCU06551.1 Ncra 85106835; NCU05145.1 Ncra 85081010; YALI0F07128g Ylip 50555399; MG05295.4 Mgri 39939890; FG04147.1 Gzea 46116610; NCU07855.1 Ncra 85109845; MG06795.4 Mgri 39977821; NCU08168.1 Ncra 85093270; CHGG 09951 CGLO 88176079; CHGG 08318 CGLO 88179597; NCU04492.1 Ncra 32406464; FG09606.1 Gzea 46136181; NCU06975.1 Ncra 85108658; CHGG 05063 CGLO 88180976; HOP78 FOXY 30421204; CHGG 00311 CGLO 88184608; CIMG 00825 CIMM 90305840; AN6124.2 Anid 67539908; ISOCHOR AFUM 71001046; CNC00740 CNEO 57225606; CNBH2400 Cneo 50256416; AN0859.2 ANID 67517161; YALI0A16269g Ylip 50545173; CaO19 12424 Calb 68467239; DEHA0E17127g Dhan 50422877; RBF1P CALB 2498834
DEHA0A05258g Dhan 50405817; CaO19.2272 Calb 68482460; DEHA0F25124g Dhan 50425555; CAGL0H03487G CGLA 49526254; AFL087C AGOS 44984319; KLLA0D03256g Klac 50306475; CAGL0G09042G CGLA 49526062; RCS1 SCER 51830313; AFT2 SCER 6325054; YALI0A05313g Ylip 50543230; YALI0A02266g Ylip 50543034; Mutyl Ylip 50545163; YALI0C17193g.c Ylip 50548927; Mutyl.c Ylip 50545161; YALI0C00781g.d Ylip 50547661; YALI0C00781g.a Ylip 50547661; YALI0C00781g.b Ylip 50547661; YALI0C00781g.c Ylip 50547661; YALI0C17193g.a Ylip 50548927; Mutyl.a Ylip 50545161; YALI0D22506g Ylip 50551361; Mutyl.b Ylip 50545161; YALI0C17193g.b Ylip 50548927; MG07557.4 Mgri 39972511; MG09992.4 Mgri 39965911; 101.T00020 EHIS 67474280; 4.T00052 EHIS 67483840; FAR1 ATHA 18414374; AT2G27110 ATHA 18401324; AT2G43280 ATHA 30689328; AT4G38180 ATHA 15233732; AT3G59470 ATHA 18411179; AT5G28530 ATHA 22327146; AT1G52520 ATHA 15219020; AT1G80010 ATHA 15220043; C20ORF164 HSAP 13929452; LOC428161 GGAL 50759053; T24C4.2 CELE 17555262; SJCHGC04823 SJAP 56758936; 6330408A02RIK MMUS 50053999; LOC374920 HSAP 27694337
Fig 3
a. Basic Leucine Zipper family: CIN5, YAP5, GCN4, YAP6, YAP7, YAP1, CAD1, MET4, CST6, SKO1, ARR1, YAP3, MET28, HAC1, ACA1 (panel label: 227)
b. Homeodomain family: STE12, YOX1, TOS8, YHP1, PHO2, CUP9, HMLALPHA2 (HML2), HMRa1, HML1, HMRa2 (panel label: 357)
c. Apses family: XBP1, SWI4, SOK2, PHD1, MBP1 (panel label: 471)
Fig 1
Genomes compared (Fungi): S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus, S. castellii, S. kluyveri, K. waltii, A. gossypii, C. albicans, N. crassa, M. grisea, A. nidulans, S. pombe (Ascomycota); U. maydis, C. neoformans (Basidiomycota)
DNA-binding domain families: C6-Fungal, C2H2-Zn, bZip, Homeo, Gata, bHLH, Fkh, Hsf, Apses, Myb, Mads, HMG1, LisH+CTLH, Gcr1p+Msn1p, Rcs1, Ace1, AT-Hook, Tig, Abf1, Tea, Ime1, Dal82, Tigger, P53-Cytochrome
Axis: number of members in the family (non-hub : hub), 0 to 40
Scale 0 to 100: fraction of the 14 fungal genomes in which a non-hub transcription factor is evolutionarily conserved (i.e. an ortholog exists)
Scale 0 to 100: fraction of the 14 fungal genomes in which a regulatory hub is evolutionarily conserved (i.e. an ortholog exists)
* Fungal specific DNA-binding domain
DNA-binding domain family which evolved from a transposon
Each box represents a TF member with a specific DBD family, arranged according to evolutionary conservation. A red box represents a regulatory hub (a TF regulating > 150 genes), and a blue box represents a non-hub regulator. The intensity of color represents the fraction of the 14 fungal genomes in which the protein has an ortholog.
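The two quantities in this legend, hub status (a TF regulating more than 150 genes) and the fraction of genomes with an ortholog, are simple to compute from a regulatory edge list and a presence/absence profile. The sketch below is a minimal illustration: the function names, the TF names `TF_A` and `TF_B`, the edges, and the ortholog profile are all hypothetical.

```python
from collections import defaultdict

HUB_THRESHOLD = 150  # from the legend: a hub regulates > 150 genes

def classify_hubs(edges, threshold=HUB_THRESHOLD):
    """Map each TF to True if it regulates more than `threshold` distinct genes."""
    targets = defaultdict(set)
    for tf, gene in edges:
        targets[tf].add(gene)
    return {tf: len(genes) > threshold for tf, genes in targets.items()}

def conservation_fraction(ortholog_presence):
    """Fraction of genomes in which an ortholog was found."""
    return sum(ortholog_presence) / len(ortholog_presence)

# Hypothetical network: TF_A regulates 200 genes, TF_B regulates 20.
edges = [("TF_A", f"gene{i}") for i in range(200)]
edges += [("TF_B", f"gene{i}") for i in range(20)]
hubs = classify_hubs(edges)

# Hypothetical profile: ortholog found in 7 of 14 genomes.
fraction = conservation_fraction([True] * 7 + [False] * 7)
print(hubs, fraction)
```

In the figure, hub status would set the color of a box (red or blue) and the conservation fraction its intensity.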
Possible evolutionary trajectories of transcriptional regulators
Four scenarios (panels a to d), each following a gene duplication:
Common ancestor was not a regulatory hub. One of the extant proteins is a regulatory hub
Common ancestor was a regulatory hub. One of the extant proteins is a regulatory hub
Common ancestor was a regulatory hub. Both extant proteins are regulatory hubs
Common ancestor was not a regulatory hub. Extant proteins are not regulatory hubs
For each scenario, two outcomes are shown:
Extant proteins share fewer target genes than expected by chance (XXXY, ZZZW)
Extant proteins share more target genes than expected by chance (e.g. Sok2 and Phd1; XXXY, ZZZW)