Distribution of the SELMA translocon in secondary plastids of red algal origin and predicted uncoupling of ubiquitin-dependent translocation from degradation

Simone Stork*1, Daniel Moog*1, Jude M. Przyborski2, Ilka Wilhelmi1’, Stefan Zauner1 and Uwe G. Maier#1,3

1Laboratory for Cell Biology, Philipps-University Marburg, Karl-von-Frisch Str. 8, D-35032 Marburg, Germany.

2Laboratory for Parasitology, Philipps-University Marburg, Karl-von-Frisch Str. 8, D-35032 Marburg, Germany

3LOEWE-Zentrum für Synthetische Mikrobiologie (SynMikro), Hans-Meerwein-Straße, D-35032 Marburg, Germany

*These authors contributed equally.

’Present address: Institute for Molecular Tumor Biology and Cancer Gene Therapy, Philipps-University Marburg, Emil-Mannkopff-Str. 2, D-35032 Marburg, Germany.

#E-mail: [email protected]

Running title: SELMA - uncoupling translocation from degradation

Copyright © 2012, American Society for Microbiology. All Rights Reserved. Eukaryotic Cell doi:10.1128/EC.00183-12. EC Accepts, published online ahead of print on 5 October 2012.



Abstract

Protein import into complex plastids of red algal origin is a multistep process including translocons of different evolutionary origin. The symbiont-derived ERAD-like machinery (SELMA), shown to be of red algal origin, is proposed to be the transport system for preprotein import across the periplastidal membrane of heterokontophytes, haptophytes, cryptophytes and apicomplexans. In contrast to the canonical endoplasmic reticulum-associated degradation (ERAD) system, SELMA translocation is suggested to be uncoupled from proteasomal degradation. We investigated the distribution of known and newly identified SELMA components in organisms with complex plastids of red algal origin by intensive data mining, thereby defining a set of core components present in all examined organisms. These include putative pore-forming components, a ubiquitylation machinery, as well as a Cdc48 complex. Furthermore, the set of known 20S proteasomal components in the periplastidal compartment (PPC) of diatoms was expanded. These newly identified putative SELMA components as well as proteasomal subunits were in vivo-localized as PPC proteins in the diatom Phaeodactylum tricornutum. The presented data allow us to speculate about the specific features of SELMA translocation in contrast to the canonical ERAD system, especially the uncoupling of translocation from degradation.

Introduction

Organelles like plastids, including those of secondary origin, almost completely rely on protein import from the host cytosol (46, 65). The structure of complex plastids, which are surrounded by three or four membranes, required, in contrast to primary plastids, the evolution of several additional protein transport mechanisms. Complex plastids arose through secondary endosymbiosis, a process which describes the engulfment of a formerly free-living eukaryotic alga by a eukaryotic host cell (32-33). During evolution, the symbiont was subsequently reduced in terms of compartmentalization and genome size to an organelle strictly dependent on the host cell (16, 32). Different types of secondary plastids exist in a very broad range of algae and protists, which can be distinguished based on their evolutionary origin (e.g. a red or green alga-derived symbiont), as well as on the amount of cellular reduction inside the host cell. Our understanding of the evolution of organisms harboring a secondary plastid of red algal origin has changed in the last few years. According to the chromalveolate hypothesis, six major lineages were grouped together to be of monophyletic origin: cryptophytes, haptophytes, heterokontophytes, peridinin-containing dinoflagellates, apicomplexans and the non-plastid containing ciliates, as well as several smaller lineages related to some chromalveolate members (15, 41). However, recent phylogenetic analyses have given rise to extended theories about the evolution of the lineages with a red algal endosymbiont, including serial endosymbiotic events with secondary, as well as tertiary, endosymbioses (21-22, 26-27, 56, 61, 71, 75).

It has been shown that the lineages with an endosymbiont of red algal origin share common plastid protein import mechanisms despite remarkable differences resulting from specific features in plastid ultrastructure (10, 37, 65). Import into complex plastids starts co-translationally at the endoplasmic reticulum (ER) membrane, where nascent precursor proteins are synthesized into the ER lumen. This transport step requires a canonical N-terminal signal peptide (SP). In heterokontophytes, cryptophytes and haptophytes, the outermost plastid membrane, termed chloroplast ER (cER) membrane, is continuous with the endomembrane system of the host cell; therefore, the Sec61-mediated import already represents transport across the first membrane of complex plastids. In contrast, the plastids of apicomplexans and peridinin-containing dinoflagellates are not connected to the endomembrane system. Thus, after import into the ER lumen, proteins are likely to be transported to the plastid via vesicle transport mechanisms directly from the ER or via the Golgi apparatus (47, 57).

After the preprotein has entered the cER lumen, the SP is thought to be cleaved off and a transit peptide-like sequence (TPL) is exposed at the new N-terminus. The TPL is required for further transport into the periplastidal compartment (PPC), which resembles the naturally reduced cytoplasm of the endosymbiont, and further into the stroma of the plastid. Such transit peptide-like sequences thereby fulfill an additional function in contrast to transit peptides (TP) in primary plastids. Detailed characterization of the TPL revealed a difference between stroma- and PPC-localized proteins. Stromal proteins possess an aromatic (Phe, Tyr, Trp) or bulky (Leu) amino acid at the +1 position of their TPL, in contrast to PPC proteins (4, 31, 34, 42). However, the observed AXA-FAP motif at the transition between SP and TPL of stromal proteins is not as well conserved in haptophytes, apicomplexans and dinoflagellates as is the case for heterokontophytes and cryptophytes (58). Furthermore, some membrane proteins of apicomplexan plastids (apicoplasts) seem to carry intrinsic targeting signals instead of a bipartite targeting signal (BTS) consisting of SP and TPL (1, 20).

For transport across the second outermost membrane, the periplastidal membrane (PPM), a model was proposed in which the translocon consists of a recycled ER-associated degradation (ERAD) machinery of symbiont origin (67). Support for this model came from the detection of symbiont-specific ERAD components encoded on the nucleomorph of the cryptophyte Guillardia theta, this being the remnant nucleus of the former red algal endosymbiont in the cryptophyte PPC (23). The canonical ERAD removes aberrant or misfolded luminal (ERAD-L) and membrane (ERAD-M and ERAD-C) ER proteins and tags them after retro-translocation in the cytosol with poly-ubiquitin moieties for subsequent proteasomal degradation (7, 40, 66). However, in the symbiont-specific ERAD-like pathway (SELMA) the retro-translocation machinery of ERAD-L is postulated to be maintained, and possesses the capacity to transport proteins from an ER luminal compartment into a cytoplasmic compartment, the PPC. This process is supposed to be uncoupled from degradation. SELMA is conserved in all secondarily evolved organisms with a red algal endosymbiont for which genomic data are available (26, 67-68). Proteins of the derlin family are still controversially discussed elements of the ERAD-specific translocon. In the diatom Phaeodactylum tricornutum, two symbiont-localized derlins (PtsDer1-1/PtsDer1-2) are expressed which form hetero- as well as homo-oligomers and show interaction with transit peptide-like sequences of PPC-localized proteins (38). These components are indeed involved in the transport of proteins into the plastid, as indicated by a conditional knock-down mutant of the Toxoplasma gondii sDer1 protein which showed impairment in plastid protein import (2). The translocation process is predicted to be dependent on ubiquitylation, further supported by the presence of a set of ubiquitylation enzymes (39, 67). Additional factors proposed to be involved in SELMA are a symbiont-specific Cdc48 AAA-ATPase with its co-factor Ufd1 and adaptor proteins (55, 67). Following translocation, the precursor proteins are likely to undergo de-ubiquitylation and are either passed on to the translocon in the third outermost membrane or folded in the PPC (13, 39, 55). Although a residual set of 20S proteasomal components was identified in the PPC of diatoms, there is currently no link between SELMA and proteasomal degradation (55).

Once proteins have passed through the PPC, transport across the innermost plastid membranes seems to be comparable to that in primary plastids, with a translocon at the inner membrane of chloroplasts (TIC) and a recently identified Omp85 protein which belongs to the family of Toc75 proteins, the core components of the translocon at the outer membrane of chloroplasts (TOC) (1, 10, 13, 73).

Here, we present an update on the SELMA translocation model in organisms with a red algal endosymbiont with focus on five heterokontophytes and apicomplexan parasites. In particular, we mined the genomes of organisms that carry secondary plastids, including recently published full genome sequences, for SELMA proteins. This collected data set should make it possible to define the degree of factor conservation and to identify the main components of the SELMA system which evolved to function in protein transport at a plastid membrane. Our results are compared to the respective host ERAD system as well as to red algal ERAD components, from which SELMA originated. Additionally, four new PPC-localized proteins similar to factors involved in ERAD could be identified in the diatom P. tricornutum. We also extended the set of core proteasomal components in the PPC of heterokontophytes and discuss their putative function in relation to SELMA.

Materials and Methods

Bioinformatic Analysis

Protein sequences of ERAD and SELMA as well as proteasomal components were collected from published data or retrieved via blastp and tblastn searches. As queries, sequences from the Saccharomyces cerevisiae ERAD system and the P. tricornutum SELMA system were used to search the genomic databases for Phaeodactylum tricornutum v2.0 (12), Thalassiosira pseudonana (5), Fragilariopsis cylindrus (http://genome.jgi-psf.org/Fracy1/Fracy1.home.html), Aureococcus anophagefferens (30), Emiliania huxleyi CCMP1516 main genome assembly v1.0 (http://genome.jgi-psf.org/Emihu1/Emihu1.home.html) and Guillardia theta CCMP2712 v1.0 (http://genome.jgi-psf.org/Guith1/Guith1.home.html). Sequences from Ectocarpus siliculosus (19) and Babesia bovis were searched at the National Center for Biotechnology Information (NCBI) server (http://www.ncbi.nlm.nih.gov/guide/). Apicomplexan sequences were retrieved from the Plasmodium Genomics Resource Version 9.0 (6) for Plasmodium, the Toxoplasma Genomics Resource v7.2 (29) for T. gondii and Neospora caninum, TparvaDB Version 1.0 (74) and the NCBI server (http://www.ncbi.nlm.nih.gov/guide/) for Theileria parva and the Cryptosporidium Genomics Resource v4.6 (36) for Cryptosporidium parvum. ERAD sequences for red algae were either retrieved from the genome projects of Cyanidioschyzon merolae (53) and Galdieria sulphuraria (Michigan State University Galdieria Database [http://genomics.msu.edu/galdieria/about.html]), or by local Blast (blast-2.2.10-ia32-win32) using expressed sequence tags (EST) of Porphyridium cruentum and partial genome data from Calliarthron tuberculosum (http://dbdata.rutgers.edu/data/plantae/) generated by Chan and colleagues (17).

In general, an e-value of 1e-04 was set as the threshold for the identification of ERAD/SELMA components on the protein level. However, in cases of weak query sequence significance, matches with less significant e-values were also inspected. Additionally, criteria like domain structure and composition similarity (NCBI Conserved Domain search) were applied for identification of relevant proteins (51). For proteasomal components, all S. cerevisiae 20S protein sequences were used as queries to collect a data set of putative proteasomal components, which were then classified according to the NCBI Conserved Domain Database (51), whose classification differs from the S. cerevisiae nomenclature (detailed information on different classifications can be found in (60)).
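The e-value screening described in this paragraph can be illustrated with a short sketch. It is not the pipeline used in this study: it assumes that BLAST hits were exported in the standard 12-column tabular format, and the hit names and values in `EXAMPLE_HITS` are hypothetical. Hits at or below the 1e-04 threshold are accepted, and weaker hits are set aside for manual inspection, for example of their domain structure.

```python
import csv
import io

E_VALUE_THRESHOLD = 1e-4  # threshold stated in the text

# Hypothetical hits in BLAST tabular column order:
# qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore
_ROWS = [
    ("ScCdc48", "candidate_A", "62.1", "750", "280", "4", "20", "769", "35", "780", "0.0", "905"),
    ("ScDer1", "candidate_B", "28.4", "190", "130", "3", "5", "190", "40", "225", "3e-09", "55.1"),
    ("ScNpl4", "candidate_C", "24.0", "150", "110", "2", "200", "349", "180", "327", "0.002", "38.5"),
]
EXAMPLE_HITS = "\n".join("\t".join(row) for row in _ROWS)


def split_hits(tabular_text, threshold=E_VALUE_THRESHOLD):
    """Split hits into accepted ones and ones left for manual inspection."""
    accepted, to_inspect = [], []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = (row[0], row[1], float(row[10]))  # query, subject, e-value
        if hit[2] <= threshold:
            accepted.append(hit)
        else:
            to_inspect.append(hit)
    return accepted, to_inspect


accepted, to_inspect = split_hits(EXAMPLE_HITS)
for query, subject, evalue in accepted:
    print(f"accepted: {subject} (query {query}, e-value {evalue:g})")
for query, subject, evalue in to_inspect:
    print(f"inspect manually: {subject} (query {query}, e-value {evalue:g})")
```

Keeping the weaker hits in a separate list, rather than discarding them, mirrors the statement that such matches were inspected and not rejected outright.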

All gene models of the identified proteins were aligned to genomic and EST sequences, if available. Thereby, missing N- and C-termini were identified by searching for putative start and stop codons in frame, respectively. If possible, intron borders of the gene models were checked to be in agreement with EST data. The protein sequences were additionally examined for N-terminal targeting sequences to discriminate symbiont proteins from host factors. PPC-directed proteins are characterized by the presence of a SP and a TPL. The SignalP 3.0 Server (24) was used for the prediction of a SP with a cutoff of >0.5 by the HMM algorithm. Then, the sequences were analyzed with the TargetP 1.1 Server (25) with default settings to define the SP as a secretory signal sequence and exclude mitochondrial targeting. In general, the TPL of PPC (symbiont) proteins cannot be predicted accurately with available tools. For this reason, besides performing the prediction with the TargetP 1.1 Server (25) using signal peptide-truncated sequences in “plant” mode, the criteria defined in (55) were applied. In some cases, a protein model was identified with high similarity to a known symbiont protein of the diatom P. tricornutum or the apicomplexan parasite P. falciparum but without SP prediction. This can be caused by an incorrect gene model prediction due to the lack of EST data or the presence of several putative start codons. Therefore, these proteins were assigned as symbiont but marked to lack a signal peptide prediction.
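The decision logic of this paragraph can be summarized in a minimal sketch. It only combines values that would have been read from the prediction servers beforehand and does not call SignalP or TargetP. The record type, the field names and the example proteins are hypothetical, the single-letter location codes ("S" secretory, "M" mitochondrial, "C" chloroplast, "_" other) are an assumption about how the TargetP result is recorded, and the TPL criteria of (55) are not modelled.

```python
from dataclasses import dataclass

SIGNALP_HMM_CUTOFF = 0.5  # cutoff stated in the text


@dataclass
class TargetingPrediction:
    name: str                       # hypothetical protein model identifier
    signalp_hmm_probability: float  # SP probability read from the SignalP output
    targetp_location: str           # assumed code: "S", "M", "C" or "_"
    resembles_known_symbiont_protein: bool = False


def classify(prediction):
    """Assign a protein model to host or symbiont based on its targeting signal."""
    has_signal_peptide = (
        prediction.signalp_hmm_probability > SIGNALP_HMM_CUTOFF
        and prediction.targetp_location == "S"
    )
    if has_signal_peptide:
        # The TPL itself still has to be judged separately.
        return "symbiont candidate"
    if prediction.resembles_known_symbiont_protein:
        # Possibly a truncated or wrongly predicted gene model.
        return "symbiont, lacking SP prediction"
    return "host"


examples = [
    TargetingPrediction("model_1", 0.97, "S"),
    TargetingPrediction("model_2", 0.12, "_", resembles_known_symbiont_protein=True),
    TargetingPrediction("model_3", 0.08, "_"),
    TargetingPrediction("model_4", 0.61, "M"),
]
for example in examples:
    print(example.name, "->", classify(example))
```

The second branch corresponds to the proteins that were assigned as symbiont but flagged as lacking a signal peptide prediction.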

Analyses of transmembrane spanning regions were performed with TOPCONS (9); domain and coiled-coil prediction was done using SMART (45). Protein sequence alignments were performed with GENEDOC Software (version 2.6.002 [http://www.psc.edu/biomed/genedoc]).

Plasmid Construction and Transfection of P. tricornutum

The predicted PPC proteins were cloned and transfected into the diatom P. tricornutum. The sequences of genes containing introns or without EST support were amplified from cDNA, the rest from gDNA, and cloned in front of egfp into P. tricornutum transfection vectors. ptsubx, ptspng1 and ptsubq were cloned into the nitrate-inducible pPha-NR vector (GenBank: JN180663), ptsnpl4, ptsβ1, ptsα3, pthβ7 and pthrpn10 into the light-inducible pPha-T1 vector (GenBank: AF219942). For further information about sequences of in vivo-localized proteins as well as primer sequences see supplemental file 1. Biolistic transfection into P. tricornutum cells was performed as described previously (67, 77). Positive transformants were cultured under standard conditions as described before (3) with 1.5 mM NH4+ in permanent cultivation. Protein expression under control of the nitrate reductase promoter (pPha-NR vector) was induced by cultivation on 0.9 mM NO3- for two days.

Fluorescence Microscopy

P. tricornutum transformants were fixed with 4 % paraformaldehyde/0.0075 % glutaraldehyde in 1x PBS buffer and analyzed with a Leica TCS SP2 confocal laser scanning microscope using a HCX PL APO 40×/1.25 − 0.75 Oil CS objective. Fluorescence of eGFP and chlorophyll was excited with an Argon laser at 488 nm and detected with two photomultiplier tubes at a bandwidth of 500–520 nm and 625–720 nm for eGFP and chlorophyll fluorescence, respectively.

Results

1. Identification of ERAD and SELMA components in red algae and organisms with a red algal endosymbiont

In order to identify new ERAD and SELMA components, all available genomic sequences of red algae and organisms with a red algal endosymbiont were screened via BLAST search with queries from the best studied ERAD system, that of Saccharomyces cerevisiae (40, 66). The recently published genomes of heterokontophytes (the diatom Fragilariopsis cylindrus, the brown alga Ectocarpus siliculosus, the harmful alga Aureococcus anophagefferens), the nuclear genome of the cryptophyte Guillardia theta and the genome of the apicomplexan Neospora caninum were included in these analyses. Because SELMA was shown to be phylogenetically derived from the ERAD system of the red algal endosymbiont (26), we also included sequences from the red algae Cyanidioschyzon merolae, Porphyridium cruentum, Calliarthron tuberculosum and Galdieria sulphuraria in our analyses (see Materials and Methods for a detailed description of the genome data used). In contrast to the other chromalveolate groups, dinoflagellate plastids have only three surrounding membranes and very little is known about the mechanisms that transport proteins across these membranes (10, 65). Due to the paucity of genomic data for these organisms, we have not included peridinin-containing dinoflagellates in this study.

We identified genes for conserved ERAD components in all investigated red algal genomes (Table 1). However, due to incomplete data for Porphyridium cruentum (EST data) and especially Calliarthron tuberculosum (partial genome data), only a subset of ERAD factors could be identified. The collected data set for red algae implies that the progenitor from which the SELMA machinery originated was capable of ERAD-L via the Hrd1 complex as well as ERAD-C via the Doa10 complex in the ER membrane. Additionally, all proteins required for ubiquitylation and efficient proteasomal substrate delivery after ERAD retro-translocation are present in red algae.

In organisms with a red algal endosymbiont, the SELMA system exists in parallel with the host ERAD machinery. The discrimination between proteins of both systems is based on the targeting signal of the PPC-localized SELMA proteins in contrast to the mostly cytosolic ERAD components (see Materials and Methods). Identification of a SELMA protein is more reliable if a respective host protein with the same putative function can be found. Therefore, a detailed analysis of the host ERAD system of the investigated organisms was included and almost all ERAD proteins known from S. cerevisiae could be identified in the genomes (Table 1; for detailed information see supplemental file 2, Table S1). All organisms encode the ER membrane proteins Sec61α, Hrd1, the derlin proteins and, with the exception of apicomplexans, also Doa10. In addition, a cytosolic ubiquitylation machinery, the Cdc48 complex with its co-factors (Npl4 and Ufd1) and all proteasomal substrate delivery factors (Rad23, Dsk2, Png1) were identified.

Our inspection of the SELMA system in secondarily evolved algae and apicomplexan parasites showed a high degree of conserved components for this putative protein translocation machinery (Table 1). However, failures in the identification of certain proteins can have different reasons. If a protein is present in most of the organisms of one group but lacking in one specific organism, this is likely caused by incomplete genome sequencing and assembly, or by incorrect protein model prediction (e.g. for Aureococcus anophagefferens). In contrast, a protein not identified in a whole group of organisms may have been lost completely during evolution. Haptophytes and cryptophytes are each represented by only one organism, hindering a final conclusion about the presence or absence of specific proteins but allowing considerations of whole protein complexes. The analysis of the newly available genome sequence of the cryptophyte G. theta shows that the partially nucleomorph-encoded SELMA system is supplemented with nucleus-encoded factors (Table 1).

Interestingly, of the three ER membrane protein classes, Sec61α, derlin proteins and the ubiquitin ligase Hrd1, which are discussed as putative ERAD channel proteins, only the derlin proteins are found in the complex plastids of these organisms as membrane proteins with several transmembrane domains, with the exception of a nucleomorph-encoded Hrd1 in cryptophytes (see below). With respect to derlins, two symbiont representatives are present in heterokontophytes, haptophytes and cryptophytes, as is the case for yeast (ScDer1p and ScDfm1p). Ubiquitylation requires a cascade of three enzymes starting with a ubiquitin-activating enzyme (Uba1) which is present in all organisms. At least one symbiont ubiquitin-conjugating enzyme (sUbc) can also be found, but not all putative PPC-targeted sUbc proteins can be assigned to the same S. cerevisiae Ubc protein. While heterokontophytes share a sUbc similar to ScUbc6p and at least one other sUbc protein, apicomplexans seem to encode only one sUbc protein with the highest similarity to ScUbc4p. The ubiquitin ligase sHrd1 of heterokontophytes differs in protein structure from the symbiont ubiquitin ligase of cryptophytes. Several transmembrane domains are predicted for the GtsHrd1 protein. Therefore, it resembles the yeast ScHrd1p structure more closely than does the heterokontophyte E3 ligase, which contains only one predicted transmembrane domain.

The symbiont Cdc48 complex together with sUfd1 can be found in all organisms investigated and we were now able to identify a sNpl4 protein in the diatom P. tricornutum which is conserved among heterokontophytes. The same is the case for three other newly identified putative symbiont proteins, sUBX, sUbq and sPng1. These share similarity to ERAD factors and are present in addition to the host version in the diatom P. tricornutum and other heterokontophytes. None of these proteins is conserved in apicomplexans (Table 1).

2. Newly identified putative SELMA components: the sCdc48 co-factor sNpl4, a UBX domain-containing protein, a symbiont ubiquilin-like protein and a peptide N-glycanase

Among the newly identified putative SELMA components are two Cdc48 binding proteins, the UBX domain-containing protein sUBX (symbiont UBX) and the Ufd1 co-factor sNpl4, as well as proteins with sequence similarity to the de-glycosylation enzyme ScPng1p (sPng1) and the poly-ubiquitin binding protein ScDsk2p (sUbq). Not all four proteins are predicted to have a TPL in the diatom P. tricornutum. In such a case, a signal peptide on usually cytosolic proteins is indicative of a PPC localization, but this remained to be verified in localization experiments. Therefore, PtsNpl4, PtsUBX, PtsUbq and PtsPng1 were expressed as eGFP fusions in P. tricornutum and their localization was examined in vivo. All constructs showed the typical fluorescence pattern of PPC-localized proteins in the middle of the two plastid lobes (Fig. 1A).

A comparison of the protein sequences to well-known ERAD components of S. cerevisiae and their domain composition points to their putative function (Fig. 1B). The identified symbiont UBX domain-containing protein, PtsUBX, shares sequence similarity to other proteins only in its UBX domain, which has been shown to be a general Cdc48 binding module (63). Preceding the UBX domain, the protein harbors a coiled-coil region for homo- or heterotypic protein interaction (52). The second identified Cdc48 binding protein, PtsNpl4, now completes the sCdc48 complex, together with the previously described co-factor sUfd1 (67). In comparison to its yeast ERAD counterpart, the protein lacks the N-terminal Npl4 zinc finger domain. The symbiont ubiquilin-like protein PtsUbq shares the N-terminal UBQ domain known from mammalian ubiquilins and ScDsk2p (28, 69) but lacks the C-terminal UBA domain for poly-ubiquitin binding as well as the internal STI1 domains. PtsPng1 as a symbiont de-glycosylation enzyme was initially annotated as the host protein (67) despite the presence of a weakly predicted signal peptide, which can be explained by a missing second copy of this protein at that time. In the current study an additional Png1 protein was identified in its place, lacking an N-terminal targeting sequence and leading to reevaluation of the former prediction. PtsPng1 has a transglutaminase/protease-like domain with the conserved catalytic residues of cytoplasmic PNGase (data not shown) (70).

3. The symbiont ubiquitin in diatoms lacks the conserved lysine residues Lys48 and Lys63

The SELMA model proposes ubiquitylation of the precursor proteins during transport and subsequent removal of the ubiquitin moiety via a PPC-specific de-ubiquitinating enzyme (38, 67). The PPC-targeted ubiquitin of P. tricornutum (PtsUbi) was shown to lack the specific lysine residue (Lys48) that is the most prominent linker for poly-ubiquitylation involved in degradation (67). With the identification of symbiont ubiquitins from T. pseudonana and F. cylindrus, this feature becomes even more apparent, as Lys48 is also absent in these diatom sequences (Fig. 2). The TpsUbi protein model (ID: 1539) can hardly be recognized and has to be modified according to the available EST sequences in order to obtain the full sequence (see supplemental file 1). In the genome of the cryptophyte Guillardia theta, a di-ubiquitin protein sequence with a signal peptide prediction could be identified, with the less conserved first ubiquitin domain sharing the diatom lysine mutation. In contrast, sUbi sequences from the haptophyte E. huxleyi and the apicomplexans B. bovis and T. parva still contain lysine 48. The previously identified symbiont ubiquitin of P. falciparum shows only weak conservation with the symbiont ubiquitins from diatoms (68). Unfortunately, it was not possible to detect a symbiont ubiquitin in the other newly investigated organisms.

Interestingly, the position Lys63, usually used for poly-ubiquitin linkages related to modifications of protein function, is also absent from all symbiont ubiquitins except the second domain of GtsUbi.
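The check for the two linkage lysines can be sketched as follows. This is an illustration only and not the alignment procedure behind Fig. 2: the reference is the canonical 76-residue ubiquitin sequence, which is not taken from this study, the query is a hypothetical variant generated in the code, and both sequences are assumed to be supplied as a pairwise alignment with "-" marking gaps.

```python
# Canonical 76-residue ubiquitin, used here only as the numbering reference.
REFERENCE_UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)
LINKAGE_POSITIONS = (48, 63)


def residues_at(aligned_reference, aligned_query, positions=LINKAGE_POSITIONS):
    """Return the query residue aligned to each 1-based reference position."""
    found = {}
    reference_position = 0
    for ref_residue, query_residue in zip(aligned_reference, aligned_query):
        if ref_residue == "-":
            continue  # insertion in the query, no reference position consumed
        reference_position += 1
        if reference_position in positions:
            found[reference_position] = query_residue
    return found


def lacks_lysine(aligned_reference, aligned_query, position):
    """True if the query does not carry Lys at the given reference position."""
    return residues_at(aligned_reference, aligned_query, (position,)).get(position) != "K"


# Hypothetical variant with both lysines replaced by arginine.
hypothetical_variant = (
    REFERENCE_UBIQUITIN[:47] + "R" + REFERENCE_UBIQUITIN[48:62] + "R" + REFERENCE_UBIQUITIN[63:]
)

print(residues_at(REFERENCE_UBIQUITIN, REFERENCE_UBIQUITIN))
print(residues_at(REFERENCE_UBIQUITIN, hypothetical_variant))
```

Counting positions along the reference rather than the query keeps the numbering stable when a symbiont sequence carries insertions or an N-terminal extension.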

4. Identification and localization of proteasomal components in the PPC of heterokontophytes and cryptophytes

We previously reported on the presence of relict 20S proteasomal components in the P. tricornutum PPC (55). Here, we expand the model of a symbiont core proteasome by extensive in silico analyses.

The cryptophyte Guillardia theta, which is still able to synthesize proteins in the PPC, encodes an almost complete set of PPC-localized proteasomal degradation components on its nucleomorph genome (23). This is in contrast to other organisms with a red algal endosymbiont; amongst them, residual 20S subunits could be identified only in heterokontophytes. Importantly, we did not detect a symbiont 19S regulatory particle in the PPC of heterokontophytes although it is present in cryptophytes. We could not identify a complete set of 20S subunits, including 7 alpha and 7 beta subunits, for any of the organisms studied (Table 2). Instead, the putative 20S core particle seems to vary in subunit composition, with the exception of conserved sα2, sα3 and sβ6. The putative catalytically active subunits β6 and β7 could be identified in almost all heterokontophytes in a PPC-directed version; a second β5 gene is only detectable as a gene fusion with a 5'-3' exonuclease, also lacking a signal peptide. In A. anophagefferens and E. siliculosus, in addition to a symbiont sβ5 with signal peptide prediction, several putative symbiont subunits exist, but an exact definition of the gene models is difficult. Therefore, a classification into host or symbiont protein cannot yet be conclusively determined.

In addition to the already reported symbiont 20S proteasomal subunits Ptsβ2, Ptsβ6, Ptsβ7, Ptsα7-1 and Ptsα7-2 (55), we successfully localized two additional subunits in the PPC of P. tricornutum. Both Ptsα3-1 and Ptsβ1 showed the typical PPC fluorescence pattern (see Fig. 3). As a comparison, two subunits of the host proteasome were also localized, Pthβ7 and PthRpn10, which resulted in a different fluorescence pattern outside of the plastid.
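As a simple bookkeeping aid, the subunit classes covered by the localized P. tricornutum proteins can be compared with a full 20S set of 7 alpha and 7 beta subunits. The sketch below uses only the subunits named in this paragraph and does not include further predicted but non-localized subunits from Table 2; the plain-text subunit labels are a naming assumption made for the example.

```python
# A complete 20S core particle: 7 alpha and 7 beta subunit classes.
FULL_20S = {f"alpha{i}" for i in range(1, 8)} | {f"beta{i}" for i in range(1, 8)}

# Subunit class of each PPC-localized P. tricornutum protein named in the text.
LOCALIZED_PPC_SUBUNITS = {
    "Ptsβ2": "beta2",
    "Ptsβ6": "beta6",
    "Ptsβ7": "beta7",
    "Ptsα7-1": "alpha7",
    "Ptsα7-2": "alpha7",
    "Ptsα3-1": "alpha3",
    "Ptsβ1": "beta1",
}


def missing_subunits(found_classes):
    """Return the 20S subunit classes not represented in found_classes."""
    return sorted(FULL_20S - set(found_classes))


covered = set(LOCALIZED_PPC_SUBUNITS.values())
print("covered classes:", sorted(covered))
print("not covered:", missing_subunits(covered))
```

Two proteins map to the same class (alpha7), so the seven localized proteins cover six of the fourteen subunit classes in this tally.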


Discussion

For protein transport across the periplastidal membrane of complex plastids of red algal origin, an ERAD-derived mechanism (SELMA) was proposed as the protein translocation machinery (67). The SELMA model, originally based on our findings in cryptophytes, was shown to be conserved in organisms with a red algal endosymbiont (2, 26, 67-68) and SELMA components can be identified in all available genomes in addition to the host ERAD machinery (Table 1). However, the exact mechanism of this transport step, the pore-forming proteins and the minimal required components remain an open question. The proposed SELMA components often show a minimized structure, as domains known from ERAD proteins of other organisms are missing. Thus, the SELMA complex should indicate a minimized version of the retro-translocation activity of ERAD in general. Different extents of ERAD to SELMA reduction can be found in the investigated organisms according to the amount of reduction of the former endosymbiont. The cryptophyte G. theta has the most extensive set of SELMA and proteasomal components, resulting most probably from its transcriptionally and translationally active nucleomorph in the PPC. Apicomplexans instead have the smallest set of identified SELMA components and seem to lack a symbiont proteasome. So far, all investigated organisms share the following as SELMA components in the PPC: derlins as membrane and putative channel proteins, a ubiquitylation machinery and a Cdc48 complex. It remains to be determined if the recently identified, conserved PPC protein PPP1 provides a new crucial function for protein transport (64). All other identified proteins with functions related to SELMA or the proteasome in heterokontophytes might represent lineage-specific adaptations (Fig. 4).

Of all ER membrane proteins that are candidates for a translocation channel in ERAD (66), only the derlin proteins could be identified as SELMA components. Apart from these, the Sec61 channel and the ubiquitin ligase Hrd1 are under discussion as potentially capable of fulfilling this function. On the one hand, we could not identify additional symbiont Sec61 subunits; on the other hand, the diatom symbiont E3 ubiquitin ligase is predicted to have only one transmembrane domain, and is therefore most likely not capable of homotypic channel formation. However, the derlin proteins and the ubiquitin ligase might form a membrane complex which connects translocation to ubiquitylation. The presence of the ubiquitylation enzymes sUba, sUbc and ubiquitin itself in almost all investigated organisms including apicomplexans suggests the presence of a ubiquitin-dependent mechanism in the PPC. Once the preprotein is ubiquitylated in the PPC, it can be recognized by the Cdc48 complex (76). We could identify at least one symbiont-specific sCdc48 protein in all organisms investigated. The Cdc48-ATPase has been shown to be a central component of ERAD, acting specifically in concert with its co-factors Ufd1 and Npl4 (54, 76). Although Cdc48 is known to have various cellular functions, the identification and localization of a symbiont Npl4 protein of P. tricornutum presented here now define the sCdc48-sUfd1-sNpl4 complex as a SELMA component. However, other functions unrelated to protein transport together with so far unidentified co-factors cannot be excluded. The new PPC-localized protein PtsUBX, most likely an sCdc48-binding protein due to its UBX domain, might also be involved in SELMA translocation, akin to the case for UBX proteins in ERAD. These proteins can direct the Cdc48-ATPase to a specific protein complex, in the context of ERAD to ubiquitin ligases at the ER membrane (50, 62-63).

After translocation is completed, ERAD substrates are recognized by a set of cytosolic proteins and processed for degradation by the proteasome (59, 76). The presence of a relict symbiont proteasome in heterokontophytes (Table 2) raises the question of a functional link between the ERAD-derived SELMA machinery and degradation in the PPC of these organisms. Several features of both machineries in the PPC argue against such a connection. On the one hand, the PPC of apicomplexan parasites and haptophytes lacks symbiont 20S subunits and therefore harbors a SELMA system which seems to be completely independent of a proteasomal function. On the other hand, proteasomal substrates are not only delivered by the ERAD system but can also be degraded independently of ubiquitylation (8). The 20S core particle was shown to have basal proteolytic activity towards unstructured or oxidized proteins (60). It is also implicated in maturation and specific cleavage of various proteins, which gain access to the proteolytic chamber through an interaction with N-termini of the α-subunits (8, 49). In the PPC of heterokontophytes, we identified proteasomal subunits of the 20S core particle including proteolytically active subunits. It was not possible to detect a complete set of 20S subunits in any of the heterokontophyte species. Either the remaining α- and β-subunits in the genomes are too divergent to be recognized or the putative reduced 20S particle in the PPC can vary in subunit composition, replacing some subunits by other ones.

In addition, the recognition and unfolding of ubiquitylated proteasomal substrates are mediated by the 19S regulatory particle of the proteasome, which was not identified in a PPC-targeted version in heterokontophytes. However, the two newly identified PPC proteins sUbq and sPng1 are counterparts to ScDsk2p and ScPng1p, which are known from ERAD to function between retro-translocation and degradation (43). The symbiont ubiquilin-like protein PtsUbq lacks the C-terminal ubiquitin-associated (UBA) domain for poly-ubiquitin binding present in ScDsk2p and mammalian ubiquilins for recognition of proteasomal substrates (28, 48). In addition, the ubiquitin-like domain (UBQ) at the N-terminus of ScDsk2p was shown to bind proteasomal components of the 19S regulatory particle as well as ScUfd2p (35, 69), both not present in a symbiont version in the PPC. Most likely, both proteins had to adapt to new functions. PtsPng1, the PPC-localized peptide N-glycanase, might either be involved in the maturation of PPC-localized proteins or may be required for efficient removal of glycan moieties of plastid precursor proteins added in the ER lumen before transport across the third outermost plastid membrane. In contrast to heterokontophytes, haptophytes and cryptophytes, apicomplexans encode neither a host nor a symbiont Png1 protein. This is likely due to a reduction of N-glycosylation capacities in these organisms, especially for apicoplast proteins (14).

Another important feature of SELMA is the symbiont ubiquitin, which shows alterations at specific lysine residues. In heterokontophytes, the PPC-localized ubiquitin (sUbi) does not possess the conserved lysine residues Lys48 and Lys63, while haptophyte and apicomplexan (except P. falciparum) sUbi sequences still contain Lys48 but show mutations at Lys63. Lys48 was shown to represent the most prominent position for poly-ubiquitylation leading to proteasomal degradation (72). Although recent work suggests a more complex interplay between different ubiquitin linkages on various lysine residues also in degradation (44), loss of Lys48 in the symbiont ubiquitins of heterokontophytes might be an evolutionary adaptation required for the establishment of the symbiont ERAD as a preprotein translocation system. This is supported by the finding that only organisms with a symbiont proteasome (Table 2) show this ubiquitin Lys48 modification. An exception is the cryptophyte G. theta with two ubiquitin domains in the predicted GtsUbi sequence, one overall conserved and another having Lys mutations at both positions. One might speculate about a separation of SELMA and proteasomal degradation in the cryptophyte PPC based on different ubiquitins, which might be caused by the different morphology, as cryptophytes, in contrast to all other organisms with a secondary red algal symbiont, still synthesize proteins in the PPC. Ubiquitin Lys63 is implicated in ubiquitylation processes related to functional modifications of target proteins (72). Remarkably, loss of Lys63 in all organisms with a secondary plastid of red algal origin leads to reduced ubiquitylation possibilities in the PPC in contrast to the manifold mechanisms regulated by ubiquitylation in the host cytosol. Thus, it remains to be determined if the symbiont ubiquitins can be used for both mono- and polyubiquitylation on the remaining lysine residues.
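The lysine comparison described above can be illustrated with a short sketch. It assumes an ungapped ubiquitin domain of canonical length, so that residue numbers map directly onto string positions. The reference used is the human ubiquitin sequence, which is not one of the sequences analysed in this study, and the variant is a purely hypothetical sequence built for illustration by replacing Lys48 and Lys63 with arginine. The function names `lysine_status` and `substitute` are likewise assumptions of this sketch and do not correspond to any tool used here.

```python
# Sketch: report whether Lys48 and Lys63 are present in a ubiquitin domain.
# Assumption: the sequence is ungapped and numbered like canonical ubiquitin.

# Human ubiquitin (76 residues), used only as a numbering reference.
HUMAN_UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)


def lysine_status(sequence, positions=(48, 63)):
    """Return {position: bool} telling whether a lysine (K) sits at each 1-based position."""
    status = {}
    for pos in positions:
        if pos > len(sequence):
            raise ValueError(f"sequence too short for position {pos}")
        status[pos] = sequence[pos - 1] == "K"
    return status


def substitute(sequence, position, residue):
    """Return a copy of the sequence with the 1-based position replaced by residue."""
    return sequence[: position - 1] + residue + sequence[position:]


# Hypothetical variant for illustration only (NOT a sequence from this study).
hypothetical_variant = substitute(substitute(HUMAN_UBIQUITIN, 48, "R"), 63, "R")

print(lysine_status(HUMAN_UBIQUITIN))
print(lysine_status(hypothetical_variant))
```

For real sUbi sequences, which may contain insertions or deletions, the positions would first have to be mapped through an alignment such as the one shown in Fig. 2 rather than read off by direct indexing.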

The SELMA translocation machinery (Fig. 4) provides an interesting view into evolutionary rearrangements and modifications of already existing mechanisms. During the establishment of a red alga as an organelle, the symbiont ER-associated degradation machinery was split into a translocation complex on the one hand and a presumed degradation machinery on the other. The former now represents the second step of protein import into complex plastids across the periplastidal membrane, whereas the latter might be required for protein homeostasis in the PPC of only certain groups of organisms with a red algal endosymbiont. Such modularization of the well-conserved ERAD translocation not only gave rise to SELMA but was also shown to be the principal mechanism of the peroxisomal importomer, again a ubiquitin-dependent translocation independent of proteasomal degradation (11).

Acknowledgements

We are supported by the Deutsche Forschungsgemeinschaft (Collaborative Research Centre 593 for S.S., I.W., S.Z. and U.-G.M.; SFB TR1 for J.M.P.). D.M. is a fellow of the International Max Planck Research School for Environmental, Cellular and Molecular Microbiology (IMPRS-MIC).

References

1. Agrawal S, Striepen B. 2010. More membranes, more proteins: complex protein import mechanisms into secondary plastids. Protist 161:672-687.

2. Agrawal S, van Dooren GG, Beatty WL, Striepen B. 2009. Genetic evidence that an endosymbiont-derived endoplasmic reticulum-associated protein degradation (ERAD) system functions in import of apicoplast proteins. J. Biol. Chem. 284:33683-33691.

3. Apt KE, Grossman A, Kroth-Pancic P. 1996. Stable nuclear transformation of the diatom Phaeodactylum tricornutum. Mol. Gen. Genet. 252:572-579.

4. Apt KE, Zaslavkaia L, Lippmeier JC, Lang M, Kilian O, Wetherbee R, Grossman AR, Kroth PG. 2002. In vivo characterization of diatom multipartite plastid targeting signals. J. Cell Sci. 115:4061-4069.

5. Armbrust EV, Berges JA, Bowler C, Green BR. 2004. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism. Science 306:79-86.

6. Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer E, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Stoeckert CJ, Treatman C, Wang H. 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37:D539-D543.

7. Bagola K, Mehnert M, Jarosch E, Sommer T. 2011. Protein dislocation from the ER. Biochim. Biophys. Acta 1808:925-936.

8. Baugh JM, Viktorova EG, Pilipenko EV. 2009. Proteasomes can degrade a significant proportion of cellular proteins independent of ubiquitination. J. Mol. Biol. 386:814-827.

9. Bernsel A, Viklund H, Hennerdal A, Elofsson A. 2009. TOPCONS: consensus prediction of membrane protein topology. Nucleic Acids Res. 37:W465-468.

10. Bolte K, Bullmann L, Hempel F, Bozarth A, Zauner S, Maier U-G. 2009. Protein targeting into secondary plastids. J. Eukaryot. Microbiol. 56:9-15.

11. Bolte K, Gruenheit N, Felsner G, Sommer MS, Maier UG, Hempel F. 2011. Making new out of old: recycling and modification of an ancient protein translocation system during eukaryotic evolution. Mechanistic comparison and phylogenetic analysis of ERAD, SELMA and the peroxisomal importomer. Bioessays 33:368-376.

12. Bowler C, Allen AE, Badger JH, Grimwood J, Jabbari K, Kuo A, Maheswari U, Martens C, Maumus F, Otillar RP, Rayko E, Salamov A, Vandepoele K, Beszteri B, Gruber A, Heijde M, Katinka M, Mock T, Valentin K, Verret F, Berges JA, Brownlee C, Cadoret J-P, Chiovitti A, Choi CJ, Coesel S, De Martino A, Detter JC, Durkin C, Falciatore A, Fournet J, Haruta M, Huysman MJJ, Jenkins BD, Jiroutova K, Jorgensen RE, Joubert Y, Kaplan A, Kroger N, Kroth PG, La Roche J, Lindquist E, Lommer M, Martin-Jezequel V, Lopez PJ, Lucas S, Mangogna M, McGinnis K, Medlin LK, Montsant A, Secq M-PO-L, Napoli C, Obornik M, Parker MS, Petit J-L, Porcel BM, Poulsen N, Robison M, Rychlewski L, Rynearson TA, Schmutz J, Shapiro H, Siaut M, Stanley M, Sussman MR, Taylor AR, Vardi A, von Dassow P, Vyverman W, Willis A, Wyrwicz LS, Rokhsar DS, Weissenbach J, Armbrust EV, Green BR, Van de Peer Y, Grigoriev IV. 2008. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature 456:239-244.

13. Bullmann L, Haarmann R, Mirus O, Bredemeier R, Hempel F, Maier UG, Schleiff E. 2010. Filling the gap, evolutionarily conserved Omp85 in plastids of chromalveolates. J. Biol. Chem. 285:6848-6856.

14. Bushkin GG, Ratner DM, Cui J, Banerjee S, Duraisingh MT, Jennings CV, Dvorin JD, Gubbels M-J, Robertson SD, Steffen M, O'Keefe BR, Robbins PW, Samuelson J. 2010. Suggestive evidence for Darwinian selection against asparagine-linked glycans of Plasmodium falciparum and Toxoplasma gondii. Eukaryot. Cell 9:228-241.

15. Cavalier-Smith T. 1999. Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J. Eukaryot. Microbiol. 46:347-366.

16. Cavalier-Smith T. 2000. Membrane heredity and early chloroplast evolution. Trends Plant Sci. 5:174-182.

17. Chan CX, Yang EC, Banerjee T, Yoon HS, Martone PT, Estevez JM, Bhattacharya D. 2011. Red and green algal monophyly and extensive gene sharing found in a rich repertoire of red algal genes. Curr. Biol. 21:328-333.

18. Chung D-WD, Ponts N, Prudhomme J, Rodrigues EM, Le Roch KG. 2012. Characterization of the ubiquitylating components of the human malaria parasite’s protein degradation pathway. PLoS ONE 7:e43477. doi:10.1371/journal.pone.0043477.

19. Cock JM, Sterck L, Rouzé P, Scornet D, Allen AE, Amoutzias G, Anthouard V, Artiguenave F, Aury J-M, Badger JH, Beszteri B, Billiau K, Bonnet E, Bothwell JH, Bowler C, Boyen C, Brownlee C, Carrano CJ, Charrier B, Cho GY, Coelho SM, Collén J, Corre E, Da Silva C, Delage L, Delaroque N, Dittami SM, Doulbeau S, Elias M, Farnham G, Gachon CMM, Gschloessl B, Heesch S, Jabbari K, Jubin C, Kawai H, Kimura K, Kloareg B, Küpper FC, Lang D, Le Bail A, Leblanc C, Lerouge P, Lohr M, Lopez PJ, Martens C, Maumus F, Michel G, Miranda-Saavedra D, Morales J, Moreau H, Motomura T, Nagasato C, Napoli CA, Nelson DR, Nyvall-Collén P, Peters AF, Pommier C, Potin P, Poulain J, Quesneville H, Read B, Rensing SA, Ritter A, Rousvoal S, Samanta M, Samson G, Schroeder DC, Ségurens B, Strittmatter M, Tonon T, Tregear JW, Valentin K, von Dassow P, Yamagishi T, Van de Peer Y, Wincker P. 2010. The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature 465:617-621.

20. Derocher AE, Karnataki A, Vaney P, Parsons M. 2012. Apicoplast targeting of a Toxoplasma gondii transmembrane protein requires a cytosolic tyrosine-based motif. Traffic 13:694-704.

21. Deschamps P, Moreira D. 2012. Re-evaluating the Green Contribution to Diatom Genomes. Genome Biol. Evol. 4:683-688. doi:10.1093/gbe/evs053.

22. Dorrell RG, Smith AG. 2011. Do red and green make brown?: perspectives on plastid acquisitions within chromalveolates. Eukaryot. Cell 10:856-868.

23. Douglas S, Zauner S, Fraunholz M, Beaton M, Penny S, Deng LT, Wu X, Reith M, Cavalier-Smith T, Maier UG. 2001. The highly reduced genome of an enslaved algal nucleus. Nature 410:1091-1096.

24. Dyrløv Bendtsen J, Nielsen H, von Heijne G, Brunak S. 2004. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340:783-795.

25. Emanuelsson O, Nielsen H, Brunak S, von Heijne G. 2000. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 300:1005-1016.

26. Felsner G, Sommer MS, Gruenheit N, Hempel F, Moog D, Zauner S, Martin W, Maier UG. 2011. ERAD components in organisms with complex red plastids suggest recruitment of a preexisting protein transport pathway for the periplastid membrane. Genome Biol. Evol. 3:140-150. doi:10.1093/gbe/evq074.

27. Frommolt R, Werner S, Paulsen H, Goss R, Wilhelm C, Zauner S, Maier UG, Grossman AR, Bhattacharya D, Lohr M. 2008. Ancient recruitment by chromists of green algal genes encoding enzymes for carotenoid biosynthesis. Mol. Biol. Evol. 25:2653-2667.

28. Funakoshi M, Sasaki T, Nishimoto T, Kobayashi H. 2002. Budding yeast Dsk2p is a polyubiquitin-binding protein that can interact with the proteasome. Proc. Natl. Acad. Sci. U. S. A. 99:745-750.

29. Gajria B, Bahl A, Brestelli J, Dommer J, Fischer S, Gao X, Heiges M, Iodice J, Kissinger JC, Mackey AJ, Pinney DF, Roos DS, Stoeckert CJ, Wang H, Brunk BP. 2008. ToxoDB: an integrated Toxoplasma gondii database resource. Nucleic Acids Res. 36:D553-D556.

30. Gobler CJ, Berry DL, Dyhrman ST, Wilhelm SW, Salamov A, Lobanov AV, Zhang Y, Collier JL, Wurch LL, Kustka AB, Dill BD, Shah M, VerBerkmoes NC, Kuo A, Terry A, Pangilinan J, Lindquist EA, Lucas S, Paulsen IT, Hattenrath-Lehmann TK, Talmage SC, Walker EA, Koch F, Burson AM, Marcoval MA, Tang Y-Z, Lecleir GR, Coyne KJ, Berg GM, Bertrand EM, Saito MA, Gladyshev VN, Grigoriev IV. 2011. Niche of harmful alga Aureococcus anophagefferens revealed through ecogenomics. Proc. Natl. Acad. Sci. U. S. A. 108:4352-4357.

31. Gould SB, Sommer MS, Kroth PG, Gile GH, Keeling PJ, Maier UG. 2006. Nucleus-to-nucleus gene transfer and protein retargeting into a remnant cytoplasm of cryptophytes and diatoms. Mol. Biol. Evol. 23:2413-2422.

32. Gould SB, Waller RF, McFadden GI. 2008. Plastid evolution. Annu. Rev. Plant Biol. 59:491-517.

33. Green BR. 2011. After the primary endosymbiosis: an update on the chromalveolate hypothesis and the origins of algae with Chl c. Photosynth. Res. 107:103-115.

34. Gruber A, Vugrinec S, Hempel F, Gould SB, Maier U-G, Kroth PG. 2007. Protein targeting into complex diatom plastids: functional characterisation of a specific targeting motif. Plant Mol. Biol. 64:519-530.

35. Hänzelmann P, Stingele J, Hofmann K, Schindelin H, Raasi S. 2010. The yeast E4 ubiquitin ligase Ufd2 interacts with the ubiquitin-like domains of Rad23 and Dsk2 via a novel and distinct ubiquitin-like binding domain. J. Biol. Chem. 285:20390-20398.

36. Heiges M, Wang H, Robinson E, Aurrecoechea C, Gao X, Kaluskar N, Rhodes P, Wang S, He C-Z, Su Y, Miller J, Kraemer E, Kissinger JC. 2005. CryptoDB: a Cryptosporidium bioinformatics resource update. Nucleic Acids Res. 34:D419-D422.

37. Hempel F, Bozarth A, Sommer MS, Zauner S, Przyborski JM, Maier U-G. 2007. Transport of nuclear-encoded proteins into secondarily evolved plastids. Biol. Chem. 388:899-906.

38. Hempel F, Bullmann L, Lau J, Zauner S, Maier UG. 2009. ERAD-derived preprotein transport across the second outermost plastid membrane of diatoms. Mol. Biol. Evol. 26:1781-1790.

39. Hempel F, Felsner G, Maier UG. 2010. New mechanistic insights into pre-protein transport across the second outermost plastid membrane of diatoms. Mol. Microbiol. 76:793-801.

40. Hoseki J, Ushioda R, Nagata K. 2010. Mechanism and components of endoplasmic reticulum-associated degradation. J. Biochem. 147:19-25.

41. Keeling PJ. 2009. Chromalveolates and the evolution of plastids by secondary endosymbiosis. J. Eukaryot. Microbiol. 56:1-8.

42. Kilian O, Kroth PG. 2005. Identification and characterization of a new conserved motif within the presequence of proteins targeted into complex diatom plastids. Plant J. 41:175-183.

43. Kim I, Ahn J, Liu C, Tanabe K, Apodaca J, Suzuki T, Rao H. 2006. The Png1–Rad23 complex regulates glycoprotein turnover. J. Cell Biol. 172:211-219.

44. Kravtsova-Ivantsiv Y, Ciechanover A. 2012. Non-canonical ubiquitin-based signals for proteasomal degradation. J. Cell Sci. 125:539-548.

45. Letunic I, Doerks T, Bork P. 2012. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res. 40:D302-D305.

46. Li H-m, Chiu C-C. 2010. Protein transport into chloroplasts. Annu. Rev. Plant Biol. 61:157-180.

47. Lim L, Kalanon M, McFadden GI. 2009. New proteins in the apicoplast membranes: time to rethink apicoplast protein targeting. Trends Parasitol. 25:197-200.

48. Lim PJ, Danner R, Liang J, Doong H, Harman C, Srinivasan D, Rothenberg C, Wang H, Ye Y, Fang S, Monteiro MJ. 2009. Ubiquilin and p97/VCP bind erasin, forming a complex involved in ERAD. J. Cell Biol. 187:201-217.

49. Lin Y-C, Chen H-M, Chou I-M, Chen A-N, Chen C-P, Young G-H, Lin C-T, Cheng C-H, Chang S-C, Juang R-H. 2012. Plastidial starch phosphorylase in sweet potato roots is proteolytically modified by protein-protein interaction with the 20S proteasome. PLoS ONE 7:e35336. doi:10.1371/journal.pone.0035336.

50. Madsen L, Kriegenburg F, Vala A, Best D, Prag S, Hofmann K, Seeger M, Adams IR, Hartmann-Petersen R. 2011. The tissue-specific Rep8/UBXD6 tethers p97 to the endoplasmic reticulum membrane for degradation of misfolded proteins. PLoS ONE 6:e25061. doi:10.1371/journal.pone.0025061.

51. Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Lu F, Marchler GH, Mullokandov M, Omelchenko MV, Robertson CL, Song JS, Thanki N, Yamashita RA, Zhang D, Zhang N, Zheng C, Bryant SH. 2011. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 39:D225-D229.

52. Mason JM, Arndt KM. 2004. Coiled Coil Domains: Stability, Specificity, and Biological Implications. ChemBioChem 5:170-176.

53. Matsuzaki M, Misumi O, Shin-I T, Maruyama S, Takahara M, Miyagishima S-Y, Mori T, Nishida K, Yagisawa F, Nishida K, Yoshida Y, Nishimura Y, Nakao S, Kobayashi T, Momoyama Y, Higashiyama T, Minoda A, Sano M, Nomoto H, Oishi K, Hayashi H, Ohta F, Nishizaka S, Haga S, Miura S, Morishita T, Kabeya Y, Terasawa K, Suzuki Y, Ishii Y, Asakawa S, Takano H, Ohta N, Kuroiwa H, Tanaka K, Shimizu N, Sugano S, Sato N, Nozaki H, Ogasawara N, Kohara Y, Kuroiwa T. 2004. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428:653-657.

54. Meyer H, Bug M, Bremer S. 2012. Emerging functions of the VCP/p97 AAA-ATPase in the ubiquitin system. Nat. Cell Biol. 14:117-123.

55. Moog D, Stork S, Zauner S, Maier U-G. 2011. In silico and in vivo investigations of proteins of a minimized eukaryotic cytoplasm. Genome Biol. Evol. 3:375-382. doi:10.1093/gbe/evr031.

56. Moustafa A, Beszteri B, Maier UG, Bowler C, Valentin K, Bhattacharya D. 2009. Genomic footprints of a cryptic plastid endosymbiosis in diatoms. Science 324:1724-1726.

57. Nassoury N, Cappadocia M, Morse D. 2003. Plastid ultrastructure defines the protein import pathway in dinoflagellates. J. Cell Sci. 116:2867-2874.

58. Patron NJ, Waller RF. 2007. Transit peptide diversity and divergence: a global analysis of plastid targeting signals. Bioessays 29:1048-1058.

59. Richly H, Rape M, Braun S, Rumpf S, Hoege C, Jentsch S. 2005. A series of ubiquitin binding factors connects CDC48/p97 to substrate multiubiquitylation and proteasomal targeting. Cell 120:73-84.

60. Saeki Y, Tanaka K. 2012. Assembly and function of the proteasome. Methods Mol. Biol. 832:315-337.

61. Sanchez-Puerta MV, Delwiche CF. 2008. A hypothesis for plastid evolution in chromalveolates. J. Phycol. 44:1097-1107.

62. Schuberth C, Buchberger A. 2005. Membrane-bound Ubx2 recruits Cdc48 to ubiquitin ligases and their substrates to ensure efficient ER-associated protein degradation. Nat. Cell Biol. 7:999-1006.

63. Schuberth C, Buchberger A. 2008. UBX domain proteins: major regulators of the AAA ATPase Cdc48/p97. Cell. Mol. Life Sci. 65:2360-2371.

64. Sheiner L, Demerly JL, Poulsen N, Beatty WL, Lucas O, Behnke MS, White MW, Striepen B. 2011. A systematic screen to discover and analyze apicoplast proteins identifies a conserved and essential protein import factor. PLoS Path. 7:e1002392. doi:10.1371/journal.ppat.1002392.

65. Sheiner L, Striepen B. 2012. Protein sorting in complex plastids. Biochim. Biophys. Acta. doi:10.1016/j.bbamcr.2012.05.030.

66. Smith MH, Ploegh HL, Weissman JS. 2011. Road to ruin: targeting proteins for degradation in the endoplasmic reticulum. Science 334:1086-1090.

67. Sommer MS, Gould SB, Lehmann P, Gruber A, Przyborski JM, Maier U-G. 2007. Der1-mediated preprotein import into the periplastid compartment of chromalveolates? Mol. Biol. Evol. 24:918-928.

68. Spork S, Hiss JA, Mandel K, Sommer M, Kooij TWA, Chu T, Schneider G, Maier UG, Przyborski JM. 2009. An unusual ERAD-like complex is targeted to the apicoplast of Plasmodium falciparum. Eukaryot. Cell 8:1134-1145.

69. Su V, Lau AF. 2009. Ubiquitin-like and ubiquitin-associated domain proteins: significance in proteasomal degradation. Cell. Mol. Life Sci. 66:2819-2833.

70. Suzuki T, Park H, Lennarz WJ. 2002. Cytoplasmic peptide:N-glycanase (PNGase) in eukaryotic cells: occurrence, primary structure, and potential functions. FASEB J. 16:635-641.

71. Teich R, Zauner S, Baurain D, Brinkmann H, Petersen J. 2007. Origin and distribution of Calvin cycle fructose and sedoheptulose bisphosphatases in plantae and complex algae: a single secondary origin of complex red plastids and subsequent propagation via tertiary endosymbioses. Protist 158:263-276.

72. Trempe J-F. 2011. Reading the ubiquitin postal code. Curr. Opin. Struct. Biol. 21:792-801.

73. van Dooren GG, Tomova C, Agrawal S, Humbel BM, Striepen B. 2008. Toxoplasma gondii Tic20 is essential for apicoplast protein import. Proc. Natl. Acad. Sci. U. S. A. 105:13574-13579.

74. Visendi P, Ng'ang'a W, Bulimo W, Bishop R, Ochanda J, de Villiers EP. 2011. TparvaDB: a database to support Theileria parva vaccine development. Database (Oxford) 2011:bar015.

75. Woehle C, Dagan T, Martin WF, Gould SB. 2011. Red and problematic green phylogenetic signals among thousands of nuclear genes from the photosynthetic and apicomplexa-related Chromera velia. Genome Biol. Evol. 3:1220-1230. doi:10.1093/gbe/evr100.

76. Wolf DH, Stolz A. 2012. The Cdc48 machine in endoplasmic reticulum associated protein degradation. Biochim. Biophys. Acta 1823:117-124.

77. Zaslavskaia LA, Lippmeier JC, Kroth PG, Grossman AR, Apt KE. 2000. Transformation of the diatom Phaeodactylum tricornutum (Bacillariophyceae) with a variety of selectable marker and reporter genes. J. Phycol. 36:379-386.

Figure legends

Figure 1

In vivo-localization and domain organization of new SELMA proteins of the diatom P. tricornutum. A: In vivo-localizations of PtsUBX, PtsNpl4, PtsPng1 and PtsUbq as eGFP fusion proteins in P. tricornutum show the characteristic PPC fluorescence in the middle of the two plastid lobes (scale: 10 µm, TL: transmission light, PAF: plastid autofluorescence). B: Structural overview of the domain composition of PtsNpl4, PtsUbq and PtsPng1 in comparison to the respective S. cerevisiae ERAD protein (ScNpl4p, ScDsk2p and ScPng1p). Whereas PtsNpl4 and PtsPng1 share the conserved domains of their yeast counterparts, PtsUbq lacks the UBA domain for poly-ubiquitin binding. PtsUBX cannot be assigned to a specific ERAD protein. (red: signal peptide, blue: TPL predicted, light blue: no TPL predicted, UBX: domain present in ubiquitin-regulatory proteins, orange: coiled coil region, UBQ: ubiquitin homologues, STI1: heat shock chaperonin-binding motif, UBA: ubiquitin associated domain; TG: transglutaminase/protease-like homologues; Rad4: Rad4 transglutaminase-like domain).

Figure 2

Alignment of the ubiquitin domain of hUbi and sUbi sequences with the S. cerevisiae and C. merolae ubiquitins. The host ubiquitin sequences are derived from polyubiquitins consisting of multiple ubiquitin domains of the same sequence. Protein sequences of sUbi from P. tricornutum, F. cylindrus and T. pseudonana share the same lysine mutations at positions 48 and 63. E. huxleyi, T. parva and B. bovis only show an altered Lys63 position. Both ubiquitin domains of the F. cylindrus symbiont di-ubiquitin have identical protein sequences; in contrast, the di-ubiquitin of G. theta is depicted as two independent ubiquitin domains (GtsUbi_155024_1/2). (* There is no polyubiquitin in C. merolae; therefore, a ubiquitin-ribosomal fusion protein was included in the alignment. For detailed information on protein sequences see supplemental file 1.)

Figure 3

In vivo-localization of new symbiont 20S proteasomal components in the PPC of the diatom P. tricornutum. eGFP fusion proteins Ptsβ1 and Ptsα3-1 show fluorescence in the middle of the two plastid lobes indicating a PPC localization in contrast to the host proteins Pthβ7 and PthRpn10 which localize in areas around the plastid (scale: 10 µm; TL: transmission light; PAF: plastid autofluorescence).

Figure 4

Schematic model of protein import into the PPC across the SELMA translocation complex in diatoms. Upon recognition of a SELMA substrate in the cER lumen, precursor proteins are translocated across the membrane complex composed of sDer1-1/-2 and the ubiquitin ligase ptE3p. As soon as they reach the PPC, the precursors are ubiquitylated via a PPC-located ubiquitylation machinery. This leads to a recruitment of the Cdc48 complex with its co-factors sUfd1 and sNpl4 and translocation is completed. The ubiquitin moiety is most likely removed prior to protein maturation for PPC resident proteins or further transport for plastid proteins. According to current knowledge, the identified 20S proteasomal components function independently of ubiquitylation. (cER: chloroplast ER; PPC: periplastidal compartment; IMS: intermembrane space; modified after (39); orange indicates proteins identified in this study.)

TABLE 1 Overview of all identified host ERAD and symbiont SELMA components in organisms with secondary plastids of red algal origin compared to ERAD proteins of four red algal species.

Protein complex/function

Protein name host (symbiont)

Red algae: C. merolae, P. cruentum, C. tuberculosum, G. sulphuraria

Heterokontophytes: P. tricornutum, T. pseudonana, F. cylindrus, E. siliculosus, A. anophagefferens

Haptophytes: E. huxleyi

Cryptophytes: G. theta

Apicomplexans: P. falciparum, T. gondii, B. bovis, N. caninum, T. parva, C. parvum

H S H S H S H S H S H S H S H S H S H S H S H S H

ER translocon
Sec61: X X X X X(67) Xa X Xa X X X(67) X(67) X X X X X

Derlin proteins
Dfm1/hDer1-1 (sDer1-1): X X X X X(38, 67) X(38, 67) X X X X X X X X X(26) X(26) X(67) X(67) X(67-68) X(67-68) X(2, 68) X(2, 68) X(68) X X X X(68)
Der1/hDer1-2 (sDer1-2): X X X X(38, 67) X(38, 67) Xa X X (X) X X X X X(26) X(26) X X X(67-68) X(67-68) X(2, 68) X(68) X(68) X X(68) X(68)

Ubiquitylation
Hrd1/Der3 (ptE3P): X X X X X(67) X(39) X X X X X X X X X(67) X(18, 67-68) X(65) X(68) X X(68) X(68)
Hrd3p: X X X X X X Xa,(67) X(65) X(68) X
Uba1 (sUba1): X X X X X(67) X(67) X X X X X (X) X (X) Xa,(26) X(26) Xa X X(18, 67-68) X(67-68) X(65) X(64) X X X X(68)
Ubc (sUbc): Xa Xa Xa Xa Xa,(67) Xa Xa Xa Xa Xa Xa Xa Xa Xa X Xa Xa,(31) Xa,(18) X(68) Xa X(2, 68) Xa X(68) Xa X Xa X(68) Xa
Doa10: X X X X X X X Xa X X
polyubiquitin (sUbi): X(67) X(67) X X X X X Xa X X Xa X X(67-68) X(68) X X(68) X(68) X X(68) X X(68)

Cdc48 complex
Cdc48 (sCdc48-1): X X X X(67) X(67) X X X X X X X X X(26) X(26) X X(67) X(67-68) X(67-68) X(2, 68) X(2, 68) X(68) X X X X(68) X X(68)
(sCdc48-2): X(55) X X X X X X
Ufd1 (sUfd1): X X X X(67) X(67) X X X X Xa X X X X(26) X(26) Xa X(67) X(67-68) X(67-68) X(2, 68) X(2, 68) X(68) X(68) X X X(68) X(68) X(68)
Npl4 (sNpl4): X X X X(67) X X X X X X X X X X(67-68) X(68) X(68) X(68) X(68)
(sUBX): X (X) (X) X
(sPUB): X(55) X X X X

Processing
Png1 (sPng1): X X X X(67) X X X X X X X X X X Xa X
Dsk2 (sUbq): X X X X X X X X X X X X X X X X X X X X
Rad23: X X X X(67) X X Xa X X X X X X(67) X X X X X
Ufd2: X X X X X(67) X X X X X Xa X(67) X X X X X
(ptDUP): X(39) (X) X X X X

Unknown
(PPP1): X(64) X(64) X(55, 64) X(64) X(64) X(64) X(64) X(64) X(64) X(64) X(64) X(64) X(64)

Chaperones
Hsp70 (sHsp70): X X X X(67) X(31) X X X X X X X X X X X X X X X X X
Hsp40: X X X X X X X X X X X X
(sDPC): X(52) X X X X


Proteins similar to ScUsa1p, ScCue1p and ScUbx4p could not be identified in either host or symbiont versions and are therefore not included in the table. Ubc proteins are further assigned to the most similar S. cerevisiae Ubc enzyme in supplemental file 2 (Table S1). X: detected; (X): symbiont gene detected by homology but without targeting sequence; a: more than one gene detected. Numbers in brackets refer to the respective literature. Protein identifiers can be found in supplemental file 2 (Table S1).


TABLE 2: Overview of host and symbiont 20S proteasomal components of heterokontophytes and cryptophytes compared to those from red algae.

Protein name
Red algae: C. merolae, P. cruentum, C. tuberculosum, G. sulphuraria
Heterokontophytes: P. tricornutum, T. pseudonana, F. cylindrus, E. siliculosus, A. anophagefferens
Cryptophytes: G. theta
H S H S H S H S H S H S

α1: X X X X(55) X X X X X X
α2: X X X X(55) X(55) X X X X X X X X X
α3: X X X X X(55) Xa X Xa X X X X X X X
α4: X X X X X(55) X X X X X X X
α5: X X X X(55) X X X X X X X
α6: X X X X X(55) X X X X X X X X
α7: X X X X X(55) X(55) X X X X X X X
alpha general: Xa,(55) X X Xa X
β1: X X X X(55) X X X X X Xa Xa X X
β2: X X X X(55) X X X X X X Xa X X
β3: X X X X(55) X(55) X X X X Xa X X X X
β4: X X X X(55) X X X X X
β5: X X X X Xa,(55) Xa Xa X X X X X X
β6: X X X X(55) X(55) X X X X X X X X X
β7: X X X X X(55) X(55) X X (X) X X X X

X: detected; a: more than one gene detected; (X): symbiont gene detected by homology but without targeting sequence. Numbers in brackets refer to the respective literature. Protein identifiers can be found in supplemental file 2 (Table S2).
