

36 Mukherjee

Int. J. Biosci. 2012

RESEARCH PAPER

The putative synaptotagmin protein encoded by the SYT1 gene

of the picoplanktonic alga Micromonas is a novel member of

C2-domain containing proteins: evidence from in silico

characterization and homology modeling

Ashutosh Mukherjee

Department of Botany, Dinabandhu Mahavidyalaya, Bongaon, North 24 Parganas - 743235, West

Bengal, India

Received: 14 September 2012; Revised: 21 September 2012; Accepted: 22 September 2012

Key words: Disorder, template, dendrogram, Ramachandran plot, flexibility, electrostatic potential.

Abstract

Synaptotagmins are a class of membrane trafficking proteins that control endocytosis and exocytosis of synaptic vesicles in animals. A growing body of plant nucleotide and protein data shows that they are also present in plants. Micromonas pusilla is a picophytoplanktonic alga belonging to the Prasinophyceae, which is believed to be an ancient member of the green plant lineage and is therefore very useful in evolutionary studies. The SYT1 gene of this alga encodes a putative synaptotagmin which shows novel features. In this study, this protein was characterized with several bioinformatic tools. The protein contains several novel motifs and domains besides the C2 domain. The three-dimensional structure was predicted in silico by homology modeling to gather knowledge about the structure of the ancient forms of the plant synaptotagmin protein. The C2 domain in this protein is itself somewhat different from the known structures. The spatial distribution of the active site amino acids around the calcium ion showed that some amino acids outside the C2 domain are also involved in calcium binding, which is a novel feature of this protein.

Corresponding Author: Ashutosh Mukherjee ashutoshcaluniv@gmail.com

International Journal of Biosciences (IJB) ISSN: 2220-6655 (Print) 2222-5234 (Online)

Vol. 2, No. 10(1), p. 36-52, 2012 http://www.innspub.net




Introduction

Synaptotagmins are a group of membrane trafficking proteins characterized by an N-terminal transmembrane region (TMR), a linker of variable length and two tandemly arranged C-terminal C2 domains, called C2A and C2B (Craxton, 2004). The C2 domain is a Ca2+-binding protein domain, approximately 130-145 amino acids long, which is found in many membrane-associated signaling proteins in a large number of organisms (Nalefski and Falke, 1996). Ca2+ is thought to neutralize negatively charged residues in the loop regions of the C2 domain and thereby permit its interaction with membrane phospholipids, which leads to trafficking (Rizo and Sudhof, 1998). In mammals, there are 15 members of the synaptotagmin family, and many of these proteins act in the regulated synaptic vesicle exocytosis required for efficient neurotransmission (Craxton, 2004). They are calcium sensors and regulate exocytosis and endocytosis of synaptic vesicles. Although they were thought to be exclusive to animals, they have also been identified in plants (Lewis and Lazarowitz, 2010). Many synaptotagmin genes have been identified in the sequenced plant genomes by several computational procedures (Craxton, 2004).

The picoplanktonic alga Micromonas pusilla is an important model organism in developmental and evolutionary biology, as it belongs to the Prasinophyceae, which is thought to be an anciently diverged sister clade to land plants (Worden et al., 2009). Analyses of the genome of this small unicellular eukaryote offer valuable insights into the dynamic nature of early plant evolution. The genome of this picoplankton contains one SYT1 gene, which encodes a C2-domain containing protein annotated as a putative synaptotagmin (Worden et al., 2009). The protein is 1053 amino acids long and the C2 domain spans 214 amino acids, which is much longer than the average length of a C2 domain (130-145 amino acids). Additionally, an initial BLAST (Altschul et al., 1990) search against the NCBI non-redundant protein database revealed several plant synaptotagmins with high sequence similarity in the C2-domain region, but outside the C2 domain no sequence similarity was found with any other protein. Because the C2 domain accounts for only 214 of the 1053 amino acids, a large portion of the protein is uncharacterized. Further characterization of this region, including a search for known or novel domains and motifs, is therefore needed for a better understanding of the structural and functional properties of this ancient form of C2-domain containing putative synaptotagmin protein.

The biological function of a protein is a manifestation of its tertiary structure, and knowledge of the structural organization of the protein is a prerequisite for understanding its functional aspects (Paital et al., 2011). However, no three-dimensional structure of this C2-domain containing protein from Micromonas is known. It would therefore be useful to determine the 3D structure of this protein in order to understand its functional aspects. In the absence of a crystal structure, homology modeling, which is done in silico, provides a faster way to obtain structural insight into the protein (Dolan et al., 2012). Additionally, identification of the Ca2+-binding residues and knowledge of their interaction with the ligand are necessary for understanding its functional properties. This study used several bioinformatics approaches, including homology modeling, to a) investigate the physicochemical, structural and functional properties of this protein, b) analyze the structure of the whole protein and the C2 domain and c) study the interaction of the active site amino acid residues with the Ca2+ ion.

Materials and methods

Sequence retrieval

The Micromonas pusilla Ca2+-lipid binding protein sequence containing a C2 domain, i.e. the putative synaptotagmin (GenBank accession XP_002504251; GI: 255082530; hereafter referred to as SYT1), was downloaded from the NCBI RefSeq (Pruitt et al., 2007) database (http://www.ncbi.nlm.nih.gov/projects/RefSeq/).



Fig. 1. Dendrogram showing the phylogenetic relationship of the SYT1 from Micromonas with other C2-domain containing proteins. The SYT1 from Micromonas is shown in a grey box.

The protein sequence was predicted by conceptual translation from an mRNA sequence of Micromonas sp. RCC299 (Worden et al., 2009). The protein is 1053 amino acids long and the C2 domain (COG5038) spans residues 282-495. No three-dimensional crystal structure of this protein was available in the Protein Data Bank. This sequence was used for characterization and structure prediction.
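The domain boundaries quoted above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original workflow; the function name `span_length` is an illustrative assumption, and only the numbers (1053, 282, 495) come from the text.

```python
def span_length(start: int, end: int) -> int:
    """Number of residues in an inclusive, 1-based residue range."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


PROTEIN_LENGTH = 1053          # residues reported for SYT1
C2_START, C2_END = 282, 495    # reported C2 domain (COG5038) boundaries

c2_length = span_length(C2_START, C2_END)
outside_c2 = PROTEIN_LENGTH - c2_length
percent_outside = 100 * outside_c2 / PROTEIN_LENGTH

print(c2_length)                  # 214
print(outside_c2)                 # 839
print(round(percent_outside, 1))  # 79.7
```

Under these numbers, roughly four fifths of the sequence lies outside the annotated C2 domain, which is the portion the study sets out to characterize.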


Fig. 2. Multiple sequence alignment of the templates and the target protein as visualized with Jalview.

Phylogenetic analysis

Protein sequences related to SYT1 were searched using the NCBI BLASTP (Altschul et al., 1990) program. To evaluate the phylogenetic relationship, the resulting sequences (excluding hypothetical and predicted sequences) were aligned using the alignment explorer in MEGA 5.0 (Tamura et al., 2011) with default parameters. An unrooted phylogenetic tree of these sequences was constructed by the neighbor-joining (NJ) method in MEGA 5. The level of confidence was estimated by bootstrap analysis with 1000 replicates.
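Two ingredients of this kind of analysis, a pairwise distance between aligned sequences and bootstrap resampling of alignment columns, can be sketched as follows. This is only an illustration under stated assumptions: the toy alignment and the function names are hypothetical, the p-distance is one simple distance choice and not necessarily the model used in MEGA, and the neighbor-joining step itself is omitted.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Build one bootstrap replicate by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Hypothetical toy alignment (not data from the study).
toy = {"seqA": "MKV-LDE", "seqB": "MRVALDE", "seqC": "MKIALNE"}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]

print(round(p_distance(toy["seqA"], toy["seqB"]), 3))  # 0.167
print(len(replicates))                                 # 1000
```

In a full analysis a tree would be built from each replicate, and the bootstrap support of a clade would be the fraction of replicate trees that contain it.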



Fig. 3. Ramachandran plot of the modeled SYT1 protein.




Fig. 4. Details of the modeled three-dimensional structure of the SYT1 protein. A) Ribbon diagram of the protein as shown in Chimera. The alpha helices are shown in orange, beta sheets in yellow and loops in cyan; B) Position of the C2-domain (orange) within the protein.

Physicochemical analysis

Various physicochemical parameters, such as amino acid composition, isoelectric point (pI), total number of negatively and positively charged residues, instability index, aliphatic index and Grand Average of Hydropathy (GRAVY), were computed using the ProtParam tool (Gasteiger et al., 2005) available at http://us.expasy.org/tools/protparam.html.
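Some of these parameters are simple functions of the amino acid composition. The sketch below is not ProtParam's code; it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the Ikai formula for the aliphatic index, and it uses a hypothetical toy peptide rather than the SYT1 sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean hydropathy value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index = X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percentage of each residue."""
    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)
    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_counts(seq: str) -> tuple:
    """Return (Asp + Glu, Arg + Lys) residue counts."""
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


toy_peptide = "AVILDEKR"  # hypothetical example, not the SYT1 sequence
print(round(gravy(toy_peptide), 4))            # -0.1375
print(round(aliphatic_index(toy_peptide), 2))  # 146.25
print(charged_counts(toy_peptide))             # (2, 2)
```

A negative GRAVY value points to an overall hydrophilic sequence, and a larger aliphatic index reflects a larger relative volume occupied by aliphatic side chains.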

Fig. 5. Topology of the modeled SYT1 protein as predicted by PDBsum. Helices and strands outside the C2-

domain are shown in red and pink, respectively. The helices and strands of C2-domain are shown in blue and

green, respectively.




Fig. 6. Flexibility of modeled three-dimensional structure of SYT1. A) Flexibility to rigidity as shown in a

gradient of red to white in the 3D model; B) Flexibility along the length of the protein as indicated by peaks; C)

Flexibility as indicated in a red white gradient over the entire sequence.

Fig. 7. A) Protein disorder (disordered regions are indicated as blue regions); B) Interacting surface (shown as

red regions) and C) Surface electrostatic potential of SYT1 (Red portions are electronegative and blue portions are

electropositive. White portions are neutral).

Fig. 8. Interaction of the Ca2+ ion with the SYT1 protein. a. Three-dimensional orientation of the side chains of the active site residues surrounding the Ca2+ ion (the cyan ribbon represents part of the C2-domain and the orange ribbon includes amino acids outside the C2-domain that are important for Ca2+ binding); b. LIGPLOT of SYT1 complexed with Ca2+.

Structural and functional characterization

Secondary structure prediction was carried out with SOPMA (Geourjon and Deleage, 1995). The CDD database (Marchler-Bauer et al., 2011) was searched for domains using CD search (Marchler-Bauer and Bryant, 2004). Motifs were predicted using the Multiple Em for Motif Elicitation (MEME) suite (Bailey et al., 2009) with default parameters to gain insight into the function of the protein. Motifs found with MEME were then searched against known sequences with the MAST tool. The Motif Scan (Pagni et al., 2007; Sigrist et al., 2010) server (http://hits.isb-sib.ch/cgi-bin/PFSCAN) and the SMART (Schultz et al., 1998; Letunic et al., 2012) server (http://smart.embl-heidelberg.de/) were also used for scanning signature domains with the default parameters, including outlier homologs and homologs of known structures, Pfam domains, signal peptides and internal repeats. The SOSUI (Hirokawa et al., 1998) program (http://bp.nuap.nagoyau.ac.jp/sosui/sosui_submit.html) was employed to predict the presence of any transmembrane region. Subcellular localization was predicted using the TargetP 1.1 (Emanuelsson et al., 2000) server (http://www.cbs.dtu.dk/services/TargetP/abstract.php). Protein disorder was predicted using the Disopred (Ward et al., 2004) server (http://bioinf.cs.ucl.ac.uk/disopred/).

Homology modeling

Initially, the HHpred (Söding et al., 2005) server (http://toolkit.tuebingen.mpg.de/hhpred) and the PSI-BLAST (Altschul et al., 1997) server (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins) were used to identify suitable templates from the PDB protein structure database (Berman et al., 2000). However, HHpred only identified some templates with a coiled-coil region that aligned with a very small region (approximately from the 400th to the 650th residue) of the target protein. PSI-BLAST, on the other hand, could not find any significant match. The Phyre2 (Kelley and Sternberg, 2009) web server (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) was also employed for modeling. However, only 31% of the protein could be modeled in normal mode. Intensive mode could not be employed on Phyre2, as it requires a protein less than 1000 amino acids long.

Finally, I-TASSER (Zhang, 2007; Roy et al., 2010), the iterative threading assembly refinement server (http://zhanglab.ccmb.med.umich.edu/I-TASSER/), was chosen to generate the homology models because it is automated and easy to use, its algorithm incorporates multiple templates, and it has a high degree of accuracy in blind CASP experiments (Roy et al., 2010). Rather than specifying a single template for homology modeling, I-TASSER was allowed to incorporate multiple templates, since the use of multiple templates is recommended to avoid biasing the model toward one protein or one set of side chain conformations (Ginalski, 2006; Rhodes, 2006). Sequence alignments of the target protein and the templates were performed using CLUSTALW (http://www.ch.embnet.org/software/ClustalW.html) (Larkin et al., 2007) and visualized with Jalview (Clamp et al., 2004; Waterhouse et al., 2009). I-TASSER generated five predicted structures for the protein, of which the model with the highest C-score was chosen for further analysis.

Validation and analysis of the 3D model

After modeling, the modeled structure was validated using the Protein Structure Validation Suite (PSVS) tool (Bhattacharya et al., 2007) available at http://psvs-1_4-dev.nesg.org/. Within PSVS, the model was analyzed by PROCHECK (Laskowski et al., 1993) and Molprobity (Lovell et al., 2003). The 3D structures of the protein and the protein-calcium complex were visualized with Chimera (Pettersen et al., 2004). For an at-a-glance overview of the secondary structure and topology of the modeled protein, the 3D structure was submitted to the PDBsum (Laskowski, 2009) web server (http://www.ebi.ac.uk/pdbsum/). Molecular surface area and contact volume were calculated with the web-based tool Voss Volume Voxelator (http://www.molmovdb.org/cgi-bin/3v.cgi) (Voss, 2007; Voss et al., 2006). B-factor profiles of the modeled protein were investigated using FlexServ (http://mmb.pcb.ub.es/FlexServ/) (Camps et al., 2009), a web-based tool for the analysis of protein flexibility, with Normal Mode Analysis employed. This server incorporates protocols for the coarse-grained determination of protein dynamics using different algorithms. For further annotation and identification of protein interfaces, the structure was analysed with the Polyview (Porollo and Meller, 2007) server (http://polyview.cchmc.org/). To identify the likely biochemical function of the protein from its three-dimensional structure, the ProFunc (Laskowski et al., 2005a; Laskowski et al., 2005b) server (http://www.ebi.ac.uk/thornton-srv/databases/profunc/) was employed. Binding site prediction was performed with I-TASSER, which also generated a ligand-protein complex. The ligand (Ca2+) bound to the active site residues was plotted with LIGPLOT (Wallace et al., 1995) within PDBsum.

Protein structure accession numbers

The homology model of the protein was submitted to the Protein Model Data Base, i.e. PMDB (Castrignanò et al., 2006), at http://mi.caspur.it/PMDB/ and was assigned the identifier PM0078184.

Results and discussion

Phylogenetic relationship of SYT1 with other members of C2 domain containing proteins

A BLAST search of SYT1 identified several C2 domain containing proteins, including some hypothetical and predicted proteins. These hypothetical and predicted proteins were excluded from dendrogram preparation. Finally, Micromonas SYT1 and 57 related proteins (supplementary material, table S1) were used for phylogenetic tree construction. All of them had either one or two C2 domains (table 3). Besides plant synaptotagmins, these proteins included several membrane proteins with a single C2 domain, calcium-dependent lipid-binding domain-containing proteins, CLB1 and other C2 domain containing proteins. The dendrogram showed that SYT1 of Micromonas is distinctly different from all the other 57 proteins (Fig. 1).

Table 1. ProtParam table showing different physicochemical properties of the C2 domain containing protein.

Parameters | Value | Explanation
pI | 5.10 | Indicates that the protein is acidic.
Total number of negatively charged residues (Asp + Glu) | 155 | The total number of negatively charged residues is greater than the total number of positively charged residues. This indicates that the protein is intracellular.
Total number of positively charged residues (Arg + Lys) | 125 | (see the row above)
The instability index (II) | 39.61 | This classifies the protein as stable.
Aliphatic index | 82.74 | Indicates that this globular protein is thermostable.
Grand average of hydropathicity (GRAVY) | -0.263 | A negative GRAVY score indicates that the protein is hydrophilic.

Table 2. Secondary structure of the C2 domain containing protein as predicted by SOPMA.

Parameters | Number of amino acids | Percentage of amino acids
Alpha helix (Hh) | 386 | 36.66
310 helix (Gg) | 0 | 0.00
Pi helix (Ii) | 0 | 0.00
Beta bridge (Bb) | 0 | 0.00
Extended strand (Ee) | 187 | 17.76
Beta turn (Tt) | 89 | 8.45
Bend region (Ss) | 0 | 0.00


Table 3. Motifs predicted using MEME.

Motif | Width | Sites | E-value | Start position | p-value | Sequence
Motif 1 | 8 | 2 | 4.4e-001 | 234 | 7.72e-11 | FMGWQQSK
 | | | | 453 | 1.16e-12 | WMVWPRCI
Motif 2 | 6 | 2 | 1.4e+001 | 504 | 3.10e-08 | LQVRWP
 | | | | 550 | 4.16e-10 | LCVRWY
Motif 3 | 6 | 2 | 7.4e+001 | 387 | 1.10e-08 | EFECSF
 | | | | 404 | 1.97e-08 | VFPCFG

Physicochemical properties

The physicochemical properties of the C2 domain containing protein from Micromonas were predicted from the protein sequence using ExPASy's ProtParam server (http://expasy.org/cgi-bin/protparam), and the results are shown in table 1. The most frequent amino acid in the sequence was alanine (157 residues, 14.9%) and the least frequent was cysteine (5 residues, 0.5%). The total number of negatively charged residues (Asp + Glu) was 155 and the total number of positively charged residues (Arg + Lys) was 125, which indicates that the protein is intracellular, as intracellular proteins have a higher fraction of negatively charged residues. The isoelectric point (pI) is the pH at which the surface of the protein is covered with charge but the net charge of the protein is zero; at this pH, solubility is lowest and mobility in an electric field is zero. The pI was computed to be 5.10, which indicates that the protein is acidic. The high aliphatic index (82.74) indicates that this protein is stable over a wide temperature range. This is important for withstanding various stressful environments, as expected for a signaling protein. The instability index (39.61) also indicates that the protein is stable, since values below 40 are classified as stable. The Grand Average of Hydropathicity (GRAVY) value is negative (-0.263), which indicates favorable interaction of the protein with water. The SOSUI program also reported an average hydrophobicity of -0.263343, confirming that the protein is soluble. Prediction of its subcellular localization with TargetP indicated that the protein is localized in the chloroplast, with a 63 amino acid long target peptide.

Structural and functional properties

Table 2 presents the results of the secondary structure prediction by SOPMA, which show that random coil is predominant (37.13%), followed by alpha helix (36.66%) and extended strand (17.76%). SOPMA also predicted the presence of beta turns (8.45%).

Table 2 (continued).

Random coil (Cc) | 391 | 37.13
Ambiguous states | 0 | 0.00
Other states | 0 | 0.00

The Conserved Domain Database showed only the presence of the C2 domain (COG5038). No other domains were found. The scan for motifs with MEME showed the presence of three motifs (table 3). The sequences of the motifs are [FW]M[GV]W[PQ][QR][CS][IK], L[CQ]VRW[PY] and [EV]F[EP]C[FS][FG]. All the motifs were present in two copies in the sequence. Of these, motifs 1 and 3 were part of the C2 domain. A search for these motifs in other proteins with MAST gave some interesting results. For motif 1, MAST returned 26 proteins with E-values less than 10, which include several FAB fragments, FV fragments, a few indolicidin (antimicrobial cationic peptide) and membrane glycoprotein entries and one cytochrome. For motif 2, 12 sequences were identified with E-values less than 10, including a few S-phase kinase-associated proteins and many dienelactone hydrolases. Only 5 proteins were identified with motif 3, which include bacterial toxins. SMART identified several low-complexity regions as well as two coiled-coil regions (table 4). No low-complexity region fell within the C2 domain. SMART also identified two SCOP domains (d1i19a1, i.e. FAD-linked oxidase, C-terminal domain, and d1hcia4, i.e. spectrin repeat). The Motif Scan tool identified one amidation site, two N-glycosylation sites, fourteen casein kinase II phosphorylation sites, eighteen N-myristoylation sites, sixteen protein kinase C phosphorylation sites, one cell attachment sequence, one each of alanine-rich, arginine-rich and glycine-rich regions, as well as one octapeptide repeat (table 5).
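The bracketed MEME motif patterns quoted above are valid regular expressions, so they can be checked directly against the site sequences listed in table 3. The sketch below is illustrative only: the helper `find_motif` and the toy sequence are assumptions, and this is not how MEME or MAST score motifs.

```python
import re

# Pattern and reported site sequences for each motif (from table 3).
MOTIFS = {
    "Motif 1": ("[FW]M[GV]W[PQ][QR][CS][IK]", ["FMGWQQSK", "WMVWPRCI"]),
    "Motif 2": ("L[CQ]VRW[PY]", ["LQVRWP", "LCVRWY"]),
    "Motif 3": ("[EV]F[EP]C[FS][FG]", ["EFECSF", "VFPCFG"]),
}


def find_motif(pattern: str, sequence: str) -> list:
    """Return 1-based start positions of all matches, including overlaps."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [m.start() + 1 for m in lookahead.finditer(sequence)]


# Every reported site should match its own motif pattern in full.
for name, (pattern, sites) in MOTIFS.items():
    ok = all(re.fullmatch(pattern, site) for site in sites)
    print(name, "sites match pattern:", ok)

# Hypothetical toy sequence with one copy of motif 1 and one of motif 2.
toy_sequence = "AAFMGWQQSKAALQVRWPAA"
print(find_motif(MOTIFS["Motif 1"][0], toy_sequence))  # [3]
print(find_motif(MOTIFS["Motif 2"][0], toy_sequence))  # [13]
```

Applied to the full SYT1 sequence, the same function would be expected to return the start positions listed in table 3, provided the sequence and numbering match those used by MEME.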

The initial BLAST search against the NCBI non-redundant protein database returned many plant synaptotagmins and some other C2 domain containing proteins among the top hits. ProFunc also identified several plant synaptotagmin genes related to SYT1 of Micromonas. Surprisingly, these proteins showed similarity only in the C2 domain region. The C-terminal and N-terminal regions outside the C2 domain did not show sequence similarity with any other protein. The CD search showed that the C2 domain spans from Asp282 to Gly495. The results also showed the presence of another small domain of the superfamily cl01482. As described in CDD, this superfamily represents bacterial proteins related to CpxP, a periplasmic protein that forms part of a two-component system which acts as a global modulator of cell-envelope stress in Gram-negative bacteria. In SYT1, this domain spans from Gly816 to Arg874.

Disordered regions of a protein facilitate its interactions and provide additional modification sites (Paital et al., 2011). Disopred predicted a total of 378 disordered amino acid residues (35.89%), spread over the protein in 14 regions. The longest disordered region extended from Glu47 to Thr181. The C2 domain, however, was not disordered, as no amino acid within this region was predicted to be disordered. Disordered regions play significant roles in protein interaction (Paital et al., 2011). These results suggest that this protein interacts with other proteins with novel properties.
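A per-residue disorder prediction can be summarized into regions and an overall percentage as sketched below. This is a minimal illustration, not Disopred's output format: the boolean flag list is a hypothetical input, and the function names are assumptions.

```python
def contiguous_regions(flags: list) -> list:
    """Group per-residue booleans into inclusive, 1-based (start, end) regions."""
    regions = []
    start = None
    for position, flag in enumerate(flags, start=1):
        if flag and start is None:
            start = position                      # a new region opens here
        elif not flag and start is not None:
            regions.append((start, position - 1))  # region closed at previous residue
            start = None
    if start is not None:                          # region running to the C-terminus
        regions.append((start, len(flags)))
    return regions


def disorder_summary(flags: list) -> dict:
    """Summarize disordered regions, percentage disordered and the longest region."""
    regions = contiguous_regions(flags)
    longest = max(regions, key=lambda r: r[1] - r[0] + 1) if regions else None
    return {
        "regions": regions,
        "percent_disordered": 100.0 * sum(flags) / len(flags),
        "longest": longest,
    }


# Hypothetical 10-residue example (True = predicted disordered).
toy_flags = [True, True, False, False, True, True, True, False, True, True]
summary = disorder_summary(toy_flags)
print(summary["regions"])             # [(1, 2), (5, 7), (9, 10)]
print(summary["percent_disordered"])  # 70.0
print(summary["longest"])             # (5, 7)
```

For SYT1, the same summary would report 14 regions, with Glu47-Thr181 as the longest, if fed the per-residue predictions described in the text.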

Table 4. Motifs identified with SMART.

Motif | No. of sites | Amino acid positions | E-value
Low complexity | 11 | 28-42, 56-77, 82-103, 117-134, 165-176, 241-252, 607-621, 636-655, 693-707, 742-760, 1018-1035 | ---
Coiled coil | 2 | 816-865, 940-980 | ---
SCOP: d1i19a1 (FAD-linked oxidases, C-terminal domain) | 1 | 198-321 | 2.20e+00
SCOP: d1hcia4 (Spectrin repeat) | 1 | 800-851 | 1.40e-01

Each of the top five models generated by I-TASSER was assigned a C-score. The C-score is a confidence score for estimating the quality of a predicted model: a high C-score signifies a model with high confidence and vice versa. Models with a C-score > -1.5 generally have a correct fold (Roy et al., 2010). The structure with the highest C-score (-0.9) was used for further studies. The template-modeling score (TM-score) provides a sensitive measure of the overall topological difference between a predicted structure and a template, with a higher score indicating a better structural match. A TM-score > 0.5 indicates correct overall topology for a modeled structure. The TM-score for the modeled protein of this study was 0.60 ± 0.14, which indicates that the model had correct overall topology. Additionally, the normalized Z-score for each threading alignment between the target and a given template indicates the significance of the alignment compared to the average. The I-TASSER documentation advises that a threading alignment with a normalized Z-score > 1 reflects a confident alignment. In this study, the normalized Z-scores for the top 10 templates used by I-TASSER ranged from 1.02 to 3.53, which reflects confident alignments.
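The three cutoffs described above can be collected into a simple check. The sketch below is illustrative: the class and function names are assumptions, the C-scores of the four discarded models are hypothetical placeholders (only -0.9 is reported), and only the two end points of the Z-score range are given in the text.

```python
from dataclasses import dataclass, field

C_SCORE_CUTOFF = -1.5   # models above this generally have a correct fold
TM_SCORE_CUTOFF = 0.5   # above this, overall topology is considered correct
Z_SCORE_CUTOFF = 1.0    # normalized Z-score above this: confident alignment


@dataclass
class ModelQuality:
    c_score: float
    tm_score: float
    z_scores: list = field(default_factory=list)


def assess(model: ModelQuality) -> dict:
    """Compare a model's scores with the cutoffs quoted in the text."""
    return {
        "fold_likely_correct": model.c_score > C_SCORE_CUTOFF,
        "topology_likely_correct": model.tm_score > TM_SCORE_CUTOFF,
        "all_alignments_confident": all(z > Z_SCORE_CUTOFF for z in model.z_scores),
    }


def best_model_index(c_scores: list) -> int:
    """Index of the model with the highest C-score."""
    return max(range(len(c_scores)), key=lambda i: c_scores[i])


# Only the first C-score (-0.9) is from the study; the rest are placeholders.
candidate_c_scores = [-0.9, -1.4, -2.1, -2.6, -3.0]
chosen = best_model_index(candidate_c_scores)

# Z-scores: only the reported minimum and maximum are known.
syt1_model = ModelQuality(c_score=-0.9, tm_score=0.60, z_scores=[1.02, 3.53])
print(chosen)              # 0
print(assess(syt1_model))
```

These are rules of thumb for screening models; they do not replace the stereochemical validation described next.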

Table 5. Motifs identified with Motif Scan.

Motif information | No. of sites | Amino acid residues | E-value
Amidation site | 1 | 81-84 | ---
N-glycosylation site | 2 | 79-82, 519-522 | ---
Casein kinase II phosphorylation site | 14 | 44-47, 65-68, 162-165, 198-201, 261-264, 349-352, 385-388, 535-538, 711-714, 847-850, 950-953, 985-988, 999-1002, 1043-1046 | ---
N-myristoylation site | 18 | 88-93, 126-131, 227-232, 335-340, 372-377, 404-409, 430-435, 467-472, 531-536, 627-632, 644-649, 654-659, 663-668, 684-689, 758-763, 802-807, 826-831, 995-1000 | ---
Protein kinase C phosphorylation site | 16 | 23-25, 33-35, 37-39, 46-48, 81-83, 127-129, 241-243, 394-396, 535-537, 565-567, 670-672, 723-725, 795-797, 807-809, 844-846, 950-952 | ---
Cell attachment sequence | 1 | 74-175 | ---
Alanine-rich region | 1 | 123-145 | 0.07
Arginine-rich region | 1 | 25-69 | 5.7
Glycine-rich region | 1 | 627-706 | 0.00059
Octapeptide repeat | 1 | 473-480 | 3.6

The PSVS suite analyzed the protein structure with the help of several tools. PROCHECK was used to generate the Ramachandran plot (figure 3), in which the shading represents the different regions of the plot: the darker the area, the more favorable the φ-ψ combination. Residues in most favored regions, additionally allowed regions and generously allowed regions were 79%, 14.7% and 4.9%, respectively. Only 1.3% of residues were in disallowed regions. Molprobity evaluates the stereochemical quality of a structure by calculating phi and psi torsion angles, backbone bond lengths and backbone bond angles. Molprobity provides a clashscore as a result of an all-atom contact analysis which is performed after adding hydrogen atoms to a structure. When non-donor-acceptor atoms overlap by more than 0.4 Å, at least one of the two atoms must be modeled incorrectly. A clash at this location is noted and incorporated into the clashscore, which is simply the number of clashes per 1000 atoms (Lovell et al., 2003). In this study, the clashscore was quite low (169.39). All these quality evaluation measures showed that the modeled structure was quite reliable.
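The clashscore definition given above reduces to a one-line calculation. The sketch below is illustrative only; the function names and the clash and atom counts are hypothetical and are not taken from the Molprobity report for this model.

```python
CLASH_OVERLAP_CUTOFF = 0.4  # angstroms, as quoted in the text


def is_clash(overlap_angstrom: float) -> bool:
    """A non-donor-acceptor atom pair counts as a clash above the cutoff."""
    return overlap_angstrom > CLASH_OVERLAP_CUTOFF


def clashscore(n_clashes: int, n_atoms: int) -> float:
    """Number of clashes per 1000 atoms."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return 1000.0 * n_clashes / n_atoms


# Hypothetical overlaps (in angstroms) for four atom pairs.
overlaps = [0.10, 0.45, 0.39, 0.80]
n_clashes = sum(is_clash(o) for o in overlaps)

print(n_clashes)                    # 2
print(clashscore(n_clashes, 5000))  # 0.4
```

Because the score is normalized per 1000 atoms, it can be compared between structures of different sizes.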

Overall three-dimensional structure of the protein




The modeled protein belongs to the α/β structural class (Chou and Zhang, 1995), as is evident from figure 4A. It is also notable that the protein formed a V-shaped structure. One arm of this V-shaped structure is dominated by beta sheets and the other by alpha helices. The volume of the protein was 149974 Å³. The C2 domain lies in the beta-sheet-rich arm (figure 4B). The modeled structure was submitted to PDBsum to show the secondary structures graphically. This showed the presence of 21 helices, 31 strands (which formed 10 sheets) and 10 beta hairpins. The topology (figure 5) showed that the N-terminal part consisted primarily of beta sheets, while the C-terminal portion was made up primarily of alpha helices along with some small beta strands. Of the 31 beta strands, 23 were present in the N-terminal region. The B-factor, which reflects spatial uncertainty, was calculated using FlexServ, the web-based tool for the analysis of protein flexibility. The minimum B-factor for a residue was 4.663 Å² and the maximum B-factor was 304.671 Å². The protein has six regions, seen as six peaks, with B-factor values above 100 Å² (figure 6A). In general, several loop regions showed more flexibility, as shown in figure 6B. Maximum flexibility was shown by Pro119, Leu120, Pro121, Thr482, Ala483, Pro718 and Leu719 (figure 6C). As loops do not form any rigid structure in the protein, these flexible regions seem to be vital for structural modifications of the protein.
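As a rough guide to what such B-factor values mean, the standard isotropic relation between a B-factor and the mean-square atomic fluctuation can be sketched as follows. This is an assumption-laden illustration: the source does not describe how FlexServ computes B-factors internally, and the short profile below is hypothetical apart from the reported minimum and maximum values.

```python
import math


def b_factor_from_msf(msf_angstrom2: float) -> float:
    """Isotropic B-factor from the mean-square fluctuation <dr^2> (in A^2):
    B = (8 * pi^2 / 3) * <dr^2>."""
    return (8.0 * math.pi ** 2 / 3.0) * msf_angstrom2


def msf_from_b_factor(b_factor: float) -> float:
    """Inverse of b_factor_from_msf."""
    return 3.0 * b_factor / (8.0 * math.pi ** 2)


def residues_above(b_factors: list, threshold: float = 100.0) -> list:
    """1-based indices of residues whose B-factor exceeds the threshold."""
    return [i for i, b in enumerate(b_factors, start=1) if b > threshold]


# Hypothetical four-residue profile; 4.663 and 304.671 are the reported extremes.
toy_profile = [4.663, 120.0, 304.671, 80.0]
print(residues_above(toy_profile))             # [2, 3]
print(round(msf_from_b_factor(304.671), 2))    # mean-square fluctuation in A^2
```

Under this relation, a B-factor of 100 Å² corresponds to a mean-square fluctuation of about 3.8 Å², which gives a sense of scale for the 100 Å² threshold used to define the six peaks.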

The disordered regions were mainly situated in the loop regions of the protein (figure 7A). Nineteen beta strands contained disordered regions, in contrast to only 4 alpha helices. The longest disordered region, Glu47 to Thr181, contained 6 beta strands and only 1 alpha helix. The Polyview 3D program estimated the interacting residues of the protein. A total of 275 residues were predicted to be interacting, i.e. interfacial (figure 7B). Comparison of the disordered regions with the interacting residues showed that 30 interacting residues were predicted to be disordered. Comparison of the FlexServ and Polyview results showed that all of the amino acids that contribute most to the flexibility of the protein, except Pro121, form part of the interacting surfaces of the protein. The distribution of electrostatic potentials (figure 7C) showed that the C2-domain is primarily neutral, with some negatively charged regions and a few positively charged regions. It is also notable that the highly flexible regions of the protein have either positive or negative electrostatic potentials. The presence of charged residues in the highly flexible loop regions suggests their participation in dynamic charge-mediated interactions with other molecules.
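Comparisons of this kind amount to set operations on residue numbers. The sketch below is a partial illustration: the full list of 275 interacting residues is not given in the text, so the interacting set here contains only the six flexible residues that the text states are interfacial, and the helper name `expand` is an assumption.

```python
def expand(ranges: list) -> set:
    """Expand inclusive (start, end) residue ranges into a set of positions."""
    residues = set()
    for start, end in ranges:
        residues.update(range(start, end + 1))
    return residues


# Most flexible residues reported from FlexServ.
flexible = {119, 120, 121, 482, 483, 718, 719}

# Partial, illustrative interacting set (the full Polyview list is not given).
interacting = {119, 120, 482, 483, 718, 719}

# Longest disordered region reported by Disopred (Glu47 to Thr181).
longest_disordered = expand([(47, 181)])

print(sorted(flexible & interacting))         # flexible and interfacial
print(sorted(flexible - interacting))         # [121]
print(sorted(flexible & longest_disordered))  # [119, 120, 121]
```

With the complete Polyview and Disopred outputs, the same intersections would yield the counts quoted in the text, such as the 30 interacting residues predicted to be disordered.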

Structure of the C2 domain and Ca2+-binding residues

The C2 domain consisted of four sheets (nine strands). Of these nine strands, one very small strand (Asp425-Arg427) was not displayed as a strand in the I-TASSER-generated model as viewed in Chimera, but it was shown in the PDBsum topology (figure 5). Otherwise, the topology generated by PDBsum matched the modeled structure. The C2 domain also contains three small alpha helices. However, the C2 domain is not composed entirely of helices and strands: 125 of 214 residues (58.41%) did not form any helix or sheet. Usually, the C2 domain forms a beta-sheet scaffold of eight antiparallel strands connected by loops (Reddy and Reddy, 2004). Loops 1-3 are placed on top of the sheets and coordinate Ca2+ binding (Sutton et al., 1995). Binding of Ca2+ by the C2 domain facilitates its interaction with negatively charged phospholipids. The protein studied here, however, interacts with the Ca2+ ion through amino acids within the C2 domain as well as amino acids outside it (Asp545, Pro546, Lys547, Ala548 and Gln549), as predicted by I-TASSER. The Ca2+ ion is surrounded by nine amino acids (figure 8A). Surprisingly, a similar binding site was found in the human integrin alphaXbeta2 ectodomain (PDB ID: 3K6S) (Xie et al., 2010). The Ca2+-bound model was submitted to PDBsum, and the LIGPLOT diagram showed bonding of the Ca2+ ion with the backbone nitrogen of Phe424. The Ca2+ ion also formed hydrogen bonds with Leu422, Asp545 and Pro546 (figure 8B).
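The residues surrounding the Ca2+ ion can, in principle, be listed directly from the coordinates of a Ca2+-bound model. The sketch below is an illustration under stated assumptions, not the I-TASSER or LIGPLOT procedure. It assumes that the ion is stored as a HETATM record with residue name CA, that the file follows the fixed-column PDB layout, and that a distance cutoff of 3.0 Å is appropriate. The helper names and the three example atoms with their coordinates are hypothetical.

```python
import math


def pdb_line(record, serial, atom, res, chain, resseq, x, y, z):
    """Build one fixed-column PDB coordinate line (illustrative helper)."""
    return (f"{record:<6}{serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed PDB column positions."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "record": line[0:6].strip(),
                "res_name": line[17:20].strip(),
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]),
                        float(line[46:54])),
            })
    return atoms


def residues_near_calcium(pdb_text, cutoff=3.0):
    """Return sorted (residue number, residue name) pairs that have at
    least one atom within `cutoff` angstroms of a calcium ion."""
    atoms = parse_atoms(pdb_text)
    # Assumption: the ion is a HETATM record with residue name "CA".
    ions = [a for a in atoms
            if a["record"] == "HETATM" and a["res_name"] == "CA"]
    protein = [a for a in atoms if a["record"] == "ATOM"]
    near = set()
    for ion in ions:
        for atom in protein:
            if math.dist(ion["xyz"], atom["xyz"]) <= cutoff:
                near.add((atom["res_seq"], atom["res_name"]))
    return sorted(near)


# Hypothetical coordinates chosen only to exercise the function.
example = "\n".join([
    pdb_line("ATOM", 1, "CA", "GLY", "A", 10, 10.0, 0.0, 0.0),
    pdb_line("ATOM", 2, "OD1", "ASP", "A", 545, 2.4, 0.0, 0.0),
    pdb_line("ATOM", 3, "O", "PRO", "A", 546, 0.0, 2.5, 0.0),
    pdb_line("HETATM", 4, "CA", "CA", "A", 900, 0.0, 0.0, 0.0),
])

print(residues_near_calcium(example))  # [(545, 'ASP'), (546, 'PRO')]
```

Filtering on the record type keeps the alpha-carbon atoms of the protein, which also carry the atom name CA, from being mistaken for the ion.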

Conclusion

The putative synaptotagmin protein from the picoeukaryotic plankton Micromonas investigated in this study is a novel member of the C2-domain-containing protein family, as it did not show any sequence similarity with other members of the family outside the C2 domain in the NCBI BLAST search. The NJ tree built from the sequence alignment also showed that the protein is distinct from other C2-domain-containing proteins of the plant kingdom. Finally, this analysis provides insight into the unique structural properties of the protein as well as its novel mode of interaction with calcium. The predicted model of the protein is useful for experimental studies of the signaling mechanisms involving this protein. The interaction between the protein and the Ca2+ ion proposed in this study is useful for understanding the potential mechanism of action of this protein and its evolutionary significance.

Acknowledgement

The facility situated at the Department of Botany,

Dinabandhu Mahavidyalaya is gratefully

acknowledged.

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. Journal of Molecular Biology 215(3), 403-410.

Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389-3402.

Bailey TL, Boden M, Buske FA, Frith M,

Grant CE, Clementi L, Ren J, Li WW, Noble

WS. 2009. MEME SUITE: tools for motif

discovery and searching. Nucleic Acids Research

37, W202-W208.

Berman HM, Westbrook J, Feng Z, Gilliland

G, Bhat TN, Weissig H, Shindyalov IN,

Bourne PE. 2000. The Protein Data Bank.

Nucleic Acids Research 28, 235-242.

Bhattacharya A, Tejero R, Montelione GT.

2007. Evaluating protein structures determined by

structural genomics consortia. Proteins 66, 778-

795.

Camps J, Carrillo O, Emperador A, Orellana L, Hospital A, Rueda M, Cicin-Sain D, D'Abramo M, Gelpí JL, Orozco M. 2009. FlexServ: an integrated tool for the analysis of protein flexibility. Bioinformatics 25(13), 1709-1710.

Castrignanò T, De Meo PD, Cozzetto D, Talamo IG, Tramontano A. 2006. The PMDB Protein Model Database. Nucleic Acids Research 34, D306-D309.

Cedano J, Aloy P, Pérez-Pons JA, Querol E. 1997. Relation between amino acid composition and cellular location of proteins. Journal of Molecular Biology 266(3), 594-600.

Chou KC, Zhang CT. 1995. Prediction of protein structural classes. Critical Reviews in Biochemistry and Molecular Biology 30, 275-349.

Clamp M, Cuff J, Searle SM, Barton GJ.

2004. The Jalview Java Alignment Editor.

Bioinformatics 20, 426-427.

Craxton M. 2004. Synaptotagmin gene content of

the sequenced genomes. BMC Genomics 5, 43.

Dolan MA, Noah JW, Hurt D. 2012. Comparison of common homology modeling algorithms: application of user-defined alignments. In: Orry AJW, Abagyan R, eds. Homology Modeling: Methods and Protocols, Methods in Molecular Biology, vol. 857. Humana Press, USA, 399-414.

Emanuelsson O, Nielsen H, Brunak S, von Heijne G. 2000. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. Journal of Molecular Biology 300(4), 1005-1016.

Gasteiger E, Hoogland C, Gattiker A, Duvaud

S, Wilkins MR, Appel RD, Bairoch A. 2005.

Protein Identification and Analysis Tools on the

ExPASy Server. In: Walker JM, ed. The Proteomics

Protocols Handbook. Humana Press, Totowa, New

Jersey, USA, 571-607.

Geourjon C, Deléage G. 1995. SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Computer Applications in the Biosciences 11, 681-684.

Ginalski K. 2006. Comparative modeling for protein structure prediction. Current Opinion in Structural Biology 16(2), 172-177.

Hirokawa T, Boon-Chieng S, Mitaku S. 1998.

SOSUI: classification and secondary structure

prediction system for membrane proteins.

Bioinformatics 14, 378-379.

Kelley LA, Sternberg MJE. 2009. Protein structure prediction on the web: a case study using the Phyre server. Nature Protocols 4, 363-371.

Larkin MA, Blackshields G, Brown NP,

Chenna R, McGettigan PA, McWilliam H,

Valentin F, Wallace IM, Wilm A, Lopez R,

Thompson JD, Gibson TJ, Higgins DG. 2007.

ClustalW and ClustalX version 2. Bioinformatics

23(21), 2947-2948.

Laskowski RA. 2009. PDBsum new things.

Nucleic Acids Research 37, D355-D359.

Laskowski RA, MacArthur MW, Moss DS, Thornton JM. 1993. PROCHECK: a program to check the stereochemistry of protein structures. Journal of Applied Crystallography 26, 283-291.

Laskowski RA, Watson JD, Thornton JM.

2005a. ProFunc: a server for predicting protein

function from 3D structure. Nucleic Acids Research

33, W89-W93.

Laskowski RA, Watson JD, Thornton JM.

2005b. Protein function prediction using local 3D

templates. Journal of Molecular Biology 351, 614-

626.

Letunic I, Doerks T, Bork P. 2012. SMART 7:

recent updates to the protein domain annotation

resource. Nucleic Acids Research 40(D1), D302-

D305.

Lewis JD, Lazarowitz SG. 2010. Arabidopsis synaptotagmin SYTA regulates endocytosis and virus movement protein cell-to-cell transport. Proceedings of the National Academy of Sciences USA 107(6), 2491-2496.

Lovell SC, Davis IW, Arendall WB, de Bakker PIW, Word JM, Prisant MG, Richardson JS, Richardson DC. 2003. Structure validation by Cα geometry: φ, ψ and Cβ deviation. Proteins 50, 437-450.

Marchler-Bauer A, Bryant SH. 2004. CD-Search: protein domain annotations on the fly. Nucleic Acids Research 32, W327-W331.

Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, Deweese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Lu F, Marchler GH, Mullokandov M, Omelchenko MV, Robertson CL, Song JS, Thanki N, Yamashita RA, Zhang D, Zhang N, Zheng C, Bryant SH. 2011. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Research 39, D225-D229.

Nalefski EA, Falke JJ. 1996. The C2 domain

calcium-binding motif: Structural and functional

diversity. Protein Science 5, 2375-2390.

Pagni M, Ioannidis V, Cerutti L, Zahn-Zabal

M, Jongeneel CV, Hau J, Martin O,

Kuznetsov D, Falquet L. 2007. MyHits:

improvements to an interactive resource for

analyzing protein sequences. Nucleic Acids

Research 35, W433-W437.

Paital B, Kumar S, Farmer R, Tripathy NK,

Chainy GBN. 2011. In silico Prediction and

characterization of 3D structure and binding

properties of catalase from the commercially

important crab, Scylla serrata. Interdisciplinary

Sciences: Computational Life Science 3, 110-120.

Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. 2004. UCSF Chimera - a visualization system for exploratory research and analysis. Journal of Computational Chemistry 25(13), 1605-1612.

Porollo A, Meller J. 2007. Versatile Annotation

and Publication Quality Visualization of Protein

Complexes Using POLYVIEW-3D. BMC

Bioinformatics 8, 316.

Pruitt KD, Tatusova T, Maglott DR. 2007. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research 35, D61-D65.

Reddy VS, Reddy ASN. 2004. Proteomics of calcium-signaling components in plants. Phytochemistry 65, 1745-1776.

Rhodes G. 2006. Crystallography Made Crystal

Clear. 3rd ed., Academic Press, Burlington, MA.

Rizo J, Südhof TC. 1998. C2-domains, structure and function of a universal Ca2+-binding domain. Journal of Biological Chemistry 273, 15879-15882.

Roy A, Kucukural A, Zhang Y. 2010. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols 5(4), 725-738.

Schultz J, Milpetz F, Bork P, Ponting CP. 1998. SMART, a simple modular architecture research tool: identification of signaling domains. Proceedings of the National Academy of Sciences USA 95, 5857-5864.

Sigrist CJA, Cerutti L, de Castro E,

Langendijk-Genevaux PS, Bulliard V,

Bairoch A, Hulo N. 2010. PROSITE, a protein

domain database for functional characterization and

annotation. Nucleic Acids Research 38, D161-D166.

Söding J, Biegert A, Lupas AN. 2005. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research 33, W244-W248.

Sutton RB, Davletov BA, Berghuis AM, Südhof TC, Sprang SR. 1995. Structure of the first C2 domain of synaptotagmin I: a novel Ca2+/phospholipid-binding fold. Cell 80, 929-938.

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution 28, 2731-2739.

Voss NR, Gerstein M, Steitz TA, Moore PB.

2006. The geometry of the ribosomal polypeptide

exit tunnel. Journal of Molecular Biology 360(4),

893-906.


Voss NR. 2007. Geometric Studies of RNA and Ribosomes, and Ribosome Crystallization. PhD dissertation, Yale University.

Wallace AC, Laskowski RA, Thornton JM.

1995. LIGPLOT: a program to generate schematic

diagrams of protein-ligand interactions. Protein

Engineering design & selection 8(2), 127-134.

Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT. 2004. The DISOPRED server for the prediction of protein disorder. Bioinformatics 20, 2138-2139.

Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. 2009. Jalview version 2: a multiple sequence alignment and analysis workbench. Bioinformatics 25(9), 1189-1191.

Worden AZ, Lee J-H, Mock T, Rouzé P, Simmons MP, Aerts AL, Allen AE, Cuvelier ML, Derelle E, Everett MV, Foulon E, Grimwood J, Gundlach H, Henrissat B, Napoli C, McDonald SM, Parker MS, Rombauts S, Salamov A, Von Dassow P, Badger JH, Coutinho PM, Demir E, Dubchak I, Gentemann C, Eikrem W, Gready JE, John U, Lanier W, Lindquist EA, Lucas S, Mayer KF, Moreau H, Not F, Otillar R, Panaud O, Pangilinan J, Paulsen I, Piegu B, Poliakov A, Robbens S, Schmutz J, Toulza E, Wyss T, Zelensky A, Zhou K, Armbrust EV, Bhattacharya D, Goodenough UW, Van de Peer Y, Grigoriev IV. 2009. Green evolution and dynamic adaptations revealed by the genomes of the marine picoeukaryote Micromonas. Science 324, 268-272.

Xie C, Zhu J, Chen X, Mi L, Nishida N, Springer TA. 2010. Structure of an integrin with an alphaI domain, complement receptor type 4. EMBO Journal 29(3), 666-679.

Zhang Y. 2007. Template-based modeling and

free modeling by I-TASSER in CASP7. Proteins

69(S8), 108-117.
