REGULAR ARTICLE
In Silico Functional and Structural Characterization of H1N1 Influenza A Viruses Hemagglutinin, 2010–2013, Shiraz, Iran
Afagh Moattari1 • Behzad Dehghani1 •
Nastaran Khodadad1 • Forogh Tavakoli1
Received: 8 September 2014 / Accepted: 6 May 2015 / Published online: 12 May 2015
© Springer Science+Business Media Dordrecht 2015
Abstract Hemagglutinin (HA) is a major virulence factor of influenza viruses and
plays an important role in viral pathogenesis. Analysis of amino acid changes,
epitope regions, and glycosylation and phosphorylation sites has greatly contributed
to the development of new generations of vaccines. The hemagglutinins of 10 selected
isolates, 8 from 2010 and 2 from 2013, were sequenced and analyzed with several
bioinformatic tools, and the results were compared with those of 3 vaccine isolates.
The study detected several amino acid changes related to altered epitope sites,
modification sites and physico-chemical properties. The results also showed some
conserved modification sites in the HA structure. This study is the first
analytical research on isolates obtained from Shiraz, Iran, and our results can be
used to better understand the genetic diversity and antigenic variation of Iranian
and Asian pathogenic H1N1 strains.
Keywords Influenza · Hemagglutinin · In silico · Glycosylation · Phosphorylation
Electronic supplementary material The online version of this article
(doi:10.1007/s10441-015-9260-1) contains supplementary material, which is available to
authorized users.
Correspondence: Afagh Moattari
1 Influenza Research Center, Department of Bacteriology and Virology, Shiraz University of
Medical Sciences, 71348-45794 Shiraz, Iran
Acta Biotheor (2015) 63:183–202
DOI 10.1007/s10441-015-9260-1
1 Introduction
In recent years, pandemic influenza has generally been caused by influenza A viruses;
the type A (H1N1) virus alone infected millions of people and caused 18,000 deaths
globally prior to May 30, 2010 (Girard et al.
2010; Taubenberger and Morens 2010). Among many subtypes of influenza A virus,
H1N1, H2N2, and H3N2 subtypes have efficiently been adapted to transmit to and
infect humans (Bouvier and Lowen 2010; Schrauwen and Fouchier 2014). For many
years, H1N1 has accounted for most influenza epidemics. Unlike seasonal influenza,
it has caused severe respiratory illness with high mortality rates (worldwide, 20–50
million deaths in 1918) (Ma et al. 2011; Taubenberger and Morens 2006). The
emergence and transition of type A (H1N1) pdm09 in 2009 resulted in a new
pandemic as declared by the World Health Organization (WHO, 6 August 2010).
The virus RNA encodes eleven proteins including HA, NA, NP, M1, M2, NS1,
NEP, PA, PB1, PB1-F2, PB2, of which hemagglutinin (HA) and neuraminidase
(NA) are two surface glycoproteins that interact with cellular receptors and play an
important role in cellular attachment (Kapoor and Dhama 2014; Mishin et al. 2005).
Mishin et al. (2005) reported the role of HA in binding to cellular receptors and
the functional balance between HA and NA in influenza virus infection. HA is
synthesized in the endoplasmic reticulum as an HA precursor (HA0) that is post-
translationally cleaved into two subunits of HA1 and HA2 (Boulay et al. 1988).
Influenza A virus cellular receptors contain terminal neuraminic acid (NeuAc)
moieties (Mishin et al. 2005). Pathogenicity, virus infection and spread of the virus
depend on the HA0 cleavage. The HA1 subunit carries the NeuAc-binding site, and
the HA2 subunit is responsible for fusion of viral and cellular membrane (Mishin
et al. 2005).
Structurally, HA is a trimeric glycoprotein comprising a globular head and a stem
region. The globular head includes the receptor-binding domain and the major antigenic
sites, and the stem contains the fusion peptide and supports the globular domain (Das et al.
2010; Wang et al. 2009).
HA modifications include glycosylation and phosphorylation. Co-translational or
post-translational glycosylation of HA is essential for folding and
transport (Anwar et al. 2006; Das et al. 2010; Wang et al. 2009).
Frequent mutations in HA are related to variation in antigenic epitopes, which affects
antibody recognition, allows escape from immune responses, and has an impact on
vaccination (Han and Marasco 2011).
Eighteen HA subtypes were recognized and for some subtypes high resolution
crystal structures were determined (H1, H2, H3, H5, H7, H9, H14) (Sun et al. 2010;
Tong et al. 2012, 2013).
Several studies have focused on the relationship between functional and
structural properties of HA subtypes and determined important structures related
to special function (Isin et al. 2002; Sriwilaijaroen and Suzuki 2012; Sun et al.
2010).
These studies provided useful data for identifying the corresponding structural
and functional modules in HA. Comparing the similarities and differences between
HA modules could help to define the properties of other HA molecules.
Bioinformatic analysis of HA is a useful method for identifying changes in amino
acids, modification sites, and B cell and T cell epitopes (Das
et al. 2010; Sun et al. 2010). The study of amino acid variations related to epitopes
could lead to a new generation of vaccines against influenza. This study attempted to
determine the major changes in the HA protein of influenza viruses isolated in the
Virology Department of Shiraz University Medical School in 2010 and 2013,
compared with those of vaccine strains introduced by the World Health Organization
(WHO).
2 Materials and Methods
2.1 Sampling
The present study comprised 772 patients selected from cases of pandemic influenza A
(H1N1) infection in Shiraz, southern Iran, between May 2010 and February 2013.
The specimens collected from the patients were placed in viral transport medium,
transported under refrigeration to the virology laboratory of Shiraz University of
Medical Sciences (SUMS), and stored at -70 °C until tested. The study was approved
by the ethics committee of SUMS.
2.2 RNA Extraction and Real Time Reverse Transcription (rRT)-PCR
RNA extraction was carried out using the Roche High Pure Viral RNA Extraction Kit
(Roche, Mannheim, Germany) according to the manufacturer’s instructions.
Extracted RNAs were kept at -80 °C until further processing. rRT-PCR
was carried out using the SuperScript III Platinum One-Step Quantitative RT-PCR kit
(Invitrogen), and real-time runs were performed on the Corbett Rotor-Gene 6000
system. The reaction comprised 4 µl of the extracted RNA combined
with 16 µl of the master mix, which included 2× reaction mix, SuperScript III RT/
Platinum Taq Mix, 5.4 µl RNase/DNase-free water and 0.4 µl of each primer and
probe. Each RNA isolate was tested with separate primer/probe sets for detection of
universal swine influenza A (swFLUA), swine H1 and RNase P. According to the
CDC real-time RT-PCR protocol, the cycling conditions included a 30 min RT step
at 50 °C, followed by enzyme inactivation at 95 °C for 2 min. The PCR step included 45
cycles of 95 °C for 15 s, 55 °C for 30 s, and 72 °C for 30 s. Data collection and
analysis of the real-time PCR assay were accomplished using the Rotor-Gene data
analysis software, version 6.0A.
Isolates that tested positive for H1N1pdm09 were grown in MDCK cells.
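As a quick consistency check on the cycling programme described above, the following sketch adds up the programmed hold times and the reaction volume. The step labels (denaturation, annealing, extension) are conventional names assumed here for illustration; they are not taken from the protocol text, and temperature ramping between steps is ignored.

```python
# Hold times of the rRT-PCR programme as stated in the text.
# Step labels are conventional names assumed for illustration only.
RT_STEP_S = 30 * 60          # reverse transcription at 50 °C
INACTIVATION_S = 2 * 60      # enzyme inactivation at 95 °C
CYCLE_STEPS = [              # (label, temperature in °C, seconds)
    ("denaturation", 95, 15),
    ("annealing", 55, 30),
    ("extension", 72, 30),
]
N_CYCLES = 45

# Reaction volume in µl: extracted RNA plus master mix.
REACTION_VOLUME_UL = 4 + 16


def total_hold_seconds() -> int:
    """Sum of programmed hold times; temperature ramping is not modelled."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE_STEPS)
    return RT_STEP_S + INACTIVATION_S + N_CYCLES * per_cycle


if __name__ == "__main__":
    print(f"programmed hold time: {total_hold_seconds() / 60:.2f} min")
    print(f"reaction volume: {REACTION_VOLUME_UL} µl")
```

The real run time on the instrument would be longer than this sum because of ramping and data acquisition.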
2.3 Virus Isolation
The swabs were vortexed in 5 ml DMEM medium for a few minutes to dislodge and
suspend adherent viruses. Confluent monolayers of Madin–Darby canine kidney (MDCK)
cells were inoculated with 200 µl of the viral suspensions proven positive
by real-time PCR. The monolayers were maintained in serum-free Dulbecco’s
Modified Eagle’s Medium (Sigma) supplemented with 2 mg/ml trypsin (Gibco
BRL, Life Technologies), 100 µg/ml streptomycin and 100 units/ml penicillin G.
The cultures were incubated at 34 °C and examined daily for cytopathic effect,
which was confirmed by the ability of infected cultures to agglutinate guinea pig
erythrocytes no later than 7 days post-infection.
2.4 Sequencing
The PCR products of the HA gene from 8 isolates of 2010 and 2 isolates of 2013
were purified with a commercial gel extraction kit (Qiagen GmbH, Hilden, Germany)
and subsequently sequenced. The nucleotide sequences obtained in this study were
submitted to GenBank under the accession numbers listed below.
2.5 Selection of HA for Analysis
For bioinformatic analysis, 10 sequences were submitted (full length: 1701 bp, 567
amino acids):
GenBank:HQ419004.1 (A/Shiraz/1/2010(H1N1)), GenBank:HQ419005.1 (A/Shiraz/2/2010(H1N1)),
GenBank:HQ419006.1 (A/Shiraz/3/2010(H1N1)), GenBank:HQ419007.1 (A/Shiraz/4/2010(H1N1)),
GenBank:HQ419008.1 (A/Shiraz/5/2010(H1N1)), GenBank:HQ419009.1 (A/Shiraz/6/2010(H1N1)),
GenBank:HQ419010.1 (A/Shiraz/7/2010(H1N1)), GenBank:HQ419011.1 (A/Shiraz/8/2010(H1N1)),
GenBank:KJ781217.1 (A/Shiraz/38/2013(H1N1)) and GenBank:KJ781218.1 (A/Shiraz/43/2013(H1N1)).
Three vaccine isolates, GenBank:FJ981613 (A/California/07/2009(H1N1)), GenBank:CY058519
(California/07/2009 × NYMC X-157) and GenBank:CY030232 (A/Brisbane/59/2007(H1N1)), were
obtained from http://www.ncbi.nlm.nih.gov.
For easier reading, abbreviations were used instead of the names of isolates:
Shiraz 1–Shiraz 8, Shiraz 38, Shiraz 43, Calif, Calif X-157, and Brisbane (Table 1).
2.6 Amino Acid Changes and Phylogenetic Trees
For analysis of the mutations in all 13 HA sequences, translation and editing were
carried out with CLC Sequence Viewer, version Beta (QIAGEN). The alignment
of the translated peptides of all sequences was generated using CLUSTAL X
software, version 1.81. Phylogenetic trees were constructed by the neighbor-joining
(NJ) and maximum-likelihood (ML) methods, with 100 bootstrap replicates to confirm
the reliability of the phylogenetic trees.
Table 1 Abbreviations used instead of isolate names
GenBank                                      Abbreviation
HQ419004.1 (A/Shiraz/1/2010(H1N1))           Shiraz 1
HQ419005.1 (A/Shiraz/2/2010(H1N1))           Shiraz 2
HQ419006.1 (A/Shiraz/3/2010(H1N1))           Shiraz 3
HQ419007.1 (A/Shiraz/4/2010(H1N1))           Shiraz 4
HQ419008.1 (A/Shiraz/5/2010(H1N1))           Shiraz 5
HQ419009.1 (A/Shiraz/6/2010(H1N1))           Shiraz 6
HQ419010.1 (A/Shiraz/7/2010(H1N1))           Shiraz 7
HQ419011.1 (A/Shiraz/8/2010(H1N1))           Shiraz 8
KJ781217.1 (A/Shiraz/38/2013(H1N1))          Shiraz 38
KJ781218.1 (A/Shiraz/43/2013(H1N1))          Shiraz 43
FJ981613 (A/California/07/2009(H1N1))        Calif
CY058519 (California/07/2009 × NYMC X-157)   Calif X-157
CY030232 (A/Brisbane/59/2007(H1N1))          Brisbane
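Distance-based tree building such as NJ starts from a matrix of pairwise distances, and bootstrap support comes from recomputing that matrix on resampled alignment columns. The sketch below illustrates only these two steps under stated assumptions: it uses a simple p-distance (proportion of differing sites), and the three sequences `iso_A`, `iso_B` and `iso_C` are made-up toy strings, not the study isolates. It does not reproduce the CLUSTAL X workflow and stops short of building the tree itself.

```python
import random
from itertools import combinations

# Hypothetical aligned toy sequences (equal length); not data from the study.
TOY_ALIGNMENT = {
    "iso_A": "MKAILVVLLY",
    "iso_B": "MKAILVVLLF",
    "iso_C": "MKVKLLVLLC",
}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment: dict) -> dict:
    """Pairwise p-distances keyed by (name1, name2)."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    print(distance_matrix(TOY_ALIGNMENT))
    rng = random.Random(0)
    replicates = [distance_matrix(bootstrap_alignment(TOY_ALIGNMENT, rng)) for _ in range(100)]
    print(len(replicates), "bootstrap distance matrices")
```

In a full analysis each replicate matrix would be passed to a tree-building routine, and the bootstrap score of a cluster would be the percentage of replicate trees in which that cluster appears.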
2.7 Primary Sequence Analysis
Theoretical isoelectric point (pI), molecular weight, total number of positive and
negative residues, extinction coefficient, instability index, aliphatic index and grand
average of hydropathicity (GRAVY) were evaluated using ExPASy ProtParam
(http://expasy.org/tools/protparam.html) (Gasteiger et al. 2005).
ProtScale (http://us.expasy.org/tools/protscale.html) was used to calculate
the number of codons, bulkiness, polarity, refractivity, recognition factors,
hydrophobicity, transmembrane tendency, percent buried residues, percent accessible
residues, average area buried, average flexibility, relative mutability, and the
number of amino acids (Gasteiger et al. 2005).
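Two of the quantities listed above have simple closed forms: GRAVY is the mean Kyte–Doolittle hydropathy value over all residues, and the aliphatic index is A + 2.9·V + 3.9·(I + L), where A, V, I and L are the mole percentages of alanine, valine, isoleucine and leucine. ProtScale-type profiles are sliding-window averages of a per-residue scale. The following is a minimal sketch of these calculations, not the ExPASy implementation; the window width of 9 is an assumption, since the text does not state the setting used, and the function names are illustrative.

```python
# Kyte-Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)
    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (
        mole_percent("I") + mole_percent("L")
    )


def window_profile(seq: str, scale: dict, window: int = 9) -> list:
    """Sliding-window mean of a residue scale as (1-based centre position, mean) pairs."""
    if window % 2 == 0 or window > len(seq):
        raise ValueError("window must be odd and no longer than the sequence")
    half = window // 2
    profile = []
    for centre in range(half, len(seq) - half):
        residues = seq[centre - half:centre + half + 1]
        profile.append((centre + 1, sum(scale[aa] for aa in residues) / window))
    return profile


if __name__ == "__main__":
    toy = "MKAILVVLLYTFATANADTL"  # arbitrary toy peptide, not an HA sequence
    print(round(gravy(toy), 3), round(aliphatic_index(toy), 2))
    print(window_profile(toy, KYTE_DOOLITTLE)[:3])
```

Equal weighting of all window positions is used here for simplicity; other weighting schemes would change the profile near peaks.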
2.8 Immuno-Informatic Analysis
B cell epitope positions were determined at www.immuneepitope.org
(http://tools.immuneepitope.org/tools/bcell/iedb_input). The server uses the following
methods: the Chou and Fasman method (Chou and Fasman 2006) for beta-turn prediction; the
Karplus and Schulz method (Karplus and Schulz 1985) for predicting flexibility; the Emini
method (Emini et al. 1985) for predicting surface accessibility; and the Parker method
(Parker et al. 1986) for hydrophilicity evaluation.
Linear B cell epitopes were also predicted by BepiPred (Larsen et al. 2006)
(http://www.cbs.dtu.dk/services/BepiPred/). BcePred at
http://www.imtech.res.in/raghava/bcepred was run on the sequences to detect polarity-based
B cell epitopes in addition to the properties used by the previous server (Saha and
Raghava 2004). ABCpred at http://www.imtech.res.in/raghava/abcpred/ was also used to
predict B cell epitopes (Saha and Raghava 2006b).
The probability of antigenicity was estimated with VaxiJen
(http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) (Doytchinova and
Flower 2007), using the default threshold of 0.4. AlgPred (Saha and
Raghava 2006a) at http://www.imtech.res.in/raghava/algpred/submission.html was
also used to predict IgE epitopes.
2.9 Functional Characterization
DISPHOS (http://www.dabi.temple.edu/disphos/pred.html) (Iakoucheva et al. 2004)
and NetPhos (http://www.cbs.dtu.dk/services/NetPhos/) (Blom et al. 1999) were
used to predict serine, threonine and tyrosine phosphorylation sites in eukaryotic
proteins. NetPhosK (http://www.cbs.dtu.dk/services/NetPhosK/) (Blom et al. 2004)
was used to determine kinase-specific phosphorylation sites in eukaryotic proteins.
N-glycosylation sites were predicted using NetNGlyc
(http://www.cbs.dtu.dk/services/NetNGlyc/) (Gupta and Brunak 2002) and GlycoEP
(http://www.imtech.res.in/raghava/glycoep/submit.html) (Chauhan et al. 2013).
2.10 Secondary Structure Prediction
SOPMA software at http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html
(Geourjon and Deleage 1995) was used to predict the secondary
structure of all sequences. The results were confirmed by the Phyre server at
http://www.sbg.bio.ic.ac.uk/phyre (Kelley and Sternberg 2009); ALPHAPRED,
BetaTpred2 (Kaur and Raghava 2003b), and GAMMAPred (Kaur and Raghava 2003a)
software at http://www.imtech.res.in; and RONN at http://www.strubi.ox.ac.uk/RONN
(Yang et al. 2005).
2.11 Tertiary Structure Prediction and Validation
All 3D structures were built with I-TASSER (Roy et al. 2010) at
http://zhanglab.ccmb.med.umich.edu/I-TASSER, the Phyre2 server (Kelley and Sternberg
2009) at http://www.sbg.bio.ic.ac.uk/~phyre2/html and the (PS)2 server (Chen et al. 2006)
at http://ps2v2.life.nctu.edu.tw. QMEAN (Benkert et al. 2008) at
http://swissmodel.expasy.org/qmean/cgi/index.cgi was employed to evaluate the
stereochemistry and quality of the models. The Ramachandran plots were generated by
RAMPAGE at http://mordred.bioc.cam.ac.uk/~rapper/rampage.php.
3 Results
3.1 Phylogenetic Results
The phylogenetic tree for the 13 isolates constructed by the NJ method is shown in Fig. 1.
The tree contains two main clades. The upper clade was divided into two clusters: in the
first cluster, Calif and Calif X-157 were closer to each other than to Brisbane, and in the
second cluster, Shiraz 38 and Shiraz 43 were very close, with a bootstrap score of 94. The
lower clade was also divided into two clusters, the first containing Shiraz 4 and Shiraz 5
and the second containing the other isolates. The ML tree also contained two main clades.
The upper clade was divided into two clusters: in the first cluster, Shiraz 1–8, Calif X-157
and Calif were close, and in the second cluster, Shiraz 38 and Shiraz 43 were very closely
related. The lower clade contained the Brisbane isolate.
3.2 Amino Acid Changes
Comparison of the patient and vaccine isolates showed changes in several amino
acids. Relative to Calif and Calif X-157, changes were observed at positions 100, 220,
and 104, and similar changes were found in the sequences of all isolates
(Table 2).
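Substitutions such as P100S are read as reference residue, position, then the residue found in the isolate. A minimal sketch of how such labels can be derived from two aligned sequences of equal length is shown below; the function name and the toy peptides are illustrative assumptions, not the software used in the study.

```python
def substitutions(reference: str, query: str, first_position: int = 1) -> list:
    """List substitutions between two aligned sequences in the form 'P100S'.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for position, (ref_aa, query_aa) in enumerate(zip(reference, query), start=first_position):
        if ref_aa != query_aa and "-" not in (ref_aa, query_aa):
            changes.append(f"{ref_aa}{position}{query_aa}")
    return changes


if __name__ == "__main__":
    # Toy peptides for illustration only (not HA fragments).
    print(substitutions("MKPNS", "MKSNT"))  # ['P3S', 'S5T']
```

The `first_position` argument allows a fragment to be numbered according to its position in the full-length protein.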
3.3 ProtParam and Protscale Properties
The protein sequences were evaluated using ProtParam. The physico-chemical
properties, molecular weight, aliphatic index and grand average of
hydropathicity were similar in the patients’ isolates and the vaccine sequences. The pI
results fell into 3 groups: the first group comprised the vaccine isolates and Shiraz 6
(pI 6.74–7.19), the second Shiraz 7 (pI 7.51), and the third
Shiraz 1–Shiraz 5, Shiraz 8, Shiraz 38 and Shiraz 43 (pI 7.81–8.22).
There was no significant difference between the instability indexes of the
isolates, all of which were predicted to be stable proteins.
ProtScale results for several properties of the patient and vaccine isolates
showed no significant difference; the results revealed a high degree of similarity in
many of the isolates’ features.
Fig. 1 Phylogenetic trees for 13 sequences by the NJ and ML methods, bootstrap 100
Table 2 Comparison of amino acid residue changes among isolates and vaccine isolates California and California X-157
Columns: Shiraz 1, Shiraz 2, Shiraz 3, Shiraz 4, Shiraz 5, Shiraz 6, Shiraz 7, Shiraz 8, Shiraz 38, Shiraz 43
Popular amino acid changes (rows): P100S, S220T, N104K (N104E in one isolate), S102T, G105W, D291N, I338V, V251I, E391K, H155R, S202T, E516K, I477M, G157A, C320Y, S468N, V226L, N461D, T305P
Unique amino acid changes (cells in the order printed): G242R, E252Q, E260Q; A261P; D103H, T262P, T316K, P321S; Y175F, P176S, S181T, T183L, A278V, G378K; I341V, G478K; S138N, R222K; D239G; Y10F, I22L, D114N, K300Q, P314L, D489N; Q205H, S285P, K300E, V428L, N472K, A549E
(The per-isolate + marks of this table are not legible in this copy.)
3.4 B Cell Epitopes Analysis
The positions of the B cell epitopes predicted by immuneepitope at the 80 % identity
level are shown in Table 3. Three regions (141–145, 173–175, 501) were common to the
isolates except Brisbane; region 104–108 was found
in Shiraz 38, Calif, and Calif X-157.
Linear B cell epitope analysis by BepiPred demonstrated 3 distinct conserved
regions (28–32, 138–142, 499–507) and some regions common to several isolates.
These include region 100–107 in Shiraz 38, Shiraz 43, Brisbane, Calif and Calif X-157,
and region 289 in Shiraz 4, Shiraz 7, Shiraz 38, Shiraz 43, Calif, and Calif X-157.
BcePred identified several similar B cell epitope regions in all isolates,
including 114–120, 129–133, 462–468, 370–372 and 506–518. Four regions were shared
by all isolates except Brisbane. Shiraz 43 showed two new regions,
156–160 and 506–518.
ABCpred predicted 16-mer peptide sequences as B cell epitopes for the 13
protein sequences (Table 4). Three conserved 16-mer regions (279, 357, 351) were
determined; region 300 was common to all sequences except Brisbane.
The VaxiJen cutoff value was set at 0.4 for identification of T
cell epitopes; the results showed no significant difference, and all sequences were
predicted to be probable antigens. Based on prediction by AlgPred, none of the protein
sequences was an allergen.
3.5 Functional Analysis
Serine, threonine and tyrosine phosphorylation sites predicted by DISPHOS and
NetPhos, and kinase-specific phosphorylation sites predicted by NetPhosK, are shown in
Table 5. The DISPHOS output showed that Shiraz 1–Shiraz 8 shared a set of
sites (123, 127, 209, 215, 287, 501, 507), which were also present in Calif and
Calif X-157. The phosphorylation sites in Shiraz 38 (123, 126, 127, 209, 215, 287, 501,
507) were also found in Calif and Calif X-157, but a new site (101) was found in Shiraz
43. Brisbane showed a very different phosphorylation pattern.
Table 3 Results of B cell epitopes by “immuneepitope”
Isolates B cell epitope positions
Shiraz 1 141–145 173–175 501
Shiraz 2 + + +
Shiraz 3 + + +
Shiraz 4 + +
Shiraz 5 + + +
Shiraz 6 + + +
Shiraz 7 + + +
Shiraz 8 + + +
Shiraz 38 + + + 104–108
Shiraz 43 + + + 102–103 108–110
Brisbane + 107, 156, 500
Calif + + + + 100–102
Calif X-157 + + + + +
The numbers of predicted phosphorylation sites varied among the isolates.
Fourteen serine sites were found in Shiraz 1, Shiraz 2, Shiraz 5, Shiraz 8, Calif
and Calif X-157. On the other hand, 13 serine sites were found in Shiraz 3 and
Shiraz 4, and 17 and 16 in Shiraz 38 and Shiraz 43, respectively (Table 5).
In addition, 8 threonine phosphorylation sites were detected in Shiraz 1, Shiraz 3,
Shiraz 38, Calif and Calif X-157, 9 in Shiraz 4 and Shiraz 8, 7 in Shiraz 43 and
4 in Brisbane. Similar kinase
phosphorylation sites (124, 220, 224, 326, 393, 500, and 524) were identified in all
isolates except Brisbane.
Comparison of the results of the patient and vaccine isolates indicated lack of site
221 in Shiraz 1, Shiraz 5, Shiraz 7, Shiraz 8 and Shiraz 38, and of site 294 in
Shiraz 1–Shiraz 7 and Shiraz 8. Our analysis also indicated addition of site 321 in
Shiraz 3, site 176 in Shiraz 4, and site 201 in Shiraz 43. Brisbane had only 4 sites
(124, 220, 392, and 499).
The outcomes of glycosylation site prediction for all protein sequences by
NetNGlyc and GlycoEP are displayed in Table 6. NetNGlyc showed 4
conserved glycosylation sites (28, 40, 304, and 557) in all isolates except
Brisbane, whose glycosylation sites were located at 28, 40, 71, 142, 176, 303, and 556.
GlycoEP predicted similar glycosylation sites (27, 28, 293, and 498)
in all isolates but not in Brisbane.
Comparison of all sequences with Calif and Calif X-157 showed loss of sites 71 and
176 in Shiraz 1–Shiraz 8, Shiraz 38, and Shiraz 43, deletion of site 304 in Shiraz
38 and Shiraz 43, and addition of sites 40 and 557 in Shiraz 8. Brisbane shared some
sites with Calif and Calif X-157 (27, 28, 71, 176, and 498) and had one different site
(497).
Table 4 Results of 16-mer peptide sequences as B cell epitopes by “ABCpred”
Isolates Start position of 16-mer peptide sequences predicted as B cell epitopes
Shiraz 1 279 338 300 250 357 94 449 351 383
Shiraz 2 + + + + + + + + +
Shiraz 3 + + + + + + + +
Shiraz 4 + + + + + + + +
Shiraz 5 + + + + + + +
Shiraz 6 + + + + + + = + +
Shiraz 7 + + + + + + + + +
Shiraz 8 + + + + + +
Shiraz 38 + + + + +
Shiraz 43 + + + + +
Brisbane + 356 350 497, 337
Calif + + + + 356 93 + + +
Calif X-157 + + + + + 93 + + +
Table 5 Positions of phosphorylation sites, numbers of phosphorylation sites, and kinase phosphorylation sites
Isolates     Position of phosphorylation sites                         Numbers of phosphorylation sites  Kinase phosphorylation sites
Shiraz 1     123, 127, 209, 215, 287, 501, 507                         Ser: 14 Thr: 8 Tyr: 11            124, 220, 224, 326, 393, 500, 524
Shiraz 2     +, +, +, +, +, +, +                                       Ser: 14 Thr: 8 Tyr: 11            124, 220, 224, 326, 393, 500, 524
Shiraz 3     +, +, +, +, +, +, +                                       Ser: 13 Thr: 8 Tyr: 11            124, 220, 224, 321, 326, 393, 500, 524
Shiraz 4     +, +, +, +, +, +, +                                       Ser: 13 Thr: 9 Tyr: 11            124, 176, 220, 224, 294, 326, 393, 500, 524
Shiraz 5     +, +, +, +, +, +, +                                       Ser: 14 Thr: 9 Tyr: 11            124, 220, 224, 294, 326, 393, 500, 524
Shiraz 6     +, +, +, +, +, +, +                                       Ser: 14 Thr: 9 Tyr: 11            124, 220, 221, 224, 294, 326, 393, 500, 524
Shiraz 7     +, +, +, +, +, +, +                                       Ser: 14 Thr: 9 Tyr: 11            124, 220, 224, 294, 326, 393, 500, 524
Shiraz 8     +, +, +, +, +, +, +                                       Ser: 14 Thr: 9 Tyr: 11            124, 220, 224, 326, 393, 500, 524
Shiraz 38    +, +, +, +, +, +, +, 126                                  Ser: 17 Thr: 8 Tyr: 10            124, 220, 224, 294, 326, 393, 500, 524
Shiraz 43    +, +, +, +, +, +, +, 101, 106, 126                        Ser: 16 Thr: 7 Tyr: 11            124, 201, 220, 224, 294, 326, 393, 500, 524
Brisbane     115, 123, 126, 127, 208, 214, 227, 500, 506               Ser: 17 Thr: 4 Tyr: 11            124, 220, 392, 499
Calif        95, 99, 106, 123, 126, 127, 209, 215, 220, 287, 501, 507  Ser: 14 Thr: 8 Tyr: 11            124, 220, 221, 224, 294, 326, 393, 500, 524
Calif X-157  95, 99, 106, 123, 126, 127, 209, 215, 287, 501, 507       Ser: 14 Thr: 8 Tyr: 11            124, 220, 221, 224, 294, 326, 393, 500, 524
Table 6 Glycosylation sites of all 13 isolates by two software tools, “NetNGlyc” and “GlycoEP”
Isolates     NetNGlyc                         GlycoEP
Shiraz 1     28, 40, 304, 557                 27, 28, 293, 304, 498
Shiraz 2     +, +, +, +                       +, +, +, +, +
Shiraz 3     +, +, +, +                       +, +, +, +, +
Shiraz 4     +, +, +, +                       +, +, +, +, +
Shiraz 5     +, +, +, +                       +, +, +, +, +
Shiraz 6     +, +, +, +                       +, +, +, +, +
Shiraz 7     +, +, +, +                       +, +, +, +, +
Shiraz 8     +, +, +, +                       27, 28, 40, 293, 304, 498, 557
Shiraz 38    +, +, +, +                       27, 28, 293, 498
Shiraz 43    +, +, +, +                       27, 28, 293, 498
Brisbane     28, 40, 71, 142, 176, 303, 556   27, 28, 71, 176, 303, 497
Calif        +, +, +, +                       27, 28, 71, 176, 293, 304, 498
Calif X-157  +, +, +, +                       27, 28, 71, 176, 293, 304, 498
Fig. 2 Secondary structures of all sequences predicted by “SOPMA” and validated. Blue helix, red strand, purple coil and green beta turn. (Color figure online)
3.6 Secondary Structure Prediction
The percentages of secondary structure constituents generated by SOPMA and the other
software tools, together with a schematic display of the proteins’ secondary structure,
are depicted in Fig. 2.
3.7 Tertiary Structures Prediction
All 3D structures were predicted by I-TASSER, the Phyre2 server and (PS)2, and the
predicted structures were validated using QMEAN and RAMPAGE. RAMPAGE
assigns the residues of a 3D structure to 3 regions: favoured, allowed and
outlier. The analysis showed that the 3D structures predicted
by the Phyre2 server were more reliable. The mean percentages of residues in the favoured
and allowed regions were 94.24 and 4.26 % for the Phyre2 server, 89.07 and 7.11 % for
I-TASSER, and 92.13 and 5.1 % for (PS)2, indicating that the Phyre2 server is the more
credible tool for predicting the tertiary structure of hemagglutinin.
QMEAN results included two main scores, the QMEAN score and the Z-score, which reflect
the quality and reliability of tertiary structures. The mean QMEAN score and
Z-score were 0.624 and -1.7 for the Phyre2 server, 0.49 and -3.05 for I-TASSER, and
0.48 and -3.1 for (PS)2. These results confirm the better quality and
reliability of the structures predicted by the Phyre2 server. The results of the QMEAN
and RAMPAGE analyses are shown in Table 7, and the predicted 3D structure for each
sequence is displayed in Fig. 3.
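The server means quoted above can be recomputed from the per-isolate values in Table 7. The sketch below does this for the Ramachandran percentages of all three servers and for the Phyre2 Z-scores; the values are transcribed from Table 7, while the variable and function names are illustrative assumptions.

```python
from statistics import mean

# (favoured %, allowed %) per isolate, in the row order of Table 7:
# Shiraz 1-8, Shiraz 38, Shiraz 43, Brisbane, Calif, Calif X-157.
RAMACHANDRAN = {
    "I-TASSER": [(89.5, 6.6), (88.7, 7.8), (89.5, 7.4), (89.5, 6.6), (88.7, 7.8),
                 (89.2, 7.5), (89.3, 6.6), (88.1, 7.8), (89.9, 7.4), (89.3, 6.3),
                 (88.2, 7.3), (89.0, 7.2), (89.1, 6.2)],
    "Phyre2": [(94.4, 4.2), (94.2, 4.2), (94.0, 4.4), (94.2, 4.4), (94.2, 4.4),
               (94.4, 4.2), (94.4, 4.2), (94.2, 4.4), (94.2, 4.2), (94.4, 4.0),
               (93.8, 4.4), (94.4, 4.2), (94.4, 4.2)],
    "(PS)2": [(93.3, 4.6), (92.2, 5.1), (93.1, 3.9), (91.8, 5.5), (91.8, 5.5),
              (93.1, 4.1), (93.1, 4.1), (91.1, 6.0), (91.1, 6.0), (91.1, 6.0),
              (93.1, 4.6), (91.5, 5.5), (91.5, 5.5)],
}

# Phyre2 Z-scores from Table 7, same row order.
PHYRE2_Z = [-1.63, -1.62, -1.73, -1.79, -1.8, -1.58, -1.69,
            -1.68, -1.72, -1.62, -1.97, -1.58, -1.62]


def column_means(rows):
    """Mean favoured and mean allowed percentage over all isolates."""
    favoured, allowed = zip(*rows)
    return mean(favoured), mean(allowed)


MEANS = {server: column_means(rows) for server, rows in RAMACHANDRAN.items()}
PHYRE2_Z_MEAN = mean(PHYRE2_Z)

if __name__ == "__main__":
    for server, (favoured, allowed) in MEANS.items():
        print(f"{server}: favoured {favoured:.2f} %, allowed {allowed:.2f} %")
    print(f"Phyre2 mean Z-score: {PHYRE2_Z_MEAN:.2f}")
```

A higher favoured percentage and a Z-score closer to zero both point to a better model, which is why the Phyre2 models rank first on these criteria.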
The positions of phosphorylation and glycosylation sites of 2010, 2013 and
vaccine isolates are shown in Figs. 4 and 5, respectively.
4 Discussion
Bioinformatic tools are useful for the analysis and
prediction of biological phenomena. Several bioinformatic tools have been
developed in recent years, but each of them needs to be validated.
In this research, the validity of each tool was confirmed before it was used in
the analysis.
The current study is a comparative analysis of viral sequences derived from
patients between 2010 and 2013 in the Virology Department of Shiraz University of
Medical Sciences, with 3 vaccine isolates as controls.
The results showed some amino acid changes in the 13 HA sequences that were related to
the phylogenetic tree. The study of amino acids also revealed similar changes at
positions 105 and 102 in Shiraz 1 and Shiraz 8. Shiraz 38 and Shiraz 43 showed
changes in 9 amino acid residues: 391, 155, 202, 516, 320, 468, 226, 461,
and 305. The changes in amino acids could be related to diversity in the modification
sites, epitopes, function and structure of HA. The comparison between amino acid
changes and properties of HA provided useful data supporting functional and structural
prediction of HA in isolates derived from the patients (Das et al.
2010; Strengell et al. 2011; Sun et al. 2010, 2013).
Table 7 Validation of the proteins’ 3D structures: RAMPAGE analysis (% of residues in favoured region, % of residues in allowed region), QMEAN score (global score of the whole model reflecting the predicted model reliability, ranging from 0 to 1) and Z-score (a measure of the absolute quality of a model)
Isolates     RAMPAGE analysis (%)                   QMEAN score                   Z-score
             I-TASSER     Phyre2       (PS)2        I-TASSER  Phyre2    (PS)2     I-TASSER  Phyre2    (PS)2
Shiraz 1     89.5, 6.6    94.4, 4.2    93.3, 4.6    0.479     0.63      0.475     -3.1      -1.63     -3.19
Shiraz 2     88.7, 7.8    94.2, 4.2    92.2, 5.1    0.484     0.631     0.476     -3.09     -1.62     -3.17
Shiraz 3     89.5, 7.4    94.0, 4.4    93.1, 3.9    0.501     0.622     0.482     -2.89     -1.73     -3.1
Shiraz 4     89.5, 6.6    94.2, 4.4    91.8, 5.5    0.482     0.617     0.509     -3.16     -1.79     -2.8
Shiraz 5     88.7, 7.8    94.2, 4.4    91.8, 5.5    0.484     0.616     0.459     -3.23     -1.8      -3.36
Shiraz 6     89.2, 7.5    94.4, 4.2    93.1, 4.1    0.5       0.634     0.478     -2.83     -1.58     -3.15
Shiraz 7     89.3, 6.6    94.4, 4.2    93.1, 4.1    0.47      0.625     0.462     -3.39     -1.69     -3.34
Shiraz 8     88.1, 7.8    94.2, 4.4    91.1, 6.0    0.464     0.625     0.503     -2.89     -1.68     -2.87
Shiraz 38    89.9, 7.4    94.2, 4.2    91.1, 6.0    0.511     0.623     0.481     -3.16     -1.72     -3.12
Shiraz 43    89.3, 6.3    94.4, 4.0    91.1, 6.0    0.489     0.631     0.498     -3.13     -1.62     -2.92
Brisbane     88.2, 7.3    93.8, 4.4    93.1, 4.6    0.49      0.602     0.562     -2.84     -1.97     -2.47
Calif        89, 7.2      94.4, 4.2    91.5, 5.5    0.512     0.634     0.447     -3.19     -1.58     -3.5
Calif X-157  89.1, 6.2    94.4, 4.2    91.5, 5.5    0.499     0.63      0.454     -2.79     -1.62     -3.43
Fig. 3 3D structure of proteins, 1 H1, 2 H2, 3 H3, 4 H4, 5 H5, 6 H6, 7 H7, 8 H8, 9 H38, 10 H43, 11 Brisbane, 12 California, and 13 California X-157
Fig. 4 Position of phosphorylation sites. a 2010 isolates, b 2013 isolates, c vaccine isolates
The primary analysis of the properties of the HA sequences did not show any
relationship between amino acid changes and protein properties. Such data will be
beneficial to future work such as cloning, expression, and purification of HA.
Comparison of the B cell epitope regions predicted by immuneepitope with the
amino acid changes showed that 141–145 and 501 were conserved regions; the lack
of 173–175 in Shiraz 4 was related to the tyrosine-to-phenylalanine change at
position 175. The change at position 104 was related to the lack of the 104–108
epitope region in the patient isolates except Shiraz 38, which had no change at
position 104; the proline-to-serine change at position 100 was related to the lack of
the 100–102 region in all patient isolates. The 108–110 and 102–103 regions in the
Shiraz 43 isolate could not be explained logically.
BepiPred showed 6 conserved regions in all patient isolates: 28–32,
138–148, 200–204, 238–239, 371–377, and 499–507. The tyrosine-to-phenylalanine change
at position 175 and the proline-to-serine change at position 176 were related to the
lack of the 174–176 epitope region. The lack of the 100–107 region in the Shiraz 1 and
Shiraz 8 isolates was related to the changes of asparagine to lysine at 104, serine to
threonine at 102, and glycine to tryptophan at 105, of which the glycine-to-tryptophan
change was the most important. Shiraz 1, Shiraz 3 and Shiraz 8 did not contain the
287–292 region because aspartic acid changed to asparagine at 291.
BcePred detected many conserved B cell epitope regions, but the lack of 413–429
and 320–324 in Shiraz 1 and Shiraz 2 was not related to amino acid changes.
In the ABCpred results, 279, 300, 357 and 351 are the start positions of the 16-mer
regions conserved in all patient isolates. The lack of the 338–354 region in the
Shiraz 5 isolate was related to the isoleucine-to-valine change at 341. The glutamic
acid-to-lysine change at 391 was responsible for the lack of the 383–399 region in
Shiraz 4, Shiraz 5, Shiraz 38 and Shiraz 43.
Phosphorylation is a major post-translational modification of HA, and
viral protein phosphorylation plays important roles in the
influenza virus life cycle (Hutchinson et al. 2012; Wang et al. 2013). DISPHOS and
NetPhos are widely used and helpful tools for predicting serine, threonine and tyrosine
phosphorylation sites in proteins.
Fig. 5 Position of glycosylation sites. a 2010 isolates, b 2013 isolates, c vaccine isolates
Studies of protein properties such as sequence complexity, hydrophobicity, and charge
have shown that the characteristics of the protein regions in and around phosphorylation
sites are an important prerequisite for phosphorylation. Two-dimensional structure
analysis of the conserved phosphorylation sites (123, 127, 209, 215, 287, 501, 507)
showed that 123, 209, 215 and 507 were located on helices and 127, 287 and 501 on coil
structures.
The number of phosphorylation sites did not show a significant difference
between 2010, 2013 and vaccine isolates, but there was a limited increase in serine
sites.
Prediction of kinase-specific eukaryotic protein phosphorylation sites by the
NetPhosK 1.0 server with a threshold of 0.7 revealed that all phosphorylation sites with
the highest scores corresponded to protein kinase C (PKC) phosphorylation
sites. Related studies have shown the important roles played by PKC in
infection and release from human cells (Root et al. 2000; Sieczkarski et al. 2003).
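For orientation only, candidate PKC sites can be approximated with the PROSITE-style consensus [ST]-x-[RK], in which a serine or threonine is followed two residues later by arginine or lysine. This is a much cruder criterion than the scoring performed by NetPhosK and is not the method used in the study; the function name and the toy peptides below are assumptions for illustration.

```python
import re

# PROSITE-style protein kinase C consensus: [ST]-x-[RK].
# The lookahead reports overlapping matches as well.
PKC_MOTIF = re.compile(r"(?=[ST].[RK])")


def pkc_candidate_sites(seq: str) -> list:
    """1-based positions of Ser/Thr residues that start a [ST]-x-[RK] match."""
    return [match.start() + 1 for match in PKC_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Toy peptides, not HA fragments.
    print(pkc_candidate_sites("AASPRKTLK"))  # [3, 7]
    # A proline-to-serine substitution can create a new candidate site:
    print(pkc_candidate_sites("GAPGRLA"))    # []
    print(pkc_candidate_sites("GASGRLA"))    # [3]
```

A consensus match only marks a residue as a candidate; whether it is scored above a given threshold by a predictor depends on the wider sequence context.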
The analysis of data did not show any major changes in PKC phosphorylation
sites except for a new phosphorylation site in Shiraz 43 compared to vaccine
isolates.
The proline-to-serine changes at positions 321 and 176 were related to the additional
PKC phosphorylation sites in Shiraz 3 and Shiraz 4, respectively.
HA is a surface glycoprotein of influenza virus, and glycosylation
has been shown to play important roles in many functions of HA molecules (Das
et al. 2010; Mir-Shekari et al. 1997; Sun et al. 2013).
Oligosaccharides can attach to the asparagine (Asn) side chain in the N-X-(S/T)
sequon, where X represents any residue other than proline, either
cotranslationally or posttranslationally.
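The N-X-(S/T) rule can be written as a short pattern search. The following Python sketch only scans for the sequence motif; it is not a substitute for predictors such as NetNGlyc or GlycoEP, which also score sequence context. The function name and the toy peptide are illustrative assumptions, not HA data.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq):
    """Return 1-based positions of Asn residues that start an N-X-(S/T) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


# Toy peptide for illustration only, not a real HA sequence.
toy = "MKNGTAANPSLLNNSTQ"
print(find_sequons(toy))  # [3, 13, 14]
```

The Asn at position 8 of the toy peptide is skipped because the following residue is proline.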
Many types of glycans have been found on HA molecules, including high-mannose,
complex, and hybrid types. Regardless of glycan type, the structure and
composition of the glycans depend on the accessibility of glycosylation sequons to
host cell saccharide-modifying enzymes.
Previous studies have identified several important functions of glycosylation,
including: (a) protein folding, which is necessary for transport to the cell surface,
(b) avoidance of accumulation in the Golgi complex (Roberts et al. 1993), (c) receptor
binding, (d) escape from the immune system by interfering with antibody recognition,
(e) modulation of HA cleavage by glycans near the proteolytic activation site, and
(f) changes in receptor binding properties (Klenk et al. 2001). Studies of the
progressive increase in glycosylation sites since 1918 have shown that glycosylation
takes place specifically on the HA globular head region (Sun et al. 2013).
In the current study, glycosylation analysis showed two common sites, at positions
28 and 304, as indicated by the NetNGlyc and GlycoEP software. Studies conducted from
2007 to 2013 showed that positions 27, 28 and 40 are conserved sites.
Interestingly, comparison among the 2009, 2010, and 2013 isolates detected a
decrease in the number of glycosylation sites without any new site.
The major sites 27, 28, 40, 293, 304, and 498 were located on coil
secondary structure, and all of them except 498 (stalk) were found in the globular
part of the viral protein.
The N-linked glycosylation site at position 304 was absent in Shiraz 38 and Shiraz 43
because threonine changed to proline at position 305, which violates the
Asn-X-Ser/Thr sequon rule that X can be any amino acid except proline. This change
was detected by GlycoEP, but the NetNGlyc results did not show any change. This
suggests that GlycoEP performed better than NetNGlyc.
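The effect of a proline at the X position can be checked directly against the sequon rule. The sketch below applies a single substitution to a toy peptide and tests the Asn before and after the change; the helper names and the peptide are hypothetical and are unrelated to the Shiraz sequences.

```python
def is_sequon(seq, pos):
    """True if the residue at 1-based pos is Asn in N-X-(S/T) with X not Pro."""
    i = pos - 1
    if i < 0 or i + 2 >= len(seq):
        return False  # not enough residues to form a full sequon
    return seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in ("S", "T")


def substitute(seq, pos, residue):
    """Return seq with the residue at 1-based pos replaced."""
    return seq[:pos - 1] + residue + seq[pos:]


# Toy peptide for illustration only: Asn at 3, X = Thr at 4, Thr at 5.
toy = "AANTTGG"
mutant = substitute(toy, 4, "P")  # Thr -> Pro at the X position
print(is_sequon(toy, 3), is_sequon(mutant, 3))  # True False
```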
Secondary and tertiary structure analysis did not show any significant differences
between patient and vaccine isolates; the analysis also showed that HA consisted
mainly of coils, helices, strands and turns.
An overview of all results confirmed widespread changes in the 2013 isolates compared
with the vaccine and 2010 isolates. Such changes in HA properties reflect
diversity in the HA protein that could alter the virulence and infection
mechanism of influenza virus and thereby reduce vaccine efficacy.
Similar studies, at different time periods, are necessary to track the
diversity and changes of the HA protein, an important and multifunctional protein in
influenza virus virulence. Few studies have focused on the relationship between
experimental results and in silico analysis for HA proteins. Therefore, the results of
this study are useful for better understanding of the HA modification sites, epitope
sites, and structural features that are important in delineating the mechanism of
hemagglutinin action. Screening of hemagglutinin diversity is very important for
achieving a better understanding of H1N1 antigenic variation and antigenic drift and
for evaluating influenza vaccine efficacy.
Acknowledgments The authors would like to acknowledge Shiraz University of Medical Sciences for
financial support.
References
Anwar T, Lal SK, Khan AU (2006) In silico analysis of genes nucleoprotein, neuraminidase and
hemagglutinin: a comparative study on different strains of influenza A (Bird Flu) virus sub-type
H5N1. In Silico Biol 6:161–168
Benkert P, Tosatto SC, Schomburg D (2008) QMEAN: a comprehensive scoring function for model
quality assessment. Proteins Struct Funct Bioinform 71:261–277
Blom N, Gammeltoft S, Brunak S (1999) Sequence and structure-based prediction of eukaryotic protein
phosphorylation sites. J Mol Biol 294:1351–1362
Blom N, Sicheritz-Ponten T, Gupta R, Gammeltoft S, Brunak S (2004) Prediction of post-translational
glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics
4:1633–1649
Boulay F, Doms RW, Webster RG, Helenius A (1988) Posttranslational oligomerization and cooperative
acid activation of mixed influenza hemagglutinin trimers. J Cell Biol 106:629–639
Bouvier NM, Lowen AC (2010) Animal models for influenza virus pathogenesis and transmission.
Viruses 2:1530–1563
Chauhan JS, Rao A, Raghava GP (2013) In silico platform for prediction of N-, O-and C-glycosites in
eukaryotic protein sequences. PLoS ONE 8:e67008
Chen C-C, Hwang J-K, Yang J-M (2006) (PS)2: protein structure prediction server. Nucleic Acids Res
34:W152–W157
Chou PY, Fasman GD (2006) Prediction of the secondary structure of proteins from their amino acid
sequence. In: Advances in enzymology and related areas of molecular biology. Wiley, New York,
pp 45–148. doi:10.1002/9780470122921.ch2
Das SR, Puigbo P, Hensley SE, Hurt DE, Bennink JR, Yewdell JW (2010) Glycosylation focuses
sequence variation in the influenza A virus H1 hemagglutinin globular domain. PLoS Pathog
6:e1001211
Doytchinova IA, Flower DR (2007) VaxiJen: a server for prediction of protective antigens, tumour
antigens and subunit vaccines. BMC Bioinform 8:4
Emini EA, Hughes JV, Perlow D, Boger J (1985) Induction of hepatitis A virus-neutralizing antibody by a
virus-specific synthetic peptide. J Virol 55:836–839
Gasteiger E, Hoogland C, Gattiker A, Wilkins MR, Appel RD, Bairoch A (2005) Protein identification
and analysis tools on the ExPASy server. In: Walker JM (ed) The proteomics protocols handbook.
Springer, Berlin, pp 571–607
Geourjon C, Deleage G (1995) SOPMA: significant improvements in protein secondary structure
prediction by consensus prediction from multiple alignments. Comput Appl Biosci CABIOS
11:681–684
Girard MP, Tam JS, Assossou OM, Kieny MP (2010) The 2009 A (H1N1) influenza virus pandemic: a
review. Vaccine 28:4895–4902
Gupta R, Brunak S (2002) Prediction of glycosylation across the human proteome and the correlation to
protein function. Pac Symp Biocomput 7:310–322
Han T, Marasco WA (2011) Structural basis of influenza virus neutralization. Ann N Y Acad Sci
1217:178–190
Hutchinson EC et al (2012) Mapping the phosphoproteome of influenza A and B viruses by mass
spectrometry. PLoS Pathog 8:e1002993
Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, Dunker AK (2004) The
importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 32:1037–1049
Isin B, Doruker P, Bahar I (2002) Functional motions of influenza virus hemagglutinin: a structure-based
analytical approach. Biophys J 82:569–581
Kapoor S, Dhama K (eds) (2014) Properties of influenza viruses. In: Insight into influenza viruses of
animals and humans. Springer, Berlin, pp 7–13
Karplus P, Schulz G (1985) Prediction of chain flexibility in proteins. Naturwissenschaften 72:212–213
Kaur H, Raghava G (2003a) A neural-network based method for prediction of γ-turns in proteins from
multiple sequence alignment. Protein Sci 12:923–929
Kaur H, Raghava GPS (2003b) Prediction of β-turns in proteins from multiple alignment using neural
network. Protein Sci 12:627–634
Kelley LA, Sternberg MJ (2009) Protein structure prediction on the Web: a case study using the Phyre
server. Nat Protoc 4:363–371
Klenk H-D, Wagner R, Heuer D, Wolff T (2001) Importance of hemagglutinin glycosylation for the
biological functions of influenza virus. Virus Res 82:73–75
Larsen JE, Lund O, Nielsen M (2006) Improved method for predicting linear B-cell epitopes. Immunome
Res 2:2
Ma W et al (2011) 2009 pandemic H1N1 influenza virus causes disease and upregulation of genes related
to inflammatory and immune responses, cell death, and lipid metabolism in pigs. J Virol
85:11626–11637
Mir-Shekari SY, Ashford DA, Harvey DJ, Dwek RA, Schulze IT (1997) The glycosylation of the
influenza A virus hemagglutinin by Mammalian cells. A site-specific study. J Biol Chem
272:4027–4036
Mishin VP, Novikov D, Hayden FG, Gubareva LV (2005) Effect of hemagglutinin glycosylation on
influenza virus susceptibility to neuraminidase inhibitors. J Virol 79:12416–12424
Parker J, Guo D, Hodges R (1986) New hydrophilicity scale derived from high-performance liquid
chromatography peptide retention data: correlation of predicted surface residues with antigenicity
and X-ray-derived accessible sites. Biochemistry 25:5425–5432
Roberts PC, Garten W, Klenk H-D (1993) Role of conserved glycosylation sites in maturation and
transport of influenza A virus hemagglutinin. J Virol 67:3048–3060
Root CN, Wills EG, McNair LL, Whittaker GR (2000) Entry of influenza viruses into cells is inhibited by
a highly specific protein kinase C inhibitor. J Gen Virol 81:2697–2705
Roy A, Kucukural A, Zhang Y (2010) I-TASSER: a unified platform for automated protein structure and
function prediction. Nat Protoc 5:725–738
Saha S, Raghava GPS (2004) BcePred: prediction of continuous B-cell epitopes in antigenic sequences
using physico-chemical properties. In: Nicosia G, Cutello V, Bentley P, Timmis J (eds) Artificial
immune systems, vol 3239. Lecture Notes in Computer Science. Springer, Berlin, pp 197–204.
doi:10.1007/978-3-540-30220-9_16
Saha S, Raghava G (2006a) AlgPred: prediction of allergenic proteins and mapping of IgE epitopes.
Nucleic Acids Res 34:W202–W209
Saha S, Raghava G (2006b) Prediction of continuous B-cell epitopes in an antigen using recurrent neural
network. Proteins Struct Funct Bioinform 65:40–48
Schrauwen EJ, Fouchier RA (2014) Host adaptation and transmission of influenza A viruses in mammals.
Emerg Microbes Infect 3:e9
Sieczkarski SB, Brown HA, Whittaker GR (2003) Role of protein kinase C βII in influenza virus entry via
late endosomes. J Virol 77:460–469
Sriwilaijaroen N, Suzuki Y (2012) Molecular basis of the structure and function of H1 hemagglutinin of
influenza virus. Proc Jpn Acad Ser B Phys Biol Sci 88:226
Strengell M, Ikonen N, Ziegler T, Julkunen I (2011) Minor changes in the hemagglutinin of influenza A
(H1N1) 2009 virus alter its antigenic properties. PLoS ONE 6:e25848
Sun Y et al (2010) In silico characterization of the functional and structural modules of the hemagglutinin
protein from the swine-origin influenza virus A (H1N1)-2009. Sci China Life Sci 53:633–642
Sun X et al (2013) N-linked glycosylation of the hemagglutinin protein influences virulence and
antigenicity of the 1918 pandemic and seasonal H1N1 influenza A viruses. J Virol 87:8756–8766
Taubenberger JK, Morens DM (2006) 1918 Influenza: the mother of all pandemics. Rev Biomed
17:69–79
Taubenberger JK, Morens DM (2010) Influenza: the once and future pandemic. Public Health Rep 125:16
Tong S, Li Y, Rivailler P, Conrardy C, Castillo DA, Chen LM, Recuenco S, Ellison JA, Davis CT, York
IA et al (2012) A distinct lineage of influenza A virus from bats. Proc Natl Acad Sci USA
109:4269–4274
Tong S, Zhu X, Li Y, Shi M, Zhang J, Bourgeois M, Yang H, Chen X, Recuenco S, Gomez J et al (2013)
New world bats harbor diverse influenza A viruses. PLoS Pathog 9:e1003657
Wang C-C et al (2009) Glycans on influenza hemagglutinin affect receptor binding and immune response.
Proc Natl Acad Sci 106:18137–18142
Wang S, Zhao Z, Bi Y, Sun L, Liu X, Liu W (2013) Tyrosine 132 phosphorylation of influenza A virus
M1 protein is crucial for virus replication by controlling the nuclear import of M1. J Virol
87:6182–6191
World Health Organization (2010) Pandemic (H1N1) 2009—update 112. World Health Organization
Yang ZR, Thomson R, McNeil P, Esnouf RM (2005) RONN: the bio-basis function neural network
technique applied to the detection of natively disordered regions in proteins. Bioinformatics
21:3369–3376