research article expressing redundancy among linear
TRANSCRIPT
Research ArticleExpressing Redundancy among Linear-Epitope SequenceData Based on Residue-Level Physicochemical Similarity inthe Context of Antigenic Cross-Reaction
Salvador Eugenio C Caoili
Department of Biochemistry and Molecular Biology College of Medicine University of the Philippines Manila Room 101Medical Annex Building 547 Pedro Gil Street Ermita 1000 Manila Philippines
Correspondence should be addressed to Salvador Eugenio C Caoili badongpostupmeduph
Received 27 October 2015 Revised 29 March 2016 Accepted 10 April 2016
Academic Editor Gilbert Deleage
Copyright copy 2016 Salvador Eugenio C Caoili This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited
Epitope-based design of vaccines immunotherapeutics and immunodiagnostics is complicated by structural changes that radicallyalter immunological outcomes This is obscured by expressing redundancy among linear-epitope data as fractional sequence-alignment identity which fails to account for potentially drastic loss of binding affinity due to single-residue substitutions evenwhere thesemight be considered conservative in the context of classical sequence analysis From the perspective of immune functionbased on molecular recognition of epitopes functional redundancy of epitope data (FRED) thus may be defined in a biologicallymore meaningful way based on residue-level physicochemical similarity in the context of antigenic cross-reaction with functionalsimilarity between epitopes expressed as the Shannon information entropy for differential epitope binding Such similarity may beestimated in terms of structural differences between an immunogen epitope and an antigen epitope with reference to an idealizedbinding site of high complementarity to the immunogen epitope by analogy between protein folding and ligand-receptor bindingbut this underestimates potential for cross-reactivity suggesting that epitope-binding site complementarity is typically suboptimalas regards immunologic specificity The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immunefunction that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes
1 Introduction
Immunological targeting of antigens exemplified by pathogenvirulence factors allergens and even typical drugs is funda-mental to the solution of global-health problems includingboth infectious and noninfectious diseases [1ndash3] This entailsmolecular recognition of antigens by immune-system com-ponents (eg antibodies and T-cell receptors) which occursvia binding of epitopes (ie the recognized submolecularstructural features of antigens) [4 5] Epitope prediction (iecomputational identification of epitopes among biomoleculessuch as proteins) aims to enable selective incorporation ofparticular epitopes (eg actual targets of protective immuneresponses rather than disease-enhancing immunologicaldecoys) into antigenic constructs (eg synthetic peptides) fornovel vaccines immunotherapeutics and immunodiagnos-tics [6] However this is complicated by the limited accuracy
of existing tools for epitope prediction [7] The present workthus explores the crucial yet largely neglected issue of epitope-data redundancy as a key consideration in epitope-predictiontool development
Progressive development of epitope-prediction toolsrequires empirical epitope data for both training (eg inthe context of machine learning) and benchmarking [8ndash10] Hence epitope-data redundancy is a major concernespecially where data-driven statistical andmachine-learningmethods are employed to develop tools for epitope predictionand also related applications (eg MHC binding prediction)as studies might yield biased results due to overrepresenta-tion of similar epitopes Similarity among epitopes is oftenexpressed as sequence similarity particularly for linear pep-tidic epitopes which include continuous B-cell epitopes (eachconsisting of a single unbroken epitope-residue sequence incontrast to discontinuous epitopes wherein epitope residues
Hindawi Publishing CorporationAdvances in BioinformaticsVolume 2016 Article ID 1276594 13 pageshttpdxdoiorg10115520161276594
2 Advances in Bioinformatics
are separated in the sequence by intervening residues) andtypical T-cell epitopes (each bound by a MHC moleculefor presentation to T-cells) For these classical heuristicapproaches to decrease the redundancy entail setting asimilarity threshold (typically expressed as a fraction ofidentical residues for a pair of aligned sequences) such thatsubsequent analyses may compensate accordingly (eg byexcluding sequences sharing a degree of similarity abovethe threshold) This practice is well-established for general-purpose protein structural analyses [11ndash13] but potentiallyproblematic if applied to peptidic epitopes in view of thenonlinear relationship between sequence similarity and anti-genic similarity (eg as demonstrated by radically divergentantigenic properties arising from a structural difference ofonly a single chemical group [14]) This suggests the needfor a more functionally meaningful alternative approach toexpressing redundancy of epitope data
Protein folding and binding [15] may be regarded asmanifestations of the same underlying phenomenon drivenby the hydrophobic effect favoring burial of nonpolar sur-faces in general away from solvent water albeit with moreselective burial of polar surfaces that favors complemen-tary pairing between hydrogen-bond donors and acceptorsResidues that thus become completely buried (eg withinthe core of a folded protein or at the binding interface of aligand-receptor complex) are sterically and electrostaticallyconstrained by surrounding residues to a much greaterextent than unfolded or even folded but only partially buriedresidues (eg at solvent-exposed protein surfaces) Conse-quentlymolecular recognition of epitopes which ismediatedby local ligand-receptor binding interactions depends onsequence detailsmuchmore than overall (ie global) featuresof protein structure do notably in the sense of proteinfolds
Proteinsmay share the same fold (eg as demonstrated bystructural superposition of their backbones) well into the so-called twilight zone below the threshold for reliable detectionof aligned-sequence similarity (ie less than 35 pairwisesequence identity) [16 17] Residue substitutions can betolerated at surface-exposed positions (eg with replacementof certain polar residues by others whose side-chains differin steric and electrostatic properties) Even at buried-corepositions certain nonpolar residues may be replaced byothers whose side-chains differ in volume especially whereadditional substitutions or other changes compensate for thevolume differences although the introduction of unsatisfiedhydrogen-bond donors or acceptors and of unpaired formalcharges tends to be poorly tolerated [18 19] Howevereven just a single-residue substitution in an epitope mayabolish epitope-specific immune binding (eg by antibod-ies) if surface complementarity is disrupted at the epitope-binding interface (much as the structural organization ofa protein core is disrupted by a radical substitution) asobserved with immune evasion by pathogen escape mutantsHence the notion of conservative residue substitutions asconceptualized for evolving protein sequences (eg on thebasis of changes in both volume and hydrophobicity) is ofquestionable applicability to epitopes as it obscures crucialdetails of changes in steric and electrostatic complementarity
From the perspective of immune function redundancyof epitope data arguably implies potential for antigeniccross-reaction (ie recognition of different epitopes bya common B- or T-cell receptor or equivalent thereof)rather than sequence similarity defined without reference toany immune-system component Hence the present workoutlines a generic theoretical framework for quantitativelyexpressing epitope data redundancy in terms of potentialantigenic cross-reactivity as estimated from an idealized lim-iting case of optimal ligand-receptor complementarity (com-parable to the steric and electrostatic self-complementarityobservedwithin folded proteins [20])This is applied to linearpeptidic epitopes for clarity of illustration and consideringtheir envisioned roles as immune targets in relation topeptide-based vaccines and immunodiagnostics Theoreticalresults are compared with informative available experimentaldata on antigenic cross-reactivity of structurally related shortpeptides in light of possible suboptimal ligand-receptor com-plementarity and its biomedical implications vis-a-vis epitopesequence variation (eg arising as pathogen mutations andhost polymorphisms)
2 Theory and Methods
21 Functional Redundancy of Epitope Data The presentwork aims to support the use of available empirical datato further develop epitope-prediction methods without dis-carding unique epitopes deemed redundant on the basisof some sequence similarity threshold Epitope data thusretained must be characterized as to redundancy for whicha reduced epitope count is introduced belowThis is based onepitope similarity defined in terms of either aligned-sequencesimilarity or functional similarity as regards cross-reactiveepitope binding Aligned-sequence similarity is exploredfirst because it is a familiar and readily calculated quantityHowever its limitations become apparent in the contextof attempting to describe antigenic cross-reaction from aphysicochemical perspective Hence functional similarity isalso explored
For a set of epitopes (construed as epitope structureseg linear peptidic sequences) functional redundancy canbe expressed as a reduced epitope count 119903 such that 1 le
119903 le 119899 where 119899 is the total epitope count with 119903 = 1 and119903 = 119899 respectively corresponding to extreme cases whereinevery epitope is either maximally or minimally similar toevery other epitope from a functional standpoint Thusstructurally identical epitopes would be maximally similarto one another but structurally nonidentical epitopes mightalso be regarded as maximally similar if their structuraldifference was negligible or otherwise irrelevant from afunctional standpoint For simplicity further elaboration ofconcepts herein focuses on cases wherein all epitopes arestructurally unique peptidic sequences of equal length (Thequalifier of ldquostructurally uniquerdquo is consistent with real-world epitope databases insofar as they regard each epitopeas a structurally unique entity The qualifier of ldquoequal lengthrdquois clearly applicable to typical class I MHC-restricted T-cellepitopes and relatively short linear B-cell epitopes albeit less
Advances in Bioinformatics 3
so for longer B-cell epitopes and class II MHC-restricted T-cell epitopes)
For a set of 119899 structurally unique epitopes the reducedepitope count may be defined as
119903 =
119899
sum
119894=1
119908119894 (1)
where 119908119894is the contribution of the 119894th epitope to 119903 In turn
119908119894may be defined as
119908119894=
1 + sum119899
119895=1(1 minus 119878
119894119895)
119899
(2)
where 119878119894119895is the functional similarity between the 119894th and 119895th
epitopes such that 0 le 119878119894119895le 1 with 119878
119894119895= 0 and 119878
119894119895= 1
respectively corresponding to extreme cases wherein the 119894thand 119895th epitopes are either minimally or maximally similarfroma functional standpoint If all the epitopes are of uniformsequence length119898 amino-acid residues 119878
119894119895might be defined
for a pairwise epitope sequence alignment as
119878119894119895=
sum119898
119896=1119904119894119895119896
119898
(3)
where 119904119894119895119896
is the functional similarity between amino-acidresidues 119886
119894119896and 119886
119895119896at the 119896th sequence positions of the
aligned 119894th and 119895th epitope sequences respectively such that0 le 119904
119894119895119896le 1 with 119904
119894119895119896= 0 and 119904
119894119895119896= 1 respectively
corresponding to extreme cases wherein 119886119894119896
and 119886119895119896
areeither minimally or maximally similar from a functionalstandpoint Accordingly 119904
119894119895119896might be simply defined as
119904119894119895119896=
0 if 119886119894119896
= 119886119895119896
1 if 119886119894119896= 119886119895119896
(4)
for (3) to yield the fraction of identical aligned residueswhich is a conventional measure of protein sequence simi-larity [11ndash13] However this is an oversimplification that failsto capture the possibility of minimal functional similaritybetween epitopes (eg operationally defined as undetectableantigenic cross-reactivity in an immunoassay) arising froma structural difference of only a single chemical group [14]unless the possibility of antigenic cross-reaction is deniedaltogether
A more realistic physicochemically grounded alternativeis to conceptualize functional similarity on the basis ofbiomolecular binding interactions thereby relating func-tional similarity to binding affinity as measured via animmunoassay that employs some epitope-binding probe (egantibody) Thus functional similarity between epitopes (ie119878119894119895of (2) and (3)) may be defined alternatively as the Shannon
information entropy [34ndash36] for the equilibrium distributionof epitope-bound probe between the 119894th and 119895th epitopeswith both epitopes at the same concentration in the presenceof much less probe such that
119878119894119895= minus (119891
119894log2119891119894+ 119891119895log2119891119895) (5)
where 119891119894and 119891
119895are the fractions of epitope-bound probe
bound to the 119894th and 119895th epitopes respectively (noting that119891119894+ 119891119895= 1 and 119878
119894119895= 1 where the 119894th and 119895th epitopes are
indistinguishable by means of the immunoassay) Constrain-ing 119891119895as 119891119895le 119891119894 it may in turn be defined as
119891119895=
prod119898
119896=1119904119894119895119896
1 + prod119898
119896=1119904119894119895119896
(6)
where 119904119894119895119896
retains its meaning as introduced in (3) Howeverretaining the definition of 119904
119894119895119896given by (4) would be tanta-
mount to denying the possibility of antigenic cross-reactionhence an alternative is warranted
A plausible starting point is the essential unity of proteinfolding and binding phenomena based on steric and elec-trostatic complementarity of molecular surfaces as observedwithin typical folded proteins [20] If this is regarded as rep-resentative of ligand-receptor interfaces wherein the ligand isa peptidic epitope bound to one or more protein receptors(eg antibody or MHC molecule and T-cell receptor) anidealized epitope-binding site may be devised to estimateepitope similarity as potential antigenic cross-reactivity interms of structural differences assuming optimal comple-mentarity with all epitope atoms completely surrounded byand close-packed against the binding-site contact atomsHence suboptimal complementarity would arguably resultif even a single epitope amino-acid residue was structurallyaltered Notably steric clashes conceivably would precludeantigenic cross-reaction if the altered residue was stericallyincompatible with its original counterpart in the sense offailing to assume any conformation allowing it to fit entirelywithin the region of space (as defined by residue van derWaals surfaces) that otherwise would have been occupiedby the original residue at the binding site Thus functionalsimilarity between epitope amino-acid residues (ie 119904
119894119895119896of
(3) (4) and (6)) may be defined alternatively as
119904119894119895119896
=
0 if 119886119894119896 119886119895119896
sterically incompatible
exp(minusΔΔ119866119894119895119896
119877119879
) otherwise
(7)
where 119894 and 119895 denote the original and altered epitopesrespectively 119886
119894119896and 119886
119895119896retain their meanings as in (4)
ΔΔ119866119894119895119896
is the contribution to the change in the overall free-energy change of epitope binding due to replacement of 119886
119894119896
by 119886119895119896119877 is the gas constant and119879 is the absolute temperature
ΔΔ119866119894119895119896
is analogous to the change in the free-energy changeof folding due to replacement of the 119896th residue of a proteinby a structurally different residue (eg to generate a mutantversion of a wild-type protein)
To evaluate (7) steric incompatibility is posited for everysubstitution wherein the net change in volume is positive(ie wherever bulk increases) and also for other substitutionswherein stereochemical differences (eg relating to bondangles and branching) are deemed sufficient to result in stericclashes that preclude antigenic cross-reaction Steric incom-patibility is conceptualized herein on the basis of grouping
4 Advances in Bioinformatics
HN
HO HOArg (R)
NH
O O
O O
O
O O O O O O
OH
O O O
O O
OO
O O O O
Lys (K)
Subset 2 trigonal planar
HO
HN HN
Gln (Q)
HOHO
HOGlu (E)
Subset 4 trigonal planar
membered ring
N
S
HOMet (M)
HS HO
HOCys (C)
HOSer (S)
HOAla (A)
HOGly (G)
O
HOTyr (Y)
Subset 5 branching at tetrahedral 120574-
HOPhe (F)
HOAsn (N)
HOAsp (D)
Subset 6 branching at tetrahedral 120573-carbon atom
Subset 7 withring comprising
main-chain
HN
HOHOHis (H)
HOLeu (L)
HOIle (I)
HOVal (V)
HOThr (T) Pro (P)
H2N
H2N
H2N
H2N H2N H2N H2NH2NH2N
H2N H2N H2N H2N H2NH2N
H2N H2N H2N H2N H2NH2N
NH2
NH2
Subset 1 no trigonal planar atoms or branching at 120573 through 120576 positions
OH
120575-carbon atomSubset 3 trigonal planar 120574-carbon atom but no five-
120574-carbon atom in five-
Trp (W)
carbon atom
HOO
membered ring
nitrogen atom
Figure 1 Subsets of the 20 proteinogenic amino acids grouped by stereochemical similarity Structures were obtained from the ChEBI(Chemical Entities of Biological Interest) database (httpwwwebiacukchebi) Within each subset comprising two or more amino acidsthese are arranged from left to right in order of decreasing size such that sterically incompatible changes are avoided by substitution of asmaller amino acid for a larger one The subsets are linked to form the residue substitution paths in Figure 2 wherein Subset 1 comprises anentire path while the other subsets comprise the upstream segments of paths converging on either Cys (C) or Ala (A)
the 20 proteinogenic amino acids into subsets accordingto stereochemical similarity as depicted in Figure 1 Subsetassignments are based on the covalent bonding geometryof nonhydrogen side-chain atoms In particular these atomsare characterized as either tetrahedral or trigonal planarand branching of tetrahedral atoms is also considered Forexamplemethyl andmethylene carbon atoms are tetrahedralas are sulfur and amine nitrogen atoms Hence the sulfhydrylgroup of Cys is regarded as isosteric with a methyl groupwhereas the sulfur atom of Met and the side-chain iminogroup of Arg are both regarded as isosteric with a methylenegroup In contrast amide and carboxyl carbon atoms aretrigonal planar rather than tetrahedral
The subsets defined in Figure 1 are linked to form residuesubstitution paths avoiding sterically incompatible changesas depicted in Figure 2 Subset 1 comprises an entire pathwhile the other subsets comprise the upstream segments ofpaths converging on either Cys or Ala The path of Subset 2converges on Cys as members of Subset 1 larger than Cys aresterically incompatible with Subset 2 whose 120575-carbon atom istrigonal planar The path of Subset 3 converges on Ala rather
K R (1)
(2) (5) (6)
Q E M L I
(3) (4)
Y W C T V
F H S P (7)
N D A G
Figure 2 Amino-acid residue substitution paths avoiding stericallyincompatible changes (cf (7)) for the 20 canonical proteinogenicresidues Numbers correspond to the subsets of stereochemicallysimilar amino acids depicted in Figure 1 For every residue substi-tution along the paths shown the number of unsatisfied hydrogenbonds is presented in Table 1
Advances in Bioinformatics 5
Table 1 Numbers of unsatisfied hydrogen bonds for amino-acid residue substitutions of Figure 2 calculated according to (9)
G A S C D T P N V E Q H L I M K R F Y WG 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashA 0 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashS 1 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashC 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashD 2 2 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashT 1 1 0 1 mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashP 1 1 mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashN 2 2 mdash mdash 2 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashV 0 0 1 0 mdash 1 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashE 2 2 3 2 mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashQ 2 2 3 2 mdash mdash mdash mdash mdash 2 0 mdash mdash mdash mdash mdash mdash mdash mdash mdashH 2 2 mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdashL 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdashI 0 0 1 0 mdash 1 mdash mdash 0 mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdashM 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdashK 1 1 2 1 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 0 mdash mdash mdash mdashR 3 3 4 3 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 3 4 0 mdash mdash mdashF 0 0 mdash mdash 2 mdash mdash 2 mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdashY 1 1 mdash mdash 3 mdash mdash 3 mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 0 mdashW 1 1 mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 mdash mdash mdash mdash mdash mdash mdash 0
than Cys because the 120574-carbon atom is trigonal planar inSubset 3 The case of Subset 4 is thus similar to that of Subset3 but the 120574-carbon atom in Subset 4 is constrained as part ofa five-membered ring such that Subsets 3 and 4 are deemedsterically incompatible The path of Subset 5 (Leu) convergeson Cys as members of Subset 1 larger than Cys are stericallyincompatible with the branching at the 120574-carbon atomof LeuThe path of Subset 6 converges on Ala instead of Cys becauseof the branching at the 120573-carbon atom of Subset 6 Finallythe path of Subset 7 (Pro) converges on Ala as the Pro side-chain is constrained by ring formation with the main-chainnitrogen atom and thus distorted beyond the 120573-carbon atom
According to Figure 2 a substitution from any residueupstream to any residue downstream along a particular pathavoids steric incompatibility Conversely downstream-to-upstream and off-path substitutions are deemed stericallyincompatible This is a qualitative first approximation usinga highly idealized model rather than an attempt at definitivecategorization
As regards ΔΔ119866119894119895119896
(in (7)) this is related to amino-acidresidue structural differences (as a qualitative first approxi-mation rather than an attempt to obtain quantitatively veryaccurate results) by heuristic partitioning as
ΔΔ119866119894119895119896= 119888Δ119881
119894119895119896+ 119887119873119894119895119896 (8)
where 119888 and 119887 are both proportionality constants while Δ119881119894119895119896
is the change in residue volume and 119873119894119895119896
is the number ofunsatisfied hydrogen bonds (Table 1 wherein the residues areordered by their volumes [21]) considering that an energeticpenalty is incurred with unsatisfied hydrogen bonds uponbinding a suboptimally complementary epitope
Table 1 is sparse with a null value (ldquomdashrdquo) for mostcells because (8) is evaluated only for sterically compatiblesubstitutions (cf (7)) Hence the numbers in Table 1 arefor sterically compatible substitutions according to Figure 2The said numbers were each obtained assuming that everyavailable highly electronegative atom participates in theformation of a hydrogen bond as
119873119894119895119896= 119860119894119895119896minus 119861119894119895119896 (9)
where 119860119894119895119896
is the total number of oxygen and nitrogenatoms in the aligned residues while 119861
119894119895119896is the number of
such atoms for which direct correspondence was affirmedas regards structural superposability atom identity andcovalent bonding partners (according to Figure 1) Thus119873
119894119895119896
increases with each failure to affirm a direct correspondencebetween hydrogen-bonding atoms Physically such failurecorresponds to an unsatisfied hydrogen bond on the epitopeits binding site or both
Direct correspondence was affirmed for all main-chain(ie backbone) carbonyl oxygen atoms In view of thedifference between Pro and other residues at the main-chainnitrogen atom direct correspondence was affirmed for thisatom between Pro and itself as well as between non-Proresidues Among side-chain atoms direct correspondencewas affirmed only between hydroxyl oxygen atoms of Serand Thr carbonyl oxygen atoms of Asp and Asn and ofGlu and Gln and 120575-nitrogen atoms of His and Trp Tofacilitate explication a residue is termed polar if its side-chaincontains at least one oxygen or nitrogen atom but nonpolarif otherwise Accordingly 119873
119894119895119896= 0 for pairs of identical
residues (along the diagonal of Table 1) and of non-Prononpolar residues For Pro paired with Gly or Ala 119873
119894119895119896= 1
6 Advances in Bioinformatics
due to themain-chain nitrogen difference For a polar residuepaired with a non-Pro nonpolar residue 119873
119894119895119896is the total
number of side-chain oxygen and nitrogen atoms For a pairof polar residues the total number of side-chain oxygen andnitrogen atoms is an upper bound on 119873
119894119895119896 Hence 119873
119894119895119896= 3
for Tyr paired with Asn although 119873119894119895119896
= 2 for Asn pairedwith Asp
Provisional values for the proportionality constants 119888and 119887 in (8) were obtained from literature In particular119888 was regarded as the energetic penalty per unit volumeof a cavity formed due to substitution of a less bulkyresidue within the interior of a folded protein with valueof minus0024 kcalmolminus1 Aminus1 estimated empirically from proteinmutation studies [37] and 119887 was likewise regarded as theenergetic cost of breaking a hydrogen bond with a conser-vative estimated value of 05 kcalmolminus1 per bond [38]
Thus using (1) through (8) in conjunction with Table 1the literature values for constants in (8) and publishedamino-acid residue volumes [21] reduced epitope counts(1) were computed for artificial epitope sequences and alsoactual experimentally studied epitope sequences assuming atemperature of 37∘C (31015 K)
22 Retrieval and Processing of Epitope Data Epitope datawere obtained via the Immune Epitope Database (IEDBat httpwwwiedborg) [39] as database records retrievedthrough searches conducted from 4 to 6 October 2015 usingits B-Cell Search and T-Cell Search facilities in the lattercase restricting searches to either MHC class I or II alleleswith each record thus retrieved pertaining to an individualB-cell or T-cell assay and containing data fields defined inrelation to the key concepts of ldquoObject Typerdquo (ie moleculetype in the context of chemical structure) ldquoEpitope Relationrdquo(ie molecule relationship to epitope) ldquo1st Immunogenrdquo(ie immunogen initially administered to elicit the immuneresponse) and ldquoAntigenrdquo (ie antigen used in the B-cell or T-cell assay) Searches were restricted such that the data fieldsnamed ldquo1st Immunogen Object Typerdquo and ldquo1st ImmunogenEpitope Relationrdquo had values of ldquoLinear Peptiderdquo and ldquoEpi-toperdquo respectively with the ldquoMHC classrdquo option set to eitherldquoIrdquo or ldquoIIrdquo for T-Cell Search Employing this basic searchstrategy both narrow and broad searches were conducted theformer to retrieve data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitutionand the latter to survey the entire repertoire of availabledata on linear peptidic epitopes (as illustrated by example inFigure 3)
Narrow searches were first attempted to retrieve recordsfor which structural data on epitope binding were availablesetting the ldquoYesrdquo option for the Viewer Flag (under ldquo3DStructure of Complexrdquo) Subsequently the searches were con-ducted to separately retrieve records curated as containingeither negative or positive data on cross-reactivity and wererestricted such that the data fields named ldquoAntigen ObjectTyperdquo and ldquoAntigen Epitope Relationrdquo had values of ldquoLinearPeptiderdquo and ldquoStructurally Relatedrdquo respectively with thedata field named ldquoQualitativeMeasurementrdquo having a value ofeither ldquonegativerdquo or ldquopositiverdquo In the latter case searches werefurther limited to class I MHC-restricted T-cell assays for
Figure 3 IEDB T-Cell Search facility interface(httpwwwiedborgadvancedQueryTcellphp) Example showncorresponds to narrow search for class I MHC-restricted T-cellassays with positive data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitution Searchesfor B-cell assays were performed using IEDB B-Cell Searchfacility interface (httpwwwiedborgadvancedQueryBcellphp)of analogous form (eg without assay-related fields for ldquoMHCallelerdquo) narrow searches for negative data were performed withoutspecified value for field labeled as ldquoEpitope Structure Definesrdquo andbroad searches were performed without specified value for bothsaid field and that named ldquoQualitative Measurementrdquo (see main textfor additional details)
which the data field named ldquoEpitope Structure Definesrdquo hada value of either ldquoExact Epitoperdquo (rather than ldquoepitope con-taining regionantigenic siterdquo) as cross-reaction may occurwith structural differences outside any relevant epitopesunless this is for a peptide confined within a class I MHCbinding cleft For each record thus retrieved the immunogen
Advances in Bioinformatics 7
5 10 15 200Sequence length
0
5
10
15
20
Redu
ced
epito
pe co
unt
Figure 4 Reduced epitope counts (1) for peptidic sequences ofuniform length varying only at a single-residue position Each countcorresponds to a set of 20 peptides representing every standardproteinogenic amino-acid residue at the variable-residue positionFunctional similarity is equated with either fractional aligned-sequence identity (ldquo◻rdquo (3) and (4)) or the Shannon informationentropy for differential epitope binding ((5) through (9)) In thelatter case countswere based on steric incompatibility only (ldquo998810rdquo (7)and Figure 2) both steric incompatibility and cavity formation (ldquordquo119888Δ119881119894119895119896
in (8)) or steric incompatibility with both cavity formationand hydrogen bonding (ldquordquo (8) and (9) and Table 1)
and antigen sequences (ie values of the data fields namedldquo1st Immunogen Object Primary Molecule Sequencerdquo andldquoAntigen Object Primary Molecule Sequencerdquo resp) werecompared The record was considered for further processingonly if the said sequences were of equal length and differedat exactly one amino-acid residue position and the recordwas ultimately retained only if a corresponding record with aldquoQualitative Measurementrdquo data-field value of ldquopositiverdquo wasfound sharing the same immunogen and immunoassay pro-tocol except that the antigen was identical to the immunogen(thus confirming that the immunogen had elicited a relevantanti-peptide immune response in the first place)The epitopedata thus obtained were analyzed in light of the precedingSection 21
The broad searches were conducted to retrieve recordson linear peptidic epitopes in general Among the epitoperecords thus retrieved those wherein the data field namedldquoModificationrdquowas nonempty (indicating amino-acid residuecovalent modification) were excluded from further consider-ation and the remainder were analyzed as regards functionalredundancy according to the preceding Section 21
3 Results and Discussion
31 Functional Redundancy versus Sequence As depicted inFigure 4 computed functional redundancy increases with
epitope sequence length if functional similarity is equatedwith fractional aligned-sequence identity (ldquo◻rdquo) whereas itis independent of sequence length if functional similar-ity is equated with the Shannon information entropy fordifferential epitope binding (ldquo998810rdquo ldquordquo and ldquordquo) The lat-ter approach thus better accounts for the possibility of astructural difference in only a single chemical group beingsufficient to preclude antigenic cross-reaction provided thatthe sequence under consideration is entirely encompassedby the relevant epitope structure as structural differencesmay be functionally irrelevant if they lie outside the epitopeMoreover computed functional redundancy decreases (iethe reduced epitope count approaches the total epitopecount) when both steric incompatibility and the energeticpenalty of cavity formation are considered (ldquordquo) instead ofsteric incompatibility alone (ldquo998810rdquo)The redundancy decreaseseven further if the energetic penalty of unsatisfied hydrogenbonds (more generally representing suboptimal electrostaticcomplementarity) is also considered (ldquordquo)
32 Empirical Cross-Reactivity Data IEDB searches yielded26 records on epitope cross-reactivity vis-a-vis single-residuesubstitution (Table 2) all of which contain only qualitativerather than quantitative binding data without reference toatomic coordinates or other structural details of boundepitopes The data suggest that the posited steric incompat-ibility (in (7) and Figure 2) assumes too little tolerance forsuboptimal complementarity especially as regards volumedifferences between immunogen and antigen in Figure 5which depicts examples of cross-reaction despite substitu-tions increasing residue bulk (ie positive-data points to theleft of the diagonal) This might be at least partially correctedby relaxing or even eliminating the steric-incompatibilitycriteria possibly replacing them with a continuous-formshape-dependent energetic penalty (considering that theyassume an infinite-step hard-sphere model of steric clashesbetween atoms whereas potential energy may be more accu-rately represented by continuous functions of interatomicseparation distances) likewise electrostatic complementarityalso might be more accurately described for example interms of differential energetic contributions due to chargedand uncharged hydrogen-bonding partners However typicalepitope binding actuallymay be suboptimal considering ther-modynamic and kinetic constraints on affinity maturation inB-cell ontogeny and the typical absence of affinitymaturationin T-cell ontogeny which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity thanimmunogen
Notwithstanding the problem of suboptimal complemen-tarity described above data presented in Table 2 suggestthat the approach to epitope sequence redundancy proposedherein is more conceptually and practically meaningful thansetting threshold (ie cutoff) values for fractional aligned-sequence identity This is exemplified by the control-reactionepitopes of data rows 18 and 19 (having consensus sequenceLLXRDSFEV) These epitopes differ from each other at justone residue position As typical MHC class I-restricted T-cell epitopes they are nonapeptides for which the highestmeaningful sequence-identity threshold value is 89 (sim89)
8 Advances in Bioinformatics
Table2IEDBQualitative-Outcome(Q)d
atao
npeptidec
ross-reactivity
assays
(cfFigure
5)
Q
Assay
inform
ation
IEDB
Epito
peID
Substitution
Sequ
ence
context(with119883atpo
sitionof
substitution)
Ref
TypeM
HCcla
ssParameter
measured(cfIEDB
ldquomeasuremento
frdquodatafield)
IEDBID
Con
trolreaction
Cross-reactio
n1minus
B-cell
Antibod
y-antig
enbind
ing
1370146
1370150
10801
WY
DXE
DRY
YRE
[22]
2minus
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
854
6260
4PA
SYIX
SAEK
I[23]
3minus
T-cellII
Cell
proliferatio
n1646
734
1646
735
107384
ED
GEP
GIAGFK
GXQ
GPK
[24]
4minus
T-cellII
Cell
proliferatio
n1660246
1660247
46317
FA
NTW
TTCQ
SIAXP
SK[25]
5minus
T-cellII
Cytokine
release(IL-4)
1660252
1660258
112245
AF
NTW
TTCQ
SIAXP
SK[25]
6minus
T-cellII
Cytokine
release(IL-2)
166706
01667065
68846
KA
VHFF
XNIV
TPRT
P[26]
7minus
T-cellII
Cell
proliferatio
n1667112
1667113
80071
AK
VHFF
XNIV
TPRT
P[26]
8minus
T-cellII
Cell
proliferatio
n171660
41716614
125569
TV
SWEG
VGVXP
DV
[27]
9minus
T-cellII
Cell
proliferatio
n171660
41716616
125569
DN
SWEG
VGVTP
XV[27]
10minus
T-cellII
Cell
proliferatio
n1716605
1716618
125571
VT
SWEG
VGVXP
NV
[27]
11+
T-cellI
Cytotoxicity
1634784
1634785
103300
SA
IYXT
VAGSL
[28]
12+
T-cellI
Cytotoxicity
1634784
1634786
103300
GS
IYST
VAXS
L[28]
13+
T-cellI
Cytotoxicity
1634788
1634789
103298
SG
IYAT
VAXS
L[28]
14+
T-cellI
Cytotoxicity
1634788
1634790
103298
AS
IYXT
VASSL
[28]
15+
T-cellI
Cytotoxicity
1657362
1657364
111964
IDYX
FAFR
DL
[29]
16+
T-cellI
Cytokine
release(IFN-120574)
171644
2171644
3125101
ITXM
IKFN
RL[30]
17+
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
852
6260
4K
DSY
IPSA
EXI
[23]
18+
T-cellI
Cytokine
release(IFN-120574)
1381750
1381748
37174
DG
LLXR
DSFEV
[31]
19+
T-cellI
Cytokine
release(IFN-120574)
1381752
1381753
37392
HG
LLXR
DSFEV
[31]
20+
T-cellI
Cytotoxicity
1404705
1404714
61086
SI
SXIEFA
RL[32]
21+
T-cellI
Cytotoxicity
1404705
1404715
61086
AE
SSIEFX
RL[32]
22+
T-cellI
Cytotoxicity
1404705
1404716
61086
RK
SSIEFA
XL[32]
23+
T-cellI
Cytotoxicity
1404712
1404719
61088
EA
SSIEFX
RL[32]
24+
T-cellI
Cytotoxicity
1404713
1404721
61085
KR
SSIEFA
XL[32]
25+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865530
98995
TF
SAPD
XRPA
[33]
26+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865532
98995
AL
SAPD
TRPX
[33]
Advances in Bioinformatics 9
G
S
N
E
K
Y
W
1
2
3
4
5
6
8
12
13
17
18 19
20
21
22
23
2425
A
CD
V
HL
R
15
26
P
Q
M
F
11
7
14
16T
I
109
Ant
igen
resid
ue v
olum
e (Aring3 )
60
80
100
120
140
160
180
200
220
240
80 100 120 140 160 180 200 220 24060Immunogen residue volume (Aring3)
ReferenceNegative B-cellNegative T-cellI
Negative T-cellIIPositive T-cellI
Figure 5 Peptide cross-reactivity assay data of Table 2 vis-a-visamino-acid residue volume [21] of immunogen and antigen atsingle-residue substitution positions along sequence alignments
For longer epitopes (eg typical MHC class II-restrictedT-cell epitopes) the value is even higher For example inthe case of data rows 4 and 5 (having consensus sequenceNTWTTCQSIAXPSK) the said value is 1314 (sim93) Thethreshold values thus calculated are markedly higher thanthose used (eg 70 [20]) for conventional protein sequenceanalysis Moreover regardless of the exact sequence lengthsapplying such threshold values would entail discarding dataon epitopes differing by one residue (eg the examples justcited from Table 2) To avoid such loss of potentially valuableinformation all the available epitope data might be analyzedand a corresponding reduced epitope count reported toindicate the level of redundancy in the underlying dataset
33 Survey of Peptidic Epitope Sequence Data IEDB searchesyielded 11497 records on peptidic epitopes of length less than50 residues (ie the general cutoff provided in the IEDBcuration manual [39]) curated as covalently unmodifiedand comprising 4443 B-cell and 7054 T-cell epitope records(the latter divided between 2749 and 4305 records for MHCrestriction classes I and II resp) for which computedfunctional redundancy vis-a-vis sequence length is depictedin Figure 6
Considering the wide range of sequence counts for thevarious sequence lengths in Figure 6 a logarithmic scale waschosen to depict the data in a compact form Consequentlythe difference calculated as sequence count minus reduced countwas plotted instead of reduced count itself where sequencecount gt reduced count to avoid crowding of data points Forall positive sequence counts (ldquo◻rdquo) the said differences were
consistently higher where functional similarity was equatedwith fractional aligned-sequence identity (ldquotimesrdquo) rather thanthe Shannon information entropy of the differential epitopebinding (ldquordquo) In the latter case (ldquordquo) the differences wereconsistently less than 1 with reduced count close or evenequal to sequence countThese results are anticipated in viewof Figure 4 which shows that reduced count is independent ofsequence length using the entropy-based approach proposedherein but decreases with sequence length using fractionalaligned-sequence identity
The data in Figures 4 and 6 thus point to key considera-tions in evaluating functional redundancy Clearly functionalredundancy may be overestimated by simply equating func-tional similarity with fractional aligned-sequence identityespecially with increasing sequence length for highly similarsequences On the other hand functional redundancy maybe underestimated where functional similarity is equatedwith the Shannon information entropy of differential epitopebinding if the sequence under consideration extends beyondthe relevant epitope structure which becomes more likelywith increasing sequence length Hence the information-entropy approach as developed herein is conceivably appli-cable to sequences not exceeding a realistic application-dependent epitope length (eg six residues for antigenbinding by anti-peptide antibodies [40]) although rigorousgeneralization to longer sequences might be pursued byaligning subsequences of such length andpossibly accountingfor differential immunodominance among immunogen epi-topes [41] For example if a set of peptides were to be foundsuch that each of them comprised a single immunodominantepitope having fixed length (eg six residues in all cases) theinformation-entropy approach might be applied to the entireset even if the peptides were of variable length Moreovereven if the said immunodominant epitopes were highlysimilar to one another in terms of fractional aligned-sequenceidentity all the available data still could be included insubsequent analyses provided that both the total and reducedepitope counts would be reported in order to account forfunctional redundancy In this way useful epitope data couldbe retained instead of discarded
34 Applications and Future Directions The discussion thusfar suggests the potentially greater utility of antigenic func-tional similarity cast as the Shannon information entropy ofdifferential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redun-dancy of epitope data to express their structural diversity ina biologically meaningful manner (ie explicitly in terms ofrelative binding affinity which underlies the balance betweenthe inversely related emergent continuum phenomena ofimmunologic specificity and cross-reactivity) This is subjectto the caveat that assuming an idealized epitope-bindingsite of optimal steric and electrostatic complementarity (egcomparable to what is typically observed in a nativelyfolded wild-type protein) may both overestimate affinityfor an immunogen epitope and underestimate affinity forantigen epitopes structurally different from the immuno-gen epitope (thereby possibly overestimating immunologicspecificity as a consequence of underestimating potential
10 Advances in Bioinformatics
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(a)
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(b)
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
0001001
011
10100
1000
10 20 30 40 500Sequence length
(c)
Figure 6 Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c)Sequence counts (ldquo◻rdquo) are total epitope counts in IEDB Corresponding reduced counts (119903 in (1)) are expressed as the difference (sequencecount minus reduced count) with functional similarity equated with either fractional aligned-sequence identity (ldquotimesrdquo (3) and (4)) or the Shannoninformation entropy for differential epitope binding (ldquordquo (5) through (9)) Missing ldquo◻rdquo indicates sequence count = 0 (eg for B-cell epitopesequence length lt 4 in (a)) where ldquo◻rdquo is present and missing ldquotimesrdquo or ldquordquo indicates sequence count minus reduced count = 0 (eg with missingldquordquo for B-cell epitope sequence length = 5 in (a)) Both ldquotimesrdquo and ldquordquo are missing where sequence count = 1 (eg for B-cell epitope sequencelength = 43 in (a)) as reduced count = 1 for sequence count = 1
for cross-reactivity) These considerations could guide thedevelopment of immunization strategies (eg via vacci-nation) and immunodiagnostics (eg to produce peptidicconstructs suitable for eliciting anti-peptide antibodies thateither protect against disease or serve to detect particularantigens in biological samples via immunoassays) notingthat the practical significance of immunologic specificity andcross-reactivity is highly context-dependent
From the perspective of immunization immunity (eginduced via vaccination or passive transfer of immune-system components) may be useful if specifically directedagainst a pathogen epitope so as to favor pathogen elimi-nation without producing harmful hypersensitivity reactions(eg in the form of allergy or autoimmunity) Yet suchimmunity might be even more useful if it also favored elim-ination of multiple antigenically related variant pathogensvia cross-reaction (such that a single vaccine epitope elicitedthe production of immune-system components each cross-reactive with multiple variant pathogen epitopes) whilestill avoiding hypersensitivity Cross-reactivity of immuneresponses is thus potentially useful within limits defined byrisk of harmful hypersensitivity (eg due to cross-reactionwith host self-antigens) As regards immunodiagnosis animmunodiagnostic test may be useful if it enables specificpathogen detection via discrimination between a pathogen
epitope and a set of other biologically relevant epitopes (egof the host or its commensal symbionts in health or ofother pathogens for which the clinical implications are verydifferent) Yet the test might be even more useful if it alsoenabled detection of multiple antigenically related variantpathogens (such that a single immunologic probe detectedmultiple variant pathogen epitopes) while still retainingits discriminatory power with respect to other biologicallyrelevant epitopes particularly where the variant pathogensare very similar to one another in terms of their clinicalimplications (eg prognosis and therapeutic options)Hencecross-reactivity strongly influences the outcomes of bothimmunization and immunodiagnosis for which reason theirdevelopment could be supported by computational tools thatestimate epitope-binding affinity for cross-reactions
In order to estimate the epitope-binding affinity ofan immune-system component (eg antibody) elicited inresponse to (and thus having a binding site for) immunogenepitope 119894 for antigen epitope 119895 one possible approach wouldbe to express affinity in terms of association constants forexample as
119870119894119895= 119870119894119894119885119894119895 (10)
where 119870119894119895is the association constant for cross-reaction with
antigen epitope 119895 119870119894119894is the association constant for reaction
Advances in Bioinformatics 11
with immunogen epitope 119894 and 119885119894119895is a correction term
to account for the difference between the two associationconstants In turn119870
119894119894might be estimated as
119870119894119894= exp(minus
Δ119866119894119894
119877119879
) (11)
where Δ119866119894119894
is the free-energy change for reaction withimmunogen epitope 119894 while both 119877 and 119879 retain theirmeanings as in (7) In cases where epitopes 119894 and 119895 are both119898residues in length 119885
119894119895might be estimated as
119885119894119895=
119898
prod
119896=1
119904119894119895119896 if
119898
prod
119896=1
119904119894119895119896gt 119870minus1
119894119894
119870minus1
119894119894 otherwise
(12)
where the product retains its meaning as in (6) such that119870119894119895ge 1 (with 119870
119894119895= 1 for nonspecific binding) Δ119866
119894119894could be
estimated from epitope structure (eg using structural ener-getics [40ndash42]) subject to mechanism-specific constraints(eg on B-cell affinity maturation [43]) and 119904
119894119895119896could be
estimated along similar lines with regard to ΔΔ119866119894119895119896
in (7)The preceding discussion has alluded to epitope binding
as if it were a binary interaction which is strictly true onlyfor B-cell epitopes (eg bound by surface immunoglobulin orantibody) but in the context of T-cell epitopes the epitope-binding site is to be understood as a bipartite entity consistingof an MHC molecule and a T-cell receptor (TCR) suchthat the epitope becomes at least partly confined betweenthe two binding-site components The overall process of T-cell recognition is subject to thermodynamic and kineticconstraints on initial MHC-peptide binding [44] and subse-quent TCR recognition ofMHC-peptide complex [45] whichpotentially complicate prediction of T-cell cross-reactivityand functional consequences thereof (eg considering howbinding affinity and receptor numbers interact to influenceT-cell effector function [45]) Still predicting B-cell epitopecross-reactivity is challenging in view of the greater structuraldiversity of B-cell epitopes especially where only a fractionof the epitope surface contributes to the epitope-paratopeinterface such that cross-reaction may occur with variationsin epitope structure beyond the interface Hence althoughthe analysis herein for continuous epitopes might be gener-alized to discontinuous epitopes with indexing of residues byrelative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignmentgaps correlating variations in epitope-residue structure withpotential for cross-reactivity would be nontrivial
The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by
TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR
At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history
4 Summary and Conclusions
In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data
More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
2 Advances in Bioinformatics
are separated in the sequence by intervening residues) andtypical T-cell epitopes (each bound by a MHC moleculefor presentation to T-cells) For these classical heuristicapproaches to decrease the redundancy entail setting asimilarity threshold (typically expressed as a fraction ofidentical residues for a pair of aligned sequences) such thatsubsequent analyses may compensate accordingly (eg byexcluding sequences sharing a degree of similarity abovethe threshold) This practice is well-established for general-purpose protein structural analyses [11ndash13] but potentiallyproblematic if applied to peptidic epitopes in view of thenonlinear relationship between sequence similarity and anti-genic similarity (eg as demonstrated by radically divergentantigenic properties arising from a structural difference ofonly a single chemical group [14]) This suggests the needfor a more functionally meaningful alternative approach toexpressing redundancy of epitope data
Protein folding and binding [15] may be regarded asmanifestations of the same underlying phenomenon drivenby the hydrophobic effect favoring burial of nonpolar sur-faces in general away from solvent water albeit with moreselective burial of polar surfaces that favors complemen-tary pairing between hydrogen-bond donors and acceptorsResidues that thus become completely buried (eg withinthe core of a folded protein or at the binding interface of aligand-receptor complex) are sterically and electrostaticallyconstrained by surrounding residues to a much greaterextent than unfolded or even folded but only partially buriedresidues (eg at solvent-exposed protein surfaces) Conse-quentlymolecular recognition of epitopes which ismediatedby local ligand-receptor binding interactions depends onsequence detailsmuchmore than overall (ie global) featuresof protein structure do notably in the sense of proteinfolds
Proteinsmay share the same fold (eg as demonstrated bystructural superposition of their backbones) well into the so-called twilight zone below the threshold for reliable detectionof aligned-sequence similarity (ie less than 35 pairwisesequence identity) [16 17] Residue substitutions can betolerated at surface-exposed positions (eg with replacementof certain polar residues by others whose side-chains differin steric and electrostatic properties) Even at buried-corepositions certain nonpolar residues may be replaced byothers whose side-chains differ in volume especially whereadditional substitutions or other changes compensate for thevolume differences although the introduction of unsatisfiedhydrogen-bond donors or acceptors and of unpaired formalcharges tends to be poorly tolerated [18 19] Howevereven just a single-residue substitution in an epitope mayabolish epitope-specific immune binding (eg by antibod-ies) if surface complementarity is disrupted at the epitope-binding interface (much as the structural organization ofa protein core is disrupted by a radical substitution) asobserved with immune evasion by pathogen escape mutantsHence the notion of conservative residue substitutions asconceptualized for evolving protein sequences (eg on thebasis of changes in both volume and hydrophobicity) is ofquestionable applicability to epitopes as it obscures crucialdetails of changes in steric and electrostatic complementarity
From the perspective of immune function redundancyof epitope data arguably implies potential for antigeniccross-reaction (ie recognition of different epitopes bya common B- or T-cell receptor or equivalent thereof)rather than sequence similarity defined without reference toany immune-system component Hence the present workoutlines a generic theoretical framework for quantitativelyexpressing epitope data redundancy in terms of potentialantigenic cross-reactivity as estimated from an idealized lim-iting case of optimal ligand-receptor complementarity (com-parable to the steric and electrostatic self-complementarityobservedwithin folded proteins [20])This is applied to linearpeptidic epitopes for clarity of illustration and consideringtheir envisioned roles as immune targets in relation topeptide-based vaccines and immunodiagnostics Theoreticalresults are compared with informative available experimentaldata on antigenic cross-reactivity of structurally related shortpeptides in light of possible suboptimal ligand-receptor com-plementarity and its biomedical implications vis-a-vis epitopesequence variation (eg arising as pathogen mutations andhost polymorphisms)
2 Theory and Methods
21 Functional Redundancy of Epitope Data The presentwork aims to support the use of available empirical datato further develop epitope-prediction methods without dis-carding unique epitopes deemed redundant on the basisof some sequence similarity threshold Epitope data thusretained must be characterized as to redundancy for whicha reduced epitope count is introduced belowThis is based onepitope similarity defined in terms of either aligned-sequencesimilarity or functional similarity as regards cross-reactiveepitope binding Aligned-sequence similarity is exploredfirst because it is a familiar and readily calculated quantityHowever its limitations become apparent in the contextof attempting to describe antigenic cross-reaction from aphysicochemical perspective Hence functional similarity isalso explored
For a set of epitopes (construed as epitope structureseg linear peptidic sequences) functional redundancy canbe expressed as a reduced epitope count 119903 such that 1 le
119903 le 119899 where 119899 is the total epitope count with 119903 = 1 and119903 = 119899 respectively corresponding to extreme cases whereinevery epitope is either maximally or minimally similar toevery other epitope from a functional standpoint Thusstructurally identical epitopes would be maximally similarto one another but structurally nonidentical epitopes mightalso be regarded as maximally similar if their structuraldifference was negligible or otherwise irrelevant from afunctional standpoint For simplicity further elaboration ofconcepts herein focuses on cases wherein all epitopes arestructurally unique peptidic sequences of equal length (Thequalifier of ldquostructurally uniquerdquo is consistent with real-world epitope databases insofar as they regard each epitopeas a structurally unique entity The qualifier of ldquoequal lengthrdquois clearly applicable to typical class I MHC-restricted T-cellepitopes and relatively short linear B-cell epitopes albeit less
Advances in Bioinformatics 3
so for longer B-cell epitopes and class II MHC-restricted T-cell epitopes)
For a set of 119899 structurally unique epitopes the reducedepitope count may be defined as
119903 =
119899
sum
119894=1
119908119894 (1)
where 119908119894is the contribution of the 119894th epitope to 119903 In turn
119908119894may be defined as
119908119894=
1 + sum119899
119895=1(1 minus 119878
119894119895)
119899
(2)
where 119878119894119895is the functional similarity between the 119894th and 119895th
epitopes such that 0 le 119878119894119895le 1 with 119878
119894119895= 0 and 119878
119894119895= 1
respectively corresponding to extreme cases wherein the 119894thand 119895th epitopes are either minimally or maximally similarfroma functional standpoint If all the epitopes are of uniformsequence length119898 amino-acid residues 119878
119894119895might be defined
for a pairwise epitope sequence alignment as
119878119894119895=
sum119898
119896=1119904119894119895119896
119898
(3)
where 119904119894119895119896
is the functional similarity between amino-acidresidues 119886
119894119896and 119886
119895119896at the 119896th sequence positions of the
aligned 119894th and 119895th epitope sequences respectively such that0 le 119904
119894119895119896le 1 with 119904
119894119895119896= 0 and 119904
119894119895119896= 1 respectively
corresponding to extreme cases wherein 119886119894119896
and 119886119895119896
areeither minimally or maximally similar from a functionalstandpoint Accordingly 119904
119894119895119896might be simply defined as
119904119894119895119896=
0 if 119886119894119896
= 119886119895119896
1 if 119886119894119896= 119886119895119896
(4)
for (3) to yield the fraction of identical aligned residueswhich is a conventional measure of protein sequence simi-larity [11ndash13] However this is an oversimplification that failsto capture the possibility of minimal functional similaritybetween epitopes (eg operationally defined as undetectableantigenic cross-reactivity in an immunoassay) arising froma structural difference of only a single chemical group [14]unless the possibility of antigenic cross-reaction is deniedaltogether
A more realistic physicochemically grounded alternativeis to conceptualize functional similarity on the basis ofbiomolecular binding interactions thereby relating func-tional similarity to binding affinity as measured via animmunoassay that employs some epitope-binding probe (egantibody) Thus functional similarity between epitopes (ie119878119894119895of (2) and (3)) may be defined alternatively as the Shannon
information entropy [34ndash36] for the equilibrium distributionof epitope-bound probe between the 119894th and 119895th epitopeswith both epitopes at the same concentration in the presenceof much less probe such that
119878119894119895= minus (119891
119894log2119891119894+ 119891119895log2119891119895) (5)
where 119891119894and 119891
119895are the fractions of epitope-bound probe
bound to the 119894th and 119895th epitopes respectively (noting that119891119894+ 119891119895= 1 and 119878
119894119895= 1 where the 119894th and 119895th epitopes are
indistinguishable by means of the immunoassay) Constrain-ing 119891119895as 119891119895le 119891119894 it may in turn be defined as
119891119895=
prod119898
119896=1119904119894119895119896
1 + prod119898
119896=1119904119894119895119896
(6)
where 119904119894119895119896
retains its meaning as introduced in (3) Howeverretaining the definition of 119904
119894119895119896given by (4) would be tanta-
mount to denying the possibility of antigenic cross-reactionhence an alternative is warranted
A plausible starting point is the essential unity of proteinfolding and binding phenomena based on steric and elec-trostatic complementarity of molecular surfaces as observedwithin typical folded proteins [20] If this is regarded as rep-resentative of ligand-receptor interfaces wherein the ligand isa peptidic epitope bound to one or more protein receptors(eg antibody or MHC molecule and T-cell receptor) anidealized epitope-binding site may be devised to estimateepitope similarity as potential antigenic cross-reactivity interms of structural differences assuming optimal comple-mentarity with all epitope atoms completely surrounded byand close-packed against the binding-site contact atomsHence suboptimal complementarity would arguably resultif even a single epitope amino-acid residue was structurallyaltered Notably steric clashes conceivably would precludeantigenic cross-reaction if the altered residue was stericallyincompatible with its original counterpart in the sense offailing to assume any conformation allowing it to fit entirelywithin the region of space (as defined by residue van derWaals surfaces) that otherwise would have been occupiedby the original residue at the binding site Thus functionalsimilarity between epitope amino-acid residues (ie 119904
119894119895119896of
(3) (4) and (6)) may be defined alternatively as
119904119894119895119896
=
0 if 119886119894119896 119886119895119896
sterically incompatible
exp(minusΔΔ119866119894119895119896
119877119879
) otherwise
(7)
where 119894 and 119895 denote the original and altered epitopesrespectively 119886
119894119896and 119886
119895119896retain their meanings as in (4)
ΔΔ119866119894119895119896
is the contribution to the change in the overall free-energy change of epitope binding due to replacement of 119886
119894119896
by 119886119895119896119877 is the gas constant and119879 is the absolute temperature
ΔΔ119866119894119895119896
is analogous to the change in the free-energy changeof folding due to replacement of the 119896th residue of a proteinby a structurally different residue (eg to generate a mutantversion of a wild-type protein)
To evaluate (7) steric incompatibility is posited for everysubstitution wherein the net change in volume is positive(ie wherever bulk increases) and also for other substitutionswherein stereochemical differences (eg relating to bondangles and branching) are deemed sufficient to result in stericclashes that preclude antigenic cross-reaction Steric incom-patibility is conceptualized herein on the basis of grouping
4 Advances in Bioinformatics
HN
HO HOArg (R)
NH
O O
O O
O
O O O O O O
OH
O O O
O O
OO
O O O O
Lys (K)
Subset 2 trigonal planar
HO
HN HN
Gln (Q)
HOHO
HOGlu (E)
Subset 4 trigonal planar
membered ring
N
S
HOMet (M)
HS HO
HOCys (C)
HOSer (S)
HOAla (A)
HOGly (G)
O
HOTyr (Y)
Subset 5 branching at tetrahedral 120574-
HOPhe (F)
HOAsn (N)
HOAsp (D)
Subset 6 branching at tetrahedral 120573-carbon atom
Subset 7 withring comprising
main-chain
HN
HOHOHis (H)
HOLeu (L)
HOIle (I)
HOVal (V)
HOThr (T) Pro (P)
H2N
H2N
H2N
H2N H2N H2N H2NH2NH2N
H2N H2N H2N H2N H2NH2N
H2N H2N H2N H2N H2NH2N
NH2
NH2
Subset 1 no trigonal planar atoms or branching at 120573 through 120576 positions
OH
120575-carbon atomSubset 3 trigonal planar 120574-carbon atom but no five-
120574-carbon atom in five-
Trp (W)
carbon atom
HOO
membered ring
nitrogen atom
Figure 1 Subsets of the 20 proteinogenic amino acids grouped by stereochemical similarity Structures were obtained from the ChEBI(Chemical Entities of Biological Interest) database (httpwwwebiacukchebi) Within each subset comprising two or more amino acidsthese are arranged from left to right in order of decreasing size such that sterically incompatible changes are avoided by substitution of asmaller amino acid for a larger one The subsets are linked to form the residue substitution paths in Figure 2 wherein Subset 1 comprises anentire path while the other subsets comprise the upstream segments of paths converging on either Cys (C) or Ala (A)
the 20 proteinogenic amino acids into subsets accordingto stereochemical similarity as depicted in Figure 1 Subsetassignments are based on the covalent bonding geometryof nonhydrogen side-chain atoms In particular these atomsare characterized as either tetrahedral or trigonal planarand branching of tetrahedral atoms is also considered Forexamplemethyl andmethylene carbon atoms are tetrahedralas are sulfur and amine nitrogen atoms Hence the sulfhydrylgroup of Cys is regarded as isosteric with a methyl groupwhereas the sulfur atom of Met and the side-chain iminogroup of Arg are both regarded as isosteric with a methylenegroup In contrast amide and carboxyl carbon atoms aretrigonal planar rather than tetrahedral
The subsets defined in Figure 1 are linked to form residuesubstitution paths avoiding sterically incompatible changesas depicted in Figure 2 Subset 1 comprises an entire pathwhile the other subsets comprise the upstream segments ofpaths converging on either Cys or Ala The path of Subset 2converges on Cys as members of Subset 1 larger than Cys aresterically incompatible with Subset 2 whose 120575-carbon atom istrigonal planar The path of Subset 3 converges on Ala rather
K R (1)
(2) (5) (6)
Q E M L I
(3) (4)
Y W C T V
F H S P (7)
N D A G
Figure 2 Amino-acid residue substitution paths avoiding stericallyincompatible changes (cf (7)) for the 20 canonical proteinogenicresidues Numbers correspond to the subsets of stereochemicallysimilar amino acids depicted in Figure 1 For every residue substi-tution along the paths shown the number of unsatisfied hydrogenbonds is presented in Table 1
Advances in Bioinformatics 5
Table 1 Numbers of unsatisfied hydrogen bonds for amino-acid residue substitutions of Figure 2 calculated according to (9)
G A S C D T P N V E Q H L I M K R F Y WG 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashA 0 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashS 1 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashC 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashD 2 2 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashT 1 1 0 1 mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashP 1 1 mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashN 2 2 mdash mdash 2 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashV 0 0 1 0 mdash 1 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashE 2 2 3 2 mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashQ 2 2 3 2 mdash mdash mdash mdash mdash 2 0 mdash mdash mdash mdash mdash mdash mdash mdash mdashH 2 2 mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdashL 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdashI 0 0 1 0 mdash 1 mdash mdash 0 mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdashM 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdashK 1 1 2 1 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 0 mdash mdash mdash mdashR 3 3 4 3 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 3 4 0 mdash mdash mdashF 0 0 mdash mdash 2 mdash mdash 2 mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdashY 1 1 mdash mdash 3 mdash mdash 3 mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 0 mdashW 1 1 mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 mdash mdash mdash mdash mdash mdash mdash 0
than Cys because the 120574-carbon atom is trigonal planar inSubset 3 The case of Subset 4 is thus similar to that of Subset3 but the 120574-carbon atom in Subset 4 is constrained as part ofa five-membered ring such that Subsets 3 and 4 are deemedsterically incompatible The path of Subset 5 (Leu) convergeson Cys as members of Subset 1 larger than Cys are stericallyincompatible with the branching at the 120574-carbon atomof LeuThe path of Subset 6 converges on Ala instead of Cys becauseof the branching at the 120573-carbon atom of Subset 6 Finallythe path of Subset 7 (Pro) converges on Ala as the Pro side-chain is constrained by ring formation with the main-chainnitrogen atom and thus distorted beyond the 120573-carbon atom
According to Figure 2 a substitution from any residueupstream to any residue downstream along a particular pathavoids steric incompatibility Conversely downstream-to-upstream and off-path substitutions are deemed stericallyincompatible This is a qualitative first approximation usinga highly idealized model rather than an attempt at definitivecategorization
As regards ΔΔ119866119894119895119896
(in (7)) this is related to amino-acidresidue structural differences (as a qualitative first approxi-mation rather than an attempt to obtain quantitatively veryaccurate results) by heuristic partitioning as
ΔΔ119866119894119895119896= 119888Δ119881
119894119895119896+ 119887119873119894119895119896 (8)
where 119888 and 119887 are both proportionality constants while Δ119881119894119895119896
is the change in residue volume and 119873119894119895119896
is the number ofunsatisfied hydrogen bonds (Table 1 wherein the residues areordered by their volumes [21]) considering that an energeticpenalty is incurred with unsatisfied hydrogen bonds uponbinding a suboptimally complementary epitope
Table 1 is sparse with a null value (ldquomdashrdquo) for mostcells because (8) is evaluated only for sterically compatiblesubstitutions (cf (7)) Hence the numbers in Table 1 arefor sterically compatible substitutions according to Figure 2The said numbers were each obtained assuming that everyavailable highly electronegative atom participates in theformation of a hydrogen bond as
119873119894119895119896= 119860119894119895119896minus 119861119894119895119896 (9)
where 119860119894119895119896
is the total number of oxygen and nitrogenatoms in the aligned residues while 119861
119894119895119896is the number of
such atoms for which direct correspondence was affirmedas regards structural superposability atom identity andcovalent bonding partners (according to Figure 1) Thus119873
119894119895119896
increases with each failure to affirm a direct correspondencebetween hydrogen-bonding atoms Physically such failurecorresponds to an unsatisfied hydrogen bond on the epitopeits binding site or both
Direct correspondence was affirmed for all main-chain(ie backbone) carbonyl oxygen atoms In view of thedifference between Pro and other residues at the main-chainnitrogen atom direct correspondence was affirmed for thisatom between Pro and itself as well as between non-Proresidues Among side-chain atoms direct correspondencewas affirmed only between hydroxyl oxygen atoms of Serand Thr carbonyl oxygen atoms of Asp and Asn and ofGlu and Gln and 120575-nitrogen atoms of His and Trp Tofacilitate explication a residue is termed polar if its side-chaincontains at least one oxygen or nitrogen atom but nonpolarif otherwise Accordingly 119873
119894119895119896= 0 for pairs of identical
residues (along the diagonal of Table 1) and of non-Prononpolar residues For Pro paired with Gly or Ala 119873
119894119895119896= 1
6 Advances in Bioinformatics
due to themain-chain nitrogen difference For a polar residuepaired with a non-Pro nonpolar residue 119873
119894119895119896is the total
number of side-chain oxygen and nitrogen atoms For a pairof polar residues the total number of side-chain oxygen andnitrogen atoms is an upper bound on 119873
119894119895119896 Hence 119873
119894119895119896= 3
for Tyr paired with Asn although 119873119894119895119896
= 2 for Asn pairedwith Asp
Provisional values for the proportionality constants 119888and 119887 in (8) were obtained from literature In particular119888 was regarded as the energetic penalty per unit volumeof a cavity formed due to substitution of a less bulkyresidue within the interior of a folded protein with valueof minus0024 kcalmolminus1 Aminus1 estimated empirically from proteinmutation studies [37] and 119887 was likewise regarded as theenergetic cost of breaking a hydrogen bond with a conser-vative estimated value of 05 kcalmolminus1 per bond [38]
Thus using (1) through (8) in conjunction with Table 1the literature values for constants in (8) and publishedamino-acid residue volumes [21] reduced epitope counts(1) were computed for artificial epitope sequences and alsoactual experimentally studied epitope sequences assuming atemperature of 37∘C (31015 K)
22 Retrieval and Processing of Epitope Data Epitope datawere obtained via the Immune Epitope Database (IEDBat httpwwwiedborg) [39] as database records retrievedthrough searches conducted from 4 to 6 October 2015 usingits B-Cell Search and T-Cell Search facilities in the lattercase restricting searches to either MHC class I or II alleleswith each record thus retrieved pertaining to an individualB-cell or T-cell assay and containing data fields defined inrelation to the key concepts of ldquoObject Typerdquo (ie moleculetype in the context of chemical structure) ldquoEpitope Relationrdquo(ie molecule relationship to epitope) ldquo1st Immunogenrdquo(ie immunogen initially administered to elicit the immuneresponse) and ldquoAntigenrdquo (ie antigen used in the B-cell or T-cell assay) Searches were restricted such that the data fieldsnamed ldquo1st Immunogen Object Typerdquo and ldquo1st ImmunogenEpitope Relationrdquo had values of ldquoLinear Peptiderdquo and ldquoEpi-toperdquo respectively with the ldquoMHC classrdquo option set to eitherldquoIrdquo or ldquoIIrdquo for T-Cell Search Employing this basic searchstrategy both narrow and broad searches were conducted theformer to retrieve data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitutionand the latter to survey the entire repertoire of availabledata on linear peptidic epitopes (as illustrated by example inFigure 3)
Narrow searches were first attempted to retrieve recordsfor which structural data on epitope binding were availablesetting the ldquoYesrdquo option for the Viewer Flag (under ldquo3DStructure of Complexrdquo) Subsequently the searches were con-ducted to separately retrieve records curated as containingeither negative or positive data on cross-reactivity and wererestricted such that the data fields named ldquoAntigen ObjectTyperdquo and ldquoAntigen Epitope Relationrdquo had values of ldquoLinearPeptiderdquo and ldquoStructurally Relatedrdquo respectively with thedata field named ldquoQualitativeMeasurementrdquo having a value ofeither ldquonegativerdquo or ldquopositiverdquo In the latter case searches werefurther limited to class I MHC-restricted T-cell assays for
Figure 3 IEDB T-Cell Search facility interface(httpwwwiedborgadvancedQueryTcellphp) Example showncorresponds to narrow search for class I MHC-restricted T-cellassays with positive data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitution Searchesfor B-cell assays were performed using IEDB B-Cell Searchfacility interface (httpwwwiedborgadvancedQueryBcellphp)of analogous form (eg without assay-related fields for ldquoMHCallelerdquo) narrow searches for negative data were performed withoutspecified value for field labeled as ldquoEpitope Structure Definesrdquo andbroad searches were performed without specified value for bothsaid field and that named ldquoQualitative Measurementrdquo (see main textfor additional details)
which the data field named ldquoEpitope Structure Definesrdquo hada value of either ldquoExact Epitoperdquo (rather than ldquoepitope con-taining regionantigenic siterdquo) as cross-reaction may occurwith structural differences outside any relevant epitopesunless this is for a peptide confined within a class I MHCbinding cleft For each record thus retrieved the immunogen
Advances in Bioinformatics 7
5 10 15 200Sequence length
0
5
10
15
20
Redu
ced
epito
pe co
unt
Figure 4 Reduced epitope counts (1) for peptidic sequences ofuniform length varying only at a single-residue position Each countcorresponds to a set of 20 peptides representing every standardproteinogenic amino-acid residue at the variable-residue positionFunctional similarity is equated with either fractional aligned-sequence identity (ldquo◻rdquo (3) and (4)) or the Shannon informationentropy for differential epitope binding ((5) through (9)) In thelatter case countswere based on steric incompatibility only (ldquo998810rdquo (7)and Figure 2) both steric incompatibility and cavity formation (ldquordquo119888Δ119881119894119895119896
in (8)) or steric incompatibility with both cavity formationand hydrogen bonding (ldquordquo (8) and (9) and Table 1)
and antigen sequences (ie values of the data fields namedldquo1st Immunogen Object Primary Molecule Sequencerdquo andldquoAntigen Object Primary Molecule Sequencerdquo resp) werecompared The record was considered for further processingonly if the said sequences were of equal length and differedat exactly one amino-acid residue position and the recordwas ultimately retained only if a corresponding record with aldquoQualitative Measurementrdquo data-field value of ldquopositiverdquo wasfound sharing the same immunogen and immunoassay pro-tocol except that the antigen was identical to the immunogen(thus confirming that the immunogen had elicited a relevantanti-peptide immune response in the first place)The epitopedata thus obtained were analyzed in light of the precedingSection 21
The broad searches were conducted to retrieve recordson linear peptidic epitopes in general Among the epitoperecords thus retrieved those wherein the data field namedldquoModificationrdquowas nonempty (indicating amino-acid residuecovalent modification) were excluded from further consider-ation and the remainder were analyzed as regards functionalredundancy according to the preceding Section 21
3 Results and Discussion
31 Functional Redundancy versus Sequence As depicted inFigure 4 computed functional redundancy increases with
epitope sequence length if functional similarity is equatedwith fractional aligned-sequence identity (ldquo◻rdquo) whereas itis independent of sequence length if functional similar-ity is equated with the Shannon information entropy fordifferential epitope binding (ldquo998810rdquo ldquordquo and ldquordquo) The lat-ter approach thus better accounts for the possibility of astructural difference in only a single chemical group beingsufficient to preclude antigenic cross-reaction provided thatthe sequence under consideration is entirely encompassedby the relevant epitope structure as structural differencesmay be functionally irrelevant if they lie outside the epitopeMoreover computed functional redundancy decreases (iethe reduced epitope count approaches the total epitopecount) when both steric incompatibility and the energeticpenalty of cavity formation are considered (ldquordquo) instead ofsteric incompatibility alone (ldquo998810rdquo)The redundancy decreaseseven further if the energetic penalty of unsatisfied hydrogenbonds (more generally representing suboptimal electrostaticcomplementarity) is also considered (ldquordquo)
32 Empirical Cross-Reactivity Data IEDB searches yielded26 records on epitope cross-reactivity vis-a-vis single-residuesubstitution (Table 2) all of which contain only qualitativerather than quantitative binding data without reference toatomic coordinates or other structural details of boundepitopes The data suggest that the posited steric incompat-ibility (in (7) and Figure 2) assumes too little tolerance forsuboptimal complementarity especially as regards volumedifferences between immunogen and antigen in Figure 5which depicts examples of cross-reaction despite substitu-tions increasing residue bulk (ie positive-data points to theleft of the diagonal) This might be at least partially correctedby relaxing or even eliminating the steric-incompatibilitycriteria possibly replacing them with a continuous-formshape-dependent energetic penalty (considering that theyassume an infinite-step hard-sphere model of steric clashesbetween atoms whereas potential energy may be more accu-rately represented by continuous functions of interatomicseparation distances) likewise electrostatic complementarityalso might be more accurately described for example interms of differential energetic contributions due to chargedand uncharged hydrogen-bonding partners However typicalepitope binding actuallymay be suboptimal considering ther-modynamic and kinetic constraints on affinity maturation inB-cell ontogeny and the typical absence of affinitymaturationin T-cell ontogeny which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity thanimmunogen
Notwithstanding the problem of suboptimal complemen-tarity described above data presented in Table 2 suggestthat the approach to epitope sequence redundancy proposedherein is more conceptually and practically meaningful thansetting threshold (ie cutoff) values for fractional aligned-sequence identity This is exemplified by the control-reactionepitopes of data rows 18 and 19 (having consensus sequenceLLXRDSFEV) These epitopes differ from each other at justone residue position As typical MHC class I-restricted T-cell epitopes they are nonapeptides for which the highestmeaningful sequence-identity threshold value is 89 (sim89)
8 Advances in Bioinformatics
Table2IEDBQualitative-Outcome(Q)d
atao
npeptidec
ross-reactivity
assays
(cfFigure
5)
Q
Assay
inform
ation
IEDB
Epito
peID
Substitution
Sequ
ence
context(with119883atpo
sitionof
substitution)
Ref
TypeM
HCcla
ssParameter
measured(cfIEDB
ldquomeasuremento
frdquodatafield)
IEDBID
Con
trolreaction
Cross-reactio
n1minus
B-cell
Antibod
y-antig
enbind
ing
1370146
1370150
10801
WY
DXE
DRY
YRE
[22]
2minus
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
854
6260
4PA
SYIX
SAEK
I[23]
3minus
T-cellII
Cell
proliferatio
n1646
734
1646
735
107384
ED
GEP
GIAGFK
GXQ
GPK
[24]
4minus
T-cellII
Cell
proliferatio
n1660246
1660247
46317
FA
NTW
TTCQ
SIAXP
SK[25]
5minus
T-cellII
Cytokine
release(IL-4)
1660252
1660258
112245
AF
NTW
TTCQ
SIAXP
SK[25]
6minus
T-cellII
Cytokine
release(IL-2)
166706
01667065
68846
KA
VHFF
XNIV
TPRT
P[26]
7minus
T-cellII
Cell
proliferatio
n1667112
1667113
80071
AK
VHFF
XNIV
TPRT
P[26]
8minus
T-cellII
Cell
proliferatio
n171660
41716614
125569
TV
SWEG
VGVXP
DV
[27]
9minus
T-cellII
Cell
proliferatio
n171660
41716616
125569
DN
SWEG
VGVTP
XV[27]
10minus
T-cellII
Cell
proliferatio
n1716605
1716618
125571
VT
SWEG
VGVXP
NV
[27]
11+
T-cellI
Cytotoxicity
1634784
1634785
103300
SA
IYXT
VAGSL
[28]
12+
T-cellI
Cytotoxicity
1634784
1634786
103300
GS
IYST
VAXS
L[28]
13+
T-cellI
Cytotoxicity
1634788
1634789
103298
SG
IYAT
VAXS
L[28]
14+
T-cellI
Cytotoxicity
1634788
1634790
103298
AS
IYXT
VASSL
[28]
15+
T-cellI
Cytotoxicity
1657362
1657364
111964
IDYX
FAFR
DL
[29]
16+
T-cellI
Cytokine
release(IFN-120574)
171644
2171644
3125101
ITXM
IKFN
RL[30]
17+
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
852
6260
4K
DSY
IPSA
EXI
[23]
18+
T-cellI
Cytokine
release(IFN-120574)
1381750
1381748
37174
DG
LLXR
DSFEV
[31]
19+
T-cellI
Cytokine
release(IFN-120574)
1381752
1381753
37392
HG
LLXR
DSFEV
[31]
20+
T-cellI
Cytotoxicity
1404705
1404714
61086
SI
SXIEFA
RL[32]
21+
T-cellI
Cytotoxicity
1404705
1404715
61086
AE
SSIEFX
RL[32]
22+
T-cellI
Cytotoxicity
1404705
1404716
61086
RK
SSIEFA
XL[32]
23+
T-cellI
Cytotoxicity
1404712
1404719
61088
EA
SSIEFX
RL[32]
24+
T-cellI
Cytotoxicity
1404713
1404721
61085
KR
SSIEFA
XL[32]
25+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865530
98995
TF
SAPD
XRPA
[33]
26+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865532
98995
AL
SAPD
TRPX
[33]
Advances in Bioinformatics 9
G
S
N
E
K
Y
W
1
2
3
4
5
6
8
12
13
17
18 19
20
21
22
23
2425
A
CD
V
HL
R
15
26
P
Q
M
F
11
7
14
16T
I
109
Ant
igen
resid
ue v
olum
e (Aring3 )
60
80
100
120
140
160
180
200
220
240
80 100 120 140 160 180 200 220 24060Immunogen residue volume (Aring3)
ReferenceNegative B-cellNegative T-cellI
Negative T-cellIIPositive T-cellI
Figure 5 Peptide cross-reactivity assay data of Table 2 vis-a-visamino-acid residue volume [21] of immunogen and antigen atsingle-residue substitution positions along sequence alignments
For longer epitopes (eg typical MHC class II-restrictedT-cell epitopes) the value is even higher For example inthe case of data rows 4 and 5 (having consensus sequenceNTWTTCQSIAXPSK) the said value is 1314 (sim93) Thethreshold values thus calculated are markedly higher thanthose used (eg 70 [20]) for conventional protein sequenceanalysis Moreover regardless of the exact sequence lengthsapplying such threshold values would entail discarding dataon epitopes differing by one residue (eg the examples justcited from Table 2) To avoid such loss of potentially valuableinformation all the available epitope data might be analyzedand a corresponding reduced epitope count reported toindicate the level of redundancy in the underlying dataset
33 Survey of Peptidic Epitope Sequence Data IEDB searchesyielded 11497 records on peptidic epitopes of length less than50 residues (ie the general cutoff provided in the IEDBcuration manual [39]) curated as covalently unmodifiedand comprising 4443 B-cell and 7054 T-cell epitope records(the latter divided between 2749 and 4305 records for MHCrestriction classes I and II resp) for which computedfunctional redundancy vis-a-vis sequence length is depictedin Figure 6
Considering the wide range of sequence counts for thevarious sequence lengths in Figure 6 a logarithmic scale waschosen to depict the data in a compact form Consequentlythe difference calculated as sequence count minus reduced countwas plotted instead of reduced count itself where sequencecount gt reduced count to avoid crowding of data points Forall positive sequence counts (ldquo◻rdquo) the said differences were
consistently higher where functional similarity was equatedwith fractional aligned-sequence identity (ldquotimesrdquo) rather thanthe Shannon information entropy of the differential epitopebinding (ldquordquo) In the latter case (ldquordquo) the differences wereconsistently less than 1 with reduced count close or evenequal to sequence countThese results are anticipated in viewof Figure 4 which shows that reduced count is independent ofsequence length using the entropy-based approach proposedherein but decreases with sequence length using fractionalaligned-sequence identity
The data in Figures 4 and 6 thus point to key considera-tions in evaluating functional redundancy Clearly functionalredundancy may be overestimated by simply equating func-tional similarity with fractional aligned-sequence identityespecially with increasing sequence length for highly similarsequences On the other hand functional redundancy maybe underestimated where functional similarity is equatedwith the Shannon information entropy of differential epitopebinding if the sequence under consideration extends beyondthe relevant epitope structure which becomes more likelywith increasing sequence length Hence the information-entropy approach as developed herein is conceivably appli-cable to sequences not exceeding a realistic application-dependent epitope length (eg six residues for antigenbinding by anti-peptide antibodies [40]) although rigorousgeneralization to longer sequences might be pursued byaligning subsequences of such length andpossibly accountingfor differential immunodominance among immunogen epi-topes [41] For example if a set of peptides were to be foundsuch that each of them comprised a single immunodominantepitope having fixed length (eg six residues in all cases) theinformation-entropy approach might be applied to the entireset even if the peptides were of variable length Moreovereven if the said immunodominant epitopes were highlysimilar to one another in terms of fractional aligned-sequenceidentity all the available data still could be included insubsequent analyses provided that both the total and reducedepitope counts would be reported in order to account forfunctional redundancy In this way useful epitope data couldbe retained instead of discarded
34 Applications and Future Directions The discussion thusfar suggests the potentially greater utility of antigenic func-tional similarity cast as the Shannon information entropy ofdifferential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redun-dancy of epitope data to express their structural diversity ina biologically meaningful manner (ie explicitly in terms ofrelative binding affinity which underlies the balance betweenthe inversely related emergent continuum phenomena ofimmunologic specificity and cross-reactivity) This is subjectto the caveat that assuming an idealized epitope-bindingsite of optimal steric and electrostatic complementarity (egcomparable to what is typically observed in a nativelyfolded wild-type protein) may both overestimate affinityfor an immunogen epitope and underestimate affinity forantigen epitopes structurally different from the immuno-gen epitope (thereby possibly overestimating immunologicspecificity as a consequence of underestimating potential
10 Advances in Bioinformatics
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(a)
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(b)
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
0001001
011
10100
1000
10 20 30 40 500Sequence length
(c)
Figure 6 Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c)Sequence counts (ldquo◻rdquo) are total epitope counts in IEDB Corresponding reduced counts (119903 in (1)) are expressed as the difference (sequencecount minus reduced count) with functional similarity equated with either fractional aligned-sequence identity (ldquotimesrdquo (3) and (4)) or the Shannoninformation entropy for differential epitope binding (ldquordquo (5) through (9)) Missing ldquo◻rdquo indicates sequence count = 0 (eg for B-cell epitopesequence length lt 4 in (a)) where ldquo◻rdquo is present and missing ldquotimesrdquo or ldquordquo indicates sequence count minus reduced count = 0 (eg with missingldquordquo for B-cell epitope sequence length = 5 in (a)) Both ldquotimesrdquo and ldquordquo are missing where sequence count = 1 (eg for B-cell epitope sequencelength = 43 in (a)) as reduced count = 1 for sequence count = 1
for cross-reactivity) These considerations could guide thedevelopment of immunization strategies (eg via vacci-nation) and immunodiagnostics (eg to produce peptidicconstructs suitable for eliciting anti-peptide antibodies thateither protect against disease or serve to detect particularantigens in biological samples via immunoassays) notingthat the practical significance of immunologic specificity andcross-reactivity is highly context-dependent
From the perspective of immunization immunity (eginduced via vaccination or passive transfer of immune-system components) may be useful if specifically directedagainst a pathogen epitope so as to favor pathogen elimi-nation without producing harmful hypersensitivity reactions(eg in the form of allergy or autoimmunity) Yet suchimmunity might be even more useful if it also favored elim-ination of multiple antigenically related variant pathogensvia cross-reaction (such that a single vaccine epitope elicitedthe production of immune-system components each cross-reactive with multiple variant pathogen epitopes) whilestill avoiding hypersensitivity Cross-reactivity of immuneresponses is thus potentially useful within limits defined byrisk of harmful hypersensitivity (eg due to cross-reactionwith host self-antigens) As regards immunodiagnosis animmunodiagnostic test may be useful if it enables specificpathogen detection via discrimination between a pathogen
epitope and a set of other biologically relevant epitopes (egof the host or its commensal symbionts in health or ofother pathogens for which the clinical implications are verydifferent) Yet the test might be even more useful if it alsoenabled detection of multiple antigenically related variantpathogens (such that a single immunologic probe detectedmultiple variant pathogen epitopes) while still retainingits discriminatory power with respect to other biologicallyrelevant epitopes particularly where the variant pathogensare very similar to one another in terms of their clinicalimplications (eg prognosis and therapeutic options)Hencecross-reactivity strongly influences the outcomes of bothimmunization and immunodiagnosis for which reason theirdevelopment could be supported by computational tools thatestimate epitope-binding affinity for cross-reactions
In order to estimate the epitope-binding affinity ofan immune-system component (eg antibody) elicited inresponse to (and thus having a binding site for) immunogenepitope 119894 for antigen epitope 119895 one possible approach wouldbe to express affinity in terms of association constants forexample as
119870119894119895= 119870119894119894119885119894119895 (10)
where 119870119894119895is the association constant for cross-reaction with
antigen epitope 119895 119870119894119894is the association constant for reaction
Advances in Bioinformatics 11
with immunogen epitope 119894 and 119885119894119895is a correction term
to account for the difference between the two associationconstants In turn119870
119894119894might be estimated as
119870119894119894= exp(minus
Δ119866119894119894
119877119879
) (11)
where Δ119866119894119894
is the free-energy change for reaction withimmunogen epitope 119894 while both 119877 and 119879 retain theirmeanings as in (7) In cases where epitopes 119894 and 119895 are both119898residues in length 119885
119894119895might be estimated as
119885119894119895=
119898
prod
119896=1
119904119894119895119896 if
119898
prod
119896=1
119904119894119895119896gt 119870minus1
119894119894
119870minus1
119894119894 otherwise
(12)
where the product retains its meaning as in (6) such that119870119894119895ge 1 (with 119870
119894119895= 1 for nonspecific binding) Δ119866
119894119894could be
estimated from epitope structure (eg using structural ener-getics [40ndash42]) subject to mechanism-specific constraints(eg on B-cell affinity maturation [43]) and 119904
119894119895119896could be
estimated along similar lines with regard to ΔΔ119866119894119895119896
in (7)The preceding discussion has alluded to epitope binding
as if it were a binary interaction which is strictly true onlyfor B-cell epitopes (eg bound by surface immunoglobulin orantibody) but in the context of T-cell epitopes the epitope-binding site is to be understood as a bipartite entity consistingof an MHC molecule and a T-cell receptor (TCR) suchthat the epitope becomes at least partly confined betweenthe two binding-site components The overall process of T-cell recognition is subject to thermodynamic and kineticconstraints on initial MHC-peptide binding [44] and subse-quent TCR recognition ofMHC-peptide complex [45] whichpotentially complicate prediction of T-cell cross-reactivityand functional consequences thereof (eg considering howbinding affinity and receptor numbers interact to influenceT-cell effector function [45]) Still predicting B-cell epitopecross-reactivity is challenging in view of the greater structuraldiversity of B-cell epitopes especially where only a fractionof the epitope surface contributes to the epitope-paratopeinterface such that cross-reaction may occur with variationsin epitope structure beyond the interface Hence althoughthe analysis herein for continuous epitopes might be gener-alized to discontinuous epitopes with indexing of residues byrelative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignmentgaps correlating variations in epitope-residue structure withpotential for cross-reactivity would be nontrivial
The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by
TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR
At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history
4 Summary and Conclusions
In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data
More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
Advances in Bioinformatics 3
so for longer B-cell epitopes and class II MHC-restricted T-cell epitopes)
For a set of 119899 structurally unique epitopes the reducedepitope count may be defined as
119903 =
119899
sum
119894=1
119908119894 (1)
where 119908119894is the contribution of the 119894th epitope to 119903 In turn
119908119894may be defined as
119908119894=
1 + sum119899
119895=1(1 minus 119878
119894119895)
119899
(2)
where 119878119894119895is the functional similarity between the 119894th and 119895th
epitopes such that 0 le 119878119894119895le 1 with 119878
119894119895= 0 and 119878
119894119895= 1
respectively corresponding to extreme cases wherein the 119894thand 119895th epitopes are either minimally or maximally similarfroma functional standpoint If all the epitopes are of uniformsequence length119898 amino-acid residues 119878
119894119895might be defined
for a pairwise epitope sequence alignment as
119878119894119895=
sum119898
119896=1119904119894119895119896
119898
(3)
where 119904119894119895119896
is the functional similarity between amino-acidresidues 119886
119894119896and 119886
119895119896at the 119896th sequence positions of the
aligned 119894th and 119895th epitope sequences respectively such that0 le 119904
119894119895119896le 1 with 119904
119894119895119896= 0 and 119904
119894119895119896= 1 respectively
corresponding to extreme cases wherein 119886119894119896
and 119886119895119896
areeither minimally or maximally similar from a functionalstandpoint Accordingly 119904
119894119895119896might be simply defined as
119904119894119895119896=
0 if 119886119894119896
= 119886119895119896
1 if 119886119894119896= 119886119895119896
(4)
for (3) to yield the fraction of identical aligned residueswhich is a conventional measure of protein sequence simi-larity [11ndash13] However this is an oversimplification that failsto capture the possibility of minimal functional similaritybetween epitopes (eg operationally defined as undetectableantigenic cross-reactivity in an immunoassay) arising froma structural difference of only a single chemical group [14]unless the possibility of antigenic cross-reaction is deniedaltogether
A more realistic physicochemically grounded alternativeis to conceptualize functional similarity on the basis ofbiomolecular binding interactions thereby relating func-tional similarity to binding affinity as measured via animmunoassay that employs some epitope-binding probe (egantibody) Thus functional similarity between epitopes (ie119878119894119895of (2) and (3)) may be defined alternatively as the Shannon
information entropy [34ndash36] for the equilibrium distributionof epitope-bound probe between the 119894th and 119895th epitopeswith both epitopes at the same concentration in the presenceof much less probe such that
119878119894119895= minus (119891
119894log2119891119894+ 119891119895log2119891119895) (5)
where 119891119894and 119891
119895are the fractions of epitope-bound probe
bound to the 119894th and 119895th epitopes respectively (noting that119891119894+ 119891119895= 1 and 119878
119894119895= 1 where the 119894th and 119895th epitopes are
indistinguishable by means of the immunoassay) Constrain-ing 119891119895as 119891119895le 119891119894 it may in turn be defined as
119891119895=
prod119898
119896=1119904119894119895119896
1 + prod119898
119896=1119904119894119895119896
(6)
where 119904119894119895119896
retains its meaning as introduced in (3) Howeverretaining the definition of 119904
119894119895119896given by (4) would be tanta-
mount to denying the possibility of antigenic cross-reactionhence an alternative is warranted
A plausible starting point is the essential unity of proteinfolding and binding phenomena based on steric and elec-trostatic complementarity of molecular surfaces as observedwithin typical folded proteins [20] If this is regarded as rep-resentative of ligand-receptor interfaces wherein the ligand isa peptidic epitope bound to one or more protein receptors(eg antibody or MHC molecule and T-cell receptor) anidealized epitope-binding site may be devised to estimateepitope similarity as potential antigenic cross-reactivity interms of structural differences assuming optimal comple-mentarity with all epitope atoms completely surrounded byand close-packed against the binding-site contact atomsHence suboptimal complementarity would arguably resultif even a single epitope amino-acid residue was structurallyaltered Notably steric clashes conceivably would precludeantigenic cross-reaction if the altered residue was stericallyincompatible with its original counterpart in the sense offailing to assume any conformation allowing it to fit entirelywithin the region of space (as defined by residue van derWaals surfaces) that otherwise would have been occupiedby the original residue at the binding site Thus functionalsimilarity between epitope amino-acid residues (ie 119904
119894119895119896of
(3) (4) and (6)) may be defined alternatively as
119904119894119895119896
=
0 if 119886119894119896 119886119895119896
sterically incompatible
exp(minusΔΔ119866119894119895119896
119877119879
) otherwise
(7)
where 119894 and 119895 denote the original and altered epitopesrespectively 119886
119894119896and 119886
119895119896retain their meanings as in (4)
ΔΔ119866119894119895119896
is the contribution to the change in the overall free-energy change of epitope binding due to replacement of 119886
119894119896
by 119886119895119896119877 is the gas constant and119879 is the absolute temperature
ΔΔ119866119894119895119896
is analogous to the change in the free-energy changeof folding due to replacement of the 119896th residue of a proteinby a structurally different residue (eg to generate a mutantversion of a wild-type protein)
To evaluate (7) steric incompatibility is posited for everysubstitution wherein the net change in volume is positive(ie wherever bulk increases) and also for other substitutionswherein stereochemical differences (eg relating to bondangles and branching) are deemed sufficient to result in stericclashes that preclude antigenic cross-reaction Steric incom-patibility is conceptualized herein on the basis of grouping
4 Advances in Bioinformatics
HN
HO HOArg (R)
NH
O O
O O
O
O O O O O O
OH
O O O
O O
OO
O O O O
Lys (K)
Subset 2 trigonal planar
HO
HN HN
Gln (Q)
HOHO
HOGlu (E)
Subset 4 trigonal planar
membered ring
N
S
HOMet (M)
HS HO
HOCys (C)
HOSer (S)
HOAla (A)
HOGly (G)
O
HOTyr (Y)
Subset 5 branching at tetrahedral 120574-
HOPhe (F)
HOAsn (N)
HOAsp (D)
Subset 6 branching at tetrahedral 120573-carbon atom
Subset 7 withring comprising
main-chain
HN
HOHOHis (H)
HOLeu (L)
HOIle (I)
HOVal (V)
HOThr (T) Pro (P)
H2N
H2N
H2N
H2N H2N H2N H2NH2NH2N
H2N H2N H2N H2N H2NH2N
H2N H2N H2N H2N H2NH2N
NH2
NH2
Subset 1 no trigonal planar atoms or branching at 120573 through 120576 positions
OH
120575-carbon atomSubset 3 trigonal planar 120574-carbon atom but no five-
120574-carbon atom in five-
Trp (W)
carbon atom
HOO
membered ring
nitrogen atom
Figure 1 Subsets of the 20 proteinogenic amino acids grouped by stereochemical similarity Structures were obtained from the ChEBI(Chemical Entities of Biological Interest) database (httpwwwebiacukchebi) Within each subset comprising two or more amino acidsthese are arranged from left to right in order of decreasing size such that sterically incompatible changes are avoided by substitution of asmaller amino acid for a larger one The subsets are linked to form the residue substitution paths in Figure 2 wherein Subset 1 comprises anentire path while the other subsets comprise the upstream segments of paths converging on either Cys (C) or Ala (A)
the 20 proteinogenic amino acids into subsets accordingto stereochemical similarity as depicted in Figure 1 Subsetassignments are based on the covalent bonding geometryof nonhydrogen side-chain atoms In particular these atomsare characterized as either tetrahedral or trigonal planarand branching of tetrahedral atoms is also considered Forexamplemethyl andmethylene carbon atoms are tetrahedralas are sulfur and amine nitrogen atoms Hence the sulfhydrylgroup of Cys is regarded as isosteric with a methyl groupwhereas the sulfur atom of Met and the side-chain iminogroup of Arg are both regarded as isosteric with a methylenegroup In contrast amide and carboxyl carbon atoms aretrigonal planar rather than tetrahedral
The subsets defined in Figure 1 are linked to form residuesubstitution paths avoiding sterically incompatible changesas depicted in Figure 2 Subset 1 comprises an entire pathwhile the other subsets comprise the upstream segments ofpaths converging on either Cys or Ala The path of Subset 2converges on Cys as members of Subset 1 larger than Cys aresterically incompatible with Subset 2 whose 120575-carbon atom istrigonal planar The path of Subset 3 converges on Ala rather
K R (1)
(2) (5) (6)
Q E M L I
(3) (4)
Y W C T V
F H S P (7)
N D A G
Figure 2 Amino-acid residue substitution paths avoiding stericallyincompatible changes (cf (7)) for the 20 canonical proteinogenicresidues Numbers correspond to the subsets of stereochemicallysimilar amino acids depicted in Figure 1 For every residue substi-tution along the paths shown the number of unsatisfied hydrogenbonds is presented in Table 1
Advances in Bioinformatics 5
Table 1 Numbers of unsatisfied hydrogen bonds for amino-acid residue substitutions of Figure 2 calculated according to (9)
G A S C D T P N V E Q H L I M K R F Y WG 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashA 0 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashS 1 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashC 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashD 2 2 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashT 1 1 0 1 mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashP 1 1 mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashN 2 2 mdash mdash 2 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashV 0 0 1 0 mdash 1 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashE 2 2 3 2 mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashQ 2 2 3 2 mdash mdash mdash mdash mdash 2 0 mdash mdash mdash mdash mdash mdash mdash mdash mdashH 2 2 mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdashL 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdashI 0 0 1 0 mdash 1 mdash mdash 0 mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdashM 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdashK 1 1 2 1 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 0 mdash mdash mdash mdashR 3 3 4 3 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 3 4 0 mdash mdash mdashF 0 0 mdash mdash 2 mdash mdash 2 mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdashY 1 1 mdash mdash 3 mdash mdash 3 mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 0 mdashW 1 1 mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 mdash mdash mdash mdash mdash mdash mdash 0
than Cys because the 120574-carbon atom is trigonal planar inSubset 3 The case of Subset 4 is thus similar to that of Subset3 but the 120574-carbon atom in Subset 4 is constrained as part ofa five-membered ring such that Subsets 3 and 4 are deemedsterically incompatible The path of Subset 5 (Leu) convergeson Cys as members of Subset 1 larger than Cys are stericallyincompatible with the branching at the 120574-carbon atomof LeuThe path of Subset 6 converges on Ala instead of Cys becauseof the branching at the 120573-carbon atom of Subset 6 Finallythe path of Subset 7 (Pro) converges on Ala as the Pro side-chain is constrained by ring formation with the main-chainnitrogen atom and thus distorted beyond the 120573-carbon atom
According to Figure 2 a substitution from any residueupstream to any residue downstream along a particular pathavoids steric incompatibility Conversely downstream-to-upstream and off-path substitutions are deemed stericallyincompatible This is a qualitative first approximation usinga highly idealized model rather than an attempt at definitivecategorization
As regards ΔΔ119866119894119895119896
(in (7)) this is related to amino-acidresidue structural differences (as a qualitative first approxi-mation rather than an attempt to obtain quantitatively veryaccurate results) by heuristic partitioning as
ΔΔ119866119894119895119896= 119888Δ119881
119894119895119896+ 119887119873119894119895119896 (8)
where 119888 and 119887 are both proportionality constants while Δ119881119894119895119896
is the change in residue volume and 119873119894119895119896
is the number ofunsatisfied hydrogen bonds (Table 1 wherein the residues areordered by their volumes [21]) considering that an energeticpenalty is incurred with unsatisfied hydrogen bonds uponbinding a suboptimally complementary epitope
Table 1 is sparse with a null value (ldquomdashrdquo) for mostcells because (8) is evaluated only for sterically compatiblesubstitutions (cf (7)) Hence the numbers in Table 1 arefor sterically compatible substitutions according to Figure 2The said numbers were each obtained assuming that everyavailable highly electronegative atom participates in theformation of a hydrogen bond as
119873119894119895119896= 119860119894119895119896minus 119861119894119895119896 (9)
where 119860119894119895119896
is the total number of oxygen and nitrogenatoms in the aligned residues while 119861
119894119895119896is the number of
such atoms for which direct correspondence was affirmedas regards structural superposability atom identity andcovalent bonding partners (according to Figure 1) Thus119873
119894119895119896
increases with each failure to affirm a direct correspondencebetween hydrogen-bonding atoms Physically such failurecorresponds to an unsatisfied hydrogen bond on the epitopeits binding site or both
Direct correspondence was affirmed for all main-chain(ie backbone) carbonyl oxygen atoms In view of thedifference between Pro and other residues at the main-chainnitrogen atom direct correspondence was affirmed for thisatom between Pro and itself as well as between non-Proresidues Among side-chain atoms direct correspondencewas affirmed only between hydroxyl oxygen atoms of Serand Thr carbonyl oxygen atoms of Asp and Asn and ofGlu and Gln and 120575-nitrogen atoms of His and Trp Tofacilitate explication a residue is termed polar if its side-chaincontains at least one oxygen or nitrogen atom but nonpolarif otherwise Accordingly 119873
119894119895119896= 0 for pairs of identical
residues (along the diagonal of Table 1) and of non-Prononpolar residues For Pro paired with Gly or Ala 119873
119894119895119896= 1
6 Advances in Bioinformatics
due to themain-chain nitrogen difference For a polar residuepaired with a non-Pro nonpolar residue 119873
119894119895119896is the total
number of side-chain oxygen and nitrogen atoms For a pairof polar residues the total number of side-chain oxygen andnitrogen atoms is an upper bound on 119873
119894119895119896 Hence 119873
119894119895119896= 3
for Tyr paired with Asn although 119873119894119895119896
= 2 for Asn pairedwith Asp
Provisional values for the proportionality constants 119888and 119887 in (8) were obtained from literature In particular119888 was regarded as the energetic penalty per unit volumeof a cavity formed due to substitution of a less bulkyresidue within the interior of a folded protein with valueof minus0024 kcalmolminus1 Aminus1 estimated empirically from proteinmutation studies [37] and 119887 was likewise regarded as theenergetic cost of breaking a hydrogen bond with a conser-vative estimated value of 05 kcalmolminus1 per bond [38]
Thus using (1) through (8) in conjunction with Table 1the literature values for constants in (8) and publishedamino-acid residue volumes [21] reduced epitope counts(1) were computed for artificial epitope sequences and alsoactual experimentally studied epitope sequences assuming atemperature of 37∘C (31015 K)
22 Retrieval and Processing of Epitope Data Epitope datawere obtained via the Immune Epitope Database (IEDBat httpwwwiedborg) [39] as database records retrievedthrough searches conducted from 4 to 6 October 2015 usingits B-Cell Search and T-Cell Search facilities in the lattercase restricting searches to either MHC class I or II alleleswith each record thus retrieved pertaining to an individualB-cell or T-cell assay and containing data fields defined inrelation to the key concepts of ldquoObject Typerdquo (ie moleculetype in the context of chemical structure) ldquoEpitope Relationrdquo(ie molecule relationship to epitope) ldquo1st Immunogenrdquo(ie immunogen initially administered to elicit the immuneresponse) and ldquoAntigenrdquo (ie antigen used in the B-cell or T-cell assay) Searches were restricted such that the data fieldsnamed ldquo1st Immunogen Object Typerdquo and ldquo1st ImmunogenEpitope Relationrdquo had values of ldquoLinear Peptiderdquo and ldquoEpi-toperdquo respectively with the ldquoMHC classrdquo option set to eitherldquoIrdquo or ldquoIIrdquo for T-Cell Search Employing this basic searchstrategy both narrow and broad searches were conducted theformer to retrieve data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitutionand the latter to survey the entire repertoire of availabledata on linear peptidic epitopes (as illustrated by example inFigure 3)
Narrow searches were first attempted to retrieve recordsfor which structural data on epitope binding were availablesetting the ldquoYesrdquo option for the Viewer Flag (under ldquo3DStructure of Complexrdquo) Subsequently the searches were con-ducted to separately retrieve records curated as containingeither negative or positive data on cross-reactivity and wererestricted such that the data fields named ldquoAntigen ObjectTyperdquo and ldquoAntigen Epitope Relationrdquo had values of ldquoLinearPeptiderdquo and ldquoStructurally Relatedrdquo respectively with thedata field named ldquoQualitativeMeasurementrdquo having a value ofeither ldquonegativerdquo or ldquopositiverdquo In the latter case searches werefurther limited to class I MHC-restricted T-cell assays for
Figure 3 IEDB T-Cell Search facility interface(httpwwwiedborgadvancedQueryTcellphp) Example showncorresponds to narrow search for class I MHC-restricted T-cellassays with positive data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitution Searchesfor B-cell assays were performed using IEDB B-Cell Searchfacility interface (httpwwwiedborgadvancedQueryBcellphp)of analogous form (eg without assay-related fields for ldquoMHCallelerdquo) narrow searches for negative data were performed withoutspecified value for field labeled as ldquoEpitope Structure Definesrdquo andbroad searches were performed without specified value for bothsaid field and that named ldquoQualitative Measurementrdquo (see main textfor additional details)
which the data field named ldquoEpitope Structure Definesrdquo hada value of either ldquoExact Epitoperdquo (rather than ldquoepitope con-taining regionantigenic siterdquo) as cross-reaction may occurwith structural differences outside any relevant epitopesunless this is for a peptide confined within a class I MHCbinding cleft For each record thus retrieved the immunogen
Advances in Bioinformatics 7
5 10 15 200Sequence length
0
5
10
15
20
Redu
ced
epito
pe co
unt
Figure 4 Reduced epitope counts (1) for peptidic sequences ofuniform length varying only at a single-residue position Each countcorresponds to a set of 20 peptides representing every standardproteinogenic amino-acid residue at the variable-residue positionFunctional similarity is equated with either fractional aligned-sequence identity (ldquo◻rdquo (3) and (4)) or the Shannon informationentropy for differential epitope binding ((5) through (9)) In thelatter case countswere based on steric incompatibility only (ldquo998810rdquo (7)and Figure 2) both steric incompatibility and cavity formation (ldquordquo119888Δ119881119894119895119896
in (8)) or steric incompatibility with both cavity formationand hydrogen bonding (ldquordquo (8) and (9) and Table 1)
and antigen sequences (ie values of the data fields namedldquo1st Immunogen Object Primary Molecule Sequencerdquo andldquoAntigen Object Primary Molecule Sequencerdquo resp) werecompared The record was considered for further processingonly if the said sequences were of equal length and differedat exactly one amino-acid residue position and the recordwas ultimately retained only if a corresponding record with aldquoQualitative Measurementrdquo data-field value of ldquopositiverdquo wasfound sharing the same immunogen and immunoassay pro-tocol except that the antigen was identical to the immunogen(thus confirming that the immunogen had elicited a relevantanti-peptide immune response in the first place)The epitopedata thus obtained were analyzed in light of the precedingSection 21
The broad searches were conducted to retrieve recordson linear peptidic epitopes in general Among the epitoperecords thus retrieved those wherein the data field namedldquoModificationrdquowas nonempty (indicating amino-acid residuecovalent modification) were excluded from further consider-ation and the remainder were analyzed as regards functionalredundancy according to the preceding Section 21
3 Results and Discussion
31 Functional Redundancy versus Sequence As depicted inFigure 4 computed functional redundancy increases with
epitope sequence length if functional similarity is equatedwith fractional aligned-sequence identity (ldquo◻rdquo) whereas itis independent of sequence length if functional similar-ity is equated with the Shannon information entropy fordifferential epitope binding (ldquo998810rdquo ldquordquo and ldquordquo) The lat-ter approach thus better accounts for the possibility of astructural difference in only a single chemical group beingsufficient to preclude antigenic cross-reaction provided thatthe sequence under consideration is entirely encompassedby the relevant epitope structure as structural differencesmay be functionally irrelevant if they lie outside the epitopeMoreover computed functional redundancy decreases (iethe reduced epitope count approaches the total epitopecount) when both steric incompatibility and the energeticpenalty of cavity formation are considered (ldquordquo) instead ofsteric incompatibility alone (ldquo998810rdquo)The redundancy decreaseseven further if the energetic penalty of unsatisfied hydrogenbonds (more generally representing suboptimal electrostaticcomplementarity) is also considered (ldquordquo)
32 Empirical Cross-Reactivity Data IEDB searches yielded26 records on epitope cross-reactivity vis-a-vis single-residuesubstitution (Table 2) all of which contain only qualitativerather than quantitative binding data without reference toatomic coordinates or other structural details of boundepitopes The data suggest that the posited steric incompat-ibility (in (7) and Figure 2) assumes too little tolerance forsuboptimal complementarity especially as regards volumedifferences between immunogen and antigen in Figure 5which depicts examples of cross-reaction despite substitu-tions increasing residue bulk (ie positive-data points to theleft of the diagonal) This might be at least partially correctedby relaxing or even eliminating the steric-incompatibilitycriteria possibly replacing them with a continuous-formshape-dependent energetic penalty (considering that theyassume an infinite-step hard-sphere model of steric clashesbetween atoms whereas potential energy may be more accu-rately represented by continuous functions of interatomicseparation distances) likewise electrostatic complementarityalso might be more accurately described for example interms of differential energetic contributions due to chargedand uncharged hydrogen-bonding partners However typicalepitope binding actuallymay be suboptimal considering ther-modynamic and kinetic constraints on affinity maturation inB-cell ontogeny and the typical absence of affinitymaturationin T-cell ontogeny which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity thanimmunogen
Notwithstanding the problem of suboptimal complemen-tarity described above data presented in Table 2 suggestthat the approach to epitope sequence redundancy proposedherein is more conceptually and practically meaningful thansetting threshold (ie cutoff) values for fractional aligned-sequence identity This is exemplified by the control-reactionepitopes of data rows 18 and 19 (having consensus sequenceLLXRDSFEV) These epitopes differ from each other at justone residue position As typical MHC class I-restricted T-cell epitopes they are nonapeptides for which the highestmeaningful sequence-identity threshold value is 89 (sim89)
8 Advances in Bioinformatics
Table2IEDBQualitative-Outcome(Q)d
atao
npeptidec
ross-reactivity
assays
(cfFigure
5)
Q
Assay
inform
ation
IEDB
Epito
peID
Substitution
Sequ
ence
context(with119883atpo
sitionof
substitution)
Ref
TypeM
HCcla
ssParameter
measured(cfIEDB
ldquomeasuremento
frdquodatafield)
IEDBID
Con
trolreaction
Cross-reactio
n1minus
B-cell
Antibod
y-antig
enbind
ing
1370146
1370150
10801
WY
DXE
DRY
YRE
[22]
2minus
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
854
6260
4PA
SYIX
SAEK
I[23]
3minus
T-cellII
Cell
proliferatio
n1646
734
1646
735
107384
ED
GEP
GIAGFK
GXQ
GPK
[24]
4minus
T-cellII
Cell
proliferatio
n1660246
1660247
46317
FA
NTW
TTCQ
SIAXP
SK[25]
5minus
T-cellII
Cytokine
release(IL-4)
1660252
1660258
112245
AF
NTW
TTCQ
SIAXP
SK[25]
6minus
T-cellII
Cytokine
release(IL-2)
166706
01667065
68846
KA
VHFF
XNIV
TPRT
P[26]
7minus
T-cellII
Cell
proliferatio
n1667112
1667113
80071
AK
VHFF
XNIV
TPRT
P[26]
8minus
T-cellII
Cell
proliferatio
n171660
41716614
125569
TV
SWEG
VGVXP
DV
[27]
9minus
T-cellII
Cell
proliferatio
n171660
41716616
125569
DN
SWEG
VGVTP
XV[27]
10minus
T-cellII
Cell
proliferatio
n1716605
1716618
125571
VT
SWEG
VGVXP
NV
[27]
11+
T-cellI
Cytotoxicity
1634784
1634785
103300
SA
IYXT
VAGSL
[28]
12+
T-cellI
Cytotoxicity
1634784
1634786
103300
GS
IYST
VAXS
L[28]
13+
T-cellI
Cytotoxicity
1634788
1634789
103298
SG
IYAT
VAXS
L[28]
14+
T-cellI
Cytotoxicity
1634788
1634790
103298
AS
IYXT
VASSL
[28]
15+
T-cellI
Cytotoxicity
1657362
1657364
111964
IDYX
FAFR
DL
[29]
16+
T-cellI
Cytokine
release(IFN-120574)
171644
2171644
3125101
ITXM
IKFN
RL[30]
17+
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
852
6260
4K
DSY
IPSA
EXI
[23]
18+
T-cellI
Cytokine
release(IFN-120574)
1381750
1381748
37174
DG
LLXR
DSFEV
[31]
19+
T-cellI
Cytokine
release(IFN-120574)
1381752
1381753
37392
HG
LLXR
DSFEV
[31]
20+
T-cellI
Cytotoxicity
1404705
1404714
61086
SI
SXIEFA
RL[32]
21+
T-cellI
Cytotoxicity
1404705
1404715
61086
AE
SSIEFX
RL[32]
22+
T-cellI
Cytotoxicity
1404705
1404716
61086
RK
SSIEFA
XL[32]
23+
T-cellI
Cytotoxicity
1404712
1404719
61088
EA
SSIEFX
RL[32]
24+
T-cellI
Cytotoxicity
1404713
1404721
61085
KR
SSIEFA
XL[32]
25+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865530
98995
TF
SAPD
XRPA
[33]
26+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865532
98995
AL
SAPD
TRPX
[33]
Advances in Bioinformatics 9
G
S
N
E
K
Y
W
1
2
3
4
5
6
8
12
13
17
18 19
20
21
22
23
2425
A
CD
V
HL
R
15
26
P
Q
M
F
11
7
14
16T
I
109
Ant
igen
resid
ue v
olum
e (Aring3 )
60
80
100
120
140
160
180
200
220
240
80 100 120 140 160 180 200 220 24060Immunogen residue volume (Aring3)
ReferenceNegative B-cellNegative T-cellI
Negative T-cellIIPositive T-cellI
Figure 5 Peptide cross-reactivity assay data of Table 2 vis-a-visamino-acid residue volume [21] of immunogen and antigen atsingle-residue substitution positions along sequence alignments
For longer epitopes (eg typical MHC class II-restrictedT-cell epitopes) the value is even higher For example inthe case of data rows 4 and 5 (having consensus sequenceNTWTTCQSIAXPSK) the said value is 1314 (sim93) Thethreshold values thus calculated are markedly higher thanthose used (eg 70 [20]) for conventional protein sequenceanalysis Moreover regardless of the exact sequence lengthsapplying such threshold values would entail discarding dataon epitopes differing by one residue (eg the examples justcited from Table 2) To avoid such loss of potentially valuableinformation all the available epitope data might be analyzedand a corresponding reduced epitope count reported toindicate the level of redundancy in the underlying dataset
33 Survey of Peptidic Epitope Sequence Data IEDB searchesyielded 11497 records on peptidic epitopes of length less than50 residues (ie the general cutoff provided in the IEDBcuration manual [39]) curated as covalently unmodifiedand comprising 4443 B-cell and 7054 T-cell epitope records(the latter divided between 2749 and 4305 records for MHCrestriction classes I and II resp) for which computedfunctional redundancy vis-a-vis sequence length is depictedin Figure 6
Considering the wide range of sequence counts for thevarious sequence lengths in Figure 6 a logarithmic scale waschosen to depict the data in a compact form Consequentlythe difference calculated as sequence count minus reduced countwas plotted instead of reduced count itself where sequencecount gt reduced count to avoid crowding of data points Forall positive sequence counts (ldquo◻rdquo) the said differences were
consistently higher where functional similarity was equatedwith fractional aligned-sequence identity (ldquotimesrdquo) rather thanthe Shannon information entropy of the differential epitopebinding (ldquordquo) In the latter case (ldquordquo) the differences wereconsistently less than 1 with reduced count close or evenequal to sequence countThese results are anticipated in viewof Figure 4 which shows that reduced count is independent ofsequence length using the entropy-based approach proposedherein but decreases with sequence length using fractionalaligned-sequence identity
The data in Figures 4 and 6 thus point to key considera-tions in evaluating functional redundancy Clearly functionalredundancy may be overestimated by simply equating func-tional similarity with fractional aligned-sequence identityespecially with increasing sequence length for highly similarsequences On the other hand functional redundancy maybe underestimated where functional similarity is equatedwith the Shannon information entropy of differential epitopebinding if the sequence under consideration extends beyondthe relevant epitope structure which becomes more likelywith increasing sequence length Hence the information-entropy approach as developed herein is conceivably appli-cable to sequences not exceeding a realistic application-dependent epitope length (eg six residues for antigenbinding by anti-peptide antibodies [40]) although rigorousgeneralization to longer sequences might be pursued byaligning subsequences of such length andpossibly accountingfor differential immunodominance among immunogen epi-topes [41] For example if a set of peptides were to be foundsuch that each of them comprised a single immunodominantepitope having fixed length (eg six residues in all cases) theinformation-entropy approach might be applied to the entireset even if the peptides were of variable length Moreovereven if the said immunodominant epitopes were highlysimilar to one another in terms of fractional aligned-sequenceidentity all the available data still could be included insubsequent analyses provided that both the total and reducedepitope counts would be reported in order to account forfunctional redundancy In this way useful epitope data couldbe retained instead of discarded
34 Applications and Future Directions The discussion thusfar suggests the potentially greater utility of antigenic func-tional similarity cast as the Shannon information entropy ofdifferential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redun-dancy of epitope data to express their structural diversity ina biologically meaningful manner (ie explicitly in terms ofrelative binding affinity which underlies the balance betweenthe inversely related emergent continuum phenomena ofimmunologic specificity and cross-reactivity) This is subjectto the caveat that assuming an idealized epitope-bindingsite of optimal steric and electrostatic complementarity (egcomparable to what is typically observed in a nativelyfolded wild-type protein) may both overestimate affinityfor an immunogen epitope and underestimate affinity forantigen epitopes structurally different from the immuno-gen epitope (thereby possibly overestimating immunologicspecificity as a consequence of underestimating potential
10 Advances in Bioinformatics
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(a)
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(b)
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
0001001
011
10100
1000
10 20 30 40 500Sequence length
(c)
Figure 6 Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c)Sequence counts (ldquo◻rdquo) are total epitope counts in IEDB Corresponding reduced counts (119903 in (1)) are expressed as the difference (sequencecount minus reduced count) with functional similarity equated with either fractional aligned-sequence identity (ldquotimesrdquo (3) and (4)) or the Shannoninformation entropy for differential epitope binding (ldquordquo (5) through (9)) Missing ldquo◻rdquo indicates sequence count = 0 (eg for B-cell epitopesequence length lt 4 in (a)) where ldquo◻rdquo is present and missing ldquotimesrdquo or ldquordquo indicates sequence count minus reduced count = 0 (eg with missingldquordquo for B-cell epitope sequence length = 5 in (a)) Both ldquotimesrdquo and ldquordquo are missing where sequence count = 1 (eg for B-cell epitope sequencelength = 43 in (a)) as reduced count = 1 for sequence count = 1
for cross-reactivity) These considerations could guide thedevelopment of immunization strategies (eg via vacci-nation) and immunodiagnostics (eg to produce peptidicconstructs suitable for eliciting anti-peptide antibodies thateither protect against disease or serve to detect particularantigens in biological samples via immunoassays) notingthat the practical significance of immunologic specificity andcross-reactivity is highly context-dependent
From the perspective of immunization immunity (eginduced via vaccination or passive transfer of immune-system components) may be useful if specifically directedagainst a pathogen epitope so as to favor pathogen elimi-nation without producing harmful hypersensitivity reactions(eg in the form of allergy or autoimmunity) Yet suchimmunity might be even more useful if it also favored elim-ination of multiple antigenically related variant pathogensvia cross-reaction (such that a single vaccine epitope elicitedthe production of immune-system components each cross-reactive with multiple variant pathogen epitopes) whilestill avoiding hypersensitivity Cross-reactivity of immuneresponses is thus potentially useful within limits defined byrisk of harmful hypersensitivity (eg due to cross-reactionwith host self-antigens) As regards immunodiagnosis animmunodiagnostic test may be useful if it enables specificpathogen detection via discrimination between a pathogen
epitope and a set of other biologically relevant epitopes (egof the host or its commensal symbionts in health or ofother pathogens for which the clinical implications are verydifferent) Yet the test might be even more useful if it alsoenabled detection of multiple antigenically related variantpathogens (such that a single immunologic probe detectedmultiple variant pathogen epitopes) while still retainingits discriminatory power with respect to other biologicallyrelevant epitopes particularly where the variant pathogensare very similar to one another in terms of their clinicalimplications (eg prognosis and therapeutic options)Hencecross-reactivity strongly influences the outcomes of bothimmunization and immunodiagnosis for which reason theirdevelopment could be supported by computational tools thatestimate epitope-binding affinity for cross-reactions
In order to estimate the epitope-binding affinity ofan immune-system component (eg antibody) elicited inresponse to (and thus having a binding site for) immunogenepitope 119894 for antigen epitope 119895 one possible approach wouldbe to express affinity in terms of association constants forexample as
119870119894119895= 119870119894119894119885119894119895 (10)
where 119870119894119895is the association constant for cross-reaction with
antigen epitope 119895 119870119894119894is the association constant for reaction
Advances in Bioinformatics 11
with immunogen epitope 119894 and 119885119894119895is a correction term
to account for the difference between the two associationconstants In turn119870
119894119894might be estimated as
119870119894119894= exp(minus
Δ119866119894119894
119877119879
) (11)
where Δ119866119894119894
is the free-energy change for reaction withimmunogen epitope 119894 while both 119877 and 119879 retain theirmeanings as in (7) In cases where epitopes 119894 and 119895 are both119898residues in length 119885
119894119895might be estimated as
119885119894119895=
119898
prod
119896=1
119904119894119895119896 if
119898
prod
119896=1
119904119894119895119896gt 119870minus1
119894119894
119870minus1
119894119894 otherwise
(12)
where the product retains its meaning as in (6) such that119870119894119895ge 1 (with 119870
119894119895= 1 for nonspecific binding) Δ119866
119894119894could be
estimated from epitope structure (eg using structural ener-getics [40ndash42]) subject to mechanism-specific constraints(eg on B-cell affinity maturation [43]) and 119904
119894119895119896could be
estimated along similar lines with regard to ΔΔ119866119894119895119896
in (7)The preceding discussion has alluded to epitope binding
as if it were a binary interaction which is strictly true onlyfor B-cell epitopes (eg bound by surface immunoglobulin orantibody) but in the context of T-cell epitopes the epitope-binding site is to be understood as a bipartite entity consistingof an MHC molecule and a T-cell receptor (TCR) suchthat the epitope becomes at least partly confined betweenthe two binding-site components The overall process of T-cell recognition is subject to thermodynamic and kineticconstraints on initial MHC-peptide binding [44] and subse-quent TCR recognition ofMHC-peptide complex [45] whichpotentially complicate prediction of T-cell cross-reactivityand functional consequences thereof (eg considering howbinding affinity and receptor numbers interact to influenceT-cell effector function [45]) Still predicting B-cell epitopecross-reactivity is challenging in view of the greater structuraldiversity of B-cell epitopes especially where only a fractionof the epitope surface contributes to the epitope-paratopeinterface such that cross-reaction may occur with variationsin epitope structure beyond the interface Hence althoughthe analysis herein for continuous epitopes might be gener-alized to discontinuous epitopes with indexing of residues byrelative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignmentgaps correlating variations in epitope-residue structure withpotential for cross-reactivity would be nontrivial
The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by
TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR
At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history
4 Summary and Conclusions
In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data
More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
4 Advances in Bioinformatics
HN
HO HOArg (R)
NH
O O
O O
O
O O O O O O
OH
O O O
O O
OO
O O O O
Lys (K)
Subset 2 trigonal planar
HO
HN HN
Gln (Q)
HOHO
HOGlu (E)
Subset 4 trigonal planar
membered ring
N
S
HOMet (M)
HS HO
HOCys (C)
HOSer (S)
HOAla (A)
HOGly (G)
O
HOTyr (Y)
Subset 5 branching at tetrahedral 120574-
HOPhe (F)
HOAsn (N)
HOAsp (D)
Subset 6 branching at tetrahedral 120573-carbon atom
Subset 7 withring comprising
main-chain
HN
HOHOHis (H)
HOLeu (L)
HOIle (I)
HOVal (V)
HOThr (T) Pro (P)
H2N
H2N
H2N
H2N H2N H2N H2NH2NH2N
H2N H2N H2N H2N H2NH2N
H2N H2N H2N H2N H2NH2N
NH2
NH2
Subset 1 no trigonal planar atoms or branching at 120573 through 120576 positions
OH
120575-carbon atomSubset 3 trigonal planar 120574-carbon atom but no five-
120574-carbon atom in five-
Trp (W)
carbon atom
HOO
membered ring
nitrogen atom
Figure 1 Subsets of the 20 proteinogenic amino acids grouped by stereochemical similarity Structures were obtained from the ChEBI(Chemical Entities of Biological Interest) database (httpwwwebiacukchebi) Within each subset comprising two or more amino acidsthese are arranged from left to right in order of decreasing size such that sterically incompatible changes are avoided by substitution of asmaller amino acid for a larger one The subsets are linked to form the residue substitution paths in Figure 2 wherein Subset 1 comprises anentire path while the other subsets comprise the upstream segments of paths converging on either Cys (C) or Ala (A)
the 20 proteinogenic amino acids into subsets accordingto stereochemical similarity as depicted in Figure 1 Subsetassignments are based on the covalent bonding geometryof nonhydrogen side-chain atoms In particular these atomsare characterized as either tetrahedral or trigonal planarand branching of tetrahedral atoms is also considered Forexamplemethyl andmethylene carbon atoms are tetrahedralas are sulfur and amine nitrogen atoms Hence the sulfhydrylgroup of Cys is regarded as isosteric with a methyl groupwhereas the sulfur atom of Met and the side-chain iminogroup of Arg are both regarded as isosteric with a methylenegroup In contrast amide and carboxyl carbon atoms aretrigonal planar rather than tetrahedral
The subsets defined in Figure 1 are linked to form residuesubstitution paths avoiding sterically incompatible changesas depicted in Figure 2 Subset 1 comprises an entire pathwhile the other subsets comprise the upstream segments ofpaths converging on either Cys or Ala The path of Subset 2converges on Cys as members of Subset 1 larger than Cys aresterically incompatible with Subset 2 whose 120575-carbon atom istrigonal planar The path of Subset 3 converges on Ala rather
K R (1)
(2) (5) (6)
Q E M L I
(3) (4)
Y W C T V
F H S P (7)
N D A G
Figure 2 Amino-acid residue substitution paths avoiding stericallyincompatible changes (cf (7)) for the 20 canonical proteinogenicresidues Numbers correspond to the subsets of stereochemicallysimilar amino acids depicted in Figure 1 For every residue substi-tution along the paths shown the number of unsatisfied hydrogenbonds is presented in Table 1
Advances in Bioinformatics 5
Table 1 Numbers of unsatisfied hydrogen bonds for amino-acid residue substitutions of Figure 2 calculated according to (9)
G A S C D T P N V E Q H L I M K R F Y WG 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashA 0 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashS 1 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashC 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashD 2 2 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashT 1 1 0 1 mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashP 1 1 mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashN 2 2 mdash mdash 2 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashV 0 0 1 0 mdash 1 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashE 2 2 3 2 mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashQ 2 2 3 2 mdash mdash mdash mdash mdash 2 0 mdash mdash mdash mdash mdash mdash mdash mdash mdashH 2 2 mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdashL 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdashI 0 0 1 0 mdash 1 mdash mdash 0 mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdashM 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdashK 1 1 2 1 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 0 mdash mdash mdash mdashR 3 3 4 3 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 3 4 0 mdash mdash mdashF 0 0 mdash mdash 2 mdash mdash 2 mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdashY 1 1 mdash mdash 3 mdash mdash 3 mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 0 mdashW 1 1 mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 mdash mdash mdash mdash mdash mdash mdash 0
than Cys because the 120574-carbon atom is trigonal planar inSubset 3 The case of Subset 4 is thus similar to that of Subset3 but the 120574-carbon atom in Subset 4 is constrained as part ofa five-membered ring such that Subsets 3 and 4 are deemedsterically incompatible The path of Subset 5 (Leu) convergeson Cys as members of Subset 1 larger than Cys are stericallyincompatible with the branching at the 120574-carbon atomof LeuThe path of Subset 6 converges on Ala instead of Cys becauseof the branching at the 120573-carbon atom of Subset 6 Finallythe path of Subset 7 (Pro) converges on Ala as the Pro side-chain is constrained by ring formation with the main-chainnitrogen atom and thus distorted beyond the 120573-carbon atom
According to Figure 2 a substitution from any residueupstream to any residue downstream along a particular pathavoids steric incompatibility Conversely downstream-to-upstream and off-path substitutions are deemed stericallyincompatible This is a qualitative first approximation usinga highly idealized model rather than an attempt at definitivecategorization
As regards ΔΔ119866119894119895119896
(in (7)) this is related to amino-acidresidue structural differences (as a qualitative first approxi-mation rather than an attempt to obtain quantitatively veryaccurate results) by heuristic partitioning as
ΔΔ119866119894119895119896= 119888Δ119881
119894119895119896+ 119887119873119894119895119896 (8)
where 119888 and 119887 are both proportionality constants while Δ119881119894119895119896
is the change in residue volume and 119873119894119895119896
is the number ofunsatisfied hydrogen bonds (Table 1 wherein the residues areordered by their volumes [21]) considering that an energeticpenalty is incurred with unsatisfied hydrogen bonds uponbinding a suboptimally complementary epitope
Table 1 is sparse with a null value (ldquomdashrdquo) for mostcells because (8) is evaluated only for sterically compatiblesubstitutions (cf (7)) Hence the numbers in Table 1 arefor sterically compatible substitutions according to Figure 2The said numbers were each obtained assuming that everyavailable highly electronegative atom participates in theformation of a hydrogen bond as
119873119894119895119896= 119860119894119895119896minus 119861119894119895119896 (9)
where 119860119894119895119896
is the total number of oxygen and nitrogenatoms in the aligned residues while 119861
119894119895119896is the number of
such atoms for which direct correspondence was affirmedas regards structural superposability atom identity andcovalent bonding partners (according to Figure 1) Thus119873
119894119895119896
increases with each failure to affirm a direct correspondencebetween hydrogen-bonding atoms Physically such failurecorresponds to an unsatisfied hydrogen bond on the epitopeits binding site or both
Direct correspondence was affirmed for all main-chain(ie backbone) carbonyl oxygen atoms In view of thedifference between Pro and other residues at the main-chainnitrogen atom direct correspondence was affirmed for thisatom between Pro and itself as well as between non-Proresidues Among side-chain atoms direct correspondencewas affirmed only between hydroxyl oxygen atoms of Serand Thr carbonyl oxygen atoms of Asp and Asn and ofGlu and Gln and 120575-nitrogen atoms of His and Trp Tofacilitate explication a residue is termed polar if its side-chaincontains at least one oxygen or nitrogen atom but nonpolarif otherwise Accordingly 119873
119894119895119896= 0 for pairs of identical
residues (along the diagonal of Table 1) and of non-Prononpolar residues For Pro paired with Gly or Ala 119873
119894119895119896= 1
6 Advances in Bioinformatics
due to themain-chain nitrogen difference For a polar residuepaired with a non-Pro nonpolar residue 119873
119894119895119896is the total
number of side-chain oxygen and nitrogen atoms For a pairof polar residues the total number of side-chain oxygen andnitrogen atoms is an upper bound on 119873
119894119895119896 Hence 119873
119894119895119896= 3
for Tyr paired with Asn although 119873119894119895119896
= 2 for Asn pairedwith Asp
Provisional values for the proportionality constants 119888and 119887 in (8) were obtained from literature In particular119888 was regarded as the energetic penalty per unit volumeof a cavity formed due to substitution of a less bulkyresidue within the interior of a folded protein with valueof minus0024 kcalmolminus1 Aminus1 estimated empirically from proteinmutation studies [37] and 119887 was likewise regarded as theenergetic cost of breaking a hydrogen bond with a conser-vative estimated value of 05 kcalmolminus1 per bond [38]
Thus using (1) through (8) in conjunction with Table 1the literature values for constants in (8) and publishedamino-acid residue volumes [21] reduced epitope counts(1) were computed for artificial epitope sequences and alsoactual experimentally studied epitope sequences assuming atemperature of 37∘C (31015 K)
22 Retrieval and Processing of Epitope Data Epitope datawere obtained via the Immune Epitope Database (IEDBat httpwwwiedborg) [39] as database records retrievedthrough searches conducted from 4 to 6 October 2015 usingits B-Cell Search and T-Cell Search facilities in the lattercase restricting searches to either MHC class I or II alleleswith each record thus retrieved pertaining to an individualB-cell or T-cell assay and containing data fields defined inrelation to the key concepts of ldquoObject Typerdquo (ie moleculetype in the context of chemical structure) ldquoEpitope Relationrdquo(ie molecule relationship to epitope) ldquo1st Immunogenrdquo(ie immunogen initially administered to elicit the immuneresponse) and ldquoAntigenrdquo (ie antigen used in the B-cell or T-cell assay) Searches were restricted such that the data fieldsnamed ldquo1st Immunogen Object Typerdquo and ldquo1st ImmunogenEpitope Relationrdquo had values of ldquoLinear Peptiderdquo and ldquoEpi-toperdquo respectively with the ldquoMHC classrdquo option set to eitherldquoIrdquo or ldquoIIrdquo for T-Cell Search Employing this basic searchstrategy both narrow and broad searches were conducted theformer to retrieve data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitutionand the latter to survey the entire repertoire of availabledata on linear peptidic epitopes (as illustrated by example inFigure 3)
Narrow searches were first attempted to retrieve recordsfor which structural data on epitope binding were availablesetting the ldquoYesrdquo option for the Viewer Flag (under ldquo3DStructure of Complexrdquo) Subsequently the searches were con-ducted to separately retrieve records curated as containingeither negative or positive data on cross-reactivity and wererestricted such that the data fields named ldquoAntigen ObjectTyperdquo and ldquoAntigen Epitope Relationrdquo had values of ldquoLinearPeptiderdquo and ldquoStructurally Relatedrdquo respectively with thedata field named ldquoQualitativeMeasurementrdquo having a value ofeither ldquonegativerdquo or ldquopositiverdquo In the latter case searches werefurther limited to class I MHC-restricted T-cell assays for
Figure 3 IEDB T-Cell Search facility interface(httpwwwiedborgadvancedQueryTcellphp) Example showncorresponds to narrow search for class I MHC-restricted T-cellassays with positive data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitution Searchesfor B-cell assays were performed using IEDB B-Cell Searchfacility interface (httpwwwiedborgadvancedQueryBcellphp)of analogous form (eg without assay-related fields for ldquoMHCallelerdquo) narrow searches for negative data were performed withoutspecified value for field labeled as ldquoEpitope Structure Definesrdquo andbroad searches were performed without specified value for bothsaid field and that named ldquoQualitative Measurementrdquo (see main textfor additional details)
which the data field named ldquoEpitope Structure Definesrdquo hada value of either ldquoExact Epitoperdquo (rather than ldquoepitope con-taining regionantigenic siterdquo) as cross-reaction may occurwith structural differences outside any relevant epitopesunless this is for a peptide confined within a class I MHCbinding cleft For each record thus retrieved the immunogen
Advances in Bioinformatics 7
5 10 15 200Sequence length
0
5
10
15
20
Redu
ced
epito
pe co
unt
Figure 4 Reduced epitope counts (1) for peptidic sequences ofuniform length varying only at a single-residue position Each countcorresponds to a set of 20 peptides representing every standardproteinogenic amino-acid residue at the variable-residue positionFunctional similarity is equated with either fractional aligned-sequence identity (ldquo◻rdquo (3) and (4)) or the Shannon informationentropy for differential epitope binding ((5) through (9)) In thelatter case countswere based on steric incompatibility only (ldquo998810rdquo (7)and Figure 2) both steric incompatibility and cavity formation (ldquordquo119888Δ119881119894119895119896
in (8)) or steric incompatibility with both cavity formationand hydrogen bonding (ldquordquo (8) and (9) and Table 1)
and antigen sequences (ie values of the data fields namedldquo1st Immunogen Object Primary Molecule Sequencerdquo andldquoAntigen Object Primary Molecule Sequencerdquo resp) werecompared The record was considered for further processingonly if the said sequences were of equal length and differedat exactly one amino-acid residue position and the recordwas ultimately retained only if a corresponding record with aldquoQualitative Measurementrdquo data-field value of ldquopositiverdquo wasfound sharing the same immunogen and immunoassay pro-tocol except that the antigen was identical to the immunogen(thus confirming that the immunogen had elicited a relevantanti-peptide immune response in the first place)The epitopedata thus obtained were analyzed in light of the precedingSection 21
The broad searches were conducted to retrieve recordson linear peptidic epitopes in general Among the epitoperecords thus retrieved those wherein the data field namedldquoModificationrdquowas nonempty (indicating amino-acid residuecovalent modification) were excluded from further consider-ation and the remainder were analyzed as regards functionalredundancy according to the preceding Section 21
3 Results and Discussion
31 Functional Redundancy versus Sequence As depicted inFigure 4 computed functional redundancy increases with
epitope sequence length if functional similarity is equatedwith fractional aligned-sequence identity (ldquo◻rdquo) whereas itis independent of sequence length if functional similar-ity is equated with the Shannon information entropy fordifferential epitope binding (ldquo998810rdquo ldquordquo and ldquordquo) The lat-ter approach thus better accounts for the possibility of astructural difference in only a single chemical group beingsufficient to preclude antigenic cross-reaction provided thatthe sequence under consideration is entirely encompassedby the relevant epitope structure as structural differencesmay be functionally irrelevant if they lie outside the epitopeMoreover computed functional redundancy decreases (iethe reduced epitope count approaches the total epitopecount) when both steric incompatibility and the energeticpenalty of cavity formation are considered (ldquordquo) instead ofsteric incompatibility alone (ldquo998810rdquo)The redundancy decreaseseven further if the energetic penalty of unsatisfied hydrogenbonds (more generally representing suboptimal electrostaticcomplementarity) is also considered (ldquordquo)
32 Empirical Cross-Reactivity Data IEDB searches yielded26 records on epitope cross-reactivity vis-a-vis single-residuesubstitution (Table 2) all of which contain only qualitativerather than quantitative binding data without reference toatomic coordinates or other structural details of boundepitopes The data suggest that the posited steric incompat-ibility (in (7) and Figure 2) assumes too little tolerance forsuboptimal complementarity especially as regards volumedifferences between immunogen and antigen in Figure 5which depicts examples of cross-reaction despite substitu-tions increasing residue bulk (ie positive-data points to theleft of the diagonal) This might be at least partially correctedby relaxing or even eliminating the steric-incompatibilitycriteria possibly replacing them with a continuous-formshape-dependent energetic penalty (considering that theyassume an infinite-step hard-sphere model of steric clashesbetween atoms whereas potential energy may be more accu-rately represented by continuous functions of interatomicseparation distances) likewise electrostatic complementarityalso might be more accurately described for example interms of differential energetic contributions due to chargedand uncharged hydrogen-bonding partners However typicalepitope binding actuallymay be suboptimal considering ther-modynamic and kinetic constraints on affinity maturation inB-cell ontogeny and the typical absence of affinitymaturationin T-cell ontogeny which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity thanimmunogen
Notwithstanding the problem of suboptimal complemen-tarity described above data presented in Table 2 suggestthat the approach to epitope sequence redundancy proposedherein is more conceptually and practically meaningful thansetting threshold (ie cutoff) values for fractional aligned-sequence identity This is exemplified by the control-reactionepitopes of data rows 18 and 19 (having consensus sequenceLLXRDSFEV) These epitopes differ from each other at justone residue position As typical MHC class I-restricted T-cell epitopes they are nonapeptides for which the highestmeaningful sequence-identity threshold value is 89 (sim89)
8 Advances in Bioinformatics
Table2IEDBQualitative-Outcome(Q)d
atao
npeptidec
ross-reactivity
assays
(cfFigure
5)
Q
Assay
inform
ation
IEDB
Epito
peID
Substitution
Sequ
ence
context(with119883atpo
sitionof
substitution)
Ref
TypeM
HCcla
ssParameter
measured(cfIEDB
ldquomeasuremento
frdquodatafield)
IEDBID
Con
trolreaction
Cross-reactio
n1minus
B-cell
Antibod
y-antig
enbind
ing
1370146
1370150
10801
WY
DXE
DRY
YRE
[22]
2minus
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
854
6260
4PA
SYIX
SAEK
I[23]
3minus
T-cellII
Cell
proliferatio
n1646
734
1646
735
107384
ED
GEP
GIAGFK
GXQ
GPK
[24]
4minus
T-cellII
Cell
proliferatio
n1660246
1660247
46317
FA
NTW
TTCQ
SIAXP
SK[25]
5minus
T-cellII
Cytokine
release(IL-4)
1660252
1660258
112245
AF
NTW
TTCQ
SIAXP
SK[25]
6minus
T-cellII
Cytokine
release(IL-2)
166706
01667065
68846
KA
VHFF
XNIV
TPRT
P[26]
7minus
T-cellII
Cell
proliferatio
n1667112
1667113
80071
AK
VHFF
XNIV
TPRT
P[26]
8minus
T-cellII
Cell
proliferatio
n171660
41716614
125569
TV
SWEG
VGVXP
DV
[27]
9minus
T-cellII
Cell
proliferatio
n171660
41716616
125569
DN
SWEG
VGVTP
XV[27]
10minus
T-cellII
Cell
proliferatio
n1716605
1716618
125571
VT
SWEG
VGVXP
NV
[27]
11+
T-cellI
Cytotoxicity
1634784
1634785
103300
SA
IYXT
VAGSL
[28]
12+
T-cellI
Cytotoxicity
1634784
1634786
103300
GS
IYST
VAXS
L[28]
13+
T-cellI
Cytotoxicity
1634788
1634789
103298
SG
IYAT
VAXS
L[28]
14+
T-cellI
Cytotoxicity
1634788
1634790
103298
AS
IYXT
VASSL
[28]
15+
T-cellI
Cytotoxicity
1657362
1657364
111964
IDYX
FAFR
DL
[29]
16+
T-cellI
Cytokine
release(IFN-120574)
171644
2171644
3125101
ITXM
IKFN
RL[30]
17+
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
852
6260
4K
DSY
IPSA
EXI
[23]
18+
T-cellI
Cytokine
release(IFN-120574)
1381750
1381748
37174
DG
LLXR
DSFEV
[31]
19+
T-cellI
Cytokine
release(IFN-120574)
1381752
1381753
37392
HG
LLXR
DSFEV
[31]
20+
T-cellI
Cytotoxicity
1404705
1404714
61086
SI
SXIEFA
RL[32]
21+
T-cellI
Cytotoxicity
1404705
1404715
61086
AE
SSIEFX
RL[32]
22+
T-cellI
Cytotoxicity
1404705
1404716
61086
RK
SSIEFA
XL[32]
23+
T-cellI
Cytotoxicity
1404712
1404719
61088
EA
SSIEFX
RL[32]
24+
T-cellI
Cytotoxicity
1404713
1404721
61085
KR
SSIEFA
XL[32]
25+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865530
98995
TF
SAPD
XRPA
[33]
26+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865532
98995
AL
SAPD
TRPX
[33]
Advances in Bioinformatics 9
G
S
N
E
K
Y
W
1
2
3
4
5
6
8
12
13
17
18 19
20
21
22
23
2425
A
CD
V
HL
R
15
26
P
Q
M
F
11
7
14
16T
I
109
Ant
igen
resid
ue v
olum
e (Aring3 )
60
80
100
120
140
160
180
200
220
240
80 100 120 140 160 180 200 220 24060Immunogen residue volume (Aring3)
ReferenceNegative B-cellNegative T-cellI
Negative T-cellIIPositive T-cellI
Figure 5 Peptide cross-reactivity assay data of Table 2 vis-a-visamino-acid residue volume [21] of immunogen and antigen atsingle-residue substitution positions along sequence alignments
For longer epitopes (eg typical MHC class II-restrictedT-cell epitopes) the value is even higher For example inthe case of data rows 4 and 5 (having consensus sequenceNTWTTCQSIAXPSK) the said value is 1314 (sim93) Thethreshold values thus calculated are markedly higher thanthose used (eg 70 [20]) for conventional protein sequenceanalysis Moreover regardless of the exact sequence lengthsapplying such threshold values would entail discarding dataon epitopes differing by one residue (eg the examples justcited from Table 2) To avoid such loss of potentially valuableinformation all the available epitope data might be analyzedand a corresponding reduced epitope count reported toindicate the level of redundancy in the underlying dataset
33 Survey of Peptidic Epitope Sequence Data IEDB searchesyielded 11497 records on peptidic epitopes of length less than50 residues (ie the general cutoff provided in the IEDBcuration manual [39]) curated as covalently unmodifiedand comprising 4443 B-cell and 7054 T-cell epitope records(the latter divided between 2749 and 4305 records for MHCrestriction classes I and II resp) for which computedfunctional redundancy vis-a-vis sequence length is depictedin Figure 6
Considering the wide range of sequence counts for thevarious sequence lengths in Figure 6 a logarithmic scale waschosen to depict the data in a compact form Consequentlythe difference calculated as sequence count minus reduced countwas plotted instead of reduced count itself where sequencecount gt reduced count to avoid crowding of data points Forall positive sequence counts (ldquo◻rdquo) the said differences were
consistently higher where functional similarity was equatedwith fractional aligned-sequence identity (ldquotimesrdquo) rather thanthe Shannon information entropy of the differential epitopebinding (ldquordquo) In the latter case (ldquordquo) the differences wereconsistently less than 1 with reduced count close or evenequal to sequence countThese results are anticipated in viewof Figure 4 which shows that reduced count is independent ofsequence length using the entropy-based approach proposedherein but decreases with sequence length using fractionalaligned-sequence identity
The data in Figures 4 and 6 thus point to key considera-tions in evaluating functional redundancy Clearly functionalredundancy may be overestimated by simply equating func-tional similarity with fractional aligned-sequence identityespecially with increasing sequence length for highly similarsequences On the other hand functional redundancy maybe underestimated where functional similarity is equatedwith the Shannon information entropy of differential epitopebinding if the sequence under consideration extends beyondthe relevant epitope structure which becomes more likelywith increasing sequence length Hence the information-entropy approach as developed herein is conceivably appli-cable to sequences not exceeding a realistic application-dependent epitope length (eg six residues for antigenbinding by anti-peptide antibodies [40]) although rigorousgeneralization to longer sequences might be pursued byaligning subsequences of such length andpossibly accountingfor differential immunodominance among immunogen epi-topes [41] For example if a set of peptides were to be foundsuch that each of them comprised a single immunodominantepitope having fixed length (eg six residues in all cases) theinformation-entropy approach might be applied to the entireset even if the peptides were of variable length Moreovereven if the said immunodominant epitopes were highlysimilar to one another in terms of fractional aligned-sequenceidentity all the available data still could be included insubsequent analyses provided that both the total and reducedepitope counts would be reported in order to account forfunctional redundancy In this way useful epitope data couldbe retained instead of discarded
34 Applications and Future Directions The discussion thusfar suggests the potentially greater utility of antigenic func-tional similarity cast as the Shannon information entropy ofdifferential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redun-dancy of epitope data to express their structural diversity ina biologically meaningful manner (ie explicitly in terms ofrelative binding affinity which underlies the balance betweenthe inversely related emergent continuum phenomena ofimmunologic specificity and cross-reactivity) This is subjectto the caveat that assuming an idealized epitope-bindingsite of optimal steric and electrostatic complementarity (egcomparable to what is typically observed in a nativelyfolded wild-type protein) may both overestimate affinityfor an immunogen epitope and underestimate affinity forantigen epitopes structurally different from the immuno-gen epitope (thereby possibly overestimating immunologicspecificity as a consequence of underestimating potential
10 Advances in Bioinformatics
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(a)
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(b)
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
0001001
011
10100
1000
10 20 30 40 500Sequence length
(c)
Figure 6 Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c)Sequence counts (ldquo◻rdquo) are total epitope counts in IEDB Corresponding reduced counts (119903 in (1)) are expressed as the difference (sequencecount minus reduced count) with functional similarity equated with either fractional aligned-sequence identity (ldquotimesrdquo (3) and (4)) or the Shannoninformation entropy for differential epitope binding (ldquordquo (5) through (9)) Missing ldquo◻rdquo indicates sequence count = 0 (eg for B-cell epitopesequence length lt 4 in (a)) where ldquo◻rdquo is present and missing ldquotimesrdquo or ldquordquo indicates sequence count minus reduced count = 0 (eg with missingldquordquo for B-cell epitope sequence length = 5 in (a)) Both ldquotimesrdquo and ldquordquo are missing where sequence count = 1 (eg for B-cell epitope sequencelength = 43 in (a)) as reduced count = 1 for sequence count = 1
for cross-reactivity) These considerations could guide thedevelopment of immunization strategies (eg via vacci-nation) and immunodiagnostics (eg to produce peptidicconstructs suitable for eliciting anti-peptide antibodies thateither protect against disease or serve to detect particularantigens in biological samples via immunoassays) notingthat the practical significance of immunologic specificity andcross-reactivity is highly context-dependent
From the perspective of immunization immunity (eginduced via vaccination or passive transfer of immune-system components) may be useful if specifically directedagainst a pathogen epitope so as to favor pathogen elimi-nation without producing harmful hypersensitivity reactions(eg in the form of allergy or autoimmunity) Yet suchimmunity might be even more useful if it also favored elim-ination of multiple antigenically related variant pathogensvia cross-reaction (such that a single vaccine epitope elicitedthe production of immune-system components each cross-reactive with multiple variant pathogen epitopes) whilestill avoiding hypersensitivity Cross-reactivity of immuneresponses is thus potentially useful within limits defined byrisk of harmful hypersensitivity (eg due to cross-reactionwith host self-antigens) As regards immunodiagnosis animmunodiagnostic test may be useful if it enables specificpathogen detection via discrimination between a pathogen
epitope and a set of other biologically relevant epitopes (egof the host or its commensal symbionts in health or ofother pathogens for which the clinical implications are verydifferent) Yet the test might be even more useful if it alsoenabled detection of multiple antigenically related variantpathogens (such that a single immunologic probe detectedmultiple variant pathogen epitopes) while still retainingits discriminatory power with respect to other biologicallyrelevant epitopes particularly where the variant pathogensare very similar to one another in terms of their clinicalimplications (eg prognosis and therapeutic options)Hencecross-reactivity strongly influences the outcomes of bothimmunization and immunodiagnosis for which reason theirdevelopment could be supported by computational tools thatestimate epitope-binding affinity for cross-reactions
In order to estimate the epitope-binding affinity ofan immune-system component (eg antibody) elicited inresponse to (and thus having a binding site for) immunogenepitope 119894 for antigen epitope 119895 one possible approach wouldbe to express affinity in terms of association constants forexample as
119870119894119895= 119870119894119894119885119894119895 (10)
where 119870119894119895is the association constant for cross-reaction with
antigen epitope 119895 119870119894119894is the association constant for reaction
Advances in Bioinformatics 11
with immunogen epitope 119894 and 119885119894119895is a correction term
to account for the difference between the two associationconstants In turn119870
119894119894might be estimated as
119870119894119894= exp(minus
Δ119866119894119894
119877119879
) (11)
where Δ119866119894119894
is the free-energy change for reaction withimmunogen epitope 119894 while both 119877 and 119879 retain theirmeanings as in (7) In cases where epitopes 119894 and 119895 are both119898residues in length 119885
119894119895might be estimated as
119885119894119895=
119898
prod
119896=1
119904119894119895119896 if
119898
prod
119896=1
119904119894119895119896gt 119870minus1
119894119894
119870minus1
119894119894 otherwise
(12)
where the product retains its meaning as in (6) such that119870119894119895ge 1 (with 119870
119894119895= 1 for nonspecific binding) Δ119866
119894119894could be
estimated from epitope structure (eg using structural ener-getics [40ndash42]) subject to mechanism-specific constraints(eg on B-cell affinity maturation [43]) and 119904
119894119895119896could be
estimated along similar lines with regard to ΔΔ119866119894119895119896
in (7)The preceding discussion has alluded to epitope binding
as if it were a binary interaction which is strictly true onlyfor B-cell epitopes (eg bound by surface immunoglobulin orantibody) but in the context of T-cell epitopes the epitope-binding site is to be understood as a bipartite entity consistingof an MHC molecule and a T-cell receptor (TCR) suchthat the epitope becomes at least partly confined betweenthe two binding-site components The overall process of T-cell recognition is subject to thermodynamic and kineticconstraints on initial MHC-peptide binding [44] and subse-quent TCR recognition ofMHC-peptide complex [45] whichpotentially complicate prediction of T-cell cross-reactivityand functional consequences thereof (eg considering howbinding affinity and receptor numbers interact to influenceT-cell effector function [45]) Still predicting B-cell epitopecross-reactivity is challenging in view of the greater structuraldiversity of B-cell epitopes especially where only a fractionof the epitope surface contributes to the epitope-paratopeinterface such that cross-reaction may occur with variationsin epitope structure beyond the interface Hence althoughthe analysis herein for continuous epitopes might be gener-alized to discontinuous epitopes with indexing of residues byrelative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignmentgaps correlating variations in epitope-residue structure withpotential for cross-reactivity would be nontrivial
The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by
TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR
At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history
4 Summary and Conclusions
In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data
More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
Advances in Bioinformatics 5
Table 1 Numbers of unsatisfied hydrogen bonds for amino-acid residue substitutions of Figure 2 calculated according to (9)
G A S C D T P N V E Q H L I M K R F Y WG 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashA 0 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashS 1 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashC 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashD 2 2 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashT 1 1 0 1 mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashP 1 1 mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashN 2 2 mdash mdash 2 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashV 0 0 1 0 mdash 1 mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashE 2 2 3 2 mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdashQ 2 2 3 2 mdash mdash mdash mdash mdash 2 0 mdash mdash mdash mdash mdash mdash mdash mdash mdashH 2 2 mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdash mdashL 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdash mdashI 0 0 1 0 mdash 1 mdash mdash 0 mdash mdash mdash mdash 0 mdash mdash mdash mdash mdash mdashM 0 0 1 0 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdash mdash mdash mdashK 1 1 2 1 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 0 mdash mdash mdash mdashR 3 3 4 3 mdash mdash mdash mdash mdash mdash mdash mdash mdash mdash 3 4 0 mdash mdash mdashF 0 0 mdash mdash 2 mdash mdash 2 mdash mdash mdash mdash mdash mdash mdash mdash mdash 0 mdash mdashY 1 1 mdash mdash 3 mdash mdash 3 mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 0 mdashW 1 1 mdash mdash mdash mdash mdash mdash mdash mdash mdash 1 mdash mdash mdash mdash mdash mdash mdash 0
than Cys because the 120574-carbon atom is trigonal planar inSubset 3 The case of Subset 4 is thus similar to that of Subset3 but the 120574-carbon atom in Subset 4 is constrained as part ofa five-membered ring such that Subsets 3 and 4 are deemedsterically incompatible The path of Subset 5 (Leu) convergeson Cys as members of Subset 1 larger than Cys are stericallyincompatible with the branching at the 120574-carbon atomof LeuThe path of Subset 6 converges on Ala instead of Cys becauseof the branching at the 120573-carbon atom of Subset 6 Finallythe path of Subset 7 (Pro) converges on Ala as the Pro side-chain is constrained by ring formation with the main-chainnitrogen atom and thus distorted beyond the 120573-carbon atom
According to Figure 2 a substitution from any residueupstream to any residue downstream along a particular pathavoids steric incompatibility Conversely downstream-to-upstream and off-path substitutions are deemed stericallyincompatible This is a qualitative first approximation usinga highly idealized model rather than an attempt at definitivecategorization
As regards ΔΔ119866119894119895119896
(in (7)) this is related to amino-acidresidue structural differences (as a qualitative first approxi-mation rather than an attempt to obtain quantitatively veryaccurate results) by heuristic partitioning as
ΔΔ119866119894119895119896= 119888Δ119881
119894119895119896+ 119887119873119894119895119896 (8)
where 119888 and 119887 are both proportionality constants while Δ119881119894119895119896
is the change in residue volume and 119873119894119895119896
is the number ofunsatisfied hydrogen bonds (Table 1 wherein the residues areordered by their volumes [21]) considering that an energeticpenalty is incurred with unsatisfied hydrogen bonds uponbinding a suboptimally complementary epitope
Table 1 is sparse with a null value (ldquomdashrdquo) for mostcells because (8) is evaluated only for sterically compatiblesubstitutions (cf (7)) Hence the numbers in Table 1 arefor sterically compatible substitutions according to Figure 2The said numbers were each obtained assuming that everyavailable highly electronegative atom participates in theformation of a hydrogen bond as
119873119894119895119896= 119860119894119895119896minus 119861119894119895119896 (9)
where 119860119894119895119896
is the total number of oxygen and nitrogenatoms in the aligned residues while 119861
119894119895119896is the number of
such atoms for which direct correspondence was affirmedas regards structural superposability atom identity andcovalent bonding partners (according to Figure 1) Thus119873
119894119895119896
increases with each failure to affirm a direct correspondencebetween hydrogen-bonding atoms Physically such failurecorresponds to an unsatisfied hydrogen bond on the epitopeits binding site or both
Direct correspondence was affirmed for all main-chain(ie backbone) carbonyl oxygen atoms In view of thedifference between Pro and other residues at the main-chainnitrogen atom direct correspondence was affirmed for thisatom between Pro and itself as well as between non-Proresidues Among side-chain atoms direct correspondencewas affirmed only between hydroxyl oxygen atoms of Serand Thr carbonyl oxygen atoms of Asp and Asn and ofGlu and Gln and 120575-nitrogen atoms of His and Trp Tofacilitate explication a residue is termed polar if its side-chaincontains at least one oxygen or nitrogen atom but nonpolarif otherwise Accordingly 119873
119894119895119896= 0 for pairs of identical
residues (along the diagonal of Table 1) and of non-Prononpolar residues For Pro paired with Gly or Ala 119873
119894119895119896= 1
6 Advances in Bioinformatics
due to themain-chain nitrogen difference For a polar residuepaired with a non-Pro nonpolar residue 119873
119894119895119896is the total
number of side-chain oxygen and nitrogen atoms For a pairof polar residues the total number of side-chain oxygen andnitrogen atoms is an upper bound on 119873
119894119895119896 Hence 119873
119894119895119896= 3
for Tyr paired with Asn although 119873119894119895119896
= 2 for Asn pairedwith Asp
Provisional values for the proportionality constants 119888and 119887 in (8) were obtained from literature In particular119888 was regarded as the energetic penalty per unit volumeof a cavity formed due to substitution of a less bulkyresidue within the interior of a folded protein with valueof minus0024 kcalmolminus1 Aminus1 estimated empirically from proteinmutation studies [37] and 119887 was likewise regarded as theenergetic cost of breaking a hydrogen bond with a conser-vative estimated value of 05 kcalmolminus1 per bond [38]
Thus using (1) through (8) in conjunction with Table 1the literature values for constants in (8) and publishedamino-acid residue volumes [21] reduced epitope counts(1) were computed for artificial epitope sequences and alsoactual experimentally studied epitope sequences assuming atemperature of 37∘C (31015 K)
22 Retrieval and Processing of Epitope Data Epitope datawere obtained via the Immune Epitope Database (IEDBat httpwwwiedborg) [39] as database records retrievedthrough searches conducted from 4 to 6 October 2015 usingits B-Cell Search and T-Cell Search facilities in the lattercase restricting searches to either MHC class I or II alleleswith each record thus retrieved pertaining to an individualB-cell or T-cell assay and containing data fields defined inrelation to the key concepts of ldquoObject Typerdquo (ie moleculetype in the context of chemical structure) ldquoEpitope Relationrdquo(ie molecule relationship to epitope) ldquo1st Immunogenrdquo(ie immunogen initially administered to elicit the immuneresponse) and ldquoAntigenrdquo (ie antigen used in the B-cell or T-cell assay) Searches were restricted such that the data fieldsnamed ldquo1st Immunogen Object Typerdquo and ldquo1st ImmunogenEpitope Relationrdquo had values of ldquoLinear Peptiderdquo and ldquoEpi-toperdquo respectively with the ldquoMHC classrdquo option set to eitherldquoIrdquo or ldquoIIrdquo for T-Cell Search Employing this basic searchstrategy both narrow and broad searches were conducted theformer to retrieve data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitutionand the latter to survey the entire repertoire of availabledata on linear peptidic epitopes (as illustrated by example inFigure 3)
Narrow searches were first attempted to retrieve recordsfor which structural data on epitope binding were availablesetting the ldquoYesrdquo option for the Viewer Flag (under ldquo3DStructure of Complexrdquo) Subsequently the searches were con-ducted to separately retrieve records curated as containingeither negative or positive data on cross-reactivity and wererestricted such that the data fields named ldquoAntigen ObjectTyperdquo and ldquoAntigen Epitope Relationrdquo had values of ldquoLinearPeptiderdquo and ldquoStructurally Relatedrdquo respectively with thedata field named ldquoQualitativeMeasurementrdquo having a value ofeither ldquonegativerdquo or ldquopositiverdquo In the latter case searches werefurther limited to class I MHC-restricted T-cell assays for
Figure 3 IEDB T-Cell Search facility interface(httpwwwiedborgadvancedQueryTcellphp) Example showncorresponds to narrow search for class I MHC-restricted T-cellassays with positive data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitution Searchesfor B-cell assays were performed using IEDB B-Cell Searchfacility interface (httpwwwiedborgadvancedQueryBcellphp)of analogous form (eg without assay-related fields for ldquoMHCallelerdquo) narrow searches for negative data were performed withoutspecified value for field labeled as ldquoEpitope Structure Definesrdquo andbroad searches were performed without specified value for bothsaid field and that named ldquoQualitative Measurementrdquo (see main textfor additional details)
which the data field named ldquoEpitope Structure Definesrdquo hada value of either ldquoExact Epitoperdquo (rather than ldquoepitope con-taining regionantigenic siterdquo) as cross-reaction may occurwith structural differences outside any relevant epitopesunless this is for a peptide confined within a class I MHCbinding cleft For each record thus retrieved the immunogen
Advances in Bioinformatics 7
5 10 15 200Sequence length
0
5
10
15
20
Redu
ced
epito
pe co
unt
Figure 4 Reduced epitope counts (1) for peptidic sequences ofuniform length varying only at a single-residue position Each countcorresponds to a set of 20 peptides representing every standardproteinogenic amino-acid residue at the variable-residue positionFunctional similarity is equated with either fractional aligned-sequence identity (ldquo◻rdquo (3) and (4)) or the Shannon informationentropy for differential epitope binding ((5) through (9)) In thelatter case countswere based on steric incompatibility only (ldquo998810rdquo (7)and Figure 2) both steric incompatibility and cavity formation (ldquordquo119888Δ119881119894119895119896
in (8)) or steric incompatibility with both cavity formationand hydrogen bonding (ldquordquo (8) and (9) and Table 1)
and antigen sequences (ie values of the data fields namedldquo1st Immunogen Object Primary Molecule Sequencerdquo andldquoAntigen Object Primary Molecule Sequencerdquo resp) werecompared The record was considered for further processingonly if the said sequences were of equal length and differedat exactly one amino-acid residue position and the recordwas ultimately retained only if a corresponding record with aldquoQualitative Measurementrdquo data-field value of ldquopositiverdquo wasfound sharing the same immunogen and immunoassay pro-tocol except that the antigen was identical to the immunogen(thus confirming that the immunogen had elicited a relevantanti-peptide immune response in the first place)The epitopedata thus obtained were analyzed in light of the precedingSection 21
The broad searches were conducted to retrieve recordson linear peptidic epitopes in general Among the epitoperecords thus retrieved those wherein the data field namedldquoModificationrdquowas nonempty (indicating amino-acid residuecovalent modification) were excluded from further consider-ation and the remainder were analyzed as regards functionalredundancy according to the preceding Section 21
3 Results and Discussion
31 Functional Redundancy versus Sequence As depicted inFigure 4 computed functional redundancy increases with
epitope sequence length if functional similarity is equatedwith fractional aligned-sequence identity (ldquo◻rdquo) whereas itis independent of sequence length if functional similar-ity is equated with the Shannon information entropy fordifferential epitope binding (ldquo998810rdquo ldquordquo and ldquordquo) The lat-ter approach thus better accounts for the possibility of astructural difference in only a single chemical group beingsufficient to preclude antigenic cross-reaction provided thatthe sequence under consideration is entirely encompassedby the relevant epitope structure as structural differencesmay be functionally irrelevant if they lie outside the epitopeMoreover computed functional redundancy decreases (iethe reduced epitope count approaches the total epitopecount) when both steric incompatibility and the energeticpenalty of cavity formation are considered (ldquordquo) instead ofsteric incompatibility alone (ldquo998810rdquo)The redundancy decreaseseven further if the energetic penalty of unsatisfied hydrogenbonds (more generally representing suboptimal electrostaticcomplementarity) is also considered (ldquordquo)
32 Empirical Cross-Reactivity Data IEDB searches yielded26 records on epitope cross-reactivity vis-a-vis single-residuesubstitution (Table 2) all of which contain only qualitativerather than quantitative binding data without reference toatomic coordinates or other structural details of boundepitopes The data suggest that the posited steric incompat-ibility (in (7) and Figure 2) assumes too little tolerance forsuboptimal complementarity especially as regards volumedifferences between immunogen and antigen in Figure 5which depicts examples of cross-reaction despite substitu-tions increasing residue bulk (ie positive-data points to theleft of the diagonal) This might be at least partially correctedby relaxing or even eliminating the steric-incompatibilitycriteria possibly replacing them with a continuous-formshape-dependent energetic penalty (considering that theyassume an infinite-step hard-sphere model of steric clashesbetween atoms whereas potential energy may be more accu-rately represented by continuous functions of interatomicseparation distances) likewise electrostatic complementarityalso might be more accurately described for example interms of differential energetic contributions due to chargedand uncharged hydrogen-bonding partners However typicalepitope binding actuallymay be suboptimal considering ther-modynamic and kinetic constraints on affinity maturation inB-cell ontogeny and the typical absence of affinitymaturationin T-cell ontogeny which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity thanimmunogen
Notwithstanding the problem of suboptimal complemen-tarity described above data presented in Table 2 suggestthat the approach to epitope sequence redundancy proposedherein is more conceptually and practically meaningful thansetting threshold (ie cutoff) values for fractional aligned-sequence identity This is exemplified by the control-reactionepitopes of data rows 18 and 19 (having consensus sequenceLLXRDSFEV) These epitopes differ from each other at justone residue position As typical MHC class I-restricted T-cell epitopes they are nonapeptides for which the highestmeaningful sequence-identity threshold value is 89 (sim89)
8 Advances in Bioinformatics
Table2IEDBQualitative-Outcome(Q)d
atao
npeptidec
ross-reactivity
assays
(cfFigure
5)
Q
Assay
inform
ation
IEDB
Epito
peID
Substitution
Sequ
ence
context(with119883atpo
sitionof
substitution)
Ref
TypeM
HCcla
ssParameter
measured(cfIEDB
ldquomeasuremento
frdquodatafield)
IEDBID
Con
trolreaction
Cross-reactio
n1minus
B-cell
Antibod
y-antig
enbind
ing
1370146
1370150
10801
WY
DXE
DRY
YRE
[22]
2minus
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
854
6260
4PA
SYIX
SAEK
I[23]
3minus
T-cellII
Cell
proliferatio
n1646
734
1646
735
107384
ED
GEP
GIAGFK
GXQ
GPK
[24]
4minus
T-cellII
Cell
proliferatio
n1660246
1660247
46317
FA
NTW
TTCQ
SIAXP
SK[25]
5minus
T-cellII
Cytokine
release(IL-4)
1660252
1660258
112245
AF
NTW
TTCQ
SIAXP
SK[25]
6minus
T-cellII
Cytokine
release(IL-2)
166706
01667065
68846
KA
VHFF
XNIV
TPRT
P[26]
7minus
T-cellII
Cell
proliferatio
n1667112
1667113
80071
AK
VHFF
XNIV
TPRT
P[26]
8minus
T-cellII
Cell
proliferatio
n171660
41716614
125569
TV
SWEG
VGVXP
DV
[27]
9minus
T-cellII
Cell
proliferatio
n171660
41716616
125569
DN
SWEG
VGVTP
XV[27]
10minus
T-cellII
Cell
proliferatio
n1716605
1716618
125571
VT
SWEG
VGVXP
NV
[27]
11+
T-cellI
Cytotoxicity
1634784
1634785
103300
SA
IYXT
VAGSL
[28]
12+
T-cellI
Cytotoxicity
1634784
1634786
103300
GS
IYST
VAXS
L[28]
13+
T-cellI
Cytotoxicity
1634788
1634789
103298
SG
IYAT
VAXS
L[28]
14+
T-cellI
Cytotoxicity
1634788
1634790
103298
AS
IYXT
VASSL
[28]
15+
T-cellI
Cytotoxicity
1657362
1657364
111964
IDYX
FAFR
DL
[29]
16+
T-cellI
Cytokine
release(IFN-120574)
171644
2171644
3125101
ITXM
IKFN
RL[30]
17+
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
852
6260
4K
DSY
IPSA
EXI
[23]
18+
T-cellI
Cytokine
release(IFN-120574)
1381750
1381748
37174
DG
LLXR
DSFEV
[31]
19+
T-cellI
Cytokine
release(IFN-120574)
1381752
1381753
37392
HG
LLXR
DSFEV
[31]
20+
T-cellI
Cytotoxicity
1404705
1404714
61086
SI
SXIEFA
RL[32]
21+
T-cellI
Cytotoxicity
1404705
1404715
61086
AE
SSIEFX
RL[32]
22+
T-cellI
Cytotoxicity
1404705
1404716
61086
RK
SSIEFA
XL[32]
23+
T-cellI
Cytotoxicity
1404712
1404719
61088
EA
SSIEFX
RL[32]
24+
T-cellI
Cytotoxicity
1404713
1404721
61085
KR
SSIEFA
XL[32]
25+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865530
98995
TF
SAPD
XRPA
[33]
26+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865532
98995
AL
SAPD
TRPX
[33]
Advances in Bioinformatics 9
G
S
N
E
K
Y
W
1
2
3
4
5
6
8
12
13
17
18 19
20
21
22
23
2425
A
CD
V
HL
R
15
26
P
Q
M
F
11
7
14
16T
I
109
Ant
igen
resid
ue v
olum
e (Aring3 )
60
80
100
120
140
160
180
200
220
240
80 100 120 140 160 180 200 220 24060Immunogen residue volume (Aring3)
ReferenceNegative B-cellNegative T-cellI
Negative T-cellIIPositive T-cellI
Figure 5 Peptide cross-reactivity assay data of Table 2 vis-a-visamino-acid residue volume [21] of immunogen and antigen atsingle-residue substitution positions along sequence alignments
For longer epitopes (eg typical MHC class II-restrictedT-cell epitopes) the value is even higher For example inthe case of data rows 4 and 5 (having consensus sequenceNTWTTCQSIAXPSK) the said value is 1314 (sim93) Thethreshold values thus calculated are markedly higher thanthose used (eg 70 [20]) for conventional protein sequenceanalysis Moreover regardless of the exact sequence lengthsapplying such threshold values would entail discarding dataon epitopes differing by one residue (eg the examples justcited from Table 2) To avoid such loss of potentially valuableinformation all the available epitope data might be analyzedand a corresponding reduced epitope count reported toindicate the level of redundancy in the underlying dataset
33 Survey of Peptidic Epitope Sequence Data IEDB searchesyielded 11497 records on peptidic epitopes of length less than50 residues (ie the general cutoff provided in the IEDBcuration manual [39]) curated as covalently unmodifiedand comprising 4443 B-cell and 7054 T-cell epitope records(the latter divided between 2749 and 4305 records for MHCrestriction classes I and II resp) for which computedfunctional redundancy vis-a-vis sequence length is depictedin Figure 6
Considering the wide range of sequence counts for thevarious sequence lengths in Figure 6 a logarithmic scale waschosen to depict the data in a compact form Consequentlythe difference calculated as sequence count minus reduced countwas plotted instead of reduced count itself where sequencecount gt reduced count to avoid crowding of data points Forall positive sequence counts (ldquo◻rdquo) the said differences were
consistently higher where functional similarity was equatedwith fractional aligned-sequence identity (ldquotimesrdquo) rather thanthe Shannon information entropy of the differential epitopebinding (ldquordquo) In the latter case (ldquordquo) the differences wereconsistently less than 1 with reduced count close or evenequal to sequence countThese results are anticipated in viewof Figure 4 which shows that reduced count is independent ofsequence length using the entropy-based approach proposedherein but decreases with sequence length using fractionalaligned-sequence identity
The data in Figures 4 and 6 thus point to key considera-tions in evaluating functional redundancy Clearly functionalredundancy may be overestimated by simply equating func-tional similarity with fractional aligned-sequence identityespecially with increasing sequence length for highly similarsequences On the other hand functional redundancy maybe underestimated where functional similarity is equatedwith the Shannon information entropy of differential epitopebinding if the sequence under consideration extends beyondthe relevant epitope structure which becomes more likelywith increasing sequence length Hence the information-entropy approach as developed herein is conceivably appli-cable to sequences not exceeding a realistic application-dependent epitope length (eg six residues for antigenbinding by anti-peptide antibodies [40]) although rigorousgeneralization to longer sequences might be pursued byaligning subsequences of such length andpossibly accountingfor differential immunodominance among immunogen epi-topes [41] For example if a set of peptides were to be foundsuch that each of them comprised a single immunodominantepitope having fixed length (eg six residues in all cases) theinformation-entropy approach might be applied to the entireset even if the peptides were of variable length Moreovereven if the said immunodominant epitopes were highlysimilar to one another in terms of fractional aligned-sequenceidentity all the available data still could be included insubsequent analyses provided that both the total and reducedepitope counts would be reported in order to account forfunctional redundancy In this way useful epitope data couldbe retained instead of discarded
34 Applications and Future Directions The discussion thusfar suggests the potentially greater utility of antigenic func-tional similarity cast as the Shannon information entropy ofdifferential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redun-dancy of epitope data to express their structural diversity ina biologically meaningful manner (ie explicitly in terms ofrelative binding affinity which underlies the balance betweenthe inversely related emergent continuum phenomena ofimmunologic specificity and cross-reactivity) This is subjectto the caveat that assuming an idealized epitope-bindingsite of optimal steric and electrostatic complementarity (egcomparable to what is typically observed in a nativelyfolded wild-type protein) may both overestimate affinityfor an immunogen epitope and underestimate affinity forantigen epitopes structurally different from the immuno-gen epitope (thereby possibly overestimating immunologicspecificity as a consequence of underestimating potential
10 Advances in Bioinformatics
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(a)
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(b)
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
0001001
011
10100
1000
10 20 30 40 500Sequence length
(c)
Figure 6 Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c)Sequence counts (ldquo◻rdquo) are total epitope counts in IEDB Corresponding reduced counts (119903 in (1)) are expressed as the difference (sequencecount minus reduced count) with functional similarity equated with either fractional aligned-sequence identity (ldquotimesrdquo (3) and (4)) or the Shannoninformation entropy for differential epitope binding (ldquordquo (5) through (9)) Missing ldquo◻rdquo indicates sequence count = 0 (eg for B-cell epitopesequence length lt 4 in (a)) where ldquo◻rdquo is present and missing ldquotimesrdquo or ldquordquo indicates sequence count minus reduced count = 0 (eg with missingldquordquo for B-cell epitope sequence length = 5 in (a)) Both ldquotimesrdquo and ldquordquo are missing where sequence count = 1 (eg for B-cell epitope sequencelength = 43 in (a)) as reduced count = 1 for sequence count = 1
for cross-reactivity) These considerations could guide thedevelopment of immunization strategies (eg via vacci-nation) and immunodiagnostics (eg to produce peptidicconstructs suitable for eliciting anti-peptide antibodies thateither protect against disease or serve to detect particularantigens in biological samples via immunoassays) notingthat the practical significance of immunologic specificity andcross-reactivity is highly context-dependent
From the perspective of immunization immunity (eginduced via vaccination or passive transfer of immune-system components) may be useful if specifically directedagainst a pathogen epitope so as to favor pathogen elimi-nation without producing harmful hypersensitivity reactions(eg in the form of allergy or autoimmunity) Yet suchimmunity might be even more useful if it also favored elim-ination of multiple antigenically related variant pathogensvia cross-reaction (such that a single vaccine epitope elicitedthe production of immune-system components each cross-reactive with multiple variant pathogen epitopes) whilestill avoiding hypersensitivity Cross-reactivity of immuneresponses is thus potentially useful within limits defined byrisk of harmful hypersensitivity (eg due to cross-reactionwith host self-antigens) As regards immunodiagnosis animmunodiagnostic test may be useful if it enables specificpathogen detection via discrimination between a pathogen
epitope and a set of other biologically relevant epitopes (egof the host or its commensal symbionts in health or ofother pathogens for which the clinical implications are verydifferent) Yet the test might be even more useful if it alsoenabled detection of multiple antigenically related variantpathogens (such that a single immunologic probe detectedmultiple variant pathogen epitopes) while still retainingits discriminatory power with respect to other biologicallyrelevant epitopes particularly where the variant pathogensare very similar to one another in terms of their clinicalimplications (eg prognosis and therapeutic options)Hencecross-reactivity strongly influences the outcomes of bothimmunization and immunodiagnosis for which reason theirdevelopment could be supported by computational tools thatestimate epitope-binding affinity for cross-reactions
In order to estimate the epitope-binding affinity ofan immune-system component (eg antibody) elicited inresponse to (and thus having a binding site for) immunogenepitope 119894 for antigen epitope 119895 one possible approach wouldbe to express affinity in terms of association constants forexample as
119870119894119895= 119870119894119894119885119894119895 (10)
where 119870119894119895is the association constant for cross-reaction with
antigen epitope 119895 119870119894119894is the association constant for reaction
Advances in Bioinformatics 11
with immunogen epitope 119894 and 119885119894119895is a correction term
to account for the difference between the two associationconstants In turn119870
119894119894might be estimated as
119870119894119894= exp(minus
Δ119866119894119894
119877119879
) (11)
where Δ119866119894119894
is the free-energy change for reaction withimmunogen epitope 119894 while both 119877 and 119879 retain theirmeanings as in (7) In cases where epitopes 119894 and 119895 are both119898residues in length 119885
119894119895might be estimated as
119885119894119895=
119898
prod
119896=1
119904119894119895119896 if
119898
prod
119896=1
119904119894119895119896gt 119870minus1
119894119894
119870minus1
119894119894 otherwise
(12)
where the product retains its meaning as in (6) such that119870119894119895ge 1 (with 119870
119894119895= 1 for nonspecific binding) Δ119866
119894119894could be
estimated from epitope structure (eg using structural ener-getics [40ndash42]) subject to mechanism-specific constraints(eg on B-cell affinity maturation [43]) and 119904
119894119895119896could be
estimated along similar lines with regard to ΔΔ119866119894119895119896
in (7)The preceding discussion has alluded to epitope binding
as if it were a binary interaction which is strictly true onlyfor B-cell epitopes (eg bound by surface immunoglobulin orantibody) but in the context of T-cell epitopes the epitope-binding site is to be understood as a bipartite entity consistingof an MHC molecule and a T-cell receptor (TCR) suchthat the epitope becomes at least partly confined betweenthe two binding-site components The overall process of T-cell recognition is subject to thermodynamic and kineticconstraints on initial MHC-peptide binding [44] and subse-quent TCR recognition ofMHC-peptide complex [45] whichpotentially complicate prediction of T-cell cross-reactivityand functional consequences thereof (eg considering howbinding affinity and receptor numbers interact to influenceT-cell effector function [45]) Still predicting B-cell epitopecross-reactivity is challenging in view of the greater structuraldiversity of B-cell epitopes especially where only a fractionof the epitope surface contributes to the epitope-paratopeinterface such that cross-reaction may occur with variationsin epitope structure beyond the interface Hence althoughthe analysis herein for continuous epitopes might be gener-alized to discontinuous epitopes with indexing of residues byrelative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignmentgaps correlating variations in epitope-residue structure withpotential for cross-reactivity would be nontrivial
The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by
TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR
At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history
4 Summary and Conclusions
In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data
More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
6 Advances in Bioinformatics
due to themain-chain nitrogen difference For a polar residuepaired with a non-Pro nonpolar residue 119873
119894119895119896is the total
number of side-chain oxygen and nitrogen atoms For a pairof polar residues the total number of side-chain oxygen andnitrogen atoms is an upper bound on 119873
119894119895119896 Hence 119873
119894119895119896= 3
for Tyr paired with Asn although 119873119894119895119896
= 2 for Asn pairedwith Asp
Provisional values for the proportionality constants 119888and 119887 in (8) were obtained from literature In particular119888 was regarded as the energetic penalty per unit volumeof a cavity formed due to substitution of a less bulkyresidue within the interior of a folded protein with valueof minus0024 kcalmolminus1 Aminus1 estimated empirically from proteinmutation studies [37] and 119887 was likewise regarded as theenergetic cost of breaking a hydrogen bond with a conser-vative estimated value of 05 kcalmolminus1 per bond [38]
Thus using (1) through (8) in conjunction with Table 1the literature values for constants in (8) and publishedamino-acid residue volumes [21] reduced epitope counts(1) were computed for artificial epitope sequences and alsoactual experimentally studied epitope sequences assuming atemperature of 37∘C (31015 K)
22 Retrieval and Processing of Epitope Data Epitope datawere obtained via the Immune Epitope Database (IEDBat httpwwwiedborg) [39] as database records retrievedthrough searches conducted from 4 to 6 October 2015 usingits B-Cell Search and T-Cell Search facilities in the lattercase restricting searches to either MHC class I or II alleleswith each record thus retrieved pertaining to an individualB-cell or T-cell assay and containing data fields defined inrelation to the key concepts of ldquoObject Typerdquo (ie moleculetype in the context of chemical structure) ldquoEpitope Relationrdquo(ie molecule relationship to epitope) ldquo1st Immunogenrdquo(ie immunogen initially administered to elicit the immuneresponse) and ldquoAntigenrdquo (ie antigen used in the B-cell or T-cell assay) Searches were restricted such that the data fieldsnamed ldquo1st Immunogen Object Typerdquo and ldquo1st ImmunogenEpitope Relationrdquo had values of ldquoLinear Peptiderdquo and ldquoEpi-toperdquo respectively with the ldquoMHC classrdquo option set to eitherldquoIrdquo or ldquoIIrdquo for T-Cell Search Employing this basic searchstrategy both narrow and broad searches were conducted theformer to retrieve data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitutionand the latter to survey the entire repertoire of availabledata on linear peptidic epitopes (as illustrated by example inFigure 3)
Narrow searches were first attempted to retrieve recordsfor which structural data on epitope binding were availablesetting the ldquoYesrdquo option for the Viewer Flag (under ldquo3DStructure of Complexrdquo) Subsequently the searches were con-ducted to separately retrieve records curated as containingeither negative or positive data on cross-reactivity and wererestricted such that the data fields named ldquoAntigen ObjectTyperdquo and ldquoAntigen Epitope Relationrdquo had values of ldquoLinearPeptiderdquo and ldquoStructurally Relatedrdquo respectively with thedata field named ldquoQualitativeMeasurementrdquo having a value ofeither ldquonegativerdquo or ldquopositiverdquo In the latter case searches werefurther limited to class I MHC-restricted T-cell assays for
Figure 3 IEDB T-Cell Search facility interface(httpwwwiedborgadvancedQueryTcellphp) Example showncorresponds to narrow search for class I MHC-restricted T-cellassays with positive data on cross-reactivity between epitopesdiffering by only a single amino-acid residue substitution Searchesfor B-cell assays were performed using IEDB B-Cell Searchfacility interface (httpwwwiedborgadvancedQueryBcellphp)of analogous form (eg without assay-related fields for ldquoMHCallelerdquo) narrow searches for negative data were performed withoutspecified value for field labeled as ldquoEpitope Structure Definesrdquo andbroad searches were performed without specified value for bothsaid field and that named ldquoQualitative Measurementrdquo (see main textfor additional details)
which the data field named ldquoEpitope Structure Definesrdquo hada value of either ldquoExact Epitoperdquo (rather than ldquoepitope con-taining regionantigenic siterdquo) as cross-reaction may occurwith structural differences outside any relevant epitopesunless this is for a peptide confined within a class I MHCbinding cleft For each record thus retrieved the immunogen
Advances in Bioinformatics 7
5 10 15 200Sequence length
0
5
10
15
20
Redu
ced
epito
pe co
unt
Figure 4 Reduced epitope counts (1) for peptidic sequences ofuniform length varying only at a single-residue position Each countcorresponds to a set of 20 peptides representing every standardproteinogenic amino-acid residue at the variable-residue positionFunctional similarity is equated with either fractional aligned-sequence identity (ldquo◻rdquo (3) and (4)) or the Shannon informationentropy for differential epitope binding ((5) through (9)) In thelatter case countswere based on steric incompatibility only (ldquo998810rdquo (7)and Figure 2) both steric incompatibility and cavity formation (ldquordquo119888Δ119881119894119895119896
in (8)) or steric incompatibility with both cavity formationand hydrogen bonding (ldquordquo (8) and (9) and Table 1)
and antigen sequences (ie values of the data fields namedldquo1st Immunogen Object Primary Molecule Sequencerdquo andldquoAntigen Object Primary Molecule Sequencerdquo resp) werecompared The record was considered for further processingonly if the said sequences were of equal length and differedat exactly one amino-acid residue position and the recordwas ultimately retained only if a corresponding record with aldquoQualitative Measurementrdquo data-field value of ldquopositiverdquo wasfound sharing the same immunogen and immunoassay pro-tocol except that the antigen was identical to the immunogen(thus confirming that the immunogen had elicited a relevantanti-peptide immune response in the first place)The epitopedata thus obtained were analyzed in light of the precedingSection 21
The broad searches were conducted to retrieve recordson linear peptidic epitopes in general Among the epitoperecords thus retrieved those wherein the data field namedldquoModificationrdquowas nonempty (indicating amino-acid residuecovalent modification) were excluded from further consider-ation and the remainder were analyzed as regards functionalredundancy according to the preceding Section 21
3 Results and Discussion
31 Functional Redundancy versus Sequence As depicted inFigure 4 computed functional redundancy increases with
epitope sequence length if functional similarity is equatedwith fractional aligned-sequence identity (ldquo◻rdquo) whereas itis independent of sequence length if functional similar-ity is equated with the Shannon information entropy fordifferential epitope binding (ldquo998810rdquo ldquordquo and ldquordquo) The lat-ter approach thus better accounts for the possibility of astructural difference in only a single chemical group beingsufficient to preclude antigenic cross-reaction provided thatthe sequence under consideration is entirely encompassedby the relevant epitope structure as structural differencesmay be functionally irrelevant if they lie outside the epitopeMoreover computed functional redundancy decreases (iethe reduced epitope count approaches the total epitopecount) when both steric incompatibility and the energeticpenalty of cavity formation are considered (ldquordquo) instead ofsteric incompatibility alone (ldquo998810rdquo)The redundancy decreaseseven further if the energetic penalty of unsatisfied hydrogenbonds (more generally representing suboptimal electrostaticcomplementarity) is also considered (ldquordquo)
32 Empirical Cross-Reactivity Data IEDB searches yielded26 records on epitope cross-reactivity vis-a-vis single-residuesubstitution (Table 2) all of which contain only qualitativerather than quantitative binding data without reference toatomic coordinates or other structural details of boundepitopes The data suggest that the posited steric incompat-ibility (in (7) and Figure 2) assumes too little tolerance forsuboptimal complementarity especially as regards volumedifferences between immunogen and antigen in Figure 5which depicts examples of cross-reaction despite substitu-tions increasing residue bulk (ie positive-data points to theleft of the diagonal) This might be at least partially correctedby relaxing or even eliminating the steric-incompatibilitycriteria possibly replacing them with a continuous-formshape-dependent energetic penalty (considering that theyassume an infinite-step hard-sphere model of steric clashesbetween atoms whereas potential energy may be more accu-rately represented by continuous functions of interatomicseparation distances) likewise electrostatic complementarityalso might be more accurately described for example interms of differential energetic contributions due to chargedand uncharged hydrogen-bonding partners However typicalepitope binding actuallymay be suboptimal considering ther-modynamic and kinetic constraints on affinity maturation inB-cell ontogeny and the typical absence of affinitymaturationin T-cell ontogeny which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity thanimmunogen
Notwithstanding the problem of suboptimal complemen-tarity described above data presented in Table 2 suggestthat the approach to epitope sequence redundancy proposedherein is more conceptually and practically meaningful thansetting threshold (ie cutoff) values for fractional aligned-sequence identity This is exemplified by the control-reactionepitopes of data rows 18 and 19 (having consensus sequenceLLXRDSFEV) These epitopes differ from each other at justone residue position As typical MHC class I-restricted T-cell epitopes they are nonapeptides for which the highestmeaningful sequence-identity threshold value is 89 (sim89)
8 Advances in Bioinformatics
Table2IEDBQualitative-Outcome(Q)d
atao
npeptidec
ross-reactivity
assays
(cfFigure
5)
Q
Assay
inform
ation
IEDB
Epito
peID
Substitution
Sequ
ence
context(with119883atpo
sitionof
substitution)
Ref
TypeM
HCcla
ssParameter
measured(cfIEDB
ldquomeasuremento
frdquodatafield)
IEDBID
Con
trolreaction
Cross-reactio
n1minus
B-cell
Antibod
y-antig
enbind
ing
1370146
1370150
10801
WY
DXE
DRY
YRE
[22]
2minus
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
854
6260
4PA
SYIX
SAEK
I[23]
3minus
T-cellII
Cell
proliferatio
n1646
734
1646
735
107384
ED
GEP
GIAGFK
GXQ
GPK
[24]
4minus
T-cellII
Cell
proliferatio
n1660246
1660247
46317
FA
NTW
TTCQ
SIAXP
SK[25]
5minus
T-cellII
Cytokine
release(IL-4)
1660252
1660258
112245
AF
NTW
TTCQ
SIAXP
SK[25]
6minus
T-cellII
Cytokine
release(IL-2)
166706
01667065
68846
KA
VHFF
XNIV
TPRT
P[26]
7minus
T-cellII
Cell
proliferatio
n1667112
1667113
80071
AK
VHFF
XNIV
TPRT
P[26]
8minus
T-cellII
Cell
proliferatio
n171660
41716614
125569
TV
SWEG
VGVXP
DV
[27]
9minus
T-cellII
Cell
proliferatio
n171660
41716616
125569
DN
SWEG
VGVTP
XV[27]
10minus
T-cellII
Cell
proliferatio
n1716605
1716618
125571
VT
SWEG
VGVXP
NV
[27]
11+
T-cellI
Cytotoxicity
1634784
1634785
103300
SA
IYXT
VAGSL
[28]
12+
T-cellI
Cytotoxicity
1634784
1634786
103300
GS
IYST
VAXS
L[28]
13+
T-cellI
Cytotoxicity
1634788
1634789
103298
SG
IYAT
VAXS
L[28]
14+
T-cellI
Cytotoxicity
1634788
1634790
103298
AS
IYXT
VASSL
[28]
15+
T-cellI
Cytotoxicity
1657362
1657364
111964
IDYX
FAFR
DL
[29]
16+
T-cellI
Cytokine
release(IFN-120574)
171644
2171644
3125101
ITXM
IKFN
RL[30]
17+
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
852
6260
4K
DSY
IPSA
EXI
[23]
18+
T-cellI
Cytokine
release(IFN-120574)
1381750
1381748
37174
DG
LLXR
DSFEV
[31]
19+
T-cellI
Cytokine
release(IFN-120574)
1381752
1381753
37392
HG
LLXR
DSFEV
[31]
20+
T-cellI
Cytotoxicity
1404705
1404714
61086
SI
SXIEFA
RL[32]
21+
T-cellI
Cytotoxicity
1404705
1404715
61086
AE
SSIEFX
RL[32]
22+
T-cellI
Cytotoxicity
1404705
1404716
61086
RK
SSIEFA
XL[32]
23+
T-cellI
Cytotoxicity
1404712
1404719
61088
EA
SSIEFX
RL[32]
24+
T-cellI
Cytotoxicity
1404713
1404721
61085
KR
SSIEFA
XL[32]
25+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865530
98995
TF
SAPD
XRPA
[33]
26+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865532
98995
AL
SAPD
TRPX
[33]
Advances in Bioinformatics 9
G
S
N
E
K
Y
W
1
2
3
4
5
6
8
12
13
17
18 19
20
21
22
23
2425
A
CD
V
HL
R
15
26
P
Q
M
F
11
7
14
16T
I
109
Ant
igen
resid
ue v
olum
e (Aring3 )
60
80
100
120
140
160
180
200
220
240
80 100 120 140 160 180 200 220 24060Immunogen residue volume (Aring3)
ReferenceNegative B-cellNegative T-cellI
Negative T-cellIIPositive T-cellI
Figure 5 Peptide cross-reactivity assay data of Table 2 vis-a-visamino-acid residue volume [21] of immunogen and antigen atsingle-residue substitution positions along sequence alignments
For longer epitopes (eg typical MHC class II-restrictedT-cell epitopes) the value is even higher For example inthe case of data rows 4 and 5 (having consensus sequenceNTWTTCQSIAXPSK) the said value is 1314 (sim93) Thethreshold values thus calculated are markedly higher thanthose used (eg 70 [20]) for conventional protein sequenceanalysis Moreover regardless of the exact sequence lengthsapplying such threshold values would entail discarding dataon epitopes differing by one residue (eg the examples justcited from Table 2) To avoid such loss of potentially valuableinformation all the available epitope data might be analyzedand a corresponding reduced epitope count reported toindicate the level of redundancy in the underlying dataset
33 Survey of Peptidic Epitope Sequence Data IEDB searchesyielded 11497 records on peptidic epitopes of length less than50 residues (ie the general cutoff provided in the IEDBcuration manual [39]) curated as covalently unmodifiedand comprising 4443 B-cell and 7054 T-cell epitope records(the latter divided between 2749 and 4305 records for MHCrestriction classes I and II resp) for which computedfunctional redundancy vis-a-vis sequence length is depictedin Figure 6
Considering the wide range of sequence counts for thevarious sequence lengths in Figure 6 a logarithmic scale waschosen to depict the data in a compact form Consequentlythe difference calculated as sequence count minus reduced countwas plotted instead of reduced count itself where sequencecount gt reduced count to avoid crowding of data points Forall positive sequence counts (ldquo◻rdquo) the said differences were
consistently higher where functional similarity was equatedwith fractional aligned-sequence identity (ldquotimesrdquo) rather thanthe Shannon information entropy of the differential epitopebinding (ldquordquo) In the latter case (ldquordquo) the differences wereconsistently less than 1 with reduced count close or evenequal to sequence countThese results are anticipated in viewof Figure 4 which shows that reduced count is independent ofsequence length using the entropy-based approach proposedherein but decreases with sequence length using fractionalaligned-sequence identity
The data in Figures 4 and 6 thus point to key considera-tions in evaluating functional redundancy Clearly functionalredundancy may be overestimated by simply equating func-tional similarity with fractional aligned-sequence identityespecially with increasing sequence length for highly similarsequences On the other hand functional redundancy maybe underestimated where functional similarity is equatedwith the Shannon information entropy of differential epitopebinding if the sequence under consideration extends beyondthe relevant epitope structure which becomes more likelywith increasing sequence length Hence the information-entropy approach as developed herein is conceivably appli-cable to sequences not exceeding a realistic application-dependent epitope length (eg six residues for antigenbinding by anti-peptide antibodies [40]) although rigorousgeneralization to longer sequences might be pursued byaligning subsequences of such length andpossibly accountingfor differential immunodominance among immunogen epi-topes [41] For example if a set of peptides were to be foundsuch that each of them comprised a single immunodominantepitope having fixed length (eg six residues in all cases) theinformation-entropy approach might be applied to the entireset even if the peptides were of variable length Moreovereven if the said immunodominant epitopes were highlysimilar to one another in terms of fractional aligned-sequenceidentity all the available data still could be included insubsequent analyses provided that both the total and reducedepitope counts would be reported in order to account forfunctional redundancy In this way useful epitope data couldbe retained instead of discarded
34 Applications and Future Directions The discussion thusfar suggests the potentially greater utility of antigenic func-tional similarity cast as the Shannon information entropy ofdifferential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redun-dancy of epitope data to express their structural diversity ina biologically meaningful manner (ie explicitly in terms ofrelative binding affinity which underlies the balance betweenthe inversely related emergent continuum phenomena ofimmunologic specificity and cross-reactivity) This is subjectto the caveat that assuming an idealized epitope-bindingsite of optimal steric and electrostatic complementarity (egcomparable to what is typically observed in a nativelyfolded wild-type protein) may both overestimate affinityfor an immunogen epitope and underestimate affinity forantigen epitopes structurally different from the immuno-gen epitope (thereby possibly overestimating immunologicspecificity as a consequence of underestimating potential
10 Advances in Bioinformatics
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(a)
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(b)
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
0001001
011
10100
1000
10 20 30 40 500Sequence length
(c)
Figure 6 Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c)Sequence counts (ldquo◻rdquo) are total epitope counts in IEDB Corresponding reduced counts (119903 in (1)) are expressed as the difference (sequencecount minus reduced count) with functional similarity equated with either fractional aligned-sequence identity (ldquotimesrdquo (3) and (4)) or the Shannoninformation entropy for differential epitope binding (ldquordquo (5) through (9)) Missing ldquo◻rdquo indicates sequence count = 0 (eg for B-cell epitopesequence length lt 4 in (a)) where ldquo◻rdquo is present and missing ldquotimesrdquo or ldquordquo indicates sequence count minus reduced count = 0 (eg with missingldquordquo for B-cell epitope sequence length = 5 in (a)) Both ldquotimesrdquo and ldquordquo are missing where sequence count = 1 (eg for B-cell epitope sequencelength = 43 in (a)) as reduced count = 1 for sequence count = 1
for cross-reactivity) These considerations could guide thedevelopment of immunization strategies (eg via vacci-nation) and immunodiagnostics (eg to produce peptidicconstructs suitable for eliciting anti-peptide antibodies thateither protect against disease or serve to detect particularantigens in biological samples via immunoassays) notingthat the practical significance of immunologic specificity andcross-reactivity is highly context-dependent
From the perspective of immunization immunity (eginduced via vaccination or passive transfer of immune-system components) may be useful if specifically directedagainst a pathogen epitope so as to favor pathogen elimi-nation without producing harmful hypersensitivity reactions(eg in the form of allergy or autoimmunity) Yet suchimmunity might be even more useful if it also favored elim-ination of multiple antigenically related variant pathogensvia cross-reaction (such that a single vaccine epitope elicitedthe production of immune-system components each cross-reactive with multiple variant pathogen epitopes) whilestill avoiding hypersensitivity Cross-reactivity of immuneresponses is thus potentially useful within limits defined byrisk of harmful hypersensitivity (eg due to cross-reactionwith host self-antigens) As regards immunodiagnosis animmunodiagnostic test may be useful if it enables specificpathogen detection via discrimination between a pathogen
epitope and a set of other biologically relevant epitopes (egof the host or its commensal symbionts in health or ofother pathogens for which the clinical implications are verydifferent) Yet the test might be even more useful if it alsoenabled detection of multiple antigenically related variantpathogens (such that a single immunologic probe detectedmultiple variant pathogen epitopes) while still retainingits discriminatory power with respect to other biologicallyrelevant epitopes particularly where the variant pathogensare very similar to one another in terms of their clinicalimplications (eg prognosis and therapeutic options)Hencecross-reactivity strongly influences the outcomes of bothimmunization and immunodiagnosis for which reason theirdevelopment could be supported by computational tools thatestimate epitope-binding affinity for cross-reactions
In order to estimate the epitope-binding affinity ofan immune-system component (eg antibody) elicited inresponse to (and thus having a binding site for) immunogenepitope 119894 for antigen epitope 119895 one possible approach wouldbe to express affinity in terms of association constants forexample as
119870119894119895= 119870119894119894119885119894119895 (10)
where 119870119894119895is the association constant for cross-reaction with
antigen epitope 119895 119870119894119894is the association constant for reaction
Advances in Bioinformatics 11
with immunogen epitope 119894 and 119885119894119895is a correction term
to account for the difference between the two associationconstants In turn119870
119894119894might be estimated as
119870119894119894= exp(minus
Δ119866119894119894
119877119879
) (11)
where Δ119866119894119894
is the free-energy change for reaction withimmunogen epitope 119894 while both 119877 and 119879 retain theirmeanings as in (7) In cases where epitopes 119894 and 119895 are both119898residues in length 119885
119894119895might be estimated as
119885119894119895=
119898
prod
119896=1
119904119894119895119896 if
119898
prod
119896=1
119904119894119895119896gt 119870minus1
119894119894
119870minus1
119894119894 otherwise
(12)
where the product retains its meaning as in (6) such that119870119894119895ge 1 (with 119870
119894119895= 1 for nonspecific binding) Δ119866
119894119894could be
estimated from epitope structure (eg using structural ener-getics [40ndash42]) subject to mechanism-specific constraints(eg on B-cell affinity maturation [43]) and 119904
119894119895119896could be
estimated along similar lines with regard to ΔΔ119866119894119895119896
in (7)The preceding discussion has alluded to epitope binding
as if it were a binary interaction which is strictly true onlyfor B-cell epitopes (eg bound by surface immunoglobulin orantibody) but in the context of T-cell epitopes the epitope-binding site is to be understood as a bipartite entity consistingof an MHC molecule and a T-cell receptor (TCR) suchthat the epitope becomes at least partly confined betweenthe two binding-site components The overall process of T-cell recognition is subject to thermodynamic and kineticconstraints on initial MHC-peptide binding [44] and subse-quent TCR recognition ofMHC-peptide complex [45] whichpotentially complicate prediction of T-cell cross-reactivityand functional consequences thereof (eg considering howbinding affinity and receptor numbers interact to influenceT-cell effector function [45]) Still predicting B-cell epitopecross-reactivity is challenging in view of the greater structuraldiversity of B-cell epitopes especially where only a fractionof the epitope surface contributes to the epitope-paratopeinterface such that cross-reaction may occur with variationsin epitope structure beyond the interface Hence althoughthe analysis herein for continuous epitopes might be gener-alized to discontinuous epitopes with indexing of residues byrelative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignmentgaps correlating variations in epitope-residue structure withpotential for cross-reactivity would be nontrivial
The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by
TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR
At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history
4 Summary and Conclusions
In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data
More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
Advances in Bioinformatics 7
5 10 15 200Sequence length
0
5
10
15
20
Redu
ced
epito
pe co
unt
Figure 4 Reduced epitope counts (1) for peptidic sequences ofuniform length varying only at a single-residue position Each countcorresponds to a set of 20 peptides representing every standardproteinogenic amino-acid residue at the variable-residue positionFunctional similarity is equated with either fractional aligned-sequence identity (ldquo◻rdquo (3) and (4)) or the Shannon informationentropy for differential epitope binding ((5) through (9)) In thelatter case countswere based on steric incompatibility only (ldquo998810rdquo (7)and Figure 2) both steric incompatibility and cavity formation (ldquordquo119888Δ119881119894119895119896
in (8)) or steric incompatibility with both cavity formationand hydrogen bonding (ldquordquo (8) and (9) and Table 1)
and antigen sequences (ie values of the data fields namedldquo1st Immunogen Object Primary Molecule Sequencerdquo andldquoAntigen Object Primary Molecule Sequencerdquo resp) werecompared The record was considered for further processingonly if the said sequences were of equal length and differedat exactly one amino-acid residue position and the recordwas ultimately retained only if a corresponding record with aldquoQualitative Measurementrdquo data-field value of ldquopositiverdquo wasfound sharing the same immunogen and immunoassay pro-tocol except that the antigen was identical to the immunogen(thus confirming that the immunogen had elicited a relevantanti-peptide immune response in the first place)The epitopedata thus obtained were analyzed in light of the precedingSection 21
The broad searches were conducted to retrieve recordson linear peptidic epitopes in general Among the epitoperecords thus retrieved those wherein the data field namedldquoModificationrdquowas nonempty (indicating amino-acid residuecovalent modification) were excluded from further consider-ation and the remainder were analyzed as regards functionalredundancy according to the preceding Section 21
3 Results and Discussion
31 Functional Redundancy versus Sequence As depicted inFigure 4 computed functional redundancy increases with
epitope sequence length if functional similarity is equatedwith fractional aligned-sequence identity (ldquo◻rdquo) whereas itis independent of sequence length if functional similar-ity is equated with the Shannon information entropy fordifferential epitope binding (ldquo998810rdquo ldquordquo and ldquordquo) The lat-ter approach thus better accounts for the possibility of astructural difference in only a single chemical group beingsufficient to preclude antigenic cross-reaction provided thatthe sequence under consideration is entirely encompassedby the relevant epitope structure as structural differencesmay be functionally irrelevant if they lie outside the epitopeMoreover computed functional redundancy decreases (iethe reduced epitope count approaches the total epitopecount) when both steric incompatibility and the energeticpenalty of cavity formation are considered (ldquordquo) instead ofsteric incompatibility alone (ldquo998810rdquo)The redundancy decreaseseven further if the energetic penalty of unsatisfied hydrogenbonds (more generally representing suboptimal electrostaticcomplementarity) is also considered (ldquordquo)
32 Empirical Cross-Reactivity Data IEDB searches yielded26 records on epitope cross-reactivity vis-a-vis single-residuesubstitution (Table 2) all of which contain only qualitativerather than quantitative binding data without reference toatomic coordinates or other structural details of boundepitopes The data suggest that the posited steric incompat-ibility (in (7) and Figure 2) assumes too little tolerance forsuboptimal complementarity especially as regards volumedifferences between immunogen and antigen in Figure 5which depicts examples of cross-reaction despite substitu-tions increasing residue bulk (ie positive-data points to theleft of the diagonal) This might be at least partially correctedby relaxing or even eliminating the steric-incompatibilitycriteria possibly replacing them with a continuous-formshape-dependent energetic penalty (considering that theyassume an infinite-step hard-sphere model of steric clashesbetween atoms whereas potential energy may be more accu-rately represented by continuous functions of interatomicseparation distances) likewise electrostatic complementarityalso might be more accurately described for example interms of differential energetic contributions due to chargedand uncharged hydrogen-bonding partners However typicalepitope binding actuallymay be suboptimal considering ther-modynamic and kinetic constraints on affinity maturation inB-cell ontogeny and the typical absence of affinitymaturationin T-cell ontogeny which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity thanimmunogen
Notwithstanding the problem of suboptimal complemen-tarity described above data presented in Table 2 suggestthat the approach to epitope sequence redundancy proposedherein is more conceptually and practically meaningful thansetting threshold (ie cutoff) values for fractional aligned-sequence identity This is exemplified by the control-reactionepitopes of data rows 18 and 19 (having consensus sequenceLLXRDSFEV) These epitopes differ from each other at justone residue position As typical MHC class I-restricted T-cell epitopes they are nonapeptides for which the highestmeaningful sequence-identity threshold value is 89 (sim89)
8 Advances in Bioinformatics
Table2IEDBQualitative-Outcome(Q)d
atao
npeptidec
ross-reactivity
assays
(cfFigure
5)
Q
Assay
inform
ation
IEDB
Epito
peID
Substitution
Sequ
ence
context(with119883atpo
sitionof
substitution)
Ref
TypeM
HCcla
ssParameter
measured(cfIEDB
ldquomeasuremento
frdquodatafield)
IEDBID
Con
trolreaction
Cross-reactio
n1minus
B-cell
Antibod
y-antig
enbind
ing
1370146
1370150
10801
WY
DXE
DRY
YRE
[22]
2minus
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
854
6260
4PA
SYIX
SAEK
I[23]
3minus
T-cellII
Cell
proliferatio
n1646
734
1646
735
107384
ED
GEP
GIAGFK
GXQ
GPK
[24]
4minus
T-cellII
Cell
proliferatio
n1660246
1660247
46317
FA
NTW
TTCQ
SIAXP
SK[25]
5minus
T-cellII
Cytokine
release(IL-4)
1660252
1660258
112245
AF
NTW
TTCQ
SIAXP
SK[25]
6minus
T-cellII
Cytokine
release(IL-2)
166706
01667065
68846
KA
VHFF
XNIV
TPRT
P[26]
7minus
T-cellII
Cell
proliferatio
n1667112
1667113
80071
AK
VHFF
XNIV
TPRT
P[26]
8minus
T-cellII
Cell
proliferatio
n171660
41716614
125569
TV
SWEG
VGVXP
DV
[27]
9minus
T-cellII
Cell
proliferatio
n171660
41716616
125569
DN
SWEG
VGVTP
XV[27]
10minus
T-cellII
Cell
proliferatio
n1716605
1716618
125571
VT
SWEG
VGVXP
NV
[27]
11+
T-cellI
Cytotoxicity
1634784
1634785
103300
SA
IYXT
VAGSL
[28]
12+
T-cellI
Cytotoxicity
1634784
1634786
103300
GS
IYST
VAXS
L[28]
13+
T-cellI
Cytotoxicity
1634788
1634789
103298
SG
IYAT
VAXS
L[28]
14+
T-cellI
Cytotoxicity
1634788
1634790
103298
AS
IYXT
VASSL
[28]
15+
T-cellI
Cytotoxicity
1657362
1657364
111964
IDYX
FAFR
DL
[29]
16+
T-cellI
Cytokine
release(IFN-120574)
171644
2171644
3125101
ITXM
IKFN
RL[30]
17+
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
852
6260
4K
DSY
IPSA
EXI
[23]
18+
T-cellI
Cytokine
release(IFN-120574)
1381750
1381748
37174
DG
LLXR
DSFEV
[31]
19+
T-cellI
Cytokine
release(IFN-120574)
1381752
1381753
37392
HG
LLXR
DSFEV
[31]
20+
T-cellI
Cytotoxicity
1404705
1404714
61086
SI
SXIEFA
RL[32]
21+
T-cellI
Cytotoxicity
1404705
1404715
61086
AE
SSIEFX
RL[32]
22+
T-cellI
Cytotoxicity
1404705
1404716
61086
RK
SSIEFA
XL[32]
23+
T-cellI
Cytotoxicity
1404712
1404719
61088
EA
SSIEFX
RL[32]
24+
T-cellI
Cytotoxicity
1404713
1404721
61085
KR
SSIEFA
XL[32]
25+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865530
98995
TF
SAPD
XRPA
[33]
26+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865532
98995
AL
SAPD
TRPX
[33]
Advances in Bioinformatics 9
G
S
N
E
K
Y
W
1
2
3
4
5
6
8
12
13
17
18 19
20
21
22
23
2425
A
CD
V
HL
R
15
26
P
Q
M
F
11
7
14
16T
I
109
Ant
igen
resid
ue v
olum
e (Aring3 )
60
80
100
120
140
160
180
200
220
240
80 100 120 140 160 180 200 220 24060Immunogen residue volume (Aring3)
ReferenceNegative B-cellNegative T-cellI
Negative T-cellIIPositive T-cellI
Figure 5 Peptide cross-reactivity assay data of Table 2 vis-a-visamino-acid residue volume [21] of immunogen and antigen atsingle-residue substitution positions along sequence alignments
For longer epitopes (eg typical MHC class II-restrictedT-cell epitopes) the value is even higher For example inthe case of data rows 4 and 5 (having consensus sequenceNTWTTCQSIAXPSK) the said value is 1314 (sim93) Thethreshold values thus calculated are markedly higher thanthose used (eg 70 [20]) for conventional protein sequenceanalysis Moreover regardless of the exact sequence lengthsapplying such threshold values would entail discarding dataon epitopes differing by one residue (eg the examples justcited from Table 2) To avoid such loss of potentially valuableinformation all the available epitope data might be analyzedand a corresponding reduced epitope count reported toindicate the level of redundancy in the underlying dataset
33 Survey of Peptidic Epitope Sequence Data IEDB searchesyielded 11497 records on peptidic epitopes of length less than50 residues (ie the general cutoff provided in the IEDBcuration manual [39]) curated as covalently unmodifiedand comprising 4443 B-cell and 7054 T-cell epitope records(the latter divided between 2749 and 4305 records for MHCrestriction classes I and II resp) for which computedfunctional redundancy vis-a-vis sequence length is depictedin Figure 6
Considering the wide range of sequence counts for thevarious sequence lengths in Figure 6 a logarithmic scale waschosen to depict the data in a compact form Consequentlythe difference calculated as sequence count minus reduced countwas plotted instead of reduced count itself where sequencecount gt reduced count to avoid crowding of data points Forall positive sequence counts (ldquo◻rdquo) the said differences were
consistently higher where functional similarity was equatedwith fractional aligned-sequence identity (ldquotimesrdquo) rather thanthe Shannon information entropy of the differential epitopebinding (ldquordquo) In the latter case (ldquordquo) the differences wereconsistently less than 1 with reduced count close or evenequal to sequence countThese results are anticipated in viewof Figure 4 which shows that reduced count is independent ofsequence length using the entropy-based approach proposedherein but decreases with sequence length using fractionalaligned-sequence identity
The data in Figures 4 and 6 thus point to key considera-tions in evaluating functional redundancy Clearly functionalredundancy may be overestimated by simply equating func-tional similarity with fractional aligned-sequence identityespecially with increasing sequence length for highly similarsequences On the other hand functional redundancy maybe underestimated where functional similarity is equatedwith the Shannon information entropy of differential epitopebinding if the sequence under consideration extends beyondthe relevant epitope structure which becomes more likelywith increasing sequence length Hence the information-entropy approach as developed herein is conceivably appli-cable to sequences not exceeding a realistic application-dependent epitope length (eg six residues for antigenbinding by anti-peptide antibodies [40]) although rigorousgeneralization to longer sequences might be pursued byaligning subsequences of such length andpossibly accountingfor differential immunodominance among immunogen epi-topes [41] For example if a set of peptides were to be foundsuch that each of them comprised a single immunodominantepitope having fixed length (eg six residues in all cases) theinformation-entropy approach might be applied to the entireset even if the peptides were of variable length Moreovereven if the said immunodominant epitopes were highlysimilar to one another in terms of fractional aligned-sequenceidentity all the available data still could be included insubsequent analyses provided that both the total and reducedepitope counts would be reported in order to account forfunctional redundancy In this way useful epitope data couldbe retained instead of discarded
34 Applications and Future Directions The discussion thusfar suggests the potentially greater utility of antigenic func-tional similarity cast as the Shannon information entropy ofdifferential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redun-dancy of epitope data to express their structural diversity ina biologically meaningful manner (ie explicitly in terms ofrelative binding affinity which underlies the balance betweenthe inversely related emergent continuum phenomena ofimmunologic specificity and cross-reactivity) This is subjectto the caveat that assuming an idealized epitope-bindingsite of optimal steric and electrostatic complementarity (egcomparable to what is typically observed in a nativelyfolded wild-type protein) may both overestimate affinityfor an immunogen epitope and underestimate affinity forantigen epitopes structurally different from the immuno-gen epitope (thereby possibly overestimating immunologicspecificity as a consequence of underestimating potential
10 Advances in Bioinformatics
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(a)
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(b)
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
0001001
011
10100
1000
10 20 30 40 500Sequence length
(c)
Figure 6 Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c)Sequence counts (ldquo◻rdquo) are total epitope counts in IEDB Corresponding reduced counts (119903 in (1)) are expressed as the difference (sequencecount minus reduced count) with functional similarity equated with either fractional aligned-sequence identity (ldquotimesrdquo (3) and (4)) or the Shannoninformation entropy for differential epitope binding (ldquordquo (5) through (9)) Missing ldquo◻rdquo indicates sequence count = 0 (eg for B-cell epitopesequence length lt 4 in (a)) where ldquo◻rdquo is present and missing ldquotimesrdquo or ldquordquo indicates sequence count minus reduced count = 0 (eg with missingldquordquo for B-cell epitope sequence length = 5 in (a)) Both ldquotimesrdquo and ldquordquo are missing where sequence count = 1 (eg for B-cell epitope sequencelength = 43 in (a)) as reduced count = 1 for sequence count = 1
for cross-reactivity) These considerations could guide thedevelopment of immunization strategies (eg via vacci-nation) and immunodiagnostics (eg to produce peptidicconstructs suitable for eliciting anti-peptide antibodies thateither protect against disease or serve to detect particularantigens in biological samples via immunoassays) notingthat the practical significance of immunologic specificity andcross-reactivity is highly context-dependent
From the perspective of immunization immunity (eginduced via vaccination or passive transfer of immune-system components) may be useful if specifically directedagainst a pathogen epitope so as to favor pathogen elimi-nation without producing harmful hypersensitivity reactions(eg in the form of allergy or autoimmunity) Yet suchimmunity might be even more useful if it also favored elim-ination of multiple antigenically related variant pathogensvia cross-reaction (such that a single vaccine epitope elicitedthe production of immune-system components each cross-reactive with multiple variant pathogen epitopes) whilestill avoiding hypersensitivity Cross-reactivity of immuneresponses is thus potentially useful within limits defined byrisk of harmful hypersensitivity (eg due to cross-reactionwith host self-antigens) As regards immunodiagnosis animmunodiagnostic test may be useful if it enables specificpathogen detection via discrimination between a pathogen
epitope and a set of other biologically relevant epitopes (egof the host or its commensal symbionts in health or ofother pathogens for which the clinical implications are verydifferent) Yet the test might be even more useful if it alsoenabled detection of multiple antigenically related variantpathogens (such that a single immunologic probe detectedmultiple variant pathogen epitopes) while still retainingits discriminatory power with respect to other biologicallyrelevant epitopes particularly where the variant pathogensare very similar to one another in terms of their clinicalimplications (eg prognosis and therapeutic options)Hencecross-reactivity strongly influences the outcomes of bothimmunization and immunodiagnosis for which reason theirdevelopment could be supported by computational tools thatestimate epitope-binding affinity for cross-reactions
In order to estimate the epitope-binding affinity ofan immune-system component (eg antibody) elicited inresponse to (and thus having a binding site for) immunogenepitope 119894 for antigen epitope 119895 one possible approach wouldbe to express affinity in terms of association constants forexample as
119870119894119895= 119870119894119894119885119894119895 (10)
where 119870119894119895is the association constant for cross-reaction with
antigen epitope 119895 119870119894119894is the association constant for reaction
Advances in Bioinformatics 11
with immunogen epitope 119894 and 119885119894119895is a correction term
to account for the difference between the two associationconstants In turn119870
119894119894might be estimated as
119870119894119894= exp(minus
Δ119866119894119894
119877119879
) (11)
where Δ119866119894119894
is the free-energy change for reaction withimmunogen epitope 119894 while both 119877 and 119879 retain theirmeanings as in (7) In cases where epitopes 119894 and 119895 are both119898residues in length 119885
119894119895might be estimated as
119885119894119895=
119898
prod
119896=1
119904119894119895119896 if
119898
prod
119896=1
119904119894119895119896gt 119870minus1
119894119894
119870minus1
119894119894 otherwise
(12)
where the product retains its meaning as in (6) such that119870119894119895ge 1 (with 119870
119894119895= 1 for nonspecific binding) Δ119866
119894119894could be
estimated from epitope structure (eg using structural ener-getics [40ndash42]) subject to mechanism-specific constraints(eg on B-cell affinity maturation [43]) and 119904
119894119895119896could be
estimated along similar lines with regard to ΔΔ119866119894119895119896
in (7)The preceding discussion has alluded to epitope binding
as if it were a binary interaction which is strictly true onlyfor B-cell epitopes (eg bound by surface immunoglobulin orantibody) but in the context of T-cell epitopes the epitope-binding site is to be understood as a bipartite entity consistingof an MHC molecule and a T-cell receptor (TCR) suchthat the epitope becomes at least partly confined betweenthe two binding-site components The overall process of T-cell recognition is subject to thermodynamic and kineticconstraints on initial MHC-peptide binding [44] and subse-quent TCR recognition ofMHC-peptide complex [45] whichpotentially complicate prediction of T-cell cross-reactivityand functional consequences thereof (eg considering howbinding affinity and receptor numbers interact to influenceT-cell effector function [45]) Still predicting B-cell epitopecross-reactivity is challenging in view of the greater structuraldiversity of B-cell epitopes especially where only a fractionof the epitope surface contributes to the epitope-paratopeinterface such that cross-reaction may occur with variationsin epitope structure beyond the interface Hence althoughthe analysis herein for continuous epitopes might be gener-alized to discontinuous epitopes with indexing of residues byrelative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignmentgaps correlating variations in epitope-residue structure withpotential for cross-reactivity would be nontrivial
The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by
TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR
At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history
4 Summary and Conclusions
In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data
More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
8 Advances in Bioinformatics
Table2IEDBQualitative-Outcome(Q)d
atao
npeptidec
ross-reactivity
assays
(cfFigure
5)
Q
Assay
inform
ation
IEDB
Epito
peID
Substitution
Sequ
ence
context(with119883atpo
sitionof
substitution)
Ref
TypeM
HCcla
ssParameter
measured(cfIEDB
ldquomeasuremento
frdquodatafield)
IEDBID
Con
trolreaction
Cross-reactio
n1minus
B-cell
Antibod
y-antig
enbind
ing
1370146
1370150
10801
WY
DXE
DRY
YRE
[22]
2minus
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
854
6260
4PA
SYIX
SAEK
I[23]
3minus
T-cellII
Cell
proliferatio
n1646
734
1646
735
107384
ED
GEP
GIAGFK
GXQ
GPK
[24]
4minus
T-cellII
Cell
proliferatio
n1660246
1660247
46317
FA
NTW
TTCQ
SIAXP
SK[25]
5minus
T-cellII
Cytokine
release(IL-4)
1660252
1660258
112245
AF
NTW
TTCQ
SIAXP
SK[25]
6minus
T-cellII
Cytokine
release(IL-2)
166706
01667065
68846
KA
VHFF
XNIV
TPRT
P[26]
7minus
T-cellII
Cell
proliferatio
n1667112
1667113
80071
AK
VHFF
XNIV
TPRT
P[26]
8minus
T-cellII
Cell
proliferatio
n171660
41716614
125569
TV
SWEG
VGVXP
DV
[27]
9minus
T-cellII
Cell
proliferatio
n171660
41716616
125569
DN
SWEG
VGVTP
XV[27]
10minus
T-cellII
Cell
proliferatio
n1716605
1716618
125571
VT
SWEG
VGVXP
NV
[27]
11+
T-cellI
Cytotoxicity
1634784
1634785
103300
SA
IYXT
VAGSL
[28]
12+
T-cellI
Cytotoxicity
1634784
1634786
103300
GS
IYST
VAXS
L[28]
13+
T-cellI
Cytotoxicity
1634788
1634789
103298
SG
IYAT
VAXS
L[28]
14+
T-cellI
Cytotoxicity
1634788
1634790
103298
AS
IYXT
VASSL
[28]
15+
T-cellI
Cytotoxicity
1657362
1657364
111964
IDYX
FAFR
DL
[29]
16+
T-cellI
Cytokine
release(IFN-120574)
171644
2171644
3125101
ITXM
IKFN
RL[30]
17+
T-cellI
Cytokine
release(IFN-120574)
1340
848
1340
852
6260
4K
DSY
IPSA
EXI
[23]
18+
T-cellI
Cytokine
release(IFN-120574)
1381750
1381748
37174
DG
LLXR
DSFEV
[31]
19+
T-cellI
Cytokine
release(IFN-120574)
1381752
1381753
37392
HG
LLXR
DSFEV
[31]
20+
T-cellI
Cytotoxicity
1404705
1404714
61086
SI
SXIEFA
RL[32]
21+
T-cellI
Cytotoxicity
1404705
1404715
61086
AE
SSIEFX
RL[32]
22+
T-cellI
Cytotoxicity
1404705
1404716
61086
RK
SSIEFA
XL[32]
23+
T-cellI
Cytotoxicity
1404712
1404719
61088
EA
SSIEFX
RL[32]
24+
T-cellI
Cytotoxicity
1404713
1404721
61085
KR
SSIEFA
XL[32]
25+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865530
98995
TF
SAPD
XRPA
[33]
26+
T-cellI
Cytokine
release(IFN-120574)
1865529
1865532
98995
AL
SAPD
TRPX
[33]
Advances in Bioinformatics 9
G
S
N
E
K
Y
W
1
2
3
4
5
6
8
12
13
17
18 19
20
21
22
23
2425
A
CD
V
HL
R
15
26
P
Q
M
F
11
7
14
16T
I
109
Ant
igen
resid
ue v
olum
e (Aring3 )
60
80
100
120
140
160
180
200
220
240
80 100 120 140 160 180 200 220 24060Immunogen residue volume (Aring3)
ReferenceNegative B-cellNegative T-cellI
Negative T-cellIIPositive T-cellI
Figure 5 Peptide cross-reactivity assay data of Table 2 vis-a-visamino-acid residue volume [21] of immunogen and antigen atsingle-residue substitution positions along sequence alignments
For longer epitopes (eg typical MHC class II-restrictedT-cell epitopes) the value is even higher For example inthe case of data rows 4 and 5 (having consensus sequenceNTWTTCQSIAXPSK) the said value is 1314 (sim93) Thethreshold values thus calculated are markedly higher thanthose used (eg 70 [20]) for conventional protein sequenceanalysis Moreover regardless of the exact sequence lengthsapplying such threshold values would entail discarding dataon epitopes differing by one residue (eg the examples justcited from Table 2) To avoid such loss of potentially valuableinformation all the available epitope data might be analyzedand a corresponding reduced epitope count reported toindicate the level of redundancy in the underlying dataset
33 Survey of Peptidic Epitope Sequence Data IEDB searchesyielded 11497 records on peptidic epitopes of length less than50 residues (ie the general cutoff provided in the IEDBcuration manual [39]) curated as covalently unmodifiedand comprising 4443 B-cell and 7054 T-cell epitope records(the latter divided between 2749 and 4305 records for MHCrestriction classes I and II resp) for which computedfunctional redundancy vis-a-vis sequence length is depictedin Figure 6
Considering the wide range of sequence counts for thevarious sequence lengths in Figure 6 a logarithmic scale waschosen to depict the data in a compact form Consequentlythe difference calculated as sequence count minus reduced countwas plotted instead of reduced count itself where sequencecount gt reduced count to avoid crowding of data points Forall positive sequence counts (ldquo◻rdquo) the said differences were
consistently higher where functional similarity was equatedwith fractional aligned-sequence identity (ldquotimesrdquo) rather thanthe Shannon information entropy of the differential epitopebinding (ldquordquo) In the latter case (ldquordquo) the differences wereconsistently less than 1 with reduced count close or evenequal to sequence countThese results are anticipated in viewof Figure 4 which shows that reduced count is independent ofsequence length using the entropy-based approach proposedherein but decreases with sequence length using fractionalaligned-sequence identity
The data in Figures 4 and 6 thus point to key considera-tions in evaluating functional redundancy Clearly functionalredundancy may be overestimated by simply equating func-tional similarity with fractional aligned-sequence identityespecially with increasing sequence length for highly similarsequences On the other hand functional redundancy maybe underestimated where functional similarity is equatedwith the Shannon information entropy of differential epitopebinding if the sequence under consideration extends beyondthe relevant epitope structure which becomes more likelywith increasing sequence length Hence the information-entropy approach as developed herein is conceivably appli-cable to sequences not exceeding a realistic application-dependent epitope length (eg six residues for antigenbinding by anti-peptide antibodies [40]) although rigorousgeneralization to longer sequences might be pursued byaligning subsequences of such length andpossibly accountingfor differential immunodominance among immunogen epi-topes [41] For example if a set of peptides were to be foundsuch that each of them comprised a single immunodominantepitope having fixed length (eg six residues in all cases) theinformation-entropy approach might be applied to the entireset even if the peptides were of variable length Moreovereven if the said immunodominant epitopes were highlysimilar to one another in terms of fractional aligned-sequenceidentity all the available data still could be included insubsequent analyses provided that both the total and reducedepitope counts would be reported in order to account forfunctional redundancy In this way useful epitope data couldbe retained instead of discarded
34 Applications and Future Directions The discussion thusfar suggests the potentially greater utility of antigenic func-tional similarity cast as the Shannon information entropy ofdifferential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redun-dancy of epitope data to express their structural diversity ina biologically meaningful manner (ie explicitly in terms ofrelative binding affinity which underlies the balance betweenthe inversely related emergent continuum phenomena ofimmunologic specificity and cross-reactivity) This is subjectto the caveat that assuming an idealized epitope-bindingsite of optimal steric and electrostatic complementarity (egcomparable to what is typically observed in a nativelyfolded wild-type protein) may both overestimate affinityfor an immunogen epitope and underestimate affinity forantigen epitopes structurally different from the immuno-gen epitope (thereby possibly overestimating immunologicspecificity as a consequence of underestimating potential
10 Advances in Bioinformatics
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(a)
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(b)
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
0001001
011
10100
1000
10 20 30 40 500Sequence length
(c)
Figure 6 Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c)Sequence counts (ldquo◻rdquo) are total epitope counts in IEDB Corresponding reduced counts (119903 in (1)) are expressed as the difference (sequencecount minus reduced count) with functional similarity equated with either fractional aligned-sequence identity (ldquotimesrdquo (3) and (4)) or the Shannoninformation entropy for differential epitope binding (ldquordquo (5) through (9)) Missing ldquo◻rdquo indicates sequence count = 0 (eg for B-cell epitopesequence length lt 4 in (a)) where ldquo◻rdquo is present and missing ldquotimesrdquo or ldquordquo indicates sequence count minus reduced count = 0 (eg with missingldquordquo for B-cell epitope sequence length = 5 in (a)) Both ldquotimesrdquo and ldquordquo are missing where sequence count = 1 (eg for B-cell epitope sequencelength = 43 in (a)) as reduced count = 1 for sequence count = 1
for cross-reactivity) These considerations could guide thedevelopment of immunization strategies (eg via vacci-nation) and immunodiagnostics (eg to produce peptidicconstructs suitable for eliciting anti-peptide antibodies thateither protect against disease or serve to detect particularantigens in biological samples via immunoassays) notingthat the practical significance of immunologic specificity andcross-reactivity is highly context-dependent
From the perspective of immunization immunity (eginduced via vaccination or passive transfer of immune-system components) may be useful if specifically directedagainst a pathogen epitope so as to favor pathogen elimi-nation without producing harmful hypersensitivity reactions(eg in the form of allergy or autoimmunity) Yet suchimmunity might be even more useful if it also favored elim-ination of multiple antigenically related variant pathogensvia cross-reaction (such that a single vaccine epitope elicitedthe production of immune-system components each cross-reactive with multiple variant pathogen epitopes) whilestill avoiding hypersensitivity Cross-reactivity of immuneresponses is thus potentially useful within limits defined byrisk of harmful hypersensitivity (eg due to cross-reactionwith host self-antigens) As regards immunodiagnosis animmunodiagnostic test may be useful if it enables specificpathogen detection via discrimination between a pathogen
epitope and a set of other biologically relevant epitopes (egof the host or its commensal symbionts in health or ofother pathogens for which the clinical implications are verydifferent) Yet the test might be even more useful if it alsoenabled detection of multiple antigenically related variantpathogens (such that a single immunologic probe detectedmultiple variant pathogen epitopes) while still retainingits discriminatory power with respect to other biologicallyrelevant epitopes particularly where the variant pathogensare very similar to one another in terms of their clinicalimplications (eg prognosis and therapeutic options)Hencecross-reactivity strongly influences the outcomes of bothimmunization and immunodiagnosis for which reason theirdevelopment could be supported by computational tools thatestimate epitope-binding affinity for cross-reactions
In order to estimate the epitope-binding affinity ofan immune-system component (eg antibody) elicited inresponse to (and thus having a binding site for) immunogenepitope 119894 for antigen epitope 119895 one possible approach wouldbe to express affinity in terms of association constants forexample as
119870119894119895= 119870119894119894119885119894119895 (10)
where 119870119894119895is the association constant for cross-reaction with
antigen epitope 119895 119870119894119894is the association constant for reaction
Advances in Bioinformatics 11
with immunogen epitope 119894 and 119885119894119895is a correction term
to account for the difference between the two associationconstants In turn119870
119894119894might be estimated as
119870119894119894= exp(minus
Δ119866119894119894
119877119879
) (11)
where Δ119866119894119894
is the free-energy change for reaction withimmunogen epitope 119894 while both 119877 and 119879 retain theirmeanings as in (7) In cases where epitopes 119894 and 119895 are both119898residues in length 119885
119894119895might be estimated as
119885119894119895=
119898
prod
119896=1
119904119894119895119896 if
119898
prod
119896=1
119904119894119895119896gt 119870minus1
119894119894
119870minus1
119894119894 otherwise
(12)
where the product retains its meaning as in (6) such that119870119894119895ge 1 (with 119870
119894119895= 1 for nonspecific binding) Δ119866
119894119894could be
estimated from epitope structure (eg using structural ener-getics [40ndash42]) subject to mechanism-specific constraints(eg on B-cell affinity maturation [43]) and 119904
119894119895119896could be
estimated along similar lines with regard to ΔΔ119866119894119895119896
in (7)The preceding discussion has alluded to epitope binding
as if it were a binary interaction which is strictly true onlyfor B-cell epitopes (eg bound by surface immunoglobulin orantibody) but in the context of T-cell epitopes the epitope-binding site is to be understood as a bipartite entity consistingof an MHC molecule and a T-cell receptor (TCR) suchthat the epitope becomes at least partly confined betweenthe two binding-site components The overall process of T-cell recognition is subject to thermodynamic and kineticconstraints on initial MHC-peptide binding [44] and subse-quent TCR recognition ofMHC-peptide complex [45] whichpotentially complicate prediction of T-cell cross-reactivityand functional consequences thereof (eg considering howbinding affinity and receptor numbers interact to influenceT-cell effector function [45]) Still predicting B-cell epitopecross-reactivity is challenging in view of the greater structuraldiversity of B-cell epitopes especially where only a fractionof the epitope surface contributes to the epitope-paratopeinterface such that cross-reaction may occur with variationsin epitope structure beyond the interface Hence althoughthe analysis herein for continuous epitopes might be gener-alized to discontinuous epitopes with indexing of residues byrelative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignmentgaps correlating variations in epitope-residue structure withpotential for cross-reactivity would be nontrivial
The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by
TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR
At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history
4 Summary and Conclusions
In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data
More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
Advances in Bioinformatics 9
G
S
N
E
K
Y
W
1
2
3
4
5
6
8
12
13
17
18 19
20
21
22
23
2425
A
CD
V
HL
R
15
26
P
Q
M
F
11
7
14
16T
I
109
Ant
igen
resid
ue v
olum
e (Aring3 )
60
80
100
120
140
160
180
200
220
240
80 100 120 140 160 180 200 220 24060Immunogen residue volume (Aring3)
ReferenceNegative B-cellNegative T-cellI
Negative T-cellIIPositive T-cellI
Figure 5 Peptide cross-reactivity assay data of Table 2 vis-a-visamino-acid residue volume [21] of immunogen and antigen atsingle-residue substitution positions along sequence alignments
For longer epitopes (eg typical MHC class II-restrictedT-cell epitopes) the value is even higher For example inthe case of data rows 4 and 5 (having consensus sequenceNTWTTCQSIAXPSK) the said value is 1314 (sim93) Thethreshold values thus calculated are markedly higher thanthose used (eg 70 [20]) for conventional protein sequenceanalysis Moreover regardless of the exact sequence lengthsapplying such threshold values would entail discarding dataon epitopes differing by one residue (eg the examples justcited from Table 2) To avoid such loss of potentially valuableinformation all the available epitope data might be analyzedand a corresponding reduced epitope count reported toindicate the level of redundancy in the underlying dataset
33 Survey of Peptidic Epitope Sequence Data IEDB searchesyielded 11497 records on peptidic epitopes of length less than50 residues (ie the general cutoff provided in the IEDBcuration manual [39]) curated as covalently unmodifiedand comprising 4443 B-cell and 7054 T-cell epitope records(the latter divided between 2749 and 4305 records for MHCrestriction classes I and II resp) for which computedfunctional redundancy vis-a-vis sequence length is depictedin Figure 6
Considering the wide range of sequence counts for thevarious sequence lengths in Figure 6 a logarithmic scale waschosen to depict the data in a compact form Consequentlythe difference calculated as sequence count minus reduced countwas plotted instead of reduced count itself where sequencecount gt reduced count to avoid crowding of data points Forall positive sequence counts (ldquo◻rdquo) the said differences were
consistently higher where functional similarity was equatedwith fractional aligned-sequence identity (ldquotimesrdquo) rather thanthe Shannon information entropy of the differential epitopebinding (ldquordquo) In the latter case (ldquordquo) the differences wereconsistently less than 1 with reduced count close or evenequal to sequence countThese results are anticipated in viewof Figure 4 which shows that reduced count is independent ofsequence length using the entropy-based approach proposedherein but decreases with sequence length using fractionalaligned-sequence identity
The data in Figures 4 and 6 thus point to key considera-tions in evaluating functional redundancy Clearly functionalredundancy may be overestimated by simply equating func-tional similarity with fractional aligned-sequence identityespecially with increasing sequence length for highly similarsequences On the other hand functional redundancy maybe underestimated where functional similarity is equatedwith the Shannon information entropy of differential epitopebinding if the sequence under consideration extends beyondthe relevant epitope structure which becomes more likelywith increasing sequence length Hence the information-entropy approach as developed herein is conceivably appli-cable to sequences not exceeding a realistic application-dependent epitope length (eg six residues for antigenbinding by anti-peptide antibodies [40]) although rigorousgeneralization to longer sequences might be pursued byaligning subsequences of such length andpossibly accountingfor differential immunodominance among immunogen epi-topes [41] For example if a set of peptides were to be foundsuch that each of them comprised a single immunodominantepitope having fixed length (eg six residues in all cases) theinformation-entropy approach might be applied to the entireset even if the peptides were of variable length Moreovereven if the said immunodominant epitopes were highlysimilar to one another in terms of fractional aligned-sequenceidentity all the available data still could be included insubsequent analyses provided that both the total and reducedepitope counts would be reported in order to account forfunctional redundancy In this way useful epitope data couldbe retained instead of discarded
34 Applications and Future Directions The discussion thusfar suggests the potentially greater utility of antigenic func-tional similarity cast as the Shannon information entropy ofdifferential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redun-dancy of epitope data to express their structural diversity ina biologically meaningful manner (ie explicitly in terms ofrelative binding affinity which underlies the balance betweenthe inversely related emergent continuum phenomena ofimmunologic specificity and cross-reactivity) This is subjectto the caveat that assuming an idealized epitope-bindingsite of optimal steric and electrostatic complementarity (egcomparable to what is typically observed in a nativelyfolded wild-type protein) may both overestimate affinityfor an immunogen epitope and underestimate affinity forantigen epitopes structurally different from the immuno-gen epitope (thereby possibly overestimating immunologicspecificity as a consequence of underestimating potential
10 Advances in Bioinformatics
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(a)
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(b)
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
0001001
011
10100
1000
10 20 30 40 500Sequence length
(c)
Figure 6 Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c)Sequence counts (ldquo◻rdquo) are total epitope counts in IEDB Corresponding reduced counts (119903 in (1)) are expressed as the difference (sequencecount minus reduced count) with functional similarity equated with either fractional aligned-sequence identity (ldquotimesrdquo (3) and (4)) or the Shannoninformation entropy for differential epitope binding (ldquordquo (5) through (9)) Missing ldquo◻rdquo indicates sequence count = 0 (eg for B-cell epitopesequence length lt 4 in (a)) where ldquo◻rdquo is present and missing ldquotimesrdquo or ldquordquo indicates sequence count minus reduced count = 0 (eg with missingldquordquo for B-cell epitope sequence length = 5 in (a)) Both ldquotimesrdquo and ldquordquo are missing where sequence count = 1 (eg for B-cell epitope sequencelength = 43 in (a)) as reduced count = 1 for sequence count = 1
for cross-reactivity) These considerations could guide thedevelopment of immunization strategies (eg via vacci-nation) and immunodiagnostics (eg to produce peptidicconstructs suitable for eliciting anti-peptide antibodies thateither protect against disease or serve to detect particularantigens in biological samples via immunoassays) notingthat the practical significance of immunologic specificity andcross-reactivity is highly context-dependent
From the perspective of immunization immunity (eginduced via vaccination or passive transfer of immune-system components) may be useful if specifically directedagainst a pathogen epitope so as to favor pathogen elimi-nation without producing harmful hypersensitivity reactions(eg in the form of allergy or autoimmunity) Yet suchimmunity might be even more useful if it also favored elim-ination of multiple antigenically related variant pathogensvia cross-reaction (such that a single vaccine epitope elicitedthe production of immune-system components each cross-reactive with multiple variant pathogen epitopes) whilestill avoiding hypersensitivity Cross-reactivity of immuneresponses is thus potentially useful within limits defined byrisk of harmful hypersensitivity (eg due to cross-reactionwith host self-antigens) As regards immunodiagnosis animmunodiagnostic test may be useful if it enables specificpathogen detection via discrimination between a pathogen
epitope and a set of other biologically relevant epitopes (egof the host or its commensal symbionts in health or ofother pathogens for which the clinical implications are verydifferent) Yet the test might be even more useful if it alsoenabled detection of multiple antigenically related variantpathogens (such that a single immunologic probe detectedmultiple variant pathogen epitopes) while still retainingits discriminatory power with respect to other biologicallyrelevant epitopes particularly where the variant pathogensare very similar to one another in terms of their clinicalimplications (eg prognosis and therapeutic options)Hencecross-reactivity strongly influences the outcomes of bothimmunization and immunodiagnosis for which reason theirdevelopment could be supported by computational tools thatestimate epitope-binding affinity for cross-reactions
In order to estimate the epitope-binding affinity ofan immune-system component (eg antibody) elicited inresponse to (and thus having a binding site for) immunogenepitope 119894 for antigen epitope 119895 one possible approach wouldbe to express affinity in terms of association constants forexample as
119870119894119895= 119870119894119894119885119894119895 (10)
where 119870119894119895is the association constant for cross-reaction with
antigen epitope 119895 119870119894119894is the association constant for reaction
Advances in Bioinformatics 11
with immunogen epitope 119894 and 119885119894119895is a correction term
to account for the difference between the two associationconstants In turn119870
119894119894might be estimated as
119870119894119894= exp(minus
Δ119866119894119894
119877119879
) (11)
where Δ119866119894119894
is the free-energy change for reaction withimmunogen epitope 119894 while both 119877 and 119879 retain theirmeanings as in (7) In cases where epitopes 119894 and 119895 are both119898residues in length 119885
119894119895might be estimated as
119885119894119895=
119898
prod
119896=1
119904119894119895119896 if
119898
prod
119896=1
119904119894119895119896gt 119870minus1
119894119894
119870minus1
119894119894 otherwise
(12)
where the product retains its meaning as in (6) such that119870119894119895ge 1 (with 119870
119894119895= 1 for nonspecific binding) Δ119866
119894119894could be
estimated from epitope structure (eg using structural ener-getics [40ndash42]) subject to mechanism-specific constraints(eg on B-cell affinity maturation [43]) and 119904
119894119895119896could be
estimated along similar lines with regard to ΔΔ119866119894119895119896
in (7)The preceding discussion has alluded to epitope binding
as if it were a binary interaction which is strictly true onlyfor B-cell epitopes (eg bound by surface immunoglobulin orantibody) but in the context of T-cell epitopes the epitope-binding site is to be understood as a bipartite entity consistingof an MHC molecule and a T-cell receptor (TCR) suchthat the epitope becomes at least partly confined betweenthe two binding-site components The overall process of T-cell recognition is subject to thermodynamic and kineticconstraints on initial MHC-peptide binding [44] and subse-quent TCR recognition ofMHC-peptide complex [45] whichpotentially complicate prediction of T-cell cross-reactivityand functional consequences thereof (eg considering howbinding affinity and receptor numbers interact to influenceT-cell effector function [45]) Still predicting B-cell epitopecross-reactivity is challenging in view of the greater structuraldiversity of B-cell epitopes especially where only a fractionof the epitope surface contributes to the epitope-paratopeinterface such that cross-reaction may occur with variationsin epitope structure beyond the interface Hence althoughthe analysis herein for continuous epitopes might be gener-alized to discontinuous epitopes with indexing of residues byrelative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignmentgaps correlating variations in epitope-residue structure withpotential for cross-reactivity would be nontrivial
The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by
TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR
At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history
4 Summary and Conclusions
In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data
More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
10 Advances in Bioinformatics
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(a)
0001001
011
10100
1000
10 20 30 40 500Sequence length
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
(b)
Sequ
ence
coun
t or
sequ
ence
coun
tminusre
duce
dco
unt
0001001
011
10100
1000
10 20 30 40 500Sequence length
(c)
Figure 6 Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c)Sequence counts (ldquo◻rdquo) are total epitope counts in IEDB Corresponding reduced counts (119903 in (1)) are expressed as the difference (sequencecount minus reduced count) with functional similarity equated with either fractional aligned-sequence identity (ldquotimesrdquo (3) and (4)) or the Shannoninformation entropy for differential epitope binding (ldquordquo (5) through (9)) Missing ldquo◻rdquo indicates sequence count = 0 (eg for B-cell epitopesequence length lt 4 in (a)) where ldquo◻rdquo is present and missing ldquotimesrdquo or ldquordquo indicates sequence count minus reduced count = 0 (eg with missingldquordquo for B-cell epitope sequence length = 5 in (a)) Both ldquotimesrdquo and ldquordquo are missing where sequence count = 1 (eg for B-cell epitope sequencelength = 43 in (a)) as reduced count = 1 for sequence count = 1
for cross-reactivity) These considerations could guide thedevelopment of immunization strategies (eg via vacci-nation) and immunodiagnostics (eg to produce peptidicconstructs suitable for eliciting anti-peptide antibodies thateither protect against disease or serve to detect particularantigens in biological samples via immunoassays) notingthat the practical significance of immunologic specificity andcross-reactivity is highly context-dependent
From the perspective of immunization immunity (eginduced via vaccination or passive transfer of immune-system components) may be useful if specifically directedagainst a pathogen epitope so as to favor pathogen elimi-nation without producing harmful hypersensitivity reactions(eg in the form of allergy or autoimmunity) Yet suchimmunity might be even more useful if it also favored elim-ination of multiple antigenically related variant pathogensvia cross-reaction (such that a single vaccine epitope elicitedthe production of immune-system components each cross-reactive with multiple variant pathogen epitopes) whilestill avoiding hypersensitivity Cross-reactivity of immuneresponses is thus potentially useful within limits defined byrisk of harmful hypersensitivity (eg due to cross-reactionwith host self-antigens) As regards immunodiagnosis animmunodiagnostic test may be useful if it enables specificpathogen detection via discrimination between a pathogen
epitope and a set of other biologically relevant epitopes (egof the host or its commensal symbionts in health or ofother pathogens for which the clinical implications are verydifferent) Yet the test might be even more useful if it alsoenabled detection of multiple antigenically related variantpathogens (such that a single immunologic probe detectedmultiple variant pathogen epitopes) while still retainingits discriminatory power with respect to other biologicallyrelevant epitopes particularly where the variant pathogensare very similar to one another in terms of their clinicalimplications (eg prognosis and therapeutic options)Hencecross-reactivity strongly influences the outcomes of bothimmunization and immunodiagnosis for which reason theirdevelopment could be supported by computational tools thatestimate epitope-binding affinity for cross-reactions
In order to estimate the epitope-binding affinity ofan immune-system component (eg antibody) elicited inresponse to (and thus having a binding site for) immunogenepitope 119894 for antigen epitope 119895 one possible approach wouldbe to express affinity in terms of association constants forexample as
119870119894119895= 119870119894119894119885119894119895 (10)
where 119870119894119895is the association constant for cross-reaction with
antigen epitope 119895 119870119894119894is the association constant for reaction
Advances in Bioinformatics 11
with immunogen epitope 119894 and 119885119894119895is a correction term
to account for the difference between the two associationconstants In turn119870
119894119894might be estimated as
119870119894119894= exp(minus
Δ119866119894119894
119877119879
) (11)
where Δ119866119894119894
is the free-energy change for reaction withimmunogen epitope 119894 while both 119877 and 119879 retain theirmeanings as in (7) In cases where epitopes 119894 and 119895 are both119898residues in length 119885
119894119895might be estimated as
119885119894119895=
119898
prod
119896=1
119904119894119895119896 if
119898
prod
119896=1
119904119894119895119896gt 119870minus1
119894119894
119870minus1
119894119894 otherwise
(12)
where the product retains its meaning as in (6) such that119870119894119895ge 1 (with 119870
119894119895= 1 for nonspecific binding) Δ119866
119894119894could be
estimated from epitope structure (eg using structural ener-getics [40ndash42]) subject to mechanism-specific constraints(eg on B-cell affinity maturation [43]) and 119904
119894119895119896could be
estimated along similar lines with regard to ΔΔ119866119894119895119896
in (7)The preceding discussion has alluded to epitope binding
as if it were a binary interaction which is strictly true onlyfor B-cell epitopes (eg bound by surface immunoglobulin orantibody) but in the context of T-cell epitopes the epitope-binding site is to be understood as a bipartite entity consistingof an MHC molecule and a T-cell receptor (TCR) suchthat the epitope becomes at least partly confined betweenthe two binding-site components The overall process of T-cell recognition is subject to thermodynamic and kineticconstraints on initial MHC-peptide binding [44] and subse-quent TCR recognition ofMHC-peptide complex [45] whichpotentially complicate prediction of T-cell cross-reactivityand functional consequences thereof (eg considering howbinding affinity and receptor numbers interact to influenceT-cell effector function [45]) Still predicting B-cell epitopecross-reactivity is challenging in view of the greater structuraldiversity of B-cell epitopes especially where only a fractionof the epitope surface contributes to the epitope-paratopeinterface such that cross-reaction may occur with variationsin epitope structure beyond the interface Hence althoughthe analysis herein for continuous epitopes might be gener-alized to discontinuous epitopes with indexing of residues byrelative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignmentgaps correlating variations in epitope-residue structure withpotential for cross-reactivity would be nontrivial
The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by
TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR
At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history
4 Summary and Conclusions
In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data
More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
Advances in Bioinformatics 11
with immunogen epitope 119894 and 119885119894119895is a correction term
to account for the difference between the two associationconstants In turn119870
119894119894might be estimated as
119870119894119894= exp(minus
Δ119866119894119894
119877119879
) (11)
where Δ119866119894119894
is the free-energy change for reaction withimmunogen epitope 119894 while both 119877 and 119879 retain theirmeanings as in (7) In cases where epitopes 119894 and 119895 are both119898residues in length 119885
119894119895might be estimated as
119885119894119895=
119898
prod
119896=1
119904119894119895119896 if
119898
prod
119896=1
119904119894119895119896gt 119870minus1
119894119894
119870minus1
119894119894 otherwise
(12)
where the product retains its meaning as in (6) such that119870119894119895ge 1 (with 119870
119894119895= 1 for nonspecific binding) Δ119866
119894119894could be
estimated from epitope structure (eg using structural ener-getics [40ndash42]) subject to mechanism-specific constraints(eg on B-cell affinity maturation [43]) and 119904
119894119895119896could be
estimated along similar lines with regard to ΔΔ119866119894119895119896
in (7)The preceding discussion has alluded to epitope binding
as if it were a binary interaction which is strictly true onlyfor B-cell epitopes (eg bound by surface immunoglobulin orantibody) but in the context of T-cell epitopes the epitope-binding site is to be understood as a bipartite entity consistingof an MHC molecule and a T-cell receptor (TCR) suchthat the epitope becomes at least partly confined betweenthe two binding-site components The overall process of T-cell recognition is subject to thermodynamic and kineticconstraints on initial MHC-peptide binding [44] and subse-quent TCR recognition ofMHC-peptide complex [45] whichpotentially complicate prediction of T-cell cross-reactivityand functional consequences thereof (eg considering howbinding affinity and receptor numbers interact to influenceT-cell effector function [45]) Still predicting B-cell epitopecross-reactivity is challenging in view of the greater structuraldiversity of B-cell epitopes especially where only a fractionof the epitope surface contributes to the epitope-paratopeinterface such that cross-reaction may occur with variationsin epitope structure beyond the interface Hence althoughthe analysis herein for continuous epitopes might be gener-alized to discontinuous epitopes with indexing of residues byrelative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignmentgaps correlating variations in epitope-residue structure withpotential for cross-reactivity would be nontrivial
The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by
TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR
At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history
4 Summary and Conclusions
In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data
More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
12 Advances in Bioinformatics
in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset
In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)
Competing Interests
The author declares that they have no competing interests
Acknowledgments
This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author
References
[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013
[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014
[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014
[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009
[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014
[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014
[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015
[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010
[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014
[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014
[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992
[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997
[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003
[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987
[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011
[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999
[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015
[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996
[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005
[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012
[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994
[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007
[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
Advances in Bioinformatics 13
[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992
[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997
[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997
[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994
[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000
[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995
[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010
[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001
[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998
[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006
[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948
[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957
[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957
[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992
[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995
[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015
[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006
[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010
[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000
[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012
[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011
[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology