

Research Article

Expressing Redundancy among Linear-Epitope Sequence Data Based on Residue-Level Physicochemical Similarity in the Context of Antigenic Cross-Reaction

Salvador Eugenio C. Caoili

Department of Biochemistry and Molecular Biology, College of Medicine, University of the Philippines Manila, Room 101, Medical Annex Building, 547 Pedro Gil Street, Ermita, 1000 Manila, Philippines

Correspondence should be addressed to Salvador Eugenio C. Caoili; badong@post.upm.edu.ph

Received 27 October 2015; Revised 29 March 2016; Accepted 10 April 2016

Academic Editor: Gilbert Deleage

Copyright © 2016 Salvador Eugenio C. Caoili. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hindawi Publishing Corporation, Advances in Bioinformatics, Volume 2016, Article ID 1276594, 13 pages, http://dx.doi.org/10.1155/2016/1276594

Epitope-based design of vaccines, immunotherapeutics, and immunodiagnostics is complicated by structural changes that radically alter immunological outcomes. This is obscured by expressing redundancy among linear-epitope data as fractional sequence-alignment identity, which fails to account for potentially drastic loss of binding affinity due to single-residue substitutions even where these might be considered conservative in the context of classical sequence analysis. From the perspective of immune function based on molecular recognition of epitopes, functional redundancy of epitope data (FRED) thus may be defined in a biologically more meaningful way based on residue-level physicochemical similarity in the context of antigenic cross-reaction, with functional similarity between epitopes expressed as the Shannon information entropy for differential epitope binding. Such similarity may be estimated in terms of structural differences between an immunogen epitope and an antigen epitope with reference to an idealized binding site of high complementarity to the immunogen epitope, by analogy between protein folding and ligand-receptor binding; but this underestimates potential for cross-reactivity, suggesting that epitope-binding site complementarity is typically suboptimal as regards immunologic specificity. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes.

1. Introduction

Immunological targeting of antigens exemplified by pathogen virulence factors, allergens, and even typical drugs is fundamental to the solution of global-health problems including both infectious and noninfectious diseases [1–3]. This entails molecular recognition of antigens by immune-system components (e.g., antibodies and T-cell receptors), which occurs via binding of epitopes (i.e., the recognized submolecular structural features of antigens) [4, 5]. Epitope prediction (i.e., computational identification of epitopes among biomolecules such as proteins) aims to enable selective incorporation of particular epitopes (e.g., actual targets of protective immune responses rather than disease-enhancing immunological decoys) into antigenic constructs (e.g., synthetic peptides) for novel vaccines, immunotherapeutics, and immunodiagnostics [6]. However, this is complicated by the limited accuracy of existing tools for epitope prediction [7]. The present work thus explores the crucial yet largely neglected issue of epitope-data redundancy as a key consideration in epitope-prediction tool development.

Progressive development of epitope-prediction tools requires empirical epitope data for both training (e.g., in the context of machine learning) and benchmarking [8–10]. Hence, epitope-data redundancy is a major concern, especially where data-driven statistical and machine-learning methods are employed to develop tools for epitope prediction and also related applications (e.g., MHC binding prediction), as studies might yield biased results due to overrepresentation of similar epitopes. Similarity among epitopes is often expressed as sequence similarity, particularly for linear peptidic epitopes, which include continuous B-cell epitopes (each consisting of a single unbroken epitope-residue sequence, in contrast to discontinuous epitopes wherein epitope residues are separated in the sequence by intervening residues) and typical T-cell epitopes (each bound by a MHC molecule for presentation to T-cells). For these, classical heuristic approaches to decrease the redundancy entail setting a similarity threshold (typically expressed as a fraction of identical residues for a pair of aligned sequences) such that subsequent analyses may compensate accordingly (e.g., by excluding sequences sharing a degree of similarity above the threshold). This practice is well-established for general-purpose protein structural analyses [11–13] but potentially problematic if applied to peptidic epitopes, in view of the nonlinear relationship between sequence similarity and antigenic similarity (e.g., as demonstrated by radically divergent antigenic properties arising from a structural difference of only a single chemical group [14]). This suggests the need for a more functionally meaningful alternative approach to expressing redundancy of epitope data.

Protein folding and binding [15] may be regarded as manifestations of the same underlying phenomenon driven by the hydrophobic effect, favoring burial of nonpolar surfaces in general away from solvent water, albeit with more selective burial of polar surfaces that favors complementary pairing between hydrogen-bond donors and acceptors. Residues that thus become completely buried (e.g., within the core of a folded protein or at the binding interface of a ligand-receptor complex) are sterically and electrostatically constrained by surrounding residues to a much greater extent than unfolded or even folded but only partially buried residues (e.g., at solvent-exposed protein surfaces). Consequently, molecular recognition of epitopes, which is mediated by local ligand-receptor binding interactions, depends on sequence details much more than overall (i.e., global) features of protein structure do, notably in the sense of protein folds.

Proteins may share the same fold (e.g., as demonstrated by structural superposition of their backbones) well into the so-called twilight zone below the threshold for reliable detection of aligned-sequence similarity (i.e., less than 35% pairwise sequence identity) [16, 17]. Residue substitutions can be tolerated at surface-exposed positions (e.g., with replacement of certain polar residues by others whose side-chains differ in steric and electrostatic properties). Even at buried-core positions, certain nonpolar residues may be replaced by others whose side-chains differ in volume, especially where additional substitutions or other changes compensate for the volume differences, although the introduction of unsatisfied hydrogen-bond donors or acceptors and of unpaired formal charges tends to be poorly tolerated [18, 19]. However, even just a single-residue substitution in an epitope may abolish epitope-specific immune binding (e.g., by antibodies) if surface complementarity is disrupted at the epitope-binding interface (much as the structural organization of a protein core is disrupted by a radical substitution), as observed with immune evasion by pathogen escape mutants. Hence, the notion of conservative residue substitutions as conceptualized for evolving protein sequences (e.g., on the basis of changes in both volume and hydrophobicity) is of questionable applicability to epitopes, as it obscures crucial details of changes in steric and electrostatic complementarity.

From the perspective of immune function, redundancy of epitope data arguably implies potential for antigenic cross-reaction (i.e., recognition of different epitopes by a common B- or T-cell receptor or equivalent thereof) rather than sequence similarity defined without reference to any immune-system component. Hence, the present work outlines a generic theoretical framework for quantitatively expressing epitope data redundancy in terms of potential antigenic cross-reactivity, as estimated from an idealized limiting case of optimal ligand-receptor complementarity (comparable to the steric and electrostatic self-complementarity observed within folded proteins [20]). This is applied to linear peptidic epitopes for clarity of illustration and considering their envisioned roles as immune targets in relation to peptide-based vaccines and immunodiagnostics. Theoretical results are compared with informative available experimental data on antigenic cross-reactivity of structurally related short peptides, in light of possible suboptimal ligand-receptor complementarity and its biomedical implications vis-à-vis epitope sequence variation (e.g., arising as pathogen mutations and host polymorphisms).

2. Theory and Methods

2.1. Functional Redundancy of Epitope Data. The present work aims to support the use of available empirical data to further develop epitope-prediction methods without discarding unique epitopes deemed redundant on the basis of some sequence similarity threshold. Epitope data thus retained must be characterized as to redundancy, for which a reduced epitope count is introduced below. This is based on epitope similarity defined in terms of either aligned-sequence similarity or functional similarity as regards cross-reactive epitope binding. Aligned-sequence similarity is explored first because it is a familiar and readily calculated quantity. However, its limitations become apparent in the context of attempting to describe antigenic cross-reaction from a physicochemical perspective. Hence, functional similarity is also explored.

For a set of epitopes (construed as epitope structures, e.g., linear peptidic sequences), functional redundancy can be expressed as a reduced epitope count $r$ such that $1 \le r \le n$, where $n$ is the total epitope count, with $r = 1$ and $r = n$, respectively, corresponding to extreme cases wherein every epitope is either maximally or minimally similar to every other epitope from a functional standpoint. Thus, structurally identical epitopes would be maximally similar to one another, but structurally nonidentical epitopes might also be regarded as maximally similar if their structural difference was negligible or otherwise irrelevant from a functional standpoint. For simplicity, further elaboration of concepts herein focuses on cases wherein all epitopes are structurally unique peptidic sequences of equal length. (The qualifier of “structurally unique” is consistent with real-world epitope databases insofar as they regard each epitope as a structurally unique entity. The qualifier of “equal length” is clearly applicable to typical class I MHC-restricted T-cell epitopes and relatively short linear B-cell epitopes, albeit less so for longer B-cell epitopes and class II MHC-restricted T-cell epitopes.)

For a set of $n$ structurally unique epitopes, the reduced epitope count may be defined as

\[ r = \sum_{i=1}^{n} w_i, \tag{1} \]

where $w_i$ is the contribution of the $i$th epitope to $r$. In turn, $w_i$ may be defined as

\[ w_i = \frac{1 + \sum_{j=1}^{n} \left( 1 - S_{ij} \right)}{n}, \tag{2} \]

where $S_{ij}$ is the functional similarity between the $i$th and $j$th epitopes such that $0 \le S_{ij} \le 1$, with $S_{ij} = 0$ and $S_{ij} = 1$, respectively, corresponding to extreme cases wherein the $i$th and $j$th epitopes are either minimally or maximally similar from a functional standpoint. If all the epitopes are of uniform sequence length $m$ amino-acid residues, $S_{ij}$ might be defined for a pairwise epitope sequence alignment as

\[ S_{ij} = \frac{\sum_{k=1}^{m} s_{ijk}}{m}, \tag{3} \]

where $s_{ijk}$ is the functional similarity between amino-acid residues $a_{ik}$ and $a_{jk}$ at the $k$th sequence positions of the aligned $i$th and $j$th epitope sequences, respectively, such that $0 \le s_{ijk} \le 1$, with $s_{ijk} = 0$ and $s_{ijk} = 1$, respectively, corresponding to extreme cases wherein $a_{ik}$ and $a_{jk}$ are either minimally or maximally similar from a functional standpoint. Accordingly, $s_{ijk}$ might be simply defined as

\[ s_{ijk} = \begin{cases} 0 & \text{if } a_{ik} \ne a_{jk}, \\ 1 & \text{if } a_{ik} = a_{jk}, \end{cases} \tag{4} \]

for (3) to yield the fraction of identical aligned residues, which is a conventional measure of protein sequence similarity [11–13]. However, this is an oversimplification that fails to capture the possibility of minimal functional similarity between epitopes (e.g., operationally defined as undetectable antigenic cross-reactivity in an immunoassay) arising from a structural difference of only a single chemical group [14], unless the possibility of antigenic cross-reaction is denied altogether.
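As an illustration of (1) through (4), the following minimal Python sketch computes the reduced epitope count for a small set of equal-length sequences, with similarity taken as the fraction of identical aligned residues. The function names and the example sequences are hypothetical and serve only to show the arithmetic; sequences are assumed to be already aligned position by position.

```python
from typing import Callable, Sequence


def identity_similarity(seq_i: str, seq_j: str) -> float:
    """Eqs. (3)-(4): fraction of identical residues at aligned positions."""
    if not seq_i or len(seq_i) != len(seq_j):
        raise ValueError("epitopes must be nonempty and of equal length")
    matches = sum(1 for a, b in zip(seq_i, seq_j) if a == b)
    return matches / len(seq_i)


def reduced_epitope_count(
    epitopes: Sequence[str],
    similarity: Callable[[str, str], float] = identity_similarity,
) -> float:
    """Eqs. (1)-(2): r = sum_i w_i, with w_i = (1 + sum_j (1 - S_ij)) / n."""
    n = len(epitopes)
    if n == 0:
        raise ValueError("at least one epitope is required")
    r = 0.0
    for epitope_i in epitopes:
        # The term with j == i contributes zero because S_ii = 1.
        dissimilarity = sum(1.0 - similarity(epitope_i, epitope_j)
                            for epitope_j in epitopes)
        r += (1.0 + dissimilarity) / n
    return r


if __name__ == "__main__":
    # Two hypothetical four-residue sequences differing at one position.
    print(reduced_epitope_count(["AAAA", "AAAB"]))
```

For the two example sequences, $S_{12} = 0.75$, so each $w_i = (1 + 0.25)/2 = 0.625$ and $r = 1.25$. Passing a different `similarity` function allows the same summation to be reused with other definitions of $S_{ij}$.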

A more realistic, physicochemically grounded alternative is to conceptualize functional similarity on the basis of biomolecular binding interactions, thereby relating functional similarity to binding affinity as measured via an immunoassay that employs some epitope-binding probe (e.g., antibody). Thus, functional similarity between epitopes (i.e., $S_{ij}$ of (2) and (3)) may be defined alternatively as the Shannon information entropy [34–36] for the equilibrium distribution of epitope-bound probe between the $i$th and $j$th epitopes, with both epitopes at the same concentration in the presence of much less probe, such that

\[ S_{ij} = -\left( f_i \log_2 f_i + f_j \log_2 f_j \right), \tag{5} \]

where $f_i$ and $f_j$ are the fractions of epitope-bound probe bound to the $i$th and $j$th epitopes, respectively (noting that $f_i + f_j = 1$ and $S_{ij} = 1$ where the $i$th and $j$th epitopes are indistinguishable by means of the immunoassay). Constraining $f_j$ as $f_j \le f_i$, it may in turn be defined as

\[ f_j = \frac{\prod_{k=1}^{m} s_{ijk}}{1 + \prod_{k=1}^{m} s_{ijk}}, \tag{6} \]

where $s_{ijk}$ retains its meaning as introduced in (3). However, retaining the definition of $s_{ijk}$ given by (4) would be tantamount to denying the possibility of antigenic cross-reaction; hence, an alternative is warranted.
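The behavior of (5) and (6) can be checked numerically with a short Python sketch. It takes the per-residue similarities $s_{ijk}$ as given inputs; the function names are hypothetical, and the convention that a zero fraction contributes nothing to the entropy is an assumption made here to handle the limiting case.

```python
import math
from typing import Sequence


def minor_bound_fraction(residue_similarities: Sequence[float]) -> float:
    """Eq. (6): f_j = P / (1 + P), with P the product of the s_ijk values."""
    if any(not 0.0 <= s <= 1.0 for s in residue_similarities):
        raise ValueError("each s_ijk must lie in [0, 1]")
    product = math.prod(residue_similarities)
    return product / (1.0 + product)


def entropy_similarity(residue_similarities: Sequence[float]) -> float:
    """Eq. (5): S_ij = -(f_i log2 f_i + f_j log2 f_j), where f_i = 1 - f_j."""
    f_j = minor_bound_fraction(residue_similarities)
    f_i = 1.0 - f_j
    # A zero fraction contributes nothing (limit of f * log2(f) as f -> 0).
    return sum(-f * math.log2(f) for f in (f_i, f_j) if f > 0.0)
```

With every $s_{ijk} = 1$ the probe is split evenly and the similarity is 1, whereas a single $s_{ijk} = 0$ drives the product, and hence the similarity, to 0. Under definition (4) these are the only two possible outcomes, which is why any single mismatch would rule out cross-reaction.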

A plausible starting point is the essential unity of protein folding and binding phenomena based on steric and electrostatic complementarity of molecular surfaces, as observed within typical folded proteins [20]. If this is regarded as representative of ligand-receptor interfaces wherein the ligand is a peptidic epitope bound to one or more protein receptors (e.g., antibody, or MHC molecule and T-cell receptor), an idealized epitope-binding site may be devised to estimate epitope similarity as potential antigenic cross-reactivity in terms of structural differences, assuming optimal complementarity with all epitope atoms completely surrounded by and close-packed against the binding-site contact atoms. Hence, suboptimal complementarity would arguably result if even a single epitope amino-acid residue was structurally altered. Notably, steric clashes conceivably would preclude antigenic cross-reaction if the altered residue was sterically incompatible with its original counterpart, in the sense of failing to assume any conformation allowing it to fit entirely within the region of space (as defined by residue van der Waals surfaces) that otherwise would have been occupied by the original residue at the binding site. Thus, functional similarity between epitope amino-acid residues (i.e., $s_{ijk}$ of (3), (4), and (6)) may be defined alternatively as

\[ s_{ijk} = \begin{cases} 0 & \text{if } a_{ik}, a_{jk} \text{ sterically incompatible}, \\ \exp\left( -\dfrac{\Delta\Delta G_{ijk}}{RT} \right) & \text{otherwise}, \end{cases} \tag{7} \]

where $i$ and $j$ denote the original and altered epitopes, respectively; $a_{ik}$ and $a_{jk}$ retain their meanings as in (4); $\Delta\Delta G_{ijk}$ is the contribution to the change in the overall free-energy change of epitope binding due to replacement of $a_{ik}$ by $a_{jk}$; $R$ is the gas constant; and $T$ is the absolute temperature. $\Delta\Delta G_{ijk}$ is analogous to the change in the free-energy change of folding due to replacement of the $k$th residue of a protein by a structurally different residue (e.g., to generate a mutant version of a wild-type protein).

To evaluate (7), steric incompatibility is posited for every substitution wherein the net change in volume is positive (i.e., wherever bulk increases) and also for other substitutions wherein stereochemical differences (e.g., relating to bond angles and branching) are deemed sufficient to result in steric clashes that preclude antigenic cross-reaction. Steric incompatibility is conceptualized herein on the basis of grouping the 20 proteinogenic amino acids into subsets according to stereochemical similarity, as depicted in Figure 1. Subset assignments are based on the covalent bonding geometry of nonhydrogen side-chain atoms. In particular, these atoms are characterized as either tetrahedral or trigonal planar, and branching of tetrahedral atoms is also considered. For example, methyl and methylene carbon atoms are tetrahedral, as are sulfur and amine nitrogen atoms. Hence, the sulfhydryl group of Cys is regarded as isosteric with a methyl group, whereas the sulfur atom of Met and the side-chain imino group of Arg are both regarded as isosteric with a methylene group. In contrast, amide and carboxyl carbon atoms are trigonal planar rather than tetrahedral.

[Figure 1 panels (structural formulas omitted)]
Subset 1, no trigonal planar atoms or branching at β through ε positions: Arg (R), Lys (K), Met (M), Cys (C), Ser (S), Ala (A), Gly (G)
Subset 2, trigonal planar δ-carbon atom: Gln (Q), Glu (E)
Subset 3, trigonal planar γ-carbon atom but no five-membered ring: Tyr (Y), Phe (F), Asn (N), Asp (D)
Subset 4, trigonal planar γ-carbon atom in five-membered ring: Trp (W), His (H)
Subset 5, branching at tetrahedral γ-carbon atom: Leu (L)
Subset 6, branching at tetrahedral β-carbon atom: Ile (I), Val (V), Thr (T)
Subset 7, with ring comprising main-chain nitrogen atom: Pro (P)

Figure 1: Subsets of the 20 proteinogenic amino acids, grouped by stereochemical similarity. Structures were obtained from the ChEBI (Chemical Entities of Biological Interest) database (http://www.ebi.ac.uk/chebi/). Within each subset comprising two or more amino acids, these are arranged from left to right in order of decreasing size, such that sterically incompatible changes are avoided by substitution of a smaller amino acid for a larger one. The subsets are linked to form the residue substitution paths in Figure 2, wherein Subset 1 comprises an entire path while the other subsets comprise the upstream segments of paths converging on either Cys (C) or Ala (A).

The subsets defined in Figure 1 are linked to form residue substitution paths avoiding sterically incompatible changes, as depicted in Figure 2. Subset 1 comprises an entire path, while the other subsets comprise the upstream segments of paths converging on either Cys or Ala. The path of Subset 2 converges on Cys, as members of Subset 1 larger than Cys are sterically incompatible with Subset 2, whose δ-carbon atom is trigonal planar. The path of Subset 3 converges on Ala rather than Cys because the γ-carbon atom is trigonal planar in Subset 3. The case of Subset 4 is thus similar to that of Subset 3, but the γ-carbon atom in Subset 4 is constrained as part of a five-membered ring, such that Subsets 3 and 4 are deemed sterically incompatible. The path of Subset 5 (Leu) converges on Cys, as members of Subset 1 larger than Cys are sterically incompatible with the branching at the γ-carbon atom of Leu. The path of Subset 6 converges on Ala instead of Cys because of the branching at the β-carbon atom of Subset 6. Finally, the path of Subset 7 (Pro) converges on Ala, as the Pro side-chain is constrained by ring formation with the main-chain nitrogen atom and thus distorted beyond the β-carbon atom.

[Figure 2: diagram linking Subsets (1) through (7) of Figure 1 into residue substitution paths]

Figure 2: Amino-acid residue substitution paths avoiding sterically incompatible changes (cf. (7)) for the 20 canonical proteinogenic residues. Numbers correspond to the subsets of stereochemically similar amino acids depicted in Figure 1. For every residue substitution along the paths shown, the number of unsatisfied hydrogen bonds is presented in Table 1.

Table 1: Numbers of unsatisfied hydrogen bonds for amino-acid residue substitutions of Figure 2, calculated according to (9).

     G  A  S  C  D  T  P  N  V  E  Q  H  L  I  M  K  R  F  Y  W
G    0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
A    0  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
S    1  1  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
C    0  0  1  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
D    2  2  -  -  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
T    1  1  0  1  -  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -
P    1  1  -  -  -  -  0  -  -  -  -  -  -  -  -  -  -  -  -  -
N    2  2  -  -  2  -  -  0  -  -  -  -  -  -  -  -  -  -  -  -
V    0  0  1  0  -  1  -  -  0  -  -  -  -  -  -  -  -  -  -  -
E    2  2  3  2  -  -  -  -  -  0  -  -  -  -  -  -  -  -  -  -
Q    2  2  3  2  -  -  -  -  -  2  0  -  -  -  -  -  -  -  -  -
H    2  2  -  -  -  -  -  -  -  -  -  0  -  -  -  -  -  -  -  -
L    0  0  1  0  -  -  -  -  -  -  -  -  0  -  -  -  -  -  -  -
I    0  0  1  0  -  1  -  -  0  -  -  -  -  0  -  -  -  -  -  -
M    0  0  1  0  -  -  -  -  -  -  -  -  -  -  0  -  -  -  -  -
K    1  1  2  1  -  -  -  -  -  -  -  -  -  -  1  0  -  -  -  -
R    3  3  4  3  -  -  -  -  -  -  -  -  -  -  3  4  0  -  -  -
F    0  0  -  -  2  -  -  2  -  -  -  -  -  -  -  -  -  0  -  -
Y    1  1  -  -  3  -  -  3  -  -  -  -  -  -  -  -  -  1  0  -
W    1  1  -  -  -  -  -  -  -  -  -  1  -  -  -  -  -  -  -  0

According to Figure 2, a substitution from any residue upstream to any residue downstream along a particular path avoids steric incompatibility. Conversely, downstream-to-upstream and off-path substitutions are deemed sterically incompatible. This is a qualitative first approximation using a highly idealized model rather than an attempt at definitive categorization.
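The compatibility rule can be sketched as a simple lookup in Python. The mapping below is transcribed from the cells of Table 1 that hold a number rather than a null value, on the assumption that each row of Table 1 gives the original residue and each column its replacement (which is consistent with the residues being ordered by volume). The names `COMPATIBLE_REPLACEMENTS` and `sterically_compatible` are hypothetical.

```python
# Original residue -> replacements with a non-null, off-diagonal entry in
# Table 1 (assumed reading: row = original residue, column = replacement).
COMPATIBLE_REPLACEMENTS = {
    "G": "",       "A": "G",      "S": "GA",     "C": "GAS",
    "D": "GA",     "T": "GASC",   "P": "GA",     "N": "GAD",
    "V": "GASCT",  "E": "GASC",   "Q": "GASCE",  "H": "GA",
    "L": "GASC",   "I": "GASCTV", "M": "GASC",   "K": "GASCM",
    "R": "GASCMK", "F": "GADN",   "Y": "GADNF",  "W": "GAH",
}


def sterically_compatible(original: str, altered: str) -> bool:
    """True if replacing `original` by `altered` is treated as compatible."""
    for residue in (original, altered):
        if residue not in COMPATIBLE_REPLACEMENTS:
            raise ValueError(f"unknown residue code: {residue!r}")
    # An unchanged residue is trivially compatible with itself.
    return original == altered or altered in COMPATIBLE_REPLACEMENTS[original]
```

The relation is directional: `sterically_compatible("L", "A")` holds, whereas the reverse substitution of Leu for Ala does not, and residues of similar size on different paths (e.g., Leu and Ile) are incompatible in both directions.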

As regards $\Delta\Delta G_{ijk}$ (in (7)), this is related to amino-acid residue structural differences (as a qualitative first approximation rather than an attempt to obtain quantitatively very accurate results) by heuristic partitioning as

\[ \Delta\Delta G_{ijk} = c\,\Delta V_{ijk} + b\,N_{ijk}, \tag{8} \]

where $c$ and $b$ are both proportionality constants, while $\Delta V_{ijk}$ is the change in residue volume and $N_{ijk}$ is the number of unsatisfied hydrogen bonds (Table 1, wherein the residues are ordered by their volumes [21]), considering that an energetic penalty is incurred with unsatisfied hydrogen bonds upon binding a suboptimally complementary epitope.

Table 1 is sparse, with a null value (a dash) for most cells, because (8) is evaluated only for sterically compatible substitutions (cf. (7)). Hence, the numbers in Table 1 are for sterically compatible substitutions according to Figure 2. The said numbers were each obtained, assuming that every available highly electronegative atom participates in the formation of a hydrogen bond, as

\[ N_{ijk} = A_{ijk} - B_{ijk}, \tag{9} \]

where $A_{ijk}$ is the total number of oxygen and nitrogen atoms in the aligned residues, while $B_{ijk}$ is the number of such atoms for which direct correspondence was affirmed as regards structural superposability, atom identity, and covalent bonding partners (according to Figure 1). Thus, $N_{ijk}$ increases with each failure to affirm a direct correspondence between hydrogen-bonding atoms. Physically, such failure corresponds to an unsatisfied hydrogen bond on the epitope, its binding site, or both.

Direct correspondence was affirmed for all main-chain (i.e., backbone) carbonyl oxygen atoms. In view of the difference between Pro and other residues at the main-chain nitrogen atom, direct correspondence was affirmed for this atom between Pro and itself as well as between non-Pro residues. Among side-chain atoms, direct correspondence was affirmed only between hydroxyl oxygen atoms of Ser and Thr, carbonyl oxygen atoms of Asp and Asn and of Glu and Gln, and δ-nitrogen atoms of His and Trp. To facilitate explication, a residue is termed polar if its side-chain contains at least one oxygen or nitrogen atom but nonpolar if otherwise. Accordingly, $N_{ijk} = 0$ for pairs of identical residues (along the diagonal of Table 1) and of non-Pro nonpolar residues. For Pro paired with Gly or Ala, $N_{ijk} = 1$ due to the main-chain nitrogen difference. For a polar residue paired with a non-Pro nonpolar residue, $N_{ijk}$ is the total number of side-chain oxygen and nitrogen atoms. For a pair of polar residues, the total number of side-chain oxygen and nitrogen atoms is an upper bound on $N_{ijk}$. Hence, $N_{ijk} = 3$ for Tyr paired with Asn, although $N_{ijk} = 2$ for Asn paired with Asp.
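These counting rules can be expressed compactly in Python. The sketch below is one reading of the rules rather than the procedure actually used: it sums the side-chain oxygen and nitrogen atoms of both residues, subtracts two for each pair with an affirmed side-chain correspondence, and adds one when exactly one residue is Pro. Applying the Pro rule to partners other than Gly or Ala is an assumption, and the identifiers are hypothetical. Values returned for sterically incompatible pairs have no meaning in the framework.

```python
# Side-chain oxygen plus nitrogen atom counts; residues not listed have none.
SIDE_CHAIN_O_N = {
    "S": 1, "T": 1, "Y": 1, "K": 1, "W": 1,
    "D": 2, "E": 2, "N": 2, "Q": 2, "H": 2,
    "R": 3,
}

# Pairs with one affirmed side-chain atom correspondence (per the text).
CORRESPONDING_PAIRS = {
    frozenset("ST"), frozenset("DN"), frozenset("EQ"), frozenset("HW"),
}


def unsatisfied_hydrogen_bonds(res_i: str, res_j: str) -> int:
    """Eq. (9) as N = A - B, restricted to atoms that can fail to correspond."""
    if res_i == res_j:
        return 0
    count = SIDE_CHAIN_O_N.get(res_i, 0) + SIDE_CHAIN_O_N.get(res_j, 0)
    if frozenset((res_i, res_j)) in CORRESPONDING_PAIRS:
        count -= 2  # one matched atom in each residue
    if (res_i == "P") != (res_j == "P"):
        count += 1  # main-chain nitrogen difference for Pro (assumed general)
    return count
```

For the substitutions checked here, this reading reproduces the corresponding entries of Table 1, for example 3 for Tyr with Asn, 2 for Asn with Asp, 0 for Thr with Ser, and 4 for Arg with Lys.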

Provisional values for the proportionality constants $c$ and $b$ in (8) were obtained from literature. In particular, $c$ was regarded as the energetic penalty per unit volume of a cavity formed due to substitution of a less bulky residue within the interior of a folded protein, with value of $-0.024$ kcal·mol$^{-1}$·Å$^{-1}$ estimated empirically from protein mutation studies [37], and $b$ was likewise regarded as the energetic cost of breaking a hydrogen bond, with a conservative estimated value of 0.5 kcal·mol$^{-1}$ per bond [38].

Thus, using (1) through (8) in conjunction with Table 1, the literature values for constants in (8), and published amino-acid residue volumes [21], reduced epitope counts (1) were computed for artificial epitope sequences and also actual experimentally studied epitope sequences, assuming a temperature of 37 °C (310.15 K).
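A rough numerical sketch of (7) and (8) with these constants follows. Several points are assumptions made for illustration only: $c$ is interpreted as a penalty per cubic ångström of volume change, $\Delta V_{ijk}$ is taken as the volume of the altered residue minus that of the original, and the example volumes for Leu and Ala (166.7 and 88.6 cubic ångströms) are commonly tabulated values that are not necessarily those of [21]. All identifiers are hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal mol^-1 K^-1
TEMPERATURE = 310.15        # K, i.e., 37 degrees Celsius
C_VOLUME = -0.024           # kcal mol^-1 per cubic angstrom (units assumed)
B_HBOND = 0.5               # kcal mol^-1 per unsatisfied hydrogen bond


def delta_delta_g(delta_volume: float, n_unsatisfied: int) -> float:
    """Eq. (8): ddG = c * dV + b * N, with dV = V(altered) - V(original)."""
    return C_VOLUME * delta_volume + B_HBOND * n_unsatisfied


def residue_similarity(delta_volume: float, n_unsatisfied: int,
                       compatible: bool = True) -> float:
    """Eq. (7): 0 if sterically incompatible, else exp(-ddG / RT)."""
    # Any increase in bulk is posited to be sterically incompatible.
    if not compatible or delta_volume > 0.0:
        return 0.0
    ddg = delta_delta_g(delta_volume, n_unsatisfied)
    return math.exp(-ddg / (GAS_CONSTANT * TEMPERATURE))


if __name__ == "__main__":
    # Leu -> Ala with illustrative volumes (assumed, not taken from [21]).
    d_volume = 88.6 - 166.7
    print(delta_delta_g(d_volume, 0), residue_similarity(d_volume, 0))
```

Under these assumptions, the Leu-to-Ala replacement costs roughly 1.9 kcal·mol$^{-1}$, giving a residue-level similarity of about 0.05. The per-residue values obtained in this way are the $s_{ijk}$ whose product enters (6).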

2.2. Retrieval and Processing of Epitope Data. Epitope data were obtained via the Immune Epitope Database (IEDB, at http://www.iedb.org/) [39] as database records retrieved through searches conducted from 4 to 6 October 2015 using its B-Cell Search and T-Cell Search facilities (in the latter case restricting searches to either MHC class I or II alleles), with each record thus retrieved pertaining to an individual B-cell or T-cell assay and containing data fields defined in relation to the key concepts of “Object Type” (i.e., molecule type in the context of chemical structure), “Epitope Relation” (i.e., molecule relationship to epitope), “1st Immunogen” (i.e., immunogen initially administered to elicit the immune response), and “Antigen” (i.e., antigen used in the B-cell or T-cell assay). Searches were restricted such that the data fields named “1st Immunogen Object Type” and “1st Immunogen Epitope Relation” had values of “Linear Peptide” and “Epitope,” respectively, with the “MHC class” option set to either “I” or “II” for T-Cell Search. Employing this basic search strategy, both narrow and broad searches were conducted, the former to retrieve data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution and the latter to survey the entire repertoire of available data on linear peptidic epitopes (as illustrated by example in Figure 3).

Figure 3: IEDB T-Cell Search facility interface (http://www.iedb.org/advancedQueryTcell.php). Example shown corresponds to narrow search for class I MHC-restricted T-cell assays with positive data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution. Searches for B-cell assays were performed using IEDB B-Cell Search facility interface (http://www.iedb.org/advancedQueryBcell.php) of analogous form (e.g., without assay-related fields for “MHC allele”); narrow searches for negative data were performed without specified value for field labeled as “Epitope Structure Defines”; and broad searches were performed without specified value for both said field and that named “Qualitative Measurement” (see main text for additional details).

Figure 4: Reduced epitope counts (1) for peptidic sequences of uniform length varying only at a single-residue position (horizontal axis: sequence length; vertical axis: reduced epitope count). Each count corresponds to a set of 20 peptides representing every standard proteinogenic amino-acid residue at the variable-residue position. Functional similarity is equated with either fractional aligned-sequence identity (“◻”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). In the latter case, counts were based on steric incompatibility only ((7) and Figure 2), both steric incompatibility and cavity formation ($c\Delta V_{ijk}$ in (8)), or steric incompatibility with both cavity formation and hydrogen bonding ((8) and (9) and Table 1).

Narrow searches were first attempted to retrieve records for which structural data on epitope binding were available, setting the “Yes” option for the Viewer Flag (under “3D Structure of Complex”). Subsequently, the searches were conducted to separately retrieve records curated as containing either negative or positive data on cross-reactivity and were restricted such that the data fields named “Antigen Object Type” and “Antigen Epitope Relation” had values of “Linear Peptide” and “Structurally Related,” respectively, with the data field named “Qualitative Measurement” having a value of either “negative” or “positive.” In the latter case, searches were further limited to class I MHC-restricted T-cell assays for which the data field named “Epitope Structure Defines” had a value of “Exact Epitope” (rather than “epitope containing region/antigenic site”), as cross-reaction may occur with structural differences outside any relevant epitopes unless this is for a peptide confined within a class I MHC binding cleft. For each record thus retrieved, the immunogen and antigen sequences (i.e., values of the data fields named “1st Immunogen Object Primary Molecule Sequence” and “Antigen Object Primary Molecule Sequence,” resp.) were compared. The record was considered for further processing only if the said sequences were of equal length and differed at exactly one amino-acid residue position, and the record was ultimately retained only if a corresponding record with a “Qualitative Measurement” data-field value of “positive” was found sharing the same immunogen and immunoassay protocol except that the antigen was identical to the immunogen (thus confirming that the immunogen had elicited a relevant anti-peptide immune response in the first place). The epitope data thus obtained were analyzed in light of the preceding Section 2.1.
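The sequence comparison applied to narrow-search records (equal length, exactly one differing position) can be sketched as below. This is a minimal illustration rather than the processing code actually used; the function name `single_substitution` is hypothetical, and the additional requirement of a matching positive control record is not modeled.

```python
from typing import Optional, Tuple


def single_substitution(immunogen: str, antigen: str) -> Optional[Tuple[int, str, str]]:
    """Return (1-based position, immunogen residue, antigen residue) if the
    two sequences have equal length and differ at exactly one position;
    otherwise return None (hypothetical helper)."""
    if len(immunogen) != len(antigen):
        return None
    diffs = [
        (k, a, b)
        for k, (a, b) in enumerate(zip(immunogen, antigen), start=1)
        if a != b
    ]
    return diffs[0] if len(diffs) == 1 else None


# Example pair implied by Table 2, row 2 (Pro replaced by Ala in SYIXSAEKI).
print(single_substitution("SYIPSAEKI", "SYIASAEKI"))
```

A record would pass this filter only when the function returns a tuple; identical sequences, sequences of unequal length, and sequences differing at two or more positions all yield `None`.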

The broad searches were conducted to retrieve records on linear peptidic epitopes in general. Among the epitope records thus retrieved, those wherein the data field named “Modification” was nonempty (indicating amino-acid residue covalent modification) were excluded from further consideration, and the remainder were analyzed as regards functional redundancy according to the preceding Section 2.1.

3. Results and Discussion

3.1. Functional Redundancy versus Sequence. As depicted in Figure 4, computed functional redundancy increases with epitope sequence length if functional similarity is equated with fractional aligned-sequence identity (“◻”), whereas it is independent of sequence length if functional similarity is equated with the Shannon information entropy for differential epitope binding (the three remaining data series). The latter approach thus better accounts for the possibility of a structural difference in only a single chemical group being sufficient to preclude antigenic cross-reaction, provided that the sequence under consideration is entirely encompassed by the relevant epitope structure, as structural differences may be functionally irrelevant if they lie outside the epitope. Moreover, computed functional redundancy decreases (i.e., the reduced epitope count approaches the total epitope count) when both steric incompatibility and the energetic penalty of cavity formation are considered instead of steric incompatibility alone. The redundancy decreases even further if the energetic penalty of unsatisfied hydrogen bonds (more generally representing suboptimal electrostatic complementarity) is also considered.

3.2. Empirical Cross-Reactivity Data. IEDB searches yielded 26 records on epitope cross-reactivity vis-à-vis single-residue substitution (Table 2), all of which contain only qualitative rather than quantitative binding data, without reference to atomic coordinates or other structural details of bound epitopes. The data suggest that the posited steric incompatibility (in (7) and Figure 2) assumes too little tolerance for suboptimal complementarity, especially as regards volume differences between immunogen and antigen in Figure 5, which depicts examples of cross-reaction despite substitutions increasing residue bulk (i.e., positive-data points to the left of the diagonal). This might be at least partially corrected by relaxing or even eliminating the steric-incompatibility criteria, possibly replacing them with a continuous-form shape-dependent energetic penalty (considering that they assume an infinite-step hard-sphere model of steric clashes between atoms, whereas potential energy may be more accurately represented by continuous functions of interatomic separation distances); likewise, electrostatic complementarity also might be more accurately described, for example, in terms of differential energetic contributions due to charged and uncharged hydrogen-bonding partners. However, typical epitope binding actually may be suboptimal considering thermodynamic and kinetic constraints on affinity maturation in B-cell ontogeny and the typical absence of affinity maturation in T-cell ontogeny, which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity than immunogen.

Notwithstanding the problem of suboptimal complementarity described above, data presented in Table 2 suggest that the approach to epitope sequence redundancy proposed herein is more conceptually and practically meaningful than setting threshold (i.e., cutoff) values for fractional aligned-sequence identity. This is exemplified by the control-reaction epitopes of data rows 18 and 19 (having consensus sequence LLXRDSFEV). These epitopes differ from each other at just one residue position. As typical MHC class I-restricted T-cell epitopes, they are nonapeptides, for which the highest meaningful sequence-identity threshold value is 8/9 (∼89%).


Table 2: IEDB Qualitative-Outcome (Q) data on peptide cross-reactivity assays (cf. Figure 5). The parameter measured corresponds to the IEDB “measurement of” data field; the sequence context shows X at the position of substitution.

| Row | Q | Assay type | MHC class | Parameter measured | IEDB ID (control reaction) | IEDB ID (cross-reaction) | IEDB Epitope ID | Substitution (immunogen → antigen) | Sequence context | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | − | B-cell | | Antibody-antigen binding | 1370146 | 1370150 | 10801 | W → Y | DXEDRYYRE | [22] |
| 2 | − | T-cell | I | Cytokine release (IFN-γ) | 1340848 | 1340854 | 62604 | P → A | SYIXSAEKI | [23] |
| 3 | − | T-cell | II | Cell proliferation | 1646734 | 1646735 | 107384 | E → D | GEPGIAGFKGXQGPK | [24] |
| 4 | − | T-cell | II | Cell proliferation | 1660246 | 1660247 | 46317 | F → A | NTWTTCQSIAXPSK | [25] |
| 5 | − | T-cell | II | Cytokine release (IL-4) | 1660252 | 1660258 | 112245 | A → F | NTWTTCQSIAXPSK | [25] |
| 6 | − | T-cell | II | Cytokine release (IL-2) | 1667060 | 1667065 | 68846 | K → A | VHFFXNIVTPRTP | [26] |
| 7 | − | T-cell | II | Cell proliferation | 1667112 | 1667113 | 80071 | A → K | VHFFXNIVTPRTP | [26] |
| 8 | − | T-cell | II | Cell proliferation | 1716604 | 1716614 | 125569 | T → V | SWEGVGVXPDV | [27] |
| 9 | − | T-cell | II | Cell proliferation | 1716604 | 1716616 | 125569 | D → N | SWEGVGVTPXV | [27] |
| 10 | − | T-cell | II | Cell proliferation | 1716605 | 1716618 | 125571 | V → T | SWEGVGVXPNV | [27] |
| 11 | + | T-cell | I | Cytotoxicity | 1634784 | 1634785 | 103300 | S → A | IYXTVAGSL | [28] |
| 12 | + | T-cell | I | Cytotoxicity | 1634784 | 1634786 | 103300 | G → S | IYSTVAXSL | [28] |
| 13 | + | T-cell | I | Cytotoxicity | 1634788 | 1634789 | 103298 | S → G | IYATVAXSL | [28] |
| 14 | + | T-cell | I | Cytotoxicity | 1634788 | 1634790 | 103298 | A → S | IYXTVASSL | [28] |
| 15 | + | T-cell | I | Cytotoxicity | 1657362 | 1657364 | 111964 | I → D | YXFAFRDL | [29] |
| 16 | + | T-cell | I | Cytokine release (IFN-γ) | 1716442 | 1716443 | 125101 | I → T | XMIKFNRL | [30] |
| 17 | + | T-cell | I | Cytokine release (IFN-γ) | 1340848 | 1340852 | 62604 | K → D | SYIPSAEXI | [23] |
| 18 | + | T-cell | I | Cytokine release (IFN-γ) | 1381750 | 1381748 | 37174 | D → G | LLXRDSFEV | [31] |
| 19 | + | T-cell | I | Cytokine release (IFN-γ) | 1381752 | 1381753 | 37392 | H → G | LLXRDSFEV | [31] |
| 20 | + | T-cell | I | Cytotoxicity | 1404705 | 1404714 | 61086 | S → I | SXIEFARL | [32] |
| 21 | + | T-cell | I | Cytotoxicity | 1404705 | 1404715 | 61086 | A → E | SSIEFXRL | [32] |
| 22 | + | T-cell | I | Cytotoxicity | 1404705 | 1404716 | 61086 | R → K | SSIEFAXL | [32] |
| 23 | + | T-cell | I | Cytotoxicity | 1404712 | 1404719 | 61088 | E → A | SSIEFXRL | [32] |
| 24 | + | T-cell | I | Cytotoxicity | 1404713 | 1404721 | 61085 | K → R | SSIEFAXL | [32] |
| 25 | + | T-cell | I | Cytokine release (IFN-γ) | 1865529 | 1865530 | 98995 | T → F | SAPDXRPA | [33] |
| 26 | + | T-cell | I | Cytokine release (IFN-γ) | 1865529 | 1865532 | 98995 | A → L | SAPDTRPX | [33] |


Figure 5: Peptide cross-reactivity assay data of Table 2 vis-à-vis amino-acid residue volume [21] of immunogen and antigen at single-residue substitution positions along sequence alignments. Horizontal axis: immunogen residue volume (Å³); vertical axis: antigen residue volume (Å³). Data points are labeled with the row numbers of Table 2, and reference points are labeled with one-letter amino-acid residue codes. Legend: Reference; Negative, B-cell; Negative, T-cell, I; Negative, T-cell, II; Positive, T-cell, I.

For longer epitopes (e.g., typical MHC class II-restricted T-cell epitopes), the value is even higher. For example, in the case of data rows 4 and 5 (having consensus sequence NTWTTCQSIAXPSK), the said value is 13/14 (∼93%). The threshold values thus calculated are markedly higher than those used (e.g., 70% [20]) for conventional protein sequence analysis. Moreover, regardless of the exact sequence lengths, applying such threshold values would entail discarding data on epitopes differing by one residue (e.g., the examples just cited from Table 2). To avoid such loss of potentially valuable information, all the available epitope data might be analyzed and a corresponding reduced epitope count reported to indicate the level of redundancy in the underlying dataset.
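The threshold arithmetic above generalizes to any sequence length: two equal-length sequences differing at one position share a fractional identity of (length − 1)/length. A brief sketch follows, in which the function name `highest_meaningful_identity` is hypothetical and ungapped alignment of equal-length sequences is assumed.

```python
from fractions import Fraction


def highest_meaningful_identity(length: int) -> Fraction:
    """Fractional identity between two equal-length sequences that differ
    at exactly one residue position (hypothetical helper)."""
    if length < 2:
        raise ValueError("length must be at least 2 residues")
    return Fraction(length - 1, length)


for n in (9, 14):  # nonapeptides (rows 18 and 19); 14-mers (rows 4 and 5)
    value = highest_meaningful_identity(n)
    print(n, value, f"{float(value):.0%}")
```

Any identity cutoff below this value would flag such a pair as redundant, which is why a conventional cutoff such as 70% would discard every single-substitution pair of this length.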

3.3. Survey of Peptidic Epitope Sequence Data. IEDB searches yielded 11497 records on peptidic epitopes of length less than 50 residues (i.e., the general cutoff provided in the IEDB curation manual [39]) curated as covalently unmodified and comprising 4443 B-cell and 7054 T-cell epitope records (the latter divided between 2749 and 4305 records for MHC restriction classes I and II, resp.), for which computed functional redundancy vis-à-vis sequence length is depicted in Figure 6.

Considering the wide range of sequence counts for the various sequence lengths in Figure 6, a logarithmic scale was chosen to depict the data in a compact form. Consequently, the difference calculated as sequence count − reduced count was plotted instead of reduced count itself wherever sequence count exceeded reduced count, to avoid crowding of data points. For all positive sequence counts (“◻”), the said differences were consistently higher where functional similarity was equated with fractional aligned-sequence identity (“×”) rather than the Shannon information entropy of differential epitope binding. In the latter case, the differences were consistently less than 1, with reduced count close or even equal to sequence count. These results are anticipated in view of Figure 4, which shows that reduced count is independent of sequence length using the entropy-based approach proposed herein but decreases with sequence length using fractional aligned-sequence identity.

The data in Figures 4 and 6 thus point to key considerations in evaluating functional redundancy. Clearly, functional redundancy may be overestimated by simply equating functional similarity with fractional aligned-sequence identity, especially with increasing sequence length for highly similar sequences. On the other hand, functional redundancy may be underestimated where functional similarity is equated with the Shannon information entropy of differential epitope binding if the sequence under consideration extends beyond the relevant epitope structure, which becomes more likely with increasing sequence length. Hence, the information-entropy approach as developed herein is conceivably applicable to sequences not exceeding a realistic application-dependent epitope length (e.g., six residues for antigen binding by anti-peptide antibodies [40]), although rigorous generalization to longer sequences might be pursued by aligning subsequences of such length and possibly accounting for differential immunodominance among immunogen epitopes [41]. For example, if a set of peptides were to be found such that each of them comprised a single immunodominant epitope having fixed length (e.g., six residues in all cases), the information-entropy approach might be applied to the entire set even if the peptides were of variable length. Moreover, even if the said immunodominant epitopes were highly similar to one another in terms of fractional aligned-sequence identity, all the available data still could be included in subsequent analyses provided that both the total and reduced epitope counts would be reported in order to account for functional redundancy. In this way, useful epitope data could be retained instead of discarded.
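As a starting point for the subsequence-based generalization suggested above, a longer peptide can be decomposed into overlapping fixed-length windows. The sketch below is only illustrative: the function name `subsequences` is hypothetical, the default of six residues is taken from the example in the text, and neither alignment of the windows nor differential immunodominance is modeled.

```python
from typing import List


def subsequences(sequence: str, length: int = 6) -> List[str]:
    """Return all overlapping windows of the given length, in order
    (hypothetical helper); sequences shorter than the window yield []."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


print(subsequences("SYIPSAEKI"))  # four hexapeptide windows from a nonapeptide
```

Each window could then be compared with windows from other peptides under the information-entropy approach, subject to the caveat that only windows coinciding with an actual epitope are functionally relevant.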

3.4. Applications and Future Directions. The discussion thus far suggests the potentially greater utility of antigenic functional similarity cast as the Shannon information entropy of differential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redundancy of epitope data, to express their structural diversity in a biologically meaningful manner (i.e., explicitly in terms of relative binding affinity, which underlies the balance between the inversely related emergent continuum phenomena of immunologic specificity and cross-reactivity). This is subject to the caveat that assuming an idealized epitope-binding site of optimal steric and electrostatic complementarity (e.g., comparable to what is typically observed in a natively folded wild-type protein) may both overestimate affinity for an immunogen epitope and underestimate affinity for antigen epitopes structurally different from the immunogen epitope (thereby possibly overestimating immunologic specificity as a consequence of underestimating potential for cross-reactivity). These considerations could guide the development of immunization strategies (e.g., via vaccination) and immunodiagnostics (e.g., to produce peptidic constructs suitable for eliciting anti-peptide antibodies that either protect against disease or serve to detect particular antigens in biological samples via immunoassays), noting that the practical significance of immunologic specificity and cross-reactivity is highly context-dependent.

Figure 6: Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c). Horizontal axis: sequence length; vertical axis (logarithmic): sequence count or sequence count − reduced count. Sequence counts (“◻”) are total epitope counts in IEDB. Corresponding reduced counts ($r$ in (1)) are expressed as the difference (sequence count − reduced count), with functional similarity equated with either fractional aligned-sequence identity (“×”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). Missing “◻” indicates sequence count = 0 (e.g., for B-cell epitope sequence length less than 4 in (a)); where “◻” is present, a missing symbol for either difference indicates sequence count − reduced count = 0 (e.g., with the entropy-based difference missing for B-cell epitope sequence length = 5 in (a)). The symbols for both differences are missing where sequence count = 1 (e.g., for B-cell epitope sequence length = 43 in (a)), as reduced count = 1 for sequence count = 1.

From the perspective of immunization, immunity (e.g., induced via vaccination or passive transfer of immune-system components) may be useful if specifically directed against a pathogen epitope so as to favor pathogen elimination without producing harmful hypersensitivity reactions (e.g., in the form of allergy or autoimmunity). Yet, such immunity might be even more useful if it also favored elimination of multiple antigenically related variant pathogens via cross-reaction (such that a single vaccine epitope elicited the production of immune-system components each cross-reactive with multiple variant pathogen epitopes) while still avoiding hypersensitivity. Cross-reactivity of immune responses is thus potentially useful within limits defined by risk of harmful hypersensitivity (e.g., due to cross-reaction with host self-antigens). As regards immunodiagnosis, an immunodiagnostic test may be useful if it enables specific pathogen detection via discrimination between a pathogen epitope and a set of other biologically relevant epitopes (e.g., of the host or its commensal symbionts in health or of other pathogens for which the clinical implications are very different). Yet, the test might be even more useful if it also enabled detection of multiple antigenically related variant pathogens (such that a single immunologic probe detected multiple variant pathogen epitopes) while still retaining its discriminatory power with respect to other biologically relevant epitopes, particularly where the variant pathogens are very similar to one another in terms of their clinical implications (e.g., prognosis and therapeutic options). Hence, cross-reactivity strongly influences the outcomes of both immunization and immunodiagnosis, for which reason their development could be supported by computational tools that estimate epitope-binding affinity for cross-reactions.

In order to estimate the epitope-binding affinity of an immune-system component (e.g., antibody) elicited in response to (and thus having a binding site for) immunogen epitope $i$ for antigen epitope $j$, one possible approach would be to express affinity in terms of association constants, for example, as

$$K_{ij} = K_{ii} Z_{ij}, \tag{10}$$

where $K_{ij}$ is the association constant for cross-reaction with antigen epitope $j$, $K_{ii}$ is the association constant for reaction with immunogen epitope $i$, and $Z_{ij}$ is a correction term to account for the difference between the two association constants. In turn, $K_{ii}$ might be estimated as

$$K_{ii} = \exp\left(-\frac{\Delta G_{ii}}{RT}\right), \tag{11}$$

where $\Delta G_{ii}$ is the free-energy change for reaction with immunogen epitope $i$, while both $R$ and $T$ retain their meanings as in (7). In cases where epitopes $i$ and $j$ are both $m$ residues in length, $Z_{ij}$ might be estimated as

$$Z_{ij} = \begin{cases} \prod_{k=1}^{m} s_{ijk}, & \text{if } \prod_{k=1}^{m} s_{ijk} > K_{ii}^{-1}, \\ K_{ii}^{-1}, & \text{otherwise}, \end{cases} \tag{12}$$

where the product retains its meaning as in (6), such that $K_{ij} \geq 1$ (with $K_{ij} = 1$ for nonspecific binding). $\Delta G_{ii}$ could be estimated from epitope structure (e.g., using structural energetics [40–42]) subject to mechanism-specific constraints (e.g., on B-cell affinity maturation [43]), and $s_{ijk}$ could be estimated along similar lines with regard to $\Delta\Delta G_{ijk}$ in (7).

The preceding discussion has alluded to epitope binding as if it were a binary interaction, which is strictly true only for B-cell epitopes (e.g., bound by surface immunoglobulin or antibody); in the context of T-cell epitopes, the epitope-binding site is to be understood as a bipartite entity consisting of an MHC molecule and a T-cell receptor (TCR), such that the epitope becomes at least partly confined between the two binding-site components. The overall process of T-cell recognition is subject to thermodynamic and kinetic constraints on initial MHC-peptide binding [44] and subsequent TCR recognition of MHC-peptide complex [45], which potentially complicate prediction of T-cell cross-reactivity and functional consequences thereof (e.g., considering how binding affinity and receptor numbers interact to influence T-cell effector function [45]). Still, predicting B-cell epitope cross-reactivity is challenging in view of the greater structural diversity of B-cell epitopes, especially where only a fraction of the epitope surface contributes to the epitope-paratope interface, such that cross-reaction may occur with variations in epitope structure beyond the interface. Hence, although the analysis herein for continuous epitopes might be generalized to discontinuous epitopes with indexing of residues by relative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignment gaps, correlating variations in epitope-residue structure with potential for cross-reactivity would be nontrivial.
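Setting aside the T-cell complications just noted, the estimate in (10) through (12) might be evaluated numerically along the following lines. This is a sketch under stated assumptions rather than an established implementation: the function names are hypothetical, free energies are assumed to be in kcal mol⁻¹ (consistent with the constants quoted for (8)), the default temperature is the 310.15 K used earlier, and the per-residue factors $s_{ijk}$ are taken as given inputs.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1 (unit choice is an assumption)


def association_constant(delta_g_ii: float, temperature: float = 310.15) -> float:
    """K_ii from (11): exp(-dG_ii / (R T))."""
    return math.exp(-delta_g_ii / (R_KCAL * temperature))


def cross_reaction_constant(
    delta_g_ii: float, s_values: Sequence[float], temperature: float = 310.15
) -> float:
    """K_ij from (10), with the correction term Z_ij from (12)."""
    k_ii = association_constant(delta_g_ii, temperature)
    product = math.prod(s_values)  # product of s_ijk over the m positions
    # Z_ij is floored at 1/K_ii, so that K_ij equals 1 for nonspecific binding.
    z_ij = product if product > 1.0 / k_ii else 1.0 / k_ii
    return k_ii * z_ij


# Hypothetical inputs: dG_ii = -10 kcal/mol and a single mismatched position
# with s = 0.5 among nine residues.
print(cross_reaction_constant(-10.0, [1.0] * 8 + [0.5]))
```

With all factors equal to 1 the result reduces to $K_{ii}$, and with a vanishingly small product it reduces to 1, mirroring the two branches of (12).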

The analysis developed herein conceivably could be refined to better account for MHC-peptide binding as distinct from subsequent binding of MHC-peptide complex by TCR, cognizant of MHC polymorphism. Hence, MHC-peptide binding could be analyzed with explicit consideration of pertinent MHC structural details (e.g., noting that unfavorable peptide backbone conformations may preclude accommodation of peptide within the MHC binding cleft and emphasizing the potential affinity contributions of prospective anchor residues on peptides vis-à-vis corresponding MHC binding pockets). Subsequent binding of MHC-peptide complex by TCR could be analyzed separately, with emphasis on peptide surfaces accessible for contact by TCR (e.g., excluding those buried within MHC binding pockets). This could be more readily applied to class I rather than class II MHC molecules, as the latter typically have binding clefts that allow for variable-length peptide overhangs. The analysis for MHC-peptide binding might be extended for class II via docking simulations to predict the MHC-bound conformations and affinity contributions of the overhangs, after which the binding of MHC-peptide complex by TCR could be analyzed with inclusion of any overhang surfaces that are likely to be accessible for contact by TCR.

At a more fundamental level, apparently suboptimal complementarity (manifesting as submaximal immunologic specificity) may be an inherent tradeoff in attaining optimal immune function from the standpoint of biological fitness (e.g., enabling each immune-system component to recognize a variety of structurally distinct epitopes, albeit at the cost of binding each of these epitopes with only submaximal affinity). Investigation of this possibility may lead to insights on how immune responses might be optimally biased (e.g., via vaccination or immunotherapy). Moreover, the asymmetry of cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenic differences in terms of antigenic distance as a metric that is independent of immunization history.

4. Summary and Conclusions

In assembling epitope datasets (e.g., to develop and benchmark epitope-prediction tools), sequence redundancy among epitopes must be considered to avoid misleading results that reflect overrepresentation of functionally similar epitopes. However, potentially useful data may be needlessly discarded by excluding epitopes that share an apparently high degree of sequence similarity, as even only a single-residue difference may manifest as extreme functional dissimilarity between epitopes. The present work thus introduces a reduced epitope count, as defined using (1) and (2), to account for functional redundancy of epitope data (FRED) within an epitope dataset (such that FRED is quantified as the difference between the total and reduced epitope counts). This can be used, for example, to characterize the dataset instead of excluding epitopes with apparently similar sequences from it, thereby maximizing the use of available epitope data.

More importantly, the present work frames FRED in terms of potential antigenic cross-reactivity (rather than sequence distance) construed as functional similarity between epitopes. Hence, pairwise comparison of epitopes is performed such that each epitope pair considered is regarded as consisting of an immunogen epitope (which elicits the immune response of interest) and an antigen epitope (which is the target of cross-reactive binding in an immunoassay to evaluate the immune response). Functional similarity between epitopes is thus defined in a physicochemically and biologically meaningful way as the Shannon information entropy for differential epitope binding in (5), which is compatible with the possibility of extreme functional dissimilarity between epitopes due to seemingly minor differences in sequence. Consequently, the complement of functional similarity is not a distance (in the sense of a unique value quantifying the separation between two points in sequence space), as its value may vary with reversal of the roles (i.e., immunogen or antigen) assigned to epitopes under pairwise comparison. However, it is nonetheless a measure of epitope dissimilarity, such that summation of its values over all epitopes of a dataset according to (2) enables calculation of an averaged contribution of each epitope to the reduced epitope count. This circumvents the conventional dichotomous labeling of epitopes as either redundant or nonredundant based on an arbitrarily selected threshold value of sequence similarity. Such labeling is problematic in that selection of the actual threshold value is difficult to justify on physicochemically and biologically meaningful grounds and because an epitope thus might be labeled as redundant on the basis of similarity to a minority of other epitopes in the dataset.

In line with the preceding considerations, epitope functional similarity may be estimated in terms of differences between immunogen- and antigen-epitope structure relative to an idealized binding site of high complementarity to the immunogen epitope. However, this tends to underestimate potential for cross-reactivity, which suggests that epitope-binding site complementarity is typically suboptimal. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes. Such cross-reactivity conceivably could be exploited in the development of practical applications such as vaccines and immunodiagnostics (e.g., to aim for beneficially broad cross-reactivity rather than overly extreme immunologic specificity).

Competing Interests

The author declares that they have no competing interests.

Acknowledgments

This work was supported by an Angelita T. Reyes Centennial Professorial Chair grant awarded to the author.

References

[1] S. E. C. Caoili, “Antidotes, antibody-mediated immunity and the future of pharmaceutical product development,” Human Vaccines and Immunotherapeutics, vol. 9, no. 2, pp. 294–299, 2013.

[2] S. E. C. Caoili, “Beyond new chemical entities: advancing drug development based on functional versatility of antibodies,” Human Vaccines and Immunotherapeutics, vol. 10, no. 6, pp. 1639–1644, 2014.

[3] B. Greenwood, “The contribution of vaccination to global health: past, present and future,” Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 369, no. 1645, Article ID 20130433, 2014.

[4] M. H. V. Van Regenmortel, “What is a B-cell epitope?” Methods in Molecular Biology, vol. 524, pp. 3–20, 2009.

[5] M. H. V. Regenmortel, “Specificity, polyspecificity, and heterospecificity of antibody-antigen recognition,” Journal of Molecular Recognition, vol. 27, no. 11, pp. 627–639, 2014.

[6] N. Tomar and R. K. De, “Immunoinformatics: a brief review,” Methods in Molecular Biology, vol. 1184, pp. 23–55, 2014.

[7] S. E. C. Caoili, “An integrative structure-based framework for predicting biological effects mediated by antipeptide antibodies,” Journal of Immunological Methods, vol. 427, pp. 19–29, 2015.

[8] S. E. C. Caoili, “Immunization with peptide-protein conjugates: impact on benchmarking B-cell epitope prediction for vaccine design,” Protein and Peptide Letters, vol. 17, no. 3, pp. 386–398, 2010.

[9] S. E. C. Caoili, “Hybrid methods for B-cell epitope prediction,” Methods in Molecular Biology, vol. 1184, pp. 245–283, 2014.

[10] S. E. C. Caoili, “Benchmarking B-cell epitope prediction with quantitative dose-response data on antipeptide antibodies: towards novel pharmaceutical product development,” BioMed Research International, vol. 2014, Article ID 867905, 13 pages, 2014.

[11] U. Hobohm, M. Scharf, R. Schneider, and C. Sander, “Selection of representative protein data sets,” Protein Science, vol. 1, no. 3, pp. 409–417, 1992.

[12] U. Lessel and D. Schomburg, “Creation and characterization of a new, non-redundant fragment data bank,” Protein Engineering, vol. 10, no. 6, pp. 659–664, 1997.

[13] T. Noguchi and Y. Akiyama, “PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003,” Nucleic Acids Research, vol. 31, no. 1, pp. 492–493, 2003.

[14] P. Motte, G. Alberici, M. Ait-Abdellah, and D. Bellet, “Monoclonal antibodies distinguish synthetic peptides that differ in one chemical group,” The Journal of Immunology, vol. 138, no. 10, pp. 3332–3338, 1987.

[15] M. K. Gilson and S. E. Radford, “Protein folding and binding: from biology to physics and back again,” Current Opinion in Structural Biology, vol. 21, no. 1, pp. 1–3, 2011.

[16] B. Rost, “Twilight zone of protein sequence alignments,” Protein Engineering, vol. 12, no. 2, pp. 85–94, 1999.

[17] B. Y. Khor, G. J. Tye, T. S. Lim, and Y. S. Choong, “General overview on structure prediction of twilight-zone proteins,” Theoretical Biology and Medical Modelling, vol. 12, article 15, 2015.

[18] H. Nakamura, “Roles of electrostatic interaction in proteins,” Quarterly Reviews of Biophysics, vol. 29, no. 1, pp. 1–90, 1996.

[19] P. J. Fleming and G. D. Rose, “Do all backbone polar groups in proteins form hydrogen bonds?” Protein Science, vol. 14, no. 7, pp. 1911–1917, 2005.

[20] S. Basu, D. Bhattacharyya, and R. Banerjee, “Self-complementarity within proteins: bridging the gap between binding and folding,” Biophysical Journal, vol. 102, no. 11, pp. 2605–2614, 2012.

[21] Y. Harpaz, M. Gerstein, and C. Chothia, “Volume changes on protein folding,” Structure, vol. 2, no. 7, pp. 641–649, 1994.

[22] A. Handisurya, S. Gilch, D. Winter, et al., “Vaccination with prion peptide-displaying papillomavirus-like particles induces autoantibodies to normal prion protein that interfere with pathologic prion protein production in infected cells,” FEBS Journal, vol. 274, no. 7, pp. 1747–1758, 2007.

[23] M. Plebanski, E. A. M. Lee, C. M. Hannan, et al., “Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming,” Nature Medicine, vol. 5, no. 5, pp. 565–571, 1999.

[24] E. Michaelsson, M. Andersson, A. Engstrom, and R. Holmdahl, “Identification of an immunodominant type-II collagen peptide recognized by T cells in H-2q mice: self tolerance at the level of determinant selection,” European Journal of Immunology, vol. 22, no. 7, pp. 1819–1825, 1992.

[25] J. M. Greer, C. Klinguer, E. Trifilieff, R. A. Sobel, and M. B. Lees, “Encephalitogenicity of murine but not bovine DM20 in SJL mice is due to a single amino acid difference in the immunodominant encephalitogenic epitope,” Neurochemical Research, vol. 22, no. 4, pp. 541–547, 1997.

[26] A. Gaur, S. A. Boehme, D. Chalmers et al., “Amelioration of relapsing experimental autoimmune encephalomyelitis with altered myelin basic protein peptides involves different cellular mechanisms,” Journal of Neuroimmunology, vol. 74, no. 1-2, pp. 149–158, 1997.

[27] A. T. Kozhich, Y.-I. Kawano, C. E. Egwuagu et al., “A pathogenic autoimmune process targeted at a surrogate epitope,” The Journal of Experimental Medicine, vol. 180, no. 1, pp. 133–140, 1994.

[28] H. Masaki, M. Tamura, and I. Kurane, “Induction of cytotoxic T lymphocytes of heterogeneous specificities by immunization with a single peptide derived from influenza A virus,” Viral Immunology, vol. 13, no. 1, pp. 73–81, 2000.

[29] G. B. Lipford, S. Bauer, H. Wagner, and K. Heeg, “Peptide engineering allows cytotoxic T-cell vaccination against human papilloma virus tumour antigen E6,” Immunology, vol. 84, no. 2, pp. 298–303, 1995.

[30] C. L. Keech, K. C. Pang, J. McCluskey, and W. Chen, “Direct antigen presentation by DC shapes the functional CD8+ T-cell repertoire against the nuclear self-antigen La-SSB,” European Journal of Immunology, vol. 40, no. 2, pp. 330–338, 2010.

[31] S. Tangri, G. Y. Ishioka, X. Huang et al., “Structural features of peptide analogs of human histocompatibility leukocyte antigen class I epitopes that are more potent and immunogenic than wild-type peptide,” Journal of Experimental Medicine, vol. 194, no. 6, pp. 833–846, 2001.

[32] S. J. Turner and F. R. Carbone, “A dominant Vβ bias in the CTL response after HSV-1 infection is determined by peptide residues predicted to also interact with the TCR β-chain CDR3,” Molecular Immunology, vol. 35, no. 5, pp. 307–316, 1998.

[33] E. Lazoura, J. Lodding, W. Farrugia et al., “Enhanced major histocompatibility complex class I binding and immune responses through anchor modification of the non-canonical tumour-associated mucin 1-8 peptide,” Immunology, vol. 119, no. 3, pp. 306–316, 2006.

[34] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.

[35] E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review, vol. 106, no. 4, pp. 620–630, 1957.

[36] E. T. Jaynes, “Information theory and statistical mechanics. II,” Physical Review, vol. 108, no. 2, pp. 171–190, 1957.

[37] A. E. Eriksson, W. A. Baase, X.-J. Zhang et al., “Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect,” Science, vol. 255, no. 5041, pp. 178–183, 1992.

[38] B. Honig and A.-S. Yang, “Free energy balance in protein folding,” Advances in Protein Chemistry, vol. 46, pp. 27–58, 1995.

[39] R. Vita, J. A. Overton, J. A. Greenbaum et al., “The immune epitope database (IEDB) 3.0,” Nucleic Acids Research, vol. 43, no. 1, pp. D405–D412, 2015.

[40] S. E. C. Caoili, “A structural-energetic basis for B-cell epitope prediction,” Protein and Peptide Letters, vol. 13, no. 7, pp. 743–751, 2006.

[41] S. E. C. Caoili, “Benchmarking B-cell epitope prediction for the design of peptide-based vaccines: problems and prospects,” Journal of Biomedicine and Biotechnology, vol. 2010, Article ID 910524, 14 pages, 2010.

[42] S. P. Edgcomb and K. P. Murphy, “Structural energetics of protein folding and binding,” Current Opinion in Biotechnology, vol. 11, no. 1, pp. 62–66, 2000.

[43] S. E. C. Caoili, “On the meaning of affinity limits in B-cell epitope prediction for antipeptide antibody-mediated immunity,” Advances in Bioinformatics, vol. 2012, Article ID 346765, 17 pages, 2012.

[44] N. Dalchau, A. Phillips, L. D. Goldstein et al., “A peptide filtering relation quantifies MHC class I peptide optimization,” PLoS Computational Biology, vol. 7, no. 10, Article ID e1002144, 2011.

[45] L. J. Edwards and B. D. Evavold, “T cell recognition of weak ligands: roles of signaling, receptor number, and affinity,” Immunologic Research, vol. 50, no. 1, pp. 39–48, 2011.

are separated in the sequence by intervening residues) and typical T-cell epitopes (each bound by an MHC molecule for presentation to T cells). For these, classical heuristic approaches to decrease the redundancy entail setting a similarity threshold (typically expressed as a fraction of identical residues for a pair of aligned sequences) such that subsequent analyses may compensate accordingly (e.g., by excluding sequences sharing a degree of similarity above the threshold). This practice is well established for general-purpose protein structural analyses [11–13] but potentially problematic if applied to peptidic epitopes, in view of the nonlinear relationship between sequence similarity and antigenic similarity (e.g., as demonstrated by radically divergent antigenic properties arising from a structural difference of only a single chemical group [14]). This suggests the need for a more functionally meaningful alternative approach to expressing redundancy of epitope data.

Protein folding and binding [15] may be regarded as manifestations of the same underlying phenomenon, driven by the hydrophobic effect favoring burial of nonpolar surfaces in general away from solvent water, albeit with more selective burial of polar surfaces that favors complementary pairing between hydrogen-bond donors and acceptors. Residues that thus become completely buried (e.g., within the core of a folded protein or at the binding interface of a ligand-receptor complex) are sterically and electrostatically constrained by surrounding residues to a much greater extent than unfolded or even folded but only partially buried residues (e.g., at solvent-exposed protein surfaces). Consequently, molecular recognition of epitopes, which is mediated by local ligand-receptor binding interactions, depends on sequence details much more than overall (i.e., global) features of protein structure do, notably in the sense of protein folds.

Proteins may share the same fold (e.g., as demonstrated by structural superposition of their backbones) well into the so-called twilight zone below the threshold for reliable detection of aligned-sequence similarity (i.e., less than 35% pairwise sequence identity) [16, 17]. Residue substitutions can be tolerated at surface-exposed positions (e.g., with replacement of certain polar residues by others whose side-chains differ in steric and electrostatic properties). Even at buried-core positions, certain nonpolar residues may be replaced by others whose side-chains differ in volume, especially where additional substitutions or other changes compensate for the volume differences, although the introduction of unsatisfied hydrogen-bond donors or acceptors and of unpaired formal charges tends to be poorly tolerated [18, 19]. However, even just a single-residue substitution in an epitope may abolish epitope-specific immune binding (e.g., by antibodies) if surface complementarity is disrupted at the epitope-binding interface (much as the structural organization of a protein core is disrupted by a radical substitution), as observed with immune evasion by pathogen escape mutants. Hence, the notion of conservative residue substitutions as conceptualized for evolving protein sequences (e.g., on the basis of changes in both volume and hydrophobicity) is of questionable applicability to epitopes, as it obscures crucial details of changes in steric and electrostatic complementarity.

From the perspective of immune function, redundancy of epitope data arguably implies potential for antigenic cross-reaction (i.e., recognition of different epitopes by a common B- or T-cell receptor or equivalent thereof) rather than sequence similarity defined without reference to any immune-system component. Hence, the present work outlines a generic theoretical framework for quantitatively expressing epitope data redundancy in terms of potential antigenic cross-reactivity, as estimated from an idealized limiting case of optimal ligand-receptor complementarity (comparable to the steric and electrostatic self-complementarity observed within folded proteins [20]). This is applied to linear peptidic epitopes for clarity of illustration and considering their envisioned roles as immune targets in relation to peptide-based vaccines and immunodiagnostics. Theoretical results are compared with informative available experimental data on antigenic cross-reactivity of structurally related short peptides, in light of possible suboptimal ligand-receptor complementarity and its biomedical implications vis-a-vis epitope sequence variation (e.g., arising as pathogen mutations and host polymorphisms).

2. Theory and Methods

2.1. Functional Redundancy of Epitope Data. The present work aims to support the use of available empirical data to further develop epitope-prediction methods without discarding unique epitopes deemed redundant on the basis of some sequence similarity threshold. Epitope data thus retained must be characterized as to redundancy, for which a reduced epitope count is introduced below. This is based on epitope similarity defined in terms of either aligned-sequence similarity or functional similarity as regards cross-reactive epitope binding. Aligned-sequence similarity is explored first because it is a familiar and readily calculated quantity. However, its limitations become apparent in the context of attempting to describe antigenic cross-reaction from a physicochemical perspective. Hence, functional similarity is also explored.

For a set of epitopes (construed as epitope structures, e.g., linear peptidic sequences), functional redundancy can be expressed as a reduced epitope count $r$ such that $1 \le r \le n$, where $n$ is the total epitope count, with $r = 1$ and $r = n$, respectively, corresponding to extreme cases wherein every epitope is either maximally or minimally similar to every other epitope from a functional standpoint. Thus, structurally identical epitopes would be maximally similar to one another, but structurally nonidentical epitopes might also be regarded as maximally similar if their structural difference was negligible or otherwise irrelevant from a functional standpoint. For simplicity, further elaboration of concepts herein focuses on cases wherein all epitopes are structurally unique peptidic sequences of equal length. (The qualifier of “structurally unique” is consistent with real-world epitope databases insofar as they regard each epitope as a structurally unique entity. The qualifier of “equal length” is clearly applicable to typical class I MHC-restricted T-cell epitopes and relatively short linear B-cell epitopes, albeit less so for longer B-cell epitopes and class II MHC-restricted T-cell epitopes.)

For a set of $n$ structurally unique epitopes, the reduced epitope count may be defined as

$$r = \sum_{i=1}^{n} w_i, \tag{1}$$

where $w_i$ is the contribution of the $i$th epitope to $r$. In turn, $w_i$ may be defined as

$$w_i = \frac{1 + \sum_{j=1}^{n} \left(1 - S_{ij}\right)}{n}, \tag{2}$$

where $S_{ij}$ is the functional similarity between the $i$th and $j$th epitopes such that $0 \le S_{ij} \le 1$, with $S_{ij} = 0$ and $S_{ij} = 1$, respectively, corresponding to extreme cases wherein the $i$th and $j$th epitopes are either minimally or maximally similar from a functional standpoint. If all the epitopes are of uniform sequence length $m$ amino-acid residues, $S_{ij}$ might be defined for a pairwise epitope sequence alignment as

$$S_{ij} = \frac{\sum_{k=1}^{m} s_{ijk}}{m}, \tag{3}$$

where $s_{ijk}$ is the functional similarity between amino-acid residues $a_{ik}$ and $a_{jk}$ at the $k$th sequence positions of the aligned $i$th and $j$th epitope sequences, respectively, such that $0 \le s_{ijk} \le 1$, with $s_{ijk} = 0$ and $s_{ijk} = 1$, respectively, corresponding to extreme cases wherein $a_{ik}$ and $a_{jk}$ are either minimally or maximally similar from a functional standpoint. Accordingly, $s_{ijk}$ might be simply defined as

$$s_{ijk} = \begin{cases} 0 & \text{if } a_{ik} \ne a_{jk}, \\ 1 & \text{if } a_{ik} = a_{jk}, \end{cases} \tag{4}$$

for (3) to yield the fraction of identical aligned residues, which is a conventional measure of protein sequence similarity [11–13]. However, this is an oversimplification that fails to capture the possibility of minimal functional similarity between epitopes (e.g., operationally defined as undetectable antigenic cross-reactivity in an immunoassay) arising from a structural difference of only a single chemical group [14], unless the possibility of antigenic cross-reaction is denied altogether.
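The definitions in (1) through (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used for the reported results; the function names `fractional_identity` and `reduced_epitope_count` are hypothetical, and the similarity function is passed as a parameter so that a different definition of $S_{ij}$ could be substituted.

```python
from typing import Callable, Sequence


def fractional_identity(seq_i: str, seq_j: str) -> float:
    """S_ij per Eq. (3) with the residue rule of Eq. (4):
    the fraction of identical residues in two aligned, equal-length sequences."""
    if len(seq_i) != len(seq_j) or not seq_i:
        raise ValueError("epitopes must be nonempty and of equal length")
    matches = sum(1 for a, b in zip(seq_i, seq_j) if a == b)
    return matches / len(seq_i)


def reduced_epitope_count(
    epitopes: Sequence[str],
    similarity: Callable[[str, str], float] = fractional_identity,
) -> float:
    """r per Eqs. (1) and (2) for a set of structurally unique epitopes."""
    n = len(epitopes)
    r = 0.0
    for i in range(n):
        # Sum of (1 - S_ij) over all j; the j == i term contributes zero.
        dissimilarity = sum(
            1.0 - similarity(epitopes[i], epitopes[j]) for j in range(n)
        )
        r += (1.0 + dissimilarity) / n  # w_i of Eq. (2)
    return r


if __name__ == "__main__":
    print(reduced_epitope_count(["AAAA", "AAAC"]))
    print(reduced_epitope_count(["AAAA", "CCCC", "DDDD"]))
```

For two four-residue sequences differing at one position, $S_{ij} = 0.75$ and each $w_i = 0.625$, so $r = 1.25$; for three sequences sharing no aligned residues, $r = n = 3$, which matches the upper limiting case described above.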

A more realistic, physicochemically grounded alternative is to conceptualize functional similarity on the basis of biomolecular binding interactions, thereby relating functional similarity to binding affinity as measured via an immunoassay that employs some epitope-binding probe (e.g., antibody). Thus, functional similarity between epitopes (i.e., $S_{ij}$ of (2) and (3)) may be defined alternatively as the Shannon information entropy [34–36] for the equilibrium distribution of epitope-bound probe between the $i$th and $j$th epitopes, with both epitopes at the same concentration in the presence of much less probe, such that

$$S_{ij} = -\left(f_i \log_2 f_i + f_j \log_2 f_j\right), \tag{5}$$

where $f_i$ and $f_j$ are the fractions of epitope-bound probe bound to the $i$th and $j$th epitopes, respectively (noting that $f_i + f_j = 1$, and $S_{ij} = 1$ where the $i$th and $j$th epitopes are indistinguishable by means of the immunoassay). Constraining $f_j$ as $f_j \le f_i$, it may in turn be defined as

$$f_j = \frac{\prod_{k=1}^{m} s_{ijk}}{1 + \prod_{k=1}^{m} s_{ijk}}, \tag{6}$$

where $s_{ijk}$ retains its meaning as introduced in (3). However, retaining the definition of $s_{ijk}$ given by (4) would be tantamount to denying the possibility of antigenic cross-reaction; hence, an alternative is warranted.
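As an illustration of (5) and (6), the following Python sketch computes the entropy-based similarity from a list of residue-level similarities. The function names are hypothetical, and the term $0 \log_2 0$ is taken as zero by the usual convention, which is an assumption not spelled out in the text.

```python
import math
from typing import Sequence


def bound_fraction_altered(residue_similarities: Sequence[float]) -> float:
    """f_j per Eq. (6): fraction of epitope-bound probe on the j-th epitope."""
    product = math.prod(residue_similarities)
    return product / (1.0 + product)


def shannon_similarity(residue_similarities: Sequence[float]) -> float:
    """S_ij per Eq. (5), in bits, with 0 * log2(0) treated as 0 (assumed convention)."""
    f_j = bound_fraction_altered(residue_similarities)
    f_i = 1.0 - f_j
    return -sum(f * math.log2(f) for f in (f_i, f_j) if f > 0.0)


if __name__ == "__main__":
    print(shannon_similarity([1.0, 1.0, 1.0]))  # all residues maximally similar
    print(shannon_similarity([1.0, 0.0, 1.0]))  # one residue minimally similar
```

With every $s_{ijk} = 1$ the probe is split evenly ($f_i = f_j = 0.5$) and $S_{ij} = 1$. A single $s_{ijk} = 0$ drives the product, and hence $S_{ij}$, to zero, which is why the all-or-nothing rule of (4) leaves no room for cross-reaction between nonidentical sequences.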

A plausible starting point is the essential unity of protein folding and binding phenomena based on steric and electrostatic complementarity of molecular surfaces, as observed within typical folded proteins [20]. If this is regarded as representative of ligand-receptor interfaces wherein the ligand is a peptidic epitope bound to one or more protein receptors (e.g., antibody, or MHC molecule and T-cell receptor), an idealized epitope-binding site may be devised to estimate epitope similarity as potential antigenic cross-reactivity in terms of structural differences, assuming optimal complementarity with all epitope atoms completely surrounded by and close-packed against the binding-site contact atoms. Hence, suboptimal complementarity would arguably result if even a single epitope amino-acid residue was structurally altered. Notably, steric clashes conceivably would preclude antigenic cross-reaction if the altered residue was sterically incompatible with its original counterpart, in the sense of failing to assume any conformation allowing it to fit entirely within the region of space (as defined by residue van der Waals surfaces) that otherwise would have been occupied by the original residue at the binding site. Thus, functional similarity between epitope amino-acid residues (i.e., $s_{ijk}$ of (3), (4), and (6)) may be defined alternatively as

$$s_{ijk} = \begin{cases} 0 & \text{if } a_{ik}, a_{jk} \text{ sterically incompatible}, \\ \exp\left(-\dfrac{\Delta\Delta G_{ijk}}{RT}\right) & \text{otherwise}, \end{cases} \tag{7}$$

where $i$ and $j$ denote the original and altered epitopes, respectively; $a_{ik}$ and $a_{jk}$ retain their meanings as in (4); $\Delta\Delta G_{ijk}$ is the contribution to the change in the overall free-energy change of epitope binding due to replacement of $a_{ik}$ by $a_{jk}$; $R$ is the gas constant; and $T$ is the absolute temperature. $\Delta\Delta G_{ijk}$ is analogous to the change in the free-energy change of folding due to replacement of the $k$th residue of a protein by a structurally different residue (e.g., to generate a mutant version of a wild-type protein).
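One way to read (6) and (7) together, offered here as an interpretive sketch rather than a statement from the original derivation, is that the product of residue-level similarities acts as the ratio of probe bound to the altered epitope versus the original epitope. Assuming every substitution is sterically compatible, so that the exponential branch of (7) applies at each position:

```latex
\begin{align*}
f_i &= 1 - f_j = \frac{1}{1 + \prod_{k=1}^{m} s_{ijk}}, \\
\frac{f_j}{f_i} &= \prod_{k=1}^{m} s_{ijk}
  = \exp\!\left(-\frac{1}{RT} \sum_{k=1}^{m} \Delta\Delta G_{ijk}\right).
\end{align*}
```

Under this reading the per-residue contributions $\Delta\Delta G_{ijk}$ add up to an overall free-energy difference between the two epitopes, and the constraint $f_j \le f_i$ corresponds to that sum being nonnegative. A single sterically incompatible position sets the product to zero, giving $f_j = 0$ and hence $S_{ij} = 0$ in (5).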

To evaluate (7), steric incompatibility is posited for every substitution wherein the net change in volume is positive (i.e., wherever bulk increases) and also for other substitutions wherein stereochemical differences (e.g., relating to bond angles and branching) are deemed sufficient to result in steric clashes that preclude antigenic cross-reaction. Steric incompatibility is conceptualized herein on the basis of grouping the 20 proteinogenic amino acids into subsets according to stereochemical similarity, as depicted in Figure 1. Subset assignments are based on the covalent bonding geometry of nonhydrogen side-chain atoms. In particular, these atoms are characterized as either tetrahedral or trigonal planar, and branching of tetrahedral atoms is also considered. For example, methyl and methylene carbon atoms are tetrahedral, as are sulfur and amine nitrogen atoms. Hence, the sulfhydryl group of Cys is regarded as isosteric with a methyl group, whereas the sulfur atom of Met and the side-chain imino group of Arg are both regarded as isosteric with a methylene group. In contrast, amide and carboxyl carbon atoms are trigonal planar rather than tetrahedral.

Figure 1: Subsets of the 20 proteinogenic amino acids grouped by stereochemical similarity. Structures were obtained from the ChEBI (Chemical Entities of Biological Interest) database (http://www.ebi.ac.uk/chebi). Within each subset comprising two or more amino acids, these are arranged from left to right in order of decreasing size, such that sterically incompatible changes are avoided by substitution of a smaller amino acid for a larger one. The subsets are linked to form the residue substitution paths in Figure 2, wherein Subset 1 comprises an entire path while the other subsets comprise the upstream segments of paths converging on either Cys (C) or Ala (A). Panel labels:
Subset 1 (no trigonal planar atoms or branching at β through ε positions): Arg (R), Lys (K), Met (M), Cys (C), Ser (S), Ala (A), Gly (G).
Subset 2 (trigonal planar δ-carbon atom): Gln (Q), Glu (E).
Subset 3 (trigonal planar γ-carbon atom but no five-membered ring): Tyr (Y), Phe (F), Asn (N), Asp (D).
Subset 4 (trigonal planar γ-carbon atom in five-membered ring): Trp (W), His (H).
Subset 5 (branching at tetrahedral γ-carbon atom): Leu (L).
Subset 6 (branching at tetrahedral β-carbon atom): Ile (I), Val (V), Thr (T).
Subset 7 (with ring comprising main-chain nitrogen atom): Pro (P).

The subsets defined in Figure 1 are linked to form residue substitution paths avoiding sterically incompatible changes, as depicted in Figure 2. Subset 1 comprises an entire path, while the other subsets comprise the upstream segments of paths converging on either Cys or Ala. The path of Subset 2 converges on Cys, as members of Subset 1 larger than Cys are sterically incompatible with Subset 2, whose δ-carbon atom is trigonal planar. The path of Subset 3 converges on Ala rather than Cys because the γ-carbon atom is trigonal planar in Subset 3. The case of Subset 4 is thus similar to that of Subset 3, but the γ-carbon atom in Subset 4 is constrained as part of a five-membered ring, such that Subsets 3 and 4 are deemed sterically incompatible. The path of Subset 5 (Leu) converges on Cys, as members of Subset 1 larger than Cys are sterically incompatible with the branching at the γ-carbon atom of Leu. The path of Subset 6 converges on Ala instead of Cys because of the branching at the β-carbon atom of Subset 6. Finally, the path of Subset 7 (Pro) converges on Ala, as the Pro side-chain is constrained by ring formation with the main-chain nitrogen atom and thus distorted beyond the β-carbon atom.

Figure 2: Amino-acid residue substitution paths avoiding sterically incompatible changes (cf. (7)) for the 20 canonical proteinogenic residues. Numbers correspond to the subsets of stereochemically similar amino acids depicted in Figure 1. For every residue substitution along the paths shown, the number of unsatisfied hydrogen bonds is presented in Table 1.

Table 1: Numbers of unsatisfied hydrogen bonds for amino-acid residue substitutions of Figure 2, calculated according to (9).

     G  A  S  C  D  T  P  N  V  E  Q  H  L  I  M  K  R  F  Y  W
G    0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
A    0  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
S    1  1  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
C    0  0  1  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
D    2  2  -  -  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
T    1  1  0  1  -  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -
P    1  1  -  -  -  -  0  -  -  -  -  -  -  -  -  -  -  -  -  -
N    2  2  -  -  2  -  -  0  -  -  -  -  -  -  -  -  -  -  -  -
V    0  0  1  0  -  1  -  -  0  -  -  -  -  -  -  -  -  -  -  -
E    2  2  3  2  -  -  -  -  -  0  -  -  -  -  -  -  -  -  -  -
Q    2  2  3  2  -  -  -  -  -  2  0  -  -  -  -  -  -  -  -  -
H    2  2  -  -  -  -  -  -  -  -  -  0  -  -  -  -  -  -  -  -
L    0  0  1  0  -  -  -  -  -  -  -  -  0  -  -  -  -  -  -  -
I    0  0  1  0  -  1  -  -  0  -  -  -  -  0  -  -  -  -  -  -
M    0  0  1  0  -  -  -  -  -  -  -  -  -  -  0  -  -  -  -  -
K    1  1  2  1  -  -  -  -  -  -  -  -  -  -  1  0  -  -  -  -
R    3  3  4  3  -  -  -  -  -  -  -  -  -  -  3  4  0  -  -  -
F    0  0  -  -  2  -  -  2  -  -  -  -  -  -  -  -  -  0  -  -
Y    1  1  -  -  3  -  -  3  -  -  -  -  -  -  -  -  -  1  0  -
W    1  1  -  -  -  -  -  -  -  -  -  1  -  -  -  -  -  -  -  0

According to Figure 2, a substitution from any residue upstream to any residue downstream along a particular path avoids steric incompatibility. Conversely, downstream-to-upstream and off-path substitutions are deemed sterically incompatible. This is a qualitative first approximation using a highly idealized model rather than an attempt at definitive categorization.
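The upstream-to-downstream rule can be sketched as an ordered-list check. The example below encodes only the Subset 1 path, ordered by decreasing size (Arg, Lys, Met, Cys, Ser, Ala, Gly) as read from Figure 1 and the filled cells of Table 1; that encoding and the function name are assumptions made for illustration, and the remaining paths of Figure 2 are deliberately left out.

```python
from typing import Sequence

# Subset 1 path, upstream (largest) to downstream (smallest).
# This ordering is an illustrative reading of Figure 1 and Table 1.
SUBSET_1_PATH = ["R", "K", "M", "C", "S", "A", "G"]


def sterically_compatible_on_path(
    path: Sequence[str], original: str, altered: str
) -> bool:
    """True if replacing `original` by `altered` moves downstream along `path`
    (or leaves the residue unchanged); off-path residues are incompatible."""
    if original not in path or altered not in path:
        return False
    return path.index(original) <= path.index(altered)


if __name__ == "__main__":
    print(sterically_compatible_on_path(SUBSET_1_PATH, "K", "A"))  # downstream
    print(sterically_compatible_on_path(SUBSET_1_PATH, "A", "K"))  # upstream
    print(sterically_compatible_on_path(SUBSET_1_PATH, "K", "W"))  # off-path
```

The check is directional by design: Lys to Ala is treated as compatible, whereas Ala to Lys is not, because the larger residue could not fit within the space left by the smaller one at the idealized binding site.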

As regards $\Delta\Delta G_{ijk}$ (in (7)), this is related to amino-acid residue structural differences (as a qualitative first approximation rather than an attempt to obtain quantitatively very accurate results) by heuristic partitioning as

$$\Delta\Delta G_{ijk} = c\,\Delta V_{ijk} + b\,N_{ijk}, \tag{8}$$

where $c$ and $b$ are both proportionality constants, while $\Delta V_{ijk}$ is the change in residue volume and $N_{ijk}$ is the number of unsatisfied hydrogen bonds (Table 1, wherein the residues are ordered by their volumes [21]), considering that an energetic penalty is incurred with unsatisfied hydrogen bonds upon binding a suboptimally complementary epitope.

Table 1 is sparse, with a null value (shown as a dash) for most cells, because (8) is evaluated only for sterically compatible substitutions (cf. (7)). Hence, the numbers in Table 1 are for sterically compatible substitutions according to Figure 2. The said numbers were each obtained, assuming that every available highly electronegative atom participates in the formation of a hydrogen bond, as

$$N_{ijk} = A_{ijk} - B_{ijk}, \tag{9}$$

where $A_{ijk}$ is the total number of oxygen and nitrogen atoms in the aligned residues, while $B_{ijk}$ is the number of such atoms for which direct correspondence was affirmed as regards structural superposability, atom identity, and covalent bonding partners (according to Figure 1). Thus, $N_{ijk}$ increases with each failure to affirm a direct correspondence between hydrogen-bonding atoms. Physically, such failure corresponds to an unsatisfied hydrogen bond on the epitope, its binding site, or both.

Direct correspondence was affirmed for all main-chain (i.e., backbone) carbonyl oxygen atoms. In view of the difference between Pro and other residues at the main-chain nitrogen atom, direct correspondence was affirmed for this atom between Pro and itself as well as between non-Pro residues. Among side-chain atoms, direct correspondence was affirmed only between hydroxyl oxygen atoms of Ser and Thr, carbonyl oxygen atoms of Asp and Asn and of Glu and Gln, and δ-nitrogen atoms of His and Trp. To facilitate explication, a residue is termed polar if its side-chain contains at least one oxygen or nitrogen atom but nonpolar if otherwise. Accordingly, $N_{ijk} = 0$ for pairs of identical residues (along the diagonal of Table 1) and of non-Pro nonpolar residues. For Pro paired with Gly or Ala, $N_{ijk} = 1$ due to the main-chain nitrogen difference. For a polar residue paired with a non-Pro nonpolar residue, $N_{ijk}$ is the total number of side-chain oxygen and nitrogen atoms. For a pair of polar residues, the total number of side-chain oxygen and nitrogen atoms is an upper bound on $N_{ijk}$. Hence, $N_{ijk} = 3$ for Tyr paired with Asn, although $N_{ijk} = 2$ for Asn paired with Asp.

Provisional values for the proportionality constants $c$ and $b$ in (8) were obtained from literature. In particular, $c$ was regarded as the energetic penalty per unit volume of a cavity formed due to substitution of a less bulky residue within the interior of a folded protein, with value of $-0.024$ kcal mol$^{-1}$ Å$^{-1}$ estimated empirically from protein mutation studies [37], and $b$ was likewise regarded as the energetic cost of breaking a hydrogen bond, with a conservative estimated value of 0.5 kcal mol$^{-1}$ per bond [38].

Thus, using (1) through (8) in conjunction with Table 1, the literature values for constants in (8), and published amino-acid residue volumes [21], reduced epitope counts (1) were computed for artificial epitope sequences and also actual experimentally studied epitope sequences, assuming a temperature of 37 °C (310.15 K).
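The residue-level calculation in (7) and (8) can be sketched with the constants quoted above. This is a hedged illustration only: the published residue volumes [21] are not reproduced here, so the volume change is passed in by the caller, and it is assumed to be expressed in units matching the per-unit-volume constant $c$. The gas constant is taken as approximately $1.987 \times 10^{-3}$ kcal mol$^{-1}$ K$^{-1}$, and all names are hypothetical.

```python
import math
from typing import Optional

C_VOLUME = -0.024     # kcal/mol per unit volume, value quoted in the text
B_HBOND = 0.5         # kcal/mol per unsatisfied hydrogen bond, value quoted in the text
R_GAS = 1.987204e-3   # kcal/(mol K), gas constant
T_KELVIN = 310.15     # 37 degrees C, temperature assumed in the text


def delta_delta_g(delta_volume: float, n_unsatisfied: int) -> float:
    """Eq. (8): free-energy penalty for a sterically compatible substitution."""
    return C_VOLUME * delta_volume + B_HBOND * n_unsatisfied


def residue_similarity(delta_volume: float, n_unsatisfied: Optional[int]) -> float:
    """Eq. (7): 0 for sterically incompatible substitutions, else a Boltzmann factor.
    A positive volume change or a null Table 1 entry (None) counts as incompatible."""
    if n_unsatisfied is None or delta_volume > 0.0:
        return 0.0
    return math.exp(-delta_delta_g(delta_volume, n_unsatisfied) / (R_GAS * T_KELVIN))


if __name__ == "__main__":
    # Hypothetical inputs: a volume decrease of 20 units and one unsatisfied hydrogen bond.
    print(delta_delta_g(-20.0, 1))
    print(residue_similarity(-20.0, 1))
```

Because $c$ is negative and sterically compatible substitutions have a nonpositive volume change, both terms of (8) are nonnegative, so the resulting similarity stays within the interval from 0 to 1 required of $s_{ijk}$.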

2.2. Retrieval and Processing of Epitope Data. Epitope data were obtained via the Immune Epitope Database (IEDB, at http://www.iedb.org) [39] as database records retrieved through searches conducted from 4 to 6 October 2015 using its B-Cell Search and T-Cell Search facilities, in the latter case restricting searches to either MHC class I or II alleles. Each record thus retrieved pertained to an individual B-cell or T-cell assay and contained data fields defined in relation to the key concepts of “Object Type” (i.e., molecule type in the context of chemical structure), “Epitope Relation” (i.e., molecule relationship to epitope), “1st Immunogen” (i.e., immunogen initially administered to elicit the immune response), and “Antigen” (i.e., antigen used in the B-cell or T-cell assay). Searches were restricted such that the data fields named “1st Immunogen Object Type” and “1st Immunogen Epitope Relation” had values of “Linear Peptide” and “Epitope”, respectively, with the “MHC class” option set to either “I” or “II” for T-Cell Search. Employing this basic search strategy, both narrow and broad searches were conducted, the former to retrieve data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution and the latter to survey the entire repertoire of available data on linear peptidic epitopes (as illustrated by example in Figure 3).
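The basic search restrictions and the single-substitution criterion can be mimicked on records already held in memory. The sketch below is not an IEDB client: the dictionary-per-assay layout and the keys `immunogen_sequence` and `antigen_sequence` are hypothetical, and only the two field names and values quoted in the text are taken from the source.

```python
from typing import Dict, Iterable, List

# Field names and values quoted in the text; the record layout is hypothetical.
REQUIRED_VALUES = {
    "1st Immunogen Object Type": "Linear Peptide",
    "1st Immunogen Epitope Relation": "Epitope",
}


def matches_basic_search(record: Dict[str, str]) -> bool:
    """True if the record satisfies the basic search restrictions."""
    return all(record.get(field) == value for field, value in REQUIRED_VALUES.items())


def differs_by_single_substitution(seq_a: str, seq_b: str) -> bool:
    """True if two equal-length sequences differ at exactly one position."""
    if len(seq_a) != len(seq_b):
        return False
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y) == 1


def narrow_selection(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep records whose immunogen and antigen peptides differ by one residue."""
    return [
        rec
        for rec in records
        if matches_basic_search(rec)
        and differs_by_single_substitution(
            rec.get("immunogen_sequence", ""), rec.get("antigen_sequence", "")
        )
    ]
```

A record that passes `matches_basic_search` but whose two peptides differ in length or at more than one position would be retained only in the broad survey, not in the narrow selection.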

Figure 3: IEDB T-Cell Search facility interface (http://www.iedb.org/advancedQueryTcell.php). Example shown corresponds to narrow search for class I MHC-restricted T-cell assays with positive data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution. Searches for B-cell assays were performed using the IEDB B-Cell Search facility interface (http://www.iedb.org/advancedQueryBcell.php) of analogous form (e.g., without assay-related fields for “MHC allele”); narrow searches for negative data were performed without specified value for the field labeled as “Epitope Structure Defines”; and broad searches were performed without specified value for both said field and that named “Qualitative Measurement” (see main text for additional details).

[Plot: reduced epitope count (vertical axis, 0 to 20) versus sequence length (horizontal axis, 0 to 20).]

Figure 4: Reduced epitope counts (1) for peptidic sequences of uniform length varying only at a single-residue position. Each count corresponds to a set of 20 peptides representing every standard proteinogenic amino-acid residue at the variable-residue position. Functional similarity is equated with either fractional aligned-sequence identity (“□”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). In the latter case, counts were based on steric incompatibility only ((7) and Figure 2), both steric incompatibility and cavity formation ($c\Delta V_{ijk}$ in (8)), or steric incompatibility with both cavity formation and hydrogen bonding ((8) and (9) and Table 1), with each of these three cases plotted using its own marker.

Narrow searches were first attempted to retrieve records for which structural data on epitope binding were available, setting the “Yes” option for the Viewer Flag (under “3D Structure of Complex”). Subsequently, the searches were conducted to separately retrieve records curated as containing either negative or positive data on cross-reactivity and were restricted such that the data fields named “Antigen Object Type” and “Antigen Epitope Relation” had values of “Linear Peptide” and “Structurally Related,” respectively, with the data field named “Qualitative Measurement” having a value of either “negative” or “positive.” In the latter case, searches were further limited to class I MHC-restricted T-cell assays for which the data field named “Epitope Structure Defines” had a value of “Exact Epitope” (rather than “epitope containing region/antigenic site”), as cross-reaction may occur with structural differences outside any relevant epitopes unless this is for a peptide confined within a class I MHC binding cleft. For each record thus retrieved, the immunogen and antigen sequences (i.e., values of the data fields named “1st Immunogen Object Primary Molecule Sequence” and “Antigen Object Primary Molecule Sequence,” resp.) were compared. The record was considered for further processing only if the said sequences were of equal length and differed at exactly one amino-acid residue position, and the record was ultimately retained only if a corresponding record with a “Qualitative Measurement” data-field value of “positive” was found sharing the same immunogen and immunoassay protocol except that the antigen was identical to the immunogen (thus confirming that the immunogen had elicited a relevant anti-peptide immune response in the first place). The epitope data thus obtained were analyzed in light of the preceding Section 2.1.
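
The sequence-comparison criterion for narrow-search records (equal length, exactly one differing residue position) can be sketched in Python as follows. The function name and return convention are hypothetical choices for illustration; the sketch does not reproduce the actual processing of IEDB records, and the example sequences are the immunogen and substituted antigen implied by data row 2 of Table 2.

```python
def single_substitution(immunogen, antigen):
    """Return (position, immunogen_residue, antigen_residue) if the two
    sequences have equal length and differ at exactly one position
    (1-based); otherwise return None."""
    if len(immunogen) != len(antigen):
        return None
    differences = [
        (k, a, b)
        for k, (a, b) in enumerate(zip(immunogen, antigen), start=1)
        if a != b
    ]
    return differences[0] if len(differences) == 1 else None


if __name__ == "__main__":
    # Sequence context SYIXSAEKI with a P-to-A substitution at X
    print(single_substitution("SYIPSAEKI", "SYIASAEKI"))
```

A record pair passing this check would still need the matching positive control record described above before being retained.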

The broad searches were conducted to retrieve records on linear peptidic epitopes in general. Among the epitope records thus retrieved, those wherein the data field named “Modification” was nonempty (indicating amino-acid residue covalent modification) were excluded from further consideration, and the remainder were analyzed as regards functional redundancy according to the preceding Section 2.1.

3. Results and Discussion

3.1. Functional Redundancy versus Sequence. As depicted in Figure 4, computed functional redundancy increases with epitope sequence length if functional similarity is equated with fractional aligned-sequence identity (“□”), whereas it is independent of sequence length if functional similarity is equated with the Shannon information entropy for differential epitope binding (the three remaining data series). The latter approach thus better accounts for the possibility of a structural difference in only a single chemical group being sufficient to preclude antigenic cross-reaction, provided that the sequence under consideration is entirely encompassed by the relevant epitope structure, as structural differences may be functionally irrelevant if they lie outside the epitope. Moreover, computed functional redundancy decreases (i.e., the reduced epitope count approaches the total epitope count) when both steric incompatibility and the energetic penalty of cavity formation are considered instead of steric incompatibility alone. The redundancy decreases even further if the energetic penalty of unsatisfied hydrogen bonds (more generally representing suboptimal electrostatic complementarity) is also considered.

3.2. Empirical Cross-Reactivity Data. IEDB searches yielded 26 records on epitope cross-reactivity vis-à-vis single-residue substitution (Table 2), all of which contain only qualitative rather than quantitative binding data, without reference to atomic coordinates or other structural details of bound epitopes. The data suggest that the posited steric incompatibility (in (7) and Figure 2) assumes too little tolerance for suboptimal complementarity, especially as regards volume differences between immunogen and antigen in Figure 5, which depicts examples of cross-reaction despite substitutions increasing residue bulk (i.e., positive-data points to the left of the diagonal). This might be at least partially corrected by relaxing or even eliminating the steric-incompatibility criteria, possibly replacing them with a continuous-form shape-dependent energetic penalty (considering that they assume an infinite-step hard-sphere model of steric clashes between atoms, whereas potential energy may be more accurately represented by continuous functions of interatomic separation distances); likewise, electrostatic complementarity also might be more accurately described, for example, in terms of differential energetic contributions due to charged and uncharged hydrogen-bonding partners. However, typical epitope binding actually may be suboptimal, considering thermodynamic and kinetic constraints on affinity maturation in B-cell ontogeny and the typical absence of affinity maturation in T-cell ontogeny, which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity than immunogen.

Notwithstanding the problem of suboptimal complementarity described above, data presented in Table 2 suggest that the approach to epitope sequence redundancy proposed herein is more conceptually and practically meaningful than setting threshold (i.e., cutoff) values for fractional aligned-sequence identity. This is exemplified by the control-reaction epitopes of data rows 18 and 19 (having consensus sequence LLXRDSFEV). These epitopes differ from each other at just one residue position. As typical MHC class I-restricted T-cell epitopes, they are nonapeptides, for which the highest meaningful sequence-identity threshold value is 8/9 (∼89%).

Table 2: IEDB Qualitative-Outcome (Q) data on peptide cross-reactivity assays (cf. Figure 5).

| Row | Q | Assay type | MHC class | Parameter measured (cf. IEDB “measurement of” data field) | IEDB ID, control reaction | IEDB ID, cross-reaction | IEDB Epitope ID | Substitution | Sequence context (with X at position of substitution) | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | − | B-cell | | Antibody-antigen binding | 1370146 | 1370150 | 10801 | W→Y | DXEDRYYRE | [22] |
| 2 | − | T-cell | I | Cytokine release (IFN-γ) | 1340848 | 1340854 | 62604 | P→A | SYIXSAEKI | [23] |
| 3 | − | T-cell | II | Cell proliferation | 1646734 | 1646735 | 107384 | E→D | GEPGIAGFKGXQGPK | [24] |
| 4 | − | T-cell | II | Cell proliferation | 1660246 | 1660247 | 46317 | F→A | NTWTTCQSIAXPSK | [25] |
| 5 | − | T-cell | II | Cytokine release (IL-4) | 1660252 | 1660258 | 112245 | A→F | NTWTTCQSIAXPSK | [25] |
| 6 | − | T-cell | II | Cytokine release (IL-2) | 1667060 | 1667065 | 68846 | K→A | VHFFXNIVTPRTP | [26] |
| 7 | − | T-cell | II | Cell proliferation | 1667112 | 1667113 | 80071 | A→K | VHFFXNIVTPRTP | [26] |
| 8 | − | T-cell | II | Cell proliferation | 1716604 | 1716614 | 125569 | T→V | SWEGVGVXPDV | [27] |
| 9 | − | T-cell | II | Cell proliferation | 1716604 | 1716616 | 125569 | D→N | SWEGVGVTPXV | [27] |
| 10 | − | T-cell | II | Cell proliferation | 1716605 | 1716618 | 125571 | V→T | SWEGVGVXPNV | [27] |
| 11 | + | T-cell | I | Cytotoxicity | 1634784 | 1634785 | 103300 | S→A | IYXTVAGSL | [28] |
| 12 | + | T-cell | I | Cytotoxicity | 1634784 | 1634786 | 103300 | G→S | IYSTVAXSL | [28] |
| 13 | + | T-cell | I | Cytotoxicity | 1634788 | 1634789 | 103298 | S→G | IYATVAXSL | [28] |
| 14 | + | T-cell | I | Cytotoxicity | 1634788 | 1634790 | 103298 | A→S | IYXTVASSL | [28] |
| 15 | + | T-cell | I | Cytotoxicity | 1657362 | 1657364 | 111964 | I→D | YXFAFRDL | [29] |
| 16 | + | T-cell | I | Cytokine release (IFN-γ) | 1716442 | 1716443 | 125101 | I→T | XMIKFNRL | [30] |
| 17 | + | T-cell | I | Cytokine release (IFN-γ) | 1340848 | 1340852 | 62604 | K→D | SYIPSAEXI | [23] |
| 18 | + | T-cell | I | Cytokine release (IFN-γ) | 1381750 | 1381748 | 37174 | D→G | LLXRDSFEV | [31] |
| 19 | + | T-cell | I | Cytokine release (IFN-γ) | 1381752 | 1381753 | 37392 | H→G | LLXRDSFEV | [31] |
| 20 | + | T-cell | I | Cytotoxicity | 1404705 | 1404714 | 61086 | S→I | SXIEFARL | [32] |
| 21 | + | T-cell | I | Cytotoxicity | 1404705 | 1404715 | 61086 | A→E | SSIEFXRL | [32] |
| 22 | + | T-cell | I | Cytotoxicity | 1404705 | 1404716 | 61086 | R→K | SSIEFAXL | [32] |
| 23 | + | T-cell | I | Cytotoxicity | 1404712 | 1404719 | 61088 | E→A | SSIEFXRL | [32] |
| 24 | + | T-cell | I | Cytotoxicity | 1404713 | 1404721 | 61085 | K→R | SSIEFAXL | [32] |
| 25 | + | T-cell | I | Cytokine release (IFN-γ) | 1865529 | 1865530 | 98995 | T→F | SAPDXRPA | [33] |
| 26 | + | T-cell | I | Cytokine release (IFN-γ) | 1865529 | 1865532 | 98995 | A→L | SAPDTRPX | [33] |

[Scatter plot: antigen residue volume (Å³; vertical axis, 60 to 240) versus immunogen residue volume (Å³; horizontal axis, 60 to 240). Point labels are one-letter amino-acid residue codes and the data-row numbers of Table 2. Legend: reference; negative B-cell; negative T-cell I; negative T-cell II; positive T-cell I.]

Figure 5: Peptide cross-reactivity assay data of Table 2 vis-à-vis amino-acid residue volume [21] of immunogen and antigen at single-residue substitution positions along sequence alignments.

For longer epitopes (e.g., typical MHC class II-restricted T-cell epitopes), the value is even higher. For example, in the case of data rows 4 and 5 (having consensus sequence NTWTTCQSIAXPSK), the said value is 13/14 (∼93%). The threshold values thus calculated are markedly higher than those used (e.g., 70% [20]) for conventional protein sequence analysis. Moreover, regardless of the exact sequence lengths, applying such threshold values would entail discarding data on epitopes differing by one residue (e.g., the examples just cited from Table 2). To avoid such loss of potentially valuable information, all the available epitope data might be analyzed and a corresponding reduced epitope count reported to indicate the level of redundancy in the underlying dataset.
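
The threshold arithmetic above follows from the fact that two ungapped sequences of length $m$ differing at exactly one position share a fractional identity of $(m-1)/m$. A minimal Python sketch (with a hypothetical function name) reproduces the two values quoted for nonapeptides and for the 14-residue sequences of data rows 4 and 5:

```python
from fractions import Fraction


def highest_identity_threshold(length):
    """Fractional identity of two equal-length sequences that differ at
    exactly one residue position, i.e. (length - 1) / length."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    return Fraction(length - 1, length)


if __name__ == "__main__":
    for m in (9, 14):
        value = highest_identity_threshold(m)
        print(m, value, f"{float(value):.1%}")
```

Any identity cutoff below these values (such as 70%) would treat every such single-substitution pair as redundant, which is the loss of information discussed above.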

3.3. Survey of Peptidic Epitope Sequence Data. IEDB searches yielded 11,497 records on peptidic epitopes of length less than 50 residues (i.e., the general cutoff provided in the IEDB curation manual [39]), curated as covalently unmodified and comprising 4,443 B-cell and 7,054 T-cell epitope records (the latter divided between 2,749 and 4,305 records for MHC restriction classes I and II, resp.), for which computed functional redundancy vis-à-vis sequence length is depicted in Figure 6.

Considering the wide range of sequence counts for the various sequence lengths in Figure 6, a logarithmic scale was chosen to depict the data in a compact form. Consequently, the difference calculated as sequence count − reduced count was plotted instead of reduced count itself where sequence count > reduced count, to avoid crowding of data points. For all positive sequence counts (“□”), the said differences were consistently higher where functional similarity was equated with fractional aligned-sequence identity (“×”) rather than the Shannon information entropy of the differential epitope binding. In the latter case, the differences were consistently less than 1, with reduced count close or even equal to sequence count. These results are anticipated in view of Figure 4, which shows that reduced count is independent of sequence length using the entropy-based approach proposed herein but decreases with sequence length using fractional aligned-sequence identity.
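
The plotting convention described above (showing sequence count − reduced count on a logarithmic axis, where a zero difference cannot be drawn) can be sketched as follows. The function name and the example counts are hypothetical and serve only to illustrate why some markers are absent from Figure 6; they are not data from the IEDB survey.

```python
def differences_for_log_axis(sequence_counts, reduced_counts):
    """Map each sequence length to (sequence count - reduced count),
    keeping only strictly positive differences, since zero cannot be
    shown on a logarithmic axis."""
    plotted = {}
    for length, count in sequence_counts.items():
        difference = count - reduced_counts[length]
        if difference > 0:
            plotted[length] = difference
    return plotted


if __name__ == "__main__":
    # Hypothetical counts keyed by sequence length
    sequence_counts = {5: 3, 9: 120, 43: 1}
    reduced_counts = {5: 3.0, 9: 119.4, 43: 1.0}
    print(differences_for_log_axis(sequence_counts, reduced_counts))
```

In this toy input, only length 9 yields a plotted point; lengths 5 and 43 have reduced count equal to sequence count and so produce no marker.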

The data in Figures 4 and 6 thus point to key considerations in evaluating functional redundancy. Clearly, functional redundancy may be overestimated by simply equating functional similarity with fractional aligned-sequence identity, especially with increasing sequence length for highly similar sequences. On the other hand, functional redundancy may be underestimated where functional similarity is equated with the Shannon information entropy of differential epitope binding if the sequence under consideration extends beyond the relevant epitope structure, which becomes more likely with increasing sequence length. Hence, the information-entropy approach as developed herein is conceivably applicable to sequences not exceeding a realistic application-dependent epitope length (e.g., six residues for antigen binding by anti-peptide antibodies [40]), although rigorous generalization to longer sequences might be pursued by aligning subsequences of such length and possibly accounting for differential immunodominance among immunogen epitopes [41]. For example, if a set of peptides were to be found such that each of them comprised a single immunodominant epitope having fixed length (e.g., six residues in all cases), the information-entropy approach might be applied to the entire set even if the peptides were of variable length. Moreover, even if the said immunodominant epitopes were highly similar to one another in terms of fractional aligned-sequence identity, all the available data still could be included in subsequent analyses, provided that both the total and reduced epitope counts would be reported in order to account for functional redundancy. In this way, useful epitope data could be retained instead of discarded.
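
As a first step toward the subsequence-based generalization suggested above, a longer peptide could be decomposed into overlapping windows of a fixed epitope length. The sketch below is only one possible reading of that suggestion: the function name and the default width of six residues (taken from the anti-peptide antibody example) are assumptions, and the subsequent alignment and immunodominance weighting are not addressed.

```python
def fixed_length_subsequences(sequence, width=6):
    """Return all overlapping subsequences of the given width, in order
    of starting position; an empty list if the sequence is too short."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


if __name__ == "__main__":
    print(fixed_length_subsequences("SYIPSAEKI"))
```

A peptide of length $m$ yields $m - 5$ hexapeptide windows, each of which could then be compared pairwise in the manner developed for full-length epitopes.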

[Three log-scale plots, panels (a) through (c): sequence count or sequence count − reduced count (vertical axis, 0.001 to 1000) versus sequence length (horizontal axis, 0 to 50).]

Figure 6: Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c). Sequence counts (“□”) are total epitope counts in IEDB. Corresponding reduced counts ($r$ in (1)) are expressed as the difference (sequence count − reduced count), with functional similarity equated with either fractional aligned-sequence identity (“×”; (3) and (4)) or the Shannon information entropy for differential epitope binding (separate marker; (5) through (9)). Missing “□” indicates sequence count = 0 (e.g., for B-cell epitope sequence length < 4 in (a)). Where “□” is present, a missing “×” or entropy-based marker indicates sequence count − reduced count = 0 (e.g., with missing entropy-based marker for B-cell epitope sequence length = 5 in (a)). Both “×” and the entropy-based marker are missing where sequence count = 1 (e.g., for B-cell epitope sequence length = 43 in (a)), as reduced count = 1 for sequence count = 1.

3.4. Applications and Future Directions. The discussion thus far suggests the potentially greater utility of antigenic functional similarity cast as the Shannon information entropy of differential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redundancy of epitope data, to express their structural diversity in a biologically meaningful manner (i.e., explicitly in terms of relative binding affinity, which underlies the balance between the inversely related emergent continuum phenomena of immunologic specificity and cross-reactivity). This is subject to the caveat that assuming an idealized epitope-binding site of optimal steric and electrostatic complementarity (e.g., comparable to what is typically observed in a natively folded wild-type protein) may both overestimate affinity for an immunogen epitope and underestimate affinity for antigen epitopes structurally different from the immunogen epitope (thereby possibly overestimating immunologic specificity as a consequence of underestimating potential for cross-reactivity). These considerations could guide the development of immunization strategies (e.g., via vaccination) and immunodiagnostics (e.g., to produce peptidic constructs suitable for eliciting anti-peptide antibodies that either protect against disease or serve to detect particular antigens in biological samples via immunoassays), noting that the practical significance of immunologic specificity and cross-reactivity is highly context-dependent.

From the perspective of immunization, immunity (e.g., induced via vaccination or passive transfer of immune-system components) may be useful if specifically directed against a pathogen epitope so as to favor pathogen elimination without producing harmful hypersensitivity reactions (e.g., in the form of allergy or autoimmunity). Yet such immunity might be even more useful if it also favored elimination of multiple antigenically related variant pathogens via cross-reaction (such that a single vaccine epitope elicited the production of immune-system components each cross-reactive with multiple variant pathogen epitopes) while still avoiding hypersensitivity. Cross-reactivity of immune responses is thus potentially useful within limits defined by risk of harmful hypersensitivity (e.g., due to cross-reaction with host self-antigens). As regards immunodiagnosis, an immunodiagnostic test may be useful if it enables specific pathogen detection via discrimination between a pathogen epitope and a set of other biologically relevant epitopes (e.g., of the host or its commensal symbionts in health, or of other pathogens for which the clinical implications are very different). Yet the test might be even more useful if it also enabled detection of multiple antigenically related variant pathogens (such that a single immunologic probe detected multiple variant pathogen epitopes) while still retaining its discriminatory power with respect to other biologically relevant epitopes, particularly where the variant pathogens are very similar to one another in terms of their clinical implications (e.g., prognosis and therapeutic options). Hence, cross-reactivity strongly influences the outcomes of both immunization and immunodiagnosis, for which reason their development could be supported by computational tools that estimate epitope-binding affinity for cross-reactions.

In order to estimate the epitope-binding affinity of an immune-system component (e.g., antibody) elicited in response to (and thus having a binding site for) immunogen epitope $i$ for antigen epitope $j$, one possible approach would be to express affinity in terms of association constants, for example, as

\[ K_{ij} = K_{ii} Z_{ij}, \tag{10} \]

where $K_{ij}$ is the association constant for cross-reaction with antigen epitope $j$, $K_{ii}$ is the association constant for reaction with immunogen epitope $i$, and $Z_{ij}$ is a correction term to account for the difference between the two association constants. In turn, $K_{ii}$ might be estimated as

\[ K_{ii} = \exp\left(-\frac{\Delta G_{ii}}{RT}\right), \tag{11} \]

where $\Delta G_{ii}$ is the free-energy change for reaction with immunogen epitope $i$, while both $R$ and $T$ retain their meanings as in (7). In cases where epitopes $i$ and $j$ are both $m$ residues in length, $Z_{ij}$ might be estimated as

\[ Z_{ij} = \begin{cases} \prod_{k=1}^{m} s_{ijk}, & \text{if } \prod_{k=1}^{m} s_{ijk} > K_{ii}^{-1}, \\ K_{ii}^{-1}, & \text{otherwise}, \end{cases} \tag{12} \]

where the product retains its meaning as in (6), such that $K_{ij} \geq 1$ (with $K_{ij} = 1$ for nonspecific binding). $\Delta G_{ii}$ could be estimated from epitope structure (e.g., using structural energetics [40–42]) subject to mechanism-specific constraints (e.g., on B-cell affinity maturation [43]), and $s_{ijk}$ could be estimated along similar lines with regard to $\Delta\Delta G_{ijk}$ in (7).

The preceding discussion has alluded to epitope binding as if it were a binary interaction, which is strictly true only for B-cell epitopes (e.g., bound by surface immunoglobulin or antibody); but in the context of T-cell epitopes, the epitope-binding site is to be understood as a bipartite entity consisting of an MHC molecule and a T-cell receptor (TCR), such that the epitope becomes at least partly confined between the two binding-site components. The overall process of T-cell recognition is subject to thermodynamic and kinetic constraints on initial MHC-peptide binding [44] and subsequent TCR recognition of MHC-peptide complex [45], which potentially complicate prediction of T-cell cross-reactivity and functional consequences thereof (e.g., considering how binding affinity and receptor numbers interact to influence T-cell effector function [45]). Still, predicting B-cell epitope cross-reactivity is challenging in view of the greater structural diversity of B-cell epitopes, especially where only a fraction of the epitope surface contributes to the epitope-paratope interface, such that cross-reaction may occur with variations in epitope structure beyond the interface. Hence, although the analysis herein for continuous epitopes might be generalized to discontinuous epitopes with indexing of residues by relative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignment gaps, correlating variations in epitope-residue structure with potential for cross-reactivity would be nontrivial.
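
The affinity estimate of (10) through (12) can be sketched numerically in Python as follows. The function names, the example free-energy change of −8 kcal mol⁻¹, and the per-residue values $s_{ijk}$ are hypothetical inputs chosen for illustration; only the structure of the calculation follows (10) through (12), and the temperature default of 310.15 K is the one assumed earlier in the article.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def association_constant_self(delta_g_ii, temperature=310.15):
    """Equation (11): K_ii = exp(-dG_ii / (R T)), with dG_ii in kcal mol^-1."""
    return math.exp(-delta_g_ii / (R_KCAL * temperature))


def correction_term(s_values, k_ii):
    """Equation (12): product of per-residue terms s_ijk, bounded below
    by 1 / K_ii so that K_ij never falls below 1."""
    product = math.prod(s_values)
    lower_bound = 1.0 / k_ii
    return product if product > lower_bound else lower_bound


def association_constant_cross(delta_g_ii, s_values, temperature=310.15):
    """Equation (10): K_ij = K_ii * Z_ij."""
    k_ii = association_constant_self(delta_g_ii, temperature)
    return k_ii * correction_term(s_values, k_ii)


if __name__ == "__main__":
    dg = -8.0  # hypothetical free-energy change in kcal mol^-1
    print(association_constant_self(dg))
    print(association_constant_cross(dg, [1.0, 0.5, 1.0]))  # partial cross-reaction
    print(association_constant_cross(dg, [1.0, 0.0, 1.0]))  # no specific binding
```

With a single $s_{ijk} = 0$ (e.g., a residue deemed sterically incompatible), the lower bound in (12) takes over and the estimate reduces to the nonspecific-binding value $K_{ij} = 1$.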

The analysis developed herein conceivably could be refined to better account for MHC-peptide binding as distinct from subsequent binding of MHC-peptide complex by TCR, cognizant of MHC polymorphism. Hence, MHC-peptide binding could be analyzed with explicit consideration of pertinent MHC structural details (e.g., noting that unfavorable peptide backbone conformations may preclude accommodation of peptide within the MHC binding cleft, and emphasizing the potential affinity contributions of prospective anchor residues on peptides vis-à-vis corresponding MHC binding pockets). Subsequent binding of MHC-peptide complex by TCR could be analyzed separately, with emphasis on peptide surfaces accessible for contact by TCR (e.g., excluding those buried within MHC binding pockets). This could be more readily applied to class I rather than class II MHC molecules, as the latter typically have binding clefts that allow for variable-length peptide overhangs. The analysis for MHC-peptide binding might be extended for class II via docking simulations to predict the MHC-bound conformations and affinity contributions of the overhangs, after which the binding of MHC-peptide complex by TCR could be analyzed with inclusion of any overhang surfaces that are likely to be accessible for contact by TCR.

At a more fundamental level, apparently suboptimal complementarity (manifesting as submaximal immunologic specificity) may be an inherent tradeoff in attaining optimal immune function from the standpoint of biological fitness (e.g., enabling each immune-system component to recognize a variety of structurally distinct epitopes, albeit at the cost of binding each of these epitopes with only submaximal affinity). Investigation of this possibility may lead to insights on how immune responses might be optimally biased (e.g., via vaccination or immunotherapy). Moreover, the asymmetry of cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenic differences in terms of antigenic distance as a metric that is independent of immunization history.

4. Summary and Conclusions

In assembling epitope datasets (e.g., to develop and benchmark epitope-prediction tools), sequence redundancy among epitopes must be considered to avoid misleading results that reflect overrepresentation of functionally similar epitopes. However, potentially useful data may be needlessly discarded by excluding epitopes that share an apparently high degree of sequence similarity, as even only a single-residue difference may manifest as extreme functional dissimilarity between epitopes. The present work thus introduces a reduced epitope count, as defined using (1) and (2), to account for functional redundancy of epitope data (FRED) within an epitope dataset (such that FRED is quantified as the difference between the total and reduced epitope counts). This can be used, for example, to characterize the dataset instead of excluding epitopes with apparently similar sequences from it, thereby maximizing the use of available epitope data.

More importantly, the present work frames FRED in terms of potential antigenic cross-reactivity (rather than sequence distance) construed as functional similarity between epitopes. Hence, pairwise comparison of epitopes is performed such that each epitope pair considered is regarded as consisting of an immunogen epitope (which elicits the immune response of interest) and an antigen epitope (which is the target of cross-reactive binding in an immunoassay to evaluate the immune response). Functional similarity between epitopes is thus defined in a physicochemically and biologically meaningful way as the Shannon information entropy for differential epitope binding in (5), which is compatible with the possibility of extreme functional dissimilarity between epitopes due to seemingly minor differences in sequence. Consequently, the complement of functional similarity is not a distance (in the sense of a unique value quantifying the separation between two points in sequence space), as its value may vary with reversal of the roles (i.e., immunogen or antigen) assigned to epitopes under pairwise comparison. However, it is nonetheless a measure of epitope dissimilarity, such that summation of its values over all epitopes of a dataset according to (2) enables calculation of an averaged contribution of each epitope to the reduced epitope count. This circumvents the conventional dichotomous labeling of epitopes as either redundant or nonredundant based on an arbitrarily selected threshold value of sequence similarity. Such labeling is problematic in that selection of the actual threshold value is difficult to justify on physicochemically and biologically meaningful grounds, and because an epitope thus might be labeled as redundant on the basis of similarity to a minority of other epitopes in the dataset.
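
The point that the complement of functional similarity is not a distance can be illustrated with a toy check of the symmetry axiom. In the sketch below, the 2 × 2 similarity values are purely hypothetical (rows index the immunogen epitope, columns the antigen epitope), and taking the complement as one minus similarity is an assumption made only for this illustration.

```python
def is_symmetric(matrix, tolerance=1e-12):
    """Return True if matrix[i][j] equals matrix[j][i] for all i, j."""
    size = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tolerance
        for i in range(size)
        for j in range(i + 1, size)
    )


# Hypothetical similarities: epitope 0 as immunogen does not cross-react
# with epitope 1 as antigen, but the reverse pairing partially does.
similarity = [
    [1.0, 0.0],
    [0.4, 1.0],
]
dissimilarity = [[1.0 - s for s in row] for row in similarity]

if __name__ == "__main__":
    print(is_symmetric(dissimilarity))  # False: role reversal changes the value
```

Because a metric must be symmetric, a dissimilarity table of this kind cannot be read as a set of distances, although its entries can still be summed per epitope.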

In line with the preceding considerations, epitope functional similarity may be estimated in terms of differences between immunogen- and antigen-epitope structure relative to an idealized binding site of high complementarity to the immunogen epitope. However, this tends to underestimate potential for cross-reactivity, which suggests that epitope-binding site complementarity is typically suboptimal. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes. Such cross-reactivity conceivably could be exploited in the development of practical applications such as vaccines and immunodiagnostics (e.g., to aim for beneficially broad cross-reactivity rather than overly extreme immunologic specificity).

Competing Interests

The author declares that they have no competing interests.

Acknowledgments

This work was supported by an Angelita T. Reyes Centennial Professorial Chair grant awarded to the author.

References

[1] S. E. C. Caoili, “Antidotes, antibody-mediated immunity and the future of pharmaceutical product development,” Human Vaccines and Immunotherapeutics, vol. 9, no. 2, pp. 294–299, 2013.

[2] S. E. C. Caoili, “Beyond new chemical entities: advancing drug development based on functional versatility of antibodies,” Human Vaccines and Immunotherapeutics, vol. 10, no. 6, pp. 1639–1644, 2014.

[3] B. Greenwood, “The contribution of vaccination to global health: past, present and future,” Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 369, no. 1645, Article ID 20130433, 2014.

[4] M. H. V. Van Regenmortel, “What is a B-cell epitope?” Methods in Molecular Biology, vol. 524, pp. 3–20, 2009.

[5] M. H. V. Van Regenmortel, “Specificity, polyspecificity, and heterospecificity of antibody-antigen recognition,” Journal of Molecular Recognition, vol. 27, no. 11, pp. 627–639, 2014.

[6] N. Tomar and R. K. De, “Immunoinformatics: a brief review,” Methods in Molecular Biology, vol. 1184, pp. 23–55, 2014.

[7] S. E. C. Caoili, “An integrative structure-based framework for predicting biological effects mediated by antipeptide antibodies,” Journal of Immunological Methods, vol. 427, pp. 19–29, 2015.

[8] S. E. C. Caoili, “Immunization with peptide-protein conjugates: impact on benchmarking B-cell epitope prediction for vaccine design,” Protein and Peptide Letters, vol. 17, no. 3, pp. 386–398, 2010.

[9] S. E. C. Caoili, “Hybrid methods for B-cell epitope prediction,” Methods in Molecular Biology, vol. 1184, pp. 245–283, 2014.

[10] S. E. C. Caoili, “Benchmarking B-cell epitope prediction with quantitative dose-response data on antipeptide antibodies: towards novel pharmaceutical product development,” BioMed Research International, vol. 2014, Article ID 867905, 13 pages, 2014.

[11] U. Hobohm, M. Scharf, R. Schneider, and C. Sander, “Selection of representative protein data sets,” Protein Science, vol. 1, no. 3, pp. 409–417, 1992.

[12] U. Lessel and D. Schomburg, “Creation and characterization of a new, non-redundant fragment data bank,” Protein Engineering, vol. 10, no. 6, pp. 659–664, 1997.

[13] T. Noguchi and Y. Akiyama, “PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003,” Nucleic Acids Research, vol. 31, no. 1, pp. 492–493, 2003.

[14] P. Motte, G. Alberici, M. Ait-Abdellah, and D. Bellet, “Monoclonal antibodies distinguish synthetic peptides that differ in one chemical group,” The Journal of Immunology, vol. 138, no. 10, pp. 3332–3338, 1987.

[15] M. K. Gilson and S. E. Radford, “Protein folding and binding: from biology to physics and back again,” Current Opinion in Structural Biology, vol. 21, no. 1, pp. 1–3, 2011.

[16] B. Rost, “Twilight zone of protein sequence alignments,” Protein Engineering, vol. 12, no. 2, pp. 85–94, 1999.

[17] B. Y. Khor, G. J. Tye, T. S. Lim, and Y. S. Choong, “General overview on structure prediction of twilight-zone proteins,” Theoretical Biology and Medical Modelling, vol. 12, article 15, 2015.

[18] H. Nakamura, “Roles of electrostatic interaction in proteins,” Quarterly Reviews of Biophysics, vol. 29, no. 1, pp. 1–90, 1996.

[19] P. J. Fleming and G. D. Rose, “Do all backbone polar groups in proteins form hydrogen bonds?” Protein Science, vol. 14, no. 7, pp. 1911–1917, 2005.

[20] S. Basu, D. Bhattacharyya, and R. Banerjee, “Self-complementarity within proteins: bridging the gap between binding and folding,” Biophysical Journal, vol. 102, no. 11, pp. 2605–2614, 2012.

[21] Y. Harpaz, M. Gerstein, and C. Chothia, “Volume changes on protein folding,” Structure, vol. 2, no. 7, pp. 641–649, 1994.

[22] A. Handisurya, S. Gilch, D. Winter, et al., “Vaccination with prion peptide-displaying papillomavirus-like particles induces autoantibodies to normal prion protein that interfere with pathologic prion protein production in infected cells,” FEBS Journal, vol. 274, no. 7, pp. 1747–1758, 2007.

[23] M. Plebanski, E. A. M. Lee, C. M. Hannan, et al., “Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming,” Nature Medicine, vol. 5, no. 5, pp. 565–571, 1999.

[24] E. Michaelsson, M. Andersson, A. Engstrom, and R. Holmdahl, “Identification of an immunodominant type-II collagen peptide recognized by T cells in H-2q mice: self tolerance at the level of determinant selection,” European Journal of Immunology, vol. 22, no. 7, pp. 1819–1825, 1992.

[25] J. M. Greer, C. Klinguer, E. Trifilieff, R. A. Sobel, and M. B. Lees, “Encephalitogenicity of murine, but not bovine, DM20 in SJL mice is due to a single amino acid difference in the immunodominant encephalitogenic epitope,” Neurochemical Research, vol. 22, no. 4, pp. 541–547, 1997.

[26] A. Gaur, S. A. Boehme, D. Chalmers, et al., “Amelioration of relapsing experimental autoimmune encephalomyelitis with altered myelin basic protein peptides involves different cellular mechanisms,” Journal of Neuroimmunology, vol. 74, no. 1-2, pp. 149–158, 1997.

[27] A. T. Kozhich, Y.-I. Kawano, C. E. Egwuagu, et al., “A pathogenic autoimmune process targeted at a surrogate epitope,” The Journal of Experimental Medicine, vol. 180, no. 1, pp. 133–140, 1994.

[28] H. Masaki, M. Tamura, and I. Kurane, “Induction of cytotoxic T lymphocytes of heterogeneous specificities by immunization with a single peptide derived from influenza A virus,” Viral Immunology, vol. 13, no. 1, pp. 73–81, 2000.

[29] G. B. Lipford, S. Bauer, H. Wagner, and K. Heeg, “Peptide engineering allows cytotoxic T-cell vaccination against human papilloma virus tumour antigen, E6,” Immunology, vol. 84, no. 2, pp. 298–303, 1995.

[30] C. L. Keech, K. C. Pang, J. McCluskey, and W. Chen, “Direct antigen presentation by DC shapes the functional CD8+ T-cell repertoire against the nuclear self-antigen La-SSB,” European Journal of Immunology, vol. 40, no. 2, pp. 330–338, 2010.

[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001

[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998

[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006

[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948

[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957

[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957

[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992

[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995

[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015

[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006

[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010

[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000

[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012

[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011

[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011


so for longer B-cell epitopes and class II MHC-restricted T-cell epitopes).

For a set of $n$ structurally unique epitopes, the reduced epitope count may be defined as

$$r = \sum_{i=1}^{n} w_i \tag{1}$$

where $w_i$ is the contribution of the $i$th epitope to $r$. In turn, $w_i$ may be defined as

$$w_i = \frac{1 + \sum_{j=1}^{n} \left(1 - S_{ij}\right)}{n} \tag{2}$$

where $S_{ij}$ is the functional similarity between the $i$th and $j$th epitopes such that $0 \le S_{ij} \le 1$, with $S_{ij} = 0$ and $S_{ij} = 1$, respectively, corresponding to extreme cases wherein the $i$th and $j$th epitopes are either minimally or maximally similar from a functional standpoint. If all the epitopes are of uniform sequence length $m$ amino-acid residues, $S_{ij}$ might be defined for a pairwise epitope sequence alignment as

$$S_{ij} = \frac{\sum_{k=1}^{m} s_{ijk}}{m} \tag{3}$$

where $s_{ijk}$ is the functional similarity between amino-acid residues $a_{ik}$ and $a_{jk}$ at the $k$th sequence positions of the aligned $i$th and $j$th epitope sequences, respectively, such that $0 \le s_{ijk} \le 1$, with $s_{ijk} = 0$ and $s_{ijk} = 1$, respectively, corresponding to extreme cases wherein $a_{ik}$ and $a_{jk}$ are either minimally or maximally similar from a functional standpoint. Accordingly, $s_{ijk}$ might be simply defined as

$$s_{ijk} = \begin{cases} 0 & \text{if } a_{ik} \ne a_{jk}, \\ 1 & \text{if } a_{ik} = a_{jk} \end{cases} \tag{4}$$

for (3) to yield the fraction of identical aligned residues, which is a conventional measure of protein sequence similarity [11–13]. However, this is an oversimplification that fails to capture the possibility of minimal functional similarity between epitopes (e.g., operationally defined as undetectable antigenic cross-reactivity in an immunoassay) arising from a structural difference of only a single chemical group [14], unless the possibility of antigenic cross-reaction is denied altogether.
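As a minimal sketch of how (1) through (4) fit together, the following Python functions compute the reduced epitope count for a list of equal-length sequences. The function names are illustrative choices rather than anything defined in the source, and the alignment is assumed to be ungapped and position-by-position.

```python
from typing import Callable, Sequence


def identity_residue_similarity(a: str, b: str) -> float:
    """Eq. (4): 1 for identical aligned residues, 0 otherwise."""
    return 1.0 if a == b else 0.0


def epitope_similarity(
    seq_i: str,
    seq_j: str,
    residue_similarity: Callable[[str, str], float] = identity_residue_similarity,
) -> float:
    """Eq. (3): mean residue-level similarity over an ungapped alignment."""
    if len(seq_i) != len(seq_j):
        raise ValueError("sequences must have the same length")
    total = sum(residue_similarity(a, b) for a, b in zip(seq_i, seq_j))
    return total / len(seq_i)


def reduced_epitope_count(
    seqs: Sequence[str],
    similarity: Callable[[str, str], float] = epitope_similarity,
) -> float:
    """Eqs. (1) and (2): r = sum_i w_i with w_i = (1 + sum_j (1 - S_ij)) / n."""
    n = len(seqs)
    r = 0.0
    for seq_i in seqs:
        dissimilarity = sum(1.0 - similarity(seq_i, seq_j) for seq_j in seqs)
        r += (1.0 + dissimilarity) / n
    return r
```

With this sketch, two sequences sharing no aligned residues give a count of 2, whereas two sequences differing at one of four positions give a count between 1 and 2, which illustrates how fractional identity treats a single substitution as mostly redundant.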

A more realistic, physicochemically grounded alternative is to conceptualize functional similarity on the basis of biomolecular binding interactions, thereby relating functional similarity to binding affinity as measured via an immunoassay that employs some epitope-binding probe (e.g., antibody). Thus, functional similarity between epitopes (i.e., $S_{ij}$ of (2) and (3)) may be defined alternatively as the Shannon information entropy [34–36] for the equilibrium distribution of epitope-bound probe between the $i$th and $j$th epitopes, with both epitopes at the same concentration in the presence of much less probe, such that

$$S_{ij} = -\left(f_i \log_2 f_i + f_j \log_2 f_j\right) \tag{5}$$

where $f_i$ and $f_j$ are the fractions of epitope-bound probe bound to the $i$th and $j$th epitopes, respectively (noting that $f_i + f_j = 1$, and $S_{ij} = 1$ where the $i$th and $j$th epitopes are indistinguishable by means of the immunoassay). Constraining $f_j$ as $f_j \le f_i$, it may in turn be defined as

$$f_j = \frac{\prod_{k=1}^{m} s_{ijk}}{1 + \prod_{k=1}^{m} s_{ijk}} \tag{6}$$

where $s_{ijk}$ retains its meaning as introduced in (3). However, retaining the definition of $s_{ijk}$ given by (4) would be tantamount to denying the possibility of antigenic cross-reaction; hence, an alternative is warranted.
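The step from residue-level similarities to the entropy-based epitope similarity of (5) and (6) can be sketched as follows. The function names are hypothetical, and the usual convention that a zero fraction contributes nothing to the entropy is assumed, since the text does not state how that limiting case is handled.

```python
import math
from typing import Iterable


def bound_fraction_j(residue_similarities: Iterable[float]) -> float:
    """Eq. (6): f_j = P / (1 + P), where P is the product of the s_ijk."""
    product = math.prod(residue_similarities)
    return product / (1.0 + product)


def entropy_similarity(residue_similarities: Iterable[float]) -> float:
    """Eq. (5): Shannon entropy in bits of the probe distribution (f_i, f_j)."""
    f_j = bound_fraction_j(residue_similarities)
    f_i = 1.0 - f_j
    entropy = 0.0
    for fraction in (f_i, f_j):
        # Assumed convention: a fraction of 0 contributes 0 to the entropy.
        if fraction > 0.0:
            entropy -= fraction * math.log2(fraction)
    return entropy
```

Feeding this sketch the 0-or-1 values of (4) yields a similarity of 1 for identical sequences and 0 as soon as any aligned residue differs, which is the all-or-nothing behavior that motivates a graded definition of the residue-level term.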

A plausible starting point is the essential unity of protein folding and binding phenomena, based on steric and electrostatic complementarity of molecular surfaces as observed within typical folded proteins [20]. If this is regarded as representative of ligand-receptor interfaces wherein the ligand is a peptidic epitope bound to one or more protein receptors (e.g., antibody, or MHC molecule and T-cell receptor), an idealized epitope-binding site may be devised to estimate epitope similarity as potential antigenic cross-reactivity in terms of structural differences, assuming optimal complementarity with all epitope atoms completely surrounded by and close-packed against the binding-site contact atoms. Hence, suboptimal complementarity would arguably result if even a single epitope amino-acid residue was structurally altered. Notably, steric clashes conceivably would preclude antigenic cross-reaction if the altered residue was sterically incompatible with its original counterpart, in the sense of failing to assume any conformation allowing it to fit entirely within the region of space (as defined by residue van der Waals surfaces) that otherwise would have been occupied by the original residue at the binding site. Thus, functional similarity between epitope amino-acid residues (i.e., $s_{ijk}$ of (3), (4), and (6)) may be defined alternatively as

$$s_{ijk} = \begin{cases} 0 & \text{if } a_{ik}, a_{jk} \text{ sterically incompatible}, \\ \exp\left(-\dfrac{\Delta\Delta G_{ijk}}{RT}\right) & \text{otherwise} \end{cases} \tag{7}$$

where $i$ and $j$ denote the original and altered epitopes, respectively; $a_{ik}$ and $a_{jk}$ retain their meanings as in (4); $\Delta\Delta G_{ijk}$ is the contribution to the change in the overall free-energy change of epitope binding due to replacement of $a_{ik}$ by $a_{jk}$; $R$ is the gas constant; and $T$ is the absolute temperature. $\Delta\Delta G_{ijk}$ is analogous to the change in the free-energy change of folding due to replacement of the $k$th residue of a protein by a structurally different residue (e.g., to generate a mutant version of a wild-type protein).
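A minimal sketch of (7) follows, assuming that the free-energy term is supplied in kcal/mol and that steric compatibility has already been decided elsewhere. The function and parameter names are illustrative, and no clamping to the interval from 0 to 1 is applied, so a negative free-energy term would give a value above 1.

```python
import math

# Molar gas constant expressed in kcal/(mol K).
GAS_CONSTANT_KCAL = 1.987204e-3


def residue_similarity(
    ddg_kcal_per_mol: float,
    temperature_k: float,
    sterically_compatible: bool = True,
) -> float:
    """Eq. (7): 0 if sterically incompatible, else exp(-ddG / RT)."""
    if not sterically_compatible:
        return 0.0
    return math.exp(-ddg_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))
```

For example, a free-energy term of zero returns exactly 1 (no loss of similarity), and any positive term returns a value strictly between 0 and 1 that shrinks as the penalty grows.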

To evaluate (7), steric incompatibility is posited for every substitution wherein the net change in volume is positive (i.e., wherever bulk increases) and also for other substitutions wherein stereochemical differences (e.g., relating to bond angles and branching) are deemed sufficient to result in steric clashes that preclude antigenic cross-reaction. Steric incompatibility is conceptualized herein on the basis of grouping the 20 proteinogenic amino acids into subsets according to stereochemical similarity, as depicted in Figure 1. Subset assignments are based on the covalent bonding geometry of nonhydrogen side-chain atoms. In particular, these atoms are characterized as either tetrahedral or trigonal planar, and branching of tetrahedral atoms is also considered. For example, methyl and methylene carbon atoms are tetrahedral, as are sulfur and amine nitrogen atoms. Hence, the sulfhydryl group of Cys is regarded as isosteric with a methyl group, whereas the sulfur atom of Met and the side-chain imino group of Arg are both regarded as isosteric with a methylene group. In contrast, amide and carboxyl carbon atoms are trigonal planar rather than tetrahedral.

[Figure 1 image: structures of the 20 amino acids in seven panels. Subset 1: no trigonal planar atoms or branching at β through ε positions. Subset 2: trigonal planar δ-carbon atom. Subset 3: trigonal planar γ-carbon atom but no five-membered ring. Subset 4: trigonal planar γ-carbon atom in five-membered ring. Subset 5: branching at tetrahedral γ-carbon atom. Subset 6: branching at tetrahedral β-carbon atom. Subset 7: with ring comprising main-chain nitrogen atom.]

Figure 1: Subsets of the 20 proteinogenic amino acids grouped by stereochemical similarity. Structures were obtained from the ChEBI (Chemical Entities of Biological Interest) database (http://www.ebi.ac.uk/chebi). Within each subset comprising two or more amino acids, these are arranged from left to right in order of decreasing size, such that sterically incompatible changes are avoided by substitution of a smaller amino acid for a larger one. The subsets are linked to form the residue substitution paths in Figure 2, wherein Subset 1 comprises an entire path while the other subsets comprise the upstream segments of paths converging on either Cys (C) or Ala (A).

The subsets defined in Figure 1 are linked to form residue substitution paths avoiding sterically incompatible changes, as depicted in Figure 2. Subset 1 comprises an entire path, while the other subsets comprise the upstream segments of paths converging on either Cys or Ala. The path of Subset 2 converges on Cys, as members of Subset 1 larger than Cys are sterically incompatible with Subset 2, whose δ-carbon atom is trigonal planar. The path of Subset 3 converges on Ala rather than Cys because the γ-carbon atom is trigonal planar in Subset 3. The case of Subset 4 is thus similar to that of Subset 3, but the γ-carbon atom in Subset 4 is constrained as part of a five-membered ring, such that Subsets 3 and 4 are deemed sterically incompatible. The path of Subset 5 (Leu) converges on Cys, as members of Subset 1 larger than Cys are sterically incompatible with the branching at the γ-carbon atom of Leu. The path of Subset 6 converges on Ala instead of Cys because of the branching at the β-carbon atom of Subset 6. Finally, the path of Subset 7 (Pro) converges on Ala, as the Pro side-chain is constrained by ring formation with the main-chain nitrogen atom and thus distorted beyond the β-carbon atom.

[Figure 2 image: graph of the 20 residues (one-letter codes) linked into substitution paths, with path segments labeled (1) through (7) after the subsets of Figure 1.]

Figure 2: Amino-acid residue substitution paths avoiding sterically incompatible changes (cf. (7)) for the 20 canonical proteinogenic residues. Numbers correspond to the subsets of stereochemically similar amino acids depicted in Figure 1. For every residue substitution along the paths shown, the number of unsatisfied hydrogen bonds is presented in Table 1.

Table 1: Numbers of unsatisfied hydrogen bonds for amino-acid residue substitutions of Figure 2, calculated according to (9).

    G  A  S  C  D  T  P  N  V  E  Q  H  L  I  M  K  R  F  Y  W
G   0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
A   0  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
S   1  1  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
C   0  0  1  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
D   2  2  -  -  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
T   1  1  0  1  -  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -
P   1  1  -  -  -  -  0  -  -  -  -  -  -  -  -  -  -  -  -  -
N   2  2  -  -  2  -  -  0  -  -  -  -  -  -  -  -  -  -  -  -
V   0  0  1  0  -  1  -  -  0  -  -  -  -  -  -  -  -  -  -  -
E   2  2  3  2  -  -  -  -  -  0  -  -  -  -  -  -  -  -  -  -
Q   2  2  3  2  -  -  -  -  -  2  0  -  -  -  -  -  -  -  -  -
H   2  2  -  -  -  -  -  -  -  -  -  0  -  -  -  -  -  -  -  -
L   0  0  1  0  -  -  -  -  -  -  -  -  0  -  -  -  -  -  -  -
I   0  0  1  0  -  1  -  -  0  -  -  -  -  0  -  -  -  -  -  -
M   0  0  1  0  -  -  -  -  -  -  -  -  -  -  0  -  -  -  -  -
K   1  1  2  1  -  -  -  -  -  -  -  -  -  -  1  0  -  -  -  -
R   3  3  4  3  -  -  -  -  -  -  -  -  -  -  3  4  0  -  -  -
F   0  0  -  -  2  -  -  2  -  -  -  -  -  -  -  -  -  0  -  -
Y   1  1  -  -  3  -  -  3  -  -  -  -  -  -  -  -  -  1  0  -
W   1  1  -  -  -  -  -  -  -  -  -  1  -  -  -  -  -  -  -  0

According to Figure 2, a substitution from any residue upstream to any residue downstream along a particular path avoids steric incompatibility. Conversely, downstream-to-upstream and off-path substitutions are deemed sterically incompatible. This is a qualitative first approximation using a highly idealized model rather than an attempt at definitive categorization.
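The upstream-to-downstream rule can be sketched as a simple ordering check on a single path. The function name is hypothetical, and the example path (Tyr, Phe, Asn, Asp, Ala, Gly) is only one reading of the Subset 3 path, inferred here from the non-null entries of Table 1 rather than taken directly from Figure 2.

```python
from typing import Sequence


def on_path_compatible(path: Sequence[str], original: str, altered: str) -> bool:
    """Return True if `altered` lies at or downstream of `original` on `path`.

    `path` lists one-letter residue codes from upstream (larger) to
    downstream (smaller). Residues absent from the path are treated as
    off-path, hence incompatible with respect to this particular path.
    """
    if original not in path or altered not in path:
        return False
    return path.index(altered) >= path.index(original)


# Assumed reading of one path (Subset 3 converging on Ala, then Gly),
# inferred from Table 1 for illustration only.
EXAMPLE_PATH = ["Y", "F", "N", "D", "A", "G"]
```

Under this reading, replacing Tyr with Ala is compatible, whereas the reverse substitution (downstream to upstream) and any substitution involving a residue not on the path are not. A complete implementation would need every path of Figure 2 and would accept a substitution that is compatible on at least one of them.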

As regards $\Delta\Delta G_{ijk}$ (in (7)), this is related to amino-acid residue structural differences (as a qualitative first approximation rather than an attempt to obtain quantitatively very accurate results) by heuristic partitioning as

$$\Delta\Delta G_{ijk} = c\,\Delta V_{ijk} + b\,N_{ijk} \tag{8}$$

where $c$ and $b$ are both proportionality constants, while $\Delta V_{ijk}$ is the change in residue volume and $N_{ijk}$ is the number of unsatisfied hydrogen bonds (Table 1, wherein the residues are ordered by their volumes [21]), considering that an energetic penalty is incurred with unsatisfied hydrogen bonds upon binding a suboptimally complementary epitope.

Table 1 is sparse, with a null value (shown as a dash) for most cells, because (8) is evaluated only for sterically compatible substitutions (cf. (7)). Hence, the numbers in Table 1 are for sterically compatible substitutions according to Figure 2. The said numbers were each obtained, assuming that every available highly electronegative atom participates in the formation of a hydrogen bond, as

$$N_{ijk} = A_{ijk} - B_{ijk} \tag{9}$$

where $A_{ijk}$ is the total number of oxygen and nitrogen atoms in the aligned residues, while $B_{ijk}$ is the number of such atoms for which direct correspondence was affirmed as regards structural superposability, atom identity, and covalent bonding partners (according to Figure 1). Thus, $N_{ijk}$ increases with each failure to affirm a direct correspondence between hydrogen-bonding atoms. Physically, such failure corresponds to an unsatisfied hydrogen bond on the epitope, its binding site, or both.

Direct correspondence was affirmed for all main-chain (i.e., backbone) carbonyl oxygen atoms. In view of the difference between Pro and other residues at the main-chain nitrogen atom, direct correspondence was affirmed for this atom between Pro and itself as well as between non-Pro residues. Among side-chain atoms, direct correspondence was affirmed only between hydroxyl oxygen atoms of Ser and Thr, carbonyl oxygen atoms of Asp and Asn and of Glu and Gln, and δ-nitrogen atoms of His and Trp. To facilitate explication, a residue is termed polar if its side-chain contains at least one oxygen or nitrogen atom but nonpolar if otherwise. Accordingly, $N_{ijk} = 0$ for pairs of identical residues (along the diagonal of Table 1) and of non-Pro nonpolar residues. For Pro paired with Gly or Ala, $N_{ijk} = 1$ due to the main-chain nitrogen difference. For a polar residue paired with a non-Pro nonpolar residue, $N_{ijk}$ is the total number of side-chain oxygen and nitrogen atoms. For a pair of polar residues, the total number of side-chain oxygen and nitrogen atoms is an upper bound on $N_{ijk}$. Hence, $N_{ijk} = 3$ for Tyr paired with Asn, although $N_{ijk} = 2$ for Asn paired with Asp.

Provisional values for the proportionality constants $c$ and $b$ in (8) were obtained from literature. In particular, $c$ was regarded as the energetic penalty per unit volume of a cavity formed due to substitution of a less bulky residue within the interior of a folded protein, with value of −0.024 kcal mol⁻¹ Å⁻¹ estimated empirically from protein mutation studies [37], and $b$ was likewise regarded as the energetic cost of breaking a hydrogen bond, with a conservative estimated value of 0.5 kcal mol⁻¹ per bond [38].

Thus, using (1) through (8) in conjunction with Table 1, the literature values for constants in (8), and published amino-acid residue volumes [21], reduced epitope counts (1) were computed for artificial epitope sequences and also actual experimentally studied epitope sequences, assuming a temperature of 37°C (310.15 K).
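To make the combination of (7) and (8) concrete, the sketch below evaluates the residue-level similarity from a volume change and a hydrogen-bond count, using the provisional constants and temperature quoted above. The function names and the example inputs are hypothetical (the example volume change is not taken from [21]), and only the bulk-increase criterion for steric incompatibility is modeled, not the stereochemical criteria of Figure 2.

```python
import math

# Provisional constants as quoted in the text for (8).
C_CAVITY = -0.024           # kcal/mol per unit of residue volume change
B_HBOND = 0.5               # kcal/mol per unsatisfied hydrogen bond
TEMPERATURE_K = 310.15      # 37 degrees C
GAS_CONSTANT = 1.987204e-3  # kcal/(mol K)


def delta_delta_g(delta_volume: float, n_unsatisfied: int) -> float:
    """Eq. (8): ddG = c * dV + b * N, in kcal/mol."""
    return C_CAVITY * delta_volume + B_HBOND * n_unsatisfied


def similarity_from_structure(delta_volume: float, n_unsatisfied: int) -> float:
    """Eq. (7), modeling only bulk increase as steric incompatibility."""
    if delta_volume > 0.0:
        return 0.0
    ddg = delta_delta_g(delta_volume, n_unsatisfied)
    return math.exp(-ddg / (GAS_CONSTANT * TEMPERATURE_K))


# Hypothetical inputs for illustration only (not from [21] or Table 1).
example_dv = -26.0
example_n = 1
example_s = similarity_from_structure(example_dv, example_n)
```

Because $c$ is negative and a compatible substitution never increases volume, both terms of (8) are nonnegative in this sketch, so the resulting similarity stays between 0 and 1.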

2.2. Retrieval and Processing of Epitope Data. Epitope data were obtained via the Immune Epitope Database (IEDB, at http://www.iedb.org) [39] as database records retrieved through searches conducted from 4 to 6 October 2015 using its B-Cell Search and T-Cell Search facilities, in the latter case restricting searches to either MHC class I or II alleles, with each record thus retrieved pertaining to an individual B-cell or T-cell assay and containing data fields defined in relation to the key concepts of “Object Type” (i.e., molecule type in the context of chemical structure), “Epitope Relation” (i.e., molecule relationship to epitope), “1st Immunogen” (i.e., immunogen initially administered to elicit the immune response), and “Antigen” (i.e., antigen used in the B-cell or T-cell assay). Searches were restricted such that the data fields named “1st Immunogen Object Type” and “1st Immunogen Epitope Relation” had values of “Linear Peptide” and “Epitope,” respectively, with the “MHC class” option set to either “I” or “II” for T-Cell Search. Employing this basic search strategy, both narrow and broad searches were conducted, the former to retrieve data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution and the latter to survey the entire repertoire of available data on linear peptidic epitopes (as illustrated by example in Figure 3).

Figure 3: IEDB T-Cell Search facility interface (http://www.iedb.org/advancedQueryTcell.php). Example shown corresponds to narrow search for class I MHC-restricted T-cell assays with positive data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution. Searches for B-cell assays were performed using IEDB B-Cell Search facility interface (http://www.iedb.org/advancedQueryBcell.php) of analogous form (e.g., without assay-related fields for “MHC allele”); narrow searches for negative data were performed without specified value for field labeled as “Epitope Structure Defines”; and broad searches were performed without specified value for both said field and that named “Qualitative Measurement” (see main text for additional details).

[Figure 4 plot: reduced epitope count (vertical axis, 0 to 20) versus sequence length (horizontal axis, 0 to 20).]

Figure 4: Reduced epitope counts (1) for peptidic sequences of uniform length varying only at a single-residue position. Each count corresponds to a set of 20 peptides representing every standard proteinogenic amino-acid residue at the variable-residue position. Functional similarity is equated with either fractional aligned-sequence identity (“◻”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). In the latter case, counts were based on steric incompatibility only ((7) and Figure 2), both steric incompatibility and cavity formation ($c\,\Delta V_{ijk}$ in (8)), or steric incompatibility with both cavity formation and hydrogen bonding ((8) and (9) and Table 1).

Narrow searches were first attempted to retrieve records for which structural data on epitope binding were available, setting the “Yes” option for the Viewer Flag (under “3D Structure of Complex”). Subsequently, the searches were conducted to separately retrieve records curated as containing either negative or positive data on cross-reactivity and were restricted such that the data fields named “Antigen Object Type” and “Antigen Epitope Relation” had values of “Linear Peptide” and “Structurally Related,” respectively, with the data field named “Qualitative Measurement” having a value of either “negative” or “positive.” In the latter case, searches were further limited to class I MHC-restricted T-cell assays for which the data field named “Epitope Structure Defines” had a value of “Exact Epitope” (rather than “epitope containing region/antigenic site”), as cross-reaction may occur with structural differences outside any relevant epitopes unless this is for a peptide confined within a class I MHC binding cleft. For each record thus retrieved, the immunogen and antigen sequences (i.e., values of the data fields named “1st Immunogen Object Primary Molecule Sequence” and “Antigen Object Primary Molecule Sequence,” resp.) were compared. The record was considered for further processing only if the said sequences were of equal length and differed at exactly one amino-acid residue position, and the record was ultimately retained only if a corresponding record with a “Qualitative Measurement” data-field value of “positive” was found sharing the same immunogen and immunoassay protocol except that the antigen was identical to the immunogen (thus confirming that the immunogen had elicited a relevant anti-peptide immune response in the first place). The epitope data thus obtained were analyzed in light of the preceding Section 2.1.
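The sequence comparison described above (equal length, exactly one differing position) can be sketched as a small helper. The function name and return format are illustrative assumptions, and the check against a matching positive control record is not modeled here.

```python
from typing import Optional, Tuple


def single_substitution(immunogen: str, antigen: str) -> Optional[Tuple[int, str, str]]:
    """Return (position, immunogen_residue, antigen_residue) if the sequences
    have equal length and differ at exactly one position; otherwise None.

    The position is 1-based.
    """
    if len(immunogen) != len(antigen):
        return None
    differences = [
        (k, a, b)
        for k, (a, b) in enumerate(zip(immunogen, antigen), start=1)
        if a != b
    ]
    return differences[0] if len(differences) == 1 else None
```

A record would pass this first filter only when the helper returns a tuple; identical sequences, sequences of unequal length, and sequences differing at two or more positions all return `None`.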

The broad searches were conducted to retrieve records on linear peptidic epitopes in general. Among the epitope records thus retrieved, those wherein the data field named “Modification” was nonempty (indicating amino-acid residue covalent modification) were excluded from further consideration, and the remainder were analyzed as regards functional redundancy according to the preceding Section 2.1.

3. Results and Discussion

3.1. Functional Redundancy versus Sequence. As depicted in Figure 4, computed functional redundancy increases with epitope sequence length if functional similarity is equated with fractional aligned-sequence identity (“◻”), whereas it is independent of sequence length if functional similarity is equated with the Shannon information entropy for differential epitope binding (the other three data series). The latter approach thus better accounts for the possibility of a structural difference in only a single chemical group being sufficient to preclude antigenic cross-reaction, provided that the sequence under consideration is entirely encompassed by the relevant epitope structure, as structural differences may be functionally irrelevant if they lie outside the epitope. Moreover, computed functional redundancy decreases (i.e., the reduced epitope count approaches the total epitope count) when both steric incompatibility and the energetic penalty of cavity formation are considered instead of steric incompatibility alone. The redundancy decreases even further if the energetic penalty of unsatisfied hydrogen bonds (more generally representing suboptimal electrostatic complementarity) is also considered.
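The length dependence under fractional identity can be checked by hand from (1) through (4) for the setup of Figure 4, namely $n = 20$ peptides of length $m$ that differ only at one position. The short derivation below is offered as a worked illustration, not as part of the original analysis.

```latex
\begin{align*}
% Any two distinct peptides share m - 1 of m aligned residues, so by (3) and (4):
S_{ij} &= \frac{m-1}{m} \quad (i \ne j), \qquad S_{ii} = 1, \\
% Substituting into (2) with n = 20 (the j = i term contributes 0):
w_i &= \frac{1 + \sum_{j=1}^{20} \left(1 - S_{ij}\right)}{20}
     = \frac{1 + 19/m}{20}, \\
% Summing over the 20 peptides as in (1):
r &= \sum_{i=1}^{20} w_i = 1 + \frac{19}{m}.
\end{align*}
```

Thus $r = 20$ for $m = 1$ and $r = 1.95$ for $m = 20$, so the reduced count falls (redundancy rises) as the sequences lengthen. By contrast, assuming that identical residues have $s_{ijk} = 1$, every identical aligned position contributes a factor of 1 to the product in (6), so $f_j$ and hence $S_{ij}$ in (5) depend only on the variable position and not on $m$.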

3.2. Empirical Cross-Reactivity Data. IEDB searches yielded 26 records on epitope cross-reactivity vis-à-vis single-residue substitution (Table 2), all of which contain only qualitative rather than quantitative binding data, without reference to atomic coordinates or other structural details of bound epitopes. The data suggest that the posited steric incompatibility (in (7) and Figure 2) assumes too little tolerance for suboptimal complementarity, especially as regards volume differences between immunogen and antigen in Figure 5, which depicts examples of cross-reaction despite substitutions increasing residue bulk (i.e., positive-data points to the left of the diagonal). This might be at least partially corrected by relaxing or even eliminating the steric-incompatibility criteria, possibly replacing them with a continuous-form shape-dependent energetic penalty (considering that they assume an infinite-step hard-sphere model of steric clashes between atoms, whereas potential energy may be more accurately represented by continuous functions of interatomic separation distances); likewise, electrostatic complementarity also might be more accurately described, for example, in terms of differential energetic contributions due to charged and uncharged hydrogen-bonding partners. However, typical epitope binding actually may be suboptimal, considering thermodynamic and kinetic constraints on affinity maturation in B-cell ontogeny and the typical absence of affinity maturation in T-cell ontogeny, which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity than immunogen.

Notwithstanding the problem of suboptimal complementarity described above, data presented in Table 2 suggest that the approach to epitope sequence redundancy proposed herein is more conceptually and practically meaningful than setting threshold (i.e., cutoff) values for fractional aligned-sequence identity. This is exemplified by the control-reaction epitopes of data rows 18 and 19 (having consensus sequence LLXRDSFEV). These epitopes differ from each other at just one residue position. As typical MHC class I-restricted T-cell epitopes, they are nonapeptides for which the highest meaningful sequence-identity threshold value is 8/9 (∼89%)

Table 2: IEDB qualitative-outcome (Q) data on peptide cross-reactivity assays (cf. Figure 5). Assay information comprises assay type, MHC class, and parameter measured (cf. IEDB “measurement of” data field). IEDB IDs are given for the control reaction and the cross-reaction. Sequence context shows X at the position of substitution.

Row  Q  Type    MHC class  Parameter measured        Control ID  Cross ID  Epitope ID  Substitution  Sequence context  Ref.
1    −  B-cell             Antibody-antigen binding  1370146     1370150   10801       W→Y           DXEDRYYRE         [22]
2    −  T-cell  I          Cytokine release (IFN-γ)  1340848     1340854   62604       P→A           SYIXSAEKI         [23]
3    −  T-cell  II         Cell proliferation        1646734     1646735   107384      E→D           GEPGIAGFKGXQGPK   [24]
4    −  T-cell  II         Cell proliferation        1660246     1660247   46317       F→A           NTWTTCQSIAXPSK    [25]
5    −  T-cell  II         Cytokine release (IL-4)   1660252     1660258   112245      A→F           NTWTTCQSIAXPSK    [25]
6    −  T-cell  II         Cytokine release (IL-2)   1667060     1667065   68846       K→A           VHFFXNIVTPRTP     [26]
7    −  T-cell  II         Cell proliferation        1667112     1667113   80071       A→K           VHFFXNIVTPRTP     [26]
8    −  T-cell  II         Cell proliferation        1716604     1716614   125569      T→V           SWEGVGVXPDV       [27]
9    −  T-cell  II         Cell proliferation        1716604     1716616   125569      D→N           SWEGVGVTPXV       [27]
10   −  T-cell  II         Cell proliferation        1716605     1716618   125571      V→T           SWEGVGVXPNV       [27]
11   +  T-cell  I          Cytotoxicity              1634784     1634785   103300      S→A           IYXTVAGSL         [28]
12   +  T-cell  I          Cytotoxicity              1634784     1634786   103300      G→S           IYSTVAXSL         [28]
13   +  T-cell  I          Cytotoxicity              1634788     1634789   103298      S→G           IYATVAXSL         [28]
14   +  T-cell  I          Cytotoxicity              1634788     1634790   103298      A→S           IYXTVASSL         [28]
15   +  T-cell  I          Cytotoxicity              1657362     1657364   111964      I→D           YXFAFRDL          [29]
16   +  T-cell  I          Cytokine release (IFN-γ)  1716442     1716443   125101      I→T           XMIKFNRL          [30]
17   +  T-cell  I          Cytokine release (IFN-γ)  1340848     1340852   62604       K→D           SYIPSAEXI         [23]
18   +  T-cell  I          Cytokine release (IFN-γ)  1381750     1381748   37174       D→G           LLXRDSFEV         [31]
19   +  T-cell  I          Cytokine release (IFN-γ)  1381752     1381753   37392       H→G           LLXRDSFEV         [31]
20   +  T-cell  I          Cytotoxicity              1404705     1404714   61086       S→I           SXIEFARL          [32]
21   +  T-cell  I          Cytotoxicity              1404705     1404715   61086       A→E           SSIEFXRL          [32]
22   +  T-cell  I          Cytotoxicity              1404705     1404716   61086       R→K           SSIEFAXL          [32]
23   +  T-cell  I          Cytotoxicity              1404712     1404719   61088       E→A           SSIEFXRL          [32]
24   +  T-cell  I          Cytotoxicity              1404713     1404721   61085       K→R           SSIEFAXL          [32]
25   +  T-cell  I          Cytokine release (IFN-γ)  1865529     1865530   98995       T→F           SAPDXRPA          [33]
26   +  T-cell  I          Cytokine release (IFN-γ)  1865529     1865532   98995       A→L           SAPDTRPX          [33]

[Figure 5 plot: antigen residue volume (Å³) versus immunogen residue volume (Å³), both axes spanning 60 to 240 Å³; points are labeled with one-letter amino-acid codes and with Table 2 row numbers. Legend: reference; negative B-cell; negative T-cell I; negative T-cell II; positive T-cell I.]

Figure 5: Peptide cross-reactivity assay data of Table 2 vis-à-vis amino-acid residue volume [21] of immunogen and antigen at single-residue substitution positions along sequence alignments.

For longer epitopes (e.g., typical MHC class II-restricted T-cell epitopes), the value is even higher. For example, in the case of data rows 4 and 5 (having consensus sequence NTWTTCQSIAXPSK), the said value is 13/14 (~93%). The threshold values thus calculated are markedly higher than those used (e.g., 70% [20]) for conventional protein sequence analysis. Moreover, regardless of the exact sequence lengths, applying such threshold values would entail discarding data on epitopes differing by one residue (e.g., the examples just cited from Table 2). To avoid such loss of potentially valuable information, all the available epitope data might be analyzed and a corresponding reduced epitope count reported to indicate the level of redundancy in the underlying dataset.
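The threshold arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the comparison assumes equal-length ungapped sequences, and the two nonapeptides are illustrative inputs consistent with the consensus sequence LLXRDSFEV rather than data taken verbatim from IEDB.

```python
from fractions import Fraction


def fractional_identity(seq_a: str, seq_b: str) -> Fraction:
    """Fraction of identical positions between two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return Fraction(matches, len(seq_a))


def highest_meaningful_threshold(length: int) -> Fraction:
    """Identity of two length-n sequences that differ at exactly one position: (n - 1) / n."""
    if length < 2:
        raise ValueError("length must be at least 2")
    return Fraction(length - 1, length)


# Illustrative nonapeptides differing at one position (assumed inputs).
nonamer_a = "LLDRDSFEV"
nonamer_b = "LLHRDSFEV"

print(fractional_identity(nonamer_a, nonamer_b))  # 8/9
print(highest_meaningful_threshold(14))           # 13/14

# A conventional 70% cutoff would label the pair as redundant.
conventional_cutoff = Fraction(70, 100)
print(fractional_identity(nonamer_a, nonamer_b) > conventional_cutoff)  # True
```

Exact fractions are used so that 8/9 and 13/14 are reproduced without rounding.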

3.3. Survey of Peptidic Epitope Sequence Data. IEDB searches yielded 11497 records on peptidic epitopes of length less than 50 residues (i.e., the general cutoff provided in the IEDB curation manual [39]) curated as covalently unmodified and comprising 4443 B-cell and 7054 T-cell epitope records (the latter divided between 2749 and 4305 records for MHC restriction classes I and II, respectively), for which computed functional redundancy vis-à-vis sequence length is depicted in Figure 6.

Considering the wide range of sequence counts for the various sequence lengths in Figure 6, a logarithmic scale was chosen to depict the data in a compact form. Consequently, the difference calculated as sequence count − reduced count was plotted instead of reduced count itself where sequence count > reduced count, to avoid crowding of data points. For all positive sequence counts (“◻”), the said differences were consistently higher where functional similarity was equated with fractional aligned-sequence identity (“×”) rather than the Shannon information entropy of the differential epitope binding. In the latter case, the differences were consistently less than 1, with reduced count close or even equal to sequence count. These results are anticipated in view of Figure 4, which shows that reduced count is independent of sequence length using the entropy-based approach proposed herein but decreases with sequence length using fractional aligned-sequence identity.

The data in Figures 4 and 6 thus point to key considerations in evaluating functional redundancy. Clearly, functional redundancy may be overestimated by simply equating functional similarity with fractional aligned-sequence identity, especially with increasing sequence length for highly similar sequences. On the other hand, functional redundancy may be underestimated where functional similarity is equated with the Shannon information entropy of differential epitope binding if the sequence under consideration extends beyond the relevant epitope structure, which becomes more likely with increasing sequence length. Hence, the information-entropy approach as developed herein is conceivably applicable to sequences not exceeding a realistic application-dependent epitope length (e.g., six residues for antigen binding by anti-peptide antibodies [40]), although rigorous generalization to longer sequences might be pursued by aligning subsequences of such length and possibly accounting for differential immunodominance among immunogen epitopes [41]. For example, if a set of peptides were to be found such that each of them comprised a single immunodominant epitope having fixed length (e.g., six residues in all cases), the information-entropy approach might be applied to the entire set even if the peptides were of variable length. Moreover, even if the said immunodominant epitopes were highly similar to one another in terms of fractional aligned-sequence identity, all the available data still could be included in subsequent analyses, provided that both the total and reduced epitope counts would be reported in order to account for functional redundancy. In this way, useful epitope data could be retained instead of discarded.
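One way to picture the suggested alignment of fixed-length subsequences is a sliding window over two equal-length, ungapped peptides. The sketch below is an assumption-laden illustration, not the procedure used in this work: the helper names are hypothetical, the window width of six follows the example given for anti-peptide antibodies, immunodominance is ignored, and the two peptides are illustrative inputs.

```python
from typing import Iterator, List, Tuple


def windows(sequence: str, width: int = 6) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, subsequence) for each contiguous window."""
    if width < 1:
        raise ValueError("width must be positive")
    for start in range(len(sequence) - width + 1):
        yield start + 1, sequence[start:start + width]


def aligned_window_pairs(immunogen: str, antigen: str, width: int = 6) -> List[Tuple[int, str, str]]:
    """Pair same-position windows of two equal-length, ungapped sequences."""
    if len(immunogen) != len(antigen):
        raise ValueError("sequences must be of equal length")
    return [
        (pos, w_imm, w_ant)
        for (pos, w_imm), (_, w_ant) in zip(windows(immunogen, width), windows(antigen, width))
    ]


# Illustrative nonapeptides differing only at position 2 (assumed inputs).
pairs = aligned_window_pairs("DWEDRYYRE", "DYEDRYYRE")
for pos, w_imm, w_ant in pairs:
    status = "differs" if w_imm != w_ant else "identical"
    print(pos, w_imm, w_ant, status)
```

Only the windows that span the substituted position differ, so a pairwise similarity measure applied per window would treat the remaining windows as identical subsequences.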

3.4. Applications and Future Directions. The discussion thus far suggests the potentially greater utility of antigenic functional similarity cast as the Shannon information entropy of differential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redundancy of epitope data to express their structural diversity in a biologically meaningful manner (i.e., explicitly in terms of relative binding affinity, which underlies the balance between the inversely related emergent continuum phenomena of immunologic specificity and cross-reactivity). This is subject to the caveat that assuming an idealized epitope-binding site of optimal steric and electrostatic complementarity (e.g., comparable to what is typically observed in a natively folded wild-type protein) may both overestimate affinity for an immunogen epitope and underestimate affinity for antigen epitopes structurally different from the immunogen epitope (thereby possibly overestimating immunologic specificity as a consequence of underestimating potential for cross-reactivity). These considerations could guide the development of immunization strategies (e.g., via vaccination) and immunodiagnostics (e.g., to produce peptidic constructs suitable for eliciting anti-peptide antibodies that either protect against disease or serve to detect particular antigens in biological samples via immunoassays), noting that the practical significance of immunologic specificity and cross-reactivity is highly context-dependent.

[Figure 6 plots, panels (a) through (c): sequence count or sequence count − reduced count (logarithmic scale, 0.001 to 1000) versus sequence length (0 to 50).]

Figure 6: Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c). Sequence counts (“◻”) are total epitope counts in IEDB. Corresponding reduced counts ($r$ in (1)) are expressed as the difference (sequence count − reduced count), with functional similarity equated with either fractional aligned-sequence identity (“×”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). Missing “◻” indicates sequence count = 0 (e.g., for B-cell epitope sequence length < 4 in (a)); where “◻” is present, a missing “×” or entropy-based marker indicates sequence count − reduced count = 0 (e.g., with missing entropy-based marker for B-cell epitope sequence length = 5 in (a)). Both “×” and the entropy-based marker are missing where sequence count = 1 (e.g., for B-cell epitope sequence length = 43 in (a)), as reduced count = 1 for sequence count = 1.

From the perspective of immunization, immunity (e.g., induced via vaccination or passive transfer of immune-system components) may be useful if specifically directed against a pathogen epitope so as to favor pathogen elimination without producing harmful hypersensitivity reactions (e.g., in the form of allergy or autoimmunity). Yet such immunity might be even more useful if it also favored elimination of multiple antigenically related variant pathogens via cross-reaction (such that a single vaccine epitope elicited the production of immune-system components each cross-reactive with multiple variant pathogen epitopes) while still avoiding hypersensitivity. Cross-reactivity of immune responses is thus potentially useful within limits defined by risk of harmful hypersensitivity (e.g., due to cross-reaction with host self-antigens). As regards immunodiagnosis, an immunodiagnostic test may be useful if it enables specific pathogen detection via discrimination between a pathogen epitope and a set of other biologically relevant epitopes (e.g., of the host or its commensal symbionts in health, or of other pathogens for which the clinical implications are very different). Yet the test might be even more useful if it also enabled detection of multiple antigenically related variant pathogens (such that a single immunologic probe detected multiple variant pathogen epitopes) while still retaining its discriminatory power with respect to other biologically relevant epitopes, particularly where the variant pathogens are very similar to one another in terms of their clinical implications (e.g., prognosis and therapeutic options). Hence, cross-reactivity strongly influences the outcomes of both immunization and immunodiagnosis, for which reason their development could be supported by computational tools that estimate epitope-binding affinity for cross-reactions.

In order to estimate the epitope-binding affinity of an immune-system component (e.g., antibody) elicited in response to (and thus having a binding site for) immunogen epitope $i$ for antigen epitope $j$, one possible approach would be to express affinity in terms of association constants, for example, as

$$K_{ij} = K_{ii} Z_{ij}, \tag{10}$$

where $K_{ij}$ is the association constant for cross-reaction with antigen epitope $j$, $K_{ii}$ is the association constant for reaction with immunogen epitope $i$, and $Z_{ij}$ is a correction term to account for the difference between the two association constants. In turn, $K_{ii}$ might be estimated as

$$K_{ii} = \exp\left(-\frac{\Delta G_{ii}}{RT}\right), \tag{11}$$

where $\Delta G_{ii}$ is the free-energy change for reaction with immunogen epitope $i$, while both $R$ and $T$ retain their meanings as in (7). In cases where epitopes $i$ and $j$ are both $m$ residues in length, $Z_{ij}$ might be estimated as

$$Z_{ij} = \begin{cases} \prod_{k=1}^{m} s_{ijk}, & \text{if } \prod_{k=1}^{m} s_{ijk} > K_{ii}^{-1}, \\ K_{ii}^{-1}, & \text{otherwise}, \end{cases} \tag{12}$$

where the product retains its meaning as in (6), such that $K_{ij} \geq 1$ (with $K_{ij} = 1$ for nonspecific binding). $\Delta G_{ii}$ could be estimated from epitope structure (e.g., using structural energetics [40–42]) subject to mechanism-specific constraints (e.g., on B-cell affinity maturation [43]), and $s_{ijk}$ could be estimated along similar lines with regard to $\Delta\Delta G_{ijk}$ in (7).

The preceding discussion has alluded to epitope binding as if it were a binary interaction, which is strictly true only for B-cell epitopes (e.g., bound by surface immunoglobulin or antibody); but in the context of T-cell epitopes, the epitope-binding site is to be understood as a bipartite entity consisting of an MHC molecule and a T-cell receptor (TCR), such that the epitope becomes at least partly confined between the two binding-site components. The overall process of T-cell recognition is subject to thermodynamic and kinetic constraints on initial MHC-peptide binding [44] and subsequent TCR recognition of MHC-peptide complex [45], which potentially complicate prediction of T-cell cross-reactivity and functional consequences thereof (e.g., considering how binding affinity and receptor numbers interact to influence T-cell effector function [45]). Still, predicting B-cell epitope cross-reactivity is challenging in view of the greater structural diversity of B-cell epitopes, especially where only a fraction of the epitope surface contributes to the epitope-paratope interface, such that cross-reaction may occur with variations in epitope structure beyond the interface. Hence, although the analysis herein for continuous epitopes might be generalized to discontinuous epitopes, with indexing of residues by relative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignment gaps, correlating variations in epitope-residue structure with potential for cross-reactivity would be nontrivial.
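Returning to the affinity estimate in (10) through (12), a minimal numerical sketch may help fix ideas. Everything specific here is an assumption made for illustration: the function names are hypothetical, the free energy is taken in kcal/mol with the gas constant in matching units, the default temperature of 310.15 K is arbitrary, and the per-residue factors are invented values rather than outputs of (6) or (7).

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K); unit choice is an assumption


def k_ii(delta_g_ii: float, temperature: float = 310.15) -> float:
    """Equation (11): K_ii = exp(-dG_ii / (R * T)), with dG_ii in kcal/mol."""
    return math.exp(-delta_g_ii / (R_KCAL * temperature))


def z_ij(s_values: Sequence[float], k_ii_value: float) -> float:
    """Equation (12): product of per-residue factors, bounded below by 1 / K_ii."""
    product = math.prod(s_values)
    lower_bound = 1.0 / k_ii_value
    return product if product > lower_bound else lower_bound


def k_ij(delta_g_ii: float, s_values: Sequence[float], temperature: float = 310.15) -> float:
    """Equation (10): K_ij = K_ii * Z_ij."""
    k_self = k_ii(delta_g_ii, temperature)
    return k_self * z_ij(s_values, k_self)


# Hypothetical hexapeptide: five unchanged positions and one mildly penalized position.
delta_g = -10.0  # kcal/mol, invented for illustration
print(k_ij(delta_g, [1.0, 1.0, 0.5, 1.0, 1.0, 1.0]) / k_ii(delta_g))

# A factor of zero at any position drives the estimate to the nonspecific limit K_ij = 1.
print(math.isclose(k_ij(delta_g, [1.0, 0.0, 1.0, 1.0, 1.0, 1.0]), 1.0))  # True
```

The lower bound in `z_ij` is what keeps the estimated cross-reaction constant from falling below 1, in line with the statement that $K_{ij} = 1$ corresponds to nonspecific binding.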

The analysis developed herein conceivably could be refined to better account for MHC-peptide binding as distinct from subsequent binding of MHC-peptide complex by TCR, cognizant of MHC polymorphism. Hence, MHC-peptide binding could be analyzed with explicit consideration of pertinent MHC structural details (e.g., noting that unfavorable peptide backbone conformations may preclude accommodation of peptide within the MHC binding cleft and emphasizing the potential affinity contributions of prospective anchor residues on peptides vis-à-vis corresponding MHC binding pockets). Subsequent binding of MHC-peptide complex by TCR could be analyzed separately, with emphasis on peptide surfaces accessible for contact by TCR (e.g., excluding those buried within MHC binding pockets). This could be more readily applied to class I rather than class II MHC molecules, as the latter typically have binding clefts that allow for variable-length peptide overhangs. The analysis for MHC-peptide binding might be extended for class II via docking simulations to predict the MHC-bound conformations and affinity contributions of the overhangs, after which the binding of MHC-peptide complex by TCR could be analyzed with inclusion of any overhang surfaces that are likely to be accessible for contact by TCR.

At a more fundamental level, apparently suboptimal complementarity (manifesting as submaximal immunologic specificity) may be an inherent tradeoff in attaining optimal immune function from the standpoint of biological fitness (e.g., enabling each immune-system component to recognize a variety of structurally distinct epitopes, albeit at the cost of binding each of these epitopes with only submaximal affinity). Investigation of this possibility may lead to insights on how immune responses might be optimally biased (e.g., via vaccination or immunotherapy). Moreover, the asymmetry of cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenic differences in terms of antigenic distance as a metric that is independent of immunization history.

4. Summary and Conclusions

In assembling epitope datasets (e.g., to develop and benchmark epitope-prediction tools), sequence redundancy among epitopes must be considered to avoid misleading results that reflect overrepresentation of functionally similar epitopes. However, potentially useful data may be needlessly discarded by excluding epitopes that share an apparently high degree of sequence similarity, as even only a single-residue difference may manifest as extreme functional dissimilarity between epitopes. The present work thus introduces a reduced epitope count, as defined using (1) and (2), to account for functional redundancy of epitope data (FRED) within an epitope dataset (such that FRED is quantified as the difference between the total and reduced epitope counts). This can be used, for example, to characterize the dataset instead of excluding epitopes with apparently similar sequences from it, thereby maximizing the use of available epitope data.

More importantly, the present work frames FRED in terms of potential antigenic cross-reactivity (rather than sequence distance) construed as functional similarity between epitopes. Hence, pairwise comparison of epitopes is performed such that each epitope pair considered is regarded as consisting of an immunogen epitope (which elicits the immune response of interest) and an antigen epitope (which is the target of cross-reactive binding in an immunoassay to evaluate the immune response). Functional similarity between epitopes is thus defined in a physicochemically and biologically meaningful way as the Shannon information entropy for differential epitope binding in (5), which is compatible with the possibility of extreme functional dissimilarity between epitopes due to seemingly minor differences in sequence. Consequently, the complement of functional similarity is not a distance (in the sense of a unique value quantifying the separation between two points in sequence space), as its value may vary with reversal of the roles (i.e., immunogen or antigen) assigned to epitopes under pairwise comparison. However, it is nonetheless a measure of epitope dissimilarity, such that summation of its values over all epitopes of a dataset according to (2) enables calculation of an averaged contribution of each epitope to the reduced epitope count. This circumvents the conventional dichotomous labeling of epitopes as either redundant or nonredundant based on an arbitrarily selected threshold value of sequence similarity. Such labeling is problematic in that selection of the actual threshold value is difficult to justify on physicochemically and biologically meaningful grounds and because an epitope thus might be labeled as redundant on the basis of similarity to a minority of other epitopes in the dataset.
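The point about role reversal can be made concrete with a small check for symmetry, which any distance in the metric sense must satisfy. The sketch below is purely illustrative: the function name is hypothetical and the matrix entries are invented values, not results computed from (5).

```python
from typing import Sequence


def is_symmetric(matrix: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """Return True if matrix[i][j] equals matrix[j][i] (within tol) for all i, j."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical dissimilarity values: rows index the immunogen, columns the antigen.
dissimilarity = [
    [0.0, 0.9],
    [0.2, 0.0],
]
print(is_symmetric(dissimilarity))  # False
```

A matrix of this kind can still be summed over a dataset, but because swapping immunogen and antigen changes the entries, it cannot be read as a table of distances between points.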

In line with the preceding considerations, epitope functional similarity may be estimated in terms of differences between immunogen- and antigen-epitope structure relative to an idealized binding site of high complementarity to the immunogen epitope. However, this tends to underestimate potential for cross-reactivity, which suggests that epitope-binding site complementarity is typically suboptimal. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes. Such cross-reactivity conceivably could be exploited in the development of practical applications such as vaccines and immunodiagnostics (e.g., to aim for beneficially broad cross-reactivity rather than overly extreme immunologic specificity).

Competing Interests

The author declares that they have no competing interests.

Acknowledgments

This work was supported by an Angelita T. Reyes Centennial Professorial Chair grant awarded to the author.

References

[1] S. E. C. Caoili, “Antidotes, antibody-mediated immunity and the future of pharmaceutical product development,” Human Vaccines and Immunotherapeutics, vol. 9, no. 2, pp. 294–299, 2013.

[2] S. E. C. Caoili, “Beyond new chemical entities: advancing drug development based on functional versatility of antibodies,” Human Vaccines and Immunotherapeutics, vol. 10, no. 6, pp. 1639–1644, 2014.

[3] B. Greenwood, “The contribution of vaccination to global health: past, present and future,” Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 369, no. 1645, Article ID 20130433, 2014.

[4] M. H. V. Van Regenmortel, “What is a B-cell epitope?” Methods in Molecular Biology, vol. 524, pp. 3–20, 2009.

[5] M. H. V. Regenmortel, “Specificity, polyspecificity, and heterospecificity of antibody-antigen recognition,” Journal of Molecular Recognition, vol. 27, no. 11, pp. 627–639, 2014.

[6] N. Tomar and R. K. De, “Immunoinformatics: a brief review,” Methods in Molecular Biology, vol. 1184, pp. 23–55, 2014.

[7] S. E. C. Caoili, “An integrative structure-based framework for predicting biological effects mediated by antipeptide antibodies,” Journal of Immunological Methods, vol. 427, pp. 19–29, 2015.

[8] S. E. C. Caoili, “Immunization with peptide-protein conjugates: impact on benchmarking B-cell epitope prediction for vaccine design,” Protein and Peptide Letters, vol. 17, no. 3, pp. 386–398, 2010.

[9] S. E. C. Caoili, “Hybrid methods for B-cell epitope prediction,” Methods in Molecular Biology, vol. 1184, pp. 245–283, 2014.

[10] S. E. C. Caoili, “Benchmarking B-cell epitope prediction with quantitative dose-response data on antipeptide antibodies: towards novel pharmaceutical product development,” BioMed Research International, vol. 2014, Article ID 867905, 13 pages, 2014.

[11] U. Hobohm, M. Scharf, R. Schneider, and C. Sander, “Selection of representative protein data sets,” Protein Science, vol. 1, no. 3, pp. 409–417, 1992.

[12] U. Lessel and D. Schomburg, “Creation and characterization of a new non-redundant fragment data bank,” Protein Engineering, vol. 10, no. 6, pp. 659–664, 1997.

[13] T. Noguchi and Y. Akiyama, “PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003,” Nucleic Acids Research, vol. 31, no. 1, pp. 492–493, 2003.

[14] P. Motte, G. Alberici, M. Ait-Abdellah, and D. Bellet, “Monoclonal antibodies distinguish synthetic peptides that differ in one chemical group,” The Journal of Immunology, vol. 138, no. 10, pp. 3332–3338, 1987.

[15] M. K. Gilson and S. E. Radford, “Protein folding and binding: from biology to physics and back again,” Current Opinion in Structural Biology, vol. 21, no. 1, pp. 1–3, 2011.

[16] B. Rost, “Twilight zone of protein sequence alignments,” Protein Engineering, vol. 12, no. 2, pp. 85–94, 1999.

[17] B. Y. Khor, G. J. Tye, T. S. Lim, and Y. S. Choong, “General overview on structure prediction of twilight-zone proteins,” Theoretical Biology and Medical Modelling, vol. 12, article 15, 2015.

[18] H. Nakamura, “Roles of electrostatic interaction in proteins,” Quarterly Reviews of Biophysics, vol. 29, no. 1, pp. 1–90, 1996.

[19] P. J. Fleming and G. D. Rose, “Do all backbone polar groups in proteins form hydrogen bonds?” Protein Science, vol. 14, no. 7, pp. 1911–1917, 2005.

[20] S. Basu, D. Bhattacharyya, and R. Banerjee, “Self-complementarity within proteins: bridging the gap between binding and folding,” Biophysical Journal, vol. 102, no. 11, pp. 2605–2614, 2012.

[21] Y. Harpaz, M. Gerstein, and C. Chothia, “Volume changes on protein folding,” Structure, vol. 2, no. 7, pp. 641–649, 1994.

[22] A. Handisurya, S. Gilch, D. Winter, et al., “Vaccination with prion peptide-displaying papillomavirus-like particles induces autoantibodies to normal prion protein that interfere with pathologic prion protein production in infected cells,” FEBS Journal, vol. 274, no. 7, pp. 1747–1758, 2007.

[23] M. Plebanski, E. A. M. Lee, C. M. Hannan, et al., “Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming,” Nature Medicine, vol. 5, no. 5, pp. 565–571, 1999.

[24] E. Michaelsson, M. Andersson, A. Engstrom, and R. Holmdahl, “Identification of an immunodominant type-II collagen peptide recognized by T cells in H-2q mice: self tolerance at the level of determinant selection,” European Journal of Immunology, vol. 22, no. 7, pp. 1819–1825, 1992.

[25] J. M. Greer, C. Klinguer, E. Trifilieff, R. A. Sobel, and M. B. Lees, “Encephalitogenicity of murine but not bovine DM20 in SJL mice is due to a single amino acid difference in the immunodominant encephalitogenic epitope,” Neurochemical Research, vol. 22, no. 4, pp. 541–547, 1997.

[26] A. Gaur, S. A. Boehme, D. Chalmers, et al., “Amelioration of relapsing experimental autoimmune encephalomyelitis with altered myelin basic protein peptides involves different cellular mechanisms,” Journal of Neuroimmunology, vol. 74, no. 1-2, pp. 149–158, 1997.

[27] A. T. Kozhich, Y.-I. Kawano, C. E. Egwuagu, et al., “A pathogenic autoimmune process targeted at a surrogate epitope,” The Journal of Experimental Medicine, vol. 180, no. 1, pp. 133–140, 1994.

[28] H. Masaki, M. Tamura, and I. Kurane, “Induction of cytotoxic T lymphocytes of heterogeneous specificities by immunization with a single peptide derived from influenza A virus,” Viral Immunology, vol. 13, no. 1, pp. 73–81, 2000.

[29] G. B. Lipford, S. Bauer, H. Wagner, and K. Heeg, “Peptide engineering allows cytotoxic T-cell vaccination against human papilloma virus tumour antigen E6,” Immunology, vol. 84, no. 2, pp. 298–303, 1995.

[30] C. L. Keech, K. C. Pang, J. McCluskey, and W. Chen, “Direct antigen presentation by DC shapes the functional CD8+ T-cell repertoire against the nuclear self-antigen La-SSB,” European Journal of Immunology, vol. 40, no. 2, pp. 330–338, 2010.

[31] S. Tangri, G. Y. Ishioka, X. Huang, et al., “Structural features of peptide analogs of human histocompatibility leukocyte antigen class I epitopes that are more potent and immunogenic than wild-type peptide,” Journal of Experimental Medicine, vol. 194, no. 6, pp. 833–846, 2001.

[32] S. J. Turner and F. R. Carbone, “A dominant Vβ bias in the CTL response after HSV-1 infection is determined by peptide residues predicted to also interact with the TCR β-chain CDR3,” Molecular Immunology, vol. 35, no. 5, pp. 307–316, 1998.

[33] E. Lazoura, J. Lodding, W. Farrugia, et al., “Enhanced major histocompatibility complex class I binding and immune responses through anchor modification of the non-canonical tumour-associated mucin 1-8 peptide,” Immunology, vol. 119, no. 3, pp. 306–316, 2006.

[34] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.

[35] E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review, vol. 106, no. 4, pp. 620–630, 1957.

[36] E. T. Jaynes, “Information theory and statistical mechanics. II,” Physical Review, vol. 108, no. 2, pp. 171–190, 1957.

[37] A. E. Eriksson, W. A. Baase, X.-J. Zhang, et al., “Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect,” Science, vol. 255, no. 5041, pp. 178–183, 1992.

[38] B. Honig and A.-S. Yang, “Free energy balance in protein folding,” Advances in Protein Chemistry, vol. 46, pp. 27–58, 1995.

[39] R. Vita, J. A. Overton, J. A. Greenbaum, et al., “The immune epitope database (IEDB) 3.0,” Nucleic Acids Research, vol. 43, no. 1, pp. D405–D412, 2015.

[40] S. E. C. Caoili, “A structural-energetic basis for B-cell epitope prediction,” Protein and Peptide Letters, vol. 13, no. 7, pp. 743–751, 2006.

[41] S. E. C. Caoili, “Benchmarking B-cell epitope prediction for the design of peptide-based vaccines: problems and prospects,” Journal of Biomedicine and Biotechnology, vol. 2010, Article ID 910524, 14 pages, 2010.

[42] S. P. Edgcomb and K. P. Murphy, “Structural energetics of protein folding and binding,” Current Opinion in Biotechnology, vol. 11, no. 1, pp. 62–66, 2000.

[43] S. E. C. Caoili, “On the meaning of affinity limits in B-cell epitope prediction for antipeptide antibody-mediated immunity,” Advances in Bioinformatics, vol. 2012, Article ID 346765, 17 pages, 2012.

[44] N. Dalchau, A. Phillips, L. D. Goldstein, et al., “A peptide filtering relation quantifies MHC class I peptide optimization,” PLoS Computational Biology, vol. 7, no. 10, Article ID e1002144, 2011.

[45] L. J. Edwards and B. D. Evavold, “T cell recognition of weak ligands: roles of signaling, receptor number, and affinity,” Immunologic Research, vol. 50, no. 1, pp. 39–48, 2011.


[Figure 1 structures: the 20 proteinogenic amino acids drawn in seven panels. Subset 1: no trigonal planar atoms or branching at β through ε positions. Subset 2: trigonal planar δ-carbon atom. Subset 3: trigonal planar γ-carbon atom but no five-membered ring. Subset 4: trigonal planar γ-carbon atom in five-membered ring. Subset 5: branching at tetrahedral γ-carbon atom. Subset 6: branching at tetrahedral β-carbon atom. Subset 7: with ring comprising main-chain nitrogen atom.]

Figure 1: Subsets of the 20 proteinogenic amino acids grouped by stereochemical similarity. Structures were obtained from the ChEBI (Chemical Entities of Biological Interest) database (http://www.ebi.ac.uk/chebi). Within each subset comprising two or more amino acids, these are arranged from left to right in order of decreasing size, such that sterically incompatible changes are avoided by substitution of a smaller amino acid for a larger one. The subsets are linked to form the residue substitution paths in Figure 2, wherein Subset 1 comprises an entire path while the other subsets comprise the upstream segments of paths converging on either Cys (C) or Ala (A).

the 20 proteinogenic amino acids into subsets according to stereochemical similarity, as depicted in Figure 1. Subset assignments are based on the covalent bonding geometry of nonhydrogen side-chain atoms. In particular, these atoms are characterized as either tetrahedral or trigonal planar, and branching of tetrahedral atoms is also considered. For example, methyl and methylene carbon atoms are tetrahedral, as are sulfur and amine nitrogen atoms. Hence, the sulfhydryl group of Cys is regarded as isosteric with a methyl group, whereas the sulfur atom of Met and the side-chain imino group of Arg are both regarded as isosteric with a methylene group. In contrast, amide and carboxyl carbon atoms are trigonal planar rather than tetrahedral.

The subsets defined in Figure 1 are linked to form residue substitution paths avoiding sterically incompatible changes, as depicted in Figure 2. Subset 1 comprises an entire path, while the other subsets comprise the upstream segments of paths converging on either Cys or Ala. The path of Subset 2 converges on Cys, as members of Subset 1 larger than Cys are sterically incompatible with Subset 2, whose δ-carbon atom is trigonal planar. The path of Subset 3 converges on Ala rather than Cys because the γ-carbon atom is trigonal planar in Subset 3. The case of Subset 4 is thus similar to that of Subset 3, but the γ-carbon atom in Subset 4 is constrained as part of a five-membered ring, such that Subsets 3 and 4 are deemed sterically incompatible. The path of Subset 5 (Leu) converges on Cys, as members of Subset 1 larger than Cys are sterically incompatible with the branching at the γ-carbon atom of Leu. The path of Subset 6 converges on Ala instead of Cys because of the branching at the β-carbon atom of Subset 6. Finally, the path of Subset 7 (Pro) converges on Ala, as the Pro side-chain is constrained by ring formation with the main-chain nitrogen atom and thus distorted beyond the β-carbon atom.

[Figure 2 graphic (diagram of substitution paths linking Subsets 1 through 7) not reproduced.]

Figure 2: Amino-acid residue substitution paths avoiding sterically incompatible changes (cf. (7)) for the 20 canonical proteinogenic residues. Numbers correspond to the subsets of stereochemically similar amino acids depicted in Figure 1. For every residue substitution along the paths shown, the number of unsatisfied hydrogen bonds is presented in Table 1.

Table 1: Numbers of unsatisfied hydrogen bonds for amino-acid residue substitutions of Figure 2, calculated according to (9).

     G  A  S  C  D  T  P  N  V  E  Q  H  L  I  M  K  R  F  Y  W
G    0  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —
A    0  0  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —
S    1  1  0  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —
C    0  0  1  0  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —
D    2  2  —  —  0  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —
T    1  1  0  1  —  0  —  —  —  —  —  —  —  —  —  —  —  —  —  —
P    1  1  —  —  —  —  0  —  —  —  —  —  —  —  —  —  —  —  —  —
N    2  2  —  —  2  —  —  0  —  —  —  —  —  —  —  —  —  —  —  —
V    0  0  1  0  —  1  —  —  0  —  —  —  —  —  —  —  —  —  —  —
E    2  2  3  2  —  —  —  —  —  0  —  —  —  —  —  —  —  —  —  —
Q    2  2  3  2  —  —  —  —  —  2  0  —  —  —  —  —  —  —  —  —
H    2  2  —  —  —  —  —  —  —  —  —  0  —  —  —  —  —  —  —  —
L    0  0  1  0  —  —  —  —  —  —  —  —  0  —  —  —  —  —  —  —
I    0  0  1  0  —  1  —  —  0  —  —  —  —  0  —  —  —  —  —  —
M    0  0  1  0  —  —  —  —  —  —  —  —  —  —  0  —  —  —  —  —
K    1  1  2  1  —  —  —  —  —  —  —  —  —  —  1  0  —  —  —  —
R    3  3  4  3  —  —  —  —  —  —  —  —  —  —  3  4  0  —  —  —
F    0  0  —  —  2  —  —  2  —  —  —  —  —  —  —  —  —  0  —  —
Y    1  1  —  —  3  —  —  3  —  —  —  —  —  —  —  —  —  1  0  —
W    1  1  —  —  —  —  —  —  —  —  —  1  —  —  —  —  —  —  —  0

According to Figure 2, a substitution from any residue upstream to any residue downstream along a particular path avoids steric incompatibility. Conversely, downstream-to-upstream and off-path substitutions are deemed sterically incompatible. This is a qualitative first approximation using a highly idealized model rather than an attempt at definitive categorization.
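The upstream-to-downstream rule can be sketched as a simple lookup. The sketch below is illustrative only: it assumes that the non-null off-diagonal cells of Table 1 enumerate exactly the sterically compatible substitutions, and that the immunogen residue is the upstream (row) residue while the antigen residue is the downstream (column) residue. The names `DOWNSTREAM` and `is_sterically_compatible` are hypothetical.

```python
# Hypothetical lookup transcribed from the non-null off-diagonal cells of
# Table 1: for each row (upstream) residue, the column (downstream) residues
# that may replace it without steric incompatibility.
DOWNSTREAM = {
    "G": "", "A": "G", "S": "GA", "C": "GAS", "D": "GA",
    "T": "GASC", "P": "GA", "N": "GAD", "V": "GASCT", "E": "GASC",
    "Q": "GASCE", "H": "GA", "L": "GASC", "I": "GASCTV", "M": "GASC",
    "K": "GASCM", "R": "GASCMK", "F": "GADN", "Y": "GADNF", "W": "GAH",
}


def is_sterically_compatible(immunogen_residue: str, antigen_residue: str) -> bool:
    """Return True if the substitution is identical or upstream-to-downstream.

    Assumption: the immunogen residue is the upstream residue of Figure 2.
    """
    if immunogen_residue == antigen_residue:
        return True
    return antigen_residue in DOWNSTREAM[immunogen_residue]


if __name__ == "__main__":
    print(is_sterically_compatible("R", "K"))  # upstream-to-downstream on the Subset 1 path
    print(is_sterically_compatible("K", "R"))  # downstream-to-upstream
    print(is_sterically_compatible("L", "I"))  # off-path
```

Under this reading, reversing a compatible substitution always yields an incompatible one, which is the asymmetry the idealized model imposes.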

As regards $\Delta\Delta G_{ijk}$ (in (7)), this is related to amino-acid residue structural differences (as a qualitative first approximation rather than an attempt to obtain quantitatively very accurate results) by heuristic partitioning as

$$\Delta\Delta G_{ijk} = c\,\Delta V_{ijk} + b\,N_{ijk}, \tag{8}$$

where $c$ and $b$ are both proportionality constants, while $\Delta V_{ijk}$ is the change in residue volume and $N_{ijk}$ is the number of unsatisfied hydrogen bonds (Table 1, wherein the residues are ordered by their volumes [21]), considering that an energetic penalty is incurred with unsatisfied hydrogen bonds upon binding a suboptimally complementary epitope.

Table 1 is sparse, with a null value (“—”) for most cells, because (8) is evaluated only for sterically compatible substitutions (cf. (7)). Hence, the numbers in Table 1 are for sterically compatible substitutions according to Figure 2. The said numbers were each obtained assuming that every available highly electronegative atom participates in the formation of a hydrogen bond, as

$$N_{ijk} = A_{ijk} - B_{ijk}, \tag{9}$$

where $A_{ijk}$ is the total number of oxygen and nitrogen atoms in the aligned residues, while $B_{ijk}$ is the number of such atoms for which direct correspondence was affirmed as regards structural superposability, atom identity, and covalent bonding partners (according to Figure 1). Thus, $N_{ijk}$ increases with each failure to affirm a direct correspondence between hydrogen-bonding atoms. Physically, such failure corresponds to an unsatisfied hydrogen bond on the epitope, its binding site, or both.

Direct correspondence was affirmed for all main-chain (i.e., backbone) carbonyl oxygen atoms. In view of the difference between Pro and other residues at the main-chain nitrogen atom, direct correspondence was affirmed for this atom between Pro and itself as well as between non-Pro residues. Among side-chain atoms, direct correspondence was affirmed only between hydroxyl oxygen atoms of Ser and Thr, carbonyl oxygen atoms of Asp and Asn and of Glu and Gln, and δ-nitrogen atoms of His and Trp. To facilitate explication, a residue is termed polar if its side-chain contains at least one oxygen or nitrogen atom but nonpolar if otherwise. Accordingly, $N_{ijk} = 0$ for pairs of identical residues (along the diagonal of Table 1) and of non-Pro nonpolar residues. For Pro paired with Gly or Ala, $N_{ijk} = 1$ due to the main-chain nitrogen difference. For a polar residue paired with a non-Pro nonpolar residue, $N_{ijk}$ is the total number of side-chain oxygen and nitrogen atoms. For a pair of polar residues, the total number of side-chain oxygen and nitrogen atoms is an upper bound on $N_{ijk}$. Hence, $N_{ijk} = 3$ for Tyr paired with Asn, although $N_{ijk} = 2$ for Asn paired with Asp.
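The counting rules above can be sketched in code as a check on Table 1. This is a minimal sketch, not the author's implementation: it assumes that each affirmed side-chain correspondence removes two atoms from the count (one per residue), as implied by the Asn/Asp example, and that a Pro versus non-Pro pairing adds one for the main-chain nitrogen. The names are hypothetical, and pairings of Pro with polar residues are not covered by Table 1.

```python
# Number of side-chain oxygen and nitrogen atoms per residue.
SIDE_CHAIN_O_N = {
    "G": 0, "A": 0, "S": 1, "C": 0, "D": 2, "T": 1, "P": 0, "N": 2,
    "V": 0, "E": 2, "Q": 2, "H": 2, "L": 0, "I": 0, "M": 0, "K": 1,
    "R": 3, "F": 0, "Y": 1, "W": 1,
}

# Residue pairs with an affirmed side-chain atom correspondence
# (Ser/Thr hydroxyl O, Asp/Asn and Glu/Gln carbonyl O, His/Trp N).
MATCHED_PAIRS = {frozenset("ST"), frozenset("DN"), frozenset("EQ"), frozenset("HW")}


def unsatisfied_h_bonds(residue_i: str, residue_j: str) -> int:
    """Sketch of N = A - B in (9) for two aligned residues."""
    if residue_i == residue_j:
        return 0
    total = SIDE_CHAIN_O_N[residue_i] + SIDE_CHAIN_O_N[residue_j]
    # Assumption: a matched pair accounts for two corresponding atoms.
    matched = 2 if frozenset((residue_i, residue_j)) in MATCHED_PAIRS else 0
    # Main-chain nitrogen differs between Pro and every other residue.
    proline_mismatch = 1 if "P" in (residue_i, residue_j) else 0
    return total - matched + proline_mismatch


if __name__ == "__main__":
    for pair in [("Y", "N"), ("N", "D"), ("P", "A"), ("R", "S"), ("T", "S")]:
        print(pair, unsatisfied_h_bonds(*pair))
```

Applied to the sterically compatible pairs, this sketch reproduces the non-null entries of Table 1, for example 3 for Tyr with Asn and 2 for Asn with Asp.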

Provisional values for the proportionality constants $c$ and $b$ in (8) were obtained from the literature. In particular, $c$ was regarded as the energetic penalty per unit volume of a cavity formed due to substitution of a less bulky residue within the interior of a folded protein, with a value of −0.024 kcal mol⁻¹ Å⁻³ estimated empirically from protein mutation studies [37], and $b$ was likewise regarded as the energetic cost of breaking a hydrogen bond, with a conservative estimated value of 0.5 kcal mol⁻¹ per bond [38].
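As a worked illustration of (8) with these provisional constants, the following sketch evaluates the free-energy penalty for a single substitution. It assumes that the volume change is taken as antigen minus immunogen residue volume in cubic ångströms (so that a smaller antigen residue gives a negative change and hence a positive cavity penalty) and that $c$ is expressed per cubic ångström. The function name and the example inputs are hypothetical and are not values from [21].

```python
CAVITY_PENALTY = -0.024  # c in (8); kcal/mol per cubic angstrom (assumed units)
H_BOND_PENALTY = 0.5     # b in (8); kcal/mol per unsatisfied hydrogen bond


def delta_delta_g(delta_volume: float, n_unsatisfied: int) -> float:
    """Evaluate (8): c * (volume change) + b * (unsatisfied hydrogen bonds)."""
    return CAVITY_PENALTY * delta_volume + H_BOND_PENALTY * n_unsatisfied


if __name__ == "__main__":
    # Hypothetical substitution: volume decreases by 50 cubic angstroms and
    # one hydrogen bond is left unsatisfied.
    # Cavity term: (-0.024) * (-50) = 1.2; hydrogen-bond term: 0.5 * 1 = 0.5.
    print(delta_delta_g(-50.0, 1))  # approximately 1.7 kcal/mol
```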

Thus, using (1) through (8) in conjunction with Table 1, the literature values for constants in (8), and published amino-acid residue volumes [21], reduced epitope counts (1) were computed for artificial epitope sequences and also actual experimentally studied epitope sequences, assuming a temperature of 37 °C (310.15 K).

2.2. Retrieval and Processing of Epitope Data. Epitope data were obtained via the Immune Epitope Database (IEDB, at http://www.iedb.org) [39] as database records retrieved through searches conducted from 4 to 6 October 2015 using its B-Cell Search and T-Cell Search facilities, in the latter case restricting searches to either MHC class I or II alleles. Each record thus retrieved pertained to an individual B-cell or T-cell assay and contained data fields defined in relation to the key concepts of “Object Type” (i.e., molecule type in the context of chemical structure), “Epitope Relation” (i.e., molecule relationship to epitope), “1st Immunogen” (i.e., immunogen initially administered to elicit the immune response), and “Antigen” (i.e., antigen used in the B-cell or T-cell assay). Searches were restricted such that the data fields named “1st Immunogen Object Type” and “1st Immunogen Epitope Relation” had values of “Linear Peptide” and “Epitope,” respectively, with the “MHC class” option set to either “I” or “II” for T-Cell Search. Employing this basic search strategy, both narrow and broad searches were conducted, the former to retrieve data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution and the latter to survey the entire repertoire of available data on linear peptidic epitopes (as illustrated by example in Figure 3).

Narrow searches were first attempted to retrieve records for which structural data on epitope binding were available, setting the “Yes” option for the Viewer Flag (under “3D Structure of Complex”). Subsequently, the searches were conducted to separately retrieve records curated as containing either negative or positive data on cross-reactivity and were restricted such that the data fields named “Antigen Object Type” and “Antigen Epitope Relation” had values of “Linear Peptide” and “Structurally Related,” respectively, with the data field named “Qualitative Measurement” having a value of either “negative” or “positive.” In the latter case, searches were further limited to class I MHC-restricted T-cell assays for which the data field named “Epitope Structure Defines” had a value of “Exact Epitope” (rather than “epitope containing region/antigenic site”), as cross-reaction may occur with structural differences outside any relevant epitopes unless this is for a peptide confined within a class I MHC binding cleft. For each record thus retrieved, the immunogen and antigen sequences (i.e., values of the data fields named “1st Immunogen Object Primary Molecule Sequence” and “Antigen Object Primary Molecule Sequence,” resp.) were compared. The record was considered for further processing only if the said sequences were of equal length and differed at exactly one amino-acid residue position, and the record was ultimately retained only if a corresponding record with a “Qualitative Measurement” data-field value of “positive” was found sharing the same immunogen and immunoassay protocol except that the antigen was identical to the immunogen (thus confirming that the immunogen had elicited a relevant anti-peptide immune response in the first place). The epitope data thus obtained were analyzed in light of the preceding Section 2.1.

Figure 3: IEDB T-Cell Search facility interface (http://www.iedb.org/advancedQueryTcell.php). Example shown corresponds to narrow search for class I MHC-restricted T-cell assays with positive data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution. Searches for B-cell assays were performed using IEDB B-Cell Search facility interface (http://www.iedb.org/advancedQueryBcell.php) of analogous form (e.g., without assay-related fields for “MHC allele”); narrow searches for negative data were performed without specified value for field labeled as “Epitope Structure Defines”; and broad searches were performed without specified value for both said field and that named “Qualitative Measurement” (see main text for additional details).

[Figure 4 plot not reproduced: reduced epitope count (0 to 20) versus sequence length (0 to 20).]

Figure 4: Reduced epitope counts (1) for peptidic sequences of uniform length varying only at a single-residue position. Each count corresponds to a set of 20 peptides representing every standard proteinogenic amino-acid residue at the variable-residue position. Functional similarity is equated with either fractional aligned-sequence identity (“◻”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). In the latter case, counts were based on steric incompatibility only ((7) and Figure 2), both steric incompatibility and cavity formation ($c\,\Delta V_{ijk}$ in (8)), or steric incompatibility with both cavity formation and hydrogen bonding ((8) and (9) and Table 1).
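The sequence comparison applied to each retrieved record (equal length, exactly one differing position) can be sketched as below. This is an illustrative helper rather than the procedure actually used; the function name and return format are hypothetical, and the example pair is the P-to-A substitution in the sequence context SYIXSAEKI listed in Table 2.

```python
from typing import Optional, Tuple


def single_substitution(immunogen: str, antigen: str) -> Optional[Tuple[int, str, str]]:
    """Return (1-based position, immunogen residue, antigen residue) if the two
    sequences have equal length and differ at exactly one position; else None."""
    if len(immunogen) != len(antigen):
        return None
    differences = [
        (position, a, b)
        for position, (a, b) in enumerate(zip(immunogen, antigen), start=1)
        if a != b
    ]
    return differences[0] if len(differences) == 1 else None


if __name__ == "__main__":
    print(single_substitution("SYIPSAEKI", "SYIASAEKI"))  # (4, 'P', 'A')
    print(single_substitution("SYIPSAEKI", "SYIPSAEKI"))  # None (identical)
```

A record passing this check would still need the matching positive control record described above before being retained.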

The broad searches were conducted to retrieve records on linear peptidic epitopes in general. Among the epitope records thus retrieved, those wherein the data field named “Modification” was nonempty (indicating amino-acid residue covalent modification) were excluded from further consideration, and the remainder were analyzed as regards functional redundancy according to the preceding Section 2.1.

3. Results and Discussion

3.1. Functional Redundancy versus Sequence. As depicted in Figure 4, computed functional redundancy increases with epitope sequence length if functional similarity is equated with fractional aligned-sequence identity (“◻”), whereas it is independent of sequence length if functional similarity is equated with the Shannon information entropy for differential epitope binding (the remaining three data series). The latter approach thus better accounts for the possibility of a structural difference in only a single chemical group being sufficient to preclude antigenic cross-reaction, provided that the sequence under consideration is entirely encompassed by the relevant epitope structure, as structural differences may be functionally irrelevant if they lie outside the epitope. Moreover, computed functional redundancy decreases (i.e., the reduced epitope count approaches the total epitope count) when both steric incompatibility and the energetic penalty of cavity formation are considered instead of steric incompatibility alone. The redundancy decreases even further if the energetic penalty of unsatisfied hydrogen bonds (more generally representing suboptimal electrostatic complementarity) is also considered.

3.2. Empirical Cross-Reactivity Data. IEDB searches yielded 26 records on epitope cross-reactivity vis-à-vis single-residue substitution (Table 2), all of which contain only qualitative rather than quantitative binding data without reference to atomic coordinates or other structural details of bound epitopes. The data suggest that the posited steric incompatibility (in (7) and Figure 2) assumes too little tolerance for suboptimal complementarity, especially as regards volume differences between immunogen and antigen in Figure 5, which depicts examples of cross-reaction despite substitutions increasing residue bulk (i.e., positive-data points to the left of the diagonal). This might be at least partially corrected by relaxing or even eliminating the steric-incompatibility criteria, possibly replacing them with a continuous-form shape-dependent energetic penalty (considering that they assume an infinite-step hard-sphere model of steric clashes between atoms, whereas potential energy may be more accurately represented by continuous functions of interatomic separation distances); likewise, electrostatic complementarity also might be more accurately described, for example, in terms of differential energetic contributions due to charged and uncharged hydrogen-bonding partners. However, typical epitope binding actually may be suboptimal considering thermodynamic and kinetic constraints on affinity maturation in B-cell ontogeny and the typical absence of affinity maturation in T-cell ontogeny, which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity than immunogen.

Notwithstanding the problem of suboptimal complementarity described above, data presented in Table 2 suggest that the approach to epitope sequence redundancy proposed herein is more conceptually and practically meaningful than setting threshold (i.e., cutoff) values for fractional aligned-sequence identity. This is exemplified by the control-reaction epitopes of data rows 18 and 19 (having consensus sequence LLXRDSFEV). These epitopes differ from each other at just one residue position. As typical MHC class I-restricted T-cell epitopes, they are nonapeptides, for which the highest meaningful sequence-identity threshold value is 8/9 (∼89%).

Table 2: IEDB Qualitative-Outcome (Q) data on peptide cross-reactivity assays (cf. Figure 5). Assay information comprises assay type, MHC class, and parameter measured (cf. IEDB “measurement of” data field). Sequence context is shown with X at the position of substitution.

Row  Q  Type    MHC class  Parameter measured        IEDB ID (control reaction)  IEDB ID (cross-reaction)  IEDB Epitope ID  Substitution  Sequence context  Ref.
1    −  B-cell             Antibody-antigen binding  1370146                     1370150                   10801            W→Y           DXEDRYYRE         [22]
2    −  T-cell  I          Cytokine release (IFN-γ)  1340848                     1340854                   62604            P→A           SYIXSAEKI         [23]
3    −  T-cell  II         Cell proliferation        1646734                     1646735                   107384           E→D           GEPGIAGFKGXQGPK   [24]
4    −  T-cell  II         Cell proliferation        1660246                     1660247                   46317            F→A           NTWTTCQSIAXPSK    [25]
5    −  T-cell  II         Cytokine release (IL-4)   1660252                     1660258                   112245           A→F           NTWTTCQSIAXPSK    [25]
6    −  T-cell  II         Cytokine release (IL-2)   1667060                     1667065                   68846            K→A           VHFFXNIVTPRTP     [26]
7    −  T-cell  II         Cell proliferation        1667112                     1667113                   80071            A→K           VHFFXNIVTPRTP     [26]
8    −  T-cell  II         Cell proliferation        1716604                     1716614                   125569           T→V           SWEGVGVXPDV       [27]
9    −  T-cell  II         Cell proliferation        1716604                     1716616                   125569           D→N           SWEGVGVTPXV       [27]
10   −  T-cell  II         Cell proliferation        1716605                     1716618                   125571           V→T           SWEGVGVXPNV       [27]
11   +  T-cell  I          Cytotoxicity              1634784                     1634785                   103300           S→A           IYXTVAGSL         [28]
12   +  T-cell  I          Cytotoxicity              1634784                     1634786                   103300           G→S           IYSTVAXSL         [28]
13   +  T-cell  I          Cytotoxicity              1634788                     1634789                   103298           S→G           IYATVAXSL         [28]
14   +  T-cell  I          Cytotoxicity              1634788                     1634790                   103298           A→S           IYXTVASSL         [28]
15   +  T-cell  I          Cytotoxicity              1657362                     1657364                   111964           I→D           YXFAFRDL          [29]
16   +  T-cell  I          Cytokine release (IFN-γ)  1716442                     1716443                   125101           I→T           XMIKFNRL          [30]
17   +  T-cell  I          Cytokine release (IFN-γ)  1340848                     1340852                   62604            K→D           SYIPSAEXI         [23]
18   +  T-cell  I          Cytokine release (IFN-γ)  1381750                     1381748                   37174            D→G           LLXRDSFEV         [31]
19   +  T-cell  I          Cytokine release (IFN-γ)  1381752                     1381753                   37392            H→G           LLXRDSFEV         [31]
20   +  T-cell  I          Cytotoxicity              1404705                     1404714                   61086            S→I           SXIEFARL          [32]
21   +  T-cell  I          Cytotoxicity              1404705                     1404715                   61086            A→E           SSIEFXRL          [32]
22   +  T-cell  I          Cytotoxicity              1404705                     1404716                   61086            R→K           SSIEFAXL          [32]
23   +  T-cell  I          Cytotoxicity              1404712                     1404719                   61088            E→A           SSIEFXRL          [32]
24   +  T-cell  I          Cytotoxicity              1404713                     1404721                   61085            K→R           SSIEFAXL          [32]
25   +  T-cell  I          Cytokine release (IFN-γ)  1865529                     1865530                   98995            T→F           SAPDXRPA          [33]
26   +  T-cell  I          Cytokine release (IFN-γ)  1865529                     1865532                   98995            A→L           SAPDTRPX          [33]

[Figure 5 plot not reproduced: antigen residue volume (Å³, 60 to 240) versus immunogen residue volume (Å³, 60 to 240), with data points numbered by Table 2 data row and residue letters marking residue volumes. Legend: Reference; Negative B-cell; Negative T-cell I; Negative T-cell II; Positive T-cell I.]

Figure 5: Peptide cross-reactivity assay data of Table 2 vis-à-vis amino-acid residue volume [21] of immunogen and antigen at single-residue substitution positions along sequence alignments.

For longer epitopes (e.g., typical MHC class II-restricted T-cell epitopes), the value is even higher. For example, in the case of data rows 4 and 5 (having consensus sequence NTWTTCQSIAXPSK), the said value is 13/14 (∼93%). The threshold values thus calculated are markedly higher than those used (e.g., 70% [20]) for conventional protein sequence analysis. Moreover, regardless of the exact sequence lengths, applying such threshold values would entail discarding data on epitopes differing by one residue (e.g., the examples just cited from Table 2). To avoid such loss of potentially valuable information, all the available epitope data might be analyzed and a corresponding reduced epitope count reported to indicate the level of redundancy in the underlying dataset.
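The threshold arithmetic in the two examples follows from the fact that two equal-length sequences differing at one position share all but one of their residues. A minimal sketch (the function name is hypothetical):

```python
from fractions import Fraction


def max_identity_threshold(length: int) -> Fraction:
    """Fractional identity of two equal-length sequences differing at exactly
    one position, i.e., (length - 1) / length."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    return Fraction(length - 1, length)


if __name__ == "__main__":
    for length in (9, 14):
        threshold = max_identity_threshold(length)
        print(length, threshold, f"{float(threshold):.0%}")
    # 9-residue case: 8/9, about 89%; 14-residue case: 13/14, about 93%.
```

Any identity cutoff at or below this value would merge such a pair into a single cluster, which is why the cutoff of 70% cited above would discard one of the two epitopes.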

3.3. Survey of Peptidic Epitope Sequence Data. IEDB searches yielded 11497 records on peptidic epitopes of length less than 50 residues (i.e., the general cutoff provided in the IEDB curation manual [39]), curated as covalently unmodified and comprising 4443 B-cell and 7054 T-cell epitope records (the latter divided between 2749 and 4305 records for MHC restriction classes I and II, resp.), for which computed functional redundancy vis-à-vis sequence length is depicted in Figure 6.
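The record selection and per-length tally underlying such a survey might be sketched as follows. The record layout (dictionaries with `"sequence"` and `"modification"` keys) and the function name are assumptions made for illustration, not the IEDB export format, and the example records are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable


def sequence_counts_by_length(records: Iterable[Dict[str, str]],
                              max_length: int = 50) -> Counter:
    """Count covalently unmodified epitope records per sequence length,
    keeping only sequences shorter than max_length residues."""
    counts: Counter = Counter()
    for record in records:
        if record.get("modification"):  # nonempty field: covalently modified
            continue
        length = len(record["sequence"])
        if length < max_length:
            counts[length] += 1
    return counts


if __name__ == "__main__":
    example_records = [
        {"sequence": "SYIPSAEKI", "modification": ""},
        {"sequence": "LLGRDSFEV"},
        {"sequence": "SSIEFARL", "modification": "acetylation"},  # excluded
    ]
    print(sequence_counts_by_length(example_records))
```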

Considering the wide range of sequence counts for the various sequence lengths in Figure 6, a logarithmic scale was chosen to depict the data in a compact form. Consequently, the difference calculated as sequence count − reduced count was plotted instead of reduced count itself where sequence count > reduced count, to avoid crowding of data points. For all positive sequence counts (“◻”), the said differences were consistently higher where functional similarity was equated with fractional aligned-sequence identity (“×”) rather than the Shannon information entropy of the differential epitope binding. In the latter case, the differences were consistently less than 1, with reduced count close or even equal to sequence count. These results are anticipated in view of Figure 4, which shows that reduced count is independent of sequence length using the entropy-based approach proposed herein but decreases with sequence length using fractional aligned-sequence identity.

The data in Figures 4 and 6 thus point to key considerations in evaluating functional redundancy. Clearly, functional redundancy may be overestimated by simply equating functional similarity with fractional aligned-sequence identity, especially with increasing sequence length for highly similar sequences. On the other hand, functional redundancy may be underestimated where functional similarity is equated with the Shannon information entropy of differential epitope binding if the sequence under consideration extends beyond the relevant epitope structure, which becomes more likely with increasing sequence length. Hence, the information-entropy approach as developed herein is conceivably applicable to sequences not exceeding a realistic application-dependent epitope length (e.g., six residues for antigen binding by anti-peptide antibodies [40]), although rigorous generalization to longer sequences might be pursued by aligning subsequences of such length and possibly accounting for differential immunodominance among immunogen epitopes [41]. For example, if a set of peptides were to be found such that each of them comprised a single immunodominant epitope having fixed length (e.g., six residues in all cases), the information-entropy approach might be applied to the entire set even if the peptides were of variable length. Moreover, even if the said immunodominant epitopes were highly similar to one another in terms of fractional aligned-sequence identity, all the available data still could be included in subsequent analyses, provided that both the total and reduced epitope counts would be reported in order to account for functional redundancy. In this way, useful epitope data could be retained instead of discarded.

3.4. Applications and Future Directions. The discussion thus far suggests the potentially greater utility of antigenic functional similarity cast as the Shannon information entropy of differential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redundancy of epitope data to express their structural diversity in a biologically meaningful manner (i.e., explicitly in terms of relative binding affinity, which underlies the balance between the inversely related emergent continuum phenomena of immunologic specificity and cross-reactivity). This is subject to the caveat that assuming an idealized epitope-binding site of optimal steric and electrostatic complementarity (e.g., comparable to what is typically observed in a natively folded wild-type protein) may both overestimate affinity for an immunogen epitope and underestimate affinity for antigen epitopes structurally different from the immunogen epitope (thereby possibly overestimating immunologic specificity as a consequence of underestimating potential for cross-reactivity). These considerations could guide the development of immunization strategies (e.g., via vaccination) and immunodiagnostics (e.g., to produce peptidic constructs suitable for eliciting anti-peptide antibodies that either protect against disease or serve to detect particular antigens in biological samples via immunoassays), noting that the practical significance of immunologic specificity and cross-reactivity is highly context-dependent.

[Figure 6 plots not reproduced: panels (a), (b), and (c), each showing sequence count or sequence count − reduced count (logarithmic scale, 0.001 to 1000) versus sequence length (0 to 50).]

Figure 6: Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c). Sequence counts (“◻”) are total epitope counts in IEDB. Corresponding reduced counts ($r$ in (1)) are expressed as the difference (sequence count − reduced count), with functional similarity equated with either fractional aligned-sequence identity (“×”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). Missing “◻” indicates sequence count = 0 (e.g., for B-cell epitope sequence length < 4 in (a)); where “◻” is present, a missing “×” or entropy-based marker indicates sequence count − reduced count = 0 (e.g., with missing entropy-based marker for B-cell epitope sequence length = 5 in (a)). Both “×” and the entropy-based marker are missing where sequence count = 1 (e.g., for B-cell epitope sequence length = 43 in (a)), as reduced count = 1 for sequence count = 1.

From the perspective of immunization, immunity (e.g., induced via vaccination or passive transfer of immune-system components) may be useful if specifically directed against a pathogen epitope so as to favor pathogen elimination without producing harmful hypersensitivity reactions (e.g., in the form of allergy or autoimmunity). Yet such immunity might be even more useful if it also favored elimination of multiple antigenically related variant pathogens via cross-reaction (such that a single vaccine epitope elicited the production of immune-system components each cross-reactive with multiple variant pathogen epitopes) while still avoiding hypersensitivity. Cross-reactivity of immune responses is thus potentially useful within limits defined by risk of harmful hypersensitivity (e.g., due to cross-reaction with host self-antigens). As regards immunodiagnosis, an immunodiagnostic test may be useful if it enables specific pathogen detection via discrimination between a pathogen epitope and a set of other biologically relevant epitopes (e.g., of the host or its commensal symbionts in health, or of other pathogens for which the clinical implications are very different). Yet the test might be even more useful if it also enabled detection of multiple antigenically related variant pathogens (such that a single immunologic probe detected multiple variant pathogen epitopes) while still retaining its discriminatory power with respect to other biologically relevant epitopes, particularly where the variant pathogens are very similar to one another in terms of their clinical implications (e.g., prognosis and therapeutic options). Hence, cross-reactivity strongly influences the outcomes of both immunization and immunodiagnosis, for which reason their development could be supported by computational tools that estimate epitope-binding affinity for cross-reactions.

In order to estimate the epitope-binding affinity of an immune-system component (e.g., antibody) elicited in response to (and thus having a binding site for) immunogen epitope $i$ for antigen epitope $j$, one possible approach would be to express affinity in terms of association constants, for example, as

$$K_{ij} = K_{ii} Z_{ij}, \tag{10}$$

where $K_{ij}$ is the association constant for cross-reaction with antigen epitope $j$, $K_{ii}$ is the association constant for reaction with immunogen epitope $i$, and $Z_{ij}$ is a correction term to account for the difference between the two association constants. In turn, $K_{ii}$ might be estimated as

$$K_{ii} = \exp\left(-\frac{\Delta G_{ii}}{RT}\right), \tag{11}$$

where $\Delta G_{ii}$ is the free-energy change for reaction with immunogen epitope $i$, while both $R$ and $T$ retain their meanings as in (7). In cases where epitopes $i$ and $j$ are both $m$ residues in length, $Z_{ij}$ might be estimated as

$$Z_{ij} = \begin{cases} \prod_{k=1}^{m} s_{ijk}, & \text{if } \prod_{k=1}^{m} s_{ijk} > K_{ii}^{-1}, \\ K_{ii}^{-1}, & \text{otherwise,} \end{cases} \tag{12}$$

where the product retains its meaning as in (6), such that $K_{ij} \geq 1$ (with $K_{ij} = 1$ for nonspecific binding). $\Delta G_{ii}$ could be estimated from epitope structure (e.g., using structural energetics [40–42]) subject to mechanism-specific constraints (e.g., on B-cell affinity maturation [43]), and $s_{ijk}$ could be estimated along similar lines with regard to $\Delta\Delta G_{ijk}$ in (7).

The preceding discussion has alluded to epitope binding as if it were a binary interaction, which is strictly true only for B-cell epitopes (e.g., bound by surface immunoglobulin or antibody); but in the context of T-cell epitopes, the epitope-binding site is to be understood as a bipartite entity consisting of an MHC molecule and a T-cell receptor (TCR), such that the epitope becomes at least partly confined between the two binding-site components. The overall process of T-cell recognition is subject to thermodynamic and kinetic constraints on initial MHC-peptide binding [44] and subsequent TCR recognition of MHC-peptide complex [45], which potentially complicate prediction of T-cell cross-reactivity and functional consequences thereof (e.g., considering how binding affinity and receptor numbers interact to influence T-cell effector function [45]). Still, predicting B-cell epitope cross-reactivity is challenging in view of the greater structural diversity of B-cell epitopes, especially where only a fraction of the epitope surface contributes to the epitope-paratope interface, such that cross-reaction may occur with variations in epitope structure beyond the interface. Hence, although the analysis herein for continuous epitopes might be generalized to discontinuous epitopes with indexing of residues by relative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignment gaps, correlating variations in epitope-residue structure with potential for cross-reactivity would be nontrivial.
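As an illustration of how (10) through (12) combine, the following minimal Python sketch computes a cross-reaction association constant from a control-reaction free-energy change and a list of per-residue terms. The function names, the gas-constant value in kcal mol⁻¹ K⁻¹, and the example inputs (a free-energy change of −10 kcal mol⁻¹ and the per-residue values) are assumptions introduced here for illustration only; the per-residue terms of (7) are taken as given inputs rather than computed.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1 (standard value, assumed units)
T_KELVIN = 310.15      # 37 degrees C, the temperature assumed in the article


def k_control(delta_g_ii, temperature=T_KELVIN):
    """Equation (11): association constant for the immunogen epitope itself."""
    return math.exp(-delta_g_ii / (R_KCAL * temperature))


def z_correction(s_values, k_ii):
    """Equation (12): product of per-residue terms, floored at 1 / K_ii."""
    product = math.prod(s_values)
    floor = 1.0 / k_ii
    return product if product > floor else floor


def k_cross(delta_g_ii, s_values, temperature=T_KELVIN):
    """Equation (10): K_ij = K_ii * Z_ij, which cannot fall below 1."""
    k_ii = k_control(delta_g_ii, temperature)
    return k_ii * z_correction(s_values, k_ii)


if __name__ == "__main__":
    # Hypothetical inputs: one residue position halves the affinity.
    print(k_cross(-10.0, [1.0, 0.5, 1.0]))
    # A sterically incompatible position (s = 0) falls back to nonspecific binding.
    print(k_cross(-10.0, [1.0, 0.0, 1.0]))
```

The floor in `z_correction` is what keeps the estimate at or above the nonspecific-binding value of 1, regardless of how small the product of per-residue terms becomes.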

The analysis developed herein conceivably could be refined to better account for MHC-peptide binding as distinct from subsequent binding of MHC-peptide complex by TCR, cognizant of MHC polymorphism. Hence, MHC-peptide binding could be analyzed with explicit consideration of pertinent MHC structural details (e.g., noting that unfavorable peptide backbone conformations may preclude accommodation of peptide within the MHC binding cleft and emphasizing the potential affinity contributions of prospective anchor residues on peptides vis-à-vis corresponding MHC binding pockets). Subsequent binding of MHC-peptide complex by TCR could be analyzed separately, with emphasis on peptide surfaces accessible for contact by TCR (e.g., excluding those buried within MHC binding pockets). This could be more readily applied to class I rather than class II MHC molecules, as the latter typically have binding clefts that allow for variable-length peptide overhangs. The analysis for MHC-peptide binding might be extended for class II via docking simulations to predict the MHC-bound conformations and affinity contributions of the overhangs, after which the binding of MHC-peptide complex by TCR could be analyzed with inclusion of any overhang surfaces that are likely to be accessible for contact by TCR.

At a more fundamental level, apparently suboptimal complementarity (manifesting as submaximal immunologic specificity) may be an inherent tradeoff in attaining optimal immune function from the standpoint of biological fitness (e.g., enabling each immune-system component to recognize a variety of structurally distinct epitopes, albeit at the cost of binding each of these epitopes with only submaximal affinity). Investigation of this possibility may lead to insights on how immune responses might be optimally biased (e.g., via vaccination or immunotherapy). Moreover, the asymmetry of cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenic differences in terms of antigenic distance as a metric that is independent of immunization history.

4. Summary and Conclusions

In assembling epitope datasets (e.g., to develop and benchmark epitope-prediction tools), sequence redundancy among epitopes must be considered to avoid misleading results that reflect overrepresentation of functionally similar epitopes. However, potentially useful data may be needlessly discarded by excluding epitopes that share an apparently high degree of sequence similarity, as even only a single-residue difference may manifest as extreme functional dissimilarity between epitopes. The present work thus introduces a reduced epitope count as defined using (1) and (2) to account for functional redundancy of epitope data (FRED) within an epitope dataset (such that FRED is quantified as the difference between the total and reduced epitope counts). This can be used, for example, to characterize the dataset instead of excluding epitopes with apparently similar sequences from it, thereby maximizing the use of available epitope data.

More importantly, the present work frames FRED in terms of potential antigenic cross-reactivity (rather than sequence distance) construed as functional similarity between epitopes. Hence, pairwise comparison of epitopes is performed such that each epitope pair considered is regarded as consisting of an immunogen epitope (which elicits the immune response of interest) and an antigen epitope (which is the target of cross-reactive binding in an immunoassay to evaluate the immune response). Functional similarity between epitopes is thus defined in a physicochemically and biologically meaningful way as the Shannon information entropy for differential epitope binding in (5), which is compatible with the possibility of extreme functional dissimilarity between epitopes due to seemingly minor differences in sequence. Consequently, the complement of functional similarity is not a distance (in the sense of a unique value quantifying the separation between two points in sequence space), as its value may vary with reversal of the roles (i.e., immunogen or antigen) assigned to epitopes under pairwise comparison. However, it is nonetheless a measure of epitope dissimilarity, such that summation of its values over all epitopes of a dataset according to (2) enables calculation of an averaged contribution of each epitope to the reduced epitope count. This circumvents the conventional dichotomous labeling of epitopes as either redundant or nonredundant based on an arbitrarily selected threshold value of sequence similarity. Such labeling is problematic in that selection of the actual threshold value is difficult to justify on physicochemically and biologically meaningful grounds and because an epitope thus might be labeled as redundant on the basis of similarity to a minority of other epitopes in the dataset.

In line with the preceding considerations, epitope functional similarity may be estimated in terms of differences between immunogen- and antigen-epitope structure relative to an idealized binding site of high complementarity to the immunogen epitope. However, this tends to underestimate potential for cross-reactivity, which suggests that epitope-binding site complementarity is typically suboptimal. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes. Such cross-reactivity conceivably could be exploited in the development of practical applications such as vaccines and immunodiagnostics (e.g., to aim for beneficially broad cross-reactivity rather than overly extreme immunologic specificity).

Competing Interests

The author declares that they have no competing interests.

Acknowledgments

This work was supported by an Angelita T. Reyes Centennial Professorial Chair grant awarded to the author.

References

[1] S. E. C. Caoili, “Antidotes, antibody-mediated immunity and the future of pharmaceutical product development,” Human Vaccines and Immunotherapeutics, vol. 9, no. 2, pp. 294–299, 2013.

[2] S. E. C. Caoili, “Beyond new chemical entities: advancing drug development based on functional versatility of antibodies,” Human Vaccines and Immunotherapeutics, vol. 10, no. 6, pp. 1639–1644, 2014.

[3] B. Greenwood, “The contribution of vaccination to global health: past, present and future,” Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 369, no. 1645, Article ID 20130433, 2014.

[4] M. H. V. Van Regenmortel, “What is a B-cell epitope?” Methods in Molecular Biology, vol. 524, pp. 3–20, 2009.

[5] M. H. V. Regenmortel, “Specificity, polyspecificity, and heterospecificity of antibody-antigen recognition,” Journal of Molecular Recognition, vol. 27, no. 11, pp. 627–639, 2014.

[6] N. Tomar and R. K. De, “Immunoinformatics: a brief review,” Methods in Molecular Biology, vol. 1184, pp. 23–55, 2014.

[7] S. E. C. Caoili, “An integrative structure-based framework for predicting biological effects mediated by antipeptide antibodies,” Journal of Immunological Methods, vol. 427, pp. 19–29, 2015.

[8] S. E. C. Caoili, “Immunization with peptide-protein conjugates: impact on benchmarking B-cell epitope prediction for vaccine design,” Protein and Peptide Letters, vol. 17, no. 3, pp. 386–398, 2010.

[9] S. E. C. Caoili, “Hybrid methods for B-cell epitope prediction,” Methods in Molecular Biology, vol. 1184, pp. 245–283, 2014.

[10] S. E. C. Caoili, “Benchmarking B-cell epitope prediction with quantitative dose-response data on antipeptide antibodies: towards novel pharmaceutical product development,” BioMed Research International, vol. 2014, Article ID 867905, 13 pages, 2014.

[11] U. Hobohm, M. Scharf, R. Schneider, and C. Sander, “Selection of representative protein data sets,” Protein Science, vol. 1, no. 3, pp. 409–417, 1992.

[12] U. Lessel and D. Schomburg, “Creation and characterization of a new, non-redundant fragment data bank,” Protein Engineering, vol. 10, no. 6, pp. 659–664, 1997.

[13] T. Noguchi and Y. Akiyama, “PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003,” Nucleic Acids Research, vol. 31, no. 1, pp. 492–493, 2003.

[14] P. Motte, G. Alberici, M. Ait-Abdellah, and D. Bellet, “Monoclonal antibodies distinguish synthetic peptides that differ in one chemical group,” The Journal of Immunology, vol. 138, no. 10, pp. 3332–3338, 1987.

[15] M. K. Gilson and S. E. Radford, “Protein folding and binding: from biology to physics and back again,” Current Opinion in Structural Biology, vol. 21, no. 1, pp. 1–3, 2011.

[16] B. Rost, “Twilight zone of protein sequence alignments,” Protein Engineering, vol. 12, no. 2, pp. 85–94, 1999.

[17] B. Y. Khor, G. J. Tye, T. S. Lim, and Y. S. Choong, “General overview on structure prediction of twilight-zone proteins,” Theoretical Biology and Medical Modelling, vol. 12, article 15, 2015.

[18] H. Nakamura, “Roles of electrostatic interaction in proteins,” Quarterly Reviews of Biophysics, vol. 29, no. 1, pp. 1–90, 1996.

[19] P. J. Fleming and G. D. Rose, “Do all backbone polar groups in proteins form hydrogen bonds?” Protein Science, vol. 14, no. 7, pp. 1911–1917, 2005.

[20] S. Basu, D. Bhattacharyya, and R. Banerjee, “Self-complementarity within proteins: bridging the gap between binding and folding,” Biophysical Journal, vol. 102, no. 11, pp. 2605–2614, 2012.

[21] Y. Harpaz, M. Gerstein, and C. Chothia, “Volume changes on protein folding,” Structure, vol. 2, no. 7, pp. 641–649, 1994.

[22] A. Handisurya, S. Gilch, D. Winter, et al., “Vaccination with prion peptide-displaying papillomavirus-like particles induces autoantibodies to normal prion protein that interfere with pathologic prion protein production in infected cells,” FEBS Journal, vol. 274, no. 7, pp. 1747–1758, 2007.

[23] M. Plebanski, E. A. M. Lee, C. M. Hannan, et al., “Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming,” Nature Medicine, vol. 5, no. 5, pp. 565–571, 1999.

[24] E. Michaelsson, M. Andersson, A. Engstrom, and R. Holmdahl, “Identification of an immunodominant type-II collagen peptide recognized by T cells in H-2q mice: self tolerance at the level of determinant selection,” European Journal of Immunology, vol. 22, no. 7, pp. 1819–1825, 1992.

[25] J. M. Greer, C. Klinguer, E. Trifilieff, R. A. Sobel, and M. B. Lees, “Encephalitogenicity of murine, but not bovine, DM20 in SJL mice is due to a single amino acid difference in the immunodominant encephalitogenic epitope,” Neurochemical Research, vol. 22, no. 4, pp. 541–547, 1997.

[26] A. Gaur, S. A. Boehme, D. Chalmers, et al., “Amelioration of relapsing experimental autoimmune encephalomyelitis with altered myelin basic protein peptides involves different cellular mechanisms,” Journal of Neuroimmunology, vol. 74, no. 1-2, pp. 149–158, 1997.

[27] A. T. Kozhich, Y.-I. Kawano, C. E. Egwuagu, et al., “A pathogenic autoimmune process targeted at a surrogate epitope,” The Journal of Experimental Medicine, vol. 180, no. 1, pp. 133–140, 1994.

[28] H. Masaki, M. Tamura, and I. Kurane, “Induction of cytotoxic T lymphocytes of heterogeneous specificities by immunization with a single peptide derived from influenza A virus,” Viral Immunology, vol. 13, no. 1, pp. 73–81, 2000.

[29] G. B. Lipford, S. Bauer, H. Wagner, and K. Heeg, “Peptide engineering allows cytotoxic T-cell vaccination against human papilloma virus tumour antigen E6,” Immunology, vol. 84, no. 2, pp. 298–303, 1995.

[30] C. L. Keech, K. C. Pang, J. McCluskey, and W. Chen, “Direct antigen presentation by DC shapes the functional CD8+ T-cell repertoire against the nuclear self-antigen La-SSB,” European Journal of Immunology, vol. 40, no. 2, pp. 330–338, 2010.

[31] S. Tangri, G. Y. Ishioka, X. Huang, et al., “Structural features of peptide analogs of human histocompatibility leukocyte antigen class I epitopes that are more potent and immunogenic than wild-type peptide,” Journal of Experimental Medicine, vol. 194, no. 6, pp. 833–846, 2001.

[32] S. J. Turner and F. R. Carbone, “A dominant Vβ bias in the CTL response after HSV-1 infection is determined by peptide residues predicted to also interact with the TCR β-chain CDR3,” Molecular Immunology, vol. 35, no. 5, pp. 307–316, 1998.

[33] E. Lazoura, J. Lodding, W. Farrugia, et al., “Enhanced major histocompatibility complex class I binding and immune responses through anchor modification of the non-canonical tumour-associated mucin 1-8 peptide,” Immunology, vol. 119, no. 3, pp. 306–316, 2006.

[34] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.

[35] E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review, vol. 106, no. 4, pp. 620–630, 1957.

[36] E. T. Jaynes, “Information theory and statistical mechanics. II,” Physical Review, vol. 108, no. 2, pp. 171–190, 1957.

[37] A. E. Eriksson, W. A. Baase, X.-J. Zhang, et al., “Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect,” Science, vol. 255, no. 5041, pp. 178–183, 1992.

[38] B. Honig and A.-S. Yang, “Free energy balance in protein folding,” Advances in Protein Chemistry, vol. 46, pp. 27–58, 1995.

[39] R. Vita, J. A. Overton, J. A. Greenbaum, et al., “The immune epitope database (IEDB) 3.0,” Nucleic Acids Research, vol. 43, no. 1, pp. D405–D412, 2015.

[40] S. E. C. Caoili, “A structural-energetic basis for B-cell epitope prediction,” Protein and Peptide Letters, vol. 13, no. 7, pp. 743–751, 2006.

[41] S. E. C. Caoili, “Benchmarking B-cell epitope prediction for the design of peptide-based vaccines: problems and prospects,” Journal of Biomedicine and Biotechnology, vol. 2010, Article ID 910524, 14 pages, 2010.

[42] S. P. Edgcomb and K. P. Murphy, “Structural energetics of protein folding and binding,” Current Opinion in Biotechnology, vol. 11, no. 1, pp. 62–66, 2000.

[43] S. E. C. Caoili, “On the meaning of affinity limits in B-cell epitope prediction for antipeptide antibody-mediated immunity,” Advances in Bioinformatics, vol. 2012, Article ID 346765, 17 pages, 2012.

[44] N. Dalchau, A. Phillips, L. D. Goldstein, et al., “A peptide filtering relation quantifies MHC class I peptide optimization,” PLoS Computational Biology, vol. 7, no. 10, Article ID e1002144, 2011.

[45] L. J. Edwards and B. D. Evavold, “T cell recognition of weak ligands: roles of signaling, receptor number, and affinity,” Immunologic Research, vol. 50, no. 1, pp. 39–48, 2011.


Table 1: Numbers of unsatisfied hydrogen bonds for amino-acid residue substitutions of Figure 2, calculated according to (9).

    G  A  S  C  D  T  P  N  V  E  Q  H  L  I  M  K  R  F  Y  W
G   0  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —
A   0  0  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —
S   1  1  0  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —
C   0  0  1  0  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —
D   2  2  —  —  0  —  —  —  —  —  —  —  —  —  —  —  —  —  —  —
T   1  1  0  1  —  0  —  —  —  —  —  —  —  —  —  —  —  —  —  —
P   1  1  —  —  —  —  0  —  —  —  —  —  —  —  —  —  —  —  —  —
N   2  2  —  —  2  —  —  0  —  —  —  —  —  —  —  —  —  —  —  —
V   0  0  1  0  —  1  —  —  0  —  —  —  —  —  —  —  —  —  —  —
E   2  2  3  2  —  —  —  —  —  0  —  —  —  —  —  —  —  —  —  —
Q   2  2  3  2  —  —  —  —  —  2  0  —  —  —  —  —  —  —  —  —
H   2  2  —  —  —  —  —  —  —  —  —  0  —  —  —  —  —  —  —  —
L   0  0  1  0  —  —  —  —  —  —  —  —  0  —  —  —  —  —  —  —
I   0  0  1  0  —  1  —  —  0  —  —  —  —  0  —  —  —  —  —  —
M   0  0  1  0  —  —  —  —  —  —  —  —  —  —  0  —  —  —  —  —
K   1  1  2  1  —  —  —  —  —  —  —  —  —  —  1  0  —  —  —  —
R   3  3  4  3  —  —  —  —  —  —  —  —  —  —  3  4  0  —  —  —
F   0  0  —  —  2  —  —  2  —  —  —  —  —  —  —  —  —  0  —  —
Y   1  1  —  —  3  —  —  3  —  —  —  —  —  —  —  —  —  1  0  —
W   1  1  —  —  —  —  —  —  —  —  —  1  —  —  —  —  —  —  —  0

than Cys because the γ-carbon atom is trigonal planar in Subset 3. The case of Subset 4 is thus similar to that of Subset 3, but the γ-carbon atom in Subset 4 is constrained as part of a five-membered ring, such that Subsets 3 and 4 are deemed sterically incompatible. The path of Subset 5 (Leu) converges on Cys, as members of Subset 1 larger than Cys are sterically incompatible with the branching at the γ-carbon atom of Leu. The path of Subset 6 converges on Ala instead of Cys because of the branching at the β-carbon atom of Subset 6. Finally, the path of Subset 7 (Pro) converges on Ala, as the Pro side-chain is constrained by ring formation with the main-chain nitrogen atom and thus distorted beyond the β-carbon atom.

According to Figure 2, a substitution from any residue upstream to any residue downstream along a particular path avoids steric incompatibility. Conversely, downstream-to-upstream and off-path substitutions are deemed sterically incompatible. This is a qualitative first approximation using a highly idealized model rather than an attempt at definitive categorization.

As regards $\Delta\Delta G_{ijk}$ (in (7)), this is related to amino-acid residue structural differences (as a qualitative first approximation rather than an attempt to obtain quantitatively very accurate results) by heuristic partitioning as

$$\Delta\Delta G_{ijk} = c\,\Delta V_{ijk} + b\,N_{ijk}, \tag{8}$$

where $c$ and $b$ are both proportionality constants, while $\Delta V_{ijk}$ is the change in residue volume and $N_{ijk}$ is the number of unsatisfied hydrogen bonds (Table 1, wherein the residues are ordered by their volumes [21]), considering that an energetic penalty is incurred with unsatisfied hydrogen bonds upon binding a suboptimally complementary epitope.

Table 1 is sparse, with a null value (“—”) for most cells, because (8) is evaluated only for sterically compatible substitutions (cf. (7)). Hence, the numbers in Table 1 are for sterically compatible substitutions according to Figure 2. The said numbers were each obtained assuming that every available highly electronegative atom participates in the formation of a hydrogen bond, as

$$N_{ijk} = A_{ijk} - B_{ijk}, \tag{9}$$

where $A_{ijk}$ is the total number of oxygen and nitrogen atoms in the aligned residues, while $B_{ijk}$ is the number of such atoms for which direct correspondence was affirmed as regards structural superposability, atom identity, and covalent bonding partners (according to Figure 1). Thus, $N_{ijk}$ increases with each failure to affirm a direct correspondence between hydrogen-bonding atoms. Physically, such failure corresponds to an unsatisfied hydrogen bond on the epitope, its binding site, or both.

Direct correspondence was affirmed for all main-chain (i.e., backbone) carbonyl oxygen atoms. In view of the difference between Pro and other residues at the main-chain nitrogen atom, direct correspondence was affirmed for this atom between Pro and itself as well as between non-Pro residues. Among side-chain atoms, direct correspondence was affirmed only between hydroxyl oxygen atoms of Ser and Thr, carbonyl oxygen atoms of Asp and Asn and of Glu and Gln, and δ-nitrogen atoms of His and Trp. To facilitate explication, a residue is termed polar if its side-chain contains at least one oxygen or nitrogen atom but nonpolar if otherwise. Accordingly, $N_{ijk} = 0$ for pairs of identical residues (along the diagonal of Table 1) and of non-Pro nonpolar residues. For Pro paired with Gly or Ala, $N_{ijk} = 1$ due to the main-chain nitrogen difference. For a polar residue paired with a non-Pro nonpolar residue, $N_{ijk}$ is the total number of side-chain oxygen and nitrogen atoms. For a pair of polar residues, the total number of side-chain oxygen and nitrogen atoms is an upper bound on $N_{ijk}$. Hence, $N_{ijk} = 3$ for Tyr paired with Asn, although $N_{ijk} = 2$ for Asn paired with Asp.

Provisional values for the proportionality constants $c$ and $b$ in (8) were obtained from literature. In particular, $c$ was regarded as the energetic penalty per unit volume of a cavity formed due to substitution of a less bulky residue within the interior of a folded protein, with value of −0.024 kcal mol$^{-1}$ Å$^{-1}$ estimated empirically from protein mutation studies [37], and $b$ was likewise regarded as the energetic cost of breaking a hydrogen bond, with a conservative estimated value of 0.5 kcal mol$^{-1}$ per bond [38].

Thus, using (1) through (8) in conjunction with Table 1, the literature values for constants in (8), and published amino-acid residue volumes [21], reduced epitope counts (1) were computed for artificial epitope sequences and also actual experimentally studied epitope sequences, assuming a temperature of 37°C (310.15 K).
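A worked instance of (8) with the provisional constants may help fix the magnitudes involved. In the Python sketch below, the volume change of −30 (a substitution to a less bulky residue, in whatever volume unit $c$ is expressed per) and the single unsatisfied hydrogen bond are hypothetical inputs rather than values from the published residue volumes [21]. The conversion of the free-energy difference into a per-residue factor assumes a Boltzmann form, $\exp(-\Delta\Delta G_{ijk}/RT)$; (7) itself is not restated here and may differ in detail.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1 (standard value)
T_KELVIN = 310.15      # 37 degrees C, as stated in the text
C_CAVITY = -0.024      # constant c in (8), value quoted in the text
B_HBOND = 0.5          # constant b in (8), kcal mol^-1 per unsatisfied bond


def delta_delta_g(delta_v, n_unsatisfied):
    """Equation (8): cavity term plus unsatisfied-hydrogen-bond term."""
    return C_CAVITY * delta_v + B_HBOND * n_unsatisfied


def boltzmann_factor(ddg, temperature=T_KELVIN):
    """Assumed Boltzmann weighting of a free-energy penalty (not taken from (7))."""
    return math.exp(-ddg / (R_KCAL * temperature))


if __name__ == "__main__":
    ddg = delta_delta_g(delta_v=-30.0, n_unsatisfied=1)  # hypothetical inputs
    print(f"ddG = {ddg:.2f} kcal/mol, factor = {boltzmann_factor(ddg):.3f}")
```

With these inputs the cavity term contributes 0.72 kcal mol⁻¹ and the hydrogen-bond term 0.5 kcal mol⁻¹, so that a single modest substitution already reduces the assumed per-residue factor to well below one.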

2.2. Retrieval and Processing of Epitope Data. Epitope data were obtained via the Immune Epitope Database (IEDB, at http://www.iedb.org/) [39] as database records retrieved through searches conducted from 4 to 6 October 2015 using its B-Cell Search and T-Cell Search facilities (in the latter case, restricting searches to either MHC class I or II alleles). Each record thus retrieved pertains to an individual B-cell or T-cell assay and contains data fields defined in relation to the key concepts of “Object Type” (i.e., molecule type in the context of chemical structure), “Epitope Relation” (i.e., molecule relationship to epitope), “1st Immunogen” (i.e., immunogen initially administered to elicit the immune response), and “Antigen” (i.e., antigen used in the B-cell or T-cell assay). Searches were restricted such that the data fields named “1st Immunogen Object Type” and “1st Immunogen Epitope Relation” had values of “Linear Peptide” and “Epitope,” respectively, with the “MHC class” option set to either “I” or “II” for T-Cell Search. Employing this basic search strategy, both narrow and broad searches were conducted, the former to retrieve data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution and the latter to survey the entire repertoire of available data on linear peptidic epitopes (as illustrated by example in Figure 3).

Figure 3: IEDB T-Cell Search facility interface (http://www.iedb.org/advancedQueryTcell.php). Example shown corresponds to narrow search for class I MHC-restricted T-cell assays with positive data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution. Searches for B-cell assays were performed using IEDB B-Cell Search facility interface (http://www.iedb.org/advancedQueryBcell.php) of analogous form (e.g., without assay-related fields for “MHC allele”); narrow searches for negative data were performed without specified value for field labeled as “Epitope Structure Defines”; and broad searches were performed without specified value for both said field and that named “Qualitative Measurement” (see main text for additional details).

Figure 4: Reduced epitope counts (1) for peptidic sequences of uniform length varying only at a single-residue position (horizontal axis: sequence length; vertical axis: reduced epitope count). Each count corresponds to a set of 20 peptides representing every standard proteinogenic amino-acid residue at the variable-residue position. Functional similarity is equated with either fractional aligned-sequence identity (“◻”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). In the latter case, counts were based on steric incompatibility only ((7) and Figure 2), both steric incompatibility and cavity formation ($c\,\Delta V_{ijk}$ in (8)), or steric incompatibility with both cavity formation and hydrogen bonding ((8) and (9) and Table 1).

Narrow searches were first attempted to retrieve records for which structural data on epitope binding were available, setting the “Yes” option for the Viewer Flag (under “3D Structure of Complex”). Subsequently, the searches were conducted to separately retrieve records curated as containing either negative or positive data on cross-reactivity and were restricted such that the data fields named “Antigen Object Type” and “Antigen Epitope Relation” had values of “Linear Peptide” and “Structurally Related,” respectively, with the data field named “Qualitative Measurement” having a value of either “negative” or “positive.” In the latter case, searches were further limited to class I MHC-restricted T-cell assays for which the data field named “Epitope Structure Defines” had a value of “Exact Epitope” (rather than “epitope containing region/antigenic site”), as cross-reaction may occur with structural differences outside any relevant epitopes unless this is for a peptide confined within a class I MHC binding cleft. For each record thus retrieved, the immunogen and antigen sequences (i.e., values of the data fields named “1st Immunogen Object Primary Molecule Sequence” and “Antigen Object Primary Molecule Sequence,” resp.) were compared. The record was considered for further processing only if the said sequences were of equal length and differed at exactly one amino-acid residue position, and the record was ultimately retained only if a corresponding record with a “Qualitative Measurement” data-field value of “positive” was found sharing the same immunogen and immunoassay protocol except that the antigen was identical to the immunogen (thus confirming that the immunogen had elicited a relevant anti-peptide immune response in the first place). The epitope data thus obtained were analyzed in light of the preceding Section 2.1.
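The sequence-comparison criterion described above (equal length and exactly one differing residue position) can be sketched in Python as follows. The function name and return format are hypothetical, and the sketch covers only the sequence check; the additional requirement of a matching positive control record is not modeled. The example sequences are chosen to match the sequence context SYIXSAEKI and the substitution of Pro by Ala.

```python
def single_substitution(immunogen, antigen):
    """Return (position, immunogen residue, antigen residue, context) if the two
    sequences are of equal length and differ at exactly one position; else None.

    The position is 1-based, and the context is the immunogen sequence with
    "X" written at the position of substitution.
    """
    if len(immunogen) != len(antigen):
        return None
    diffs = [k for k, (a, b) in enumerate(zip(immunogen, antigen)) if a != b]
    if len(diffs) != 1:
        return None
    k = diffs[0]
    context = immunogen[:k] + "X" + immunogen[k + 1:]
    return k + 1, immunogen[k], antigen[k], context


if __name__ == "__main__":
    print(single_substitution("SYIPSAEKI", "SYIASAEKI"))
```

Pairs that are identical, that differ at more than one position, or that differ in length all yield `None` and would thus be excluded from further processing.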

The broad searches were conducted to retrieve records on linear peptidic epitopes in general. Among the epitope records thus retrieved, those wherein the data field named “Modification” was nonempty (indicating amino-acid residue covalent modification) were excluded from further consideration, and the remainder were analyzed as regards functional redundancy according to the preceding Section 2.1.

3. Results and Discussion

3.1. Functional Redundancy versus Sequence. As depicted in Figure 4, computed functional redundancy increases with epitope sequence length if functional similarity is equated with fractional aligned-sequence identity (“◻”), whereas it is independent of sequence length if functional similarity is equated with the Shannon information entropy for differential epitope binding (the three remaining data series). The latter approach thus better accounts for the possibility of a structural difference in only a single chemical group being sufficient to preclude antigenic cross-reaction, provided that the sequence under consideration is entirely encompassed by the relevant epitope structure, as structural differences may be functionally irrelevant if they lie outside the epitope. Moreover, computed functional redundancy decreases (i.e., the reduced epitope count approaches the total epitope count) when both steric incompatibility and the energetic penalty of cavity formation are considered instead of steric incompatibility alone. The redundancy decreases even further if the energetic penalty of unsatisfied hydrogen bonds (more generally representing suboptimal electrostatic complementarity) is also considered.
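The length dependence of the identity-based measure follows from simple arithmetic: two aligned peptides of equal length that differ at a single position share all but one of their residues. The short Python sketch below (with a hypothetical function name) tabulates this pairwise identity only; it does not reproduce the reduced epitope counts of (1) through (4).

```python
from fractions import Fraction


def single_substitution_identity(length):
    """Fractional identity of two equal-length peptides differing at one position."""
    return Fraction(length - 1, length)


if __name__ == "__main__":
    for length in (5, 9, 15, 20):
        identity = single_substitution_identity(length)
        print(length, identity, f"{float(identity):.3f}")
```

The identity thus rises toward 1 as the peptides lengthen, so that the same single-residue difference appears progressively more redundant under an identity-based measure.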

32 Empirical Cross-Reactivity Data IEDB searches yielded26 records on epitope cross-reactivity vis-a-vis single-residuesubstitution (Table 2) all of which contain only qualitativerather than quantitative binding data without reference toatomic coordinates or other structural details of boundepitopes The data suggest that the posited steric incompat-ibility (in (7) and Figure 2) assumes too little tolerance forsuboptimal complementarity especially as regards volumedifferences between immunogen and antigen in Figure 5which depicts examples of cross-reaction despite substitu-tions increasing residue bulk (ie positive-data points to theleft of the diagonal) This might be at least partially correctedby relaxing or even eliminating the steric-incompatibilitycriteria possibly replacing them with a continuous-formshape-dependent energetic penalty (considering that theyassume an infinite-step hard-sphere model of steric clashesbetween atoms whereas potential energy may be more accu-rately represented by continuous functions of interatomicseparation distances) likewise electrostatic complementarityalso might be more accurately described for example interms of differential energetic contributions due to chargedand uncharged hydrogen-bonding partners However typicalepitope binding actuallymay be suboptimal considering ther-modynamic and kinetic constraints on affinity maturation inB-cell ontogeny and the typical absence of affinitymaturationin T-cell ontogeny which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity thanimmunogen

Notwithstanding the problem of suboptimal complementarity described above, data presented in Table 2 suggest that the approach to epitope sequence redundancy proposed herein is more conceptually and practically meaningful than setting threshold (i.e., cutoff) values for fractional aligned-sequence identity. This is exemplified by the control-reaction epitopes of data rows 18 and 19 (having consensus sequence LLXRDSFEV). These epitopes differ from each other at just one residue position. As typical MHC class I-restricted T-cell epitopes, they are nonapeptides, for which the highest meaningful sequence-identity threshold value is 8/9 (~89%).


Table 2: IEDB Qualitative-Outcome (Q) data on peptide cross-reactivity assays (cf. Figure 5).

| Row | Q | Assay type | MHC class | Parameter measured (cf. IEDB “measurement of” data field) | IEDB ID, control reaction | IEDB ID, cross-reaction | IEDB Epitope ID | Substitution | Sequence context (with X at position of substitution) | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | − | B-cell | | Antibody-antigen binding | 1370146 | 1370150 | 10801 | W→Y | DXEDRYYRE | [22] |
| 2 | − | T-cell | I | Cytokine release (IFN-γ) | 1340848 | 1340854 | 62604 | P→A | SYIXSAEKI | [23] |
| 3 | − | T-cell | II | Cell proliferation | 1646734 | 1646735 | 107384 | E→D | GEPGIAGFKGXQGPK | [24] |
| 4 | − | T-cell | II | Cell proliferation | 1660246 | 1660247 | 46317 | F→A | NTWTTCQSIAXPSK | [25] |
| 5 | − | T-cell | II | Cytokine release (IL-4) | 1660252 | 1660258 | 112245 | A→F | NTWTTCQSIAXPSK | [25] |
| 6 | − | T-cell | II | Cytokine release (IL-2) | 1667060 | 1667065 | 68846 | K→A | VHFFXNIVTPRTP | [26] |
| 7 | − | T-cell | II | Cell proliferation | 1667112 | 1667113 | 80071 | A→K | VHFFXNIVTPRTP | [26] |
| 8 | − | T-cell | II | Cell proliferation | 1716604 | 1716614 | 125569 | T→V | SWEGVGVXPDV | [27] |
| 9 | − | T-cell | II | Cell proliferation | 1716604 | 1716616 | 125569 | D→N | SWEGVGVTPXV | [27] |
| 10 | − | T-cell | II | Cell proliferation | 1716605 | 1716618 | 125571 | V→T | SWEGVGVXPNV | [27] |
| 11 | + | T-cell | I | Cytotoxicity | 1634784 | 1634785 | 103300 | S→A | IYXTVAGSL | [28] |
| 12 | + | T-cell | I | Cytotoxicity | 1634784 | 1634786 | 103300 | G→S | IYSTVAXSL | [28] |
| 13 | + | T-cell | I | Cytotoxicity | 1634788 | 1634789 | 103298 | S→G | IYATVAXSL | [28] |
| 14 | + | T-cell | I | Cytotoxicity | 1634788 | 1634790 | 103298 | A→S | IYXTVASSL | [28] |
| 15 | + | T-cell | I | Cytotoxicity | 1657362 | 1657364 | 111964 | I→D | YXFAFRDL | [29] |
| 16 | + | T-cell | I | Cytokine release (IFN-γ) | 1716442 | 1716443 | 125101 | I→T | XMIKFNRL | [30] |
| 17 | + | T-cell | I | Cytokine release (IFN-γ) | 1340848 | 1340852 | 62604 | K→D | SYIPSAEXI | [23] |
| 18 | + | T-cell | I | Cytokine release (IFN-γ) | 1381750 | 1381748 | 37174 | D→G | LLXRDSFEV | [31] |
| 19 | + | T-cell | I | Cytokine release (IFN-γ) | 1381752 | 1381753 | 37392 | H→G | LLXRDSFEV | [31] |
| 20 | + | T-cell | I | Cytotoxicity | 1404705 | 1404714 | 61086 | S→I | SXIEFARL | [32] |
| 21 | + | T-cell | I | Cytotoxicity | 1404705 | 1404715 | 61086 | A→E | SSIEFXRL | [32] |
| 22 | + | T-cell | I | Cytotoxicity | 1404705 | 1404716 | 61086 | R→K | SSIEFAXL | [32] |
| 23 | + | T-cell | I | Cytotoxicity | 1404712 | 1404719 | 61088 | E→A | SSIEFXRL | [32] |
| 24 | + | T-cell | I | Cytotoxicity | 1404713 | 1404721 | 61085 | K→R | SSIEFAXL | [32] |
| 25 | + | T-cell | I | Cytokine release (IFN-γ) | 1865529 | 1865530 | 98995 | T→F | SAPDXRPA | [33] |
| 26 | + | T-cell | I | Cytokine release (IFN-γ) | 1865529 | 1865532 | 98995 | A→L | SAPDTRPX | [33] |

Figure 5: Peptide cross-reactivity assay data of Table 2 vis-à-vis amino-acid residue volume [21] of immunogen and antigen at single-residue substitution positions along sequence alignments. Horizontal axis: immunogen residue volume (Å³); vertical axis: antigen residue volume (Å³). Plotted labels comprise one-letter amino-acid codes and the row numbers of Table 2. Legend: reference; negative B-cell; negative T-cell I; negative T-cell II; positive T-cell I.

For longer epitopes (e.g., typical MHC class II-restricted T-cell epitopes), the value is even higher. For example, in the case of data rows 4 and 5 (having consensus sequence NTWTTCQSIAXPSK), the said value is 13/14 (~93%). The threshold values thus calculated are markedly higher than those used (e.g., 70% [20]) for conventional protein sequence analysis. Moreover, regardless of the exact sequence lengths, applying such threshold values would entail discarding data on epitopes differing by one residue (e.g., the examples just cited from Table 2). To avoid such loss of potentially valuable information, all the available epitope data might be analyzed and a corresponding reduced epitope count reported to indicate the level of redundancy in the underlying dataset.
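The threshold arithmetic above generalizes directly: two aligned sequences of equal length that differ at exactly one position share (length − 1)/length of their residues. The short sketch below reproduces the two values quoted in the text; the function name is a hypothetical label chosen for illustration.

```python
from fractions import Fraction


def highest_identity_threshold(length: int) -> Fraction:
    """Fractional identity of two equal-length sequences differing at one position."""
    if length < 2:
        raise ValueError("at least two residues are needed for a partial match")
    return Fraction(length - 1, length)


if __name__ == "__main__":
    examples = [
        ("rows 18 and 19", "LLXRDSFEV"),
        ("rows 4 and 5", "NTWTTCQSIAXPSK"),
    ]
    for label, consensus in examples:
        threshold = highest_identity_threshold(len(consensus))
        print(f"{label}: {threshold} (~{float(threshold):.0%})")
```

Because the value tends toward 1 as length grows, any fixed cutoff low enough to be conventional (such as 70%) would treat all single-substitution pairs as redundant regardless of whether they cross-react.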

3.3. Survey of Peptidic Epitope Sequence Data. IEDB searches yielded 11497 records on peptidic epitopes of length less than 50 residues (i.e., the general cutoff provided in the IEDB curation manual [39]), curated as covalently unmodified and comprising 4443 B-cell and 7054 T-cell epitope records (the latter divided between 2749 and 4305 records for MHC restriction classes I and II, respectively), for which computed functional redundancy vis-à-vis sequence length is depicted in Figure 6.

Considering the wide range of sequence counts for the various sequence lengths in Figure 6, a logarithmic scale was chosen to depict the data in a compact form. Consequently, the difference calculated as sequence count − reduced count was plotted instead of reduced count itself where sequence count > reduced count, to avoid crowding of data points. For all positive sequence counts (“◻”), the said differences were consistently higher where functional similarity was equated with fractional aligned-sequence identity (“×”) rather than the Shannon information entropy of differential epitope binding. In the latter case, the differences were consistently less than 1, with reduced count close or even equal to sequence count. These results are anticipated in view of Figure 4, which shows that reduced count is independent of sequence length using the entropy-based approach proposed herein but decreases with sequence length using fractional aligned-sequence identity.

The data in Figures 4 and 6 thus point to key considerations in evaluating functional redundancy. Clearly, functional redundancy may be overestimated by simply equating functional similarity with fractional aligned-sequence identity, especially with increasing sequence length for highly similar sequences. On the other hand, functional redundancy may be underestimated where functional similarity is equated with the Shannon information entropy of differential epitope binding if the sequence under consideration extends beyond the relevant epitope structure, which becomes more likely with increasing sequence length. Hence, the information-entropy approach as developed herein is conceivably applicable to sequences not exceeding a realistic application-dependent epitope length (e.g., six residues for antigen binding by anti-peptide antibodies [40]), although rigorous generalization to longer sequences might be pursued by aligning subsequences of such length and possibly accounting for differential immunodominance among immunogen epitopes [41]. For example, if a set of peptides were to be found such that each of them comprised a single immunodominant epitope having fixed length (e.g., six residues in all cases), the information-entropy approach might be applied to the entire set even if the peptides were of variable length. Moreover, even if the said immunodominant epitopes were highly similar to one another in terms of fractional aligned-sequence identity, all the available data still could be included in subsequent analyses, provided that both the total and reduced epitope counts would be reported in order to account for functional redundancy. In this way, useful epitope data could be retained instead of discarded.
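As a first step toward the generalization to longer sequences suggested above, candidate subsequences of a fixed epitope length can be enumerated before any pairwise comparison. The following is a minimal sketch under that assumption only: the function name and the default length of six residues (taken from the anti-peptide antibody example) are illustrative, and no weighting for immunodominance is attempted.

```python
def fixed_length_windows(peptide: str, epitope_length: int = 6) -> list[str]:
    """Return every contiguous subsequence of the given length, in sequence order."""
    if epitope_length < 1:
        raise ValueError("epitope_length must be positive")
    last_start = len(peptide) - epitope_length
    # range() is empty when the peptide is shorter than the window.
    return [peptide[i:i + epitope_length] for i in range(last_start + 1)]


if __name__ == "__main__":
    # Sequence context from row 1 of Table 2 (X marks the substituted position).
    for window in fixed_length_windows("DXEDRYYRE"):
        print(window)
```

A peptide of length n yields n − 5 hexamer windows, so the number of pairwise comparisons grows with peptide length; restricting attention to a known immunodominant window, where available, would avoid that growth.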

3.4. Applications and Future Directions. The discussion thus far suggests the potentially greater utility of antigenic functional similarity cast as the Shannon information entropy of differential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redundancy of epitope data, to express their structural diversity in a biologically meaningful manner (i.e., explicitly in terms of relative binding affinity, which underlies the balance between the inversely related emergent continuum phenomena of immunologic specificity and cross-reactivity). This is subject to the caveat that assuming an idealized epitope-binding site of optimal steric and electrostatic complementarity (e.g., comparable to what is typically observed in a natively folded wild-type protein) may both overestimate affinity for an immunogen epitope and underestimate affinity for antigen epitopes structurally different from the immunogen epitope (thereby possibly overestimating immunologic specificity as a consequence of underestimating potential for cross-reactivity). These considerations could guide the development of immunization strategies (e.g., via vaccination) and immunodiagnostics (e.g., to produce peptidic constructs suitable for eliciting anti-peptide antibodies that either protect against disease or serve to detect particular antigens in biological samples via immunoassays), noting that the practical significance of immunologic specificity and cross-reactivity is highly context-dependent.

Figure 6: Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c). Horizontal axis: sequence length (0 to 50); vertical axis (logarithmic, 0.001 to 1000): sequence count or sequence count − reduced count. Sequence counts (“◻”) are total epitope counts in IEDB. Corresponding reduced counts ($r$ in (1)) are expressed as the difference (sequence count − reduced count), with functional similarity equated with either fractional aligned-sequence identity (“×”; (3) and (4)) or the Shannon information entropy for differential epitope binding (entropy-series marker; (5) through (9)). Missing “◻” indicates sequence count = 0 (e.g., for B-cell epitope sequence length < 4 in (a)); where “◻” is present, a missing “×” or entropy-series marker indicates sequence count − reduced count = 0 (e.g., with the entropy-series marker missing for B-cell epitope sequence length = 5 in (a)). Both the “×” and the entropy-series marker are missing where sequence count = 1 (e.g., for B-cell epitope sequence length = 43 in (a)), as reduced count = 1 for sequence count = 1.

From the perspective of immunization, immunity (e.g., induced via vaccination or passive transfer of immune-system components) may be useful if specifically directed against a pathogen epitope so as to favor pathogen elimination without producing harmful hypersensitivity reactions (e.g., in the form of allergy or autoimmunity). Yet such immunity might be even more useful if it also favored elimination of multiple antigenically related variant pathogens via cross-reaction (such that a single vaccine epitope elicited the production of immune-system components each cross-reactive with multiple variant pathogen epitopes) while still avoiding hypersensitivity. Cross-reactivity of immune responses is thus potentially useful within limits defined by risk of harmful hypersensitivity (e.g., due to cross-reaction with host self-antigens). As regards immunodiagnosis, an immunodiagnostic test may be useful if it enables specific pathogen detection via discrimination between a pathogen epitope and a set of other biologically relevant epitopes (e.g., of the host or its commensal symbionts in health, or of other pathogens for which the clinical implications are very different). Yet the test might be even more useful if it also enabled detection of multiple antigenically related variant pathogens (such that a single immunologic probe detected multiple variant pathogen epitopes) while still retaining its discriminatory power with respect to other biologically relevant epitopes, particularly where the variant pathogens are very similar to one another in terms of their clinical implications (e.g., prognosis and therapeutic options). Hence, cross-reactivity strongly influences the outcomes of both immunization and immunodiagnosis, for which reason their development could be supported by computational tools that estimate epitope-binding affinity for cross-reactions.

In order to estimate the epitope-binding affinity of an immune-system component (e.g., antibody) elicited in response to (and thus having a binding site for) immunogen epitope $i$ for antigen epitope $j$, one possible approach would be to express affinity in terms of association constants, for example, as

$$K_{ij} = K_{ii} Z_{ij}, \tag{10}$$

where $K_{ij}$ is the association constant for cross-reaction with antigen epitope $j$, $K_{ii}$ is the association constant for reaction with immunogen epitope $i$, and $Z_{ij}$ is a correction term to account for the difference between the two association constants. In turn, $K_{ii}$ might be estimated as

$$K_{ii} = \exp\left(-\frac{\Delta G_{ii}}{RT}\right), \tag{11}$$

where $\Delta G_{ii}$ is the free-energy change for reaction with immunogen epitope $i$, while both $R$ and $T$ retain their meanings as in (7). In cases where epitopes $i$ and $j$ are both $m$ residues in length, $Z_{ij}$ might be estimated as

$$Z_{ij} = \begin{cases} \prod_{k=1}^{m} s_{ijk}, & \text{if } \prod_{k=1}^{m} s_{ijk} > K_{ii}^{-1}, \\ K_{ii}^{-1}, & \text{otherwise}, \end{cases} \tag{12}$$

where the product retains its meaning as in (6), such that $K_{ij} \geq 1$ (with $K_{ij} = 1$ for nonspecific binding). $\Delta G_{ii}$ could be estimated from epitope structure (e.g., using structural energetics [40–42]) subject to mechanism-specific constraints (e.g., on B-cell affinity maturation [43]), and $s_{ijk}$ could be estimated along similar lines with regard to $\Delta\Delta G_{ijk}$ in (7).

The preceding discussion has alluded to epitope binding as if it were a binary interaction, which is strictly true only for B-cell epitopes (e.g., bound by surface immunoglobulin or antibody); but in the context of T-cell epitopes, the epitope-binding site is to be understood as a bipartite entity consisting of an MHC molecule and a T-cell receptor (TCR), such that the epitope becomes at least partly confined between the two binding-site components. The overall process of T-cell recognition is subject to thermodynamic and kinetic constraints on initial MHC-peptide binding [44] and subsequent TCR recognition of MHC-peptide complex [45], which potentially complicate prediction of T-cell cross-reactivity and functional consequences thereof (e.g., considering how binding affinity and receptor numbers interact to influence T-cell effector function [45]). Still, predicting B-cell epitope cross-reactivity is challenging in view of the greater structural diversity of B-cell epitopes, especially where only a fraction of the epitope surface contributes to the epitope-paratope interface, such that cross-reaction may occur with variations in epitope structure beyond the interface. Hence, although the analysis herein for continuous epitopes might be generalized to discontinuous epitopes with indexing of residues by relative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignment gaps, correlating variations in epitope-residue structure with potential for cross-reactivity would be nontrivial.
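Equations (10) through (12) above can be sketched numerically as follows. This is an illustration under stated assumptions rather than a validated implementation: free energies are assumed to be in kcal/mol (so the gas constant is given in kcal mol⁻¹ K⁻¹), the per-residue factors $s_{ijk}$ are supplied by the caller because their evaluation via (6) and (7) is not reproduced here, and the function name and example inputs are hypothetical.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal mol^-1 K^-1 (assumed unit system)


def cross_reaction_constant(delta_g_ii: float,
                            s_values: list[float],
                            temperature: float = 310.15) -> float:
    """Estimate K_ij for antigen epitope j from immunogen epitope i.

    delta_g_ii  -- free-energy change for binding the immunogen epitope (kcal/mol)
    s_values    -- per-residue factors s_ijk for k = 1..m (caller-supplied)
    temperature -- absolute temperature in kelvin (37 degrees C by default)
    """
    k_ii = math.exp(-delta_g_ii / (GAS_CONSTANT * temperature))  # equation (11)
    product = math.prod(s_values)
    # Equation (12): the correction term is floored at 1/K_ii so that K_ij >= 1.
    z_ij = product if product > 1.0 / k_ii else 1.0 / k_ii
    return k_ii * z_ij  # equation (10)


if __name__ == "__main__":
    identical = cross_reaction_constant(-8.0, [1.0] * 9)
    one_clash = cross_reaction_constant(-8.0, [1.0] * 8 + [0.0])
    print(f"identical epitopes: K_ij = {identical:.3g}")
    print(f"one prohibitive substitution: K_ij = {one_clash:.3g}")
```

The floor in (12) means that a single factor of zero collapses the estimate to the nonspecific-binding value of 1 rather than to zero, which is the behavior the text describes.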

The analysis developed herein conceivably could be refined to better account for MHC-peptide binding as distinct from subsequent binding of MHC-peptide complex by TCR, cognizant of MHC polymorphism. Hence, MHC-peptide binding could be analyzed with explicit consideration of pertinent MHC structural details (e.g., noting that unfavorable peptide backbone conformations may preclude accommodation of peptide within the MHC binding cleft, and emphasizing the potential affinity contributions of prospective anchor residues on peptides vis-à-vis corresponding MHC binding pockets). Subsequent binding of MHC-peptide complex by TCR could be analyzed separately, with emphasis on peptide surfaces accessible for contact by TCR (e.g., excluding those buried within MHC binding pockets). This could be more readily applied to class I rather than class II MHC molecules, as the latter typically have binding clefts that allow for variable-length peptide overhangs. The analysis for MHC-peptide binding might be extended for class II via docking simulations to predict the MHC-bound conformations and affinity contributions of the overhangs, after which the binding of MHC-peptide complex by TCR could be analyzed with inclusion of any overhang surfaces that are likely to be accessible for contact by TCR.

At a more fundamental level, apparently suboptimal complementarity (manifesting as submaximal immunologic specificity) may be an inherent tradeoff in attaining optimal immune function from the standpoint of biological fitness (e.g., enabling each immune-system component to recognize a variety of structurally distinct epitopes, albeit at the cost of binding each of these epitopes with only submaximal affinity). Investigation of this possibility may lead to insights on how immune responses might be optimally biased (e.g., via vaccination or immunotherapy). Moreover, the asymmetry of cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenic differences in terms of antigenic distance as a metric that is independent of immunization history.

4. Summary and Conclusions

In assembling epitope datasets (e.g., to develop and benchmark epitope-prediction tools), sequence redundancy among epitopes must be considered to avoid misleading results that reflect overrepresentation of functionally similar epitopes. However, potentially useful data may be needlessly discarded by excluding epitopes that share an apparently high degree of sequence similarity, as even only a single-residue difference may manifest as extreme functional dissimilarity between epitopes. The present work thus introduces a reduced epitope count, as defined using (1) and (2), to account for functional redundancy of epitope data (FRED) within an epitope dataset (such that FRED is quantified as the difference between the total and reduced epitope counts). This can be used, for example, to characterize the dataset instead of excluding epitopes with apparently similar sequences from it, thereby maximizing the use of available epitope data.

More importantly, the present work frames FRED in terms of potential antigenic cross-reactivity (rather than sequence distance) construed as functional similarity between epitopes. Hence, pairwise comparison of epitopes is performed such that each epitope pair considered is regarded as consisting of an immunogen epitope (which elicits the immune response of interest) and an antigen epitope (which is the target of cross-reactive binding in an immunoassay to evaluate the immune response). Functional similarity between epitopes is thus defined in a physicochemically and biologically meaningful way as the Shannon information entropy for differential epitope binding in (5), which is compatible with the possibility of extreme functional dissimilarity between epitopes due to seemingly minor differences in sequence. Consequently, the complement of functional similarity is not a distance (in the sense of a unique value quantifying the separation between two points in sequence space), as its value may vary with reversal of the roles (i.e., immunogen or antigen) assigned to epitopes under pairwise comparison. However, it is nonetheless a measure of epitope dissimilarity, such that summation of its values over all epitopes of a dataset according to (2) enables calculation of an averaged contribution of each epitope to the reduced epitope count. This circumvents the conventional dichotomous labeling of epitopes as either redundant or nonredundant based on an arbitrarily selected threshold value of sequence similarity. Such labeling is problematic in that selection of the actual threshold value is difficult to justify on physicochemically and biologically meaningful grounds, and because an epitope thus might be labeled as redundant on the basis of similarity to a minority of other epitopes in the dataset.
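The role-dependence of dissimilarity described above can be illustrated with a toy calculation. The sketch below is not the paper's equations (5) through (9), which are not reproduced in this section; it assumes a deliberately simplified rule in which a bulkier antigen residue counts as a steric clash (similarity 0), whereas a less bulky antigen residue incurs only a cavity penalty converted to a Boltzmann-type factor. The residue volumes are rounded placeholders rather than the published values of [21], and all names are hypothetical.

```python
import math

GAS_CONSTANT = 1.987e-3   # kcal mol^-1 K^-1
TEMPERATURE = 310.15      # 37 degrees C in kelvin
CAVITY_PENALTY = 0.024    # kcal mol^-1 per cubic angstrom (magnitude quoted for c)

# Placeholder residue volumes in cubic angstroms (illustrative only).
VOLUME = {"G": 64.0, "A": 90.0, "L": 165.0}


def toy_similarity(immunogen: str, antigen: str) -> float:
    """Toy residue-level similarity relative to a site shaped by the immunogen."""
    volume_change = VOLUME[antigen] - VOLUME[immunogen]
    if volume_change > 0:
        return 0.0  # antigen residue is bulkier: treated as a hard-sphere clash
    penalty = CAVITY_PENALTY * (-volume_change)  # cavity left by a smaller residue
    return math.exp(-penalty / (GAS_CONSTANT * TEMPERATURE))


if __name__ == "__main__":
    for immunogen, antigen in (("A", "G"), ("G", "A"), ("A", "A")):
        dissimilarity = 1.0 - toy_similarity(immunogen, antigen)
        print(f"immunogen {immunogen}, antigen {antigen}: dissimilarity = {dissimilarity:.3f}")
```

Swapping the roles of the same two residues changes the value, so the quantity fails the symmetry requirement of a metric even in this reduced setting.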

In line with the preceding considerations, epitope functional similarity may be estimated in terms of differences between immunogen- and antigen-epitope structure relative to an idealized binding site of high complementarity to the immunogen epitope. However, this tends to underestimate potential for cross-reactivity, which suggests that epitope-binding site complementarity is typically suboptimal. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes. Such cross-reactivity conceivably could be exploited in the development of practical applications such as vaccines and immunodiagnostics (e.g., to aim for beneficially broad cross-reactivity rather than overly extreme immunologic specificity).

Competing Interests

The author declares that they have no competing interests.

Acknowledgments

This work was supported by an Angelita T. Reyes Centennial Professorial Chair grant awarded to the author.

References

[1] S. E. C. Caoili, “Antidotes, antibody-mediated immunity and the future of pharmaceutical product development,” Human Vaccines and Immunotherapeutics, vol. 9, no. 2, pp. 294–299, 2013.

[2] S. E. C. Caoili, “Beyond new chemical entities: advancing drug development based on functional versatility of antibodies,” Human Vaccines and Immunotherapeutics, vol. 10, no. 6, pp. 1639–1644, 2014.

[3] B. Greenwood, “The contribution of vaccination to global health: past, present and future,” Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 369, no. 1645, Article ID 20130433, 2014.

[4] M. H. V. Van Regenmortel, “What is a B-cell epitope?” Methods in Molecular Biology, vol. 524, pp. 3–20, 2009.

[5] M. H. V. Regenmortel, “Specificity, polyspecificity, and heterospecificity of antibody-antigen recognition,” Journal of Molecular Recognition, vol. 27, no. 11, pp. 627–639, 2014.

[6] N. Tomar and R. K. De, “Immunoinformatics: a brief review,” Methods in Molecular Biology, vol. 1184, pp. 23–55, 2014.

[7] S. E. C. Caoili, “An integrative structure-based framework for predicting biological effects mediated by antipeptide antibodies,” Journal of Immunological Methods, vol. 427, pp. 19–29, 2015.

[8] S. E. C. Caoili, “Immunization with peptide-protein conjugates: impact on benchmarking B-cell epitope prediction for vaccine design,” Protein and Peptide Letters, vol. 17, no. 3, pp. 386–398, 2010.

[9] S. E. C. Caoili, “Hybrid methods for B-cell epitope prediction,” Methods in Molecular Biology, vol. 1184, pp. 245–283, 2014.

[10] S. E. C. Caoili, “Benchmarking B-cell epitope prediction with quantitative dose-response data on antipeptide antibodies: towards novel pharmaceutical product development,” BioMed Research International, vol. 2014, Article ID 867905, 13 pages, 2014.

[11] U. Hobohm, M. Scharf, R. Schneider, and C. Sander, “Selection of representative protein data sets,” Protein Science, vol. 1, no. 3, pp. 409–417, 1992.

[12] U. Lessel and D. Schomburg, “Creation and characterization of a new non-redundant fragment data bank,” Protein Engineering, vol. 10, no. 6, pp. 659–664, 1997.

[13] T. Noguchi and Y. Akiyama, “PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003,” Nucleic Acids Research, vol. 31, no. 1, pp. 492–493, 2003.

[14] P. Motte, G. Alberici, M. Ait-Abdellah, and D. Bellet, “Monoclonal antibodies distinguish synthetic peptides that differ in one chemical group,” The Journal of Immunology, vol. 138, no. 10, pp. 3332–3338, 1987.

[15] M. K. Gilson and S. E. Radford, “Protein folding and binding: from biology to physics and back again,” Current Opinion in Structural Biology, vol. 21, no. 1, pp. 1–3, 2011.

[16] B. Rost, “Twilight zone of protein sequence alignments,” Protein Engineering, vol. 12, no. 2, pp. 85–94, 1999.

[17] B. Y. Khor, G. J. Tye, T. S. Lim, and Y. S. Choong, “General overview on structure prediction of twilight-zone proteins,” Theoretical Biology and Medical Modelling, vol. 12, article 15, 2015.

[18] H. Nakamura, “Roles of electrostatic interaction in proteins,” Quarterly Reviews of Biophysics, vol. 29, no. 1, pp. 1–90, 1996.

[19] P. J. Fleming and G. D. Rose, “Do all backbone polar groups in proteins form hydrogen bonds?” Protein Science, vol. 14, no. 7, pp. 1911–1917, 2005.

[20] S. Basu, D. Bhattacharyya, and R. Banerjee, “Self-complementarity within proteins: bridging the gap between binding and folding,” Biophysical Journal, vol. 102, no. 11, pp. 2605–2614, 2012.

[21] Y. Harpaz, M. Gerstein, and C. Chothia, “Volume changes on protein folding,” Structure, vol. 2, no. 7, pp. 641–649, 1994.

[22] A. Handisurya, S. Gilch, D. Winter, et al., “Vaccination with prion peptide-displaying papillomavirus-like particles induces autoantibodies to normal prion protein that interfere with pathologic prion protein production in infected cells,” FEBS Journal, vol. 274, no. 7, pp. 1747–1758, 2007.

[23] M. Plebanski, E. A. M. Lee, C. M. Hannan, et al., “Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming,” Nature Medicine, vol. 5, no. 5, pp. 565–571, 1999.


[24] E. Michaelsson, M. Andersson, A. Engstrom, and R. Holmdahl, “Identification of an immunodominant type-II collagen peptide recognized by T cells in H-2q mice: self tolerance at the level of determinant selection,” European Journal of Immunology, vol. 22, no. 7, pp. 1819–1825, 1992.

[25] J. M. Greer, C. Klinguer, E. Trifilieff, R. A. Sobel, and M. B. Lees, “Encephalitogenicity of murine but not bovine DM20 in SJL mice is due to a single amino acid difference in the immunodominant encephalitogenic epitope,” Neurochemical Research, vol. 22, no. 4, pp. 541–547, 1997.

[26] A. Gaur, S. A. Boehme, D. Chalmers, et al., “Amelioration of relapsing experimental autoimmune encephalomyelitis with altered myelin basic protein peptides involves different cellular mechanisms,” Journal of Neuroimmunology, vol. 74, no. 1-2, pp. 149–158, 1997.

[27] A. T. Kozhich, Y.-I. Kawano, C. E. Egwuagu, et al., “A pathogenic autoimmune process targeted at a surrogate epitope,” The Journal of Experimental Medicine, vol. 180, no. 1, pp. 133–140, 1994.

[28] H. Masaki, M. Tamura, and I. Kurane, “Induction of cytotoxic T lymphocytes of heterogeneous specificities by immunization with a single peptide derived from influenza A virus,” Viral Immunology, vol. 13, no. 1, pp. 73–81, 2000.

[29] G. B. Lipford, S. Bauer, H. Wagner, and K. Heeg, “Peptide engineering allows cytotoxic T-cell vaccination against human papilloma virus tumour antigen E6,” Immunology, vol. 84, no. 2, pp. 298–303, 1995.

[30] C. L. Keech, K. C. Pang, J. McCluskey, and W. Chen, “Direct antigen presentation by DC shapes the functional CD8+ T-cell repertoire against the nuclear self-antigen La-SSB,” European Journal of Immunology, vol. 40, no. 2, pp. 330–338, 2010.

[31] S. Tangri, G. Y. Ishioka, X. Huang, et al., “Structural features of peptide analogs of human histocompatibility leukocyte antigen class I epitopes that are more potent and immunogenic than wild-type peptide,” Journal of Experimental Medicine, vol. 194, no. 6, pp. 833–846, 2001.

[32] S. J. Turner and F. R. Carbone, “A dominant Vβ bias in the CTL response after HSV-1 infection is determined by peptide residues predicted to also interact with the TCR β-chain CDR3,” Molecular Immunology, vol. 35, no. 5, pp. 307–316, 1998.

[33] E. Lazoura, J. Lodding, W. Farrugia, et al., “Enhanced major histocompatibility complex class I binding and immune responses through anchor modification of the non-canonical tumour-associated mucin 1-8 peptide,” Immunology, vol. 119, no. 3, pp. 306–316, 2006.

[34] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.

[35] E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review, vol. 106, no. 4, pp. 620–630, 1957.

[36] E. T. Jaynes, “Information theory and statistical mechanics. II,” Physical Review, vol. 108, no. 2, pp. 171–190, 1957.

[37] A. E. Eriksson, W. A. Baase, X.-J. Zhang, et al., “Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect,” Science, vol. 255, no. 5041, pp. 178–183, 1992.

[38] B. Honig and A.-S. Yang, “Free energy balance in protein folding,” Advances in Protein Chemistry, vol. 46, pp. 27–58, 1995.

[39] R. Vita, J. A. Overton, J. A. Greenbaum, et al., “The immune epitope database (IEDB) 3.0,” Nucleic Acids Research, vol. 43, no. 1, pp. D405–D412, 2015.

[40] S. E. C. Caoili, “A structural-energetic basis for B-cell epitope prediction,” Protein and Peptide Letters, vol. 13, no. 7, pp. 743–751, 2006.

[41] S. E. C. Caoili, “Benchmarking B-cell epitope prediction for the design of peptide-based vaccines: problems and prospects,” Journal of Biomedicine and Biotechnology, vol. 2010, Article ID 910524, 14 pages, 2010.

[42] S. P. Edgcomb and K. P. Murphy, “Structural energetics of protein folding and binding,” Current Opinion in Biotechnology, vol. 11, no. 1, pp. 62–66, 2000.

[43] S. E. C. Caoili, “On the meaning of affinity limits in B-cell epitope prediction for antipeptide antibody-mediated immunity,” Advances in Bioinformatics, vol. 2012, Article ID 346765, 17 pages, 2012.

[44] N. Dalchau, A. Phillips, L. D. Goldstein, et al., “A peptide filtering relation quantifies MHC class I peptide optimization,” PLoS Computational Biology, vol. 7, no. 10, Article ID e1002144, 2011.

[45] L. J. Edwards and B. D. Evavold, “T cell recognition of weak ligands: roles of signaling, receptor number, and affinity,” Immunologic Research, vol. 50, no. 1, pp. 39–48, 2011.


due to the main-chain nitrogen difference. For a polar residue paired with a non-Pro nonpolar residue, $N_{ijk}$ is the total number of side-chain oxygen and nitrogen atoms. For a pair of polar residues, the total number of side-chain oxygen and nitrogen atoms is an upper bound on $N_{ijk}$. Hence, $N_{ijk} = 3$ for Tyr paired with Asn, although $N_{ijk} = 2$ for Asn paired with Asp.

Provisional values for the proportionality constants $c$ and $b$ in (8) were obtained from the literature. In particular, $c$ was regarded as the energetic penalty per unit volume of a cavity formed due to substitution of a less bulky residue within the interior of a folded protein, with a value of −0.024 kcal mol$^{-1}$ Å$^{-1}$ estimated empirically from protein mutation studies [37], and $b$ was likewise regarded as the energetic cost of breaking a hydrogen bond, with a conservative estimated value of 0.5 kcal mol$^{-1}$ per bond [38].

Thus, using (1) through (8) in conjunction with Table 1, the literature values for constants in (8), and published amino-acid residue volumes [21], reduced epitope counts (1) were computed for artificial epitope sequences and also actual experimentally studied epitope sequences, assuming a temperature of 37 °C (310.15 K).
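As a rough numerical illustration of how these constants might enter the calculation, the sketch below assumes that (8) has the additive form $\Delta\Delta G_{ijk} = c\,\Delta V_{ijk} + b\,N_{ijk}$ and that the per-residue similarity in (7) is the Boltzmann-type factor $\exp(-\Delta\Delta G_{ijk}/RT)$ for sterically compatible pairs. Both forms are assumptions inferred from the surrounding text rather than restatements of the equations. The function names, the sign convention for $\Delta V_{ijk}$ (antigen minus immunogen residue volume, hence negative when a cavity forms), and the example inputs are hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
T_KELVIN = 310.15      # 37 degrees C, as assumed in the text
C_CAVITY = -0.024      # cavity penalty per unit volume, value quoted from [37]
B_HBOND = 0.5          # cost per broken hydrogen bond in kcal/mol, from [38]


def delta_delta_g(delta_v: float, n_hbond: int) -> float:
    """Assumed additive form of (8): c * dV + b * N, in kcal/mol.

    delta_v is taken as antigen volume minus immunogen volume, so it is
    negative (or zero) when the antigen residue is less bulky.
    """
    if delta_v > 0:
        # Bulkier antigen residues fall under the steric-incompatibility
        # criteria of (7) and Figure 2, which this sketch does not model.
        raise ValueError("delta_v > 0 is outside the scope of this sketch")
    return C_CAVITY * delta_v + B_HBOND * n_hbond


def residue_similarity(delta_v: float, n_hbond: int) -> float:
    """Assumed Boltzmann-type factor exp(-ddG / RT) for one residue pair."""
    return math.exp(-delta_delta_g(delta_v, n_hbond) / (R_KCAL * T_KELVIN))


if __name__ == "__main__":
    # Hypothetical pair: a cavity of 25 volume units and one unsatisfied
    # hydrogen bond.
    print(delta_delta_g(-25.0, 1))       # penalty in kcal/mol
    print(residue_similarity(-25.0, 1))  # similarity factor between 0 and 1
```

Under these assumptions, a single unsatisfied hydrogen bond at 310.15 K already lowers the similarity factor to well below one half, which is in line with the premise that a one-residue change can sharply reduce binding.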

2.2. Retrieval and Processing of Epitope Data. Epitope data were obtained via the Immune Epitope Database (IEDB, at http://www.iedb.org) [39] as database records retrieved through searches conducted from 4 to 6 October 2015 using its B-Cell Search and T-Cell Search facilities (in the latter case restricting searches to either MHC class I or II alleles), with each record thus retrieved pertaining to an individual B-cell or T-cell assay and containing data fields defined in relation to the key concepts of “Object Type” (i.e., molecule type in the context of chemical structure), “Epitope Relation” (i.e., molecule relationship to epitope), “1st Immunogen” (i.e., immunogen initially administered to elicit the immune response), and “Antigen” (i.e., antigen used in the B-cell or T-cell assay). Searches were restricted such that the data fields named “1st Immunogen Object Type” and “1st Immunogen Epitope Relation” had values of “Linear Peptide” and “Epitope,” respectively, with the “MHC class” option set to either “I” or “II” for T-Cell Search. Employing this basic search strategy, both narrow and broad searches were conducted, the former to retrieve data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution and the latter to survey the entire repertoire of available data on linear peptidic epitopes (as illustrated by example in Figure 3).

Figure 3: IEDB T-Cell Search facility interface (http://www.iedb.org/advancedQueryTcell.php). Example shown corresponds to narrow search for class I MHC-restricted T-cell assays with positive data on cross-reactivity between epitopes differing by only a single amino-acid residue substitution. Searches for B-cell assays were performed using IEDB B-Cell Search facility interface (http://www.iedb.org/advancedQueryBcell.php) of analogous form (e.g., without assay-related fields for “MHC allele”); narrow searches for negative data were performed without specified value for field labeled as “Epitope Structure Defines,” and broad searches were performed without specified value for both said field and that named “Qualitative Measurement” (see main text for additional details).

Figure 4: Reduced epitope counts (1) for peptidic sequences of uniform length varying only at a single-residue position, plotted as reduced epitope count versus sequence length. Each count corresponds to a set of 20 peptides representing every standard proteinogenic amino-acid residue at the variable-residue position. Functional similarity is equated with either fractional aligned-sequence identity (“◻”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). In the latter case, counts were based on steric incompatibility only ((7) and Figure 2), both steric incompatibility and cavity formation ($c\Delta V_{ijk}$ in (8)), or steric incompatibility with both cavity formation and hydrogen bonding ((8) and (9) and Table 1).

Narrow searches were first attempted to retrieve records for which structural data on epitope binding were available, setting the “Yes” option for the Viewer Flag (under “3D Structure of Complex”). Subsequently, the searches were conducted to separately retrieve records curated as containing either negative or positive data on cross-reactivity and were restricted such that the data fields named “Antigen Object Type” and “Antigen Epitope Relation” had values of “Linear Peptide” and “Structurally Related,” respectively, with the data field named “Qualitative Measurement” having a value of either “negative” or “positive.” In the latter case, searches were further limited to class I MHC-restricted T-cell assays for which the data field named “Epitope Structure Defines” had a value of “Exact Epitope” (rather than “epitope containing region/antigenic site”), as cross-reaction may occur with structural differences outside any relevant epitopes unless this is for a peptide confined within a class I MHC binding cleft. For each record thus retrieved, the immunogen and antigen sequences (i.e., values of the data fields named “1st Immunogen Object Primary Molecule Sequence” and “Antigen Object Primary Molecule Sequence,” resp.) were compared. The record was considered for further processing only if the said sequences were of equal length and differed at exactly one amino-acid residue position, and the record was ultimately retained only if a corresponding record with a “Qualitative Measurement” data-field value of “positive” was found sharing the same immunogen and immunoassay protocol except that the antigen was identical to the immunogen (thus confirming that the immunogen had elicited a relevant anti-peptide immune response in the first place). The epitope data thus obtained were analyzed in light of the preceding Section 2.1.
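The sequence-comparison criterion for narrow-search records (equal length, exactly one differing residue position) can be sketched as follows. This is a minimal illustration rather than the processing code actually used. The function name and return convention are hypothetical, and the example peptides are chosen only to be consistent with the SYIXSAEKI sequence context listed in Table 2.

```python
from typing import Optional, Tuple


def single_substitution(immunogen: str, antigen: str) -> Optional[Tuple[int, str, str]]:
    """Return (position, immunogen residue, antigen residue) if the two
    sequences have equal length and differ at exactly one position;
    otherwise return None. Positions are 1-based."""
    if len(immunogen) != len(antigen):
        return None
    differences = [
        (position, a, b)
        for position, (a, b) in enumerate(zip(immunogen, antigen), start=1)
        if a != b
    ]
    return differences[0] if len(differences) == 1 else None


if __name__ == "__main__":
    # One differing residue: candidate for further processing.
    print(single_substitution("SYIPSAEKI", "SYIASAEKI"))
    # Identical sequences: this is the control reaction, not a cross-reaction.
    print(single_substitution("SYIPSAEKI", "SYIPSAEKI"))
```

A record passing this check would still need the matching positive control record (same immunogen and protocol, antigen identical to immunogen) before being retained, a step that depends on database fields not modeled here.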

The broad searches were conducted to retrieve records on linear peptidic epitopes in general. Among the epitope records thus retrieved, those wherein the data field named “Modification” was nonempty (indicating amino-acid residue covalent modification) were excluded from further consideration, and the remainder were analyzed as regards functional redundancy according to the preceding Section 2.1.

3. Results and Discussion

3.1. Functional Redundancy versus Sequence. As depicted in Figure 4, computed functional redundancy increases with epitope sequence length if functional similarity is equated with fractional aligned-sequence identity (“◻”), whereas it is independent of sequence length if functional similarity is equated with the Shannon information entropy for differential epitope binding (all three entropy-based series). The latter approach thus better accounts for the possibility of a structural difference in only a single chemical group being sufficient to preclude antigenic cross-reaction, provided that the sequence under consideration is entirely encompassed by the relevant epitope structure, as structural differences may be functionally irrelevant if they lie outside the epitope. Moreover, computed functional redundancy decreases (i.e., the reduced epitope count approaches the total epitope count) when both steric incompatibility and the energetic penalty of cavity formation are considered instead of steric incompatibility alone. The redundancy decreases even further if the energetic penalty of unsatisfied hydrogen bonds (more generally representing suboptimal electrostatic complementarity) is also considered.

3.2. Empirical Cross-Reactivity Data. IEDB searches yielded 26 records on epitope cross-reactivity vis-a-vis single-residue substitution (Table 2), all of which contain only qualitative rather than quantitative binding data, without reference to atomic coordinates or other structural details of bound epitopes. The data suggest that the posited steric incompatibility (in (7) and Figure 2) assumes too little tolerance for suboptimal complementarity, especially as regards volume differences between immunogen and antigen in Figure 5, which depicts examples of cross-reaction despite substitutions increasing residue bulk (i.e., positive-data points to the left of the diagonal). This might be at least partially corrected by relaxing or even eliminating the steric-incompatibility criteria, possibly replacing them with a continuous-form shape-dependent energetic penalty (considering that they assume an infinite-step hard-sphere model of steric clashes between atoms, whereas potential energy may be more accurately represented by continuous functions of interatomic separation distances); likewise, electrostatic complementarity also might be more accurately described, for example, in terms of differential energetic contributions due to charged and uncharged hydrogen-bonding partners. However, typical epitope binding actually may be suboptimal considering thermodynamic and kinetic constraints on affinity maturation in B-cell ontogeny and the typical absence of affinity maturation in T-cell ontogeny, which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity than immunogen.

Notwithstanding the problem of suboptimal complementarity described above, data presented in Table 2 suggest that the approach to epitope sequence redundancy proposed herein is more conceptually and practically meaningful than setting threshold (i.e., cutoff) values for fractional aligned-sequence identity. This is exemplified by the control-reaction epitopes of data rows 18 and 19 (having consensus sequence LLXRDSFEV). These epitopes differ from each other at just one residue position. As typical MHC class I-restricted T-cell epitopes, they are nonapeptides, for which the highest meaningful sequence-identity threshold value is 8/9 (∼89%).


Table 2: IEDB Qualitative-Outcome (Q) data on peptide cross-reactivity assays (cf. Figure 5).

| Row | Q | Assay type | MHC class | Parameter measured (cf. IEDB “measurement of” data field) | IEDB ID, control reaction | IEDB ID, cross-reaction | IEDB Epitope ID | Substitution | Sequence context (with X at position of substitution) | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | − | B-cell | | Antibody-antigen binding | 1370146 | 1370150 | 10801 | W→Y | DXEDRYYRE | [22] |
| 2 | − | T-cell | I | Cytokine release (IFN-γ) | 1340848 | 1340854 | 62604 | P→A | SYIXSAEKI | [23] |
| 3 | − | T-cell | II | Cell proliferation | 1646734 | 1646735 | 107384 | E→D | GEPGIAGFKGXQGPK | [24] |
| 4 | − | T-cell | II | Cell proliferation | 1660246 | 1660247 | 46317 | F→A | NTWTTCQSIAXPSK | [25] |
| 5 | − | T-cell | II | Cytokine release (IL-4) | 1660252 | 1660258 | 112245 | A→F | NTWTTCQSIAXPSK | [25] |
| 6 | − | T-cell | II | Cytokine release (IL-2) | 1667060 | 1667065 | 68846 | K→A | VHFFXNIVTPRTP | [26] |
| 7 | − | T-cell | II | Cell proliferation | 1667112 | 1667113 | 80071 | A→K | VHFFXNIVTPRTP | [26] |
| 8 | − | T-cell | II | Cell proliferation | 1716604 | 1716614 | 125569 | T→V | SWEGVGVXPDV | [27] |
| 9 | − | T-cell | II | Cell proliferation | 1716604 | 1716616 | 125569 | D→N | SWEGVGVTPXV | [27] |
| 10 | − | T-cell | II | Cell proliferation | 1716605 | 1716618 | 125571 | V→T | SWEGVGVXPNV | [27] |
| 11 | + | T-cell | I | Cytotoxicity | 1634784 | 1634785 | 103300 | S→A | IYXTVAGSL | [28] |
| 12 | + | T-cell | I | Cytotoxicity | 1634784 | 1634786 | 103300 | G→S | IYSTVAXSL | [28] |
| 13 | + | T-cell | I | Cytotoxicity | 1634788 | 1634789 | 103298 | S→G | IYATVAXSL | [28] |
| 14 | + | T-cell | I | Cytotoxicity | 1634788 | 1634790 | 103298 | A→S | IYXTVASSL | [28] |
| 15 | + | T-cell | I | Cytotoxicity | 1657362 | 1657364 | 111964 | I→D | YXFAFRDL | [29] |
| 16 | + | T-cell | I | Cytokine release (IFN-γ) | 1716442 | 1716443 | 125101 | I→T | XMIKFNRL | [30] |
| 17 | + | T-cell | I | Cytokine release (IFN-γ) | 1340848 | 1340852 | 62604 | K→D | SYIPSAEXI | [23] |
| 18 | + | T-cell | I | Cytokine release (IFN-γ) | 1381750 | 1381748 | 37174 | D→G | LLXRDSFEV | [31] |
| 19 | + | T-cell | I | Cytokine release (IFN-γ) | 1381752 | 1381753 | 37392 | H→G | LLXRDSFEV | [31] |
| 20 | + | T-cell | I | Cytotoxicity | 1404705 | 1404714 | 61086 | S→I | SXIEFARL | [32] |
| 21 | + | T-cell | I | Cytotoxicity | 1404705 | 1404715 | 61086 | A→E | SSIEFXRL | [32] |
| 22 | + | T-cell | I | Cytotoxicity | 1404705 | 1404716 | 61086 | R→K | SSIEFAXL | [32] |
| 23 | + | T-cell | I | Cytotoxicity | 1404712 | 1404719 | 61088 | E→A | SSIEFXRL | [32] |
| 24 | + | T-cell | I | Cytotoxicity | 1404713 | 1404721 | 61085 | K→R | SSIEFAXL | [32] |
| 25 | + | T-cell | I | Cytokine release (IFN-γ) | 1865529 | 1865530 | 98995 | T→F | SAPDXRPA | [33] |
| 26 | + | T-cell | I | Cytokine release (IFN-γ) | 1865529 | 1865532 | 98995 | A→L | SAPDTRPX | [33] |


Figure 5: Peptide cross-reactivity assay data of Table 2 vis-a-vis amino-acid residue volume [21] of immunogen and antigen at single-residue substitution positions along sequence alignments. Horizontal axis: immunogen residue volume (Å³); vertical axis: antigen residue volume (Å³). Legend: reference; negative B-cell; negative T-cell, class I; negative T-cell, class II; positive T-cell, class I. Reference points are labeled by one-letter residue codes and data points by Table 2 row numbers.

For longer epitopes (e.g., typical MHC class II-restricted T-cell epitopes), the value is even higher. For example, in the case of data rows 4 and 5 (having consensus sequence NTWTTCQSIAXPSK), the said value is 13/14 (∼93%). The threshold values thus calculated are markedly higher than those used (e.g., 70% [20]) for conventional protein sequence analysis. Moreover, regardless of the exact sequence lengths, applying such threshold values would entail discarding data on epitopes differing by one residue (e.g., the examples just cited from Table 2). To avoid such loss of potentially valuable information, all the available epitope data might be analyzed and a corresponding reduced epitope count reported to indicate the level of redundancy in the underlying dataset.
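The threshold arithmetic above follows from the fact that two distinct equal-length sequences can share at most all but one aligned residue. A minimal sketch is given below. The function name is hypothetical, and the 70% figure is simply the conventional cutoff cited in the text.

```python
from fractions import Fraction


def highest_meaningful_identity(length: int) -> Fraction:
    """Largest fractional identity attainable by two distinct aligned
    sequences of the given length, i.e., (length - 1) / length."""
    if length < 2:
        raise ValueError("length must be at least 2")
    return Fraction(length - 1, length)


if __name__ == "__main__":
    conventional_cutoff = Fraction(70, 100)  # cutoff cited in the text
    for length in (9, 14):
        identity = highest_meaningful_identity(length)
        # Peptides differing by one residue exceed the conventional cutoff,
        # so one member of each such pair would be discarded as redundant.
        print(length, identity, float(identity), identity > conventional_cutoff)
```

For nonapeptides this gives 8/9 and for 14-residue peptides 13/14, matching the values quoted for the class I and class II examples.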

3.3. Survey of Peptidic Epitope Sequence Data. IEDB searches yielded 11,497 records on peptidic epitopes of length less than 50 residues (i.e., the general cutoff provided in the IEDB curation manual [39]) curated as covalently unmodified and comprising 4,443 B-cell and 7,054 T-cell epitope records (the latter divided between 2,749 and 4,305 records for MHC restriction classes I and II, resp.), for which computed functional redundancy vis-a-vis sequence length is depicted in Figure 6.

Considering the wide range of sequence counts for the various sequence lengths in Figure 6, a logarithmic scale was chosen to depict the data in a compact form. Consequently, the difference calculated as sequence count − reduced count was plotted instead of reduced count itself where sequence count > reduced count, to avoid crowding of data points. For all positive sequence counts (“◻”), the said differences were consistently higher where functional similarity was equated with fractional aligned-sequence identity (“×”) rather than the Shannon information entropy of the differential epitope binding. In the latter case, the differences were consistently less than 1, with reduced count close or even equal to sequence count. These results are anticipated in view of Figure 4, which shows that reduced count is independent of sequence length using the entropy-based approach proposed herein but decreases with sequence length using fractional aligned-sequence identity.

The data in Figures 4 and 6 thus point to key considerations in evaluating functional redundancy. Clearly, functional redundancy may be overestimated by simply equating functional similarity with fractional aligned-sequence identity, especially with increasing sequence length for highly similar sequences. On the other hand, functional redundancy may be underestimated where functional similarity is equated with the Shannon information entropy of differential epitope binding if the sequence under consideration extends beyond the relevant epitope structure, which becomes more likely with increasing sequence length. Hence, the information-entropy approach as developed herein is conceivably applicable to sequences not exceeding a realistic application-dependent epitope length (e.g., six residues for antigen binding by anti-peptide antibodies [40]), although rigorous generalization to longer sequences might be pursued by aligning subsequences of such length and possibly accounting for differential immunodominance among immunogen epitopes [41]. For example, if a set of peptides were to be found such that each of them comprised a single immunodominant epitope having fixed length (e.g., six residues in all cases), the information-entropy approach might be applied to the entire set even if the peptides were of variable length. Moreover, even if the said immunodominant epitopes were highly similar to one another in terms of fractional aligned-sequence identity, all the available data still could be included in subsequent analyses, provided that both the total and reduced epitope counts would be reported in order to account for functional redundancy. In this way, useful epitope data could be retained instead of discarded.
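One simple way to obtain fixed-length subsequences from longer peptides, as a possible first step toward the generalization mentioned above, is to enumerate all contiguous windows of the chosen epitope length. The sketch below is only an illustration under that assumption. The function name and default window length of six residues are hypothetical choices, and the selection of the immunodominant window is not addressed.

```python
from typing import List


def fixed_length_windows(sequence: str, length: int = 6) -> List[str]:
    """Return every contiguous subsequence of the given length, in order
    of starting position; an empty list if the sequence is too short."""
    if length < 1:
        raise ValueError("length must be positive")
    return [sequence[start:start + length]
            for start in range(len(sequence) - length + 1)]


if __name__ == "__main__":
    # A nonapeptide yields four overlapping six-residue windows.
    for window in fixed_length_windows("SYIPSAEKI"):
        print(window)
```

Pairwise comparison could then be restricted to windows of equal length, so that each compared sequence stays within a realistic epitope length.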

3.4. Applications and Future Directions. The discussion thus far suggests the potentially greater utility of antigenic functional similarity cast as the Shannon information entropy of differential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redundancy of epitope data to express their structural diversity in a biologically meaningful manner (i.e., explicitly in terms of relative binding affinity, which underlies the balance between the inversely related emergent continuum phenomena of immunologic specificity and cross-reactivity). This is subject to the caveat that assuming an idealized epitope-binding site of optimal steric and electrostatic complementarity (e.g., comparable to what is typically observed in a natively folded wild-type protein) may both overestimate affinity for an immunogen epitope and underestimate affinity for antigen epitopes structurally different from the immunogen epitope (thereby possibly overestimating immunologic specificity as a consequence of underestimating potential for cross-reactivity). These considerations could guide the development of immunization strategies (e.g., via vaccination) and immunodiagnostics (e.g., to produce peptidic constructs suitable for eliciting anti-peptide antibodies that either protect against disease or serve to detect particular antigens in biological samples via immunoassays), noting that the practical significance of immunologic specificity and cross-reactivity is highly context-dependent.

Figure 6: Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c). In each panel, the horizontal axis is sequence length and the vertical axis (logarithmic scale) is sequence count or sequence count − reduced count. Sequence counts (“◻”) are total epitope counts in IEDB. Corresponding reduced counts ($r$ in (1)) are expressed as the difference (sequence count − reduced count), with functional similarity equated with either fractional aligned-sequence identity (“×”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). A missing “◻” indicates sequence count = 0 (e.g., for B-cell epitope sequence length < 4 in (a)); where “◻” is present, a missing “×” or entropy-based marker indicates sequence count − reduced count = 0 (e.g., with the entropy-based marker missing for B-cell epitope sequence length = 5 in (a)). Both difference markers are missing where sequence count = 1 (e.g., for B-cell epitope sequence length = 43 in (a)), as reduced count = 1 for sequence count = 1.

From the perspective of immunization, immunity (e.g., induced via vaccination or passive transfer of immune-system components) may be useful if specifically directed against a pathogen epitope so as to favor pathogen elimination without producing harmful hypersensitivity reactions (e.g., in the form of allergy or autoimmunity). Yet such immunity might be even more useful if it also favored elimination of multiple antigenically related variant pathogens via cross-reaction (such that a single vaccine epitope elicited the production of immune-system components each cross-reactive with multiple variant pathogen epitopes) while still avoiding hypersensitivity. Cross-reactivity of immune responses is thus potentially useful within limits defined by risk of harmful hypersensitivity (e.g., due to cross-reaction with host self-antigens). As regards immunodiagnosis, an immunodiagnostic test may be useful if it enables specific pathogen detection via discrimination between a pathogen epitope and a set of other biologically relevant epitopes (e.g., of the host or its commensal symbionts in health, or of other pathogens for which the clinical implications are very different). Yet the test might be even more useful if it also enabled detection of multiple antigenically related variant pathogens (such that a single immunologic probe detected multiple variant pathogen epitopes) while still retaining its discriminatory power with respect to other biologically relevant epitopes, particularly where the variant pathogens are very similar to one another in terms of their clinical implications (e.g., prognosis and therapeutic options). Hence, cross-reactivity strongly influences the outcomes of both immunization and immunodiagnosis, for which reason their development could be supported by computational tools that estimate epitope-binding affinity for cross-reactions.

In order to estimate the epitope-binding affinity of an immune-system component (e.g., antibody) elicited in response to (and thus having a binding site for) immunogen epitope $i$ for antigen epitope $j$, one possible approach would be to express affinity in terms of association constants, for example, as

$$K_{ij} = K_{ii} Z_{ij}, \tag{10}$$

where $K_{ij}$ is the association constant for cross-reaction with antigen epitope $j$, $K_{ii}$ is the association constant for reaction with immunogen epitope $i$, and $Z_{ij}$ is a correction term to account for the difference between the two association constants. In turn, $K_{ii}$ might be estimated as

$$K_{ii} = \exp\left(-\frac{\Delta G_{ii}}{RT}\right), \tag{11}$$

where $\Delta G_{ii}$ is the free-energy change for reaction with immunogen epitope $i$, while both $R$ and $T$ retain their meanings as in (7). In cases where epitopes $i$ and $j$ are both $m$ residues in length, $Z_{ij}$ might be estimated as

$$Z_{ij} = \begin{cases} \prod_{k=1}^{m} s_{ijk}, & \text{if } \prod_{k=1}^{m} s_{ijk} > K_{ii}^{-1}, \\ K_{ii}^{-1}, & \text{otherwise}, \end{cases} \tag{12}$$

where the product retains its meaning as in (6), such that $K_{ij} \geq 1$ (with $K_{ij} = 1$ for nonspecific binding). $\Delta G_{ii}$ could be estimated from epitope structure (e.g., using structural energetics [40–42]) subject to mechanism-specific constraints (e.g., on B-cell affinity maturation [43]), and $s_{ijk}$ could be estimated along similar lines with regard to $\Delta\Delta G_{ijk}$ in (7).

The preceding discussion has alluded to epitope binding as if it were a binary interaction, which is strictly true only for B-cell epitopes (e.g., bound by surface immunoglobulin or antibody); but in the context of T-cell epitopes, the epitope-binding site is to be understood as a bipartite entity consisting of an MHC molecule and a T-cell receptor (TCR), such that the epitope becomes at least partly confined between the two binding-site components. The overall process of T-cell recognition is subject to thermodynamic and kinetic constraints on initial MHC-peptide binding [44] and subsequent TCR recognition of MHC-peptide complex [45], which potentially complicate prediction of T-cell cross-reactivity and functional consequences thereof (e.g., considering how binding affinity and receptor numbers interact to influence T-cell effector function [45]). Still, predicting B-cell epitope cross-reactivity is challenging in view of the greater structural diversity of B-cell epitopes, especially where only a fraction of the epitope surface contributes to the epitope-paratope interface, such that cross-reaction may occur with variations in epitope structure beyond the interface. Hence, although the analysis herein for continuous epitopes might be generalized to discontinuous epitopes, with indexing of residues by relative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignment gaps, correlating variations in epitope-residue structure with potential for cross-reactivity would be nontrivial.
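The affinity estimate of (10) through (12) can be sketched numerically as follows. This is a minimal illustration, not an implementation from the present work. The function names are hypothetical, the free-energy value of −8 kcal/mol and the per-residue similarity values are made-up inputs, and energies are assumed to be in kcal/mol at 310.15 K.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def self_association_constant(delta_g_ii: float, temperature: float = 310.15) -> float:
    """K_ii = exp(-dG_ii / RT), following (11)."""
    return math.exp(-delta_g_ii / (R_KCAL * temperature))


def cross_association_constant(delta_g_ii: float,
                               s_values: Sequence[float],
                               temperature: float = 310.15) -> float:
    """K_ij = K_ii * Z_ij, following (10) and (12).

    Z_ij is the product of the per-residue terms s_ijk when that product
    exceeds 1 / K_ii, and 1 / K_ii otherwise, so that K_ij >= 1.
    """
    k_ii = self_association_constant(delta_g_ii, temperature)
    product = math.prod(s_values)
    z_ij = product if product > 1.0 / k_ii else 1.0 / k_ii
    return k_ii * z_ij


if __name__ == "__main__":
    delta_g = -8.0  # hypothetical free-energy change, kcal/mol
    print(self_association_constant(delta_g))
    # One residue position halves the affinity; the rest are unchanged.
    print(cross_association_constant(delta_g, [1.0] * 8 + [0.5]))
    # A fully incompatible position drives K_ij to the floor of 1.
    print(cross_association_constant(delta_g, [1.0] * 8 + [0.0]))
```

The floor at $K_{ii}^{-1}$ in (12) is what keeps the estimate from falling below the value assigned to nonspecific binding, however small the product of per-residue terms becomes.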

The analysis developed herein conceivably could be refined to better account for MHC-peptide binding as distinct from subsequent binding of MHC-peptide complex by TCR, cognizant of MHC polymorphism. Hence, MHC-peptide binding could be analyzed with explicit consideration of pertinent MHC structural details (e.g., noting that unfavorable peptide backbone conformations may preclude accommodation of peptide within the MHC binding cleft and emphasizing the potential affinity contributions of prospective anchor residues on peptides vis-a-vis corresponding MHC binding pockets). Subsequent binding of MHC-peptide complex by TCR could be analyzed separately, with emphasis on peptide surfaces accessible for contact by TCR (e.g., excluding those buried within MHC binding pockets). This could be more readily applied to class I rather than class II MHC molecules, as the latter typically have binding clefts that allow for variable-length peptide overhangs. The analysis for MHC-peptide binding might be extended for class II via docking simulations to predict the MHC-bound conformations and affinity contributions of the overhangs, after which the binding of MHC-peptide complex by TCR could be analyzed with inclusion of any overhang surfaces that are likely to be accessible for contact by TCR.

At a more fundamental level, apparently suboptimal complementarity (manifesting as submaximal immunologic specificity) may be an inherent tradeoff in attaining optimal immune function from the standpoint of biological fitness (e.g., enabling each immune-system component to recognize a variety of structurally distinct epitopes, albeit at the cost of binding each of these epitopes with only submaximal affinity). Investigation of this possibility may lead to insights on how immune responses might be optimally biased (e.g., via vaccination or immunotherapy). Moreover, the asymmetry of cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenic differences in terms of antigenic distance as a metric that is independent of immunization history.

4. Summary and Conclusions

In assembling epitope datasets (e.g., to develop and benchmark epitope-prediction tools), sequence redundancy among epitopes must be considered to avoid misleading results that reflect overrepresentation of functionally similar epitopes. However, potentially useful data may be needlessly discarded by excluding epitopes that share an apparently high degree of sequence similarity, as even only a single-residue difference may manifest as extreme functional dissimilarity between epitopes. The present work thus introduces a reduced epitope count, as defined using (1) and (2), to account for functional redundancy of epitope data (FRED) within an epitope dataset (such that FRED is quantified as the difference between the total and reduced epitope counts). This can be used, for example, to characterize the dataset instead of excluding epitopes with apparently similar sequences from it, thereby maximizing the use of available epitope data.

More importantly, the present work frames FRED in terms of potential antigenic cross-reactivity (rather than sequence distance) construed as functional similarity between epitopes. Hence, pairwise comparison of epitopes is performed such that each epitope pair considered is regarded as consisting of an immunogen epitope (which elicits the immune response of interest) and an antigen epitope (which is the target of cross-reactive binding in an immunoassay to evaluate the immune response). Functional similarity between epitopes is thus defined in a physicochemically and biologically meaningful way as the Shannon information entropy for differential epitope binding in (5), which is compatible with the possibility of extreme functional dissimilarity between epitopes due to seemingly minor differences in sequence. Consequently, the complement of functional similarity is not a distance (in the sense of a unique value quantifying the separation between two points in sequence space), as its value may vary with reversal of the roles (i.e., immunogen or antigen) assigned to epitopes under pairwise comparison. However, it is nonetheless a measure of epitope dissimilarity, such that summation of its values over all epitopes of a dataset according to (2) enables calculation of an averaged contribution of each epitope to the reduced epitope count. This circumvents the conventional dichotomous labeling of epitopes as either redundant or nonredundant based on an arbitrarily selected threshold value of sequence similarity. Such labeling is problematic in that selection of the actual threshold value is difficult to justify on physicochemically and biologically meaningful grounds and because an epitope thus might be labeled as redundant on the basis of similarity to a minority of other epitopes in the dataset.

In line with the preceding considerations, epitope functional similarity may be estimated in terms of differences between immunogen- and antigen-epitope structure relative to an idealized binding site of high complementarity to the immunogen epitope. However, this tends to underestimate potential for cross-reactivity, which suggests that epitope-binding site complementarity is typically suboptimal. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes. Such cross-reactivity conceivably could be exploited in the development of practical applications such as vaccines and immunodiagnostics (e.g., to aim for beneficially broad cross-reactivity rather than overly extreme immunologic specificity).

Competing Interests

The author declares that they have no competing interests.

Acknowledgments

This work was supported by an Angelita T. Reyes Centennial Professorial Chair grant awarded to the author.

References

[1] S. E. C. Caoili, “Antidotes, antibody-mediated immunity and the future of pharmaceutical product development,” Human Vaccines and Immunotherapeutics, vol. 9, no. 2, pp. 294–299, 2013.

[2] S. E. C. Caoili, “Beyond new chemical entities: advancing drug development based on functional versatility of antibodies,” Human Vaccines and Immunotherapeutics, vol. 10, no. 6, pp. 1639–1644, 2014.

[3] B. Greenwood, “The contribution of vaccination to global health: past, present and future,” Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 369, no. 1645, Article ID 20130433, 2014.

[4] M. H. V. Van Regenmortel, “What is a B-cell epitope?” Methods in Molecular Biology, vol. 524, pp. 3–20, 2009.

[5] M. H. V. Regenmortel, “Specificity, polyspecificity, and heterospecificity of antibody-antigen recognition,” Journal of Molecular Recognition, vol. 27, no. 11, pp. 627–639, 2014.

[6] N. Tomar and R. K. De, “Immunoinformatics: a brief review,” Methods in Molecular Biology, vol. 1184, pp. 23–55, 2014.

[7] S. E. C. Caoili, “An integrative structure-based framework for predicting biological effects mediated by antipeptide antibodies,” Journal of Immunological Methods, vol. 427, pp. 19–29, 2015.

[8] S. E. C. Caoili, “Immunization with peptide-protein conjugates: impact on benchmarking B-cell epitope prediction for vaccine design,” Protein and Peptide Letters, vol. 17, no. 3, pp. 386–398, 2010.

[9] S. E. C. Caoili, “Hybrid methods for B-cell epitope prediction,” Methods in Molecular Biology, vol. 1184, pp. 245–283, 2014.

[10] S. E. C. Caoili, “Benchmarking B-cell epitope prediction with quantitative dose-response data on antipeptide antibodies: towards novel pharmaceutical product development,” BioMed Research International, vol. 2014, Article ID 867905, 13 pages, 2014.

[11] U. Hobohm, M. Scharf, R. Schneider, and C. Sander, “Selection of representative protein data sets,” Protein Science, vol. 1, no. 3, pp. 409–417, 1992.

[12] U. Lessel and D. Schomburg, “Creation and characterization of a new non-redundant fragment data bank,” Protein Engineering, vol. 10, no. 6, pp. 659–664, 1997.

[13] T. Noguchi and Y. Akiyama, “PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003,” Nucleic Acids Research, vol. 31, no. 1, pp. 492–493, 2003.

[14] P. Motte, G. Alberici, M. Ait-Abdellah, and D. Bellet, “Monoclonal antibodies distinguish synthetic peptides that differ in one chemical group,” The Journal of Immunology, vol. 138, no. 10, pp. 3332–3338, 1987.

[15] M. K. Gilson and S. E. Radford, “Protein folding and binding: from biology to physics and back again,” Current Opinion in Structural Biology, vol. 21, no. 1, pp. 1–3, 2011.

[16] B. Rost, “Twilight zone of protein sequence alignments,” Protein Engineering, vol. 12, no. 2, pp. 85–94, 1999.

[17] B. Y. Khor, G. J. Tye, T. S. Lim, and Y. S. Choong, “General overview on structure prediction of twilight-zone proteins,” Theoretical Biology and Medical Modelling, vol. 12, article 15, 2015.

[18] H. Nakamura, “Roles of electrostatic interaction in proteins,” Quarterly Reviews of Biophysics, vol. 29, no. 1, pp. 1–90, 1996.

[19] P. J. Fleming and G. D. Rose, “Do all backbone polar groups in proteins form hydrogen bonds?” Protein Science, vol. 14, no. 7, pp. 1911–1917, 2005.

[20] S. Basu, D. Bhattacharyya, and R. Banerjee, “Self-complementarity within proteins: bridging the gap between binding and folding,” Biophysical Journal, vol. 102, no. 11, pp. 2605–2614, 2012.

[21] Y. Harpaz, M. Gerstein, and C. Chothia, “Volume changes on protein folding,” Structure, vol. 2, no. 7, pp. 641–649, 1994.

[22] A. Handisurya, S. Gilch, D. Winter, et al., “Vaccination with prion peptide-displaying papillomavirus-like particles induces autoantibodies to normal prion protein that interfere with pathologic prion protein production in infected cells,” FEBS Journal, vol. 274, no. 7, pp. 1747–1758, 2007.

[23] M. Plebanski, E. A. M. Lee, C. M. Hannan, et al., “Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming,” Nature Medicine, vol. 5, no. 5, pp. 565–571, 1999.

[24] E. Michaelsson, M. Andersson, A. Engstrom, and R. Holmdahl, “Identification of an immunodominant type-II collagen peptide recognized by T cells in H-2q mice: self tolerance at the level of determinant selection,” European Journal of Immunology, vol. 22, no. 7, pp. 1819–1825, 1992.

[25] J. M. Greer, C. Klinguer, E. Trifilieff, R. A. Sobel, and M. B. Lees, “Encephalitogenicity of murine but not bovine DM20 in SJL mice is due to a single amino acid difference in the immunodominant encephalitogenic epitope,” Neurochemical Research, vol. 22, no. 4, pp. 541–547, 1997.

[26] A. Gaur, S. A. Boehme, D. Chalmers, et al., “Amelioration of relapsing experimental autoimmune encephalomyelitis with altered myelin basic protein peptides involves different cellular mechanisms,” Journal of Neuroimmunology, vol. 74, no. 1-2, pp. 149–158, 1997.

[27] A. T. Kozhich, Y.-I. Kawano, C. E. Egwuagu, et al., “A pathogenic autoimmune process targeted at a surrogate epitope,” The Journal of Experimental Medicine, vol. 180, no. 1, pp. 133–140, 1994.

[28] H. Masaki, M. Tamura, and I. Kurane, “Induction of cytotoxic T lymphocytes of heterogeneous specificities by immunization with a single peptide derived from influenza A virus,” Viral Immunology, vol. 13, no. 1, pp. 73–81, 2000.

[29] G. B. Lipford, S. Bauer, H. Wagner, and K. Heeg, “Peptide engineering allows cytotoxic T-cell vaccination against human papilloma virus tumour antigen E6,” Immunology, vol. 84, no. 2, pp. 298–303, 1995.

[30] C. L. Keech, K. C. Pang, J. McCluskey, and W. Chen, “Direct antigen presentation by DC shapes the functional CD8+ T-cell repertoire against the nuclear self-antigen La-SSB,” European Journal of Immunology, vol. 40, no. 2, pp. 330–338, 2010.

[31] S. Tangri, G. Y. Ishioka, X. Huang, et al., “Structural features of peptide analogs of human histocompatibility leukocyte antigen class I epitopes that are more potent and immunogenic than wild-type peptide,” Journal of Experimental Medicine, vol. 194, no. 6, pp. 833–846, 2001.

[32] S. J. Turner and F. R. Carbone, “A dominant Vβ bias in the CTL response after HSV-1 infection is determined by peptide residues predicted to also interact with the TCR β-chain CDR3,” Molecular Immunology, vol. 35, no. 5, pp. 307–316, 1998.

[33] E. Lazoura, J. Lodding, W. Farrugia, et al., “Enhanced major histocompatibility complex class I binding and immune responses through anchor modification of the non-canonical tumour-associated mucin 1-8 peptide,” Immunology, vol. 119, no. 3, pp. 306–316, 2006.

[34] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.

[35] E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review, vol. 106, no. 4, pp. 620–630, 1957.

[36] E. T. Jaynes, “Information theory and statistical mechanics. II,” Physical Review, vol. 108, no. 2, pp. 171–190, 1957.

[37] A. E. Eriksson, W. A. Baase, X.-J. Zhang, et al., “Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect,” Science, vol. 255, no. 5041, pp. 178–183, 1992.

[38] B. Honig and A.-S. Yang, “Free energy balance in protein folding,” Advances in Protein Chemistry, vol. 46, pp. 27–58, 1995.

[39] R. Vita, J. A. Overton, J. A. Greenbaum, et al., “The immune epitope database (IEDB) 3.0,” Nucleic Acids Research, vol. 43, no. 1, pp. D405–D412, 2015.

[40] S. E. C. Caoili, “A structural-energetic basis for B-cell epitope prediction,” Protein and Peptide Letters, vol. 13, no. 7, pp. 743–751, 2006.

[41] S. E. C. Caoili, “Benchmarking B-cell epitope prediction for the design of peptide-based vaccines: problems and prospects,” Journal of Biomedicine and Biotechnology, vol. 2010, Article ID 910524, 14 pages, 2010.

[42] S. P. Edgcomb and K. P. Murphy, “Structural energetics of protein folding and binding,” Current Opinion in Biotechnology, vol. 11, no. 1, pp. 62–66, 2000.

[43] S. E. C. Caoili, “On the meaning of affinity limits in B-cell epitope prediction for antipeptide antibody-mediated immunity,” Advances in Bioinformatics, vol. 2012, Article ID 346765, 17 pages, 2012.

[44] N. Dalchau, A. Phillips, L. D. Goldstein, et al., “A peptide filtering relation quantifies MHC class I peptide optimization,” PLoS Computational Biology, vol. 7, no. 10, Article ID e1002144, 2011.

[45] L. J. Edwards and B. D. Evavold, “T cell recognition of weak ligands: roles of signaling, receptor number, and affinity,” Immunologic Research, vol. 50, no. 1, pp. 39–48, 2011.

[Figure 4 plot: reduced epitope count (0 to 20) versus sequence length (0 to 20).]

Figure 4: Reduced epitope counts (1) for peptidic sequences of uniform length varying only at a single-residue position. Each count corresponds to a set of 20 peptides representing every standard proteinogenic amino-acid residue at the variable-residue position. Functional similarity is equated with either fractional aligned-sequence identity (“◻”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). In the latter case, counts were based on steric incompatibility only ((7) and Figure 2), both steric incompatibility and cavity formation ($c\,\Delta V_{ijk}$ in (8)), or steric incompatibility with both cavity formation and hydrogen bonding ((8) and (9) and Table 1).

and antigen sequences (i.e., values of the data fields named “1st Immunogen Object Primary Molecule Sequence” and “Antigen Object Primary Molecule Sequence,” resp.) were compared. The record was considered for further processing only if the said sequences were of equal length and differed at exactly one amino-acid residue position, and the record was ultimately retained only if a corresponding record with a “Qualitative Measurement” data-field value of “positive” was found sharing the same immunogen and immunoassay protocol except that the antigen was identical to the immunogen (thus confirming that the immunogen had elicited a relevant anti-peptide immune response in the first place). The epitope data thus obtained were analyzed in light of the preceding Section 2.1.
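The record-selection criteria just described can be sketched as follows. This is a minimal illustration rather than the procedure actually used: records are assumed to be dictionaries keyed by the IEDB data-field names quoted above, and the `protocol` key is a hypothetical stand-in for whatever identifies the immunoassay protocol in an exported record.

```python
# Data-field names as quoted in the text. The "protocol" key is hypothetical:
# it stands in for whatever identifies the immunoassay protocol in a record.
IMMUNOGEN = "1st Immunogen Object Primary Molecule Sequence"
ANTIGEN = "Antigen Object Primary Molecule Sequence"
OUTCOME = "Qualitative Measurement"


def single_substitution_index(immunogen, antigen):
    """Return the 0-based index of the sole differing position, else None."""
    if len(immunogen) != len(antigen):
        return None
    diffs = [k for k, (a, b) in enumerate(zip(immunogen, antigen)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def retain_cross_reaction_records(records, protocol_key="protocol"):
    """Keep single-substitution records backed by a positive control record."""
    # A control record has antigen identical to immunogen and a positive outcome.
    controls = {
        (rec[IMMUNOGEN], rec[protocol_key])
        for rec in records
        if rec[ANTIGEN] == rec[IMMUNOGEN]
        and rec[OUTCOME].strip().lower() == "positive"
    }
    retained = []
    for rec in records:
        index = single_substitution_index(rec[IMMUNOGEN], rec[ANTIGEN])
        if index is not None and (rec[IMMUNOGEN], rec[protocol_key]) in controls:
            retained.append((rec, index))
    return retained
```

For a pair such as SYIPSAEKI and SYIASAEKI, `single_substitution_index` returns 3 (the fourth residue), and the cross-reaction record is retained only if a positive control record with the same immunogen and protocol is present.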

The broad searches were conducted to retrieve records on linear peptidic epitopes in general. Among the epitope records thus retrieved, those wherein the data field named “Modification” was nonempty (indicating amino-acid residue covalent modification) were excluded from further consideration, and the remainder were analyzed as regards functional redundancy according to the preceding Section 2.1.

3. Results and Discussion

3.1. Functional Redundancy versus Sequence. As depicted in Figure 4, computed functional redundancy increases with epitope sequence length if functional similarity is equated with fractional aligned-sequence identity (“◻”), whereas it is independent of sequence length if functional similarity is equated with the Shannon information entropy for differential epitope binding. The latter approach thus better accounts for the possibility of a structural difference in only a single chemical group being sufficient to preclude antigenic cross-reaction, provided that the sequence under consideration is entirely encompassed by the relevant epitope structure, as structural differences may be functionally irrelevant if they lie outside the epitope. Moreover, computed functional redundancy decreases (i.e., the reduced epitope count approaches the total epitope count) when both steric incompatibility and the energetic penalty of cavity formation are considered instead of steric incompatibility alone. The redundancy decreases even further if the energetic penalty of unsatisfied hydrogen bonds (more generally representing suboptimal electrostatic complementarity) is also considered.

3.2. Empirical Cross-Reactivity Data. IEDB searches yielded 26 records on epitope cross-reactivity vis-à-vis single-residue substitution (Table 2), all of which contain only qualitative rather than quantitative binding data, without reference to atomic coordinates or other structural details of bound epitopes. The data suggest that the posited steric incompatibility (in (7) and Figure 2) assumes too little tolerance for suboptimal complementarity, especially as regards volume differences between immunogen and antigen in Figure 5, which depicts examples of cross-reaction despite substitutions increasing residue bulk (i.e., positive-data points to the left of the diagonal). This might be at least partially corrected by relaxing or even eliminating the steric-incompatibility criteria, possibly replacing them with a continuous-form shape-dependent energetic penalty (considering that they assume an infinite-step hard-sphere model of steric clashes between atoms, whereas potential energy may be more accurately represented by continuous functions of interatomic separation distances); likewise, electrostatic complementarity also might be more accurately described, for example, in terms of differential energetic contributions due to charged and uncharged hydrogen-bonding partners. However, typical epitope binding actually may be suboptimal, considering thermodynamic and kinetic constraints on affinity maturation in B-cell ontogeny and the typical absence of affinity maturation in T-cell ontogeny, which may underlie heteroclitic cross-reaction wherein antigen is bound with higher affinity than immunogen.

Notwithstanding the problem of suboptimal complementarity described above, data presented in Table 2 suggest that the approach to epitope sequence redundancy proposed herein is more conceptually and practically meaningful than setting threshold (i.e., cutoff) values for fractional aligned-sequence identity. This is exemplified by the control-reaction epitopes of data rows 18 and 19 (having consensus sequence LLXRDSFEV). These epitopes differ from each other at just one residue position. As typical MHC class I-restricted T-cell epitopes, they are nonapeptides, for which the highest meaningful sequence-identity threshold value is 8/9 (~89%).

Table 2: IEDB Qualitative-Outcome (Q) data on peptide cross-reactivity assays (cf. Figure 5).

| Row | Q | Type | MHC class | Parameter measured (cf. IEDB “measurement of” data field) | Assay IEDB ID (control reaction) | Assay IEDB ID (cross-reaction) | IEDB Epitope ID | Substitution | Sequence context (with X at position of substitution) | Ref. |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | − | B-cell | | Antibody-antigen binding | 1370146 | 1370150 | 10801 | W→Y | DXEDRYYRE | [22] |
| 2 | − | T-cell | I | Cytokine release (IFN-γ) | 1340848 | 1340854 | 62604 | P→A | SYIXSAEKI | [23] |
| 3 | − | T-cell | II | Cell proliferation | 1646734 | 1646735 | 107384 | E→D | GEPGIAGFKGXQGPK | [24] |
| 4 | − | T-cell | II | Cell proliferation | 1660246 | 1660247 | 46317 | F→A | NTWTTCQSIAXPSK | [25] |
| 5 | − | T-cell | II | Cytokine release (IL-4) | 1660252 | 1660258 | 112245 | A→F | NTWTTCQSIAXPSK | [25] |
| 6 | − | T-cell | II | Cytokine release (IL-2) | 1667060 | 1667065 | 68846 | K→A | VHFFXNIVTPRTP | [26] |
| 7 | − | T-cell | II | Cell proliferation | 1667112 | 1667113 | 80071 | A→K | VHFFXNIVTPRTP | [26] |
| 8 | − | T-cell | II | Cell proliferation | 1716604 | 1716614 | 125569 | T→V | SWEGVGVXPDV | [27] |
| 9 | − | T-cell | II | Cell proliferation | 1716604 | 1716616 | 125569 | D→N | SWEGVGVTPXV | [27] |
| 10 | − | T-cell | II | Cell proliferation | 1716605 | 1716618 | 125571 | V→T | SWEGVGVXPNV | [27] |
| 11 | + | T-cell | I | Cytotoxicity | 1634784 | 1634785 | 103300 | S→A | IYXTVAGSL | [28] |
| 12 | + | T-cell | I | Cytotoxicity | 1634784 | 1634786 | 103300 | G→S | IYSTVAXSL | [28] |
| 13 | + | T-cell | I | Cytotoxicity | 1634788 | 1634789 | 103298 | S→G | IYATVAXSL | [28] |
| 14 | + | T-cell | I | Cytotoxicity | 1634788 | 1634790 | 103298 | A→S | IYXTVASSL | [28] |
| 15 | + | T-cell | I | Cytotoxicity | 1657362 | 1657364 | 111964 | I→D | YXFAFRDL | [29] |
| 16 | + | T-cell | I | Cytokine release (IFN-γ) | 1716442 | 1716443 | 125101 | I→T | XMIKFNRL | [30] |
| 17 | + | T-cell | I | Cytokine release (IFN-γ) | 1340848 | 1340852 | 62604 | K→D | SYIPSAEXI | [23] |
| 18 | + | T-cell | I | Cytokine release (IFN-γ) | 1381750 | 1381748 | 37174 | D→G | LLXRDSFEV | [31] |
| 19 | + | T-cell | I | Cytokine release (IFN-γ) | 1381752 | 1381753 | 37392 | H→G | LLXRDSFEV | [31] |
| 20 | + | T-cell | I | Cytotoxicity | 1404705 | 1404714 | 61086 | S→I | SXIEFARL | [32] |
| 21 | + | T-cell | I | Cytotoxicity | 1404705 | 1404715 | 61086 | A→E | SSIEFXRL | [32] |
| 22 | + | T-cell | I | Cytotoxicity | 1404705 | 1404716 | 61086 | R→K | SSIEFAXL | [32] |
| 23 | + | T-cell | I | Cytotoxicity | 1404712 | 1404719 | 61088 | E→A | SSIEFXRL | [32] |
| 24 | + | T-cell | I | Cytotoxicity | 1404713 | 1404721 | 61085 | K→R | SSIEFAXL | [32] |
| 25 | + | T-cell | I | Cytokine release (IFN-γ) | 1865529 | 1865530 | 98995 | T→F | SAPDXRPA | [33] |
| 26 | + | T-cell | I | Cytokine release (IFN-γ) | 1865529 | 1865532 | 98995 | A→L | SAPDTRPX | [33] |

[Figure 5 plot: antigen residue volume (Å³) versus immunogen residue volume (Å³), both axes spanning 60 to 240, with amino-acid residues plotted as reference points and the data rows of Table 2 plotted by number; legend: reference, negative B-cell, negative T-cell (MHC class I), negative T-cell (MHC class II), positive T-cell (MHC class I).]

Figure 5: Peptide cross-reactivity assay data of Table 2 vis-à-vis amino-acid residue volume [21] of immunogen and antigen at single-residue substitution positions along sequence alignments.

For longer epitopes (e.g., typical MHC class II-restricted T-cell epitopes), the value is even higher. For example, in the case of data rows 4 and 5 (having consensus sequence NTWTTCQSIAXPSK), the said value is 13/14 (~93%). The threshold values thus calculated are markedly higher than those used (e.g., 70% [20]) for conventional protein sequence analysis. Moreover, regardless of the exact sequence lengths, applying such threshold values would entail discarding data on epitopes differing by one residue (e.g., the examples just cited from Table 2). To avoid such loss of potentially valuable information, all the available epitope data might be analyzed and a corresponding reduced epitope count reported to indicate the level of redundancy in the underlying dataset.
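The threshold arithmetic above can be checked with a short sketch. It assumes the simplest reading of fractional aligned-sequence identity, namely an ungapped alignment of two equal-length sequences; the function names are illustrative only, and the example sequences are the control-reaction epitopes of data rows 18 and 19 and of rows 4 and 5 as read from Table 2.

```python
from fractions import Fraction


def fractional_identity(seq_a, seq_b):
    """Fraction of identical residues between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be nonempty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return Fraction(matches, len(seq_a))


def highest_meaningful_threshold(length):
    """Identity of two sequences of the given length differing at one residue."""
    return Fraction(length - 1, length)


print(fractional_identity("LLDRDSFEV", "LLHRDSFEV"))            # 8/9
print(fractional_identity("NTWTTCQSIAFPSK", "NTWTTCQSIAAPSK"))  # 13/14
```

Any identity cutoff at or below these values would flag the paired epitopes as redundant, which is the data loss that reporting a reduced epitope count is meant to avoid.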

3.3. Survey of Peptidic Epitope Sequence Data. IEDB searches yielded 11497 records on peptidic epitopes of length less than 50 residues (i.e., the general cutoff provided in the IEDB curation manual [39]) curated as covalently unmodified and comprising 4443 B-cell and 7054 T-cell epitope records (the latter divided between 2749 and 4305 records for MHC restriction classes I and II, resp.), for which computed functional redundancy vis-à-vis sequence length is depicted in Figure 6.

Considering the wide range of sequence counts for the various sequence lengths in Figure 6, a logarithmic scale was chosen to depict the data in a compact form. Consequently, the difference calculated as sequence count − reduced count was plotted instead of reduced count itself where sequence count > reduced count, to avoid crowding of data points. For all positive sequence counts (“◻”), the said differences were consistently higher where functional similarity was equated with fractional aligned-sequence identity (“×”) rather than the Shannon information entropy of the differential epitope binding. In the latter case, the differences were consistently less than 1, with reduced count close or even equal to sequence count. These results are anticipated in view of Figure 4, which shows that reduced count is independent of sequence length using the entropy-based approach proposed herein but decreases with sequence length using fractional aligned-sequence identity.

The data in Figures 4 and 6 thus point to key considerations in evaluating functional redundancy. Clearly, functional redundancy may be overestimated by simply equating functional similarity with fractional aligned-sequence identity, especially with increasing sequence length for highly similar sequences. On the other hand, functional redundancy may be underestimated where functional similarity is equated with the Shannon information entropy of differential epitope binding if the sequence under consideration extends beyond the relevant epitope structure, which becomes more likely with increasing sequence length. Hence, the information-entropy approach as developed herein is conceivably applicable to sequences not exceeding a realistic application-dependent epitope length (e.g., six residues for antigen binding by anti-peptide antibodies [40]), although rigorous generalization to longer sequences might be pursued by aligning subsequences of such length and possibly accounting for differential immunodominance among immunogen epitopes [41]. For example, if a set of peptides were to be found such that each of them comprised a single immunodominant epitope having fixed length (e.g., six residues in all cases), the information-entropy approach might be applied to the entire set even if the peptides were of variable length. Moreover, even if the said immunodominant epitopes were highly similar to one another in terms of fractional aligned-sequence identity, all the available data still could be included in subsequent analyses, provided that both the total and reduced epitope counts would be reported in order to account for functional redundancy. In this way, useful epitope data could be retained instead of discarded.
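As a first step toward the generalization to longer sequences suggested above, candidate subsequences of a fixed epitope length can be enumerated before any alignment or immunodominance weighting is attempted. The sketch below covers only that enumeration; the function name and the default length of six residues (taken from the anti-peptide antibody example) are illustrative assumptions.

```python
def fixed_length_subsequences(sequence, length=6):
    """List every contiguous subsequence of the given length, in order."""
    if length < 1:
        raise ValueError("length must be positive")
    # A sequence shorter than the window yields an empty list.
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


print(fixed_length_subsequences("SYIPSAEKI"))
# ['SYIPSA', 'YIPSAE', 'IPSAEK', 'PSAEKI']
```

Which of these windows actually corresponds to an immunodominant epitope is an empirical question that the enumeration itself does not answer.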

3.4. Applications and Future Directions. The discussion thus far suggests the potentially greater utility of antigenic functional similarity cast as the Shannon information entropy of differential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redundancy of epitope data to express their structural diversity in a biologically meaningful manner (i.e., explicitly in terms of relative binding affinity, which underlies the balance between the inversely related emergent continuum phenomena of immunologic specificity and cross-reactivity). This is subject to the caveat that assuming an idealized epitope-binding site of optimal steric and electrostatic complementarity (e.g., comparable to what is typically observed in a natively folded wild-type protein) may both overestimate affinity for an immunogen epitope and underestimate affinity for antigen epitopes structurally different from the immunogen epitope (thereby possibly overestimating immunologic specificity as a consequence of underestimating potential for cross-reactivity). These considerations could guide the development of immunization strategies (e.g., via vaccination) and immunodiagnostics (e.g., to produce peptidic constructs suitable for eliciting anti-peptide antibodies that either protect against disease or serve to detect particular antigens in biological samples via immunoassays), noting that the practical significance of immunologic specificity and cross-reactivity is highly context-dependent.

[Figure 6 plots, panels (a) through (c): sequence count or sequence count − reduced count (logarithmic scale, 0.001 to 1000) versus sequence length (0 to 50).]

Figure 6: Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c). Sequence counts (“◻”) are total epitope counts in IEDB. Corresponding reduced counts ($r$ in (1)) are expressed as the difference (sequence count − reduced count), with functional similarity equated with either fractional aligned-sequence identity (“×”; (3) and (4)) or the Shannon information entropy for differential epitope binding ((5) through (9)). Missing “◻” indicates sequence count = 0 (e.g., for B-cell epitope sequence length < 4 in (a)); where “◻” is present, a missing “×” or entropy-based marker indicates sequence count − reduced count = 0 (e.g., with the entropy-based marker missing for B-cell epitope sequence length = 5 in (a)). Both the “×” and the entropy-based marker are missing where sequence count = 1 (e.g., for B-cell epitope sequence length = 43 in (a)), as reduced count = 1 for sequence count = 1.

From the perspective of immunization, immunity (e.g., induced via vaccination or passive transfer of immune-system components) may be useful if specifically directed against a pathogen epitope so as to favor pathogen elimination without producing harmful hypersensitivity reactions (e.g., in the form of allergy or autoimmunity). Yet such immunity might be even more useful if it also favored elimination of multiple antigenically related variant pathogens via cross-reaction (such that a single vaccine epitope elicited the production of immune-system components each cross-reactive with multiple variant pathogen epitopes) while still avoiding hypersensitivity. Cross-reactivity of immune responses is thus potentially useful within limits defined by risk of harmful hypersensitivity (e.g., due to cross-reaction with host self-antigens). As regards immunodiagnosis, an immunodiagnostic test may be useful if it enables specific pathogen detection via discrimination between a pathogen epitope and a set of other biologically relevant epitopes (e.g., of the host or its commensal symbionts in health or of other pathogens for which the clinical implications are very different). Yet the test might be even more useful if it also enabled detection of multiple antigenically related variant pathogens (such that a single immunologic probe detected multiple variant pathogen epitopes) while still retaining its discriminatory power with respect to other biologically relevant epitopes, particularly where the variant pathogens are very similar to one another in terms of their clinical implications (e.g., prognosis and therapeutic options). Hence, cross-reactivity strongly influences the outcomes of both immunization and immunodiagnosis, for which reason their development could be supported by computational tools that estimate epitope-binding affinity for cross-reactions.

In order to estimate the epitope-binding affinity of an immune-system component (e.g., antibody) elicited in response to (and thus having a binding site for) immunogen epitope $i$ for antigen epitope $j$, one possible approach would be to express affinity in terms of association constants, for example, as

\[ K_{ij} = K_{ii} Z_{ij}, \tag{10} \]

where $K_{ij}$ is the association constant for cross-reaction with antigen epitope $j$, $K_{ii}$ is the association constant for reaction with immunogen epitope $i$, and $Z_{ij}$ is a correction term to account for the difference between the two association constants. In turn, $K_{ii}$ might be estimated as

\[ K_{ii} = \exp\left(-\frac{\Delta G_{ii}}{RT}\right), \tag{11} \]

where $\Delta G_{ii}$ is the free-energy change for reaction with immunogen epitope $i$, while both $R$ and $T$ retain their meanings as in (7). In cases where epitopes $i$ and $j$ are both $m$ residues in length, $Z_{ij}$ might be estimated as

\[ Z_{ij} = \begin{cases} \prod_{k=1}^{m} s_{ijk}, & \text{if } \prod_{k=1}^{m} s_{ijk} > K_{ii}^{-1}, \\ K_{ii}^{-1}, & \text{otherwise,} \end{cases} \tag{12} \]

where the product retains its meaning as in (6), such that $K_{ij} \geq 1$ (with $K_{ij} = 1$ for nonspecific binding). $\Delta G_{ii}$ could be estimated from epitope structure (e.g., using structural energetics [40–42]) subject to mechanism-specific constraints (e.g., on B-cell affinity maturation [43]), and $s_{ijk}$ could be estimated along similar lines with regard to $\Delta\Delta G_{ijk}$ in (7).

The preceding discussion has alluded to epitope binding as if it were a binary interaction, which is strictly true only for B-cell epitopes (e.g., bound by surface immunoglobulin or antibody); but in the context of T-cell epitopes, the epitope-binding site is to be understood as a bipartite entity consisting of an MHC molecule and a T-cell receptor (TCR), such that the epitope becomes at least partly confined between the two binding-site components. The overall process of T-cell recognition is subject to thermodynamic and kinetic constraints on initial MHC-peptide binding [44] and subsequent TCR recognition of MHC-peptide complex [45], which potentially complicate prediction of T-cell cross-reactivity and functional consequences thereof (e.g., considering how binding affinity and receptor numbers interact to influence T-cell effector function [45]). Still, predicting B-cell epitope cross-reactivity is challenging in view of the greater structural diversity of B-cell epitopes, especially where only a fraction of the epitope surface contributes to the epitope-paratope interface, such that cross-reaction may occur with variations in epitope structure beyond the interface. Hence, although the analysis herein for continuous epitopes might be generalized to discontinuous epitopes, with indexing of residues by relative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignment gaps, correlating variations in epitope-residue structure with potential for cross-reactivity would be nontrivial.
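The affinity estimate of (10) through (12) can be sketched numerically as follows. This is an illustration under stated assumptions rather than a definitive implementation: the per-residue factors $s_{ijk}$ are supplied directly as hypothetical inputs (their derivation from (6) and (7) is not reproduced here), $R$ is taken as the gas constant in kcal/(mol K), $T$ defaults to 310.15 K, and all function names are invented for the example.

```python
import math

# Assumption: R is the gas constant, here in kcal/(mol*K), so that
# free-energy changes are expected in kcal/mol.
GAS_CONSTANT = 1.987204e-3


def association_constant_self(delta_g_ii, temperature=310.15):
    """Eq. (11): K_ii = exp(-dG_ii / (R * T))."""
    return math.exp(-delta_g_ii / (GAS_CONSTANT * temperature))


def correction_term(s_factors, k_ii):
    """Eq. (12): product of per-residue factors, bounded below by 1 / K_ii."""
    product = math.prod(s_factors)
    lower_bound = 1.0 / k_ii
    return product if product > lower_bound else lower_bound


def association_constant_cross(delta_g_ii, s_factors, temperature=310.15):
    """Eq. (10): K_ij = K_ii * Z_ij."""
    k_ii = association_constant_self(delta_g_ii, temperature)
    return k_ii * correction_term(s_factors, k_ii)


# Hypothetical inputs: dG_ii = -10 kcal/mol for a nonapeptide epitope.
k_same = association_constant_cross(-10.0, [1.0] * 9)            # equals K_ii
k_half = association_constant_cross(-10.0, [1.0] * 8 + [0.5])    # half of K_ii
k_none = association_constant_cross(-10.0, [1.0] * 8 + [0.0])    # about 1
```

The lower bound in `correction_term` is what keeps the cross-reaction constant from falling below 1, the value assigned to nonspecific binding. Because $K_{ii}$ and the per-residue factors depend on which epitope plays the immunogen role, swapping the roles of the two epitopes will generally give a different result.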

The analysis developed herein conceivably could be refined to better account for MHC-peptide binding as distinct from subsequent binding of MHC-peptide complex by TCR, cognizant of MHC polymorphism. Hence, MHC-peptide binding could be analyzed with explicit consideration of pertinent MHC structural details (e.g., noting that unfavorable peptide backbone conformations may preclude accommodation of peptide within the MHC binding cleft and emphasizing the potential affinity contributions of prospective anchor residues on peptides vis-à-vis corresponding MHC binding pockets). Subsequent binding of MHC-peptide complex by TCR could be analyzed separately, with emphasis on peptide surfaces accessible for contact by TCR (e.g., excluding those buried within MHC binding pockets). This could be more readily applied to class I rather than class II MHC molecules, as the latter typically have binding clefts that allow for variable-length peptide overhangs. The analysis for MHC-peptide binding might be extended for class II via docking simulations to predict the MHC-bound conformations and affinity contributions of the overhangs, after which the binding of MHC-peptide complex by TCR could be analyzed with inclusion of any overhang surfaces that are likely to be accessible for contact by TCR.

At a more fundamental level, apparently suboptimal complementarity (manifesting as submaximal immunologic specificity) may be an inherent tradeoff in attaining optimal immune function from the standpoint of biological fitness (e.g., enabling each immune-system component to recognize a variety of structurally distinct epitopes, albeit at the cost of binding each of these epitopes with only submaximal affinity). Investigation of this possibility may lead to insights on how immune responses might be optimally biased (e.g., via vaccination or immunotherapy). Moreover, the asymmetry of cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenic differences in terms of antigenic distance as a metric that is independent of immunization history.

4. Summary and Conclusions

In assembling epitope datasets (e.g., to develop and benchmark epitope-prediction tools), sequence redundancy among epitopes must be considered to avoid misleading results that reflect overrepresentation of functionally similar epitopes. However, potentially useful data may be needlessly discarded by excluding epitopes that share an apparently high degree of sequence similarity, as even only a single-residue difference may manifest as extreme functional dissimilarity between epitopes. The present work thus introduces a reduced epitope count, as defined using (1) and (2), to account for functional redundancy of epitope data (FRED) within an epitope dataset (such that FRED is quantified as the difference between the total and reduced epitope counts). This can be used, for example, to characterize the dataset instead of excluding epitopes with apparently similar sequences from it, thereby maximizing the use of available epitope data.

More importantly, the present work frames FRED in terms of potential antigenic cross-reactivity (rather than sequence distance) construed as functional similarity between epitopes. Hence, pairwise comparison of epitopes is performed such that each epitope pair considered is regarded as consisting of an immunogen epitope (which elicits the immune response of interest) and an antigen epitope (which is the target of cross-reactive binding in an immunoassay to evaluate the immune response). Functional similarity between epitopes is thus defined in a physicochemically and biologically meaningful way as the Shannon information entropy for differential epitope binding in (5), which is compatible with the possibility of extreme functional dissimilarity between epitopes due to seemingly minor differences in sequence. Consequently, the complement of functional similarity is not a distance (in the sense of a unique value quantifying the separation between two points in sequence space), as its value may vary with reversal of the roles (i.e., immunogen or antigen) assigned to epitopes under pairwise comparison. However, it is nonetheless a measure of epitope dissimilarity, such that summation of its values over all epitopes of a dataset according to (2) enables calculation of an averaged contribution of each epitope to the reduced epitope count. This circumvents the conventional dichotomous labeling of epitopes as either redundant or nonredundant based on an arbitrarily selected threshold value of sequence similarity. Such labeling is problematic in that selection of the actual threshold value is difficult to justify on physicochemically and biologically meaningful grounds, and because an epitope thus might be labeled as redundant on the basis of similarity to a minority of other epitopes in the dataset.
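The point about role reversal can be illustrated with a small check. The sketch below is not the calculation defined by (2) or (5); it only assumes that the complement of functional similarity is taken as one minus the similarity, and the similarity values are purely hypothetical placeholders (rows index the immunogen epitope, columns the antigen epitope). The function names are likewise illustrative.

```python
def complement(similarity):
    """Dissimilarity taken as 1 - similarity for every ordered epitope pair
    (an assumed convention for this illustration)."""
    return [[1.0 - s for s in row] for row in similarity]


def is_symmetric(matrix, tol=1e-12):
    """True if matrix[i][j] equals matrix[j][i] for all pairs, within tol."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical similarity values, NOT data from this work.
# Row = immunogen epitope, column = antigen epitope.
similarity = [
    [1.0, 0.8, 0.0],
    [0.3, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

dissimilarity = complement(similarity)
# Swapping immunogen and antigen roles changes the value for the first two
# epitopes, so this dissimilarity cannot serve as a distance.
print(is_symmetric(dissimilarity))  # False
```

A symmetric input would pass the same check, which is the property a true distance would need to satisfy for every pair.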

In line with the preceding considerations, epitope functional similarity may be estimated in terms of differences between immunogen- and antigen-epitope structure relative to an idealized binding site of high complementarity to the immunogen epitope. However, this tends to underestimate potential for cross-reactivity, which suggests that epitope-binding site complementarity is typically suboptimal. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes. Such cross-reactivity conceivably could be exploited in the development of practical applications such as vaccines and immunodiagnostics (e.g., to aim for beneficially broad cross-reactivity rather than overly extreme immunologic specificity).

Competing Interests

The author declares that they have no competing interests.

Acknowledgments

This work was supported by an Angelita T. Reyes Centennial Professorial Chair grant awarded to the author.

References

[1] S. E. C. Caoili, “Antidotes, antibody-mediated immunity and the future of pharmaceutical product development,” Human Vaccines and Immunotherapeutics, vol. 9, no. 2, pp. 294–299, 2013.

[2] S. E. C. Caoili, “Beyond new chemical entities: advancing drug development based on functional versatility of antibodies,” Human Vaccines and Immunotherapeutics, vol. 10, no. 6, pp. 1639–1644, 2014.

[3] B. Greenwood, “The contribution of vaccination to global health: past, present and future,” Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 369, no. 1645, Article ID 20130433, 2014.

[4] M. H. V. Van Regenmortel, “What is a B-cell epitope?” Methods in Molecular Biology, vol. 524, pp. 3–20, 2009.

[5] M. H. V. Regenmortel, “Specificity, polyspecificity, and heterospecificity of antibody-antigen recognition,” Journal of Molecular Recognition, vol. 27, no. 11, pp. 627–639, 2014.

[6] N. Tomar and R. K. De, “Immunoinformatics: a brief review,” Methods in Molecular Biology, vol. 1184, pp. 23–55, 2014.

[7] S. E. C. Caoili, “An integrative structure-based framework for predicting biological effects mediated by antipeptide antibodies,” Journal of Immunological Methods, vol. 427, pp. 19–29, 2015.

[8] S. E. C. Caoili, “Immunization with peptide-protein conjugates: impact on benchmarking B-cell epitope prediction for vaccine design,” Protein and Peptide Letters, vol. 17, no. 3, pp. 386–398, 2010.

[9] S. E. C. Caoili, “Hybrid methods for B-cell epitope prediction,” Methods in Molecular Biology, vol. 1184, pp. 245–283, 2014.

[10] S. E. C. Caoili, “Benchmarking B-cell epitope prediction with quantitative dose-response data on antipeptide antibodies: towards novel pharmaceutical product development,” BioMed Research International, vol. 2014, Article ID 867905, 13 pages, 2014.

[11] U. Hobohm, M. Scharf, R. Schneider, and C. Sander, “Selection of representative protein data sets,” Protein Science, vol. 1, no. 3, pp. 409–417, 1992.

[12] U. Lessel and D. Schomburg, “Creation and characterization of a new non-redundant fragment data bank,” Protein Engineering, vol. 10, no. 6, pp. 659–664, 1997.

[13] T. Noguchi and Y. Akiyama, “PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003,” Nucleic Acids Research, vol. 31, no. 1, pp. 492–493, 2003.

[14] P. Motte, G. Alberici, M. Ait-Abdellah, and D. Bellet, “Monoclonal antibodies distinguish synthetic peptides that differ in one chemical group,” The Journal of Immunology, vol. 138, no. 10, pp. 3332–3338, 1987.

[15] M. K. Gilson and S. E. Radford, “Protein folding and binding: from biology to physics and back again,” Current Opinion in Structural Biology, vol. 21, no. 1, pp. 1–3, 2011.

[16] B. Rost, “Twilight zone of protein sequence alignments,” Protein Engineering, vol. 12, no. 2, pp. 85–94, 1999.

[17] B. Y. Khor, G. J. Tye, T. S. Lim, and Y. S. Choong, “General overview on structure prediction of twilight-zone proteins,” Theoretical Biology and Medical Modelling, vol. 12, article 15, 2015.

[18] H. Nakamura, “Roles of electrostatic interaction in proteins,” Quarterly Reviews of Biophysics, vol. 29, no. 1, pp. 1–90, 1996.

[19] P. J. Fleming and G. D. Rose, “Do all backbone polar groups in proteins form hydrogen bonds?” Protein Science, vol. 14, no. 7, pp. 1911–1917, 2005.

[20] S. Basu, D. Bhattacharyya, and R. Banerjee, “Self-complementarity within proteins: bridging the gap between binding and folding,” Biophysical Journal, vol. 102, no. 11, pp. 2605–2614, 2012.

[21] Y. Harpaz, M. Gerstein, and C. Chothia, “Volume changes on protein folding,” Structure, vol. 2, no. 7, pp. 641–649, 1994.

[22] A. Handisurya, S. Gilch, D. Winter et al., “Vaccination with prion peptide-displaying papillomavirus-like particles induces autoantibodies to normal prion protein that interfere with pathologic prion protein production in infected cells,” FEBS Journal, vol. 274, no. 7, pp. 1747–1758, 2007.

[23] M. Plebanski, E. A. M. Lee, C. M. Hannan et al., “Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming,” Nature Medicine, vol. 5, no. 5, pp. 565–571, 1999.

[24] E. Michaelsson, M. Andersson, A. Engstrom, and R. Holmdahl, “Identification of an immunodominant type-II collagen peptide recognized by T cells in H-2q mice: self tolerance at the level of determinant selection,” European Journal of Immunology, vol. 22, no. 7, pp. 1819–1825, 1992.

[25] J. M. Greer, C. Klinguer, E. Trifilieff, R. A. Sobel, and M. B. Lees, “Encephalitogenicity of murine but not bovine DM20 in SJL mice is due to a single amino acid difference in the immunodominant encephalitogenic epitope,” Neurochemical Research, vol. 22, no. 4, pp. 541–547, 1997.

[26] A. Gaur, S. A. Boehme, D. Chalmers et al., “Amelioration of relapsing experimental autoimmune encephalomyelitis with altered myelin basic protein peptides involves different cellular mechanisms,” Journal of Neuroimmunology, vol. 74, no. 1-2, pp. 149–158, 1997.

[27] A. T. Kozhich, Y.-I. Kawano, C. E. Egwuagu et al., “A pathogenic autoimmune process targeted at a surrogate epitope,” The Journal of Experimental Medicine, vol. 180, no. 1, pp. 133–140, 1994.

[28] H. Masaki, M. Tamura, and I. Kurane, “Induction of cytotoxic T lymphocytes of heterogeneous specificities by immunization with a single peptide derived from influenza A virus,” Viral Immunology, vol. 13, no. 1, pp. 73–81, 2000.

[29] G. B. Lipford, S. Bauer, H. Wagner, and K. Heeg, “Peptide engineering allows cytotoxic T-cell vaccination against human papilloma virus tumour antigen E6,” Immunology, vol. 84, no. 2, pp. 298–303, 1995.

[30] C. L. Keech, K. C. Pang, J. McCluskey, and W. Chen, “Direct antigen presentation by DC shapes the functional CD8+ T-cell repertoire against the nuclear self-antigen La-SSB,” European Journal of Immunology, vol. 40, no. 2, pp. 330–338, 2010.

[31] S. Tangri, G. Y. Ishioka, X. Huang et al., “Structural features of peptide analogs of human histocompatibility leukocyte antigen class I epitopes that are more potent and immunogenic than wild-type peptide,” Journal of Experimental Medicine, vol. 194, no. 6, pp. 833–846, 2001.

[32] S. J. Turner and F. R. Carbone, “A dominant Vβ bias in the CTL response after HSV-1 infection is determined by peptide residues predicted to also interact with the TCR β-chain CDR3,” Molecular Immunology, vol. 35, no. 5, pp. 307–316, 1998.

[33] E. Lazoura, J. Lodding, W. Farrugia et al., “Enhanced major histocompatibility complex class I binding and immune responses through anchor modification of the non-canonical tumour-associated mucin 1-8 peptide,” Immunology, vol. 119, no. 3, pp. 306–316, 2006.

[34] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.

[35] E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review, vol. 106, no. 4, pp. 620–630, 1957.

[36] E. T. Jaynes, “Information theory and statistical mechanics. II,” Physical Review, vol. 108, no. 2, pp. 171–190, 1957.

[37] A. E. Eriksson, W. A. Baase, X.-J. Zhang et al., “Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect,” Science, vol. 255, no. 5041, pp. 178–183, 1992.

[38] B. Honig and A.-S. Yang, “Free energy balance in protein folding,” Advances in Protein Chemistry, vol. 46, pp. 27–58, 1995.

[39] R. Vita, J. A. Overton, J. A. Greenbaum et al., “The immune epitope database (IEDB) 3.0,” Nucleic Acids Research, vol. 43, no. 1, pp. D405–D412, 2015.

[40] S. E. C. Caoili, “A structural-energetic basis for B-cell epitope prediction,” Protein and Peptide Letters, vol. 13, no. 7, pp. 743–751, 2006.

[41] S. E. C. Caoili, “Benchmarking B-cell epitope prediction for the design of peptide-based vaccines: problems and prospects,” Journal of Biomedicine and Biotechnology, vol. 2010, Article ID 910524, 14 pages, 2010.

[42] S. P. Edgcomb and K. P. Murphy, “Structural energetics of protein folding and binding,” Current Opinion in Biotechnology, vol. 11, no. 1, pp. 62–66, 2000.

[43] S. E. C. Caoili, “On the meaning of affinity limits in B-cell epitope prediction for antipeptide antibody-mediated immunity,” Advances in Bioinformatics, vol. 2012, Article ID 346765, 17 pages, 2012.

[44] N. Dalchau, A. Phillips, L. D. Goldstein et al., “A peptide filtering relation quantifies MHC class I peptide optimization,” PLoS Computational Biology, vol. 7, no. 10, Article ID e1002144, 2011.

[45] L. J. Edwards and B. D. Evavold, “T cell recognition of weak ligands: roles of signaling, receptor number, and affinity,” Immunologic Research, vol. 50, no. 1, pp. 39–48, 2011.


Table 2: IEDB Qualitative-Outcome (Q) data on peptide cross-reactivity assays (cf. Figure 5). Assay information comprises type, MHC class, and parameter measured (cf. IEDB “measurement of” data field). Control and Cross are the IEDB IDs for the control reaction and the cross-reaction; Epitope is the IEDB Epitope ID. Sequence context is shown with X at the position of substitution.

No. Q  Type    MHC   Parameter measured        Control  Cross    Epitope  Subst.  Sequence context Ref.
1   −  B-cell  n/a   Antibody-antigen binding  1370146  1370150  10801    W→Y     DXEDRYYRE        [22]
2   −  T-cell  I     Cytokine release (IFN-γ)  1340848  1340854  62604    P→A     SYIXSAEKI        [23]
3   −  T-cell  II    Cell proliferation        1646734  1646735  107384   E→D     GEPGIAGFKGXQGPK  [24]
4   −  T-cell  II    Cell proliferation        1660246  1660247  46317    F→A     NTWTTCQSIAXPSK   [25]
5   −  T-cell  II    Cytokine release (IL-4)   1660252  1660258  112245   A→F     NTWTTCQSIAXPSK   [25]
6   −  T-cell  II    Cytokine release (IL-2)   1667060  1667065  68846    K→A     VHFFXNIVTPRTP    [26]
7   −  T-cell  II    Cell proliferation        1667112  1667113  80071    A→K     VHFFXNIVTPRTP    [26]
8   −  T-cell  II    Cell proliferation        1716604  1716614  125569   T→V     SWEGVGVXPDV      [27]
9   −  T-cell  II    Cell proliferation        1716604  1716616  125569   D→N     SWEGVGVTPXV      [27]
10  −  T-cell  II    Cell proliferation        1716605  1716618  125571   V→T     SWEGVGVXPNV      [27]
11  +  T-cell  I     Cytotoxicity              1634784  1634785  103300   S→A     IYXTVAGSL        [28]
12  +  T-cell  I     Cytotoxicity              1634784  1634786  103300   G→S     IYSTVAXSL        [28]
13  +  T-cell  I     Cytotoxicity              1634788  1634789  103298   S→G     IYATVAXSL        [28]
14  +  T-cell  I     Cytotoxicity              1634788  1634790  103298   A→S     IYXTVASSL        [28]
15  +  T-cell  I     Cytotoxicity              1657362  1657364  111964   I→D     YXFAFRDL         [29]
16  +  T-cell  I     Cytokine release (IFN-γ)  1716442  1716443  125101   I→T     XMIKFNRL         [30]
17  +  T-cell  I     Cytokine release (IFN-γ)  1340848  1340852  62604    K→D     SYIPSAEXI        [23]
18  +  T-cell  I     Cytokine release (IFN-γ)  1381750  1381748  37174    D→G     LLXRDSFEV        [31]
19  +  T-cell  I     Cytokine release (IFN-γ)  1381752  1381753  37392    H→G     LLXRDSFEV        [31]
20  +  T-cell  I     Cytotoxicity              1404705  1404714  61086    S→I     SXIEFARL         [32]
21  +  T-cell  I     Cytotoxicity              1404705  1404715  61086    A→E     SSIEFXRL         [32]
22  +  T-cell  I     Cytotoxicity              1404705  1404716  61086    R→K     SSIEFAXL         [32]
23  +  T-cell  I     Cytotoxicity              1404712  1404719  61088    E→A     SSIEFXRL         [32]
24  +  T-cell  I     Cytotoxicity              1404713  1404721  61085    K→R     SSIEFAXL         [32]
25  +  T-cell  I     Cytokine release (IFN-γ)  1865529  1865530  98995    T→F     SAPDXRPA         [33]
26  +  T-cell  I     Cytokine release (IFN-γ)  1865529  1865532  98995    A→L     SAPDTRPX         [33]

Figure 5: Peptide cross-reactivity assay data of Table 2 vis-à-vis amino-acid residue volume [21] of immunogen and antigen at single-residue substitution positions along sequence alignments. [Plot: antigen residue volume (Å³) versus immunogen residue volume (Å³), each axis spanning 60 to 240 Å³, with points numbered by data row of Table 2 and reference points labeled by one-letter residue code. Legend: reference; negative B-cell; negative T-cell I; negative T-cell II; positive T-cell I.]

For longer epitopes (e.g., typical MHC class II-restricted T-cell epitopes), the value is even higher. For example, in the case of data rows 4 and 5 (having consensus sequence NTWTTCQSIAXPSK), the said value is 13/14 (~93%). The threshold values thus calculated are markedly higher than those used (e.g., 70% [20]) for conventional protein sequence analysis. Moreover, regardless of the exact sequence lengths, applying such threshold values would entail discarding data on epitopes differing by one residue (e.g., the examples just cited from Table 2). To avoid such loss of potentially valuable information, all the available epitope data might be analyzed and a corresponding reduced epitope count reported to indicate the level of redundancy in the underlying dataset.
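The arithmetic behind such threshold values can be sketched briefly. The snippet below is only an illustration, assuming an ungapped alignment of two equal-length sequences that differ at exactly one position; the function name is hypothetical and not part of any tool described here.

```python
from fractions import Fraction


def single_substitution_identity(length: int) -> Fraction:
    """Fractional identity of two aligned sequences of equal length that
    differ at exactly one position (assumption: no alignment gaps)."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    return Fraction(length - 1, length)


# Consensus sequence for data rows 4 and 5 of Table 2 (14 residues,
# with X marking the substituted position).
context = "NTWTTCQSIAXPSK"
identity = single_substitution_identity(len(context))
print(identity, f"(~{float(identity):.0%})")  # 13/14 (~93%)
```

Because the identity is (n − 1)/n, it approaches 1 as sequence length n grows, which is why a fixed identity threshold becomes progressively more likely to discard longer epitopes that differ by a single residue.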

3.3. Survey of Peptidic Epitope Sequence Data. IEDB searches yielded 11497 records on peptidic epitopes of length less than 50 residues (i.e., the general cutoff provided in the IEDB curation manual [39]), curated as covalently unmodified and comprising 4443 B-cell and 7054 T-cell epitope records (the latter divided between 2749 and 4305 records for MHC restriction classes I and II, resp.), for which computed functional redundancy vis-à-vis sequence length is depicted in Figure 6.

Considering the wide range of sequence counts for the various sequence lengths in Figure 6, a logarithmic scale was chosen to depict the data in a compact form. Consequently, the difference calculated as sequence count − reduced count was plotted instead of reduced count itself where sequence count > reduced count, to avoid crowding of data points. For all positive sequence counts (“□”), the said differences were consistently higher where functional similarity was equated with fractional aligned-sequence identity (“×”) rather than the Shannon information entropy of differential epitope binding. In the latter case, the differences were consistently less than 1, with reduced count close or even equal to sequence count. These results are anticipated in view of Figure 4, which shows that reduced count is independent of sequence length using the entropy-based approach proposed herein but decreases with sequence length using fractional aligned-sequence identity.

The data in Figures 4 and 6 thus point to key considerations in evaluating functional redundancy. Clearly, functional redundancy may be overestimated by simply equating functional similarity with fractional aligned-sequence identity, especially with increasing sequence length for highly similar sequences. On the other hand, functional redundancy may be underestimated where functional similarity is equated with the Shannon information entropy of differential epitope binding if the sequence under consideration extends beyond the relevant epitope structure, which becomes more likely with increasing sequence length. Hence, the information-entropy approach as developed herein is conceivably applicable to sequences not exceeding a realistic application-dependent epitope length (e.g., six residues for antigen binding by anti-peptide antibodies [40]), although rigorous generalization to longer sequences might be pursued by aligning subsequences of such length and possibly accounting for differential immunodominance among immunogen epitopes [41]. For example, if a set of peptides were to be found such that each of them comprised a single immunodominant epitope having fixed length (e.g., six residues in all cases), the information-entropy approach might be applied to the entire set even if the peptides were of variable length. Moreover, even if the said immunodominant epitopes were highly similar to one another in terms of fractional aligned-sequence identity, all the available data still could be included in subsequent analyses, provided that both the total and reduced epitope counts would be reported in order to account for functional redundancy. In this way, useful epitope data could be retained instead of discarded.
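As a starting point for the subsequence-based generalization mentioned above, the candidate subsequences of a longer peptide can be enumerated as fixed-length windows. The sketch below assumes a window length of six residues (the example length cited from [40]) and uses a hypothetical function name; it only lists candidate windows and does not compute functional similarity or account for immunodominance.

```python
def fixed_length_windows(peptide: str, epitope_length: int = 6) -> list[tuple[int, str]]:
    """Return (start index, subsequence) for every contiguous window of
    epitope_length residues; empty if the peptide is shorter than that."""
    if epitope_length < 1:
        raise ValueError("epitope_length must be a positive integer")
    last_start = len(peptide) - epitope_length
    return [
        (start, peptide[start:start + epitope_length])
        for start in range(last_start + 1)
    ]


# Sequence context of data row 1 in Table 2 with the immunogen residue (W)
# in place of X.
for start, window in fixed_length_windows("DWEDRYYRE"):
    print(start, window)
```

For this nine-residue peptide the call yields four windows (DWEDRY, WEDRYY, EDRYYR, DRYYRE), any of which might then be compared pairwise across peptides.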

3.4. Applications and Future Directions. The discussion thus far suggests the potentially greater utility of antigenic functional similarity cast as the Shannon information entropy of differential epitope binding (instead of fractional sequence-alignment identity) as basis for computing functional redundancy of epitope data, to express their structural diversity in a biologically meaningful manner (i.e., explicitly in terms of relative binding affinity, which underlies the balance between the inversely related emergent continuum phenomena of immunologic specificity and cross-reactivity). This is subject to the caveat that assuming an idealized epitope-binding site of optimal steric and electrostatic complementarity (e.g., comparable to what is typically observed in a natively folded wild-type protein) may both overestimate affinity for an immunogen epitope and underestimate affinity for antigen epitopes structurally different from the immunogen epitope (thereby possibly overestimating immunologic specificity as a consequence of underestimating potential for cross-reactivity). These considerations could guide the development of immunization strategies (e.g., via vaccination) and immunodiagnostics (e.g., to produce peptidic constructs suitable for eliciting anti-peptide antibodies that either protect against disease or serve to detect particular antigens in biological samples via immunoassays), noting that the practical significance of immunologic specificity and cross-reactivity is highly context-dependent.

Figure 6: Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c). Sequence counts (“□”) are total epitope counts in IEDB. Corresponding reduced counts ($r$ in (1)) are expressed as the difference (sequence count − reduced count), with functional similarity equated with either fractional aligned-sequence identity (“×”; (3) and (4)) or the Shannon information entropy for differential epitope binding (entropy-based marker; (5) through (9)). Missing “□” indicates sequence count = 0 (e.g., for B-cell epitope sequence length < 4 in (a)); where “□” is present, a missing “×” or entropy-based marker indicates sequence count − reduced count = 0 (e.g., with missing entropy-based marker for B-cell epitope sequence length = 5 in (a)). Both “×” and the entropy-based marker are missing where sequence count = 1 (e.g., for B-cell epitope sequence length = 43 in (a)), as reduced count = 1 for sequence count = 1. [Plots (a) through (c): sequence count, or sequence count − reduced count, on a logarithmic scale from 0.001 to 1000, versus sequence length from 0 to 50.]

From the perspective of immunization, immunity (e.g., induced via vaccination or passive transfer of immune-system components) may be useful if specifically directed against a pathogen epitope, so as to favor pathogen elimination without producing harmful hypersensitivity reactions (e.g., in the form of allergy or autoimmunity). Yet such immunity might be even more useful if it also favored elimination of multiple antigenically related variant pathogens via cross-reaction (such that a single vaccine epitope elicited the production of immune-system components each cross-reactive with multiple variant pathogen epitopes) while still avoiding hypersensitivity. Cross-reactivity of immune responses is thus potentially useful within limits defined by risk of harmful hypersensitivity (e.g., due to cross-reaction with host self-antigens). As regards immunodiagnosis, an immunodiagnostic test may be useful if it enables specific pathogen detection via discrimination between a pathogen epitope and a set of other biologically relevant epitopes (e.g., of the host or its commensal symbionts in health, or of other pathogens for which the clinical implications are very different). Yet the test might be even more useful if it also enabled detection of multiple antigenically related variant pathogens (such that a single immunologic probe detected multiple variant pathogen epitopes) while still retaining its discriminatory power with respect to other biologically relevant epitopes, particularly where the variant pathogens are very similar to one another in terms of their clinical implications (e.g., prognosis and therapeutic options). Hence, cross-reactivity strongly influences the outcomes of both immunization and immunodiagnosis, for which reason their development could be supported by computational tools that estimate epitope-binding affinity for cross-reactions.

In order to estimate the epitope-binding affinity of an immune-system component (e.g., antibody) elicited in response to (and thus having a binding site for) immunogen epitope $i$ for antigen epitope $j$, one possible approach would be to express affinity in terms of association constants, for example, as

$$K_{ij} = K_{ii} Z_{ij}, \tag{10}$$

where $K_{ij}$ is the association constant for cross-reaction with antigen epitope $j$, $K_{ii}$ is the association constant for reaction with immunogen epitope $i$, and $Z_{ij}$ is a correction term to account for the difference between the two association constants. In turn, $K_{ii}$ might be estimated as

$$K_{ii} = \exp\left(-\frac{\Delta G_{ii}}{RT}\right), \tag{11}$$

where $\Delta G_{ii}$ is the free-energy change for reaction with immunogen epitope $i$, while both $R$ and $T$ retain their meanings as in (7). In cases where epitopes $i$ and $j$ are both $m$ residues in length, $Z_{ij}$ might be estimated as

$$Z_{ij} = \begin{cases} \prod_{k=1}^{m} s_{ijk}, & \text{if } \prod_{k=1}^{m} s_{ijk} > K_{ii}^{-1}, \\ K_{ii}^{-1}, & \text{otherwise,} \end{cases} \tag{12}$$

where the product retains its meaning as in (6), such that $K_{ij} \geq 1$ (with $K_{ij} = 1$ for nonspecific binding). $\Delta G_{ii}$ could be estimated from epitope structure (e.g., using structural energetics [40–42]) subject to mechanism-specific constraints (e.g., on B-cell affinity maturation [43]), and $s_{ijk}$ could be estimated along similar lines with regard to $\Delta\Delta G_{ijk}$ in (7).

The preceding discussion has alluded to epitope binding as if it were a binary interaction, which is strictly true only for B-cell epitopes (e.g., bound by surface immunoglobulin or antibody); but in the context of T-cell epitopes, the epitope-binding site is to be understood as a bipartite entity consisting of an MHC molecule and a T-cell receptor (TCR), such that the epitope becomes at least partly confined between the two binding-site components. The overall process of T-cell recognition is subject to thermodynamic and kinetic constraints on initial MHC-peptide binding [44] and subsequent TCR recognition of MHC-peptide complex [45], which potentially complicate prediction of T-cell cross-reactivity and functional consequences thereof (e.g., considering how binding affinity and receptor numbers interact to influence T-cell effector function [45]). Still, predicting B-cell epitope cross-reactivity is challenging in view of the greater structural diversity of B-cell epitopes, especially where only a fraction of the epitope surface contributes to the epitope-paratope interface, such that cross-reaction may occur with variations in epitope structure beyond the interface. Hence, although the analysis herein for continuous epitopes might be generalized to discontinuous epitopes with indexing of residues by relative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignment gaps, correlating variations in epitope-residue structure with potential for cross-reactivity would be nontrivial.
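Returning to the affinity estimate in (10) through (12), the calculation might be sketched numerically as follows. This is a minimal illustration rather than an implementation of the present work: the per-residue factors are simply supplied by the caller instead of being derived from (6) and (7), the gas constant is assumed to be in kcal/(mol·K) with free energy in kcal/mol, the default temperature of 310.15 K is an assumption, and all function names are hypothetical.

```python
import math

# Assumed units: kcal/(mol*K); free energies are then in kcal/mol.
GAS_CONSTANT = 1.987204e-3


def association_constant(delta_g_ii: float, temperature: float = 310.15) -> float:
    """Equation (11): K_ii = exp(-dG_ii / (R * T))."""
    return math.exp(-delta_g_ii / (GAS_CONSTANT * temperature))


def correction_term(s_values: list[float], k_ii: float) -> float:
    """Equation (12): product of per-residue factors s_ijk, bounded below
    by 1 / K_ii so that K_ij cannot fall below 1."""
    product = math.prod(s_values)
    lower_bound = 1.0 / k_ii
    return product if product > lower_bound else lower_bound


def cross_reaction_constant(
    delta_g_ii: float, s_values: list[float], temperature: float = 310.15
) -> float:
    """Equation (10): K_ij = K_ii * Z_ij."""
    k_ii = association_constant(delta_g_ii, temperature)
    return k_ii * correction_term(s_values, k_ii)


# Hypothetical inputs: dG_ii = -10 kcal/mol for a six-residue epitope.
identical = cross_reaction_constant(-10.0, [1.0] * 6)          # equals K_ii
abolished = cross_reaction_constant(-10.0, [1.0, 0.0, 1.0, 1.0, 1.0, 1.0])
print(identical, abolished)
```

With all factors equal to 1 the result reduces to $K_{ii}$, whereas a single factor of 0 drives the product below $K_{ii}^{-1}$ and the estimate falls back to the nonspecific-binding value of 1, as stated for (12).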

The analysis developed herein conceivably could berefined to better account forMHC-peptide binding as distinctfrom subsequent binding of MHC-peptide complex by TCRcognizant of MHC polymorphism Hence MHC-peptidebinding could be analyzed with explicit consideration of per-tinent MHC structural details (eg noting that unfavorablepeptide backbone conformations may preclude accommoda-tion of peptide within the MHC binding cleft and emphasiz-ing the potential affinity contributions of prospective anchorresidues on peptides vis-a-vis corresponding MHC bindingpockets) Subsequent binding of MHC-peptide complex by

TCR could be analyzed separately with emphasis on peptidesurfaces accessible for contact by TCR (eg excluding thoseburied within MHC binding pockets) This could be morereadily applied to class I rather than class II MHCmoleculesas the latter typically have binding clefts that allow forvariable-length peptide overhangs The analysis for MHC-peptide binding might be extended for class II via dockingsimulations to predict the MHC-bound conformations andaffinity contributions of the overhangs after which thebinding of MHC-peptide complex by TCR could be analyzedwith inclusion of any overhang surfaces that are likely to beaccessible for contact by TCR

At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history

4 Summary and Conclusions

In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data

More importantly, the present work frames FRED in terms of potential antigenic cross-reactivity (rather than sequence distance) construed as functional similarity between epitopes. Hence, pairwise comparison of epitopes is performed such that each epitope pair considered is regarded as consisting of an immunogen epitope (which elicits the immune response of interest) and an antigen epitope (which is the target of cross-reactive binding in an immunoassay to evaluate the immune response). Functional similarity between epitopes is thus defined in a physicochemically and biologically meaningful way as the Shannon information entropy for differential epitope binding in (5), which is compatible with the possibility of extreme functional dissimilarity between epitopes due to seemingly minor differences in sequence. Consequently, the complement of functional similarity is not a distance (in the sense of a unique value quantifying the separation between two points in sequence space), as its value may vary with reversal of the roles (i.e., immunogen or antigen) assigned to epitopes under pairwise comparison. However, it is nonetheless a measure of epitope dissimilarity, such that summation of its values over all epitopes of a dataset according to (2) enables calculation of an averaged contribution of each epitope to the reduced epitope count. This circumvents the conventional dichotomous labeling of epitopes as either redundant or nonredundant based on an arbitrarily selected threshold value of sequence similarity. Such labeling is problematic in that selection of the actual threshold value is difficult to justify on physicochemically and biologically meaningful grounds, and because an epitope thus might be labeled as redundant on the basis of similarity to a minority of other epitopes in the dataset.

In line with the preceding considerations, epitope functional similarity may be estimated in terms of differences between immunogen- and antigen-epitope structure relative to an idealized binding site of high complementarity to the immunogen epitope. However, this tends to underestimate potential for cross-reactivity, which suggests that epitope-binding site complementarity is typically suboptimal. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes. Such cross-reactivity conceivably could be exploited in the development of practical applications such as vaccines and immunodiagnostics (e.g., to aim for beneficially broad cross-reactivity rather than overly extreme immunologic specificity).

Competing Interests

The author declares that they have no competing interests.

Acknowledgments

This work was supported by an Angelita T. Reyes Centennial Professorial Chair grant awarded to the author.

References

[1] S. E. C. Caoili, “Antidotes, antibody-mediated immunity and the future of pharmaceutical product development,” Human Vaccines and Immunotherapeutics, vol. 9, no. 2, pp. 294–299, 2013.

[2] S. E. C. Caoili, “Beyond new chemical entities: advancing drug development based on functional versatility of antibodies,” Human Vaccines and Immunotherapeutics, vol. 10, no. 6, pp. 1639–1644, 2014.

[3] B. Greenwood, “The contribution of vaccination to global health: past, present and future,” Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 369, no. 1645, Article ID 20130433, 2014.

[4] M. H. V. Van Regenmortel, “What is a B-cell epitope?” Methods in Molecular Biology, vol. 524, pp. 3–20, 2009.

[5] M. H. V. Regenmortel, “Specificity, polyspecificity, and heterospecificity of antibody-antigen recognition,” Journal of Molecular Recognition, vol. 27, no. 11, pp. 627–639, 2014.

[6] N. Tomar and R. K. De, “Immunoinformatics: a brief review,” Methods in Molecular Biology, vol. 1184, pp. 23–55, 2014.

[7] S. E. C. Caoili, “An integrative structure-based framework for predicting biological effects mediated by antipeptide antibodies,” Journal of Immunological Methods, vol. 427, pp. 19–29, 2015.

[8] S. E. C. Caoili, “Immunization with peptide-protein conjugates: impact on benchmarking B-cell epitope prediction for vaccine design,” Protein and Peptide Letters, vol. 17, no. 3, pp. 386–398, 2010.

[9] S. E. C. Caoili, “Hybrid methods for B-cell epitope prediction,” Methods in Molecular Biology, vol. 1184, pp. 245–283, 2014.

[10] S. E. C. Caoili, “Benchmarking B-cell epitope prediction with quantitative dose-response data on antipeptide antibodies: towards novel pharmaceutical product development,” BioMed Research International, vol. 2014, Article ID 867905, 13 pages, 2014.

[11] U. Hobohm, M. Scharf, R. Schneider, and C. Sander, “Selection of representative protein data sets,” Protein Science, vol. 1, no. 3, pp. 409–417, 1992.

[12] U. Lessel and D. Schomburg, “Creation and characterization of a new non-redundant fragment data bank,” Protein Engineering, vol. 10, no. 6, pp. 659–664, 1997.

[13] T. Noguchi and Y. Akiyama, “PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003,” Nucleic Acids Research, vol. 31, no. 1, pp. 492–493, 2003.

[14] P. Motte, G. Alberici, M. Ait-Abdellah, and D. Bellet, “Monoclonal antibodies distinguish synthetic peptides that differ in one chemical group,” The Journal of Immunology, vol. 138, no. 10, pp. 3332–3338, 1987.

[15] M. K. Gilson and S. E. Radford, “Protein folding and binding: from biology to physics and back again,” Current Opinion in Structural Biology, vol. 21, no. 1, pp. 1–3, 2011.

[16] B. Rost, “Twilight zone of protein sequence alignments,” Protein Engineering, vol. 12, no. 2, pp. 85–94, 1999.

[17] B. Y. Khor, G. J. Tye, T. S. Lim, and Y. S. Choong, “General overview on structure prediction of twilight-zone proteins,” Theoretical Biology and Medical Modelling, vol. 12, article 15, 2015.

[18] H. Nakamura, “Roles of electrostatic interaction in proteins,” Quarterly Reviews of Biophysics, vol. 29, no. 1, pp. 1–90, 1996.

[19] P. J. Fleming and G. D. Rose, “Do all backbone polar groups in proteins form hydrogen bonds?” Protein Science, vol. 14, no. 7, pp. 1911–1917, 2005.

[20] S. Basu, D. Bhattacharyya, and R. Banerjee, “Self-complementarity within proteins: bridging the gap between binding and folding,” Biophysical Journal, vol. 102, no. 11, pp. 2605–2614, 2012.

[21] Y. Harpaz, M. Gerstein, and C. Chothia, “Volume changes on protein folding,” Structure, vol. 2, no. 7, pp. 641–649, 1994.

[22] A. Handisurya, S. Gilch, D. Winter et al., “Vaccination with prion peptide-displaying papillomavirus-like particles induces autoantibodies to normal prion protein that interfere with pathologic prion protein production in infected cells,” FEBS Journal, vol. 274, no. 7, pp. 1747–1758, 2007.

[23] M. Plebanski, E. A. M. Lee, C. M. Hannan et al., “Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming,” Nature Medicine, vol. 5, no. 5, pp. 565–571, 1999.

[24] E. Michaelsson, M. Andersson, A. Engstrom, and R. Holmdahl, “Identification of an immunodominant type-II collagen peptide recognized by T cells in H-2q mice: self tolerance at the level of determinant selection,” European Journal of Immunology, vol. 22, no. 7, pp. 1819–1825, 1992.

[25] J. M. Greer, C. Klinguer, E. Trifilieff, R. A. Sobel, and M. B. Lees, “Encephalitogenicity of murine, but not bovine, DM20 in SJL mice is due to a single amino acid difference in the immunodominant encephalitogenic epitope,” Neurochemical Research, vol. 22, no. 4, pp. 541–547, 1997.

[26] A. Gaur, S. A. Boehme, D. Chalmers et al., “Amelioration of relapsing experimental autoimmune encephalomyelitis with altered myelin basic protein peptides involves different cellular mechanisms,” Journal of Neuroimmunology, vol. 74, no. 1-2, pp. 149–158, 1997.

[27] A. T. Kozhich, Y.-I. Kawano, C. E. Egwuagu et al., “A pathogenic autoimmune process targeted at a surrogate epitope,” The Journal of Experimental Medicine, vol. 180, no. 1, pp. 133–140, 1994.

[28] H. Masaki, M. Tamura, and I. Kurane, “Induction of cytotoxic T lymphocytes of heterogeneous specificities by immunization with a single peptide derived from influenza A virus,” Viral Immunology, vol. 13, no. 1, pp. 73–81, 2000.

[29] G. B. Lipford, S. Bauer, H. Wagner, and K. Heeg, “Peptide engineering allows cytotoxic T-cell vaccination against human papilloma virus tumour antigen E6,” Immunology, vol. 84, no. 2, pp. 298–303, 1995.

[30] C. L. Keech, K. C. Pang, J. McCluskey, and W. Chen, “Direct antigen presentation by DC shapes the functional CD8+ T-cell repertoire against the nuclear self-antigen La-SSB,” European Journal of Immunology, vol. 40, no. 2, pp. 330–338, 2010.

[31] S. Tangri, G. Y. Ishioka, X. Huang et al., “Structural features of peptide analogs of human histocompatibility leukocyte antigen class I epitopes that are more potent and immunogenic than wild-type peptide,” Journal of Experimental Medicine, vol. 194, no. 6, pp. 833–846, 2001.

[32] S. J. Turner and F. R. Carbone, “A dominant Vβ bias in the CTL response after HSV-1 infection is determined by peptide residues predicted to also interact with the TCR β-chain CDR3,” Molecular Immunology, vol. 35, no. 5, pp. 307–316, 1998.

[33] E. Lazoura, J. Lodding, W. Farrugia et al., “Enhanced major histocompatibility complex class I binding and immune responses through anchor modification of the non-canonical tumour-associated mucin 1-8 peptide,” Immunology, vol. 119, no. 3, pp. 306–316, 2006.

[34] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.

[35] E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review, vol. 106, no. 4, pp. 620–630, 1957.

[36] E. T. Jaynes, “Information theory and statistical mechanics. II,” Physical Review, vol. 108, no. 2, pp. 171–190, 1957.

[37] A. E. Eriksson, W. A. Baase, X.-J. Zhang et al., “Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect,” Science, vol. 255, no. 5041, pp. 178–183, 1992.

[38] B. Honig and A.-S. Yang, “Free energy balance in protein folding,” Advances in Protein Chemistry, vol. 46, pp. 27–58, 1995.

[39] R. Vita, J. A. Overton, J. A. Greenbaum et al., “The immune epitope database (IEDB) 3.0,” Nucleic Acids Research, vol. 43, no. 1, pp. D405–D412, 2015.

[40] S. E. C. Caoili, “A structural-energetic basis for B-cell epitope prediction,” Protein and Peptide Letters, vol. 13, no. 7, pp. 743–751, 2006.

[41] S. E. C. Caoili, “Benchmarking B-cell epitope prediction for the design of peptide-based vaccines: problems and prospects,” Journal of Biomedicine and Biotechnology, vol. 2010, Article ID 910524, 14 pages, 2010.

[42] S. P. Edgcomb and K. P. Murphy, “Structural energetics of protein folding and binding,” Current Opinion in Biotechnology, vol. 11, no. 1, pp. 62–66, 2000.

[43] S. E. C. Caoili, “On the meaning of affinity limits in B-cell epitope prediction for antipeptide antibody-mediated immunity,” Advances in Bioinformatics, vol. 2012, Article ID 346765, 17 pages, 2012.

[44] N. Dalchau, A. Phillips, L. D. Goldstein et al., “A peptide filtering relation quantifies MHC class I peptide optimization,” PLoS Computational Biology, vol. 7, no. 10, Article ID e1002144, 2011.

[45] L. J. Edwards and B. D. Evavold, “T cell recognition of weak ligands: roles of signaling, receptor number, and affinity,” Immunologic Research, vol. 50, no. 1, pp. 39–48, 2011.


Figure 5: Peptide cross-reactivity assay data of Table 2 vis-à-vis amino-acid residue volume [21] of immunogen and antigen at single-residue substitution positions along sequence alignments. (Horizontal axis: immunogen residue volume (Å³); vertical axis: antigen residue volume (Å³). Legend: Reference; Negative B-cell; Negative T-cell I; Negative T-cell II; Positive T-cell I.)

For longer epitopes (e.g., typical MHC class II-restricted T-cell epitopes), the value is even higher. For example, in the case of data rows 4 and 5 (having consensus sequence NTWTTCQSIAXPSK), the said value is 13/14 (∼93%). The threshold values thus calculated are markedly higher than those used (e.g., 70% [20]) for conventional protein sequence analysis. Moreover, regardless of the exact sequence lengths, applying such threshold values would entail discarding data on epitopes differing by one residue (e.g., the examples just cited from Table 2). To avoid such loss of potentially valuable information, all the available epitope data might be analyzed and a corresponding reduced epitope count reported to indicate the level of redundancy in the underlying dataset.
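The arithmetic behind such threshold values can be made explicit. The following is a minimal sketch rather than part of the original analysis: it computes fractional aligned-sequence identity for two equal-length sequences aligned without gaps. The function name `fractional_identity` is hypothetical, and the two example peptides are built from the consensus sequence above with residues A and G placed at position X purely for illustration (the actual residues are those listed in Table 2).

```python
from fractions import Fraction


def fractional_identity(seq_a: str, seq_b: str) -> Fraction:
    """Fraction of identical positions for two equal-length, gap-free aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length (alignment gaps are not handled)")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return Fraction(matches, len(seq_a))


# Hypothetical pair derived from the consensus NTWTTCQSIAXPSK (X = A or G, assumed).
pep_1 = "NTWTTCQSIAAPSK"
pep_2 = "NTWTTCQSIAGPSK"

identity = fractional_identity(pep_1, pep_2)
print(identity, f"({float(identity):.0%})")  # 13/14 (93%)
```

For any pair of length m differing at exactly one position the result is (m − 1)/m, so a redundancy threshold below that value would discard one member of the pair regardless of how the substitution affects binding.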

3.3. Survey of Peptidic Epitope Sequence Data. IEDB searches yielded 11,497 records on peptidic epitopes of length less than 50 residues (i.e., the general cutoff provided in the IEDB curation manual [39]) curated as covalently unmodified and comprising 4,443 B-cell and 7,054 T-cell epitope records (the latter divided between 2,749 and 4,305 records for MHC restriction classes I and II, respectively), for which computed functional redundancy vis-à-vis sequence length is depicted in Figure 6.

Considering the wide range of sequence counts for the various sequence lengths in Figure 6, a logarithmic scale was chosen to depict the data in a compact form. Consequently, the difference calculated as sequence count − reduced count was plotted instead of reduced count itself where sequence count > reduced count, to avoid crowding of data points. For all positive sequence counts (“◻”), the said differences were consistently higher where functional similarity was equated with fractional aligned-sequence identity (“×”) rather than the Shannon information entropy of differential epitope binding (“”). In the latter case (“”), the differences were consistently less than 1, with reduced count close or even equal to sequence count. These results are anticipated in view of Figure 4, which shows that reduced count is independent of sequence length using the entropy-based approach proposed herein but decreases with sequence length using fractional aligned-sequence identity.

The data in Figures 4 and 6 thus point to key considerations in evaluating functional redundancy. Clearly, functional redundancy may be overestimated by simply equating functional similarity with fractional aligned-sequence identity, especially with increasing sequence length for highly similar sequences. On the other hand, functional redundancy may be underestimated where functional similarity is equated with the Shannon information entropy of differential epitope binding if the sequence under consideration extends beyond the relevant epitope structure, which becomes more likely with increasing sequence length. Hence, the information-entropy approach as developed herein is conceivably applicable to sequences not exceeding a realistic application-dependent epitope length (e.g., six residues for antigen binding by anti-peptide antibodies [40]), although rigorous generalization to longer sequences might be pursued by aligning subsequences of such length and possibly accounting for differential immunodominance among immunogen epitopes [41]. For example, if a set of peptides were to be found such that each of them comprised a single immunodominant epitope having fixed length (e.g., six residues in all cases), the information-entropy approach might be applied to the entire set even if the peptides were of variable length. Moreover, even if the said immunodominant epitopes were highly similar to one another in terms of fractional aligned-sequence identity, all the available data still could be included in subsequent analyses, provided that both the total and reduced epitope counts would be reported in order to account for functional redundancy. In this way, useful epitope data could be retained instead of discarded.
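As a rough illustration of the first step that such a generalization would need, the sketch below enumerates every contiguous subsequence of a fixed length within a longer peptide. It is only an assumption about how candidate subsequences might be generated: the function name `fixed_length_windows` and the example peptide are hypothetical, and the sketch neither identifies which window is immunodominant nor computes any similarity between windows.

```python
from typing import Iterator, Tuple


def fixed_length_windows(peptide: str, length: int = 6) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, subsequence) for each contiguous window of the given length."""
    if length < 1:
        raise ValueError("length must be positive")
    # A peptide shorter than the window length yields no windows.
    for start in range(len(peptide) - length + 1):
        yield start, peptide[start:start + length]


# Hypothetical 10-residue peptide scanned with the six-residue length cited in the text.
windows = list(fixed_length_windows("NTWTTCQSIA", length=6))
print(len(windows))  # 5
print(windows[0])    # (0, 'NTWTTC')
```

Keeping the start index alongside each window makes it possible to compare windows at corresponding positions across peptides, which is the kind of subsequence alignment the text alludes to.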

Figure 6: Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c). Sequence counts (“◻”) are total epitope counts in IEDB. Corresponding reduced counts ($r$ in (1)) are expressed as the difference (sequence count − reduced count), with functional similarity equated with either fractional aligned-sequence identity (“×”; (3) and (4)) or the Shannon information entropy for differential epitope binding (“”; (5) through (9)). Missing “◻” indicates sequence count = 0 (e.g., for B-cell epitope sequence length < 4 in (a)); where “◻” is present, a missing “×” or “” indicates sequence count − reduced count = 0 (e.g., with missing “” for B-cell epitope sequence length = 5 in (a)). Both “×” and “” are missing where sequence count = 1 (e.g., for B-cell epitope sequence length = 43 in (a)), as reduced count = 1 for sequence count = 1. (Each panel: horizontal axis, sequence length from 0 to 50; vertical axis, sequence count or sequence count − reduced count on a logarithmic scale from 0.001 to 1000.)

3.4. Applications and Future Directions. The discussion thus far suggests the potentially greater utility of antigenic functional similarity cast as the Shannon information entropy of differential epitope binding (instead of fractional sequence-alignment identity) as the basis for computing functional redundancy of epitope data, to express their structural diversity in a biologically meaningful manner (i.e., explicitly in terms of relative binding affinity, which underlies the balance between the inversely related emergent continuum phenomena of immunologic specificity and cross-reactivity). This is subject to the caveat that assuming an idealized epitope-binding site of optimal steric and electrostatic complementarity (e.g., comparable to what is typically observed in a natively folded wild-type protein) may both overestimate affinity for an immunogen epitope and underestimate affinity for antigen epitopes structurally different from the immunogen epitope (thereby possibly overestimating immunologic specificity as a consequence of underestimating potential for cross-reactivity). These considerations could guide the development of immunization strategies (e.g., via vaccination) and immunodiagnostics (e.g., to produce peptidic constructs suitable for eliciting anti-peptide antibodies that either protect against disease or serve to detect particular antigens in biological samples via immunoassays), noting that the practical significance of immunologic specificity and cross-reactivity is highly context-dependent.

From the perspective of immunization, immunity (e.g., induced via vaccination or passive transfer of immune-system components) may be useful if specifically directed against a pathogen epitope, so as to favor pathogen elimination without producing harmful hypersensitivity reactions (e.g., in the form of allergy or autoimmunity). Yet such immunity might be even more useful if it also favored elimination of multiple antigenically related variant pathogens via cross-reaction (such that a single vaccine epitope elicited the production of immune-system components each cross-reactive with multiple variant pathogen epitopes) while still avoiding hypersensitivity. Cross-reactivity of immune responses is thus potentially useful within limits defined by risk of harmful hypersensitivity (e.g., due to cross-reaction with host self-antigens). As regards immunodiagnosis, an immunodiagnostic test may be useful if it enables specific pathogen detection via discrimination between a pathogen epitope and a set of other biologically relevant epitopes (e.g., of the host or its commensal symbionts in health, or of other pathogens for which the clinical implications are very different). Yet the test might be even more useful if it also enabled detection of multiple antigenically related variant pathogens (such that a single immunologic probe detected multiple variant pathogen epitopes) while still retaining its discriminatory power with respect to other biologically relevant epitopes, particularly where the variant pathogens are very similar to one another in terms of their clinical implications (e.g., prognosis and therapeutic options). Hence, cross-reactivity strongly influences the outcomes of both immunization and immunodiagnosis, for which reason their development could be supported by computational tools that estimate epitope-binding affinity for cross-reactions.

In order to estimate the epitope-binding affinity of an immune-system component (e.g., antibody) elicited in response to (and thus having a binding site for) immunogen epitope $i$ for antigen epitope $j$, one possible approach would be to express affinity in terms of association constants, for example, as

$$K_{ij} = K_{ii} Z_{ij}, \tag{10}$$

where $K_{ij}$ is the association constant for cross-reaction with antigen epitope $j$, $K_{ii}$ is the association constant for reaction with immunogen epitope $i$, and $Z_{ij}$ is a correction term to account for the difference between the two association constants. In turn, $K_{ii}$ might be estimated as

$$K_{ii} = \exp\left(-\frac{\Delta G_{ii}}{RT}\right), \tag{11}$$

where $\Delta G_{ii}$ is the free-energy change for reaction with immunogen epitope $i$, while both $R$ and $T$ retain their meanings as in (7). In cases where epitopes $i$ and $j$ are both $m$ residues in length, $Z_{ij}$ might be estimated as

$$Z_{ij} = \begin{cases} \prod_{k=1}^{m} s_{ijk}, & \text{if } \prod_{k=1}^{m} s_{ijk} > K_{ii}^{-1}, \\ K_{ii}^{-1}, & \text{otherwise}, \end{cases} \tag{12}$$

where the product retains its meaning as in (6), such that $K_{ij} \geq 1$ (with $K_{ij} = 1$ for nonspecific binding); $\Delta G_{ii}$ could be estimated from epitope structure (e.g., using structural energetics [40–42]) subject to mechanism-specific constraints (e.g., on B-cell affinity maturation [43]), and $s_{ijk}$ could be estimated along similar lines with regard to $\Delta\Delta G_{ijk}$ in (7).

The preceding discussion has alluded to epitope binding as if it were a binary interaction, which is strictly true only for B-cell epitopes (e.g., bound by surface immunoglobulin or antibody); but in the context of T-cell epitopes, the epitope-binding site is to be understood as a bipartite entity consisting of an MHC molecule and a T-cell receptor (TCR), such that the epitope becomes at least partly confined between the two binding-site components. The overall process of T-cell recognition is subject to thermodynamic and kinetic constraints on initial MHC-peptide binding [44] and subsequent TCR recognition of MHC-peptide complex [45], which potentially complicate prediction of T-cell cross-reactivity and functional consequences thereof (e.g., considering how binding affinity and receptor numbers interact to influence T-cell effector function [45]). Still, predicting B-cell epitope cross-reactivity is challenging in view of the greater structural diversity of B-cell epitopes, especially where only a fraction of the epitope surface contributes to the epitope-paratope interface, such that cross-reaction may occur with variations in epitope structure beyond the interface. Hence, although the analysis herein for continuous epitopes might be generalized to discontinuous epitopes with indexing of residues by relative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignment gaps, correlating variations in epitope-residue structure with potential for cross-reactivity would be nontrivial.
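The estimate in (10) through (12) can be sketched numerically. The following is a minimal sketch under stated assumptions, not the author's implementation: it takes $R$ to be the gas constant and $T$ the absolute temperature, expresses free energies in kcal/mol, and treats the per-residue factors $s_{ijk}$ as already available. The function names, the default temperature, and the example values are all hypothetical.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K); the unit choice is an assumption


def association_constant(delta_g_ii: float, temperature: float = 298.15) -> float:
    """K_ii = exp(-dG_ii / (R*T)), following (11)."""
    return math.exp(-delta_g_ii / (R_KCAL * temperature))


def correction_term(s_values: Sequence[float], k_ii: float) -> float:
    """Z_ij following (12): product of the s_ijk factors, floored at 1/K_ii."""
    product = math.prod(s_values)
    floor = 1.0 / k_ii
    return product if product > floor else floor


def cross_reaction_constant(delta_g_ii: float, s_values: Sequence[float],
                            temperature: float = 298.15) -> float:
    """K_ij = K_ii * Z_ij, following (10)."""
    k_ii = association_constant(delta_g_ii, temperature)
    return k_ii * correction_term(s_values, k_ii)


# Hypothetical six-residue epitope with dG_ii = -10 kcal/mol (assumed value).
k_ii = association_constant(-10.0)
identical = cross_reaction_constant(-10.0, [1.0] * 6)                         # antigen = immunogen
one_change = cross_reaction_constant(-10.0, [1.0, 1e-3, 1.0, 1.0, 1.0, 1.0])  # one penalized residue
unrelated = cross_reaction_constant(-10.0, [1e-20] * 6)                       # floor applies
print(k_ii, identical, one_change, unrelated)
```

The floor in `correction_term` is what keeps the result at or above 1, matching the statement that $K_{ij} = 1$ corresponds to nonspecific binding; without it, a sufficiently dissimilar antigen epitope would be assigned an association constant below that of nonspecific binding.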

The analysis developed herein conceivably could be refined to better account for MHC-peptide binding as distinct from subsequent binding of MHC-peptide complex by TCR, cognizant of MHC polymorphism. Hence, MHC-peptide binding could be analyzed with explicit consideration of pertinent MHC structural details (e.g., noting that unfavorable peptide backbone conformations may preclude accommodation of peptide within the MHC binding cleft and emphasizing the potential affinity contributions of prospective anchor residues on peptides vis-à-vis corresponding MHC binding pockets). Subsequent binding of MHC-peptide complex by TCR could be analyzed separately, with emphasis on peptide surfaces accessible for contact by TCR (e.g., excluding those buried within MHC binding pockets). This could be more readily applied to class I rather than class II MHC molecules, as the latter typically have binding clefts that allow for variable-length peptide overhangs. The analysis for MHC-peptide binding might be extended for class II via docking simulations to predict the MHC-bound conformations and affinity contributions of the overhangs, after which the binding of MHC-peptide complex by TCR could be analyzed with inclusion of any overhang surfaces that are likely to be accessible for contact by TCR.

At a more fundamental level apparently suboptimalcomplementarity (manifesting as submaximal immunologicspecificity) may be an inherent tradeoff in attaining optimalimmune function from the standpoint of biological fitness(eg enabling each immune-system component to recognizea variety of structurally distinct epitopes albeit at the cost ofbinding each of these epitopes with only submaximal affin-ity) Investigation of this possibility may lead to insights onhow immune responses might be optimally biased (eg viavaccination or immunotherapy) Moreover the asymmetryof cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenicdifferences in terms of antigenic distance as a metric that isindependent of immunization history

4 Summary and Conclusions

In assembling epitope datasets (eg to develop and bench-mark epitope-prediction tools) sequence redundancy amongepitopes must be considered to avoid misleading results thatreflect overrepresentation of functionally similar epitopesHowever potentially useful data may be needlessly discardedby excluding epitopes that share an apparently high degree ofsequence similarity as even only a single-residue differencemay manifest as extreme functional dissimilarity betweenepitopesThe present work thus introduces a reduced epitopecount as defined using (1) and (2) to account for functionalredundancy of epitope data (FRED)within an epitope dataset(such that FRED is quantified as the difference between thetotal and reduced epitope counts) This can be used forexample to characterize the dataset instead of excludingepitopes with apparently similar sequences from it therebymaximizing the use of available epitope data

More importantly the present work frames FREDin terms of potential antigenic cross-reactivity (ratherthan sequence distance) construed as functional similaritybetween epitopes Hence pairwise comparison of epitopes isperformed such that each epitope pair considered is regardedas consisting of an immunogen epitope (which elicits theimmune response of interest) and an antigen epitope (whichis the target of cross-reactive binding in an immunoassayto evaluate the immune response) Functional similaritybetween epitopes is thus defined in a physicochemically andbiologically meaningful way as the Shannon informationentropy for differential epitope binding in (5) which iscompatible with the possibility of extreme functional dissim-ilarity between epitopes due to seemingly minor differences

12 Advances in Bioinformatics

in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset

In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)

Competing Interests

The author declares that they have no competing interests

Acknowledgments

This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author

References

[1] S. E. C. Caoili, "Antidotes, antibody-mediated immunity and the future of pharmaceutical product development," Human Vaccines and Immunotherapeutics, vol. 9, no. 2, pp. 294–299, 2013.

[2] S. E. C. Caoili, "Beyond new chemical entities: advancing drug development based on functional versatility of antibodies," Human Vaccines and Immunotherapeutics, vol. 10, no. 6, pp. 1639–1644, 2014.

[3] B. Greenwood, "The contribution of vaccination to global health: past, present and future," Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 369, no. 1645, Article ID 20130433, 2014.

[4] M. H. V. Van Regenmortel, "What is a B-cell epitope?" Methods in Molecular Biology, vol. 524, pp. 3–20, 2009.

[5] M. H. V. Regenmortel, "Specificity, polyspecificity, and heterospecificity of antibody-antigen recognition," Journal of Molecular Recognition, vol. 27, no. 11, pp. 627–639, 2014.

[6] N. Tomar and R. K. De, "Immunoinformatics: a brief review," Methods in Molecular Biology, vol. 1184, pp. 23–55, 2014.

[7] S. E. C. Caoili, "An integrative structure-based framework for predicting biological effects mediated by antipeptide antibodies," Journal of Immunological Methods, vol. 427, pp. 19–29, 2015.

[8] S. E. C. Caoili, "Immunization with peptide-protein conjugates: impact on benchmarking B-cell epitope prediction for vaccine design," Protein and Peptide Letters, vol. 17, no. 3, pp. 386–398, 2010.

[9] S. E. C. Caoili, "Hybrid methods for B-cell epitope prediction," Methods in Molecular Biology, vol. 1184, pp. 245–283, 2014.

[10] S. E. C. Caoili, "Benchmarking B-cell epitope prediction with quantitative dose-response data on antipeptide antibodies: towards novel pharmaceutical product development," BioMed Research International, vol. 2014, Article ID 867905, 13 pages, 2014.

[11] U. Hobohm, M. Scharf, R. Schneider, and C. Sander, "Selection of representative protein data sets," Protein Science, vol. 1, no. 3, pp. 409–417, 1992.

[12] U. Lessel and D. Schomburg, "Creation and characterization of a new non-redundant fragment data bank," Protein Engineering, vol. 10, no. 6, pp. 659–664, 1997.

[13] T. Noguchi and Y. Akiyama, "PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003," Nucleic Acids Research, vol. 31, no. 1, pp. 492–493, 2003.

[14] P. Motte, G. Alberici, M. Ait-Abdellah, and D. Bellet, "Monoclonal antibodies distinguish synthetic peptides that differ in one chemical group," The Journal of Immunology, vol. 138, no. 10, pp. 3332–3338, 1987.

[15] M. K. Gilson and S. E. Radford, "Protein folding and binding: from biology to physics and back again," Current Opinion in Structural Biology, vol. 21, no. 1, pp. 1–3, 2011.

[16] B. Rost, "Twilight zone of protein sequence alignments," Protein Engineering, vol. 12, no. 2, pp. 85–94, 1999.

[17] B. Y. Khor, G. J. Tye, T. S. Lim, and Y. S. Choong, "General overview on structure prediction of twilight-zone proteins," Theoretical Biology and Medical Modelling, vol. 12, article 15, 2015.

[18] H. Nakamura, "Roles of electrostatic interaction in proteins," Quarterly Reviews of Biophysics, vol. 29, no. 1, pp. 1–90, 1996.

[19] P. J. Fleming and G. D. Rose, "Do all backbone polar groups in proteins form hydrogen bonds?" Protein Science, vol. 14, no. 7, pp. 1911–1917, 2005.

[20] S. Basu, D. Bhattacharyya, and R. Banerjee, "Self-complementarity within proteins: bridging the gap between binding and folding," Biophysical Journal, vol. 102, no. 11, pp. 2605–2614, 2012.

[21] Y. Harpaz, M. Gerstein, and C. Chothia, "Volume changes on protein folding," Structure, vol. 2, no. 7, pp. 641–649, 1994.

[22] A. Handisurya, S. Gilch, D. Winter et al., "Vaccination with prion peptide-displaying papillomavirus-like particles induces autoantibodies to normal prion protein that interfere with pathologic prion protein production in infected cells," FEBS Journal, vol. 274, no. 7, pp. 1747–1758, 2007.

[23] M. Plebanski, E. A. M. Lee, C. M. Hannan et al., "Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming," Nature Medicine, vol. 5, no. 5, pp. 565–571, 1999.

[24] E. Michaelsson, M. Andersson, A. Engstrom, and R. Holmdahl, "Identification of an immunodominant type-II collagen peptide recognized by T cells in H-2q mice: self tolerance at the level of determinant selection," European Journal of Immunology, vol. 22, no. 7, pp. 1819–1825, 1992.

[25] J. M. Greer, C. Klinguer, E. Trifilieff, R. A. Sobel, and M. B. Lees, "Encephalitogenicity of murine, but not bovine, DM20 in SJL mice is due to a single amino acid difference in the immunodominant encephalitogenic epitope," Neurochemical Research, vol. 22, no. 4, pp. 541–547, 1997.

[26] A. Gaur, S. A. Boehme, D. Chalmers et al., "Amelioration of relapsing experimental autoimmune encephalomyelitis with altered myelin basic protein peptides involves different cellular mechanisms," Journal of Neuroimmunology, vol. 74, no. 1-2, pp. 149–158, 1997.

[27] A. T. Kozhich, Y.-I. Kawano, C. E. Egwuagu et al., "A pathogenic autoimmune process targeted at a surrogate epitope," The Journal of Experimental Medicine, vol. 180, no. 1, pp. 133–140, 1994.

[28] H. Masaki, M. Tamura, and I. Kurane, "Induction of cytotoxic T lymphocytes of heterogeneous specificities by immunization with a single peptide derived from influenza A virus," Viral Immunology, vol. 13, no. 1, pp. 73–81, 2000.

[29] G. B. Lipford, S. Bauer, H. Wagner, and K. Heeg, "Peptide engineering allows cytotoxic T-cell vaccination against human papilloma virus tumour antigen E6," Immunology, vol. 84, no. 2, pp. 298–303, 1995.

[30] C. L. Keech, K. C. Pang, J. McCluskey, and W. Chen, "Direct antigen presentation by DC shapes the functional CD8+ T-cell repertoire against the nuclear self-antigen La-SSB," European Journal of Immunology, vol. 40, no. 2, pp. 330–338, 2010.

[31] S. Tangri, G. Y. Ishioka, X. Huang et al., "Structural features of peptide analogs of human histocompatibility leukocyte antigen class I epitopes that are more potent and immunogenic than wild-type peptide," Journal of Experimental Medicine, vol. 194, no. 6, pp. 833–846, 2001.

[32] S. J. Turner and F. R. Carbone, "A dominant Vβ bias in the CTL response after HSV-1 infection is determined by peptide residues predicted to also interact with the TCR β-chain CDR3," Molecular Immunology, vol. 35, no. 5, pp. 307–316, 1998.

[33] E. Lazoura, J. Lodding, W. Farrugia et al., "Enhanced major histocompatibility complex class I binding and immune responses through anchor modification of the non-canonical tumour-associated mucin 1-8 peptide," Immunology, vol. 119, no. 3, pp. 306–316, 2006.

[34] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.

[35] E. T. Jaynes, "Information theory and statistical mechanics," Physical Review, vol. 106, no. 4, pp. 620–630, 1957.

[36] E. T. Jaynes, "Information theory and statistical mechanics. II," Physical Review, vol. 108, no. 2, pp. 171–190, 1957.

[37] A. E. Eriksson, W. A. Baase, X.-J. Zhang et al., "Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect," Science, vol. 255, no. 5041, pp. 178–183, 1992.

[38] B. Honig and A.-S. Yang, "Free energy balance in protein folding," Advances in Protein Chemistry, vol. 46, pp. 27–58, 1995.

[39] R. Vita, J. A. Overton, J. A. Greenbaum et al., "The immune epitope database (IEDB) 3.0," Nucleic Acids Research, vol. 43, no. 1, pp. D405–D412, 2015.

[40] S. E. C. Caoili, "A structural-energetic basis for B-cell epitope prediction," Protein and Peptide Letters, vol. 13, no. 7, pp. 743–751, 2006.

[41] S. E. C. Caoili, "Benchmarking B-cell epitope prediction for the design of peptide-based vaccines: problems and prospects," Journal of Biomedicine and Biotechnology, vol. 2010, Article ID 910524, 14 pages, 2010.

[42] S. P. Edgcomb and K. P. Murphy, "Structural energetics of protein folding and binding," Current Opinion in Biotechnology, vol. 11, no. 1, pp. 62–66, 2000.

[43] S. E. C. Caoili, "On the meaning of affinity limits in B-cell epitope prediction for antipeptide antibody-mediated immunity," Advances in Bioinformatics, vol. 2012, Article ID 346765, 17 pages, 2012.

[44] N. Dalchau, A. Phillips, L. D. Goldstein et al., "A peptide filtering relation quantifies MHC class I peptide optimization," PLoS Computational Biology, vol. 7, no. 10, Article ID e1002144, 2011.

[45] L. J. Edwards and B. D. Evavold, "T cell recognition of weak ligands: roles of signaling, receptor number, and affinity," Immunologic Research, vol. 50, no. 1, pp. 39–48, 2011.



[Figure 6, panels (a), (b), and (c): plots of sequence count or sequence count − reduced count (logarithmic axis, 0.001 to 1000) against sequence length (0 to 50).]

Figure 6: Peptidic epitope sequence data from IEDB for B-cell assays (a) and for T-cell assays of MHC restriction class I (b) or II (c). Sequence counts ("◻") are total epitope counts in IEDB. Corresponding reduced counts ($r$ in (1)) are expressed as the difference (sequence count − reduced count), with functional similarity equated with either fractional aligned-sequence identity ("×"; (3) and (4)) or the Shannon information entropy for differential epitope binding (a second marker whose glyph is missing from the source text, referred to here as the entropy marker; (5) through (9)). Missing "◻" indicates sequence count = 0 (e.g., for B-cell epitope sequence length < 4 in (a)); where "◻" is present, a missing "×" or entropy marker indicates sequence count − reduced count = 0 (e.g., with missing entropy marker for B-cell epitope sequence length = 5 in (a)). Both "×" and the entropy marker are missing where sequence count = 1 (e.g., for B-cell epitope sequence length = 43 in (a)), as reduced count = 1 for sequence count = 1.

for cross-reactivity). These considerations could guide the development of immunization strategies (e.g., via vaccination) and immunodiagnostics (e.g., to produce peptidic constructs suitable for eliciting anti-peptide antibodies that either protect against disease or serve to detect particular antigens in biological samples via immunoassays), noting that the practical significance of immunologic specificity and cross-reactivity is highly context-dependent.

From the perspective of immunization, immunity (e.g., induced via vaccination or passive transfer of immune-system components) may be useful if specifically directed against a pathogen epitope so as to favor pathogen elimination without producing harmful hypersensitivity reactions (e.g., in the form of allergy or autoimmunity). Yet such immunity might be even more useful if it also favored elimination of multiple antigenically related variant pathogens via cross-reaction (such that a single vaccine epitope elicited the production of immune-system components each cross-reactive with multiple variant pathogen epitopes) while still avoiding hypersensitivity. Cross-reactivity of immune responses is thus potentially useful within limits defined by risk of harmful hypersensitivity (e.g., due to cross-reaction with host self-antigens). As regards immunodiagnosis, an immunodiagnostic test may be useful if it enables specific pathogen detection via discrimination between a pathogen epitope and a set of other biologically relevant epitopes (e.g., of the host or its commensal symbionts in health, or of other pathogens for which the clinical implications are very different). Yet the test might be even more useful if it also enabled detection of multiple antigenically related variant pathogens (such that a single immunologic probe detected multiple variant pathogen epitopes) while still retaining its discriminatory power with respect to other biologically relevant epitopes, particularly where the variant pathogens are very similar to one another in terms of their clinical implications (e.g., prognosis and therapeutic options). Hence, cross-reactivity strongly influences the outcomes of both immunization and immunodiagnosis, for which reason their development could be supported by computational tools that estimate epitope-binding affinity for cross-reactions.

In order to estimate the epitope-binding affinity of an immune-system component (e.g., antibody) elicited in response to (and thus having a binding site for) immunogen epitope $i$ for antigen epitope $j$, one possible approach would be to express affinity in terms of association constants, for example, as

$$K_{ij} = K_{ii} Z_{ij}, \tag{10}$$

where $K_{ij}$ is the association constant for cross-reaction with antigen epitope $j$, $K_{ii}$ is the association constant for reaction with immunogen epitope $i$, and $Z_{ij}$ is a correction term to account for the difference between the two association constants. In turn, $K_{ii}$ might be estimated as

$$K_{ii} = \exp\left(-\frac{\Delta G_{ii}}{RT}\right), \tag{11}$$

where $\Delta G_{ii}$ is the free-energy change for reaction with immunogen epitope $i$, while both $R$ and $T$ retain their meanings as in (7). In cases where epitopes $i$ and $j$ are both $m$ residues in length, $Z_{ij}$ might be estimated as

$$Z_{ij} = \begin{cases} \prod_{k=1}^{m} s_{ijk}, & \text{if } \prod_{k=1}^{m} s_{ijk} > K_{ii}^{-1}, \\ K_{ii}^{-1}, & \text{otherwise,} \end{cases} \tag{12}$$

where the product retains its meaning as in (6), such that $K_{ij} \ge 1$ (with $K_{ij} = 1$ for nonspecific binding). $\Delta G_{ii}$ could be estimated from epitope structure (e.g., using structural energetics [40–42]) subject to mechanism-specific constraints (e.g., on B-cell affinity maturation [43]), and $s_{ijk}$ could be estimated along similar lines with regard to $\Delta\Delta G_{ijk}$ in (7).

The preceding discussion has alluded to epitope binding as if it were a binary interaction, which is strictly true only for B-cell epitopes (e.g., bound by surface immunoglobulin or antibody), but in the context of T-cell epitopes, the epitope-binding site is to be understood as a bipartite entity consisting of an MHC molecule and a T-cell receptor (TCR), such that the epitope becomes at least partly confined between the two binding-site components. The overall process of T-cell recognition is subject to thermodynamic and kinetic constraints on initial MHC-peptide binding [44] and subsequent TCR recognition of MHC-peptide complex [45], which potentially complicate prediction of T-cell cross-reactivity and functional consequences thereof (e.g., considering how binding affinity and receptor numbers interact to influence T-cell effector function [45]). Still, predicting B-cell epitope cross-reactivity is challenging in view of the greater structural diversity of B-cell epitopes, especially where only a fraction of the epitope surface contributes to the epitope-paratope interface, such that cross-reaction may occur with variations in epitope structure beyond the interface. Hence, although the analysis herein for continuous epitopes might be generalized to discontinuous epitopes, with indexing of residues by relative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignment gaps, correlating variations in epitope-residue structure with potential for cross-reactivity would be nontrivial.
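As a minimal sketch of how (10) through (12) fit together, the following Python functions compute a cross-reaction association constant from a free-energy change and a list of per-residue terms. The function names, the kJ/mol units, the default temperature of 310.15 K, and all numerical inputs are assumptions made for illustration only; the per-residue terms $s_{ijk}$ are taken as given numbers in the interval (0, 1] rather than derived from (6) and (7).

```python
import math

# Gas constant in kJ/(mol*K); the unit choice is an assumption of this sketch.
R = 8.314462618e-3


def association_constant(delta_g_ii, temperature=310.15):
    """Equation (11): K_ii = exp(-dG_ii / (R*T)), with dG_ii in kJ/mol."""
    return math.exp(-delta_g_ii / (R * temperature))


def correction_term(s_ijk, k_ii):
    """Equation (12): product of the per-residue terms, floored at 1/K_ii."""
    product = math.prod(s_ijk)
    floor = 1.0 / k_ii
    return product if product > floor else floor


def cross_reaction_constant(delta_g_ii, s_ijk, temperature=310.15):
    """Equation (10): K_ij = K_ii * Z_ij."""
    k_ii = association_constant(delta_g_ii, temperature)
    return k_ii * correction_term(s_ijk, k_ii)


if __name__ == "__main__":
    dg = -40.0  # hypothetical free-energy change in kJ/mol
    print(association_constant(dg))                        # K_ii
    print(cross_reaction_constant(dg, [1.0, 0.5, 1.0]))    # partial loss of affinity
    print(cross_reaction_constant(dg, [1e-5, 1e-5, 1.0]))  # floored at nonspecific binding
```

The floor in `correction_term` is what keeps $K_{ij} \ge 1$: once the product of per-residue terms falls to $K_{ii}^{-1}$ or below, the estimate is held at the value assigned to nonspecific binding instead of dropping further.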

The analysis developed herein conceivably could be refined to better account for MHC-peptide binding as distinct from subsequent binding of MHC-peptide complex by TCR, cognizant of MHC polymorphism. Hence, MHC-peptide binding could be analyzed with explicit consideration of pertinent MHC structural details (e.g., noting that unfavorable peptide backbone conformations may preclude accommodation of peptide within the MHC binding cleft and emphasizing the potential affinity contributions of prospective anchor residues on peptides vis-a-vis corresponding MHC binding pockets). Subsequent binding of MHC-peptide complex by TCR could be analyzed separately, with emphasis on peptide surfaces accessible for contact by TCR (e.g., excluding those buried within MHC binding pockets). This could be more readily applied to class I rather than class II MHC molecules, as the latter typically have binding clefts that allow for variable-length peptide overhangs. The analysis for MHC-peptide binding might be extended for class II via docking simulations to predict the MHC-bound conformations and affinity contributions of the overhangs, after which the binding of MHC-peptide complex by TCR could be analyzed with inclusion of any overhang surfaces that are likely to be accessible for contact by TCR.
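The proposed separation of MHC-peptide binding from TCR binding can be illustrated with a small Python sketch that splits per-residue terms into an anchor group and a remaining group and forms a product over each. This is not a method given in the present work: the function names, the choice of positions 2 and 9 as anchors of a nine-residue class I peptide, and the per-residue values are all hypothetical, and treating every non-anchor residue as TCR-accessible is a deliberate simplification.

```python
import math


def split_positions(length, anchor_positions):
    """Split 1-based residue positions into anchors and all remaining positions."""
    anchors = sorted({p for p in anchor_positions if 1 <= p <= length})
    others = [p for p in range(1, length + 1) if p not in anchors]
    return anchors, others


def stage_products(s_by_position, anchor_positions):
    """Return (MHC-stage product, TCR-stage product) of per-residue terms.

    s_by_position[k - 1] is the hypothetical per-residue term for position k.
    """
    anchors, others = split_positions(len(s_by_position), anchor_positions)
    mhc_term = math.prod(s_by_position[p - 1] for p in anchors)
    tcr_term = math.prod(s_by_position[p - 1] for p in others)
    return mhc_term, tcr_term


if __name__ == "__main__":
    # Hypothetical nonamer with a substitution at anchor position 2
    # and another at position 5.
    s = [1.0, 0.1, 1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0]
    print(stage_products(s, anchor_positions=(2, 9)))
```

Keeping the two products separate makes it possible to see whether a substitution is expected to act mainly on MHC binding or mainly on TCR recognition, which a single product over all positions would obscure.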

At a more fundamental level, apparently suboptimal complementarity (manifesting as submaximal immunologic specificity) may be an inherent tradeoff in attaining optimal immune function from the standpoint of biological fitness (e.g., enabling each immune-system component to recognize a variety of structurally distinct epitopes, albeit at the cost of binding each of these epitopes with only submaximal affinity). Investigation of this possibility may lead to insights on how immune responses might be optimally biased (e.g., via vaccination or immunotherapy). Moreover, the asymmetry of cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenic differences in terms of antigenic distance as a metric that is independent of immunization history.

4. Summary and Conclusions

In assembling epitope datasets (e.g., to develop and benchmark epitope-prediction tools), sequence redundancy among epitopes must be considered to avoid misleading results that reflect overrepresentation of functionally similar epitopes. However, potentially useful data may be needlessly discarded by excluding epitopes that share an apparently high degree of sequence similarity, as even only a single-residue difference may manifest as extreme functional dissimilarity between epitopes. The present work thus introduces a reduced epitope count, as defined using (1) and (2), to account for functional redundancy of epitope data (FRED) within an epitope dataset (such that FRED is quantified as the difference between the total and reduced epitope counts). This can be used, for example, to characterize the dataset instead of excluding epitopes with apparently similar sequences from it, thereby maximizing the use of available epitope data.

More importantly, the present work frames FRED in terms of potential antigenic cross-reactivity (rather than sequence distance) construed as functional similarity between epitopes. Hence, pairwise comparison of epitopes is performed such that each epitope pair considered is regarded as consisting of an immunogen epitope (which elicits the immune response of interest) and an antigen epitope (which is the target of cross-reactive binding in an immunoassay to evaluate the immune response). Functional similarity between epitopes is thus defined in a physicochemically and biologically meaningful way as the Shannon information entropy for differential epitope binding in (5), which is compatible with the possibility of extreme functional dissimilarity between epitopes due to seemingly minor differences in sequence. Consequently, the complement of functional similarity is not a distance (in the sense of a unique value quantifying the separation between two points in sequence space), as its value may vary with reversal of the roles (i.e., immunogen or antigen) assigned to epitopes under pairwise comparison. However, it is nonetheless a measure of epitope dissimilarity, such that summation of its values over all epitopes of a dataset according to (2) enables calculation of an averaged contribution of each epitope to the reduced epitope count. This circumvents the conventional dichotomous labeling of epitopes as either redundant or nonredundant based on an arbitrarily selected threshold value of sequence similarity. Such labeling is problematic in that selection of the actual threshold value is difficult to justify on physicochemically and biologically meaningful grounds and because an epitope thus might be labeled as redundant on the basis of similarity to a minority of other epitopes in the dataset.
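The two points made above, that the complement of functional similarity need not be symmetric and that a reduced count can be obtained without a redundancy threshold, can be illustrated with a short Python sketch. Equations (1) and (2) are not reproduced in this section, so the formula in `reduced_count` is only an assumed stand-in: it averages the pairwise dissimilarities $1 - s_{ij}$ and is chosen so that a single epitope gives a reduced count of 1 (as stated for Figure 6), a set of functionally identical epitopes also gives 1, and a set of mutually dissimilar epitopes gives the total count. The similarity matrix is hypothetical.

```python
def is_symmetric(s, tol=1e-9):
    """Check whether similarity is unchanged when immunogen and antigen roles swap."""
    n = len(s)
    return all(abs(s[i][j] - s[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))


def reduced_count(s):
    """Assumed stand-in for (1) and (2): 1 + (1/n) * sum over i != j of (1 - s[i][j]).

    s[i][j] is the functional similarity with epitope i as immunogen
    and epitope j as antigen; s[i][i] is taken to be 1.
    """
    n = len(s)
    total_dissimilarity = sum(1.0 - s[i][j]
                              for i in range(n) for j in range(n) if i != j)
    return 1.0 + total_dissimilarity / n


def fred(s):
    """FRED as the difference between total and reduced epitope counts."""
    return len(s) - reduced_count(s)


if __name__ == "__main__":
    # Hypothetical similarities: epitopes 0 and 1 cross-react asymmetrically,
    # and epitope 2 is unrelated to both.
    s = [
        [1.0, 0.9, 0.0],
        [0.2, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]
    print(is_symmetric(s))
    print(reduced_count(s))
    print(fred(s))
```

Because every pairwise value contributes to the sum, no epitope is labeled outright as redundant or nonredundant, and an epitope that resembles only a minority of the dataset lowers the count only slightly.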

In line with the preceding considerations, epitope functional similarity may be estimated in terms of differences between immunogen- and antigen-epitope structure relative to an idealized binding site of high complementarity to the immunogen epitope. However, this tends to underestimate potential for cross-reactivity, which suggests that epitope-binding site complementarity is typically suboptimal. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes. Such cross-reactivity conceivably could be exploited in the development of practical applications such as vaccines and immunodiagnostics (e.g., to aim for beneficially broad cross-reactivity rather than overly extreme immunologic specificity).


Page 11: Research Article Expressing Redundancy among Linear

Advances in Bioinformatics 11

with immunogen epitope $i$ and $Z_{ij}$ is a correction term to account for the difference between the two association constants. In turn, $K_{ii}$ might be estimated as

$$K_{ii} = \exp\left(-\frac{\Delta G_{ii}}{RT}\right), \tag{11}$$

where $\Delta G_{ii}$ is the free-energy change for reaction with immunogen epitope $i$, while both $R$ and $T$ retain their meanings as in (7). In cases where epitopes $i$ and $j$ are both $m$ residues in length, $Z_{ij}$ might be estimated as

$$Z_{ij} = \begin{cases} \displaystyle\prod_{k=1}^{m} s_{ijk}, & \text{if } \displaystyle\prod_{k=1}^{m} s_{ijk} > K_{ii}^{-1}, \\ K_{ii}^{-1}, & \text{otherwise}, \end{cases} \tag{12}$$

where the product retains its meaning as in (6), such that $K_{ij} \geq 1$ (with $K_{ij} = 1$ for nonspecific binding). $\Delta G_{ii}$ could be estimated from epitope structure (e.g., using structural energetics [40–42]) subject to mechanism-specific constraints (e.g., on B-cell affinity maturation [43]), and $s_{ijk}$ could be estimated along similar lines with regard to $\Delta\Delta G_{ijk}$ in (7).

The preceding discussion has alluded to epitope binding as if it were a binary interaction, which is strictly true only for B-cell epitopes (e.g., bound by surface immunoglobulin or antibody); but in the context of T-cell epitopes, the epitope-binding site is to be understood as a bipartite entity consisting of an MHC molecule and a T-cell receptor (TCR), such that the epitope becomes at least partly confined between the two binding-site components. The overall process of T-cell recognition is subject to thermodynamic and kinetic constraints on initial MHC-peptide binding [44] and subsequent TCR recognition of MHC-peptide complex [45], which potentially complicate prediction of T-cell cross-reactivity and functional consequences thereof (e.g., considering how binding affinity and receptor numbers interact to influence T-cell effector function [45]). Still, predicting B-cell epitope cross-reactivity is challenging in view of the greater structural diversity of B-cell epitopes, especially where only a fraction of the epitope surface contributes to the epitope-paratope interface, such that cross-reaction may occur with variations in epitope structure beyond the interface. Hence, although the analysis herein for continuous epitopes might be generalized to discontinuous epitopes with indexing of residues by relative sequence positions (instead of consecutive sequence-position numbers) that accounts for any sequence-alignment gaps, correlating variations in epitope-residue structure with potential for cross-reactivity would be nontrivial.
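As a rough illustration of how (11) and (12) might be evaluated numerically, the following Python sketch computes $K_{ii}$ from a free-energy change and then applies the floor in (12). Several things here are assumptions rather than part of the original analysis: the function names, the choice of kJ/mol as the energy unit, the default temperature, and the example numbers are all hypothetical, and the last function assumes that the correction term is applied multiplicatively ($K_{ij} = Z_{ij} K_{ii}$), which is consistent with the stated bound $K_{ij} \geq 1$ but is not spelled out in this part of the text. The per-residue terms $s_{ijk}$ are taken as given inputs.

```python
import math

# Molar gas constant in kJ/(mol*K); the unit choice is an assumption of this sketch.
R_KJ_PER_MOL_K = 8.314462618e-3


def association_constant(delta_g_kj_per_mol, temperature_k=298.15):
    """Equation (11): K_ii = exp(-dG_ii / (R * T))."""
    return math.exp(-delta_g_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k))


def correction_term(s_values, k_ii):
    """Equation (12): product of the s_ijk terms, floored at 1 / K_ii."""
    product = math.prod(s_values)
    floor = 1.0 / k_ii
    return product if product > floor else floor


def cross_reaction_constant(s_values, k_ii):
    """Assumed relation K_ij = Z_ij * K_ii (not shown in this section)."""
    return correction_term(s_values, k_ii) * k_ii


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise both branches of (12).
    k_ii = association_constant(-40.0)
    mild = cross_reaction_constant([0.5, 1.0, 1.0, 1.0, 1.0], k_ii)
    severe = cross_reaction_constant([1e-3] * 5, k_ii)
    print(f"K_ii = {k_ii:.3g}, mild K_ij = {mild:.3g}, severe K_ij = {severe:.3g}")
```

With the floor in place, a sufficiently unfavorable set of substitutions drives the estimate down to the nonspecific-binding value of 1 and no further, which is the behavior described for (12).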

The analysis developed herein conceivably could be refined to better account for MHC-peptide binding as distinct from subsequent binding of MHC-peptide complex by TCR, cognizant of MHC polymorphism. Hence, MHC-peptide binding could be analyzed with explicit consideration of pertinent MHC structural details (e.g., noting that unfavorable peptide backbone conformations may preclude accommodation of peptide within the MHC binding cleft and emphasizing the potential affinity contributions of prospective anchor residues on peptides vis-à-vis corresponding MHC binding pockets). Subsequent binding of MHC-peptide complex by TCR could be analyzed separately, with emphasis on peptide surfaces accessible for contact by TCR (e.g., excluding those buried within MHC binding pockets). This could be more readily applied to class I rather than class II MHC molecules, as the latter typically have binding clefts that allow for variable-length peptide overhangs. The analysis for MHC-peptide binding might be extended for class II via docking simulations to predict the MHC-bound conformations and affinity contributions of the overhangs, after which the binding of MHC-peptide complex by TCR could be analyzed with inclusion of any overhang surfaces that are likely to be accessible for contact by TCR.

At a more fundamental level, apparently suboptimal complementarity (manifesting as submaximal immunologic specificity) may be an inherent tradeoff in attaining optimal immune function from the standpoint of biological fitness (e.g., enabling each immune-system component to recognize a variety of structurally distinct epitopes, albeit at the cost of binding each of these epitopes with only submaximal affinity). Investigation of this possibility may lead to insights on how immune responses might be optimally biased (e.g., via vaccination or immunotherapy). Moreover, the asymmetry of cross-reaction with respect to immunogen- and antigen-epitope structures cautions against conceptualizing antigenic differences in terms of antigenic distance as a metric that is independent of immunization history.

4. Summary and Conclusions

In assembling epitope datasets (e.g., to develop and benchmark epitope-prediction tools), sequence redundancy among epitopes must be considered to avoid misleading results that reflect overrepresentation of functionally similar epitopes. However, potentially useful data may be needlessly discarded by excluding epitopes that share an apparently high degree of sequence similarity, as even only a single-residue difference may manifest as extreme functional dissimilarity between epitopes. The present work thus introduces a reduced epitope count, as defined using (1) and (2), to account for functional redundancy of epitope data (FRED) within an epitope dataset (such that FRED is quantified as the difference between the total and reduced epitope counts). This can be used, for example, to characterize the dataset instead of excluding epitopes with apparently similar sequences from it, thereby maximizing the use of available epitope data.

More importantly, the present work frames FRED in terms of potential antigenic cross-reactivity (rather than sequence distance) construed as functional similarity between epitopes. Hence, pairwise comparison of epitopes is performed such that each epitope pair considered is regarded as consisting of an immunogen epitope (which elicits the immune response of interest) and an antigen epitope (which is the target of cross-reactive binding in an immunoassay to evaluate the immune response). Functional similarity between epitopes is thus defined in a physicochemically and biologically meaningful way as the Shannon information entropy for differential epitope binding in (5), which is compatible with the possibility of extreme functional dissimilarity between epitopes due to seemingly minor differences in sequence. Consequently, the complement of functional similarity is not a distance (in the sense of a unique value quantifying the separation between two points in sequence space), as its value may vary with reversal of the roles (i.e., immunogen or antigen) assigned to epitopes under pairwise comparison. However, it is nonetheless a measure of epitope dissimilarity, such that summation of its values over all epitopes of a dataset according to (2) enables calculation of an averaged contribution of each epitope to the reduced epitope count. This circumvents the conventional dichotomous labeling of epitopes as either redundant or nonredundant based on an arbitrarily selected threshold value of sequence similarity. Such labeling is problematic in that selection of the actual threshold value is difficult to justify on physicochemically and biologically meaningful grounds, and because an epitope thus might be labeled as redundant on the basis of similarity to a minority of other epitopes in the dataset.
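The point that the complement of functional similarity is not a distance can be checked mechanically on any table of pairwise values. The Python sketch below is only an illustration under stated assumptions: the similarity values are invented, rows are taken to index the immunogen epitope and columns the antigen epitope, and the complement is computed as one minus the similarity, which presumes similarities scaled to the interval from 0 to 1 (a scaling not stated in this section). The function names are hypothetical, and the sketch does not implement (2) or (5).

```python
def complement(similarity):
    """Return 1 - similarity entrywise (assumes values lie in [0, 1])."""
    return [[1.0 - value for value in row] for row in similarity]


def asymmetric_pairs(matrix, tol=1e-9):
    """List index pairs (i, j), i < j, where matrix[i][j] != matrix[j][i]."""
    n = len(matrix)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(matrix[i][j] - matrix[j][i]) > tol
    ]


# Invented values: row index = immunogen epitope, column index = antigen epitope.
similarity = [
    [1.0, 0.9, 0.0],
    [0.2, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
dissimilarity = complement(similarity)

# Any pair reported here violates the symmetry axiom required of a metric.
print(asymmetric_pairs(dissimilarity))
```

Any pair reported by `asymmetric_pairs` has a dissimilarity that depends on which epitope plays the immunogen role, so the table as a whole cannot serve as a metric even though each entry remains usable as a dissimilarity value.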

In line with the preceding considerations, epitope functional similarity may be estimated in terms of differences between immunogen- and antigen-epitope structure relative to an idealized binding site of high complementarity to the immunogen epitope. However, this tends to underestimate potential for cross-reactivity, which suggests that epitope-binding site complementarity is typically suboptimal. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes. Such cross-reactivity conceivably could be exploited in the development of practical applications such as vaccines and immunodiagnostics (e.g., to aim for beneficially broad cross-reactivity rather than overly extreme immunologic specificity).

Competing Interests

The author declares that they have no competing interests.

Acknowledgments

This work was supported by an Angelita T. Reyes Centennial Professorial Chair grant awarded to the author.

References

[1] S. E. C. Caoili, "Antidotes, antibody-mediated immunity and the future of pharmaceutical product development," Human Vaccines and Immunotherapeutics, vol. 9, no. 2, pp. 294–299, 2013.

[2] S. E. C. Caoili, "Beyond new chemical entities: advancing drug development based on functional versatility of antibodies," Human Vaccines and Immunotherapeutics, vol. 10, no. 6, pp. 1639–1644, 2014.

[3] B. Greenwood, "The contribution of vaccination to global health: past, present and future," Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, vol. 369, no. 1645, Article ID 20130433, 2014.

[4] M. H. V. Van Regenmortel, "What is a B-cell epitope?" Methods in Molecular Biology, vol. 524, pp. 3–20, 2009.

[5] M. H. V. Regenmortel, "Specificity, polyspecificity, and heterospecificity of antibody-antigen recognition," Journal of Molecular Recognition, vol. 27, no. 11, pp. 627–639, 2014.

[6] N. Tomar and R. K. De, "Immunoinformatics: a brief review," Methods in Molecular Biology, vol. 1184, pp. 23–55, 2014.

[7] S. E. C. Caoili, "An integrative structure-based framework for predicting biological effects mediated by antipeptide antibodies," Journal of Immunological Methods, vol. 427, pp. 19–29, 2015.

[8] S. E. C. Caoili, "Immunization with peptide-protein conjugates: impact on benchmarking B-cell epitope prediction for vaccine design," Protein and Peptide Letters, vol. 17, no. 3, pp. 386–398, 2010.

[9] S. E. C. Caoili, "Hybrid methods for B-cell epitope prediction," Methods in Molecular Biology, vol. 1184, pp. 245–283, 2014.

[10] S. E. C. Caoili, "Benchmarking B-cell epitope prediction with quantitative dose-response data on antipeptide antibodies: towards novel pharmaceutical product development," BioMed Research International, vol. 2014, Article ID 867905, 13 pages, 2014.

[11] U. Hobohm, M. Scharf, R. Schneider, and C. Sander, "Selection of representative protein data sets," Protein Science, vol. 1, no. 3, pp. 409–417, 1992.

[12] U. Lessel and D. Schomburg, "Creation and characterization of a new, non-redundant fragment data bank," Protein Engineering, vol. 10, no. 6, pp. 659–664, 1997.

[13] T. Noguchi and Y. Akiyama, "PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003," Nucleic Acids Research, vol. 31, no. 1, pp. 492–493, 2003.

[14] P. Motte, G. Alberici, M. Ait-Abdellah, and D. Bellet, "Monoclonal antibodies distinguish synthetic peptides that differ in one chemical group," The Journal of Immunology, vol. 138, no. 10, pp. 3332–3338, 1987.

[15] M. K. Gilson and S. E. Radford, "Protein folding and binding: from biology to physics and back again," Current Opinion in Structural Biology, vol. 21, no. 1, pp. 1–3, 2011.

[16] B. Rost, "Twilight zone of protein sequence alignments," Protein Engineering, vol. 12, no. 2, pp. 85–94, 1999.

[17] B. Y. Khor, G. J. Tye, T. S. Lim, and Y. S. Choong, "General overview on structure prediction of twilight-zone proteins," Theoretical Biology and Medical Modelling, vol. 12, article 15, 2015.

[18] H. Nakamura, "Roles of electrostatic interaction in proteins," Quarterly Reviews of Biophysics, vol. 29, no. 1, pp. 1–90, 1996.

[19] P. J. Fleming and G. D. Rose, "Do all backbone polar groups in proteins form hydrogen bonds?" Protein Science, vol. 14, no. 7, pp. 1911–1917, 2005.

[20] S. Basu, D. Bhattacharyya, and R. Banerjee, "Self-complementarity within proteins: bridging the gap between binding and folding," Biophysical Journal, vol. 102, no. 11, pp. 2605–2614, 2012.

[21] Y. Harpaz, M. Gerstein, and C. Chothia, "Volume changes on protein folding," Structure, vol. 2, no. 7, pp. 641–649, 1994.

[22] A. Handisurya, S. Gilch, D. Winter, et al., "Vaccination with prion peptide-displaying papillomavirus-like particles induces autoantibodies to normal prion protein that interfere with pathologic prion protein production in infected cells," FEBS Journal, vol. 274, no. 7, pp. 1747–1758, 2007.

[23] M. Plebanski, E. A. M. Lee, C. M. Hannan, et al., "Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming," Nature Medicine, vol. 5, no. 5, pp. 565–571, 1999.

[24] E. Michaelsson, M. Andersson, A. Engstrom, and R. Holmdahl, "Identification of an immunodominant type-II collagen peptide recognized by T cells in H-2q mice: self tolerance at the level of determinant selection," European Journal of Immunology, vol. 22, no. 7, pp. 1819–1825, 1992.

[25] J. M. Greer, C. Klinguer, E. Trifilieff, R. A. Sobel, and M. B. Lees, "Encephalitogenicity of murine, but not bovine, DM20 in SJL mice is due to a single amino acid difference in the immunodominant encephalitogenic epitope," Neurochemical Research, vol. 22, no. 4, pp. 541–547, 1997.

[26] A. Gaur, S. A. Boehme, D. Chalmers, et al., "Amelioration of relapsing experimental autoimmune encephalomyelitis with altered myelin basic protein peptides involves different cellular mechanisms," Journal of Neuroimmunology, vol. 74, no. 1-2, pp. 149–158, 1997.

[27] A. T. Kozhich, Y.-I. Kawano, C. E. Egwuagu, et al., "A pathogenic autoimmune process targeted at a surrogate epitope," The Journal of Experimental Medicine, vol. 180, no. 1, pp. 133–140, 1994.

[28] H. Masaki, M. Tamura, and I. Kurane, "Induction of cytotoxic T lymphocytes of heterogeneous specificities by immunization with a single peptide derived from influenza A virus," Viral Immunology, vol. 13, no. 1, pp. 73–81, 2000.

[29] G. B. Lipford, S. Bauer, H. Wagner, and K. Heeg, "Peptide engineering allows cytotoxic T-cell vaccination against human papilloma virus tumour antigen E6," Immunology, vol. 84, no. 2, pp. 298–303, 1995.

[30] C. L. Keech, K. C. Pang, J. McCluskey, and W. Chen, "Direct antigen presentation by DC shapes the functional CD8+ T-cell repertoire against the nuclear self-antigen La-SSB," European Journal of Immunology, vol. 40, no. 2, pp. 330–338, 2010.

[31] S. Tangri, G. Y. Ishioka, X. Huang, et al., "Structural features of peptide analogs of human histocompatibility leukocyte antigen class I epitopes that are more potent and immunogenic than wild-type peptide," Journal of Experimental Medicine, vol. 194, no. 6, pp. 833–846, 2001.

[32] S. J. Turner and F. R. Carbone, "A dominant Vβ bias in the CTL response after HSV-1 infection is determined by peptide residues predicted to also interact with the TCR β-chain CDR3," Molecular Immunology, vol. 35, no. 5, pp. 307–316, 1998.

[33] E. Lazoura, J. Lodding, W. Farrugia, et al., "Enhanced major histocompatibility complex class I binding and immune responses through anchor modification of the non-canonical tumour-associated mucin 1-8 peptide," Immunology, vol. 119, no. 3, pp. 306–316, 2006.

[34] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.

[35] E. T. Jaynes, "Information theory and statistical mechanics," Physical Review, vol. 106, no. 4, pp. 620–630, 1957.

[36] E. T. Jaynes, "Information theory and statistical mechanics. II," Physical Review, vol. 108, no. 2, pp. 171–190, 1957.

[37] A. E. Eriksson, W. A. Baase, X.-J. Zhang, et al., "Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect," Science, vol. 255, no. 5041, pp. 178–183, 1992.

[38] B. Honig and A.-S. Yang, "Free energy balance in protein folding," Advances in Protein Chemistry, vol. 46, pp. 27–58, 1995.

[39] R. Vita, J. A. Overton, J. A. Greenbaum, et al., "The immune epitope database (IEDB) 3.0," Nucleic Acids Research, vol. 43, no. 1, pp. D405–D412, 2015.

[40] S. E. C. Caoili, "A structural-energetic basis for B-cell epitope prediction," Protein and Peptide Letters, vol. 13, no. 7, pp. 743–751, 2006.

[41] S. E. C. Caoili, "Benchmarking B-cell epitope prediction for the design of peptide-based vaccines: problems and prospects," Journal of Biomedicine and Biotechnology, vol. 2010, Article ID 910524, 14 pages, 2010.

[42] S. P. Edgcomb and K. P. Murphy, "Structural energetics of protein folding and binding," Current Opinion in Biotechnology, vol. 11, no. 1, pp. 62–66, 2000.

[43] S. E. C. Caoili, "On the meaning of affinity limits in B-cell epitope prediction for antipeptide antibody-mediated immunity," Advances in Bioinformatics, vol. 2012, Article ID 346765, 17 pages, 2012.

[44] N. Dalchau, A. Phillips, L. D. Goldstein, et al., "A peptide filtering relation quantifies MHC class I peptide optimization," PLoS Computational Biology, vol. 7, no. 10, Article ID e1002144, 2011.

[45] L. J. Edwards and B. D. Evavold, "T cell recognition of weak ligands: roles of signaling, receptor number, and affinity," Immunologic Research, vol. 50, no. 1, pp. 39–48, 2011.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 12: Research Article Expressing Redundancy among Linear

12 Advances in Bioinformatics

in sequence Consequently the complement of functionalsimilarity is not a distance (in the sense of a unique valuequantifying the separation between two points in sequencespace) as its value may vary with reversal of the roles (ieimmunogen or antigen) assigned to epitopes under pairwisecomparison However it is nonetheless a measure of epitopedissimilarity such that summation of its values over allepitopes of a dataset according to (2) enables calculation of anaveraged contribution of each epitope to the reduced epitopecountThis circumvents the conventional dichotomous label-ing of epitopes as either redundant or nonredundant based onan arbitrarily selected threshold value of sequence similaritySuch labeling is problematic in that selection of the actualthreshold value is difficult to justify on physicochemically andbiologically meaningful grounds and because an epitope thusmight be labeled as redundant on the basis of similarity to aminority of other epitopes in the dataset

In line with the preceding considerations epitope func-tional similarity may be estimated in terms of differencesbetween immunogen- and antigen-epitope structure relativeto an idealized binding site of high complementarity to theimmunogen epitope However this tends to underestimatepotential for cross-reactivity which suggests that epitope-binding site complementarity is typically suboptimal Theapparently suboptimal complementarity may reflect a trade-off to attain optimal immune function that favors genera-tion of immune-system components each having potentialfor cross-reactivity with a variety of epitopes Such cross-reactivity conceivably could be exploited in the developmentof practical applications such as vaccines and immunodi-agnostics (eg to aim for beneficially broad cross-reactivityrather than overly extreme immunologic specificity)

Competing Interests

The author declares that they have no competing interests

Acknowledgments

This work was supported by an Angelita T Reyes CentennialProfessorial Chair grant awarded to the author

References

[1] S E C Caoili ldquoAntidotes antibody-mediated immunity andthe future of pharmaceutical product developmentrdquo HumanVaccines and Immunotherapeutics vol 9 no 2 pp 294ndash2992013

[2] S E C Caoili ldquoBeyond new chemical entities advancing drugdevelopment based on functional versatility of antibodiesrdquoHuman Vaccines and Immunotherapeutics vol 10 no 6 pp1639ndash1644 2014

[3] B Greenwood ldquoThe contribution of vaccination to globalhealth past present and futurerdquo Philosophical Transactions ofthe Royal Society of London Series B Biological Sciences vol 369no 1645 Article ID 20130433 2014

[4] M H V Van Regenmortel ldquoWhat is a B-cell epitoperdquoMethodsin Molecular Biology vol 524 pp 3ndash20 2009

[5] M H V Regenmortel ldquoSpecificity polyspecificity and het-erospecificity of antibody-antigen recognitionrdquo Journal ofMolecular Recognition vol 27 no 11 pp 627ndash639 2014

[6] N Tomar and R K De ldquoImmunoinformatics a brief reviewrdquoMethods in Molecular Biology vol 1184 pp 23ndash55 2014

[7] S E C Caoili ldquoAn integrative structure-based framework forpredicting biological effects mediated by antipeptide antibod-iesrdquo Journal of Immunological Methods vol 427 pp 19ndash29 2015

[8] S E C Caoili ldquoImmunization with peptide-protein conjugatesimpact on benchmarking B-cell epitope prediction for vaccinedesignrdquo Protein and Peptide Letters vol 17 no 3 pp 386ndash3982010

[9] S E C Caoili ldquoHybrid methods for B-cell epitope predictionrdquoMethods in Molecular Biology vol 1184 pp 245ndash283 2014

[10] S E C Caoili ldquoBenchmarking B-cell epitope prediction withquantitative dose-response data on antipeptide antibodiestowards novel pharmaceutical product developmentrdquo BioMedResearch International vol 2014 Article ID 867905 13 pages2014

[11] U Hobohm M Scharf R Schneider and C Sander ldquoSelectionof representative protein data setsrdquo Protein Science vol 1 no 3pp 409ndash417 1992

[12] U Lessel andD Schomburg ldquoCreation and characterization of anew non-redundant fragment data bankrdquo Protein Engineeringvol 10 no 6 pp 659ndash664 1997

[13] T Noguchi and Y Akiyama ldquoPDB-REPRDB a database ofrepresentative protein chains from the Protein Data Bank(PDB) in 2003rdquo Nucleic Acids Research vol 31 no 1 pp 492ndash493 2003

[14] P Motte G Alberici M Ait-Abdellah and D Bellet ldquoMono-clonal antibodies distinguish synthetic peptides that differ inone chemical grouprdquo The Journal of Immunology vol 138 no10 pp 3332ndash3338 1987

[15] M K Gilson and S E Radford ldquoProtein folding and bindingfrom biology to physics and back againrdquo Current Opinion inStructural Biology vol 21 no 1 pp 1ndash3 2011

[16] B Rost ldquoTwilight zone of protein sequence alignmentsrdquo ProteinEngineering vol 12 no 2 pp 85ndash94 1999

[17] B Y Khor G J Tye T S Lim and Y S Choong ldquoGeneraloverview on structure prediction of twilight-zone proteinsrdquoTheoretical Biology and Medical Modelling vol 12 article 152015

[18] H Nakamura ldquoRoles of electrostatic interaction in proteinsrdquoQuarterly Reviews of Biophysics vol 29 no 1 pp 1ndash90 1996

[19] P J Fleming and G D Rose ldquoDo all backbone polar groups inproteins form hydrogen bondsrdquo Protein Science vol 14 no 7pp 1911ndash1917 2005

[20] S Basu D Bhattacharyya and R Banerjee ldquoSelf-complementarity within proteins bridging the gap betweenbinding and foldingrdquo Biophysical Journal vol 102 no 11 pp2605ndash2614 2012

[21] Y Harpaz M Gerstein and C Chothia ldquoVolume changes onprotein foldingrdquo Structure vol 2 no 7 pp 641ndash649 1994

[22] A Handisurya S Gilch D Winter et al ldquoVaccination withprion peptide-displaying papillomavirus-like particles inducesautoantibodies to normal prion protein that interfere withpathologic prion protein production in infected cellsrdquo FEBSJournal vol 274 no 7 pp 1747ndash1758 2007

[23] M Plebanski E AM Lee CMHannan et al ldquoAltered peptideligands narrow the repertoire of cellular immune responses byinterfering with T-cell primingrdquo Nature Medicine vol 5 no 5pp 565ndash571 1999

Advances in Bioinformatics 13

[24] E Michaelsson M Andersson A Engstrom and R HolmdahlldquoIdentification of an immunodominant type-II collagen peptiderecognized by T cells in H-2q mice self tolerance at the level ofdeterminant selectionrdquo European Journal of Immunology vol22 no 7 pp 1819ndash1825 1992

[25] J M Greer C Klinguer E Trifilieff R A Sobel and M BLees ldquoEncephalitogenicity of murine but not bovine DM20in SJL mice is due to a single amino acid difference in theimmunodominant encephalitogenic epitoperdquo NeurochemicalResearch vol 22 no 4 pp 541ndash547 1997

[26] A Gaur S A Boehme D Chalmers et al ldquoAmeliorationof relapsing experimental autoimmune encephalomyelitis withaltered myelin basic protein peptides involves different cellularmechanismsrdquo Journal of Neuroimmunology vol 74 no 1-2 pp149ndash158 1997

[27] A T Kozhich Y-I Kawano C E Egwuagu et al ldquoA pathogenicautoimmune process targeted at a surrogate epitoperdquo TheJournal of Experimental Medicine vol 180 no 1 pp 133ndash1401994

[28] H Masaki M Tamura and I Kurane ldquoInduction of cytotoxicT lymphocytes of heterogeneous specificities by immunizationwith a single peptide derived from influenza A virusrdquo ViralImmunology vol 13 no 1 pp 73ndash81 2000

[29] G B Lipford S Bauer H Wagner and K Heeg ldquoPeptideengineering allows cytotoxic T-cell vaccination against humanpapilloma virus tumour antigen E6rdquo Immunology vol 84 no2 pp 298ndash303 1995

[30] C L Keech K C Pang J McCluskey and W Chen ldquoDirectantigen presentation by DC shapes the functional CD8+ T-cellrepertoire against the nuclear self-antigen La-SSBrdquo EuropeanJournal of Immunology vol 40 no 2 pp 330ndash338 2010

[31] S Tangri G Y Ishioka X Huang et al ldquoStructural features ofpeptide analogs of human histocompatibility leukocyte antigenclass I epitopes that are more potent and immunogenic thanwild-type peptiderdquo Journal of Experimental Medicine vol 194no 6 pp 833ndash846 2001

[32] S J Turner and F R Carbone ldquoA dominant V120573 bias in theCTL response after HSV-1 infection is determined by peptideresidues predicted to also interact with the TCR 120573- chainCDR3rdquoMolecular Immunology vol 35 no 5 pp 307ndash316 1998

[33] E Lazoura J LoddingW Farrugia et al ldquoEnhancedmajor his-tocompatibility complex class I binding and immune responsesthrough anchor modification of the non-canonical tumour-associated mucin 1-8 peptiderdquo Immunology vol 119 no 3 pp306ndash316 2006

[34] C E Shannon ldquoAmathematical theory of communicationrdquoTheBell System Technical Journal vol 27 no 4 pp 623ndash656 1948

[35] E T Jaynes ldquoInformation theory and statistical mechanicsrdquoPhysical Review vol 106 no 4 pp 620ndash630 1957

[36] E T Jaynes ldquoInformation theory and statistical mechanics IIrdquoPhysical Review vol 108 no 2 pp 171ndash190 1957

[37] A E Eriksson W A Baase X-J Zhang et al ldquoResponse of aprotein structure to cavity-creating mutations and its relationto the hydrophobic effectrdquo Science vol 255 no 5041 pp 178ndash183 1992

[38] B Honig and A-S Yang ldquoFree energy balance in proteinfoldingrdquoAdvances in Protein Chemistry vol 46 pp 27ndash58 1995

[39] R Vita J A Overton J A Greenbaum et al ldquoThe immuneepitope database (IEDB) 30rdquoNucleic Acids Research vol 43 no1 pp D405ndashD412 2015

[40] S E C Caoili ldquoA structural-energetic basis for B-cell epitopepredictionrdquo Protein and Peptide Letters vol 13 no 7 pp 743ndash751 2006

[41] S E C Caoili ldquoBenchmarking B-cell epitope prediction forthe design of peptide-based vaccines problems and prospectsrdquoJournal of Biomedicine and Biotechnology vol 2010 Article ID910524 14 pages 2010

[42] S P Edgcomb and K P Murphy ldquoStructural energetics ofprotein folding and bindingrdquo Current Opinion in Biotechnologyvol 11 no 1 pp 62ndash66 2000

[43] S E C Caoili ldquoOn the meaning of affinity limits in B-cellepitope prediction for antipeptide antibody-mediated immu-nityrdquo Advances in Bioinformatics vol 2012 Article ID 34676517 pages 2012

[44] NDalchau A Phillips LDGoldstein et al ldquoA peptide filteringrelation quantifies MHC class I peptide optimizationrdquo PLoSComputational Biology vol 7 no 10 Article ID e1002144 2011

[45] L J Edwards and B D Evavold ldquoT cell recognition of weakligands roles of signaling receptor number and affinityrdquoImmunologic Research vol 50 no 1 pp 39ndash48 2011



Advances in Bioinformatics 13

[24] E. Michaëlsson, M. Andersson, A. Engström, and R. Holmdahl, “Identification of an immunodominant type-II collagen peptide recognized by T cells in H-2q mice: self tolerance at the level of determinant selection,” European Journal of Immunology, vol. 22, no. 7, pp. 1819–1825, 1992.

[25] J. M. Greer, C. Klinguer, E. Trifilieff, R. A. Sobel, and M. B. Lees, “Encephalitogenicity of murine, but not bovine, DM20 in SJL mice is due to a single amino acid difference in the immunodominant encephalitogenic epitope,” Neurochemical Research, vol. 22, no. 4, pp. 541–547, 1997.

[26] A. Gaur, S. A. Boehme, D. Chalmers et al., “Amelioration of relapsing experimental autoimmune encephalomyelitis with altered myelin basic protein peptides involves different cellular mechanisms,” Journal of Neuroimmunology, vol. 74, no. 1-2, pp. 149–158, 1997.

[27] A. T. Kozhich, Y.-I. Kawano, C. E. Egwuagu et al., “A pathogenic autoimmune process targeted at a surrogate epitope,” The Journal of Experimental Medicine, vol. 180, no. 1, pp. 133–140, 1994.

[28] H. Masaki, M. Tamura, and I. Kurane, “Induction of cytotoxic T lymphocytes of heterogeneous specificities by immunization with a single peptide derived from influenza A virus,” Viral Immunology, vol. 13, no. 1, pp. 73–81, 2000.

[29] G. B. Lipford, S. Bauer, H. Wagner, and K. Heeg, “Peptide engineering allows cytotoxic T-cell vaccination against human papilloma virus tumour antigen E6,” Immunology, vol. 84, no. 2, pp. 298–303, 1995.

[30] C. L. Keech, K. C. Pang, J. McCluskey, and W. Chen, “Direct antigen presentation by DC shapes the functional CD8+ T-cell repertoire against the nuclear self-antigen La-SSB,” European Journal of Immunology, vol. 40, no. 2, pp. 330–338, 2010.

[31] S. Tangri, G. Y. Ishioka, X. Huang et al., “Structural features of peptide analogs of human histocompatibility leukocyte antigen class I epitopes that are more potent and immunogenic than wild-type peptide,” Journal of Experimental Medicine, vol. 194, no. 6, pp. 833–846, 2001.

[32] S. J. Turner and F. R. Carbone, “A dominant Vβ bias in the CTL response after HSV-1 infection is determined by peptide residues predicted to also interact with the TCR β-chain CDR3,” Molecular Immunology, vol. 35, no. 5, pp. 307–316, 1998.

[33] E. Lazoura, J. Lodding, W. Farrugia et al., “Enhanced major histocompatibility complex class I binding and immune responses through anchor modification of the non-canonical tumour-associated mucin 1-8 peptide,” Immunology, vol. 119, no. 3, pp. 306–316, 2006.

[34] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.

[35] E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review, vol. 106, no. 4, pp. 620–630, 1957.

[36] E. T. Jaynes, “Information theory and statistical mechanics. II,” Physical Review, vol. 108, no. 2, pp. 171–190, 1957.

[37] A. E. Eriksson, W. A. Baase, X.-J. Zhang et al., “Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect,” Science, vol. 255, no. 5041, pp. 178–183, 1992.

[38] B. Honig and A.-S. Yang, “Free energy balance in protein folding,” Advances in Protein Chemistry, vol. 46, pp. 27–58, 1995.

[39] R. Vita, J. A. Overton, J. A. Greenbaum et al., “The immune epitope database (IEDB) 3.0,” Nucleic Acids Research, vol. 43, no. 1, pp. D405–D412, 2015.

[40] S. E. C. Caoili, “A structural-energetic basis for B-cell epitope prediction,” Protein and Peptide Letters, vol. 13, no. 7, pp. 743–751, 2006.

[41] S. E. C. Caoili, “Benchmarking B-cell epitope prediction for the design of peptide-based vaccines: problems and prospects,” Journal of Biomedicine and Biotechnology, vol. 2010, Article ID 910524, 14 pages, 2010.

[42] S. P. Edgcomb and K. P. Murphy, “Structural energetics of protein folding and binding,” Current Opinion in Biotechnology, vol. 11, no. 1, pp. 62–66, 2000.

[43] S. E. C. Caoili, “On the meaning of affinity limits in B-cell epitope prediction for antipeptide antibody-mediated immunity,” Advances in Bioinformatics, vol. 2012, Article ID 346765, 17 pages, 2012.

[44] N. Dalchau, A. Phillips, L. D. Goldstein et al., “A peptide filtering relation quantifies MHC class I peptide optimization,” PLoS Computational Biology, vol. 7, no. 10, Article ID e1002144, 2011.

[45] L. J. Edwards and B. D. Evavold, “T cell recognition of weak ligands: roles of signaling, receptor number, and affinity,” Immunologic Research, vol. 50, no. 1, pp. 39–48, 2011.
