

Immunology and Cell Biology (2002) 80, 280–285

Special Feature

Prediction of promiscuous peptides that bind HLA class I molecules

VLADIMIR BRUSIC,¹ NIKOLAI PETROVSKY,² GUANGLAN ZHANG¹ and VLADIMIR B BAJIC¹

¹Kent Ridge Digital Labs, Singapore and ²National Bioinformatics Centre, University of Canberra and National Health Sciences Centre, Canberra Hospital, Woden, Australian Capital Territory, Australia

Summary Promiscuous T-cell epitopes make ideal targets for vaccine development. We report here a computational system, MULTIPRED, for the prediction of peptide binding to the HLA-A2 supertype. It combines a novel representation of peptide/MHC interactions with a hidden Markov model as the prediction algorithm. MULTIPRED is both sensitive and specific, and demonstrates high accuracy of peptide-binding predictions for HLA-A*0201, *0204, and *0205 alleles, good accuracy for *0206 allele, and marginal accuracy for *0203 allele. MULTIPRED replaces earlier requirements for individual prediction models for each HLA allelic variant and simplifies computational aspects of peptide-binding prediction. Preliminary testing indicates that MULTIPRED can predict peptide binding to HLA-A2 supertype molecules with high accuracy, including those allelic variants for which no experimental binding data are currently available.

Key words: hidden Markov models, HLA allele, immunoinformatics, peptide binding, predictive modelling.

Introduction

T cells recognize short peptides resulting from the intracellular processing of foreign and self proteins, presented bound to specific cell surface molecules encoded by the MHC. There are two discrete classes of MHC molecules: (i) MHC class I presents endogenous peptides; and (ii) MHC class II presents exogenous peptides. The process of MHC class I antigen presentation involves protein degradation, peptide transport to the endoplasmic reticulum, peptide–MHC binding and export of peptide–MHC complexes to the cell surface for recognition by CD8 T cells.¹ Peptides are bound within a specific MHC-binding groove, the shape and characteristics of which result in the binding of specific subsets of peptides sharing a common binding motif. T cells are activated when the T-cell receptor recognizes a specific peptide–MHC complex, and in this way identify cells infected by intracellular parasites or viruses or cells containing abnormal proteins (e.g. tumour cells) and mount appropriate immune responses against them. The peptides involved in specific peptide–MHC complexes triggering T-cell recognition (T-cell epitopes) are important tools for the diagnosis and treatment of infectious,² autoimmune,³ allergic⁴ and neoplastic diseases.⁵ Because T-cell epitopes are subsets of MHC-binding peptides, precise identification of portions of proteins that can bind MHC molecules is important for the design of vaccines and immunotherapeutics.

MHC-binding peptides contain position-specific amino acids that interact with the groove of the MHC molecule, contributing to peptide binding.⁶ The preferred amino acids at each position of the binding motif may vary between allelic variants of MHC molecules. Computational models facilitate identification of peptides that bind various MHC molecules. A variety of computational methods have been used for the prediction of MHC class I binding peptides, including those based on binding motifs,⁷ quantitative matrices,⁸ artificial neural networks (ANN),⁹ and hidden Markov models (HMM).¹⁰ Artificial neural networks can model non-linear interactions and have higher predictive accuracy than binding motifs or quantitative matrices for the prediction of MHC class I¹¹ and II¹² binding peptides. MHC-binding peptides can also be predicted by molecular modelling,¹³ using knowledge of basic physico-chemical properties, crystal structure, and protein–protein interactions. Previously, the approach to statistical or molecular modelling of MHC binding has been to create a unique model for each individual MHC allelic variant for which sufficient data are available to build the model.

Peptides that bind more than one MHC allelic variant (‘promiscuous peptides’) are prime targets for vaccine and immunotherapy development because they are relevant to higher proportions of the human population. Computational strategies for determination of promiscuous MHC class II binding peptides, using multiple quantitative matrices,¹⁴,¹⁵ have been used for cancer¹⁶ and infectious disease¹⁷ vaccine development. Promiscuous peptides have also been reported in relation to HLA class I supertypes, the human form of MHC.¹⁸ Peptides that bind with high affinity to a given HLA class I molecule frequently bind multiple, but not all, HLA molecules belonging to the same supertype, such as HLA-A2,¹⁹ HLA-A3²⁰ or HLA-B7.²¹ It was reported that each molecule of the HLA-A2 supertype has unique allele-specific peptide-binding preferences, although there are some supertype-shared features, and some peptides bind some, but not all, allelic variants.²² Binding motifs and quantitative matrices have been reported for a limited number of common HLA class I molecules. Out of 49 well-defined HLA-A2 supertype alleles,²³ binding motifs have been reported for only eight (HLA-A*0201, *0202, *0204, *0205, *0206, *0209, *0214 and *0217)⁶ and quantitative matrices for two (*0201 and *0205)⁸ molecules.

A single model that could predict with high accuracy peptide binding to multiple MHC molecules would be a useful tool to search for promiscuous MHC-binding peptides. Here we describe a system, MULTIPRED, for the prediction of promiscuous MHC class I binding peptides. This system is based on the modelling of peptide interactions with multiple MHC molecules that belong to the same supertype. The representation of peptide–MHC interaction includes both the peptide and selected portions of the MHC molecule. MULTIPRED uses HMM as the prediction algorithm. By reformulating the problem, we were able to develop a binding prediction system that replaces multiple models and simplifies computational aspects of peptide-binding prediction. MULTIPRED was tested using HLA-A2 supertype data. This testing indicated that MULTIPRED can predict peptide binding to HLA-A2 supertype molecules with high accuracy, including those allelic variants for which no experimental binding data are available.

Correspondence: V Brusic, Kent Ridge Digital Labs, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore. Email: [email protected]

Received 14 February 2002; accepted 14 February 2002.

Materials and Methods

Peptide data

Peptide data were extracted from the MHCPEP database,²⁴ published articles, and a set of HLA non-binding peptides (V. Brusic, unpubl. data). We collected data on binding or non-binding 9-mer peptides for nine alleles of the HLA-A2 supertype (Table 1).

Representation of peptide–HLA interactions

The interaction site consists of the whole length of the peptide sitting in the groove of the HLA-A*0201 molecule. The positional binding environments of the HLA-binding peptide have been resolved by crystallography.²⁵ HLA-A*0201 has 48 contact amino acids on the surface of the binding groove that together constitute the peptide interaction site (Fig. 1). The HLA-A-binding peptide positional environments involve fixed positions across the sequences of the HLA-A allelic variants (Table 2).²⁶ We analysed the 49 well-defined HLA-A2 allelic variants for variation. Out of nearly 30 polymorphic residues, we identified 19 that are involved in forming the peptide interaction site (Table 2; italicised contact residues).

Table 1 Number of peptides in training and test sets for various HLA-A2 alleles

HLA allele    Training data    Test (binders)    Test (non-binders)
0201          147              444               2005
0202          560              31                19
0203          562              29                4
0204          567              24                224
0205          575              16                40
0206          561              30                32
0207          587              4                 11
0209          586              5                 1
0214          583              8                 1

Figure 1 Identification of peptide contact residues within the HLA-A*0201 molecule, and the representation of the peptide interaction site of the HLA receptor. The numbers above the sequence represent peptide positions that interact with contact amino acids. The interaction forming positions are within the first 180 amino acids of the 0201 molecule.

We represented the specific HLA–peptide interaction by combining each amino acid of the peptide with the variable amino acids of its positional environment (P stands for the amino acid within the binding peptide and C stands for contact amino acid within the HLA molecule). General representation of the interaction is thus a virtual peptide:

P1-C62-C66-C99-C163-C167-P2-C9-C24-C62-C66-C70-C99-C163-C167-…-P9-…-C142

The contact residues were used in the order they appear in the linear sequence of HLA molecules. An amino acid contact position may be involved in forming contact residues of positional environments of more than one amino acid of the peptide. The representation of any 9-mer peptide is thus a virtual peptide containing 73 amino acids. Thus, we have defined uniform representations of peptide–HLA interactions for multiple HLA-A2 alleles.
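The virtual peptide encoding can be illustrated with a minimal Python sketch. It rests on stated assumptions: only the variable contact positions for P1 and P2 are included, because these are the only ones spelled out in the representation given in the text; the positions for P3 to P9 are elided in the source and are deliberately not filled in. The function name `virtual_peptide`, the dictionary `VARIABLE_CONTACTS` and the placeholder HLA string are hypothetical and are not part of MULTIPRED.

```python
# Variable contact residues per peptide position, as listed in the
# virtual peptide representation; P3-P9 are elided in the source
# and are intentionally left out (assumption: they would be added
# in the same way).
VARIABLE_CONTACTS = {
    1: [62, 66, 99, 163, 167],
    2: [9, 24, 62, 66, 70, 99, 163, 167],
}


def virtual_peptide(peptide, hla_sequence, contacts=VARIABLE_CONTACTS):
    """Interleave each peptide residue with its HLA contact residues.

    hla_sequence uses 1-based residue numbering, so contact
    residue 62 is read from hla_sequence[61].
    """
    parts = []
    for pos, residue in enumerate(peptide, start=1):
        parts.append(residue)
        for c in contacts.get(pos, []):
            parts.append(hla_sequence[c - 1])
    return "".join(parts)


# Placeholder 180-residue string standing in for an HLA sequence
# (not real sequence data).
toy_hla = "A" * 180
print(virtual_peptide("LLFGYPVYV", toy_hla))
```

With the full set of variable contacts for all nine positions, the same interleaving would yield the 73-residue virtual peptide described in the text; the sketch above produces a shorter string because it covers P1 and P2 only.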

Algorithm

We used HMM as the prediction engine. Hidden Markov models belong to a class of probabilistic discrete dynamic system models. An HMM is defined by a finite set of states representing possible states of the modelled system. Some of these states may be directly observable, but some are not, and are denoted as hidden. Biological problems are often sequential and HMM frequently utilize sequential ordering of system states. A change (transition) of the system from one state to another is governed by statistical regularities. The probability distribution of the system states can be estimated from the data. Transitions between the states follow a set of transition and emission probabilities. A current state of the system described by an HMM depends on one or more previous states the system was in. A detailed description of HMM can be found elsewhere.¹⁰,²⁷ In the present study, we used a first-order HMM,²⁸ in which the current system state is determined only by the preceding state.
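To make the roles of transition and emission probabilities concrete, the following Python sketch computes the probability of an observed sequence under a small first-order HMM using the standard forward recursion. The two-state model and all of its numbers are invented for illustration; they are assumptions and bear no relation to the model trained in this study.

```python
def forward_probability(observations, states, start_p, trans_p, emit_p):
    """Probability of an observation sequence under a first-order HMM."""
    # alpha[s] = P(observations so far, current hidden state is s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        # First-order assumption: the new state depends only on the
        # immediately preceding state r.
        alpha = {
            s: emit_p[s][obs] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Toy two-state model over a two-letter alphabet (hypothetical values).
STATES = ("anchor", "other")
START = {"anchor": 0.5, "other": 0.5}
TRANS = {"anchor": {"anchor": 0.1, "other": 0.9},
         "other": {"anchor": 0.3, "other": 0.7}}
EMIT = {"anchor": {"L": 0.8, "K": 0.2},
        "other": {"L": 0.4, "K": 0.6}}

print(forward_probability("LK", STATES, START, TRANS, EMIT))
```

Because each row of the transition and emission tables sums to one, the probabilities of all sequences of a given length also sum to one, which is a convenient sanity check on any such model.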

The process used for building the HMM²⁹ included the following steps: (i) input of virtual peptides; (ii) calculation of distances between virtual peptides to cluster them; (iii) weighting of each virtual peptide using the BLOSUM62³⁰ distance matrix, a tree algorithm and scaling; (iv) adding the prior contribution of amino acids (occurrence frequencies) to the model; (v) using Dirichlet mixtures to estimate prior probability distributions associated with the model; and (vi) model configuration.

The HMM produces scores that are calculated by a general formula E = Σ −(log p_s), where p_s is the probability of an individual case. The Viterbi algorithm is used for determining the set of successive states that most likely produced the observed state. Values of E scores above the selected threshold represent positive predictions (binders), and those below the threshold represent negative predictions (non-binders). Descriptions of such technical details are available in the literature.²⁷
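A generic Viterbi decoder can be sketched in a few lines of Python. This is a textbook illustration, not the HMMER implementation used in the study: the function name, the toy two-state model and its probabilities are all assumptions, and the sketch further assumes that no probability is zero so that logarithms are defined. The returned score is the negative log probability of the best path, which is one simple instance of a sum of −log p terms.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path and its score, -log(probability)."""
    # best[s] = (log probability, path) of the best path ending in state s
    best = {s: (math.log(start_p[s] * emit_p[s][observations[0]]), [s])
            for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Choose the predecessor r that maximises the path probability.
            r = max(states, key=lambda q: best[q][0] + math.log(trans_p[q][s]))
            logp = best[r][0] + math.log(trans_p[r][s] * emit_p[s][obs])
            step[s] = (logp, best[r][1] + [s])
        best = step
    logp, path = max(best.values(), key=lambda v: v[0])
    return path, -logp


# Toy two-state model (hypothetical values, for illustration only).
STATES = ("anchor", "other")
START = {"anchor": 0.5, "other": 0.5}
TRANS = {"anchor": {"anchor": 0.1, "other": 0.9},
         "other": {"anchor": 0.3, "other": 0.7}}
EMIT = {"anchor": {"L": 0.8, "K": 0.2},
        "other": {"L": 0.4, "K": 0.6}}

path, score = viterbi("LK", STATES, START, TRANS, EMIT)
print(path, score)
```

In a prediction setting, a score of this kind is compared with the selected threshold to assign each virtual peptide to the binder or non-binder class, as described in the text.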

Training, testing and validation

We trained five HMM models, that is, one model for the prediction of peptide binding to each selected HLA-A2 allele. The training set for the HLA-A*0201 model contained virtual peptides built using all peptides known to bind other HLA-A2 alleles. Thus, the *0201 training set consisted of 147 virtual peptides (31, 29, 24, 16, 30, 4, 5 and 8 binders to the alleles *0202, *0203, *0204, *0205, *0206, *0207, *0209 and *0214, respectively). The test set for HLA-A*0201 included all peptides for which *0201 binding affinity was known. The training and test sets for the other four (HLA-A*0202, *0204, *0205 and *0206) alleles were built using the same scheme. Because of the small number of test peptides, models for the prediction of peptide binding to HLA-A*0203, *0207, *0209 and *0214 could not be tested reliably, and were excluded from the present study.
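The training-set sizes follow directly from this scheme and can be checked with a short Python sketch. The binder counts are copied from Table 1; the names `BINDERS` and `training_size` are hypothetical and serve only this illustration.

```python
# Number of known binders per allele (the "Test (binders)" column of Table 1).
BINDERS = {"0201": 444, "0202": 31, "0203": 29, "0204": 24, "0205": 16,
           "0206": 30, "0207": 4, "0209": 5, "0214": 8}


def training_size(target, binders=BINDERS):
    """Training peptides for `target`: the binders to every other allele."""
    return sum(n for allele, n in binders.items() if allele != target)


sizes = {allele: training_size(allele) for allele in BINDERS}
print(sizes)
```

Each computed size matches the "Training data" column of Table 1, for example 147 for *0201 and 560 for *0202, which confirms that every model is trained without any binding data for its own target allele.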

The predictive performance was assessed using the defined test set for each HLA-A2 peptide-binding model. Assessment of predictive performance was carried out using sensitivity (SE) and specificity (SP) of predictions, as well as receiver operating characteristic (ROC) analysis. Sensitivity, SE = TP/(TP + FN), and specificity, SP = TN/(TN + FP), indicate percentages of correctly predicted binders and non-binders, and always come as a paired measure (TP, true positives [experimental binder predicted as binder]; TN, true negatives [experimental non-binder predicted as non-binder]; FN, false negatives [experimental binder predicted as non-binder]; and FP, false positives [experimental non-binder predicted as binder]). We consider values of SP ≥ 0.8 useful in practice and assessed SE for three values of SP (0.8, 0.9 and 0.95).
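The two measures can be computed as in the following Python sketch, which assumes, in line with the scoring convention stated earlier, that scores above the threshold are called binders. The function name and the toy scores are illustrative assumptions, not data from this study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (SE, SP) when scores above `threshold` are called binders.

    labels: True for experimental binders, False for non-binders.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s > threshold)
    fn = sum(1 for s, y in pairs if y and s <= threshold)
    tn = sum(1 for s, y in pairs if not y and s <= threshold)
    fp = sum(1 for s, y in pairs if not y and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Invented scores: three binders and three non-binders.
toy_scores = [5, 4, 3, 2, 1, 0]
toy_labels = [True, True, False, True, False, False]
print(sensitivity_specificity(toy_scores, toy_labels, threshold=2.5))
```

Raising the threshold trades sensitivity for specificity, which is why the two values are always reported as a pair.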

A single measure of the accuracy of predictive models is provided by ROC analysis.³¹ The ROC curve is generated by plotting the function SE = ƒ(1 − SP) for various classification thresholds. The area under the ROC curve (Aroc) provides a measure of overall prediction accuracy. Values of Aroc = 50% indicate random choice, while the accuracy of predictions is poor for values of Aroc < 70%, good for values of Aroc > 80% and excellent for values of Aroc > 90%.³¹
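The area under the ROC curve can be obtained without drawing the curve, because it equals the probability that a randomly chosen binder scores higher than a randomly chosen non-binder (the Mann–Whitney formulation, with ties counted as one half). The Python sketch below uses this equivalent rank-based form rather than explicit integration of the curve; the function name and toy data are assumptions for illustration.

```python
def roc_area(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: True for experimental binders, False for non-binders.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    # A binder/non-binder pair counts 1 if the binder scores higher,
    # 0.5 if the two scores are tied, and 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented scores: three binders and three non-binders.
toy_scores = [5, 4, 3, 2, 1, 0]
toy_labels = [True, True, False, True, False, False]
print(roc_area(toy_scores, toy_labels))
```

A value of 0.5 corresponds to random choice and 1.0 to perfect separation of binders from non-binders.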

Binding peptides for all alleles studied comprise a mix of low-, moderate- and high-affinity binders. Binders or T-cell epitopes, for which specific binding affinity was not known, were considered as moderate binders. Assessment of predictive accuracy was carried out for three subsets of peptide binders: (i) low, moderate and high binders as a group; (ii) moderate and high binders; or (iii) high binders only.

Results

The assessment of accuracy of the five HLA-A2 molecule prediction models is shown in Fig. 2. Overall, predictions were of a high accuracy for peptide binding to HLA-A*0201, *0204 and *0205 (Aroc > 0.9). The *0206 binding predictions were of good accuracy (Aroc > 0.8). Predictions for *0202 were of moderate accuracy for all peptides (Aroc = 0.72), but particularly poor for moderate- to high-affinity peptides (Aroc < 0.7).

The reason for poor performance of predictions for HLA-A*0202 could be either that the MULTIPRED system did not properly learn the rules from the training set, or that the test set (31 binders and 19 non-binders) for this molecule contains some kind of bias, or that the training set does not contain sufficient information to infer binding rules in this case. HLA-A*0202 is very similar to the excellent predictor HLA-A*0205, the only difference in the peptide-binding site of these two alleles being position 9 (F9 in *0202 and Y9 in *0205). In HLA-A molecules, position 9 is involved in the positional environment of P2 of a binding peptide,²⁶ which is a major anchor in all defined HLA-A2 motifs.⁶ Further research and experimental data are needed for clarification of the discrepancy in accuracy of the MULTIPRED HLA-A*0202 model.

Table 2 Peptide residue positional environments for HLA class I molecules. Contact residues are shown in Fig. 1. The contact residues that vary between HLA-A2 alleles are italicized

Positions    Contact residues (C)
P1           5, 7, 33, 59, 62, 63, 66, 99, 159, 163, 167, 171
P2           7, 9, 24, 25, 26, 34, 35, 36, 45, 62, 63, 66, 67, 70, 99, 159, 163, 167
P3           7, 9, 62, 66, 70, 97, 99, 114, 152, 155, 156, 159, 163
P4           62, 65, 66, 69, 70, 155, 156, 159
P5           69, 70, 73, 74, 97, 114, 116, 152, 155, 156, 159
P6           7, 9, 22, 24, 66, 69, 70, 73, 74, 97, 99, 114, 116, 133, 147, 152, 155, 156
P7           73, 77, 97, 114, 116, 133, 146, 147, 150, 152, 155, 156
P8           73, 76, 77, 80, 97, 143, 146, 147
P9           70, 73, 74, 76, 77, 80, 81, 84, 95, 96, 97, 114, 116, 123, 124, 142, 143, 146, 147
All          5, 7, 9, 22, 24, 25, 26, 33, 34, 35, 36, 45, 59, 62, 63, 65, 66, 67, 69, 70, 73, 74, 76, 77, 80, 81, 84, 95, 96, 97, 99, 114, 116, 123, 124, 133, 142, 143, 146, 147, 150, 152, 155, 156, 159, 163, 167, 171

The sensitivities of MULTIPRED predictions were calculated for the three thresholds that define levels of specificity considered as useful (Table 3). A value of SP = 0.80 provides high-sensitivity predictions, whereas a value of SP = 0.95 yields a lower number of true positives, but also fewer false positives. The prediction results for HLA-A*0201, *0204 and *0205 MULTIPRED models were in accordance with the expected pattern and provided sensitivities of approximately 90% for all groups of binders. For higher levels of specificity (SP = 0.90 or SP = 0.95), the sensitivity values were somewhat lower, but still showed excellent performances, since in the worst instance (high binders to HLA-A*0201; SP = 0.95), sensitivity was still SE = 0.49, meaning that half of the high-affinity binders were correctly predicted.
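Reading a sensitivity at a fixed specificity level amounts to choosing the lowest score threshold that rejects the required fraction of non-binders and then counting the binders that still score above it. The Python sketch below shows one way to do this under the convention that scores above the threshold are called binders; the function name and toy scores are assumptions, and the procedure is not claimed to be the one used to produce Table 3.

```python
def sensitivity_at_specificity(scores, labels, target_sp):
    """Lowest threshold with SP >= target_sp, and the SE it gives.

    labels: True for experimental binders, False for non-binders.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = sorted(s for s, y in zip(scores, labels) if not y)
    # Candidate thresholds are the non-binder scores, in ascending order;
    # the largest one always gives SP = 1, so the loop always returns
    # for any target_sp <= 1.
    for threshold in neg:
        sp = sum(1 for n in neg if n <= threshold) / len(neg)
        if sp >= target_sp:
            se = sum(1 for p in pos if p > threshold) / len(pos)
            return threshold, se


# Invented scores: three binders and three non-binders.
toy_scores = [5, 4, 3, 2, 1, 0]
toy_labels = [True, True, False, True, False, False]
for level in (0.6, 0.95):
    print(level, sensitivity_at_specificity(toy_scores, toy_labels, level))
```

On the toy data the threshold rises from 1 to 3 as the required specificity increases, and the sensitivity drops from 1.0 to 2/3, mirroring the trade-off visible across the rows of Table 3.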

The MULTIPRED model for HLA-A*0206 performed well for specificity levels of SP = 0.80 and SP = 0.90. Increasing specificity further, to SP = 0.95, resulted in a poor prediction. Therefore, the current model for HLA-A*0202 should be used only for the group of all binders (LMH) at the specificity level of SP = 0.80.

The specificity/sensitivity results are in agreement with the results of the ROC analysis. We have developed a data representation that enabled us to build a single model for accurate predictions of peptide binding to multiple alleles of HLA-A2 molecules. Because each MULTIPRED model reported here was developed using binding data for other alleles, our results indicate that we can make accurate predictions for peptide binding to allelic variants that lack experimental binding data, provided that we have binding data to other molecules that belong to the same HLA class I supertype.

Discussion

Figure 2 The assessment of the performance of models for prediction of peptide binding to five HLA-A2 allelic variants. LMH, low, moderate and high affinity; MH, moderate to high affinity; H, only high affinity binders.

Table 3 Sensitivities and prediction thresholds for MULTIPRED predictions of peptide binding to five HLA-A2 alleles. The values are given for three levels of specificity (0.8, 0.9 and 0.95). The predictions were assessed for all binders

                    Sensitivities and prediction thresholds for HLA-A2 alleles
Level        Group  *0201          *0202          *0204          *0205          *0206
SP = 0.80    LMH    0.91 (241.9)   0.52 (252.8)   0.92 (212.1)   0.88 (238.9)   0.73 (249.7)
             MH     0.93 (242.4)   0.11 (254.4)   0.90 (212.4)   0.93 (238.0)   0.72 (249.8)
             H      0.92 (243.2)   0.21 (254.2)   0.88 (212.5)   0.92 (239.8)   0.50 (251.0)
SP = 0.90    LMH    0.81 (244.2)   0.10 (254.5)   0.83 (213.6)   0.81 (240.7)   0.57 (250.8)
             MH     0.80 (244.9)   0.07 (254.5)   0.90 (213.8)   0.87 (240.7)   0.52 (251.0)
             H      0.72 (246.1)   0.06 (254.5)   0.82 (214.0)   0.58 (242.9)   0.20 (252.6)
SP = 0.95    LMH    0.62 (246.5)   0.06 (255.1)   0.75 (215.1)   0.56 (242.9)   0.07 (253.3)
             MH     0.56 (247.4)   0.04 (255.1)   0.80 (215.1)   0.60 (242.9)   0.07 (253.3)
             H      0.49 (248.3)   0.07 (255.1)   0.71 (215.5)   0.58 (243.9)   0.20 (252.7)

LMH, high-, moderate- and low-affinity binders; MH, moderate- and high-affinity binders; H, high-affinity binders only.

The relevant polymorphic positions in HLA-A2 molecules analysed in the present study were positions 9 (positional environment for P2, P3 and P6 of the peptide; residues F or Y), 95 (P9; V, L or I), 97 (P3, P5, P6, P7, P8 and P9; R or M), 152 (P3, P5, P6 and P7; V, E or A), and 156 (P3, P4, P5, P6 and P7; L, W, Q or R). Additional polymorphic positions in HLA-A2 sequences that cover all HLA-A2 alleles are 24 (P2 and P6; A or S), 62 (P1, P2, P3 and P4; G, E or Q), 65 (P4; R or G), 66 (P1, P2, P3, P4 and P6; K, N or Q), 73 (P5, P7, P8 and P9; T, I or S), 74 (P5, P6 and P9; H or D), 99 (P1, P2, P3 and P6; Y, C, F or S), 114 (P3, P5, P6, P7 and P9; H, Q or I), 116 (P5, P6 and P9; Y or H), 142 (P9; T or I), 163 (P1, P2 and P3; T, E or R), and 167 (P1 and P2; W or G). Out of all positions in 49 well-described HLA-A2 molecules that comprise peptide contact sites, 17 positions are polymorphic. Out of these, two different amino acids have been observed in eight positions, three different amino acids in seven positions, and four amino acids in the remaining two positions. Theoretically, this genetic variation provides nearly 10 million unique possibilities for forming the HLA-A2 peptide binding groove, of which 49 functional HLA-A2 alleles are known to exist in humans. Thus, a small number of polymorphic residues are responsible for the great diversity of the HLA-A2 binding groove. This implies that large-scale screening of candidate proteins for promiscuous T-cell epitopes can only succeed using statistical computational models. Ideally, selected candidates can be further validated by molecular modelling before experimental testing. Nine major supertypes of HLA class I peptides have been defined; namely, A1, A2, A3, A24, B7, B27, B44, B58 and B62.³² Thus, nine supertype-specific models will be required for the prediction of peptide binding to a vast majority of HLA-A and HLA-B variants. Our preliminary results for the HLA-A3 supertype indicate that the ‘supertype prediction’ approach can be successfully applied to multiple HLA supertypes.
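The figure of nearly 10 million groove variants quoted in this paragraph can be reproduced with a few lines of Python, assuming (as the text implies) that the 17 polymorphic contact positions vary independently. The variable names are hypothetical.

```python
# Polymorphic contact positions, grouped by the number of different
# amino acids observed at each position (counts taken from the text).
positions_by_variants = {2: 8, 3: 7, 4: 2}

combinations = 1
for variants, n_positions in positions_by_variants.items():
    combinations *= variants ** n_positions

# 2**8 * 3**7 * 4**2
print(combinations)
```

The product is 8 957 952, that is, roughly 9 million theoretical combinations, against which the 49 known HLA-A2 alleles represent a very small realized subset.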

Peptide vaccine development is rapidly progressing. Recent developments include reports of successful malaria³³ and antitumour³⁴ vaccines based on peptides. Both these vaccines comprised peptides from natural antigens (circumsporozoite protein of P. falciparum 282–383, and HER-2/neu 369–384, 688–703 and 971–984), which elicited strong CD4⁺ and CD8⁺ lymphocyte responses. The challenge of peptide vaccines is to identify promiscuous T-cell epitopes that are effective in a large proportion of the human population. Candidate peptide vaccines are often preselected using binding motifs or computational methods of limited value. The majority of publicly available methods have not been properly assessed for predictive accuracy. Identification of T-cell epitope candidates for vaccine design requires sensitive and specific prediction methods so that vaccines can be constructed to suit the immunological profiles of patients. Earlier, we reported a method for cyclical refinement of computational models for peptide-binding prediction.¹¹ Self-refining computational methods, combined with accurate methods for prediction of promiscuous peptides, provide a set of computational tools that will facilitate systematic screening of peptides for future vaccine formulation.

References

1 Yewdell JW, Bennink JR. Cut and trim: generating MHC class I peptide ligands. Curr. Opin. Immunol. 2001; 13: 13–18.

2 Ferrari G, Kostyu DD, Cox J et al. Identification of highly conserved and broadly cross-reactive HIV type 1 cytotoxic T lymphocyte epitopes as candidate immunogens for inclusion in Mycobacterium bovis BCG-vectored HIV vaccines. AIDS Res. Hum. Retroviruses 2000; 16: 1433–43.

3 Singh RR. The potential use of peptides and vaccination to treat systemic lupus erythematosus. Curr. Opin. Rheumatol. 2000; 12: 399–406.

4 Haselden BM, Kay AB, Larche M. Peptide-mediated immune responses in specific immunotherapy. Int. Arch. Allergy Immunol. 2000; 122: 229–37.

5 Wang E, Phan GQ, Marincola FM. T-cell-directed cancer vaccines: the melanoma model. Expert Opin. Biol. Ther. 2001; 1: 277–90.

6 Rammensee H, Bachmann J, Emmerich NP, Bachor OA, Stevanovic S. SYFPEITHI database for MHC ligands and peptide motifs. Immunogenetics 1999; 50: 213–19.

7 Nijman HW, Houbiers JG, Vierboom MP et al. Identification of peptide sequences that potentially trigger HLA-A2.1-restricted cytotoxic T lymphocytes. Eur. J. Immunol. 1993; 23: 1215–19.

8 Parker KC, Bednarek MA, Coligan JE. Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. J. Immunol. 1994; 152: 163–75.

9 Brusic V, Rudy G, Harrison LC. Prediction of MHC binding peptides using artificial neural networks. In: Stonier RJ, Yu XS eds. Complex Systems: Mechanism of Adaptation, Amsterdam: IOS Press, 1994; 253–60.

10 Mamitsuka H. Predicting peptides that bind to MHC molecules using supervised learning of hidden Markov models. Proteins 1998; 33: 460–74.

11 Brusic V, Bucci K, Schönbach C, Petrovsky N, Zeleznikow J, Kazura JW. Efficient discovery of immune response targets by cyclical refinement of QSAR models of peptide binding. J. Mol. Graph. Model. 2001; 19: 467.

12 Borras-Cuesta F, Golvano J, Garcia-Granero M et al. Specific and general HLA-DR binding motifs: comparison of algorithms. Hum. Immunol. 2000; 61: 266–78.

13 Schueler-Furman O, Elber R, Margalit H. Knowledge-based structure prediction of MHC class I bound peptides: a study of 23 complexes. Fold. Des. 1998; 3: 549–64.

14 Southwood S, Sidney J, Kondo A et al. Several common HLA-DR types share largely overlapping peptide binding repertoires. J. Immunol. 1998; 160: 3363–73.

15 Sturniolo T, Bono E, Ding J et al. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nat. Biotechnol. 1999; 17: 555–61.

16 Kobayashi H, Lu J, Celis E. Identification of helper T-cell epitopes that encompass or lie proximal to cytotoxic T-cell epitopes in the gp100 melanoma tumor antigen. Cancer Res. 2001; 61: 7577–84.

17 Panigada M, Sturniolo T, Besozzi G et al. Identification of a promiscuous T-cell epitope in Mycobacterium tuberculosis Mce proteins. Infect. Immun. 2002; 70: 79–85.

18 Threlkeld SC, Wentworth PA, Kalams SA et al. Degenerate and promiscuous recognition by CTL of peptides presented by the MHC class I, A3-like superfamily: implications for vaccine development. J. Immunol. 1997; 159: 1648–57.

19 del Guercio MF, Sidney J, Hermanson G et al. Binding of a peptide antigen to multiple HLA alleles allows definition of an A2-like supertype. J. Immunol. 1995; 154: 685–93.

20 Sidney J, Grey HM, Southwood S et al. Definition of an HLA-A3-like supermotif demonstrates the overlapping peptide-binding repertoires of common HLA molecules. Hum. Immunol. 1996; 45: 79–93.

21 Sidney J, Southwood S, del Guercio MF et al. Specificity and degeneracy in peptide binding to HLA-B7-like class I molecules. J. Immunol. 1996; 157: 3480–90.

22 Sidney J, Southwood S, Mann DL, Fernandez-Vina MA, Newman MJ, Sette A. Majority of peptides binding HLA-A*0201 with high affinity crossreact with other A2-supertype molecules. Hum. Immunol. 2001; 62: 1200–16.

23 Robinson J, Malik A, Parham P, Bodmer JG, Marsh SG. IMGT/HLA database – a sequence database for the human major histocompatibility complex. Tissue Antigens 2000; 55: 280–7.

24 Brusic V, Rudy G, Harrison LC. MHCPEP, a database of MHC-binding peptides: update 1997. Nucleic Acids Res. 1998; 26: 368–71.

25 Bjorkman PJ, Saper MA, Samraoui B, Bennett WS, Strominger JL, Wiley DC. Structure of the human class I histocompatibility antigen, HLA-A2. Nature 1987; 329: 506–12.

26 Chelvanayagam G. A roadmap for HLA-A, HLA-B, and HLA-C peptide binding specificities. Immunogenetics 1996; 45: 15–26.

27 Baldi P, Brunak S. Bioinformatics. The Machine Learning Approach. Cambridge: MIT Press, 1998.

28 Eddy S. Profile hidden Markov models. Bioinformatics 1998; 14: 755–63.

29 Eddy S, Birney E. HMMER User Guide. Biological Sequence Analysis Using Profile Hidden Markov Models, Version 2 (online). Available from: ftp://genetics.wustl.edu/pub/eddy/hmmer/CURRENT/Userguide.pdf.

30 Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc. Natl Acad. Sci. USA 1992; 89: 10915–19.

31 Swets JA. Measuring the accuracy of diagnostic systems. Science 1988; 240: 1285–93.

32 Sette A, Sidney J. Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism. Immunogenetics 1999; 50: 201–12.

33 Lopez JA, Weilenman C, Audran R et al. A synthetic malaria vaccine elicits a potent CD8 (+) and CD4 (+) T lymphocyte immune response in humans. Implications for vaccination strategies. Eur. J. Immunol. 2001; 31: 1989–98.

34 Knutson KL, Schiffman K, Disis ML. Immunization with a HER-2/neu helper peptide vaccine generates HER-2/neu CD8 T-cell immunity in cancer patients. J. Clin. Invest. 2001; 107: 477–84.