Algorithmic and Artificial Intelligence Methods for Protein Bioinformatics. First Edition. Edited by Yi Pan, Jianxin Wang, Min Li. © 2014 John Wiley & Sons, Inc. Published 2014 by John Wiley & Sons, Inc.


CHAPTER 11

PROGRESS IN PREDICTION OF OXIDATION STATES OF CYSTEINES VIA COMPUTATIONAL APPROACHES

AIGUO DU, HUI LIU, HAI DENG, and YI PAN

11.1 INTRODUCTION

Cysteine is one of the few amino acids that contain sulfur. Its free thiol group allows it to bond with another cysteine residue to form a disulfide bond, which helps maintain the structure of proteins. The thiol group on cysteine residues is nucleophilic and easily oxidized. Because of this reactivity, cysteine residues serve numerous biological functions, such as the activation of certain biological activities (e.g., the ATPase activity of heat-shock proteins) [1], DNA binding [2], and roles in reproductive systems [3], and they are also involved in the aging process of proteins [4]. As shown in Figure 11.1, cysteine residues can be found in two chemical states in proteins: the oxidized state and the reduced state. The two states are interconvertible under appropriate conditions. In the reduced form, cysteine can undergo chemical reactions such as alkylation [5,6] and oxidation [7], or form complexes with metal ions [8]. These reactions play critical biological roles, such as activating or deactivating the active sites of enzymes and altering the local environment of proteins. In the oxidized form, two cysteine residues form a disulfide bond, which enables more complex protein structures [9,10] and functions [11,12].

A disulfide bond is the covalent bond formed between two cysteine residues on protein chains. The formation of disulfide bonds is a critical step for some membrane and secreted proteins in both eukaryotic and prokaryotic cells. Disulfide bonds are considered one of the elements of protein tertiary structure and contribute directly to protein stability [13]. Figure 11.2 illustrates a protein in 3D, with the disulfide bonds shown in light green.

Disulfide bonds are generally classified into three categories: structural disulfide bonds, catalytic disulfide bonds, and allosteric disulfide bonds.


Figure 11.1 Reduced form (a) and oxidized form (b) of cysteine residues: oxidation converts two free thiol (SH) groups into a disulfide (S-S) bond, and reduction reverses the reaction.

Figure 11.2 Illustration of disulfide bonds (shown in light green) on a protein.

An important function of disulfide bonds is that they constrain the distal portions of the protein chain and reduce the entropy of the unfolded molecule [14], thereby influencing the thermodynamics of protein folding. Disulfide bonds also stabilize the folded state of a protein chain [14]. This stabilization helps reduce protein damage in the presence of oxidants and proteolytic enzymes. Because of their biological importance, knowing the disulfide connectivity of a protein is essential for studying its structure. Determining the oxidation states of individual cysteines is the first step toward determining disulfide connectivity, which has inspired significant interest in developing effective and economical ways to find the oxidation states of individual cysteines.

The bonding states of cysteines can be determined either directly through laboratory methods or indirectly from sequence information. Experimentally, determining the bonding states of cysteines and the disulfide bonds they form often involves costly biomarkers and thiol reagents together with techniques such as electrophoresis [15], mass spectrometry [16], and HPLC [17]. These experimental approaches are normally time-consuming and depend on expensive equipment; therefore, many researchers resort to computational approaches. Predicting the oxidation states of cysteines computationally usually involves first extracting commonalities from the available data, either by observation and statistical analysis or by artificial intelligence, and then applying the acquired knowledge to sequences whose bonding states are unknown. Accuracy is calculated as the percentage of correct predictions among all test cases. To date, artificial intelligence has achieved the highest accuracy among all computational methods attempted.

Figure 11.3 Information discovery via artificial intelligence.

Figure 11.3 is a flowchart of the prediction process via artificial intelligence. First, known data are extracted from a database, and redundancy or overrepresentation of certain protein families is avoided by careful selection and sampling. The sequence information is then transformed and reduced to suit the data mining step, in which relevant features are encoded and used to train the learning machine. After proper training, the resulting decision rules are applied to the test data.
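The accuracy measure described above is simply the fraction of test cysteines whose predicted state matches the observed one, often reported as Q2 in this literature. A minimal sketch, assuming states are encoded as 1 (oxidized) and 0 (reduced); the example labels are made up for illustration:

```python
def accuracy(predicted, observed):
    """Percentage of correct predictions among all test cases.

    States are encoded as 1 = oxidized (bonded) and 0 = reduced (free).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("at least one test case is required")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    observed = [1, 1, 0, 0, 1, 0, 1, 1]   # hypothetical test labels
    predicted = [1, 0, 0, 0, 1, 0, 1, 1]  # hypothetical predictions (7 of 8 correct)
    print(f"accuracy = {accuracy(predicted, observed):.1f}%")  # accuracy = 87.5%
```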

11.2 SURVEY OF PREVIOUS EFFORTS TO PREDICT BONDING STATE OF CYSTEINE RESIDUES ON PROTEIN VIA COMPUTATIONAL APPROACHES

In this chapter we survey ∼15 research efforts published since the 1980s on predicting cysteine oxidation states via computational methods, from the initial attempt using neural networks [18] to the present state of the art [19]. Most of these efforts report prediction accuracy, although some used datasets that differ from the others, so the reported figures are not always directly comparable. We first summarize the essential information used for the predictions to determine the factors affecting the problem, and then discuss the individual prediction efforts in detail.

11.2.1 Major Factors Influencing Prediction of Cysteine Oxidation States

Figure 11.4 shows the information used in cysteine oxidation state prediction. Sequences local to the bonded cysteines provide a local chemical and physical environment for the enzyme-mediated process of disulfide bond formation; local sequence is therefore critical for prediction. Of the 15 research efforts surveyed, 12 (80%) included local sequence as input to their algorithms, and two of the early efforts are based exclusively on the local flanking sequence [18,20]. Local sequence information usually consists of the amino acid composition, charge, and hydrophobicity of the neighboring amino acids. In addition to the amino acids flanking the cysteine on the protein sequence, some more recent efforts also treat spatially neighboring amino acids as part of the local sequence information [21].

Figure 11.4 Statistics on information used for predicting oxidation states of cysteines.
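Local descriptors of the kind listed above can be computed per cysteine from its sequence neighbors. A minimal sketch, assuming the Kyte-Doolittle hydropathy scale, a simplified charge model (Lys/Arg +1, Asp/Glu -1, His treated as neutral), and a half-window of 6 residues; none of these choices is taken from the surveyed papers, which use a variety of encodings:

```python
# Kyte-Doolittle hydropathy scale (assumed choice of hydrophobicity scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charges near neutral pH (His treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def local_descriptors(seq, cys_index, half_window=6):
    """Mean hydropathy and net charge of the residues flanking seq[cys_index].

    The window is truncated at the chain termini; the cysteine itself is excluded.
    Unknown residues contribute 0 to both descriptors.
    """
    if seq[cys_index] != "C":
        raise ValueError("position is not a cysteine")
    start = max(0, cys_index - half_window)
    end = min(len(seq), cys_index + half_window + 1)
    flank = seq[start:cys_index] + seq[cys_index + 1:end]
    if not flank:
        return 0.0, 0
    mean_hydropathy = sum(KD.get(aa, 0.0) for aa in flank) / len(flank)
    net_charge = sum(CHARGE.get(aa, 0) for aa in flank)
    return mean_hydropathy, net_charge
```

For example, `local_descriptors("AKCDE", 2, half_window=2)` averages the hydropathy of A, K, D, and E (giving -2.275) and sums their charges (+1 - 1 - 1 = -1).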

Global sequence information is perceived as a good indicator of the protein type and ranks as the second most popular input feature in prediction algorithms. Features such as the amino acid composition of the whole chain, the chain length, and the total number of cysteines on the chain belong to the global sequence information category.
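A sketch of the global features named above (whole-chain composition, chain length, and cysteine count). It assumes the 20 standard amino acids; non-standard letters count toward the length but get no composition entry. The feature order is an arbitrary choice for illustration:

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def global_features(seq):
    """Return 20 composition fractions, then chain length and cysteine count."""
    seq = seq.upper()
    counts = Counter(seq)
    length = len(seq)
    composition = [counts[aa] / length if length else 0.0 for aa in AMINO_ACIDS]
    return composition + [float(length), float(counts["C"])]
```

The resulting 22-element vector is the same for every cysteine on a chain, which is why such features mainly help decide the protein type rather than the state of an individual cysteine.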

Because of their biological importance, disulfide bonds are considered highly conserved features of proteins. Evolutionary information derived from multiple-sequence alignments is therefore used in much of the research on oxidation state prediction, and adding it has been reported to improve prediction accuracy by several percentage points (e.g., from ∼72% to 80.1% in [22]).

Experiments have indicated that the subcellular location of proteins is relevant to disulfide bond formation as well as stabilization. For example, disulfide bonds are hardly found in the cytoplasm, and in eukaryotes, disulfide bond formation occurs mostly in the lumen of the endoplasmic reticulum. By including subcellular localization information as input, Savojardo et al. achieved improved prediction accuracy for eukaryotes [19].

Other characteristics have also been explored, such as the ‘all-or-none’ rule derived from direct observation of the test datasets, the cysteine oxidation state transition automaton, dipeptide composition, cysteine oxidation state patterns, the normalized order of cysteines, the presence of a signal peptide, and secondary structure information. Some of these were used as additional features to train the learning machines. Others, including the cysteine oxidation transition automaton [23–25] and the rules for infeasible disulfide bonds [25] shown in Figure 11.5, are used to fine-tune or correct the prediction results. Although not as widely used as the local and global sequence information and the evolutionary features mentioned above, each of these new features or rules was reported to contribute to the overall prediction accuracy.

11.2.2 Major Efforts in Prediction of Cysteine Oxidation States

Figure 11.6 summarizes the efforts in predicting cysteine oxidation states. Since the 1980s, various computational techniques have been applied to the oxidation state prediction problem, ranging from single-layer neural networks to multistage support vector machines, with increasingly relevant features added to the computation. As a result, prediction accuracy has increased from 71% for relatively small datasets to 93% for larger and more inclusive datasets. These studies are discussed in detail in the following paragraphs.

Figure 11.5 Fine-tuning steps reported in the literature: (a) cysteine oxidation transition automaton (states labeled Start, Fo, Bo, Be, Fe, and End) that limits the number of oxidized cysteines to even numbers; (b) rules for infeasible bonds. Rule 1: cysteines from the same β-sheet do not form a disulfide bond. Rule 2: if there are multiple disulfide bonds between two β-sheets, they tend to align in parallel to preserve the structure.

11.2.2.1 Predicting Cysteine Bonding State from Local Amino Acid Sequence Only

Muskal et al. [18] trained simple feedforward neural networks with the flanking sequences around cysteines. This was the first effort reported in the literature to tackle the problem, and it assumed that the local sequence is the determining factor for the oxidation states of cysteines. An accuracy of 80% was achieved on 30 randomly selected sequences (15 surrounding disulfide-bonded cysteines and 15 surrounding non-disulfide-bonded cysteines) from a pool of 689. The window size of the flanking sequences was also studied: accuracy increased with window size, but memorization became apparent beyond a window size of 12 (6 residues before the cysteine and 6 after).
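Flanking-sequence inputs of this kind are commonly fed to a neural network as one-hot vectors. The chapter does not describe Muskal et al.'s exact encoding, so the following is a generic sketch under stated assumptions: 21 inputs per position (20 amino acids plus one flag for positions outside the chain or non-standard residues), the central cysteine omitted, and a default half-window of 6 to match the 12-residue window above:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_window(seq, cys_index, half_window=6):
    """One-hot encode half_window residues on each side of a cysteine.

    Each flanking position contributes 21 inputs: one per standard amino acid
    plus a final flag set when the position lies outside the chain or holds a
    non-standard residue. The central cysteine is skipped because it is the
    same for every example.
    """
    features = []
    for offset in range(-half_window, half_window + 1):
        if offset == 0:
            continue
        pos = cys_index + offset
        vec = [0] * 21
        if 0 <= pos < len(seq) and seq[pos] in INDEX:
            vec[INDEX[seq[pos]]] = 1
        else:
            vec[20] = 1  # padding or unknown residue
        features.extend(vec)
    return features
```

With the default half-window, each cysteine yields 12 × 21 = 252 binary inputs; larger windows grow the input linearly, which is consistent with the memorization observed beyond a window of 12.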

Figure 11.6 Research efforts on prediction of cysteine oxidation states since the 1980s, listed by year and reference with the method used and the best claimed prediction accuracy (ranging from 71.00% to 94.20%).

Noting the influence of flanking amino acids, Fiser et al. [20] took a statistical approach and analyzed a larger population of amino acids around cysteine residues (37,000 flanking residues, compared with 9,000 residues in a previous study). For each flanking position from −10 to +10 relative to the cysteine under consideration, the occurrences of the 20 amino acids were counted, and the ratio of each amino acid's appearance frequency around bonded versus free cysteines was calculated. Finally, the disulfide-forming potential of a cysteine-containing segment was calculated as the product of the ratios for all of its flanking amino acids. With 1.0 as the threshold on this potential, the overall prediction accuracy was 71%. This study further confirmed that the flanking sequences around bonded and free cysteines differ substantially and can be used to predict the bonding state of cysteines.
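The disulfide-forming potential described above can be sketched as follows. This is an illustration, not Fiser et al.'s code: the pseudocount, the use of log space to avoid numerical underflow, and the toy three-residue segments are assumptions; a realistic model would use 21-residue segments (offsets −10 to +10) and far more data:

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_ratios(bonded, free, pseudocount=1.0):
    """Per-position ratio of amino acid frequency around bonded vs. free cysteines.

    `bonded` and `free` are lists of equal-length segments centered on a cysteine.
    The pseudocount (an assumption) avoids division by zero for unseen residues.
    """
    width = len(bonded[0])
    nb = len(bonded) + pseudocount * len(AMINO_ACIDS)
    nf = len(free) + pseudocount * len(AMINO_ACIDS)
    ratios = []
    for j in range(width):
        cb = Counter(s[j] for s in bonded)
        cf = Counter(s[j] for s in free)
        ratios.append({
            aa: ((cb[aa] + pseudocount) / nb) / ((cf[aa] + pseudocount) / nf)
            for aa in AMINO_ACIDS
        })
    return ratios


def forming_potential(segment, ratios, center):
    """Product of the per-position ratios over all flanking residues."""
    log_p = 0.0
    for j, aa in enumerate(segment):
        if j == center:
            continue  # the cysteine itself carries no information
        log_p += math.log(ratios[j].get(aa, 1.0))  # unknown residue: neutral ratio
    return math.exp(log_p)


def predict_bonded(segment, ratios, center, threshold=1.0):
    """Predict 'bonded' when the potential exceeds the threshold (1.0 in the text)."""
    return forming_potential(segment, ratios, center) > threshold
```

On toy data such as `bonded = ["GCG", "GCG", "ACG"]` and `free = ["KCK", "KCK", "KCE"]`, glycine-flanked segments receive a potential well above 1 and lysine-flanked ones well below it.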

11.2.2.2 Predicting Cysteine Bonding State from Evolution Information and Local Sequence Information

Fariselli et al. [22] trained a neural-network-based predictor to discriminate the bonding states of cysteines in protein chains using both the flanking sequence and evolutionary information from multiple-sequence alignments. Eight different input encodings were used in this research. With only the local amino acid composition encoded, the prediction accuracy was ∼72%. Adding evolutionary information in the form of multiple-sequence profiles, together with charge, hydrophobicity, conservation weight, and relative entropy, improved the accuracy to 80.1%. Finally, a jury of neural networks further improved the prediction accuracy to 81%.
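Conservation features such as relative entropy are computed per alignment column and then placed in the input window. A minimal sketch, assuming gaps are ignored and a uniform background distribution; Fariselli et al.'s exact definitions of conservation weight and relative entropy are not given in this chapter:

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(column):
    """Frequencies of the 20 amino acids in one alignment column (gaps ignored)."""
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    n = len(residues)
    return {aa: (counts[aa] / n if n else 0.0) for aa in AMINO_ACIDS}


def relative_entropy(profile, background=None):
    """Kullback-Leibler divergence (bits) of a column profile from a background.

    With the default uniform background (an assumption), a fully conserved
    column scores log2(20) and a column using all 20 residues equally scores 0.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return sum(p * math.log2(p / background[aa]) for aa, p in profile.items() if p > 0)
```

A column such as `"CCC-C"` (a conserved cysteine with one gap) therefore scores the maximum of about 4.32 bits, which is the kind of signal that helps separate conserved, likely bonded cysteines from variable ones.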

11.2.2.3 Predicting Cysteine Bonding State by Multiple Sequence Alignment Plus the ‘‘All or None’’ Rule

Fiser et al. [26] examined two datasets (81 and 233 protein chains, respectively, both from PDB) and found that only very few (2–4%) of the proteins contain cysteines in both bonding states. In the majority of proteins containing cysteine residues, the cysteines are either all bonded or all free. This observation, combined with a conservation score calculated from the physicochemical properties of the flanking amino acids, yielded a prediction approach with 82% accuracy. The dataset used here was relatively small (81 protein chains).
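The all-or-none rule turns per-cysteine evidence into a single chain-level decision. A minimal sketch, assuming each cysteine already has a score in [0, 1] (for example, derived from a conservation score) and that the chain decision is taken from the mean score; the aggregation and the 0.5 threshold are hypothetical choices, not Fiser and Simon's procedure:

```python
def apply_all_or_none(scores, threshold=0.5):
    """Assign the same state to every cysteine of a chain.

    scores: per-cysteine scores in [0, 1]; higher means more likely bonded.
    Returns a list of 1 (bonded) or 0 (free), identical for all cysteines.
    """
    if not scores:
        return []
    chain_bonded = sum(scores) / len(scores) > threshold
    return [1 if chain_bonded else 0] * len(scores)
```

Because only 2–4% of chains mix both states, forcing a uniform label costs little on those few chains while correcting isolated per-cysteine errors on the many uniform ones.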

11.2.2.4 Predicting Cysteine Bonding State by Global Sequence Information

Mucchielli-Giorgi et al. [27] used information on either the residues flanking the cysteines or the amino acid content of the whole protein chain to train a single logistic function. Two descriptors were used to train the function: (1) the cysteine environment descriptor, corresponding to the local flanking sequence, and (2) the protein descriptor. The latter was calculated as

$$P_r = \log\left(\frac{f_{l,k}}{f_l}\right),$$

where $f_{l,k}$ is the occurrence frequency of amino acid $l$ in protein chain $k$ and $f_l$ is the total number of occurrences of this amino acid in all protein chains in the database. The prediction performance of the two descriptors was compared, and the weights associated with the logistic function were analyzed. The final results showed that the local sequence environment of the cysteines contains less information than the global amino acid composition of the protein chain. The prediction accuracy on a dataset of 559 protein chains was ∼84%.
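The protein descriptor and the logistic function can be sketched together. This sketch treats $f_l$ as a database-wide frequency rather than a raw count so that the ratio is dimensionless, adds a small `eps` (an assumption) to avoid log(0), and uses hypothetical weights; it is not Mucchielli-Giorgi et al.'s implementation:

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def protein_descriptor(chain, db_frequency, eps=1e-6):
    """Components log(f_{l,k} / f_l) for the 20 amino acids of chain k.

    db_frequency maps each amino acid l to its frequency f_l over the database
    (assumed normalized); eps guards against log(0) for absent amino acids.
    """
    if not chain:
        raise ValueError("empty chain")
    counts = Counter(chain)
    n = len(chain)
    return [math.log((counts[aa] / n + eps) / db_frequency[aa]) for aa in AMINO_ACIDS]


def logistic(features, weights, bias):
    """Single logistic function sigma(bias + w . x), computed without overflow."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

A chain whose composition matches the database background gets a descriptor of approximately all zeros, so only deviations from the average composition move the logistic output away from its bias.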

11.2.2.5 Cysteine Bonding State Prediction via Combined Global and Local Sequence Information

Martelli et al. [23] designed a hybrid system based on a hidden Markov model (HMM) and neural networks (NN).


A standard feedforward neural network was first trained on a 27-residue input window of local sequence around the cysteine. A Markov model with N states connected by transition probabilities was then applied, with transitions restricted to those that are biologically meaningful: if the chain contains disulfide bonds, the number of bonded cysteines must be even, so transitions ending with an odd number of bonded cysteines are not allowed by the transition chart. This hybrid system reached a prediction accuracy of up to 88%.
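The parity constraint can be enforced with a Viterbi-style dynamic program over two states that track whether the number of bonded cysteines seen so far is even or odd. This is a simplified sketch, not Martelli et al.'s HMM: it assumes the per-cysteine probabilities (for example, neural network outputs) are independent and uses only two parity states rather than the richer automaton of Figure 11.5a:

```python
import math


def decode_even_bonded(probs):
    """Most probable bonded(1)/free(0) labelling with an even number of bonded Cys.

    probs[i] is the probability that cysteine i is bonded. The two DP states
    record the parity of bonded cysteines so far; only paths ending in even
    parity are accepted (zero bonded cysteines is allowed).
    """
    eps = 1e-12
    # best[parity] = (log probability, labels) of the best path with that parity
    best = {0: (0.0, []), 1: (-math.inf, [])}
    for p in probs:
        p = min(max(p, eps), 1 - eps)
        log_bonded, log_free = math.log(p), math.log(1 - p)
        new = {}
        for parity in (0, 1):
            # reach `parity` via a free Cys (parity unchanged) or a bonded Cys (flipped)
            stay_score, stay_labels = best[parity]
            flip_score, flip_labels = best[1 - parity]
            cand_free = stay_score + log_free
            cand_bond = flip_score + log_bonded
            if cand_free >= cand_bond:
                new[parity] = (cand_free, stay_labels + [0])
            else:
                new[parity] = (cand_bond, flip_labels + [1])
        best = new
    return best[0][1]
```

For example, with probabilities `[0.9, 0.6, 0.7]` the unconstrained choice would label all three cysteines bonded (an odd count); the decoder instead returns `[1, 0, 1]`, the most probable labelling with an even count.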

11.2.2.6 Cysteine Bonding State Prediction Using a Two-Stage SVM Architecture

Because free cysteines and half-cystines (disulfide-bonded cysteines) rarely occur simultaneously in the same protein chain, Frasconi et al. [28] designed a system based on an SVM architecture with two cascaded classifiers. The first classifier predicts the type of protein (all, none, or mix, indicating that all, none, or only some of the cysteines on the chain are oxidized) from the whole sequence. The inputs to this stage include the amino acid composition of the protein chain and the normalized chain length (the length of the protein under consideration divided by the average protein length in the training set). A flag indicating whether the number of cysteines is odd is also encoded in the input features. Cysteines from chains that the first classifier marks as mix are then passed to the second-stage classifier, which uses the multiple-sequence alignment profile of the local input window to predict the bonding state of each individual cysteine. The reported prediction accuracy was 83.6% on a test set of 716 chains from the 2001 PDB.
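The cascade can be sketched with the two classifiers passed in as plain functions. This is a hypothetical skeleton, not Frasconi et al.'s system: `chain_classifier` and `cysteine_classifier` stand in for trained SVMs, and `stage1_extra_features` only computes the normalized length and odd-cysteine flag described above:

```python
from typing import Callable, List


def stage1_extra_features(chain: str, mean_train_length: float) -> List[float]:
    """Normalized chain length and a flag set to 1.0 when the cysteine count is odd."""
    n_cys = chain.count("C")
    return [len(chain) / mean_train_length, 1.0 if n_cys % 2 == 1 else 0.0]


def two_stage_predict(
    chain: str,
    chain_classifier: Callable[[str], str],
    cysteine_classifier: Callable[[str, int], int],
) -> List[int]:
    """Return 1 (bonded) or 0 (free) for each cysteine, in sequence order."""
    cys_positions = [i for i, aa in enumerate(chain) if aa == "C"]
    label = chain_classifier(chain)  # stage 1: 'all', 'none', or 'mix'
    if label == "all":
        return [1] * len(cys_positions)
    if label == "none":
        return [0] * len(cys_positions)
    if label == "mix":
        # stage 2: only chains labeled 'mix' get per-cysteine predictions
        return [cysteine_classifier(chain, i) for i in cys_positions]
    raise ValueError(f"unknown chain class: {label!r}")


if __name__ == "__main__":
    # Stub classifiers standing in for trained SVMs (hypothetical behavior).
    always_mix = lambda seq: "mix"
    bonded_if_early = lambda seq, i: 1 if i < 3 else 0
    print(two_stage_predict("ACGCAC", always_mix, bonded_if_early))  # [1, 0, 0]
```

Routing only mix chains to the second stage means the per-cysteine classifier is trained and evaluated on the hard cases, while all and none chains are settled by the cheaper chain-level decision.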

11.2.2.7 Cysteine Bonding State Prediction Using a Combination of Kernel Machines

Ceroni et al. [29] extended the two-stage SVM architecture described above with a combination of kernel machines. In the first stage, a kernel machine based on the spectrum kernel was adopted, with whole-protein sequence characteristics as input. In the second stage, evolutionary information in the form of multiple-sequence alignments around the cysteines was used. The accuracy improved to 84.5% on the same protein dataset described in Section 11.2.2.6.
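The spectrum kernel compares two whole sequences by the k-mers (length-k substrings) they share. A minimal sketch of the k-spectrum kernel; k = 3 is an arbitrary default here, since the value used by Ceroni et al. is not given in this chapter:

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring of seq (empty if seq is shorter than k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """k-spectrum kernel: inner product of the k-mer count vectors of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    if len(cx) > len(cy):
        cx, cy = cy, cx  # iterate over the smaller k-mer set
    return sum(count * cy[kmer] for kmer, count in cx.items())
```

Because the kernel is computed directly from substring counts, an SVM can use it on chains of different lengths without any alignment or fixed-size encoding step.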

11.2.2.8 Cysteine Bonding State Prediction Based on Dipeptide Composition

Song et al. [30] implemented a two-class predictor based on dipeptide composition, improving the bonding state prediction accuracy to 89.1% on 8114 cysteine-containing protein chains selected from the PISCES-culled PDB. A characteristic index $Q_k$ is calculated as a linear function of the dipeptide probabilities $P_{ab}$ (where $a$ and $b$ can be the same or different amino acids):

$$Q_k = \sum_{a,b} V_{ab} P_{ab}(k),$$

where $V_{ab}$ is a constant for all proteins.
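The index $Q_k$ is a weighted sum over the 400 possible dipeptides. A minimal sketch, assuming $P_{ab}(k)$ is the fraction of adjacent residue pairs in sequence $k$ that equal $ab$; the text states only that $V_{ab}$ is constant for all proteins, so the weights in the tests are made up:

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(seq):
    """P_ab: fraction of the len(seq) - 1 adjacent pairs equal to dipeptide ab."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return {ab: pairs[ab] / total for ab in DIPEPTIDES}


def characteristic_index(seq, V):
    """Q_k = sum over ab of V_ab * P_ab(k); V maps a dipeptide to its constant weight."""
    comp = dipeptide_composition(seq)
    return sum(V.get(ab, 0.0) * p for ab, p in comp.items())
```

A two-class decision can then be made by comparing $Q_k$ with a threshold, which keeps the model linear in the 400 composition features.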

11.2.2.9 Signal Peptide as a Strong Indicator for Cysteine Bonding States in Extracellular Proteins

To gain a full understanding of the biological factors involved in disulfide bond formation in cells, Tessier et al. [21] collected sequential and spatial neighborhood information, structural information (such as secondary structure and the calculated hydrophobic regions on the chain), and evolutionary information, as well as other important information about the protein, such as the presence of a signal peptide, the parity of the cysteines, and the subcellular location of the protein. By applying the Apriori association rule discovery algorithm, they observed that for extracellular proteins, the presence of a signal peptide is a strong descriptor of the bonding state of cysteines. This rule is further reinforced when there is an even number of cysteines on the chain. Such an association was found only in extracellular proteins; it does not hold for membrane proteins or proteins from other compartments.
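Association rules are ranked by support (how often antecedent and consequent occur together) and confidence (how often the consequent holds when the antecedent does). The sketch below computes only these two measures for a single candidate rule; it is not the full Apriori search, and the item names and records are hypothetical:

```python
def rule_stats(records, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    records: list of sets of items describing each cysteine or protein,
    e.g. {"signal_peptide", "extracellular", "bonded"} (hypothetical items).
    """
    n = len(records)
    n_ante = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if antecedent <= r and consequent <= r)
    support = n_both / n if n else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence
```

Apriori itself first enumerates all itemsets above a minimum support and then keeps the rules above a minimum confidence; a rule such as {signal peptide, extracellular} → {bonded} surfaces when both measures are high.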

11.2.2.10 Cysteine Bonding State Prediction Using SVM Based on Local and Global Sequences Combined with Cysteine State Sequence

Chen et al. [31] integrated ideas from earlier research efforts as inputs to a two-stage approach based on SVM. The decision value from the first-stage SVM is normalized by an arctan transfer function before being used in the bonding state prediction. The second stage uses a branch-and-bound algorithm to optimize the probability of cysteine state sequences (CSSs) while constraining the number of oxidized cysteines to an even number. Using the SVM alone, with local sequence information and global amino acid composition as inputs, an accuracy of 86% was obtained; further coupling with the CSS optimization raised the overall prediction accuracy to 90%.
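Both steps can be sketched in a few lines. The exact arctan normalization used by Chen et al. is not given here, so `svm_to_probability` assumes the form 0.5 + arctan(d)/π, which maps any decision value into (0, 1). The search also assumes that cysteine states are independent, so a sequence's probability is the product of per-cysteine probabilities:

```python
import math


def svm_to_probability(decision_value):
    """Map an SVM decision value to (0, 1) with an arctan transfer (assumed form)."""
    return 0.5 + math.atan(decision_value) / math.pi


def best_even_css(probs):
    """Branch-and-bound search for the most probable cysteine state sequence.

    States: 1 = oxidized, 0 = reduced; only sequences with an even number of
    oxidized cysteines are accepted.
    """
    n = len(probs)
    eps = 1e-12
    logs = [(math.log(max(p, eps)), math.log(max(1 - p, eps))) for p in probs]
    # Optimistic bound: best possible log-probability of the remaining positions.
    suffix = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + max(logs[i])
    best = {"score": -math.inf, "css": None}

    def search(i, score, css, n_ox):
        if score + suffix[i] <= best["score"]:
            return  # prune: even the optimistic completion cannot do better
        if i == n:
            if n_ox % 2 == 0:
                best["score"], best["css"] = score, css[:]
            return
        for state in (1, 0):
            css.append(state)
            # logs[i][0] is log P(oxidized), logs[i][1] is log P(reduced)
            search(i + 1, score + logs[i][1 - state], css, n_ox + state)
            css.pop()

    search(0, 0.0, [], 0)
    return best["css"]
```

Usage: `best_even_css([svm_to_probability(d) for d in decision_values])`, where `decision_values` (a hypothetical name) holds the first-stage SVM outputs for the cysteines of one chain.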

11.2.2.11 Cysteine Bonding State Prediction Using SVM and Bidirectional Recurrent Neural Networks

Ceroni et al. [24] employed an SVM binary classifier to predict the overall cysteine bonding state of the whole chain (not on an individual cysteine basis), followed by a refinement step with a bidirectional recurrent neural network (BRNN). For each cysteine, the BRNN output was computed using the logistic function. The number of bonded cysteines was then constrained to be even using a finite-state automaton. This approach achieved an accuracy of 88%.

11.2.2.12 Cysteine Bonding State Prediction with SVM Based on Protein Types

Lin et al. [25] used an SVM-based two-stage system to raise the prediction accuracy to 94.2%. In the first stage, an iterative protein-level “type” (none, mix, or all) classification is performed: each iteration uses the SVM output probabilities from the previous iteration as new features until the result converges. The second stage is the mix classification, which uses an SVM for the chains classified as mix. Inputs to the SVM in this stage include the position-specific scoring matrix (PSSM), the normalized order of the cysteine in the protein chain, and the normalized protein length. In addition, a procedure called simple tune (Fig. 11.5b) is applied to the SVM output. The simple tune step is based on two assumptions: (1) cysteines located in the same β-sheet of a chain do not form disulfide bonds with each other, and (2) when there are multiple disulfide bonds between two peptide chains of a β-sheet, disulfide bonds that are parallel to each other are biologically favored because they preserve the protein structure. The simple tune procedure consists of four tuning steps: boundary adjustment, oxidized inversion, reduced inversion, and odd–even revision. These fine-tuning steps were reported to noticeably improve the prediction accuracy.
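The first assumption (Rule 1 in Figure 11.5b) can be applied as a filter on candidate cysteine pairs. A minimal sketch, assuming β-sheet membership is already known, for example from predicted secondary structure; the `sheet_of` mapping is hypothetical, and Lin et al.'s four tuning steps are not reproduced here:

```python
from itertools import combinations


def feasible_pairs(cys_positions, sheet_of):
    """Candidate Cys-Cys pairs left after removing those forbidden by Rule 1.

    sheet_of maps a residue position to a beta-sheet identifier; positions
    missing from it are treated as not being in any sheet. Two cysteines
    assigned to the same sheet are considered unable to form a disulfide bond.
    """
    pairs = []
    for a, b in combinations(sorted(cys_positions), 2):
        sheet_a, sheet_b = sheet_of.get(a), sheet_of.get(b)
        if sheet_a is not None and sheet_a == sheet_b:
            continue  # Rule 1: same beta-sheet, infeasible bond
        pairs.append((a, b))
    return pairs
```

For example, with cysteines at positions 3, 10, and 25, where 3 and 10 lie in the same sheet, only the pairs (3, 25) and (10, 25) remain as candidates.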

11.2.2.13 Prediction of Cysteine Oxidation States Using Three-Stage SVM

Guang et al. [32] proposed a three-stage SVM to predict the oxidation states of cysteines: (1) protein chains are classified into ‘none’ or ‘have’ categories, indicating whether there are disulfide bonds on the chain; (2) the ‘have’ sequences are further classified as ‘mix’ or ‘all’, indicating whether all cysteines on the chain are in the oxidized form; and (3) cysteines in the ‘mix’ sequences are analyzed using a sliding window centered on each cysteine. Features used as input to the SVM include amino acid composition, local sequence around cysteines, evolutionary information, and secondary structure information, as well as the presence of a signal peptide from the protein annotation. Accuracies of 90.05%, 96.36%, and 80% were achieved for the three stages of prediction, respectively.

11.2.2.14 Cysteine Oxidation State Prediction with Grammatically Restrained Hidden Conditional Random Fields (GRHCRFs) Based on Protein Subcellular Localization and Position-Specific Scoring Matrix (PSSM)

Motivated by the fact that the subcellular localization of a protein determines whether it resides in an ambient environment suitable for the redox reactions of cysteines, Savojardo et al. [19] included predicted protein subcellular localization in addition to other sequence information such as the PSSM. Including protein subcellular localization was found to improve the prediction accuracy by two to three percentage points. The machine learning technique used here is GRHCRFs, which were found to perform better than other machine learning methods [33]. The prediction accuracy reached 93%; at the protein level, the prediction accuracy is reported to be 86%. It is worth noting that the PDBCYS dataset used in this study excludes trivial cases: protein chains containing a single cysteine are not present.

11.3 SUMMARY

Since the 1980s, tremendous progress has been made in predicting the oxidation states of cysteines. Moving from simply using local sequences with single-layer neural networks to combining a broad spectrum of protein characteristics with multistage machine learning algorithms, researchers have improved the prediction accuracy from 71% on a relatively small dataset to 93% on a larger and more selective dataset that excludes proteins with a single cysteine (for which the oxidation state would be obvious without complicated calculations). This chapter has surveyed research on the prediction of the oxidation states of cysteines and discussed the details of each study. With rapid advances in biological research and a better understanding of the mechanism of oxidative protein folding [34,35], prediction of the oxidation states of cysteines on protein chains is becoming more selective in terms of substrate (e.g., eukaryotes vs. prokaryotes, or whether disulfide formation is catalyzed by a certain enzyme family) and is focusing more attention on the mechanism of disulfide bond formation. This chapter is intended to serve as a reference for future developments building on these recent efforts.

REFERENCES

1. Chamberlain LH, Burgoyne RD, Activation of the ATPase activity of heat-shock proteins Hsc70/Hsp70 by cysteine-string protein, Biochem. J. 322(3):853–858 (1997).

2. McBride AA, Klausner RD, Howley PM, Conserved cysteine residue in the DNA-binding domain of the bovine papillomavirus type 1 E2 protein confers redox regulation of the DNA-binding activity in vitro, Proc. Natl. Acad. Sci. USA 89(16):7531–7535 (1992).

3. Hatch TP, Allan I, Pearce JH, Structural and polypeptide differences between envelopes of infective and reproductive life cycle forms of Chlamydia spp., J. Bacteriol. 157(1):13–20 (1984).

4. Berlett BS, Stadtman ER, Protein oxidation in aging, disease, and oxidative stress, J. Biol. Chem. 272(33):20313–20316 (1997).

5. Kudo N, Matsumori N, Taoka H, Fujiwara D, Schreiner EP, Wolff B, Yoshida M, Horinouchi S, Leptomycin B inactivates CRM1/exportin 1 by covalent modification at a cysteine residue in the central conserved region, Proc. Natl. Acad. Sci. USA 96(16):9112–9117 (1999).

6. Zhang Z-Y, Dixon JE, Active site labeling of the Yersinia protein tyrosine phosphatase: The determination of the pKa of the active site cysteine and the function of the conserved histidine 402, Biochemistry 32(8):9340–9345 (1993).

7. Arnér ES, Holmgren A, Physiological functions of thioredoxin and thioredoxin reductase, Eur. J. Biochem. 267(2):6102–6109 (2000).

8. Vallee BL, Auld DS, Zinc coordination, function, and structure of zinc enzymes and other proteins, Biochemistry 29(24):5647–5659 (1990).

9. Lehrer SS, Effects of an interchain disulfide bond on tropomyosin structure: intrinsic fluorescence and circular dichroism studies, J. Mol. Biol. 118(2):209–226 (1978).

10. Wagner DD, Lawrence SO, Ohlsson-Wilhelm BM, Fay PJ, Marder VJ, Topology and order of formation of interchain disulfide bonds in von Willebrand factor, Blood 69(1):27–32 (1987).

11. Reiter Y, Brinkmann U, Webber KO, Jung S-H, Lee B, Pastan I, Engineering interchain disulfide bonds into conserved framework regions of Fv fragments: improved biochemical characteristics of recombinant immunotoxins containing disulfide-stabilized Fv, Protein Eng. 7(5):697–704 (1994).

12. Reiter Y, Brinkmann U, Jung SH, Lee B, Kasprzyk PG, King CR, Pastan I, Improved binding and antitumor activity of a recombinant anti-erbB2 immunotoxin by disulfide stabilization of the Fv fragment, J. Biol. Chem. 269(28):18327–18331 (1994).


13. Horton HR, Moran LA, Ochs RS, Ravn JD, Scrimgeour KG, Principles of Biochemistry, 2nd ed., Prentice-Hall, Upper Saddle River, NJ, p. 102.

14. Wedemeyer WJ, Welker E, Narayan M, et al., Disulfide bonds and protein folding, Biochemistry 39:4207–4216 (2000).

15. Huck CW, Bakry R, Bonn GK, Progress in capillary electrophoresis of biomarkers and metabolites between 2002 and 2005, Electrophoresis 27(1):111–125 (2005).

16. Kim SO, Merchant K, et al., OxyR: a molecular code for redox-related signaling, Cell 109(3):383–396 (2002).

17. Toyo’oka T, et al., Amino acid composition analysis of minute amounts of cysteine-containing proteins using 4-(aminosulfonyl)-7-fluoro-2,1,3-benzoxadiazole and 4-fluoro-7-nitro-2,1,3-benzoxadiazole in combination with HPLC, Biomed. Chromatogr. 1(1):5–20 (1986).

18. Muskal S, Holbrook S, Kim S, Prediction of the disulfide-bonding state of cysteine in proteins, Protein Eng. 3(8):667–672 (1990).

19. Savojardo C, Fariselli P, Alhamdoosh M, Martelli PL, Pierleoni A, Casadio R, Improving the prediction of disulfide bonds in eukaryotes with machine learning methods and protein subcellular localization, Bioinformatics 27(16):2224–2230 (2011).

20. Fiser AM, Cserzo ET, Simon I, Different sequence environments of cysteines and half cystines in proteins, application to predict disulfide forming residues, FEBS Lett. 302:117 (1992).

21. Tessier D, Bardiaux B, Larre C, Popineau Y, Data mining techniques to study the disulfide-bonding state in proteins: signal peptide is a strong descriptor, Bioinformatics 20(16):2509–2512 (2004).

22. Fariselli P, Riccobelli P, Casadio R, Role of evolutionary information in predicting the disulfide-bonding state of cysteine in proteins, Proteins 36:340 (1999).

23. Martelli PL, Fariselli P, Malaguti L, Casadio R, Prediction of the disulfide bonding state of cysteines in proteins at 88% accuracy, Protein Sci. 11:2735–2739 (2002).

24. Ceroni A, Passerini A, Vullo A, Frasconi P, DISULFIND: A disulfide bonding state and cysteine connectivity prediction server, Nucleic Acids Res. 34:W177–W181 (2006).

25. Lin CY, Yang CB, Hor CY, Huang KS, Disulfide bonding state prediction with SVM based on protein types, Proc. 5th Int. IEEE Conf. Bio-Inspired Computing: Theories and Applications (BIC-TA), Changsha, 2010, pp. 1436–1442.

26. Fiser A, Simon I, Predicting the oxidation state of cysteines by multiple sequence alignment, Bioinformatics 16:251 (2000).

27. Mucchielli-Giorgi MH, Hazout S, Tuffery P, Predicting the disulfide bonding state of cysteines using protein descriptors, Proteins 46:243–249 (2002).

28. Frasconi P, Passerini A, Vullo A, A two-stage SVM architecture for predicting the disulfide bonding state of cysteines, Proc. 12th IEEE Workshop on Neural Networks for Signal Processing, 2002, pp. 25–34.

29. Ceroni A, Frasconi P, Passerini A, Vullo A, Predicting the disulfide bonding state of cysteines with combinations of kernel machines, J. VLSI Signal Process. 35:287–295 (2003).


30. Song JN, Wang ML, Li WJ, Xu WB, Prediction of the disulfide-bonding state of cysteines in proteins based on dipeptide composition, Biochem. Biophys. Res. Commun. 318:142–147 (2004).

31. Chen YC, Lin YS, Lin CJ, Hwang JK, Prediction of the bonding states of cysteines using the support vector machines based on multiple feature vectors and cysteine state sequences, Proteins: Struct. Funct. Bioinformatics 55(4):1036–1042 (2004).

32. Guang X, Guo Y, Xiao J, Wang X, Sun J, Xiong W, Li M, Predicting the state of cysteines based on sequence information, J. Theor. Biol. 267:312–318 (2010).

33. Savojardo C, Fariselli P, Martelli PL, Shukla P, Casadio R, Prediction of the bonding state of cysteine residues in proteins with machine-learning methods, Computational Intelligence Methods for Bioinformatics and Biostatistics, Lecture Notes in Computer Science 6685, pp. 98–111 (2011).

34. Mamathambika BS, Bardwell JC, Disulfide-linked protein folding pathways, Annu. Rev. Cell Dev. Biol. 24:211–235 (2008).

35. Sevier CS, New insights into oxidative folding, J. Cell Biol. 188:757–758 (2010).