The Effect of New Parameters and Increased Database Size on the Cysteine Oxidation Prediction Program. Megan Riddle with Ricardo Sanchez and Dr. Jamil Momand. California State University, Los Angeles. August 23, 2007.

Page 1

The Effect of New Parameters and Increased Database Size on the Cysteine Oxidation Prediction Program

Megan Riddle with Ricardo Sanchez and Dr. Jamil Momand

California State University, Los Angeles
August 23, 2007

[Figure: structure of cysteine, HS-CH2-CH(NH2)-COOH]

Page 2

Overview

• Cysteine Oxidation Prediction Program (COPP)
  – Oxidation defined
• Biological Significance of Cysteine Oxidation
  – Effects of oxidation on proteins
• Summer 2007 Goals
  – Increase database size
  – Add new parameters
• Methods and Results

Page 3

Cysteine Oxidation Prediction Program

• Goal: Create a program that will use physicochemical parameters to predict reactive surface cysteine thiols

• Methods:
  – Gather examples of proteins susceptible to cysteine oxidation
  – Extract parameters from Protein Data Bank
  – Use computer classifier C4.5 to determine rules that will predict if cysteine can become oxidized

Page 4

Oxidation of Cysteines

Sanchez (2007)

Page 5

Cysteine Prediction

Two types of cysteine oxidation:

1. Permanent structural oxidation: cysteines that form permanent disulfide bonds or bind to metals shortly after translation
  – Prediction programs based on sequence already exist (88% accuracy; Martelli et al. 2002)

2. Reactive surface cysteine thiols: cysteines that become oxidized under certain conditions, most of them reversibly
  – No prediction programs exist → COPP

Page 6

Biological Significance: Oxidation and Enzyme Function

• The active sites of glutaredoxin and thioredoxin cycle between reduced and oxidized states

http://www.cs.stedwards.edu/chem/Chemistry/CHEM43/CHEM43/Thioredox/RNA2.GIF

Page 7

Enzyme Inactivation via Oxidation

• H2O2 inactivates PTEN tumor suppressor protein by causing the formation of a disulfide bond

Lee et al. JBC (2002)

Page 8

Summer 2007 Goals

1. Increase the size of the COPP database

2. Test new parameters to determine if they affect the rules and accuracy of COPP


Page 9

Increase Database Size

• Previously:
  – 85 proteins that undergo non-structural cysteine oxidation
  – 135 cysteines that undergo oxidation
  – 225 cysteines that remain reduced under oxidizing conditions

• Creating an accurate, general set of rules for cysteine oxidation requires a large, unbiased database

Page 10

Methods: Increase Database Size

1. Search Entrez for keywords
  • e.g. cysteine and oxidation, sulfenic acid

2. Look for proteins in Protein Data Bank

Potential Proteins

Page 11

Increase Database Size

3. Do BLASTALL – eliminate proteins with:
  • Identity > 35%
  • E value < 1
  • Conserved cys

Potential Proteins

Cysteines Oxidize
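The BLASTALL elimination step can be sketched in a few lines of Python. This is only an illustration: the `BlastHit` record, its field names, and the example proteins are hypothetical, and the sketch assumes a candidate is dropped only when all three listed criteria hold for the same hit, which the slide does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One pairwise hit between a candidate and a protein already in the database.

    Hypothetical record; a real run would fill it from BLAST output.
    """
    candidate: str
    percent_identity: float
    e_value: float
    cysteine_conserved: bool


def is_redundant(hit, max_identity=35.0, max_e_value=1.0):
    # Assumption: all three criteria must hold together for elimination.
    return (
        hit.percent_identity > max_identity
        and hit.e_value < max_e_value
        and hit.cysteine_conserved
    )


def filter_candidates(candidates, hits):
    """Return the candidates that have no redundant hit, in their original order."""
    redundant = {h.candidate for h in hits if is_redundant(h)}
    return [c for c in candidates if c not in redundant]


# Made-up example data.
hits = [
    BlastHit("protA", 72.0, 1e-40, True),   # similar, significant, cysteine conserved
    BlastHit("protB", 28.0, 0.5, True),     # identity below the cutoff
    BlastHit("protC", 55.0, 1e-10, False),  # cysteine not conserved
]
kept = filter_candidates(["protA", "protB", "protC"], hits)
print(kept)  # protA is eliminated; protB and protC are kept
```

The thresholds are exposed as keyword arguments so that a stricter or looser redundancy cutoff can be tried without changing the function.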

Page 12

Increase Database Size

[Diagram: Original Proteins + New Proteins → C4.5/J48 → Rules to Classify Cysteines]

Page 13

Results: New Proteins

• Increased database size caused reduction in rules
• Accuracy decreased

Old Rules (81.8% Accuracy)

S1 DISTANCE <= 6: 1 (88.19/17.0)
S1 DISTANCE > 6
|   ASA (Å²) <= 1: 0 (136.51/6.51)
|   ASA (Å²) > 1
|   |   N1 DISTANCE <= 5.2: 1 (32.54/9.0)
|   |   N1 DISTANCE > 5.2
|   |   |   O1 ASA <= 2: 1 (33.0/15.0)
|   |   |   O1 ASA > 2: 0 (71.76/15.76)

New Rules (79.1% Accuracy)

S1 DISTANCE <= 6.1: 1 (115.44/28.0)
S1 DISTANCE > 6.1
|   ASA (Å²) <= 1.8: 0 (177.51/12.51)
|   ASA (Å²) > 1.8
|   |   N1 DISTANCE <= 5.4: 1 (46.54/17.0)
|   |   N1 DISTANCE > 5.4: 0 (133.51/39.51)

[Figure: + and − charges 5.2 Å apart. Sanchez (2007)]
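In J48 output, the pair in parentheses after each leaf, (n/m), gives the number of training instances that reach the leaf and how many of them the leaf misclassifies; fractional counts arise when instances with missing values are split across branches. A minimal sketch of how the leaf counts in the two rule sets translate into training-set accuracy follows. The function name is illustrative, and the small gap to the reported 81.8% and 79.1% is assumed here to come from those figures being measured differently (for example by cross-validation), which the slides do not say.

```python
def training_accuracy(leaves):
    """Percent of training instances classified correctly, from (n, m) leaf counts."""
    total = sum(n for n, _ in leaves)
    errors = sum(m for _, m in leaves)
    return 100.0 * (total - errors) / total


# (instances reaching the leaf, instances misclassified), copied from the slide.
old_rules = [(88.19, 17.0), (136.51, 6.51), (32.54, 9.0), (33.0, 15.0), (71.76, 15.76)]
new_rules = [(115.44, 28.0), (177.51, 12.51), (46.54, 17.0), (133.51, 39.51)]

old_total = sum(n for n, _ in old_rules)  # about 362 cysteines
new_total = sum(n for n, _ in new_rules)  # about 473 cysteines

print(round(training_accuracy(old_rules), 1))  # about 82.5
print(round(training_accuracy(new_rules), 1))  # about 79.5
```

The leaf totals also show the growth of the database: roughly 362 cysteines behind the old rules and roughly 473 behind the new ones.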

Page 14

Methods: Parameters already used by COPP

• S1 DISTANCE: distance to nearest sulfur atom
• S1 ASA: area exposed to the surface
• N1 DISTANCE: distance to the nearest +nitrogen atom
• N1 DONOR: nitrogen’s parent side chain
• N1 ASA: area exposed to the surface
• O1 DISTANCE: distance to the nearest -oxygen
• O1 DONOR: oxygen’s parent side chain
• O1 ASA: area exposed to the surface
• ASA: exposed surface of S in question
• CLASS: class: 0 if reduced; 1 otherwise
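The distance parameters in this list are nearest-neighbour distances from the cysteine sulfur. A minimal sketch of that calculation is shown here, assuming atom positions are already available as (x, y, z) tuples in Å; the helper name and the coordinates are made up for illustration and are not COPP's own code.

```python
import math


def nearest_distance(target, others):
    """Smallest Euclidean distance from `target` to any point in `others`.

    Units follow the input coordinates (Å for Protein Data Bank files).
    """
    return min(math.dist(target, p) for p in others)


# Made-up coordinates: the sulfur in question and two other sulfur atoms.
sulfur = (0.0, 0.0, 0.0)
other_sulfurs = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]

s1_distance = nearest_distance(sulfur, other_sulfurs)
print(s1_distance)  # 5.0
```

The same helper would serve for N1 DISTANCE and O1 DISTANCE by passing the relevant nitrogen or oxygen positions instead of sulfur positions.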


Page 16

New Parameters

• pKa: negative log of the acid dissociation constant
  – How easily can the S lose a proton?

• Electrostatic Potential: potential energy per unit charge
  – How well stabilized is the charged S after the proton is lost?

[Figure: cysteine thiolate, -S-CH2-CH(NH2)-COOH, after loss of the thiol proton]
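The link between pKa and how readily the sulfur loses its proton can be made concrete with the Henderson-Hasselbalch relation: the deprotonated (thiolate) fraction is 1 / (1 + 10^(pKa − pH)). The sketch below is illustrative only; the function name, the default pH of 7.4, and the two example pKa values are assumptions, not values from the slides.

```python
def thiolate_fraction(pka, ph=7.4):
    """Fraction of a thiol present as thiolate (S-) at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


high_pka = thiolate_fraction(8.5)  # roughly 7% deprotonated
low_pka = thiolate_fraction(6.5)   # roughly 89% deprotonated
print(f"{high_pka:.3f} {low_pka:.3f}")
```

A cysteine whose pKa lies below the surrounding pH is mostly in the charged thiolate form, which is why a lower pKa is expected to favor oxidation.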

Page 17

Methods: New Parameters

• PCE: Protein Continuum Electrostatics
  – Calculates Electrostatic Potential

Coordinates (x, y, z)          Electrostatic Potential
 7.854     0.668    -0.602     -10.725
 4.683     1.223     3.305     -25.413
 3.330     8.072     3.708     -19.335
 2.256   -11.243     9.879     -21.887
14.014     7.907     3.298     -13.670

Miteva et al. NAR (2005)
http://bioserv.rpbs.jussieu.fr/cgi-bin/PCE-Pot

Page 18

New Parameters

• PROPKA
  – Calculates pKa

Li et al. Proteins (2005)
http://propka.ki.ku.dk/

Page 19

New Parameters

[Diagram: Original Data + Electrostatic Potential and pKa Data → C4.5/J48 → Rules to Classify Cysteines]

Page 20

Results: New Parameters

• New parameters caused an alteration in the final rule
• The accuracy is similar

Old Parameters (79.0698% Accuracy)

S1 DISTANCE <= 6.1: 1 (115.44/28.0)
S1 DISTANCE > 6.1
|   ASA (Å²) <= 1.8: 0 (177.51/12.51)
|   ASA (Å²) > 1.8
|   |   N1 DISTANCE <= 5.4: 1 (46.54/17.0)
|   |   N1 DISTANCE > 5.4: 0 (133.51/39.51)

New Parameters (78.6469% Accuracy)

S1 DISTANCE <= 6.1: 1 (115.44/28.0)
S1 DISTANCE > 6.1
|   ASA (Å²) <= 1.8: 0 (177.51/12.51)
|   ASA (Å²) > 1.8
|   |   pKa of S0 <= 8.75: 1 (74.29/32.0)
|   |   pKa of S0 > 8.75: 0 (105.76/26.76)
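Read as code, the two rule sets differ only in their last test. The sketch below transcribes the thresholds from this slide into two small Python functions; the function and argument names are illustrative, the example cysteine is made up, and class 1 means oxidized while class 0 means reduced, as in the CLASS parameter.

```python
def classify_old_parameters(s1_distance, asa, n1_distance):
    """Rules learned without pKa: the final test uses N1 DISTANCE."""
    if s1_distance <= 6.1:
        return 1
    if asa <= 1.8:
        return 0
    return 1 if n1_distance <= 5.4 else 0


def classify_new_parameters(s1_distance, asa, pka):
    """Rules learned with pKa available: the final test uses the pKa of the sulfur."""
    if s1_distance <= 6.1:
        return 1
    if asa <= 1.8:
        return 0
    return 1 if pka <= 8.75 else 0


# Made-up cysteine: no sulfur nearby, exposed, distant nitrogen, lowered pKa.
old_call = classify_old_parameters(s1_distance=9.0, asa=12.0, n1_distance=7.0)
new_call = classify_new_parameters(s1_distance=9.0, asa=12.0, pka=8.2)
print(old_call, new_call)  # 0 1
```

The two trees agree on every cysteine that is resolved by the S1 DISTANCE or ASA tests and can disagree only on exposed cysteines without a nearby sulfur.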

Page 21

Conclusions

• New Proteins:
  – A larger database results in a more general, but less accurate, set of rules

• New Parameters:
  – A low pKa value correlates with oxidation, but does not improve the accuracy of COPP

• Future Goals:
  – Make COPP publicly available
  – Modify COPP to predict type of oxidation

Page 22

With many thanks to. . .

Dr. Jamil Momand, Ricardo Sanchez, and the rest of the Momand lab

SoCalBSI fellow students and mentors

California State University at Los Angeles

Funding from:

LA Orange County Biotechnology Center

Page 24

[Figure: Correctly Classified Instances (%) (axis 66 to 84) and Tree Size (axis 0 to 45) plotted against M-value (0 to 60)]

Page 25

[Figure: True Positive Rate plotted against False Positive Rate (both 0 to 1), with series M2 through M50]

Page 26

Results: New Proteins

• Increased database size caused reduction in rules
• Accuracy decreased

Old Rules (81.8% Accuracy)

S1 DISTANCE <= 6: 1 (88.19/17.0)
S1 DISTANCE > 6
|   ASA (Å²) <= 1: 0 (136.51/6.51)
|   ASA (Å²) > 1
|   |   N1 DISTANCE <= 5.2: 1 (32.54/9.0)
|   |   N1 DISTANCE > 5.2
|   |   |   O1 ASA <= 2: 1 (33.0/15.0)
|   |   |   O1 ASA > 2: 0 (71.76/15.76)

New Rules (79.1% Accuracy)

S1 DISTANCE <= 6.1: 1 (115.44/28.0)
S1 DISTANCE > 6.1
|   ASA (Å²) <= 1.8: 0 (177.51/12.51)
|   ASA (Å²) > 1.8
|   |   N1 DISTANCE <= 5.4: 1 (46.54/17.0)
|   |   N1 DISTANCE > 5.4: 0 (133.51/39.51)

[Figure: Cys-SH HS-Cys]