Post on 09-Jan-2016

Page 1: Advancement to Candidacy Computer Science Department by Rachel Karchin Advisor: Kevin Karplus

Local Statistical Dependencies in Protein Structure: Discovery, Evaluation, Prediction and Applications

Advancement to Candidacy

Computer Science Department

by Rachel Karchin

Advisor: Kevin Karplus

Page 2: Outline

Protein structure - primary, secondary, tertiary

Fold recognition, local and secondary structure

Alphabets of local structure

Designing and evaluating local structure alphabets

Improving fold recognition

Page 3: Molecular structure of proteins

Proteins are large, organic molecules composed of smaller molecules called amino acids.

Figure: ball-and-stick atomic model of crambin, a plant seed protein with 46 amino acids. Labeled residues: threonine, cysteine, arginine.

Page 4: The amino acids

There are 20 kinds of amino acids found in natural proteins.

All share a common structure.

Figure labels: R side chain, carboxyl group, amine group, alpha carbon (with attached hydrogen).

Biochemistry, Mathews, 3rd ed., Addison-Wesley

Page 5: Primary structure

Proteins consist of one or more polypeptide chains of amino acids connected by peptide bonds.

The sequence of linked amino acids along the chain is called the protein’s primary structure.

Example: Phe-Leu-Ser-Cys . . . (one-letter code: FLSC . . .)

Access Excellence NHGRI Graphics Gallery

Page 6: Secondary structure

Symmetric patterns of hydrogen bonds between amino acids.

Helix. H-bonds between residues close in primary sequence.

Anthony Day/Pace et al. 1996

Page 7: Secondary structure

Strand. H-bonds between residues not close in primary sequence.

Anthony Day/Pace et al. 1996

Page 8: Protein Folding

In an aqueous environment (such as cell cytoplasm), polypeptide chains fold into 3D shapes (tertiary structure).

Page 9: From primary to tertiary structure

A protein’s 3D shape is determined by its primary amino acid sequence (Anfinsen et al. 1963).

Predicting tertiary structure from amino acid sequence is an unsolved problem.

– Difficult to model the energies that stabilize a protein molecule.

– Conformational search space is enormous.

Laboratory of Molecular Biophysics, University of Oxford

Page 10: Fold recognition

In nature, proteins are observed to assume on the order of a thousand shapes or “folds”.

Biochemistry, Mathews, 3rd ed., Addison-Wesley

Page 11: Fold recognition

Given a target amino acid sequence:

– search a set of known folds by aligning the target with a representative template for each fold

– predict the fold whose template gets the best-scoring alignment

Target amino acid sequence: YLAADTYK

Template fold library:

Template amino acid sequence   FISSETCN   MEPSSYV   TGLIRKN
Target/template score          7          21        2
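The search described on this slide can be sketched in a few lines. This is only an illustration, assuming a plain Needleman-Wunsch global alignment with made-up match, mismatch and gap values; the slide does not say how its scores were computed, so the numbers produced here will not match the 7, 21 and 2 shown. The fold names `fold_1` to `fold_3` are hypothetical labels.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Score of the best global (Needleman-Wunsch) alignment of a and b.

    The scoring values are illustrative assumptions, not the slide's.
    """
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def recognize_fold(target, templates):
    """Score the target against every template and return the best fold name."""
    scores = {name: global_alignment_score(target, seq)
              for name, seq in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Sequences from the slide; fold names are hypothetical.
templates = {"fold_1": "FISSETCN", "fold_2": "MEPSSYV", "fold_3": "TGLIRKN"}
best, scores = recognize_fold("YLAADTYK", templates)
```

A realistic scorer would use an amino acid substitution matrix rather than a flat match/mismatch value.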

Page 12: Twilight zone sequence relationships

This method is very effective when target and template have > 30% sequence identity.

Approximately 1/3 of protein sequences can be assigned folds and modeled this way.

We would like to extend the method to sequences in the twilight zone (< 30% identity to any sequence of known structure).
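The 30% threshold can be made concrete with a small percent-identity calculation on an already aligned pair. This is a sketch under one assumption in particular: identity is counted over columns where neither sequence has a gap. Other denominators (shorter sequence length, full alignment length) are also in use, and the slide does not say which is meant.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where both rows have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def in_twilight_zone(aligned_a, aligned_b, threshold=30.0):
    """True when the pair falls below the identity threshold from the slide."""
    return percent_identity(aligned_a, aligned_b) < threshold


# Two rows taken from the multiple alignment shown on the SAM-T98 slide.
identity = percent_identity("YLAADTYK", "YLASDS-R")
```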

Page 13: SAM-T98

Build a target HMM of amino acid frequencies from a multiple alignment of the target plus its homologs (SAM-T98).

Target amino acid sequence: YLAADTYK

Search a protein database for homologs.

Multiple alignment:

YLAADTYK
FISTE-HR
HVATD-H-
-ITA--HR
YLASDS-R

Target amino acid HMM

Courtesy of K. Karplus
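The "amino acid frequencies" part of this step can be illustrated by counting residues in each column of the multiple alignment. This is a deliberately simplified sketch and not SAM-T98 itself: a real profile HMM also models insertions and deletions and regularizes the counts, none of which is shown here. Ignoring gaps when normalizing is an assumption.

```python
from collections import Counter


def column_frequencies(alignment, gap="-"):
    """Per-column amino acid frequencies of a multiple alignment (gaps ignored)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all alignment rows must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != gap)
        total = sum(counts.values())
        # An all-gap column yields an empty distribution.
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


# Multiple alignment from the slide.
alignment = ["YLAADTYK", "FISTE-HR", "HVATD-H-", "-ITA--HR", "YLASDS-R"]
profile = column_frequencies(alignment)
```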

Page 14: SAM-T98

Amino acid HMM for the target. Amino acid strings for the templates.

Three-fold increase in recognizing twilight zone similarities (Park et al. 1998).

Target amino acid HMM

Template fold library:

Template amino acid sequence   FISSETCN   MEPSSYV   TGLIRKN
Target/template score          7          21        2

Courtesy of K. Karplus

Page 15: SAM-T98 enhancements

Two-way scoring.

Augment the method with secondary structure information.

Page 16: Two-way SAM-T98

Also build amino acid HMMs for templates. Do 2-way scoring to strengthen recognition of twilight zone relationships.

Target amino acid sequence: YLAADTYK

Template fold library: template amino acid HMMs

Target/template score: 19 82 31

Page 17: Secondary structure

DSSP alphabet (Kabsch and Sander 1983). Classifies the secondary structure of a residue using known tertiary structure.

H   alpha helix
E   beta strand
I   pi helix
G   3-10 helix
T   turn
S   bend
B   bridge
C   random coil

Figure groupings: basic patterns, repeating turns, repeating bridges, other.

Biochemistry, Mathews, 3rd ed., Addison-Wesley

Page 18: Secondary structure

Alternatives to DSSP definitions.

– Collapse 8 classes to 3: H, E, C

– Other programs to automate assignment:

• Richards and Kundrot (1988) Define
• Sklenar (1989) P-Curve
• Adzhubei and Sternberg (1993)
• Frishman and Argos (1995) STRIDE
• King and Johnson (1999) xlsstr
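Collapsing the 8 DSSP classes to 3 is a simple lookup. The slide does not say which reduction is intended, so the table below is an assumption: it follows one common convention (all helices to H, strand and bridge to E, everything else to C). Other reductions exist and give different 3-state strings.

```python
# One common 8-to-3 reduction (an assumption; the slide does not specify one).
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # alpha, 3-10 and pi helix -> helix
    "E": "E", "B": "E",             # strand and bridge -> strand
    "T": "C", "S": "C", "C": "C",   # turn, bend, coil -> coil
    " ": "C", "-": "C",             # blank or dash for unassigned residues -> coil
}


def collapse_dssp(dssp_string):
    """Translate an 8-state DSSP string into a 3-state H/E/C string."""
    return "".join(DSSP_TO_3[symbol] for symbol in dssp_string)


three_state = collapse_dssp("CCHHHHGGTTEEEEBSC")
```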

Page 19: Predicting secondary structure

Extensive research on predicting secondary structure from primary sequence.

Neural nets are the most successful approach.

– PHD (Rost and Sander 1996)

– Predict_2nd (Karplus and Barrett 1998)

Best methods are around 75-80% accurate.

Page 20: Secondary structure and fold recognition

Predicted secondary structure has been shown to be useful for fold recognition (Russell et al. 1998).

Fold recognition accuracy is correlated with secondary structure prediction accuracy (Di Francesco 1995, 1997, 1999).

Why?

– Structure is more conserved than sequence.

– Proteins in the same fold family have similar topologies (secondary structure elements have similar lengths, spatial organization and connectivities).

Page 21: Two-track SAM-T2K

Predicted probability vectors of secondary structure are added to the target HMM.

Target amino acid sequence: YLAADTYK

Multiple alignment:

YLAADTYK
FISTE-HR
HVATD-H-
-ITA--HR

Target two-track HMM, with predicted secondary structure probabilities:

     P(H)   P(E)   P(C)
Y    0.65   0.2    0.15
L    0.15   0.7    0.25
A    0.01   0.04   0.9
A    0.47   0.45   0.08
D    0.85   0.1    0.05
T    0.32   0.18   0.5
Y    0.81   0.09   0.1
K    0.5    0.25   0.15

Courtesy of C. Barrett

Courtesy of K. Karplus

Page 22: Two-track SAM-T2K

Search a template library of sequence pairs with the two-track target HMM.

Target two-track HMM

Template fold library (each template is a pair: amino acid sequence, secondary structure string):

FISSETCN   MEPSSYV   TGLIRKN
CCEECHHH   HHHHCCE   EEECEEE

Target/template score: 22 68 15

Courtesy of K. Karplus

Page 23: Motivation for alternatives to secondary structure classes

What’s wrong with secondary structure classes?

– The most widely used secondary structure alphabet (3-state DSSP) is crude (Helix, Strand, Coil).

– Secondary structure classes are ambiguous.

• Automated assignment methods disagree.

• 63% agreement between DSSP, Define and P-Curve (Colloc’h et al. 1993).

Page 24: Local structure and fold recognition

What is local structure?

– describes the environment of a residue

– a residue’s relationship to its neighbors

Can use this information to predict fold from primary structure.

Requires comparing the local structure of target and template.

Template local structure: known. Target local structure: must be predicted (easier than 3D).

Page 25: Low level descriptions of local structure

Lowest level representation of protein structure: atomic position vectors.

        Atom         Residue      Position vector
        No.   Type   Type   No.   X        Y        Z
ATOM    1     CA     THR    1     7.047    14.099   3.625
ATOM    2     C      THR    1     16.967   12.784   4.338
ATOM    3     O      THR    1     15.685   12.755   5.133
ATOM    4     N      SER    2     15.115   11.555   5.265
ATOM    5     CA     SER    2     13.856   11.469   6.066
ATOM    6     C      SER    2     14.164   10.785   7.379
ATOM    7     O      SER    2     14.993   9.862    7.443
ATOM    8     CB     SER    2     12.732   10.711   5.261
ATOM    9     N      CYS    3     13.488   11.241   8.417
ATOM    10    CA     CYS    3     13.660   10.707   9.787

Conformations of Biopolymers, IUPAC-IUB

Page 26: Low level descriptions of local structure

“One level up”. From the atomic position vectors we can derive a list of properties that describe a residue’s local environment.

Conformations of Biopolymers, IUPAC-IUB

Page 27: Dihedral and bond angles

Dihedral angles are defined by 4 atoms.

Bond angles are defined by 3 atoms.

Figures: Conformations of Biopolymers, IUPAC-IUB
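Both angles can be computed directly from atomic position vectors. The sketch below uses only the standard library and assumes the usual sign convention for dihedrals (0 degrees for cis, 180 degrees for trans, positive when the far bond is rotated clockwise looking along the central bond). The helper names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_angle(p1, p2, p3):
    """Angle in degrees at p2, defined by the 3 atoms p1-p2-p3."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cosine = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def dihedral_angle(p1, p2, p3, p4):
    """Signed dihedral in degrees about the p2-p3 bond, defined by 4 atoms."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


right_angle = bond_angle((1, 0, 0), (0, 0, 0), (0, 1, 0))
quarter_turn = dihedral_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applied to backbone atoms, the same `dihedral_angle` function gives phi, psi and omega when passed the appropriate four consecutive atoms.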

Page 28: Dihedral angles: Phi, Psi, Omega

The 6 atoms in each peptide unit lie in the same plane.

ω = 180° (trans) or 0° (cis)

φ and ψ are free to rotate.

Biochemistry, Mathews, 3rd ed., Addison-Wesley

Page 29: Dihedral angles: Phi, Psi, Omega

Result: a good approximation of the polypeptide backbone is a list of (φ, ψ) pairs (ω cis is rare).

(φ, ψ) pairs are often represented on a plane called the Ramachandran plot.

http://www.biochem.artizona.edu Biochemistry 462A Lecture Notes

Page 30: A small gallery of properties: the geometry of local structure

Kappa. Virtual bond angle between Cα of residues i-2, i, i+2

Alpha. Virtual dihedral angle between Cα of residues i-1, i, i+1, i+2

Tau. Virtual bond angle between Cα of residues i-1, i, i+1

Zeta. Dihedral angle between carbonyl bonds of residues i and i-1

Page 31: Relationship of a residue to its neighbors

Density measures. How many residues are within a given distance?

Count of H-bond partners.

Example from the figure: 12 neighboring residues within a 6 Å radius, 2 H-bond partners.
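A density measure of this kind reduces to counting points inside a sphere. The sketch assumes each residue is represented by a single 3D point (for example its alpha carbon, which the slide does not specify) and uses made-up coordinates.

```python
import math


def count_neighbors(coords, index, radius=6.0):
    """Number of other residues whose representative point lies within radius."""
    center = coords[index]
    return sum(1 for j, point in enumerate(coords)
               if j != index and math.dist(center, point) <= radius)


# Hypothetical coordinates in angstroms, one point per residue.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 5.0, 0.0),
          (7.0, 0.0, 0.0), (0.0, 0.0, 6.0)]
density = count_neighbors(coords, index=0)
```

Whether the boundary distance counts as "within" is a design choice; this sketch includes it.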

Page 32: Existing local structure alphabets

Approximately 30 alphabets of local structure in the literature.

Can they be used to improve fold recognition?

Page 33: Phi/psi alphabets

Classes based on a partition of phi/psi space.

Bystroff et al. 2000. 10 classes: B E b d e G H L I x

Kang et al. 1993. 1296 classes: uniform partitioning by 10°

Sun et al. 1996. DSSP H, E plus 5 phi/psi classes: a b e l t

Figure: Bystroff et al. 2000
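The count of 1296 classes follows from cutting each angle into 360/10 = 36 bins, giving 36 x 36 cells. A minimal sketch of such a uniform partition is below; the numbering of the cells is an assumption made for illustration and is not necessarily the numbering used by Kang et al.

```python
def phi_psi_class(phi, psi, bin_width=10):
    """Index of the uniform grid cell containing (phi, psi), angles in degrees."""
    bins_per_axis = 360 // bin_width

    def bin_index(angle):
        # Shift [-180, 180) to [0, 360); +180 wraps around to the same bin as -180.
        return int(((angle + 180.0) % 360.0) // bin_width)

    return bin_index(phi) * bins_per_axis + bin_index(psi)


number_of_classes = (360 // 10) ** 2
helix_like = phi_psi_class(-60.0, -45.0)
```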

Page 34: Backbone fragment alphabets

Classes based on clustering low-level properties of contiguous series of residues.

Unger et al. 1987. ~100 6-residue fragments.

k-nearest neighbor clustering by RMSD of Cα atoms.

Centroid of each cluster selected as building block.

Figure: Unger et al. 1987

Page 35: Backbone fragment alphabets

De Brevern et al. 2000. Protein Building Blocks (PBBs).

16 classes of 5-residue fragments.

SOM clustering of vectors of 8 dihedral angles (φ and ψ).

Figure: De Brevern et al. 2000

Page 36: Desired properties of local structural alphabets

For purposes of improving fold recognition:

– Predictable from primary sequence

– Conserved within a fold family

Page 37: Comparison of existing local structure alphabets

Only a few of the alphabets have been tested for predictability. None of the alphabets have been tested for conservation within fold families.

Page 38: Designing a Local Structure Alphabet

Extract properties with respect to each residue in the dataset.

Selected property: TCO

Selected PDB structures, then property extraction:

PDBNo   AA   TCO
1       M    -0.3
2       L    -0.34
3       S    0.91
4       P    0.935
5       E    -0.1
6       V    0.2
. . .

Figure labels: i-1, i

Page 39: Designing a Local Structure Alphabet

Partition the data into k populations.

Input:

PDBNo   AA   TCO
1       M    -0.3
2       L    -0.34
3       S    0.91
4       P    0.935
5       E    -0.1
6       V    0.2
. . .

Unsupervised learning algorithm

Class A:

PDBNo   AA   TCO
1       M    -0.3
2       L    -0.34
5       E    -0.1

Class B:

PDBNo   AA   TCO
3       S    0.91
4       P    0.935
6       V    0.2

Figure: TCO axis from -1 to 1 (ticks at -1, -0.5, 0, 0.5, 1) with Class A points marked X and Class B points marked O.
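As one possible instance of the unsupervised step, the sketch below runs a one-dimensional k-means on the six TCO values from the slide. The slide does not name an algorithm, so k-means, the initialization and the function name are all assumptions. With this choice the value 0.2 (residue V) is grouped with the negative values, which differs from the slide's illustrative split; the resulting partition depends on the algorithm used.

```python
def kmeans_1d(values, k=2, iterations=100):
    """Plain 1-D k-means; returns (labels, centers)."""
    ordered = sorted(values)
    # Deterministic start: pick centers spread evenly through the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# TCO values for residues 1-6 as listed on the slide.
tco = [-0.3, -0.34, 0.91, 0.935, -0.1, 0.2]
labels, centers = kmeans_1d(tco, k=2)
```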

Page 40: Designing a Local Structure Alphabet

Selected property: KJ descriptor vector*: [ζ, τ, d1, d2, d3]

ZETA, TAU: the angles ζ and τ

D1 dison3: H-bond length from O(i) to N(i+3)

D2 dison4: H-bond length from O(i) to N(i+4)

D3 discn3: length from C(i) to N(i+3)

* Descriptor vector of key geometric properties identified by King and Johnson 1999

Page 41: Designing a Local Structure Alphabet

Extract properties with respect to each residue in the dataset.

Selected property: KJ descriptor vector: [ζ, τ, d1, d2, d3]

Selected PDB structures, then property extraction:

PDBNo   AA   KJDV
1       M    [13.6, 92.9, 3.7, 3.1, 4.1]
2       L    [14.4, 95.7, 4.9, 7.1, 4.9]
3       S    [19.8, 100.3, 7.2, 10.1, 6.9]
4       P    [18.1, 116.2, 6.7, 9.2, 6.9]
. . .

Page 42: Designing a Local Structure Alphabet

Clustering multi-dimensional data points.

PDBNo   AA   KJDV
1       M    [13.6, 92.9, 3.7, 3.1, 4.1]
2       L    [14.4, 95.7, 4.9, 7.1, 4.9]
3       S    [19.8, 100.3, 7.2, 10.1, 6.9]
4       P    [18.1, 116.2, 6.7, 9.2, 6.9]
. . .

Components are in different units. Scale to the same range?

Very high-dimensional vectors require feature reduction.
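One way to put components with different units on the same range is min-max scaling of each component to [0, 1]. This is only one option (standardizing to zero mean and unit variance is another), and the slide leaves the choice open. The sample vectors are three rows read from the slide's table with the extraction spacing repaired, which is itself an assumption.

```python
def min_max_scale(vectors):
    """Rescale each component (column) of the vectors to the range [0, 1]."""
    columns = list(zip(*vectors))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    scaled = []
    for vector in vectors:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0   # constant column -> 0.0
            for x, lo, hi in zip(vector, lows, highs)
        ])
    return scaled


# Descriptor vectors [zeta, tau, d1, d2, d3] for residues 1, 3 and 4.
descriptors = [
    [13.6, 92.9, 3.7, 3.1, 4.1],
    [19.8, 100.3, 7.2, 10.1, 6.9],
    [18.1, 116.2, 6.7, 9.2, 6.9],
]
scaled = min_max_scale(descriptors)
```

After scaling, an angle measured in degrees no longer dominates a distance measured in angstroms when Euclidean distances are used for clustering.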

Page 43: Evaluation protocol

Protocol is based on:

– testing candidate alphabets for their conservation within fold families

– testing predictability of candidate alphabets

– testing improvements in fold recognition when candidate alphabets are used

Page 44: Evaluation Protocol: string translation

Inputs: selected PDB structures and a selected alphabet.

String builder

Output: position-equivalent strings in new alphabet.

Amino acid strings:

>2abd
MDAAVKTG
>4eca
MELVIRSG
. . .

Position-equivalent strings in new alphabet:

>2abd
CAAABCAB
>4eca
ACBBABCA
. . .

Page 45: Evaluation Protocol: alignment translation

Inputs: fold family alignments and position-equivalent strings in new alphabet.

Alignment builder

Output: position-equivalent alignments in new alphabet.

Fold family alignment (amino acids):

MD-AAVKTG
ME-LVIRSG
M-SAGCRDK
MEA-SC-E-

Position-equivalent alignment in new alphabet:

CA-AABCAB
AC-BBABCA
C-AACCBBC
CCA-BB-A-
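The alignment builder step amounts to copying the gap pattern of each amino acid row onto its position-equivalent local structure string. A minimal sketch, with a hypothetical function name, using the first two rows shown on the slides:

```python
def translate_alignment_row(aligned_aa, local_string, gap="-"):
    """Thread a local structure string through the gap pattern of an aligned row."""
    residues = sum(1 for c in aligned_aa if c != gap)
    if residues != len(local_string):
        raise ValueError("local structure string must have one letter per residue")
    letters = iter(local_string)
    # Keep gaps where they are; replace each residue by its local structure letter.
    return "".join(gap if c == gap else next(letters) for c in aligned_aa)


row_2abd = translate_alignment_row("MD-AAVKTG", "CAAABCAB")
row_4eca = translate_alignment_row("ME-LVIRSG", "ACBBABCA")
```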

Page 46: Evaluation Protocol: alphabet conservation

Position-equivalent alignments in new alphabet:

CA-AABCAB
AC-BBABCA
C-AACCBBC
CCA-BB-A-

Conserved?

– Average entropy in columns of alignments.

– Relative entropy of substitution matrix constructed from alignments (Altschul 91).
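The first measure can be sketched as the Shannon entropy of the letter distribution in each column, H = -sum over letters of p log2 p, averaged over columns. Lower values indicate a better conserved alphabet. Skipping gap characters and using unweighted sequences are assumptions here; the slide does not give those details.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy in bits of the non-gap letters in one alignment column."""
    counts = Counter(c for c in column if c != gap)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def average_column_entropy(alignment, gap="-"):
    """Mean column entropy over all columns of an alignment."""
    columns = list(zip(*alignment))
    return sum(column_entropy(column, gap) for column in columns) / len(columns)


# Position-equivalent alignment from the slide.
alignment = ["CA-AABCAB", "AC-BBABCA", "C-AACCBBC", "CCA-BB-A-"]
conservation = average_column_entropy(alignment)
```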

Page 47: Evaluation Protocol: alphabet predictability

Test predictability with Predict_2nd neural net.

Improve on neural net performance with alternate methods.

Position-equivalent strings in new alphabet: predictable?

Neural net outputs: P(A) P(B) P(C)

Courtesy of C. Barrett

Page 48: Evaluation Protocol: fold recognition

Build a fold library that incorporates the local structure alphabet and do fold recognition testing using this library.

Page 49: Incorporating local structure alphabets into a fold library

Simplest. Use predicted local structure string for target and known local structure string for templates.

Target local structure string: ABBCACAB

Template fold library:

Template local structure string   CCABBBAC   AACBCAA   CAACBBB
Target/template score             7          21        2

PROBLEM! Wrong letter predicted.

Page 50: Incorporating local structure information into a fold library

Use several strings (amino acid and local structure) for target and templates.

Target with string tuple:

YLAADTYK
ABBCACAB
WYTZTTVU

Template fold library (templates with string tuples):

FISSETCN   MEPSSYV   TGLIRKN
CCABBBAC   AACBCAA   CAACBBB
YVUUTZVV   TTYUVWZ   YUUUVZW

Target/template score: 6 23 5

PROBLEM! Wrong letters predicted.

Page 51: Extending the SAM-T2K method with local structure information

Add tracks to the target HMM. Search template library of sequence tuples with multi-track target HMM.

Target multi-track HMM

Template fold library (templates with sequence tuples):

FISSETCN   MEPSSYV   TGLIRKN
CCABBBAC   AACBCAA   CAACBBB
YVUUTZVV   TTYUVWZ   YUUUVZW

Target/template score: 75 3 22

Page 52: Extending the SAM-T2K method with local structure information

Adding local structure strings to the template HMM. Enable 2-way HMM scoring.

Target:

YLAADTYK
ABBCACAB
WYTZTTVU

Target local structure probabilities:

     A      B      C
Y    0.65   0.2    0.15
L    0.15   0.7    0.25
A    0.01   0.04   0.9
A    0.47   0.45   0.08
D    0.85   0.1    0.05
T    0.32   0.18   0.5
Y    0.81   0.09   0.1
K    0.5    0.25   0.15

Template fold library (template amino acid HMMs plus local structure strings):

CCABBBAC   AACBCAA   CAACBBB
YVUUTZVV   TTYUVWZ   YUUUVZW

Target/template score: 8 24 49

Page 53: Extending the SAM-T2K method with local structure information

Build multi-track HMMs for target and template.

Target multi-track HMM

Template fold library: template multi-track HMMs

Target/template score: 6 23 5

Page 54: Evaluation Protocol: fold recognition

Fold classification database, reduced to a non-redundant fold test set:

119l    T4 Lysozyme
12asA   Asparagine Synthetase
153l    Goose Lysozyme
16pk    Phosphoglycerate Kinase
16vpA   VP16 regulatory protein
. . .

Target: 119l

Template fold library: 119l, 12asA, 153l, 16pk, 16vpA, . . .

Templates               12asA   153l   16pk
Target/template score   12      2      71

Page 55: Evaluation Protocol: fold recognition

Figure: false positives (vertical axis, logarithmic, 1 to 2000) versus true positives (horizontal axis, logarithmic, 500 to 10000), where + = same fold. Curves: old PSI-blast, PSI-blast, SAM-T2K, SAM-T2K EHL 50-50, SAM-T2K EBGHTL 50-50, DALI.

Courtesy of K. Karplus
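A curve of this kind can be produced by ranking all target/template pairs by score and accumulating true and false positives down the ranking. The sketch below assumes that a higher score means a more confident match and that a pair is a true positive when target and template share a fold; the example labels are hypothetical and are not taken from the plot.

```python
def tp_fp_curve(scored_pairs):
    """Cumulative (true positives, false positives) down a ranked list.

    scored_pairs: iterable of (score, same_fold) tuples.
    """
    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    true_positives = false_positives = 0
    curve = []
    for _, same_fold in ranked:
        if same_fold:
            true_positives += 1
        else:
            false_positives += 1
        curve.append((true_positives, false_positives))
    return curve


# Hypothetical labeled scores for three target/template pairs.
curve = tp_fp_curve([(12, False), (2, True), (71, True)])
```

A method is better when it accumulates more true positives before reaching a given number of false positives.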

Page 56: Research Schedule

Year 1: Find a local structure alphabet that improves fold recognition. Build a fold library that uses the alphabet. Put up a webserver for public use of the library.

Summer 2002: CASP5

Page 57: Research Schedule

Year 2: Design more alphabets. Compare and combine new and existing alphabets. Expand the methods to continuous-value predictions. Incorporate best combination into my fold library.

June 2003: Produce completed dissertation.

Page 58: Conclusion

Focus of the work:

– Evaluate existing local structure alphabets

– Design and evaluate novel local structure alphabets

Evaluation protocol:

– conservation

– predictability

– fold recognition