Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function Andrew Kernytsky, Burkhard Rost Columbia University

Post on 18-Jan-2018

Page 1

Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function

Andrew Kernytsky, Burkhard Rost
Columbia University

Page 2

Enzyme function prediction

Given a protein sequence, predict its Enzyme Commission (EC) number.

NC-IUBMB (1992) Recommendations of the International Union of Biochemistry on the Nomenclature and Classification of Enzymes. In: Enzyme Nomenclature. Academic Press, New York.

EC wheel figure: Porter CT, Bartlett GJ, Thornton JM. Nucleic Acids Res. 2004 January 1; 32: D129–D133.

EC wheel, top-level classes: Oxidoreductases, Transferases, Hydrolases, Lyases, Isomerases, Ligases
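The prediction target can be made concrete with a small sketch. An EC number has four dot-separated fields, and the first field picks one of the six top-level classes on the wheel. The numbering 1 to 6 below follows the standard EC scheme. The helper names `ec_prefix` and `ec_class_name` are hypothetical and not part of the presented method.

```python
# Top-level EC classes shown on the slide, keyed by the first EC field.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_prefix(ec_number: str, level: int) -> str:
    """Return the first `level` fields of a four-field EC number."""
    fields = ec_number.split(".")
    if len(fields) != 4 or not 1 <= level <= 4:
        raise ValueError("expected a four-field EC number and a level in 1..4")
    return ".".join(fields[:level])


def ec_class_name(ec_number: str) -> str:
    """Name of the top-level class that an EC number belongs to."""
    return EC_CLASSES[int(ec_number.split(".")[0])]


example = "3.4.21.4"  # illustrative EC number in the hydrolase class
print(ec_prefix(example, 1), ec_class_name(example))
print(ec_prefix(example, 2))
```

Predicting at a coarser EC level then amounts to comparing prefixes of the same length rather than full four-field numbers.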

Page 3

Intersection properties capture local information

AA      TAGHCVNYDYGAGCQSGSPV
Acc     bbbbbieeeiibbieeeeee
Cons    ..|....|......||....

Feat 4  HHHEEEEELLEEEEELLLLL
Feat 5  iiibbbbbbboooobbbbbb
Feat 6  36788842100000000123

Figure values: 20%, 10%, 5%, 1%, 0.1%, 0.01%

All global: limited local information

All intersection: significant risk of overfitting during training (10^3+ features > 10^2 positive samples)
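One way to read the slide is that a global feature counts residues in a single per-residue track, while an intersection feature counts residues that satisfy conditions in several tracks at once. The sketch below illustrates that reading with the tracks from the slide. It assumes that the "Feat 4" track is secondary structure (H/E/L) and that `e` in the Acc track means exposed. The function `fraction_matching` is a hypothetical helper, not the authors' feature definition.

```python
# Per-residue tracks copied from the slide (20 residues each).
AA = "TAGHCVNYDYGAGCQSGSPV"
ACC = "bbbbbieeeiibbieeeeee"   # assumed: b = buried, i = intermediate, e = exposed
SEC = "HHHEEEEELLEEEEELLLLL"   # assumed: the "Feat 4" track, read as H/E/L


def fraction_matching(tracks, states):
    """Fraction of residues whose state equals the requested one in every named track."""
    length = len(next(iter(tracks.values())))
    hits = sum(
        all(tracks[name][i] == state for name, state in states.items())
        for i in range(length)
    )
    return hits / length


tracks = {"AA": AA, "acc": ACC, "sec": SEC}

# Global feature: one property, counted over the whole sequence.
exposed = fraction_matching(tracks, {"acc": "e"})
# Intersection features: several properties must hold at the same residue.
exposed_loop = fraction_matching(tracks, {"acc": "e", "sec": "L"})
exposed_loop_gly = fraction_matching(tracks, {"AA": "G", "acc": "e", "sec": "L"})

print(exposed, exposed_loop, exposed_loop_gly)
```

Each added condition shrinks the set of matching residues, which is why the number of possible intersection features grows quickly while the support for each one drops.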

Page 4

Algorithm overview

Protein sequence: MSNLLKDFEVAQC

All intersection and global feature classes: AA, sec, AA×sec

All possible combinations of feature classes [genomes]:
AA
sec
AA×sec
AA, sec
AA, AA×sec
sec, AA×sec
AA, sec, AA×sec

Genetic Algorithm (GA evolution)

Generation populations: 1st, 2nd, 3rd, 4th

Example genomes: AA; AA×sec; sec, AA×sec
Inner learning algorithm: SVM or neural network
Fitness assessed: 0.635, 0.688, 0.677
Selection, crossover, mutation
2nd generation genome population: AA×sec; AA, AA×sec; sec, AA×sec
3rd generation genome population
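The loop in the overview can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a genome is a non-empty set of feature classes, `toy_fitness` is a made-up stand-in for training and scoring the inner learner (SVM or neural network), and the selection scheme (truncation plus one elite), population size, and mutation rate are all hypothetical choices.

```python
import random
from itertools import combinations

FEATURE_CLASSES = ("AA", "sec", "AA×sec")


def all_genomes(classes):
    """Every non-empty combination of feature classes."""
    return [frozenset(c) for r in range(1, len(classes) + 1)
            for c in combinations(classes, r)]


def toy_fitness(genome):
    """Placeholder score; a real run would train and evaluate the inner learner."""
    weights = {"AA": 0.30, "sec": 0.20, "AA×sec": 0.35}  # invented numbers
    return sum(weights[c] for c in genome) - 0.05 * len(genome) ** 2


def crossover(a, b, classes, rng):
    """Child inherits each class found in either parent with probability 0.5."""
    child = frozenset(c for c in classes if (c in a or c in b) and rng.random() < 0.5)
    return child or a  # fall back to a parent if the child came out empty


def mutate(genome, classes, rng, rate=0.2):
    """Flip the presence of each class with probability `rate`."""
    out = set(genome)
    for c in classes:
        if rng.random() < rate:
            out.symmetric_difference_update({c})
    return frozenset(out) or genome


def evolve(classes, fitness, generations=4, pop_size=6, seed=0):
    rng = random.Random(seed)
    population = [rng.choice(all_genomes(classes)) for _ in range(pop_size)]
    for _ in range(generations - 1):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - 1:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, classes, rng), classes, rng))
        population = [ranked[0]] + children        # keep the best genome (elitism)
    return max(population, key=fitness)


best = evolve(FEATURE_CLASSES, toy_fitness)
print(sorted(best), round(toy_fitness(best), 3))
```

With only three feature classes there are seven genomes and exhaustive search would do. The search only pays off once many global and intersection classes are available.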

Page 5

GA improves performance

[Figure; axis label: EC Level]

Page 6

Balance between intersection and global features gives best performance

AA, acc, sec, htm, cons-95

AA, acc, sec, cons-95

AA, acc, acc×sec, htm, cons-95

AA, sec, cons-97

AA, acc×sec, sec, cons-95

AA, acc, acc×sec×cons-94, sec

AA, AA×acc×sec×cons-95, sec, cons-95

AA, sec

AA, acc, sec×cons-94, cons-83×cons-94

AA, acc×sec×cons-89, cons-95

AA, acc×sec×cons-84×cons-94, sec

AA×acc×htm×cons-84×cons-95, acc, cons-94

AA

AA, acc×cons-96, sec×cons-91

AA, acc×sec×cons-94, acc×cons-94

AA, acc×sec×cons-84×cons-94

AA, acc×sec×cons-88×cons-91×cons-95

AA×cons-94, acc×cons-94

AA×cons-82, acc×sec×cons-94

AA×cons-82, acc×sec×cons-94×cons-96

AA×sec×htm×cons-95×cons-96
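The genomes listed above can be inspected programmatically. The sketch below parses one genome string and counts its feature classes by kind. It assumes that a class made of a single property (for example `AA` or `cons-95`) is global and that a class joined with `×` is an intersection. The slide does not spell out that rule, and `parse_genome` and `count_kinds` are hypothetical helpers.

```python
def parse_genome(text):
    """Split 'AA, acc×sec×cons-94, sec' into feature classes (tuples of properties)."""
    return [
        tuple(prop.strip() for prop in feature.split("×"))
        for feature in text.split(",")
        if feature.strip()
    ]


def count_kinds(genome):
    """Return (number of global classes, number of intersection classes)."""
    n_global = sum(1 for feature in genome if len(feature) == 1)
    return n_global, len(genome) - n_global


balanced = parse_genome("AA, acc×sec×cons-94, sec")
all_global = parse_genome("AA, acc, sec, htm, cons-95")
all_intersection = parse_genome("AA×sec×htm×cons-95×cons-96")

for genome in (balanced, all_global, all_intersection):
    print(genome, count_kinds(genome))
```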