Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function
Andrew Kernytsky, Burkhard Rost
Columbia University
Enzyme function prediction
Given a protein sequence, predict its Enzyme Commission (EC) number.
NC-IUBMB (1992) Recommendations of the International Union of Biochemistry on the Nomenclature and Classification of Enzymes. In: Enzyme Nomenclature. Academic Press, New York.
EC wheel figure: Porter CT, Bartlett GJ, Thornton JM. Nucleic Acids Res. 2004 January 1; 32: D129–D133.
[Figure: EC wheel showing the six top-level enzyme classes: Oxidoreductases (EC 1), Transferases (EC 2), Hydrolases (EC 3), Lyases (EC 4), Isomerases (EC 5), Ligases (EC 6)]
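An EC number has four dot-separated levels, and the first level picks one of the six classes on the wheel. The sketch below shows how a predicted EC number breaks down into those levels. It covers only the six classes shown on the slide (EC 7, translocases, was added in 2018), and the helper names are hypothetical.

```python
# Six top-level EC classes, as on the EC wheel (EC 7, translocases, is not included).
EC_CLASSES = {
    "1": "Oxidoreductases", "2": "Transferases", "3": "Hydrolases",
    "4": "Lyases", "5": "Isomerases", "6": "Ligases",
}


def ec_levels(ec):
    """Split an EC number such as '3.4.21.4' into its four hierarchical fields."""
    fields = ec.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return fields


def ec_prefix(ec, level):
    """Return the EC number truncated to its first `level` fields (1 to 4)."""
    if not 1 <= level <= 4:
        raise ValueError("level must be between 1 and 4")
    return ".".join(ec_levels(ec)[:level])


def top_class(ec):
    """Name of the top-level class (EC level 1)."""
    return EC_CLASSES[ec_levels(ec)[0]]


# 3.4.21.4 is trypsin, a hydrolase.
for level in range(1, 5):
    print(level, ec_prefix("3.4.21.4", level))  # 3, 3.4, 3.4.21, 3.4.21.4
print(top_class("3.4.21.4"))  # → Hydrolases
```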
Per-residue feature tracks for an example segment:
AA      TAGHCVNYDYGAGCQSGSPV
Acc     bbbbbieeeiibbieeeeee
Cons    ..|....|......||....
Feat 4  HHHEEEEELLEEEEELLLLL
Feat 5  iiibbbbbbboooobbbbbb
Feat 6  36788842100000000123
Intersection properties capture local information.
[Figure: "All global" vs. "All intersection" feature sets, on an axis running from 0.01% to 20%]
All global: limited local information.
All intersection: significant risk of overfitting during training (10^3+ features outnumber 10^2 positive samples).
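The code below treats the global vs. intersection distinction under one plausible reading. This is an assumption, not the authors' exact encoding. A global feature class is the composition of one per-residue track. An intersection class (written A×B) is the composition of joint labels at the same residue. The tracks come from the example segment above. The `bie` accessibility alphabet (buried/intermediate/exposed) is assumed.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ACC_LABELS = "bie"  # assumed: buried, intermediate, exposed


def composition(track, alphabet):
    """Global feature class: fraction of residues carrying each label."""
    counts = Counter(track)
    return {label: counts[label] / len(track) for label in alphabet}


def intersection_composition(tracks, alphabets):
    """Intersection feature class: fraction of residues carrying each
    combination of labels across several aligned per-residue tracks."""
    length = len(tracks[0])
    if any(len(t) != length for t in tracks):
        raise ValueError("tracks must have equal length")
    counts = Counter(zip(*tracks))  # joint label at each residue position
    return {combo: counts[combo] / length for combo in product(*alphabets)}


seq = "TAGHCVNYDYGAGCQSGSPV"
acc = "bbbbbieeeiibbieeeeee"
aa = composition(seq, AMINO_ACIDS)  # 20 features
aa_x_acc = intersection_composition([seq, acc], [AMINO_ACIDS, ACC_LABELS])  # 20 x 3 = 60
print(len(aa), len(aa_x_acc), aa_x_acc[("G", "b")])  # → 20 60 0.1
```

Each extra track multiplies the count. For example, a three-way AA×acc×sec class gives 20 × 3 × 3 = 180 features, assuming three secondary-structure states. A few such classes therefore pass 10^3 features quickly, which is the overfitting risk noted above.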
Algorithm overview
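Protein sequence (e.g., MSNLLKDFEVAQC) → genomes, each a set of feature classes: {AA}, {AA×sec}, {sec, AA×sec}
Inner learning algorithm: SVM or neural network
Fitness assessed for each genome (e.g., 0.635, 0.688, 0.677)
Selection, crossover, mutation → 2nd-generation genome population (e.g., {AA×sec}, {AA, AA×sec}, {sec, AA×sec}) → 3rd-generation genome population
GA evolution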
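The loop above can be sketched as a small GA over bit-list genomes, where each bit switches one feature class on or off. This is a minimal sketch under stated assumptions, not the authors' settings. All of the following are hypothetical:

* the feature-class list;
* `toy_fitness` and its benefit/cost numbers;
* tournament selection, one-point crossover and the mutation rate.

In the slides, fitness instead comes from training an SVM or neural network on the selected features.

```python
import random

# Hypothetical feature classes, named in the style of the slides.
FEATURE_CLASSES = ["AA", "sec", "acc", "AA×sec", "acc×sec"]

# Hypothetical per-class benefit; a real run would train an SVM or neural
# network on the selected classes and use its accuracy as fitness.
BENEFIT = {"AA": 0.30, "sec": 0.08, "acc": 0.05, "AA×sec": 0.25, "acc×sec": 0.05}
COST_PER_CLASS = 0.10  # crude stand-in for the overfitting risk of extra features


def toy_fitness(genome):
    """Score a bit-list genome (1 = feature class selected)."""
    chosen = [c for c, bit in zip(FEATURE_CLASSES, genome) if bit]
    return sum(BENEFIT[c] for c in chosen) - COST_PER_CLASS * len(chosen)


def repair(genome, rng):
    """Ensure at least one feature class is selected."""
    if not any(genome):
        genome[rng.randrange(len(genome))] = 1
    return genome


def tournament(pop, scores, rng, k=2):
    """Selection: return the fitter of k randomly drawn genomes."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """One-point crossover of two parent genomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genome, rng, rate=0.1):
    """Mutation: flip each bit with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genome]


def evolve(generations=30, pop_size=12, seed=0):
    rng = random.Random(seed)
    pop = [repair([rng.randint(0, 1) for _ in FEATURE_CLASSES], rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = [toy_fitness(g) for g in pop]
        elite = pop[scores.index(max(scores))]  # elitism: keep the best genome
        children = [elite]
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            children.append(repair(mutate(child, rng), rng))
        pop = children
    best = max(pop, key=toy_fitness)
    return [c for c, bit in zip(FEATURE_CLASSES, best) if bit], toy_fitness(best)


names, fit = evolve()
print(names, round(fit, 3))
```

Elitism keeps the best genome from one generation to the next, so the best fitness never drops. Because of the cost term, the search favors a few strong classes over selecting everything.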
Genetic Algorithm
1st, 2nd, 3rd, 4th generation populations
Drawn from all possible combinations of feature classes [genomes]: {AA}, {sec}, {AA×sec}, {AA, sec}, {AA, AA×sec}, {sec, AA×sec}, {AA, sec, AA×sec}
AA, sec, AA×sec: all intersection and global feature classes
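The set of all possible genomes is every non-empty combination of feature classes. The sketch below enumerates them with `itertools.combinations` (the helper name `all_genomes` is hypothetical). For AA, sec and AA×sec it lists the same seven genomes as the diagram, in the same order.

```python
from itertools import combinations


def all_genomes(feature_classes):
    """Every non-empty combination of feature classes (one genome each)."""
    genomes = []
    for size in range(1, len(feature_classes) + 1):
        genomes.extend(combinations(feature_classes, size))
    return genomes


for genome in all_genomes(["AA", "sec", "AA×sec"]):
    print(", ".join(genome))
# AA / sec / AA×sec / AA, sec / AA, AA×sec / sec, AA×sec / AA, sec, AA×sec
```

With k feature classes there are 2^k - 1 genomes. A hypothetical pool of 20 classes already gives 1,048,575 genomes. That is far too many to train a classifier on each one, which is why a GA samples the space instead.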
GA improves performance
[Figure: performance at each EC level]
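When results are reported per EC level, a prediction counts as correct at level k if its first k fields match the true EC number. The sketch below implements that scoring. It assumes this prefix-match definition, since the slides do not spell out the measure, and the labels are made up for illustration.

```python
def ec_prefix(ec, level):
    """First `level` dot-separated fields of an EC number."""
    return ".".join(ec.split(".")[:level])


def accuracy_at_level(true_ecs, predicted_ecs, level):
    """Fraction of proteins whose prediction matches the true EC number
    on the first `level` fields (level 1 = class, level 4 = full number)."""
    if len(true_ecs) != len(predicted_ecs) or not true_ecs:
        raise ValueError("need two equally long, non-empty lists")
    hits = sum(ec_prefix(t, level) == ec_prefix(p, level)
               for t, p in zip(true_ecs, predicted_ecs))
    return hits / len(true_ecs)


# Hypothetical labels, for illustration only.
true_ecs = ["3.4.21.4", "2.7.11.1", "1.1.1.1"]
predicted = ["3.4.21.5", "2.7.1.1", "4.1.1.1"]
for level in range(1, 5):
    print(level, round(accuracy_at_level(true_ecs, predicted, level), 3))
# 1 0.667 / 2 0.667 / 3 0.333 / 4 0.0
```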
Balance between intersection and global features gives best performance
AA, acc, sec, htm, cons-95
AA, acc, sec, cons-95
AA, acc, acc×sec, htm, cons-95
AA, sec, cons-97
AA, acc×sec, sec, cons-95
AA, acc, acc×sec×cons-94, sec
AA, AA×acc×sec×cons-95, sec, cons-95
AA, sec
AA, acc, sec×cons-94, cons-83×cons-94
AA, acc×sec×cons-89, cons-95
AA, acc×sec×cons-84×cons-94, sec
AA×acc×htm×cons-84×cons-95, acc, cons-94
AA
AA, acc×cons-96, sec×cons-91
AA, acc×sec×cons-94, acc×cons-94
AA, acc×sec×cons-84×cons-94
AA, acc×sec×cons-88×cons-91×cons-95
AA×cons-94, acc×cons-94
AA×cons-82, acc×sec×cons-94
AA×cons-82, acc×sec×cons-94×cons-96
AA×sec×htm×cons-95×cons-96
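The genomes above mix single-property (global) classes with ×-combined (intersection) classes. The sketch below counts the balance of the two in each genome. It assumes, as the notation suggests, that commas separate classes and "×" marks an intersection. Tokens such as cons-95 are treated as opaque labels, and the helper names are hypothetical.

```python
def split_genome(genome):
    """Split a genome string such as 'AA, acc×cons-94' into its feature classes."""
    return [part.strip() for part in genome.split(",") if part.strip()]


def balance(genome):
    """Return (global, intersection) counts; a class containing '×'
    combines several properties and counts as an intersection class."""
    classes = split_genome(genome)
    n_intersection = sum("×" in c for c in classes)
    return len(classes) - n_intersection, n_intersection


for g in ["AA, acc, sec, htm, cons-95",
          "AA, acc×sec×cons-94, acc×cons-94",
          "AA×sec×htm×cons-95×cons-96"]:
    print(balance(g), g)
# (5, 0), (1, 2), (0, 1)
```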