Microarrays: A Comparison of Classification and Feature
Selection Algorithms for Interpretation
Lynn H. Lee, Hiram Shaish, Eric A. Smith, Min C. Zhang
Responsibilities
• Lynn Lee studied and described the classification methods, and performed all the experiments that use KNN as the classification method.
• Hiram Shaish studied and described the background of microarrays, and compiled and analyzed the experimental results.
• Eric Smith programmed, tested, and described the data parser.
• Min Zhang studied and described the feature selection methods, and performed all the experiments that use SVM as the classification method.
• Each team member contributed to the writing and editing process.
The Parser
• Written in Perl
• 100 lines of code, plus 90 lines of comments and blank lines
• 2 phases:
– Parse SOFT headers to generate some ARFF headers
– Parse SOFT matrix, generating the rest of the ARFF headers and the ARFF matrix
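The two phases above can be sketched roughly as follows. This is an illustrative Python sketch, not the team's Perl parser. It assumes a GEO DataSet (GDS) style SOFT file in which `!subset_description` and `!subset_sample_id` lines give the class of each sample and the expression matrix sits between `!dataset_table_begin` and `!dataset_table_end`. The function name `soft_to_arff` and the default relation name are hypothetical.

```python
def soft_to_arff(soft_text, relation="microarray"):
    """Convert GDS-style SOFT text to ARFF text (simplified sketch)."""
    lines = soft_text.splitlines()

    # Phase 1: read the subset headers to learn each sample's class label.
    sample_class = {}
    description = None
    for line in lines:
        if line.startswith("^SUBSET"):
            description = None
        elif line.startswith("!subset_description"):
            description = line.split("=", 1)[1].strip()
        elif line.startswith("!subset_sample_id") and description is not None:
            for sample in line.split("=", 1)[1].split(","):
                sample_class[sample.strip()] = description

    # Phase 2: read the matrix (genes in rows, samples in columns).
    samples, genes, columns = [], [], {}
    in_table, seen_header = False, False
    for line in lines:
        if line.startswith("!dataset_table_begin"):
            in_table = True
            continue
        if line.startswith("!dataset_table_end"):
            break
        if not in_table:
            continue
        fields = line.split("\t")
        if not seen_header:
            # Assumed layout: ID_REF, IDENTIFIER, then one column per sample.
            samples = fields[2:]
            columns = {s: [] for s in samples}
            seen_header = True
            continue
        genes.append(fields[0])
        for s, v in zip(samples, fields[2:]):
            # ARFF marks missing values with "?".
            columns[s].append(v if v not in ("", "null") else "?")

    # Emit ARFF: one numeric attribute per gene, one instance per sample.
    classes = sorted(set(sample_class.values()))
    out = ["@relation " + relation]
    out += ["@attribute '%s' numeric" % g for g in genes]
    out.append("@attribute class {%s}" % ",".join("'%s'" % c for c in classes))
    out.append("@data")
    for s in samples:
        label = "'%s'" % sample_class[s] if s in sample_class else "?"
        out.append(",".join(columns[s] + [label]))
    return "\n".join(out) + "\n"
```

The matrix is transposed on output because SOFT stores one gene per row, while ARFF expects one instance (here, one sample) per row.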
The Data
• 75 samples
• 22215 genes
• 3 classes: smokers, non-smokers, those who quit smoking
• Easy phenotype to verify
• Caveats?
Feature Selection
• Info Gain
• Chi Square
• 1, 2, 5, 10, 20, 50, 100, 200, 300, 400, 500 features selected
• Results: almost identical features selected for both algorithms
• Reflects ‘partitionability’ of data set
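As a rough illustration of how the two rankings are computed, the sketch below scores each gene by information gain and by the chi-square statistic, then keeps the top k. It is a simplified stand-in, not the procedure used in the experiments: splitting each gene at its median is an assumption made here to get discrete bins, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def binarize(values):
    """Assumed discretization: True if the value is above the gene's median."""
    m = median(values)
    return [v > m for v in values]


def info_gain(bins, labels):
    """Class entropy minus the weighted class entropy within each bin."""
    n = len(labels)
    remainder = 0.0
    for b in set(bins):
        subset = [y for x, y in zip(bins, labels) if x == b]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def chi_square(bins, labels):
    """Pearson chi-square statistic of the bin-by-class contingency table."""
    n = len(labels)
    bin_totals = Counter(bins)
    class_totals = Counter(labels)
    observed = Counter(zip(bins, labels))
    stat = 0.0
    for b, bt in bin_totals.items():
        for c, ct in class_totals.items():
            expected = bt * ct / n
            stat += (observed[(b, c)] - expected) ** 2 / expected
    return stat


def rank_features(X, y, score, k):
    """Return the indices of the k highest-scoring columns of X."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((score(binarize(column), y), j))
    scores.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by index
    return [j for _, j in scores[:k]]
```

Calling `rank_features(X, y, info_gain, k)` and `rank_features(X, y, chi_square, k)` on the same data gives the two lists whose overlap can then be compared. Both scores reward genes whose bins line up with the class labels, which is one plausible reason the two rankings would agree closely.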
Classification
• ECOC
• KNN
• Paired the 2 classification algorithms with 2 feature selection algorithms
• Results:
– KNN ‘out-classifies’ ECOC with fewer features (70% with 1)
– Highest accuracy as a function of feature selection algorithm
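A minimal sketch of how a ranked feature list can be paired with KNN to measure accuracy at each feature count. The slides do not say how accuracy was estimated or which k was used, so the leave-one-out evaluation, Euclidean distance, and `k=3` below are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean)."""
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, feature_idx, k=3):
    """Leave-one-out accuracy using only the columns in feature_idx."""
    proj = [[row[j] for j in feature_idx] for row in X]
    correct = 0
    for i in range(len(proj)):
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, proj[i], k) == y[i]:
            correct += 1
    return correct / len(proj)


def accuracy_curve(X, y, ranked, counts, k=3):
    """Accuracy for each requested number of top-ranked features."""
    return {n: loo_accuracy(X, y, ranked[:n], k) for n in counts if n <= len(ranked)}
```

Here `ranked` is a list of column indices ordered best-first by a feature selection method, and `counts` would be the feature counts from the Feature Selection slide (1, 2, 5, and so on). Plotting the returned dictionary for each pairing of classifier and ranking gives the accuracy-versus-features comparison summarized in the results.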
Classification
• Accuracy does not increase beyond a maximum potential, regardless of feature #
• Suggests an inherent characteristic of the data