Microarrays: A Comparison of Classification and Feature Selection Algorithms for Interpretation
Lynn H. Lee, Hiram Shaish, Eric A. Smith, Min C. Zhang

Responsibilities
• Lynn Lee studied and described the classification methods, and performed all the experiments that use KNN as the classification method.
• Hiram Shaish studied and described the background of microarrays, and compiled and analyzed the experimental results.
• Eric Smith programmed, tested, and described the data parser.
• Min Zhang studied and described the feature selection methods, and performed all the experiments that use SVM as the classification method.
• Each team member contributed to the writing and editing process.

The Parser
• Written in Perl
• 100 lines of code, plus 90 lines of comments and blank lines
• 2 phases:
– Parse the SOFT (GEO Simple Omnibus Format in Text) headers to generate some of the ARFF (Weka Attribute-Relation File Format) headers
– Parse the SOFT matrix, generating the rest of the ARFF headers and the ARFF matrix
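
The two parsing phases can be sketched in Python. This is a minimal sketch under stated assumptions, not the group's Perl parser. It assumes a GDS-style SOFT file with the following layout:
• Class labels come from `!subset_description` / `!subset_sample_id` lines.
• The expression table sits between `!dataset_table_begin` and `!dataset_table_end`.
• The table has `ID_REF` and `IDENTIFIER` columns before the per-sample columns.
The function names and the relation name `smoking` are hypothetical.

```python
def parse_soft_headers(lines):
    """Phase 1: map each sample ID to its class label from the SOFT subset headers.

    Assumes GDS-style subset blocks such as:
        !subset_description = smoker
        !subset_sample_id = GSM1,GSM2
    """
    sample_class = {}
    description = None
    for line in lines:
        if line.startswith("!dataset_table_begin"):
            break  # headers end where the data table begins
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "!subset_description":
            description = value
        elif key == "!subset_sample_id" and description is not None:
            for sample in value.split(","):
                sample_class[sample.strip()] = description
    return sample_class


def parse_soft_matrix(lines):
    """Phase 2: read the gene-by-sample table between the begin/end markers."""
    start = lines.index("!dataset_table_begin") + 1
    end = lines.index("!dataset_table_end")
    header = lines[start].split("\t")
    samples = header[2:]  # assumed layout: ID_REF, IDENTIFIER, then one column per sample
    genes, rows = [], []
    for line in lines[start + 1:end]:
        fields = line.split("\t")
        genes.append(fields[0])
        rows.append(fields[2:])
    return samples, genes, rows


def soft_to_arff(text, relation="smoking"):
    """Convert SOFT text to ARFF: one NUMERIC attribute per gene plus a nominal class."""
    lines = text.splitlines()
    sample_class = parse_soft_headers(lines)
    samples, genes, rows = parse_soft_matrix(lines)
    classes = sorted(set(sample_class.values()))
    out = [f"@RELATION {relation}", ""]
    out += [f"@ATTRIBUTE '{gene}' NUMERIC" for gene in genes]
    out.append("@ATTRIBUTE class {" + ",".join(f"'{c}'" for c in classes) + "}")
    out += ["", "@DATA"]
    # ARFF stores one instance per line, so transpose: each sample becomes a row.
    for j, sample in enumerate(samples):
        values = ["?" if row[j] in ("", "null") else row[j] for row in rows]
        label = sample_class.get(sample)
        values.append(f"'{label}'" if label is not None else "?")
        out.append(",".join(values))
    return "\n".join(out) + "\n"
```

The transpose in `soft_to_arff` reflects the main structural difference between the two formats. SOFT stores one gene per row, whereas ARFF expects one instance (sample) per row with genes as attributes. Missing SOFT values (`null`) become ARFF's `?`.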

The Data
• 75 samples
• 22215 genes
• 3 classes: smokers, non-smokers, those who quit smoking
• Easy phenotype to verify
• Caveats?

Feature Selection
• Information gain
• Chi-square
• 1, 2, 5, 10, 20, 50, 100, 200, 300, 400, 500 features selected
• Results: the two algorithms selected almost identical features
• Reflects the ‘partitionability’ of the data set
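
The two ranking criteria can be sketched as follows. This is a minimal Python illustration and not necessarily the implementation the project used. It assumes each gene has already been discretized, and `binarize` (a median split) is a hypothetical stand-in for whatever discretization was actually applied. `top_k` keeps the k best-scoring genes, as in the 1 to 500 feature counts listed above.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Information gain H(class) - H(class | feature) for one discretized feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_square(values, labels):
    """Pearson chi-square statistic of the feature-value by class contingency table."""
    n = len(labels)
    joint = Counter(zip(values, labels))
    v_counts, y_counts = Counter(values), Counter(labels)
    stat = 0.0
    for v in v_counts:
        for y in y_counts:
            expected = v_counts[v] * y_counts[y] / n
            stat += (joint[(v, y)] - expected) ** 2 / expected
    return stat


def binarize(column):
    """Hypothetical discretization: split a numeric gene at its (upper) median."""
    m = sorted(column)[len(column) // 2]
    return ["hi" if x >= m else "lo" for x in column]


def top_k(columns, labels, k, score):
    """Indices of the k highest-scoring features; columns holds one value list per gene."""
    ranked = sorted(range(len(columns)), key=lambda j: score(columns[j], labels), reverse=True)
    return ranked[:k]
```

Both scores are maximal for a gene whose values partition the samples cleanly by class and near zero for a gene independent of class. This is consistent with the two methods picking almost the same genes on a data set that partitions well.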

Classification
• ECOC (error-correcting output codes)
• KNN (k-nearest neighbors)
• Paired each of the 2 classification algorithms with each of the 2 feature selection algorithms
• Results:
– KNN ‘out-classifies’ ECOC with fewer features (70% accuracy with 1 feature)
– Highest accuracy as a function of feature selection algorithm
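
A minimal Python sketch of the two classifiers follows:
• KNN takes a majority vote among the k closest training samples.
• ECOC trains one binary learner per column of a class code matrix. It then predicts the class whose codeword is closest, in Hamming distance, to the predicted bits.
The Euclidean distance, the nearest-centroid binary learner, and the example code matrix `CODES_3` are assumptions chosen for brevity. The slides do not say which base learner or code matrix was used.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, x, k=3):
    """KNN: majority vote among the k nearest training samples.

    Counter.most_common keeps first-seen order on ties, so a tied vote goes
    to the class that appears first among the neighbours (the nearest one).
    """
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical 3-column code: each column splits the three classes differently.
CODES_3 = {"smoker": (1, 1, 1), "non-smoker": (0, 0, 1), "former": (0, 1, 0)}


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def ecoc_fit(X, y, codes):
    """ECOC training: one binary learner per column of the code matrix.

    The base learner is a hypothetical nearest-centroid classifier; every
    column must have at least one training sample on each side.
    """
    n_bits = len(next(iter(codes.values())))
    learners = []
    for b in range(n_bits):
        pos = [x for x, c in zip(X, y) if codes[c][b] == 1]
        neg = [x for x, c in zip(X, y) if codes[c][b] == 0]
        learners.append((centroid(pos), centroid(neg)))
    return learners


def ecoc_predict(learners, codes, x):
    """Predict each bit, then return the class with the nearest codeword (Hamming)."""
    bits = [1 if euclidean(x, pos_c) <= euclidean(x, neg_c) else 0 for pos_c, neg_c in learners]
    return min(codes, key=lambda c: sum(b != cb for b, cb in zip(bits, codes[c])))
```

With only three classes, the codewords in `CODES_3` are pairwise at Hamming distance 2. Such a code can detect a single wrong bit but cannot correct it. ECOC's error correction needs longer codes with larger minimum distance.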

Classification
• Accuracy does not increase beyond a maximum level, regardless of the number of features selected
• Suggests an inherent characteristic of the data
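
One way to trace accuracy against the number of selected features is a leave-one-out sweep over the top-n ranked genes. The sketch below assumes a precomputed ranking, for example from information gain or chi-square, and uses 1-NN as the classifier. The evaluation protocol and the names `loo_accuracy` and `accuracy_curve` are hypothetical, since the slides do not say how accuracy was estimated.

```python
def one_nn(train_X, train_y, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN restricted to the given feature indices."""
    proj = [[row[j] for j in features] for row in X]
    hits = 0
    for i in range(len(proj)):
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += one_nn(train_X, train_y, proj[i]) == y[i]
    return hits / len(proj)


def accuracy_curve(X, y, ranking, counts):
    """Accuracy using the top-n ranked features, for each n in counts."""
    return {n: loo_accuracy(X, y, ranking[:n]) for n in counts}
```

If the ranking is computed on all samples before the sweep, each held-out sample has influenced the feature selection, and the estimate is optimistic. Re-ranking inside each fold avoids this bias.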