Microarrays: A Comparison of Classification and Feature Selection Algorithms for Interpretation
Lynn H. Lee, Hiram Shaish, Eric A. Smith, Min C. Zhang

Responsibilities

• Lynn Lee studied and described the classification methods, and performed all the experiments that use KNN as the classification method.

• Hiram Shaish studied and described the background of microarrays, and compiled and analyzed the experimental results.

• Eric Smith programmed, tested, and described the data parser.

• Min Zhang studied and described the feature selection methods, and performed all the experiments that use SVM as the classification method.

• Each team member contributed to the writing and editing process.

The Parser

• Written in Perl
• 100 lines of code, plus 90 lines of comments and blank lines
• 2 phases:
  – Parse the SOFT (GEO Simple Omnibus Format in Text) headers to generate part of the ARFF (Weka Attribute-Relation File Format) header
  – Parse the SOFT matrix, generating the rest of the ARFF header and the ARFF data matrix
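The two-phase conversion can be sketched in Python. The team's actual parser is Perl and is not reproduced here. This is a minimal sketch that rests on several assumptions:

- The input is a GEO DataSet (GDS) SOFT file.
- Class labels come from `!subset_description` / `!subset_sample_id` header lines, and only one subset type is present. A real parser would filter on `!subset_type`.
- The value table sits between `!dataset_table_begin` and `!dataset_table_end`, with columns `ID_REF`, `IDENTIFIER`, then one column per sample.
- Empty or `null` cells mean missing values.

All function names are hypothetical.

```python
def parse_soft_headers(lines):
    """Phase 1: read header lines up to the value table, mapping each sample ID to its class label."""
    labels, description = {}, None
    for line in lines:
        if line.startswith("!dataset_table_begin"):
            break  # the table's column header is the next line
        key, _, value = line.partition(" = ")
        if key == "!subset_description":
            description = value.strip()
        elif key == "!subset_sample_id" and description is not None:
            for sample in value.split(","):
                labels[sample.strip()] = description
    return labels


def parse_soft_matrix(lines):
    """Phase 2: read the gene-by-sample table (assumed rectangular) up to its end marker."""
    header = next(lines).rstrip("\r\n").split("\t")
    samples = header[2:]  # assumed layout: ID_REF, IDENTIFIER, then one column per sample
    genes, columns = [], {s: [] for s in samples}
    for line in lines:
        if line.startswith("!dataset_table_end"):
            break
        fields = line.rstrip("\r\n").split("\t")
        genes.append(fields[0])
        for sample, value in zip(samples, fields[2:]):
            columns[sample].append("?" if value in ("", "null") else value)  # "?" = ARFF missing
    return genes, samples, columns


def soft_to_arff(lines, relation="microarray"):
    """Transpose genes x samples into ARFF: one instance per sample, one numeric attribute per gene."""
    it = iter(lines)  # shared iterator so phase 2 resumes where phase 1 stopped
    labels = parse_soft_headers(it)
    genes, samples, columns = parse_soft_matrix(it)
    classes = sorted(set(labels.values()))
    out = [f"@RELATION {relation}", ""]
    out += [f"@ATTRIBUTE '{g}' NUMERIC" for g in genes]
    out.append("@ATTRIBUTE class {" + ",".join(f"'{c}'" for c in classes) + "}")
    out += ["", "@DATA"]
    for s in samples:
        label = f"'{labels[s]}'" if s in labels else "?"
        out.append(",".join(columns[s] + [label]))
    return "\n".join(out)
```

SOFT stores genes as rows and samples as columns, while ARFF expects one instance per row. The sketch therefore buffers every sample's column before writing. With 75 samples and 22215 genes that is under two million values, small enough to hold in memory.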

The Data

• 75 samples
• 22215 genes
• 3 classes: smokers, non-smokers, and those who quit smoking
• Easy phenotype to verify
• Caveats?

Feature Selection

• Info Gain
• Chi Square
• 1, 2, 5, 10, 20, 50, 100, 200, 300, 400, 500 features selected
• Results: almost identical features selected by both algorithms
• Reflects the ‘partitionability’ of the data set
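Both criteria score each gene independently against the class label, and the top-scoring genes are kept. The Python sketch below assumes a simple median split to discretize each gene's expression values. Tools such as Weka typically apply a supervised discretization instead, so scores will not match the slides' setup exactly. All names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def median_split(values):
    """Assumed discretization: True where a gene's expression is above that gene's median."""
    s = sorted(values)
    mid = len(s) // 2
    median = s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2
    return [v > median for v in values]


def info_gain(bins, labels):
    """H(class) minus the size-weighted class entropy within each bin."""
    n = len(labels)
    remainder = 0.0
    for b in set(bins):
        subset = [y for x, y in zip(bins, labels) if x == b]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def chi_square(bins, labels):
    """Pearson chi-square statistic of the bin-by-class contingency table."""
    n = len(labels)
    observed = Counter(zip(bins, labels))
    bin_counts, class_counts = Counter(bins), Counter(labels)
    stat = 0.0
    for b, nb in bin_counts.items():
        for c, nc in class_counts.items():
            expected = nb * nc / n  # never zero: both margins are positive
            stat += (observed[(b, c)] - expected) ** 2 / expected
    return stat


def top_k(matrix, labels, k, score):
    """matrix[i][j] = expression of gene j in sample i; return the k best-scoring gene indices."""
    ranked = []
    for j in range(len(matrix[0])):
        bins = median_split([row[j] for row in matrix])
        ranked.append((score(bins, labels), j))
    ranked.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties broken by gene index
    return [j for _, j in ranked[:k]]
```

Both scores are computed from the same bin-versus-class contingency table. A gene that splits the classes cleanly therefore ranks highly under either criterion. This is consistent with, though not proof of, the observation above that the two methods select almost the same features.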

Classification

• ECOC (error-correcting output codes)
• KNN (k-nearest neighbors)
• Paired the 2 classification algorithms with the 2 feature selection algorithms
• Results:
  – KNN ‘out-classifies’ ECOC with fewer features (70% accuracy with 1 feature)
  – Highest accuracy as a function of feature selection algorithm
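A compact Python sketch of both classifiers, for illustration only.

KNN uses Euclidean distance and majority voting. The default k is an assumption, since the slides do not state it.

For ECOC:

- The code matrix is the exhaustive code for three classes, which has 2^(3-1) - 1 = 3 columns.
- Each column trains one binary learner.
- A sample gets the class whose codeword is closest, in Hamming distance, to the predicted bits.

The binary learner is a nearest-centroid rule chosen for brevity. It is a stand-in, not necessarily the base learner the authors used (the responsibilities slide mentions SVM). Class names and function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # on a tied vote, the label encountered first (closest) wins


# Exhaustive code for three classes: one row per class, one column per binary subproblem.
CODE = {
    "smoker":        (1, 1, 1),
    "non-smoker":    (0, 0, 1),
    "former-smoker": (0, 1, 0),
}


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def train_ecoc(X, y, code):
    """Fit one binary learner per code column; here each learner is just a pair of centroids."""
    models = []
    for bit in range(len(next(iter(code.values())))):
        pos = [x for x, label in zip(X, y) if code[label][bit] == 1]
        neg = [x for x, label in zip(X, y) if code[label][bit] == 0]
        models.append((centroid(pos), centroid(neg)))
    return models


def ecoc_predict(models, code, x):
    """Predict one bit per column, then return the class whose codeword is nearest in Hamming distance."""
    bits = [1 if math.dist(x, pos) < math.dist(x, neg) else 0 for pos, neg in models]
    return min(code, key=lambda label: sum(b != cb for b, cb in zip(bits, code[label])))
```

With three classes the exhaustive code's rows are all Hamming distance 2 apart. That lets the decoder detect, though not correct, a single wrong bit. Codes for more classes can have longer codewords and correct errors as well.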

Classification

• Accuracy does not increase beyond a ceiling, regardless of the number of features
• Suggests an inherent characteristic of the data
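The slides do not say how accuracy was estimated. The Python sketch below shows one way to build an accuracy-versus-feature-count curve and find where it levels off, assuming leave-one-out cross-validation.

The genes are re-selected inside every fold. Selecting them once on all samples would let the held-out sample influence the choice and inflate accuracy.

`select` is any function returning the indices of the chosen genes, and `classify` is any function predicting one label. `one_nn`, `plateau_start`, and the other names are hypothetical.

```python
import math

# Feature counts taken from the feature-selection slide.
FEATURE_COUNTS = [1, 2, 5, 10, 20, 50, 100, 200, 300, 400, 500]


def one_nn(train_X, train_y, x):
    """Illustrative classifier: label of the single nearest training sample."""
    i = min(range(len(train_X)), key=lambda j: math.dist(train_X[j], x))
    return train_y[i]


def loo_accuracy(X, y, n_features, select, classify):
    """Leave-one-out accuracy, re-running feature selection on every training fold."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        keep = select(train_X, train_y, n_features)  # the held-out sample is never seen here

        def project(row):
            return [row[j] for j in keep]

        prediction = classify([project(r) for r in train_X], train_y, project(X[i]))
        correct += prediction == y[i]
    return correct / len(X)


def accuracy_curve(X, y, select, classify, counts=FEATURE_COUNTS):
    """Map each feature count (up to the number of genes) to its leave-one-out accuracy."""
    return {k: loo_accuracy(X, y, k, select, classify) for k in counts if k <= len(X[0])}


def plateau_start(curve, tol=0.0):
    """Smallest feature count whose accuracy is within tol of the best accuracy on the curve."""
    best = max(curve.values())
    return min(k for k, acc in curve.items() if acc >= best - tol)
```

For example, `plateau_start(accuracy_curve(X, y, select, classify), tol=0.01)` reports the smallest feature count whose accuracy is within one percentage point of the best observed.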