Microarrays: A Comparison of Classification and Feature
Selection Algorithms for Interpretation
Lynn H. Lee, Hiram Shaish, Eric A. Smith, Min C. Zhang
Responsibilities

• Lynn Lee studied and described the classification methods, and performed all the experiments that use KNN as the classification method.
• Hiram Shaish studied and described the background of microarrays, and compiled and analyzed the experimental results.
• Eric Smith programmed, tested, and described the data parser.
• Min Zhang studied and described the feature selection methods, and performed all the experiments that use SVM as the classification method.
• Each team member contributed to the writing and editing process.
The Parser

• Written in Perl
• 100 lines of code, plus 90 lines of comments and blank lines
• 2 phases:
  – Parse the SOFT headers to generate some of the ARFF headers
  – Parse the SOFT matrix, generating the rest of the ARFF headers and the ARFF matrix
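The original parser is in Perl and is not reproduced in the slides; the two-phase idea can be sketched in Python under a simplified, assumed SOFT layout (header lines prefixed with `!`, `^`, or `#`, then a tab-separated table delimited by `!dataset_table_begin`/`!dataset_table_end`). The field positions and the omission of a class attribute are simplifications, not the authors' implementation:

```python
# Minimal sketch of the two-phase SOFT -> ARFF conversion described above.
# Assumes a simplified SOFT layout; the real parser (in Perl) also emits a
# class attribute and more ARFF headers derived from the SOFT headers.

def soft_to_arff(soft_lines, relation="microarray"):
    header, matrix = [], []
    in_matrix = False
    for line in soft_lines:
        line = line.rstrip("\n")
        if line.startswith("!dataset_table_begin"):
            in_matrix = True
        elif line.startswith("!dataset_table_end"):
            in_matrix = False
        elif in_matrix:
            matrix.append(line.split("\t"))
        elif line.startswith(("!", "^", "#")):
            header.append(line)  # phase 1: collect SOFT header lines

    # Phase 2: first table row names the samples; remaining rows are genes.
    samples = matrix[0][2:]                 # skip ID_REF / IDENTIFIER columns
    genes = [row[0] for row in matrix[1:]]
    arff = ["@RELATION " + relation]
    arff += ["@ATTRIBUTE %s NUMERIC" % g for g in genes]
    arff.append("@DATA")
    for i, _ in enumerate(samples):         # one ARFF data row per sample
        values = [row[2 + i] for row in matrix[1:]]
        arff.append(",".join(values))
    return arff
```

Note the transposition: SOFT stores genes as rows and samples as columns, while ARFF wants one row per sample with one attribute per gene.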
The Data

• 75 samples
• 22,215 genes
• 3 classes: smokers, non-smokers, and those who quit smoking
• Easy phenotype to verify
• Caveats?
Feature Selection

• Info Gain
• Chi Square
• 1, 2, 5, 10, 20, 50, 100, 200, 300, 400, and 500 features selected
• Results: almost identical features selected by both algorithms
• Reflects the 'partitionability' of the data set
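The first of the two filter methods above, information gain, can be sketched as ranking each gene by how much knowing its (discretized) expression value reduces class entropy; chi-square ranking works analogously with a different score. This is an illustrative sketch, not the authors' code, and it assumes expression values are already binned:

```python
# Rank genes by information gain: H(class) - H(class | gene).
# Expression values are assumed pre-discretized (e.g. 'hi'/'lo' bins).
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Entropy of the class labels minus their entropy conditioned on one gene."""
    base = entropy(labels)
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += len(subset) / n * entropy(subset)
    return base - cond

def top_k_features(X, y, k):
    """X: one list of discretized values per gene; return indices of the k best genes."""
    ranked = sorted(range(len(X)), key=lambda i: info_gain(X[i], y), reverse=True)
    return ranked[:k]
```

Because both info gain and chi-square score each gene independently against the class labels, strongly class-separating genes dominate both rankings, which is consistent with the near-identical feature sets reported above.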
Classification

• ECOC
• KNN
• Paired the 2 classification algorithms with the 2 feature selection algorithms
• Results:
  – KNN 'out-classifies' ECOC with fewer features (70% accuracy with a single feature)
  – Highest accuracy depends on the feature selection algorithm
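The pairing of a feature selector with KNN can be sketched as follows: classify each sample by the majority label among its k nearest neighbors, measured only over the selected genes. This is a generic KNN sketch (names and distance choice are illustrative), not the authors' implementation, and ECOC is not shown:

```python
# Minimal KNN over the selected genes: each row of train_X holds one sample's
# expression values for the top-ranked features only.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=1):
    """Majority vote among the k training samples nearest to `query`."""
    ranked = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)
```

With a single strongly class-separating gene, a classifier this simple can already do well, which fits the slide's observation that KNN reached 70% accuracy with only one selected feature.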
Classification

• Accuracy does not increase beyond a maximum potential, regardless of the number of features selected
• Suggests an inherent characteristic of the data