T.R. Golub et al., Science 286, 531 (1999)
![Page 1: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/1.jpg)
Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring
T.R. Golub et al., Science 286, 531 (1999)
![Page 2: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/2.jpg)
Introduction
Why is identification of cancer class (tumor subtype) important? Cancers of identical grade can follow widely variable clinical courses (e.g., acute lymphoblastic leukemia (ALL) versus acute myeloid leukemia (AML)).
Traditional methods:
• Morphological appearance
• Enzyme-based histochemical analyses
• Immunophenotyping
• Cytogenetic analysis
![Page 3: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/3.jpg)
Topics of Discussion
Class Prediction (supervised learning).
Class Discovery (unsupervised learning).
![Page 4: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/4.jpg)
Class Prediction
How could one use an initial collection of samples belonging to known classes to create a class predictor?
1. Identify informative genes via neighborhood analysis.
2. Predict the class of a new sample by a weighted vote of those genes.
![Page 5: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/5.jpg)
Neighborhood Analysis
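Why start with informative genes? A predictor built on a small set of informative genes can be readily applied in a clinical setting, and the genes themselves may be highly instructive about the biology that distinguishes the classes.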
![Page 6: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/6.jpg)
Neighborhood Analysis
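1. For each gene g, form its expression vector across the n samples: $v(g) = (e_1, e_2, \ldots, e_n)$.
2. Form the class vector $c = (c_1, c_2, \ldots, c_n)$, where $c_i = 1$ if sample i belongs to class 1 and $c_i = 0$ if it belongs to class 2.
3. Compute the correlation between v(g) and c, using one of:
   • Euclidean distance
   • Pearson correlation coefficient
   • $P(g,c) = \dfrac{\mu_1(g) - \mu_2(g)}{\sigma_1(g) + \sigma_2(g)}$, where $\mu_k(g)$ and $\sigma_k(g)$ are the mean and standard deviation of the expression of g over the samples in class k (a "signal-to-noise" measure; this is the one used by Golub et al.)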
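A minimal sketch of the third measure, P(g,c), in plain Python. It assumes the population standard deviation (the slides do not say which estimator is used), and `rank_informative_genes` is a hypothetical helper that simply keeps the genes with the most positive and most negative P(g,c); it does not reproduce the permutation-based neighborhood test from the paper.

```python
from statistics import mean, pstdev


def signal_to_noise(expr, labels):
    """P(g, c) = (mu1 - mu2) / (sigma1 + sigma2) for one gene g.

    expr   -- expression values e_1..e_n of gene g across n samples, i.e. v(g)
    labels -- class vector c_1..c_n: 1 for class 1, 0 for class 2
    """
    class1 = [e for e, c in zip(expr, labels) if c == 1]
    class2 = [e for e, c in zip(expr, labels) if c == 0]
    # Assumption: population standard deviation; the slide does not specify the estimator.
    return (mean(class1) - mean(class2)) / (pstdev(class1) + pstdev(class2))


def rank_informative_genes(matrix, labels, n_per_class):
    """Hypothetical helper: genes with the largest positive and largest negative P(g, c).

    matrix -- dict mapping gene name -> expression vector v(g)
    """
    scores = {gene: signal_to_noise(v, labels) for gene, v in matrix.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    # High P(g, c): gene is higher in class 1; low (negative) P(g, c): higher in class 2.
    return ordered[:n_per_class], ordered[::-1][:n_per_class]


matrix = {
    "geneA": [5.0, 6.0, 1.0, 2.0],  # high in class 1
    "geneB": [1.0, 1.5, 4.0, 5.0],  # high in class 2
    "geneC": [3.0, 2.0, 3.0, 2.0],  # uninformative
}
labels = [1, 1, 0, 0]
print(signal_to_noise(matrix["geneA"], labels))               # 4.0
print(rank_informative_genes(matrix, labels, n_per_class=1))  # (['geneA'], ['geneB'])
```

Dividing by the sum of the two spreads penalizes genes whose class means differ but which are noisy within each class, which is why geneC scores 0 and geneA scores highly.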
![Page 7: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/7.jpg)
Neighborhood Analysis
![Page 8: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/8.jpg)
Class Predictor via Gene Voting
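1. Parameters $(a_g, b_g)$ are defined for each informative gene g.
2. $a_g = P(g,c)$, which weights the gene's vote by how strongly it correlates with the class distinction.
3. $b_g = [\mu_1(g) + \mu_2(g)]/2$, the midpoint between the two class means.
4. For a new sample with expression level $x_g$, gene g casts the vote $v_g = a_g(x_g - b_g)$; a positive vote favors class 1, a negative vote class 2.
5. $V_1 = \sum_{g:\, v_g > 0} |v_g|$ (total vote for class 1).
6. $V_2 = \sum_{g:\, v_g < 0} |v_g|$ (total vote for class 2).
7. Prediction strength: $PS = (V_{win} - V_{lose})/(V_{win} + V_{lose})$, where $V_{win}$ and $V_{lose}$ are the larger and smaller of $V_1$ and $V_2$.
8. The sample is assigned to the winning class if PS exceeds a prespecified threshold; otherwise it is called uncertain.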
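The eight steps above can be sketched as a small Python predictor. The function names are hypothetical, and the default threshold of 0.3 is an assumption (the slides only say "threshold"); new samples are assumed to be on the same expression scale as the training data.

```python
from statistics import mean, pstdev


def train_voter(train_expr, labels):
    """Compute (a_g, b_g) for each informative gene.

    train_expr -- dict gene -> expression values across the training samples
    labels     -- class vector: 1 for class 1, 0 for class 2
    """
    params = {}
    for gene, values in train_expr.items():
        c1 = [v for v, c in zip(values, labels) if c == 1]
        c2 = [v for v, c in zip(values, labels) if c == 0]
        mu1, mu2 = mean(c1), mean(c2)
        a_g = (mu1 - mu2) / (pstdev(c1) + pstdev(c2))  # a_g = P(g, c)
        b_g = (mu1 + mu2) / 2                          # midpoint of the class means
        params[gene] = (a_g, b_g)
    return params


def predict(params, sample, threshold=0.3):  # threshold value is an assumption
    """Weighted vote on one new sample (dict gene -> x_g); returns (call, PS)."""
    v1 = v2 = 0.0
    for gene, (a_g, b_g) in params.items():
        v_g = a_g * (sample[gene] - b_g)
        if v_g > 0:
            v1 += v_g         # vote for class 1
        elif v_g < 0:
            v2 += abs(v_g)    # vote for class 2
    if v1 + v2 == 0:
        return "uncertain", 0.0
    v_win, v_lose = max(v1, v2), min(v1, v2)
    ps = (v_win - v_lose) / (v_win + v_lose)
    if ps <= threshold:       # only PS > threshold leads to an assignment
        return "uncertain", ps
    return ("class 1" if v1 > v2 else "class 2"), ps


params = train_voter({"geneA": [5.0, 6.0, 1.0, 2.0],
                      "geneB": [1.0, 1.5, 4.0, 5.0]}, [1, 1, 0, 0])
print(predict(params, {"geneA": 5.5, "geneB": 1.0}))  # ('class 1', 1.0)
print(predict(params, {"geneA": 2.0, "geneB": 4.0}))  # ('class 2', 1.0)
```

When all genes agree, the losing vote is zero and PS is 1; a sample whose genes disagree (for example geneA = 4.0, geneB = 4.0 here, PS of about 0.42) is assigned at a threshold of 0.3 but would be called uncertain at 0.5.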
![Page 9: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/9.jpg)
Class Predictor via Gene Voting
![Page 10: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/10.jpg)
Data
Initial samples: 38 bone marrow samples (27 ALL, 11 AML) obtained at the time of diagnosis.
Independent samples: 34 leukemia samples, consisting of 24 bone marrow and 10 peripheral blood samples (20 ALL, 14 AML).
![Page 11: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/11.jpg)
Neighborhood Analysis
![Page 12: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/12.jpg)
Validation of Gene Voting
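Initial samples (tested by cross-validation): 36 of the 38 samples were predicted as either AML or ALL, and two were called uncertain. All 36 predictions agreed with the clinical diagnosis.
Independent samples: 29 of the 34 samples were strongly predicted (PS above the threshold), and all 29 predictions were correct (100% accuracy).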
![Page 13: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/13.jpg)
Validation of Gene Voting
![Page 14: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/14.jpg)
Class Discovery
Can cancer classes be discovered automatically based on gene expression?
1. Cluster tumors by gene expression.
2. Determine whether the putative classes produced are meaningful.
![Page 15: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/15.jpg)
Cluster tumors
Self-organizing map (SOM): a mathematical cluster-analysis method for recognizing and classifying features in complex, multidimensional data (similar to the k-means approach).
• Choose a geometry of “nodes”.
• Map the nodes into k-dimensional space (one dimension per gene), initially at random.
• Iteratively adjust the nodes.
![Page 16: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/16.jpg)
Adjusting the nodes
• Randomly select a data point P.
• Move the nodes in the direction of P: the closest node, N_P, is moved the most, and the other nodes are moved by amounts that depend on their distance from N_P in the initial geometry.
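The node-adjustment loop can be sketched as a minimal one-dimensional SOM in plain Python. Several details are assumptions not given on the slides: the nodes lie on a line (so "distance in the initial geometry" is |i − j| between node indices), the neighborhood influence is Gaussian, the step size and radius shrink linearly, and the nodes start at randomly chosen data points. `train_som` and `assign_clusters` are hypothetical names, and this is not the software used in the paper.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points in k-dimensional space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(points, n_nodes, n_iter=2000, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D SOM; node i sits at position i on a line (the initial geometry)."""
    rng = random.Random(seed)
    # Place the nodes in k-dimensional space at random (here: copies of random data points).
    nodes = [list(p) for p in rng.sample(points, n_nodes)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # step size shrinks over training
        radius = max(radius0 * (1.0 - frac), 0.01)  # neighborhood shrinks too
        p = rng.choice(points)                      # randomly select a data point P
        winner = min(range(n_nodes), key=lambda i: sq_dist(nodes[i], p))  # closest node N_P
        for i, node in enumerate(nodes):
            grid_dist = abs(i - winner)             # distance from N_P in the initial geometry
            influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))  # 1.0 for N_P itself
            for d in range(len(node)):
                node[d] += lr * influence * (p[d] - node[d])  # move the node toward P
    return nodes


def assign_clusters(points, nodes):
    """Assign every point to its nearest node; the node index is the cluster label."""
    return [min(range(len(nodes)), key=lambda i: sq_dist(nodes[i], p)) for p in points]


samples = [(0.0, 0.2), (0.3, 0.0), (0.1, 0.1), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]
nodes = train_som(samples, n_nodes=2)
print(assign_clusters(samples, nodes))  # the first three and last three share a label
```

Because the neighborhood shrinks toward zero, late iterations move only N_P, so the SOM ends up behaving like an online k-means with an ordering imposed by the node geometry.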
![Page 17: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/17.jpg)
SOM
![Page 18: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/18.jpg)
Validation of SOM
Comparison of clusters A1 and A2 with the known classes of the initial dataset: cluster A1 contained 25 samples, 24 of them ALL; cluster A2 contained 13 samples, 10 of them AML.
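Read as cluster compositions (25 samples in A1, 13 in A2), these counts reconcile with the 27 ALL and 11 AML samples of the initial dataset:

$$
\begin{aligned}
A_1 &= 24\ \text{ALL} + 1\ \text{AML} = 25, \qquad A_2 = 10\ \text{AML} + 3\ \text{ALL} = 13,\\
\text{ALL} &= 24 + 3 = 27, \qquad \text{AML} = 1 + 10 = 11, \qquad 25 + 13 = 38.
\end{aligned}
$$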
![Page 19: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/19.jpg)
Validation of SOM
How could one evaluate the putative clusters if the “right” answer were not known? Assumption: class discovery can be tested by class prediction; if the clusters reflect real biological classes, predictors built from them should classify held-out samples accurately. Testing the assumption:
• Construct predictors based on clusters A1 and A2.
• Construct predictors based on random clusters.
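The test of this assumption can be sketched as leave-one-out cross-validation of a weighted-vote predictor trained on cluster labels, compared with the same procedure on random clusters. Several choices here are assumptions for illustration: random clusters are shuffled copies of the real labels (same cluster sizes), each held-out sample is scored +PS if it is voted into its own cluster and −PS otherwise, the scores are averaged, and all genes are used rather than a selected informative subset. Each cluster needs at least two members. The function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def weighted_vote(train, labels, sample):
    """Weighted vote for one sample; returns (predicted label, prediction strength PS).

    train  -- training samples, each a list with one expression value per gene
    labels -- cluster label per training sample: 1 or 0
    sample -- expression values of the sample to classify
    """
    v1 = v0 = 0.0
    for g in range(len(sample)):
        c1 = [s[g] for s, c in zip(train, labels) if c == 1]
        c0 = [s[g] for s, c in zip(train, labels) if c == 0]
        spread = pstdev(c1) + pstdev(c0)
        if spread == 0:
            continue  # skip genes with no within-class variation (simplifying assumption)
        mu1, mu0 = mean(c1), mean(c0)
        v_g = (mu1 - mu0) / spread * (sample[g] - (mu1 + mu0) / 2)  # a_g * (x_g - b_g)
        if v_g > 0:
            v1 += v_g
        else:
            v0 -= v_g  # adds |v_g| to the vote for the other cluster
    total = v1 + v0
    ps = abs(v1 - v0) / total if total else 0.0
    return (1 if v1 > v0 else 0), ps


def loo_score(data, labels):
    """Leave-one-out: +PS if a held-out sample is voted into its own cluster, else -PS."""
    scores = []
    for i in range(len(data)):
        pred, ps = weighted_vote(data[:i] + data[i + 1:], labels[:i] + labels[i + 1:], data[i])
        scores.append(ps if pred == labels[i] else -ps)
    return mean(scores)


def compare_to_random(data, cluster_labels, n_random=20, seed=0):
    """Score the real clusters and n_random shuffled labelings with the same cluster sizes."""
    rng = random.Random(seed)
    observed = loo_score(data, cluster_labels)
    randoms = [loo_score(data, rng.sample(cluster_labels, len(cluster_labels)))
               for _ in range(n_random)]
    return observed, randoms


data = [[5.0, 1.0], [5.5, 1.2], [4.8, 0.9], [1.0, 5.0], [1.2, 5.4], [0.9, 4.9]]
clusters = [1, 1, 1, 0, 0, 0]
observed, randoms = compare_to_random(data, clusters)
print(observed)  # 1.0: every held-out sample is voted back into its own cluster
```

Clusters that reflect real structure should score well above shuffled ones; in this toy example any mixed labeling sends at least one held-out sample to the wrong cluster, so its score falls below 1.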
![Page 20: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/20.jpg)
Validation of SOM
Predictors based on clusters A1 and A2 yielded 34 accurate predictions, one error, and three uncertain calls.
![Page 21: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/21.jpg)
Validation of SOM
![Page 22: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/22.jpg)
Searching for Finer Classes
Use SOM to divide the initial samples into four clusters (denoted B1 to B4):
• B1 corresponds to AML.
• B2 corresponds to T-lineage ALL.
• B3 and B4 correspond to B-lineage ALL.
![Page 23: T.R. Golub et al., Science 286, 531 (1999)](https://reader036.vdocuments.mx/reader036/viewer/2022062301/568132a8550346895d994a86/html5/thumbnails/23.jpg)