Classification project: application using SAS Base programming


Upload: andra

Post on 17-Jul-2016


DESCRIPTION

Project classifying human carcinoid cells. The application uses SAS Base programming; PROC ACECLUS, PROC VARCLUS and PROC FASTCLUS are used. Given a data set of mRNA expression profiles that contain distinct adenocarcinoma subclasses, classify human lung carcinomas.


Project Scope

Given a data set of mRNA expression profiles that contain distinct adenocarcinoma subclasses, classify human lung carcinomas.

Data Set

This data set contains 56 variables measured on 12,625 genes using the Affymetrix GeneChip 95av2 (a platform dedicated to acquiring, analyzing and managing complex genetic information). Of the 56 variables, 20 relate to lung carcinoid (Carcinoid), 13 to the metastasis of colon cancer (Colon), 17 to normal lung function (Normal) and 6 to small cell lung carcinoma (SmallCell).

STEP 1: Data Manipulation

Missing data & Format

The first step is to make sure that the data set is suitable for the analysis. We check whether the format of the data is appropriate and whether there are missing values. The variables were converted to numeric format so that the analysis could be carried out. The data set contained no missing data.

Extreme values

We use PROC UNIVARIATE to search for extreme values among our 56 variables, and the data set does indeed contain extreme values. This behavior is somewhat expected, as we are analyzing cancer cells, which vary in size. Consequently, for the Normal and SmallCell variables we eliminate the outliers with high values, as these cells should not have values further than 3 standard deviations from the mean. We keep the outliers for the Colon and Carcinoid variables. The data were standardized, although, since all variables are measured in the same unit, this would not affect the calculation of distances between clusters. From our initial data set of 12,625 observations we end up with a data set of 11,279 observations.
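The three-standard-deviation rule described above can be sketched as follows. This is a minimal illustration in Python, not the SAS code used in the project; the function name `drop_outlier_rows`, the column names and the toy values are assumptions made for the example.

```python
from statistics import mean, pstdev

def drop_outlier_rows(rows, columns, k=3.0):
    """Keep only rows whose value in every listed column lies within
    k standard deviations of that column's mean."""
    limits = {}
    for col in columns:
        values = [row[col] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        limits[col] = (mu - k * sigma, mu + k * sigma)
    return [
        row for row in rows
        if all(limits[c][0] <= row[c] <= limits[c][1] for c in columns)
    ]

# Toy data (hypothetical): 19 ordinary genes and one extreme one.
genes = [{"Normal1": 10.0 + (i % 3), "Colon1": 50.0 * i} for i in range(19)]
genes.append({"Normal1": 500.0, "Colon1": 5.0})

# Only the Normal column is screened here; the Colon column is left alone,
# mirroring the choice to keep Colon and Carcinoid outliers.
kept = drop_outlier_rows(genes, columns=["Normal1"])
```

Screening only a subset of columns is what lets extreme Colon or Carcinoid values survive while rows with extreme Normal or SmallCell values are dropped.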

Getting an intuition for the data

We are interested in whether the variables are linked to one another. As the results from PROC CORR indicate, the variables are strongly correlated. Our intuition is that variables of the same type (Normal, Colon, Carcinoid or SmallCell) are more correlated with each other than with variables of another type, say Colon with Normal. This is confirmed by the division of the variables into clusters using principal components as the criterion. We set the number of clusters to 4. We obtain a cluster (CLUSTER 1) with the Normal variables and half of the SmallCell variables, a cluster (CLUSTER 3) with the Colon variables, and two clusters with Carcinoid variables, one of which contains the rest of the SmallCell variables (CLUSTER 2) and the other (CLUSTER 4) one Colon variable. The full table can be found in the ANNEX.
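To make the "most correlated variables" idea concrete, here is a small Python sketch that ranks variables by the absolute value of their Pearson correlation with a target variable. It is only an illustration of the computation; the helper names and the toy columns are hypothetical and are not taken from the project data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def best_correlated(target, columns, n_best=8):
    """Rank all columns by absolute correlation with the target column."""
    scores = {name: pearson(columns[target], vals) for name, vals in columns.items()}
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n_best]

# Toy columns (hypothetical values, four observations each).
columns = {
    "Normal1": [1.0, 2.0, 3.0, 4.0],
    "Normal2": [1.1, 2.3, 2.9, 4.2],
    "Colon1": [4.0, 1.0, 3.0, 2.0],
}
top = best_correlated("Normal1", columns, n_best=2)
```

As in the correlation listing in the ANNEX, the target variable itself comes first with a coefficient of 1, followed by the variables most strongly related to it.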

STEP 2: Choosing the right number of clusters

In finding the right clusters, three approaches in manipulating our initial variables have been used:

1. Computing 56 canonical variables – PROC ACECLUS
2. Reducing the number of variables to 10 – PROC VARCLUS
3. Standardizing our initial 56 variables – PROC STANDARD

However, when comparing the Cubic Clustering Criterion across the three, only the last approach was found appropriate for continuing our analysis.

As we do not know the number of clusters in advance, we apply automatic clustering methods to determine it. The data set is too vast to apply the SAS procedure CLUSTER directly, so we first apply FASTCLUS to find a set of initial clusters, which are then used as input for PROC CLUSTER.

We set the maximum number of clusters for the FASTCLUS procedure to 53, the square root of our total number of observations (11,279) divided by 2. As there were clusters with few observations, those with fewer than 9 were deleted and the rest became seeds for a second FASTCLUS run. We use the output containing these clusters as input for the CLUSTER procedure.
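The two rules in this paragraph can be sketched in Python. The reading of the formula as sqrt(n) / 2 is an assumption, chosen because it is the one that yields 53 for 11,279 observations; the function names and the toy assignment list are hypothetical.

```python
import math
from collections import Counter

def max_clusters(n_obs):
    """Square root of the number of observations, divided by 2, rounded down."""
    return math.floor(math.sqrt(n_obs) / 2)

def seeds_to_keep(assignments, min_size=9):
    """Return the ids of clusters that have at least min_size members."""
    sizes = Counter(assignments)
    return sorted(c for c, size in sizes.items() if size >= min_size)

k = max_clusters(11279)

# Toy cluster assignments: cluster 2 has only 8 members and is dropped.
toy_assignments = [1] * 9 + [2] * 8 + [3] * 20
kept_clusters = seeds_to_keep(toy_assignments)
```

The surviving clusters would then supply the seeds for the second pass, which is what keeps very small groups from distorting the hierarchical step.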

The criterion used in our clustering is the Ward distance. Each fusion of two classes results in a loss of interclass inertia, so at each step the method merges the two classes whose fusion loses the least interclass inertia. The distance is calculated as the square of the distance between the two barycenters divided by the sum of the inverses of the numbers of individuals in the two classes.
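As a hedged illustration, the Ward distance between two classes can be computed from their barycenters and sizes. The sketch below assumes the usual form of the criterion, the squared Euclidean distance between barycenters divided by (1/n_A + 1/n_B); the function name and the sample values are hypothetical.

```python
def ward_distance(center_a, n_a, center_b, n_b):
    """Squared distance between two barycenters, divided by (1/n_a + 1/n_b)."""
    squared = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    return squared / (1.0 / n_a + 1.0 / n_b)

# Two classes of 2 individuals whose barycenters are 5 units apart.
d_small = ward_distance([0.0, 0.0], 2, [3.0, 4.0], 2)

# The same barycenters with larger classes give a larger distance,
# so merging big classes is penalized more than merging small ones.
d_large = ward_distance([0.0, 0.0], 20, [3.0, 4.0], 20)
```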


1. Computing 56 canonical variables – PROC ACECLUS

In order to compute canonical variables for the subsequent cluster analysis, we obtain approximate estimates of the pooled within-cluster covariance matrix using PROC ACECLUS. As our data set contains a large number of observations, we chose 0.1 as our within-cluster covariance coefficient. Data with poorly separated or elongated patterns need to be transformed, as do variables with different units of measurement or variances of different sizes. In our case only the former applies, as all the variables use the same unit of measurement. For clustering, it is advisable to have spherical clusters rather than elongated elliptical ones.

We can apply this technique directly to the data without prior clustering, as it requires no prior knowledge of cluster membership or of the number of clusters.

However, the clustering criteria below fail to validate the data as appropriate for our analysis. The negative value of the CCC indicates a strong presence of outliers in the data set, which makes it difficult to find an appropriate number of clusters.

Consequently, we will try to reduce the number of variables used.

FIG1. Clustering Criteria for the clusters obtained on canonical variables


2. Reducing the number of variables to 10 – PROC VARCLUS

The correlation between our variables is high, as the results from PROC CORR show.

FIG2. PROC CORR (best 8): correlation between variables

Consequently, we can reduce the number of variables. To reduce our 56 variables to a smaller number we use PROC VARCLUS. This procedure divides the variables into a number of clusters, from which we select a few variables that are most representative of the corresponding cluster and use them in our analysis. The procedure is closely related to principal component analysis: it finds groups of variables that are as correlated as possible among themselves and as uncorrelated as possible with variables in other clusters.

For our analysis, rather than fixing the number of clusters, we set the threshold for identifying additional dimensions within a cluster to 0.8. From each cluster we chose one variable with the lowest 1-R**2 ratio, as it contributes the most to its cluster.

As a consequence we chose: Normal7 Carcinoid4 Carcinoid19 Carcinoid18 Carcinoid6 Colon5 Colon12 Colon3 SmallCell3 SmallCell4 Colon10.

Table 1 Classification of variables in 10 distinct clusters

10 clusters

Cluster     Variable     R**2 own cluster   R**2 next closest   1-R**2 ratio   Variable label
Cluster 1   Normal1      0.5776             0.4322              0.7440         Normal1
            Normal2      0.7216             0.4075              0.4699         Normal2
            Normal3      0.7099             0.3027              0.4161         Normal3
            Normal4      0.6246             0.3060              0.5409         Normal4
            Normal5      0.8154             0.3919              0.3036         Normal5
            Normal6      0.7852             0.4063              0.3618         Normal6
            Normal7      0.7112             0.4114              0.4907         Normal7
            Normal8      0.7808             0.4384              0.3904         Normal8
            Normal9      0.7855             0.3485              0.3292         Normal9
            Normal10     0.7179             0.4627              0.5250         Normal10
            Normal11     0.6125             0.2770              0.5359         Normal11
            Normal13     0.5408             0.3959              0.7603         Normal13
            Normal14     0.6168             0.3040              0.5506         Normal14
            Normal15     0.5465             0.3721              0.7222         Normal15
            Normal16     0.6182             0.2556              0.5129         Normal16
            Normal17     0.6962             0.3395              0.4600         Normal17
Cluster 2   Carcinoid2   0.7208             0.4011              0.4663         Carcinoid2
            Carcinoid3   0.5375             0.1325              0.5332         Carcinoid3
            Carcinoid4   0.7874             0.2760              0.2937         Carcinoid4
            Carcinoid9   0.7170             0.3094              0.4097         Carcinoid9
            Carcinoid14  0.6630             0.1973              0.4199         Carcinoid14
            Carcinoid15  0.7679             0.3107              0.3368         Carcinoid15
            Carcinoid16  0.6685             0.1959              0.4122         Carcinoid16
            Carcinoid17  0.7478             0.4225              0.4367         Carcinoid17
Cluster 3   Colon2       0.5300             0.1949              0.5837         Colon2
            Colon4       0.6529             0.3772              0.5573         Colon4
            Colon5       0.6688             0.1567              0.3928         Colon5
            Colon7       0.5610             0.2561              0.5901         Colon7
            Colon8       0.5945             0.2326              0.5283         Colon8
            Colon9       0.4309             0.1879              0.7008         Colon9
            Colon10      0.5334             0.1981              0.5818         Colon10
            Colon11      0.5477             0.2664              0.6166         Colon11
Cluster 4   Carcinoid1   0.7041             0.3215              0.4360         Carcinoid1
            Carcinoid5   0.3161             0.1972              0.8519         Carcinoid5
            Carcinoid7   0.6917             0.3549              0.4778         Carcinoid7
            Carcinoid10  0.3658             0.1754              0.7691         Carcinoid10
            Carcinoid12  0.7055             0.3482              0.4519         Carcinoid12
            Carcinoid13  0.7094             0.3875              0.4745         Carcinoid13
            Carcinoid19  0.7206             0.3585              0.4355         Carcinoid19
            Carcinoid20  0.7007             0.4087              0.5062         Carcinoid20
            Normal12     0.4108             0.3121              0.8565         Normal12
Cluster 5   Carcinoid6   0.8378             0.2576              0.2185         Carcinoid6
            Carcinoid8   0.8310             0.3032              0.2425         Carcinoid8
            Carcinoid11  0.7852             0.2763              0.2968         Carcinoid11
Cluster 6   SmallCell2   0.5349             0.1577              0.5522         SmallCell2
            SmallCell3   0.7381             0.1753              0.3176         SmallCell3
            SmallCell5   0.7062             0.2283              0.3807         SmallCell5
            SmallCell6   0.5383             0.2274              0.5976         SmallCell6
Cluster 7   Colon6       0.6001             0.1719              0.4829         Colon6
            Colon12      0.7246             0.2947              0.3904         Colon12
            Colon13      0.5620             0.2061              0.5517         Colon13
Cluster 8   Carcinoid18  0.6827             0.1944              0.3939         Carcinoid18
            Colon1       0.6827             0.2248              0.4093         Colon1
Cluster 9   SmallCell1   0.6772             0.1897              0.3983         SmallCell1
            SmallCell4   0.6772             0.1536              0.3814         SmallCell4
Cluster 10  Colon3       1.0000             0.0970              0.0000         Colon3

FIG 3 Variable classification result
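The 1-R**2 ratio used to pick a representative variable can be reproduced from the two R-squared columns of Table 1 as (1 - R**2 with own cluster) / (1 - R**2 with next closest cluster). The Python sketch below illustrates this with the three rows of Cluster 5 from Table 1; the function names are hypothetical.

```python
def one_minus_r2_ratio(r2_own, r2_nearest):
    """(1 - R**2 with own cluster) / (1 - R**2 with the next closest cluster)."""
    return (1.0 - r2_own) / (1.0 - r2_nearest)

def most_representative(cluster):
    """Return the variable with the lowest 1-R**2 ratio in a cluster."""
    return min(cluster, key=lambda name: one_minus_r2_ratio(*cluster[name]))

# Cluster 5 of Table 1: variable -> (R**2 own cluster, R**2 next closest).
cluster5 = {
    "Carcinoid6": (0.8378, 0.2576),
    "Carcinoid8": (0.8310, 0.3032),
    "Carcinoid11": (0.7852, 0.2763),
}
ratio_c6 = one_minus_r2_ratio(*cluster5["Carcinoid6"])
chosen = most_representative(cluster5)
```

A low ratio means the variable is well explained by its own cluster and poorly explained by the nearest other cluster, which is why it is a natural representative.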


When we ran our clustering procedure we obtained an improvement in the criteria, but still not good enough for further analysis. The CCC indicates a smaller presence of outliers and thus a better chance of obtaining a satisfactory clustering. However, in the area where the CCC value allows for clustering, the pseudo t-squared statistic indicates 15 as a good number of clusters, as it is the number at which a surge is followed by a drop. This number is rather large for our data of 10 variables and consequently difficult to interpret properly. We will continue our analysis on all the variables, after standardization.


FIG4. Clustering Criteria for the clusters obtained on 10 variables

3. Standardized variables – PROC STANDARD


Our third attempt consists of running PROC CLUSTER on standardized data with no outliers for the Normal and SmallCell variables.
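Standardization here means rescaling each variable to mean 0 and standard deviation 1. The sketch below shows that transformation in Python for a single variable; it is an illustration only, it assumes the sample standard deviation (divisor n - 1), and the function name and values are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Rescale a variable to mean 0 and sample standard deviation 1."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

z = standardize([2.0, 4.0, 6.0])
```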

We are looking for a Cubic Clustering Criterion (CCC) greater than 0, as well as local maxima of the pseudo F and pseudo t-squared statistics. As we do not observe a local spike in the pseudo F plot, we use the pseudo t-squared statistic as the criterion. There are several local spikes, but we consider only those at 11 clusters or more, as for the others the CCC is negative, indicating the presence of outliers. We choose 12 clusters, which is equal to K+1, K being the number of clusters at which the pseudo t-squared statistic reaches a local maximum.
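The selection rule in this paragraph (a pseudo t-squared spike at K clusters with a positive CCC leads to K+1 clusters) can be sketched in Python. The statistic values below are invented for illustration and are not the project's output; the function name is hypothetical, and returning the smallest qualifying K is an assumption of the sketch.

```python
def choose_n_clusters(ccc, pseudo_t2):
    """ccc and pseudo_t2 map a number of clusters to the statistic's value.
    Return K + 1 for the smallest K where pseudo t-squared is a local
    maximum and the CCC is positive, or None if no such K exists."""
    for k in sorted(pseudo_t2):
        prev, nxt = pseudo_t2.get(k - 1), pseudo_t2.get(k + 1)
        if prev is None or nxt is None:
            continue  # a local maximum needs a neighbour on both sides
        is_peak = pseudo_t2[k] > prev and pseudo_t2[k] > nxt
        if is_peak and ccc.get(k, -1.0) > 0:
            return k + 1
    return None

# Invented values: a spike at 9 clusters (CCC negative) and one at 11 (CCC positive).
toy_ccc = {8: -3.1, 9: -2.0, 10: -0.5, 11: 0.8, 12: 1.5, 13: 1.9}
toy_t2 = {8: 40.0, 9: 95.0, 10: 60.0, 11: 120.0, 12: 55.0, 13: 50.0}
n_clusters = choose_n_clusters(toy_ccc, toy_t2)
```

With these toy values the spike at 9 clusters is skipped because its CCC is negative, and the spike at 11 leads to 12 clusters.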

FIG5. Clustering Criteria for the clusters obtained on standardized variables

The resulting classification is shown in the table below. The results are robust, as there is no class with few observations.

FIG 6 – Final clusters


FIG 7 Dendrogram obtained from the CLUSTER procedure

We want to study the characteristics of each cluster. To do so, we look at the classification obtained by the VARCLUS procedure and create 4 variables, which we use to highlight the differences between the clusters we obtained.


FIG 8 Characteristics of clusters found

Interpretation of cluster values

The clusters that include most of our observations are clusters 3, 4 and 11. Clusters 3 and 11 are distinguishable as they contain values closer to 0. We can interpret these observations as being less prone to a medical problem. Cluster 1 contains the fewest observations; however, all its variables display high values, indicating a set of individuals with a medical condition that is worse than that of the rest of the observations. Individuals from cluster 5 also exhibit a salient pattern, as the value for the Colon variable is greater than the rest of the Colon values. Carcinoid1 and Carcinoid2 also have strikingly low negative values. Within each cluster, Carcinoid1 and Carcinoid2 display somewhat similar values.



ANNEX

Results of PROC VARCLUS with the number of clusters set to 4

4 clusters

Cluster     Variable     R**2 own cluster   R**2 next closest   1-R**2 ratio   Variable label
Cluster 1   Normal1      0.5759             0.4497              0.7707         Normal1
            Normal2      0.7123             0.4297              0.5045         Normal2
            Normal3      0.7042             0.3347              0.4446         Normal3
            Normal4      0.6145             0.3086              0.5575         Normal4
            Normal5      0.8057             0.3996              0.3236         Normal5
            Normal6      0.7870             0.4343              0.3765         Normal6
            Normal7      0.7050             0.4419              0.5286         Normal7
            Normal8      0.7764             0.4810              0.4308         Normal8
            Normal9      0.7807             0.3743              0.3505         Normal9
            Normal10     0.7144             0.4917              0.5620         Normal10
            Normal11     0.6122             0.2759              0.5355         Normal11
            Normal13     0.5389             0.4305              0.8098         Normal13
            Normal14     0.6134             0.3293              0.5764         Normal14
            Normal15     0.5398             0.3839              0.7469         Normal15
            Normal16     0.6145             0.2525              0.5157         Normal16
            Normal17     0.6947             0.3544              0.4729         Normal17
            SmallCell2   0.1319             0.0173              0.8835         SmallCell2
            SmallCell3   0.1727             0.0407              0.8624         SmallCell3
            SmallCell6   0.2829             0.1281              0.8225         SmallCell6
Cluster 2   Carcinoid2   0.7112             0.4379              0.5138         Carcinoid2
            Carcinoid3   0.5221             0.1459              0.5595         Carcinoid3
            Carcinoid4   0.7768             0.2583              0.3009         Carcinoid4
            Carcinoid9   0.6935             0.3199              0.4506         Carcinoid9
            Carcinoid14  0.6603             0.1824              0.4155         Carcinoid14
            Carcinoid15  0.7405             0.3248              0.3843         Carcinoid15
            Carcinoid16  0.6643             0.2120              0.4260         Carcinoid16
            Carcinoid17  0.7396             0.4638              0.4857         Carcinoid17
            SmallCell1   0.2858             0.1533              0.8435         SmallCell1
            SmallCell4   0.0617             0.0057              0.9437         SmallCell4
            SmallCell5   0.1136             0.0698              0.9529         SmallCell5
Cluster 3   Colon2       0.4956             0.0987              0.5597         Colon2
            Colon3       0.1472             0.0642              0.9113         Colon3
            Colon4       0.6775             0.0912              0.3549         Colon4
            Colon5       0.5922             0.1323              0.4700         Colon5
            Colon6       0.2904             0.0362              0.7363         Colon6
            Colon7       0.5091             0.2763              0.6783         Colon7
            Colon8       0.5530             0.2569              0.6015         Colon8
            Colon9       0.4197             0.1813              0.7087         Colon9
            Colon10      0.4903             0.1314              0.5868         Colon10
            Colon11      0.5567             0.1919              0.5486         Colon11
            Colon12      0.4449             0.0553              0.5876         Colon12
            Colon13      0.3369             0.0934              0.7315         Colon13
Cluster 4   Carcinoid1   0.6485             0.3165              0.5142         Carcinoid1
            Carcinoid5   0.3241             0.1925              0.8369         Carcinoid5
            Carcinoid6   0.4246             0.2670              0.7850         Carcinoid6
            Carcinoid7   0.6180             0.3510              0.5887         Carcinoid7
            Carcinoid8   0.4619             0.2834              0.7509         Carcinoid8
            Carcinoid10  0.3452             0.1753              0.7940         Carcinoid10
            Carcinoid11  0.3911             0.2645              0.8279         Carcinoid11
            Carcinoid12  0.6441             0.3407              0.5398         Carcinoid12
            Carcinoid13  0.6721             0.3800              0.5288         Carcinoid13
            Carcinoid18  0.2518             0.1611              0.8919         Carcinoid18
            Carcinoid19  0.6462             0.3467              0.5415         Carcinoid19
            Carcinoid20  0.6727             0.3954              0.5414         Carcinoid20
            Colon1       0.2930             0.2116              0.8967         Colon1
            Normal12     0.4199             0.3089              0.8395         Normal12

Best 8 correlations between the 56 variables

Pearson correlation coefficients, N = 12625

Each line gives a variable followed by its 8 strongest correlations (the variable itself first), in decreasing order of absolute value.

Carcinoid1: Carcinoid1 1.00000; Carcinoid7 0.76043; Carcinoid12 0.66852; Carcinoid13 0.65419; Carcinoid19 0.64338; Carcinoid20 0.61351; Normal1 -0.55903; Normal10 -0.51519
Carcinoid2: Carcinoid2 1.00000; Carcinoid17 0.82788; Carcinoid9 0.70795; Carcinoid4 0.70294; Carcinoid15 0.69357; Carcinoid14 0.60661; Carcinoid16 0.60145; Carcinoid20 0.59006
Carcinoid3: Carcinoid3 1.00000; Carcinoid16 0.66630; Carcinoid4 0.64394; Carcinoid15 0.55431; Carcinoid14 0.54885; Carcinoid17 0.52568; Carcinoid2 0.51971; Carcinoid9 0.51357
Carcinoid4: Carcinoid4 1.00000; Carcinoid15 0.73755; Carcinoid14 0.73427; Carcinoid9 0.70689; Carcinoid2 0.70294; Carcinoid16 0.70230; Carcinoid17 0.69915; Carcinoid3 0.64394
Carcinoid5: Carcinoid5 1.00000; Carcinoid2 0.45567; Carcinoid6 0.43501; Carcinoid7 0.43160; Carcinoid17 0.43127; Carcinoid1 0.42381; Carcinoid19 0.42197; Carcinoid15 0.41406
Carcinoid6: Carcinoid6 1.00000; Carcinoid8 0.76986; Carcinoid11 0.70985; Carcinoid17 0.49237; Normal1 -0.48927; Carcinoid2 0.48818; Normal10 -0.47197; Carcinoid9 0.46320
Carcinoid7: Carcinoid7 1.00000; Carcinoid1 0.76043; Carcinoid19 0.66637; Carcinoid13 0.65993; Carcinoid12 0.64727; Carcinoid20 0.61242; Normal10 -0.55620; Normal1 -0.54062
Carcinoid8: Carcinoid8 1.00000; Carcinoid6 0.76986; Carcinoid11 0.70054; Carcinoid17 0.55688; Carcinoid2 0.54662; Normal1 -0.52602; Normal8 -0.51673; Normal10 -0.51604
Carcinoid9: Carcinoid9 1.00000; Carcinoid15 0.77872; Carcinoid2 0.70795; Carcinoid4 0.70689; Carcinoid17 0.68714; Carcinoid14 0.64445; Carcinoid16 0.60833; Carcinoid3 0.51357
Carcinoid10: Carcinoid10 1.00000; Carcinoid1 0.46498; Carcinoid13 0.43394; Carcinoid19 0.42922; Normal8 -0.42764; Carcinoid7 0.42744; Carcinoid12 0.42491; Carcinoid20 0.42129
Carcinoid11: Carcinoid11 1.00000; Carcinoid6 0.70985; Carcinoid8 0.70054; Normal8 -0.52161; Normal10 -0.51058; Normal2 -0.47851; Normal7 -0.47280; Normal13 -0.46092
Carcinoid12: Carcinoid12 1.00000; Carcinoid13 0.71217; Carcinoid19 0.68722; Carcinoid1 0.66852; Carcinoid20 0.66621; Carcinoid7 0.64727; Normal10 -0.54492; Normal8 -0.53489
Carcinoid13: Carcinoid13 1.00000; Carcinoid20 0.72258; Carcinoid12 0.71217; Carcinoid19 0.67641; Carcinoid7 0.65993; Carcinoid1 0.65419; Normal8 -0.58342; Normal7 -0.58017
Carcinoid14: Carcinoid14 1.00000; Carcinoid4 0.73427; Carcinoid15 0.68809; Carcinoid9 0.64445; Carcinoid17 0.63565; Carcinoid2 0.60661; Carcinoid16 0.59469; Carcinoid3 0.54885
Carcinoid15: Carcinoid15 1.00000; Carcinoid9 0.77872; Carcinoid4 0.73755; Carcinoid17 0.72976; Carcinoid2 0.69357; Carcinoid14 0.68809; Carcinoid16 0.65930; Carcinoid3 0.55431
Carcinoid16: Carcinoid16 1.00000; Carcinoid4 0.70230; Carcinoid3 0.66630; Carcinoid15 0.65930; Carcinoid17 0.65897; Carcinoid9 0.60833; Carcinoid2 0.60145; Carcinoid14 0.59469
Carcinoid17: Carcinoid17 1.00000; Carcinoid2 0.82788; Carcinoid15 0.72976; Carcinoid4 0.69915; Carcinoid9 0.68714; Carcinoid16 0.65897; Carcinoid14 0.63565; Carcinoid20 0.62751
Carcinoid18: Carcinoid18 1.00000; Carcinoid12 0.40774; Carcinoid20 0.39237; Carcinoid13 0.38769; Normal6 -0.38244; Normal8 -0.37752; Colon1 -0.36538; Normal3 -0.36512
Carcinoid19: Carcinoid19 1.00000; Carcinoid20 0.75603; Carcinoid12 0.68722; Carcinoid13 0.67641; Carcinoid7 0.66637; Carcinoid1 0.64338; Carcinoid17 0.57726; Carcinoid2 0.56875
Carcinoid20: Carcinoid20 1.00000; Carcinoid19 0.75603; Carcinoid13 0.72258; Carcinoid12 0.66621; Normal10 -0.62818; Carcinoid17 0.62751; Carcinoid1 0.61351; Carcinoid7 0.61242
Colon1: Colon1 1.00000; Colon10 0.52391; Colon8 0.47107; Carcinoid20 -0.46948; Carcinoid13 -0.46442; Colon7 0.46279; Carcinoid12 -0.42695; Colon5 0.39584
Colon2: Colon2 1.00000; Colon5 0.53715; Colon8 0.50553; Colon11 0.50005; Colon4 0.49674; Colon10 0.48671; Colon9 0.45048; Colon7 0.41451
Colon3: Colon3 1.00000; Colon7 0.34298; Colon13 0.30004; Carcinoid15 -0.29628; Colon11 0.27278; Colon9 0.25388; Colon10 0.24795; Carcinoid17 -0.24515
Colon4: Colon4 1.00000; Colon5 0.76511; Colon11 0.57350; Colon12 0.55389; Colon8 0.54034; Colon10 0.50760; Colon2 0.49674; Colon7 0.48757
Colon5: Colon5 1.00000; Colon4 0.76511; Colon10 0.61925; Colon7 0.53726; Colon2 0.53715; Colon8 0.51923; Colon11 0.47195; Colon9 0.40288
Colon6: Colon6 1.00000; Colon12 0.51281; Colon4 0.41797; Colon11 0.39771; Colon9 0.34628; Colon13 0.33400; Colon2 0.32768; Colon8 0.31242
Colon7: Colon7 1.00000; Colon8 0.55466; Colon5 0.53726; Colon11 0.53362; Colon10 0.49063; Colon4 0.48757; Colon9 0.48755; Colon1 0.46279
Colon8: Colon8 1.00000; Colon7 0.55466; Colon4 0.54034; Colon10 0.53437; Colon11 0.52166; Colon5 0.51923; Colon2 0.50553; Colon12 0.47912
Colon9: Colon9 1.00000; Colon7 0.48755; Colon11 0.46316; Carcinoid7 -0.45329; Colon2 0.45048; Colon8 0.44571; Colon4 0.42571; Carcinoid15 -0.41483
Colon10: Colon10 1.00000; Colon5 0.61925; Colon8 0.53437; Colon1 0.52391; Colon4 0.50760; Colon7 0.49063; Colon2 0.48671; Colon13 0.41241
Colon11: Colon11 1.00000; Colon4 0.57350; Colon7 0.53362; Colon8 0.52166; Colon2 0.50005; Colon12 0.47237; Colon5 0.47195; Colon9 0.46316
Colon12: Colon12 1.00000; Colon4 0.55389; Colon6 0.51281; Colon8 0.47912; Colon13 0.47698; Colon11 0.47237; Colon2 0.39536; Colon5 0.37724
Colon13: Colon13 1.00000; Colon4 0.48486; Colon12 0.47698; Colon10 0.41241; Colon11 0.35161; Colon6 0.33400; Colon2 0.32347; Normal14 -0.32289
Normal1: Normal1 1.00000; Normal6 0.71580; Normal8 0.68327; Normal13 0.65642; Normal9 0.65513; Normal10 0.64109; Normal5 0.63789; Normal15 0.60916
Normal2: Normal2 1.00000; Normal10 0.81377; Normal5 0.76149; Normal7 0.75402; Normal17 0.73383; Normal8 0.73289; Normal9 0.72105; Normal3 0.68483
Normal3: Normal3 1.00000; Normal9 0.78681; Normal7 0.77417; Normal5 0.76386; Normal6 0.76234; Normal8 0.75050; Normal16 0.68702; Normal14 0.68599
Normal4: Normal4 1.00000; Normal5 0.74139; Normal8 0.69502; Normal6 0.68872; Normal3 0.64913; Normal2 0.64644; Normal16 0.64563; Normal17 0.63615
Normal5: Normal5 1.00000; Normal9 0.79233; Normal17 0.77752; Normal3 0.76386; Normal2 0.76149; Normal6 0.75498; Normal8 0.75131; Normal4 0.74139
Normal6: Normal6 1.00000; Normal9 0.77519; Normal8 0.77407; Normal3 0.76234; Normal5 0.75498; Normal7 0.73208; Normal1 0.71580; Normal10 0.71054
Normal7: Normal7 1.00000; Normal8 0.78269; Normal10 0.78147; Normal3 0.77417; Normal9 0.76339; Normal2 0.75402; Normal5 0.73753; Normal6 0.73208
Normal8: Normal8 1.00000; Normal9 0.82385; Normal7 0.78269; Normal6 0.77407; Normal5 0.75131; Normal3 0.75050; Normal10 0.74076; Normal2 0.73289
Normal9: Normal9 1.00000; Normal8 0.82385; Normal5 0.79233; Normal3 0.78681; Normal6 0.77519; Normal7 0.76339; Normal17 0.76014; Normal16 0.72997
Normal10: Normal10 1.00000; Normal2 0.81377; Normal7 0.78147; Normal8 0.74076; Normal5 0.71437; Normal6 0.71054; Normal17 0.70908; Normal9 0.70497
Normal11: Normal11 1.00000; Normal5 0.70896; Normal17 0.70307; Normal16 0.69957; Normal6 0.67592; Normal9 0.67177; Normal10 0.66273; Normal2 0.63507
Normal12: Normal12 1.00000; Normal1 0.60477; Normal10 0.57498; Normal13 0.54883; Normal8 0.51963; Carcinoid20 -0.51910; Normal6 0.51896; Carcinoid17 -0.51502
Normal13: Normal13 1.00000; Normal10 0.68100; Normal6 0.68037; Normal1 0.65642; Normal2 0.63214; Normal5 0.62261; Normal8 0.60417; Normal7 0.58099
Normal14: Normal14 1.00000; Normal8 0.72169; Normal3 0.68599; Normal9 0.67728; Normal5 0.67636; Normal7 0.67364; Normal2 0.67173; Normal6 0.64870
Normal15: Normal15 1.00000; Normal5 0.69502; Normal6 0.65994; Normal4 0.61474; Normal1 0.60916; Normal8 0.59538; Normal16 0.59358; Normal3 0.59067
Normal16: Normal16 1.00000; Normal5 0.73010; Normal9 0.72997; Normal6 0.70484; Normal11 0.69957; Normal3 0.68702; Normal4 0.64563; Normal17 0.61784
Normal17: Normal17 1.00000; Normal5 0.77752; Normal9 0.76014; Normal2 0.73383; Normal8 0.72886; Normal10 0.70908; Normal11 0.70307; Normal6 0.68954
SmallCell1: SmallCell1 1.00000; SmallCell5 0.43470; Carcinoid4 -0.41770; Carcinoid17 -0.40781; Carcinoid2 -0.39123; Carcinoid16 -0.38007; Carcinoid14 -0.37400; SmallCell3 0.37240
SmallCell2: SmallCell2 1.00000; SmallCell5 0.52080; SmallCell3 0.48107; SmallCell4 0.41688; Normal17 -0.35743; SmallCell6 0.35288; Normal11 -0.31516; Normal16 -0.29002
SmallCell3: SmallCell3 1.00000; SmallCell5 0.65225; SmallCell6 0.55030; SmallCell2 0.48107; SmallCell1 0.37240; Normal6 -0.35422; Normal9 -0.33901; Normal8 -0.32282
SmallCell4: SmallCell4 1.00000; SmallCell2 0.41688; SmallCell1 0.35445; SmallCell5 0.35163; SmallCell3 0.31675; Colon13 -0.23498; Carcinoid2 -0.20339; Carcinoid17 -0.19663
SmallCell5: SmallCell5 1.00000; SmallCell3 0.65225; SmallCell2 0.52080; SmallCell6 0.45510; SmallCell1 0.43470; SmallCell4 0.35163; Carcinoid3 -0.29408; Carcinoid18 0.28815
SmallCell6: SmallCell6 1.00000; SmallCell3 0.55030; Normal6 -0.46816; Normal8 -0.46210; SmallCell5 0.45510; Normal10 -0.44328; Normal7 -0.44203; Normal9 -0.42314