laboratory diagnosis of anemia and related diseases using multivariate analysis

10
American Journal of Hematology 54:108–117 (1997) Laboratory Diagnosis of Anemia and Related Diseases Using Multivariate Analysis Shuichi Shiga, 1 * Toshihito Furukawa, 3 Itsuko Koyanagi, 1 Morihisa Yamagishi, 2 Yataro Yoshida, 2 Takayuki Takahashi, 4 Reiji Kannagi, 5 and Toru Mori 1 1 Central Clinical Laboratory, Kyoto University School of Medicine, Kyoto, Japan 2 First Division of Internal Medicine, Kyoto University School of Medicine, Kyoto, Japan 3 Department of Scientific Development Headquarters, SRL, Inc., Hachiogi, Tokyo, Japan 4 Department of Medicine, Kobe General Hospital, Kobe, Japan 5 Second Division of Pathology, Aichi Cancer Center, Nagoya, Japan To establish a simple computer program for the laboratory diagnosis of anemia and related diseases, multivariate analyses were applied to the results of routine hematological laboratory tests obtained from 48 patients and 51 healthy volunteers. The patients studied were limited to those who had not been treated hematologically by the time of their first visit to our hospital, and their first data obtained in our laboratory were analyzed. Final diagnoses were aplastic anemia (AA) in 21, myelodysplastic syndrome (MDS) in 14, iron deficiency anemia (IDA) in 3, polycytemia vera (PV)in 3, and idiopathic thrombocytopenic purpura (ITP) in 7. Eight parameters, WBC, RBC, Hb, Ht, MCV, MCH, MCHC, and PLT, were transformed to normal distribution and then applied to principal component analysis to evaluate their independence. Very close relationships were observed between Ht and Hb, and between MCV and MCH. One each of these pairs was selected by discriminant analysis and two sets, RBC, MCH, Hb, PLT, and WBC, and RBC, MCV, Ht, PLT, and WBC, were obtained. Two canonical components gave good discrimination of these five diseases and also of normal subjects. When disease prediction was made using this analysis, 37 of 48 patients (77.1%) were predicted correctly, and furthermore, when two disease predictions were allowed, all patients were diagnosed properly. Some overlaps were observed in this two-dimensional coordinate system, especially of AA and MDS, and also with normal subjects. To improve the system further, the additional parameters of age and sex were added to construct a three-dimensional analysis which resulted in much clearer discrimination. The whole procedure described is being developed with subjects who are not taking medication. Subsequently, the general application of this analytical procedure should be limited to only those not on medications. In conclusion, this is in essence a demonstration project; however, this trial of laboratory diagnosis using routine hematological laboratory results appears to be promising. Further extension of the study by increasing numbers of patients and disorders studied, including secondary anemias, will allow the design of diagnostic software for use with personal computers at the sites of primary care. Am. J. Hematol. 54:108–117, 1997 Q 1997 Wiley-Liss Inc. Key words: computer diagnosis; multivariate analysis; anemia; hematology; labora- tory diagnosis INTRODUCTION Recent progress in laboratory medicine has enabled Abbreviations: RBC, red blood cell count; Hb, hemoglobin concentra- accurate laboratory information to be obtained very rap- tion; Ht, hematocrit; MCV, mean corpuscular volume; MCH, mean cor- idly. However, it may be difficult for a physician to inter- puscular hemoglobin content; MCHC, mean corpuscular hemoglobin pret all laboratory information correctly, because too concentration; PLT, platelet count; WBC, white blood cell count; AA, much and variable data are delivered every day. There- aplastic anemia; MDS, myelodysplastic syndrome; IDA, iron deficiency anemia; PV, polycytemia vera; ITP, idiopathic thrombocytopenic fore, to facilitate effective discrimination of patients at purpura. the sites of primary care, it would be desirable to have *Correspondence to: Shuichi Shiga, M.T., Central Clinical Laboratory, simple laboratory diagnosis systems utilizing routine lab- Kyoto University Hospital, 54 Shogoin Kawaharacho, Sakyo-ku, Kyoto oratory results only. Such systems have been reported 606-01, Japan. previously [1–4]. Received for publication 11 March 1996; Accepted 11 September 1996. In the present study, we developed a computer program Contract grant sponsor: SRL, Inc. for the differential diagnosis of anemia and related disor- Q 1997 Wiley-Liss, Inc.

Upload: toru

Post on 06-Jun-2016

223 views

Category:

Documents


8 download

TRANSCRIPT

Page 1: Laboratory diagnosis of anemia and related diseases using multivariate analysis

American Journal of Hematology 54:108–117 (1997)

Laboratory Diagnosis of Anemia and Related DiseasesUsing Multivariate Analysis

Shuichi Shiga,1* Toshihito Furukawa,3 Itsuko Koyanagi,1 Morihisa Yamagishi,2

Yataro Yoshida,2 Takayuki Takahashi,4 Reiji Kannagi,5 and Toru Mori1

1Central Clinical Laboratory, Kyoto University School of Medicine, Kyoto, Japan2First Division of Internal Medicine, Kyoto University School of Medicine, Kyoto, Japan

3Department of Scientific Development Headquarters, SRL, Inc., Hachiogi, Tokyo, Japan4Department of Medicine, Kobe General Hospital, Kobe, Japan

5Second Division of Pathology, Aichi Cancer Center, Nagoya, Japan

To establish a simple computer program for the laboratory diagnosis of anemia andrelated diseases, multivariate analyses were applied to the results of routine hematologicallaboratory tests obtained from 48 patients and 51 healthy volunteers. The patients studiedwere limited to those who had not been treated hematologically by the time of their firstvisit to our hospital, and their first data obtained in our laboratory were analyzed. Finaldiagnoses were aplastic anemia (AA) in 21, myelodysplastic syndrome (MDS) in 14, irondeficiency anemia (IDA) in 3, polycytemia vera (PV)in 3, and idiopathic thrombocytopenicpurpura (ITP) in 7. Eight parameters, WBC, RBC, Hb, Ht, MCV, MCH, MCHC, and PLT, weretransformed to normal distribution and then applied to principal component analysis toevaluate their independence. Very close relationships were observed between Ht and Hb,and between MCV and MCH. One each of these pairs was selected by discriminant analysisand two sets, RBC, MCH, Hb, PLT, and WBC, and RBC, MCV, Ht, PLT, and WBC, wereobtained. Two canonical components gave good discrimination of these five diseasesand also of normal subjects. When disease prediction was made using this analysis,37 of 48 patients (77.1%) were predicted correctly, and furthermore, when two diseasepredictions were allowed, all patients were diagnosed properly. Some overlaps wereobserved in this two-dimensional coordinate system, especially of AA and MDS, and alsowith normal subjects. To improve the system further, the additional parameters of ageand sex were added to construct a three-dimensional analysis which resulted in muchclearer discrimination. The whole procedure described is being developed with subjectswho are not taking medication. Subsequently, the general application of this analyticalprocedure should be limited to only those not on medications. In conclusion, this is inessence a demonstration project; however, this trial of laboratory diagnosis using routinehematological laboratory results appears to be promising. Further extension of the studyby increasing numbers of patients and disorders studied, including secondary anemias,will allow the design of diagnostic software for use with personal computers at the sitesof primary care. Am. J. Hematol. 54:108–117, 1997 Q 1997 Wiley-Liss Inc.

Key words: computer diagnosis; multivariate analysis; anemia; hematology; labora-tory diagnosis

INTRODUCTION

Recent progress in laboratory medicine has enabledAbbreviations: RBC, red blood cell count; Hb, hemoglobin concentra-accurate laboratory information to be obtained very rap-tion; Ht, hematocrit; MCV, mean corpuscular volume; MCH, mean cor-idly. However, it may be difficult for a physician to inter-puscular hemoglobin content; MCHC, mean corpuscular hemoglobin

pret all laboratory information correctly, because too concentration; PLT, platelet count; WBC, white blood cell count; AA,much and variable data are delivered every day. There- aplastic anemia; MDS, myelodysplastic syndrome; IDA, iron deficiency

anemia; PV, polycytemia vera; ITP, idiopathic thrombocytopenicfore, to facilitate effective discrimination of patients atpurpura.the sites of primary care, it would be desirable to have*Correspondence to: Shuichi Shiga, M.T., Central Clinical Laboratory,simple laboratory diagnosis systems utilizing routine lab-Kyoto University Hospital, 54 Shogoin Kawaharacho, Sakyo-ku, Kyotooratory results only. Such systems have been reported606-01, Japan.previously [1–4].Received for publication 11 March 1996; Accepted 11 September 1996.In the present study, we developed a computer programContract grant sponsor: SRL, Inc.for the differential diagnosis of anemia and related disor-

Q 1997 Wiley-Liss, Inc.

8288$$P567 07-23-97 13:20:56

Page 2: Laboratory diagnosis of anemia and related diseases using multivariate analysis

Computer Diagnosis of Anemia 109

ders. The relationship between two variables is usually on the hematological laboratory results. After extensiveanalyses of these, we are now using fractional referenceexpressed by a correlation coefficient. However, in the

case of multiple variables it is rather difficult to demon- values routinely for each of our hematological laboratoryparameters [17]. In the second analysis, these data werestrate any similarity among variables, or to look for the

underlying factors simply by calculating correlation coef- also adjusted for normalization by age and sex.ficients. For this purpose, routine hematological labora-

Statistical Analysistory results were analyzed using the discriminant analysisbased on Wilks’ lambda to devise an “expert” diagnostic Some analyses were performed after normalizing trans-

formations, because some of the laboratory result distribu-algorithm. Such an analysis is not new but has been onlyrelatively infrequently applied to hematological prob- tions were skewed. Accordingly, WBC was log-trans-

formed, PLT was log-log transformed, and MCV andlems. In order to facilitate wide application, we used SAS,which is readily available to most laboratories. These MCH were transformed by their squares.

Relationships among these hematological laboratory re-methods provided the basis for an excellent and usefulalgorithm for the differentiation of various disorders of sults were analyzed using principal component analysis

[5–7] (PRINCOMP procedure using SAS). The purpose ofthe hematopoietic system.a principal component analysis is to derive a small numberof linear combinations (principal components) from a setof variables that retain as much of the information in theMATERIALS AND METHODSoriginal variables as possible. A principal component is aClinical Materialscoordinate obtained after condensing the information of

Forty-eight patients suffering from anemia or related each variable (Eigen value) maximally. Plotting of individ-disorders were selected for the study. The patients were ual values on a principal component system will reflect thelimited to those who had not been treated by the time cumulative proportion of these variables. The first princi-of their first visit to our hospital. Since the results of pal component (PRIN1) has the largest variance (largesthematological examinations can be altered easily by med- information) of any unit-length linear combination of theications, it was critical to restrict the usage of data only observed variables. The jth principal component (PRINj)to those obtained prior to any medication. The patients has the largest variance of any unit-length linear combina-were 46.3 6 19.2 (mean 6 SD) years old, comprised of tion orthogonal to the first j-1 principal component.25 males (51.5 6 18.1 years old) and 23 females The combinations of laboratory results useful for the(40.6 6 19.1 years old). Their diagnoses were established diagnosis of the five disorders were selected by discrimi-through various clinical findings and laboratory results, nant analysis [8–11] based on Wilks’ lambda (SAS;and included 21 cases of aplastic anemia (AA), 14 of STEPDISC procedure). Discriminant analysis is a usefulmyelodysplastic syndrome (MDS), 3 of iron deficiency statistical method, analyzing multiple laboratory resultsanemia (IDA), 3 of polycytemia vera (PV),and 7 of idio- of one individual to find out a particular group amongpathic thrombocytopenic purpura (ITP). Routine hemato- several groups to which the individual belongs. Discrimi-logical examination results at the time of the first visit nant function was used to classify the observations intowere analyzed (Table I). two or more known groups on the basis of test values.

Classification was performed by the parametric method,Control Subjects using a measure of generalized squared distance (Mahala-

nobis’ generalized distance). This distance can derive theAs the control subjects, 51 healthy adult employees inour hospital volunteered to participate in this study and probability distance between the center of the group and

specific observation of other groups. Wilks’ lambda isdonated blood samples. The control subjects were33.7 6 13.1 years old and were comprised of 18 males the classification statistic based on a linear regression

between variable and variance of classification which(36.5 6 13.8 years old) and 33 females (32.1 6 12.6years old). Their test results, shown as the highest value, shows the relationship of within-group and between-

group variance. After selection of laboratory results, statis-lowest value, and mean and SD of eight parameters,respectively, were as follows: WBC, 8.3, 3.4, and tical models for the prediction of five diseases were

achieved by canonical analysis (SAS; CANDISC proce-5.5 6 1.2 (3 13/ml); RBC, 5.51, 3.75, and 4.56 6 0.39(3 106/ml); Hb, 16.8, 11.7, and 13.9 6 1.3 (g/dl); Ht, dure [12–16]). Canonical discriminant analysis derives a

linear combination of the variable that has the maximal51.2, 35.7, and 41.4 6 3.8 (%); MCV, 106.2, 84.2, and90.9 6 4.2 (fl); MCH, 35.6, 28.1, and 30.4 6 1.5 (pg); multiple correlation with the groups. This maximal multi-

ple correlation is called the first canonical correlationMCHC, 34.3, 32.4, and 33.4 6 0.5 (%); and PLT, 420,165, and 247 6 51 (3 103/ml). (CAN1). The variable defined by the linear combination is

the first canonical component. The second canonical corre-These subjects were much younger than the patientgroups. Age and sex were the major prognostic factors lation (CAN2) is obtained by finding the linear combina-

8288$$P567 07-23-97 13:20:56

Page 3: Laboratory diagnosis of anemia and related diseases using multivariate analysis

110 Shiga et al.

TABLE I. Hematological Laboratory Data From 48 Patients

Case Sex Age Diagnosis WBC RBC Hb Ht MCV MCH MCHC PLT

1 F 53 AA 3.7 3.79 12.2 37.6 99.2 32.2 32.4 262 F 49 AA 2.2 2.00 6.8 19.8 99.0 34.0 34.3 173 F 15 AA 1.9 1.70 5.6 16.9 99.7 33.2 33.2 174 F 40 AA 2.6 2.46 7.7 24.2 98.4 31.3 31.8 525 M 32 AA 1.8 1.71 7.0 20.0 117.1 41.2 35.2 336 F 61 AA 4.8 2.98 10.4 30.5 102.3 34.9 34.1 137 M 39 AA 2.7 0.87 3.7 11.0 126.4 42.5 33.5 128 M 29 AA 2.5 1.45 6.2 18.7 129.0 42.8 33.2 149 M 52 AA 4.7 3.21 12.7 38.3 119.3 39.6 33.2 25

10 F 24 AA 5.7 3.12 10.7 31.8 101.9 34.3 33.7 10811 F 33 AA 3.0 2.98 11.3 33.0 110.8 37.9 34.2 4112 F 20 AA 2.9 2.92 9.1 26.7 91.4 31.2 34.1 5413 F 60 AA 4.4 1.66 6.6 18.7 112.6 39.9 35.4 1914 F 17 AA 2.4 2.28 7.1 21.6 94.7 31.1 32.9 1015 M 63 AA 1.6 2.51 8.7 24.9 99.2 34.7 34.9 3816 F 40 AA 1.7 3.10 10.4 29.5 95.2 33.5 35.3 1817 F 14 AA 1.8 2.02 7.5 22.5 111.4 37.1 33.3 2618 M 56 AA 1.3 0.87 3.7 10.3 118.4 42.5 35.9 919 M 22 AA 1.8 2.41 8.4 25.1 104.1 34.9 33.5 2320 M 21 AA 3.5 2.42 9.6 29.0 119.8 39.7 33.1 621 M 61 AA 3.1 3.35 13.1 37.9 113.1 39.1 34.6 6022 M 74 MDS 2.6 2.05 7.0 20.7 100.9 34.1 33.8 20423 F 65 MDS 1.6 3.09 10.6 32.0 103.7 34.4 33.2 11624 M 71 MDS 2.2 1.62 6.6 19.0 117.3 40.7 34.7 7025 F 72 MDS 3.0 2.42 8.2 24.7 102.1 33.9 33.2 11426 M 50 MDS 2.2 2.62 8.5 25.5 97.5 32.4 33.3 1227 M 40 MDS 5.3 3.98 14.8 43.7 109.8 37.2 33.9 49228 M 19 MDS 2.0 1.77 7.0 20.7 116.4 39.2 33.7 2129 M 63 MDS 3.1 2.25 8.1 24.1 106.9 35.7 33.4 4430 F 39 MDS 3.1 2.38 7.1 22.9 96.2 30.0 31.0 1831 F 78 MDS 2.4 2.25 6.6 20.3 90.2 29.4 32.6 25332 M 83 MDS 1.0 3.76 10.2 31.0 82.3 27.0 32.8 6533 M 71 MDS 1.5 2.10 6.9 19.7 93.8 33.0 35.2 3034 M 64 MDS 5.3 4.58 14.7 44.3 96.7 32.1 33.2 12835 M 70 MDS 0.5 1.54 5.6 16.8 108.6 36.3 33.4 5236 F 23 IDA 3.7 4.37 8.2 27.3 62.5 18.8 30.0 31537 F 47 IDA 5.2 3.93 10.3 31.4 79.9 26.2 32.8 32138 F 41 IDA 5.0 3.80 8.7 27.2 71.6 22.9 32.0 22939 M 58 PV 22.7 6.40 13.8 43.8 68.4 21.6 31.5 1,40040 M 33 PV 8.1 5.88 17.1 50.4 85.7 29.1 33.9 8841 M 62 PV 18.9 5.21 11.6 37.0 71.0 22.3 31.4 1,25442 M 59 ITP 6.5 4.32 14.5 43.2 100.0 33.6 33.6 10543 M 55 ITP 3.6 4.68 14.6 43.7 93.2 31.2 33.5 2844 M 40 ITP 10.0 5.33 16.3 48.1 90.2 30.7 34.0 2045 F 18 ITP 13.5 4.37 13.5 39.9 91.3 30.9 33.8 4446 F 24 ITP 4.3 4.67 13.8 41.6 89.1 29.6 33.2 7647 F 48 ITP 4.0 3.80 11.7 34.7 91.2 30.8 33.8 12648 F 53 ITP 7.0 4.65 14.5 41.8 89.8 31.2 34.7 14

tion uncorrelated with CAN1. The process of extracting RESULTScanonical variables can be repeated until the number of Similarity Among Laboratory Resultscanonical components equals the number of original labo-

According to the results of principal component analy-ratory results or the number of groups minus one.sis, we defined two major factors which would explainAt first, we constructed models using laboratory resultsindependent clinical parameters properly: factors indicat-only, and then adjustments for prognostic factors (ageing hematopoietic potential, and erythrocyte indices.and sex [17]) were applied for further improvement.These two factors were found to explain approximatelyAll statistical analyses were performed using SAS

programs. 80% of all information available. Figure 1 shows distribu-

8288$$P567 07-23-97 13:20:56

Page 4: Laboratory diagnosis of anemia and related diseases using multivariate analysis

Computer Diagnosis of Anemia 111

Fig. 1. Relationships between tests: principal component analysis. PRIN1 shows firstprincipal component, and PRIN2 shows second. Plot of the test shows the Eigen vector,which is the correlation coefficient between each principal component and the test vari-ables. This figure shows the distance on the map to demonstrate the relationship amongthe tests. Tests located nearby indicate that findings resembled each other on this analysis.

tion patterns of these eight parameters, plotting principal selected by discriminant analysis. The first picked up aset of four results, RBC, MCH, Hb, and PLT, and thecomponent 1 (PRIN1) vs. PRIN2 on a coordinate system.

MCH and MCV gave very similar information, as did second put more weight on another set comprised ofRBC, MCV, Ht, and PLT. WBC is considered to giveHb and Ht. Therefore, one each of these pairs appeared

to be sufficient for further analysis. different useful information in other disorders, and weadded WBC to each of the above sets.

Selection of Laboratory Results Useful for In classification based on the generalized square dis-Disease Prediction tance, the overall ratio of correct prediction for the five

disorders was 77.1% (37 of 48 cases), using either of theAs described in Materials and Methods, two sets oflaboratory results useful for diagnosing five diseases were above sets (Table I). In other words, the false prediction

8288$$P567 07-25-97 15:37:28

Page 5: Laboratory diagnosis of anemia and related diseases using multivariate analysis

112 Shiga et al.

TABLE II. Disease Prediction in Model 1 Using RBC, MCH, Hb, PLT, and WBC

Case Diagnosis AA IDA ITP MDS PV Prediction

1 AA 0.27 0.00 0.60 0.13 0.00 ITP2 AA 0.68 0.00 0.00 0.32 0.00 AA3 AA 0.61 0.00 0.00 0.39 0.00 AA4 AA 0.25 0.00 0.00 0.75 0.00 MDS5 AA 0.64 0.00 0.00 0.36 0.00 AA6 AA 0.84 0.00 0.09 0.07 0.00 AA7 AA 0.95 0.00 0.00 0.05 0.00 AA8 AA 0.94 0.00 0.00 0.06 0.00 AA9 AA 0.84 0.00 0.06 0.11 0.00 AA

10 AA 0.35 0.00 0.06 0.59 0.00 MDS11 AA 0.61 0.00 0.01 0.38 0.00 AA12 AA 0.27 0.00 0.02 0.71 0.00 MDS13 AA 0.91 0.00 0.00 0.09 0.00 AA14 AA 0.83 0.00 0.01 0.16 0.00 AA15 AA 0.32 0.00 0.00 0.68 0.00 MDS16 AA 0.60 0.00 0.02 0.38 0.00 AA17 AA 0.57 0.00 0.00 0.43 0.00 AA18 AA 0.94 0.00 0.00 0.06 0.00 AA19 AA 0.55 0.00 0.00 0.45 0.00 AA20 AA 0.98 0.00 0.01 0.01 0.00 AA21 AA 0.54 0.00 0.02 0.43 0.00 AA22 MDS 0.09 0.00 0.00 0.91 0.00 MDS23 MDS 0.11 0.00 0.00 0.89 0.00 MDS24 MDS 0.42 0.00 0.00 0.58 0.00 MDS25 MDS 0.18 0.00 0.00 0.82 0.00 MDS26 MDS 0.78 0.00 0.01 0.20 0.00 AA27 MDS 0.16 0.00 0.11 0.73 0.00 MDS28 MDS 0.75 0.00 0.00 0.25 0.00 AA29 MDS 0.48 0.00 0.00 0.51 0.00 MDS30 MDS 0.64 0.00 0.01 0.35 0.00 AA31 MDS 0.04 0.00 0.00 0.96 0.00 MDS32 MDS 0.05 0.04 0.08 0.83 0.00 MDS33 MDS 0.31 0.00 0.00 0.69 0.00 MDS34 MDS 0.02 0.00 0.94 0.04 0.00 ITP35 MDS 0.08 0.00 0.00 0.92 0.00 MDS36 IDA 0.00 0.94 0.00 0.00 0.06 IDA37 IDA 0.00 0.94 0.05 0.01 0.00 IDA38 IDA 0.00 1.00 0.00 0.00 0.00 IDA39 PV 0.00 0.00 0.00 0.00 1.00 PV40 PV 0.00 0.02 0.95 0.00 0.04 ITP41 PV 0.00 0.00 0.00 0.00 1.00 PV42 ITP 0.06 0.00 0.87 0.07 0.00 ITP43 ITP 0.01 0.00 0.98 0.01 0.00 ITP44 ITP 0.00 0.00 1.00 0.00 0.00 ITP45 ITP 0.00 0.00 0.99 0.00 0.00 ITP46 ITP 0.00 0.01 0.98 0.01 0.00 ITP47 ITP 0.11 0.01 0.40 0.49 0.00 MDS48 ITP 0.00 0.00 1.00 0.00 0.00 ITP

rate was 22.9%. Table II shows predicted probability of viewpoint of extensive prediction, all 21 patients werediagnosed properly. In 14 patients with MDS, 10 werethe five disorders in these patients using discriminant

functions. Among 21 patients with AA, 16 (76.1%) were diagnosed correctly and 4 were indicated to have AA(n 5 3) or ITP (n 5 1). Again, extensive prediction gavepredicted correctly but 5 were mistakenly classified as

MDS (n 5 4) or ITP (n 5 1). The rather high false predic- correct results, and false prediction of AA was high as21.4%. Two of 3 PV patients were diagnosed correctly,tion rate of MDS (19.0%) indicates a partial common

property between these two disorders. Extensive predic- and 1 was predicted to have ITP. All but 1 of 7 patientswith ITP were diagnosed correctly, 1 was indicated totion was defined as the classification in which one patient

was predicted to have one of two disorders with the largest have MDS with a 49% probability, and extensive predic-tion was good. All 3 patients with IDA were predictedor the next largest posterior probability. Thus, from the

8288$$P567 07-23-97 13:20:56

Page 6: Laboratory diagnosis of anemia and related diseases using multivariate analysis

Computer Diagnosis of Anemia 113

Fig. 2. Distribution of mean canonical disease scores in model 2 using RBC, MCH, Hb,PLT, and WBL. Plots show the canonical score of the average of test values obtained byfive diseases. Canonical score is the linear combination of the test value. The coefficientof the linear combination was calculated by canonical analysis. This plot shows therelationship of the diseases based on the test.

correctly. If extensive prediction is allowed, all 48 patients appeared to be distributed almost independently fromeach other, but MDS and AA overlapped significantly,were diagnosed properly.especially in the middle portion.

Results of Canonical AnalysisDistribution of Control Subjects andUsing a two-dimensional canonical analysis, meanThree-Dimensional Analysisweights of the five disorders were shown to be distant

from each other, as shown in Figure 2. However, disease When data from 51 control subjects were added andanalyzed together, most of the calculated values for theprediction in individual patients was not fully correct,

indicating possible overlap among distribution areas of control subjects were distributed close to the center ofthe coordinate, and differed markedly from those of theeach disorder. As shown in Figure 3, IDA, PV, and ITP

8288$$P567 07-24-97 11:09:17

Page 7: Laboratory diagnosis of anemia and related diseases using multivariate analysis

114 Shiga et al.

Fig. 3. Distribution of canonical scores of patients with model 2. Plots show the canonicalscores of individual patient data as defined by the diseases. Triangle, AA; Square, IDA;open circle, ITP; diamond, MDS; closed circle, PV.

patients (Fig. 4). However, there were certain overlaps is critical with regard to: 1) selection of patients, and 2)selection of laboratory results. Previously, we attemptedwith data from diseased patients. To improve discrimina-

tion further, adjustments for prognostic factors (age and to analyze hematological laboratory results obtained frommore than 250 patients with various disorders of thesex) were added, and disease prediction was then assessed

by three-dimensional canonical analysis. As shown in hematopoietic system. These results, however, were al-most impossible to analyze, because alterations of dataFigure 5, control subjects were discriminated much more

clearly from patients, and further differentiation between after medication greatly disturbed further analysis in de-tail. In this study, we restricted the data to those obtaineddiseases, including AA and MDS, appeared much better

than disease discrimination shown in Figures 3 or 4. prior to medication. Therefore, we could not analyze asmany patients, although this selection allowed analysisin greater detail. On the other hand, useful laboratory

DISCUSSIONresults should be defined as those which contribute signif-icantly and independently to the disorder(s). MultipleMultivariate analyses have often been applied to estab-

lish disease prediction systems using laboratory results combinations of laboratory results with similar clinicalsignificance do not improve diagnostic accuracy.only [8,18]. There have been several trials of computer-

aided diagnosis of anemia and related disorders using Through the principal component analyses among eighthematological parameters, two pairs of laboratory results,laboratory databases [1–4]. A simple expert system has

been applied, especially to microcytic anemia, giving a Hb with Ht and MCH with MCV, were shown to bealmost equipollent. RBC and PLT showed significancelimited success for discrimination [3,4]. It is much more

desirable to develop a widely applicable discrimination individually. RBC and WBC were also located very closeto each other, but these cannot be taken as equipollent.system, useful at the site of primary care. To establish

an effective system for disease prediction, careful selec- WBC showed no particular significance in this seriesof five-disorder analysis; however, WBC is consideredtion of the database which is used during development

8288$$P567 07-24-97 11:09:17

Page 8: Laboratory diagnosis of anemia and related diseases using multivariate analysis

Computer Diagnosis of Anemia 115

Fig. 4. Distribution of canonical scores of patients in model 3, in which the normal groupwas added to discriminant groups (AA, MDS, and so on) in model 2. Symbols used arethe same as in Figure 3. Clover stands for control subjects.

indispensable for the further application of this analysis the present analysis should be able to discriminate themfrom MDS or AA (data not shown).to other diseases. This is probably why only RBC was

chosen in the discriminant analysis. Including WBC, two On the other hand, PV and IDA were shown to bedifferent from other disorders or controls and were almostsets of laboratory results were constructed and applied

for the further analyses: 1) RBC, MCH, Hb, PLT, and always predicted correctly. One patient with PV was mis-taken as having ITP, but this was a particular case ofWBC, and 2) RBC, MCV, Ht, PLT, and WBC.

Estimation of Mahalanobis’ generalized distances be- pseudothrombocythemia. In this study, we analyzed datafrom only 48 patients with five disorders, and 51 controls.tween these five disorders and normal controls revealed

a rather close relationship between AA, MDS, and ITP. The number of subjects and diseases may arguably havebeen too small, but the final canonical analysis, includingThese results are considered reasonable, as MDS is lo-

cated between anemia and leukemia, and further differen- patient properties, provided a quite satisfactory discrimi-nation among diseases and controls. We should extendtiation between AA and MDS is difficult without special

morphological standards or chromosomal analyses [19]. our studies further to increase the number of patients anddiseases. New patients with these five disorders should beSome ITP is difficult to differentiate from AA or MDS

without special examinations, including bone-marrow monitored with regard to this analysis, and other diseasesshould be checked for possible overlap with the fiveanalysis. Despite this, the three-dimensional analysis (Fig.

5) made it possible to differentiate between these quite disorders. The latter should include considerable numbersof patients and diseases with secondary anemia. Whenwell. In this series, we analyzed five disorders, in each

of which at least 3 cases were available. However, there certain overlaps are seen, we should consider other param-eters of routine laboratory results, such as biochemicalare other diseases, such as paroxysmal nocturnal hemo-

globulinuria or myeloproliferative disorders, which are analyses, to discriminate them clearly from the patientswith primary disorders of the hematopoietic system. Re-also close to MDS. Although the numbers of cases exam-

ined were too small and were not included in this series, sults of the present study and future studies should facili-

8288$$P567 07-24-97 11:09:17

Page 9: Laboratory diagnosis of anemia and related diseases using multivariate analysis

116 Shiga et al.

Fig. 5. Distribution of canonial scores of patients in model 4. Symbols used are the sameas in Figure 4, except for PV, shown by cylinder.

tate production of a diagnostic compact disc, easily appli- part by a grant-in-aid from SRL, Inc. to Toshihito Furu-kawa. We also thank Mr. D. Mrozek for editing the manu-cable on a personal computer at the sites of primary care,

for general usage. script in English.

REFERENCESACKNOWLEDGMENTS

1. Matsuda N: Recent progress in computer assisted laboratory diagnosisThe authors thank Mrs. K. Okamura and the staff of thefor peripheral hemogram. Rinsho Byori [Suppl] 84:159, 1989.Hematology Section of the Central Clinical Laboratory,

2. Spiegelhalter DJ: Bayesian analysis in expert systems. Presented at theKyoto University Hospital, Kyoto, Japan for their techni- Sixtieth Annual Meeting of the Japanese Statistical Society, July 1992.cal assistance and valuable advice, and Miss Y. Mitsuda 3. Houwen B: The use of inference strategies in the differential diagnosis

of microcytic anemia. Blood Cells 15:509, 1989.for secretarial assistance. This study was supported in

8288$$P567 07-24-97 11:09:17

Page 10: Laboratory diagnosis of anemia and related diseases using multivariate analysis

Computer Diagnosis of Anemia 117

4. O’Connor ML, McKinney T: The diagnosis of microcytic anemia by 14. Jennrich RI: Stepwise discriminant analysis. In Enslein K, Ralston A,Wilf H (eds): “Statistical Methods for Digital Computers.” New York:a rule-based expert system using VP-expert. Arch Pathol Lab Med

113:985, 1989. John Wiley & Sons, Inc., 1977.15. Klecka WR: Discriminant analysis. Sage University Paper Series on5. Pearson K: On lines and planes of closest fit to systems of points in

space. Philosophical Magazine 6:559, 1901. Quantitative Applications in the Social Sciences, Series No. 07-019.Beverly Hills: Sage Publications, 1980.6. Hotelling H: Analysis of a complex of statistical variables into principal

components. J Educ Psychol 24:417, 1933. 16. Rencher AC, Larson SF: Bias in Wilks’ lambda in stepwise discriminantanalysis. Technometrics 22:349, 1980.7. Kshirsager AM: Multivariate analysis. In Mardia KV, Kent JT, Bibby

JM (eds): New York and London: Academic Press, 1979. 17. Shiga S, Koyanagi I, Kannagi R: Clinical reference values for laboratoryhematology tests calculated using the iterative truncation method with8. Kshirsagar AM: “Multivariate Analysis.” New York and London: Mar-

cel Dekker, Inc., 1979. correction. Rinsho Byori 38:93, 1990.18. Ichihara K, Miyai K: Graphical methods to analyse overall pathophysio-9. Lawley DN: Tests of significance in canonical analysis. Biometrica

46:5966, 1959. logical data structure from laboratory database. Rinsho Byori [Suppl]83:39, 1989.10. Fisher RA: The use of multiple measurements in taxonomic problems.

Ann Eugen 7:179, 1936. 19. Werner M, Maschek H, Kalovtsi V, Choritz H, Georgii A: Chromosomeanalyses in patients with myelodysplastic syndromes: Correlation with11. Rao CR: “Linear Statistical Inference.” New York: John Wiley & Sons,

Inc., 1973. bone marrow histopathology and prognostic significance. VirchowsArch [A] 421:47, 1992.12. Rao CR: “Linear Statistical Inference and Its Applications.” New York:

John Wiley & Sons, Inc., 1965.13. Costanza MC, Afifi AA: Comparison of stopping rules in forward

stepwise discriminant analysis. J Am Stat Assoc 74:777, 1979.

8288$$P567 07-23-97 13:20:56