computer analysis of chromosome patterns: decision making



IEEE TRANSACTIONS ON COMPUTERS, VOL. C-20, NO. 9, SEPTEMBER 1971


Computer Analysis of Chromosome Patterns: Feature Encoding for Flexible Decision Making

ALLEN KLINGER, MEMBER, IEEE, ARNOLD KOCHMAN, AND NIKITAS ALEXANDRIDIS, STUDENT MEMBER, IEEE

Abstract: Experimental pattern recognition techniques for processing chromosome slides with a computer are described. The purpose of the computer program is twofold: to illuminate the basic mechanisms by which a human recognizes an object, such as a chromosome, and distinguishes it from other entities; and to employ these mechanisms in an automatic and precise extraction of chromosome features.

Index Terms: Feature extraction, figure properties, hierarchical decisions, integral projections, pattern recognition, picture processing.

Manuscript received December 10, 1970; revised March 26, 1971. This research was partially sponsored by Cancer Research Funds of the University of California, and the Air Force Office of Scientific Research, Air Force Systems Command, USAF under AFOSR Grant 70-1915. The United States Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. A preliminary version of this paper was presented at the IEEE Symposium on Feature Extraction and Selection in Pattern Recognition, Argonne, Ill., October 5-7, 1970.

The authors are with the Department of Computer Science, University of California, Los Angeles, Calif.

INTRODUCTION

The purpose of this study was to find methods of abstracting the characteristics of a chromosome and to develop a dynamic method of encoding this data for feature extraction. The computer program must locate a single entity (chromosome or nonchromosome), then identify as many of its characteristics as possible and employ them to make a decision. If the characteristics noted are sufficient to satisfy the observer that his decision will be correct, he will make a decision. Otherwise, he will investigate alternative ways of looking at the object. On a computer, this takes the form of a decision tree (Fig. 1).

The chromosome is defined and recognized by the presence or absence of "arms" and of their shape and size. For the purpose of the computer program described here, it is necessary to formulate this definition more explicitly. The chromosome recognition program developed by Frey [1] was unable to process many cases without human intervention because it used an overly strict operational definition of "chromosome." The program developed by Neurath et al. [2] has some similarities to our program but adopts a fixed-processing format for each chromosome. The techniques reported here permit the recognition of distinct situations, including overlapping chromosomes; hence the best criterion can be applied to each situation.

The structure of the program permits several relatively complex functions. For example, once a chromosome has been detected, not only the modified data, but also the sequence of modifications that have occurred, is considered. In this way, when the program makes a decision it can take into account the extent and type of modifications to the data as well as possible alternatives.

[Fig. 1 is a tree diagram whose layout is not recoverable here. The root node asks "Could the object be overlapping chromosomes?" Legible nodes below it ask whether the object could be a single chromosome, whether an X chromosome or a Y chromosome can be found within the pattern, whether it can be clearly distinguished, and whether it is the best or only chromosome that can be found at this location. Legible outcome labels include "normal," "bent arm," "missing part," and "wrong proportions." Branches end in "finish" or in a numbered "return."]

Fig. 1. A logical tree structure of the dynamic encoding process. (Note: When a branch terminates with "finish," a correct decision can probably be made. When a branch terminates with a numbered "return," the data must be augmented, or modified if possible, then reconsidered starting at the corresponding auxiliary entry point of the decision tree.)

The domain of our investigation, computer chromosome recognition, has an extensive literature. Ledley [3] and Ledley et al. [4] describe syntax-directed methods using a special scanner for computer input of pictures. Hilditch and Rutovitz [5] present a comprehensive discussion of shape-oriented chromosome recognition experiments using the same scanner. Rutovitz [6] employs two of our concepts: "a series of increasingly complex alternatives" and "a polar coordinate representation of the boundary." Hilditch [7] describes the separation of touching chromosomes, while Gallus and Neurath [8] improve earlier programs by incorporating boundary analysis.

Our work presents new methods for data reduction and isolation of visual entities (chromosomes, blobs, touching and overlapping chromosomes) as well as boundary encoding via a scaled polar plot which enables analysis for overlap resolution. In essence we describe an experimental program which exemplifies feature encoding and decision making in hierarchical fashion. Many of the papers cited describe more extensive analysis and computation on actual chromosome data.

DATA REDUCTION AND ISOLATION OF SEPARATE ENTITIES

The first phase of the computer program modifies the data representing the picture so that it becomes as small a data base as possible. The scanned picture is input in the form of numbers representing gray levels, or relative darknesses, corresponding to a particular small area of the original photograph. A picture of 1 000 000 points is converted into a different form and reduced to some 10 000 pairs of coordinates. Each set of coordinates in the reduced picture is on the left or right facing edge of a dark area of the picture. In a picture of the type under consideration, most locations are not associated with any information about the objects to be studied. It is the coordinates of the intersections of the boundary of the dark chromosome with each horizontal line slicing it that are of particular interest. The data are input line by line, each line corresponding to one sweep of the flying spot scanner digitizer across the photograph. Thus, after the first phase of the program, the picture is reduced to a list of points known to be on the boundaries of objects in the picture. Entries in this list start at the upper left corner of the picture; further entries occur as these points of the picture are examined from left to right and from top to bottom, as English is read. This ordering is crucial to the operation of the second phase.

The second phase addresses the problem of isolating particular objects from other objects. A good system for accomplishing this must associate all parts of an object with each other, but it must not join together two or more distinct objects. Two basic approaches were originally considered, the first being that all points found in the first phase would be encoded according to vertical continuity. This involves finding the tops and bottoms of all vertically oriented curves. Objects would then be detected by finding points where these curves are joined together. However, this method was rejected since it requires excessive bookkeeping in order to obtain reliable results.

The method selected utilizes boundary tracing; this tracing is done symbolically using the reduced table. The criterion used for finding the next point on the boundary is that the last point and the next point must be on the same (left or right) sides of vertically adjacent overlapping cross-sectional line segments. Fig. 2 illustrates schematically the gray levels in a hypothetical picture; the solid black is intended to represent the images of objects. In Fig. 3, which is derived from Fig. 2, the ends of the lines are the points whose coordinates would be retained in the list. Table I indicates the corresponding contents of the computer program's internal table; this table contains all the required information about the picture in a compact form. (A minus sign associated with a point of the table indicates that the point was at the left of an object.)

[Fig. 2 is drawn on a grid with x running from 1 to 16 and y from 1 to 12.]

Fig. 2. A sample picture with two objects.

[Fig. 3 uses the same grid, with x running from 1 to 16 and y from 1 to 12. Annotation in the figure: Line segments are shown to graphically suggest what is meant by overlapping. Only the end points are actually considered. The numeral at each point is the sequence in which the point was reached. Note that boundary of the large object is twice traversed. The number of times the vertical direction changes is a parameter to the boundary tracing routine.]

Fig. 3. Tracing sequence for boundary of the object in Fig. 1.

TABLE I
CONTENTS OF THE COMPUTER PROGRAM'S INTERNAL TABLE CORRESPONDING TO THE PICTURE IN FIG. 1

Location   Before Boundary Tracing (x, y)   After Boundary Tracing (x, y)
 1 (a)     -5, 1                            -5, 1011
 2          6, 1                             6, 1011
 3         -5, 2                            -5, 1012
 4          6, 2                             6, 1012
 5         -8, 2                            -8, 1012
 6          9, 2                             9, 1012
 7 (b)     -5, 3                            -5, 1013
 8         10, 3                            10, 1013
 9 (c)     -4, 4                            -4, 1014
10         10, 4                            10, 1014
11         -6, 5                            -6, 1015
12         10, 5                            10, 1015
13        -14, 5                           -14, 2025
14         15, 5                            15, 2025
15         -6, 6                            (blank)
16         10, 6                            (blank)
17        -14, 6                           -14, 2026
18         15, 6                            15, 2026
19         -6, 7                            (blank)
20         10, 7                            (blank)
21        -14, 7                           -14, 2027
22         15, 7                            15, 2027
23         -5, 8                            -5, 1018
24          7, 8                             7, 1018
25         -9, 8                            -9, 1018
26         10, 8                            10, 1018
27         -5, 9                            -5, 1019
28          6, 9                             6, 1019
29         -9, 9                            -9, 1019
30         10, 9                            10, 1019
31         -4, 10                           -4, 1020
32          6, 10                            6, 1020
33         -9, 10                           -9, 1020
34         10, 10                           10, 1020
35         -4, 11                           -4, 1021
36          5, 11                            5, 1021
37         -9, 11                           -9, 1021
38         10, 11                           10, 1021
39         (blank)                          (blank)
40         (blank)                          (blank)

Some corresponding Fig. 3 tracing sequence numbers are: (a) 1, 63; (b) 3, 23, 61; and (c) 4, 24, 60.

A pointer starts at location 1, symbolically moving down the left side of the object, jumping down the table by two until the first overlap is detected. The first overlap detected while going downward is the left-most on that line. The pointer is set at this new position and the process of searching for overlaps is continued until no overlap is found on the next line of the picture. At this time the lowest point on the leftmost edge has been reached. The pointer is then increased by one and moved up the right side as it was previously moved down the left side.

When the boundary has been traced as completely as possible, the process is repeated, this time moving clockwise instead of counterclockwise. Locations at which the pointer stops are signaled by adding a multiple of 1010 to the y coordinate at this location. The point has now been "assigned" to an object. Points associated with each other by tracing a boundary are signaled by adding the same multiple of 1010 to the y coordinate. Since the value of the y coordinate has not changed modulo 1010, the data has not been disturbed by the procedure which isolates objects. Thus an error or omission can be corrected at a later point in processing. Throughout the procedure the spatial relationships of all points with each other are preserved, even though the representation of the data changes. Therefore, once a point has been found to be part of an object (not necessarily complete), that point can be used in further attempts to recognize complete objects.

When associating a pair of points, they must belong to overlapping segments in two consecutive lines. If the difference in y coordinates of two segments is greater than some small integer,¹ overlapping is not considered sufficient to connect them. Since both ends of segments are adjacent in the table, overlapping can be easily determined: if one end of a segment can be recognized as lying on the boundary of an object, the other end of the segment must also be part of a boundary of the same object. Recognizing both ends of a line segment as belonging to the object causes internal boundaries to be recognized at the same time that the external boundary is traced (Fig. 4).

Fig. 4. Points are assigned to an object as the boundary is traced.
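The first-phase reduction and the overlap criterion described in this section can be sketched as follows. This is a minimal illustration, not the authors' program: the function names, the threshold parameter, and the conventions that a larger gray level means a darker point and that the right edge is the last dark point of a run are all assumptions. The gap limit of 5 lines is the value quoted in the footnote.

```python
from typing import List, Tuple

Edge = Tuple[int, int]          # (signed x, y); a negative x marks a left edge
Segment = Tuple[int, int, int]  # (x_left, x_right, y), all positive


def reduce_scan_line(gray: List[int], y: int, threshold: int) -> List[Edge]:
    """Keep only the left and right ends of each dark run on one scan line."""
    edges: List[Edge] = []
    in_run = False
    for i, level in enumerate(gray):
        x = i + 1                      # 1-based column, as in Table I
        dark = level >= threshold      # assumption: larger value = darker
        if dark and not in_run:
            edges.append((-x, y))      # left edge stored with a minus sign
            in_run = True
        elif not dark and in_run:
            edges.append((x - 1, y))   # right edge = last dark column
            in_run = False
    if in_run:                         # run reaches the end of the line
        edges.append((len(gray), y))
    return edges


def reduce_picture(picture: List[List[int]], threshold: int) -> List[Edge]:
    """Scan top to bottom, left to right, so entries come out in reading order."""
    table: List[Edge] = []
    for row_index, gray in enumerate(picture):
        table.extend(reduce_scan_line(gray, row_index + 1, threshold))
    return table


def segments_connected(a: Segment, b: Segment, max_gap: int = 5) -> bool:
    """True if the x-intervals overlap and the lines are at most max_gap apart."""
    a_left, a_right, a_y = a
    b_left, b_right, b_y = b
    if abs(a_y - b_y) > max_gap:
        return False
    return a_left <= b_right and b_left <= a_right
```

With a hypothetical two-line picture such as `[[0,0,0,0,9,9,0,0,0], [0,0,0,0,9,9,0,9,9]]` and a threshold of 5, `reduce_picture` yields `(-5, 1), (6, 1), (-5, 2), (6, 2), (-8, 2), (9, 2)`, which has the same form as the first six "before boundary tracing" entries of Table I.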

¹ The number chosen was 5. A larger number might cause separate objects to appear joined; a smaller number would be more likely to cause spurious changes in vertical direction because of a noise line.

Fig. 4 shows how the boundary of an object is traced first in one direction of rotation and then in the other. The only part of the object that is not discovered is the small bump at the top C since neither the right nor the left sides of this part have been traversed. To have done so would have required the computer program to detect the possibility at B of changing directions as at A. A change from downward travel to upward travel must be made at A since it is the lowest point of the object. A subroutine to detect that the direction could be changed at B could be written, but it would be entered at every stage of the boundary tracing. After completing the tracing of the boundary as shown, an attempt could be made to trace a path starting at C, retracing the main body. When this is done, only the identification of the object need be changed in order to connect the missing part to the main object.

Fig. 5. Chromosome with detached part.
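The object-assignment scheme (adding a multiple of 1010 to the y coordinate) and the relabeling step mentioned above can be illustrated with a few helper functions. The names below are hypothetical; the only facts taken from the text are the constant 1010 and the property that the original y coordinate is recoverable modulo 1010, which requires every y to be smaller than 1010.

```python
from typing import List, Tuple

TAG_BASE = 1010  # constant from the text; assumes every original y < 1010


def assign(y: int, object_id: int) -> int:
    """Tag a y coordinate (tagged or not) with an object number (0 = unassigned)."""
    return y % TAG_BASE + object_id * TAG_BASE


def original_y(tagged_y: int) -> int:
    """Recover the untouched y coordinate: the value modulo 1010."""
    return tagged_y % TAG_BASE


def object_of(tagged_y: int) -> int:
    """Return the object number carried by a tagged y coordinate."""
    return tagged_y // TAG_BASE


def merge_objects(table: List[Tuple[int, int]], src: int, dst: int) -> List[Tuple[int, int]]:
    """Reassign every point of object src to object dst, leaving x and y mod 1010 intact."""
    return [
        (x, assign(y, dst)) if object_of(y) == src else (x, y)
        for x, y in table
    ]
```

For example, `assign(1, 1)` gives 1011 and `assign(5, 2)` gives 2025, matching the "after boundary tracing" entries in Table I. Connecting a detached part to the main object then amounts to a single `merge_objects` call, since only the identification changes.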

DYNAMIC ENCODING OF INFORMATION FOR FEATURE EXTRACTION

The actual encoding of information is the major task of the computer program. Two techniques are incorporated in the program. These are integral projections [9], [10], or chord profiles [11], and the production of a polar plot of the suspected chromosome. These two methods are not parallel. The integral projection technique is used as a pretest to determine which branches of the tree should be followed. At a later time the polar plot is used for the more explicit decision making.

Integral Projections

The third part in row 1 of Fig. 4 shows how the first integral projection from the left of a chromosome is obtained. It will be a plot of the lengths of the leftmost segments of horizontal chords which intersect the chromosome as a function of the vertical position y. If the part of the object represented by the first integral projection is removed, the first integral projection of what is left is the second integral projection of the original object. Some nonchromosomes (convex blobs) may be eliminated by the presence of only a first integral projection on both the x and y axes. In many cases, the possibility of an object's being a chromosome is eliminated immediately because the number of points which constitute its boundary, henceforth called its size, is either much too small or much too large. (The size is obtained by the computer program during boundary tracing; the examples given here required 60 < size < 2000.)

One of the drawbacks in the use of integral projections is that they change with the orientation of an object. Two identical objects, one rotated with respect to the other, would have different integral projections. Integral projections are also very sensitive to error, particularly when an attempt is made to obtain them in the direction perpendicular to the direction of scanning. The data produced by scanning a photograph is susceptible to the least error in the direction of the scan. Any errors produced in scanning have unpredictable effects on integral projections. Isolated points representing noise could radically alter the functional form of the integral projection data. (In Fig. 5, the E's indicate places where problems occurred with the integral projections.) There are two further reasons, in addition to this noise sensitivity, for ruling out integral projection storage of the chromosome data for quantitative processing.

1) Such encoding makes it unclear to which chromosome arm a given chord is to be associated.

2) Overlapping chromosomes cannot be resolved from integral projection data.

Alternatively, the availability of the scanned picture as a data base larger than the computer core memory, the processing of the picture by reading one line into memory at a time, and the ease with which one projection can be calculated from the boundary data, lead us to employ this mode of picture encoding for the "first test" or qualitative processing.
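The first and second integral projections from the left, and the size pretest, can be sketched from the reduced edge table as follows. This is an illustrative reading of the description, not the original routine: it assumes the table holds well-formed (left, right) pairs in reading order for a single object, and that a chord's length is counted in points (right minus left plus one). All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (signed x, y); a negative x marks a left edge


def first_projection_from_left(edges: List[Edge]) -> Dict[int, int]:
    """Length of the leftmost chord segment on each line y."""
    projection: Dict[int, int] = {}
    for i in range(0, len(edges) - 1, 2):
        (x_left, y), (x_right, _) = edges[i], edges[i + 1]
        if y not in projection:               # first pair on a line is leftmost
            projection[y] = x_right - (-x_left) + 1
    return projection


def remove_first_segments(edges: List[Edge]) -> List[Edge]:
    """Drop the leftmost segment of every line; what remains gives the next projection."""
    seen = set()
    remaining: List[Edge] = []
    for i in range(0, len(edges) - 1, 2):
        y = edges[i][1]
        if y in seen:
            remaining.extend([edges[i], edges[i + 1]])
        else:
            seen.add(y)
    return remaining


def plausible_size(boundary_point_count: int) -> bool:
    """Size pretest quoted in the text: 60 < size < 2000."""
    return 60 < boundary_point_count < 2000
```

Applying `first_projection_from_left` to the output of `remove_first_segments` gives the second integral projection. An object whose second projection is empty has only a first projection from that side, which is the situation the text associates with convex blobs.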

Polar Plots

For each object thus far identified, the polar plot is made as follows. All boundary points are transformed into polar coordinates with respect to an origin set to the average coordinates of the object's boundary points. A table with 360 spaces (10 rings, 36 sectors) stores entries which indicate whether the object's boundary passes through the corresponding region of the plane. Note that the table, $r(\theta)$, is a functional representation of the boundary curve of the object, and that this is not in general single valued. Let the single-valued function $R(\theta)$, the radial extremum, be derived from the table by

$$R(\theta) = \frac{9 \cdot \max\,(r(\theta))}{r_{\max}} \tag{1}$$

where $r_{\max}$ is the radius of the point furthest from the center. The derived function $v(\theta)$ in (2) is used for actual decision making and is called the polar plot. (Brackets indicate the greatest integer function.)

$$v(\theta) = \sum_{i=0}^{[R(\theta)]} (i + 1). \tag{2}$$
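A minimal sketch of the polar plot computation follows. It takes a shortcut that is an assumption of this example rather than the described procedure: instead of filling the 10-ring by 36-sector table, it keeps the largest radius seen in each 10-degree sector and applies (1) and (2) to it. Sectors with no boundary point are treated as having radius 0, which gives $v = 1$ under (2). The function name and the optional `origin` parameter are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def polar_plot(points: List[Point], sectors: int = 36,
               origin: Optional[Point] = None) -> List[int]:
    """Return v(theta) for each angular sector, following (1) and (2)."""
    if origin is None:
        # Default origin: average of the boundary point coordinates.
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
    else:
        cx, cy = origin
    width = 360.0 / sectors
    max_radius = [0.0] * sectors
    for x, y in points:
        dx, dy = x - cx, y - cy
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        k = int(angle // width) % sectors
        max_radius[k] = max(max_radius[k], math.hypot(dx, dy))
    r_max = max(max_radius)
    plot: List[int] = []
    for r in max_radius:
        big_r = 9.0 * (r / r_max) if r_max > 0 else 0.0   # equation (1)
        top = math.floor(big_r)                           # greatest integer [R]
        plot.append((top + 1) * (top + 2) // 2)           # sum_{i=0}^{[R]} (i+1)
    return plot
```

Because the sum in (2) has the closed form $([R]+1)([R]+2)/2$, the plot takes only ten distinct values (1, 3, 6, ..., 55), which is the quantization the text refers to: a sector at full radius scores 55, while one at half the maximum radius scores 15.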

Gross-shape considerations regarding chromosomes indicate that a point relatively far from the center contains more information than a point near the center, since the closer a point is to the center, the less likely it is to be at the end of a chromosome arm. The effect of the function $v(\theta)$ is to magnify the amplitude of peaks in $R(\theta)$. However, peaks in $R(\theta)$ of almost equal amplitudes will usually have exactly the same amplitude in $v(\theta)$ because of the quantization $v(\theta)$ introduces. Peaks that have a somewhat greater difference in amplitude will have their relative difference in amplitude magnified by the mapping from $R$ into $v$.

Some of the problems that limit the usefulness of the integral projections do not affect the utility of a polar plot. Recognizing a chromosome in this way does not necessarily require that the orientation of the object be taken into account. A rotation is merely a translation of the angular coordinate. Furthermore, use of integral projections (and possibly of other schemes such as chain encoding) would require one to deal with the problem of pattern recognition at a lower level of logic than is absolutely necessary. Integral projection implicitly requires that the results be compared with some possible prototypes, searching for an approximate match. By depending on the very properties that make the image appear to be the image of a chromosome, and by recognizing the obvious property of chromosomes that they tend to consist of arms extending generally outward from a center, it is possible to make correct detections of chromosomes even under adverse circumstances.

Fig. 6. Polar plot of chromosome with detached arm.
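The remark that a rotation is merely a translation of the angular coordinate can be made concrete: rotating an object by a whole number of sectors circularly shifts its polar plot. The sketch below, with a hypothetical function name and an absolute-difference cost chosen only for illustration, finds the circular shift that best aligns two plots.

```python
from typing import List, Sequence, Tuple


def best_rotation(v_ref: Sequence[float], v_obs: Sequence[float]) -> Tuple[int, float]:
    """Return (shift, cost) minimizing sum |v_ref[i] - v_obs[(i + shift) % n]|."""
    n = len(v_ref)
    if n != len(v_obs):
        raise ValueError("polar plots must have the same number of sectors")
    best_shift, best_cost = 0, float("inf")
    for shift in range(n):
        cost = sum(abs(v_ref[i] - v_obs[(i + shift) % n]) for i in range(n))
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift, best_cost


# A plot and the same plot rotated by two sectors align with zero cost.
reference: List[int] = [55, 3, 1, 1, 15, 1]
rotated: List[int] = [15, 1, 55, 3, 1, 1]
shift, cost = best_rotation(reference, rotated)
```

In this example `shift` is 2 and `cost` is 0. The approach described in the text does not need such a search, since it works from the number and relative sizes of the peaks; the sketch only illustrates why orientation does not change the shape of the plot.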

Fig. 5 illustrates an object which should be recognizable as an X chromosome, and at the same time it should be possible to recognize it as a less than perfect example. The integral projections of an object such as that in Fig. 4 are radically altered by the presence of a hole which could be ignored (as the result of uneven staining, i.e., optical noise) and by the outcropping on the upper right arm which cannot be ignored (abnormal chromosome possibility). Hence they do not facilitate distinguishing between nonchromosomes and abnormal chromosomes. Too many assumptions about the data are required before the analysis can be made.

However, the polar plots in Figs. 6, 8, and 10 can be utilized in a different way. The number of chromosome arms and the number of peaks in $v(\theta)$ should be the same; furthermore, distinguishing a chromosome from a blob is essentially a problem of recognizing arms. Thus, the polar plot is a device for extracting those features by which chromosomes can be identified.

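Figs. 6, 8, and 10 are examples of plots of $v^{(4)}(\theta)$. A smoothed version of $v(\theta) = v^{(0)}(\theta)$ is obtained from

$$v^{(i+1)}(\theta_n) = \tfrac{1}{3}\left[v^{(i)}(\theta_{n-1}) + v^{(i)}(\theta_n) + v^{(i)}(\theta_{n+1})\right] \quad \text{if } v^{(i)}(\theta_n) < \max\left[v^{(i)}(\theta_{n-1}),\; v^{(i)}(\theta_{n+1})\right]; \tag{3}$$

otherwise (i.e., if $v^{(i)}(\theta_n)$ is found to be the apex of a peak), $v^{(i+1)}(\theta_n) = v^{(i)}(\theta_n)$. This smoothing process tends to make the peaks triangular.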
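The smoothing step can be sketched as below. The rule is read here as a three-point mean applied to every point that is lower than at least one neighbor, with apex points left unchanged; the weight of one third is an assumption, since the coefficient is hard to read in the source. The plot is treated as circular, and all values of one pass are computed from the previous pass. Function names are hypothetical.

```python
from typing import List, Sequence


def smooth_once(v: Sequence[float]) -> List[float]:
    """One smoothing pass over a circular polar plot; apex points are kept."""
    n = len(v)
    out: List[float] = []
    for k in range(n):
        prev, cur, nxt = v[k - 1], v[k], v[(k + 1) % n]
        if cur < max(prev, nxt):
            out.append((prev + cur + nxt) / 3)   # assumed three-point mean
        else:
            out.append(cur)                      # apex of a peak: unchanged
    return out


def smooth(v: Sequence[float], passes: int = 4) -> List[float]:
    """Apply the pass repeatedly; four passes would correspond to v^(4)."""
    result = list(v)
    for _ in range(passes):
        result = smooth_once(result)
    return result
```

Starting from a single spike `[0, 0, 9, 0, 0, 0]`, one pass gives `[0, 3, 9, 3, 0, 0]` and a second gives `[1, 4, 9, 4, 1, 0]`: the apex stays fixed while the flanks fill in, which is the sense in which the peaks tend toward a triangular shape.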

Figs. 6, 8, and 10 are plots of $v^{(4)}(\theta)$ for the objects shown in Figs. 5, 7, and 9, respectively. These data have undergone a transformation which is of a specific form, but which still depends largely on the data itself. The transformation taking $r(\theta)$ into $v^{(i)}(\theta)$ is defined such that a plot of $v^{(i)}(\theta)$ consists only of straight lines. If, as in Fig. 10, the plot is not commensurable with a pattern of straight lines, the transformation is not correct. Under such circumstances, the procedure must climb to the next node of the encoding tree. In the case of the pattern depicted in Fig. 9 and plotted in Fig. 10, a successful choice of this transformation will occur when the origin of the polar coordinates is translated in the direction indicated by the arrows. Once a suitable $v(\theta)$ is found, the essence of the plot can be represented by a vector of the relative maxima and minima in the polar plot. This vector, along with a record of the sequence of operations that have been performed in order to obtain $v^{(4)}(\theta)$, constitutes all of the encoded information about the object. The nature of the sequence of data modifications indicates the decision criterion, while the vector of the extrema provides the input to the decision apparatus.

Fig. 7. An example of a complete and probably normal chromosome.
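Extracting the vector of relative maxima and minima from a circular polar plot might look like the following sketch. The tie-breaking rule for flat stretches (a plateau is reported once, at its first point) and the output format are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Extremum = Tuple[int, float, str]  # (sector index, value, "max" or "min")


def extrema_vector(v: Sequence[float]) -> List[Extremum]:
    """List the relative maxima and minima of a circular polar plot, in angular order."""
    n = len(v)
    found: List[Extremum] = []
    for k in range(n):
        prev, cur, nxt = v[k - 1], v[k], v[(k + 1) % n]
        if cur > prev and cur >= nxt:
            found.append((k, cur, "max"))
        elif cur < prev and cur <= nxt:
            found.append((k, cur, "min"))
    return found


def count_peaks(v: Sequence[float]) -> int:
    """Number of relative maxima; per the text this should equal the number of arms."""
    return sum(1 for _, _, kind in extrema_vector(v) if kind == "max")
```

For the plot `[1, 4, 9, 4, 1, 0]` this returns one maximum at sector 2 and one minimum at sector 5. Together with a record of which modifications were applied to the data, a vector of this kind is what the text describes as the input to the decision apparatus.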

ANALYSIS AND DECISION MAKING

Objects encountered by the decision making apparatus are being evaluated for possible membership in four categories: normal X chromosome, abnormal X chromosome, Y chromosome, and nonchromosome. Abnormal Y chromosome is not included since it is not yet obvious what will be necessary and sufficient to distinguish an abnormal Y chromosome from a member of one of the classes already mentioned. In particular, some imperfect X chromosomes look very much like slightly misshapen Y chromosomes.

Classification

Fig. 7 is an example of what is obviously a complete and probably normal chromosome, although it is not an excellent image. It was selected as an example to demonstrate that the conclusion that the data represents a good X chromosome can be reached by analysis of the radial plot shown in Fig. 8. Four major peaks can be seen in Fig. 7. The smaller peaks are small enough in comparison to the four large peaks that they can probably be ignored. The significance of these smaller peaks is as follows: if one were to draw the largest and smoothest outline of an ideal X or Y figure (Fig. 7) superimposed on the chromosome that does not enclose the chromosome, the fraction of the total area in the plot constituted by the minor peaks, and the closeness with which the major peaks can be matched according to size, can be used as a measure of the level of confidence that may be attributed to the decision. It should be noted that the peaks are usually composed of two distinct areas that can be compared separately. Thus, there are three ways to compare the sizes of these peaks: by their total areas and by the areas of the top and bottom parts.
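The two quantities named above as confidence indicators (the share of plot area in the minor peaks, and how closely the major peaks match in size) could be computed along the following lines. The text gives no formula, so both measures below, the choice of four major peaks, and the function name are purely hypothetical illustrations.

```python
from typing import Sequence, Tuple


def peak_match_summary(peak_areas: Sequence[float],
                       major_count: int = 4) -> Tuple[float, float]:
    """Return (minor_fraction, major_spread) for a list of peak areas.

    minor_fraction: area of all but the largest major_count peaks over total area.
    major_spread:   (largest - smallest) / largest among the major peaks;
                    0.0 means the major peaks match exactly.
    """
    ranked = sorted(peak_areas, reverse=True)
    major, minor = ranked[:major_count], ranked[major_count:]
    total = sum(ranked)
    minor_fraction = sum(minor) / total if total else 0.0
    if major and major[0] > 0:
        major_spread = (major[0] - major[-1]) / major[0]
    else:
        major_spread = 0.0
    return minor_fraction, major_spread
```

With made-up areas `[100, 98, 95, 102, 5, 3]`, the minor peaks hold 8/403 of the area and the major peaks differ by 7/102 at most; small values of both would point toward a well-formed four-armed figure. The same function could be applied separately to the top and bottom parts of each peak, the other two comparisons mentioned in the text.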

Fig. 5 shows what is clearly an X chromosome and it is clearly imperfect. The first and fourth peak areas in Fig. 6 match very well and the second peak is fairly close to the same size. The second peak does not match well with any of the others, except for the bottom part of the third peak. There are many indicators of an X chromosome but not of an ideal X chromosome. Therefore, it is concluded that the object is indeed probably an X chromosome, but probably abnormal.

Fig. 8. Polar plot from a good chromosome image.

Fig. 9. Nonchromosome or markedly bent chromosome.

Fig. 9 is the picture of a nonchromosome which is relatively complex in shape. It might be a chromosome but is eliminated by consideration of the polar plot. Fig. 10 illustrates an instance in which it is clear that none of the peaks compares in size with any of the others. Furthermore, the peaks are not all triangular. Therefore, the object is tentatively found to be a nonchromosome.
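The reasoning in these examples, in which peaks are compared in size with one another, can be sketched as a small rule. This is a hedged illustration only: the relative tolerance of 20 percent, the function names, and the mapping from the number of matching pairs to a class are assumptions made for the example. The Y chromosome class and the test for triangular peaks are not covered.

```python
from itertools import combinations


def count_matching_pairs(areas, tolerance=0.2):
    """Count pairs of peaks whose areas agree within `tolerance`.

    Agreement is measured relative to the larger area of the pair
    (an assumed criterion; the tolerance value is hypothetical).
    """
    pairs = 0
    for a, b in combinations(areas, 2):
        if abs(a - b) <= tolerance * max(a, b):
            pairs += 1
    return pairs


def tentative_class(areas, tolerance=0.2):
    """Tentative label from the major peak areas of a polar plot."""
    matches = count_matching_pairs(areas, tolerance)
    if matches == 0:
        # No peak compares in size with any other.
        return "nonchromosome"
    if len(areas) == 4 and matches == 6:
        # All four arms match one another.
        return "normal X chromosome"
    # Some indicators of an X chromosome, but not of an ideal one.
    return "abnormal X chromosome"
```

For instance, `tentative_class([50, 30, 12, 49])` finds only one matching pair (50 and 49) and so reports an abnormal X chromosome under these assumed rules.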


Fig. 10. Polar plot of a bent object.


Fig. 11. Two overlapping chromosomes.

Overlap Resolution

The question naturally arises, what does it mean if the radial plot indicates more than four peaks? Such a result could indicate an object of the type shown in Fig. 11. This figure could be interpreted as two X chromosomes, one on top of the other. The polar plot does not make it immediately obvious that this is the case, but it does indicate that the possibility exists. The object can be considered further, using two different points as new origins of polar axes. This is an attempt to shift attention to one chromosome only, rather than concentrating on both at the same time. The direction from the old origin to a new origin will be in the direction "pointed to" by the largest peaks or pairs of adjacent peaks. This assumes that each of the largest peaks appears so large because a chromosome is displaced from the origin in the direction of that peak. The distance that the origin must be moved in order to place it at the centroid of the chromosome can be approximated from the heights of the peaks, possibly as the difference between the height of the tallest peak and the average height of all peaks.

Fig. 12. Overlapping chromosomes reconsidered using a new origin.

Once the coordinates have been translated as shown in Fig. 12, the polar plot can be reexamined in a new light. It is assumed that the largest peaks in this profile are not part of the chromosome that we have tried to center. In Fig. 13(a) these peaks have been deleted. Another criterion for deleting these peaks from the polar plot is the fact that they are heavily skewed to the right. A strongly skewed peak indicates an arm emanating from some point other than the center. Size, however, is a first test of peaks to be deleted since a skewed peak can result when an arm of a chromosome is merely bent. (Two examples of such skewed peaks are shown in Fig. 13.) With this first simplification, we now have six peaks left from the original eight.

Of these six remaining peaks, one can be eliminated because it is much smaller than the others. (It should be remembered that the decision is really made on the basis of the function v(4)(0), in which the differences in sizes would be greater.) When peak (a) is deleted, the polar plot has arrived at the state depicted in Fig. 13(b). None of the remaining peaks can be removed because of its size or shape.
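The origin shift described in this section can be sketched as follows. It is a minimal illustration under stated assumptions: the function names are hypothetical, peaks are given as (angle in radians, height) pairs, the direction is taken from the single tallest peak rather than from a pair of adjacent peaks, angles are measured counterclockwise with the y axis pointing up, and the distance is the one rule the text suggests as a possibility (tallest peak height minus the average peak height).

```python
import math


def shifted_origin(origin, peaks):
    """Estimate a new polar origin from the peaks of the current plot.

    origin: (x, y) of the current origin.
    peaks:  list of (angle_radians, height) pairs.
    """
    angle, tallest = max(peaks, key=lambda p: p[1])
    mean_height = sum(height for _, height in peaks) / len(peaks)
    distance = tallest - mean_height  # rule suggested in the text
    x0, y0 = origin
    return (x0 + distance * math.cos(angle), y0 + distance * math.sin(angle))


def polar_coordinates(boundary, origin):
    """Re-express boundary points (x, y) as (angle, radius) about `origin`.

    Angles are reduced to [0, 2*pi) and the result is sorted by angle, so it
    can be reexamined as a polar plot about the new origin.
    """
    x0, y0 = origin
    out = []
    for x, y in boundary:
        dx, dy = x - x0, y - y0
        out.append((math.atan2(dy, dx) % (2 * math.pi), math.hypot(dx, dy)))
    return sorted(out)
```

With peaks of height 50, 30, 20, and 20 at angles 0, π/2, π, and 3π/2, the average height is 30, so the origin moves 20 units in the direction of angle 0.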

Fig. 13. Decomposition of the plot in order to distinguish two chromosomes.

If any of the remaining peaks should be removed to find an X chromosome, it must be peak (c), because only in this way could the object have axial symmetry. Once the deletion-generated sharp corners are removed by the smoothing procedure, the problem has been reduced to a situation with which we already know how to deal. If, however, either peaks (a) or @) were deleted, this would result in the finding of a possibly misshapen chromosome. Thus, of the five possible interpretations, only the one resulting from the deletion of peak (c) reflects a complete and well-defined chromosome image.
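The selection among candidate deletions can be sketched as a search for the deletion that leaves the most axially symmetric plot. The paper does not state how axial symmetry is tested, so everything here is an assumption made for illustration: the plot is a list of equally spaced samples of the radius, a deleted peak is replaced by a baseline value (no smoothing is applied), asymmetry is the mean absolute difference between the plot and its best-fitting reflection, and the peak labels and function names are hypothetical.

```python
def mirror_asymmetry(profile):
    """Smallest mean absolute difference between a circular profile and its
    reflection, over every reflection axis available on the sample grid."""
    n = len(profile)
    best = float("inf")
    for k in range(n):
        # The reflection i -> (k - i) mod n has its axis at sample k / 2.
        diff = sum(abs(profile[i] - profile[(k - i) % n]) for i in range(n))
        best = min(best, diff / n)
    return best


def delete_peak(profile, start, stop, baseline):
    """Copy of `profile` with samples start..stop-1 set to `baseline`."""
    out = list(profile)
    for i in range(start, stop):
        out[i % len(out)] = baseline
    return out


def best_deletion(profile, peak_spans, baseline):
    """Try deleting each labeled peak and score the remaining plot.

    peak_spans: dict mapping a peak label to its (start, stop) sample range.
    Returns (label of the most symmetric deletion, all scores).
    """
    scores = {
        label: mirror_asymmetry(delete_peak(profile, start, stop, baseline))
        for label, (start, stop) in peak_spans.items()
    }
    return min(scores, key=scores.get), scores
```

Each entry in `scores` corresponds to one possible interpretation of the object; a score of zero means the remaining peaks are exact mirror images of one another about some axis.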

CONCLUSION

The experimental pattern recognition program described here employs several feature extraction techniques to enable computer analysis of chromosome patterns. The program performs data reduction to isolate objects from the scene in an efficient manner, based on line-at-a-time input of scanned picture data from magnetic tape, while retaining the useful boundary information in this process. One test, a convexity indication eliminating blobs, is based on the easily computed integral projections/chord profiles used elsewhere in pattern recognition by Pavlidis and Ball. Boundary data is encoded and features are obtained (peaks in a polar plot of the boundary indicating chromosome arms) as the program continues past this test through its decision tree. Features are summarized in a pattern vector which consists of the relative maxima and minima in the final polar plot of the chromosome, while the record of translations and deformation which led to this plot is retained.

Methods for using the data to classify the chromosomes (normal X chromosome, abnormal X chromosome, Y chromosome, and nonchromosome) are described which could easily be incorporated in the existing program to make it an operational program. Finally, a method for resolving cases of overlapping chromosomes by selecting subsets of the polar plot peaks was presented, as was a method for assigning a confidence level to each decision.

The key concept presented here is a two-level logical structure of the pattern recognition program which allows it to select the procedures and criteria most likely to lead to a correct decision. In particular, the program uses a changing organization of the features of the object discovered as the decision-making task changes. The resulting pattern recognition concept we term dynamic encoding for feature extraction, while the overall task this program is intended to perform is that of flexible decision making needed for picture processing.

REFERENCES

[1] H. S. Frey, "An interactive computer program for chromosome analysis," Comput. Biomed. Res., vol. 2, Feb. 1969, pp. 274-290.

[2] P. W. Neurath, B. L. Bablouzian, T. H. Warms, R. C. Serbagi, and A. Falek, "Human chromosome analysis by computer - An optical pattern recognition problem," Ann. N. Y. Acad. Sci., vol. 128, Jan. 1966, pp. 1013-1028.

[3] R. Ledley, "High speed automatic analysis of biomedical pictures," Science, vol. 146, Oct. 1964, pp. 216-223.

[4] R. Ledley, L. Rotolo, T. Golab, J. Jacobsen, M. Ginsberg, and J. Wilson, "FIDAC: Film input to digital automatic computer and associated syntax directed pattern recognition programming system," Optical and Electro-Optical Information Processing, J. Tippett et al., Eds. Cambridge, Mass.: M.I.T. Press, 1965, ch. 33, pp. 591-612.

[5] C. J. Hilditch and D. Rutovitz, "Chromosome recognition," Ann. N. Y. Acad. Sci., vol. 157, 1969, pp. 339-364.

[6] D. Rutovitz, "Centromere finding: Some shape descriptors for small chromosome outlines," Machine Intelligence, vol. 5, 1970, pp. 435-462.

[7] C. J. Hilditch, "The principles of a software system for karyotype analysis," Human Population Cytogenetics (Pfizer Medical Monographs). London: Edinburgh House, 1970, pp. 298-325.

[8] G. Gallus and P. Neurath, "Improved computer chromosome analysis incorporating pre-processing and boundary analysis," Phys. Med. Biol., vol. 15, 1970, pp. 435-445.

[9] T. Pavlidis, "Analysis of set patterns," in Pattern Recognition, vol. 1. New York: Pergamon Press, 1968, pp. 167-178.

[10] T. Pavlidis, "Computer recognition of figures through decomposition," Inform. Contr., vol. 12, May-June 1968, pp. 526-537.

[11] G. H. Ball, "An invariant input for a pattern recognition machine," Stanford Electron. Lab., Stanford University, Stanford, Calif., Tech. Rep. 2003-4, Apr. 1962.
