
A Mass Assignment Method for Prototype Induction

J. F. Baldwin*, J. Lawry†, T. P. Martin‡
A.I. Group, Department of Engineering Mathematics, University of Bristol, Bristol BS8 1TR, United Kingdom

*e-mail: [email protected]
†Author to whom all correspondence should be addressed; e-mail: [email protected]
‡e-mail: [email protected]

International Journal of Intelligent Systems, Vol. 14, 1041-1070 (1999)

A form of prototypes defined as tuples of marginal probability distributions is introduced. Addition and subtraction operations on such prototypes are then described. A brief introduction to the mass assignment theory of the probability of fuzzy events is given, and it is shown how fuzzy sets can serve as conceptual descriptions of probability distributions. Hence fuzzy descriptions of prototypes can be derived, and these can be used for inference as well as enabling rule based representations of a set of prototypes to be formed. A prototype induction algorithm, based on these ideas together with the addition and subtraction operations, is described. The potential of this approach is then illustrated by its application to a number of model and real world machine learning problems. © 1999 John Wiley & Sons, Inc.

I. INTRODUCTION

The idea of concept induction is of fundamental importance in fields such as machine learning and knowledge discovery. For example, in classification problems a number of instances of each class are presented and it is required that general methods for discriminating between these classes be inferred. A myriad of machine learning algorithms use a wide variety of different approaches to this task, but here we shall be interested in the following interpretation of the problem. Can we learn general descriptions of the classes in terms of a set of attributes, which can then be used to decide to which class (if any) an unclassified object belongs? Each description of a class can then be viewed as a prototype constructed from the examples of that class present in the database (or previously encountered in the environment). Traditional approaches to the problem of prototype learning have taken the form of clustering algorithms [1,2] where the objective is to discover the position of cluster centers or fuzzy cluster centers [3-5] in attribute space. These centers are then used in conjunction with some distance metric to determine regions of the space corresponding to each class. These regions can then be seen as characterizing the class. In the sequel, however, we shall adopt a somewhat different approach.

An alternative interpretation of a prototype representing a class is as an amalgam of points (or objects) belonging to that class, which are in some way similar. Clearly, if such a prototype is to be at all representative of the whole set of points, it cannot be precisely defined. More specifically, we propose to define prototypes as tuples of probability distributions over each attribute universe determined by the collection of points to be amalgamated. For instance, suppose a supermarket chain wishes to determine descriptions of prototypical consumers of a product, such as jelly, in terms of the attribute Age. This information might then be used together with currently available information on the distribution of Age for the population within the catchment area of any particular shop in order to determine the quantity of the product that shop should stock. Now suppose that essentially two age groups buy jelly: people in their mid-twenties to thirties who have young children, and elderly people who find it easy to eat and digest. Given data collected across the supermarket chain relating Age to consumption, we would want to learn prototypes representing these two groups. The notion of similarity involved here is that of closeness between ages, so we might choose, in a simplistic manner, to put all jelly buyers under 50 in one group and all those over 50 in another. In this case, the two prototypes would correspond to the frequency distribution of values of Age for each group.

In addition to prototype generation, we must also address the problem of how to make inferences from such prototypes in order to determine the classification class of an object. In this context, then, we are faced with the problem of determining a degree of matching between a probability distribution, representing a prototype, and a point, or indeed between two probability distributions if there is uncertainty regarding the exact nature of the object to be classified. In order to obtain such a measure we propose to generate a conceptual description of each distribution, in terms of fuzzy sets, so that a degree of match can then be determined according to the mass assignment theory of the probability of fuzzy events [6]. The notion of fuzzy constraints on attributes encoding probability distributions was first proposed in this context by Baldwin [7,8], and the idea has been utilized in a number of machine learning methods, many of which form part of the Fril Databrowser [8]. Previous methods, however, have involved the generation of only a single class prototype, albeit in terms of compound attributes in some approaches [9]. In the following, we shall describe an algorithm for learning multiple prototypes for each class.

A further advantage of deriving fuzzy descriptions of prototypes is in terms of knowledge representation. Recent interest in the field of automated knowledge discovery (sometimes referred to as data mining) has meant added emphasis being placed on the transparency or comprehensibility of the inferred knowledge. The paradigm of logic programming is particularly suitable here, since rule representations of knowledge are amongst the simplest and most accessible. Furthermore, given that both probabilistic and fuzzy uncertainty are inherent to this approach, a language incorporating these features would also be desirable. Fril is a logic programming style language extended to allow probabilistic inference, according to the support logic calculus [7,10]. In addition, Fril allows for the representation of discrete and piecewise continuous fuzzy sets and for predicates to take fuzzy values. The latter necessitates an extended form of unification, referred to as semantic unification, which takes into account the semantic relationship between fuzzy sets over the same universe and quantifies the extent to which they match. This algorithm is also based on the theory of the probability of fuzzy events mentioned previously [6]. In view of these properties and our need for models defined within an uncertainty calculus, Fril provides an ideal implementation language for our learning algorithm, as well as providing a good knowledge representation framework for the models generated. Indeed, each prototype can be represented by a Fril rule and the built-in uncertainty calculus can be used directly for inference. For example, Fril equivalence rules for the jelly problem could be

((product bought by X is jelly)
 (Age of X is fairly_young)) : ((1 1)(0 0))

((product bought by X is jelly)
 (Age of X is old)) : ((1 1)(0 0))

where fairly_young and old are fuzzy subsets of [0, 100] (e.g., Fig. 1).

Notice that for this one-dimensional example we could replace the two prototypes by a single prototype, and hence single rule, corresponding to the disjunction fairly_young ∨ old. This option, however, is not readily available to us for higher dimensional problems, as it would necessitate the representation of prototypes as higher dimensional fuzzy sets instead of the decomposed representation we have chosen. Such a higher dimensional representation would have significant disadvantages in terms of the simplicity and transparency of the associated rules. Now suppose that for a particular store the distribution of Age for the catchment area has fuzzy description middle_aged (Fig. 1). Intuitively, it might be said that middle_aged is closer, in some sense yet to be clearly defined, to the prototype fairly_young than to old, and hence a measure of the likelihood of a shopper buying jelly would be given by the degree of match (or semantic unification) of fairly_young given middle_aged.

Figure 1. Fuzzy descriptions of prototypical consumers of jelly.


In the following we shall develop these ideas, but initially we give some basic definitions relating to prototypes and their manipulation.

DEFINITION 1.1 (Prototype). A prototype in $\Omega_1 \times \cdots \times \Omega_n$ is a tuple $\langle p_1, \ldots, p_n, k \rangle$ where $p_i$ is a probability distribution on $\Omega_i$ and $k \in \mathbb{N}$. Here, and in the sequel, the attribute universes $\Omega_i$ for $i = 1, \ldots, n$ are assumed to be finite. The intuition behind this definition is that a prototype is representative of a set of $k$ cases where $p_i$ is the distribution of attribute $X_i$ across the cases.

In order to infer prototypes we require a means of both positive updating (and merging) and negative updating. In this context these operations will be defined in terms of the following notions of prototype addition and subtraction, respectively.

DEFINITION 1.2 (Prototype Addition). Let $P_1 = \langle p_1, \ldots, p_n, k \rangle$ and $P_2 = \langle q_1, \ldots, q_n, c \rangle$ be prototypes in $\Omega_1 \times \cdots \times \Omega_n$. Then $P_1 \,[+]\, P_2$ is a prototype in $\Omega_1 \times \cdots \times \Omega_n$ such that

$$P_1 \,[+]\, P_2 = \langle r_1, \ldots, r_n, k + c \rangle \quad \text{where} \quad \forall x \in \Omega_i \;\; r_i(x) = \frac{k\,p_i(x) + c\,q_i(x)}{k + c}$$

DEFINITION 1.3 (Prototype Subtraction). Let $P_1 = \langle p_1, \ldots, p_n, k \rangle$ and $P_2 = \langle q_1, \ldots, q_n, c \rangle$ be prototypes in $\Omega_1 \times \cdots \times \Omega_n$. Then $P_1 \,[-]\, P_2$ is the prototype in $\Omega_1 \times \cdots \times \Omega_n$ such that

$$(P_1 \,[-]\, P_2) \,[+]\, P_2 = P_1$$

provided such a prototype exists, and is left undefined otherwise. Notice that if $P_1 \,[-]\, P_2 = \langle r_1, \ldots, r_n, t \rangle$ then $t = k - c$ and $r_i(x) = (k\,p_i(x) - c\,q_i(x))/(k - c)$, provided that $k > c$ and $\forall x \in \Omega_i \;\; (p_i(x) - 1)k + c \le c\,q_i(x) \le k\,p_i(x)$ for $i = 1, \ldots, n$.

A possible motivation for Definitions 1.2 and 1.3 could be as follows. Suppose the prototypes $P_1$ and $P_2$ are generated from the sets of objects $S_1$ and $S_2$, so that $p_i$ and $q_i$ correspond to the probability distributions on $\Omega_i$ of objects in $S_1$ and $S_2$, respectively. In this case, $P_1 \,[+]\, P_2$ corresponds to the tuple of distributions generated by $S_1 \cup S_2$ (see Fig. 2). Similarly, if $S_2 \subset S_1$, then $P_1 \,[-]\, P_2$ gives the tuple of distributions of objects in $S_1 - S_2$.

In order to determine which objects should be associated with which prototypes some notion of similarity or distance must be defined. In fact, as will become apparent in a later section, objects can be viewed as a special case of prototypes and therefore it is a distance measure between prototypes that is required. Such a measure will also be important when the issues of prototype merging and the formation of new prototypes are addressed. The following metric was first proposed by Baldwin [11].
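To make the addition and subtraction operations concrete, the following is a minimal sketch in Python of Definitions 1.2 and 1.3, assuming each marginal distribution is stored as a dictionary mapping universe elements to probabilities. The names Prototype, add, and subtract are illustrative; the paper's own implementation was in Fril.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Prototype:
    dists: list  # one probability distribution (element -> probability) per attribute
    k: int       # number of cases the prototype represents

def add(p1: Prototype, p2: Prototype) -> Prototype:
    """P1 [+] P2: case-count-weighted mixture of the marginal distributions."""
    k, c = p1.k, p2.k
    dists = []
    for p, q in zip(p1.dists, p2.dists):
        support = set(p) | set(q)
        dists.append({x: (k * p.get(x, 0.0) + c * q.get(x, 0.0)) / (k + c)
                      for x in support})
    return Prototype(dists, k + c)

def subtract(p1: Prototype, p2: Prototype) -> Optional[Prototype]:
    """P1 [-] P2: the inverse of addition; returns None when undefined."""
    k, c = p1.k, p2.k
    if k <= c:
        return None
    dists = []
    for p, q in zip(p1.dists, p2.dists):
        support = set(p) | set(q)
        r = {x: (k * p.get(x, 0.0) - c * q.get(x, 0.0)) / (k - c) for x in support}
        # the conditions of Definition 1.3 amount to each r_i(x) lying in [0, 1]
        if any(v < -1e-9 or v > 1 + 1e-9 for v in r.values()):
            return None
        dists.append(r)
    return Prototype(dists, k - c)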


Figure 2. Adding prototypes.

DEFINITION 1.4 (Prototype Distance). Let $d_i$ be a distance metric on $\Omega_i$; then the prototype distance measure based on $\{d_i\}_{i=1}^n$ is a function $D: P \times P \to \mathbb{R}$, where $P$ is the class of all prototypes in $\Omega_1 \times \cdots \times \Omega_n$, such that

$$D(P_1, P_2) = \sum_{i=1}^{n} E_{p_i \times q_i}(d_i) = \sum_{i=1}^{n} \sum_{x \in \Omega_i} \sum_{y \in \Omega_i} p_i(x)\, q_i(y)\, d_i(x, y)$$

where $P_1 = \langle p_1, \ldots, p_n, k \rangle$ and $P_2 = \langle q_1, \ldots, q_n, c \rangle$.
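A corresponding sketch of $D$ under the same dictionary representation as above; the ground metrics $d_i$ are passed in as ordinary Python functions, and the names are again illustrative.

def prototype_distance(p1: Prototype, p2: Prototype, metrics) -> float:
    """D(P1, P2): sum over attributes of the expected value of d_i under p_i x q_i."""
    total = 0.0
    for p, q, d in zip(p1.dists, p2.dists, metrics):
        total += sum(p[x] * q[y] * d(x, y) for x in p for y in q)
    return total

# e.g., for binary attributes the Hamming distance is simply
# metrics = [lambda x, y: abs(x - y)] * n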

In the next section we introduce some basic notions from the mass assignment theory of the probability of fuzzy events.

II. THE PROBABILITY OF FUZZY EVENTS

In order to compare an object and a fuzzy description of a prototype, a means of evaluating the conditional degree of match between two fuzzy sets is required. We shall adopt a measure based on the mass assignment theory of fuzzy events [6] corresponding to conditional probability. Initially, we give the definition of a mass assignment of a fuzzy set.

DEFINITION 2.1 (Mass Assignment). Let $f$ be a fuzzy subset of a finite universe $\Omega$ such that the range of the membership function of $f$, $\chi_f$, is $\{y_1, \ldots, y_n\}$ where $y_i > y_{i+1} > 0$. Then the mass assignment of $f$, denoted $m_f$, is a probability distribution on $2^\Omega$ satisfying

$$m_f(F_i) = y_i - y_{i+1} \quad \text{where} \quad F_i = \{x \in \Omega \mid \chi_f(x) \ge y_i\} \text{ for } i = 1, \ldots, n$$

and $y_{n+1} = 0$. $\{F_i\}_{i=1}^n$ are referred to as the focal elements (sets) of $m_f$.
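As a minimal sketch of Definition 2.1, assuming a fuzzy set is represented as a dictionary from elements to membership values in (0, 1]; mass_assignment is an illustrative name, not the paper's.

def mass_assignment(f):
    """Return the mass assignment of f as (focal set, mass) pairs, smallest set first."""
    levels = sorted(set(f.values()), reverse=True)      # y_1 > y_2 > ... > y_n
    pairs = []
    for i, y in enumerate(levels):
        focal = frozenset(x for x, chi in f.items() if chi >= y)
        y_next = levels[i + 1] if i + 1 < len(levels) else 0.0
        pairs.append((focal, y - y_next))
    return pairs

# For 1/1 + 2/0.7 + 3/0.3 this yields {1}:0.3, {1,2}:0.4, {1,2,3}:0.3,
# the mass assignment of Example 2.3 below.
print(mass_assignment({1: 1.0, 2: 0.7, 3: 0.3}))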

The motivation for considering mass assignments [12,13,7] is to provide a semantics for membership functions of fuzzy sets. Essentially, the idea is that a fuzzy (or vague) concept is simply a concept for which the definition is uncertain or variable across, say, a population of voters.§ Each possible definition corresponds to a subset of the universe of discourse, and a probability distribution (mass assignment) across these definitions can then be defined. Given such a distribution, the focal sets are taken to be those with nonzero mass. In fact, for the above definition we make the added assumption that the uncertainty is only regarding the degree of generality or specificity of the definition, so that the focal sets form a nested hierarchy. The membership value of an element is then defined as the sum of the masses for the focal sets containing that element. Given these constraints, there is a unique mass assignment corresponding to any fuzzy set, as given in Definition 2.1. Note that a slightly different perspective on the above is to view the definition of a vague concept as a random set into the power set of the universe and the mass assignment as its distribution [14,15].

For a probability distribution Pr on $\Omega$ we now define the conditional probability of $f$ given $g$ as the expected value of the conditional probability of the focal elements of $m_f$ given the focal elements of $m_g$ relative to Pr, assuming that the joint mass assignment generated by $f$ and $g$ is given by $m_f \times m_g$ [6]. The idea behind this is that since the definitions for $f$ and $g$ are uncertain, there is also uncertainty regarding to which classical conditional probability $\Pr(f \mid g)$ refers. If we then make the assumption that the two definitions come from different and independent sources, then $m_f \times m_g$ gives us a probability distribution across possible conditional probability values. In this case a natural estimate for $\Pr(f \mid g)$ is to take the expected value of this distribution.

DEFINITION 2.2 (Conditional Probability). For Pr a probability distribution on a finite universe $\Omega$, and $f$ and $g$ fuzzy subsets of $\Omega$ such that $g$ is normalized, the conditional probability of $f$ given $g$ is defined by

$$\Pr(f \mid g) = \sum_{F_i} \sum_{G_j} m_f(F_i)\, m_g(G_j)\, \frac{\Pr(F_i \cap G_j)}{\Pr(G_j)}$$

where $m_f, \{F_i\}_i$ and $m_g, \{G_j\}_j$ are the mass assignments and focal elements for $f$ and $g$, respectively.

Now for any normalized fuzzy set $g$ we can clearly define, according to Definition 2.2, a posterior distribution resulting from conditioning on $g$. This is referred to as the least prejudiced distribution of $g$ with respect to the prior Pr. More formally,

$$\forall x \in \Omega \quad lp_g(x) = \Pr(x \mid g) = \Pr(x) \sum_{G_j : x \in G_j} \frac{m_g(G_j)}{\Pr(G_j)}$$

§This is a somewhat non-standard definition and is sometimes referred to as the epistemic view of vagueness (see Williamson [16]).


Indeed it can be shown [6] that the probability of $f$ given $g$ as defined in 2.2 is equivalent to the probability of $f$ relative to the distribution $lp_g$ on $\Omega$ [17]. That is,

$$\Pr(f \mid g) = \sum_{x \in \Omega} \chi_f(x)\, lp_g(x)$$

Example 2.3. Consider a fair six-sided dice, so that the probability distribution on $\{1, 2, 3, 4, 5, 6\}$ is given by $\Pr(1) = \cdots = \Pr(6) = 1/6$. Now suppose we know that the outcome of a throw of the dice is small_value, where small_value = 1/1 + 2/0.7 + 3/0.3. The mass assignment for this fuzzy set is given by $m_{small\_value} = \{1, 2, 3\}{:}0.3,\ \{1, 2\}{:}0.4,\ \{1\}{:}0.3$ and hence the corresponding least prejudiced distribution is

$$lp_{small\_value}(1) = 0.3\,\tfrac{1}{3} + 0.4\,\tfrac{1}{2} + 0.3\,\tfrac{1}{1} = 0.1 + 0.2 + 0.3 = 0.6$$

$$lp_{small\_value}(2) = 0.3\,\tfrac{1}{3} + 0.4\,\tfrac{1}{2} = 0.1 + 0.2 = 0.3$$

$$lp_{small\_value}(3) = 0.3\,\tfrac{1}{3} = 0.1$$

If given this information we were to query the likelihood that the outcome of the throw was in fact about_two = 1/0.5 + 2/1 + 3/0.5, then the required conditional probability is given by

$$\Pr(about\_two \mid small\_value) = \sum_{i=1}^{6} \chi_{about\_two}(i)\, lp_{small\_value}(i) = 0.5(0.6) + 1(0.3) + 0.5(0.1) = 0.65$$
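The numbers above can be reproduced with a short sketch that builds the least prejudiced distribution and applies the identity $\Pr(f \mid g) = \sum_x \chi_f(x)\, lp_g(x)$. It reuses mass_assignment from the earlier sketch and assumes a uniform prior, as in the example.

def least_prejudiced(g, universe):
    """lp_g relative to the uniform prior on `universe`."""
    universe = list(universe)
    prior = {x: 1.0 / len(universe) for x in universe}
    lp = {x: 0.0 for x in universe}
    for focal, mass in mass_assignment(g):
        pr_focal = sum(prior[x] for x in focal)
        for x in focal:
            lp[x] += mass * prior[x] / pr_focal
    return lp

def conditional(f, g, universe):
    """Pr(f | g) as the expectation of chi_f under lp_g."""
    lp = least_prejudiced(g, universe)
    return sum(f.get(x, 0.0) * p for x, p in lp.items())

small_value = {1: 1.0, 2: 0.7, 3: 0.3}
about_two = {1: 0.5, 2: 1.0, 3: 0.5}
print(least_prejudiced(small_value, range(1, 7)))       # ~ {1: 0.6, 2: 0.3, 3: 0.1, ...}
print(conditional(about_two, small_value, range(1, 7))) # ~ 0.65, as computed above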

The notion of least prejudiced distribution provides a mechanism by which we can, in a sense, convert a fuzzy set into a probability distribution. That is, in the absence of any prior knowledge, we might on being told $g$ naturally infer the distribution $lp_g$ relative to the uniform prior. If, however, fuzzy sets are to serve as descriptions of probability distributions, the converse must also hold. In other words, given a probability distribution, we require it to hold that there is a unique fuzzy set conditioning on which yields this distribution.

III. FUZZY DESCRIPTIONS OF PROBABILITY DISTRIBUTIONS

In what sense can fuzzy sets serve as descriptions for probability distributions? Let us respond initially to this question by considering the relationship between crisp sets and probability distributions. For the dice problem in Example 2.3 there is a natural sense in which the probability distribution $\Pr(2) = \Pr(4) = \Pr(6) = 1/3$, $\Pr(1) = \Pr(3) = \Pr(5) = 0$ is characterized by the set $\{2, 4, 6\}$. In the absence of any prior knowledge regarding the distribution over the universe, and assuming a uniform prior (i.e., a fair dice), conditioning on the set of even scores yields the aforementioned distribution. It is clearly also the only crisp set with this property. Furthermore, we can view the label 'even score', the interpretation of which is $\{2, 4, 6\}$, as a conceptual description of this distribution. In general, for a finite universe the probability distribution generated in this manner by a crisp subset $S$ of cardinality $n$ satisfies $\forall x \in S\ \Pr(x) = 1/n$ and $\forall x \notin S\ \Pr(x) = 0$. Suppose, however, we encounter a probability distribution of a more general form but where we also have no prior knowledge. In this case our description must be a fuzzy set. For example, suppose we have the distribution $\Pr(1) = 0.6$, $\Pr(2) = 0.3$, $\Pr(3) = 0.1$; then we know from Example 2.3 that the required fuzzy set description is $1/1 + 2/0.7 + 3/0.3$, since conditioning on the latter fuzzy set yields the former distribution. Indeed it is the only fuzzy set with that property, as the following result shows.

THEOREM 3.1. Let Pr be a probability distribution on a finite universe $\Omega$ taking as its range of values $\{p_1, \ldots, p_n\}$ where $0 \le p_{i+1} < p_i \le 1$. Then Pr is the least prejudiced distribution of a fuzzy set $g$ if and only if $g$ has mass assignment given by $m_g(G_i) = y_i - y_{i+1}$ for $i = 1, \ldots, n$ (with $y_{n+1} = 0$), where $G_i = \{x \in \Omega \mid \Pr(x) \ge p_i\}$ and

$$y_i = |G_i|\, p_i + \sum_{j=i+1}^{n} \left( |G_j| - |G_{j-1}| \right) p_j$$

Proof. See Baldwin, Lawry and Martin [18].

Example 3.2. Consider a game where a blue or a red dice is thrown behind a screen. Only one of the dice is thrown and the player, who is told only the outcome of the throw, must guess whether it is the blue or red dice. It is known that both dice are biased, and the player is provided with a database of the results of previous throws, each labeled with the color of the dice thrown. Suppose that from this database he/she obtains the following distributions:

$$\Pr_{red} = 1{:}0.1,\ 3{:}0.1,\ 4{:}0.3,\ 5{:}0.2,\ 6{:}0.3$$

and

$$\Pr_{blue} = 1{:}0.4,\ 2{:}0.4,\ 4{:}0.2$$

Given this information a fuzzy description of each dice in terms of its score can be determined. For red the fuzzy description red is derived according to Theorem 3.1 as follows:

$$p_1 = \Pr_{red}(4) = \Pr_{red}(6) = 0.3, \quad p_2 = \Pr_{red}(5) = 0.2, \quad p_3 = \Pr_{red}(1) = \Pr_{red}(3) = 0.1$$

and

$$G_1 = \{4, 6\}, \quad G_2 = \{4, 5, 6\}, \quad G_3 = \{1, 3, 4, 5, 6\}$$

Therefore,

$$y_1 = 2p_1 + p_2 + 2p_3 = 1, \quad y_2 = 3p_2 + 2p_3 = 0.8, \quad y_3 = 5p_3 = 0.5$$

so that

$$red = 4/1 + 6/1 + 5/0.8 + 1/0.5 + 3/0.5$$

and similarly we find that

$$blue = 1/1 + 2/1 + 4/0.6$$
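A sketch of the construction in Theorem 3.1, recovering the fuzzy description of a distribution (relative to the uniform prior) as the membership value of each element; the function name is illustrative, and the output can be checked against the fuzzy set red derived above.

def fuzzy_description(pr):
    """Return the unique fuzzy set whose least prejudiced distribution is pr."""
    levels = sorted(set(pr.values()), reverse=True)             # p_1 > ... > p_n
    G = [{x for x, v in pr.items() if v >= p} for p in levels]  # nested focal sets
    size = [len(g) for g in G]
    y = [size[i] * levels[i]
         + sum((size[j] - size[j - 1]) * levels[j] for j in range(i + 1, len(levels)))
         for i in range(len(levels))]
    # chi(x) is y_i for the smallest focal set G_i containing x
    return {x: y[min(i for i, g in enumerate(G) if x in g)] for x in pr}

pr_red = {1: 0.1, 3: 0.1, 4: 0.3, 5: 0.2, 6: 0.3}
print(fuzzy_description(pr_red))
# ~ {1: 0.5, 3: 0.5, 4: 1.0, 5: 0.8, 6: 1.0}, i.e., red = 4/1 + 6/1 + 5/0.8 + 1/0.5 + 3/0.5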

We have only addressed the problem of generating fuzzy descriptions of probability distributions on finite universes, i.e., discrete distributions. However, most real world problems involve continuous attributes, and for these cases some form of partitioning of the relevant universes is required. Fuzzy sets can be used to divide such universes into information granules, a term defined by Zadeh as a group of points drawn together by similarity [19,20], and these can be viewed as corresponding to the meaning of words from natural language. A linguistic variable [21] can then be defined which is associated with the original continuous attribute and takes the words as its values. Fuzzy sets on the words can then be inferred, and rules can be expressed in terms of the linguistic variables. A formal definition of linguistic variable is as follows.

DEFINITION 3.3 (Linguistic Variable). A linguistic variable is a tuple $\langle L, T(L), \Omega, M \rangle$∥ in which $L$ is the name of the variable, $T(L)$ is a finite term set of labels or words (i.e., linguistic values), $\Omega$ is a universe of discourse, and $M$ is a semantic rule.

∥Zadeh originally defined a linguistic variable as a quintuple, by including a syntactic rule according to which new terms (i.e., linguistic values) could be formed by applying quantifiers and hedges to existing words. In this context, however, we shall assume that the term set is predefined and finite.

The semantic rule $M$ is defined as a function which associates a normalized fuzzy set with each word in $T(L)$. In other words, the fuzzy set $M(w)$ can be viewed as encoding the meaning of $w$, so that for $x \in \Omega$ the membership value $\chi_{M(w)}(x)$ quantifies the suitability or applicability of the word $w$ as a label for the value $x$. Given this interpretation we can regard the semantic function $M$ as being determined by a group voting model [7,22,23] across a population of voters as follows: each voter is asked to provide the subset of words from $T(L)$ which are appropriate as labels for the value $x$. The membership value $\chi_{M(w)}(x)$ is then taken to be the proportion of voters who include $w$ in their set of labels.

Example 3.4. Consider the set of words $\{small\ (s),\ medium\ (m),\ large\ (l)\}$ as values of a linguistic variable SIZE labeling elements of $\Omega = [0, 100]$.


Given a set of 10 voters, a possible voting pattern for the value 25 is

voter:  1    2    3    4    5    6   7   8   9   10
labels: s,m  s,m  s,m  s,m  s,m  s   s   s   s   s

This gives $\chi_{M(small)}(25) = 1$ and $\chi_{M(medium)}(25) = 0.5$.

Now this voting pattern can be represented by a mass assignment on the power set of $\{s, m, l\}$, namely $\{s, m\}{:}0.5,\ \{s\}{:}0.5$, which in turn represents a fuzzy set on the set of words, namely $s/1 + m/0.5$. This fuzzy set can be viewed as a linguistic description of the value 25 in terms of the words small, medium, and large, and is denoted by $des_{SIZE}(25)$. Notice that the linguistic description of 25 can be expressed in terms of the semantic function $M$ in the following manner:

$$s/\chi_{M(small)}(25) + m/\chi_{M(medium)}(25)$$

Hence, in practice we need only define the fuzzy sets $M(small)$, $M(medium)$, and $M(large)$, from which we can determine any linguistic description (see Fig. 3). More formally, a linguistic description of a value is defined by:

DEFINITION 3.5 (Linguistic Description). Let $x \in \Omega$; then the linguistic description of $x$ relative to the set of words $T(L)$ is the fuzzy subset of $T(L)$

$$des_L(x) = \sum_{w \in T(L)} w / \chi_{M(w)}(x)$$

In cases where the linguistic variable is fixed we drop the subscript $L$ and write $des(x)$.

Figure 3. Finding linguistic descriptions of values.
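The following sketch shows one way to compute linguistic descriptions from word meanings given in the Fril piecewise linear notation used later in the paper. The breakpoints chosen here for small, medium, and large are assumptions for illustration only, picked so that the memberships at 25 match Example 3.4.

def piecewise(points):
    """Fril-style piecewise linear membership [x1:y1, ..., xn:yn]:
    flat before the first and after the last breakpoint, linear in between."""
    def chi(x):
        if x <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return points[-1][1]
    return chi

# Hypothetical meanings for the SIZE words of Example 3.4
M = {"small":  piecewise([(0, 1), (25, 1), (45, 0)]),
     "medium": piecewise([(15, 0), (35, 1), (65, 1), (85, 0)]),
     "large":  piecewise([(55, 0), (85, 1), (100, 1)])}

def linguistic_description(x):
    """des(x) = sum over words w of w / chi_{M(w)}(x), keeping positive terms."""
    return {w: chi(x) for w, chi in M.items() if chi(x) > 0}

print(linguistic_description(25))   # {'small': 1.0, 'medium': 0.5}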


In Example 3.4 each voter was permitted to select a set of labels for the value 25. Suppose we now insist that they select only one label. Assuming that for each voter all the labels in their original set of labels are equally likely, we obtain a probability distribution on labels. It can easily be seen that this distribution is equivalent to the least prejudiced distribution of the linguistic description of 25 (see Section II):

$$lp_{des(25)} = s{:}0.75,\ m{:}0.25$$

When forced to choose, 75% of the voting population picks small as a label for 25 and 25% pick medium.

Now recall that the least prejudiced distribution is defined only for normalized fuzzy sets, and hence it is desirable that linguistic descriptions as defined above be normalized. This corresponds to the assumption that each voter is required to pick at least one word from the term set as a label for any given value. This will hold if and only if the semantic function generates a linguistic covering, as defined below.

DEFINITION 3.6 (Linguistic Covering). A set of fuzzy sets $\{f_i\}_{i=1}^n$ forms a linguistic covering of the universe $\Omega$ if and only if $\forall x \in \Omega\ \exists i \in \{1, \ldots, n\}\ \chi_{f_i}(x) = 1$.

Having described how fuzzy sets can be used as descriptors for probability distributions, we shall now introduce a prototype induction algorithm based on these ideas.

IV. LEARNING PROTOTYPES FOR CLASSIFICATION

The principal operation underlying mass assignment based prototype induction is that of positive updating. This in turn is based on the notions of prototype addition and prototype distance defined in Section I. Let us initially suppose that the number of prototypes required for a particular class $C$ has been specified as $J$. Clearly the value of $J$ must be determined, and this problem will be addressed in the sequel. In addition, we make the assumption that all training databases have the following form.

DEFINITION 4.1 (Database). A training database is a set of $(n + 1)$-dimensional tuples $DB = \{\langle \nu_{i,1}, \ldots, \nu_{i,n}, class_i \rangle\}_{i=1}^N$ where $\nu_{i,j}$ is a fuzzy subset of the finite universe $\Omega_j$ and $class_i$ is a class label. Test databases have essentially the same form except that tuples do not include a class label.

For problems with continuous attributes the data values can be translated into the above form by defining a linguistic variable with the attribute as a base variable and where a linguistic covering of the attribute space gives the meaning of the linguistic values. Each value of the attribute is then replaced by its corresponding linguistic description (see Definition 3.5) and the attribute universe by the term set of the linguistic variable.

Now if our current set of prototypes for $C$ is $\{P_1, \ldots, P_J\}$ then, given a data tuple with class label $C$, $\langle \nu_1, \ldots, \nu_n, C \rangle$, we update $\{P_1, \ldots, P_J\}$ in the following manner. The data tuple generates a prototype $\langle lp_{\nu_1}, \ldots, lp_{\nu_n}, 1 \rangle$, the distance between which and each of $P_1, \ldots, P_J$ is then evaluated. Let $P^*$ be the closest prototype to $\langle lp_{\nu_1}, \ldots, lp_{\nu_n}, 1 \rangle$, so that it satisfies $D(P^*, \langle lp_{\nu_1}, \ldots, lp_{\nu_n}, 1 \rangle) = \min_{i=1,\ldots,J} D(P_i, \langle lp_{\nu_1}, \ldots, lp_{\nu_n}, 1 \rangle)$; in the case that a number of prototypes satisfy this condition, one is simply picked at random. $P^*$ is then updated by addition to the data prototype, giving a new prototype $P^* \,[+]\, \langle lp_{\nu_1}, \ldots, lp_{\nu_n}, 1 \rangle$, and all other prototypes remain unchanged (see Fig. 4). This process is then repeated for a number of cycles through the training database to infer a final set of prototypes for each class.
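A sketch of this positive updating step, reusing Prototype, add, and prototype_distance from the earlier sketches; lps stands for the tuple of least prejudiced distributions of the data tuple's fuzzy values.

import random

def positive_update(prototypes, lps, metrics):
    """Add the data prototype <lp_1, ..., lp_n, 1> into the nearest class prototype."""
    data = Prototype(list(lps), 1)
    dists = [prototype_distance(p, data, metrics) for p in prototypes]
    nearest = [i for i, d in enumerate(dists) if d == min(dists)]
    i = random.choice(nearest)      # ties are broken at random, as in the text
    prototypes[i] = add(prototypes[i], data)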

Once a set of prototypes for each class has been obtained, these can then be used for classification. Given a data tuple to be classified, the prototype distance measure is used to select the class prototype nearest the data prototype for each class. Fuzzy descriptions of the distributions for these prototypes are then generated according to the methods described in Section III. More specifically, suppose the prototype selected for class $C$ as nearest $\langle \nu_1, \ldots, \nu_n \rangle$ is $\langle p_1, \ldots, p_n, k \rangle$. Fuzzy descriptions of $p_1, \ldots, p_n$ are then derived according to Theorem 3.1 and are denoted by $f_1, \ldots, f_n$. Given the latter, the support for $\langle \nu_1, \ldots, \nu_n \rangle$ being class $C$ is taken to be $\prod_{i=1}^{n} \Pr(f_i \mid \nu_i)$, and the class with the highest support is then selected.
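A sketch of this support computation, combining fuzzy_description (Theorem 3.1) and conditional from the earlier sketches; universes lists the finite universe of each attribute.

def class_support(prototype, nu, universes):
    """Support for the data tuple nu belonging to the prototype's class."""
    support = 1.0
    for p_i, nu_i, omega in zip(prototype.dists, nu, universes):
        f_i = fuzzy_description(p_i)       # fuzzy description of the i-th distribution
        support *= conditional(f_i, nu_i, omega)
    return support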

An alternative view of this classification procedure is that each prototype represents a possible rule description of the class. Essentially, a single rule is then selected for each class, and these are then used for classification. For example, the Fril equivalence rule corresponding to the prototype selected for class C would be

((class is C)
 (A1 is f1) ... (An is fn)) : ((1 1)(0 0))

Figure 4. Positive updating of prototypes.


where $\{A_i\}_{i=1}^n$ is the set of attributes for the problem. If information regarding the data point is added as the Fril facts

((A1 is nu1))
...
((An is nun))

then the built-in Fril uncertainty processing calculus can be used directly to obtain a support for C, and will yield the same result as given previously.

Example 4.2 (The L Problem). Consider the following problem relating to the classification of good and bad L's, first introduced by Baldwin [24]. A mask is randomly dropped onto a grid and moved until some black pixels are encountered (see Fig. 5). Patterns for good L and bad L are given in Figure 6.

We use the notation $\langle A, B, C, D \rangle$, where each element of this vector takes the value 1 if the square is black and 0 if the square is white, and where the mask squares are arranged as

A B
C D

Setting $J = 3$ for both classes and taking the initial prototypes to be tuples of the uniform distribution on the four binary spaces, we ran the positive updating algorithm for 25 cycles through the training data. The prototype distance metric was defined in terms of the Hamming distance (see Kohonen [2] for a description) on each space, and the following prototypes were inferred:

Class: Good

⟨(0:2/3, 1:1/3), (0:0, 1:1), (0:2/3, 1:1/3), (0:1, 1:0)⟩
⟨(0:0, 1:1), (0:1, 1:0), (0:0, 1:1), (0:0, 1:1)⟩
⟨(0:3/4, 1:1/4), (0:1, 1:0), (0:3/4, 1:1/4), (0:1/2, 1:1/2)⟩

Class: Bad

⟨(0:0, 1:1), (0:2/3, 1:1/3), (0:1, 1:0), (0:2/3, 1:1/3)⟩
⟨(0:1, 1:0), (0:0, 1:1), (0:0, 1:1), (0:0, 1:1)⟩
⟨(0:1, 1:0), (0:3/4, 1:1/4), (0:1/2, 1:1/2), (0:3/4, 1:1/4)⟩


Figure 5. The L classification problem.

Figure 6. Patterns for good L and bad L.


These are represented pictorially in Figure 7, where the level on the grey scale for each square is proportional to the probability (i.e., black means 1 has probability 1 and white means 1 has probability 0). Each of these prototypes has a clear probabilistic interpretation, since for all of them one or more of the attributes has binary probability values. In this case the probability distributions of the remaining attributes correspond to the conditional distribution for that class of the attribute given the fixed values of the attributes with binary distributions. For example, for the first good L prototype the probability distributions for A and C correspond to $\Pr(A \mid B = 1, D = 0, good\ L)$ and $\Pr(C \mid B = 1, D = 0, good\ L)$.

Using the methods described above we can now find fuzzy descriptions of these prototypes, which are expressed in terms of Fril rules by

Good L Rules

((L is good)
 (A is 0/1 + 1/(2/3)) (B is 1/1) (C is 0/1 + 1/(2/3)) (D is 0/1)) : ((1 1)(0 0))

((L is good)
 (A is 1/1) (B is 0/1) (C is 1/1) (D is 1/1)) : ((1 1)(0 0))

((L is good)
 (A is 0/1 + 1/(1/2)) (B is 0/1) (C is 0/1 + 1/(1/2)) (D is 1/1 + 0/1)) : ((1 1)(0 0))

Figure 7. Pictorial representations of the prototypes for good and bad L.


Bad L Rules

((L is bad)
 (A is 1/1) (B is 0/1 + 1/(2/3)) (C is 0/1) (D is 0/1 + 1/(2/3))) : ((1 1)(0 0))

((L is bad)
 (A is 0/1) (B is 1/1) (C is 1/1) (D is 1/1)) : ((1 1)(0 0))

((L is bad)
 (A is 0/1) (B is 0/1 + 1/(1/2)) (C is 0/1 + 1/1) (D is 0/1 + 1/(1/2))) : ((1 1)(0 0))

Now consider the possible test data set for this problem, given in Figure 8. Clearly these patterns must result from some error in transmission, but without a model for how these errors can occur we cannot provide any generalization to the unclassified cases.

For generalization purposes we will assume that only one transmission error can occur for any mask return and that this error is equally likely for all squares. We can now compare the results obtained using the prototypes to those predicted by the error model. In fact, as is shown in Table I, the prototypes give exactly the results predicted by the model.

In the above, we have assumed that the number of prototypes $J$ for each class has been prespecified. However, without considerable background knowledge of the classification problem this cannot be known a priori. Clearly then some method is required by which both the number and form of the class prototypes can be inferred directly from the data. To achieve this we propose to extend the prototype updating method by allowing a number of new operations to be performed. These operations will facilitate the formation of new prototypes and the merging of current ones. Both operations are defined in terms of the prototype distance measure as follows.

Figure 8. Unclassified cases.


Table I. Performance on test examples.

Test Pattern   Normalized Supports     Predicted Class   Class Predicted by Bayesian Analysis Assuming Error Model
⟨1, 0, 1, 0⟩   G:0.742, B:0.258        G                 G
⟨0, 1, 0, 1⟩   G:0.258, B:0.742        B                 B
⟨1, 1, 0, 1⟩   G:0.0008, B:0.9992      B                 B
⟨1, 1, 1, 0⟩   G:0.9992, B:0.0008      G                 G
⟨1, 1, 1, 1⟩   G:0.5, B:0.5            U                 U
⟨0, 0, 1, 1⟩   G:0.5, B:0.5            U                 U

Formation of new prototypes. For a data object $\langle \nu_1, \ldots, \nu_n, C \rangle$ the current set of prototypes $\{P_1, \ldots, P_J\}$ for $C$ is updated to $\{P_1, \ldots, P_J, \langle lp_{\vec{\nu}}, 1 \rangle\}$ if, for $P^*$ the closest prototype to $\langle lp_{\vec{\nu}}, 1 \rangle$, it holds that

$$D(P^*, \langle lp_{\vec{\nu}}, 1 \rangle) > \lambda(cycle) + \max\left( D(P^*, P^*),\; D(\langle lp_{\vec{\nu}}, 1 \rangle, \langle lp_{\vec{\nu}}, 1 \rangle) \right)$$
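A sketch of this formation test; lam stands for the cycle-dependent threshold $\lambda(cycle)$, and the max term compensates for the nonzero self-distance (spread) of a prototype.

def should_form_new(prototypes, data, metrics, lam):
    """True when the data prototype is too far from every existing prototype."""
    p_star = min(prototypes, key=lambda p: prototype_distance(p, data, metrics))
    spread = max(prototype_distance(p_star, p_star, metrics),
                 prototype_distance(data, data, metrics))
    return prototype_distance(p_star, data, metrics) > lam + spread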

Merging prototypes. For a set of class prototypes $\{P_1, \ldots, P_J\}$, an algorithm for the merging operation is:

(1) Let $T = \emptyset$ and $S = \{P_1, \ldots, P_J\}$, where the ordering is arbitrary.
(2) Let
$$M = \left\{ P_j \in S \;\middle|\; D(P_1, P_j) < \gamma(cycle) + \max\left( D(P_1, P_1), D(P_j, P_j) \right) \right\}$$
where $P_1$ is simply the first prototype in $S$.
(3) Merge the prototypes in $M$ to form $[+]_{j \in M} P_j$ and let
$$T = T \cup \left\{ [+]_{j \in M} P_j \right\}$$
(4) Let $S = S - M$.
(5) If $S = \emptyset$ then stop and take $T$ to be the new set of class prototypes; else go to (2). A sketch of this pass is given below.
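A sketch of the merging pass just described, with gamma standing for $\gamma(cycle)$; it reuses add and prototype_distance from the earlier sketches.

def merge_pass(prototypes, metrics, gamma):
    """One merging pass: repeatedly merge everything close to the head of S."""
    T, S = [], list(prototypes)
    while S:
        head = S[0]
        M = [p for p in S
             if prototype_distance(head, p, metrics)
                < gamma + max(prototype_distance(head, head, metrics),
                              prototype_distance(p, p, metrics))]
        merged = M[0]
        for p in M[1:]:
            merged = add(merged, p)       # [+] over the whole cluster
        T.append(merged)
        S = [p for p in S if all(p is not q for q in M)]
    return T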

Here $\lambda(cycle)$ and $\gamma(cycle)$ are updating parameters dependent on the number of the current cycle through the data. For the examples given in the next section these were defined as linear functions of cycle, $\lambda(cycle) = cycle \cdot \lambda$ and $\gamma(cycle) = cycle \cdot \gamma$, where $\lambda$ and $\gamma$ were taken to be percentages of the maximum possible distance between two points in attribute space. In addition, notice that for the definition of prototype distance given in Section I the distance between a prototype and itself is not zero. Indeed this value provides a measure of the overall spread of the distributions. For instance, if $d_i(x, y) = (x - y)^2$ then $D(P, P) \propto \sum_{i=1}^{n} var_i$, where $var_i$ is the variance of the distribution $p_i$. Clearly this must be taken into account when defining the formation and merging operators, and hence the additional terms on the right hand side of the two threshold equations. For example, the presence of the term $\max(D(P_i, P_i), D(P_j, P_j))$ as part of the threshold for merging ensures that two identical prototypes will be merged.

It should be noted that none of the operators defined up to now take account of the predictive power of the sets of prototypes. In other words, it is desirable that prototypes responsible for a misclassification should also be updated, but in a negative manner. We introduce the following form of negative updating, based on the notion of prototype subtraction.

Negative updating. Suppose that, given current sets of prototypes, a data tuple $\vec{\nu}$ is misclassified as class $C$. Let $P^*$ be the $C$ prototype closest to $\langle lp_{\vec{\nu}}, 1 \rangle$; then $P^*$ is updated to $P^* \,[-]\, \langle lp_{\vec{\nu}}, 1 \rangle$ provided this prototype exists, and is left unchanged otherwise. All other $C$ prototypes are left unchanged. Normally, negative updating is not carried out for the first cycle through the data, as this can distort the formation of the prototypes.
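A sketch of negative updating via prototype subtraction, reusing the earlier definitions; when $P^* \,[-]\, \langle lp_{\vec{\nu}}, 1 \rangle$ is undefined the prototype is left unchanged.

def negative_update(prototypes, lps, metrics):
    """Subtract the data prototype from the nearest prototype of the wrongly predicted class."""
    data = Prototype(list(lps), 1)
    i = min(range(len(prototypes)),
            key=lambda j: prototype_distance(prototypes[j], data, metrics))
    reduced = subtract(prototypes[i], data)
    if reduced is not None:
        prototypes[i] = reduced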

The operations introduced in this section can now be combined to give an induction algorithm for prototypes as follows:

(1) Randomize the data.
(2) Select the first data tuple encountered for each class as the initial prototype for that class.
(3) For each data tuple $\langle \vec{\nu}, C \rangle$ perform positive updating of the $C$ prototypes if the prototype formation criterion is not met. Otherwise append $\langle lp_{\vec{\nu}}, 1 \rangle$ to the list of $C$ prototypes.
(4) Perform the merging operation on the $C$ prototypes.
(5) If the current value of cycle is bigger than 2, obtain a predicted classification class. If this class is incorrect then perform negative updating.
(6) If the cycle stopping criterion is met then stop; else go to (3).

In the next section we introduce a number of model and real world classification problems and present the results obtained by applying the prototype induction algorithm to them.

V. THE APPLICATION OF PROTOTYPE INDUCTION TO MACHINE LEARNING

We shall now attempt to illustrate the potential of the mass assignment based prototype induction algorithm when applied to a number of problems from the fields of machine learning and data mining. All examples include continuous attributes, so it is necessary to define linguistic coverings of the universes. In these cases the algorithm generates trapezoidal fuzzy sets with a variable degree of overlap to correspond to the meanings of linguistic values. The fuzzy sets are positioned using a percentile based approach, so that the corresponding crisp partitions contain equal numbers of data values. The first two problems are model classification problems designed to demonstrate the algorithm's potential for avoiding decomposition errors and for modeling significantly non-linear decision boundaries. The third problem relates to a project undertaken at the Central Research Establishment of the Home Office Forensic Science Service, UK, concerning the classification of glass fragments found at crime scenes [25].

Example 5.1 (The Rotated Ellipse Problem). The problem consists of classifying points in $[-1.5, 1.5]^2$ as legal if they lie within the ellipse $y^2 + 2x^2 = 1$ and illegal otherwise, given a database of tuples $\langle X', Y', CLASS \rangle$ where $X'$ and $Y'$ are attributes representing the rotated coordinates $X' = \frac{1}{\sqrt{2}}(X + Y)$ and $Y' = \frac{1}{\sqrt{2}}(Y - X)$. The database was generated from a regular grid of 529 points from $[-1.5, 1.5]^2$, for each of which the rotated coordinate values were calculated and the point labeled with its classification class.

The attribute universes were partitioned by seven trapezoidal fuzzy sets with 40% overlap. These corresponded to the meanings of the linguistic values of two linguistic variables, $\langle L_{X'}, \{w_1^{X'}, \ldots, w_7^{X'}\}, [-2.2, 2.2], M_{X'} \rangle$ and $\langle L_{Y'}, \{w_1^{Y'}, \ldots, w_7^{Y'}\}, [-2.2, 2.2], M_{Y'} \rangle$, giving the labels of $X'$ and $Y'$, respectively. The semantic functions $M_{X'}$ and $M_{Y'}$ are then defined as follows:¶

$M_{X'}(w_1^{X'}) = M_{Y'}(w_1^{Y'}) = [-1.57{:}1, -1.32{:}0]$
$M_{X'}(w_2^{X'}) = M_{Y'}(w_2^{Y'}) = [-1.82{:}0, -1.57{:}1, -0.94{:}1, -0.69{:}0]$
$M_{X'}(w_3^{X'}) = M_{Y'}(w_3^{Y'}) = [-1.19{:}0, -0.94{:}1, -0.31{:}1, -0.06{:}0]$
$M_{X'}(w_4^{X'}) = M_{Y'}(w_4^{Y'}) = [-0.57{:}0, -0.31{:}1, 0.31{:}1, 0.57{:}0]$
$M_{X'}(w_5^{X'}) = M_{Y'}(w_5^{Y'}) = [0.06{:}0, 0.31{:}1, 0.94{:}1, 1.19{:}0]$
$M_{X'}(w_6^{X'}) = M_{Y'}(w_6^{Y'}) = [0.69{:}0, 0.94{:}1, 1.57{:}1, 1.82{:}0]$
$M_{X'}(w_7^{X'}) = M_{Y'}(w_7^{Y'}) = [1.32{:}0, 1.57{:}1]$

¶Note that here we are using the Fril notation for piecewise linear functions [10], where $[x_1{:}y_1, \ldots, x_n{:}y_n]$ denotes a function $F$ such that for $i = 1, \ldots, n - 1$ and $x \in [x_i, x_{i+1}]$,
$$F(x) = \frac{(y_{i+1} - y_i)\,x + x_{i+1} y_i - y_{i+1} x_i}{x_{i+1} - x_i}$$

The prototype induction algorithm was then run on this problem with learning parameters of $\lambda = 6\%$ of the maximum distance for prototype formation and $\gamma = 2\%$ for prototype merging. A total of four illegal and two legal prototypes were inferred after three cycles through the data. Fuzzy descriptions of these prototypes take the form of fuzzy sets on words. For example, a rule corresponding to one of the legal prototypes is given by:


((Class is legal)
 (L_X' is $w_3^{X'}/0.32 + w_4^{X'}/1 + w_5^{X'}/0.97 + w_6^{X'}/0.04$)
 (L_Y' is $w_3^{Y'}/0.07 + w_4^{Y'}/0.95 + w_5^{Y'}/1 + w_6^{Y'}/0.04$)) : ((1 1)(0 0))

Knowledge relating to a data point $\langle a, b \rangle$ closest to this legal prototype can then be represented by the facts:

((L_X' is $des_{L_{X'}}(a)$))
((L_Y' is $des_{L_{Y'}}(b)$))

The support for $\langle a, b \rangle$ being legal is then given by the product of the conditional probability of the fuzzy prototype description for $L_{X'}$ given $des_{L_{X'}}(a)$ and the conditional probability of the fuzzy prototype description for $L_{Y'}$ given $des_{L_{Y'}}(b)$, as described in Section IV. Using these prototypes for classification we obtained an accuracy of 96.6% on the training set and 96.7% on a test set of 1681 points forming a regular grid of $[-1.5, 1.5]^2$. Full details of the prototypes are given in the Appendix, and a scatter plot of the points classified as true is shown in Figure 9.

Example 5.2 (The Figure Eight Problem). In this problem a figure eight shape was generated according to the parametric equations $x = 2^{-0.5}(\sin 2t - \sin t)$, $y = 2^{-0.5}(\sin 2t + \sin t)$ where $t \in [0, 2\pi]$. Points in $[-1.6, 1.6]^2$ are classified as legal if they lie within the figure (see Figure 10) and illegal if they lie outside.

Figure 9. Points from the training set classified as true.


Figure 10. A figure eight classification problem.

The database consisted of 961 points $\langle X, Y, CLASS \rangle$ from a regular grid on $[-1.6, 1.6]^2$.

The prototype induction algorithm was then run for three cycles through the data with learning parameters set to $\lambda = 6\%$ for prototype formation and $\gamma = 2\%$ for merging. For the legal class two prototypes were learnt (see Figure 11) and six for illegal. Details of all the prototypes for this problem are given in the Appendix. An accuracy of 94.1% on the training data and 93.5% on a test database of 2116 points forming a regular grid on $[-1.6, 1.6]^2$ was obtained.

The decision surface generated is illustrated in terms of a scatter plot of the points classified as legal according to the prototypes, shown in Figure 12.

Example 5.3 (Glass Identification). This database was taken from the UCI machine learning repository [27] and originates from a project carried out by the British Home Office Forensic Science Service Central Research Establishment on the identification of glass fragments found at crime scenes [25]. This is motivated by the fact that in a criminological investigation, the glass found at the scene of the crime can only be used as evidence if it is correctly identified. Glass fragments are divided into seven possible classes, although the database only contains examples of six (there are no instances of class 4). These are:

1. building windows (float processed);
2. building windows (nonfloat processed);
3. vehicle windows (float processed);
4. vehicle windows (nonfloat processed);
5. containers;
6. tableware;
7. headlamps.

Figure 11. Fuzzy descriptions of the legal prototypes for the figure eight problem.

The classification is to be made on the basis of the following nine attributes relating to certain chemical properties of the glass:

1. RI: refractive index;
2. Na: sodium (unit of measurement: weight percent of corresponding oxide, as are attributes 3-9);
3. Mg: magnesium;
4. Al: aluminum;
5. Si: silicon;
6. K: potassium;
7. Ca: calcium;
8. Ba: barium;
9. Fe: iron.

The database, consisting of 214 instances, was split into a training and test set of 107 instances each in such a way that the instances of each class were divided equally between the two sets. Learning parameters were set at $\lambda = 20\%$ (formation) and $\gamma = 2\%$ (merging), and the induction algorithm was run for three cycles through the data. A total of two prototypes were learnt for classes 1, 2, 3, and 5, one prototype was learnt for class 6, and four prototypes for class 7. Fuzzy descriptions of the prototypes for class 1 are given in Figures 14 and 15.


Figure 12. Points classified as legal according to the prototypes.

Figure 13. Linguistic values for Na.

The accuracy obtained using the prototypes for classification was 85% on the training set and 71% on the test set. This compares favorably with other learning algorithms. For instance, mass assignment ID3 [26] gave an accuracy of 68% on the test set, and a neural network with topology 9-6-6 gave 72% on a smaller test set where the network was trained on 50% of the data, validated on 25%, and tested on 25%.

Attributes 1-7 were used to construct the prototypes, and each universe was partitioned into five trapezoidal fuzzy sets, according to the percentile method, except for attribute 3, where the granularity was taken to be 4. Attributes 8 and 9 did not exhibit sufficient variation in their values to be partitioned effectively. For example, the linguistic covering for Na (attribute 2) is shown in Figure 13.


Figure 14. Prototype 1, class 1.


Figure 15. Prototype 2, class 1.


VI. CONCLUSIONS

An induction algorithm for learning prototypes from data has been described, where the prototypes take the form of tuples of probability distributions on attribute universes. Fuzzy descriptions of prototypes can be derived, according to mass assignment theory, which can then be used to perform the inferences necessary for classification. Rule representations of prototypes can also be given in terms of Fril equivalence rules, and this provides for relatively transparent models. Finally, the algorithm's potential has been demonstrated by its application to a number of model and real world machine learning problems, for which good results were obtained.

Thanks to Jimi Shanahan for conducting the neural network experiments on the glass database.

APPENDIX

Prototypes Relating to the Rotated Ellipse Problem Described in Example 5.1

Legal prototypes

⟨($w_1^{X'}{:}0, w_2^{X'}{:}0.012, w_3^{X'}{:}0.466, w_4^{X'}{:}0.425, w_5^{X'}{:}0.097, w_6^{X'}{:}0, w_7^{X'}{:}0$), ($w_1^{Y'}{:}0, w_2^{Y'}{:}0.015, w_3^{Y'}{:}0.56, w_4^{Y'}{:}0.412, w_5^{Y'}{:}0.01, w_6^{Y'}{:}0, w_7^{Y'}{:}0$)⟩

⟨($w_1^{X'}{:}0, w_2^{X'}{:}0, w_3^{X'}{:}0.103, w_4^{X'}{:}0.457, w_5^{X'}{:}0.43, w_6^{X'}{:}0.01, w_7^{X'}{:}0$), ($w_1^{Y'}{:}0, w_2^{Y'}{:}0, w_3^{Y'}{:}0.02, w_4^{Y'}{:}0.462, w_5^{Y'}{:}0.51, w_6^{Y'}{:}0.01, w_7^{Y'}{:}0$)⟩

Corresponding fuzzy descriptions are given in the following Fril rules:

((legal)
 (L_X' is $w_2^{X'}/0.04 + w_3^{X'}/1 + w_4^{X'}/0.96 + w_5^{X'}/0.304$)
 (L_Y' is $w_2^{Y'}/0.06 + w_3^{Y'}/1 + w_4^{Y'}/0.85 + w_5^{Y'}/0.04$)) : ((1 1)(0 0))

((legal)
 (L_X' is $w_3^{X'}/0.32 + w_4^{X'}/1 + w_5^{X'}/0.97 + w_6^{X'}/0.04$)
 (L_Y' is $w_3^{Y'}/0.07 + w_4^{Y'}/0.95 + w_5^{Y'}/1 + w_6^{Y'}/0.04$)) : ((1 1)(0 0))

Illegal prototypes

⟨($w_1^{X'}{:}0.0003, w_2^{X'}{:}0.034, w_3^{X'}{:}0.235, w_4^{X'}{:}0.402, w_5^{X'}{:}0.3, w_6^{X'}{:}0.03, w_7^{X'}{:}0$), ($w_1^{Y'}{:}0.223, w_2^{Y'}{:}0.55, w_3^{Y'}{:}0.218, w_4^{Y'}{:}0.008, w_5^{Y'}{:}0.0012, w_6^{Y'}{:}0, w_7^{Y'}{:}0$)⟩

⟨($w_1^{X'}{:}0.234, w_2^{X'}{:}0.582, w_3^{X'}{:}0.184, w_4^{X'}{:}0, w_5^{X'}{:}0, w_6^{X'}{:}0, w_7^{X'}{:}0$), ($w_1^{Y'}{:}0, w_2^{Y'}{:}0.042, w_3^{Y'}{:}0.294, w_4^{Y'}{:}0.442, w_5^{Y'}{:}0.216, w_6^{Y'}{:}0.005, w_7^{Y'}{:}0$)⟩


⟨($w_1^{X'}{:}0, w_2^{X'}{:}0, w_3^{X'}{:}0.0003, w_4^{X'}{:}0.0072, w_5^{X'}{:}0.241, w_6^{X'}{:}0.54, w_7^{X'}{:}0.211$), ($w_1^{Y'}{:}0, w_2^{Y'}{:}0.019, w_3^{Y'}{:}0.243, w_4^{Y'}{:}0.381, w_5^{Y'}{:}0.26, w_6^{Y'}{:}0.097, w_7^{Y'}{:}0.0005$)⟩

⟨($w_1^{X'}{:}0.038, w_2^{X'}{:}0.113, w_3^{X'}{:}0.32, w_4^{X'}{:}0.38, w_5^{X'}{:}0.15, w_6^{X'}{:}0.003, w_7^{X'}{:}0$), ($w_1^{Y'}{:}0, w_2^{Y'}{:}0, w_3^{Y'}{:}0.0079, w_4^{Y'}{:}0.047, w_5^{Y'}{:}0.26, w_6^{Y'}{:}0.48, w_7^{Y'}{:}0.212$)⟩

Corresponding fuzzy descriptions are given in the following Fril rules:

((illegal)
 (L_X' is $w_1^{X'}/0.002 + w_2^{X'}/0.165 + w_3^{X'}/0.768 + w_4^{X'}/1 + w_5^{X'}/0.9 + w_6^{X'}/0.15$)
 (L_Y' is $w_1^{Y'}/0.67 + w_2^{Y'}/1 + w_3^{Y'}/0.66 + w_4^{Y'}/0.035 + w_5^{Y'}/0.006$)) : ((1 1)(0 0))

((illegal)
 (L_X' is $w_1^{X'}/0.65 + w_2^{X'}/1 + w_3^{X'}/0.55$)
 (L_Y' is $w_2^{Y'}/0.175 + w_3^{Y'}/0.85 + w_4^{Y'}/1 + w_5^{Y'}/0.7 + w_6^{Y'}/0.025$)) : ((1 1)(0 0))

((illegal)
 (L_X' is $w_3^{X'}/0.0015 + w_4^{X'}/0.029 + w_5^{X'}/0.7 + w_6^{X'}/1 + w_7^{X'}/0.64$)
 (L_Y' is $w_2^{Y'}/0.0937 + w_3^{Y'}/0.85 + w_4^{Y'}/1 + w_5^{Y'}/0.88 + w_6^{Y'}/0.41 + w_7^{Y'}/0.003$)) : ((1 1)(0 0))

((illegal)
 (L_X' is $w_1^{X'}/0.19 + w_2^{X'}/0.49 + w_3^{X'}/0.94 + w_4^{X'}/1 + w_5^{X'}/0.6 + w_6^{X'}/0.018$)
 (L_Y' is $w_3^{Y'}/0.039 + w_4^{Y'}/0.2 + w_5^{Y'}/0.78 + w_6^{Y'}/1 + w_7^{Y'}/0.7$)) : ((1 1)(0 0))

Prototypes Relating to the Figure Eight Problem Described in Example 5.2

Semantic function:

$M_X(w_1) = M_Y(w_1) = [-1.6{:}1, -1.14{:}1, -0.96{:}0]$
$M_X(w_2) = M_Y(w_2) = [-1.33{:}0, -1.14{:}1, -0.69{:}1, -0.5{:}0]$
$M_X(w_3) = M_Y(w_3) = [-0.87{:}0, -0.69{:}1, -0.23{:}1, -0.05{:}0]$
$M_X(w_4) = M_Y(w_4) = [-0.41{:}0, -0.23{:}1, 0.23{:}1, 0.41{:}0]$
$M_X(w_5) = M_Y(w_5) = [0.046{:}0, 0.23{:}1, 0.69{:}1, 0.87{:}0]$
$M_X(w_6) = M_Y(w_6) = [0.5{:}0, 0.69{:}1, 1.14{:}1, 1.33{:}0]$
$M_X(w_7) = M_Y(w_7) = [0.96{:}0, 1.14{:}1, 1.6{:}1]$

Legal prototypes

⟨($w_1^X{:}0.028, w_2^X{:}0.211, w_3^X{:}0.38, w_4^X{:}0.36, w_5^X{:}0.029, w_6^X{:}0, w_7^X{:}0$), ($w_1^Y{:}0, w_2^Y{:}0, w_3^Y{:}0.019, w_4^Y{:}0.35, w_5^Y{:}0.36, w_6^Y{:}0.24, w_7^Y{:}0.035$)⟩

⟨($w_1^X{:}0, w_2^X{:}0, w_3^X{:}0.034, w_4^X{:}0.35, w_5^X{:}0.37, w_6^X{:}0.23, w_7^X{:}0.022$), ($w_1^Y{:}0.024, w_2^Y{:}0.23, w_3^Y{:}0.37, w_4^Y{:}0.34, w_5^Y{:}0.031, w_6^Y{:}0, w_7^Y{:}0$)⟩

Corresponding fuzzy descriptions are given in the following Fril rules:

((legal)
 (L_X is $w_1^X/0.14 + w_2^X/0.69 + w_3^X/1 + w_4^X/0.98 + w_5^X/0.14$)
 (L_Y is $w_3^Y/0.095 + w_4^Y/0.99 + w_5^Y/1 + w_6^Y/0.76 + w_7^Y/0.16$)) : ((1 1)(0 0))

((legal)
 (L_X is $w_3^X/0.16 + w_4^X/0.98 + w_5^X/1 + w_6^X/0.76 + w_7^X/0.11$)
 (L_Y is $w_1^Y/0.12 + w_2^Y/0.75 + w_3^Y/1 + w_4^Y/0.98 + w_5^Y/0.15$)) : ((1 1)(0 0))

Illegal prototypes

⟨($w_1^X{:}0, w_2^X{:}0.005, w_3^X{:}0.33, w_4^X{:}0.32, w_5^X{:}0.33, w_6^X{:}0.013, w_7^X{:}0$), ($w_1^Y{:}0, w_2^Y{:}0.018, w_3^Y{:}0.34, w_4^Y{:}0.31, w_5^Y{:}0.32, w_6^Y{:}0.018, w_7^Y{:}0$)⟩

⟨($w_1^X{:}0.81, w_2^X{:}0.17, w_3^X{:}0.019, w_4^X{:}0, w_5^X{:}0, w_6^X{:}0, w_7^X{:}0$), ($w_1^Y{:}0, w_2^Y{:}0.025, w_3^Y{:}0.35, w_4^Y{:}0.27, w_5^Y{:}0.33, w_6^Y{:}0.033, w_7^Y{:}0$)⟩

⟨($w_1^X{:}0.31, w_2^X{:}0.33, w_3^X{:}0.29, w_4^X{:}0.074, w_5^X{:}0.0015, w_6^X{:}0, w_7^X{:}0$), ($w_1^Y{:}0.484, w_2^Y{:}0.35, w_3^Y{:}0.16, w_4^Y{:}0.008, w_5^Y{:}0, w_6^Y{:}0, w_7^Y{:}0$)⟩

⟨($w_1^X{:}0, w_2^X{:}0, w_3^X{:}0, w_4^X{:}0.035, w_5^X{:}0.25, w_6^X{:}0.31, w_7^X{:}0.41$), ($w_1^Y{:}0, w_2^Y{:}0, w_3^Y{:}0, w_4^Y{:}0.06, w_5^Y{:}0.25, w_6^Y{:}0.31, w_7^Y{:}0.38$)⟩

⟨($w_1^X{:}0.36, w_2^X{:}0.33, w_3^X{:}0.22, w_4^X{:}0.09, w_5^X{:}0.001, w_6^X{:}0, w_7^X{:}0$), ($w_1^Y{:}0, w_2^Y{:}0, w_3^Y{:}0, w_4^Y{:}0.002, w_5^Y{:}0.07, w_6^Y{:}0.35, w_7^Y{:}0.58$)⟩

⟨($w_1^X{:}0, w_2^X{:}0, w_3^X{:}0, w_4^X{:}0.04, w_5^X{:}0.19, w_6^X{:}0.31, w_7^X{:}0.46$), ($w_1^Y{:}0.45, w_2^Y{:}0.31, w_3^Y{:}0.19, w_4^Y{:}0.05, w_5^Y{:}0, w_6^Y{:}0, w_7^Y{:}0$)⟩

Corresponding fuzzy descriptions are given in the following Fril rules:

((illegal)
 (L_X is $w_2^X/0.025 + w_3^X/1 + w_4^X/0.96 + w_5^X/0.99 + w_6^X/0.056$)
 (L_Y is $w_2^Y/0.09 + w_3^Y/1 + w_4^Y/0.95 + w_5^Y/0.98 + w_6^Y/0.09$)) : ((1 1)(0 0))

((illegal)
 (L_X is $w_1^X/1 + w_2^X/0.37 + w_3^X/0.056$)
 (L_Y is $w_2^Y/0.13 + w_3^Y/1 + w_4^Y/0.87 + w_5^Y/0.98 + w_6^Y/0.16$)) : ((1 1)(0 0))

((illegal)
 (L_X is $w_1^X/0.98 + w_2^X/1 + w_3^X/0.93 + w_4^X/0.3 + w_5^X/0.007$)
 (L_Y is $w_1^Y/1 + w_2^Y/0.86 + w_3^Y/0.485 + w_4^Y/0.03$)) : ((1 1)(0 0))

((illegal)
 (L_X is $w_4^X/0.14 + w_5^X/0.79 + w_6^X/0.9 + w_7^X/1$)
 (L_Y is $w_4^Y/0.25 + w_5^Y/0.81 + w_6^Y/0.93 + w_7^Y/1$)) : ((1 1)(0 0))

((illegal)
 (L_X is $w_1^X/1 + w_2^X/0.97 + w_3^X/0.76 + w_4^X/0.39 + w_5^X/0.006$)
 (L_Y is $w_4^Y/0.008 + w_5^Y/0.2 + w_6^Y/0.77 + w_7^Y/1$)) : ((1 1)(0 0))

((illegal)
 (L_X is $w_4^X/0.17 + w_5^X/0.6 + w_6^X/0.85 + w_7^X/1$)
 (L_Y is $w_1^Y/1 + w_2^Y/0.86 + w_3^Y/0.61 + w_4^Y/0.2$)) : ((1 1)(0 0))

References

1. Duda, R.; Hart, P. Pattern Classification and Scene Analysis; Wiley: New York, 1973.
2. Kohonen, T. Self-Organisation and Associative Memory, 3rd edition; Springer-Verlag: Berlin, 1989.
3. Baldwin, J. F. In Fuzzy Logic in Artificial Intelligence; Ralescu, A., Ed.; Lecture Notes in Artificial Intelligence Vol. 847, 1993; pp. 10-23.
4. Bezdek, J. C. Pattern Recognition with Fuzzy Objective Function Algorithms; Plenum Press: New York, 1981.
5. Ruspini, E. Inf Control 1969, 15, 22-32.
6. Baldwin, J. F.; Lawry, J.; Martin, T. P. Fuzzy Sets Syst 1996, 83, 353-367.
7. Baldwin, J. F.; Martin, T. P.; Pilsworth, B. W. FRIL: Fuzzy and Evidential Reasoning in A.I.; Research Studies Press: Taunton, 1995.
8. Baldwin, J. F.; Martin, T. P. In Fuzzy Logic; Baldwin, J. F., Ed.; Wiley: New York, 1996.
9. Baldwin, J. F.; Martin, T. P.; Shanahan, J. G. Proceedings of the Fuzzy Logic in Artificial Intelligence Workshop, IJCAI-97, Nagoya, Japan, 1997; pp. 1-12.
10. Baldwin, J. F.; Martin, T. P.; Pilsworth, B. W. FRIL Manual (Version 4.0); FRIL Systems Ltd., Bristol Business Centre, Maggs House, Queens Road, Bristol BS8 1QX, UK, 1988.
11. Baldwin, J. F. In Logic Programming and Soft Computing; Martin, T. P.; Arcelli Fontana, F., Eds.; Research Studies Press: Taunton, 1998; pp. 19-51.
12. Baldwin, J. F. In Symbolic and Quantitative Approaches to Uncertainty; Kruse, R.; Siegel, P., Eds.; Lecture Notes in Computer Science 548; Springer-Verlag: Berlin, 1991; pp. 107-115.
13. Baldwin, J. F. In The Encyclopedia of AI; Shapiro, S. A., Ed.; Wiley: New York, 1992; pp. 528-537.
14. Goodman, I. R.; Nguyen, H. T. Uncertainty Models of Knowledge Based Systems; North-Holland: Amsterdam, 1985.
15. Kreinovich, V. In Random Sets: Theory and Applications; Goutsias, J.; Mahler, R. P. S.; Nguyen, H. T., Eds.; Springer-Verlag: Berlin, 1997.
16. Williamson, T. Vagueness; Routledge, 1994.
17. Zadeh, L. A. J Math Anal Appl 1968, 23, 421-427.
18. Baldwin, J. F.; Lawry, J.; Martin, T. P. Int J Uncertainty, Fuzziness and Knowledge-Based Syst 1998, 6, 459-487.
19. Zadeh, L. A. In Advances in Fuzzy Set Theory and Applications; Gupta, M.; Ragade, R.; Yager, R., Eds.; North-Holland: Amsterdam, 1979; pp. 3-18.
20. Zadeh, L. A. Fuzzy Sets Syst 1997, 90, 111-127.
21. Zadeh, L. A. Part I: Inf Sci 1975, 8, 199-249; Part II: Inf Sci 1975, 8, 301-357; Part III: Inf Sci 1976, 9, 43-80.
22. Gaines, B. R. J Inf Control 1978, 38, 154-169.
23. Lawry, J. Int J Approx Reason 1998, 19, 315-333.
24. Baldwin, J. F. In Fuzzy Logic; Baldwin, J. F., Ed.; Wiley: New York, 1996.
25. Evett, I. W.; Spiehler, E. J. Proceedings of the Conference "KBS in Government," 1987; pp. 107-118.
26. Baldwin, J. F.; Lawry, J.; Martin, T. P. Proceedings of IPMU98, Paris, France, 1998.
27. UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html.