introducing ivsa: a new concept learning algorithm · 2016. 12. 7. · abstract-the iterated...

An lntematlcd Journal

computers & mathematics with applkationr

PERGAMON Computers and Mathematics with Applications 43 (2002) 821-832 www.elsevier.com/locate/camwa

A New Introducing IVSA:

Concept Learning Algorithm

JIANNA J. ZHANG Department of Computer Science, University of Manitoba

Winnipeg, Manitoba, Canada, FL3T 2N2 jisnnacDcs.umsnitoba.ca

(Received and accepted January 2001)

Abstract-The iterated version spme algorithm (IVSA) has been designed and implemented to learn disjunctive concepts that have multiple classes. Unlike a traditional version space algorithm, IVSA first locates the critical attribute values using a statistical approach and then generates the base hypothesis set that describes the most significant features of the target concept. With the base hypothesis, IVSA continues to learn leas significant and more specific hypothesis sets until the system is satisfied with its own performance. During the process of expanding its hypothesis space, IVSA dynamically partitions the search space of potential hypotheses of the target concept into contour- shaped regions until all training instances are maximally correctly classified. Over-fitting is not a problem for IVSA because it does not generate over-fitted candidate hypotheses during the learning. @ 2002 Elsevier Science Ltd. All rights reserved.

Keywords-Machine learning, Iterated version space learning, Concept learning, Over-fitting.

1. INTRODUCTION

Since the mid 195Os, many AI researchers have developed learning systems that automatically improve their performance. Vere’s multiple convergence algorithm [l] and Mitchell’s candidate elimination algorithm [2,3] introduced a novel approach to concept learning known as the version space algorithm (WA). Unlike other learning algorithms, which used either generalization or specialization alone, VSA employed both. VSA guarantees a consistent concept description with all seen instances, and no back tracking for any seen training instances. However, these two advantages of VSA depend on two conditions:

(1) all training instances must be noise free (consistent), and (2) the target concept must be simple (conjunctive).

These problems have prevented VSA from practical use outside the laboratories. During the last decade, some improved algorithms based on VSA have been designed and/or

implemented. In Section 2, we will first introduce VSA and then highlight some of the improved methods compared with the IVSA approach. The discussion in Section 2 focuses on learning a disjunctive concept from six training instances, which are noise free so that problems caused by learning disjunctive concepts can be isolated from problems caused by noise training instances. Section 3 presents the overall approach of IVSA. Preliminary experimental results on several ML

The author is supported by the Natural Sciences and Engineering Research Council of Canada under Grant 228142-2000 and by the GRG Brock Start Grant.

0898-1221/02/t - see front matter @ 2002 Elsevier Science Ltd. All rights reserved. Typeset by AM-m PII: SO898-1221(01)00323-S

brought to you by COREView metadata, citation and similar papers at core.ac.uk

provided by Elsevier - Publisher Connector

https://core.ac.uk/display/82523219?utm_source=pdf&utm_medium=banner&utm_campaign=pdf-decoration-v1

822 J.J. GANG

databases [4] and English pronunciation databases [5] are presented in Section 4. Discussions on

each specific test and sample rules are also given in Section 4. In Section 5, we will summarize

current research on IVSA and give suggestions for future research.

2. VERSION SPACE RELATED RESEARCH

2.1. The Version Space Algorithm

A version apace is a representation that contains two sets of hypotheses, the general hypotheses

(G set) and the specific hypotheses (S set). Both G and S must be consistent with all examined

instances. Positive instances make the S set more general to include all positive instances seen,

while negative instances make the G set more specific to exclude all negative instances seen.

If the training instances are consistent and complete, G and S sets eventually merge into one

hypothesis set. This unique hypothesis is the learned concept description.

The following six noisefree training instances have been selected to illustrate problems with

VSA. The value ‘1’ or ‘0’ in each instance indicates that the patient had a positive or negative

allergic reaction, respectively.

Pi = (lunch, expensive, rice, coffee, Sam’s, l),

Ni = (supper, expensive, bread, coffee, Tim’s, 0),

Ps = (supper, cheap, rice, tea, Tim’s, l),

Ps = (breakfast, cheap, bread, tea, Tim’s, l),

Pa = (supper, expensive, rice, tea, Bob’s, l),

F’s = (supper, cheap, rice, coffee, Sam’s, 1).

Nl: [supper, expensive, bread, coffee, Tim’s, 0]

/L\

[lunch, ?, ?, ?, ?, l] [?, ?, rice, ?, ?, l] pruned pruned

[3 9 9 9 Sam's, l] -, -, *, -, pruned

no sulution I-_[ 1

pruned [?, ?, ?, ?, ?, ?]

P3: [breakfast, cheap, bread, tea, Tim’s, l]

[?, ?, rice, ?, ?, l]

P2: [supper, cheap, rice, tea, Tim’s, l]

[lunch, expensive, rice, co&e, Sam’s, I]

Pl: [lunch, expensive, rice, coffee, Sam’s, l]

Figure 1. The version space after the fourth instance (P3).

Concept Learning Algorithm 823

Figure 1 shows that as soon as instance Pa is processed, the new specific Hypothesis S3 must

be discarded due to over-generalization. When either G or S set becomes empty, the version

space is collapsed, and thus, “no legal concept description is consistent with this new instance as

well as all previous training instances” [2], although a concept description of Yea or rice” can be

easily derived by hand.

2.2. The Extended Approach with Multiple Version Spaces

To overcome the inability of generating disjunctive hypothesis, Mitchell proposed an extended

approach using multiple version spaces (MVS) [2]. In this approach, the operations on sets of

concept descriptions are the same as that of the original version space approach except that

the extended approach builds different subversion spaces whenever a positive instance cannot be

included in the previous boundary sets (G and S). The main idea of using MVS is to find some

subversion spaces that maximally include positive instances and exclude all negative instances.

This approach has been analyzed extensively in [6].

Although MVS can generate disjunctive hypothesis, it introduces the over-fitting problem. To

illustrate this problem, let us use MVS to learn the standard Monks database provided in [4].

In this set of databases, Monks-l is the simplest set to learn since it is noise-free and can be

described with five simple rules. Because MVS requires that each version space must exclude all

negative instances, the learning results are forced to be very specific, and therefore, more than

five rules are required to cover the entire training set. Table 1 shows a subset of generated rules

learned from the Monks-l training instances. The classification accuracy raises very slowly as

more rules are added. With an improved MVS, we have generated a total of 465 rules, but only

15 of them are useful. Although the learning accuracy has reached lOO%, the testing result is

only 89%.

Table 1. A subset of rules learned from the Monks-l database with MVS.

5 0.201612903225806 (l?? 1332)

6 0.225806451612903 (1?3?? l?)

7 0.225806451612903 (1?3?312)

1 8 1 0.241935483870968 1 (1 ? 3 ? 14 21 1 9 1 0.241935483870968 1 (1 ? 3 2 ? 1 1)

10 0.250000000000000 1 (1 ? 3 2 3 2 2)

2.3. The Incremental Version Space Merging

Another recent research into VSA is the incremental version space merging (IVSM) [7]. Instead

of building one version space, IVSM constructs many version spaces VSI,,., where 12 is the number

of training instances. For each i E R, IVSM first constructs VSi using only one training instance,

and then computes the intersection of VSi and VSci_.1). That is, for each pair of boundary hypotheses Gl, Sl in VS, and G2, S2 in V/SC~-~)~(~-~), IVSM repeatedly specializes each pair of hypotheses in Gl and G2, and generalizes each pair of hypotheses in Sl and S2 to form a new version space VSi,(i_l). This merging process repeats until all of the instances have been

learned.

The same six training instances are used to demonstrate IVSM learning. In Figure 2, after

IVSM has computed the intersection for VS, and VS6, the resulting specific hypothesis [‘?, ?, ?,

?, ?, l] is overly generalized. According to the IVSM merging algorithm [7], the current specific

hypothesis must be pruned. IVSM, therefore, does not offer a solution for this particular exercise.

VS,

: P

l: [

lunc

h, e

xpen

sive

, ri

ce,

coff

ee,

Sam

’s,

11

0 Gl

[?,?

,?,?

,?,?

I

[lun

ch,

expe

nsiv

e, r

ice,

cof

fee,

Sam

’s,

11

0 Sl

I

vs,=

VS,

m

erge

s V

S2

[?,

?, ?

, ?,

?,

?]

[lun

ch ,

7 ., v

.t 3 *,

9 -, l]

[?

, ?,

ric

e, ?

, ?,

11

[3 *, 9 -9

7 .9 3 .,

Sam

’s,

l]

[lun

ch,

expe

nsiv

e, r

ice,

cof

fee,

Sam

’s,

l]

0 Sl

VS5

= V

S3 m

erge

sVS,

0 Gl

[?, ?

, ric

e, ?

, ?,

11

[?,

?, r

ice,

?,

?, 1

1

0 Sl

vs,:

Nl:

[sup

per,

exp

ensi

ve,

brea

d, c

offe

e, T

im’s

, 0]

0 Gl

[?,?

,?,?

,?,?

I

[?.c

heap

,?,?

,?,l]

[?

,?,r

ice,

?,?,

l]

[lun

ch 9

9 .,.,.

, -7

-J -?

.,

l]

[O, 0

0 0

0

, , , ,

01 [?

,?,?

,?$J

am’s

J]

0 Sl V

&:

Pz:

[su

pper

, ch

eap,

ric

e, t

ea,

Tim

’s,

l]

0 Gl

[1 *, 3

., 7 .,

7 .9 I

., 71

-

[sup

per,

che

ap,

rice

, te

a, T

im’s

, l]

0 Sl

vs,:

p3:

[bre

akfa

st,

chea

p, b

read

, te

a, T

im’s

, 11

0 Gl

[?,

?, ?

, ?,

?,

?]

[bre

akfa

st,

chea

p, b

read

, te

a, T

im’s

, l]

0 Sl

VS,

=V

S, m

erge

sVS,

0 Gl

[?, ?

, ric

e, ?

, ?,

l]

prun

ed

[1+

-i

prun

ed

[?,

?, ?

, ?,

?,

11

0 Sl

Fig

ure

2. T

he

IVS

M a

ppro

ach

afte

r pr

oces

sing

th

e fo

urth

in

stan

ce

(fi)

.


3. THE ITERATED VERSION SPACE LEARNING

The allergy example is simple and can be described with two hypotheses. But when the number

of training instances, attributes, and classes are getting larger and larger, it becomes more and more difficult to detect which attribute value would be a true feature that distinguishes instances of different classes. However, VSA has already provided a natural way of separating different features. That is, whenever VSA collapses, the search has encountered a new feature. This is one of the new ideas behind IVSA.

3.1. Learning Disjunctive Concepts

Before showing the detailed algorithm and approach, let us apply the same six allergy instances to IVSA. As Figure 3 shows, when the version space is collapsed by processing P3, instead of failing, IVSA first collects G3 and S2 as candidate hypotheses, and then constructs a new version space with Pa to learn a different feature of the same concept. The idea behind this back tracking is to redirect the search from concentrating on one feature to another.

When all six training instances have been processed, IVSA has collected three candidate hypotheses: [?, ?, rice, ?, ?, l]; [?, ?, ?, ?, ?, l]; and [?, ?, ?, tea, ?, 11. These candidate hypotheses then are evaluated using Ri = IE+ l/l,!? ( - [Et- l/lE- 1 , where E+ and E- are sets of all positive and negative training instances, respectively. Er, a subset of E+, is a set of positive instances included by the ith candidate hypothesis, and E,: , a subset of E- , is the set of negative instances included by the same candidate hypothesis. For the allergy example,

R1=$ R2 = 0, and R3=;.

Therefore, [?, ?, rice, ?, ?, I] and [?, ?, ?, tea, ?, l] are selected as the concept description: (( A3 = rice) V (Ad = tea)) --+ allergy.

3.2. Learning from Noisy Training Instances

When training instances contain noise, the noise interferes or even stops the learning. With IVSA, noisy training instances are simply ignored [8]. Here we use the same allergy example in Section 2.1, plus a noise instance Nz = (supper, cheap, rice, tea, Tim’s, 0). Figure 4 shows this learning process. In the first version space, IVSA simply ignores Ns just like it ignores instances representing different features such as P3 in Figure 3 in the second version space. Because Ns is negative, IVSA amalgamates the second version space with P3. But if the incorrect instances were classified as positive, IVSA would start with this instance and later the hypothesis generated from this noisy instance would be discarded. The learned concept description is not interfered with by Nz because IVSA recognizes that Nz either represents a different feature of the concept, or does not belong to the target concept at all.

3.3. Learning Accurate Concept Descriptions

To show how IVSA learns accurate concept descriptions and overcomes the over-fitting problem encountered by the multiple version space method, let us apply the same set of training instances Monks-l to IVSA.

Suppose &lass, A(A”,lue) = cp + en - ep - in is the function that computes the significance of each attribute value, where A, cp, ep, en, and in denote a particular attribute, correctly classified instances, excluded positive instances, excluded negative instances, and incorrectly classified negative instances, respectively. Let Cclass = (cp - ep)/(cp + ep) be the function that decides which class will be used for the default rule. We first use the C function to determine the default rule. In IVSA, the concept of positive and negative is relative. That is, when learning hypotheses for a class, the instances with this target class are positive, while the rest of the instances are

vs,

0 Gl

[?,

?, ?

, ?,

?, l

]

[lun

ch,

?, ?

, ?,

?,

l]

,+[?

, ?,

ric

e, ?

, ?,

11

I?,

?, ?

, ?,

Sam

’s,

11

prun

ed

prun

ed

[?,

?, ?

, ?,

?, l

]

n

s3 P

3: [

brea

kfa

st,

chea

p,

R

brea

d, t

ea,

Tim

’s,

l]

- S

2 [?

, ?,

ric

e, ?

, ?,

l]

P2:

[su

pper

, ch

eap,

ric

e, t

ea,

Tim

’s,

l]

Sl

[lun

ch,

expe

nsiv

e,

rice

, co

ffee

, S

am’s

, 11

Pl:

[l

unch

, ex

pens

ive,

ri

ce,

coff

ee,

Sam

’s,

11

vs,

Gl 0 [I

prun

ed

[?,

?,?,

?,

?,

l] [sup

per,

chea

p,

rice

, co

ffee

, S

am’s

, l]

[?,

?, ?

, te

a, ?

, l]

P4:

[su

pper

, ex

pens

ive,

ri

ce,

tea,

Bob

’s,

l]

[bre

akfa

st,

chea

p, b

read

, te

a, T

im’s

,11

P3:

[br

eak

fast

, ch

eap,

bre

ad,

tea,

Tim

’s,

l]

Can

dida

tes:

A

ssem

blin

g:

(R v

alue

) D

escr

ipti

ons:

[?,

?, r

ice,

?,

?, l

] --

>

[?,?

,?,?

,?,l]

[?

, ?,

ric

e, ?

, ?,

l]

-->

di

scar

ded

[?,

?, ?

, te

a, ?

, l]

-+

[?

, ?,

?,

tea,

?,

l]

Fig

ure

3.

Usi

ng

IVSA

fo

r th

e al

lerg

y ex

ampl

e.

VS

l

-

0 [?

, ?, ?,

?, ?,

11

/ ;’

~

;~~

wpp

$;~

6 G2

[lun

ch,

?, ?

, ?,

?,

11

pru

ned

t 0 G3

-0

64

-[?,

?, r

ice,

?,

?, l

] [?

, ?,

?,

?, S

am’s

, l]

A

pru

ned

N2:

[s

uppe

r,

chea

p, r

ice,

tea

, T

im’s

, 0]

G5

[I

[?,

?, r

ice,

?,

?, l

]

s2 P

P2:

[su

pper

, ch

eap,

ric

e, t

ea,

Tim

’s,

l]

[lun

ch,

expe

nsiv

e,

rice

, co

ffee

, S

am’s

, l]

Pl:

[l

unch

, ex

pens

ive,

ri

ce,

coff

ee,

Sam

’s,

l]

vs,

Gl 0

[?,

?, ?

, ?,

?, l

]

[I

pru

ned

[?,?

,?,?

, ?,

l]

1

coff

ee,

Sam

’s,

l]

[?,

?, ?

, te

a, ?

, l]

Tak

e G

l an

d

S2

as a

par

tial

solu

tion

P4:

[su

pper

, ex

pens

ive,

ri

ce,

tea,

Bob

’s,

l]

d

Sl

[bre

akfa

st,

chea

p, b

read

, te

a, T

im’s

,11

P3:

[br

eak

fast

, ch

eap,

bre

ad,

tea,

Tim

’s,

l]

Can

dida

tes:

A

ssem

blin

g:

Des

crip

tion

s:

I?,

?, r

ice,

?,

‘f, 1

1 [?

,?,?

,?,?

,l]

[?,

?, r

ice,

?,

?, l

] di

scar

ded

[?,

?, ?

, te

a, ?

, l]

[?

, ?,

?,

tea,

?,

11

Figu

re

4.

Lea

rnin

g no

isy

trai

ning

in

stan

ces

with

IV

SA.

828 J.J. ZHANG

Table 2. Significant attributes and their values

negative. Each class has a chance to be selected as a positive target. Since Cr = 0.1935 and Cc = 0.1613, the default rule is (0, ?, ?, ?, ?, ?, ?). With the R function, we have determined the significant attributes and their values which are shown in Table 2.

Next we run the generator to produce candidate hypotheses. With the above significant attribute values, IVSA has generated 25 candidate hypotheses (see Table 3). These hypotheses are then evaluated with (cp - en)/(cp + en), and the results are shown in the second column of Table 3.

Table 3. Evaluated candidate hypotheses.

10 1 1 1 1 1 ? ? ?) 23 -0.29 (1????2?)

11 0.41 (13?????) 24 -0.27 (1????3?)

12 0.24 (l???l??) 25 -0.14 (l? l? ? ? ?)

13 0.077 (l? ? 1 ? ? ?) 26 0.048 (0?2????)

Table 4. Assembling results for Monks-l database with IVSA.

Rule Number Accuracy Corresponding Rule Status

1 I 73% I fl????l?) 1 keot

2 1 84% I f133????) 1 keot 1 3 84%

4 94%

5 94%

6 94%

(1337 l??) discarded

(122????) kept

(122? l??) discarded

(122? l? 1) discarded

7 I 100% I (11 l? ? ? ?) I kent I Accuracy on 431 unseen instances: 100%

After the candidate hypotheses are evaluated, IVSA starts to assemble them into regional hypothesis. The assembled resulting rules are accurate and general. For the 431 unseen instances, the classification accuracy reaches 100% with five rules. Table 4 shows the process of adding rules. If the accuracy increases when a rule is added, it remains at the position that gives the highest increase. But if the accuracy remains the same after a rule is added, it is discarded. The reason that IVSA does not have the over-fitting problem is that it does not produce overly specific rules. It looks for the true features of the target concept by analyzing the training data before any candidate hypotheses has been produced.


3.4. The IVSA Model

Learning a concept is similar to assemble a multidimensional jigsaw puzzle from a large selection

of possible pieces (T set). The target concept can be viewed as the puzzle and each piece of puzzle as an example in T. The pieces that belong to the puzzle are called positive instances, while those that do not belong but share similar features with the positive instances are called negative instances. Not all instances in T can be identified easily because some of them are not, members of the target puzzle, and some of them may be defected due to error. Regardless of how they become a member of T, they are treated as noise in T.

Simple puzzles may be composed of a few pieces while a more complicated puzzle may be composed of hundreds and thousands of pieces. Similarly, simple concepts may be presented by a few instances, while more complicated concepts may have to be presented by hundreds, thousands, or even millions of instances. One way to learn concepts from any set of training instances is to simulate a puzzle assembling process by repeatedly generating candidate hypotheses from uncovered examples, and adding them to the currently learned concept description until it is complete. That is, maximally includes all positive instances and excludes all negative ones.

As shown in Figure 5, IVSA contains the example analyzer, hypothesis generator, assembler, and remover. The example analyzer provides statistical evaluation for each attribute value provided by the instance space to determine the order of input training instances. The hypothesis generator produces a set of candidate hypotheses from the given set of training instances. The hypothesis assembler repeatedly selects the most promising hypothesis from a large number of candidate hypotheses according to the statistical evaluation provided by the example analyzer, and then tests this hypothesis in each position in a list of accepted hypotheses. If adding a new hypothesis increases concept coverage, it is placed in the position that causes the greatest increase; otherwise, this hypothesis is discarded. After the candidate hypotheses have been processed, the list of accepted hypotheses is examined by the hypothesis remover to see if any of the hypotheses can be removed without reducing accuracy. If the learning accuracy is satisfac- tory, the accepted hypothesis set becomes the learned concept description. Otherwise, the set of incorrectly translated instances are fed back to the generator, and a new learning cycle starts.

Analyzer Generator

Classification Accuracy

Figure 5. The IVSA model.

830 J. J. ZHANG

4. EXPERIMENTAL RESULTS

IVSA is tested on some machine learning databases provided by [4,5]. To demonstrate the

consistency of IVSA, a ten-fold cross validation test is used. That is, for each fold of the test,

use 90% of instances to train the system and then with the rules learned from the 90% instances,

testing on 10% unseen instances.

4.1. Learning the Mushroom Database

Table 5. Ten-fold tests on mushroom data (CPU: MIPS R4400).

9 7,312 812 100.00% 100.00% 8 01/46/40

10 7,312 812 100.00% 100.00% 9 01/56/16

Ave. 7,312 812 100.00% 100.00% 9 01/55/04

SD. 0.49 0.49 0.00 00.00 0.30 859.94

The mushroom database (91 has a total of 8,124 entries (tuples or instances). Each tuple

has 22 feature attributes and one decision attribute. The 22 feature attributes have two to five

values and the decision attribute has two values (or classes) ‘p’ (poison) or ‘e’ (eatable). Because

the mushroom database is noise-free, any machine learning program should be able to learn it

accurately. For example, STAGGER “asymptoted to 95% classification accuracy after reviewing

1,000 instances” [lo], HILLARY has learned 1,000 instances and reported an average accuracy of

about 90% on ten runs [ll], a back propagation network developed in [12] has generated ‘crisp

logical rules’ that give correct classification of 99.41%, and variant decision tree methods used

in [13] have 100% accuracy by a ten-fold cross validation test [13]. With IVSA, the predictive

accuracy shown in Table 5 on the mushroom database has reached 100% with nine rules.

4.2. Learning the Monk’s Databases

The Monk’s databases contains three sets: Monk-l, Monk-2, and Monk-3. Each of the three

sets is originally partitioned into training and testing sets [4,14]. IVSA is trained and tested

on Monk-l, Monk-2, and Monk-3 databases, In Table 6, the experiment shows that 5, 61, and 12 rules learned from Monk-l, Monk-2, and Monk-3 databases which gives lOO%, 81.02%,

and 96.30% classification accuracies on three different sets of 432 previously unseen instances,

respectively.

Table 6. Tests on Monks database (CPU: 296 MHz SUNW, UltraSPARC-II).

I Data Instances Accuracy # of CPU Time 1

Base Training 1 Testing 1 Training 1 Testing Rules (Seconds) 1 I Monk-l I 124 I 432 I 100.00% I 100% I 5 I 3 I I Monk-2 I I Monk-3 I

169 I 432 I 100.00% I 81.02% I 61 I 38 I 122 I 432 I 100.00% I 96.30% I 12 I 5 I


Rules learned from Monk-l, (2 2 ? ? ? ? l), (3 3 ? ? ? ? l), (1 1 ? ? ? ? l), (? ? ? ? 1

? l), (? ? ? ? ? ? 0), h s ow exactly the desired concept description with minimum number of

rules allowed by the concept language, which can be rewritten as (head-shape = body-shape) V

(jacket = red) -+ monk. For the Monk-2 database, 61 rules are learned, which is relatively large

compared with the other two sets (Monk-l and Monk-3) due to a highly disjunctive (or irregular)

concept. However, it can be improved with more statistical analysis or some improved instance

space (or representation space) shown in [15] that the predictive accuracy can be as high as lOO%,

although this method is highly specified for only Monk-2 database. For Monk-3 database, IVSA

has learned the following 12 rules, which gives 96.3% classification accuracy despite 5% noise

added to the Monk-3 training instances: (1 1 1 1 3 1 0), (1 2 1 2 3 1 0), (2 2 1 2 2 1 0), (2 2 1 3

3 1 O), (2 2 1 3 3 2 O), (2 3 1 1 3 1 l), (3 3 1 1 3 2 l), (3 3 1 1 4 1 l), (? ? ? ? 4 ? O), (? 1 ? ? ?

? l), (? 2 ? ? ? ? l), (? ? ? ? ? ? 0).

4.3. Learning English Pronunciation Databases

IVSA has been applied to learn English pronunciation rules [5,16,17]. The task is to provide a

set of rules that transform input English words into sound symbols using four steps:

(1) decompose words into graphemes,

(2) form syllables from graphemes,

(3) stress marking on syllables, and

(4) transform them into a sequence of sound symbols.

Learning and testing results are shown in Table 7. For each of the learning and testing accuracy,

the percentage of correctly classified instances are reported both in instances (letter combinations,

syllables, stress patterns, and graphemes) and words.

Table 7. Learning and testing results for individual steps

5. CONCLUSIONS

We have presented a new concept learning method IVSA, its approach, and test results. Our

analysis of previous research shows that the empty version space signals a new feature of the

same target concept presented by a particular instance. The hypotheses generated by previous

version spaces belong to one region of the target concept, while the current hypotheses generated

by a new version space belong to another region of the same concept. IVSA takes advantage of

an empty version space, using it to divide the regions of a concept, and correctly handles noisy

training instances. A concept description can be divided into regions, and each region can be represented by a

subset of training instances. These subsets can be collected according to the statistical analysis

on each attribute value provided by the example analyzer. The technique of rearranging the order of training instances according to the importance of a particular attribute value provides a

practical method to overcome order bias dependency of the training instances.

The demonstration on learning noisy training instances shows that IVSA has strong immunity

to noisy data, and has the ability to learn disjunctive concept. IVSA does not generate over-fitted

hypotheses because the statistic analysis predetermines the most significant attribute values and

the instances that contains these significant attribute values are processed first. The preliminary

experimental results show that rules learned by IVSA obtain high accuracy when applied to

832 J. J. ZHANG

previously unseen instances. In the future, we will intensively test IVSA with additional databases

and improve the example analyzer to obtain higher learning speed and smaller numbers of rules.

REFERENCES

1. S.A. Vere, Induction of concepts in predicate calculus, In Proc. IJCAI-75, Advance Papers of the Fourth International Joint Conference on Art$cial Intelligence, Volume 1, Tbilisi, USSR, 1975, pp. 281-287.

2. T.M. Mitchell, Version spaces: An approach to concept learning, Ph.D. Thesis, Stanford University, Cali- fornia, (1978).

3. T.M. Mitchell, Machine Learning, WCB McGraw-Hill, Boston, MA, (1997). 4. UCI, UC1 repository of machine learning databases and domain theories, f tp : //f tp. its . uci . edu/pub/

machine-learning-databases/(1996). 5. J.J. Zhang, The LEP learning system: An IVSA approach, Ph.D. Thesis, University of Regina, Regina,

Canada (1998). 6. T. Hong and S. Tseng, Learning disjunctive concepts by the version space strategy, National Science Council,

Part A 10, 564-573 (1995). 7. H. Hirsh, Generalizing version spaces, Machine Learning 17, 5-46 (1994). 8. J.J. Zhang and N.J. Cercone, The iterated version space learning, In The Seventh International Workshop

on Rough Sets, fizzy Sets, Data Mining, and Gmular-Sort Computing (RSFDGrC’SS), Yamaguchi, Japan, November 1999.

9. J. Schlimmer, Editor, The Audubon Society Field Guide to North American Mushrooms, Alfred A. Knopf, New York, (1981).

10. J.C. Schlimmer, Incremental adjustment of representations for learning, In Proc. of the Fourth International Workshop on Machine Learning, Irvine, CA, June 1987 (Edited by P. Langley), pp. 79-90.

11. W. Iba, J. Wogulis and P. Langley, ?kading off simplicity and coverage in incremental concept learning, In Proceedings of the 5th International Conference on Machine Learning, pp. 73-79, (1988).

12. W. Duch, R. Adamcsak and K. Grabczewski, Extraction of logical rules from training data using back- propagation networks, In Proceedings of the The First Online Workshop on Soft Computing, pp. 19-30, (1996).

13. P.E. Utgoff, N.C. Berkman and J.A. Clouse, Decision tree induction based on efficient tree restructuring, Machine Learning 29, 5-44 (1997).

14. S.B. Thurn et al., The Monk’s problem: A performance comparison of different learning algorithms, Technical Report CMU-CS-91-197, Carnegie Mellon University, Pittsburgh, PA (December 1991).

15. E. Bloedorm, S. Ryszard and S. MichaIski, The AQ17-DC1 system for data-driven constructive induction and its application to the analysis of world economics, In Proc. of Ninth International Symposium on Foundations of Intelligent Systems, Zakopane, Poland, June 1996, pp. 108-117.

16. J.J. Zhang and N.J. Cercone, Learning English stress rules: Using a machine learning approach, In Pro- ceedings of Pacific Association for Computational Linguistics (PACLING’99), Waterloo, Canada, August 1999, pp. 350-364.

17. J.J. Zhang, H.J. Hamilton and N.J. Cercone, Learning English grapheme segmentation using the iterated version space algorithm, In Intentational Symposium on Methodologies for Intelligent Systems (ISMIS’SS), Warsaw, Poland, June 1999, pp. 420-429.

introducing ivsa: a new concept learning algorithm · 2016. 12. 7. · abstract-the iterated...

Documents