introducing ivsa: a new concept learning algorithm · 2016. 12. 7. · abstract-the iterated...
TRANSCRIPT
An lntematlcd Journal
computers & mathematics with applkationr
PERGAMON Computers and Mathematics with Applications 43 (2002) 821-832 www.elsevier.com/locate/camwa
A New Introducing IVSA:
Concept Learning Algorithm
JIANNA J. ZHANG Department of Computer Science, University of Manitoba
Winnipeg, Manitoba, Canada, FL3T 2N2 jisnnacDcs.umsnitoba.ca
(Received and accepted January 2001)
Abstract-The iterated version spme algorithm (IVSA) has been designed and implemented to learn disjunctive concepts that have multiple classes. Unlike a traditional version space algorithm, IVSA first locates the critical attribute values using a statistical approach and then generates the base hypothesis set that describes the most significant features of the target concept. With the base hypothesis, IVSA continues to learn leas significant and more specific hypothesis sets until the system is satisfied with its own performance. During the process of expanding its hypothesis space, IVSA dynamically partitions the search space of potential hypotheses of the target concept into contour- shaped regions until all training instances are maximally correctly classified. Over-fitting is not a problem for IVSA because it does not generate over-fitted candidate hypotheses during the learning. @ 2002 Elsevier Science Ltd. All rights reserved.
Keywords-Machine learning, Iterated version space learning, Concept learning, Over-fitting.
1. INTRODUCTION
Since the mid 195Os, many AI researchers have developed learning systems that automatically improve their performance. Vere’s multiple convergence algorithm [l] and Mitchell’s candidate elimination algorithm [2,3] introduced a novel approach to concept learning known as the version space algorithm (WA). Unlike other learning algorithms, which used either generalization or specialization alone, VSA employed both. VSA guarantees a consistent concept description with all seen instances, and no back tracking for any seen training instances. However, these two advantages of VSA depend on two conditions:
(1) all training instances must be noise free (consistent), and (2) the target concept must be simple (conjunctive).
These problems have prevented VSA from practical use outside the laboratories. During the last decade, some improved algorithms based on VSA have been designed and/or
implemented. In Section 2, we will first introduce VSA and then highlight some of the improved methods compared with the IVSA approach. The discussion in Section 2 focuses on learning a disjunctive concept from six training instances, which are noise free so that problems caused by learning disjunctive concepts can be isolated from problems caused by noise training instances. Section 3 presents the overall approach of IVSA. Preliminary experimental results on several ML
The author is supported by the Natural Sciences and Engineering Research Council of Canada under Grant 228142-2000 and by the GRG Brock Start Grant.
0898-1221/02/t - see front matter @ 2002 Elsevier Science Ltd. All rights reserved. Typeset by AM-m PII: SO898-1221(01)00323-S
brought to you by COREView metadata, citation and similar papers at core.ac.uk
provided by Elsevier - Publisher Connector
822 J.J. GANG
databases [4] and English pronunciation databases [5] are presented in Section 4. Discussions on
each specific test and sample rules are also given in Section 4. In Section 5, we will summarize
current research on IVSA and give suggestions for future research.
2. VERSION SPACE RELATED RESEARCH
2.1. The Version Space Algorithm
A version apace is a representation that contains two sets of hypotheses, the general hypotheses
(G set) and the specific hypotheses (S set). Both G and S must be consistent with all examined
instances. Positive instances make the S set more general to include all positive instances seen,
while negative instances make the G set more specific to exclude all negative instances seen.
If the training instances are consistent and complete, G and S sets eventually merge into one
hypothesis set. This unique hypothesis is the learned concept description.
The following six noisefree training instances have been selected to illustrate problems with
VSA. The value ‘1’ or ‘0’ in each instance indicates that the patient had a positive or negative
allergic reaction, respectively.
Pi = (lunch, expensive, rice, coffee, Sam’s, l),
Ni = (supper, expensive, bread, coffee, Tim’s, 0),
Ps = (supper, cheap, rice, tea, Tim’s, l),
Ps = (breakfast, cheap, bread, tea, Tim’s, l),
Pa = (supper, expensive, rice, tea, Bob’s, l),
F’s = (supper, cheap, rice, coffee, Sam’s, 1).
Nl: [supper, expensive, bread, coffee, Tim’s, 0]
/L\
[lunch, ?, ?, ?, ?, l] [?, ?, rice, ?, ?, l] pruned pruned
[3 9 9 9 Sam's, l] -, -, *, -, pruned
no sulution I-_[ 1
pruned [?, ?, ?, ?, ?, ?]
P3: [breakfast, cheap, bread, tea, Tim’s, l]
[?, ?, rice, ?, ?, l]
P2: [supper, cheap, rice, tea, Tim’s, l]
[lunch, expensive, rice, co&e, Sam’s, I]
Pl: [lunch, expensive, rice, coffee, Sam’s, l]
Figure 1. The version space after the fourth instance (P3).
Concept Learning Algorithm 823
Figure 1 shows that as soon as instance Pa is processed, the new specific Hypothesis S3 must
be discarded due to over-generalization. When either G or S set becomes empty, the version
space is collapsed, and thus, “no legal concept description is consistent with this new instance as
well as all previous training instances” [2], although a concept description of Yea or rice” can be
easily derived by hand.
2.2. The Extended Approach with Multiple Version Spaces
To overcome the inability of generating disjunctive hypothesis, Mitchell proposed an extended
approach using multiple version spaces (MVS) [2]. In this approach, the operations on sets of
concept descriptions are the same as that of the original version space approach except that
the extended approach builds different subversion spaces whenever a positive instance cannot be
included in the previous boundary sets (G and S). The main idea of using MVS is to find some
subversion spaces that maximally include positive instances and exclude all negative instances.
This approach has been analyzed extensively in [6].
Although MVS can generate disjunctive hypothesis, it introduces the over-fitting problem. To
illustrate this problem, let us use MVS to learn the standard Monks database provided in [4].
In this set of databases, Monks-l is the simplest set to learn since it is noise-free and can be
described with five simple rules. Because MVS requires that each version space must exclude all
negative instances, the learning results are forced to be very specific, and therefore, more than
five rules are required to cover the entire training set. Table 1 shows a subset of generated rules
learned from the Monks-l training instances. The classification accuracy raises very slowly as
more rules are added. With an improved MVS, we have generated a total of 465 rules, but only
15 of them are useful. Although the learning accuracy has reached lOO%, the testing result is
only 89%.
Table 1. A subset of rules learned from the Monks-l database with MVS.
5 0.201612903225806 (l?? 1332)
6 0.225806451612903 (1?3?? l?)
7 0.225806451612903 (1?3?312)
1 8 1 0.241935483870968 1 (1 ? 3 ? 14 21 1 9 1 0.241935483870968 1 (1 ? 3 2 ? 1 1)
10 0.250000000000000 1 (1 ? 3 2 3 2 2)
2.3. The Incremental Version Space Merging
Another recent research into VSA is the incremental version space merging (IVSM) [7]. Instead
of building one version space, IVSM constructs many version spaces VSI,,., where 12 is the number
of training instances. For each i E R, IVSM first constructs VSi using only one training instance,
and then computes the intersection of VSi and VSci_.1). That is, for each pair of boundary hypotheses Gl, Sl in VS, and G2, S2 in V/SC~-~)~(~-~), IVSM repeatedly specializes each pair of hypotheses in Gl and G2, and generalizes each pair of hypotheses in Sl and S2 to form a new version space VSi,(i_l). This merging process repeats until all of the instances have been
learned.
The same six training instances are used to demonstrate IVSM learning. In Figure 2, after
IVSM has computed the intersection for VS, and VS6, the resulting specific hypothesis [‘?, ?, ?,
?, ?, l] is overly generalized. According to the IVSM merging algorithm [7], the current specific
hypothesis must be pruned. IVSM, therefore, does not offer a solution for this particular exercise.
VS,
: P
l: [
lunc
h, e
xpen
sive
, ri
ce,
coff
ee,
Sam
’s,
11
0 Gl
[?,?
,?,?
,?,?
I
[lun
ch,
expe
nsiv
e, r
ice,
cof
fee,
Sam
’s,
11
0 Sl
I
vs,=
VS,
m
erge
s V
S2
[?,
?, ?
, ?,
?,
?]
[lun
ch ,
7 ., v
.t 3 *,
9 -, l]
[?
, ?,
ric
e, ?
, ?,
11
[3 *, 9 -9
7 .9 3 .,
Sam
’s,
l]
[lun
ch,
expe
nsiv
e, r
ice,
cof
fee,
Sam
’s,
l]
0 Sl
VS5
= V
S3 m
erge
sVS,
0 Gl
[?, ?
, ric
e, ?
, ?,
11
[?,
?, r
ice,
?,
?, 1
1
0 Sl
vs,:
Nl:
[sup
per,
exp
ensi
ve,
brea
d, c
offe
e, T
im’s
, 0]
0 Gl
[?,?
,?,?
,?,?
I
[?.c
heap
,?,?
,?,l]
[?
,?,r
ice,
?,?,
l]
[lun
ch 9
9 .,.,.
, -7
-J -?
.,
l]
[O, 0
0 0
0
, , , ,
01 [?
,?,?
,?$J
am’s
J]
0 Sl V
&:
Pz:
[su
pper
, ch
eap,
ric
e, t
ea,
Tim
’s,
l]
0 Gl
[1 *, 3
., 7 .,
7 .9 I
., 71
-
[sup
per,
che
ap,
rice
, te
a, T
im’s
, l]
0 Sl
vs,:
p3:
[bre
akfa
st,
chea
p, b
read
, te
a, T
im’s
, 11
0 Gl
[?,
?, ?
, ?,
?,
?]
[bre
akfa
st,
chea
p, b
read
, te
a, T
im’s
, l]
0 Sl
VS,
=V
S, m
erge
sVS,
0 Gl
[?, ?
, ric
e, ?
, ?,
l]
prun
ed
[1+
-i
prun
ed
[?,
?, ?
, ?,
?,
11
0 Sl
Fig
ure
2. T
he
IVS
M a
ppro
ach
afte
r pr
oces
sing
th
e fo
urth
in
stan
ce
(fi)
.
Concept Learning Algorithm 825
3. THE ITERATED VERSION SPACE LEARNING
The allergy example is simple and can be described with two hypotheses. But when the number
of training instances, attributes, and classes are getting larger and larger, it becomes more and more difficult to detect which attribute value would be a true feature that distinguishes instances of different classes. However, VSA has already provided a natural way of separating different features. That is, whenever VSA collapses, the search has encountered a new feature. This is one of the new ideas behind IVSA.
3.1. Learning Disjunctive Concepts
Before showing the detailed algorithm and approach, let us apply the same six allergy instances to IVSA. As Figure 3 shows, when the version space is collapsed by processing P3, instead of failing, IVSA first collects G3 and S2 as candidate hypotheses, and then constructs a new version space with Pa to learn a different feature of the same concept. The idea behind this back tracking is to redirect the search from concentrating on one feature to another.
When all six training instances have been processed, IVSA has collected three candidate hy- potheses: [?, ?, rice, ?, ?, l]; [?, ?, ?, ?, ?, l]; and [?, ?, ?, tea, ?, 11. These candidate hypotheses then are evaluated using Ri = IE+ l/l,!? ( - [Et- l/lE- 1 , where E+ and E- are sets of all positive and negative training instances, respectively. Er, a subset of E+, is a set of positive instances included by the ith candidate hypothesis, and E,: , a subset of E- , is the set of negative instances included by the same candidate hypothesis. For the allergy example,
R1=$ R2 = 0, and R3=;.
Therefore, [?, ?, rice, ?, ?, I] and [?, ?, ?, tea, ?, l] are selected as the concept description: (( A3 = rice) V (Ad = tea)) --+ allergy.
3.2. Learning from Noisy Training Instances
When training instances contain noise, the noise interferes or even stops the learning. With IVSA, noisy training instances are simply ignored [8]. Here we use the same allergy example in Section 2.1, plus a noise instance Nz = (supper, cheap, rice, tea, Tim’s, 0). Figure 4 shows this learning process. In the first version space, IVSA simply ignores Ns just like it ignores instances representing different features such as P3 in Figure 3 in the second version space. Because Ns is negative, IVSA amalgamates the second version space with P3. But if the incorrect instances were classified as positive, IVSA would start with this instance and later the hypothesis generated from this noisy instance would be discarded. The learned concept description is not interfered with by Nz because IVSA recognizes that Nz either represents a different feature of the concept, or does not belong to the target concept at all.
3.3. Learning Accurate Concept Descriptions
To show how IVSA learns accurate concept descriptions and overcomes the over-fitting problem encountered by the multiple version space method, let us apply the same set of training instances Monks-l to IVSA.
Suppose &lass, A(A”,lue) = cp + en - ep - in is the function that computes the significance of each attribute value, where A, cp, ep, en, and in denote a particular attribute, correctly classified instances, excluded positive instances, excluded negative instances, and incorrectly classified negative instances, respectively. Let Cclass = (cp - ep)/(cp + ep) be the function that decides which class will be used for the default rule. We first use the C function to determine the default rule. In IVSA, the concept of positive and negative is relative. That is, when learning hypotheses for a class, the instances with this target class are positive, while the rest of the instances are
vs,
0 Gl
[?,
?, ?
, ?,
?, l
]
[lun
ch,
?, ?
, ?,
?,
l]
,+[?
, ?,
ric
e, ?
, ?,
11
I?,
?, ?
, ?,
Sam
’s,
11
prun
ed
prun
ed
[?,
?, ?
, ?,
?, l
]
n
s3 P
3: [
brea
kfa
st,
chea
p,
R
brea
d, t
ea,
Tim
’s,
l]
- S
2 [?
, ?,
ric
e, ?
, ?,
l]
P2:
[su
pper
, ch
eap,
ric
e, t
ea,
Tim
’s,
l]
Sl
[lun
ch,
expe
nsiv
e,
rice
, co
ffee
, S
am’s
, 11
Pl:
[l
unch
, ex
pens
ive,
ri
ce,
coff
ee,
Sam
’s,
11
vs,
Gl 0 [I
prun
ed
[?,
?,?,
?,
?,
l] [sup
per,
chea
p,
rice
, co
ffee
, S
am’s
, l]
[?,
?, ?
, te
a, ?
, l]
P4:
[su
pper
, ex
pens
ive,
ri
ce,
tea,
Bob
’s,
l]
[bre
akfa
st,
chea
p, b
read
, te
a, T
im’s
,11
P3:
[br
eak
fast
, ch
eap,
bre
ad,
tea,
Tim
’s,
l]
Can
dida
tes:
A
ssem
blin
g:
(R v
alue
) D
escr
ipti
ons:
[?,
?, r
ice,
?,
?, l
] --
>
[?,?
,?,?
,?,l]
[?
, ?,
ric
e, ?
, ?,
l]
-->
di
scar
ded
[?,
?, ?
, te
a, ?
, l]
-+
[?
, ?,
?,
tea,
?,
l]
Fig
ure
3.
Usi
ng
IVSA
fo
r th
e al
lerg
y ex
ampl
e.
VS
l
-
0 [?
, ?, ?,
?, ?,
11
/ ;’
~
;~~
wpp
$;~
6 G2
[lun
ch,
?, ?
, ?,
?,
11
pru
ned
t 0 G3
-0
64
-[?,
?, r
ice,
?,
?, l
] [?
, ?,
?,
?, S
am’s
, l]
A
pru
ned
N2:
[s
uppe
r,
chea
p, r
ice,
tea
, T
im’s
, 0]
G5
[I
[?,
?, r
ice,
?,
?, l
]
s2 P
P2:
[su
pper
, ch
eap,
ric
e, t
ea,
Tim
’s,
l]
[lun
ch,
expe
nsiv
e,
rice
, co
ffee
, S
am’s
, l]
Pl:
[l
unch
, ex
pens
ive,
ri
ce,
coff
ee,
Sam
’s,
l]
vs,
Gl 0
[?,
?, ?
, ?,
?, l
]
[I
pru
ned
[?,?
,?,?
, ?,
l]
1
coff
ee,
Sam
’s,
l]
[?,
?, ?
, te
a, ?
, l]
Tak
e G
l an
d
S2
as a
par
tial
solu
tion
P4:
[su
pper
, ex
pens
ive,
ri
ce,
tea,
Bob
’s,
l]
d
Sl
[bre
akfa
st,
chea
p, b
read
, te
a, T
im’s
,11
P3:
[br
eak
fast
, ch
eap,
bre
ad,
tea,
Tim
’s,
l]
Can
dida
tes:
A
ssem
blin
g:
Des
crip
tion
s:
I?,
?, r
ice,
?,
‘f, 1
1 [?
,?,?
,?,?
,l]
[?,
?, r
ice,
?,
?, l
] di
scar
ded
[?,
?, ?
, te
a, ?
, l]
[?
, ?,
?,
tea,
?,
11
Figu
re
4.
Lea
rnin
g no
isy
trai
ning
in
stan
ces
with
IV
SA.
828 J.J. ZHANG
Table 2. Significant attributes and their values
negative. Each class has a chance to be selected as a positive target. Since Cr = 0.1935 and Cc = 0.1613, the default rule is (0, ?, ?, ?, ?, ?, ?). With the R function, we have determined the significant attributes and their values which are shown in Table 2.
Next we run the generator to produce candidate hypotheses. With the above significant at- tribute values, IVSA has generated 25 candidate hypotheses (see Table 3). These hypotheses are then evaluated with (cp - en)/(cp + en), and the results are shown in the second column of Table 3.
Table 3. Evaluated candidate hypotheses.
10 1 1 1 1 1 ? ? ?) 23 -0.29 (1????2?)
11 0.41 (13?????) 24 -0.27 (1????3?)
12 0.24 (l???l??) 25 -0.14 (l? l? ? ? ?)
13 0.077 (l? ? 1 ? ? ?) 26 0.048 (0?2????)
Table 4. Assembling results for Monks-l database with IVSA.
Rule Number Accuracy Corresponding Rule Status
1 I 73% I fl????l?) 1 keot
2 1 84% I f133????) 1 keot 1 3 84%
4 94%
5 94%
6 94%
(1337 l??) discarded
(122????) kept
(122? l??) discarded
(122? l? 1) discarded
7 I 100% I (11 l? ? ? ?) I kent I Accuracy on 431 unseen instances: 100%
After the candidate hypotheses are evaluated, IVSA starts to assemble them into regional hypothesis. The assembled resulting rules are accurate and general. For the 431 unseen instances, the classification accuracy reaches 100% with five rules. Table 4 shows the process of adding rules. If the accuracy increases when a rule is added, it remains at the position that gives the highest increase. But if the accuracy remains the same after a rule is added, it is discarded. The reason that IVSA does not have the over-fitting problem is that it does not produce overly specific rules. It looks for the true features of the target concept by analyzing the training data before any candidate hypotheses has been produced.
Concept Learning Algorithm 829
3.4. The IVSA Model
Learning a concept is similar to assemble a multidimensional jigsaw puzzle from a large selection
of possible pieces (T set). The target concept can be viewed as the puzzle and each piece of puzzle as an example in T. The pieces that belong to the puzzle are called positive instances, while those that do not belong but share similar features with the positive instances are called negative instances. Not all instances in T can be identified easily because some of them are not, members of the target puzzle, and some of them may be defected due to error. Regardless of how they become a member of T, they are treated as noise in T.
Simple puzzles may be composed of a few pieces while a more complicated puzzle may be composed of hundreds and thousands of pieces. Similarly, simple concepts may be presented by a few instances, while more complicated concepts may have to be presented by hundreds, thousands, or even millions of instances. One way to learn concepts from any set of training instances is to simulate a puzzle assembling process by repeatedly generating candidate hypotheses from uncovered examples, and adding them to the currently learned concept description until it is complete. That is, maximally includes all positive instances and excludes all negative ones.
As shown in Figure 5, IVSA contains the example analyzer, hypothesis generator, assembler, and remover. The example analyzer provides statistical evaluation for each attribute value pro- vided by the instance space to determine the order of input training instances. The hypothesis generator produces a set of candidate hypotheses from the given set of training instances. The hypothesis assembler repeatedly selects the most promising hypothesis from a large number of candidate hypotheses according to the statistical evaluation provided by the example analyzer, and then tests this hypothesis in each position in a list of accepted hypotheses. If adding a new hypothesis increases concept coverage, it is placed in the position that causes the greatest increase; otherwise, this hypothesis is discarded. After the candidate hypotheses have been pro- cessed, the list of accepted hypotheses is examined by the hypothesis remover to see if any of the hypotheses can be removed without reducing accuracy. If the learning accuracy is satisfac- tory, the accepted hypothesis set becomes the learned concept description. Otherwise, the set of incorrectly translated instances are fed back to the generator, and a new learning cycle starts.
Analyzer Generator
Classification Accuracy
Figure 5. The IVSA model.
830 J. J. ZHANG
4. EXPERIMENTAL RESULTS
IVSA is tested on some machine learning databases provided by [4,5]. To demonstrate the
consistency of IVSA, a ten-fold cross validation test is used. That is, for each fold of the test,
use 90% of instances to train the system and then with the rules learned from the 90% instances,
testing on 10% unseen instances.
4.1. Learning the Mushroom Database
Table 5. Ten-fold tests on mushroom data (CPU: MIPS R4400).
9 7,312 812 100.00% 100.00% 8 01/46/40
10 7,312 812 100.00% 100.00% 9 01/56/16
Ave. 7,312 812 100.00% 100.00% 9 01/55/04
SD. 0.49 0.49 0.00 00.00 0.30 859.94
The mushroom database (91 has a total of 8,124 entries (tuples or instances). Each tuple
has 22 feature attributes and one decision attribute. The 22 feature attributes have two to five
values and the decision attribute has two values (or classes) ‘p’ (poison) or ‘e’ (eatable). Because
the mushroom database is noise-free, any machine learning program should be able to learn it
accurately. For example, STAGGER “asymptoted to 95% classification accuracy after reviewing
1,000 instances” [lo], HILLARY has learned 1,000 instances and reported an average accuracy of
about 90% on ten runs [ll], a back propagation network developed in [12] has generated ‘crisp
logical rules’ that give correct classification of 99.41%, and variant decision tree methods used
in [13] have 100% accuracy by a ten-fold cross validation test [13]. With IVSA, the predictive
accuracy shown in Table 5 on the mushroom database has reached 100% with nine rules.
4.2. Learning the Monk’s Databases
The Monk’s databases contains three sets: Monk-l, Monk-2, and Monk-3. Each of the three
sets is originally partitioned into training and testing sets [4,14]. IVSA is trained and tested
on Monk-l, Monk-2, and Monk-3 databases, In Table 6, the experiment shows that 5, 61, and 12 rules learned from Monk-l, Monk-2, and Monk-3 databases which gives lOO%, 81.02%,
and 96.30% classification accuracies on three different sets of 432 previously unseen instances,
respectively.
Table 6. Tests on Monks database (CPU: 296 MHz SUNW, UltraSPARC-II).
I Data Instances Accuracy # of CPU Time 1
Base Training 1 Testing 1 Training 1 Testing Rules (Seconds) 1 I Monk-l I 124 I 432 I 100.00% I 100% I 5 I 3 I I Monk-2 I I Monk-3 I
169 I 432 I 100.00% I 81.02% I 61 I 38 I 122 I 432 I 100.00% I 96.30% I 12 I 5 I
Concept Learning Algorithm 831
Rules learned from Monk-l, (2 2 ? ? ? ? l), (3 3 ? ? ? ? l), (1 1 ? ? ? ? l), (? ? ? ? 1
? l), (? ? ? ? ? ? 0), h s ow exactly the desired concept description with minimum number of
rules allowed by the concept language, which can be rewritten as (head-shape = body-shape) V
(jacket = red) -+ monk. For the Monk-2 database, 61 rules are learned, which is relatively large
compared with the other two sets (Monk-l and Monk-3) due to a highly disjunctive (or irregular)
concept. However, it can be improved with more statistical analysis or some improved instance
space (or representation space) shown in [15] that the predictive accuracy can be as high as lOO%,
although this method is highly specified for only Monk-2 database. For Monk-3 database, IVSA
has learned the following 12 rules, which gives 96.3% classification accuracy despite 5% noise
added to the Monk-3 training instances: (1 1 1 1 3 1 0), (1 2 1 2 3 1 0), (2 2 1 2 2 1 0), (2 2 1 3
3 1 O), (2 2 1 3 3 2 O), (2 3 1 1 3 1 l), (3 3 1 1 3 2 l), (3 3 1 1 4 1 l), (? ? ? ? 4 ? O), (? 1 ? ? ?
? l), (? 2 ? ? ? ? l), (? ? ? ? ? ? 0).
4.3. Learning English Pronunciation Databases
IVSA has been applied to learn English pronunciation rules [5,16,17]. The task is to provide a
set of rules that transform input English words into sound symbols using four steps:
(1) decompose words into graphemes,
(2) form syllables from graphemes,
(3) stress marking on syllables, and
(4) transform them into a sequence of sound symbols.
Learning and testing results are shown in Table 7. For each of the learning and testing accuracy,
the percentage of correctly classified instances are reported both in instances (letter combinations,
syllables, stress patterns, and graphemes) and words.
Table 7. Learning and testing results for individual steps
5. CONCLUSIONS
We have presented a new concept learning method IVSA, its approach, and test results. Our
analysis of previous research shows that the empty version space signals a new feature of the
same target concept presented by a particular instance. The hypotheses generated by previous
version spaces belong to one region of the target concept, while the current hypotheses generated
by a new version space belong to another region of the same concept. IVSA takes advantage of
an empty version space, using it to divide the regions of a concept, and correctly handles noisy
training instances. A concept description can be divided into regions, and each region can be represented by a
subset of training instances. These subsets can be collected according to the statistical analysis
on each attribute value provided by the example analyzer. The technique of rearranging the order of training instances according to the importance of a particular attribute value provides a
practical method to overcome order bias dependency of the training instances.
The demonstration on learning noisy training instances shows that IVSA has strong immunity
to noisy data, and has the ability to learn disjunctive concept. IVSA does not generate over-fitted
hypotheses because the statistic analysis predetermines the most significant attribute values and
the instances that contains these significant attribute values are processed first. The preliminary
experimental results show that rules learned by IVSA obtain high accuracy when applied to
832 J. J. ZHANG
previously unseen instances. In the future, we will intensively test IVSA with additional databases
and improve the example analyzer to obtain higher learning speed and smaller numbers of rules.
REFERENCES
1. S.A. Vere, Induction of concepts in predicate calculus, In Proc. IJCAI-75, Advance Papers of the Fourth International Joint Conference on Art$cial Intelligence, Volume 1, Tbilisi, USSR, 1975, pp. 281-287.
2. T.M. Mitchell, Version spaces: An approach to concept learning, Ph.D. Thesis, Stanford University, Cali- fornia, (1978).
3. T.M. Mitchell, Machine Learning, WCB McGraw-Hill, Boston, MA, (1997). 4. UCI, UC1 repository of machine learning databases and domain theories, f tp : //f tp. its . uci . edu/pub/
machine-learning-databases/(1996). 5. J.J. Zhang, The LEP learning system: An IVSA approach, Ph.D. Thesis, University of Regina, Regina,
Canada (1998). 6. T. Hong and S. Tseng, Learning disjunctive concepts by the version space strategy, National Science Council,
Part A 10, 564-573 (1995). 7. H. Hirsh, Generalizing version spaces, Machine Learning 17, 5-46 (1994). 8. J.J. Zhang and N.J. Cercone, The iterated version space learning, In The Seventh International Workshop
on Rough Sets, fizzy Sets, Data Mining, and Gmular-Sort Computing (RSFDGrC’SS), Yamaguchi, Japan, November 1999.
9. J. Schlimmer, Editor, The Audubon Society Field Guide to North American Mushrooms, Alfred A. Knopf, New York, (1981).
10. J.C. Schlimmer, Incremental adjustment of representations for learning, In Proc. of the Fourth International Workshop on Machine Learning, Irvine, CA, June 1987 (Edited by P. Langley), pp. 79-90.
11. W. Iba, J. Wogulis and P. Langley, ?kading off simplicity and coverage in incremental concept learning, In Proceedings of the 5th International Conference on Machine Learning, pp. 73-79, (1988).
12. W. Duch, R. Adamcsak and K. Grabczewski, Extraction of logical rules from training data using back- propagation networks, In Proceedings of the The First Online Workshop on Soft Computing, pp. 19-30, (1996).
13. P.E. Utgoff, N.C. Berkman and J.A. Clouse, Decision tree induction based on efficient tree restructuring, Machine Learning 29, 5-44 (1997).
14. S.B. Thurn et al., The Monk’s problem: A performance comparison of different learning algorithms, Technical Report CMU-CS-91-197, Carnegie Mellon University, Pittsburgh, PA (December 1991).
15. E. Bloedorm, S. Ryszard and S. MichaIski, The AQ17-DC1 system for data-driven constructive induction and its application to the analysis of world economics, In Proc. of Ninth International Symposium on Foundations of Intelligent Systems, Zakopane, Poland, June 1996, pp. 108-117.
16. J.J. Zhang and N.J. Cercone, Learning English stress rules: Using a machine learning approach, In Pro- ceedings of Pacific Association for Computational Linguistics (PACLING’99), Waterloo, Canada, August 1999, pp. 350-364.
17. J.J. Zhang, H.J. Hamilton and N.J. Cercone, Learning English grapheme segmentation using the iterated version space algorithm, In Intentational Symposium on Methodologies for Intelligent Systems (ISMIS’SS), Warsaw, Poland, June 1999, pp. 420-429.