

Knowledge-Based Systems 23 (2010) 743–756

Contents lists available at ScienceDirect

Knowledge-Based Systems

journal homepage: www.elsevier.com/locate/knosys

Concept discovery on relational databases: New techniques for search space pruning and rule quality improvement

Y. Kavurucu, P. Senkul *, I.H. Toroslu
Middle East Technical University, Department of Computer Engineering, 06531 Ankara, Turkey


Article history:
Received 4 February 2010
Received in revised form 20 April 2010
Accepted 21 April 2010
Available online 28 April 2010

Keywords:
ILP
Data mining
MRDM
Concept discovery
Transitive rules
Support
Confidence

0950-7051/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.knosys.2010.04.011

* Corresponding author. Tel.: +90 312 2105518.
E-mail addresses: [email protected] (Y. Kavurucu), senkul@ceng.metu.edu.tr (P. Senkul), [email protected] (I.H. Toroslu).
URL: http://www.ceng.metu.edu.tr/karagoz/ (P. Senkul).

Multi-relational data mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. Several relational knowledge discovery systems have been developed employing various search strategies, heuristics, language pattern limitations and hypothesis evaluation criteria, in order to cope with intractably large search spaces and to be able to generate high-quality patterns. In this work, we introduce an ILP-based concept discovery framework named Concept Rule Induction System (CRIS) which includes new approaches for search space pruning and new features, such as defining aggregate predicates and handling numeric attributes, for rule quality improvement. In CRIS, all target instances are considered together, which leads to the construction of more descriptive rules for the concept. This property also makes it possible to use aggregate predicates more accurately in concept rule construction. Moreover, it facilitates the construction of transitive rules. A set of experiments is conducted in order to evaluate the performance of the proposed method in terms of accuracy and coverage.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

The amount of data collected in relational databases has been increasing due to the increasing use of complex data in real-life applications. This motivated the development of multi-relational learning algorithms that can be applied directly to multi-relational data in databases [19,17]. For such learning systems, generally the first-order predicate logic is employed as the representation language. The learning systems, which induce logical patterns valid for given background knowledge, have been investigated under a research area called Inductive Logic Programming (ILP) [46]. In general, using logic in data mining is a common technique in the literature [52,47,57,12,63,67,39,23,49,50,5,40,35,10,36,64,16,55,18,51,65,66].

A concept is a set of patterns to be discovered by using the hidden relationships in the database. Concept discovery in relational databases is a predictive learning task. In predictive learning, there is a specific target concept to be learned in the light of past experiences [45]. The problem setting of the predictive learning task introduced by Muggleton in [45] can be stated as follows: given a target class/concept C (target relation), a set E of positive and negative examples of the class/concept C, a finite set of background facts/clauses B (background relations), and a concept description



language L (language bias); find a finite set of clauses H, expressed in the concept description language L, such that H together with the background knowledge B entails all positive instances E(+) and none of the negative instances E(−). In other words, H is complete and consistent with respect to B and E, respectively.

Association rule mining in relational databases is a descriptive learning task. In descriptive learning, the task is to identify frequent patterns, associations or correlations among sets of items or objects in databases [45]. Relational association rules are expressed as query extensions in the first-order logic [11,13]. In the proposed work, there is a specific target concept, and association rule mining techniques are employed to induce association rules which have the target concept as the only head relation.

In this paper, we present Concept Rule Induction System (CRIS), which is a concept learning ILP system that employs relational association rule mining concepts and techniques to find frequent and strong concept definitions according to a given target relation and background knowledge [31]. CRIS utilizes the absorption operator of inverse resolution for generalization of concept instances in the presence of background knowledge and refines these general patterns into frequent and strong concept definitions with an APRIORI-based specialization operator based on confidence.

1.1. Contributions

Major contributions and the main features of this work can be listed as follows:

Table 1
The database of the ancestor example with type declarations.

Concept instances      Background facts

a(kubra, ali).         p(kubra, ali).
a(ali, yusuf).         p(ali, yusuf).
a(yusuf, esra).        p(yusuf, esra).
a(yusuf, aysegul).     p(yusuf, aysegul).
a(kubra, yusuf).
a(kubra, esra).
a(kubra, aysegul).
a(ali, esra).
a(ali, aysegul).


1. The selection order of the target instance (the order in the target relation) may change the resulting hypothesis set. In each coverage set, the induced rules depend on the selected target instance, and the covered target instances in each step do not have any effect on the induced rules in the following coverage steps. To overcome this problem, first, all possible values for each argument of a relation are determined by executing simple SQL statements in the database. Instead of selecting a target instance, those values for each argument are used in the generalization step of CRIS. In this way, the generated rules do not depend on the instance selection order and induced rule quality is improved.

2. This technique facilitates the generation of transitive rules, as well. When the target concept has common attribute types with only some of the background predicates, the rest of the predicates (which are called unrelated relations) can never take part in the hypothesis. This prevents the generation of transitive rules through such predicates. In CRIS, since all target instances are considered together, there is no distinction between related and unrelated relations and hence transitive rules can be induced.

3. Better rules (with higher accuracy and coverage) can be discovered by using aggregate predicates in the background knowledge. To do this, aggregate predicates are defined in the first-order logic and used in CRIS. In addition, numerical attributes are handled in a more accurate way. Rules having comparison operators on numerical attributes are defined and used in the main algorithm.

4. CRIS utilizes the primary key–foreign key relationship (if one exists) between the head and body relations in the search space as a pruning strategy. If a primary–foreign key relationship exists between the head and the body predicates, the foreign key argument of the body relation can only have the same variable as the primary key argument of the head predicate in the generalization step.

5. The main difficulty in relational ILP systems is searching in intractably large hypothesis spaces. In order to reduce the search space, a confidence-based pruning mechanism is used. In addition to this, many multi-relational rule induction systems require the user to determine the input–output modes of predicate arguments. Instead of this, we use the information about relationships between entities in the database if given.

6. Muggleton shows [48] that the expected error of a hypothesis estimated with respect to positive-only versus all (positive and negative) examples does not differ much if the number of examples is large enough. Most ILP-based concept learning systems input background facts in the Prolog language; this restricts the usage of ILP engines in real-world applications due to the time-consuming transformation phase of the problem specification from tabular to logical format. The proposed system directly works on relational databases, which contain only positive information, without any requirement of negative instances. Moreover, the definition of confidence is modified to apply the Closed World Assumption (CWA) [53] in relational databases. We introduce type relations to the body of the rules in order to express CWA.

In [31], the contribution presented in the first item of the above list was introduced without performance evaluation. In [33], the basics of aggregate predicate usage are presented. In this work, the features of CRIS are elaborated in more detail with performance evaluation results on several data sets. In [29,28,30,32], features of another concept discovery system developed by our research group, namely C2D, are presented. Although CRIS and C2D have common properties such as the use of only positive instances, the concept discovery algorithm of CRIS has different properties and advantages, which are presented, discussed and evaluated in this work.

This paper is organized as follows: Section 2 gives preliminary information about concept discovery in general and the concepts employed in CRIS. Section 3 presents the related work. Section 4 describes the proposed method. Section 5 presents the experiments to discuss the performance of CRIS. Finally, Section 6 includes concluding remarks.

2. Preliminaries

In this section, basic terminology in concept discovery and basics for concept representation and discovery are introduced.

2.1. Basics

A concept is a set of patterns which are embedded in the features of the instances of a given target relation and in the relationships of this relation with other relations. In this work, a concept is defined through concept rules.

Definition 1. [Concept rule] A concept rule (or shortly rule) is an association rule (range-restricted query extension). It is represented as "h ← b", where h is the head of the rule and b denotes the body of the rule.

Definition 2. [Target relation] A target relation is a predicate that corresponds to the concept to be discovered. The instances of the target relation have to be correctly covered by the discovered pattern. If the discovered pattern is in the form of rules (as in this work), the target relation appears in the head of the rule. In recursive rules, it may take part in the body, as well.

Definition 3. [Background relation] A background relation is a predicate that is different from the target relation and is involved in the concept discovery. When the discovered pattern is in the form of rules, a background relation may appear in the body of the rule.

In Table 1, the relation given in the first column, ancestor, is the target relation. The content of the first column constitutes the target instances. For this example, one of the concept rules defining the concept is "ancestor(A, B) ← parent(A, B)".

We use the first-order logic as the language to represent data and patterns. The concept rule structure is based on query extension. However, to emphasize the difference from the classical clause and query, we first present definitions for these terms.

Definition 4. [Clause] A clause is a universally quantified disjunction ∀(l1 ∨ l2 ∨ … ∨ ln). When it is clear from the context that clauses are meant, the quantifier ∀ is dropped. A clause h1 ∨ h2 ∨ … ∨ hp ∨ ¬b1 ∨ ¬b2 ∨ … ∨ ¬br, where the hi are positive literals and the ¬bj are negative literals, can also be written as h1 ∨ h2 ∨ … ∨ hp ← b1 ∧ b2 ∧ … ∧ br, where h1 ∨ h2 ∨ … ∨ hp (p ≥ 0) is called the head of the clause, and b1 ∧ b2 ∧ … ∧ br (r ≥ 0) is called the body of the clause. This representation can be read as "h1 or … or hp if b1 and … and br".

Definition 5. [Definite clause] A definite clause is a clause which has only one head literal. A definite clause with an empty body is called a fact. A denial is a clause with an empty head.

Definition 6. [Query] A query is an existentially quantified conjunction ∃(l1 ∧ l2 ∧ … ∧ ln). When it is clear from the context that queries are meant, the quantifier ∃ is dropped.

A query ∃(l1 ∧ … ∧ lm) corresponds to the negation of a denial ∀¬(l1 ∧ … ∧ lm).

Definition 7. [Query extension] A query extension is an existentially quantified implication ∃((l1 ∧ l2 ∧ … ∧ lm) → (l1 ∧ l2 ∧ … ∧ lm ∧ lm+1 ∧ … ∧ ln)), with 1 ≤ m < n. To avoid confusion with clauses (which are also implications), we write it as l1 ∧ l2 ∧ … ∧ lm ⇝ lm+1 ∧ … ∧ ln. We call the query l1 ∧ l2 ∧ … ∧ lm the body and the query lm+1 ∧ … ∧ ln the head of the query extension [11]. In [13], relational association rules are called query extensions.

Definition 8. [Range-restricted query] A range-restricted query is a query in which all variables that occur in negative literals also occur in at least one positive literal. A range-restricted query extension is a query extension such that both the head and the body are range-restricted queries.

Table 2
The SQL queries for support calculation.

Support = COUNT1/COUNT2

COUNT1:
SELECT COUNT(*) FROM
  (SELECT DISTINCT a.arg1, a.arg2
   FROM ancestor a, parent p
   WHERE a.arg1 = p.arg1 AND p.arg2 = 'yusuf')

COUNT2:
SELECT COUNT(*) FROM
  (SELECT DISTINCT a.arg1, a.arg2
   FROM ancestor a)

Table 3
The SQL queries for confidence calculation.

Confidence = COUNT3/COUNT4

COUNT3:
SELECT COUNT(*) FROM
  (SELECT DISTINCT p.arg1
   FROM ancestor a, parent p
   WHERE a.arg1 = p.arg1 AND p.arg2 = 'yusuf')

COUNT4:
SELECT COUNT(*) FROM
  (SELECT DISTINCT p.arg1
   FROM parent p
   WHERE p.arg2 = 'yusuf')

Definition 9. [θ-subsumption] A definite clause C θ-subsumes¹ a definite clause C′, i.e. C is at least as general as C′, if and only if ∃θ such that:

head(C) = head(C′) and body(C)θ ⊆ body(C′).

In this work, we adapt the θ-subsumption definition for query extensions.

Two basic steps in the search for a correct theory are specialization and generalization [38]. If a theory covers negative examples, it means that it is too strong; it needs to be weakened. In other words, a more specific theory should be generated. This process is called specialization. On the other hand, if a theory does not imply all positive examples, it means that it is too weak; it needs to be strengthened. In other words, a more general theory should be generated. This process is called generalization. Specialization and generalization steps are repeated to adjust the induced theory in the overall learning process.

2.2. Support and confidence

Two criteria are important in the evaluation of a candidate concept rule: how many of the concept instances are captured by the rule (coverage) and the proportion of the objects which truly belong to the target concept among all those that show the pattern of the rule (accuracy); support and confidence, respectively. Therefore, the system should assign a score to each candidate concept rule according to its support and confidence values.

Definition 10. [Support] The support value of a concept rule C is defined as the number of different bindings for the variables in the head relation that satisfy the rule, divided by the number of different bindings for the variables in the head relation. In other words, it is the ratio of the number of positive target instances captured by the rule to the number of target instances.

¹ A substitution θ is a set {X1/t1, …, Xm/tm}, where each Xi is a variable such that Xi = Xj ⟺ i = j, ti is a term different from Xi, and each element Xi/ti is called a binding for variable Xi.

Let C be h ← b,

support(h ← b) = |bindings of variables for h that satisfy h ← b| / |bindings of variables for h that satisfy h|.

Definition 11. [Confidence] The confidence of a concept rule C is defined as the number of different bindings for the variables in the head relation that satisfy the rule, divided by the number of different bindings for the variables in the head relation that satisfy the body literals. In other words, it is the ratio of the number of target instances captured by the rule to the number of instances that are deducible from the body literals in the rule.

Let C be h ← b,

confidence(h ← b) = |bindings of variables for h that satisfy h ← b| / |bindings of variables for h that satisfy b|.

In the literature, the support and confidence values are obtained with the SQL queries given in [13].

The database given in Table 1 is used as a running example to illustrate the algorithm. In this example, ancestor (a) is the concept to be learned, and nine concept instances are given. Also, a background relation, namely parent (p), is provided.

For the rule a(A, B) ← p(A, yusuf), the support and confidence values are 3/9 (0.33) and 1/1 (1.0), which can be obtained by the SQL queries shown in Tables 2 and 3.
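The numbers above can be checked directly. The sketch below loads the running example of Table 1 into an in-memory database (SQLite is an assumption; the paper does not name a DBMS) and evaluates the four COUNT queries of Tables 2 and 3. Note that COUNT4 is restricted to the body constant yusuf here, which is what Definition 11's denominator (bindings satisfying the body) requires and what yields the 1/1 value quoted above.

```python
import sqlite3

# Build the running example of Table 1: ancestor (target) and parent (background).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ancestor (arg1 TEXT, arg2 TEXT);
CREATE TABLE parent   (arg1 TEXT, arg2 TEXT);
""")
conn.executemany("INSERT INTO ancestor VALUES (?, ?)", [
    ("kubra", "ali"), ("ali", "yusuf"), ("yusuf", "esra"),
    ("yusuf", "aysegul"), ("kubra", "yusuf"), ("kubra", "esra"),
    ("kubra", "aysegul"), ("ali", "esra"), ("ali", "aysegul"),
])
conn.executemany("INSERT INTO parent VALUES (?, ?)", [
    ("kubra", "ali"), ("ali", "yusuf"),
    ("yusuf", "esra"), ("yusuf", "aysegul"),
])

def scalar(sql):
    return conn.execute(sql).fetchone()[0]

# Support of a(A, B) <- p(A, yusuf): COUNT1 / COUNT2 (Table 2).
count1 = scalar("""SELECT COUNT(*) FROM
    (SELECT DISTINCT a.arg1, a.arg2 FROM ancestor a, parent p
     WHERE a.arg1 = p.arg1 AND p.arg2 = 'yusuf')""")
count2 = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT arg1, arg2 FROM ancestor)")

# Confidence: COUNT3 / COUNT4 (Table 3).
count3 = scalar("""SELECT COUNT(*) FROM
    (SELECT DISTINCT p.arg1 FROM ancestor a, parent p
     WHERE a.arg1 = p.arg1 AND p.arg2 = 'yusuf')""")
count4 = scalar("""SELECT COUNT(*) FROM
    (SELECT DISTINCT arg1 FROM parent WHERE arg2 = 'yusuf')""")

support, confidence = count1 / count2, count3 / count4
print(support, confidence)  # 3/9 and 1/1
```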

The confidence query definition produces superficially high values for the cases where the head of the concept rule includes variables not appearing in the body, as shown above, and hence it is not applicable as is. Therefore, we proposed a modification on the use of the confidence query definition [29,32].

Confidence is the ratio of the number of positive instances deducible from the rule to the number of examples deducible from the rule. In other words, it shows how strong the concept rule is. For the example rule, the high confidence value tells that it is very strong. However, of the following five deducible facts a(ali, ali), a(ali, kubra), a(ali, yusuf), a(ali, esra) and a(ali, aysegul), only three (a(ali, yusuf), a(ali, esra) and a(ali, aysegul)) exist in the database as positive instances. As a result, the example rule covers some negative instances.

In order to adapt the confidence query to concept discovery, we add type relations to the body of the concept rule corresponding to the arguments of the head predicate whose variable does not appear in the body predicates. The type tables (relations) for the arguments of the target relation are created in the database (if they do not exist). For the ancestor example, the person table is the type table, which contains all values in the domain of the corresponding argument of the target relation in the database. For the ancestor example, the person table contains five records: kubra, ali, yusuf, esra and aysegul. A type relation is named after the corresponding head predicate argument type's name and it contains a single argument whose domain is the same as the domain of the corresponding head predicate argument. The rules obtained by adding type relations are used only for computing the confidence values; for the rest of the computation, the original rules without type relations are used.

Definition 12. [CWA] The Closed World Assumption (CWA) is the presumption that what is not currently known to be true is false [53,42,44].

Definition 13. [Type-extended concept rule] A type-extended concept rule is a concept rule in which type relations, corresponding to the variable arguments of the head predicate that do not appear in the body, are added to the body of the concept rule.

In this work, we extend the concept rules to type-extended concept rules. By adding the type relations, negative instances can be deduced as in CWA. Besides this, since the type relation is always true for the instance, this modification does not affect the semantics of the concept rule. In addition, the definition of the confidence query remains intact.

As a result of this modification, the support and confidence values for the example rule are as follows:

a(A, B) ← p(A, yusuf), person(B). (s = 0.33, c = 0.6)
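The modified confidence can be verified by enumerating bindings directly. The sketch below uses plain Python sets in place of the SQL queries, with the five-person type table standing in for the person relation:

```python
# Running example data (Table 1) as Python sets.
ancestor = {("kubra", "ali"), ("ali", "yusuf"), ("yusuf", "esra"),
            ("yusuf", "aysegul"), ("kubra", "yusuf"), ("kubra", "esra"),
            ("kubra", "aysegul"), ("ali", "esra"), ("ali", "aysegul")}
parent = {("kubra", "ali"), ("ali", "yusuf"),
          ("yusuf", "esra"), ("yusuf", "aysegul")}
person = {"kubra", "ali", "yusuf", "esra", "aysegul"}  # the type table

# Head bindings (A, B) satisfying the type-extended body
# p(A, yusuf), person(B): the facts deducible from the rule.
deducible = {(a, b) for (a, c) in parent if c == "yusuf" for b in person}

# Confidence (Definition 11): covered positives over all deducible facts.
confidence = len(deducible & ancestor) / len(deducible)
print(confidence)  # 3/5 = 0.6
```

The five elements of `deducible` are exactly the five facts a(ali, ·) listed above, three of which appear in the ancestor relation.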

The confidence of a concept rule is calculated with respect to all concept instances. This is due to the fact that the covered examples remain true for the candidate rules in the following coverage steps. On the other hand, the support is calculated over uncovered concept instances.

Definition 14. [f-metric] The f-metric is a hypothesis evaluation criterion that is calculated as follows: f-metric = ((B² + 1) × conf × supp) / ((B × conf) + supp)

The user can emphasize the effect of support or confidence by changing the value of B. If the user defines B to be greater than 1, then confidence has a higher effect. On the other hand, if B has a value less than 1, then support has a higher effect. Otherwise, both support and confidence have equal weight in the evaluation.

In this work, the f-metric is used as the concept rule evaluation metric in order to select the best concept rule.

2.3. Association rule mining and APRIORI property

Association rule mining is conventionally used for discovering frequent associations among item sets. Association rule mining techniques have been adapted for multi-relational domains, as well. Although it is a descriptive method, by restricting the rule head to the concept, it is also useful for predictive approaches. In this work, there is a specific target concept to be learned, and association rule mining is employed for inducing association rules with the target concept as the only head relation.

The most popular and well-known association rule mining algorithm, as introduced in [4], is APRIORI. APRIORI utilizes an important property of frequent item sets in order to prune the candidate item set space:

Property 1. All subsets of a frequent item set must be frequent.

The contra-positive of this property says that if an item set is not frequent, then any superset of this set is also not frequent. It can be concluded that the item set space should be traversed from small item sets to large ones in order to discard the supersets of infrequent item sets from the start. In order to apply this reasoning, APRIORI reorganizes the item set space as a lattice based on the subset relation.

The search space is searched with an APRIORI-based specialization operator in the proposed method.
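Property 1 and the level-wise traversal can be illustrated on ordinary (propositional) item sets. The sketch below is a generic Apriori join-and-prune loop, not CRIS's relational specialization operator; the transaction contents are made up for illustration:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Generic level-wise frequent item set mining: a candidate of size k
    is kept only if all of its (k-1)-subsets were frequent (Property 1)."""
    n = len(transactions)
    support = lambda items: sum(items <= t for t in transactions) / n
    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result, k = set(level), 2
    while level:
        # Join step: merge frequent (k-1)-sets into size-k candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: discard candidates with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
        result |= level
        k += 1
    return result

transactions = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
freq = apriori(transactions, min_support=0.5)
```

With these transactions and a 0.5 threshold, the pairs {a,c}, {a,d}, {b,c}, {b,d} and {c,d} fail the support test, so no size-3 candidate survives the prune step and the search stops early, which is exactly the saving Property 1 buys.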

3. Related work

In this section, we present an overview of relational learning systems related to our work.

FOIL [52] is one of the earliest concept discovery systems. It is a top–down relational ILP system, which uses a refinement graph in the search process. In FOIL, negative examples are not explicitly provided; they are generated on the basis of CWA.

PROGOL [47] is a top–down relational ILP system, which is based on inverse entailment. PROGOL extends clauses by traversing the refinement lattice and reduces the hypothesis space by using a set of mode declarations given by the user, and a most specific clause (also called the bottom clause) as the greatest lower bound of the refinement graph. A bottom clause is a maximally specific clause, which covers a positive example and is derived using inverse entailment. PROGOL applies the covering approach and supports learning from positive data as in FOIL.

ALEPH [57] is similar to PROGOL, but it is possible to apply different search strategies, evaluation functions and refinement operators. It is also possible to define more settings in ALEPH, such as minimum confidence and support.

Design of algorithms for frequent pattern discovery has become a popular topic in data mining. Almost all such algorithms use the same level-wise search technique known as the APRIORI algorithm. The level-wise algorithm is based on a breadth-first search in the lattice spanned by a specialization relation between patterns. WARMR [12] is a descriptive ILP system that employs the APRIORI rule to find frequent queries having the target relation by using the support criterion.

One major difficulty in ILP is to manage the search space. The most common approach is to perform a search for hypotheses that are local optima for the quality measure. To overcome this problem, simulated annealing algorithms [2] can be used. SAHILP [56] uses simulated annealing methods instead of the covering approach in ILP for inducing hypotheses. It uses the neighborhood notion in the search space, a refinement operator similar to FOIL's, and weighted relative accuracy [37] as the quality measure.

PosILP [55] extends the propositional logic to the first-order case to deal with exceptions in a multi-class problem. It reformulates the ILP problem in the first-order possibilistic logic and redefines the ILP problem as an optimization problem. At the end, it learns a set of prioritized rules.

The proposed work is similar to ALEPH as both systems produce concept definitions from a given target. WARMR is another similar work in the sense that both systems employ APRIORI-based searching methods. Unlike ALEPH and WARMR, CRIS does not need input/output mode declarations. It only requires type specifications of the arguments, which already exist together with the relational tables corresponding to predicates. Most ILP-based systems require negative information, whereas CRIS directly works on databases which have only positive data. Similar to FOIL, negative information is implicitly described according to CWA. Finally, it uses a novel confidence-based hypothesis evaluation criterion and search space pruning method.

ALEPH and WARMR can use indirectly related relations and generate transitive rules only by using strict mode declarations. In CRIS, transitive rules are generated without the guidance of mode declarations.

There are some other studies that use aggregation in multi-relational learning. Crossmine [67] is such an ILP-based multi-relational classifier that uses TupleID propagation. Mr. G-Tree [39] is proposed to extend the concepts of propagation described in Crossmine by introducing the g-mean TupleID propagation algorithm, also known as the GTIP algorithm. CLAMF [23] extends TupleID propagation in order to efficiently perform single- and multi-feature aggregation over related tables. Decision trees were extended to the MR domain while incorporating single-feature aggregation and probability estimates for the classification labels [49]. A hierarchy of relational concept classes in order of increasing complexity is presented in [50], where the complexity depends on that of any aggregate functions used. Aggregation and selection are combined efficiently in [5].

MRDTL [40] constructs selection graphs for rule discovery. Selection Graph is a graphical language that was developed to express multi-relational patterns. These graphs can be translated into SQL or first-order logic expressions. Generalised Selection Graph is an extended version of SG that uses aggregate functions [35]. It inspired this work in defining and using aggregation; however, we followed a logic-based approach and included aggregate predicates in an ILP-based context for concept discovery.

C2D [29,28,32,30] is another concept discovery system proposed by the same research group prior to CRIS. In CRIS, some of the features, such as confidence-based pruning, are borrowed from C2D and further improved. The major difference between the two systems appears in the generalization technique, which improves rule quality, enables effective use of aggregate predicates and facilitates transitive rule generation.

4. Concept discovery in CRIS

CRIS [31] is a concept discovery system that uses first-order logic as the concept definition language and generates a set of concept rules having the target relation in the head. In this section, the basic techniques used in CRIS are described in detail.

In the first subsection, the novel pruning techniques in CRIS are introduced. In the next subsection, the concept discovery algorithm of CRIS is described in detail. Finally, in the third subsection, the inclusion of aggregation into the concept discovery process is explained.

4.1. Pruning strategies in CRIS

In CRIS, three mechanisms are utilized for pruning the search space.

The first one is a generality ordering on the concept rules based on θ-subsumption:

Strategy 1. In CRIS, candidate concept rules are generated according to the θ-subsumption definition given in Definition 9 in Section 2.

For instance, consider the following two concept rules from the ancestor example (Table 1):

C1: a(A, B) ← p(A, C).
C2: a(A, B) ← p(A, C), a(C, A).

As the heads of C1 and C2 (a(A, B)) are the same and the body of C1 is a subset of the body of C2, C1 is more general than C2 and it θ-subsumes C2.
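The generality check illustrated above can be sketched as follows. This is a hypothetical helper, not from the CRIS source: it covers only the special case shown here, where both heads are identical and no variable substitution is needed, so C1 θ-subsumes C2 exactly when the body of C1 is a subset of the body of C2.

```python
# Hypothetical helper (not from the CRIS source): the special case of
# theta-subsumption illustrated above, where both heads are identical and
# no variable substitution is needed. C1 theta-subsumes C2 if body(C1) is
# a subset of body(C2), making C1 the more general rule.

def subsumes_identity(rule1, rule2):
    """Each rule is a (head, frozenset_of_body_literals) pair."""
    head1, body1 = rule1
    head2, body2 = rule2
    return head1 == head2 and body1 <= body2

c1 = ("a(A,B)", frozenset({"p(A,C)"}))
c2 = ("a(A,B)", frozenset({"p(A,C)", "a(C,A)"}))
print(subsumes_identity(c1, c2), subsumes_identity(c2, c1))  # True False
```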

The second pruning strategy is about the use of confidence. For this strategy, we first define a "non-promising rule".

Definition 15 (Non-promising rule). Let C1 and C2 be the two parent rules of the concept rule C in the APRIORI search lattice [3]. If the confidence value of C is not higher than the confidence values of C1 and C2, then C is called a non-promising rule.

Strategy 2. In CRIS, non-promising rules are pruned from the search space.

By using this strategy, in the solution path, each specialized rule has a higher confidence value than its parents. A similar approach is used in the Dense-Miner system [8] for traditional association rule mining.

For the illustration of this technique on the ancestor example, consider the following two rules in the first level of the APRIORI lattice:

C1: a(A, B) ← p(A, C). (c = 0.6)
C2: a(A, B) ← p(C, B). (c = 0.45)

These rules are suitable for union since their head literals are the same and they have exactly one literal different from each other. The possible union rules are as follows:

C3: a(A, B) ← p(A, C), p(C, B). (c = 1.0)
C4: a(A, B) ← p(A, C), p(D, B). (c = 0.75)

C3 and C4 have higher confidence values than C1 and C2. Therefore, they are not pruned from the search space.
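The pruning test of Strategy 2 can be sketched as below. This is a hypothetical sketch, not from the CRIS source: a candidate survives only if its confidence strictly exceeds the confidence of both parents in the APRIORI lattice.

```python
# Hypothetical sketch (not from the CRIS source) of Strategy 2: a candidate
# rule survives only if its confidence is strictly higher than the
# confidence of both of its parents in the APRIORI lattice.

def is_promising(child_conf, parent1_conf, parent2_conf):
    return child_conf > parent1_conf and child_conf > parent2_conf

# C3 (c = 1.0) and C4 (c = 0.75) improve on C1 (0.6) and C2 (0.45), so they
# are kept; a child at, say, 0.55 would be pruned as non-promising.
print(is_promising(1.0, 0.6, 0.45), is_promising(0.55, 0.6, 0.45))  # True False
```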

The last pruning strategy employed in CRIS, which is also a novel approach, utilizes the primary key-foreign key relationship between the head and body relations:

Strategy 3. If a primary key-foreign key relationship exists between the head and the body predicates, the foreign key argument of the body relation can only have the same variable as the primary key argument of the head predicate in the generalization step.

For example, in the Mutagenesis database [61], the target relation is molecule(drug, boolean) and a background relation is atm(drug, atom, element, integer, charge). As there is a primary key-foreign key relationship between the molecule and atm relations through the "drug" argument, some of the rules obtained at the end of the generalization step are as follows:

molecule(A, true) ← atm(A, B, c, 22, C).
molecule(A, true) ← atm(A, B, h, C, D).
molecule(A, true) ← atm(A, B, C, 22, D).
. . .

On the basis of this idea, concept rules that have different variables for primary key-foreign key attributes are not allowed in the generalization step. For example, the rule "molecule(A, true) ← atm(B, C, c, 22, D)." is not generated in the generalization step.
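The constraint of Strategy 3 can be sketched as a simple positional check. This is a hypothetical sketch, not from the CRIS source; the argument positions are passed in explicitly for illustration.

```python
# Hypothetical check (not from the CRIS source) for Strategy 3: the
# foreign-key argument of a candidate body literal must reuse the variable
# bound to the primary-key argument of the head literal.

def respects_fk_constraint(head_args, body_args, head_pk_pos, body_fk_pos):
    return body_args[body_fk_pos] == head_args[head_pk_pos]

head = ("A", "true")                 # molecule(A, true); pk drug at position 0
ok   = ("A", "B", "c", "22", "C")    # atm(...) sharing variable A -> allowed
bad  = ("B", "C", "c", "22", "D")    # atm(...) with fresh variable B -> rejected
print(respects_fk_constraint(head, ok, 0, 0),
      respects_fk_constraint(head, bad, 0, 0))   # True False
```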

4.2. The algorithm

As shown in the flowchart given in Fig. 1, the concept rule induction algorithm of CRIS takes the target relation and background facts from

Fig. 1. The CRIS algorithm (flowchart): using the target relation and background facts in the database and the input parameters Min_sup, Min_conf and Max_depth, the algorithm calculates feasible values for the head and body relations, finds general rules (one head and one body literal) using absorption at depth 1, refines the general rules with APRIORI-based specialization while the depth is smaller than Max_depth, filters infrequent and non-strong rules, finds solution rules in the coverage step, and prints the hypothesis once all target instances are covered or the candidate rule set is empty.

Table 4
The SQL query template for finding feasible constants.

SELECT a
FROM t
GROUP BY a
HAVING COUNT(*) >= (min_sup * num_of_uncov_inst)

Table 5
The SQL query example for support calculation.

SELECT drug
FROM pte_active
GROUP BY drug
HAVING COUNT(*) >= 298 * 0.05

Table 6
The feasible constants for pte_atm.

Arg.     Constants                            SQL query
Drug     Empty set (only variable)            SELECT drug FROM pte_atm GROUP BY drug HAVING COUNT(*) >= 9189*0.05
Atom     Empty set (only variable)            SELECT atom FROM pte_atm GROUP BY atom HAVING COUNT(*) >= 9189*0.05
Element  c, h, o (also variable)              SELECT element FROM pte_atm GROUP BY element HAVING COUNT(*) >= 9189*0.05
Integer  3, 10, 12 (also variable)            SELECT integer FROM pte_atm GROUP BY integer HAVING COUNT(*) >= 9189*0.05
Charge   19 range constants (also variable)   SELECT charge FROM pte_atm ORDER BY charge

748 Y. Kavurucu et al. / Knowledge-Based Systems 23 (2010) 743–756

the database. It works under minimum support, minimum confidence and maximum rule depth parameters. Rule construction starts with the calculation of feasible values for the head and body relations in order to generate the most general rules with a head and a single body predicate. In the generalization step, the primary key-foreign key relationship (Strategy 3) is also used in most general rule construction.

After the generalization step, the concept rule space is searched with an APRIORI-based specialization operator. In this step, θ-subsumption (Strategy 1) is employed for candidate rule generation. In the refinement graph, infrequent rules are pruned. In addition, on the basis of Strategy 2, rules whose confidence values are not higher than those of their parents are also eliminated.

When the maximum rule depth is reached or no more candidate rules can be found, the rules that are below the confidence threshold are eliminated from the solution set. Among the produced strong and frequent rules, the best rule (the one with the highest f-metric value) is selected. The rule search is repeated for the remaining concept instances that are not in the coverage of the generated hypothesis rules. At the end, some uncovered positive concept instances may remain due to the user settings for the thresholds. In the rest of this section, the main steps of the algorithm are described.

Generalization: the generalization step of the algorithm constructs the most general two-literal rules by considering all target instances together. In this way, the quality of the rule induction does not depend on the order of target instances. This novel technique proceeds as follows.

For a given target relation such as t(A, B), the induced rule has a head including either a constant or a variable for each argument of t. Each argument can be handled independently in order to find the feasible head relations for the hypothesis set. As an example, for the first argument A, a constant must appear at least min_sup * number_of_uncovered_instances times in the target relation so that it can be used as a constant in an induced rule. In order to find the feasible constants for the attribute A, the SQL statement given in Table 4 is executed.

For example, in the PTE-1 data set [60], the target relation pte_active has only one argument (drug). Initially, there are 298 uncovered instances in pte_active. When the min_sup parameter is set to 0.05, the SQL statement given in Table 5 returns an empty set, which means there are no feasible constants for the argument drug of pte_active. Therefore, the argument drug of pte_active can only be a variable in the head of the candidate concept rules.

In the same manner, for a background relation such as r(A, B, C), if a constant appears at least min_sup * number_of_instances times for the same argument in r, then it is a frequent value for that argument of r and may take part in a solution rule for the hypothesis set. As an example, in the PTE-1 database, pte_atm(drug, atom, element, integer, charge) is a background relation and the feasible constants which can take part in the hypothesis set can be found for each argument of pte_atm by using the above SQL statement template.
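The feasibility test behind the SQL template in Table 4 can be sketched in-memory as follows. This is a hypothetical sketch, not from the CRIS source; the toy element column is illustrative only.

```python
# Hypothetical sketch (not from the CRIS source) of the feasible-constant
# test behind the SQL template in Table 4: a value is kept as a candidate
# constant when it occurs at least min_sup * number_of_instances times.
from collections import Counter

def feasible_constants(values, min_sup):
    threshold = min_sup * len(values)
    return sorted(v for v, n in Counter(values).items() if n >= threshold)

# Toy element column: only 'c' and 'h' clear a 0.2 support threshold.
column = ["c", "c", "c", "h", "h", "h", "o", "n", "f", "cl"]
print(feasible_constants(column, 0.2))   # ['c', 'h']
```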

For numeric attributes, due to the support threshold, it is not feasible to seek acceptable constants. For this reason, feasible ranges


are given through less-than/greater-than operators on constants. As an example, for the charge argument of the pte_atm predicate, all values in the database are sorted in ascending order. For min_sup 0.05, there should be 19 (which is (1/0.05) − 1) border values for each less-than/greater-than operator. If the pte_atm relation has 1000 records, after ordering from smallest to largest, the less-than/greater-than operator is applied to the 51st constant, the 101st constant and so on. In addition to these constants denoting feasible ranges, this argument can be a variable, as well.
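The border-value computation described above can be sketched as follows. This is a hypothetical sketch, not from the CRIS source, assuming the record count is an exact multiple of the interval size as in the 1000-record example.

```python
# Hypothetical sketch (not from the CRIS source) of the border-value
# computation described above: with support threshold s over n sorted
# records there are (1/s) - 1 borders, taken every n*s records (the 51st,
# 101st, ... values in the 1000-record example).

def border_values(values, min_sup):
    ordered = sorted(values)
    step = int(len(ordered) * min_sup)       # records per interval
    count = int(1 / min_sup) - 1             # number of border values
    return [ordered[i * step] for i in range(1, count + 1)]

vals = list(range(1, 1001))                  # stand-in for 1000 charge values
borders = border_values(vals, 0.05)
print(len(borders), borders[0], borders[1], borders[-1])   # 19 51 101 951
```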

The feasible constants and the SQL statements used for each argument of pte_atm are shown in Table 6.

As a result, the pte_atm relation has 320 (that is, 1*1*4*4*20) body relations for each possible head relation in the generalization step of CRIS. Example generalized rules are listed in Table 7.

In the ancestor example, the target and background relations can only have variables for the arguments in the hypothesis set. The constructed rules are listed in Table 8.

Specialization: CRIS refines the two-literal concept descriptions with an APRIORI-based specialization operator that searches the concept rule space in a top-down manner, from general to specific. As in APRIORI, the search proceeds level-wise in the hypothesis space and is mainly composed of two steps: frequent rule set selection from candidate rules, and candidate rule set generation as refinements of the frequent rules in the previous level. The standard APRIORI search lattice is extended in order to capture concept rules, and the candidate generation and frequent pattern selection tasks are customized for first-order concept rules.

The candidate rules for the next level of the search space are generated in three important steps:

1. Frequent rules of the previous level are joined to generate the candidate rules via the union operator. In order to apply the union operator to two frequent concept rules, the rules must have the same head literal, and their bodies must have all but one literal in common. Therefore, the union-ed rule is θ-subsumed by its parents. Since only the rules that have the same head literal are combined, the search space is partitioned into disjoint APRIORI sub-lattices according to the head literal. In addition to this, the

Table 7
Example generalized rules for the PTE-1 data set.

pte_active(A) ← pte_atm(A, B, c, 3, X), X ≤ −0.133.
pte_active(A) ← pte_atm(A, B, c, 3, X), X ≥ −0.133.
pte_active(A) ← pte_atm(A, B, c, 3, C).
pte_active(A) ← pte_atm(A, B, c, 10, X), X ≤ −0.133.
pte_active(A) ← pte_atm(A, B, c, 10, X), X ≥ −0.133.
pte_active(A) ← pte_atm(A, B, c, 10, C).
pte_active(A) ← pte_atm(A, B, c, 22, X), X ≤ −0.133.
pte_active(A) ← pte_atm(A, B, c, 22, X), X ≥ −0.133.
pte_active(A) ← pte_atm(A, B, c, 22, C).
pte_active(A) ← pte_atm(A, B, c, C, X), X ≤ −0.133.
pte_active(A) ← pte_atm(A, B, c, C, X), X ≥ −0.133.
pte_active(A) ← pte_atm(A, B, c, C, D).
pte_active(A) ← pte_atm(A, B, h, 3, X), X ≤ −0.133.
pte_active(A) ← pte_atm(A, B, h, 3, X), X ≥ −0.133.
pte_active(A) ← pte_atm(A, B, h, 3, C).

Table 8
Generalized rules for the ancestor data set.

a(A, B) ← p(A, B).    a(A, B) ← p(A, C).
a(A, B) ← p(B, A).    a(A, B) ← p(B, C).
a(A, B) ← p(C, A).    a(A, B) ← p(C, B).
a(A, B) ← p(C, D).    a(A, B) ← a(A, B).
a(A, B) ← a(A, C).    a(A, B) ← a(B, A).
a(A, B) ← a(B, C).    a(A, B) ← a(C, A).
a(A, B) ← a(C, B).    a(A, B) ← a(C, D).

system does not combine rules that are specializations of the same candidate rule produced in the second step of the candidate rule generation task, in order to prevent logical redundancy in the search space.

2. For each frequent union rule, a further specialization step is employed that unifies the existential variables of the same type in the body of the rule. In this way, rules with relations indirectly bound to the head predicate can be captured.

3. Except for the first level, candidate rules whose confidence values are not higher than their parents' confidence values are eliminated. If a concept rule has a confidence value of 1, it is not further specialized in the following steps (Strategy 2).
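The union operator in step 1 can be sketched as follows. This is a hypothetical sketch, not from the CRIS source: two frequent rules join when they share the head literal and all but one body literal; variable standardization (which also yields variants such as C4 in Section 4.1) is omitted here.

```python
# Hypothetical sketch (not from the CRIS source) of the union operator in
# step 1: two frequent rules join when they share the head literal and all
# but one body literal; the child takes the union of both bodies.

def union_rules(rule1, rule2):
    head1, body1 = rule1
    head2, body2 = rule2
    # Joinable only if heads match and exactly one literal differs per body.
    if head1 != head2 or len(body1 ^ body2) != 2:
        return None
    return (head1, body1 | body2)

c1 = ("a(A,B)", frozenset({"p(A,C)"}))
c2 = ("a(A,B)", frozenset({"p(C,B)"}))
head, body = union_rules(c1, c2)
print(head, sorted(body))                 # a(A,B) ['p(A,C)', 'p(C,B)']
```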

Evaluation: once the system constructs the search tree consisting of the frequent and confident candidate rules for that round, it eliminates the rules having confidence values below the confidence threshold. Among the remaining strong rules, the system decides which rule in the search tree represents a better concept description than the other candidates according to the f-metric definition given in Section 2. The user can emphasize the effect of support or confidence by changing the value of B.

Coverage: after the best rule is selected, the target instances covered by this rule are determined and removed from the concept instance set. The main loop continues until all concept instances are covered or no more candidate rules can be generated for the uncovered concept instances.

In the ancestor example, at the end of the first coverage step, the following rule, which covers 7 of the target instances, is induced:

a(A, B) ← p(A, C), a(C, B).

In the second coverage step, the following rule, which covers all of the uncovered target instances, is induced:

a(A, B) ← p(A, B).

4.3. Aggregation in CRIS

An important feature for a concept discovery method is the ability to incorporate aggregated information into the concept discovery process. Such information becomes descriptive, as in the example "the total charge on a compound is descriptive for the usefulness or harmfulness of the compound". Therefore, a concept discovery system needs aggregation capability in order to construct high-quality rules (with high accuracy and coverage) for such domains.

In relational database queries, aggregate functions characterize groups of records gathered around a common property. In concept discovery, aggregate functions are utilized to construct aggregate predicates that capture aggregate information over one-to-many relationships. Conditions on the aggregation, such as count < 10 or sum > 100, may define the basic characteristics of a given concept better. For this reason, in CRIS, we extend the background knowledge with aggregate predicates in order to characterize the structural information that is stored in tables and the associations between them [33].

Definition 16. An Aggregate Predicate (P) is a predicate that defines aggregation over an attribute of a given predicate (a). We use a notation similar to the one given in [24] to represent the general form of aggregate predicates as follows:

P^{a,b}_{c,x}(c, r)

where a is the predicate over which the Aggregate Function (x) is computed (COUNT, MIN, MAX, SUM and AVG are the frequently used functions), Key (c) is the set of arguments that will form the key for P, and Aggregate Value (r) is the value of x applied to the set of values defined by the Aggregate Variable List (b).

The Mutagenesis data set [61] is used for illustrating the usage of aggregate functions. In the data set, the target relation is molecule and atom is a background relation. In addition, there is a primary key-foreign key relationship between these two relations through the drug argument.

atom_count^{atom, atom-id}_{drug, COUNT}(drug, cnt).

The above notation represents the aggregate predicate atom_count(drug, cnt), which keeps the total number of atoms for each drug.

Definition 17. An aggregate query is an SQL statement including aggregate functions. The instances of aggregate predicates are created by using an aggregate query template. Given P^{a,b}_{c,x}(c, r), the corresponding aggregate query is given in Table 9.

For example, the instances of the atom_count aggregate predicate on the atom relation are constructed by the query given in Table 10.
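Instantiating the aggregate query template of Table 9 can be sketched as below. This is a hypothetical sketch, not from the CRIS source; it simply fills the template slots from the components of an aggregate predicate.

```python
# Hypothetical sketch (not from the CRIS source): instantiating the
# aggregate query template of Table 9 from the components of an aggregate
# predicate P^{a,b}_{c,x}(c, r).

def aggregate_query(relation, key, agg_var, func, result):
    return (f"SELECT {key}, {func}({agg_var}) AS {result} "
            f"FROM {relation} GROUP BY {key}")

print(aggregate_query("atom", "drug", "atom-id", "COUNT", "cnt"))
# SELECT drug, COUNT(atom-id) AS cnt FROM atom GROUP BY drug
```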

Definition 18. An aggregate rule is a concept rule which has at least one aggregate predicate in the body of the rule.

An example aggregate rule is:

molecule(d1, true) ← atom_count(d1, A), A ≥ 28.

As a more comprehensive example, the PTE-1 data set [60] is used for explaining how aggregation is used in concept discovery in the proposed system. There is a one-to-many relationship between the pte_active and pte_atm relations over the drug argument. A similar relation exists between the pte_active and pte_bond tables. Also, there is a one-to-many relationship between the pte_atm and pte_bond relations over the atm-id argument.

pte_atm_count^{atom, atm-id}_{drug, COUNT}(drug, cnt) is an example aggregate predicate that can be defined in the PTE-1 data set. For simplicity, we abbreviate it as pte_atm_count(drug, cnt), which represents the number of atoms for each drug. The instances of the pte_atm_count(drug, cnt) aggregate predicate on the pte_atm relation are constructed by the query given in Table 11.

All aggregate predicates defined on the PTE-1 data set, their descriptions and the corresponding SQL query definitions are listed in Table 12.

Aggregate predicates have numeric attributes by their nature. Therefore, in order to add aggregate predicates into the system, numeric attribute types should also be handled. Since it is not useful

Table 9
The SQL template for aggregate predicates.

SELECT c, x(b) AS r
FROM a
GROUP BY c

Table 10
SQL query for the predicate atom_count(drug, cnt).

SELECT drug, COUNT(atom-id) AS cnt
FROM atom
GROUP BY drug

Table 11
SQL statement for the aggregate predicate pte_atm_count(drug, cnt).

SELECT drug, COUNT(atm-id) AS cnt
FROM pte_atm
GROUP BY drug

and feasible to define concepts on specific numeric values, in this work, numeric attributes are considered only together with comparison operators. For example, the pte_atm relation in the above example has the argument charge, which has floating-point values. It is infeasible to search for a rule such as: a drug is active if it has an atom with charge equal to −0.117. As there are many possible numeric values in the relation, such a rule would probably be eliminated according to the minimum support criterion. Instead, searching for drugs which have a charge larger/smaller than some threshold value is more feasible. For this purpose, numeric attributes are handled as described below.

As the first step, the domains of the numeric attributes are explicitly defined as infinite in the generalization step. For the infinite attributes, concept rules are generated on the basis of the following strategy.

Strategy 4. For a given target concept t(a, x) and a related fact such as p(a, b, num), where a and b are nominal values and num is a numeric value, instead of a single rule, the following two rules are generated:

t(a, x) ← p(a, b, A), A ≥ num.
t(a, x) ← p(a, b, A), A ≤ num.

In order to find the most descriptive num value, the basic method is to order the domain values for attribute A and define the intervals with respect to the given support threshold. In this way, a set of rules describing the interval borders is generated. This method is described in Section 4.2. It is also applicable for the numeric attributes of the aggregate predicates. However, the number of generalized rules increases sharply under a low support threshold. For this reason, in order to improve time efficiency, a simplification is employed and only the median element of the domain is selected as the num value.
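The median simplification of Strategy 4 can be sketched as follows. This is a hypothetical sketch, not from the CRIS source; the charge values and rule strings are illustrative only.

```python
# Hypothetical sketch (not from the CRIS source) of the simplification
# described above: only the median element of the numeric domain is used
# as the num value, giving one >= rule and one <= rule per numeric argument.

def median_split_rules(head, body, var, values):
    ordered = sorted(values)
    num = ordered[len(ordered) // 2]       # median element of the domain
    return [f"{head} <- {body}, {var} >= {num}.",
            f"{head} <- {body}, {var} <= {num}."]

charges = [-0.4, -0.133, -0.02, 0.1, 0.25]   # toy charge domain
for rule in median_split_rules("pte_active(A, true)",
                               "pte_atm(A, B, c, 22, X)", "X", charges):
    print(rule)
```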

The integration of aggregate predicates into the concept rule generation process can be summarized as follows. One-to-many relationships between the target concept and the background relations are defined on the basis of the schema information. Under these relationships, aggregate predicates are generated by using the SQL template described earlier in this section. In the generalization step, the instances of these predicates are considered for rule generation.

As an example, for the pte_atm_count predicate defined in the PTE-1 data set, the following example rules are created in the generalization step:

pte_active(A, true) ← pte_atm_count(A, X), X ≥ 22.
pte_active(A, true) ← pte_atm_count(A, X), X ≤ 22.

Including aggregate predicates in the concept discovery process increases the size of the background relations. In addition, handling numeric attributes for the aggregate predicates further increases the number of aggregate predicate instances. Therefore, this inclusion increases the concept discovery duration. However, it is a necessary feature that increases rule quality in certain domains. The most effective way to use this feature is to make it optional in the rule generation mechanism. For instance, for domains that do not include numeric attributes, it is not useful and this property should be switched off.

5. Experimental results

A set of experiments was performed to test the performance of CRIS on well-known problems in terms of coverage and predictive accuracy. Coverage denotes the number of target instances of the test data set covered by the induced hypothesis set. Predictive accuracy denotes the sum of correctly covered true positive and true negative instances over the sum of true positive, true negative, false positive and false negative instances.2 The experiments were run on a computer with an Intel Core Duo 1.6 GHz processor and 1 GB memory.

Table 12
The aggregate predicates in the PTE-1 data set.

Predicate                   Desc.                               SQL query definition
pte_atm_count(drug, cnt)    Number of atoms for each drug       SELECT drug, COUNT(atm-id) FROM pte_atm GROUP BY drug
pte_bond_count(drug, cnt)   Number of bonds for each drug       SELECT drug, COUNT(atm-id) FROM pte_bond GROUP BY drug
pte_atm_b_cnt(atm-id, cnt)  Number of bonds for each atom       SELECT atm-id, COUNT(atm-id) FROM pte_bond GROUP BY atm-id
pte_charge_max(drug, mx)    Max charge of the atoms in a drug   SELECT drug, MAX(charge) FROM pte_atm GROUP BY drug
pte_charge_min(drug, mn)    Min charge of the atoms in a drug   SELECT drug, MIN(charge) FROM pte_atm GROUP BY drug

Table 13
Summary of the benchmark data sets used in the experiments.

Data set     No. of Pred.  No. of Inst.  No. of AggPred  Min. sup.  Min. conf.
Same-gen     2             408           0               0.3        0.6
Mesh         26            1749          0               0.1        0.1
PTE-1        32            29267         5               0.1        0.7
Mutagenesis  26            15003         0               0.1        0.7
Diterpene    22            46593         0               0.05       0.8
Alzheimer    35            2505          0               0.25       0.8
Satellite    31            18732         0               0.1        0.7
Eastbound    12            196           0               0.3        0.6
Elti         9             224           0               0.2        0.6

Table 14
Descriptions of the benchmark data sets used in the experiments.

Data set     Description
Same-gen     An actual family data set containing recursive relationships
Mesh         A sparse data set about learning rules to determine the number of elements on each edge of the mesh
PTE-1        The first data set of the Predictive Toxicology Evaluation (PTE) project, to determine the carcinogenic effects of chemicals, in terms of numeric arguments and aggregate predicates
Mutagenesis  A data set about chemicals in which the aim is to determine the mutagenicity of each drug
Diterpene    A data set about diterpenes containing numeric attributes, where the aim is to identify their skeletons
Alzheimer    A data set concerning the design of analogues to the Alzheimer's disease drug tacrine, without numeric attributes and aggregate predicates
Eastbound    A data set including information about trains and their indirectly related facts
Satellite    A data set about diagnosis of power-supply failures in a communications satellite
Elti         An actual family data set containing transitive relationships
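The predictive accuracy measure can be sketched as follows. This is a hypothetical sketch, not from the CRIS source; the instance counts are illustrative only.

```python
# Hypothetical sketch (not from the CRIS source) of the predictive accuracy
# measure used in the experiments: correctly classified instances over all
# classified instances.

def predictive_accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts only: 44 correct out of 50 classified instances.
print(predictive_accuracy(30, 14, 3, 3))   # 0.88
```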

The benchmark data sets used in the experiments are summarized in Tables 13 and 14.

In this work, the experimental results on these benchmark data sets are sometimes given with different evaluation metrics. This is due to the fact that we refer to the results of related work given in the literature, and they are not available under the same metrics. For the mesh data set, coverage results are given, whereas for PTE, accuracy is the measure used in the comparisons. This variation stems from the different characteristics of the data sets. For instance, since mesh is a sparse data set, an increase in coverage is a more valuable quality than an increase in accuracy. Similarly, in the literature, the generated rules and the parameter values, such as support and coverage, under which they are generated are not available. We present the rules generated by our algorithm for some of the data sets. However, we could not do this for all data sets since comparable results from other systems are not available.

The rest of this section is organized as follows. In Section 5.1, performance for recursive rule discovery is tested and evaluated. Section 5.2 presents the performance results for rule discovery

2 In order to find the number of false positive and false negative instances, the test data set is extended with the dual of the data set under CWA.

on sparse data sets. Section 5.3 includes the experiments on the accuracy of the generated rules. This subsection also includes the performance evaluation of the use of aggregate predicates. Lastly, in Section 5.4, the performance of the proposed method for the discovery of transitive rules is presented.

5.1. Performance evaluation for linear recursive rule discovery

One of the interesting test cases that we have used is a complex family relation, the same-generation learning problem. In this experiment, only linear recursion is allowed and the B value is set to 1. We set the confidence threshold as 0.6, the support threshold as 0.3 and the maximum depth as 3.

In the data set, 344 pairs of actual family members are given as positive examples of the same-generation (sg) relation. Additionally, 64 background facts are provided to describe the parental (p) relationships in the family. The tables sg and p have two arguments of person type. As there are 47 persons in the examples, the person table (type table) has 47 records.

The solutions under different evaluation criteria are given in Table 15 (the parameters in lower-case letters are constants that exist in the data set). In this experiment, the effect of using confidence and using the f-metric for selecting the best rule is evaluated. The row titles Conventional (as described in Definition 13) and Improved (as described in Definition 15) denote the use of the conventional and the proposed confidence query definitions, as defined in

Table 16
Test results for the mesh-design data set.

System  Coverage (over 55 records)
CRIS    29
ALEPH   26
PosILP  23
SAHILP  21
MFOIL   19
PROGOL  17
GOLEM   17
FOIL    17


Section 2. The sub-row titles Confidence and f-metric denote the use of confidence and the f-metric as the rule evaluation metric, respectively.

As seen from the results in Table 15, the improved confidence evaluation can find better rules than the conventional confidence evaluation according to the support and confidence values of the induced hypothesis. With the improved confidence, the f-metric produces better rules than using only confidence for hypothesis evaluation.

The first two rules of the solution obtained using f-metric evaluation with improved confidence show that the same-generation relation is a symmetric relation, and the third rule forms the base rule for the recursive solution. The rules induced with this setting cover all the target instances and are very strong, with 100% accuracy. The discovered rules are the same as the standard solution for the same-generation problem in the literature.

On the other hand, when conventional confidence is used in evaluation, recursive rules are not discovered; therefore, the accuracy is very low. When improved confidence is used, recursive rules can be discovered. However, although they have high accuracy, the rules generated under confidence-based evaluation are data-specific. Therefore, f-metric evaluation with improved confidence clearly produces the best set of rules compared to the others.

For this data set, ALEPH, PROGOL and GOLEM cannot find a solution under default settings. Under strong mode declarations and constraints, ALEPH finds the following hypothesis:

sg(A, B) ← p(C, A), p(C, B).
sg(A, B) ← sg(A, C), sg(C, B).
sg(A, B) ← p(C, A), sg(C, D), p(D, B).

On the other hand, PROGOL can only find the following rule:

sg(A, B) ← sg(B, C), sg(C, A).

5.2. Performance evaluation on sparse data

In mechanical engineering, physical structures are represented by a finite number of elements (a mesh) to sufficiently minimize the errors in the calculated deformation values. A mesh is a grid that is composed of points called nodes. It is programmed to contain the material and structural properties which define how the structure will react to certain loading conditions. Nodes are assigned at a certain density throughout the material depending on the anticipated stress levels of a particular area. The problem is to determine an appropriate mesh resolution for a given structure that results in accurate deformation values.

Table 15
Rules discovered for the same-generation data set.

Conventional confidence, conf.-based:
  sg(A, B) ← p(C, A).
  sg(A, B) ← p(C, B).
Conventional confidence, f-metric-based:
  sg(A, B) ← p(C, A).
  sg(A, B) ← p(C, B).
Improved confidence, conf.-based:
  sg(A, B) ← sg(C, D), p(C, A), p(D, B).
  sg(A, B) ← sg(A, neriman), p(yusuf, B).
  sg(A, B) ← sg(B, ali), p(mediha, A).
  sg(A, B) ← p(yusuf, A), p(yusuf, B).
  sg(A, B) ← p(mediha, A), p(mediha, B).
  sg(A, B) ← p(C, A), p(C, B).
Improved confidence, f-metric-based:
  sg(A, B) ← sg(C, D), p(C, A), p(D, B).
  sg(A, B) ← sg(C, D), p(C, B), p(D, A).
  sg(A, B) ← p(C, A), p(C, B).

Mesh design is, in fact, the determination of the number of elements on each edge of the mesh. The task is to learn rules to determine the number of elements for a given edge in the presence of background knowledge such as the types of edges, boundary conditions, loadings and geometric positions.

Four different structures, called b–e in [15], are used for learning in this experiment. Then, structure a is used for testing the accuracy and coverage of the induced rules. The number of elements on each edge of these structures is given as positive concept instances, in the form mesh(Edge, NumberOfElements). An example instance such as (c15, 8) means that edge 15 of structure c should be divided into 8 sub-edges.

There are 223 positive training examples and 1474 background facts in the data set. The target relation mesh_train has two arguments of element and integer type. The type tables element and integer are created with 278 and 13 records, respectively. The test relation mesh_test has 55 examples.

For this experiment, recursion is disallowed, the support and confidence thresholds are set as 0.1, B is set as 1 and the maximum depth is set as 3. The details of the results and the coverage of previous systems are shown in Table 16.

Since mesh is a sparse data set, finding rules with high coverage is a hard task. For this reason, an increase in coverage is a more valuable quality than an increase in accuracy. For this special and hard case, CRIS can find concept rules with higher coverage than the previous systems.

5.3. Accuracy evaluation of concept discovery

5.3.1. Experiments on the PTE-1 data set
A large percentage of cancer incidents stems from environmental factors, such as carcinogenic compounds. The carcinogenicity tests of compounds are necessary to prevent cancers; however, the standard bioassays of chemicals on rodents are really time-consuming and expensive. Therefore, the National Toxicology Program (NTP) of the US National Institute for Environmental Health Sciences (NIEHS) started the Predictive Toxicology Evaluation (PTE) project in order to relate the carcinogenic effects of

Table 17Predictive accuracies for PTE-1.

Method Type Pred. acc.

CRIS (with aggr.) ILP + DM 0.88CRIS ILP + DM 0.86Ashby Chemist 0.77PROGOL ILP 0.72RASH Biol. potency an. 0.72C2D (with aggr.) ILP + DM 0.70TIPT Propositional ML 0.67Bakale Chem. reactivity an. 0.63Benigni Expert-guided regr. 0.62DEREK Expert system 0.57TOPCAT Statistical disc. 0.54COMPACT Molecular modeling 0.54

Y. Kavurucu et al. / Knowledge-Based Systems 23 (2010) 743–756 753

chemicals on humans to their substructures and properties usingmachine learning methods [14].

In the NTP program, the tests conducted on rodents result in a database of more than 300 compounds classified as carcinogenic or non-carcinogenic. Among these compounds, 298 are separated as the training set, 39 form the test set of the first PTE challenge (PTE-1) and the other 30 chemicals constitute the test set of the second PTE challenge (PTE-2) for the data mining programs [60].

The background knowledge has roughly 25,500 facts [59]. The target relation pte_active has two arguments of drug and bool type. The primary key for the target relation is drug and it exists in all background relations as a foreign key. The type tables drug and bool are created with 340 and 2 (true/false) records, respectively.

For this experiment, recursion is disallowed, the maximum rule length is set to 3 predicates and the support and confidence thresholds are set as 0.1 and 0.7, respectively. The predictive accuracy of the hypothesis set is computed as the proportion of the sum of the carcinogenic concept instances classified as positive and the non-carcinogenic instances classified as negative to the total number of concept instances that the hypothesis set classifies.
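The accuracy computation described above can be sketched as follows; the helper name and the toy labels are hypothetical, used only to illustrate the proportion being measured.

```python
def predictive_accuracy(predictions, labels):
    """Fraction of classified instances the hypothesis gets right:
    carcinogenic instances predicted positive plus non-carcinogenic
    instances predicted negative, over all classified instances."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

# Toy example: 8 of 10 instances are classified correctly.
labels      = [True, True, True, False, False, True, False, True, False, True]
predictions = [True, True, False, False, False, True, True, True, False, True]
acc = predictive_accuracy(predictions, labels)  # 0.8
```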

For the PTE-1 data set, the aggregate predicates given in Table 12 are defined and their instances are added to the background information. An example induced rule including an aggregate predicate is as follows:

pte_active(A, false) ← pte_atm(A, B, c, 22, X), X ≥ −0.020, pte_has_property(A, salmonella, n), pte_has_property(A, mouse_lymph, p).
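The construction of aggregate predicate instances from a background relation can be sketched as below. The relation layout and the helper function are assumptions for illustration, not CRIS's actual implementation: facts are grouped on a key argument and an aggregate function (sum, max, min, count) is applied to a value argument, producing new facts added to the background knowledge.

```python
from collections import defaultdict

# Hypothetical background relation: pte_atm(Drug, AtomId, Element, Type, Charge)
pte_atm = [
    ("d1", "a1", "c", 22, -0.13),
    ("d1", "a2", "c", 22,  0.02),
    ("d2", "a3", "o", 40, -0.38),
]

def aggregate_predicate(facts, key_pos, val_pos, agg):
    """Build aggregate facts (Key, AggValue) by grouping facts on the
    argument at key_pos and applying agg to the values at val_pos."""
    groups = defaultdict(list)
    for fact in facts:
        groups[fact[key_pos]].append(fact[val_pos])
    return {(key, agg(vals)) for key, vals in groups.items()}

# e.g. a hypothetical pte_atm_charge_sum(Drug, Sum) background predicate
charge_sum = aggregate_predicate(pte_atm, key_pos=0, val_pos=4, agg=sum)
```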

The predictive accuracies of the state-of-the-art methods and CRIS for the PTE-1 data set are listed in Table 17. As seen from the table, CRIS achieves the highest predictive accuracy among the compared systems. The reader may refer to [59,62,27,6,9,54,7,21,41] for more information on the compared systems in Table 17.

Within this experiment, the effect of including aggregate predicates on execution time is analyzed as well. For the experiments, the proposed method is applied on the PTE-1 data set with none to five aggregate predicates included in the background knowledge. The result is presented in Fig. 2. As seen in the figure, a linear increase in execution time is observed with the linear increase in the number of included aggregate predicates. The load basically comes from numeric attribute handling. For domains where the aggregate predicates are descriptive for the concept, the experimentally observed increase in execution time can be tolerated.

Fig. 2. Execution time graph for concept discovery with aggregation.

5.3.2. Experiments on mutagenesis data set

In this experiment, we have studied the mutagenicity of 230 compounds listed in [61]. We use the regression-friendly data set which has 188 compounds. The target relation molecule has two arguments of drug and bool type. The primary key for the target relation is drug and it exists in all background relations as a foreign key. The type tables drug and bool are created with 230 and 2 (true/false) records, respectively.

In the literature [58], five levels of background knowledge for Mutagenesis are defined, where Bi ⊆ Bi+1 for i = 0..3. In this experiment, B2 is used.

In this experiment, recursion is disallowed, the support threshold is set as 0.1, the confidence threshold as 0.7, B is set as 1 and the maximum depth is set as 3.

The predictive accuracies of the state-of-the-art methods and the proposed method on the Mutagenesis data are listed in Table 18 [40].

As seen from the results, CRIS has the highest accuracy in this experiment.

5.3.3. Experiments on diterpene data set

In another experiment on accuracy performance, we used the diterpenes data set [20]. The data contains information on 1503 diterpenes with known structure. The predicate red(Mol, Mult, Freq) keeps the measured NMR-spectra information. For each of the 20 carbon atoms in the diterpene skeleton, the multiplicity and frequency values are recorded. The predicate prop(Mol, Satoms, Datoms, Tatoms, Qatoms) counts the atoms that have multiplicity s, d, t, or q, respectively. The data set contains additional unary predicates in order to describe to which of the 23 classes a compound belongs.
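The relationship between the red and prop predicates can be illustrated with a small sketch; the facts below are invented for the example and do not come from the actual data set.

```python
from collections import Counter

# Hypothetical red(Mol, Mult, Freq) facts: one NMR multiplicity/frequency
# measurement per carbon atom of a molecule.
red = [
    ("m1", "s", 170.2), ("m1", "d", 52.1), ("m1", "d", 48.9),
    ("m1", "t", 33.0),  ("m1", "q", 21.5), ("m1", "q", 17.8),
]

def prop(mol, red_facts):
    """Derive prop(Mol, Satoms, Datoms, Tatoms, Qatoms): the number of
    atoms of the molecule with multiplicity s, d, t and q, respectively."""
    counts = Counter(mult for mol_id, mult, _freq in red_facts if mol_id == mol)
    return (mol, counts["s"], counts["d"], counts["t"], counts["q"])
```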

In this experiment, the support threshold is 0.05, the confidence threshold is 0.8, and the maximum depth for the rules is 2. The predictive accuracies of the previous systems given in [5,55] and CRIS are shown in Table 19.

Table 18
Predictive accuracies for the Mutagenesis data set.

Method    Predictive accuracy
CRIS      0.95
PosILP    0.90
SAHILP    0.89
MRDTL     0.88
C2D       0.85
TILDE     0.85
PROGOL    0.83
FOIL      0.83

Table 19
Predictive accuracies for the Diterpene data set.

Method    Predictive accuracy
CRIS      0.98
RIBL      0.91
PosILP    0.91
TILDE     0.90
ICL       0.86
SAHILP    0.84
FOIL      0.78

Table 21
Rules induced by C2D for the elti data set.

elti(A, B) ← husband(C, A), husband(D, B), brother(C, D)
elti(A, B) ← husband(C, A), husband(D, B), brother(D, C)
elti(A, B) ← husband(C, A), wife(B, D), brother(C, D)
elti(A, B) ← husband(C, A), wife(B, D), brother(D, C)
elti(A, B) ← husband(C, B), wife(A, D), brother(C, D)
elti(A, B) ← husband(C, B), wife(A, D), brother(D, C)
elti(A, B) ← wife(A, C), wife(B, D), brother(C, D)
elti(A, B) ← wife(A, C), wife(B, D), brother(D, C)

Table 20
The relations in the kinship data set.

Relation name   Argument types
Aunt            Person, person
Brother         Person, person
Daughter        Person, person
Father          Person, person
Husband         Person, person
Mother          Person, person
Nephew          Person, person
Niece           Person, person
Sister          Person, person
Son             Person, person
Uncle           Person, person
Wife            Person, person


5.3.4. Experiments on alzheimer data set

The Alzheimer data set is about the drug design problem for Alzheimer's disease. In the data set, four biological/chemical properties are considered and the target is to discover rules for them. These properties are maximization of acetyl cholinesterase inhibition, maximization of inhibition of amine re-uptake, maximization of the reversal of scopolamine-induced memory impairment and minimization of toxicity. In the data set, for each of the biological/chemical properties, instances are given as comparisons of the drugs for that property. For example, less_toxic(d1, d2) indicates that the toxicity of d1 is less than that of d2.

For this data set, the following rule is discovered by CRIS:

better_acinh(A, B) :- alk_groups(A, C), alk_groups(B, D), gt(D, C), ring_substitution(B, 1).

The same rule is discovered by GOLEM, as reported in [34]. There is no other work that reports performance results on this data set. For this data set, CRIS can discover the concept with an accuracy of 0.893.

5.3.5. Experiments on satellite data set

The satellite data set is about the temporal fault diagnosis of power-supply failures in a communication satellite [1]. In the data set, battery faults are simulated and the times of failures are recorded in the form fault(n), where n denotes the time of failure. This predicate constitutes the target concept. The background knowledge includes the history of the components (obtained from 29 sensors in the power subsystem) and the start times of the basic operation phases.

For this data set, one of the rules discovered by CRIS is as follows:

fault(A) :- fault(B), succ(A, B).

The same rule discovered by GOLEM is reported in [22]. There is no other work that reports performance results on this data set. With the data set given in [1], CRIS discovers the concept with 98% accuracy. For the battery fault diagnosis experiments, [22] uses a slightly different data set (the reported number of facts is different than given in [1] and the test data set differs). In [22], the accuracy of the concept discovery by GOLEM is given as 98%.

5.4. Performance evaluation on transitive rule discovery

The approaches that consider only related facts in the generalization step, as in C2D, fall short for the cases where the domain includes many unrelated facts. Michalski's trains problem [43] is a typical case of this situation. In this data set, the target relation eastbound(train) is only related with the has_car(train, car) relation. The other background relations have an argument of type car and are only related with the has_car relation.

For this data set, the rules generated by C2D are very general and cannot include any information about the properties of the cars of the train. C2D fixes this problem by adding the background facts that are indirectly related with the selected target concept instance into the APRIORI lattice in the generalization step. As a result of this extension, C2D finds the following rule:

eastbound(A) ← has_car(A, B), closed(B). (s = 1.0, c = 0.72)

CRIS can find the same rule without any further extension, since its generalization step takes all target instances into account. Therefore, there is no directly related fact/indirectly related fact distinction in CRIS. Furthermore, it finds the same rule generated by C2D in a shorter time.
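The support and confidence values attached to such rules can be illustrated with a toy computation. This is a sketch under simplified definitions (support as the fraction of target instances covered by the rule body, confidence as the fraction of trains satisfying the body that are actually eastbound); the train and car facts are invented for illustration, not taken from the actual data set.

```python
# Toy trains fragment: eastbound holds for t1..t5; has_car and closed
# are background relations; every car happens to be closed here.
eastbound = {"t1", "t2", "t3", "t4", "t5"}
has_car = {("t1", "c1"), ("t2", "c2"), ("t3", "c3"), ("t4", "c4"),
           ("t5", "c5"), ("t6", "c6"), ("t7", "c7")}
closed = {"c1", "c2", "c3", "c4", "c5", "c6", "c7"}

def rule_stats(head_instances, body_instances):
    """Simplified support/confidence of a rule head <- body."""
    covered = head_instances & body_instances
    support = len(covered) / len(head_instances)
    confidence = len(covered) / len(body_instances)
    return support, confidence

# Trains satisfying the body of: eastbound(A) <- has_car(A, B), closed(B)
body = {train for train, car in has_car if car in closed}
s, c = rule_stats(eastbound, body)  # s = 1.0, c = 5/7 ~ 0.71
```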

PROGOL finds only the following rule (with lower support and confidence) for this experiment:

eastbound(A) ← has_car(A, B), double(B). (s = 0.4, c = 0.67)

ALEPH cannot find a rule without negative instances. When negative instances are provided, it finds the following rule (the best rule) for this experiment:

eastbound(A) ← has_car(A, B), short(B), closed(B). (s = 1.0, c = 1.0)

Another example for transitive rule construction is the kinship data set, which is adapted from [26]. The names and arguments of the relations in the data set are given in Table 20.

There are 217 records in the data set. As there are 24 different people in the relations, a person table (type table) is created including the names of the 24 people. In this experiment, a new relation called elti(A, B) was defined, which represents the family relation between the wives of two brothers (the term elti is the Turkish word for this family relationship). In the data set, the people in the elti relation have no brothers. Therefore, brother instances are unrelated facts of elti. The minimum support is set as 0.2 and the minimum confidence is set as 0.6.
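What a transitive rule like elti(A, B) ← husband(C, A), husband(D, B), brother(C, D) computes can be sketched with a few invented kinship facts; the names below are hypothetical and the join is a direct reading of the rule body, not the discovery procedure itself.

```python
# Hypothetical kinship facts: husband(H, W) means H is the husband of W;
# brother(X, Y) means X is a brother of Y.
husband = {("h1", "w1"), ("h2", "w2"), ("h3", "w3")}
brother = {("h1", "h2"), ("h2", "h1")}  # h1 and h2 are brothers

def derive_elti(husband, brother):
    """elti(A, B): A and B are wives of two brothers, i.e. the transitive
    rule elti(A, B) <- husband(C, A), husband(D, B), brother(C, D)."""
    return {(a, b)
            for c, a in husband
            for d, b in husband
            if (c, d) in brother}

elti = derive_elti(husband, brother)  # {("w1", "w2"), ("w2", "w1")}
```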

When indirectly related facts are added to the lattice, C2D finds the rules given in Table 21 that can capture the description of the elti concept.

For the same data set, GOLEM cannot find any rule under several mode declarations. PROGOL cannot find a successful rule for this experiment either. However, if only the husband, wife and brother relations are given as background knowledge, then it finds only one transitive rule (given below) under strict mode declarations:

elti(A, B) ← husband(C, A), husband(D, B), brother(C, D).

Similarly, ALEPH can only find one transitive rule for this experiment:

elti(A, B) ← husband(D, A), wife(B, C), brother(C, D).

CRIS finds the correct hypothesis set for this experiment. The time efficiency and rule quality performance comparison of CRIS and C2D is given in Table 22. In this table, the first column corresponds to C2D with the extension in the generalization step for handling transitive rules, the second column corresponds to the original C2D algorithm and the last column corresponds to CRIS. As seen in the table, the original C2D algorithm cannot find good concept rules. The rule quality performance is much better in the extended version. On the other hand, CRIS can find high-quality rules in a shorter time due to its generalization technique.

Table 22
The experimental results for the train and elti data sets.

Experiment        C2D with Unrel.F.   C2D w/o Unrel.F.   CRIS

Eastbound train
Accuracy          0.7                 0                  0.7
Coverage          1.0                 0                  1.0
Time (seconds)    8                   1                  5

Elti
Accuracy          1.0                 0.5                1.0
Coverage          1.0                 0.5                1.0
Time (minutes)    110                 25                 2.5

In order to test the scalability of CRIS for this experiment, a synthetic data set for the elti experiment was prepared which has 2170 records (10 fictitious records for each record of each table in the original elti data set). CRIS can still find the same hypothesis with a linear increase in time.

6. Conclusion

This work presents a concept discovery system, named CRIS, which combines rule extraction methods in ILP with an APRIORI-based specialization operator. In this way, strong declarative biases are relaxed; instead, support and confidence values are used for pruning the search space. In addition, CRIS does not require user specification of input/output modes of predicate arguments or negative concept instances. Thus, it provides a suitable data mining framework for non-expert users who are not expected to know much about the semantic details of the large relations they would like to mine, which are stored in classical database management systems.

CRIS has a confidence-based hypothesis evaluation criterion and a confidence-based search space pruning mechanism. The conventional definition of confidence is slightly modified in order to calculate confidence correctly when there are unmatched head predicate arguments.

Confidence-based pruning is used in the candidate filtering phase. If the confidence value of a generated rule is not higher than the confidence values of its parents, the specializations derived from it will not make the hypothesis more confident. In this way, such rules are directly eliminated at early steps.
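The filtering step just described can be sketched as follows; the rule identifiers and confidence values are invented for the example, and the function is a simplified reading of the pruning criterion rather than CRIS's implementation.

```python
def confidence_prune(candidates, parent_conf):
    """Confidence-based candidate filtering: keep a specialization only
    if its confidence exceeds that of its parent rule, since otherwise
    the added literal does not make the hypothesis more confident."""
    return [(rule, conf) for rule, conf in candidates if conf > parent_conf]

# Hypothetical specializations of a parent rule with confidence 0.6:
candidates = [("r1", 0.72), ("r2", 0.60), ("r3", 0.55)]
kept = confidence_prune(candidates, parent_conf=0.6)  # only r1 survives
```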

In order to generate successful rules for domains where aggregated values such as sum, max and min are descriptive in the semantics of the target concept, it is essential for a concept discovery system to support the definition of aggregation and its inclusion in the concept discovery mechanism. In CRIS, aggregation information is defined in the form of aggregate predicates, which are included in the background knowledge of the concept. The aggregate value that takes part in an aggregate predicate is generated by considering all values of the attribute. This leads to an increase in execution time; however, the concept discovery accuracy increases considerably. Due to the satisfactory results in rule quality, the decrease in time efficiency may be considered tolerable.

The proposed system is tested on several benchmark problems including same-generation, mesh design, predictive toxicology evaluation and the mutagenicity test. The experiments show that CRIS has better accuracy performance than most of the state-of-the-art knowledge discovery systems. It can handle sparse data with coverage values comparable to the state-of-the-art systems. It can discover transitive rules with high accuracy without mode declarations.

As future work, there are several directions in which CRIS can be further improved. One direction is using more efficient query processing in order to handle repeating queries. Another issue to be studied is analyzing and improving the numeric attribute handling. As another improvement, it is possible to investigate the use of association rule mining techniques other than APRIORI. For this purpose, FP-growth [25] is a good candidate since it provides more efficiency with its ability to remove the candidate generation step. Since APRIORI is more straightforward to apply in the relational domain, in CRIS we chose to use APRIORI for specialization. Adapting FP-growth to the relational domain appears as an interesting question.

References

[1] Learning rules for temporal fault diagnosis in satellites. Available from: <http://www.doc.ic.ac.uk/shm/satellite.html>.

[2] E. Aarts, J. Korst, Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing, John Wiley & Sons, Inc., New York, NY, USA, 1989.

[3] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, A.I. Verkamo, Fast discovery of association rules, in: Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996, pp. 307–328.

[4] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: Proc. 20th Int. Conf. Very Large Data Bases, VLDB, Morgan-Kaufmann, 1994, pp. 487–499.

[5] A. Assche, C. Vens, H. Blockeel, S. Dzeroski, First order random forests: learning relational classifiers with complex aggregates, Mach. Learn. 64 (1–3) (2006) 149–182.

[6] D. Bahler, D.W. Bristol, The induction of rules for predicting chemical carcinogenesis in rodents, in: Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology, AAAI Press, 1993, pp. 29–37.

[7] G. Bakale, R.D. McCreary, Prospective ke screening of potential carcinogens being tested in rodent bioassays by the US National Toxicology Program, Mutagenesis 7 (2) (1992) 91–94.

[8] R.J. Bayardo, R. Agrawal, D. Gunopulos, Constraint-based rule mining in large, dense databases, Data Min. Knowl. Discovery 4 (2–3) (2000) 217–240.

[9] R. Benigni, Predicting chemical carcinogenesis in rodents: the state of the art in light of a comparative exercise, Mutat. Res. Environ. Mutagen. Relat. Subj. 334 (1) (1995) 103–113.

[10] Y.C. Chien, Y. Chen, A phenotypic genetic algorithm for inductive logic programming, Expert Syst. Appl. 36 (3) (2009) 6935–6944.

[11] L. Dehaspe, Frequent Pattern Discovery in First-Order Logic, PhD thesis, Katholieke Universiteit Leuven, Belgium, 1998.

[12] L. Dehaspe, L. De Raedt, Mining association rules in multiple relations, in: ILP'97: Proceedings of the 7th International Workshop on Inductive Logic Programming, Springer-Verlag, London, UK, 1997, pp. 125–132.

[13] L. Dehaspe, H. Toivonen, Discovery of relational association rules, in: S. Dzeroski, N. Lavrac (Eds.), Relational Data Mining, Springer-Verlag, 2001, pp. 189–212.

[14] L. Dehaspe, H. Toivonen, R.D. King, Finding frequent substructures in chemical compounds, in: 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1998, pp. 30–36.

[15] B. Dolsak, S. Muggleton, The application of inductive logic programming to finite element mesh design, in: S. Muggleton (Ed.), Inductive Logic Programming, Academic Press, London, 1992.

[16] B. Dolsak, Finite element mesh design expert system, Knowl. Based Syst. 15 (8) (2002) 315–322.

[17] P. Domingos, Prospects and challenges for multi-relational data mining, SIGKDD Explor. 5 (2003) 80–81.

[18] A. Doncescu, J. Waissman, G. Richard, G. Roux, Characterization of bio-chemical signals by inductive logic programming, Knowl. Based Syst. 15 (1–2) (2002) 129–137.

[19] S. Dzeroski, Multi-relational data mining: an introduction, SIGKDD Explor. 5 (1) (2003) 1–16.

[20] S. Dzeroski, S. Schulze-Kremer, K. Heidtke, K. Siems, D. Wettschereck, H. Blockeel, Diterpene structure elucidation from 13C NMR spectra with inductive logic programming, Appl. Artif. Intell. 12 (1998) 363–383.

[21] K. Enslein, B.W. Blake, H.H. Borgstedt, Prediction of probability of carcinogenicity for a set of ongoing NTP bioassays, Mutagenesis 5 (4) (1990) 305–306.

[22] C. Feng, Inducing temporal fault diagnostic rules from a qualitative model, in: Inductive Logic Programming, Academic Press, 1992, pp. 473–488.

[23] R. Frank, F. Moser, M. Ester, A method for multi-relational classification using single and multi-feature aggregation functions, in: PKDD, 2007, pp. 430–437.

[24] L. Getoor, J. Grant, PRL: a probabilistic relational language, Mach. Learn. 62 (1–2) (2006) 7–31.

[25] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, SIGMOD Rec. 29 (2) (2000) 1–12.


[26] G. Hinton, UCI machine learning repository kinship data set, 1990. Available from: <http://archive.ics.uci.edu/ml/datasets/Kinship>.

[27] T.D. Jones, C.E. Easterly, On the rodent bioassays currently being conducted on 44 chemicals: a RASH analysis to predict test results from the National Toxicology Program, Mutagenesis 6 (6) (1991) 507–514.

[28] Y. Kavurucu, P. Senkul, I.H. Toroslu, Aggregation in confidence-based concept discovery for multi-relational data mining, in: Proceedings of IADIS European Conference on Data Mining (ECDM), Amsterdam, Netherlands, 2008, pp. 43–50.

[29] Y. Kavurucu, P. Senkul, I.H. Toroslu, Confidence-based concept discovery in multi-relational data mining, in: Proceedings of International Conference on Data Mining and Applications (ICDMA), Hong Kong, 2008, pp. 446–451.

[30] Y. Kavurucu, P. Senkul, I.H. Toroslu, Analyzing transitive rules on a hybrid concept discovery system, in: LNCS, Hybrid Artificial Intelligent Systems, vol. 5572/2009, Springer, Berlin/Heidelberg, 2009, pp. 227–234.

[31] Y. Kavurucu, P. Senkul, I.H. Toroslu, Confidence-based concept discovery in relational databases, in: Proceedings of 2009 World Congress on Computer Science and Information Engineering (CSIE 2009), Los Angeles, USA, 2009, pp. 43–50.

[32] Y. Kavurucu, P. Senkul, I.H. Toroslu, ILP-based concept discovery in multi-relational data mining, Expert Syst. Appl. 36 (2009).

[33] Y. Kavurucu, P. Senkul, I.H. Toroslu, Multi-relational concept discovery with aggregation, in: Proceedings of 24th International Symposium on Computer and Information Sciences (ISCIS 2009), Northern Cyprus, 2009, pp. 43–50.

[34] R.D. King, A. Srinivasan, M.J.E. Sternberg, Relating chemical activity to structure: an examination of ILP successes, New Gener. Comput. 13 (1995) 411–433.

[35] A.J. Knobbe, A. Siebes, B. Marseille, Involving aggregate functions in multi-relational search, in: PKDD '02: Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery, Springer-Verlag, London, UK, 2002, pp. 287–298.

[36] E. Lamma, P. Mello, M. Milano, F. Riguzzi, Integrating induction and abduction in logic programming, Inf. Sci. Inf. Comput. Sci. 116 (1) (1999) 25–54.

[37] N. Lavrac, P.A. Flach, B. Zupan, Rule evaluation measures: a unifying view, in: ILP '99: Proceedings of the 9th International Workshop on Inductive Logic Programming, Springer-Verlag, London, UK, 1999, pp. 174–185.

[38] N. Lavrac, S. Dzeroski, Inductive Logic Programming: Techniques and Applications, Ellis Horwood, New York, 1994.

[39] C. Lee, C. Tsai, T. Wu, W. Yang, An approach to mining the multi-relational imbalanced database, Expert Syst. Appl. 34 (4) (2008) 3021–3032.

[40] H.A. Leiva, MRDTL: a multi-relational decision tree learning algorithm, Master's thesis, Iowa State University, Iowa, USA, 2002.

[41] D.F.V. Lewis, C. Ioannides, D.V. Parke, A prospective toxicity evaluation (COMPACT) on 40 chemicals currently being tested by the National Toxicology Program, Mutagenesis 5 (5) (1990) 433–435.

[42] V. Lifschitz, Closed-world databases and circumscription, Artif. Intell. 27 (1985) 229–235.

[43] R. Michalski, J. Larson, Inductive inference of VL decision rules, in: Workshop on Pattern-Directed Inference Systems, SIGART Newsletter, vol. 63, ACM, Hawaii, 1977, pp. 33–44.

[44] J. Minker, On indefinite databases and the closed world assumption, in: Proceedings of the Sixth International Conference on Automated Deduction (CADE'82), 1982, pp. 292–308.

[45] S. Muggleton, Inductive logic programming, New Gener. Comput. 8 (4) (1991) 295–318.

[46] S. Muggleton (Ed.), Inductive Logic Programming, Academic Press, London, 1992.

[47] S. Muggleton, Inverse entailment and PROGOL, New Gener. Comput., Special Issue on Inductive Logic Programming 13 (3–4) (1995) 245–286.

[48] S. Muggleton, Learning from positive data, in: Proceedings of the 6th International Workshop on Inductive Logic Programming, Lecture Notes in Artificial Intelligence, vol. 1314, Springer-Verlag, 1996, pp. 358–376.

[49] J. Neville, D. Jensen, L. Friedland, M. Hay, Learning relational probability trees, in: KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, 2003, pp. 625–630.

[50] C. Perlich, F. Provost, Aggregation-based feature invention and relational concept classes, in: KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, 2003, pp. 167–176.

[51] H. Prendinger, M. Ishizuka, A creative abduction approach to scientific and knowledge discovery, Knowl. Based Syst. 18 (7) (2005) 321–326.

[52] J.R. Quinlan, Learning logical definitions from relations, Mach. Learn. 5 (3) (1990) 239–266.

[53] R. Reiter, On Closed World Databases, LNAI, vol. 4160, Plenum Press.

[54] D.M. Sanderson, C.G. Earnshaw, Computer prediction of possible toxic action from chemical structure; the DEREK system, Hum. Exp. Toxicol. 10 (4) (1991) 261–273.

[55] M. Serrurier, H. Prade, Introducing possibilistic logic in ILP for dealing with exceptions, Artif. Intell. 171 (16–17) (2007) 939–950.

[56] M. Serrurier, H. Prade, Improving inductive logic programming by using simulated annealing, Inf. Sci. 178 (6) (2008) 1423–1441.

[57] A. Srinivasan, The ALEPH manual, 1999.

[58] A. Srinivasan, R. King, S. Muggleton, The role of background knowledge: using a problem from chemistry to examine the performance of an ILP program, under review for Intelligent Data Analysis in Medicine and Pharmacology, Kluwer Academic Press, 1996.

[59] A. Srinivasan, R.D. King, S. Muggleton, M.J.E. Sternberg, Carcinogenesis predictions using ILP, in: Proceedings of the 7th International Workshop on Inductive Logic Programming, vol. 1297, Springer-Verlag, 1997, pp. 273–287.

[60] A. Srinivasan, R.D. King, S.H. Muggleton, M. Sternberg, The predictive toxicology evaluation challenge, in: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97), Morgan-Kaufmann, 1997, pp. 1–6.

[61] A. Srinivasan, S. Muggleton, R.D. King, M.J.E. Sternberg, Theories for mutagenicity: a study of first-order and feature based induction, Technical Report PRG-TR-8-95, Oxford University Computing Laboratory, 1995.

[62] R.W. Tennant, J. Spalding, S. Stasiewicz, J. Ashby, Prediction of the outcome of rodent carcinogenicity bioassays currently being conducted on 44 chemicals by the National Toxicology Program, Mutagenesis 5 (1990) 3–14.

[63] I.H. Toroslu, M. Yetisgen-Yildiz, Data mining in deductive databases using query flocks, Expert Syst. Appl. 28 (3) (2005) 395–407.

[64] M. Uludag, M.R. Tolun, A new relational learning system using novel rule selection strategies, Knowl. Based Syst. 19 (8) (2006) 765–771.

[65] L. Wang, X. Liu, A new model of evaluating concept similarity, Knowl. Based Syst. 21 (8) (2008) 842–846.

[66] Q. Wu, Z. Liu, Real formal concept analysis based on grey-rough set theory, Knowl. Based Syst. 22 (1) (2009) 38–45.

[67] X. Yin, J. Han, J. Yang, P.S. Yu, CrossMine: efficient classification across multiple database relations, in: ICDE, 2004, pp. 399–411.