Mining Patterns with Attribute Oriented Induction
Spits Warnars
Database, Datawarehouse & Data Mining Research Center, Surya University
Jl. Boulevard Gading Serpong Blok O/1, Tangerang 15810, Indonesia
Proceedings of the International Conference on Database, Data Warehouse, Data Mining and Big Data (DDDMBD2015), Jakarta, Indonesia, 2015. ISBN: 978-1-941968-20-8 ©2015 SDIWC
ABSTRACT
Mining data from human activities such as business, education, engineering, and health is important because it supports the decision making process. Attribute Oriented Induction (AOI) has been used to mine significantly different patterns since it was introduced in 1989, and it has been combined with, and used as a complement to, other data mining techniques. AOI has proved powerful and still offers opportunities for further exploration in helping people find data patterns. AOI is chosen because it can reduce many patterns by summarizing (rolling up) many low-level patterns into high-level patterns in a concept tree/hierarchy. Conversely, non-summarized patterns at low levels of the concept tree/hierarchy can be used to sharpen the mined knowledge patterns, just like roll up and drill down in a data warehouse. Mapping the implementation of AOI to areas of human life such as business, education, engineering, and health is useful for conveying the value of AOI mining patterns, particularly for those interested in AOI as a data mining technique that can summarize many patterns into simple ones.
KEYWORDS
Data Mining, Attribute Oriented Induction, AOI,
pattern, rule.
1. INTRODUCTION
The Attribute Oriented Induction (AOI) method, first proposed in 1989, integrates a machine learning paradigm, in particular learning-from-examples techniques, with database operations; it extracts generalized rules from an interesting set of data and discovers high-level data regularities [39]. AOI provides an efficient and effective mechanism for discovering various kinds of knowledge rules from datasets or databases.
The AOI approach was developed for learning different kinds of knowledge rules such as characteristic rules, discrimination rules, classification rules, data evolution regularities [1], association rules and cluster description rules [2].
1) A characteristic rule is an assertion which characterizes the concepts satisfied by all of the data stored in the database. This rule provides generalized concepts about a property, which can help people recognize the common features of the data in a class; for example, the symptoms of a specific disease [9].
2) A discriminant rule is an assertion which discriminates the concepts of one (target) class from another (contrasting) class. This rule gives a discriminant criterion which can be used to predict the class membership of new data, for example to distinguish one disease from another [9].
3) A classification rule is a set of rules which classifies the set of relevant data according to one or more specific attributes; for example, classifying diseases into classes and providing the symptoms of each [40].
4) An association rule describes association relationships among the set of relevant data; for example, discovering a set of symptoms that frequently occur together [12,35].
5) A data evolution regularity rule describes the general evolution behaviour of a set of relevant data (valid only for time-related/temporal data); for example, describing the major factors that influence the fluctuations of stock values through time [3,37]. Data evolution regularities can then be classified into characteristic rules and discrimination rules [3].
6) A cluster description rule is used to cluster data according to data semantics [12]; for example, clustering university students based on different attribute(s).
2. QUANTITATIVE AND QUALITATIVE
RULES IN AOI
Rules in AOI can be represented as quantitative or qualitative rules:
1) A quantitative rule is a rule associated with quantitative information, such as statistical information, which assesses the representativeness of the rule in the database [1]. There are three types of quantitative rules: the quantitative characteristic rule, the quantitative discriminative rule, and the combined quantitative characteristic and discriminative rule.
a. A quantitative characteristic rule carries the quantitative information of a characteristic rule; each rule in the final generalization can be measured with the t-weight in formula 1.
t-weight = Votes(qa) / Σ(i=1..N) Votes(qi)    (1)
where:
t-weight = percentage of each rule in the final generalized relation.
Votes(qa) = number of tuples in each rule in the final generalized relation, where qa is in {q1,...,qN}.
N = number of rules in the final generalized relation.
A quantitative characteristic rule is represented with the → symbol and takes the form:
∀x, target_class(x) → condition1(x)[t:w1] ∨ ... ∨ conditionn(x)[t:wn]
where:
x is the target class variable.
n is the number of rules in the final generalized relation.
[t:w1] is the t-weight (formula 1) for rule 1, up to [t:wn] as the t-weight (formula 1) for rule n.
Example:
∀x, graduate(x) → (Birthplace(x) ∈ Canada ∧ GPA(x) ∈ excellent) [t:75%] ∨ (Major(x) ∈ science ∧ Birthplace(x) ∈ Foreign ∧ GPA(x) ∈ good) [t:25%]
b. A quantitative discriminative rule is a discrimination rule that uses quantitative information. Each rule in the target class is discriminated against the same rule in the contrasting class(es) and is measured with the d-weight in formula 2.
d-weight = Votes(qa ∈ Cj) / Σ(i=1..K) Votes(qa ∈ Ci)    (2)
where:
d-weight = percentage ratio of each rule's tuples in the target class to the total number of tuples for the same rule in the target and contrasting classes.
Votes(qa ∈ Cj) = number of tuples in each rule in the target class Cj, where Cj is in {C1,...,CK}.
K = total number of target and contrasting classes for the same rule.
A quantitative discriminative rule is shown with the ⇐ symbol and takes the form:
∀x, target_class(x) ⇐ condition1(x)[d:w1] ∨ ... ∨ conditionn(x)[d:wn]
where:
x is the target class variable.
n is the number of rules in the target class.
[d:w1] is the d-weight (formula 2) for rule 1 in the target class, up to [d:wn] as the d-weight (formula 2) for rule n of the target class.
Example:
∀x, graduate(x) ⇐ (Birthplace(x) ∈ Foreign ∧ GPA(x) ∈ good) [d:100%] ∨ (Major(x) ∈ social ∧ GPA(x) ∈ good) [d:25%]
c. A combined quantitative characteristic and discriminative rule uses the quantitative information of both the characteristic rule and the discriminative rule, carrying both the t-weight and the d-weight for the same rules. Each rule is measured with the t-weight in formula 1 for its characteristic part and the d-weight in formula 2 for its discriminative part. A quantitative characteristic and discriminative rule is shown with the ⇔ symbol and takes the form:
∀x, target_class(x) ⇔ condition1(x)[t:w1, d:w1] ∨ ... ∨ conditionn(x)[t:wn, d:wn]
where:
x is the target class variable.
n is the number of rules in the target class.
[t:w1] is the t-weight in formula 1 and [d:w1] is the d-weight in formula 2.
Example:
∀x, professor(x) ⇔ (Birthplace(x) ∈ Foreign ∧ GPA(x) ∈ good) [t:20%, d:100%] ∨ (Major(x) ∈ social ∧ GPA(x) ∈ good) [t:10%, d:25%]
2) A qualitative rule can be obtained by the same learning process as its quantitative counterpart, without associating the quantitative attribute in the generalized relations [1]. A qualitative characteristic rule uses the → symbol and a qualitative discriminative rule uses the ⇐ symbol. A qualitative rule, whether characteristic or discriminative, takes the form:
∀x, target_class(x) [→ | ⇐] condition1(x) ∨ ... ∨ conditionn(x)
Example:
∀x, graduate(x) → (Birthplace(x) ∈ Canada ∧ GPA(x) ∈ excellent) ∨ (Major(x) ∈ science ∧ Birthplace(x) ∈ Foreign ∧ GPA(x) ∈ good)
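The t-weight (formula 1) and d-weight (formula 2) computations can be sketched in Python as follows; the rule conditions are left implicit and the vote counts are illustrative, chosen to reproduce the example weights above.

```python
# Sketch: computing t-weights (formula 1) and d-weights (formula 2)
# for generalized rules. Vote counts below are illustrative.

def t_weights(votes):
    """t-weight: each rule's share of the votes in the final generalized relation."""
    total = sum(votes)
    return [v / total for v in votes]

def d_weights(target_votes, contrasting_votes):
    """d-weight: per-rule ratio of target-class votes to the votes for the
    same rule across the target and contrasting classes."""
    return [t / (t + c) for t, c in zip(target_votes, contrasting_votes)]

# Votes for the two example rules of the graduate target class.
votes = [75, 25]
print(t_weights(votes))             # [0.75, 0.25] -> [t:75%] and [t:25%]

# The same rules matched against a contrasting class.
print(d_weights([20, 5], [0, 15]))  # [1.0, 0.25] -> [d:100%] and [d:25%]
```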
3. CONCEPT HIERARCHIES
One advantage of AOI is that it uses a concept hierarchy as background knowledge, which can be provided by knowledge engineers or domain experts [2,3,4]. A concept hierarchy stored as a relation in the database provides essential background knowledge for data generalization and multiple-level data mining. A concept hierarchy represents a taxonomy of concepts over the attribute domain values. Concept hierarchies can be specified based on the relationships among database attributes or by set groupings, and can be stored in the form of relations in the same database [7].
A concept hierarchy can be adjusted dynamically based on the distribution of the set of data relevant to the data mining task. Hierarchies for numerical attributes can be constructed automatically based on data distribution analysis [7], and numeric attributes are treated differently for the sake of efficiency [20,21,22,23,26]. For example, a range of values between 0.00 and 1.99 contains 200 distinct values; for efficiency, only one record with 3 fields is created rather than 200 records with 2 fields.
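The one-record-with-three-fields idea can be sketched as follows; the field layout (low, high, concept) and the concept name are assumptions for illustration, not taken from the paper.

```python
# Sketch: a numeric concept-hierarchy entry stored as one 3-field record
# (low, high, concept) instead of one 2-field record per distinct value.
# The range and the concept name are illustrative.

interval_record = (0.00, 1.99, "poor")  # one record covering 200 values 0.00..1.99

def concept_of(value, records):
    """Map a numeric attribute value to its concept via interval lookup."""
    for low, high, concept in records:
        if low <= value <= high:
            return concept
    return None

print(concept_of(1.50, [interval_record]))  # poor
```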
In a concept hierarchy, concepts are ordered by levels, from specific (low-level) concepts to general (higher-level) ones. Generalization is achieved by ascending to the next higher-level concepts along the paths of the concept hierarchy. The most general concept is the null description, described as ANY, while the most specific concepts correspond to the specific values of the attributes in the database. A concept hierarchy can be balanced or unbalanced; an unbalanced hierarchy must first be converted into a balanced one.
Figure 1 shows the concept hierarchy tree for the attribute workclass in the adult dataset [18], which has three levels. The first (low) level has 8 concepts: without-pay, never-worked, private, self-emp-not-inc, self-emp-inc, federal-gov, state-gov and local-gov. The second level has 5 concepts: charity, unemployed, entrepreneur, centre and territory. The third (high) level has 2 concepts: non government and government. For example, the concept non government at the high level has 3 sub-concepts in the second level: charity, unemployed and entrepreneur. The concept entrepreneur at the second level has 3 sub-concepts in the low level: private, self-emp-not-inc and self-emp-inc.
Figure 1. A concept hierarchy tree for the attribute workclass in the adult dataset [18]
The concept hierarchy in figure 1 can be represented as:
Without-pay → Charity
Never-worked → Unemployed
{Private, self-emp-not-inc, self-emp-inc} → Entrepreneur
{Federal-gov, state-gov} → Centre
Local-gov → Territory
{Charity, Unemployed, Entrepreneur} → Non government
{Centre, Territory} → Government
{Non government, Government} → ANY(workclass)
where the → symbol indicates generalization; for example, Without-pay → Charity indicates that the concept Charity is a generalization of the concept Without-pay.
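The hierarchy above can be sketched as a simple child-to-parent mapping, where generalizing a value means taking one step up the tree; the dictionary encoding and function name are illustrative, not part of the original AOI implementation.

```python
# Sketch of the figure 1 workclass hierarchy as a child -> parent mapping;
# the root concept is ANY(workclass).

PARENT = {
    "without-pay": "charity", "never-worked": "unemployed",
    "private": "entrepreneur", "self-emp-not-inc": "entrepreneur",
    "self-emp-inc": "entrepreneur",
    "federal-gov": "centre", "state-gov": "centre", "local-gov": "territory",
    "charity": "non government", "unemployed": "non government",
    "entrepreneur": "non government",
    "centre": "government", "territory": "government",
    "non government": "ANY", "government": "ANY",
}

def generalize(value, parent=PARENT):
    """Substitute a concept by its minimal higher-level concept."""
    return parent.get(value, "ANY")

print(generalize("private"))              # entrepreneur
print(generalize(generalize("private")))  # non government
```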
There are four types of concept generalization
in the concept hierarchy [6]:
1) Unconditional concept generalization: the rule is an unconditional IS-A type rule. A concept is generalized to a higher-level concept because of the subsumption relationship indicated in the concept hierarchy.
2) Conditional/deductive rule generalization: the rule is associated with a generalization path as a deduction rule; the rule is conditional and can only be applied to generalize a concept if the corresponding condition is satisfied. For example, the form A(x) ∧ B(x) → C(x) means that for a tuple x, the concept (attribute value) A can be generalized to concept C if condition B is satisfied by x.
3) Computational rule generalization: each rule is represented by a value-based condition which can be evaluated against an attribute, a tuple, or the database by performing some computation. The truth value of the condition then determines whether a concept can be generalized via the path.
4) Hybrid rule-based concept generalization: a hierarchy can have paths associated with all three of the above types of rules. It has a powerful representation capability and is suitable for many kinds of applications.
Types 2-4 are the three types of rule-based concept hierarchy [5,34], while type 1 is a non-rule-based concept hierarchy.
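A conditional generalization path of the form A(x) ∧ B(x) → C(x) can be sketched as follows; the attribute names, the condition, and the helper function are hypothetical, introduced only to illustrate the idea.

```python
# Sketch: a conditional generalization path A(x) ∧ B(x) -> C(x). A tuple's
# value A is generalized to concept C only when condition B holds for that
# tuple. The attribute names and the condition are illustrative.

def conditional_generalize(tuple_row, value, target, condition):
    """Generalize `value` to `target` only if `condition(tuple_row)` is true."""
    return target if condition(tuple_row) else value

row = {"status": "self-emp-inc", "employees": 12}
# Hypothetical deduction rule: self-emp-inc generalizes to "employer"
# only if the business actually has employees.
print(conditional_generalize(row, "self-emp-inc", "employer",
                             lambda r: r["employees"] > 0))  # employer
```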
A rule-based concept hierarchy is a concept hierarchy whose paths have associated generalization rules. For rule-based induction, the data cube (hypercube) of a multidimensional data warehouse is the favourable data structure [6]. To perform rule-based induction on the data in a large warehouse, the path relation algorithm is an excellent choice because the data warehouse is already structured as a cube/hypercube [6]. Rule-based concept hierarchies suffer from the induction anomaly problem, which affects efficiency and is caused by the following:
1) A rule may depend on an attribute which has been removed.
2) A rule may depend on an attribute whose concept level in the prime relation has been generalized too high to match the condition of the rule.
3) A rule may depend on a condition which can only be evaluated against the initial relation, e.g. the number of tuples in the relation.
There are three ways to address the induction anomaly problem [6]:
1) Reapplying the deduction rules all over again on the initial relation, which is costly and wasteful.
2) Repetitive generalization as required by roll-up and drill-down, which can be done efficiently without the induction anomaly problem.
3) Using the path relation method, a backtracking algorithm [5,6].
4. AOI PROTOTYPE
The AOI method was implemented in a data mining system prototype called DBMINER [5,7,17,28,29], previously called DBLearn, and has been tested successfully against large relational databases. DBLearn [24,25,27,38] is a prototype data mining system which was developed at Simon Fraser University. DBMINER was developed by integrating database, OLAP and data mining technologies [17,36] and has the following features:
1) Incorporating several data mining techniques such as attribute oriented induction, statistical analysis, progressive deepening for mining multiple-level rules, meta-rule guided knowledge mining [7], and data cube and OLAP technology [17].
2) Mining new kinds of rules from large databases, including multiple-level association rules, classification rules, cluster description rules and prediction.
3) Automatic generation of numeric hierarchies and refinement of concept hierarchies.
4) High-level SQL-like and graphical data mining interfaces.
5) Client-server architecture and performance improvements for larger applications.
6) The SQL-like data mining query language DMQL and the graphical user interfaces have been enhanced for interactive knowledge mining.
7) Performing roll-up and drill-down at multiple concept levels with multidimensional data cubes.
5. AOI ALGORITHMS
AOI can be implemented with the architecture design shown in figure 2, where characteristic rules (LCHR) and classification rules (LCLR) can be learned directly from a transactional database (OLTP) or a data warehouse (OLAP) [6,8], with the concept hierarchy serving as the generalization knowledge. The concept hierarchy can be created from the OLTP database as a direct resource.
Figure 2. AOI architecture
From a database we can identify two types of learning:
1) Positive learning, for the target class, where the data are tuples in the database which are consistent with the learning concepts. The positive learning/target class is built when learning a characteristic rule.
2) Negative learning, for the contrasting class, in which the data do not belong to the target class. The negative learning/contrasting class is built when learning a discrimination or classification rule.
The characteristic rule is used by AOI to recognize, learn and mine a specific character for each attribute as its specific mining characterization. The characteristic rule drives the generalization with the help of the concept hierarchy, which serves as stored background knowledge, to find the target class through positive learning. Rule mining need not be limited to only one rule: the more rules that can be created, the more mining can be done. This has been presented as an intelligent system which can help humans build systems with the ability to think like a human [3]. Rules can often be discovered by generalization in several possible directions [9].
A relational database, as the resource for data mining with AOI, can be read with the SQL SELECT data manipulation statement [13,14,15,16]. Using a query for building rules provides an efficient mechanism for understanding the mined rules [11,12]. In current AOI, a query is processed with the SQL-like data mining query language DMQL at the beginning of the process. It collects the relevant sets of data by processing a transformed relational query, generalizes the data by AOI, and then presents the outputs in different forms [7].
AOI generalizes and reduces the prime relation until the final relation satisfies the user's expectation based on the set threshold(s). One or two thresholds can be applied: a single threshold is used to control both the number of distinct attribute values and the number of tuples in the generalization process, whereas two thresholds control the number of distinct attribute values and the number of tuples separately.
The threshold that controls the maximum number of tuples of the target class in the final generalized relation can be replaced with the GROUP BY operator in an SQL SELECT statement, which limits the final result of the generalization. Setting different thresholds generates different generalized tuples; obtaining the global picture of the induction by repeatedly adjusting thresholds is time-consuming and tedious work [10]. All interesting generalized tuples, as multiple rules, can be generated as the global picture of the induction by using the GROUP BY operator or the DISTINCT function in the SQL SELECT statement.
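The GROUP BY idea can be sketched with Python's built-in sqlite3 module; the student table, its columns and its rows are illustrative, not taken from the paper.

```python
# Sketch: using SQL GROUP BY to merge identical generalized tuples and
# accumulate their votes, as suggested above. Table and data are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (major TEXT, gpa TEXT)")
con.executemany("INSERT INTO student VALUES (?, ?)",
                [("science", "excellent"), ("science", "excellent"),
                 ("science", "good"), ("social", "good")])

# Each generalized tuple becomes one rule; COUNT(*) is its accumulated vote.
rows = con.execute("""SELECT major, gpa, COUNT(*) AS vote
                      FROM student
                      GROUP BY major, gpa
                      ORDER BY vote DESC, major, gpa""").fetchall()
print(rows)
# [('science', 'excellent', 2), ('science', 'good', 1), ('social', 'good', 1)]
```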
AOI can perform data warehouse techniques by applying the generalization process repetitively in order to generate rules at different concept levels in a concept hierarchy, enabling the user to find the most suitable discovery levels and rules. This technique performs roll-up (progressive generalization [6]) or drill-down (progressive specialization [6]) operations [2,7], which are recognized as data warehouse techniques. Finding the most suitable discovery levels and rules adds multidimensional views to a database by repeating the generalization process at different concept levels.
Building a logical formula as the representation of the final result of AOI cannot be done with the SQL SELECT statement alone. However, the SQL statement can be combined with other applications such as Java or Visual Basic, or with server-side programs such as ASP, JSP or PHP. The data resulting from the SQL statement can then be used to create a logical formula in one of those applications.
There are 8 strategy steps in the process of generalization [3]: steps one to seven apply to the characteristic rule, while steps one to eight apply to the classification/discriminant rule.
1) Generalization on the smallest decomposable components: generalization should be performed on the smallest decomposable components of a data relation.
2) Attribute removal: if there is a large set of distinct values for an attribute but no higher-level concept is provided for the attribute, the attribute should be removed during generalization.
3) Concept tree ascension: if there exists a higher-level concept in the concept hierarchy for an attribute value of a tuple, substituting the value by its higher-level concept generalizes the tuple.
4) Vote propagation: the vote value counts the accumulated tuples; votes are accumulated when merging identical tuples during generalization.
5) Threshold control on each attribute: if the number of distinct values of an attribute in the resulting relation is larger than the specified threshold value, further generalization on this attribute should be performed.
6) Threshold control on generalized relations: if the number of tuples is larger than the specified threshold value, further generalization is done on selected attributes, and merging of the identical tuples should be performed.
7) Rule transformation: transform the final generalization into quantitative and qualitative rules, from a single tuple (conjunctive) or multiple tuples (disjunctive).
8) Handling overlapping tuples: if there are overlapping tuples in both the target and contrasting classes, these tuples should be marked and eliminated from the final generalized relation.
The AOI characteristic rule algorithm [3] is given as follows:

For each attribute Ai (1 ≤ i ≤ n, where n = # of attributes) in the generalized relation GR
{
    While #_of_distinct_values_in_attribute_Ai > threshold
    {
        If there is no higher-level concept in the concept hierarchy for attribute Ai
        Then remove attribute Ai
        Else substitute the value of Ai by its corresponding minimal generalized concept
        Merge identical tuples
    }
}
While #_of_tuples in GR > threshold
{
    Selectively generalize attributes
    Merge identical tuples
}
This AOI characteristic rule algorithm implements steps one to seven of the generalization strategy. The algorithm consists of two sub-processes: controlling the number of distinct attribute values, and controlling the number of tuples.
1) Controlling the number of distinct attribute values is a vertical process which checks each attribute vertically. All attributes in the learning dataset are checked, and the sub-process applies only to attributes whose number of distinct values is greater than the threshold. Each such attribute is checked for a higher-level concept in the concept hierarchy. If it has no higher-level concept, the attribute is removed; otherwise the attribute values are substituted with the values of the higher-level concept. Identical tuples are then merged to summarize the generalization and accumulate the vote values of the identical tuples, eliminating redundant tuples. After this first sub-process, every attribute in the generalization has a number of distinct values less than or equal to the threshold. This first sub-process implements steps one to five of the generalization strategy.
2) Controlling the number of tuples is a horizontal process which checks each rule horizontally. It is carried out on the attributes that passed the first sub-process, each of which now has a number of distinct values less than or equal to the threshold. This second sub-process runs only while the number of rules is greater than the threshold. Selective generalization of attributes and merging of identical tuples reduce the number of rules. A candidate attribute for further generalization can be selected by preference, for example based on the ratio of the number of tuples to the number of distinct attribute values, or chosen by the user based on which attributes or rules are uninteresting. As in the first sub-process, identical tuples are merged to summarize the generalization and accumulate the vote values of identical tuples, eliminating redundant tuples. After this second sub-process, the number of rules is less than or equal to the threshold. This second sub-process implements steps three, four and six of the generalization strategy.
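The attribute-threshold loop and the merging of identical tuples can be sketched as follows; this is a minimal illustration assuming a small child-to-parent concept table, with illustrative data, and it omits the second (tuple-threshold) loop for brevity.

```python
# Sketch of the AOI characteristic-rule loop: generalize each attribute until
# its number of distinct values is within the threshold, then merge identical
# tuples and accumulate votes. Hierarchy and data are illustrative; the
# second while loop (tuple-threshold control) is omitted for brevity.
from collections import Counter

PARENT = {"private": "non-gov", "self-emp-inc": "non-gov",
          "state-gov": "gov", "local-gov": "gov"}

def generalize_relation(rows, threshold):
    rows = [list(r) for r in rows]
    for i in range(len(rows[0])):                      # vertical: per attribute
        while len({r[i] for r in rows}) > threshold:
            if not any(r[i] in PARENT for r in rows):
                break  # no higher-level concept left: attribute would be removed
            for r in rows:                             # concept tree ascension
                r[i] = PARENT.get(r[i], r[i])
    # Merge identical tuples; the count is the accumulated vote.
    return Counter(tuple(r) for r in rows)

data = [("private", "msc"), ("self-emp-inc", "msc"),
        ("state-gov", "phd"), ("local-gov", "msc")]
print(dict(generalize_relation(data, threshold=2)))
# {('non-gov', 'msc'): 2, ('gov', 'phd'): 1, ('gov', 'msc'): 1}
```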
The AOI discriminant rule algorithm [1] is shown below:

For each attribute Ai (1 ≤ i ≤ n, where n = # of attributes) in the generalized relation GR
{
    Mark the overlapping tuples
    While #_of_distinct_values_in_attribute_Ai > threshold
    {
        If there is no higher-level concept in the concept hierarchy for attribute Ai
        Then remove attribute Ai
        Else substitute the value of Ai by its corresponding minimal generalized concept
        Mark the overlapping tuples
        Merge identical tuples
    }
}
While #_of_tuples in GR > threshold
{
    Selectively generalize attributes
    Mark the overlapping tuples
    Merge identical tuples
}
The AOI discriminant rule algorithm implements steps one to eight of the generalization strategy. Since the AOI discriminant rule and AOI characteristic rule algorithms share generalization strategy steps one to seven, they essentially follow the same process; the only difference is step eight. They also share the same sub-processes, i.e. controlling the number of distinct attribute values as the first sub-process and controlling the number of tuples as the second. The handling of overlapping tuples, as the eighth generalization strategy step, is performed at the beginning before the first sub-process, and in both sub-processes before merging identical tuples.
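Step eight, the handling of overlapping tuples, can be sketched as follows; the generalized tuples and class contents are illustrative.

```python
# Sketch of generalization step 8: generalized tuples whose form appears in
# both the target and contrasting classes are marked as overlapping and
# eliminated from the final generalized relation. Data are illustrative.

def drop_overlaps(target, contrasting):
    """Return the target tuples with overlaps removed, plus the marked overlaps."""
    overlap = set(target) & set(contrasting)
    return [t for t in target if t not in overlap], sorted(overlap)

target = [("science", "good"), ("science", "excellent")]
contrasting = [("science", "good"), ("social", "good")]
kept, marked = drop_overlaps(target, contrasting)
print(kept)    # [('science', 'excellent')]
print(marked)  # [('science', 'good')]
```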
6. AOI ADVANTAGES AND DISADVANTAGES
AOI provides a simple and efficient way to learn knowledge rules from a large database and has many advantages [9], such as:
1) AOI provides additional flexibility over many machine learning algorithms.
2) AOI can learn knowledge rules in different conjunctive and disjunctive forms, providing more choices for experts and users.
3) AOI can use the facilities of a traditional relational database, such as selection, join and projection, whereas most learning algorithms suffer from inefficiency problems in a large database environment.
4) AOI can learn qualitative rules with quantitative information, while many machine learning algorithms can only learn qualitative rules.
5) AOI can handle noisy data and exceptional cases elegantly by incorporating statistical techniques in the learning process, whereas some learning systems can only work in a 'noise free' environment.
However, AOI also has disadvantages [10], such as:
1) AOI can only provide a snapshot of the generalized knowledge, not a global picture. The global picture can only be revealed by trying different thresholds repeatedly.
2) Adjusting the thresholds results in different sets of generalized tuples; however, trying different thresholds repeatedly is time-consuming and tedious work.
3) There is a problem in selecting the best generalized rules between a large and a small threshold: a large threshold value leads to a relatively complex rule with many disjuncts, and the results may not be fully generalized, whereas a small threshold value leads to a simple rule with few disjuncts, but the results may be over-generalized, with a risk of losing some valuable information.
7. AOI CURRENT STUDIES
There are a number of recent studies on AOI. Chen et al. proposed a global AOI method employing a multiple-level mining technique with multiple minimum supports in order to generalize all interesting general knowledge [30]. Wu et al. proposed a Global Negative AOI (GNAOI) approach that can generate comprehensive and multiple-level negative generalized knowledge at the same time [31]. Muyeba et al. proposed clusterAOI, a hybrid interestingness heuristic algorithm which uses attribute features such as concept hierarchies and distinct domain attribute values to dynamically recalculate new attribute thresholds for each less significant attribute [32]. Huang et al. introduced the Modified AOI (MAOI) method to deal with multi-valued attribute tables and further sort readers into different clusters. Instead of using the concept hierarchy and concept trees, the MAOI method implements concept climbing and generalization of the multi-valued attribute table with Boolean algebra and a modified Karnaugh map, and then describes the clusters with concept descriptions [33].
The over-generalization problem in AOI has been reduced with entropy measurement, extending the AOI algorithm with feature selection so that the generalization process depends on feature entropy [41]. AOI has also been combined with EP (Emerging Patterns) into AOI-HEP (Attribute Oriented Induction High Emerging Pattern), used to mine frequent and similar patterns [42,43,44], with future research directions such as inverse discovery learning, learning from more than two datasets, and learning other knowledge rules [45]. Moreover, the MAOI (Modified AOI) algorithm was proposed to deal with multi-valued attributes by converting the data to Boolean bits and using a Karnaugh map to converge the attributes [46]. Furthermore, AOI was modified into Frequency Count AOI (FC-AOI) and used to mine network data [47]. AOI has also been extended as Extended Attribute Oriented Induction (EAOI) for clustering mixed data types, where EAOI handles major values and numeric attributes [48,49].
In addition, AOI was chosen as the second of five steps in a proposed algorithm to produce AOI characteristic rules for parallel machine scheduling [50]. Another approach performs classification using decision tree induction, improving the C4.5 classifier with a four-step procedure whose first step is generalization by AOI [51]. Finally, CFAOI (Concept-Free AOI) was proposed to free AOI from the constraint of the concept tree on multi-valued attributes by combining simplified binary digits with a Karnaugh map [52].
8. CONCLUSION
AOI, now 26 years old since 1989, has proved that it still matters for finding patterns, and it has been combined with, and used as a complement to, other data mining techniques. AOI can mine many different patterns, and other patterns are possible in the future. AOI has proved to be a powerful mining technique, since many patterns can be mined into simple pattern results. AOI is particularly strong at rolling up/summarizing data from low to high levels in a concept tree/hierarchy, which is what produces simple patterns. Implementations of AOI show that it is useful and recognized for mining and summarizing patterns from huge numbers of patterns of many different kinds. The use of AOI in fields such as business, education, engineering and health should be mapped in order to increase the reliability of AOI as a proven and powerful data mining technique.
Acknowledgement
This research is supported under Program of
research incentive of national innovation
system (SINAS) from Ministry of Research,
Technology and Higher Education of the
Republic of Indonesia, decree number
147/M/Kp/IV/2015, Research code: RD-2015-
0020.
REFERENCES
[1] Han, J., Cai, Y. and Cercone, N. 1993. Data-driven discovery of quantitative rules in relational databases. IEEE Transactions on Knowledge and Data Engineering, 5(1), 29-40.
[2] Han, J. and Fu, Y. 1995. Exploration of the power of attribute-oriented induction in data mining. In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, eds. Advances in Knowledge Discovery and Data Mining, 399-421.
[3] Han, J., Cai, Y. and Cercone, N. 1992. Knowledge discovery in databases: An attribute-oriented approach. In Proceedings of the 18th Int. Conf. on Very Large Data Bases, 547-559.
[4] Han, J. 1994. Towards efficient induction mechanisms in database systems. Theoretical Computer Science, 133(2), 361-385.
[5] Cheung, D.W., Fu, A.W. and Han, J. 1994. Knowledge discovery in databases: A rule-based attribute-oriented approach. In Proceedings of the Int'l Symp. on Methodologies for Intelligent Systems, 164-173.
[6] Cheung, D.W., Hwang, H.Y., Fu, A.W. and Han, J. 2000. Efficient rule-based attribute-oriented induction for data mining. Journal of Intelligent Information Systems, 15(2), 175-200.
[7] Han, J., Fu, Y., Wang, W., Chiang, J., Gong, W., Koperski, K., Li, D., Lu, Y., Rajan, A., Stefanovic, N., Xia, B. and Zaiane, O.R. 1996. DBMiner: A system for mining knowledge in large relational databases. In Proceedings of the Int'l Conf. on Data Mining and Knowledge Discovery, 250-255.
[8] Han, J., Lakshmanan, L.V.S. and Ng, R.T. 1999. Constraint-based, multidimensional data mining. IEEE Computer, 32(5), 46-50.
[9] Cai, Y. 1989. Attribute-oriented induction in relational databases. Master thesis, Simon Fraser University.
[10] Wu, Y., Chen, Y. and Chang, R. 2009. Generalized knowledge discovery from relational databases. International Journal of Computer Science and Network, 9(6), 148-153.
[11] Imielinski, T. and Virmani, A. 1999. MSQL: A query language for database mining. Data Mining and Knowledge Discovery, 3, 373-408.
[12] Muyeba, M. 2005. On post-rule mining of inductive rules using a query operator. In Proceedings of Artificial Intelligence and Soft Computing.
[13] Meo, R., Psaila, G. and Ceri, S. 1998. An extension to SQL for mining association rules. Data Mining and Knowledge Discovery, 2, 195-224.
[14] Muyeba,M.K. and Keane,J.A. 1999. Extending attribute-oriented induction as a key-preserving data mining method. In Proceedings 3rd European Conference on Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer science, 1704, 448-455.
[15] Muyeba, M. and Marnadapali, R. 2005. A framework for Post-Rule Mining of Distributed Rules Bases. In Proceeding of Intelligent Systems and Control.
[16] Zaiane, O.R. 2001. Building Virtual Web Views. Data and Knowledge Engineering, 39, 143-163.
[17] Han, J., Chiang, J. Y., Chee, S., Chen, J., Chen, Q., Cheng, S., Gong, W., Kamber, M.,Koperski, K., Liu, G., Lu, Y., Stefanovic, N., Winstone, L., Xia, B. B., Zaiane, O. R., Zhang, S., and Zhu, H. 1997. DBMiner: a system for data mining in relational databases and data warehouses. In Proceedings of the 1997 Conference of the Centre For Advanced Studies on Collaborative Research, 8-.
[18] Frank, A. and Asuncion, A. 2010. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
[19] Elfeky, M.G., Saad, A.A. and Fouad, S.A. 2000. ODMQL: Object Data Mining Query Language. In Proceedings of the International Symposium on Objects and Databases, 128-140.
[20] Han, J. and Fu, Y. 1994. Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases. In Proceedings of AAAI Workshop on Knowledge Discovery in Databases, 157-168.
[21] Huang, Y. and Lin, S. 1996. An Efficient Inductive Learning Method for Object-Oriented Database Using Attribute Entropy. IEEE Transactions on Knowledge and Data Engineering, 8(6),946-951.
[22] Hu, X. 2003. DB-HReduction: A Data Preprocessing Algorithm for Data Mining Applications. Applied Mathematics Letters,16(6),889-895.
[23] Hsu, C. 2004. Extending attribute-oriented induction algorithm for major values and numeric values. Expert Systems with Applications, 27, 187-202.
[24] Han, J., Fu, Y., Huang, Y., Cai, Y., and Cercone, N. 1994. DBLearn: a system prototype for knowledge discovery in relational databases. ACM SIGMOD Record, 23(2), 516.
[25] Han, J., Fu, Y., and Tang, S. 1995. Advances of the DBLearn system for knowledge discovery in large databases. In Proceedings of the 14th international Joint Conference on Artificial intelligence, 2049-2050.
[26] Beneditto, M.E.M.D. and Barros, L.N.D. 2004. Using Concept Hierarchies in Knowledge Discovery. Lecture Notes in Computer Science, 3171,255–265.
[27] Fudger, D. and Hamilton, H.J. 1993. A Heuristic for Evaluating Databases for Knowledge Discovery with DBLEARN. In Proceedings of the International Workshop on Rough Sets and Knowledge Discovery: Rough Sets, Fuzzy Sets and Knowledge Discovery (RSKD '93), 44-51.
[28] Han, J. 1997. OLAP Mining: An Integration of OLAP with Data Mining. In Proceedings of the 7th IFIP 2.6 Working Conference on Database Semantics (DS-7),1-9.
[29] Han, J., Fu,Y., Koperski, K., Melli, G., Wang, W. And Zaïane, O.R. 1996. Knowledge Mining in Databases: An Integration of Machine Learning Methodologies with Database Technologies, Canadian Artificial Intelligence,(38),4-8.
[30] Chen, Y.L., Wu,Y.Y. and Chang, R. 2012. From data to global generalized knowledge. Decision Support Systems, 52(2), 295-307.
[31] Wu,Y.Y., Chen,Y.L., and Chang,R., 2011, Mining negative generalized knowledge from relational databases, Knowledge-Based Systems,24(1), 134-145.
[32] Muyeba, M.K., Crockett, K. and Keane, J.A. 2011. A hybrid interestingness heuristic approach for attribute-oriented mining. In Proceedings of the 5th KES international conference on Agent and multi-agent systems: technologies and applications (KES-AMSTA'11), 414-424.
[33] Huang, S., Wang, L. and Wang, W. 2011. Adopting data mining techniques on the recommendations of the library collections. In Proceedings of the 11th international conference on Information and Knowledge engineering, 46-52.
[34] Thanh, N.D., Phong, N.T. and Anh, N.K. 2010. Rule-Based Attribute-Oriented Induction for Knowledge Discovery. In Proceedings of the 2010 2nd International Conference on Knowledge and Systems Engineering (KSE '10), 55-62.
[35] Han, J. and Fu, Y. 1995. Discovery of Multiple-Level Association Rules from Large Databases. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95), 420-431.
[36] Han, J. 1998. Towards on-line analytical mining in large databases. SIGMOD Rec. 27(1), 97-107.
[37] Han, J., Cai, O., Cercone, N. and Huang, Y. 1995. Discovery of Data Evolution Regularities in Large Databases. Journal of Computer and Software Engineering,3(1),41-69.
[38] Cercone, N., Han, J., McFetridge, P., Popowich, F., Cai,Y., Fass, D., Groeneboer, C., Hall, G. and Huang, Y. 1994. System X and DBLearn: How to Get More from Your Relational Database, Easily. Integrated Computer-Aided Engineering, 1(4),311-339.
[39] Cai, Y., Cercone, N. and Han, J. 1991. Learning in relational databases: an attribute-oriented approach. Comput. Intell, 7(3),119-132.
[40] Cai, Y., Cercone, N. and Han, J. 1990. An attribute-oriented approach for learning classification rules from relational databases. In Proceedings of 6th International Conference on Data Engineering, 281-288.
[41] Al-Mamory, S.O., Hasson, S.T. and Hammid, M.K. 2013. Enhancing Attribute Oriented Induction of Data Mining, Journal of Babylon University, 7(21), 2286-2295.
[42] S. Warnars. 2015. Mining Frequent and similar patterns with Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP) Data Mining technique, International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS), 3(11), 266-276.
[43] S.Warnars, 2014. Mining Frequent pattern with Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP). Proceedings of the 2nd International Conference on Information and Communication Technology (ICoICT), 144-149.
[44] S.Warnars, 2012. Attribute Oriented Induction High Level Emerging Pattern. Proceedings of the International Conference on Granular Computing(GrC).
[45] S. Warnars, 2014. Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP) future research. Proceedings of the 8th International Conference on Information & Communication Technology and Systems (ICTS), 13-18.
[46] Huang, S., Hsu, P. and Lam, H.N.N. 2013. An attribute oriented induction approach for knowledge discovery from relational databases. Advances in Information Sciences and Service Sciences (AISS), 5(3), 511-519.
[47] Tanutama, L. 2013. Frequency count Attribute Oriented Induction of Corporate Network data for Mapping Business activity. International Conference on Advances Science and Contemporary Engineering (ICASCE), 149-152.
[48] Prasad, D.H. and Punithavalli, M. 2012. An integrated GHSOM-MLP with Modified LM Algorithm for Mixed Data Clustering, ARPN Journal of Engineering and Applied Sciences, 7(9), 1162-1169.
[49] Prasad, D.H. and Punithavalli, M. 2013. A Novel Approach for Mixed Data Clustering using Dynamic Growing Hierarchical Self-Organizing Map and Extended Attribute-Oriented Induction. Life Science Journal, 10(1), 3259-3266.
[50] Kaviani, M., Aminnayeri, M, Rafienejad, S.N. and Jolai, F.2012. An appropriate pattern to solving a parallel machine scheduling by combination of meta-heuristic and data mining, Journal of American Science, 8(1), 160-167.
[51] Ali, M.M, Qaseem, M.S., Rajamani, L. and Govardhan, A. 2013. Extracting useful Rules Through Improved Decision Tree Induction using Information Entropy, International Journal of Information Sciences and Techniques(IJIST), 3(1), 27-41.
[52] Huang, S. 2013. CFAOI: Concept-Free AOI on Multi Value Attributes. Life Science Journal, 10(4), 2341-2348.