8/6/2019 Lecture+32 33+Data+Mining+With+ACO
DATA MINING WITH ANT COLONY OPTIMIZATION
-
ANT ALGORITHM
Another Collective Intelligence Approach
Ants appeared on earth some 100 million years ago
They have a current total population estimated at 10^16 individuals
Most of these ants are social insects living in colonies
(population may vary from 30 to millions of individuals)
-
ANT ALGORITHM
Another Collective Intelligence Approach
For several decades researchers have been fascinated by the
emergent behavior of colonies of social insects, such as ants
Despite the relative simplicity of an individual's behavior, the
colony as a whole can exhibit highly adaptive behavior,
leading researchers to consider the adaptation of these
collective processes for use in the area of artificial intelligence
-
ANT ALGORITHM
Food Foraging Behaviour
Observations of the foraging behavior of ants have inspired
the development of ant-based algorithms
These algorithms are used mainly to solve combinatorial
optimization problems defined over discrete search spaces
One of the first behaviors studied by ethologists was the
ability of ants to find the shortest path between their nest and
a food source. These studies resulted in the first algorithmic
models of the foraging behavior of ants (Dorigo)
-
ANT ALGORITHM
Food Foraging Behaviour
How do ants find the shortest path between their nest and
food source, without any visible, central, active coordination
mechanism?
Studies have shown that the search for food is random in the
beginning
As soon as a food source is located, activity becomes more
organized, with more and more ants following the same
(shortest) path to the food source
-
Ant Algorithm
An artificial ant can be considered a simple computational
agent. Logic is implemented in the artificial ant to select a
path whenever several paths can be followed
A number of artificial "ants" construct solutions to the
problem at hand by the repeated selection of parts from a
predefined set of solution components
These ants select components probabilistically, biased by
heuristic information (a problem-specific heuristic measure
of a component's utility) and pheromone information,
typically associated with the solution components
-
Ant Algorithm
To simulate the real-world process by which ants find the
shortest path to a food source, electronic ants deposit
pheromone on components in proportion to the quality of the
solutions that contain them
This is similar to a shorter path receiving pheromone
reinforcement sooner than a longer path (as ants return from
the food source sooner), thereby increasing the likelihood
that later ants will choose the shorter path over the longer
one
-
Ant Algorithm
ACO has been applied to solve many problems
Well suited to discrete optimization problems
Job scheduling
Subset problems
Network routing
Vehicle routing
Bioinformatics
Data mining
To apply ACO to a problem we need
A representation of the solution
A method to determine the fitness of the solution
A heuristic measure for the solution components
-
Ant Algorithm
Consider the general problem of finding the shortest path
between two nodes on a graph, G = (V, E), where V is the set of
vertices (nodes) and E is a matrix representing the connections
between nodes
-
Ant Algorithm and the TS problem
The Traveling Salesperson problem (TSP) is to find for n
given cities a shortest closed tour that contains every city
exactly once
In each generation each of m ants constructs one solution
An ant starts from a random city and iteratively moves to
another city until the tour is complete and the ant is back at
its starting point
-
Ant Algorithm and the TS problem
When an ant decides which city to move to next, it does so
with a probability that is based on the distance to that city
and the amount of pheromone on the connecting edge
Let $d_{ij}$ be the distance between the cities $i$ and $j$
The probability that the ant chooses $j$ as the next city after it
has arrived at city $i$, where $j$ is in the set $S$ of cities that have
not been visited, is
$p_{ij} = \frac{[\tau_{ij}]^{\alpha} \, [\eta_{ij}]^{\beta}}{\sum_{k \in S} [\tau_{ik}]^{\alpha} \, [\eta_{ik}]^{\beta}}$
-
Ant Algorithm and the TS problem
Here $\tau_{ij}$ is the amount of pheromone on the edge $(i, j)$,
$\eta_{ij} = 1/d_{ij}$ is a heuristic value, and $\alpha$ and $\beta$ are
constants that determine the relative influences of the
pheromone value and the heuristic value on the decision of the ant
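The city-selection rule can be sketched in Python as follows. This is a minimal illustration, not the lecturer's implementation: the function names, the default values alpha = 1 and beta = 2, and the three-city toy instance are all assumptions made for the example.

```python
import random

def transition_probabilities(current, unvisited, tau, dist, alpha=1.0, beta=2.0):
    """Return {city: probability} of moving from `current` to each unvisited city.

    tau[i][j] is the pheromone on edge (i, j); the heuristic value is 1 / dist[i][j].
    alpha and beta weight pheromone against the heuristic (assumed defaults).
    """
    weights = {
        j: (tau[current][j] ** alpha) * ((1.0 / dist[current][j]) ** beta)
        for j in unvisited
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def choose_next_city(current, unvisited, tau, dist, alpha=1.0, beta=2.0, rng=random):
    """Sample the next city according to the transition probabilities."""
    probs = transition_probabilities(current, unvisited, tau, dist, alpha, beta)
    cities = list(probs)
    return rng.choices(cities, weights=[probs[c] for c in cities], k=1)[0]

# Toy instance (hypothetical): three cities, equal pheromone on every edge.
dist = [[0, 1, 2],
        [1, 0, 4],
        [2, 4, 0]]
tau = [[1.0] * 3 for _ in range(3)]

probs = transition_probabilities(0, {1, 2}, tau, dist)
next_city = choose_next_city(0, {1, 2}, tau, dist)
```

With equal pheromone, the nearer city 1 is favored purely by the heuristic term: its weight is 1 against 0.25 for city 2, so the probabilities are 0.8 and 0.2.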
-
Ant Algorithm and the TS problem
The solution might have been even better if the city had been
replaced by another city
But now, due to its high pheromone value, it will continue to
have a strong probability of selection for a considerable
length of time (until pheromone values on competing cities
become much higher)
A high level of pheromone will be associated only with those
links that are continually selected in good solutions
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
The goal of data mining is to discover knowledge that
is not only accurate, but also comprehensible for the
user
Comprehensibility is important whenever discovered
knowledge will be used for supporting a decision made
by a human user
If discovered knowledge is not comprehensible, it can
lead to incorrect decisions
-
CLASSIFICATION PROCESS: MODEL CONSTRUCTION
Training Data
NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no
Classification Algorithms
Classifier (Model)
IF rank = professor
OR years > 6
THEN tenured = yes
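As a quick check, the classifier rule can be applied back to the six training rows. The sketch below is illustrative only; the function name predict_tenured and the tuple layout are assumptions made for the example.

```python
# Training data from the slide: (name, rank, years, tenured)
training_data = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]

def predict_tenured(rank, years):
    """IF rank = professor OR years > 6 THEN tenured = yes (otherwise no)."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# Count how many training rows the rule classifies correctly.
correct = sum(
    predict_tenured(rank, years) == tenured
    for _, rank, years, tenured in training_data
)
accuracy = correct / len(training_data)
```

The rule reproduces the TENURED column for all six rows, so its accuracy on the training data is 1.0.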
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
The Ant-Miner (ant colony based data miner) algorithm
was proposed by Rafael S. Parpinelli
The goal is to extract classification rules by using the Ant
Colony Optimization algorithm
The performance of Ant-Miner is compared with CN2, a
well-known data mining algorithm for classification, in
six public domain data sets
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
In classification, discovered knowledge is often
expressed as
IF <condition> THEN <class>
Each condition is referred to as a term, so that the rule
antecedent is a logical conjunction of terms, such as
IF term1 AND term2 AND ...
Each term is a triple <Attribute, Operator, Value>
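One possible way to represent such rules in code is sketched below. The class names Term and Rule, the method names, and the example case are hypothetical; the sketch supports only the "=" operator as a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class Term:
    """A triple <Attribute, Operator, Value>."""
    attribute: str
    operator: str
    value: str

    def matches(self, case: Dict[str, str]) -> bool:
        # Only equality is handled in this sketch (an assumption).
        if self.operator != "=":
            raise ValueError("only the '=' operator is supported in this sketch")
        return case.get(self.attribute) == self.value

@dataclass(frozen=True)
class Rule:
    """IF term1 AND term2 AND ... THEN <class>."""
    antecedent: Tuple[Term, ...]
    predicted_class: str

    def covers(self, case: Dict[str, str]) -> bool:
        # The antecedent is a conjunction: every term must match.
        return all(term.matches(case) for term in self.antecedent)

# Hypothetical example: IF rank = Professor THEN tenured = yes
rule = Rule(antecedent=(Term("rank", "=", "Professor"),), predicted_class="yes")
covered = rule.covers({"rank": "Professor", "years": "2"})
```

Making both classes frozen keeps terms hashable, which is convenient if terms are later used as dictionary keys for pheromone or heuristic values.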
-
ANT-MINER
The current version of Ant-Miner copes only with
categorical attributes, so that the operator element is
always "="
Continuous (real-valued) attributes are discretized in a
preprocessing step
Ant-Miner follows a sequential covering approach to discover a
list of classification rules covering all, or almost all, the
training cases
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
Ant Miner has five major modules:
General description
Heuristic function
Rule pruning
Pheromone updating
Use of discovered rules for classifying new cases
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
Search Space
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
Algorithm
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
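Term Selection Probability
$P_{ij} = \frac{\eta_{ij} \cdot \tau_{ij}(t)}{\sum_{i=1}^{a} x_i \cdot \sum_{j=1}^{b_i} \left( \eta_{ij} \cdot \tau_{ij}(t) \right)}$
Where:
$P_{ij}$ is the probability that term_ij, a candidate for selection, is added to
the current partial rule
$\eta_{ij}$ is the value of the heuristic function
$\tau_{ij}(t)$ is the amount of pheromone associated with the term
$a$ is the total number of attributes
$x_i$ is set to 1 if the attribute $A_i$ was not yet used by the current ant, or to 0 otherwise
$b_i$ is the number of values in the domain of the i-th attribute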
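The term selection probability can be sketched in Python as below. This is an illustration under assumptions: terms are keyed by hypothetical (i, j) index pairs, the function name is invented, and attributes already used by the ant are simply left out of the candidate set.

```python
def term_probabilities(eta, tau, used_attributes=frozenset()):
    """Return {(i, j): P_ij} for every term whose attribute i is still unused.

    eta and tau map (i, j) -> heuristic value and pheromone amount.
    Attributes in `used_attributes` are excluded from the candidates.
    """
    weights = {
        (i, j): eta[(i, j)] * tau[(i, j)]
        for (i, j) in eta
        if i not in used_attributes
    }
    total = sum(weights.values())
    if total == 0:
        return {}
    return {term: w / total for term, w in weights.items()}

# Toy values (hypothetical): two attributes with two values each.
eta = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.2}
tau = {term: 0.25 for term in eta}

p_all = term_probabilities(eta, tau)
p_after_attr0 = term_probabilities(eta, tau, used_attributes={0})
```

With uniform pheromone the probabilities follow the heuristic values alone. Once attribute 0 has been used, only the two terms of attribute 1 remain and their probabilities renormalize to 0.6 and 0.4.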
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
Heuristic Function
For each term_ij of the form $A_i = V_{ij}$, where $A_i$ is the i-th
attribute and $V_{ij}$ is the j-th value belonging to the domain
of $A_i$, its entropy is:
$H(W \mid A_i = V_{ij}) = -\sum_{w=1}^{k} P(w \mid A_i = V_{ij}) \cdot \log_2 P(w \mid A_i = V_{ij})$
Where
$W$ is the class attribute
$k$ is the number of classes in the domain of the class attribute
$A_i$ is the i-th attribute
$V_{ij}$ is the j-th value of the domain of the attribute $A_i$
$P(w \mid A_i = V_{ij})$ is the empirical probability of observing class $w$
conditional on having observed $A_i = V_{ij}$
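The entropy of a term can be estimated from training cases as in the sketch below. The function name and the dictionary-based case format are assumptions; the example rows reuse the rank and tenured columns of the training table shown earlier in the lecture.

```python
import math
from collections import Counter

def term_entropy(cases, attribute, value, class_attribute):
    """Entropy (in bits) of the class distribution among cases with attribute = value."""
    labels = [case[class_attribute] for case in cases if case[attribute] == value]
    if not labels:
        raise ValueError("no training case has %s = %s" % (attribute, value))
    n = len(labels)
    # Sum -p * log2(p) over the classes that actually occur (p > 0).
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

cases = [
    {"rank": "Assistant Prof", "tenured": "no"},
    {"rank": "Assistant Prof", "tenured": "yes"},
    {"rank": "Professor", "tenured": "yes"},
    {"rank": "Associate Prof", "tenured": "yes"},
    {"rank": "Assistant Prof", "tenured": "no"},
    {"rank": "Associate Prof", "tenured": "no"},
]

h_professor = term_entropy(cases, "rank", "Professor", "tenured")
h_associate = term_entropy(cases, "rank", "Associate Prof", "tenured")
h_assistant = term_entropy(cases, "rank", "Assistant Prof", "tenured")
```

A term whose cases all share one class (rank = Professor) has entropy 0, while an even split between two classes (rank = Associate Prof) gives the maximum of 1 bit.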
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
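Heuristic Function
The proposed normalized, information-theoretic
heuristic function is
$\eta_{ij} = \frac{\log_2 k - H(W \mid A_i = V_{ij})}{\sum_{i=1}^{a} x_i \cdot \sum_{j=1}^{b_i} \left( \log_2 k - H(W \mid A_i = V_{ij}) \right)}$
Where:
$H(W \mid A_i = V_{ij})$ is the entropy of term_ij and $k$ is the number of classes
$a$ is the total number of attributes
$x_i$ is set to 1 if the attribute $A_i$ was not yet used by the current ant, or to 0 otherwise
$b_i$ is the number of values in the domain of the i-th attribute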
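A minimal Python sketch of the normalized heuristic follows. It assumes the entropies have already been computed and are stored in a dictionary keyed by hypothetical (i, j) index pairs; the function name and toy values are invented for the example.

```python
import math

def normalized_heuristic(entropy, k, used_attributes=frozenset()):
    """Return {(i, j): eta_ij} from per-term entropies and the number of classes k.

    entropy maps (i, j) -> H(W | A_i = V_ij). Terms of already used
    attributes are excluded, so they do not contribute to the normalization.
    """
    max_entropy = math.log2(k)
    gain = {
        term: max_entropy - h
        for term, h in entropy.items()
        if term[0] not in used_attributes
    }
    total = sum(gain.values())
    if total == 0:
        raise ValueError("all candidate terms have maximum entropy")
    return {term: g / total for term, g in gain.items()}

# Toy entropies (hypothetical) for a two-class problem.
entropy = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 0.5}
eta = normalized_heuristic(entropy, k=2)
```

The lower the entropy of a term, the larger its heuristic value: the term with entropy 0 receives 0.5, the term with maximum entropy receives 0, and the values sum to 1.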
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
Rule Pruning
The goal of rule pruning is to remove irrelevant terms from the
rule
Rule pruning potentially increases the predictive
power of the rule, helping to avoid overfitting to the training data
Rule pruning also improves the simplicity of the rule
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
Rule Pruning
The basic idea is to iteratively remove one term at a time
from the rule while this process improves the quality of
the rule
In each iteration, every term is tentatively removed in turn, and
the term whose removal most improves the quality of the rule is
then removed for good
This process is repeated until the rule has just one term
or until there is no term whose removal will improve the
quality of the rule
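The pruning loop can be sketched as follows. The function prune_rule and the toy quality function are hypothetical: in practice the quality would be measured on the training cases, whereas here it simply rewards two "useful" terms and penalizes the rest so that the behavior is easy to follow.

```python
def prune_rule(terms, quality):
    """Greedily remove terms while removal strictly improves rule quality.

    terms   : list of terms in the rule antecedent
    quality : callable taking a list of terms and returning a number
    """
    terms = list(terms)
    best_quality = quality(terms)
    while len(terms) > 1:
        # Tentatively remove each term in turn and keep the best result.
        candidates = [terms[:i] + terms[i + 1:] for i in range(len(terms))]
        best_candidate = max(candidates, key=quality)
        candidate_quality = quality(best_candidate)
        if candidate_quality <= best_quality:
            break  # no removal improves the rule
        terms, best_quality = best_candidate, candidate_quality
    return terms

# Toy quality (an assumption for illustration only).
USEFUL = {"a", "b"}

def toy_quality(terms):
    useful = sum(1 for t in terms if t in USEFUL)
    irrelevant = len(terms) - useful
    return useful - 0.1 * irrelevant

pruned = prune_rule(["a", "noise1", "b", "noise2"], toy_quality)
```

In this toy run the two irrelevant terms are removed one per iteration, and the loop stops at ["a", "b"] because removing either remaining term would lower the quality.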
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
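Pheromone Updating
The initial amount of pheromone deposited at each path
position is inversely proportional to the number of values
of all attributes and is defined by
$\tau_{ij}(t = 0) = \frac{1}{\sum_{i=1}^{a} b_i}$
where $a$ is the total number of attributes and $b_i$ is the number of
values in the domain of the i-th attribute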
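As a small illustration, the initialization can be written as below. The function name and the domain sizes in the example are assumptions.

```python
def initial_pheromone(domain_sizes):
    """Return tau[i][j], with every entry set to 1 / (total number of attribute values).

    domain_sizes[i] is b_i, the number of values of the i-th attribute.
    """
    tau0 = 1.0 / sum(domain_sizes)
    return [[tau0] * b for b in domain_sizes]

# Hypothetical data set: one attribute with 3 values, one with 2 values.
tau = initial_pheromone([3, 2])
```

Here every term starts with pheromone 1/5 = 0.2, so all terms are equally attractive before any rule has been evaluated.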
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
Pheromone Updating
The amount of pheromone associated with each
term_ij occurring in the rule found by the ant (after
pruning) is increased in proportion to the quality of
that rule
This corresponds to increasing the probability of
term_ij being chosen by other ants in the future in
proportion to the quality of the rule
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
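Pheromone Updating
Pheromone updating is performed for every term_ij
that occurs in the rule
$\tau_{ij}(t + 1) = \tau_{ij}(t) + \tau_{ij}(t) \cdot Q, \quad \forall \, term_{ij} \in R$
$Q$ is the quality of the rule
$R$ is the set of terms occurring in the rule constructed by
the ant at iteration $t$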
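The update for the terms in the rule can be sketched as below. The function name, the (i, j) keys, and the numeric values are hypothetical, and the sketch covers only the reinforcement step described on this slide.

```python
def update_pheromone(tau, rule_terms, quality):
    """Return a new pheromone table after reinforcing the terms of one rule.

    tau        : dict mapping (i, j) -> current pheromone amount
    rule_terms : set of (i, j) keys for the terms in the rule (the set R)
    quality    : quality Q of the rule
    """
    return {
        term: (value + value * quality) if term in rule_terms else value
        for term, value in tau.items()
    }

# Toy values (hypothetical): three terms, one of which occurs in the rule.
tau = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.2}
new_tau = update_pheromone(tau, rule_terms={(0, 0)}, quality=0.5)
```

Only the term in the rule is reinforced, growing from 0.2 to 0.3 for a rule of quality 0.5; the other terms keep their values in this step.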
-
DATA MINING WITH AN ANT COLONY OPTIMIZATION ALGORITHM
Rule Quality
The quality of a rule is computed according to:
$Q = \frac{TP}{TP + FN} \cdot \frac{TN}{FP + TN}$
TP is the number of cases covered by the rule that have
the same class label as the rule
FP is the number of cases covered by the rule that have
a class label different from that of the rule
FN is the number of cases that are not covered by the rule
but have the same class label as that of the rule
TN is the number of cases that are not covered by the rule
and which do not have the same class label as that of the
rule
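Taking the quality as the product of sensitivity, TP / (TP + FN), and specificity, TN / (FP + TN), a small Python sketch looks like this. The function name, the handling of empty denominators, and the example counts are assumptions made for illustration.

```python
def rule_quality(tp, fp, fn, tn):
    """Quality of a rule as sensitivity times specificity.

    Returns 0.0 for a factor whose denominator is zero (an assumption,
    chosen here only to avoid a division error).
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (fp + tn) if (fp + tn) else 0.0
    return sensitivity * specificity

# Hypothetical counts for one rule on ten training cases.
q = rule_quality(tp=3, fp=1, fn=1, tn=5)
```

For these counts the sensitivity is 3/4 and the specificity is 5/6, giving a quality of 0.625. A rule that covers every case of its class and no case of any other class reaches the maximum quality of 1.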
-
EXPERIMENTS
-
CONCLUSION
Compared with CN2, Ant-Miner achieved better predictive
accuracy in four data sets, while CN2
obtained better results in one data set
On the other hand, Ant-Miner consistently found
much smaller rule lists than CN2