Cost-Sensitive Bayesian Network algorithm
Introduction:
Machine learning algorithms are an increasingly important area of research and application in Artificial Intelligence and data mining. One of the most important of these is the Bayesian network, which has been widely used in real-world applications such as medical diagnosis, image recognition, fraud detection, and inference problems. In all of these applications, accuracy alone is not a sufficient evaluation measure, because each decision involves costs. For example, when a fraud detection application predicts a new case, there are costs involved when the classifier labels a fraudulent case as non-fraudulent. In addition, fraud databases have an unbalanced class distribution, which is known to affect learning algorithms adversely. Therefore, this project develops a new algorithm that aims to minimize the costs associated with prediction, misclassification, imbalanced data, time, and tests.
In this work, we attempt to create a new cost-sensitive Bayesian network learning algorithm by adapting the standard Bayesian network algorithm, which focuses on accuracy only. There are several ways of adapting the algorithm to make it cost-sensitive: changing the distribution of the data; changing the construction process, including adopting alternative measures in the algorithm that take account of cost; and using a Genetic Algorithm to learn the structure of the Bayesian network (BN). This work will apply these approaches: amending the distribution, amending the formula, and using Genetic Algorithms. Finally, an empirical evaluation of the developed algorithms will be carried out on artificial data sets (e.g. diabetes data, lung cancer data, bank data, etc.).
Conclusion:
In real-world problems such as fraud detection, medical diagnosis, or any other
decision problem, one class in the data set (such as the fraud class) is often much
rarer and more expensive to misclassify than the other, because the cost of failing
to recognize instances that belong to the rare class is high. However, most machine
learning methods do not take cost into account. These cost-insensitive algorithms
can therefore give poor results, because ignoring cost may produce a very weak
model. In practice, misclassification (classification error) is a very common
problem in real-world data mining when the class labels are imbalanced.
Eman
University of Salford, Manchester, UK
Sponsor in Libya (Tripoli University)
Hypotheses / The problem (Research problem)
Methodology
Cost-insensitive vs. cost-sensitive
A cost-insensitive classifier focuses on accuracy only (class label output).
A cost-sensitive classifier attempts to minimize the expected cost.
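The difference between the two decision rules can be illustrated with a minimal sketch. The cost matrix, the posterior probabilities, and the function names below are hypothetical values chosen for illustration; they are not taken from the poster.

```python
# Hypothetical cost matrix COST[actual][predicted]; the values are assumptions.
COST = {
    "fraud": {"fraud": 0.0, "nonfraud": 100.0},   # missing a fraud is expensive
    "nonfraud": {"fraud": 5.0, "nonfraud": 0.0},  # a false alarm is cheap
}


def expected_cost(posterior, predicted, cost=COST):
    """Expected cost of predicting `predicted`, given class posteriors."""
    return sum(p * cost[actual][predicted] for actual, p in posterior.items())


def accuracy_decision(posterior):
    """Cost-insensitive rule: pick the most probable class."""
    return max(posterior, key=posterior.get)


def min_cost_decision(posterior, cost=COST):
    """Cost-sensitive rule: pick the class with the lowest expected cost."""
    return min(posterior, key=lambda label: expected_cost(posterior, label, cost))


# Assumed classifier output for one transaction.
posterior = {"fraud": 0.2, "nonfraud": 0.8}
print(accuracy_decision(posterior))
print(min_cost_decision(posterior))
```

With these assumed numbers, the two rules disagree: the most probable class is nonfraud, but predicting nonfraud carries the higher expected cost, so the cost-sensitive rule chooses fraud.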
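[Figure: Training data, e.g. ($43.45, retail, 10040, ..., nonfraud) and ($246,70, weapon, 94583, ..., fraud), are passed to a learner (1. Decision trees, 2. Rules, 3. Naive Bayes, ...) that builds a classifier for Transaction {fraud, nonfraud}. Testing data, e.g. ($99.99, pharmacy, 10027, ..., ?) and ($1.00, gas, 00234, ..., ?), are passed to the classifier, which outputs the class labels (nonfraud, fraud).]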
The problems mentioned above arise when classifying a data set.
Therefore, three methods have been proposed to tackle these problems and
minimize the expected misclassification cost.
Amend the data distribution to reflect cost.
Amend the formula by modifying the statistical measures to include cost.
Utilize a Genetic algorithm to evolve a 'fittest' Bayesian network.
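The first of these methods, amending the data distribution, can be sketched as follows. This is only one simple way to do it, assuming each instance is replicated in proportion to the misclassification cost of its class; the poster does not specify the sampling scheme, and the function name, data, and costs below are hypothetical.

```python
from collections import Counter


def resample_by_cost(rows, labels, misclassification_cost):
    """Replicate each instance in proportion to the cost of misclassifying its class.

    The cheapest class keeps one copy per instance; costlier classes get
    round(cost / cheapest_cost) copies. This is an assumed scheme, not
    necessarily the one used in the poster.
    """
    min_cost = min(misclassification_cost.values())
    new_rows, new_labels = [], []
    for row, label in zip(rows, labels):
        copies = max(1, round(misclassification_cost[label] / min_cost))
        new_rows.extend([row] * copies)
        new_labels.extend([label] * copies)
    return new_rows, new_labels


# Hypothetical training data: one rare, costly fraud case among three others.
rows = [("retail", 40.0), ("weapon", 250.0), ("gas", 1.0), ("pharmacy", 100.0)]
labels = ["nonfraud", "fraud", "nonfraud", "nonfraud"]
costs = {"fraud": 6.0, "nonfraud": 2.0}  # assumed misclassification costs

new_rows, new_labels = resample_by_cost(rows, labels, costs)
print(Counter(new_labels))
```

A standard, cost-insensitive learner trained on the resampled data then sees the costly class more often, which is why this kind of approach treats the learner as a black box.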
Up to now, I have investigated experimentally how changing the distribution of the
data affects the performance and cost of a Bayesian classifier. I evaluated my
approach, called “Cost-Sensitive Bayesian Network using Sampling”, on 24
data sets from the UCI repository. I compared the proposed algorithm with
existing methods and with the original algorithm. The figure below shows the
results of the cost-sensitive Bayes Network algorithm obtained by changing the
distribution, alongside those of the original Bayes Network algorithm.
Results
Up to now, two new methods for cost-sensitive Bayesian Network algorithms
have been developed and explored: one that uses a black box (Sampling)
approach, and another that uses a transparent box approach, which amends the
selection measure (modifying the statistical measures) to take account of
costs.
The performance of our algorithms is evaluated and compared with other
algorithms, such as MetaCost+J4.8, a standard decision tree (J48), and standard
Bayesian networks.