Active Learning for Class Imbalance Problem

Upload: shauna-simon

Post on 27-Dec-2015


Page 1

Active Learning for Class Imbalance Problem

Page 2

Problem to be addressed

Motivation

The class imbalance problem refers to the situation in which at least one class has significantly fewer training examples than the others, or in which the examples of one class heavily outnumber the examples of the other class.

Most current machine learning algorithms, such as the support vector machine, logistic regression, and the naïve Bayes classifier, assume that the training data are balanced.

Over the last few decades, several effective methods have been proposed to attack this problem, such as up-sampling, down-sampling, and asymmetric bagging.

Page 3

Problem to be addressed

Detailed problem

Traditional machine learning algorithms are often biased toward the majority class.

This is because the goal of the classifier is to reduce the overall training error, without taking the class distribution into consideration.

Consequently, examples from the majority class are well classified, while examples from the minority class tend to be misclassified.
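The bias toward the majority class can be illustrated with a small sketch. The 990/10 class split and the always-predict-majority "classifier" are illustrative assumptions, not data or methods from the slides; the point is only that a low overall error can hide a complete failure on the minority class.

```python
# Toy labels: 1 = minority (positive) class, 0 = majority (negative) class.
# The 990/10 split is an illustrative assumption, not data from the slides.
y_true = [0] * 990 + [1] * 10

# A degenerate "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all examples that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of the examples whose true class is `label` that are predicted as `label`."""
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == label]
    return sum(t == p for t, p in relevant) / len(relevant)


overall = accuracy(y_true, y_pred)           # 99% of examples are correct
minority_recall = recall(y_true, y_pred, 1)  # no minority example is found
majority_recall = recall(y_true, y_pred, 0)  # every majority example is found
print(overall, minority_recall, majority_recall)
```

Here the training error is only 1%, yet every minority example is misclassified.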

Page 4

Several Common Approaches

From the data perspective: over-sampling, under-sampling, asymmetric bagging

From the learning algorithm perspective: adjusting the cost function, tuning the related parameters
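The two simplest data-level approaches can be sketched as follows. This is a minimal illustration of random under-sampling and random over-sampling only; the function names and the toy data are hypothetical, and the sketch assumes the majority class is at least as large as the minority class.

```python
import random


def random_undersample(majority, minority, rng):
    """Keep all minority examples plus an equally sized random subset of the majority."""
    kept = rng.sample(majority, len(minority))
    return kept + list(minority)


def random_oversample(majority, minority, rng):
    """Keep everything and replicate random minority examples until the classes match."""
    n_extra = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(majority) + list(minority) + extra


rng = random.Random(0)                          # fixed seed for repeatability
majority = [(x, 0) for x in range(100)]         # toy (feature, label) pairs
minority = [(x, 1) for x in range(100, 110)]

under = random_undersample(majority, minority, rng)  # 10 + 10 examples
over = random_oversample(majority, minority, rng)    # 100 + 100 examples
```

Under-sampling discards majority examples, while over-sampling only duplicates existing minority examples; neither adds new information to the training set.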

Page 5

Background Knowledge

Active Learning

As in semi-supervised learning, the key idea is to use both labeled and unlabeled data for classifier training.

Active learning is composed of four components: a small set of labeled training data, a large pool of unlabeled data, a base learning algorithm, and an active learner (selection strategy).

Active learning is not itself a machine learning algorithm; it can be seen as a wrapper method that enhances the base learner.
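The wrapper idea can be sketched as a generic pool-based loop in which the four components are passed in as arguments. All names here (`active_learning_loop`, `train_threshold`, `select_closest`, `oracle`) and the 1-D threshold learner are hypothetical choices made for illustration; they are not taken from the slides.

```python
def active_learning_loop(labeled, pool, train, select, oracle, budget):
    """Generic pool-based active learning wrapper.

    labeled: list of (x, y) pairs; pool: list of unlabeled x values;
    train: base learning algorithm, maps labeled data to a model;
    select: active learner, maps (model, pool) to an index into pool;
    oracle: returns the true label of x (for example a human annotator);
    budget: maximum number of label queries.
    """
    labeled, pool = list(labeled), list(pool)
    model = train(labeled)
    for _ in range(budget):
        if not pool:
            break
        i = select(model, pool)         # pick the most informative instance
        x = pool.pop(i)
        labeled.append((x, oracle(x)))  # ask for its label
        model = train(labeled)          # retrain the base learner
    return model, labeled


def train_threshold(labeled):
    """Toy base learner: threshold halfway between the nearest opposite-class points."""
    neg = max(x for x, y in labeled if y == 0)
    pos = min(x for x, y in labeled if y == 1)
    return (neg + pos) / 2


def select_closest(threshold, pool):
    """Toy selection strategy: index of the pool point nearest the threshold."""
    return min(range(len(pool)), key=lambda i: abs(pool[i] - threshold))


def oracle(x):
    """Stand-in for the human labeler; the true boundary 0.62 is an assumption."""
    return 1 if x >= 0.62 else 0


initial = [(0.0, 0), (1.0, 1)]
pool = [i / 100 for i in range(1, 100)]
model, labeled = active_learning_loop(
    initial, pool, train_threshold, select_closest, oracle, budget=10
)
```

With ten queries the toy learner places its threshold next to the true boundary, although the pool holds 99 unlabeled points. The base learner and the selection strategy can be swapped without touching the loop.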

The difference between semi-supervised learning and active learning

Page 6

Background Knowledge

Active Learning

Goals of active learning

Maximizing the learning performance while minimizing the number of labeled training examples required

Achieving better performance with the same amount of labeled training data

Needing fewer training samples to obtain the same learning performance

Page 7

Background Knowledge

Page 8

Background Knowledge

Page 9

An Example

SVM-based Active Learning

A small set of labeled training examples

A large pool of unlabeled data

Base learning algorithm: SVM

Active learner (selection strategy): the instances closest to the current separating hyperplane are selected and submitted for human labeling
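The selection strategy can be sketched for the linear case. This assumes the SVM has already been trained and exposes a hyperplane w·x + b = 0; the values of `w` and `b` and the pool below are hypothetical, and a kernel SVM would use its decision function value in place of the linear score.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Geometric distance |w.x + b| / ||w|| of point x from the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))


def closest_to_hyperplane(pool, w, b):
    """Index of the unlabeled instance nearest the current separating hyperplane."""
    return min(range(len(pool)), key=lambda i: distance_to_hyperplane(pool[i], w, b))


w, b = [1.0, -1.0], 0.0  # hypothetical hyperplane x1 = x2
pool = [[3.0, 0.0], [1.0, 0.9], [0.0, 2.0], [5.0, 5.5]]

query_index = closest_to_hyperplane(pool, w, b)
print(pool[query_index])  # the instance that would be sent for human labeling
```

Note that every instance in the pool is scored once per query.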

Page 10

Problems

SVM-based Active Learning

In classical active learning methods, the most informative samples are selected from the entire unlabeled pool.

In other words, each iteration of active learning involves computing the distance of every unlabeled sample to the decision boundary.

For large-scale data sets, this is time-consuming and computationally inefficient.

Page 11

Paper Contribution

Proposed method

Instead of querying the whole unlabeled pool, a random subset of L instances is first selected.

The instance in the subset closest to the hyperplane is then selected; L is chosen so that, with a given probability, this instance is among the top closest instances in the whole pool.
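A minimal sketch of the subset-based selection follows. The function name, the 1-D toy pool, and the subset size of 50 are arbitrary assumptions made for illustration; `distance` stands for whatever distance-to-hyperplane function the base learner provides.

```python
import random


def select_from_random_subset(pool, distance, subset_size, rng):
    """Sample `subset_size` pool indices uniformly at random and return the index
    of the sampled instance with the smallest distance to the hyperplane."""
    k = min(subset_size, len(pool))
    candidates = rng.sample(range(len(pool)), k)
    # Only the k sampled instances are scored, not the whole pool.
    return min(candidates, key=lambda i: distance(pool[i]))


rng = random.Random(42)  # fixed seed for repeatability
pool = [rng.uniform(-10.0, 10.0) for _ in range(10_000)]
distance = abs           # toy setup: 1-D points, hyperplane at x = 0

chosen = select_from_random_subset(pool, distance, subset_size=50, rng=rng)
print(chosen, pool[chosen])
```

Each query costs 50 distance computations here instead of 10,000, and that cost does not grow with the pool.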

Page 12

Paper Contribution

Proposed Method

Let $p$ be the fraction of the pool regarded as closest to the hyperplane. The probability that at least one of the $L$ instances is among the closest $p$ is $1 - (1 - p)^L$.

Requiring this probability to be at least $\eta$, we have $L \geq \log(1 - \eta) / \log(1 - p)$.

Page 13

Paper Contribution

Proposed Method

For example

With 95% probability, the active learner will pick an instance that is among the top 5% of instances closest to the separating hyperplane by randomly sampling only L instances, regardless of the training set size.
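The required subset size for this example can be worked out numerically. The sketch assumes instances are sampled uniformly and independently, so that a sample of size L misses the top fraction with probability (1 - fraction)^L; the function and parameter names are hypothetical.

```python
import math


def subset_size(top_fraction, confidence):
    """Smallest L such that 1 - (1 - top_fraction)**L >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - top_fraction))


def hit_probability(top_fraction, L):
    """Probability that at least one of L random instances lies in the top fraction."""
    return 1.0 - (1.0 - top_fraction) ** L


L = subset_size(top_fraction=0.05, confidence=0.95)  # 59
print(L, hit_probability(0.05, L))
```

Under these assumptions, 59 random instances are enough for the 95% / top-5% setting, whether the pool holds a thousand instances or a million.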

Page 14

Experiments

Page 15

Experiments

Evaluation Metric

g-means

$g = \sqrt{\text{sensitivity} \cdot \text{specificity}}$

where sensitivity and specificity are the accuracies on the positive and negative instances, respectively
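As a small illustration, g-means can be computed from confusion-matrix counts as the geometric mean of sensitivity and specificity. The counts below are made-up examples, not results from the experiments.

```python
import math


def g_means(tp, fn, tn, fp):
    """Geometric mean of sensitivity (accuracy on positives)
    and specificity (accuracy on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# A classifier that labels everything negative: 99% accurate, but g-means is 0.
always_negative = g_means(tp=0, fn=10, tn=990, fp=0)

# A classifier that does reasonably well on both classes.
balanced = g_means(tp=8, fn=2, tn=90, fp=10)  # sqrt(0.8 * 0.9)
print(always_negative, balanced)
```

Unlike overall accuracy, the metric drops to zero as soon as one class is entirely misclassified, which makes it suitable for imbalanced data.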

Page 16

Experiments

Page 17

Experiments

Page 18

Experiments

Page 19

Experiments

Page 20

Conclusions

This paper proposes a method to address the class imbalance problem using active learning.

Experimental results show that this approach can significantly decrease the training time while maintaining the same or even a higher g-means value with fewer training examples.

Actively selecting informative examples from a randomly selected subset avoids searching the whole unlabeled pool.
