Identifying Competence-Critical Instances for Instance-Based Learners

2001. 5. 9
Presenter: Kyu-Baek Hwang


Page 1: Identifying Competence-Critical Instances for Instance-Based Learners

Identifying Competence-Critical Instances for Instance-Based Learners

2001. 5. 9

Presenter: Kyu-Baek Hwang

Page 2: Identifying Competence-Critical Instances for Instance-Based Learners

Abstract

The basic nearest neighbor classifier with a large dataset
  Classification accuracy and response time
Review of past work tackling these problems
  No consistent method
Insight into the problem characteristics
Iterative case filtering (ICF) algorithm

Page 3: Identifying Competence-Critical Instances for Instance-Based Learners

Introduction

Harmful and superfluous instances are stored.
Selectively store instances (or delete stored instances).
The data miner has to gain insight into the structure of the classes in the instance space.
Experimental comparison of RT3 and ICF
  Neither algorithm performs better in all cases.

Page 4: Identifying Competence-Critical Instances for Instance-Based Learners

Defining the Problem

Two practical issues that arise in this area
  Instance removal (retain only the critical instances)
  Different approaches according to the type of classification problem
The same (or higher) accuracy with less storage
  Which instances should be deleted?

Page 5: Identifying Competence-Critical Instances for Instance-Based Learners

Four Cases Where NNC Fails

Noisy instance
Close to the interclass border
  Border instances are critical in general.
Small region defining the class
  Small k values cope with this kind of problem.
Unsolvable problem
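Two of these failure cases can be reproduced on toy data. The following is a minimal sketch, assuming Euclidean distance and majority voting; the function name `knn_classify` and the data points are hypothetical and chosen only for illustration. It shows a noisy instance misleading a 1-NN classifier, and a single-instance class region that is lost once k grows.

```python
from collections import Counter
from math import dist

def knn_classify(cases, query, k=1):
    """Return the majority label among the k stored cases nearest to query."""
    nearest = sorted(cases, key=lambda c: dist(c[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy case-base (not from the slides).
cases = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
    ((5.0, 0.0), "b"), ((5.0, 1.0), "b"), ((6.0, 0.0), "b"),
    ((0.5, 0.5), "b"),    # noisy instance inside the "a" region
    ((10.0, 10.0), "c"),  # a class defined by a very small region
]

# Noisy instance: 1-NN follows the mislabeled case, 3-NN outvotes it.
noisy_k1 = knn_classify(cases, (0.6, 0.6), k=1)
noisy_k3 = knn_classify(cases, (0.6, 0.6), k=3)

# Small region: 1-NN finds the lone "c" case, 3-NN is outvoted by distant cases.
small_k1 = knn_classify(cases, (10.1, 10.0), k=1)
small_k3 = knn_classify(cases, (10.1, 10.0), k=3)
```

The two queries pull in opposite directions: a larger k smooths over noise but erases small class regions, which is why no single setting handles all four cases.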

Page 6: Identifying Competence-Critical Instances for Instance-Based Learners

Instance Space Structure

Two categories of instance space structure
  Homogeneous region (locality)
  Non-homogeneous region (no locality)

Page 7: Identifying Competence-Critical Instances for Instance-Based Learners

Which Instances Are Critical?

Prototypes
  For non-homogeneous regions
Instances with high utility
  Needs classification feedback
Instances which lie on borders are almost always critical.

Page 8: Identifying Competence-Critical Instances for Instance-Based Learners

Review

Competence enhancement
  By removing noisy or corrupt instances
Competence preservation
  By removing superfluous instances
Hybrid approach
  Many modern approaches

Page 9: Identifying Competence-Critical Instances for Instance-Based Learners

Competence Enhancement

Stochastic noise
Wilson Editing
  All instances which are incorrectly classified by their nearest neighbors are assumed to be noisy instances.
Smoothing effect
  Empirically tested
Noisy instances and genuine exceptions
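The Wilson Editing rule above can be sketched as a leave-one-out filter. This is a minimal illustration, assuming Euclidean distance, k = 3 and majority voting; `wilson_edit` and the toy data are hypothetical names and values, not taken from the slides.

```python
from collections import Counter
from math import dist

def wilson_edit(cases, k=3):
    """Keep only the cases whose label agrees with the majority of their k nearest neighbors."""
    kept = []
    for i, (point, label) in enumerate(cases):
        # Leave the case itself out when looking for its neighbors.
        others = [c for j, c in enumerate(cases) if j != i]
        nearest = sorted(others, key=lambda c: dist(c[0], point))[:k]
        majority = Counter(l for _, l in nearest).most_common(1)[0][0]
        if majority == label:
            kept.append((point, label))
    return kept

# Hypothetical toy data: two clean clusters plus one mislabeled case.
cases = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
    ((5.0, 0.0), "b"), ((5.0, 1.0), "b"), ((6.0, 0.0), "b"), ((6.0, 1.0), "b"),
    ((0.5, 0.5), "b"),  # surrounded by "a" cases
]
edited = wilson_edit(cases)
```

The filter cannot tell whether the removed case was noise or a genuine exception; it only sees that the neighbors disagree with it.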

Page 10: Identifying Competence-Critical Instances for Instance-Based Learners

Competence Preservation

Condensed nearest neighbor (CNN)
  Look for cases whose removal does not lead to additional misclassification.
Chang’s algorithm (Korean)
  Merging two instances into one synthetic instance (the prototype)
Footprint deletion policy
  Local-set of a case c
    The set of cases contained in the largest hypersphere centered on c such that only cases in the same class as c are contained in the hypersphere.
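For the condensed nearest neighbor entry above, one common way to obtain such a subset is Hart's incremental formulation, which builds the retained set up instead of deleting from the full set. The sketch below assumes that formulation with a 1-NN rule and Euclidean distance; `condense`, `nearest_label` and the data are hypothetical.

```python
from math import dist

def nearest_label(store, point):
    """Label of the stored case closest to point (1-NN)."""
    return min(store, key=lambda c: dist(c[0], point))[1]

def condense(cases):
    """Grow a subset until it classifies every original case correctly."""
    store = [cases[0]]
    changed = True
    while changed:
        changed = False
        for case in cases:
            # Add a case only when the current subset gets it wrong.
            if case not in store and nearest_label(store, case[0]) != case[1]:
                store.append(case)
                changed = True
    return store

# Hypothetical one-dimensional data: class "a" on the left, "b" on the right.
cases = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"),
         ((7.0,), "b"), ((8.0,), "b"), ((9.0,), "b"), ((10.0,), "b")]
subset = condense(cases)
consistent = all(nearest_label(subset, p) == l for p, l in cases)
```

The retained subset depends on the order in which cases are scanned, so different orderings can yield different subsets.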

Page 11: Identifying Competence-Critical Instances for Instance-Based Learners

Footprint Deletion Policy

For a case-base CB = {c1, c2, …, cn}
  Coverage(c) = {c' ∈ CB : c' ∈ Local-set(c)}
  Reachable(c) = {c' ∈ CB : c ∈ Local-set(c')}
Pivotal group
  With an empty reachable set
Delete the instance with large local-set
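Both sets are built from local-set membership, so the local-set is the basic quantity to compute. The sketch below is one possible reading of the definition, assuming Euclidean distance and taking the hypersphere radius to be the distance to the nearest case of a different class; `local_set` and the data are hypothetical.

```python
from math import dist

def local_set(cases, i):
    """Indices of the cases strictly closer to case i than its nearest enemy (includes i)."""
    point, label = cases[i]
    # Radius of the largest same-class hypersphere: distance to the nearest other-class case.
    radius = min((dist(point, p) for p, l in cases if l != label),
                 default=float("inf"))
    return {j for j, (p, _) in enumerate(cases) if dist(point, p) < radius}

# Hypothetical one-dimensional data with a class border between 3.0 and 5.0.
cases = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"),
         ((5.0,), "b"), ((6.0,), "b"), ((7.0,), "b"), ((8.0,), "b")]
sizes = {i: len(local_set(cases, i)) for i in range(len(cases))}
```

In this toy data the cases next to the border (indices 3 and 4) have the smallest local-sets, while interior cases have large ones, which is consistent with deleting instances that have large local-sets.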

Page 12: Identifying Competence-Critical Instances for Instance-Based Learners

Hybrid Approaches (1/2)

IB2 (on-line)
  If a new case to be added can already be classified correctly on the basis of the current case-base, the case is discarded.
IB3
  IB2 with time delay
The order of presentation is crucial for IB2 and IB3.
RT1
  k nearest neighbor
  Associates of the case p are the cases that have p as one of their k nearest neighbors.
  The instance which has many associates is tested and removed.
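The IB2 rule and its sensitivity to presentation order can be sketched as follows. This is a minimal illustration, assuming a 1-NN rule with Euclidean distance; `ib2` and the two orderings are hypothetical.

```python
from math import dist

def ib2(stream):
    """Store a case only when the current case-base misclassifies it (1-NN)."""
    store = []
    for point, label in stream:
        if not store:
            store.append((point, label))  # the first case is always stored
            continue
        nearest = min(store, key=lambda c: dist(c[0], point))
        if nearest[1] != label:
            store.append((point, label))
    return store

# The same six hypothetical cases presented in two different orders.
order_a = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((4.0,), "a"),
           ((5.0,), "b"), ((11.0,), "b")]
order_b = [((0.0,), "a"), ((11.0,), "b"), ((5.0,), "b"), ((4.0,), "a"),
           ((1.0,), "a"), ((2.0,), "a")]

stored_a = ib2(order_a)
stored_b = ib2(order_b)
```

With the first ordering two cases are stored; with the second, four. The data are identical, so the difference comes only from the order of presentation.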

Page 13: Identifying Competence-Critical Instances for Instance-Based Learners

Hybrid Approaches (2/2)

RT2 is identical to RT1 and additionally,
  Cases furthest from their nearest enemy are removed first.
  Removed associates still guide the deletion process.
RT3 is identical to RT2 and additionally,
  Wilson’s noise filtering step is executed first.
RT algorithms are analogous to the footprint deletion policy.
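Two ingredients of the RT family, the associate set from RT1 and the nearest-enemy ordering from RT2, can be sketched on toy data. This is only an illustration of those two definitions, not of the full deletion procedure; it assumes Euclidean distance, and all function names and data are hypothetical.

```python
from math import dist

def k_nearest(cases, i, k):
    """Indices of the k cases nearest to case i, excluding i itself."""
    others = [j for j in range(len(cases)) if j != i]
    return sorted(others, key=lambda j: dist(cases[i][0], cases[j][0]))[:k]

def associates(cases, p, k):
    """Cases that have p among their k nearest neighbors."""
    return {i for i in range(len(cases)) if i != p and p in k_nearest(cases, i, k)}

def nearest_enemy_distance(cases, i):
    """Distance from case i to the closest case of a different class."""
    point, label = cases[i]
    return min(dist(point, q) for q, l in cases if l != label)

def removal_order(cases):
    """Candidate order for removal: furthest from the nearest enemy first."""
    return sorted(range(len(cases)),
                  key=lambda i: nearest_enemy_distance(cases, i), reverse=True)

# Hypothetical one-dimensional data.
cases = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"),
         ((6.0,), "b"), ((7.0,), "b")]
assoc_of_1 = associates(cases, 1, k=2)
order = removal_order(cases)
```

Ordering by nearest-enemy distance means interior cases are considered for removal before border cases.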

Page 14: Identifying Competence-Critical Instances for Instance-Based Learners

An Iterative Case Filtering Algorithm

Coverage set and reachable set
RTn algorithm
  Associate set of fixed size
Remove cases which have a reachable set size greater than the coverage set size.
  Intuitively, this approach removes the cases that are far from the border.
A noisy case will have a singleton reachable set and a singleton coverage set.
  This property protects the noisy case from being removed.
  Wilson Editing is applied to remove noisy cases.
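The removal rule can be sketched as an iterative filter. The sketch assumes the reading in which the reachable set of a case c is its own local-set (the cases that can classify c) and the coverage set of c is the set of cases whose local-sets contain c; this direction is an assumption of the sketch, chosen because it makes the rule remove interior cases as described above. The Wilson Editing step is omitted, and `local_set`, `icf` and the data are hypothetical.

```python
from math import dist

def local_set(cases, i):
    """Indices of the cases strictly closer to case i than its nearest enemy (includes i)."""
    point, label = cases[i]
    radius = min((dist(point, p) for p, l in cases if l != label),
                 default=float("inf"))
    return {j for j, (p, _) in enumerate(cases) if dist(point, p) < radius}

def icf(cases):
    """Repeatedly drop cases with |reachable| > |coverage| until nothing changes."""
    cases = list(cases)
    while True:
        n = len(cases)
        local = [local_set(cases, i) for i in range(n)]
        reachable = local  # assumed reading: reachable(c) is the local-set of c
        coverage = [{j for j in range(n) if i in local[j]} for i in range(n)]
        keep = [i for i in range(n) if len(reachable[i]) <= len(coverage[i])]
        if len(keep) == n:  # no case was flagged in this pass
            return cases
        cases = [cases[i] for i in keep]

# Hypothetical one-dimensional data with a class border between 3.0 and 5.0.
cases = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"),
         ((5.0,), "b"), ((6.0,), "b"), ((7.0,), "b"), ((8.0,), "b")]
filtered = icf(cases)
```

On this data the first pass removes the four cases furthest from the border and the second pass removes nothing, so the filter stops with the cases nearest the border retained.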

Page 15: Identifying Competence-Critical Instances for Instance-Based Learners

ICF Algorithm

Page 16: Identifying Competence-Critical Instances for Instance-Based Learners

How the ICF Algorithm Proceeds

Page 17: Identifying Competence-Critical Instances for Instance-Based Learners

Experiments

Experiments on 30 datasets from the UCI repository
Maximum number of iterations: 17
  switzerland database
In general, 3 iterations are required.

Page 18: Identifying Competence-Critical Instances for Instance-Based Learners

Reduction Profiles

The percentage of cases removed after each iteration
  switzerland database: 17 iterations, 2 to 13% (complicated)
  zoo database: 2 iterations, 37% (simple structure)
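A reduction profile can be computed from the case-base size recorded before and after each iteration. The sketch below assumes the percentage is taken relative to the size at the start of each iteration, which the slide does not specify; `reduction_profile` and the sizes are hypothetical and are not the switzerland or zoo figures.

```python
def reduction_profile(sizes):
    """Percentage of cases removed in each iteration, relative to the size before it."""
    return [100.0 * (before - after) / before
            for before, after in zip(sizes, sizes[1:])]

# Hypothetical case-base sizes: initial size, then the size after each iteration.
sizes = [200, 150, 120, 120]
profile = reduction_profile(sizes)
```

A profile that drops to zero quickly suggests a simple class structure, while a long tail of small removals suggests a more complicated one.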

Page 19: Identifying Competence-Critical Instances for Instance-Based Learners

Comparative Evaluation

(1) Early approaches
  CNN, RNN, SNN, Chang, Wilson Editing, repeated Wilson Editing, and all k-NN
(2) Recent additions
  IB2, IB3, TIBLE, and IBL-MDL
(3) State of the art
  RT3 and ICF

Page 20: Identifying Competence-Critical Instances for Instance-Based Learners

RT3 and ICF

Page 21: Identifying Competence-Critical Instances for Instance-Based Learners

Conclusions

The structure of the instance space is important.
ICF and RT3 behave in a very similar way.
  Their intrinsic properties are similar.
  80% of cases are removed with little degradation of accuracy.
The reduction profile provides some insight into the properties of the problem.