a taxonomy of similarity mechanisms for case-based reasoning

Intelligent Database Systems Lab, National Yunlin University of Science and Technology. A Taxonomy of Similarity Mechanisms for Case-Based Reasoning. Pádraig Cunningham, TKDE, Vol. 21, 2009, pp. 1532–1543. Presenter: Wei-Shen Tai, 2009/11/17



Page 1: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

A Taxonomy of Similarity Mechanisms for Case-Based Reasoning

Pádraig Cunningham

TKDE, Vol.21, 2009, pp. 1532–1543.

Presenter : Wei-Shen Tai

2009/11/17

Page 2: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning

N.Y.U.S.T.

I. M.


Outline

Introduction
Representation
Similarity measures
  Direct similarity mechanisms
  Transformation-based measures
  Information-theoretic measures
  Emergent measures
Implications for CBR research
Conclusion
Comments

Page 3: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Motivation

Similarity is central to CBR.
More recently, a number of novel mechanisms have emerged that introduce interesting alternative perspectives on similarity.

Page 4: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Objective

Review novel similarity mechanisms.
Present a taxonomy of similarity mechanisms that places these new techniques in the context of established CBR techniques.

Page 5: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Feature-value representation
  Cases (instances) are described in terms of attribute values.
Enhancement
  Discover word associations in a text corpus and then use these associations to add terms to the representation.
    Bill Gates -> software, CEO, Microsoft
  This allows texts to be represented by more features.
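The enhancement step can be illustrated with a minimal Python sketch. The association table here is hypothetical and hard-coded; in practice it would be mined from a text corpus, and the function name `expand_terms` is an assumption made for this example.

```python
# Hypothetical association table; a real one would be discovered from a corpus.
ASSOCIATIONS = {
    "bill gates": ["software", "ceo", "microsoft"],
}

def expand_terms(text, associations=ASSOCIATIONS):
    """Return the words of `text` plus the terms associated with any known phrase."""
    lowered = text.lower()
    terms = set(lowered.split())
    for phrase, related in associations.items():
        if phrase in lowered:
            terms.update(related)  # add the associated terms as extra features
    return terms

features = expand_terms("Bill Gates gave a talk")
```

The text now carries the extra features `software`, `ceo`, and `microsoft` in addition to its own words, so it can match cases that never mention the name itself.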

Page 6: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Structural representations
Hierarchical structure
  Feature values themselves reference non-atomic objects.
Network structure
  Typically a semantic network.
  The Semantic Web describes the relationships between things (like a tire is a part of a car and John Lennon was a member of the Beatles) and the properties of things (like size, weight, age, and price).
Flow structure
  Shares many of the characteristics of hierarchical and network representations.
  For example, a work or job flow.

Page 7: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


String and sequence representations

Free text is unstructured data.
The most straightforward representation for free text that supports similarity assessment is the bag-of-words strategy from information retrieval.
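A minimal sketch of bag-of-words similarity in Python. The slide names only the bag-of-words strategy; comparing the bags with cosine similarity and tokenizing on whitespace are assumptions made for this example.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count word occurrences, ignoring case and word order."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)

sim = cosine_similarity(bag_of_words("the cat sat"), bag_of_words("sat the cat"))
```

Because word order is discarded, the two texts above get a similarity of (approximately) 1, while texts with no words in common get 0.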

Page 8: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Direct similarity mechanisms

Similarity and distance metrics
  k-NN
Set-theoretic measures
  Jaccard index, Dice similarity
Kullback-Leibler divergence and the χ² statistic
  Compare two images described as histograms.
Symbolic attributes in taxonomies
  The feature values of the case representation are organized into a taxonomy of is-a relationships.

root
  tea
    Green tea
    Black tea
  carbonated
    PepsiCola
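The set-theoretic and histogram measures named on this slide can be sketched in Python as follows. This is illustrative only: the χ² statistic has several variants and the symmetric form below is one common choice, not necessarily the one used in the paper; the histograms are assumed to be normalized and of equal length.

```python
import math

def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def dice(a, b):
    """Dice similarity: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0

def kl_divergence(p, q):
    """KL(p || q); assumes q[i] > 0 wherever p[i] > 0. Not symmetric."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def chi_square(p, q):
    """One symmetric chi-square variant: sum of (p - q)^2 / (p + q)."""
    return sum((pi - qi) ** 2 / (pi + qi) for pi, qi in zip(p, q) if pi + qi > 0)
```

Jaccard and Dice are similarities (higher means more alike), whereas the Kullback-Leibler divergence and χ² behave as distances (0 for identical histograms).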

Page 9: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Transformation-based measures I

Edit distance
  The number of edit operations needed to transform one string into another.
  From cat to rat is 1; from cats to cat is 1.
Alignment measures for biological sequences
  A variety of sequence alignment measures are used in biology (e.g., for DNA).
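The edit distance examples above can be reproduced with a short Python sketch. It assumes the Levenshtein variant, in which insertions, deletions, and substitutions each cost 1; the slide does not say which operations or costs the paper uses.

```python
def edit_distance(s, t):
    """Levenshtein distance between strings s and t (unit-cost operations)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs
                curr[j - 1] + 1,     # insert ct
                prev[j - 1] + cost,  # substitute cs by ct (or keep it)
            ))
        prev = curr
    return prev[-1]

print(edit_distance("cat", "rat"))   # 1
print(edit_distance("cats", "cat"))  # 1
```

Only two rows of the dynamic-programming table are kept, so memory use is proportional to the length of the second string.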

Page 10: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Transformation-based measures II

Earth mover distance
  A transformation-based distance for image data.
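A minimal Python sketch of the idea, restricted to a simplified special case: two one-dimensional histograms with equal total mass and unit spacing between bins, where the earth mover distance reduces to a sum over cumulative differences. Image signatures in general require solving a transportation problem, which is not attempted here; the function name `emd_1d` is an assumption made for this example.

```python
from itertools import accumulate

def emd_1d(p, q):
    """Earth mover distance between two 1-D histograms of equal total mass."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    # The running surplus after each bin is the mass carried on to the next bin,
    # so the total work is the sum of the absolute cumulative differences.
    return sum(abs(cp - cq) for cp, cq in zip(accumulate(p), accumulate(q)))

print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2: one unit of mass moved across two bins
```

Unlike a bin-by-bin comparison, the distance grows with how far the mass has to be moved, not only with how much of it differs.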

Page 11: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Transformation-based measures III

Similarity for networks and graphs
  Structure mapping engine (SME)
    Identify the appropriate mapping between the two domains.

Page 12: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Information-theoretic measures

These measures work directly on the raw case representation.
Compression-based similarity for text
  If two documents are very similar, the compressed size of the two together will not be much greater than the compressed size of one alone.
Information-based similarity for biological sequences
  Specialized algorithms are required to compress them.
Similarity in a taxonomy
  Distinguish the weight of the is-a relationships between features.
  The information content of a node in the taxonomy can be quantified as its negative log likelihood.
  The similarity of two nodes is given by their common parent node with the highest information content.
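Two of the measures on this slide can be sketched in Python. Both parts rest on assumptions: the compression-based part uses the normalized compression distance with `zlib` as the compressor, which is one common formulation and may differ from the one in the paper; the taxonomy part reuses the tea/carbonated tree shown earlier with node probabilities that are purely hypothetical.

```python
import math
import zlib

def ncd(x, y):
    """Normalized compression distance between two byte strings (0 = very similar)."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# is-a links from the example taxonomy (child -> parent).
PARENT = {"tea": "root", "carbonated": "root", "green tea": "tea",
          "black tea": "tea", "PepsiCola": "carbonated"}
# Hypothetical probability that a case falls under each node.
PROB = {"root": 1.0, "tea": 0.6, "carbonated": 0.4,
        "green tea": 0.3, "black tea": 0.3, "PepsiCola": 0.4}

def information_content(node):
    """Negative log likelihood of the node, written as log(1 / p)."""
    return math.log(1.0 / PROB[node])

def ancestors(node):
    """The node itself followed by all of its parents up to the root."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain

def taxonomy_similarity(a, b):
    """Information content of the most informative common parent of a and b."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(information_content(n) for n in common)
```

With these numbers, green tea and black tea share the parent tea and get a positive similarity, while green tea and PepsiCola share only the root, whose information content is 0.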

Page 13: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Emergent measures I

Random forests
  An ensemble of decision trees.
  Each ensemble member is a decision tree built on a sample drawn from the N training cases, considering only a small subset m of the M features (m << M).
  Track the frequency with which cases are located at the same leaf node.
  The more frequently two cases share a leaf node, the more similar they are.
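The leaf-sharing step can be sketched in Python without training a forest. The leaf assignments below are hypothetical; in practice they would come from a trained ensemble (for instance, scikit-learn's `RandomForestClassifier.apply` returns leaf indices per sample and tree, which would need transposing to match the layout assumed here).

```python
def forest_similarity(leaf_ids, i, j):
    """Fraction of trees in which cases i and j end up in the same leaf.

    leaf_ids[t][c] is the leaf that tree t assigns to case c.
    """
    shared = sum(1 for tree in leaf_ids if tree[i] == tree[j])
    return shared / len(leaf_ids)

# Hypothetical leaf assignments: 4 trees (rows) and 3 cases (columns).
leaf_ids = [
    [0, 0, 1],
    [2, 2, 2],
    [1, 1, 0],
    [3, 4, 4],
]

print(forest_similarity(leaf_ids, 0, 1))  # 0.75: same leaf in 3 of 4 trees
print(forest_similarity(leaf_ids, 0, 2))  # 0.25: same leaf in 1 of 4 trees
```

The similarity emerges from how the ensemble partitions the data rather than from an explicit distance function over the features.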

Page 14: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Emergent measures II

Cluster kernels
  A semi-supervised learning technique, where only some of the available data are labeled.
  Assumption: class labels do not change in regions of high density.
  Cluster kernels allow the unlabeled data to influence similarity.
  The kernel combines K_orig(xi, xj), a basic neighborhood kernel, with K_bag(xi, xj), a kernel derived from repeated clustering of all the data.
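A rough Python sketch of a bagged cluster kernel. The slide does not show how the two kernels are combined; this sketch assumes an element-wise product, which should be checked against the paper. The cluster labels and the original kernel matrix are hypothetical inputs; in practice the labels would come from repeated runs of a clustering algorithm such as k-means over labeled and unlabeled data together.

```python
def bagged_kernel(cluster_runs, i, j):
    """Fraction of clustering runs in which cases i and j share a cluster.

    cluster_runs[r][c] is the cluster label of case c in run r.
    """
    same = sum(1 for labels in cluster_runs if labels[i] == labels[j])
    return same / len(cluster_runs)

def cluster_kernel(k_orig, cluster_runs, i, j):
    """Assumed combination: original kernel value times the bagged kernel value."""
    return k_orig[i][j] * bagged_kernel(cluster_runs, i, j)

# Hypothetical data: 3 clustering runs over 4 cases and a 4x4 original kernel.
cluster_runs = [
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
k_orig = [
    [1.0, 0.9, 0.4, 0.3],
    [0.9, 1.0, 0.5, 0.2],
    [0.4, 0.5, 1.0, 0.8],
    [0.3, 0.2, 0.8, 1.0],
]
```

Under this assumption, a pair that never falls in the same cluster (cases 0 and 3 here) gets a combined kernel value of 0 regardless of its original kernel value, which is how the unlabeled data reshapes similarity.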

Page 15: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Emergent measures III

Web-based kernel
  The similarity of two text snippets is measured using the documents returned in a Web search.

Page 16: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Implications for CBR research

Vocabulary knowledge container
  In some circumstances (e.g., information-theoretic measures) the role of the similarity knowledge container is increased.
Speeding up technique
  New methodologies are typically computationally intensive, so strategies for speeding up case retrieval become more important.

Page 17: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Conclusions

Similarity measurement taxonomy
  Organize the broad range of strategies for similarity assessment in CBR into a coherent taxonomy.
Improve effectiveness of CBR
  An alternative metric can offer better accuracy because it embodies specific knowledge about the data.

Page 18: A Taxonomy of Similarity Mechanisms  for Case-Based Reasoning


Comments
Advantage
  This paper introduces and discusses alternative metrics of similarity assessment for CBR.
Drawback
Application
  Similarity measurement.