Mining Hierarchical Decision Rules from Hybrid Data with Categorical and Continuous Valued Attributes Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua


Post on 09-Feb-2016


Page 1: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Mining Hierarchical Decision Rules from Hybrid Data with Categorical and Continuous Valued Attributes

Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Page 2: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Outline

Introduction

Similarity-based Rough Set Model

Attribute reduction

Mining Hierarchical decision rules

Conclusion

Page 3: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Introduction

Rough set theory, proposed by Pawlak, is a useful mathematical framework to deal with imprecise, uncertain information.

Classical attribute reduction methods mainly deal with categorical data.

In practice, however, real application systems also contain continuous-valued (numerical) attributes.

Page 4: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Existing Methods

Discretization methods: These methods impose hard boundaries and may cause information loss in some cases, because the degree of membership of a numerical value in its discretized value is not considered.

Extended rough set models:

Fuzzy rough set model

Tolerance rough set model

Neighborhood rough set model

Similarity rough set model

……

Page 5: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Similarity rough set model

Similarity relation

Similarity class

Attribute reduction

Decision rule

Page 6: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Similarity

The similarity class of x, denoted by R(x), is the set of objects which are similar to x:

$$R(x) = \{\, y \in U : yRx \,\}, \qquad R^{-1}(x) = \{\, y \in U : xRy \,\}$$

Notice that the statement yRx, which means “y is similar to x”, is directional: it has a subject y and a referent x.

Page 7: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Symmetry and Transitivity?

Symmetry? The most controversial property is symmetry. Although yRx is directional, most authors dealing with similarity relations do impose this property.

Transitivity? Imposing transitivity on R is even more questionable. The reason is that a series of negligible differences can add up to a significant one, so similarity cannot always be propagated along a chain of objects.

Page 8: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Similarity Measure

For numerical attributes:

$$SIM_a(x, y) = 1 - \frac{|a(x) - a(y)|}{|a_{\max} - a_{\min}|}$$

For categorical attributes:

$$SIM_a(x, y) = \begin{cases} 1, & \text{if } a(x) = a(y) \\ 0, & \text{if } a(x) \neq a(y) \end{cases}$$
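The two per-attribute measures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the handling of a constant attribute (zero range) is an assumption, and the sample values 64 and 74 on the range [54, 111] are the Asset values of companies 4 and 9 in the example table.

```python
from typing import Union

Number = Union[int, float]


def sim_numerical(ax: Number, ay: Number, a_min: Number, a_max: Number) -> float:
    """SIM_a(x, y) = 1 - |a(x) - a(y)| / |a_max - a_min| for a numerical attribute."""
    if a_max == a_min:
        # Assumption: a constant attribute cannot separate objects, so treat them as identical.
        return 1.0
    return 1.0 - abs(ax - ay) / abs(a_max - a_min)


def sim_categorical(ax: str, ay: str) -> float:
    """SIM_a(x, y) = 1 if the two categorical values coincide, 0 otherwise."""
    return 1.0 if ax == ay else 0.0


# Asset values 64 and 74 on an attribute whose observed range is [54, 111].
print(round(sim_numerical(64, 74, 54, 111), 4))     # 0.8246
print(sim_categorical("automobile", "automobile"))  # 1.0
```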

Page 9: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

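Similarity

Local similarity:

$$(x, y) \in R_P \iff \forall a \in P,\ SIM_a(x, y) \ge \alpha_a$$

Global similarity:

$$(x, y) \in R_P \iff \frac{\sum_{a \in P} SIM_a(x, y)}{|P|} \ge \alpha$$

Here $\alpha_a$ is the threshold for attribute $a$ and $\alpha$ is a single threshold for the whole attribute set $P$. The slide gives a second global form that aggregates $SIM_a(x, y)$ over all $a \in P$ without dividing by $|P|$ and compares the result with the threshold; its aggregation operator is not recoverable from this transcript.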
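The difference between the two relations can be sketched as follows, reading local similarity as "every attribute reaches its own threshold" and global similarity as "the average of the per-attribute similarities reaches a single threshold". The function names, the threshold values, and the sample similarities are assumptions chosen for illustration only.

```python
from typing import Dict


def locally_similar(sims: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Local similarity: every attribute must reach its own threshold."""
    return all(sims[a] >= thresholds[a] for a in sims)


def globally_similar(sims: Dict[str, float], threshold: float) -> bool:
    """Global similarity: the mean of the per-attribute similarities must reach one threshold."""
    return sum(sims.values()) / len(sims) >= threshold


# Hypothetical per-attribute similarities SIM_a(x, y) for one pair of objects.
sims = {"asset": 0.60, "profit": 0.95, "type": 1.0}

print(locally_similar(sims, {"asset": 0.75, "profit": 0.75, "type": 1.0}))  # False
print(globally_similar(sims, 0.75))                                         # True
```

The pair fails the local test because one attribute falls below its threshold, yet passes the global test because the other attributes compensate in the average.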

Page 10: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

If the global similarity threshold equals 1, the similarity-based rough set model degenerates into the classical rough set model, because two objects are then similar only if they take identical values on every attribute.

Researchers pointed out empirically that in some contexts, similarity does not necessarily have features like symmetry or subadditivity implied by distance measures.

Page 11: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

New Similarity Distance Measure

$$(x, y) \in R_P \iff \min\{\, SIM_a(x, y) \mid a \in P \,\} \ge \alpha$$

where $\alpha$ is the similarity threshold.
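As a worked illustration, take companies 4 and 9 from the example table (Asset 64 and 74, Profit 80 and 77, both automobile), with attribute ranges read off that table (Asset from 54 to 111, Profit from 65 to 105) and the threshold 0.75 used in Fig 4. The subscript names are shorthand introduced here.

$$
\begin{aligned}
SIM_{asset}(4, 9) &= 1 - \frac{|64 - 74|}{111 - 54} = 1 - \frac{10}{57} \approx 0.825 \\
SIM_{profit}(4, 9) &= 1 - \frac{|80 - 77|}{105 - 65} = 1 - \frac{3}{40} = 0.925 \\
SIM_{type}(4, 9) &= 1 \\
\min\{0.825,\ 0.925,\ 1\} &\approx 0.825 \ge 0.75
\end{aligned}
$$

Under the minimum-based measure alone, objects 4 and 9 are therefore similar even though their credit decisions differ.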

Page 12: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Similarity distance measure?

The inherent weakness of the distance-based similarity measure is that it does not consider the contribution of the similarity direction when comparing the similarity of two objects.

Page 13: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Similarity direction measure

$$a_i(y, x) = \frac{a_i(y) - a_i(x)}{\max(a_i) - \min(a_i)}$$

Fig 1. Same direction (attributes $a_1, a_2, \ldots, a_{n-1}, a_n$)

Fig 2. Different direction (attributes $a_1, a_2, \ldots, a_{n-1}, a_n$)

Page 14: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Similarity direction measure

Definition 9. Given two objects x and y, the similarity direction measure of both objects is defined as

$$D(y, x) = \frac{1}{m} \sum_{i=1}^{m} a_i(y, x)$$

If $D(y, x) \ge 0$, the object y is similar to x; otherwise y is dissimilar to x.
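A short sketch of this averaged direction measure, assuming each per-attribute term is the signed difference divided by the attribute range and that only the numerical attributes take part. The function names are hypothetical; the ranges and the two objects come from the example table.

```python
from typing import Dict, Tuple

# (min, max) of each numerical attribute, read off the 11-company example table.
RANGES: Dict[str, Tuple[float, float]] = {"asset": (54, 111), "profit": (65, 105)}


def direction(a: str, y: Dict[str, float], x: Dict[str, float]) -> float:
    """a_i(y, x) = (a_i(y) - a_i(x)) / (max(a_i) - min(a_i)); the sign gives the direction."""
    lo, hi = RANGES[a]
    return (y[a] - x[a]) / (hi - lo)


def mean_direction(y: Dict[str, float], x: Dict[str, float]) -> float:
    """D(y, x) as the average of the per-attribute direction measures."""
    return sum(direction(a, y, x) for a in RANGES) / len(RANGES)


company4 = {"asset": 64, "profit": 80}
company9 = {"asset": 74, "profit": 77}

print(round(mean_direction(company9, company4), 4))  # 0.0502
print(round(mean_direction(company4, company9), 4))  # -0.0502
```

Swapping the subject and the referent flips the sign, so under this definition company 9 is similar to company 4 but not the other way round.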

Page 15: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

However, if we employ such a similarity direction measure, the similarity relation is not symmetric in most cases, even if the similarity direction differences between two objects are very small.

Furthermore, such a similarity direction measure may not possess subadditivity.

Page 16: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

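Definition 10. Given two objects x and y, the similarity direction measure of both objects, $D(y, x)$, is defined from the two extreme per-attribute direction values

$$\max\{\, a_i(y, x) \mid a_i \in P \,\} \quad \text{and} \quad \min\{\, a_i(y, x) \mid a_i \in P \,\}.$$

The operator that combines the two terms into $D(y, x)$ is not recoverable from this transcript.

If $D(y, x) \ge \beta$, the object y is similar to x; otherwise y is dissimilar to x.

In general, the same similarity direction is good. Here the constraint parameter $\beta$ is introduced to extend similarity.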
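The sketch below shows one way such a max/min direction measure could behave. It ASSUMES the two terms are multiplied, which the transcript does not confirm; the assumption is chosen because the product is positive when all attributes move in the same direction, negative when they move in opposite directions, and unchanged when x and y are swapped. Function names are hypothetical, and only the numerical attributes of the example table are used.

```python
from typing import Dict, Tuple

# (min, max) of each numerical attribute, read off the 11-company example table.
RANGES: Dict[str, Tuple[float, float]] = {"asset": (54, 111), "profit": (65, 105)}


def direction(a: str, y: Dict[str, float], x: Dict[str, float]) -> float:
    """Signed, range-normalised difference a_i(y, x)."""
    lo, hi = RANGES[a]
    return (y[a] - x[a]) / (hi - lo)


def direction_measure(y: Dict[str, float], x: Dict[str, float]) -> float:
    """ASSUMPTION: D(y, x) = max_i a_i(y, x) * min_i a_i(y, x)."""
    values = [direction(a, y, x) for a in RANGES]
    return max(values) * min(values)


def similar_in_direction(y: Dict[str, float], x: Dict[str, float], beta: float) -> bool:
    """y is similar to x in direction when D(y, x) reaches the constraint parameter."""
    return direction_measure(y, x) >= beta


company4 = {"asset": 64, "profit": 80}
company9 = {"asset": 74, "profit": 77}

print(round(direction_measure(company9, company4), 4))       # -0.0132
print(similar_in_direction(company9, company4, beta=-0.01))  # False
```

Under this assumption a slightly negative parameter such as -0.01 tolerates small disagreements in direction while still separating companies 4 and 9, whose Asset and Profit values move in opposite directions.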

Page 17: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Similarity relation

Constructing a rational, reliable and practical similarity measure is a fundamental and substantial research topic in the field of decision making; a measure that lacks these properties can be challenged on accuracy and validity.

Page 18: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Attribute reduction

Definition 11. Let DT be a decision table, $x \in U$ and $P \subseteq A$. We will say that x is a consistent object under similarity measure parameters $\alpha$ and $\beta$ if

$$(x, y) \in R_{P,\alpha} \wedge D(x, y) \ge \beta \implies d(x) = d(y)$$

for all y; otherwise x is an inconsistent object.

The set of all consistent objects and the set of all inconsistent objects are denoted by $DT(P, \alpha, \beta)$ and $IDT(P, \alpha, \beta)$, respectively.
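A minimal sketch of the consistency test on the 11-company example table. It makes several labeled assumptions: similarity is the minimum of the per-attribute similarities, the direction measure is the product of the largest and smallest signed differences over the numerical attributes (the exact combination is not legible in the slides), and the names `alpha` and `beta` stand for the two similarity measure parameters.

```python
from typing import List, Optional

# Example decision table: id -> (asset, profit, type of product, credit).
TABLE = {
    1: (105, 67, "computer software", "bad"),
    2: (54, 75, "automobile", "good"),
    3: (80, 93, "automobile", "bad"),
    4: (64, 80, "automobile", "good"),
    5: (92, 92, "computer hardware", "good"),
    6: (96, 102, "computer hardware", "good"),
    7: (111, 65, "computer software", "bad"),
    8: (58, 70, "automobile", "good"),
    9: (74, 77, "automobile", "bad"),
    10: (105, 105, "computer hardware", "good"),
    11: (85, 82, "automobile", "bad"),
}
NUMERIC = (0, 1)  # column indices of asset and profit
RANGE = {c: max(r[c] for r in TABLE.values()) - min(r[c] for r in TABLE.values())
         for c in NUMERIC}


def min_similarity(x, y) -> float:
    """Smallest per-attribute similarity over asset, profit and type of product."""
    sims = [1 - abs(x[c] - y[c]) / RANGE[c] for c in NUMERIC]
    sims.append(1.0 if x[2] == y[2] else 0.0)
    return min(sims)


def direction_measure(x, y) -> float:
    # ASSUMPTION: product of the largest and smallest signed, range-normalised difference.
    diffs = [(x[c] - y[c]) / RANGE[c] for c in NUMERIC]
    return max(diffs) * min(diffs)


def inconsistent_objects(alpha: float, beta: Optional[float]) -> List[int]:
    """Objects that are similar to at least one object with a different decision."""
    result = []
    for i, x in TABLE.items():
        for j, y in TABLE.items():
            similar = min_similarity(x, y) >= alpha
            if beta is not None:  # beta=None switches the direction test off
                similar = similar and direction_measure(x, y) >= beta
            if i != j and similar and x[3] != y[3]:
                result.append(i)
                break
    return result


print(inconsistent_objects(alpha=0.75, beta=None))   # [4, 9]
print(inconsistent_objects(alpha=0.75, beta=-0.01))  # []
```

With the distance threshold alone, objects 4 and 9 are similar but carry different credit decisions; adding the direction test under the stated assumption separates them and leaves every object consistent.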

Page 19: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

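Attribute reduction

Definition 12. Let DT be a decision table, $x, y \in U$ and $P \subseteq A$. We will say that x and y are dissimilar under similarity measure parameters $\alpha$ and $\beta$ if

$$(x, y) \notin R_{P,\alpha} \vee D(x, y) < \beta.$$

Definition 13. Let DT be a decision table and $P \subseteq A$. The discernibility matrix $M_{P,\alpha,\beta} = \{ m_{P,\alpha,\beta}(x, y) \}$ is defined as

$$
m_{P,\alpha,\beta}(x, y) =
\begin{cases}
\{ a \mid (x, y) \notin R_{a,\alpha} \} \cup \{ P \mid D(x, y) < \beta \}, & x, y \in DT(P, \alpha, \beta) \wedge d(x) \neq d(y) \\
\{ a \mid (x, y) \notin R_{a,\alpha} \} \cup \{ P \mid D(x, y) < \beta \}, & x \in DT(P, \alpha, \beta) \wedge y \in IDT(P, \alpha, \beta)
\end{cases}
$$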

Page 20: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Mining Hierarchical decision rules

Page 21: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Example

Company  Asset  Profit  Type of product    Credit
1        105    67      computer software  bad
2        54     75      automobile         good
3        80     93      automobile         bad
4        64     80      automobile         good
5        92     92      computer hardware  good
6        96     102     computer hardware  good
7        111    65      computer software  bad
8        58     70      automobile         good
9        74     77      automobile         bad
10       105    105     computer hardware  good
11       85     82      automobile         bad

Page 22: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Decision rules

Fig 3. A similarity relation graph over objects 1 to 11 with $\alpha = 0.75$ and $\beta = -0.01$

Page 23: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Fig 4. A similarity relation graph over objects 1 to 11 with $\alpha = 0.75$

Without considering the similarity direction parameter, we cannot discern object 4 from object 9 under $\alpha = 0.75$. In that case, some inconsistent decision rules will be generated.

Choosing a level in concept hierarchy, we can mine hierarchical decision rules.
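The effect of choosing a level can be sketched with a HYPOTHETICAL one-level concept hierarchy over the "type of product" attribute. The slides do not give the hierarchy, so the parent concepts `computer` and `vehicle` and the function name are assumptions; the rows are the type and credit columns of the example table.

```python
from collections import defaultdict
from typing import Dict, List

# HYPOTHETICAL one-level concept hierarchy for "type of product"; not given in the slides.
PARENT = {
    "computer software": "computer",
    "computer hardware": "computer",
    "automobile": "vehicle",
}

# (type of product, credit) for companies 1 to 11 in the example table.
ROWS = [
    ("computer software", "bad"), ("automobile", "good"), ("automobile", "bad"),
    ("automobile", "good"), ("computer hardware", "good"), ("computer hardware", "good"),
    ("computer software", "bad"), ("automobile", "good"), ("automobile", "bad"),
    ("computer hardware", "good"), ("automobile", "bad"),
]


def decisions_by_concept(generalize: bool) -> Dict[str, List[str]]:
    """Collect the credit decisions observed under each concept at the chosen level."""
    groups = defaultdict(set)
    for kind, credit in ROWS:
        groups[PARENT[kind] if generalize else kind].add(credit)
    return {concept: sorted(credits) for concept, credits in groups.items()}


print(decisions_by_concept(generalize=False))
print(decisions_by_concept(generalize=True))
```

At the leaf level the product type alone decides the credit for both computer categories, whereas at the generalized level the concept `computer` covers both decisions, so rules at that level must also use the numerical attributes.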

Page 24: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua

Conclusion

This paper mainly discusses the similarity distance measure and the similarity direction measure, and proposes an algorithm for mining hierarchical decision rules.

Future work: both theoretical and experimental comparison of approaches to mining hierarchical decision rules.

Page 25: Miao Duoqian, Qian Jin, Li Wen, Zhang Zehua