mining favorable facets

Mining Favorable Facets. Raymond Chi-Wing Wong (the Chinese University of Hong Kong), Jian Pei (Simon Fraser University), Ada Wai-Chee Fu (the Chinese University of Hong Kong), Ke Wang (Simon Fraser University). KDD ’07, August 12-15, 2007, San Jose, California, USA.



TRANSCRIPT

Page 1: Mining Favorable Facets

Mining Favorable Facets

Raymond Chi-Wing Wong (the Chinese University of Hong Kong)

Jian Pei (Simon Fraser University)

Ada Wai-Chee Fu (the Chinese University of Hong Kong)

Ke Wang (Simon Fraser University)

KDD ’07, August 12-15, 2007, San Jose, California, USA

Page 2: Mining Favorable Facets

Outline

1. Introduction
2. Skyline
3. Algorithm
4. Empirical Study
5. Conclusion

Page 3: Mining Favorable Facets

1. Introduction

Suppose we want to look for a vacation package. There are 3 packages.

Package ID   Price   Hotel-class
a            1000    4
b            2400    1
c            3000    5

Suppose we compare packages a and b.

We want a cheaper price.

We want a higher hotel-class.

We know that package a is “better” than package b because
1. the price of package a is lower, and
2. the hotel-class of package a is higher.

Package a “dominates” package b.

Page 4: Mining Favorable Facets

1. Introduction

Package ID   Price   Hotel-class
a            1000    4
b            2400    1
c            3000    5

Thus, we do not need to consider package b.

We know that
1. package a has the cheapest price, and
2. package c has the highest hotel-class.

Packages a and c are not dominated by any other package.

Thus, packages a and c are all of the “best” possible choices.

We call packages a and c skyline points.
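The dominance test and the skyline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the names `packages`, `dominates`, and `skyline` are hypothetical, and the data are the three packages from the slide.

```python
# Packages from the slide as (price, hotel_class).
# Assumption: lower price is better, higher hotel-class is better.
packages = {"a": (1000, 4), "b": (2400, 1), "c": (3000, 5)}


def dominates(p, q):
    """True if p is no worse than q on both attributes and strictly better on one."""
    no_worse = p[0] <= q[0] and p[1] >= q[1]
    strictly_better = p[0] < q[0] or p[1] > q[1]
    return no_worse and strictly_better


def skyline(points):
    """Return the sorted IDs of the points that no other point dominates."""
    return sorted(
        pid
        for pid, p in points.items()
        if not any(dominates(q, p) for qid, q in points.items() if qid != pid)
    )


print(skyline(packages))  # ['a', 'c']
```

Package b drops out because a beats it on both price and hotel-class, while a and c each win on one attribute and so neither dominates the other.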

Page 5: Mining Favorable Facets

Suppose we want to look for a vacation package. There are 6 packages.

Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)

Different customers may have different preferences on Hotel-group. Here “X < Y” means that hotel-group X is preferred to hotel-group Y.

Suppose a customer has the preference H < T < M. The skyline points are packages a and c.

Suppose another customer has the preference H < M < T. The skyline points are packages a, c and e.

In other words, different preferences give different skyline points.
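As a hedged sketch, the effect of a hotel-group preference on the skyline can be reproduced as follows. It assumes that "X < Y" means group X is preferred to group Y (which is consistent with the skylines stated on the slide), that groups with no stated preference between them are incomparable, and that a preference is given as (better, worse) pairs. All function names are hypothetical.

```python
# Packages from the slide as (price, hotel_class, hotel_group).
packages = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def closure(pairs):
    """Transitive closure of a set of (better, worse) hotel-group pairs."""
    result = set(pairs)
    changed = True
    while changed:
        changed = False
        for x, y in list(result):
            for y2, z in list(result):
                if y == y2 and (x, z) not in result:
                    result.add((x, z))
                    changed = True
    return result


def dominates(p, q, pref):
    """p dominates q: no worse on every attribute, strictly better on at least one."""
    group_better = (p[2], q[2]) in pref
    no_worse = p[0] <= q[0] and p[1] >= q[1] and (p[2] == q[2] or group_better)
    return no_worse and (p[0] < q[0] or p[1] > q[1] or group_better)


def skyline(points, pairs=()):
    """Sorted IDs of the points not dominated under the given preference pairs."""
    pref = closure(pairs)
    return sorted(
        pid
        for pid, p in points.items()
        if not any(dominates(q, p, pref) for qid, q in points.items() if qid != pid)
    )


print(skyline(packages, [("H", "T"), ("T", "M")]))  # H < T < M -> ['a', 'c']
print(skyline(packages, [("H", "M"), ("M", "T")]))  # H < M < T -> ['a', 'c', 'e']
```

Under H < T < M, package a dominates both Mozilla packages. Under H < M < T it no longer does, so e survives, while f is still dominated by c.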

Page 6: Mining Favorable Facets

1. Introduction

Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)

Customer   Preference on Hotel-group   Skyline
Alice      T < M                       {a, c}
Bob        No special preference       {a, c, e, f}
Chris      H < M                       {a, c, e}
David      H < M < T                   {a, c, e}
Emily      H < T < M                   {a, c}
Fred       M < T                       {a, c, e, f}

Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers.

What preferences make package f a skyline point?

Bob and Fred are the potential customers.

Page 7: Mining Favorable Facets

1. Introduction

Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)

Problem: Given a package, we want to find the preferences or conditions under which this package is a skyline point.

Favorable facets

Page 8: Mining Favorable Facets

1. Introduction

Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)

Problem: Given a package, we want to find the preferences, or favorable facets, under which this package is a skyline point.

We can solve the problem by a naive method: Lattice Search

{}                        SKY={a,c,e,f}

{T < M}                   SKY={a,c}
{H < M}                   SKY={a,c,e}
{T < H}                   SKY={a,c,e,f}
{H < T}                   SKY={a,c,e,f}
{M < T}                   SKY={a,c,e,f}
{M < H}                   SKY={a,c,e,f}

{T < M, H < M}            SKY={a,c}
{T < M, T < H}            SKY={a,c}
{H < T, H < M}            SKY={a,c,e}
{T < H, M < H}            SKY={a,c,e,f}
…

{T < M, T < H, H < M}     SKY={a,c}
{T < M, T < H, M < H}     SKY={a,c}

⊤                         SKY={}

Page 9: Mining Favorable Facets

1. Introduction

Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)

Problem: Given a package, we want to find the preferences, or favorable facets, under which this package is a skyline point.

We can solve the problem by a naive method: Lattice Search

{}                        SKY={a,c,e,f}

{T < M}                   SKY={a,c}
{H < M}                   SKY={a,c,e}
{T < H}                   SKY={a,c,e,f}
{H < T}                   SKY={a,c,e,f}
{M < T}                   SKY={a,c,e,f}
{M < H}                   SKY={a,c,e,f}

{T < M, H < M}            SKY={a,c}
{T < M, T < H}            SKY={a,c}
{H < T, H < M}            SKY={a,c,e}
{T < H, M < H}            SKY={a,c,e,f}
…

{T < M, T < H, H < M}     SKY={a,c}
{T < M, T < H, M < H}     SKY={a,c}

⊤                         SKY={}

Consider package f.

Preferences under which package f is a skyline point: {}, {T < H}, {H < T}, {M < T}, {M < H}, …, {T < H, M < H}

For each of these preferences, SKY={a,c,e,f}.

Page 10: Mining Favorable Facets

We need to compute the skyline for every possible preference.

There are many preferences that qualify package f as a skyline point.

This approach has two disadvantages:
1. Computation is costly.
2. The results are difficult to interpret.
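To make the cost of the naive lattice search concrete, here is a rough sketch that enumerates every consistent preference over the three hotel-groups and recomputes the skyline for each one. It assumes a preference is a strict partial order stored as a set of (better, worse) pairs. The helper names are hypothetical and the enumeration is brute force, not the authors' implementation.

```python
from itertools import combinations, permutations

# Packages from the slides as (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
GROUPS = ("T", "H", "M")


def is_partial_order(pairs):
    """True if the (better, worse) pairs are antisymmetric and transitive."""
    for x, y in pairs:
        if (y, x) in pairs:
            return False
        for y2, z in pairs:
            if y == y2 and (x, z) not in pairs:
                return False
    return True


def all_preferences():
    """Yield every strict partial order over GROUPS as a frozenset of pairs."""
    candidates = list(permutations(GROUPS, 2))  # the 6 possible ordered pairs
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            pref = frozenset(subset)
            if is_partial_order(pref):
                yield pref


def dominates(p, q, pref):
    """p dominates q: no worse on every attribute, strictly better on at least one."""
    group_better = (p[2], q[2]) in pref
    no_worse = p[0] <= q[0] and p[1] >= q[1] and (p[2] == q[2] or group_better)
    return no_worse and (p[0] < q[0] or p[1] > q[1] or group_better)


def skyline(pref):
    """Sorted IDs of the packages not dominated under preference pref."""
    return sorted(
        pid
        for pid, p in PACKAGES.items()
        if not any(dominates(q, p, pref) for qid, q in PACKAGES.items() if qid != pid)
    )


# One full skyline computation per lattice node: this is the costly part.
lattice = {pref: skyline(pref) for pref in all_preferences()}
favorable_for_f = [pref for pref, sky in lattice.items() if "f" in sky]
print(len(lattice), len(favorable_for_f))  # 19 10
```

Even with only three hotel-groups the search visits 19 preferences, and the answer for package f is a list of 10 separate preferences, which illustrates both disadvantages named above.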

Page 11: Mining Favorable Facets

1. Introduction

Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)

Problem: Given a package, we want to find the preferences, or favorable facets, under which this package is a skyline point.

We can solve the problem by a naive method: Lattice Search

{}                        SKY={a,c,e,f}

{T < M}                   SKY={a,c}
{H < M}                   SKY={a,c,e}
{T < H}                   SKY={a,c,e,f}
{H < T}                   SKY={a,c,e,f}
{M < T}                   SKY={a,c,e,f}
{M < H}                   SKY={a,c,e,f}

{T < M, H < M}            SKY={a,c}
{T < M, T < H}            SKY={a,c}
{H < T, H < M}            SKY={a,c,e}
{T < H, M < H}            SKY={a,c,e,f}
…

{T < M, T < H, H < M}     SKY={a,c}
{T < M, T < H, M < H}     SKY={a,c}

⊤                         SKY={}

Consider package f.

Border for f: it separates the preferences with SKY={a,c,e,f} ({}, {T < H}, {H < T}, {M < T}, {M < H}, {T < H, M < H}, …) from the preferences under which package f is not a skyline point.

We find that whenever the preference contains “T < M” or “H < M”, package f is not a skyline point.

We say that each of “T < M” and “H < M” is a minimal disqualifying condition (MDC) of package f.
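Once the MDCs of a package are known, checking a customer's preference no longer needs a skyline computation. The sketch below illustrates that idea with the two MDCs stated for package f. It assumes each MDC is a set of (better, worse) pairs and that the preference passed in is already transitively closed. The names `MDC_F` and `is_skyline_under` are hypothetical.

```python
# MDCs of package f as stated on the slide: "T < M" and "H < M".
MDC_F = [frozenset({("T", "M")}), frozenset({("H", "M")})]


def is_skyline_under(preference, mdcs):
    """A point is a skyline point iff the preference contains none of its MDCs."""
    pref = frozenset(preference)
    return not any(condition <= pref for condition in mdcs)


# Customers from the earlier slide.
print(is_skyline_under(set(), MDC_F))         # Bob, no preference -> True
print(is_skyline_under({("M", "T")}, MDC_F))  # Fred, M < T        -> True
print(is_skyline_under({("T", "M")}, MDC_F))  # Alice, T < M       -> False
print(is_skyline_under({("H", "M")}, MDC_F))  # Chris, H < M       -> False
```

Two short conditions summarize every disqualifying preference in the lattice, which makes the result easier to interpret than a long list of favorable preferences.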

Page 12: Mining Favorable Facets

3. Algorithm

How do we find the MDCs of a point?

Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.

Page 13: Mining Favorable Facets

3. Algorithm

Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)

Point q is said to quasi-dominate point p if all attributes of point q are NOT worse than those of point p.

e.g., Package a quasi-dominates package f because
1. package a has a lower (or better) price than package f, and
2. package a has a higher (or better) hotel-class than package f.

If package a quasi-dominates package f, we define Raf as follows: Raf = {T < M}
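A small sketch of quasi-dominance and of the condition Rqp follows. It reads "all attributes" as the two numeric attributes (price and hotel-class), as in the slide's example; that reading is an assumption. `quasi_dominates` and `condition` are hypothetical names, and an empty condition stands for the case where q and p share a hotel-group.

```python
# Packages from the slide as (price, hotel_class, hotel_group).
packages = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def quasi_dominates(q, p):
    """q is no worse than p on price (lower is better) and hotel-class (higher is better)."""
    return q[0] <= p[0] and q[1] >= p[1]


def condition(q, p):
    """Rqp: the (better, worse) group pairs that would let q dominate p."""
    if q[2] == p[2]:
        return frozenset()  # same hotel-group: no preference is required
    return frozenset({(q[2], p[2])})


a, e, f = packages["a"], packages["e"], packages["f"]
print(quasi_dominates(a, f), condition(a, f))  # True frozenset({('T', 'M')})
print(quasi_dominates(e, f))                   # False
```

Package e does not quasi-dominate f because its hotel-class is lower, so no preference on hotel-group can make e dominate f.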

Page 14: Mining Favorable Facets

3. Algorithm

Two Algorithms
  MDC-O: Computing MDC On-the-fly
    Does not store the MDCs of points
    Computes the MDCs of a given point on-the-fly
  MDC-M: A Materialization Method
    Stores the MDCs of all points

Indexing Method for Speed-up
  R*-tree

Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.

Page 15: Mining Favorable Facets

3.1 MDC-O: Computing MDC On-the-fly

On-the-fly Algorithm

Given: data point p

Variable: MDC(p), the minimal disqualifying conditions of p

Algorithm MDC(p)
  For each data point q which quasi-dominates p
    if MDC(p) does not contain Rqp
      then insert Rqp into MDC(p)
  Return MDC(p)

Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
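The on-the-fly loop can be sketched as below, using the six vacation packages as data. This is a simplified reading of the slide, not the authors' implementation: it treats quasi-dominance as "no worse on price and hotel-class", follows the slide's plain membership test (it does not prune a condition that is a superset of another), and does not use an index. The names `mdc_on_the_fly`, `quasi_dominates`, and `condition` are hypothetical.

```python
# Packages from the slides as (price, hotel_class, hotel_group).
packages = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def quasi_dominates(q, p):
    """q is no worse than p on price (lower is better) and hotel-class (higher is better)."""
    return q[0] <= p[0] and q[1] >= p[1]


def condition(q, p):
    """Rqp as a frozenset of (better, worse) hotel-group pairs."""
    return frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})


def mdc_on_the_fly(pid, points):
    """Scan all points and collect the distinct Rqp of those that quasi-dominate p."""
    p = points[pid]
    result = []
    for qid, q in points.items():
        if qid == pid or not quasi_dominates(q, p):
            continue
        r = condition(q, p)
        if r not in result:  # the slide's "does not contain Rqp" test
            result.append(r)
    return result


print(mdc_on_the_fly("f", packages))  # [frozenset({('T', 'M')}), frozenset({('H', 'M')})]
print(mdc_on_the_fly("a", packages))  # []
```

Nothing is stored between calls, so each query scans the whole dataset once. Package a has no MDC, which means it is a skyline point under every preference.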

Page 16: Mining Favorable Facets

3.2 MDC-M: A Materialization Method

Materialization Algorithm

Variable: MDC(p), the minimal disqualifying conditions of p

Algorithm
  For each data point p
    For each data point q which quasi-dominates p
      if MDC(p) does not contain Rqp
        then insert Rqp into MDC(p)
    Store MDC(p)

Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
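A rough sketch of the materialization variant follows: the MDCs of every point are computed once and kept in a dictionary, so a later query becomes a lookup plus a subset test. The in-memory dictionary is an assumption about what "store" means, quasi-dominance is again read as "no worse on price and hotel-class", and `materialize` and `in_skyline` are hypothetical names.

```python
# Packages from the slides as (price, hotel_class, hotel_group).
packages = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def materialize(points):
    """Precompute the list of conditions Rqp for every point p."""
    store = {}
    for pid, p in points.items():
        conditions = []
        for qid, q in points.items():
            if qid == pid:
                continue
            if q[0] <= p[0] and q[1] >= p[1]:  # q quasi-dominates p
                r = frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})
                if r not in conditions:
                    conditions.append(r)
        store[pid] = conditions
    return store


def in_skyline(pid, preference, store):
    """p is a skyline point iff the preference contains none of its stored conditions."""
    pref = frozenset(preference)
    return not any(cond <= pref for cond in store[pid])


store = materialize(packages)
no_preference = set()  # Bob's case
print([pid for pid in packages if in_skyline(pid, no_preference, store)])
# ['a', 'c', 'e', 'f']
```

Packages b and d each get an empty condition, because a package of the same hotel-group quasi-dominates them, so they are disqualified under every preference. The trade-off against the on-the-fly method is extra storage in exchange for cheaper queries.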

Page 17: Mining Favorable Facets

4. Empirical Study

Datasets
  Synthetic Dataset
  Real Datasets (from UCI)
    Nursery Dataset
    Automobile Dataset

Default Values (Synthetic)
  No. of tuples = 500K
  No. of numeric dimensions = 3
  No. of categorical dimensions = 1
  No. of values in a nominal dimension = 20

Page 18: Mining Favorable Facets

4. Empirical Study

Without indexing:
  MDC-O: Slowest Search Time
  MDC-M: Faster Search Time
  Storage of MDC: 8MB

With indexing:
  MDC-O and MDC-M: Fast Search Time

Page 19: Mining Favorable Facets

4. Empirical Study

Automobile: three car models

Car          MDC
Honda        “Toyota < Honda”
Mitsubishi   “Honda < Mitsubishi” or “Toyota < Mitsubishi”
Toyota       -

Honda: A salesperson should NOT promote this car to a customer who prefers Toyota to Honda.

Toyota: A salesperson should promote this car to ANY customer.

Page 20: Mining Favorable Facets

5. Conclusion

Skyline

Favorable Facets

Minimal Disqualifying Condition

Algorithm
  On-the-fly
  Materialization

Empirical Study