Mining Favorable Facets
Raymond Chi-Wing Wong (The Chinese University of Hong Kong)
Jian Pei (Simon Fraser University)
Ada Wai-Chee Fu (The Chinese University of Hong Kong)
Ke Wang (Simon Fraser University)
KDD ’07, August 12-15, 2007, San Jose, California, USA
Outline
1. Introduction
2. Skyline
3. Algorithm
4. Empirical Study
5. Conclusion
1. Introduction
Suppose we want to look for a vacation package. There are 3 packages.
Package ID   Price   Hotel-class
a            1000    4
b            2400    1
c            3000    5
Suppose we compare packages a and b.
We want a lower price.
We want a higher hotel-class.
We know that package a is “better” than package b because
1. The price of package a is lower.
2. The hotel-class of package a is higher.
We say that package a “dominates” package b.
Thus, we do not need to consider package b.
We also know that
1. Package a has the lowest price.
2. Package c has the highest hotel-class.
Packages a and c are not dominated by any other package.
Thus, packages a and c are all of the “best” possible choices.
We call packages a and c skyline points.
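The dominance test and the skyline of the three packages can be sketched in a few lines of Python. This is only an illustration of the definition above; the names `PACKAGES`, `dominates` and `skyline` are made up for this example and do not come from the slides.

```python
from typing import Dict, List, Tuple

# (price, hotel_class) for the three packages in the table above.
PACKAGES: Dict[str, Tuple[int, int]] = {
    "a": (1000, 4),
    "b": (2400, 1),
    "c": (3000, 5),
}


def dominates(q: Tuple[int, int], p: Tuple[int, int]) -> bool:
    """True if q is no worse than p on both attributes and strictly better on one.

    Lower price is better; higher hotel-class is better.
    """
    q_price, q_class = q
    p_price, p_class = p
    no_worse = q_price <= p_price and q_class >= p_class
    strictly_better = q_price < p_price or q_class > p_class
    return no_worse and strictly_better


def skyline(packages: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the IDs of the packages that no other package dominates."""
    return sorted(
        pid
        for pid, p in packages.items()
        if not any(dominates(q, p) for qid, q in packages.items() if qid != pid)
    )


print(dominates(PACKAGES["a"], PACKAGES["b"]))  # True
print(skyline(PACKAGES))  # ['a', 'c']
```

The quadratic scan is chosen for readability only; it is not meant to reflect how a skyline would be computed on a large dataset.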
Suppose we want to look for a vacation package. There are 6 packages.
Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)
Different customers may have different preferences on Hotel-group. Here “X < Y” means that X is preferred to Y.
Suppose a customer has the following preference: H < T < M. The skyline points are packages a and c.
Suppose another customer has the following preference: H < M < T. The skyline points are packages a, c and e.
In other words, different preferences give different skyline points.
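To see how a preference on Hotel-group changes the skyline, the following Python sketch extends the dominance test with a nominal attribute. It assumes that a preference is given as a set of pairs (x, y), read “x is preferred to y”, and that two different hotel-groups without such a pair are incomparable. The helper `chain` and all other names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Package = Tuple[int, int, str]            # (price, hotel_class, hotel_group)
Preference = FrozenSet[Tuple[str, str]]   # pairs (x, y): "x is preferred to y"

PACKAGES: Dict[str, Package] = {
    "a": (1000, 4, "T"),
    "b": (2400, 1, "T"),
    "c": (3000, 5, "H"),
    "d": (3600, 4, "H"),
    "e": (2400, 2, "M"),
    "f": (3000, 3, "M"),
}


def chain(*groups: str) -> Preference:
    """Expand a chain such as H < T < M into all the pairs it implies."""
    return frozenset(
        (groups[i], groups[j])
        for i in range(len(groups))
        for j in range(i + 1, len(groups))
    )


def dominates(q: Package, p: Package, pref: Preference) -> bool:
    """q dominates p if q is no worse everywhere and strictly better somewhere."""
    group_better = (q[2], p[2]) in pref
    group_no_worse = q[2] == p[2] or group_better
    no_worse = q[0] <= p[0] and q[1] >= p[1] and group_no_worse
    strictly_better = q[0] < p[0] or q[1] > p[1] or group_better
    return no_worse and strictly_better


def skyline(packages: Dict[str, Package], pref: Preference) -> List[str]:
    """IDs of the packages not dominated by any other package under pref."""
    return sorted(
        pid
        for pid, p in packages.items()
        if not any(
            dominates(q, p, pref) for qid, q in packages.items() if qid != pid
        )
    )


print(skyline(PACKAGES, chain("H", "T", "M")))  # ['a', 'c']
print(skyline(PACKAGES, chain("H", "M", "T")))  # ['a', 'c', 'e']
```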
1. Introduction
Customer   Preference on Hotel-group   Skyline
Alice      T < M                       {a, c}
Bob        No special preference       {a, c, e, f}
Chris      H < M                       {a, c, e}
David      H < M < T                   {a, c, e}
Emily      H < T < M                   {a, c}
Fred       M < T                       {a, c, e, f}
What preferences make package f a skyline point?
Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers.
Bob and Fred are the potential customers, because package f is a skyline point under their preferences.
1. Introduction
Problem: Given a package, we want to find the preferences or conditions under which this package is a skyline point.
We call these its favorable facets.
We can solve the problem by a naive method: Lattice Search. Each node of the lattice is a preference on Hotel-group, and SKY is the skyline under that preference.
{}: SKY={a,c,e,f}
{T < M}: SKY={a,c}
{H < M}: SKY={a,c,e}
{T < H}: SKY={a,c,e,f}
{H < T}: SKY={a,c,e,f}
{M < T}: SKY={a,c,e,f}
{M < H}: SKY={a,c,e,f}
{T < M, H < M}: SKY={a,c}
{T < M, T < H}: SKY={a,c}
{H < T, H < M}: SKY={a,c,e}
{T < H, M < H}: SKY={a,c,e,f}
…
{T < H, T < M, H < M}: SKY={a,c}
{T < H, T < M, M < H}: SKY={a,c}
⊤: SKY={}
Consider package f.
Preferences that make package f a skyline point: {}, {T < H}, {H < T}, {M < T}, {M < H}, …, {T < H, M < H}. Under each of the listed preferences, SKY={a,c,e,f}.
We need to compute all skyline points for each possible preference.
There are many preferences which qualify package f as a skyline point.
This approach has two disadvantages.
1. Computation is costly.
2. It is difficult to interpret the results.
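A rough Python sketch of the naive lattice search makes the cost visible: it enumerates every preference on the three hotel-groups and recomputes the skyline for each one. It assumes that a preference is a strict partial order stored as a set of pairs (x, y), read “x is preferred to y”, and it leaves out the top element of the lattice. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Package = Tuple[int, int, str]            # (price, hotel_class, hotel_group)
Preference = FrozenSet[Tuple[str, str]]   # pairs (x, y): "x is preferred to y"

PACKAGES: Dict[str, Package] = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
GROUPS = ("T", "H", "M")


def is_strict_partial_order(pairs: Preference) -> bool:
    """Reject sets holding both x<y and y<x, or missing a transitive pair."""
    for x, y in pairs:
        if (y, x) in pairs:
            return False
        for y2, z in pairs:
            if y == y2 and (x, z) not in pairs:
                return False
    return True


def all_preferences() -> List[Preference]:
    """Every node of the lattice except the top element."""
    pairs = [(x, y) for x in GROUPS for y in GROUPS if x != y]
    candidates = (
        frozenset(subset)
        for size in range(len(pairs) + 1)
        for subset in combinations(pairs, size)
    )
    return [pref for pref in candidates if is_strict_partial_order(pref)]


def dominates(q: Package, p: Package, pref: Preference) -> bool:
    group_better = (q[2], p[2]) in pref
    no_worse = q[0] <= p[0] and q[1] >= p[1] and (q[2] == p[2] or group_better)
    return no_worse and (q[0] < p[0] or q[1] > p[1] or group_better)


def skyline(packages: Dict[str, Package], pref: Preference) -> List[str]:
    return sorted(
        pid
        for pid, p in packages.items()
        if not any(
            dominates(q, p, pref) for qid, q in packages.items() if qid != pid
        )
    )


# One full skyline computation per preference: this is the costly part.
qualifying = [
    pref for pref in all_preferences() if "f" in skyline(PACKAGES, pref)
]
for pref in sorted(qualifying, key=lambda s: (len(s), sorted(s))):
    print(sorted(pref), skyline(PACKAGES, pref))
```

Even with only three hotel-groups the loop visits every preference, and the output is a long list of qualifying preferences rather than a compact description.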
Consider package f again.
Border for f in the lattice: the preferences shown with SKY={a,c,e,f} are {T < H}, {H < T}, {M < T}, {M < H} and {T < H, M < H}.
We find that whenever the preference contains “T < M” or “H < M”, package f is not a skyline point.
We say that each of “T < M” and “H < M” is a minimal disqualifying condition (MDC) of package f.
3. Algorithm
How to find MDCs of a point?
Problem: Given a package, we want to find the minimal conditions under which this package is NOT a skyline point.
Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)
Point q is said to quasi-dominate point p if all attributes of point q with a fixed order (here, Price and Hotel-class) are NOT worse than those of point p.
e.g., package a quasi-dominates package f because
1. Package a has a lower (or better) price than package f.
2. Package a has a higher (or better) hotel-class than package f.
If package a quasi-dominates package f, we define Raf as the preference on Hotel-group under which a dominates f: Raf = {T < M}.
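The two definitions can be illustrated with a short Python sketch. It assumes that Rqp is stored as a set of pairs (x, y), read “x is preferred to y”, and that Rqp is empty when q and p belong to the same hotel-group (no preference is needed in that case). The function names `quasi_dominates` and `r` are made up for this example.

```python
from typing import Dict, FrozenSet, Tuple

Package = Tuple[int, int, str]           # (price, hotel_class, hotel_group)
Condition = FrozenSet[Tuple[str, str]]   # pairs (x, y): "x is preferred to y"

PACKAGES: Dict[str, Package] = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def quasi_dominates(q: Package, p: Package) -> bool:
    """q is not worse than p on Price (lower is better) and Hotel-class (higher is better)."""
    return q[0] <= p[0] and q[1] >= p[1]


def r(q: Package, p: Package) -> Condition:
    """Pairs a preference must contain for a quasi-dominating q to dominate p.

    Assumption of this sketch: the result is empty when both packages share
    a hotel-group.
    """
    if q[2] == p[2]:
        return frozenset()
    return frozenset({(q[2], p[2])})


a, c, e, f = (PACKAGES[k] for k in ("a", "c", "e", "f"))
print(quasi_dominates(a, f), sorted(r(a, f)))  # True [('T', 'M')]
print(quasi_dominates(c, f), sorted(r(c, f)))  # True [('H', 'M')]
print(quasi_dominates(e, f))                   # False
```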
3. Algorithm
Two Algorithms
  MDC-O: Computing MDC On-the-fly
    Does not store MDCs of points
    Computes the MDC of a given point on-the-fly
  MDC-M: A Materialization Method
    Stores MDCs of all points
Indexing Method for Speed-up
  R*-tree
3.1 MDC-O: Computing MDC On-the-fly
On-the-fly Algorithm
Given: data point p
Variable: MDC(p), the minimal disqualifying conditions of p
Algorithm MDC(p)
  For each data point q which quasi-dominates p
    if MDC(p) does not contain Rqp
      then insert Rqp into MDC(p)
  Return MDC(p)
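The on-the-fly loop above might look as follows in Python. This is a sketch under two assumptions that the slide does not spell out: each Rqp is a set of pairs (x, y), read “x is preferred to y”, which is empty when q and p share a hotel-group; and “does not contain” is read as “no stored condition is a subset of Rqp”, so that only minimal conditions are kept. The function name `mdc_on_the_fly` is hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Package = Tuple[int, int, str]           # (price, hotel_class, hotel_group)
Condition = FrozenSet[Tuple[str, str]]   # pairs (x, y): "x is preferred to y"

PACKAGES: Dict[str, Package] = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def quasi_dominates(q: Package, p: Package) -> bool:
    """q is not worse than p on Price and on Hotel-class."""
    return q[0] <= p[0] and q[1] >= p[1]


def r(q: Package, p: Package) -> Condition:
    """Condition under which a quasi-dominating q dominates p."""
    return frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})


def mdc_on_the_fly(p_id: str, packages: Dict[str, Package]) -> List[Condition]:
    """Scan all points once and collect the minimal disqualifying conditions of p."""
    p = packages[p_id]
    conditions: List[Condition] = []
    for q_id, q in packages.items():
        if q_id == p_id or not quasi_dominates(q, p):
            continue
        r_qp = r(q, p)
        if any(c <= r_qp for c in conditions):
            continue  # an equal or smaller condition is already stored
        # Drop stored conditions that the new, smaller one makes redundant.
        conditions = [c for c in conditions if not (r_qp <= c)]
        conditions.append(r_qp)
    return conditions


for pid in PACKAGES:
    print(pid, [sorted(c) for c in mdc_on_the_fly(pid, PACKAGES)])
```

For package f the sketch yields the two conditions T < M and H < M. An empty condition, as for packages b and d, means the package is disqualified under every preference.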
3.2 MDC-M: A Materialization Method
Materialization Algorithm
Variable: MDC(p), the minimal disqualifying conditions of p
Algorithm MDC(p)
  For each data point p
    For each data point q which quasi-dominates p
      if MDC(p) does not contain Rqp
        then insert Rqp into MDC(p)
    Store MDC(p)
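A possible Python sketch of the materialization idea: compute the MDCs of every point once, store them in a dictionary, and answer later queries by lookup only. As assumptions of this example, conditions and preferences are sets of pairs (x, y), read “x is preferred to y”, a preference is transitively closed, and a point is a skyline point exactly when none of its stored conditions is contained in the preference. The names `materialize` and `is_skyline_point` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Package = Tuple[int, int, str]           # (price, hotel_class, hotel_group)
Condition = FrozenSet[Tuple[str, str]]   # pairs (x, y): "x is preferred to y"

PACKAGES: Dict[str, Package] = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def minimal_conditions(p_id: str, packages: Dict[str, Package]) -> List[Condition]:
    """Inner loop of the slide: minimal disqualifying conditions of one point."""
    p = packages[p_id]
    conditions: List[Condition] = []
    for q_id, q in packages.items():
        if q_id == p_id or not (q[0] <= p[0] and q[1] >= p[1]):
            continue  # q does not quasi-dominate p
        r_qp = frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})
        if any(c <= r_qp for c in conditions):
            continue
        conditions = [c for c in conditions if not (r_qp <= c)]
        conditions.append(r_qp)
    return conditions


def materialize(packages: Dict[str, Package]) -> Dict[str, List[Condition]]:
    """Outer loop of the slide: store MDC(p) for every point p."""
    return {p_id: minimal_conditions(p_id, packages) for p_id in packages}


def is_skyline_point(
    p_id: str, pref: Condition, store: Dict[str, List[Condition]]
) -> bool:
    """p is disqualified as soon as pref contains one of its stored conditions."""
    return not any(c <= pref for c in store[p_id])


STORE = materialize(PACKAGES)
fred = frozenset({("M", "T")})
chris = frozenset({("H", "M")})
print(is_skyline_point("f", fred, STORE))   # True
print(is_skyline_point("f", chris, STORE))  # False
```

The trade-off sketched here is the usual one: the store costs space up front, while each query touches only the stored conditions of one point instead of the whole dataset.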
4. Empirical Study
Datasets
  Synthetic Dataset
  Real Datasets (from UCI)
    Nursery Dataset
    Automobile Dataset
Default Values (Synthetic)
  No. of tuples = 500K
  No. of numeric dimensions = 3
  No. of categorical dimensions = 1
  No. of values in a nominal dimension = 20
4. Empirical Study
Without indexing:
  MDC-O: Slowest Search Time
  MDC-M: Faster Search Time
  Storage of MDC: 8MB
With indexing:
  MDC-O and MDC-M: Fast Search Time
4. Empirical Study
Automobile
Three car models
Car          MDC
Honda        “Toyota < Honda”
Mitsubishi   “Honda < Mitsubishi” or “Toyota < Mitsubishi”
Toyota       -
Honda: A salesperson should NOT promote this car to a customer who prefers Toyota to Honda.
Mitsubishi: By the same reading of its MDC, a salesperson should NOT promote this car to a customer who prefers Honda or Toyota to Mitsubishi.
Toyota: A salesperson should promote this car to ANY customer.
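The way a salesperson would use the MDC table can be sketched in Python. The MDCs are copied from the table above; the representation of a customer preference as a set of pairs (x, y), read “x is preferred to y”, and the function name `cars_to_promote` are assumptions of this example.

```python
from typing import Dict, FrozenSet, List, Tuple

Condition = FrozenSet[Tuple[str, str]]   # pairs (x, y): "x is preferred to y"

# MDCs of the three car models, as listed in the table above.
MDCS: Dict[str, List[Condition]] = {
    "Honda": [frozenset({("Toyota", "Honda")})],
    "Mitsubishi": [
        frozenset({("Honda", "Mitsubishi")}),
        frozenset({("Toyota", "Mitsubishi")}),
    ],
    "Toyota": [],
}


def cars_to_promote(pref: Condition) -> List[str]:
    """Cars for which the customer's preference contains no disqualifying condition."""
    return sorted(
        car
        for car, conditions in MDCS.items()
        if not any(c <= pref for c in conditions)
    )


print(cars_to_promote(frozenset()))                         # ['Honda', 'Mitsubishi', 'Toyota']
print(cars_to_promote(frozenset({("Toyota", "Honda")})))    # ['Mitsubishi', 'Toyota']
print(cars_to_promote(frozenset({("Honda", "Mitsubishi")})))  # ['Honda', 'Toyota']
```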
5. Conclusion
Skyline
Favorable Facets
Minimal Disqualifying Condition
Algorithm
  On-the-fly
  Materialization
Empirical Study