Intelligent Database Systems Lab, N.Y.U.S.T. I.M.

Supporting personalized ranking over categorical attributes

Presenter: Lin, Shu-Han

Authors: Gae-won You, Seung-won Hwang, Hwanjo Yu

Information Sciences 178 (2008)

Post on 13-Dec-2015

Outline

• Motivation
• Objective
• Methodology
• Experiments
• Conclusion
• Comments

Motivation

The problem categorical attributes pose for personalized ranking in information retrieval

Categorical attributes have no inherent ordering, so how can relevant data be ranked by a categorical attribute?

For example, how can we find older females who prefer soda drinks?

Name   Age  Gender  FavoriteDrink  Buy
Jane   30   Female  Coke           Coke, Milk
Mary   25   Female  Pepsi          Coke, Pepsi
Tom    21   Male    Water          Milk, Water
Denny  26   Male    Coke           Milk, Juice
Tina   11   Female  Pepsi          Red Wine, Pepsi

Objectives

Enable a uniform ranked retrieval over a combination of categorical attributes and numerical attributes.

Support ranking over a binary representation of categorical attributes:

• Binary encoding
• Sparsity

Name   Female
Jane   1
Mary   1
Tom    0
Denny  0
Tina   1

Single-valued attribute

Name  Coke  Pepsi  Water
Jane  1     0      0
Mary  0     1      0
Tom   0     0      1

Multi-valued attribute with bounded cardinality (item set, bc=2)

Name  Coke  Pepsi  Water
Jane  1     0      1
Mary  1     1      0
Tom   1     0      1
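As a rough illustration of the binary encoding described on this slide, the sketch below turns a single-valued attribute and a multi-valued (item set) attribute into 0/1 indicator columns. The helper names one_hot and multi_hot are hypothetical, and the records are the first three rows of the Motivation table, so the Buy columns differ from the slide's own multi-valued example.

```python
def one_hot(records, attribute):
    """Single-valued attribute: exactly one 1 per row."""
    values = sorted({r[attribute] for r in records})
    rows = [[int(r[attribute] == v) for v in values] for r in records]
    return values, rows


def multi_hot(records, attribute):
    """Multi-valued attribute (item set): one 1 per item the row contains."""
    values = sorted({item for r in records for item in r[attribute]})
    rows = [[int(v in r[attribute]) for v in values] for r in records]
    return values, rows


# First three rows of the Motivation table.
records = [
    {"Name": "Jane", "FavoriteDrink": "Coke", "Buy": {"Coke", "Milk"}},
    {"Name": "Mary", "FavoriteDrink": "Pepsi", "Buy": {"Coke", "Pepsi"}},
    {"Name": "Tom", "FavoriteDrink": "Water", "Buy": {"Milk", "Water"}},
]

drink_cols, drink_rows = one_hot(records, "FavoriteDrink")
buy_cols, buy_rows = multi_hot(records, "Buy")

for rec, row in zip(records, drink_rows):
    print(rec["Name"], dict(zip(drink_cols, row)))
for rec, row in zip(records, buy_rows):
    print(rec["Name"], dict(zip(buy_cols, row)))
```

In this sketch every single-valued row contains exactly one 1 and every Buy row contains two, which is the sparsity the later bitmap slides rely on.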

Overview

[Figure: overview diagram with steps (1), (2), (3)]

Rank formulation

F = 0.5*age + 3*female + …
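To make the ranking function concrete, here is a minimal sketch that scores the rows of the Motivation table using only the two terms visible on the slide (0.5*age and 3*female). The remaining terms are elided in the source and are not guessed here; the function name score is an assumption.

```python
# Rows from the Motivation table: (name, age, gender).
people = [
    ("Jane", 30, "Female"),
    ("Mary", 25, "Female"),
    ("Tom", 21, "Male"),
    ("Denny", 26, "Male"),
    ("Tina", 11, "Female"),
]


def score(age, gender):
    """Only the two terms shown on the slide; further terms are elided there."""
    female = 1 if gender == "Female" else 0  # binary-encoded categorical value
    return 0.5 * age + 3 * female


ranked = sorted(people, key=lambda p: score(p[1], p[2]), reverse=True)
for name, age, gender in ranked:
    print(name, score(age, gender))
```

The categorical attribute contributes to the score only through its 0/1 encoding, which is what allows it to be ranked alongside the numerical attribute.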

Rank processing (TA)

A simple example query: find older females who prefer soda drinks.

Transformed into

F = age + female

1. Candidate identification
   1. Sorted access on age and female
   2. Find the top-k of sa(age) and sa(female), e.g., k=1: sa(age)={O1}; sa(female)={O2}

2. Candidate reduction
   1. O1 = 30 + 0
   2. O2 = 25 + 1
   3. O1 has the highest F score

3. Termination
   1. The score of O1 is not greater than F(30,1) = 31 (the upper bound score), so the search cannot stop yet
   2. Perform another round of sorted access to consider more candidates, e.g., sa(age)={O4}; sa(female)={O3}
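The three steps above can be sketched as a small threshold-style top-k loop, assuming TA here refers to the classic threshold algorithm. The values for O1 and O2 come from the slide; the values for O3 and O4 are hypothetical, chosen only so that the second round of sorted access returns O4 and O3 as on the slide.

```python
import heapq

# O1 and O2 are from the slide; O3 and O4 are hypothetical illustration values.
objects = {
    "O1": {"age": 30, "female": 0},
    "O2": {"age": 25, "female": 1},
    "O3": {"age": 21, "female": 1},
    "O4": {"age": 26, "female": 0},
}


def threshold_top_k(objects, attrs, k):
    """Top-k for F = sum of attrs, using sorted access plus an upper bound."""
    # One list per attribute, sorted by descending value (sorted access order).
    lists = {a: sorted(objects, key=lambda o: -objects[o][a]) for a in attrs}
    seen = {}
    for depth in range(len(objects)):
        last = {}
        for a in attrs:
            oid = lists[a][depth]            # candidate identification
            last[a] = objects[oid][a]
            if oid not in seen:              # candidate reduction: full score
                seen[oid] = sum(objects[oid][b] for b in attrs)
        upper_bound = sum(last.values())     # best score any unseen object can have
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= upper_bound:
            return top, depth + 1            # termination
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1]), len(objects)


top, rounds = threshold_top_k(objects, ["age", "female"], k=1)
print(top, rounds)
```

With these values the first round gives an upper bound of 31, which O1's score of 30 does not reach, so a second round is needed; after it the bound drops to 27 and O1 can be returned.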

Bitmap – binary encoding

F = v1 + v2 + v3 + v4, k = 2

1) K={}, C={1111} (Initialization)
2) OID=execute(C)
3) OID={O4}, |OID|>0, K={[O4,4]}
4) C={0111/1011/1101/1110} (Expansion)
5) K.count < k, back to 2)
6) …

    v1  v2  v3  v4
O1  1   0   1   1
O2  0   1   0   0
O3  0   1   1   1
O4  1   1   1   1
O5  1   0   1   1
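The initialization, execution, and expansion steps above can be approximated with integer bitmaps. This is a simplified sketch under the slide's assumption of equal weights, not the paper's actual implementation: the names build_bitmaps, execute, and top_k are hypothetical, and ties at the same score are broken arbitrarily.

```python
# Rows from the slide's table: (v1, v2, v3, v4) per object.
ROWS = {
    "O1": (1, 0, 1, 1),
    "O2": (0, 1, 0, 0),
    "O3": (0, 1, 1, 1),
    "O4": (1, 1, 1, 1),
    "O5": (1, 0, 1, 1),
}


def build_bitmaps(rows):
    """One integer bitmap per column; bit i is set if object i has a 1 there."""
    oids = list(rows)
    bitmaps = [0] * len(next(iter(rows.values())))
    for i, oid in enumerate(oids):
        for j, bit in enumerate(rows[oid]):
            if bit:
                bitmaps[j] |= 1 << i
    return oids, bitmaps


def execute(pattern, oids, bitmaps):
    """Return objects matching the pattern exactly, via bitwise AND / AND NOT."""
    result = (1 << len(oids)) - 1
    for bit, bm in zip(pattern, bitmaps):
        result &= bm if bit else ~bm
    return [oid for i, oid in enumerate(oids) if (result >> i) & 1]


def top_k(rows, k):
    oids, bitmaps = build_bitmaps(rows)
    n = len(bitmaps)
    K = []
    candidates = [tuple([1] * n)]            # initialization: C = {1111}
    while candidates and len(K) < k:
        score = sum(candidates[0])           # equal weights: score = number of 1s
        for c in candidates:
            K.extend((oid, score) for oid in execute(c, oids, bitmaps))
        # Expansion: clear one 1 bit in every current pattern.
        expanded = {c[:i] + (0,) + c[i + 1:]
                    for c in candidates for i in range(n) if c[i]}
        candidates = sorted(expanded, reverse=True)
    return K[:k]


result = top_k(ROWS, k=2)
print(result)
```

On the slide's data the first pattern 1111 finds only O4, so the search expands to the four patterns with three 1s, where O1, O3, and O5 all match with score 3.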

Bitmap – sparsity

Single-valued attribute

F = w1v1 + w2v2 + … + w6v6

Ranked weights: w1 ≧ w2 ≧ w3; w4 ≧ w5 ≧ w6. For simplicity, all w=1, k=2.

1) K={}, C={100.100.100} (Initialization)
2) OID=execute(C)
3) OID={O4}, |OID|>0, K=OID={[O4,2]}
4) C={010.100.100/100.010.100/100.100.010} (Expansion)
5) K.count < k, back to 2)
6) …

    Attribute1  Attribute2  Attribute2
    V1 V2 V3    V4 V5 V6    V4 V5 V6
O1  1  0  0     0  0  1     0  1  0
O2  0  1  0     0  1  0     1  0  0
O3  0  1  0     1  0  0     0  1  0
O4  1  0  0     1  0  0     1  0  0
O5  0  0  1     0  0  1     0  1  0
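One way to read the expansion step above: because each single-valued attribute has exactly one 1, a candidate pattern can be stored as one value index per attribute, and expansion moves a single attribute to its next-ranked value. The sketch below explores such patterns best-first with a heap. The weights are hypothetical (the slide sets all w=1, which makes every pattern tie), and a dictionary lookup stands in for the bitmap query execute(C).

```python
import heapq

# Slide rows as the index of the 1 within each attribute,
# e.g. O4 = 100.100.100 -> (0, 0, 0) and O1 = 100.001.010 -> (0, 2, 1).
ROWS = {
    "O1": (0, 2, 1),
    "O2": (1, 1, 0),
    "O3": (1, 0, 1),
    "O4": (0, 0, 0),
    "O5": (2, 2, 1),
}

# Hypothetical weights, sorted in descending order within each attribute.
WEIGHTS = [[3, 2, 1], [3, 2, 1], [3, 2, 1]]


def sparse_top_k(rows, weights, k):
    index = {}                                # stand-in for the bitmap query
    for oid, pattern in rows.items():
        index.setdefault(pattern, []).append(oid)

    def score(pattern):
        return sum(w[i] for w, i in zip(weights, pattern))

    start = tuple(0 for _ in weights)         # initialization: 100.100.100
    heap = [(-score(start), start)]
    seen = {start}
    K = []
    while heap and len(K) < k:
        neg_score, pattern = heapq.heappop(heap)
        K.extend((oid, -neg_score) for oid in index.get(pattern, []))
        # Expansion: move one attribute to its next-ranked value.
        for a in range(len(pattern)):
            if pattern[a] + 1 < len(weights[a]):
                nxt = pattern[:a] + (pattern[a] + 1,) + pattern[a + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))
    return K[:k]


result = sparse_top_k(ROWS, WEIGHTS, k=2)
print(result)
```

Only patterns with one value per attribute are ever generated, so the search space in this sketch is 27 patterns rather than the 512 possible 9-bit strings.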

Bitmap – sparsity

Multi-valued attribute with bounded cardinality

    Attribute1            Attribute2
    V1-1 V1-2 V1-3 V1-4   V2-1 V2-2 V2-3
O1  1    0    0    1      1    0    1
O2  0    1    0    1      1    1    0
O3  0    1    1    0      1    1    0
O4  1    0    0    1      1    0    1
O5  0    0    1    1      0    1    1

Experiments

• Sparsity of indicator variables in the UCI datasets

• 22% of the datasets consist only of categorical attributes.

• 56% combine numerical and categorical attributes.

Experiments – synthetic data

Experiments – real-life data

Conclusions

This paper studies how to support rank formulation and processing over data with categorical attributes.

Instead of adopting existing numerical algorithms, it develops a bitmap-based approach that exploits:

• Binary encoding
• Sparsity
  • Single-valued
  • Multi-valued with bounded cardinality

Comments

Advantage …

Drawback …

Application …