supporting personalized ranking over categorical attributes

Intelligent Database Systems Lab, N.Y.U.S.T. I.M.
Supporting personalized ranking over categorical attributes
Presenter: Lin, Shu-Han
Authors: Gae-won You, Seung-won Hwang, Hwanjo Yu
Information Sciences 178 (2008)

Upload: adolph

Post on 22-Feb-2016

TRANSCRIPT

Page 1: Supporting personalized ranking over categorical attributes

Intelligent Database Systems Lab

N.Y.U.S.T. I.M.

Supporting personalized ranking over categorical attributes

Presenter: Lin, Shu-Han
Authors: Gae-won You, Seung-won Hwang, Hwanjo Yu

Information Sciences 178 (2008)

Page 2: Supporting personalized ranking over categorical attributes

Outline

• Motivation
• Objective
• Methodology
• Experiments
• Conclusion
• Comments

Page 3: Supporting personalized ranking over categorical attributes

Motivation

Problem: personalized ranking in information retrieval over categorical attributes

Categorical attributes have no inherent ordering, so it is unclear how to rank relevant data by a categorical attribute.

For example, how can we find older women who prefer soda drinks?

Name   Age  Gender  FavoriteDrink  Buy
Jane   30   Female  Coke           Coke, Milk
Mary   25   Female  Pepsi          Coke, Pepsi
Tom    21   Male    Water          Milk, Water
Denny  26   Male    Coke           Milk, Juice
Tina   11   Female  Pepsi          Red Wine, Pepsi

Page 4: Supporting personalized ranking over categorical attributes

Objectives

Enable uniform ranked retrieval over any combination of categorical and numerical attributes.

Support ranking over the binary representation of categorical attributes, addressing:
• Binary encoding
• Sparsity

Name   Female
Jane   1
Mary   1
Tom    0
Denny  0
Tina   1

Single-valued attribute:
Name   Coke  Pepsi  Water
Jane   1     0      0
Mary   0     1      0
Tom    0     0      1

Multi-valued attribute with bounded cardinality (item set, bc = 2):
Name   Coke  Pepsi  Water
Jane   1     0      1
Mary   1     1      0
Tom    1     0      1
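As a rough illustration of the two encodings, the Python sketch below turns the motivation table into bit vectors. The helpers `one_hot` and `multi_hot`, and the column order of the hypothetical `DRINKS` and `BUY_DOMAIN` lists, are assumptions made for this example, not part of the paper.

```python
from typing import List

# Hypothetical value domains, read off the motivation table.
DRINKS = ["Coke", "Pepsi", "Water"]
BUY_DOMAIN = ["Coke", "Pepsi", "Water", "Milk", "Juice", "Red Wine"]

PEOPLE = {
    "Jane": ("Female", "Coke", ["Coke", "Milk"]),
    "Mary": ("Female", "Pepsi", ["Coke", "Pepsi"]),
    "Tom": ("Male", "Water", ["Milk", "Water"]),
    "Denny": ("Male", "Coke", ["Milk", "Juice"]),
    "Tina": ("Female", "Pepsi", ["Red Wine", "Pepsi"]),
}

def one_hot(value: str, domain: List[str]) -> List[int]:
    """Single-valued attribute: exactly one bit is set per object."""
    return [1 if v == value else 0 for v in domain]

def multi_hot(values: List[str], domain: List[str]) -> List[int]:
    """Multi-valued attribute: one bit per value held (at most bc bits)."""
    held = set(values)
    return [1 if v in held else 0 for v in domain]

# A binary attribute (Gender) needs only one column: Female = 1 or 0.
female = {name: int(gender == "Female") for name, (gender, _, _) in PEOPLE.items()}
favorite = {name: one_hot(drink, DRINKS) for name, (_, drink, _) in PEOPLE.items()}
bought = {name: multi_hot(items, BUY_DOMAIN) for name, (_, _, items) in PEOPLE.items()}
```

With bc = 2, every row of `bought` has at most two 1s out of six columns, which is the kind of sparsity the objectives refer to.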

Page 5: Supporting personalized ranking over categorical attributes

Overview

[Figure: overview diagram with three numbered parts, (1), (2) and (3)]

Page 6: Supporting personalized ranking over categorical attributes

Rank formulation

F = 0.5*age + 3*female + …
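Read as a linear scoring function, the formulation can be sketched in Python as below. Only the two terms shown on the slide are kept (the elided "…" terms are not reconstructed), the `female` bit comes from the binary encoding of Gender, and `rank_score` and `ROWS` are hypothetical names.

```python
from typing import Dict, Tuple

def rank_score(age: float, female: int) -> float:
    """F = 0.5*age + 3*female; the slide's remaining terms are left out."""
    return 0.5 * age + 3 * female

# (age, female) pairs taken from the motivation table.
ROWS: Dict[str, Tuple[int, int]] = {
    "Jane": (30, 1), "Mary": (25, 1), "Tom": (21, 0), "Denny": (26, 0), "Tina": (11, 1),
}

# Naive baseline: score every row, then sort by descending F.
ranking = sorted(ROWS, key=lambda name: rank_score(*ROWS[name]), reverse=True)
print(ranking)  # ['Jane', 'Mary', 'Denny', 'Tom', 'Tina']
```

Scoring and sorting every row is the naive baseline; rank processing such as TA tries to return the top-k without touching all rows.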

Page 7: Supporting personalized ranking over categorical attributes

Rank processing (TA: Threshold Algorithm)

A simple example query: find older women who prefer soda drinks.

Transform it into

F = age + female

1. Candidate identification
   1. Sorted access (sa) on age and on female.
   2. Retrieve the top-k objects under sa(age) and sa(female), e.g., with k = 1: sa(age) = {O1}, sa(female) = {O2}.
2. Candidate reduction
   1. F(O1) = 30 + 0 = 30
   2. F(O2) = 25 + 1 = 26
   3. O1 has the highest F score.
3. Termination
   1. F(O1) = 30 is below the upper-bound score F(30, 1) = 31, computed from the last values seen under sorted access, so TA cannot stop yet.
   2. Another round of sorted access considers more candidates, e.g., sa(age) = {O4}, sa(female) = {O3}.
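The steps above follow Fagin's Threshold Algorithm. The Python sketch below is a minimal version for a weighted sum with nonnegative weights (so F is monotone). The function name and the sample data are assumptions: the sample relabels the motivation table's rows Jane to Tina as O1 to O5, a hypothetical mapping, so its numbers differ from the slide's walk-through (where O1 scores 30 + 0).

```python
import heapq
from typing import Dict, List, Tuple

def threshold_algorithm(table: Dict[str, List[float]],
                        weights: List[float], k: int) -> List[Tuple[str, float]]:
    """Top-k objects by F = sum(w_i * x_i), assuming every w_i >= 0."""
    m = len(weights)

    def score(values: List[float]) -> float:
        return sum(w * v for w, v in zip(weights, values))

    # Sorted access: one list per attribute, in descending attribute value.
    lists = [sorted(table, key=lambda oid, i=i: -table[oid][i]) for i in range(m)]
    seen: Dict[str, float] = {}
    for depth in range(len(table)):
        last_seen = []
        for i in range(m):
            oid = lists[i][depth]
            last_seen.append(table[oid][i])
            if oid not in seen:
                seen[oid] = score(table[oid])  # random access to all attributes
        top = heapq.nlargest(k, seen.items(), key=lambda item: item[1])
        # Upper bound on the score of any object not yet seen.
        threshold = score(last_seen)
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda item: item[1])

table = {"O1": [30, 1], "O2": [25, 1], "O3": [21, 0], "O4": [26, 0], "O5": [11, 1]}
print(threshold_algorithm(table, [1, 1], k=1))  # [('O1', 31)]
```

Here O1 reaches the threshold 30 + 1 = 31 in the first round, so TA stops after one round of sorted access; with k = 2 it needs three rounds.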

Page 8: Supporting personalized ranking over categorical attributes

Bitmap – binary encoding

F = v1 + v2 + v3 + v4, k = 2
1) K = {}, C = {1111} (initialization)
2) OID = execute(C)
3) OID = {O4}; |OID| > 0, so K = {[O4, 4]}
4) C = {0111, 1011, 1101, 1110} (expansion)
5) K.count < k, so go back to 2)
6) …

     v1  v2  v3  v4
O1   1   0   1   1
O2   0   1   0   0
O3   0   1   1   1
O4   1   1   1   1
O5   1   0   1   1
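The initialization, execute and expansion steps can be sketched in Python as below. This is an illustrative sketch, not the paper's code: `bitmap_topk` is a hypothetical name, each attribute column becomes one integer bitmap over the objects, `execute` keeps objects whose bit vector equals the pattern exactly, and candidate patterns are visited best first from a heap, assuming nonnegative weights.

```python
import heapq
from typing import Dict, List, Tuple

def bitmap_topk(rows: Dict[str, List[int]], weights: List[int],
                k: int) -> List[Tuple[str, int]]:
    oids = list(rows)
    n, m = len(oids), len(weights)
    everyone = (1 << n) - 1
    # One bitmap per column: bit p is set iff object oids[p] has a 1 there.
    bitmaps = [0] * m
    for p, oid in enumerate(oids):
        for j, bit in enumerate(rows[oid]):
            if bit:
                bitmaps[j] |= 1 << p

    def score(pattern: Tuple[int, ...]) -> int:
        return sum(w * b for w, b in zip(weights, pattern))

    def execute(pattern: Tuple[int, ...]) -> int:
        # Exact match: AND the bitmap where the pattern has 1, its complement where 0.
        hits = everyone
        for j, b in enumerate(pattern):
            hits &= bitmaps[j] if b else everyone & ~bitmaps[j]
        return hits

    start = (1,) * m                      # initialization: C = {11...1}
    heap = [(-score(start), start)]
    visited = {start}
    K: List[Tuple[str, int]] = []
    while heap and len(K) < k:            # K.count < k: keep going
        neg_score, pattern = heapq.heappop(heap)
        hits = execute(pattern)
        K.extend((oid, -neg_score) for p, oid in enumerate(oids) if (hits >> p) & 1)
        # Expansion: turn one 1 into 0 (never raises the score when weights >= 0).
        for j in range(m):
            if pattern[j]:
                child = pattern[:j] + (0,) + pattern[j + 1:]
                if child not in visited:
                    visited.add(child)
                    heapq.heappush(heap, (-score(child), child))
    return K[:k]

rows = {"O1": [1, 0, 1, 1], "O2": [0, 1, 0, 0], "O3": [0, 1, 1, 1],
        "O4": [1, 1, 1, 1], "O5": [1, 0, 1, 1]}
print(bitmap_topk(rows, [1, 1, 1, 1], k=2))  # [('O4', 4), ('O3', 3)]
```

The first pattern returns O4 with score 4, as in step 3 above; the second object comes from one of the four expanded patterns with score 3.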

Page 9: Supporting personalized ranking over categorical attributes

Bitmap – sparsity

Single-valued attributes: F = w1v1 + w2v2 + … + w6v6

Weights are ranked within each attribute: w1 ≥ w2 ≥ w3; w4 ≥ w5 ≥ w6. For simplicity, all w = 1 and k = 2.

1) K = {}, C = {100.100.100} (initialization)
2) OID = execute(C)
3) OID = {O4}; |OID| > 0, so K = OID = {[O4, 2]}
4) C = {010.100.100, 100.010.100, 100.100.010} (expansion)
5) K.count < k, so go back to 2)
6) …

     Attribute1   Attribute2   Attribute2
     v1 v2 v3     v4 v5 v6     v4 v5 v6
O1   1  0  0      0  0  1      0  1  0
O2   0  1  0      0  1  0      1  0  0
O3   0  1  0      1  0  0      0  1  0
O4   1  0  0      1  0  0      1  0  0
O5   0  0  1      0  0  1      0  1  0
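For single-valued attributes each group of bits holds exactly one 1, so a pattern can be stored as one value index per attribute, and expansion moves one attribute to its next-ranked value (100.100.100 to 010.100.100, and so on). The Python sketch below is a hypothetical illustration of that idea: it reads the table's nine bit columns as three groups of three, and replaces the slide's all-1 weights with made-up ranked weights 3 ≥ 2 ≥ 1 per attribute so the ranking is not a tie. Because each object has exactly one value per attribute, ANDing only the chosen bitmaps already gives an exact match, with no complements.

```python
import heapq
from typing import Dict, List, Tuple

def single_valued_topk(rows: Dict[str, Tuple[int, ...]],
                       weights: List[List[int]], k: int) -> List[Tuple[str, int]]:
    """rows[oid][a] is the index of the one value oid holds in attribute a;
    weights[a] lists attribute a's value weights in non-increasing order."""
    oids = list(rows)
    # bitmaps[a][j]: bit p is set iff object oids[p] holds value j of attribute a.
    bitmaps = [[0] * len(w) for w in weights]
    for p, oid in enumerate(oids):
        for a, j in enumerate(rows[oid]):
            bitmaps[a][j] |= 1 << p

    def score(pattern: Tuple[int, ...]) -> int:
        return sum(weights[a][j] for a, j in enumerate(pattern))

    start = tuple(0 for _ in weights)     # best-ranked value of every attribute
    heap = [(-score(start), start)]
    visited = {start}
    K: List[Tuple[str, int]] = []
    while heap and len(K) < k:
        neg_score, pattern = heapq.heappop(heap)
        hits = -1                         # all bits set
        for a, j in enumerate(pattern):
            hits &= bitmaps[a][j]         # AND only the chosen value bitmaps
        K.extend((oid, -neg_score) for p, oid in enumerate(oids) if (hits >> p) & 1)
        # Expansion: shift one attribute to its next-ranked value.
        for a, j in enumerate(pattern):
            if j + 1 < len(weights[a]):
                child = pattern[:a] + (j + 1,) + pattern[a + 1:]
                if child not in visited:
                    visited.add(child)
                    heapq.heappush(heap, (-score(child), child))
    return K[:k]

# Value indices read off the table above (0 = first column of each group).
rows = {"O1": (0, 2, 1), "O2": (1, 1, 0), "O3": (1, 0, 1),
        "O4": (0, 0, 0), "O5": (2, 2, 1)}
print(single_valued_topk(rows, [[3, 2, 1]] * 3, k=2))  # [('O4', 9), ('O3', 7)]
```

As on the slide, the initial pattern 100.100.100 returns O4 first; with these hypothetical weights O2 also scores 7 but is reached after O3.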

Page 10: Supporting personalized ranking over categorical attributes

Bitmap – sparsity

Multi-valued attribute with bounded cardinality

     Attribute1               Attribute2
     v1-1 v1-2 v1-3 v1-4      v2-1 v2-2 v2-3
O1   1    0    0    1         1    0    1
O2   0    1    0    1         1    1    0
O3   0    1    1    0         1    1    0
O4   1    0    0    1         1    0    1
O5   0    0    1    1         0    1    1
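In this table every object holds exactly bc = 2 values per attribute, so each attribute admits only C(n, bc) value sets. The Python sketch below is a brute-force illustration of that bound, not the paper's expansion procedure: it scores every combination of value sets up front and executes them best first. `bounded_topk` and the weights [4, 3, 2, 1] and [3, 2, 1] are hypothetical. Assuming exactly bc values per object, ANDing the chosen bitmaps already yields an exact match.

```python
from itertools import combinations, product
from typing import Dict, List, Set, Tuple

def bounded_topk(rows: Dict[str, Tuple[Set[int], ...]], weights: List[List[int]],
                 bc: int, k: int) -> List[Tuple[str, int]]:
    """rows[oid][a] is the set of exactly bc value indices oid holds in attribute a."""
    oids = list(rows)
    # bitmaps[a][j]: bit p is set iff object oids[p] holds value j of attribute a.
    bitmaps = [[0] * len(w) for w in weights]
    for p, oid in enumerate(oids):
        for a, held in enumerate(rows[oid]):
            for j in held:
                bitmaps[a][j] |= 1 << p

    def score(pattern) -> int:
        return sum(weights[a][j] for a, combo in enumerate(pattern) for j in combo)

    # Bounded cardinality: each attribute has only C(n_a, bc) candidate value sets.
    per_attribute = [list(combinations(range(len(w)), bc)) for w in weights]
    # Brute force: score every pattern up front and visit them best first.
    patterns = sorted(product(*per_attribute), key=score, reverse=True)

    K: List[Tuple[str, int]] = []
    for pattern in patterns:
        hits = -1                         # all bits set
        for a, combo in enumerate(pattern):
            for j in combo:
                hits &= bitmaps[a][j]     # objects holding every chosen value
        K.extend((oid, score(pattern)) for p, oid in enumerate(oids) if (hits >> p) & 1)
        if len(K) >= k:
            break
    return K[:k]

# Value indices read off the table above (0 = v1-1 or v2-1).
rows = {"O1": ({0, 3}, {0, 2}), "O2": ({1, 3}, {0, 1}), "O3": ({1, 2}, {0, 1}),
        "O4": ({0, 3}, {0, 2}), "O5": ({2, 3}, {1, 2})}
print(bounded_topk(rows, [[4, 3, 2, 1], [3, 2, 1]], bc=2, k=1))  # [('O3', 10)]
```

With four values and bc = 2, Attribute1 has only 6 candidate sets instead of 2^4 = 16 bit patterns; the brute-force sort is fine for a toy table, but a real implementation would generate patterns lazily.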

Page 11: Supporting personalized ranking over categorical attributes

Experiments

• Sparsity of indicator variables in the UCI datasets
• 22% of the datasets consist only of categorical attributes.
• 56% of the datasets combine numerical and categorical attributes.

Page 12: Supporting personalized ranking over categorical attributes

Experiments – synthetic data

Page 13: Supporting personalized ranking over categorical attributes

Experiments – real-life data

Page 14: Supporting personalized ranking over categorical attributes

Conclusions

This paper studies how to support rank formulation and processing over data with categorical attributes. Instead of adopting existing algorithms designed for numerical attributes, it develops a bitmap-based approach that addresses:
• Binary encoding
• Sparsity, for single-valued attributes and for multi-valued attributes with bounded cardinality

Page 15: Supporting personalized ranking over categorical attributes

Comments

Advantage …

Drawback …

Application …