supporting personalized ranking over categorical attributes
Post on 22-Feb-2016
Intelligent Database Systems Lab
N.Y.U.S.T.I. M.
Supporting personalized ranking over categorical attributes
Presenter: Lin, Shu-Han
Authors: Gae-won You, Seung-won Hwang, Hwanjo Yu
Information Sciences 178(2008)
Outline
• Motivation
• Objective
• Methodology
• Experiments
• Conclusion
• Comments
Motivation
The problem of categorical attributes in personalized ranking for information retrieval
Categorical attributes have no inherent ordering. How can relevant data be ranked by a categorical attribute?
For example, how can we find an older female who prefers soda drinks?
Name    Age   Gender   FavoriteDrink   Buy
Jane    30    Female   Coke            Coke, Milk
Mary    25    Female   Pepsi           Coke, Pepsi
Tom     21    Male     Water           Milk, Water
Denny   26    Male     Coke            Milk, Juice
Tina    11    Female   Pepsi           Red Wine, Pepsi
Objectives
Enable uniform ranked retrieval over a combination of categorical and numerical attributes.
Support ranking over the binary representation of categorical attributes:
• Binary encoding
• Sparsity
Binary attribute
Name    Female
Jane    1
Mary    1
Tom     0
Denny   0
Tina    1
Single-valued attribute
Name    Coke   Pepsi   Water
Jane    1      0       0
Mary    0      1       0
Tom     0      0       1
Multi-valued attribute with bounded cardinality (item set, bc=2)
Name    Coke   Pepsi   Water
Jane    1      0       1
Mary    1      1       0
Tom     1      0       1
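The binary encoding shown in these tables can be sketched in a few lines of Python. This is only an illustration: the helper name `encode` and the vocabulary lists are assumptions made for the example, not something the slides define.

```python
def encode(values, vocabulary):
    """Return one 0/1 indicator per vocabulary entry.

    `values` is a single string for a single-valued attribute,
    or a list of strings for a multi-valued attribute (item set).
    """
    chosen = {values} if isinstance(values, str) else set(values)
    return [1 if v in chosen else 0 for v in vocabulary]


# Single-valued attribute: exactly one indicator is set.
favorite = encode("Coke", ["Coke", "Pepsi", "Water"])

# Multi-valued attribute: at most bc indicators are set (here bc = 2).
bought = encode(["Coke", "Pepsi"], ["Coke", "Pepsi", "Water"])
```

Either way the encoded vector is sparse: a single-valued attribute sets one bit, and an item set with bounded cardinality sets at most bc bits.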
Overview
[Figure: overview diagram with parts labeled (1), (2), and (3)]
Rank formulation
F = 0.5*age + 3*female + …
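As a rough illustration of such a linear ranking function, the sketch below scores the people from the motivation table. It uses only the two terms that are visible in the formula (0.5*age and 3*female); the elided remainder is left out, and the names `PEOPLE`, `WEIGHTS`, and `score` are assumptions for this example.

```python
# Rows taken from the motivation table: (name, age, gender).
PEOPLE = [
    ("Jane", 30, "Female"),
    ("Mary", 25, "Female"),
    ("Tom", 21, "Male"),
    ("Denny", 26, "Male"),
    ("Tina", 11, "Female"),
]

# Only the two weights shown on the slide; the "…" terms are not modeled.
WEIGHTS = {"age": 0.5, "female": 3.0}


def score(age, gender):
    """Linear score over one numerical and one binary-encoded attribute."""
    female = 1 if gender == "Female" else 0
    return WEIGHTS["age"] * age + WEIGHTS["female"] * female


# Sort by descending score to obtain the ranking.
ranking = sorted(
    ((name, score(age, gender)) for name, age, gender in PEOPLE),
    key=lambda pair: pair[1],
    reverse=True,
)
```

With these two terms, Jane scores 0.5*30 + 3*1 = 18 and ranks first.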
Rank processing (TA)
A simple example query: find an older female who prefers soda drinks.
Transform into
F = age + female
1. Candidate identification
   1. Perform sorted access on age and female
   2. Find the top-k entries of sa(age) and sa(female), e.g., k=1: sa(age)={O1}; sa(female)={O2}
2. Candidate reduction
   1. O1 = 30 + 0
   2. O2 = 25 + 1
   3. O1 has the highest F score
3. Termination
   1. The score of O1 (30) is not greater than the upper-bound score F(30,1) = 31, so processing cannot stop yet
   2. Run another round of sorted access to consider more candidates, e.g., sa(age)={O4}; sa(female)={O3}
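The three steps above can be sketched as a small threshold-algorithm loop. This is a minimal sketch under stated assumptions, not the paper's implementation: the values for O1 and O2 come from the slide, while the values for O3 and O4 are hypothetical and chosen only so that the sorted-access order matches the example. The stop test uses the usual "at least the threshold" condition.

```python
import heapq

# O1 and O2 follow the slide; O3 and O4 are hypothetical filler values.
DATA = {
    "O1": {"age": 30, "female": 0},
    "O2": {"age": 25, "female": 1},
    "O3": {"age": 20, "female": 1},
    "O4": {"age": 28, "female": 0},
}


def threshold_algorithm(objects, attrs, k):
    """Top-k for F = sum of attrs, via sorted access plus random access."""
    # One list per attribute, sorted by descending value (ties by id).
    lists = {
        a: sorted(objects, key=lambda o: (-objects[o][a], o)) for a in attrs
    }
    seen = {}  # object id -> full score F
    rounds = 0
    for depth in range(len(objects)):
        rounds += 1
        last_seen = []
        for a in attrs:
            oid = lists[a][depth]                  # sorted access
            last_seen.append(objects[oid][a])
            seen[oid] = sum(objects[oid][b] for b in attrs)  # random access
        threshold = sum(last_seen)                 # upper bound for unseen objects
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, rounds
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1]), rounds


top, rounds = threshold_algorithm(DATA, ["age", "female"], k=1)
```

In the first round the threshold is 30 + 1 = 31, which O1's score of 30 does not reach, so a second round is needed; with the hypothetical values the threshold then drops to 28 + 1 = 29 and O1 is returned.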
Bitmap – binary encoding
F = v1 + v2 + v3 + v4, k = 2
1) K = {}, C = {1111} (Initialization)
2) OID = execute(C)
3) OID = {O4}, |OID| > 0, K = {[O4, 4]}
4) C = {0111 / 1011 / 1101 / 1110} (Expansion)
5) K.count < k, back to 2)
6) …
     v1   v2   v3   v4
O1   1    0    1    1
O2   0    1    0    0
O3   0    1    1    1
O4   1    1    1    1
O5   1    0    1    1
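The loop above can be sketched as follows, using the five objects from the table. Two things are assumptions of this sketch rather than statements from the slides: `execute` is taken to return the objects whose bits match the candidate pattern exactly, and ties at the same score are broken by enumeration order. Python integers stand in for the bitmaps.

```python
from collections import deque

# Bit strings v1..v4 for each object, copied from the table.
ROWS = {"O1": "1011", "O2": "0100", "O3": "0111", "O4": "1111", "O5": "1011"}


def build_bitmaps(rows):
    """One integer bitmap per attribute; bit i is set if object i has a 1."""
    ids = list(rows)
    bitmaps = [0] * len(next(iter(rows.values())))
    for pos, oid in enumerate(ids):
        for j, bit in enumerate(rows[oid]):
            if bit == "1":
                bitmaps[j] |= 1 << pos
    return ids, bitmaps


def execute(pattern, ids, bitmaps):
    """AND the bitmap (for a 1) or its complement (for a 0) of every attribute."""
    full = (1 << len(ids)) - 1
    acc = full
    for j, bit in enumerate(pattern):
        acc &= bitmaps[j] if bit == "1" else (~bitmaps[j] & full)
    return [oid for pos, oid in enumerate(ids) if (acc >> pos) & 1]


def top_k(rows, k):
    ids, bitmaps = build_bitmaps(rows)
    start = "1" * len(bitmaps)              # initialization: C = {1111}
    queue, visited, result = deque([start]), {start}, []
    while queue and len(result) < k:
        pattern = queue.popleft()
        score = pattern.count("1")          # F with unit weights
        result.extend((oid, score) for oid in execute(pattern, ids, bitmaps))
        for j, bit in enumerate(pattern):   # expansion: flip one 1 to 0
            if bit == "1":
                child = pattern[:j] + "0" + pattern[j + 1:]
                if child not in visited:
                    visited.add(child)
                    queue.append(child)
    return result[:k]


best = top_k(ROWS, 2)
```

Because patterns are visited level by level, scores come out in non-increasing order: 1111 yields O4 with score 4, and 0111 then yields O3 with score 3.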
Bitmap – sparsity
Single-valued attribute
F = w1v1 + w2v2 + … + w6v6
Ranked weights: w1 ≧ w2 ≧ w3; w4 ≧ w5 ≧ w6
For simplicity, all w = 1, k = 2
1) K = {}, C = {100.100.100} (Initialization)
2) OID = execute(C)
3) OID = {O4}, |OID| > 0, K = OID = {[O4, 2]}
4) C = {010.100.100 / 100.010.100 / 100.100.010} (Expansion)
5) K.count < k, back to 2)
6) …
     Attribute1     Attribute2     Attribute2
     v1   v2   v3   v4   v5   v6   v4   v5   v6
O1   1    0    0    0    0    1    0    1    0
O2   0    1    0    0    1    0    1    0    0
O3   0    1    0    1    0    0    0    1    0
O4   1    0    0    1    0    0    1    0    0
O5   0    0    1    0    0    1    0    1    0
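A sketch of how sparsity narrows the search for single-valued attributes: since each object sets exactly one value per attribute, a candidate only needs one chosen value per attribute instead of an arbitrary bit pattern. The sketch assumes just the six columns v1 to v6 that appear in F (two attributes with three values each), which reproduces the score of 2 for O4; it uses Python sets in place of bitmaps, and the best-first queue and all names are illustrative choices.

```python
import heapq
from itertools import count

# Index (0-based) of the value each object takes in Attribute1 (v1..v3)
# and Attribute2 (v4..v6), read from the first six table columns.
OBJECTS = {"O1": (0, 2), "O2": (1, 1), "O3": (1, 0), "O4": (0, 0), "O5": (2, 2)}
# Weights per attribute, sorted in descending order; all 1 as on the slide.
WEIGHTS = [[1, 1, 1], [1, 1, 1]]


def top_k_single_valued(objects, weights, k):
    # Posting set per (attribute, value); plays the role of a bitmap.
    index = {}
    for oid, values in objects.items():
        for a, v in enumerate(values):
            index.setdefault((a, v), set()).add(oid)

    def score(state):
        return sum(weights[a][v] for a, v in enumerate(state))

    start = (0,) * len(weights)        # top-weighted value of every attribute
    tick = count()                     # first-in, first-out tie-breaker
    heap = [(-score(start), next(tick), start)]
    visited, result = {start}, []
    while heap and len(result) < k:
        neg_score, _, state = heapq.heappop(heap)
        matches = set.intersection(
            *(index.get((a, v), set()) for a, v in enumerate(state))
        )
        result.extend((oid, -neg_score) for oid in sorted(matches))
        for a in range(len(state)):    # expansion: next-ranked value of one attribute
            if state[a] + 1 < len(weights[a]):
                child = state[:a] + (state[a] + 1,) + state[a + 1:]
                if child not in visited:
                    visited.add(child)
                    heapq.heappush(heap, (-score(child), next(tick), child))
    return result[:k]


best = top_k_single_valued(OBJECTS, WEIGHTS, 2)
```

Under this assumption there are only 3 × 3 = 9 candidates to consider, compared with 2^6 = 64 unrestricted bit patterns over the same six columns.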
Bitmap – sparsity
Multi-valued attribute with bounded cardinality
     Attribute1                  Attribute2
     v1-1   v1-2   v1-3   v1-4   v2-1   v2-2   v2-3
O1   1      0      0      1      1      0      1
O2   0      1      0      1      1      1      0
O3   0      1      1      0      1      1      0
O4   1      0      0      1      1      0      1
O5   0      0      1      1      0      1      1
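To illustrate how bounded cardinality limits the candidate patterns, the sketch below enumerates every pattern that sets exactly bc values per attribute. The assumption that each attribute carries exactly bc = 2 values is read off the table rows; the function name and the dotted pattern notation (borrowed from the single-valued slide) are illustrative only.

```python
from itertools import combinations, product

# Dotted bit patterns (Attribute1.Attribute2) copied from the table.
ROWS = {
    "O1": "1001.101",
    "O2": "0101.110",
    "O3": "0110.110",
    "O4": "1001.101",
    "O5": "0011.011",
}


def candidate_patterns(sizes, bc):
    """Yield every pattern with exactly bc ones in each attribute."""
    per_attribute = []
    for n in sizes:
        per_attribute.append([
            "".join("1" if i in chosen else "0" for i in range(n))
            for chosen in combinations(range(n), bc)
        ])
    for combo in product(*per_attribute):
        yield ".".join(combo)


# Attribute1 has 4 values, Attribute2 has 3 values, bc = 2.
patterns = list(candidate_patterns([4, 3], bc=2))
```

This gives C(4,2) × C(3,2) = 18 candidates rather than 2^7 = 128 unrestricted patterns, and every row of the table is among them.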
Experiments
• Sparsity of indicator variables in the UCI datasets
• 22% of the datasets consist only of categorical attributes.
• 56% combine numerical and categorical attributes.
Experiments – synthetic data
Experiments – real-life data
Conclusions
This paper studies
• How to support rank formulation and processing over data with categorical attributes
• Instead of adopting existing numerical algorithms, it develops a bitmap-based approach that exploits
  • Binary encoding
  • Sparsity
    • Single-valued
    • Multi-valued with bounded cardinality
Comments
Advantage …
Drawback …
Application …