a network-aware approach for searching as-you-type in social media
Paul Lagrée, Bogdan Cautis, Hossein Vahabi
November 6, 2015
Université Paris-Sud
Social interactions in the Web
Social web: new development of the Web - user relationships, their data.
Significant portion of the Web:
∙ explicitly social (Facebook, Twitter, Google+)
∙ implicitly social, built with content (blogs, forums)
User-centric: consumers are generators and evaluators of content.
Motivation – As-You-Type
∙ Social-aware method (network-based)
∙ As-you-type search, handling prefixes
∙ Real-time approach (200 ms maximum)
∙ Incremental computation: exploits what has already been computed
∙ Top-k
Social tagging context
Collaborative tagging networks: general abstraction of social media.
Model used in the following work:
∙ users form a social network, represented as a weighted graph; weights may reflect similarity, friendship, etc.
∙ users tag items with terms. Items may be documents, videos, photos, or URLs from a public pool.
Users search for items having tags matching the query
Example
∙ Triples (user, item, tag)
∙ Weighted similarity network between users (reflects proximity, friendship, or similarity, and can be computed using tags, item-tags, or social links). Each edge weight σ(u, v) ∈ (0, 1]
Score Model
For a given tag t and seeker s, the score is
score(item|s, t) = α× textual(t, item) + (1− α)× social(item|s, t)
where α ∈ [0, 1] balances textual relevance against social relevance: the lower α, the more social the answer.
∙ α = 1: we come back to classical web search
∙ α = 0: exclusively social search
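The linear combination above can be sketched in a few lines of Python. This is only an illustration of the formula: the function name `combined_score` and the idea of passing precomputed `textual` and `social` values are assumptions, not part of the talk.

```python
def combined_score(textual: float, social: float, alpha: float) -> float:
    """Blend a textual and a social score as on the slide.

    alpha = 1 keeps only the textual part (classical search),
    alpha = 0 keeps only the social part.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * textual + (1.0 - alpha) * social


# Hypothetical scores for one item, one seeker and one tag.
print(combined_score(textual=2.0, social=4.0, alpha=0.5))
```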
Social score
The social score is defined as:
social(item | s, t) = ∑_{v : v tagged item with tag t} σ+(s, v)
σ+(s, v) is the extended proximity between s and v, computed for example by path multiplication or path maximum.
∙ social(item | s, prefix) = max_{t ∈ completions} social(item | s, t)
∙ textual(item | s, prefix) = max_{t ∈ completions} textual(item | s, t)
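A minimal sketch of these two definitions, assuming the extended proximities σ+(s, v) for a fixed seeker are already available as a dictionary. The names `social_score`, `social_score_prefix`, `triples` and `proximity`, and the toy data, are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (user, item, tag)


def social_score(item: str, tag: str, triples: List[Triple],
                 proximity: Dict[str, float]) -> float:
    """Sum the seeker's proximity to every user who tagged `item` with `tag`."""
    taggers = {u for (u, i, t) in triples if i == item and t == tag}
    return sum(proximity.get(u, 0.0) for u in taggers)


def social_score_prefix(item: str, prefix: str, triples: List[Triple],
                        proximity: Dict[str, float]) -> float:
    """Maximum social score over the tags of `item` that complete `prefix`."""
    completions = {t for (_, i, t) in triples if i == item and t.startswith(prefix)}
    return max((social_score(item, t, triples, proximity) for t in completions),
               default=0.0)


# Toy data: proximities are sigma+(seeker, user) for one fixed seeker.
triples = [("bob", "i1", "style"), ("carol", "i1", "style"), ("carol", "i1", "street")]
proximity = {"bob": 0.5, "carol": 0.25}
print(social_score_prefix("i1", "st", triples, proximity))
```

Scanning all triples per call is only acceptable for illustration; an index per (item, tag) would replace it in practice.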
Completion trie index
[Figure: completion trie over the tags, each node annotated with a bracketed score; leaf nodes point to inverted lists of (item, score) pairs, e.g. IL(hipster).]
Virtual IL(st): (i2, street, 4), (i4, style, 3), (i1, stylish, 2), (i5, stylish, 1), (i6, style, 1)
∙ Leaf nodes in the trie correspond to concrete inverted lists
∙ Internal nodes match a keyword prefix and represent a "virtual list".
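The idea of concrete and virtual lists can be illustrated with a small trie. This sketch makes several assumptions that are not in the talk: it uses one character per edge rather than compressed nodes, stores in each node the best score of its subtree, and materializes the virtual list eagerly instead of merging lazily. The class and method names are hypothetical; the postings reuse the virtual IL(st) entries from the slide.

```python
from typing import Dict, List, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.max_score = 0                          # best score in this subtree
        self.postings: List[Tuple[str, int]] = []   # concrete list, set where a tag ends


class CompletionTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, tag: str, postings: List[Tuple[str, int]]) -> None:
        """Attach an inverted list of (item, score) pairs to `tag`."""
        best = max((score for _, score in postings), default=0)
        node = self.root
        node.max_score = max(node.max_score, best)
        for ch in tag:
            node = node.children.setdefault(ch, TrieNode())
            node.max_score = max(node.max_score, best)
        node.postings = list(postings)

    def virtual_list(self, prefix: str) -> List[Tuple[str, str, int]]:
        """Merge the concrete lists of all completions of `prefix`, best score first."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        merged = []
        stack = [(prefix, node)]
        while stack:
            word, current = stack.pop()
            merged.extend((item, word, score) for item, score in current.postings)
            stack.extend((word + ch, child) for ch, child in current.children.items())
        # Ties are broken by item identifier to keep the order deterministic.
        return sorted(merged, key=lambda e: (-e[2], e[0]))


trie = CompletionTrie()
trie.insert("street", [("i2", 4)])
trie.insert("style", [("i4", 3), ("i6", 1)])
trie.insert("stylish", [("i1", 2), ("i5", 1)])
print(trie.virtual_list("st"))
```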
TOPKS-ASYT (α = 0) – General flow (1/2)
Input: seeker s, query Q = (t1, .., tr), completion trie, graph with p-spaces
1. Candidate document list D = ∅, ordered by minimal score
   ∙ as in NRA (No Random Access), each candidate in D has a minimal and a maximum score
   ∙ keep also a maximum score for unseen documents
2. Seeker s is added to priority queue
TOPKS-ASYT (α = 0) – General flow (2/2)
3. While there exists a user in the priority queue
   ∙ Get the next closest user u from the priority queue
   ∙ Refine proximity scores for neighbours of u
   ∙ Get documents d tagged by u with any term from (t1, ..., tr−1) or a prefix of tr (p-space exploration)
   ∙ Compute the score bounds of each d and insert (or update) it in D
   ∙ Advance in any IL from the completion trie whose head is a document in D
   ∙ Termination condition (next slide)
4. Return the top-k items.
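The user-exploration part of this flow can be sketched as a best-first traversal of the social graph. This is a heavily simplified illustration, not the TOPKS-ASYT algorithm itself: it assumes path multiplication as the extended proximity, handles a single-term prefix query with α = 0, and leaves out the completion trie, the score bounds and early termination. It also assumes the seeker's own tags count with proximity 1. All names (`social_topk`, `graph`, `tagged`) and the toy data are hypothetical.

```python
import heapq
from collections import defaultdict


def social_topk(seeker, prefix, graph, tagged, k):
    """graph: user -> {neighbour: sigma in (0, 1]}; tagged: user -> [(item, tag)]."""
    best = {seeker: 1.0}          # best known proximity per user
    heap = [(-1.0, seeker)]       # max-heap on proximity via negated keys
    visited = set()
    tag_scores = defaultdict(float)   # (item, tag) -> accumulated social score
    while heap:
        neg_prox, user = heapq.heappop(heap)
        if user in visited:
            continue              # stale entry, a closer path was found earlier
        visited.add(user)
        prox = -neg_prox
        # Refine proximity of neighbours (products of weights <= 1 never grow).
        for neighbour, sigma in graph.get(user, {}).items():
            candidate = prox * sigma
            if candidate > best.get(neighbour, 0.0):
                best[neighbour] = candidate
                heapq.heappush(heap, (-candidate, neighbour))
        # Collect this user's items whose tag completes the prefix.
        for item, tag in tagged.get(user, []):
            if tag.startswith(prefix):
                tag_scores[(item, tag)] += prox
    # An item's prefix score is its best score over all completions.
    item_scores = {}
    for (item, _tag), score in tag_scores.items():
        item_scores[item] = max(item_scores.get(item, 0.0), score)
    ranked = sorted(item_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


graph = {
    "s": {"a": 0.5, "b": 0.125},
    "a": {"s": 0.5, "b": 0.5},
    "b": {"s": 0.125, "a": 0.5},
}
tagged = {"a": [("i1", "style")], "b": [("i2", "street"), ("i1", "style")]}
print(social_topk("s", "st", graph, tagged, k=2))
```

In the toy graph, user b is reached with proximity 0.25 through a rather than 0.125 directly, which is the effect of refining neighbour proximities while exploring.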
Termination condition
Using bounds on item scores, we try to terminate the algorithm early
∙ Each item in the buffer has
   ∙ a score lower bound: assuming that the score computed so far is the final one
   ∙ a score upper bound: MaxScore(i | s, t) = max_proximity × unseen_users(i, t) + current_score(i | s, t)
∙ If the upper bound of the (k+1)-th item is lower than the lower bound of the k-th item: termination.
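A small sketch of this check, under stated assumptions: candidates are kept as a dictionary mapping each item to its current score and its number of unseen taggers, every candidate outside the current top-k is tested (a conservative reading of the (k+1)-th item on the slide), and `unseen_bound` stands for the maximum score of documents not yet seen. The function and parameter names are hypothetical.

```python
from typing import Dict, Tuple


def upper_bound(current_score: float, max_proximity: float, unseen_users: int) -> float:
    """MaxScore: every unseen tagger is assumed to be as close as possible."""
    return max_proximity * unseen_users + current_score


def can_terminate(candidates: Dict[str, Tuple[float, int]], k: int,
                  max_proximity: float, unseen_bound: float = 0.0) -> bool:
    """candidates: item -> (current_score, unseen_users)."""
    if len(candidates) < k:
        return False
    # Lower bound of an item = its current score; rank by it.
    ranked = sorted(candidates.items(), key=lambda kv: -kv[1][0])
    kth_lower = ranked[k - 1][1][0]
    rest_upper = [upper_bound(score, max_proximity, unseen)
                  for _, (score, unseen) in ranked[k:]]
    best_outside = max(rest_upper + [unseen_bound])
    return best_outside < kth_lower


candidates = {"i1": (0.75, 0), "i2": (0.5, 1), "i3": (0.25, 1)}
print(can_terminate(candidates, k=2, max_proximity=0.125))
print(can_terminate(candidates, k=2, max_proximity=0.5))
```

As the exploration moves away from the seeker, `max_proximity` shrinks, so the upper bounds tighten and the condition eventually holds.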
Experimental Framework
Three datasets: Tumblr, Yelp and Twitter
                                 Yelp      Twitter   Tumblr
Users                            29,293    458,117   612,425
Items                            18,149    1.6M      1.4M
Tags                             177,286   550,157   2.3M
Number of triples                30.3M     13.9M     11.3M
Average number of tags per item  686       8.4       7.9
Average tag length               6.5       13.1      13.0
Experimental Framework
Experiments:
∙ given one triple (u, i, t), do the search with keyword t and user u as the seeker
∙ metric: ranking of i in the result
∙ precision (test dataset D of N triples)
P@k = #{triple ∈ D | ranking < k} / N
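The precision metric can be computed as follows. This sketch keeps the strict inequality from the slide and therefore assumes 0-based ranks, with `None` marking a test triple whose item was not returned at all; both conventions, and the name `precision_at_k`, are assumptions.

```python
from typing import List, Optional


def precision_at_k(ranks: List[Optional[int]], k: int) -> float:
    """Fraction of test triples whose held-out item is ranked below k (0-based)."""
    if not ranks:
        raise ValueError("the test set must contain at least one triple")
    hits = sum(1 for r in ranks if r is not None and r < k)
    return hits / len(ranks)


# Hypothetical ranks obtained for four test triples.
print(precision_at_k([0, 3, 7, None], k=5))
```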
Tumblr α impact
[Figure: P@5 versus l (1 to 8) on Tumblr, one curve per α ∈ {0, 0.01, 0.1, 0.4, 1}; three panels: Tumblr item-tag, Tumblr tag, Tumblr social network.]
Yelp α impact
[Figure: P@5 versus l (1 to 8) on Yelp, one curve per α ∈ {0, 0.01, 0.1, 0.4, 1}; three panels: Yelp item-tag, Yelp tag, Yelp social network.]
Social similarity – Tumblr number of visited users
[Plot: P@5 versus l (0 to 12) on the Tumblr social network, one curve per ν ∈ {1, 5, 20, 100, ∞}.]
Figure: Impact of visited users
Efficiency – NDCG vs infinite answer
[Plot: NDCG@20 versus t (ms) on the Yelp social network, one curve per l ∈ {2, 4, 6}.]
Figure: l impact
[Plot: NDCG@20 versus t on the Yelp social network, one curve per α ∈ {0.0, 0.5, 1.0}.]
Figure: α impact
Efficiency – Comparison
[Plot: NDCG@20 versus l (3 to 6) on the Yelp social network (α = 0.0), curves: TOPKS-ASYT, Baseline.]
Figure: TOPKS-ASYT vs baseline
[Plot: time to exact top-k versus l (3 to 6) on the Yelp social network, curves: Incremental, Not-Incremental.]
Figure: Incremental vs Non-Incremental
Efficiency – Scaling
Figure: Impact of the size of the dataset (100% -> 35 million triples)
Thank you.
Questions?