A Network-Aware Approach for Searching As-You-Type in Social Media

Paul Lagrée, Bogdan Cautis, Hossein Vahabi. November 6, 2015. Université Paris-Sud



Social interactions in the Web

Social web: a new development of the Web, built on user relationships and their data.

A significant portion of the Web is:

∙ explicitly social (Facebook, Twitter, Google+)
∙ implicitly social, built around content (blogs, forums)

User-centric: consumers are also generators and evaluators of content.


Motivation – As-You-Type

∙ Social-aware method (network-based)

∙ As-you-type search, handling prefixes

∙ Real-time approach (200 ms maximum)

∙ Incremental computation: exploits what has already been computed

∙ Top-k


Social tagging context

Collaborative tagging networks: general abstraction of social media.

Model used in the following work:

∙ users form a social network, represented as a weighted graph; weights may reflect similarity, friendship, etc.

∙ users tag items with terms. Items may be documents, videos, photos or URLs from a public pool.

Users search for items whose tags match the query.


Example

∙ Triples (user, item, tag)
∙ Weighted similarity network between users (reflects proximity, friendship or similarity, and can be computed using tags, item-tags or social links). Each edge has a weight σ(u, v) ∈ ]0, 1].


Score Model

For a given tag t and seeker s, the score of an item is

score(item | s, t) = α × textual(t, item) + (1 − α) × social(item | s, t)

where α ∈ [0, 1] sets the balance between textual and social relevance (the lower α, the more social the answer).

∙ α = 1: we fall back to classical web search
∙ α = 0: exclusively social search
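
A minimal sketch of the blended score, to make the role of α concrete. The function name and the two component values are hypothetical; the slides do not say how textual(t, item) is computed, so it is taken here as a given number.

```python
def blended_score(textual: float, social: float, alpha: float) -> float:
    """Combine textual and social relevance with a weight alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * textual + (1.0 - alpha) * social


# Hypothetical component scores for a single item.
textual, social = 0.8, 0.2
print(blended_score(textual, social, alpha=1.0))  # textual part only
print(blended_score(textual, social, alpha=0.0))  # social part only
print(blended_score(textual, social, alpha=0.5))  # even mix of both
```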


Social score

The social score is defined as:

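social(item | s, t) = ∑_{v : v tagged item with t} σ+(s, v)

σ+(s, v) is the extended proximity between s and v, computed over paths in the network, for example by path multiplication or path maximum.

For a prefix, the score is the maximum over its completions:

∙ social(item | s, prefix) = max_{t ∈ completions(prefix)} social(item | s, t)
∙ textual(item | s, prefix) = max_{t ∈ completions(prefix)} textual(item | s, t)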
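
The two definitions above can be sketched as follows. This is an illustration on hypothetical toy data: the triples, user names and the `PROXIMITY` values (standing in for precomputed σ+(s, v)) are all made up, and completions are found by a plain scan rather than through an index.

```python
# Hypothetical toy data: (user, item, tag) triples.
TRIPLES = [
    ("bob", "i1", "style"),
    ("carol", "i1", "style"),
    ("carol", "i1", "stylish"),
    ("dave", "i2", "street"),
    ("bob", "i2", "stylish"),
]

# Hypothetical extended proximities sigma+(seeker, v), assumed precomputed.
PROXIMITY = {"bob": 0.9, "carol": 0.5, "dave": 0.1}


def social_score(item, tag, proximity, triples):
    """Sum sigma+(s, v) over the users v who tagged `item` with `tag`."""
    taggers = {u for (u, i, t) in triples if i == item and t == tag}
    return sum(proximity.get(u, 0.0) for u in taggers)


def social_prefix_score(item, prefix, proximity, triples):
    """Maximum social score over all tags that complete `prefix`."""
    completions = {t for (_, _, t) in triples if t.startswith(prefix)}
    return max(
        (social_score(item, t, proximity, triples) for t in completions),
        default=0.0,
    )


print(social_score("i1", "style", PROXIMITY, TRIPLES))
print(social_prefix_score("i2", "st", PROXIMITY, TRIPLES))
```

For item i2 and prefix "st", the completions are street, style and stylish; the best of the three per-tag scores is the one for stylish, contributed by bob.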


Completion trie index

[Figure: completion trie over the tags. Nodes are labelled with a bracketed number and a string fragment (e.g. [4] st, [3] y, [2] lish, [3] le, [4] reet); leaves point to inverted lists of (item, score) pairs, e.g. IL(hipster).]

Virtual IL(st): (i2, street, 4), (i4, style, 3), (i1, stylish, 2), (i5, stylish, 1), (i6, style, 1)

∙ Leaf nodes in the trie correspond to concrete inverted lists.

∙ Internal nodes match a keyword prefix and represent a "virtual list".
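
A rough sketch of the idea, under simplifying assumptions: the trie below is uncompressed (one character per node, unlike the fragment-labelled nodes of the figure), each node keeps the best score found below it, and the virtual list is materialised eagerly instead of being read lazily. Class and method names are hypothetical. The three inverted lists are the ones implied by the virtual IL(st) entries above.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.postings = []   # (item, score) pairs if a tag ends here
        self.max_score = 0   # best score found anywhere below this node


class CompletionTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, tag, postings):
        """Store the inverted list of `tag`, updating max scores on the path."""
        best = max(score for _, score in postings)
        node = self.root
        node.max_score = max(node.max_score, best)
        for ch in tag:
            node = node.children.setdefault(ch, TrieNode())
            node.max_score = max(node.max_score, best)
        node.postings = sorted(postings, key=lambda p: (-p[1], p[0]))

    def find(self, prefix):
        """Return the node reached by `prefix`, or None if no tag matches."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def virtual_list(self, prefix):
        """Merge the inverted lists of all completions, best score first."""
        start = self.find(prefix)
        if start is None:
            return []
        merged = []
        stack = [(prefix, start)]
        while stack:
            tag, node = stack.pop()
            merged.extend((item, tag, score) for item, score in node.postings)
            stack.extend((tag + ch, child) for ch, child in node.children.items())
        # Ties on score are broken by item identifier to keep the order stable.
        return sorted(merged, key=lambda e: (-e[2], e[0]))


trie = CompletionTrie()
trie.insert("street", [("i2", 4)])
trie.insert("style", [("i4", 3), ("i6", 1)])
trie.insert("stylish", [("i1", 2), ("i5", 1)])
print(trie.virtual_list("st"))
# [('i2', 'street', 4), ('i4', 'style', 3), ('i1', 'stylish', 2),
#  ('i5', 'stylish', 1), ('i6', 'style', 1)]
```

Keeping a maximal score on every node is what lets a search decide which branch to read next without opening every inverted list under a prefix.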


TOPKS-ASYT (α = 0) – General flow (1/2)

Input: seeker s, query Q = (t1, ..., tr), completion trie, graph with p-spaces

1. Candidate document list D = ∅, ordered by minimal score
   ∙ as in NRA (No Random Access), each candidate in D has a minimal and a maximal score
   ∙ also keep a maximal score for unseen documents

2. The seeker s is added to the priority queue


TOPKS-ASYT (α = 0) – General flow (2/2)

3. While there exists a user in the priority queue
   ∙ Get the next closest user u from the priority queue
   ∙ Refine proximity scores for the neighbours of u
   ∙ Get the documents d tagged by u with any term from (t1, ..., tr−1) or with any completion of the prefix tr (p-space exploration)
   ∙ Compute the score bounds of each d and insert (or update) it in D
   ∙ Advance in any IL from the completion trie whose head is a document in D
   ∙ Check the termination condition (next slide)

4. Return the top-k items.
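
The user-by-user exploration of step 3 can be sketched as below. This is a heavily simplified illustration, not the TOPKS-ASYT algorithm itself: it handles a single complete tag, assumes path multiplication for σ+, ignores the completion trie and p-spaces, and visits every reachable user instead of stopping early. The graph, users and tagging data are hypothetical.

```python
import heapq


def users_by_proximity(graph, seeker):
    """Yield (user, sigma_plus) in decreasing order of extended proximity.

    Assumes path multiplication: a path is worth the product of its edge
    weights, and sigma+ is the best product over all paths. Because weights
    lie in ]0, 1], a Dijkstra-style greedy traversal is sufficient.
    """
    best = {seeker: 1.0}
    heap = [(-1.0, seeker)]      # max-heap through negated proximities
    visited = set()
    while heap:
        neg_prox, u = heapq.heappop(heap)
        if u in visited:
            continue             # stale entry, a better path was found
        visited.add(u)
        prox_u = -neg_prox
        yield u, prox_u
        for v, weight in graph.get(u, {}).items():
            prox_v = prox_u * weight
            if v not in visited and prox_v > best.get(v, 0.0):
                best[v] = prox_v  # refine the proximity of neighbour v
                heapq.heappush(heap, (-prox_v, v))


def social_scores(graph, seeker, tagged, tag):
    """Accumulate social(item | seeker, tag) over all reachable users."""
    scores = {}
    for user, prox in users_by_proximity(graph, seeker):
        for item in tagged.get((user, tag), ()):
            scores[item] = scores.get(item, 0.0) + prox
    return scores


# Hypothetical weighted network and tagging data.
graph = {
    "alice": {"bob": 0.9, "carol": 0.5},
    "bob": {"alice": 0.9, "carol": 0.8},
    "carol": {"alice": 0.5, "bob": 0.8, "dave": 0.5},
    "dave": {"carol": 0.5},
}
tagged = {
    ("bob", "style"): ["i1"],
    ("carol", "style"): ["i1", "i2"],
    ("dave", "style"): ["i2"],
}
print([u for u, _ in users_by_proximity(graph, "alice")])
print(social_scores(graph, "alice", tagged, "style"))
```

Here carol is reached through bob (0.9 × 0.8) rather than through her direct but weaker edge to alice (0.5), which is the "refine proximity scores" step in miniature.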


Termination condition

Using bounds on item scores, we try to terminate the algorithm early.

∙ Each item in the buffer has
   ∙ a score lower bound: assuming that the score computed so far is the final one
   ∙ a score upper bound: MaxScore(i | s, t) = max_proximity × unseen_users(i, t) + current_score(i | s, t)

∙ If the upper bound of the (k+1)-th item is lower than the lower bound of the k-th item: termination.
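
A small sketch of this test, under stated assumptions. `max_proximity` is read here as an upper bound on the proximity of any user not yet visited, and the check compares the k-th lower bound against every candidate outside the current top-k (plus an optional bound for unseen documents), which is a conservative reading of the (k+1)-th item condition. Function names and the sample numbers are hypothetical.

```python
def score_bounds(current_score, unseen_users, max_proximity):
    """Return (lower, upper) bounds on an item's final social score."""
    lower = current_score
    upper = current_score + max_proximity * unseen_users
    return lower, upper


def can_terminate(candidates, k, max_proximity, unseen_upper=0.0):
    """Decide whether the current top-k can no longer change.

    `candidates` maps item -> (current_score, unseen_users);
    `unseen_upper` bounds the score of documents not yet in the buffer.
    """
    bounds = [
        score_bounds(score, unseen, max_proximity)
        for score, unseen in candidates.values()
    ]
    # Rank by lower bound, best first, as in the candidate list D.
    ranked = sorted(bounds, key=lambda b: b[0], reverse=True)
    if len(ranked) < k:
        return False
    kth_lower = ranked[k - 1][0]
    best_outside = max([upper for _, upper in ranked[k:]] + [unseen_upper])
    return best_outside < kth_lower


# Hypothetical buffer: item -> (score so far, taggers not yet visited).
buffer = {"i1": (1.62, 0), "i2": (0.72, 1), "i3": (0.2, 1)}
print(can_terminate(buffer, k=2, max_proximity=0.36))  # True
print(can_terminate(buffer, k=2, max_proximity=0.9))   # False
```

With max_proximity = 0.36, i3 can reach at most 0.56, below the 0.72 already secured by i2, so the top-2 is settled; with 0.9 it could still reach 1.1 and overtake i2.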


Experimental Framework

Three datasets: Tumblr, Yelp and Twitter

                                  Yelp      Twitter   Tumblr
Users                             29,293    458,117   612,425
Items                             18,149    1.6M      1.4M
Tags                              177,286   550,157   2.3M
Number of triples                 30.3M     13.9M     11.3M
Average number of tags per item   686       8.4       7.9
Average tag length                6.5       13.1      13.0


Experimental Framework

Experiments:

∙ given one triple (u, i, t), run the search with keyword t and user u as the seeker
∙ metric: rank of i in the result
∙ precision (test dataset D of N triples):

P@k = #{triple ∈ D | ranking < k} / N
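
The precision metric can be computed as in this sketch. It assumes 0-based ranks, so that "ranking < k" means the item appears among the first k results; an item that is not returned at all is represented by None. The function name and the sample ranks are hypothetical.

```python
def precision_at_k(rankings, k):
    """Fraction of test triples whose item is ranked within the first k results.

    `rankings` holds one 0-based rank per test triple, or None when the
    item was not returned at all.
    """
    if not rankings:
        raise ValueError("need at least one test triple")
    hits = sum(1 for r in rankings if r is not None and r < k)
    return hits / len(rankings)


# Hypothetical ranks obtained for five test triples.
ranks = [0, 3, 7, None, 4]
print(precision_at_k(ranks, k=5))  # 0.6
```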


Tumblr α impact

[Figure: three panels (Tumblr item-tag, Tumblr tag, Tumblr social network) plotting P@5 against l (1 to 8) for α ∈ {0, 0.01, 0.1, 0.4, 1}.]


Yelp α impact

[Figure: three panels (Yelp item-tag, Yelp tag, Yelp social network) plotting P@5 against l (1 to 8) for α ∈ {0, 0.01, 0.1, 0.4, 1}.]


Social similarity – Tumblr number of visited users

[Figure: Impact of visited users. Tumblr social network; P@5 against l (0 to 12) for ν ∈ {1, 5, 20, 100}.]


Efficiency – NDCG vs infinite answer

[Figure: l impact. Yelp social network; NDCG@20 against t (ms), 0 to 40, for l ∈ {2, 4, 6}.]

[Figure: α impact. Yelp social network; NDCG@20 against t, 0 to 40, for α ∈ {0.0, 0.5, 1.0}.]


Efficiency – Comparison

[Figure: TOPKS-ASYT vs baseline. Yelp social network (α = 0.0); NDCG@20 against l (3 to 6).]

[Figure: Incremental vs Non-Incremental. Yelp social network; time for the exact top-k against l (3 to 6).]


Efficiency – Scaling

Figure: Impact of the size of the dataset (100% -> 35 million triples)


Thank you. Questions?
