a network-aware approach for searching as-you-type in social media


Paul Lagrée, Bogdan Cautis, Hossein Vahabi

November 6, 2015

Université Paris-Sud

Social interactions in the Web

Social web: a new development of the Web, centered on user relationships and their data.

Significant portion of the Web:

∙ explicitly social (Facebook, Twitter, Google+)

∙ implicitly social, built around content (blogs, forums)

User-centric: consumers are also generators and evaluators of content.

Motivation – As-You-Type

∙ Social-aware method (network-based)

∙ As-you-type search, handling prefixes

∙ Real-time approach (200 ms maximum)

∙ Incremental computation: exploits what has already been computed

∙ Top-k


Social tagging context

Collaborative tagging networks: general abstraction of social media.

Model used in the following work:

∙ users form a social network, represented as a weighted graph; weights may reflect similarity, friendship, etc.

∙ users tag items with terms. Items may be documents, videos, photos, or URLs from a public pool.

Users search for items whose tags match the query.


Example

∙ Triples (user, item, tag)

∙ Weighted similarity network between users (reflects proximity, friendship, or similarity, and can be computed using tags, item-tags, or social links). Each edge has a weight σ(u, v) ∈ (0, 1].

Score Model

For a given tag t and seeker s, the score is

score(item | s, t) = α × textual(t, item) + (1 − α) × social(item | s, t)

where α ∈ [0, 1] sets the balance between the textual and the social component: the smaller α is, the more social the answer.

∙ α = 1: we come back to classical web search

∙ α = 0: exclusively social search
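As a minimal sketch of the linear combination on this slide, the helper below blends the two components. The function name `combined_score` and the input validation are assumptions made for illustration; the slides only give the formula.

```python
def combined_score(textual: float, social: float, alpha: float) -> float:
    """Blend a textual and a social score; alpha is the weight of the textual part."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * textual + (1.0 - alpha) * social


# alpha = 1 keeps only the textual score, alpha = 0 only the social score.
textual_only = combined_score(0.8, 0.2, alpha=1.0)
social_only = combined_score(0.8, 0.2, alpha=0.0)
balanced = combined_score(0.8, 0.2, alpha=0.5)
```

The two extreme values of α reproduce the two bullet points above: the social term vanishes at α = 1 and the textual term vanishes at α = 0.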


Social score

The social score is defined as:

social(item | s, t) = ∑_{v tagged item with tag t} σ+(s, v)

σ+(s, v) is the extended proximity between s and v, obtained by aggregating edge weights along paths, for example by path multiplication or path maximum.

∙ social(item | s, prefix) = max_{t ∈ completions} social(item | s, t)

∙ textual(item | s, prefix) = max_{t ∈ completions} textual(item | s, t)
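The sum above and the maximum over completions can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: σ+ is taken to be path multiplication (the best product of edge weights over any path), the seeker is given proximity 1 to itself, and the toy graph, user names, and tag assignments are invented for the example.

```python
import heapq


def extended_proximity(graph, seeker):
    """Best path-product proximity from the seeker to every reachable user.

    graph maps user -> {neighbour: weight}, with weights in (0, 1].
    """
    best = {seeker: 1.0}          # assumption: the seeker has proximity 1 to itself
    heap = [(-1.0, seeker)]       # max-heap via negated proximities
    while heap:
        neg_p, u = heapq.heappop(heap)
        p = -neg_p
        if p < best.get(u, 0.0):
            continue              # stale heap entry
        for v, w in graph.get(u, {}).items():
            cand = p * w
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best


def social_score(taggers, proximity):
    """Sum of extended proximities over the users who tagged the item with the tag."""
    return sum(proximity.get(v, 0.0) for v in taggers)


def prefix_social_score(taggers_by_tag, prefix, proximity):
    """Maximum social score over all tags that complete the prefix."""
    scores = [social_score(users, proximity)
              for tag, users in taggers_by_tag.items() if tag.startswith(prefix)]
    return max(scores, default=0.0)


# Hypothetical toy network; weights are powers of two so the arithmetic is exact.
graph = {
    "s": {"a": 0.5, "b": 0.125},
    "a": {"s": 0.5, "b": 0.5, "c": 0.5},
    "b": {"s": 0.125, "a": 0.5},
    "c": {"a": 0.5},
}
prox = extended_proximity(graph, "s")

# Users who tagged one hypothetical item, grouped by tag.
taggers_by_tag = {"style": ["a", "c"], "stylish": ["b"], "hip": ["a", "b", "c"]}
score_st = prefix_social_score(taggers_by_tag, "st", prox)
```

In this toy network the direct edge s–b (0.125) is weaker than the two-hop path through a (0.5 × 0.5), so the path product through a is what ends up in `prox["b"]`.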

Completion trie index

[Figure: completion trie over the tags. Each node carries a bracketed score (for example [4] st, [3] y, [3] le, [2] lish, [4] reet), and leaves point to inverted lists of (item, score) pairs such as IL(hipster).]

Virtual IL(st): (i2, street, 4), (i4, style, 3), (i1, stylish, 2), (i5, stylish, 1), (i6, style, 1)

∙ Leaf nodes in the trie correspond to concrete inverted lists.

∙ Internal nodes match a keyword prefix and represent a "virtual list".
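The distinction between concrete and virtual lists can be illustrated with a small sketch. Several simplifications are assumptions made here: the trie is a plain character trie rather than the path-compressed one in the figure, each node stores the maximum score found below it, and the virtual list is materialized eagerly by merging leaf lists (a real index would presumably merge lazily). The postings reuse the entries shown for the virtual IL(st).

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.max_score = 0        # best score of any posting below this node
        self.inverted_list = []   # non-empty only where a complete tag ends


class CompletionTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, tag, postings):
        """postings is a list of (item, score) pairs for this tag."""
        postings = sorted(postings, key=lambda p: (-p[1], p[0]))
        top = postings[0][1] if postings else 0
        node = self.root
        node.max_score = max(node.max_score, top)
        for ch in tag:
            node = node.children.setdefault(ch, TrieNode())
            node.max_score = max(node.max_score, top)
        node.inverted_list = [(item, tag, score) for item, score in postings]

    def find(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def virtual_list(self, prefix):
        """Merge every concrete list below the prefix node, best score first."""
        node = self.find(prefix)
        if node is None:
            return []
        merged, stack = [], [node]
        while stack:
            current = stack.pop()
            merged.extend(current.inverted_list)
            stack.extend(current.children.values())
        return sorted(merged, key=lambda e: (-e[2], e[0]))


trie = CompletionTrie()
trie.insert("street", [("i2", 4)])
trie.insert("style", [("i4", 3), ("i6", 1)])
trie.insert("stylish", [("i1", 2), ("i5", 1)])
il_st = trie.virtual_list("st")
```

Here `il_st` lists the same five entries as the virtual IL(st) on the slide; the tie between the two score-1 entries is broken by item identifier, which is a choice of this sketch.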


TOPKS-ASYT (α = 0) – General flow (1/2)

Input: seeker s, query Q = (t1, ..., tr), completion trie, graph with p-spaces

1. Candidate document list D = ∅, ordered by minimal score

  ∙ as in NRA (No Random Access), each candidate in D has a minimal and a maximal score

  ∙ also keep a maximal score for unseen documents

2. Seeker s is added to the priority queue


TOPKS-ASYT (α = 0) – General flow (2/2)

3. While there exists a user in the priority queue

  ∙ Get the next closest user u from the priority queue

  ∙ Refine proximity scores for the neighbours of u

  ∙ Get documents d tagged by u with any term from (t1, ..., tr−1) or with any tag that has tr as a prefix (p-space exploration)

  ∙ Compute the score bounds of each d and insert (or update) it in D

  ∙ Advance in any IL from the completion trie whose head is a document in D

  ∙ Check the termination condition (next slide)

4. Return the top-k items.
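The user-visiting loop of step 3 can be sketched in a deliberately simplified form. This is not TOPKS-ASYT itself: the sketch handles a single prefix term with α = 0, visits every reachable user instead of stopping early, and keeps no score bounds and no inverted lists. It further assumes path multiplication as the proximity and lets the seeker's own tags count with proximity 1. The function name, graph, and tagging data are hypothetical.

```python
import heapq
from collections import defaultdict


def visit_and_score(graph, tagged, seeker, prefix, k):
    """Visit users by decreasing proximity and accumulate social scores.

    graph:  user -> {neighbour: weight in (0, 1]}
    tagged: user -> list of (item, tag) pairs
    """
    best = {seeker: 1.0}
    heap = [(-1.0, seeker)]            # priority queue, closest user first
    visited = set()
    partial = defaultdict(float)       # (item, tag) -> accumulated proximity
    while heap:
        neg_p, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        p = -neg_p
        # Refine proximity scores for the neighbours of u.
        for v, w in graph.get(u, {}).items():
            if v not in visited and p * w > best.get(v, 0.0):
                best[v] = p * w
                heapq.heappush(heap, (-best[v], v))
        # Collect what u tagged with any completion of the prefix.
        for item, tag in tagged.get(u, []):
            if tag.startswith(prefix):
                partial[(item, tag)] += p
    # Score of an item for a prefix: maximum over the matching completions.
    scores = defaultdict(float)
    for (item, _tag), s in partial.items():
        scores[item] = max(scores[item], s)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


graph = {
    "s": {"a": 0.5, "b": 0.125},
    "a": {"s": 0.5, "b": 0.5, "c": 0.5},
    "b": {"s": 0.125, "a": 0.5},
    "c": {"a": 0.5},
}
tagged = {
    "a": [("i1", "style")],
    "b": [("i1", "stylish"), ("i2", "street")],
    "c": [("i1", "style"), ("i3", "hip")],
}
top2 = visit_and_score(graph, tagged, "s", "st", k=2)
```

Because all weights are at most 1, a user's proximity is final the first time it leaves the queue, which is what allows users to be processed in decreasing proximity order.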


Termination condition

Using bounds on item scores, we try to terminate the algorithm early.

∙ Each item in the buffer has

  ∙ a score lower bound: assuming that the score computed so far is the final one

  ∙ a score upper bound: MaxScore(i | s, t) = max_proximity × unseen_users(i, t) + current_score(i | s, t)

∙ If the upper bound of the (k+1)-th item is lower than the lower bound of the k-th item: termination.
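A possible reading of this test is sketched below. Two points are assumptions rather than statements from the slides: `max_proximity` is taken to be the proximity of the next user still to be visited, and the check compares the k-th lower bound against the upper bounds of all remaining candidates (plus a bound for unseen documents), which is the usual NRA-style safeguard. All names and numbers are illustrative.

```python
def upper_bound(current_score, unseen_users, max_proximity):
    """MaxScore: every not-yet-seen tagger contributes at most max_proximity."""
    return current_score + max_proximity * unseen_users


def can_terminate(candidates, k, max_proximity, unseen_item_bound=0.0):
    """candidates maps item -> (current_score, unseen_users)."""
    if len(candidates) < k:
        return False
    # Rank by lower bound, i.e. by the score accumulated so far.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1][0], kv[0]))
    kth_lower = ranked[k - 1][1][0]
    rest_upper = [upper_bound(score, unseen, max_proximity)
                  for _item, (score, unseen) in ranked[k:]]
    return max(rest_upper + [unseen_item_bound]) < kth_lower


candidates = {"i1": (0.75, 0), "i2": (0.5, 1), "i3": (0.25, 2)}
early = can_terminate(candidates, k=2, max_proximity=0.25)    # i3 could still reach 0.75
late = can_terminate(candidates, k=2, max_proximity=0.0625)   # i3 is capped at 0.375
```

As the traversal moves to more distant users, `max_proximity` shrinks, the upper bounds tighten, and the test eventually succeeds.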


Experimental Framework

Three datasets: Tumblr, Yelp and Twitter

                                   Yelp       Twitter    Tumblr
Users                              29,293     458,117    612,425
Items                              18,149     1.6M       1.4M
Tags                               177,286    550,157    2.3M
Number of triples                  30.3M      13.9M      11.3M
Average number of tags per item    686        8.4        7.9
Average tag length                 6.5        13.1       13.0

Experimental Framework

Experiments:

∙ given one triple (u, i, t), run the search with keyword t and user u as the seeker

∙ metric: ranking of i in the result

∙ precision (test dataset D of N triples):

P@k = #{triple ∈ D | ranking < k} / N
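The precision measure can be computed as in the short sketch below. It assumes that each test triple has already been turned into the rank of its item i in the result list, counted from 0 so that the strict inequality on the slide selects the first k positions, and that `None` marks an item that was not returned at all. Both conventions, like the function name, are assumptions of this example.

```python
def precision_at_k(rankings, k):
    """Fraction of test triples whose item is ranked strictly below k."""
    if not rankings:
        raise ValueError("the test set must contain at least one triple")
    hits = sum(1 for rank in rankings if rank is not None and rank < k)
    return hits / len(rankings)


# Hypothetical ranks for four test triples; the last item was never returned.
p_at_5 = precision_at_k([0, 3, 7, None], k=5)
```

Two of the four hypothetical ranks fall within the first five positions, so `p_at_5` is one half.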

Tumblr α impact

[Figure: three panels (Tumblr item-tag, Tumblr tag, Tumblr social network), each plotting P@5 against l (1 to 8), with one curve per α ∈ {0, 0.01, 0.1, 0.4, 1}.]

Yelp α impact

[Figure: three panels (Yelp item-tag, Yelp tag, Yelp social network), each plotting P@5 against l (1 to 8), with one curve per α ∈ {0, 0.01, 0.1, 0.4, 1}.]

Social similarity – Tumblr number of visited users

[Plot: Tumblr social network, P@5 against l (0 to 12), with one curve per ν ∈ {1, 5, 20, 100, ∞}.]

Figure: Impact of visited users

Efficiency – NDCG vs infinite answer

[Plot: Yelp social network, NDCG@20 against t (ms, 0 to 40), with one curve per l ∈ {2, 4, 6}.]

Figure: l impact

[Plot: Yelp social network, NDCG@20 against t (0 to 40), with one curve per α ∈ {0.0, 0.5, 1.0}.]

Figure: α impact

Efficiency – Comparison

[Plot: Yelp social network (α = 0.0), NDCG@20 against l (3 to 6), TOPKS-ASYT vs. Baseline.]

Figure: TOPKS-ASYT vs baseline

[Plot: Yelp social network, "Time exact top-k" against l (3 to 6), Incremental vs. Not-Incremental.]

Figure: Incremental vs Non-Incremental

Efficiency – Scaling

Figure: Impact of the size of the dataset (100% -> 35 million triples)


Thank you.

Questions?
