Nuanced graph representation to improve recommendation: the case of browsing and social networks
DESCRIPTION
Graphs are ubiquitous representations of a wide range of online traces generated by user activities, including browsing, messaging, social linking, and many more. Because of their simplicity and power, graphs (like other similar representations of relational data) have been used in a plethora of applications, most of them falling under the umbrella of recommendation and personalization. However, the notion of a graph and its atomic components (nodes and edges) is very often adopted uncritically, without much thought given to their nature or meaning. In real-world scenarios the meaning of a link can vary broadly even within the same system or interaction type. We study browsing and social graphs and show how to obtain a more nuanced representation of their links, both to gain a deeper understanding of their nature and, in turn, to properly exploit the information about link type in recommendation tasks. First, we present the use of the BrowseGraph and its decomposition into ReferrerGraphs for image and news recommendation. Then, we show how conversation graphs can be decomposed into subgraphs carrying different information about the type of resources exchanged between peers, providing an overview of the potential that such a nuanced representation can have in the field of recommendation. Our analysis is conducted on large datasets extracted from Yahoo News, Flickr, and aNobii.
TRANSCRIPT
Nuanced graph representation to improve recommendation
The case of browsing and social networks
1st International Workshop on Social Personalisation (SP 2014)
Luca Maria Aiello
Who’s this guy?
Network analysis
Roadmap
Part I: Browsing graphs in context
To surface interesting content and address the cold-start scenario
Part II: Pragmatics of communication graphs
To decompose the dyadic interaction and profile user-to-user ties
Browsing graphs
Team
Luca Maria Aiello
Michele Trevisiol
Alejandro Jaimes
Luca Chiarandini
Rossano Schifanella
Browse Graph
• Nodes are pages
• Edges are aggregated browsing transitions
Trevisiol et al. “Image Ranking Based on User Browsing Behaviour” SIGIR 2012
• Centrality is a “good” indicator of content interestingness
• External layers add useful information
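A minimal sketch of the idea on this slide, assuming sessions are available as ordered lists of page ids (the session data and function names below are hypothetical, and external referrer nodes are left out for brevity). It aggregates transitions into weighted edges and scores pages with a plain weighted PageRank as one possible centrality measure:

```python
from collections import Counter, defaultdict

def build_browse_graph(sessions):
    """Aggregate page-to-page transitions over all sessions into weighted edges."""
    edges = Counter()
    for pages in sessions:
        for src, dst in zip(pages, pages[1:]):
            if src != dst:  # ignore page reloads
                edges[(src, dst)] += 1
    return edges

def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration; dangling mass is spread uniformly."""
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = defaultdict(float)
    for (src, _), w in edges.items():
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank

# Toy sessions (made up for illustration)
sessions = [
    ["photo_a", "photo_b", "photo_c"],
    ["photo_b", "photo_c"],
    ["photo_a", "photo_c"],
]
graph = build_browse_graph(sessions)
scores = pagerank(graph)
top = max(scores, key=scores.get)
```

In this toy example the page that most sessions end on receives the highest score; on real logs the ranking would of course depend on the full traffic.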
Flickr browsegraph
flickr
• Flickr browsing data
– 2 months, 10M users, 50M nodes, 300M pageviews
Most central nodes in Flickr BrowseGraph
• Comparison with PageRank (no external nodes), Favorites, Clicks, View time
– High quality
– Higher topical variety
– Surfaces photos related to real-world events or interesting but not popular
Trevisiol et al. “Image Ranking Based on User Browsing Behaviour” SIGIR 2012
Top 10 photos
Art, Series, Odd, Events
Referrer Graph
• External accesses come from heterogeneous environment
Trevisiol et al. “Cold-start News Recommendation with Domain-dependent Browse Graph” RecSys 2014
• Extract subgraphs induced by the browsing traces from the same entry point
• Study their structural differences
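The decomposition can be sketched as follows, assuming each session is stored together with the URL of the page the user arrived from (the record layout, the `direct` label, and the example URLs are assumptions made for illustration):

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

def referrer_graphs(sessions):
    """Group sessions by referrer domain and build one transition graph per domain."""
    graphs = defaultdict(Counter)
    for referrer_url, pages in sessions:
        # Sessions without a referrer are collected under a hypothetical "direct" label
        domain = urlparse(referrer_url).netloc or "direct"
        for src, dst in zip(pages, pages[1:]):
            graphs[domain][(src, dst)] += 1
    return graphs

# Toy sessions: (referrer URL, ordered list of visited articles)
sessions = [
    ("https://search.example.com/?q=news", ["article_1", "article_2"]),
    ("https://social.example.org/feed", ["article_3", "article_1"]),
    ("https://search.example.com/?q=sport", ["article_1", "article_2", "article_4"]),
    ("", ["article_2", "article_4"]),
]
graphs = referrer_graphs(sessions)
```

Each value in `graphs` is a weighted edge list that can be analysed on its own, which is what makes the structural comparison across entry points possible.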
Browsing in News
Yahoo News ReferrerGraphs
• 1 month of Yahoo News browsing log
– 0.5 B entries
• Avg. number of hops per session ≈ 2
Domain-dependent consumption
Jaccard similarity of node sets; Kendall tau of node PageRanks
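Both measures are simple to compute once each ReferrerGraph has a node set and a score per node. A minimal sketch, using the tau-a variant of Kendall's coefficient on the nodes shared by two graphs (the variant and the toy scores are assumptions, not taken from the talk):

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two node collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def kendall_tau(scores_x, scores_y):
    """Kendall tau-a over the nodes present in both score dictionaries."""
    common = sorted(set(scores_x) & set(scores_y))
    concordant = discordant = 0
    for u, v in combinations(common, 2):
        sign = (scores_x[u] - scores_x[v]) * (scores_y[u] - scores_y[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(common) * (len(common) - 1) / 2
    return (concordant - discordant) / pairs if pairs else 0.0

# Toy PageRank scores for two referrer domains (made up)
search_rank = {"a": 0.5, "b": 0.3, "c": 0.2}
social_rank = {"a": 0.2, "b": 0.3, "c": 0.4, "d": 0.1}

overlap = jaccard(search_rank, social_rank)
agreement = kendall_tau(search_rank, social_rank)
```

Here the two domains share most nodes but rank them in opposite order, the kind of mismatch that the two measures are meant to separate.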
Domain-dependent consumption
News consumption in time
Plot: PDF(views) vs. normalized article lifespan
Cold start recommendation
• Fingerprint of traffic depends on the referrer domain
• Can we use this for recommendation?
• Random
• Most popular
• Edge-based
• Content-based
• Cosine sim + TF-IDF
• (Full and mix graph variants)
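As an illustration of the content-based baseline, here is a minimal sketch of TF-IDF vectors compared by cosine similarity. The tokenisation, the weighting formula (relative term frequency times log inverse document frequency), and the toy articles are assumptions; the exact setup used in the benchmark is not given on the slide.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: {doc_id: list of tokens}. Returns {doc_id: {term: tf-idf weight}}."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        vectors[doc_id] = {
            t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0

def recommend(current, vectors):
    """Return the article most similar to the one currently being read."""
    candidates = [d for d in vectors if d != current]
    return max(candidates, key=lambda d: cosine(vectors[current], vectors[d]))

# Toy articles (made up)
articles = {
    "election": "vote election candidate poll".split(),
    "debate": "candidate debate election night".split(),
    "football": "match goal football league".split(),
}
vectors = tfidf_vectors(articles)
suggestion = recommend("election", vectors)
```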
Benchmark
Averaged over 1,438 hourly graphs (~350k users per hour)
Takeaways
• Graph structure can be more useful than other simple indicators of user feedback to surface interesting content
• Browsing structure changes radically with respect to the referrer domain
• Historical browsing information is more effective than other cold-start indicators to predict next view (surprising?)
Conversation graphs
Team
Luca Maria Aiello
Rossano Schifanella
Bogdan State
Aiello et al. “Reading the source code of social ties” WebSci 2014
Conversation graph
Beyond simple edges
• Structure
• Content
– Syntactics
– Semantics
• Pragmatics (beyond saying)
– Communication acts that define the type of social relationship
Topic modeling, sentiment analysis, NLP, …
What is the “nature” of a social tie?
Beyond simple edges
• Blau’s Social Exchange Theory
– Exchange of non-material resources
• Objective: label each message with the resources it conveys
Peter Blau “Exchange and power in social life” 1964
• User profiling
• Link profiling
• Visualization
• …
How?
Input: directed communication multigraph, arcs labeled with time and text
Output: (probabilistic) assignment message → resource
1. Preprocessing
– Stopwords, stemming
2. Message bucketing
– NMF, LDA, …
3. Transition graph
– Buckets as nodes, transitions as edges
• Intuition: conversations tend to stick to the same resource (“You’re very good at it” “You are pretty good as well”)
4. Resource extraction
– Community detection on transition graph
DISCOVERY!
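Steps 3 and 4 can be sketched as follows, assuming the bucketing step has already assigned one bucket id to every message. The grouping below is a deliberately simple stand-in for community detection (buckets are merged when they are linked by enough transitions); it is not the algorithm used in the paper, and the `min_weight` threshold and toy conversations are hypothetical.

```python
from collections import Counter

def transition_graph(conversations):
    """Count transitions between the buckets of consecutive messages."""
    edges = Counter()
    for buckets in conversations:
        for src, dst in zip(buckets, buckets[1:]):
            if src != dst:  # self-transitions carry no grouping information
                edges[frozenset((src, dst))] += 1
    return edges

def group_buckets(buckets, edges, min_weight=2):
    """Merge buckets connected by transitions of weight >= min_weight (union-find)."""
    parent = {b: b for b in buckets}

    def find(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]  # path compression
            b = parent[b]
        return b

    for pair, weight in edges.items():
        if weight >= min_weight:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for b in buckets:
        groups.setdefault(find(b), set()).add(b)
    return sorted(sorted(g) for g in groups.values())

# Toy conversations: each is the ordered list of bucket ids of its messages
conversations = [
    [0, 1, 0, 1],
    [2, 3, 2],
    [1, 0, 3],
    [3, 2, 3],
]
edges = transition_graph(conversations)
all_buckets = sorted({b for conv in conversations for b in conv})
resources = group_buckets(all_buckets, edges)
```

Each resulting group of buckets plays the role of one candidate resource; a proper community detection method would replace the threshold with a modularity-style criterion.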
Experiments
Dataset (anobii.com)
Aiello et al. “People are Strange when you're a Stranger: Impact and Influence of Bots on Social Networks” ICWSM'12
Status
Knowledge
Support
Anobii transition graph
Knowledge exchange
• Technical knowledge of a domain (stackoverflow)
• Request for knowledge
• “I read a very good review of that book”
Status exchange
• Expression of admiration or esteem
• Recognition of the partner’s higher status
• “You are very smart!”
Social support
• Emotional valuation
• Everyday minute exchanges
• “Hope your dad is feeling better now”
80% of messages are correctly assigned (human coders)
Gilbert, Karahalios “Predicting tie strength with social media” CHI 2009
Tie composition and strength
Communication networks induced by the exchange of a single resource
• Status: highly reciprocal, short lived, pervasive
• Support: sentiment involved, long lived, between similar actors
• Knowledge: long messages, between similar actors
Inequality
• Gini coefficient ~0.7 for all networks, higher for status
Lorenz curve
Assortativity
• People receive status from people with lower status
Indegree (/instrength) = amount of resource owned
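A minimal sketch of the inequality measure, assuming the amount of resource a user owns is approximated by in-strength as stated above (the edge weights below are made up):

```python
from collections import Counter

def in_strength(edges):
    """Sum of incoming edge weights per node; nodes with no incoming edges get 0."""
    strength = Counter()
    for (src, dst), weight in edges.items():
        strength[dst] += weight
        strength[src] += 0  # make sure pure senders are counted too
    return strength

def gini(values):
    """Gini coefficient of a list of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Toy status network: three users all direct their messages to u4
status_edges = {("u1", "u4"): 5, ("u2", "u4"): 3, ("u3", "u4"): 2}
strengths = in_strength(status_edges)
inequality = gini(list(strengths.values()))
```

With one user receiving everything among four, the coefficient is 0.75, the maximum reachable for a population of that size.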
Tie evolution
• Knowledge prevails after three exchanges
• Support increases steadily
• Status-exchange fades away quickly
Plot: ratio of resource in conversation vs. conversation length
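The quantity on this plot can be sketched as follows, assuming every message has already been labeled with a single resource (the labels and conversations below are made up, and grouping by exact conversation length is one possible reading of the axes):

```python
from collections import Counter, defaultdict

def resource_ratio_by_length(conversations):
    """For each conversation length, the share of messages carrying each resource."""
    counts = defaultdict(Counter)
    for labels in conversations:
        counts[len(labels)].update(labels)
    ratios = {}
    for length, counter in sorted(counts.items()):
        total = sum(counter.values())
        ratios[length] = {resource: c / total for resource, c in counter.items()}
    return ratios

# Toy conversations: each is the ordered list of resource labels of its messages
conversations = [
    ["status"],
    ["status", "status"],
    ["status", "knowledge"],
    ["support", "knowledge", "knowledge", "knowledge"],
]
ratios = resource_ratio_by_length(conversations)
```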
Generality? (Flickr!)
Flickr
Plot: ratio of resource in conversation vs. conversation length
Takeaways
• Need for a description of social interaction that goes beyond topics/sentiment/etc.
• Big potential impact on related fields on network studies (e.g., information propagation)
• Social tie: sequence of individual exchanges
– Computational properties of social rituals
– “Grammar of society”
Conclusion
Graphs are usually not isolated, homogeneous entities. Avoid oversimplifying them whenever possible.
Quick announcement (I’ll be ready for questions in a few seconds!)
BARCELONA, 10-13 November 2014
www.socinfo2014.org
@socinfo2014