Evaluating Similarity Measures: A Large-Scale Study in the Orkut Social Network
Ellen Spertus
Recommender systems
• What are they?
• Example: Amazon
Controversial recommenders
“What to do when your TiVo thinks you’re gay”, Wall Street Journal, Nov. 26, 2002
http://tinyurl.com/2qyepg
Controversial recommenders
Wal-Mart DVD recommendations
http://tinyurl.com/2gp2hm
Google’s mission
To organize the world's information and make it universally accessible and useful.
[Chart: growth in the number of orkut members and communities, 1/28/2004 through 11/28/2004; y-axis 0 to 3,500,000]
Community recommender
• Goal: Per-community ranked recommendations
• How to determine?
– Implicit collaborative filtering
– Look for common membership between pairs of communities
Terminology
• Consider each community to be a set of members
– B: base community (e.g., “Pizza”)
– R: related community (e.g., “Cheese”)
• Similarity measure
– Based on overlap |B∩R|
Example: Pizza
Terminology
• Consider each community to be a set of members
– B: base community (e.g., “Wine”)
– R: related community (e.g., “Linux”)
• Similarity measure
– Based on overlap |B∩R|
– Also depends on |B| and |R|
– Possibly asymmetric
Example of asymmetry
Stanford (2,756 members) → Stanford Class of 2006 (52 members): nearly every member of the class community belongs to Stanford, but few Stanford members belong to the class.
Similarity measures
• L1 normalization
• L2 normalization
• Pointwise mutual information
– Positive correlations only (MI1)
– Positive and negative correlations (MI2)
• Salton tf-idf
• Log-odds
L1 normalization
• Vector notation: normalize each community’s membership vector by its L1 norm; similarity is the dot product
• Set notation: L1(B,R) = |B∩R| / (|B|·|R|)
L2 normalization
• Vector notation: normalize each community’s membership vector by its L2 norm; similarity is the dot product
• Set notation: L2(B,R) = |B∩R| / √(|B|·|R|)
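A minimal sketch of the two normalized measures over membership sets, assuming the set-notation forms |B∩R|/(|B|·|R|) for L1 and |B∩R|/√(|B|·|R|) for L2 (the function names and toy communities are illustrative):

```python
from math import sqrt

def l1_similarity(base, related):
    """L1-normalized overlap: |B ∩ R| / (|B| * |R|)."""
    return len(base & related) / (len(base) * len(related))

def l2_similarity(base, related):
    """L2-normalized overlap: |B ∩ R| / sqrt(|B| * |R|)."""
    return len(base & related) / sqrt(len(base) * len(related))

# Toy communities as sets of member IDs
pizza = {1, 2, 3, 4, 5}
cheese = {3, 4, 5, 6}

print(l1_similarity(pizza, cheese))  # 3 / (5 * 4) = 0.15
print(l2_similarity(pizza, cheese))  # 3 / sqrt(20) ≈ 0.67
```

Note that with binary membership vectors, normalizing by the L1 or L2 norm and taking a dot product yields exactly these set-notation values.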
Mutual information: positive correlation (MI1)
• Formally, the joint-membership term of the mutual information between indicator variables b and r
• Informally, how well membership in the base community predicts membership in the related community
[2×2 contingency table over membership in B and R]
Mutual information: positive and negative correlation (MI2)
• Also counts the joint non-membership term (agreement on absence as well as presence)
[2×2 contingency table over membership in B and R]
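A sketch of the two mutual-information variants. The exact formulas here, joint-probability-weighted pointwise MI terms, are my reconstruction from the informal descriptions above; the function names are illustrative:

```python
from math import log2

def mi1(base, related, universe_size):
    """Positive-correlation term only: P(b,r) * lg(P(b,r) / (P(b) * P(r)))."""
    p_br = len(base & related) / universe_size
    p_b = len(base) / universe_size
    p_r = len(related) / universe_size
    if p_br == 0:
        return 0.0
    return p_br * log2(p_br / (p_b * p_r))

def mi2(base, related, universe_size):
    """Adds the negative-correlation (joint non-membership) term."""
    n = universe_size
    p_nn = (n - len(base | related)) / n   # in neither community
    p_nb = (n - len(base)) / n
    p_nr = (n - len(related)) / n
    result = mi1(base, related, n)
    if p_nn > 0:
        result += p_nn * log2(p_nn / (p_nb * p_nr))
    return result

# Toy example: 10 users total
print(mi1({1, 2, 3, 4, 5}, {3, 4, 5, 6}, 10))  # 0.3 * lg(0.3 / 0.2) ≈ 0.175
```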
Salton tf-idf
LogOdds0
• Formally, LogOdds0(B,R) = lg[ P(r|b) / P(r|b̄) ]
• Informally, how much likelier a member of B is to belong to R than a non-member of B is
• This yielded the same rankings as L1
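A sketch of LogOdds0, taking the informal description literally as the log of the ratio between P(r|b) and P(r|b̄); this exact form is an assumption consistent with that description, and the names are illustrative:

```python
from math import log2

def log_odds0(base, related, universe):
    """lg( P(r|b) / P(r|not-b) ): probability of belonging to R
    given membership in B, versus given non-membership in B."""
    p_r_given_b = len(base & related) / len(base)
    non_members = universe - base
    p_r_given_not_b = len(related & non_members) / len(non_members)
    return log2(p_r_given_b / p_r_given_not_b)

# Toy example: members of B are 3x likelier to belong to R
universe = set(range(10))
print(log_odds0({0, 1, 2, 3, 4}, {2, 3, 4, 5}, universe))  # lg(0.6 / 0.2) ≈ 1.585
```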
LogOdds
Predictions?
• Were there significant differences among the measures?
– Top-ranked recommendations
– User preference
• Which measure was “best”?
• Was there a partial or total ordering of measures?
Recommendations for “I love wine” (2400)
Experiment
• Precomputed top 12 recommendations for each base community for each similarity measure
• When a user views a community page:
– Hash the community and user ID
– Select an ordered pair of measures
– Interleave their recommendations, filtering out duplicates
• Track clicks of new users
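The per-page procedure above can be sketched as follows; the hash scheme, measure names, and the limit of 12 (from the precomputation slide) are illustrative details, not the production implementation:

```python
import hashlib
from itertools import permutations

MEASURES = ["L1", "L2", "MI1", "MI2", "IDF", "LogOdds"]
PAIRS = list(permutations(MEASURES, 2))  # ordered pairs of distinct measures

def pick_pair(community_id, user_id):
    """Deterministically hash (community, user) to an ordered pair of measures,
    so the same user sees the same pairing on repeat visits."""
    digest = hashlib.md5(f"{community_id}:{user_id}".encode()).hexdigest()
    return PAIRS[int(digest, 16) % len(PAIRS)]

def interleave(recs_a, recs_b, limit=12):
    """Alternate two precomputed ranked lists, filtering out duplicates.
    Assumes equally long input lists for simplicity."""
    merged, seen = [], set()
    for a, b in zip(recs_a, recs_b):
        for rec in (a, b):
            if rec not in seen:
                seen.add(rec)
                merged.append(rec)
    return merged[:limit]

print(interleave(["x", "y", "z"], ["y", "z", "w"]))  # ['x', 'y', 'z', 'w']
```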
Click interpretation
[Table: how each click is interpreted, by whether the clicking user was a member or non-member of the related community]
Overall click rate (July 1-18)
Total recommendation pages generated: 4,106,050
Analysis
For each pair of similarity measures Ma and Mb and each click C, either:
• Ma recommended C more highly than Mb
• Ma and Mb recommended C equally
• Mb recommended C more highly than Ma
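The three-way comparison above can be sketched as a rank lookup over each measure's recommendation list; the names here are illustrative:

```python
def compare_click(clicked, ranking_a, ranking_b):
    """Return 'a', 'b', or 'tie' for which measure ranked the clicked
    community more highly. An item absent from a ranking is treated
    as ranked below everything in it."""
    rank_a = ranking_a.index(clicked) if clicked in ranking_a else len(ranking_a)
    rank_b = ranking_b.index(clicked) if clicked in ranking_b else len(ranking_b)
    if rank_a < rank_b:
        return "a"
    if rank_b < rank_a:
        return "b"
    return "tie"

# Toy example: both measures recommend the same two communities,
# but in opposite order.
print(compare_click("cheese", ["cheese", "wine"], ["wine", "cheese"]))  # a
```

Tallying these outcomes over all clicks gives the pairwise preferences from which the overall ordering of measures is derived.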
Results
• Clicks leading to joins: L2 » MI1 » MI2 » IDF › L1 » LogOdds
• All clicks: L2 » L1 » MI1 » MI2 › IDF » LogOdds
Positional effects
• Original experiment
– Ordered recommendations by rank
• Second experiment
– Generated recommendations using L2
– Pseudo-randomly ordered recommendations, tracking clicks by placement
– Tracked 1.3 M clicks between September 22 and October 21
Results: single row (n=28,108)
Relative click rates by position: 1.00  1.01  .98
p = .12 (not significant)
Results: two rows (n=24,459)
Relative click rates by position:
1.04  1.05  1.08
.97  .94  .92
p < .001
Results: three rows (n=1,226,659)
Relative click rates by position:
1.11  1.06  1.04
1.01  .97  .99
1.01  .94  .87
p < .001
Users’ reactions
• Hundreds of requests per day to add recommendations
• Angry requests from community creators
– General
– Specific
Amusing recommendations
• C++ → “For every time a woman has confused you…” (“What’s she trying to say?”)
• Chocolate → PMS
Allowing community owners to set recommendations
Manual recommendations
• Eight days after release
– 50,876 community owners
– Added 267,623 recommendations
– Deleted 59,599 recommendations
– Affecting 73,230 base communities and 111,936 related communities
• Open question: How do they compare with automatic recommendations?
Future research 1
Determining similar users based on common communities
– Is it useful?
– Will the measures make the same total order?
Other types of information
• Distance in social network
• Demographic
– Country
– Age
– Etc.
Future research 2
Per-user community recommendations
– Using social network information
– Using profile information (e.g., country)
Future research 3
Do we get the same ordering for other domains?
L2 » MI1 » MI2 » IDF › L1 » LogOdds
Acknowledgments
• Mehran Sahami
• Orkut Buyukkokten
• orkut team
Bonus material
Self-rated beauty
• “beauty contest winners”
• “very attractive”
• “attractive”
• “average”
• “mirror-cracking material”
Self-rated beauty: men
• “beauty contest winners” 8%
• “very attractive” 18%
• “attractive” 39%
• “average” 24%
• “mirror-cracking material” 11%
Self-rated beauty: women
• “beauty contest winners” 8%
• “very attractive” 16%
• “attractive” 39%
• “average” 27%
• “mirror-cracking material” 9%
Self-rated beauty by country
• Most beautiful– men: Syrian– women: Barbadian
• Least beautiful– men: Gambian– women: Ascension Islanders
Ratings by others
• Karma
– trustiness
– sexiness
– coolness
• How do these correlate with age?
Friend counts
Self-rated best body part