school of information university of michigan expertise networks in online communities: structure and...
Post on 22-Dec-2015
217 views
TRANSCRIPT
School of InformationUniversity of Michigan
Expertise Networks in Online Communities:Structure and Algorithms
Lada Adamic
joint work with Jun Zhang and Mark Ackerman
School of Information, University of Michigan
NetSci
May 24th, 2007
Have you sought knowledge here?
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
Knows
Knowledge iN
Oozing out knowledge
Knowledge In
``Knowledge search is like oozing out knowledge in human brains to the Internet. People who know something better than others can present their know-how, skills or knowledge''
NHN CEO Chae Hwi-young
Largest search engine in Korea - 70% of search (Google: 2%)
Comprehensive portal – integrated news, blogs, ‘knowledge search’
Knowledge-In had 60 million questions and answers as of Feb 2007
popular: why fingernails grow faster than toenails how fast a fly can flywhy seagulls sit in the same direction
Ranking the contributors
Level Range of points
Lowlife 0-99
Commoner 100-500
Citizen 501-3000
Middle class 3001-7000
Expert 7001-15000
Hero 15001-35000
Professional 35001-65000
Superhuman 65001-100000
(Gods) > 100000
Knowledge In
Culture of generosity
“(It is) the next generation of search… (it) is a kind of collective brain -- a searchable database of everything everyone knows. It's a culture of generosity. The fundamental belief is that everyone knows something.”
-- Eckart Walther (Yahoo Research)
90 million users worldwide
Limitations of Current Systems
• The Response Time Gap
4939N =
ExpertiseRating
lowhigh
WA
ITTIM
E(m
in)
10000
9000
8000
7000
6000
5000
4000
3000
2000
1000
0
69
96
41
• The Expertise Gap • Difficult to infer reliability of answers
Automatically ranking expertise may be helpful.
Related work
• Analysis of online communities• NetScan (Smith, Fisher, et al. at Microsoft)• Social network analysis (LiveJournal, blog communities)• Motivations of online participation (Lakhani & Hippel)
• Graph-based ranking algorithms• PageRank, HITS, etc.
• Expertise sharing studies• Expertise recommenders
• ContactFinder (Krulwich et al.), Answer Garden (Ackerman)• Small Blue (Lin)
• Automatic evaluating expertise levels• Using different text resources (Kautz, et al, and a lot of others)• Using email networks (Campbell et al.)
Overview
• Social network analysis• Constructing Expertise Networks• Finding meaningful metrics
• Empirical evaluation of ranking algorithms• Human Rating vs. Algorithmic Ranking
• Simulation• Understanding underlying dynamics• Predicting performance of ranking algorithms in yet-unobserved community dynamics
Java Forum
• 87 sub-forums• 1,438,053
messages• community
expertise network constructed:• 196,191 users• 796,270 edges
Constructing a community expertise network
A B C
Thread 1 Thread 2
Thread 1: Large Data, binary search or hashtable? user ARe: Large... user BRe: Large... user C
Thread 2: Binary file with ASCII data user ARe: File with... user C
A
B
C
1
1
A
B
C
1
2
A
B
C
1/2
1+1//2
A
B
C
0.9
0.1
unweighted
weighted by # threads
weighted by shared credit
weighted with backflow
Not Everyone Asks/Replies
• Core: A strongly connected component, in which everyone asks and answers • IN: Mostly askers.• OUT: Mostly Helpers
The Web is a bow tie The JavaForum network is
an uneven bow tie
Uneven participation
100
101
102
103
10-4
10-3
10-2
10-1
100
degree (k)
cum
ula
tive
prob
abili
ty
= 1.87 fit, R2 = 0.9730
number of people one received replies from
number of people one replied to
• ‘answer people’ may reply to thousands of others
• ‘question people’ are also uneven in the number of repliers to their posts, but to a lesser extent
Who Answers Whom
Degree-degree correlations between asker and helper
helper indegree (logarithmically binned)
ask
er
ind
eg
ree
(lo
ga
rith
mic
ally
bin
ne
d)
3 7 20 55 148 403 1096 2981
3
7
20
55
148
403
1096
2981
0
1
2
3
4
5
6
Summary of JavaForum Network
• Different types of participation • Askers, ask-help-er, helpers
• Different levels of participation • top helpers, others
• Who replied to whom• Top repliers answer questions for everyone• Other helpers help those with somewhat lower expertise
Relating network structure to Java expertise
• Human-rated expertise levels• 2 raters• 135 JavaForum users with >= 10 posts• inter-rater agreement (= 0.74, = 0.83)• for evaluation of algorithms, omit users where raters disagreed by
more than 1 level (= 0.80, = 0.83)
L Category Description
5 Top Java expert Knows the core Java theory and related advanced topics deeply.
4 Java professional Can answer all or most of Java concept questions. Also knows one or some sub topics very well,
3 Java user Knows advanced Java concepts. Can program relatively well.
2 Java learner Knows basic concepts and can program, but is not good at advanced topics of Java.
1 Newbie Just starting to learn java.
Structural Info Based Expertise Ranking Metrics
• # replies posted (# answers)• experts can answer many questions
• # people replied to (# indegree)• experts can answer questions from many different people
• z-score for the 2 above (observed – )/• experts are above the mean in the above two metrics
• PageRank replying to people who reply to people• higher level experts can answer mid-level experts
• HITS experts answer questions by people whose questions other experts have answered
hubs point to good authorities
10121617181920192N =
LEVCOM
1098765432
RA
NK
of
PRA
NK
160
140
120
100
80
60
40
20
0
- 20
9281
5
68
1
10121617181920192N =
LEVCOM
1098765432
RA
NK
of
REP
LY
140
120
100
80
60
40
20
0
- 20
40
101
10121617181920192N =
LEVCOM
1098765432
RA
NK
of
ZTH
REA
DS
160
140
120
100
80
60
40
20
0
- 20
40
1011
10121117171917192N =
LEVCOM
1098765432
RA
NK
of
HIT
S_A
UT
140
120
100
80
60
40
20
0
- 20
33
automated vs. human ratings
# answers
human rating
auto
mat
ed r
anki
ng
10121617181920192N =
LEVCOM
1098765432
RA
NK
of
IND
GR
160
140
120
100
80
60
40
20
0
- 20
40
101
10121117171917192N =
LEVCOM
1098765432
RA
NK
of
ZD
GR
140
120
100
80
60
40
20
0
- 20
106104
z # answers
HITS authority
indegree
z indegree
PageRank
Algorithm Rankings vs. Human Ratings
simple local measures do as well (and better) than measures incorporating the wider network topology
Top K Kendall’s Spearman’s
# answersz-score # answersindegreez-score indegreePageRankHITS authority
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
Modeling community structure to explain algorithm performance
Control Parameters: Distribution of expertise Who asks questions most often? Who answers questions most often? best expert most likely someone a bit more expert
ExpertiseNet Simulator
Simulating probability of expertise pairing
0 1 2 3 4 50
1
2
3
4
5
replier expertise
aske
r ex
pert
ise
0.02
0.04
0.06
0.08
0.1
0.12
0 1 2 3 4 50
1
2
3
4
5
replier expertise
aske
r ex
pert
ise
0
0.05
0.1
0.15
suppose:
expertise is uniformly distributed
probability of posing a question is inversely proportional to expertise
pij = probability a user with expertise j replies to a user with expertise i
2 models:‘best’ preferred ‘just better’ preferred
iep ijij /~ )( iep ji
ij /~ )( j>i
Degree correlation profiles
best preferred (simulation) just better (simulation)
Java Forum Networkas
ker
inde
gree
aske
r in
degr
ee
aske
r in
degr
ee
The Simulation of JavaForum
• Settings: • Distribution of expertise (skewed)• Who asks questions most often? (novices)• Who answers questions? (best expert most likely)
• Results • Similar bow tie structure• Similar degree distribution• Slightly different correlation profiles• Similar algorithm performance
• PageRank does not outperform simpler degree-based metrics
Different ranking algorithms perform differently
In the ‘just better’ model, a node is correctly ranked by PageRank but not by HITS
It can tell us when to use which algorithms
Preferred Helper: ‘just better’
Preferred Helper: ‘best available’
Summary
• Expertise Networks have interesting characteristics• A set of useful metrics • Ranking algorithms are affected by network structures• Simulation as an analysis tool
• There are rich design opportunities• Find experts with the help of structural information (and content
analysis) • Predict good answers • Re-order questions/answers to match expertise
working paper: “Expertise-Level based Interface Personalization for Online Help-seeking Communities”
• Looking at diverse sets of question-answer forums (Yahoo Answers)
• Expertise across different topics
• Using explicit ratings for evaluation of automated expertise identification & incorporation into algorithms (battling spam)
• Users’ expertise change over time• Continually developing and evaluating our systems built upon these
findings
Future Work
cars & transportation
maintenance & repairs
beauty & style
hair
for more info
• ExpertiseRank algorithms and evaluations Zhang, J., Ackerman, M.S., Adamic, L., Expertise Networks in Online Communities: Structure
and Algorithms, WWW’07
• Simulations of expertise networks Zhang, J., Ackerman, M.S., Adamic, L., CommunityNetSimulator: Using Simulations to Study
Online Community Network Formation and Implications, C&T2007
Jun Zhang [email protected]
http://www-personal.si.umich.edu/~junzh
Mark Ackerman [email protected]
http://www.eecs.umich.edu/~ackerm/
Lada Adamic [email protected]
http://www-personal.umich.edu/~ladamic
NSF (IRI-9702904)
ads
• Jun Zhang is graduating and on the job market ([email protected])
• Lada is looking for a postdoc ([email protected])