Pruning cooccurrence networks
Bibliometric networks
Density problem
Dense networks are hard to
visualize
interpret
Solution: pruning networks
PathFinder (Schvaneveldt, 1990)
Deleting low-weight links (De Nooy, Mrvar, and Batagelj, 2005)
Cocitation and bibliographic coupling (Persson, 2010)
Threshold for cosine values (Leydesdorff, 2007; Egghe & Leydesdorff, 2009)
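To make the cosine-threshold approach concrete, here is a minimal sketch that computes the cosine between the occurrence sets of two nodes and keeps only links above a cutoff. The function name, the toy data, and the threshold value 0.5 are assumptions made for illustration; they are not taken from the cited papers.

```python
from itertools import combinations
from math import sqrt


def cosine_pruned_links(occurrences, threshold):
    """Return {(a, b): cosine} for node pairs whose cosine exceeds threshold.

    occurrences maps each node (e.g., an author) to the set of items it
    occurs in (e.g., citing papers). Binary occurrence is assumed.
    """
    links = {}
    for a, b in combinations(sorted(occurrences), 2):
        shared = len(occurrences[a] & occurrences[b])
        if shared == 0:
            continue  # no cooccurrence, so no candidate link
        cos = shared / sqrt(len(occurrences[a]) * len(occurrences[b]))
        if cos > threshold:
            links[(a, b)] = cos
    return links


# Toy data, invented for illustration only
occurrences = {
    "A": {"p1", "p2", "p3", "p4"},
    "B": {"p1", "p2", "p3"},
    "C": {"p4", "p5"},
}
links = cosine_pruned_links(occurrences, threshold=0.5)  # hypothetical cutoff
print(links)
```

With this toy data only the A-B link survives, because A and C share a single paper and B and C share none.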
Cooccurrence networks
E.g. cocitation, bibliographic coupling, coauthorship…
Especially prone to density problem
Cooccurrence network / Two-mode network
e.g., authors
e.g., citing papers
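The relation between the two-mode network and the cooccurrence network can be sketched as follows: two authors cooccur once for every citing paper that is linked to both of them. The edge-list representation and the function name are assumptions chosen for this illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(edges):
    """Project a two-mode edge list onto one node set.

    edges is an iterable of (author, paper) pairs. The result maps each
    sorted author pair to the number of papers linked to both authors.
    """
    authors_by_paper = {}
    for author, paper in edges:
        authors_by_paper.setdefault(paper, set()).add(author)

    counts = Counter()
    for authors in authors_by_paper.values():
        for a, b in combinations(sorted(authors), 2):
            counts[(a, b)] += 1
    return counts


# Toy two-mode network, invented for illustration only
edges = [
    ("A", "p1"), ("B", "p1"),
    ("A", "p2"), ("B", "p2"), ("C", "p2"),
    ("C", "p3"),
]
counts = cooccurrence_counts(edges)
print(counts)
```

Here paper p2 alone creates three cooccurrences, which hints at why papers linked to many authors inflate the density of the projected network.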
Methods
Steps
Based on Zweig and Kaufmann (2011): we start from a two-mode network
1. Define pattern of interest
2. Determine interestingness of cooccurrence
3. If cooccurrence is interesting, authors are linked
Why interestingness?
Highly cited author
High cooccurrence counts with many other authors
Citing paper referring to many authors under consideration
Resulting cooccurrences are less important
Determining interestingness
Here: z-score, z = (Obs − Exp) / σ, where Obs is the observed cooccurrence count
How to determine Exp and σ?
Estimate by sampling from Fixed Degree Sequence Model (FDSM): all two-mode networks with same node degrees
Markov Chain Monte Carlo simulation: link swapping
If p < 0.001 (or z > 3.29), we consider the link interesting
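As a rough sketch of the decision rule, the z-score of one author pair can be computed by taking Exp and σ as the mean and standard deviation of that pair's cooccurrence count across the sampled networks. The function name and the sample values below are invented for illustration, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def z_score(observed, sampled_counts):
    """z-score of an observed cooccurrence count against sampled counts.

    sampled_counts holds the count for the same node pair in each
    randomized network. Returns None when the samples do not vary,
    because the z-score is undefined in that case.
    """
    exp = mean(sampled_counts)
    sigma = pstdev(sampled_counts)  # population std. dev. (assumption)
    if sigma == 0:
        return None
    return (observed - exp) / sigma


# Invented numbers: one observed count and five sampled counts
sampled = [1, 2, 3, 2, 2]
z = z_score(5, sampled)
is_interesting = z is not None and z > 3.29  # z threshold from the slide
print(z, is_interesting)
```

In practice far more than five samples would be needed for stable estimates of Exp and σ.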
Link swapping
e.g., authors
e.g., citing papers
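A link swap can be sketched like this: pick two links (a, p) and (b, q) at random and replace them with (a, q) and (b, p), unless that would create a link that already exists. Every node keeps its degree, so repeated swaps move through the two-mode networks with the same degree sequence. The function names, the number of attempts, and the toy data are assumptions for illustration; this is not the authors' implementation.

```python
import random
from collections import Counter


def swap_links(edges, n_attempts, rng):
    """Randomize a two-mode edge list while preserving all node degrees.

    Each attempt picks two links (a, p) and (b, q) and rewires them to
    (a, q) and (b, p). Attempts that would duplicate a link are skipped.
    """
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, p), (b, q) = edges[i], edges[j]
        if a == b or p == q:
            continue  # swapping would change nothing
        if (a, q) in edge_set or (b, p) in edge_set:
            continue  # swapping would create a duplicate link
        edge_set.difference_update([(a, p), (b, q)])
        edge_set.update([(a, q), (b, p)])
        edges[i], edges[j] = (a, q), (b, p)
    return edges


def degrees(edges):
    """Degree of every author and every paper in an edge list."""
    return (Counter(a for a, _ in edges), Counter(p for _, p in edges))


# Toy two-mode network, invented for illustration only
edges = [("A", "p1"), ("A", "p2"), ("B", "p2"),
         ("B", "p3"), ("C", "p3"), ("C", "p4")]
rng = random.Random(0)  # fixed seed so the run is repeatable
randomized = swap_links(edges, n_attempts=1000, rng=rng)
print(sorted(randomized))
```

Counting skipped attempts as steps, rather than retrying until a swap succeeds, is a deliberate choice in this sketch; how many attempts are enough for a given network is left open here.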
Results
Author cocitation
Author (co-)citations to
12 authors from bibliometrics
12 authors from information retrieval
in Scientometrics and JASIS, 1996-2000
Same data set studied by
Ahlgren, Jarneving & Rousseau (2003)
Egghe & Leydesdorff (2009)
Leydesdorff & Vaughan (2006)
Author cocitations: cosine
Author cocitations: FDSM and z-scores
Bibliographic coupling
Bibliographic coupling of all JASIST articles, 1999-2000
n = 371
12 981 unique references
Two VOSviewer maps
cosine normalization
FDSM and z-scores
Bibliographic coupling: cosine
Bibliographic coupling: FDSM and z-scores
Conclusions
Advantages
1. Both positive and negative cooccurrences
2. Thresholds correspond to specific p-values
3. Accounts for degree variations of bottom nodes
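The link between z thresholds and p-values can be checked with a short calculation. The sketch below assumes that the z-scores are approximately standard normal, which is an assumption of this illustration; the function names are hypothetical.

```python
from math import erfc, sqrt


def two_sided_p(z):
    """Two-sided p-value of a z-score under a standard normal."""
    return erfc(abs(z) / sqrt(2))


def one_sided_p(z):
    """Upper-tail p-value of a z-score under a standard normal."""
    return 0.5 * erfc(z / sqrt(2))


z_threshold = 3.29  # z threshold quoted earlier in the slides
print(two_sided_p(z_threshold), one_sided_p(z_threshold))
```

A negative z-score beyond the same threshold would flag a pair that cooccurs less often than expected, which is what makes negative cooccurrences visible.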
Disadvantages
1. Some nodes may become isolates
2. More computationally intensive than cosine similarity
Thank you!