Pruning cooccurrence networks

Raf Guns

Bibliometric networks

Density problem

Dense networks are hard to

visualize

interpret

Solution: pruning networks

PathFinder (Schvaneveldt, 1990)

Deleting low-weight links (De Nooy, Mrvar, and Batagelj, 2005)

Cocitation and bibliographic coupling (Persson, 2010)

Threshold for cosine values (Leydesdorff, 2007; Egghe & Leydesdorff, 2009)

Cooccurrence networks

E.g. cocitation, bibliographic coupling, coauthorship…

Especially prone to density problem

Figure: two-mode network (e.g., authors and citing papers) and the cooccurrence network derived from it
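As a rough illustration of how a cooccurrence network follows from a two-mode network, the sketch below counts, for each pair of authors, the citing papers that cite both. The function name `cooccurrence_counts` and the toy data are hypothetical and not taken from the talk.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(papers):
    """Count, for every pair of authors, the citing papers that cite both.

    `papers` maps each citing paper to the set of authors it cites
    (the two-mode network). The result is the weighted one-mode network.
    """
    counts = Counter()
    for cited_authors in papers.values():
        # Every pair of authors cited by the same paper cooccurs once.
        for a, b in combinations(sorted(cited_authors), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: three citing papers, four cited authors.
toy = {
    "p1": {"A", "B", "C"},
    "p2": {"A", "B"},
    "p3": {"B", "D"},
}
cooc = cooccurrence_counts(toy)
```

Here authors A and B are cited together by two papers, so their link has weight 2. Every other cooccurring pair has weight 1, which shows how quickly a few long reference lists make the network dense.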

Methods

Steps

Based on Zweig and Kaufmann (2011): we start from a two-mode network

1. Define pattern of interest

2. Determine interestingness of cooccurrence

3. If cooccurrence is interesting, authors are linked

Why interestingness?

Highly cited author

High cooccurrence counts with many other authors

Citing paper referring to many authors under consideration

Resulting cooccurrences are less important

Determining interestingness

Here: z-score of the cooccurrence count, z = (observed cooccurrence - Exp) / σ

How to determine Exp and σ?

Estimate by sampling from Fixed Degree Sequence Model (FDSM): all two-mode networks with same node degrees

Markov Chain Monte Carlo simulation: link swapping

If p < 0.001 (two-sided, i.e. z > 3.29), we consider the link interesting
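A minimal sketch of the z-score step, assuming the sampled two-mode networks have already been generated and the cooccurrence count of one author pair has been recorded in each. The function names are hypothetical. Using the population standard deviation as the estimate of σ is an assumption. Only the 3.29 threshold comes from the slide.

```python
from statistics import mean, pstdev


def z_score(observed, sampled):
    """z-score of an observed cooccurrence count against sampled counts.

    `sampled` holds the count for the same author pair in each random
    network. Exp and sigma are estimated as its mean and standard
    deviation (population form, an assumption of this sketch).
    """
    exp = mean(sampled)
    sigma = pstdev(sampled)
    if sigma == 0:
        # No variation in the sample: the z-score is undefined.
        return None
    return (observed - exp) / sigma


def is_interesting(observed, sampled, threshold=3.29):
    """Keep the link only if its z-score exceeds the threshold."""
    z = z_score(observed, sampled)
    return z is not None and z > threshold
```

For instance, with sampled counts `[1, 2, 3, 2, 2]` an observed count of 5 is well above the threshold, while an observed count of 3 is not. In practice the sample would contain far more than five networks.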

Link swapping

Figure: link swap in a two-mode network (e.g., authors and citing papers)
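Link swapping can be sketched as follows, assuming the two-mode network is stored as a set of (author, citing paper) links. Each step takes two links (a, p) and (b, q) and rewires them to (a, q) and (b, p), so every author and every paper keeps its degree. The function name and the number of swaps are illustrative assumptions, not values from the talk.

```python
import random


def link_swap_sample(edges, n_swaps, rng=None):
    """Randomize a two-mode network while keeping every node degree fixed.

    `edges` is an iterable of (author, paper) pairs and must contain at
    least two distinct links. A swap is skipped when it would change
    nothing or would duplicate an existing link.
    """
    rng = rng or random.Random()
    edge_list = list(set(edges))
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, p), (b, q) = edge_list[i], edge_list[j]
        if a == b or p == q:
            continue  # swap would change nothing
        if (a, q) in edge_set or (b, p) in edge_set:
            continue  # swap would create a duplicate link
        edge_set -= {(a, p), (b, q)}
        edge_set |= {(a, q), (b, p)}
        edge_list[i], edge_list[j] = (a, q), (b, p)
    return edge_set


# Hypothetical toy network: four authors, three citing papers.
edges = [("A", "p1"), ("B", "p1"), ("C", "p1"),
         ("A", "p2"), ("B", "p2"), ("B", "p3"), ("D", "p3")]
randomized = link_swap_sample(edges, n_swaps=200, rng=random.Random(1))
```

Repeating this many times, with enough swaps between samples, yields the random networks from which Exp and σ are estimated. How many swaps are enough depends on network size and is not specified here.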

Results

Author cocitation

Author (co-)citations to

12 authors from bibliometrics

12 authors from information retrieval

in Scientometrics and JASIS, 1996-2000

Same data set studied by

Ahlgren, Jarneving & Rousseau (2003)

Egghe & Leydesdorff (2009)

Leydesdorff & Vaughan (2006)

Author cocitations: cosine

Author cocitations: FDSM and z-scores

Bibliographic coupling

Bibliographic coupling of all JASIST articles, 1999-2000

n = 371

12 981 unique references

Two VOSviewer maps

cosine normalization

FDSM and z-scores
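For comparison, cosine normalization of bibliographic coupling can be sketched as below, assuming each article is represented by its set of references (a binary occurrence vector). Whether the maps in the talk were built exactly this way is not stated, so this is only an illustration, and the function name is hypothetical.

```python
from math import sqrt


def cosine_similarity(refs_a, refs_b):
    """Cosine between two binary occurrence vectors given as sets.

    The numerator is the number of shared references (the coupling
    strength). The denominator normalizes by both reference list lengths.
    """
    if not refs_a or not refs_b:
        return 0.0
    return len(refs_a & refs_b) / sqrt(len(refs_a) * len(refs_b))


# Hypothetical example: two articles sharing two references.
sim = cosine_similarity({"r1", "r2", "r3"}, {"r2", "r3", "r4", "r5"})
```

Unlike the z-score, the cosine depends only on the two articles being compared. It takes no account of how often the shared references are cited by the rest of the set.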

Bibliographic coupling: cosine

Bibliographic coupling: FDSM and z-scores

Conclusions

Advantages

1. Both positive and negative cooccurrences

2. Thresholds correspond to specific p-values

3. Accounts for degree variations of bottom nodes

Disadvantages

1. Some nodes may become isolates

2. More computationally intensive than cosine similarity

Thank you!
