pruning cooccurrence networks

Pruning cooccurrence networks, by Raf Guns

Upload: rafg

Post on 13-Jul-2015


Category:

Science



TRANSCRIPT

Page 1: Pruning cooccurrence networks

Pruning cooccurrence networks

Raf Guns


Page 2: Pruning cooccurrence networks

Bibliometric networks

Page 3: Pruning cooccurrence networks

Density problem

Dense networks are hard to

visualize

interpret

Solution: pruning networks

PathFinder (Schvaneveldt, 1990)

Deleting low-weight links (De Nooy, Mrvar, and Batagelj, 2005)

Cocitation and bibliographic coupling (Persson, 2010)

Threshold for cosine values (Leydesdorff, 2007; Egghe & Leydesdorff, 2009)

Page 4: Pruning cooccurrence networks

Cooccurrence networks

E.g. cocitation, bibliographic coupling, coauthorship…

Especially prone to density problem

[Figure: a two-mode network (e.g., authors and citing papers) and the cooccurrence network derived from it]

Page 5: Pruning cooccurrence networks

Methods

Page 6: Pruning cooccurrence networks

Steps

Based on Zweig and Kaufmann (2011): we start from a two-mode network

1. Define pattern of interest

2. Determine interestingness of cooccurrence

3. If cooccurrence is interesting, authors are linked
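As a minimal sketch of the starting point for these steps, assume the pattern of interest is plain cooccurrence: two authors are cited by the same paper. The function name `cooccurrence_counts` and the toy data are hypothetical and not taken from the talk.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(two_mode):
    """Count, for each pair of bottom nodes, how many top nodes link to both."""
    counts = Counter()
    for authors in two_mode.values():
        # Sorting gives every pair a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(authors), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: citing papers (top nodes) -> cited authors (bottom nodes).
two_mode = {
    "p1": {"A", "B", "C"},
    "p2": {"A", "B"},
    "p3": {"B", "C"},
}
counts = cooccurrence_counts(two_mode)
```

These raw counts are only the input to step 2; on their own they say nothing yet about which cooccurrences are interesting.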

Page 7: Pruning cooccurrence networks

Why interestingness?

Highly cited author

High cooccurrence counts with many other authors

Citing paper referring to many authors under consideration

Resulting cooccurrences are less important

Page 8: Pruning cooccurrence networks

Determining interestingness

Here: z-score of the observed cooccurrence count, z = (observed cooccurrence - Exp) / σ

How to determine Exp and σ?

Estimate by sampling from Fixed Degree Sequence Model (FDSM): all two-mode networks with same node degrees

Markov Chain Monte Carlo simulation: link swapping

If p < 0.001 (or z > 3.29), we consider the link interesting
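The test on this slide can be sketched as follows, assuming the cooccurrence counts of one author pair have already been collected from a number of randomized networks. The names `z_score` and `is_interesting` are hypothetical. Using the population standard deviation is an assumption as well, and the sample values are made up for illustration.

```python
from statistics import mean, pstdev


def z_score(observed, sampled):
    """z-score of an observed count against counts from randomized networks."""
    exp = mean(sampled)      # estimate of Exp
    sigma = pstdev(sampled)  # estimate of sigma (population form: an assumption)
    if sigma == 0:
        return None          # no variation in the samples: z is undefined
    return (observed - exp) / sigma


def is_interesting(observed, sampled, z_threshold=3.29):
    """Keep the link only if its z-score exceeds the threshold."""
    z = z_score(observed, sampled)
    return z is not None and z > z_threshold


# Made-up counts for one author pair in eight sampled networks.
sampled = [1, 2, 1, 2, 1, 2, 1, 2]
```

With these made-up samples, an observed count of 4 lies five standard deviations above the mean and would be kept, while an observed count of 2 would be pruned.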

Page 9: Pruning cooccurrence networks

Link swapping

[Figure: link swapping in a two-mode network of, e.g., authors and citing papers]

Page 10: Pruning cooccurrence networks

Link swapping
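One common form of link swapping, sketched here as an illustration and not as the exact procedure used in the talk: pick two links (p1, a1) and (p2, a2) at random and replace them with (p1, a2) and (p2, a1), unless that would create a duplicate link. Every node keeps its degree. The names `swap_links` and `degrees`, the number of attempts, and the toy edge list are assumptions.

```python
import random
from collections import Counter


def swap_links(edges, n_attempts, seed=None):
    """Randomize a two-mode edge list while preserving all node degrees."""
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edge_list)), 2)
        (p1, a1), (p2, a2) = edge_list[i], edge_list[j]
        # Skip swaps that would duplicate an existing link; this also covers
        # the cases p1 == p2 and a1 == a2, where the swap changes nothing.
        if (p1, a2) in edge_set or (p2, a1) in edge_set:
            continue
        edge_set.difference_update([(p1, a1), (p2, a2)])
        edge_set.update([(p1, a2), (p2, a1)])
        edge_list[i], edge_list[j] = (p1, a2), (p2, a1)
    return edge_list


def degrees(edge_list):
    """Degree of every top node and every bottom node."""
    tops = Counter(p for p, _ in edge_list)
    bottoms = Counter(a for _, a in edge_list)
    return tops, bottoms


# Hypothetical toy network: (citing paper, cited author) links.
edges = [("p1", "A"), ("p1", "B"), ("p2", "B"),
         ("p2", "C"), ("p3", "C"), ("p3", "D")]
shuffled = swap_links(edges, n_attempts=100, seed=42)
```

Repeating this many times and recording the cooccurrence counts after each run yields the samples from which Exp and σ can be estimated.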

Page 11: Pruning cooccurrence networks

Results

Page 12: Pruning cooccurrence networks

Author cocitation

Author (co-)citations to

12 authors from bibliometrics

12 authors from information retrieval

in Scientometrics and JASIS, 1996-2000

Same data set studied by

Ahlgren, Jarneving & Rousseau (2003)

Egghe & Leydesdorff (2009)

Leydesdorff & Vaughan (2006)

Page 13: Pruning cooccurrence networks

Author cocitations: cosine

Page 14: Pruning cooccurrence networks

Author cocitations: FDSM and z-scores

Page 15: Pruning cooccurrence networks

Bibliographic coupling

Bibliographic coupling of all JASIST articles, 1999-2000

n = 371

12 981 unique references

Two VOSviewer maps

cosine normalization

FDSM and z-scores
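For comparison, the cosine normalization behind the first map can be sketched for bibliographic coupling as the number of shared references divided by the geometric mean of the two reference-list lengths. This assumes binary occurrence (a reference is either cited or not). The function name `cosine` and the toy reference lists are hypothetical.

```python
from math import sqrt


def cosine(references, a, b):
    """Cosine similarity of two articles based on their sets of references."""
    refs_a, refs_b = references[a], references[b]
    if not refs_a or not refs_b:
        return 0.0  # an article without references couples with nothing
    return len(refs_a & refs_b) / sqrt(len(refs_a) * len(refs_b))


# Hypothetical toy data: article -> set of cited references.
references = {
    "art1": {"r1", "r2", "r3", "r4"},
    "art2": {"r1", "r5", "r6", "r7"},
    "art3": {"r1", "r2", "r3", "r4"},
}
```

Here `cosine(references, "art1", "art2")` gives 0.25 (one shared reference out of four each), and two identical reference lists give 1.0. Unlike the z-score approach, the cutoff applied to such values does not correspond to a p-value.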

Page 16: Pruning cooccurrence networks

Bibliographic coupling: cosine

Page 17: Pruning cooccurrence networks

Bibliographic coupling: FDSM and z-scores

Page 18: Pruning cooccurrence networks

Conclusions

Advantages

1. Both positive and negative cooccurrences

2. Thresholds correspond to specific p-values

3. Accounts for degree variations of bottom nodes

Disadvantages

1. Some nodes may become isolates

2. More computationally intensive than cosine similarity

Page 19: Pruning cooccurrence networks

Thank you!