filippo radicchi graph-based ranking algorithms · f. radicchi, s. fortunato, b. markines and a....

42
Filippo Radicchi BioCenit Research Lab @ Universitat Rovira i Virgili Graph-based ranking algorithms

Upload: others

Post on 07-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Filippo Radicchi

BioCenit Research Lab @ Universitat Rovira i Virgili

Graph-based ranking algorithms

Page 2: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Collaboration networksCollaboration networks Citation networksCitation networks

Scientifi c motivationElectronic databases store a huge amount of information about scientifi c publications ( only in 2006, Journals ~104 Papers ~106 Citations ~107 )

Why analyze bibliographic data?

Page 3: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Practical motivation

Citations represent the fundamental units used to measure the scientifi c relevance of papers, journals, scientists, research groups and institutions

Why analyze bibliographic data?

Page 4: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

DisciplineDiscipline ResearcherResearcher Ass. ProfessorAss. Professor Full ProfessorFull Professor

MathematicsMathematics

PhysicsPhysics

BiologyBiology

Computer SciComputer Sci..

source: ”Indicatori di Attività Scientifi ca e di Ricerca”, Consiglio Universitario Nazionale (CUN), Dec. 2008

P = # publications, T = period of activity , C = total # of citations

ChemistryChemistry

A practical example

Page 5: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

A practical example

source: ”Abilitazione Scientifi ca Nazionale – La normalizzazione degli indicatori per l'eta' accademica (ANVUR)”, Jul. 2012

1) Number of papers:

2) Total number of citations:

I (N p , AA)=10N p

AA

I (N C , AA)=N C

AA

3) Contemporary h-index: S (i , t i , t )=4

(t−t i+1)C (i , t i , t )

AA

N p

N C

C (i , t i , t )

hc

total number of papers

academic age

total number of citations

citations accumulated by paper i published in year ti and measured in year t

A. Sidiropoulos et al., Scientometrics 72, 253 (2007)

R. Manella and P. Rossi, arXiv:1207.3499 (2012)

calculated over a population of 1400 Italian physicists

Page 6: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

On a larger sample of scientists35000 profi les on Google Scholar citations

Page 7: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

The network structure of citation data is often neglected in research evaluation

D.J. de Solla Price, Science 169, 510 (1965)

Network approachNetwork approach““Standard” approachStandard” approach

Citation countsCitation counts

h-index, g-index, ...h-index, g-index, ...

Impact factorImpact factor

papers

journals

scientists

CiteRankCiteRank

EigenfactorEigenfactor

??

Page 8: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Physical Review Series I (PRI), Physical Review (PR), Physical Review Letters (PRL), Physical Review A (PRA), Physical Review B (PRB), Physical Review C (PRC), Physical Review D (PRD), Physical Review E (PRE), Reviews of Modern Physics (RMP)

between 1893 and 2006

Paper Citation NetworkPaper Citation Network

Weighted Author Citation NetworkWeighted Author Citation Network

Graph-based ranking of scientists

Page 9: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

key-words: ”complex network” , ”scale-free network”, ”small-world network”, etc..

Weighted author citation network

Page 10: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Divide 8,783,994 total references into Divide 8,783,994 total references into homogeneoushomogeneous intervals intervals

MMII = # of intervals = # of intervals

MMRR = # of references in each interval = # of references in each interval

MMRR

2M2MRR

MMI I MM

RR(M(M

II-1)-1)

MM

RR(M(M

II-2)-2)

MM

RR00

MR/2 3M

R/2 (M

I - 1/2) M

R(MI - 3/2) M

R

MR ~ 488,000

MI = 18

20061893-1966

Dynamical representation

Page 11: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Diffusion equation

weight of the arc from j to i

out-strength of the node j

each paper carries a ”scientifi c credit”, equally divided among its authors

SARA scores depend on the choice of the redistribution probability q

Science Author Rank Algorithm

Page 12: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Science Author Rank Algorithm

Page 13: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Benchmarking SARA

Considered prizes: Nobel prize, Wolf prize, Boltzmann medal, Dirac medal and Planck medal

Comparison with different metrics

Page 14: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

NP= Nobel prize, WP= Wolf prize, BM= Boltzmann medal, DM= Dirac medal, and PM= Planck medal

Best physicists according to SARA

Page 15: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

physauthorsrank.org

Page 16: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Ranking tennis players

Page 17: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

ATP points distribution as of 2009

Grand Slams: Australian Open, Roland Garros, Wimbledon, US Open

Masters 1000: Indian Wells, Miami, Monte Carlo, Madrid, Rome, Canada, Cincinnati, Shanghai, Paris

500 Series: Rotterdam, Memphis, Acapulco, Dubai, Barcelona, Hamburg, Washington, Beijing, Tokyo, Basel, Valencia

250 Series: Doha, Chennai, Brisbane, Sydney, Auckland, ........

4

9

1140

Best results in 18 tournaments: 4 Grand Slams, 8 Masters 1000, best 4 results in 500 Series and best 2 results in 250 Series

source: wikipedia.org

ATP World Tour Finals: reserved to the best 8 players in the ranking

Page 18: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

ATP points 2008

ATP points 2009

Page 19: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

ATP data cover all tournaments since 1968

Page 20: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

The Open Era

3700 players, 3600 tournaments, 133000 matches

Page 21: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Tennis contact graph

each match is a directed edge from the loser to the winner

edges are weighted wi , j = total matches i vs. j, won by j

Page 22: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Top players in Grand Slams

Only players with at least two Grand Slam titles between 1968 and 2010

Page 23: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Tennis is “complex”

Matthew effect in career longevity, A.M. Pertersen at al., Proc. Natl. Acad. Sci. USA 108, 18 (2011)

Page 24: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Prestige score

diffusion random relocation

correction for dangling nodes

for a Grand Slam tournament

for a tournament

Page 25: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

#3 : John McEnroe

Career prize money: $12,547,797 Career record: 875–198 (81.55%) Career titles: 104 including 77 listed by the ATP

Page 26: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

#2 : Ivan Lendl

Career prize money: $21,262,417 Career record: 1071–239 (81.8%) Career titles: 144 including 94 listed by the ATP

Page 27: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

#1 : Jimmy Connors

Career prize money: $8,641,040 Career record: 1241–277 (81.75%) Career titles: 148 including 109 listed by the ATP

Page 28: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Prestige Rank

Page 29: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Relation with other scores

Page 30: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Relation with other scores2009 ATP year-end rank

Page 31: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Best player of the year

Page 32: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Best player of the year

Page 33: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Best players in Grand Slams

Page 34: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

What did people think about this ranking?

Page 35: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

What did players think about this ranking?

journalist: “There is a weird study by an American physician...”

Pete: “Who is this guy!?!?!?!”

Page 36: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

References

F. RadicchiPloS ONE 6, e17249 (2011)

F. Radicchi, S. Fortunato, B. Markines and A. VespignaniPhys. Rev. E 80, 056103 (2009)

Diffusion of scientifi c credits and the ranking of scientists

Who is the best player ever? A complex network analysis of the history of professional tennis

F. Radicchi, S. Fortunato and A. VespignaniIn Models of Science Dynamics: Encounters Between Complexity Theory and Information Sciences.Eds. A.Scharnhorst; K. Börner and P. van den Besselaar (Springer, 2012)

Citation networks

Page 37: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the
Page 38: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Can bibliographic data be used for research evaluation?

disciplinediscipline scientistscientist h-indexh-index

Physics

Edward Witten Marvin Cohen Philip W. Anderson Manuel Cardona Frank Wilczek

110 94 91 86 68

Chemistry

George Whitesides Martin Karplus

Kurt Wuthrich

135 Elias J. Corey

Alan Hegeer

132 129 114 113

Computer science

Hector Garcia-Molina Ian Foster

70 Deborah Estrin

Scott Shenker, Jeffrey D. Ullman, Don Towsley

68 67

65

P. Ball, Nature 448, 737 (2007)

Page 39: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Papers are classified in 172 scientific disciplines (from Acoustics to Zoology)

Publication yearPublication year

Number of citationsNumber of citations

PapersPapersJournalsJournals

Subject CategoriesSubject Categories

Description of the dataset

Page 40: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Source of data March 2008

Different scientific disciplines

Page 41: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Different scientific disciplines

Page 42: Filippo Radicchi Graph-based ranking algorithms · F. Radicchi, S. Fortunato, B. Markines and A. Vespignani Phys. Rev. E 80, 056103 (2009) Diffusion of scientii c credits and the

Different scientific disciplines