COMS 6998-06 Network Theory Week 11

COMS 6998-06 Network Theory Week 11. Dragomir R. Radev Wednesdays, 6:10-8 PM 325 Pupin Terrace Fall 2010. (29) Bibliometrics.


(29) Bibliometrics

Early work
- The Science Citation Index (1960)
  - More than 8,700 journals in the natural and social sciences
  - Eugene Garfield
- de Solla Price: study of networks of papers and citation patterns

Recent systems
- CiteSeer
- Rexa
- Google Scholar
- ACL Anthology Network

Garfield's indices
- Journal Citation Reports
- Impact factor: computed over a three-year period as B/A, where
  - First two years: A = number of citable items
  - Third year: B = number of citations to them
- In science (2006)
  - Science (30.03)
  - Nature (26.68)
  - PNAS (9.64)
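The B/A definition above can be sketched in a few lines of Python. The function name and the journal counts below are hypothetical and chosen only for illustration; they are not taken from the Journal Citation Reports.

```python
def impact_factor(citations_in_year3: int, citable_items_years_1_2: int) -> float:
    """Impact factor B/A: citations received in the third year (B)
    divided by the citable items published in the first two years (A)."""
    if citable_items_years_1_2 <= 0:
        raise ValueError("A (number of citable items) must be positive")
    return citations_in_year3 / citable_items_years_1_2


# Hypothetical journal: 200 citable items in years 1-2, cited 600 times in year 3.
print(impact_factor(600, 200))
```

Because the result is a ratio, it says nothing about how the citations are spread over individual papers.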

Criticism
- Favors certain fields and types of research
- Absolute value is meaningless
- Ignores certain types of scholarly work (e.g., books, software, conference papers)
- Possible to manipulate
- Self-citations
- Ignores citation type (this applies to all other metrics!)

Citation types [Weinstock 1971]

Networks of scientific papers (1965)
- In a given year, about 35% of all existing papers are not cited at all. Another 49% are cited only once. The rest are cited an average of 3.2 times each.
- Degree coefficient is about 2.5-3.0
- 7% annual growth
- Most papers are obsolete after 10 years

De Solla Price 1965
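As a quick check on the citation figures quoted from de Solla Price, the three groups can be combined into an average citation rate. This is a back-of-the-envelope sketch; it assumes the remaining share of papers is 100% - 35% - 49% = 16%.

```latex
\bar{c} \approx 0.35 \cdot 0 + 0.49 \cdot 1 + 0.16 \cdot 3.2 = 0.49 + 0.512 \approx 1.0
```

Under that assumption, an existing paper receives roughly one citation per year on average.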

Miscellaneous metrics
- Citation count
- Impact factor
- PageRank (e.g., http://www.eigenfactor.org/)
- H-index

H-index
- Proposed by Jorge Hirsch of UCSD in 2005
- Equals the largest number h such that h of your papers have each been cited at least h times.
- For physicists, 12=tenure, 18=full prof, 45=NAS (statement by Hirsch)
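The definition of h can be made concrete with a short Python sketch. The function name and the citation lists are made up for illustration; the input is assumed to be one citation count per paper.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only decrease from here on
    return h


# Hypothetical authors (citation counts per paper):
print(h_index([10, 8, 5, 4, 3]))   # 4
print(h_index([25, 8, 5, 3, 3]))   # 3
print(h_index([1000, 500]))        # 2
```

The last example shows that h can never exceed the number of papers, however heavily they are cited.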

See demo (ACL Anthology Network)
also: PoP (guess what it means?)
[Figure: plot with labels "citations", "papers", and "h"]

Criticism
- Galois's is 2 (short career)
- Hard to compare two people with the same score but very different distributions
- Varies hugely depending on the underlying database

Example

AAN   Google Scholar   Name
16    38               Ken Church
15    32               Kevin Knight
14    30               Ralph Grishman
14    33               Aravind Joshi
14    45               Hermann Ney
14    45               Fernando Pereira
13    30               David Yarowsky
12    24               Michael Collins
12    32               Chris Manning
12    32               Daniel Marcu
12    39               Kathy McKeown
12    35               Robert Mercer
12    25               Franz Och
12    25               Yves Schabes
12    34               Stuart Shieber
11    23               Eric Brill
11    37               Eugene Charniak
11    24               Ido Dagan
11    25               Mark Johnson
11    30               Philip Resnik

Recent study (An et al. 2004)
- 31.5% of the papers have been cited.
- In-degree power law coefficient 1.71
- Diameters:
  - Neural networks (n=23,371) d=24, ud=18
  - Automata (n=28,168) d=33, ud=19
  - Software eng (n=19,018) d=22, ud=16
- Largest connected components:
  - NN WCC=79.6%
  - Automata WCC=92%
  - SE WCC=87.9%

Collaboration networks [Beaver 2001; Glaenzel 2003]
- Many reasons why people collaborate:

[Paul Erdos]

(23) The Ising model
(24) Percolation on graphs

What is percolation? [Grimmett 1999]
- Will water flow through a porous stone?

- Let p be the probability that an edge is open.
- This process is called bond percolation.
- Paths (percolation) appear at p=0.5059.
- This is a quintessential example of a phase transition.
[Figure: θ(p) plotted against p; both axes marked at 1, with the point (1,1) labeled]
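Bond percolation as described above can be explored with a small simulation. The sketch below assumes an n x n square grid (the slide does not name the lattice), opens each edge independently with probability p, and uses a union-find structure to test whether open edges connect the left column to the right column. All function names are illustrative.

```python
import random


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[ra] = rb


def spans(n, p, rng):
    """True if open edges join the left and right columns of an n x n grid."""
    parent = list(range(n * n + 2))
    left, right = n * n, n * n + 1           # two virtual terminal nodes
    for r in range(n):
        union(parent, left, r * n)           # left column
        union(parent, right, r * n + n - 1)  # right column
    for r in range(n):
        for c in range(n):
            v = r * n + c
            if c + 1 < n and rng.random() < p:   # edge to the right neighbor
                union(parent, v, v + 1)
            if r + 1 < n and rng.random() < p:   # edge to the neighbor below
                union(parent, v, v + n)
    return find(parent, left) == find(parent, right)


def spanning_fraction(n, p, trials, seed=0):
    """Fraction of random trials in which a left-to-right open path exists."""
    rng = random.Random(seed)
    return sum(spans(n, p, rng) for _ in range(trials)) / trials


for p in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(p, spanning_fraction(30, p, trials=50))
```

On a finite grid the jump in the spanning fraction is smeared out; it sharpens as n grows, which is the phase-transition behavior the slide refers to.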

Example: ferromagnetism.
- The Curie point is the temperature above which there is no longer spontaneous magnetization
- Generic example of a magnetic field: [http://ibiblio.org/e-notes/Perc/ising.htm]

The Ising model
- Given a lattice in D-dimensional space.
- Each vertex can be -1 or 1.
- Configurations: specific assignments of -1 and 1
- The energy of a configuration is
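A commonly used form of the Ising energy is sketched below. The symbols are assumptions introduced here: s_i is the value (-1 or 1) at vertex i, the first sum runs over pairs of neighboring vertices, J is the coupling strength, and H is an external field.

```latex
E(S) = -J \sum_{\langle i,j \rangle} s_i s_j \;-\; H \sum_i s_i
```

With J > 0, neighboring vertices that agree lower the energy, and a positive H lowers the energy of vertices set to 1.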

In statistical physics: P(S) ~ e^{-E}

[http://ibiblio.org/e-notes/Perc/trans.htm]

Demo
- http://webphysics.davidson.edu/applets/ising/default.html
- http://stp.clarku.edu/simulations/ising/ising2d.html
- http://www.phy.syr.edu/courses/ijmp_c/Ising.html
- Ferromagnetic alignment (J>0)
- Temperature tends to break the alignment: causes the spins to randomly change their values
- External magnetic field tends to support the alignment
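The interplay of alignment, temperature, and field can also be tried offline. The following is a minimal sketch of a Metropolis simulation, which is one standard way to sample configurations with probability proportional to e^{-E/T}; it is not necessarily what the linked demos use. It assumes a 2D square lattice with periodic boundaries (side n >= 3), units in which Boltzmann's constant is 1, and the usual nearest-neighbor energy with coupling J and field H. All names and parameter values are illustrative.

```python
import math
import random


def energy(spins, J=1.0, H=0.0):
    """Energy -J * sum over neighbor pairs s_i*s_j - H * sum s_i (periodic grid)."""
    n = len(spins)
    e = 0.0
    for i in range(n):
        for j in range(n):
            s = spins[i][j]
            # Right and down neighbors only, so each bond is counted once.
            e -= J * s * (spins[i][(j + 1) % n] + spins[(i + 1) % n][j])
            e -= H * s
    return e


def metropolis_sweep(spins, T, rng, J=1.0, H=0.0):
    """Attempt n*n single-spin flips at temperature T (modifies spins in place)."""
    n = len(spins)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        s = spins[i][j]
        neighbors = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
                     + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
        dE = 2 * s * (J * neighbors + H)     # energy change if s is flipped
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i][j] = -s


def magnetization(spins):
    """Average spin value, between -1 and 1."""
    n = len(spins)
    return sum(map(sum, spins)) / (n * n)


rng = random.Random(0)
for T in (1.5, 5.0):
    grid = [[1] * 16 for _ in range(16)]     # start fully aligned
    for _ in range(50):
        metropolis_sweep(grid, T, rng)
    print(T, magnetization(grid))
```

Starting from a fully aligned grid, a low temperature should leave most spins aligned, while a high temperature should drive the average spin toward zero.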

Site percolation
- The critical value is around 0.59 but has not been derived analytically.

Demo
- http://theorie.physik.uni-wuerzburg.de/~reents/ComputationalPhysics/percgr.html
- http://ibiblio.org/e-notes/Perc/perc.htm
- http://ibiblio.org/e-notes/Perc/distr.htm
- http://stp.clarku.edu/simulations/

(15) Diffusion on graphs

Epidemics in small worlds
- Epidemic = in the limit of a large graph, a non-zero fraction is infected.
- Fully mixed networks: everyone is connected to everyone the same way.
- In real life this is not true.
- Let f = average number of shortcuts per vertex.
- Let k = 1: every vertex is connected to at least its one nearest neighbor.
- For large L (#vertices), the prob. that two random vertices have a shortcut is:
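One way to estimate this probability is sketched below. It rests on an assumption made here for illustration: the fL shortcuts are placed independently, each joining a pair of vertices chosen uniformly at random, so a single shortcut joins a given pair (u, v) with probability 2/L^2.

```latex
\Pr[\text{shortcut between } u \text{ and } v]
  = 1 - \left(1 - \frac{2}{L^{2}}\right)^{fL}
  \approx \frac{2f}{L} \qquad \text{for large } L
```

The approximation keeps only the leading term of the expansion, so the chance that any particular pair shares a shortcut shrinks as 1/L.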

Moore and Newman 2000. Epidemics and Percolation in small-world networks.

Moore and Newman 2000 contd


More recent work
- Newman 2002
- Outbreak size distribution
- Degree of infected individuals
- Bipartite graphs

