Resolving an Anomaly www.eprints.org
You Can Get There From Here!
How a few simple actions can restore sense to it all
H. G. Wells, World Brain: The Idea of a Permanent World Encyclopaedia
Encyclopédie Française, August, 1937
• Encyclopaedias of the past sufficed for the needs of a cultivated minority
  – universal education was unthought of
  – gigantic increase in recorded knowledge
• Discontent with the role of universities and libraries in the intellectual life of mankind
• Universities multiply but do not enlarge their scope
  – thought & knowledge organization of the world
• No obstacle to the creation of an efficient index to all human knowledge, ideas and achievements
The Optimal and Inevitable for Researchers
All of this will come to pass. The only question is “How Soon?”
• The entire full-text refereed corpus online
• On every researcher’s desktop, everywhere
• 24 hours a day
• All papers citation-interlinked
• Fully searchable, navigable, retrievable
• For free, for all, forever
The Subversive Proposal: Sufficient to free entire refereed corpus forever, immediately:
1. Universities install off-the-shelf, OAI-compliant Eprint software
2. Authors self-archive (preprints & postprints)
3. Institutions subsidize first start-up wave of self-archiving
4. The Give-Away corpus is freed
___________________________________________________________
Hypothetical Sequel:
5. Users prefer free version?
6. Publisher S/L/P (subscription, site-license, pay-per-view) revenues shrink, Library S/L/P savings grow?
7. Publishers downsize to QC/C (quality-control/certification) service-providers + optional add-ons?
8. QC/C service costs funded by author-institution out of reader-institution S/L/P savings?
Five Essential PostGutenberg Distinctions:
(if you don’t make them, none of this will make sense)
1. Distinguish the non-give-away vs. give-away literature
Litmus test: “Does the author seek a royalty/fee?”: books (yes) vs. refereed journal papers (no)
2. Distinguish income (from paper sale) vs. impact (from paper use)
(and distinguish give-away-author imprint-income [0] vs. impact-income [??])
3. Distinguish give-away author copyright protection from: theft-of-authorship (wanted) vs. theft-of-text (unwanted)
4. Distinguish self-publishing (vanity press) vs. self-archiving (of published, refereed research)
5. Distinguish unrefereed preprints vs. refereed postprints
“eprints” = preprints + postprints
Zeno’s Prima-FaQs
“I worry about self-archiving because…”:
1. Preservation
2. Authentication
3. Corruption
4. Navigation (info-glut)
5. Certification
6. Evaluation
7. Peer review
8. Paying the piper
9. Downsizing
10. Copyright
11. Plagiarism
12. Priority
13. Censorship
14. Capitalism
15. Readability
16. Graphics
17. Publishers’ future
18. Libraries’ future
19. Learned Societies’ future
20. University conspiracy
21. Serendipity
22. Tenure/Promotion
23. (your prima-FaQ here…)
Answers available at < http://cogsci.soton.ac.uk/~harnad/Tp/resolution.htm >
www.eprints.org
Rob Tansley’s Eprints < www.eprints.org > is dedicated to freeing the research literature, pre- and post-refereeing, through author/institution self-archiving in interoperable Open Archives < www.openarchives.org >.
To help the self-archiving initiative quickly gain momentum, archive-creating software, compliant with the OAI protocol and hence fully interoperable with all other Open Archives, has been developed at the University of Southampton.
Eprints is designed to be as flexible and adaptable as possible, so that all universities world-wide can immediately adopt and configure it with minimal effort for all their disciplines’ self-archiving needs.
The Eprints software has been available (for free, of course) from eprints.org since December 2000.
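To make "OAI-compliant, hence interoperable" concrete, here is a minimal sketch of how a harvester might talk to any such archive. It assumes the later OAI-PMH 2.0 form of the protocol with unqualified Dublin Core (`oai_dc`) metadata; the base URL, the sample response, and both function names are illustrative assumptions, not details of the Eprints software.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical base URL of an institutional archive (an assumption).
BASE_URL = "http://eprints.example.edu/cgi/oai2"

# XML namespaces used by OAI-PMH 2.0 responses carrying Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for an OAI-PMH archive."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since then
    return base_url + "?" + urlencode(params)


def titles_from_response(xml_text):
    """Extract every Dublin Core title from a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iterfind(".//dc:title", NS)]


# Made-up response, shaped like what an archive would return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:eprints.example.edu:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An illustrative self-archived postprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

Because every compliant archive answers the same request in the same format, one harvester of this kind can collect metadata from all of them, which is what lets distributed institutional archives behave like a single searchable corpus.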
From Linear Growth to Exponential
Figure: Deposit Rates (vertical axis) against Time (horizontal axis).
• arXiv submission rates: linear growth only
• 30% of citations to papers deposited in arXiv
• Exponential growth in archiving to catch up with paper-based research
• 100% of papers archived, in all disciplines
Figure: Multiple Updates by LANL Subfield (based on LANL meta-data). Horizontal axis: No. of Papers with Updates, 0 to 25,000. Series: No Updates, 1 Update, 2 Updates, 3 Updates, 4 Updates. Subfields: adap-org, astro-ph, chao-dyn, comp-gas, cond-mat, cs, gr-qc, hep-ex, hep-lat, math, math-ph, nlin, nucl-ex, nucl-th, patt-sol, physics, quant-ph, solv-int, hep-th, hep-ph.
Citation Linking & Scientometric Analysis
New OAI Services
• The raw data show that the latency of the citation peak has been shortening over the period of the archive
Figure: Frequency of Citation Latencies, 1992-1999. Horizontal axis: Time Difference (months), 0 to 96. Vertical axis: Citations, 0 to 5,000. One series per year: 99, 98, 97, 96, 95, 94, 93, 92.
Decreasing Citation Latencies
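A citation-latency histogram of this kind can be sketched as follows. The sketch assumes that latency means the number of months between the deposit of the cited paper and the deposit of the citing paper; the slide does not define it, and the function names and input format are illustrative only.

```python
from collections import Counter


def months_between(earlier, later):
    """Whole-month difference between two (year, month) pairs."""
    return (later[0] - earlier[0]) * 12 + (later[1] - earlier[1])


def latency_histogram(citations):
    """Count citations per latency, in months.

    `citations` is an iterable of (cited_deposit, citing_deposit) pairs,
    each deposit being a (year, month) tuple. Citations whose latency
    would be negative are skipped.
    """
    counts = Counter()
    for cited, citing in citations:
        lag = months_between(cited, citing)
        if lag >= 0:
            counts[lag] += 1
    return counts


def peak_latency(counts):
    """Latency (in months) at which the most citations occur."""
    return max(counts, key=counts.get)
```

Building one histogram per deposit year and comparing the `peak_latency` of each would show whether the peak moves toward zero over time, which is the trend the slide reports.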
The “New Paper Rush”
• Users subscribe to an email alerting service that informs them of new papers.
Figure: Age of paper against number of downloads. Horizontal axis: Age of Paper (days). Vertical axis: Number of Downloads, 0 to 50,000.
Figure: hep-th. Horizontal axis: half-yearly ticks from 1991 07 to 2000 01. Vertical axis: Papers, 0 to 200. Series: With J-R, With J-R/Report, Report, Unknown.
• The number of papers with a journal reference [J-R] overtakes the number without a J-R at an age of 13 months, suggesting a time difference of 13 months between preprint and postprint
Article Embryology
• The papers were split into three sets based on the number of citations to them.
• There are an equal number of citations to the papers in the low, medium and high sets.
Effect of Paper Impact
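One way to obtain sets with equal citation totals is to rank the papers by citation count and cut the ranking wherever the running total crosses each third of all citations. The sketch below takes that approach; it is an assumption about how such a split could be made, not a description of the procedure actually used for these slides, and the function name is hypothetical.

```python
def split_by_citation_share(citation_counts, n_sets=3):
    """Split papers into sets holding roughly equal shares of all citations.

    `citation_counts` maps a paper id to its number of citations. Papers
    are ranked from most to least cited, and each paper is assigned by the
    share of all citations already accounted for by the papers ranked
    above it. Returns a list of lists of paper ids, highest impact first.
    """
    ranked = sorted(citation_counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(citation_counts.values())
    sets = [[] for _ in range(n_sets)]
    running = 0
    for paper, cites in ranked:
        if total:
            index = min(n_sets - 1, running * n_sets // total)
        else:
            index = n_sets - 1  # nothing is cited: everything is low impact
        sets[index].append(paper)
        running += cites
    return sets
```

Because citations are concentrated in a few papers, the high set produced this way is small and the low set is large, even though each holds about the same number of citations.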
• High impact authors update more than medium or low
• High and medium impact authors deposit more papers than low
Quartile    Total    % Total   Citations   Papers   Citations/Author/Paper   Deposits   Mean Updates/Author
High 25%      798     2.09%     240,092     2,732   0.11                        6,720   0.48
Med 50%     9,262    24.20%     733,272    37,318   0.00212                    93,671   0.37
Low 25%    28,211    73.71%     251,925    67,951   0.000131                  165,971   0.27
Author Impact Quartiles
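The derived columns of the quartile table can be re-computed from its raw counts. The sketch below assumes that "Total" is the number of authors in each band, that "% Total" is each band's share of all authors, and that "Citations/Author/Paper" is citations divided by authors times papers. These readings are inferred from the numbers, which they reproduce after rounding; the slide does not state them.

```python
# Raw counts copied from the Author Impact Quartiles table.
ROWS = {
    "High": {"authors": 798, "citations": 240_092, "papers": 2_732},
    "Med": {"authors": 9_262, "citations": 733_272, "papers": 37_318},
    "Low": {"authors": 28_211, "citations": 251_925, "papers": 67_951},
}


def author_share(rows):
    """Percentage of all authors that falls in each band."""
    total = sum(row["authors"] for row in rows.values())
    return {name: 100 * row["authors"] / total for name, row in rows.items()}


def citations_per_author_per_paper(row):
    """Assumed definition: citations / (authors * papers)."""
    return row["citations"] / (row["authors"] * row["papers"])


shares = author_share(ROWS)
rates = {name: citations_per_author_per_paper(row) for name, row in ROWS.items()}
```

On this reading, about 2% of authors account for roughly a fifth of all citations, which matches the skew the table is meant to show.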
Citation Quality
• Papers generally cite papers of like impact (2 underway).
Figure: Do Papers Cite Papers of Like Impact. Axes: Source Impact (Low, Medium, High), Dest. Impact (Low, Medium, High), No of Citations (0 to 140,000).
Figure: Histogram of Citations per Paper (author impact). Horizontal axis: No citations, 1 Citation, 2/3 Citations, 4/5/6 Citations, 7/8/9/10 Citations, 11 or more Citations. Vertical axis: Papers, 0 to 40,000. Series: High (2.53%), Medium (34.55%), Low (62.92%).
30,000 papers were by authors with no citation
• A small number of papers receive a very large number of citations
Citation Spread
Effect of Paper Impact on Usage
• Higher impact papers have a longer download life expectancy.
Figure: All Papers. Horizontal axis: Age of paper (days), 0 to 2,398. Vertical axis: Frequency Density, 0 to 0.0025. Series: High (2.0%), Medium (7.7%), Low (46.5%), Unknown (39.6%).
Correlating citations and downloads
• There is a significant positive correlation between citations and downloads for high impact papers.
Download type                    r          n
All Papers                       0.11155    63671
High Impact Papers (2.0%)        0.27293    1981
Medium Impact Papers (7.7%)      0.01288    5937
Low Impact Papers (46.5%)       -0.01412    30163
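The r values in the table read as Pearson correlation coefficients between per-paper citation counts and download counts, with n the number of papers. The sketch below shows how such a coefficient is computed and how r and n convert into a t statistic with n - 2 degrees of freedom, a common way to judge significance. Both the Pearson reading and the choice of test are assumptions; the slide does not say how significance was assessed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# r and n for the high impact row of the table above.
t_high = t_statistic(0.27293, 1981)
```

The larger n is, the smaller a correlation needs to be to count as statistically significant, so the size of r and its significance are worth reading separately in a table like this.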