SURVEYING USAGE OF ACADEMIC RESEARCH IN JOURNALISM
How can we use authorship and citations to better understand information diffusion between popular media and academic articles for the purpose of informing the general public as well as the academic community?
PROBLEM STATEMENT
Logan Walls - Researcher
Isabelle Edwards - Researcher
Tin Ho - Project Manager
PROCESS
RESULTS
Special thanks to Dr. Jevin West and Dr. Emma Spiro of the UW’s DataLab for guiding us along in our project
Eigenfactor is a metric used to measure the influence of academic publications: by employing a similar algorithm to PageRank, the influence of each article is not merely determined by the number of citations it receives, but also by the influence of the papers which cite it. By plotting the number of academic citations in a New York Times article against the average Eigenfactor of those citations we show a significant positive correlation (Pearson's r = 0.31, p < 0.0001).
Initially we interpreted this as a difference in research styles between journalists, but upon examining the same variables aggregated by journalist the correlation was much weaker, suggesting that the pattern we observe between citation counts and Eigenfactor is more content-driven than journalist-driven.
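The comparison described above (per-article correlation versus the same variables aggregated by journalist) can be sketched as follows. This is a minimal illustration, not the project's actual analysis code: the function names `pearson_r` and `mean_by_journalist` and the row layout are assumptions, and no real data from the study is used.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def mean_by_journalist(rows):
    """Average the per-article values for each journalist.

    rows: iterable of (journalist, citation_count, avg_eigenfactor),
    one tuple per article (a hypothetical layout).
    Returns {journalist: (mean_citation_count, mean_avg_eigenfactor)}.
    """
    sums = {}
    for journalist, count, eigenfactor in rows:
        total = sums.setdefault(journalist, [0.0, 0.0, 0])
        total[0] += count
        total[1] += eigenfactor
        total[2] += 1
    return {j: (c / n, e / n) for j, (c, e, n) in sums.items()}
```

Running `pearson_r` once on the per-article columns and once on the per-journalist means gives the two correlations that the paragraph above compares.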
PERRI KLASS, M.D
JANE E. BRODY
DOUGLAS QUENQUA
ASHLEY TAYLOR
NICHOLAS WADE
MELANIE WARNER
PETER ANDREY SMITH
DANIELLE OFRI, M.D
KENNETH CHANG
ALAN SCHWARZ
JAMES GORMAN
TARA PARKER-POPE
CATHERINE SAINT LOUIS
ABBY ELLIN
PAULA SPAN
ANAHAD O'CONNOR
BARRY MEIER
PAULINE W. CHEN, M.D
HARRIET BROWN
KAREN WEINTRAUB
PAM BELLUCK
SABRINA TAVERNISE
DONALD G. McNEIL Jr
JESSICA NUTIK ZITTER, M.D
JAN HOFFMAN
NICHOLAS BAKALAR
JILL WERMAN HARRIS
DAVID STIPP
DENISE GRADY
GINA KOLATA
DEBORAH BLUM
LAURA GEGGEL
RICHARD A. FRIEDMAN, M.D
BENEDICT CAREY
JUDITH GRAHAM
LAURIE TARKAN
KATHERINE BOUTON
LAURA BEIL
ALAN SCHWARZ and SARAH COHEN
PAULINE CHEN, M.D
RONI CARYN RABIN
ANAHAD O'CONNOR and KAMARA SWABY
DAVID DOBBS
ABIGAIL ZUGER, M.D
SOPHIE EGAN
GRETCHEN REYNOLDS
Annette Peters
Thomas J. Esposito
Aamar Sleemi
Johanna Penell
S. Moebus
Elena Losina
Gerard Hoek
D. Leone
Toshinori Murayama
Jodeanne Bellant
M. M. Hilgeman
S. C. Smith
M. A. Cohen
Kelly G. Baron
W. E. Haley
D. Gandell
Thomas E. Young
Stephen D. Wexner
Frank Gilliland
Johan Auwerx
Rita I. Kirk
Paula Lozano
Patrick J. Brown
Y. Santalucia
Amber Thornton-Bullock
Benson Silverman
Gina S. Lovasi
Renee Marquett
Ladson Hinton
V. Rozalski
Rochelle P. Walensky
Fu-Chen Chen
Yun Wang
James H. Ware
C. Arden Pope
J. R. Brook
Ana V. Diez-Roux
J. Long
T. Lumley
Philip Greenland
Karl-Heinz Jöckel
Xavier Basagaña
S. E. Straus
J. N. Leonard
Delores Gallagher-Thompson
Morton Lippmann
Sverre Vedal
Bruno Kajiyama
L. W. Thompson
Kevin Chan
Joseph B. Tomlins
Georgina Charlesworth
Jane G Muir
M Garaulet
Eunice Rodriguez
Kateryna Fuks
Raimund Erbel
Olaoluwa Okusaga
K. L. Thompson
Martin I. Meltzer
Steven B. Abramson
Paul Fischer
H. Kyriazopoulos
Adam A. Szpiro
Sapphire Li
David B. Allison
Larry E. Beutler
Petros Koutrakis
Conrad P. Earnest
Peter C Gøtzsche
Lianne Sheppard
Julian Montoro-Rodriguez
J. M. Holland
Ulf de Faire
Frank E. Speizer
Henry Brodaty
P. D. Sampson
J. F. Ludvigsson
B. T. Mausbach
Benoît G. Bardy
Lynn Buckvar-Keltz
Margaret D. Carroll
Rebecca Peng
Mark J. Travers
Phyllis C. Zee
Wendy J. Mack
L.W. Thompson
Peng-Chih Wang
N. Dragano
Kristine Yaffe
F. Sun
Miriam Fuchsluger
Alma Au
Linda Lam
Q Yang
Kathryn J. Reid
Annette M. Hartmann
J. E. Manson
Jaume Marrugat
Naoharu Iwai
R. G. Barr
Eric de Groot
Simon Hales
R. Graham Barr
Timothy Gould
Kenneth H. Mayer
Frank R. Lewis
Patrick Kinney
Pere Puigserver
Yea-Ing Shyu
Laurie LaBree
Damiano Baldassarre
R. A. Kronmal
A. Bhatnagar
Nico Dragano
Teodor T. Postolache
Daniel Jimenez
Alan Schatzberg
D. Mozaffarian
Douglas W. Dockery
Dolores Gallagher Thompson
Ann Bilbrey
N. Lehmann
Cristine D. Delnevo
Nina Kraus
Barbara Hoffmann
Ann Rojas-Cheatham
A. H. Auchincloss
J. Keeler
Ruth M. O’Hara
Aleksandra Stepanenko
Ray Chan
K. Mann
P. H. R. Green
Jamie J. Coleman
D. E. Bild
Julia Dratva
M. A. Mittleman
Sanjay Rajagopalan
B. Draper
Bernardo Beckerman
Bert Brunekreef
Andrea Z. LaCroix
Kristen Shepherd
Jacqueline S Barrett
Sebastien Villard
S. Rajagopalan
Roberto Elosua
Majid Ezzati
D. S. Sanders
Vinnie Cheung
Diana M. Thomas
Dianna Jacob
Heather L. Gray
Terry Gordon
Rodney U. Anderson
K Lewis
Knut Kröger
Aixia Wang
Annecy Majoros
Evan D Newnham
Thomas A. Stoffregen
Gloria Reeves
Robert I. Grossman
Richard J. Shaw
S. Mohlenkamp
Daniel B. Jones
C. L. Curl
E. C. Saenz
Natasha Sokol
Paul Lichtenstein
David Wise
Sharona B. Ross
Michelle M. Mielke
Jeremy A. Sarnat
M. M. Walker
R. W. Allen
Marc Triola
Lynn C. Waelde
Anjum Hajat
Edward Avol
I-Min Lee
D. W. Coon
Karen Hinckley Stukovsky
Susanne Moebus
Howard N. Hodis
Kevin R. Fontaine
Michael Memmesheimer
Peibin Yue
Yawen Yu
Mary Hrywna
Alison D. Schecter
Patrick Leung
M. Memmesheimer
R. Erbel
R. V. Luepker
D. Siscovick
Peter R Gibson
Axel Schmermund
W. Edryd Stephens
Steven Shea
V. Tsui
Eric D. Peterson
Rob Beelen
R. S. Allen
M. Rubert
Paul D. Sampson
Jessica R Biesiekierski
Meng-kong Wong
Joseph F. Polak
Man Kin Lai
Yuan Marian Tzuang
Pey-Chyou Pan
Sa Liu
M. Hadjivassiliou
C. Ciacci
Michael A. Cucciare
D. M. Lloyd-Jones
Cristine Delnevo
Ryan W. Allen
Tracy Ayers
D. W. Durkin
S. Katharine Hammond
Andrés Losada
S. D. Adar
D. A. Leffler
Francesca Dominici
Tamiko Eto-Iwase
Robert Steinbrook
Xiping Xu
Dimitri A. Christakis
Hermann Jakobs
P Gómez-Abellán
H. L. Gray
John Peters
WILLIAM L. HASKELL
Christopher R. Braden
D. R. Jacobs
K. E. A. Lundin
Lee L. Swanstrom
L. Nichols
O. Berenfeld
Karl Klontz
F A J L Scheer
Sidney C. Smith
A. Drewnowski
C. A. Depp
Susan S. Swan
A. P. Spira
K.-H. Jockel
Mio Yamashita
Michael Jerrett
Robert Reiser
Bettina Konte
Marcus Bauer
Paul Mowery
Nigel Field
J. C. Bai
Victoria Harnik
Sara D. Adar
A. Fasano
Christine Moran
Nagalingeswaran Kumarasamy
A. M. Casillas
Bryan Forrester
Mari Tervaniemi
T. Arguëlles
J. Kaufman
J. DeCoster
Martha E. Fay
Robert H. Yolken
Pathmaja Paramsothy
Y. G. Rabinowitz
Joseph M. Currier
Pooja S. Tandon
T. G. Franklin
Timothy V. Larson
Andrew W. Correia
Michelle T. Bover Manderski
Gavin M. Bidelman
Lung Chi Chen
K. Kaukinen
B. Mausbach
R. O'Hara
Ronald C. Petersen
Hui-Qi Tong
Larry Thompson
Alan F. Schatzberg
Yumiko Hiura
Yaron G. Rabinowitz
Marie S. O'Neill
John A. Painter
Heather Gray
Thomas Lumley
A. A. Szpiro
Didier Moatti
Jeffrey H. Sullivan
Matthew Budoff
Michelle W. Voss
Michael H. Criqui
F. Holguin
B. Astor
Maurice E. Arregui
S. A. Beaudreau
M. Bundookwala
Kam-Mei Lau
D. H. O'Leary
Mary M. Machulda
Kristy Lee
A. Stang
L. D. Burgio
M. J. Budoff
Rosalie V. Caruso
Ximei Jin
Michael Nonnemacher
Jason M. Holland
James D Doecke
Samer G. Mattar
S. L.-J. Liu
Alejandro Lucia
Raquel Garcia-Esteban
Rosebud O. Roberts
Kenneth A. Freedberg
Karol Watson
Andrew S. Kern
Susan J Shepherd
Jess Leung
Johanna Rengifo
Gary Mallach
Christoph Kessler
Sylvain Moreno
Ruth M. O'Hara
Melissa Haines
Bo Lu
Alicia Bourne
M. L. Daviglus
Y. Hong
Cuno S. Uiterwaal
Timothy Sawyer
M. Justin Byron
D. G. Thompson
David S. Knopman
Karen D. Stukovsky
B. Hoffmann
Paul Sampson
Daniel E. Jimenez
Renee M. Marquett
Keith Sudheimer
Andreas Stang
Nathan D. Wong
JoAnn E. Manson
Ruth O’Hara
Brent T. Mausbach
Jennifer Cullen
R. D. Brook
Masayuki Yokode
Fritz Francois
David W. Coon
Hui-jing Lu
A. Schmermund
Bernardo Beckermann
Edward A. Gill
Dolores Gallagher-Thompson
M. Rothkopf
J J Alburquerque-Béjar
Robert Detrano
C. Shanley
Robert H. Eckel
Y-C Lee
David S. Siscovick
N. Carragher
Ana V. Diez Roux
Laura Perez
Martin A. Cohen
Paul T. Williams
Bryan Pogue
Michelle T. Bover-Manderski
L. Whitsel
Carrie Breton
Richard J. O’Connor
Wai-Chi Chan
Landon Myer
Chuan Zhou
J. I. Rotter
Michael M. Awad
Patricia M. Griffin
A. Peters
A. Navas-Acien
H. Kraemer
John D. Spengler
A. V. D. Roux
Jing Shiang Hwang
J. H. Stein
Amy H. Auchincloss
David R. Jacobs
Mary E. Klingensmith
David Hardie
Lea Liviakis
Natalie Rasgon
Teri L. Hernandez
Man-Kin Lai
Bruce D. Schirmer
Rafael Rivera
Rebecca M. Minter
Shelli R. Kesler
Kristin A. Miller
Rashmi Gupta
J M Ordovás
Cynthia L. Curl
Jochen Seissler
José M. Martinez
D. Gallagher-Thompson
A. Mollina
Melen McBride
Duncan Thomas
G. L. Burke
Alberto Ascherio
Robert V. Tauxe
Joel D. Kaufman
Philip J. Atkinson
Morris E. Franklin
Dan Rujescu
Armin Azar
Robert M. Hoekstra
Sang E. Lee
T Oliver
J. A. Murray
Weiling Liu
C W Woods
Patricia Langenberg
Ina Giegling
Leonardo H. Tonelli
C. A. Pope
Mianhua Zhong
R. J. Tiongson
Jennifer Beal
L. Chen
Frederick J. Angulo
John Di Mario
L. Sheppard
F. Zingone
F. Biagi
S. M. H. Alibhai
Nino Künzli
Qiang Li
C. P. Kelly
Anne Ho
Melvin Rosenfeld
Helen H. Suh
D. Rohan Jeyarajah
D. S. Siscovick
Michiel L. Bots
Garnet L. Anderson
N. Solano
Danielle China
Merce Medina
Yasuharu Niwa
Robert P. Reiser
Jane A. Allen
A. V. Diez Roux
L.-J. Sally Liu
Matthew Allison
Kala Mehendra Mehta
Abdullah A. Al Rabeeah
J. D. Kaufman
T. V. Larson
Ralph W. Aye
J. M. Donelan
Andrew Futterman
A. F. Schatzberg
Stephanie von Klot
Chunli Quan
Xavier Basagana
Larry W. Thompson
A. V. Diez-Roux
Arthur F. Kramer
Stefan Möhlenkamp
Ziad A. Memish
T. Raghunathan
Qinghua Sun
R. P. Tracy
Meir J. Stampfer
Maren Schmidt-Kassow
Benjamin G. Ferris
Adnan A. Alseidi
Virissa Lenters
Hugh Davies
David V. Feliciano
Grace S. Rozycki
Mary Ann Hopkins
Peter M Irving
Author Network
The figure above is a network of all authors in our data set. The blue nodes are academic paper authors, and the green nodes are New York Times journalists. Each line coming from a journalist node is a citation to an academic article. This network only contains academic nodes that have received more than 5 citations and New York Times nodes that cite more than 5 articles. The size of each node is determined by the number of citations it has received (academic authors) or the number of papers it has cited (journalists).
The journalists Deborah Blum and Nicholas Bakalar are shown to have cited many of the same academic articles (shown on the top left of the network). Similarly, the journalists Abby Ellin and Judith Graham are also shown to cite many of the same academic authors. Journalists in the center of the network have citations to authors all over the network, and do not seem to overlap much with any of the other journalists. The journalists at the bottom of the network have a similar number of citations to many of the other journalists, but are shown to have cited fewer academic authors. This could mean that they have cited a smaller sample of authors on several occasions, or that they cited many different authors fewer than five times. Journalists on the edges (with no connections to academic authors) do cite more than five authors, but do not cite those authors more than five times.
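The thresholding described above can be sketched roughly as follows. This is an illustration under assumptions, not the code used to draw the figure: the edge list format, the function name `filter_network`, and the choice to count one journalist-author pair per citation are all hypothetical.

```python
from collections import Counter

def filter_network(edges, min_count=5):
    """Keep only the heavily cited authors and heavily citing journalists.

    edges: list of (journalist, author) pairs, one pair per citation
           (an assumed layout).
    Returns (kept_journalists, kept_authors, weights), where weights maps
    each surviving (journalist, author) edge to its citation count.
    """
    cited = Counter(author for _, author in edges)
    citing = Counter(journalist for journalist, _ in edges)
    # "More than 5" is a strict inequality, matching the figure description.
    kept_authors = {a for a, n in cited.items() if n > min_count}
    kept_journalists = {j for j, n in citing.items() if n > min_count}
    weights = Counter(
        (j, a) for j, a in edges
        if j in kept_journalists and a in kept_authors
    )
    return kept_journalists, kept_authors, weights
```

Under this sketch, a journalist who cites many different authors only a few times each passes the journalist threshold but ends up with no surviving edges, which is one way the unconnected journalist nodes on the edges of the network could arise.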
Number of Citations vs. Average Eigenfactor
x-axis: Number of Academic Papers Cited (0 to 45)
y-axis: Avg. Eigenfactor of Papers Cited (0.0e+00 to 1.2e-06)
Pearson r = 0.30, p < 0.0001
*There is a single outlier node not shown on the graph
RAW DATA COLLECTION
Obtained data from New York Times application programming interface (API)
Collect metadata about all of the articles which match our query.
Includes web URLs, headlines, keywords, publication dates, word counts, and more for each article
Use ontology query to retrieve web URLs of entities which have the “scholarly publication” attribute
Used regex to extract domain names from URLs
Combine domain names to create a list of academic publication web domains
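The domain-extraction step above can be sketched like this. The regular expression and the helper names `extract_domain` and `build_domain_list` are assumptions for illustration; the project's actual pattern is not given in the source. Stripping a leading "www." is also an added choice.

```python
import re

# scheme://[userinfo@]host[:port]/...  (captures only the host part)
DOMAIN_RE = re.compile(
    r"^[a-z][a-z0-9+.-]*://(?:[^/@]*@)?([^/:?#]+)", re.IGNORECASE
)

def extract_domain(url):
    """Return the lowercased host of a URL, without 'www.', or None."""
    match = DOMAIN_RE.match(url.strip())
    if not match:
        return None
    domain = match.group(1).lower()
    return domain[4:] if domain.startswith("www.") else domain

def build_domain_list(urls):
    """Combine the domains of many URLs into one sorted, de-duplicated list."""
    return sorted({d for d in map(extract_domain, urls) if d})
```

For example, `build_domain_list` applied to the URLs returned by the ontology query would give the list of academic publication domains used in the next step.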
Parsed the HTML received from the API with BeautifulSoup to find scholarly documents
Save links which appear to be citations (any URL in the list of academic publication domains)
URLs that contain ‘pubmed’, ‘.gov’, ‘.edu’, ‘doi’, ‘abstract’, or ‘pdf’ are also suspected to be citations
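A rough sketch of the link-filtering rule above follows. To keep it dependency-free it uses the standard library's `html.parser` in place of BeautifulSoup, which is a substitution and not what the project used. The names `LinkCollector`, `is_suspected_citation`, and `find_citation_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Substrings listed in the process description above.
CITATION_HINTS = ("pubmed", ".gov", ".edu", "doi", "abstract", "pdf")

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def is_suspected_citation(url, academic_domains):
    """True if the URL is on a known academic domain or contains a hint."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in academic_domains:
        return True
    lowered = url.lower()
    return any(hint in lowered for hint in CITATION_HINTS)

def find_citation_links(html, academic_domains):
    """Return the links in an article's HTML that look like citations."""
    collector = LinkCollector()
    collector.feed(html)
    return [u for u in collector.links
            if is_suspected_citation(u, academic_domains)]
```

Because the substring hints are broad (any URL containing "pdf" or ".gov" matches), a filter like this is likely to produce false positives, which is consistent with the source calling these links "suspected" citations.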
Collected digital object identifiers (DOI) from scholarly documents
If link leads to an HTML page, parsed the page for DOIs using BeautifulSoup and regex
If the link leads to a PDF, parse XML file generated by GROBID machine learning package for DOIs
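DOI extraction by regex, as described above, might look like the sketch below. The pattern is a common heuristic and not the project's own expression, and the trailing-punctuation cleanup is an added assumption that can clip the rare DOI ending in ")". The same function could be run on page text or on the text of the XML that GROBID produces from a PDF.

```python
import re

# A DOI starts with "10.", a 4-9 digit registrant code, a slash, then a suffix.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')

def extract_dois(text):
    """Return the distinct DOIs found in text, in order of first appearance."""
    seen = set()
    dois = []
    for raw in DOI_RE.findall(text):
        # Drop sentence punctuation that the greedy match may have swallowed;
        # DOIs are case-insensitive, so lowercase them for de-duplication.
        doi = raw.rstrip(".,;)").lower()
        if doi not in seen:
            seen.add(doi)
            dois.append(doi)
    return dois
```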
Obtained metadata through DOI lookup service
For each DOI, send a request to http://dx.doi.org for a response in turtle format
Parse response using regex to retrieve DOI metadata (including article title, authors, publisher, etc.)
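The lookup step above can be sketched as a request with an `Accept: text/turtle` header followed by a regex over the response. The predicate URI in `TITLE_RE` is an assumption about what the turtle response contains and should be checked against a real response; the helper names are hypothetical, and only the title field is handled here.

```python
import re
import urllib.request
from urllib.parse import quote

def build_doi_request(doi):
    """Build (but do not send) a request for a DOI's metadata in turtle."""
    url = "http://dx.doi.org/" + quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "text/turtle"})

# Assumed predicate for the article title; a quoted literal follows it.
TITLE_RE = re.compile(
    r'<http://purl\.org/dc/terms/title>\s+"((?:[^"\\]|\\.)*)"'
)

def parse_title(turtle_text):
    """Return the first title literal in a turtle document, or None."""
    match = TITLE_RE.search(turtle_text)
    return match.group(1) if match else None

def fetch_title(doi, timeout=10):
    """Send the request and parse the title (requires network access)."""
    with urllib.request.urlopen(build_doi_request(doi), timeout=timeout) as resp:
        return parse_title(resp.read().decode("utf-8"))
```

Regex parsing matches the approach named in the source; a dedicated RDF parser would be more robust to variations in how the turtle is serialized.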
Requested Eigenfactor and open-access information from Dr. Jevin West for each DOI
Look at any anomalies or interesting trends in the data and figures
Analyzed data and created figures using GraphLab and Tableau
DATA PROCESSING
Combine the NYT and link data by concatenating all NYT metadata into one table using GraphLab
Sort out nested data structures and extract relevant information
Join DOIs to each NYT article via the links from which the DOIs were retrieved
Reconcile inconsistent field names and missing values
Merge all metadata retrieved from DOI lookup service into a single table
Join the metadata received from Dr. West into the table
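The joins above (article to link, link to DOI, DOI to metadata) can be illustrated with plain dictionaries. The project used GraphLab tables; this sketch does not use GraphLab, and the function name `join_citations` and all field names are hypothetical.

```python
def join_citations(articles, dois_by_link, metadata_by_doi,
                   fields=("title", "eigenfactor")):
    """Flatten articles, their links, and DOI metadata into one table.

    articles:        list of dicts with "article_id" and "links" (list of URLs)
    dois_by_link:    dict mapping a link URL to the DOIs retrieved from it
    metadata_by_doi: dict mapping a DOI to a dict of metadata fields
    Returns one row (dict) per (article, link, DOI) combination.
    """
    rows = []
    for article in articles:
        for link in article.get("links", []):
            for doi in dois_by_link.get(link, []):
                meta = metadata_by_doi.get(doi, {})
                row = {"article_id": article["article_id"],
                       "link": link, "doi": doi}
                for field in fields:
                    # None marks a missing value, giving every row the same keys.
                    row[field] = meta.get(field)
                rows.append(row)
    return rows
```

Filling absent fields with `None` is one simple way to handle the missing values mentioned above, since every row then has the same set of keys.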
Generate topic groupings for the NYT articles
Compute unigram set for each NYT article using body text and filter by calculating term frequency-inverse document frequency (TF-IDF) score
Iteratively train topic models on the filtered unigrams using GraphLab, adjusting parameters as needed
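The TF-IDF filtering step above can be sketched in plain Python as follows. The tokenizer, the unsmoothed IDF formula, and the `top_k` cutoff are assumptions; the source does not say which variant or threshold was used, and the topic-model training in GraphLab is not shown.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split body text into lowercase unigrams."""
    return re.findall(r"[a-z']+", text.lower())

def tfidf_filter(documents, top_k=3):
    """Return, for each document, its top_k unigrams ranked by TF-IDF."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    filtered = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        filtered.append(ranked[:top_k])
    return filtered
```

Terms that appear in every article score zero under this formula, so common words fall to the bottom of each ranking and the topic model is trained on more distinctive vocabulary.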
Journalists
Academic Author