

SURVEYING USAGE OF ACADEMIC RESEARCH IN JOURNALISM

Logan Walls - Researcher
Isabelle Edwards - Researcher
Tin Ho - Project Manager

PROBLEM STATEMENT

How can we use authorship and citations to better understand information diffusion between popular media and academic articles for the purpose of informing the general public as well as the academic community?

PROCESS

Special thanks to Dr. Jevin West and Dr. Emma Spiro of the UW’s DataLab for guiding us along in our project.

RESULTS

Eigenfactor is a metric used to measure the influence of academic publications. It employs an algorithm similar to PageRank, so the influence of each article is determined not merely by the number of citations it receives but also by the influence of the papers which cite it. Plotting the number of academic citations in a New York Times article against the average Eigenfactor of those citations shows a significant positive correlation (Pearson's r = 0.31, p < 0.0001).

Initially we interpreted this as a difference in research styles between journalists. However, when we examined the same variables aggregated by journalist, the correlation was much weaker, which suggests that the pattern we observe between citation counts and Eigenfactor is driven more by content than by journalist.
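The comparison between the article-level and journalist-level correlations can be sketched as follows. This is a minimal illustration, not the project's code: the function names and the toy rows are hypothetical, and the values were constructed only to show how a strong article-level correlation can vanish once the rows are averaged per journalist.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def aggregate_by_journalist(rows):
    """Average the citation count and Eigenfactor over each journalist's articles."""
    groups = defaultdict(list)
    for journalist, n_cited, avg_eigenfactor in rows:
        groups[journalist].append((n_cited, avg_eigenfactor))
    return [
        (sum(n for n, _ in g) / len(g), sum(e for _, e in g) / len(g))
        for g in groups.values()
    ]


# Hypothetical rows: (journalist, papers cited in the article, avg. Eigenfactor).
articles = [
    ("A", 1, 1e-7), ("A", 11, 7e-7),
    ("B", 2, 2e-7), ("B", 12, 6e-7),
    ("C", 3, 1e-7), ("C", 10, 8e-7),
]

article_r = pearson_r([n for _, n, _ in articles], [e for _, _, e in articles])
per_journalist = aggregate_by_journalist(articles)
journalist_r = pearson_r([n for n, _ in per_journalist], [e for _, e in per_journalist])
```

In this toy data each journalist writes both lightly and heavily cited articles, so the per-journalist averages carry almost none of the article-level pattern.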

[Figure: Author Network. Network graph linking New York Times journalists (green nodes) to the academic authors they cite (blue nodes); each node is labeled with the name of the journalist or author.]

Author Network

The figure above is a network of all authors in our data set. The blue nodes are academic paper authors, and the green nodes are New York Times journalists. Each line leaving a journalist node represents a citation to an academic article. The network contains only academic authors who have received more than 5 citations and New York Times journalists who cite more than 5 articles. The size of each node is determined by the number of citations it has received (academic authors) or the number of papers it has cited (journalists).
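The filtering and sizing rules described above can be sketched in a few lines. This is an assumption-laden illustration rather than the code behind the figure: `build_author_network` is a hypothetical name, and it counts one (journalist, author) pair per citation, which may differ from how the original network counted articles.

```python
from collections import Counter


def build_author_network(citations, min_count=5):
    """Build a thresholded journalist-to-author network.

    citations: iterable of (journalist, academic_author) pairs, one per citation.
    Returns (edges, journalist_sizes, author_sizes), keeping only nodes whose
    count is strictly greater than min_count.
    """
    citations = list(citations)
    cited_by_journalist = Counter(j for j, _ in citations)
    received_by_author = Counter(a for _, a in citations)

    journalist_sizes = {j: c for j, c in cited_by_journalist.items() if c > min_count}
    author_sizes = {a: c for a, c in received_by_author.items() if c > min_count}

    # An edge survives only if both of its endpoints pass the threshold.
    edges = Counter(
        (j, a) for j, a in citations
        if j in journalist_sizes and a in author_sizes
    )
    return dict(edges), journalist_sizes, author_sizes


# Hypothetical data: J1 cites one author six times; J2 cites six authors once each.
sample = [("J1", "X")] * 6 + [("J2", f"A{i}") for i in range(6)]
edges, journalist_sizes, author_sizes = build_author_network(sample)
```

Here J2 passes the journalist threshold but none of the authors J2 cites pass the author threshold, so J2 appears as a node with no connections.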

The journalists Deborah Blum and Nicholas Bakalar cite many of the same academic articles (top left of the network). Similarly, Abby Ellin and Judith Graham cite many of the same academic authors. Journalists in the center of the network cite authors all over the network and do not overlap much with any other journalist. The journalists at the bottom of the network have a similar number of citations to many of the other journalists but are connected to fewer academic authors. This could mean that they cited a small set of authors on several occasions, or that they cited many different authors fewer than five times each. Journalists on the edges (with no connections to academic authors) do cite more than five authors but do not cite any of those authors more than five times.

[Figure: Number of Citations vs. Average Eigenfactor. Scatter plot; x-axis: Number of Academic Papers Cited (0 to 45); y-axis: Avg. Eigenfactor of Papers Cited (0.0e+00 to 1.2e-06). Annotation: Pearson r = 0.30, P-value > 0.0001. *There is a single outlier node not shown on the graph.]

RAW DATA COLLECTION

Obtained data from the New York Times application programming interface (API)

Collected metadata about all of the articles which matched our query

Metadata includes web URLs, headlines, keywords, publication dates, word counts, and more for each article

Used an ontology query to retrieve web URLs of entities which have the “scholarly publication” attribute

Used regex to extract domain names from the URLs

Combined the domain names to create a list of academic publication web domains
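The domain-extraction step above might look something like the following. The regular expression, the function names, and the example URLs are all hypothetical; the poster does not give the pattern that was actually used.

```python
import re

# scheme://[userinfo@]host  (the host stops at "/", ":", "?" or "#")
DOMAIN_RE = re.compile(r"^[a-z][a-z0-9+.-]*://(?:[^/@]*@)?([^/:?#]+)", re.IGNORECASE)


def extract_domain(url):
    """Return the lowercased host of a URL without a leading 'www.', or None."""
    match = DOMAIN_RE.match(url.strip())
    if not match:
        return None
    domain = match.group(1).lower()
    return domain[4:] if domain.startswith("www.") else domain


def build_domain_list(urls):
    """Combine the domains of many URLs into a sorted, de-duplicated list."""
    return sorted({d for d in map(extract_domain, urls) if d})


# Hypothetical publication URLs.
academic_domains = build_domain_list([
    "http://www.journal.example.org/articles/1",
    "https://journal.example.org:443/articles/2",
    "https://press.example.net/volume/3",
    "not a url",
])
```

Stripping the `www.` prefix and the port is a design choice that lets the same publisher match whichever way a journalist happened to link to it.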

Parsed the HTML received from the API with BeautifulSoup to find links to scholarly documents

Saved links which appear to be citations (any URL in the list of academic publication domains)

URLs that contain ‘pubmed’, ‘.gov’, ‘.edu’, ‘doi’, ‘abstract’, or ‘pdf’ were also treated as suspected citations
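The two link-selection rules above can be sketched as a single filter. The project used BeautifulSoup; this sketch swaps in the standard library's `html.parser` so that it has no third-party dependencies, and the class and function names, the sample HTML, and the domain list are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Substrings that mark a URL as a suspected citation (taken from the step above).
CITATION_HINTS = ("pubmed", ".gov", ".edu", "doi", "abstract", "pdf")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_citation_links(html, academic_domains):
    """Return links whose host is a known academic domain or whose URL contains a hint."""
    collector = LinkCollector()
    collector.feed(html)
    citations = []
    for url in collector.links:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        has_hint = any(hint in url.lower() for hint in CITATION_HINTS)
        if host in academic_domains or has_hint:
            citations.append(url)
    return citations


# Hypothetical article body.
sample_html = (
    '<p><a href="https://www.journal.example.org/a1">study</a> '
    '<a href="https://example.com/news">news</a> '
    '<a href="https://lab.example.edu/paper">paper</a> <a>no link</a></p>'
)
found = find_citation_links(sample_html, {"journal.example.org"})
```

The substring hints are deliberately broad, so a filter like this will admit some non-scholarly links (for example, any `.gov` page) that a later step would need to discard.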

Collected digital object identifiers (DOIs) from scholarly documents

If the link led to an HTML page, parsed the page for DOIs using BeautifulSoup and regex

If the link led to a PDF, parsed the XML file generated by the GROBID machine learning package for DOIs
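A regex-based DOI search over page text could be sketched as below. The pattern is a commonly used approximation of modern DOI syntax, not the expression the project used, and `extract_dois` and the sample text are hypothetical. The same function could plausibly be run over the text of GROBID's XML output, although that is an assumption.

```python
import re

# Approximate DOI syntax: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text):
    """Return the distinct DOIs found in text, lowercased, in order of appearance."""
    found = []
    for match in DOI_RE.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not to the DOI.
        doi = match.group(0).rstrip(".,;)").lower()
        if doi not in found:
            found.append(doi)
    return found


# Hypothetical page content with one DOI repeated in a different case.
sample_text = (
    '<meta name="citation_doi" content="10.1234/abc.5678"> '
    "See https://doi.org/10.1234/ABC.5678. Also (doi:10.5555/xyz-9)."
)
dois = extract_dois(sample_text)
```

Lowercasing is safe for de-duplication because DOI names are case-insensitive.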

Obtained metadata through a DOI lookup service

For each DOI, sent a request to http://dx.doi.org for a response in Turtle format

Parsed the response using regex to retrieve DOI metadata (including article title, authors, publisher, etc.)
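The lookup and parsing steps above can be sketched with the standard library. Asking for Turtle through an `Accept: text/turtle` header is how DOI content negotiation is generally done, but the function names, the regex, and the sample response are hypothetical, and the predicates in a real response may differ from the ones shown.

```python
import re
import urllib.request


def build_doi_request(doi):
    """Prepare a content-negotiation request asking the DOI resolver for Turtle."""
    return urllib.request.Request(
        "http://dx.doi.org/" + doi,
        headers={"Accept": "text/turtle"},
    )


def fetch_turtle(doi, timeout=10):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_doi_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


def extract_literals(turtle, name):
    """Return the string literals that follow a predicate whose local name is `name`."""
    pattern = re.compile(
        r'(?:<[^>]*[/#]%s>|\b\w*:%s)\s+"((?:[^"\\]|\\.)*)"'
        % (re.escape(name), re.escape(name))
    )
    return pattern.findall(turtle)


# Hypothetical response body; real responses may use other vocabularies.
sample_turtle = """
<http://dx.doi.org/10.1234/example>
    <http://purl.org/dc/terms/title> "A Hypothetical Article" ;
    <http://purl.org/dc/terms/publisher> "Example Press" .
"""
request = build_doi_request("10.1234/example")
titles = extract_literals(sample_turtle, "title")
publishers = extract_literals(sample_turtle, "publisher")
```

Regex parsing only handles simple quoted literals like these; a dedicated RDF parser would be more robust against multi-line or language-tagged values.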

Requested Eigenfactor and open-access information from Dr. Jevin West for each DOI

Analyzed data and created figures using GraphLab and Tableau

Looked for anomalies and interesting trends in the data and figures

DATA PROCESSING

Combined the NYT and link data by concatenating all NYT metadata into one table using GraphLab

Sorted out nested data structures and extracted relevant information

Joined DOIs to each NYT article via the links from which the DOIs were retrieved

Reconciled inconsistent field names and missing values

Merged all metadata retrieved from the DOI lookup service into a single table

Joined the metadata received from Dr. West into the table
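The chain of joins described above (article to link, link to DOI, DOI to metadata and Eigenfactor) can be sketched with plain dictionaries. The project used GraphLab tables; this sketch swaps in built-in data structures, and every field name and value below is hypothetical.

```python
def join_article_metadata(articles, link_dois, doi_metadata, eigenfactors):
    """Produce one row per (article, cited DOI) pair.

    articles:     {article_url: {"headline": str, "links": [link_url, ...]}}
    link_dois:    {link_url: doi} for links where a DOI was recovered
    doi_metadata: {doi: {"title": str, ...}} from the DOI lookup service
    eigenfactors: {doi: float}
    Missing metadata or Eigenfactor values are kept as None rather than dropped.
    """
    rows = []
    for article_url, article in articles.items():
        for link in article.get("links", []):
            doi = link_dois.get(link)
            if doi is None:
                continue  # no DOI was recovered from this link
            metadata = doi_metadata.get(doi, {})
            rows.append({
                "article_url": article_url,
                "headline": article.get("headline"),
                "doi": doi,
                "title": metadata.get("title"),
                "eigenfactor": eigenfactors.get(doi),
            })
    return rows


# Hypothetical inputs.
rows = join_article_metadata(
    articles={"https://news.example.com/a": {
        "headline": "Example Headline",
        "links": ["https://journal.example.org/1", "https://example.com/other"],
    }},
    link_dois={"https://journal.example.org/1": "10.1234/abc"},
    doi_metadata={"10.1234/abc": {"title": "A Hypothetical Article"}},
    eigenfactors={},
)
```

Keeping rows with a missing Eigenfactor makes the gaps visible when the missing values are reconciled later.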

Generated topic groupings for the NYT articles

Computed the unigram set for each NYT article from its body text and filtered it by term frequency-inverse document frequency (TF-IDF) score

Iteratively trained topic models on the filtered unigrams using GraphLab, adjusting parameters as needed
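The TF-IDF filtering step that feeds the topic models can be sketched as follows. The weighting used here (relative term frequency times the natural log of N/df), the `top_k` cutoff, and the sample texts are assumptions; the poster does not state the exact formula or threshold the project applied in GraphLab.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase unigrams."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_filter(documents, top_k=3):
    """Keep up to top_k unigrams per document, ranked by TF-IDF score.

    Terms that occur in every document score zero and are dropped.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    filtered = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        filtered.append([term for term in ranked[:top_k] if scores[term] > 0])
    return filtered


# Hypothetical article bodies.
docs = [
    "the study of air pollution",
    "the study of sleep",
    "the trial of the vaccine",
]
keywords = tfidf_filter(docs, top_k=2)
```

Words such as "the" and "of" appear in every document, so they score zero and never reach the topic model.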

Author Network legend: Journalists, Academic Authors