preserving news apps presents huge challenges



Newspaper Research Journal 2015, Vol. 36(3) 299–313 © 2015 NOND of AEJMC

Reprints and permissions: sagepub.com/journalsPermissions.nav

DOI: 10.1177/0739532915600742

nrj.sagepub.com

Article

Preserving news apps present huge challenges

By Meredith Broussard

Abstract

Currently the digital archives of newspapers are not archiving news apps, the interactive database-driven, multimedia projects. Because of the multiple elements required to access a news app, conversion of the dynamic news app into static HTML pages is one possible avenue for future archiving.

Keywords

archiving, software preservation, data journalism, news apps, copyright, reproducible research

As digital technology has become more complex in recent years, archiving the news has also grown far more complex. Today’s digital news organizations create interactive data visualizations, video, animation and apps in addition to print artifacts. These multimedia projects are undoubtedly a crucial part of how users experience news in the digital age, yet they are not preserved in database versions of newspapers or in library archives. Nor can these multimedia elements simply be added to existing library databases, just as a floppy disk cannot be inserted into a USB port.

This paper argues that entirely new engineering solutions are required in order to effectively preserve today’s multimedia news for tomorrow’s scholars. I outline the substantial technological challenges involved in preserving today’s most cutting-edge multimedia artifact, a type of database-driven online story that news developers call a news app. This is not the same as the news app that one might use to read a newspaper on a phone or mobile device, although the term used is the same. News developers have identified the news app as a priority for preservation efforts. I describe how news apps work and why their unique design presents particular challenges to archivists. By borrowing preservation strategies from video games and contemporary art, media scholars can begin to develop an innovative path forward that will allow us to preserve the first draft of news app history.

Broussard is an assistant professor in the Arthur L. Carter Journalism Institute at New York University. Broussard is the corresponding author: [email protected]

Literature Review and Background

Defining News Apps

News apps, or interactive news applications, can be thought of as online story packages. Klein writes:

Inside newsrooms, these interactive databases are sometimes called ‘news applications’1—but don’t be confused. They’re interactive databases published on the web, not something you buy on your smartphone. Think Dollars for Docs, not Flipboard.2

ProPublica’s “Dollars for Docs” project is a news app that allows readers to search for payments drug companies made to individual doctors and health professionals for promotional talks, research and consulting. It is a searchable online database accompanied by investigative stories. Another example of a news app is The New York Times’ “Red Carpet Project,” which allows readers to search and view 19 years of Oscar fashion photos. The most common type of news app includes an interactive online database and one or more accompanying stories. Unlike a print story displayed online, a news app is created using computer programming techniques.3 A news app thus has multiple components:

• A database
• The data in the database
• The unique graphical interface that appears in the browser, through which the user interacts with the database
• One or more text-based stories
• Photos or illustrations

News developers engage in highly specialized labor to make all of these elements interoperable on a newspaper’s unique Web server. Preserving a news app would involve packaging all of these elements and making them work on a succession of different servers in perpetuity. This is easier said than done, for reasons that will be spelled out shortly.

Literature in Adjacent Fields

Because journalists have only begun producing artifacts labeled news apps in the past few years,4 and because very few news apps are produced each year, communication scholars are just now beginning to look at news app preservation issues. Thus, much of the relevant literature may be found in adjacent disciplines such as contemporary art, computer history, game development and library science.5 The problem of how to preserve digital artifacts is very much in process, and definitive long-term strategies are still being developed about what to preserve and how to preserve it.6

Rothenberg writes of four challenges for digital archives: “physical decay of media, loss of information about the format, encoding, or compression of files, obsolescence of hardware, and unavailability of software.”7 He notes that the practical physical lifetime of a magnetic disk is 5-10 years, and the average time until the disk is obsolete is only five years. Today’s news app storage solutions will likely be ported to future technologies, just as newspapers were converted to microfilm and then microfilm was widely digitized.

Loss of information is of particular interest to communication scholars who use full-text databases, created by aggregators such as Lexis-Nexis Academic or EBSCOhost, to acquire material for content analysis. Youngblood et al.8 write that media researchers assume, incorrectly, that the database version of a newspaper is identical to the print version. They argue that the mismatch has substantial implications for content analysis.

News apps form a subset of the material that does not appear in aggregators’ databases, and as such their content is not available to scholars for systematic analysis. However, news apps represent some of newsrooms’ most innovative and technologically advanced work today, and as such they are of clear interest to communication researchers. A method to collect and analyze these artifacts would benefit the academy and would allow newsrooms to preserve their work product more effectively.

Although we cannot know what scholars in the future will want to know about news apps, we can confidently predict that they will want to know about them and that they will want to view news apps on platforms and devices that do not exist today. This will require us to develop standards and communicate them in order to ensure that today’s software can run on tomorrow’s computers.

Software Preservation Obstacles

Grad wrote of preserving software:

Many of us in the software business believe that by studying the systems and applications software produced over the past 50 years, historians can gain special insight into the economic, political and social changes that have modified the world and led to the dramatic increase in globalization and democracy.9

It is important to address the logistical issues associated with preserving software, however.10 A news app or any Web-based software runs in a layered structure in the following order:

• Web browser
• News app
• Application/program software
• Operating system
• Hardware


Any piece of software is built to run on top of other software called an operating system, which in turn runs on top of specific hardware. Today’s Mac laptop cannot run a program written in BASIC on a Commodore 64 from the 1980s, in part because the hardware is different in the Mac and the C64. Rothenberg11 and Rinehart argue that using emulators is one viable strategy for running software in the future. Creating a hardware emulator will allow us to install and run actual copies of today’s software in the future. Bollacker writes of the inevitable problem associated with emulation:

Emulation is now a common technique used to run old software on new hardware. It does, however, have a problem of recursion—what happens when there is no longer compatible hardware to run the emulator itself? Emulators can be layered like Matryoshka dolls, one running inside another running inside another.12

An effective strategy for preserving news apps will address these known challenges and will address the nuances of preserving digital artifacts for future scholars.

Research Questions

As a first step toward effective news app preservation, researchers must make strategic decisions about which news apps should be preserved and what type of documentation or contextual material should accompany them. This study asks three research questions:

RQ1:

Which news apps should be preserved, and how should this determination be made?

RQ2:

Which components of the apps should be preserved?

RQ3:

What are the known technological and legal challenges that must be addressed?

Methods

In the digital age, journalism has increasingly borrowed from qualitative approaches in the social sciences, notably multi-method research strategies in sociology, anthropology and social psychology.13 This study relied on a “grounded theory” approach to qualitative research as outlined by Glaser and Strauss.14 Theoretical saturation15 was achieved by combining ethnographic participant-observation, document analysis and focused interviews. As ethnography often requires the researcher to draw on past experiences to develop rapport with informants as well as to interpret accurately the expertise of cultural insiders as “local knowledge,”16 this researcher relied on her admittedly eclectic professional experience as a former section editor of an American newspaper in a top five media market as well as academic training in computer science and several years spent in industry as a professional software developer.

I conducted ethnographic interviews with 25 data journalists as key informants, all of whom work at major news organizations, as well as scholars, librarians and developers. I reviewed the interview transcripts in order to identify common themes, and I met many of these informants while doing participatory fieldwork at a full-day software preservation conference and planning session in March 2014 at the Newseum in Washington, DC. Organized by the Mozilla Open News Foundation, that event was the first gathering of journalists and scholars concerned about archival issues in digital news.17 I also collected the outputs from the event and analyzed them for related background material and contextual evidence. These outputs included a collaboratively developed document, a “hackpad,” about next steps and strategies; tweets from the event, hashtagged #apparchive; and multiple blog posts. Additionally, I analyzed the archives of the NICAR-L listserv, the primary communication avenue for the international community of data journalists. In keeping with recently developed qualitative research approaches conducted online, I spent a year as a “virtual” participant-observer on the NICAR-L listserv.18 Finally, I supplemented this research by assembling a bibliography of scholarly and popular sources in the adjacent disciplines of game development, library science, visual art (specifically new media art or software-based installations) and software preservation.


Findings

RQ1:

Which news apps should be preserved, and how should this determination be made?

The case of Everyblock.com, a very early news app that was not effectively preserved, illustrates the myriad issues associated with preservation. Everyblock is identifiable as an app that probably should have been preserved; however, it is only identifiable as such in retrospect because news developers use Everyblock (and its demise) as both a cultural touchstone and a cautionary tale.

Everyblock began in 2005 when journalist and programmer Adrian Holovaty launched a site called Chicagocrime.org. The site was revolutionary in the field as the first example of a journalist combining geo-location and public data. Chicago magazine wrote of the project:

Google Maps had just launched. The Chicago Police Department had put some of its statistics online. Holovaty combined the two and created Chicagocrime.org, a website that allowed anyone to search for crimes by location, type and date—and on a map, no less.19

Chicagocrime.org won substantial acclaim for digital hyper-local news, including a 2005 Knight-Batten Award for Innovation in Journalism and a $1.1 million grant from the Knight Foundation.


As the number of people involved in the project grew, the software changed. Holovaty used the Knight grant to expand chicagocrime.org into Everyblock.com, a neighborhood news and discussion site, in 2007. EveryBlock used geo-location to feed users relevant nearby news. Readers could search for local news and other information by entering a zip code, neighborhood or address. Msnbc.com bought EveryBlock—the company and the site—in 2009, and it expanded to 16 cities. Later, msnbc.com was acquired by NBC News. Holovaty, who is also known for creating the Django open source framework used by a number of news organizations to develop original news apps, left Everyblock in 2012. In early 2013, NBC News shut down the site. A small part of the project was resurrected as Chicago.everyblock.com, focusing only on Chicago data—but the rest of the site is gone.

A national news app registry could potentially address the question of which news apps should be preserved. An organization such as the Library of Congress or a professional association such as IRE/NICAR could maintain the registry. To return to the difference between a “mobile app” and a “news app” referenced in the first paragraph, the difference is one of perception. Both are pieces of software, and both present a viewer or reader with journalistic stories. However, the news app is considered by news developers to be a work of journalism; the mobile app, when it is considered at all, is considered to be merely a delivery mechanism. It seems unfair to ask an archivist to parse out the nuances of why one piece of software is considered more journalistically prestigious than another. Thus it would seem useful for developers or journalists themselves to nominate projects that they collectively deem important to preserve.

Prompted in part by an emerging online conversation about preserving apps, Document Cloud developer Ted Han started a Reddit page where anyone can contribute a link to a news app project. Han’s idea was that such a list would be the first step toward determining what apps are out there, how many of them exist and which ones news developers or Reddit users deem noteworthy. He wrote on the NICAR-L listserv:

I’ve started informally collecting links to news apps here: http://www.reddit.com/r/newsapps. I would entreat other folks who are interested in collecting links to join me in doing so, since as far as I know there isn’t any sort of public index to this info (would love to know if folks have tried elsewhere, though!). Archives of articles and retrospective access to documents are important to projects like DocumentCloud. Some have written articles around viewers embedded from our site, so we both care about whether and how those articles are available and whether they get reformatted or transformed in the future. We’d also like to make sure uploaded documents that we maintain remain stable and available over the long term.20

Han was motivated to create this informal collection, he said, after discovering in February 2014 that U.S. News & World Report was no longer making archived content before 2007 available on its website. U.S. News had switched to a different website content management system and had determined that it was unfeasible to continue to maintain these older archives using this new technology; readers were encouraged to consult EBSCO, LexisNexis or bound volumes for archival material. As of March 2014, Han’s list included links to 37 news apps or stories about news apps. This is a relatively small number and could serve as a preliminary list of targets to optimize.

RQ2:

Which components of a news app should be preserved?

Preserving code alone is not enough. In preparing software for the future, it is important to think about preserving code as well as documentation and information about the development process and its infrastructure. In the future, a scholar might ask questions like: What led Adrian Holovaty to take data published by the Chicago police department and display it in a searchable Google Maps interface? What did the code look like? How did the constraints of the programming language influence the visual design of the project?

Some insight can be gained by asking current news developers what they would like to know about EveryBlock. Two developers wrote of what they were curious about:

It’s not just the code that Adrian wrote or the map itself, though his reverse engineering of the Google Maps Flash API was one of its great innovations when it first came out. We want to know about his process. We want to know the infrastructure on which he built the app (indeed, making his use of Google Maps even more impressive). We want to know about how it was designed, how the user interactions worked. We want to know the impact it had and who responded to it.21

In visual art, today’s archivists try to preserve some documentation about the art along with the physical artwork. “You really cannot understand contemporary art without its documentation,”22 said Pilar Garcia, archive director at the Museo Universitario Arte Contemporáneo in a 2013 interview on the Museum of Modern Art/PS1 blog. For example: Marcel Duchamp’s 1917 “Fountain” is important because it is an iconic example of Dadaist artwork. Without an explanation of how and why the artist turned this everyday object into art and why Dada was historically significant, in 100 years “Fountain” will look like an ordinary urinal. So, too, will this happen with the collection of bits and bytes that make up a news app. Unless some historical information is preserved to explain the context, news apps in 100 years will look like piles of unreadable 0s and 1s.

Interestingly, a connection to the art world might have saved components of EveryBlock from being lost forever. Future scholars might be able to piece together a representation of the site by using material in the MOMA archives. Just after Chicagocrime.org was absorbed into EveryBlock, Holovaty wrote on his blog:

This story has a fitting epilogue. In just a few weeks after Chicagocrime.org goes offline, the site will be featured in an exhibition at New York’s Museum of Modern Art, called Design and the Elastic Mind. Chicagocrime.org will have ended its life and become a museum piece.23


A news app is usually developed in an iterative fashion, meaning that a version of the app is released to the public and the technology or presentation is fine-tuned over the course of the next few days or weeks. This raises obvious questions about which version of a news app should be preserved. Should it be the first version or the final?

In addition to versioning concerns, attention might be paid to which components of an app might be reused in the future. Data-driven apps have two major components: the underlying data and the presentation layer. The presentation layer includes the app architecture, the data analysis and the user interface. The underlying data is potentially reusable. Just as social scientists preserve and share their data through the ICPSR data library, so too could journalists share data through a central organization, such as the IRE Data Library, for the benefit of other data journalists.

Currently, news app history has been preserved on an ad-hoc basis. At the 2014 IRE/NICAR annual conference, a group of data journalists presented a panel called “Save the data: going from Zip (drive) to news by rescuing, analyzing old data.” Cheryl Phillips, the multiple Pulitzer Prize-winning reporter and data innovation editor at The Seattle Times, showed photos of her storage method for old data. She keeps her old computers in a pile in her basement. Cardboard boxes nearby house Zip disks and floppy disks. Other journalists contributed photos and samples of their own archives. Paul Overberg of USA Today demonstrated a nine-track tape, a one-half-inch magnetic tape reel that was used on minicomputers and mainframes from the 1970s to the 1990s.24 Overberg had received this particular 9-track in response to a request for Census data, and it had seemed important to hang onto it. For decades. In the basement.

There is some psychology attached to keeping these objects: perhaps the idea is that if the physical storage medium is still in the reporter’s possession, the story could be fact-checked or the data recovered. Cheryl Phillips no longer owns a Zip drive, but she does have the Zip disks and the computer that once ran them.

Phillips spoke at NICAR of her reasoning for organizing the panel on data storage and recovery: “Why do we care about this? Because the data we’re using now is going to be just like this,”25 she said, gesturing at the 9-track tape.

We’re not going to have any way to get it because it will be on a USB drive, it’s going to be on little floppies; we won’t be able to access it unless we figure out now a way to save it for future geeks as well as aggregate it for good stories. That’s why we need to document our data, and share it.26

Overberg wrote on a listserv about the utility of using older data as an efficient starting point for new stories:

Phil Meyer said that precision journalists must adopt the scientific method, including replicability, way back when “transparency” was just a UN buzzword. So he pushed us at USA Today to document our work and archive data so we could share it, including with our later selves. The archive from our 1997 Interstate speeding ticket project gave our 2004 project a lot more legs. Our Census 2000 archive saved us a huge amount of work setting up for Census 2010. And archiving weekly best-selling book data let us produce an interactive of every book that has topped our list when it turned 20 last fall.27


Each of the various layers of a news app has potential value, whether it is the potential reuse value of the underlying data or the value a future scholar can derive from seeing a representation of a multimedia news artifact created out of the cultural context of America in 2014. Looking at each layer separately may help scholars to prioritize their efforts to develop technical solutions for the challenge of preserving news apps.

RQ3:

What are the known technological and legal challenges that must be addressed as researchers develop standards?

The 2013 loss of EveryBlock prompted news developers to start asking questions about what and how such content should be preserved. An online and offline conversation identified problems and opportunities. News developer Matt Waite, a professor at the University of Nebraska, wrote in Source in September 2013:

News people know there’s value in longevity. A good project becomes a resource, or a monument to a moment in our history. And you can’t be the first draft of history if you delete the draft.28

Prompted by Waite’s piece, The New York Times news developer Jacob Harris organized a panel at a conference called Newsfoo at which developers tried to explain the need for archiving news apps. Harris then published an essay on Source called “And remember—this is for posterity,”29 in which he argued for the benefits of archiving dynamic sites as static pages.30

Static versus dynamic is the current methodological battleground among news developers. The issue of archiving static versus dynamic pages is tied to human factors and technical constraints associated with archiving. Chicagocrime.org and EveryBlock.com were taken offline because of human factors: high-level business decisions. Unfortunately, the tumultuous corporate history of Chicagocrime.org is typical of news projects and contemporary companies. Internet companies, even digital media companies, are bought, sold, consolidated and bankrupted at a rapid rate. The media landscape will only get more complicated: Pew estimates that there were 438 small digital news organizations in the US in 2013, most of which are digital-first startups.31

Unlike legacy media organizations, these startups do not have archiving contracts with news database companies like Lexis-Nexis or Thomson Reuters. Their digital content may be archived through a snapshot captured by the Internet Archive, which attempts to preserve the history of the Internet. However, the content may not show up in the Internet Archive, depending on the back-end technology the media company uses. Internet Archive snapshots preserve static web pages, not dynamic ones, and images are often not stored with the text of a web page. Here is an image of Everyblock taken by the Internet Archive in 2008. [See Figure 1]

The live site likely had content in the large area that appears blank in this archived version. Clicking on any of the links or attempting to use the search box yields the following page. [See Figure 2]


The technical challenge of preserving static versus dynamic content has wide-ranging implications.

In addition to information about file encoding, compression and format, scholars may want to consider storing copyright and/or licensing information alongside news apps. Copyright issues govern the completeness of the library databases that media researchers rely on to construct samples. Chen32 writes that thousands of articles were deleted from library databases in the wake of the 2001 New York Times vs. Tasini ruling, which held that freelance writers, not publishers, own the electronic rights to their articles.

News apps, because they are interactive databases, pose another potential copyright issue that news organizations may have to navigate in the future. A database is copyrightable, in part because it may include unique “selection and arrangement”33 of facts. News organizations are familiar with copyright issues around text and photos, but will likely need education around the additional concerns of database copyright. Should news app developers drift into creating unique software, which is not a far stretch from creating databases, publishers may find themselves in the realm of software development and intellectual property rights. Intellectual property rights around software are another complex field that will challenge archivists.34 If a newspaper’s staffers develop a news app, usually the same employment agreement governs the intellectual property that the staffers produce. However, if freelancers contribute to the news app, each freelancer’s contract can contain unique provisions regarding intellectual property. The underlying data in a news app may also have licensing information associated with it that will affect future use; archivists will want to preserve this licensing information.

Figure 1. Screencap of Everyblock Taken by Internet Archive, 2008


The nuances of copyright and modes of digital expression have been explored most extensively by visual artists and curators. In order to preserve code-based artworks submitted to the Rhizome ArtBase, a platform for new media art, Rinehart35 developed a questionnaire for visual artists seeking to preserve their work. The Rhizome ArtBase serves as a registry and documentation repository for such artworks. The questionnaire focuses on emulation: a submitting artist must specify what hardware and software is necessary to emulate the artwork at a future point in time. The questionnaire also asks for rights associated with performance, so that future curators do not accidentally violate the artist’s copyright.36
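To make the idea of a registry entry concrete, the following is a minimal Python sketch of a record that stores emulation requirements and rights information alongside a news app. It is an illustration only: the class name `PreservationRecord`, its field names, and the sample values are assumptions made for this example and are not taken from Rinehart's questionnaire or from any existing registry.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PreservationRecord:
    """Hypothetical registry entry for one news app (all fields are illustrative)."""
    title: str
    publisher: str
    required_hardware: list[str] = field(default_factory=list)
    required_software: list[str] = field(default_factory=list)
    code_rights_holder: str = "unknown"
    data_license: str = "unknown"
    freelance_contributors: list[str] = field(default_factory=list)

    def unfilled_fields(self) -> list[str]:
        """Return names of fields that are empty or still marked 'unknown'.

        An empty list can be legitimate (for example, no freelancers), so this
        is a prompt for the archivist to check, not an error report.
        """
        return [
            name
            for name, value in asdict(self).items()
            if value in ("unknown", [], "")
        ]

    def to_json(self) -> str:
        """Serialize the record as plain-text JSON for storage next to the app."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Invented sample values, for demonstration only.
record = PreservationRecord(
    title="Example app",
    publisher="Example Daily",
    required_software=["Python 2.7", "PostgreSQL 9"],
)
print(record.unfilled_fields())
```

A record of this kind would let a future curator see at a glance which emulation and licensing details were never captured, which is the gap the questionnaire approach is meant to close.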

The reason that news apps don’t get archived at legacy media organizations has to do with the back-end technology of the newsroom. Story text, bylines and images are typically stored inside a newspaper’s content management system, or CMS, which is also used to transmit files to the printer. Newspapers typically have two content management systems—one that pushes stories and images to the Web and one that pushes stories and images to the printing press. USA Today, for example, uses the CCI NewsDesk Editorial and Pagination System for layout and page design. Reporters file their stories in CCI, designers lay out the pages and the issue is transmitted via satellite to 36 US printing plants and four printing plants in Europe and Asia.37 After stories and images are entered into CCI, they are edited and approved, and they are pushed to the Web content management system (CMS). The Web CMS delivers Web pages to users who visit usatoday.com and related URLs.

CCI, Saxotech, Hermes and other print production management systems have been in existence much longer than any Web CMS. Automatic archiving systems are set up to pull content from the print CMS, not the Web CMS. For example, if LexisNexis pulls an automatic feed of material from The Philadelphia Inquirer, the feed is set up to pull from Hermes, the Inquirer’s print CMS, not Clickability, the Web CMS. Reporters’ blog posts and other material posted on Clickability will not be automatically archived.

Figure 2. Result of Clicking Links or Using Search Box on Everyblock, 2008

The CMS issue gets even more complicated for news apps because interactive news apps do not appear in print and they are typically made outside of the regular Web CMS. There are convincing technical reasons for this, starting with the fact that most Web content management systems are unstable technologies. Harris explains:

Almost any news programmer generally loathes their organization’s Content Management System; its codified formats and rigid workflows often feel more like strictures to our project. And so, we do our work outside the CMS, skinning our pages so they look like the main news site while remaining architecturally apart. For instance, look at our how we reported election results in 2012.38 It’s actually hosted on Amazon S3 and skinned to look like The New York Times content. Why go through this extra work just to make it look like articles produced via the CMS in the end? In our case, controlling our own technology stack enabled us to do dynamic projects like election results that wouldn’t be possible within the CMS. Also, the CMS model for stories is a foolish fit for data projects that may include many thousands of browsable (sic) pages; you just can’t and shouldn’t represent a relational database in a CMS. So, we do our work outside the bounds of the CMS, but it has a cost.39

The convenience of creating news apps outside of a CMS currently comes at the cost of easy archiving. Automatic bulk archiving is one of the reasons for extensive newspaper archives today; however, if news apps are outside of the automated system, an important first step is figuring out how to manually archive these journalistic projects. As one developer put it, “News apps are the artisanal cheeses of the journalism world.”40 They are unique and exciting, but they are expensive to produce and very hard to store.

Considering that preserving the underlying software is so complex, “baking out” dynamic news apps as static pages seems like a practical strategy for news apps at the end of their life cycle. A dynamic site could be converted into a set of dozens or thousands of static HTML pages. Static HTML pages, because they are flat text files, seem more likely to remain readable for the foreseeable future. The images associated with the Web pages could be rendered as TIFFs, which have emerged as a popular format for archived images.41 It would be helpful to select a single app for preservation and test an emulator solution, plus develop documentary metadata.
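The “baking out” step described above can be illustrated with a minimal Python sketch. It is not drawn from any newsroom’s actual tooling. It assumes the app’s database rows have already been exported as dictionaries with hypothetical `slug`, `title` and `body` keys, and the function names `render_page` and `bake_out` are invented for illustration.

```python
import html
from pathlib import Path


def render_page(record):
    """Return one record as a standalone HTML document (no scripts, no database)."""
    title = html.escape(record["title"])
    body = html.escape(record["body"])
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{title}</title></head>\n"
        f"<body><h1>{title}</h1><p>{body}</p></body></html>\n"
    )


def bake_out(records, out_dir):
    """Write one static page per record plus an index page; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    links = []
    for record in records:
        # Assumption: each slug is already a safe, unique file name.
        page = out / f"{record['slug']}.html"
        page.write_text(render_page(record), encoding="utf-8")
        written.append(page)
        links.append(
            f'<li><a href="{page.name}">{html.escape(record["title"])}</a></li>'
        )
    # The index page stands in for the app's search box or browse view.
    index = out / "index.html"
    index.write_text(
        "<!DOCTYPE html>\n<html><body><ul>\n"
        + "\n".join(links)
        + "\n</ul></body></html>\n",
        encoding="utf-8",
    )
    written.append(index)
    return written
```

Calling `bake_out(rows, "archive/")` on an exported list of rows would leave a folder of flat HTML files that can be opened without the original server or database. The trade-off is the one noted above: interactive features such as live search are lost in the static copy.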

Discussion

Important strides have been made toward the goal of preserving news apps; however, it is clear that more work remains to be done.

If the goal is to allow future journalists and historians to experience today’s news apps, an important first step is to identify which news apps should be preserved. The fact that there are 400-plus media organizations producing digital content, some of which are news apps, suggests the need for some kind of registry of data journalism projects.

Such a registry might maintain standardized documentation listing the app’s run-time environment, its copyright and intellectual property restrictions and other crucial metadata. In addition to the app presentation layer, the registry could make available the underlying data that powers the app, allowing other journalists to use the cleaned data as a starting point for other investigations.
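One way to picture such standardized documentation is as a small structured record for each app. The Python sketch below is purely illustrative: the `RegistryEntry` class, its field names and the sample values are hypothetical placeholders, not an existing registry schema or a description of any real news app.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RegistryEntry:
    """Hypothetical documentation record for one news app."""

    title: str
    publisher: str
    runtime_environment: List[str]  # software needed to run the app
    rights: str  # copyright and intellectual property restrictions
    data_files: List[str] = field(default_factory=list)  # underlying datasets

    def to_json(self):
        """Serialize the record as plain-text JSON."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values for illustration only.
entry = RegistryEntry(
    title="Example News App",
    publisher="Example Daily",
    runtime_environment=["Python 2.7", "Django 1.6", "PostgreSQL 9.3"],
    rights="Copyright held by publisher; data released for reuse",
    data_files=["cleaned_data.csv"],
)
print(entry.to_json())
```

Storing the record as plain-text JSON follows the same reasoning as static HTML: flat text files seem more likely to remain readable than a proprietary database format.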

The logistical issues associated with running today’s software on tomorrow’s machines must be addressed if any preservation efforts are to succeed. Hardware and software emulators may solve this issue if properly preserved and kept up to date. It will also be helpful if, in the future, news apps can be archived automatically just as traditional print stories and images are now archived automatically.

The journalists who are making news apps today are making data journalism history. Developing methods for preserving news apps is an important step toward making sure that this first draft of history is available to future generations.

Notes

1. While there is a whole universe of digital artifacts that could be archived, this paper focuses on what news developers call “news apps.” Software developers tend to use the term “app” generically to mean “application,” but the specific meaning of “application” varies depending on the situation. The electronic artifacts that news organizations generate, all of which could potentially be preserved, include data visualizations, video, animation and news apps. The important distinction is that a news app is considered a piece of journalism, and thus radically different from an app that one might download from the iTunes Store in order to read individual articles in the newspaper. For the sake of clarity, below are some definitions: News app is short for “interactive news application.” Scott Klein, senior editor for news development at ProPublica, gives the following definition in The Data Journalism Handbook: “A news application is a big interactive database that tells a news story. Think of it like you would any other piece of journalism. It just uses software instead of words and pictures.” See Scott Klein, “News Apps at ProPublica,” datajournalismhandbook.org, <http://datajournalismhandbook.org/1.0/en/delivering_data_2.html> (May 28, 2015). Web app refers to a piece of software that runs inside a Web browser. A Web app may be accessed on a desktop computer, a laptop computer or a mobile device such as an iPhone, iPad, tablet or Android phone. A news app is usually a Web app in that a news app is custom software designed to be viewed within a Web browser. Native app refers to a piece of software designed to work with a mobile device’s native operating system. For Android phones, this means the Android operating system; for iPhones and iPads, this means the iOS operating system. Native apps are typically obtained via proprietary online stores such as the Apple Store or Google Play.
News apps are rarely native apps, but many news organizations also publish native apps. For example, The New York Times published twelve different native apps as of March 2014. Among these were The NYTimes app for iPad, The NYTimes app for Android, The NYTimes app for Kindle Fire, The NYTimes Crosswords app, The Scoop: NYC app for iPhone and The NYTimes Real Estate app. The NYTimes app for iPad, Android, or Kindle Fire is a native app that readers would use to read articles from the newspaper. It includes mobile advertising and is akin to an electronic version of a newspaper. The NYTimes Crosswords app is a delivery device for crossword puzzles. The Scoop let readers sort through restaurant reviews and ideas for New York City outings. The NYTimes Real Estate app presents real estate listings and content from the real estate section. These native apps all repurpose content that journalists have produced. Mobile app may be used to refer to a native app, the mobile version of a Web app or the mobile version of a news app.

2. Scott Klein and Tyler Fisher, “A Conceptual Model for Interactive Databases in News,” propublica.org, March 18, 2014, <http://www.propublica.org/nerds/item/a-conceptual-model-for-interactive-databases-in-news> (May 28, 2015).

3. Alexander Howard, “Aron Pilhofer on Data Journalism, Culture and Going Digital,” towcenter.org, March 27, 2014, <http://towcenter.org/aron-pilhofer-on-data-journalism-culture-and-going-digital/> (June 22, 2015).

4. Meredith Broussard, “Future-Proofing News Apps,” pbs.org, April 23, 2014, <http://www.pbs.org/mediashift/2014/04/future-proofing-news-apps/> (May 28, 2015).

5. National Digital Information Infrastructure and Preservation Program of the Library of Congress, “PRESERVING.EXE: Toward a National Strategy for Software Preservation,” digitalpreservation.gov, October 2013, <http://www.digitalpreservation.gov/multimedia/documents/PreservingEXE_report_final101813.pdf> (June 22, 2015).

6. Jeff Rothenberg, “Ensuring the Longevity of Digital Information,” clir.org, February 22, 1999, <http://www.clir.org/pubs/archives/ensuring.pdf> (May 28, 2015).

7. Ibid.

8. Norman E. Youngblood, Barbara A. Bishop and Debra L. Worthington, “Database Search Results Can Differ from Newspaper Microfilm,” Newspaper Research Journal 34, no. 1 (winter 2013): 36-49.

9. Burton Grad, “Preserving the Software Industry’s Past,” IEEE Annals of the History of Computing 25, no. 1 (January 2003): 88.

10. For more on the myriad logistical challenges of metadata around software preservation, see Kurt D. Bollacker, “Avoiding a Digital Dark Age,” americanscientist.org, <http://www.americanscientist.org/issues/pub/avoiding-a-digital-dark-age/1> (May 28, 2015); James Mitchell Crow, “Cultural Decay,” New Scientist 206, no. 2765 (June 2010): 42-45; Helen R. Tibbo, “On the Nature and Importance of Archiving in the Digital Age,” Advances in Computers 57 (2003): 1-67; Omar Alam, Bram Adams and Ahmed E. Hassan, “Preserving Knowledge in Software Projects,” Journal of Systems and Software 85, no. 10 (October 2012): 2318-2330; James W. Cortada, “Think Piece: Preserving Records of the Past, Today,” IEEE Annals of the History of Computing 31, no. 2 (April 2009): 88-87; Michael W. Godfrey, “Understanding Software Artifact Provenance,” Science of Computer Programming 97, part 1 (January 2015): 86-90.

11. Jeff Rothenberg, “Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation,” clir.org, January 1998, <http://www.clir.org/pubs/reports/rothenberg/contents.html> (May 28, 2015).

12. Bollacker, “Avoiding a Digital Dark Age.”

13. Sharon Hartin Iorio, ed., Qualitative Research in Journalism: Taking It to the Streets (Mahwah, NJ: Lawrence Erlbaum Associates, 2004).

14. Barney Glaser and Anselm Strauss, The Discovery of Grounded Theory: Strategies for Qualitative Research (Chicago: Aldine Publishing Company, 1999).

15. Ibid.

16. Clifford Geertz, Local Knowledge: Further Essays in Interpretative Anthropology (New York: Basic Books, 2000).

17. Erika Owens, “Smart People Working on a Tough Problem: NICAR News Apps Archive Designathon,” erikaowens.com, February 7, 2014, <http://erikaowens.com/blog/smart-people-working-tough-problem-nicar-news-apps-archive-designathon> (May 28, 2015).

18. Tom Boellstorff, ed., Ethnography and Virtual Worlds: A Handbook of Method (Princeton, NJ: Princeton University Press, 2012).

19. Andrew Huff, “Street Wise,” chicagomag.com, June 2009, <http://www.chicagomag.com/Chicago-Magazine/June-2009/Street-Wise/> (May 28, 2015).

20. Ted Han, “Re: NICAR News Apps Archive Designathon,” zotero.org, February 19, 2014, <https://www.zotero.org/groups/app_archive/items/itemKey/DVKGNI37> (June 22, 2015).

21. Tyler Fisher and Scott Klein, “Preserving Interactive News Projects with Newseum, OpenNews and Pop Up Archive,” knightlab.northwestern.edu, March 18, 2014, <http://knightlab.northwestern.edu/2014/03/18/preserving-interactive-news-projects-with-newseum-opennews-and-pop-up-archive/> (May 28, 2015).

22. Naomi Kuromiya, “Examining Archives Exhibition Strategies in Mexico City,” moma.org, October 7, 2013, <http://www.moma.org/explore/inside_out/2013/10/07/examining-archives-exhibition-strategies-in-mexico-city> (May 28, 2015).

23. Adrian Holovaty, “In Memory of chicagocrime.org,” holovaty.com, January 31, 2008, <http://www.holovaty.com/writing/chicagocrime.org-tribute/> (May 28, 2015).

24. Frank da Cruz, “IBM Mainframe Magnetic Storage Media,” columbia.edu, July 2010, <http://www.columbia.edu/cu/computinghistory/media.html> (March 29, 2014).

25. Cheryl Phillips, “Save the Data: Going from Zip (Drive) to News by Rescuing, Analyzing Old Data” (paper presented at the IRE/NICAR 2014 Conference, Baltimore, MD, February 2014).

26. Ibid.

27. Paul Overberg, “Archiving News Applications,” zotero.org, January 21, 2014, <https://www.zotero.org/groups/app_archive/items/VEFHZ7QX> (June 22, 2015).

28. Matt Waite, “Kill All Your Darlings,” source.opennews.org, September 12, 2013, <https://source.opennews.org/en-US/learning/kill-all-your-darlings/> (May 28, 2015).

29. Jacob Harris, “And Remember, This Is for Posterity,” source.opennews.org, November 14, 2013, <https://source.opennews.org/en-US/learning/and-remember-ones-posterity/> (May 28, 2015).

30. Ibid.

31. Mark Jurkowitz, “The Growth of Digital Reporting,” journalism.org, March 26, 2014, <http://www.journalism.org/2014/03/26/the-growth-in-digital-reporting/> (May 28, 2015).

32. Xiaotian Chen, “Embargo, Tasini, and ‘Opted Out’: How Many Journal Articles are Missing from Full-Text Databases,” Internet Reference Services Quarterly 7, no. 4 (September 2002): 23-34.

33. I. Trotter Hardy, “Project Looking Forward: Sketching the Future of Copyright in a Networked World,” copyright.gov, May 1998, <http://www.copyright.gov/reports/thardy.pdf> (May 28, 2015).

34. National Research Council (U.S.), The Digital Dilemma: Intellectual Property in the Information Age (Washington, DC: National Academy Press, 2000); Feng-Cheng Chang, Chin-Yuan Chang and Hsueh-Ming Hang, “A Study on the Meta-Data Design for Long-Term Digital Multimedia Preservation,” in Intelligent Information Hiding and Multimedia Signal Processing (proceedings from IIHMSP International Conference, Harbin, China, 2008); Len Shustek, “What Should We Collect to Preserve the History of Software?” IEEE Annals of the History of Computing 28, no. 4 (October 2006): 112-111.

35. Richard Rinehart, “Preserving the Rhizome ArtBase,” archive.rhizome.org, September 2002, <http://archive.rhizome.org/artbase/preserving-the-rhizome-artbase-richard-rinehart/> (May 28, 2015).

36. For more on contemporary art preservation issues, see Berkeley Art Museum and Pacific Film Archive, “Archiving the Avant-Garde: Documenting and Preserving Digital/Media Art,” bampfa.berkeley.edu, 2001, <http://www.bampfa.berkeley.edu/about/avant-garde> (June 22, 2015); Dirk Von Suchodoletz and Jeffrey Van der Hoeven, “Emulation: From Digital Artefact to Remotely Rendered Environments,” International Journal of Digital Curation 4, no. 3 (December 2009): 146-155; Alain Depocas, Jon Ippolito and Caitlin Jones, “Permanence through Change: The Variable Media Approach,” variablemedia.net, <http://www.variablemedia.net/pdf/Permanence.pdf> (May 28, 2015); Jon Ippolito, “The Museum of the Future: A Contradiction in Terms?” Cross Talk ArtByte 1, no. 2 (July 1998): 18-19; Richard Rinehart, “The Straw That Broke the Museum’s Back? Collecting and Preserving Digital Media Artworks for the Next Century,” switch.sjsu.edu, June 14, 2000, <http://switch.sjsu.edu/web/v6n1/article_a.htm> (May 28, 2015).

37. USA Today, “How the Newspaper Is Produced,” usatoday30.usatoday.com, <http://usatoday30.usatoday.com/marketing/media_kit/pressroom/press_kit_usat_how_newspaper_produced.html> (March 21, 2014).

38. See The New York Times, “President Map,” elections.nytimes.com, November 29, 2012, <http://elections.nytimes.com/2012/results/president> (May 28, 2015).

39. Harris, “And Remember, This Is for Posterity.”

40. Quote taken from author’s notes from software preservation conference and planning session at Newseum, Washington, DC, March 2014.

41. Leslie Johnston, Library of Congress digital archivist, personal communication with author, March 2, 2014.