Open Data Journalism: Introducing Key Concepts By Gabriella Razzano Middelburg: 20 October

Upload: gabriella-razzano

Post on 09-May-2015

TRANSCRIPT

Page 1: Open Data Journalism

Open Data Journalism: Introducing Key Concepts

By Gabriella Razzano

Middelburg: 20 October

Page 2: Open Data Journalism

ODAC is a specialist law centre working in the areas of access to information, open data and whistleblowing.

We provide legal advice and support to access public and private information through the Promotion of Access to Information Act (PAIA).

We support and provide legal advice to bona fide whistleblowers using the Protected Disclosures Act (PDA).

We also provide training on effective implementation of PAIA, the PDA and open data issues.

Page 3: Open Data Journalism

State of journalism

• Mpumalanga:
– While 71% of stories were potentially investigative, only 18% were actually investigative.

• Limpopo:
– While 73% of stories from papers were potentially investigative, only a quarter (24%) were actually investigative.
– Look at the event, not the issue

Page 4: Open Data Journalism

Footprints on the beach near Coral Bay, Australia by Peter Nijenhuis

How do we move forward?

Page 5: Open Data Journalism

Open Data

Information library

1912 2012

Page 6: Open Data Journalism

Information in Africa

Page 7: Open Data Journalism

Data is machine-readable

Open data is free for anyone to reuse or redistribute for any purpose

Page 8: Open Data Journalism

Sources of open data

Open Government Data
– UK, Kenya, USA
– World Bank
– OGP
– StatSA

Community generated data
– Open Street Map
– Flickr, SlideShare

Page 9: Open Data Journalism

1s and 0s everywhere…so?

Page 10: Open Data Journalism

Data Journalism

• “Data journalism is obtaining, reporting on, curating and publishing data in the public interest.” (Jonathan Stray, professional journalist and computer scientist)

• “Data driven journalism is a workflow that consists of the following elements: digging deep into data by scraping, cleansing and structuring it, filtering by mining for specific information, visualizing it and making a story.” (Mirko Lorenz, information architect and multimedia journalist)

Page 11: Open Data Journalism

Breaking news has already broken….so what are we contributing?

Page 12: Open Data Journalism

When we are deluged with information, it is the connecting of these different forms of data that becomes really valuable. It's not about events, but contexts and trends.

Butterfly by Charlene N Simmons’ photostream

Page 13: Open Data Journalism

Why bother?

“The Tribune's more than three dozen interactive websites have drawn three times as many page views as the site's stories [75% of traffic]”

- http://bit.ly/dj2dmz

By Evan P. Cordes through Flickr

Page 14: Open Data Journalism

“Data-driven journalism is the future. Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars, and it still might be that you’ll do it that way some times. But now it’s also going to be about poring over data and equipping yourself with the tools to analyze it and picking out what’s interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what’s going on in the country”.

— Tim Berners-Lee, founder of the World Wide Web

Page 15: Open Data Journalism

“I think it’s important to stress the “journalism” or reporting aspect of ‘data journalism’. The exercise should not be about just analyzing data or visualizing data for the sake of it, but to use it as a tool to get closer to the truth of what is going on in the world. I see the ability to be able to analyze and interpret data as an essential part of today’s journalists' toolkit, rather than a separate discipline. Ultimately, it is all about good reporting, and telling stories in the most appropriate way.”

— Cynthia O’Murchu, Financial Times

Page 16: Open Data Journalism

The “Murder Mysteries” project by Tom Hargrove of the Scripps Howard News Service (Figure 8). He built, from government data and public records requests, a demographically detailed database of more than 185,000 unsolved murders, and then designed an algorithm to search it for patterns suggesting the possible presence of serial killers. This project has it all: hard work gathering a database better than the government’s own, clever analysis using social science techniques, and interactive presentation of the data online.

Page 17: Open Data Journalism

And…the Expenses Scandal again!

Using ATI to get information, using data journalism to process it. The leaked release of MPs' expense statements by the Telegraph in May 2009 (Rayner, 2009) brought widespread attention to a perceived lack of transparency by Government on how they spent the money paid to them in taxes. This ‘scandal’ led to changes throughout the political spectrum, with much of the resulting data now available (with regular updates) on data.gov.uk.

http://www.guardian.co.uk/news/datablog/interactive/2012/sep/07/full-list-mps-expenses-ipsa-data-interactive - Go Play!

Page 18: Open Data Journalism

So a data story is...

• Typical examples: Census, election results, service delivery, budget reporting, crime stats (see Follow the Money).
• However, narrative is not excluded - the age-old news formula 5W+H remains:
– What
• History, dimensions, ...
– Who
• Individuals, crowds, ...
– When
• Dates, times, intervals, ...
– Where
• Locations; country, town, property, ...
– Why
– How

Journalism = data gathering and data distribution, in story format (Izak Minaar)

Page 19: Open Data Journalism

Data In

Analysis

Information out

Page 20: Open Data Journalism

Data

Gathering information for a story

Connecting information that is gathered

Expressing information as a story

Localising and personalising news

Page 21: Open Data Journalism

How to?

See:

http://datajournalismhandbook.org/1.0/en/index.html

Page 22: Open Data Journalism

1. Finding

• Wobbing (PAIA)
• Browse data sites and services:
– http://databank.worldbank.org/ddp/home.do
– http://www.africaopendata.org/pt_BR/
– You’d be surprised what you can find on SA sites! Lots of big databases online, though usability is often an issue.
• Scraping
– ScraperWiki: an online tool to make the process of extracting "useful bits of data easier so they can be reused in other apps, or rummaged through by journalists and researchers." Most of the scrapers and their databases are public and can be re-used. Also, HacksHackers may be able to assist you in finding someone to help scrape a particular site.
• Ask a Forum
– Search for existing answers or ask a question at Get The Data or on Quora. However, most of these will not have an African focus, so there is no harm in exploiting journalist networks!
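To make the scraping bullet concrete, here is a minimal sketch of what a scraper does: it walks the HTML of a page and pulls the table cells out into rows that a spreadsheet can use. It uses only Python's standard library, not ScraperWiki. The sample HTML, the municipality names and the helper names (TableScraper, scrape_table) are invented for illustration; a real page would first have to be downloaded and will usually be messier.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, or None
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment, standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Municipality</th><th>Budget</th></tr>
  <tr><td>Town A</td><td>1200</td></tr>
  <tr><td>Town B</td><td>950</td></tr>
</table>
"""


def scrape_table(html):
    """Return the table in `html` as a list of rows (lists of strings)."""
    scraper = TableScraper()
    scraper.feed(html)
    return scraper.rows


rows = scrape_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

The first row holds the column headings, so the result can be written straight to a CSV file and opened in a spreadsheet.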

Page 23: Open Data Journalism

• Ask a Mailing List
– Mailing lists combine the wisdom of a whole community on a particular topic. For data journalists, the Data Driven Journalism List and the NICAR-L lists are excellent starting points. Both of these lists are filled with data journalists and Computer Assisted Reporting (CAR) geeks, who work on all kinds of projects. You could also try Project Wombat (“a discussion list for difficult reference questions”), the Open Knowledge Foundation’s many mailing lists, mailing lists at theInfo, or searching for mailing lists on the topic, or in the region that you are interested in.

• Join Hacks/Hackers
– Hacks/Hackers is a rapidly expanding international grassroots journalism organization with dozens of chapters and thousands of members across four continents. Its mission is to create a network of journalists ("hacks") and technologists ("hackers") who rethink the future of news and information. With such a broad network, you stand a strong chance of someone knowing where to look for the thing you seek.
– There is a Johannesburg (Guy) and Cape Town (Raymond) branch.

• Ask an Expert

Page 24: Open Data Journalism

• Streamlining Your Search

Here are a few tips:
– When searching for data, make sure that you include both search terms relating to the content of the data you’re trying to find as well as some information on the format or source that you would expect it to be in. Google and other search engines allow you to search by file type. For example, you can look only for spreadsheets (by appending your search with ‘filetype:XLS filetype:CSV’), geodata (‘filetype:shp’), or database extracts (‘filetype:MDB, filetype:SQL, filetype:DB’). If you’re so inclined, you can even look for PDFs (‘filetype:pdf’).
– You can also search by part of a URL. Googling for ‘inurl:downloads filetype:xls’ will try to find all Excel files that have “downloads” in their web address (if you find a single download, it’s often worth just checking what other results exist for the same folder on the web server). You can also limit your search to only those results on a single domain name, by searching for, e.g. ‘site:agency.gov’.
– “quotes” search for an exact phrase
– + ensures the result contains a word: +logs
– - ensures a word is omitted: -wooden
– ~ also searches for synonyms: ~death
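The operators above can be combined mechanically. The sketch below assembles a query string from the pieces mentioned on this slide (filetype:, inurl:, site:, quoted phrases and - exclusions). The function name build_query and its parameters are made up for illustration, and how a search engine treats each operator may change over time, so treat the output as a starting point to paste into the search box.

```python
def build_query(terms, filetype=None, site=None, inurl=None,
                exact=None, exclude=()):
    """Assemble a search-engine query string from the slide's operators.

    `terms` is a list of plain keywords; every other argument is optional.
    """
    parts = list(terms)
    if exact:
        parts.append(f'"{exact}"')                    # exact phrase
    parts.extend(f"-{word}" for word in exclude)      # words to omit
    if inurl:
        parts.append(f"inurl:{inurl}")                # part of the URL
    if filetype:
        parts.append(f"filetype:{filetype}")          # file format
    if site:
        parts.append(f"site:{site}")                  # single domain
    return " ".join(parts)


# Spreadsheets about crime statistics on one (placeholder) domain.
print(build_query(["crime", "statistics"], filetype="xls", site="agency.gov"))

# Excel files with "downloads" in their web address.
print(build_query([], inurl="downloads", filetype="xls"))
```

Running one query per file type (xls, then csv, and so on) is a simple way to cover several formats without relying on how the engine combines repeated filetype: operators.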

Page 25: Open Data Journalism

2.Connecting and interrogating

• Numeracy skills
• Learn to love Excel
– http://www.openoffice.org/
• DocumentCloud if you don’t have a database
– Sorts through OpenCalais; you can annotate and reference your story from the source doc, then share
• Newsrooms to develop toolboxes for:
– Data gathering and capturing (e.g. spreadsheets in Google Docs for team collaboration)
– Analysis
– Visualisation

Page 26: Open Data Journalism

The main contributions of Excel for your data:

1. Sorting
• Organises into more revealing order.
2. Filtering
• Gets rid of unnecessary data
3. Using math and text functions
• AutoSum, median, maximum, minimum
4. Pivot tables
• Creates new tables from your ‘labels’ or variables
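The same four operations can be sketched outside a spreadsheet. The example below uses Python's standard library on a tiny, entirely made-up dataset (the province names are borrowed from an earlier slide, the numbers are invented) to show what sorting, filtering, summary functions and a pivot table each do to the rows.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Invented figures, for illustration only.
RAW = """province,year,incidents
Limpopo,2011,120
Limpopo,2012,90
Mpumalanga,2011,200
Mpumalanga,2012,260
"""

rows = list(csv.DictReader(io.StringIO(RAW)))
for row in rows:
    row["incidents"] = int(row["incidents"])  # text -> number

# 1. Sorting: largest values first.
by_incidents = sorted(rows, key=lambda r: r["incidents"], reverse=True)

# 2. Filtering: keep only the rows for one year.
only_2012 = [r for r in rows if r["year"] == "2012"]

# 3. Math functions: the equivalents of AutoSum, MEDIAN, MAX and MIN.
values = [r["incidents"] for r in rows]
summary = {
    "sum": sum(values),
    "median": median(values),
    "max": max(values),
    "min": min(values),
}

# 4. Pivot table: one row per province, one column per year.
pivot = defaultdict(lambda: defaultdict(int))
for r in rows:
    pivot[r["province"]][r["year"]] += r["incidents"]

print(by_incidents[0])
print(summary)
for province, years in pivot.items():
    print(province, dict(years))
```

The pivot step is the one that most often reveals a story: it turns a long list of records into a small table where a change between columns is easy to spot.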

Page 27: Open Data Journalism

3. Expressing

Page 29: Open Data Journalism

What not to do…

Where’s the story?

Page 30: Open Data Journalism

Tool | Category | Multi-purpose visualization | Mapping | Platform | Skill level | Data stored or processed | Designed for Web publishing?
Data Wrangler | Data cleaning | No | No | Browser | 2 | External server | No
Google Refine | Data cleaning | No | No | Browser | 2 | Local | No
R Project | Statistical analysis | Yes | With plugin | Linux, Mac OS X, Unix, Windows XP or later | 4 | Local | No
Google Fusion Tables | Visualization app/service | Yes | Yes | Browser | 1 | External server | Yes
Impure | Visualization app/service | Yes | No | Browser | 3 | Varies | Yes
Many Eyes | Visualization app/service | Yes | Limited | Browser | 1 | Public external server | Yes
Tableau Public | Visualization app/service | Yes | Yes | Windows | 3 | Public external server | Yes
VIDI | Visualization app/service | Yes | Yes | Browser | 1 | External server | Yes
Zoho Reports | Visualization app/service | Yes | No | Browser | 2 | External server | Yes
Choosel | Framework | Yes | Yes | Chrome, Firefox, Safari | 4 | Local or external server | Not yet
Exhibit | Library | Yes | Yes | Code editor and browser | 4 | Local or external server | Yes
Google Chart Tools | Library and Visualization app/service | Yes | Yes | Code editor and browser | 2 | Local or external server | Yes
JavaScript InfoVis Toolkit | Library | Yes | No | Code editor and browser | 4 | Local or external server | Yes

Page 31: Open Data Journalism

Tool | Category | Multi-purpose visualization | Mapping | Platform | Skill level | Data stored or processed
OpenHeatMap | GIS/mapping: Web | No | Yes | Browser | 1 | External server
OpenLayers | GIS/mapping: Web, Library | No | Yes | Code editor and browser | 4 | Local or external server
OpenStreetMap | GIS/mapping: Web | No | Yes | Browser or desktops running Java | 3 | Local or external server
TimeFlow | Temporal data analysis | No | No | Desktops running Java | 1 | Local
IBM Word-Cloud Generator | Word clouds | No | No | Desktops running Java | 2 | Local
Gephi | Network analysis | No | No | Desktops running Java | 4 | Local
NodeXL | Network analysis | No | No | Excel 2007 and 2010 on Windows | 4 | Local
CSVKit | CSV file analysis | No | No | Linux, Mac OS X or Linux with Python installed | 3 | Local
DataTables | Create sortable, searchable tables | No | No | Code editor and browser | 3 | Local or external server
FreeDive | Create sortable, searchable tables | No | No | Browser | 2 | External server
Highcharts* | Library | Yes | No | Code editor and browser | 3 | Local or external server
Mr. Data Converter | Data reformatting | No | No | Browser | 1 | Local or external server
Panda Project | Create searchable tables | No | No | Browser with Amazon EC2 or Ubuntu Linux | 2 | Local or external server
PowerPivot | Analysis and charting | Yes | No | Excel 2010 on Windows | 3 | Local
Weave | Visualization app/service | Yes | Yes | Flash-enabled browsers; Linux server on backend | 4 | Local or external server

Page 32: Open Data Journalism

4. Personalisation

• Your users are an additional source of data: “Give me a headline to a story that I have no interest in and I'm not likely to click it; suggest a topic that I know something about and I'll read the article”. (Sarah Marshall)
• Personalised content is King
• Solution to “info glut” – filters out noise
• About developing personal connections between publication and reader
• Link to local content

Page 33: Open Data Journalism

How to

• Start with data and look for stories?
– MP expenses scandal again
• Or start with lead and look for data?
• Or redirect because of data?
• Deductive v inductive

Page 34: Open Data Journalism

Starter Tools

• ICFJ Anywhere
– Online lessons
• Many Eyes
– Visualisation
• Google Fusion Tables
– Mapping
– Don’t forget Open Street Map
• Google Refine
– Tool for cleaning up data
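As an illustration of what "cleaning up data" means in practice, the sketch below groups spellings that are probably the same name. It is a simplified sketch in the spirit of key-collision clustering (normalise each value to a "fingerprint" and group values that share one), not Google Refine's actual code. The function names and the sample names are invented for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a name to a key that ignores case, accents, punctuation,
    extra spaces and word order."""
    text = unicodedata.normalize("NFKD", value)
    text = text.encode("ascii", "ignore").decode("ascii")   # drop accents
    text = re.sub(r"[^\w\s]", "", text.strip().lower())     # drop punctuation
    tokens = sorted(set(text.split()))                      # order-insensitive
    return " ".join(tokens)


def cluster(values):
    """Return groups of values that share a fingerprint (likely duplicates)."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Invented examples of the same body captured three different ways.
NAMES = [
    "City of Cape Town",
    "city of cape town ",
    "Cape Town, City of",
    "Johannesburg",
]

for group in cluster(NAMES):
    print(group)
```

A human should still review each group before merging, because two genuinely different names can occasionally collapse to the same key.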

Page 35: Open Data Journalism

What to do?

1. Publish your own data using an open license
• Creative Commons

2. Work with existing communities
• ODADI, HacksHackers

3. Use and support existing initiatives and technologies
• ODADI, CKAN

4. Keep innovating

Page 36: Open Data Journalism

Let’s rethink

Let’s pick two or four of the reported stories and rethink them in terms of the four steps of data journalism.

Page 37: Open Data Journalism

Story

Data

PAIA

Leaks