data journalism 101

Post on 26-Jan-2015


DESCRIPTION

Data Journalism 101 workshop, presented by AP data journalist Serdar Tumgoren on April 29, 2014 to Bay Area journalists. Organized by the Society of Professional Journalists - Northern California chapter.

TRANSCRIPT

Data Journalism 101

What is data journalism?

DJ in the wild

What is data journalism?


“Wrangling, vetting and visualizing data to bring forth news stories in the public interest that we never would have found otherwise.” - Garance Burke, AP data journalist

“A data journalist is anyone ... who can fluently work with this primary source [data]. It’s the same as a traditional reporter, who should know how to hunt down human sources and interview them.” - Me (I know, so lame to quote yourself)

“Data journalism is a form of reporting that makes use of structured data (e.g. spreadsheets, databases) as a key component of researching and telling stories.” - Chad Skelton, data journalist at Vancouver Sun and journalism instructor

“Data can be the source of data journalism, or it can be the tool with which the story is told, or it can be both. Like any source, it should be treated with scepticism; and like any tool, we should be conscious of how it can shape and restrict the stories that are created with it.” - Paul Bradshaw, Data Journalism Handbook

Step-by-Step Guide on How To Become a Journicorn

Step 1: Master the Basics

In no particular order:

Excel, MySQL, Postgres, SPSS, R, Javascript, Linux, Python, Ruby, QGIS, pdftk, ARCGIS, Ruby on Rails, Django, Backbone, Node, Hadoop, Mongo, C, Algol, Hypercard, Can, You, Tell, I’m, Just, Making, Shit, Up, Now?

Don’t try to be a Journicorn. (Hint: They don’t exist.)

Be a journalist who uses data.

Data is just another source.

Start with a Question, then Data

● Are housing prices going up?
● Do reports of falling crime bear out across the entire city?
● Are developers helping to finance campaigns of politicians who approved their projects?
● Are public employee salaries on the rise?

Data sources

● Public agencies (local, county, state, federal)
● Data.gov sites
● Social networking sites (often APIs)
● Nonprofits/industry experts
● Academic institutions
● Manually gathered

Not everything is on the web.

A whole world of data may never see the light of day on gov websites. How do you find it?

● Government forms provide clues
● Gov employees
● Software contracts and manuals

Useful datasets

● Building permits
● Campaign finance
● Corporate records
● Election
● Inspections
● Planning & Zoning
● Land records
● Etc. Etc.

Open Records Laws

● Know and understand your rights
● Try to negotiate first
● Seek expert advice (CalAware, CFAC)
● Don’t go fishing; craft targeted requests
● Follow through on requests

So I’ve found data. Now what?

Understand the Data.

● What is the origin of the data?
● What do the fields mean?
● What rules surround the data?
● Seek expert advice and sanity checks.

Wrangle the Data.

● What format is the source data?
● How do I convert the data for my tool of choice?
● Explore the data. Is it dirty?
● What cleanups are needed to answer my question?
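
One way to explore whether the data is dirty is to count blanks and look for values that differ only by case or stray spaces. The sketch below is an illustration only: the sample rows, the column names and the `profile` helper are all made up for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a real source file.
SAMPLE = """city,permits
Oakland,12
oakland ,7
San Jose,
San Jose,5
"""

def profile(rows, column):
    """Return (blank_count, Counter of raw non-blank values) for one column."""
    blanks = 0
    values = Counter()
    for row in rows:
        raw = row[column]
        if raw.strip() == "":
            blanks += 1
        else:
            values[raw] += 1
    return blanks, values

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
blank_permits, _ = profile(rows, "permits")
_, city_values = profile(rows, "city")

# Group spellings that differ only by case or surrounding whitespace.
variants = {}
for raw in city_values:
    variants.setdefault(raw.strip().lower(), set()).add(raw)
suspect = {key: sorted(names) for key, names in variants.items() if len(names) > 1}

print("blank permits:", blank_permits)
print("inconsistent city spellings:", suspect)
```

Anything that shows up in `suspect` is a candidate for cleanup before sorting or summing.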

Sort, Filter, Sum, etc.

● Spreadsheets can take you far.
● Aggregate functions in SQL.
● Patterns and outliers in stats programs.
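
As a rough illustration of filter, sum and sort with SQL aggregate functions, here is a small sketch using Python's built-in `sqlite3` module. The `salaries` table and its records are invented for the example; they are not from any real dataset.

```python
import sqlite3

# Hypothetical salary records: (department, year, salary).
RECORDS = [
    ("Police", 2012, 90000),
    ("Police", 2013, 95000),
    ("Parks", 2012, 50000),
    ("Parks", 2013, 52000),
    ("Parks", 2013, 48000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (department TEXT, year INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", RECORDS)

# Filter to one year, count and sum per department, sort by largest total.
query = """
    SELECT department, COUNT(*) AS employees, SUM(salary) AS total
    FROM salaries
    WHERE year = 2013
    GROUP BY department
    ORDER BY total DESC
"""
summary = conn.execute(query).fetchall()
for department, employees, total in summary:
    print(department, employees, total)
conn.close()
```

The same three moves (filter, aggregate, sort) map onto a spreadsheet's filter, pivot table and sort features.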

Add tools as needed.

Tools are abundant, free and paid.
Knowledge is abundant, freely shared*.

(*see IRE-L/NICAR-L)

Keep reporting.

Most often data is a starting point or supplement. Check conclusions in the real world and circle back to refine and qualify data analyses.

If you’re a visual person...

...confounded by the last few bits (like me)...

Talk to people

“What data do I need to answer my question?”

Get The Data

Clean The Data

Check The Data

Interview The Data

Interview People

Display The Data

Tell The Story

The Data Journalism Process

Story idea is the key.

Most stats were already available and supported or confirmed by reporting. But we wanted county breakdowns for 2013 (most recent full year of granular data). So...

Data wrangling ain’t pretty.

We got (dirty) data for 2013.

● copy/paste -> Excel = Fail
● pdftk -> CSV -> Excel = Fail
● pdftk -> CSV -> python -> Excel = Success
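
The slides do not show the script that was used, so the following is only a sketch of the kind of Python cleanup step that can sit between an extracted CSV and Excel. It assumes the CSV text already exists; the sample text, the column names and the `clean` function are hypothetical.

```python
import csv
import io

# Hypothetical text standing in for a CSV extracted from a PDF report.
RAW = """County,Cases
Alameda ," 1,204"
County,Cases
 Contra Costa,87
,
Marin,n/a
"""

def clean(raw_text):
    """Yield (county, cases) tuples from messy two-column CSV text."""
    reader = csv.reader(io.StringIO(raw_text))
    header = [cell.strip() for cell in next(reader)]
    for row in reader:
        cells = [cell.strip() for cell in row]
        if cells == header:      # header repeated at a page break
            continue
        if not any(cells):       # empty row
            continue
        county, cases = cells
        digits = cases.replace(",", "")  # drop thousands separators
        # Keep unparseable values as None so they can be checked by hand.
        yield county, int(digits) if digits.isdigit() else None

cleaned = list(clean(RAW))

# Write a tidy CSV that a spreadsheet can open; None becomes an empty cell.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["county", "cases"])
writer.writerows(cleaned)
print(out.getvalue())
```

Leaving bad values empty rather than guessing at them keeps the cleanup honest and flags rows to verify against the source document.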

Check the data.

A few strategies to ensure accuracy:

● Manually calculate a sample of subtotals, compare to calculated results.

● Compare totals to summary stats from third party.

● Have someone else check your work.
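
The first two checks can be partly automated. This is a minimal sketch, assuming you already have subtotals worked out by hand or published by a third party; the rows, the expected figures and both helper functions are invented for illustration.

```python
from collections import defaultdict

# Hypothetical cleaned rows: (county, city, count).
ROWS = [
    ("Alameda", "Oakland", 40),
    ("Alameda", "Berkeley", 15),
    ("Marin", "San Rafael", 9),
]

# Hypothetical figures calculated by hand or taken from a third-party summary.
EXPECTED = {"Alameda": 55, "Marin": 10}

def subtotals(rows):
    """Sum the count column per county."""
    totals = defaultdict(int)
    for county, _city, count in rows:
        totals[county] += count
    return dict(totals)

def mismatches(computed, expected):
    """Return {key: (computed, expected)} wherever the two disagree."""
    return {
        key: (computed.get(key), value)
        for key, value in expected.items()
        if computed.get(key) != value
    }

problems = mismatches(subtotals(ROWS), EXPECTED)
print(problems)
```

A non-empty result does not say which side is wrong, only where to start asking questions.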

Keep a Data Diary

● Document data sources
● Document field descriptions, quirks, etc.
● Document data cleaning process
● Document analysis

Remember.

Journicorns don’t exist.

The Data Padawan

● See data as another source.
● Find and master tools, as needed.
● Write stories.
● Keep learning.
● Rinse and repeat.
● The end.

Ping me.

Serdar Tumgoren
@zstumgoren
zstumgoren@gmail.com
