Data Journalism 101

Upload: serdar-tumgoren

Post on 26-Jan-2015

DESCRIPTION

Data Journalism 101 workshop, presented by AP data journalist Serdar Tumgoren on April 29, 2014 to Bay Area journalists. Organized by the Society of Professional Journalists - Northern California chapter.

TRANSCRIPT

Page 1: Data Journalism 101

Data Journalism 101

Page 2: Data Journalism 101

What is data journalism?

Page 3: Data Journalism 101

DJ in the wild

Page 4: Data Journalism 101
Page 8: Data Journalism 101

What is data journalism?

Page 9: Data Journalism 101

“Wrangling, vetting and visualizing data to bring forth news stories in the public interest that we never would have found otherwise.” - Garance Burke, AP data journalist

Page 10: Data Journalism 101

“A data journalist is anyone ... who can fluently work with this primary source [data]. It’s the same as a traditional reporter, who should know how to hunt down human sources and interview them.” - Me (I know, so lame to quote yourself)

Page 11: Data Journalism 101

“Data journalism is a form of reporting that makes use of structured data (e.g. spreadsheets, databases) as a key component of researching and telling stories.” - Chad Skelton, data journalist at Vancouver Sun and journalism instructor

Page 12: Data Journalism 101

“Data can be the source of data journalism, or it can be the tool with which the story is told — or it can be both. Like any source, it should be treated with scepticism; and like any tool, we should be conscious of how it can shape and restrict the stories that are created with it.” - Paul Bradshaw, Data Journalism Handbook

Page 13: Data Journalism 101

Step-by-Step Guide on How To Become a Journicorn

Page 14: Data Journalism 101

Step 1: Master the Basics

In no particular order:

Excel, MySQL, Postgres, SPSS, R, Javascript, Linux, Python, Ruby, QGIS, pdftk, ARCGIS, Ruby on Rails, Django, Backbone, Node, Hadoop, Mongo, C, Algol, Hypercard, Can, You, Tell, I’m, Just, Making, Shit, Up, Now?

Page 15: Data Journalism 101

Don’t try to be a Journicorn. (Hint: They don’t exist.)

Page 16: Data Journalism 101

Be a journalist who uses data.

Data is just another source.

Page 17: Data Journalism 101

Start with a Question, then Data

● Are housing prices going up?
● Do reports of falling crime bear out across the entire city?
● Are developers helping to finance campaigns of politicians who approved their projects?
● Are public employee salaries on the rise?

Page 18: Data Journalism 101

Data sources

● Public agencies (local, county, state, federal)
● Data.gov sites
● Social networking sites (often APIs)
● Nonprofits/industry experts
● Academic institutions
● Manually gathered

Page 20: Data Journalism 101

Not everything is on the web.

A whole world of data may never see the light of day on government websites. How do you find it?

● Government forms provide clues
● Gov employees
● Software contracts and manuals

Page 21: Data Journalism 101

Useful datasets

● Building permits
● Campaign finance
● Corporate records
● Election
● Inspections
● Planning & Zoning
● Land records
● Etc. Etc.

Page 22: Data Journalism 101

Open Records Laws

● Know and understand your rights
● Try to negotiate first
● Seek expert advice (CalAware, CFAC)
● Don’t go fishing; craft targeted requests
● Follow through on requests

Page 24: Data Journalism 101

So I’ve found data. Now what?

Page 25: Data Journalism 101

Understand the Data.

● What is the origin of the data?
● What do the fields mean?
● What rules surround the data?
● Seek expert advice and sanity checks.

Page 26: Data Journalism 101

Wrangle the Data.

● What format is the source data?
● How do I convert the data for my tool of choice?
● Explore the data. Is it dirty?
● What cleanups are needed to answer my question?
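
A minimal sketch of what "exploring and cleaning" can look like in Python. The sample rows, column names and cleanup rules here are invented for illustration; real data will have its own quirks, and the cleanups you need depend on your question.

```python
import csv
import io

# Hypothetical sample: invented rows with typical "dirty data" problems
# (stray whitespace, inconsistent case, currency symbols, a blank value).
RAW = """name,department,salary
 Jane Doe ,POLICE,"$85,000"
John Roe,police ,"72,500.50"
Ann Poe,Fire,
"""

def clean_salary(text):
    """Turn '$85,000' into 85000.0; return None for blanks."""
    stripped = text.strip().replace("$", "").replace(",", "")
    return float(stripped) if stripped else None

def clean_rows(csv_text):
    """Yield rows with trimmed names, title-cased departments, numeric salaries."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {
            "name": row["name"].strip(),
            "department": row["department"].strip().title(),
            "salary": clean_salary(row["salary"]),
        }

rows = list(clean_rows(RAW))

# Exploring: which records are missing the value we care about?
missing = [r["name"] for r in rows if r["salary"] is None]
```

Blank salaries are kept as None rather than guessed or dropped, so the gap stays visible and can be raised with the agency that supplied the data.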

Page 27: Data Journalism 101

Sort, Filter, Sum, etc.

● Spreadsheets can take you far.
● Aggregate functions in SQL.
● Patterns and outliers in stats programs.
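
For a feel of SQL aggregate functions, here is a small sketch using Python's built-in sqlite3 module. The table, column names and permit values are made up for the example; the same GROUP BY pattern works in MySQL or Postgres.

```python
import sqlite3

# Hypothetical data: invented building-permit rows (neighborhood, value in dollars).
PERMITS = [
    ("Mission", 120000),
    ("Mission", 80000),
    ("Sunset", 45000),
    ("Sunset", 30000),
    ("Marina", 500000),
]

def summarize(permits):
    """Count and sum permits per neighborhood, keep the big ones, sort descending."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE permits (neighborhood TEXT, value INTEGER)")
    conn.executemany("INSERT INTO permits VALUES (?, ?)", permits)
    rows = conn.execute(
        """
        SELECT neighborhood, COUNT(*) AS n, SUM(value) AS total
        FROM permits
        GROUP BY neighborhood          -- one output row per neighborhood
        HAVING SUM(value) > 100000     -- filter on the aggregated total
        ORDER BY total DESC            -- sort, largest first
        """
    ).fetchall()
    conn.close()
    return rows

summary = summarize(PERMITS)
```

With this sample, the query returns Marina and then Mission; Sunset falls below the threshold and is filtered out.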

Page 28: Data Journalism 101

Add tools as needed.

Tools are abundant, free and paid.

Knowledge is abundant, freely shared*.

(*see IRE-L/NICAR-L)

Page 29: Data Journalism 101

Keep reporting.

Most often data is a starting point or supplement. Check conclusions in the real world and circle back to refine and qualify data analyses.

Page 30: Data Journalism 101

If you’re a visual person...

...confounded by the last few bits (like me)...

Page 31: Data Journalism 101

Talk to people

“What data do I need to answer my question?”

Get The Data

Clean The Data

Check The Data

Interview The Data

Interview People

Display The Data

Tell The Story

The Data Journalism Process

Page 33: Data Journalism 101

Story idea is the key.

Most stats were already available and supported or confirmed by reporting. But we wanted county breakdowns for 2013 (most recent full year of granular data). So...

Page 34: Data Journalism 101

Data wrangling ain’t pretty.

We got (dirty) data for 2013.

● copy/paste -> Excel = Fail
● pdftk -> CSV -> Excel = Fail
● pdftk -> CSV -> python -> Excel = Success
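
The slides don't show the script that was used, so the following is only a guess at the kind of work the python step does: tidying text pulled out of a PDF so a spreadsheet can open it cleanly. The input text and the cleanup rules (dropping blank lines and repeated page headers) are assumptions for illustration.

```python
import csv
import io

# Hypothetical input: invented text resembling a table extracted from a
# multi-page PDF, with the header repeated on each page and blank lines.
EXTRACTED = """County,Count
Alameda,12

County,Count
Fresno,7
Marin,3
"""

def tidy(extracted_text):
    """Keep one header row; drop blank lines and repeated headers."""
    header = None
    cleaned = []
    for row in csv.reader(io.StringIO(extracted_text)):
        if not any(cell.strip() for cell in row):
            continue  # blank line between pages
        row = [cell.strip() for cell in row]
        if header is None:
            header = row           # first non-blank row is the header
            cleaned.append(row)
        elif row != header:
            cleaned.append(row)    # a data row, not a repeated header
    return cleaned

def to_csv(rows):
    """Serialize rows back to CSV text that a spreadsheet can import."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

tidy_rows = tidy(EXTRACTED)
clean_csv = to_csv(tidy_rows)
```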

Page 35: Data Journalism 101

Check the data.

A few strategies to ensure accuracy:

● Manually calculate a sample of subtotals, compare to calculated results.

● Compare totals to summary stats from third party.

● Have someone else check your work.
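
The second strategy, comparing your totals to a third party's summary stats, is easy to automate. A rough sketch, with invented records and invented "published" totals standing in for the real thing:

```python
from collections import defaultdict

# Hypothetical data: invented records and invented "published" county totals.
RECORDS = [
    ("Alameda", 5), ("Alameda", 7),
    ("Fresno", 4), ("Fresno", 3),
    ("Marin", 2),
]
PUBLISHED = {"Alameda": 12, "Fresno": 8, "Marin": 2}

def subtotals(records):
    """Sum the counts for each county."""
    totals = defaultdict(int)
    for county, count in records:
        totals[county] += count
    return dict(totals)

def mismatches(computed, published):
    """Return {county: (computed, published)} wherever the two disagree."""
    counties = sorted(set(computed) | set(published))
    return {
        c: (computed.get(c), published.get(c))
        for c in counties
        if computed.get(c) != published.get(c)
    }

computed = subtotals(RECORDS)
problems = mismatches(computed, PUBLISHED)
```

Here Fresno comes out as 7 against a published 8. A mismatch doesn't say which side is wrong, only where to go back and look (or whom to call).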

Page 36: Data Journalism 101

Keep a Data Diary

● Document data sources
● Document field descriptions, quirks, etc.
● Document data cleaning process
● Document analysis

Page 37: Data Journalism 101

Remember.

Journicorns don’t exist.

Page 38: Data Journalism 101

The Data Padawan

● See data as another source.
● Find and master tools, as needed.
● Write stories.
● Keep learning.
● Rinse and repeat.
● The end.

Page 40: Data Journalism 101

Ping me.

Serdar Tumgoren@[email protected]