Data Journalism 101


Post on 26-Jan-2015





Data Journalism 101 workshop, presented by AP data journalist Serdar Tumgoren on April 29, 2014 to Bay Area journalists. Organized by the Society of Professional Journalists - Northern California chapter.


<ul><li> 1. Data Journalism 101 </li></ul> <p>
2. What is data journalism?
3. "Wrangling, vetting and visualizing data to bring forth news stories in the public interest that we never would have found otherwise." - Garance Burke, AP data journalist
4. "A data journalist is anyone ...who can fluently work with this primary source [data]. It's the same as a traditional reporter, who should know how to hunt down human sources and interview them." - Me (I know, so lame to quote yourself)
5. "Data journalism is a form of reporting that makes use of structured data (e.g. spreadsheets, databases) as a key component of researching and telling stories." - Chad Skelton, data journalist at Vancouver Sun and journalism instructor
6. "Data can be the source of data journalism, or it can be the tool with which the story is told, or it can be both. Like any source, it should be treated with scepticism; and like any tool, we should be conscious of how it can shape and restrict the stories that are created with it." - Paul Bradshaw, Data Journalism Handbook
7. DJ in the wild
8. Data can be used to... Fact-check official narratives. Identify trends and patterns. Rank things. Analyze relationships. Find questions to explore. Automate breaking news alerts.
9. Step-by-Step Guide on How To Become a Journicorn
10. Step 1: Master the Basics. In no particular order: Excel, MySQL, Postgres, SPSS, R, Javascript, Linux, Python, Ruby, QGIS, pdftk, ARCGIS, Ruby on Rails, Django, Backbone, Node, Hadoop, Mongo, C, Algol, Hypercard, Can, You, Tell, I'm, Just, Making, Shit, Up, Now?
11. Don't try to be a Journicorn. (Hint: They don't exist.)
12. Be a journalist who uses data. Data is just another source.
13. Start with a Question, then Data. Are housing prices going up? Do reports of falling crime bear out across the entire city? Are developers helping to finance campaigns of politicians who approved their projects? Are public employee salaries on the rise?
14. Data sources: Public agencies' sites (local, county, state, federal). Social networking sites (often APIs). Nonprofits/industry experts. Academic institutions. Manually gathered.
15. Databases of Databases. Paid: Accurint ($), Nexis ($). Free: BRB Online Searches, Libraries.
16. Not everything is on the web. A whole world of data may never see the light of day on gov websites. How do you find it? Government forms provide clues. Gov employees. Software contracts and manuals.
17. Useful datasets: Building permits, Campaign finance, Corporate records, Election, Inspections, Planning &amp; Zoning, Land records, Etc. Etc.
18. Open Records Laws. Know and understand your rights. Try to negotiate first. Seek expert advice (CalAware, CFAC). Don't go fishing; craft targeted requests. Follow through on requests.
19. FOIA Resources: RCFP Open Gov Guide, RCFP Letter Generator, FOIA Machine. Experts: CalAware and CFAC.
20. So I've found data. Now what?
21. Understand the Data. What is the origin of the data? What do the fields mean? What rules surround the data? Seek expert advice and sanity checks.
22. Wrangle the Data. What format is the source data? How do I convert the data for my tool of choice? Explore the data. Is it dirty? What cleanups are needed to answer my question?
23. Sort, Filter, Sum, etc. Spreadsheets can take you far. Aggregate functions in SQL. Patterns and outliers in stats programs.
24. Add tools as needed. Tools are abundant, free and paid. Knowledge is abundant, freely shared*. (*see IRE-L/NICAR-L)
25. Keep reporting. Most often data is a starting point or supplement. Check conclusions in the real world and circle back to refine and qualify data analyses.
26. If you're a visual person... ...confounded by the last few bits (like me)...
27. The Data Journalism Process: Talk to people. What data do I need to answer my question? Get the data. Clean the data. Check the data. Interview the data. Interview people. Display the data. Tell the story.
28. Quick Hit Data Wrangling
29. Story idea is the key. Most stats were already available and supported or confirmed by reporting. But we wanted county breakdowns for 2013 (most recent full year of granular data). So...
30. Data wrangling ain't pretty. We got (dirty) data for 2013. copy/paste -&gt; Excel = Fail. pdftk -&gt; CSV -&gt; Excel = Fail. pdftk -&gt; CSV -&gt; Python -&gt; Excel = Success.
31. Check the data. A few strategies to ensure accuracy: Manually calculate a sample of subtotals and compare them to the calculated results. Compare totals to summary stats from a third party. Have someone else check your work.
32. Keep a Data Diary. Document data sources. Document field descriptions, quirks, etc. Document the data cleaning process. Document the analysis.
33. Remember. Journicorns don't exist.
34. The Data Padawan. See data as another source. Find and master tools, as needed. Write stories. Keep learning. Rinse and repeat. The end.
35. Join the Community. If you do nothing else, sign up for IRE-L and NICAR-L. Also, shameless plug for PythonJournos.
36. Ping me. Serdar Tumgoren @zstumgoren
</p>
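<p>The wrangle, sum and check steps from the slides (22, 23, 30 and 31) can be sketched in a few lines of Python. This is a minimal illustration only, not the workflow actually used for the story: the sample rows, the column names (county, year, count) and the helper names clean_rows, totals_by_county and check_against are all hypothetical, and the "third-party" total is a made-up figure standing in for a published summary stat.</p>

```python
import csv
import io
from collections import defaultdict

# Hypothetical dirty export: stray whitespace, inconsistent case,
# thousands separators and an empty row. Not real data.
RAW_CSV = """county,year,count
 Alameda ,2013,"1,200"
alameda,2013,300
Marin,2013,45
,,
MARIN,2013,5
"""


def clean_rows(text):
    """Yield (county, count) pairs with normalized names and integer counts."""
    for row in csv.DictReader(io.StringIO(text)):
        county = (row["county"] or "").strip().title()
        raw_count = (row["count"] or "").replace(",", "").strip()
        if not county or not raw_count:
            continue  # skip rows that carry no usable data
        yield county, int(raw_count)


def totals_by_county(rows):
    """Sum counts per county, like a pivot table or SQL GROUP BY."""
    totals = defaultdict(int)
    for county, count in rows:
        totals[county] += count
    return dict(totals)


def check_against(totals, expected_total):
    """Compare the grand total to a summary stat from another source."""
    return sum(totals.values()) == expected_total


rows = list(clean_rows(RAW_CSV))
totals = totals_by_county(rows)

# Sort descending to rank counties by count.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical published statewide figure used as a sanity check.
THIRD_PARTY_TOTAL = 1550
totals_match = check_against(totals, THIRD_PARTY_TOTAL)
```

<p>A mismatch in the final check does not say which side is wrong; it is a prompt to recalculate a sample of subtotals by hand and to have someone else review the cleaning steps.</p>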