data journalism
Philip Meyer, Detroit, 1967. Knight Newspapers reporter and Nieman Fellow interested in social research methods. Teamed up with an academic to test the stories being told about the riots (poor immigrants being ‘deviant’). Field research, analysis and publication took 1 month and debunked those stories: no correlation between rioting and income or origin. Line about information abundance and the need for ‘truth about the facts’.
Online Journalism
City University
Paul Bradshaw
Data journalism: “The truth about the facts”
1. How is 2012 different to 1967?
2. Getting data
3. Getting stories
Themes
Holly Watt, 2009
The Guardian and Wikileaks
“Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.”
Adrian Holovaty
• Times Data Blog
”QUOTE”
Now is a good time.
“The Tribune’s more than three dozen interactive databases, collectively have drawn three times as many page views as the site’s stories. [75% of traffic]”
http://bit.ly/dj2dmz
Everything is zeroes and ones
Numbers
Text
Live data
Behavioural data
Images, audio, video
If it’s digitised, it’s a subject for data journalism
(comparison, themes)
Times film genres
The process.
Start with the data and look for the stories? (MPs’ expenses)
Or start with a lead and look for the data?
Passive vs active data journalism
Official sources: ONS, data.gov.uk, etc.
Secondary FOI: disclosure logs, WDTK, Hansard
Reports and research: Google alerts
Unofficial sources: Scraperwiki, OpenlyLocal, OpenCorporates, OpenCharities, etc.
Compile: Reactive
Communities, mailing lists, groups
Advanced search: site:gov.uk (etc.), filetype:pdf (etc.)
Tip: database contents are invisible to search engines
Scrapers: use tools, write your own, or ask
Compile: Proactive
"disclosure log" site:gov.uk
"hate crime" filetype:xls site:police.uk
"confidential" filetype:pdf site:gov.uk
Walkthrough: advanced search
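The search operators above can also be assembled in code when the same search has to be repeated for many phrases or sites. A minimal sketch, assuming the standard Google search URL with a `q` parameter; the helper name `search_url` is made up for illustration.

```python
from urllib.parse import urlencode


def search_url(phrase, site=None, filetype=None):
    """Combine a quoted phrase with filetype: and site: operators.

    Returns the plain query string and a search URL for it.
    search_url is a hypothetical helper, not part of any library.
    """
    terms = [f'"{phrase}"']
    if filetype:
        terms.append(f"filetype:{filetype}")
    if site:
        terms.append(f"site:{site}")
    query = " ".join(terms)
    return query, "https://www.google.com/search?" + urlencode({"q": query})


query, url = search_url("hate crime", site="police.uk", filetype="xls")
print(query)
print(url)
```

Swapping the phrase, site or file type gives the other two searches on the slide.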
RSS, XML, JSON, RDF - and APIs
Scraperwiki
Outwit Hub
Google Refine
Yahoo! Pipes
Google Docs formulae
Feeds and scrapers
Format? Table? Pattern? URL?
'Structured' data
http://www.eib.org/projects/pipeline/?start=2009&end=2010&status=&region=&country=united+kingdom&sector=
http://www.ltscotland.org.uk/scottishschoolsonline/schools/5thyear.asp?iSchoolID=5237521
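When a URL carries its query in named parameters like these, a scraper can generate a whole series of pages by changing one value at a time. A minimal sketch, assuming the EIB address takes the parameters `start`, `end`, `status`, `region`, `country` and `sector`; the function name `pipeline_url` and the choice of years are illustrative only.

```python
from urllib.parse import urlencode

BASE = "http://www.eib.org/projects/pipeline/"


def pipeline_url(start, end, country):
    """Rebuild the query string, leaving the unused filters blank."""
    params = {
        "start": start,
        "end": end,
        "status": "",
        "region": "",
        "country": country,
        "sector": "",
    }
    return BASE + "?" + urlencode(params)


# One URL per year-long window; the year range is an arbitrary example.
urls = [pipeline_url(year, year + 1, "united kingdom") for year in range(2007, 2010)]
for u in urls:
    print(u)
```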
'Structured' HTML? (Use Firebug)
<p> <strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />Public Authority: London Borough of Southwark <br />Summary: </strong>The complainant requested a copy of the authorities approved business plan [...]<br /><strong>Section of Act/EIR & Finding: </strong>FOI 1 - Complaint Upheld , FOI 10 - Complaint Upheld <br /><a title="Opens in new window" href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" target="_blank">View PDF of Decision Notice FS50295557</a></p>
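Because every decision notice repeats the same labels (Case Ref, Date, Public Authority), the fields can be pulled out with simple patterns. A minimal sketch using regular expressions on a shortened copy of the snippet above; the field names and the `parse_notice` helper are invented for illustration, and a real scraper would need to cope with pages that break the pattern.

```python
import re

# Shortened copy of the snippet; the summary text is cut as in the original.
SAMPLE = (
    '<p> <strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />'
    'Public Authority: London Borough of Southwark <br />Summary: </strong>'
    'The complainant requested a copy [...]<br />'
    '<a title="Opens in new window" '
    'href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" '
    'target="_blank">View PDF of Decision Notice FS50295557</a></p>'
)

# One pattern per field; each captures the text that follows the label.
PATTERNS = {
    "case_ref": r"Case Ref:\s*(\S+)",
    "date": r"Date:\s*([\d/]+)",
    "authority": r"Public Authority:\s*(.*?)\s*<br",
    "pdf": r'href="([^"]+)"',
}


def parse_notice(html):
    """Return a dict of fields, with None where a label is missing."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, html)
        record[field] = match.group(1) if match else None
    return record


record = parse_notice(SAMPLE)
print(record)
```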
=ImportHTML("http://bob.com/mytable", "table", 1)
=ImportXML("http://backtweets.com/search.xml?itemsperpage=100&...")
=ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2)
Spreadsheet formulae
1. Open a spreadsheet
2. In cell A1 type a URL of a page with a table, e.g. http://www.horsedeathwatch.com
3. In cell A2 type: =ImportHTML(A1, "table", 1)
Instructions at http://excelnotes.posterous.com/tag/importhtml
Walkthrough: =IMPORT (Google Docs)
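For comparison, here is roughly what =ImportHTML(url, "table", 1) does, sketched with Python's standard library. It assumes the page has already been fetched into a string, that the tables are well formed and not nested; the class name `TableGrabber` and the sample page are invented for illustration.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect the rows of the Nth table on a page (1-based, like ImportHTML)."""

    def __init__(self, index=1):
        super().__init__()
        self.index = index
        self.seen = 0           # how many <table> tags have been opened
        self.in_target = False  # inside the table we want?
        self.in_cell = False    # inside a <td> or <th> of that table?
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.seen += 1
            self.in_target = self.seen == self.index
        elif self.in_target and tag == "tr":
            self.rows.append([])
        elif self.in_target and tag in ("td", "th"):
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_target = False
        elif tag in ("td", "th") and self.in_cell:
            self.rows[-1][-1] = self.rows[-1][-1].strip()
            self.in_cell = False

    def handle_data(self, data):
        if self.in_target and self.in_cell:
            self.rows[-1][-1] += data


# Invented sample page with two tables; only the first is wanted.
PAGE = """
<table><tr><th>Course</th><th>Deaths</th></tr>
<tr><td>Example Park</td><td>3</td></tr></table>
<table><tr><td>other</td></tr></table>
"""

grabber = TableGrabber(index=1)
grabber.feed(PAGE)
rows = grabber.rows
print(rows)
```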
"A problem for sites who want to provide privacy while allowing new users to join easily. Scraping services may constitute a violation of terms of service; tactics often resemble a denial-of-service attack or a security exploit."
Ethics
If you have to do a job more than once...
Let the computer do the work
Start with a question
What is the average? Who is top? Bottom?
Time: what has happened since last year? 10 years ago?
Space: Trends in fields/regions?
What is the context?
Total expenditure =SUM(D:D)
Biggest single spend =MAX(D:D)
Average (median) invoice value =MEDIAN(D:D)
Spend per day =SUM(D:D)/30
Number of invoices =COUNT(D2:D200)
Number of invoices over £5000 =COUNTIF(D2:D200,">5000")
Interview the data
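The same questions can be put to a column of figures in a script. A minimal sketch with Python's standard library, assuming the invoice values have already been loaded into a list; the figures are made up, and the 30-day month mirrors the spreadsheet formula above.

```python
from statistics import median

# Made-up invoice values standing in for column D.
invoices = [120.0, 7500.0, 980.0, 5200.0, 45.5, 5000.0]

total = sum(invoices)                             # =SUM(D:D)
biggest = max(invoices)                           # =MAX(D:D)
typical = median(invoices)                        # =MEDIAN(D:D)
per_day = total / 30                              # =SUM(D:D)/30
count = len(invoices)                             # =COUNT(D2:D200)
over_5000 = sum(1 for v in invoices if v > 5000)  # =COUNTIF(D2:D200,">5000")

print(total, biggest, typical, round(per_day, 2), count, over_5000)
```

As with COUNTIF, the comparison is strict, so an invoice of exactly £5000 is not counted.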
= indicates this is a formula
SUM is the formula to be applied
( contains the ingredients for that formula
D2:D300 this is a range of cells*
) ends the list of ingredients
*You might instead use a single cell, a value, or a ‘nested’ formula
Basic calculations
Walkthrough: using formulae
Use =COUNTIF to get a total number (e.g. loans over £1m)
Use =SUMIF to find the total value of those loans
Use =IF to create a new column that divides loans into 2 types
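A sketch of the same three steps in Python may help show what each formula is doing. The loan values and the two labels are made up; the £1m threshold comes from the example above.

```python
# Made-up loan values in pounds.
loans = [250_000, 1_500_000, 3_200_000, 900_000, 1_000_000]
THRESHOLD = 1_000_000

# =COUNTIF: how many loans are over the threshold?
big_count = sum(1 for value in loans if value > THRESHOLD)

# =SUMIF: what are those loans worth in total?
big_total = sum(value for value in loans if value > THRESHOLD)

# =IF: a new column that sorts every loan into one of two types.
labels = ["over £1m" if value > THRESHOLD else "£1m or under" for value in loans]

print(big_count, big_total)
print(list(zip(loans, labels)))
```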
Data health
warning!
Remember the context: spending over £500
Insert > Pivot table > Layout...
Put focus category in left column
In middle: count or sum or average
Across top: sub-categories
Sort, then re-edit to add count or sum, sub-categories
Data journalism on a deadline: Pivot tables
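Underneath, a pivot table groups rows by a category and a sub-category, then counts or sums each group. A minimal sketch of that grouping in Python; the departments, categories and amounts are invented, and `pivot` is a hypothetical helper rather than a library function.

```python
from collections import defaultdict

# Invented spending records: (focus category, sub-category, amount).
records = [
    ("Dept A", "Travel", 600.0),
    ("Dept A", "IT", 1200.0),
    ("Dept B", "Travel", 800.0),
    ("Dept A", "Travel", 550.0),
]


def pivot(rows):
    """Group by focus category, then sub-category, keeping count and sum."""
    table = defaultdict(lambda: defaultdict(lambda: {"count": 0, "sum": 0.0}))
    for row_key, col_key, value in rows:
        cell = table[row_key][col_key]
        cell["count"] += 1
        cell["sum"] += value
    return table


table = pivot(records)
for dept in sorted(table):
    for category in sorted(table[dept]):
        cell = table[dept][category]
        average = cell["sum"] / cell["count"]
        print(dept, category, cell["count"], cell["sum"], average)
```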
Questions?
Links
OnlineJournalismClasses.tumblr.com
Delicious.com/paulb/cityoj08
Delicious.com/paulb/DJ
Delicious.com/paulb/vis
Delicious.com/paulb/data
- Use advanced search to find data
- Use tools to scrape data
- Visualise a politician's speeches using Wordle or Many Eyes
- Google form to crowdsource beer cost data?
Lab
Books
Darrell Huff - How To Lie With Statistics
Blastland & Dilnot - The Tiger That Isn't
Donna Wong - The WSJ Guide to Information Graphics
Brian Suda - A Practical Guide to Designing with Data