data journalism

Upload: paul-bradshaw

Post on 28-Jan-2015

Category:

Education


TRANSCRIPT

Page 1: Data journalism
Page 2: Data journalism
Page 3: Data journalism

Philip Meyer, Detroit, 1967
Knight Newspapers reporter. Nieman Fellow interested in social research methods. Teamed up with an academic to test the stories being told about the riots (poor immigrants being ‘deviant’). Field research, analysis, publication: 1 month. Debunked: no correlation with income or origin. Line about information abundance and the need for ‘truth about the facts’.

Page 4: Data journalism

Online Journalism
City University
Paul Bradshaw

Data journalism: “The truth about the facts”

Page 5: Data journalism

1. How is 2012 different to 1967?
2. Getting data
3. Getting stories

Themes

Page 6: Data journalism

Holly Watt, 2009

Page 7: Data journalism

The Guardian and Wikileaks

Page 8: Data journalism
Page 9: Data journalism
Page 10: Data journalism
Page 11: Data journalism
Page 12: Data journalism

“Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.”

Adrian Holovaty

Page 13: Data journalism
Page 14: Data journalism
Page 15: Data journalism

• Times Data Blog

Page 16: Data journalism
Page 17: Data journalism

”QUOTE”

Now is a good time.

Page 18: Data journalism

“The Tribune’s more than three dozen interactive databases, collectively have drawn three times as many page views as the site’s stories. [75% of traffic]”

http://bit.ly/dj2dmz

Page 19: Data journalism

Everything is zeroes and ones

Page 20: Data journalism

Numbers
Text
Live data
Behavioural data
Images, audio, video

If it’s digitised, it’s a subject for data journalism

Page 21: Data journalism
Page 22: Data journalism

(comparison, themes)

Page 23: Data journalism

Times film genres

Page 24: Data journalism

The process.

Page 25: Data journalism

Page 26: Data journalism

Start with the data and look for the stories? (MPs’ expenses)
Or start with a lead and look for the data?

Passive vs active data journalism

Page 27: Data journalism

Official sources: ONS, data.gov.uk, etc.
Secondary FOI: disclosure logs, WDTK, Hansard
Reports and research: Google alerts
Unofficial sources: Scraperwiki, OpenlyLocal, OpenCorporates, OpenCharities, etc.

Compile: Reactive

Page 28: Data journalism

Communities, mailing lists, groups
Advanced search: site:gov.uk (etc), filetype:pdf (etc)
Tip: database contents are invisible to search engines
Scrapers - use tools, write your own, or ask someone

Compile: Proactive

Page 29: Data journalism

Page 30: Data journalism

"disclosure log" site:gov.uk
"hate crime" filetype:xls site:police.uk
"confidential" filetype:pdf site:gov.uk

Walkthrough: advanced search

Page 31: Data journalism

RSS, XML, JSON, RDF - and APIs
Scraperwiki
Outwit Hub
Google Refine
Yahoo! Pipes
Google Docs formulae

Feeds and scrapers

Page 32: Data journalism

Format? Table? Pattern? URL?

'Structured' data

Page 33: Data journalism

http://www.eib.org/projects/pipeline/?start=2009&end=2010&status=&region=&country=united+kingdom&sector=

http://www.ltscotland.org.uk/scottishschoolsonline/schools/5thyear.asp?iSchoolID=5237521
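The point of these two addresses is that the URL itself is structured: the filters sit in the query string, so changing one value should request a different slice of the data. A minimal sketch in Python of how that structure can be read and rewritten, using the first URL from the slide. The helper name `pipeline_url` is hypothetical, and it is an assumption that the site still accepts a URL with only these three parameters.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# URL copied from the slide
url = ("http://www.eib.org/projects/pipeline/"
       "?start=2009&end=2010&status=&region=&country=united+kingdom&sector=")

parts = urlsplit(url)
# keep_blank_values=True keeps the empty filters (status, region, sector)
params = parse_qs(parts.query, keep_blank_values=True)

for name, values in params.items():
    print(name, "=", repr(values[0]))


def pipeline_url(start, end, country):
    """Hypothetical helper: rebuild the same address with different filters."""
    query = urlencode({"start": start, "end": end, "country": country})
    return f"{parts.scheme}://{parts.netloc}{parts.path}?{query}"


print(pipeline_url(2010, 2011, "united kingdom"))
```

Looping over a range of years or a list of countries with a helper like this is the simplest form of scraper: the pattern in the URL does most of the work.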

Page 34: Data journalism

'Structured' HTML? (Use Firebug)

<p>      <strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />Public Authority: London Borough of Southwark <br />Summary: </strong>The complainant requested a copy of the authorities approved business plan  [...]<br /><strong>Section of Act/EIR &amp; Finding: </strong>FOI 1 - Complaint Upheld , FOI 10 - Complaint Upheld <br /><a title="Opens in new window" href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" target="_blank">View PDF of Decision Notice FS50295557</a></p>
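The HTML above has no table, but it is still 'structured': every entry repeats the same labels (Case Ref, Date, Public Authority) followed by a line break. A rough sketch in Python of how a scraper might exploit that pattern. The field names and the function `extract_notice` are illustrative assumptions, and regular expressions are used only because the pattern is so regular; a real scraper would more likely rely on an HTML parsing library.

```python
import re
from html import unescape

# The snippet from the slide, unchanged (including the elided summary)
snippet = (
    '<p>      <strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />'
    'Public Authority: London Borough of Southwark <br />Summary: </strong>'
    'The complainant requested a copy of the authorities approved business plan  [...]<br />'
    '<strong>Section of Act/EIR &amp; Finding: </strong>'
    'FOI 1 - Complaint Upheld , FOI 10 - Complaint Upheld <br />'
    '<a title="Opens in new window" '
    'href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" '
    'target="_blank">View PDF of Decision Notice FS50295557</a></p>'
)

# Each label is followed by its value and then a <br /> tag
FIELDS = {
    "case_ref": r"Case Ref:\s*(.*?)\s*<br",
    "date": r"Date:\s*(.*?)\s*<br",
    "authority": r"Public Authority:\s*(.*?)\s*<br",
    "finding": r"Finding:\s*</strong>\s*(.*?)\s*<br",
    "pdf": r'href="([^"]+)"',
}


def extract_notice(html):
    """Return a dict of the labelled fields found in one decision notice."""
    record = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, html)
        record[name] = unescape(match.group(1)) if match else None
    return record


notice = extract_notice(snippet)
for name, value in notice.items():
    print(f"{name}: {value}")
```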

Page 35: Data journalism

=ImportHTML("http://bob.com/mytable", "table", 1)
=ImportXML("http://backtweets.com/search.xml?itemsperpage=100&...")
=ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2)

Spreadsheet formulae

Page 36: Data journalism

1. Open a spreadsheet
2. In cell A1 type a URL of a page with a table, e.g. http://www.horsedeathwatch.com
3. In cell A2 type: =ImportHTML(A1, "table", 1)

Instructions at http://excelnotes.posterous.com/tag/importhtml

Walkthrough: =IMPORT (Google Docs)
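For anyone curious what =ImportHTML(A1, "table", 1) is doing, the sketch below imitates it in Python: find the first table in a page and return its cells as rows. It is not how Google Docs implements the function. The class `TableExtractor` and the sample page are made up for illustration, nested tables are not handled, and the HTML is supplied as a string rather than fetched from a live URL.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of the index-th <table> (1-based, like ImportHTML)."""

    def __init__(self, index=1):
        super().__init__()
        self.index = index
        self.tables_seen = 0
        self.in_target = False
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables_seen += 1
            self.in_target = self.tables_seen == self.index
        elif self.in_target and tag == "tr":
            self.rows.append([])          # start a new row
        elif self.in_target and tag in ("td", "th") and self.rows:
            self.in_cell = True
            self.rows[-1].append("")      # start a new cell

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_target = False
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_target and self.in_cell:
            self.rows[-1][-1] += data.strip()


# Made-up page standing in for the URL typed into cell A1
page = """
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Example A</td><td>10</td></tr>
  <tr><td>Example B</td><td>20</td></tr>
</table>
"""

parser = TableExtractor(index=1)
parser.feed(page)
for row in parser.rows:
    print(row)
```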

Page 37: Data journalism

"A problem for sites who want to provide privacy while allowing new users to join easily. Scraping services may constitute a violation of terms of service; tactics often resemble a denial-of-service attack or a security exploit."

Ethics

Page 38: Data journalism

If you have to do a job more than once...

Let the computer do the work

Page 39: Data journalism

Start with a question

What is the average? Who is top? Bottom?
Time: what has happened since last year? 10 years ago?
Space: Trends in fields/regions?
What is the context?

Page 40: Data journalism
Page 41: Data journalism
Page 42: Data journalism

Total expenditure =SUM(D:D)
Biggest single spend =MAX(D:D)
Average invoice value =MEDIAN(D:D)
Spend per day =SUM(D:D)/30
Number of invoices =COUNT(D2:D200)
Number of invoices over £5000 =COUNTIF(D2:D200,">5000")

Interview the data
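The same 'interview' can be run outside a spreadsheet. A small sketch in Python, with each line matched to the formula it mirrors. The invoice amounts are invented stand-ins for column D, and the 30-day period simply follows the /30 on the slide.

```python
from statistics import median

# Invented invoice amounts in pounds, standing in for column D
invoices = [120.0, 7500.0, 560.0, 980.0, 5200.0, 640.0]
days_in_period = 30  # the slide divides by 30, i.e. one month of spending

total = sum(invoices)                              # =SUM(D:D)
biggest = max(invoices)                            # =MAX(D:D)
typical = median(invoices)                         # =MEDIAN(D:D)
per_day = total / days_in_period                   # =SUM(D:D)/30
count = len(invoices)                              # =COUNT(D2:D200)
over_5000 = sum(1 for v in invoices if v > 5000)   # =COUNTIF(D2:D200,">5000")

print(f"Total expenditure: £{total:,.2f}")
print(f"Biggest single spend: £{biggest:,.2f}")
print(f"Median invoice value: £{typical:,.2f}")
print(f"Spend per day: £{per_day:,.2f}")
print(f"Number of invoices: {count}")
print(f"Invoices over £5000: {over_5000}")
```

Note that the slide uses the median, not the mean, as its 'average': two large invoices like the ones in this sample would drag a mean well above what a typical invoice looks like.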

Page 43: Data journalism

= indicates this is a formula
SUM is the formula to be applied
( contains the ingredients for that formula
D2:D300 this is a range of cells*
) ends the list of ingredients

*You might instead use a single cell, a value, or a ‘nested’ formula

Basic calculations

Page 44: Data journalism
Page 45: Data journalism

Walkthrough: using formulae

Use =COUNTIF to get a total number (e.g. loans over £1m)
Use =SUMIF to find the total value of those loans
Use =IF to create a new column that divides loans into 2 types
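The logic behind the three formulae is the same in any tool: apply a test to each row, then count, sum or label the rows that pass. A sketch in Python with invented loan values; the £1m threshold comes from the slide, while the label wording is an assumption.

```python
# Invented loan values in pounds
loans = [250_000, 1_500_000, 4_000_000, 900_000, 1_000_000]
threshold = 1_000_000  # "loans over £1m"

big_loans = [value for value in loans if value > threshold]

count_big = len(big_loans)   # like =COUNTIF(range, ">1000000")
total_big = sum(big_loans)   # like =SUMIF(range, ">1000000")

# like =IF(cell>1000000, "over £1m", "£1m or under") filled down a new column
labels = ["over £1m" if value > threshold else "£1m or under" for value in loans]

print(count_big, total_big)
for value, label in zip(loans, labels):
    print(f"£{value:,} -> {label}")
```

A loan of exactly £1m falls in the "or under" group here because the test is "greater than"; deciding which side the boundary belongs to is a reporting choice, not a technical one.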

Page 46: Data journalism

Data health

warning!

Remember the context: spending over £500

Page 47: Data journalism
Page 48: Data journalism

Insert > Pivot table > Layout...
Put focus category in left column
In middle: count or sum or average
Across top: sub-categories
Sort, then re-edit to add count or sum, sub-categories

Data journalism on a deadline: Pivot tables
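A pivot table is a grouped summary: one row per focus category, one column per sub-category, and a count or sum in the middle. The sketch below shows that idea in plain Python. The departments, categories and amounts are invented, and the layout only approximates what a spreadsheet would draw.

```python
from collections import defaultdict

# Invented spending rows: (focus category, sub-category, amount in pounds)
rows = [
    ("Highways", "Contractors", 1200.0),
    ("Highways", "Equipment", 800.0),
    ("Libraries", "Equipment", 650.0),
    ("Highways", "Contractors", 3000.0),
    ("Libraries", "Contractors", 500.0),
]

sums = defaultdict(lambda: defaultdict(float))  # department -> category -> total
counts = defaultdict(int)                       # department -> number of rows

for department, category, amount in rows:
    sums[department][category] += amount
    counts[department] += 1

# Sort departments by total spend, biggest first (the "Sort" step on the slide)
ranked = sorted(sums, key=lambda d: sum(sums[d].values()), reverse=True)

for department in ranked:
    breakdown = ", ".join(
        f"{category}: £{total:,.0f}"
        for category, total in sorted(sums[department].items())
    )
    print(f"{department} ({counts[department]} invoices) - {breakdown}")
```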

Page 49: Data journalism

Questions?

Page 50: Data journalism

Links

OnlineJournalismClasses.tumblr.com
Delicious.com/paulb/cityoj08
Delicious.com/paulb/DJ
Delicious.com/paulb/vis
Delicious.com/paulb/data

Page 51: Data journalism

- Use advanced search to find data
- Use tools to scrape data
- Visualise a politician's speeches using Wordle or Many Eyes
- Google form to crowdsource beer cost data?

 Lab

Page 52: Data journalism

Books

Darrell Huff - How To Lie With Statistics
Blastland & Dilnot - The Tiger That Isn't
Donna Wong - The WSJ Guide to Information Graphics
Brian Suda - A Practical Guide to Designing with Data