Data Journalism (City Online Journalism wk8)
Online Journalism
City University
Paul Bradshaw
Data journalism
1. What is it?
2. Where to get it
3. How to get it
Themes
“Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.”
Adrian Holovaty
Times film genres
• Times Data Blog
Now is a good time.
“The Tribune’s more than three dozen interactive databases, collectively have drawn three times as many page views as the site’s stories. [75% of traffic]”
http://bit.ly/dj2dmz
What is data?
• Numbers
• Text
• Live data
• Behavioural data
• Images, audio, video
Anything that a computer can work with
• Start with the data and look for the stories? (MPs’ expenses)
• Or start with a lead and look for the data?
Passive vs active data journalism
Data Journalism Continuum
• Data.gov.uk
• Guardian datastore
• Openlylocal, Open Corporates, Open Charities, Who's Lobbying etc.
• FOI requests (WDTK), disclosure logs
• Books - British Political Facts
Finding
• GetTheData.org
• WDMMG forums
• MySociety mailing lists
• Open Data Cookbook
• Wolfram Alpha forum
Finding – data communities
• Government - national and local
• 'Monitors' - regulators & other bodies
• Charities, pressure groups
• Institutions - academic, scientific, health
• Business, finance
• Media, entertainment, sport
Other secondary sources
• site:gov.uk (etc)
• filetype:pdf (etc)
• Imagine the page you hope to find, including jargon etc.
• Database contents are invisible to search engines
• Google News alerts: report OR review
Advanced search
• "quotes search for exact phrases", e.g. "disclosure logs" site:nhs.uk
• + ensures page contains word: +logs
• - omits results with word: -wooden
• * wildcard, e.g. "deaths * custody"
• ~ synonyms, e.g. ~deaths
Advanced search
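The operators above can be combined mechanically. A minimal Python sketch follows, assuming a hypothetical helper called build_query (not part of any search engine's API) that simply joins the pieces into the string you would type into the search box:

```python
def build_query(phrase=None, site=None, filetype=None, exclude=(), any_of=()):
    """Assemble a search query string from the operators on the slide.

    Hypothetical helper for illustration only: it builds text, it does not
    contact any search engine.
    """
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')            # exact phrase in quotes
    if any_of:
        parts.append(" OR ".join(any_of))      # alternatives, e.g. report OR review
    if site:
        parts.append(f"site:{site}")           # restrict to a domain
    if filetype:
        parts.append(f"filetype:{filetype}")   # restrict to a file type
    parts.extend(f"-{word}" for word in exclude)  # omit results with these words
    return " ".join(parts)


query = build_query(phrase="disclosure logs", site="nhs.uk",
                    filetype="pdf", exclude=["wooden"])
print(query)
```

Building the query in code is mainly useful when the same search has to be repeated across many sites or file types.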
Tip: use overseas sources
• US medicine databases
• EU subsidy databases
• Swedish people data
• International police agency correspondence with UK
• RSS, XML, JSON, RDF - and APIs
• Scraperwiki
• Outwit Hub
• Yahoo! Pipes
• Spreadsheet formulae
(look them up)
Feeds and scrapers
Format? Table? Pattern? URL?
'Structured' data
http://www.eib.org/projects/pipeline/?start=2009&end=2010&status=&region=&country=united+kingdom&sector=
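A URL like this one is 'structured' because its filters sit in the query string. The sketch below uses Python's standard urllib.parse to read those filters and to build a variant URL. It assumes the parameter names are start, end, status, region, country and sector. The helper with_filters and the replacement values are hypothetical, and nothing here checks what the site actually accepts.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

url = ("http://www.eib.org/projects/pipeline/"
       "?start=2009&end=2010&status=&region=&country=united+kingdom&sector=")

# Read the filters out of the query string, keeping the empty ones.
query = urlsplit(url).query
filters = {key: values[0]
           for key, values in parse_qs(query, keep_blank_values=True).items()}
print(filters)


def with_filters(url, **changes):
    """Return the same URL with some query parameters replaced (hypothetical helper)."""
    parts = urlsplit(url)
    params = {key: values[0]
              for key, values in parse_qs(parts.query, keep_blank_values=True).items()}
    params.update(changes)
    return urlunsplit(parts._replace(query=urlencode(params)))


# Assumed example values: a different date range and country.
new_url = with_filters(url, start="2008", end="2009", country="france")
print(new_url)
```

Once the pattern is clear, a scraper can loop over years or countries by changing one parameter at a time.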
'Structured' HTML? (Use Firebug)
<p> <strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />Public Authority: London Borough of Southwark <br />Summary: </strong>The complainant requested a copy of the authorities approved business plan [...]<br /><strong>Section of Act/EIR & Finding: </strong>FOI 1 - Complaint Upheld , FOI 10 - Complaint Upheld <br /><a title="Opens in new window" href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" target="_blank">View PDF of Decision Notice FS50295557</a></p>
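The markup above repeats a "Label: value" pattern separated by line breaks, which is what makes it scrapable. As a rough sketch in Python (standard library only), the fields can be pulled out by turning the breaks into newlines and stripping the remaining tags. The snippet in the code is an abbreviated copy of the example with the PDF link left out, and extract_fields is a hypothetical helper, not the method of any particular scraping tool.

```python
import re
from html import unescape

# Abbreviated copy of the decision-notice markup (PDF link omitted).
snippet = (
    '<p> <strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />'
    'Public Authority: London Borough of Southwark <br />Summary: </strong>'
    'The complainant requested a copy of the authorities approved business plan [...]'
    '<br /><strong>Section of Act/EIR & Finding: </strong>'
    'FOI 1 - Complaint Upheld , FOI 10 - Complaint Upheld <br /></p>'
)


def extract_fields(html):
    """Split 'Label: value' lines out of a block of simple HTML."""
    text = re.sub(r"<br\s*/?>", "\n", html)        # line breaks become newlines
    text = unescape(re.sub(r"<[^>]+>", "", text))  # drop every other tag
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")    # split at the first colon only
        if sep:
            fields[label.strip()] = value.strip()
    return fields


record = extract_fields(snippet)
for label, value in record.items():
    print(label, "=>", value)
```

Regular expressions are fragile on messy pages. For anything larger than a snippet, a proper HTML parser is usually the safer choice.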
=ImportHTML("http://bob.com/mytable", "table", 1)
=ImportXML("http://backtweets.com/search.xml?itemsperpage=100&...")
=ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2)
Spreadsheet formulae
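For a sense of what an ImportHTML-style formula does, here is a rough Python sketch that collects the rows of the nth table in a page, counting from 1 as the spreadsheet formula does. The names TableGrabber and import_html_table are hypothetical, the sample page is invented, and nested tables are not handled. It illustrates the idea and is not how the spreadsheet function is implemented.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect cell text for every <table> in a page, one list of rows per table."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(page, index=1):
    """Return the rows of the index-th table in the page (1-based)."""
    grabber = TableGrabber()
    grabber.feed(page)
    return grabber.tables[index - 1]


# Invented sample page, for illustration only.
page = """
<table>
  <tr><th>Name</th><th>Count</th></tr>
  <tr><td>Example A</td><td>1</td></tr>
  <tr><td>Example B</td><td>2</td></tr>
</table>
"""
rows = import_html_table(page, 1)
for row in rows:
    print(row)
```

To work on a live page, the HTML would first have to be downloaded, for example with urllib.request, before being passed to import_html_table.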
• Fetch Page module
• Regex
Yahoo! Pipes
"A problem for sites who want to provide privacy while allowing new users to join easily. Scraping services may constitute a violation of terms of service; tactics often resemble a denial-of-service attack or a security exploit."
Ethics
Questions?
Links
• OnlineJournalismClasses.tumblr.com
• Delicious.com/paulb/cityoj08
• Delicious.com/paulb/datajournalism
• Delicious.com/paulb/visualisation
• Delicious.com/paulb/data
- Use advanced search to find data
- Use tools to scrape data
- Visualise a politician's speeches using Wordle or Many Eyes
- Read up on some of the tools or technologies before the lab
Lab
Books
• Darrell Huff - How To Lie With Statistics
• Blastland & Dilnot - The Tiger That Isn't
• Dona Wong - The WSJ Guide to Information Graphics
• Brian Suda - A Practical Guide to Designing with Data
Assignments
Enough time?
• 10 credits = 100 hours
• Lectures = 15 hours
• Group blog = 60 hours (75%)
• Strategy = 20 hours (25%)
• (Some in labs) + 5 hours on other issues
Enough time? Blog
Just an example:
• 10 posts ranging from simple links to interviews, analysis, experiment
• 5.5 hours average per week x 10 weeks = 55 hours
• + 5 hours to write evaluation
Enough time? Strategy
Just an example:
• 12.5 hours researching community
• 30 mins per week x 10 weeks with community (2.5 hours)
• 5 hours analysis & write up
Group blogs
8 areas:
1. Online video
2. Online audio
3. Data
4. UGC
5. Community management
6. Mobile
7. Social media
8. Infographics and photography
Criteria
Assignment 1:
• Newsgathering/research
• Production
• Law, ethics and strategy
Assignment 2:
• Research
• Analysis
• Execution