![Page 1: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/1.jpg)
Introduction
Paul Bradshaw
Data journalism
![Page 2: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/2.jpg)
Ivy Lee
![Page 3: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/3.jpg)
“Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.”
Adrian Holovaty
![Page 4: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/4.jpg)
![Page 5: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/5.jpg)
Why?
• Great stories
• Engagement
• Targeting/relevance
![Page 6: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/6.jpg)
![Page 7: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/7.jpg)
![Page 8: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/8.jpg)
![Page 9: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/9.jpg)
![Page 10: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/10.jpg)
![Page 11: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/11.jpg)
“The Tribune’s biggest magnet by far has been its more than three dozen interactive databases, which collectively have drawn three times as many page views as the site’s stories.”
http://bit.ly/dj2dmz
![Page 12: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/12.jpg)
![Page 13: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/13.jpg)
![Page 14: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/14.jpg)
Times film genres
![Page 15: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/15.jpg)
![Page 16: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/16.jpg)
Data Journalism Continuum
![Page 17: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/17.jpg)
1. Finding data
![Page 18: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/18.jpg)
What is data?
![Page 19: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/19.jpg)
Anything that a computer can work with
• Numbers
• Text
• Connections
• Live data
• Behavioural data
• Images, audio, video
![Page 20: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/20.jpg)
![Page 21: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/21.jpg)
Passive vs active data journalism
• Start with the data and look for the stories? (MPs’ expenses)
• Or start with a lead and look for the data?
![Page 22: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/22.jpg)
Some UK projects
• Data.gov.uk
• What Do They Know
• Openlylocal, Scraperwiki
• Disclosure logs
• RSS feeds, XML, structured data
![Page 23: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/23.jpg)
CAR (computer-assisted reporting)
Delicious.com/paulb/car
![Page 24: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/24.jpg)
Advanced search by file type
“Performance figures” filetype:pdf
filetype:xls
filetype:doc
filetype:ppt
filetype:rdf OR filetype:xml
![Page 25: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/25.jpg)
Advanced search by domain
“Disclosure logs” site:.gov.es
Database site:.org.cat OR site:.org
+Tables -chairs site:
Health, police, military domains
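The search operators on the two slides above can also be assembled programmatically, which helps when the same search has to be repeated across many domains or file types. This is a minimal sketch only: `build_query` and its parameters are hypothetical names introduced for illustration, and it assumes the search engine accepts `filetype:` and `site:` with no space after the colon, a leading `-` for exclusion, and parentheses around `OR` groups.

```python
def _group(operator, values):
    """Join several values for one operator, e.g. (site:.org OR site:.gov)."""
    items = [f"{operator}:{value}" for value in values]
    if len(items) == 1:
        return items[0]
    return "(" + " OR ".join(items) + ")"


def build_query(phrase=None, terms=(), exclude=(), filetypes=(), sites=()):
    """Assemble a search query string from its parts (hypothetical helper)."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')          # exact phrase in straight quotes
    parts.extend(terms)                       # plain keywords
    parts.extend(f"-{word}" for word in exclude)  # words to exclude
    if filetypes:
        parts.append(_group("filetype", filetypes))
    if sites:
        parts.append(_group("site", sites))
    return " ".join(parts)


print(build_query(phrase="Disclosure logs", sites=[".gov.es"]))
print(build_query(phrase="Performance figures", filetypes=["rdf", "xml"]))
print(build_query(terms=["tables"], exclude=["chairs"]))
```

Looping over a list of domains (health, police, military) and calling `build_query` once per domain gives a repeatable set of searches to run by hand.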
![Page 26: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/26.jpg)
Use overseas sources
• US medicine databases
• EU subsidy databases
• Swedish people data
• International police agency correspondence
![Page 27: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/27.jpg)
Scraping
Scraping can automate and schedule the gathering process if there are multiple sources
Tools: OutWit Hub plugin, Yahoo! Pipes, Scraperwiki, Google Spreadsheets formulae
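For anyone who prefers code to the tools listed, the core of a scraper is small. The sketch below is an illustration under stated assumptions, not how any of those tools work internally: it pulls the rows out of an HTML table using only Python's standard library, and `SAMPLE_HTML` is made-up data standing in for a page that would normally be downloaded. Scheduling (for example, running the script once a day) is left to the operating system's scheduler.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample page; a real scraper would fetch this from a website.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Offence</th><th>Ward</th></tr>
  <tr><td>2010-03-01</td><td>Burglary</td><td>North</td></tr>
  <tr><td>2010-03-02</td><td>Theft</td><td>South</td></tr>
</table>
"""


def scrape_table(html):
    """Return the table in `html` as a list of rows (each a list of strings)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
print(header)
for record in records:
    print(record)
```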
![Page 28: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/28.jpg)
Interrogating data
![Page 29: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/29.jpg)
Time spent now...
• Humans collect data
• Humans enter data
• Human error
![Page 30: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/30.jpg)
...Saves time later
• Different words for the same thing
• Double spaces, punctuation
• Wrong data type
• Mistyped
• Duplicate entries
• Default entries (1/1/00)
![Page 31: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/31.jpg)
"Because we take the time to clean the data, we are able to do lobbying stories no other news organisation can do."
David Donald, Center for Public Integrity
![Page 32: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/32.jpg)
Cleaning methods
• Group by term then sort to see duplications
• Find & replace double spaces, etc.
• Select column/row & check data type
• Sort to find unusually large/small values, and neighbouring misspellings
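The same cleaning methods can be sketched in a few lines of Python. Everything here is illustrative: the sample names, amounts and dates are invented, the function names are hypothetical, and treating `1/1/00` as the only default entry is an assumption taken from the earlier slide.

```python
import re
from collections import Counter

DEFAULT_DATES = {"1/1/00"}  # assumed placeholder value for "no date entered"


def normalise(text):
    """Trim, collapse repeated spaces, drop trailing punctuation, lower-case."""
    text = re.sub(r"\s+", " ", text.strip())
    return text.rstrip(".,;:").lower()


def find_duplicates(values):
    """Group by normalised term and keep the terms that occur more than once."""
    counts = Counter(normalise(value) for value in values)
    return {term: n for term, n in counts.items() if n > 1}


def non_numeric(values):
    """Check data type: return entries that cannot be read as numbers."""
    bad = []
    for value in values:
        try:
            float(value.replace(",", ""))
        except ValueError:
            bad.append(value)
    return bad


# Invented sample columns.
names = ["Acme Ltd", "Acme  Ltd.", "acme ltd", "Bolt plc"]
amounts = ["1,200", "950", "n/a", "1200000"]
dates = ["3/4/09", "1/1/00", "12/6/09"]

duplicates = find_duplicates(names)
wrong_type = non_numeric(amounts)
defaults = [d for d in dates if d in DEFAULT_DATES]
# Sort the valid numbers so unusually large or small values sit at the ends.
sorted_amounts = sorted(
    float(a.replace(",", "")) for a in amounts if a not in wrong_type
)

print(duplicates)
print(wrong_type)
print(defaults)
print(sorted_amounts)
```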
![Page 33: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/33.jpg)
Never publish a name from data without running a background check
Check.
![Page 34: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/34.jpg)
Other tools
Freebase Gridworks: see http://vimeo.com/10081183
![Page 35: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/35.jpg)
Visualising data
![Page 36: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/36.jpg)
![Page 37: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/37.jpg)
or http://chartchooser.juiceanalytics.com/
![Page 38: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/38.jpg)
![Page 39: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/39.jpg)
(trends, dips, correlations)
![Page 40: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/40.jpg)
![Page 41: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/41.jpg)
(comparison, themes)
![Page 42: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/42.jpg)
(proportions, comparison)
![Page 43: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/43.jpg)
Mashing data
![Page 44: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/44.jpg)
Combining data sources
Geocoded data with map
- Live data (e.g. Twitter API)
- Static data (e.g. Google Docs)
- Dynamic data (e.g. Google Form)
2 spreadsheets with common data
- Tools: MySQL, Access, etc.
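Combining two spreadsheets that share a common column is what a database join does. As a rough sketch of the idea in Python rather than MySQL or Access, the example below matches rows on a shared `school_id` column; the two CSV snippets, the column names and the `join_on` helper are all invented for illustration.

```python
import csv
import io

# Invented sample spreadsheets, exported as CSV text.
SCHOOLS_CSV = "school_id,name\n101,Northfield\n102,Southbank\n103,Eastgate\n"
RESULTS_CSV = "school_id,pass_rate\n101,71\n103,64\n104,80\n"


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on(left, right, key):
    """Match rows from `left` with rows from `right` that share `key`.

    Returns (joined rows, left rows that found no match).
    """
    lookup = {row[key]: row for row in right}
    joined, unmatched = [], []
    for row in left:
        match = lookup.get(row[key])
        if match is None:
            unmatched.append(row)
        else:
            joined.append({**row, **match})
    return joined, unmatched


joined, unmatched = join_on(read_rows(SCHOOLS_CSV), read_rows(RESULTS_CSV), "school_id")
for row in joined:
    print(row)
print("No match:", unmatched)
```

The unmatched rows are worth reading as carefully as the matched ones: a missing match is often a cleaning problem (a mistyped identifier) and not a genuine absence.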
![Page 45: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/45.jpg)
![Page 46: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/46.jpg)
![Page 47: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/47.jpg)
![Page 48: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/48.jpg)
![Page 49: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/49.jpg)
![Page 50: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/50.jpg)
![Page 51: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/51.jpg)
![Page 52: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/52.jpg)
Combining data sources
• Twittermap
• Wikipedia map
• NYT Property
• Guardian vs Nature
• BBC Most Read
• BBC Olympic Village
![Page 53: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/53.jpg)
What mashes well?
• Big events (protests, Olympics, inauguration)
• Comparisons
• Geocoded data
• Connections
![Page 54: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/54.jpg)
Yahoo! Pipes
• Aggregates
• Maps
• Filters
• Counts
• Cleans or reformats (regex)
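The aggregate, filter, count and regex steps can be pictured as a chain of small functions. The sketch below imitates that chain in plain Python; it is not Yahoo! Pipes code, the two feeds are invented sample data, and the function names are hypothetical. The mapping step is left out.

```python
import re
from collections import Counter
from itertools import chain

# Invented sample feeds; in practice these would come from RSS or an API.
feed_a = [
    {"title": "Council  cuts library hours", "source": "Paper A"},
    {"title": "Match report: City 2-0 United", "source": "Paper A"},
]
feed_b = [
    {"title": "LIVE: Council budget vote", "source": "Blog B"},
    {"title": "Weather for the weekend", "source": "Blog B"},
]


def aggregate(*feeds):
    """Aggregate: merge several feeds into one list of items."""
    return list(chain.from_iterable(feeds))


def clean(item):
    """Clean with regex: drop a leading 'LIVE:' tag and collapse extra spaces."""
    title = re.sub(r"^LIVE:\s*", "", item["title"])
    title = re.sub(r"\s+", " ", title).strip()
    return {**item, "title": title}


def mentions(item, keyword):
    """Filter test: does the title contain the keyword (case-insensitive)?"""
    return keyword.lower() in item["title"].lower()


items = [clean(item) for item in aggregate(feed_a, feed_b)]
council_items = [item for item in items if mentions(item, "council")]
counts = Counter(item["source"] for item in council_items)  # count per source

for item in council_items:
    print(item["source"], "|", item["title"])
print(dict(counts))
```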
![Page 55: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/55.jpg)
Other tools
• Scraperwiki: mapping library
• Maptube: combine maps
• Google Docs: publish in different formats
• +++
![Page 56: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/56.jpg)
Semantic web & linked data
• Computer-readable data
• Paris: France, Texas, or Hilton?
• Unique identifiers, usually URIs
• RDF, RDFa, XML, etc.
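The "Paris" problem is easy to show in code. In this sketch the story titles are invented and the DBpedia-style URIs are used purely as examples of unique identifiers; treat the exact addresses as an assumption. Grouping by the text label lumps three different things together, while grouping by URI keeps them apart.

```python
from collections import defaultdict

# Invented mentions; the URIs are illustrative unique identifiers.
mentions = [
    {"label": "Paris", "uri": "http://dbpedia.org/resource/Paris",
     "story": "Eurostar delays"},
    {"label": "Paris", "uri": "http://dbpedia.org/resource/Paris,_Texas",
     "story": "County fair opens"},
    {"label": "Paris", "uri": "http://dbpedia.org/resource/Paris_Hilton",
     "story": "Celebrity court case"},
    {"label": "Paris", "uri": "http://dbpedia.org/resource/Paris",
     "story": "Louvre exhibition"},
]


def group_by(records, field):
    """Group story titles by the value of `field`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record["story"])
    return dict(groups)


by_label = group_by(mentions, "label")  # one ambiguous bucket
by_uri = group_by(mentions, "uri")      # one bucket per real-world thing

print(len(by_label), "group when keyed on the label")
print(len(by_uri), "groups when keyed on the URI")
```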
![Page 57: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/57.jpg)
API
• Application Programming Interface
• Build on top of data
• Google Maps, Twitter, Facebook, Digg, Guardian, NYT, NPR, They Work For You, etc.
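Most web APIs follow the same pattern: build a URL with query parameters, request it, and read back structured data (often JSON). The sketch below shows that pattern without touching the network. The endpoint `api.example.org`, the parameter names and the sample response are all hypothetical and do not describe any of the services listed above; each real API documents its own.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.org/search"  # hypothetical endpoint


def build_request_url(query, page=1, api_key="YOUR_KEY"):
    """Build the request URL; the parameter names are assumptions."""
    params = {"q": query, "page": page, "api-key": api_key}
    return BASE_URL + "?" + urlencode(params)


# Invented response body, standing in for what the server would send back.
SAMPLE_RESPONSE = """
{
  "results": [
    {"title": "Expenses claims: the full list", "date": "2009-06-18"},
    {"title": "How the data was published", "date": "2009-06-19"}
  ]
}
"""


def parse_titles(body):
    """Pull the headline out of each result in a JSON response."""
    data = json.loads(body)
    return [item["title"] for item in data["results"]]


url = build_request_url("data journalism")
titles = parse_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```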
![Page 58: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/58.jpg)
Q&A
Slideshare.net/onlinejournalist
Twitter.com/paulbradshaw
![Page 59: New information for new journalists pt2: data](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554acffbb4c90524738b54f2/html5/thumbnails/59.jpg)
Bookmarks
Delicious.com/paulb/datajournalism
Delicious.com/paulb/visualisation
Delicious.com/paulb/statistics