DATA: Now I’ve got it; what do I do with it?
Tom Johnson, Managing Director, Inst. for Analytic Journalism, Santa Fe, New Mexico USA
tom@jtjohnson.com


Page 1: DATA: Now I’ve got it; what do I do with it?online.sfsu.edu/jjohnson/Presentations/Johnson_IRE...NM HB 406 • “…information contained in information systems databases created

DATA: Now I’ve got it; what do I do with it?

Tom Johnson
Managing Director
Inst. for Analytic Journalism
Santa Fe, New Mexico USA
tom@jtjohnson.com

DATA: Now I’ve Got It; what do I do with it?

Presentation at IRE’s Albuquerque Watchdog Workshop
Feb. 12-13, 2011
Hosted by the University of New Mexico

• This PowerPoint deck posted at:

• Tipsheet handout posted at:

Important point

The document is not the data.

Important point

The document is not the data.
The data are not the story without analysis.

Important point

Nothing is as important, and valuable, as a good theory!

A good, important THEORY

Theory of Journalistic Process

Data In → Analysis → Info Out

DATA IN: Retrieve

Bookmark apps
• Objectives:
  • Access via browser – but not standard equipment
  • Create/manage sub-folders, categories & keywords, annotations
  • Private and/or public sharing
  • Save and export to backup system(s)
• Examples:
  • Xmarks: www.xmarks.com/
  • Diigo: www.diigo.com/index
  • Search for freeware/shareware at www.tucows.com

DATA IN: Store & Share in the Cloud

OK, it’s downloaded. Where ya gonna save it?

• Multiple back-up sites: desktop and…
• Safer in Cloud than otherwise
• Passwords, but share capabilities
• Easier with “Cloud-sync” apps
• Free to low-cost

DATA IN: Store & Share in the Cloud

OK, it’s downloaded. Where ya gonna save it?
• Avoid MS Windows Live, SkyDrive and Mesh – more trouble than they are worth
• Dropbox - www.dropbox.com

Your Hard Drive

DATA IN: Store & Share in the Cloud

OK, it’s downloaded. Where ya gonna save it?
• Avoid MS Windows Live, SkyDrive and Mesh – more trouble than they are worth
• Dropbox - www.dropbox.com

Viewed in your browser
Folders, subfolders, sub-subfolders, etc.
Nearly instant sync-ing with/from your desktop

DATA IN: Store & Share in the Cloud

OK, it’s downloaded. Where ya gonna save it?
• Avoid MS Windows Live, SkyDrive and Mesh – more trouble than they are worth
• Dropbox - www.dropbox.com
• SugarSync - www.sugarsync.com
• Syncplicity - www.syncplicity.com
• Jungle Disk ($3/month) - www.jungledisk.com
• Zumodrive ($3/month) - www.zumodrive.com
• AeroFS - www.aerofs.com
• SpiderOak - spideroak.com
• MiMedia, Wuala, Quanp,

Data In → Analysis → Info Out

Data In:
• Notes
• Text
• Numeric
• Images
• Charts/Graphs
• Maps
• Audio
• Video
• Atoms → Bits
• How? Who?

Data In: Objectives

• Move data from “out there” to analytic site/tools
• Seeking fine-grained data, NOT aggregations
• Seek data in original form (i.e. NO PDFs)
• Who collected the data? Why? How?
• Who proofed/edited the data? Why? How?
• If from data base, first ask for “record” or “code sheet” or “schema”
• Definitions of variables or fields. Constant or ???
• Get data in lowest common denominator format: comma-delimited files in ASCII or text
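A first look at a comma-delimited text file might go something like the sketch below. It only lists the field names, counts the records, and counts blank values per field, which is roughly what you would check against the code sheet. The function name `summarize_csv` and the sample rows are hypothetical, made up for illustration; they are not from any real agency file.

```python
import csv
import io


def summarize_csv(text_stream):
    """Return (field_names, row_count, blank_counts) for a comma-delimited stream."""
    reader = csv.DictReader(text_stream)
    fields = list(reader.fieldnames or [])
    blank_counts = {name: 0 for name in fields}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in fields:
            # A missing or whitespace-only value counts as blank.
            if not (row.get(name) or "").strip():
                blank_counts[name] += 1
    return fields, row_count, blank_counts


# Hypothetical sample standing in for a file received from an agency.
sample = io.StringIO(
    "agency,year,amount\n"
    "Roads,2010,1500\n"
    "Parks,2010,\n"
    "Roads,2011,1800\n"
)
fields, rows, blanks = summarize_csv(sample)
print(fields)  # field names to compare with the code sheet
print(rows)    # number of records
print(blanks)  # blank values per field
```

For a file on disk, the same function should work with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` sample, assuming the file really is UTF-8 or plain ASCII.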


Data In: Challenges
• New site in New Mexico: www.sunshineportalnm.com
• “Beta,” but looks to be a cruel joke on taxpayers; torture for journos

Data In: “Typical” problems with SunshineportalNM

• Barriers to data = barriers to analysis
• NO site search capability; no site map
• Completely abandoned open-standard HTML, going for the closed-standard Adobe Flash/Shockwave environment.
• Page formats/layouts not standard; too many drill-downs instead of search-driven generators
• Jiggly roll-overs; too much effort spent on bling
• Impossible to download or scrape data for analysis
• State makes information available only in Adobe PDF files; notoriously unfriendly to data analysis.

Data In: Challenges in SunshinePortalNM

• Comprehensive Annual Financial Reports
  • Possible to machine download, but laborious to format for analysis
• Investment Holdings reports are far worse
  • They are poor-quality static image files, not machine-readable.
  • Tabular data roughly formatted; makes conversion for analysis an arduous, if not impossible, task.

Bottom line on SunshinePortalNM.com

“If the State of New Mexico takes the position that through this site it is discharging all of its disclosure obligations with respect to these particular records, open government is in trouble there.”

“This is not even a web page, it’s a Flash application, so there’s not going to be much sunlight escaping from this portal.”

“A perfect example of creating the appearance of transparency without actually being transparent.”

Challenge for Watchdogs?

• Failure on the part of planners/bureaucrats to simply…
  • Give The People THEIR Data…
  • In The Most Basic, Original, Straightforward Form…
  • And Let Them Figure Out What Should Be Done With It!

Positive example of gov’t data

• Positive example: NM Leg Bill Locator
  http://www.nmlegis.gov/lcs/_session.aspx?chamber=H&legtype=B&legno=%20406&year=11

Same data available in two formats!

NM HB 406
• Senate approved 39-0 on Feb. 9
  http://www.nmlegis.gov/Sessions/11%20Regular/bills/house/HB0406.html
• “An Act RELATING TO PUBLIC RECORDS; PROVIDING FOR THE INSPECTION OF ELECTRONIC RECORDS.”

NM HB 406

• “…information contained in information systems databases created or maintained by or on behalf of a public body … shall be subject to disclosure to any person requesting the information in the format requested.”
• “The information shall be provided in the most effective and efficient manner available to the custodian, as defined in the Inspection of Public Records Act.”
• B. The custodian may charge a reasonable fee for production of the information requested. The fee shall not exceed the cost of the materials and reasonable charges for the personnel required to retrieve and provide the information.

But what if it wasn’t New Mexico state employees directly at fault?

Why is it sunshineportalNM.COM ?
• Domain Name: SUNSHINEPORTALNM.COM
• Registrar: WILD WEST DOMAINS, INC
• Referral URL: http://www.wildwestdomains.com
• Name Server: ENESFOUR.SKS.COM
• Name Server: ENESONE.SKS.COM
• Name Server: ENESTHREE.SKS.COM
• Name Server: ENESTWO.SKS.COM
• Status: clientDeleteProhibited
• Status: clientRenewProhibited
• Status: clientTransferProhibited
• Status: clientUpdateProhibited
• Updated Date: 30-mar-2010
• Creation Date: 30-mar-2010
• Expiration Date: 30-mar-2011
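Once WHOIS output like this has been saved as text, a few lines of code can turn it into something sortable. The sketch below is one possible approach, not a standard tool: `parse_whois` is a hypothetical helper that groups "Key: Value" lines into lists, since keys such as Name Server and Status repeat. It assumes the text has already been retrieved and does no network lookup itself.

```python
from collections import defaultdict


def parse_whois(text):
    """Group 'Key: Value' lines from WHOIS-style text into a dict of lists."""
    record = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only, so URLs in the value stay whole.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and key and value:
            record[key].append(value)
    return dict(record)


# A few lines copied from the record shown on the slide.
sample = """\
Domain Name: SUNSHINEPORTALNM.COM
Referral URL: http://www.wildwestdomains.com
Name Server: ENESFOUR.SKS.COM
Name Server: ENESONE.SKS.COM
Creation Date: 30-mar-2010
Expiration Date: 30-mar-2011
"""
record = parse_whois(sample)
print(record["Name Server"])
print(record["Referral URL"][0])
```

Lines with a key but no value are skipped, so a blank field simply does not appear in the result.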


Wild West Domains


• Registrant:
  Wild West Domains, Inc.
  14455 N Hayden Rd Suite 219
  Scottsdale, Arizona 85260 United States
  Registered through: WWDomains.com
• Domain Name: WILDWESTDOMAINS.COM
• Created on: 22-Aug-00
• Expires on: 22-Jul-19
• Last Updated on: 08-Dec-09
• Administrative Contact:
  Wild West Domains, Inc.
  [email protected]
  Wild West Domains, Inc.
  14455 N Hayden Rd Suite 219
  Scottsdale, Arizona 85260 United States
  +1.4805058800 Fax -- +1.4805058844
• Technical Contact:
  Wild West Domains, Inc.
  [email protected]
  Wild West Domains, Inc.
  14455 N Hayden Rd Suite 219
  Scottsdale, Arizona 85260 United States
  +1.4805058800 Fax -- +1.4805058844
• Domain servers in listed order:
  CNS1.SECURESERVER.NET
  CNS2.SECURESERVER.NET

Media Ecology Association - June 2007, Mexico City

Post-data recovery: Analytic DNA

Qualitative
• Who
• What
• When
• Why
• Where
• How

Quantitative
• How many/much
• What categories
• What type data and levels
• What changes
• What “timeline”

Geo-location
• All stories have geography
• People are interested in how close is this to me?

Data In → Analysis → Info Out

Data In:
• Notes
• Text
• Numeric
• Images
• Charts/Graphs
• Maps
• Audio
• Video
• Atoms → Bits
How? Who?

Analysis:
• What are we looking for? How can we be surprised?
• Source
• Definition
• Context
• Estimating
• Counting
• Statistical
• Geostatistical
• Social Network Analysis
• Forensic accounting

The “Fundamental Five” Statistics

1. Calculating percent of change
• (New - Old) ÷ Old * 100
or
• ((New ÷ Old) - 1) * 100

2. Calculating proportion:
• (# of parts ÷ TOTAL # of parts) * 100 = % of whole
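The first two formulas can be sketched in a few lines of Python. The function names and the sample figures are made up for illustration; the arithmetic follows the formulas on the slide, and the two forms of percent change should give the same answer.

```python
def percent_change(old, new):
    """Percent of change: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100


def percent_of_whole(part, total):
    """Proportion: (part / total) * 100."""
    if total == 0:
        raise ValueError("proportion is undefined when the total is 0")
    return part / total * 100


# Hypothetical figures: a budget line that grew from 200 to 250.
print(percent_change(200, 250))    # 25.0
print((250 / 200 - 1) * 100)       # 25.0, the alternate form
print(percent_of_whole(30, 120))   # 25.0
```

Both functions refuse a zero denominator instead of returning a misleading number, which is a design choice for this sketch, not a rule from the slide.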


The “Fundamental Five” Statistics

3. Calculating Rates:
• (incidents ÷ population) * 10,000 (or 100,000)

4. Calculating Ratios:
• Take first of two numbers being compared and divide by second.
• 600 ÷ 30 = 20 [Ratio is 20-to-1; if fraction, round off]
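Rates and ratios can be sketched the same way. The names `rate_per` and `ratio` are illustrative, and the incident and population figures are invented; only the 600 ÷ 30 example comes from the slide.

```python
def rate_per(incidents, population, per=100_000):
    """Rate: (incidents / population) * per, where per is 10,000 or 100,000."""
    if population <= 0:
        raise ValueError("population must be greater than 0")
    return incidents / population * per


def ratio(first, second):
    """Ratio: first number divided by the second."""
    if second == 0:
        raise ValueError("ratio is undefined when the second number is 0")
    return first / second


# The slide's ratio example.
print(ratio(600, 30))  # 20.0, i.e. 20-to-1

# Hypothetical: 45 incidents in a town of 90,000, per 10,000 residents.
print(round(rate_per(45, 90_000, per=10_000), 1))  # 5.0
```

Rates let you compare places of different size, so pick one base (10,000 or 100,000) and use it for every place in the story.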


The “Fundamental Five” Statistics

5. Calculating Inflation:
• (CPI Now ÷ CPI Then) * Item Price Then = Item price then in today’s $$$
  [Tool: http://www.westegg.com/inflation/]
• Calculating INFLATION RATE
  CPI in 2000 is 3,500. CPI in 2001 is 4,500. What's the inflation rate?
  4500 - 3500 = 1000
  1000 / 3500 = .2857...
  .2857 * 100 = 28.57% is the INFLATION RATE
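The inflation arithmetic above can be checked with a short sketch. The function names are illustrative, the CPI values are the slide's own teaching numbers (not real CPI figures), and the $10 item price is a made-up example.

```python
def adjust_for_inflation(price_then, cpi_then, cpi_now):
    """Price then, expressed in today's dollars: (CPI now / CPI then) * price then."""
    if cpi_then <= 0:
        raise ValueError("CPI then must be greater than 0")
    return cpi_now / cpi_then * price_then


def inflation_rate(cpi_then, cpi_now):
    """Inflation rate in percent: (CPI now - CPI then) / CPI then * 100."""
    if cpi_then <= 0:
        raise ValueError("CPI then must be greater than 0")
    return (cpi_now - cpi_then) / cpi_then * 100


# The slide's worked example: CPI of 3,500 in 2000 and 4,500 in 2001.
print(round(inflation_rate(3500, 4500), 2))              # 28.57

# Hypothetical: a $10.00 item from 2000, in 2001 dollars.
print(round(adjust_for_inflation(10.0, 3500, 4500), 2))  # 12.86
```

The inflation rate is just the percent-of-change formula applied to two CPI values.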


Data In → Analysis → Info Out

• Online tools
  • Google Docs Spreadsheets
  • Google Refine
  • Freebase
  • Google Fusion Tables

Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.

Fusion Tables: a service for managing large collections of tabular data in the cloud. You can upload tables of up to 100MB and share them with collaborators, or make them public.

Freebase is an open, Creative Commons licensed repository of structured data of almost 20 million entities. An entity is a single person, place, or thing. Freebase connects entities together as a graph.

Data In → Analysis → Info Out

Data In:
• Notes
• Text
• Numeric
• Images
• Charts/Graphs
• Maps
• Audio
• Video
• Atoms → Bits
How?

Analysis:
• What are we looking for? How can we be surprised?
• Source
• Definition
• Context
• Estimating
• Counting
• Statistical
• Geostatistical
• Social Network Analysis
• Forensic accounting

Info Out:
• Broadcast
• Web
• Audio
• Video
• Text
• Data visualization
• Maps
• Dynamic databases
• Archives

“Analytic tools” also for story-telling

• Spreadsheets:
  • Tables, charts, infographics
• Data base programs
  • Charts, graphs, data tables
• Stats programs (SPSS or SAS or R)
  • Generate graphics
• Social network analytic graphics
• GIS

“Analytic tools” also for story-telling

• Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/
• Timelines:
  • Sarah Cohen's Timeflow: https://github.com/FlowingMedia/TimeFlow/wiki/
  • xTimeline (http://www.xtimeline.com/timeline/JTJ-Newspaper-History)

Tomorrow?

• Our job is to “monitor the centres of power.”
  -- Amira Hass

The document is not the data

DATA: Now I’ve got it; what do I do with it?

Tom Johnson
Managing Director
Inst. for Analytic Journalism
Santa Fe, New Mexico USA
tom@jtjohnson.com

Thank you all