Conversations with Data. Tony Hirst, Computing and Communications, The Open University

Upload: tony-hirst

Post on 17-Dec-2014

DESCRIPTION

#dalmooc 27/10/12 slides

TRANSCRIPT

Page 1: Conversations with data

Conversations with Data

Tony Hirst

Computing and Communications,

The Open University

Page 2: Conversations with data

(Recognising and addressing

a skills gap)

Page 3: Conversations with data

/via Adam Cooper, “Exploratory Data Analysis” http://blogs.cetis.ac.uk/adam/2012/05/18/exploratory-data-analysis/

http://cm.bell-labs.com/cm/ms/departments/sia/tukey/memo/techtools.html

“The Technical Tools of Statistics”, read at the 125th Anniversary Meeting of the American Statistical Association, Boston, November 1964, published in April 1965 American Statistician.

John Tukey

“journeyman carpenter of data-analytical tools”

Page 4: Conversations with data

“A Boy's Work is Never Done”, KellyB. (flickr: foreverphoto/2467694199/)

ouseful.info

Page 5: Conversations with data

“Exploratory data analysis is an attitude,

a flexibility, and reliance on display,

not a bundle of techniques, and should be so taught.”

John Tukey

http://www.ece.rice.edu/~fk1/classes/ELEC697/TukeyEDA.pdf

Tukey, John W. "We need both exploratory and confirmatory." The American Statistician 34.1 (1980): 23-25.

Page 6: Conversations with data

“I … cannot disagree strongly enough with statements about the dangers of putting powerful tools in the hands of novices. Computer algebra, statistics, and graphics systems provide plenty of rope for novices to hang themselves and may even help to inhibit the learning of essential skills needed by researchers. The obvious problems caused by this situation do not justify blunting our tools, however. They require better education in the imaginative and disciplined use of these tools. And they call for more attention to the way powerful and sophisticated tools are presented to novice users.”

Leland Wilkinson, The Grammar of Graphics, Springer-Verlag, 1999, ISBN 0-387-98774-6, pp. 15-16.

Page 7: Conversations with data

Data accessibility

Data sensemaking

Page 8: Conversations with data

Clean

Shape

Augment

Look

Page 9: Conversations with data

Dirty Data

10th March, 2014,

3-10-14,

10/03/14

£1,249 million

NULL, NA, ‘’
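The dirty values on this slide can be tidied with a few small helpers. What follows is a minimal sketch using only the Python standard library, not the OpenRefine workflow the next slides show. The function names, the list of "missing" tokens and the choice of one date format per source are all assumptions made for illustration. In particular, 3-10-14 is ambiguous: it is read here as month-day-year so that all three strings give the same date, but a day-first reading would give 3 October instead.

```python
from datetime import date, datetime
import re

# Tokens treated as "no value". This set is an assumption; extend as needed.
MISSING = {"", "NULL", "NA"}


def clean_missing(value):
    """Return None for any token that stands for a missing value."""
    text = value.strip()
    return None if text in MISSING else text


def parse_date(value, fmt):
    """Parse a date string using an explicit strptime format."""
    # Drop ordinal suffixes first: "10th March, 2014" -> "10 March, 2014"
    text = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", value.strip())
    return datetime.strptime(text, fmt).date()


def parse_money(value):
    """Turn a string such as '£1,249 million' into a number of pounds."""
    match = re.fullmatch(
        r"£([\d,]+(?:\.\d+)?)\s*(million|billion)?", value.strip()
    )
    if match is None:
        raise ValueError(f"unrecognised amount: {value!r}")
    amount = float(match.group(1).replace(",", ""))
    scale = {None: 1, "million": 1_000_000, "billion": 1_000_000_000}
    return amount * scale[match.group(2)]


# Each raw string is paired with the format assumed for its source.
raw_dates = [
    ("10th March, 2014", "%d %B, %Y"),
    ("3-10-14", "%m-%d-%y"),   # assumed month-first
    ("10/03/14", "%d/%m/%y"),  # assumed day-first
]
parsed = [parse_date(text, fmt) for text, fmt in raw_dates]
all_same_day = all(d == date(2014, 3, 10) for d in parsed)
```

Stating the format per source, rather than guessing it from the string, keeps the ambiguity visible: the reader has to decide how 3-10-14 should be read.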

Page 10: Conversations with data

openrefine.org

Page 11: Conversations with data
Page 12: Conversations with data
Page 13: Conversations with data
Page 14: Conversations with data
Page 15: Conversations with data
Page 16: Conversations with data
Page 17: Conversations with data
Page 18: Conversations with data

Shapes…

Page 19: Conversations with data
Page 20: Conversations with data

I see trees…

Page 21: Conversations with data

See also: IPython notebook demo

http://nbviewer.ipython.org/gist/psychemedia/9c54721e853403b43d21/pivotTable_demo.ipynb
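The reshaping idea behind a pivot table can be sketched in plain Python. This is not the code from the linked notebook. The records, column names and the `pivot` and `melt` helpers are hypothetical, chosen only to show a long table turned into a wide one and back again.

```python
from collections import defaultdict

# Hypothetical "long" records: one row per (region, year) observation.
long_rows = [
    {"region": "North", "year": 2013, "spend": 10},
    {"region": "North", "year": 2014, "spend": 12},
    {"region": "South", "year": 2013, "spend": 7},
    {"region": "South", "year": 2014, "spend": 9},
]


def pivot(rows, index, columns, values):
    """Long to wide: table[index value][column value] = summed value."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[columns]] += row[values]
    return {key: dict(inner) for key, inner in table.items()}


def melt(table, index, columns, values):
    """Wide to long: one row per cell of the wide table."""
    return [
        {index: key, columns: col, values: val}
        for key, inner in table.items()
        for col, val in inner.items()
    ]


wide = pivot(long_rows, "region", "year", "spend")
round_trip = melt(wide, "region", "year", "spend")
```

Here `wide` has one entry per region with a value for each year, and melting it gives back the original rows. Rows that share a region and year are summed, which is one possible aggregation among several.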

Page 22: Conversations with data

“There is no more reason to expect one graph to ‘tell all’ than to expect one number to do the same.”

-- John Tukey

Page 23: Conversations with data

If quantities are conserved, can you think of them in terms of flow?

Page 24: Conversations with data

“[T]he picture examining eye is the best finder we have of the wholly unanticipated.”

John Tukey

http://www.ece.rice.edu/~fk1/classes/ELEC697/TukeyEDA.pdf

Tukey, John W. "We need both exploratory and confirmatory." The American Statistician 34.1 (1980): 23-25.

Page 25: Conversations with data

How can we look at data?

Page 26: Conversations with data
Page 27: Conversations with data
Page 28: Conversations with data

How else do we ask questions

of data?

Page 29: Conversations with data

underspend filetype:xls site:gov.uk

Search limits

Page 30: Conversations with data

underspend filetype:xls site:gov.uk

select webPages where text like “%underspend%” and filetype=“xls”

and domain=“gov.uk”

Structured queries

SQL
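The query on this slide is an analogy rather than runnable SQL. A working version of the same idea might look like the sketch below, which uses Python's built-in `sqlite3` module. The `webPages` table, its columns and the sample rows are all made up for illustration; no search engine exposes its index this way.

```python
import sqlite3

# In-memory database holding a handful of hypothetical page records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webPages (url TEXT, text TEXT, filetype TEXT, domain TEXT)"
)
conn.executemany(
    "INSERT INTO webPages VALUES (?, ?, ?, ?)",
    [
        ("dept-a/budget.xls", "Capital underspend by programme", "xls", "gov.uk"),
        ("dept-b/report.pdf", "Notes on underspend", "pdf", "gov.uk"),
        ("shop/prices.xls", "Price list", "xls", "example.com"),
        ("dept-c/outturn.xls", "Outturn and Underspend 2013", "xls", "gov.uk"),
    ],
)

# The search limits become WHERE conditions; values are passed as parameters.
query = """
    SELECT url
    FROM webPages
    WHERE text LIKE ? AND filetype = ? AND domain = ?
    ORDER BY url
"""
params = ("%underspend%", "xls", "gov.uk")
matches = [row[0] for row in conn.execute(query, params)]
conn.close()
```

Each search limit (`filetype:`, `site:`) maps onto one condition in the WHERE clause. SQLite's `LIKE` ignores case for ASCII letters by default, so "Underspend" matches too.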

Page 31: Conversations with data

Count things

Sort things
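Counting and sorting are often the first two questions to put to a column of data. A minimal sketch with the standard library follows; the list of categories is invented for the example.

```python
from collections import Counter

# Hypothetical column of category labels.
categories = ["roads", "schools", "roads", "health", "roads", "schools"]

# Count things: how often does each label occur?
counts = Counter(categories)

# Sort things: most frequent first, then alphabetically by label.
by_frequency = counts.most_common()
by_label = sorted(counts.items())
```

The same counts read differently depending on the sort order: by frequency they give a ranking, by label they give a lookup table.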

Page 32: Conversations with data

http://www.coolinfographics.com/blog/2014/8/29/false-visualizations-sizing-circles-in-infographics.html
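Assuming the linked post makes the usual point about circle sizing, the issue is whether a value is mapped to a circle's radius or to its area. The sketch below compares the two; the function names and numbers are illustrative assumptions, not taken from the post.

```python
import math


def radius_for_area(value, max_value, max_radius):
    """Radius that makes the circle's AREA proportional to the value."""
    return max_radius * math.sqrt(value / max_value)


def radius_linear(value, max_value, max_radius):
    """Radius proportional to the value, which distorts the perceived size."""
    return max_radius * value / max_value


# A value one quarter of the maximum, drawn against a radius-10 circle.
honest = radius_for_area(25, 100, 10)
distorted = radius_linear(25, 100, 10)

# Fraction of the largest circle's area that each version actually covers.
honest_area_share = (honest / 10) ** 2
distorted_area_share = (distorted / 10) ** 2
```

With area scaling the smaller circle covers a quarter of the larger one, matching the data. With radius scaling it covers only a sixteenth, so the difference looks far bigger than it is.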

Page 33: Conversations with data

How do we start to interpret the

answers?

Page 34: Conversations with data

Look for outliers

Top 3…

…bottom 3

median

mean
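These first checks can be sketched with the `statistics` module. The numbers are invented, and the 1.5 × IQR cut-off used to flag outliers is one common convention, not something the slide prescribes.

```python
import statistics

# Hypothetical measurements with one value far from the rest.
values = [12, 15, 11, 14, 13, 95, 16, 10, 12]

ordered = sorted(values)
bottom3 = ordered[:3]
top3 = ordered[-3:][::-1]  # largest first

mean = statistics.mean(values)
median = statistics.median(values)

# Flag values beyond 1.5 interquartile ranges from the quartiles.
q1, _, q3 = statistics.quantiles(values, n=4)
spread = q3 - q1
low_fence = q1 - 1.5 * spread
high_fence = q3 + 1.5 * spread
outliers = [v for v in values if v < low_fence or v > high_fence]
```

The mean (22) sits well above the median (13) here, which is itself a hint that something extreme is pulling it: the single value 95.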

Page 35: Conversations with data

Outliers may be rare occurrences over time too…

Streaks and runs…
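Streaks and runs can be found by collapsing consecutive repeats into (value, length) pairs. A minimal sketch using `itertools.groupby` follows; the win/loss sequence and the `runs` helper are made up for the example.

```python
from itertools import groupby


def runs(sequence):
    """Collapse consecutive repeats into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(sequence)]


# Hypothetical sequence of results over time.
results = ["W", "W", "L", "W", "W", "W", "L", "L"]

all_runs = runs(results)
longest = max(all_runs, key=lambda run: run[1])
```

The longest run, three wins in a row, is the kind of rare occurrence over time that a simple count of wins and losses would hide.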

Page 36: Conversations with data

Look for similarities & differences

Page 37: Conversations with data
Page 38: Conversations with data
Page 39: Conversations with data
Page 40: Conversations with data

Look for trends
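One simple way to make a trend easier to see is to smooth a series with a moving average. The sketch below assumes evenly spaced observations. The series and window size are invented, and this is only one of many smoothing choices, not a method the slides name.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical noisy series that drifts upwards.
series = [3, 5, 4, 6, 8, 7, 9]
smoothed = moving_average(series, 3)
```

The raw series rises and falls from point to point, while the smoothed one climbs steadily, which makes the underlying upward trend easier to read.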

Page 41: Conversations with data
Page 42: Conversations with data
Page 43: Conversations with data
Page 44: Conversations with data

Look for patterns & structure

Page 45: Conversations with data
Page 46: Conversations with data
Page 47: Conversations with data
Page 48: Conversations with data

“Hand-drawing of graphs, except perhaps for reproduction in books and in some journals, is now economically wasteful, slow, and on the way out.”

– John Tukey

Page 49: Conversations with data
Page 50: Conversations with data
Page 51: Conversations with data

Recording your conversations

Page 52: Conversations with data

Rstudio.org

Page 53: Conversations with data

IPython Notebook

Page 54: Conversations with data

“I know of no person or group that is taking nearly adequate advantage of the graphical potentialities of the computer.”

– John Tukey

Page 55: Conversations with data

Hopefully, that contained some ouseful.info

-- @psychemedia