An introduction to designing and building data visualizations Kristen Sosulski [email protected]

Post on 25-Feb-2016

Page 1: An introduction to designing and building data visualizations

An introduction to designing and building data visualizations

Kristen Sosulski [email protected]

Page 2: An introduction to designing and building data visualizations

About me

Page 3: An introduction to designing and building data visualizations

One of many influences…

Page 4: An introduction to designing and building data visualizations
Page 5: An introduction to designing and building data visualizations
Page 6: An introduction to designing and building data visualizations

Agenda

I. What is data visualization?
II. What types of stories can you tell with a visualization?
III. How to approach creating visualizations?
IV. Try it and apply it.

Page 7: An introduction to designing and building data visualizations

I. DEFINING VISUALIZATION

Page 8: An introduction to designing and building data visualizations

Visualization is a kind of narrative, providing a clear answer to a question without extraneous details.

-- Ben Fry, 2008, p. 4.

Page 9: An introduction to designing and building data visualizations

Visualization is a graphical representation of some data or concepts

-- Colin Ware, 2008, p. 20

Page 10: An introduction to designing and building data visualizations

Visual design is mapping data to visual form. It should convey the unique properties of the data set it represents.

Page 13: An introduction to designing and building data visualizations

Visualizations

• Help us think
• Use perception to offload cognition
• Serve as an external aid to augment working memory
• Boost our cognitive abilities

Page 14: An introduction to designing and building data visualizations

Visualizations are helpful in communication and analysis

Dual channels

Limited capacity

Active Processing

Page 15: An introduction to designing and building data visualizations

However, visualizations can hinder our message when designed poorly.

Wong, 2010, p. 15

Page 16: An introduction to designing and building data visualizations

Good Chart Design

Use natural increments for the y-axis scale

Include a zero baseline in all bar charts

Place the larger segments of a pie chart on top, starting at the 12 o’clock position

Wong, 2010, p. 143

Page 17: An introduction to designing and building data visualizations

Data visualization enables us to record, analyze, and communicate

Past Present Future

Page 18: An introduction to designing and building data visualizations

Rationale

• Traditional reports using tables, rows, and columns do not paint the whole picture or, even worse, can lead an analyst to a wrong conclusion.
• Firms need to use data visualization because information workers:
– Cannot see a pattern without data visualization
– Cannot fit all of the necessary data points onto a single screen
– Cannot effectively show deep and broad data sets on a single screen.

Source: Evelson, B. & Yuhanna, N. (2012). The Forrester Wave: Advanced data visualization (ADV) Platforms, Q3, 2012. Forrester Research, July 17.

Page 19: An introduction to designing and building data visualizations

Patterns: Violence in Video Games News Stories: Using a filled density plot

Source: http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html. Begin at 4:45

Page 20: An introduction to designing and building data visualizations

Data Points: U.S. Unemployment Rate using a choropleth map

Source: Forbes

Page 21: An introduction to designing and building data visualizations

Data Points: Student Loan Debt using bar, line, and area charts

Source: The Federal Reserve Bank of New York: http://www.newyorkfed.org/studentloandebt/

Page 23: An introduction to designing and building data visualizations

Deep and Broad: Four Ways to Slice Obama’s 2013 Budget Proposal using a bubble pie chart

Source: New York Times: http://www.nytimes.com/interactive/2012/02/13/us/politics/2013-budget-proposal-graphic.html

Page 24: An introduction to designing and building data visualizations

II. WHAT STORIES CAN YOU TELL WITH DATA VISUALIZATION?

Page 25: An introduction to designing and building data visualizations

Hans Rosling on Poverty using a bubble chart with sliders

http://www.ted.com/talks/hans_rosling_reveals_new_insights_on_poverty.html

Page 26: An introduction to designing and building data visualizations

All medalists racing the 100 meter sprint

Source: http://www.nytimes.com/interactive/2012/08/05/sports/olympics/the-100-meter-dash-one-race-every-medalist-ever.html

Page 27: An introduction to designing and building data visualizations

Old vs. New Data Visualization

• Dynamic data = Dynamic Visualizations
• Visual querying. Drill downs. Drop downs.
• Animated visualization.
– If a particular dimension, such as time, has hundreds or thousands of values (e.g., daily values over multiple years), manually clicking through every day is not practical.
– An animated scroll up/down is more practical.

Page 28: An introduction to designing and building data visualizations

You could tell a story like this… or

Page 29: An introduction to designing and building data visualizations

Patterns: How people spend their time using a stacked area/line graph

Source: New York Times

Page 30: An introduction to designing and building data visualizations

The growth of Target from 1962 to 2008 using an animated graduated symbol map

Source: FlowingData

Page 31: An introduction to designing and building data visualizations

How long does it take to afford a beer? Using a horizontal bar chart.

Page 32: An introduction to designing and building data visualizations

III. HOW TO APPROACH CREATING VISUALIZATIONS

Page 33: An introduction to designing and building data visualizations

A framework to get started…

Who’s the audience?

What’s the task?

What’s the data?

What’s the best visual display?

Page 34: An introduction to designing and building data visualizations

What do these charts have in common?

Scatter plot Matrix chart Network diagram

They show a relationship between points.

Page 35: An introduction to designing and building data visualizations

What do these charts have in common?

Bar Chart Block Histogram

Bubble Chart

They compare a set of values.

Page 36: An introduction to designing and building data visualizations

What do these charts have in common?

Line Graph Stacked Line/Area Graph

They track rises and falls over time.

Page 37: An introduction to designing and building data visualizations

What do these charts have in common?

Pie Chart Treemaps

They show parts of a whole.

Page 38: An introduction to designing and building data visualizations

What do these charts have in common?

Phrase Nets Word Clouds Word Trees

Page 39: An introduction to designing and building data visualizations

Edward Tufte: On exploring forms of display

http://www.youtube.com/watch?v=Th_1azZA2OY&noredirect=1

Page 40: An introduction to designing and building data visualizations

After we select our display, we need to apply effective design principles.

Let’s test our knowledge with a graph IQ test.

Page 41: An introduction to designing and building data visualizations

Graph Design IQ Test

This test will ask you 10 questions to determine how well you understand the principles of good table and graph design. Good luck!

Page 42: An introduction to designing and building data visualizations

1: Which graph makes it easier to determine whether Mid-Cap U.S. Stock or Small-Cap U.S. Stock has the greater share?

[Pie chart: Investment Portfolio Breakdown. Segments: International Stock, Large-Cap U.S. Stock, Bonds, Real Estate, Mid-Cap U.S. Stock, Small-Cap U.S. Stock, Commodities.]

Page 43: An introduction to designing and building data visualizations

1: Which graph makes it easier to determine whether Mid-Cap U.S. Stock or Small-Cap U.S. Stock has the greater share?

[Bar graph: Investment Portfolio Breakdown. Bars: International Stock, Large-Cap U.S. Stock, Bonds, Real Estate, Mid-Cap U.S. Stock, Small-Cap U.S. Stock, Commodities. Axis from 0% to 20% in steps of 4%.]

Page 44: An introduction to designing and building data visualizations

1: Which graph makes it easier to determine whether Mid-Cap U.S. Stock or Small-Cap U.S. Stock has the greater share?

A. Pie Chart
B. Bar Graph

Page 45: An introduction to designing and building data visualizations

2: Which of these line graphs is easier to read?

[2-D Line Graph: Company Sales, in millions of USD (axis from 0 to 60), by month, Jan to Dec.]

Page 46: An introduction to designing and building data visualizations

2: Which of these line graphs is easier to read?

[3-D Line Graph: Company Sales, in millions of USD (axis from 10 to 60), by month, Jan to Dec.]

Page 47: An introduction to designing and building data visualizations

2: Which of these line graphs is easier to read?

A. 2-D Line Graph
B. 3-D Line Graph

Page 48: An introduction to designing and building data visualizations

3: Which of these two tables is easier to read?

Table A

Sales Summary by Region
1st Quarter, 2007
Regions are Sorted by Revenue

Region           Revenue        % of Total Revenue   Expenses      Profit         % of Total Profit
Europe           $75,904,604    31.06%               $40,988,486   $34,916,117    22.31%
Canada           $51,572,694    21.10%               $17,534,715   $34,037,978    21.75%
Western US       $42,660,178    17.46%               $11,944,849   $30,715,328    19.63%
Eastern US       $33,977,385    13.90%               $7,135,150    $26,842,134    17.15%
Central US       $26,139,598    10.70%               $3,920,939    $22,218,658    14.20%
Asia             $14,135,278    5.78%                $6,360,875    $7,774,402     4.97%
Total (or Avg)   $244,389,737   100.00%              $87,885,117   $156,504,619   100.00%

Page 49: An introduction to designing and building data visualizations

3: Which of these two tables is easier to read?

Table B

Sales Summary by Region (USD)
1st Quarter, 2007
Regions are Sorted by Revenue

Region           Revenue        % of Total Revenue   Expenses      Profit         % of Total Profit
Europe           $75,904,604    31.06%               $40,988,486   $34,916,117    22.31%
Canada           $51,572,694    21.10%               $17,534,715   $34,037,978    21.75%
Western US       $42,660,178    17.46%               $11,944,849   $30,715,328    19.63%
Eastern US       $33,977,385    13.90%               $7,135,150    $26,842,134    17.15%
Central US       $26,139,598    10.70%               $3,920,939    $22,218,658    14.20%
Asia             $14,135,278    5.78%                $6,360,875    $7,774,402     4.97%
Total (or Avg)   $244,389,737   100.00%              $87,885,117   $156,504,619   100.00%

Page 50: An introduction to designing and building data visualizations

3: Which of these two tables is easier to read?

A. Table A
B. Table B

Page 51: An introduction to designing and building data visualizations

4: Which graph makes it easier to focus on the pattern of change through time, instead of the individual values?

[Bar Graph: 2006 Web Traffic, in millions (axis from 0.0 to 3.0), by month, Jan to Dec. Series: Unique Visitors, Page Views.]

Page 52: An introduction to designing and building data visualizations

4: Which graph makes it easier to focus on the pattern of change through time, instead of the individual values?

[Line Graph: 2006 Web Traffic, in millions (axis from 0.0 to 3.0), by month, Jan to Dec.]

Page 53: An introduction to designing and building data visualizations

4: Which graph makes it easier to focus on the pattern of change through time, instead of the individual values?

A. Bar Graph
B. Line Graph

Page 54: An introduction to designing and building data visualizations

5: Only one of these graphs accurately encodes the values. The other skews the values in a misleading manner. Which graph presents the data accurately?

[Graph A: Numbers of Shareholders by response (Yes, No, Undecided). Axis from 2,000 to 2,500.]

Page 55: An introduction to designing and building data visualizations

5: Only one of these graphs accurately encodes the values. The other skews the values in a misleading manner. Which graph presents the data accurately?

[Graph B: Numbers of Shareholders by response (Yes, No, Undecided). Axis from 0 to 2,500.]

Page 56: An introduction to designing and building data visualizations

5: Only one of these graphs accurately encodes the values. The other skews the values in a misleading manner. Which graph presents the data accurately?

A. Graph A
B. Graph B

Page 57: An introduction to designing and building data visualizations

6: Which map makes it easier to find all of the counties with positive growth rates?

[Map A: 2006 Growth Rate by County. Color scale from -3% through 0% to +3%.]

Page 58: An introduction to designing and building data visualizations

6: Which map makes it easier to find all of the counties with positive growth rates?

[Map B: 2006 Growth Rate by County. Color scale from -3% through 0% to +3%.]

Page 59: An introduction to designing and building data visualizations

6: Which map makes it easier to find all of the counties with positive growth rates?

A. Map A
B. Map B

Page 60: An introduction to designing and building data visualizations

7: Which graph makes it easier to determine R&D’s travel expense?

[3D Bar Graph: 2006 Expenses by Department, in USD (axis from 0 to 70). Expense types: Payroll, Equipment, Travel, Supplies, Software, Misc. Departments: R&D, Sales, Management, Accounting.]

Page 61: An introduction to designing and building data visualizations

7: Which graph makes it easier to determine R&D’s travel expense?

[2D Bar Graph: 2006 Expenses by Department. One panel per department (R&D, Sales, Management, Accounting), each with an axis from 0 to 80. Expense types: Payroll, Equipment, Travel, Supplies, Software, Misc.]

Page 62: An introduction to designing and building data visualizations

7: Which graph makes it easier to determine R&D’s travel expense?

A. 3D Bar Graph (left)
B. 2D Bar Graph (below)

Page 63: An introduction to designing and building data visualizations

8: In which graph are the labels easier to read?

[Graph A: 2006 Marketing Expenditures By Country, in thousands of USD (axis from 0 to 7,000). Countries: United States, Canada, United Kingdom, Japan, France, Germany, Mexico, China.]

Page 64: An introduction to designing and building data visualizations

8: In which graph are the labels easier to read?

[Graph B: 2006 Marketing Expenditures By Country, in thousands of USD (axis from 0 to 7,000). Countries: United States, Canada, United Kingdom, Japan, France, Germany, Mexico, China.]

Page 65: An introduction to designing and building data visualizations

8: In which graph are the labels easier to read?

A. Graph A
B. Graph B

Page 66: An introduction to designing and building data visualizations

9: Which graph is easier to look at?

[Graph A: Median Employee Salary by Department and State, USD in thousands (axis from 0 to 100). Departments: Human Resources, Accounting, Management, Sales, Manufacturing. States: Nebraska, Oklahoma, Kansas.]

Page 67: An introduction to designing and building data visualizations

9: Which graph is easier to look at?

[Graph B: Median Employee Salary by Department and State, USD in thousands (axis from 0 to 100). Departments: Human Resources, Accounting, Management, Sales, Manufacturing. States: Nebraska, Oklahoma, Kansas.]

Page 68: An introduction to designing and building data visualizations

9: Which graph is easier to look at?

A. Graph A
B. Graph B

Page 69: An introduction to designing and building data visualizations

10: Which table allows you to see the areas of poor performance more quickly?

Table A

2006 Key Metrics

Region    Overall   Revenue       Expenses     Profit       Avg. Order Size
East      Good      $4,652,462    $2,682,765   $1,969,697   $6,845
West      Fair      3,705,426     2,211,773    1,493,653    4,266
North     Fair      3,215,789     2,712,984    502,805      4,568
South     Poor      2,215,752     1,562,735    653,017      1,358
Overall   Fair      $13,789,429   $9,170,257   $4,619,172   $4,259

Page 70: An introduction to designing and building data visualizations

10: Which table allows you to see the areas of poor performance more quickly?

Table B

2006 Key Metrics

Region    Overall   Revenue       Expenses     Profit       Avg. Order Size
East      Good      $4,652,462    $2,682,765   $1,969,697   $6,845
West      Fair      3,705,426     2,211,773    1,493,653    4,266
North     Fair      3,215,789     2,712,984    502,805      4,568
South     Poor      2,215,752     1,562,735    653,017      1,358
Overall   Fair      $13,789,429   $9,170,257   $4,619,172   $4,259

Page 71: An introduction to designing and building data visualizations

10: Which table allows you to see the areas of poor performance more quickly?

A. Table A
B. Table B

Page 72: An introduction to designing and building data visualizations

Above all else show the data

---Edward Tufte

Page 73: An introduction to designing and building data visualizations

Sometimes decorations can help editorialize about the substance of the graphic. But it’s wrong to distort the data measures—the ink locating values of numbers—in order to make an editorial comment or fit a decorative scheme.

--Edward Tufte

Page 74: An introduction to designing and building data visualizations

Principles
• Chartjunk
• Data-ink ratio
• Data integrity
– Lie Factor
• Data Richness
• Scales
– Pie chart. Zero point.
• Color
– Color blindness
– Using color sparingly
– Use red for negative earnings
• Attribution

Page 75: An introduction to designing and building data visualizations

Avoid chart junk

Useless, non-informative, or information-obscuring elements of quantitative information displays.

Page 76: An introduction to designing and building data visualizations
Page 77: An introduction to designing and building data visualizations

Chart Junk: Remove grid lines

[Bar chart: Sales by quarter (1st Qtr to 4th Qtr). Axis from 0 to 9.]

Page 78: An introduction to designing and building data visualizations

Chart Junk: Remove the frame around the visual

[Bar chart: 2010 Sales Data (in millions) by quarter (1st Qtr to 4th Qtr). Axis from 0 to 9.]

Page 79: An introduction to designing and building data visualizations

Chart Junk: Consider if tick marks are necessary

[Bar chart: 2010 Sales Data (in millions) by quarter (1st Qtr to 4th Qtr). Axis from 0 to 9.]

Page 80: An introduction to designing and building data visualizations

Tables and Charts: Remove Gridlines

2010 Forecast vs. Performance (U.S. $)

         Forecast   Performance
Qtr 1    $85,000    $95,000
Qtr 2    $80,000    $75,000
Qtr 3    $75,000    $65,000
Qtr 4    $60,000    $60,000
Total    $300,000   $295,000

Page 81: An introduction to designing and building data visualizations

Tables and Charts: Remove Gridlines

2010 Forecast vs. Performance (U.S. $)

         Forecast   Performance
Qtr 1    $85,000    $95,000
Qtr 2    $80,000    $75,000
Qtr 3    $75,000    $65,000
Qtr 4    $60,000    $60,000
Total    $300,000   $295,000

Page 82: An introduction to designing and building data visualizations

Data Ink Ratio

Reduce the amount of “ink” used to represent the data.

Page 83: An introduction to designing and building data visualizations

Data Ink Ratio: Too many bars to represent a single data point

Page 84: An introduction to designing and building data visualizations

Data Ink Ratio: Consider bin size.

Page 85: An introduction to designing and building data visualizations

Data Ink Ratio: Would an area chart work better?

Page 86: An introduction to designing and building data visualizations

Data Ink Ratio: Or a line Chart?

Page 87: An introduction to designing and building data visualizations

Data Integrity: Lie Factor

Lie Factor = (size of effect shown in graphic) / (size of effect in data)
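The ratio above can be checked with a few lines of Python. This is a minimal sketch, not taken from the slides: the function names are made up for illustration, and "size of effect" is assumed to mean the change relative to the starting value. The inputs are the figures from Tufte's well-known fuel-economy example (18 to 27.5 miles per gallon, drawn as lines 0.6 and 5.3 inches long), which is assumed to be the graphic behind the 14.8 quoted on the next slide.

```python
def relative_change(first: float, second: float) -> float:
    """Size of an effect, taken as the change relative to the starting value."""
    return (second - first) / first


def lie_factor(graphic_first: float, graphic_second: float,
               data_first: float, data_second: float) -> float:
    """Effect shown in the graphic divided by the effect present in the data."""
    shown = relative_change(graphic_first, graphic_second)
    actual = relative_change(data_first, data_second)
    return shown / actual


# Assumed inputs: line lengths of 0.6 and 5.3 inches drawn for
# data values of 18 and 27.5 miles per gallon.
print(round(lie_factor(0.6, 5.3, 18.0, 27.5), 1))  # prints 14.8
```

A value near 1 means the graphic is proportional to the data; the further it is from 1, the more the drawing overstates or understates the effect.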

Page 88: An introduction to designing and building data visualizations

Data Integrity: Lie Factor = 14.8

Page 89: An introduction to designing and building data visualizations

Data Integrity: Decorate data without lying

Page 90: An introduction to designing and building data visualizations

Data Integrity: Does a change in perspective help tell your story?

[Chart: 2010 Sales Data (in millions) by quarter (1st Qtr to 4th Qtr). Axis from 0 to 9.]

Page 91: An introduction to designing and building data visualizations

Data Integrity: Ensure a zero point scale
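To see why the zero point matters, the short sketch below compares how long two bars are drawn relative to each other when the axis starts at zero and when it starts higher. The counts and the function name are hypothetical and chosen only for illustration.

```python
def drawn_ratio(a: float, b: float, baseline: float) -> float:
    """Ratio of two bar lengths when the value axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


# Hypothetical counts for two categories.
yes_votes, no_votes = 2400, 2100

# Axis starting at zero: bar lengths are proportional to the data.
print(round(drawn_ratio(yes_votes, no_votes, baseline=0), 2))   # prints 1.14

# Axis starting at 2,000: the first bar is drawn four times as long.
print(drawn_ratio(yes_votes, no_votes, baseline=2000))          # prints 4.0
```

With a truncated axis, a difference of about 14% in the data is drawn as a difference of 300% in bar length.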

Page 92: An introduction to designing and building data visualizations

Proportions

Page 93: An introduction to designing and building data visualizations

Proportions

[Pie chart: Sales by quarter (1st Qtr, 2nd Qtr, 3rd Qtr, 4th Qtr).]

Page 94: An introduction to designing and building data visualizations

Proportions

Page 95: An introduction to designing and building data visualizations

Proportions: What else is wrong?

[Pie chart: Sales by quarter. Segment labels: 8.2 (1st Qtr), 3.2 (2nd Qtr), 1.4 (3rd Qtr), 1.2 (4th Qtr).]

Page 96: An introduction to designing and building data visualizations

Doesn’t add up to 1 or 100%

Page 97: An introduction to designing and building data visualizations

Better. What’s still wrong?

[Pie chart: Sales by quarter. Segment labels: 0.586 (1st Qtr), 0.229 (2nd Qtr), 0.100 (3rd Qtr), 0.086 (4th Qtr).]
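The step from raw values to shares of the whole can be sketched as follows. The four quarterly values are the pie labels from the earlier slide; the function name is hypothetical, and printing the shares as percentages is only one possible labeling choice, not necessarily the fix the slides have in mind.

```python
# Quarterly values as labeled on the earlier pie chart.
sales = {"1st Qtr": 8.2, "2nd Qtr": 3.2, "3rd Qtr": 1.4, "4th Qtr": 1.2}


def to_shares(values: dict) -> dict:
    """Divide each value by the total so that the results sum to 1."""
    total = sum(values.values())
    return {label: value / total for label, value in values.items()}


shares = to_shares(sales)
for label, share in shares.items():
    # One decimal place is enough for pie-chart labels.
    print(f"{label}: {share:.1%}")
```

Rounded to three decimals, the shares are 0.586, 0.229, 0.100, and 0.086, matching the labels on this slide.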

Page 98: An introduction to designing and building data visualizations

[Bar chart: Sales performance compared to forecasted sales, 2010, in U.S. $ (axis from $0 to $200,000 in steps of $20,000), by quarter (Qtr 1 to Qtr 4). Series: Performance, Forecast.]

Page 99: An introduction to designing and building data visualizations

Data Richness

Rich data means quality data – accurate data from reputable sources plus effective filtering of data for the audience.

Wong, 2010, p. 28

Page 100: An introduction to designing and building data visualizations

Data Richness. Tell the whole story with an excerpt

Wong, 2010, p. 29

This Year Last Year

Page 101: An introduction to designing and building data visualizations

Data Richness. However, don’t be misleading….

Wong, 2010, p. 29

This Year Last Year

Page 102: An introduction to designing and building data visualizations

Data Quantity != Data Richness

Wong, 2010, p. 29

Inconclusive

An upward trend

Page 103: An introduction to designing and building data visualizations

Color

• Minimize the use of color
• Use shading instead
– From lightest to darkest (no zebra pattern)
• Consider using red for negative earnings.
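One way to get shading that runs from lightest to darkest is to generate evenly spaced grays. The sketch below is illustrative only: the function name and the default light and dark levels are assumptions, not values from the slides.

```python
def gray_ramp(n: int, lightest: int = 224, darkest: int = 64) -> list:
    """Return n hex gray colors, evenly spaced from lightest to darkest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        levels = [darkest]
    else:
        step = (darkest - lightest) / (n - 1)
        levels = [round(lightest + i * step) for i in range(n)]
    # A gray has the same value in the red, green, and blue channels.
    return [f"#{v:02x}{v:02x}{v:02x}" for v in levels]


print(gray_ramp(3))  # prints ['#e0e0e0', '#909090', '#404040']
```

Assigning these shades in order to ranked categories avoids the alternating light and dark "zebra" pattern the slide warns against.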

Page 104: An introduction to designing and building data visualizations

Color. Some people are color blind

Page 105: An introduction to designing and building data visualizations

Labeling and Attribution
• Explain encodings.
• The design of every graph has a similar flow. You get the data; encode it with circles, bars, and colors; and then you let others read it.
• The readers have to decode your encodings at this point.
• Describe what the circles, bars, and colors represent.
• Label directly on the data instead of, or in addition to, using a legend.
• Cite your data source.

Source: Wong (2010); Yau (2011), p. 13

Page 106: An introduction to designing and building data visualizations

IV. TRY IT. APPLY IT.

Page 107: An introduction to designing and building data visualizations

Which MBA?

Page 108: An introduction to designing and building data visualizations
Page 109: An introduction to designing and building data visualizations

Let’s try to create something similar

Page 110: An introduction to designing and building data visualizations

Run mbarankpart1.py

[Two charts: Default, Sorted]

Page 111: An introduction to designing and building data visualizations

01 – Bigger Figure Size

Page 112: An introduction to designing and building data visualizations

02 - Removing gray background and frame

Page 113: An introduction to designing and building data visualizations

03 - Make room for others… Remove frame

Page 114: An introduction to designing and building data visualizations

04 – Iterate to remove tick marks

Page 115: An introduction to designing and building data visualizations

05 – Bar height and bar color and edge
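The refinement steps 01 to 05 can be sketched in one short script. This is not the mbarankpart1.py script, which is not included in the transcript. It assumes Matplotlib is the plotting library (the later slides mention savefig and show()), and the school names and scores are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical schools and scores, not data from the slides.
schools = ["School A", "School B", "School C", "School D"]
scores = [92, 85, 78, 64]

# Sort ascending so the longest bar ends up at the top of the chart.
pairs = sorted(zip(scores, schools))
sorted_scores = [score for score, _ in pairs]
sorted_names = [name for _, name in pairs]

# 01: bigger figure size (width and height in inches)
fig, ax = plt.subplots(figsize=(8, 5))

# 05: bar height, fill color, and no edge line
ax.barh(sorted_names, sorted_scores, height=0.6,
        color="steelblue", edgecolor="none")

# 02: white background instead of gray
fig.patch.set_facecolor("white")
ax.set_facecolor("white")

# 03: remove the frame around the plot area
for spine in ax.spines.values():
    spine.set_visible(False)

# 04: remove tick marks but keep the tick labels
ax.tick_params(axis="both", length=0)
```

Each numbered comment maps to one slide, so the steps can be applied one at a time and the chart compared after each change.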

Page 116: An introduction to designing and building data visualizations

mbarank_part3.py: Now, plot 2 others and add ranks…

Page 117: An introduction to designing and building data visualizations

Refine your visual display in Adobe Illustrator

Page 118: An introduction to designing and building data visualizations

Tips for saving your image file

• If you are going to modify the image in Illustrator, save your file as a PDF from Python:
– Use savefig(filename), for example savefig("mbarankings.pdf")
– Or save from the interactive window that the function show() launches in IPython
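A minimal sketch of the first option, assuming Matplotlib is the plotting library and using hypothetical data. The file name is the one from the slide; the file extension determines the output format.

```python
import matplotlib
matplotlib.use("Agg")  # no interactive window needed just to save a file
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.barh(["School A", "School B"], [64, 92])  # hypothetical data

# The .pdf extension tells Matplotlib to write a vector PDF,
# which keeps bars and text editable in Illustrator.
fig.savefig("mbarankings.pdf")
```

The file name must be passed as a quoted string; an unquoted mbarankings.pdf would be read as an attribute lookup on an undefined name.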

Page 119: An introduction to designing and building data visualizations

Working in Adobe Illustrator
1. Open the PDF document in Adobe Illustrator.
2. If you don’t see the Tools window, go to the Window menu and click Tools to turn it on.
3. The black arrow is called the Selection tool. Select it, and your mouse pointer becomes a black arrow.
4. Click and drag it over the border. The border appears highlighted. This is known as a clipping mask.
5. Press Delete on your keyboard to get rid of it.
6. If this deletes the graphic, undo the edit, and use the Direct Selection tool, which is represented by a white arrow, to highlight the clipping mask instead.
7. Use the Selection tool to change fonts, change colors, add text, etc.

Page 120: An introduction to designing and building data visualizations

One Mistake: Don’t Average Percentages

• You must go back to the original data source to recalculate the new percentage.
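A small worked example shows why. The counts below are hypothetical and chosen so that the two groups differ greatly in size, which is the case where averaging the percentages goes furthest wrong.

```python
# Hypothetical data: (conversions, visitors) for two regions.
regions = {"North": (50, 100), "South": (10, 1000)}

# Per-region rates: 50% and 1%.
rates = {name: hits / total for name, (hits, total) in regions.items()}

# Averaging the two percentages treats both regions as equally large.
naive_average = sum(rates.values()) / len(rates)

# Going back to the original counts weights each region by its size.
total_hits = sum(hits for hits, _ in regions.values())
total_visitors = sum(total for _, total in regions.values())
combined_rate = total_hits / total_visitors

print(f"Average of percentages: {naive_average:.1%}")  # prints 25.5%
print(f"Recalculated from counts: {combined_rate:.1%}")  # prints 5.5%
```

The recalculated rate is 60 conversions out of 1,100 visitors, far below the 25.5% obtained by averaging the two percentages.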

Page 121: An introduction to designing and building data visualizations

RESOURCES

Page 122: An introduction to designing and building data visualizations

Edward Tufte

Page 123: An introduction to designing and building data visualizations

Nathan Yau and Dona Wong

Page 124: An introduction to designing and building data visualizations

Casey Reas and Ben Fry

Page 125: An introduction to designing and building data visualizations

Stephen Few

Page 126: An introduction to designing and building data visualizations

Colin Ware and Richard Mayer

Page 127: An introduction to designing and building data visualizations

Seth Godin and Andy Goodman