How Humans See Data
John Rauser @jrauser
November 2016
visualization is communication
how to make better visualizations
help humans solve analytical problems quickly and accurately with visualization
Part I: Why visualize data at all?
x      y      x      y
1.972  1.236  0.111  0.542
1.112  1.994  0.902  0.005
0.000  1.009  0.598  0.085
0.665  1.942  1.613  1.790
0.235  0.356  1.298  1.955
0.247  1.658  0.651  1.937
1.275  1.961  1.949  1.316
0.702  0.045  0.099  0.567
1.760  0.350  0.862  0.010
1.691  0.277  0.027  0.768
1.628  1.778  0.706  1.956
1.957  1.290  1.042  1.999
pre-attentive processing
A graph is an encoding of the data.
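To make "a graph is an encoding of the data" concrete: a scatterplot encodes each (x, y) pair as a position, usually through a linear map from data units to screen pixels. A minimal Python sketch, assuming a 400 by 400 pixel plot area and a data range of 0 to 2 on both axes (neither comes from the talk); `linear_scale` is a hypothetical helper, not a ggplot2 function.

```python
def linear_scale(domain, pixel_range):
    """Return a function that maps data values linearly onto a pixel range.

    domain: (d_min, d_max) in data units; pixel_range: (p_min, p_max) in pixels.
    """
    d_min, d_max = domain
    p_min, p_max = pixel_range
    span = d_max - d_min
    if span == 0:
        raise ValueError("domain must have nonzero width")

    def encode(value):
        # How far through the domain the value lies (0.0 to 1.0 inside it) ...
        t = (value - d_min) / span
        # ... becomes the same fraction of the way through the pixel range.
        return p_min + t * (p_max - p_min)

    return encode


# First three (x, y) pairs from the data table; both variables fall in [0, 2].
points = [(1.972, 1.236), (1.112, 1.994), (0.000, 1.009)]

x_to_px = linear_scale((0.0, 2.0), (0.0, 400.0))
# Screen y grows downward, so the pixel range is flipped for the y axis.
y_to_px = linear_scale((0.0, 2.0), (400.0, 0.0))

for x, y in points:
    print(f"({x:.3f}, {y:.3f}) -> ({x_to_px(x):.1f}px, {y_to_px(y):.1f}px)")
```

The numbers in the table and the dots on the plot carry the same information; the encoding only changes which part of the visual system has to decode it.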
 n  x      y       n  x      y
 1  1.972  1.236  13  0.111  0.542
 2  1.112  1.994  14  0.902  0.005
 3  0.000  1.009  15  0.598  0.085
 4  0.665  1.942  16  1.613  1.790
 5  0.235  0.356  17  1.298  1.955
 6  0.247  1.658  18  0.651  1.937
 7  1.275  1.961  19  1.949  1.316
 8  0.702  0.045  20  0.099  0.567
 9  1.760  0.350  21  0.862  0.010
10  1.691  0.277  22  0.027  0.768
11  1.628  1.778  23  0.706  1.956
12  1.957  1.290  24  1.042  1.999
Good visualizations optimize for the human visual system.
How does the human visual system work?
How does the human visual system decode a graph?
Cleveland’s three visual operations of pattern perception:
1. Detection
2. Assembly
3. Estimation
Part II: estimation
Three levels of estimation
a. discrimination: X = Y or X ≠ Y
b. ranking: X > Y or X < Y
c. ratioing: X / Y = ?
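The three levels are nested: a reader who can estimate a ratio can also rank, and one who can rank can also discriminate. A small Python sketch of what each level asks of two values X and Y; the function `estimate` and its dictionary keys are purely illustrative.

```python
def estimate(x, y):
    """Answer the three estimation questions for two positive values x and y."""
    if x == y:
        ranking = "X = Y"
    elif x > y:
        ranking = "X > Y"
    else:
        ranking = "X < Y"
    return {
        "discrimination": "X = Y" if x == y else "X != Y",  # same or different?
        "ranking": ranking,                                  # which is larger?
        "ratioing": x / y,                                   # by what factor?
    }


print(estimate(1.972, 1.236))
```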
At the heart of quantitative reasoning is a single question: Compared to what?
- Tufte, Envisioning Information
the most important thing
The most important measurement should exploit the highest ranked encoding possible.
• Position along a common scale
• Position on identical but nonaligned scales
• Length
• Angle or Slope
• Area
• Volume or Density or Color saturation
• Color hue
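The rule can be read as a greedy assignment: sort measurements by importance, then hand each one the best encoding that is still available. A Python sketch of that reading; `ENCODING_RANKING` simply copies the list above, while `assign_encodings` and the measurement names are hypothetical.

```python
# The encodings listed on the slide, from most to least accurately decoded.
ENCODING_RANKING = [
    "position along a common scale",
    "position on identical but nonaligned scales",
    "length",
    "angle or slope",
    "area",
    "volume or density or color saturation",
    "color hue",
]


def assign_encodings(measurements_by_importance, available=None):
    """Give each measurement (most important first) the best encoding still free.

    `available` optionally restricts which encodings the chart type allows.
    """
    pool = ENCODING_RANKING if available is None else sorted(
        available, key=ENCODING_RANKING.index)
    if len(measurements_by_importance) > len(pool):
        raise ValueError("more measurements than available encodings")
    return dict(zip(measurements_by_importance, pool))


# Hypothetical measurements, most important first.
print(assign_encodings(["revenue", "region"]))
# If the chart type can only offer color hue and length, length wins.
print(assign_encodings(["revenue"], available=["color hue", "length"]))
```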
“The first rule of color: do not talk about color!”
- Tamara Munzner
luminance
saturation
hue
Observation: Alphabetical is almost never the correct ordering of a categorical variable.
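Ordering the categories by the quantity being plotted, rather than by name, puts the ranking directly into position along the axis. In Python the difference is only the sort key; the categories and counts below are made up for illustration.

```python
# Hypothetical counts per category, for illustration only.
counts = {"cherry": 7, "apple": 12, "date": 19, "banana": 31}

alphabetical = sorted(counts)                             # ordered by name
by_value = sorted(counts, key=counts.get, reverse=True)   # ordered by the measurement

print("alphabetical:", alphabetical)
print("by value:    ", by_value)
```

With the second ordering, reading the bars top to bottom is the same as reading the ranking.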
The most important measurement should exploit the highest ranked encoding possible.
• Position along a common scale
• Position on identical but nonaligned scales
• Length
• Angle or Slope
• Area
• Volume or Density or Color saturation
• Color hue
[Figure: plot annotated “11 mpg”]
The most important measurement should exploit the highest ranked encoding possible.
• Position along a common scale
• Position on identical but nonaligned scales
• Length
• Angle or Slope
• Area
• Volume or Density or Color saturation
• Color hue
Observation: Stacked anything is nearly always a mistake.
Stacking makes the reader decode lengths, not position on a common scale.
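The arithmetic of stacking shows why: each segment starts at the running total of the segments beneath it, so only the bottom segment sits on the shared zero line. A Python sketch with made-up values; `stacked_segments` is a hypothetical helper.

```python
from itertools import accumulate


def stacked_segments(values):
    """Return the (bottom, top) extent of each segment in one stacked bar."""
    tops = list(accumulate(values))
    bottoms = [0] + tops[:-1]
    return list(zip(bottoms, tops))


# Two hypothetical stacked bars; the middle segment is 5 units in both.
bar_a = stacked_segments([3, 5, 2])
bar_b = stacked_segments([6, 5, 1])

for i, (seg_a, seg_b) in enumerate(zip(bar_a, bar_b)):
    print(f"segment {i}: A spans {seg_a}, B spans {seg_b}")
# Only segment 0 starts at 0 in both bars, so only it can be compared by position;
# the middle segments (3, 8) and (6, 11) must be compared as floating lengths.
```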
[Figure: plot annotated “11 mpg”]
Observation: Pie charts are ALWAYS a mistake.
Piecharts are the information visualization equivalent of a roofing hammer to the frontal lobe. They have no place in the world of grownups, and occupy the same semiotic space as short pants, a runny nose, and chocolate smeared on one’s face. They are as professional as a pair of assless chaps.
http://blog.codahale.com/2006/04/29/google-analytics-the-goggles-they-do-nothing/
The most important measurement should exploit the highest ranked encoding possible.
• Position along a common scale
• Position on identical but nonaligned scales
• Length
• Angle or Slope
• Area
• Volume or Density or Color saturation
• Color hue
Tables are preferable to graphics for many small data sets. A table is nearly always better than a dumb pie chart; the only thing worse than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies… Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used.
-Edward Tufte, The Visual Display of Quantitative Information
Who do you think did a better job in tonight’s debate?
                   Clinton  Trump
Among Democrats        99%     1%
Among Republicans      53%    47%
[Figure: one panel per country, Afghanistan through Zimbabwe]
All good pie charts are jokes.
Observation: Comparison is trivial on a common scale.
the dashboard metaphor is fundamentally flawed
Observation: Scatterplots show relationships directly.
Observation: Growth charts usually aren’t.
If growth (slope) is important, plot it directly.
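Plotting growth directly means plotting the change between consecutive points, not the running total. A Python sketch of two common versions, absolute and percent change, on invented cumulative totals; which one to plot depends on whether absolute or relative growth matters. Function names are hypothetical.

```python
def period_change(series):
    """Absolute change between consecutive observations (the slope per period)."""
    return [b - a for a, b in zip(series, series[1:])]


def percent_change(series):
    """Relative change between consecutive observations, in percent."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]


# Hypothetical cumulative totals: the line always rises, which looks like steady growth.
totals = [100, 150, 210, 260, 300]
print(period_change(totals))   # [50, 60, 50, 40]: growth peaked, then slowed
print(percent_change(totals))
```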
The most important measurement should exploit the highest ranked encoding possible.
• Position along a common scale
• Position on identical but nonaligned scales
• Length
• Angle or Slope
• Area
• Volume or Density or Color saturation
• Color hue
Cleveland’s three visual operations of pattern perception:
1. Detection
2. Assembly
3. Estimation
Part III: assembly
Gestalt Psychology
reification
emergence
Prägnanz
Law of Closure
Law of Continuity
Observation: Good plots leverage the law of continuity to assist with assembly.
Law of Similarity
Law of Proximity
Observation: Dodged bar charts are a bad idea
Cleveland’s three visual operations of pattern perception:
1. Detection
2. Assembly
3. Estimation
Part IV: detection
Excel’s defaults are pretty bad
[Figure: example chart; x axis 1 to 6, y axis 20,000 to 200,000]
Observation: Detection isn’t as trivial as it seems.
“Above all else, show the data.”
-Tufte
Part V: other useful results
Weber’s law: The “Just Noticeable Difference” is proportional to the size of the initial stimulus.
[Figure: value pairs 10 vs 20 and 100 vs 110; two panels annotated “12 units”]
Observation: Weber’s Law is why gridlines are useful
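Weber’s law says the just noticeable difference ΔI is proportional to the baseline I, so what governs visibility is ΔI / I rather than ΔI. One way to read the gridline observation: a gridline gives the eye a new, nearer baseline, turning a small relative difference into a large one. A Python sketch using the 10 vs 20 and 100 vs 110 values from the slide; the gridline position of 90 and the helper name are assumptions for illustration.

```python
def relative_difference(a, b):
    """Gap between two positive magnitudes, relative to the smaller one.

    Under Weber's law the just noticeable difference grows in proportion to the
    baseline, so this ratio, not the raw gap, governs how easy the gap is to see.
    """
    return abs(b - a) / min(a, b)


print(relative_difference(10, 20))    # 1.0: the same raw gap of 10 ...
print(relative_difference(100, 110))  # 0.1: ... is ten times smaller relative to 100

# With a gridline at 90, the eye can compare just the parts above it: 10 vs 20.
gridline = 90
print(relative_difference(100 - gridline, 110 - gridline))  # 1.0
```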
“Erase non-data ink.”
-Tufte
“Erase non-data ink, within reason.”
-Tufte
“Erase non-data ink that interferes with detection or doesn’t assist assembly and estimation.”
-Rauser
You are best at detecting variation in slope near 45 degrees.
banking to 45
Observation: Banking to 45 best shows variation in slope
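Banking chooses the plot’s aspect ratio so that line segments are drawn, on balance, near 45 degrees. The sketch below implements one variant, setting the median absolute segment slope to 45 degrees; other variants weight segments differently and give somewhat different ratios. The function name, the choice to skip flat segments, and the data are assumptions of this sketch.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Aspect ratio (height / width) that puts the median absolute slope
    of the line segments at 45 degrees on screen.

    Points are taken in drawing order; flat and vertical segments are skipped
    (an assumption of this sketch, since flat ones would push the median to zero).
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    slopes = [abs((y1 - y0) / (x1 - x0))
              for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
              if x1 != x0 and y1 != y0]
    if not slopes or x_range == 0 or y_range == 0:
        raise ValueError("need at least one sloped segment to bank")
    # A data slope s is drawn with screen slope s * (x_range / y_range) * (height / width);
    # setting that to 1 for the median slope and solving gives height / width.
    return y_range / (x_range * median(slopes))


xs = [0, 1, 2, 3]
ys = [0, 1, 3, 10]
aspect = bank_to_45(xs, ys)
print(f"height/width = {aspect:.3f}; for a 600px-wide plot use {600 * aspect:.0f}px height")
```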
Q: Should I include 0 on my scale?
A: It depends.
Relying on the pre-attentive perception of size or intensity? Yes, otherwise you will mislead.
Using position? It’s up to you.
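The reason is arithmetic: when bar length carries the value, the reader perceives the ratio of lengths, and moving the baseline away from zero changes that ratio. With position encodings such as dots or lines, the reader compares locations rather than lengths, which is why the choice is left up to you there. A Python sketch with invented values; `apparent_ratio` is a hypothetical helper.

```python
def apparent_ratio(a, b, baseline):
    """Ratio of two bar lengths when both bars are drawn up from `baseline`."""
    return (b - baseline) / (a - baseline)


a, b = 90, 100
print(b / a)                     # true ratio, about 1.11
print(apparent_ratio(a, b, 0))   # axis includes 0: bar lengths show the true ratio
print(apparent_ratio(a, b, 80))  # axis starts at 80: bars of 10 and 20, ratio 2.0
```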
“Above all else, show the data.”
-Tufte
“Above all else, show the variation in the data.”
-Rauser (via Tufte)
R/ggplot2 code for every plot in this presentation is available at http://goo.gl/xH5PLV
The rendered document is at http://rpubs.com/jrauser/hhsd_notes
This presentation is at http://goo.gl/VKxxya
I will tweet these links as @jrauser
coda
visualization is communication
art is communication
visualization is art
why does it make you feel that way?
visualization has as much to learn from art as from science
end