Data Visualization by David Kretch
Data Visualization
April 3, 2015
• When you should graph
• What you should graph
• Given some data, how would you graph it
When should you graph your data?
Always.
Don’t just make graphs for client reports; graph your data for yourself, so you understand it.
If you use a table in a report, see if you can make it into a graph.
Why graphs?
Because of the environment that humans evolved in, we are much better at getting information from color, size, shape, and position than from reading text.
Find the dangerous creatures!
Why graphs work
• Color
• Size
• Shape
• Position
Why else do people like graphs?
People like cool-looking stuff.
Not cool Cool
What are we currently doing?
• Making lots of tables
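| Group | Mean | 25% | 50% | 75% |
|---|---|---|---|---|
| Bananas | 11.3 | 2.7 | 4.6 | 23.1 |
| Kittens | 4.0 | 0.9 | 3.6 | 7.5 |
| Phones | -3.1 | -11.0 | -2.9 | 2.2 |

| Variable | Parameter Estimate |
|---|---|
| Cuteness | 0.6*** |
| Ability to Fly | 1.4*** |
| Deadliness | 11.2*** |
| Telepathy | -9.8*** |
| Big Ears | -17.3*** |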
What is wrong with tables?
Tables give only a partial picture – means only tell us so much.
Figuring out what’s bigger, and by how much, requires more work.
The information is not necessarily in any order, so we need to read all the numbers.
What kinds of graphs should you make?
• The distribution, instead of giving just the mean, median, etc.
• The relationship between two variables – the conditional distribution
• Estimation results – point estimates and confidence intervals
What to expect out of this presentation
1. Discussion of the type of graph (e.g. distributions)
2. How the type of graph applies to continuous vs. categorical data
3. Extensions (e.g. graphing more than one at a time)
What not to expect: how to do these in any particular software.
Distributions
Distributions – Continuous variables
Make density plots/histograms for continuous variables. These give much more information than means, medians, etc.
Example: two distributions that have the same mean but are dramatically different.
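To make the point concrete, here is a minimal Python sketch (not the R/ggplot2 code behind the slides). The two samples are hypothetical, chosen so that both have a mean of 50; the text histogram is only a crude stand-in for a real plot.

```python
import statistics
from collections import Counter

# Hypothetical samples: both have a mean of 50, but very different shapes.
tight = [48, 49, 50, 50, 50, 51, 52]
spread = [10, 12, 14, 50, 86, 88, 90]

def text_histogram(values, bin_width=10, upper=100):
    """Print one row of '#' marks per bin, a crude stand-in for a histogram."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    for start in range(0, upper, bin_width):
        print(f"{start:2d}-{start + bin_width - 1:2d} {'#' * counts[start]}")

for name, values in [("tight", tight), ("spread", spread)]:
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    print(f"{name}: mean = {mean}, stdev = {stdev:.1f}")
    text_histogram(values)
```

A table that reported only the mean would show these two samples as identical; the histograms show one cluster in the middle versus two clusters at the extremes.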
Density vs. histogram
A density plot is basically a smoothed histogram.
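The relationship can be sketched in plain Python, assuming the density is computed as a Gaussian kernel density estimate (a common choice, though the slides do not say which method was used). The data values, bin edges, and bandwidth below are hypothetical.

```python
import math

def histogram(values, lo, hi, n_bins):
    """Count how many values fall in each of n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # a value equal to hi goes in the last bin
        counts[i] += 1
    return counts

def kernel_density(values, x, bandwidth):
    """Gaussian kernel density estimate at x: the average of bell curves centered on each value."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

# Hypothetical measurements, all assumed to lie between 0 and 6.
values = [1.2, 1.9, 2.1, 2.4, 2.5, 2.8, 3.1, 3.3, 4.0, 5.6]

counts = histogram(values, lo=0, hi=6, n_bins=6)
print("histogram counts:", counts)
for x in range(0, 7):
    print(f"density at {x}: {kernel_density(values, x, bandwidth=0.5):.3f}")
```

The histogram jumps from bin to bin, while the density changes smoothly with x; a larger bandwidth smooths more, much as wider bins do for a histogram.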
Distributions – Categorical variables
Make bar charts for categorical variables.
Tip: if your categories don’t have any inherent order, order them from largest to smallest.
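As an illustration of the ordering tip, this short Python sketch counts hypothetical survey responses and prints the bars from largest to smallest. The category names and counts are made up.

```python
from collections import Counter

# Hypothetical responses; the categories have no inherent order.
responses = ["pizza", "tacos", "pizza", "sushi", "tacos", "pizza", "curry"]

counts = Counter(responses)
ordered = counts.most_common()  # sorted from the largest count to the smallest

for category, n in ordered:
    print(f"{category:<6} {'#' * n} {n}")
```

Sorting before plotting means the reader can see the ranking at a glance instead of comparing bar heights one by one.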
Compare distributions using color
Suppose we want to compare the distribution of income among different occupations. Plot all the distributions, distinguished by color, and use transparency to make them all visible simultaneously.
Highlighting important facts
Add vertical lines to highlight the means.
Relationships
Relationships between variables
If we’re asking, for example, what GDP growth looks like at different levels of government spending, we can show this using a scatterplot.
How to show trends
We can highlight the trend using scatterplot smoothing, which adapts the shape of the trend line to the data.
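The idea behind smoothing can be sketched with a running mean, which is a much cruder relative of the local-regression smoothers (such as LOESS) that plotting tools typically offer. This is an illustrative simplification, not the method used for the slides; the function name and data are hypothetical.

```python
def running_mean_smooth(xs, ys, k=3):
    """For each point, average y over a window of up to k neighbors (by rank in x)."""
    pairs = sorted(zip(xs, ys))
    half = k // 2
    smoothed = []
    for i, (x, _) in enumerate(pairs):
        lo = max(0, i - half)               # the window is truncated at the edges
        hi = min(len(pairs), i + half + 1)
        window = [y for _, y in pairs[lo:hi]]
        smoothed.append((x, sum(window) / len(window)))
    return smoothed

# Hypothetical data: y rises with x and then falls, with some noise.
xs = list(range(1, 11))
ys = [2.0, 3.5, 3.0, 5.0, 4.5, 6.5, 6.0, 6.2, 5.0, 4.8]

trend = running_mean_smooth(xs, ys, k=3)
for x, y in trend:
    print(f"x = {x:2d}  smoothed y = {y:.2f}")
```

Because each smoothed value depends only on nearby points, the trend line can bend where the data bend, which a single straight regression line cannot do.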
How to show multiple groups
We can see if the relationship differs among groups by giving each group a color.
Another use for colors
Suppose we want to come up with rules to identify people’s favorite food based on population density and elevation (bear with me).
Can we see this on a graph?
Graphing relationships with categorical data
With categorical data, you typically can’t use scatterplots because points fall right on top of each other (‘overplotting’). However, we can use jittering, which moves each plotted point by a small random amount so that overlapping points become visible.
Without jittering With jittering
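A minimal Python sketch of jittering, assuming uniform random noise (plotting tools may use a different noise distribution or default amount). The `jitter` helper, its `amount` parameter, and the ratings data are hypothetical.

```python
import random

def jitter(values, amount=0.2, seed=0):
    """Return a copy of values with uniform noise in [-amount, amount] added to each."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]

# Hypothetical survey data: two categorical ratings on a 1-3 scale.
xs = [1, 1, 1, 2, 2, 3, 3, 3, 3]
ys = [2, 2, 2, 1, 1, 3, 3, 3, 3]

plain = set(zip(xs, ys))
jittered = set(zip(jitter(xs, seed=1), jitter(ys, seed=2)))

print("points in the data:", len(xs))
print("distinct positions without jittering:", len(plain))
print("distinct positions with jittering:", len(jittered))
```

Without jittering, nine observations collapse onto three visible dots. Keep the amount small relative to the gap between categories so that a jittered point is never mistaken for a neighboring category.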
Graphing relationships with categorical data
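The next step beyond jittering is to use a boxplot, which shows
– the median (the line inside the box),
– the 25th and 75th percentiles (the edges of the box),
– whiskers that extend to the most extreme points within 1.5 times the inter-quartile range (IQR) of the box, and
– outliers beyond the whiskers (plotted as points).
Figure labels: median; 75th percentile; 75th percentile + 1.5 × IQR; outlier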
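The numbers a boxplot draws can be computed directly. This Python sketch follows the common convention of a median line, quartile box edges, and whiskers reaching the most extreme points within 1.5 × IQR of the box; the exact quartile method varies between tools, and the `boxplot_stats` helper and the data are hypothetical.

```python
import statistics

def boxplot_stats(values):
    """Return the quantities a conventional boxplot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # most extreme points still inside the fences
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }

# Hypothetical data with one extreme value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

stats = boxplot_stats(values)
for key, value in stats.items():
    print(f"{key}: {value}")
print("mean, for comparison:", statistics.mean(values))
```

Here the single extreme value pulls the mean well above the median, which is one reason the box is built from the median and quartiles rather than the mean.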
Looping back
A boxplot isn’t, after all, all that different from the multi-colored density plot we showed earlier. Which is better depends on what you’re trying to show.
Use a log scale if your data spans a wide range
Let’s say you have a large range of values, but most of your data is concentrated in one part of the range.
It’s easier to see what’s going on when we use a log scale.
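To see why this helps, the following Python sketch computes where each value would sit along an axis, as a fraction of the axis length, on a linear scale and on a base-10 log scale. The income figures and the `position` helper are hypothetical.

```python
import math

# Hypothetical incomes: most are clustered at the low end, one is very large.
incomes = [12_000, 18_000, 25_000, 31_000, 40_000, 52_000, 75_000, 9_000_000]

def position(value, lo, hi, log=False):
    """Fraction of the axis length (0 to 1) at which value is drawn."""
    if log:
        value, lo, hi = math.log10(value), math.log10(lo), math.log10(hi)
    return (value - lo) / (hi - lo)

lo, hi = min(incomes), max(incomes)
for income in incomes:
    linear = position(income, lo, hi)
    logged = position(income, lo, hi, log=True)
    print(f"{income:>9,}  linear: {linear:.3f}  log: {logged:.3f}")
```

On the linear axis, every value except the largest is squeezed into the first 1% of the axis; on the log axis the same values spread over more than a quarter of it. A log scale only works for strictly positive values.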
Estimation results
Graphing estimation results
We make a lot of regression tables, but we can make them easier to understand by putting them into graphs.
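A coefficient graph needs a point estimate and an interval for each variable. The Python sketch below uses the point estimates from the regression table shown earlier in the slides; the standard errors are hypothetical (the slides do not report them), and the interval uses the usual normal approximation of estimate ± 1.96 × standard error.

```python
Z_95 = 1.96  # normal-approximation multiplier for a 95% interval

# name: (point estimate from the slides, HYPOTHETICAL standard error)
results = {
    "Cuteness": (0.6, 0.1),
    "Ability to Fly": (1.4, 0.3),
    "Deadliness": (11.2, 2.0),
    "Telepathy": (-9.8, 1.5),
    "Big Ears": (-17.3, 3.0),
}

def confidence_interval(estimate, std_error, z=Z_95):
    """Return (low, high) bounds of the interval around estimate."""
    return estimate - z * std_error, estimate + z * std_error

intervals = {name: confidence_interval(est, se) for name, (est, se) in results.items()}

# List from the largest estimate to the smallest, as the rows of a coefficient plot.
for name in sorted(results, key=lambda n: results[n][0], reverse=True):
    low, high = intervals[name]
    print(f"{name:<15} {results[name][0]:6.1f}  [{low:7.2f}, {high:7.2f}]")
```

Plotting each estimate as a point with its interval as a line shows at a glance which effects are large, which are precise, and which intervals include zero.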
ggplot(df, aes(population_density, elevation, color = favorite_food)) + geom_point()
• df: dataset
• population_density: x variable
• elevation: y variable
• color = favorite_food: color variable
• geom_point(): make scatterplot
All graphs made in R and ggplot2
Data Visualization Checklist
• Always graph
• Use color, size, shape, and position
• Three important types of graph:
– Distribution
– Relationship
– Estimation results
• Highlight important facts
• Make it cool-looking