Page 1: Session 42: Visualization: A Picture Speaks a Thousand Words · 2018 Predictive Analytics Symposium . Session 42: Visualization: A Picture Speaks a Thousand Words . SOA Antitrust

2018 Predictive Analytics Symposium

Session 42: Visualization: A Picture Speaks a Thousand Words

SOA Antitrust Compliance Guidelines SOA Presentation Disclaimer


Telling Your Data Story

MARY PAT CAMPBELL, FSA, MAAA, PRM

VP, Insurance Research, Conning

21 September 2018


https://en.wikipedia.org/wiki/Charles_Joseph_Minard


The Why of Data Visualization. https://www.soa.org/News-and-Publications/Newsletters/Compact/2016/march/The-Why-of-Data-Visualization.aspx

Evaluate Your Visualization

Completeness: from "No relevant data" to "All relevant data"

Perceptibility: from "Unclear and difficult" to "Clear and easy"

Intuitiveness: from "Unfamiliar; hard to understand" to "Familiar; easy to understand"

Source: http://www.perceptualedge.com/articles/visual_business_intelligence/data_visualization_effectiveness_profile.pdf


What is Your Story?

Distribution

Change over time

Correlation or Relationship

Comparison between items (ranking)

Comparison over space (maps)

Parts of a whole

Things to Try to Improve Readability

Remove: gridlines; the legend (replace it with data labels)

Instead: add explanatory text; highlight key elements; use multiples of the same graph
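A minimal base-R sketch of the first two suggestions, on made-up series (`years`, `series`, and the helper `label_positions()` are hypothetical): base graphics draws no gridlines by default, and each line is labeled at its last point in place of a legend.

```r
# Hypothetical data: three series observed over five years (illustrative only)
years <- 2012:2016
series <- list(
  Alpha = c(10, 12, 13, 15, 18),
  Beta  = c(8, 9, 11, 10, 12),
  Gamma = c(5, 6, 6, 7, 9)
)

# Hypothetical helper: place each label at the last point of its series
label_positions <- function(x, series) {
  data.frame(
    name = names(series),
    x = max(x),
    y = vapply(series, function(s) s[length(s)], numeric(1)),
    row.names = NULL
  )
}
label_df <- label_positions(years, series)

# No gridlines; widen the x range so the labels fit to the right of the lines
plot(NA, xlim = c(min(years), max(years) + 1), ylim = range(unlist(series)),
     xlab = "Year", ylab = "Value", bty = "l")
for (s in series) lines(years, s)

# Direct labels take the place of a legend
text(label_df$x, label_df$y, labels = label_df$name, pos = 4)
```

Labeling at the line ends keeps the reader's eye on the data instead of sending it back and forth to a legend box.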


Some Data Stories


Data Set 1: Modeled Income Percentiles

Data source: http://go.epi.org/unequalstates2018data

Report: Sommeiller, Estelle and Price, Mark. “The New Gilded Age”. Economic Policy Institute. 19 July 2018. https://www.epi.org/publication/the-new-gilded-age-income-inequality-in-the-u-s-by-state-metropolitan-area-and-county/


Source: “See How Much the Top 1% Earn in Every State”, 30 Aug 2018 https://howmuch.net/articles/average-annual-income-of-the-top-1-percent

[Figure: The Long Tail of High Income. By state, the 99th percentile income and the average income of the top 1% (scale $0 to $3,000,000); labeled states: Connecticut (#1), New York (#2), Massachusetts (#3), Wyoming (#4), District of Columbia (#5).]


Average Income of Top 1% Taxpayers

[Figure: Higher Percentile, Longer Tail. Scatterplot with one circle per state (including the District of Columbia) of the 99th percentile income (x axis, $0.2 to $0.8 million) against the average income of top 1% taxpayers less the 99th percentile income (y axis, $0.0 to $3.0 million); circle size scales by number of taxpayers; trend line R² = 0.7231.]
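For a single predictor, the R² shown with a linear trend line is the squared Pearson correlation between the two variables. The sketch below checks this in base R on made-up values (`p99` and `tail_len` are hypothetical, not the EPI data) and assumes an ordinary, unweighted least-squares fit, i.e. that circle size plays no part in the trend line.

```r
# Hypothetical state-level values, $ in millions (illustrative, not the EPI data)
p99      <- c(0.30, 0.35, 0.42, 0.48, 0.55, 0.61, 0.70)  # 99th percentile income
tail_len <- c(0.45, 0.60, 0.80, 0.95, 1.30, 1.50, 2.10)  # top-1% average less 99th percentile

fit <- lm(tail_len ~ p99)  # ordinary (unweighted) least-squares trend line

r2_model  <- summary(fit)$r.squared                      # R^2 reported for the fit
r2_cor    <- cor(p99, tail_len)^2                        # squared correlation
r2_manual <- 1 - sum(residuals(fit)^2) /
                 sum((tail_len - mean(tail_len))^2)      # 1 - SSE / SST

print(round(c(model = r2_model, squared_cor = r2_cor, manual = r2_manual), 4))
```

All three numbers agree, which is why a high R² here says only that the straight line fits well, not that one variable drives the other.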

[Figure: Low Correlation Between Population and Income Tail Length. Scatterplot with one point per state of the number of taxpayers in the 1% (x axis, 0 to 200,000) against the percent difference between the average income of the top 1% and the 99th percentile (y axis, 100% to 400%); trend line R² = 0.1053.]

[Figure: Geographic Outliers of Top Income. Scatterplot of the number of taxpayers in the 1% (x axis, 0 to 200,000) against the average income of top 1% taxpayers (y axis, $0.0 to $2.5 million); labeled: California, Connecticut, District of Columbia, Florida, Illinois, Massachusetts, New Jersey, New York, Texas, Wyoming; trend line R² = 0.1476.]


Data Set 2: Mortality by Cause

Source: National Center for Health Statistics

Data Visualization Gallery

https://www.cdc.gov/nchs/data-visualization/index.htm

Source: https://www.cdc.gov/nchs/data-visualization/mortality-trends/

[Figure: Age-adjusted death rates (y axis, 0 to 700), 1900 to 2010, for accidents, cancer, heart disease, influenza and pneumonia, and stroke.]

Age-Adjusted Death Rates, per 100,000: 1965 vs. 2015

Cause                      1965   2015
Accidents                    66     43
Cancer                      196    159
Heart Disease               543    169
Influenza and Pneumonia      47     15
Stroke                      166     38


Data Set 3: Public Plans Data

http://publicplansdata.org/

The most frequently used return assumption is 7.5%.


Return Assumptions Are Concentrated, And Shifting Down

[Figure: Return Assumptions Are Concentrated, And Shifting Down. Cumulative percentage of public plans (y axis, 0% to 100%) by investment return assumption (x axis, 6.0% to 9.5%).]

In FY 2016, 82% of plans in the Public Plans Database used return assumptions of 7.75% or less; in FY 2001, 19% did.
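Both claims on these slides, the most frequent assumption and the share of plans at or below a threshold, come from simple summaries. A base-R sketch on ten hypothetical return assumptions (not the Public Plans Database): `table()` gives the mode, and `ecdf()` gives the cumulative percentage that the chart plots.

```r
# Hypothetical return assumptions for ten plans (illustrative only)
assumptions <- c(0.0700, 0.0725, 0.0750, 0.0750, 0.0750,
                 0.0775, 0.0775, 0.0800, 0.0800, 0.0825)

# Most frequently used assumption (the mode)
counts <- table(assumptions)
mode_assumption <- as.numeric(names(counts)[which.max(counts)])

# Empirical CDF: share of plans at or below a given assumption
cum_share <- ecdf(assumptions)
share_at_or_below <- cum_share(0.0775)

cat(sprintf("Mode: %.2f%%; share at or below 7.75%%: %.0f%%\n",
            100 * mode_assumption, 100 * share_at_or_below))

# Step plot of the cumulative share, the same view as the chart above
plot(cum_share, xlab = "Investment return assumption",
     ylab = "Cumulative share of plans", main = "")
```

Reading the cumulative curve at a single x value gives a statement of the "82% of plans used 7.75% or less" kind directly.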

Public Plan Funded Ratios

[Figure: funded ratio charts for 2001, 2006, 2011, and 2016.]


Choosing a Visualization Type

What Kind of Data Do You Have?

Dimensionality:

One: histogram, box-and-whisker, pie chart, table with summary stats

Two: line, bar/column, scatterplot

Many: multiples

Numerical or categorical

Categorical: bar/column (may want to sort categories), histogram

Grouped: clustered columns, multiple graphs

Large set – or just a few numbers

Large: will generally need to simplify/summarize/group along some dimension

Few: consider a table or just a number

Geographic: does location actually count? Use a tile grid when entities are equally weighted

What is Your Story?

Distribution: density plot, histogram, box-and-whisker

Change over time: line, slope

Correlation or relationship: scatterplot, bubble plot

Comparison between items (ranking): slope, list/table, conditional formatting on table

Comparison over space (maps): choropleths, tile maps

Parts of a whole: pie, stacked bar/column


Additional Resources


Storytelling with Data

Looks at how to design graphs and other displays for maximum effect

Most can be done in Excel


Websites

The Chartmaker Directory

http://chartmaker.visualisingdata.com/

Visualization Universe

Chart types: http://visualizationuniverse.com/charts/

Charting books: http://visualizationuniverse.com/books/

PolicyViz

https://policyviz.com/


Can You See It?

CLIMBING THE ZEN MOUNTAIN

WHAT WE’LL TALK ABOUT

Seeing numbers

Seeing hypotheses

Seeing models

SEEING NUMBERS

THE TREACHERY OF IMAGES

Image taken from a University of Alabama site, “Approaches to Modernism”: [1], Fair use, https://en.wikipedia.org/w/index.php?curid=555365

THE NUMBER 7

WE CANNOT SEE NUMBERS

Arabic or Sanskrit numerals are no more legitimate than any other representation of numbers.

We can no more see numbers than we can hear, smell or taste them.

SCALING THE ZEN MOUNTAIN

“Before I studied Zen, I saw mountains as mountains and rivers as rivers. When I had studied Zen for thirty years I no longer saw mountains as mountains and rivers as rivers. But now that I have finally mastered Zen, I once again see mountains as mountains and rivers as rivers.”

Ch’an master Ch’ing Yuan

MANY NUMBERS - STATISTICS

Statistics maps a set of many numbers into a set of fewer numbers.

library(tibble)    # tibble()
library(magrittr)  # %>%

set.seed(1234)
meanlog_actual <- log(10e3)
sdlog_actual <- 0.5
tbl_obs <- tibble(
  x = rlnorm(5e3, meanlog = meanlog_actual, sdlog = sdlog_actual)
)
tbl_obs$x %>%
  summary()

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1830 7132 9970 11266 13947 49429
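As a worked check on the summary above: the lognormal has closed-form quantiles and moments, with median exp(meanlog) and mean exp(meanlog + sdlog^2/2). With meanlog = log(10,000) and sdlog = 0.5 that gives a median of 10,000 and a mean of about 11,331, close to the sample's 9,970 and 11,266. The sketch reuses the slide's parameters; the side-by-side table is an illustration, not part of the original code.

```r
# Parameters from the simulation on the slide
meanlog_actual <- log(10e3)
sdlog_actual <- 0.5

# Closed-form lognormal summaries
theoretical <- c(
  q1     = qlnorm(0.25, meanlog_actual, sdlog_actual),
  median = exp(meanlog_actual),                       # 10,000
  mean   = exp(meanlog_actual + sdlog_actual^2 / 2),  # about 11,331
  q3     = qlnorm(0.75, meanlog_actual, sdlog_actual)
)

# Redraw with the slide's seed and compute the matching sample statistics
set.seed(1234)
x <- rlnorm(5e3, meanlog = meanlog_actual, sdlog = sdlog_actual)
sample_stats <- c(
  q1     = unname(quantile(x, 0.25)),
  median = median(x),
  mean   = mean(x),
  q3     = unname(quantile(x, 0.75))
)

print(round(rbind(sample = sample_stats, theoretical = theoretical)))
```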

MANY NUMBERS VISUALLY

Looking at summary statistics always means looking at reduced information.

Looking at a visualization represents all of the data, but forces our eyes to compute the statistics.

Increased efficiency vs. decreased accuracy

SEEING HYPOTHESES

STATISTICAL HYPOTHESES

Many different sorts:

Were the data generated by this form of distribution?

Were these two samples generated by different processes?

Is there a relationship between these two variables?

[List influenced by http://had.co.nz/stat645/graphical-inference.pdf]

SAMPLE DATA

SAMPLE AND HYPOTHESIS

HYPOTHESIS TESTING

Kolmogorov-Smirnov

Parameter significance

χ² test

Also:

Test against other candidates, visually
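A base-R sketch of two of the tests listed above, reusing the simulation from the statistics slide. The lognormal null is fully specified (known parameters), so `ks.test()` applies directly; the normal candidate is a deliberately wrong alternative chosen for illustration, and the χ² test uses ten bins that are equally likely under the null, also an illustrative choice.

```r
# Same simulated sample as on the statistics slide
set.seed(1234)
meanlog_actual <- log(10e3)
sdlog_actual <- 0.5
x <- rlnorm(5e3, meanlog = meanlog_actual, sdlog = sdlog_actual)

# Kolmogorov-Smirnov against the fully specified lognormal null
ks_lognormal <- ks.test(x, "plnorm", meanlog = meanlog_actual, sdlog = sdlog_actual)

# A deliberately wrong candidate: normal with the lognormal's mean and sd
ln_mean <- exp(meanlog_actual + sdlog_actual^2 / 2)
ln_sd   <- ln_mean * sqrt(exp(sdlog_actual^2) - 1)
ks_normal <- ks.test(x, "pnorm", mean = ln_mean, sd = ln_sd)

# Chi-squared goodness of fit on 10 bins that are equally likely under the null
breaks   <- qlnorm(seq(0, 1, by = 0.1), meanlog_actual, sdlog_actual)  # 0, ..., Inf
observed <- as.vector(table(cut(x, breaks)))
chi_lognormal <- chisq.test(observed, p = rep(0.1, 10))

print(c(ks_lognormal = ks_lognormal$p.value,
        ks_normal    = ks_normal$p.value,
        chi_sq       = chi_lognormal$p.value))
```

The wrong candidate is rejected decisively; the correct one should usually not be, though any single p-value is itself random.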

COULD THE DATA HAVE COME FROM SOMEWHERE ELSE?

EXERCISE FOR THE STUDENT

The same, but with:

p-p or q-q plot

Cumulative distribution function

Isolate important areas of the distribution

BUT NOW …

Test the null itself!!

GRAPHICAL INFERENCE

Hadley Wickham, Dianne Cook, Heike Hofmann, and Andreas Buja

H/T -> Xan Gregg @xangregg

Graphical inference helps us answer the question “Is what we see really there?”

http://had.co.nz/stat645/graphical-inference.pdf

HOW IT WORKS

Visual test

1. Generate many (say, 19) samples from the null.

2. Add your actual data.

3. Shuffle.

4. Observe.

5. Power may be increased by using more than one observer.
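Steps 1 to 4 can be sketched in base R as follows. The null used here (normal with the sample's mean and standard deviation) and the helper `make_lineup()` are hypothetical choices for illustration; the nullabor package in the references provides lineup tools of its own.

```r
# Hypothetical helper: steps 1-3 of the visual test
make_lineup <- function(actual, null_sampler, m = 20) {
  nulls   <- replicate(m - 1, null_sampler(length(actual)), simplify = FALSE)  # 1. null samples
  panels  <- c(nulls, list(actual))                                           # 2. add actual data
  shuffle <- sample.int(m)                                                    # 3. shuffle
  list(panels = panels[shuffle], pos = which(shuffle == m))
}

set.seed(42)
actual <- rlnorm(200, meanlog = 0, sdlog = 0.5)  # hypothetical "observed" data
# Assumed null for illustration: normal with the sample's mean and sd
null_sampler <- function(n) rnorm(n, mean = mean(actual), sd = sd(actual))
lineup <- make_lineup(actual, null_sampler)

# 4. Observe: draw every panel, then try to pick the odd one out
op <- par(mfrow = c(4, 5), mar = c(1, 1, 2, 1))
for (i in seq_along(lineup$panels)) {
  hist(lineup$panels[[i]], main = as.character(i), xlab = "", ylab = "", axes = FALSE)
}
par(op)
# Reveal the answer only after looking: lineup$pos
```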

CAN YOU SPOT THE SAMPLE DATA?

HOW ABOUT NOW?

NOW?

A BIT EASIER

A BIT HARDER

THE STATISTICAL LINEUP

If I can pick my data out of a lineup, I may reject the null hypothesis.
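How strong is that rejection? Under the null, an observer guessing at random picks the real data with probability 1/m in a lineup of m panels, so one correct pick from 20 panels corresponds to a p-value of 0.05. With K independent observers, x of whom pick the real data, the p-value is P(X ≥ x) for X ~ Binomial(K, 1/m), following the graphical inference paper cited earlier. A small sketch, where `lineup_p_value()` is a hypothetical helper:

```r
# P(X >= x) with X ~ Binomial(K, 1/m): the chance that at least x of K
# independent observers pick the real panel by luck alone
lineup_p_value <- function(x, K, m = 20) {
  pbinom(x - 1, size = K, prob = 1 / m, lower.tail = FALSE)
}

lineup_p_value(1, 1)  # one observer, correct pick: 0.05
lineup_p_value(2, 5)  # 2 of 5 observers correct: about 0.0226
```

This is the sense in which more observers add power: agreement among several people is much less likely to happen by chance.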

SEEING MODELS

A “good” model is one whose residuals display only noise. We are most interested in seeing something which isn’t there.

MOVE ALONG, NOTHING TO SEE HERE

segment  adj.r.squared  sigma
1        0.6294916      1.236603
2        0.6291578      1.237214
3        0.6292489      1.236311
4        0.6296747      1.235696
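Nearly identical fit statistics across segments can hide a misfit that all the segments share. A base-R sketch on simulated data (all names hypothetical, not the data behind the table above): the true relationship is curved, a straight line is fit in each of four segments, and the per-segment adjusted R² and sigma come out similar; only the residual plot shows the curvature.

```r
set.seed(2018)
# Hypothetical data: a curved relationship, fit with a straight line
n_per_segment <- 250
dat <- data.frame(
  segment = rep(1:4, each = n_per_segment),
  x = runif(4 * n_per_segment, 0, 10)
)
dat$y <- 1 + 0.5 * dat$x + 0.1 * dat$x^2 + rnorm(nrow(dat))

# Same linear model in each segment; collect the headline statistics
fit_stats <- do.call(rbind, lapply(split(dat, dat$segment), function(d) {
  s <- summary(lm(y ~ x, data = d))
  data.frame(segment = d$segment[1], adj.r.squared = s$adj.r.squared, sigma = s$sigma)
}))
print(fit_stats, row.names = FALSE)

# The table looks uniform; the residuals show a U shape
fit_all <- lm(y ~ x, data = dat)
plot(dat$x, residuals(fit_all), xlab = "x", ylab = "Residual")
abline(h = 0, lty = "dashed")
```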

NOTHING TO SEE?

RESIDUALS

MISSING VARIABLES

Let’s look at ozone data from the mlbench package.

At first, we will fit using only las_wind_speed.

A simple model may tell us more than we think!

BASIC EDA

OUR APPROACH

A very messy Poisson

Fit a GLM with a subset of predictors

Plot residuals against all predictors

Look for patterns
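A base-R sketch of this approach on simulated Poisson counts. The predictors `x1` and `x2` are hypothetical stand-ins for the ozone variables (for example, the predictor in the model and one left out): fit with one predictor, plot deviance residuals against both, then augment the model and check that the trend against the omitted predictor disappears.

```r
set.seed(7)
# Hypothetical data: counts depend on two predictors
n <- 500
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- rpois(n, lambda = exp(1 + 1.0 * dat$x1 + 1.5 * dat$x2))

# Fit a Poisson GLM with a subset of the predictors
fit_small <- glm(y ~ x1, family = poisson, data = dat)
res_small <- residuals(fit_small, type = "deviance")

# Plot residuals against all predictors, including the one left out
op <- par(mfrow = c(1, 2))
plot(dat$x1, res_small, xlab = "x1 (in model)", ylab = "Deviance residual")
plot(dat$x2, res_small, xlab = "x2 (left out)", ylab = "Deviance residual")
par(op)

# A trend against x2 suggests augmenting the model
fit_full <- glm(y ~ x1 + x2, family = poisson, data = dat)
res_full <- residuals(fit_full, type = "deviance")

before_after <- c(before = cor(dat$x2, res_small), after = cor(dat$x2, res_full))
print(round(before_after, 3))
```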

MISSING VARIABLES

AUGMENT OUR MODEL

Let’s add LAX inversion temperature!

MISSING VARIABLES REDUX

TREES

Simple trees are easy to visualize

They’re also not too useful

Ensemble models are tough to see

VARIABLE IMPORTANCE
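One model-agnostic way to measure variable importance is permutation importance: shuffle one predictor and measure how much the prediction error grows. The slides do not say which importance measure was used, so this base-R sketch shows only one option; a linear model stands in for a tree ensemble, and the data and helper names are hypothetical.

```r
set.seed(99)
# Hypothetical data: one strong predictor, one weak, one pure noise
n <- 1000
dat <- data.frame(signal = rnorm(n), weak = rnorm(n), noise = rnorm(n))
dat$y <- 3 * dat$signal + 0.5 * dat$weak + rnorm(n)

fit <- lm(y ~ signal + weak + noise, data = dat)
mse <- function(model, data) mean((data$y - predict(model, newdata = data))^2)

# Hypothetical helper: increase in MSE when each variable is shuffled
permutation_importance <- function(model, data, vars) {
  base_mse <- mse(model, data)
  vapply(vars, function(v) {
    shuffled <- data
    shuffled[[v]] <- sample(shuffled[[v]])  # break the link between v and y
    mse(model, shuffled) - base_mse
  }, numeric(1))
}

imp <- permutation_importance(fit, dat, c("signal", "weak", "noise"))
barplot(sort(imp), horiz = TRUE, las = 1, xlab = "Increase in MSE when permuted")
```

Because it only needs `predict()`, the same function applies unchanged to an ensemble that has a predict method.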

PARTIAL PLOTS
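A partial dependence plot shows a model's average prediction as one predictor is set to each value on a grid, with the other predictors left at their observed values. A base-R sketch under that definition; the data, the model, and the helper `partial_dependence()` are hypothetical, and any model with a `predict()` method could take the place of the linear fit.

```r
set.seed(11)
# Hypothetical data: a U-shaped effect of x1 and a linear effect of x2
n <- 500
dat <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 10))
dat$y <- (dat$x1 - 5)^2 / 5 + 0.3 * dat$x2 + rnorm(n, sd = 0.5)

# Stand-in model; any model with a predict() method would do
fit <- lm(y ~ x1 + I(x1^2) + x2, data = dat)

# Hypothetical helper: average prediction with `var` fixed at each grid value
partial_dependence <- function(model, data, var, grid) {
  vapply(grid, function(g) {
    data[[var]] <- g                      # set var to g for every row (local copy)
    mean(predict(model, newdata = data))  # average over the other predictors
  }, numeric(1))
}

grid  <- seq(0, 10, length.out = 50)
pd_x1 <- partial_dependence(fit, dat, "x1", grid)
pd_x2 <- partial_dependence(fit, dat, "x2", grid)

op <- par(mfrow = c(1, 2))
plot(grid, pd_x1, type = "l", xlab = "x1", ylab = "Average prediction")
plot(grid, pd_x2, type = "l", xlab = "x2", ylab = "Average prediction")
par(op)
```

For a purely linear term such as x2, the partial dependence line has exactly the fitted coefficient as its slope; for x1 it traces the U shape.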

CONCLUSION

THE ZEN MOUNTAIN

Numbers are not numbers, models are not models …

-Me

THANK YOU!

REFERENCES

http://dicook.github.io/nullabor/index.html

WHERE TO FIND THIS

This presentation may be found at: http://pirategrunt.com/soa_symposium_2018/#/

Code to produce the examples and slides: https://github.com/PirateGrunt/soa_symposium_2018

Understanding the Layers of Your Data

Session 42 – Visualization: A Picture Speaks a Thousand Words

September 2018 – Predictive Analytics Symposium

Good Graphics Get to the Point

Bad Graphics Do More Harm than Good

Identify Possible Solutions

More Bad Graphics

LinkedIn Body Language for Leaders

… what?

Using a Layered Approach to Displaying Data

Guide: the ggplot2 R package. ggplot2 is an implementation of the grammar of graphics.

Basics of the grammar: data; geometric objects (e.g., points, lines, bars); aesthetic attributes (e.g., color, size, shape)

Additional components: statistical transformations of the data (e.g., count, mean); coordinate system (generally assumed to be Cartesian)

The combination and layering of these components defines the grammar

Sample Dataset ‘mpg’: Fuel economy data from 1999 and 2008 for 38 popular models of car

Variable       Description                        Examples
manufacturer   manufacturer name                  Audi, Chevrolet, Nissan
model          model name                         A4, Corvette, Altima
displ          engine displacement, in liters     2.0, 4.2, 6.0
year           year of manufacture                1999 or 2008
cyl            number of cylinders                4, 6, 8
trans          type of transmission               auto, manual
drv            front-wheel, rear-wheel, 4wd       f, r, 4
cty            city miles per gallon              14, 16, 20
hwy            highway miles per gallon           15, 20, 27
fl             fuel type                          e: E85, d: diesel, r: regular, p: premium, c: CNG
class          type of car                        compact, midsize, SUV

Basic Comparisons – Density

Basic Comparisons – The Structure of Data Matters

library(ggplot2)  # ggplot(), the mpg dataset

## City mpg density (basic)
ggplot(data = mpg, aes(x = cty)) +
  geom_density()

## City mpg density (full prettied)
ggplot(data = mpg, aes(x = cty)) +
  geom_density(col = 'lightblue', fill = 'lightblue') +
  scale_y_continuous(labels = scales::percent) +
  ylab('% of data') +
  xlab('City MPG') +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 16))

## Highway mpg density (basic)
ggplot(data = mpg, aes(x = hwy)) +
  geom_density()

## Highway mpg density (full prettied)
ggplot(data = mpg, aes(x = hwy)) +
  geom_density(col = 'lightblue', fill = 'lightblue') +
  scale_y_continuous(labels = scales::percent) +
  ylab('% of data') +
  xlab('Highway MPG') +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 16))

Basic Comparisons – The Structure of Data Matters

Basic Comparisons – The Structure of Data Matters

library(ggplot2)  # ggplot(), the mpg dataset
library(tidyr)    # gather(); also re-exports %>%

## Create a new format for our data: one row per car and mpg type
plot_data <- mpg %>%
  gather(key = 'mpg_type', value = 'mpg', cty, hwy)

## Plot city and highway mpg under same plot controls (basic)
ggplot(plot_data, aes(x = mpg)) +
  geom_density() +
  facet_wrap(~ mpg_type, nrow = 2)

## Plot city and highway mpg under same plot controls (prettied)
ggplot(plot_data, aes(x = mpg)) +
  geom_density(col = 'lightblue', fill = 'lightblue') +
  facet_wrap(~ mpg_type, nrow = 2,
             labeller = as_labeller(c('cty' = 'City', 'hwy' = 'Highway'))) +
  scale_y_continuous(labels = scales::percent) +
  ylab('% of data') +
  xlab('MPG') +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 16))

Scatterplots – More Than Just Dots

## Highway mpg as a function of city mpg (basic)
ggplot(data = mpg, aes(x = cty, y = hwy)) +
  geom_point()

## Highway mpg as a function of city mpg (prettied)
ggplot(data = mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  xlab('City MPG') +
  ylab('Highway MPG') +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 16))


## Highway mpg as a function of city mpg (basic)
## Add color based on class
ggplot(data = mpg, aes(x = cty, y = hwy, col = class)) +
  geom_point()

## Highway mpg as a function of city mpg (prettied)
## Add color based on class
ggplot(data = mpg, aes(x = cty, y = hwy, col = class)) +
  geom_point() +
  xlab('City MPG') +
  ylab('Highway MPG') +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 16))


## Highway mpg as a function of city mpg (basic)
## Add a trend line
## aes(group = 1) pools all classes, so a single line is fitted
ggplot(data = mpg, aes(x = cty, y = hwy, col = class)) +
  geom_count() +
  geom_smooth(aes(group = 1), method = 'lm', se = FALSE,
              linetype = 'dashed')

## Highway mpg as a function of city mpg (prettied)
## Add a trend line
ggplot(data = mpg, aes(x = cty, y = hwy, col = class)) +
  geom_count() +
  geom_smooth(aes(group = 1), method = 'lm', se = FALSE,
              linetype = 'dashed') +
  xlab('City MPG') +
  ylab('Highway MPG') +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 16))
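These slides switch from `geom_point()` to `geom_count()` because many cars share the same city/highway pair, so plain points overplot; `geom_count()` counts the cars at each location and maps that count to point size. A quick way to check this, sketched with `ggplot2::layer_data()` (which returns the data ggplot2 computed for a layer), is below. The class colouring is left out here as a simplifying assumption so that counts are pooled across classes.

```r
library(ggplot2)

# Same scatterplot as above but without colour, so counts pool all classes
p_count <- ggplot(data = mpg, aes(x = cty, y = hwy)) +
  geom_count()

# layer_data() returns the data ggplot2 computed for a layer; for
# geom_count() it includes n, the number of cars at each (cty, hwy) pair
counts <- layer_data(p_count, 1)

nrow(counts)     # distinct (cty, hwy) pairs, fewer than the 234 cars
max(counts$n)    # the largest pile-up of identical points
sum(counts$n)    # adds back up to 234, one per car
```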


## Highway mpg as a function of city mpg (basic)
## Add multiple trend lines
ggplot(data = mpg, aes(x = cty, y = hwy, col = class)) +
  geom_count() +
  geom_smooth(method = 'lm', se = FALSE)

## Highway mpg as a function of city mpg (prettied)
## Add multiple trend lines
ggplot(data = mpg, aes(x = cty, y = hwy, col = class)) +
  geom_count() +
  geom_smooth(method = 'lm', se = FALSE) +
  xlab('City MPG') +
  ylab('Highway MPG') +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 16))
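With `col = class` in the top-level `aes()` and no `aes(group = 1)` override, ggplot2 splits the data by class, so `geom_smooth(method = 'lm')` fits a separate straight line to each class. A minimal base-R sketch that reproduces those per-class fits, useful as a check on what each coloured line represents:

```r
library(ggplot2)  # provides the mpg data set

# Fit hwy ~ cty separately within each vehicle class, mirroring the
# one-line-per-colour behaviour of geom_smooth(method = 'lm')
fits <- lapply(split(mpg, mpg$class), function(d) lm(hwy ~ cty, data = d))

# One column per class: intercept (row 1) and slope on cty (row 2)
coefs <- sapply(fits, coef)
round(coefs, 2)

# Pooled fit, matching the single dashed line drawn with aes(group = 1)
coef(lm(hwy ~ cty, data = mpg))
```

Comparing the per-class slopes with the pooled slope shows whether the single trend line hides differences between classes.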

Bar Charts – Not So Boring After All

## Plot count of cars by manufacturer (basic)
ggplot(data = mpg, aes(x = manufacturer)) +
  geom_bar(stat = 'count')

## Plot count of cars by manufacturer (prettied)
ggplot(data = mpg, aes(x = manufacturer)) +
  geom_bar(stat = 'count') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 16))


## Plot count of cars by manufacturer (basic)
## Add transmission type as a "fill"
ggplot(data = mpg, aes(x = manufacturer, fill = factor(trans))) +
  geom_bar(stat = 'count', position = 'dodge')

## Plot count of cars by manufacturer (prettied)
ggplot(data = mpg, aes(x = manufacturer, fill = factor(trans))) +
  geom_bar(stat = 'count', position = 'dodge') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 16))
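The `trans` column in `mpg` has ten distinct values (such as `auto(l4)` and `manual(m5)`), which gives up to ten fill colours per manufacturer and makes the dodged bars hard to read. One way to simplify, in line with the later advice to group large data along some dimension, is to collapse transmission to automatic versus manual. A minimal sketch, assuming every `trans` value starts with `auto` or `manual` followed by a parenthesised detail, as it does in `mpg`; the column name `trans_type` is hypothetical.

```r
library(ggplot2)

# Keep only the text before the parenthesis: 'auto(l4)' -> 'auto',
# 'manual(m5)' -> 'manual'
mpg_trans <- mpg
mpg_trans$trans_type <- sub('\\(.*$', '', mpg_trans$trans)

table(mpg_trans$trans_type)

p_trans <- ggplot(data = mpg_trans, aes(x = manufacturer, fill = trans_type)) +
  geom_bar(stat = 'count', position = 'dodge') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 16))

p_trans
```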


## Plot count of cars by manufacturer (basic)
## Facet on no. of cylinders
ggplot(data = mpg, aes(x = manufacturer, fill = factor(trans))) +
  geom_bar(stat = 'count', position = 'dodge') +
  facet_grid(cyl ~ .)

## Plot count of cars by manufacturer (prettied)
## Facet on no. of cylinders
ggplot(data = mpg, aes(x = manufacturer, fill = factor(trans))) +
  geom_bar(stat = 'count', position = 'dodge') +
  facet_grid(cyl ~ .) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 16))

Conclusion: Layers Help Tell the Story

Coordinate system

Data: coordinates of where each shot was taken; make or miss

Geometries: bins of court coordinates; percentages within bins

Aesthetics: size of hexagons; color based on relative percentage

Statistical transformations: count of shots and makes within each bin
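The shot-chart data are not part of this deck, so the sketch below maps the same layers onto `mpg` as a stand-in: city and highway mpg play the role of court coordinates, a manual transmission plays the role of a made shot, and `stat_summary_2d()` bins the points and computes the share of manual cars within each bin. It uses rectangular bins rather than the slide's hexagons (hexagons would need the hexbin package), and the bin count, colours, and the `manual` indicator are illustrative assumptions.

```r
library(ggplot2)

# Data layer: a 0/1 outcome per car, analogous to make (1) or miss (0)
mpg_bins <- mpg
mpg_bins$manual <- as.numeric(startsWith(mpg_bins$trans, 'manual'))

p_layers <- ggplot(mpg_bins, aes(x = cty, y = hwy, z = manual)) +  # data, coordinates
  # Statistical transformation + geometry: mean of z in each 2-D bin
  stat_summary_2d(fun = mean, bins = 10) +
  # Aesthetic: fill colour encodes the share of manual cars in the bin
  scale_fill_gradient(low = 'lightblue', high = 'darkblue',
                      labels = scales::percent, name = '% manual') +
  xlab('City MPG') +
  ylab('Highway MPG')

p_layers
```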


Thank you

Mike Hoyer, Actuary and Product Manager

Milliman IntelliScript


Telling Your Data Story

MARY PAT CAMPBELL, FSA, MAAA, PRM

VP, Insurance Research, Conning

21 June 2018


https://en.wikipedia.org/wiki/Charles_Joseph_Minard



The Why of Data Visualization. https://www.soa.org/News-and-Publications/Newsletters/Compact/2016/march/The-Why-of-Data-Visualization.aspx


Evaluate Your Visualization

• Completeness: from no relevant data to all relevant data
• Perceptibility: from unclear and difficult to clear and easy
• Intuitiveness: from unfamiliar and hard to understand to familiar and easy to understand

Source: http://www.perceptualedge.com/articles/visual_business_intelligence/data_visualization_effectiveness_profile.pdf


What is Your Story?

• Distribution
• Change over time
• Correlation or Relationship
• Comparison between items (ranking)
• Comparison over space (maps)
• Parts of a whole


Things to Try to Improve Readability

Remove:
• Gridlines
• Legend – replace with data labels

Instead:
• Add explanatory text
• Highlight key elements
• Use multiples of the same graph
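A minimal ggplot2 sketch of these suggestions on the `mpg` data, assuming the story is average highway mpg by vehicle class: gridlines and the legend are removed, each bar carries its own value label, the class with the highest average is highlighted, and the title states the point. The colours, the title wording, and the choice of what to highlight are illustrative assumptions, not taken from the slides.

```r
library(ggplot2)

# Summarise first: one number per class keeps the chart simple
avg <- aggregate(hwy ~ class, data = mpg, FUN = mean)
avg$highlight <- avg$hwy == max(avg$hwy)   # the key element to highlight
best <- avg$class[avg$highlight]

p_clean <- ggplot(avg, aes(x = reorder(class, hwy), y = hwy, fill = highlight)) +
  geom_col() +
  # Data labels on the bars take the place of gridlines
  geom_text(aes(label = round(hwy, 1)), hjust = -0.1, size = 4) +
  # Grey for context, one colour for the highlighted class
  scale_fill_manual(values = c('FALSE' = 'grey70', 'TRUE' = 'steelblue')) +
  # Explanatory text states the point directly
  labs(x = NULL, y = 'Average highway MPG',
       title = paste('Highest average highway MPG:', best)) +
  coord_flip() +
  expand_limits(y = max(avg$hwy) * 1.1) +   # room for the labels
  theme_minimal() +
  theme(panel.grid = element_blank(),       # remove gridlines
        legend.position = 'none')           # remove legend

p_clean
```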


Some Data Stories


Examples To Come

I will be telling some data stories in the session, and full slides will be available after the meeting.

Photo by Casey Horner on Unsplash


Choosing a Visualization Type


What Kind of Data Do You Have?

• Dimensionality:
  • One: histogram, box-and-whisker, pie chart, table with summary stats
  • Two: line, bar/column, scatterplot
  • Many: multiples
• Numerical or categorical
  • Categorical: bar/column (may want to sort categories), histogram
• Grouped
  • Clustered columns, multiple graphs
• Large set – or just a few numbers
  • Large: will generally need to simplify/summarize/group along some dimension
  • Few: consider table or just a number
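For a categorical bar chart, sorting the bars by count usually makes the ranking easier to read than ggplot2's default alphabetical order. A minimal sketch in base R plus ggplot2, reusing the manufacturer bar chart from earlier (`forcats::fct_infreq()` would be an alternative way to reorder the levels):

```r
library(ggplot2)

# Order manufacturer levels from most to fewest cars
counts <- sort(table(mpg$manufacturer), decreasing = TRUE)
mpg_sorted <- mpg
mpg_sorted$manufacturer <- factor(mpg_sorted$manufacturer, levels = names(counts))

p_sorted <- ggplot(data = mpg_sorted, aes(x = manufacturer)) +
  geom_bar(stat = 'count') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_text(size = 16))

p_sorted
```

Setting the factor levels once, before plotting, keeps the sort order consistent across any later facets or fills that reuse the same data.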


What is Your Story?

• Distribution
  • Density plot, histogram, box-and-whisker
• Change over time
  • Line, slope
• Correlation or Relationship
  • Scatterplot, bubble plot
• Comparison between items (ranking)
  • Slope, list/table, conditional formatting on table
• Comparison over space (maps)
  • Choropleths, tile maps
• Parts of a whole
  • Pie, stacked bar/column
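Slope charts appear twice in the list above (change over time and ranking). A minimal ggplot2 sketch, assuming the story is how average highway mpg by class changed between the two model years in `mpg` (1999 and 2008): each class is one line joining its two yearly averages, labelled directly at the 2008 end instead of through a legend. The label placement and colours are illustrative choices.

```r
library(ggplot2)

# Average highway mpg for each class in each model year
by_year <- aggregate(hwy ~ class + year, data = mpg, FUN = mean)

p_slope <- ggplot(by_year, aes(x = factor(year), y = hwy, group = class)) +
  geom_line(colour = 'grey50') +
  geom_point() +
  # Direct labels at the 2008 end replace a legend
  geom_text(data = subset(by_year, year == 2008),
            aes(label = class), hjust = -0.1) +
  labs(x = 'Model year', y = 'Average highway MPG') +
  theme_minimal() +
  theme(legend.position = 'none')

p_slope
```

Because every class has cars in both model years, each line has exactly two points, so steeper lines read directly as larger changes.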

Additional Resources

Storytelling with Data

Looks at how to design graphs and other displays for maximum effect

Most can be done in Excel


Websites

• The Chartmaker Directory: http://chartmaker.visualisingdata.com/
• Visualization Universe
  • Chart types: http://visualizationuniverse.com/charts/
  • Charting books: http://visualizationuniverse.com/books/
• PolicyViz: https://policyviz.com/


Can You See It?
