heat (and hexagon) plots in stata - timberlake consultants€¦ · heat(andhexagon)plotsinstata...

Post on 28-Oct-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Heat (and hexagon) plots in Stata

Ben Jann

University of Bern, ben.jann@soz.unibe.ch

2019 London Stata ConferenceLondon, September 5–6, 2019

Ben Jann (University of Bern) heatplot London, 05.09.2019 1

Outline

1 Introduction

2 Syntax of heatplot and hexplot

3 ExamplesBivariate histogramTrivariate distributionsDisplay values as marker labelsCorrelation matrixDissimilarity matrixSpacial weights matrix

4 Installation

Ben Jann (University of Bern) heatplot London, 05.09.2019 2

What is a heat plot?

Generally speaking, a heat plot is a graph in which some aspect ofthe data is displayed as a color gradient.

A simple example is a bivariate histogram; the color gradient isused to illustrate (relative) frequencies within bins of X and Y .

Ben Jann (University of Bern) heatplot London, 05.09.2019 3

. quietly drawnorm y x, n(10000) corr(1 .5 1) cstorage(lower) clear

. heatplot y x, backfill colors(plasma)-4

-20

24

y

-4 -2 0 2 4x

.84893

.78679

.72464

.6625

.60036

.53821

.47607

.41393

.35179

.28964

.2275

.16536

.10321

.04107

percent

Ben Jann (University of Bern) heatplot London, 05.09.2019 4

What about hexagons?

Hexagons are great because they look a bit like circles, but you canjoin them together without leaving gaps.

Bees found out how awesome hexagons are long time ago.

Ben Jann (University of Bern) heatplot London, 05.09.2019 5

What about hexagons?

Latter on, gully cover designers found out that hexagons look greaton gully covers.

Ben Jann (University of Bern) heatplot London, 05.09.2019 6

What about hexagons?

Finally, also statisticians discovered the virtues of hexagons.

“The here are many reasons for using hexagons, at least oversquares. Hexagons have symmetry of nearest neighbors which islacking in square bins. Hexagons are the maximum number ofsides a polygon can have for a regular tesselation of the plane,so in terms of packing a hexagon is 13% more efficient forcovering the plane than squares. This property translates intobetter sampling efficiency at least for elliptical shapes. Lastlyhexagons are visually less biased for displaying densities thanother regular tesselations. For instance with squares our eyesare drawn to the horizontal and vertical lines of the grid.”1

1Lewin-Koh, N. (2018). Hexagon Binning: an Overview. Available fromhttps://cran.r-project.org/web/packages/hexbin/vignettes/hexagon_binning.pdf

Ben Jann (University of Bern) heatplot London, 05.09.2019 7

Example from above using hexagons

. hexplot y x, backfill colors(plasma)-4

-20

24

y

-4 -2 0 2 4x

.8875

.8225

.7575

.6925

.6275

.5625

.4975

.4325

.3675

.3025

.2375

.1725

.1075

.0425

percent

Ben Jann (University of Bern) heatplot London, 05.09.2019 8

Why heat plots (be it squares or hexagons)?

Heat plots are great for visualizing structure in (large) datasets.

Here is an example:. use example, clear. count

134,100. list in 1/10

X Y Z

1. 16 193 .124843352. 371 13 .007729073. 157 380 .573158054. 334 443 .316669945. 424 205 .23699765

6. 47 319 .306750087. 50 288 .310039268. 434 5 .039255079. 180 303 .5651538510. 428 183 .21671468

Ben Jann (University of Bern) heatplot London, 05.09.2019 9

Run some analyses . . .. two (lpoly Z X, degree(1)) (lpoly Z Y), legend(order(1 "X" 2 "Y"))

0.2

.4.6

lpol

y sm

ooth

: Z

0 100 200 300 400 500lpoly smoothing grid

X Y

Interesting! We clearly see the business cycles and a general upwardtrend in country Y , but country X did not develop much and therehas been some severe crisis between time 200 and 300.

Ben Jann (University of Bern) heatplot London, 05.09.2019 10

Here is a heat plot of the data:. hexplot Z Y X, xbins(10) ybins(15) levels(20) clip ///> xlabel(none) ylabel(none) aspect(`=447/300')

Ben Jann (University of Bern) heatplot London, 05.09.2019 11

Here is a heat plot of the data:. hexplot Z Y X, xbins(20) ybins(30) levels(20) clip ///> xlabel(none) ylabel(none) aspect(`=447/300')

Ben Jann (University of Bern) heatplot London, 05.09.2019 12

Here is a heat plot of the data:. hexplot Z Y X, xbins(40) ybins(60) levels(20) clip ///> xlabel(none) ylabel(none) aspect(`=447/300')

Ben Jann (University of Bern) heatplot London, 05.09.2019 13

Here is a heat plot of the data:. hexplot Z Y X, xbins(80) ybins(120) levels(20) clip ///> xlabel(none) ylabel(none) aspect(`=447/300')

Ben Jann (University of Bern) heatplot London, 05.09.2019 14

Here is a heat plot of the data:. hexplot Z Y X, xbins(160) ybins(240) levels(20) clip ///> xlabel(none) ylabel(none) aspect(`=447/300')

Ben Jann (University of Bern) heatplot London, 05.09.2019 15

1 Introduction

2 Syntax of heatplot and hexplot

3 ExamplesBivariate histogramTrivariate distributionsDisplay values as marker labelsCorrelation matrixDissimilarity matrixSpacial weights matrix

4 Installation

Ben Jann (University of Bern) heatplot London, 05.09.2019 16

Main commands

Bivariate histogram

heatplot Y X[if] [

in] [

weight] [

, options]

Trivariate heat plot (color gradient for Z)

heatplot Z Y X[if] [

in] [

weight] [

, options]

Heat plot from Stata matrix

heatplot matname[, options

]Heat plot from Mata matrix

heatplot mata(name)[, options

]Heat plot using hexagons

hexplot ...

Ben Jann (University of Bern) heatplot London, 05.09.2019 17

Main options

Color gradient optionslevels(#) number of color binscuts(numlist) custom cutpoints for color binscolors(palette) color map to be used for the color binsstatistic(stat) how Z is aggregatedsize

[(exp)

]| sizeprop size of color fields

values(options) display values as marker labelsscatter

[(...)

]render color fields as scatter plot

keylabels(spec) how legend keys are labeled. . .

Binning of Y and X[x|y

]bins(spec) how continuous Y and X are binned[

x|y]bwidth(spec) alternative to bins()[

x|y]discrete

[(#)

]treat variables as discrete and omit binning

(note: categorical X and Y can be specified as i.varname). . .

Ben Jann (University of Bern) heatplot London, 05.09.2019 18

Main options

Matrix optionsdrop(numlist) drop elements equal to values in numlistlower display lower triangle onlyupper display upper triangle only. . .

Graph optionsaddplot(plots) add other plots to the graphby(varlist

[, options

]) repeat plot by subgroups

twoway_options general twoway options. . .

Some more options related to storing results . . .

Ben Jann (University of Bern) heatplot London, 05.09.2019 19

1 Introduction

2 Syntax of heatplot and hexplot

3 ExamplesBivariate histogramTrivariate distributionsDisplay values as marker labelsCorrelation matrixDissimilarity matrixSpacial weights matrix

4 Installation

Ben Jann (University of Bern) heatplot London, 05.09.2019 20

Default

. webuse nhanes2, clear

. heatplot weight height

050

100

150

200

wei

ght (

kg)

140 160 180 200height (cm)

.86884

.80958

.75033

.69108

.63182

.57257

.51332

.45406

.39481

.33556

.2763

.21705

.15779

.09854

.03929

percent

Ben Jann (University of Bern) heatplot London, 05.09.2019 21

Change resolution

. heatplot weight height, xbins(20) ybwidth(10 30)

050

100

150

200

wei

ght (

kg)

140 160 180 200height (cm)

4.26823.97453.68083.38713.09342.79972.5062.21231.91871.6251.33131.0376.74389.4502.15651

percent

Ben Jann (University of Bern) heatplot London, 05.09.2019 22

Use counts, change color ramp, change binning, and labeling

. heatplot weight height, statistic(count) color(plasma, reverse) ///> cut(1(5)@max) keylabels(, range(1))

050

100

150

200

wei

ght (

kg)

140 160 180 200height (cm)

91-9386-9081-8576-8071-7566-7061-6556-6051-5546-5041-4536-4031-3526-3021-2516-2011-156-101-5

count

Ben Jann (University of Bern) heatplot London, 05.09.2019 23

Use hexagons instead of squares

. hexplot weight height, statistic(count) color(plasma, reverse) ///> cut(1(5)@max) keylabels(, range(1))

050

100

150

200

wei

ght (

kg)

140 160 180 200height (cm)

96-9891-9586-9081-8576-8071-7566-7061-6556-6051-5546-5041-4536-4031-3526-3021-2516-2011-156-101-5

count

Ben Jann (University of Bern) heatplot London, 05.09.2019 24

Scale size of hexagons by relative frequency

. hexplot weight height, statistic(count) color(plasma) ///> cut(1(5)@max) keylabels(, range(1)) size

050

100

150

200

wei

ght (

kg)

140 160 180 200height (cm)

96-9891-9586-9081-8576-8071-7566-7061-6556-6051-5546-5041-4536-4031-3526-3021-2516-2011-156-101-5

count

Ben Jann (University of Bern) heatplot London, 05.09.2019 25

Scaling also available with squares

. heatplot weight height, statistic(count) color(plasma) ///> cut(1(5)@max) keylabels(, range(1)) size

050

100

150

200

wei

ght (

kg)

140 160 180 200height (cm)

91-9386-9081-8576-8071-7566-7061-6556-6051-5546-5041-4536-4031-3526-3021-2516-2011-156-101-5

count

Ben Jann (University of Bern) heatplot London, 05.09.2019 26

Adding other plots

. hexplot weight height, statistic(count) color(plasma) ///> cut(1(5)@max) keylabels(, range(1)) size ///> addplot(lpolyci weight height, degree(1) psty(p2) lw(*1.5) ac(%50) alc(%0))

050

100

150

200

wei

ght (

kg)

140 160 180 200height (cm)

96-9891-9586-9081-8576-8071-7566-7061-6556-6051-5546-5041-4536-4031-3526-3021-2516-2011-156-101-5

count

Ben Jann (University of Bern) heatplot London, 05.09.2019 27

1 Introduction

2 Syntax of heatplot and hexplot

3 ExamplesBivariate histogramTrivariate distributionsDisplay values as marker labelsCorrelation matrixDissimilarity matrixSpacial weights matrix

4 Installation

Ben Jann (University of Bern) heatplot London, 05.09.2019 28

Gender distribution (proportion female) by weight and height

. webuse nhanes2, clear

. hexplot female weight height, color(PiYG) ylabel(25(25)175) cuts(0(.05)1)

2550

7510

012

515

017

5w

eigh

t (kg

)

140 160 180 200height (cm)

.975

.925

.875

.825

.775

.725

.675

.625

.575

.525

.475

.425

.375

.325

.275

.225

.175

.125

.075

.025

female

Ben Jann (University of Bern) heatplot London, 05.09.2019 29

Same graph taking account relative frequencies

. hexplot female weight height, color(PiYG) ylabel(25(25)175) cuts(0(.05)1) ///> sizeprop recenter p(lcolor(black) lwidth(vthin) lalign(center))

2550

7510

012

515

017

5w

eigh

t (kg

)

140 160 180 200height (cm)

.975

.925

.875

.825

.775

.725

.675

.625

.575

.525

.475

.425

.375

.325

.275

.225

.175

.125

.075

.025

female

Ben Jann (University of Bern) heatplot London, 05.09.2019 30

Distribution of the body mass index by gender and its relation to highblood pressure

. heatplot highbp bmi i.sex, xdiscrete(0.9) yline(18.5 25) cuts(0(.05).75) ///> sizeprop recenter colors(inferno) plotregion(color(gs11)) ylabel(, nogrid)

1020

3040

5060

Body

Mas

s In

dex

(BM

I)

Male Female

.875

.725

.675

.625

.575

.525

.475

.425

.375

.325

.275

.225

.175

.125

.075

.025

highbp

Ben Jann (University of Bern) heatplot London, 05.09.2019 31

Sea surface temperature by longitude, latitude, and date

. sysuse surface, clear(NOAA Sea Surface Temperature). heatplot temperature longitude latitude, discrete(.5) statistic(asis) ///> by(date, legend(off)) ylabel(30(1)38) aspectratio(1)

3031

3233

3435

3637

38

142 144 146 148 150 142 144 146 148 150

01mar2011 11mar2011

30N

to 3

8.5N

142E to 150EGraphs by date

Ben Jann (University of Bern) heatplot London, 05.09.2019 32

Same plot using hexagons

. hexplot temperature longitude latitude, discrete(.5) statistic(asis) ///> by(date, legend(off)) ylabel(30(1)38) aspectratio(1)

3031

3233

3435

3637

38

142 144 146 148 150 142 144 146 148 150

01mar2011 11mar2011

30N

to 3

8.5N

142E to 150EGraphs by date

Ben Jann (University of Bern) heatplot London, 05.09.2019 33

Same plot using hexagons

. hexplot temperature longitude latitude, discrete(.5) statistic(asis) clip ///> by(date, legend(off)) ylabel(30(1)38) aspectratio(1)

3031

3233

3435

3637

38

142 144 146 148 150 142 144 146 148 150

01mar2011 11mar2011

30N

to 3

8.5N

142E to 150EGraphs by date

Ben Jann (University of Bern) heatplot London, 05.09.2019 34

1 Introduction

2 Syntax of heatplot and hexplot

3 ExamplesBivariate histogramTrivariate distributionsDisplay values as marker labelsCorrelation matrixDissimilarity matrixSpacial weights matrix

4 Installation

Ben Jann (University of Bern) heatplot London, 05.09.2019 35

Same plot using hexagons

. quietly sysuse auto, clear

. hexplot price weight mpg, values(format(%9.0f)) legend(off) aspectratio(1) ///> colors(plasma, intensity(.6)) p(lc(black) lalign(center))

5147 4294 4425

4296 5837 3876 3866 4194 5397

7103 4647 5651

11995 4976 4569 4647

12990 4888

9298 8814

11385 15906

12546

1000

2000

3000

4000

5000

Wei

ght (

lbs.

)

10 20 30 40Mileage (mpg)

Ben Jann (University of Bern) heatplot London, 05.09.2019 36

1 Introduction

2 Syntax of heatplot and hexplot

3 ExamplesBivariate histogramTrivariate distributionsDisplay values as marker labelsCorrelation matrixDissimilarity matrixSpacial weights matrix

4 Installation

Ben Jann (University of Bern) heatplot London, 05.09.2019 37

First store correlations in a matrix and then plot from there

. quietly sysuse auto, clear

. quietly correlate price mpg trunk weight length turn foreign

. matrix C = r(C)

. heatplot C, values(format(%9.3f)) color(hcl, diverging intensity(.6)) ///> legend(off) aspectratio(1)

1.000 -0.469 0.314 0.539 0.432 0.310 0.049

-0.469 1.000 -0.582 -0.807 -0.796 -0.719 0.393

0.314 -0.582 1.000 0.672 0.727 0.601 -0.359

0.539 -0.807 0.672 1.000 0.946 0.857 -0.593

0.432 -0.796 0.727 0.946 1.000 0.864 -0.570

0.310 -0.719 0.601 0.857 0.864 1.000 -0.631

0.049 0.393 -0.359 -0.593 -0.570 -0.631 1.000

price

mpg

trunk

weight

length

turn

foreign

price mpg trunk weight length turn foreign

Ben Jann (University of Bern) heatplot London, 05.09.2019 38

Plot lower triangle only

. heatplot C, values(format(%9.3f)) color(hcl, diverging intensity(.6)) ///> legend(off) aspectratio(1) lower nodiagonal

-0.469

0.314 -0.582

0.539 -0.807 0.672

0.432 -0.796 0.727 0.946

0.310 -0.719 0.601 0.857 0.864

0.049 0.393 -0.359 -0.593 -0.570 -0.631

mpg

trunk

weight

length

turn

foreign

price mpg trunk weight length turn

Ben Jann (University of Bern) heatplot London, 05.09.2019 39

1 Introduction

2 Syntax of heatplot and hexplot

3 ExamplesBivariate histogramTrivariate distributionsDisplay values as marker labelsCorrelation matrixDissimilarity matrixSpacial weights matrix

4 Installation

Ben Jann (University of Bern) heatplot London, 05.09.2019 40

Preparation: Run a cluster analysis and obtain dissimilarity matrix; addinformation on clusters to the matrix

. sysuse lifeexp, clear(Life expectancy, 1998). keep if gnppc<.(5 observations deleted). cluster wards popgrowth lexp gnppccluster name: _clus_1. cluster generate N = groups(`=_N'), ties(fewer). cluster generate G = groups(5). sort G N. matrix dissim D = popgrowth lexp gnppc. mata: st_matrixcolstripe("D", strofreal(st_data(., "G N"))). mata: st_matrixrowstripe("D", strofreal(st_data(., "G N")))

Ben Jann (University of Bern) heatplot London, 05.09.2019 41

Display matrix with highlighted clusters

. heatplot D, equations(lcolor(red) lwidth(*2)) ///> plotregion(margin(zero)) legend(off) aspectratio(1) xscale(alt)

1

2

3

45

1 2 3 4 5

Ben Jann (University of Bern) heatplot London, 05.09.2019 42

1 Introduction

2 Syntax of heatplot and hexplot

3 ExamplesBivariate histogramTrivariate distributionsDisplay values as marker labelsCorrelation matrixDissimilarity matrixSpacial weights matrix

4 Installation

Ben Jann (University of Bern) heatplot London, 05.09.2019 43

Copy some data

. copy http://www.stata-press.com/data/r15/homicide1990.dta .

. copy http://www.stata-press.com/data/r15/homicide1990_shp.dta .

Compute spacial weights matrix (this might take a while)

. use homicide1990(S.Messner et al.(2000), U.S southern county homicide rates in 1990). spmatrix create contiguity W. spmatrix matafromsp W id = W. mata mata describe W

# bytes type name and extent

15,949,952 real matrix W[1412,1412]

(matrix W has about 2 million cells)

Ben Jann (University of Bern) heatplot London, 05.09.2019 44

Heat plot of W with default settings, ignoring cells (i.e. weights) that areequal to zero

. heatplot mata(W), drop(0) aspectratio(1)0

500

1000

1500

Row

s

0 500 1000 1500Columns

22.73221.17519.61718.0616.50314.94513.38811.83110.2738.71617.15875.60144.04412.4867.92938

sum

Ben Jann (University of Bern) heatplot London, 05.09.2019 45

Hexagon plot with fine-grained resolution

. heatplot mata(W), drop(0) aspectratio(1) hexagon bins(100)0

500

1000

1500

Row

s

0 500 1000 1500Columns

5.83255.44065.04884.6574.26513.87333.48143.08962.69772.30591.9141.52221.1303.73848.34663

sum

Ben Jann (University of Bern) heatplot London, 05.09.2019 46

Plot each cell individually using the discrete option

. heatplot mata(W), drop(0) aspectratio(1) discrete color(black) p(lalign(center))

0500

1000

1500

Row

s0 500 1000 1500

Columns

Ben Jann (University of Bern) heatplot London, 05.09.2019 47

Could also use the scatter option

. heatplot mata(W), drop(0) aspectratio(1) discrete color(black) scatter p(ms(p))

0500

1000

1500

Row

s0 500 1000 1500

Columns

Ben Jann (University of Bern) heatplot London, 05.09.2019 48

1 Introduction

2 Syntax of heatplot and hexplot

3 ExamplesBivariate histogramTrivariate distributionsDisplay values as marker labelsCorrelation matrixDissimilarity matrixSpacial weights matrix

4 Installation

Ben Jann (University of Bern) heatplot London, 05.09.2019 49

Installation

To install heatplot (and hexplot) type

. ssc install heatplot, replace

heatplot depends on the palettes package, which itself dependson the ColrSpace Mata library, so you may also want to type

. ssc install palettes, replace

. ssc install colrspace, replace

Ben Jann (University of Bern) heatplot London, 05.09.2019 50

top related