ggplot2 - fabrice rossiapiacoa.org/publications/teaching/r/ggplot.pdf · size=0.5)+...

122
Ggplot2 Fabrice Rossi CEREMADE Université Paris Dauphine 2020

Upload: others

Post on 24-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Ggplot2

Fabrice Rossi

CEREMADEUniversité Paris Dauphine

2020

Page 2: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Outline

Introduction

Core principles

A tour of ggplot2

Extensions and other packages

2

Page 3: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Ggplot2

https://ggplot2.tidyverse.org/

I ggplot2 is a graphic system for RI ggplot2 takes a declarative approach to graphicsI current standard for data science in R

divorced married single

30 50 70 90 30 50 70 90 30 50 70 90

0

200

400

age

coun

t

educationprimarysecondarytertiaryunknown

3

Page 4: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Ggplot2

ProsI explain what you want not

how to do it (declarative)I high quality defaults (e.g. for

colors)I easy conditional analysisI consistent presentationI included best practices

ConsI rather steep learning curve

(new logic compared tostandard R plot)

I data science oriented (needsa data frame)

I difficult to customize in somecircumstances

I no interactivity

4

Page 5: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Principles

In ggplot, a plot is composed ofI a data setI a mapping from some variables to aesthetics (graphical primitives)I layers that compute summaries of the data (stats)I layers that represent with graphical objects the data (geom)I scales that transform values in the data space into values in the

aestheticsI a coordinate systemI a faceting specification for conditional analysisI a theme which specifies details such as fonts and colormap

5

Page 6: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

I data: age, education leveland marital status

I mappings:I x axis: ageI fill color: education level

I geom: histogramI facet: conditionally on marital

statusI theme: lot of tweaking...

divorced married single

30 50 70 90 30 50 70 90 30 50 70 90

0

200

400

age

coun

teducation primary secondary tertiary unknown

6

Page 7: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Outline

Introduction

Core principles

A tour of ggplot2

Extensions and other packages

7

Page 8: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data set

Bank marketing data setI Bank+MarketingI direct marketing (phone call)I 17 variablesI target variable: success of

the offerI 3 groups of variables:

I description of the client(age, education, etc.)

I banking status (balance,loan, etc.)

I previous contact with thisclient

AnalysisI marketing oriented questions

I success/failurecharacterization

I effect of past contactsI more general questions

I relationship betweenvariables

I characterization of thecustomers

8

Page 9: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Minimal graphic

Simple graphic in ggplot2I a data setI a mapping from the variables

to the aestheticsI a geom layer

Bank exampleI data: bankI mapping:

I age to x axisI balance to y axis

I layer: points (scatter plot)

Exampleggplot(bank, aes(x = age,

y = balance)) + geom_point()

0

20000

40000

60000

30 50 70 90age

bala

nce

9

Page 10: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

General form

Typical formggplot(data, aes(mapping)) + layer

I (almost) always starts withggplot to set the data andthe mappings

I then followed by one orseveral calls to other ggplot2functions

I at least onegeom_something layer todraw someting

Exampleggplot(bank, aes(x = age)) +

geom_histogram(binwidth = 5)

0

250

500

750

30 50 70 90age

coun

t

10

Page 11: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Parameters

Geom layersI each geom_something

has specific parametersI reasonable defaults in

many casesI aesthetics can be

specified as parameters

Example

ggplot(bank, aes(age,balance)) +geom_point()

0

20000

40000

60000

30 50 70 90age

bala

nce

11

Page 12: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Parameters

Geom layersI each geom_something

has specific parametersI reasonable defaults in

many casesI aesthetics can be

specified as parameters

Example

ggplot(bank, aes(age,balance)) +geom_point(alpha=0.3)

0

20000

40000

60000

30 50 70 90age

bala

nce

11

Page 13: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Parameters

Geom layersI each geom_something

has specific parametersI reasonable defaults in

many casesI aesthetics can be

specified as parameters

Example

ggplot(bank, aes(age,balance)) +geom_point(alpha=0.3,color="red")

0

20000

40000

60000

30 50 70 90age

bala

nce

11

Page 14: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Mapping many variables

ggplot(bank, aes(x=age, y=balance)) +geom_point(alpha=0.3)

0

20000

40000

60000

30 50 70 90age

bala

nce

12

Page 15: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Mapping many variables

ggplot(bank, aes(x=age, y=balance, color=marital)) +geom_point(alpha=0.3)

0

20000

40000

60000

30 50 70 90age

bala

nce marital

divorcedmarriedsingle

12

Page 16: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Mapping many variables

ggplot(bank, aes(x=age, y=balance, color=marital, shape=education)) +geom_point(alpha=0.3)

0

20000

40000

60000

30 50 70 90age

bala

nce

maritaldivorcedmarriedsingle

educationprimarysecondarytertiaryunknown

12

Page 17: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Facetting

Aesthetic limitationsI only a small number of

graphical attributes can beused in a given graphic

I encoding quality differs fromone attribute to another

I visual clutter

Small multiplesI combine several graphsI keep each graph simple

0

20000

40000

60000

30 50 70 90

age

bala

nce

marital divorced married single

13

Page 18: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Facetting

Aesthetic limitationsI only a small number of

graphical attributes can beused in a given graphic

I encoding quality differs fromone attribute to another

I visual clutter

Small multiplesI combine several graphsI keep each graph simple

divorced married single

30 50 70 90 30 50 70 90 30 50 70 90

0

20000

40000

60000

age

bala

nce

ggplot(bank, aes(x=age, y=balance)) +geom_point(alpha=0.3,size=0.5) +

facet_wrap(~marital)

13

Page 19: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Facetting

facet_wrap

I conditioning on a singlevariable

I facet_warp(~variable)

I one panel per value ofvariable

facet_grid

I conditioning on one or twovariables

I facet_grid(.~x):horizontal

I facet_grid(y~.): verticalI facet_grid(y~x): as a

grid

14

Page 20: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

ggplot(bank, aes(age, balance)) + geom_point(alpha = 0.3,size = 0.5) + facet_grid(education ~ .)

primary

secondarytertiary

unknown

30 50 70 90

0

20000

40000

60000

0

20000

40000

60000

0

20000

40000

60000

0

20000

40000

60000

age

bala

nce

15

Page 21: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

ggplot(bank, aes(age, balance)) + geom_point(alpha = 0.3,size = 0.5) + facet_grid(~marital)

divorced married single

30 50 70 90 30 50 70 90 30 50 70 90

0

20000

40000

60000

age

bala

nce

15

Page 22: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

ggplot(bank, aes(age, balance)) + geom_point(alpha = 0.3,size = 0.5) + facet_grid(education ~ marital)

divorced married single

primary

secondarytertiary

unknown

30 50 70 90 30 50 70 90 30 50 70 90

0200004000060000

0200004000060000

0200004000060000

0200004000060000

age

bala

nce

15

Page 23: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

ggplot(bank, aes(age, balance, color = y)) + geom_point(alpha = 0.3,size = 0.5) + facet_grid(education ~ marital)

divorced married single

primary

secondarytertiary

unknown

30 50 70 90 30 50 70 90 30 50 70 90

0200004000060000

0200004000060000

0200004000060000

0200004000060000

age

bala

nce y

noyes

15

Page 24: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Outline

Introduction

Core principles

A tour of ggplot2

Extensions and other packages

16

Page 25: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bar Graph

Discrete variable distributionI always use a bar graph

(a.k.a. bar plot)I never a pie chartI see e.g. Save the Pies of

Dessert by Stephen Few

geom_bar

I x aesthetics: the discretevariable

I fill aesthetics: color of thebars

I width parameter: width ofthe bars

ggplot(bank, aes(x = marital)) +geom_bar(width = 0.8)

0

1000

2000

divorced married singlemarital

coun

t

17

Page 26: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bar Graph

Discrete variable distributionI always use a bar graph

(a.k.a. bar plot)I never a pie chartI see e.g. Save the Pies of

Dessert by Stephen Few

geom_bar

I x aesthetics: the discretevariable

I fill aesthetics: color of thebars

I width parameter: width ofthe bars

ggplot(bank, aes(x = marital,fill = marital)) +geom_bar(width = 0.8)

0

1000

2000

divorced married singlemarital

coun

t maritaldivorcedmarriedsingle

17

Page 27: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bar Graph

Discrete variable distributionI always use a bar graph

(a.k.a. bar plot)I never a pie chartI see e.g. Save the Pies of

Dessert by Stephen Few

geom_bar

I x aesthetics: the discretevariable

I fill aesthetics: color of thebars

I width parameter: width ofthe bars

ggplot(bank, aes(x = marital,fill = marital)) +geom_bar(width = 0.8) +theme(legend.position = "none")

0

1000

2000

divorced married singlemarital

coun

t

17

Page 28: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bank Data

ggplot(bank, aes(x = y)) +geom_bar()

0

1000

2000

3000

4000

no yesy

coun

t

ggplot(bank,aes(x=y, fill=marital)) +geom_bar() +theme(legend.position="bottom")

0

1000

2000

3000

4000

no yesy

coun

t

marital divorced married single

18

Page 29: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bank Data

ggplot(bank, aes(x = marital)) +geom_bar()

0

1000

2000

divorced married singlemarital

coun

t

ggplot(bank,aes(x=marital, fill=y)) +geom_bar() +theme(legend.position="bottom")

0

1000

2000

divorced married singlemarital

coun

t

y no yes

19

Page 30: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bank Data

ggplot(bank, aes(x = y)) +geom_bar() + facet_grid(education ~marital)

divorced married single

primary

secondarytertiary

unknown

no yes no yes no yes

0

500

1000

0

500

1000

0

500

1000

0

500

1000

y

coun

t

20

Page 31: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bank Data

ggplot(bank, aes(x = education,fill = marital)) +geom_bar() + facet_wrap(~y)

no yes

primarysecondary tertiary unknown primarysecondary tertiary unknown

0

500

1000

1500

2000

education

coun

t maritaldivorcedmarriedsingle

21

Page 32: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bank Data

ggplot(bank, aes(x = marital,fill = education)) +geom_bar() + facet_wrap(~y)

no yes

divorced married single divorced married single

0

500

1000

1500

2000

2500

marital

coun

t

educationprimarysecondarytertiaryunknown

22

Page 33: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Variations on fill

Stacking behaviorI several values for a given x

induced by fillI stacked by default

AlternativesI side by side

position="dodge"

I stacked with normalizationposition="fill"

ggplot(bank,aes(x=marital, fill=y)) +geom_bar() +theme(legend.position="bottom")

0

1000

2000

divorced married singlemarital

coun

t

y no yes

23

Page 34: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Variations on fill

Stacking behaviorI several values for a given x

induced by fillI stacked by default

AlternativesI side by side

position="dodge"

I stacked with normalizationposition="fill"

ggplot(bank,aes(x=marital, fill=y)) +geom_bar(position="dodge") +theme(legend.position="bottom")

0

500

1000

1500

2000

2500

divorced married singlemarital

coun

t

y no yes

23

Page 35: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Variations on fill

Stacking behaviorI several values for a given x

induced by fillI stacked by default

AlternativesI side by side

position="dodge"

I stacked with normalizationposition="fill"

ggplot(bank,aes(x=marital, fill=y)) +geom_bar(position="dodge2") +theme(legend.position="bottom")

0

500

1000

1500

2000

2500

divorced married singlemarital

coun

t

y no yes

23

Page 36: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Variations on fill

Stacking behaviorI several values for a given x

induced by fillI stacked by default

AlternativesI side by side

position="dodge"

I stacked with normalizationposition="fill"

ggplot(bank,aes(x=marital, fill=y)) +geom_bar(position="fill") +theme(legend.position="bottom")

0.00

0.25

0.50

0.75

1.00

divorced married singlemarital

coun

t

y no yes

23

Page 37: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Statistical Transformations

Count?I a bar chart display a count

variableI not available in the original

dataI implicit transformation

Stat layerI calculation layersI with a default geom layer

(and vice versa)

Bar chartI geom_bar and

stat_count

I stat_count computescount and prop

I prop useful in somecontexts

Computed variablesI accessible for aestheticsI ..name.. syntax

24

Page 38: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bank Data

ggplot(bank,aes(x=marital)) +geom_bar() +facet_wrap(~y)

no yes

divorced married single divorced married single

0

500

1000

1500

2000

2500

marital

coun

t

25

Page 39: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bank Data

ggplot(bank,aes(x=marital)) +geom_bar(mapping=aes(y=..prop.., group=1)) +facet_wrap(~y)

no yes

divorced married single divorced married single

0.0

0.2

0.4

0.6

marital

prop

26

Page 40: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bank Data

ggplot(bank, aes(x = y)) + geom_bar() + facet_grid(education ~marital)

divorced married singleprim

arysecondary

tertiaryunknow

n

no yes no yes no yes

0

500

1000

0

500

1000

0

500

1000

0

500

1000

y

coun

t

27

Page 41: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Bank Data

ggplot(bank, aes(x = y)) + geom_bar(mapping = aes(y = ..prop..,group = 1)) + facet_grid(education ~ marital)

divorced married singleprim

arysecondary

tertiaryunknow

n

no yes no yes no yes

0.000.250.500.751.00

0.000.250.500.751.00

0.000.250.500.751.00

0.000.250.500.751.00

y

prop

28

Page 42: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

DistributionI a bar chart represents an

estimation of the probabilitydistribution of a discretevariable

I equivalent for continuousvariable?

HistogramI quantify a continuous

variable into a discrete oneI display the result in a way

vaguely similar to a bar chart

ggplot(bank, aes(age)) +geom_histogram(binwidth = 2)

0

100

200

300

400

30 50 70 90age

coun

t

I geom_histogram

I stat_bin

29

Page 43: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

DistributionI a bar chart represents an

estimation of the probabilitydistribution of a discretevariable

I equivalent for continuousvariable?

HistogramI quantify a continuous

variable into a discrete oneI display the result in a way

vaguely similar to a bar chart

ggplot(bank, aes(age, fill = y)) +geom_histogram(binwidth = 2)

0

100

200

300

400

30 50 70 90age

coun

t ynoyes

I geom_histogram

I stat_bin

29

Page 44: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

AlternativeI similar binning strategyI line based representationI same stat layer, different

geom layer

Different conditioningI no stackingI multiple curves

ggplot(bank, aes(age)) +geom_histogram(binwidth = 2)

0

100

200

300

400

30 50 70 90age

coun

t

30

Page 45: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

AlternativeI similar binning strategyI line based representationI same stat layer, different

geom layer

Different conditioningI no stackingI multiple curves

ggplot(bank, aes(age)) +geom_freqpoly(binwidth = 2)

0

100

200

300

400

25 50 75age

coun

t

30

Page 46: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

AlternativeI similar binning strategyI line based representationI same stat layer, different

geom layer

Different conditioningI no stackingI multiple curves

ggplot(bank, aes(age, color = y)) +geom_freqpoly(binwidth = 2)

0

100

200

300

25 50 75age

coun

t ynoyes

30

Page 47: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

AlternativeI similar binning strategyI line based representationI same stat layer, different

geom layer

Different conditioningI no stackingI multiple curves

ggplot(bank, aes(age, fill = y)) +geom_histogram(binwidth = 2)

0

100

200

300

400

30 50 70 90age

coun

t ynoyes

30

Page 48: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data Analysis Intermezzo

Age dependencyDoes the positive answer ratio grow with age?

ggplot(bank, aes(x = age >=19, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

TRUEage >= 19

coun

t ynoyes

ggplot(bank, aes(x = age >=25, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

FALSE TRUEage >= 25

coun

t ynoyes

31

Page 49: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data Analysis Intermezzo

Age dependencyDoes the positive answer ratio grow with age?

ggplot(bank, aes(x = age >=19, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

TRUEage >= 19

coun

t ynoyes

ggplot(bank, aes(x = age >=30, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

FALSE TRUEage >= 30

coun

t ynoyes

31

Page 50: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data Analysis Intermezzo

Age dependencyDoes the positive answer ratio grow with age?

ggplot(bank, aes(x = age >=19, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

TRUEage >= 19

coun

t ynoyes

ggplot(bank, aes(x = age >=40, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

FALSE TRUEage >= 40

coun

t ynoyes

31

Page 51: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data Analysis Intermezzo

Age dependencyDoes the positive answer ratio grow with age?

ggplot(bank, aes(x = age >=19, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

TRUEage >= 19

coun

t ynoyes

ggplot(bank, aes(x = age >=50, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

FALSE TRUEage >= 50

coun

t ynoyes

31

Page 52: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data Analysis Intermezzo

Age dependencyDoes the positive answer ratio grow with age?

ggplot(bank, aes(x = age >=19, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

TRUEage >= 19

coun

t ynoyes

ggplot(bank, aes(x = age >=55, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

FALSE TRUEage >= 55

coun

t ynoyes

31

Page 53: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data Analysis Intermezzo

Age dependencyDoes the positive answer ratio grow with age?

ggplot(bank, aes(x = age >=19, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

TRUEage >= 19

coun

t ynoyes

ggplot(bank, aes(x = age >=60, fill = y)) +geom_bar(position = "fill")

0.00

0.25

0.50

0.75

1.00

FALSE TRUEage >= 60

coun

t ynoyes

31

Page 54: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data Analysis Intermezzo

Age categoriesbank <- bank %>%

mutate(agegroup=18+2*cut(age,breaks=seq(19,91,by=2),labels=FALSE,include.lowest=TRUE))

bank %>% print(n=5,width=35)## # A tibble: 4,521 x 18## age job marital## <dbl> <chr> <chr>## 1 30 unem~ married## 2 33 serv~ married## 3 35 mana~ single## 4 30 mana~ married## 5 59 blue~ married## education default balance## <chr> <chr> <dbl>## 1 primary no 1787## 2 secondary no 4789## 3 tertiary no 1350## 4 tertiary no 1476## 5 secondary no 0## # ... with 4,516 more rows,## # and 12 more variables:## # housing <chr>,## # loan <chr>,## # contact <chr>, day <dbl>,## # month <chr>,## # duration <dbl>,## # campaign <dbl>,## # pdays <dbl>,## # previous <dbl>,## # poutcome <chr>, y <chr>,## # agegroup <dbl>

Yes frequenciesbankbyage <- bank %>% group_by(agegroup,y) %>%

summarise(n=n()) %>% group_by(agegroup) %>%mutate(freq=n/sum(n)) %>% filter(y=="yes")

bankbyage %>% print(n=6,width=30)## # A tibble: 34 x 4## # Groups: agegroup [34]## agegroup y n freq## <dbl> <chr> <int> <dbl>## 1 20 yes 4 0.286## 2 22 yes 5 0.172## 3 24 yes 14 0.206## 4 26 yes 21 0.123## 5 28 yes 30 0.15## 6 30 yes 32 0.0917## # ... with 28 more rows

32

Page 55: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data Analysis Intermezzoggplot(bank,aes(y)) + geom_bar() + facet_wrap(~agegroup,nrow=5) +

theme(text=element_text(size=5),strip.text=element_text(size=4,margin=margin(0.01,0.01,0.01,0.01,"cm")))

76 78 80 82 84 86

62 64 66 68 70 72 74

48 50 52 54 56 58 60

34 36 38 40 42 44 46

20 22 24 26 28 30 32

no yes no yes no yes no yes no yes no yes

no yes

0

100

200

300

0

100

200

300

0

100

200

300

0

100

200

300

0

100

200

300

y

coun

t

33

Page 56: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data Analysis Intermezzoggplot(bank,aes(y)) + geom_bar(aes(y=..prop..,group=1)) + facet_wrap(~agegroup,nrow=5) +

theme(text=element_text(size=5),strip.text=element_text(size=4,margin=margin(0.01,0.01,0.01,0.01,"cm")))

76 78 80 82 84 86

62 64 66 68 70 72 74

48 50 52 54 56 58 60

34 36 38 40 42 44 46

20 22 24 26 28 30 32

no yes no yes no yes no yes no yes no yes

no yes

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

y

prop

34

Page 57: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data Analysis Intermezzoggplot(bankbyage,aes(x=agegroup,y=freq)) + geom_point() +

geom_smooth() + labs(x="age",y="success rate")

0.00

0.25

0.50

0.75

1.00

20 40 60 80age

succ

ess

rate

35

Page 58: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Data Analysis Intermezzo

Typical data analysis sequence

1. display the target variable and another variable2. spot a potential dependency3. use other visualizations to confirm/reject the dependency4. compute new variables (in general aggregated ones)5. use again visualizations to explore the dependency6. rinse and repeat

36

Page 59: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

Density estimationI similar goal: probability

estimationI continuous random variable

version: densityI smooth estimate (binning

replaced by smoothing)I geom_density (with stat)

Different conditioningI multiple curvesI curve by curve normalization

ggplot(bank,aes(age)) +geom_freqpoly(binwidth = 2)

0

100

200

300

400

25 50 75age

coun

t

37

Page 60: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

Density estimationI similar goal: probability

estimationI continuous random variable

version: densityI smooth estimate (binning

replaced by smoothing)I geom_density (with stat)

Different conditioningI multiple curvesI curve by curve normalization

ggplot(bank, aes(age)) +geom_density()

0.00

0.01

0.02

0.03

0.04

30 50 70 90age

dens

ity

37

Page 61: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

Density estimationI similar goal: probability

estimationI continuous random variable

version: densityI smooth estimate (binning

replaced by smoothing)I geom_density (with stat)

Different conditioningI multiple curvesI curve by curve normalization

ggplot(bank, aes(age, color = y)) +geom_density()

0.00

0.01

0.02

0.03

0.04

30 50 70 90age

dens

ity ynoyes

37

Page 62: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

Density estimationI similar goal: probability

estimationI continuous random variable

version: densityI smooth estimate (binning

replaced by smoothing)I geom_density (with stat)

Different conditioningI multiple curvesI curve by curve normalization

ggplot(bank, aes(age, color = y)) +geom_freqpoly(binwidth = 2)

0

100

200

300

25 50 75age

coun

t ynoyes

37

Page 63: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

Density estimationI similar goal: probability

estimationI continuous random variable

version: densityI smooth estimate (binning

replaced by smoothing)I geom_density (with stat)

Different conditioningI multiple curvesI curve by curve normalization

ggplot(bank, aes(age, color = y)) +geom_freqpoly(aes(y=..density..),

binwidth = 2)

0.00

0.01

0.02

0.03

0.04

25 50 75age

dens

ity ynoyes

37

Page 64: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

Tail effectsI a small number of extreme

valuesI most graphs are barely

readable

ggplot(bank,aes(balance)) +geom_density()

0.0000

0.0002

0.0004

0.0006

0 20000 40000 60000balance

dens

ity

38

Page 65: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

Tail effectsI a small number of extreme

valuesI most graphs are barely

readable

FilteringI get quantiles

bank %>%summarise(quantile(balance,probs=0.95))

## # A tibble: 1 x## # 1## `quantile(bal~## <dbl>## 1 6102

bank %>%summarise(quantile(balance,probs=0.99))

## # A tibble: 1 x## # 1## `quantile(bal~## <dbl>## 1 14195.

ggplot(bank,aes(balance)) +geom_density()

0.0000

0.0002

0.0004

0.0006

0 20000 40000 60000balance

dens

ity

38

Page 66: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

Tail effectsI a small number of extreme

valuesI most graphs are barely

readable

FilteringI filtering

bank2 <- bank %>% filter(balance<=14195)

ggplot(bank2,aes(balance)) +geom_density()

0.0000

0.0002

0.0004

0.0006

0.0008

0 5000 10000balance

dens

ity

38

Page 67: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

Tail effectsI a small number of extreme

valuesI most graphs are barely

readable

FilteringI filtering

bank2 <- bank %>% filter(balance<=14195)

I filtering morebank3 <- bank %>% filter(balance<=6102)

ggplot(bank3,aes(balance)) +geom_density()

0.00000

0.00025

0.00050

0.00075

−2000 0 2000 4000 6000balance

dens

ity

38

Page 68: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

FilteringI extremely important to

make graph readableI usefulness increases with

the complexity of the graphI nonlinear transformations

can help also (e.g.logarithm)

ggplot(bank,aes(balance,color=y)) +geom_density()

0.0000

0.0002

0.0004

0.0006

0.0008

0 20000 40000 60000balance

dens

ity ynoyes

39

Page 69: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

FilteringI extremely important to

make graph readableI usefulness increases with

the complexity of the graphI nonlinear transformations

can help also (e.g.logarithm)

ggplot(bank2,aes(balance,color=y)) +geom_density()

0.00000

0.00025

0.00050

0.00075

0 5000 10000balance

dens

ity ynoyes

39

Page 70: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

bank %>% ggplot(aes(balance)) + geom_density() + facet_grid(marital~education)

primary secondary tertiary unknown

divorcedm

arriedsingle

0 200004000060000 0 200004000060000 0 200004000060000 0 200004000060000

0.0000

0.0002

0.0004

0.0006

0.0008

0.0000

0.0002

0.0004

0.0006

0.0008

0.0000

0.0002

0.0004

0.0006

0.0008

balance

dens

ity

40

Page 71: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

bank2 %>% ggplot(aes(balance)) + geom_density() + facet_grid(marital~education)

primary secondary tertiary unknown

divorcedm

arriedsingle

0 5000 10000 0 5000 10000 0 5000 10000 0 5000 10000

0.0000

0.0002

0.0004

0.0006

0.0008

0.0000

0.0002

0.0004

0.0006

0.0008

0.0000

0.0002

0.0004

0.0006

0.0008

balance

dens

ity

40

Page 72: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

bank %>% ggplot(aes(balance,color=y)) + geom_density() + facet_grid(marital~education)

primary secondary tertiary unknowndivorced

married

single

0 200004000060000 0 200004000060000 0 200004000060000 0 200004000060000

0.0000

0.0005

0.0010

0.0015

0.0000

0.0005

0.0010

0.0015

0.0000

0.0005

0.0010

0.0015

balance

dens

ity ynoyes

41

Page 73: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

bank2 %>% ggplot(aes(balance,color=y)) + geom_density() + facet_grid(marital~education)

primary secondary tertiary unknowndivorced

married

single

0 500010000 0 500010000 0 500010000 0 500010000

0.0000

0.0005

0.0010

0.0015

0.0000

0.0005

0.0010

0.0015

0.0000

0.0005

0.0010

0.0015

balance

dens

ity ynoyes

41

Page 74: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

More filtering

bank %>% group_by(marital, education) %>% summarise(nb = n()) %>%spread(marital, nb)

education divorced married single1 primary 79 526 732 secondary 270 1427 6093 tertiary 155 727 4684 unknown 24 117 46

42

Page 75: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

More filtering

bank %>% filter(balance <= 14195) %>% group_by(marital, education) %>%summarise(nb = n()) %>% spread(marital, nb)

education divorced married single1 primary 79 523 712 secondary 270 1413 6083 tertiary 154 713 4604 unknown 24 114 46

42

Page 76: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

bank %>% filter(balance<=14195,education!="unknown") %>%ggplot(aes(balance,color=y)) + geom_density() + facet_grid(marital~education)

primary secondary tertiary

divorcedm

arriedsingle

0 5000 10000 0 5000 10000 0 5000 10000

0.0000

0.0002

0.0004

0.0006

0.0008

0.0000

0.0002

0.0004

0.0006

0.0008

0.0000

0.0002

0.0004

0.0006

0.0008

balance

dens

ity ynoyes

43

Page 77: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

Box plotsI density estimation and

histogram are somewhatcomplex

I displaying many of them onone graphics isoverwhelming

I simpler solution: box plotsI geom_boxplot

(stat_boxplot)

ggplot(bank,aes(marital,age)) +geom_boxplot(outlier.size=0.5,size=0.1,

outlier.alpha=0.5)

30

50

70

90

divorced married singlemarital

age

44

Page 78: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

ggplot(bank,aes(marital,age,fill=education)) +geom_boxplot(outlier.size=0.5,size=0.1,outlier.alpha=0.5)

30

50

70

90

divorced married singlemarital

age

educationprimarysecondarytertiaryunknown

45

Page 79: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

ggplot(bank,aes(job,age)) + geom_boxplot(outlier.size=0.5,size=0.1,outlier.alpha=0.5) +theme(text=element_text(size=5),axis.text.x = element_text(angle = 270, hjust = 1))

30

50

70

90

admin.

blue−collar

entrepreneur

housemaid

managem

ent

retired

self−em

ployed

services

student

technician

unemployed

unknown

job

age

46

Page 80: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

ggplot(bank,aes(job,balance)) + geom_boxplot(outlier.size=0.5,size=0.1,outlier.alpha=0.5) +theme(text=element_text(size=5),axis.text.x = element_text(angle = 270, hjust = 1))

0

20000

40000

60000

admin.

blue−collar

entrepreneur

housemaid

managem

ent

retired

self−em

ployed

services

student

technician

unemployed

unknown

job

bala

nce

47

Page 81: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Examplebank %>% filter(balance<=14195) %>%

ggplot(aes(job,balance)) + geom_boxplot(outlier.size=0.5,size=0.1,outlier.alpha=0.5) +theme(text=element_text(size=5),axis.text.x = element_text(angle = 270, hjust = 1))

0

5000

10000

admin.

blue−collar

entrepreneur

housemaid

managem

ent

retired

self−em

ployed

services

student

technician

unemployed

unknown

job

bala

nce

48

Page 82: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Examplebank %>% filter(balance<=14195) %>%

ggplot(aes(job,balance,fill=education)) +geom_boxplot(outlier.size=0.5,size=0.1,outlier.alpha=0.5) +theme(text=element_text(size=5),axis.text.x = element_text(angle = 270, hjust = 1))

0

5000

10000

admin.

blue−collar

entrepreneur

housemaid

managem

ent

retired

self−em

ployed

services

student

technician

unemployed

unknown

job

bala

nce

education

primary

secondary

tertiary

unknown

49

Page 83: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Examplebank %>% filter(balance<=14195) %>%

ggplot(aes(job,balance,fill=education)) +geom_boxplot(outlier.size=0.1,size=0.1,outlier.alpha=0.5,varwidth=TRUE) +theme(text=element_text(size=5), axis.text.x = element_text(angle = 270, hjust = 1))

0

5000

10000

admin.

blue−collar

entrepreneur

housemaid

managem

ent

retired

self−em

ployed

services

student

technician

unemployed

unknown

job

bala

nce

education

primary

secondary

tertiary

unknown

50

Page 84: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Summarizing continuous variables

Violin plotsI compromise between box

plots and density plotsI conditional approach as box

plotI density visualization as

density plotI geom_violin

(stat_ydensity)

ggplot(bank,aes(marital,age)) +geom_violin(size=0.1)

30

50

70

90

divorced married singlemarital

age

51

Page 85: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Comparison

ggplot(bank,aes(marital,age)) +geom_boxplot(outlier.size=0.5,size=0.1,

outlier.alpha=0.5)

30

50

70

90

divorced married singlemarital

age

ggplot(bank,aes(marital,age)) +geom_violin(size=0.1)

30

50

70

90

divorced married singlemarital

age

52

Page 86: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Comparison

ggplot(bank,aes(marital,age,fill=education)) +geom_boxplot(outlier.size=0.5,size=0.1,

outlier.alpha=0.5) +theme(legend.position="bottom") +guides(fill=guide_legend(nrow=2,byrow=TRUE))

30

50

70

90

divorced married singlemarital

age

educationprimary secondary

tertiary unknown

ggplot(bank,aes(marital,age,fill=education)) +geom_violin(size=0.1) +theme(legend.position="bottom") +guides(fill=guide_legend(nrow=2,byrow=TRUE))

30

50

70

90

divorced married singlemarital

age

educationprimary secondary

tertiary unknown

53

Page 87: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Comparisonbank %>% filter(balance<=14195) %>%

ggplot(aes(job,balance,fill=education)) +geom_boxplot(outlier.size=0.5,size=0.1,outlier.alpha=0.5) +theme(text=element_text(size=5),axis.text.x = element_text(angle = 270, hjust = 1))

0

5000

10000

admin.

blue−collar

entrepreneur

housemaid

managem

ent

retired

self−em

ployed

services

student

technician

unemployed

unknown

job

bala

nce

education

primary

secondary

tertiary

unknown

54

Page 88: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Comparisonbank %>% filter(balance<=14195) %>%

ggplot(aes(job,balance,fill=education)) +geom_violin(size=0.1) +theme(text=element_text(size=5),axis.text.x = element_text(angle = 270, hjust = 1))

0

5000

10000

admin.

blue−collar

entrepreneur

housemaid

managem

ent

retired

self−em

ployed

services

student

technician

unemployed

unknown

job

bala

nce

education

primary

secondary

tertiary

unknown

55

Page 89: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Temporal data

Time seriesI many data have a temporal componentI temporal data are easily represented by continuous linesI must be in an adapted format for ggplot

I ggplot operates on variables in the data setI one variable has to be the time mapped to the x axisI any other variable can be time evolving quantity

Example: air quality data setI Air+QualityI hourly average of reading from chemical sensorsI hourly average of gas concentrations

56

Page 90: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

airquality## # A tibble: 9,357 x 15## Date Time `CO(GT)` `PT08.S1(CO)` `NMHC(GT)` `C6H6(GT)`## <dttm> <dttm> <dbl> <dbl> <dbl> <dbl>## 1 2004-03-10 00:00:00 1899-12-31 18:00:00 2.6 1360 150 11.9## 2 2004-03-10 00:00:00 1899-12-31 19:00:00 2 1292. 112 9.40## 3 2004-03-10 00:00:00 1899-12-31 20:00:00 2.2 1402 88 9.00## 4 2004-03-10 00:00:00 1899-12-31 21:00:00 2.2 1376. 80 9.23## 5 2004-03-10 00:00:00 1899-12-31 22:00:00 1.6 1272. 51 6.52## 6 2004-03-10 00:00:00 1899-12-31 23:00:00 1.2 1197 38 4.74## 7 2004-03-11 00:00:00 1899-12-31 00:00:00 1.2 1185 31 3.62## 8 2004-03-11 00:00:00 1899-12-31 01:00:00 1 1136. 31 3.33## 9 2004-03-11 00:00:00 1899-12-31 02:00:00 0.9 1094 24 2.34## 10 2004-03-11 00:00:00 1899-12-31 03:00:00 0.6 1010. 19 1.70## # ... with 9,347 more rows, and 9 more variables: `PT08.S2(NMHC)` <dbl>, `NOx(GT)` <dbl>,## # `PT08.S3(NOx)` <dbl>, `NO2(GT)` <dbl>, `PT08.S4(NO2)` <dbl>, `PT08.S5(O3)` <dbl>, T <dbl>,## # RH <dbl>, AH <dbl>

57

Page 91: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

ggplot(airquality, aes(Date, `CO(GT)`)) + geom_line(size = 0.02)

−200

−150

−100

−50

0

avril 2004 juil. 2004 oct. 2004 janv. 2005 avril 2005Date

CO

(GT

)

58

Page 92: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Missing data

OutliersI numerous occurrences of −200I impossible according to the data modelI poor encoding of missing values!

airquality <- airquality %>% mutate_all(~replace(., . == -200, NA))

59

Page 93: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

ggplot(airquality, aes(Date, `CO(GT)`)) + geom_line(size = 0.02)

0.0

2.5

5.0

7.5

10.0

avril 2004 juil. 2004 oct. 2004 janv. 2005 avril 2005Date

CO

(GT

)

60

Page 94: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Zooming

ggplot(airquality,aes(Date,`CO(GT)`))+geom_line(size=0.02)+coord_cartesian(xlim=as_datetime(c("2004-03-10"," 2004-04-10")))

0.0

2.5

5.0

7.5

10.0

mars 15 mars 22 mars 29 avril 05Date

CO

(GT

)

61

Page 95: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Encoding issues

airquality %>% select(Date, Time) %>% print(n = 5, n_extra = 0)## # A tibble: 9,357 x 2## Date Time## <dttm> <dttm>## 1 2004-03-10 00:00:00 1899-12-31 18:00:00## 2 2004-03-10 00:00:00 1899-12-31 19:00:00## 3 2004-03-10 00:00:00 1899-12-31 20:00:00## 4 2004-03-10 00:00:00 1899-12-31 21:00:00## 5 2004-03-10 00:00:00 1899-12-31 22:00:00## # ... with 9,352 more rows

airquality <- airquality %>% mutate(fulldate = Date + hour(Time) * 3600)airquality %>% select(Date, Time, fulldate) %>% print(n = 5, n_extra = 0)## # A tibble: 9,357 x 3## Date Time fulldate## <dttm> <dttm> <dttm>## 1 2004-03-10 00:00:00 1899-12-31 18:00:00 2004-03-10 18:00:00## 2 2004-03-10 00:00:00 1899-12-31 19:00:00 2004-03-10 19:00:00## 3 2004-03-10 00:00:00 1899-12-31 20:00:00 2004-03-10 20:00:00## 4 2004-03-10 00:00:00 1899-12-31 21:00:00 2004-03-10 21:00:00## 5 2004-03-10 00:00:00 1899-12-31 22:00:00 2004-03-10 22:00:00## # ... with 9,352 more rows

62

Page 96: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Encoding issues

airquality %>% select(Date, Time) %>% print(n = 5, n_extra = 0)## # A tibble: 9,357 x 2## Date Time## <dttm> <dttm>## 1 2004-03-10 00:00:00 1899-12-31 18:00:00## 2 2004-03-10 00:00:00 1899-12-31 19:00:00## 3 2004-03-10 00:00:00 1899-12-31 20:00:00## 4 2004-03-10 00:00:00 1899-12-31 21:00:00## 5 2004-03-10 00:00:00 1899-12-31 22:00:00## # ... with 9,352 more rows

airquality <- airquality %>% mutate(fulldate = Date + hour(Time) * 3600)airquality %>% select(Date, Time, fulldate) %>% print(n = 5, n_extra = 0)## # A tibble: 9,357 x 3## Date Time fulldate## <dttm> <dttm> <dttm>## 1 2004-03-10 00:00:00 1899-12-31 18:00:00 2004-03-10 18:00:00## 2 2004-03-10 00:00:00 1899-12-31 19:00:00 2004-03-10 19:00:00## 3 2004-03-10 00:00:00 1899-12-31 20:00:00 2004-03-10 20:00:00## 4 2004-03-10 00:00:00 1899-12-31 21:00:00 2004-03-10 21:00:00## 5 2004-03-10 00:00:00 1899-12-31 22:00:00 2004-03-10 22:00:00## # ... with 9,352 more rows

62

Page 97: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Corrected data

ggplot(airquality,aes(fulldate,`CO(GT)`))+geom_line(size=0.02)+coord_cartesian(xlim=as_datetime(c("2004-03-10"," 2004-04-10"))) +

labs(x="Date", y="CO")

0.0

2.5

5.0

7.5

10.0

mars 15 mars 22 mars 29 avril 05Date

CO

63

Page 98: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Corrected data

ggplot(airquality,aes(fulldate,`CO(GT)`))+geom_line(size=0.02) +labs(x="Date", y="CO")

0.0

2.5

5.0

7.5

10.0

avril 2004 juil. 2004 oct. 2004 janv. 2005 avril 2005Date

CO

64

Page 99: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Another series

ggplot(airquality,aes(fulldate,`PT08.S1(CO)`))+geom_line(size=0.02) +labs(x="Date", y="CO sensor")

1000

1500

2000

avril 2004 juil. 2004 oct. 2004 janv. 2005 avril 2005Date

CO

sen

sor

65

Page 100: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Trajectory

ggplot(airquality,aes(`CO(GT)`,`PT08.S1(CO)`))+geom_path(size=0.02, color="lightgrey") + geom_point(size=0.1) +labs(x="CO", y="CO sensor") + geom_smooth()

1000

1500

2000

0.0 2.5 5.0 7.5 10.0CO

CO

sen

sor

66

Page 101: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Trajectory with timeggplot(airquality,aes(`CO(GT)`,`PT08.S1(CO)`)) +

geom_path(size=0.02, color="lightgrey") +geom_point(size=0.1,aes(color=fulldate)) + labs(x="CO", y="CO sensor")

1000

1500

2000

0.0 2.5 5.0 7.5 10.0CO

CO

sen

sor

avril 2004juil. 2004oct. 2004janv. 2005avril 2005

fulldate

67

Page 102: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Multiple time series

GroupsI group aestheticsI used to group values in a single graphical object (e.g. a line)I useful to display complex entities on a single graphI typical use

I time series!I each time series is a groupI identified by a variable

68

Page 103: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

Wide to tallI air quality date are wideI entities: datesI gas point of view: a time series per gasI grouping on gases is impossible: one variable per gasI turn the data to a tall representation: an entity is a date and a gas

gases <- airquality %>% select(-Date, -Time, -AH, -RH,-T) %>% gather(Sensor, Measure, -fulldate) %>%rename(Date = fulldate)

gases %>% print(n = 5)## # A tibble: 93,570 x 3## Date Sensor Measure## <dttm> <chr> <dbl>## 1 2004-03-10 18:00:00 CO(GT) 2.6## 2 2004-03-10 19:00:00 CO(GT) 2## 3 2004-03-10 20:00:00 CO(GT) 2.2## 4 2004-03-10 21:00:00 CO(GT) 2.2## 5 2004-03-10 22:00:00 CO(GT) 1.6## # ... with 9.356e+04 more rows

69

Page 104: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Multiple series

ggplot(gases, aes(Date, Measure, group = Sensor, color = Sensor)) +geom_line(size = 0.02)

0

1000

2000

avril 2004 juil. 2004 oct. 2004 janv. 2005 avril 2005Date

Mea

sure

SensorC6H6(GT)CO(GT)NMHC(GT)NO2(GT)NOx(GT)PT08.S1(CO)PT08.S2(NMHC)PT08.S3(NOx)PT08.S4(NO2)PT08.S5(O3)

70

Page 105: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Multiple series

ggplot(gases, aes(Date, Measure, group = Sensor, color = Sensor)) +geom_line(size = 0.02)

0

1000

2000

avril 2004 juil. 2004 oct. 2004 janv. 2005 avril 2005Date

Mea

sure

SensorC6H6(GT)CO(GT)NMHC(GT)NO2(GT)NOx(GT)PT08.S1(CO)PT08.S2(NMHC)PT08.S3(NOx)PT08.S4(NO2)PT08.S5(O3)

71

Page 106: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Multiple seriesggplot(gases, aes(Date, Measure, group = Sensor, color = Sensor)) +

geom_line(size = 0.02) + coord_cartesian(xlim = as_datetime(c("2004-03-10"," 2004-04-10")))

0

1000

2000

mars 15 mars 22 mars 29 avril 05Date

Mea

sure

SensorC6H6(GT)CO(GT)NMHC(GT)NO2(GT)NOx(GT)PT08.S1(CO)PT08.S2(NMHC)PT08.S3(NOx)PT08.S4(NO2)PT08.S5(O3)

72

Page 107: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Multiple seriesggplot(gases.n, aes(Date, Measure, group = Sensor,

color = Sensor)) + geom_line(size = 0.02) + coord_cartesian(xlim = as_datetime(c("2004-03-10"," 2004-04-10")))

−2.5

0.0

2.5

5.0

7.5

mars 15 mars 22 mars 29 avril 05Date

Mea

sure

SensorC6H6(GT)CO(GT)NMHC(GT)NO2(GT)NOx(GT)PT08.S1(CO)PT08.S2(NMHC)PT08.S3(NOx)PT08.S4(NO2)PT08.S5(O3)

73

Page 108: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Multiple series

ggplot(gases.n, aes(Date, Measure, group = Sensor,color = Sensor)) + geom_line(size = 0.02)

−2.5

0.0

2.5

5.0

7.5

avril 2004 juil. 2004 oct. 2004 janv. 2005 avril 2005Date

Mea

sure

SensorC6H6(GT)CO(GT)NMHC(GT)NO2(GT)NOx(GT)PT08.S1(CO)PT08.S2(NMHC)PT08.S3(NOx)PT08.S4(NO2)PT08.S5(O3)

74

Page 109: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

Weekly sales dataI Sales_Transactions_Dataset_WeeklyI 811 productsI sales per week for each product

prodperweek %>% print(10, n_extra = 2)## # A tibble: 811 x 53## Product_Code W0 W1 W2 W3 W4 W5 W6 W7 W8 W9## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>## 1 P1 11 12 10 8 13 12 14 21 6 14## 2 P2 7 6 3 2 7 1 6 3 3 3## 3 P3 7 11 8 9 10 8 7 13 12 6## 4 P4 12 8 13 5 9 6 9 13 13 11## 5 P5 8 5 13 11 6 7 9 14 9 9## 6 P6 3 3 2 7 6 3 8 6 6 3## 7 P7 4 8 3 7 8 7 2 3 10 3## 8 P8 8 6 10 9 6 8 7 5 10 10## 9 P9 14 9 10 7 11 15 12 7 13 12## 10 P10 22 19 19 29 20 16 26 20 24 20## # ... with 801 more rows, and 42 more variables: W10 <dbl>, W11 <dbl>, ...

75

Page 110: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Outline

Introduction

Core principles

A tour of ggplot2

Extensions and other packages

76

Page 111: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Mosaic plots

PrinciplesI Mosaic plots display dependencies

between categorical variablesI Recursive splitting and conditional

analysis:I horizontal widths are proportional

to the first variable categoryfrequencies

I vertical heights are proportional tothe second variable categoryconditional frequencies

I can be used with more than 2variables (less readable)

I ggmosaic package for ggplot

primary

secondary

tertiary

unknown

divorced married singlex

y

marital divorced married single

77

Page 112: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Mosaic plots

ggplot(bank) + geom_mosaic(mapping=aes(x=product(education),fill=education))

0

0.25

0.5

0.75

1

primary secondary tertiary unknownx

y

educationprimarysecondarytertiaryunknown

78

Page 113: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Mosaic plots

ggplot(bank) + geom_mosaic(mapping=aes(x=product(marital,education),fill=education))

divorced

married

single

primary secondary tertiary unknownx

y

educationprimarysecondarytertiaryunknown

79

Page 114: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Mosaic plots

ggplot(bank) + geom_mosaic(mapping=aes(x=product(y,marital,education),fill=y)) +theme(axis.text.x = element_text(angle = -50, hjust = 1))

divorced

married

single

no:primary

yes:primary

no:secondary

yes:secondary

no:tertiary

yes:tertiaryno:unknown

yes:unknown

x

y

ynoyes

80

Page 115: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Alternative package

LimitationsI ggmosaic provides only illustrative plotsI it misses diagnostic plots

Categorical data analysisI specific methodsI specific visualization techniquesI vcd and vcdExtra packages

81

Page 116: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

With vcdstructable(marital~education,data=bank)## marital divorced married single## education## primary 79 526 73## secondary 270 1427 609## tertiary 155 727 468## unknown 24 117 46

mosaic(structable(marital~education,data=bank),

direction=c("v"))

education

mar

ital

primary

sing

lem

arrie

ddi

vorc

ed

secondary tertiary unknown

82

Page 117: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

With vcdmosaic(structable(marital~education,

data=bank),direction=c("v"),gp=shading_hcl)

ShadingI diagnostic plotI Pearson residuals of a

chi-squared testI divergent colormap:

I red: less than expectedunder independence

I blue: more than expectedunder independence

−7.9

−4.0

−2.0

0.0

2.0

4.0

5.9

Pearsonresiduals:

p−value =< 2.22e−16

education

mar

ital

primary

sing

lem

arrie

ddi

vorc

ed

secondary tertiary unknown

83

Page 118: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

With vcdmosaic(structable(y~education+marital,

data=bank),direction=c("v"),gp=shading_hcl)

ShadingI diagnostic plotI Pearson residuals of a

chi-squared testI divergent colormap:

I red: less than expectedunder independence

I blue: more than expectedunder independence

−7.5

−4.0

−2.0

0.0

2.0

4.0

5.9

Pearsonresiduals:

p−value =< 2.22e−16

education

ym

arita

l

primary

sing

le

no yes

mar

ried

divo

rced

secondary

no yes

tertiary

no yes

unknown

noyes

84

Page 119: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Example

−7.5

−4.0

−2.0

0.0

2.0

4.0

5.9

Pearsonresiduals:

p−value =< 2.22e−16

education

y

mar

ital

primarysi

ngle

no yes

mar

ried

divo

rced

secondary

no yes

tertiary

no yes

unknown

noyes

85

Page 120: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Licence

This work is licensed under a Creative CommonsAttribution-ShareAlike 4.0 International License.

http://creativecommons.org/licenses/by-sa/4.0/

86

Page 121: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Version

Last git commit: 2020-01-27By: Fabrice Rossi ([email protected])Git hash: 3aeca5ec2f6c6884d5584abc31bc2c55fa38022c

87

Page 122: Ggplot2 - Fabrice Rossiapiacoa.org/publications/teaching/R/ggplot.pdf · size=0.5)+ facet_grid(education~marital) divorced married single primary secondary tertiary unknown 30 50

Changelog

I October 2019: initial version

88