
Fairwashing in Machine Learning: The risk of rationalization in black-box explanation

Ulrich Aïvodji, Hiromi Arai, Olivier Fortineau, Sébastien Gambs, Satoshi Hara, Alain Tapp

UQAM



Motivations

ML models are becoming ubiquitous

High-stakes decision-making systems: medical diagnosis, criminal justice, finance

Demand for the design of ethically aligned AI

Europe: GDPR, right to an explanation

Montreal: Déclaration de Montréal pour un développement responsable de l'IA (Montreal Declaration for a Responsible Development of AI)

Interpretability by design

Data → decision tree

Black-box explanation, a.k.a. post-hoc explanation

DNN → decision tree

This work: We show that a dishonest producer of ML models can perform fairwashing

Fairwashing: promoting the false perception that an ML model complies with a given ethical requirement

Case study: fairness as the ethical requirement to “fairwash”


Objective

Raise awareness of fairwashing in machine learning: the risk that an unfair ML model can be explained in such a way that the underlying decisions seem fairer than they actually are


How?

Show that one can systematically find a fair interpretable model to rationalize the decisions of an unfair black-box model.



1 Background

2 Problem formulation

3 Fairwashing

4 Experiments

5 Conclusion & Perspectives



Metrics

Fairness: demographic parity

$$\left| P(y = 1 \mid s = 1) - P(y = 1 \mid s = 0) \right|$$

Fidelity

$$\mathrm{fidelity}(c) = \frac{1}{|X|} \sum_{x \in X} \mathbb{I}\big(c(x) = b(x)\big)$$
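The two metrics above can be sketched in a few lines of Python. This is a minimal illustration, assuming binary predictions and a binary sensitive attribute stored as plain lists; the function names and the toy data are hypothetical and are not taken from the authors' code.

```python
from typing import Sequence


def demographic_parity_gap(y_pred: Sequence[int], s: Sequence[int]) -> float:
    """Empirical |P(y = 1 | s = 1) - P(y = 1 | s = 0)|."""
    positive_rate = {}
    for group in (0, 1):
        preds = [y for y, g in zip(y_pred, s) if g == group]
        if not preds:
            raise ValueError(f"no instance with s = {group}")
        positive_rate[group] = sum(preds) / len(preds)
    return abs(positive_rate[1] - positive_rate[0])


def fidelity(c_pred: Sequence[int], b_pred: Sequence[int]) -> float:
    """Fraction of instances on which explanation c agrees with black box b."""
    if len(c_pred) != len(b_pred) or len(c_pred) == 0:
        raise ValueError("prediction vectors must be non-empty and aligned")
    agreements = sum(int(c == b) for c, b in zip(c_pred, b_pred))
    return agreements / len(c_pred)


# Toy data (hypothetical): 4 instances with s = 1, then 4 with s = 0.
s = [1, 1, 1, 1, 0, 0, 0, 0]
b_pred = [1, 1, 1, 0, 1, 0, 0, 0]  # black-box predictions
c_pred = [1, 1, 0, 0, 1, 0, 0, 0]  # predictions of an explanation model

print(demographic_parity_gap(b_pred, s))  # unfairness of the black box
print(demographic_parity_gap(c_pred, s))  # unfairness of the explanation
print(fidelity(c_pred, b_pred))           # agreement between the two
```

In this toy case the explanation disagrees with the black box on a single instance, yet its demographic parity gap is half that of the black box, which is the kind of discrepancy the talk is concerned with.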



Rule list

A rule list $d = (d_p, \delta_p, q_0, K)$ of length $K \geq 0$ is a $(K+1)$-tuple consisting of $K$ distinct association rules $r_k = p_k \to q_k$, where $p_k \in d_p$ is the antecedent of the association rule and $q_k \in \delta_p$ its corresponding consequent, followed by a default prediction $q_0$.

Example of rule list for salary prediction

IF occupation:white-collar THEN income:≥ 50k

ELSE IF occupation:professional THEN income:≥ 50k

ELSE IF education:bachelors THEN income:≥ 50k

ELSE income:< 50k
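The rule list above can be encoded as an ordered sequence of rules plus a default prediction. The sketch below is one possible encoding, assuming each antecedent tests a single categorical attribute; the class names `Rule` and `RuleList` and the string labels are illustrative choices, not part of CORELS.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Instance = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    attribute: str   # attribute tested by the antecedent p_k
    value: str       # value the attribute must take
    consequent: str  # prediction q_k when the antecedent holds

    def matches(self, x: Instance) -> bool:
        return x.get(self.attribute) == self.value


@dataclass(frozen=True)
class RuleList:
    rules: Tuple[Rule, ...]  # K ordered association rules
    default: str             # default prediction q_0

    def predict(self, x: Instance) -> str:
        # The first rule whose antecedent matches decides the prediction.
        for rule in self.rules:
            if rule.matches(x):
                return rule.consequent
        return self.default


salary_rules = RuleList(
    rules=(
        Rule("occupation", "white-collar", ">=50k"),
        Rule("occupation", "professional", ">=50k"),
        Rule("education", "bachelors", ">=50k"),
    ),
    default="<50k",
)

print(salary_rules.predict({"occupation": "service", "education": "bachelors"}))
print(salary_rules.predict({"occupation": "service", "education": "hs-grad"}))
```

Because the rules are evaluated in order, the position of a rule matters as much as its content, which is why the search space grows factorially with the number of candidate rules.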



Learning optimal rule lists

CORELS (Angelino et al., 2017)

Input: n categorical attributes + binary labels

Output: optimal rule list

Supervised learning algorithm

Represent the search space as an n-level trie

Objective function: $R(d, x, y) = \mathrm{misc}(d, x, y) + \lambda K$

Select the rule list that minimizes $R(d, x, y)$

Use an efficient branch-and-bound algorithm to prune the trie
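To make the regularized objective concrete, the sketch below evaluates R = misc + λK for every rule list up to a small length and keeps the minimizer. It deliberately uses plain exhaustive search as a stand-in: CORELS itself prunes the trie with branch-and-bound, which is not reproduced here. The helper names and the toy data are hypothetical.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

# A rule is (antecedent, consequent); the antecedent is a predicate on an instance.
Rule = Tuple[Callable[[dict], bool], int]
RuleList = Tuple[Tuple[Rule, ...], int]  # (ordered rules, default prediction q0)


def predict(d: RuleList, x: dict) -> int:
    rules, default = d
    for antecedent, consequent in rules:
        if antecedent(x):
            return consequent
    return default


def objective(d: RuleList, X: Sequence[dict], y: Sequence[int], lam: float) -> float:
    """R(d, x, y) = misc(d, x, y) + lam * K, with K the number of rules."""
    misc = sum(predict(d, x) != label for x, label in zip(X, y)) / len(X)
    return misc + lam * len(d[0])


def exhaustive_search(candidates, X, y, lam, max_len=2):
    """Brute-force minimizer of R (illustration only, not branch-and-bound)."""
    best, best_val = None, float("inf")
    for k in range(max_len + 1):
        for rules in permutations(candidates, k):
            for default in (0, 1):
                d = (rules, default)
                val = objective(d, X, y, lam)
                if val < best_val:
                    best, best_val = d, val
    return best, best_val


# Toy data (hypothetical): label is 1 unless blue-collar without a bachelor's degree.
X = [
    {"occ": "wc", "edu": "ba"},
    {"occ": "wc", "edu": "hs"},
    {"occ": "bc", "edu": "ba"},
    {"occ": "bc", "edu": "hs"},
]
y = [1, 1, 1, 0]
candidates = [
    (lambda x: x["occ"] == "wc", 1),
    (lambda x: x["edu"] == "ba", 1),
]

small_penalty = exhaustive_search(candidates, X, y, lam=0.01)
large_penalty = exhaustive_search(candidates, X, y, lam=0.3)
print(len(small_penalty[0][0]), small_penalty[1])  # rule count and objective value
print(len(large_penalty[0][0]), large_penalty[1])
```

With a small λ the search keeps both rules and classifies every instance correctly; with a large λ the penalty outweighs the accuracy gain and the empty rule list with a constant default wins.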



Enumerating rule lists

Model Enumeration (Satoshi Hara & Masakazu Ishihata, 2018)

Enumerate rule lists from best to worst with respect to the objective function by successively computing the optimal rule list using CORELS, and then constructing sub-problems that exclude the solution obtained.
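The enumeration loop can be sketched as "solve, record, exclude, repeat". The version below is a simplification under stated assumptions: the solver is a plain minimum over an explicit table of candidate models, standing in for CORELS, and exclusion is done by filtering rather than by partitioning the search space into sub-problems as the actual method does. The function names and the objective values are hypothetical.

```python
from typing import Callable, Hashable, List, Optional, Set, Tuple

Solution = Tuple[Hashable, float]  # (model, objective value)


def enumerate_models(
    solve: Callable[[Set[Hashable]], Optional[Solution]],
    n_models: int,
) -> List[Solution]:
    """Return up to n_models solutions, best first.

    `solve(excluded)` must return the optimal model outside `excluded`,
    or None when no model is left.
    """
    excluded: Set[Hashable] = set()
    results: List[Solution] = []
    for _ in range(n_models):
        solution = solve(excluded)
        if solution is None:
            break
        results.append(solution)
        excluded.add(solution[0])  # next round must find a different model
    return results


# Hypothetical objective values for four rule lists, keyed by the attributes used.
scores = {
    (): 0.30,
    ("occupation",): 0.21,
    ("education",): 0.25,
    ("occupation", "education"): 0.18,
}


def solve(excluded):
    # Stand-in for an exact solver such as CORELS (assumption).
    remaining = {m: v for m, v in scores.items() if m not in excluded}
    if not remaining:
        return None
    best = min(remaining, key=remaining.get)
    return best, remaining[best]


top3 = enumerate_models(solve, 3)
for model, value in top3:
    print(model, value)
```

Each round returns the best model not yet seen, so the objective value never improves from one enumerated model to the next.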



Model rationalization

Given a black-box model $b$, a set of instances $X$, and a sensitive attribute $s$, find a global interpretable model $c_g = f(b, X)$ derived from $b$ and $X$, using some process $f(\cdot, \cdot)$, such that $\epsilon(c_g, X, s) > \epsilon(b, X, s)$, for some fairness metric $\epsilon(\cdot, \cdot, \cdot)$.



Outcome rationalization

Given a black-box model $b$, an instance $x$, its neighborhood $\mathcal{V}(x)$, and a sensitive attribute $s$, find a local interpretable model $c_l = f(b, x)$ derived from $b$ and $\mathcal{V}(x)$, using some process $f(\cdot, \cdot)$, such that $\epsilon(c_l, \mathcal{V}(x), s) > \epsilon(b, \mathcal{V}(x), s)$, for some fairness metric $\epsilon(\cdot, \cdot, \cdot)$.



Better call LaundryML

Explores the search space of rule lists with a modified version of CORELS

New objective function: $\mathrm{obj}(d, x, y) = (1 - \beta)\,\mathrm{misc}(d, x, y) + \beta\,\mathrm{unfairness}(d, x, y) + \lambda K$

Enumerate rule lists

Select fair rule lists that have higher fidelity
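The steps above can be illustrated with a small Python sketch of the modified objective and the final selection. It makes several assumptions that the slides do not spell out: the labels y are taken to be the black-box predictions (so misc equals 1 minus fidelity), unfairness is the demographic parity gap, candidate rule lists are represented only by their predictions and length, and "fair" means an unfairness below a user-chosen threshold. The `Candidate` class and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    preds: Tuple[int, ...]  # predictions of the rule list on X
    length: int             # number of rules K


def misc(preds: Sequence[int], y: Sequence[int]) -> float:
    return sum(p != t for p, t in zip(preds, y)) / len(y)


def unfairness(preds: Sequence[int], s: Sequence[int]) -> float:
    """Demographic parity gap between groups s = 1 and s = 0."""
    g1 = [p for p, g in zip(preds, s) if g == 1]
    g0 = [p for p, g in zip(preds, s) if g == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def laundry_objective(c: Candidate, y, s, beta: float, lam: float) -> float:
    """obj = (1 - beta) * misc + beta * unfairness + lam * K."""
    return (1 - beta) * misc(c.preds, y) + beta * unfairness(c.preds, s) + lam * c.length


def select(cands: List[Candidate], y, s, max_unfairness: float) -> Optional[Candidate]:
    """Among candidates deemed fair, return the one with the highest fidelity."""
    fair = [c for c in cands if unfairness(c.preds, s) <= max_unfairness]
    if not fair:
        return None
    return max(fair, key=lambda c: 1 - misc(c.preds, y))


s = (1, 1, 1, 1, 0, 0, 0, 0)
b_pred = (1, 1, 1, 0, 1, 0, 0, 0)  # black-box labels used as y (assumption)
cands = [
    Candidate("faithful", b_pred, 3),
    Candidate("laundered", (1, 1, 0, 0, 1, 0, 0, 0), 2),
    Candidate("constant", (0,) * 8, 0),
]

for beta in (0.0, 0.5, 0.9):
    best = min(cands, key=lambda c: laundry_objective(c, b_pred, s, beta, lam=0.01))
    print(beta, best.name)

chosen = select(cands, b_pred, s, max_unfairness=0.3)
print(chosen.name if chosen else None)
```

Raising β shifts the optimum from the faithful model toward fairer but less faithful ones, which is why the final step filters on unfairness first and then ranks the survivors by fidelity.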



LaundryML



Setup

Data & black-box models

Data: Adult Income (resp. ProPublica Recidivism)

Sensitive attribute: gender (resp. race)

Black-box models: random forests

Unfairness of the black-box models: 0.13 (resp. 0.17)

Search space: 28! (resp. 27!)

50 models enumerated per experiment

Evaluation metrics

Unfairness

Fidelity

Feature importance via FairML



Model rationalization – Unfairness and Fidelity

Figure: Model rationalization for Adult Income and ProPublica Recidivism. Panels: Adult Income (λ=0.005), Adult Income (λ=0.01), ProPublica Recidivism (λ=0.005), ProPublica Recidivism (λ=0.01). x-axis: Unfairness; y-axis: Fidelity; legend: β ∈ {0, 0.1, 0.2, 0.5, 0.7, 0.9}.

Best rationalization models

Adult Income: fidelity = 0.908, unfairness = 0.058.

ProPublica Recidivism: fidelity = 0.748, unfairness = 0.080.



Model rationalization – Unfairness and Fidelity tradeoffs

Figure: Fidelity/fairness tradeoffs on Adult Income.


Figure: Fidelity/fairness tradeoffs on ProPublica Recidivism.



Model rationalization – Feature importance

Figure: Feature importance, black-box vs. best rationalization model on Adult Income.


Figure: Feature importance, black-box vs. best rationalization model on ProPublica Recidivism.



Outcome rationalization

Figure: Outcome rationalization. Adult Income (left), ProPublica Recidivism (right). x-axis: Unfairness; y-axis: Proportion of users; legend: black-box, β ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.


Generalization to other fairness metrics

Figure: Model rationalization. Adult Income, Random forest, Overall Accuracy Equality. Two panels, x-axis: Unfairness (left) and Fidelity (right); y-axis: Proportion of models; legend: β ∈ {0.0, 0.1, 0.2, 0.5, 0.7, 0.9}.



Figure: Model rationalization. Adult Income, Random forest, Conditional Procedure Accuracy. Two panels, x-axis: Unfairness (left) and Fidelity (right); y-axis: Proportion of models; legend: β ∈ {0.0, 0.1, 0.2, 0.5, 0.7, 0.9}.



Figure: Model rationalization. Adult Income, Random forest, Demographic parity. Two panels, x-axis: Unfairness (left) and Fidelity (right); y-axis: Proportion of models; legend: β ∈ {0.0, 0.1, 0.2, 0.5, 0.7, 0.9}.


Generalization to other black-box models

Figure: Model rationalization. Adult Income, SVM, Demographic parity. Two panels, x-axis: Unfairness (left) and Fidelity (right); y-axis: Proportion of models; legend: β ∈ {0.0, 0.1, 0.2, 0.5, 0.7, 0.9}.



Figure: Model rationalization. Adult Income, XGBoost, Demographic parity. Two panels, x-axis: Unfairness (left) and Fidelity (right); y-axis: Proportion of models; legend: β ∈ {0.0, 0.1, 0.2, 0.5, 0.7, 0.9}.



Figure: Model rationalization. Adult Income, MLP, Demographic parity. Two panels, x-axis: Unfairness (left) and Fidelity (right); y-axis: Proportion of models; legend: β ∈ {0.0, 0.1, 0.2, 0.5, 0.7, 0.9}.


Conclusion

LaundryML: black-box explanations can be used to rationalize unfair decisions of a black-box model

Can we trust black-box explanations?



Perspectives

Detecting fairwashing

Study the root cause: robustness of explanations

Learn more

Our work: Fairwashing: the risk of rationalization. ICML’19

Another approach: Pretending Fair Decisions via Stealthily Biased Sampling. arXiv:1901.08291, 2019

Blog post on post-rationalization: Interpretability and Post-Rationalization



Thank you!
