attribute interactions in machine learningjakulin/int/interaction-slides.pdfmarital−status...

21
Attribute Interactions in Machine Learning Aleks Jakulin Faculty of Computer and Information Science University of Ljubljana Attribute Interactions – p.1/17

Upload: others

Post on 28-Jan-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Attribute Interactionsin Machine Learning

Aleks Jakulin

Faculty of Computer and Information Science

University of Ljubljana

Attribute Interactions – p.1/17

Page 2: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

A Classification Problem

ATTRIBUTES LABEL

Name Hair Height Weight Lotion Result

Sarah blonde average light no sunburned

Dana blonde tall average yes tanned

Alex brown short average yes tanned

Annie blonde short average no sunburned

Emily red average heavy no sunburned

Pete brown tall heavy no tanned

John brown average heavy no tanned

Katie blonde short light yes tanned

TASK: PREDICT AN INSTANCE’S CLASS GIVEN THE ATTRIBUTE VALUES.

Attribute Interactions – p.2/17

Page 3: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Interactions

“We cannot conquer a group of interacting attributes by

dividing them.”

Most machine learning algorithms assume either

that all attributes are independent (naïve Bayes, logisticregression, linear SVM, perceptron),

or that all attributes are dependent (classification trees,

constructive induction, rules, kernel methods, instance-based

methods).

However, voting ensembles, where a number of classifiers

trained on subsets of attributes or instances vote to predict

the label (attribute decomposition, random forests, decision

graphs, subspace methods), yield good results. Why?

Attribute Interactions – p.3/17

Page 4: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Voting

Hair Name Height

Result

Lotion Weight

Attribute Interactions – p.4/17

Page 5: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Voting

Hair Name Height

SKIN Result

Lotion Weight

WE DECLARE A TRUETHIS TO BE: INTERACTION

SPURIOUS RELATIONSHIP

MODERATOR

Attribute Interactions – p.4/17

Page 6: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Voting

Hair Name Height

SKIN Result SIZE

Lotion Weight

WE DECLARE A TRUE A FALSETHIS TO BE: INTERACTION INTERACTION

SPURIOUS RELATIONSHIP

MODERATOR

LATENT CAUSE

LATENT CAUSE

Attribute Interactions – p.4/17

Page 7: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Simpson’s Paradox

0

0.1

0.2

0.3

0.4

0.5

0.6D

eath

Rat

e (%

)

Location

Tuberculosis Patients

New York Richmond

Attribute Interactions – p.5/17

Page 8: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Simpson’s Paradox

0

0.1

0.2

0.3

0.4

0.5

0.6D

eath

Rat

e (%

)

Location

Tuberculosis Patients

New York Richmond

WhiteNon-White

Both

Attribute Interactions – p.5/17

Page 9: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Information Gain

C

BA

An attribute is an information source. We

want to estimate the amount of information

shared between two sources.

The amount learned about a label C from

an attribute A is quantified by information

gain: GainC(A) := H(A) + H(C)−H(AC).

Interpretation: our ignorance about an

unknown C reduces by GainC(A) given the

knowledge of A.

Sufficient, if all attributes are conditionally in-

dependent with respect to the label, when

there are only 2-way interactions.

Attribute Interactions – p.6/17

Page 10: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Interaction Gain

C

BA

How to estimate the amount ofinformation shared among threeattributes?

Generalization of information gain for3-way interactions is interaction gain:

IG3(ABC) := H(AB) + H(AC) + H(BC)−H(A)

−H(B)−H(C)−H(ABC)

= GainC(AB)−GainC(A)−GainC(B).

If IG negative: a false interaction.

If IG positive: a true interaction.

If IG zero: no 3-way interaction.

Attribute Interactions – p.7/17

Page 11: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

False Interaction Analysis

age

marital−status

relationship

hours−per−week

sex

workclass

native−country

race

education

education−num

occupation

capital−gain

capital−loss

fnlwgt

0 200 400 600 800 1000

Height The Census/Adult domain

from UCI, 2-classes of

individuals: rich, poor.

Similarity between two

attributes is proportional to

negated 3-interaction gain

between them and the label.

Only false interactions were

included into consideration.

Agglomerative clustering was

used to create the interaction

dendrogram.Attribute Interactions – p.8/17

Page 12: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

True Interaction Analysis

native_country

age

100%

race

23%

workclass

75%

occupation

75%

capital_loss

capital_gain

63%

education

59%

marital_status

52%

relationship

46%

hours_per_week

35%

A percentage on an interaction graph edge indicatesthe strength of a true interaction.

Native country appears to be an important moderator,moderating a large number of 2-way interactions.

True interactions are rarely transitive relations.

True interactions are a forest of trees, not a single tree.

Attribute Interactions – p.9/17

Page 13: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Interaction Significance (1)

When is an interaction significant?

Special statistics for conditional dependenceand independence tests, e.g.,Cochran-Mantel-Haenszel.

Evaluate classifier performance on unseendata by comparing:

A classifier assuming independence between twoattributes (voting).

A classifier exploiting dependence between twoattributes via interaction resolution (segmentation).

Attribute Interactions – p.10/17

Page 14: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Interaction Significance (2)

DRUZINSK

ED_FAZAS

27%

KTZHT

0%

ED_UPA

1%

VASK.INV

0%

1%

OPERACIJ

0%

RT

0%

UPASEL

2%

0%

ED_PAI2

0%

PAI2SEL

0%

INV.KAPS

2%

NEOKT

0%

ANAMNEZA

1%

1%

33%

LOKOREG.

1%

ODDALJEN

2%

MKL

0%

ED_PREGL.BE

0%

0% 0%

NOV.TU

0%

MENSEL

1%

NKL

1%

0%

UICC

0%

KT

0%

D_STEVILO

0%

INVAZIJ1

1%

1%

D_PAI1

4%

0%

PAI1SEL

0% 0%

UPARSEL

3% 2%20%

0%

INVAZIJA

3%

0%

HR

D_PR.B

4%

HT

3%

1%

1% 0% 0%

0%

ZDRAVLJE

0%0%

LOKALIZA

0%0%

KAT.L

0%

0%

VELSEL

0%NODSEL

0%

D_DFS2

0%

UPAR

0%

1%

GRADSEL

2%

TIP.TU

0%

GRADUS

0%

LIMF.INV

0%

D_ER.B

0%

ED_KATL

1% 2%

100%

ED_KAT.D

2%

0%

VEL.C

2%

There are generally few significantinteractions.

Attribute Interactions – p.11/17

Page 15: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Interaction Significance (2)

DRUZINSK

ED_FAZAS

27%

KTZHT

0%

ED_UPA

1%

VASK.INV

0%

1%

OPERACIJ

0%

RT

0%

UPASEL

2%

0%

ED_PAI2

0%

PAI2SEL

0%

INV.KAPS

2%

NEOKT

0%

ANAMNEZA

1%

1%

33%

LOKOREG.

1%

ODDALJEN

2%

MKL

0%

ED_PREGL.BE

0%

0% 0%

NOV.TU

0%

MENSEL

1%

NKL

1%

0%

UICC

0%

KT

0%

D_STEVILO

0%

INVAZIJ1

1%

1%

D_PAI1

4%

0%

PAI1SEL

0% 0%

UPARSEL

3% 2%20%

0%

INVAZIJA

3%

0%

HR

D_PR.B

4%

HT

3%

1%

1% 0% 0%

0%

ZDRAVLJE

0%0%

LOKALIZA

0%0%

KAT.L

0%

0%

VELSEL

0%NODSEL

0%

D_DFS2

0%

UPAR

0%

1%

GRADSEL

2%

TIP.TU

0%

GRADUS

0%

LIMF.INV

0%

D_ER.B

0%

ED_KATL

1% 2%

100%

ED_KAT.D

2%

0%

VEL.C

2%

ODDALJEN > 0: y

ODDALJEN <= 0:

:...LOKOREG. <= 0: n

LOKOREG. > 0: y

A PERFECT CLASSIFICATION TREE FOR

THE ‘BREAST’ DOMAIN, INDUCED BY

C4.5.

But they matter: non-myopic feature selection,non-myopic split selection, non-myopic discretization,rules, trees, constructive induction.

Attribute Interactions – p.11/17

Page 16: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Classification Performance

‘adult’ Base False True

NBC 0.416 0.352 0.392

LR 1.562 0.418 1.564

SVM — — —

‘breast’ Base False True

NBC 0.262 0.187 0.171

LR 0.016 0.016 0.016

SVM 0.032 0.032 0.016

A wrapper algorithm detects true or false interactions with

interaction gain and uses minimal-error attribute reduction to

resolve them. No feature selection and no parameter tuning

was used.

It improves results with logistic regression, SVM, and the

naïve Bayesian classifier.

There must be enough data!

Attribute Interactions – p.12/17

Page 17: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Applications

Prediction:

Resolving significant interactions helps improveclassification performance.

Interactions limit or prevent myopia in discretizationand feature selection.

Interactions justify constructive induction.

Analysis:

Interactions are interesting, especially if unexpected:interactions between treatments, symptoms, etc.

Attribute Interactions – p.13/17

Page 18: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Summary of Contributions

Two kinds of interactions: true and false interactions.

Interaction gain is an interaction probe, able to detectand classify 3-way interactions.

The pragmatic interaction significance test, based oncomparison of classification performance on unseendata.

True and false interaction analysis methodology, withinteraction graphs and interaction dendrograms.

Improving classification performance with interactionresolution.

Attribute Interactions – p.14/17

Page 19: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Further Work

A full-fledged tool for interaction analysis.

Support for numerical and ordered attributes.

Generalization to k-way interactions.

Improved methods of resolution, especially offalse interactions.

Exploration of implications of interactions todiscretization, split selection, etc.

Applications.

Attribute Interactions – p.15/17

Page 20: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Cardinality of Attributes

-0.1

-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0 200 400 600 800 1000 1200 1400 1600

impr

ovem

ent b

y re

plac

emen

t

number of joint attribute values

Adult/Census

The greater the number of values in the constituent at-

tributes, the lower the chances of the interaction between

them to be significant.Attribute Interactions – p.16/17

Page 21: Attribute Interactions in Machine Learningjakulin/Int/interaction-slides.pdfmarital−status relationship hours−per−week sex workclass native−country race education education−num

Attribute Reduction

-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

-0.08 -0.06 -0.04 -0.02 0 0.02 0.04

impr

ovem

ent b

y re

plac

emen

t (C

arte

sian

)

improvement by replacement (MinErr)

Adult

Minimal-error attribute reduction often yields better results

than using the non-reduced Cartesian product of attributes.

Attribute Interactions – p.17/17