
The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data

Summary
Algorithms for causal discovery are unlikely to be widely accepted unless they can be shown to accurately predict the effects of intervention when applied to empirical data.

Q1. Don’t we do this already?
Q2. Aren’t structural measures enough?
Q3. Is there any data that supports this type of evaluation?
Q4. Does empirical data add any value?

• We surveyed 91 causality papers from the past 5 years of NeurIPS, AAAI, KDD, UAI, and ICML.

[Figure: survey results, shown as the percentage of papers by source of data (synthetic, empirical, both), by evaluation measure (structural, distributional, visual inspection, interventional) for synthetic vs. empirical data, and by type of algorithm (bivariate, multivariate, other).]

[Figure: evaluation pipeline. A parameterized DAG over variables T, O, and C is used to draw observational data (by observational sampling) and interventional data. Structure learning and parameterization are applied to the observational data; the interventional distribution is then computed from the true parameterized DAG and estimated from the learned model, and the two are compared in the evaluation step. Example data tables show records of (T, O, C) values.]
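As an illustration of the evaluation step, here is a minimal sketch of computing total variation distance (TVD) between a true interventional distribution (computed from the generating DAG) and an estimated one (derived from the learned model). The discrete two-valued outcome and the names p_true, p_est, and total_variation_distance are illustrative assumptions, not the poster's implementation.

# Sketch of the evaluation step: compare a true interventional distribution
# (computed from the generating DAG) with one estimated from a learned model.
# The discrete-outcome setting and the helper names are illustrative assumptions.

def total_variation_distance(p_true, p_est):
    """TVD between two distributions given as dicts mapping outcome -> probability.
    TVD(P, Q) = 0.5 * sum_x |P(x) - Q(x)|."""
    outcomes = set(p_true) | set(p_est)
    return 0.5 * sum(abs(p_true.get(x, 0.0) - p_est.get(x, 0.0)) for x in outcomes)

# Hypothetical example: an interventional distribution P(O | do(T = 1))
# over a two-valued outcome, under the true model vs. the learned model.
p_true = {"low": 0.30, "high": 0.70}   # computed from the parameterized DAG
p_est  = {"low": 0.45, "high": 0.55}   # estimated from the model learned on observational data

print(total_variation_distance(p_true, p_est))  # 0.15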

[Figure: performance of GES, MMHC, and PC on ‘look-alike’ synthetic data generated from PC-learned and GES-learned structures, and on the original empirical data.]

• Many datasets exist with known interventional effects (DREAM, ACIC 2016 challenge, flow cytometry, cause-effect pairs challenge, etc.)
• We can collect data from computational systems, which has many advantages: it is empirical, easily intervenable, and naturally stochastic
• We collected data from Postgres, the JDK, and networking infrastructure and intervened by changing system parameters
• Pseudo-observational data can be created by biasing the sample with an observed covariate (a sketch appears after this list)
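A minimal sketch of creating pseudo-observational data from experimental records by biasing inclusion with an observed covariate, so that treatment assignment becomes confounded with that covariate. The record format, the column names T and C, and the logistic weighting are illustrative assumptions, not the poster's exact procedure.

import math
import random

def biased_subsample(records, seed=0):
    """Keep each record with a probability that depends on the covariate C,
    so treatment T becomes confounded with C in the retained sample."""
    rng = random.Random(seed)
    kept = []
    for r in records:  # each r is a dict with keys "T" (0/1) and "C" (numeric covariate)
        if r["T"] == 1:
            p_keep = 1.0 / (1.0 + math.exp(-r["C"]))  # treated records kept more often when C is high
        else:
            p_keep = 1.0 / (1.0 + math.exp(r["C"]))   # untreated records kept more often when C is low
        if rng.random() < p_keep:
            kept.append(r)
    return kept

# Example: records from a randomized experiment, where T was assigned independently of C.
records = [{"T": 1, "C": 0.8, "O": 5.7}, {"T": 0, "C": -1.2, "O": 3.2}]
pseudo_observational = biased_subsample(records)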

• Most structural measures penalize all errors equally
• Even structural measures designed to consider interventions (e.g., Structural Intervention Distance) produce results similar to other structural measures
• Interventional measures (e.g., total variation distance, TVD) instead compare estimated and true interventional distributions
• We generated random DAGs and compared different evaluation measures, for both GES and PC
• TVD and SHD (structural Hamming distance) produce significantly different results, varying by algorithm (a sketch of SHD appears after this list)
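For contrast with TVD, here is a minimal sketch of structural Hamming distance (SHD) over binary adjacency matrices. It charges one unit for every missing, extra, or reversed edge, regardless of how much that error changes any interventional distribution, which is why it can disagree with interventional measures. The adjacency-matrix encoding and the small three-node example are illustrative assumptions.

import numpy as np

def shd(a_true, a_learned):
    """Structural Hamming distance between two DAGs given as binary adjacency
    matrices (A[i, j] = 1 means an edge i -> j). Each mismatched node pair
    (missing, extra, or reversed edge) contributes exactly 1."""
    n = a_true.shape[0]
    diff = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (a_true[i, j], a_true[j, i]) != (a_learned[i, j], a_learned[j, i]):
                diff += 1
    return diff

# Hypothetical example: true graph C -> T -> O (nodes 0, 1, 2);
# learned graph reverses C -> T and drops T -> O. SHD counts both errors equally.
a_true    = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
a_learned = np.array([[0, 0, 0], [1, 0, 0], [0, 0, 0]])
print(shd(a_true, a_learned))  # 2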

Measures of interventional effect are necessary when trying to assess how well an algorithm learns actual causal effects.

Effective methods exist to create empirical data for evaluating algorithms for causal discovery.

[Figure: causal structures of computational systems learned by PC (left) and GES (center).]

• Empirical data provides realistic complexity generally not present in synthetic data
• Data not generated by the researcher is less likely to contain unintentional biases and can be standardized across the community
• It provides a stronger demonstration of effectiveness
• We learned the causal structure of computational systems using PC (left) and GES (center)
• We generated synthetic data based on these structures and evaluated the performance of GES, MMHC, and PC (a sketch of such forward sampling appears after this list)
• Compared with the performance of GES, MMHC, and PC on the original empirical data, the relative order of the algorithms differs significantly
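A minimal sketch of how ‘look-alike’ synthetic data might be generated by forward-sampling a parameterization of a learned structure. The linear-Gaussian parameterization, the variable names, and the fitted edge weights are illustrative assumptions; the poster does not specify the exact generation procedure.

import numpy as np

def sample_synthetic(structure, weights, n, seed=0):
    """Forward-sample a linear-Gaussian parameterization of a learned DAG.
    `structure` maps each node to its parent list, in topological order;
    `weights` maps (parent, child) pairs to fitted edge coefficients."""
    rng = np.random.default_rng(seed)
    data = {}
    for node, parents in structure.items():  # assumes topological order
        noise = rng.normal(0.0, 1.0, size=n)
        data[node] = noise + sum(weights[(p, node)] * data[p] for p in parents)
    return data

# Hypothetical example: structure C -> T -> O learned by PC or GES, with fitted weights.
structure = {"C": [], "T": ["C"], "O": ["T"]}
weights = {("C", "T"): 0.8, ("T", "O"): 1.5}
synthetic = sample_synthetic(structure, weights, n=1000)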

A1. No, fewer than 10% of papers published in the past 5 years use a combination of empirical data and interventional measures.

A2. No, structural measures correspond poorly to measures of interventional effect, such as TVD.

A3. Yes, several data sets exist, including ones we have recently created from experimentation with computational systems.

A4. Yes, results on empirical data sets appear to differ substantially from results on ‘look-alike’ synthetic data sets.

Amanda Gentzel, Dan Garant, and David Jensen
College of Information and Computer Sciences, University of Massachusetts Amherst
