processes and constraints in scientific model construction

PROCESSES AND CONSTRAINTS IN SCIENTIFIC MODEL CONSTRUCTION

Will Bridewell† and Pat Langley†‡

†Cognitive Systems Laboratory, CSLI, Stanford University

‡CIRCAS, Arizona State University

Where Are We Going?

Introduction to inductive process modeling

Constraints in inductive process modeling

Learning constraints

Inductive Process Modeling

[Figure: Observations → Model → Predictions]

Model Objectives: Explanation and Prediction (Langley et al. 2002, ICML; Bridewell et al. 2008, ML)

Quantitative Process Models

• Ordinary Differential Equations

• Processes

process exponential_growth equations d[hare.density, t, 1] = 2.5 * hare.density

process exponential_loss equations d[wolf.density, t, 1] = −1.2 * wolf.density

process predation_holling_type_1 equations
    d[hare.density, t, 1] = −0.1 * hare.density * wolf.density
    d[wolf.density, t, 1] = 0.3 * 0.1 * hare.density * wolf.density

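$$\frac{d\,\text{hare.density}}{dt} = 2.5 \cdot \text{hare.density} + (-0.1) \cdot \text{hare.density} \cdot \text{wolf.density}$$

$$\frac{d\,\text{wolf.density}}{dt} = -1.2 \cdot \text{wolf.density} + 0.3 \cdot 0.1 \cdot \text{hare.density} \cdot \text{wolf.density}$$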
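Each process contributes additive terms, and the model's equations are their per-variable sums. A minimal Python sketch of that composition follows. It assumes a hypothetical representation in which a ground process is a function from the current state to its contributions (the names are illustrative, not part of any IPM release), and it uses forward-Euler integration as a stand-in for a proper ODE solver:

```python
from typing import Callable, Dict, List

# Hypothetical representation: a state maps variable names to values, and a
# ground process returns its additive contribution to each derivative it affects.
State = Dict[str, float]
Process = Callable[[State], State]


def exponential_growth(s: State) -> State:
    return {"hare.density": 2.5 * s["hare.density"]}


def exponential_loss(s: State) -> State:
    return {"wolf.density": -1.2 * s["wolf.density"]}


def predation_holling_type_1(s: State) -> State:
    rate = 0.1 * s["hare.density"] * s["wolf.density"]
    return {"hare.density": -rate, "wolf.density": 0.3 * rate}


def derivatives(processes: List[Process], s: State) -> State:
    """Sum the contributions of all processes, variable by variable."""
    total = {name: 0.0 for name in s}
    for process in processes:
        for name, delta in process(s).items():
            total[name] += delta
    return total


def simulate(processes: List[Process], s0: State, dt: float, steps: int) -> List[State]:
    """Forward-Euler integration; crude, but enough to show the composed dynamics."""
    s = dict(s0)
    trajectory = [dict(s)]
    for _ in range(steps):
        d = derivatives(processes, s)
        s = {name: s[name] + dt * d[name] for name in s}
        trajectory.append(s)
    return trajectory


model = [exponential_growth, exponential_loss, predation_holling_type_1]
print(derivatives(model, {"hare.density": 10.0, "wolf.density": 5.0}))
# {'hare.density': 20.0, 'wolf.density': -4.5}
```

Dropping `predation_holling_type_1` from `model`, or swapping in a different predation function, changes the system without editing the other processes.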

Advantages of Quantitative Process Models

Process models offer scientists a promising framework because:

· they embed quantitative relations within qualitative structure;

· they refer to notations and mechanisms familiar to experts;

· they provide dynamical predictions of changes over time;

· they offer causal and explanatory accounts of phenomena;

· while retaining the modularity needed for induction/abduction.

Quantitative process models provide an important alternative to the formalisms currently used in computational discovery.


Modularity in Quantitative Process Models

Starting from the hare-wolf model above (exponential_growth, exponential_loss, predation_holling_type_1), removing the predation process leaves two independent equations:

process exponential_growth equations d[hare.density, t, 1] = 2.5 * hare.density

process exponential_loss equations d[wolf.density, t, 1] = −1.2 * wolf.density

$$\frac{d\,\text{hare.density}}{dt} = 2.5 \cdot \text{hare.density}$$

$$\frac{d\,\text{wolf.density}}{dt} = -1.2 \cdot \text{wolf.density}$$

Replacing Holling type 1 predation with Holling type 2 predation changes only the predation terms:

process exponential_growth equations d[hare.density, t, 1] = 2.5 * hare.density

process exponential_loss equations d[wolf.density, t, 1] = −1.2 * wolf.density

process predation_holling_type_2 equations
    d[hare.density, t, 1] = −0.1 * hare.density * wolf.density / (1 + 0.2 * 0.1 * hare.density)
    d[wolf.density, t, 1] = 0.3 * 0.1 * hare.density * wolf.density / (1 + 0.2 * 0.1 * hare.density)

$$\frac{d\,\text{hare.density}}{dt} = 2.5 \cdot \text{hare.density} + \frac{(-0.1) \cdot \text{hare.density} \cdot \text{wolf.density}}{1 + 0.2 \cdot 0.1 \cdot \text{hare.density}}$$

$$\frac{d\,\text{wolf.density}}{dt} = -1.2 \cdot \text{wolf.density} + \frac{0.3 \cdot 0.1 \cdot \text{hare.density} \cdot \text{wolf.density}}{1 + 0.2 \cdot 0.1 \cdot \text{hare.density}}$$


Generic Processes

generic process predation_Holling_1
    entities P1{prey}, P2{predator}
    parameters r[0, infinity], e[0, infinity]
    equations d[P1.density, t, 1] = −1 * r * P1.density * P2.density
              d[P2.density, t, 1] = e * r * P1.density * P2.density


Instantiation
P1: hare, P2: wolf
r: 0.1, e: 0.3

process wolves_eat_hares equations
    d[hare.density, t, 1] = −1 * 0.1 * hare.density * wolf.density
    d[wolf.density, t, 1] = 0.3 * 0.1 * hare.density * wolf.density

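Instantiation binds each entity role to a concrete entity and each parameter to a value inside its declared bounds. The sketch below assumes a hypothetical `GenericProcess` class in Python; it is not the IPM library format, and for brevity it skips the entity-type check ({prey}, {predator}):

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Dict[str, float]
GroundProcess = Callable[[State], State]


@dataclass
class GenericProcess:
    """Hypothetical stand-in for a generic process in the library notation."""
    name: str
    roles: Tuple[str, ...]                    # entity roles, e.g. ("P1", "P2")
    bounds: Dict[str, Tuple[float, float]]    # parameter -> (lower, upper)
    build: Callable[[Dict[str, str], Dict[str, float]], GroundProcess]

    def instantiate(self, binding: Dict[str, str], params: Dict[str, float]) -> GroundProcess:
        # Every role must be bound (entity types are not checked in this sketch),
        # and every parameter must lie within its declared bounds.
        if set(binding) != set(self.roles):
            raise ValueError(f"{self.name}: expected bindings for {self.roles}")
        for p, (low, high) in self.bounds.items():
            if not low <= params[p] <= high:
                raise ValueError(f"{self.name}: {p}={params[p]} outside [{low}, {high}]")
        return self.build(binding, params)


def _holling_1(binding: Dict[str, str], params: Dict[str, float]) -> GroundProcess:
    prey, predator = binding["P1"] + ".density", binding["P2"] + ".density"
    r, e = params["r"], params["e"]

    def ground(s: State) -> State:
        rate = r * s[prey] * s[predator]
        return {prey: -1 * rate, predator: e * rate}

    return ground


predation_Holling_1 = GenericProcess(
    name="predation_Holling_1",
    roles=("P1", "P2"),
    bounds={"r": (0.0, math.inf), "e": (0.0, math.inf)},
    build=_holling_1,
)

# The instantiation above: P1 = hare, P2 = wolf, r = 0.1, e = 0.3.
wolves_eat_hares = predation_Holling_1.instantiate({"P1": "hare", "P2": "wolf"}, {"r": 0.1, "e": 0.3})
print(wolves_eat_hares({"hare.density": 10.0, "wolf.density": 5.0}))
```

A parameter outside its bounds, such as r = −0.1, raises `ValueError` instead of producing a ground process.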

The IPM System (a naive approach)

• Given:
  - A library of generic entities and processes
  - Instantiated entities
  - Data

1. Ground the generic processes with instantiated entities.
2. Generate all combinations of the ground processes.
3. Fit the numeric parameters of each structure.

• Output: The best models based on fit to the data
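A compact sketch of the naive loop, under stated assumptions: the ground processes are hypothetical Python constructors with one parameter each, the data are a synthetic trajectory from the hare-wolf model, and a coarse grid search stands in for real parameter fitting. The enumeration step is what scales badly, since n ground processes yield 2^n − 1 non-empty structures:

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, float]
Process = Callable[[State], State]


# Hypothetical ground processes for the hare-wolf example, one free parameter each.
def growth(a: float) -> Process:
    return lambda s: {"hare.density": a * s["hare.density"]}


def loss(b: float) -> Process:
    return lambda s: {"wolf.density": -b * s["wolf.density"]}


def holling_1(r: float, e: float = 0.3) -> Process:
    def f(s: State) -> State:
        rate = r * s["hare.density"] * s["wolf.density"]
        return {"hare.density": -rate, "wolf.density": e * rate}
    return f


# Ground process name -> (constructor, candidate values for its parameter).
LIBRARY: Dict[str, Tuple[Callable[[float], Process], List[float]]] = {
    "exponential_growth": (growth, [1.0, 2.5, 4.0]),
    "exponential_loss": (loss, [0.5, 1.2, 2.0]),
    "predation_holling_type_1": (holling_1, [0.05, 0.1, 0.2]),
}


def simulate(processes: Sequence[Process], s0: State, dt: float = 0.01, steps: int = 100) -> List[State]:
    s, out = dict(s0), [dict(s0)]
    for _ in range(steps):
        d = {k: 0.0 for k in s}
        for p in processes:                     # sum the process contributions
            for k, v in p(s).items():
                d[k] += v
        s = {k: s[k] + dt * d[k] for k in s}    # forward-Euler step
        out.append(s)
    return out


def sse(pred: List[State], data: List[State]) -> float:
    return sum((p[k] - o[k]) ** 2 for p, o in zip(pred, data) for k in o)


def fit(names: Sequence[str], data: List[State]) -> Tuple[float, tuple]:
    """Grid search over one structure's parameters (a stand-in for real fitting)."""
    best: Tuple[float, tuple] = (float("inf"), ())
    for values in itertools.product(*(LIBRARY[n][1] for n in names)):
        procs = [LIBRARY[n][0](v) for n, v in zip(names, values)]
        err = sse(simulate(procs, data[0]), data)
        if err < best[0]:
            best = (err, values)
    return best


def naive_ipm(data: List[State], top: int = 3):
    """Fit every non-empty combination of ground processes and rank by error."""
    names = list(LIBRARY)
    results = []
    for k in range(1, len(names) + 1):          # 2^n - 1 structures in total
        for structure in itertools.combinations(names, k):
            err, values = fit(structure, data)
            results.append((err, structure, values))
    results.sort(key=lambda t: t[0])
    return results[:top]


if __name__ == "__main__":
    # Synthetic data from the hare-wolf model, then recover its structure.
    truth = [growth(2.5), loss(1.2), holling_1(0.1)]
    data = simulate(truth, {"hare.density": 10.0, "wolf.density": 5.0})
    for err, structure, values in naive_ipm(data):
        print(f"{err:12.4f}  {structure}  {values}")
```

On this synthetic data the top-ranked structure is the full three-process model with parameters (2.5, 1.2, 0.1) and zero error; adding one more ground process would roughly double the number of structures to fit.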

Applications

[Figures: Aquatic Ecosystems; Fjord Dynamics]

Also: biochemical kinetics, protist interactions, photosynthesis

See Bridewell et al. 2008, Machine Learning, 71, 1–32

Life After IPM

Early versions of inductive process modeling systems:

• help scientists formalize their modeling knowledge;

• let scientists consider several alternative models;

• reduce some of the drudgery of model construction;

• speed exploration and evaluation.

However, IPM produces several structurally implausible models, some of which account quite well for the data.

Model Constraints

Constraints on the structure of models:

• eliminate implausible models;

• reduce the size of the search space;

• make complex domains tractable;

• improve model accuracy during incomplete search.

HIPM, Todorovski et al. AAAI-05

The most important difference between structural constraints and constraints on model behavior is that structural constraints do not require simulation.

SC-IPM Constraints: Necessary

Name: Nutrient-Replenishment
Type: necessary
Processes: nutrient_mixing(N), remineralization(N,_)

Specifies Required Processes

P = primary producer, G = grazer, N = nutrient

SC-IPM Constraints: Always-Together

Name: Growth-Limitation
Type: always-together
Processes: limited(P), nutrient_limitation(P, N)

All or None

SC-IPM Constraints: Exactly-One

Name: Growth-Alternatives
Type: exactly-one
Processes: exponential(P), logistic(P), limited(P)

Mutual Exclusion

SC-IPM Constraints: At-Most-One

Name: Optional-Grazing
Type: at-most-one
Processes: holling_1(P,G), holling_2(P,G), holling_3(P,G)

Enables Optional Processes
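Because structural constraints only inspect which processes a model contains, checking them needs no simulation. A minimal Python sketch of the four constraint types above, applied to a candidate structure; the `Constraint` class is hypothetical, the processes are grounded with the legend's P, G and N, and reading "necessary" as "at least one listed process must appear" is our assumption:

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Constraint(NamedTuple):
    name: str
    kind: str                   # "necessary", "always-together", "exactly-one", "at-most-one"
    processes: FrozenSet[str]   # ground process names


def satisfied(c: Constraint, model: FrozenSet[str]) -> bool:
    present = len(c.processes & model)       # how many listed processes the model uses
    if c.kind == "necessary":                # assumption: at least one must appear
        return present >= 1
    if c.kind == "always-together":          # all or none
        return present in (0, len(c.processes))
    if c.kind == "exactly-one":              # exactly one required
        return present == 1
    if c.kind == "at-most-one":              # optional: zero or one
        return present <= 1
    raise ValueError(f"unknown constraint type: {c.kind}")


def violations(constraints: Iterable[Constraint], model: Iterable[str]) -> List[str]:
    m = frozenset(model)
    return [c.name for c in constraints if not satisfied(c, m)]


# The four example constraints, grounded with P, G and N from the legend.
# remineralization's wildcard argument is bound to G purely for illustration.
CONSTRAINTS = [
    Constraint("Nutrient-Replenishment", "necessary",
               frozenset({"nutrient_mixing(N)", "remineralization(N,G)"})),
    Constraint("Growth-Limitation", "always-together",
               frozenset({"limited(P)", "nutrient_limitation(P,N)"})),
    Constraint("Growth-Alternatives", "exactly-one",
               frozenset({"exponential(P)", "logistic(P)", "limited(P)"})),
    Constraint("Optional-Grazing", "at-most-one",
               frozenset({"holling_1(P,G)", "holling_2(P,G)", "holling_3(P,G)"})),
]

print(violations(CONSTRAINTS, {"logistic(P)", "holling_2(P,G)", "nutrient_mixing(N)"}))
# []
print(violations(CONSTRAINTS, {"limited(P)", "exponential(P)", "holling_1(P,G)", "holling_3(P,G)"}))
# ['Nutrient-Replenishment', 'Growth-Limitation', 'Growth-Alternatives', 'Optional-Grazing']
```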

The SC-IPM System

1. Ground the generic processes with instantiated entities.

2. Treat ground processes as Boolean literals.

3. Conjoin the individual constraints.

4. Rewrite the constraints in conjunctive normal form.

5. Apply a SAT solver (e.g., DPLL, WalkSAT).

6. Instant model structure!

7. Fit parameters, etc.
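Steps 2 through 5 can be sketched directly. Each ground process becomes a Boolean variable, and each constraint type becomes clauses: one clause for "at least one" (our assumed reading of "necessary"), pairwise exclusions for at-most-one, both for exactly-one, and a ring of implications for always-together. Instead of an off-the-shelf DPLL or WalkSAT solver, the hypothetical `all_models` below is a small DPLL-style search with unit propagation that enumerates every satisfying assignment, since each one is a candidate model structure:

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Optional, Sequence, Tuple

Clause = Tuple[int, ...]   # literals: +v means process v is in the model, -v means it is not


def to_cnf(kind: str, vs: Sequence[int]) -> List[Clause]:
    """Clauses for one constraint over the variable ids vs."""
    at_least_one = [tuple(vs)]
    at_most_one = [(-a, -b) for a, b in combinations(vs, 2)]
    if kind == "necessary":            # assumption: at least one listed process
        return at_least_one
    if kind == "at-most-one":
        return at_most_one
    if kind == "exactly-one":
        return at_least_one + at_most_one
    if kind == "always-together":      # ring of implications v1 -> v2 -> ... -> v1
        return [(-a, b) for a, b in zip(vs, list(vs[1:]) + [vs[0]])]
    raise ValueError(f"unknown constraint type: {kind}")


def assign(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Make lit true: drop satisfied clauses, shorten the rest; None on conflict."""
    out = []
    for c in clauses:
        if lit in c:
            continue
        reduced = tuple(x for x in c if x != -lit)
        if not reduced:
            return None
        out.append(reduced)
    return out


def all_models(clauses: List[Clause], variables: Sequence[int]) -> Iterator[Dict[int, bool]]:
    """DPLL-style search (unit propagation + branching) yielding every total model."""
    def search(cls: List[Clause], model: Dict[int, bool]) -> Iterator[Dict[int, bool]]:
        while True:                                   # unit propagation
            unit = next((c[0] for c in cls if len(c) == 1), None)
            if unit is None:
                break
            model = {**model, abs(unit): unit > 0}
            cls = assign(cls, unit)
            if cls is None:
                return
        free = [v for v in variables if v not in model]
        if not free:
            yield model
            return
        v = free[0]
        for lit in (v, -v):                           # branch: include, then exclude
            reduced = assign(cls, lit)
            if reduced is not None:
                yield from search(reduced, {**model, v: lit > 0})
    yield from search(list(clauses), {})


def structures(processes: Sequence[str],
               constraints: Sequence[Tuple[str, Sequence[str]]]) -> List[FrozenSet[str]]:
    ids = {p: i + 1 for i, p in enumerate(processes)}
    clauses = [cl for kind, members in constraints
               for cl in to_cnf(kind, [ids[m] for m in members])]
    return [frozenset(p for p in processes if m[ids[p]])
            for m in all_models(clauses, list(ids.values()))]


# Hypothetical ground processes and the four example constraints.
PROCESSES = ["exponential(P)", "logistic(P)", "limited(P)", "nutrient_limitation(P,N)",
             "holling_1(P,G)", "holling_2(P,G)", "holling_3(P,G)",
             "nutrient_mixing(N)", "remineralization(N,G)"]
CONSTRAINTS = [
    ("necessary", ["nutrient_mixing(N)", "remineralization(N,G)"]),
    ("always-together", ["limited(P)", "nutrient_limitation(P,N)"]),
    ("exactly-one", ["exponential(P)", "logistic(P)", "limited(P)"]),
    ("at-most-one", ["holling_1(P,G)", "holling_2(P,G)", "holling_3(P,G)"]),
]
models = structures(PROCESSES, CONSTRAINTS)
print(len(models), "of", 2 ** len(PROCESSES))   # 36 of 512
```

Here the four example constraints cut 512 possible structures to 36, and only those would go on to parameter fitting.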

Advantages of SC-IPM

SC-IPM adds several powerful features to IPM, such as:

• constraints that limit the consideration of implausible models;

• constraint modularity that eases control of the search space.

The constraints used by SC-IPM typically come from a scientist’s implicit knowledge, and we can both elicit them through examples and learn them computationally.

Goal: Identify implicit or unknown constraints to use in future modeling tasks

Plan: Analyze the space of model structures; use machine learning techniques to help

Key Idea: Don't throw away any models; even the bad ones contain valuable information

Learning Constraints

Bridewell & Todorovski 2007, ILP and KCAP

1. Build and parameterize process models

2. Store the models for analysis

3. Formally describe the structure of the models

4. Identify good and bad models

5. Use ILP to generate descriptions of accurate and inaccurate model structures

6. Convert the descriptions into SC-IPM constraints

We chose Aleph by Ashwin Srinivasan due to its ready availability and capabilities.

Good and Bad Models

[Figure: good and bad model fits, 1996–1997 Ross Sea]

Extracted Constraints

A model that includes a second-order exponential mortality process for phytoplankton will be inaccurate. (positive: 560, negative: 0)

A model that includes the Lotka–Volterra grazing process will be inaccurate. (positive: 80, negative: 0)

A model that lacks both the first- and second-order Monod growth limitation processes between iron and phytoplankton will be inaccurate. (positive: 448, negative: 0)
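The counts attached to each extracted constraint can be read as coverage: we take "positive" to be the number of inaccurate models a rule describes and "negative" the number of accurate models it also describes, so a rule with zero negatives never misfires on the stored models. The sketch below is a simple propositional stand-in for the ILP step, not Aleph. It labels models with a hypothetical error threshold, enumerates conjunctions of "includes"/"lacks" literals up to length two (enough for the "lacks both ... and ..." form), and keeps rules that cover no accurate model. The stored models are invented for illustration:

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Literal = Tuple[str, str]   # ("includes" | "lacks", ground process name)


def holds(lit: Literal, model: FrozenSet[str]) -> bool:
    kind, proc = lit
    return proc in model if kind == "includes" else proc not in model


def learn_constraints(models: Sequence[Tuple[FrozenSet[str], float]], threshold: float,
                      min_pos: int = 1, max_len: int = 2):
    """Find conjunctions of literals that cover only inaccurate models.

    A model counts as a positive (inaccurate) example when its error exceeds
    threshold; accurate models are negatives. Returns (rule, positives, negatives)
    for every rule covering at least min_pos positives and no negatives.
    """
    processes = sorted(set().union(*(m for m, _ in models)))
    literals = [(kind, p) for p in processes for kind in ("includes", "lacks")]
    found = []
    for size in range(1, max_len + 1):
        for rule in combinations(literals, size):
            covered = [err for m, err in models if all(holds(lit, m) for lit in rule)]
            pos = sum(err > threshold for err in covered)
            neg = len(covered) - pos
            if neg == 0 and pos >= min_pos:
                found.append((rule, pos, neg))
    found.sort(key=lambda t: (-t[1], len(t[0])))   # most positives first, then shorter rules
    return found


# Hypothetical stored models: (structure, error on the data).
MODELS: List[Tuple[FrozenSet[str], float]] = [
    (frozenset({"logistic", "holling_2"}), 0.1),
    (frozenset({"logistic", "holling_1"}), 0.9),
    (frozenset({"exponential", "holling_2"}), 0.2),
    (frozenset({"exponential", "holling_1"}), 0.8),
    (frozenset({"logistic"}), 0.7),
]
for rule, pos, neg in learn_constraints(MODELS, threshold=0.5)[:2]:
    print(rule, f"(positive: {pos}, negative: {neg})")
# (('lacks', 'holling_2'),) (positive: 3, negative: 0)
# (('includes', 'holling_1'),) (positive: 2, negative: 0)
```

Converting the surviving rules into typed SC-IPM constraints is not shown.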

Apply Constraints to Other Problems

Ross Sea across years:
• Search spaces: 9x–16x smaller
• Model distribution: more accurate

Apply Constraints to Other Domains

Ross Sea to Bled Lake

Bridewell & Todorovski AAAI-08 (Transfer Learning Workshop)

Related Work

Other quantitative modelers

LAGRAMGE (Todorovski & Dzeroski)

PRET (Bradley & Stolle)

Metalearning and others

Learning Constraint Networks via Version Spaces (Bessiere et al.)

Relational Clichés (Silverstein & Pazzani; Morin & Matwin)

Mode Declarations in ILP (McCreath & Sharma)

Rule Reliability from Prior Performance (Mark Reid)

Future Directions

We are currently working in several directions, which include:

• continuing the analysis of constraint transfer;

• closing the automated modeling + constraint learning loop;

• basing new analyses and methodologies on model ensembles;

• adapting the general strategies to other tasks;

• supporting other modeling paradigms.

Inductive process modeling is a fruitful paradigm for exploring knowledge representation, modeling, discovery, and creativity in scientific practice.
