
Upload: katherine-mcwilliams

Post on 27-Mar-2015


Page 1: Pat Langley Institute for the Study of Learning and Expertise 2164 Staunton Court, Palo Alto, California and School of Computing and Informatics Arizona

Pat Langley

Institute for the Study of Learning and Expertise
2164 Staunton Court, Palo Alto, California

and

School of Computing and Informatics
Arizona State University, Tempe, AZ

http://www.isle.org/~langley

An Interactive Environment for Scientific Model Discovery

Thanks to N. Asgharbeygi, K. Arrigo, D. Billman, S. Borrett, W. Bridewell, S. Dzeroski, J. Sanchez, and L. Todorovski for their contributions to this research, which is funded by a grant from the National Science Foundation.


Scientific Discovery as Problem Solving

Computational scientific discovery is a research movement that:

attempts to understand the nature of scientific creativity, both by replicating episodes from the history of science and by generating entirely new scientific knowledge;

produces knowledge in formalisms used by scientists: descriptive laws (qualitative and quantitative) and explanatory models (structural and process);

treats discovery as a form of everyday problem solving, in which laws and models are mental problem states and discovery occurs through heuristic search.

When Simon (1966) first proposed this view, it seemed radical; now it has received wide acceptance.


Computational Research on Scientific Discovery

[Figure: timeline of computational discovery systems from 1979 to 2000, coded by type of knowledge (numeric laws, qualitative laws, structural models, process models). Systems shown include Bacon.1–Bacon.5, Abacus, Coper, Fahrenheit, E*, Tetrad, IDS, Hume, ARC, DST, GPN, LaGrange, SDS, SSF, RF5, LaGramge, Dalton, Stahl, RL, Progol, Gell-Mann, BR-3, Mendel, Pauli, Stahlp, Revolver, Dendral, AM, Glauber, NGlauber, Live, IE, Coast, Phineas, AbE, Kekada, Mechem, CDP, Astra, GPM, HR, and BR-4.]

Few efforts have attempted to model the details of human discovery, either historically or in the laboratory.

Most work has aimed to automate discovery rather than to produce computational aids.


Observations from the Ross Sea


A Model of the Ross Sea Ecosystem

$$
\begin{aligned}
d[\mathit{phyto},t,1] &= -0.307\,\mathit{phyto} - 0.495\,\mathit{zoo} + 0.411\,\mathit{phyto}\\
d[\mathit{zoo},t,1] &= -0.251\,\mathit{zoo} + 0.615 \cdot 0.495\,\mathit{zoo}\\
d[\mathit{detritus},t,1] &= 0.307\,\mathit{phyto} + 0.251\,\mathit{zoo} + 0.385 \cdot 0.495\,\mathit{zoo} - 0.005\,\mathit{detritus}\\
d[\mathit{nitro},t,1] &= -0.098 \cdot 0.411\,\mathit{phyto} + 0.005\,\mathit{detritus}
\end{aligned}
$$

Differential equation models of this sort are regularly used to explain observations and predict future behavior.
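To make concrete how such a model predicts behavior, here is a minimal sketch that integrates the four Ross Sea equations with forward Euler. It is an illustration rather than the authors' simulator: the coefficients are taken from the slide, while the function names, the initial concentrations, the step size `dt`, and the number of steps are hypothetical choices.

```python
"""Forward-Euler simulation of the four-equation Ross Sea model (a sketch).

Coefficients come from the slide; initial state, dt, and horizon are
hypothetical values used only for illustration.
"""


def derivatives(state):
    """Return d[x,t,1] for each variable, following the slide's equations."""
    phyto, zoo = state["phyto"], state["zoo"]
    detritus = state["detritus"]
    return {
        "phyto": -0.307 * phyto - 0.495 * zoo + 0.411 * phyto,
        "zoo": -0.251 * zoo + 0.615 * 0.495 * zoo,
        "detritus": (0.307 * phyto + 0.251 * zoo
                     + 0.385 * 0.495 * zoo - 0.005 * detritus),
        "nitro": -0.098 * 0.411 * phyto + 0.005 * detritus,
    }


def simulate(initial, dt, steps):
    """Integrate with forward Euler and return every state visited."""
    state = dict(initial)
    trajectory = [dict(state)]
    for _ in range(steps):
        rates = derivatives(state)
        state = {name: value + dt * rates[name] for name, value in state.items()}
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # Hypothetical initial concentrations in arbitrary units.
    start = {"phyto": 1.0, "zoo": 0.1, "detritus": 0.0, "nitro": 5.0}
    for t, s in enumerate(simulate(start, dt=0.1, steps=50)):
        if t % 10 == 0:
            print(t, {k: round(v, 3) for k, v in s.items()})
```

Forward Euler keeps the sketch dependency-free; for fitting real observations an adaptive ODE integrator would be the more usual choice.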


The Task of Model Construction

Environmental scientists are confronted with a challenging task:

Given: A set of variables of interest to the scientist;

Given: Observations of how these variables change over time;

Find: A model that explains these variations in plausible terms and that generalizes well to future observations.

Automating such model construction is a natural task for artificial intelligence and machine learning.

We can develop algorithms that search the space of differential equation models, but this space is huge, so we need constraints.


Another Account of the Ross Sea Ecosystem

$$
\begin{aligned}
d[\mathit{phyto},t,1] &= -0.307\,\mathit{phyto} - 0.495\,\mathit{zoo} + 0.411\,\mathit{phyto}\\
d[\mathit{zoo},t,1] &= -0.251\,\mathit{zoo} + 0.615 \cdot 0.495\,\mathit{zoo}\\
d[\mathit{detritus},t,1] &= 0.307\,\mathit{phyto} + 0.251\,\mathit{zoo} + 0.385 \cdot 0.495\,\mathit{zoo} - 0.005\,\mathit{detritus}\\
d[\mathit{nitro},t,1] &= -0.098 \cdot 0.411\,\mathit{phyto} + 0.005\,\mathit{detritus}
\end{aligned}
$$

As phytoplankton takes up nitrogen, its concentration increases and nitrogen decreases. This continues until the nitrogen supply is exhausted, which leads to a phytoplankton die-off. This produces detritus, which gradually remineralizes to replenish the nitrogen. Zooplankton grazes on phytoplankton, which slows the latter’s increase and also produces detritus.


Processes in the Ross Sea Ecosystem

$$
\begin{aligned}
d[\mathit{phyto},t,1] &= -0.307\,\mathit{phyto} - 0.495\,\mathit{zoo} + 0.411\,\mathit{phyto}\\
d[\mathit{zoo},t,1] &= -0.251\,\mathit{zoo} + 0.615 \cdot 0.495\,\mathit{zoo}\\
d[\mathit{detritus},t,1] &= 0.307\,\mathit{phyto} + 0.251\,\mathit{zoo} + 0.385 \cdot 0.495\,\mathit{zoo} - 0.005\,\mathit{detritus}\\
d[\mathit{nitro},t,1] &= -0.098 \cdot 0.411\,\mathit{phyto} + 0.005\,\mathit{detritus}
\end{aligned}
$$

The terms related to phytoplankton loss are -0.307 · phyto in d[phyto,t,1] and 0.307 · phyto in d[detritus,t,1]; this process decreases phyto concentration and increases detritus.

Knowledge about candidate processes requires that some terms occur either together or not at all.


Processes in the Ross Sea Ecosystem

$$
\begin{aligned}
d[\mathit{phyto},t,1] &= -0.307\,\mathit{phyto} - 0.495\,\mathit{zoo} + 0.411\,\mathit{phyto}\\
d[\mathit{zoo},t,1] &= -0.251\,\mathit{zoo} + 0.615 \cdot 0.495\,\mathit{zoo}\\
d[\mathit{detritus},t,1] &= 0.307\,\mathit{phyto} + 0.251\,\mathit{zoo} + 0.385 \cdot 0.495\,\mathit{zoo} - 0.005\,\mathit{detritus}\\
d[\mathit{nitro},t,1] &= -0.098 \cdot 0.411\,\mathit{phyto} + 0.005\,\mathit{detritus}
\end{aligned}
$$

The terms related to zooplankton grazing are -0.495 · zoo in d[phyto,t,1], 0.615 · 0.495 · zoo in d[zoo,t,1], and 0.385 · 0.495 · zoo in d[detritus,t,1]; grazing decreases phyto but increases zoo and detritus.

We can use knowledge about processes to reorganize models and constrain search through the model space.


A Process Model for the Ross Sea

model Ross_Sea_Ecosystem
  variables: phyto, zoo, nitro, detritus
  observables: phyto, nitro

  process phyto_loss
    equations: d[phyto,t,1] = -0.307 * phyto
               d[detritus,t,1] = 0.307 * phyto

  process zoo_loss
    equations: d[zoo,t,1] = -0.251 * zoo
               d[detritus,t,1] = 0.251 * zoo

  process zoo_phyto_grazing
    equations: d[zoo,t,1] = 0.615 * 0.495 * zoo
               d[detritus,t,1] = 0.385 * 0.495 * zoo
               d[phyto,t,1] = -0.495 * zoo

  process nitro_uptake
    equations: d[phyto,t,1] = 0.411 * phyto
               d[nitro,t,1] = -0.098 * 0.411 * phyto

  process nitro_remineralization
    equations: d[nitro,t,1] = 0.005 * detritus
               d[detritus,t,1] = -0.005 * detritus

This model is equivalent to a standard differential equation model, but it makes explicit assumptions about which processes are involved.

For completeness, we must also make assumptions about how to combine influences from multiple processes.
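A small sketch shows what treating processes as units buys. Each process of the Ross Sea model is written below as a hypothetical Python function returning its contributions to d[x,t,1]; combining influences by summation is our assumption, since the slide only notes that some combination rule is required. Summing all five processes reproduces the equation-level model, and dropping one process removes all of its terms at once, which is exactly the "together or not at all" constraint.

```python
"""The Ross Sea model as separate processes whose influences are summed.

Function names mirror the slide's process names; additive combination is an
assumption made for this sketch.
"""
from collections import defaultdict


def phyto_loss(s):
    return {"phyto": -0.307 * s["phyto"], "detritus": 0.307 * s["phyto"]}


def zoo_loss(s):
    return {"zoo": -0.251 * s["zoo"], "detritus": 0.251 * s["zoo"]}


def zoo_phyto_grazing(s):
    grazed = 0.495 * s["zoo"]  # phyto removed by grazing
    return {"zoo": 0.615 * grazed, "detritus": 0.385 * grazed, "phyto": -grazed}


def nitro_uptake(s):
    growth = 0.411 * s["phyto"]
    return {"phyto": growth, "nitro": -0.098 * growth}


def nitro_remineralization(s):
    return {"nitro": 0.005 * s["detritus"], "detritus": -0.005 * s["detritus"]}


PROCESSES = [phyto_loss, zoo_loss, zoo_phyto_grazing, nitro_uptake,
             nitro_remineralization]


def combine(processes, state):
    """Sum every process's contribution to obtain d[x,t,1] per variable."""
    total = defaultdict(float)
    for process in processes:
        for variable, rate in process(state).items():
            total[variable] += rate
    return dict(total)
```

For example, `combine([p for p in PROCESSES if p is not zoo_phyto_grazing], state)` evaluates the model without grazing, removing its three terms together.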


The Task of Inductive Process Modeling

We can use these ideas to reformulate the modeling problem:

Given: A set of variables of interest to the scientist;

Given: Observations of how these variables change over time;

Given: Background knowledge about plausible processes;

Find: A process model that explains these variations and that generalizes well to future observations.

We can use background knowledge about candidate processes to make search much more tractable.

Moreover, the resulting model will be consistent with this domain knowledge, making it more comprehensible.


Generic Processes as Background Knowledge

We cast background knowledge as generic processes that specify:

the variables involved in a process and their types;

the parameters appearing in a process and their ranges;

the forms of conditions on the process; and

the forms of associated equations and their parameters.

Generic processes are building blocks from which one can compose a specific process model.


Generic Processes for Aquatic Ecosystems

generic process exponential_loss
  variables: S{species}, D{detritus}
  parameters: α [0, 1]
  equations: d[S,t,1] = -1 * α * S
             d[D,t,1] = α * S

generic process remineralization
  variables: N{nutrient}, D{detritus}
  parameters: β [0, 1]
  equations: d[N,t,1] = β * D
             d[D,t,1] = -1 * β * D

generic process grazing
  variables: S1{species}, S2{species}, D{detritus}
  parameters: ρ [0, 1], γ [0, 1]
  equations: d[S1,t,1] = γ * ρ * S1
             d[D,t,1] = (1 - γ) * ρ * S1
             d[S2,t,1] = -1 * ρ * S1

generic process constant_inflow
  variables: N{nutrient}
  parameters: λ [0, 1]
  equations: d[N,t,1] = λ

generic process nutrient_uptake
  variables: S{species}, N{nutrient}
  parameters: θ [0, ∞], μ [0, 1], ν [0, 1]
  conditions: N > θ
  equations: d[S,t,1] = μ * S
             d[N,t,1] = -1 * ν * μ * S

Our current library contains about 20 generic processes, including ones with alternative functional forms for loss and grazing processes.
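One way to encode generic processes in a general-purpose language is sketched below. The `GenericProcess` class, its fields, and the parameter names `alpha`, `theta`, `mu`, and `nu` are hypothetical labels chosen for this sketch; it only illustrates the four ingredients listed earlier (typed variables, parameter ranges, a condition, and equations), type-constrained instantiation, and a threshold condition on the nutrient that gates the uptake equations.

```python
"""A hypothetical encoding of generic processes with typed roles."""
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Dict, Iterator, List, Tuple

Values = Dict[str, float]


@dataclass
class GenericProcess:
    name: str
    roles: List[Tuple[str, str]]                   # (role, type), e.g. ("S", "species")
    parameters: Dict[str, Tuple[float, float]]     # parameter -> (low, high) range
    equations: Callable[[Values, Values], Values]  # (role values, params) -> role rates
    condition: Callable[[Values, Values], bool] = lambda v, p: True

    def bindings(self, types: Dict[str, str]) -> Iterator[Dict[str, str]]:
        """Yield each assignment of distinct variables to roles with matching types."""
        for combo in permutations(types, len(self.roles)):
            if all(types[var] == rtype for var, (_, rtype) in zip(combo, self.roles)):
                yield {role: var for var, (role, _) in zip(combo, self.roles)}

    def rates(self, binding: Dict[str, str], state: Values, params: Values) -> Values:
        """Evaluate one instantiation; an unmet condition contributes nothing."""
        values = {role: state[var] for role, var in binding.items()}
        if not self.condition(values, params):
            return {}
        return {binding[role]: rate
                for role, rate in self.equations(values, params).items()}


exponential_loss = GenericProcess(
    "exponential_loss", [("S", "species"), ("D", "detritus")],
    {"alpha": (0.0, 1.0)},
    lambda v, p: {"S": -p["alpha"] * v["S"], "D": p["alpha"] * v["S"]})

nutrient_uptake = GenericProcess(
    "nutrient_uptake", [("S", "species"), ("N", "nutrient")],
    {"theta": (0.0, float("inf")), "mu": (0.0, 1.0), "nu": (0.0, 1.0)},
    lambda v, p: {"S": p["mu"] * v["S"], "N": -p["nu"] * p["mu"] * v["S"]},
    condition=lambda v, p: v["N"] > p["theta"])

# Variable types for the Ross Sea example.
TYPES = {"phyto": "species", "zoo": "species",
         "nitro": "nutrient", "detritus": "detritus"}
```

For instance, `list(exponential_loss.bindings(TYPES))` yields the two bindings that pair phyto or zoo with detritus, which correspond to the phyto_loss and zoo_loss processes of the Ross Sea model.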


Constructing Process Models

[Figure: heuristic search takes generic processes, variables (phyto, nitro, zoo, nutrient_nitro, nutrient_phyto), and observations as input and produces a process model.]

Generic processes:

process exponential_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ∞] * P

process logistic_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ∞] * P * (1 - P / [0, 1, ∞])

process constant_inflow
  variables: I {inorganic_nutrient}
  equations: d[I,t] = [0, 1, ∞]

process consumption
  variables: P1 {population}, P2 {population}, nutrient_P2
  equations: d[P1,t] = [0, 1, ∞] * P1 * nutrient_P2,
             d[P2,t] = -[0, 1, ∞] * P1 * nutrient_P2

process no_saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P

process saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P / (P + [0, 1, ∞])

Process model:

model AquaticEcosystem
  variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phyto
  observables: nitro, phyto, zoo

  process phyto_exponential_growth
    equations: d[phyto,t] = 0.1 * phyto

  process zoo_logistic_growth
    equations: d[zoo,t] = 0.1 * zoo * (1 - zoo / 1.5)

  process phyto_nitro_consumption
    equations: d[nitro,t] = -1 * phyto * nutrient_nitro,
               d[phyto,t] = 1 * phyto * nutrient_nitro

  process phyto_nitro_no_saturation
    equations: nutrient_nitro = nitro

  process zoo_phyto_consumption
    equations: d[phyto,t] = -1 * zoo * nutrient_phyto,
               d[zoo,t] = 1 * zoo * nutrient_phyto

  process zoo_phyto_saturation
    equations: nutrient_phyto = phyto / (phyto + 0.5)
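The AquaticEcosystem model mixes algebraic processes (the saturation and no-saturation terms) with differential ones. The sketch below evaluates its derivatives at a single state, assuming the algebraic equations are computed first and that influences on each variable are summed; the function name `aquatic_derivatives` is hypothetical, and the zoo logistic term is read as 0.1 * zoo * (1 - zoo / 1.5), following the generic logistic_growth form.

```python
def aquatic_derivatives(phyto, zoo, nitro):
    """Return d[x,t] for phyto, zoo, and nitro at one state of the model."""
    # Algebraic processes are evaluated first; they define auxiliary variables.
    nutrient_nitro = nitro                   # phyto_nitro_no_saturation
    nutrient_phyto = phyto / (phyto + 0.5)   # zoo_phyto_saturation

    # Each differential process adds its terms; influences are summed (assumed).
    d = {"phyto": 0.0, "zoo": 0.0, "nitro": 0.0}
    d["phyto"] += 0.1 * phyto                          # phyto_exponential_growth
    d["zoo"] += 0.1 * zoo * (1 - zoo / 1.5)            # zoo_logistic_growth
    d["nitro"] += -1 * phyto * nutrient_nitro          # phyto_nitro_consumption
    d["phyto"] += 1 * phyto * nutrient_nitro
    d["phyto"] += -1 * zoo * nutrient_phyto            # zoo_phyto_consumption
    d["zoo"] += 1 * zoo * nutrient_phyto
    return d
```

Swapping zoo_phyto_saturation for a no_saturation instance would change only the line that defines `nutrient_phyto`, which is what makes these alternatives cheap for the search to compare.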


A Method for Process Model Construction

Our initial system, IPM, constructs process models from generic components in four stages:

1. Find all ways to instantiate known generic processes with specific variables, subject to type constraints;

2. Combine instantiated processes into candidate generic models, subject to additional constraints (e.g., number of processes);

3. For each generic model, carry out search through parameter space to find good coefficients;

4. Return the parameterized model with the best overall score.

Our typical evaluation metric is squared error, but we have also explored other measures of explanatory adequacy.
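The four stages can be illustrated with a deliberately tiny version of the search. Everything in it is an assumption made for the sketch: the three-entry `LIBRARY`, its tuple encoding of generic processes, forward-Euler simulation with summed influences, and an exhaustive grid over parameter values, which stands in for whatever parameter-fitting method IPM actually uses. The score is squared error, as described above.

```python
"""A toy sketch of the four-stage process-model search (not IPM itself)."""
from itertools import combinations, permutations, product

# Hypothetical library: name -> (role types, number of parameters, rate function).
# A rate function maps (role values, parameters) to {role index: rate}.
LIBRARY = {
    "exp_growth": (("species",), 1, lambda v, p: {0: p[0] * v[0]}),
    "exp_loss": (("species",), 1, lambda v, p: {0: -p[0] * v[0]}),
    "uptake": (("species", "nutrient"), 1,
               lambda v, p: {0: p[0] * v[0] * v[1], 1: -p[0] * v[0] * v[1]}),
}


def instantiate(library, types):
    """Stage 1: bind each generic process to variables of matching types."""
    found = []
    for name, (roles, n_params, fn) in library.items():
        for combo in permutations(types, len(roles)):
            if all(types[var] == role for var, role in zip(combo, roles)):
                found.append((name, combo, n_params, fn))
    return found


def simulate(structure, params, init, dt, steps):
    """Forward-Euler run; influences on the same variable are summed."""
    state, trace = dict(init), [dict(init)]
    for _ in range(steps):
        rates = dict.fromkeys(state, 0.0)
        for (_, combo, _, fn), p in zip(structure, params):
            for idx, rate in fn([state[v] for v in combo], p).items():
                rates[combo[idx]] += rate
        state = {v: state[v] + dt * rates[v] for v in state}
        trace.append(dict(state))
    return trace


def squared_error(trace, observed):
    """Sum of squared differences over all observed variables and times."""
    return sum((s[v] - o[v]) ** 2 for s, o in zip(trace, observed) for v in o)


def ipm_sketch(library, types, init, observed, dt, max_processes=2,
               grid=(0.1, 0.2, 0.3, 0.4, 0.5)):
    candidates = instantiate(library, types)                 # stage 1
    best = (float("inf"), None, None)
    for k in range(1, max_processes + 1):
        for structure in combinations(candidates, k):        # stage 2
            slots = [product(grid, repeat=n) for _, _, n, _ in structure]
            for params in product(*slots):                   # stage 3
                trace = simulate(structure, params, init, dt, len(observed) - 1)
                score = squared_error(trace, observed)
                if score < best[0]:
                    best = (score, [(n, c) for n, c, _, _ in structure], params)
    return best                                               # stage 4
```

Because the number of candidate structures grows combinatorially with the library size and `max_processes`, exhaustive enumeration of this kind is only practical for toy problems.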


Results on Observations from the Ross Sea

We provided IPM with 188 samples of phytoplankton, nitrate, and ice measurements taken from the Ross Sea.

From 2035 distinct model structures, it found accurate models that limited phyto growth by the nitrate and the light available.

Some high-ranking models incorporated zooplankton, whereas others did not.


Results with Inductive Process Modeling

[Figure panels: population dynamics; battery behavior; hydrology; biochemical kinetics]


Extensions to Inductive Process Modeling

In recent work, we have extended our system to incorporate:

heuristic beam search through the space of process models;

hierarchical generic processes that further constrain search;

an ensemble-like method that mitigates overfitting effects;

an EM-like method that deals with missing observations.

This approach has great potential to speed the construction of scientific models, provided that domain users adopt it.
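Beam search avoids exhaustive enumeration by keeping only the most promising partial models at each step. The sketch below is a generic version, not the system's actual operators: a structure is a set of process names, each step adds one process, and a caller-supplied `score` (lower is better, as with squared error) ranks candidates. The parameters `beam_width` and `max_size` are hypothetical.

```python
"""Generic beam search over process-model structures (a sketch)."""
import heapq
from typing import Callable, FrozenSet, Iterable, List, Tuple

Structure = FrozenSet[str]


def beam_search(processes: Iterable[str],
                score: Callable[[Structure], float],
                beam_width: int = 3,
                max_size: int = 4) -> Tuple[float, Structure]:
    """Return the best (score, structure) pair found; lower scores are better."""
    processes = list(processes)
    beam: List[Structure] = [frozenset()]
    best: Tuple[float, Structure] = (float("inf"), frozenset())
    for _ in range(max_size):
        # Expand each structure in the beam by adding one process it lacks.
        children = {s | {p} for s in beam for p in processes if p not in s}
        if not children:
            break
        # Sorted names break ties deterministically between equal scores.
        scored = [(score(c), sorted(c), c) for c in children]
        kept = heapq.nsmallest(beam_width, scored)
        beam = [c for _, _, c in kept]
        if kept[0][0] < best[0]:
            best = (kept[0][0], kept[0][2])
    return best
```

In a real setting, `score` would fit parameters for each structure and return its error on the observations, so the beam width trades search cost against the risk of pruning the best model early.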


Interfacing with Scientists

Because few scientists want to be replaced, we are developing an interactive environment, PROMETHEUS, that lets users:

specify a quantitative process model of the target system;

display and edit the model’s structure and details graphically;

simulate the model’s behavior over time and situations;

compare the model’s predicted behavior to observations;

invoke a revision module in response to detected anomalies.

The environment offers computational assistance in forming and evaluating models but lets the user retain control.


Viewing a Process Model Graphically


Viewing a Process Model as Equations


Adding a Process Manually


Requesting Automatic Model Revision


Results of Automatic Model Revision


An Initial User Study

We asked three oceanographers to use PROMETHEUS to revise a model of the Ross Sea ecosystem; this study revealed that they:

liked the ability to alter models at an abstract process level;

appreciated the ability to simulate models and plot variables;

believed the environment would be a good teaching tool;

had some confusion about edge semantics in the graphical display;

had interference from STELLA’s stock-and-flow approach;

felt the system would be more useful if it supported PDE models.

We plan to address the last three issues in our next version of the modeling environment.


Directions for Future Research

Despite our progress to date, we need further work in order to:

provide better ways to visualize models, data, and their relations;

offer users more natural ways to define the space of models, such as specifying constraints on relations among entities and processes and characterizing subsystems that decompose complex models;

support finer-grained control of search through model space;

incorporate intuitive metrics like match to trajectory shape;

more generally, improve the usability of PROMETHEUS.

Taken together, these will make inductive process modeling a more robust approach to scientific model construction.


Intellectual Influences

Our approach to aiding scientific model construction incorporates ideas from many traditions:

computational scientific discovery (e.g., Langley et al., 1983);

theory revision in machine learning (e.g., Towell, 1991);

qualitative physics and simulation (e.g., Forbus, 1984);

languages for scientific simulation (e.g., STELLA, MATLAB);

interactive tools for data analysis (e.g., Shneiderman, 2001).

Our work combines, in novel ways, insights from machine learning, AI, programming languages, and human-computer interaction.


Contributions of the Research

In summary, our work on computational discovery has produced:

a new formalism for representing scientific process models;

a computational method for simulating these models’ behavior;

an encoding for background knowledge as generic processes;

algorithms for inducing process models from time-series data;

an interactive environment for model construction/utilization.

We believe that PROMETHEUS offers a promising approach for supporting innovative development of scientific models.


In Memoriam

Five years ago, the field of computational scientific discovery lost two of its founding fathers:

Herbert A. Simon (1916 – 2001)

Jan M. Zytkow (1945 – 2001)

Both contributed to the field in many ways: posing new problems, inventing methods, training students, and organizing meetings.

Moreover, both were interdisciplinary researchers who contributed to computer science, psychology, philosophy, and statistics.

Herb Simon and Jan Zytkow were excellent role models whom we should all aim to emulate.


End of Presentation


Relevant Historical Events

1956 – Newell, Shaw, and Simon develop the Logic Theorist, which uses heuristic search to prove theorems

1966 – Simon proposes computational modeling of scientific discovery as problem solving through heuristic search

Late 1970s – Lenat and Langley use this framework to reproduce episodes from the history of mathematics and physics

1987 – Langley, Simon, Bradshaw, and Zytkow report computational accounts for a wide range of historical discoveries

1990 – Valdes-Perez extends this approach to explanatory process models in physical chemistry

1994 onward – Researchers adapt these methods to discover knowledge in many scientific domains, leading to refereed publications

1995 onward – The data mining movement redirects attention toward business applications and purely predictive models

2006 – Challenges remain in constructing understandable scientific models