analytical pipelines

Post on 07-Jan-2016

DESCRIPTION

Pipelines and Scientific Workflows with Ptolemy II. Deana Pennington, University of New Mexico, LTER Network Office; Shawn Bowers, UCSD, San Diego Supercomputer Center.

TRANSCRIPT

Pipelines and Scientific Workflows with Ptolemy II

Deana Pennington
University of New Mexico
LTER Network Office

Shawn Bowers
UCSD
San Diego Supercomputer Center

Analytical Pipelines

ASx TS1 ASy ASz TS2 ASr

ASx

TS1

Analysis Step in an Execution Environment: SAS, MATLAB, etc.

Transformation Step

ASx

AP0

Library of Analysis Steps & Analytical Pipelines

ECO Taxon

Parameter Ontologies & Taxonomies

Semantic Mediation System (Logic Rules, Query Processing)

Parameters w/Semantics

AP0
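
As a rough illustration of the pipeline idea on this slide, the sketch below chains analysis steps (AS) and transformation steps (TS) as plain Python functions. The step bodies and the name `pipeline_ap0` are hypothetical placeholders chosen for the example; they are not taken from the slides or from Ptolemy II.

```python
from functools import reduce
from typing import Callable, List

Step = Callable[[list], list]

def as_x(values: list) -> list:
    """Hypothetical analysis step: keep only non-negative readings."""
    return [v for v in values if v >= 0]

def ts_1(values: list) -> list:
    """Hypothetical transformation step: convert every value to float."""
    return [float(v) for v in values]

def as_y(values: list) -> list:
    """Hypothetical analysis step: centre the values on their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def run_pipeline(steps: List[Step], data: list) -> list:
    """Apply each step in order, feeding each output to the next step."""
    return reduce(lambda acc, step: step(acc), steps, data)

# An analytical pipeline is an ordered list of steps.
pipeline_ap0 = [as_x, ts_1, as_y]
result = run_pipeline(pipeline_ap0, [2, -1, 4, 6])
```

Because a pipeline is just a list of steps, a stored pipeline can be reused as is or extended by appending further steps.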

Scientific Workflows

ASx TS1 ASy ASz TS2 ASr

Search for relevant data (Query)

ASx TS1 ASz TS2 ASr

TS2 ASr

Iterative

SW0

Benefits

• Reusable analysis steps, pipelines, and workflows
• Formal documentation of methods (output in report format)
• Reproducibility of methods
• Visual creation and communication of methods
• Versioning
• Automated data typing and transformation

Ptolemy II demo

Geographic Space Ecological Space

Projection back onto geography

Native range prediction

Invaded range prediction

Ecological Niche Modeling

Results used for integration with other data realms (e.g., human populations, public health, etc.)

Geospatial and remotely sensed data

Vegetation class

Precipitation

Modified from B. Michener

ecological niche modeling

Model of niche in ecological dimensions (axis labels: vegetation class, precipitation)

Model type:
• Linear regression (GRASP)
• Genetic algorithms (GARP)

Biodiversity information … e.g., data from museum specimens

Ecological Niche Models

Elevation (m)

Vegetation cover type

P, juniper, 2200m, 16C
P, pinyon, 2320m, 14C
A, creosote, 1535m, 22C

Sample 3, lat, long, absence

Mean annual temperature (C)

Access File

Excel File

Integrated data:

Sample 2, lat, long, presence

Sample 1, lat, long, presence

GARP Native-Species Pipeline (informal)

Diagram labels: EcoGrid Database (×4), EcoGrid Query (×2), Species pres. & abs. points, Env. layers, Layer Integration, Integrated layers, Sample Data (+A1, +A2, +A3), Training sample, Test sample, Data Calculation, GARP rule set, Map Generation, Native range prediction map, Validation, Model quality parameters, User, Selected prediction maps, Scaling, Physical Transformation, Generate Metadata, Archive to EcoGrid

GARP Native-Species Pipeline (informal)

We will look at this analytic step: Sample Data (+A1, +A2, +A3)

Labels around the highlighted step: Integrated layers, Species pres. & abs. points, Training sample, Test sample

Sample Data: Basic Input/Output

Step: Sample Data (parameters; +A1, +A2, +A3)

Input:
• Species presence points (Dependent-Variable Coordinates)
• Environmental Layers (temp., vegetation, etc.) (Independent-Variable Coordinates)

Output (presence under environmental conditions):
• Training Sample of Conditioned Data
• Test Sample of Conditioned Data

Analytic-Step Abstractions

Physical Level
An analytic step is a particular software implementation that takes and produces physical data (for example, files)

Logical Level
Defines the structure of input and output (like a database schema)

Semantic Level
Uses ontological information to conceptually define the analytic step (for discovery and integration)

Sample Data: Physical Level

Step: Sample Data (parameters; +A1, +A2, +A3)

An actual program that implements Sample Data

Data as comma-delimited, plain text files

Input:
33.454606, 106.789098;33.454606, 106.789097; …
33.454606, 106.789098, 56.25;33.454606, 106.789097, 56.37;…
33.454606, 106.789098, 56.25;33.454606, 106.789097, 56.37;…
33.454606, 106.789098, 56.25;33.454606, 106.789097, 56.37;…

Output:
1, 56.25, 0, 20, …, 44;0, 57.34, 0, 55, …, 14;…
0, 77.33, 1, 50, …, 44;1, 56.01, 0, 55, …, 14;…
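
To make the physical level concrete, here is a minimal sketch of reading the comma-delimited text shown on this slide. It assumes that records are separated by `;` and fields by `,`, which is how the sample data appears; the function name `parse_records` is hypothetical, and elisions inside a record (such as `20, …, 44`) are not handled.

```python
from typing import List, Tuple

def parse_records(text: str) -> List[Tuple[float, ...]]:
    """Parse 'a, b, c;a, b, c;...' into a list of float tuples.

    Records are separated by ';' and fields by ','. Empty records and a
    bare elision marker are skipped.
    """
    records = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk or chunk == "…":
            continue
        records.append(tuple(float(field) for field in chunk.split(",")))
    return records

# Species presence points: (lat, long) pairs.
points = parse_records("33.454606, 106.789098;33.454606, 106.789097; …")
# One environmental layer: (lat, long, value) triples.
layer = parse_records("33.454606, 106.789098, 56.25;33.454606, 106.789097, 56.37;…")
```

Every program that touches these files needs parsing code of this kind, which is the cost that the logical level is meant to hide.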

Logical descriptions

Recall that a schema sets the allowable structure for data

Employee

name : string, age : integer, ssn : string, title : string, salary : int

Smith 40 555-… 5

Jones 36 555-… 4

Davis 22 555-… 2

Clark 50 555-… Mgr. 75000

Lewis 36 555-… Sales 40000

These tables are not allowable instances of the logical description

Allen

Young

too many columns

too few columns, wrong datatypes
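
The schema check described on this slide can be sketched as follows. This is an illustrative example only: the `conforms` helper is hypothetical, the Clark and Smith rows are copied from the extracted slide text (with the elided ssn left as `555-…`), and `wrong_type_row` is an invented row added to show a datatype mismatch.

```python
from typing import List, Sequence, Tuple

# Logical description of Employee: (attribute name, Python type).
EMPLOYEE_SCHEMA: List[Tuple[str, type]] = [
    ("name", str), ("age", int), ("ssn", str), ("title", str), ("salary", int),
]

def conforms(row: Sequence[object], schema: List[Tuple[str, type]]) -> bool:
    """True when the row has one value per attribute, each of the declared type."""
    if len(row) != len(schema):
        return False  # too many or too few columns
    return all(isinstance(value, typ) for value, (_, typ) in zip(row, schema))

ok_row = ("Clark", 50, "555-…", "Mgr.", 75000)
short_row = ("Smith", 40, "555-…", 5)                       # too few columns
wrong_type_row = ("Lewis", "36", "555-…", "Sales", 40000)  # invented: age given as a string

checks = [conforms(r, EMPLOYEE_SCHEMA) for r in (ok_row, short_row, wrong_type_row)]
```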

Sample Data: Logical Level

Step: Sample Data (parameters; +A1, +A2, +A3)

Input:
• matrix[x, y]: 2-dimensional matrix
• list(matrix[x, y, z]): list of 3-dimensional matrices, one matrix per environmental layer

Output:
• sample1(pres, temp, veg, …, zn)
• sample2(pres, temp, veg, …, zn)
Relation of n+1 attributes for n environmental layers

Why have the Logical Level?

Data independence
Hides the details of how information is represented (text or binary files) from what is represented (a table of integers)

Reduced application development time
Makes information more easily reusable, for example, by other applications or services – with programs for handling the physical/logical level

Can help enable integration
Explicit knowledge of the structure and types of data can help automate conversion, for example, by using higher-level languages
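
A small sketch of data independence, under stated assumptions: the same logical table of integers is read from a text representation and from a binary one, and the application code (`column_sums`) works on either. The binary layout used here (row count, column count, then little-endian 32-bit integers) is invented for the example and is not a format from the slides.

```python
import struct
from typing import List

Table = List[List[int]]

def from_text(text: str) -> Table:
    """Read rows separated by ';' with comma-separated integer fields."""
    return [[int(f) for f in row.split(",")] for row in text.split(";") if row.strip()]

def to_binary(table: Table) -> bytes:
    """Hypothetical binary layout: row count, column count, then int32 values."""
    rows, cols = len(table), len(table[0])
    flat = [v for row in table for v in row]
    return struct.pack(f"<2i{rows * cols}i", rows, cols, *flat)

def from_binary(blob: bytes) -> Table:
    """Rebuild the logical table from the hypothetical binary layout."""
    rows, cols = struct.unpack_from("<2i", blob, 0)
    flat = struct.unpack_from(f"<{rows * cols}i", blob, 8)
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]

def column_sums(table: Table) -> List[int]:
    """Application code sees only the logical table, not the file format."""
    return [sum(col) for col in zip(*table)]

text_table = from_text("1, 0, 20;0, 0, 55")
binary_table = from_binary(to_binary(text_table))
```

Only the two reader functions know about the physical representation; swapping file formats does not touch `column_sums`.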

Choosing a logical representation

Step: Sample Data (parameters; +A1, +A2, +A3)

Input:
• matrix[x, y]: 2-dimensional matrix
• list(matrix[x, y, z]): list of 3-dimensional matrices, one matrix per environmental layer

Output:
• sample1(pres, temp, veg, …, zn)
• sample2(pres, temp, veg, …, zn)
Relation of n+1 attributes for n environmental layers

Can you see any potential problems with this choice of logical output?

Choosing a logical representation

Step: Sample Data (+A1, +A2, +A3)

Input:
• matrix[x, y]
• list(matrix[x, y, z])

Output:
• sample1(pres, z1, z2, …, zn)
• sample2(pres, z1, z2, …, zn)

Service (+A1, +A2, +A3): avail(pres, temp, veg, elev) ?

The output structure is dependent on the input data…

Choosing a logical representation

Step: Sample Data (+A1, +A2, +A3)

Input:
• matrix[x, y]
• list(matrix[x, y, z])

Output:
• sample1(obs, property, value)
• sample2(obs, property, value)

Service: avail(obs, property, value)

Reusability is easier when the logical representation is known ahead of time…
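
The point of the (obs, property, value) representation can be sketched as below: however many environmental layers go in, the output always has the same three attributes. The function name `to_triples` and the sample rows are hypothetical, loosely modelled on the values shown earlier in the slides.

```python
from typing import Dict, List, Tuple

Triple = Tuple[int, str, float]

def to_triples(rows: List[Dict[str, float]]) -> List[Triple]:
    """Turn each wide row into one (obs, property, value) triple per attribute."""
    triples = []
    for obs, row in enumerate(rows, start=1):
        for prop, value in row.items():
            triples.append((obs, prop, value))
    return triples

# Two inputs with different numbers of layers (hypothetical values).
one_layer = [{"pres": 1, "temp": 56.25}]
two_layers = [{"pres": 0, "temp": 57.34, "veg": 55}]

triples_a = to_triples(one_layer)
triples_b = to_triples(two_layers)
```

A downstream service that accepts (obs, property, value) can consume both results without knowing in advance which layers were used.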

Sample Data: Semantic input/output

Ontology concepts: EcologicalModel, BiodiversityModel, EcoNicheModel, RegressionBased ENM, LogisticRegression, RegressionModel, StatisticalModel, DependentVariable, IndependentVariable, StatisticalVariable, StatisticalContext

Ontology properties: usesRegressionModel, hasIndVar, hasDepVar, hasContext

Putting it all together

Physical = Data
Logical + Semantic = Metadata

Step: Sample Data (parameters; +A1, +A2, +A3)

Input:
• matrix[x, y]
  Data: 33.454606, 106.789098;33.454606, 106.789097; …
  Semantic labels: DependentVariable, hasContext, GridCoordinate, StatisticalContext
• list(matrix[x, y, z])
  Data: 33.454606, 106.789098, 56.25;33.454606, 106.789097, 56.37;… (shown three times)
  Semantic labels: IndependentVariable, hasContext, GridCoordinate, StatisticalContext

Output:
• sample1(obs, property, value)
  Data: 1, 56.25, 0, 20, …, 44;0, 57.34, 0, 55, …, 14;…
  Semantic labels: StatisticalDataset, hasDepVar, DependentVariable, hasIndVar, IndependentVariable
• sample2(obs, property, value)
  Data: 1, 56.25, 0, 20, …, 44;0, 57.34, 0, 55, …, 14;…
  Semantic labels: StatisticalDataset, hasDepVar, DependentVariable, hasIndVar, IndependentVariable

Domain Workflow

Diagram labels: EcoGrid Database (×4), EcoGrid Query (×2), Species pres. & abs. points, Env. layers, Layer Integration, Integrated layers, Sample Data (+A1, +A2, +A3), Training sample, Test sample, Data Calculation, GARP rule set, Map Generation, Native range prediction map, Validation, Model quality parameters, User, Selected prediction maps, Scaling, Physical Transformation, Generate Metadata, Archive to EcoGrid

Generic Workflow

Diagram labels: EcoGrid Database (×4), EcoGrid Query (×2), Occurrence Data (Binary, Categorical or Numeric), Environmental layers, Layer Integration, Integrated layers, Sample Data (+A1, +A2, +A3), Training sample, Test sample, Data Calculation, GARP (or other) rule set, GARP rule set, Map Generation, Prediction map, Validation, Model quality parameters, User, Selected prediction maps, Scaling, Physical Transformation, Generate Metadata, Archive to EcoGrid

Temperature Interpolation Workflow

Diagram labels: EcoGrid Database (×4), EcoGrid Query (×2), Weather station temperature data, Environmental layers: elevation, aspect, land cover, Layer Integration, Integrated layers, Sample Data (+A1, +A2, +A3), Training sample, Test sample, Data Calculation, GARP rule set, Map Generation, Prediction map: Interpolated temperature grid, Validation, Model quality parameters, User, Selected prediction maps, Scaling, Physical Transformation, Generate Metadata, Archive to EcoGrid

Extending Workflows: Climate

Current environmental layers:
ASx TS1 ASy ASz TS2 ASr
Prediction maps under current conditions

Changed environmental layers:
ASx TS1 ASy ASz TS2 ASr
Prediction maps under changed conditions

Compare to get predicted effect of environmental change on species

Prediction model from native area
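
The comparison step on this slide could look something like the following sketch. It assumes that each prediction map is a grid of 0/1 presence predictions of identical shape, which the slides do not specify; `compare_maps` and the two example grids are hypothetical.

```python
from typing import List

Grid = List[List[int]]

def compare_maps(current: Grid, changed: Grid) -> Grid:
    """Cell-wise difference: +1 = predicted gain, -1 = predicted loss, 0 = no change."""
    if len(current) != len(changed) or any(
        len(a) != len(b) for a, b in zip(current, changed)
    ):
        raise ValueError("prediction maps must have the same shape")
    return [
        [cell_chg - cell_cur for cell_cur, cell_chg in zip(row_cur, row_chg)]
        for row_cur, row_chg in zip(current, changed)
    ]

# Hypothetical presence/absence predictions under current and changed conditions.
current_map = [[1, 1, 0], [0, 1, 0]]
changed_map = [[0, 1, 1], [0, 1, 1]]
effect = compare_maps(current_map, changed_map)
```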

Extending Workflows: Invasion

Native area occurrence and environmental layers:
ASx TS1 ASy ASz TS2 ASr
Prediction maps in native area

Invasion area environmental layers:
ASx TS1 ASy ASz TS2 ASr
Prediction maps in invasion area

Prediction model from native area

Process

1. Create the domain workflow at a conceptual level
2. Define the physical and logical data types for each step
3. Define the ontological data types for each step, for both the domain and a generic ontology
4. Map the domain workflow to a generic workflow
5. Map the generic workflow to other domain workflows

Exercise

Divide into two groups (roughly half in each):
1. Climate change
2. Invasive species

Download generic workflow from:
ftp://ftp.lternet.edu/pub/outgoing/penningd

Work on conceptual workflows that:
1. Reuse the generic pipeline
2. Extend the generic pipeline
3. Create new pipelines

Use PowerPoint, Visio, or paper tablets… your choice!
