Capturing provenance data

Dr Alison McKay (in place of Dr Richard Bagshaw), University of Leeds, School of Mechanical Engineering

Post on 03-Jan-2016

TRANSCRIPT

Page 1: Capturing  provenance data

Capturing provenance data

Dr Alison McKay (in place of Dr Richard Bagshaw)

University of Leeds, School of Mechanical Engineering

Page 2: Capturing  provenance data

Distributed Aircraft Maintenance Environment - DAME

Purpose of presentation

• to present the DAME provenance research
• to discuss the experiences of deploying this technology in Grid-based systems

Page 3: Capturing  provenance data

Outline of presentation

• What do we mean by “provenance data”?
• What are we aiming for?
• What does achieving this goal entail?
• What progress has been made to date?
• What remains to be done?

Page 4: Capturing  provenance data

• Provenance Data
  – Recording the history of data and its place of origin

Page 5: Capturing  provenance data

DAME Provenance Architecture

[Diagram labels: Provenance Database; Provenance Viewer; Workflow Advisor; Workflow Script; Workflow Definition (BPEL); Workflow Manager; Workflow Instance (several); Service Instance]

Page 6: Capturing  provenance data

Outline of presentation

• What do we mean by “provenance data”?
• What are we aiming for?
• What does achieving this goal entail?
• What progress has been made to date?
• What remains to be done?

Page 7: Capturing  provenance data

RR Integrated Product Development process

[Diagram labels: Stage 1; New Project Planning; Business Concept Definition; Identify the Need; Preliminary Concept Definition; Stage 2 Full Concept Definition; Stage 3 Propulsion System Realisation; Stage 4 In-Service Monitoring & Technical Support; Capability Acquisition; Engine Launch; Entry into Service]

Page 8: Capturing  provenance data

DAME provenance data users

[Diagram labels: Provenance Requirement; Legal Implications; Audit Trail; Contractual Obligations; Troubleshooting; Re-run diagnosis]

Page 9: Capturing  provenance data

Potential benefits

[Figure: failure mode curves plotted against Time; axis marks: T, Textra]

• Failure mode curves: position and shape depend on
  – engine type (from PDM/SDM)
  – engine state (eg, age)
  – events (eg, from QUOTE data)
• This line shows when failure occurs; its position and shape depend upon its operating environment
• Position of an engine, ie, its current state of health

Page 10: Capturing  provenance data

Specific tasks to be supported

• Create an audit trail (Who, What, Where, Why, When, Which, hoW)

• Re-execute a workflow process
  – repeat a workflow process (same Grid resources & services, sequence and data)
  – rerun a workflow process (same Grid resources & services and sequence on different data)
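
The difference between repeating and rerunning a workflow, and the shape of an audit-trail entry, can be sketched in a few lines of Python. This is an illustration only, not the DAME implementation: the `AuditEntry` and `WorkflowRun` classes, their field names, the `repeat` and `rerun` helpers, and all sample values are hypothetical. It assumes a run can be described by the Grid resources and services used, the order of steps, and the input data.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One audit-trail record covering the seven questions listed above."""
    who: str        # user or service that acted
    what: str       # action performed
    where: str      # Grid resource or location
    why: str        # stated reason for the action
    when: datetime  # time stamp of the action
    which: str      # data item or workflow acted upon
    how: str        # service or tool used


@dataclass(frozen=True)
class WorkflowRun:
    """Hypothetical description of one executed workflow."""
    resources: tuple  # Grid resources and services used
    sequence: tuple   # ordered step names
    data: str         # identifier of the input data


def repeat(run: WorkflowRun) -> WorkflowRun:
    """Repeat: same resources, services, sequence and data."""
    return replace(run)


def rerun(run: WorkflowRun, new_data: str) -> WorkflowRun:
    """Rerun: same resources, services and sequence, different data."""
    return replace(run, data=new_data)


# Made-up sample values for illustration.
original = WorkflowRun(
    resources=("SDM", "XTO"),
    sequence=("select engine", "get control files", "run XTO"),
    data="engine-A",
)
repeated = repeat(original)
rerun_result = rerun(original, "engine-B")

entry = AuditEntry(
    who="engineer-1", what="rerun workflow", where="SDM",
    why="check diagnosis on another engine",
    when=datetime.now(timezone.utc),
    which=rerun_result.data, how="XTO",
)
```

In this sketch a repeat compares equal to the original run, while a rerun differs only in its `data` field, which is one way to make the two cases distinguishable in stored provenance.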

Page 11: Capturing  provenance data

Outline of presentation

• What are we aiming for?
• What does achieving this goal entail?
• What progress has been made to date?
• What remains to be done?

Page 12: Capturing  provenance data

Initial requirements

Support the re-execution of workflows with new data *

Provide provenance data for the Workflow Advisor

Provide a viewer for captured provenance data

* As opposed to repeating a given workflow using the same data and resources

Page 13: Capturing  provenance data

DS&S perspective on requirements

• Origin of data fully traceable
  – (Including time and date stamps)

• Processed data traceable through application software

• Any human interaction/annotations must be captured
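
One way to picture these three requirements together is a chain of data items, each stamped with a time and date, recording the software that produced it and any human annotations. The Python below is a minimal sketch under that assumption; `DataItem`, `annotate`, `trace_to_origin` and the sample names are hypothetical and do not describe how DAME stores provenance.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class DataItem:
    name: str
    origin: str                           # where the data came from
    produced_by: Optional[str] = None     # application software, if processed
    parent: Optional["DataItem"] = None   # item this one was derived from
    created: datetime = field(default_factory=_now)  # time and date stamp
    annotations: list = field(default_factory=list)  # human notes

    def annotate(self, author: str, note: str) -> None:
        """Capture a human annotation together with its own time stamp."""
        self.annotations.append((_now(), author, note))


def trace_to_origin(item: DataItem) -> List[DataItem]:
    """Walk the parent links back to the original data, oldest first."""
    chain = []
    current: Optional[DataItem] = item
    while current is not None:
        chain.append(current)
        current = current.parent
    chain.reverse()
    return chain


# Made-up example: raw data processed by one application.
raw = DataItem(name="raw-data", origin="engine download")
processed = DataItem(
    name="processed-data", origin="derived",
    produced_by="analysis-tool", parent=raw,
)
processed.annotate("analyst-1", "result reviewed")

lineage = [d.name for d in trace_to_origin(processed)]
```

Here `lineage` lists the items from the original data to the processed result, and each annotation carries its author and time stamp.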

Page 14: Capturing  provenance data

Research issues

[Diagram labels: Specify; Define; Execute / deploy; Product; Process; Product Data Management system; Service Data Manager; Workflow process definition; Workflow execution data]

Page 15: Capturing  provenance data

Process definition (as defined)

[Data model diagram. Entity labels: process definition; process; process relationship; composition relationship; connection relationship; process element; process element relationship; [GRID] resource; GRID resource usage. Attribute and role labels: id; name; description; start; end; date_and_time; resource; callee; caller; why_used; outcome; executed_by; related; relating; of. Cardinality marks: (1), (1), *]

Page 16: Capturing  provenance data

Process definition (as executed)

Case: Case_id, User_id, Open_date, Close_date, Flight_start_date, Deadline_date, Tail_number, Airline, Airport, Stand, Quote_diagnosis, Quote_status, Engineer, Engineer_active, Engineer_why, Analyst, Analyst_active, Analyst_why, Expert, Expert_active, Expert_why

Workflow: Workflow_sequence_number, Workflow_id, Workflow_author_id, Workflow_name, Workflow_description, Workflow_start_date, Workflow_end_date, Workflow_ip_data_type, Workflow_op_data_type, Workflow_diagnosis, Workflow_status

Resource: Resource_sequence_number, Resource_id, Resource_name, Resource_type, Resource_description, Resource_start_time, Resource_end_time, Resource_location, Resource_configuration, Resource_version_number, Resource_status, Resource_req_no_of_processors, Resource_req_memory, Resource_req_operating_system, Resource_req_op_sys_ver_number
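
The Case, Workflow and Resource field lists above read like three related tables. The sketch below uses Python's built-in `sqlite3` module to show how a subset of those fields might be stored and queried. The column names come from the slide; the column types, the table names, the foreign-key links between the tables, and every sample value are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE case_record (              -- "case" is an SQL keyword
    Case_id     TEXT PRIMARY KEY,
    User_id     TEXT,
    Open_date   TEXT,
    Tail_number TEXT
);
CREATE TABLE workflow (
    Workflow_id              TEXT PRIMARY KEY,
    Case_id                  TEXT REFERENCES case_record(Case_id),  -- assumed link
    Workflow_sequence_number INTEGER,
    Workflow_name            TEXT,
    Workflow_status          TEXT
);
CREATE TABLE resource (
    Resource_id              TEXT PRIMARY KEY,
    Workflow_id              TEXT REFERENCES workflow(Workflow_id),  -- assumed link
    Resource_sequence_number INTEGER,
    Resource_name            TEXT,
    Resource_type            TEXT,
    Resource_version_number  TEXT
);
""")

# Made-up sample rows.
conn.execute(
    "INSERT INTO case_record VALUES (?, ?, ?, ?)",
    ("case-1", "user-7", "2004-01-01", "TAIL-001"),
)
conn.execute(
    "INSERT INTO workflow VALUES (?, ?, ?, ?, ?)",
    ("wf-1", "case-1", 1, "diagnose", "complete"),
)
conn.executemany(
    "INSERT INTO resource VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("res-2", "wf-1", 2, "XTO", "application", "1.0"),
        ("res-1", "wf-1", 1, "SDM", "data storage", "1.0"),
    ],
)

# List, in execution order, the resources each workflow of a case used.
rows = conn.execute(
    """
    SELECT w.Workflow_name, r.Resource_sequence_number, r.Resource_name
    FROM case_record c
    JOIN workflow w ON w.Case_id = c.Case_id
    JOIN resource r ON r.Workflow_id = w.Workflow_id
    WHERE c.Case_id = ?
    ORDER BY w.Workflow_sequence_number, r.Resource_sequence_number
    """,
    ("case-1",),
).fetchall()
```

Ordering by the two sequence-number fields recovers the as-executed order of resources for a case, which is the kind of query an audit trail or re-execution would need.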

Page 17: Capturing  provenance data

MyGrid Workflow Provenance

• Workflow instance capture
  – Workflow overview
    • Workflow ID, Status, Start Time, End Time, overall inputs and outputs, Service List.
  – Service Invocations
    • Status, Start Time, End Time, WSDL URI, DataSets x 2.
  – Inputs and Outputs
    • ID, Name, Type, Value
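
The captured fields listed above nest naturally: a workflow instance holds service invocations, and both hold inputs and outputs. The Python below sketches that nesting and serialises it to JSON. It is not the MyGrid API or storage format: the class names, the reading of "DataSets x 2" as one input set and one output set, the example URI, and all sample values are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Parameter:
    """An input or output: ID, Name, Type, Value."""
    id: str
    name: str
    type: str
    value: str


@dataclass
class ServiceInvocation:
    status: str
    start_time: str
    end_time: str
    wsdl_uri: str
    inputs: List[Parameter] = field(default_factory=list)   # assumed data set 1
    outputs: List[Parameter] = field(default_factory=list)  # assumed data set 2


@dataclass
class WorkflowInstance:
    workflow_id: str
    status: str
    start_time: str
    end_time: str
    inputs: List[Parameter] = field(default_factory=list)
    outputs: List[Parameter] = field(default_factory=list)
    services: List[ServiceInvocation] = field(default_factory=list)


# Made-up sample values.
engine = Parameter("in-1", "engine", "string", "engine-A")
diagnosis = Parameter("out-1", "diagnosis", "string", "no fault found")

wf = WorkflowInstance(
    workflow_id="wf-1", status="complete",
    start_time="t0", end_time="t3",
    inputs=[engine], outputs=[diagnosis],
    services=[
        ServiceInvocation(
            status="complete", start_time="t1", end_time="t2",
            wsdl_uri="http://example.org/service?wsdl",
            inputs=[engine], outputs=[diagnosis],
        )
    ],
)

# asdict() converts nested dataclasses to plain dicts and lists.
record = json.dumps(asdict(wf), indent=2)
restored = json.loads(record)
```

A flat text format such as JSON is only one option; the choice of data storage format is left open in the talk.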

Page 18: Capturing  provenance data

Outline of presentation

• What do we mean by “provenance data”?
• What are we aiming for?
• What does achieving this goal entail?
• What progress has been made to date?
• What remains to be done?

Page 19: Capturing  provenance data

[Workflow diagram: Data interface GRID resource]

Legend: Interface (transfer) resource; Data storage resource; Transient data resource; Compute resource; Application resource; Interface (search) resource; User executed process step

Resource labels: XTO Control Files; XTO; MySQL-SDM2XTO; SDM; CR1

User executed process steps:
• Look at SDM to select an engine
• Get XTO control files for selected engine
• Run XTO for selected engine

Page 20: Capturing  provenance data

BOM data viewer

[Architecture diagram labels: Product data database; Software (Java); Software (Java); Software (Microsoft .Net); Web service: Database; Graphical user interface; Web service: Structure constructor]

Page 21: Capturing  provenance data

Outline of presentation

• What do we mean by “provenance data”?
• What are we aiming for?
• What does achieving this goal entail?
• What progress has been made to date?
• What remains to be done?

Page 22: Capturing  provenance data

Remaining tasks

• Support the re-execution of workflows with new data
• Provide provenance data for the Workflow Advisor
• Provide a viewer for captured provenance data
• Provide audit trail for accountability purposes

Page 23: Capturing  provenance data

Provenance research issues

• Provenance requirements and scope
• Provenance data security
• Data storage format
• Centralised provenance data
• Stop points for audit trails
• Repeatability of GRID resources

Page 24: Capturing  provenance data

Longer term research

[Diagram labels: Specify; Define; Execute / deploy; Product; Process; Requirements definition; Product Data Management system; Service Data Manager; Workflow process specification; Workflow process definition; Workflow execution data]