Process/data API

Upload: cain-villarreal

Post on 03-Jan-2016



TRANSCRIPT

Page 1: Process/data API

Process/data API

Page 2: Process/data API

Process API - intro

• The workflow engine runs applications
  – Executable code in different languages
  – API methods
  – Web services
• Applications require setup to run
  – Where are they
  – Where will they run (farm, local machine, specific machine)
  – Data IO
  – Version etc.

Page 3: Process/data API

Process API - Intro

• We do this in 2 ways
• As a single object process
  – We have defined a data object to hold things
  – We can use the same idea for the processAPI
  – Set up the object and “doIt”
• As setup calls and application call
  – Define setups for a process
  – Use a single call to run the process
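The two styles above might look roughly like this in Python. This is only a sketch: `ApiProcess.do_it`, `ProcessRunner`, `setup` and `run` are illustrative names chosen here, not the actual processAPI, and the "run" step just returns a string instead of launching anything.

```python
class ApiProcess:
    """Hypothetical single-object style: configure the object, then run it."""

    def __init__(self):
        self.fields = {}

    def set(self, key, value):
        self.fields[key] = value

    def do_it(self):
        # A real implementation would launch the application here.
        return "ran {process} on {location}".format(**self.fields)


class ProcessRunner:
    """Hypothetical call-based style: setup calls, then one call to run."""

    def __init__(self):
        self.setups = {}

    def setup(self, process, **options):
        self.setups[process] = options

    def run(self, process):
        options = self.setups[process]
        return "ran {} on {}".format(process, options["location"])


# Style 1: set up the object and "doIt".
proc = ApiProcess()
proc.set("process", "maxit")
proc.set("location", "server")
result_object_style = proc.do_it()

# Style 2: define the setup once, then run with a single call.
runner = ProcessRunner()
runner.setup("maxit", location="server")
result_call_style = runner.run("maxit")
```

Both styles carry the same information; the difference is whether the setup travels with the object or is held by the API between calls.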

Page 4: Process/data API

ProcessAPI

• The following are the fields within the WFE process object (ignoring WFE-specific ones)
  – Name & human-readable name : not important
  – Type
  – File : where, could be a URL
  – Data : see later
  – Runtime/fail time : does the API monitor these?
  – Parameters

Page 5: Process/data API

Process Object fields

• Type
  – I.e. is this an exec, URL, and so on
• Process
  – The actual mapped process name. A site-specific mapping will define the actual meaning of the process name
• Location
  – Where is the application to run (client/server/farm), or other things like URL
  – Is it useful to have this in the WFE XML file, or as a separate process API XML setup? I would think the latter.
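A site-specific mapping of this kind could be as simple as a lookup table. The sketch below uses an in-memory dictionary where a real site would presumably load the separate process API XML setup; the paths and the `resolve_process` helper are made up for illustration.

```python
# Hypothetical site mapping: logical process name -> site-specific details.
# The file path is invented for this example.
SITE_PROCESS_MAP = {
    "maxit": {"type": "exec", "file": "/opt/site/bin/maxit", "location": "server"},
    "APIcopy": {"type": "method", "file": None, "location": "API"},
}


def resolve_process(name, site_map=SITE_PROCESS_MAP):
    """Return the site-specific details for a logical process name."""
    try:
        return site_map[name]
    except KeyError:
        # Fail loudly so the workflow engine sees the problem.
        raise LookupError("no site mapping for process {!r}".format(name)) from None


details = resolve_process("maxit")
```

The workflow only ever mentions `maxit`; where it lives and where it runs stay in the site mapping.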

Page 6: Process/data API

Process-API

• Data
  – The WFE data object defines input and output at run time; only mutability is class (static)
  – We have to pass data to a process, so it might be sensible to put it in the process object
  – See the data API definition for the object
  – Some object containers are data in and some are data out; they need to have the same structure though

Page 7: Process/data API

Process-API

• Runtime and failtime
  – These are WFE exception manager properties
  – It might not be a good idea to reproduce the exception handling outside the WFE, as the WFE needs to handle any failure. Process failure must not be hidden from the WFE
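One way to keep failures visible is for the process layer to enforce the fail time but never swallow the resulting error. The sketch below assumes the fail time arrives as an `HH:MM:SS` string, as in the `failTime` attribute of the workflow XML; `hms_to_seconds` and `run_exec` are hypothetical helper names, and Python's standard `subprocess` module stands in for whatever launcher is really used.

```python
import subprocess
import sys


def hms_to_seconds(text):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def run_exec(command, fail_time):
    """Run a command; let timeouts and non-zero exits propagate to the caller."""
    # subprocess.TimeoutExpired and subprocess.CalledProcessError are
    # deliberately not caught here: the caller (the engine) must see them.
    return subprocess.run(command, timeout=hms_to_seconds(fail_time), check=True)


# Usage: a trivial child process that exits successfully.
completed = run_exec([sys.executable, "-c", "print('ok')"], "00:00:10")
```

With this shape the exception manager in the engine remains the only place that decides what a failure means.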

Page 8: Process/data API

Process API

• Parameters
  – Probably a python dictionary is best here
  – Needs to be exposed to the WFE since different parts of the workflow may need different parameters (consider MAXIT)
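As a rough illustration of the dictionary idea, per-task overrides can be merged into a set of defaults and flattened into an argument list. The flags are the ones from the example call in these slides (`-P 33 -x ddd`); `build_arguments` and the override value are assumptions made for this sketch.

```python
# Hypothetical defaults for one application.
DEFAULT_PARAMETERS = {"-P": "33", "-x": "ddd"}


def build_arguments(defaults, overrides=None):
    """Merge per-task overrides into the defaults and flatten to an argv list."""
    merged = dict(defaults)
    merged.update(overrides or {})
    args = []
    for flag, value in merged.items():
        args.append(flag)
        if value is not None:  # None marks a flag that takes no value
            args.append(str(value))
    return args


# Two tasks in the same workflow running the same program differently.
first_task_args = build_arguments(DEFAULT_PARAMETERS)
second_task_args = build_arguments(DEFAULT_PARAMETERS, {"-P": "50"})
```

Because the dictionary is plain data, the engine can inspect or change it between tasks without knowing anything about the program itself.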

Page 9: Process/data API

Process API

• The problem I have is defining which data object is which. The data object needs a definition so the program knows what the data is (see process API).
  – Using python class object

    ProcOb = ApiProcess()
    ProcOb.set('name', 'myAlignProg')
    ProcOb.set('parameters', '-P 33 -x ddd')
    ProcOb.set('type', 'exec')
    ProcOb.add('input', data.ob['D1'])
    ProcOb.add('input', data.ob['D2'])
    ProcOb.add('output1', data.ob['D3'])

These will of course be defined in the workflow engine variables. Note the adding of multiple data objects.

Page 10: Process/data API

Process API

• Program Exec
  – Executable
  – Process : use a mapped name for the application (site specific)
  – Location : local/server/farm (mapped names)
  – How do we know which objects are which?

    ProcOb = ApiProcess()
    ProcOb.set('type', 'exec')
    ProcOb.set('process', 'maxit')
    ProcOb.set('location', 'server')
    ProcOb.add('input', data.ob['D1'])
    ProcOb.add('input', data.ob['D2'])
    ProcOb.add('output1', data.ob['D3'])
    processAPI.run(ProcOb)

Page 11: Process/data API

Process API

• DataAPI copy
  – Copy data
  – Parameters = new version
  – Data objects : see later

    ProcOb = ApiProcess()
    ProcOb.set('name', 'copy')
    ProcOb.set('parameters', 'newVersion')
    ProcOb.set('process', 'method')
    ProcOb.set('location', 'dataAPI')
    ProcOb.add('input', data.ob['D1'])
    ProcOb.add('output', data.ob['D3'])
    processAPI.run(ProcOb)

Page 12: Process/data API

Automated questions in XML

<wf:task taskID="TD3" name="SequenceOK" nextTask="J1" breakpoint="false">
  <wf:description>Check whether the sequence align was OK</wf:description>
  <wf:decision type="AUTO">
    <wf:dataObjectsLocation>
      <wf:location dataID="D6" type="input"/>
    </wf:dataObjectsLocation>
    <wf:nextTasks>
      <wf:nextTask taskID="TW4">
        <wf:function dataID="D6" gte="20" less="200000000"/>
      </wf:nextTask>
      <wf:nextTask taskID="TM5">
        <wf:function dataID="D6" gte="2" less="20"/>
      </wf:nextTask>
      <wf:nextTask taskID="T9">
        <wf:function dataID="D6" gte="0" less="2"/>
      </wf:nextTask>
    </wf:nextTasks>
  </wf:decision>
</wf:task>

Decision data object

Decision option

More complex functions will require python methods specific to the question
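For the simple range case, an automated decision can be evaluated directly from the XML. The sketch below reads `gte` as an inclusive lower bound and `less` as an exclusive upper bound, which is an interpretation of the attribute names; the namespace URI is a placeholder (the slides never show the real one) and `choose_next_task` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, needed only so the wf: prefix parses.
NS = {"wf": "urn:example:workflow"}

DECISION_XML = """
<wf:decision xmlns:wf="urn:example:workflow" type="AUTO">
  <wf:nextTasks>
    <wf:nextTask taskID="TW4">
      <wf:function dataID="D6" gte="20" less="200000000"/>
    </wf:nextTask>
    <wf:nextTask taskID="TM5">
      <wf:function dataID="D6" gte="2" less="20"/>
    </wf:nextTask>
    <wf:nextTask taskID="T9">
      <wf:function dataID="D6" gte="0" less="2"/>
    </wf:nextTask>
  </wf:nextTasks>
</wf:decision>
""".strip()


def choose_next_task(decision, values):
    """Return the taskID of the first option whose range contains the value."""
    for next_task in decision.iterfind("wf:nextTasks/wf:nextTask", NS):
        fn = next_task.find("wf:function", NS)
        value = values[fn.get("dataID")]
        if float(fn.get("gte")) <= value < float(fn.get("less")):
            return next_task.get("taskID")
    return None  # no option matched; the engine has to treat this as an error


decision = ET.fromstring(DECISION_XML)
next_for_25 = choose_next_task(decision, {"D6": 25})
```

Here `values` stands in for the real values held by the engine's data objects, keyed by `dataID`.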

Page 13: Process/data API

Detail description to technology

• A data object is pre-declared in the XML
  – Data place holder
  – Defines API object detail
• A task object can reference data objects
  – As input, output or both
• A process task is one of:
  – API method
  – Exec program

<wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
  <wf:description>General object to copy</wf:description>
  <wf:location namespace="__old_object" where="DM"/>
</wf:dataObject>

<wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
  <wf:description>Run API task to copy data object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIcopy" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>

Page 14: Process/data API

Creating data objects in WFE

    # the data object ID
    self.object.set("deposition-dataset-ID", depID)
    self.object.set("workflow-class-ID", classID)
    self.object.set("workflow-instance-ID", instID)

    # type and access mode, taken from the XML attributes
    self.type = data.getAttribute("type")
    self.object.set("return-type", data.getAttribute("type"))
    if data.getAttribute("mutable") == "true":
        self.object.set("access", "read-write")
    else:
        self.object.set("access", "read-only")

    # internal workflow cross reference
    self.name = data.getAttribute("dataID")
    self.nameHumanReadable = data.getAttribute("name")

    # child elements: description and location
    for detail in data.childNodes:
        if detail.nodeName == "wf:description":
            self.description = detail.firstChild.data
        elif detail.nodeName == "wf:location":
            self.nameSpace = detail.getAttribute("namespace")
            self.object.set("data-object-name", detail.getAttribute("namespace"))
            self.where = detail.getAttribute("where")
            self.object.set("data-object-location", detail.getAttribute("where"))

Each data XML statement is stored as a reference object

This object is a place holder which can be passed to processes

It contains information where to access data
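A self-contained version of the same idea, runnable outside the engine, might look like this. It is a sketch under assumptions: `DataRef` and `parse_data_object` are names invented here, the namespace URI is a placeholder added only so that the `wf:` prefix parses, and the access mode is derived from the `mutable` attribute.

```python
from dataclasses import dataclass
from xml.dom import minidom

DATA_XML = """<wf:dataObject xmlns:wf="urn:example:workflow"
    dataID="D1" name="dataToCopy" type="Object" mutable="false">
  <wf:description>General object to copy</wf:description>
  <wf:location namespace="__old_object" where="DM"/>
</wf:dataObject>"""


@dataclass
class DataRef:
    """Place holder: says where the data is, but does not hold the data."""
    data_id: str
    name: str
    return_type: str
    access: str
    description: str = ""
    namespace: str = ""
    where: str = ""


def parse_data_object(node):
    """Build a DataRef from a wf:dataObject DOM element."""
    mutable = node.getAttribute("mutable") == "true"
    ref = DataRef(
        data_id=node.getAttribute("dataID"),
        name=node.getAttribute("name"),
        return_type=node.getAttribute("type"),
        access="read-write" if mutable else "read-only",
    )
    for detail in node.childNodes:  # includes whitespace text nodes, which are skipped
        if detail.nodeName == "wf:description":
            ref.description = detail.firstChild.data
        elif detail.nodeName == "wf:location":
            ref.namespace = detail.getAttribute("namespace")
            ref.where = detail.getAttribute("where")
    return ref


ref = parse_data_object(minidom.parseString(DATA_XML).documentElement)
```

The resulting `ref` is what would be handed to a process: enough to locate the data, with no payload attached.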

Page 15: Process/data API

The engine data object

  – May be a real or virtual payload of data
  – Where, what and type
  – Payload is passed between tasks
  – The WF is a data processing pipeline
    • A real value can be examined to affect the WF
    • The path is dependent on data values (auto/manual decisions are based on these values)
    • The data version is WF instance data
  – Can be domain data (via dataAPI)
  – Can be WF data (via statusAPI); scope is defined by the object the data is stored in

Page 16: Process/data API

Engine process manager

    import time

    def run(self):
        # This is a thread, running inside the exception manager
        self.status = 1

        # Send the request data objects
        # (inputObjects is assumed to be a dictionary keyed by data ID)
        for key, value in self.inputObjects.items():
            istat = myApi.do(value)

        # What sort of process is it?
        if self.task.uniqueType == "test":
            # test method - just counts for 5 seconds
            for i in range(5):
                time.sleep(1.0)
        elif self.task.uniqueType == "method":
            # this is an API process
            if self.task.uniqueWhere == "API":
                # this is an API method call
                self.processAPI.runMethod(self.task.uniqueName)
        elif self.task.uniqueType == "exec":
            # this is an exec program found "where"
            self.processAPI.runExec(self.task.uniqueName, self.task.uniqueWhere)

        # Get the response data objects
        for key, value in self.outputObjects.items():
            istat = myApi.do(value)

        self.statusAPI.setStatus("finished")

Page 17: Process/data API

Workflow granularity

• It does not really matter
• A process can be as complex as you like
  – Depends on go-back granularity
  – Depends on “how much you would lose if it crashed”
• Data is the problem!
  – The workflow is a flow of data, so hiding data from the engine will collapse a workflow to nothing
  – The pathway choice is all about data: the less visible the data, the less choice in the workflow
  – If a process decides what to do with data, the consequence is:
    • Lose go-back ability
    • Lose track of the data and what is going on
    • Lose plug and play on the process
    • Lose exception management

Page 18: Process/data API

Engine design examples

[Diagram: engine flow. Recoverable box labels, grouped by task type]

• Engine
  – Read XML, store objects and tasks
  – Run tasks, follow path
  – Start/restart (maybe at go-back point)
  – Exit
• Process task
  – Send data object requests
  – Run process
  – Get response data objects
• Interface task
  – Send data objects to interface
  – Wait for interface
  – Send actionable events
  – Get return action from interface

Page 19: Process/data API

John’s requirement 1

• 1) Identify, copy and archive an object
  – Object declaration

<wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
  <wf:description>General object to copy</wf:description>
  <wf:location namespace="__old_object" where="DM"/>
</wf:dataObject>
<wf:dataObject dataID="D2" name="dataCopy" type="Object" dependence="D1" mutable="true">
  <wf:description>General object - new copy of data</wf:description>
  <wf:location namespace="__new_object" where="DM"/>
</wf:dataObject>

  – Task declaration

<wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
  <wf:description>Run API task to copy data object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIcopy" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>

Name reference

The actual data

The process – a method within the API
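On the API side, the `name` in `<wf:detail>` has to resolve to something callable. One possible shape, sketched under assumptions, is a registry of named methods that operate on data object namespaces. The in-memory `store` stands in for the "DM" location, and `api_copy`, `API_METHODS` and `run_method` are hypothetical names rather than the real API.

```python
import copy

# Hypothetical in-memory stand-in for the "DM" data store.
store = {"__old_object": {"title": "example", "values": [1, 2, 3]}}


def api_copy(inputs, outputs):
    """Copy the single input object into the single output namespace."""
    store[outputs[0]] = copy.deepcopy(store[inputs[0]])


# Registry mapping the name in <wf:detail name="..."> to a Python callable.
API_METHODS = {"APIcopy": api_copy}


def run_method(name, inputs, outputs):
    """Look up an API method by name and run it on the given namespaces."""
    if name not in API_METHODS:
        raise LookupError("unknown API method {!r}".format(name))
    API_METHODS[name](inputs, outputs)


# Task T2: D1 (namespace __old_object) in, D2 (namespace __new_object) out.
run_method("APIcopy", ["__old_object"], ["__new_object"])
```

A deep copy is used so that later changes to the mutable copy cannot alter the immutable original.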

Page 20: Process/data API

John’s requirement 2: make new data version

• Declare data
  – Input D1
  – Output D2
• Declare task
  – Method in API

<wf:dataObjects>
  <wf:dataObject dataID="D1" name="dataToAddNewVersion" type="Object" mutable="true">
    <wf:description>General object to copy</wf:description>
    <wf:location namespace="__object" where="DM"/>
  </wf:dataObject>
  <wf:dataObject dataID="D2" name="dataNewVersion" type="Object" dependence="D1" mutable="true">
    <wf:description>New version of data</wf:description>
    <wf:location namespace="__object" where="DM"/>
  </wf:dataObject>
</wf:dataObjects>

<wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
  <wf:description>Run API task create a new version of an object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APInewVersion" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>

Page 21: Process/data API

John’s requirement 3: get version list and show

• Data: 3 objects
  – D1 : object target
  – D2 : version list
  – D3 : which one to use
• Some tasks
  – Get list from API
  – Interface to choose (not shown)

<wf:dataObject dataID="D1" name="dataObjectTarget" type="Object" mutable="false">
  <wf:description>target object to query on</wf:description>
  <wf:location namespace="__object_name" where="DM"/>
</wf:dataObject>
<wf:dataObject dataID="D2" name="VersionList" type="List" mutable="false">
  <wf:description>Return version list</wf:description>
  <wf:location namespace="versionList" where="local"/>
</wf:dataObject>
<wf:dataObject dataID="D3" name="useVersion" type="Integer" mutable="true">
  <wf:description>Version to use</wf:description>
  <wf:location namespace="version" where="WF"/>
</wf:dataObject>

<wf:task taskID="T2" name="requestVersionList" nextTask="T3" breakpoint="false">
  <wf:description>Run API to get the version list of an object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIversionList" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>

Page 22: Process/data API

John’s requirement 4/5: data selector

• A data object may need additional qualifiers to say what it is
  – Selector value
  – “Selection”
• It is likely that the qualifier will:
  – Need to be a WF class (static) variable
  – Need to be a WF instance (dynamic) variable

<wf:dataObject dataID="D2" name="dataToGetwithQualifier" type="String" mutable="true">
  <wf:description>general object with qualifier</wf:description>
  <wf:location namespace="__object" qualifier="_entity.id=1" where="DM"/>
</wf:dataObject>

<wf:dataObject dataID="D2" name="dataToGetwithQualifier" type="String" mutable="true">
  <wf:description>general object with qualifier</wf:description>
  <wf:location namespace="__object" qualifier="set_entity.type='protein' where entity.id=1" where="DM"/>
</wf:dataObject>

Page 23: Process/data API

John’s requirement 6: length/size of object

<wf:dataObject dataID="D1" name="dataTarget" type="Object" mutable="false">
  <wf:description>General object to copy</wf:description>
  <wf:location namespace="__object" where="DM"/>
</wf:dataObject>
<wf:dataObject dataID="D2" name="dataLength" type="integer" dependence="D1" mutable="true">
  <wf:description>Length of data object</wf:description>
  <wf:location namespace="dataLength" where="WF"/>
</wf:dataObject>

<wf:process runTime="00:00:04" failTime="00:00:10">
  <wf:detail name="APIObjectSize" type="method" where="API"/>
  <wf:dataObjectsLocation>
    <wf:location dataID="D1" type="input"/>
    <wf:location dataID="D2" type="output"/>
  </wf:dataObjectsLocation>
</wf:process>

Define object and place holder for size value

Run task to input data to function, and return length

Page 24: Process/data API

John’s requirement 7: format conversion

<wf:dataObjects>
  <wf:dataObject dataID="D1" name="dataObjectPDB" type="Object" mutable="false">
    <wf:description>General object to convert format</wf:description>
    <wf:location namespace="__object" where="DM"/>
  </wf:dataObject>
  <wf:dataObject dataID="D2" name="dataObjectMMCIF" type="Object" dependence="D1" mutable="true">
    <wf:description>New data in different format</wf:description>
    <wf:location namespace="__object" where="DF"/>
  </wf:dataObject>
  <wf:dataObject dataID="D3" name="status" type="string" dependence="D1" mutable="true">
    <wf:description>A status code return</wf:description>
    <wf:location namespace="__object" where="DF"/>
  </wf:dataObject>
</wf:dataObjects>

<wf:task taskID="T2" name="formatChange" nextTask="T9" breakpoint="false">
  <wf:description>Run API task to change the format of data</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIformatChangePDBtoPDBx" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
      <wf:location dataID="D3" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>

Input and output formats

Place holder for status – this might be so intrinsic to all tasks that it should probably be pre-declared and always present

And the API function to do this

Page 25: Process/data API