Informatica PowerCenter - Workflow Manager
TRANSCRIPT
Informatica PowerCenter Training - Day 3
Agenda – Day 3
Workflow Manager in Detail
Workflow Monitor
Error Logging
Labs
Workflow Manager – the Details
Workflow Manager - In Detail
The Workflow Manager is the key to loading the final product into the target database(s).
Used to manage how jobs run: order and criteria
Used for scheduling job runs
Used to notify users when a job has completed / failed
Used to partition loads and perform performance tuning
Register Server
Similar to the Relational Connection dialog. The same parameters apply, with the exception of the new variables for workflow logs.
Assign to Workflows
While folders are closed it is possible to assign a server to a particular session.
This dialog allows individual or globally selected sessions to be assigned to run on a particular server.
Links and Conditions
Definition
Links and their underlying conditions are what provide process control to the workflow. When an attached link condition resolves to TRUE, the attached object may begin processing. There can be no looping, and links can only execute once per workflow. However, more complex branching and decisions can be made by combining multiple links to a single object or branching into decision-type paths. Each link has its own expression editor and can utilize upstream resolved object variables or user-defined variables for its own evaluation.
Link condition
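As a sketch of what a link condition can look like (the session name s_load_customers is hypothetical), an expression combining a pre-defined status variable with a row-count check might be:

```
$s_load_customers.Status = SUCCEEDED AND $s_load_customers.TgtSuccessRows > 0
```

The downstream object attached to this link would start only if the session succeeded and wrote at least one row.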
Links and Conditions
Object Variables
The default set of object variables from a session can provide more information than just a status of ‘Completed’. More complex evaluation can be done for ErrorCode, StartTime, SrcSuccessRows, etc.
In addition to the default object variables, user-defined variables can be created and populated via parameter files or changed in the workflow via Assignment tasks. Also, any upstream task that has completed can have its variables utilized in downstream link conditions.
Tasks
Local Tasks – Sessions, Commands, Decision, Assignment, Timer, Control, Event Raise, Event Wait
Global (Reusable) Tasks – Sessions, Commands
Tasks are the default units of work for building the workflow. Global tasks are reusable across workflows. Local tasks are independent and self-contained within workflows.
Sessions
Session -> Workflow Notification
Options can be set to treat conditional links attached to the object as AND/OR functionality. There is also a control option to fail the parent (container) if the task fails or does not run.
Disabling a task in a workflow allows the task to be skipped instead of having to remove it.
Updated parameters
Sessions - Continued
Components
The area where commands or email unique to this object can be defined. You can alternately select a reusable task to use as well.
Choice of reusable or local command
Non-Reusable Commands
Components
Regardless of reusable or non-reusable, it is necessary to name the object since there is potential to promote it.
Option for local or reusable
Name of command object
Non-Reusable Commands
Components
The properties tab allows for error control for commands/tasks.
Error control for multiple commands/tasks
Sessions - Continued
Partitions
The new partitioning scheme allows for repartitioning after the Source Qualifier at almost any other transformation object in the mapping. There are four main partition types: Pass Through, Round Robin, Hash Auto Keys, Hash User Keys.
Add partition points
Change partition type
Session Partitions (Partition Points)
Partition points mark thread boundaries as well as divide the pipeline into stages.
The partition point at the source qualifier marks the boundary between the first (reader) and second (transformation) stages. The partition point at the Aggregator transformation marks the boundary between the second and third (transformation) stages. The partition point at the target instance marks the boundary between the third (transformation) and fourth (writer) stage.
Session Partitions (Partition Types)
Round-robin partitioning. The Informatica Server distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
Hash partitioning. The Informatica Server applies a hash function to a partition key to group data among partitions.
Key range partitioning. You specify one or more ports to form a compound partition key.
Pass-through partitioning. The Informatica Server passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.
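The distribution behavior of round-robin and hash partitioning can be sketched in Python. This is an illustration of the concepts only, not of PowerCenter internals; the row data is fabricated for the example:

```python
from collections import defaultdict

def round_robin(rows, n_partitions):
    """Distribute rows one at a time in turn, so partition sizes stay even."""
    parts = defaultdict(list)
    for i, row in enumerate(rows):
        parts[i % n_partitions].append(row)
    return parts

def hash_partition(rows, key, n_partitions):
    """Apply a hash function to the partition key, so rows with the same
    key value always land in the same partition."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % n_partitions].append(row)
    return parts

rows = [{"item": d} for d in ["bolt", "nut", "bolt", "screw", "nut", "bolt"]]

rr = round_robin(rows, 3)             # even load across 3 partitions
hp = hash_partition(rows, "item", 3)  # every "bolt" row in one partition
```

The hash guarantee is what makes hash partitioning suitable before group-sensitive transformations such as the Aggregator, while round-robin only balances volume.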
Partitions Defined
First stage. To read data from the three flat files concurrently, you must specify three partitions at the source qualifier. Accept the default partition type, pass-through.
Second Stage. Since the source files vary in size, each partition processes a different amount of data. Set a partition point at the Filter transformation, and choose round-robin partitioning to balance the load going into the Filter transformation.
Third Stage. To eliminate overlapping groups in the Sorter and Aggregator transformations, use hash auto-keys partitioning at the Sorter transformation. This causes the Informatica Server to group all items with the same description into the same partition before the Sorter and Aggregator transformations process the rows.
Fourth Stage. Since the target tables are partitioned by key range, specify key range partitioning at the target to optimize writing data to the target.
Command Tasks
Command
The command object can be created globally under the Task Developer. It can also be promoted here from within a mapping. The Command task is used to call shell commands during the workflow.
Created in Task Developer
Command Tasks
Command
The properties section houses the ability to either run all commands regardless, or run each only if the previous command completes. The Commands tab is where the actual commands are created, one command per line.
Process control for multiple commands
Email Tasks
Email
The Email task is very similar to the Command task since it can be either created in the Task Developer or promoted from a mapping. The properties tab allows for an expression editor for text creation utilizing the built-in variables.
Email text creation dialog
Built-in Variables
Workflow Variables
Pre-defined Variables
This is the list of all pre-defined task-level variables available to evaluate upon:
Variable          Task Type      Datatype
Condition         Decision task  Integer
EndTime           All tasks      Date/time
ErrorCode         All tasks      Integer
ErrorMsg          All tasks      Nstring*
FirstErrorCode    Session task   Integer
FirstErrorMsg     Session task   Nstring*
PrevTaskStatus    All tasks      Integer
SrcFailedRows     Session task   Integer
SrcSuccessRows    Session task   Integer
StartTime         All tasks      Date/time
Status**          All tasks      Integer
TgtFailedRows     Session task   Integer
TgtSuccessRows    Session task   Integer
TotalTransErrors  Session task   Integer

* Variables of type Nstring can have a maximum length of 600 characters.
** Supported Status returns: ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED, SUCCEEDED
Workflow Variables
User-defined Variables
Variables are created at the container level, much like the mappings (Workflows=Mappings, Worklets=Mapplets). Once created, values can be passed to objects within the same container for evaluation. (An Assignment task can modify/calculate variables.)
Edit Variables
Workflow Variables
User-defined Variables
A user-defined variable can assist in more complex evaluations. In the above example, an external parameter file contains the number of expected rows. This in turn is evaluated against the actual rows successfully read from an upstream session. $ signifies and is reserved for pre-defined variables. User-defined variables should maintain $$ naming.
User Defined Variables
Pre-Defined Variable
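As a hedged sketch of this pattern (the folder, workflow, and session names here are hypothetical, and the exact parameter-file section syntax should be checked against your PowerCenter version), the parameter file entry might look like:

```
[MyFolder.WF:wf_daily_load]
$$ExpectedRows=25000
```

The link condition downstream of the session would then compare the pre-defined variable to the user-defined one, e.g. `$s_read_stage.SrcSuccessRows >= $$ExpectedRows`.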
Assignment Task
Usage
The Assignment task allows the user to assign a value to a user-defined workflow variable. To use the Assignment task, first create and add the Assignment task to the workflow. Then configure the Assignment task by assigning values or expressions to user-defined variables. This assigned value will then be used for the remainder of the workflow.
Edit Variables
Event Task
Usage
Event tasks are used to specify the sequence of task execution. The event is triggered based on the completion of a sequence of tasks. The Event-Raise task and Event-Wait task help to use event tasks in a workflow.
Edit Events
Event Task
Usage
If using event tags, then an Event Raise is used in conjunction with an Event Wait. In the above example two branches are executed in parallel. The second session of the lower branch will remain in stasis until the upper branch completes, triggering the event. The lower branch's Event Wait task recognizes the event and allows the second session to start.
Event Raise
Event Wait
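The raise/wait coordination between the two branches is analogous to Python's threading.Event. This is an analogy only, not how PowerCenter implements it; the "session" log entries are placeholders:

```python
import threading

event = threading.Event()          # stands in for the user-defined event tag
log = []

def upper_branch():
    log.append("upper: session 1")
    event.set()                    # Event Raise: signal that the branch is done

def lower_branch():
    log.append("lower: session 1")
    event.wait()                   # Event Wait: block until the event is raised
    log.append("lower: session 2")

t1 = threading.Thread(target=lower_branch)
t2 = threading.Thread(target=upper_branch)
t1.start(); t2.start()
t1.join(); t2.join()
# "lower: session 2" can only appear after "upper: session 1"
```

As in the workflow, the two branches run in parallel, but the second session of the lower branch is held until the upper branch raises the event.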
Event Raise
Usage
To configure the Event Raise task, the drop-down box allows for selection of the appropriate user-defined event tag. This will create an entry in the repository for a matching Event Wait to look for.
Event Wait
Usage
The Event Wait allows for configuration for an Event Raise (user-defined event) or an existence check for an indicator file.
User Defined Event
Indicator File
Event Wait
Usage
The properties section of the Event Wait task allows for further definition of behavior. If your workflow has failed/suspended after the Event Raise but before the Event Wait has resolved, then Enable Past Events is able to recognize that the event has happened already. If working with indicator files, you have the ability to either delete the file or allow it to stay in case some downstream Event Waits are also looking for that file.
Resume/Restart Support
Flat-file Cleanup
Decision Task
Usage
The Decision task allows for True/False-based branching of process ordering. The Decision task can house multiple conditions, and therefore downstream links can be evaluated simply upon the Decision being True or False.
**Note: it is possible to have the decision based on SUCCEEDED or FAILED of a previous task; however, if the workflow is set to suspend on error, then that branch is suspended and the decision won't trigger on a FAILED condition.
Control Task
Usage
The Control task is utilized in a branching manner to present a level of stoppage during the workflow. Consider if too many sessions have too many failed rows. The options allow for different levels, from failing at the object level to aborting the whole workflow.
Timer Task
Usage
The Timer task has two main ways to be utilized. The first way is by absolute time, that is, time evaluated by server time or a user-defined variable (that contains the date/time stamp to start).
Timer Task
Usage
The second usage is by relative time, which offers options of time calculated from when the process reached this (Timer) task, from the start of the container of this task, or from the start of the absolute top-level workflow.
Practical
Business Case
Need for three sessions to wait for indicator file(s) to start each one. The window of opportunity is only between 10PM and 2AM (next morning). A cutoff time is needed to stop the process (polling - not existing runs) so that new activity does not continue between 2AM and 10PM. The workflow is scheduled to run every day at 10PM.
Objects Used:
•Assignment Task – Assigns the appropriate cutoff time for the logic
•File Wait Tasks – Poll for the appropriate indicator files
•Timer Task – Assigned to start based on the variable assigned by the Assignment task
•Command Tasks – After the cutoff time the commands will put an indicator file in place to release the polling
Link Logic – The remainder of the logic is contained within the links themselves. The main sessions evaluate the end time of the File Wait tasks against the cutoff time. If within the cutoff, the sessions will run; if over the cutoff, they will not. The cutoff branch also evaluates whether the File Wait tasks are running over. If they are still running, the Command tasks will fire.
Practical-Descriptive
Products.sql
Labs
Error Logging
Error Types
Transformation Error
•Data row has only passed partway through the mapping transformation logic
•An error occurs within a transformation
Data Reject
•Data row is fully transformed according to the mapping logic
•Due to a data issue, it cannot be written to the target
•A data reject can be forced by an Update Strategy
Error Types
Error Log Options are set in the Session task (via Workflow Manager)
Error Type             Logging OFF (default)                               Logging ON
Transformation errors  Written to session log, then discarded              Appended to flat file or relational tables; only fatal errors written to session log
Data rejects           Appended to reject file (one .bad file per target)  Written to row error tables or file
Error Logging Off
Transformation Errors
•Details and data written to the session log
•Data row is discarded
•If data flows are concatenated, corresponding rows in the parallel flow are also discarded
Data Rejects – conditions causing data to be rejected include:
•Target database constraint violations, out-of-space errors, logspace errors, null values not accepted
•Data-driven records containing value ‘3’ or DD_REJECT (the reject has been forced by an Update Strategy)
•Target table property ‘reject truncated/overflowed rows’
Error Logging to a Relational Database
Option set in Session Configuration
Results written to several tables:
PMERR_SESS: Stores metadata about the session run such as workflow name, session name, repository name, etc.
PMERR_MSG: Error messages for a row of data are logged in this table
PMERR_TRANS: Metadata about the transformation such as transformation group name, source name, port names with datatypes are logged in this table
PMERR_DATA: The row data of the error row as well as the source row data is logged here. The row data is in a string format such as [indicator1:data1 | indicator2 : data2]
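A sketch in Python of how these tables relate. The column names below are simplified assumptions for illustration; consult the actual PMERR_* schemas in your repository database:

```python
import sqlite3

# Simplified stand-ins for PMERR_SESS and PMERR_MSG, to show the join pattern:
# error messages are tied back to session metadata via the session instance ID.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PMERR_SESS (sess_inst_id INTEGER, workflow_name TEXT, session_name TEXT);
CREATE TABLE PMERR_MSG  (sess_inst_id INTEGER, error_msg TEXT);
INSERT INTO PMERR_SESS VALUES (806, 'w_unitTests', 's_customers');
INSERT INTO PMERR_MSG  VALUES (806, 'Invalid date in port IN_DATE');
""")

# All error messages for a given workflow, joined by session instance.
rows = con.execute("""
    SELECT s.workflow_name, s.session_name, m.error_msg
    FROM PMERR_MSG m JOIN PMERR_SESS s ON m.sess_inst_id = s.sess_inst_id
    WHERE s.workflow_name = 'w_unitTests'
""").fetchall()
```

The same join style extends to PMERR_TRANS and PMERR_DATA for the transformation metadata and row data.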
Error Logging to a Flat File
Option set in Session Configuration
Format: Session metadata followed by de-normalized error information
Sample session metadata:
Repository GID: 510u6f02-8733-11d7-9db7-00e01823c14d
Repository: RowErrorLogging
Workflow: w_unitTests
Session: s_customers
Mapping: m_customers
Workflow Run ID: 6079
Worklet Run ID: 0
Session Instance ID: 806
Session Start Time: 10/19/2004 11:24:15
Session Start Time (UTC): 1066587856
Row data format:
Transformation || Transformation Mapplet Name || Transformation Group || Partition Index || Transformation Row ID || Error Sequence || Error Timestamp || Error UTC Time || Error Code || Error Message || Error Type || Transformation Data || Source Mapplet Name || Source Name || Source Row ID || Source Row Type || Source Data
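A minimal Python sketch of splitting one such ‘||’-delimited error row into named fields. The sample row below is fabricated for illustration:

```python
FIELDS = [
    "Transformation", "Transformation Mapplet Name", "Transformation Group",
    "Partition Index", "Transformation Row ID", "Error Sequence",
    "Error Timestamp", "Error UTC Time", "Error Code", "Error Message",
    "Error Type", "Transformation Data", "Source Mapplet Name",
    "Source Name", "Source Row ID", "Source Row Type", "Source Data",
]

def parse_error_row(line):
    """Split a de-normalized error row on '||' and map values to field names."""
    values = [v.strip() for v in line.split("||")]
    return dict(zip(FIELDS, values))

# Fabricated sample row in the documented field order.
sample = ("EXP_Validate||N/A||Input||1||42||1||10/19/2004 11:24:16||1066587857"
          "||TT_11019||Port [IN_DATE]: invalid date||3||d:10/19/2004"
          "||N/A||SQ_customers||42||0||d:10/19/2004")

row = parse_error_row(sample)
```

Note this naive split assumes no field value itself contains ‘||’; a production parser would need to account for delimiters appearing inside the data.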
Log Source Row Data
Separate checkbox in Session task
Logs the source row associated with the error row
Logs metadata about the source, e.g. Source Qualifier, source row ID, and source row type
NOTE: Source row logging is not available downstream of an Aggregator, Joiner, Sorter, or other transformation (where output rows are not uniquely correlated with input rows).
Labs