8.0 master data extract guide_reva

MASTER DATA EXTRACT GUIDE

Release 8.0

Revision A

Initiate™, Initiate℠ and Initiate Identity Hub™ are trademarks and/or service marks of Initiate Systems, Inc., which may be registered in some jurisdictions. All rights reserved. All other marks are owned by their respective owners. The information in this document is protected under the applicable federal law as an unpublished work, and is confidential and proprietary to Initiate Systems, Inc. Its use, disclosure, reproduction, or publication, in whole or in part, without the express prior written consent of Initiate Systems, Inc. is prohibited.


Table of Contents

ABOUT THIS MANUAL
    Audience and purpose
    Organization
    Additional reference documentation
    How to get help
        ATSC
        Support Center Knowledge Base
    Acknowledgements

CHAPTER 1: MASTER DATA EXTRACT OVERVIEW
    Clover.ETL basics
    The Master Data Extract Sample Graphs

CHAPTER 2: USING THE MASTER DATA EXTRACT SAMPLE GRAPHS
    Importing the sample graphs
    Configuring Readers
        Creating a database connection
        Specifying a database connection for each Reader
    Configuring the extract_full_all.grf sample graph
    Configuring the extract_incremental_db.grf sample graph
        Parameters for incremental extraction
    Configuring the extract_incremental_file.grf sample graph
        Parameters for incremental extraction
    Running a graph
    Troubleshooting graphs
        Debugging a graph Edge
        Viewing logs and error messages
    Automatic graph execution
        Using the madconfig utility to create a properties file for a scheduled job
        Using madconfig to launch a graph using a specified properties file
        Recording responses to the madconfig utility
    Using extract.ddl to create target database schema


About this manual

Audience and purpose

This guide is intended for solution architects and developers responsible for developing Extract, Transform, Load (ETL) graphs for data extraction. Through discussion of sample graphs, this guide describes how to extract data from Master Data Engine database tables, transform it for use with downstream applications such as reporting and analytic tools, and write the transformed data to database tables or extract files.

Organization

The information presented includes:

Contents of Manual

In Chapter…    You will find…
1              Overview of the Master Data Extract application
2              Detailed information about using the Master Data Extract sample graphs

Additional reference documentation

For additional information, refer to the following documents:

Workbench User Guide

Initiate Master Data Service™ Data Model Description

Clover Documentation:

The Clover.GUI User’s Guide, which can be downloaded at http://www.cloveretl.org/documentation/clover-gui

The Clover.ETL Wiki at http://wiki.clovergui.net/doku.php

How to get help

ATSC

Each organization designates two (2) or more individuals to act as Authorized Technical Support Contacts (ATSC) for Initiate® software issues. These individuals interact with users in your organization and, when necessary, work with the technical support staff at Initiate Systems, Inc. to resolve issues.

When you have questions or concerns about the software, and if the information in this guide does not answer your questions, contact your ATSC. Your ATSC will try to determine if the problem is a hardware system issue or an operational issue before contacting Initiate Systems for assistance.


Support Center Knowledge Base

We realize that you might have questions that may not be addressed in the documentation, training, or within your standard workflow procedure. The Initiate Systems Customer Support Web site (http://www.initiatesystems.com/support) provides a knowledge base that offers additional information about Initiate® products and their use. New items are frequently added, so please refer to the Web site when possible.

Acknowledgements

Third-party software code files (the “Third Party Code”) are shipped along with the Initiate® 8.0 software. Third Party Code files are the property of their respective owners, not of Initiate Systems, and Initiate Systems claims no rights in or to the Third Party Code. Your use of and access to the Third Party Code is governed by the specific restrictions and limitations set forth in the applicable licenses provided by the Third Party Code owners. The Third Party Code is provided to you by Initiate solely for use with the Initiate® software product, and Initiate Systems does not authorize or promote any other use of the Third Party Code by you. The full text of the applicable Third Party Code licenses is provided in the Third Party License.zip file included with Initiate® Release 8.0, located on the Initiate Systems product CD or downloaded CD image.


Chapter 1: Master Data Extract overview

Master Data Extract uses Clover.ETL, an open-source Extract, Transform, Load (ETL) utility, to extract data from the Master Data Engine to external files or database tables for use with reporting and analytical systems. Extracts are designed and executed as graphs in Clover.ETL, and can be either full or incremental.

Master Data Extract provides several sample graphs that illustrate how the ETL process works. The sample graphs extract entity-level attribute data from the Master Data Engine database and write it to a variety of output targets.

The sample graphs are designed to be examples of how to use the Clover.ETL application; some configuration is necessary in order to use the graphs with your own data. In addition, graphs can be edited and customized according to your specific data extraction requirements.

Clover.ETL basics

For basic information on using the Clover.ETL application, refer to the Initiate™ Workbench User Guide and to the Clover documentation.


The Master Data Extract Sample Graphs

Master Data Extract provides three sample data extract graphs:

extract_full_all.grf: This graph does a full extract of entity-level attribute data from the Master Data Engine database, removes duplicate entities, and writes the output to a selected target file or database. The graph consists of several “subgraphs” (series of connected Readers, Transformers, and Writers) that operate in parallel; each subgraph reads data from a specific database table in the Master Data Engine database.



extract_incremental_db.grf: This graph does an incremental extract of entity-level attribute data from the Master Data Engine database, filters it on audit record numbers that the user supplies as configuration parameters, removes duplicates, and writes the output to a specified database. The graph consists of several “subgraphs” (series of connected Readers, Transformers, and Writers) that operate in parallel; each subgraph reads data from a specific database table in the Master Data Engine database.



extract_incremental_file.grf: Like extract_incremental_db.grf, this graph does an incremental extract of entity-level attribute data, filters it on user-supplied audit record numbers, and removes duplicates. The output of this graph is written to a series of delimited files. The graph consists of several “subgraphs” (series of connected Readers, Transformers, and Writers) that operate in parallel; each subgraph reads data from a specific database table in the Master Data Engine database.
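The incremental flow described above (filter on audit record numbers, remove duplicates, write delimited output) can be illustrated outside Clover.ETL with a minimal Python sketch. The field names (entrecno, audrecno, attrval), the inclusive range, the pipe delimiter, and the rule of keeping the row with the highest audit record number for each entity are assumptions made for illustration; they are not taken from the sample graphs.

```python
import csv
import io


def incremental_extract(rows, min_audrecno, max_audrecno):
    """Filter rows to an audit record range, then keep one row per entity."""
    # Filter: keep rows whose audit record number is inside the inclusive range.
    in_range = [r for r in rows if min_audrecno <= r["audrecno"] <= max_audrecno]
    # Sort by entity, then audit record number, so the newest row comes last.
    in_range.sort(key=lambda r: (r["entrecno"], r["audrecno"]))
    latest = {}
    for r in in_range:
        latest[r["entrecno"]] = r  # later (newer) rows overwrite earlier ones
    return list(latest.values())


def write_delimited(rows, delimiter="|"):
    """Write rows as delimited text and return the text as a string."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for r in rows:
        writer.writerow([r["entrecno"], r["audrecno"], r["attrval"]])
    return out.getvalue()


# Hypothetical sample data: entity 1 changed twice, entity 2 is out of range.
sample = [
    {"entrecno": 1, "audrecno": 100, "attrval": "SMITH"},
    {"entrecno": 1, "audrecno": 105, "attrval": "SMYTHE"},
    {"entrecno": 2, "audrecno": 90, "attrval": "JONES"},
    {"entrecno": 3, "audrecno": 110, "attrval": "LEE"},
]
result = incremental_extract(sample, 100, 110)
text = write_delimited(result)
print(text, end="")
```

A full extract corresponds to skipping the range filter and processing every row.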


Chapter 2: Using the Master Data Extract sample graphs

This chapter provides information about how to configure each of the sample graphs for use with your data.

Each graph consists of several components:

Readers read data from an external source such as a database or file. Before you can use a graph, the Readers must be configured with parameters for connecting to these external sources.

Note: Readers for each of the sample graphs are configured in the same manner; therefore Reader configuration is described independently of the specific sample graphs.

Transformers perform operations on data, such as sorting, filtering, merging, and deduplicating. The Transformers in the sample graphs have been configured to process data as needed for each type of extract, but you may want to edit them to adjust how data is handled. In some cases you will also need to delete Transformers that copy data to Writers you do not plan to use.

Writers write processed data to specified targets, such as database tables or a designated flat file. Before you can use a graph, Writers must be configured with parameters that specify the target output.

Importing the sample graphs

You must import the sample graphs into Workbench in order to access them in Clover.ETL.

To import the sample graphs:

1. In the Navigator view, right-click on the Project folder you want to import the sample graphs into, and choose Import.


2. In the Import - Select dialog, navigate to and select Import graphs – version conversion (in the Clover ETL node).

3. Click Next.



4. In the Import – Clover ETL Graphs dialog, click the Browse button beside the From directory field.



5. Navigate to and select the <ROOTDIR>\Workbench x.x.x\samples\graphs directory (where <ROOTDIR> is your Initiate program files installation directory and x.x.x is your application version number).

6. Click OK.

7. The Into folder field lists the folder into which the graphs will be imported; the folder you right-clicked in Step 1 is displayed here by default. If you wish to specify a different folder, click the Browse button beside the Into folder field to browse to and select another folder.



8. The Import – Clover ETL Graphs window is now populated with all available sample graphs. Check the boxes for the graphs you want to import:

extract_full_all.grf

extract_incremental_db.grf

extract_incremental_file.grf

9. Click Finish.


Configuring Readers

The Readers in the sample graphs query database tables in parallel, ordering results by entity record number and modified audit record number. Before executing the graph, each of the Reader elements must be configured with the appropriate database connection information.

Before you can specify a database connection for a Reader, you must create the database connection. Once the database connection is created, it can be used for all of the Readers in the sample graph.
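The kind of ordered query a Reader issues can be pictured with a small, self-contained Python sketch. Everything specific here is an assumption: SQLite stands in for the Master Data Engine database, and the table name (attr_sample) and column names (entrecno, recmodno, attrval) are hypothetical placeholders rather than the actual Master Data Engine schema.

```python
import sqlite3

# In-memory stand-in database with a hypothetical attribute table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attr_sample (entrecno INTEGER, recmodno INTEGER, attrval TEXT)"
)
conn.executemany(
    "INSERT INTO attr_sample VALUES (?, ?, ?)",
    [(2, 7, "B"), (1, 9, "A2"), (1, 4, "A1")],
)

# Order by entity record number, then by modified audit record number,
# so that all rows for one entity arrive together in modification order.
rows = conn.execute(
    "SELECT entrecno, recmodno, attrval "
    "FROM attr_sample ORDER BY entrecno, recmodno"
).fetchall()
conn.close()

for row in rows:
    print(row)
```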

Creating a database connection

To create a database connection:

1. In Outline view, right-click Connections and choose Connections > Create internal. This opens the database connections window.

Note: You must have a graph open in the Graph editor to see the nodes, including the Connection node, in the Outline view.

2. Click to select a database driver from the available drivers window.

Note: It is recommended that you use one of the supplied Initiate drivers.

3. Enter a Name for your connection.

4. Enter the User and Password for connecting to your database.

5. In the URL field, enter the appropriate values for the database parameters:

hostname

port

database (for MSSQL, DB2, and Informix databases)

SID (for Oracle databases)

6. Click the Validate Connection button to validate your database connection.

7. Click Finish.
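As a rough guide to how the hostname, port, and database or SID values from step 5 combine into a URL, the sketch below builds connection URLs in the formats used by the standard vendor JDBC drivers for Oracle, Microsoft SQL Server, and DB2. This is an assumption for illustration only: the supplied Initiate drivers may expect a different URL template, in which case the template displayed for the selected driver takes precedence. The function name and the sample host and database values are hypothetical.

```python
def build_jdbc_url(db_type, hostname, port, name):
    """Build a JDBC URL; `name` is the database name, or the SID for Oracle."""
    # Standard vendor URL templates (assumed; Initiate drivers may differ).
    templates = {
        "oracle": "jdbc:oracle:thin:@{host}:{port}:{name}",
        "mssql": "jdbc:sqlserver://{host}:{port};databaseName={name}",
        "db2": "jdbc:db2://{host}:{port}/{name}",
    }
    if db_type not in templates:
        raise ValueError(f"unsupported database type: {db_type}")
    return templates[db_type].format(host=hostname, port=port, name=name)


# Hypothetical values for illustration.
print(build_jdbc_url("oracle", "dbhost", 1521, "ORCL"))
print(build_jdbc_url("mssql", "dbhost", 1433, "extractdb"))
print(build_jdbc_url("db2", "dbhost", 50000, "EXTRACT"))
```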


Specifying a database connection for each Reader

Once the database connection has been created, you must edit each of the Readers to reference this connection.

To specify a database connection in a Reader:

1. Double-click the Reader to open the Edit component dialog.

2. On the Properties tab, under Clover.ETL properties – basic, click in the Value field for DB connection. A down arrow appears.

3. Click the down arrow and select the database connection you created for this database.

4. Click OK to save your changes and close the Edit component dialog.

Note: You must specify a database connection for each of the Readers in the sample graph.



Configuring the extract_full_all.grf sample graph

The extract_full_all.grf sample graph must be edited to add database connection properties to the Readers. In addition, this graph provides several types of Writer for writing the output; before running the graph you must select the Writers you want to use, and remove the ones you will not use.

Note: Each sample graph consists of several “subgraphs” or connected series of Readers, Transformers, and Writers, which operate in parallel when the graph is executed. The section below describes how to edit one of these subgraphs. You will need to repeat the steps below for each subgraph in your sample graph.

To configure the extract_full_all.grf sample graph:

1. In the Navigator view, double-click the extract_full_all.grf sample graph to open it in the Graph editor.

2. Edit the Reader to provide database connection parameters. Detailed information on how to configure a Reader to connect to a database is given in the Configuring Readers section above.



3. Using the Select tool from the Palette, select and delete each of the Writers you do not wish to use. Each subgraph includes the following Writer types; delete all but the type you wish to use:

Sample graph Writer types

Database or file type    Name format              Example
Oracle                   oracle_<data type>       oracle_name
DB2                      db2_<data type>          db2_ssn
MSSQL                    mssql_<data type>        mssql_phone
Delimited file           delimited_<data type>    delimited_addr

Note: You can also disable a Writer by right-clicking the Writer and choosing Disable.

4. Using the Select tool, delete the Edge linking the Copy Transformer to your remaining Writer.

5. Use the Select tool to drag the Edge linking the Dedup Transformer to the Copy Transformer so that it connects the Dedup Transformer to the input port of your remaining Writer instead.

6. Delete the Copy Transformer.

7. If you are using a database Writer, connect the Writer to a database:

A. Double-click the Writer to open the Edit component dialog.

B. On the Properties tab, enter the relevant required properties according to the tables below. Required properties with missing values are flagged with a yellow exclamation-point icon.

Required Oracle properties

Property                  Value
Path to sqlldr utility    The path to Oracle’s SQL Loader (sqlldr) utility. Click in this field to display an ellipsis, then click on the ellipsis to browse to the utility.
User name                 The user name for connecting to the database
Password                  The password for connecting to the database
TNS name                  The transparent network substrate (TNS) name identifier

Required DB2 properties

Property                  Value
Database                  The database to which this data will be written
Database table            The name of the database table where this data will be written

Required MSSQL properties

Property                  Value
Path to bcp utility       Path to the utility that copies data between Microsoft® SQL Server™ and a data file. Click in this field to display an ellipsis, then click on the ellipsis to browse to the utility.


C. Click OK to save your changes and close the Edit component dialog.

8. Repeat the steps above as needed to edit each of the subgraphs in the sample graph.

Refer to the Running a graph section below for information on how to run your graph once it is configured.
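The rewiring performed in steps 3 through 6 (remove the unused Writers, remove the Copy Transformer, and connect the Dedup Transformer directly to the remaining Writer) can be pictured as an edit to a list of Edges. The sketch below is only a model of that edit in Python, not a way to modify a *.grf file. The node names reader, dedup, and copy are hypothetical, the Writer names follow the naming format in the table above, and a real subgraph may contain additional components.

```python
def keep_single_writer(edges, keep_writer, upstream="dedup", copy="copy"):
    """Return the Edges left after removing Copy and all but one Writer."""
    # Drop every Edge that touches the Copy node: the Edge feeding it and
    # the Edges from it to each Writer (the unused Writers go with them).
    remaining = [(src, dst) for src, dst in edges if copy not in (src, dst)]
    # Connect the upstream Transformer directly to the Writer being kept.
    remaining.append((upstream, keep_writer))
    return remaining


# Hypothetical subgraph before editing: one Reader, Dedup, Copy, four Writers.
subgraph = [
    ("reader", "dedup"),
    ("dedup", "copy"),
    ("copy", "oracle_name"),
    ("copy", "db2_name"),
    ("copy", "mssql_name"),
    ("copy", "delimited_name"),
]
rewired = keep_single_writer(subgraph, "delimited_name")
print(rewired)
```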

Configuring the extract_incremental_db.grf sample graph

The extract_incremental_db.grf sample graph must be edited to add database connection properties to each Reader and to configure the appropriate attribute and audit record number parameters. In addition, this graph provides several Writers for writing the output; before running the graph you must select the Writer you want to use, configure a database connection for it, and remove the ones you will not use.

Parameters for incremental extraction

The parameters for the Transformers in this graph determine which attributes are read from the database and the range of audit record numbers that selects the records to extract. In a typical use case, you configure the attributes once (usually within the graph itself), but you update the audit record range each time you run the graph. As an alternative to manually setting the audit record number range in the graph before each run, you can set up a scheduled job that updates a parameter file with current values. See the Automatic graph execution section below for detailed information.



To configure the extract_incremental_db.grf sample graph:

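1. In the Navigator view, double-click the extract_incremental_db.grf sample graph to open it in the Graph editor.

2. Edit the Reader to provide database connection parameters. Detailed information on how to configure a Reader to connect to a database is given in the Configuring Readers section above.

3. Verify that the parameters for attributes and audit record numbers are correct. Parameters are listed in the Outline view, in the Parameters node.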

Note that parameters listed here apply to the graph as a whole, and are not edited for individual components.

Note: In a typical use case, you will edit the attribute parameters on a one-time basis as part of general graph configuration, but update the audit record numbers each time you run the graph. You can use the madconfig utility to populate the audit record number parameters via a scheduled job. See the Automatic graph execution section below for more information.

4. Using the Select tool from the Palette, select and delete each of the Writers you do not wish to use. Each subgraph includes the following Writer types; delete all but the type you wish to use:

Sample graph Writer types

Database or file type    Name format           Example
Oracle                   oracle_<data type>    oracle_name
DB2                      db2_<data type>       db2_ssn
MSSQL                    mssql_<data type>     mssql_phone

Note: You can also disable a Writer by right-clicking the Writer and choosing Disable.

5. Using the Select tool, delete the Edge linking the Copy Transformer to your remaining Writer.



6. Use the Select tool to drag the Edge linking the Reformat Transformer to the Copy Transformer so that it connects the Reformat Transformer to the input port of your remaining Writer instead.

7. Delete the Copy Transformer.

8. Edit the Writer to connect to a database:

A. Double-click the Writer to open the Edit component dialog.

B. On the Properties tab, enter the relevant required properties according to the tables below. Required properties with missing values are flagged with a yellow exclamation-point icon.

Required Oracle Properties

Property                  Value
Path to sqlldr utility    The path to Oracle’s SQL Loader (sqlldr) utility.
TNS name                  The transparent network substrate (TNS) name identifier

Required DB2 Properties

Property                  Value
Database table            The name of the database table where this data will be written

Required MSSQL Properties

Property                  Value
Path to bcp utility       Path to the utility that copies data between Microsoft® SQL Server™ and a data file. Click in this field to display an ellipsis, then click on the ellipsis to browse to the utility.



C. Click OK to save your changes and close the Edit component dialog.


Configuring the extract_incremental_file.grf sample graph

The extract_incremental_file.grf sample graph must be edited to add database connection properties to each Reader and to configure the appropriate attribute and audit record number parameters. Writers have been configured to write output to specified files; you can edit the Writer properties to change the file name and location if you wish, but further configuration of the Writers is not required.

Parameters for incremental extraction

The parameters for the Transformers in this graph determine which attributes are read from the database and the range of audit record numbers that selects the records to extract. In a typical use case, you configure the attributes once (usually within the graph itself), but you update the audit record range each time you run the graph. As an alternative to manually setting the audit record number range in the graph before each run, you can set up a scheduled job that updates a parameter file with current values. See the Automatic graph execution section below for detailed information.



To configure the extract_incremental_file.grf sample graph:

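1. In the Navigator view, double-click the extract_incremental_file.grf sample graph to open it in the Graph editor.

2. Edit the Reader to provide database connection parameters. Detailed information on how to configure a Reader to connect to a database is given in the Configuring Readers section above.

3. Verify that the parameters for attributes and audit record numbers are correct. Parameters are listed in the Outline view, in the Parameters node.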

Note that parameters listed here apply to the graph as a whole, and are not edited for individual components.

Note: In a typical use case, you will edit the attribute parameters on a one-time basis as part of general graph configuration, but update the audit record numbers each time you run the graph. You can use the madconfig utility to populate the audit record number parameters via a scheduled job. See the Automatic graph execution section below for more information.


Running a graph

To run a graph, click the Run icon in the toolbar, or choose Run > Run from the menu. When a graph is run, the number of records processed along each Edge is displayed.

For detailed information about graph runtime options, refer to the Clover documentation.

Troubleshooting graphs

Use the following processes and tools to troubleshoot your graphs.

Debugging a graph Edge

To debug a graph Edge, right-click on the Edge and choose Debug > Enable Debug. A green “bug” icon is displayed on edges with debugging enabled. Debug information is captured when the graph is run.


You can view debug data after the graph is run by right-clicking the edge and choosing Debug > View Data.

Viewing logs and error messages

Warning and error messages, processing information, and graph status are captured in the Console, Problems, Clover–Graph tracking, and Clover Log views. Refer to the Clover documentation for detailed information about the contents of these views.

Automatic graph execution

Clover.ETL graphs can be executed automatically as part of a scheduled job, using the madconfig utility. If you want to update parameters such as the audit record number range each time the scheduled job runs, you can create an external properties file for the scheduled madconfig utility to reference.

Using the madconfig utility to create a properties file for a scheduled job

Incremental extracts typically select data based on a range of audit record numbers, which changes each time the graph is run. Although you can set the range of record numbers manually in the graph, it may be more practical to generate a properties file automatically via a scheduled job. The properties file then supplies the graph with the appropriate values for the record number range.

This section describes how to use the madconfig utility to launch a graph using a designated, external properties file. You can set up a scheduled job to launch the madconfig utility on a regular basis.

Note: It is outside the scope of this document to describe how to set up a scheduled job which generates the properties file. You can use a standard utility such as the Windows Task Scheduler or the Unix cron utility (or other methods) to set up a scheduled job.
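As one possible shape for such a job, the Python sketch below writes a properties file containing an audit record number range that starts just after the last record extracted by the previous run. The parameter names MIN_AUDRECNO and MAX_AUDRECNO, the file name extract.properties, and the sample values are all hypothetical; use the parameter names listed in the Parameters node of your graph. How the job learns the last extracted and current maximum audit record numbers is left open here.

```python
import tempfile
from pathlib import Path


def write_audit_range(path, last_extracted, current_max):
    """Write a properties file covering audit records after the previous run."""
    lines = [
        f"MIN_AUDRECNO={last_extracted + 1}",  # hypothetical parameter name
        f"MAX_AUDRECNO={current_max}",         # hypothetical parameter name
    ]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Demonstration in a temporary directory with made-up record numbers.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "extract.properties"
    write_audit_range(target, last_extracted=1500, current_max=1720)
    content = target.read_text(encoding="utf-8")

print(content, end="")
```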

Using madconfig to launch a graph using a specified properties file

This madconfig operation requires a properties file containing audit record number values.

To use madconfig to launch a Clover.ETL graph:

1. From a command prompt, run madconfig launch_etl

Note: This utility is run from the <ROOTDIR>\Engine 8.0.0\scripts directory

2. When prompted, enter the path to the graph (*.grf file) you want to run.

3. When prompted, enter the path to your configuration file (that is, the file containing the properties for your graph’s audit record number parameters).

4. When prompted, enter a memory size setting. 256 is the default.

Note: Complete documentation of the madconfig utility is found in the Initiate Master Data Service™ Master Data Engine Installation Guide.



Recording responses to the madconfig utility

If you want to launch a graph via madconfig on a scheduled basis, you can record a set of responses to the madconfig utility’s prompts.

To record a set of responses to the madconfig launch_etl function, run madconfig -recordfile myfile.properties launch_etl, where myfile.properties is the name of the file which will store your responses.

Note: In addition to recording your responses, this command also executes the graph.

To run madconfig using the recorded responses, run madconfig -propertyfile myfile.properties launch_etl, where myfile.properties is the name of the file where your responses are stored.
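A scheduled job could wrap the recorded-response command in a small script. The Python sketch below builds the command line described above and provides a helper that runs it from the scripts directory. The helper names are hypothetical, and the executable name is passed as a parameter because the actual launcher file name on your platform (for example, a batch file or shell script) is an assumption here. The run_launch helper is defined but not called, so the sketch can be tried without an Initiate installation.

```python
import subprocess


def build_launch_command(property_file, madconfig="madconfig"):
    """Build the madconfig command that replays recorded responses."""
    return [madconfig, "-propertyfile", property_file, "launch_etl"]


def run_launch(property_file, scripts_dir, madconfig="madconfig"):
    """Run madconfig from the scripts directory and return its exit code."""
    command = build_launch_command(property_file, madconfig)
    # cwd is the directory that contains the madconfig utility.
    return subprocess.run(command, cwd=scripts_dir).returncode


command = build_launch_command("myfile.properties")
print(" ".join(command))
```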

Using extract.ddl to create target database schema

An extract.ddl file is provided as a convenience for creating the target database schema with the maddbx utility. For detailed information on how to use maddbx with a *.ddl file to create a database schema, refer to the Master Data Engine Installation Guide.

Note that the provided extract.ddl file references the schema used by the sample graphs in their original format. If you edit the graphs in a way which alters the metadata layout for the Writers, you must also edit the extract.ddl file before using it to create your target database schema.
